Recent advances and perspectives in next generation sequencing application to the genetic research of type 2 diabetes

Published 2019-07-23
World Journal of Diabetes, 2019, Issue 7

Yulia A Nasykhova,Yury A Barbitoff,Elena A Serebryakova,Dmitry S Katserov,Andrey S Glotov

Abstract

Key words: Type 2 diabetes; Next-generation sequencing; Epigenetics; Genome-wide association study; Microbiome

INTRODUCTION

Type 2 diabetes (T2D) mellitus is a common complex disease that currently affects more than 400 million people worldwide, and 552 million cases are projected by the year 2030[1]. The disease is characterized by insulin resistance and beta-cell dysfunction and can seriously impair overall quality of life[2]. T2D may lead to an increased risk of cardiovascular disease, stroke, and kidney failure, and can reduce life expectancy by 5-10 years[3-5]. T2D etiology is known to have a significant genetic component, as confirmed by family- and twin-based studies. The risk of developing the disease is approximately 70% when both parents have T2D and approximately 40% when one parent has the disease[6]. Twin studies have shown that the heritability of T2D ranges from 26% to 73%, and the concordance rate for T2D in monozygotic twins can reach 76%[7]. Early identification of individuals at high T2D risk enables delay or prevention of T2D onset through effective lifestyle and/or pharmacological interventions and has been shown to reduce healthcare costs, which sustains strong interest in identifying risk markers of T2D[8,9].

The development of high-throughput and affordable genotyping technologies, statistical tools, and computational software has allowed remarkable progress over the past decade in the search for genetic associations. Since the first genome-wide association study (GWAS) for T2D identified novel susceptibility loci in 2007, more than 100 T2D susceptibility loci have been discovered[10]. Next-generation sequencing (NGS) technologies have a broad range of applications in studying the genetic causes of T2D, such as: (1) Identification of rare and common genetic variants associated with the disease; (2) Functional studies describing the role of genes in disease pathogenesis; and (3) Evaluation of the environmental contribution to the disease using microbiome profiling methods. However, it remains uncertain whether, and to what extent, our increasing knowledge of genetic and epigenetic T2D risk factors gained by NGS methods will translate into clinical practice.

The aim of this article is to summarize recent progress and discoveries in T2D genetics, focusing on sequencing-based studies, and to review the challenges in studying the genetic basis of T2D in order to improve diagnosis, prevention, and treatment.

T2D SUSCEPTIBILITY LOCI IDENTIFIED BEFORE THE ERA OF GWAS

The earliest genetic studies of T2D susceptibility focused on family-based linkage analysis and analysis of candidate genes in small groups of patients. This approach was successful in identifying familial genetic variants with large effects, such as those involved in monogenic forms of the disease. In the past two decades, numerous candidate gene studies have been performed to identify genetic variants for T2D. However, only 4 genetic markers identified in these studies were later confirmed by GWAS. The first genetic variant associated with T2D was the P12A polymorphism (rs1801282) in the peroxisome proliferator-activated receptor gamma gene (PPARG)[11]. Then, in 2003, a large-scale association study replicated the previously identified association between T2D and the E23K (rs5219) polymorphism in the gene encoding inwardly rectifying potassium channel subfamily J, member 11 (KCNJ11)[12]. E23K can alter function by inducing spontaneous overactivity of pancreatic β-cell ATP-sensitive potassium channels, thus increasing the threshold ATP concentration for insulin release[13]. In previous studies, this polymorphism (KCNJ11 E23K) had been reported to be associated with T2D in several populations, although the data were inconsistent[14-18]. Transcription factor 7-like 2 (T-cell specific, HMG-box) (TCF7L2) was shown to be associated with T2D[19]. The TCF7L2 gene product is a member of the high mobility group box family of transcription factors; it is activated by the WNT signaling pathway and may play a master role in regulating insulin biosynthesis, secretion, and processing. Subsequently, two single nucleotide polymorphisms (SNPs) within intron 3 of TCF7L2, rs7903146 and rs12255372, were confirmed to be strongly associated with T2D risk[20-22]. Wolfram syndrome 1 gene (wolframin) (WFS1) was reported to be associated with T2D on the basis of in-depth studies of candidate genes[23]. The WFS1 gene encodes wolframin, an endoplasmic reticulum (ER) membrane protein with a role in ER calcium homeostasis. Mutations in WFS1 are known to be associated with Wolfram syndrome[24].

GENOME-WIDE ASSOCIATION STUDIES ON T2D

Advances in SNP genotyping technology, implementation of the genetic knowledge gained from the Human Genome Project, and development of robust statistical methods have allowed GWAS to become the basic method for identifying common genetic variants associated with complex diseases such as T2D. Since the introduction of GWAS technology, the discovery of genetic variants associated with T2D has accelerated dramatically.

In 2007, the first GWAS performed for T2D identified three novel susceptibility loci related to pancreatic β-cells: (1) Solute carrier family 30 (zinc transporter), member 8 (SLC30A8), which is expressed exclusively in insulin-producing β-cells; (2) Insulin-degrading enzyme (IDE)-kinesin-interacting factor 11 (KIF11)-hematopoietically expressed homeobox (HHEX); and (3) Exostosin glycosyltransferase 2 (EXT2)-ALX homeobox 4 (ALX4)[25]. Subsequent GWAS revealed four additional loci associated with T2D, namely CDK5 regulatory subunit associated protein 1-like 1 (CDKAL1), cyclin-dependent kinase inhibitor 2A (CDKN2A/B), insulin-like growth factor 2 mRNA binding protein 2 (IGF2BP2), and fat mass and obesity associated (FTO)[26-30]. In addition, HNF1 homeobox B (HNF1B/TCF2), a gene related to maturity-onset diabetes of the young type 5 (MODY5), was shown to be associated with T2D[31]. One important finding from the initial GWAS results was that effect sizes for common variants involved in T2D were likely to be modest. The statistical power to detect associations between genetic variants and a trait depends on the sample size, the distribution of effect sizes of (unknown) causal genetic variants, the frequency of those variants, and the linkage disequilibrium (LD) between observed genotyped DNA variants and the unknown causal variants[32]. The resulting need for larger samples led to an innovative data-merging strategy, now known as GWAS meta-analysis, and resulted in multiple waves of GWAS studies for T2D.
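
The dependence of power on sample size, effect size, allele frequency, and LD can be illustrated with a short calculation. The sketch below is a simplified approximation rather than the method of the cited work: it assumes a standardized quantitative trait, an additive 1-degree-of-freedom test, and a non-centrality parameter of roughly N · r² · 2p(1−p)β². The function name `association_power` and all parameter values are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist


def association_power(n, maf, beta, ld_r2=1.0, alpha=5e-8):
    """Approximate power of a 1-df additive association test.

    Assumptions (illustrative only): the trait is standardized, the variant
    explains a small fraction of variance, and ld_r2 is the squared
    correlation between the genotyped SNP and the causal variant.
    """
    # Fraction of trait variance explained by the causal variant.
    var_explained = 2.0 * maf * (1.0 - maf) * beta ** 2
    # Non-centrality parameter, attenuated by imperfect tagging (LD).
    ncp = n * ld_r2 * var_explained
    std_normal = NormalDist()
    # Two-sided critical value at the chosen significance threshold.
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    shift = ncp ** 0.5
    # P(|Z + shift| > z_crit) for a standard normal Z.
    return (1.0 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)


if __name__ == "__main__":
    for n in (10_000, 50_000, 100_000):
        print(n, round(association_power(n, maf=0.3, beta=0.05), 4))
```

Under these assumptions, a modest effect that is nearly undetectable in ten thousand individuals becomes well powered in a sample ten times larger, which is the practical motivation for pooling cohorts.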

In 2008, six new T2D loci, including JAZF1, CDC123/CAMK1D, TSPAN8/LGR5, THADA, ADAMTS9, and NOTCH2, were reported by a meta-analysis combining three previous GWAS [Diabetes Genetic Initiative (DGI), Finland-United States Investigation of NIDDM Genetics (FUSION), and Wellcome Trust Case Control Consortium (WTCCC)][33]. In 2009, two loci, namely insulin receptor substrate 1 (IRS1) and melatonin receptor 1B (MTNR1B), were found to be associated with T2D by GWAS[34-36]. The IRS1 gene is related to insulin resistance and hyperinsulinemia, whereas MTNR1B is involved in impaired early insulin response to glucose[35].

In 2010, the second wave of GWAS identified 17 new loci associated with T2D. This was made possible by the improved efficiency of GWAS genotyping technology, which enabled interrogation of larger numbers of SNPs that better cover common genetic variation across populations in increased sample sizes, and by methodological innovations such as imputation (described below), which allows prediction of genotypes at SNPs not typed on GWAS arrays[37].

In the past year, a leap forward has occurred from smaller, cumulative advances to the description of up to around 250 genome-wide significant loci for T2D[10]. In one of these studies, a large meta-analysis of GWAS including 62892 T2D cases and 596424 controls was performed by combining 3 GWAS data sets of European ancestry: DIAbetes Genetics Replication and Meta-analysis (DIAGRAM), Genetic Epidemiology Research on Aging, and the full cohort release of the UK Biobank; 39 previously unknown loci were identified[38]. This study highlighted the benefits of integrating multiple omics data to identify functional genes and putative regulatory mechanisms caused by genetic variation. Future applications of integrative omics data analyses are expected to improve our understanding of the biological mechanisms underlying common diseases such as T2D[38].

MAPPING OF CAUSAL VARIANTS AND DISEASE GENES BY NGS METHODS

While conventional genome-wide association studies can identify associated loci, GWAS alone cannot be used to map causal variants (many of which are expected to be rare in the population), as the method focuses strictly on pre-selected common variants identified by the HapMap project at the beginning of the century[39]. NGS, on the other hand, presents a reasonable alternative to chip-based methods. For genotyping purposes, NGS reads are aligned to a reference genome, and a set of statistical procedures is performed to identify variant sites[40]. Thus, NGS directly identifies most of the genetic variants present in an individual's genome irrespective of their frequency, which enables association testing of all variants. In this section, we will focus on how NGS datasets might be used for identification of novel causal variants for T2D, and which loci have been identified by these methods.

Fine mapping of GWAS signal using NGS-based reference panels

Large genome and exome sequencing and aggregation consortia, such as the 1000 Genomes project or UK10K, provide valuable insights into linkage disequilibrium, i.e., co-occurrence rates between different variants, enabling probabilistic reconstruction of individual genome sequences from a fixed number of genotyped loci (such as in traditional GWAS). This in turn enables testing for the role of rare variants without sequencing per se[37,41,42]. Large reference panels for such genotype imputation have been constructed from sequencing data[43]. Genotype imputation has been widely used in studies of the genetic architecture of T2D[44]. An interesting example is a 2014 study of the Icelandic population[45]. In this work, whole-genome sequencing of a cohort of 2630 Icelanders was performed, and the identified SNPs and indels were imputed into 98721 controls and T2D patients genotyped with Illumina SNP chips. As a result of this study, a rare variant in the HNF1A gene, which encodes a transcription factor required for the expression of several liver-specific genes, was identified. Moreover, a new signal with association P < 1 × 10⁻⁸ at rs76895963, located within the first intron of cyclin D2 (CCND2), was observed[45]. Two of the most recent and comprehensive research efforts aimed at fine mapping of association signals using imputation and islet-specific epigenome maps identified multiple previously unreported loci for T2D, including PNPLA3, LPL, TPCN2, DENND2C, and KIF2B[46,47]. Apart from using NGS datasets for rare variant imputation, different approaches based on combined SNP and exome chip methods have been developed, enhancing the power of imputation-based analyses[48].
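
Imputation relies on how strongly alleles at different sites co-occur in a reference panel. As a minimal sketch of that quantity, the code below computes the pairwise LD statistic r² from phased haplotypes coded as 0/1 alleles. The function name `ld_r2` and the toy haplotypes are hypothetical; real imputation tools use haplotype models over many sites rather than a single pairwise statistic.

```python
def ld_r2(site_a, site_b):
    """Squared correlation (r^2) between alleles at two biallelic sites.

    site_a and site_b are equal-length sequences of 0/1 alleles, one entry
    per phased haplotype in a (hypothetical) reference panel.
    """
    n = len(site_a)
    if n == 0 or n != len(site_b):
        raise ValueError("sites must be non-empty and of equal length")
    p_a = sum(site_a) / n                                   # allele frequency at site A
    p_b = sum(site_b) / n                                   # allele frequency at site B
    p_ab = sum(a * b for a, b in zip(site_a, site_b)) / n   # haplotype frequency
    d = p_ab - p_a * p_b                                    # LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined for a monomorphic site")
    return d * d / denom


if __name__ == "__main__":
    typed_snp = [0, 1, 1, 0, 1, 0, 0, 1]
    untyped_variant = [0, 1, 1, 0, 1, 0, 0, 0]
    print(ld_r2(typed_snp, untyped_variant))
```

The closer r² is to 1, the more confidently the genotype at an untyped variant can be inferred from a genotyped neighbor; rare variants tend to be tagged less well by common array SNPs, which is why sequencing-based reference panels matter.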

Association of single rare variants with T2D in NGS-based studies

As previously stated, many new genetic associations relevant to T2D have been revealed by GWAS, but these findings represent common and mid-frequency genetic variants with small effect sizes and explain only a small proportion of the heritability of the disease. A sequencing approach enables a more complete assessment of low-frequency and rare genetic variants, which is promising for the investigation of complex traits.

Many published studies have focused on identification of T2D susceptibility loci from NGS data. In a Danish study, the exomes of 1974 Danes were sequenced to a depth of 8×, and a two-stage follow-up in 15989 Danes and in a further 63896 Europeans was subsequently performed. A low-frequency coding variant in CD300LG associated with fasting HDL-cholesterol and two common coding variants in COBLL1 and MACF1 were shown to be associated with T2D[49]. CD300LG encodes a protein proposed to serve multiple functions, including endocytosis of various immunoglobulins and mediation of L-selectin-dependent lymphocyte rolling[50,51]. Non-coding SNPs in COBLL1 and MACF1 have previously been associated with other metabolic phenotypes[52-54].

To investigate the hypothesis of “missing heritability”, the Genetics of Type 2 Diabetes and Type 2 Diabetes Genetic Exploration by Next-generation sequencing in multi-Ethnic Samples Consortium (GoT2D/T2D-GENES Consortium) undertook whole genome sequencing in 2657 Europeans with and without diabetes, and exome sequencing in a total of 12940 subjects from five ancestral groups. The results of this study showed that the variants associated with T2D were overwhelmingly common and that most were located within regions previously identified by GWAS. A few coding variant associations outside established common variant GWAS regions were identified (rs41278853 in the MTMR3 gene; rs11549795, rs28265, and rs36571 in the ASCC2 gene). One coding variant that is common in populations of East Asian ancestry reached genome-wide significance (PAX4 Arg192His, rs2233580)[55]. The PAX4 gene encodes a transcription factor involved in islet differentiation and function. Some PAX4 variants have been associated with early-onset monogenic diabetes[56,57].

Specific statistical approaches for rare variant associations on NGS data

Despite the decreasing costs of NGS-based analyses, such studies still have notable limitations. The most evident limitation of all rare variant-based tests on both whole-genome and imputed SNP array datasets is the difficulty of obtaining enough observations to make confident statistical inferences. For example, if a causal variant occurs at a frequency of 10⁻⁴ in a population, one would require many hundreds of thousands of individuals to test its association with the disease. To allow testing for the association of rare variants, especially in smaller samples, a group of techniques called rare variant association tests was developed. Most rare-variant tests are designed to identify candidate disease genes through aggregation of all rare variants inside the coding sequence of each gene. Numerous strategies for gene-level testing of rare-variant association have been developed[58]. The two main groups of such methods test either the imbalance of rare allele counts between cases and controls (burden tests) or the proportion of phenotypic variance explained by rare variant genotypes (variance-component tests). However, for T2D, few significant gene-level associations have been found even in the largest NGS-based population cohorts[55,59]. Only the largest whole-exome sequencing study to date, which included 20791 T2D cases and 24440 controls of multiple ancestries (Hispanic/Latino, European, African-American, East-Asian, South-Asian), identified several gene-level associations: in 3 genes at exome-wide significance, including a T2D-protective series of >30 SLC30A8 alleles, and within 12 gene sets, including those corresponding to T2D drug targets and candidate genes from knockout mice. The strongest T2D rare variant gene-level signals were shown to explain at most 25% of the heritability of the strongest common single variant signals[60].
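
The sample-size problem and the idea behind burden testing can be made concrete with a small calculation. The sketch below is an illustration under stated assumptions, not the procedure used in the cited studies: it estimates the expected number of carriers of a variant under Hardy-Weinberg proportions and then applies a simple collapsing test, comparing counts of individuals who carry any rare allele in a gene between cases and controls with a one-sided Fisher exact test. The function names and the counts in the usage example are hypothetical.

```python
from math import comb


def expected_carriers(n_individuals, allele_freq):
    """Expected number of individuals carrying at least one copy of an allele,
    assuming Hardy-Weinberg proportions."""
    return n_individuals * (1.0 - (1.0 - allele_freq) ** 2)


def burden_fisher_pvalue(case_carriers, n_cases, control_carriers, n_controls):
    """One-sided Fisher exact p-value for an excess of rare-allele carriers
    among cases (hypergeometric upper tail)."""
    total = n_cases + n_controls
    carriers = case_carriers + control_carriers
    upper = min(carriers, n_cases)
    # Sum the probabilities of tables at least as extreme as the observed one.
    tail = sum(
        comb(n_cases, k) * comb(n_controls, carriers - k)
        for k in range(case_carriers, upper + 1)
    )
    return tail / comb(total, carriers)


if __name__ == "__main__":
    # A single variant at frequency 1e-4 yields very few carriers in 10,000 people.
    print(expected_carriers(10_000, 1e-4))
    # Aggregating all rare variants in one gene gives more carriers to compare.
    print(burden_fisher_pvalue(case_carriers=18, n_cases=5_000,
                               control_carriers=6, n_controls=5_000))
```

Collapsing variants in this way assumes that most aggregated alleles shift risk in the same direction; variance-component tests relax that assumption at the cost of a more involved statistic.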

Several alternative techniques have been developed to overcome the limitations of rare variant testing. In samples of limited size based on exome sequencing or targeted resequencing, the contribution of rare variants might be assessed using tests for case-specificity conditioned on the true population minor allele frequency[61]. Such a strategy may help to identify variants that serve as candidate causal markers for the pathology. In a recent study by our group, we identified potential associations of the VAV3, ADAMTS13, HBQ1, and DBH genes with T2D and obesity. While these genes have not been previously implicated in the disease, they are reasonable targets for further clinical investigation.

Another approach to counteract the limited statistical power of rare variant-based tests in small NGS-based datasets is the use of pedigrees. The biggest advantage of familial studies is that cohorts of related individuals have a higher frequency of alleles that are rare in the general population. One recent example of pedigree-based analysis is a study of 20 Mexican-American families comprising 1034 highly related individuals[62]. While this study did not identify any significant associations for individual rare variants, it showed gene-level associations of the CYP3A4 and OR2T11 genes with glycemic traits, such as fasting glucose levels and 2 h insulin levels.

Overall, there are several ways in which NGS might be used to assist identification of causal genes and variants for T2D pathology. These associations are of ultimate relevance for genomic risk prediction of T2D and clinical decision making[63]. Some inherent limitations of the technology, however, still do not allow thorough analysis of chromosome- and genome-level genetic variation and/or complex genome regions that are poorly accessible to short-read sequencing[64-66]. The spread of third-generation sequencing technologies, such as Oxford Nanopore Technologies single-molecule sequencing, as well as modifications to existing laboratory and/or bioinformatic practices, should shed light on the roles of higher-level genetic variants in T2D pathology.

NGS IN FUNCTIONAL GENOMIC STUDIES OF T2D

Apart from methods aimed at genotyping, NGS can also be used to dissect functional genome elements rather than sequence variants. NGS techniques for these purposes include transcriptional profiling approaches (RNA-Seq), epigenome mapping techniques (positional methods), and others[67,68]. These methods are commonly used both to identify candidate disease genes and to understand the pathological mechanisms behind the observed phenotype. Below, we will provide several recent examples of the application of these methods to T2D research (Figure 1).

Transcriptional profiling of whole tissues and single cells by RNA-Seq

Transcriptional profiling methods, such as RNA-Seq, are used to study the activity patterns of genes. In the recent decade, transcriptomic technologies have frequently been used to decipher the molecular pathology behind human disease[69]. T2D, being one of the most common pathologies, has also been extensively studied by transcriptional profiling techniques[70]. The traditional way to analyze RNA-Seq data is to align the reads to a reference genome and count the numbers of reads or fragments mapped to each gene or transcript. These counts are then used to search for genes whose expression changes significantly in cases vs controls (differentially expressed genes, DEGs) using conventional statistical tests or linear regression models, and to identify biological processes that are dysregulated in one of the conditions. The latter task is solved by a family of gene set enrichment tests that analyze overrepresentation of genes from a certain pathway among the identified DEGs. Multiple downstream analyses can be performed to identify disease genes and pathways from both bulk and single-cell RNA-Seq data[71]. Below, we will focus on several notable examples of how both bulk and single-cell technologies can be used to identify genes involved in the pathological mechanisms of T2D.
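
The step from raw read counts to a case-versus-control comparison can be sketched in a few lines. The example below is a deliberately minimal illustration, assuming per-sample count tables stored as dictionaries: it scales counts to counts per million (CPM) and computes a log2 fold change with a pseudocount. The gene labels, counts, and function names are hypothetical, and no statistical test is performed here; dedicated differential expression tools additionally model count dispersion.

```python
from math import log2


def counts_per_million(sample_counts):
    """Scale raw read counts of one sample to counts per million (CPM)."""
    total = sum(sample_counts.values())
    if total == 0:
        raise ValueError("sample has no mapped reads")
    return {gene: 1e6 * count / total for gene, count in sample_counts.items()}


def log2_fold_change(case_samples, control_samples, gene, pseudocount=1.0):
    """log2 ratio of mean CPM in cases over mean CPM in controls for one gene.

    The pseudocount avoids division by zero for unexpressed genes.
    """
    case_mean = sum(counts_per_million(s)[gene] for s in case_samples) / len(case_samples)
    control_mean = sum(counts_per_million(s)[gene] for s in control_samples) / len(control_samples)
    return log2((case_mean + pseudocount) / (control_mean + pseudocount))


if __name__ == "__main__":
    # Toy count tables with hypothetical gene labels.
    cases = [{"GENE_A": 300, "GENE_B": 700}, {"GENE_A": 280, "GENE_B": 720}]
    controls = [{"GENE_A": 100, "GENE_B": 900}, {"GENE_A": 120, "GENE_B": 880}]
    print(log2_fold_change(cases, controls, "GENE_A"))
```

Normalizing by library size before comparing groups keeps differences in sequencing depth between samples from being mistaken for differences in expression.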

One example of a conventional bulk RNA-Seq approach used to identify disease-relevant pathways can be found in a recent work that studied transcriptional profiles of diabetic keratinocytes[72]. This study showed extensive dysregulation of immunity-related genes in these cells compared to controls, with as many as 420 differentially expressed genes identified in total. Moreover, this study suggested a causal role of the miR-340-3p-DTX3L interaction in the pathological processes occurring in diabetic skin.

Multiple studies have also focused on the roles of microRNA (miRNA) in the pathology of T2D[73]. MicroRNAs are a separate class of RNA molecules that play an important role in gene regulation via post-transcriptional gene silencing. One of the most recent studies aiming at systematic analysis of microRNA involvement in T2D by aggregation of published data identified as many as 158 microRNAs reported to be differentially expressed in T2D. One example of an important microRNA identified in this study is miR-375, which affects the expression of several disease-relevant genes in islets and other tissues.

Many studies suggest that alterations in miRNA levels are associated with T2D development and its complications. miRNA may play a key role in the regulation of carbohydrate and lipid metabolism and of the adipocytokine and insulin signaling pathways involved in T2D development. It was shown that miR-7-5p, -129-3p, -136-5p, -187-3p, -224-5p, -369-5p, -375, -495-3p, -589-5p, and -655-3p, which are dysregulated in the islets, affect the expression of important genes involved in the insulin signaling pathway. Altered levels of miR-17-5p, -155-5p, -125b-5p, -30e-5p, -27a-5p, -221-3p, -199a-5p, -130b-3p, -181a-5p, -29a, and -29b can cause dysregulation of lipid and glucose metabolism. Involvement in the regulation of adipogenesis was identified for miR-130b-3p, -140-5p, -147a, -199a-5p, -27b, -221-3p, and -30e-5p[74]. The stability of miRNAs, their presence in various body fluids, and the significant disease-associated changes in the concentrations of specific circulating miRNAs make them potential reliable biomarkers for complex diseases such as T2D and related complications. However, there are some obstacles to straightforward clinical application of circulating miRNAs. The biggest difficulty is that the pool of circulating miRNA is the sum of contributions from many different tissues and cell types in the body, while the expression of miRNAs is well known to vary considerably between tissues.

Another important branch of NGS-based transcriptional profiling techniques is single-cell RNA sequencing (scRNA-Seq), which allows researchers to study transcriptional responses of individual cells and cell types. scRNA-Seq techniques are also being used extensively to identify key disease genes for T2D in pancreatic cells. For example, scRNA-Seq of pancreatic islets suggested a role of the FXYD2 and GPD2 genes in pathological processes behind T2D in certain islet cell types, with as many as 245 dysregulated genes in total[75,76].

Identification of epigenetic disease markers

Another widely used group of NGS methods is aimed at understanding the language of epigenetic marks, i.e., non-DNA-based units of genetic information. NGS technologies for epigenome studies include but are not limited to: (1) Methods for detection of specific DNA-protein interactions (e.g., chromatin immunoprecipitation followed by sequencing, ChIP-Seq); (2) Methods for identification of DNA methylation sites (such as reduced representation bisulfite sequencing); and (3) Open chromatin mapping technologies (e.g., DNase-Seq or ATAC-Seq)[67]. All of these methods provide valuable insights into dysregulation of cellular processes, which is of ultimate importance for T2D pathology[77]. Epigenetic marks, as dynamic features of the cell, are frequently considered convenient biomarkers for disease risk prediction and prognosis in the clinic. A large-scale survey on the adverse outcomes of adiposity showed that the methylation pattern at certain loci predicts development of T2D in overweight people[78]. A recent analysis of published data identified 8 differentially methylated genes as potential blood biomarkers of T2D (TCF7L2, KCNQ1, ABCG1, TXNIP, PHOSPHO1, SREBF1, SLC30A8, and FTO)[79]. Epigenome profiles might also be used to enhance identification of causal variants at complex GWAS loci[80].

Overall, RNA-Seq and positional NGS techniques provide a very useful framework for investigating cellular processes that are affected during disease pathogenesis. These data may in turn be used both for prediction of diabetes risk and for designing clinical treatment of the disease; furthermore, simultaneous consideration of genotype, expression profile, and epigenetic factors might assist efficient personalized treatment of T2D. Further integration of multiple omics datasets would allow researchers and clinicians to take a comprehensive look at the molecular pathology behind T2D.

Figure 1 Schematic representation of the main type 2 diabetes loci identified recently by high-throughput (mostly next-generation sequencing-based) technologies.

NGS STUDIES OF HUMAN GUT MICROBIOME AND T2D ASSOCIATIONS

Rapid progress in NGS technologies and bioinformatic data processing methods led to the advent of metagenome studies, i.e., investigation of the microbial composition of natural habitats. A decade of advances in the field of intestinal microbiome analysis has demonstrated that alterations of gut bacterial composition are implicated in several medical conditions, including diabetes and obesity[81-83]. Such progress can be attributed to a number of factors, for example, the steady decrease in price per run for NGS platforms, continuous development of bioinformatic tools and pipelines[84-87], creation of specialized gut microbiome 16S rRNA databases, and the use of metaproteomics, metabolomics, and metatranscriptomics in conjunction with genetic profiling[88-92]. Still, there is no consensus concerning optimal conditions for conducting microbiome research. The choice between 16S rRNA profiling and shotgun sequencing methods, differences in effective coverage between the V1-V9 hypervariable regions, more precise quantitative analysis of microbiota constituents[93], and generalized protocols for sample acquisition are still under discussion, with the main emphasis often being put on the low reproducibility of results, partly due to the unstable bacterial composition of samples[81,86,87,92,94]. Overall, intestinal microbiome genetic profiling may find use in clinical practice once the presently elusive “gold standard” for this research field is developed, leading to a better understanding of the gut microbiota's role in human homeostasis and its associations with diseases[95].

General overview of microorganisms involved at least partially with T2D

As of 2014, the microbial community of the human gut was estimated to contain at least 957 bacterial genera, with the phyla Actinobacteria, Bacteroidetes, Firmicutes, Proteobacteria, and Verrucomicrobia demonstrating the most diversity and abundance[96]. While both types of diabetes mellitus are known to cause significant changes in gastrointestinal microbial composition, the underlying mechanisms of dysbiosis and the roles of all microbiome constituents, including bacteria, archaea, eukaryota, and fungi, are still not fully understood. Roseburia intestinalis, Faecalibacterium prausnitzii, and the families Ruminococcaceae/Lachnospiraceae, all known as butyrate producers, were found to be less abundant in T2D[97,98]. Abundance of Akkermansia muciniphila, a mucin-degrading, primarily mucosal bacterium, has been connected to lower insulin resistance, while its low concentrations were associated with obesity, diabetes, IBD, ulcerative colitis, and appendicitis, suggesting future use of this bacterium as a biomarker[99]. However, such a broad spectrum of diseases makes effective clinical usage questionable. Prevotella copri and Bacteroides vulgatus were mentioned as possible promoters of insulin resistance due to active production of branched chain amino acids (BCAA)[100]. Data on general Firmicutes/Bacteroidetes ratio changes during prediabetes and T2D are contradictory, which may be explained by differences in sequencing methods and bioinformatics approaches[100,101]. A recent 16S/18S/ITS microbiome profiling study of T2D with 49 adult participants in India showed an interesting correlation for archaea: the concentration of Methanobrevibacter increased from healthy subjects to fully developed T2D, while the concentration of Methanosphaera gradually decreased. The fungal component demonstrated an overall growth in abundance, including the pathogenic genera Aspergillus and Candida[98]. Most of the aforementioned microorganisms were proposed as possible indicators of prediabetes, type 1 diabetes (T1D), and T2D, but their use in clinical practice is not recommended at the moment because of the small amount of data and the contradictory results between studies, which may be resolved in the future[86].

Linkage of microbiome to diabetes through obesity and metabolic syndrome

Both T2D and obesity demonstrate a growing trend across the globe, with subjects suffering from the latter often being viewed as a possible T2D risk group[102,103]. Recent findings in the field of microbiome variation during diabetes and obesity have reaffirmed earlier theories concerning the microbiota's participation in adipose tissue function and insulin resistance. Network-based gene expression association studies of the host genome underline digestive metabolism, immunization, and signal transduction as the most prominent mechanisms in the development of obesity/T2D[104], while the data on the role of the gastrointestinal microbiome are yet to be unified into a coherent system. Gut microbiota have been shown to regulate body mass in a set of fecal transplantation experiments conducted on lean, obese, and germ-free mice. Transplantation of gastrointestinal microbiota from lean to obese mice led to lower insulin resistance, while transfer of microbiota from obese to lean mice led to a body mass increase of 60% and higher insulin resistance[83,104,105]. Low-grade inflammation, acquired through activation of the TLR4/MyD88/NF-κB pathway by lipopolysaccharides from gram-negative bacterial walls, has been connected to insulin resistance through serine phosphorylation of insulin receptor substrate by participants of the inflammatory cascade[106]. Inhibition of NF-κB led to an increase of Akkermansia/Lactobacillus, reduced body mass, and lower insulin tolerance[100,107]. Short chain fatty acids (SCFA), produced by bacteria through fermentation of non-digestible fibers, serve as signaling molecules in a broad list of processes, including proliferation of pancreatic β cells and insulin biosynthesis. This partially explains the effectiveness of prebiotic treatment and the changes in abundance of Roseburia intestinalis and Faecalibacterium prausnitzii, but further research is required, as results from different studies often contradict each other[100,108,109]. High serum levels of BCAA are attributed to both obesity and T2D, with a steady increase of Prevotella copri and Bacteroides vulgatus during the onset of the diseases[110]. Both probiotics and prebiotics tend to increase insulin sensitivity and lower body mass, although the studies have small sample sizes and longitudinal research is required[111,112].

Metformin effects on microbiome composition

Recent findings demonstrate that the effectiveness of metformin, the most prescribed antidiabetic drug, whose pharmacodynamics mainly involve activation of hepatic AMP-activated protein kinase, may be partially attributed to mediation of diabetic dysbiosis. An increase in Akkermansia muciniphila abundance after metformin treatment was detected in both human and animal studies, while in vitro experiments in a gut simulator showed that metformin acts as a growth factor for both Akkermansia muciniphila and Bifidobacterium adolescentis[113,114]. Metformin therapy was found to promote growth of SCFA-producing bacteria in rats (Allobaculum, Bacteroides, Blautia, Butyricoccus, Lactobacillus, Akkermansia, and Phascolarctobacterium) and humans (Akkermansia, Lactobacillus, Bifidobacterium, Prevotella, Megasphaera, Shewanella, Blautia, or Butyrivibrio)[113].

NEW APPROACHES FOR CLINICIAN INTERPRETATIONS OF NGS DATA

The identification of multiple loci by GWAS and sequencing technologies has given considerable impetus to the elucidation of T2D pathogenesis and provides a tempting opportunity to translate genetic information into clinical practice. This knowledge may have a role in disease risk prediction, including early identification of subjects at risk of developing the disease, and in clinical management, where treatment regimens could be modified so that affected individuals benefit most from their therapy and avoid complications[63]. The emerging availability of genomic and electronic health data in large populations is a powerful tool for research that has drawn interest in bringing precision medicine to diabetes[115].

Can a genetic test motivate lifestyle changes?

According to recent polls, people are interested in genetic testing for T2D risk because it allows them to evaluate their individual disease risk[116]. However, several studies have shown that some factors make individuals reluctant to undergo genetic testing. The main factors that influence refusal include distrust of medical researchers, religious prejudices and lower levels of education[117,118]. Some have argued that genetic markers of T2D play only a minor role in predicting risk, although the predictive value increases when they are combined with careful clinical risk assessment[116,119].

Until recently, it was assumed that awareness of genetic predisposition can motivate healthy behavior[120]. According to some authors, however, patients do not appear to be motivated toward a healthy lifestyle after learning of their genetic predisposition[121-123]. At the same time, research on the molecular basis of T2D development is necessary for making a diagnosis, since young individuals with T1D can also be obese[124,125]. Misdiagnosis of diabetes can lead to inappropriate medical treatment[126].

Studies of genetic biomarkers: Prediction and diagnosis of T2D

Many studies have analyzed the utility of genetic variants in T2D risk prediction, using cross-sectional studies for undiagnosed T2D and longitudinal studies for incident T2D. Early studies provided much optimism and showed that common variants at the TCF7L2 locus predict the progression to diabetes in subjects with impaired glucose tolerance[63,127]. Unfortunately, diabetes mellitus is diagnosed on the basis of its biochemical effects (increased glucose) rather than by detection of the underlying defect, so the absence of these effects is taken to indicate the absence of the disease[128]. At present, aggregated available data do not provide robust evidence to support the utility of genetic testing for T2D prediction and indicate a modest contribution of genetic variants[129-131]. Several large population-based follow-up studies have investigated the predictive power of common genetic variants for the risk of incident T2D. The results of these studies were similar to those from cross-sectional case-control studies: risk variants did not substantially increase the AUC for predicting T2D when combined with clinical risk factors[132]. However, it seems possible to improve T2D risk prediction by overcoming the factors that limit predictive power, such as: (1) Modest effect sizes of common variants; (2) Insufficient knowledge of rare and coding variants missed by GWAS; (3) Heterogeneous nature of the disease; and (4) Genetic diversity between ethnic groups (detailed below).
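The comparison described above, in which the AUC of a clinical model is evaluated with and without genetic variants, can be sketched as follows. This is only a minimal illustration: the toy cohort, the unweighted risk-allele count used as a genetic risk score, and the rule for combining it with the clinical score are all hypothetical and are not taken from the cited studies.

```python
def genetic_risk_score(risk_allele_counts, weights=None):
    """Sum of risk-allele counts (0, 1 or 2 per variant), optionally weighted."""
    if weights is None:
        weights = [1.0] * len(risk_allele_counts)
    if len(weights) != len(risk_allele_counts):
        raise ValueError("one weight per variant is required")
    return sum(c * w for c, w in zip(risk_allele_counts, weights))


def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison (Mann-Whitney).

    Equals the probability that a randomly chosen case scores higher than
    a randomly chosen control; ties count as one half.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical toy cohort:
# (clinical risk points, risk-allele counts at 3 variants, incident T2D 0/1)
cohort = [
    (2, [0, 1, 0], 0),
    (3, [1, 0, 1], 0),
    (5, [1, 1, 0], 0),
    (4, [1, 1, 1], 0),
    (4, [2, 1, 1], 1),
    (6, [1, 1, 1], 1),
    (7, [0, 1, 1], 1),
    (3, [1, 1, 1], 1),
]
labels = [y for _, _, y in cohort]
clinical = [points for points, _, _ in cohort]
# Arbitrary illustrative combination, not an estimated regression model.
combined = [2 * points + genetic_risk_score(g) for points, g, _ in cohort]

auc_clinical = auc(clinical, labels)
auc_combined = auc(combined, labels)
print(f"AUC, clinical factors only: {auc_clinical:.3f}")
print(f"AUC, clinical factors + genetic score: {auc_combined:.3f}")
```

In practice the combined model would be fitted by regression and evaluated in an independent cohort; the pairwise AUC above is used only because it makes the definition explicit.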

The limitations related to the modest effect sizes of common alleles, and the need for further investigation to identify rare and coding variants involved in T2D pathogenesis, have been reviewed above. T2D seemingly encompasses a group of several disease subtypes, which makes it rather difficult to distinguish from other types of diabetes, as it may result from defects in various metabolic pathways. The accuracy of prediction models may be affected by the fact that latent autoimmune diabetes in adults has been identified and the number of known monogenic forms of diabetes is increasing, both of which point to a degree of misclassification[133].

Heterogeneity in the association of genetic variants with the disease has been demonstrated across populations. It appears to be related to study design, in particular to large meta-analyses that combine T2D cases of different origins or presentations and evaluate them under a generalized intermediate hyperglycemia phenotype, even though this phenotype may arise from a multitude of unrelated physiological or environmental causes[134]. In recent years, a large number of projects have been carried out to study the causes of diabetes, large-scale studies have been established and large biobanks of patient samples have been collected. In addition, some variants that are important for the prevention and treatment of T2D have been found in individual population isolates, demonstrating the value of studying genetically isolated populations[128]. Because of genetic drift, deleterious variants with large phenotypic effects can rise randomly to higher allele frequencies, which makes the association of such variants easier to investigate in isolated populations than in admixed ones, where these variants might be absent or very rare[10].
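The effect of drift on a rare allele can be illustrated with a simple Wright-Fisher simulation. The sketch below is a simplification under stated assumptions: it models pure drift with no selection, mutation or migration, and the population sizes, starting frequency, threshold and number of generations are hypothetical values chosen only for illustration.

```python
import random


def wright_fisher(pop_size, start_freq, generations, rng):
    """Allele-frequency trajectory under pure drift (no selection or mutation).

    Each generation, 2 * pop_size allele copies are drawn at random using the
    previous generation's frequency (binomial sampling).
    """
    n_copies = 2 * pop_size
    count = round(start_freq * n_copies)
    trajectory = [count / n_copies]
    for _ in range(generations):
        freq = count / n_copies
        count = sum(1 for _ in range(n_copies) if rng.random() < freq)
        trajectory.append(count / n_copies)
    return trajectory


def fraction_reaching(pop_size, start_freq, threshold, generations, replicates, rng):
    """Share of replicate populations in which the allele ever reaches `threshold`."""
    hits = 0
    for _ in range(replicates):
        trajectory = wright_fisher(pop_size, start_freq, generations, rng)
        if max(trajectory) >= threshold:
            hits += 1
    return hits / replicates


rng = random.Random(1)  # fixed seed so that the run is repeatable
# Hypothetical sizes: a small isolate versus a larger population.
for label, size in (("isolate", 50), ("large population", 1000)):
    share = fraction_reaching(size, 0.01, 0.05, 20, 100, rng)
    print(f"{label} (N={size}): allele reached 5% in {share:.0%} of runs")
```

Because sampling noise scales inversely with the number of allele copies, a rare allele is expected to reach a detectable frequency more often in the smaller population, which is the property that makes isolates informative.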

Circulating miRNAs in plasma or serum have several features that make them ideal candidate biomarkers of complex diseases such as T2D[135]. Hundreds of miRNAs are actively or passively released into the blood circulation to regulate specific gene function[136]. Current studies demonstrate that changes in miRNA expression are involved in insulin dysfunction and the progression of T2D, and many studies have identified miRNAs associated with T2D[137]. miR-21, miR-126 and miR-146a have been shown to have potential as biomarkers for the early diagnosis of T2D[138-140]. Thus, these and a number of other miRNAs may be candidates for testing the effectiveness of therapy, but further studies are needed to identify them[137].

Genetic tests of T2D: Implications for therapy

T2D commonly develops with insulin resistance, a disorder in which cells located primarily within the muscles, liver, and fat tissue do not use insulin properly, and progresses to pancreatic beta-cell failure. The triggers of T2D are insulin resistance and inadequate insulin secretion[141].

Selecting drug therapy on the basis of an individual's genetic features could be a major breakthrough, because drug response varies between individuals and many patients eventually fail to achieve recommended levels of glycemic control owing to their genetic characteristics[142,143]. Currently, only half of the patients who initiate therapy with metformin or a sulfonylurea reach a hemoglobin A1c level of 7%[144]. Sulfonylureas and metformin are the most studied classes of drugs used to treat T2D[115].

Sulfonylureas (SUs) are widely used in clinical practice; however, side effects such as weight gain and an increased risk of hypoglycemia have frequently been reported[145]. Studies have shown that these drugs can act effectively in response to a defect induced by variants in KCNJ11 (rs5219, rs5215) and ABCC8 (rs757110) in patients with T2D[146,147]. CYP2C9 (rs1799853, rs1057910), TCF7L2 (rs12255372, rs7903146), IRS1 (rs2943641, rs1801278) and CAPN10 (rs3842570, rs3792267, rs5030952) also play an important role in the selection of SUs[148-151]. Also of note is rs7754840 in the CDKAL1 gene, which is significantly associated with the response to sulfonylurea treatment and, in combination with other clinical and pathological data, may help move toward individualized therapy for patients with T2D[152].

Metformin, the most commonly used drug in the treatment of T2D, is not metabolized in the liver; therefore, its glucose-lowering effect is not affected by variants in the genes encoding metabolizing enzymes[153]. SLC22A1 (rs12208357, rs34130495, rs35167514, rs34059508) is the most studied gene involved in the response to metformin[154]. However, other genes involved in the response to metformin have been identified, for example SLC22A2 (rs316019) and PPARG (rs1801282)[145,155]. It should also be noted that the T2D-associated variant rs7903146 in TCF7L2 influences the acute response to both glipizide and metformin in persons free of overt diabetes[156].

CONCLUSION

The growing power and falling cost of NGS technology have sparked an enormous range of applications and provided an excellent instrument for solving various problems in molecular biology. Rational use of this instrument, taking into account all of its benefits and limitations, is the next step toward elucidating the pathogenesis of complex diseases such as T2D. Results obtained in sequencing-based studies, combined with earlier findings from GWAS and candidate gene studies, allow us to organize and improve our knowledge of T2D and give us an opportunity to translate genetic information into clinical practice. This increasing knowledge offers an opportunity to predict the occurrence of disease and to identify subgroups of patients for whom therapies will have the greatest efficacy or the fewest adverse effects. However, this new knowledge should be treated with caution. Unfortunately, the accuracy of T2D risk prediction models based on genetic information remains modest to date. Hence, further research and technological improvement are needed to study the individual and aggregate contribution of genetic markers to the development of diabetes before they can be widely used in clinical practice.