Catalogue Search | MBRL

Rare-variant collapsing analyses for complex traits: guidelines and applications

by Povysil, Gundula , Petrovski, Slavé , Allen, Andrew S in Genetic diversity , Genome-wide association studies , Genomes

2019

The first phase of genome-wide association studies (GWAS) assessed the role of common variation in human disease. Advances optimizing and economizing high-throughput sequencing have enabled a second phase of association studies that assess the contribution of rare variation to complex disease in all protein-coding genes. Unlike the early microarray-based studies, sequencing-based studies catalogue the full range of genetic variation, including the evolutionarily youngest forms. Although the experience with common variants helped establish relevant standards for genome-wide studies, the analysis of rare variation introduces several challenges that require novel analysis approaches.

Journal Article

3.5KJPNv2: an allele frequency panel of 3552 Japanese individuals including the X chromosome

by Shimizu, Atsushi , Katsuoka, Fumiki , Koshiba, Seizo in Population , Population genetics

2019

The first step towards realizing personalized healthcare is to catalog the genetic variations in a population. Since the dissemination of individual-level genomic information is strictly controlled, it will be useful to construct population-level allele frequency panels with easy-to-use interfaces. In the Tohoku Medical Megabank Project, we sequenced nearly 4000 individuals from a Japanese population and constructed an allele frequency panel of 3552 individuals after removing related samples. The panel is called the 3.5KJPNv2. It was constructed by using a standard pipeline including the 1KGP and gnomAD algorithms to reduce technical biases and to allow comparisons to other populations. Our database is the first large-scale panel providing the frequencies of variants present on the X chromosome and on the mitochondria in the Japanese population. All the data are available on our original database at https://jmorp.megabank.tohoku.ac.jp.

Population genetics: large database of Japanese gene variations constructed. A new database provides information on the frequency of genetic variations within 3552 Japanese individuals, and facilitates comparisons with other populations. The reference panel, constructed by Kengo Kinoshita of Tohoku University, Sendai, and colleagues in Japan is also the first large-scale database to provide genetic variation frequency information on the X chromosome and mitochondrial DNA in the Japanese population. The methods used to sequence the genetic data are similar to those used in other large databases, allowing comparisons with other populations. The population size and methods used to compile the database overcome limitations in previous Japanese reference panels. This and similar databases that catalog genetic variations within populations can improve efforts towards personalizing healthcare and contribute to the study of human population genetics. The database is publicly available online.
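As a rough illustration of what an allele frequency panel reports, the Python sketch below divides an alternate-allele count by the number of chromosomes sampled. It assumes the common convention that males carry one copy of the non-pseudoautosomal X chromosome and females carry two. The function name, the female/male split and the allele count are hypothetical values chosen for illustration; they are not taken from the 3.5KJPNv2 panel or its pipeline.

```python
def allele_frequency(alt_count: int, n_female: int, n_male: int,
                     x_linked: bool = False) -> float:
    """Alternate-allele frequency = alt_count / chromosomes sampled.

    Assumption: for X-linked (non-pseudoautosomal) sites each male
    contributes one chromosome; otherwise every individual contributes two.
    """
    male_copies = 1 if x_linked else 2
    total_chromosomes = 2 * n_female + male_copies * n_male
    if total_chromosomes == 0:
        raise ValueError("no chromosomes sampled")
    if not 0 <= alt_count <= total_chromosomes:
        raise ValueError("alt_count must lie between 0 and the chromosome total")
    return alt_count / total_chromosomes


# Hypothetical inputs: only the total of 3552 individuals comes from the abstract.
n_female, n_male = 1800, 1752   # assumed split, for illustration only
alt_count = 100                 # assumed number of alternate alleles observed

autosomal_af = allele_frequency(alt_count, n_female, n_male)
x_linked_af = allele_frequency(alt_count, n_female, n_male, x_linked=True)

print(f"autosomal: {autosomal_af:.4f}, X-linked: {x_linked_af:.4f}")
```

The same allele count yields a higher frequency on the X chromosome because fewer chromosomes are sampled there, which is one reason X-linked frequencies need separate handling in a panel.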

Journal Article

Long-read human genome sequencing and its applications

by Logsdon, Glennis A , Eichler, Evan E , Vollger, Mitchell R in Chromosomes , Diploids , DNA sequencing

2020

Over the past decade, long-read, single-molecule DNA sequencing technologies have emerged as powerful players in genomics. With the ability to generate reads tens to thousands of kilobases in length with an accuracy approaching that of short-read sequencing technologies, these platforms have proven their ability to resolve some of the most challenging regions of the human genome, detect previously inaccessible structural variants and generate some of the first telomere-to-telomere assemblies of whole chromosomes. Long-read sequencing technologies will soon permit the routine assembly of diploid genomes, which will revolutionize genomics by revealing the full spectrum of human genetic variation, resolving some of the missing heritability and leading to the discovery of novel mechanisms of disease.

Long-read sequencing is becoming more accessible and more accurate. In this Review, Logsdon et al. discuss the currently available platforms, how the technologies are being applied to assemble and phase human genomes, and their impact on improving our understanding of human genetic variation.

Journal Article

Outbreak.info genomic reports: scalable and dynamic surveillance of SARS-CoV-2 variants and mutations

by Gangavarapu, Karthik , Haag, Emily , Hufbauer, Emory in 631/114/2405 , 631/114/794 , 631/208/726/649

2023

In response to the emergence of SARS-CoV-2 variants of concern, the global scientific community, through unprecedented effort, has sequenced and shared over 11 million genomes through GISAID, as of May 2022. This extraordinarily high sampling rate provides a unique opportunity to track the evolution of the virus in near real-time. Here, we present outbreak.info, a platform that currently tracks over 40 million combinations of Pango lineages and individual mutations, across over 7,000 locations, to provide insights for researchers, public health officials and the general public. We describe the interpretable visualizations available in our web application, the pipelines that enable the scalable ingestion of heterogeneous sources of SARS-CoV-2 variant data and the server infrastructure that enables widespread data dissemination via a high-performance API that can be accessed using an R package. We show how outbreak.info can be used for genomic surveillance and as a hypothesis-generation tool to understand the ongoing pandemic at varying geographic and temporal scales. The outbreak.info genomic reports provide comprehensive and detailed information about SARS-CoV-2 lineages and mutations worldwide, which facilitates near real-time genomic surveillance.

Journal Article

The contribution of genetic variants to disease depends on the ruler

by Visscher, Peter M. , Witte, John S. , Wray, Naomi R. in 631/208/2489/144 , 631/208/726/649 , 631/208/726/649/2219

2014

Key Points: Although the historically different fields of quantitative genetics and epidemiology are converging to answer fundamental questions about genetic variation in risk underlying human diseases, the plethora of measures to quantify the contribution of variants to disease risk have differing terminology and assumptions, which obfuscate their use and interpretation. In this Analysis, we consider and contrast the most commonly used measures that assess disease risk contributed to the population by individual variants, namely the heritability of disease liability explained, approximate heritability explained, the sibling recurrence risk explained, the proportion of genetic variance explained on a logarithmic relative risk scale, the area under the receiver–operating curve (AUC) and the population attributable fraction (PAF), and give numerical examples in breast cancer, Crohn's disease, rheumatoid arthritis and schizophrenia. We discuss the properties of these measures, show how they are connected to each other, consider the situations for which they are best suited and provide an online tool for their calculation. The most appropriate measure to use depends on the importance given to the frequency of a risk variant relative to its effect size on disease and on the baseline to which importance is expressed. These factors should be explicitly considered when assessing the contribution of genetic variants to disease. We recommend investigators to focus primarily on the heritability of liability or genetic variance on the logarithmic relative risk scale explained, as they give estimates that are less sensitive to rare high-risk variants than the other measures considered here. Moreover, we caution against using the PAF for genetic risk variants because it has various undesirable properties.
The concept of individual loci providing an explanation for disease is less straightforward than it may seem at first sight, and we recommend investigators to undertake sensitivity analyses that explore how measures of the contribution of genetic variants to risk vary across a range of underlying assumptions. There are various measures to quantify the contribution of genetic variants to disease risk, but differing terminology and assumptions obfuscate their use and interpretation. In this Analysis, the authors consider and contrast six commonly used measures that assess disease risk of individual variants, and provide numerical examples in breast cancer, Crohn's disease, rheumatoid arthritis and schizophrenia. Our understanding of the genetic basis of disease has evolved from descriptions of overall heritability or familiality to the identification of large numbers of risk loci. One can quantify the impact of such loci on disease using a plethora of measures, which can guide future research decisions. However, different measures can attribute varying degrees of importance to a variant. In this Analysis, we consider and contrast the most commonly used measures — specifically, the heritability of disease liability, approximate heritability, sibling recurrence risk, overall genetic variance using a logarithmic relative risk scale, the area under the receiver–operating curve for risk prediction and the population attributable fraction — and give guidelines for their use that should be explicitly considered when assessing the contribution of genetic variants to disease.
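To make the trade-off between variant frequency and effect size concrete, the sketch below computes the population attributable fraction with Levin's textbook formula, PAF = p(RR - 1) / (1 + p(RR - 1)), where p is the frequency of the risk factor in the population and RR is the relative risk. The abstract does not say which PAF formulation the authors use, so this choice is an assumption, and the two example variants are hypothetical rather than drawn from the article.

```python
def population_attributable_fraction(exposure_freq: float,
                                     relative_risk: float) -> float:
    """Levin's formula: PAF = p(RR - 1) / (1 + p(RR - 1)).

    exposure_freq: fraction of the population carrying the risk factor (p).
    relative_risk: risk in carriers divided by risk in non-carriers (RR).
    """
    if not 0.0 <= exposure_freq <= 1.0:
        raise ValueError("exposure_freq must be between 0 and 1")
    if relative_risk <= 0.0:
        raise ValueError("relative_risk must be positive")
    excess = exposure_freq * (relative_risk - 1.0)
    return excess / (1.0 + excess)


# Hypothetical variants, chosen only to contrast frequency with effect size.
common_modest = population_attributable_fraction(0.30, 1.2)   # common, small effect
rare_strong = population_attributable_fraction(0.001, 10.0)   # rare, large effect

print(f"common, modest effect: {common_modest:.4f}")
print(f"rare, strong effect:   {rare_strong:.4f}")
```

Under this formula the common variant with a modest effect receives the larger PAF, which illustrates why different measures can rank the same variants differently.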

Journal Article

Accurate detection of complex structural variations using single-molecule sequencing

by von Haeseler, Arndt , Schatz, Michael C , Nattestad, Maria in Accuracy , Biology , Computer science

2018

Structural variations are the greatest source of genetic variation, but they remain poorly understood because of technological limitations. Single-molecule long-read sequencing has the potential to dramatically advance the field, although high error rates are a challenge with existing methods. Addressing this need, we introduce open-source methods for long-read alignment (NGMLR; https://github.com/philres/ngmlr) and structural variant identification (Sniffles; https://github.com/fritzsedlazeck/Sniffles) that provide unprecedented sensitivity and precision for variant detection, even in repeat-rich regions and for complex nested events that can have substantial effects on human health. In several long-read datasets, including healthy and cancerous human genomes, we discovered thousands of novel variants and categorized systematic errors in short-read approaches. NGMLR and Sniffles can automatically filter false events and operate on low-coverage data, thereby reducing the high costs that have hindered the application of long reads in clinical and research settings.

Journal Article

Disease variant prediction with deep generative models of evolutionary data

by Min, Joseph K. , Frazer, Jonathan , Gal, Yarin in 631/114/1305 , 631/114/2397 , 631/208/2489/144

2021

Quantifying the pathogenicity of protein variants in human disease-related genes would have a marked effect on clinical decisions, yet the overwhelming majority (over 98%) of these variants still have unknown consequences [1–3]. In principle, computational methods could support the large-scale interpretation of genetic variants. However, state-of-the-art methods [4–10] have relied on training machine learning models on known disease labels. As these labels are sparse, biased and of variable quality, the resulting models have been considered insufficiently reliable [11]. Here we propose an approach that leverages deep generative models to predict variant pathogenicity without relying on labels. By modelling the distribution of sequence variation across organisms, we implicitly capture constraints on the protein sequences that maintain fitness. Our model EVE (evolutionary model of variant effect) not only outperforms computational approaches that rely on labelled data but also performs on par with, if not better than, predictions from high-throughput experiments, which are increasingly used as evidence for variant classification [12–16]. We predict the pathogenicity of more than 36 million variants across 3,219 disease genes and provide evidence for the classification of more than 256,000 variants of unknown significance. Our work suggests that models of evolutionary information can provide valuable independent evidence for variant interpretation that will be widely useful in research and clinical settings. A new computational method, EVE, classifies human genetic variants in disease genes using deep generative models trained solely on evolutionary sequences.

Journal Article

Graph pangenome captures missing heritability and empowers tomato breeding

by Zhou, Yao , Lin, Tao , Mueller, Lukas in 45/23 , 45/43 , 45/91

2022

Missing heritability in genome-wide association studies defines a major problem in genetic analyses of complex biological traits [1, 2]. The solution to this problem is to identify all causal genetic variants and to measure their individual contributions [3, 4]. Here we report a graph pangenome of tomato constructed by precisely cataloguing more than 19 million variants from 838 genomes, including 32 new reference-level genome assemblies. This graph pangenome was used for genome-wide association study analyses and heritability estimation of 20,323 gene-expression and metabolite traits. The average estimated trait heritability is 0.41 compared with 0.33 when using the single linear reference genome. This 24% increase in estimated heritability is largely due to resolving incomplete linkage disequilibrium through the inclusion of additional causal structural variants identified using the graph pangenome. Moreover, by resolving allelic and locus heterogeneity, structural variants improve the power to identify genetic factors underlying agronomically important traits leading to, for example, the identification of two new genes potentially contributing to soluble solid content. The newly identified structural variants will facilitate genetic improvement of tomato through both marker-assisted selection and genomic selection. Our study advances the understanding of the heritability of complex traits and demonstrates the power of the graph pangenome in crop breeding. A precise catalogue of more than 19 million variants from 838 tomato genomes, including 32 new reference-level genome assemblies, advances the understanding of the heritability of complex traits and demonstrates the power of the graph pangenome in crop breeding.
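The 24% figure quoted above appears to be a relative increase rather than a difference in percentage points. A short check, using only the two heritability averages from the abstract (the function name is introduced here for illustration):

```python
def relative_increase(baseline: float, improved: float) -> float:
    """Fractional change of `improved` relative to `baseline`."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (improved - baseline) / baseline


linear_reference_h2 = 0.33   # average heritability, single linear reference
graph_pangenome_h2 = 0.41    # average heritability, graph pangenome

increase = relative_increase(linear_reference_h2, graph_pangenome_h2)
absolute_gain = graph_pangenome_h2 - linear_reference_h2

print(f"relative increase: {increase:.1%}")
print(f"absolute gain: {absolute_gain:.2f}")
```

The absolute gain is 0.08 in heritability, which corresponds to roughly a 24% increase relative to the linear-reference baseline of 0.33.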

Journal Article

The GenomeAsia 100K Project enables genetic discoveries across Asia

by Shin, Jong-Yeon , Mehdi, Syed Q. , Parani, Madasamy in 45/23 , 631/208/457/649 , 631/208/726/649

2019

The underrepresentation of non-Europeans in human genetic studies so far has limited the diversity of individuals in genomic datasets and led to reduced medical relevance for a large proportion of the world’s population. Population-specific reference genome datasets as well as genome-wide association studies in diverse populations are needed to address this issue. Here we describe the pilot phase of the GenomeAsia 100K Project. This includes a whole-genome sequencing reference dataset from 1,739 individuals of 219 population groups and 64 countries across Asia. We catalogue genetic variation, population structure, disease associations and founder effects. We also explore the use of this dataset in imputation, to facilitate genetic studies in populations across Asia and worldwide. Using whole-genome sequencing data from 1,739 individuals, the GenomeAsia 100K Project catalogues genetic variation, population structure and disease associations to facilitate genetic studies in Asian populations and increase representation in genetics studies worldwide.

Journal Article

Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program

by Torres, Raul , O’Connell, Jeffrey R. , Bobo, Dean M. in 45/43 , 631/208/457/649/2219 , 631/208/514/2254

2021

The Trans-Omics for Precision Medicine (TOPMed) programme seeks to elucidate the genetic architecture and biology of heart, lung, blood and sleep disorders, with the ultimate goal of improving diagnosis, treatment and prevention of these diseases. The initial phases of the programme focused on whole-genome sequencing of individuals with rich phenotypic data and diverse backgrounds. Here we describe the TOPMed goals and design as well as the available resources and early insights obtained from the sequence data. The resources include a variant browser, a genotype imputation server, and genomic and phenotypic data that are available through dbGaP (Database of Genotypes and Phenotypes) [1]. In the first 53,831 TOPMed samples, we detected more than 400 million single-nucleotide and insertion or deletion variants after alignment with the reference genome. Additional previously undescribed variants were detected through assembly of unmapped reads and customized analysis in highly variable loci. Among the more than 400 million detected variants, 97% have frequencies of less than 1% and 46% are singletons that are present in only one individual (53% among unrelated individuals). These rare variants provide insights into mutational processes and recent human evolutionary history. The extensive catalogue of genetic variation in TOPMed studies provides unique opportunities for exploring the contributions of rare and noncoding sequence variants to phenotypic variation. Furthermore, combining TOPMed haplotypes with modern imputation methods improves the power and reach of genome-wide association studies to include variants down to a frequency of approximately 0.01%. The goals, resources and design of the NHLBI Trans-Omics for Precision Medicine (TOPMed) programme are described, and analyses of rare variants detected in the first 53,831 samples provide insights into mutational processes and recent human evolutionary history.
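The percentages in this abstract can be turned into approximate counts with a few lines of arithmetic. The sketch below treats 400 million as a lower bound, because the abstract says "more than 400 million", and assumes diploid autosomal sites, with the singleton allele carried on exactly one chromosome, when estimating the allele frequency of a singleton; the variable names are illustrative.

```python
n_samples = 53_831
total_variants = 400_000_000   # lower bound: the abstract reports "more than 400 million"

# Counts implied by the reported percentages (integer arithmetic, lower bounds).
rare_variants = total_variants * 97 // 100   # variants with frequency below 1%
singletons = total_variants * 46 // 100      # variants present in one individual

# Assumption: diploid autosomes, so each sample contributes two chromosomes
# and a singleton allele is observed on exactly one of them.
chromosomes = 2 * n_samples
singleton_af = 1 / chromosomes

print(f"rare variants (at least): {rare_variants:,}")
print(f"singletons (at least):    {singletons:,}")
print(f"singleton allele frequency: {singleton_af:.2e}")
```

Under these assumptions a singleton has an allele frequency of roughly 0.0009%, about ten times lower than the 0.01% frequency the abstract gives as the reach of imputation.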

Journal Article
