Catalogue Search | MBRL

A tutorial on how not to over-interpret STRUCTURE and ADMIXTURE bar plots

by Falush, Daniel , Lawson, Daniel J. , van Dorp, Lucy in 631/181/457 , 631/181/457/649 , Admixtures

2018

Genetic clustering algorithms, implemented in programs such as STRUCTURE and ADMIXTURE, have been used extensively in the characterisation of individuals and populations based on genetic data. A successful example is the reconstruction of the genetic history of African Americans as a product of recent admixture between highly differentiated populations. Histories can also be reconstructed using the same procedure for groups that do not have admixture in their recent history, where recent genetic drift is strong or that deviate in other ways from the underlying inference model. Unfortunately, such histories can be misleading. We have implemented an approach, badMIXTURE, to assess the goodness of fit of the model using the ancestry “palettes” estimated by CHROMOPAINTER and apply it to both simulated data and real case studies. Combining these complementary analyses with additional methods that are designed to test specific hypotheses allows a richer and more robust analysis of recent demographic history. Clustering methods such as STRUCTURE and ADMIXTURE are widely used in population genetic studies to investigate ancestry. Here, the authors provide a tutorial on how to interpret results of these analyses and a tool to test the goodness of fit of the model.

Journal Article

Evolutionary origin of genomic structural variations in domestic yaks

by Lenstra, Johannes A. , Yan, Ping , Zheng, Zeyu in 45/43 , 45/91 , 49/23

2023

Yak has been subject to natural selection, human domestication and interspecific introgression during its evolution. However, genetic variants favored by each of these processes have not been distinguished previously. We constructed a graph-genome for 47 genomes of 7 cross-fertile bovine species. This allowed detection of 57,432 high-resolution structural variants (SVs) within and across the species, which were genotyped in 386 individuals. We distinguished the evolutionary origins of diverse SVs in domestic yaks by phylogenetic analyses. We further identified 334 genes overlapping with SVs in domestic yaks that bore potential signals of selection from wild yaks, plus an additional 686 genes introgressed from cattle. Nearly 90% of the domestic yaks were introgressed by cattle. Introgression of an SV spanning the KIT gene triggered the breeding of white domestic yaks. We validated a significant association of the selected stratified SVs with gene expression, which contributes to phenotypic variations. Our results highlight that SVs of different origins contribute to the phenotypic diversity of domestic yaks. Yaks have been subject to natural selection, human domestication and interspecific introgression during their evolution. Here, the authors have identified genomic structural variations and the linked genes involved in these processes in domestic yaks, to reveal new insight into genetic basis of phenotypic diversity.

Journal Article

Host–parasite co-evolution and its genomic signature

by Ebert, Dieter , Fields, Peter D in Coevolution , Evolution , Genetic diversity

2020

Studies in diverse biological systems have indicated that host–parasite co-evolution is responsible for the extraordinary genetic diversity seen in some genomic regions, such as major histocompatibility (MHC) genes in jawed vertebrates and resistance genes in plants. This diversity is believed to evolve under balancing selection on hosts by parasites. However, the mechanisms that link the genomic signatures in these regions to the underlying co-evolutionary process are only slowly emerging. We still lack a clear picture of the co-evolutionary concepts and of the genetic basis of the co-evolving phenotypic traits in the interacting antagonists. Emerging genomic tools that provide new options for identifying underlying genes will contribute to a fuller understanding of the co-evolutionary process. Host–parasite co-evolution is expected to leave signatures of selection in the genome of both antagonists. Ebert and Fields discuss what is known about these signatures, how they relate to co-evolutionary processes and how they can help identify the genes underlying the co-evolving phenotypes.

Journal Article

A genomic mutational constraint map using variation in 76,156 human genomes

by Wilson, Michael W. , Ferriera, Steven , O’Donnell-Luria, Anne in 631/114 , 631/181/2474 , 631/181/457/649

2024

The depletion of disruptive variation caused by purifying natural selection (constraint) has been widely used to investigate protein-coding genes underlying human disorders 1 – 4 , but attempts to assess constraint for non-protein-coding regions have proved more difficult. Here we aggregate, process and release a dataset of 76,156 human genomes from the Genome Aggregation Database (gnomAD)—the largest public open-access human genome allele frequency reference dataset—and use it to build a genomic constraint map for the whole genome (genomic non-coding constraint of haploinsufficient variation (Gnocchi)). We present a refined mutational model that incorporates local sequence context and regional genomic features to detect depletions of variation. As expected, the average constraint for protein-coding sequences is stronger than that for non-coding regions. Within the non-coding genome, constrained regions are enriched for known regulatory elements and variants that are implicated in complex human diseases and traits, facilitating the triangulation of biological annotation, disease association and natural selection to non-coding DNA analysis. More constrained regulatory elements tend to regulate more constrained protein-coding genes, which in turn suggests that non-coding constraint can aid the identification of constrained genes that are as yet unrecognized by current gene constraint metrics. We demonstrate that this genome-wide constraint map improves the identification and interpretation of functional human genetic variation. A genomic constraint map for the human genome constructed using data from 76,156 human genomes from the Genome Aggregation Database shows that non-coding constrained regions are enriched for regulatory elements and variants associated with complex diseases and traits.

Journal Article

Whole-genome sequencing for an enhanced understanding of genetic variation among South Africans

by Mulder, Nicola , Aron, Shaun , Pepper, Michael S. in 631/181/457 , 631/181/457/649 , 631/208/182

2017

The Southern African Human Genome Programme is a national initiative that aspires to unlock the unique genetic character of southern African populations for a better understanding of human genetic diversity. In this pilot study the Southern African Human Genome Programme characterizes the genomes of 24 individuals (8 Coloured and 16 black southeastern Bantu-speakers) using deep whole-genome sequencing. A total of ~16 million unique variants are identified. Despite the shallow time depth since divergence between the two main southeastern Bantu-speaking groups (Nguni and Sotho-Tswana), principal component analysis and structure analysis reveal significant (p < 10⁻⁶) differentiation, and FST analysis identifies regions with high divergence. The Coloured individuals show evidence of varying proportions of admixture with Khoesan, Bantu-speakers, Europeans, and populations from the Indian sub-continent. Whole-genome sequencing data reveal extensive genomic diversity, increasing our understanding of the complex and region-specific history of African populations and highlighting its potential impact on biomedical research and genetic susceptibility to disease.

Journal Article

Genomic data in the All of Us Research Program

by Rehm, Heidi L. , Meller, Robert , Linder, Jodell E. in 45/43 , 631/114/129 , 631/181/457/649

2024

Comprehensively mapping the genetic basis of human disease across diverse individuals is a long-standing goal for the field of human genetics 1 – 4 . The All of Us Research Program is a longitudinal cohort study aiming to enrol a diverse group of at least one million individuals across the USA to accelerate biomedical research and improve human health 5 , 6 . Here we describe the programme’s genomics data release of 245,388 clinical-grade genome sequences. This resource is unique in its diversity as 77% of participants are from communities that are historically under-represented in biomedical research and 46% are individuals from under-represented racial and ethnic minorities. All of Us identified more than 1 billion genetic variants, including more than 275 million previously unreported genetic variants, more than 3.9 million of which had coding consequences. Leveraging linkage between genomic data and the longitudinal electronic health record, we evaluated 3,724 genetic variants associated with 117 diseases and found high replication rates across both participants of European ancestry and participants of African ancestry. Summary-level data are publicly available, and individual-level data can be accessed by researchers through the All of Us Researcher Workbench using a unique data passport model with a median time from initial researcher registration to data access of 29 hours. We anticipate that this diverse dataset will advance the promise of genomic medicine for all. A study describes the release of clinical-grade whole-genome sequence data for 245,388 diverse participants by the All of Us Research Program and characterizes the properties of the dataset.

Journal Article

Genomic insights into the formation of human populations in East Asia

by Oppenheimer, Jonas , Kennett, Douglas J. , Freilich, Suzanne in 45/23 , 631/181/2474 , 631/181/457

2021

The deep population history of East Asia remains poorly understood owing to a lack of ancient DNA data and sparse sampling of present-day people 1 , 2 . Here we report genome-wide data from 166 East Asian individuals dating to between 6000 BC and AD 1000 and 46 present-day groups. Hunter-gatherers from Japan, the Amur River Basin, and people of Neolithic and Iron Age Taiwan and the Tibetan Plateau are linked by a deeply splitting lineage that probably reflects a coastal migration during the Late Pleistocene epoch. We also follow expansions during the subsequent Holocene epoch from four regions. First, hunter-gatherers from Mongolia and the Amur River Basin have ancestry shared by individuals who speak Mongolic and Tungusic languages, but do not carry ancestry characteristic of farmers from the West Liao River region (around 3000 BC), which contradicts theories that the expansion of these farmers spread the Mongolic and Tungusic proto-languages. Second, farmers from the Yellow River Basin (around 3000 BC) probably spread Sino-Tibetan languages, as their ancestry dispersed both to Tibet—where it forms approximately 84% of the gene pool in some groups—and to the Central Plain, where it has contributed around 59–84% to modern Han Chinese groups. Third, people from Taiwan from around 1300 BC to AD 800 derived approximately 75% of their ancestry from a lineage that is widespread in modern individuals who speak Austronesian, Tai–Kadai and Austroasiatic languages, and that we hypothesize derives from farmers of the Yangtze River Valley. Ancient people from Taiwan also derived about 25% of their ancestry from a northern lineage that is related to, but different from, farmers of the Yellow River Basin, which suggests an additional north-to-south expansion. Fourth, ancestry from Yamnaya Steppe pastoralists arrived in western Mongolia after around 3000 BC but was displaced by previously established lineages even while it persisted in western China, as would be expected if this ancestry was associated with the spread of proto-Tocharian Indo-European languages. Two later gene flows affected western Mongolia: migrants after around 2000 BC with Yamnaya and European farmer ancestry, and episodic influences of later groups with ancestry from Turan. Genome-wide data from 166 East Asian individuals dating to between 6000 BC and AD 1000 and from 46 present-day groups provide insights into the histories of mixture and migration of human populations in East Asia.

Journal Article

Structural variation in the sequencing era

by Urban, Alexander E , Mills, Ryan E , Ho, Steve S in Algorithms , Genomes , Variation

2020

Identifying structural variation (SV) is essential for genome interpretation but has been historically difficult due to limitations inherent to available genome technologies. Detection methods that use ensemble algorithms and emerging sequencing technologies have enabled the discovery of thousands of SVs, uncovering information about their ubiquity, relationship to disease and possible effects on biological mechanisms. Given the variability in SV type and size, along with unique detection biases of emerging genomic platforms, multiplatform discovery is necessary to resolve the full spectrum of variation. Here, we review modern approaches for investigating SVs and proffer that, moving forwards, studies integrating biological information with detection will be necessary to comprehensively understand the impact of SV in the human genome. To map the full extent of structural variation in the human genome, detection methods are needed that improve on short-read approaches. This Review discusses how ensemble algorithms and emerging sequencing technologies are helping to resolve the full spectrum of structural variations.

Journal Article

Mutation bias reflects natural selection in Arabidopsis thaliana

by Becker, Claude , Fenster, Charles B. , Hildebrandt, Julia in 45/23 , 631/181/457/649 , 631/181/735

2022

Since the first half of the twentieth century, evolutionary theory has been dominated by the idea that mutations occur randomly with respect to their consequences 1 . Here we test this assumption with large surveys of de novo mutations in the plant Arabidopsis thaliana. In contrast to expectations, we find that mutations occur less often in functionally constrained regions of the genome—mutation frequency is reduced by half inside gene bodies and by two-thirds in essential genes. With independent genomic mutation datasets, including from the largest Arabidopsis mutation accumulation experiment conducted to date, we demonstrate that epigenomic and physical features explain over 90% of variance in the genome-wide pattern of mutation bias surrounding genes. Observed mutation frequencies around genes in turn accurately predict patterns of genetic polymorphisms in natural Arabidopsis accessions (r = 0.96). That mutation bias is the primary force behind patterns of sequence evolution around genes in natural accessions is supported by analyses of allele frequencies. Finally, we find that genes subject to stronger purifying selection have a lower mutation rate. We conclude that epigenome-associated mutation bias 2 reduces the occurrence of deleterious mutations in Arabidopsis, challenging the prevailing paradigm that mutation is a directionless force in evolution. Data on de novo mutations in Arabidopsis thaliana reveal that mutations do not occur randomly; instead, epigenome-associated mutation bias reduces the occurrence of deleterious mutations.

Journal Article

Assessing transmissibility of SARS-CoV-2 lineage B.1.1.7 in England

by Gonçalves, Sónia , Volz, Erik , Chand, Meera in 631/181/457 , 631/326/596/4130 , 692/699/255/2514

2021

The SARS-CoV-2 lineage B.1.1.7, designated variant of concern (VOC) 202012/01 by Public Health England 1 , was first identified in the UK in late summer to early autumn 2020 2 . Whole-genome SARS-CoV-2 sequence data collected from community-based diagnostic testing for COVID-19 show an extremely rapid expansion of the B.1.1.7 lineage during autumn 2020, suggesting that it has a selective advantage. Here we show that changes in VOC frequency inferred from genetic data correspond closely to changes inferred by S gene target failures (SGTF) in community-based diagnostic PCR testing. Analysis of trends in SGTF and non-SGTF case numbers in local areas across England shows that B.1.1.7 has higher transmissibility than non-VOC lineages, even if it has a different latent period or generation time. The SGTF data indicate a transient shift in the age composition of reported cases, with cases of B.1.1.7 including a larger share of under 20-year-olds than non-VOC cases. We estimated time-varying reproduction numbers for B.1.1.7 and co-circulating lineages using SGTF and genomic data. The best-supported models did not indicate a substantial difference in VOC transmissibility among different age groups, but all analyses agreed that B.1.1.7 has a substantial transmission advantage over other lineages, with a 50% to 100% higher reproduction number. Genetic and testing data from England show that the SARS-CoV-2 variant of concern B.1.1.7 has a transmission advantage over other lineages.

Journal Article
