Catalogue Search | MBRL
7,276 result(s) for "Genetics, Population - statistics"
Inferring Continuous and Discrete Population Genetic Structure Across Space
2018
An important step in the analysis of genetic data is to describe and categorize natural variation. Individuals that live close together are, on average, more genetically similar than individuals sampled farther apart... A classic problem in population genetics is the characterization of discrete population structure in the presence of continuous patterns of genetic differentiation. Especially when sampling is discontinuous, the use of clustering or assignment methods may incorrectly ascribe differentiation due to continuous processes (e.g., geographic isolation by distance) to discrete processes, such as geographic, ecological, or reproductive barriers between populations. This reflects a shortcoming of current methods for inferring and visualizing population structure when applied to genetic data deriving from geographically distributed populations. Here, we present a statistical framework for the simultaneous inference of continuous and discrete patterns of population structure. The method estimates ancestry proportions for each sample from a set of two-dimensional population layers, and, within each layer, estimates a rate at which relatedness decays with distance. This thereby explicitly addresses the “clines versus clusters” problem in modeling population genetic variation, and remedies some of the overfitting to which nonspatial models are prone. The method produces useful descriptions of structure in genetic relatedness in situations where separated, geographically distributed populations interact, as after a range expansion or secondary contact. We demonstrate the utility of this approach using simulations and by applying it to empirical datasets of poplars and black bears in North America.
Journal Article
Identifying individuals with high risk of Alzheimer’s disease using polygenic risk scores
2021
Polygenic Risk Scores (PRS) for AD offer unique possibilities for reliable identification of individuals at high and low risk of AD. However, there is little agreement in the field as to what approach should be used for genetic risk score calculations, how to model the effect of APOE, what the optimal p-value threshold (pT) for SNP selection is and how to compare scores between studies and methods. We show that the best prediction accuracy is achieved with a model with two predictors (APOE and PRS excluding APOE region) with pT<0.1 for SNP selection. Prediction accuracy in a sample across different PRS approaches is similar, but individuals’ scores and their associated ranking differ. We show that standardising PRS against the population mean, as opposed to the sample mean, makes the individuals’ scores comparable between studies. Our work highlights the best strategies for polygenic profiling when assessing individuals for AD risk.
While polygenic risk scores have been shown to be correlated with disease risk, there is little agreement on how the score should be calculated. Here the authors investigate risk scores for Alzheimer’s disease, finding that the most effective approach includes an APOE score and a polygenic score excluding APOE.
Journal Article
Population Structure and Eigenanalysis
by Price, Alkes L.; Patterson, Nick; Reich, David
in Computer Simulation - statistics & numerical data; Eigenfunctions; Eukaryotes
2006
Current methods for inferring population structure from genetic data do not provide formal significance tests for population differentiation. We discuss an approach to studying population structure (principal components analysis) that was first applied to genetic data by Cavalli-Sforza and colleagues. We place the method on a solid statistical footing, using results from modern statistics to develop formal significance tests. We also uncover a general "phase change" phenomenon about the ability to detect structure in genetic data, which emerges from the statistical theory we use, and has an important implication for the ability to discover structure in genetic data: for a fixed but large dataset size, divergence between two populations (as measured, for example, by a statistic like FST) below a threshold is essentially undetectable, but a little above threshold, detection will be easy. This means that we can predict the dataset size needed to detect structure.
Journal Article
Estimating Individual Admixture Proportions from Next Generation Sequencing Data
by Albrechtsen, Anders; Skotte, Line; Korneliussen, Thorfinn Sand
in Algorithms; Computer Simulation; Data Interpretation, Statistical
2013
Inference of population structure and individual ancestry is important both for population genetics and for association studies. With next generation sequencing technologies it is possible to obtain genetic data for all accessible genetic variations in the genome. Existing methods for admixture analysis rely on known genotypes. However, individual genotypes cannot be inferred from low-depth sequencing data without introducing errors. This article presents a new method for inferring an individual’s ancestry that takes the uncertainty introduced in next generation sequencing data into account. This is achieved by working directly with genotype likelihoods that contain all relevant information of the unobserved genotypes. Using simulations as well as publicly available sequencing data, we demonstrate that the presented method has great accuracy even for very low-depth data. At the same time, we demonstrate that applying existing methods to genotypes called from the same data can introduce severe biases. The presented method is implemented in the NGSadmix software available at http://www.popgen.dk/software.
Journal Article
A framework for variation discovery and genotyping using next-generation DNA sequencing data
by Rivas, Manuel A; Philippakis, Anthony A; Banks, Eric
in 631/208/2489/144; 631/208/514/2254; Agriculture
2011
Mark DePristo and colleagues report an analytical framework to discover and genotype variation using whole exome and genome resequencing data from next-generation sequencing technologies. They apply these methods to low-pass population sequencing data from the 1000 Genomes Project.
Recent advances in sequencing technology make it possible to comprehensively catalog genetic variation in population samples, creating a foundation for understanding human disease, ancestry and evolution. The amounts of raw data produced are prodigious, and many computational steps are required to translate this output into high-quality variant calls. We present a unified analytic framework to discover and genotype variation among multiple samples simultaneously that achieves sensitive and specific results across five sequencing technologies and three distinct, canonical experimental designs. Our process includes (i) initial read mapping; (ii) local realignment around indels; (iii) base quality score recalibration; (iv) SNP discovery and genotyping to find all potential variants; and (v) machine learning to separate true segregating variation from machine artifacts common to next-generation sequencing technologies. We here discuss the application of these tools, instantiated in the Genome Analysis Toolkit, to deep whole-genome, whole-exome capture and multi-sample low-pass (∼4×) 1000 Genomes Project datasets.
Journal Article
Population genetic differentiation of height and body mass index across Europe
by van Rheenen, Wouter; Veldink, Jan H; Boomsma, Dorret I
in 45/43; 631/208/205/2138; 631/208/457
2015
Matthew Robinson and colleagues report an analysis of population genetic differences in human height and body mass index (BMI) across 14 European populations. They estimate the proportion of additive genetic variance attributable to population genetic differences and find evidence for selection increasing height while reducing BMI in European nations.
Across-nation differences in the mean values for complex traits are common [1–8], but the reasons for these differences are unknown. Here we find that many independent loci contribute to population genetic differences in height and body mass index (BMI) in 9,416 individuals across 14 European countries. Using discovery data on over 250,000 individuals and unbiased effect size estimates from 17,500 sibling pairs, we estimate that 24% (95% credible interval (CI) = 9%, 41%) and 8% (95% CI = 4%, 16%) of the captured additive genetic variance for height and BMI, respectively, reflect population genetic differences. Population genetic divergence differed significantly from that in a null model (height, P < 3.94 × 10⁻⁸; BMI, P < 5.95 × 10⁻⁴), and we find an among-population genetic correlation for tall and slender individuals (r = −0.80, 95% CI = −0.95, −0.60), consistent with correlated selection for both phenotypes. Observed differences in height among populations reflected the predicted genetic means (r = 0.51; P < 0.001), but environmental differences across Europe masked genetic differentiation for BMI (P < 0.58).
Journal Article
Nonparametric coalescent inference of mutation spectrum history and demography
by DeWitt, William S.; Ragsdale, Aaron P.; Harris, Kelley
in Animal populations; Animals; Biological Sciences
2021
As populations boom and bust, the accumulation of genetic diversity is modulated, encoding histories of living populations in present-day variation. Many methods exist to decode these histories, and all must make strong model assumptions. It is typical to assume that mutations accumulate uniformly across the genome at a constant rate that does not vary between closely related populations. However, recent work shows that mutational processes in human and great ape populations vary across genomic regions and evolve over time. This perturbs the mutation spectrum (relative mutation rates in different local nucleotide contexts). Here, we develop theoretical tools in the framework of Kingman’s coalescent to accommodate mutation spectrum dynamics. We present mutation spectrum history inference (mushi), a method to perform nonparametric inference of demographic and mutation spectrum histories from allele frequency data. We use mushi to reconstruct trajectories of effective population size and mutation spectrum divergence between human populations, identify mutation signatures and their dynamics in different human populations, and calibrate the timing of a previously reported mutational pulse in the ancestors of Europeans. We show that mutation spectrum histories can be placed in a well-studied theoretical setting and rigorously inferred from genomic variation data, like other features of evolutionary history.
Journal Article
Structure is more robust than other clustering methods in simulated mixed-ploidy populations
by Kolář, Filip; Stift, Marc; Meirmans, Patrick G
in Admixtures; Autotetraploid; Cluster analysis
2019
Analysis of population genetic structure has become a standard approach in population genetics. In polyploid complexes, clustering analyses can elucidate the origin of polyploid populations and patterns of admixture between different cytotypes. However, combining diploid and polyploid data can theoretically lead to biased inference with (artefactual) clustering by ploidy. We used simulated mixed-ploidy (diploid-autotetraploid) data to systematically compare the performance of k-means clustering and the model-based clustering methods implemented in Structure, Admixture, FastStructure and InStruct under different scenarios of differentiation and with different marker types. Under scenarios of strong population differentiation, the tested applications performed equally well. However, when population differentiation was weak, Structure was the only method that allowed unbiased inference with markers with limited genotypic information (co-dominant markers with unknown dosage or dominant markers). Still, since Structure was comparatively slow, the much faster but less powerful FastStructure provides a reasonable alternative for large datasets. Finally, although bias makes k-means clustering unsuitable for markers with incomplete genotype information, for large numbers of loci (>1000) with known dosage k-means clustering was superior to FastStructure in terms of power and speed. We conclude that Structure is the most robust method for the analysis of genetic structure in mixed-ploidy populations, although alternative methods should be considered under some specific conditions.
Journal Article
On Detecting Incomplete Soft or Hard Selective Sweeps Using Haplotype Structure
by Mason, Liang; Nielsen, Rasmus; Korneliussen, Thorfinn
in Blood parasites; Cholesterol; Demographics
2014
We present a new haplotype-based statistic (nSL) for detecting both soft and hard sweeps in population genomic data from a single population. We compare our new method with classic single-population haplotype and site frequency spectrum (SFS)-based methods and show that it is more robust, particularly to recombination rate variation. However, all statistics show some sensitivity to the assumptions of the demographic model. Additionally, we show that nSL has at least as much power as other methods under a number of different selection scenarios, most notably in the cases of sweeps from standing variation and incomplete sweeps. This conclusion holds up under a variety of demographic models. In many aspects, our new method is similar to the iHS statistic; however, it is generally more robust and does not require a genetic map. To illustrate the utility of our new method, we apply it to HapMap3 data and show that in the Yoruban population, there is strong evidence of selection on genes relating to lipid metabolism. This observation could be related to the known differences in cholesterol levels, and lipid metabolism more generally, between African Americans and other populations. We propose that the underlying causes for the selection on these genes are pleiotropic effects relating to blood parasites rather than their role in lipid metabolism.
Journal Article
A mixed-model approach for genome-wide association studies of correlated traits in structured populations
by Vilhjálmsson, Bjarni J; Segura, Vincent; Long, Quan
in 631/208/205/2138; 631/208/457; Agriculture
2012
Magnus Nordborg and colleagues report a parameterized multi-trait mixed model (MTMM) method applied to genome-wide association studies of correlated phenotypes. They test this approach, using both human and Arabidopsis thaliana data sets, and demonstrate how it can be used to identify pleiotropic loci and gene by environment interactions.
Genome-wide association studies (GWAS) are a standard approach for studying the genetics of natural variation. A major concern in GWAS is the need to account for the complicated dependence structure of the data, both between loci as well as between individuals. Mixed models have emerged as a general and flexible approach for correcting for population structure in GWAS. Here, we extend this linear mixed-model approach to carry out GWAS of correlated phenotypes, deriving a fully parameterized multi-trait mixed model (MTMM) that considers both the within-trait and between-trait variance components simultaneously for multiple traits. We apply this to data from a human cohort for correlated blood lipid traits from the Northern Finland Birth Cohort 1966 and show greatly increased power to detect pleiotropic loci that affect more than one blood lipid trait. We also apply this approach to an Arabidopsis thaliana data set for flowering measurements in two different locations, identifying loci whose effect depends on the environment.
Journal Article