Catalogue Search | MBRL

Accurate Genomic Prediction of Human Height

by de los Campos, Gustavo , Avery, Steven G , Vazquez, Ana I in Artificial intelligence , Body height , Body Height - genetics

2018

Hsu et al. used advanced methods from machine learning to analyze almost half a million genomes. They produced, for the first time, accurate genomic predictors for complex traits such as height, bone density, and educational attainment... We construct genomic predictors for heritable but extremely complex human quantitative traits (height, heel bone density, and educational attainment) using modern methods in high dimensional statistics (i.e., machine learning). The constructed predictors explain, respectively, ∼40, 20, and 9% of total variance for the three traits, in data not used for training. For example, predicted heights correlate ∼0.65 with actual height; actual heights of most individuals in validation samples are within a few centimeters of the prediction. The proportion of variance explained for height is comparable to the estimated common SNP heritability from genome-wide complex trait analysis (GCTA), and seems to be close to its asymptotic value (i.e., as sample size goes to infinity), suggesting that we have captured most of the heritability for SNPs. Thus, our results close the gap between prediction R-squared and common SNP heritability. The ∼20k activated SNPs in our height predictor reveal the genetic architecture of human height, at least for common variants. Our primary dataset is the UK Biobank cohort, comprised of almost 500k individual genotypes with multiple phenotypes. We also use other datasets and SNPs found in earlier genome-wide association studies (GWAS) for out-of-sample validation of our results.
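As a rough consistency check on the two height figures quoted in this abstract, one can assume that the variance explained is the squared correlation between predicted and actual height. That is an assumption for illustration; the abstract does not say how the variance figure was computed.

```latex
R^2 \approx r^2 = (0.65)^2 \approx 0.42
```

Under that assumption, a correlation of about 0.65 corresponds to roughly 40% of total variance, in line with the figure reported for height.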

Journal Article

Sibling variation in polygenic traits and DNA recombination mapping with UK Biobank and IVF family data

by Raben, Timothy G. , Hsu, Maximus , Widen, Erik in 631/208/2489/1512 , 631/208/457/649 , 631/208/480

2023

We use UK Biobank and a unique IVF family dataset (including genotyped embryos) to investigate sibling variation in both phenotype and genotype. We compare phenotype (disease status, height, blood biomarkers) and genotype (polygenic scores, polygenic health index) distributions among siblings to those in the general population. As expected, the between-siblings standard deviation in polygenic scores is √2 times smaller than in the general population, but variation is still significant. As previously demonstrated, this allows for substantial benefit from polygenic screening in IVF. Differences in sibling genotypes result from distinct recombination patterns in sexual reproduction. We develop a novel sibling-pair method for detection of recombination breaks via statistical discontinuities. The new method is used to construct a dataset of 1.44 million recombination events which may be useful in further study of meiosis.
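The expected reduction in sibling variation can be illustrated with a small simulation. This is a minimal sketch under a purely additive model with random mating, not the authors' method. Under that model, each child's score is the midparent value plus independent segregation noise carrying half of the population variance, so the within-family standard deviation is smaller than the population one by a factor of √2. The function name and parameters are hypothetical.

```python
import random
import statistics


def simulate_sibling_sd_ratio(n_families=20000, seed=0):
    """Return population SD / within-family SD for simulated polygenic scores.

    Assumption (illustrative only): additive model, random mating, parental
    scores ~ N(0, 1), and segregation noise with variance 1/2.
    """
    rng = random.Random(seed)
    segregation_sd = 0.5 ** 0.5
    population_scores = []
    sibling_differences = []
    for _ in range(n_families):
        mother = rng.gauss(0.0, 1.0)
        father = rng.gauss(0.0, 1.0)
        midparent = 0.5 * (mother + father)
        sib1 = midparent + rng.gauss(0.0, segregation_sd)
        sib2 = midparent + rng.gauss(0.0, segregation_sd)
        population_scores.append(sib1)
        sibling_differences.append(sib1 - sib2)
    population_sd = statistics.pstdev(population_scores)
    # The difference of two siblings has twice the within-family variance,
    # so divide its SD by sqrt(2) to recover the within-family SD.
    within_family_sd = statistics.pstdev(sibling_differences) / 2 ** 0.5
    return population_sd / within_family_sd


if __name__ == "__main__":
    print(f"population SD / sibling SD = {simulate_sibling_sd_ratio():.3f}")
```

The ratio should come out close to √2 (about 1.41), up to sampling noise.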

Journal Article

Efficient blockLASSO for polygenic scores with applications to All of Us and UK Biobank

by Raben, Timothy G. , Hsu, Stephen D. H. , Widen, Erik in Agreements , Algorithms , Animal Genetics and Genomics

2025

We develop a “block” LASSO (blockLASSO) approach for training polygenic scores (PGS) and demonstrate its use in All of Us (AoU) and the UK Biobank (UKB). blockLASSO utilizes the approximate block diagonal structure (due to chromosomal partition of the genome) of linkage disequilibrium (LD). The new implementation can be used for exploratory and methods research where repeated PGS training is necessary and expensive. For 11 different phenotypes, in two different biobanks, and across 5 different ancestry groups (African, American, East Asian, European, and South Asian), we demonstrate that blockLASSO is generally as effective for training PGS as a (global) LASSO. Previous work has shown that penalized regression methods produce PGS competitive with alternative approaches. It has been shown that some phenotypes are more/less polygenic than others. Using sparse algorithms, an accurate PGS can be trained for type 1 diabetes (T1D) using ∼100 single nucleotide variants (SNVs), but a PGS for body mass index (BMI) would need more than 10k SNVs. blockLASSO produces similar PGS for phenotypes while training with just a fraction of the variants per block. Within AoU (using only genetic information) block PGS for T1D reaches an AUC of 0.63 ± 0.02 and for BMI a correlation of 0.21 ± 0.01, whereas a global LASSO approach finds for T1D an AUC of 0.65 ± 0.03 and for BMI a correlation of 0.19 ± 0.03. This new block approach is more computationally efficient and scalable than naive global machine learning approaches, which makes it ideal for exploratory methods investigations based on penalized regression.

Journal Article

Genomic Prediction of 16 Complex Disease Risks Including Heart Attack, Diabetes, Breast and Prostate Cancer

by Raben, Timothy G. , Tellier, Laurent C. A. M. , Yong, Soke Yuen in 38/43 , 631/114/1305 , 631/208/2489/144

2019

We construct risk predictors using polygenic scores (PGS) computed from common Single Nucleotide Polymorphisms (SNPs) for a number of complex disease conditions, using L1-penalized regression (also known as LASSO) on case-control data from UK Biobank. Among the disease conditions studied are Hypothyroidism, (Resistant) Hypertension, Type 1 and 2 Diabetes, Breast Cancer, Prostate Cancer, Testicular Cancer, Gallstones, Glaucoma, Gout, Atrial Fibrillation, High Cholesterol, Asthma, Basal Cell Carcinoma, Malignant Melanoma, and Heart Attack. We obtain values for the area under the receiver operating characteristic curves (AUC) in the range ~0.58–0.71 using SNP data alone. Substantially higher predictor AUCs are obtained when incorporating additional variables such as age and sex. Some SNP predictors alone are sufficient to identify outliers (e.g., in the 99th percentile of polygenic score, or PGS) with 3–8 times higher risk than typical individuals. We validate predictors out-of-sample using the eMERGE dataset, and also with different ancestry subgroups within the UK Biobank population. Our results indicate that substantial improvements in predictive power are attainable using training sets with larger case populations. We anticipate rapid improvement in genomic prediction as more case-control data become available for analysis.
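For readers unfamiliar with the AUC metric quoted in this abstract, it can be read as the probability that a randomly chosen case has a higher score than a randomly chosen control. The following is a minimal sketch of that pairwise definition, with ties counted as half. The function name and the toy scores are hypothetical and are not taken from the paper.

```python
def pairwise_auc(case_scores, control_scores):
    """Area under the ROC curve via direct case/control pair comparison.

    Counts the fraction of (case, control) pairs in which the case has the
    higher score; ties contribute one half.
    """
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


if __name__ == "__main__":
    # Toy polygenic scores, purely illustrative.
    cases = [0.9, 0.8, 0.4]
    controls = [0.7, 0.3, 0.2]
    print(pairwise_auc(cases, controls))
```

An AUC of 0.5 corresponds to a score that carries no information about case status, and 1.0 to perfect separation. The quadratic loop is chosen for clarity; a rank-based computation would be preferable for biobank-sized samples.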

Journal Article

Biobank-scale methods and projections for sparse polygenic prediction from machine learning

by Raben, Timothy G. , Hsu, Stephen D. H. , Widen, Erik in 631/208/212 , 631/208/480 , 631/208/729

2023

In this paper we characterize the performance of linear models trained via widely-used sparse machine learning algorithms. We build polygenic scores and examine performance as a function of training set size, genetic ancestral background, and training method. We show that predictor performance is most strongly dependent on size of training data, with smaller gains from algorithmic improvements. We find that LASSO generally performs as well as the best methods, judged by a variety of metrics. We also investigate performance characteristics of predictors trained on one genetic ancestry group when applied to another. Using LASSO, we develop a novel method for projecting AUC and correlation as a function of data size (i.e., for new biobanks) and characterize the asymptotic limit of performance. Additionally, for LASSO (compressed sensing) we show that performance metrics and predictor sparsity are in agreement with theoretical predictions from the Donoho-Tanner phase transition. Specifically, a future predictor trained in the Taiwan Precision Medicine Initiative for asthma can achieve an AUC of 0.63 (0.02) and for height a correlation of 0.648 (0.009) for a Taiwanese population. This is above the measured values of 0.61 (0.01) and 0.631 (0.008), respectively, for UK Biobank trained predictors applied to a European population.

Journal Article

Polygenic Health Index, General Health, and Pleiotropy: Sibling Analysis and Disease Risk Reduction

by Raben, Timothy G. , Tellier, Laurent C. A. M. , Hsu, Stephen D. H. in 692/308/2056 , 692/308/53/2423 , 692/499

2022

We construct a polygenic health index as a weighted sum of polygenic risk scores for 20 major disease conditions, including, e.g., coronary artery disease, type 1 and 2 diabetes, and schizophrenia. Individual weights are determined by population-level estimates of impact on life expectancy. We validate this index in odds ratios and selection experiments using unrelated individuals and siblings (pairs and trios) from the UK Biobank. Individuals with higher index scores have decreased disease risk across almost all 20 diseases (no significant risk increases), and longer calculated life expectancy. When estimated Disability Adjusted Life Years (DALYs) are used as the performance metric, the gain from selection among ten individuals (highest index score vs average) is found to be roughly 4 DALYs. We find no statistical evidence for antagonistic trade-offs in risk reduction across these diseases. Correlations between genetic disease risks are found to be mostly positive and generally mild. These results have important implications for public health and also for fundamental issues such as pleiotropy and genetic architecture of human disease conditions.
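The weighted-sum construction described in this abstract can be sketched in a few lines. This is an illustration only: the disease labels, weights, and risk scores below are hypothetical, and the negative sign is an assumption made so that a higher index corresponds to lower weighted risk, matching the direction stated in the abstract.

```python
def health_index(risk_scores, weights):
    """Combine per-disease polygenic risk scores into a single index.

    risk_scores: dict mapping disease name -> polygenic risk score
    weights:     dict mapping disease name -> weight (hypothetical values
                 standing in for life-expectancy impact estimates)

    The sign convention (negated weighted sum) is an assumption for this
    sketch, chosen so that higher index values mean lower weighted risk.
    """
    missing = set(weights) - set(risk_scores)
    if missing:
        raise KeyError(f"missing risk scores for: {sorted(missing)}")
    return -sum(weights[d] * risk_scores[d] for d in weights)


if __name__ == "__main__":
    # Hypothetical weights and scores, not values from the paper.
    weights = {"coronary_artery_disease": 2.0, "type_2_diabetes": 1.0}
    person_a = {"coronary_artery_disease": 1.0, "type_2_diabetes": 0.5}
    person_b = {"coronary_artery_disease": -0.5, "type_2_diabetes": 0.0}
    print(health_index(person_a, weights), health_index(person_b, weights))
```

In this toy example the second individual, who has the lower risk scores, receives the higher index.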

Journal Article

Time evolution of cascade decay

by Boyanovsky, Daniel , Lello, Louis in 11.10.-z , 11.15.Tk , 11.90.+t

2014

We study non-perturbatively the time evolution of cascade decay for generic fields and obtain the time dependence of amplitudes and populations for the resonant and final states. We analyze in detail the different time scales and the manifestation of unitary time evolution in the dynamics of production and decay of resonant intermediate and final states. The probability of occupation (population) 'flows' as a function of time from the initial to the final states. When the decay width of the parent particle is much larger than that of the intermediate resonant state there is a 'bottleneck' in the flow: the population of resonant states builds up to a maximum, nearly saturating unitarity, and decays to the final state on the longer time scale. As a consequence of the wide separation of time scales in this case the cascade decay can be interpreted as evolving sequentially. In the opposite limit the population of resonances does not build up substantially and the cascade decay proceeds almost directly from the initial parent to the final state without resulting in a large amplitude of the resonant state. An alternative but equivalent non-perturbative method useful in cosmology is presented. Possible phenomenological implications for heavy sterile neutrinos as resonant states and consequences of quantum entanglement and correlations in the final state are discussed.

Journal Article

Embryo Screening for Polygenic Disease Risk: Recent Advances and Ethical Considerations

by Hsu, Stephen , Treff, Nathan R. , Eccles, Jennifer in adults , Breast cancer , breast neoplasms

2021

Machine learning methods applied to large genomic datasets (such as those used in GWAS) have led to the creation of polygenic risk scores (PRSs) that can be used to identify individuals who are at highly elevated risk for important disease conditions, such as coronary artery disease (CAD), diabetes, hypertension, breast cancer, and many more. PRSs have been validated in large population groups across multiple continents and are under evaluation for widespread clinical use in adult health. It has been shown that PRSs can be used to identify which of two individuals is at a lower disease risk, even when these two individuals are siblings from a shared family environment. The relative risk reduction (RRR) from choosing an embryo with a lower PRS (with respect to one chosen at random) can be quantified by using these sibling results. New technology for precise embryo genotyping allows more sophisticated preimplantation ranking with better results than the current method of selection that is based on morphology. We review the advances described above and discuss related ethical considerations.

Journal Article

Entanglement entropy in particle decay

by Boyanovsky, Daniel , Lello, Louis , Holman, Richard in Classical and Quantum Gravitation , Correlation analysis , Decomposition

2013

The decay of a parent particle into two or more daughter particles results in an entangled quantum state as a consequence of conservation laws in the decay process. Recent experiments at Belle and BaBar take advantage of quantum entanglement and the correlations in the time evolution of the product particles to study CP and T violations. If one (or more) of the product particles are not observed, their degrees of freedom are traced out of the pure state density matrix resulting from the decay, leading to a mixed state density matrix and an entanglement entropy. This entropy is a measure of the loss of information present in the original quantum correlations of the entangled state. We use the Wigner-Weisskopf method to construct an approximation to this state that evolves in time in a manifestly unitary way. We then obtain the entanglement entropy from the reduced density matrix of one of the daughter particles obtained by tracing out the unobserved states, and follow its time evolution. We find that it grows over a time scale determined by the lifetime of the parent particle to a maximum, which when the width of the parent particle is narrow, describes the phase space distribution of maximally entangled Bell-like states. The method is generalized to the case in which the parent particle is described by a wave packet localized in space. Possible experimental avenues to measure the entanglement entropy in the decay of mesons at rest are discussed.

Journal Article
