Catalogue Search | MBRL

TransferGWAS of T1-weighted brain MRI data from UK Biobank

by Monti, Remo; Lippert, Christoph; Rakowski, Alexander in Aged, Alzheimer Disease - diagnostic imaging, Alzheimer Disease - genetics

2024

Genome-wide association studies (GWAS) traditionally analyze single traits, e.g., disease diagnoses or biomarkers. Large-scale cohorts such as UK Biobank (UKB) now collect imaging data with sample sizes large enough for genetic association testing. Typical approaches to GWAS on high-dimensional modalities extract predefined features from the data, e.g., volumes of regions of interest. This limits such studies to predefined traits and can miss novel patterns present in the data. TransferGWAS employs deep neural networks (DNNs) to extract low-dimensional representations of imaging data for GWAS, eliminating the need for predefined biomarkers. Here, we apply transferGWAS to brain MRI data from UKB. We encoded 36,311 T1-weighted brain magnetic resonance imaging (MRI) scans using DNN models trained on MRI scans from the Alzheimer’s Disease Neuroimaging Initiative and on natural images from the ImageNet dataset, and performed a multivariate GWAS on the resulting features. We identified 289 independent loci, associated with bone density, brain, and cardiovascular traits, among others, and 11 regions with no previously reported associations. We fitted polygenic scores (PGS) of the deep features, which improved predictions of bone mineral density and several other traits in a multi-PGS setting, and computed genetic correlations with selected phenotypes, which pointed to novel links between diffusion MRI traits and type 2 diabetes. Overall, our findings provide evidence that features learned with DNN models can uncover additional heritable variability in the human brain beyond predefined measures, and link them to a range of non-brain phenotypes.

Journal Article

FaST linear mixed models for genome-wide association studies

by Kadie, Carl M; Listgarten, Jennifer; Liu, Ying in 631/1647/2217/2138, 631/1647/48, Algorithms

2011

An algorithm for linear mixed models substantially reduces memory usage and run time for genome-wide association studies. The improved algorithm scales linearly in cohort size, allowing the application of these models to much larger samples. We describe factored spectrally transformed linear mixed models (FaST-LMM), an algorithm for genome-wide association studies (GWAS) that scales linearly with cohort size in both run time and memory use. On Wellcome Trust data for 15,000 individuals, FaST-LMM ran an order of magnitude faster than current efficient algorithms. Our algorithm can analyze data for 120,000 individuals in just a few hours, whereas current algorithms fail on data for even 20,000 individuals (http://mscompbio.codeplex.com/).
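To make the spectral-transformation idea concrete, here is a minimal NumPy sketch (an illustration, not the FaST-LMM code) of how one eigendecomposition of a kinship matrix K turns the mixed-model likelihood into a sum of independent one-dimensional terms. It assumes the model y ~ N(Xβ, σ_g²(K + δI)) with the variance ratio δ = σ_e²/σ_g² held fixed; the function name `lmm_loglik_spectral` is hypothetical.

```python
import numpy as np


def lmm_loglik_spectral(y, X, K, delta):
    """Profile log-likelihood of y ~ N(X beta, sigma_g^2 (K + delta I)).

    beta and sigma_g^2 are set to their maximum-likelihood values for the
    given delta. Hypothetical helper for illustration only.
    """
    n = y.shape[0]
    # Eigendecompose K = U diag(s) U^T once; it can be reused for every SNP and delta.
    s, U = np.linalg.eigh(K)
    # Rotate into the eigenbasis: the covariance becomes diagonal, diag(s + delta).
    yt = U.T @ y
    Xt = U.T @ X
    d = s + delta
    w = 1.0 / d
    # Generalized least squares for beta reduces to weighted least squares here.
    XtW = Xt.T * w
    beta = np.linalg.solve(XtW @ Xt, XtW @ yt)
    r = yt - Xt @ beta
    # ML estimate of the genetic variance component.
    sigma_g2 = np.sum(w * r**2) / n
    # Sum of n independent univariate normal log-densities.
    return -0.5 * (n * np.log(2 * np.pi * sigma_g2) + np.sum(np.log(d)) + n)
```

This full-rank version still pays O(n³) for `np.linalg.eigh`. As we understand it, the linear scaling reported above comes from factoring a low-rank similarity matrix built from fewer SNPs than individuals, which this sketch omits.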

Journal Article

Epigenome-wide association studies without the need for cell-type composition

by Aryee, Martin; Listgarten, Jennifer; Heckerman, David in 631/114/2415, 631/114/2785, 631/1647/2210/2213

2014

A statistical approach using a linear mixed model and principal-component analysis discovers phenotype-specific changes in epigenomes without requiring information on cell-type composition. In epigenome-wide association studies, cell-type composition often differs between cases and controls, yielding associations that simply tag cell type rather than reveal fundamental biology. Current solutions require actual or estimated cell-type composition, information that is not easily obtainable for many samples of interest. We propose a method, FaST-LMM-EWASher, that automatically corrects for cell-type composition without the need for explicit knowledge of it, and we validate it by comparison with the state-of-the-art approach. Corresponding software is available from http://www.microsoft.com/science/ .

Journal Article

A trustworthy AI reality-check: the lack of transparency of artificial intelligence products in healthcare

by Malpani, Rohit; Citro, Brian; Lippert, Christoph in AI ethics, Algorithms, Artificial intelligence

2024

Trustworthy medical AI requires transparency about the development and testing of underlying algorithms to identify biases and communicate potential risks of harm. Abundant guidance exists on how to achieve transparency for medical AI products, but it is unclear whether publicly available information adequately informs about their risks. To assess this, we retrieved public documentation on the 14 available CE-certified AI-based radiology products of risk class IIb in the EU from vendor websites, scientific publications, and the European EUDAMED database. Using a self-designed survey, we reported on their development, validation, ethical considerations, and deployment caveats, according to trustworthy AI guidelines. We scored each question with 0, 0.5, or 1 to rate whether the required information was “unavailable,” “partially available,” or “fully available.” The transparency of each product was calculated relative to all 55 questions. Transparency scores ranged from 6.4% to 60.9%, with a median of 29.1%. Major transparency gaps included missing documentation on training data, ethical considerations, and limitations for deployment. Ethical aspects such as consent, safety monitoring, and GDPR compliance were rarely documented. Furthermore, deployment caveats for different demographics and medical settings were scarce. In conclusion, the public documentation of authorized medical AI products in Europe is not transparent enough to inform about safety and risks. We call on lawmakers and regulators to establish legally mandated requirements for public and substantive transparency to fulfill the promise of trustworthy AI for health.
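The scoring rule in this abstract is plain arithmetic: each of the 55 questions earns 0, 0.5, or 1 point, and a product's transparency is its total as a percentage of 55. A minimal sketch, assuming all questions are weighted equally (the abstract does not state otherwise); `transparency_score` is a hypothetical name.

```python
# Allowed per-question scores: unavailable, partially available, fully available.
ALLOWED = {0.0, 0.5, 1.0}


def transparency_score(answers, n_questions=55):
    """Return transparency in percent: total points divided by the number of questions.

    Assumes equal weighting of all questions (hypothetical helper).
    """
    if len(answers) != n_questions:
        raise ValueError(f"expected {n_questions} answers, got {len(answers)}")
    if any(a not in ALLOWED for a in answers):
        raise ValueError("each answer must be 0, 0.5 or 1")
    return 100.0 * sum(answers) / n_questions
```

For instance, a product earning 16 of the 55 possible points scores 16/55, or about 29.1%.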

Journal Article

easyGWAS

by Grimm, Dominik G.; Salomé, Patrice A.; Zhu, Wangsheng in Animal species, Arabidopsis - genetics, Arabidopsis - growth & development

2017

The ever-growing availability of high-quality genotypes for a multitude of species has enabled researchers to explore the underlying genetic architecture of complex phenotypes at an unprecedented level of detail using genome-wide association studies (GWAS). The systematic comparison of results obtained from GWAS of different traits opens up new possibilities, including the analysis of pleiotropic effects. Other advantages that result from the integration of multiple GWAS are the ability to replicate GWAS signals and to increase statistical power to detect such signals through meta-analyses. In order to facilitate the simple comparison of GWAS results, we present easyGWAS, a powerful, species-independent online resource for computing, storing, sharing, annotating, and comparing GWAS. The easyGWAS tool supports multiple species, the uploading of private genotype data and summary statistics of existing GWAS, as well as advanced methods for comparing GWAS results across different experiments and data sets in an interactive and user-friendly interface. easyGWAS is also a public data repository for GWAS data and summary statistics and already includes published data and results from several major GWAS. We demonstrate the potential of easyGWAS with a case study of the model organism Arabidopsis thaliana, using flowering and growth-related traits.

Journal Article

Predicting the SARS-CoV-2 effective reproduction number using bulk contact data from mobile phones

by Edelman, Jonathan Antonio; Rüdiger, Sten; Zernick, Detlef in Applied Mathematics, Biological Sciences, Biophysics and Computational Biology

2021

Over the past months, cases of SARS-CoV-2 surged repeatedly in many countries but could often be controlled with nonpharmaceutical interventions, including social distancing. We analyzed deidentified Global Positioning System (GPS) tracking data from 1.15 to 1.4 million cell phones per day in Germany between March and November 2020 to identify encounters between individuals and statistically evaluate contact behavior. Using graph sampling theory, we estimated the contact index (CX), a metric for the number and heterogeneity of contacts. We found that CX, and not the total number of contacts, is an accurate predictor of the effective reproduction number R derived from case numbers. A high correlation between CX and R recorded more than 2 weeks later allows social behavior to be assessed well before changes in case numbers become detectable. By construction, the CX quantifies the role of superspreading and permits assigning risks to specific contact behavior. We provide a critical CX value beyond which R is expected to rise above 1 and propose using that value to guide social-distancing interventions in the coming months.

Journal Article

Identification of individuals by trait prediction using whole-genome sequencing data

by Zhu, Mingfu; Venter, J. Craig; Lu, Tim in Adult, African Americans, Age Factors

2017

Prediction of human physical traits and demographic information from genomic data challenges privacy and data deidentification in personalized medicine. To explore the current capabilities of phenotype-based genomic identification, we applied whole-genome sequencing, detailed phenotyping, and statistical modeling to predict biometric traits in a cohort of 1,061 participants of diverse ancestry. For a large fraction of the traits, the accuracy of individual predictions beyond ancestry and demographic information is limited. However, we developed a maximum entropy algorithm that integrates multiple predictions to determine which genomic samples and phenotype measurements originate from the same person. Using this algorithm, we reidentified on average more than 8 of 10 held-out individuals in an ethnically mixed cohort, and on average 5 of 10 held-out individuals when all 10 were African Americans or all 10 were Europeans. This work challenges current conceptions of personal privacy and may have far-reaching ethical and legal implications.

Journal Article

Identifying interpretable gene-biomarker associations with functionally informed kernel-based tests in 190,000 exomes

by Ohler, Uwe; Rautenstrauch, Pia; Monti, Remo in 631/114/2184, 631/208/205/2138, 692/53/2421

2022

Here we present an exome-wide rare genetic variant association study for 30 blood biomarkers in 191,971 individuals in the UK Biobank. We compare gene-based association tests for separate functional variant categories to increase interpretability and identify 193 significant gene-biomarker associations. Genes associated with biomarkers were ~4.5-fold enriched for conferring Mendelian disorders. In addition to performing weighted gene-based variant collapsing tests, we design and apply variant-category-specific kernel-based tests that integrate quantitative functional variant effect predictions for missense variants, splicing, and the binding of RNA-binding proteins. For these tests, we present a computationally efficient combination of the likelihood-ratio and score tests that found 36% more associations than the score test alone while also controlling the type I error. Kernel-based tests identified 13% more associations than their gene-based collapsing counterparts and had advantages in the presence of gain-of-function missense variants. We introduce local collapsing by amino acid position for missense variants and use it to interpret associations and identify potential novel gain-of-function variants in PIEZO1. Our results show the benefits of investigating different functional mechanisms when performing rare-variant association tests, and demonstrate a pervasive rare-variant contribution to biomarker variability. Genetic association studies for rare variants suffer from a lack of power, so methods that improve rare-variant discovery are needed. Here, the authors present functionally informed association tests with increased statistical power to aid the discovery and interpretation of rare variants.
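For readers unfamiliar with kernel-based gene tests, the sketch below computes a generic SKAT-style variance-component score statistic Q = r^T G W G^T r, where r are the residuals of a covariates-only linear model, G holds the variant dosages of one gene, and W = diag(w) holds per-variant weights. It illustrates the general technique under these assumptions and is not the authors' combined likelihood-ratio/score test; `skat_q` is a hypothetical name. Functionally informed weights, such as predicted variant effects, would enter through `weights`.

```python
import numpy as np


def skat_q(y, covariates, genotypes, weights):
    """Generic SKAT-style score statistic Q = r^T G W G^T r (linear model).

    y: (n,) phenotype; covariates: (n, p) design matrix including an intercept;
    genotypes: (n, m) variant dosages in one gene; weights: (m,) per-variant weights.
    Hypothetical helper for illustration only.
    """
    # Residuals of the null model y ~ covariates (no genetic effect).
    beta, *_ = np.linalg.lstsq(covariates, y, rcond=None)
    r = y - covariates @ beta
    # With kernel K = G W G^T, Q = r^T K r = sum_j w_j * (g_j^T r)^2.
    score = genotypes.T @ r
    return float(np.sum(weights * score**2))
```

Turning Q into a p-value additionally requires scaling by the residual variance and evaluating its null distribution (a weighted mixture of χ² variables, e.g., via Davies' method), which is not shown here.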

Journal Article

Simulating rigid head motion artifacts on brain magnitude MRI data–Outcome on image quality and segmentation of the cerebral cortex

by Starke, Ludger; Klein, Tobias; Lai, Wei-Chang in Analysis, Biology and Life Sciences, Brain

2024

Magnetic Resonance Imaging (MRI) datasets from epidemiological studies often show a lower prevalence of motion artifacts than is encountered in clinical practice. These artifacts can be unevenly distributed between subject groups and studies, which introduces a bias that needs addressing when augmenting data for machine learning purposes. Since unreconstructed multi-channel k-space data is typically not available for population-based MRI datasets, motion simulations must be performed using signal magnitude data. There is thus a need to systematically evaluate how realistic such magnitude-based simulations are. We performed magnitude-based motion simulations on a dataset (MR-ART) from 148 subjects for which real motion-corrupted reference data was also available. The similarity of real and simulated motion was assessed using image quality metrics (IQMs), including the Coefficient of Joint Variation (CJV), Signal-to-Noise Ratio (SNR), and Contrast-to-Noise Ratio (CNR). An additional comparison examined the decrease in the Dice-Sørensen Coefficient (DSC) of automated segmentations with increasing motion severity. Segmentation of the cerebral cortex was performed with 6 freely available tools: FreeSurfer, BrainSuite, ANTs, SAMSEG, FastSurfer, and SynthSeg+. To better mimic real subject motion, the original motion simulation within an existing data augmentation framework (TorchIO) was modified to allow a non-random motion paradigm and phase encoding direction. The mean difference in CJV/SNR/CNR between the real motion-corrupted images and our modified simulations (0.004±0.054/-0.7±1.8/-0.09±0.55) was lower than that of the original simulations (0.015±0.061/0.2±2.0/-0.29±0.62). Further, the mean difference in the DSC between the real motion-corrupted images and the simulations was lower for our modified simulations (0.03±0.06) than for the original simulations (-0.15±0.09).
SynthSeg+ showed the highest robustness towards all forms of motion, real and simulated. In conclusion, reasonably realistic synthetic motion artifacts can be induced on a large scale when only magnitude MR images are available, to obtain unbiased datasets for training machine-learning-based models.
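The abstract names two of its metrics without giving formulas. Below is a short NumPy sketch of the Dice-Sørensen coefficient, DSC = 2|A ∩ B| / (|A| + |B|), and of the coefficient of joint variation in its common form, CJV = (σ_WM + σ_GM) / |μ_WM − μ_GM|; that the study used exactly this CJV definition is an assumption. The function names `dice` and `cjv` are hypothetical.

```python
import numpy as np


def dice(a, b):
    """Dice-Sørensen coefficient 2|A ∩ B| / (|A| + |B|) for two binary masks."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # both masks empty: treated as perfect agreement (a convention)
    return 2.0 * np.logical_and(a, b).sum() / denom


def cjv(image, wm_mask, gm_mask):
    """Coefficient of joint variation (sigma_WM + sigma_GM) / |mu_WM - mu_GM|.

    Lower values indicate better white/grey-matter separation (assumed common definition).
    """
    wm = image[wm_mask]
    gm = image[gm_mask]
    return (wm.std() + gm.std()) / abs(wm.mean() - gm.mean())
```

Comparing a segmentation of a motion-corrupted scan with one of its motion-free counterpart via `dice` is one way to track how the DSC falls as motion severity increases.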

Journal Article

A unified framework for estimating country-specific cumulative incidence for 18 diseases stratified by polygenic risk

by Mägi, Reedik; Wolford, Brooke N.; Mars, Nina in 45/23, 45/43, 631/208/721

2024

Polygenic scores (PGSs) offer the ability to predict genetic risk for complex diseases across the life course, a key benefit over short-term prediction models. To produce risk estimates relevant to clinical and public health decision-making, it is important to account for varying effects due to age and sex. Here, we develop a novel framework to estimate country-, age-, and sex-specific estimates of cumulative incidence stratified by PGS for 18 high-burden diseases. We integrate PGS associations from seven studies in four countries (N = 1,197,129) with disease incidences from the Global Burden of Disease. PGS has a significant sex-specific effect for asthma, hip osteoarthritis, gout, coronary heart disease and type 2 diabetes (T2D), with all but T2D exhibiting a larger effect in men. PGS has a larger effect in younger individuals for 13 diseases, with effects decreasing linearly with age. We show for breast cancer that, relative to individuals in the bottom 20% of polygenic risk, the top 5% attain an absolute risk for screening eligibility 16.3 years earlier. Our framework increases the generalizability of results from biobank studies and the accuracy of absolute risk estimates by appropriately accounting for age- and sex-specific PGS effects. Our results highlight the potential of PGS as a screening tool that may assist in the early prevention of common diseases. Here, the authors present a framework for estimating disease risk using PGS that accounts for country, age and sex. They find that PGSs have a significant sex-specific effect on common diseases, and that their effect is typically larger in young individuals.
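A polygenic score is conventionally a weighted sum of allele dosages, PGS_i = Σ_j β_j x_ij, with effect sizes β_j taken from GWAS summary statistics. The sketch below computes such scores and labels individuals by percentile, mirroring the abstract's bottom-20% versus top-5% comparison. It does not reproduce the authors' cumulative-incidence framework; the function names and cut-offs are illustrative assumptions.

```python
import numpy as np


def polygenic_score(dosages, effect_sizes):
    """PGS_i = sum_j beta_j * x_ij for an (n, m) dosage matrix with entries in {0, 1, 2}."""
    return dosages @ effect_sizes


def pgs_stratum(scores, lower_pct=20, upper_pct=95):
    """Label individuals 'bottom', 'middle' or 'top' by PGS percentile (hypothetical cut-offs)."""
    lo, hi = np.percentile(scores, [lower_pct, upper_pct])
    labels = np.full(scores.shape, "middle", dtype="<U6")
    labels[scores <= lo] = "bottom"  # bottom 20% by default
    labels[scores > hi] = "top"      # top 5% by default
    return labels
```

With real data, the effect sizes would come from an independent GWAS, and risk estimates would then be compared between the "top" and "bottom" groups.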

Journal Article
