Catalogue Search | MBRL

FaST linear mixed models for genome-wide association studies

by Kadie, Carl M; Listgarten, Jennifer; Liu, Ying in 631/1647/2217/2138; 631/1647/48; Algorithms

2011

An algorithm for linear mixed models substantially reduces memory usage and run time for genome-wide association studies. The improved algorithm scales linearly in cohort size, allowing the application of these models to much larger samples. We describe factored spectrally transformed linear mixed models (FaST-LMM), an algorithm for genome-wide association studies (GWAS) that scales linearly with cohort size in both run time and memory use. On Wellcome Trust data for 15,000 individuals, FaST-LMM ran an order of magnitude faster than current efficient algorithms. Our algorithm can analyze data for 120,000 individuals in just a few hours, whereas current algorithms fail on data for even 20,000 individuals (http://mscompbio.codeplex.com/).

Journal Article

Epigenome-wide association studies without the need for cell-type composition

by Aryee, Martin; Listgarten, Jennifer; Heckerman, David in 631/114/2415; 631/114/2785; 631/1647/2210/2213

2014

A statistical approach using a linear mixed model and principal-component analysis discovers phenotype-specific changes in epigenomes without requiring information on cell-type composition. In epigenome-wide association studies, cell-type composition often differs between cases and controls, yielding associations that simply tag cell type rather than reveal fundamental biology. Current solutions require actual or estimated cell-type composition, information that is not easily obtainable for many samples of interest. We propose a method, FaST-LMM-EWASher, that automatically corrects for cell-type composition without the need for explicit knowledge of it, and then validate our method by comparison with the state-of-the-art approach. Corresponding software is available from http://www.microsoft.com/science/.

Journal Article

Beyond Atopy: Multiple Patterns of Sensitization in Relation to Asthma in a Birth Cohort Study

by Simpson, Angela; Winn, John; Bishop, Christopher M. in Allergens; Anesthesia. Intensive care medicine. Transfusions. Cell therapy and gene therapy; Animals

2010

Rationale: The pattern of IgE response (over time or to specific allergens) may reflect different atopic vulnerabilities that are related to the presence of asthma in a fundamentally different way from the current definition of atopy.
Objectives: To redefine the atopic phenotype by identifying latent structure within a complex dataset, taking into account the timing and type of sensitization to specific allergens, and to relate these novel phenotypes to asthma.
Methods: In a population-based birth cohort in which multiple skin and IgE tests were performed throughout childhood, we used a machine learning approach to cluster children into multiple atopic classes in an unsupervised way. We then investigated the relation between these classes and asthma (symptoms, hospitalizations, lung function and airway reactivity).
Measurements and Main Results: A five-class model indicated a complex latent structure, in which children with atopic vulnerability were clustered into four distinct classes (Multiple Early [112/1053, 10.6%]; Multiple Late [171/1053, 16.2%]; Dust Mite [47/1053, 4.5%]; and Non-dust Mite [100/1053, 9.5%]), with a fifth class describing children with No Latent Vulnerability (623/1053, 59.2%). The association with asthma was considerably stronger for the Multiple Early class than for the other classes and for conventionally defined atopy (odds ratio [95% CI]: 29.3 [11.1–77.2] for Multiple Early versus 12.4 [4.8–32.2] for Ever Atopic versus 11.6 [4.8–27.9] for Atopic at age 8). Lung function and airway reactivity were significantly poorer among children in the Multiple Early class. Cox regression demonstrated a highly significant increase in the risk of hospital admissions for wheeze/asthma after age 3 yr only among children in the Multiple Early class (HR 9.2 [3.5–24.0], P < 0.001).
Conclusions: IgE antibody responses do not reflect a single phenotype of atopy but several different atopic vulnerabilities, which differ in their relation to asthma presence and severity. Clinical trial registered with www.controlled-trials.com (ISRCTN72673620).

Journal Article

Ranking of non-coding pathogenic variants and putative essential regions of the human genome

by Wells, Alex; Heckerman, David; Yin, Li in 45/23; 631/114/2397; 631/208/212/2301

2019

A gene is considered essential if loss of function results in loss of viability, fitness or in disease. This concept is well established for coding genes; however, non-coding regions are thought less likely to be determinants of critical functions. Here we train a machine learning model using functional, mutational and structural features, including new genome essentiality metrics, 3D genome organization and enhancer reporter data, to identify deleterious variants in non-coding regions. We assess the model for functional correlates by using data from tiling-deletion-based and CRISPR interference screens of activity of cis-regulatory elements in over 3 Mb of genome sequence. Finally, we explore two use cases that involve indels and the disruption of enhancers associated with a developmental disease. We rank variants in the non-coding genome according to their predicted deleteriousness. The model prioritizes non-coding regions associated with regulation of important genes and with cell viability, an in vitro surrogate of essentiality. Whole genome sequencing (WGS) holds promise to solve a subset of Mendelian disease cases for which exome sequencing did not provide a genetic diagnosis. Here, Wells et al. report a supervised machine learning model trained on functional, mutational and structural features for rank-scoring and interpreting variants in non-coding regions from WGS.

Journal Article

Efficient Control of Population Structure in Model Organism Association Mapping

by Heckerman, David; Wade, Claire M; Zaitlen, Noah A in Animals; Arabidopsis - genetics; Body Weight - genetics

2008

Genome-wide association mapping in model organisms such as inbred mouse strains is a promising approach for the identification of risk factors related to human diseases. However, genetic association studies in inbred model organisms are confronted by the problem of complex population structure among strains. This induces inflated false positive rates, which cannot be corrected using standard approaches applied in human association studies such as genomic control or structured association. Recent studies demonstrated that mixed models successfully correct for the genetic relatedness in association mapping in maize and Arabidopsis panel data sets. However, the currently available mixed-model methods suffer from computational inefficiency. In this article, we propose a new method, efficient mixed-model association (EMMA), which corrects for population structure and genetic relatedness in model organism association mapping. Our method takes advantage of the specific nature of the optimization problem in applying mixed models for association mapping, which allows us to substantially increase the computational speed and reliability of the results. We applied EMMA to in silico whole-genome association mapping of inbred mouse strains involving hundreds of thousands of SNPs, in addition to Arabidopsis and maize data sets. We also performed extensive simulation studies to estimate the statistical power of EMMA under various SNP effects, varying degrees of population structure, and differing numbers of multiple measurements per strain. Despite the limited power of inbred mouse association mapping due to the limited number of available inbred strains, we are able to identify significantly associated SNPs, which fall into known QTL or genes identified through previous studies while avoiding an inflation of false positives. An R package implementation and webserver of our EMMA method are publicly available.

Journal Article

Chromosome 9p21 in amyotrophic lateral sclerosis in Finland: a genome-wide association study

by Hernandez, Dena G; Heckerman, David; Tienari, Pentti J in Adult; Aged; Aged, 80 and over

2010

The genetic cause of amyotrophic lateral sclerosis (ALS) is not well understood. Finland is a well-suited location for a genome-wide association study of ALS because the incidence of the disease is one of the highest in the world and because the genetic homogeneity of the Finnish population enhances the ability to detect risk loci. We aimed to identify genetic risk factors for ALS in the Finnish population. We did a genome-wide association study of Finnish patients with ALS and control individuals using Illumina genome-wide genotyping arrays. DNA was collected from patients who attended an ALS specialty clinic that receives referrals from neurologists throughout Finland. Control samples were from a population-based study of elderly Finnish individuals. Patients known to carry D90A alleles of the SOD1 gene (n=40) were included in the final analysis as positive controls to assess whether our genome-wide association study was able to detect an association signal at this locus. We obtained samples from 442 patients with ALS and 521 control individuals. After quality control filters were applied, 318,167 single nucleotide polymorphisms (SNPs) from 405 people with ALS and 497 control individuals were available for analysis. We identified two association peaks that exceeded genome-wide significance. One was located on chromosome 21q22 (rs13048019, p=2·58×10⁻⁸), which corresponds to the autosomal recessive D90A allele of the SOD1 gene. The other was detected in a 232 kb block of linkage disequilibrium (rs3849942, p=9·11×10⁻¹¹) in a region of chromosome 9p that was previously identified in linkage studies of families with ALS. Within this region, we defined a 42-SNP haplotype that was associated with significantly increased risk of ALS (p=7·47×10⁻³³ when people with familial ALS were compared with controls; odds ratio 21·0, 95% CI 11·2–39·1) and that overlapped with an association locus recently reported for frontotemporal dementia.
For the 93 patients with familial ALS, the population attributable risk for the chromosome 9p21 locus was 37·9% (95% CI 27·7–48·1) and that for D90A homozygosity was 25·5% (16·9–34·1). The chromosome 9p21 locus is a major cause of familial ALS in the Finnish population. Our data suggest the presence of a founder mutation for chromosome 9p21-linked ALS. Furthermore, the overlap with the risk haplotype recently reported for frontotemporal dementia provides additional evidence of a shared genetic cause for these two neurodegenerative diseases. Funding: National Institutes of Health and National Institute on Aging, Microsoft Research, ALS Association, Helsinki University Central Hospital, Finnish Academy, Finnish Medical Society Duodecim, and Kuopio University.

Journal Article

Correction for hidden confounders in the genetic analysis of gene expression

by Schadt, Eric E.; Listgarten, Jennifer; Heckerman, David in Animals; Biological Sciences; computer software

2010

Understanding the genetic underpinnings of disease is important for screening, treatment, drug development, and basic biological insight. One way of getting at such an understanding is to find out which parts of our DNA, such as single-nucleotide polymorphisms, affect particular intermediary processes such as gene expression. Naively, such associations can be identified using a simple statistical test on all paired combinations of genetic variants and gene transcripts. However, a wide variety of confounders lie hidden in the data, leading to both spurious associations and missed associations if not properly addressed. We present a statistical model that jointly corrects for two particular kinds of hidden structure—population structure (e.g., race, family-relatedness), and microarray expression artifacts (e.g., batch effects), when these confounders are unknown. Applying our method to both real and synthetic, human and mouse data, we demonstrate the need for such a joint correction of confounders, and also the disadvantages of other possible approaches based on those in the current literature. In particular, we show that our class of models has maximum power to detect eQTL on synthetic data, and has the best performance on a bronze standard applied to real data. Lastly, our software and the associations we found with it are available at http://www.microsoft.com/science.

Journal Article

Selection bias at the heterosexual HIV-1 transmission bottleneck

by Goepfert, Paul; Shapiro, Roger; Prince, Jessica in amino acid sequences; Amino acids; Bias

2014

Although you might not think it, it's hard to catch HIV: fewer than 1% of unprotected sexual exposures result in infection. What, then, leads to transmission? Carlson et al. determined the amino acid sequences of viruses infecting 137 Zambian heterosexual couples in which one partner infected the other (see the Perspective by Joseph and Swanstrom). The authors then used statistical modeling and found that transmitted viruses are typically the most evolutionarily fit. That is, compared with other viral variants in the infected person, the transmitted virus most closely matches the most common viral sequence found in the Zambian population. Science, this issue 10.1126/science.1254031; see also p. 136.
An analysis of discordant couples reveals that transmitted HIV-1 viruses are typically the most evolutionarily fit. [Also see Perspective by Joseph and Swanstrom]
Heterosexual transmission of HIV-1 typically results in one genetic variant establishing systemic infection. We compared, for 137 linked transmission pairs, the amino acid sequences encoded by non-envelope genes of viruses in both partners and demonstrate a selection bias for transmission of residues that are predicted to confer increased in vivo fitness on viruses in the newly infected, immunologically naïve recipient. Although tempered by transmission risk factors, such as donor viral load, genital inflammation, and recipient gender, this selection bias provides an overall transmission advantage for viral quasispecies that are dominated by viruses with high in vivo fitness. Thus, preventative or therapeutic approaches that even marginally reduce viral fitness may lower the overall transmission rates and offer long-term benefits even upon successful transmission.

Journal Article

CTL Responses of High Functional Avidity and Broad Variant Cross-Reactivity Are Associated with HIV Control

by Ibarrondo, Javier; Heckerman, David; Mullins, James I. in Acquired immune deficiency syndrome; Adult; AIDS

2012

Cytotoxic T lymphocyte (CTL) responses targeting specific HIV proteins, in particular Gag, have been associated with relative control of viral replication in vivo. However, Gag-specific CTL can also be detected in individuals who do not control the virus, and it thus remains unclear how Gag-specific CTL may mediate beneficial effects in some individuals but not in others. Here, we used a 10-mer peptide set spanning HIV Gag-p24 to determine immunogen-specific T-cell responses and to assess functional properties, including functional avidity and cross-reactivity, in 25 HIV-1 controllers and 25 non-controllers without protective HLA class I alleles. Our data challenge the common belief that Gag-specific T-cell responses dominate virus-specific immunity exclusively in HIV-1 controllers, as both groups mounted responses of comparable breadth and magnitude against the p24 sequence. However, responses in controllers reacted to lower antigen concentrations and recognized more epitope variants than responses in non-controllers. These cross-sectional data, largely independent of particular HLA genetics and generated using direct ex vivo samples, thus identify T-cell responses of high functional avidity and broad variant reactivity as potential functional immune correlates of relative HIV control.

Journal Article

ConPADE: Genome Assembly Ploidy Estimation from Next-Generation Sequencing Data

by Margarido, Gabriel R. A.; Heckerman, David in Accuracy; Algorithms; BASIC BIOLOGICAL SCIENCES

2015

As a result of improvements in genome assembly algorithms and the ever-decreasing costs of high-throughput sequencing technologies, new high-quality draft genome sequences are published at a striking pace. With well-established methodologies, larger and more complex genomes are being tackled, including polyploid plant genomes. Given the similarity between multiple copies of a basic genome in polyploid individuals, assembly of such data usually results in collapsed contigs that represent a variable number of homoeologous genomic regions. Unfortunately, such collapse is often not ideal, as keeping contigs separate can lead both to improved assemblies and to insights into how haplotypes influence phenotype. Here, we describe a first step toward avoiding inappropriate collapse during assembly. In particular, we describe ConPADE (Contig Ploidy and Allele Dosage Estimation), a probabilistic method that estimates the ploidy of any given contig or scaffold based on its allele proportions. In the process, we report findings regarding sequencing errors. The method can be used with whole-genome shotgun (WGS) sequencing data. We also show that the method is applicable to variant calling and allele dosage estimation. Results for simulated and real datasets are discussed and provide evidence that ConPADE performs well as long as sufficient sequencing coverage is available or the true contig ploidy is low. We show that ConPADE may also be used for related applications, such as the identification of duplicated genes in fragmented assemblies, although refinements are needed.

Journal Article