Catalogue Search | MBRL

Pangenome-based genome inference allows efficient and accurate genotyping across a wide spectrum of variant classes

by Ebler, Jana , Korbel, Jan O. , Ebert, Peter in 45/23 , 631/114/2785 , 631/208/212

2022

Typical genotyping workflows map reads to a reference genome before identifying genetic variants. Generating such alignments introduces reference biases and comes with substantial computational burden. Furthermore, short-read lengths limit the ability to characterize repetitive genomic regions, which are particularly challenging for fast k-mer-based genotypers. In the present study, we propose a new algorithm, PanGenie, that leverages a haplotype-resolved pangenome reference together with k-mer counts from short-read sequencing data to genotype a wide spectrum of genetic variation, a process we refer to as genome inference. Compared with mapping-based approaches, PanGenie is more than 4 times faster at 30-fold coverage and achieves better genotype concordances for almost all variant types and coverages tested. Improvements are especially pronounced for large insertions (≥50 bp) and variants in repetitive regions, enabling the inclusion of these classes of variants in genome-wide association studies. PanGenie efficiently leverages the increasing amount of haplotype-resolved assemblies to unravel the functional impact of previously inaccessible variants while being faster compared with alignment-based workflows. PanGenie is an alignment-free, k-mer-based tool that utilizes a haplotype-resolved pangenome reference to genotype a wide range of variants.

Journal Article

Type 2 and interferon inflammation regulate SARS-CoV-2 entry factor expression in the airway epithelium

by Pruesse, Elmar , Rodriguez-Santana, Jose , DeFord, Peter in 13/106 , 38/91 , 631/208/200

2020

Coronavirus disease 2019 (COVID-19) is caused by SARS-CoV-2, an emerging virus that utilizes host proteins ACE2 and TMPRSS2 as entry factors. Understanding the factors affecting the pattern and levels of expression of these genes is important for deeper understanding of SARS-CoV-2 tropism and pathogenesis. Here we explore the role of genetics and co-expression networks in regulating these genes in the airway, through the analysis of nasal airway transcriptome data from 695 children. We identify expression quantitative trait loci for both ACE2 and TMPRSS2 that vary in frequency across world populations. We find TMPRSS2 is part of a mucus secretory network, highly upregulated by type 2 (T2) inflammation through the action of interleukin-13, and that the interferon response to respiratory viruses highly upregulates ACE2 expression. Effects of IL-13 and virus infection on ACE2 expression were also observed at the protein level in the airway epithelium. Finally, we define airway responses to common coronavirus infections in children, finding that these infections generate host responses similar to other viral species, including upregulation of IL6 and ACE2. Our results reveal possible mechanisms influencing SARS-CoV-2 infectivity and COVID-19 clinical outcomes. ACE2 and TMPRSS2 have received recent attention as entry factors for SARS-CoV-2. Here the authors analyze nasal airway transcriptome data from 695 children, determining that ACE2 and TMPRSS2 expression is induced by viral and type 2 inflammation, respectively, and that both exhibit eQTLs that vary across world populations.

Journal Article

Functional equivalence of genome sequencing analysis pipelines enables harmonized variant calling across human genetics projects

by Zhang, Yeting , Matise, Tara , Zody, Michael C. in 45/23 , 631/114/1314 , 631/114/2785

2018

Hundreds of thousands of human whole genome sequencing (WGS) datasets will be generated over the next few years. These data are more valuable in aggregate: joint analysis of genomes from many sources increases sample size and statistical power. A central challenge for joint analysis is that different WGS data processing pipelines cause substantial differences in variant calling in combined datasets, necessitating computationally expensive reprocessing. This approach is no longer tenable given the scale of current studies and data volumes. Here, we define WGS data processing standards that allow different groups to produce functionally equivalent (FE) results, yet still innovate on data processing pipelines. We present initial FE pipelines developed at five genome centers and show that they yield similar variant calling results and produce significantly less variability than sequencing replicates. This work alleviates a key technical bottleneck for genome aggregation and helps lay the foundation for community-wide human genetics studies. Sharing of whole genome sequencing (WGS) data improves study scale and power, but data from different groups are often incompatible. Here, US genome centers and NIH programs define WGS data processing standards and a flexible validation method, facilitating collaboration in human genetics research.

Journal Article

The genomic basis of adaptive evolution in threespine sticklebacks

by Zody, Michael C. , Miller, Craig T. , Chan, Yingguang Frank in 631/158/857 , 631/181/759/2467 , 631/208/182

2012

Marine stickleback fish have colonized and adapted to thousands of streams and lakes formed since the last ice age, providing an exceptional opportunity to characterize genomic mechanisms underlying repeated ecological adaptation in nature. Here we develop a high-quality reference genome assembly for threespine sticklebacks. By sequencing the genomes of twenty additional individuals from a global set of marine and freshwater populations, we identify a genome-wide set of loci that are consistently associated with marine–freshwater divergence. Our results indicate that reuse of globally shared standing genetic variation, including chromosomal inversions, has an important role in repeated evolution of distinct marine and freshwater sticklebacks, and in the maintenance of divergent ecotypes during early stages of reproductive isolation. Both coding and regulatory changes occur in the set of loci underlying marine–freshwater evolution, but regulatory changes appear to predominate in this well-known example of repeated adaptive evolution in nature. A reference genome sequence for threespine sticklebacks, and re-sequencing of 20 additional worldwide populations, reveals loci used repeatedly during vertebrate evolution; multiple chromosome inversions contribute to marine–freshwater divergence, and regulatory variants predominate over coding variants in this classic example of adaptive evolution in natural environments. The genomics of stickleback speciation: Threespine sticklebacks have become a powerful model for studying the molecular basis of adaptive evolution. This paper presents a high-quality reference genome sequence, along with genomes of 20 further individuals from a global set of marine and freshwater populations. Genomic analysis reveals that reuse of globally shared standing genetic variation plays an important part in repeated evolution of distinct stickleback populations, and in the maintenance of divergent ecotypes during early stages of reproductive isolation. The data are consistent with an important role for regulatory changes during parallel evolution of marine and freshwater sticklebacks.

Journal Article

Characterization of a Novel Orthomyxo-like Virus Causing Mass Die-Offs of Tilapia

by Zamostiano, Rachel , Toussaint, Nora C. , Zody, Michael C. in Amino Acid Sequence , Amino acids , Animals

2016

Tilapia are an important global food source due to their omnivorous diet, tolerance for high-density aquaculture, and relative disease resistance. Since 2009, tilapia aquaculture has been threatened by mass die-offs in farmed fish in Israel and Ecuador. Here we report evidence implicating a novel orthomyxo-like virus in these outbreaks. The tilapia lake virus (TiLV) has a 10-segment, negative-sense RNA genome. The largest segment, segment 1, contains an open reading frame with weak sequence homology to the influenza C virus PB1 subunit. The other nine segments showed no homology to other viruses but have conserved, complementary sequences at their 5′ and 3′ termini, consistent with the genome organization found in other orthomyxoviruses. In situ hybridization indicates TiLV replication and transcription at sites of pathology in the liver and central nervous system of tilapia with disease. IMPORTANCE: The economic impact of worldwide trade in tilapia is estimated at $7.5 billion U.S. dollars (USD) annually. The infectious agent implicated in mass tilapia die-offs in two continents poses a threat to the global tilapia industry, which not only provides inexpensive dietary protein but also is a major employer in the developing world. Here we report characterization of the causative agent as a novel orthomyxo-like virus, tilapia lake virus (TiLV). We also describe complete genomic and protein sequences that will facilitate TiLV detection and containment and enable vaccine development.

Journal Article

Multi-omic analysis of Huntington’s disease reveals a compensatory astrocyte state

by Goldman, James E. , Kwon, Ji-Sun , Vonsattel, Jean Paul in 101/58 , 13/106 , 45/43

2024

The mechanisms underlying the selective regional vulnerability to neurodegeneration in Huntington’s disease (HD) have not been fully defined. To explore the role of astrocytes in this phenomenon, we used single-nucleus and bulk RNAseq, lipidomics, HTT gene CAG repeat-length measurements, and multiplexed immunofluorescence on HD and control post-mortem brains. We identified genes that correlated with CAG repeat length, which were enriched in astrocyte genes, and lipidomic signatures that implicated poly-unsaturated fatty acids in sensitizing neurons to cell death. Because astrocytes play essential roles in lipid metabolism, we explored the heterogeneity of astrocytic states in both protoplasmic and fibrous-like (CD44+) astrocytes. Significantly, one protoplasmic astrocyte state showed high levels of metallothioneins and was correlated with the selective vulnerability of distinct striatal neuronal populations. When modeled in vitro, this state improved the viability of HD-patient-derived spiny projection neurons. Our findings uncover key roles of astrocytic states in protecting against neurodegeneration in HD. Huntington’s disease (HD) is a neurodegenerative disease that shows selective regional vulnerability. Here, the authors show that postmortem brain HD astrocytes are regionally diverse, with a striatal disease-associated state and a cortical compensatory state that mitigated neural death.

Journal Article

Deep whole-genome sequencing of 3 cancer cell lines on 2 sequencing platforms

by Robine, Nicolas , Sanghvi, Rashesh , Zody, Michael C. in 45/23 , 631/114/2785 , 631/61/212

2019

To test the performance of a new sequencing platform, develop an updated somatic calling pipeline and establish a reference for future benchmarking experiments, we performed whole-genome sequencing of 3 common cancer cell lines (COLO-829, HCC-1143 and HCC-1187) along with their matched normal cell lines to great sequencing depths (up to 278x coverage) on both Illumina HiSeqX and NovaSeq sequencing instruments. Somatic calling was generally consistent between the two platforms despite minor differences at the read level. We designed and implemented a novel pipeline for the analysis of tumor-normal samples, using multiple variant callers. We show that, coupled with a high-confidence filtering strategy, the use of a combination of tools improves the accuracy of somatic variant calling. We also demonstrate the utility of the dataset by creating an artificial purity ladder to evaluate the somatic pipeline and benchmark methods for estimating purity and ploidy from tumor-normal pairs. The data and results of the pipeline are made accessible to the cancer genomics community.

Journal Article

Mutations causing medullary cystic kidney disease type 1 lie in a large VNTR in MUC1 missed by massively parallel sequencing

by Robinson, James T , Vyleťal, Petr , Handsaker, Robert E in 631/208/2489/144 , 631/208/514/2254 , 692/699/1585

2013

Anthony Bleyer, Eric Lander, Mark Daly and colleagues show that frameshift mutations in a large VNTR of MUC1 cause medullary cystic kidney disease type 1. Their discovery sheds light on the biology of this disease and highlights challenges in using massively parallel sequencing technologies to characterize certain types of sequence variants. Although genetic lesions responsible for some mendelian disorders can be rapidly discovered through massively parallel sequencing of whole genomes or exomes, not all diseases readily yield to such efforts. We describe the illustrative case of the simple mendelian disorder medullary cystic kidney disease type 1 (MCKD1), mapped more than a decade ago to a 2-Mb region on chromosome 1. Ultimately, only by cloning, capillary sequencing and de novo assembly did we find that each of six families with MCKD1 harbors an equivalent but apparently independently arising mutation in sequence markedly under-represented in massively parallel sequencing data: the insertion of a single cytosine in one copy (but a different copy in each family) of the repeat unit comprising the extremely long (∼1.5–5 kb), GC-rich (>80%) coding variable-number tandem repeat (VNTR) sequence in the MUC1 gene encoding mucin 1. These results provide a cautionary tale about the challenges in identifying the genes responsible for mendelian, let alone more complex, disorders through massively parallel sequencing.

Journal Article

Genome Evolution Following Host Jumps in the Irish Potato Famine Pathogen Lineage

by Jiang, Rays H.Y , Zody, Michael C , Farrer, Rhys A in Adaptation, Physiological - genetics , Agricultural sciences , Airborne microorganisms

2010

Many plant pathogens, including those in the lineage of the Irish potato famine organism Phytophthora infestans, evolve by host jumps followed by specialization. However, how host jumps affect genome evolution remains largely unknown. To determine the patterns of sequence variation in the P. infestans lineage, we resequenced six genomes of four sister species. This revealed uneven evolutionary rates across genomes with genes in repeat-rich regions showing higher rates of structural polymorphisms and positive selection. These loci are enriched in genes induced in planta, implicating host adaptation in genome evolution. Unexpectedly, genes involved in epigenetic processes formed another class of rapidly evolving residents of the gene-sparse regions. These results demonstrate that dynamic repeat-rich genome compartments underpin accelerated gene evolution following host jumps in this pathogen lineage.

Journal Article

Nasal airway transcriptome-wide association study of asthma reveals genetically driven mucus pathobiology

by Rios, Cydney L. , Fairbanks-Mahnke, Ana , Zody, Michael C. in 13/106 , 38/39 , 45/43

2022

To identify genetic determinants of airway dysfunction, we performed a transcriptome-wide association study for asthma by combining RNA-seq data from the nasal airway epithelium of 681 children with UK Biobank genetic association data. Our airway analysis identified 102 asthma genes, 58 of which were not identified by transcriptome-wide association analyses using other asthma-relevant tissues. Among these genes were MUC5AC, an airway mucin, and FOXA3, a transcriptional driver of mucus metaplasia. Muco-ciliary epithelial cultures from genotyped donors revealed that the MUC5AC risk variant increases MUC5AC protein secretion and mucus secretory cell frequency. Airway transcriptome-wide association analyses for mucus production and chronic cough also identified MUC5AC. These cis-expression variants were associated with trans effects on expression; the MUC5AC variant was associated with upregulation of non-inflammatory mucus secretory network genes, while the FOXA3 variant was associated with upregulation of type-2 inflammation-induced mucus-metaplasia pathway genes. Our results reveal genetic mechanisms of airway mucus pathobiology. Understanding regulation of genes associated with disease can reveal insights into disease mechanisms. Here, the authors perform an airway epithelial transcriptome-wide association analysis to elucidate genetic determinants of airway dysfunction in asthma, identifying genetic mechanisms of mucus pathobiology.

Journal Article
