Catalogue Search | MBRL

The mutational constraint spectrum quantified from variation in 141,456 humans

by Roazen, David , Pierce-Hoffman, Emma , Novod, Sam in 45/23 , 631/208/212/2301 , 631/208/457/649/2219

2020

Genetic variants that inactivate protein-coding genes are a powerful source of information about the phenotypic consequences of gene disruption: genes that are crucial for the function of an organism will be depleted of such variants in natural populations, whereas non-essential genes will tolerate their accumulation. However, predicted loss-of-function variants are enriched for annotation errors, and tend to be found at extremely low frequencies, so their analysis requires careful variant annotation and very large sample sizes¹. Here we describe the aggregation of 125,748 exomes and 15,708 genomes from human sequencing studies into the Genome Aggregation Database (gnomAD). We identify 443,769 high-confidence predicted loss-of-function variants in this cohort after filtering for artefacts caused by sequencing and annotation errors. Using an improved model of human mutation rates, we classify human protein-coding genes along a spectrum that represents tolerance to inactivation, validate this classification using data from model organisms and engineered human cells, and show that it can be used to improve the power of gene discovery for both common and rare diseases. A catalogue of predicted loss-of-function variants in 125,748 whole-exome and 15,708 whole-genome sequencing datasets from the Genome Aggregation Database (gnomAD) reveals the spectrum of mutational constraints that affect these human protein-coding genes.

Journal Article

Transcript expression-aware annotation improves rare variant interpretation

by Poterba, Timothy , Alföldi, Jessica , Singer-Berk, Moriel in 38/23 , 38/91 , 45/91

2020

The acceleration of DNA sequencing in samples from patients and population studies has resulted in extensive catalogues of human genetic variation, but the interpretation of rare genetic variants remains problematic. A notable example of this challenge is the existence of disruptive variants in dosage-sensitive disease genes, even in apparently healthy individuals. Here, by manual curation of putative loss-of-function (pLoF) variants in haploinsufficient disease genes in the Genome Aggregation Database (gnomAD)¹, we show that one explanation for this paradox involves alternative splicing of mRNA, which allows exons of a gene to be expressed at varying levels across different cell types. Currently, no existing annotation tool systematically incorporates information about exon expression into the interpretation of variants. We develop a transcript-level annotation metric known as the ‘proportion expressed across transcripts’, which quantifies isoform expression for variants. We calculate this metric using 11,706 tissue samples from the Genotype Tissue Expression (GTEx) project² and show that it can differentiate between weakly and highly evolutionarily conserved exons, a proxy for functional importance. We demonstrate that expression-based annotation selectively filters 22.8% of falsely annotated pLoF variants found in haploinsufficient disease genes in gnomAD, while removing less than 4% of high-confidence pathogenic variants in the same genes. Finally, we apply our expression filter to the analysis of de novo variants in patients with autism spectrum disorder and intellectual disability or developmental disorders to show that pLoF variants in weakly expressed regions have similar effect sizes to those of synonymous variants, whereas pLoF variants in highly expressed exons are most strongly enriched among cases.
Our annotation is fast, flexible and generalizable, making it possible for any variant file to be annotated with any isoform expression dataset, and will be valuable for the genetic diagnosis of rare diseases, the analysis of rare variant burden in complex disorders, and the curation and prioritization of variants in recall-by-genotype studies. A novel variant annotation metric that quantifies the level of expression of genetic variants across tissues is validated in the Genome Aggregation Database (gnomAD) and is shown to improve rare variant interpretation.

Journal Article

Decoding Human Cytomegalovirus

by Hein, Marco Y. , Le, Vu Thuy Khanh , Shen, Ben in Alternative Splicing , Analytical, structural and metabolic biochemistry , Biological and medical sciences

2012

The human cytomegalovirus (HCMV) genome was sequenced 20 years ago. However, like those of other complex viruses, our understanding of its protein coding potential is far from complete. We used ribosome profiling and transcript analysis to experimentally define the HCMV translation products and follow their temporal expression. We identified hundreds of previously unidentified open reading frames and confirmed a fraction by means of mass spectrometry. We found that regulated use of alternative transcript start sites plays a broad role in enabling tight temporal control of HCMV protein expression and allowing multiple distinct polypeptides to be generated from a single genomic locus. Our results reveal an unanticipated complexity to the HCMV coding capacity and illustrate the role of regulated changes in transcript start sites in generating this complexity.

Journal Article

KSHV 2.0: A Comprehensive Annotation of the Kaposi's Sarcoma-Associated Herpesvirus Genome Using Next-Generation Sequencing Reveals Novel Genomic and Functional Features

by Bellare, Priya , Weisburd, Ben , Mercier, Alexandre in Architecture , Biology , Cell Line

2014

Productive herpesvirus infection requires a profound, time-controlled remodeling of the viral transcriptome and proteome. To gain insights into the genomic architecture and gene expression control in Kaposi's sarcoma-associated herpesvirus (KSHV), we performed a systematic genome-wide survey of viral transcriptional and translational activity throughout the lytic cycle. Using mRNA-sequencing and ribosome profiling, we found that transcripts encoding lytic genes are promptly bound by ribosomes upon lytic reactivation, suggesting their regulation is mainly transcriptional. Our approach also uncovered new genomic features such as ribosome occupancy of viral non-coding RNAs, numerous upstream and small open reading frames (ORFs), and unusual strategies to expand the virus coding repertoire that include alternative splicing, dynamic viral mRNA editing, and the use of alternative translation initiation codons. Furthermore, we provide a refined and expanded annotation of transcription start sites, polyadenylation sites, splice junctions, and initiation/termination codons of known and new viral features in the KSHV genomic space which we have termed KSHV 2.0. Our results represent a comprehensive genome-scale image of gene regulation during lytic KSHV infection that substantially expands our understanding of the genomic architecture and coding capacity of the virus.

Journal Article

Insights into the genetic epidemiology of Crohn's and rare diseases in the Ashkenazi Jewish population

by Beaugerie, Laurent , Silverberg, Mark S. , Pirinen, Matti in Algorithms , Ashkenazim , Biology and Life Sciences

2018

As part of a broader collaborative network of exome sequencing studies, we developed a jointly called data set of 5,685 Ashkenazi Jewish exomes. We make publicly available a resource of site and allele frequencies, which should serve as a reference for medical genetics in the Ashkenazim (hosted in part at https://ibd.broadinstitute.org, also available in gnomAD at http://gnomad.broadinstitute.org). We estimate that 34% of protein-coding alleles present in the Ashkenazi Jewish population at frequencies greater than 0.2% are significantly more frequent (mean 15-fold) than their maximum frequency observed in other reference populations. Arising via a well-described founder effect approximately 30 generations ago, this catalog of enriched alleles can contribute to differences in genetic risk and overall prevalence of diseases between populations. As validation we document 148 AJ enriched protein-altering alleles that overlap with "pathogenic" ClinVar alleles (table available at https://github.com/macarthur-lab/clinvar/blob/master/output/clinvar.tsv), including those that account for 10-100 fold differences in prevalence between AJ and non-AJ populations of some rare diseases, especially recessive conditions, including Gaucher disease (GBA, p.Asn409Ser, 8-fold enrichment); Canavan disease (ASPA, p.Glu285Ala, 12-fold enrichment); and Tay-Sachs disease (HEXA, c.1421+1G>C, 27-fold enrichment; p.Tyr427IlefsTer5, 12-fold enrichment). We next sought to use this catalog, of well-established relevance to Mendelian disease, to explore Crohn's disease, a common disease with an estimated two to four-fold excess prevalence in AJ.
We specifically attempt to evaluate whether strong acting rare alleles, particularly protein-truncating or otherwise large effect-size alleles, enriched by the same founder-effect, contribute excess genetic risk to Crohn's disease in AJ, and find that ten rare genetic risk factors in NOD2 and LRRK2, including several novel contributing alleles, are enriched in AJ (p < 0.005) and show evidence of association with CD. Independently, we find that genomewide common variant risk defined by GWAS shows a strong difference between AJ and non-AJ European control population samples (0.97 s.d. higher, p < 10⁻¹⁶). Taken together, the results suggest coordinated selection in the AJ population for higher CD risk alleles in general. The results and approach illustrate the value of exome sequencing data in case-control studies along with reference data sets like ExAC (sites VCF available via FTP at ftp.broadinstitute.org/pub/ExAC_release/release0.3/) to pinpoint genetic variation that contributes to variable disease predisposition across populations.

Journal Article

Compensatory induction of MYC expression by sustained CDK9 inhibition via a BRD4-dependent mechanism

by Ji, Xiaodan , Sutton, James , Gao, Zhenhai in Apoptosis , Biochemistry , BRD4

2015

CDK9 is the kinase subunit of positive transcription elongation factor b (P-TEFb) that enables RNA polymerase (Pol) II's transition from promoter-proximal pausing to productive elongation. Although considerable interest exists in CDK9 as a therapeutic target, little progress has been made due to lack of highly selective inhibitors. Here, we describe the development of i-CDK9 as such an inhibitor that potently suppresses CDK9 phosphorylation of substrates and causes genome-wide Pol II pausing. While most genes experience reduced expression, MYC and other primary response genes increase expression upon sustained i-CDK9 treatment. Essential for this increase, the bromodomain protein BRD4 captures P-TEFb from 7SK snRNP to deliver to target genes and also enhances CDK9's activity and resistance to inhibition. Because the i-CDK9-induced MYC expression and binding to P-TEFb compensate for P-TEFb's loss of activity, only simultaneously inhibiting CDK9 and MYC/BRD4 can efficiently induce growth arrest and apoptosis of cancer cells, suggesting the potential of a combinatorial treatment strategy. Cancers are often caused by mutations in genes that allow cells to proliferate uncontrollably. One gene that is frequently mutated in many cancers encodes a protein called MYC. The activity of this gene is normally tightly controlled, but the mutations found in human cancer cells mean that this gene is constantly switched on, and so too much MYC protein is produced. Previous studies have shown that a protein complex called ‘positive transcription elongation factor b’ (or P-TEFb for short) is essential to control the expression of the gene for MYC. P-TEFb works with an enzyme called RNA polymerase II to copy the instructions contained in protein-coding genes into long molecules called messenger RNAs. This process is called transcription and it involves a number of stages. P-TEFb is needed to start one of these stages, which is known as the ‘elongation’ step.
P-TEFb consists of two protein subunits; one of which—an enzyme called CDK9—is the catalytic subunit. Most of the P-TEFb complexes in a cell are held in an inactive form, in which the activity of the CDK9 subunit is suppressed. If CDK9 is active when it should not be, certain proteins—including the MYC protein—can be produced in abnormally high amounts. This means that inhibiting CDK9 has been investigated as one way to reduce the production of the MYC protein. While some CDK9 inhibitors already exist, these compounds have the undesirable effect of also inhibiting other related enzymes and thus killing normal cells. Hence, new and more selective inhibitors of CDK9 are urgently needed. Lu, Xue et al. have now developed a new inhibitor of CDK9, called i-CDK9. The experiments show that i-CDK9 can potently inhibit CDK9 activity; and in human cells, very low levels of i-CDK9 prevented RNA polymerase II carrying out elongation for many genes. Unexpectedly, Lu, Xue et al. observed that more messenger RNA molecules that encode MYC were produced after cells were treated with low levels of i-CDK9. Further investigation revealed that this unexpected result occurred because the P-TEFb complexes were released from the inactive form and brought to the MYC gene by another protein called BRD4. This stimulated production of the MYC messenger RNAs. When P-TEFb was bound by BRD4, the CDK9 activity was also protected against inhibition by i-CDK9. Moreover, the reason that the MYC expression was induced by i-CDK9 is because the cells can compensate for the loss of CDK9 by using MYC to maintain the production of messenger RNAs of many key genes; these genes include the gene for MYC itself. These results suggest that CDK9 and MYC must be simultaneously inhibited in order to effectively treat cancers.

Journal Article

STRchive: a dynamic resource detailing population-level and locus-specific insights at tandem repeat disease loci

by VanNoy, Grace E. , Rehm, Heidi L. , Weisburd, Ben in Accuracy , Ataxia , Automation

2025

Approximately 8% of the human genome consists of repetitive elements called tandem repeats (TRs): short tandem repeats (STRs) of 1–6 bp motifs and variable number tandem repeats (VNTRs) of 7+ bp motifs. TR variants contribute to several dozen monogenic diseases but remain understudied and enigmatic. It remains comparatively challenging to interpret the clinical significance of TR variants, particularly relative to single nucleotide variants. We present STRchive (http://strchive.org/), a dynamic resource consolidating information on TR disease loci from the research literature, up-to-date clinical resources, and large-scale genomic databases, streamlining TR variant interpretation at disease-associated loci.

Journal Article

Muscle transcriptome profiling reveals novel molecular pathways and biomarkers in laminin-α2 deficient patients

by Bonaccorso, Rosa , Weisburd, Ben , Pini, Veronica in Biomarkers , Biomedical and Life Sciences , Biomedicine

2026

Merosin-deficient congenital muscular dystrophy (LAMA2-RD) is a neuromuscular disorder caused by mutations in the LAMA2 gene, coding for the α2 subunit of laminin-211 (merosin). LAMA2 mutations leading to complete laminin-211 absence result in a severe clinical phenotype with profound muscle weakness and respiratory insufficiency, whereas mutations allowing the production of a partially functional protein are often associated with milder phenotypes. While several dysregulated pathways linked to LAMA2-RD have been reported, an in-depth characterization of muscle gene expression in patients with mutations differentially affecting LAMA2 expression is still lacking. We generated muscle transcriptomic data from patients with either complete or partial laminin-211 deficiency alongside healthy controls, and relied on complementary bioinformatic tools and curated literature review to identify pathways linked to the most dysregulated processes. Genes related to fibrosis, inflammation and metabolism were similarly expressed in both patient cohorts. However, a subset of novel pro-fibrotic and pro-inflammatory genes and lncRNAs (including PF4 and HOTAIR) were exclusively expressed in patients (and mice) completely lacking laminin-211, indicating aspects exacerbated in this cohort. This study provides a comprehensive characterization of the main contributors to human LAMA2-RD pathology across disease severity, shedding light on novel genes and molecular pathways that could potentially serve as disease biomarkers or as targets for future therapeutic interventions.

Journal Article

REViewer: haplotype-resolved visualization of read alignments in and around tandem repeats

by Anyansi, Christine , Rehm, Heidi L. , Jadhav, Bharati in Alleles , Amyotrophic lateral sclerosis , Amyotrophic Lateral Sclerosis - genetics

2022

Background: Expansions of short tandem repeats are the cause of many neurogenetic disorders including familial amyotrophic lateral sclerosis, Huntington disease, and many others. Multiple methods have been recently developed that can identify repeat expansions in whole genome or exome sequencing data. Despite the widely recognized need for visual assessment of variant calls in clinical settings, current computational tools lack the ability to produce such visualizations for repeat expansions. Expanded repeats are difficult to visualize because they correspond to large insertions relative to the reference genome and involve many misaligning and ambiguously aligning reads. Results: We implemented REViewer, a computational method for visualization of sequencing data in genomic regions containing long repeat expansions and FlipBook, a companion image viewer designed for manual curation of large collections of REViewer images. To generate a read pileup, REViewer reconstructs local haplotype sequences and distributes reads to these haplotypes in a way that is most consistent with the fragment lengths and evenness of read coverage. To create appropriate training materials for onboarding new users, we performed a concordance study involving 12 scientists involved in short tandem repeat research. We used the results of this study to create a user guide that describes the basic principles of using REViewer as well as a guide to the typical features of read pileups that correspond to low confidence repeat genotype calls. Additionally, we demonstrated that REViewer can be used to annotate clinically relevant repeat interruptions by comparing visual assessment results of 44 FMR1 repeat alleles with the results of triplet repeat primed PCR. For 38 of these alleles, the results of visual assessment were consistent with triplet repeat primed PCR.
Conclusions: Read pileup plots generated by REViewer offer an intuitive way to visualize sequencing data in regions containing long repeat expansions. Laboratories can use REViewer and FlipBook to assess the quality of repeat genotype calls as well as to visually detect interruptions or other imperfections in the repeat sequence and the surrounding flanking regions. REViewer and FlipBook are available under open-source licenses at https://github.com/illumina/REViewer and https://github.com/broadinstitute/flipbook respectively.

Journal Article

Addendum: The mutational constraint spectrum quantified from variation in 141,456 humans

by Roazen, David , Pierce-Hoffman, Emma , Novod, Sam in 45/23 , 631/208/212/2301 , 631/208/457/649/2219

2021

Journal Article
