70 result(s) for "Barash, Yoseph"
Integrative analysis reveals RNA G-quadruplexes in UTRs are selectively constrained and enriched for functional associations
G-quadruplex (G4) sequences are abundant in untranslated regions (UTRs) of human messenger RNAs, but their functional importance remains unclear. By integrating multiple sources of genetic and genomic data, we show that putative G-quadruplex forming sequences (pG4) in 5’ and 3’ UTRs are selectively constrained, and enriched for cis-eQTLs and RNA-binding protein (RBP) interactions. Using over 15,000 whole-genome sequences, we find that negative selection acting on central guanines of UTR pG4s is comparable to that of missense variation in protein-coding sequences. At multiple GWAS-implicated SNPs within pG4 UTR sequences, we find robust allelic imbalance in gene expression across diverse tissue contexts in GTEx, suggesting that variants affecting G-quadruplex formation within UTRs may also contribute to phenotypic variation. Our results establish UTR G4s as important cis-regulatory elements and point to a link between disruption of UTR pG4 and disease. G-quadruplexes (G4s) are secondary structures that can form in both DNA and RNA from guanine-rich sequences which are enriched in untranslated regions (UTRs). Here, Lee et al. find that putative G4-forming sequences are evolutionarily constrained, enriched for RNA-binding protein interactions and enriched for disease genetic associations.
A new view of transcriptome complexity and regulation through the lens of local splicing variations
Alternative splicing (AS) can critically affect gene function and disease, yet mapping splicing variations remains a challenge. Here, we propose a new approach to define and quantify mRNA splicing in units of local splicing variations (LSVs). LSVs capture previously defined types of alternative splicing as well as more complex transcript variations. Building the first genome wide map of LSVs from twelve mouse tissues, we find complex LSVs constitute over 30% of tissue dependent transcript variations and affect specific protein families. We show the prevalence of complex LSVs is conserved in humans and identify hundreds of LSVs that are specific to brain subregions or altered in Alzheimer's patients. Amongst those are novel isoforms in the Camk2 family and a novel poison exon in Ptbp1, a key splice factor in neurogenesis. We anticipate the approach presented here will advance the ability to relate tissue-specific splice variation to genetic variation, phenotype, and disease. Genes contain coded instructions to build other molecules that are collectively referred to as gene products. Building these products requires the gene’s instructions to be copied into a molecule of RNA in a process called transcription. Over 90% of human genes undergo a process by which different segments of the transcribed RNA molecule are either removed or retained. This process, termed alternative splicing, results in a single gene encoding different gene products that can perform in different ways. Alternative splicing can also mean that gene products vary between different cells, tissues and individuals. Some of these variations can be harmful and lead to disease. However, it is difficult with current methods to accurately identify variations in gene products that are due to alternative splicing and see how these products differ between groups of people, such as patients and healthy controls. Vaquero-Garcia, Barrera, Gazzara et al. have now developed new methods to define, measure and visualize the variations in RNA gene products. First, splicing variations were catalogued across a range of species from lizards to humans, which revealed that some fairly complicated variations were much more common than previously appreciated. These complex variations had not been studied much before, but the new methods showed that they make up a third of the variations in the RNA products copied from human genes. Vaquero-Garcia, Barrera, Gazzara et al. then showed that the new methods are more accurate and sensitive than previous methods, and can be used to discover splicing variations that were previously unknown. For example, applying the new methods to data collected in other studies revealed variations in genes that are important for brain development and activity. Further analysis then showed that these variations were also altered in brain samples from patients with Alzheimer disease. The new methods developed by Vaquero-Garcia, Barrera, Gazzara et al. can now shed new light on gene product variations, especially the more complex ones that have not been studied before. The next challenge is to use these tools to better understand the regulation and purpose of splicing variants and how they can contribute to diseases in humans.
Disrupting upstream translation in mRNAs is associated with human disease
Ribosome profiling has uncovered pervasive translation in non-canonical open reading frames; however, the biological significance of this phenomenon remains unclear. Using genetic variation from 71,702 human genomes, we assess patterns of selection in translated upstream open reading frames (uORFs) in 5’UTRs. We show that uORF variants introducing new stop codons, or strengthening existing stop codons, are under strong negative selection comparable to protein-coding missense variants. Using these variants, we map and validate gene-disease associations in two independent biobanks containing exome sequencing from 10,900 and 32,268 individuals, respectively, and elucidate their impact on protein expression in human cells. Our results suggest translation disrupting mechanisms relating uORF variation to reduced protein expression, and demonstrate that translation at uORFs is genetically constrained in 50% of human genes. The significance of translated upstream open reading frames is not well known. Here, the authors investigate genetic variants in these regions, finding that they are under high evolutionary constraint and may contribute to disease.
Mapping RNA splicing variations in clinically accessible and nonaccessible tissues to facilitate Mendelian disease diagnosis using RNA-seq
Purpose: RNA-seq is a promising approach to improve diagnoses by detecting pathogenic aberrations in RNA splicing that are missed by DNA sequencing. RNA-seq is typically performed on clinically accessible tissues (CATs) from blood and skin. RNA tissue specificity makes it difficult to identify aberrations in relevant but nonaccessible tissues (non-CATs). We determined how RNA-seq from CATs represents splicing in and across genes and non-CATs. Methods: We quantified RNA splicing in 801 RNA-seq samples from 56 different adult and fetal tissues from the Genotype-Tissue Expression Project (GTEx) and ArrayExpress. We identified genes and splicing events in each non-CAT and determined when RNA-seq in each CAT would inadequately represent them. We developed an online resource, MAJIQ-CAT, for exploring our analysis for specific genes and tissues. Results: In non-CATs, 40.2% of genes have splicing that is inadequately represented by at least one CAT; 6.3% of genes have splicing inadequately represented by all CATs. A majority (52.1%) of inadequately represented genes are lowly expressed in CATs (transcripts per million (TPM) < 1), but 5.8% are inadequately represented despite being well expressed (TPM > 10). Conclusion: Many splicing events in non-CATs are inadequately evaluated using RNA-seq from CATs. MAJIQ-CAT allows users to explore which accessible tissues, if any, best represent splicing in genes and tissues of interest.
Enhanced Integrated Gradients: improving interpretability of deep learning models using splicing codes as a case study
Despite the success and fast adaptation of deep learning models in biomedical domains, their lack of interpretability remains an issue. Here, we introduce Enhanced Integrated Gradients (EIG), a method to identify significant features associated with a specific prediction task. Using RNA splicing prediction as well as digit classification as case studies, we demonstrate that EIG improves upon the original Integrated Gradients method and produces sets of informative features. We then apply EIG to identify A1CF as a key regulator of liver-specific alternative splicing, supporting this finding with subsequent analysis of relevant A1CF functional (RNA-seq) and binding data (PAR-CLIP).
Identifying common transcriptome signatures of cancer by interpreting deep learning models
Background: Cancer is a set of diseases characterized by unchecked cell proliferation and invasion of surrounding tissues. The many genes that have been genetically associated with cancer or shown to directly contribute to oncogenesis vary widely between tumor types, but common gene signatures that relate to core cancer pathways have also been identified. It is not clear, however, whether there exist additional sets of genes or transcriptomic features that are less well known in cancer biology but that are also commonly deregulated across several cancer types. Results: Here, we agnostically identify transcriptomic features that are commonly shared between cancer types using 13,461 RNA-seq samples from 19 normal tissue types and 18 solid tumor types to train three feed-forward neural networks, based either on protein-coding gene expression, lncRNA expression, or splice junction use, to distinguish between normal and tumor samples. All three models recognize transcriptome signatures that are consistent across tumors. Analysis of attribution values extracted from our models reveals that genes that are commonly altered in cancer by expression or splicing variations are under strong evolutionary and selective constraints. Importantly, we find that genes composing our cancer transcriptome signatures are not frequently affected by mutations or genomic alterations and that their functions differ widely from the genes genetically associated with cancer. Conclusions: Our results highlighted that deregulation of RNA-processing genes and aberrant splicing are pervasive features on which core cancer pathways might converge across a large array of solid tumor types.
MOCCASIN: a method for correcting for known and unknown confounders in RNA splicing analysis
The effects of confounding factors on gene expression analysis have been extensively studied following the introduction of high-throughput microarrays and subsequently RNA sequencing. In contrast, there is a lack of equivalent analysis and tools for RNA splicing. Here we first assess the effect of confounders on both expression and splicing quantifications in two large public RNA-Seq datasets (TARGET, ENCODE). We show that quantifications of splicing variations are affected at least as much as those of gene expression, revealing unwanted sources of variation in both datasets. Next, we develop MOCCASIN, a method to correct the effect of both known and unknown confounders on RNA splicing quantification and demonstrate MOCCASIN’s effectiveness on both synthetic and real data. Code, synthetic and corrected datasets are all made available as resources. Confounding factors on gene expression analysis can be analyzed by several existing tools. Here the authors develop an algorithm called MOCCASIN to correct the effect of known and unknown confounders on RNA splicing quantification.
RNA splicing analysis using heterogeneous and large RNA-seq datasets
The ubiquity of RNA-seq has led to many methods that use RNA-seq data to analyze variations in RNA splicing. However, available methods are not well suited for handling heterogeneous and large datasets. Such datasets scale to thousands of samples across dozens of experimental conditions, exhibit increased variability compared to biological replicates, and involve thousands of unannotated splice variants resulting in increased transcriptome complexity. We describe here a suite of algorithms and tools implemented in the MAJIQ v2 package to address challenges in detection, quantification, and visualization of splicing variations from such datasets. Using both large scale synthetic data and GTEx v8 as benchmark datasets, we assess the advantages of MAJIQ v2 compared to existing methods. We then apply MAJIQ v2 package to analyze differential splicing across 2,335 samples from 13 brain subregions, demonstrating its ability to offer insights into brain subregion-specific splicing regulation. Here the authors develop MAJIQ v2 to address challenges in detection, quantification, and visualization of RNA splicing variations from large heterogeneous RNA-Seq datasets. They then apply it to analyze 2,335 samples from 13 brain subregions.
Integrative analysis of RNA binding proteins identifies DDX55 as a novel regulator of 3’UTR isoform diversity
Background: The 3’ untranslated regions (3’UTRs) of mRNAs play a critical role in controlling gene expression and function because they contain binding sites for microRNAs and RNA binding proteins (RBPs) that alter mRNA stability, localization, and translation. Most mRNA 3’ ends contain multiple polyadenylation sites (PAS) that can be utilized in condition-specific manners, a process known as alternative polyadenylation (APA). However, the mechanisms driving the regulation of APA remain poorly characterized. Results: By integrating a large set of over 500 RNA binding protein (RBP) depletion and binding experiments across two cell lines generated by the ENCODE consortium, we uncovered many RBPs in each cell type whose depletion leads to widespread alteration of 3’UTR patterns. These include not only known regulators of APA, but also many putative novel regulators of 3’UTR isoform expression. We focused our analysis on the largely unstudied DEAD box RNA helicase DDX55, and validated its novel role in 3’UTR isoform regulation using molecular assays and targeted 3’ end sequencing experiments. Conclusions: Our findings identify DDX55 as a new regulator of APA, particularly at PAS that contain features of RNA secondary structure. Our data also suggest additional previously unrecognized regulators of 3’UTR processing and differential stability.