Catalogue Search | MBRL

A high-resolution single-molecule sequencing-based Arabidopsis transcriptome using novel methods of Iso-seq analysis

by Tanaka, Maho , Guo, Wenbin , Marquez, Yamile in Accuracy , alternative polyadenylation , Alternative Splicing

2022

Background Accurate and comprehensive annotation of transcript sequences is essential for transcript quantification and differential gene and transcript expression analysis. Single-molecule long-read sequencing technologies provide improved integrity of transcript structures including alternative splicing, and transcription start and polyadenylation sites. However, accuracy is significantly affected by sequencing errors, mRNA degradation, or incomplete cDNA synthesis. Results We present a new and comprehensive Arabidopsis thaliana Reference Transcript Dataset 3 (AtRTD3). AtRTD3 contains over 169,000 transcripts—twice that of the best current Arabidopsis transcriptome and including over 1500 novel genes. Seventy-eight percent of transcripts are from Iso-seq with accurately defined splice junctions and transcription start and end sites. We develop novel methods to determine splice junctions and transcription start and end sites accurately. Mismatch profiles around splice junctions provide a powerful feature to distinguish correct splice junctions and remove false splice junctions. Stratified approaches identify high-confidence transcription start and end sites and remove fragmentary transcripts due to degradation. AtRTD3 is a major improvement over existing transcriptomes as demonstrated by analysis of an Arabidopsis cold response RNA-seq time-series. AtRTD3 provides higher resolution of transcript expression profiling and identifies cold-induced differential transcription start and polyadenylation site usage. Conclusions AtRTD3 is the most comprehensive Arabidopsis transcriptome currently. It improves the precision of differential gene and transcript expression, differential alternative splicing, and transcription start/end site usage analysis from RNA-seq data. The novel methods for identifying accurate splice junctions and transcription start/end sites are widely applicable and will improve single-molecule sequencing analysis from any species.

Journal Article

Discerning novel splice junctions derived from RNA-seq alignment: a deep learning approach

by Liu, Xinan , Zhang, Yi , Liu, Jinze in Accuracy , Algorithms , Alignment

2018

Background Exon splicing is a regulated cellular process in the transcription of protein-coding genes. Technological advancements and cost reductions in RNA sequencing have made quantitative and qualitative assessments of the transcriptome both possible and widely available. RNA-seq provides unprecedented resolution to identify gene structures and resolve the diversity of splicing variants. However, currently available ab initio aligners are vulnerable to spurious alignments due to random sequence matches and sample-reference genome discordance. As a consequence, a significant set of false positive exon junction predictions would be introduced, which will further confuse downstream analyses of splice variant discovery and abundance estimation. Results In this work, we present a deep learning based splice junction sequence classifier, named DeepSplice, which employs convolutional neural networks to classify candidate splice junctions. We show (I) DeepSplice outperforms state-of-the-art methods for splice site classification when applied to the popular benchmark dataset HS3D, (II) DeepSplice shows high accuracy for splice junction classification with GENCODE annotation, and (III) the application of DeepSplice to classify putative splice junctions generated by Rail-RNA alignment of 21,504 human RNA-seq data significantly reduces 43 million candidates into around 3 million highly confident novel splice junctions. Conclusions A model inferred from the sequences of annotated exon junctions that can then classify splice junctions derived from primary RNA-seq data has been implemented. The performance of the model was evaluated and compared through comprehensive benchmarking and testing, indicating a reliable performance and gross usability for classifying novel splice junctions derived from RNA-seq alignment.

Journal Article

Identification of Novel Circular RNAs of the Human Protein Arginine Methyltransferase 1 (PRMT1) Gene, Expressed in Breast Cancer Cells

by Scorilas, Andreas , Diamantopoulos, Marios A. , Papatsirou, Maria in Alzheimer's disease , Bioinformatics , Breast cancer

2022

Circular RNAs (circRNAs) constitute a type of RNA formed through back-splicing. In breast cancer, circRNAs are implicated in tumor onset and progression. Although histone methylation by PRMT1 is largely involved in breast cancer development and metastasis, the effect of circular transcripts deriving from this gene has not been examined. In this study, total RNA was extracted from four breast cancer cell lines and reverse-transcribed using random hexamer primers. Next, first- and second-round PCRs were performed using gene-specific divergent primers. Sanger sequencing followed to determine the sequence of each novel PRMT1 circRNA. Lastly, bioinformatics analysis was conducted to predict the functions of the novel circRNAs. In total, nine novel circRNAs were identified, comprising both complete and truncated exons of the PRMT1 gene. Interestingly, we demonstrated that the back-splice junctions consist of novel splice sites of the PRMT1 exons. Moreover, the circRNA expression pattern differed among these four breast cancer cell lines. All the novel circRNAs are predicted to act as miRNA and/or protein sponges, while five circRNAs also possess an open reading frame. In summary, we described the complete sequence of nine novel circRNAs of the PRMT1 gene, comprising distinct back-splice junctions and probably having different molecular properties.

Journal Article

The Long Read Transcriptome of Rice (Oryza sativa ssp. japonica var. Nipponbare) Reveals Novel Transcripts

by Henry, Robert J , Furtado, Agnelo , Perlo, Virginie in Alternative splicing , Annotations , Complexity

2022

Background High-throughput next-generation sequencing technologies offer a powerful approach to characterizing the transcriptomes of plants. Long read sequencing has been shown to support the discovery of novel isoforms of transcripts. This approach enables the generation of full-length sequences revealing splice variants that may be important in regulating gene action. Investigation of the diversity of transcripts in the rice transcriptome including splice variants was conducted using PacBio long-read sequence data to improve the annotation of the rice genome. Results A cDNA library was prepared from RNA extracted from leaves, roots, seeds, inflorescences, and panicles of O. sativa ssp. japonica var. Nipponbare and sequenced on a PacBio Sequel platform. This produced 346,190 non-redundant full-length non-chimeric reads (FLNC) resulting in 33,504 high-quality transcripts. Half of the transcripts were multi-exonic and entirely matched with the reference transcripts. However, 14,874 novel isoforms were also identified resulting predominantly from intron retention and at least one novel splice site. Intron retention was the prevalent alternative splicing event and exon skipping was the least observed. Of 73,659 splice junctions, 12,755 (17%) represented novel splice junctions with canonical and non-canonical intron boundaries. The complexity of the transcriptome was examined in detail for 19 starch synthesis-related genes, defining 276 spliced isoforms of which 94 splice variants were novel. Conclusion The data reveal the great complexity of the rice transcriptome. The novel transcripts provide new insights that may be a key input in future research to improve the annotation of the rice genome.

Journal Article
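
The distinction between canonical and non-canonical intron boundaries in the record above can be illustrated with a minimal sketch. It assumes the common convention that GT...AG introns are canonical and that GC...AG and AT...AC form a minor class; the abstract does not state which definition the authors used, and the function names `classify_intron` and `novel_fraction` are hypothetical.

```python
# Hypothetical helpers, not taken from the paper.
# Assumption: GT...AG is canonical; GC...AG and AT...AC are treated as a minor class.
CANONICAL = {("GT", "AG")}
MINOR = {("GC", "AG"), ("AT", "AC")}


def classify_intron(intron_seq: str) -> str:
    """Classify an intron (DNA, 5'->3') by its first and last two bases."""
    seq = intron_seq.upper()
    if len(seq) < 4:
        raise ValueError("intron too short to have both boundary dinucleotides")
    boundary = (seq[:2], seq[-2:])
    if boundary in CANONICAL:
        return "canonical"
    if boundary in MINOR:
        return "minor"
    return "non-canonical"


def novel_fraction(novel: int, total: int) -> float:
    """Fraction of junctions that are novel."""
    return novel / total


if __name__ == "__main__":
    print(classify_intron("GTAAGTCCCTTTCAG"))
    # Counts quoted in the abstract: 12,755 novel out of 73,659 junctions.
    print(f"{novel_fraction(12755, 73659):.1%}")
```

The counts passed to `novel_fraction` are the ones quoted in the abstract; the intron sequence is made up for illustration.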

Using the Chou’s 5-steps rule to predict splice junctions with interpretable bidirectional long short-term memory networks

by Anand, Ashish , R, Athul , Singh, Kusum Kumari in Attention , Back propagation , Bidirectional long short-term memory networks

2020

Neural models have been able to obtain state-of-the-art performances on several genome sequence-based prediction tasks. Such models take only nucleotide sequences as input and learn relevant features on their own. However, extracting the interpretable motifs from the model remains a challenge. This work explores various existing visualization techniques in their ability to infer relevant sequence information learnt by a recurrent neural network (RNN) on the task of splice junction identification. The visualization techniques have been modulated to suit the genome sequences as input. The visualizations inspect genomic regions at the level of a single nucleotide as well as a span of consecutive nucleotides. This inspection is performed based on the modification of input sequences (perturbation based) or the embedding space (back-propagation based). We infer features pertaining to both canonical and non-canonical splicing from a single neural model. Results indicate that the visualization techniques produce comparable performances for branchpoint detection. However, in the case of canonical donor and acceptor junction motifs, perturbation based visualizations perform better than back-propagation based visualizations, and vice-versa for non-canonical motifs. The source code of our stand-alone SpliceVisuL tool is available at https://github.com/aaiitggrp/SpliceVisuL.

• We employ BLSTM network with attention for the prediction of splice junctions.
• The proposed architecture, named SpliceVisuL, achieves state-of-the-art performance.
• Some visualization techniques are redesigned to comprehend genome sequences.
• Features learnt by the model are extracted and validated with the existing knowledge.
• A comparative study of the visualizations is done in terms of the learnt features.

Journal Article

The U1 spliceosomal RNA is recurrently mutated in multiple cancers

by Kumar, Sachin A. , Shuai, Shimin , Gutierrez-Fernandez, Ana in 42/109 , 45/23 , 45/91

2019

Cancers are caused by genomic alterations known as drivers. Hundreds of drivers in coding genes are known but, to date, only a handful of noncoding drivers have been discovered, despite intensive searching [1, 2]. Attention has recently shifted to the role of altered RNA splicing in cancer; driver mutations that lead to transcriptome-wide aberrant splicing have been identified in multiple types of cancer, although these mutations have only been found in protein-coding splicing factors such as splicing factor 3b subunit 1 (SF3B1) [3–6]. By contrast, cancer-related alterations in the noncoding component of the spliceosome, a series of small nuclear RNAs (snRNAs), have barely been studied, owing to the combined challenges of characterizing noncoding cancer drivers and the repetitive nature of snRNA genes [1, 7, 8]. Here we report a highly recurrent A>C somatic mutation at the third base of U1 snRNA in several types of tumour. The primary function of U1 snRNA is to recognize the 5′ splice site via base-pairing. This mutation changes the preferential A–U base-pairing between U1 snRNA and the 5′ splice site to C–G base-pairing, and thus creates novel splice junctions and alters the splicing pattern of multiple genes, including known drivers of cancer. Clinically, the A>C mutation is associated with heavy alcohol use in patients with hepatocellular carcinoma, and with the aggressive subtype of chronic lymphocytic leukaemia with unmutated immunoglobulin heavy-chain variable regions. The mutation in U1 snRNA also independently confers an adverse prognosis to patients with chronic lymphocytic leukaemia. Our study demonstrates a noncoding driver in spliceosomal RNAs, reveals a mechanism of aberrant splicing in cancer and may represent a new target for treatment. Our findings also suggest that driver discovery should be extended to a wider range of genomic regions.
A highly recurrent A>C somatic mutation in U1 small nuclear RNA, which alters the splicing pattern of genes that include known drivers of cancer, is identified in several types of tumour.

Journal Article
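
The base-pairing change described in the record above can be worked through in a few lines. This is only an illustrative sketch under stated assumptions: the 5′ end of human U1 snRNA is taken to be AUACUUACCUG (nucleotides 1 to 11, ignoring base modifications), nucleotides 3 to 11 are assumed to pair with 5′ splice site positions -3 to +6, and only Watson-Crick pairs are considered. The names `U1_5PRIME`, `mutate` and `preferred_5ss` are hypothetical and do not come from the paper.

```python
# Illustrative sketch; sequence and pairing register are assumptions (see lead-in).
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Assumed 5' end of human U1 snRNA, nucleotides 1-11 (modifications ignored).
U1_5PRIME = "AUACUUACCUG"


def reverse_complement_rna(seq: str) -> str:
    """Return the Watson-Crick reverse complement of an RNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def mutate(seq: str, position: int, new_base: str) -> str:
    """Substitute the base at a 1-based position."""
    i = position - 1
    return seq[:i] + new_base + seq[i + 1:]


def preferred_5ss(u1_5prime: str) -> str:
    """5' splice site (positions -3..+6) fully complementary to U1 nucleotides 3-11."""
    return reverse_complement_rna(u1_5prime[2:11])


if __name__ == "__main__":
    wild_type = preferred_5ss(U1_5PRIME)
    mutant = preferred_5ss(mutate(U1_5PRIME, 3, "C"))  # the A>C change at base 3
    print(wild_type, mutant)
```

Under these assumptions the only difference between the two preferred sites is the last position (+6), which switches from U to G, matching the A–U to C–G change the abstract describes.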

In Silico Identification and Characterization of circRNAs as Potential Virulence-Related miRNA/siRNA Sponges from Entamoeba histolytica and Encystment-Related circRNAs from Entamoeba invadens

by Méndez-Tenorio, Alfonso , López-Luis, Mario Ángel , García-Lerena, Jesús Alberto in back splice junction , Biosynthesis , CIRIfull

2022

Ubiquitous eukaryotic non-coding circular RNAs regulate transcription and translation. We have reported full-length intronic circular RNAs (flicRNAs) in Entamoeba histolytica with esterified 3′ss and 5′ss. Their 5′ss GU-rich elements are essential for their biogenesis and their suggested role in transcription regulation. Here, we explored whether exonic, exonic-intronic, and intergenic circular RNAs are also part of the E. histolytica and E. invadens ncRNA RNAome and investigated their possible functions. Available RNA-Seq libraries were analyzed with the CIRI-full software in search of circular exonic RNAs (circRNAs). The robustness of the analyses was validated using synthetic decoy sequences with bona fide back splice junctions. Differentially expressed (DE) circRNAs, between the virulent HM1:IMSS and the nonvirulent Rahman E. histolytica strains, were identified, and their miRNA sponging potential was analyzed using the intaRNA software. Respectively, 188 and 605 reverse overlapped circRNAs from E. invadens and E. histolytica were identified. The sequence composition of the circRNAs was mostly exonic although different to human circRNAs in other attributes. 416 circRNAs from E. histolytica were virulent-specific and 267 were nonvirulent-specific. Out of the common circRNAs, 32 were DE between strains. Finally, we predicted that 8 of the DE circRNAs could function as sponges of the bioinformatically reported miRNAs in E. histolytica, whose functions are still unknown. Our results extend the E. histolytica RNAome and allow us to devise a hypothesis to test circRNAs/miRNAs/siRNAs interactions in determining the virulent/nonvirulent phenotypes and to explore other regulatory mechanisms during amoebic encystment.

Journal Article

Integrated analysis of genomic and transcriptomic data for the discovery of splice-associated variants in cancer

by Griffith, Obi L. , Chapman, William C. , Lin, Yiing in 45/23 , 45/91 , 631/114/1314

2023

Somatic mutations within non-coding regions and even exons may have unidentified regulatory consequences that are often overlooked in analysis workflows. Here we present RegTools (www.regtools.org), a computationally efficient, free, and open-source software package designed to integrate somatic variants from genomic data with splice junctions from bulk or single cell transcriptomic data to identify variants that may cause aberrant splicing. We apply RegTools to over 9000 tumor samples with both tumor DNA and RNA sequence data. RegTools discovers 235,778 events where a splice-associated variant significantly increases the splicing of a particular junction, across 158,200 unique variants and 131,212 unique junctions. To characterize these somatic variants and their associated splice isoforms, we annotate them with the Variant Effect Predictor, SpliceAI, and Genotype-Tissue Expression junction counts and compare our results to other tools that integrate genomic and transcriptomic data. While many events are corroborated by the aforementioned tools, the flexibility of RegTools also allows us to identify splice-associated variants in known cancer drivers, such as TP53, CDKN2A, and B2M, and other genes. Analysing the regulatory consequences of mutations and splice variants at large scale in cancer requires efficient computational tools. Here, the authors develop RegTools, a software package that can identify splice-associated variants from large-scale genomics and transcriptomics data with efficiency and flexibility.

Journal Article

Alternative splicing modulation by G-quadruplexes

by Furlan, Giulia , Medhi, Ragini , Hemberg, Martin in 42/44 , 45/29 , 45/77

2022

Alternative splicing is central to metazoan gene regulation, but the regulatory mechanisms are incompletely understood. Here, we show that G-quadruplex (G4) motifs are enriched ~3-fold near splice junctions. The importance of G4s in RNA is emphasised by a higher enrichment for the non-template strand. RNA-seq data from mouse and human neurons reveal an enrichment of G4s at exons that were skipped following depolarisation induced by potassium chloride. We validate the formation of stable RNA G4s for three candidate splice sites by circular dichroism spectroscopy, UV-melting and fluorescence measurements. Moreover, we find that sQTLs are enriched at G4s, and a minigene experiment provides further support for their role in promoting exon inclusion. Analysis of >1,800 high-throughput experiments reveals multiple RNA binding proteins associated with G4s. Finally, exploration of G4 motifs across eleven species shows strong enrichment at splice sites in mammals and birds, suggesting an evolutionarily conserved splice regulatory mechanism. Here the authors show that G-quadruplexes, non-canonical DNA/RNA structures, can have a direct impact on alternative splicing and that binding of splicing regulators is affected by their presence.

Journal Article
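
As a rough illustration of what scanning for G4 motifs near splice junctions can look like, the sketch below uses the widely cited sequence pattern of four runs of at least three guanines separated by loops of 1 to 7 bases. That pattern is an assumption here: the abstract does not say which motif definition or tool the authors used, and the names `G4_PATTERN` and `find_g4_motifs` are hypothetical.

```python
import re

# Assumed motif definition: G{3+} N{1-7} G{3+} N{1-7} G{3+} N{1-7} G{3+}.
# This is a common convention, not necessarily the one used in the paper.
G4_PATTERN = re.compile(r"G{3,}(?:[ACGTN]{1,7}G{3,}){3}")


def find_g4_motifs(sequence: str) -> list[tuple[int, int]]:
    """Return (start, end) spans of non-overlapping putative G4 motifs.

    Coordinates are 0-based and half-open, on the strand that was passed in.
    """
    seq = sequence.upper()
    return [match.span() for match in G4_PATTERN.finditer(seq)]


if __name__ == "__main__":
    # Toy sequence containing a telomere-like (GGGTTA) repeat.
    toy = "AAAGGGTTAGGGTTAGGGTTAGGGAAA"
    print(find_g4_motifs(toy))
```

Only the strand supplied is scanned; searching the opposite strand would require passing its reverse complement, and an enrichment estimate would additionally need a background model that this sketch does not attempt.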

OpenSpliceAI provides an efficient modular implementation of SpliceAI enabling easy retraining across nonhuman species

by Chao, Kuan-Hao , Salzberg, Steven L , Pertea, Mihaela in Animals , Computational Biology - methods , Deep Learning

2025

The SpliceAI deep learning system is currently one of the most accurate methods for identifying splicing signals directly from DNA sequences. However, its utility is limited by its reliance on older software frameworks and human-centric training data. Here, we introduce OpenSpliceAI, a trainable, open-source version of SpliceAI implemented in PyTorch to address these challenges. OpenSpliceAI supports both training from scratch and transfer learning, enabling seamless retraining on species-specific datasets and mitigating human-centric biases. Our experiments show that it achieves faster processing speeds and lower memory usage than the original SpliceAI code, allowing large-scale analyses of extensive genomic regions on a single GPU. Additionally, OpenSpliceAI’s flexible architecture makes for easier integration with established machine learning ecosystems, simplifying the development of custom splicing models for different species and applications. We demonstrate that OpenSpliceAI’s output is highly concordant with SpliceAI. In silico mutagenesis analyses confirm that both models rely on similar sequence features, and calibration experiments demonstrate similar score probability estimates.

Journal Article
