Catalogue Search | MBRL

SpliceFinder: ab initio prediction of splice sites using convolutional neural network

by Wang, Ruohan , Wang, Jianping , Li, Shuaicheng in Algorithms , Analysis , Animals

2019

Background: Identifying splice sites is a necessary step in analyzing the location and structure of genes. Two dinucleotides, GT and AG, are highly frequent at splice sites, and many other patterns with important biological functions also occur there. However, these dinucleotides also occur frequently in sequences without splice sites, which makes prediction prone to false positives. Most existing tools select all sequences containing the two dimers and then focus on distinguishing true splice sites from pseudo ones. Such an approach reduces false positives, but it misses non-canonical splice sites. Results: We have designed SpliceFinder, based on a convolutional neural network (CNN), to predict splice sites. To achieve ab initio prediction, we used human genomic data to train our neural network. An iterative approach is adopted to reconstruct the dataset, which tackles the data imbalance problem and forces the model to learn more features of splice sites. The proposed CNN obtains a classification accuracy of 90.25%, which is 10% higher than existing algorithms. The method outperforms other existing methods in terms of area under the receiver operating characteristic curve (AUC), recall, precision, and F1 score. Furthermore, SpliceFinder can find the exact position of splice sites on long genomic sequences with a sliding window. Compared with other state-of-the-art splice site prediction tools, SpliceFinder produces about half as many false positives while keeping recall higher than 0.8. SpliceFinder also captures non-canonical splice sites. In addition, SpliceFinder performs well on the genomic sequences of Drosophila melanogaster, Mus musculus, Rattus, and Danio rerio without retraining. Conclusion: Based on a CNN, we have proposed a new ab initio splice site prediction tool, SpliceFinder, which generates fewer false positives and can detect non-canonical splice sites. Additionally, SpliceFinder is transferable to other species without retraining. The source code and additional materials are available at https://gitlab.deepomics.org/wangruohan/SpliceFinder.
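
The two ideas in this abstract, dimer-based candidate selection and sliding-window scanning, can be illustrated with a minimal sketch. This is not the SpliceFinder implementation: the function names, the one-hot layout, and the window width are assumptions chosen for illustration, and the CNN that would score each window is left out.

```python
import numpy as np

BASES = "ACGT"  # column order of the one-hot encoding (an assumption)

def one_hot(seq):
    """Encode a DNA string as a (len, 4) array; unknown bases such as N stay all-zero."""
    arr = np.zeros((len(seq), 4), dtype=np.float32)
    for i, base in enumerate(seq.upper()):
        j = BASES.find(base)
        if j >= 0:
            arr[i, j] = 1.0
    return arr

def sliding_windows(seq, width, step=1):
    """Yield (center_index, window) for every full-width window along seq."""
    for start in range(0, len(seq) - width + 1, step):
        yield start + width // 2, seq[start:start + width]

def canonical_candidates(seq):
    """Return the start positions of every GT and every AG dinucleotide.

    This is the candidate set that dimer-based tools begin with; by
    construction it cannot contain non-canonical splice sites.
    """
    seq = seq.upper()
    gt = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]
    ag = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]
    return gt, ag

if __name__ == "__main__":
    toy = "CAGGTAAGTCTAGGTT"
    print(canonical_candidates(toy))
    # A window-based scanner would instead encode and score every position.
    for center, window in sliding_windows(toy, width=8):
        encoded = one_hot(window)  # shape (8, 4), ready for a classifier
        print(center, window, encoded.shape)
```

On the 16-base toy sequence, the dimer scan already returns three GT and three AG positions, which shows how quickly pseudo sites accumulate, whereas the sliding window visits every position regardless of the dinucleotide present.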

Journal Article

Frequent pathway mutations of splicing machinery in myelodysplasia

by Shiraishi, Yuichi , Sanada, Masashi , Hofmann, Wolf-Karsten in 631/208/1792 , 631/208/737 , 692/420

2011

Myelodysplastic syndromes and related disorders (myelodysplasia) are a heterogeneous group of myeloid neoplasms showing deregulated blood cell production with evidence of myeloid dysplasia and a predisposition to acute myeloid leukaemia, whose pathogenesis is only incompletely understood. Here we report whole-exome sequencing of 29 myelodysplasia specimens, which unexpectedly revealed novel pathway mutations involving multiple components of the RNA splicing machinery, including U2AF35, ZRSR2, SRSF2 and SF3B1. In a large series analysis, these splicing pathway mutations were frequent (∼45 to ∼85%) in, and highly specific to, myeloid neoplasms showing features of myelodysplasia. Conspicuously, most of the mutations, which occurred in a mutually exclusive manner, affected genes involved in the 3′-splice site recognition during pre-mRNA processing, inducing abnormal RNA splicing and compromised haematopoiesis. Our results provide the first evidence indicating that genetic alterations of the major splicing components could be involved in human pathogenesis, also implicating a novel therapeutic possibility for myelodysplasia.

RNA-splicing defects in blood disorders: Exome sequencing and analysis of myelodysplasia specimens identified frequent non-overlapping alterations in multiple components of the RNA splicing machinery, including mutations in U2AF35, ZRSR2, SRSF2 and SF3B1. Most affected genes are involved in recognition of the 3′ splice site during pre-messenger RNA processing, and are thought to cause abnormal RNA splicing and compromised haematopoiesis. The results demonstrate the role of aberrant splicing in human pathogenesis.

Journal Article

Introme accurately predicts the impact of coding and noncoding variants on gene splicing, with clinical applications

by Sullivan, Patricia J. , Dinger, Marcel E. , McCabe, Mark J. in Alternative splicing , Animal Genetics and Genomics , Artificial intelligence

2023

Predicting the impact of coding and noncoding variants on splicing is challenging, particularly in non-canonical splice sites, leading to missed diagnoses in patients. Existing splice prediction tools are complementary but knowing which to use for each splicing context remains difficult. Here, we describe Introme, which uses machine learning to integrate predictions from several splice detection tools, additional splicing rules, and gene architecture features to comprehensively evaluate the likelihood of a variant impacting splicing. Through extensive benchmarking across 21,000 splice-altering variants, Introme outperformed all tools (auPRC: 0.98) for the detection of clinically significant splice variants. Introme is available at https://github.com/CCICB/introme.

Journal Article

Splicing Enhancers at Intron–Exon Borders Participate in Acceptor Splice Sites Recognition

by Freiberger, Tomáš , Hujová, Pavla , Souček, Přemysl in Alternative Splicing - genetics , Base Sequence - genetics , Binding sites

2020

Acceptor splice site recognition (3′ splice site: 3′ss) is a fundamental step in precursor messenger RNA (pre-mRNA) splicing. Generally, the U2 small nuclear ribonucleoprotein (snRNP) auxiliary factor (U2AF) heterodimer recognizes the 3′ss, of which U2AF35 has a dual function: (i) It binds to the intron–exon border of some 3′ss and (ii) mediates enhancer-binding splicing activators’ interactions with the spliceosome. Alternative mechanisms for 3′ss recognition have been suggested, yet they are still not thoroughly understood. Here, we analyzed 3′ss recognition where the intron–exon border is bound by a ubiquitous splicing regulator SRSF1. Using the minigene analysis of two model exons and their mutants, BRCA2 exon 12 and VARS2 exon 17, we showed that the exon inclusion correlated much better with the predicted SRSF1 affinity than 3′ss quality, which were assessed using the Catalog of Inferred Sequence Binding Preferences of RNA binding proteins (CISBP-RNA) database and maximum entropy algorithm (MaxEnt) predictor and the U2AF35 consensus matrix, respectively. RNA affinity purification proved SRSF1 binding to the model 3′ss. On the other hand, knockdown experiments revealed that U2AF35 also plays a role in these exons’ inclusion. Most probably, both factors stochastically bind the 3′ss, supporting exon recognition, more apparently in VARS2 exon 17. Identifying splicing activators as 3′ss recognition factors is crucial for both a basic understanding of splicing regulation and human genetic diagnostics when assessing variants’ effects on splicing.

Journal Article

Impact of U2-type introns on splice site prediction in A. thaliana species using deep learning

by De Neve, Wesley , Depuydt, Stephen , Van Messem, Arnout in Acceptor sites , Algorithms , Applied Mathematics

2025

Background: Splice site prediction in plant genomes poses substantial challenges that can be addressed using deep learning models. U2-type introns are especially useful for such studies given their ubiquity in plant genomes and the availability of rich datasets. We formulated two hypotheses: one proposing that short introns may enhance prediction effectiveness due to reduced spatial complexity, and another suggesting that sequences with multiple introns provide a richer context for splicing events. Results: Our findings demonstrate that (1) models trained on datasets containing shorter introns achieve improved effectiveness for acceptor splice sites, but not for donor splice sites, indicating a more nuanced relationship between intron length and splice site prediction than initially hypothesized, and (2) models trained on datasets with multiple introns per sequence show higher effectiveness compared to those trained on datasets with a single intron per sequence. Notably, among the 402 bp sequences analyzed, 72% contained single introns while 28% contained multiple introns for donor sites (36,399 versus 13,987 sequences), with similar proportions observed for acceptor sites (37,236 versus 14,112 sequences). These computational insights align with biological observations, particularly regarding the conserved spatial relationship between branch points and acceptor splice sites, as well as the synergistic effects of multiple introns on splicing efficiency. Conclusions: The obtained results contribute to a deeper understanding of how intronic features influence splice site prediction and suggest that future prediction models should consider factors such as intron length, multiplicity, and the spatial arrangement of splice-related signals.
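
The single-intron versus multiple-intron proportions quoted in this abstract can be recomputed from the sequence counts it gives. The short check below uses only those four counts; the helper name `shares` is hypothetical.

```python
def shares(single, multiple):
    """Return the percentage of single- and multiple-intron sequences, to one decimal."""
    total = single + multiple
    return round(100 * single / total, 1), round(100 * multiple / total, 1)

# Counts as stated in the abstract.
donor = shares(36_399, 13_987)      # (72.2, 27.8)
acceptor = shares(37_236, 14_112)   # (72.5, 27.5)

if __name__ == "__main__":
    print("donor sites:   ", donor)
    print("acceptor sites:", acceptor)
```

Both splits round to the reported 72% and 28%, so the counts and the percentages in the abstract are consistent with each other.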

Journal Article

Splam: a deep-learning-based splice site predictor that improves spliced alignments

by Chao, Kuan-Hao , Pertea, Mihaela , Salzberg, Steven L. in Accuracy , Animal Genetics and Genomics , Bioinformatics

2024

The process of splicing messenger RNA to remove introns plays a central role in creating genes and gene variants. We describe Splam, a novel method for predicting splice junctions in DNA using deep residual convolutional neural networks. Unlike previous models, Splam looks at a 400-base-pair window flanking each splice site, reflecting the biological splicing process that relies primarily on signals within this window. Splam also trains on donor and acceptor pairs together, mirroring how the splicing machinery recognizes both ends of each intron. Compared to SpliceAI, Splam is consistently more accurate, achieving 96% accuracy in predicting human splice junctions.
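
As a rough illustration of the windowing described in this abstract, the sketch below cuts a fixed-size window around a donor and an acceptor position and concatenates the two. It is not Splam's code: the even 200/200 split around each site, the `N` padding at sequence ends, and the function names are all assumptions.

```python
def flank(seq, pos, half=200, pad="N"):
    """Return the 2*half bases centred on pos, padding with `pad` past the sequence ends."""
    left = seq[max(0, pos - half):pos]
    right = seq[pos:pos + half]
    return pad * (half - len(left)) + left + right + pad * (half - len(right))

def junction_input(seq, donor, acceptor, half=200):
    """Concatenate the donor and acceptor windows so a model sees both ends of one intron."""
    return flank(seq, donor, half) + flank(seq, acceptor, half)

if __name__ == "__main__":
    toy = "ACGTACGT"
    print(flank(toy, 4, half=2))              # window fully inside the sequence
    print(flank("ACGT", 1, half=3))           # left side padded
    print(len(junction_input(toy, 2, 6)))     # 4 * half with the default half=200
```

Pairing the two windows in one input is what lets a model score a junction as a unit rather than scoring donor and acceptor sites independently.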

Journal Article

Single base-pair substitutions in exon-intron junctions of human genes: nature, distribution, and consequences for mRNA splicing

by Hundrieser, Bernd , Thomas, Nick S.T. , Hampe, Jochen in cryptic splice-site , Databases, Nucleic Acid , DNA Mutational Analysis

2007

Although single base‐pair substitutions in splice junctions constitute at least 10% of all mutations causing human inherited disease, the factors that determine their phenotypic consequences at the RNA level remain to be fully elucidated. Employing a neural network for splice‐site recognition, we performed a meta‐analysis of 478 disease‐associated splicing mutations, in 38 different genes, for which detailed laboratory‐based mRNA phenotype assessment had been performed. Inspection of the ±50‐bp DNA sequence context of the mutations revealed that exon skipping was the preferred phenotype when the immediate vicinity of the affected exon–intron junctions was devoid of alternative splice‐sites. By contrast, in the presence of at least one such motif, cryptic splice‐site utilization became more prevalent. This association was, however, confined to donor splice‐sites. Outside the obligate dinucleotide, the spatial distribution of pathological mutations was found to differ significantly from that of SNPs. Whereas disease‐associated lesions clustered at positions –1 and +3 to +6 for donor sites and –3 for acceptor sites, SNPs were found to be almost evenly distributed over all sequence positions considered. When all putative missense mutations in the vicinity of splice‐sites were extracted from the Human Gene Mutation Database for the 38 studied genes, a significantly higher proportion of changes at donor sites (37/152; 24.3%) than at acceptor splice‐sites (1/142; 0.7%) was found to reduce the neural network signal emitted by the respective splice‐site. Based upon these findings, we estimate that some 1.6% of disease‐causing missense substitutions in human genes are likely to affect the mRNA splicing phenotype. Taken together, our results are consistent with correct donor splice‐site recognition being a key step in exon recognition. Hum Mutat 28(2), 150–158, 2007. © 2006 Wiley‐Liss, Inc.

Journal Article

DRBD3 regulates long non-coding RNA abundance and cryptic splice site selection in trypanosomes

by Gómez-Liñán, Claudia , Pérez-Victoria, José M. , Sánchez-Luque, Francisco J. in Alternative Splicing , Biochemistry , Biomedical and Life Sciences

2025

Trypanosomes are unicellular eukaryotes that rely heavily on post-transcriptional mechanisms to control gene expression. DRBD3 is an RNA-binding protein known to play important roles in mRNA processing, stability, transport and translation. It was found to associate with grumpy, a long non-coding RNA (lncRNA) recently characterized in Trypanosoma brucei. Here, we explore the role of DRBD3 in lncRNA metabolism and show that its depletion leads to the upregulation of a specific subset of approximately one hundred lncRNAs in both bloodstream and procyclic forms, likely through the activation of cryptic splice sites. The effect of DRBD3 depletion on lncRNA expression appears to be mostly indirect, and results from reduced levels of the poly(A) polymerase PAP1 following DRBD3 silencing. In addition to its impact on lncRNAs, DRBD3 loss also affects the processing of protein-coding genes, leading to alternative trans-splicing and protein truncation. Furthermore, we demonstrate that DRBD3 regulates the splicing of the newly identified intron in the transcript encoding the RNA-binding protein RBP20, and is important for maintaining the balance between trans- and cis-splicing. Our results position DRBD3 as a high-level regulatory factor that shapes the expression landscape of both coding and non-coding genes in trypanosomes.

Journal Article

Gene disruption through base editing-induced messenger RNA missplicing in plants

by Li, Jian-Feng , Liang, Jieping , Wang, Fengzhu in Arabidopsis , Arabidopsis - genetics , Base Sequence

2019

Gene knockout tools are highly desirable for basic and applied plant research. Here, we leverage the Cas9-derived cytosine base editor to introduce precise C-to-T mutations to disrupt the highly conserved intron donor site GT or acceptor site AG, thereby inducing messenger RNA (mRNA) missplicing and gene disruption. As proof of concept, we successfully obtained Arabidopsis null mutant of MTA gene in the T₂ generation and rice double null mutant of GL1-1 and NAL1 genes in the T₀ generation by this strategy. Elimination of the original intron donor site or acceptor site could trigger aberrant splicing at a new specific exonic site, but not at the closest GT or AG site, suggesting cryptic rules governing splice site recognition. The strategy presented expands the applications of base editing technologies in plants by providing a new means for gene inactivation without generating DNA double-strand breaks, and it can potentially serve as a useful tool for studying the biology of mRNA splicing.
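
The effect of the edit described in this abstract can be sketched at the sequence level. The sketch assumes the cytosine base editor acts on the antisense strand, so that a C-to-T change there reads as G-to-A on the sense strand, turning a donor GT into AT or an acceptor AG into AA. The function names and toy sequences are hypothetical, and guide design, PAM placement, and the editing window are not modelled.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def antisense_c_to_t(sense, pos):
    """Apply a C-to-T edit to the antisense base paired with sense[pos].

    Returns the resulting sense strand. Raises ValueError when the paired
    antisense base is not C, i.e. when sense[pos] is not G.
    """
    antisense = list(revcomp(sense))
    k = len(sense) - 1 - pos          # index of the paired base on the antisense strand
    if antisense[k] != "C":
        raise ValueError("no cytosine on the antisense strand at this position")
    antisense[k] = "T"
    return revcomp("".join(antisense))

if __name__ == "__main__":
    donor_context = "CAGGTAAGT"       # exon|intron boundary with donor GT at indices 3-4
    acceptor_context = "TTTCAGGC"     # intron|exon boundary with acceptor AG at indices 4-5
    print(antisense_c_to_t(donor_context, 3))     # GT becomes AT
    print(antisense_c_to_t(acceptor_context, 5))  # AG becomes AA
```

In both toy cases a single base change removes the conserved dinucleotide without any double-strand break, which is the property the abstract highlights.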

Journal Article

Crystal structure of human U1 snRNP, a small nuclear ribonucleoprotein particle, reveals the mechanism of 5′ splice site recognition

by Kondo, Yasushi , van Roon, Anne-Marie M , Oubridge, Chris in 5′ splice site , Biophysics and Structural Biology , Crystal structure

2015

U1 snRNP binds to the 5′ exon-intron junction of pre-mRNA and thus plays a crucial role at an early stage of pre-mRNA splicing. We present two crystal structures of engineered U1 sub-structures, which together reveal at atomic resolution an almost complete network of protein–protein and RNA-protein interactions within U1 snRNP, and show how the 5′ splice site of pre-mRNA is recognised by U1 snRNP. The zinc-finger of U1-C interacts with the duplex between pre-mRNA and the 5′-end of U1 snRNA. The binding of the RNA duplex is stabilized by hydrogen bonds and electrostatic interactions between U1-C and the RNA backbone around the splice junction but U1-C makes no base-specific contacts with pre-mRNA. The structure, together with RNA binding assays, shows that the selection of 5′-splice site nucleotides by U1 snRNP is achieved predominantly through basepairing with U1 snRNA whilst U1-C fine-tunes relative affinities of mismatched 5′-splice sites.

Genes are made up of long stretches of DNA. The regions of a gene that code for proteins (known as exons) are interrupted by stretches of non-coding DNA called introns. To produce proteins from a gene, the DNA is ‘transcribed’ to form pre-mRNA molecules, from which the introns must be removed in a process called splicing. The remaining exons are then joined together to form a mature mRNA molecule that contains the instructions to build a protein. Errors in the splicing process can lead to numerous diseases, such as cancer. A molecular machine known as a spliceosome is responsible for splicing the pre-mRNA molecules. This consists of five different complexes called small nuclear ribonucleoprotein particles (snRNPs), which are in turn made up from numerous proteins and RNA molecules. The spliceosome assembles anew every time it splices, and an early step in this assembly process involves the interaction of an snRNP called U1 with the start of an intron in the pre-mRNA. This interaction then stimulates the assembly of the rest of the spliceosome. In 2009, researchers reported the structure of the U1 snRNP, but the structure did not contain enough detail to reveal how the snRNP recognizes the start of an intron. Kondo, Oubridge et al., including some of the researchers involved in the 2009 work, now present the crystal structure of the human version of the U1 snRNP in more detail. High-quality crystal structures of the complete U1 snRNP molecule could not be obtained because the arrangement of the RNA molecules in the snRNP prevented a regular crystal from forming. Kondo, Oubridge et al. instead engineered two subcomponents of U1 snRNP that each crystallized well, and determined their structures. This revealed that the interactions between the various parts of the U1 snRNP form a complex network. A protein present in the U1 snRNP, known as U1-C, had previously been reported to be able to recognize introns on its own, without requiring the complete U1 snRNP. Kondo, Oubridge et al. reveal that this is not the case and that U1-C does not read the intron RNA sequence directly. Instead, U1 snRNP is able to find the start of the intron because the U1 RNA can stably bind to this site. The U1-C protein can however adjust the strength of this binding to ensure that the spliceosome can operate with a variety of intron start sequences (or signals).

Journal Article
