Catalogue Search | MBRL

CI-SpliceAI—Improving machine learning predictions of disease causing splicing variants using curated alternative splice sites

by Baralle, Diana , Niranjan, Mahesan , Strauch, Yaron in Algorithms , Alternative Splicing , Analysis

2022

It is estimated that up to 50% of all disease-causing variants disrupt splicing. Due to its complexity, our ability to predict which variants disrupt splicing is limited, meaning missed diagnoses for patients. The emergence of machine learning for targeted medicine holds great potential to improve prediction of splice-disrupting variants. The recently published SpliceAI algorithm utilises deep neural networks and has been reported to have greater accuracy than other commonly used methods. The original SpliceAI was trained on splice sites included in primary isoforms combined with novel junctions observed in GTEx data, which might introduce noise and de-correlate the machine learning input with its output. Limiting the data to only validated and manually annotated primary and alternatively spliced GENCODE sites in training may improve predictive abilities. All of these gene isoforms were collapsed (aggregated into one pseudo-isoform) and the SpliceAI architecture was retrained (CI-SpliceAI). Predictive performance on a newly curated dataset of 1,316 functionally validated variants from the literature was compared with the original SpliceAI, alongside MMSplice, MaxEntScan, and SQUIRLS. Both SpliceAI algorithms outperformed the other methods, with the original SpliceAI achieving an accuracy of ∼91%, and CI-SpliceAI showing an improvement at ∼92% overall. Predictive accuracy increased in the majority of curated variants. We show that including only manually annotated alternatively spliced sites in training data improves prediction of clinically relevant variants, and highlight avenues for further performance improvements.
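
The abstract describes collapsing all isoforms of a gene into one pseudo-isoform. The sketch below is one possible reading of that step, not the authors' implementation: it takes the union of donor and acceptor positions across isoforms. The `Isoform` class, the coordinates, and the plus-strand, inclusive-coordinate convention are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Isoform:
    """Hypothetical container: exons as (start, end) pairs on the plus strand."""
    name: str
    exons: tuple[tuple[int, int], ...]


def splice_sites(isoform: Isoform) -> tuple[set[int], set[int]]:
    """Return (donors, acceptors) for one isoform.

    Assumption: a donor is the last base of every exon except the final one,
    and an acceptor is the first base of every exon except the first one.
    """
    exons = sorted(isoform.exons)
    donors = {end for _, end in exons[:-1]}
    acceptors = {start for start, _ in exons[1:]}
    return donors, acceptors


def collapse(isoforms: list[Isoform]) -> tuple[list[int], list[int]]:
    """Aggregate the splice sites of all isoforms into one pseudo-isoform."""
    donors: set[int] = set()
    acceptors: set[int] = set()
    for isoform in isoforms:
        d, a = splice_sites(isoform)
        donors |= d
        acceptors |= a
    return sorted(donors), sorted(acceptors)


# Toy gene with a primary isoform and one alternative acceptor site.
primary = Isoform("primary", ((100, 200), (300, 400), (500, 600)))
alternative = Isoform("alternative", ((100, 200), (320, 400), (500, 600)))
donors, acceptors = collapse([primary, alternative])
```

In this toy example the alternative acceptor at position 320 is kept alongside the primary acceptor at 300, so both would be labelled as splice sites in the training data.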

Journal Article

Recommendations for clinical interpretation of variants found in non-coding regions of the genome

by Heidi L. Rehm , Diana Baralle , Richard D. Bagnall in Binding sites , Bioinformatics , Biomedical and Life Sciences

2022

Background The majority of clinical genetic testing focuses almost exclusively on regions of the genome that directly encode proteins. The important role of variants in non-coding regions in penetrant disease is, however, increasingly being demonstrated, and the use of whole genome sequencing in clinical diagnostic settings is rising across a large range of genetic disorders. Despite this, there is no existing guidance on how current guidelines designed primarily for variants in protein-coding regions should be adapted for variants identified in other genomic contexts. Methods We convened a panel of nine clinical and research scientists with wide-ranging expertise in clinical variant interpretation, with specific experience in variants within non-coding regions. This panel discussed and refined an initial draft of the guidelines which were then extensively tested and reviewed by external groups. Results We discuss considerations specifically for variants in non-coding regions of the genome. We outline how to define candidate regulatory elements, highlight examples of mechanisms through which non-coding region variants can lead to penetrant monogenic disease, and outline how existing guidelines can be adapted for the interpretation of these variants. Conclusions These recommendations aim to increase the number and range of non-coding region variants that can be clinically interpreted, which, together with a compatible phenotype, can lead to new diagnoses and catalyse the discovery of novel disease mechanisms.

Journal Article

Prenatal exome sequencing analysis in fetal structural anomalies detected by ultrasonography (PAGE): a cohort study

by Newbury-Ecob, Ruth , Westwood, Paul , Carey, Georgina in Abnormal Karyotype - embryology , Abnormal Karyotype - statistics & numerical data , Abnormalities

2019

Fetal structural anomalies, which are detected by ultrasonography, have a range of genetic causes, including chromosomal aneuploidy, copy number variations (CNVs; which are detectable by chromosomal microarrays), and pathogenic sequence variants in developmental genes. Testing for aneuploidy and CNVs is routine during the investigation of fetal structural anomalies, but there is little information on the clinical usefulness of genome-wide next-generation sequencing in the prenatal setting. We therefore aimed to evaluate the proportion of fetuses with structural abnormalities that had identifiable variants in genes associated with developmental disorders when assessed with whole-exome sequencing (WES). In this prospective cohort study, two groups in Birmingham and London recruited patients from 34 fetal medicine units in England and Scotland. We used whole-exome sequencing (WES) to evaluate the presence of genetic variants in developmental disorder genes (diagnostic genetic variants) in a cohort of fetuses with structural anomalies and samples from their parents, after exclusion of aneuploidy and large CNVs. Women were eligible for inclusion if they were undergoing invasive testing for identified nuchal translucency or structural anomalies in their fetus, as detected by ultrasound after 11 weeks of gestation. The partners of these women also had to consent to participate. Sequencing results were interpreted with a targeted virtual gene panel for developmental disorders that comprised 1628 genes. Genetic results related to fetal structural anomaly phenotypes were then validated and reported postnatally. The primary endpoint, which was assessed in all fetuses, was the detection of diagnostic genetic variants considered to have caused the fetal developmental anomaly. The cohort was recruited between Oct 22, 2014, and June 29, 2017, and clinical data were collected until March 31, 2018. 
After exclusion of fetuses with aneuploidy and CNVs, 610 fetuses with structural anomalies and 1202 matched parental samples (analysed as 596 fetus-parental trios, including two sets of twins, and 14 fetus-parent dyads) were analysed by WES. After bioinformatic filtering and prioritisation according to allele frequency and effect on protein and inheritance pattern, 321 genetic variants (representing 255 potential diagnoses) were selected as potentially pathogenic genetic variants (diagnostic genetic variants), and these variants were reviewed by a multidisciplinary clinical review panel. A diagnostic genetic variant was identified in 52 (8·5%; 95% CI 6·4–11·0) of 610 fetuses assessed and an additional 24 (3·9%) fetuses had a variant of uncertain significance that had potential clinical usefulness. Detection of diagnostic genetic variants enabled us to distinguish between syndromic and non-syndromic fetal anomalies (eg, congenital heart disease only vs a syndrome with congenital heart disease and learning disability). Diagnostic genetic variants were present in 22 (15·4%) of 143 fetuses with multisystem anomalies (ie, more than one fetal structural anomaly), nine (11·1%) of 81 fetuses with cardiac anomalies, and ten (15·4%) of 65 fetuses with skeletal anomalies; these phenotypes were most commonly associated with diagnostic variants. However, diagnostic genetic variants were least common in fetuses with isolated increased nuchal translucency (≥4·0 mm) in the first trimester (in three [3·2%] of 93 fetuses). WES facilitates genetic diagnosis of fetal structural anomalies, which enables more accurate predictions of fetal prognosis and risk of recurrence in future pregnancies. However, the overall detection of diagnostic genetic variants in a prospectively ascertained cohort with a broad range of fetal structural anomalies is lower than that suggested by previous smaller-scale studies of fewer phenotypes. 
WES improved the identification of genetic disorders in fetuses with structural abnormalities; however, before clinical implementation, careful consideration should be given to case selection to maximise clinical usefulness. Funding: UK Department of Health and Social Care and The Wellcome Trust.
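
The headline diagnostic yield (52 of 610 fetuses) can be checked with a short calculation. The sketch below uses the Wilson score interval, which is an assumption: the abstract does not say which method produced the reported 95% CI, so the bounds computed here may differ slightly from the published 6·4–11·0.

```python
import math


def wilson_interval(successes: int, total: int, z: float = 1.96):
    """Return (proportion, lower, upper) using the Wilson score interval.

    z = 1.96 corresponds to an approximate 95% confidence level.
    """
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return p, centre - half_width, centre + half_width


# Counts taken from the abstract: 52 diagnostic variants among 610 fetuses.
rate, low, high = wilson_interval(52, 610)
print(f"yield = {rate:.1%}, 95% CI {low:.1%} to {high:.1%}")
```

The same function can be applied to the subgroup counts in the abstract (for example 22 of 143 fetuses with multisystem anomalies) to see how much wider the intervals become for smaller groups.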

Journal Article

Blood RNA analysis can increase clinical diagnostic rate and resolve variants of uncertain significance

by Baralle, Diana , Kelly, Hugh , Douglas, Andrew G. L. in Biomedical and Life Sciences , Biomedicine , Computational Biology

2020

Purpose Diagnosis of genetic disorders is hampered by large numbers of variants of uncertain significance (VUSs) identified through next-generation sequencing. Many such variants may disrupt normal RNA splicing. We examined effects on splicing of a large cohort of clinically identified variants and compared performance of bioinformatic splicing prediction tools commonly used in diagnostic laboratories. Methods Two hundred fifty-seven variants (coding and noncoding) were referred for analysis across three laboratories. Blood RNA samples underwent targeted reverse transcription polymerase chain reaction (RT-PCR) analysis with Sanger sequencing of PCR products and agarose gel electrophoresis. Seventeen samples also underwent transcriptome-wide RNA sequencing with targeted splicing analysis based on Sashimi plot visualization. Bioinformatic splicing predictions were obtained using Alamut, HSF 3.1, and SpliceAI software. Results Eighty-five variants (33%) were associated with abnormal splicing. The most frequent abnormality was upstream exon skipping (39/85 variants), which was most often associated with splice donor region variants. SpliceAI had greatest accuracy in predicting splicing abnormalities (0.91) and outperformed other tools in sensitivity and specificity. Conclusion Splicing analysis of blood RNA identifies diagnostically important splicing abnormalities and clarifies functional effects of a significant proportion of VUSs. Bioinformatic predictions are improving but still make significant errors. RNA analysis should therefore be routinely considered in genetic disease diagnostics.
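
The abstract compares prediction tools by accuracy, sensitivity, and specificity. As a reminder of how those three figures relate, here is a minimal sketch computed from paired labels. The eight toy labels are invented for illustration and are not data from the study; `confusion_metrics` is a hypothetical helper name.

```python
def confusion_metrics(truth, predicted):
    """Compute accuracy, sensitivity, and specificity from boolean labels.

    truth[i] is True when RNA analysis showed abnormal splicing;
    predicted[i] is True when the tool predicted a splicing abnormality.
    """
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)          # true positives
    tn = sum(1 for t, p in pairs if not t and not p)  # true negatives
    fp = sum(1 for t, p in pairs if not t and p)      # false positives
    fn = sum(1 for t, p in pairs if t and not p)      # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Invented toy labels (not study data): 3 abnormal and 5 normal variants.
truth = [True, True, True, False, False, False, False, False]
predicted = [True, True, False, False, False, False, False, True]
metrics = confusion_metrics(truth, predicted)
```

Because only about a third of variants in the cohort were associated with abnormal splicing, accuracy alone can hide poor sensitivity, which is why the abstract reports all three measures.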

Journal Article

Rare coding variants in the phospholipase D3 gene confer risk for Alzheimer’s disease

by Ryten, Mina , Skorupa, Tara , Sassi, Celeste in 13/106 , 13/44 , 14/1

2014

Whole-exome sequencing reveals that a rare variant of phospholipase D3 (PLD3(V232M)) segregates with Alzheimer’s disease status in two independent families and doubles risk for the disease in case–control series, and that several other PLD3 variants increase risk for Alzheimer’s disease in African Americans and people of European descent. New genetic risk variant for Alzheimer's disease: The identification of mutations causing Alzheimer's disease in amyloid-β precursor protein, presenilin 1 and presenilin 2 led to a better understanding of the pathobiology of the condition. Further mutations are expected to be implicated, but the identification of such variants has been challenging. These authors used exome sequencing to identify low-frequency coding variants with large effects on late-onset Alzheimer's disease. They report several coding variants in the gene PLD3, coding for phospholipase D3, that increase disease risk at least twofold. PLD3 may have a role in the processing of amyloid-β and may have potential as a novel therapeutic target. Genome-wide association studies (GWAS) have identified several risk variants for late-onset Alzheimer's disease (LOAD) [1, 2]. These common variants have replicable but small effects on LOAD risk and generally do not have obvious functional effects. Low-frequency coding variants, not detected by GWAS, are predicted to include functional variants with larger effects on risk. To identify low-frequency coding variants with large effects on LOAD risk, we carried out whole-exome sequencing (WES) in 14 large LOAD families and follow-up analyses of the candidate variants in several large LOAD case–control data sets. A rare variant in PLD3 (phospholipase D3; Val232Met) segregated with disease status in two independent families and doubled risk for Alzheimer’s disease in seven independent case–control series with a total of more than 11,000 cases and controls of European descent.
Gene-based burden analyses in 4,387 cases and controls of European descent and 302 African American cases and controls, with complete sequence data for PLD3, reveal that several variants in this gene increase risk for Alzheimer’s disease in both populations. PLD3 is highly expressed in brain regions that are vulnerable to Alzheimer’s disease pathology, including hippocampus and cortex, and is expressed at significantly lower levels in neurons from Alzheimer’s disease brains compared to control brains. Overexpression of PLD3 leads to a significant decrease in intracellular amyloid-β precursor protein (APP) and extracellular Aβ42 and Aβ40 (the 42- and 40-residue isoforms of the amyloid-β peptide), and knockdown of PLD3 leads to a significant increase in extracellular Aβ42 and Aβ40. Together, our genetic and functional data indicate that carriers of PLD3 coding variants have a twofold increased risk for LOAD and that PLD3 influences APP processing. This study provides an example of how densely affected families may help to identify rare variants with large effects on risk for disease or other complex traits.

Journal Article

Discovery of four recessive developmental disorders using probabilistic genotype and phenotype matching among 4,125 families

by Cole, Trevor , Brady, Angela F , Lees, Melissa in 45/23 , 631/208/1516 , 631/208/2489

2015

Matthew Hurles, David FitzPatrick and colleagues report the discovery of four novel Mendelian disorders based on their analysis of exome sequence data from 4,125 families with diverse rare developmental disorders. They present their analytical pipeline as a general strategy for the discovery of genetic causes of autosomal recessive disorders. Discovery of most autosomal recessive disease-associated genes has involved analysis of large, often consanguineous multiplex families or small cohorts of unrelated individuals with a well-defined clinical condition. Discovery of new dominant causes of rare, genetically heterogeneous developmental disorders has been revolutionized by exome analysis of large cohorts of phenotypically diverse parent-offspring trios [1, 2]. Here we analyzed 4,125 families with diverse, rare and genetically heterogeneous developmental disorders and identified four new autosomal recessive disorders. These four disorders were identified by integrating Mendelian filtering (selecting probands with rare, biallelic and putatively damaging variants in the same gene) with statistical assessments of (i) the likelihood of sampling the observed genotypes from the general population and (ii) the phenotypic similarity of patients with recessive variants in the same candidate gene. This new paradigm promises to catalyze the discovery of novel recessive disorders, especially those with less consistent or nonspecific clinical presentations and those caused predominantly by compound heterozygous genotypes.

Journal Article

Identification of diagnostic candidates in Mendelian disorders using an RNA sequencing-centric approach

by Baralle, Diana , Hunt, David , Bunyan, David J. in Alternative Splicing , Analysis , Bioinformatics

2024

Background RNA sequencing (RNA-seq) is increasingly being used as a complementary tool to DNA sequencing in diagnostics where DNA analysis has been uninformative. RNA-seq enables the identification of aberrant splicing and aberrant gene expression, improving the interpretation of variants of unknown significance (VUSs), and provides the opportunity to scan the transcriptome for aberrant splicing and expression in relevant genes that may be the cause of a patient’s phenotype. This work aims to investigate the feasibility of generating new diagnostic candidates in patients without a previously reported VUS using an RNA-seq-centric approach. Methods We systematically assessed the transcriptomic profiles of 86 patients with suspected Mendelian disorders, 38 of whom had no candidate sequence variant, using RNA from blood samples. Each VUS was visually inspected to search for splicing abnormalities. Once aberrant splicing was identified in cases with VUS, multiple open-source alternative splicing tools were used to investigate if they would identify what was observed in IGV. Expression outliers were detected using OUTRIDER. Diagnoses in cases without a VUS were explored using two separate strategies. Results RNA-seq allowed us to assess 71% of VUSs, detecting aberrant splicing in 14/48 patients with a VUS. We identified four new diagnoses by detecting novel aberrant splicing events in patients with no candidate sequence variants from prior DNA testing (n = 32) or where the candidate VUS did not affect splicing (n = 23). An additional diagnosis was made through the detection of skewed X-inactivation. Conclusion This work demonstrates the utility of an RNA-centric approach in identifying novel diagnoses in patients without candidate VUSs. It underscores the utility of blood-based RNA analysis in improving diagnostic yields and highlights optimal approaches for such analyses.

Journal Article

A systematic analysis of splicing variants identifies new diagnoses in the 100,000 Genomes Project

by Blakes, Alexander J. M. , Baralle, Diana , Douglas, Andrew G. L. in Analysis , Annotations , Artificial intelligence

2022

Background Genomic variants which disrupt splicing are a major cause of rare genetic diseases. However, variants which lie outside of the canonical splice sites are difficult to interpret clinically. Improving the clinical interpretation of non-canonical splicing variants offers a major opportunity to uplift diagnostic yields from whole genome sequencing data. Methods Here, we examine the landscape of splicing variants in whole-genome sequencing data from 38,688 individuals in the 100,000 Genomes Project and assess the contribution of non-canonical splicing variants to rare genetic diseases. We use a variant-level constraint metric (the mutability-adjusted proportion of singletons) to identify constrained functional variant classes near exon–intron junctions and at putative splicing branchpoints. To identify new diagnoses for individuals with unsolved rare diseases in the 100,000 Genomes Project, we identified individuals with de novo single-nucleotide variants near exon–intron boundaries and at putative splicing branchpoints in known disease genes. We identified candidate diagnostic variants through manual phenotype matching and confirmed new molecular diagnoses through clinical variant interpretation and functional RNA studies. Results We show that near-splice positions and splicing branchpoints are highly constrained by purifying selection and harbour potentially damaging non-coding variants which are amenable to systematic analysis in sequencing data. From 258 de novo splicing variants in known rare disease genes, we identify 35 new likely diagnoses in probands with an unsolved rare disease. To date, we have confirmed a new diagnosis for six individuals, including four in whom RNA studies were performed. Conclusions Overall, we demonstrate the clinical value of examining non-canonical splicing variants in individuals with unsolved rare diseases.

Journal Article

The epigenetic landscape of Alzheimer's disease

by Cruchaga, Carlos , Lord, Jenny in 38/47 , 45/43 , 631/1647/2210/2213

2014

Two independent epigenome-wide association studies of Alzheimer's disease cohorts have identified overlapping methylation signals in four loci, ANK1, RPL13, RHBDF2 and CDH23, not previously associated with Alzheimer's disease. These studies also suggest that epigenetic changes contribute more to Alzheimer's disease than expected.

Journal Article

Whole genome sequencing in the diagnosis of primary ciliary dyskinesia

by Green, Ben , Hunt, David , Mennella, Vito in Adolescent , Adult , Biomedical and Life Sciences

2021

Background It is estimated that 1–13% of cases of bronchiectasis in adults globally are attributable to primary ciliary dyskinesia (PCD) but many adult patients with bronchiectasis have not been investigated for PCD. PCD is a disorder caused by mutations in genes required for motile cilium structure or function, resulting in impaired mucociliary clearance. Symptoms appear in infancy but diagnosis is often late or missed, often due to the lack of a “gold standard” diagnostic tool and non-specific symptoms. Mutations in >50 genes account for around 70% of cases, with additional genes, and non-coding, synonymous, missense changes or structural variants (SVs) in known genes presumed to account for the missing heritability. Methods UK patients with no identified genetic confirmation for the cause of their PCD or bronchiectasis were eligible for whole genome sequencing (WGS) in the Genomics England Ltd 100,000 Genomes Project. 21 PCD probands and 52 non-cystic fibrosis (CF) bronchiectasis probands were recruited in Wessex Genome Medicine Centre (GMC). We carried out analysis of single nucleotide variants (SNVs) and SVs in all families recruited in Wessex GMC. Results 16/21 probands in the PCD cohort received a confirmed (n = 9), probable (n = 4) or possible (n = 3) diagnosis from WGS, although 13/16 of these could have been picked up by current standard of care gene panel testing. In the other cases, SVs were identified which were missed by panel testing. We identified variants in novel PCD candidate genes (IFT140 and PLK4) in 2 probands in the PCD cohort. 3/52 probands in the non-CF bronchiectasis cohort received a confirmed (n = 2) or possible (n = 1) diagnosis of PCD. We identified variants in novel PCD candidate genes (CFAP53 and CEP164) in 2 further probands in the non-CF bronchiectasis cohort. Conclusions Genetic testing is an important component of diagnosing PCD, especially in cases of atypical disease history.
WGS is effective in cases where prior gene panel testing has found no variants or only heterozygous variants. In these cases it may detect SVs and is a powerful tool for novel gene discovery.

Journal Article
