Catalogue Search | MBRL

Benchmarking splice variant prediction algorithms using massively parallel splicing assays

by Kitzman, Jacob O.; Smith, Cathy in Algorithms; Alternative splicing; Animal Genetics and Genomics

2023

Background: Variants that disrupt mRNA splicing account for a sizable fraction of the pathogenic burden in many genetic disorders, but identifying splice-disruptive variants (SDVs) beyond the essential splice site dinucleotides remains difficult. Computational predictors are often discordant, compounding the challenge of variant interpretation. Because they are primarily validated using clinical variant sets heavily biased toward known canonical splice site mutations, it remains unclear how well their performance generalizes.

Results: We benchmark eight widely used splicing effect prediction algorithms, leveraging massively parallel splicing assays (MPSAs) as a source of experimentally determined ground truth. MPSAs simultaneously assay many variants to nominate candidate SDVs. We compare experimentally measured splicing outcomes with bioinformatic predictions for 3,616 variants in five genes. The algorithms' concordance with MPSA measurements, and with each other, is lower for exonic than for intronic variants, underscoring the difficulty of identifying missense or synonymous SDVs. Deep learning-based predictors trained on gene model annotations achieve the best overall performance at distinguishing disruptive from neutral variants, and, controlling for overall call rate genome-wide, SpliceAI and Pangolin have superior sensitivity. Finally, our results highlight two practical considerations when scoring variants genome-wide: finding an optimal score cutoff, and the substantial variability introduced by differences in gene model annotation. We suggest strategies for optimal splice effect prediction in the face of these issues.

Conclusion: SpliceAI and Pangolin show the best overall performance among the predictors tested; however, improvements in splice effect prediction are still needed, especially within exons.
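
The abstract names finding an optimal score cutoff as a practical problem when scoring variants genome-wide. As an illustration only, the sketch below treats MPSA calls as ground-truth labels and picks the cutoff that maximizes Youden's J (sensitivity + specificity - 1). Youden's J is an assumed criterion chosen for this example, not necessarily the strategy the authors recommend, and `sens_spec` and `best_cutoff_youden` are hypothetical helper names.

```python
from typing import Sequence, Tuple


def sens_spec(scores: Sequence[float], is_sdv: Sequence[bool],
              cutoff: float) -> Tuple[float, float]:
    """Sensitivity and specificity when variants scoring >= cutoff are called splice-disruptive."""
    pairs = list(zip(scores, is_sdv))
    tp = sum(1 for s, y in pairs if y and s >= cutoff)
    fn = sum(1 for s, y in pairs if y and s < cutoff)
    tn = sum(1 for s, y in pairs if not y and s < cutoff)
    fp = sum(1 for s, y in pairs if not y and s >= cutoff)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity


def best_cutoff_youden(scores: Sequence[float], is_sdv: Sequence[bool]) -> float:
    """Observed score that maximizes Youden's J; ties go to the lowest such score."""
    def youden_j(cutoff: float) -> float:
        sensitivity, specificity = sens_spec(scores, is_sdv, cutoff)
        return sensitivity + specificity - 1

    # max() returns the first maximal element, i.e. the lowest cutoff among ties.
    return max(sorted(set(scores)), key=youden_j)


# Toy example: hypothetical predictor scores; True = disruptive in a hypothetical MPSA.
scores = [0.05, 0.1, 0.3, 0.6, 0.8, 0.9]
labels = [False, False, False, True, True, True]
print(best_cutoff_youden(scores, labels))  # 0.6
```

Real predictor scores for disruptive and neutral variants overlap far more than in this toy set, which is one reason a cutoff tuned on one variant set may not transfer cleanly to another.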

Journal Article

Fragment Length of Circulating Tumor DNA

by Hellwig, Sabine; Rostomily, Robert C.; Baker, Daniel N. in Alleles; Animals; Biology and life sciences

2016

Malignant tumors shed DNA into the circulation. The transient half-life of circulating tumor DNA (ctDNA) may make it possible to diagnose cancer, monitor recurrence, and evaluate response to therapy through a non-invasive blood draw alone. However, detecting ctDNA against the normal background of cell-free DNA derived from healthy cells has proven challenging, particularly in non-metastatic solid tumors. In this study, we define distinct differences in fragment length between ctDNA and normal cell-free DNA. Human ctDNA in rat plasma, derived from human glioblastoma multiforme stem-like cells in the rat brain and from human hepatocellular carcinoma in the rat flank, had a shorter principal fragment length than the background rat cell-free DNA (134-144 bp vs. 167 bp, respectively). We subsequently identified a similar shift in ctDNA fragment length in humans with melanoma and lung cancer compared with healthy controls. Comparing cell-free DNA fragment lengths between a melanoma patient and healthy controls showed that the BRAF V600E mutant allele occurred more commonly at a shorter fragment length than the wild-type allele (132-145 bp vs. 165 bp, respectively). Moreover, size selection for shorter cell-free DNA fragments substantially increased the EGFR T790M mutant allele frequency in human lung cancer. These findings provide compelling evidence that experimental or bioinformatic isolation of a specific subset of fragment lengths from cell-free DNA may improve ctDNA detection.
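
The bioinformatic version of size selection can be sketched in a few lines. The example below computes the mutant allele fraction among all fragments and among fragments inside a short-length window. The fragment list is made-up toy data, and the 130-150 bp window is an arbitrary illustrative choice loosely based on the fragment lengths reported in the abstract, not a published parameter.

```python
from typing import Iterable, Tuple

# (fragment length in bp, does the fragment carry the mutant allele?)
Fragment = Tuple[int, bool]


def mutant_allele_fraction(fragments: Iterable[Fragment],
                           min_len: int = 0, max_len: int = 10**9) -> float:
    """Mutant allele fraction among fragments with min_len <= length <= max_len."""
    selected = [is_mutant for length, is_mutant in fragments
                if min_len <= length <= max_len]
    if not selected:
        return 0.0
    return sum(selected) / len(selected)  # True counts as 1, False as 0


# Hypothetical toy fragments, not data from the study.
fragments = [(138, True), (140, True), (143, False), (165, True),
             (166, False), (167, False), (168, False), (170, False)]
print(mutant_allele_fraction(fragments))            # 0.375 (3 of 8)
print(mutant_allele_fraction(fragments, 130, 150))  # 2 of 3 fragments
```

Restricting to the short window raises the mutant fraction in this toy set because the mutant-bearing fragments were deliberately placed there; with real data the gain depends on how strongly the two length distributions separate.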

Journal Article

Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions

by Burton, Joshua N; Kitzman, Jacob O; Shendure, Jay in 631/1647/514/2254; 631/208/212/2302; 631/208/69

2013

Short sequencing reads are scaffolded into chromosome-scale sequences with the help of chromatin-interaction data.

Genomes assembled de novo from short reads are highly fragmented relative to the finished chromosomes of Homo sapiens and key model organisms generated by the Human Genome Project. To address this problem, we need scalable, cost-effective methods to obtain assemblies with chromosome-scale contiguity. Here we show that genome-wide chromatin interaction data sets, such as those generated by Hi-C, are a rich source of long-range information for assigning, ordering and orienting genomic sequences to chromosomes, including across centromeres. To exploit this finding, we developed an algorithm that uses Hi-C data for ultra-long-range scaffolding of de novo genome assemblies. We demonstrate the approach by combining shotgun fragment and short jump mate-pair sequences with Hi-C data to generate chromosome-scale de novo assemblies of the human, mouse and Drosophila genomes. For the human genome, the approach achieves 98% accuracy in assigning scaffolds to chromosome groups and 99% accuracy in ordering and orienting scaffolds within chromosome groups. Hi-C data can also be used to validate chromosomal translocations in cancer genomes.
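
The abstract does not spell out how scaffolds are assigned to chromosome groups. As a hedged illustration of the underlying idea, that Hi-C contacts are far denser within a chromosome than between chromosomes, the sketch below groups contigs by single-linkage over Hi-C link counts using union-find. The threshold `min_links`, the contig names, and the omission of any normalization for contig length or restriction-site density are simplifying assumptions; this is not the published algorithm.

```python
from typing import Dict, List, Tuple


def group_contigs(contigs: List[str],
                  links: Dict[Tuple[str, str], int],
                  min_links: int) -> List[List[str]]:
    """Single-linkage grouping: contigs joined by >= min_links Hi-C read pairs
    (directly or through other contigs) end up in the same putative chromosome group."""
    parent = {c: c for c in contigs}

    def find(c: str) -> str:
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving keeps trees shallow
            c = parent[c]
        return c

    for (a, b), n_links in links.items():
        if n_links >= min_links:
            parent[find(a)] = find(b)  # merge the two groups

    groups: Dict[str, List[str]] = {}
    for c in contigs:
        groups.setdefault(find(c), []).append(c)
    return sorted(groups.values())


# Hypothetical link counts between five contigs.
links = {("ctg1", "ctg2"): 50, ("ctg2", "ctg3"): 40,
         ("ctg4", "ctg5"): 30, ("ctg3", "ctg4"): 2}
print(group_contigs(["ctg1", "ctg2", "ctg3", "ctg4", "ctg5"], links, min_links=10))
# [['ctg1', 'ctg2', 'ctg3'], ['ctg4', 'ctg5']]
```

Single linkage is easy to follow but fragile: one spurious high-count link can merge two chromosomes, which is why real pipelines normalize link counts and use more conservative clustering before ordering and orienting contigs within each group.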

Journal Article

Haplotype-resolved genome sequencing: experimental methods and applications

by Shendure, Jay; Kitzman, Jacob O.; Adey, Andrew in 45/23; 631/1647/514/2254; 631/208/212/2301

2015

Key Points:
- Haplotypes link together (that is, 'phase') groups of genetic variants that co-occur on single chromosomes. Although haplotypes have an important role in clinical genetics and association studies, they are not typically obtained by contemporary genotyping or sequencing technologies and must be determined separately.
- Inferential methods for haplotype determination perform fairly poorly for the rare and private variants implicated in many genetic diseases. To phase this class of variants accurately and comprehensively, direct experimental methods are needed.
- Dense haplotyping methods comprehensively phase variants into haplotype blocks at the scale of a single gene or a small number of genes and their regulatory regions. Contiguity is defined within each block but not between adjacent or distant haplotype blocks.
- Sparse haplotyping methods phase a more modest number of distant variants distributed along an entire chromosome or chromosome arm. The resulting haplotypes are not comprehensive but have long-range contiguity that dense methods cannot currently attain.
- Reference panels of previously ascertained haplotypes can be used to correct errors in, or increase the density or contiguity of, directly obtained haplotypes. Such hybrid approaches yield improved haplotypes at low cost.
- Although contiguity metrics are typically used to compare haplotype assemblies, comprehensive comparisons should also include measures of the accuracy, density and allele frequency spectrum of the phased variants.

High-throughput DNA sequencing technologies are providing an ever-expanding wealth of genome sequence data, including detailed information on human genetic variation. However, such data typically lack haplotype information (that is, the cis-connectivity of variants along individual chromosomes). This Review describes diverse recent experimental methods by which genetic variants can be resolved into haplotypes, the accompanying computational methods, and important applications of these methods in genomics and biomedical science.

Human genomes are diploid, and their complete description and interpretation require not only discovering the variation they contain but also arranging it onto chromosomal haplotypes. Although whole-genome sequencing is becoming increasingly routine, nearly all such individual genomes remain largely unresolved with respect to haplotype, particularly for rare alleles, which inferential methods resolve poorly. Here, we review emerging technologies for experimentally resolving (that is, 'phasing') haplotypes across individual whole-genome sequences. We also discuss computational methods relevant to their implementation, metrics for assessing their accuracy and completeness, and the relevance of haplotype information to applications of genome sequencing in research and clinical medicine.
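
One widely used contiguity metric for haplotype (and genome) assemblies is N50: the length L such that blocks at least L long together cover at least half of the total phased span. A minimal sketch, assuming block lengths are given in base pairs; the function name `n50` and the example lengths are illustrative. As the key points stress, N50 says nothing about phasing accuracy, variant density or allele frequency spectrum, so it should be reported alongside those measures.

```python
from typing import Sequence


def n50(block_lengths: Sequence[int]) -> int:
    """Largest length L such that blocks of length >= L cover at least half of the total span."""
    total = sum(block_lengths)
    covered = 0
    # Walk blocks from longest to shortest until half the total span is covered.
    for length in sorted(block_lengths, reverse=True):
        covered += length
        if 2 * covered >= total:
            return length
    return 0  # no blocks


# Hypothetical haplotype block lengths in bp.
print(n50([100_000, 200_000, 300_000, 400_000]))  # 300000
```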

Journal Article

Saturation-scale functional evidence supports clinical variant interpretation in Lynch syndrome

by Chamberlin, Adam; Karam, Rachid; Kitzman, Jacob O. in Animal Genetics and Genomics; Bioinformatics; Biomedical and Life Sciences

2022

Background: Lynch syndrome (LS) is a cancer predisposition syndrome affecting more than 1 in every 300 individuals worldwide. Clinical genetic testing for LS can be life-saving but is complicated by the heavy burden of variants of uncertain significance (VUS), especially missense changes.

Results: To address this challenge, we leverage a multiplexed analysis of variant effect (MAVE) map covering >94% of the 17,746 possible missense variants in the key LS gene MSH2. To establish this map's utility in large-scale variant reclassification, we overlay it on clinical databases of >15,000 individuals with LS gene variants uncovered during clinical genetic testing. We validate these functional measurements in a cohort of individuals with paired tumor-normal test results and find that MAVE-based function scores agree with the clinical interpretation for every MSH2 missense variant with an available classification. We use these scores to attempt reclassification of 682 unique missense VUS, of which 34 scored as deleterious in our function map, in line with previously published rates for other cancer predisposition genes. Combining functional data with other evidence, ten missense VUS are reclassified as pathogenic/likely pathogenic, and another 497 could be moved to benign/likely benign. Finally, we apply these functional scores to paired tumor-normal genetic tests and identify a subset of patients with biallelic somatic loss of function, reflecting a sporadic Lynch-like syndrome with distinct implications for treatment and for relatives' risk.

Conclusion: This study demonstrates how high-throughput functional assays can enable scalable VUS resolution and prospectively generate strong evidence for variant classification.
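
The figure of 17,746 possible missense variants is consistent with simple arithmetic if, as assumed here, the canonical human MSH2 protein is 934 residues long and each residue can be substituted by any of the other 19 amino acids (stop codons excluded). The protein length is an assumption not stated in the abstract, and the helper name is hypothetical.

```python
def possible_missense_variants(protein_length_aa: int) -> int:
    """Single amino acid substitutions: each residue to any of the 19 other amino acids."""
    return protein_length_aa * 19


# Assumption: canonical human MSH2 is 934 residues long (not stated in the abstract).
total = possible_missense_variants(934)
covered_at_least = int(0.94 * total)  # ">94%" of all possible missense variants, rounded down
print(total, covered_at_least)  # 17746 16681
```

So ">94%" coverage corresponds to functional scores for more than about 16,681 distinct missense variants.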

Journal Article

Haplotypes drop by drop

by Kitzman, Jacob O in 631/208/726/649/2157; 631/208/728; 631/61/212

2016

Short-read sequencing provides haplotype information when long DNA fragments are barcoded in microfluidic droplets.
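
Barcoding long fragments in droplets means that short reads sharing a barcode came from a small number of long input molecules. The sketch below is a simplified illustration, not any specific tool's method: it groups reads by barcode and splits each group into putative molecules wherever consecutive reads change chromosome or fall more than a hypothetical `max_gap` apart. Heterozygous variants observed on the same inferred molecule could then be assigned to the same haplotype.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Read = Tuple[str, str, int]      # (droplet barcode, chromosome, alignment start)
Molecule = Tuple[str, int, int]  # (chromosome, first read start, last read start)


def molecules_from_reads(reads: List[Read], max_gap: int = 50_000) -> Dict[str, List[Molecule]]:
    """Group reads by barcode, then split each group into putative long molecules."""
    by_barcode: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for barcode, chrom, pos in reads:
        by_barcode[barcode].append((chrom, pos))

    molecules: Dict[str, List[Molecule]] = {}
    for barcode, hits in by_barcode.items():
        hits.sort()  # order by chromosome, then position
        spans: List[Molecule] = []
        for chrom, pos in hits:
            if spans and spans[-1][0] == chrom and pos - spans[-1][2] <= max_gap:
                spans[-1] = (chrom, spans[-1][1], pos)  # extend the current molecule
            else:
                spans.append((chrom, pos, pos))  # start a new molecule
        molecules[barcode] = spans
    return molecules


# Hypothetical reads from two droplet barcodes.
reads = [("AAA", "chr1", 1_000), ("AAA", "chr1", 20_000),
         ("AAA", "chr1", 200_000), ("CCC", "chr2", 500)]
print(molecules_from_reads(reads)["AAA"])
# [('chr1', 1000, 20000), ('chr1', 200000, 200000)]
```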

Journal Article

Elevated exopolysaccharide levels in Pseudomonas aeruginosa flagellar mutants have implications for biofilm growth and chronic infections

by Shendure, Jay; Kitzman, Jacob O.; Irie, Yasuhiko in Adaptation; Amino acids; Bacterial Proteins - genetics

2020

Pseudomonas aeruginosa colonizes the airways of cystic fibrosis (CF) patients, causing infections that can last for decades. During these infections, P. aeruginosa undergoes a number of genetic adaptations. One such adaptation is the loss of swimming motility functions. Another involves the formation of the rugose small colony variant (RSCV) phenotype, which is characterized by overproduction of the exopolysaccharides Pel and Psl. Here, we provide evidence that the two adaptations are linked. Using random transposon mutagenesis, we discovered that flagellar mutations are linked to the RSCV phenotype. We found that flagellar mutants overexpressed Pel and Psl in a surface-contact-dependent manner. Genetic analyses revealed that flagellar mutants were selected for at high frequencies in biofilms, and that Pel and Psl expression provided the primary fitness benefit in this environment. Suppressor mutagenesis of flagellar RSCVs indicated that Psl overexpression required the mot genes, suggesting that the flagellum stator proteins function in a surface-dependent regulatory pathway for exopolysaccharide biosynthesis. Finally, we identified flagellar mutant RSCVs among CF isolates. The CF environment has long been known to select for flagellar mutants, the classic interpretation being that the fitness benefit comes from the host immune system's impaired ability to target a bacterium lacking a flagellum. Our new findings lead us to propose that exopolysaccharide production is a key gain-of-function phenotype that offers a new way to interpret the fitness benefits of these mutations.

Journal Article

Massively Parallel Functional Analysis of BRCA1 RING Domain Variants

by Starita, Lea M; Kitzman, Jacob O; Shendure, Jay in Biological variation; BRCA1 Protein - chemistry; BRCA1 Protein - genetics

2015

Interpreting variants of uncertain significance (VUS) is a central challenge in medical genetics. One approach is to experimentally measure the functional consequences of VUS, but to date this approach has been post hoc and low throughput. Here we use massively parallel assays to measure the effects of nearly 2000 missense substitutions in the RING domain of BRCA1 on its E3 ubiquitin ligase activity and its binding to the BARD1 RING domain. From the resulting scores, we generate a model to predict the capacities of full-length BRCA1 variants to support homology-directed DNA repair, the essential role of BRCA1 in tumor suppression, and show that it outperforms widely used biological-effect prediction algorithms. We envision that massively parallel functional assays may facilitate the prospective interpretation of variants observed in clinical sequencing.

Journal Article

GCAF(TMEM251) regulates lysosome biogenesis by activating the mannose-6-phosphate pathway

by Li, Ming; Chen, Liang; Kitzman, Jacob O. in 13/1; 13/31; 13/44

2022

The mannose-6-phosphate (M6P) biosynthetic pathway for lysosome biogenesis has been studied for decades and is considered well understood. However, whether this pathway is regulated remains an open question. In a genome-wide CRISPR/Cas9 knockout screen, we discover TMEM251 as the first regulator of M6P modification. Deleting TMEM251 causes mistargeting of most lysosomal enzymes (owing to their loss of M6P modification) and accumulation of numerous undigested materials. We further demonstrate that TMEM251 localizes to the Golgi and is required for the cleavage and activity of GNPT, the enzyme that catalyzes M6P modification. In zebrafish, TMEM251 deletion causes severe developmental defects, including heart edema and skeletal dysplasia, that phenocopy Mucolipidosis Type II. Our discovery provides a mechanism for the newly discovered human disease caused by TMEM251 mutations. We name TMEM251 GNPTAB cleavage and activity factor (GCAF) and its related disease Mucolipidosis Type V.

Lysosomal biogenesis errors often result in diseases, including mucolipidosis. Here, Zhang and Yang et al. identify TMEM251/GCAF as a mannose-6-phosphate modification regulator that is necessary for correct lysosomal targeting, and classify Mucolipidosis Type V as resulting from GCAF mutations.

Journal Article

The haplotype-resolved genome and epigenome of the aneuploid HeLa cancer cell line

by Shendure, Jay; Kitzman, Jacob O.; Lee, Choli in 631/208/177; 631/208/514/2254; 631/208/728

2013

Haplotype-resolved whole-genome sequencing of the HeLa CCL-2 strain shows that HeLa is relatively stable in terms of point variation; integration of several data sets reveals strong, haplotype-specific activation of the proto-oncogene MYC by the human papilloma virus type 18 genome, and enables the relationship between gene dosage and expression to be examined.

HeLa cell genome is unexpectedly stable

The first genomic characterization of the HeLa cancer cell line, the longest-serving and arguably most commonly used human cell line in biomedical research, reveals a genome that is surprisingly stable with respect to both point mutations and copy-number alterations. The point-mutation rate may be no higher than the somatic mutation rate of normal tissue, and very few copy-number alterations distinguish the genomes of different HeLa strains that were split from one another in the mid-1950s. The authors examine the relationship between gene dosage and expression by integrating several data sets, including those from the ENCODE project, and find strong activation of the MYC proto-oncogene by the human papilloma virus type 18 (HPV-18) integration at chromosome 8q24.21.

The HeLa cell line was established in 1951 from cervical cancer cells taken from a patient, Henrietta Lacks. This was the first successful attempt to immortalize human-derived cells in vitro [1]. The robust growth and unrestricted distribution of HeLa cells led to their broad adoption, both intentionally and through widespread cross-contamination [2], and for the past 60 years the line has served a role analogous to that of a model organism [3]. The cumulative impact of the HeLa cell line on research is demonstrated by its occurrence in more than 74,000 PubMed abstracts (approximately 0.3%). The genomic architecture of HeLa remains largely unexplored beyond its karyotype [4], partly because, as in many cancers, its extensive aneuploidy renders such analyses challenging.
We carried out haplotype-resolved whole-genome sequencing [5] of the HeLa CCL-2 strain, examined point and indel mutations, mapped copy-number variations and regions of loss of heterozygosity, and phased variants across full chromosome arms. We also investigated variation and copy-number profiles for HeLa S3 and eight additional strains. We find that HeLa is relatively stable in terms of point variation, with few new mutations accumulating after early passaging. Haplotype resolution facilitated reconstruction of an amplified, highly rearranged region of chromosome 8q24.21 at which the human papilloma virus type 18 (HPV-18) genome integrated, likely the event that initiated tumorigenesis. We combined these maps with RNA-seq [6] and ENCODE Project [7] data sets to phase the HeLa epigenome. This revealed strong, haplotype-specific activation of the proto-oncogene MYC by the integrated HPV-18 genome approximately 500 kilobases upstream, and enabled global analyses of the relationship between gene dosage and expression. These data provide an extensively phased, high-quality reference genome for past and future experiments relying on HeLa, and demonstrate the value of haplotype resolution for characterizing cancer genomes and epigenomes.
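
In an aneuploid genome, calling a gene such as MYC haplotype-specifically activated means comparing RNA allele counts with the haplotypes' DNA copy numbers rather than with a 50:50 expectation. The sketch below shows one standard way to do that, an exact binomial test whose expected allele fraction is set by copy number. It is an illustrative assumption, not the authors' pipeline, and `binom_two_sided` and `allelic_imbalance_p` are hypothetical helper names.

```python
from math import comb


def binom_two_sided(k: int, n: int, p: float) -> float:
    """Exact two-sided binomial test: total probability of outcomes no more likely than k."""
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small relative tolerance so outcomes equal to the observed probability are included.
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-7)))


def allelic_imbalance_p(hap_a_reads: int, hap_b_reads: int,
                        hap_a_copies: int, hap_b_copies: int) -> float:
    """Do RNA reads favor haplotype A more (or less) than its DNA copy number predicts?"""
    n = hap_a_reads + hap_b_reads
    expected_a = hap_a_copies / (hap_a_copies + hap_b_copies)
    return binom_two_sided(hap_a_reads, n, expected_a)


# Hypothetical counts: 30 RNA reads from haplotype A, 10 from haplotype B.
print(round(allelic_imbalance_p(30, 10, 3, 1), 3))  # 1.0: matches a 3:1 copy-number ratio
print(allelic_imbalance_p(30, 10, 1, 1) < 0.01)     # True: imbalanced if copies were 1:1
```

The same 30:10 read split is unremarkable when haplotype A carries three copies and haplotype B one, but strongly imbalanced under equal copy number, which is why copy-number profiles are needed before interpreting allele-specific expression in a genome like HeLa's.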

Journal Article
