Catalogue Search | MBRL

MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects

by Yandell, Mark , Holt, Carson in Algorithms , Animals , Bioinformatics

2011

Background Second-generation sequencing technologies are precipitating major shifts with regard to what kinds of genomes are being sequenced and how they are annotated. While the first generation of genome projects focused on well-studied model organisms, many of today's projects involve exotic organisms whose genomes are largely terra incognita. This complicates their annotation, because unlike first-generation projects, there are no pre-existing 'gold-standard' gene-models with which to train gene-finders. Improvements in genome assembly and the wide availability of mRNA-seq data are also creating opportunities to update and re-annotate previously published genome annotations. Today's genome projects are thus in need of new genome annotation tools that can meet the challenges and opportunities presented by second-generation sequencing technologies. Results We present MAKER2, a genome annotation and data management tool designed for second-generation genome projects. MAKER2 is a multi-threaded, parallelized application that can process second-generation datasets of virtually any size. We show that MAKER2 can produce accurate annotations for novel genomes where training-data are limited, of low quality or even non-existent. MAKER2 also provides an easy means to use mRNA-seq data to improve annotation quality; and it can use these data to update legacy annotations, significantly improving their quality. We also show that MAKER2 can evaluate the quality of genome annotations, and identify and prioritize problematic annotations for manual review. Conclusions MAKER2 is the first annotation engine specifically designed for second-generation genome projects. MAKER2 scales to datasets of any size, requires little in the way of training data, and can use mRNA-seq data to improve annotation quality. It can also update and manage legacy genome annotation datasets.

Journal Article

Settling the score: variant prioritization and Mendelian disease

by Yandell, Mark , Eilbeck, Karen , Quinlan, Aaron in 631/114/129/2043 , 631/114/2184 , 631/1647/514/2254

2017

Key Points Exome and genome sequencing reveal thousands to millions of genetic variants in a typical individual. A fundamental challenge in human genetics is isolating the small subset (typically one or two) of variants that cause a Mendelian disease phenotype. This Review describes the computational approaches used to prioritize variants in Mendelian disease. A multitude of tools prioritize variants on the basis of biochemical, evolutionary, allele segregation and population frequency characteristics in an attempt to prioritize the list of potential causative variants. The strategies and caveats associated with these tools are outlined in this Review. Burden tests take prioritization to the next level by aggregating the variants observed at a given locus to calculate a burden score for the gene. Most burden testing software tools also evaluate potentially damaging genotypes in the context of other genotypes observed at the same locus in a control population. Variant interpretation is the process of drawing direct connections from individual variants to disease phenotypes, and this process is central to both clinical reporting of results and incidental findings, as well as research endeavours that include variant discovery and return of results. Variant prioritization and interpretation are especially challenging for non-coding variants, structural variants and synonymous exonic variants. Furthermore, increasingly complex reference genomes introduce new demands for variant discovery tools. Each of these challenges drives increasingly sophisticated software solutions. For clinical cases of Mendelian disease that lack a genetic diagnosis, genome and exome sequencing are increasingly used for seeking the genetic cause. This Review discusses the strategies and computational tools for prioritizing the many genetic variants identified in each genome into those that are most likely to be causal for disease.
The authors discuss how diverse types of biochemical, evolutionary, pedigree and clinical-phenotype information are used, and they highlight common pitfalls to be aware of for responsible variant prioritization. When investigating Mendelian disease using exome or genome sequencing, distinguishing disease-causing genetic variants from the multitude of candidate variants is a complex, multidimensional task. Many prioritization tools and online interpretation resources exist, and professional organizations have offered clinical guidelines for review and return of prioritization results. In this Review, we describe the strengths and weaknesses of widely used computational approaches, explain their roles in the diagnostic and discovery process and discuss how they can inform (and misinform) expert reviewers. We place variant prioritization in the wider context of gene prioritization, burden testing and genotype–phenotype association, and we discuss opportunities and challenges introduced by whole-genome sequencing.

Journal Article

Transposable Elements Are Major Contributors to the Origin, Diversification, and Regulation of Vertebrate Long Noncoding RNAs

by Bourque, Guillaume , Lynch, Vincent J. , Feschotte, Cédric in Animals , Biology , DNA Transposable Elements

2013

Advances in vertebrate genomics have uncovered thousands of loci encoding long noncoding RNAs (lncRNAs). While progress has been made in elucidating the regulatory functions of lncRNAs, little is known about their origins and evolution. Here we explore the contribution of transposable elements (TEs) to the makeup and regulation of lncRNAs in human, mouse, and zebrafish. Surprisingly, TEs occur in more than two thirds of mature lncRNA transcripts and account for a substantial portion of total lncRNA sequence (~30% in human), whereas they seldom occur in protein-coding transcripts. While TEs contribute less to lncRNA exons than expected, several TE families are strongly enriched in lncRNAs. There is also substantial interspecific variation in the coverage and types of TEs embedded in lncRNAs, partially reflecting differences in the TE landscapes of the genomes surveyed. In human, TE sequences in lncRNAs evolve under greater evolutionary constraint than their non-TE sequences, than their intronic TEs, or than random DNA. Consistent with functional constraint, we found that TEs contribute signals essential for the biogenesis of many lncRNAs, including ~30,000 unique sites for transcription initiation, splicing, or polyadenylation in human. In addition, we identified ~35,000 TEs marked as open chromatin located within 10 kb upstream of lncRNA genes. The density of these marks in one cell type correlates with elevated expression of the downstream lncRNA in the same cell type, suggesting that these TEs contribute to cis-regulation. These global trends are recapitulated in several lncRNAs with established functions. Finally, a subset of TEs embedded in lncRNAs are subject to RNA editing and predicted to form secondary structures likely important for function. In conclusion, TEs are nearly ubiquitous in lncRNAs and have played an important role in the lineage-specific diversification of vertebrate lncRNA repertoires.

Journal Article

MAKER-P: A Tool Kit for the Rapid Creation, Management, and Quality Control of Plant Genome Annotations

by Moghe, Gaurav D. , Hufnagel, David E. , Yandell, Mark in Alternative Splicing - genetics , Arabidopsis , Arabidopsis - genetics

2014

We have optimized and extended the widely used annotation engine MAKER in order to better support plant genome annotation efforts. New features include better parallelization for large repeat-rich plant genomes, noncoding RNA annotation capabilities, and support for pseudogene identification. We have benchmarked the resulting software tool kit, MAKER-P, using the Arabidopsis (Arabidopsis thaliana) and maize (Zea mays) genomes. Here, we demonstrate the ability of the MAKER-P tool kit to automatically update, extend, and revise the Arabidopsis annotations in light of newly available data and to annotate pseudogenes and noncoding RNAs absent from The Arabidopsis Information Resource 10 build. Our results demonstrate that MAKER-P can be used to manage and improve the annotations of even Arabidopsis, perhaps the best-annotated plant genome. We have also installed and benchmarked MAKER-P on the Texas Advanced Computing Center. We show that this public resource can de novo annotate the entire Arabidopsis and maize genomes in less than 3 h and produce annotations of comparable quality to those of the current The Arabidopsis Information Resource 10 and maize V2 annotation builds.

Journal Article

An artificial intelligence approach for investigating multifactorial pain-related features of endometriosis

by Hernandez, Edgar Javier , Eilbeck, Karen , Schliep, Karen C. in Analysis , Artificial Intelligence , Bayes Theorem

2024

Endometriosis is a debilitating, chronic disease that is estimated to affect 11% of reproductive-age women. Diagnosis of endometriosis is difficult with diagnostic delays of up to 12 years reported. These delays can negatively impact health and quality of life. Vague, nonspecific symptoms, like pain, with multiple differential diagnoses contribute to the difficulty of diagnosis. By investigating previously imprecise symptoms of pain, we sought to clarify distinct pain symptoms indicative of endometriosis, using an artificial intelligence-based approach. We used data from 473 women undergoing laparoscopy or laparotomy for a variety of surgical indications. Multiple anatomical pain locations were clustered based on the associations across samples to increase the power in the probability calculations. A Bayesian network was developed using pain-related features, subfertility, and diagnoses. Univariable and multivariable analyses were performed by querying the network for the relative risk of a postoperative diagnosis, given the presence of different symptoms. Performance and sensitivity analyses demonstrated the advantages of Bayesian network analysis over traditional statistical techniques. Clustering grouped the 155 anatomical sites of pain into 15 pain locations. After pruning, the final Bayesian network included 18 nodes. The presence of any pain-related feature increased the relative risk of endometriosis (p-value < 0.001). The constellation of chronic pelvic pain, subfertility, and dyspareunia resulted in the greatest increase in the relative risk of endometriosis. The performance and sensitivity analyses demonstrated that the Bayesian network could identify and analyze more significant associations with endometriosis than traditional statistical techniques. Pelvic pain, frequently associated with endometriosis, is a common and vague symptom. 
Our Bayesian network for the study of pain-related features of endometriosis revealed specific pain locations and pain types that potentially forecast the diagnosis of endometriosis.

Journal Article

Characterization of the Conus bullatus genome and its venom-duct transcriptome

by Bandyopadhyay, Pradip K , Olivera, Baldomero M , Hu, Hao in Amino acids , Animal Genetics and Genomics , Animals

2011

Background The venomous marine gastropods, cone snails (genus Conus), inject prey with a lethal cocktail of conopeptides, small cysteine-rich peptides, each with a high affinity for its molecular target, generally an ion channel, receptor or transporter. Over the last decade, conopeptides have proven indispensable reagents for the study of vertebrate neurotransmission. Conus bullatus belongs to a clade of Conus species called Textilia, whose pharmacology is still poorly characterized. Thus the genomics analyses presented here provide the first step toward a better understanding of the enigmatic Textilia clade. Results We have carried out a sequencing survey of the Conus bullatus genome and venom-duct transcriptome. We find that conopeptides are highly expressed within the venom-duct, and describe an in silico pipeline for their discovery and characterization using RNA-seq data. We have also carried out low-coverage shotgun sequencing of the genome, and have used these data to determine its size, genome-wide base composition, simple repeat, and mobile element densities. Conclusions Our results provide the first global view of venom-duct transcription in any cone snail. A notable feature of Conus bullatus venoms is the breadth of A-superfamily peptides expressed in the venom duct, which are unprecedented in their structural diversity. We also find SNP rates within conopeptides are higher compared to the remainder of the C. bullatus transcriptome, consistent with the hypothesis that conopeptides are under diversifying selection.

Journal Article

Transposable element islands facilitate adaptation to novel environments in an invasive species

by Ence, Daniel , Yandell, Mark , Kim, Jay W. in 631/158/2178 , 631/158/857 , 631/181/2474

2014

Adaptation requires genetic variation, but founder populations are generally genetically depleted. Here we sequence two populations of an inbred ant that diverge in phenotype to determine how variability is generated. Cardiocondyla obscurior has the smallest of the sequenced ant genomes and its structure suggests a fundamental role of transposable elements (TEs) in adaptive evolution. Accumulations of TEs (TE islands) comprising 7.18% of the genome evolve faster than other regions with regard to single-nucleotide variants, gene/exon duplications and deletions and gene homology. A non-random distribution of gene families, larvae/adult specific gene expression and signs of differential methylation in TE islands indicate intragenomic differences in regulation, evolutionary rates and coalescent effective population size. Our study reveals a tripartite interplay between TEs, life history and adaptation in an invasive species. Genetic variation is key to species evolution. Here the authors sequence two phenotypically distinct populations of the ant Cardiocondyla obscurior, and find accumulations of transposable elements correlating with genetic variation that may have a role in differentiation, adaptation and speciation.

Journal Article

Quantitative measures for the management and comparison of annotated genomes

by Eilbeck, Karen , Moore, Barry , Holt, Carson in Algorithms , Alternative Splicing , Animals

2009

Background The ever-increasing number of sequenced and annotated genomes has made management of their annotations a significant undertaking, especially for large eukaryotic genomes containing many thousands of genes. Typically, changes in gene and transcript numbers are used to summarize changes from release to release, but these measures say nothing about changes to individual annotations, nor do they provide any means to identify annotations in need of manual review. Results In response, we have developed a suite of quantitative measures to better characterize changes to a genome's annotations between releases, and to prioritize problematic annotations for manual review. We have applied these measures to the annotations of five eukaryotic genomes over multiple releases – H. sapiens, M. musculus, D. melanogaster, A. gambiae, and C. elegans. Conclusion Our results provide the first detailed, historical overview of how these genomes' annotations have changed over the years, and demonstrate the usefulness of these measures for genome annotation management.

Journal Article

Artificial intelligence enables comprehensive genome interpretation and nomination of candidate diagnoses for rare genetic diseases

by Frise, Erwin , McCarthy, Jeanette , Yandell, Mark in Algorithms , Analysis , Artificial Intelligence

2021

Background Clinical interpretation of genetic variants in the context of the patient’s phenotype is becoming the largest component of cost and time expenditure for genome-based diagnosis of rare genetic diseases. Artificial intelligence (AI) holds promise to greatly simplify and speed genome interpretation by integrating predictive methods with the growing knowledge of genetic disease. Here we assess the diagnostic performance of Fabric GEM, a new, AI-based, clinical decision support tool for expediting genome interpretation. Methods We benchmarked GEM in a retrospective cohort of 119 probands, mostly NICU infants, diagnosed with rare genetic diseases, who received whole-genome or whole-exome sequencing (WGS, WES). We replicated our analyses in a separate cohort of 60 cases collected from five academic medical centers. For comparison, we also analyzed these cases with current state-of-the-art variant prioritization tools. Included in the comparisons were trio, duo, and singleton cases. Variants underpinning diagnoses spanned diverse modes of inheritance and types, including structural variants (SVs). Patient phenotypes were extracted from clinical notes by two means: manually and using an automated clinical natural language processing (CNLP) tool. Finally, 14 previously unsolved cases were reanalyzed. Results GEM ranked over 90% of the causal genes among the top or second candidate and prioritized for review a median of 3 candidate genes per case, using either manually curated or CNLP-derived phenotype descriptions. Ranking of trios and duos was unchanged when analyzed as singletons. In 17 of 20 cases with diagnostic SVs, GEM identified the causal SVs as the top candidate and in 19/20 within the top five, irrespective of whether SV calls were provided or inferred ab initio by GEM using its own internal SV detection algorithm. GEM showed similar performance in absence of parental genotypes. 
Analysis of 14 previously unsolved cases resulted in a novel finding for one case, candidates ultimately not advanced upon manual review for 3 cases, and no new findings for 10 cases. Conclusions GEM enabled diagnostic interpretation inclusive of all variant types through automated nomination of a very short list of candidate genes and disorders for final review and reporting. In combination with deep phenotyping by CNLP, GEM enables substantial automation of genetic disease diagnosis, potentially decreasing cost and expediting case review.

Journal Article

Elucidation of the molecular envenomation strategy of the cone snail Conus geographus through transcriptome sequencing of its venom duct

by Bandyopadhyay, Pradip K , Olivera, Baldomero M , Hu, Hao in Academic libraries , Amino Acid Sequence , Analysis

2012

Background The fish-hunting cone snail, Conus geographus, is the deadliest snail on earth. In the absence of medical intervention, 70% of human stinging cases are fatal. Although its venom is known to consist of a cocktail of small peptides targeting different ion-channels and receptors, the bulk of its venom constituents, their sites of manufacture, relative abundances and how they function collectively in envenomation have remained unknown. Results We have used transcriptome sequencing to systematically elucidate the contents of the C. geographus venom duct, dividing it into four segments in order to investigate each segment’s mRNA contents. Three different types of calcium channel (each targeted by unrelated, entirely distinct venom peptides) and at least two different nicotinic receptors appear to be targeted by the venom. Moreover, the most highly expressed venom component is not paralytic, but causes sensory disorientation and is expressed in a different segment of the venom duct from venoms believed to cause sensory disruption. We have also identified several new toxins of interest for pharmaceutical and neuroscience research. Conclusions Conus geographus is believed to prey on fish hiding in reef crevices at night. Our data suggest that disorientation of prey is central to its envenomation strategy. Furthermore, venom expression profiles also suggest a sophisticated layering of venom-expression patterns within the venom duct, with disorientating and paralytic venoms expressed in different regions. Thus, our transcriptome analysis provides a new physiological framework for understanding the molecular envenomation strategy of this deadly snail.

Journal Article
