64 result(s) for "Grabherr, Manfred"
Computational methods for transcriptome annotation and quantification using RNA-seq
High-throughput RNA sequencing (RNA-seq) promises a comprehensive picture of the transcriptome, allowing for the complete annotation and quantification of all genes and their isoforms across samples. Realizing this promise requires increasingly complex computational methods. These computational challenges fall into three main categories: (i) read mapping, (ii) transcriptome reconstruction and (iii) expression quantification. Here we explain the major conceptual and practical challenges, and the general classes of solutions for each category. Finally, we highlight the interdependence between these categories and discuss the benefits for different biological applications.
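One common way to express the output of the quantification step is transcripts per million (TPM), which normalizes read counts for both transcript length and sequencing depth. The minimal sketch below assumes reads have already been assigned to transcripts. TPM is only one of several normalizations, and the function name `tpm` and the toy transcripts `tA` and `tB` are hypothetical rather than taken from the review.

```python
def tpm(counts: dict[str, float], lengths_bp: dict[str, float]) -> dict[str, float]:
    """Convert per-transcript read counts to transcripts per million (TPM).

    `lengths_bp` holds (effective) transcript lengths in base pairs.
    """
    # Step 1: reads per kilobase, which corrects for transcript length.
    rpk = {t: counts[t] / (lengths_bp[t] / 1000.0) for t in counts}
    total = sum(rpk.values())
    if total == 0:
        raise ValueError("no reads assigned to any transcript")
    # Step 2: rescale so the values sum to one million, which corrects for depth.
    return {t: value / total * 1e6 for t, value in rpk.items()}


# Hypothetical example: tB has twice the reads of tA but is four times longer.
example = tpm({"tA": 100, "tB": 200}, {"tA": 1000, "tB": 4000})
print(example)
```

In this toy case tA ends up with twice the TPM of tB, because tB's extra reads are outweighed by its greater length.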
De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis
De novo assembly of RNA-seq data enables researchers to study transcriptomes without the need for a genome sequence; this approach can be usefully applied, for instance, in research on 'non-model organisms' of ecological and evolutionary importance, cancer samples or the microbiome. In this protocol we describe the use of the Trinity platform for de novo transcriptome assembly from RNA-seq data in non-model organisms. We also present Trinity-supported companion utilities for downstream applications, including RSEM for transcript abundance estimation, R/Bioconductor packages for identifying differentially expressed transcripts across samples and approaches to identify protein-coding genes. In the procedure, we provide a workflow for genome-independent transcriptome analysis leveraging the Trinity platform. The software, documentation and demonstrations are freely available from http://trinityrnaseq.sourceforge.net. The run time of this protocol is highly dependent on the size and complexity of data to be analyzed. The example data set analyzed in the procedure detailed herein can be processed in less than 5 h.
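RSEM estimates transcript abundances with an expectation-maximization (EM) algorithm that apportions multi-mapping reads among the transcripts they are compatible with. The sketch below is a heavily simplified illustration of that idea, not RSEM itself: it assumes each read is given only as the set of transcripts it aligns to, and it ignores transcript lengths and the other factors that RSEM's full statistical model takes into account. The function name `em_abundance` and the toy transcripts `A` and `B` are hypothetical.

```python
def em_abundance(read_compat: list[set[str]], n_iter: int = 200) -> dict[str, float]:
    """Estimate relative transcript abundances from multi-mapping reads by EM.

    Each element of `read_compat` is the set of transcripts one read aligns to.
    Transcript lengths and all other terms of RSEM's model are ignored.
    """
    transcripts = sorted(set().union(*read_compat))
    # Start from uniform abundances.
    theta = {t: 1.0 / len(transcripts) for t in transcripts}
    for _ in range(n_iter):
        # E-step: split each read among its compatible transcripts in
        # proportion to their current abundance estimates.
        expected = dict.fromkeys(transcripts, 0.0)
        for compat in read_compat:
            z = sum(theta[t] for t in compat)
            for t in compat:
                expected[t] += theta[t] / z
        # M-step: new abundances are the expected shares of all reads.
        theta = {t: expected[t] / len(read_compat) for t in transcripts}
    return theta


# Two reads unique to A, one unique to B, and two shared by both (hypothetical).
reads = [{"A"}, {"A"}, {"A", "B"}, {"A", "B"}, {"B"}]
est = em_abundance(reads)
print(est)
```

The shared reads end up split in the same 2:1 ratio as the uniquely mapping reads, so the estimates converge to roughly 2/3 for A and 1/3 for B.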
An Improved Canine Genome and a Comprehensive Catalogue of Coding Genes and Non-Coding Transcripts
The domestic dog, Canis familiaris, is a well-established model system for mapping trait and disease loci. While the original draft sequence was of good quality, gaps were abundant, particularly in promoter regions of the genome, negatively impacting the annotation and study of candidate genes. Here, we present an improved genome build, canFam3.1, which includes 85 Mb of novel sequence and now covers 99.8% of the euchromatic portion of the genome. We also present multiple RNA-sequencing data sets from 10 different canine tissues to catalog ∼175,000 expressed loci. While about 90% of the coding genes previously annotated by EnsEMBL have measurable expression in at least one sample, the number of transcript isoforms detected by our data expands the EnsEMBL annotations by a factor of four. Syntenic comparison with the human genome revealed an additional ∼3,000 loci that are characterized as protein coding in human and were also expressed in the dog, suggesting that these loci were previously missing from the EnsEMBL canine gene set. In addition to ∼20,700 high-confidence protein-coding loci, we found ∼4,600 antisense transcripts overlapping exons of protein-coding genes; ∼7,200 intergenic multi-exon transcripts without coding potential, which are likely candidates for long intergenic non-coding RNAs (lincRNAs); and ∼11,000 transcripts that were reported by two different library construction methods but did not fit any of the above categories. Of the lincRNAs, about 6,000 have no annotated orthologs in human or mouse. In a functional analysis, shRNA knockdown of two novel transcripts in a mouse kidney cell line altered cell morphology and motility. Overall, we provide a much-improved annotation of the canine genome and suggest regulatory functions for several of the novel non-coding transcripts.
Full-length transcriptome assembly from RNA-Seq data without a reference genome
Reconstructing full-length transcripts from high-throughput RNA sequencing data is difficult without a reference genome sequence. Grabherr et al. describe Trinity, an algorithm for assembling full-length transcripts from short reads without first mapping the reads to a genome sequence. Massively parallel sequencing of cDNA has enabled deep and efficient probing of transcriptomes. Current approaches for transcript reconstruction from such data often rely on aligning reads to a reference genome, and are thus unsuitable for samples with a partial or missing reference genome. Here we present the Trinity method for de novo assembly of full-length transcripts and evaluate it on samples from fission yeast and mouse, as well as from whitefly, whose reference genome is not yet available. By efficiently constructing and analyzing sets of de Bruijn graphs, Trinity fully reconstructs a large fraction of transcripts, including alternatively spliced isoforms and transcripts from recently duplicated genes. Compared with other de novo transcriptome assemblers, Trinity recovers more full-length transcripts across a broad range of expression levels, with a sensitivity similar to methods that rely on genome alignments. Our approach provides a unified solution for transcriptome reconstruction in any sample, especially in the absence of a reference genome.
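The de Bruijn graph mentioned above can be illustrated with a small sketch of the data structure only: each read is broken into k-mers, each k-mer becomes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix, and a contig is read off by following a path with no branches. This is an illustration under simplifying assumptions, not Trinity's algorithm, which must additionally cope with sequencing errors and with the branches that arise from alternative splicing and recently duplicated genes. The function names and toy reads are hypothetical.

```python
from collections import defaultdict


def build_debruijn(reads: list[str], k: int) -> dict[str, set[str]]:
    """Map each (k-1)-mer to the (k-1)-mers that follow it in some read k-mer."""
    graph: defaultdict[str, set[str]] = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each k-mer is an edge from its prefix to its suffix.
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


def walk_unambiguous(graph: dict[str, set[str]], start: str) -> str:
    """Extend a contig from `start` while the current node has exactly one successor."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:  # stop rather than loop forever on a cycle
            break
        contig += nxt[-1]  # consecutive (k-1)-mers overlap in all but one base
        seen.add(nxt)
        node = nxt
    return contig


# Two overlapping toy reads (hypothetical), k = 4.
g = build_debruijn(["ATGGCGT", "GGCGTGCA"], k=4)
print(walk_unambiguous(g, "ATG"))
```

Here the walk recovers ATGGCGTGCA, the merge of the two reads over their shared GGCGT overlap. A read carrying a variant base would add a second successor to some node, which is exactly where a real assembler has to decide how to proceed.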
MysteryMaster: scraping the bottom of the barrel of barcoded Oxford nanopore reads
Background: The high error rate associated with Oxford Nanopore sequencing technology adversely affects demultiplexing. To improve demultiplexing and reduce unclassified reads from nanopore sequencing data, we developed MysteryMaster, a demultiplexer that uses the optimal sequence aligner Cola. Results: When compared with Oxford Nanopore's Dorado and Guppy demultiplexing tools across three datasets of 37 diverse samples with established ground truth, we found that MysteryMaster accurately identifies a similar or greater percentage of reads across the different basecalling models: Fast, HAC, and SUP. MysteryMaster performs slightly better than the other tools on data basecalled with the Fast model, while its performance on HAC and SUP data is similar to Dorado's. With default settings, MysteryMaster has a false positive rate of just 0.41%. Conclusions: While MysteryMaster can function as a standalone demultiplexer, the sequential application of Dorado and MysteryMaster produced the best overall performance.
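The abstract does not describe Cola's interface or MysteryMaster's scoring, so the sketch below substitutes a plain dynamic-programming edit distance to show the general idea of alignment-based demultiplexing. Each barcode is aligned against the start of the read, with the barcode allowed to begin and end anywhere in that window, and the read is assigned only when the best match is both close and clearly better than the runner-up. The window size, the `max_dist` and `min_margin` thresholds, the function names and the toy barcodes are all hypothetical, not MysteryMaster's defaults. The sketch also ignores barcodes at the read's end and reverse-complement orientation.

```python
def semiglobal_distance(barcode: str, window: str) -> int:
    """Smallest edit distance between `barcode` and any substring of `window`."""
    prev = [0] * (len(window) + 1)  # row 0: the barcode may start anywhere
    for i, b in enumerate(barcode, start=1):
        cur = [i] + [0] * len(window)
        for j, w in enumerate(window, start=1):
            cur[j] = min(
                prev[j - 1] + (b != w),  # match or substitution
                prev[j] + 1,             # barcode base missing from the read
                cur[j - 1] + 1,          # extra base inserted in the read
            )
        prev = cur
    return min(prev)  # the barcode may end anywhere in the window


def assign_barcode(read: str, barcodes: dict[str, str], window: int = 100,
                   max_dist: int = 3, min_margin: int = 2) -> str:
    """Return the best barcode name, or "unclassified" if the call is not confident."""
    head = read[:window]
    scored = sorted((semiglobal_distance(seq, head), name) for name, seq in barcodes.items())
    best_dist, best_name = scored[0]
    runner_up = scored[1][0] if len(scored) > 1 else float("inf")
    if best_dist <= max_dist and runner_up - best_dist >= min_margin:
        return best_name
    return "unclassified"


# Hypothetical toy barcodes, far shorter than real nanopore barcodes.
toy = {"BC01": "AAAAAAAA", "BC02": "CCCCCCCC"}
print(assign_barcode("GTAAAATAAAGTGTGT", toy))  # one substitution from BC01
print(assign_barcode("GGGGGGGGGGGG", toy))      # matches neither barcode
```

Requiring a margin over the runner-up, and not just a small distance, is one simple way to trade unclassified reads against false positives.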
The genomic basis of adaptive evolution in threespine sticklebacks
Marine stickleback fish have colonized and adapted to thousands of streams and lakes formed since the last ice age, providing an exceptional opportunity to characterize genomic mechanisms underlying repeated ecological adaptation in nature. Here we develop a high-quality reference genome assembly for threespine sticklebacks. By sequencing the genomes of twenty additional individuals from a global set of marine and freshwater populations, we identify a genome-wide set of loci that are consistently associated with marine–freshwater divergence. Our results indicate that reuse of globally shared standing genetic variation, including chromosomal inversions, has an important role in repeated evolution of distinct marine and freshwater sticklebacks, and in the maintenance of divergent ecotypes during early stages of reproductive isolation. Both coding and regulatory changes occur in the set of loci underlying marine–freshwater evolution, but regulatory changes appear to predominate in this well-known example of repeated adaptive evolution in nature. A reference genome sequence for threespine sticklebacks, together with re-sequencing of 20 additional individuals from world-wide populations, reveals loci used repeatedly during vertebrate evolution; multiple chromosomal inversions contribute to marine–freshwater divergence, and regulatory variants predominate over coding variants in this classic example of adaptive evolution in natural environments.
The genomics of stickleback speciation: Threespine sticklebacks have become a powerful model for studying the molecular basis of adaptive evolution. This paper presents a high-quality reference genome sequence, along with genomes of 20 further individuals from a global set of marine and freshwater populations. Genomic analysis reveals that reuse of globally shared standing genetic variation plays an important part in repeated evolution of distinct stickleback populations, and in the maintenance of divergent ecotypes during early stages of reproductive isolation. The data are consistent with an important role for regulatory changes during parallel evolution of marine and freshwater sticklebacks.
Broad-scale phylogenomics provides insights into retrovirus–host evolution
Genomic data provide an excellent resource to improve understanding of retrovirus evolution and the complex relationships among viruses and their hosts. In conjunction with broad-scale in silico screening of vertebrate genomes, this resource offers an opportunity to complement data on the evolution and frequency of past retroviral spread and so evaluate future risks and limitations for horizontal transmission between different host species. Here, we develop a methodology for extracting phylogenetic signal from large endogenous retrovirus (ERV) datasets by collapsing information to facilitate broad-scale phylogenomics across a wide sample of hosts. Starting with nearly 90,000 ERVs from 60 vertebrate host genomes, we construct phylogenetic hypotheses and draw inferences regarding the designation, host distribution, origin, and transmission of the Gammaretrovirus genus and associated class I ERVs. Our results uncover remarkable depths in retroviral sequence diversity, supported within a phylogenetic context. This finding suggests that current infectious exogenous retrovirus diversity may be underestimated, adding credence to the possibility that many additional exogenous retroviruses remain to be discovered in vertebrate taxa. We demonstrate a history of frequent horizontal interorder transmissions from a rodent reservoir and suggest that rats may have acted as important overlooked facilitators of gammaretrovirus spread across diverse mammalian hosts. Together, these results demonstrate the promise of the methodology used here to analyze large ERV datasets and to improve understanding of retroviral evolution and diversity for use in wider applications.
MindReader: Unsupervised Classification of Electroencephalographic Data
Electroencephalogram (EEG) interpretation plays a critical role in the clinical assessment of neurological conditions, most notably epilepsy. However, EEG recordings are typically analyzed manually by highly specialized and heavily trained personnel. Moreover, the low rate of capturing abnormal events during the procedure makes interpretation time-consuming, resource-intensive and, overall, expensive. Automatic detection offers the potential to improve the quality of patient care by shortening the time to diagnosis, managing big data and optimizing the allocation of human resources towards precision medicine. Here, we present MindReader, a novel unsupervised machine-learning method built on the interplay between an autoencoder network, a hidden Markov model (HMM), and a generative component. After dividing the signal into overlapping frames and performing a fast Fourier transform, MindReader trains an autoencoder neural network for dimensionality reduction and compact representation of the different frequency patterns in each frame. Next, the temporal patterns are processed by an HMM, while a third, generative component hypothesizes and characterizes the different phases, which are then fed back to the HMM. MindReader then automatically generates labels that the physician can interpret as pathological and non-pathological phases, thus effectively reducing the search space for trained personnel. We evaluated MindReader's predictive performance on 686 recordings, encompassing more than 980 h from the publicly available PhysioNet database. Compared with manual annotations, MindReader identified 197 of 198 epileptic events (99.45%), making it a highly sensitive method; high sensitivity is a prerequisite for clinical use.
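The first stage described above, splitting the signal into overlapping frames and computing a Fourier transform per frame, can be sketched as follows. The frame length, hop size, sampling rate and toy sine input are hypothetical choices, not MindReader's settings. To stay dependency-free, the sketch uses a direct O(n²) discrete Fourier transform, which yields the same values as an FFT, only more slowly. The autoencoder, HMM and generative components are not reproduced.

```python
import cmath
import math


def frame_signal(signal: list[float], frame_len: int, hop: int) -> list[list[float]]:
    """Split a 1-D signal into frames of `frame_len` samples, starting every `hop` samples.

    Frames overlap whenever hop < frame_len.
    """
    return [signal[s:s + frame_len] for s in range(0, len(signal) - frame_len + 1, hop)]


def power_spectrum(frame: list[float]) -> list[float]:
    """Squared DFT magnitudes of a real-valued frame for bins 0..n//2.

    A direct O(n^2) DFT: same values as an FFT, only slower.
    """
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))) ** 2
        for k in range(n // 2 + 1)
    ]


# Hypothetical toy input: an 8 Hz sine sampled at 64 Hz for 4 s.
fs = 64
signal = [math.sin(2 * math.pi * 8 * t / fs) for t in range(4 * fs)]
frames = frame_signal(signal, frame_len=64, hop=32)  # 50% overlap
spectra = [power_spectrum(f) for f in frames]
peak_bin = max(range(len(spectra[0])), key=spectra[0].__getitem__)
print(len(frames), peak_bin * fs / 64)  # bin spacing is fs / frame_len Hz
```

Each per-frame spectrum is the kind of frequency-pattern vector that a downstream model such as an autoencoder could compress. With 50% overlap, a 4 s recording yields 7 frames, and the spectral peak sits at the sine's 8 Hz.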
Genomic Analysis of the Basal Lineage Fungus Rhizopus oryzae Reveals a Whole-Genome Duplication
Rhizopus oryzae is the primary cause of mucormycosis, an emerging, life-threatening infection characterized by rapid angioinvasive growth with an overall mortality rate that exceeds 50%. As a representative of the paraphyletic basal group of the fungal kingdom called "zygomycetes," R. oryzae is also used as a model to study fungal evolution. Here we report the genome sequence of R. oryzae strain 99-880, isolated from a fatal case of mucormycosis. The highly repetitive 45.3 Mb genome assembly contains abundant transposable elements (TEs), comprising approximately 20% of the genome. We predicted 13,895 protein-coding genes not overlapping TEs, many of which are paralogous gene pairs. The order and genomic arrangement of the duplicated gene pairs and their common phylogenetic origin provide evidence for an ancestral whole-genome duplication (WGD) event. The WGD resulted in the duplication of nearly all subunits of the protein complexes associated with respiratory electron transport chains, the V-ATPase, and the ubiquitin-proteasome systems. The WGD, together with recent gene duplications, resulted in the expansion of multiple gene families related to cell growth and signal transduction, as well as secreted aspartic protease and subtilase protein families, which are known fungal virulence factors. The duplication of the ergosterol biosynthetic pathway, especially the major azole target, lanosterol 14α-demethylase (ERG11), could contribute to the variable responses of R. oryzae to different azole drugs, including voriconazole and posaconazole. Expanded families of cell-wall synthesis enzymes, essential for fungal cell integrity but absent in mammalian hosts, reveal potential targets for novel, R. oryzae-specific diagnostics and therapeutics.
EZTraits: A programmable tool to evaluate multi-site deterministic traits
The vast majority of human traits, including many disease phenotypes, are affected by alleles at numerous genomic loci. With a continually increasing set of variants with published clinical disease or biomarker associations, non-programmers need an easy-to-use tool to rapidly screen VCF files for risk alleles. We have developed EZTraits as a tool to quickly evaluate genotype data against a set of user-defined rules. These rules can be written directly in the scripting language Lua, for genotype calls referenced by variant ID (RS number) or chromosomal position. Alternatively, EZTraits can parse simple and intuitive text that includes concepts like 'any' or 'all'. Thus, EZTraits is designed to support rapid genetic analysis and hypothesis testing by researchers, regardless of programming experience or technical background. The software is implemented in C++ and compiles and runs on Linux and macOS. The source code is available under the MIT license from https://github.com/selfdecode/rd-eztraits.
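EZTraits' own rule syntax, whether in Lua or in its text format, is not given in the abstract. The sketch below therefore illustrates only the underlying idea: evaluating 'any'/'all' rules against genotype calls keyed by RS number, written in Python rather than C++ or Lua. All function names, the genotype representation (a string of single-base alleles) and the placeholder IDs `rsA`, `rsB` and `rsC` are hypothetical and carry no clinical meaning. The VCF helper follows the standard GT convention, in which 0 indexes REF and 1, 2, … index ALT alleles.

```python
from typing import Callable

Genotypes = dict[str, str]          # RS number -> genotype string such as "CT"
Rule = Callable[[Genotypes], bool]  # a rule is a predicate over one sample's genotypes


def vcf_genotype(ref: str, alt: str, gt: str) -> str:
    """Turn VCF REF, ALT and GT fields (e.g. "C", "T", "0/1") into an allele string."""
    alleles = [ref] + alt.split(",")
    # GT indexes REF as 0 and ALT alleles from 1; "/" is unphased, "|" phased, "." missing.
    return "".join(alleles[int(i)] for i in gt.replace("|", "/").split("/") if i != ".")


def carries(rsid: str, allele: str) -> Rule:
    """True if the sample carries `allele` at `rsid` (assumes single-base alleles)."""
    return lambda g: allele in g.get(rsid, "")


def any_of(*rules: Rule) -> Rule:
    """True if at least one sub-rule holds."""
    return lambda g: any(rule(g) for rule in rules)


def all_of(*rules: Rule) -> Rule:
    """True only if every sub-rule holds."""
    return lambda g: all(rule(g) for rule in rules)


# Placeholder IDs and alleles with no clinical meaning.
sample = {
    "rsA": vcf_genotype("C", "T", "0/1"),
    "rsB": vcf_genotype("G", "A", "0|0"),
    "rsC": "GG",
}
rule = all_of(carries("rsA", "T"), any_of(carries("rsB", "A"), carries("rsC", "G")))
print(rule(sample))
```

Composing rules from small predicates keeps nested 'any'/'all' logic readable. A real tool would also need to handle multi-base alleles and lookups by chromosomal position, both of which this sketch omits.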