Catalogue Search | MBRL

Accurate isoform discovery with IsoQuant using long reads

by Joglekar, Anoushka , Mikheenko, Alla , Tilgner, Hagen U. in 631/114/2785 , 631/114/794 , Agriculture

2023

Annotating newly sequenced genomes and determining alternative isoforms from long-read RNA data are complex and incompletely solved problems. Here we present IsoQuant—a computational tool using intron graphs that accurately reconstructs transcripts both with and without reference genome annotation. For novel transcript discovery, IsoQuant reduces the false-positive rate fivefold and 2.5-fold for Oxford Nanopore reference-based or reference-free mode, respectively. IsoQuant also improves performance for Pacific Biosciences data. IsoQuant predicts novel isoforms from long-read RNA sequencing.

Journal Article

Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea

by Banfield, Jillian F , Lapidus, Alla , Liu, Wen-Tso in 45/23 , 60 APPLIED LIFE SCIENCES , 631/326/171

2017

Standards for sequencing the microbial 'uncultivated majority', namely bacterial and archaeal single-cell genome sequences, and genome sequences from metagenomic datasets, are proposed. We present two standards developed by the Genomic Standards Consortium (GSC) for reporting bacterial and archaeal genome sequences. Both are extensions of the Minimum Information about Any (x) Sequence (MIxS). The standards are the Minimum Information about a Single Amplified Genome (MISAG) and the Minimum Information about a Metagenome-Assembled Genome (MIMAG), including, but not limited to, assembly quality, and estimates of genome completeness and contamination. These standards can be used in combination with other GSC checklists, including the Minimum Information about a Genome Sequence (MIGS), Minimum Information about a Metagenomic Sequence (MIMS), and Minimum Information about a Marker Gene Sequence (MIMARKS). Community-wide adoption of MISAG and MIMAG will facilitate more robust comparative genomic analyses of bacterial and archaeal diversity.

Journal Article

CDSnake: Snakemake pipeline for retrieval of annotated OTUs from paired-end reads using CD-HIT utilities

by Lapidus, Alla , Korobeynikov, Anton , Kondratenko, Yulia in 16S metagenomics , Algorithms , Analysis

2020

Background: Illumina paired-end reads are often used for 16S analysis in metagenomic studies. Since the DNA fragment size is usually smaller than the sum of the lengths of the paired reads, reads can be merged for downstream analysis. Although several tools for merging paired-end reads have been developed, poor quality at the 3′ ends within the overlapping region prevents accurate merging of a significant portion of read pairs. Recently, CD-HIT-OTU-Miseq was presented as a new approach for 16S analysis using paired-end reads; it avoids read merging entirely by clustering the paired reads separately. CD-HIT-OTU-Miseq is a set of tools that are meant to be launched successively by auxiliary shell scripts. This launch mode is not suitable for processing the large amounts of data generated in modern omics experiments. To solve this issue, we created CDSnake, a Snakemake pipeline that uses CD-HIT tools to simplify the consecutive launch of the CD-HIT-OTU-Miseq tools for complete processing of paired-end reads in metagenomic studies. The pipeline makes 16S analysis easier through a one-command launch and helps to yield reproducible results. Results: We benchmarked our pipeline against two commonly used pipelines for OTU retrieval, DADA2 and deblur, which are incorporated into QIIME2, a popular workflow for microbiome analysis. Three mock datasets with highly overlapping paired-end 2 × 250 bp reads were used for benchmarking: Balanced, HMP, and Extreme. CDSnake output fewer OTUs than DADA2 and deblur. However, on the Balanced and HMP datasets, the number of OTUs output by CDSnake was closer to the real number of strains used to generate the mock community than the numbers output by DADA2 and deblur. Though generally slower than the other pipelines, CDSnake output higher total counts, preserving more information from the raw data. Inheriting these properties from the original CD-HIT-OTU-MiSeq utilities, CDSnake makes them handier to use thanks to simple scalability, easier automated runs, and other Snakemake benefits. Conclusions: We developed a Snakemake pipeline for the OTU-MiSeq utilities that simplifies and automates data analysis. Benchmarking showed that this approach can outperform popular tools under certain conditions.

Journal Article

The Genomes of the Fungal Plant Pathogens Cladosporium fulvum and Dothistroma septosporum Reveal Adaptation to Different Hosts and Lifestyles But Also Signatures of Common Ancestry

by Ctr BioSyst Genom , Lapidus, Alla , Ohm, Robin A in Adaptation (Biology) , Adaptation, Physiological - genetics , Annan biologi

2012

We sequenced and compared the genomes of the Dothideomycete fungal plant pathogens Cladosporium fulvum (Cfu) (syn. Passalora fulva) and Dothistroma septosporum (Dse) that are closely related phylogenetically, but have different lifestyles and hosts. Although both fungi grow extracellularly in close contact with host mesophyll cells, Cfu is a biotroph infecting tomato, while Dse is a hemibiotroph infecting pine. The genomes of these fungi have a similar set of genes (70% of gene content in both genomes are homologs), but differ significantly in size (Cfu >61.1-Mb; Dse 31.2-Mb), which is mainly due to the difference in repeat content (47.2% in Cfu versus 3.2% in Dse). Recent adaptation to different lifestyles and hosts is suggested by diverged sets of genes. Cfu contains an alpha-tomatinase gene that we predict might be required for detoxification of tomatine, while this gene is absent in Dse. Many genes encoding secreted proteins are unique to each species and the repeat-rich areas in Cfu are enriched for these species-specific genes. In contrast, conserved genes suggest common host ancestry. Homologs of Cfu effector genes, including Ecp2 and Avr4, are present in Dse and induce a Cf-Ecp2- and Cf-4-mediated hypersensitive response, respectively. Strikingly, genes involved in production of the toxin dothistromin, a likely virulence factor for Dse, are conserved in Cfu, but their expression differs markedly with essentially no expression by Cfu in planta. Likewise, Cfu has a carbohydrate-degrading enzyme catalog that is more similar to that of necrotrophs or hemibiotrophs and a larger pectinolytic gene arsenal than Dse, but many of these genes are not expressed in planta or are pseudogenized. Overall, comparison of their genomes suggests that these closely related plant pathogens had a common ancestral host but since adapted to different hosts and lifestyles by a combination of differentiated gene content, pseudogenization, and gene regulation.

Journal Article
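
As a rough check on the genome-size comparison in the Cladosporium fulvum and Dothistroma septosporum abstract above, the sketch below splits each reported genome size into repeat and non-repeat portions. It uses only the figures quoted in the abstract and treats the Cfu size (reported as >61.1 Mb) as exactly 61.1 Mb, which is an assumption. The function and variable names are illustrative and do not come from the paper.

```python
def split_genome(size_mb: float, repeat_percent: float) -> tuple[float, float]:
    """Return (repeat_mb, non_repeat_mb) for a genome of the given size."""
    repeat_mb = size_mb * repeat_percent / 100.0
    return repeat_mb, size_mb - repeat_mb


# (genome size in Mb, repeat content in %) as quoted in the abstract.
# Cfu is reported as ">61.1 Mb"; 61.1 is used here as an assumption.
genomes = {"Cfu": (61.1, 47.2), "Dse": (31.2, 3.2)}

parts = {name: split_genome(size, pct) for name, (size, pct) in genomes.items()}

# Difference in total size and in repeat content between the two genomes.
size_gap = genomes["Cfu"][0] - genomes["Dse"][0]
repeat_gap = parts["Cfu"][0] - parts["Dse"][0]

for name, (repeat_mb, non_repeat_mb) in parts.items():
    print(f"{name}: repeats {repeat_mb:.1f} Mb, non-repeat {non_repeat_mb:.1f} Mb")
print(f"Share of the size difference explained by repeats: {repeat_gap / size_gap:.0%}")
```

Under these assumptions the non-repeat portions of the two genomes come out close in size, and repeats account for most of the difference, which is consistent with the abstract's statement.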

Metabolic analysis of the soil microbe Dechloromonas aromatica str. RCB: indications of a surprisingly complex life-style and cryptic anaerobic pathways for aromatic degradation

by Trong, Stephan , Salinero, Kennan Kellaris , Lapidus, Alla in Anaerobiosis , Animal Genetics and Genomics , Azoarcus

2009

Background: Initial interest in Dechloromonas aromatica strain RCB arose from its ability to anaerobically degrade benzene. It is also able to reduce perchlorate and oxidize chlorobenzoate, toluene, and xylene, creating interest in using this organism for bioremediation. Little physiological data has been published for this microbe. It is considered to be a free-living organism. Results: The a priori prediction that the D. aromatica genome would contain previously characterized "central" enzymes to support anaerobic aromatic degradation of benzene proved to be false, suggesting the presence of novel anaerobic aromatic degradation pathways in this species. These missing pathways include the benzylsuccinate synthase (bssABC) genes (responsible for fumarate addition to toluene) and the central benzoyl-CoA pathway for monoaromatics. In-depth analyses using existing TIGRfam, COG, and InterPro models, and the creation of de novo HMM models, indicate a highly complex lifestyle with a large number of environmental sensors and signaling pathways, including a relatively large number of GGDEF domain signal receptors and multiple quorum sensors. A number of proteins indicate interactions with an as yet unknown host, as indicated by the presence of predicted cell host remodeling enzymes, effector enzymes, hemolysin-like proteins, adhesins, NO reductase, and both type III and type VI secretory complexes. Evidence of biofilm formation, including a proposed exopolysaccharide complex and exosortase (epsH), is also present. Annotation described in this paper also reveals evidence for several metabolic pathways that have yet to be observed experimentally, including a sulphur oxidation (soxFCDYZAXB) gene cluster, Calvin cycle enzymes, and proteins involved in nitrogen fixation in other species (including RubisCo, ribulose-phosphate 3-epimerase, and nif gene families, respectively). Conclusion: Analysis of the D. aromatica genome indicates there is much to be learned regarding the metabolic capabilities and lifestyle of this microbial species. Examples of recent gene duplication events in signaling as well as dioxygenase clusters are present, indicating selective gene family expansion as a relatively recent event in D. aromatica's evolutionary history. Gene families that constitute metabolic cycles presumed to create D. aromatica's environmental 'foot-print' indicate a high level of diversification between its predicted capabilities and those of its close relatives, A. aromaticum str EbN1 and Azoarcus BH72.

Journal Article

Comparative genomics of biotechnologically important yeasts

by Klenk, Hans-Peter , Lopes, Mariana R. , Lapidus, Alla in Ascomycetes , Ascomycota - classification , Ascomycota - genetics

2016

Ascomycete yeasts are metabolically diverse, with great potential for biotechnology. Here, we report the comparative genome analysis of 29 taxonomically and biotechnologically important yeasts, including 16 newly sequenced. We identify a genetic code change, CUG-Ala, in Pachysolen tannophilus in the clade sister to the known CUG-Ser clade. Our well-resolved yeast phylogeny shows that some traits, such as methylotrophy, are restricted to single clades, whereas others, such as L-rhamnose utilization, have patchy phylogenetic distributions. Gene clusters, with variable organization and distribution, encode many pathways of interest. Genomics can predict some biochemical traits precisely, but the genomic basis of others, such as xylose utilization, remains unresolved. Our data also provide insight into early evolution of ascomycetes. We document the loss of H3K9me2/3 heterochromatin, the origin of ascomycete mating-type switching, and panascomycete synteny at the MAT locus. These data and analyses will facilitate the engineering of efficient biosynthetic and degradative pathways and gateways for genomic manipulation.

Journal Article

Transposable elements versus the fungal genome: impact on whole-genome architecture and transcriptional profiles

by Stajich, Jason E , Castanera Andrés, Raúl , Schmutz, Jeremy in 60 APPLIED LIFE SCIENCES , Ascomycota - genetics , Base Sequence

2016

Transposable elements (TEs) are exceptional contributors to eukaryotic genome diversity. Their ubiquitous presence impacts the genomes of nearly all species and mediates genome evolution by causing mutations and chromosomal rearrangements and by modulating gene expression. We performed an exhaustive analysis of the TE content in 18 fungal genomes, including strains of the same species and species of the same genera. Our results depicted a scenario of exceptional variability, with species having 0.02 to 29.8% of their genome consisting of transposable elements. A detailed analysis performed on two strains of Pleurotus ostreatus uncovered a genome that is populated mainly by Class I elements, especially LTR-retrotransposons amplified in recent bursts from 0 to 2 million years (My) ago. The preferential accumulation of TEs in clusters led to the presence of genomic regions that lacked intra- and inter-specific conservation. In addition, we investigated the effect of TE insertions on the expression of their nearby upstream and downstream genes. Our results showed that an important number of genes under TE influence are significantly repressed, with stronger repression when genes are localized within transposon clusters. Our transcriptional analysis performed in four additional fungal models revealed that this TE-mediated silencing was present only in species with active cytosine methylation machinery. We hypothesize that this phenomenon is related to epigenetic defense mechanisms that are aimed to suppress TE expression and control their proliferation.

Journal Article

The Complete Genome Sequence of Cupriavidus metallidurans Strain CH34, a Master Survivalist in Harsh and Anthropogenic Environments

by Vallaeys, Tatiana , Dunn, John , Lapidus, Alla in Analysis , Annotations , Anthropogenic factors

2010

Many bacteria in the environment have adapted to the presence of toxic heavy metals. Over the last 30 years, this heavy metal tolerance was the subject of extensive research. The bacterium Cupriavidus metallidurans strain CH34, originally isolated by us in 1976 from a metal processing factory, is considered a major model organism in this field because it withstands milli-molar range concentrations of over 20 different heavy metal ions. This tolerance is mostly achieved by rapid ion efflux but also by metal-complexation and -reduction. We present here the full genome sequence of strain CH34 and the manual annotation of all its genes. The genome of C. metallidurans CH34 is composed of two large circular chromosomes CHR1 and CHR2 of, respectively, 3,928,089 bp and 2,580,084 bp, and two megaplasmids pMOL28 and pMOL30 of, respectively, 171,459 bp and 233,720 bp in size. At least 25 loci for heavy-metal resistance (HMR) are distributed over the four replicons. Approximately 67% of the 6,717 coding sequences (CDSs) present in the CH34 genome could be assigned a putative function, and 9.1% (611 genes) appear to be unique to this strain. One out of five proteins is associated with either transport or transcription while the relay of environmental stimuli is governed by more than 600 signal transduction systems. The CH34 genome is most similar to the genomes of other Cupriavidus strains by correspondence between the respective CHR1 replicons but also displays similarity to the genomes of more distantly related species as a result of gene transfer and through the presence of large genomic islands. The presence of at least 57 IS elements and 19 transposons and the ability to take in and express foreign genes indicates a very dynamic and complex genome shaped by evolutionary forces. The genome data show that C. metallidurans CH34 is particularly well equipped to live in extreme conditions and anthropogenic environments that are rich in metals.

Journal Article
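
The replicon sizes and gene counts quoted in the Cupriavidus metallidurans CH34 abstract above can be cross-checked with a few lines of arithmetic. The sketch below sums the four replicons and recomputes the share of strain-specific genes. All numbers are taken from the abstract, and the variable names are illustrative only.

```python
# Replicon sizes in base pairs, as quoted in the abstract.
replicons = {
    "CHR1": 3_928_089,
    "CHR2": 2_580_084,
    "pMOL28": 171_459,
    "pMOL30": 233_720,
}

# Total genome size is the sum over the two chromosomes and two megaplasmids.
total_bp = sum(replicons.values())

# Coding sequences and strain-specific genes, as quoted in the abstract.
total_cds = 6_717
unique_genes = 611
unique_percent = 100 * unique_genes / total_cds

for name, size in replicons.items():
    print(f"{name}: {size:,} bp ({100 * size / total_bp:.1f}% of the genome)")
print(f"Total: {total_bp:,} bp")
print(f"Strain-specific genes: {unique_percent:.1f}% of CDSs")
```

The recomputed percentage of strain-specific genes rounds to the 9.1% stated in the abstract.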

Genome sequence of the button mushroom Agaricus bisporus reveals mechanisms governing adaptation to a humic-rich ecological niche

by Lucas, Susan M , Patyshakuliyeva, Aleksandrina , Unité de recherche Mycologie et Sécurité des Aliments (MycSA) ; Institut National de la Recherche Agronomique (INRA) in Adaptation, Physiological - genetics , Agaricus , Agaricus - genetics

2012

Agaricus bisporus is the model fungus for the adaptation, persistence, and growth in the humic-rich leaf-litter environment. Aside from its ecological role, A. bisporus has been an important component of the human diet for over 200 y and worldwide cultivation of the “button mushroom” forms a multibillion dollar industry. We present two A. bisporus genomes, their gene repertoires and transcript profiles on compost and during mushroom formation. The genomes encode a full repertoire of polysaccharide-degrading enzymes similar to that of wood-decayers. Comparative transcriptomics of mycelium grown on defined medium, casing-soil, and compost revealed genes encoding enzymes involved in xylan, cellulose, pectin, and protein degradation are more highly expressed in compost. The striking expansion of heme-thiolate peroxidases and β-etherases is distinctive from Agaricomycotina wood-decayers and suggests a broad attack on decaying lignin and related metabolites found in humic acid-rich environment. Similarly, up-regulation of these genes together with a lignolytic manganese peroxidase, multiple copper radical oxidases, and cytochrome P450s is consistent with challenges posed by complex humic-rich substrates. The gene repertoire and expression of hydrolytic enzymes in A. bisporus is substantially different from the taxonomically related ectomycorrhizal symbiont Laccaria bicolor. A common promoter motif was also identified in genes very highly expressed in humic-rich substrates. These observations reveal genetic and enzymatic mechanisms governing adaptation to the humic-rich ecological niche formed during plant degradation, further defining the critical role such fungi contribute to soil structure and carbon sequestration in terrestrial ecosystems. Genome sequence will expedite mushroom breeding for improved agronomic characteristics.

Journal Article

The Fast Changing Landscape of Sequencing Technologies and Their Impact on Microbial Genome Assemblies and Annotation

by Woyke, Tanja , Kyrpides, Nikos C. , Clum, Alicia in Algorithms , Analysis , Annotations

2012

The emergence of next generation sequencing (NGS) has provided the means for rapid and high throughput sequencing and data generation at low cost, while concomitantly creating a new set of challenges. The number of available assembled microbial genomes continues to grow rapidly and their quality reflects the quality of the sequencing technology used, but also of the analysis software employed for assembly and annotation. In this work, we have explored the quality of the microbial draft genomes across various sequencing technologies. We have compared the draft and finished assemblies of 133 microbial genomes sequenced at the Department of Energy-Joint Genome Institute and finished at the Los Alamos National Laboratory using a variety of combinations of sequencing technologies, reflecting the transition of the institute from Sanger-based sequencing platforms to NGS platforms. The quality of the public assemblies and of the associated gene annotations was evaluated using various metrics. Results obtained with the different sequencing technologies, as well as their effects on downstream processes, were analyzed. Our results demonstrate that the Illumina HiSeq 2000 sequencing system, the primary sequencing technology currently used for de novo genome sequencing and assembly at JGI, has various advantages in terms of total sequence throughput and cost, but it also introduces challenges for the downstream analyses. In all cases, assembly results, although of high quality on average, need to be viewed critically, and their sources of error should be considered prior to analysis. These data follow the evolution of microbial sequencing and downstream processing at the JGI from draft genome sequences with large gaps corresponding to missing genes of significant biological role to assemblies with multiple small gaps (Illumina) and finally to assemblies that generate almost complete genomes (Illumina+PacBio).

Journal Article
