Catalogue Search | MBRL
304 result(s) for "Kyrpides, Nikos"
CheckV assesses the quality and completeness of metagenome-assembled viral genomes
by Camargo, Antonio Pedro; Schulz, Frederik; Kyrpides, Nikos C.
in 631/114/2785, 631/326/596/2142, Agriculture
2021
Millions of new viral sequences have been identified from metagenomes, but the quality and completeness of these sequences vary considerably. Here we present CheckV, an automated pipeline for identifying closed viral genomes, estimating the completeness of genome fragments and removing flanking host regions from integrated proviruses. CheckV estimates completeness by comparing sequences with a large database of complete viral genomes, including 76,262 identified from a systematic search of publicly available metagenomes, metatranscriptomes and metaviromes. After validation on mock datasets and comparison to existing methods, we applied CheckV to large and diverse collections of metagenome-assembled viral sequences, including IMG/VR and the Global Ocean Virome. This revealed 44,652 high-quality viral genomes (that is, >90% complete), although the vast majority of sequences were small fragments, which highlights the challenge of assembling viral genomes from short-read metagenomes. Additionally, we found that removal of host contamination substantially improved the accurate identification of auxiliary metabolic genes and interpretation of viral-encoded functions.
The quality of viral genomes assembled from metagenome data is assessed by CheckV.
Journal Article
New insights from uncultivated genomes of the global human gut microbiome
by Pollard, Katherine S.; Shi, Zhou Jason; Kyrpides, Nikos C.
in 631/114/2785, 631/326/171, Acids
2019
The genome sequences of many species of the human gut microbiome remain unknown, largely owing to challenges in cultivating microorganisms under laboratory conditions. Here we address this problem by reconstructing 60,664 draft prokaryotic genomes from 3,810 faecal metagenomes, from geographically and phenotypically diverse humans. These genomes provide reference points for 2,058 newly identified species-level operational taxonomic units (OTUs), which represents a 50% increase over the previously known phylogenetic diversity of sequenced gut bacteria. On average, the newly identified OTUs comprise 33% of richness and 28% of species abundance per individual, and are enriched in humans from rural populations. A meta-analysis of clinical gut-microbiome studies pinpointed numerous disease associations for the newly identified OTUs, which have the potential to improve predictive models. Finally, our analysis revealed that uncultured gut species have undergone genome reduction that has resulted in the loss of certain biosynthetic pathways, which may offer clues for improving cultivation strategies in the future.
Draft prokaryotic genomes from faecal metagenomes of diverse human populations enrich our understanding of the human gut microbiome by identifying over two thousand new species-level taxa that have numerous disease associations.
Journal Article
A unified catalog of 204,938 reference genomes from the human gut microbiome
by Pollard, Katherine S.; Almeida, Alexandre; Boland, Miguel
in 631/326/2565/2134, 631/326/2565/2142, Agriculture
2021
Comprehensive, high-quality reference genomes are required for functional characterization and taxonomic assignment of the human gut microbiota. We present the Unified Human Gastrointestinal Genome (UHGG) collection, comprising 204,938 nonredundant genomes from 4,644 gut prokaryotes. These genomes encode >170 million protein sequences, which we collated in the Unified Human Gastrointestinal Protein (UHGP) catalog. The UHGP more than doubles the number of gut proteins in comparison to those present in the Integrated Gene Catalog. More than 70% of the UHGG species lack cultured representatives, and 40% of the UHGP lack functional annotations. Intraspecies genomic variation analyses revealed a large reservoir of accessory genes and single-nucleotide variants, many of which are specific to individual human populations. The UHGG and UHGP collections will enable studies linking genotypes to phenotypes in the human gut microbiome.
More than 200,000 gut prokaryotic reference genomes and the proteins they encode are collated, providing comprehensive resources for microbiome researchers.
Journal Article
Identification of mobile genetic elements with geNomad
2024
Identifying and characterizing mobile genetic elements in sequencing data is essential for understanding their diversity, ecology, biotechnological applications and impact on public health. Here we introduce geNomad, a classification and annotation framework that combines information from gene content and a deep neural network to identify sequences of plasmids and viruses. geNomad uses a dataset of more than 200,000 marker protein profiles to provide functional gene annotation and taxonomic assignment of viral genomes. Using a conditional random field model, geNomad also detects proviruses integrated into host genomes with high precision. In benchmarks, geNomad achieved high classification performance for diverse plasmids and viruses (Matthews correlation coefficient of 77.8% and 95.3%, respectively), substantially outperforming other tools. Leveraging geNomad’s speed and scalability, we processed over 2.7 trillion base pairs of sequencing data, leading to the discovery of millions of viruses and plasmids that are available through the IMG/VR and IMG/PR databases. geNomad is available at https://portal.nersc.gov/genomad.
geNomad identifies mobile genetic elements in sequencing data.
Journal Article
Programmed DNA destruction by miniature CRISPR-Cas14 enzymes
by Harrington, Lucas B.; Kyrpides, Nikos C.; Chen, Janice S.
in Adaptive immunity, Adaptive systems, Amino acids
2018
CRISPR-Cas9 systems have been causing a revolution in biology. Harrington et al. describe the discovery and technological implementation of an additional type of CRISPR system based on an extracompact effector protein, Cas14. Metagenomics data, particularly from uncultivated samples, uncovered the CRISPR-Cas14 systems containing all the components necessary for adaptive immunity in prokaryotes. At half the size of class 2 CRISPR effectors, Cas14 appears to target single-stranded DNA without class 2 sequence restrictions. By leveraging this activity, a fast and high-fidelity nucleic acid detection system enabled detection of single-nucleotide polymorphisms.
Science, this issue p. 839
Identification, characterization, and technological implementation of additional archaea-derived CRISPR-Cas14 systems are described.
CRISPR-Cas systems provide microbes with adaptive immunity to infectious nucleic acids and are widely employed as genome editing tools. These tools use RNA-guided Cas proteins whose large size (950 to 1400 amino acids) has been considered essential to their specific DNA- or RNA-targeting activities. Here we present a set of CRISPR-Cas systems from uncultivated archaea that contain Cas14, a family of exceptionally compact RNA-guided nucleases (400 to 700 amino acids). Despite their small size, Cas14 proteins are capable of targeted single-stranded DNA (ssDNA) cleavage without restrictive sequence requirements. Moreover, target recognition by Cas14 triggers nonspecific cutting of ssDNA molecules, an activity that enables high-fidelity single-nucleotide polymorphism genotyping (Cas14-DETECTR). Metagenomic data show that multiple CRISPR-Cas14 systems evolved independently and suggest a potential evolutionary origin of single-effector CRISPR-based adaptive immunity.
Journal Article
Comparative Metagenomic and Metatranscriptomic Analysis of Hindgut Paunch Microbiota in Wood- and Dung-Feeding Higher Termites
by He, Shaomei; Scheffrahn, Rudolf H.; Kyrpides, Nikos C.
in Amitermes wheeleri, Analysis, Animals
2013
Termites effectively feed on many types of lignocellulose assisted by their gut microbial symbionts. To better understand the microbial decomposition of biomass with varied chemical profiles, it is important to determine whether termites harbor different microbial symbionts with specialized functionalities geared toward different feeding regimens. In this study, we compared the microbiota in the hindgut paunch of Amitermes wheeleri collected from cow dung and Nasutitermes corniger feeding on sound wood by 16S rRNA pyrotag, comparative metagenomic and metatranscriptomic analyses. We found that Firmicutes and Spirochaetes were the most abundant phyla in A. wheeleri, in contrast to N. corniger where Spirochaetes and Fibrobacteres dominated. Despite this community divergence, a convergence was observed for functions essential to termite biology including hydrolytic enzymes, homoacetogenesis and cell motility and chemotaxis. Overrepresented functions in A. wheeleri relative to N. corniger microbiota included hemicellulose breakdown and fixed-nitrogen utilization. By contrast, glycoside hydrolases attacking celluloses and nitrogen fixation genes were overrepresented in N. corniger microbiota. These observations are consistent with dietary differences in carbohydrate composition and nutrient contents, but may also reflect the phylogenetic difference between the hosts.
Journal Article
Protein structure determination using metagenome sequence data
by Huang, Po-Ssu; Kim, David E.; Kamisetty, Hetunandan
in Algorithms, Alignment, Amino Acid Sequence
2017
Despite decades of work by structural biologists, there are still ~5200 protein families with unknown structure outside the range of comparative modeling. We show that Rosetta structure prediction guided by residue-residue contacts inferred from evolutionary information can accurately model proteins that belong to large families and that metagenome sequence data more than triple the number of protein families with sufficient sequences for accurate modeling. We then integrate metagenome data, contact-based structure matching, and Rosetta structure calculations to generate models for 614 protein families with currently unknown structures; 206 are membrane proteins and 137 have folds not represented in the Protein Data Bank. This approach provides the representative models for large protein families originally envisioned as the goal of the Protein Structure Initiative at a fraction of the cost.
Journal Article
Giant virus diversity and host interactions through global metagenomics
2020
Our current knowledge about nucleocytoplasmic large DNA viruses (NCLDVs) is largely derived from viral isolates that are co-cultivated with protists and algae. Here we reconstructed 2,074 NCLDV genomes from sampling sites across the globe by building on the rapidly increasing amount of publicly available metagenome data. This led to an 11-fold increase in phylogenetic diversity and a parallel 10-fold expansion in functional diversity. Analysis of 58,023 major capsid proteins from large and giant viruses using metagenomic data revealed the global distribution patterns and cosmopolitan nature of these viruses. The discovered viral genomes encoded a wide range of proteins with putative roles in photosynthesis and diverse substrate transport processes, indicating that host reprogramming is probably a common strategy in the NCLDVs. Furthermore, inferences of horizontal gene transfer connected viral lineages to diverse eukaryotic hosts. We anticipate that the global diversity of NCLDVs that we describe here will establish giant viruses—which are associated with most major eukaryotic lineages—as important players in ecosystems across Earth’s biomes.
Analysis of metagenomics data revealed that large and giant viruses are globally widely distributed and are associated with most major eukaryotic lineages.
Journal Article
Giant viruses with an expanded complement of translation system components
by Daims, Holger; Koonin, Eugene V.; Woyke, Tanja
in 60 APPLIED LIFE SCIENCES, Amino acids, Amino Acyl-tRNA Synthetases - chemistry
2017
The discovery of giant viruses blurred the sharp division between viruses and cellular life. Giant virus genomes encode proteins considered as signatures of cellular organisms, particularly translation system components, prompting hypotheses that these viruses derived from a fourth domain of cellular life. Here we report the discovery of a group of giant viruses (Klosneuviruses) in metagenomic data. Compared with other giant viruses, the Klosneuviruses encode an expanded translation machinery, including aminoacyl transfer RNA synthetases with specificities for all 20 amino acids. Notwithstanding the prevalence of translation system components, comprehensive phylogenomic analysis of these genes indicates that Klosneuviruses did not evolve from a cellular ancestor but rather are derived from a much smaller virus through extensive gain of host genes.
Journal Article
Uncovering Earth’s virome
by Huntemann, Marcel; Kyrpides, Nikos C.; Mikhailova, Natalia
in 60 APPLIED LIFE SCIENCES, 631/158/855, 631/326/1321
2016
Viruses are the most abundant biological entities on Earth, but challenges in detecting, isolating, and classifying unknown viruses have prevented exhaustive surveys of the global virome. Here we analysed over 5 Tb of metagenomic sequence data from 3,042 geographically diverse samples to assess the global distribution, phylogenetic diversity, and host specificity of viruses. We discovered over 125,000 partial DNA viral genomes, including the largest phage yet identified, and increased the number of known viral genes by 16-fold. Half of the predicted partial viral genomes were clustered into genetically distinct groups, most of which included genes unrelated to those in known viruses. Using CRISPR spacers and transfer RNA matches to link viral groups to microbial host(s), we doubled the number of microbial phyla known to be infected by viruses, and identified viruses that can infect organisms from different phyla. Analysis of viral distribution across diverse ecosystems revealed strong habitat-type specificity for the vast majority of viruses, but also identified some cosmopolitan groups. Our results highlight an extensive global viral diversity and provide detailed insight into viral habitat distribution and host–virus interactions.
An integrated computational approach that explores the viral content of more than 3,000 metagenomic samples collected globally highlights the existing global viral diversity, increases the known number of viral genes by an order of magnitude, and provides detailed insights into viral distribution across diverse ecosystems and into virus–host interactions.
A map of the viral world
Viruses influence virtually all of the biogeochemical processes occurring on our planet, but they remain enigmatic because it has proved difficult to detect, isolate and classify them in large-scale studies. However, in recent years a vast amount of metagenomic data have been collected, and now Nikos Kyrpides and colleagues have developed a computational approach to extract more detail from that dataset and create the first global map of viral biogeography. They explore the viral content of more than 3,000 metagenomic samples collected globally, identify 125,000 partial DNA viral genomes — including the largest known phage — and increase the number of known viral genes 16-fold.
Journal Article