857 result(s) for "orthology"
eggNOG-mapper v2: Functional Annotation, Orthology Assignments, and Domain Prediction at the Metagenomic Scale
Even though automated functional annotation of genes represents a fundamental step in most genomic and metagenomic workflows, it remains challenging at large scales. Here, we describe a major upgrade to eggNOG-mapper, a tool for functional annotation based on precomputed orthology assignments, now optimized for vast (meta)genomic data sets. Improvements in version 2 include a full update of both the genomes and functional databases to those from eggNOG v5, as well as several efficiency enhancements and new features. Most notably, eggNOG-mapper v2 now allows for: 1) de novo gene prediction from raw contigs, 2) built-in pairwise orthology prediction, 3) fast protein domain discovery, and 4) automated GFF decoration. eggNOG-mapper v2 is available as a standalone tool or as an online service at http://eggnog-mapper.embl.de.
OrthoFinder: phylogenetic orthology inference for comparative genomics
Here, we present a major advance of the OrthoFinder method. This extends OrthoFinder’s high accuracy orthogroup inference to provide phylogenetic inference of orthologs, rooted gene trees, gene duplication events, the rooted species tree, and comparative genomics statistics. Each output is benchmarked on appropriate real or simulated datasets, and where comparable methods exist, OrthoFinder is equivalent to or outperforms these methods. Furthermore, OrthoFinder is the most accurate ortholog inference method on the Quest for Orthologs benchmark test. Finally, OrthoFinder’s comprehensive phylogenetic analysis is achieved with equivalent speed and scalability to the fastest, score-based heuristic methods. OrthoFinder is available at https://github.com/davidemms/OrthoFinder.
Disentangling Sources of Gene Tree Discordance in Phylogenomic Data Sets
Gene tree discordance in large genomic data sets can be caused by evolutionary processes such as incomplete lineage sorting and hybridization, as well as model violation, and errors in data processing, orthology inference, and gene tree estimation. Species tree methods that identify and accommodate all sources of conflict are not available, but a combination of multiple approaches can help tease apart alternative sources of conflict. Here, using a phylotranscriptomic analysis in combination with reference genomes, we test a hypothesis of ancient hybridization events within the plant family Amaranthaceae s.l. that was previously supported by morphological, ecological, and Sanger-based molecular data. The data set included seven genomes and 88 transcriptomes, 17 generated for this study. We examined gene-tree discordance using coalescent-based species trees and network inference, gene tree discordance analyses, site pattern tests of introgression, topology tests, synteny analyses, and simulations. We found that a combination of processes might have generated the high levels of gene tree discordance in the backbone of Amaranthaceae s.l. Furthermore, we found evidence that three consecutive short internal branches produce anomalous trees contributing to the discordance. Overall, our results suggest that Amaranthaceae s.l. might be a product of an ancient and rapid lineage diversification, and remains, and probably will remain, unresolved. This work highlights the potential problems of identifiability associated with the sources of gene tree discordance including, in particular, phylogenetic network methods. Our results also demonstrate the importance of thoroughly testing for multiple sources of conflict in phylogenomic analyses, especially in the context of ancient, rapid radiations. We provide several recommendations for exploring conflicting signals in such situations.
SonicParanoid2: fast, accurate, and comprehensive orthology inference with machine learning and language models
Accurate inference of orthologous genes constitutes a prerequisite for comparative and evolutionary genomics. SonicParanoid is one of the fastest tools for orthology inference; however, its scalability and accuracy have been hampered by time-consuming all-versus-all alignments and the existence of proteins with complex domain architectures. Here, we present a substantial update of SonicParanoid, where a gradient boosting predictor halves the execution time and a language model doubles the recall. Application to empirical large-scale and standardized benchmark datasets shows that SonicParanoid2 is much faster than comparable methods and also the most accurate. SonicParanoid2 is available at https://gitlab.com/salvo981/sonicparanoid2 and https://zenodo.org/doi/10.5281/zenodo.11371108.
Fast Genome-Wide Functional Annotation through Orthology Assignment by eggNOG-Mapper
Orthology assignment is ideally suited for functional inference. However, because predicting orthology is computationally intensive at large scale, and most pipelines are relatively inaccessible (e.g., new assignments only available through database updates), less precise homology-based functional transfer is still the default for (meta-)genome annotation. We, therefore, developed eggNOG-mapper, a tool for functional annotation of large sets of sequences based on fast orthology assignments using precomputed clusters and phylogenies from the eggNOG database. To validate our method, we benchmarked Gene Ontology (GO) predictions against two widely used homology-based approaches: BLAST and InterProScan. Orthology filters applied to BLAST results reduced the rate of false positive assignments by 11%, and increased the ratio of experimentally validated terms recovered over all terms assigned per protein by 15%. Compared with InterProScan, eggNOG-mapper achieved similar proteome coverage and precision while predicting, on average, 41 more terms per protein and increasing the rate of experimentally validated terms recovered over total term assignments per protein by 35%. EggNOG-mapper predictions scored within the top-5 methods in the three GO categories using the CAFA2 NK-partial benchmark. Finally, we evaluated eggNOG-mapper for functional annotation of metagenomics data, yielding better performance than InterProScan. eggNOG-mapper runs ∼15× faster than BLAST and at least 2.5× faster than InterProScan. The tool is available standalone and as an online service at http://eggnog-mapper.embl.de.
Integrated metagenomic and metabolomic analysis reveals distinct gut-microbiome-derived phenotypes in early-onset colorectal cancer
Objective: The incidence of early-onset colorectal cancer (EO-CRC) is steadily increasing. Here, we aimed to characterise the interactions between gut microbiome, metabolites and microbial enzymes in EO-CRC patients and evaluate their potential as non-invasive biomarkers for EO-CRC. Design: We performed metagenomic and metabolomic analyses, identified multiomics markers and constructed CRC classifiers for the discovery cohort with 130 late-onset CRC (LO-CRC), 114 EO-CRC subjects and age-matched healthy controls (97 LO-Control and 100 EO-Control). An independent cohort of 38 LO-CRC, 24 EO-CRC, 22 LO-Controls and 24 EO-Controls was analysed to validate the results. Results: Compared with controls, reduced alpha-diversity was apparent in both LO-CRC and EO-CRC subjects. Although common variations existed, integrative analyses identified distinct microbiome–metabolome associations in LO-CRC and EO-CRC. Fusobacterium nucleatum enrichment and short-chain fatty acid depletion, including reduced microbial GABA biosynthesis and a shift in acetate/acetaldehyde metabolism towards acetyl-CoA production, characterise LO-CRC. In comparison, multiomics signatures of EO-CRC tended to be associated with enriched Flavonifractor plauti and increased tryptophan, bile acid and choline metabolism. Notably, elevated red meat intake-related species, choline metabolites and KEGG orthology (KO) pldB and cbh gene axis may be potential tumour stimulators in EO-CRC. The predictive model based on metagenomic, metabolomic and KO gene markers achieved a powerful classification performance for distinguishing EO-CRC from controls. Conclusion: Our large-sample multiomics data suggest that altered microbiome–metabolome interplay helps explain the pathogenesis of EO-CRC and LO-CRC. The potential of microbiome-derived biomarkers as promising non-invasive tools could be used for the accurate detection and distinction of individuals with EO-CRC.
Orthology Clusters from Gene Trees with Possvm
Possvm (Phylogenetic Ortholog Sorting with Species oVerlap and MCL [Markov clustering algorithm]) is a tool that automates the process of identifying clusters of orthologous genes from precomputed phylogenetic trees and classifying gene families. It identifies orthology relationships between genes using the species overlap algorithm to infer taxonomic information from the gene tree topology, and then uses the MCL to identify orthology clusters and provide annotated gene families. Our benchmarking shows that this approach, when provided with accurate phylogenies, is able to identify manually curated orthogroups with very high precision and recall. Overall, Possvm automates the routine process of gene tree inspection and annotation in a highly interpretable manner, and provides reusable outputs and phylogeny-aware gene annotations that can be used to inform comparative genomics and gene family evolution analyses.
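As a rough illustration of the species overlap idea mentioned in the Possvm abstract, the sketch below labels an internal node of a rooted gene tree as a duplication when its two child subtrees share at least one species, and as a speciation otherwise; genes separated by a speciation node are reported as orthologs. This is not Possvm's code: the nested-tuple tree encoding, the `Species_gene` naming convention, and the function names are assumptions made for the example, and the MCL clustering step is omitted.

```python
from itertools import product


def leaves(tree):
    """Return the gene names at the tips of a nested-tuple binary tree."""
    if isinstance(tree, str):
        return [tree]
    left, right = tree
    return leaves(left) + leaves(right)


def species_of(gene):
    # Assumed naming convention for this sketch: "Species_geneid".
    return gene.split("_", 1)[0]


def ortholog_pairs(tree):
    """Collect gene pairs whose last common ancestor is a speciation node."""
    if isinstance(tree, str):
        return set()
    left, right = tree
    pairs = ortholog_pairs(left) | ortholog_pairs(right)
    left_genes, right_genes = leaves(left), leaves(right)
    left_species = {species_of(g) for g in left_genes}
    right_species = {species_of(g) for g in right_genes}
    # No shared species between the two subtrees: treat the node as a speciation.
    # Any shared species: treat it as a duplication and add no pairs here.
    if left_species.isdisjoint(right_species):
        pairs |= {frozenset(p) for p in product(left_genes, right_genes)}
    return pairs


# Toy gene tree: a duplication in the Hsap/Mmus ancestor, with Dmel as outgroup.
tree = ((("Hsap_A1", "Mmus_A1"), ("Hsap_A2", "Mmus_A2")), "Dmel_A")
pairs = ortholog_pairs(tree)
for pair in sorted(sorted(p) for p in pairs):
    print(pair)
```

In the toy tree, `Hsap_A1` and `Mmus_A1` are reported as orthologs, while `Hsap_A1` and `Mmus_A2` are not, because the node joining them has overlapping species sets and is treated as a duplication.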
Parallel ddRAD and Genome Skimming Analyses Reveal a Radiative and Reticulate Evolutionary History of the Temperate Bamboos
Rapid evolutionary radiations are among the most challenging phylogenetic problems, wherein different types of data (e.g., morphology and molecular) or genetic markers (e.g., nuclear and organelle) often yield inconsistent results. The tribe Arundinarieae, that is, the temperate bamboos, is a tetraploid clade that originated 22 Ma and subsequently radiated in East Asia. Previous studies of Arundinarieae have found conflicting relationships and/or low support. Here, we obtain nuclear markers from ddRAD data for 213 Arundinarieae taxa and parallel sampling of chloroplast genomes from genome skimming for 147 taxa. We first assess the feasibility of using ddRAD-seq data for phylogenetic estimates of paleopolyploid and rapidly radiated lineages, and optimize clustering thresholds and the analysis workflow for orthology identification. Reference-based ddRAD data assembly approaches perform well and yield strongly supported relationships that are generally concordant with morphology-based taxonomy. We recover five major lineages, two of which are notable (the pachymorph and leptomorph lineages), in that they correspond with distinct rhizome morphologies. By contrast, the phylogeny from chloroplast genomes differed significantly. Based on multiple lines of evidence, the ddRAD tree is favored as the best species tree estimation for temperate bamboos. Using a time-calibrated ddRAD tree, we find that Arundinarieae diversified rapidly around the mid-Miocene corresponding with intensification of the East Asian monsoon and the evolution of key innovations including the leptomorph rhizomes. Our results provide a highly resolved phylogeny of Arundinarieae, shed new light on the radiation and reticulate evolutionary history of this tribe, and provide an empirical example for the study of recalcitrant plant radiations.
GENESPACE tracks regions of interest and gene copy number variation across multiple genomes
The development of multiple chromosome-scale reference genome sequences in many taxonomic groups has yielded a high-resolution view of the patterns and processes of molecular evolution. Nonetheless, leveraging information across multiple genomes remains a significant challenge in nearly all eukaryotic systems. These challenges range from studying the evolution of chromosome structure, to finding candidate genes for quantitative trait loci, to testing hypotheses about speciation and adaptation. Here, we present GENESPACE, which addresses these challenges by integrating conserved gene order and orthology to define the expected physical position of all genes across multiple genomes. We demonstrate this utility by dissecting presence–absence, copy-number, and structural variation at three levels of biological organization: spanning 300 million years of vertebrate sex chromosome evolution, across the diversity of the Poaceae (grass) plant family, and among 26 maize cultivars. The methods to build and visualize syntenic orthology in the GENESPACE R package offer a significant addition to existing gene family and synteny programs, especially in polyploid, outbred, and other complex genomes.
The genome is the complete DNA sequence of an individual. It is a crucial foundation for many studies in medicine, agriculture, and conservation biology. Advances in genetics have made it possible to rapidly sequence, or read out, the genome of many organisms. For closely related species, scientists can then do detailed comparisons, revealing similar genes with a shared past or a common role, but comparing more distantly related organisms remains difficult. One major challenge is that genes are often lost or duplicated over evolutionary time. One way to be more confident is to look at ‘synteny’, or how genes are organized or ordered within the genome. In some groups of species, synteny persists across millions of years of evolution. Combining sequence similarity with gene order could make comparisons between distantly related species more robust. To do this, Lovell et al. developed GENESPACE, a software that links similarities between DNA sequences to the order of genes in a genome. This allows researchers to visualize and explore related DNA sequences and determine whether genes have been lost or duplicated. To demonstrate the value of GENESPACE, Lovell et al. explored evolution in vertebrates and flowering plants. The software was able to highlight the shared sequences between unique sex chromosomes in birds and mammals, and it was able to track the positions of genes important in the evolution of grass crops including maize, wheat, and rice. Exploring the genetic code in this way could lead to a better understanding of the evolution of important sections of the genome. It might also allow scientists to find target genes for applications like crop improvement. Lovell et al. have designed the GENESPACE software to be easy for other scientists to use, allowing them to make graphics and perform analyses with few programming skills.
Genome-wide identification and expression profiling of trihelix gene family under abiotic stresses in wheat
Background: The trihelix gene family is a plant-specific transcription factor family that plays important roles in plant growth, development, and responses to abiotic stresses. However, to date, no systemic characterization of the trihelix genes has yet been conducted in wheat and its close relatives. Results: We identified a total of 94 trihelix genes in wheat, as well as 22 trihelix genes in Triticum urartu, 29 in Aegilops tauschii, and 31 in Brachypodium distachyon. We analyzed the chromosomal locations and orthology relations of the identified trihelix genes, and no trihelix gene was found to be located on chromosome 7A, 7B, or 7D of wheat, thereby reflecting the uneven distributions of wheat trihelix genes. Phylogenetic analysis indicated that the 186 identified trihelix proteins in wheat, rice, B. distachyon, and Arabidopsis were clustered into five major clades. The trihelix genes belonging to the same clades usually shared similar motif compositions and exon/intron structural patterns. Five pairs of tandem duplication genes and three pairs of segmental duplication genes were identified in the wheat trihelix gene family, thereby validating the supposition that more intrachromosomal gene duplication events occur in the genome of wheat than in that of other grass species. The tissue-specific expression and differential expression profiling of the identified genes under cold and drought stresses were analyzed by using RNA-seq data. qRT-PCR was also used to confirm the expression profiles of ten selected wheat trihelix genes under multiple abiotic stresses, and we found that these genes mainly responded to salt and cold stresses. Conclusions: In this study, we identified trihelix genes in wheat and its close relatives and found that gene duplication events are the main driving force for trihelix gene evolution in wheat. Our expression profiling analysis demonstrated that wheat trihelix genes responded to multiple abiotic stresses, especially salt and cold stresses. The results of our study built a basis for further investigation of the functions of wheat trihelix genes and provided candidate genes for stress-resistant wheat breeding programs.