Catalogue Search | MBRL

Highly conserved low‐copy nuclear genes as effective markers for phylogenetic analyses in angiosperms

by Ma, Hong; Zeng, Liping; Shan, Hongyan in angiosperm phylogeny, Angiospermae, Angiosperms

2012

• Organismal phylogeny provides a crucial evolutionary framework for many studies and the angiosperm phylogeny has been greatly improved recently, largely using organellar and rDNA genes. However, low‐copy protein‐coding nuclear genes have not been widely used on a large scale in spite of the advantages of their biparental inheritance and vast number of choices. • Here, we identified 1083 highly conserved low‐copy nuclear genes by genome comparison. Furthermore, we demonstrated the use of five nuclear genes in 91 angiosperms representing 46 orders (73% of orders) and three gymnosperms as outgroups for a highly resolved phylogeny. • These nuclear genes are easy to clone and align, and more phylogenetically informative than widely used organellar genes. The angiosperm phylogeny reconstructed using these genes was largely congruent with previous ones mainly inferred from organellar genes. Intriguingly, several new placements were uncovered for some groups, including those among the rosids, the asterids, and between the eudicots and several basal angiosperm groups. • These conserved universal nuclear genes have several inherent qualities enabling them to be good markers for reconstructing angiosperm phylogeny, even eukaryotic relationships, further providing new insights into the evolutionary history of angiosperms.

Journal Article

A customized nuclear target enrichment approach for developing a phylogenomic baseline for Dioscorea yams (Dioscoreaceae)

by Viruel, Juan; Forest, Félix; Gravendeel, Barbara in Application, Crops, Dioscorea

2019

Premise We developed a target enrichment panel for phylogenomic studies of Dioscorea, an economically important genus with incompletely resolved relationships. Methods Our bait panel comprises 260 low‐ to single‐copy nuclear genes targeted to work in Dioscorea, assessed here using a preliminary taxon sampling that includes both distantly and closely related taxa, including several yam crops and potential crop wild relatives. We applied coalescent‐based and maximum likelihood phylogenomic inference approaches to the pilot taxon set, incorporating new and published transcriptome data from additional species. Results The custom panel retrieved ~94% of targets and >80% of full gene length from 88% and 68% of samples, respectively. It has minimal gene overlap with existing panels designed for angiosperm‐wide studies and generally recovers longer and more variable targets. Pilot phylogenomic analyses consistently resolve most deep and recent relationships with strong support across analyses and point to revised relationships between the crop species D. alata and candidate crop wild relatives. Discussion Our customized panel reliably retrieves targeted loci from Dioscorea, is informative for resolving relationships in denser samplings, and is suitable for refining our understanding of the independent origins of cultivated yam species; the panel likely has broader promise for phylogenomic studies across Dioscoreales.

Journal Article

A De-Novo Genome Analysis Pipeline (DeNoGAP) for large-scale comparative prokaryotic genomics studies

by Guttman, David S.; Thakur, Shalabh in Algorithms, Amino Acid Sequence, Bioinformatics

2016

Background Comparative analysis of whole genome sequence data from closely related prokaryotic species or strains is becoming an increasingly important and accessible approach for addressing both fundamental and applied biological questions. While there are a number of excellent tools developed for performing this task, most scale poorly when faced with hundreds of genome sequences, and many require extensive manual curation. Results We have developed a de-novo genome analysis pipeline (DeNoGAP) for the automated, iterative and high-throughput analysis of data from comparative genomics projects involving hundreds of whole genome sequences. The pipeline is designed to perform reference-assisted and de novo gene prediction, homolog protein family assignment, ortholog prediction, functional annotation, and pan-genome analysis using a range of proven tools and databases. While most existing methods scale quadratically with the number of genomes since they rely on pairwise comparisons among predicted protein sequences, DeNoGAP scales linearly since the homology assignment is based on iteratively refined hidden Markov models. This iterative clustering strategy enables DeNoGAP to handle a very large number of genomes using minimal computational resources. Moreover, the modular structure of the pipeline permits easy updates as new analysis programs become available. Conclusion DeNoGAP integrates bioinformatics tools and databases for comparative analysis of a large number of genomes. The pipeline offers tools and algorithms for annotation and analysis of completed and draft genome sequences. The pipeline is developed using Perl, BioPerl and SQLite on Ubuntu Linux version 12.04 LTS. Currently, the software package includes a script for automated installation of the necessary external programs on Ubuntu Linux; however, the pipeline should also be compatible with other Linux and Unix systems once the necessary external programs are installed.
DeNoGAP is freely available at https://sourceforge.net/projects/denogap/ .
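
The scaling claim in this abstract can be illustrated with a rough count of genome-level comparison steps. The sketch below is only an illustration under simplifying assumptions, not DeNoGAP's implementation: the function names are hypothetical, and it assumes that an all-against-all method compares every pair of genomes once, while an iterative method scans each newly added genome once against the current set of family models.

```python
def pairwise_comparisons(n_genomes: int) -> int:
    """Genome-vs-genome comparisons in an all-against-all design: n(n-1)/2."""
    return n_genomes * (n_genomes - 1) // 2


def iterative_comparisons(n_genomes: int) -> int:
    """Assumed iterative design: the first genome seeds the family models,
    and every later genome is scanned once against the current models."""
    return max(n_genomes - 1, 0)


# Compare how the two counts grow with the number of genomes.
for n in (10, 100, 500):
    print(n, pairwise_comparisons(n), iterative_comparisons(n))
```

The count ignores the cost of each individual step, which in practice also depends on genome size and on how many family models have accumulated, so it shows only the difference in growth rate.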

Journal Article

Specimen Identification Through Multilocus Species Tree Constructed From Single‐Copy Orthologs (SCOs): A Case Study in Cymbidium Subgenus Jensoa

by Yang, Jun‐Bo; He, Zheng‐Shan; Huang, Jia‐Lin in Bar codes, Case studies, closely related species

2025

Standard barcodes and ultra‐barcode encounter significant challenges when delimiting and discriminating closely related species characterized by deep coalescence, hybrid speciation, gene flow, or low sequence variation. Single‐copy orthologs (SCOs) have been widely recognized as standardized nuclear markers in metazoan DNA taxonomy, yet their application in plant taxonomy remains unexplored. This study evaluates the efficacy of SCOs for identifying recently diverged species within the Cymbidium subgenus Jensoa, where ultra‐barcodes have previously shown limited resolution. Remarkably, over 90% of the 9094 targeted reference SCOs, inferred from three Cymbidium genomes, were successfully retrieved for all 11 representative species in subg. Jensoa using ALiBaSeq at a minimal 5× depth from whole genome shotgun sequences. The species tree, reconstructed from multiple refined SCO matrices under the coalescent model, effectively distinguished all species and identified mislabeled or misidentified specimens. The comprehensive and refined SCO matrices produced by our pipeline not only enhance phylogenetic analysis but also improve the precision of species diagnosis. Additionally, biparentally inherited SCOs, serving as multi‐locus markers, not only augment the effectiveness of DNA barcoding but also support a transition to multi‐locus, species‐tree‐based specimen assignment strategies. Standard DNA barcodes often fail to distinguish closely related plant species due to deep coalescence, hybridization, and low‐sequence variation. In this study, we demonstrate that single‐copy orthologs (SCOs), widely used in metazoan taxonomy, can serve as effective multilocus markers for plant species identification. Using Cymbidium subgenus Jensoa as a case study, we show that SCO‐based species trees significantly improve species resolution and specimen identification, offering a powerful alternative to traditional barcode methods.

Journal Article

OrthoRefine: automated enhancement of prior ortholog identification via synteny

by Ludwig, J.; Mrázek, J. in Algorithms, Analysis, Automation

2024

Background Identifying orthologs continues to be an early and imperative step in genome analysis but remains a challenging problem. While synteny (conservation of gene order) has previously been used independently and in combination with other methods to identify orthologs, applying synteny in ortholog identification has yet to be automated in a user-friendly manner. This desire for automation and ease-of-use led us to develop OrthoRefine, a standalone program that uses synteny to refine ortholog identification. Results We developed OrthoRefine to improve the detection of orthologous genes by implementing a look-around window approach to detect synteny. We tested OrthoRefine in tandem with OrthoFinder, one of the most widely used programs for identifying orthologs in recent years. We evaluated the improvements provided by OrthoRefine in several bacterial datasets and one eukaryotic dataset. OrthoRefine efficiently eliminates paralogs from orthologous groups detected by OrthoFinder. Using synteny increased specificity and functional ortholog identification; additionally, analysis of BLAST e-value, phylogenetics, and operon occurrence further supported using synteny for ortholog identification. A comparison of several window sizes suggested that smaller window sizes (eight genes) were generally the most suitable for identifying orthologs via synteny. However, larger windows (30 genes) performed better in datasets containing less closely related genomes. A typical run of OrthoRefine with ~10 bacterial genomes can be completed in a few minutes on a regular desktop PC. Conclusion OrthoRefine is a simple-to-use, standalone tool that automates the application of synteny to improve ortholog detection. OrthoRefine is particularly efficient in eliminating paralogs from orthologous groups delineated by standard methods.
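
The look-around window idea mentioned in this abstract can be sketched as follows. This is a minimal illustration rather than OrthoRefine's actual algorithm, which the abstract does not specify: the function names, the `min_shared` cut-off, the even split of the window on either side of a gene, and the toy gene identifiers are all assumptions. Two genes from the same orthogroup are kept as a pair only if enough of their neighbouring genes also fall into shared orthogroups.

```python
def neighbours(genome, idx, window):
    """Genes within window // 2 positions on each side of genome[idx]."""
    half = window // 2
    lo = max(0, idx - half)
    hi = min(len(genome), idx + half + 1)
    return [g for k, g in enumerate(genome[lo:hi], start=lo) if k != idx]


def synteny_support(genome_a, genome_b, idx_a, idx_b, orthogroup, window):
    """Count orthogroups present in both look-around windows."""
    groups_a = {orthogroup[g] for g in neighbours(genome_a, idx_a, window)
                if g in orthogroup}
    groups_b = {orthogroup[g] for g in neighbours(genome_b, idx_b, window)
                if g in orthogroup}
    return len(groups_a & groups_b)


def refine(genome_a, genome_b, orthogroup, window=8, min_shared=1):
    """Keep same-orthogroup pairs whose neighbourhoods share enough orthogroups."""
    pairs = []
    for i, gene_a in enumerate(genome_a):
        for j, gene_b in enumerate(genome_b):
            same_group = (gene_a in orthogroup
                          and orthogroup.get(gene_b) == orthogroup[gene_a])
            if same_group and synteny_support(
                    genome_a, genome_b, i, j, orthogroup, window) >= min_shared:
                pairs.append((gene_a, gene_b))
    return pairs


# Toy data: genes in chromosomal order; b3p is a distant paralog of b3.
genome_a = ["a1", "a2", "a3", "a4", "a5"]
genome_b = ["b1", "b2", "b3", "b4", "b5", "b6", "b7", "b3p", "b8"]
orthogroup = {"a1": "OG1", "b1": "OG1", "a2": "OG2", "b2": "OG2",
              "a3": "OG3", "b3": "OG3", "b3p": "OG3", "a4": "OG4",
              "b4": "OG4", "a5": "OG5", "b5": "OG5", "b6": "OG6",
              "b7": "OG7", "b8": "OG8"}
refined = refine(genome_a, genome_b, orthogroup, window=2, min_shared=1)
print(refined)
```

In the toy data the paralog `b3p` shares an orthogroup with `a3` but sits in an unrelated neighbourhood, so the pair is dropped while `a3` and `b3` are kept. A window of two genes is used here only because the toy genomes are tiny.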

Journal Article

Species and population specific gene expression in blood transcriptomes of marine turtles

by Sterling, Eleanor J.; Lewison, Rebecca L.; Benson, Scott R. in Animal Genetics and Genomics, Annotations, Anthropogenic factors

2021

Background Transcriptomic data has demonstrated utility to advance the study of physiological diversity and organisms’ responses to environmental stressors. However, a lack of genomic resources and challenges associated with collecting high-quality RNA can limit its application for many wild populations. Minimally invasive blood sampling combined with de novo transcriptomic approaches has great potential to alleviate these barriers. Here, we advance these goals for marine turtles by generating high quality de novo blood transcriptome assemblies to characterize functional diversity and compare global transcriptional profiles between tissues, species, and foraging aggregations. Results We generated high quality blood transcriptome assemblies for hawksbill (Eretmochelys imbricata), loggerhead (Caretta caretta), green (Chelonia mydas), and leatherback (Dermochelys coriacea) turtles. The functional diversity in assembled blood transcriptomes was comparable to those from more traditionally sampled tissues. A total of 31.3% of orthogroups identified were present in all four species, representing a core set of conserved genes expressed in blood and shared across marine turtle species. We observed strong species-specific expression of these genes, as well as distinct transcriptomic profiles between green turtle foraging aggregations that inhabit areas of greater or lesser anthropogenic disturbance. Conclusions Obtaining global gene expression data through non-lethal, minimally invasive sampling can greatly expand the applications of RNA-sequencing in protected long-lived species such as marine turtles. The distinct differences in gene expression signatures between species and foraging aggregations provide insight into the functional genomics underlying the diversity in this ancient vertebrate lineage. The transcriptomic resources generated here can be used in further studies examining the evolutionary ecology and anthropogenic impacts on marine turtles.

Journal Article

A sugar utilization phenotype contributes to the formation of genetic exchange communities in lactic acid bacteria

by Masanori Arita, Shinkuro Takenaka, Takeshi Kawashima in Analysis, Bacteria, Brewing

2021

Abstract In prokaryotes, a major contributor to genomic evolution is the exchange of genes via horizontal gene transfer (HGT). Areas with a high density of HGT networks are defined as genetic exchange communities (GECs). Although some phenotypes associated with specific ecological niches are linked to GECs, little is known about the phenotypic influences on HGT in bacterial groups within a taxonomic family. Thanks to the published genome sequences and phenotype data of lactic acid bacteria (LAB), it is now possible to obtain more detailed information about the phenotypes that affect GECs. Here, we have investigated the relationship between HGT and internal and external environmental factors for 178 strains from 24 genera in the Lactobacillaceae family. We found a significant correlation between strains with high utilization of sugars and HGT bias. The result suggests that the phenotype of the utilization of a variety of sugars is key to the construction of GECs in this family. This feature is consistent with the fact that the Lactobacillaceae family contributes to the production of a wide variety of fermented foods by sharing niches such as those in vegetables, dairy products and brewing-related environments. This result provides the first evidence that phenotypes associated with ecological niches contribute to form GECs in the LAB family. The ability to utilize a variety of sugars contributed to increased horizontal gene transfer and the formation of genetic exchange communities in the ecological niches among lactic acid bacteria.

Journal Article

lncEvo: automated identification and conservation study of long noncoding RNAs

by Bryzghalov, Oleksii; Szcześniak, Michał Wojciech; Makałowska, Izabela in Algorithms, Annotations, Assembly

2021

Background Long noncoding RNAs represent a large class of transcripts with two common features: they exceed an arbitrary length threshold of 200 nt and are assumed to not encode proteins. Although a growing body of evidence indicates that the vast majority of lncRNAs are potentially nonfunctional, hundreds of them have already been revealed to perform essential gene regulatory functions or to be linked to a number of cellular processes, including those associated with the etiology of human diseases. To better understand the biology of lncRNAs, it is essential to perform a more in-depth study of their evolution. In contrast to protein-encoding transcripts, however, they do not show the strong sequence conservation that usually results from purifying selection; therefore, software that is typically used to resolve the evolutionary relationships of protein-encoding genes and transcripts is not applicable to the study of lncRNAs. Results To tackle this issue, we developed lncEvo, a computational pipeline that consists of three modules: (1) transcriptome assembly from RNA-Seq data, (2) prediction of lncRNAs, and (3) conservation study—a genome-wide comparison of lncRNA transcriptomes between two species of interest, including search for orthologs. Importantly, one can choose to apply lncEvo solely for transcriptome assembly or lncRNA prediction, without calling the conservation-related part. Conclusions lncEvo is an all-in-one tool built with the Nextflow framework, utilizing state-of-the-art software and algorithms with customizable trade-offs between speed and sensitivity, ease of use and built-in reporting functionalities. The source code of the pipeline is freely available for academic and nonacademic use under the MIT license at https://gitlab.com/spirit678/lncrna_conservation_nf .

Journal Article

Cross-species analysis between the maize smut fungi Ustilago maydis and Sporisorium reilianum highlights the role of transcriptional change of effector orthologs for virulence and disease

by Zuo, Weiliang; Depotter, Jasper R. L.; Gupta, Deepak K. in Colonization, Corn, CRISPR

2021

• The constitution and regulation of effector repertoires shape host–microbe interactions. Ustilago maydis and Sporisorium reilianum are two closely related smut fungi, which both infect maize but cause distinct disease symptoms. Understanding how effector orthologs are regulated in these two pathogens can therefore provide insights into the evolution of different infection strategies. • We tracked the infection progress of U. maydis and S. reilianum in maize leaves and used two distinct infection stages for cross-species RNA-sequencing analyses. We identified 207 of 335 one-to-one effector orthologs as differentially regulated during host colonization, which might reflect the distinct disease development strategies. • Using CRISPR-Cas9-mediated gene conversion, we identified two differentially expressed effector orthologs with conserved function between two pathogens. Thus, differential expression of functionally conserved genes might contribute to species-specific adaptation and symptom development. Interestingly, another differentially expressed orthogroup (UMAG_05318/Sr10075) showed divergent protein function, providing a possible case for neofunctionalization. • Collectively, we demonstrated that the diversification of effector genes in related pathogens can be caused both by alteration on the transcriptional level and through functional diversification of the encoded effector proteins.

Journal Article

Estimating transcriptome complexities across eukaryotes

by Rogers, Rebekah L.; Titus-McQuillan, James E.; McIntyre, Lauren M. in Alternative Splicing, Analysis, Animal Genetics and Genomics

2023

Background Genomic complexity is a growing field of evolution, with case studies for comparative evolutionary analyses in model and emerging non-model systems. Understanding complexity and the functional components of the genome is an untapped wealth of knowledge ripe for exploration. With the “remarkable lack of correspondence” between genome size and complexity, there needs to be a way to quantify complexity across organisms. In this study, we use a set of complexity metrics that allow for evaluating changes in complexity using TranD. Results We ascertain if complexity is increasing or decreasing across transcriptomes and at what structural level, as complexity varies. In this study, we define three metrics (TpG, EpT, and EpG) to quantify the transcriptome's complexity that encapsulates the dynamics of alternative splicing. Here we compare complexity metrics across 1) whole genome annotations, 2) a filtered subset of orthologs, and 3) novel genes to elucidate the impacts of orthologs and novel genes in transcript model analysis. Effective Exon Number (EEN) is used to compare the distribution of exon sizes within transcripts against random expectations of uniform exon placement. EEN accounts for differences in exon size, which is important because novel gene differences in complexity for orthologs and whole-transcriptome analyses are biased towards low-complexity genes with few exons and few alternative transcripts. Conclusions With our metric analyses, we are able to quantify changes in complexity across diverse lineages with greater precision and accuracy than previous cross-species comparisons under ortholog conditioning. These analyses represent a step toward whole-transcriptome analysis in the emerging field of non-model evolutionary genomics, with key insights for evolutionary inference of complexity changes on deep timescales across the tree of life.
We suggest a means to quantify biases generated in ortholog calling and correct complexity analysis for lineage-specific effects. With these metrics, we directly assay the quantitative properties of newly formed lineage-specific genes as they lower complexity.
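
The three metrics named in this abstract can be illustrated on a toy annotation. The sketch below is not TranD's code. It assumes that TpG, EpT, and EpG stand for transcripts per gene, exons per transcript, and exons per gene (the abstract does not spell them out), and that exons per gene counts distinct exon coordinates across a gene's transcripts. The function name and the toy data are hypothetical.

```python
def complexity_metrics(annotation):
    """Return (TpG, EpT, EpG) dictionaries for a gene -> transcript -> exons map.

    Exons are (start, end) tuples; EpG counts distinct exons per gene,
    which is an assumption made for this illustration.
    """
    tpg, ept, epg = {}, {}, {}
    for gene, transcripts in annotation.items():
        tpg[gene] = len(transcripts)          # transcripts per gene
        unique_exons = set()
        for transcript, exons in transcripts.items():
            ept[transcript] = len(exons)      # exons per transcript
            unique_exons.update(exons)
        epg[gene] = len(unique_exons)         # distinct exons per gene
    return tpg, ept, epg


# Toy annotation: geneA has two isoforms sharing their first exon.
toy = {
    "geneA": {
        "A.1": [(100, 200), (300, 400)],
        "A.2": [(100, 200), (500, 600), (700, 800)],
    },
    "geneB": {"B.1": [(10, 90)]},
}
tpg, ept, epg = complexity_metrics(toy)
print(tpg, ept, epg)
```

A single-exon, single-transcript gene such as `geneB` scores 1 on all three metrics, the kind of low-complexity profile the abstract associates with novel genes.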

Journal Article
