Catalogue Search | MBRL

NaviGO: interactive tool for visualization and functional similarity and coherence analysis with gene ontology

by Yerneni, Satwica , Ding, Ziyun , Wei, Qing in Algorithms , Annotations , Bioinformatics

2017

Background The number of genomics and proteomics experiments is growing rapidly, producing an ever-increasing amount of data that are awaiting functional interpretation. A number of function prediction algorithms were developed and improved to enable fast and automatic function annotation. With the well-defined structure and manual curation, Gene Ontology (GO) is the most frequently used vocabulary for representing gene functions. To understand relationship and similarity between GO annotations of genes, it is important to have a convenient pipeline that quantifies and visualizes the GO function analyses in a systematic fashion. Results NaviGO is a web-based tool for interactive visualization, retrieval, and computation of functional similarity and associations of GO terms and genes. Similarity of GO terms and gene functions is quantified with six different scores including protein-protein interaction and context based association scores we have developed in our previous works. Interactive navigation of the GO function space provides intuitive and effective real-time visualization of functional groupings of GO terms and genes as well as statistical analysis of enriched functions. Conclusions We developed NaviGO, which visualizes and analyses functional similarity and associations of GO terms and genes. The NaviGO webserver is freely available at: http://kiharalab.org/web/navigo.

Journal Article

About the dark corners in the gene function space of Escherichia coli remaining without illumination by scientific literature

by Tantoso, Erwin , Jensen, Lars Juhl , Eisenhaber, Birgit in Analysis , Automation , Bacterial genetics

2023

Background Although Escherichia coli (E. coli) is the most studied prokaryote organism in the history of life sciences, many molecular mechanisms and gene functions encoded in its genome remain to be discovered. This work aims at quantifying the illumination of the E. coli gene function space by the scientific literature and how close we are towards the goal of a complete list of E. coli gene functions. Results The scientific literature about E. coli protein-coding genes has been mapped onto the genome via the mentioning of names for genomic regions in scientific articles both for the case of the strain K-12 MG1655 as well as for the 95%-threshold softcore genome of 1324 E. coli strains with known complete genome. The article match was quantified with the ratio of a given gene name’s occurrence to the mentioning of any gene names in the paper. The various genome regions have an extremely uneven literature coverage. A group of elite genes with ≥ 100 full publication equivalents (FPEs, FPE = 1 is an idealized publication devoted to just a single gene) attracts the lion share of the papers. For K-12, ~ 65% of the literature covers just 342 elite genes; for the softcore genome, ~ 68% of the FPEs is about only 342 elite gene families (GFs). We also find that most genes/GFs have at least one mentioning in a dedicated scientific article (with the exception of at least 137 protein-coding transcripts for K-12 and 26 GFs from the softcore genome). Whereas the literature growth rates were highest for uncharacterized or understudied genes until 2005–2010 compared with other groups of genes, they became negative thereafter. At the same time, literature for anyhow well-studied genes started to grow explosively with threshold T10 (≥ 10 FPEs). Typically, a body of ~ 20 actual articles generated over ~ 15 years of research effort was necessary to reach T10. Lineage-specific co-occurrence analysis of genes belonging to the accessory genome of E. coli together with genomic co-localization and sequence-analytic exploration hints previously completely uncharacterized genes yahV and yddL being associated with osmotic stress response/motility mechanisms. Conclusion If the numbers of scientific articles about uncharacterized and understudied genes remain at least at present levels, full gene function lists for the strain K-12 MG1655 and the E. coli softcore genome are in reach within the next 25–30 years. Once the literature body for a gene crosses 10 FPEs, most of the critical fundamental research risk appears overcome and steady incremental research becomes possible.
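
The full publication equivalent (FPE) measure described in this abstract can be illustrated with a minimal sketch. It assumes that each paper contributes to a gene the ratio of that gene's name mentions to all gene-name mentions in the paper, and that a gene's FPE total is the sum of those ratios over all papers. The summation step, the function name `fpe_per_gene`, and the toy input are assumptions made for illustration; they are not taken from the article.

```python
from collections import Counter, defaultdict


def fpe_per_gene(papers):
    """Sum per-paper mention shares for each gene.

    papers: iterable of lists, each list holding the gene names
    mentioned in one paper (one entry per mention).
    Assumption: a gene's FPE total is the sum of its per-paper shares.
    """
    totals = defaultdict(float)
    for mentions in papers:
        counts = Counter(mentions)
        n_mentions = sum(counts.values())
        if n_mentions == 0:
            # A paper without gene-name mentions contributes nothing.
            continue
        for gene, count in counts.items():
            # Share of this paper attributed to this gene.
            totals[gene] += count / n_mentions
    return dict(totals)


# Hypothetical toy corpus: three papers, the last with no gene mentions.
papers = [["lacZ", "lacZ", "lacY"], ["lacZ"], []]
fpe = fpe_per_gene(papers)
```

Under these assumptions, a paper that mentions only one gene adds exactly 1 FPE to that gene, which matches the abstract's description of FPE = 1 as an idealized publication devoted to a single gene, and the shares within any paper sum to 1.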

Journal Article

Screens in fly and beetle reveal vastly divergent gene sets required for developmental processes

by Klingler, Martin , Kessel, Tobias , Siemanowski, Janna in Animals , Annotations , Arthropods

2022

Background Most of the known genes required for developmental processes have been identified by genetic screens in a few well-studied model organisms, which have been considered representative of related species, and informative—to some degree—for human biology. The fruit fly Drosophila melanogaster is a prime model for insect genetics, and while conservation of many gene functions has been observed among bilaterian animals, a plethora of data show evolutionary divergence of gene function among more closely-related groups, such as within the insects. A quantification of conservation versus divergence of gene functions has been missing, without which it is unclear how representative data from model systems actually are. Results Here, we systematically compare the gene sets required for a number of homologous but divergent developmental processes between fly and beetle in order to quantify the difference of the gene sets. To that end, we expanded our RNAi screen in the red flour beetle Tribolium castaneum to cover more than half of the protein-coding genes. Then we compared the gene sets required for four different developmental processes between beetle and fly. We found that around 50% of the gene functions were identified in the screens of both species while for the rest, phenotypes were revealed only in fly (~ 10%) or beetle (~ 40%) reflecting both technical and biological differences. Accordingly, we were able to annotate novel developmental GO terms for 96 genes studied in this work. With this work, we publish the final dataset for the pupal injection screen of the iBeetle screen reaching a coverage of 87% (13,020 genes). Conclusions We conclude that the gene sets required for a homologous process diverge more than widely believed. Hence, the insights gained in flies may be less representative for insects or protostomes than previously thought, and work in complementary model systems is required to gain a comprehensive picture. The RNAi screening resources developed in this project, the expanding transgenic toolkit, and our large-scale functional data make T. castaneum an excellent model system in that endeavor.

Journal Article

PlantGPT: An Arabidopsis‐Based Intelligent Agent that Answers Questions about Plant Functional Genomics

by Wen, Jun , Xie, Yongyao , Wang, Yu in Accuracy , Arabidopsis - genetics , Crops

2025

Research into plant gene function is crucial for developing strategies to increase crop yields. The recent introduction of large language models (LLMs) offers a means to aggregate large amounts of data into a queryable format, but the output can contain inaccurate or false claims known as hallucinations. To minimize such hallucinations and produce high‐quality knowledge‐based outputs, the abstracts of over 60 000 plant research articles are compiled into a Chroma database for retrieval‐augmented generation (RAG). Then linguistic data are used from 13 993 Arabidopsis (Arabidopsis thaliana) phenotypes and 23 323 gene functions to fine‐tune the LLM Llama3‐8B, producing PlantGPT, a virtual expert in Arabidopsis phenotype–gene research. By evaluating answers to test questions, it is demonstrated that PlantGPT outperforms general LLMs in answering specialized questions. The findings provide a blueprint for functional genomics research in food crops and demonstrate the potential for developing LLMs for plant research modalities. To provide broader access and facilitate adoption, the online tool http://www.plantgpt.icu is developed, which will allow researchers to use PlantGPT in their scientific investigations. PlantGPT integrates 60 000+ plant research articles with Arabidopsis phenotype‐gene data through retrieval‐augmented generation and fine‐tuning of Llama3‐8B. This open‐source, specialized AI system outperforms general large language models in plant gene‐phenotype relationships, establishing a new paradigm for functional genomics research and molecular design breeding.

Journal Article

Did the early full genome sequencing of yeast boost gene function discovery?

by Tantoso, Erwin , Jensen, Lars Juhl , Eisenhaber, Birgit in 21st century , Automation , Base Sequence

2023

Background Although the genome of Saccharomyces cerevisiae (S. cerevisiae) was the first one of a eukaryote organism that was fully sequenced (in 1996), a complete understanding of the potential of encoded biomolecular mechanisms has not yet been achieved. Here, we wish to quantify how far the goal of a full list of S. cerevisiae gene functions still is. Results The scientific literature about S. cerevisiae protein-coding genes has been mapped onto the yeast genome via the mentioning of names for genomic regions in scientific publications. The match was quantified with the ratio of a given gene name’s occurrences to those of any gene names in the article. We find that ~ 230 elite genes with ≥ 75 full publication equivalents (FPEs, FPE = 1 is an idealized publication referring to just a single gene) command ~ 45% of all literature. At the same time, about two thirds of the genes (each with less than 10 FPEs) are described in just 12% of the literature (in average each such gene has just ~ 1.5% of the literature of an elite gene). About 600 genes have not been mentioned in any dedicated article. Compared with other groups of genes, the literature growth rates were highest for uncharacterized or understudied genes until late nineties of the twentieth century. Yet, these growth rates deteriorated and became negative thereafter. Thus, yeast function discovery for previously uncharacterized genes has returned to the level of ~ 1980. At the same time, literature for anyhow well-studied genes (with a threshold T10 (≥ 10 FPEs) and higher) remains steadily growing. Conclusions Did the early full genome sequencing of yeast boost gene function discovery? The data proves that the moment of publishing the full genome in reality coincides with the onset of decline of gene function discovery for previously uncharacterized genes. If the current status of literature about yeast molecular mechanisms can be extrapolated into the future, it will take about another ~ 50 years to complete the yeast gene function list. We found that a small group of scientific journals contributed extraordinarily to publishing early reports relevant to yeast gene function discoveries.

Journal Article

Elucidating gene function and function evolution through comparison of co-expression networks of plants

by Janowski, Marcin , Vaid, Neha , Mutwil, Marek in Annotations , Associations , Biological evolution

2014

The analysis of gene expression data has shown that transcriptionally coordinated (co-expressed) genes are often functionally related, enabling scientists to use expression data in gene function prediction. This Focused Review discusses our original paper (Large-scale co-expression approach to dissect secondary cell wall formation across plant species, Frontiers in Plant Science 2:23). In this paper we applied cross-species analysis to co-expression networks of genes involved in cellulose biosynthesis. We showed that the co-expression networks from different species are highly similar, indicating that whole biological pathways are conserved across species. This finding has two important implications. First, the analysis can transfer gene function annotation from well-studied plants, such as Arabidopsis, to other, uncharacterized plant species. As the analysis finds genes that have similar sequence and similar expression pattern across different organisms, functionally equivalent genes can be identified. Second, since co-expression analyses are often noisy, a comparative analysis should have higher performance, as parts of co-expression networks that are conserved are more likely to be functionally relevant. In this Focused Review, we outline the comparative analysis done in the original paper and comment on the recent advances and approaches that allow comparative analyses of co-function networks. We hypothesize that in comparison to simple co-expression analysis, comparative analysis would yield more accurate gene function predictions. Finally, by combining comparative analysis with genomic information of green plants, we propose a possible composition of cellulose biosynthesis machinery during earlier stages of plant evolution.

Journal Article

Reconstitution and Transmission of Gut Microbiomes and Their Genes between Generations

by Zilber-Rosenberg, Ilana , Rosenberg, Eugene in Animals , asexual reproduction , Asexuality

2021

Microbiomes are transmitted between generations by a variety of different vertical and/or horizontal modes, including vegetative reproduction (vertical), via female germ cells (vertical), coprophagy and regurgitation (vertical and horizontal), physical contact starting at birth (vertical and horizontal), breast-feeding (vertical), and via the environment (horizontal). Analyses of vertical transmission can result in false negatives (failure to detect rare microbes) and false positives (strain variants). In humans, offspring receive most of their initial gut microbiota vertically from mothers during birth, via breast-feeding and close contact. Horizontal transmission is common in marine organisms and involves selectivity in determining which environmental microbes can colonize the organism’s microbiome. The following arguments are put forth concerning accurate microbial transmission: First, the transmission may be of functions, not necessarily of species; second, horizontal transmission may be as accurate as vertical transmission; third, detection techniques may fail to detect rare microbes; lastly, microbiomes develop and reach maturity with their hosts. In spite of the great variation in means of transmission discussed in this paper, microbiomes and their functions are transferred from one generation of holobionts to the next with fidelity. This provides a strong basis for each holobiont to be considered a unique biological entity and a level of selection in evolution, largely maintaining the uniqueness of the entity and conserving the species from one generation to the next.

Journal Article

Proxies of CRISPR/Cas9 Activity To Aid in the Identification of Mutagenized Arabidopsis Plants

by Danna, Cristian H , Li, Renyu , Vavrik, Charles in CRISPR , Genes , Genotype & phenotype

2020

CRISPR/Cas9 has become the preferred gene-editing technology to obtain loss-of-function mutants in plants, and hence a valuable tool to study gene function. This is mainly due to the easy reprogramming of Cas9 specificity using customizable small non-coding RNAs, and to the possibility of editing several independent genes simultaneously. Despite these advances, the identification of CRISPR-edited plants remains time and resource-intensive. Here, based on the premise that one editing event in one locus is a good predictor of editing event/s in other locus/loci, we developed a CRISPR co-editing selection strategy that greatly facilitates the identification of CRISPR-mutagenized Arabidopsis thaliana plants. This strategy is based on targeting the gene/s of interest simultaneously with a proxy of CRISPR-Cas9-directed mutagenesis. The proxy is an endogenous gene whose loss-of-function produces an easy-to-detect visible phenotype that is unrelated to the expected phenotype of the gene/s under study. We tested this strategy via assessing the frequency of co-editing of three functionally unrelated proxy genes. We found that each proxy predicted the occurrence of mutations in each surrogate gene with efficiencies ranging from 68 to 100%. The selection strategy laid out here provides a framework to facilitate the identification of multiplex edited plants, thus aiding in the study of gene function when functional redundancy hinders the effort to define gene-function-phenotype links.

Journal Article

Large-scale cis- and trans-eQTL analyses identify thousands of genetic loci and polygenic scores that regulate blood gene expression

by Veldink, Jan H. , Hewitt, Alex W. , Thiery, Joachim in 631/208/199 , 631/208/200 , 631/208/205/2138

2021

Trait-associated genetic variants affect complex phenotypes primarily via regulatory mechanisms on the transcriptome. To investigate the genetics of gene expression, we performed cis- and trans-expression quantitative trait locus (eQTL) analyses using blood-derived expression from 31,684 individuals through the eQTLGen Consortium. We detected cis-eQTL for 88% of genes, and these were replicable in numerous tissues. Distal trans-eQTL (detected for 37% of 10,317 trait-associated variants tested) showed lower replication rates, partially due to low replication power and confounding by cell type composition. However, replication analyses in single-cell RNA-seq data prioritized intracellular trans-eQTL. Trans-eQTL exerted their effects via several mechanisms, primarily through regulation by transcription factors. Expression of 13% of the genes correlated with polygenic scores for 1,263 phenotypes, pinpointing potential drivers for those traits. In summary, this work represents a large eQTL resource, and its results serve as a starting point for in-depth interpretation of complex phenotypes. Analyses of expression profiles from whole blood of 31,684 individuals identify cis-expression quantitative trait loci (eQTL) effects for 88% of genes and trans-eQTL effects for 37% of trait-associated variants.

Journal Article

GND-PCA Method for Identification of Gene Functions Involved in Asymmetric Division of C. elegans

by Chen, Yen-Wei , Han, Xian-Hua , Yang, Sihai in Asymmetry , C. elegans , Cell division

2023

Due to the rapid development of imaging technology, a large number of biological images have been obtained with three-dimensional (3D) spatial information, time information, and spectral information. Compared with the case of two-dimensional images, the framework for analyzing multidimensional bioimages has not been completely established yet. WDDD is an open biological image database. It dynamically records 3D developmental images of 186 samples of nematodes C. elegans. In this study, based on WDDD, we constructed a framework to analyze the multidimensional dataset, which includes image segmentation, image registration, size registration by the length of main axes, time registration by extracting key time points, and finally, using generalized N-dimensional principal component analysis (GND-PCA) to analyze the phenotypes of bioimages. As a data-driven technique, GND-PCA can automatically extract the important factors involved in the development of P1 and AB in C. elegans. A 3D bioimage can be regarded as a third-order tensor. Therefore, GND-PCA was applied to the set of third-order tensors, and a set of third-order tensor bases was iteratively learned to linearly approximate the set. For each tensor base, a corresponding characteristic image is built to reveal its geometric meaning. The results show that different bases can be used to express different vital factors in development, such as the asymmetric division within the two-cell stage of C. elegans. Based on selected bases, statistical models were built by 50 wild-type (WT) embryos in WDDD, and were applied to RNA interference (RNAi) embryos. The results of statistical testing demonstrated the effectiveness of this method.

Journal Article
