Catalogue Search | MBRL

Genome-Wide Prediction of Metabolic Enzymes, Pathways, and Gene Clusters in Plants

by Kahn, Daniel , Chavali, Arvind K. , Zhang, Peifen in BASIC BIOLOGICAL SCIENCES , BIOCHEMISTRY AND METABOLISM , Biosynthetic Pathways - genetics

2017

Plant metabolism underpins many traits of ecological and agronomic importance. Plants produce numerous compounds to cope with their environments but the biosynthetic pathways for most of these compounds have not yet been elucidated. To engineer and improve metabolic traits, we need comprehensive and accurate knowledge of the organization and regulation of plant metabolism at the genome scale. Here, we present a computational pipeline to identify metabolic enzymes, pathways, and gene clusters from a sequenced genome. Using this pipeline, we generated metabolic pathway databases for 22 species and identified metabolic gene clusters from 18 species. This unified resource can be used to conduct a wide array of comparative studies of plant metabolism. Using the resource, we discovered a widespread occurrence of metabolic gene clusters in plants: 11,969 clusters from 18 species. The prevalence of metabolic gene clusters offers an intriguing possibility of an untapped source for uncovering new metabolite biosynthesis pathways. For example, more than 1,700 clusters contain enzymes that could generate a specialized metabolite scaffold (signature enzymes) and enzymes that modify the scaffold (tailoring enzymes). In four species with sufficient gene expression data, we identified 43 highly coexpressed clusters that contain signature and tailoring enzymes, of which eight were characterized previously to be functional pathways. Finally, we identified patterns of genome organization that implicate local gene duplication and, to a lesser extent, single gene transposition as having played roles in the evolution of plant metabolic gene clusters.
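The abstract does not spell out how gene clusters are called, so the following is only a minimal sketch of one plausible rule: treat a cluster as a run of enzyme-annotated genes along a chromosome, allowing a small number of intervening non-enzyme genes. The function name `find_enzyme_clusters` and the parameters `max_gap` and `min_size` are hypothetical and are not taken from the published pipeline.

```python
from typing import List, Sequence


def find_enzyme_clusters(
    is_enzyme: Sequence[bool], max_gap: int = 1, min_size: int = 3
) -> List[List[int]]:
    """Group enzyme genes on one chromosome into candidate clusters.

    is_enzyme[i] is True when the i-th gene (in chromosomal order) is
    annotated as a metabolic enzyme. max_gap and min_size are assumed
    thresholds chosen for illustration only.
    """
    clusters: List[List[int]] = []
    current: List[int] = []
    for idx, flag in enumerate(is_enzyme):
        if not flag:
            continue
        # Count the non-enzyme genes between this enzyme and the previous one.
        if current and idx - current[-1] - 1 > max_gap:
            # Gap too large: close the current run and keep it if big enough.
            if len(current) >= min_size:
                clusters.append(current)
            current = []
        current.append(idx)
    # Flush the final run.
    if len(current) >= min_size:
        clusters.append(current)
    return clusters


genes = [True, True, False, True, False, False, False, True, True, False]
candidate_clusters = find_enzyme_clusters(genes, max_gap=1, min_size=3)
```

With the toy input above, genes 0, 1, and 3 form one candidate cluster, while genes 7 and 8 are dropped because the run is shorter than `min_size`.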

Journal Article

Editorial: Modern machine learning approaches for quantitative inference of gene regulation from genomic and epigenomic features

by Banf, Michael , Hartwig, Thomas , Zhao, Kangmei in Adaptation , Biosynthesis , Cell walls

2023

Journal Article

Hybrid allele-specific ChIP-seq analysis identifies variation in brassinosteroid-responsive transcription factor binding linked to traits in maize

by Snodgrass, Samantha J. , Seetharam, Arun S. , Banf, Michael in Allele-specific , Alleles , Animal Genetics and Genomics

2023

Background: Genetic variation in regulatory sequences that alter transcription factor (TF) binding is a major cause of phenotypic diversity. Brassinosteroid is a growth hormone that has major effects on plant phenotypes. Genetic variation in brassinosteroid-responsive cis-elements likely contributes to trait variation. Pinpointing such regulatory variations and quantitative genomic analysis of the variation in TF-target binding, however, remains challenging. How variation in transcriptional targets of signaling pathways such as the brassinosteroid pathway contributes to phenotypic variation is an important question to be investigated with innovative approaches. Results: Here, we use a hybrid allele-specific chromatin binding sequencing (HASCh-seq) approach and identify variations in target binding of the brassinosteroid-responsive TF ZmBZR1 in maize. HASCh-seq in the B73xMo17 F1s identifies thousands of target genes of ZmBZR1. Allele-specific ZmBZR1 binding (ASB) has been observed for 18.3% of target genes and is enriched in promoter and enhancer regions. About a quarter of the ASB sites correlate with sequence variation in BZR1-binding motifs and another quarter correlate with haplotype-specific DNA methylation, suggesting that both genetic and epigenetic variations contribute to the high level of variation in ZmBZR1 occupancy. Comparison with GWAS data shows linkage of hundreds of ASB loci to important yield and disease-related traits. Conclusion: Our study provides a robust method for analyzing genome-wide variations of TF occupancy and identifies genetic and epigenetic variations of the brassinosteroid response transcription network in maize.
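The abstract does not name the statistical test used to call allele-specific binding. As a hedged illustration, a common choice for such data is an exact two-sided binomial test of the reads from one parental allele against the 50:50 ratio expected in an F1 hybrid. The function name `binomial_two_sided_p` and the read counts below are hypothetical.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely than
    the observed count k out of n, under success probability p.
    """
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small relative tolerance so ties are not lost to rounding.
    total = sum(q for q in probs if q <= observed * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical counts: 10 of 10 reads at a binding site come from one allele.
p_skewed = binomial_two_sided_p(10, 10)
# Hypothetical counts: 5 of 10 reads come from each allele.
p_balanced = binomial_two_sided_p(5, 10)
```

A small p-value for a site (after multiple-testing correction across sites, which this sketch omits) would mark it as a candidate allele-specific binding site.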

Journal Article

The Reasonable Effectiveness of Randomness in Scalable and Integrative Gene Regulatory Network Inference and Beyond

by Banf, Michael , Hartwig, Thomas in Algorithms , Binding sites , Biological computing

2021

Gene regulation is orchestrated by a vast number of molecules, including transcription factors and co-factors, chromatin regulators, as well as epigenetic mechanisms, and it has been shown that transcriptional misregulation, e.g., caused by mutations in regulatory sequences, is responsible for a plethora of diseases, including cancer, developmental or neurological disorders. As a consequence, decoding the architecture of gene regulatory networks has become one of the most important tasks in modern (computational) biology. However, to advance our understanding of the mechanisms involved in the transcriptional apparatus, we need scalable approaches that can deal with the increasing number of large-scale, high-resolution, biological datasets. In particular, such approaches need to be capable of efficiently integrating and exploiting the biological and technological heterogeneity of such datasets in order to best infer the underlying, highly dynamic regulatory networks, often in the absence of sufficient ground truth data for model training or testing. With respect to scalability, randomized approaches have proven to be a promising alternative to deterministic methods in computational biology. As an example, one of the top performing algorithms in a community challenge on gene regulatory network inference from transcriptomic data is based on a random forest regression model. In this concise survey, we aim to highlight how randomized methods may serve as a highly valuable tool, in particular, with increasing amounts of large-scale, biological experiments and datasets being collected. Given the complexity and interdisciplinary nature of the gene regulatory network inference problem, we hope our survey may be helpful to both computational and biological scientists. It is our aim to provide a starting point for a dialogue about the concepts, benefits, and caveats of the toolbox of randomized methods, since unravelling the intricate web of highly dynamic, regulatory events will be one fundamental step in understanding the mechanisms of life and eventually developing efficient therapies to treat and cure diseases.

Journal Article

Enhancing gene regulatory network inference through data integration with markov random fields

by Banf, Michael , Rhee, Seung Y. in 631/114/1305 , 631/114/2114 , BASIC BIOLOGICAL SCIENCES

2017

A gene regulatory network links transcription factors to their target genes and represents a map of transcriptional regulation. Much progress has been made in deciphering gene regulatory networks computationally. However, gene regulatory network inference for most eukaryotic organisms remains challenging. To improve the accuracy of gene regulatory network inference and facilitate candidate selection for experimentation, we developed an algorithm called GRACE (Gene Regulatory network inference ACcuracy Enhancement). GRACE exploits biological a priori and heterogeneous data integration to generate high-confidence network predictions for eukaryotic organisms using Markov Random Fields in a semi-supervised fashion. GRACE uses a novel optimization scheme to integrate regulatory evidence and biological relevance. It is particularly suited for model learning with sparse regulatory gold standard data. We show GRACE’s potential to produce high-confidence regulatory networks compared to state-of-the-art approaches using Drosophila melanogaster and Arabidopsis thaliana data. In an A. thaliana developmental gene regulatory network, GRACE recovers cell cycle related regulatory mechanisms and further hypothesizes several novel regulatory links, including a putative control mechanism of vascular structure formation due to modifications in cell proliferation.

Journal Article

Assessment of network module identification across complex diseases

by Mall, Raghvendra , Yoo-Ah, Kim , Banf, Michael in Algorithms , Benchmarks , Bioinformatics

2019

Many bioinformatics methods have been proposed for reducing the complexity of large gene or protein networks into relevant subnetworks or modules. Yet, how such methods compare to each other in terms of their ability to identify disease-relevant modules in different types of network remains poorly understood. We launched the ‘Disease Module Identification DREAM Challenge’, an open competition to comprehensively assess module identification methods across diverse protein–protein interaction, signaling, gene co-expression, homology and cancer-gene networks. Predicted network modules were tested for association with complex traits and diseases using a unique collection of 180 genome-wide association studies. Our robust assessment of 75 module identification methods reveals top-performing algorithms, which recover complementary trait-associated modules. We find that most of these modules correspond to core disease-relevant pathways, which often comprise therapeutic targets. This community challenge establishes biologically interpretable benchmarks, tools and guidelines for molecular network analysis to study human disease biology.

Journal Article

microProtein Prediction Program (miP3): A Software for Predicting microProteins and Their Target Transcription Factors

by Magnani, Enrico , Banf, Michael , Rhee, Seung Yon in Algorithms , Deoxyribonucleic acid , Genomes

2015

An emerging concept in transcriptional regulation is that a class of truncated transcription factors (TFs), called microProteins (miPs), engages in protein-protein interactions with TF complexes and provides feedback controls. A handful of miP examples have been described in the literature but the extent of their prevalence is unclear. Here we present an algorithm that predicts miPs and their target TFs from a sequenced genome. The algorithm is called miP prediction program (miP3), which is implemented in Python. The software will help shed light on the prevalence, biological roles, and evolution of miPs. Moreover, miP3 can be used to predict other types of miP-like proteins that may have evolved from other functional classes such as kinases and receptors. The program is freely available and can be applied to any sequenced genome.

Journal Article

PictureSensation – a mobile application to help the blind explore the visual world through touch and sound

by Mikalay, Ruben , Watzke, Baris , Blanz, Volker in Blindness , Technical Note , Visual impairment

2016

We present PictureSensation, a mobile application for the hapto-acoustic exploration of images. It is designed to allow the visually impaired to gain direct perceptual access to images via an acoustic signal. PictureSensation introduces a swipe-gesture-based, speech-guided, barrier-free user interface to guarantee autonomous usage by a blind user. It implements a recently proposed exploration and audification principle, which harnesses exploration methods that the visually impaired are used to from everyday life. In brief, a user explores an image actively on a touch screen and receives auditory feedback about its content at the current finger position. PictureSensation provides an extensive tutorial and training mode to allow a blind user to become familiar with the use of the application itself as well as the principles of image-content-to-sound transformations, without any assistance from a normal-sighted person. We show our application’s potential to help visually impaired individuals explore, interpret and understand entire scenes, even on small smartphone screens. Providing more than just verbal scene descriptions, PictureSensation presents a valuable mobile tool to grant the blind access to the visual world through exploration, anywhere.

Journal Article

Tripartite-GraphRAG via Plugin Ontologies

by Banf, Michael , Kuhn, Johannes in Density , Graph representations , Graphical representations

2025

Large Language Models (LLMs) have shown remarkable capabilities across various domains, yet they struggle with knowledge-intensive tasks in areas that demand factual accuracy, e.g. industrial automation and healthcare. Key limitations include their tendency to hallucinate, lack of source traceability (provenance), and challenges in timely knowledge updates. Combining language models with knowledge graphs (GraphRAG) offers promising avenues for overcoming these deficits. However, a major challenge lies in creating such a knowledge graph in the first place. Here, we propose a novel approach that combines LLMs with a tripartite knowledge graph representation, which is constructed by connecting complex, domain-specific objects via a curated ontology of corresponding, domain-specific concepts to relevant sections within chunks of text through a concept-anchored pre-analysis of source documents starting from an initial lexical graph. Subsequently, we formulate LLM prompt creation as an unsupervised node classification problem allowing for the optimization of information density, coverage, and arrangement of LLM prompts at significantly reduced lengths. An initial experimental evaluation of our approach on a healthcare use case, involving multi-faceted analyses of patient anamneses given a set of medical concepts as well as a series of clinical guideline literature, indicates its potential to optimize information density, coverage, and arrangement of LLM prompts while significantly reducing their lengths, which, in turn, may lead to reduced costs as well as more consistent and reliable LLM outputs.

Paper

METACLUSTERplus - an R package for probabilistic inference and visualization of context-specific transcriptional regulation of biosynthetic gene clusters

by Banf, Michael in Bioinformatics , Chromosomes , Enzymes

2022

Fungi and plants reveal widespread occurrences of metabolic enzymes co-located on the chromosome, some already characterized as being biosynthetic pathways for specialized metabolites, such as terpene-synthesizing enzyme clusters in Lotus japonicus and Arabidopsis thaliana. These clusters display context-specific co-expression of clustered enzymes, indicating a shared transcriptional response in a spatial and condition-specific manner, and co-regulation due to promoter binding by shared transcription factors may be one way to facilitate coordinated expression. To enhance our understanding of context-specific transcriptional gene cluster regulation, we redefine and augment this probabilistic framework, labelled METACLUSTERplus, integrating gene expression compendia, context-specific annotations, biosynthetic gene cluster definitions, as well as gene regulatory network architectures. Further, it provides a set of appealing and intuitive visualizations of inferred results for analysis and publication. METACLUSTERplus is available at https://github.com/mbanf/MetaclusterPlus. Competing Interest Statement: The authors have declared no competing interest.

Paper
