Catalogue Search | MBRL

Using graph theory to analyze biological networks

by Bagos, Pantelis G; Aerts, Jan; Schneider, Reinhard in Algorithms, Bioinformatics, biological network

2011

Understanding complex systems often requires a bottom-up analysis towards a systems biology approach. The need emerges to investigate a system not only as individual components but as a whole. This can be done by examining the elementary constituents individually and then how these are connected. The myriad components of a system and their interactions are best characterized as networks, and they are mainly represented as graphs in which thousands of nodes are connected by thousands of edges. In this article we demonstrate approaches, models and methods from the graph theory universe and we discuss ways in which they can be used to reveal hidden properties and features of a network. This network profiling, combined with knowledge extraction, will help us to better understand the biological significance of the system.
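As a rough illustration of the graph representation this abstract describes, the sketch below stores a small network as an adjacency map and computes two basic properties: node degree and connected components. The node names and edges are invented placeholders, not data from the article, and the functions are hypothetical helpers rather than the authors' methods.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def degrees(graph):
    """Number of neighbours per node; high-degree nodes are hub candidates."""
    return {node: len(neighbours) for node, neighbours in graph.items()}


def connected_components(graph):
    """Group nodes that can reach each other, using breadth-first search."""
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    component.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


# Invented toy network: placeholder node names, not real interaction data.
EDGES = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("E", "F")]
GRAPH = build_graph(EDGES)

if __name__ == "__main__":
    print(degrees(GRAPH))
    print(connected_components(GRAPH))
```

In this toy network, node A has the most connections and the pair E, F forms a separate component, which is the kind of structural feature such profiling is meant to surface.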

Journal Article

BioMOBS: A multi-omics visual analytics workflow for biomolecular insight generation

by Peeters, Jannes; Aerts, Jan; Heylen, Dries in Analysis, Biobanks, Biochemistry

2023

One of the challenges in multi-omics data analysis for precision medicine is the efficient exploration of undiscovered molecular interactions in disease processes. We present BioMOBS, a workflow consisting of two data visualization tools integrated with an open-source molecular information database to perform clinically relevant analyses (https://github.com/driesheylen123/BioMOBS). We performed exploratory pathway analysis with BioMOBS and demonstrate its ability to generate relevant molecular hypotheses by reproducing recent findings in type 2 diabetes UK Biobank data. The central visualisation tool, where data-driven and literature-based findings can be integrated, is also available at the GitHub link. BioMOBS is a workflow that leverages information from multiple data-driven interactive analyses and visually integrates it with established pathway knowledge. The demonstrated use cases support the use of BioMOBS as a procedure that offers clinically relevant insights in disease pathway analyses on various types of omics data.

Journal Article

Origins and functional impact of copy number variation in the human genome

by Conrad, Donald F.; Aerts, Jan; Redon, Richard in Biological and medical sciences, Deoxyribonucleic acid, Design

2010

Structural variations of DNA greater than 1 kilobase in size account for most bases that vary among human genomes, but are still relatively under-ascertained. Here we use tiling oligonucleotide microarrays, comprising 42 million probes, to generate a comprehensive map of 11,700 copy number variations (CNVs) greater than 443 base pairs, of which most (8,599) have been validated independently. For 4,978 of these CNVs, we generated reference genotypes from 450 individuals of European, African or East Asian ancestry. The predominant mutational mechanisms differ among CNV size classes. Retrotransposition has duplicated and inserted some coding and non-coding DNA segments randomly around the genome. Furthermore, by correlation with known trait-associated single nucleotide polymorphisms (SNPs), we identified 30 loci with CNVs that are candidates for influencing disease susceptibility. Despite this, having assessed the completeness of our map and the patterns of linkage disequilibrium between CNVs and SNPs, we conclude that, for complex traits, the heritability void left by genome-wide association studies will not be accounted for by common CNVs.

Major CNV data sets

Copy number variations or CNVs are a common form of genetic variation between individuals, caused by genomic rearrangements, either inherited or due to de novo mutation. A major collaborative effort using tiling oligonucleotide microarrays and HapMap samples has generated a comprehensive working map of 11,700 CNVs in the human genome. About half of these were also genotyped in individuals of different ancestry: European, African or East Asian. Thirty loci with CNVs that are candidates for influencing disease susceptibility were identified. Published online last October, this vast data set is a landmark in terms of completeness and spatial resolution, and as John Armour wrote in News & Views, is likely to stand as a definitive resource for years to come. This resource is the main focus of a new genome-wide association study, from the Wellcome Trust Case Control Consortium, of the links between common CNVs and eight common human diseases. Providing a wealth of technical insights to inform future study design and analysis, the Wellcome study also implies that common CNVs that can be genotyped using existing platforms are unlikely to have a major role in the genetic basis of common diseases.

Much genetic variation among humans can be accounted for by structural DNA differences that are greater than 1 kilobase in size. Here, using tiling oligonucleotide arrays and HapMap samples, a map of 11,700 copy number variations (CNVs) bigger than 443 base pairs has been generated. About half of these CNVs were also genotyped in individuals of different ancestry. The results offer insight into the relative prevalence of mechanisms that generate CNVs, their evolution, and their contribution to complex genetic diseases.

Journal Article

Latent Dirichlet Allocation reveals tomato root-associated bacterial interactions responding to hairy root disease

by Rediers, Hans; Aerts, Jan; Vargas Ribera, Pablo in 16S rRNA amplicon sequencing, Agrobacterium, Analysis

2025

Background: Hairy root disease (HRD), caused by rhizogenic Agrobacterium strains, is a significant disease threat to modern hydroponic greenhouses, which can result in up to 15% loss in yield. Our prior research has suggested increased alpha diversity after infection in hydroponic tomato root-associated microbiota. However, a more detailed investigation of how root-associated microbial components (MCs; clusters of weighted bacterial features) respond to disease and the underlying mechanisms remains lacking. To address this gap, we applied Latent Dirichlet Allocation (LDA) to analyze MCs from 12 Belgian commercial hydroponic tomato greenhouses. Using high-throughput amplicon sequencing of the 16S rRNA locus, three locations along each greenhouse irrigation system (beginning, middle, and end) were sampled at 5 time points throughout the 2018 growing season.

Results: In this study, we used LDA to identify root-associated MCs and gained insights into temporal changes and new health statuses. First, we observed a structured temporal pattern from the early stage (ES; sampling time points 1 and 2) through a transitional stage (TS; sampling time point 3) to the late stage (LS; sampling time points 4 and 5), showing different MC trajectories by health status. Second, MC4 (characterised by Paenibacillus spp.) was pronounced for healthy greenhouses in the ES, MC7 (characterised by rhizogenic Agrobacterium spp., Devosia and Limnobacter amplicon sequence variants (ASVs)) was pronounced for pre-symptomatic status, while MC0 (characterised by Comamonadaceae spp. ASVs) was indicative of an intermediate state between healthy and infected conditions. Furthermore, the ratio between Paenibacillus ASV and rhizogenic Agrobacterium ASV can be used as a biomarker to assess greenhouse health status in both ES and LS.

Conclusion: We investigated hydroponic tomato root-associated MC responses to HRD using LDA, which revealed different MC trajectories in terms of plant health. Our study advances knowledge of hairy root disease regarding the mechanisms that can improve plant health monitoring in greenhouses and biocontrol strategies. From a computational perspective, we demonstrate how to apply LDA, a powerful analytical tool, to understudied subfields through visual analytics.

Journal Article

A visual analytic approach for the identification of ICU patient subpopulations using ICD diagnostic codes

by Aerts, Jan; Alcaide, Daniel in Algorithms, Analysis, Clinical trials

2021

A large number of clinical concepts are categorized under standardized formats that ease the manipulation, understanding, analysis, and exchange of information. One of the most widespread codifications is the International Classification of Diseases (ICD), used for characterizing diagnoses and clinical procedures. With formatted ICD concepts, a patient profile can be described through a set of standardized attributes sorted according to the relevance or chronology of events. This structured data is fundamental to quantifying the similarity between patients and detecting relevant clinical characteristics. Data visualization tools allow the representation and comprehension of data patterns, usually of a high dimensional nature, where only a partial picture can be projected. In this paper, we provide a visual analytics approach for the identification of homogeneous patient cohorts by combining custom distance metrics with a flexible dimensionality reduction technique. First, we define a new metric to measure the similarity between diagnosis profiles through the concordance and relevance of events. Second, we describe a variation of the Simplified Topological Abstraction of Data (STAD) dimensionality reduction technique to enhance the projection of signals preserving the global structure of data. The MIMIC-III clinical database is used for implementing the analysis into an interactive dashboard, providing a highly expressive environment for the exploration and comparison of patient groups with at least one identical diagnostic ICD code. The combination of the distance metric and STAD not only allows the identification of patterns but also provides a new layer of information to establish additional relationships between patient cohorts. The method and tool presented here add a valuable new approach for exploring heterogeneous patient populations. In addition, the distance metric described can be applied in other domains that employ ordered lists of categorical data.
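To make the idea of comparing ordered lists of categorical codes more concrete, here is a minimal sketch of a position-weighted similarity. It is not the metric defined in the paper: the weighting scheme (1 / (rank + 1)), the function names, and the placeholder codes are all assumptions chosen for illustration. It only shows how shared items (concordance) and their rank (relevance) can both enter a score.

```python
def rank_weight(position):
    """Assumed weighting: earlier (more relevant) positions count more."""
    return 1.0 / (position + 1)


def weighted_similarity(profile_a, profile_b):
    """Similarity in [0, 1] between two ordered lists of unique codes.

    Each shared code contributes the smaller of its two rank weights, so a
    code that is prominent in both profiles contributes the most.
    """
    pos_a = {code: i for i, code in enumerate(profile_a)}
    pos_b = {code: i for i, code in enumerate(profile_b)}
    shared = set(pos_a) & set(pos_b)
    score = sum(min(rank_weight(pos_a[c]), rank_weight(pos_b[c])) for c in shared)
    # Normalise by the best achievable score for the longer profile.
    longest = max(len(profile_a), len(profile_b))
    best = sum(rank_weight(i) for i in range(longest))
    return score / best if best else 0.0


def weighted_distance(profile_a, profile_b):
    """Distance derived from the similarity above."""
    return 1.0 - weighted_similarity(profile_a, profile_b)


# Placeholder diagnosis codes, most relevant first (not real patient data).
PATIENT_1 = ["code_A", "code_B", "code_C"]
PATIENT_2 = ["code_A", "code_X", "code_Y"]
PATIENT_3 = ["code_X", "code_Y", "code_A"]

if __name__ == "__main__":
    print(weighted_distance(PATIENT_1, PATIENT_2))
    print(weighted_distance(PATIENT_1, PATIENT_3))
```

Under these assumptions, PATIENT_1 is closer to PATIENT_2 than to PATIENT_3, because the shared code ranks first in both of the first two profiles but last in the third.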

Journal Article

FLASC: a flare-sensitive clustering algorithm

by Peeters, Jannes; Aerts, Jan; Bot, Daniël M. in Algorithms, Algorithms and Analysis of Algorithms, Analysis

2025

Exploratory data analysis workflows often use clustering algorithms to find groups of similar data points. The shape of these clusters can provide meaningful information about the data. For example, a Y-shaped cluster might represent an evolving process with two distinct outcomes. This article presents flare-sensitive clustering (FLASC), an algorithm that detects branches within clusters to identify such shape-based subgroups. FLASC builds upon HDBSCAN*—a state-of-the-art density-based clustering algorithm—and detects branches in a post-processing step using within-cluster connectivity. Two algorithm variants are presented, which trade computational cost for noise robustness. We show that both variants scale similarly to HDBSCAN* regarding computational cost and provide similar outputs across repeated runs. In addition, we demonstrate the benefit of branch detection on two real-world data sets. Our implementation is included in the hdbscan Python package and available as a standalone package at https://github.com/vda-lab/pyflasc .

Journal Article

Duplication of a promiscuous transcription factor drives the emergence of a new regulatory network

by Vladimir Benes, Arnout Voet, Karin Voordeckers in 38/15, 38/35, 38/39

2014

The emergence of new genes throughout evolution requires rewiring and extension of regulatory networks. However, the molecular details of how the transcriptional regulation of new gene copies evolves remain largely unexplored. Here we show how duplication of a transcription factor gene allowed the emergence of two independent regulatory circuits. Interestingly, the ancestral transcription factor was promiscuous and could bind different motifs in its target promoters. After duplication, one paralogue evolved increased binding specificity so that it only binds one type of motif, whereas the other copy evolved a decreased activity so that it only activates promoters that contain multiple binding sites. Interestingly, only a few mutations in both the DNA-binding domains and in the promoter binding sites were required to gradually disentangle the two networks. These results reveal how duplication of a promiscuous transcription factor followed by concerted cis and trans mutations allows expansion of a regulatory network. The molecular basis of transcriptional regulation evolution following gene duplication is poorly understood. Here the authors show how duplication of a promiscuous fungal transcription factor followed by concerted cis and trans mutations generates a novel regulatory network.

Journal Article

Arena3D: visualizing time-driven phenotypic differences in biological systems

by Aerts, Jan; Pavlopoulos, Georgios A; Secrier, Maria in Algorithms, Animals, Bioinformatics

2012

Background: Elucidating the genotype-phenotype connection is one of the big challenges of modern molecular biology. To fully understand this connection, it is necessary to consider the underlying networks and the time factor. In this context of data deluge and heterogeneous information, visualization plays an essential role in interpreting complex and dynamic topologies. Thus, software that is able to bring the network, phenotypic and temporal information together is needed. Arena3D has been previously introduced as a tool that facilitates link discovery between processes. It uses a layered display to separate different levels of information while emphasizing the connections between them. We present novel developments of the tool for the visualization and analysis of dynamic genotype-phenotype landscapes.

Results: Version 2.0 introduces novel features that allow handling time course data in a phenotypic context. Gene expression levels or other measures can be loaded and visualized at different time points and phenotypic comparison is facilitated through clustering and correlation display or highlighting of impacting changes through time. Similarity scoring allows the identification of global patterns in dynamic heterogeneous data. In this paper we demonstrate the utility of the tool on two distinct biological problems of different scales. First, we analyze a medium scale dataset that looks at perturbation effects of the pluripotency regulator Nanog in murine embryonic stem cells. Dynamic cluster analysis suggests alternative indirect links between Nanog and other proteins in the core stem cell network. Moreover, recurrent correlations from the epigenetic to the translational level are identified. Second, we investigate a large scale dataset consisting of genome-wide knockdown screens for human genes essential in the mitotic process. Here, a potential new role for the gene lsm14a in cytokinesis is suggested. We also show how phenotypic patterning allows for extensive comparison and identification of high impact knockdown targets.

Conclusions: We present a new visualization approach for perturbation screens with multiple phenotypic outcomes. The novel functionality implemented in Arena3D enables effective understanding and comparison of temporal patterns within morphological layers, to help with the system-wide analysis of dynamic processes. Arena3D is available free of charge for academics as a downloadable standalone application from: http://arena3d.org/ .

Journal Article

The impact of early-life rearing conditions on the porcine gut microbiota and immune system

by Aerts, Jan; Xiao, Chuanpi; Everaert, Nadia in Agriculture, Antibiotics, Biomedical and Life Sciences

2025

Background: Early life represents an unparalleled window in the life of the pig in which the gut microbiota interacts with its host’s naïve immune system. Yet, modern swine production often favours conditions that promote production efficiency rather than enriched microbiota development, the long-term consequences of which remain poorly understood. This study sought to analyse the long-term impacts of early-life rearing conditions on the gut microbiota until day 90, and in turn, its physiological and immunological consequences. We established two rearing conditions from farrowing until day 90: enriched, microbiota-enhancing husbandry characterised by weaning at 6 weeks and the provision of litter material throughout; and restricted, microbiota-depleting husbandry comprising weaning at 3 weeks and antibiotic administration from days 2 to 9. The day 42 faecal, and day 90 ileal and faecal microbiotas underwent 16S V1-V9 rRNA gene sequencing. Intestinal and faecal volatile fatty acids were measured via gas chromatography, haematological parameters were assessed from whole blood, and serum immunoglobulin G was measured. Immune-focused gene expression in the spleen and ileum was also measured via qPCR.

Results: The faecal microbiota exhibited differential β-diversity by group at both timepoints. On day 90, enriched pigs exhibited significantly elevated ileal villus height to crypt depth ratios, which were negatively correlated with serum IgG. Conversely, restricted pigs had more branched-chain fatty acids in the colon and faeces, alongside signs of heightened immune activity, with haematology showing enhanced neutrophil activation, and elevated lymphocyte and IgG levels. In the spleen, gene sets comprising genes for the pro-inflammatory cytokines IL-6, IL-15 and IFN-γ were upregulated among restricted pigs, while enriched pigs exhibited better-primed innate immune systems.

Conclusions: These findings demonstrate long-term impacts of early-life rearing on faecal microbiota composition. We furthermore observed a potential shift towards inflammation and altered haematology associated with the microbiota.

Journal Article

MCLEAN: Multilevel Clustering Exploration As Network

by Aerts, Jan; Alcaide, Daniel in Algorithms, Analysis, Analytics

2018

Finding useful patterns in datasets has attracted considerable interest in the field of visual analytics. One of the most common tasks is the identification and representation of clusters. However, this is non-trivial in heterogeneous datasets since the data needs to be analyzed from different perspectives. Indeed, highly variable patterns may mask underlying trends in the dataset. Dendrograms are graphical representations resulting from agglomerative hierarchical clustering and provide a framework for viewing the clustering at different levels of detail. However, dendrograms become cluttered when the dataset gets large, and the single cut of the dendrogram to demarcate different clusters can be insufficient in heterogeneous datasets. In this work, we propose a visual analytics methodology called MCLEAN that offers a general approach for guiding the user through the exploration and detection of clusters. Powered by a graph-based transformation of the relational data, it supports a scalable environment for representation of heterogeneous datasets by changing the spatialization. We thereby combine multilevel representations of the clustered dataset with community finding algorithms. Our approach entails displaying the results of the heuristics to users, providing a setting from which to start the exploration and data analysis. To evaluate our proposed approach, we conduct a qualitative user study, where participants are asked to explore a heterogeneous dataset, comparing the results obtained by MCLEAN with the dendrogram. These qualitative results reveal that MCLEAN is an effective way of aiding users in the detection of clusters in heterogeneous datasets. The proposed methodology is implemented in an R package available at https://bitbucket.org/vda-lab/mclean .
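The limitation of a single dendrogram cut can be illustrated with a small sketch. It relies on a standard fact: cutting a single-linkage dendrogram at height t gives the connected components of the graph that links every pair of points no further apart than t. The code below is not MCLEAN and does not use the R package; the one-dimensional data, the function name, and the thresholds are assumptions chosen for illustration.

```python
from itertools import combinations


def clusters_at_threshold(points, threshold):
    """Single-linkage clusters of 1-D points at one cut height.

    Points closer than or equal to `threshold` are linked, and clusters are
    the connected components of that graph (tracked with union-find).
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if abs(points[i] - points[j]) <= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i, value in enumerate(points):
        groups.setdefault(find(i), []).append(value)
    return sorted(sorted(group) for group in groups.values())


# Invented data with structure at two different scales.
POINTS = [1.0, 1.5, 2.0, 8.0, 8.4, 20.0]

if __name__ == "__main__":
    for cut in (0.1, 0.6, 7.0):
        print(cut, clusters_at_threshold(POINTS, cut))
```

Each cut height shows a different grouping of the same data, which is why inspecting several levels together can reveal patterns that any one cut would hide.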

Journal Article
