Catalogue Search | MBRL

Predicting intercellular communication based on metabolite-related ligand-receptor interactions with MRCLinkdb

by Zhang, Yang , Zhan, Meixiao , Sun, Taoping in Algorithms , Animals , Biomedical and Life Sciences

2024

Background Metabolite-associated cell communications play critical roles in maintaining human biological function. However, most existing tools and resources focus only on ligand-receptor interaction pairs where both partners are proteinaceous, neglecting other non-protein molecules. To address this gap, we introduce the MRCLinkdb database and algorithm, which aggregates and organizes data related to non-protein L-R interactions in cell-cell communication, providing a valuable resource for predicting intercellular communication based on metabolite-related ligand-receptor interactions. Results Here, we manually curated the metabolite-ligand-receptor (ML-R) interactions from the literature and known databases, ultimately collecting over 790 human and 670 mouse ML-R interactions. Additionally, we compiled information on over 1900 enzymes and 260 transporter entries associated with these metabolites. We developed the Metabolite-Receptor based Cell Link Database (MRCLinkdb) to store these ML-R interaction data. Meanwhile, the platform also offers extensive information for presenting ML-R interactions, including fundamental metabolite information and the overall expression landscape of metabolite-associated gene sets (such as receptors, enzymes, and transporter proteins) based on single-cell transcriptomics sequencing (covering 35 human and 26 mouse tissues, 52 human and 44 mouse cell types) and bulk RNA-seq/microarray data (encompassing 62 human and 39 mouse tissues). Furthermore, MRCLinkdb introduces a web server dedicated to the analysis of intercellular communication based on ML-R interactions. MRCLinkdb is freely available at https://www.cellknowledge.com.cn/mrclinkdb/. Conclusions In addition to supplementing ligand-receptor databases, MRCLinkdb may provide new perspectives for decoding intercellular communication and advancing related prediction tools based on ML-R interactions.

Journal Article


Prediction of protein–protein interaction based on interaction-specific learning and hierarchical information

by Jiang, Jing , Wang, Peng , Cao, Xiaofeng in Biomedical and Life Sciences , Computational Biology - methods , Computational inference of protein conformations and interactions

2025

Background Prediction of protein–protein interactions (PPIs) is fundamental for identifying drug targets and understanding cellular processes. The rapid growth of PPI studies necessitates the development of efficient and accurate tools for automated prediction of PPIs. In recent years, several robust deep learning models have been developed for PPI prediction and have found widespread application in proteomics research. Despite these advancements, current computational tools still face limitations in modeling both the pairwise interactions and the hierarchical relationships between proteins. Results We present HI-PPI, a novel deep learning method that integrates a hierarchical representation of the PPI network and interaction-specific learning for protein–protein interaction prediction. HI-PPI extracts the hierarchical information by embedding structural and relational information into hyperbolic space. A gated interaction network is then employed to extract pairwise features for interaction prediction. Experiments on multiple benchmark datasets demonstrate that HI-PPI outperforms the state-of-the-art methods; HI-PPI improves Micro-F1 scores by 2.62%–7.09% over the second-best method. Moreover, HI-PPI offers explicit interpretability of the hierarchical organization within the PPI network. The distance between the origin and the hyperbolic embedding computed by HI-PPI naturally reflects the hierarchical level of proteins. Conclusions Overall, the proposed HI-PPI effectively addresses the limitations of existing PPI prediction methods. By leveraging the hierarchical structure of the PPI network, HI-PPI significantly enhances the accuracy and robustness of PPI predictions.
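The abstract's remark that distance from the origin reflects hierarchical level can be illustrated with a minimal sketch. It assumes the unit Poincaré ball model (curvature -1), which the abstract does not name, and the protein names and coordinates are hypothetical. In Poincaré-style embeddings, points near the origin are usually read as sitting higher in the hierarchy; that reading is also an assumption here, not a statement about HI-PPI's implementation.

```python
import math

def poincare_norm_distance(x):
    """Hyperbolic distance from the origin to x in the unit Poincare ball.

    Assumes curvature -1, where d(0, x) = 2 * artanh(||x||).
    """
    r = math.sqrt(sum(v * v for v in x))
    if r >= 1.0:
        raise ValueError("point must lie strictly inside the unit ball")
    return 2.0 * math.atanh(r)

# Hypothetical 2-D embeddings; real models use many more dimensions.
embeddings = {
    "hub_protein": [0.05, 0.02],
    "mid_protein": [0.50, 0.10],
    "leaf_protein": [0.90, 0.30],
}

# Rank proteins from closest to the origin to farthest from it.
ranked = sorted(embeddings, key=lambda name: poincare_norm_distance(embeddings[name]))
```

Because the distance grows without bound as a point approaches the boundary of the ball, small differences in Euclidean norm near the boundary translate into large differences in inferred depth.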

Journal Article


Negative sampling strategies impact the prediction of scale-free biomolecular network interactions with machine learning

by Zhao, Guoqing , Liu, Zhi-Ping , Shao, Bowen in Analysis , Bias , Bioinformatics

2025

Background Understanding protein-molecular interaction is crucial for unraveling the mechanisms underlying diverse biological processes. Machine learning (ML) techniques have been extensively employed in predicting these interactions and have garnered substantial research focus. Previous studies have predominantly centered on improving model performance through novel and efficient ML approaches, often resulting in overoptimistic predictive estimates. However, these advancements frequently neglect the inherent biases stemming from network properties, particularly in biological contexts. Results In this study, we examined the biases inherent in ML models during the learning and prediction of protein-molecular interactions, particularly those arising from the scale-free property of biological networks, a characteristic wherein a few nodes have many connections while most have very few. Our comprehensive analysis across diverse tasks, datasets, and ML methods provides compelling evidence of these biases. We discovered that the training and evaluation of ML models are profoundly influenced by network topology, potentially distorting model performance assessments. To mitigate this issue, we propose the degree distribution balanced (DDB) sampling strategy, a straightforward yet potent approach that alleviates biases stemming from network properties. This method further underscores the limitations of certain ML models in learning protein-molecular interactions solely from intrinsic molecular features. Conclusions Our findings present a novel perspective for assessing the performance of ML models in inferring protein-molecular interactions with greater fairness. By addressing biases introduced by network properties, the DDB sampling approach provides a more balanced and precise assessment of model capabilities. These insights hold the potential to bolster the reliability of ML models in bioinformatics, fostering a more stringent evaluation framework for predicting protein-molecular interactions.
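The abstract does not describe how DDB sampling is carried out, so the following is only one plausible reading, sketched under stated assumptions: negatives are drawn so that every node appears in the negative set exactly as often as it does in the positive set, which prevents a model from separating classes by node degree alone. The function name, the shuffle-and-reject procedure, and the toy network are all hypothetical and should not be taken as the authors' algorithm.

```python
import random
from collections import Counter

def degree_balanced_negatives(positives, seed=0, max_tries=1000):
    """Sample negative pairs whose per-node degrees match the positive set.

    Keeps the left endpoints fixed and shuffles the right endpoints, then
    rejects any shuffle that repeats a pair or reproduces a known positive.
    """
    rng = random.Random(seed)
    known = set(positives)
    left = [protein for protein, _ in positives]
    right = [molecule for _, molecule in positives]
    for _ in range(max_tries):
        rng.shuffle(right)
        candidate = list(zip(left, right))
        if len(set(candidate)) == len(candidate) and not known.intersection(candidate):
            return candidate
    raise RuntimeError("no degree-preserving negative set found")

# Toy scale-free-like network: P1 is a hub, the other proteins have degree 1.
positives = [("P1", "A"), ("P1", "B"), ("P1", "C"),
             ("P2", "D"), ("P3", "E"), ("P4", "F")]
negatives = degree_balanced_negatives(positives)
```

With uniform random negative sampling, the hub P1 would rarely appear among negatives, so a classifier could score well simply by predicting "interacts" for high-degree nodes. Matching the degree distributions removes that shortcut.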

Journal Article


Bridging chemical structure and conceptual knowledge enables accurate prediction of compound-protein interaction

by Jiang, Jing , Zeng, Li , Ma, Tengfei in Artificial intelligence , Biological activity , Biomedical and Life Sciences

2024

Background Accurate prediction of compound-protein interaction (CPI) plays a crucial role in drug discovery. Existing data-driven methods aim to learn from the chemical structures of compounds and proteins yet ignore conceptual knowledge, that is, the interrelationships among the fundamental elements in the biomedical knowledge graph (KG). Knowledge graphs provide a comprehensive view of entities and relationships beyond individual compounds and proteins. They encompass a wealth of information like pathways, diseases, and biological processes, offering a richer context for CPI prediction. This contextual information can be used to identify indirect interactions, infer potential relationships, and improve prediction accuracy. In real-world applications, the prevalence of knowledge-missing compounds and proteins is a critical barrier for injecting knowledge into data-driven models. Results Here, we propose BEACON, a data and knowledge dual-driven framework that bridges chemical structure and conceptual knowledge for CPI prediction. The proposed BEACON learns the consistent representations by maximizing the mutual information between chemical structure and conceptual knowledge and predicts the missing representations by minimizing their conditional entropy. BEACON achieves state-of-the-art performance on multiple datasets compared to competing methods, notably with 5.1% and 6.6% performance gains on the BIOSNAP and DrugBank datasets, respectively. Moreover, BEACON is the only approach capable of effectively predicting knowledge representations for knowledge-lacking compounds and proteins. Conclusions Overall, our work provides a general approach for directly injecting conceptual knowledge to enhance the performance of CPI prediction.

Journal Article


An interpretable geometric graph neural network for enhancing the generalizability of drug–target interaction prediction

by Cui, Feifei , Xiong, An , Xia, Yan in Biomedical and Life Sciences , Computational inference of protein conformations and interactions , Cross-attention

2025

Background Accurate prediction of drug–target interactions (DTIs) is essential for advancing drug discovery. Although numerous computational methods have been proposed, many exhibit limited generalization, particularly when dealing with unseen drugs or targets. Results To address this challenge, we introduce GPS-DTI, a deep learning framework designed to capture both local and global features of drugs and proteins, thereby enhancing predictive robustness. Specifically, GPS-DTI employs a graph isomorphism network with edge features (GINE)–based graph neural network, combined with a multi-head attention mechanism (MHAM), to effectively model the structural characteristics of drug molecules. For proteins, representations are derived from the pre-trained Evolutionary Scale Modeling (ESM-2) model and further refined through convolutional neural networks (CNNs), yielding rich feature embeddings. A cross-attention module integrates drug and protein features, uncovering biologically meaningful interactions and improving model interpretability. Conclusions Comprehensive benchmarking across in-domain and cross-domain DTI prediction tasks demonstrates that GPS-DTI outperforms existing methods, underscoring its strong generalization capability. Notably, the model achieves state-of-the-art performance on drug–target affinity (DTA) tasks and shows robust adaptability when evaluated on an independent Coronavirus Disease 2019 (COVID-19)–related test set. Furthermore, visualization of cross-attention maps offers interpretable insights into key molecular interactions, highlighting the potential of GPS-DTI in real-world drug discovery applications.

Journal Article


Accurate prediction of drug-protein interactions by maintaining the original topological relationships among embeddings

by Wei, Jinmao , Chen, Xiran , Li, Yanfei in Accuracy , Algebraic topology , Analysis

2025

Background Learning-based methods have recently demonstrated strong potential in predicting drug-protein interactions (DPIs). However, existing approaches often fail to achieve accurate predictions on real-world imbalanced datasets while maintaining high generalizability and scalability, limiting their practical applicability. Results This study proposes a highly generalized model, GLDPI, aimed at improving prediction accuracy in imbalanced scenarios by preserving the topological relationships among initial molecular representations in the embedding space. Specifically, GLDPI employs dedicated encoders to transform one-dimensional sequence information of drugs and proteins into embedding representations and efficiently calculates the likelihood of DPIs using cosine similarity. Additionally, we introduce a prior loss function based on the guilt-by-association principle to ensure that the topology of the embedding space aligns with the structure of the initial drug-protein network. This design enables GLDPI to effectively capture network relationships and key features of molecular interactions, thereby significantly enhancing predictive performance. Conclusions Experimental results highlight GLDPI’s superior performance on multiple highly imbalanced benchmark datasets, achieving over a 100% improvement in the AUPR metric compared to state-of-the-art methods. Additionally, GLDPI demonstrates exceptional generalization capabilities in cold-start experiments, excelling in predicting novel drug-protein interactions. Furthermore, the model exhibits remarkable scalability, efficiently inferring approximately 1.2 × 10^10 drug-protein pairs in less than 10 h.
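Scoring a drug-protein pair by cosine similarity, as the abstract describes, can be sketched as follows. The encoders are not shown; the embedding vectors, entity names, and the choice to return 0.0 for a zero vector are hypothetical illustration choices rather than details taken from GLDPI.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen for this sketch
    return dot / (norm_u * norm_v)

# Hypothetical embeddings standing in for the outputs of the two encoders.
drug_embeddings = {"drug_x": [1.0, 0.0, 1.0], "drug_y": [0.0, 1.0, 0.0]}
protein_embeddings = {"protein_p": [2.0, 0.0, 2.0]}

# Score every drug against every protein; higher means more likely to interact.
scores = {
    (drug, protein): cosine_similarity(d_vec, p_vec)
    for drug, d_vec in drug_embeddings.items()
    for protein, p_vec in protein_embeddings.items()
}
```

Since each score reduces to a dot product of normalized vectors, all pairs can be scored with a single matrix multiplication once embeddings are computed, which is consistent with the large-scale inference the abstract reports.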

Journal Article


MESM: integrating multi-source data for high-accuracy protein-protein interactions prediction through multimodal language models

by Chang, Shan , Chu, Jinming , Shen, Liyan in Amino acid sequence , Amino acids , Artificial neural networks

2025

Background Protein-protein interactions (PPIs) play a critical role in essential biological processes such as signal transduction, enzyme activity regulation, cytoskeletal structure, immune responses, and gene regulation. However, current methods mainly focus on extracting features from protein sequences and using graph neural networks (GNNs) to acquire interaction information from the PPI network graph. This limits the model’s ability to learn richer and more effective interaction information, thereby affecting prediction performance. Results In this study, we propose a novel deep learning method, MESM, for effectively predicting PPIs. The datasets used for the PPI prediction task were primarily constructed from the STRING database, including two Homo sapiens PPI datasets, SHS27k and SHS148k, and two Saccharomyces cerevisiae PPI datasets, SYS30k and SYS60k. MESM consists of three key modules, as follows: First, MESM extracts multimodal representations from protein sequence information, protein structure information, and point cloud features through Sequence Variational Autoencoder (SVAE), Variational Graph Autoencoder (VGAE), and PointNet Autoencoder (PAE). Then, Fusion Autoencoder (FAE) is used to integrate these multimodal features, generating rich and balanced protein representations. Next, MESM leverages GraphGPS to learn structural information from the PPI network graph structure and combines Graph Attention Network (GAT) to further capture protein interaction information. Finally, MESM uses Graph Convolutional Network (GCN) and SubgraphGCN to extract global and local features from the perspective of the overall graph and subgraphs. Moreover, we build seven independent graphs from the overall PPI network graph to specifically learn the features of each PPI type, thereby enhancing the model’s learning ability for different types of interactions. Conclusions Compared to the state-of-the-art methods, MESM achieved improvements of 8.77%, 4.98%, 7.48%, and 6.08% on SHS27k, SHS148k, SYS30k, and SYS60k, respectively. The experimental results demonstrate that MESM exhibits significant improvements in PPI prediction performance.

Journal Article


CRBPSA: CircRNA-RBP interaction sites identification using sequence structural attention model

by Dai, Qi , Wang, Tao , Zou, Quan in Algorithms , Apoptosis , Attention mechanism

2024

Background Due to the ability of circRNA to bind with corresponding RBPs and play a critical role in gene regulation and disease prevention, numerous identification algorithms have been developed. Nevertheless, most of the current mainstream methods primarily capture one-dimensional sequence features through various descriptors, while neglecting the effective extraction of secondary structure features. Moreover, as the number of introduced descriptors increases, the issues of sparsity and ineffective representation also rise, causing a significant burden on computational models and leaving room for improvement in predictive performance. Results Based on this, we focused on capturing the features of secondary structure in sequences and developed a new architecture called CRBPSA, which is based on a sequence-structure attention mechanism. Firstly, a base-pairing matrix is generated by calculating the matching probability between each base, with a Gaussian function introduced as a weight to construct the secondary structure. Then, a Structure_Transformer is employed to extract base-pairing information and spatial positional dependencies, enabling the identification of binding sites through deeper feature extraction. Experimental results using the same set of hyperparameters on 37 circRNA datasets, totaling 671,952 samples, show that the CRBPSA algorithm achieves an average AUC of 99.93%, surpassing all existing prediction methods. Conclusions CRBPSA is a lightweight and efficient prediction tool for circRNA-RBP, which can capture structural features of sequences with minimal computational resources and accurately predict protein-binding sites. This tool facilitates a deeper understanding of the biological processes and mechanisms underlying circRNA and protein interactions.

Journal Article


PharmRL: pharmacophore elucidation with deep geometric reinforcement learning

by Koes, David R. , Aggarwal, Rishal in Algorithms , Automation , Binding Sites

2024

Background Molecular interactions between proteins and their ligands are important for drug design. A pharmacophore consists of favorable molecular interactions in a protein binding site and can be utilized for virtual screening. Pharmacophores are easiest to identify from co-crystal structures of a bound protein-ligand complex. However, designing a pharmacophore in the absence of a ligand is a much harder task. Results In this work, we develop a deep learning method that can identify pharmacophores in the absence of a ligand. Specifically, we train a CNN model to identify potential favorable interactions in the binding site, and develop a deep geometric Q-learning algorithm that attempts to select an optimal subset of these interaction points to form a pharmacophore. With this algorithm, we show better prospective virtual screening performance, in terms of F1 scores, on the DUD-E dataset than random selection of ligand-identified features from co-crystal structures. We also conduct experiments on the LIT-PCBA dataset and show that it provides efficient solutions for identifying active molecules. Finally, we test our method by screening the COVID moonshot dataset and show that it would be effective in identifying prospective lead molecules even in the absence of fragment screening experiments. Conclusions PharmRL addresses the need for automated methods in pharmacophore design, particularly in cases where a cognate ligand is unavailable. Experimental results demonstrate that PharmRL generates functional pharmacophores. Additionally, we provide a Google Colab notebook to facilitate the use of this method.

Journal Article


PBertKla: a protein large language model for predicting human lysine lactylation sites

by Wei, Yijie , Zhu, Tao , Xie, Sijia in Accuracy , Amino acids , BERT

2025

Background Lactylation is a newly discovered type of post-translational modification, primarily occurring on lysine (K) residues of both histones and non-histones to exert diverse effects on target proteins. Research has shown that lysine lactylation (Kla) modification is ubiquitous in different cells and participates in the determination of cell function and fate, as well as in the initiation and progression of various diseases. Precise identification of Kla sites is fundamental for elucidating their biological functions and uncovering their application potential. Results Here, we proposed a novel human Kla site predictor (named PBertKla) through curating a reliable benchmark dataset with proper sample length and sequence identity threshold to train a protein large language model with optimal hyperparameters. Extensive experimental results consistently demonstrated that our model possessed robust human Kla site prediction ability, achieving an AUC (area under receiver operating characteristic curve) value of over 0.880 on the independent validation data. Feature visualization analysis further validated the effectiveness of the model in feature learning and representation from Kla sequences. Moreover, we benchmarked PBertKla against other cutting-edge models on an independent testing dataset from different sources, highlighting its superiority and transferability. Conclusions All results indicated that PBertKla excelled as an automatic predictor of human Kla sites, and it would advance the investigation of lactylation modifications and their significance in health and disease.

Journal Article
