Catalogue Search | MBRL

SST-ResNet: A Sequence and Structure Information Integration Model for Protein Property Prediction

by Zhou, Guowei , Zhao, Yanpeng , Bo, Xiaochen in Algorithms , Amino Acid Sequence , Amino acids

2025

Proteins are the basic building blocks of life and perform fundamental biological functions. Predicting protein properties from amino acid sequences and 3D structures has become a key approach to accelerating drug development. In this study, we propose SST-ResNet, a novel sequence- and structure-based framework that combines the multimodal language model ProSST with a multi-scale information integration module. The framework is designed to explore the latent relationships between protein sequences and structures in depth, so that the two sources of information jointly improve prediction performance. Our method outperforms previous joint prediction models on Enzyme Commission (EC) number and Gene Ontology (GO) prediction tasks. Furthermore, we demonstrate that multi-scale information integration is necessary for combining these two types of data and show that it performs strongly on key tasks. We anticipate that this framework can be extended to a broader range of protein property prediction problems, ultimately facilitating drug development.

Journal Article

A benchmark study of deep learning-based multi-omics data fusion methods for cancer

by Wen, Yuqi , Zhang, Zhongnan , Leng, Dongjin in Algorithms , Animal Genetics and Genomics , Benchmarking

2022

Background: Fusing multi-omics data enables a comprehensive study of complex biological processes and highlights the interrelationships of relevant biomolecules and their functions. Driven by high-throughput sequencing technologies, several promising deep learning methods have been proposed for fusing multi-omics data generated from large numbers of samples.

Results: In this study, 16 representative deep learning methods are comprehensively evaluated on simulated, single-cell, and cancer multi-omics datasets. For each dataset, two tasks are designed: classification and clustering. Classification performance is evaluated with three benchmarking metrics: accuracy, F1 macro, and F1 weighted. Clustering performance is evaluated with four benchmarking metrics: the Jaccard index (JI), C-index, silhouette score, and Davies–Bouldin score. For the cancer multi-omics datasets, the methods' ability to capture associations between their multi-omics dimensionality reduction results and survival and clinical annotations is further evaluated. The benchmarking results indicate that moGAT achieves the best classification performance, while efmmdVAE, efVAE, and lfmmdVAE show the most promising performance across all complementary contexts in the clustering tasks.

Conclusions: Our benchmarking results not only provide a reference for biomedical researchers choosing an appropriate deep learning-based multi-omics data fusion method, but also suggest future directions for developing more effective multi-omics data fusion methods. The deep learning frameworks are available at https://github.com/zhenglinyi/DL-mo.

Journal Article
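The three classification metrics named in this benchmark have standard definitions: accuracy is the fraction of correct predictions, F1 macro averages the per-class F1 scores with equal weight, and F1 weighted weights each class's F1 by its support (its count among the true labels). A minimal Python sketch of those definitions follows; it is not taken from the DL-mo repository, and the function names and toy labels are hypothetical.

```python
from collections import Counter


def f1_per_class(y_true, y_pred):
    """Per-class F1 = 2TP / (2TP + FP + FN) for every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # The denominator is >= 1 because every label occurs in y_true or y_pred.
        scores[label] = 2 * tp / (2 * tp + fp + fn)
    return scores


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_macro(y_true, y_pred):
    """Unweighted mean of per-class F1: every class counts equally."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def f1_weighted(y_true, y_pred):
    """Mean of per-class F1 weighted by each class's support in y_true."""
    scores = f1_per_class(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[c] * support[c] for c in scores) / len(y_true)


# Toy example with three hypothetical classes.
y_true = ["A", "A", "B", "C"]
y_pred = ["A", "B", "B", "C"]
print(accuracy(y_true, y_pred))            # 0.75
print(round(f1_macro(y_true, y_pred), 4))  # 0.7778
print(f1_weighted(y_true, y_pred))         # 0.75
```

Macro averaging gives rare classes the same influence as common ones, so F1 macro can drop sharply when a method neglects a minority class even while accuracy and F1 weighted stay high.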

Predicting Antibody Affinity Changes upon Mutation Based on Unbound Protein Structures

by Bo, Xiaochen , Chi, Xiangyang , Chen, Zhengshan in Antibodies , Antibodies, Monoclonal - chemistry , Antibodies, Monoclonal - genetics

2025

Antibodies are key proteins of the immune system that bind specifically, reversibly, and non-covalently to their corresponding antigens, forming antigen–antibody complexes. They play a crucial role in recognizing foreign or self-antigens during the adaptive immune response. Monoclonal antibodies have emerged as a promising class of macromolecular biological therapeutics with broad market prospects. A key engineering challenge in antibody drug development is improving the affinity of candidate antibodies when experimentally resolved structures of the antigen–antibody complexes are not available as input for computer-aided prediction methods. In this work, we present an approach for predicting the effect of residue mutations on antibody affinity without requiring the structures of antigen–antibody complexes. The method represents proteins as graphs and uses a pre-trained encoder. The encoder captures the residue-level microenvironment of the target residue on the antibody, together with the antigen context, before and after mutation. The encoder also shows an inherent potential to identify paratope residues. In addition, we curated a benchmark dataset specifically for antibody mutations. Compared with baseline methods based on complex structures and sequences, our approach achieves superior or comparable average accuracy on the benchmark datasets. Additionally, we demonstrate the advantage of not requiring antigen–antibody complex structures as input by predicting the effects of mutations in antibodies against SARS-CoV-2, influenza, and human cytomegalovirus. Our method shows potential for identifying affinity-enhancing mutations in practical antibody engineering applications.

Journal Article

ggtreeExtra: Compact Visualization of Richly Annotated Phylogenetic Data

by Zhou, Lang , Xu, Shuangbin , Bo, Xiaochen in Phylogeny , Resources , Scientific visualization

2021

We present the ggtreeExtra package for visualizing heterogeneous data with a phylogenetic tree in a circular or rectangular layout (https://www.bioconductor.org/packages/ggtreeExtra). The package supports more data types and visualization methods than other tools. It supports using the grammar of graphics syntax to present data on a tree with richly annotated layers, and it allows evolutionary statistics inferred by commonly used software to be integrated and visualized with external data. ggtreeExtra is a universal tool for tree data visualization. It extends the applications of phylogenetic trees across disciplines by making more domain-specific data available for visualization and interpretation in an evolutionary context.

Journal Article

Deep learning-based transcriptome data classification for drug-target interaction prediction

by Xie, Lingwei , Zhang, Zhongnan , Bo, Xiaochen in Acids , Algorithms , Animal Genetics and Genomics

2018

Background: The ability to predict interactions between drugs and target proteins is essential to drug research and development. However, the traditional experimental paradigm is costly, and previous in silico prediction paradigms have been impeded by the wide range of data platforms and by data scarcity.

Results: In this paper, we modeled the prediction of drug–target interactions (DTIs) as a binary classification task. Using transcriptome data from the L1000 database of the LINCS project, we developed a framework based on a deep-learning algorithm to predict potential DTIs. Once fully trained, the model achieved over 98% training accuracy. Our results demonstrated that the framework could discover more reliable DTIs than other methods. This conclusion was further validated across platforms, with a high percentage of overlapping interactions.

Conclusions: Our model’s capacity to integrate transcriptome data from drugs and genes strongly suggests its potential for DTI prediction, thereby improving the drug discovery process.

Journal Article
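Framing DTI prediction as binary classification means each (drug, gene) pair becomes one labelled example built from the two transcriptome signatures. The Python/NumPy sketch below illustrates that framing under loudly stated assumptions: the pair encoding (both signatures plus their element-wise product) is hypothetical, a plain logistic regression stands in for the paper's deep network, and the data are synthetic random vectors, not L1000 profiles.

```python
import numpy as np


def pair_features(drug_sig, gene_sig):
    """Hypothetical pair encoding: both signatures plus their element-wise product."""
    return np.concatenate([drug_sig, gene_sig, drug_sig * gene_sig])


def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights w and bias b by batch gradient descent on the mean log-loss."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(interaction)
        grad = p - y                            # d(log-loss)/d(logit) per sample
        w -= lr * X.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b


def predict(X, w, b, threshold=0.5):
    """Label a pair as interacting (1) when its predicted probability >= threshold."""
    return (1.0 / (1.0 + np.exp(-(X @ w + b))) >= threshold).astype(int)


# Synthetic demo (not L1000 data): interacting pairs get correlated signatures.
rng = np.random.default_rng(0)
n_dims, n_pairs = 10, 400
X_rows, y_rows = [], []
for i in range(n_pairs):
    drug = rng.normal(size=n_dims)
    label = i % 2
    gene = drug + 0.5 * rng.normal(size=n_dims) if label else rng.normal(size=n_dims)
    X_rows.append(pair_features(drug, gene))
    y_rows.append(label)
X, y = np.array(X_rows), np.array(y_rows, dtype=float)

w, b = train_logistic(X, y)
acc = (predict(X, w, b) == y).mean()
print(f"training accuracy on synthetic pairs: {acc:.2f}")
```

The element-wise product term is what lets a linear model see whether the two signatures move together; with concatenation alone, logistic regression cannot express that pairwise agreement. Like the figure quoted in the abstract, this is training accuracy; a held-out split would be needed to estimate generalization.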

A Point Cloud Graph Neural Network for Protein–Ligand Binding Site Prediction

by Zhao, Yanpeng , Bo, Xiaochen , Li, Mengfan in Algorithms , Amino acids , Artificial intelligence

2024

Predicting protein–ligand binding sites is an integral part of structural biology and drug design. A comprehensive understanding of these binding sites is essential for advancing drug innovation, elucidating mechanisms of biological function, and exploring the nature of disease. However, accurately identifying protein–ligand binding sites remains challenging. To address this, we propose PGpocket, a geometric deep learning-based framework for improving protein–ligand binding site prediction. First, the protein surface is converted into a point cloud, and the geometric and chemical properties of each point are calculated. Next, a point cloud graph is constructed from the inter-point distances, and a point cloud graph neural network (GNN) is applied to extract and analyze protein surface information and predict potential binding sites. PGpocket is trained on the scPDB dataset, and its performance is evaluated on two independent test sets, Coach420 and HOLO4K. PGpocket achieves success rates of 58% on Coach420 and 56% on HOLO4K, surpassing those of competing algorithms and demonstrating its effectiveness and practicality for protein–ligand binding site prediction.

Journal Article
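The graph-construction step in PGpocket's abstract can be illustrated with a radius graph, in which two surface points are connected when their Euclidean distance is at most a cutoff. This Python/NumPy sketch assumes a radius rule (the abstract does not say whether PGpocket uses a distance cutoff, k-nearest neighbors, or another criterion); the cutoff, coordinates, and per-point features are hypothetical, and the single mean-aggregation step shows generic message passing, not PGpocket's actual GNN layers.

```python
import numpy as np


def radius_graph(points, radius):
    """Undirected edges (i, j), i < j, between points at distance <= radius."""
    points = np.asarray(points, dtype=float)
    diff = points[:, None, :] - points[None, :, :]  # (N, N, 3) pairwise offsets
    dist = np.sqrt((diff ** 2).sum(axis=-1))        # (N, N) Euclidean distances
    # Upper triangle with k=1 drops self-loops and duplicate (j, i) edges.
    i, j = np.nonzero(np.triu(dist <= radius, k=1))
    return list(zip(i.tolist(), j.tolist()))


def mean_aggregate(features, edges):
    """One generic message-passing step: each node averages itself and its neighbors."""
    features = np.asarray(features, dtype=float)
    summed = features.copy()
    counts = np.ones(len(features))
    for i, j in edges:
        summed[i] += features[j]
        summed[j] += features[i]
        counts[i] += 1
        counts[j] += 1
    return summed / counts[:, None]


# Three hypothetical surface points (coordinates in angstroms) with one feature each.
points = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [5.0, 0.0, 0.0]]
feats = [[1.0], [3.0], [10.0]]
edges = radius_graph(points, radius=1.5)
agg = mean_aggregate(feats, edges)
print(edges)  # [(0, 1)]
print(agg)
```

The dense pairwise-distance matrix costs O(N²) memory; for full protein surfaces a spatial index such as SciPy's `cKDTree.query_pairs` returns the same set of pairs more economically.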

CGMega: explainable graph neural network framework with attention mechanisms for cancer gene module dissection

by Chen, Hebing , Bo, Xiaochen , Ren, Chao in 38/39 , 38/91 , 631/114/1305

2024

Cancer is rarely the straightforward consequence of an abnormality in a single gene; rather, it reflects a complex interplay of many genes, represented as gene modules. Here, we leverage recent advances in model-agnostic interpretation approaches and develop CGMega, an explainable, graph attention-based deep learning framework for cancer gene module dissection. CGMega outperforms current approaches in cancer gene prediction and provides a promising way to integrate multi-omics information. We apply CGMega to a breast cancer cell line and to acute myeloid leukemia (AML) patients, and we uncover a high-order gene module formed by the ErbB family and the tumor factors NRG1, PPM1A, and DLG2. We identify 396 candidate AML genes and observe that either known AML genes or candidate AML genes are enriched in a single gene module. We also identify patient-specific AML genes and associated gene modules. Together, these results indicate that CGMega can be used to dissect cancer gene modules and provide high-order mechanistic insights into cancer development and heterogeneity. Gene modules are widespread and important for studying cancer. Here, the authors propose an explainable deep learning-based framework, CGMega, which incorporates multi-omics information from the three-dimensional genome, epigenome, and protein–protein interactions to dissect cancer gene modules.

Journal Article
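CGMega is described as graph attention-based. For orientation, the standard single-head graph attention (GAT) update projects node features with a shared matrix W, scores each edge as e_ij = LeakyReLU(aᵀ[W h_i ‖ W h_j]), normalizes the scores with a softmax over each node's neighborhood, and returns the attention-weighted sum of projected neighbor features. The NumPy sketch below implements only that generic mechanism; CGMega's layer count, number of heads, input features, and explanation method are not given in the abstract, and every name and value here is hypothetical.

```python
import numpy as np


def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)


def gat_layer(H, adj, W, a):
    """Single-head graph attention layer (generic GAT form).

    H: (N, F) node features      adj: (N, N) 0/1 adjacency matrix
    W: (F, F') shared projection  a: (2F',) attention vector
    Returns updated features (N, F') and attention weights (N, N).
    """
    Z = H @ W                                # project every node
    f_out = Z.shape[1]
    # a^T [z_i || z_j] splits into a source term for z_i and a target term for z_j.
    src = Z @ a[:f_out]
    dst = Z @ a[f_out:]
    e = leaky_relu(src[:, None] + dst[None, :])
    mask = (adj + np.eye(len(H))) > 0        # attend to neighbors and to self
    e = np.where(mask, e, -np.inf)           # non-neighbors get zero weight
    e = e - e.max(axis=1, keepdims=True)     # numerically stable softmax
    alpha = np.exp(e)
    alpha /= alpha.sum(axis=1, keepdims=True)
    return alpha @ Z, alpha


# Toy path graph 0-1-2-3 with random features and parameters.
rng = np.random.default_rng(0)
H = rng.normal(size=(4, 3))
adj = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])
W = rng.normal(size=(3, 2))
a = rng.normal(size=4)
H_new, alpha = gat_layer(H, adj, W, a)
print(H_new.shape)  # (4, 2)
```

The attention matrix alpha is what attention-based models expose for inspection: each row shows how strongly a node draws on each of its neighbors.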

Decitabine priming increases anti–PD-1 antitumor efficacy by promoting CD8+ progenitor exhausted T cell expansion in tumor models

by Dong, Liang , Nie, Jing , Chen, Hebing in 5-aza-2'-deoxycytidine , Activator protein 1 , Antitumor activity

2023

CD8+ exhausted T cells (Tex) are heterogeneous. PD-1 inhibitors reinvigorate progenitor Tex, which subsequently differentiate into unresponsive terminal Tex. Maintaining the capacity of progenitor Tex for durable proliferation is important, but the underlying mechanism remains unclear. Here, we showed that CD8+ progenitor Tex pretreated with low-dose decitabine, a DNA demethylating agent, had enhanced proliferation and effector function against tumors after anti-PD-1 treatment in vitro. Treatment with decitabine plus anti-PD-1 promoted the activation and expansion of tumor-infiltrating CD8+ progenitor Tex and efficiently suppressed tumor growth in multiple tumor models. Transcriptional and epigenetic profiling of tumor-infiltrating T cells demonstrated that, compared with anti-PD-1 monotherapy, the combination of decitabine plus anti-PD-1 markedly elevated the clonal expansion and cytolytic activity of progenitor Tex and restrained CD8+ T cell terminal differentiation. Strikingly, decitabine plus anti-PD-1 sustained the expression and activity of the AP-1 transcription factor JunD, which was reduced following PD-1 blockade therapy. Downregulation of JunD repressed T cell proliferation, and activation of JNK/AP-1 signaling in CD8+ T cells enhanced the antitumor capacity of PD-1 inhibitors. Together, these findings show that epigenetic agents remodel CD8+ progenitor Tex populations and improve responsiveness to anti-PD-1 therapy.

Journal Article

High-resolution annotation of the mouse preimplantation embryo transcriptome using long-read sequencing

by Huang, Xingxu , Bo, Xiaochen , Ren, Chao in 38/77 , 49/91 , 631/1647/514/2254

2020

The transcriptome of the preimplantation mouse embryo has previously been annotated by short-read sequencing, with limited coverage and accuracy. Here we perform long-read sequencing on low-cell-number transcriptomes prepared with the Smart-seq2 method. Our analysis reveals additional novel transcripts and the complexity of the preimplantation transcriptome, identifying 2280 potential novel transcripts from previously unannotated loci and 6289 novel splicing isoforms from previously annotated genes. Notably, the transcription start sites of these novel transcripts and isoforms are enriched for the active promoter modification H3K4me3. Moreover, by combining long-read and short-read data, we generate a more complete and precise transcriptome of early embryogenesis. Using this approach, we identify a previously undescribed isoform of Kdm4dl with a modified mRNA reading frame and a novel noncoding gene designated XLOC_004958. Depletion of Kdm4dl or XLOC_004958 led to abnormal blastocyst development. Thus, our data provide a high-resolution, more precise transcriptome of preimplantation mouse embryogenesis. Until now, the transcriptome of preimplantation mouse embryos has only been analysed by short-read sequencing. Here, the authors perform long-read sequencing to provide a more detailed transcriptome of the preimplantation mouse embryo, identifying various novel transcripts, for example Kdm4dl.

Journal Article

Comprehensive Identification and Annotation of Cell Type-Specific and Ubiquitous CTCF-Binding Sites in the Human Genome

by Chen, Hebing , Wang, Shengqi , Bo, Xiaochen in Annotations , Binding proteins , Binding sites

2012

Chromatin insulators are DNA elements that regulate gene expression levels either by preventing gene silencing through the maintenance of heterochromatin boundaries or by preventing gene activation by blocking interactions between enhancers and promoters. CCCTC-binding factor (CTCF), a ubiquitously expressed 11-zinc-finger DNA-binding protein, is the only protein implicated in the establishment of insulators in vertebrates. Although CTCF has been implicated in diverse regulatory functions, its binding across the human genome has been studied in only a limited number of cell types. It is therefore unclear whether the identified cell type-specific differences in CTCF-binding sites are functionally significant. Here, we identify and characterize cell type-specific and ubiquitous CTCF-binding sites in the human genome across 38 cell types designated by the Encyclopedia of DNA Elements (ENCODE) consortium. These cell type-specific and ubiquitous CTCF-binding sites show uniquely versatile transcriptional functions and characteristic chromatin features. In addition, we confirm the insulator barrier function of CTCF binding and explore a novel function of CTCF in DNA replication. These results represent a critical step toward a comprehensive and systematic understanding of CTCF-dependent insulators and their versatile roles in the human genome.

Journal Article
