Catalogue Search | MBRL

MVP predicts the pathogenicity of missense variants by deep learning

by Long, John J. , Qi, Hongjian , Chen, Chen in 45/23 , 631/114/1305 , 631/114/2184

2021

Accurate pathogenicity prediction of missense variants is critically important in genetic studies and clinical diagnosis. Previously published prediction methods have facilitated the interpretation of missense variants but have limited performance. Here, we describe MVP (Missense Variant Pathogenicity prediction), a new prediction method that uses a deep residual network to leverage large training data sets and many correlated predictors. We train the model separately in genes that are intolerant of loss-of-function variants and in those that are tolerant, to account for potentially different genetic effect sizes and modes of action. We compile cancer mutation hotspots and de novo variants from developmental disorders for benchmarking. Overall, MVP achieves better performance in prioritizing pathogenic missense variants than previous methods, especially in genes tolerant of loss-of-function variants. Finally, using MVP, we estimate that de novo coding variants contribute to 7.8% of isolated congenital heart disease, nearly doubling previous estimates. Accurate prediction of variant pathogenicity is essential to understanding genetic risks in disease. Here, the authors present a deep neural network method for prediction of missense variant pathogenicity, MVP, and demonstrate its utility in prioritizing de novo variants contributing to developmental disorders.

Journal Article

Template-based prediction of protein structure with deep learning

by Zhang, Haicang , Shen, Yufeng in Accuracy , Algorithms , Alignment

2020

Background: Accurate prediction of protein structure is fundamentally important for understanding the biological function of proteins. Template-based modeling, including protein threading and homology modeling, is a popular method for protein tertiary structure prediction. However, accurate template-query alignment and template selection remain very challenging, especially for proteins with only distant homologs available.
Results: We propose a new template-based modelling method called ThreaderAI to improve protein tertiary structure prediction. ThreaderAI formulates the task of aligning a query sequence with a template as the classical pixel classification problem in computer vision and naturally applies a deep residual neural network to the prediction. ThreaderAI first employs deep learning to predict a residue-residue alignment probability matrix by integrating sequence profile, predicted sequential structural features, and predicted residue-residue contacts, and then builds the template-query alignment by applying a dynamic programming algorithm to the probability matrix. We evaluated our method both in generating accurate template-query alignments and in protein threading. Experimental results show that ThreaderAI outperforms the currently popular template-based modelling methods HHpred and CNFpred and the latest contact-assisted method CEthreader, especially on proteins that do not have close homologs with known structures. In particular, in terms of alignment accuracy measured with TM-score, ThreaderAI outperforms HHpred, CNFpred, and CEthreader by 56, 13, and 11%, respectively, on template-query pairs at the fold level of similarity from SCOPe data. On CASP13’s TBM-hard data, ThreaderAI outperforms HHpred, CNFpred, and CEthreader by 16, 9, and 8% in terms of TM-score, respectively.
Conclusions: These results demonstrate that with the help of deep learning, ThreaderAI can significantly improve the accuracy of template-based structure prediction, especially for distant-homology proteins.
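
The abstract's second step, building an alignment from a residue-residue probability matrix by dynamic programming, can be illustrated with a minimal sketch. This is not ThreaderAI's actual algorithm or scoring scheme: the function name, the score `prob - offset`, and the linear `gap_penalty` are assumptions made here for illustration, and the matrix is supplied by hand rather than predicted by a network.

```python
import numpy as np


def align_from_probabilities(prob, gap_penalty=0.0, offset=0.5):
    """Global alignment maximising the summed (prob - offset) over aligned pairs.

    prob[i, j] is an (assumed) probability that query residue i aligns with
    template residue j. Both `offset` and `gap_penalty` are illustrative
    choices, not values taken from the paper.
    """
    n, m = prob.shape
    score = np.zeros((n + 1, m + 1))
    # Back-pointers: 0 = diagonal (align i with j), 1 = up (gap in template),
    # 2 = left (gap in query).
    move = np.zeros((n + 1, m + 1), dtype=int)
    for i in range(1, n + 1):
        score[i, 0] = score[i - 1, 0] - gap_penalty
        move[i, 0] = 1
    for j in range(1, m + 1):
        score[0, j] = score[0, j - 1] - gap_penalty
        move[0, j] = 2
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            options = (
                score[i - 1, j - 1] + prob[i - 1, j - 1] - offset,
                score[i - 1, j] - gap_penalty,
                score[i, j - 1] - gap_penalty,
            )
            move[i, j] = int(np.argmax(options))
            score[i, j] = options[move[i, j]]
    # Trace back from the bottom-right corner to recover the aligned pairs.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if move[i, j] == 0:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif move[i, j] == 1:
            i -= 1
        else:
            j -= 1
    return pairs[::-1], float(score[n, m])


# Toy 2 x 3 matrix (made-up values): query of length 2, template of length 3.
toy = np.array([[0.9, 0.1, 0.1],
                [0.1, 0.2, 0.8]])
pairs, total = align_from_probabilities(toy)
print(pairs, total)
```

Subtracting an offset makes low-probability pairs cost more than a gap, so the path skips them instead of forcing every residue into a match.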

Journal Article

Microblog-HAN: A micro-blog rumor detection model based on heterogeneous graph attention network

by Wang, Yaojun , Bi, Bei , Gao, Yang in Biology and Life Sciences , Communication , Communications networks

2022

Although social media has greatly facilitated people’s daily communication and the dissemination of information, it has unfortunately become an ideal hotbed for the breeding and spread of Internet rumors. Therefore, automatically monitoring rumor dissemination at an early stage is of great practical significance. However, existing detection methods fail to take full advantage of the semantics of the microblog information propagation graph. To address this shortcoming, this study models the information transmission network of a microblog as a heterogeneous graph with a variety of semantic information and then constructs Microblog-HAN, a graph-based rumor detection model, to capture and aggregate the semantic information using attention layers. Specifically, after the initial textual and visual features of posts are extracted, the node-level attention mechanism combines neighbors of the microblog nodes to generate three groups of node embeddings with specific semantics. Semantic-level attention then fuses the different semantics to obtain the final node embedding of the microblog, which is used as the classifier’s input. Finally, the classifier decides whether the microblog is a rumor. Experimental results on two real-world microblog rumor datasets, Weibo2016 and Weibo2021, show that the proposed Microblog-HAN detects microblog rumors with an accuracy of over 92%, demonstrating its superiority over most existing methods in identifying rumors from the view of the whole information transmission graph.
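
The semantic-level fusion step described above can be sketched as follows. The abstract does not give the equations, so this follows the general formulation commonly used in heterogeneous graph attention networks (score each semantic group, softmax over groups, take the weighted sum). The function name, the parameters `W`, `b`, `q`, and the random untrained weights are all assumptions for illustration, not the paper's implementation.

```python
import numpy as np


def semantic_attention(embeddings, W, b, q):
    """Fuse P groups of node embeddings into one embedding per node.

    embeddings: array of shape (P, N, d), one (N, d) matrix per semantic group.
    W (d, h), b (h,), q (h,): hypothetical attention parameters.
    Returns the fused (N, d) embeddings and the P attention weights.
    """
    # Per-node score for each semantic group, shape (P, N).
    scores = np.tanh(embeddings @ W + b) @ q
    # Average over nodes to get one importance value per group, shape (P,).
    importance = scores.mean(axis=1)
    # Numerically stable softmax over the P groups.
    beta = np.exp(importance - importance.max())
    beta /= beta.sum()
    # Weighted sum of the group embeddings, shape (N, d).
    fused = np.tensordot(beta, embeddings, axes=1)
    return fused, beta


# Three semantic groups, five microblog nodes, four-dimensional embeddings
# (all values random, purely to show the shapes involved).
rng = np.random.default_rng(0)
Z = rng.normal(size=(3, 5, 4))
W = rng.normal(size=(4, 8))
b = np.zeros(8)
q = rng.normal(size=8)
fused, beta = semantic_attention(Z, W, b, q)
print(fused.shape, beta)
```

In a trained model the fused embedding would be passed to the classifier, and `W`, `b`, and `q` would be learned jointly with it.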

Journal Article

FALCON2: a web server for high-quality prediction of protein tertiary structures

by Sun, Shiwei , Bu, Dongbo , Kong, Lupeng in Ab initio prediction , Algorithms , Amino Acid Sequence

2021

Background: Accurate prediction of protein tertiary structures is highly desired, as knowledge of protein structures provides invaluable insights into protein functions. We have designed two approaches to protein structure prediction: a template-based modeling approach (called ProALIGN) and an ab initio prediction approach (called ProFOLD). Briefly, ProALIGN aligns a target protein with templates by exploiting the patterns of context-specific alignment motifs and then builds the final structure with reference to the homologous templates. In contrast, ProFOLD uses an end-to-end neural network to estimate inter-residue distances of target proteins and builds structures that satisfy these distance constraints. These two approaches emphasize different characteristics of target proteins: ProALIGN exploits structure information of homologous templates of target proteins, while ProFOLD exploits the co-evolutionary information carried by homologous protein sequences. Recent progress has shown that the combination of template-based modeling and ab initio approaches is promising.
Results: In this study, we present FALCON2, a web server that integrates ProALIGN and ProFOLD to provide a high-quality protein structure prediction service. For a target protein, FALCON2 executes ProALIGN and ProFOLD simultaneously to predict possible structures and selects the most likely one as the final prediction result. We evaluated FALCON2 on widely used benchmarks, including 104 CASP13 (the 13th Critical Assessment of protein Structure Prediction) targets and 91 CASP14 targets. In-depth examination suggests that when high-quality templates are available, ProALIGN is superior to ProFOLD, and in other cases ProFOLD shows better performance. By integrating these two approaches with different emphases, the FALCON2 server outperforms the two individual approaches and also achieves state-of-the-art performance compared with existing approaches.
Conclusions: By integrating template-based modeling and ab initio approaches, FALCON2 provides an easy-to-use and high-quality protein structure prediction service for the community, and we expect it to enable insights into a deep understanding of protein functions.

Journal Article

GPCR-BSD: a database of binding sites of human G-protein coupled receptors under diverse states

by Zhou, Han , Bu, Dongbo , Zhou, Liangliang in Algorithms , Analysis , Binding site

2024

G-protein coupled receptors (GPCRs), the largest family of membrane proteins in the human body, are involved in a great variety of biological processes and have thus become highly valuable drug targets. By binding with ligands (e.g., drugs), GPCRs switch between active and inactive conformational states, thereby performing functions such as signal transmission. The changes in binding pockets under different states are important for a better understanding of drug-target interactions. Therefore it is critical, as well as a practical need, to obtain binding sites in human GPCR structures. We report a database (called GPCR-BSD) that collects 127,990 predicted binding sites of 803 GPCRs under active and inactive states (thus 1,606 structures in total). The binding sites were identified from the predicted GPCR structures by executing three geometry-based pocket prediction methods: fpocket, CavityPlus, and GHECOM. The server provides query, visualization, and comparison of the predicted binding sites for both predicted GPCR structures and experimentally determined structures recorded in the PDB. We evaluated the identified pockets of 132 experimentally determined human GPCR structures in terms of pocket residue coverage, pocket center distance, and redocking accuracy. The evaluation showed that the fpocket and CavityPlus methods performed better and successfully predicted orthosteric binding sites in over 60% of the 132 experimentally determined structures. The GPCR Binding Site database is freely accessible at http://121.41.176.252:8021 . This study not only provides a systematic evaluation of the commonly used fpocket and CavityPlus methods for the first time but also meets the need for binding site information in GPCR studies.
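
Two of the evaluation criteria named above, pocket residue coverage and pocket center distance, can be sketched as below. The abstract does not define them, so the definitions here are assumptions: coverage is taken as the fraction of reference pocket residues recovered by the predicted pocket, and center distance as the Euclidean distance between the centroids of the two coordinate sets. The function names and toy inputs are hypothetical.

```python
import numpy as np


def residue_coverage(predicted_residues, reference_residues):
    """Fraction of reference pocket residues present in the predicted pocket.

    This definition is an assumption; the database paper may normalise
    differently.
    """
    reference = set(reference_residues)
    if not reference:
        raise ValueError("reference pocket must contain at least one residue")
    return len(set(predicted_residues) & reference) / len(reference)


def center_distance(predicted_coords, reference_coords):
    """Euclidean distance between the centroids of two sets of 3D points."""
    predicted_center = np.mean(predicted_coords, axis=0)
    reference_center = np.mean(reference_coords, axis=0)
    return float(np.linalg.norm(predicted_center - reference_center))


# Toy example with made-up residue numbers and coordinates (in angstroms).
print(residue_coverage([1, 2, 3, 7], [2, 3, 4, 5]))
print(center_distance([[0, 0, 0], [2, 0, 0]], [[1, 0, 4]]))
```

A prediction would typically count as a success when coverage is high and the center distance falls below a chosen cutoff; the abstract does not state the cutoffs used.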

Journal Article

Molten Pool Behaviors in Double-Sided Pulsed GMAW of T-Joint: A Numerical Study

by Wang, Chunsheng , Zhang, Haicang , Lin, Sanbao in Aluminum , Behavior , Computer aided engineering

2021

The T-joint is one of the essential types of joints in aluminum welded structures. Double-sided welding is a preferable solution to maintain high efficiency and avoid significant distortion during T-joint welding. However, interactions between double-sided molten pools make flow behaviors complicated during welding. Numerical simulations regarding molten pool behaviors were conducted in this research to understand the complex flow phenomenon. The influences of wire feed rates and torch distances were simulated and discussed. The results show that droplet impinging drives the fluid to flow down to the root and form a frontward vortex. Marangoni stress forces the fluid to form an outward vortex near the molten pool boundary and flatten the concave-shaped molten pool surface. With an increased wire feed speed, the volume of the molten pool increases, and the root fusion is improved. With an increased torch distance, the width of the front molten pool decreases while the length increases, and the rear molten pool size decreases slightly. Both wire feed speeds and the torch distances have limited influences on the basic flow characteristics.

Journal Article

Predicting protein inter-residue contacts using composite likelihood maximization and deep learning

by Sun, Shiwei , Zhang, Qi , Bu, Dongbo in Algorithms , Benchmarking , Bioinformatics

2019

Background: Accurate prediction of the inter-residue contacts of a protein is important for calculating its tertiary structure. Analysis of co-evolutionary events among residues has proven effective in inferring inter-residue contacts. The Markov random field (MRF) technique, although widely used for contact prediction, suffers from the following dilemma: the actual likelihood function of an MRF is accurate but time-consuming to calculate; in contrast, approximations to the actual likelihood, such as pseudo-likelihood, are efficient to calculate but inaccurate. Thus, how to achieve both accuracy and efficiency simultaneously remains a challenge.
Results: In this study, we present such an approach (called clmDCA) for contact prediction. Unlike plmDCA, which uses pseudo-likelihood, i.e., the product of the conditional probabilities of individual residues, our approach uses composite likelihood, i.e., the product of the conditional probabilities of all residue pairs. Composite likelihood has been theoretically proven to be a better approximation to the actual likelihood function than pseudo-likelihood. Meanwhile, composite likelihood is still efficient to maximize, thus ensuring the efficiency of clmDCA. We present comprehensive experiments on popular benchmark datasets, including the PSICOV dataset and the CASP-11 dataset, to show that: (i) clmDCA alone outperforms the existing MRF-based approaches in prediction accuracy; (ii) when equipped with a deep learning technique for refinement, the prediction accuracy of clmDCA was further significantly improved, suggesting the suitability of clmDCA for a subsequent refinement procedure. We further present a successful application of the predicted contacts to accurately build tertiary structures for proteins in the PSICOV dataset.
Conclusions: The composite likelihood maximization algorithm can efficiently estimate the parameters of Markov random fields and can improve the prediction accuracy of protein inter-residue contacts.
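
The contrast between the two approximations can be written out as follows. The notation is an assumption introduced here, not taken from the paper: $x = (x_1, \dots, x_L)$ is one aligned sequence of length $L$, $\theta$ denotes the MRF parameters, and the product over the sequences of the alignment is omitted for brevity.

```latex
\[
\mathrm{PL}(\theta) \;=\; \prod_{i=1}^{L} P\!\left(x_i \,\middle|\, x_{\setminus i};\, \theta\right),
\qquad
\mathrm{CL}(\theta) \;=\; \prod_{1 \le i < j \le L} P\!\left(x_i, x_j \,\middle|\, x_{\setminus \{i,j\}};\, \theta\right).
\]
```

Each pseudo-likelihood factor conditions one residue on all the others, whereas each composite-likelihood factor models a pair of residues jointly given the rest.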

Journal Article

Study on Medium-Thick Al-Alloy T-Joints by Dual P-GMAW Bilateral Synchronous Welding

by Wang, Chunsheng , Chen, Shujun , Zhao, Yun in adaptive deposition , Al-alloy , Alloys

2021

The T-joints of medium-thick 6082 Al-alloy plates created by dual pulsed gas metal arc welding (P-GMAW) and bilateral synchronous welding were investigated to improve weld quality using the adaptive deposition method, which calculates the minimum amount of deposition according to the welding condition, groove size, and cross-sectional area, effectively reducing the heat input and deformation of the welds while still filling the weld. The optimized linear energy with a wire feed speed (WFS) of 9.5 m/min can ensure a well-formed weld with complete root fusion, and high-quality T-joint welds were obtained with root openings of both 0 mm and 1 mm. The maximum penetration was 4 mm, which was four times more than that of a single-torch welding process. When the distance between the two welding torches exceeded 20 mm, the molten pools were completely separated, and process pores were observed in the unfused root zone. Influenced by the thermal cycles in asymmetric welding, the hardness distribution changed: the softer zone at the base plate on the fore-arc side was narrower than that on the rear-arc side. Furthermore, dual P-GMAW bilateral synchronous welding with an asymmetric heat source can further reduce the deformation of the welded joint by about 20% compared to that of symmetric welding.

Journal Article

Constructing effective energy functions for protein structure prediction through broadening attraction-basin and reverse Monte Carlo sampling

by Sun, Shiwei , Bu, Dongbo , Wang, Chao in Algorithms , Analysis , Attraction

2019

Background: The ab initio approaches to protein structure prediction usually employ the Monte Carlo technique to search for the structural conformation that has the lowest energy. However, the widely used energy functions are usually ineffective for conformation search. How to construct an effective energy function remains a challenging task.
Results: Here, we present a framework to construct effective energy functions for protein structure prediction. Unlike existing energy functions, which only require the native structure to have the lowest energy, we attempt to maximize the attraction-basin in which the native structure lies in the energy landscape. The underlying rationale is that each energy function determines a specific energy landscape together with a native attraction-basin, and the larger the attraction-basin is, the more likely the Monte Carlo search procedure is to find the native structure. Following this rationale, we constructed effective energy functions as follows: (i) To explore the native attraction-basin determined by a certain energy function, we performed reverse Monte Carlo sampling starting from the native structure, identifying the structural conformations on the edge of the attraction-basin. (ii) To broaden the native attraction-basin, we smoothed the edge points of the attraction-basin by tuning the weights of the energy terms, thus acquiring an improved energy function. Our framework alternates the broadening attraction-basin and reverse sampling steps (thus called BARS) until the native attraction-basin is sufficiently large. We present extensive experimental results to show that, using the BARS framework, the constructed energy functions could greatly facilitate protein structure prediction by improving the quality of predicted structures and speeding up conformation search.
Conclusion: Using the BARS framework, we constructed effective energy functions for protein structure prediction, which could improve the quality of predicted structures and speed up conformation search as well.

Journal Article

Correction to: Predicting protein inter-residue contacts using composite likelihood maximization and deep learning

by Sun, Shiwei , Zhang, Qi , Bu, Dongbo in Algorithms , Bioinformatics , Biomedical and Life Sciences

2019

Red (green) dots indicate correct (incorrect) prediction, while grey dots indicate all true residue-residue contacts. (a) The comparison between clmDCA (in upper-left triangle) and plmDCA (in lower-right triangle). (b) The comparison between clmDCA (in upper-left triangle) and clmDCA after refining using deep residual network (in lower-right triangle).
Fig. 3: The relationship between the prediction accuracy and quality of MSA. Sequence separation: > 6 AA.
Fig. 4: Native structure and predicted structures for protein structure with PDB ID: 1vmbA. (a) Native structure. (b) Structure built using contacts predicted by plmDCA (TMscore: 0.42). (c) Structure built using contacts predicted by clmDCA alone (TMscore: 0.55). (d) Structure built using contacts predicted by clmDCA together with deep learning for refinement (TMscore: 0.72).
Fig. 5: Procedure of clmDCA to predict inter-residue contacts. (a) For a query protein (1wlg_A as an example), we identified its homologues by running HHblits [59] against nr90 sequence database (parameter setting: j: 3, id: 90, cov: 70) and constructed multiple sequence alignment of these proteins. (b) The correlation among residues in MSA was disentangled using composite likelihood maximization technique, generating prediction of inter-residue contacts. (c) The predicted contacts were fed into a deep neural network for refinement. (d) The refined prediction of inter-residue contacts.
Rights and permissions: Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Journal Article
