Catalogue Search | MBRL
318 result(s) for "Position-Specific Scoring Matrices"
Reliable scaling of position weight matrices for binding strength comparisons between transcription factors
2015
Background
Scoring DNA sequences against Position Weight Matrices (PWMs) is a widely adopted method to identify putative transcription factor binding sites. While common bioinformatics tools produce scores that can reflect the binding strength between a specific transcription factor and the DNA, these scores are not directly comparable between different transcription factors. Other methods, including p-value associated approaches (Touzet H, Varré J-S. Efficient and accurate p-value computation for position weight matrices. Algorithms Mol Biol. 2007;2(1510.1186):1748–7188), provide more rigorous ways to identify potential binding sites, but their results are difficult to interpret in terms of binding energy, which is essential for the modeling of transcription factor binding dynamics and enhancer activities.
Results
Here, we provide two different ways to find the scaling parameter λ that allows us to infer binding energy from a PWM score. The first approach uses a PWM and background genomic sequence as input to estimate λ for a specific transcription factor, which we applied to show that λ distributions for different transcription factor families correspond with their DNA binding properties. Our second method can reliably convert λ between different PWMs of the same transcription factor, which allows us to directly compare PWMs that were generated by different approaches.
Conclusion
These two approaches provide computationally efficient ways to scale PWM scores and estimate the strength of transcription factor binding sites in quantitative studies of binding dynamics. Their results are consistent with each other and with previous reports in most cases.
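The PWM scoring that this abstract builds on can be sketched as follows. This is a minimal illustration, not the authors' method: the function names, the uniform background, and the pseudocount are assumptions, and the relation `energy = -score / lam` is an assumed convention chosen only to show where a scaling parameter λ would enter, since the abstract does not give the formula.

```python
import math

BASES = "ACGT"


def log_odds_pwm(counts, background=None, pseudocount=1.0):
    """Turn per-position base counts into a log-odds PWM (natural log).

    `counts` is a list of dicts, one per motif position, mapping each base
    to its observed count. A uniform background is assumed if none is given.
    """
    background = background or {base: 0.25 for base in BASES}
    pwm = []
    for column in counts:
        total = sum(column[base] for base in BASES) + pseudocount * len(BASES)
        pwm.append({
            base: math.log((column[base] + pseudocount) / total / background[base])
            for base in BASES
        })
    return pwm


def score_site(pwm, site):
    """Sum the log-odds entries selected by each base of a candidate site."""
    if len(site) != len(pwm):
        raise ValueError("site length must equal PWM width")
    return sum(column[base] for column, base in zip(pwm, site))


def best_site(pwm, sequence):
    """Slide the PWM along `sequence`; return (best score, start index)."""
    width = len(pwm)
    return max(
        (score_site(pwm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )


def score_to_energy(score, lam):
    """Hypothetical scaling: assumes energy = -score / lam (illustration only)."""
    return -score / lam


if __name__ == "__main__":
    counts = [
        {"A": 8, "C": 0, "G": 0, "T": 0},
        {"A": 0, "C": 8, "G": 0, "T": 0},
        {"A": 0, "C": 0, "G": 8, "T": 0},
    ]
    pwm = log_odds_pwm(counts)
    score, start = best_site(pwm, "TTACGTT")
    print(start, round(score, 3), round(score_to_energy(score, lam=0.5), 3))
```

Because raw log-odds scores depend on each matrix's width and information content, a per-factor scaling step of this kind is what would make scores from different PWMs comparable.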
Journal Article
Target-DBPPred: An intelligent model for prediction of DNA-binding proteins using discrete wavelet transform based compression and light eXtreme gradient boosting
2022
DNA-protein interaction is a critical biological process that performs influential activities, including DNA transcription and recombination. DBPs (DNA-binding proteins) are closely associated with different kinds of human diseases (asthma, cancer, and AIDS), while some of the DBPs are used in the production of antibiotics, steroids, and anti-inflammatories. Several methods have been reported for the prediction of DBPs. However, a more intelligent method is still highly desirable for the accurate prediction of DBPs. This study presents an intelligent computational method, Target-DBPPred, to improve DBPs prediction. Important features from primary protein sequences are investigated via a novel feature descriptor, called EDF-PSSM-DWT (Evolutionary difference formula position-specific scoring matrix-discrete wavelet transform) and several other multi-evolutionary methods, including F-PSSM (Filtered position-specific scoring matrix), EDF-PSSM (Evolutionary difference formula position-specific scoring matrix), PSSM-DPC (Position-specific scoring matrix-dipeptide composition), and Lead-BiPSSM (Lead-bigram-position specific scoring matrix) to encapsulate diverse multivariate features. The best feature set from the features of each descriptor is selected using sequential forward selection (SFS). Further, four models are trained using Adaboost, XGB (eXtreme gradient boosting), ERT (extremely randomized trees), and LiXGB (Light eXtreme gradient boosting) classifiers. LiXGB, with the best feature set of EDF-PSSM-DWT, has attained 6.69% and 15.07% higher performance in terms of accuracies using training and testing datasets, respectively. The obtained results verify the improved performance of our proposed predictor over the existing predictors.
• Designed a novel predictor named Target-DBPPred for prediction of DNA-binding proteins.
• The features are explored by EDF-PSSM-DWT, F-PSSM, PSSM-DPC, and Lead-BiPSSM.
• The classification is performed by LiXGB, XGB, and ERT.
• Target-DBPPred has secured the highest prediction results.
Journal Article
PCVMZM: Using the Probabilistic Classification Vector Machines Model Combined with a Zernike Moments Descriptor to Predict Protein–Protein Interactions from Protein Sequences
by Li, Xiao; Jiang, Tonghai; Zhang, Jingting
in Bacterial Proteins - classification, Bacterial Proteins - metabolism, Computational Biology - methods
2017
Protein–protein interactions (PPIs) are essential to the biological processes of most living organisms. Thus, detecting PPIs is extremely important for understanding the molecular mechanisms of biological systems. Although a large amount of PPI data has been generated by high-throughput technologies for a variety of organisms, the whole interactome is still far from complete. In addition, the high-throughput technologies for detecting PPIs have some unavoidable defects, including time consumption, high cost, and high error rate. In recent years, with the development of machine learning, computational methods have been broadly used to predict PPIs and can achieve good prediction rates. In this paper, we present PCVMZM, a computational method based on a Probabilistic Classification Vector Machines (PCVM) model and a Zernike moments (ZM) descriptor for predicting PPIs from protein amino acid sequences. Specifically, the ZM descriptor is used to extract protein evolutionary information from the Position-Specific Scoring Matrix (PSSM) generated by the Position-Specific Iterated Basic Local Alignment Search Tool (PSI-BLAST). Then, the PCVM classifier is used to infer the interactions among proteins. When applied to PPI datasets of Yeast and H. pylori, the proposed method achieves average prediction accuracies of 94.48% and 91.25%, respectively. To further evaluate the performance of the proposed method, the state-of-the-art support vector machines (SVM) classifier is used and compared with the PCVM model. Experimental results on the Yeast dataset show that the PCVM classifier performs better than the SVM classifier. The experimental results indicate that our proposed method is robust, powerful and feasible, and can be used as a helpful tool for proteomics research.
Journal Article
A Treatise to Computational Approaches Towards Prediction of Membrane Protein and Its Subtypes
by Butt, Ahmad Hassan; Khan, Yaser Daanial; Rasool, Nouman
in Algorithms, Amino acids, Amino Acids - chemistry
2017
Membrane proteins are vital mediating molecules responsible for the interaction of a cell with its surroundings. These proteins are involved in functions such as ferrying molecules and nutrients across the membrane, recognizing foreign bodies, and receiving outside signals and relaying them into the cell. Membrane proteins play a significant role in drug interaction, as nearly 50% of drug targets are membrane proteins. Given the momentous role of membrane proteins in cell activity, computational models that can predict membrane proteins accurately are of great importance. The conventional experimental methods used for annotating membrane proteins are time-consuming, costly, and in some cases impossible. Computationally intelligent techniques have emerged as a useful resource for automating prediction and hence the annotation process. In this study, various techniques based on different computational intelligence models for prediction are reviewed. These techniques were formulated by different researchers and were further evaluated to provide a comparative analysis. The analysis shows that support vector machine-based prediction techniques yield better results.
Journal Article
PLM-ATG: Identification of Autophagy Proteins by Integrating Protein Language Model Embeddings with PSSM-Based Features
2025
Autophagy critically regulates cellular development while maintaining pathophysiological homeostasis. Since the autophagic process is tightly regulated by the coordination of autophagy-related proteins (ATGs), precise identification of these proteins is essential. Although current computational approaches have addressed experimental recognition’s costly and time-consuming challenges, they still have room for improvement since handcrafted features inadequately capture the intricate patterns and relationships hidden in sequences. In this study, we propose PLM-ATG, a novel computational model that integrates support vector machines with the fusion of protein language model (PLM) embeddings and position-specific scoring matrix (PSSM)-based features for the ATG identification. First, we extracted sequence-based features and PSSM-based features as the inputs of six classifiers to establish baseline models. Among these, the combination of the SVM classifier and the AADP-PSSM feature set achieved the best prediction accuracy. Second, two popular PLM embeddings, i.e., ESM-2 and ProtT5, were fused with the AADP-PSSM features to further improve the prediction of ATGs. Third, we selected the optimal feature subset from the combination of the ESM-2 embeddings and AADP-PSSM features to train the final SVM model. The proposed PLM-ATG achieved an accuracy of 99.5% and an MCC of 0.990, which are nearly 5% and 0.1 higher than those of the state-of-the-art model EnsembleDL-ATG, respectively.
Journal Article
Simple adjustment of the sequence weight algorithm remarkably enhances PSI-BLAST performance
by Tomii, Kentaro; Oda, Toshiyuki; Lim, Kyungtaek
in Algorithms, Amino Acid Sequence, Area Under Curve
2017
Background
PSI-BLAST, an extremely popular tool for sequence similarity search, features the utilization of Position-Specific Scoring Matrix (PSSM) constructed from a multiple sequence alignment (MSA). PSSM allows the detection of more distant homologs than a general amino acid substitution matrix does. An accurate estimation of the weights for sequences in an MSA is crucially important for PSSM construction. PSI-BLAST divides a given MSA into multiple blocks, for which sequence weights are calculated. When the block width becomes very narrow, the sequence weight calculation can be odd.
Results
We demonstrate that PSI-BLAST indeed generates a significant fraction of blocks having width less than 5, thereby degrading the PSI-BLAST performance. We revised the code of PSI-BLAST to prevent the blocks from being narrower than a given minimum block width (MBW). We designate the modified application of PSI-BLAST as PSI-BLASTexB. When MBW is 25, PSI-BLASTexB notably outperforms PSI-BLAST consistently for three independent benchmark sets. The performance boost is even more drastic when an MSA, instead of a sequence, is used as a query.
Conclusions
Our results demonstrate that the generation of narrow-width blocks during the sequence weight calculation is a critically important factor that restricts the PSI-BLAST search performance. By preventing narrow blocks, PSI-BLASTexB upgrades the PSI-BLAST performance remarkably. Binaries and source codes of PSI-BLASTexB (MBW = 25) are available at https://github.com/kyungtaekLIM/PSI-BLASTexB.
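To illustrate why very narrow blocks can give odd sequence weights, here is a minimal sketch of a generic position-based (Henikoff-style) weighting computed over one alignment block. It is an assumption that this simplified scheme stands in for the weighting discussed in the abstract; the function name and the toy alignment are hypothetical, and the actual PSI-BLAST and PSI-BLASTexB code is more involved.

```python
from collections import Counter


def position_based_weights(block):
    """Position-based sequence weights for one alignment block.

    `block` is a list of equal-length aligned strings. In each column a
    sequence receives 1 / (k * n), where k is the number of distinct
    residues in the column and n is how many sequences share its residue.
    Column contributions are summed and normalised to sum to 1.
    """
    weights = [0.0] * len(block)
    for col in range(len(block[0])):
        residues = [seq[col] for seq in block]
        counts = Counter(residues)
        distinct = len(counts)
        for i, residue in enumerate(residues):
            weights[i] += 1.0 / (distinct * counts[residue])
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    wide_block = ["AAC", "AAC", "AGT"]          # three columns of evidence
    narrow_block = [seq[:1] for seq in wide_block]  # the same rows, width 1
    print(position_based_weights(wide_block))
    print(position_based_weights(narrow_block))
```

In the wide block the divergent third sequence is up-weighted, whereas the width-1 block sees three identical residues and weights all sequences equally. With so few columns the weights are driven by whatever happens to fall inside the block, which is the kind of instability a minimum block width guards against.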
Journal Article
MFSPSSMpred: identifying short disorder-to-order binding regions in disordered proteins based on contextual local evolutionary conservation
by Yamana, Hayato; Fang, Chun; Tominaga, Daisuke
in Algorithms, Amino acid composition, Amino acids
2013
Background
Molecular recognition features (MoRFs) are short binding regions located in longer intrinsically disordered protein regions. Although these short regions lack a stable structure in the natural state, they readily undergo disorder-to-order transitions upon binding to their partner molecules. MoRFs play critical roles in the molecular interaction network of a cell, and are associated with many human genetic diseases. Therefore, identification of MoRFs is an important step in understanding functional aspects of these proteins and in finding applications in drug design.
Results
Here, we propose a novel method for identifying MoRFs, named MFSPSSMpred (Masked, Filtered and Smoothed Position-Specific Scoring Matrix-based Predictor). First, a masking method is used to calculate the average local conservation scores of residues within a masking-window length in the position-specific scoring matrix (PSSM). Then, the scores below the average are filtered out. Finally, a smoothing method is used to incorporate the features of flanking regions for each residue to prepare the feature sets for prediction. Our method employs no predicted results from other classifiers as input, i.e., all features used in this method are extracted from the PSSM of the sequence only. Experimental results show that, compared with other methods tested on the same datasets, our method achieves the best performance: an AUC 0.004 to 0.079 higher than other methods when tested on TEST419, and 0.045 to 0.212 higher when tested on TEST2012. In addition, when tested on an independent membrane proteins-related dataset, MFSPSSMpred significantly outperformed the existing predictor MoRFpred.
Conclusions
This study suggests that: 1) amino acid composition and physicochemical properties in the flanking regions of MoRFs are very different from those in the general non-MoRF regions; 2) MoRFs contain both highly conserved residues and highly variable residues and, on the whole, are highly locally conserved; and 3) combining contextual information with local conservation information of residues facilitates the prediction of MoRFs.
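The mask, filter, and smooth steps described in the Results can be sketched roughly as below. This is a loose illustration under stated assumptions, not the published MFSPSSMpred pipeline: taking the row maximum as the per-residue conservation score, zeroing values below the global mean as the "filter", the window sizes, and all function names are hypothetical choices.

```python
def window_mean(values, half_width):
    """Mean of `values` in a window centred on each index (clipped at the ends)."""
    n = len(values)
    means = []
    for i in range(n):
        lo = max(0, i - half_width)
        hi = min(n, i + half_width + 1)
        means.append(sum(values[lo:hi]) / (hi - lo))
    return means


def mask_filter_smooth(pssm, mask_half_width=2, smooth_half_width=1):
    """Return one smoothed conservation feature per residue.

    `pssm` is a list of rows (one per residue) of substitution scores.
    """
    # Assumption: use the highest score in each row as that residue's conservation.
    conservation = [max(row) for row in pssm]
    # Masking: average conservation inside a local window.
    masked = window_mean(conservation, mask_half_width)
    # Filtering: drop (zero out) values below the sequence-wide average.
    threshold = sum(masked) / len(masked)
    filtered = [score if score >= threshold else 0.0 for score in masked]
    # Smoothing: blend in the flanking residues around each position.
    return window_mean(filtered, smooth_half_width)


if __name__ == "__main__":
    toy_pssm = [[1, 0], [1, 0], [5, 0], [5, 0], [5, 0], [1, 0], [1, 0]]
    print(mask_filter_smooth(toy_pssm, mask_half_width=0, smooth_half_width=1))
```

The resulting per-residue values would then feed a classifier; the abstract does not name the classifier, so none is shown here.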
Journal Article
Prediction of apoptosis protein subcellular location based on position-specific scoring matrix and isometric mapping algorithm
2019
Apoptosis proteins are related to many diseases. Obtaining the subcellular localization information of apoptosis proteins is helpful to understand the mechanism of diseases and to develop new drugs. At present, the researchers mainly focus on the primary protein sequences, so there is still room for improvement in the prediction accuracy of the subcellular localization of apoptosis proteins. In this paper, a new method named ERT-ECT-PSSM-IS is proposed to predict apoptosis proteins based on the position-specific scoring matrix (PSSM). First, the local and global features of different directions are extracted by evolutionary row transformation (ERT) and cross-covariance of evolutionary column transformation (ECT) based on PSSM (ERT-ECT-PSSM). Second, an improved isometric mapping algorithm (I-SMA) is used to eliminate redundant features. Finally, we adopt a support vector machine (SVM) to classify our results, and the prediction accuracy is evaluated by jackknife cross-validation tests. The experimental results show that the proposed method not only extracts more abundant feature expression but also has better predictive performance and robustness for the subcellular localization of apoptosis proteins in ZD98, ZW225, and CL317 databases.
Journal Article
Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning
by Weirauch, Matthew T; Frey, Brendan J; Alipanahi, Babak
in 631/114, 631/114/2114, 631/114/2785
2015
The binding specificities of RNA- and DNA-binding proteins are determined from experimental data using a ‘deep learning’ approach.
Knowing the sequence specificities of DNA- and RNA-binding proteins is essential for developing models of the regulatory processes in biological systems and for identifying causal disease variants. Here we show that sequence specificities can be ascertained from experimental data with 'deep learning' techniques, which offer a scalable, flexible and unified computational approach for pattern discovery. Using a diverse array of experimental data and evaluation metrics, we find that deep learning outperforms other state-of-the-art methods, even when training on in vitro data and testing on in vivo data. We call this approach DeepBind and have built a stand-alone software tool that is fully automatic and handles millions of sequences per experiment. Specificities determined by DeepBind are readily visualized as a weighted ensemble of position weight matrices or as a 'mutation map' that indicates how variations affect binding within a specific sequence.
Journal Article
Assessing the convergent validity between the automated emotion recognition software Noldus FaceReader 7 and Facial Action Coding System Scoring
by Skiendziel, Tanja; Rösch, Andreas G.; Schultheiss, Oliver C.
in Automation, Biology and Life Sciences, Classification
2019
This study validates automated emotion and action unit (AU) coding applying FaceReader 7 to a dataset of standardized facial expressions of six basic emotions (Standardized and Motivated Facial Expressions of Emotion). Percentages of correctly and falsely classified expressions are reported. The validity of coding AUs is provided by correlations between the automated analysis and manual Facial Action Coding System (FACS) scoring for 20 AUs. On average 80% of the emotional facial expressions are correctly classified. The overall validity of coding AUs is moderate with the highest validity indicators for AUs 1, 5, 9, 17 and 27. These results are compared to the performance of FaceReader 6 in previous research, with our results yielding comparable validity coefficients. Practical implications and limitations of the automated method are discussed.
Journal Article