Catalogue Search | MBRL

Pattern classification using ensemble methods

by Rokach, Lior in Pattern recognition systems , Algorithms , Machine learning

Book

Data mining with decision trees : theory and applications

by Rokach, Lior , Maimon, Oded in Artificial Intelligence (Machine Learning, Neural Networks, Fuzzy Logic) , Computer Systems (Database Systems, Operating Systems) , Data mining

2008, 2007

This is the first comprehensive book dedicated entirely to the field of decision trees in data mining and covers all aspects of this important technique. Decision trees have become one of the most powerful and popular approaches in knowledge discovery and data mining, the science and technology of exploring large and complex bodies of data in order to discover useful patterns. The area is of great importance because it enables modeling and knowledge extraction from the abundance of data available. Both theoreticians and practitioners are continually seeking techniques to make the process more efficient, cost-effective and accurate. Decision trees, originally implemented in decision theory and statistics, are highly effective tools in other areas such as data mining, text mining, information extraction, machine learning, and pattern recognition. This book invites readers to explore the many benefits in data mining that decision trees offer:

eBook

Ensemble learning : pattern classification using ensemble methods

by Rokach, Lior, author in Pattern recognition systems , Algorithms , Machine learning

Book

Molecule generation using transformers and policy gradient reinforcement learning

by Shtar, Guy , Shapira, Bracha , Mazuz, Eyal in 631/114/1305 , 631/154/309/2144 , 631/92/630

2023

Generating novel valid molecules is often a difficult task, because the vast chemical space relies on the intuition of experienced chemists. In recent years, deep learning models have helped accelerate this process. These advanced models can also help identify suitable molecules for disease treatment. In this paper, we propose Taiga, a transformer-based architecture for the generation of molecules with desired properties. Using a two-stage approach, we first treat the problem as a language modeling task of predicting the next token, using SMILES strings. Then, we use reinforcement learning to optimize molecular properties such as QED. This approach allows our model to learn the underlying rules of chemistry and more easily optimize for molecules with desired properties. Our evaluation of Taiga, which was performed with multiple datasets and tasks, shows that Taiga is comparable to, or even outperforms, state-of-the-art baselines for molecule optimization, with improvements in the QED ranging from 2 to over 20 percent. The improvement was demonstrated both on datasets containing lead molecules and random molecules. We also show that with its two stages, Taiga is capable of generating molecules with higher biological property scores than the same model without reinforcement learning.

Journal Article

Data mining with decision trees : theory and applications

by Rokach, Lior , Maimon, Oded in Data mining , Decision trees , Machine learning

Book

Detecting drug-drug interactions using artificial neural networks and classic graph similarity measures

by Rokach, Lior , Shtar, Guy , Shapira, Bracha in Algorithms , Analysis , Artificial intelligence

2019

Drug-drug interactions are preventable causes of medical injuries and often result in doctor and emergency room visits. Computational techniques can be used to predict potential drug-drug interactions. We approach the drug-drug interaction prediction problem as a link prediction problem and present two novel methods for drug-drug interaction prediction based on artificial neural networks and factor propagation over graph nodes: adjacency matrix factorization (AMF) and adjacency matrix factorization with propagation (AMFP). We conduct a retrospective analysis by training our models on a previous release of the DrugBank database with 1,141 drugs and 45,296 drug-drug interactions and evaluate the results on a later version of DrugBank with 1,440 drugs and 248,146 drug-drug interactions. Additionally, we perform a holdout analysis using DrugBank. We report an area under the receiver operating characteristic curve score of 0.807 and 0.990 for the retrospective and holdout analyses respectively. Finally, we create an ensemble-based classifier using AMF, AMFP, and existing link prediction methods and obtain an area under the receiver operating characteristic curve of 0.814 and 0.991 for the retrospective and the holdout analyses. We demonstrate that AMF and AMFP provide state of the art results compared to existing methods and that the ensemble-based classifier improves the performance by combining various predictors. Additionally, we compare our methods with multi-source data-based predictors using cross-validation. In the multi-source data comparison, our methods outperform various ensembles created using 29 different predictors based on several data sources. These results suggest that AMF, AMFP, and the proposed ensemble-based classifier can provide important information during drug development and regarding drug prescription given only partial or noisy data. 
Additionally, the results indicate that the interaction network (known DDIs) is the most useful data source for identifying potential DDIs and that our methods take advantage of it better than the other methods investigated. The methods we present can also be used to solve other link prediction problems. Drug embeddings (compressed representations) created when training our models using the interaction network have been made public.

Journal Article

CaSTLe – Classification of single cells by transfer learning: Harnessing the power of publicly available single cell RNA sequencing experiments to annotate new experiments

by Lieberman, Yuval , Rokach, Lior , Shay, Tal in Access to Information , Animals , Annotations

2018

Single-cell RNA sequencing (scRNA-seq) is an emerging technology for profiling the gene expression of thousands of cells at single-cell resolution. Currently, the labeling of cells in an scRNA-seq dataset is performed by manually characterizing clusters of cells or by fluorescence-activated cell sorting (FACS). Both methods have inherent drawbacks: the first depends on the clustering algorithm used and the knowledge and arbitrary decisions of the annotator, and the second involves an experimental step in addition to the sequencing and cannot be incorporated into the higher throughput scRNA-seq methods. We therefore suggest a different approach for cell labeling, namely, classifying cells from scRNA-seq datasets by using a model transferred from different (previously labeled) datasets. This approach can complement existing methods and, in some cases, even replace them. Such a transfer-learning framework requires selecting informative features and training a classifier. The specific implementation for the framework that we propose, designated "CaSTLe: classification of single cells by transfer learning," is based on a robust feature engineering workflow and an XGBoost classification model built on these features. Evaluation of CaSTLe against two benchmark feature-selection and classification methods showed that it outperformed the benchmark methods in most cases and yielded satisfactory classification accuracy in a consistent manner. CaSTLe has the additional advantage of being parallelizable and well suited to large datasets. We showed that it was possible to classify cell types using transfer learning, even when the databases contained a very small number of genes, and our study thus indicates the potential applicability of this approach for analysis of scRNA-seq datasets.

Journal Article

GNNs and ensemble models enhance the prediction of new sRNA-mRNA interactions in unseen conditions

by Cohen, Shani , Rokach, Lior , Veksler-Lublinsky, Isana in Algorithms , Analysis , Bacteria

2025

Bacterial small RNAs (sRNAs) are pivotal in post-transcriptional regulation, affecting functions like virulence, metabolism, and gene expression by binding specific mRNA targets. Identifying these targets is crucial to understanding sRNA regulation across species. Despite advancements in high-throughput (HT) experimental methods, they remain technically challenging and are limited to detecting sRNA-target interactions under specific environmental conditions. Therefore, computational approaches, especially machine learning (ML), are essential for identifying strong candidates for biological validation. In this paper, we hypothesize that ML models trained on large-scale interaction data from specific conditions can accurately predict new interactions in unseen conditions within the same bacterial strain. To test this, we developed models from two families: (1) graph neural networks (GNNs), including GraphRNA and kGraphRNA, that learn transformed representations of interacting sRNA-mRNA pairs via graph relationships, and (2) decision forests, sInterRF (Random Forest) and sInterXGB (XGBoost), which use various interaction features for prediction. We also proposed Summation Ensemble Models (SEM) that combine scores from multiple models. Across three seen-to-unseen conditions evaluations, our models, particularly kGraphRNA, significantly improved the area under the ROC curve (AUC) and Precision-Recall curve (PR-AUC) compared to sRNARFTarget, CopraRNA, and RNAup. The SEM model combining GraphRNA and CopraRNA outperformed CopraRNA alone on a low-throughput (LT) interactions test set (HT-to-LT evaluation). Beyond enhanced performance, our models enable target prediction for species-specific sRNAs, a capability lacking in some existing tools. Furthermore, GNN models remove the dependency on external tools like RNAplex or RNAup to compute hybridization duplex or energy features, enhancing scalability and runtime efficiency. While this study focuses on E. coli K12 MG1655 interactions, our methods are fully adaptable to predict interactions in other bacterial strains, given sufficient data for training. Our comprehensive feature importance analysis revealed the complexity of sRNA-mRNA interactions across environmental conditions, underscoring the significance of RNA sequence composition and duplex structure characteristics, like base pairing and energy factors; findings that align with biological evidence from previous studies. As HT experiments expand sRNA-target interaction data across conditions in various bacteria, our ML methods with features analysis offer promising advances in sRNA-target prediction and deeper insights into sRNA regulatory mechanisms across diverse species.

Journal Article

Wearable Sensors for Ensuring Sports Safety in Children with Autism Spectrum Disorder: A Comprehensive Review

by Arbili, Ofir , Rokach, Lior , Cohen, Seffi in Anxiety , autism spectrum disorder , Autism Spectrum Disorder - physiopathology

2025

Children with Autism Spectrum Disorder (ASD) often face unique risks during sports activities due to challenges such as motor coordination difficulties, sensory sensitivities, and communication impairments. This paper provides a comprehensive review of the use of wearable sensor technologies to enhance the safety and participation of children with ASD in sports. Utilizing a systematic approach, we analyze 144 papers identified through advanced search methodology. Our findings reveal that wearable sensors can monitor physiological signals like heart rate variability and electrodermal activity and biomechanical signals such as movement patterns to detect early signs of distress, anxiety, or potential injury. The integration of these technologies into sports settings for children with ASD presents significant potential for improving safety, reducing participation barriers, and enhancing overall well-being. Key findings indicate that while the application of wearable sensors in this context is still emerging, early results are promising. However, challenges remain regarding device usability, data privacy, and the need for further research to validate the effectiveness of these technologies in real-world sports environments. This review highlights the importance of interdisciplinary collaboration among researchers, technology developers, educators, and caregivers to develop and implement wearable sensor solutions that are tailored to the unique needs of children with ASD, thereby promoting safer and more inclusive sports participation.

Journal Article

A simplified similarity-based approach for drug-drug interaction prediction

by Shtar, Guy , Solomon, Adir , Shapira, Bracha in Accuracy , Adverse and side effects , Analysis

2023

Drug-drug interactions (DDIs) are a critical component of drug safety surveillance. Laboratory studies aimed at detecting DDIs are typically difficult, expensive, and time-consuming; therefore, developing in-silico methods is critical. Machine learning-based approaches for DDI prediction have been developed; however, in many cases, their ability to achieve high accuracy relies on data only available towards the end of the molecule lifecycle. Here, we propose a simple yet effective similarity-based method for preclinical DDI prediction where only the chemical structure is available. We test the model on new, unseen drugs. To focus on the preclinical problem setting, we conducted a retrospective analysis and tested the models on drugs that were added to a later version of the DrugBank database. We extend an existing method, adjacency matrix factorization with propagation (AMFP), to support unseen molecules by applying a new lookup mechanism to the drugs’ chemical structure, lookup adjacency matrix factorization with propagation (LAMFP). We show that using an ensemble of different similarity measures improves the results. We also demonstrate that Chemprop, a message-passing neural network, can be used for DDI prediction. In computational experiments, LAMFP results in high accuracy, with an area under the receiver operating characteristic curve of 0.82 for interactions involving a new drug and an existing drug and for interactions involving only existing drugs. Moreover, LAMFP outperforms state-of-the-art, complex graph neural network DDI prediction methods.

Journal Article
