Catalogue Search | MBRL

Drug-target interaction prediction with tree-ensemble learning and output space reconstruction

by Vens, Celine , Pliakos, Konstantinos in Algorithms , Analysis , Benchmarking

2020

Background: Computational prediction of drug-target interactions (DTI) is vital for drug discovery. The experimental identification of interactions between drugs and target proteins is very onerous. Modern technologies have mitigated the problem, leveraging the development of new drugs. However, drug development remains extremely expensive and time consuming. Therefore, in silico DTI predictions based on machine learning can alleviate the burdensome task of drug development. Many machine learning approaches have been proposed over the years for DTI prediction. Nevertheless, prediction accuracy and efficiency are persisting problems that still need to be tackled. Here, we propose a new learning method which addresses DTI prediction as a multi-output prediction task by learning ensembles of multi-output bi-clustering trees (eBICT) on reconstructed networks. In our setting, the nodes of a DTI network (drugs and proteins) are represented by features (background information). The interactions between the nodes of a DTI network are modeled as an interaction matrix and compose the output space in our problem. The proposed approach integrates background information from both drug and target protein spaces into the same global network framework. Results: We performed an empirical evaluation, comparing the proposed approach to state-of-the-art DTI prediction methods and demonstrated the effectiveness of the proposed approach in different prediction settings. For evaluation purposes, we used several benchmark datasets that represent drug-protein networks. We show that output space reconstruction can boost the predictive performance of tree-ensemble learning methods, yielding more accurate DTI predictions. Conclusions: We proposed a new DTI prediction method where bi-clustering trees are built on reconstructed networks. Building tree-ensemble learning models with output space reconstruction leads to superior prediction results, while preserving the advantages of tree-ensembles, such as scalability, interpretability and inductive setting.

Journal Article

Multivariate Gaussian and Student-t process regression for multi-output prediction

by Wang, Bo , Chen, Zexun , Gorban, Alexander N. in Air quality , Artificial Intelligence , Computational Biology/Bioinformatics

2020

Gaussian process model for vector-valued function has been shown to be useful for multi-output prediction. The existing method for this model is to reformulate the matrix-variate Gaussian distribution as a multivariate normal distribution. Although it is effective in many cases, reformulation is not always workable and is difficult to apply to other distributions because not all matrix-variate distributions can be transformed to respective multivariate distributions, such as the case for matrix-variate Student-t distribution. In this paper, we propose a unified framework which is used not only to introduce a novel multivariate Student-t process regression model (MV-TPR) for multi-output prediction, but also to reformulate the multivariate Gaussian process regression (MV-GPR) that overcomes some limitations of the existing methods. Both MV-GPR and MV-TPR have closed-form expressions for the marginal likelihoods and predictive distributions under this unified framework and thus can adopt the same optimization approaches as used in the conventional GPR. The usefulness of the proposed methods is illustrated through several simulated and real-data examples. In particular, we verify empirically that MV-TPR has superiority for the datasets considered, including air quality prediction and bike rent prediction. At last, the proposed methods are shown to produce profitable investment strategies in the stock markets.

Journal Article

Multi-Output Monitoring of High-Speed Laser Welding State Based on Deep Learning

by Du, Dong , Xue, Boce , Chang, Baohua in Algorithms , CNN visualization , Deep learning

2021

In order to ensure the production quality of high-speed laser welding, it is necessary to simultaneously monitor multiple state properties. Monitoring methods combining vision sensing and deep learning models are popular, but most models used can only make predictions on a single welding state property. In this contribution, we propose a multi-output model based on a lightweight convolutional neural network (CNN) architecture and introduce the particle swarm optimization (PSO) technique to optimize the loss function of the model, to simultaneously monitor multiple state properties of high-speed laser welding of AISI 304 austenitic stainless steel. High-speed imaging is performed to capture images of the melt pool and the dataset is built. Test results of different models show that the proposed model can achieve monitoring of multiple welding state properties accurately and efficiently. In addition, we make an interpretation and discussion on the prediction of the model through a visualization method, which can help to deepen our understanding of the relationship between the melt pool appearance and welding state. The proposed method can not only be applied to the monitoring of high-speed laser welding but also has the potential to be used in other procedures of welding state monitoring.

Journal Article

Against the Flow of Time with Multi-Output Models

by Jakubík, Jozef , Phuong, Mary , Chvosteková, Martina in Autoregressive processes , Causality , Data analysis

2023

Recent work has paid close attention to the first principle of Granger causality, according to which cause precedes effect. In this context, the question may arise whether the detected direction of causality also reverses after the time reversal of unidirectionally coupled data. Recently, it has been shown that for unidirectionally causally connected autoregressive (AR) processes → , after time reversal of data, the opposite causal direction → is indeed detected, although typically as part of the bidirectional link. As we argue here, the answer is different when the measured data are not from AR processes but from linked deterministic systems. When the goal is the usual forward data analysis, cross-mapping-like approaches correctly detect → , while Granger causality-like approaches, which should not be used for deterministic time series, detect causal independence ⫫ . The results of backward causal analysis depend on the predictability of the reversed data. Unlike AR processes, observables from deterministic dynamical systems, even complex nonlinear ones, can be predicted well forward, while backward predictions can be difficult (notably when the time reversal of a function leads to one-to-many relations). To address this problem, we propose an approach based on models that provide multiple candidate predictions for the target, combined with a loss function that considers only the best candidate. The resulting good forward and backward predictability supports the view that unidirectionally causally linked deterministic dynamical systems → can be expected to detect the same link both before and after time reversal.
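
The abstract above mentions a loss function that considers only the best of several candidate predictions. A minimal sketch of one such loss follows, assuming squared error and a fixed number of candidates per sample; the function name `best_candidate_loss` and the toy data are hypothetical and are not taken from the article.

```python
import numpy as np

def best_candidate_loss(candidates, target):
    """Mean over samples of the smallest squared error among the candidates.

    candidates: array of shape (n_samples, n_candidates)
    target:     array of shape (n_samples,)
    """
    candidates = np.asarray(candidates, dtype=float)
    target = np.asarray(target, dtype=float)
    # Squared error of every candidate against the target of its sample
    sq_err = (candidates - target[:, None]) ** 2
    # Only the best candidate of each sample contributes to the loss
    return sq_err.min(axis=1).mean()

# Toy data (assumed): two samples with two candidate predictions each
cands = np.array([[1.0, -1.0],
                  [0.5,  2.0]])
y = np.array([-0.5, 2.0])
loss = best_candidate_loss(cands, y)
```

Because only the closest candidate is penalized, a model trained this way can keep several distinct hypotheses alive, which is the situation the abstract describes for one-to-many relations.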

Journal Article

Multi-Output Based Hybrid Integrated Models for Student Performance Prediction

by Xue, Han , Niu, Yanmin in Academic achievement , Accuracy , Algorithms

2023

In higher education, student learning relies increasingly on autonomy. With the rise in blended learning, both online and offline, students need to further improve their online learning effectiveness. Therefore, predicting students’ performance and identifying students who are struggling in real time to intervene is an important way to improve learning outcomes. However, currently, machine learning in grade prediction applications typically only employs a single-output prediction method and has lagging issues. To advance the prediction of time and enhance the predictive attributes, as well as address the aforementioned issues, this study proposes a multi-output hybrid ensemble model that utilizes data from the Superstar Learning Communication Platform (SLCP) to predict grades. Experimental results show that using the first six weeks of SLCP data and the Xgboost model to predict mid-term and final grades meant that accuracy reached 78.37%, which was 3–8% higher than the comparison models. Using the Gdbt model to predict homework and experiment grades, the average mean squared error was 16.76, which is better than the comparison models. This study uses a multi-output hybrid ensemble model to predict how grades can help improve student learning quality and teacher teaching effectiveness.

Journal Article

Predicting Microbial Species in a River Based on Physicochemical Properties by Bio-Inspired Metaheuristic Optimized Machine Learning

by Sun, Qian , Truong, Dinh-Nhat , Susilo, Billy in Agricultural production , Artificial intelligence , Bacteria

2019

The main goal of the analysis of microbial ecology is to understand the relationship between Earth’s microbial community and their functions in the environment. This paper presents a proof-of-concept research to develop a bioclimatic modeling approach that leverages artificial intelligence techniques to identify the microbial species in a river as a function of physicochemical parameters. Feature reduction and selection are both utilized in the data preprocessing owing to the scarcity of available data points collected and missing values of physicochemical attributes from a river in Southeast China. A bio-inspired metaheuristic optimized machine learner, which supports the adjustment to the multiple-output prediction form, is used in bioclimatic modeling. The accuracy of prediction and applicability of the model can help microbiologists and ecologists in quantifying the predicted microbial species for further experimental planning with minimal expenditure, which has become one of the most serious issues when facing dramatic changes of environmental conditions caused by global warming. This work demonstrates a neoteric approach for potential use in predicting preliminary microbial structures in the environment.

Journal Article

State of Health Trajectory Prediction Based on Multi-Output Gaussian Process Regression for Lithium-Ion Battery

by Jiwei Wang , Guoqing Guan , Kaile Peng in Aging , Artificial intelligence , Battery cycles

2022

Accurate prediction of lithium-ion battery state of health (SOH) is of great significance for ensuring the safe, reliable operation of electric vehicles and energy storage systems. However, safety issues arising from the inaccurate estimation and prediction of battery SOH have caused widespread concern in academic and industrial communities. In this paper, a method is proposed to build an accurate SOH prediction model for battery packs based on multi-output Gaussian process regression (MOGPR) by employing the initial cycle data of the battery pack and the entire life cycling data of battery cells. Firstly, a battery aging experimental platform is constructed to collect battery aging data, and health indicators (HIs) that characterize battery aging are extracted. Then, the correlation between the HIs and the battery capacity is evaluated by the Pearson correlation analysis method, and the HIs that have a strong correlation with the battery capacity are screened. Finally, two MOGPR models are constructed to predict the HIs and SOH of the battery pack. Based on the first MOGPR model and the early HIs of the battery pack, the future cycle HIs can be predicted. In addition, the predicted HIs and the second MOGPR model are used to predict the SOH of the battery pack. The experimental results verify that the approach has a competitive performance; the mean and maximum values of the mean absolute error (MAE) and root mean square error (RMSE) are 1.07% and 1.42%, and 1.77% and 2.45%, respectively.

Journal Article

Applications of multi-fidelity multi-output Kriging to engineering design optimization

by Toal, David J. J. in Accuracy , Combustion chambers , Computational Mathematics and Numerical Analysis

2023

Surrogate modelling is a popular approach for reducing the number of high fidelity simulations required within an engineering design optimization. Multi-fidelity surrogate modelling can further reduce this effort by exploiting low fidelity simulation data. Multi-output surrogate modelling techniques offer a way for categorical variables e.g. the choice of material, to be included within such models. While multi-fidelity multi-output surrogate modelling strategies have been proposed, to date only their predictive performance rather than optimization performance has been assessed. This paper considers three different multi-fidelity multi-output Kriging based surrogate modelling approaches and compares them to ordinary Kriging and multi-fidelity Kriging. The first approach modifies multi-fidelity Kriging to include multiple outputs whereas the second and third approaches model the different levels of simulation fidelity as different outputs within a multi-output Kriging model. Each of these techniques is assessed using three engineering design problems including the optimization of a gas turbine combustor in the presence of a topological variation, the optimization of a vibrating truss where the material can vary and finally, the parallel optimization of a family of airfoils.

Journal Article

A Bayesian Genomic Multi-output Regressor Stacking Model for Predicting Multi-trait Multi-environment Plant Breeding Data

by Montesinos-López, José C , Singh, Ravi , Cuevas, Jaime in Accuracy , Breeding of animals , Estimates

2019

In this paper we propose a Bayesian multi-output regressor stacking (BMORS) model that is a generalization of the multi-trait regressor stacking method. The proposed BMORS model consists of two stages: in the first stage, a univariate genomic best linear unbiased prediction (GBLUP including genotype × environment interaction GE) model is implemented for each of the L traits under study; then the predictions of all traits are included as covariates in the second stage, by implementing a Ridge regression model. The main objectives of this research were to study alternative models to the existing multi-trait multi-environment (BMTME) model with respect to (1) genomic-enabled prediction accuracy, and (2) potential advantages in terms of computing resources and implementation. We compared the predictions of the BMORS model to those of the univariate GBLUP model using 7 maize and wheat datasets. We found that the proposed BMORS produced similar predictions to the univariate GBLUP model and to the BMTME model in terms of prediction accuracy; however, the best predictions were obtained under the BMTME model. In terms of computing resources, we found that the BMORS is at least 9 times faster than the BMTME method. Based on our empirical findings, the proposed BMORS model is an alternative for predicting multi-trait and multi-environment data, which are very common in genomic-enabled prediction in plant and animal breeding programs.
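
The two-stage structure described in the abstract above can be illustrated with a rough sketch. It assumes plain closed-form ridge regression as a stand-in for both stages, whereas the article uses a univariate GBLUP model in the first stage; the helper names `ridge_fit`, `fit_stacking` and `predict_stacking`, the penalty `lam` and the simulated data are all hypothetical.

```python
import numpy as np

def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge weights without intercept (hypothetical helper)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

def fit_stacking(X, Y, lam=1.0):
    n_traits = Y.shape[1]
    # Stage 1: one univariate model per trait (ridge here, NOT the GBLUP of the article)
    W1 = np.column_stack([ridge_fit(X, Y[:, j], lam) for j in range(n_traits)])
    Z = X @ W1  # stage-1 predictions of all traits
    # Stage 2: per-trait ridge regression on the predictions of all traits
    W2 = np.column_stack([ridge_fit(Z, Y[:, j], lam) for j in range(n_traits)])
    return W1, W2

def predict_stacking(X, W1, W2):
    # Chain the two stages: features -> trait predictions -> stacked predictions
    return (X @ W1) @ W2

# Simulated data (assumed): 50 lines, 5 markers, 3 traits
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
B = rng.normal(size=(5, 3))
Y = X @ B + 0.1 * rng.normal(size=(50, 3))

W1, W2 = fit_stacking(X, Y)
Y_hat = predict_stacking(X, W1, W2)
```

The sketch fits the second stage on in-sample first-stage predictions for brevity; whether the article does the same is not stated in the abstract.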

Journal Article

Autoreplicative random forests with applications to missing value imputation

by Carreño, Ander , Read, Jesse , Antonenko, Ekaterina in Artificial Intelligence , Computer Science , Control

2024

Missing values are a common problem in data science and machine learning. Removing instances with missing values is a straightforward workaround, but this can significantly hinder subsequent data analysis, particularly when features outnumber instances. There are a variety of methodologies proposed in the literature for imputing missing values. Denoising Autoencoders, for example, have been leveraged efficiently for imputation. However, neural network approaches have been relatively less effective on smaller datasets. In this work, we propose Autoreplicative Random Forests (ARF) as a multi-output learning approach, which we introduce in the context of a framework that may impute via either an iterative or procedural process. Experiments on several low- and high-dimensional datasets show that ARF is computationally efficient and exhibits better imputation performance than its competitors, including neural network approaches. In order to provide statistical analysis and mathematical background to the proposed missing value imputation framework, we also propose probabilistic ARFs, where the confidence values are provided over different imputation hypotheses, therefore maximizing the utility of such a framework in a machine-learning pipeline targeting predictive performance.

Journal Article
