Catalogue Search | MBRL

by Kutz, J. N. , Fasel, U. , Brunton, B. W.

2022

Sparse model identification enables the discovery of nonlinear dynamical systems purely from data; however, this approach is sensitive to noise, especially in the low-data limit. In this work, we leverage the statistical approach of bootstrap aggregating (bagging) to robustify the sparse identification of nonlinear dynamics (SINDy) algorithm. First, an ensemble of SINDy models is identified from subsets of limited and noisy data. The aggregate model statistics are then used to produce inclusion probabilities of the candidate functions, which enables uncertainty quantification and probabilistic forecasts. We apply this ensemble-SINDy (E-SINDy) algorithm to several synthetic and real-world datasets and demonstrate substantial improvements to the accuracy and robustness of model discovery from extremely noisy and limited data. For example, E-SINDy uncovers partial differential equation models from data with more than twice as much measurement noise as has been previously reported. Similarly, E-SINDy learns the Lotka-Volterra dynamics from remarkably limited data of yearly lynx and hare pelts collected from 1900 to 1920. E-SINDy is computationally efficient, with scaling similar to that of standard SINDy. Finally, we show that ensemble statistics from E-SINDy can be exploited for active learning and improved model predictive control.
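The bagging step described in this abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the toy system (dx/dt = -2x), the three-term candidate library [1, x, x^2], the threshold value, and all function names are assumptions chosen for illustration, and sequentially thresholded least squares stands in for whichever sparse regression the paper uses. Each ensemble member is fitted to a bootstrap resample of the data, and the inclusion probability of a candidate term is taken as the fraction of members that keep it.

```python
import math
import random

def solve_least_squares(rows, targets, active):
    """Least squares restricted to the columns in `active` (normal equations)."""
    k = len(active)
    ata = [[sum(r[i] * r[j] for r in rows) for j in active] for i in active]
    atb = [sum(r[i] * t for r, t in zip(rows, targets)) for i in active]
    # Gaussian elimination with partial pivoting on the small k-by-k system.
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(ata[row][col]))
        ata[col], ata[pivot] = ata[pivot], ata[col]
        atb[col], atb[pivot] = atb[pivot], atb[col]
        for row in range(col + 1, k):
            factor = ata[row][col] / ata[col][col]
            for c in range(col, k):
                ata[row][c] -= factor * ata[col][c]
            atb[row] -= factor * atb[col]
    sol = [0.0] * k
    for row in reversed(range(k)):
        rest = sum(ata[row][c] * sol[c] for c in range(row + 1, k))
        sol[row] = (atb[row] - rest) / ata[row][row]
    return sol

def sparse_fit(rows, targets, threshold=0.5, iterations=5):
    """Sequentially thresholded least squares: refit, then drop small terms."""
    n_terms = len(rows[0])
    active = list(range(n_terms))
    coef = [0.0] * n_terms
    for _ in range(iterations):
        if not active:
            break
        sol = solve_least_squares(rows, targets, active)
        coef = [0.0] * n_terms
        for idx, value in zip(active, sol):
            coef[idx] = value
        active = [i for i in active if abs(coef[i]) >= threshold]
    return [c if abs(c) >= threshold else 0.0 for c in coef]

def ensemble_inclusion(rows, targets, n_models=50, seed=0):
    """Fit one sparse model per bootstrap resample and aggregate the ensemble."""
    rng = random.Random(seed)
    n = len(rows)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        models.append(sparse_fit([rows[i] for i in idx], [targets[i] for i in idx]))
    n_terms = len(rows[0])
    probs = [sum(1 for m in models if m[j] != 0.0) / n_models for j in range(n_terms)]
    means = [sum(m[j] for m in models) / n_models for j in range(n_terms)]
    return probs, means

# Hypothetical toy data: x(t) = exp(-2t) with noisy derivative measurements.
noise = random.Random(1)
xs = [math.exp(-2.0 * 0.05 * k) for k in range(40)]
library = [[1.0, x, x * x] for x in xs]  # candidate functions 1, x, x^2
derivs = [-2.0 * x + noise.gauss(0.0, 0.01) for x in xs]

probs, means = ensemble_inclusion(library, derivs)
print("inclusion probabilities:", probs)
print("mean coefficients:", means)
```

Averaging the coefficients is only one possible aggregation choice; a median or a cut-off on the inclusion probabilities could be used instead.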

Journal Article

Escalation of Forecasting Accuracy through Linear Combiners of Predictive Models

by Chandra Nayak, Sarat in Accuracy , Autoregressive models , combining forecasts , ensemble method , artificial neural network , stock market prediction , financial time series forecasting , exchange rate forecasting , multilayer perceptron

2019

Precise and efficient modelling and forecasting of financial time series has attracted the attention of researchers, leading to the development of various statistical and machine-learning-based models. The accuracy of a particular method is problem- and domain-specific; hence, identifying the best method is contentious. To boost overall accuracy and minimize the risk of model selection, combining the outputs of different models has been recommended in the literature. This work presents a linear combiner of five predictive models, i.e., ARIMA, RBFNN, MLP, SVM, and FLANN, for improving prediction accuracy. Four statistical methods, i.e., trimmed mean, simple average, median, and an error-based method, are used to choose suitable combining weights. The individual forecasts and the linear combiner are used separately to predict the closing prices of five stock markets and the exchange rates of five global markets. Extensive simulation work demonstrates the feasibility and superiority of the linear combiner.
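As a rough illustration of the four combining rules named in this abstract, the sketch below combines five forecasts of the same quantity. The numbers are invented, the function names are hypothetical, and the error-based rule is assumed here to weight each model inversely to its past error; the abstract does not specify the exact formula.

```python
import statistics

def simple_average(forecasts):
    """Equal weight for every model."""
    return sum(forecasts) / len(forecasts)

def trimmed_mean(forecasts, trim=1):
    """Drop the `trim` smallest and largest forecasts, then average the rest."""
    ordered = sorted(forecasts)
    kept = ordered[trim:len(ordered) - trim]
    return sum(kept) / len(kept)

def median_combiner(forecasts):
    """Middle forecast; insensitive to a single extreme model."""
    return statistics.median(forecasts)

def error_weighted(forecasts, past_errors):
    """Assumed rule: weights proportional to 1 / past error, normalized to sum to 1."""
    inverse = [1.0 / e for e in past_errors]
    total = sum(inverse)
    return sum(f * w / total for f, w in zip(forecasts, inverse))

# Hypothetical one-step-ahead forecasts from five models and their past errors.
forecasts = [101.0, 99.0, 100.0, 110.0, 95.0]
past_errors = [1.0, 2.0, 1.0, 4.0, 4.0]

print(simple_average(forecasts))            # pulled upward by the 110.0 outlier
print(trimmed_mean(forecasts))              # outliers removed before averaging
print(median_combiner(forecasts))
print(error_weighted(forecasts, past_errors))
```

All four rules are linear or order-statistic combinations of the same inputs, so they differ only in how much influence an outlying or historically inaccurate model is given.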

Journal Article

A multimodel random forest ensemble method for an improved assessment of Chinese terrestrial vegetation carbon density

by Wei, Jie , Wang, Zhaosheng , Song, Wenchao in Accuracy , Biomass , Biosphere

2023

Assessing the terrestrial vegetation carbon density (TVCD) is crucial for evaluating the national carbon balance. However, current national‐scale TVCD assessments show strong disparities, despite the good estimation method of their underlying models. Here, we attribute this contradiction to a flaw in the methods of using multimodel simulation results, which ignore the connections between results, leading to an overoptimistic evaluation of the multimodel ensemble mean (MMEM) method. Thus, using the state‐of‐the‐art multimodel random forest ensemble (MMRFE) method to integrate the results of 10 models, we reproduced Chinese TVCD data during 1982–2010. Compared with the nationally averaged TVCD field investigation data (27 ± 26 Mg C/ha), we found that the results of five models were overestimated by 7.4%–85.2%, and the remaining models were underestimated by 3.7%–77.8%. The MMEM TVCD method produced an overestimation of 2%, but the MMRFE method produced an underestimation of only 0.2%. Additionally, the summary Taylor diagrams of the TVCD at the national and ecosystem (forest, shrub, grass and crop ecosystems) scales all showed that the MMRFE TVCD produced the smallest standard deviations and root mean square deviations and the highest correlation coefficients. Furthermore, the MMRFE TVCDs were all significantly positively correlated with the normalized difference vegetation index (NDVI), and they had the same increasing trend, but an opposite variation trend from the MMEM TVCD and NDVI. This result implied that the spatiotemporal variation modes of the MMRFE TVCD were consistent with those of the NDVI. The results suggested that compared with the traditional MMEM method, the MMRFE TVCD and its spatiotemporal variation modes were more similar to the real TVCD. In conclusion, the MMRFE method can effectively improve the accuracy of national‐scale TVCD estimation, and effectively reduce the uncertainty of large‐scale terrestrial vegetation carbon estimation processes. 
Notably, we provide a new method that uses a machine learning approach to mine multimodel terrestrial carbon information to reduce the uncertainty in the estimation of terrestrial ecosystem carbon components.

Journal Article

Machine learning uncovers the most robust self-report predictors of relationship quality across 43 longitudinal couples studies

by Gordon, Amie M. , Overall, Nickola C. , Clarke, Jennifer in Emotions , Family Characteristics , Female

2020

Given the powerful implications of relationship quality for health and well-being, a central mission of relationship science is explaining why some romantic relationships thrive more than others. This large-scale project used machine learning (i.e., Random Forests) to 1) quantify the extent to which relationship quality is predictable and 2) identify which constructs reliably predict relationship quality. Across 43 dyadic longitudinal datasets from 29 laboratories, the top relationship-specific predictors of relationship quality were perceived-partner commitment, appreciation, sexual satisfaction, perceived-partner satisfaction, and conflict. The top individual-difference predictors were life satisfaction, negative affect, depression, attachment avoidance, and attachment anxiety. Overall, relationship-specific variables predicted up to 45% of variance at baseline, and up to 18% of variance at the end of each study. Individual differences also performed well (21% and 12%, respectively). Actor-reported variables (i.e., own relationship-specific and individual-difference variables) predicted two to four times more variance than partner-reported variables (i.e., the partner’s ratings on those variables). Importantly, individual differences and partner reports had no predictive effects beyond actor-reported relationship-specific variables alone. These findings imply that the sum of all individual differences and partner experiences exert their influence on relationship quality via a person’s own relationship-specific experiences, and effects due to moderation by individual differences and moderation by partner-reports may be quite small. Finally, relationship-quality change (i.e., increases or decreases in relationship quality over the course of a study) was largely unpredictable from any combination of self-report variables. This collective effort should guide future models of relationships.

Journal Article

Ensemble‐based adaptive soft sensor for fault‐tolerant biomass monitoring

by Siegl, Manuel , Becker, Thomas , Brunner, Vincent in adaptive modeling , Algorithms , Biomass

2022

The accuracy and precision of soft sensors depend strongly on the reliability of underlying model inputs. These inputs (particularly readings of hardware sensors) are frequently subject to faults. This study aims to develop an adaptive soft sensor capable of reliable and robust biomass concentration predictions in the presence of faulty model inputs for a Pichia pastoris bioprocess. Hence, three soft sensor submodels were developed based on three independent model inputs (base addition, CO2 production, and mid‐infrared spectrum). An ensemble‐based algorithm combined the submodels to form an ensemble model, that is, an adaptive soft sensor, to achieve fault‐tolerant prediction. The algorithm's basic steps are as follows: the initial determination of submodel reliability is followed by selecting appropriate submodels to generate a reliable prediction via variance‐based weighting of the submodels. The adaptive soft sensor demonstrated high robustness and accuracy in biomass prediction in the presence of multiple simulated sensor faults (RMSE = 0.43 g L⁻¹) and multiple real sensor faults (RMSE = 0.70 g L⁻¹).
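The two steps named in this abstract (a reliability check, then variance-based weighting of the selected submodels) can be sketched as follows. This is an illustrative guess, not the published algorithm: the reliability test used here (distance from the median prediction within a fixed tolerance), the inverse-variance weights, the function name, and the numbers are all assumptions.

```python
import statistics

def fault_tolerant_estimate(predictions, variances, tolerance=1.0):
    """Combine submodel predictions while ignoring apparently faulty ones.

    Step 1 (assumed rule): a submodel counts as reliable if its prediction
    lies within `tolerance` of the median of all submodel predictions.
    Step 2: reliable submodels are averaged with inverse-variance weights.
    """
    center = statistics.median(predictions)
    reliable = [i for i, p in enumerate(predictions) if abs(p - center) <= tolerance]
    weights = [1.0 / variances[i] for i in reliable]
    total = sum(weights)
    estimate = sum(predictions[i] * w for i, w in zip(reliable, weights)) / total
    return estimate, reliable

# Hypothetical biomass predictions (g/L) from three submodels; the third
# input is assumed to come from a faulty sensor.
predictions = [10.2, 9.8, 25.0]
variances = [0.04, 0.04, 0.01]

estimate, reliable = fault_tolerant_estimate(predictions, variances)
print(estimate, reliable)
```

Selecting before weighting matters here: the faulty submodel reports the smallest variance, so inverse-variance weighting alone would let it dominate the estimate.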

Journal Article

CatBoost for big data: an interdisciplinary review

by Hancock, John T. , Khoshgoftaar, Taghi M. in Algorithms , Best practice , Big Data

2020

Gradient Boosted Decision Trees (GBDTs) are a powerful tool for classification and regression tasks in Big Data. Researchers should be familiar with the strengths and weaknesses of current implementations of GBDTs in order to use them effectively and make successful contributions. CatBoost is a member of the family of GBDT machine learning ensemble techniques. Since its debut in late 2018, researchers have successfully used CatBoost for machine learning studies involving Big Data. We take this opportunity to review recent research on CatBoost as it relates to Big Data, and learn best practices from studies that cast CatBoost in a positive light, as well as studies where CatBoost does not outshine other techniques, since we can learn lessons from both types of scenarios. Furthermore, as a Decision Tree based algorithm, CatBoost is well-suited to machine learning tasks involving categorical, heterogeneous data. Recent work across multiple disciplines illustrates CatBoost’s effectiveness and shortcomings in classification and regression tasks. Another important issue we expose in literature on CatBoost is its sensitivity to hyper-parameters and the importance of hyper-parameter tuning. One contribution we make is to take an interdisciplinary approach to cover studies related to CatBoost in a single work. This provides researchers with an in-depth understanding to help clarify proper application of CatBoost in solving problems. To the best of our knowledge, this is the first survey that studies all works related to CatBoost in a single publication.

Journal Article

Completed Review of Various Solar Power Forecasting Techniques Considering Different Viewpoints

by Huang, Cheng-Liang , Phan, Quoc-Thang , Wu, Yuan-Kang in Algorithms , Alternative energy sources , Classification

2022

Solar power has rapidly become an increasingly important energy source in many countries over recent years; however, the intermittent nature of photovoltaic (PV) power generation has a significant impact on existing power systems. To reduce this uncertainty and maintain system security, precise solar power forecasting methods are required. This study summarizes and compares various PV power forecasting approaches, including time-series statistical methods, physical methods, ensemble methods, and machine and deep learning methods, with a particular focus on the last of these. In addition, various optimization algorithms for model parameters are summarized, the crucial factors that influence PV power forecasts are investigated, and input selection for PV power generation forecasting models is discussed. Probabilistic forecasting is expected to play a key role in the PV power forecasting required to meet the challenges faced by modern grid systems, and so this study provides a comparative analysis of existing deterministic and probabilistic forecasting models. Additionally, the importance of data processing techniques that enhance forecasting performance is highlighted. In comparison with the extant literature, this paper addresses more of the issues concerning the application of deep and machine learning to PV power forecasting. Based on the survey results, a complete and comprehensive solar power forecasting process must include data processing and feature extraction capabilities, a powerful deep learning structure for training, and a method to evaluate the uncertainty in its predictions.

Journal Article

Comprehensive benchmarking and ensemble approaches for metagenomic classifiers

by Rosen, Gail L. , Ounit, Rachid , Hasan, Nur A. in Algorithms , Animal Genetics and Genomics , Artificial chromosomes

2017

Background: One of the main challenges in metagenomics is the identification of microorganisms in clinical and environmental samples. While an extensive and heterogeneous set of computational tools is available to classify microorganisms using whole-genome shotgun sequencing data, comprehensive comparisons of these methods are limited. Results: In this study, we use the largest-to-date set of laboratory-generated and simulated controls across 846 species to evaluate the performance of 11 metagenomic classifiers. Tools were characterized on the basis of their ability to identify taxa at the genus, species, and strain levels, quantify relative abundances of taxa, and classify individual reads to the species level. Strikingly, the number of species identified by the 11 tools can differ by over three orders of magnitude on the same datasets. Various strategies can ameliorate taxonomic misclassification, including abundance filtering, ensemble approaches, and tool intersection. Nevertheless, these strategies were often insufficient to completely eliminate false positives from environmental samples, which are especially important where they concern medically relevant species. Overall, pairing tools with different classification strategies (k-mer, alignment, marker) can combine their respective advantages. Conclusions: This study provides positive and negative controls, titrated standards, and a guide for selecting tools for metagenomic analyses by comparing ranges of precision, accuracy, and recall. We show that proper experimental design and analysis parameters can reduce false positives, provide greater resolution of species in complex metagenomic samples, and improve the interpretation of results.

Journal Article

A novel comprehensive investigation for enhancing cluster analysis accuracy through ensemble learning methods

by K, LNC Prakash , Raju, Kachapuram Basava , Lakshmi, H. N.

2024

Ensemble learning stands out as a widely embraced technique in machine learning. This research explores the application of ensemble learning, including ensemble clustering, to enhance the precision of cluster analysis for datasets with multiple attributes and unclear correlations. Employing a majority voting-based ensemble clustering approach, specific techniques such as k-means clustering, affinity propagation, mean shift, BIRCH clustering, and others are applied to defined datasets, leading to improved clustering results. The study involves a comprehensive comparative analysis, contrasting ensemble clustering outcomes with those of individual techniques. The process of improving cluster identification accuracy encompasses data collection, pre-processing to exclude irrelevant elements, and the application of standard clustering algorithms. The task includes defining the optimal number of groups before comparing clustering models. Additionally, a combined model is constructed by merging BIRCH clustering and mean shift clustering, leveraging their advantages to enhance overall clustering strength and accuracy. This research contributes to advancing ensemble learning and ensemble clustering methodologies, offering improved accuracy, and uncovering hidden patterns in complex datasets.

Journal Article

An integrated approach of hybrid ensemble machine learning-based efficient seismic slope fragility assessment and GIS mapping

by Go, Chaeyeon , Mostafizur, Rahman Md , Hahm, Daegi

2026

Seismic slope failures are difficult to predict due to the probabilistic nature of soil properties and seismic loads. Additionally, high-fidelity simulation-based fragility assessment methods for soil slopes require substantial computational resources, making it time-consuming to link these results to geographic information systems (GIS). Thus, this study proposes a computationally efficient approach for seismic slope fragility assessment, maintaining high accuracy while reducing computational cost. Based on slope failure thresholds from observation data, extensive displacement-based fragility analyses are conducted, and High-Confidence-of-Low-Probability-of-Failure (HCLPF) values are calculated across diverse slope conditions. Based on such an HCLPF dataset, a machine learning (ML) model predicting HCLPF of fragility analyses is established. A Hybrid Ensemble Method (HEM) combining Extreme Gradient Boosting (XGB) and Bagging Ensemble Method (BEM) is newly proposed for accurate prediction. The sub-strategy is also presented to reduce iterative optimization cost for XGB within the proposed HEM. Consequently, the HEM model outperformed existing individual and ensemble models in accuracy on test data. Also, when integrated into GIS, the HEM-based fragility prediction map closely matched a high-fidelity simulation-based fragility map, achieving about 95% accuracy while reducing computational costs by about 96%.

Journal Article
