Catalogue Search | MBRL

Development of a Prediction Method of Cell Density in Autotrophic/Heterotrophic Microorganism Mixtures by Machine Learning Using Absorbance Spectrum Data

by Akihito Nakanishi, Hiroaki Fukunishi, Fumihito Eguchi in Absorbance, Algorithms, Artificial intelligence

2022

Microflora are actively used in industry to produce value-added materials, and the density of each cell type should be controlled for stable microflora use. In this study, a simple system for evaluating cell density was constructed with artificial intelligence (AI) using absorbance spectra data of microflora. To set up the system, a machine learning-based prediction system for cell density was constructed using, as features, spectra data from a mixture of Saccharomyces cerevisiae and Chlamydomonas reinhardtii. When cell density was predicted by extremely randomized trees, the coefficient of determination (R²) was 0.8495 when the cell densities of S. cerevisiae and C. reinhardtii were shifted and fixed, respectively; on the other hand, when the cell densities of S. cerevisiae and C. reinhardtii were fixed and shifted, respectively, the R² was 0.9232. To explain the prediction system, an extremely randomized trees regressor, a decision tree-based ensemble learning method, was used as the machine learning algorithm, and Shapley additive explanations (SHAP) were used as the explainable AI (XAI) method to interpret the features contributing to the prediction results. The SHAP analyses indicated that not only the optical density but also the absorbance of the Soret and Q bands derived from the chloroplasts of C. reinhardtii could contribute to the prediction as features. This simple cell density evaluation system could have an industrial impact.
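
The coefficient of determination reported in this abstract compares prediction error with the spread of the measured values. A minimal sketch of that calculation, assuming plain Python lists; the function name and the sample densities are hypothetical and are not data from the study:

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the measurements around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical cell densities in arbitrary units (illustration only).
measured = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.0]
print(round(r_squared(measured, predicted), 4))
```

A value of 1 means the predictions match the measurements exactly; a value near 0 means they do no better than predicting the mean.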

Journal Article

SDM6A: A Web-Based Integrative Machine-Learning Framework for Predicting 6mA Sites in the Rice Genome

by Basith, Shaherin; Shin, Tae Hwan; Manavalan, Balachandran in Accuracy, Adenine, Computer applications

2019

DNA N6-adenine methylation (6mA) is an epigenetic modification in prokaryotes and eukaryotes. Identifying 6mA sites in the rice genome is important in rice epigenetics and breeding, but the non-random distribution and biological functions of these sites remain unclear. Several machine-learning tools can identify 6mA sites but show limited prediction accuracy, which limits their usability in epigenetic research. Here, we developed a novel computational predictor, called the Sequence-based DNA N6-methyladenine predictor (SDM6A), which is a two-layer ensemble approach for identifying 6mA sites in the rice genome. Unlike existing methods, which are based on single models with basic features, SDM6A explores various features, and five encoding methods were identified as appropriate for this problem. Subsequently, an optimal feature set was identified from the encodings, and corresponding models were developed individually using support vector machine and extremely randomized tree classifiers. First, all five single models were integrated via an ensemble approach to define the class for each classifier. Second, the two classifiers were integrated to generate a final prediction. SDM6A achieved robust performance on cross-validation and independent evaluation, with average accuracy and Matthews correlation coefficient (MCC) of 88.2% and 0.764, respectively. Corresponding metrics were 4.7%–11.0% and 2.3%–5.5% higher than those of existing methods, respectively. A user-friendly, publicly accessible web server (http://thegleelab.org/SDM6A) was implemented to predict novel putative 6mA sites in the rice genome.
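
Accuracy and the Matthews correlation coefficient are both computed from the counts of a binary confusion matrix. A minimal sketch using the standard formulas; the function names and the counts in the example are hypothetical and are not SDM6A results:

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all samples that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal total is zero.
    return numerator / denominator if denominator else 0.0


# Hypothetical counts: 100 true sites and 100 non-sites (illustration only).
print(accuracy(tp=90, tn=85, fp=15, fn=10))
print(round(mcc(tp=90, tn=85, fp=15, fn=10), 3))
```

Unlike accuracy, MCC ranges from -1 to 1 and stays near 0 for a classifier that ignores its input, which is why the two figures are usually reported together.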

Journal Article

iBCE-EL: A New Ensemble Learning Framework for Improved Linear B-Cell Epitope Prediction

by Govindaraj, Rajiv Gandhi; Shin, Tae Hwan; Kim, Myeong Ok in Algorithms, Amino acid composition, Amino acid sequence

2018

Identification of B-cell epitopes (BCEs) is a fundamental step for epitope-based vaccine development, antibody production, and disease prevention and diagnosis. Due to the avalanche of protein sequence data discovered in the postgenomic age, it is essential to develop an automated computational method to enable fast and accurate identification of novel BCEs within the vast number of candidate proteins and peptides. Although several computational methods have been developed, their accuracy is unreliable. Thus, developing a reliable model with significant prediction improvements is highly desirable. In this study, we first constructed a non-redundant data set of 5,550 experimentally validated BCEs and 6,893 non-BCEs from the Immune Epitope Database. We then developed a novel ensemble learning framework for improved linear BCE prediction called iBCE-EL, a fusion of two independent predictors, namely, extremely randomized tree (ERT) and gradient boosting (GB) classifiers, which use, respectively, a combination of physicochemical properties (PCP) and amino acid composition and a combination of dipeptide and PCP as input features. Cross-validation analysis on a benchmarking data set showed that iBCE-EL performed better than the individual classifiers (ERT and GB), with a Matthews correlation coefficient (MCC) of 0.454. Furthermore, we evaluated the performance of iBCE-EL on the independent data set. Results show that iBCE-EL significantly outperformed the state-of-the-art method with an MCC of 0.463. To the best of our knowledge, iBCE-EL is the first ensemble method for linear BCE prediction. iBCE-EL was implemented in a web-based platform, which is available at http://thegleelab.org/iBCE-EL. iBCE-EL contains two prediction modes. The first identifies peptide sequences as BCEs or non-BCEs, while the second is aimed at providing users with the option of mining potential BCEs from protein sequences.

Journal Article

Flash Flood Susceptibility Modeling Using New Approaches of Hybrid and Ensemble Tree-Based Machine Learning Algorithms

by Saha, Asish; Melesse, Assefa M.; Chandra Pal, Subodh in adverse effects, Algorithms, altitude

2020

Flash flooding is considered one of the most dynamic natural disasters, and mapping flood susceptibility is a measure needed to minimize its economic damages, adverse effects, and consequences. Identifying areas prone to flash flooding is a crucial step in flash flood hazard management. In the present study, the Kalvan watershed in Markazi Province, Iran, was chosen to evaluate flash flood susceptibility modeling. To detect flash flood-prone zones in this study area, five machine learning (ML) algorithms were tested: boosted regression tree (BRT), random forest (RF), parallel random forest (PRF), regularized random forest (RRF), and extremely randomized trees (ERT). Fifteen climatic and geo-environmental variables were used as inputs of the flash flood susceptibility models. The results showed that ERT was the best-performing model, with an area under curve (AUC) value of 0.82. The AUC values of the remaining models, RRF, PRF, RF, and BRT, were 0.80, 0.79, 0.78, and 0.75, respectively. In the ERT model, the areal coverage of very high to moderate flash flood susceptibility was 582.56 km² (28.33%), and the remaining area was associated with very low to low susceptibility zones. It is concluded that topographical and hydrological parameters, e.g., altitude, slope, rainfall, and distance from the river, were the most effective parameters. The results of this study will play a vital role in the planning and implementation of flood mitigation strategies in the region.

Journal Article

A machine learning algorithm to explore the drivers of carbon emissions in Chinese cities

by Cao, Qiang; Xia, Lina; Yu, Wenmei in 704/172/4081, 704/844, 704/844/2175

2024

As the world’s largest energy consumer and carbon emitter, China faces an urgent task of carbon emission reduction. In order to realize the dual-carbon goal at an early date, it is necessary to study the key factors affecting China’s carbon emissions and their non-linear relationships. This paper compares the performance of six machine learning algorithms to that of traditional econometric models in predicting carbon emissions in China from 2011 to 2020 using panel data from 254 cities in China. Specifically, it analyzes the comparative importance of domestic economic, external economic, and policy uncertainty factors as well as the nonparametric relationship between these factors and carbon emissions based on the Extra-trees model. Results show that energy consumption (ENC) remains the root cause of increased carbon emissions among domestic economic factors, although government intervention (GOV) and digital finance (DIG) can significantly reduce emissions. Next, among the external economic and policy uncertainty factors, foreign direct investment (FDI) and economic policy uncertainty (EPU) are important factors influencing carbon emissions, and the partial dependence plots (PDPs) confirm the pollution haven hypothesis and also reveal the role of EPU in reducing carbon emissions. The heterogeneity of factors affecting carbon emissions is also analyzed under different city sizes, and it is found that ENC is a common driving factor in cities of different sizes, but there are some differences. Finally, we propose appropriate policy recommendations to help China move rapidly towards a green and sustainable development path.

Journal Article

Evaluating classifier performance with highly imbalanced Big Data

by Hancock, John T; Khoshgoftaar, Taghi M; Johnson, Justin M in Big Data, Classification, Classifiers

2023

Using the wrong metrics to gauge classification of highly imbalanced Big Data may hide important information in experimental results. However, we find that analysis of metrics for performance evaluation and what they can hide or reveal is rarely covered in related works. Therefore, we address that gap by analyzing multiple popular performance metrics on three Big Data classification tasks. To the best of our knowledge, we are the first to utilize three new Medicare insurance claims datasets which became publicly available in 2021. These datasets are all highly imbalanced. Furthermore, the datasets are comprised of completely different data. We evaluate the performance of five ensemble learners in the Machine Learning task of Medicare fraud detection. Random Undersampling (RUS) is applied to induce five class ratios. The classifiers are evaluated with both the Area Under the Receiver Operating Characteristic Curve (AUC), and Area Under the Precision Recall Curve (AUPRC) metrics. We show that AUPRC provides a better insight into classification performance. Our findings reveal that the AUC metric hides the performance impact of RUS. However, classification results in terms of AUPRC show RUS has a detrimental effect. We show that, for highly imbalanced Big Data, the AUC metric fails to capture information about precision scores and false positive counts that the AUPRC metric reveals. Our contribution is to show AUPRC is a more effective metric for evaluating the performance of classifiers when working with highly imbalanced Big Data.
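
The contrast between AUC and AUPRC described in this abstract can be seen on a small toy ranking. The sketch below is an illustration only, assuming integer scores with no ties and a hypothetical 2% positive class; it does not use the Medicare datasets or reproduce the authors' experiments:

```python
def roc_auc(labels, scores):
    """Probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Count pairwise wins; a tie counts as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (assumes no tied scores)."""
    ranked = sorted(zip(scores, labels), reverse=True)
    n_pos = sum(labels)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this positive
    return total / n_pos


# Hypothetical scores: 100 samples ranked 100..1, positives at ranks 5 and 10.
scores = list(range(100, 0, -1))
labels = [1 if i in (4, 9) else 0 for i in range(100)]

print(round(roc_auc(labels, scores), 3))            # about 0.939
print(round(average_precision(labels, scores), 3))  # 0.2
```

Both positives sit in the top 10% of the ranking, so AUC looks strong, yet 8 of the top 10 predictions are false positives, which only the precision-based metric exposes.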

Journal Article

Rapid natural hazard extent estimation from twitter data: investigation for hurricane impact areas

by Chanussot, Jocelyn; Keller, Sina; Florath, Janine in Accuracy, Civil Engineering, Cyclones

2024

Natural hazards have occurred more frequently in recent years and pose a severe risk to human life. Their extents and, thereby, the most heavily affected areas must be estimated as early as possible to limit damages or initiate rescue services. For such estimations, a widely available data source that is comparatively responsive to short-term changes is needed, and volunteered geographic information (VGI) data provide one. Tropical cyclones are natural hazard events that can cause enormous spatially extended damage. In this study, we introduce Machine Learning approaches such as Extremely Randomized Tree (ET) and Geographically Weighted Regression for estimating hurricane-impacted regions from VGI data. In addition to the general approximate track extent estimation, we also evaluate the possibilities of temporal estimation of track development from VGI data. Different scenarios are evaluated, and we find that the results mainly depend on the choice of the geographical splits for training and test data for the underlying regression task. Suitable splits lead to R² of 99% in the best cases with the ET model. The estimation results are satisfying when considering the temporal aspect and represent a use-case scenario. Such a combination of Machine Learning approaches and VGI is a simple and fast approach for early natural hazard estimation.

Journal Article

A Cascade Ensemble Learning Model for Human Activity Recognition with Smartphones

by Pan, Zhigeng; Jin, Linpeng; Xu, Shoujiang in Artificial intelligence, cascade ensemble learning model, Cellular telephones

2019

Human activity recognition (HAR) has gained lots of attention in recent years due to its high demand in different domains. In this paper, a novel HAR system based on a cascade ensemble learning (CELearning) model is proposed. Each layer of the proposed model is comprised of Extremely Gradient Boosting Trees (XGBoost), Random Forest, Extremely Randomized Trees (ExtraTrees) and Softmax Regression, and the model goes deeper layer by layer. The initial input vectors sampled from smartphone accelerometer and gyroscope sensor are trained separately by four different classifiers in the first layer, and the probability vectors representing different classes to which each sample belongs are obtained. Both the initial input data and the probability vectors are concatenated together and considered as input to the next layer’s classifiers, and eventually the final prediction is obtained according to the classifiers of the last layer. This system achieved satisfying classification accuracy on two public datasets of HAR based on smartphone accelerometer and gyroscope sensor. The experimental results show that the proposed approach has gained better classification accuracy for HAR compared to existing state-of-the-art methods, and the training process of the model is simple and efficient.
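
The layer-by-layer scheme in this abstract, where each layer's class-probability vectors are concatenated with the initial input and passed to the next layer, can be sketched in a few lines. This is a simplified illustration and not the authors' CELearning implementation: `CentroidClassifier` is a hypothetical toy stand-in for XGBoost, Random Forest, ExtraTrees and Softmax Regression, the layers are trained on in-sample probabilities, and averaging the last layer's outputs is an assumption.

```python
import math


class CentroidClassifier:
    """Toy learner (hypothetical): softmax over negative distances to class centroids."""

    def __init__(self, temperature=1.0):
        self.temperature = temperature

    def fit(self, X, y):
        self.classes_ = sorted(set(y))
        self.centroids_ = []
        for c in self.classes_:
            rows = [x for x, label in zip(X, y) if label == c]
            self.centroids_.append([sum(col) / len(rows) for col in zip(*rows)])
        return self

    def predict_proba(self, X):
        out = []
        for x in X:
            logits = [-math.dist(x, c) / self.temperature for c in self.centroids_]
            peak = max(logits)  # subtract the maximum for numerical stability
            exps = [math.exp(v - peak) for v in logits]
            total = sum(exps)
            out.append([e / total for e in exps])
        return out


def augment(X, models, features):
    """Concatenate the initial input with every model's probability vector."""
    probas = [m.predict_proba(features) for m in models]
    return [list(x) + [p for proba in probas for p in proba[i]]
            for i, x in enumerate(X)]


def fit_cascade(X, y, n_layers=2, temperatures=(0.5, 2.0)):
    layers, features = [], X
    for depth in range(n_layers):
        models = [CentroidClassifier(t).fit(features, y) for t in temperatures]
        layers.append(models)
        if depth < n_layers - 1:
            features = augment(X, models, features)
    return layers


def predict_cascade(layers, X):
    features = X
    for models in layers[:-1]:
        features = augment(X, models, features)
    last = layers[-1]
    probas = [m.predict_proba(features) for m in last]
    classes = last[0].classes_
    preds = []
    for i in range(len(X)):
        # Assumption: average the last layer's probabilities, then take the argmax.
        avg = [sum(p[i][k] for p in probas) / len(probas) for k in range(len(classes))]
        preds.append(classes[avg.index(max(avg))])
    return preds


# Hypothetical two-class data (illustration only).
X_train = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
y_train = [0, 0, 1, 1]
cascade = fit_cascade(X_train, y_train)
print(predict_cascade(cascade, [[0.0, 0.5], [5.0, 5.5]]))
```

In practice the probability vectors fed to deeper layers are usually produced by cross-validation rather than in-sample prediction, to limit overfitting; the abstract does not say which variant the authors used.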

Journal Article

Short-Term Solar Irradiance Forecasting Using Random Forest-Based Models with a Focus on Mountain Locations

by Paulescu, Eugenia; Paulescu, Marius; Velimirovici, Lucas in Accuracy, Artificial intelligence, Decision trees

2026

Photovoltaic (PV) power forecasting has become a key tool for the intelligent management of electrical grids. Since the largest source of error in PV power forecasting originates from uncertainties in solar irradiance prediction, improving the accuracy of solar irradiance forecasts has emerged as an active research topic. This study evaluates multiple random tree-based model versions using a challenging dataset collected at globally distributed stations, spanning elevations from sea level to nearly 4000 m and covering a wide range of climate classes. The originality of the study lies in the synergistic contribution of two elements: the innovative inclusion of diffuse irradiance among the predictors and a comparative analysis of forecast quality across lowland and mountainous locations. In such environments, accurate solar resource forecasting is particularly important for the intelligent management of stand-alone PV systems deployed at high altitudes and in remote, off-grid areas. Overall, the results identify Extremely Randomized Trees (XTRc) as the best-performing model. XTRc achieves Skill Scores ranging from 0.087 to 0.298 across individual stations. The model accuracy remains high even at mountain stations, provided that sky-condition variability is low.
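
The abstract does not state how its Skill Score is defined. A minimal sketch, assuming the RMSE-based definition that is common in solar forecasting, where a model is compared against a reference forecast such as persistence; the function names and all irradiance values are hypothetical:

```python
import math


def rmse(observed, forecast):
    """Root mean square error between observations and a forecast."""
    squared = [(o - f) ** 2 for o, f in zip(observed, forecast)]
    return math.sqrt(sum(squared) / len(squared))


def skill_score(observed, model_forecast, reference_forecast):
    """Assumed definition: 1 - RMSE(model) / RMSE(reference)."""
    return 1.0 - rmse(observed, model_forecast) / rmse(observed, reference_forecast)


# Hypothetical irradiance values in W/m² (illustration only).
observed = [500.0, 520.0, 480.0, 510.0]
reference = [490.0, 500.0, 520.0, 480.0]  # e.g. a persistence forecast
model = [495.0, 515.0, 490.0, 505.0]
print(round(skill_score(observed, model, reference), 3))
```

Under this definition a score of 0 means the model is no better than the reference, and a score of 1 means a perfect forecast.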

Journal Article

Induction motor condition monitoring using infrared thermography imaging and ensemble learning techniques

by Bettahar, Toufik; Benazzouz, Djamel; Rahmoune, Chemseddine in Condition monitoring, Ensemble learning, Fault detection

2021

In this paper, a novel noncontact and nonintrusive experimental framework based on an infrared thermography (IRT) technique is used for the monitoring and diagnosis of three-phase induction motor faults. The work begins by applying IRT to obtain a thermograph of the considered machine. Then, bag-of-visual-words (BoVW) is used to extract the fault features from the IRT images with the Speeded-Up Robust Features (SURF) detector and descriptor. Finally, various fault patterns in the induction motor are automatically identified using an ensemble learning method called Extremely Randomized Tree (ERT). The effectiveness of the proposed method is evaluated on experimental IRT images, and the diagnosis results show that it can be considered a powerful diagnostic tool with high classification accuracy and stability compared to other previously used methods.

Journal Article
