26,610 results for "Machine Learning - statistics"
Optimizing the Predictive Ability of Machine Learning Methods for Landslide Susceptibility Mapping Using SMOTE for Lishui City in Zhejiang Province, China
The main goal of this study was to use the synthetic minority oversampling technique (SMOTE) to expand the quantity of landslide samples for machine learning methods (i.e., support vector machine (SVM), logistic regression (LR), artificial neural network (ANN), and random forest (RF)) to produce high-quality landslide susceptibility maps for Lishui City in Zhejiang Province, China. Landslide-related factors were extracted from topographic maps, geological maps, and satellite images. Twelve factors were selected as independent variables using correlation coefficient analysis and the neighborhood rough set (NRS) method. In total, 288 soil landslides were mapped using field surveys, historical records, and satellite images. The landslides were randomly divided into two datasets: 70% of all landslides were selected as the original training dataset and 30% were used for validation. Then, SMOTE was employed to generate datasets with sizes ranging from two to thirty times that of the training dataset to establish and compare the four machine learning methods for landslide susceptibility mapping. In addition, we used slope units to subdivide the terrain to determine the landslide susceptibility. Finally, the landslide susceptibility maps were validated using statistical indexes and the area under the curve (AUC). The results indicated that the performances of the four machine learning methods showed different levels of improvement as the sample sizes increased. The RF model exhibited a more substantial improvement (AUC improved by 24.12%) than did the ANN (18.94%), SVM (17.77%), and LR (3.00%) models. Furthermore, the ANN model achieved the highest predictive ability (AUC = 0.98), followed by the RF (AUC = 0.96), SVM (AUC = 0.94), and LR (AUC = 0.79) models. This approach significantly improves the performance of machine learning techniques for landslide susceptibility mapping, thereby providing a better tool for reducing the impacts of landslide disasters.
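SMOTE expands the minority class by interpolating between minority samples and their nearest minority-class neighbours. A minimal, pure-Python sketch of that core idea (not the authors' implementation; the function name, `k`, and the 2-D tuples are illustrative assumptions):

```python
import random

def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic samples by interpolating each chosen
    minority sample toward one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest neighbours of x within the minority class (excluding x itself)
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic
```

Because each synthetic point lies on a segment between two minority samples, the generated data stay inside the convex hull of the minority class, which is why the technique enlarges the sample without inventing outliers.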
Learn R for applied statistics : with data visualizations, regressions, and statistics
\"Gain the R programming language fundamentals for doing the applied statistics useful for data exploration and analysis in data science and data mining. This book covers topics ranging from R syntax basics, descriptive statistics, and data visualizations to inferential statistics and regressions. After learning R's syntax, you will work through data visualizations such as histograms and boxplot charting, descriptive statistics, and inferential statistics such as t-test, chi-square test, ANOVA, non-parametric test, and linear regressions. \"Learn R for applied statistics\" is a timely skills-migration book that equips you with the R programming fundamentals and introduces you to applied statistics for data explorations. You will: Discover R, statistics, data science, data mining, and big data ; Master the fundamentals of R programming, including variables and arithmetic, vectors, lists, data frames, conditional statements, loops, and functions ; Work with descriptive statistics ; Create data visualizations, including bar charts, line charts, scatter plots, boxplots, histograms, and scatterplots ; Use inferential statistics including t-tests, chi-square tests, ANOVA, non-parametric tests, linear regressions, and multiple linear regressions\"--Back cover.
Partitioning variability in animal behavioral videos using semi-supervised variational autoencoders
Recent neuroscience studies demonstrate that a deeper understanding of brain function requires a deeper understanding of behavior. Detailed behavioral measurements are now often collected using video cameras, resulting in an increased need for computer vision algorithms that extract useful information from video data. Here we introduce a new video analysis tool that combines the output of supervised pose estimation algorithms (e.g. DeepLabCut) with unsupervised dimensionality reduction methods to produce interpretable, low-dimensional representations of behavioral videos that extract more information than pose estimates alone. We demonstrate this tool by extracting interpretable behavioral features from videos of three different head-fixed mouse preparations, as well as a freely moving mouse in an open field arena, and show how these interpretable features can facilitate downstream behavioral and neural analyses. We also show how the behavioral features produced by our model improve the precision and interpretation of these downstream analyses compared to using the outputs of either fully supervised or fully unsupervised methods alone.
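Unsupervised dimensionality reduction of the kind this tool builds on can be shown in miniature with PCA on 2-D points, where the leading eigenvector of the 2x2 covariance matrix has a closed form (a toy sketch, not the paper's model; the function name is illustrative):

```python
from math import atan2, cos, sin

def first_pc_2d(points):
    """First principal component of 2-D points, via the closed-form
    eigen-decomposition of the 2x2 covariance matrix."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # angle of the leading eigenvector of [[sxx, sxy], [sxy, syy]]
    theta = 0.5 * atan2(2 * sxy, sxx - syy)
    return (cos(theta), sin(theta))
```

Projecting high-dimensional video frames (rather than 2-D points) onto a few such directions is the basic operation that the paper's variational autoencoder generalizes nonlinearly.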
Introduction to machine learning with R : rigorous mathematical analysis
Machine learning can be a difficult subject if you're not familiar with the basics. With this book, you'll get a solid foundation in the introductory principles used in machine learning with the statistical programming language R. You'll start with the basics, like regression, then move into more advanced topics like neural networks, and finally delve into the frontier of machine learning in the R world with packages like Caret. By developing a familiarity with topics like the difference between regression and classification models, you'll be able to solve an array of machine learning problems. Knowing when to use a specific model (and when not to) can mean the difference between a highly accurate model and a completely useless one. This book provides copious examples to build a working knowledge of machine learning. You will: Understand the major parts of machine learning algorithms ; Recognize how machine learning can be used to solve a problem in a simple manner ; Figure out when to use certain machine learning algorithms versus others ; Learn how to operationalize algorithms with cutting-edge packages.
Applying Multivariate Segmentation Methods to Human Activity Recognition From Wearable Sensors’ Data
Time-resolved quantification of physical activity can contribute to both personalized medicine and epidemiological research studies, for example, managing and identifying triggers of asthma exacerbations. A growing number of reportedly accurate machine learning algorithms for human activity recognition (HAR) have been developed using data from wearable devices (eg, smartwatch and smartphone). However, many HAR algorithms depend on fixed-size sampling windows that may poorly adapt to real-world conditions in which activity bouts are of unequal duration. A small sliding window can produce noisy predictions under stable conditions, whereas a large sliding window may miss brief bursts of intense activity. We aimed to create an HAR framework adapted to variable duration activity bouts by (1) detecting the change points of activity bouts in a multivariate time series and (2) predicting activity for each homogeneous window defined by these change points. We applied standard fixed-width sliding windows (4-6 different sizes) or greedy Gaussian segmentation (GGS) to identify break points in filtered triaxial accelerometer and gyroscope data. After standard feature engineering, we applied an Xgboost model to predict physical activity within each window and then converted windowed predictions to instantaneous predictions to facilitate comparison across segmentation methods. We applied these methods in 2 datasets: the human activity recognition using smartphones (HARuS) dataset where a total of 30 adults performed activities of approximately equal duration (approximately 20 seconds each) while wearing a waist-worn smartphone, and the Biomedical REAl-Time Health Evaluation for Pediatric Asthma (BREATHE) dataset where a total of 14 children performed 6 activities for approximately 10 min each while wearing a smartwatch. 
To mimic a real-world scenario, we generated artificial unequal activity bout durations in the BREATHE data by randomly subdividing each activity bout into 10 segments and randomly concatenating the 60 activity bouts. Each dataset was divided into ~90% training and ~10% holdout testing. In the HARuS data, GGS produced the least noisy predictions of 6 physical activities and had the second highest accuracy rate of 91.06% (the highest accuracy rate was 91.79% for the sliding window of size 0.8 second). In the BREATHE data, GGS again produced the least noisy predictions and had the highest accuracy rate of 79.4% of predictions for 6 physical activities. In a scenario with variable duration activity bouts, GGS multivariate segmentation produced smart-sized windows with more stable predictions and a higher accuracy rate than traditional fixed-size sliding window approaches. Overall, accuracy was good in both datasets but, as expected, it was slightly lower in the more real-world study using wrist-worn smartwatches in children (BREATHE) than in the more tightly controlled study using waist-worn smartphones in adults (HARuS). We implemented GGS in an offline setting, but it could be adapted for real-time prediction with streaming data.
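The contrast between fixed sliding windows and change-point segmentation can be illustrated with a much-simplified univariate analogue of one greedy segmentation step: pick the break point that minimizes the total within-segment sum of squared deviations. (GGS itself optimizes a regularized Gaussian likelihood over multivariate data and inserts break points repeatedly; this one-step sketch is ours.)

```python
def best_breakpoint(series):
    """Index that splits `series` into two segments with the lowest total
    within-segment sum of squared deviations -- a simplified, univariate
    analogue of one greedy step of Gaussian segmentation."""
    def sse(seg):
        m = sum(seg) / len(seg)
        return sum((v - m) ** 2 for v in seg)
    return min(range(1, len(series)),
               key=lambda i: sse(series[:i]) + sse(series[i:]))
```

Applied recursively, such splits yield homogeneous windows sized to the data, which is why change-point windows adapt to unequal activity-bout durations where fixed-width windows cannot.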
Ensemble-based kernel learning for a class of data assimilation problems with imperfect forward simulators
Simulator imperfection, often known as model error, is ubiquitous in practical data assimilation problems. Despite the enormous efforts dedicated to addressing this problem, properly handling simulator imperfection in data assimilation remains a challenging task. In this work, we propose an approach to dealing with simulator imperfection from the standpoint of functional approximation, which can be implemented through a machine learning method such as the kernel-based learning adopted in the current work. To this end, we start by considering a class of supervised learning problems and then identify similarities between supervised learning and variational data assimilation. These similarities form the basis for developing an ensemble-based learning framework to tackle supervised learning problems, while achieving various advantages of ensemble-based methods over variational ones. After establishing the ensemble-based learning framework, we investigate the integration of ensemble-based learning into an ensemble-based data assimilation framework to handle simulator imperfection. In the course of our investigations, we also develop a strategy to tackle the issue of multi-modality in supervised learning problems, and transfer this strategy to data assimilation problems to help improve assimilation performance. For demonstration, we apply the ensemble-based learning framework and the integrated, ensemble-based data assimilation framework to a supervised learning problem and a data assimilation problem with an imperfect forward simulator, respectively. The experimental results indicate that both frameworks achieve good performance in the relevant case studies, and that functional approximation through machine learning may serve as a viable way to account for simulator imperfection in data assimilation problems.
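Kernel-based learning, the functional-approximation machinery named in the abstract, can be illustrated with the simplest kernel regressor: a Nadaraya-Watson estimator with an RBF kernel (a generic sketch, not the authors' ensemble framework; the function name and bandwidth default are assumptions):

```python
from math import exp

def kernel_predict(x_train, y_train, x, bandwidth=1.0):
    """Kernel-weighted (Nadaraya-Watson) regression with an RBF kernel:
    the prediction at x is a kernel-weighted average of training targets."""
    weights = [exp(-((x - xi) ** 2) / (2 * bandwidth ** 2)) for xi in x_train]
    return sum(w * y for w, y in zip(weights, y_train)) / sum(weights)
```

The bandwidth plays the role of the kernel hyperparameters that any kernel-based scheme, ensemble-trained or otherwise, must tune against data.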
Evaluation of a Machine Learning Model Based on Pretreatment Symptoms and Electroencephalographic Features to Predict Outcomes of Antidepressant Treatment in Adults With Depression
Despite the high prevalence and potential outcomes of major depressive disorder, whether and how patients will respond to antidepressant medications is not easily predicted. To identify the extent to which a machine learning approach, using gradient-boosted decision trees, can predict acute improvement for individual depressive symptoms with antidepressants based on pretreatment symptom scores and electroencephalographic (EEG) measures. This prognostic study analyzed data collected as part of the International Study to Predict Optimized Treatment in Depression, a randomized, prospective open-label trial to identify clinically useful predictors and moderators of response to commonly used first-line antidepressant medications. Data collection was conducted at 20 sites spanning 5 countries and including 518 adult outpatients (18-65 years of age) from primary care or specialty care practices who received a diagnosis of current major depressive disorder between December 1, 2008, and September 30, 2013. Patients were antidepressant medication naive or willing to undergo a 1-week washout period of any nonprotocol antidepressant medication. Statistical analysis was conducted from January 5 to June 30, 2019. Participants with major depressive disorder were randomized in a 1:1:1 ratio to undergo 8 weeks of treatment with escitalopram oxalate (n = 162), sertraline hydrochloride (n = 176), or extended-release venlafaxine hydrochloride (n = 180). The primary objective was to predict improvement in individual symptoms, defined as the difference in score for each of the symptoms on the 21-item Hamilton Rating Scale for Depression from baseline to week 8, evaluated using the C index. The resulting data set contained 518 patients (274 women; mean [SD] age, 39.0 [12.6] years; mean [SD] 21-item Hamilton Rating Scale for Depression score improvement, 13.0 [7.0]). 
With the use of 5-fold cross-validation for evaluation, the machine learning model achieved C index scores of 0.8 or higher on 12 of 21 clinician-rated symptoms, with the highest C index score of 0.963 (95% CI, 0.939-1.000) for loss of insight. The importance of any single EEG feature was higher than 5% for prediction of 7 symptoms, with the most important EEG features being the absolute delta band power at the occipital electrode sites (O1, 18.8%; Oz, 6.7%) for loss of insight. Over and above the use of baseline symptom scores alone, the use of both EEG and baseline symptom features was associated with a significant increase in the C index for improvement in 4 symptoms: loss of insight (C index increase, 0.012 [95% CI, 0.001-0.020]), energy loss (C index increase, 0.035 [95% CI, 0.011-0.059]), appetite changes (C index increase, 0.017 [95% CI, 0.003-0.030]), and psychomotor retardation (C index increase, 0.020 [95% CI, 0.008-0.032]). This study suggests that machine learning may be used to identify independent associations of symptoms and EEG features to predict antidepressant-associated improvements in specific symptoms of depression. The approach should next be prospectively validated in clinical trials and settings. ClinicalTrials.gov Identifier: NCT00693849.
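The C index reported above is the concordance index: over all pairs of patients with different true outcomes, the fraction whose predicted values rank them the same way. A small sketch (our own helper, with prediction ties counted as half-concordant):

```python
from itertools import combinations

def c_index(y_true, y_pred):
    """Concordance (C) index: among pairs with different true outcomes,
    the fraction ranked consistently by the predictions (prediction ties
    count as half-concordant)."""
    concordant = usable = 0.0
    for (t1, p1), (t2, p2) in combinations(zip(y_true, y_pred), 2):
        if t1 == t2:
            continue  # tied outcomes carry no ranking information
        usable += 1
        if (t1 - t2) * (p1 - p2) > 0:
            concordant += 1
        elif p1 == p2:
            concordant += 0.5
    return concordant / usable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the reported scores of 0.8 and above indicate strong discrimination.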