Catalogue Search | MBRL

Introduction to statistical relational learning

by Getoor, Lise, editor , Taskar, Ben, editor in Relational databases. , Machine learning Statistical methods. , Computer algorithms.

Book

An Elementary Introduction to Statistical Learning Theory

by Kulkarni, Sanjeev , Harman, Gilbert in Machine learning , Machine learning -- Statistical methods , MATHEMATICS

2011

A thought-provoking look at statistical learning theory and its role in understanding human learning and inductive reasoning. A joint endeavor from leading researchers in the fields of philosophy and electrical engineering, An Elementary Introduction to Statistical Learning Theory is a comprehensive and accessible primer on the rapidly evolving fields of statistical pattern recognition and statistical learning theory. Explaining these areas at a level and in a way that is not often found in other books on the topic, the authors present the basic theory behind contemporary machine learning and uniquely utilize its foundations as a framework for philosophical thinking about inductive inference. Promoting the fundamental goal of statistical learning, knowing what is achievable and what is not, this book demonstrates the value of a systematic methodology when used along with the needed techniques for evaluating the performance of a learning system. First, an introduction to machine learning is presented that includes brief discussions of applications such as image recognition, speech recognition, medical diagnostics, and statistical arbitrage. To enhance accessibility, two chapters on relevant aspects of probability theory are provided. Subsequent chapters feature coverage of topics such as the pattern recognition problem, optimal Bayes decision rule, the nearest neighbor rule, kernel rules, neural networks, support vector machines, and boosting. Appendices throughout the book explore the relationship between the discussed material and related topics from mathematics, philosophy, psychology, and statistics, drawing insightful connections between problems in these areas and statistical learning theory. All chapters conclude with a summary section, a set of practice questions, and a reference section that supplies historical notes and additional resources for further study.
An Elementary Introduction to Statistical Learning Theory is an excellent book for courses on statistical learning theory, pattern recognition, and machine learning at the upper-undergraduate and graduate levels. It also serves as an introductory reference for researchers and practitioners in the fields of engineering, computer science, philosophy, and cognitive science who would like to further their knowledge of the topic.

eBook

Game theory for data science : eliciting truthful information

by Faltings, Boi, author , Radanovic, Goran, author in Game theory. , Information science Statistical methods. , Data mining.

Book

Spatial statistical machine learning models to assess the relationship between development vulnerabilities and educational factors in children in Queensland, Australia

by Price, Aiden , Draidi Areed, Wala , Arnett, Kathryn in Australia , Biostatistics , Case studies

2022

Background The health and development of children during their first year of full time school is known to impact their social, emotional, and academic capabilities throughout and beyond early education. Physical health, motor development, social and emotional well-being, learning styles, language and communication, cognitive skills, and general knowledge are all considered to be important aspects of a child’s health and development. It is important for many organisations and governmental agencies to continually improve their understanding of the factors which determine or influence development vulnerabilities among children. This article studies the relationships between development vulnerabilities and educational factors among children in Queensland, Australia. Methods Spatial statistical machine learning models are reviewed and compared in the context of a study of geographic variation in the association between development vulnerabilities and attendance at preschool among children in Queensland, Australia. A new spatial random forest (SRF) model is suggested that can explain more of the spatial variation in data than other approaches. Results In the case study, spatial models were shown to provide a better fit compared to models that ignored the spatial variation in the data. The SRF model was shown to be the only model which can explain all of the spatial variation in each of the development vulnerabilities considered in the case study. The spatial analysis revealed that the attendance at preschool factor has a strong influence on the physical health domain vulnerability and emotional maturity vulnerability among children in their first year of school. Conclusion This study confirmed that it is important to take into account the spatial nature of data when fitting statistical machine learning models. 
A new spatial random forest model was introduced and was shown to explain more of the spatial variation and provide a better model fit in the case study of development vulnerabilities among children in Queensland. At small-area population level, increased attendance at preschool was strongly associated with reduced physical and emotional development vulnerabilities among children in their first year of school.

Journal Article

Statistics for machine learning : build supervised, unsupervised, and reinforcement learning models using both Python and R

by Dangeti, Pratap, author in Big data Statistical methods. , Machine learning. , Python (Computer program language)

Book

Estimation on total phosphorus of agriculture soil in China: a new sight with comparison of model learning methods

by Ramirez-Granada, Lina , Li, Gang , Wu, Caicong in Agricultural ecosystems , Agriculture , Air temperature

2023

Purpose Although soil total phosphorus (TP) is a primary and essential large element reflecting the soil fertility in agricultural ecosystems, studies on model development of TP and its differences between wheat and paddy lands after a long cultivation history at a regional scale are still limited. Hence, a comparison model of TP with different learning methods and datasets were built, and the relationship between environmental factors and TP were discussed. Methods TP from a long cultivation of either wheat or paddy agriculture systems was investigated, and the regression between TP and climate parameters (air temperatures, precipitation, humidity, and atmospheric pressure) and latitude were analyzed. A comparison of model development with six learning methods, including one statistical learning method (linear) and five machine learning methods (support vector regression, decision tree, random forest, XGBoost, and LightGBM), and two datasets (0–20 and 0–170-cm soil layers) was made. The models were evaluated by the root mean squared error (RMSE), mean deviation (RMD), mean absolute error (MAE), and model effective (EF). Results The results showed that the TP content of the top soil layer in wheat lands (0.89 ± 0.01 g kg⁻¹) was significantly higher than that of paddy lands (0.63 ± 0.01 g kg⁻¹). The annual average precipitation, humidity, and air temperature had significant negative relationships with TP content, while the annual average atmospheric pressure and latitude had significant positive relationships with TP. Most machine learning methods showed better performances than that of a statistical learning method with the highest r² of 0.82. The different datasets used for model development had no significant effect on model performances. Conclusion The average TP content of the top soil layer tends to be greater in wheat lands than that in paddy lands after a long cultivation.
Other than the different statistical parameters (the average, maximum, and minimum values) of each climate parameter, comprehensive climate parameters including the annual, semiannual, quarterly, and monthly air temperature, precipitation, humidity, and atmospheric pressure should be considered for further model development. Although different datasets in variable soil depth had no significant effect on model performances, machine learning methods such as random forest, XGBoost, and LightGBM are recommended for better performance than a linear learning method for soil TP model development. It is recommended that a comparison of different machine learning methods will help build a stronger model in similar studies.

Journal Article

TensorFlow 2.0 Quick Start Guide

by Holdroyd, Tony in Machine learning-Statistical methods

2019

TensorFlow is one of the most popular machine learning frameworks in Python. With this book, you will improve your knowledge of some of the latest TensorFlow features and will be able to perform supervised and unsupervised machine learning and also train neural networks.

eBook

Bridging the gap between pricing and reserving with an occurrence and development model for non-life insurance claims

by Crevecoeur, Jonas , Antonio, Katrien , Desmedt, Stijn in Actuarial science , Actuaries , Estimates

2023

Due to the presence of reporting and settlement delay, claim data sets collected by non-life insurance companies are typically incomplete, facing right censored claim count and claim severity observations. Current practice in non-life insurance pricing tackles these right censored data via a two-step procedure. First, best estimates are computed for the number of claims that occurred in past exposure periods and the ultimate claim severities, using the incomplete, historical claim data. Second, pricing actuaries build predictive models to estimate technical, pure premiums for new contracts by treating these best estimates as actual observed outcomes, hereby neglecting their inherent uncertainty. We propose an alternative approach that brings valuable insights for both non-life pricing and reserving. As such, we effectively bridge these two key actuarial tasks that have traditionally been discussed in silos. Hereto, we develop a granular occurrence and development model for non-life claims that tackles reserving and at the same time resolves the inconsistency in traditional pricing techniques between actual observations and imputed best estimates. We illustrate our proposed model on an insurance as well as a reinsurance portfolio. The advantages of our proposed strategy are most compelling in the reinsurance illustration where large uncertainties in the best estimates originate from long reporting and settlement delays, low claim frequencies and heavy (even extreme) claim sizes.

Journal Article

Chemometric analysis in Raman spectroscopy from experimental design to machine learning–based modeling

by Guo, Shuxia , Popp, Jürgen , Bocklitz, Thomas in 631/114/2164 , 639/638/440/527/1821 , 639/705/1042

2021

Raman spectroscopy is increasingly being used in biology, forensics, diagnostics, pharmaceutics and food science applications. This growth is triggered not only by improvements in the computational and experimental setups but also by the development of chemometric techniques. Chemometric techniques are the analytical processes used to detect and extract information from subtle differences in Raman spectra obtained from related samples. This information could be used to find out, for example, whether a mixture of bacterial cells contains different species, or whether a mammalian cell is healthy or not. Chemometric techniques include spectral processing (ensuring that the spectra used for the subsequent computational processes are as clean as possible) as well as the statistical analysis of the data required for finding the spectral differences that are most useful for differentiation between, for example, different cell types. For Raman spectra, this analysis process is not yet standardized, and there are many confounding pitfalls. This protocol provides guidance on how to perform a Raman spectral analysis: how to avoid these pitfalls, and strategies to circumvent problematic issues. The protocol is divided into four parts: experimental design, data preprocessing, data learning and model transfer. We exemplify our workflow using three example datasets where the spectra from individual cells were collected in single-cell mode, and one dataset where the data were collected from a raster scanning–based Raman spectral imaging experiment of mice tissue. Our aim is to help move Raman-based technologies from proof-of-concept studies toward real-world applications. Raman spectroscopy is increasingly being used in biological assays and studies. This protocol provides guidance for performing chemometric analysis to detect and extract information relating to the chemical differences between biological samples.

Journal Article

Machine learning in medicine: a practical introduction

by Sidey-Gibbons, Jenni A. M. , Sidey-Gibbons, Chris J. in Accuracy , Algorithms , Archives & records

2019

Background Following visible successes on a wide range of predictive tasks, machine learning techniques are attracting substantial interest from medical researchers and clinicians. We address the need for capacity development in this area by providing a conceptual introduction to machine learning alongside a practical guide to developing and evaluating predictive algorithms using freely-available open source software and public domain data. Methods We demonstrate the use of machine learning techniques by developing three predictive models for cancer diagnosis using descriptions of nuclei sampled from breast masses. These algorithms include regularized General Linear Model regression (GLMs), Support Vector Machines (SVMs) with a radial basis function kernel, and single-layer Artificial Neural Networks. The publicly-available dataset describing the breast mass samples ( N =683) was randomly split into evaluation ( n =456) and validation ( n =227) samples. We trained algorithms on data from the evaluation sample before they were used to predict the diagnostic outcome in the validation dataset. We compared the predictions made on the validation datasets with the real-world diagnostic decisions to calculate the accuracy, sensitivity, and specificity of the three models. We explored the use of averaging and voting ensembles to improve predictive performance. We provide a step-by-step guide to developing algorithms using the open-source R statistical programming environment. Results The trained algorithms were able to classify cell nuclei with high accuracy (.94 -.96), sensitivity (.97 -.99), and specificity (.85 -.94). Maximum accuracy (.96) and area under the curve (.97) were achieved using the SVM algorithm. Prediction performance increased marginally (accuracy =.97, sensitivity =.99, specificity =.95) when algorithms were arranged into a voting ensemble.
Conclusions We use a straightforward example to demonstrate the theory and practice of machine learning for clinicians and medical researchers. The principles which we demonstrate here can be readily applied to other complex tasks including natural language processing and image recognition.

Journal Article
