Catalogue Search | MBRL
Search Results
Explore the vast range of titles available.
91,015 result(s) for "Applied statistics"
Top 10 algorithms in data mining
by Motoda, Hiroshi; McLachlan, Geoffrey J.; Yu, Philip S.
in Algorithms; Computer engineering; Computer Science
2008
This paper presents the top 10 data mining algorithms identified by the IEEE International Conference on Data Mining (ICDM) in December 2006: C4.5, k-Means, SVM, Apriori, EM, PageRank, AdaBoost, kNN, Naive Bayes, and CART. These top 10 algorithms are among the most influential data mining algorithms in the research community. With each algorithm, we provide a description of the algorithm, discuss the impact of the algorithm, and review current and further research on the algorithm. These 10 algorithms cover classification, clustering, statistical learning, association analysis, and link mining, which are all among the most important topics in data mining research and development.
Journal Article
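The abstract above names the ten algorithms without showing any of them. As a minimal illustration of one, here is a sketch of Lloyd's k-Means in plain Python; the naive first-k initialization and the toy interface are our own simplifications, not from the paper:

```python
def _sqdist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=100):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    # Naive init: first k points (real implementations prefer k-means++).
    centroids = list(points[:k])
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: _sqdist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for c, cluster in enumerate(clusters):
            if cluster:
                dim = len(cluster[0])
                new_centroids.append(tuple(
                    sum(p[d] for p in cluster) / len(cluster)
                    for d in range(dim)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return centroids
```

On two well-separated 2-D blobs, `kmeans(pts, 2)` should recover centroids near the blob means.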
Heteroscedastic BART via Multiplicative Regression Trees
by George, E. I.; McCulloch, R. E.; Chipman, H. A.
in Applied statistical inference; Applied statistics; Big data
2020
Bayesian additive regression trees (BART) has become increasingly popular as a flexible and scalable nonparametric regression approach for modern applied statistics problems. For the practitioner dealing with large and complex nonlinear response surfaces, its advantages include a matrix-free formulation and the lack of a requirement to prespecify a confining regression basis. Although flexible in fitting the mean, BART has been limited by its reliance on a constant variance error model. Alleviating this limitation, we propose HBART, a nonparametric heteroscedastic elaboration of BART. In BART, the mean function is modeled with a sum of trees, each of which determines an additive contribution to the mean. In HBART, the variance function is further modeled with a product of trees, each of which determines a multiplicative contribution to the variance. Like the mean model, this flexible, multidimensional variance model is entirely nonparametric with no need for the prespecification of a confining basis. Moreover, with this enhancement, HBART can provide insights into the potential relationships of the predictors with both the mean and the variance. Practical implementations of HBART with revealing new diagnostic plots are demonstrated with simulated and real data on used car prices and song year of release.
Supplementary materials for this article are available online.
Journal Article
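The sum-of-trees mean and product-of-trees variance described in the abstract can be sketched schematically. The stump functions below are invented for illustration only; real HBART learns many such trees from data via Bayesian backfitting MCMC:

```python
import math, random

# Toy "trees": each stump maps a scalar predictor x to a value.
mean_trees = [
    lambda x: 1.0 if x > 0.5 else -1.0,   # additive contributions
    lambda x: 0.5 if x > 0.2 else 0.0,    # to the mean
]
var_trees = [
    lambda x: 2.0 if x > 0.5 else 1.0,    # multiplicative contributions
    lambda x: 1.5 if x > 0.8 else 1.0,    # to the variance
]

def hbart_draw(x, rng):
    """One draw from the model: the mean is a SUM of tree outputs, the
    variance a PRODUCT of tree outputs (HBART's elaboration of BART)."""
    mu = sum(t(x) for t in mean_trees)
    sigma2 = math.prod(t(x) for t in var_trees)
    return mu + math.sqrt(sigma2) * rng.gauss(0.0, 1.0)

rng = random.Random(1)
draws = [hbart_draw(0.9, rng) for _ in range(20000)]
```

At x = 0.9 the stumps give mean 1.0 + 0.5 = 1.5 and variance 2.0 × 1.5 = 3.0, so the empirical mean and variance of `draws` should be close to those values.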
Electromagnetics, control and robotics : a problems & solutions approach
This book covers a variety of problems, and offers solutions to some, in: statistical state and parameter estimation in nonlinear stochastic dynamical systems, in both the classical and quantum scenarios; propagation of electromagnetic waves in a plasma as described by the Boltzmann kinetic transport equation; and classical and quantum general relativity. It will be of use to engineering undergraduate students interested in analysing the motion of robots subject to random perturbation, and also to research scientists working in quantum filtering.
Nonparametric Approach for Multiple Change Point Analysis of Multivariate Data
2014
Change point analysis has applications in a wide variety of fields. The general problem concerns the inference of a change in distribution for a set of time-ordered observations. Sequential detection is an online version in which new data are continually arriving and are analyzed adaptively. We are concerned with the related, but distinct, offline version, in which retrospective analysis of an entire sequence is performed. For a set of multivariate observations of arbitrary dimension, we consider nonparametric estimation of both the number of change points and the positions at which they occur. We do not make any assumptions regarding the nature of the change in distribution or any distribution assumptions beyond the existence of the αth absolute moment, for some α ∈ (0, 2). Estimation is based on hierarchical clustering and we propose both divisive and agglomerative algorithms. The divisive method is shown to provide consistent estimates of both the number and the location of change points under standard regularity assumptions. We compare the proposed approach with competing methods in a simulation study. Methods from cluster analysis are applied to assess performance and to allow simple comparisons of location estimates, even when the estimated number differs. We conclude with applications in genetics, finance, and spatio-temporal analysis. Supplementary materials for this article are available online.
Journal Article
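As a toy illustration of the divisive idea described above, the sketch below scores every candidate split of a univariate sequence by a scaled squared difference of segment means. This is a stand-in for the energy-distance statistic the paper actually uses; the full method is also multivariate, recurses on both halves, and stops via a permutation test:

```python
def best_split(xs):
    """Score every candidate split point of a univariate sequence by a
    scaled squared difference of segment means (a stand-in for the
    energy-distance statistic of the paper).  Returns (index, score)."""
    n = len(xs)
    best_i, best_score = None, float("-inf")
    for i in range(1, n):
        left, right = xs[:i], xs[i:]
        ml = sum(left) / len(left)
        mr = sum(right) / len(right)
        # Weight by segment sizes so boundary splits are not favored.
        score = (len(left) * len(right) / n) * (ml - mr) ** 2
        if score > best_score:
            best_i, best_score = i, score
    return best_i, best_score
```

On a sequence with a single level shift, the best split lands exactly at the shift.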
The end of average : how we succeed in a world that values sameness
"Weaving science, history, and his experiences as a high school dropout, Rose brings to life the untold story of how we came to embrace the scientifically flawed idea that averages can be used to understand individuals and offers a powerful alternative: the three principles of individuality"-- Provided by publisher.
The COSMIN checklist for assessing the methodological quality of studies on measurement properties of health status measurement instruments: an international Delphi study
by Knol, Dirk L.; Patrick, Donald L.; de Vet, Henrica C. W.
in Applied statistics; Biostatistics; Clinical medicine
2010
Background Aim of the COSMIN study (COnsensus-based Standards for the selection of health status Measurement INstruments) was to develop a consensus-based checklist to evaluate the methodological quality of studies on measurement properties. We present the COSMIN checklist and the agreement of the panel on the items of the checklist. Methods A four-round Delphi study was performed with international experts (psychologists, epidemiologists, statisticians and clinicians). Of the 91 invited experts, 57 agreed to participate (63%). Panel members were asked to rate their (dis)agreement with each proposal on a five-point scale. Consensus was considered to be reached when at least 67% of the panel members indicated 'agree' or 'strongly agree'. Results Consensus was reached on the inclusion of the following measurement properties: internal consistency, reliability, measurement error, content validity (including face validity), construct validity (including structural validity, hypotheses testing and cross-cultural validity), criterion validity, responsiveness, and interpretability. The latter was not considered a measurement property. The panel also reached consensus on how these properties should be assessed. Conclusions The resulting COSMIN checklist could be useful when selecting a measurement instrument, peer-reviewing a manuscript, designing or reporting a study on measurement properties, or for educational purposes.
Journal Article
Sequential Multi-Sensor Change-Point Detection
2013
We develop a mixture procedure to monitor parallel streams of data for a change-point that affects only a subset of them, without assuming a spatial structure relating the data streams to one another. Observations are assumed initially to be independent standard normal random variables. After a change-point the observations in a subset of the streams of data have nonzero mean values. The subset and the post-change means are unknown. The procedure we study uses stream-specific generalized likelihood ratio statistics, which are combined to form an overall detection statistic in a mixture model that hypothesizes an assumed fraction p0 of affected data streams. An analytic expression is obtained for the average run length (ARL) when there is no change, and is shown by simulations to be very accurate. Similarly, an approximation for the expected detection delay (EDD) after a change-point is also obtained. Numerical examples are given to compare the suggested procedure to other procedures for unstructured problems and in one case where the problem is assumed to have a well-defined geometric structure. Finally, we discuss sensitivity of the procedure to the assumed value of p0 and suggest a generalization.
Journal Article
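A minimal sketch of the mixture combination the abstract describes, assuming a one-sided Gaussian generalized likelihood ratio per stream over a fixed window; the paper additionally maximizes over candidate change times and calibrates thresholds via the ARL:

```python
import math

def stream_glr(obs):
    """One-sided Gaussian GLR for a positive post-change mean over the
    last n observations of a single stream: (max(S_n, 0))^2 / (2 n)."""
    n = len(obs)
    return max(sum(obs), 0.0) ** 2 / (2 * n)

def mixture_stat(streams, p0):
    """Combine per-stream GLRs under a hypothesized fraction p0 of
    affected streams: sum_k log(1 - p0 + p0 * exp(GLR_k))."""
    return sum(math.log(1.0 - p0 + p0 * math.exp(stream_glr(s)))
               for s in streams)
```

An unaffected stream contributes zero to the statistic, while a stream whose recent mean has shifted upward inflates it, so alarms are driven by the few affected streams even when p0 is small.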
A Guide to Teaching Data Science
2018
Demand for data science education is surging and traditional courses offered by statistics departments are not meeting the needs of those seeking training. This has led to a number of opinion pieces advocating for an update to the Statistics curriculum. The unifying recommendation is that computing should play a more prominent role. We strongly agree with this recommendation, but advocate the main priority is to bring applications to the forefront as proposed by Nolan and Speed in 1999. We also argue that the individuals tasked with developing data science courses should not only have statistical training, but also have experience analyzing data with the main objective of solving real-world problems. Here, we share a set of general principles and offer a detailed guide derived from our successful experience developing and teaching a graduate-level, introductory data science course centered entirely on case studies. We argue for the importance of statistical thinking, as defined by Wild and Pfannkuch in 1999 and describe how our approach teaches students three key skills needed to succeed in data science, which we refer to as creating, connecting, and computing. This guide can also be used for statisticians wanting to gain more practical knowledge about data science before embarking on teaching an introductory course. Supplementary materials for this article are available online.
Journal Article