Catalogue Search | MBRL

Soil Classification from Cone Penetration Test Profiles Based on XGBoost

by Ni, Jiaze; Zhang, Dongming; Wang, Feiyang in Accuracy, Accuracy Coverage Rate, Algorithms

2026

This study develops a machine-learning-based framework for multiclass soil classification using Cone Penetration Test (CPT) data, aiming to overcome the limitations of traditional empirical Soil Behavior Type (SBT) charts and improve the automation, continuity, robustness, and reliability of stratigraphic interpretation. A dataset of 340 CPT soundings from 26 sites in Shanghai is compiled, and a sliding-window feature engineering strategy is introduced to transform point measurements into local pattern descriptors. An XGBoost-based multiclass classifier is then constructed using fifteen engineered features, integrating second-order optimization, regularized tree structures, and probability-based decision functions. Results demonstrate that the proposed method achieves strong classification performance across nine soil categories, with an overall classification accuracy of approximately 92.6%, an average F1-score exceeding 0.905, and a mean Average Precision (mAP) of 0.954. The confusion matrix, P–R curves, and prediction probabilities show that soil types with distinctive CPT signatures are classified with near-perfect confidence, whereas transitional clay–silt facies exhibit moderate but geologically consistent misclassification. To evaluate depth-wise prediction reliability, an Accuracy Coverage Rate (ACR) metric is proposed. Analysis of all CPTs reveals a mean ACR of 0.924, and the ACR follows a Weibull distribution. Feature importance analysis indicates that depth-dependent variables and smoothed $p_s$ statistics are the dominant predictors governing soil behavior differentiation. The proposed XGBoost-based framework effectively captures nonlinear CPT–soil relationships, offering a practical and interpretable tool for high-resolution soil classification in subsurface investigations.
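The sliding-window idea in the abstract above, turning point measurements into local pattern descriptors, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the function name `window_features`, the window half-width, and the five statistics chosen are hypothetical and are not the fifteen features used in the study.

```python
from statistics import mean, pstdev

def window_features(values, half_width=2):
    """Summarise each depth point by statistics of its neighbours.

    `values` is a list of CPT readings ordered by depth. The window is
    truncated at the top and bottom of the profile. The statistics below
    are illustrative choices, not the study's feature set.
    """
    n = len(values)
    features = []
    for i in range(n):
        lo = max(0, i - half_width)
        hi = min(n, i + half_width + 1)
        window = values[lo:hi]
        features.append({
            "mean": mean(window),      # smoothed local level
            "std": pstdev(window),     # local variability
            "min": min(window),
            "max": max(window),
            # average change per step across the window
            "trend": (window[-1] - window[0]) / max(len(window) - 1, 1),
        })
    return features

# Hypothetical profile: a soft layer followed by a stiffer one.
profile = [0.8, 0.9, 1.1, 3.0, 3.2, 3.1]
feats = window_features(profile, half_width=1)
```

Each row of `feats` could then be paired with a depth value and a soil label and passed to any multiclass classifier; the abstract names XGBoost for that step.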

Journal Article

Transforming the empirical likelihood towards better accuracy

by TSAO, Min; JING, Bing-Yi; ZHOU, Wang in Accuracy, Alleviation, Complexity

2017

Under-coverage has been a long-standing issue with the empirical likelihood confidence region. Several methods can be used to address this issue, but they all add complexity to the empirical likelihood inference, requiring extra computation and/or extra theoretical investigation. The objective of this article is to find a method that does not add complexity. To this end, we look for a simple transformation of the empirical likelihood to alleviate the under-coverage. Using several criteria concerning the accuracy, consistency, and preservation of the geometric appeal of the original empirical likelihood, we obtain a transformed version of the empirical likelihood that is extremely simple in theory and computation. Its confidence regions are surprisingly accurate, even in small sample and multidimensional situations. It can be easily used to alleviate the under-coverage problem of empirical likelihood confidence regions.

Journal Article

Confidence bands in non‐parametric errors‐in‐variables regression

by Jamshidi, Farshid; Delaigle, Aurore; Hall, Peter in Air quality, Bands, Bandwidth choice

2015

Errors‐in‐variables regression is important in many areas of science and social science, e.g. in economics where it is often a feature of hedonic models, in environmental science where air quality indices are measured with error, in biology where the vegetative mass of plants is frequently obscured by mismeasurement and in nutrition where reported fat intake is typically subject to substantial error. To date, in non‐parametric contexts, the great majority of work has focused on methods for estimating the mean as a function, with relatively little attention being paid to techniques for empirical assessment of the accuracy of the estimator. We develop methodologies for constructing confidence bands. Our contributions include techniques for tuning parameter choice aimed at minimizing the coverage error of confidence bands.

Journal Article

Parametric Bootstrap Approximation to the Distribution of EBLUP and Related Prediction Intervals in Linear Mixed Models

by Chatterjee, Snigdhansu; Lahiri, Partha; Li, Huilin in 62D05, 62F25, 62F40

2008

The empirical best linear unbiased prediction (EBLUP) method uses a linear mixed model to combine information from different sources. This method is particularly useful in small area problems. The variability of an EBLUP is traditionally measured by the mean squared prediction error (MSPE), and interval estimates are generally constructed using estimates of the MSPE. Such methods have shortcomings such as under-coverage or over-coverage, excessive length and lack of interpretability. We propose a parametric bootstrap approach to estimate the entire distribution of a suitably centered and scaled EBLUP. The bootstrap histogram is highly accurate, and differs from the true EBLUP distribution by only $O(d^{3}n^{-3/2})$, where $d$ is the number of parameters and $n$ the number of observations. This result is used to obtain highly accurate prediction intervals. Simulation results demonstrate the superiority of this method over existing techniques of constructing prediction intervals in linear mixed models.

Journal Article

Split sample methods for constructing confidence intervals for binomial and Poisson parameters

by Decrouez, Geoffrey; Hall, Peter in Accuracy, Asymptotic expansion, Bayesian theory

2014

We introduce a new method for improving the coverage accuracy of confidence intervals for means of lattice distributions. The technique can be applied very generally to enhance existing approaches, although we consider it in greatest detail in the context of estimating a binomial proportion or a Poisson mean, where it is particularly effective. The method is motivated by a simple theoretical result, which shows that, by splitting the original sample of size $n$ into two parts, of sizes $n_1$ and $n_2 = n - n_1$, and basing the confidence procedure on the average of the means of these two subsamples, the highly oscillatory behaviour of coverage error, as a function of $n$, is largely removed. Perhaps surprisingly, this approach does not increase confidence interval width; usually the width is slightly reduced. Contrary to what might be expected, our new method performs well when it is used to modify confidence intervals based on existing techniques that already perform very well: it typically improves their coverage accuracy significantly. Each application of the split sample method to an existing confidence interval procedure results in a new technique.
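The splitting step described in the abstract above can be sketched for a binomial proportion. The estimator, the average of the two subsample means, follows the abstract; pairing it with a Wald-type interval is an assumption made here purely for illustration, not necessarily the interval the authors study, and the function names are hypothetical.

```python
from math import comb, sqrt

def split_sample_estimate(x1, n1, x2, n2):
    """Average of the two subsample proportions, (x1/n1 + x2/n2) / 2."""
    return 0.5 * (x1 / n1 + x2 / n2)

def split_wald_interval(x1, n1, x2, n2, z=1.959964):
    """Wald-type interval centred on the split-sample estimate (assumed form)."""
    p_hat = split_sample_estimate(x1, n1, x2, n2)
    # For independent binomial subsamples,
    # Var[(X1/n1 + X2/n2) / 2] = p(1 - p) * (1/n1 + 1/n2) / 4.
    se = sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2)) / 2
    return p_hat - z * se, p_hat + z * se

def exact_coverage(p, n1, n2):
    """Exact coverage probability, summing over all outcomes (x1, x2)."""
    total = 0.0
    for x1 in range(n1 + 1):
        w1 = comb(n1, x1) * p**x1 * (1 - p) ** (n1 - x1)
        for x2 in range(n2 + 1):
            w2 = comb(n2, x2) * p**x2 * (1 - p) ** (n2 - x2)
            lo, hi = split_wald_interval(x1, n1, x2, n2)
            if lo <= p <= hi:
                total += w1 * w2
    return total

# With unequal subsample sizes the estimate differs from the pooled mean:
# here it is (0.5 + 0.25) / 2, whereas the pooled proportion is 10 / 30.
estimate = split_sample_estimate(5, 10, 5, 20)
coverage = exact_coverage(0.3, 12, 18)
```

Evaluating `exact_coverage` over a range of sample sizes, for a fixed split rule, is one way to inspect how coverage error varies with the sample size.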

Journal Article

Bootstrap confidence intervals and hypothesis tests for extrema of parameters

by Miller, Hugh; Hall, Peter in Accuracy, Applications, Approximation

2010

The bootstrap provides effective and accurate methodology for a wide variety of statistical problems which might not otherwise enjoy practicable solutions. However, there still exist important problems where standard bootstrap estimators are not consistent, and where alternative approaches, for example the m-out-of-n bootstrap and asymptotic methods, also face significant challenges. One of these is the problem of constructing confidence intervals or hypothesis tests for extrema of parameters, for example for the maximum of p parameters where each has to be estimated from data. In the present paper we suggest approaches to solving this problem. We use the bootstrap to construct an accurate estimator of the joint distribution of centred parameter estimators, and we base the procedure, either a confidence interval or a hypothesis test, on that distribution estimator. Our methodology is designed so that it errs on the side of conservatism, modulo the small inaccuracy of the bootstrap step.

Journal Article

Calibration of the empirical likelihood for high-dimensional data

by Wang, Zhaojun; Zou, Changliang; Liu, Yukun in Asymptotic methods, Asymptotic properties, Calibration

2013

This article is concerned with the calibration of the empirical likelihood (EL) for high-dimensional data where the data dimension may increase as the sample size increases. We analyze the asymptotic behavior of the EL under a general multivariate model and provide weak conditions under which the best rate for the asymptotic normality of the empirical likelihood ratio (ELR) is achieved. In addition, there is usually substantial lack-of-fit when the ELR is calibrated by the usual normal in high dimensions, producing tests with type I errors much larger than nominal levels. We find that this is mainly due to the underestimation of the centralized and normalized quantities of the ELR. By examining the connection between the ELR and the classical Hotelling’s T-square statistic, we propose an effective calibration method which works much better in most situations.

Journal Article

Two-sample empirical likelihood method for difference between coefficients in linear regression model

by Zi, Xuemin; Zou, Changliang; Liu, Yukun in Accuracy, Asymptotic methods, Asymptotic properties

2012

The empirical likelihood method is proposed to construct confidence regions for the difference between the coefficients of a two-sample linear regression model. Unlike in existing empirical likelihood procedures for one-sample linear regression models, the empirical likelihood ratio function is not concave, so the usual maximum empirical likelihood estimate cannot be obtained directly. To overcome this problem, we propose to incorporate a natural and well-explained restriction into the likelihood function and obtain a restricted empirical likelihood ratio statistic (RELR). It is shown that the RELR has an asymptotic chi-squared distribution. Furthermore, to improve the coverage accuracy of the confidence regions, a Bartlett correction is applied. The effectiveness of the proposed approach is demonstrated by a simulation study.

Journal Article

Empirical likelihood confidence regions for comparison distributions and ROC curves

by Zhou, Wang; Claeskens, Gerda; Peng, Liang in Approximation, Bootstrap, comparison distribution

2003

The authors derive empirical likelihood confidence regions for the comparison distribution of two populations whose distributions are to be tested for equality using random samples. Another application they consider is to ROC curves, which are used to compare measurements of a diagnostic test from two populations. The authors investigate the smoothed empirical likelihood method for estimation in this context, and empirical likelihood based confidence intervals are obtained by means of the Wilks theorem. A bootstrap approach allows for the construction of confidence bands. The method is illustrated with data analysis and a simulation study.

Journal Article

Local Post-Stratification in Dual System Accuracy and Coverage Evaluation for the U.S. Census

by Mule, Vincent T.; Tang, Cheng Yong; Chen, Song Xi in Accuracy, Accuracy and coverage evaluation, Applications

2010

We consider a local post-stratification approach to analyze the capture-recapture dual system Accuracy and Coverage Evaluation (A.C.E.) data associated with the 2000 U.S. Census. The local post-stratification is carried out via a nonparametric regression estimation of the census enumeration and the correct enumeration functions. We propose a nonparametric population size estimator that is designed to accommodate some key aspects of the A.C.E.: missing values, erroneous enumerations, and extra covariates affecting the missingness and correct enumeration. The resulting estimates are compared with estimates from a conventional post-stratification and a logistic regression approach in an analysis on the 2000 Census A.C.E. data.

Journal Article
