Search Results

155 result(s) for "Likelihood cross-validation"
Likelihood Cross-Validation Versus Least Squares Cross-Validation for Choosing the Smoothing Parameter in Kernel Home-Range Analysis
Fixed kernel density analysis with least squares cross-validation (LSCVh) choice of the smoothing parameter is currently recommended for home-range estimation. However, LSCVh has several drawbacks, including high variability, a tendency to undersmooth data, and multiple local minima in the LSCVh function. An alternative to LSCVh is likelihood cross-validation (CVh). We used computer simulations to compare estimated home ranges using fixed kernel density with CVh and LSCVh to true underlying distributions. Likelihood cross-validation generally performed better than LSCVh, producing estimates with better fit and less variability, and it was especially beneficial at sample sizes ≲50. Because CVh is based on minimizing the Kullback-Leibler distance and LSCVh on minimizing the integrated squared error, we discussed the foundation and general use of each of these measures of discrepancy, their statistical properties as they relate to home-range analysis, and the biological or practical interpretation of those statistical properties. We found 2 important problems related to computation of kernel home-range estimates, including multiple minima in the LSCVh and CVh functions and discrepancies among estimates from current home-range software. Choosing an appropriate smoothing parameter is critical when using kernel methods to estimate animal home ranges, and our study provides useful guidelines when making this decision.
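The two criteria compared in this abstract are standard and easy to sketch. Below is a minimal one-dimensional illustration (synthetic data, Gaussian kernel, grid search over h) of choosing a bandwidth by leave-one-out likelihood cross-validation (CVh) and by least squares cross-validation (LSCVh); it illustrates the criteria only and is not the authors' simulation setup.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.normal(size=40)                      # placeholder "location" sample

def loo_log_likelihood(x, h):
    """CVh score: mean leave-one-out log density under a Gaussian kernel."""
    n = len(x)
    d = x[:, None] - x[None, :]
    k = norm.pdf(d / h) / h
    np.fill_diagonal(k, 0.0)                 # leave the i-th point out
    f_loo = k.sum(axis=1) / (n - 1)
    return np.mean(np.log(f_loo))

def lscv_score(x, h):
    """LSCVh score: estimate of the integrated squared error (up to a constant)."""
    n = len(x)
    d = x[:, None] - x[None, :]
    # the integral of f_hat^2 has a closed form for Gaussian kernels
    int_f2 = norm.pdf(d / h, scale=np.sqrt(2)).sum() / (n ** 2 * h)
    k = norm.pdf(d / h) / h
    np.fill_diagonal(k, 0.0)
    cross = 2.0 * k.sum() / (n * (n - 1))
    return int_f2 - cross

grid = np.linspace(0.05, 1.5, 100)
h_cv = grid[np.argmax([loo_log_likelihood(x, h) for h in grid])]
h_lscv = grid[np.argmin([lscv_score(x, h) for h in grid])]
print(f"CVh bandwidth: {h_cv:.3f}, LSCVh bandwidth: {h_lscv:.3f}")
```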
Choice of Prognostic Estimators in Joint Models by Estimating Differences of Expected Conditional Kullback–Leibler Risks
Prognostic estimators for a clinical event may use repeated measurements of markers in addition to fixed covariates. These measurements can be linked to the clinical event by joint models that involve latent features. When the objective is to choose between different prognostic estimators based on joint models, the conventional Akaike information criterion is not well adapted, and the decision should be based on predictive accuracy. We define an adapted risk function called the expected prognostic cross-entropy. We define another risk function for the case of right-censored observations, the expected prognostic observed cross-entropy (EPOCE). These risks can be estimated by leave-one-out cross-validation, for which we give approximate formulas and asymptotic distributions. The approximated cross-validated estimator CVPOLa of EPOCE is studied in simulation and applied to the comparison of several joint latent class models for prognosis of recurrence of prostate cancer using prostate-specific antigen measurements.
Direct importance estimation for covariate shift adaptation
A situation where training and test samples follow different input distributions is called covariate shift. Under covariate shift, standard learning methods such as maximum likelihood estimation are no longer consistent; weighted variants according to the ratio of test and training input densities are consistent. Therefore, accurately estimating the density ratio, called the importance, is one of the key issues in covariate shift adaptation. A naive approach to this task is to first estimate training and test input densities separately and then estimate the importance by taking the ratio of the estimated densities. However, this naive approach tends to perform poorly since density estimation is a hard task, particularly in high-dimensional cases. In this paper, we propose a direct importance estimation method that does not involve density estimation. Our method is equipped with a natural cross-validation procedure, and hence tuning parameters such as the kernel width can be objectively optimized. Furthermore, we give rigorous mathematical proofs for the convergence of the proposed algorithm. Simulations illustrate the usefulness of our approach.
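The abstract describes fitting the importance w(x) = p_test(x)/p_train(x) directly, without separate density estimates. The sketch below follows that idea with a kernel model for w and a projected gradient update that maximizes the test-sample log-likelihood under a normalization constraint on the training sample; the data, kernel width, step size, and number of centers are placeholders, and the paper's cross-validated choice of the kernel width is omitted.

```python
import numpy as np

rng = np.random.default_rng(1)
x_train = rng.normal(0.0, 1.0, size=(200, 1))   # training inputs, from p_train
x_test  = rng.normal(0.5, 0.8, size=(200, 1))   # test inputs, from p_test (covariate shift)

def gaussian_kernel(a, b, sigma):
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def direct_importance(x_tr, x_te, sigma=0.5, n_centers=50, eps=0.01, iters=2000):
    """Fit w(x) = sum_l alpha_l K(x, c_l) so that w approximates p_test/p_train,
    maximizing the test log-likelihood under a normalization over training data."""
    centers = x_te[rng.choice(len(x_te), n_centers, replace=False)]
    A = gaussian_kernel(x_te, centers, sigma)             # n_te x n_centers
    b = gaussian_kernel(x_tr, centers, sigma).mean(axis=0)
    alpha = np.ones(n_centers)
    for _ in range(iters):
        alpha += eps * (A.T @ (1.0 / (A @ alpha))) / len(A)   # gradient of test log-likelihood
        alpha += b * (1.0 - b @ alpha) / (b @ b)              # restore the normalization constraint
        alpha = np.maximum(alpha, 0.0)
        alpha /= b @ alpha
    return lambda x: gaussian_kernel(x, centers, sigma) @ alpha

w = direct_importance(x_train, x_test)
# the final normalization makes the training-sample mean of w equal to 1 by construction
print("mean importance on training data:", w(x_train).mean())
```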
Choice of Estimators Based on Different Observations: Modified AIC and LCV Criteria
It is quite common in epidemiology that we wish to assess the quality of estimators on a particular set of information, whereas the estimators may use a larger set of information. Two examples are studied: the first occurs when we construct a model for an event which happens if a continuous variable is above a certain threshold. We can compare estimators based on the observation of only the event or on the whole continuous variable. The other example is that of predicting the survival based only on survival information or using in addition information on a disease. We develop modified Akaike information criterion (AIC) and likelihood cross-validation (LCV) criteria to compare estimators in this non-standard situation. We show that a normalized difference of AIC has a bias equal to o(n⁻¹) if the estimators are based on well-specified models; a normalized difference of LCV always has a bias equal to o(n⁻¹). A simulation study shows that both criteria work well, although the normalized difference of LCV tends to be better and is more robust. Moreover, in the case of well-specified models the difference of risks boils down to the difference of statistical risks, which can be rather precisely estimated. For 'compatible' models the difference of risks is often the main term, but there can also be a difference of mis-specification risks.
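As a toy version of the abstract's first example (assessing, on the event scale, an estimator that uses only the event against one that uses the whole continuous variable), one can compare leave-one-out log-likelihoods of the event under both estimators. The threshold, data, and the sign and normalization of the LCV quantity below are illustrative assumptions, not the paper's exact definitions.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
y = rng.normal(1.0, 2.0, size=100)       # continuous variable (placeholder)
c = 0.0
e = (y > c).astype(float)                # the event of interest: Y above the threshold

def loo_event_loglik_direct(e):
    """Estimator using only the event: leave-one-out Bernoulli probability."""
    out = np.empty(len(e))
    for i in range(len(e)):
        p = np.clip(np.delete(e, i).mean(), 1e-6, 1 - 1e-6)
        out[i] = np.log(p if e[i] == 1 else 1.0 - p)
    return out

def loo_event_loglik_via_normal(y, e, c):
    """Estimator using the continuous variable: normal fit, then P(Y > c)."""
    out = np.empty(len(y))
    for i in range(len(y)):
        yi = np.delete(y, i)
        mu, sd = yi.mean(), yi.std(ddof=1)
        p = 1.0 - norm.cdf((c - mu) / sd)
        out[i] = np.log(p if e[i] == 1 else 1.0 - p)
    return out

lcv_event = loo_event_loglik_direct(e).mean()         # higher is better in this sketch
lcv_cont = loo_event_loglik_via_normal(y, e, c).mean()
print(f"LCV (event only): {lcv_event:.4f}")
print(f"LCV (continuous): {lcv_cont:.4f}")
print(f"normalized difference: {lcv_event - lcv_cont:.4f}")
```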
Robust Likelihood Cross-Validation for Kernel Density Estimation
Likelihood cross-validation for kernel density estimation is known to be sensitive to extreme observations and heavy-tailed distributions. We propose a robust likelihood-based cross-validation method to select bandwidths in multivariate density estimation. We derive this bandwidth selector within the framework of robust maximum likelihood estimation. This method establishes a smooth transition from likelihood cross-validation for nonextreme observations to least squares cross-validation for extreme observations, thereby combining the efficiency of likelihood cross-validation and the robustness of least squares cross-validation. We also suggest a simple rule to select the transition threshold. We demonstrate the finite sample performance and practical usefulness of the proposed method via Monte Carlo simulations and a real data application on Chinese air pollution.
Local likelihood density estimation for interval censored data
The authors propose a class of procedures for local likelihood estimation from data that are either interval-censored or that have been aggregated into bins. One such procedure relies on an algorithm that generalizes existing self-consistency algorithms by introducing kernel smoothing at each step of the iteration. The entire class of procedures yields estimates that are obtained as solutions of fixed point equations. By discretizing and applying numerical integration, the authors use fixed point theory to study convergence of algorithms for the class. Rapid convergence is effected by the implementation of a local EM algorithm as a global Newton iteration. The latter requires an explicit solution of the local likelihood equations which can be found by using the symbolic Newton-Raphson algorithm, if necessary.
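A discretized sketch of the self-consistency idea with a kernel smoothing step inserted into each iteration is given below: mass is redistributed over a grid according to the censoring intervals and then smoothed. This is only a rough stand-in for the local likelihood procedure and the Newton acceleration described above; the interval construction, grid, and bandwidth are placeholders.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
t = rng.gamma(2.0, 1.0, size=150)        # latent event times (placeholder)
left = np.floor(t * 2) / 2               # observations recorded only as
right = left + 0.5                       # half-unit-wide intervals

grid = np.linspace(0.0, t.max() + 1.0, 200)
dx = grid[1] - grid[0]

def smoothed_self_consistency(left, right, grid, h=0.3, iters=100):
    """Self-consistency iteration for interval-censored data with a kernel
    smoothing step after each redistribution (a discretized sketch only)."""
    n, m = len(left), len(grid)
    inside = (grid[None, :] >= left[:, None]) & (grid[None, :] < right[:, None])
    K = norm.pdf((grid[:, None] - grid[None, :]) / h) / h
    f = np.full(m, 1.0 / (m * dx))                         # start from a uniform density
    for _ in range(iters):
        p = f * dx
        denom = inside @ p                                 # current mass in each interval
        p = p * (inside / denom[:, None]).sum(axis=0) / n  # EM-style redistribution
        f = K @ p                                          # kernel smoothing of the masses
        f /= (f * dx).sum()                                # renormalize on the grid
    return f

f_hat = smoothed_self_consistency(left, right, grid)
print("estimated mean:", (grid * f_hat * dx).sum())
```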
Non-parametric adaptive bandwidth selection for kernel estimators of spatial intensity functions
We introduce a new fully non-parametric two-step adaptive bandwidth selection method for kernel estimators of spatial point process intensity functions, based on the Campbell-Mecke formula and Abramson's square root law. We present a simulation study to assess its performance relative to other adaptive and global bandwidth selectors, investigate the influence of the pilot estimator, and apply the technique to two data sets: a pattern of trees and an earthquake catalogue.
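Abramson's square root law itself is easy to state: point-specific bandwidths scale like the inverse square root of a pilot estimate. A one-dimensional sketch of an adaptive kernel intensity estimate built that way is shown below; the paper's two-step selector for the global and pilot bandwidths is not reproduced, and h0 is a fixed placeholder.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
pts = np.sort(rng.exponential(1.0, size=120))              # placeholder 1-D point pattern

def abramson_intensity(pts, x_eval, h0=0.4):
    """Adaptive kernel intensity estimate: pilot with a global bandwidth, then
    point-specific bandwidths h_i = h0 * (pilot(x_i) / g)^(-1/2), with g the
    geometric mean of the pilot values (Abramson's square root law)."""
    pilot = norm.pdf((pts[:, None] - pts[None, :]) / h0).sum(axis=1) / h0
    g = np.exp(np.mean(np.log(pilot)))
    h_i = h0 * np.sqrt(g / pilot)                           # smaller bandwidths where points are dense
    k = norm.pdf((x_eval[:, None] - pts[None, :]) / h_i[None, :]) / h_i[None, :]
    return k.sum(axis=1)                                    # intensity, not normalized to a density

x_eval = np.linspace(0.0, pts.max(), 400)
lam = abramson_intensity(pts, x_eval)
# the integrated intensity should be close to the number of points
print("integrated intensity:", (lam * (x_eval[1] - x_eval[0])).sum())
```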
Assessment of the Generalization Ability of the ASTM E900-15 Embrittlement Trend Curve by Means of Monte Carlo Cross-Validation
The standard ASTM E900-15 provides an analytical expression to determine the transition temperature shift exhibited by Charpy V-notch data at 41 J for irradiated pressure vessel materials as a function of the variables copper, nickel, phosphorus, manganese, irradiation temperature, neutron fluence, and product form. The 26 free parameters included in this embrittlement correlation were fitted through maximum likelihood estimation using the PLOTTER-BASELINE database, which contains 1878 observations from commercial power reactors. The complexity of this model, derived from its high number of free parameters, invites a consideration of the possible existence of overfitting. The goal of a good predictive model is to generalize well from the training data used to fit its free parameters to new data from the problem domain. Overfitting takes place when a model, due to its high complexity, learns not only the signal but also the noise in the training data, to the extent that its performance on new data suffers. This paper proposes the resampling method of Monte Carlo cross-validation to estimate the putative overfitting level of the ASTM E900-15 predictive model. This methodology is general and can be employed with any predictive model. After 5000 iterations of Monte Carlo cross-validation, large training and test datasets (7,035,000 and 2,355,000 instances, respectively) were obtained and compared to measure the amount of overfitting. A slightly lower prediction capacity was observed in the test set, both in terms of R² (0.871 vs. 0.877 in the training set) and RMSE (13.53 °C vs. 13.22 °C in the training set). In addition, strongly statistically significant differences, which contrast with the subtle differences observed in R² and RMSE, were obtained both between the means and between the variances of the training and test sets. This result, which may seem paradoxical, can be properly interpreted from a correct understanding of the practical meaning of the p-value. In conclusion, the ASTM E900-15 embrittlement trend curve possesses good generalization ability and exhibits a limited amount of overfitting.
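The Monte Carlo cross-validation procedure itself is straightforward to sketch: repeatedly split the data at random, refit the model on the training part, and compare pooled train and test R² and RMSE. The snippet below does this with synthetic data and an ordinary least-squares stand-in for the 26-parameter correlation, since neither the PLOTTER-BASELINE database nor the E900-15 functional form is reproduced here.

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.uniform(-2, 2, size=(1878, 3))                    # placeholder features, not the real database
y = X @ np.array([1.0, -0.5, 2.0]) + 0.3 * X[:, 0] ** 2 + rng.normal(0, 0.5, 1878)

def fit_predict(X_tr, y_tr, X_te):
    """Placeholder model: ordinary least squares with quadratic terms,
    standing in for the fitted embrittlement correlation."""
    expand = lambda A: np.hstack([np.ones((len(A), 1)), A, A ** 2])
    beta, *_ = np.linalg.lstsq(expand(X_tr), y_tr, rcond=None)
    return expand(X_tr) @ beta, expand(X_te) @ beta

def rmse(y, yhat): return np.sqrt(np.mean((y - yhat) ** 2))
def r2(y, yhat): return 1.0 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)

n, n_iter, test_frac = len(y), 500, 0.25
scores = []
for _ in range(n_iter):                                   # Monte Carlo cross-validation loop
    idx = rng.permutation(n)
    cut = int(n * (1 - test_frac))
    tr, te = idx[:cut], idx[cut:]
    yhat_tr, yhat_te = fit_predict(X[tr], y[tr], X[te])
    scores.append((r2(y[tr], yhat_tr), r2(y[te], yhat_te),
                   rmse(y[tr], yhat_tr), rmse(y[te], yhat_te)))
r2_tr, r2_te, rmse_tr, rmse_te = np.mean(scores, axis=0)
print(f"train R2 {r2_tr:.3f} vs test R2 {r2_te:.3f}; "
      f"train RMSE {rmse_tr:.3f} vs test RMSE {rmse_te:.3f}")
```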
Selecting the best home range model: an information-theoretic approach
Choosing an appropriate home range model is important for describing space use by animals and understanding the ecological processes affecting animal movement. Traditional approaches for choosing among home range models have not resulted in general, consistent, and unambiguous criteria that can be applied to individual data sets. We present a new application of information-theoretic model selection that overcomes many of the limitations of traditional approaches, as follows. (1) It alleviates the need to know the true home range to assess home range models, thus allowing performance to be evaluated with data on individual animals. (2) The best model can be chosen from a set of candidate models with the proper balance between fit and complexity. (3) If candidate home range models are based on underlying ecological processes, researchers can use the selected model not only to describe the home range, but also to infer the importance of various ecological processes affecting animal movements within the home range.
SEMIPARAMETRIC ZERO-INFLATED MODELING IN MULTI-ETHNIC STUDY OF ATHEROSCLEROSIS (MESA)
We analyze the Agatston score of coronary artery calcium (CAC) from the Multi-Ethnic Study of Atherosclerosis (MESA) using the semiparametric zero-inflated modeling approach, where the observed CAC scores from this cohort consist of high frequency of zeroes and continuously distributed positive values. Both partially constrained and unconstrained models are considered to investigate the underlying biological processes of CAC development from zero to positive, and from small amount to large amount. Different from existing studies, a model selection procedure based on likelihood cross-validation is adopted to identify the optimal model, which is justified by comparative Monte Carlo studies. A shrinkaged version of cubic regression spline is used for model estimation and variable selection simultaneously. When applying the proposed methods to the MESA data analysis, we show that the two biological mechanisms influencing the initiation of CAC and the magnitude of CAC when it is positive are better characterized by an unconstrained zero-inflated normal model. Our results are significantly different from those in published studies, and may provide further insights into the biological mechanisms underlying CAC development in humans. This highly flexible statistical framework can be applied to zero-inflated data analyses in other areas.