Catalogue Search | MBRL
101 result(s) for "62J99"
REGULARIZED ESTIMATION IN SPARSE HIGH-DIMENSIONAL TIME SERIES MODELS
2015
Many scientific and economic problems involve the analysis of high-dimensional time series datasets. However, theoretical studies in high-dimensional statistics to date rely primarily on the assumption of independent and identically distributed (i.i.d.) samples. In this work, we focus on stable Gaussian processes and investigate the theoretical properties of ℓ₁-regularized estimates in two important statistical problems in the context of high-dimensional time series: (a) stochastic regression with serially correlated errors and (b) transition matrix estimation in vector autoregressive (VAR) models. We derive nonasymptotic upper bounds on the estimation errors of the regularized estimates and establish that consistent estimation under high-dimensional scaling is possible via ℓ₁-regularization for a large class of stable processes under sparsity constraints. A key technical contribution of the work is to introduce a measure of stability for stationary processes using their spectral properties that provides insight into the effect of dependence on the accuracy of the regularized estimates. With this proposed stability measure, we establish some useful deviation bounds for dependent data, which can be used to study several important regularized estimates in a time series setting.
Journal Article
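As a rough illustration of problem (b) in the abstract above, the sketch below fits a VAR(1) transition matrix by running an ℓ₁-penalized least-squares regression for each row, solved with plain coordinate descent. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`lasso_cd`, `fit_var1_lasso`, `simulate_var1`), the penalty level `lam=0.1`, the lag order 1, and the simulated matrix `A_true` are all hypothetical choices made for the example.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_objective(X, y, b, lam):
    """(1/2n) * ||y - Xb||^2 + lam * ||b||_1."""
    n = len(X)
    rss = sum((y[i] - sum(X[i][j] * b[j] for j in range(len(b)))) ** 2
              for i in range(n))
    return rss / (2 * n) + lam * sum(abs(v) for v in b)


def lasso_cd(X, y, lam, n_iter=100):
    """Cyclic coordinate descent for the lasso objective above."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # residual y - Xb, kept up to date incrementally
    z = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if z[j] == 0.0:
                continue
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + z[j] * b[j]
            new = soft_threshold(rho, lam) / z[j]
            delta = new - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new
    return b


def fit_var1_lasso(series, lam):
    """Row i of the estimate regresses x_t[i] on the lagged vector x_{t-1}."""
    X = series[:-1]
    p = len(series[0])
    return [lasso_cd(X, [row[i] for row in series[1:]], lam) for i in range(p)]


def simulate_var1(A, T, rng):
    """Simulate x_t = A x_{t-1} + e_t with standard Gaussian noise."""
    p = len(A)
    x = [0.0] * p
    out = []
    for _ in range(T):
        x = [sum(A[i][j] * x[j] for j in range(p)) + rng.gauss(0.0, 1.0)
             for i in range(p)]
        out.append(x)
    return out


# Hypothetical sparse, stable transition matrix used only for this demo.
A_true = [[0.5, 0.0, 0.0], [0.0, 0.4, 0.0], [0.3, 0.0, -0.5]]
series = simulate_var1(A_true, 300, random.Random(0))
A_hat = fit_var1_lasso(series, lam=0.1)
```

Larger values of `lam` drive more entries of `A_hat` to exactly zero; a sufficiently large penalty returns the all-zero matrix.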
ON THE DEFINITION OF A CONFOUNDER
2013
The causal inference literature has provided a clear formal definition of confounding expressed in terms of counterfactual independence. The literature has not, however, come to any consensus on a formal definition of a confounder, as it has given priority to the concept of confounding over that of a confounder. We consider a number of candidate definitions arising from various more informal statements made in the literature. We consider the properties satisfied by each candidate definition, principally focusing on (i) whether under the candidate definition control for all "confounders" suffices to control for "confounding" and (ii) whether each confounder in some context helps eliminate or reduce confounding bias. Several of the candidate definitions do not have these two properties. Only one candidate definition of those considered satisfies both properties. We propose that a "confounder" be defined as a pre-exposure covariate C for which there exists a set of other covariates X such that the effect of the exposure on the outcome is unconfounded conditional on (X, C) but such that for no proper subset of (X, C) is the effect of the exposure on the outcome unconfounded given the subset. We also provide a conditional analogue of the above definition, and we propose that a variable that helps reduce bias but not eliminate it be referred to as a "surrogate confounder." These definitions are closely related to those given by Robins and Morgenstern [Comput. Math. Appl. 14 (1987) 869-916]. The implications that hold among the various candidate definitions are discussed.
Journal Article
NUCLEAR-NORM PENALIZATION AND OPTIMAL RATES FOR NOISY LOW-RANK MATRIX COMPLETION
2011
This paper deals with the trace regression model where n entries or linear combinations of entries of an unknown m₁ × m₂ matrix A₀ corrupted by noise are observed. We propose a new nuclear-norm penalized estimator of A₀ and establish a general sharp oracle inequality for this estimator for arbitrary values of n, m₁, m₂ under the condition of isometry in expectation. Then this method is applied to the matrix completion problem. In this case, the estimator admits a simple explicit form, and we prove that it satisfies oracle inequalities with faster rates of convergence than in the previous works. They are valid, in particular, in the high-dimensional setting m₁m₂ ≫ n. We show that the obtained rates are optimal up to logarithmic factors in a minimax sense and also derive, for any fixed matrix A₀, a nonminimax lower bound on the rate of convergence of our estimator, which coincides with the upper bound up to a constant factor. Finally, we show that our procedure provides an exact recovery of the rank of A₀ with probability close to 1. We also discuss the statistical learning setting where there is no underlying model determined by A₀, and the aim is to find the best trace regression model approximating the data. As a by-product, we show that, under the restricted eigenvalue condition, the usual vector Lasso estimator satisfies a sharp oracle inequality (i.e., an oracle inequality with leading constant 1).
Journal Article
Gradient boosting with extreme-value theory for wildfire prediction
2023
This paper details the approach of the team Kohrrelation in the 2021 Extreme Value Analysis data challenge, dealing with the prediction of wildfire counts and sizes over the contiguous US. Our approach uses ideas from extreme-value theory in a machine learning context with theoretically justified loss functions for gradient boosting. We devise a spatial cross-validation scheme and show that in our setting it provides a better proxy for test set performance than naive cross-validation. The predictions are benchmarked against boosting approaches with different loss functions, and perform competitively in terms of the score criterion, finally placing second in the competition ranking.
Journal Article
A UNIFIED APPROACH TO MODEL SELECTION AND SPARSE RECOVERY USING REGULARIZED LEAST SQUARES
2009
Model selection and sparse recovery are two important problems for which many regularization methods have been proposed. We study the properties of regularization methods in both problems under the unified framework of regularized least squares with concave penalties. For model selection, we establish conditions under which a regularized least squares estimator enjoys a nonasymptotic property, called the weak oracle property, where the dimensionality can grow exponentially with sample size. For sparse recovery, we present a sufficient condition that ensures the recoverability of the sparsest solution. In particular, we approach both problems by considering a family of penalties that give a smooth homotopy between L₀ and L₁ penalties. We also propose the sequentially and iteratively reweighted squares (SIRS) algorithm for sparse recovery. Numerical studies support our theoretical results and demonstrate the advantage of our new methods for model selection and sparse recovery.
Journal Article
Functional Linear Regression That's Interpretable
2009
Regression models to relate a scalar Y to a functional predictor X(t) are becoming increasingly common. Work in this area has concentrated on estimating a coefficient function, β(t), with Y related to X(t) through ∫ β(t)X(t)dt. Regions where β(t) ≠ 0 correspond to places where there is a relationship between X(t) and Y. Alternatively, points where β(t) = 0 indicate no relationship. Hence, for interpretation purposes, it is desirable for a regression procedure to be capable of producing estimates of β(t) that are exactly zero over regions with no apparent relationship and have simple structures over the remaining regions. Unfortunately, most fitting procedures result in an estimate for β(t) that is rarely exactly zero and has unnatural wiggles, making the curve hard to interpret. In this article we introduce a new approach which uses variable selection ideas, applied to various derivatives of β(t), to produce estimates that are interpretable, flexible and accurate. We call our method "Functional Linear Regression That's Interpretable" (FLiRTI) and demonstrate it on simulated and real-world data sets. In addition, non-asymptotic theoretical bounds on the estimation error are presented. The bounds provide strong theoretical motivation for our approach.
Journal Article
Research on the innovative inheritance application of Guangxi minority elements in clothing design under the perspective of rural revitalization
2024
To better protect historical culture and cater to public preferences, this paper proposes an approach to the innovative inheritance of Guangxi minority costume design under rural revitalization. Based on shifts in object similarity, the objective function values are driven to predetermined values, new cluster centers are computed iteratively, and initial values are selected so that the local minima converge. Euclidean distance is used to measure the distance between objects and assign them to the corresponding clusters, and the data set and its complexity are processed continuously to ensure that no clustering bias is formed. A nonlinear function is used to construct a curve relationship that reflects the possible change in probability of clothing innovation and inheritance events, and the function is logarithmically transformed to increase the probability of occurrence of ethnic clothing innovation and inheritance under rural revitalization. The nonlinear relationship was transformed into a linear one so that the variables conformed to the normal distribution, and the clothing design innovation inheritance under rural revitalization was extracted. The analysis of the results shows that ethnic elements have a greater influence on contemporary product design under the perspective of rural revitalization, and the algorithm proposed in this paper has an accuracy rate of 65% for the study of clothing design.
Journal Article
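The clustering step in the abstract above (assign objects to clusters by Euclidean distance, then recompute cluster centers iteratively) reads like a k-means-style procedure. The abstract does not give the exact algorithm, so the following is only a generic k-means sketch under that assumption; the function name `kmeans`, the toy points, and the initial centers are hypothetical.

```python
import math


def kmeans(points, init_centers, n_iter=20):
    """Plain k-means: alternate nearest-center assignment and center updates."""
    centers = [tuple(c) for c in init_centers]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center (Euclidean).
        labels = [
            min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points
        ]
        # Update step: each center becomes the mean of its assigned points.
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                dim = len(old)
                new_centers.append(
                    tuple(sum(m[d] for m in members) / len(members)
                          for d in range(dim))
                )
            else:
                new_centers.append(old)  # keep an empty cluster's center as-is
        if new_centers == centers:
            break  # converged: centers no longer move
        centers = new_centers
    return labels, centers


# Toy data: two well-separated groups in the plane.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = kmeans(points, init_centers=[(0, 0), (10, 10)])
```

The result depends on the initial centers, which is why the abstract's remark about selecting initial values matters in practice.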
Online updating method with new variables for big data streams
2018
For big data arriving in streams, online updating is an important statistical method that breaks the storage barrier and the computational barrier under certain circumstances. In the regression context, online updating algorithms assume that the set of predictor variables does not change, and consequently cannot incorporate new variables that may become available midway through the data stream. A naive approach would be to discard all previous information and start updating with new variables from scratch. We propose a method that utilizes the information from earlier data in the online updating algorithm with bias corrections to improve efficiency. The method is developed for linear models first, and then extended to estimating equations for generalized linear models. Closed-form expressions for the efficiency gain over the naive approach are derived in a particular linear model setting. We compare the performance of our proposed bias-correcting approach and the naive approach in simulation studies with data generated from a normal linear model and a logistic regression model. The method is applied to a study on airline delay, where reasons for delays were only available more recently, starting in 2003.
Journal Article
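For context on the record above, the sketch below shows the standard fixed-predictor form of online updating for a linear model: only the running sums XᵀX and Xᵀy are stored, so earlier batches can be discarded. This is the baseline setting the abstract describes, not the authors' bias-corrected method for new variables; the class name `OnlineOLS`, the helper `solve`, and the noise-free toy data are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x


class OnlineOLS:
    """Least squares over a stream, keeping only X'X and X'y in memory."""

    def __init__(self, p):
        self.p = p
        self.xtx = [[0.0] * p for _ in range(p)]
        self.xty = [0.0] * p

    def update(self, X_batch, y_batch):
        """Fold one batch into the running sufficient statistics."""
        for x, y in zip(X_batch, y_batch):
            for j in range(self.p):
                self.xty[j] += x[j] * y
                for k in range(self.p):
                    self.xtx[j][k] += x[j] * x[k]

    def coef(self):
        """Current estimate: solution of (X'X) beta = X'y."""
        return solve(self.xtx, self.xty)


# Toy stream: y = 1 + 2x exactly, delivered in two batches.
rows = [([1.0, float(x)], 1.0 + 2.0 * x) for x in range(10)]
model = OnlineOLS(p=2)
for batch in (rows[:5], rows[5:]):
    model.update([r[0] for r in batch], [r[1] for r in batch])
beta = model.coef()
```

Because the stored matrices have a fixed size p × p, this scheme cannot absorb a predictor that appears midway through the stream without extra work, which is the gap the abstract addresses.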
Clustering for Bivariate Functional Data
by Wan, Yan-ling; Cao, Shi-yun; Zhou, Yan-qiu
in Algorithms, Applications of Mathematics, Bivariate analysis
2024
In this paper, we consider the clustering of bivariate functional data where each random surface consists of a set of curves recorded repeatedly for each subject. The k-centres surface clustering method based on marginal functional principal component analysis is proposed for the bivariate functional data, and a novel clustering criterion is presented where both the random surface and its partial derivative function in two directions are considered. In addition, we consider two other clustering methods: k-centres surface clustering based on product functional principal component analysis or double functional principal component analysis. Simulation results indicate that the proposed methods perform well in terms of both the correct classification rate and the adjusted Rand index. The approaches are further illustrated through empirical analysis of human mortality data.
Journal Article
Residual Diagnostic Methods for Bell-Type Count Models
by Akdur, Hatice Tul Kubra; Kilic, Duygu; Bayrak, Hulya
in Binomial distribution, Biostatistics, Blood cells
2024
Count datasets represented as integers are commonly encountered in various scientific fields, encompassing scenarios such as the number of species in a habitat, the number of accidents at a junction, and the number of infected cells. This type of data often entails the presence of zero counts, which can be notably prevalent within the dataset. Recently, the zero-inflated Bell distribution family has been introduced to address the substantial occurrence of zeros in count datasets. Model diagnosis is a crucial step to ensure the appropriateness of a fitted model for the given data. While Pearson and deviance residuals are commonly employed for diagnosing count models in practical applications, it is widely acknowledged that these residuals do not adhere to normality when applied to count data. In the present study, our focus lies in evaluating the effectiveness of conventional diagnostic tools, including Pearson and deviance residuals, as well as randomized quantile residuals (RQRs) for the novel Bell and zero-inflated Bell models, which have been proposed as solutions to address overdispersion and zero inflation, respectively. Through this investigation, we aim to shed light on the performance of these residuals in the context of these newly proposed models. In the simulation study, the normality of randomized quantile residuals, assessed through the Shapiro-Wilk test's p-values, is investigated for detecting overdispersion and zero inflation for the Bell-type regression models. The findings of this study indicate the superiority of RQRs in checking distributional assumptions and reveal that RQRs possess the capability to detect overdispersion and zero inflation under Bell-type models. The number of infected blood cells is used in the application part of the study to illustrate the residual diagnostics of Bell-type regression models. Poisson, Bell, negative binomial, and their zero-inflated versions are utilized to analyze the infected blood cells dataset. Model fit criteria are employed to compare the analysis results of these count models, both in terms of goodness of fit and residual diagnostics.
Journal Article
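To make the idea of a randomized quantile residual concrete, the sketch below computes one for a count y under a fitted discrete model: draw u uniformly between F(y − 1) and F(y), then map it through the standard normal quantile function. It uses a Poisson model as a stand-in rather than the Bell or zero-inflated Bell distributions studied in the article, and the function names `poisson_cdf` and `randomized_quantile_residual` are hypothetical.

```python
import math
import random
from statistics import NormalDist


def poisson_cdf(k, mu):
    """P(Y <= k) for Y ~ Poisson(mu); returns 0 for k < 0."""
    if k < 0:
        return 0.0
    term = math.exp(-mu)  # P(Y = 0)
    total = term
    for i in range(1, k + 1):
        term *= mu / i  # P(Y = i) from P(Y = i - 1)
        total += term
    return min(total, 1.0)


def randomized_quantile_residual(y, mu, rng):
    """RQR for a count y under a Poisson(mu) fit (stand-in model)."""
    lo = poisson_cdf(y - 1, mu)
    hi = poisson_cdf(y, mu)
    u = rng.uniform(lo, hi)  # randomize within the jump of the CDF at y
    eps = 1e-12
    u = min(max(u, eps), 1.0 - eps)  # keep u strictly inside (0, 1)
    return NormalDist().inv_cdf(u)


# Example: residuals for a few counts under a fitted mean of 2.
rng = random.Random(1)
residuals = [randomized_quantile_residual(y, 2.0, rng) for y in (0, 1, 2, 5, 10)]
```

If the fitted model is correct, these residuals are approximately standard normal, which is what a normality check such as the Shapiro-Wilk test mentioned in the abstract would examine.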