Catalogue Search | MBRL
Explore the vast range of titles available.
478 result(s) for "62F10"
ESTIMATING AND UNDERSTANDING EXPONENTIAL RANDOM GRAPH MODELS
2013
We introduce a method for the theoretical analysis of exponential random graph models. The method is based on a large-deviations approximation to the normalizing constant shown to be consistent using theory developed by Chatterjee and Varadhan [European J. Combin. 32 (2011) 1000–1017]. The theory explains a host of difficulties encountered by applied workers: many distinct models have essentially the same MLE, rendering the problems "practically" ill-posed. We give the first rigorous proofs of "degeneracy" observed in these models. Here, almost all graphs have essentially no edges or are essentially complete. We supplement recent work of Bhamidi, Bresler and Sly [2008 IEEE 49th Annual IEEE Symposium on Foundations of Computer Science (FOCS) (2008) 803–812, IEEE] showing that for many models, the extra sufficient statistics are useless: most realizations look like the results of a simple Erdős–Rényi model. We also find classes of models where the limiting graphs differ from Erdős–Rényi graphs. A limitation of our approach, inherited from the limitation of graph limit theory, is that it works only for dense graphs.
Journal Article
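A toy Glauber-dynamics simulation illustrates the Erdős–Rényi comparison made in this abstract. Everything below is an illustrative sketch, not the paper's method: the edge-triangle model, the parameter values, and the function names are assumptions chosen for the demonstration; with beta2 = 0 the sampler reduces exactly to an Erdős–Rényi graph.

```python
import numpy as np

def glauber_ergm(n, beta1, beta2, sweeps=50, seed=0):
    """Glauber dynamics for the edge-triangle ERGM with density
    proportional to exp(beta1 * edges + beta2 * triangles).
    Each edge is resampled from its conditional distribution
    given the rest of the graph."""
    rng = np.random.default_rng(seed)
    A = np.zeros((n, n), dtype=np.int8)
    for _ in range(sweeps):
        for i in range(n):
            for j in range(i + 1, n):
                # toggling edge (i, j) changes the triangle count by the
                # number of common neighbours of i and j
                common = int((A[i] & A[j]).sum())
                p = 1.0 / (1.0 + np.exp(-(beta1 + beta2 * common)))
                A[i, j] = A[j, i] = int(rng.random() < p)
    return A

# With beta2 = 0 this is exactly Erdős–Rényi with edge probability
# sigmoid(beta1); the abstract's point is that for many (beta1, beta2)
# the realizations are statistically indistinguishable from that case.
A = glauber_ergm(30, 0.0, 0.0)
density = A.sum() / (30 * 29)
```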
LOCAL CASE-CONTROL SAMPLING: EFFICIENT SUBSAMPLING IN IMBALANCED DATA SETS
2014
For classification problems with significant class imbalance, subsampling can reduce computational costs at the price of inflated variance in estimating model parameters. We propose a method for subsampling efficiently for logistic regression by adjusting the class balance locally in feature space via an accept-reject scheme. Our method generalizes standard case-control sampling, using a pilot estimate to preferentially select examples whose responses are conditionally rare given their features. The biased subsampling is corrected by a post-hoc analytic adjustment to the parameters. The method is simple and requires one parallelizable scan over the full data set. Standard case-control sampling is inconsistent under model misspecification for the population risk-minimizing coefficients θ*. By contrast, our estimator is consistent for θ* provided that the pilot estimate is. Moreover, under correct specification and with a consistent, independent pilot estimate, our estimator has exactly twice the asymptotic variance of the full-sample MLE, even if the selected subsample comprises a minuscule fraction of the full data set, as happens when the original data are severely imbalanced. The factor of two improves to $1 + \frac{1}{c}$ if we multiply the baseline acceptance probabilities by c > 1 (and weight points with acceptance probability greater than 1), taking roughly $\frac{1 + c}{2}$ times as many data points into the subsample. Experiments on simulated and real data show that our method can substantially outperform standard case-control subsampling.
Journal Article
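The accept-reject scheme this abstract describes can be sketched on simulated data. This is a minimal illustration, not the authors' code: the simulated data, the pilot-subsample size, and the use of scikit-learn's LogisticRegression are all assumptions, and the add-back correction step follows the standard local case-control recipe.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, d = 100_000, 5
X = rng.normal(size=(n, d))
theta_true = np.array([1.0, -0.5, 0.25, 0.0, 0.0])
# negative intercept makes positives rare: a severely imbalanced data set
y = rng.random(n) < 1.0 / (1.0 + np.exp(-(X @ theta_true - 4.0)))

# Step 1: pilot estimate from a small uniform subsample.
pilot_idx = rng.choice(n, 2000, replace=False)
pilot = LogisticRegression().fit(X[pilot_idx], y[pilot_idx])
p_tilde = pilot.predict_proba(X)[:, 1]

# Step 2: accept each point with probability |y - p_tilde(x)|, favouring
# points whose response is conditionally rare given their features.
accept = rng.random(n) < np.abs(y - p_tilde)

# Step 3: fit on the accepted subsample, then apply the analytic
# correction by adding the pilot coefficients back.
sub = LogisticRegression().fit(X[accept], y[accept])
theta_hat = sub.coef_.ravel() + pilot.coef_.ravel()
```

Only a few percent of the points survive the accept-reject step here, yet the corrected coefficients track the full-sample signal.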
ESTIMATION OF HIGH-DIMENSIONAL LOW-RANK MATRICES
2011
Suppose that we observe entries or, more generally, linear combinations of entries of an unknown m × T matrix A corrupted by noise. We are particularly interested in the high-dimensional setting where the number mT of unknown entries can be much larger than the sample size N. Motivated by several applications, we consider estimation of matrix A under the assumption that it has small rank. This can be viewed as a dimension reduction or sparsity assumption. In order to shrink toward a low-rank representation, we investigate penalized least squares estimators with a Schatten-p quasinorm penalty term, p ≤ 1. We study these estimators under two possible assumptions: a modified version of the restricted isometry condition and a uniform bound on the ratio "empirical norm induced by the sampling operator/Frobenius norm." The main results are stated as nonasymptotic upper bounds on the prediction risk and on the Schatten-q risk of the estimators, where q ∈ [p, 2]. The rates that we obtain for the prediction risk are of the form rm/N (for m = T), up to logarithmic factors, where r is the rank of A. The particular examples of multi-task learning and matrix completion are worked out in detail. The proofs are based on tools from the theory of empirical processes. As a by-product, we derive bounds for the kth entropy numbers of the quasi-convex Schatten class embeddings $S_{p}^{M}\hookrightarrow S_{2}^{M}$, p < 1, which are of independent interest.
Journal Article
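For p = 1 the Schatten-p penalty is the nuclear norm, whose proximal operator is singular value soft-thresholding; the matrix completion example from the abstract can then be attacked with a standard proximal gradient solver. The sketch below is a generic algorithm under illustrative parameters (tau, iteration count, sampling rate), not the estimator analysis in the paper.

```python
import numpy as np

def svt(Z, tau):
    """Prox of tau * nuclear norm: soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def complete(Y, mask, tau=0.2, iters=500):
    """Proximal gradient descent (step size 1) for
    0.5 * ||mask * (A - Y)||_F^2 + tau * ||A||_*."""
    A = np.zeros_like(Y)
    for _ in range(iters):
        A = svt(A - mask * (A - Y), tau)
    return A

rng = np.random.default_rng(0)
m, r = 30, 2
truth = rng.normal(size=(m, r)) @ rng.normal(size=(r, m))  # rank-2 matrix
mask = rng.random((m, m)) < 0.6          # observe ~60% of the entries
A_hat = complete(mask * truth, mask)
rel_err = np.linalg.norm(A_hat - truth) / np.linalg.norm(truth)
```

With the rank (2) far below the sampling budget, the unobserved entries are recovered to small relative error.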
ASYMPTOTICS IN DIRECTED EXPONENTIAL RANDOM GRAPH MODELS WITH AN INCREASING BI-DEGREE SEQUENCE
Although asymptotic analyses of undirected network models based on degree sequences have started to appear in recent literature, it remains an open problem to study statistical properties of directed network models. In this paper, we provide for the first time a rigorous analysis of directed exponential random graph models using the in-degrees and out-degrees as sufficient statistics with binary as well as continuous weighted edges. We establish the uniform consistency and the asymptotic normality for the maximum likelihood estimate, when the number of parameters grows and only one realized observation of the graph is available. One key technique in the proofs is to approximate the inverse of the Fisher information matrix using a simple matrix with high accuracy. Numerical studies confirm our theoretical findings.
Journal Article
The unit Muth distribution: statistical properties and applications
This paper introduces a bounded probability distribution which is derived from the Muth distribution. The main statistical properties are studied and analytical expressions are provided for the moments, incomplete moments, inverse of the cumulative distribution function, extropy, and Lorenz and Bonferroni curves, among others. Moreover, the distribution possesses both monotone and non-monotone hazard rate functions, so it is rich enough to model real data. Different estimation methods are applied to estimate the parameters of the model, and a Monte Carlo simulation study assesses their performance. The usefulness in practical applications is illustrated using two real data sets, and the results show that the proposed distribution provides better fits than other competing distributions commonly used to model data with bounded support.
Journal Article
FIXED POINTS OF THE EM ALGORITHM AND NONNEGATIVE RANK BOUNDARIES
2015
Mixtures of r independent distributions for two discrete random variables can be represented by matrices of nonnegative rank r. Likelihood inference for the model of such joint distributions leads to problems in real algebraic geometry that are addressed here for the first time. We characterize the set of fixed points of the Expectation-Maximization algorithm, and we study the boundary of the space of matrices with nonnegative rank at most 3. Both of these sets correspond to algebraic varieties with many irreducible components.
Journal Article
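The EM iteration whose fixed points the abstract studies can be written out for a small joint-probability table. The sketch below implements one standard formulation of the E- and M-steps for a mixture of r product distributions; the table, rank, and initialization are illustrative choices, not taken from the paper.

```python
import numpy as np

def em_step(P, w, A, B):
    """One EM update for the mixture model P_ij ≈ sum_k w_k A_ik B_jk,
    where the columns of A and B are probability vectors."""
    M = (A * w) @ B.T                       # current model table
    w_new = np.empty_like(w)
    A_new = np.empty_like(A)
    B_new = np.empty_like(B)
    for k in range(len(w)):
        # E-step: expected counts attributed to mixture component k
        T = P * (w[k] * np.outer(A[:, k], B[:, k]) / np.maximum(M, 1e-300))
        # M-step: renormalize the expected counts
        w_new[k] = T.sum()
        A_new[:, k] = T.sum(axis=1) / max(w_new[k], 1e-300)
        B_new[:, k] = T.sum(axis=0) / max(w_new[k], 1e-300)
    return w_new / w_new.sum(), A_new, B_new

rng = np.random.default_rng(0)
# a 4 x 4 joint distribution of nonnegative rank 2
A0 = rng.dirichlet(np.ones(4), 2).T       # 4 x 2, columns sum to 1
B0 = rng.dirichlet(np.ones(4), 2).T
P = 0.5 * np.outer(A0[:, 0], B0[:, 0]) + 0.5 * np.outer(A0[:, 1], B0[:, 1])

w = np.full(2, 0.5)
A = rng.dirichlet(np.ones(4), 2).T
B = rng.dirichlet(np.ones(4), 2).T
for _ in range(500):
    w, A, B = em_step(P, w, A, B)
err = np.abs(P - (A * w) @ B.T).sum()
```

Iterating to convergence lands on a fixed point of the update; as the abstract notes, the set of such fixed points has many components, and not every run reaches the global optimum.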
VARIABLE SELECTION IN LINEAR MIXED EFFECTS MODELS
2012
This paper is concerned with the selection and estimation of fixed and random effects in linear mixed effects models. We propose a class of nonconcave penalized profile likelihood methods for selecting and estimating important fixed effects. To overcome the difficulty of the unknown covariance matrix of random effects, we propose to use a proxy matrix in the penalized profile likelihood. We establish conditions on the choice of the proxy matrix and show that the proposed procedure enjoys model selection consistency when the number of fixed effects is allowed to grow exponentially with the sample size. We further propose a group variable selection strategy to simultaneously select and estimate important random effects, where the unknown covariance matrix of random effects is replaced with a proxy matrix. We prove that, with the proxy matrix appropriately chosen, the proposed procedure can identify all true random effects with asymptotic probability one, where the dimension of the random effects vector is allowed to increase exponentially with the sample size. Monte Carlo simulation studies are conducted to examine the finite-sample performance of the proposed procedures. We further illustrate the proposed procedures via a real data example.
Journal Article
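A heavily simplified sketch of the fixed-effects selection idea: within-group centering stands in for profiling out random intercepts with a proxy covariance, and an L1 penalty (via scikit-learn's LassoCV) stands in for the paper's nonconcave penalty. All data, thresholds, and variable names are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LassoCV

# simulate y = X beta + Z b + eps with random intercepts per group;
# only the first 3 of 20 fixed effects are active
rng = np.random.default_rng(0)
n_groups, per = 50, 8
n, p = n_groups * per, 20
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = [2.0, -1.5, 1.0]
groups = np.repeat(np.arange(n_groups), per)
b = rng.normal(size=n_groups)                 # random intercepts
y = X @ beta + b[groups] + rng.normal(size=n)

# crude "proxy" step: center within each group to remove the random
# intercepts, then run a penalized regression on the fixed effects
Xc = X - np.array([X[groups == g].mean(0) for g in range(n_groups)])[groups]
yc = y - np.array([y[groups == g].mean() for g in range(n_groups)])[groups]
sel = LassoCV(cv=5).fit(Xc, yc)
support = np.flatnonzero(np.abs(sel.coef_) > 0.1)
```

The penalized fit recovers the three active fixed effects while shrinking the remaining coefficients toward zero.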
PARAMETRIC ESTIMATION. FINITE SAMPLE THEORY
2012
The paper aims at reconsidering the famous Le Cam LAN theory. The main features of the approach which make it different from the classical one are as follows: (1) the study is nonasymptotic, that is, the sample size is fixed and does not tend to infinity; (2) the parametric assumption is possibly misspecified and the underlying data distribution can lie beyond the given parametric family. These two features enable us to bridge the gap between parametric and nonparametric theory and to build a unified framework for statistical estimation. The main results include large deviation bounds for the (quasi) maximum likelihood and the local quadratic bracketing of the log-likelihood process. The latter yields a number of important corollaries for statistical inference: concentration, confidence and risk bounds, expansion of the maximum likelihood estimate, etc. All these corollaries are stated in a nonclassical way admitting a model misspecification and finite samples. However, the classical asymptotic results including the efficiency bounds can be easily derived as corollaries of the obtained nonasymptotic statements. At the same time, the new bracketing device works well in situations with large or growing parameter dimension in which the classical parametric theory fails. The general results are illustrated for the i.i.d. setup as well as for generalized linear and median estimation. The results apply for any dimension of the parameter space and provide a quantitative lower bound on the sample size yielding the root-n accuracy.
Journal Article
A stochastic model of area-biased Kpenadidum distribution with the characteristics and applications to real-lifetime data
2025
In this research, we explore the statistical characteristics of the Area-Biased Kpenadidum Distribution (ABKD), a novel probability model. The maximum likelihood method is used to estimate the parameters, and the asymptotic properties of the estimators are discussed. The new distribution is compared to the Shanker, Lindley, and Kpenadidum distributions. When the distribution was fitted to cancer data, a good fit was observed.
Journal Article
Point and Interval Estimation of Weibull Parameters Based on Joint Progressively Censored Data
2019
The analysis of progressively censored data has received considerable attention in the last few years. In this paper, we consider the joint progressive censoring scheme for two populations. It is assumed that the lifetimes of items from the two populations follow Weibull distributions with the same shape but different scale parameters. Based on the joint progressive censoring scheme, we first consider the maximum likelihood estimators of the unknown parameters whenever they exist. We provide Bayesian inferences of the unknown parameters under fairly general priors on the shape and scale parameters. The Bayes estimators and the associated credible intervals cannot be obtained in closed form, and we propose to use the importance sampling technique to compute them. Further, we consider the problem when it is known a priori that the expected lifetime of one population is smaller than the other. We provide order-restricted classical and Bayesian inferences of the unknown parameters. Monte Carlo simulations are performed to assess the performance of the different estimators and the associated confidence and credible intervals. One real data set is analyzed for illustrative purposes.
Journal Article
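Ignoring the censoring scheme for brevity, the shared-shape, different-scale Weibull likelihood at the core of this setup can be maximized numerically. The sketch below uses complete (uncensored) simulated samples and scipy.optimize.minimize; the sample sizes and parameter values are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def neg_loglik(log_params, x1, x2):
    """Negative log-likelihood for two Weibull samples sharing a common
    shape k but having different scales s1, s2 (log-parameterized so the
    optimizer works over an unconstrained space)."""
    k, s1, s2 = np.exp(log_params)
    nll = 0.0
    for x, s in ((x1, s1), (x2, s2)):
        z = x / s
        # Weibull log-density: log(k/s) + (k-1) log(x/s) - (x/s)^k
        nll -= np.sum(np.log(k / s) + (k - 1.0) * np.log(z) - z**k)
    return nll

rng = np.random.default_rng(1)
x1 = 1.5 * rng.weibull(2.0, 500)   # population 1: shape 2, scale 1.5
x2 = 3.0 * rng.weibull(2.0, 500)   # population 2: shape 2, scale 3.0
res = minimize(neg_loglik, np.zeros(3), args=(x1, x2))
k_hat, s1_hat, s2_hat = np.exp(res.x)
```

The fitted scales preserve the order of the true expected lifetimes, which is the quantity the order-restricted inference in the paper constrains a priori.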