Catalogue Search | MBRL
Explore the vast range of titles available.
3,709 results for "Combinatorics. Ordered structures"
Multivariate Matching Methods That Are Monotonic Imbalance Bounding
by Iacus, Stefano M.; King, Gary; Porro, Giuseppe
in Applications, Causal inference, Combinatorics
2011
We introduce a new "Monotonic Imbalance Bounding" (MIB) class of matching methods for causal inference with a surprisingly large number of attractive statistical properties. MIB generalizes and extends in several new directions the only existing class, "Equal Percent Bias Reducing" (EPBR), which is designed to satisfy weaker properties and only in expectation. We also offer strategies to obtain specific members of the MIB class, and analyze in more detail one member of this class, Coarsened Exact Matching, from this new perspective. We offer a variety of analytical results and numerical simulations that demonstrate how members of the MIB class can dramatically improve inferences relative to EPBR-based matching methods.
Journal Article
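The Coarsened Exact Matching idea mentioned in the abstract can be illustrated with a minimal sketch: coarsen each covariate into bins, then exact-match treated and control units within the resulting strata. The covariates, cutpoints, and data below are invented for illustration and are not the authors' implementation.

```python
from collections import defaultdict

def coarsen(value, cutpoints):
    """Return the index of the bin that `value` falls into."""
    return sum(value >= c for c in cutpoints)

def cem(units, cutpoints_by_covariate):
    """Group units into strata by their coarsened covariate signatures,
    keeping only strata that contain both treated and control units."""
    strata = defaultdict(list)
    for unit in units:
        signature = tuple(
            coarsen(unit[cov], cuts)
            for cov, cuts in cutpoints_by_covariate.items()
        )
        strata[signature].append(unit)
    # A stratum yields matches only if it mixes treated and control units.
    return {
        sig: group for sig, group in strata.items()
        if any(u["treated"] for u in group)
        and any(not u["treated"] for u in group)
    }

# Illustrative data: age and income are coarsened into coarse bins.
units = [
    {"treated": True,  "age": 23, "income": 30_000},
    {"treated": False, "age": 25, "income": 32_000},
    {"treated": True,  "age": 47, "income": 90_000},
    {"treated": False, "age": 62, "income": 31_000},
]
matched = cem(units, {"age": [30, 50], "income": [40_000, 80_000]})
```

Only the young/low-income stratum survives here: it is the only one containing both a treated and a control unit, which is exactly the pruning that bounds imbalance.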
From real affine geometry to complex geometry
2011
We construct from a real affine manifold with singularities (a tropical manifold) a degeneration of Calabi-Yau manifolds. This solves a fundamental problem in mirror symmetry. Furthermore, a striking feature of our approach is that it yields an explicit and canonical order-by-order description of the degeneration via families of tropical trees. This gives complete control of the B-model side of mirror symmetry in terms of tropical geometry. For example, we expect that our deformation parameter is a canonical coordinate, and expect period calculations to be expressible in terms of tropical curves. We anticipate this will lead to a proof of mirror symmetry via tropical methods.
Journal Article
Sparse Models and Methods for Optimal Instruments with an Application to Eminent Domain
2012
We develop results for the use of Lasso and post-Lasso methods to form first-stage predictions and estimate optimal instruments in linear instrumental variables (IV) models with many instruments, p. Our results apply even when p is much larger than the sample size, n. We show that the IV estimator based on using Lasso or post-Lasso in the first stage is root-n consistent and asymptotically normal when the first stage is approximately sparse, that is, when the conditional expectation of the endogenous variables given the instruments can be well-approximated by a relatively small set of variables whose identities may be unknown. We also show that the estimator is semiparametrically efficient when the structural error is homoscedastic. Notably, our results allow for imperfect model selection, and do not rely upon the unrealistic "beta-min" conditions that are widely used to establish validity of inference following model selection (see also Belloni, Chernozhukov, and Hansen (2011b)). In simulation experiments, the Lasso-based IV estimator with a data-driven penalty performs well compared to recently advocated many-instrument robust procedures. In an empirical example dealing with the effect of judicial eminent domain decisions on economic outcomes, the Lasso-based IV estimator outperforms an intuitive benchmark. Optimal instruments are conditional expectations. In developing the IV results, we establish a series of new results for Lasso and post-Lasso estimators of nonparametric conditional expectation functions which are of independent theoretical and practical interest. We construct a modification of Lasso designed to deal with non-Gaussian, heteroscedastic disturbances that uses a data-weighted $\ell_1$-penalty function. By innovatively using moderate deviation theory for self-normalized sums, we provide convergence rates for the resulting Lasso and post-Lasso estimators that are as sharp as the corresponding rates in the homoscedastic Gaussian case under the condition that $\log p = o(n^{1/3})$. We also provide a data-driven method for choosing the penalty level that must be specified in obtaining Lasso and post-Lasso estimates and establish its asymptotic validity under non-Gaussian, heteroscedastic disturbances.
Journal Article
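The first-stage selection idea can be sketched with a small coordinate-descent Lasso: soft-threshold each coefficient of a regression of the endogenous variable on the instruments, so that irrelevant instruments are zeroed out. The toy orthogonal design and the fixed penalty below are illustrative; the paper's data-driven penalty and post-Lasso refitting are not reproduced here.

```python
def soft_threshold(rho, lam):
    # Soft-thresholding operator at the core of coordinate-descent Lasso.
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def lasso(X, y, lam, n_iter=100):
    """Coordinate descent for: minimize 0.5*||y - X b||^2 + lam*||b||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of column j with the partial residual.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * beta[k]
                                      for k in range(p) if k != j))
                for i in range(n)
            )
            z = sum(X[i][j] ** 2 for i in range(n))
            beta[j] = soft_threshold(rho, lam) / z
    return beta

# Toy first stage: the endogenous variable d depends only on instrument 0,
# so the penalty should zero out the two irrelevant instruments.
X = [[1, 1, 1], [-1, 1, 1], [1, -1, 1], [-1, -1, 1],
     [1, 1, -1], [-1, 1, -1], [1, -1, -1], [-1, -1, -1]]
d = [2 * row[0] for row in X]
beta = lasso(X, d, lam=2.0)
```

With this orthogonal design the solution is exact: the relevant coefficient is shrunk from 2 toward zero by the penalty, while the irrelevant ones are set to exactly zero.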
Estimating the technology of cognitive and noncognitive skill formation
by Cunha, Flavio; Schennach, Susanne M.; Heckman, James J.
in anchoring test scores, Applications, Bildungsinvestition (educational investment)
2010
This paper formulates and estimates multistage production functions for children's cognitive and noncognitive skills. Skills are determined by parental environments and investments at different stages of childhood. We estimate the elasticity of substitution between investments in one period and stocks of skills in that period to assess the benefits of early investment in children compared to later remediation. We establish nonparametric identification of a general class of production technologies based on nonlinear factor models with endogenous inputs. A by-product of our approach is a framework for evaluating childhood and schooling interventions that does not rely on arbitrarily scaled test scores as outputs and recognizes the differential effects of the same bundle of skills in different tasks. Using the estimated technology, we determine optimal targeting of interventions to children with different parental and personal birth endowments. Substitutability decreases in later stages of the life cycle in the production of cognitive skills. It is roughly constant across stages of the life cycle in the production of noncognitive skills. This finding has important implications for the design of policies that target the disadvantaged. For most configurations of disadvantage it is optimal to invest relatively more in the early stages of childhood than in later stages.
Journal Article
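The substitution question at the heart of the abstract can be made concrete with a CES technology; the specification below is an illustrative form of the class of production functions involved, with symbols chosen here for exposition rather than taken from the paper:

```latex
\theta_{t+1} = \Big[ \gamma\, \theta_t^{\phi_t} + (1 - \gamma)\, I_t^{\phi_t} \Big]^{1/\phi_t},
\qquad
\sigma_t = \frac{1}{1 - \phi_t},
```

where $\theta_t$ is the stock of skill, $I_t$ the investment in period $t$, and $\sigma_t$ the elasticity of substitution between them. The finding that substitutability decreases in later stages for cognitive skills corresponds to $\phi_t$ falling with $t$.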
Relational retrieval using a combination of path-constrained random walks
2010
Scientific literature with rich metadata can be represented as a labeled directed graph. This graph representation enables a number of scientific tasks such as ad hoc retrieval or named entity recognition (NER) to be formulated as typed proximity queries in the graph. One popular proximity measure is called Random Walk with Restart (RWR), and much work has been done on the supervised learning of RWR measures by associating each edge label with a parameter. In this paper, we describe a novel learnable proximity measure which instead uses one weight per edge label sequence: proximity is defined by a weighted combination of simple "path experts", each corresponding to following a particular sequence of labeled edges. Experiments on eight tasks in two subdomains of biology show that the new learning method significantly outperforms the RWR model (both trained and untrained). We also extend the method to support two additional types of experts to model intrinsic properties of entities: query-independent experts, which generalize the PageRank measure, and popular entity experts, which allow rankings to be adjusted for particular entities that are especially important.
Journal Article
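The "path expert" idea can be sketched directly: each edge label induces a random-walk step that splits a node's probability mass over its labeled out-neighbors, a path expert is the distribution reached by following a fixed label sequence, and the final score is a weighted combination of experts. The graph, labels, and weights below are invented for illustration; in the paper the per-path weights are learned.

```python
def step(dist, edges):
    """One walk step: follow edges with a given label, splitting each
    node's mass uniformly over its out-neighbors under that label."""
    out = {}
    for node, mass in dist.items():
        targets = edges.get(node, [])
        for t in targets:
            out[t] = out.get(t, 0.0) + mass / len(targets)
    return out

def path_expert(start, label_sequence, graph):
    """Distribution reached by following a fixed sequence of edge labels."""
    dist = {start: 1.0}
    for label in label_sequence:
        dist = step(dist, graph[label])
    return dist

# Tiny citation-style graph: label "a" = authored, label "c" = cites.
graph = {
    "a": {0: [1, 2]},          # author 0 wrote papers 1 and 2
    "c": {1: [3], 2: [3, 4]},  # papers 1 and 2 cite papers 3 and 4
}
e1 = path_expert(0, ["a", "c"], graph)  # author -> paper -> cited paper
e2 = path_expert(0, ["a"], graph)       # author -> paper
# Weighted combination of path experts (weights here are made up).
score = {n: 0.7 * e1.get(n, 0.0) + 0.3 * e2.get(n, 0.0) for n in range(5)}
```

Here paper 3 is reachable by the two-step expert through both intermediate papers, so it ends up with the highest combined score.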
Energy-Based Geometric Multi-model Fitting
by Boykov, Yuri; Isack, Hossam
in Algorithmics. Computability. Computer arithmetics, Algorithms, Analysis
2012
Geometric model fitting is a typical chicken-and-egg problem: data points should be clustered based on geometric proximity to models whose unknown parameters must be estimated at the same time. Most existing methods, including generalizations of RANSAC, greedily search for models with the most inliers (within a threshold), ignoring the overall classification of points. We formulate geometric multi-model fitting as an optimal labeling problem with a global energy function balancing geometric errors and regularity of inlier clusters. Minimizing energies with regularization based on spatial coherence (on some near-neighbor graph) and/or label costs is NP-hard. Standard combinatorial algorithms with guaranteed approximation bounds (e.g. α-expansion) can minimize such regularization energies over a finite set of labels, but they are not directly applicable to a continuum of labels, e.g. in line fitting. Our proposed approach (PEaRL) combines model sampling from data points, as in RANSAC, with iterative re-estimation of inliers and models' parameters based on a global regularization functional. This technique efficiently explores the continuum of labels in the context of energy minimization. In practice, PEaRL converges to a good-quality local minimum of the energy, automatically selecting a small number of models that best explain the whole data set. Our tests demonstrate that our energy-based approach significantly improves the current state of the art in geometric model fitting, currently dominated by various greedy generalizations of RANSAC.
Journal Article
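Labeling energies of the kind described in the abstract typically take the following shape; this is a common generic form, with notation chosen here for exposition rather than copied from the paper:

```latex
E(L) = \sum_{p} \big\| p - \mathcal{M}_{L_p} \big\|
     + \lambda \sum_{(p,q) \in \mathcal{N}} w_{pq}\, \delta(L_p \neq L_q)
     + \beta \, |\mathcal{M}(L)|,
```

combining the geometric error of each point $p$ to its assigned model $\mathcal{M}_{L_p}$, spatial coherence over a near-neighbor graph $\mathcal{N}$, and a per-model label cost that discourages using too many models.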
Identifiability of Parameters in Latent Structure Models with Many Observed Variables
2009
While hidden class models of various types arise in many statistical applications, it is often difficult to establish the identifiability of their parameters. Focusing on models in which there is some structure of independence of some of the observed variables conditioned on hidden ones, we demonstrate a general approach for establishing identifiability utilizing algebraic arguments. A theorem of J. Kruskal for a simple latent-class model with finite state space lies at the core of our results, though we apply it to a diverse set of models. These include mixtures of both finite and nonparametric product distributions, hidden Markov models and random graph mixture models, and lead to a number of new results and improvements to old ones. In the parametric setting, this approach indicates that for such models, the classical definition of identifiability is typically too strong. Instead generic identifiability holds, which implies that the set of nonidentifiable parameters has measure zero, so that parameter inference is still meaningful. In particular, this sheds light on the properties of finite mixtures of Bernoulli products, which have been used for decades despite being known to have nonidentifiable parameters. In the nonparametric setting, we again obtain identifiability only when certain restrictions are placed on the distributions that are mixed, but we explicitly describe the conditions.
Journal Article
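The theorem of Kruskal at the core of these results can be stated concretely; the formulation below is a standard one, reproduced here from memory for orientation:

```latex
T = \sum_{r=1}^{R} a_r \otimes b_r \otimes c_r
\quad \text{is unique (up to permutation and scaling) whenever} \quad
k_A + k_B + k_C \ge 2R + 2,
```

where $k_A$ denotes the Kruskal rank of the factor matrix $A = [a_1, \dots, a_R]$, i.e. the largest $k$ such that every set of $k$ columns of $A$ is linearly independent (similarly for $B$ and $C$). Latent-class parameters become identifiable when the model's conditional-probability matrices satisfy such rank conditions.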
p4est: Scalable Algorithms for Parallel Adaptive Mesh Refinement on Forests of Octrees
by Burstedde, Carsten; Wilcox, Lucas C.; Ghattas, Omar
in Adaptive algorithms, Algorithms, Coarsening
2011
(ProQuest: ... denotes formulae/symbols omitted.) The authors present scalable algorithms for parallel adaptive mesh refinement and coarsening (AMR), partitioning, and 2:1 balancing on computational domains composed of multiple connected two-dimensional quadtrees or three-dimensional octrees, referred to as a forest of octrees. By distributing the union of octants from all octrees in parallel, they combine the high scalability proven previously for adaptive single-octree algorithms with the geometric flexibility that can be achieved by arbitrarily connected hexahedral macromeshes, in which each macroelement is the root of an adapted octree. A key concept of their approach is an encoding scheme of the interoctree connectivity that permits arbitrary relative orientations between octrees. They demonstrate the parallel scalability of p4est on its own and in combination with two geophysics codes. Using p4est they generate and adapt multioctree meshes with up to 5.13 x ... octants on as many as 220,320 CPU cores and execute the 2:1 balance algorithm in less than 10 seconds per million octants per process.
Journal Article
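Octants in such codes are commonly linearized by Morton (z-order) indices, obtained by interleaving the bits of the coordinates; sorting leaves by this index gives the space-filling-curve ordering used for parallel partitioning. The 2-D sketch below is a generic illustration of the technique, not p4est's actual encoding (the 3-D case interleaves three coordinates).

```python
def morton2(x, y, bits=16):
    """Interleave the bits of x and y into a z-order index:
    x occupies the even bit positions, y the odd ones."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z

def demorton2(z, bits=16):
    """Invert morton2: recover the (x, y) coordinates."""
    x = y = 0
    for i in range(bits):
        x |= ((z >> (2 * i)) & 1) << i
        y |= ((z >> (2 * i + 1)) & 1) << i
    return x, y

# Sorting quadtree leaves by Morton index yields the z-order traversal
# along which octants can be split evenly across processes.
cells = sorted((morton2(x, y), (x, y)) for x in range(4) for y in range(4))
```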
Maximizing Non-monotone Submodular Functions
by Vondrák, Jan; Mirrokni, Vahab S.; Feige, Uriel
in Algorithmics. Computability. Computer arithmetics, Algorithms, Applied mathematics
2011
Submodular maximization generalizes many important problems including Max Cut in directed and undirected graphs and hypergraphs, certain constraint satisfaction problems, and maximum facility location problems. Unlike the problem of minimizing submodular functions, the problem of maximizing submodular functions is NP-hard. In this paper, we design the first constant-factor approximation algorithms for maximizing nonnegative (non-monotone) submodular functions. In particular, we give a deterministic local-search $\frac{1}{3}$-approximation and a randomized $\frac{2}{5}$-approximation algorithm for maximizing nonnegative submodular functions. We also show that a uniformly random set gives a $\frac{1}{4}$-approximation. For symmetric submodular functions, we show that a random set gives a $\frac{1}{2}$-approximation, which can also be achieved by deterministic local search. These algorithms work in the value oracle model, where the submodular function is accessible through a black box returning $f(S)$ for a given set $S$. We show that in this model, a $(\frac{1}{2}+\epsilon)$-approximation for symmetric submodular functions would require an exponential number of queries for any fixed $\epsilon>0$. In the model where $f$ is given explicitly (as a sum of nonnegative submodular functions, each depending only on a constant number of elements), we prove NP-hardness of $(\frac{5}{6}+\epsilon)$-approximation in the symmetric case and NP-hardness of $(\frac{3}{4}+\epsilon)$-approximation in the general case.
Journal Article
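The deterministic local-search approximation can be sketched on a cut function, a canonical submodular objective: start from the best singleton, add or remove single elements while the value improves, and finally return the better of the set and its complement. The plain improvement rule below omits the $(1+\epsilon)$-scaled threshold the paper uses to guarantee polynomial running time; the graph is invented for illustration.

```python
def cut_value(S, edges):
    """Submodular (symmetric) cut function: edges crossing S."""
    return sum((u in S) != (v in S) for u, v in edges)

def local_search_max(V, edges):
    """Local search for non-monotone submodular maximization:
    add/remove single elements while the objective improves,
    then return the better of S and its complement."""
    f = lambda S: cut_value(S, edges)
    S = max(({v} for v in V), key=f)   # start from the best singleton
    improved = True
    while improved:
        improved = False
        for v in V:
            T = S - {v} if v in S else S | {v}
            if f(T) > f(S):
                S, improved = T, True
    comp = set(V) - S
    return S if f(S) >= f(comp) else comp

V = {0, 1, 2, 3}
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]   # 4-cycle: max cut is 4
best = local_search_max(V, edges)
```

On this 4-cycle the search reaches one of the two bipartitions, which is the global optimum; in general the guarantee is a constant fraction of the optimum, not exactness.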
Fuzzy Extractors: How to Generate Strong Keys from Biometrics and Other Noisy Data
by Dodis, Yevgeniy; Smith, Adam; Ostrovsky, Rafail
in Applied sciences, Biometrics, Combinatorics
2008
We provide formal definitions and efficient secure techniques for turning noisy information into keys usable for any cryptographic application, and, in particular, reliably and securely authenticating biometric data. Our techniques apply not just to biometric information, but to any keying material that, unlike traditional cryptographic keys, is (1) not reproducible precisely and (2) not distributed uniformly. We propose two primitives: a fuzzy extractor reliably extracts nearly uniform randomness $R$ from its input; the extraction is error-tolerant in the sense that $R$ will be the same even if the input changes, as long as it remains reasonably close to the original. Thus, $R$ can be used as a key in a cryptographic application. A secure sketch produces public information about its input $w$ that does not reveal $w$ and yet allows exact recovery of $w$ given another value that is close to $w$. Thus, it can be used to reliably reproduce error-prone biometric inputs without incurring the security risk inherent in storing them. We define the primitives to be both formally secure and versatile, generalizing much prior work. In addition, we provide nearly optimal constructions of both primitives for various measures of "closeness" of input data, such as Hamming distance, edit distance, and set difference.
Journal Article
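The secure-sketch primitive for Hamming distance can be sketched with the classic code-offset construction: publish s = w XOR c for a random codeword c, and recover w by decoding w' XOR s back to c. The parameters below (4 data bits, 3-fold repetition, correcting one flip per block) are toy choices for illustration; real constructions use stronger codes and layer a randomness extractor on top to obtain the key R.

```python
import random

R = 3  # repetition factor: majority decoding fixes 1 flip per block

def encode(bits):
    """Repetition-code encoding: repeat every bit R times."""
    return [b for b in bits for _ in range(R)]

def decode(bits):
    """Majority-decode each block of R bits."""
    return [int(sum(bits[i:i + R]) * 2 > R) for i in range(0, len(bits), R)]

def sketch(w, rng):
    """Code-offset secure sketch: s = w XOR (random codeword)."""
    c = encode([rng.randrange(2) for _ in range(len(w) // R)])
    return [wi ^ ci for wi, ci in zip(w, c)]

def recover(w_noisy, s):
    """Recover the original w from a noisy reading and the sketch:
    w_noisy XOR s is a noisy codeword; decode it, then XOR back."""
    c = encode(decode([wi ^ si for wi, si in zip(w_noisy, s)]))
    return [ci ^ si for ci, si in zip(c, s)]

rng = random.Random(0)
w = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1]  # 12-bit "biometric" reading
s = sketch(w, rng)
w_noisy = list(w)
w_noisy[5] ^= 1  # one bit of the fresh reading flips
```

Because s is w masked by a random codeword, it reveals only the coset of w, yet any reading within the code's correction radius recovers w exactly.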