Catalogue Search | MBRL
Explore the vast range of titles available.
3,709 results for "Combinatorics. Ordered structures"
Multivariate Matching Methods That Are Monotonic Imbalance Bounding
by Iacus, Stefano M.; King, Gary; Porro, Giuseppe
in Applications, Causal inference, Combinatorics
2011
We introduce a new "Monotonic Imbalance Bounding" (MIB) class of matching methods for causal inference with a surprisingly large number of attractive statistical properties. MIB generalizes and extends in several new directions the only existing class, "Equal Percent Bias Reducing" (EPBR), which is designed to satisfy weaker properties and only in expectation. We also offer strategies to obtain specific members of the MIB class, and analyze in more detail one member of this class, Coarsened Exact Matching, from this new perspective. We offer a variety of analytical results and numerical simulations that demonstrate how members of the MIB class can dramatically improve inferences relative to EPBR-based matching methods.
Journal Article
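The Coarsened Exact Matching idea mentioned in the abstract can be illustrated with a minimal sketch: coarsen each covariate into bins, then exact-match treated and control units within the resulting strata. The covariates, cutpoints, and data below are invented for illustration and are not the authors' implementation.

```python
from collections import defaultdict

def coarsen(value, cutpoints):
    """Return the index of the bin that `value` falls into."""
    return sum(value >= c for c in cutpoints)

def cem(units, cutpoints_by_covariate):
    """Group units into strata by their coarsened covariate signatures,
    keeping only strata that contain both treated and control units."""
    strata = defaultdict(list)
    for unit in units:
        signature = tuple(
            coarsen(unit[cov], cuts)
            for cov, cuts in cutpoints_by_covariate.items()
        )
        strata[signature].append(unit)
    # A stratum yields matches only if it mixes treated and control units.
    return {
        sig: group for sig, group in strata.items()
        if any(u["treated"] for u in group)
        and any(not u["treated"] for u in group)
    }

# Illustrative data: age and income are coarsened into coarse bins.
units = [
    {"treated": True,  "age": 23, "income": 30_000},
    {"treated": False, "age": 25, "income": 32_000},
    {"treated": True,  "age": 47, "income": 90_000},
    {"treated": False, "age": 62, "income": 31_000},
]
matched = cem(units, {"age": [30, 50], "income": [40_000, 80_000]})
```

Only the young/low-income stratum survives here: it is the only one containing both a treated and a control unit, which is exactly the pruning that bounds imbalance.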
From real affine geometry to complex geometry
2011
We construct from a real affine manifold with singularities (a tropical manifold) a degeneration of Calabi-Yau manifolds. This solves a fundamental problem in mirror symmetry. Furthermore, a striking feature of our approach is that it yields an explicit and canonical order-by-order description of the degeneration via families of tropical trees. This gives complete control of the B-model side of mirror symmetry in terms of tropical geometry. For example, we expect that our deformation parameter is a canonical coordinate, and expect period calculations to be expressible in terms of tropical curves. We anticipate this will lead to a proof of mirror symmetry via tropical methods.
Journal Article
Sparse Models and Methods for Optimal Instruments with an Application to Eminent Domain
2012
We develop results for the use of Lasso and post-Lasso methods to form first-stage predictions and estimate optimal instruments in linear instrumental variables (IV) models with many instruments, p. Our results apply even when p is much larger than the sample size, n. We show that the IV estimator based on using Lasso or post-Lasso in the first stage is root-n consistent and asymptotically normal when the first stage is approximately sparse, that is, when the conditional expectation of the endogenous variables given the instruments can be well-approximated by a relatively small set of variables whose identities may be unknown. We also show that the estimator is semiparametrically efficient when the structural error is homoscedastic. Notably, our results allow for imperfect model selection, and do not rely upon the unrealistic "beta-min" conditions that are widely used to establish validity of inference following model selection (see also Belloni, Chernozhukov, and Hansen (2011b)). In simulation experiments, the Lasso-based IV estimator with a data-driven penalty performs well compared to recently advocated many-instrument robust procedures. In an empirical example dealing with the effect of judicial eminent domain decisions on economic outcomes, the Lasso-based IV estimator outperforms an intuitive benchmark. Optimal instruments are conditional expectations. In developing the IV results, we establish a series of new results for Lasso and post-Lasso estimators of nonparametric conditional expectation functions which are of independent theoretical and practical interest. We construct a modification of Lasso designed to deal with non-Gaussian, heteroscedastic disturbances that uses a data-weighted $\ell_1$-penalty function. By innovatively using moderate deviation theory for self-normalized sums, we provide convergence rates for the resulting Lasso and post-Lasso estimators that are as sharp as the corresponding rates in the homoscedastic Gaussian case under the condition that $\log p = o(n^{1/3})$. We also provide a data-driven method for choosing the penalty level that must be specified in obtaining Lasso and post-Lasso estimates and establish its asymptotic validity under non-Gaussian, heteroscedastic disturbances.
Journal Article
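The first-stage selection idea can be sketched with a small coordinate-descent Lasso: soft-threshold each coefficient of a regression of the endogenous variable on the instruments, so that irrelevant instruments are zeroed out. The toy orthogonal design and the fixed penalty below are illustrative; the paper's data-driven penalty and post-Lasso refitting are not reproduced here.

```python
def soft_threshold(rho, lam):
    # Soft-thresholding operator at the core of coordinate-descent Lasso.
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def lasso(X, y, lam, n_iter=100):
    """Coordinate descent for: minimize 0.5*||y - X b||^2 + lam*||b||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of column j with the partial residual.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * beta[k]
                                      for k in range(p) if k != j))
                for i in range(n)
            )
            z = sum(X[i][j] ** 2 for i in range(n))
            beta[j] = soft_threshold(rho, lam) / z
    return beta

# Toy first stage: the endogenous variable d depends only on instrument 0,
# so the penalty should zero out the two irrelevant instruments.
X = [[1, 1, 1], [-1, 1, 1], [1, -1, 1], [-1, -1, 1],
     [1, 1, -1], [-1, 1, -1], [1, -1, -1], [-1, -1, -1]]
d = [2 * row[0] for row in X]
beta = lasso(X, d, lam=2.0)
```

With this orthogonal design the solution is exact: the relevant coefficient is shrunk from 2 toward zero by the penalty, while the irrelevant ones are set to exactly zero.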
Estimating the technology of cognitive and noncognitive skill formation
by Cunha, Flavio; Schennach, Susanne M.; Heckman, James J.
in anchoring test scores, Applications, Bildungsinvestition (educational investment)
2010
This paper formulates and estimates multistage production functions for children's cognitive and noncognitive skills. Skills are determined by parental environments and investments at different stages of childhood. We estimate the elasticity of substitution between investments in one period and stocks of skills in that period to assess the benefits of early investment in children compared to later remediation. We establish nonparametric identification of a general class of production technologies based on nonlinear factor models with endogenous inputs. A by-product of our approach is a framework for evaluating childhood and schooling interventions that does not rely on arbitrarily scaled test scores as outputs and recognizes the differential effects of the same bundle of skills in different tasks. Using the estimated technology, we determine optimal targeting of interventions to children with different parental and personal birth endowments. Substitutability decreases in later stages of the life cycle in the production of cognitive skills. It is roughly constant across stages of the life cycle in the production of noncognitive skills. This finding has important implications for the design of policies that target the disadvantaged. For most configurations of disadvantage it is optimal to invest relatively more in the early stages of childhood than in later stages.
Journal Article
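The substitution question at the heart of the abstract can be made concrete with a CES technology; the specification below is an illustrative form of the class of production functions involved, with symbols chosen here for exposition rather than taken from the paper:

```latex
\theta_{t+1} = \Big[ \gamma\, \theta_t^{\phi_t} + (1 - \gamma)\, I_t^{\phi_t} \Big]^{1/\phi_t},
\qquad
\sigma_t = \frac{1}{1 - \phi_t},
```

where $\theta_t$ is the stock of skill, $I_t$ the investment in period $t$, and $\sigma_t$ the elasticity of substitution between them. The finding that substitutability decreases in later stages for cognitive skills corresponds to $\phi_t$ falling with $t$.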
Relational retrieval using a combination of path-constrained random walks
2010
Scientific literature with rich metadata can be represented as a labeled directed graph. This graph representation enables a number of scientific tasks such as ad hoc retrieval or named entity recognition (NER) to be formulated as typed proximity queries in the graph. One popular proximity measure is called Random Walk with Restart (RWR), and much work has been done on the supervised learning of RWR measures by associating each edge label with a parameter. In this paper, we describe a novel learnable proximity measure which instead uses one weight per edge label sequence: proximity is defined by a weighted combination of simple "path experts", each corresponding to following a particular sequence of labeled edges. Experiments on eight tasks in two subdomains of biology show that the new learning method significantly outperforms the RWR model (both trained and untrained). We also extend the method to support two additional types of experts to model intrinsic properties of entities: query-independent experts, which generalize the PageRank measure, and popular entity experts, which allow rankings to be adjusted for particular entities that are especially important.
Journal Article
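The "path expert" idea can be sketched directly: each edge label induces a random-walk step that splits a node's probability mass over its labeled out-neighbors, a path expert is the distribution reached by following a fixed label sequence, and the final score is a weighted combination of experts. The graph, labels, and weights below are invented for illustration; in the paper the per-path weights are learned.

```python
def step(dist, edges):
    """One walk step: follow edges with a given label, splitting each
    node's mass uniformly over its out-neighbors under that label."""
    out = {}
    for node, mass in dist.items():
        targets = edges.get(node, [])
        for t in targets:
            out[t] = out.get(t, 0.0) + mass / len(targets)
    return out

def path_expert(start, label_sequence, graph):
    """Distribution reached by following a fixed sequence of edge labels."""
    dist = {start: 1.0}
    for label in label_sequence:
        dist = step(dist, graph[label])
    return dist

# Tiny citation-style graph: label "a" = authored, label "c" = cites.
graph = {
    "a": {0: [1, 2]},          # author 0 wrote papers 1 and 2
    "c": {1: [3], 2: [3, 4]},  # papers 1 and 2 cite papers 3 and 4
}
e1 = path_expert(0, ["a", "c"], graph)  # author -> paper -> cited paper
e2 = path_expert(0, ["a"], graph)       # author -> paper
# Weighted combination of path experts (weights here are made up).
score = {n: 0.7 * e1.get(n, 0.0) + 0.3 * e2.get(n, 0.0) for n in range(5)}
```

Here paper 3 is reachable by the two-step expert through both intermediate papers, so it ends up with the highest combined score.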
Energy-Based Geometric Multi-model Fitting
by Boykov, Yuri; Isack, Hossam
in Algorithmics. Computability. Computer arithmetics, Algorithms, Analysis
2012
Geometric model fitting is a typical chicken-and-egg problem: data points should be clustered based on geometric proximity to models whose unknown parameters must be estimated at the same time. Most existing methods, including generalizations of RANSAC, greedily search for models with the most inliers (within a threshold), ignoring the overall classification of points. We formulate geometric multi-model fitting as an optimal labeling problem with a global energy function balancing geometric errors and regularity of inlier clusters. Minimizing energies with regularization based on spatial coherence (on some near-neighbor graph) and/or label costs is NP-hard. Standard combinatorial algorithms with guaranteed approximation bounds (e.g. α-expansion) can minimize such regularization energies over a finite set of labels, but they are not directly applicable to a continuum of labels, e.g. in line fitting. Our proposed approach (PEaRL) combines model sampling from data points, as in RANSAC, with iterative re-estimation of inliers and models' parameters based on a global regularization functional. This technique efficiently explores the continuum of labels in the context of energy minimization. In practice, PEaRL converges to a good-quality local minimum of the energy, automatically selecting a small number of models that best explain the whole data set. Our tests demonstrate that our energy-based approach significantly improves the current state of the art in geometric model fitting, currently dominated by various greedy generalizations of RANSAC.
Journal Article
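Labeling energies of the kind described in the abstract typically take the following shape; this is a common generic form, with notation chosen here for exposition rather than copied from the paper:

```latex
E(L) = \sum_{p} \big\| p - \mathcal{M}_{L_p} \big\|
     + \lambda \sum_{(p,q) \in \mathcal{N}} w_{pq}\, \delta(L_p \neq L_q)
     + \beta \, |\mathcal{M}(L)|,
```

combining the geometric error of each point $p$ to its assigned model $\mathcal{M}_{L_p}$, spatial coherence over a near-neighbor graph $\mathcal{N}$, and a per-model label cost that discourages using too many models.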
Identifiability of Parameters in Latent Structure Models with Many Observed Variables
2009
While hidden class models of various types arise in many statistical applications, it is often difficult to establish the identifiability of their parameters. Focusing on models in which there is some structure of independence of some of the observed variables conditioned on hidden ones, we demonstrate a general approach for establishing identifiability utilizing algebraic arguments. A theorem of J. Kruskal for a simple latent-class model with finite state space lies at the core of our results, though we apply it to a diverse set of models. These include mixtures of both finite and nonparametric product distributions, hidden Markov models and random graph mixture models, and lead to a number of new results and improvements to old ones. In the parametric setting, this approach indicates that for such models, the classical definition of identifiability is typically too strong. Instead generic identifiability holds, which implies that the set of nonidentifiable parameters has measure zero, so that parameter inference is still meaningful. In particular, this sheds light on the properties of finite mixtures of Bernoulli products, which have been used for decades despite being known to have nonidentifiable parameters. In the nonparametric setting, we again obtain identifiability only when certain restrictions are placed on the distributions that are mixed, but we explicitly describe the conditions.
Journal Article
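The theorem of Kruskal at the core of these results can be stated concretely; the formulation below is a standard one, reproduced here from memory for orientation:

```latex
T = \sum_{r=1}^{R} a_r \otimes b_r \otimes c_r
\quad \text{is unique (up to permutation and scaling) whenever} \quad
k_A + k_B + k_C \ge 2R + 2,
```

where $k_A$ denotes the Kruskal rank of the factor matrix $A = [a_1, \dots, a_R]$, i.e. the largest $k$ such that every set of $k$ columns of $A$ is linearly independent (similarly for $B$ and $C$). Latent-class parameters become identifiable when the model's conditional-probability matrices satisfy such rank conditions.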
p4est: Scalable Algorithms for Parallel Adaptive Mesh Refinement on Forests of Octrees
by Burstedde, Carsten; Wilcox, Lucas C.; Ghattas, Omar
in Adaptive algorithms, Algorithms, Coarsening
2011
(ProQuest: ... denotes formulae/symbols omitted.) The authors present scalable algorithms for parallel adaptive mesh refinement and coarsening (AMR), partitioning, and 2:1 balancing on computational domains composed of multiple connected two-dimensional quadtrees or three-dimensional octrees, referred to as a forest of octrees. By distributing the union of octants from all octrees in parallel, they combine the high scalability proven previously for adaptive single-octree algorithms with the geometric flexibility that can be achieved by arbitrarily connected hexahedral macromeshes, in which each macroelement is the root of an adapted octree. A key concept of their approach is an encoding scheme of the interoctree connectivity that permits arbitrary relative orientations between octrees. They demonstrate the parallel scalability of p4est on its own and in combination with two geophysics codes. Using p4est they generate and adapt multioctree meshes with up to 5.13 x ... octants on as many as 220,320 CPU cores and execute the 2:1 balance algorithm in less than 10 seconds per million octants per process.
Journal Article
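Octants in such codes are commonly linearized by Morton (z-order) indices, obtained by interleaving the bits of the coordinates; sorting leaves by this index gives the space-filling-curve ordering used for parallel partitioning. The 2-D sketch below is a generic illustration of the technique, not p4est's actual encoding (the 3-D case interleaves three coordinates).

```python
def morton2(x, y, bits=16):
    """Interleave the bits of x and y into a z-order index:
    x occupies the even bit positions, y the odd ones."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z

def demorton2(z, bits=16):
    """Invert morton2: recover the (x, y) coordinates."""
    x = y = 0
    for i in range(bits):
        x |= ((z >> (2 * i)) & 1) << i
        y |= ((z >> (2 * i + 1)) & 1) << i
    return x, y

# Sorting quadtree leaves by Morton index yields the z-order traversal
# along which octants can be split evenly across processes.
cells = sorted((morton2(x, y), (x, y)) for x in range(4) for y in range(4))
```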
Maximizing Non-monotone Submodular Functions
by Vondrák, Jan; Mirrokni, Vahab S.; Feige, Uriel
in Algorithmics. Computability. Computer arithmetics, Algorithms, Applied mathematics
2011
Submodular maximization generalizes many important problems including Max Cut in directed and undirected graphs and hypergraphs, certain constraint satisfaction problems, and maximum facility location problems. Unlike the problem of minimizing submodular functions, the problem of maximizing submodular functions is NP-hard. In this paper, we design the first constant-factor approximation algorithms for maximizing nonnegative (non-monotone) submodular functions. In particular, we give a deterministic local-search $\frac{1}{3}$-approximation and a randomized $\frac{2}{5}$-approximation algorithm for maximizing nonnegative submodular functions. We also show that a uniformly random set gives a $\frac{1}{4}$-approximation. For symmetric submodular functions, we show that a random set gives a $\frac{1}{2}$-approximation, which can also be achieved by deterministic local search. These algorithms work in the value oracle model, where the submodular function is accessible through a black box returning $f(S)$ for a given set $S$. We show that in this model, a $(\frac{1}{2}+\epsilon)$-approximation for symmetric submodular functions would require an exponential number of queries for any fixed $\epsilon>0$. In the model where $f$ is given explicitly (as a sum of nonnegative submodular functions, each depending only on a constant number of elements), we prove NP-hardness of $(\frac{5}{6}+\epsilon)$-approximation in the symmetric case and NP-hardness of $(\frac{3}{4}+\epsilon)$-approximation in the general case.
Journal Article
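The deterministic local-search approximation can be sketched on a cut function, a canonical submodular objective: start from the best singleton, add or remove single elements while the value improves, and finally return the better of the set and its complement. The plain improvement rule below omits the $(1+\epsilon)$-scaled threshold the paper uses to guarantee polynomial running time; the graph is invented for illustration.

```python
def cut_value(S, edges):
    """Submodular (symmetric) cut function: edges crossing S."""
    return sum((u in S) != (v in S) for u, v in edges)

def local_search_max(V, edges):
    """Local search for non-monotone submodular maximization:
    add/remove single elements while the objective improves,
    then return the better of S and its complement."""
    f = lambda S: cut_value(S, edges)
    S = max(({v} for v in V), key=f)   # start from the best singleton
    improved = True
    while improved:
        improved = False
        for v in V:
            T = S - {v} if v in S else S | {v}
            if f(T) > f(S):
                S, improved = T, True
    comp = set(V) - S
    return S if f(S) >= f(comp) else comp

V = {0, 1, 2, 3}
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]   # 4-cycle: max cut is 4
best = local_search_max(V, edges)
```

On this 4-cycle the search reaches one of the two bipartitions, which is the global optimum; in general the guarantee is a constant fraction of the optimum, not exactness.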
Fuzzy Extractors: How to Generate Strong Keys from Biometrics and Other Noisy Data
by Dodis, Yevgeniy; Smith, Adam; Ostrovsky, Rafail
in Applied sciences, Biometrics, Combinatorics
2008
We provide formal definitions and efficient secure techniques for turning noisy information into keys usable for any cryptographic application, and, in particular, reliably and securely authenticating biometric data. Our techniques apply not just to biometric information, but to any keying material that, unlike traditional cryptographic keys, is (1) not reproducible precisely and (2) not distributed uniformly. We propose two primitives: a fuzzy extractor reliably extracts nearly uniform randomness $R$ from its input; the extraction is error-tolerant in the sense that $R$ will be the same even if the input changes, as long as it remains reasonably close to the original. Thus, $R$ can be used as a key in a cryptographic application. A secure sketch produces public information about its input $w$ that does not reveal $w$ and yet allows exact recovery of $w$ given another value that is close to $w$. Thus, it can be used to reliably reproduce error-prone biometric inputs without incurring the security risk inherent in storing them. We define the primitives to be both formally secure and versatile, generalizing much prior work. In addition, we provide nearly optimal constructions of both primitives for various measures of "closeness" of input data, such as Hamming distance, edit distance, and set difference.
Journal Article
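The secure-sketch primitive for Hamming distance can be sketched with the classic code-offset construction: publish s = w XOR c for a random codeword c, and recover w by decoding w' XOR s back to c. The parameters below (4 data bits, 3-fold repetition, correcting one flip per block) are toy choices for illustration; real constructions use stronger codes and layer a randomness extractor on top to obtain the key R.

```python
import random

R = 3  # repetition factor: majority decoding fixes 1 flip per block

def encode(bits):
    """Repetition-code encoding: repeat every bit R times."""
    return [b for b in bits for _ in range(R)]

def decode(bits):
    """Majority-decode each block of R bits."""
    return [int(sum(bits[i:i + R]) * 2 > R) for i in range(0, len(bits), R)]

def sketch(w, rng):
    """Code-offset secure sketch: s = w XOR (random codeword)."""
    c = encode([rng.randrange(2) for _ in range(len(w) // R)])
    return [wi ^ ci for wi, ci in zip(w, c)]

def recover(w_noisy, s):
    """Recover the original w from a noisy reading and the sketch:
    w_noisy XOR s is a noisy codeword; decode it, then XOR back."""
    c = encode(decode([wi ^ si for wi, si in zip(w_noisy, s)]))
    return [ci ^ si for ci, si in zip(c, s)]

rng = random.Random(0)
w = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1]  # 12-bit "biometric" reading
s = sketch(w, rng)
w_noisy = list(w)
w_noisy[5] ^= 1  # one bit of the fresh reading flips
```

Because s is w masked by a random codeword, it reveals only the coset of w, yet any reading within the code's correction radius recovers w exactly.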