48 result(s) for "Clarkson, Kenneth L."
Combining data and theory for derivable scientific discovery with AI-Descartes
Scientists aim to discover meaningful formulae that accurately describe experimental data. Mathematical models of natural phenomena can be manually created from domain knowledge and fitted to data, or, in contrast, created automatically from large datasets with machine-learning algorithms. The problem of incorporating prior knowledge expressed as constraints on the functional form of a learned model has been studied before, while finding models that are consistent with prior knowledge expressed via general logical axioms is an open problem. We develop a method to enable principled derivations of models of natural phenomena from axiomatic knowledge and experimental data by combining logical reasoning with symbolic regression. We demonstrate these concepts for Kepler’s third law of planetary motion, Einstein’s relativistic time-dilation law, and Langmuir’s theory of adsorption. We show we can discover governing laws from few data points when logical reasoning is used to distinguish between candidate formulae having similar error on the data. Automatic extraction of consistent governing laws from data is a challenging problem. The authors propose a method that takes as input experimental data and background theory and combines symbolic regression with logical reasoning to obtain scientifically meaningful symbolic formulas.
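To illustrate the selection step the abstract describes, here is a hypothetical sketch (not the AI-Descartes system itself): candidate formulae for Kepler's third law are scored by their fit error on a few standard planetary data points, standing in for the symbolic-regression candidates that the paper's logical-reasoning component would then vet against background axioms. The candidate set and helper names are illustrative.

```python
# Well-known (semi-major axis in AU, orbital period in years) pairs.
PLANETS = [("Mercury", 0.387, 0.241), ("Earth", 1.0, 1.0),
           ("Mars", 1.524, 1.881), ("Jupiter", 5.203, 11.862)]

# Candidate formulae T = a**p; a symbolic-regression engine would
# generate such candidates automatically.
CANDIDATES = {"T = a": 1.0, "T = a^(3/2)": 1.5, "T = a^2": 2.0}

def mean_rel_error(p):
    """Mean relative error of the candidate T = a**p over the data."""
    return sum(abs(a**p - t) / t for _, a, t in PLANETS) / len(PLANETS)

# Pick the candidate with the smallest error on the data; Kepler's
# third law (p = 3/2) wins even with only four data points.
best = min(CANDIDATES, key=lambda name: mean_rel_error(CANDIDATES[name]))
```

In the paper's setting, several candidates can have similarly small error, and the logical-reasoning step (checking derivability from background axioms) breaks the tie; this sketch shows only the error-based scoring.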
Improved Approximation Algorithms for Geometric Set Cover
Given a collection $S$ of subsets of some set $\mathbb{U}$, and a set $P \subseteq \mathbb{U}$, the set cover problem is to find the smallest subcollection $C \subseteq S$ that covers $P$, that is, $P \subseteq \bigcup C$, where $\bigcup C$ denotes $\bigcup_{Y \in C} Y$. We assume of course that $S$ covers $P$. While the general problem is NP-hard to solve, even approximately, here we consider some geometric special cases, where usually $\mathbb{U} = \mathbb{R}^d$. Combining previously known techniques [4], [5], we show that polynomial-time approximation algorithms with provable performance exist, under a certain general condition: that for a random subset $R \subseteq S$ and nondecreasing function $f(\cdot)$, there is a decomposition of the complement $\mathbb{U} \setminus \bigcup R$ into an expected at most $f(|R|)$ regions, each region of a particular simple form. Under this condition, a cover of size $O(f(c))$ can be found in polynomial time, where $c$ is the size of a smallest cover. Using this result, and combinatorial geometry results implying bounding functions $f(c)$ that are nearly linear, we obtain $o(\log c)$ approximation algorithms for covering by fat triangles, by pseudo-disks, by a family of fat objects, and others. Similarly, constant-factor approximations follow for similar-sized fat triangles and fat objects, and for fat wedges. With more work, we obtain constant-factor approximation algorithms for covering by unit cubes in $\mathbb{R}^3$ and for guarding an $x$-monotone polygonal chain.
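For context on the problem being approximated, a minimal sketch follows; it implements the classical greedy set-cover heuristic (which attains the logarithmic ratio the paper improves on in geometric settings), not the paper's sampling-based algorithm. The example instance is made up.

```python
def greedy_set_cover(points, sets):
    """Classical greedy heuristic: repeatedly pick the set covering the
    most still-uncovered points.  Gives an O(log n) approximation ratio
    for general set cover."""
    uncovered = set(points)
    cover = []
    while uncovered:
        best = max(sets, key=lambda s: len(uncovered & s))
        if not (uncovered & best):
            raise ValueError("input sets do not cover all points")
        cover.append(best)
        uncovered -= best
    return cover

# Tiny illustrative instance: points 0..5, two sets suffice.
sets = [frozenset({0, 1, 2}), frozenset({2, 3}),
        frozenset({3, 4, 5}), frozenset({0, 5})]
cover = greedy_set_cover(range(6), sets)
```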
Random Sampling with Removal
We study randomized algorithms for constrained optimization, in abstract frameworks that include, in strictly increasing generality: convex programming; LP-type problems; violator spaces; and a setting we introduce, consistent spaces. Such algorithms typically involve a step of finding the optimal solution for a random sample of the constraints. They exploit the condition that, in finite dimension δ, this sample optimum violates a provably small expected fraction of the non-sampled constraints, with the fraction decreasing in the sample size r. We extend such algorithms by considering the technique of removal, where a fixed number k of constraints are removed from the sample according to a fixed rule, with the goal of improving the solution quality. This may have the effect of increasing the number of violated non-sampled constraints. We study this increase, and bound it in a variety of general settings. This work is motivated by, and extends, results on removal as proposed for chance-constrained optimization. For many relevant values of r, δ, and k, we prove matching upper and lower bounds for the expected number of constraints violated by a random sample, after the removal of k constraints. For a large range of values of k, the new upper bounds improve the previously best bounds for LP-type problems, which moreover had only been known in special cases, and not in the generality we consider. Moreover, we show that our results extend from finite to infinite spaces, for chance-constrained optimization.
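As a toy illustration of sampling with removal (not the paper's general framework), consider a one-dimensional chance-constrained problem: maximize $x$ subject to constraints $x \le b_i$; the optimum of a random sample of constraints is the smallest sampled bound, and removing the $k$ tightest sampled constraints improves the objective at the cost of violating more non-sampled constraints. All names and parameters below are illustrative.

```python
import random

random.seed(0)
# Population of constraints x <= b_i.
bounds = [random.uniform(0, 1) for _ in range(1000)]

def sample_opt_with_removal(bounds, r, k):
    """Maximize x s.t. x <= b_i over a random sample of r constraints,
    after removing the k tightest sampled constraints (a fixed removal
    rule)."""
    sample = sorted(random.sample(bounds, r))
    return sample[k]          # optimum once the k smallest bounds are dropped

x = sample_opt_with_removal(bounds, r=100, k=5)
# Constraints violated by the relaxed sample optimum; the expected
# fraction is roughly (k + 1) / (r + 1) for continuous distributions.
violated = sum(b < x for b in bounds)
```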
Self-Improving Algorithms
We investigate ways in which an algorithm can improve its expected performance by fine-tuning itself automatically with respect to an unknown input distribution $\mathcal{D}$. We assume here that $\mathcal{D}$ is of product type. More precisely, suppose that we need to process a sequence $I_1, I_2, \ldots$ of inputs $I = (x_1, x_2, \ldots, x_n)$ of some fixed length $n$, where each $x_i$ is drawn independently from some arbitrary, unknown distribution $\mathcal{D}_i$. The goal is to design an algorithm for these inputs so that eventually the expected running time will be optimal for the input distribution $\mathcal{D} = \prod_i \mathcal{D}_i$. We give such self-improving algorithms for two problems: (i) sorting a sequence of numbers and (ii) computing the Delaunay triangulation of a planar point set. Both algorithms achieve optimal expected limiting complexity. The algorithms begin with a training phase during which they collect information about the input distribution, followed by a stationary regime in which the algorithms settle to their optimized incarnations.
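A much-simplified sketch of the training-phase/stationary-regime idea for sorting; unlike the paper's algorithm, it pools all coordinates into one learned bucket structure rather than keeping per-coordinate search structures, and the class and parameter names are illustrative.

```python
import bisect
import random

random.seed(1)

class SelfImprovingSorter:
    """Training phase: learn approximate quantile boundaries from sample
    inputs.  Stationary regime: bucket-sort new inputs against those
    boundaries, so buckets stay small when the distribution is stable."""

    def __init__(self, training_inputs, n_buckets=16):
        pooled = sorted(x for inp in training_inputs for x in inp)
        step = max(1, len(pooled) // n_buckets)
        self.boundaries = pooled[step::step]   # approximate quantiles

    def sort(self, inp):
        buckets = [[] for _ in range(len(self.boundaries) + 1)]
        for x in inp:
            buckets[bisect.bisect_left(self.boundaries, x)].append(x)
        out = []
        for b in buckets:
            out.extend(sorted(b))              # each bucket is small
        return out

train = [[random.gauss(0, 1) for _ in range(100)] for _ in range(20)]
sorter = SelfImprovingSorter(train)            # training phase
result = sorter.sort([random.gauss(0, 1) for _ in range(100)])
```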
Self-Improving Algorithms for Coordinatewise Maxima and Convex Hulls
Finding the coordinatewise maxima and the convex hull of a planar point set are probably the most classic problems in computational geometry. We consider these problems in the self-improving setting. Here, we have $n$ distributions $\mathcal{D}_1, \ldots, \mathcal{D}_n$ of planar points. An input point set $(p_1, \ldots, p_n)$ is generated by taking an independent sample $p_i$ from each $\mathcal{D}_i$, so the input is distributed according to the product $\mathcal{D} = \prod_i \mathcal{D}_i$. A self-improving algorithm repeatedly gets inputs from the distribution $\mathcal{D}$ (which is a priori unknown), and it tries to optimize its running time for $\mathcal{D}$. The algorithm uses the first few inputs to learn salient features of the distribution $\mathcal{D}$ before it becomes fine-tuned to $\mathcal{D}$. Let $\text{OPT-MAX}_\mathcal{D}$ (resp., $\text{OPT-CH}_\mathcal{D}$) be the expected depth of an optimal linear comparison tree computing the maxima (resp., convex hull) for $\mathcal{D}$. Our maxima algorithm eventually achieves expected running time $O(\text{OPT-MAX}_\mathcal{D} + n)$. Furthermore, we give a self-improving algorithm for convex hulls with expected running time $O(\text{OPT-CH}_\mathcal{D} + n\log\log n)$. Our results require new tools for understanding linear comparison trees. In particular, we convert a general linear comparison tree to a restricted version that can then be related to the running time of our algorithms. Another interesting feature is an interleaved search procedure to determine the likeliest point to be extremal with minimal computation. This allows our algorithms to be competitive with the optimal algorithm for $\mathcal{D}$.
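The self-improving machinery is involved; the sketch below shows only the underlying computation, a standard sweep for the coordinatewise maxima (the "staircase") of a planar point set, with a made-up example instance.

```python
def maxima(points):
    """Coordinatewise maxima of a planar point set: the points not
    dominated in both coordinates by another point.  Sweep from the
    largest x downward, keeping points whose y exceeds all seen so far."""
    staircase, best_y = [], float("-inf")
    for x, y in sorted(points, reverse=True):
        if y > best_y:
            staircase.append((x, y))
            best_y = y
    return staircase

pts = [(1, 5), (2, 3), (3, 4), (4, 1), (5, 2)]
result = maxima(pts)   # the staircase, from largest x to largest y
```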
A Randomized Algorithm for Closest-Point Queries
An algorithm for closest-point queries is given. The problem is this: given a set $S$ of $n$ points in $d$-dimensional space, build a data structure so that given an arbitrary query point $p$, a closest point in $S$ to $p$ can be found quickly. The measure of distance is the Euclidean norm. This is sometimes called the post-office problem. The new data structure will be termed an RPO tree, from Randomized Post Office. The expected time required to build an RPO tree is $O(n^{\lceil d/2 \rceil (1+\epsilon)})$, for any fixed $\epsilon > 0$, and a query can be answered in $O(\log n)$ worst-case time. An RPO tree requires $O(n^{\lceil d/2 \rceil (1+\epsilon)})$ space in the worst case. The constant factors in these bounds depend on $d$ and $\epsilon$. The bounds are average-case due to the randomization employed by the algorithm, and hold for any set of input points. This result approaches the $\Omega(n^{\lceil d/2 \rceil})$ worst-case time required for any algorithm that constructs the Voronoi diagram of the input points, and is a considerable improvement over previous bounds for $d > 3$. The main step of the construction algorithm is the determination of the Voronoi diagram of a random sample of the sites, and the triangulation of that diagram.
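For reference, a sketch of the brute-force $O(n)$ answer to a post-office query, i.e., the baseline that an RPO tree's $O(\log n)$ query time improves on (this is not the RPO construction; the point set is randomly generated for illustration).

```python
import math
import random

def nearest(points, q):
    """Brute-force post-office query: scan all of S and return the
    point closest to q in the Euclidean norm."""
    return min(points, key=lambda p: math.dist(p, q))

random.seed(2)
S = [(random.random(), random.random()) for _ in range(100)]
p = nearest(S, (0.5, 0.5))   # closest site to the query (0.5, 0.5)
```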
Capacity Analysis of Vector Symbolic Architectures
Hyperdimensional computing (HDC) is a biologically-inspired framework which represents symbols with high-dimensional vectors, and uses vector operations to manipulate them. The ensemble of a particular vector space and a prescribed set of vector operations (including one addition-like for "bundling" and one outer-product-like for "binding") form a *vector symbolic architecture* (VSA). While VSAs have been employed in numerous applications and have been studied empirically, many theoretical questions about VSAs remain open. We analyze the *representation capacities* of four common VSAs: MAP-I, MAP-B, and two VSAs based on sparse binary vectors. "Representation capacity" here refers to bounds on the dimensions of the VSA vectors required to perform certain symbolic tasks, such as testing for set membership $i \in S$ and estimating set intersection sizes $|X \cap Y|$ for two sets of symbols $X$ and $Y$, to a given degree of accuracy. We also analyze the ability of a novel variant of a Hopfield network (a simple model of associative memory) to perform some of the same tasks that are typically asked of VSAs. In addition to providing new bounds on VSA capacities, our analyses establish and leverage connections between VSAs, "sketching" (dimensionality reduction) algorithms, and Bloom filters.
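A minimal sketch of bundling and the set-membership test the abstract refers to, assuming a MAP-style VSA with random bipolar ($\pm 1$) vectors; the dimension, symbol names, and thresholds below are illustrative, not taken from the paper.

```python
import random

random.seed(3)
d = 2048                                   # vector dimension

def rand_symbol():
    """Random bipolar (+1/-1) symbol vector, as in MAP-style VSAs."""
    return [random.choice((-1, 1)) for _ in range(d)]

def bundle(vectors):
    """Bundling: elementwise addition of the symbol vectors."""
    return [sum(col) for col in zip(*vectors)]

def similarity(u, v):
    """Normalized dot product: concentrates near 1 for bundled members
    and near 0 for unrelated symbols as d grows."""
    return sum(a * b for a, b in zip(u, v)) / d

symbols = {name: rand_symbol() for name in "abcdef"}
S = bundle([symbols[n] for n in "abc"])    # represents the set {a, b, c}

member = similarity(S, symbols["a"])       # high: a is in the bundle
non_member = similarity(S, symbols["e"])   # near zero: e is not
```

The capacity question the paper studies is, roughly, how large $d$ must be for such tests to succeed with the desired accuracy as the set sizes grow.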