Search Results
70 results for "Avron, Haim"
Tensor-tensor algebra for optimal representation and compression of multiway data
With the advent of machine learning and its overarching pervasiveness, it is imperative to devise ways to represent large datasets efficiently while distilling intrinsic features necessary for subsequent analysis. The primary workhorse used in data dimensionality reduction and feature extraction has been the matrix singular value decomposition (SVD), which presupposes that data have been arranged in matrix format. A primary goal in this study is to show that high-dimensional datasets are more compressible when treated as tensors (i.e., multiway arrays) and compressed via tensor-SVDs under the tensor-tensor product construct and its generalizations. We begin by proving Eckart–Young optimality results for families of tensor-SVDs under two different truncation strategies. Since such optimality properties can be proven in both matrix and tensor-based algebras, a fundamental question arises: Does the tensor construct subsume the matrix construct in terms of representation efficiency? The answer is positive, as proven by showing that a tensor-tensor representation of an equal-dimensional spanning space can be superior to its matrix counterpart. We then use these optimality results to investigate how the compressed representation provided by the truncated tensor-SVD is related, both theoretically and empirically, to its two closest tensor-based analogs, the truncated higher-order SVD and the truncated tensor-train SVD.
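For readers unfamiliar with the tensor-tensor product framework this abstract refers to, the sketch below illustrates the standard t-product construction (FFT along the third mode, facewise matrix products) and the truncated tensor-SVD built on it. This is a minimal NumPy illustration of the general technique, not code from the paper; all function names are ours.

```python
import numpy as np

def t_product(A, B):
    """t-product of tensors A (n1 x n2 x n3) and B (n2 x m x n3):
    FFT along the third mode, facewise matrix multiply, inverse FFT."""
    Ah = np.fft.fft(A, axis=2)
    Bh = np.fft.fft(B, axis=2)
    Ch = np.einsum('ijk,jlk->ilk', Ah, Bh)   # per-slice matrix products
    return np.real(np.fft.ifft(Ch, axis=2))

def t_transpose(A):
    """Tensor transpose: transpose each frontal slice, reverse slices 2..n3."""
    At = np.transpose(A, (1, 0, 2))
    return np.concatenate([At[:, :, :1], At[:, :, :0:-1]], axis=2)

def t_svd(A, k):
    """Rank-k truncated tensor-SVD under the t-product: SVD each frontal
    slice in the Fourier domain, keep the k leading singular tuples, and
    transform back, so A ~= t_product(t_product(U, S), t_transpose(V))."""
    n1, n2, n3 = A.shape
    Ah = np.fft.fft(A, axis=2)
    Uh = np.zeros((n1, k, n3), dtype=complex)
    Sh = np.zeros((k, k, n3), dtype=complex)
    Vh = np.zeros((n2, k, n3), dtype=complex)
    for i in range(n3):
        u, s, vt = np.linalg.svd(Ah[:, :, i], full_matrices=False)
        Uh[:, :, i] = u[:, :k]
        Sh[:, :, i] = np.diag(s[:k])
        Vh[:, :, i] = vt[:k, :].conj().T
    # for real A, the inverse FFTs are real up to floating-point noise
    return (np.real(np.fft.ifft(Uh, axis=2)),
            np.real(np.fft.ifft(Sh, axis=2)),
            np.real(np.fft.ifft(Vh, axis=2)))
```

Slice-wise truncation in the transform domain is one of the truncation strategies for which Eckart–Young-type optimality is established in the t-product setting.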
Faster Subset Selection for Matrices and Applications
We study the following problem of subset selection for matrices: given a matrix $\mathbf{X} \in \mathbb{R}^{n \times m}$ ($m > n$) and a sampling parameter $k$ ($n \le k \le m$), select a subset of $k$ columns from $\mathbf{X}$ such that the pseudoinverse of the sampled matrix has as small a norm as possible. In this work, we focus on the Frobenius and the spectral matrix norms. We describe several novel (deterministic and randomized) approximation algorithms for this problem with approximation bounds that are optimal up to constant factors. Additionally, we show that the combinatorial problem of finding a low-stretch spanning tree in an undirected graph corresponds to subset selection, and discuss various implications of this reduction.
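As a concrete illustration of the randomized side of this problem (a common baseline, not the paper's algorithm): sample columns with probabilities proportional to their leverage scores, then measure the Frobenius norm of the pseudoinverse of the sampled submatrix.

```python
import numpy as np

def sample_columns_by_leverage(X, k, seed=0):
    """Illustrative randomized column subset selection: sample k columns of
    X (n x m, m > n) with probability proportional to leverage scores, and
    report ||pinv(X_S)||_F for the sampled submatrix X_S."""
    rng = np.random.default_rng(seed)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)  # Vt is n x m
    scores = (Vt ** 2).sum(axis=0)                    # column leverage scores (sum to n)
    cols = rng.choice(X.shape[1], size=k, replace=False, p=scores / scores.sum())
    X_S = X[:, cols]
    return cols, np.linalg.norm(np.linalg.pinv(X_S), 'fro')

# example: a 20 x 100 Gaussian matrix, sampling k = 40 columns
X = np.random.default_rng(1).standard_normal((20, 100))
cols, cost = sample_columns_by_leverage(X, 40)
```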
Dimensionality reduction of longitudinal ’omics data using modern tensor factorizations
Longitudinal ’omics analytical methods are extensively used in the evolving field of precision medicine, by enabling ‘big data’ recording and high-resolution interpretation of complex datasets, driven by individual variations in response to perturbations such as disease pathogenesis, medical treatment, or changes in lifestyle. However, inherent technical limitations in biomedical studies often result in the generation of feature-rich and sample-limited datasets. Analyzing such data with conventional modalities often proves challenging, since the repeated, high-dimensional measurements overload the outlook with inconsequential variations that must be filtered out in order to find the true, biologically relevant signal. Tensor methods for the analysis and meaningful representation of multiway data may prove useful to the biological research community given their ability to tackle this challenge. In this study, we present tcam, a new unsupervised tensor factorization method for the analysis of multiway data. Building on cutting-edge developments in the field of tensor-tensor algebra, we characterize the unique mathematical properties of our method, namely: (1) preservation of geometric and statistical traits of the data, which enables uncovering information beyond the inter-individual variation that often dominates the focus, especially in human studies; and (2) a natural and straightforward out-of-sample extension, making tcam amenable to integration in machine learning workflows. A series of re-analyses of real-world human experimental datasets showcases these theoretical properties, while providing empirical confirmation of tcam’s utility in the analysis of longitudinal ’omics data.
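The "natural out-of-sample extension" claim can be illustrated with the t-SVD sketch given earlier in this list (a generic tensor-algebra illustration, not the tcam package's API): since A ≈ U * S * Vᵀ under the t-product, factor scores for the training samples are A * V, and a held-out sample is embedded by applying the same map.

```python
import numpy as np
# reuses t_product and t_svd from the t-SVD sketch above

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 50, 8))      # samples x features x time
U, S, V = t_svd(A, k=3)
scores_train = t_product(A, V)            # per-sample factor scores: A * V = U * S
X_new = rng.standard_normal((1, 50, 8))   # a held-out sample on the same feature/time grid
scores_new = t_product(X_new, V)          # out-of-sample embedding via the same linear map
```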
Blendenpik: Supercharging LAPACK's Least-Squares Solver
Several innovative random-sampling and random-mixing techniques for solving problems in linear algebra have been proposed in the last decade, but they have not yet made a significant impact on numerical linear algebra. We show that by using a high-quality implementation of one of these techniques, we obtain a solver that performs extremely well in the traditional yardsticks of numerical linear algebra: it is significantly faster than high-performance implementations of existing state-of-the-art algorithms, and it is numerically backward stable. More specifically, we describe a least-squares solver for dense highly overdetermined systems that achieves residuals similar to those of direct QR factorization-based solvers (LAPACK), outperforms LAPACK by large factors, and scales significantly better than any QR-based solver.
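The random-mixing-plus-sampling scheme the abstract describes can be sketched with SciPy building blocks (a simplified, illustrative reconstruction, not the authors' implementation; the sampling factor gamma and the DCT mixing choice are our assumptions): mix the rows with a randomized orthogonal transform, sample a small subset, QR-factor the sample, and run LSQR on the right-preconditioned system.

```python
import numpy as np
from scipy.fft import dct
from scipy.linalg import qr, solve_triangular
from scipy.sparse.linalg import lsqr, LinearOperator

def blendenpik_style_solve(A, b, gamma=4.0, seed=0):
    """Sketch of a Blendenpik-style least-squares solver for tall dense
    A (m x n, m >> n): random row signs + DCT spread the leverage scores,
    so uniformly sampled mixed rows capture the column-space geometry and
    their R factor is a good preconditioner for LSQR."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    M = dct(A * rng.choice([-1.0, 1.0], size=(m, 1)), axis=0, norm='ortho')
    rows = rng.choice(m, size=min(m, int(gamma * n)), replace=False)
    _, R = qr(M[rows], mode='economic')
    # iterate on A @ R^{-1}, which is well conditioned with high probability
    ARinv = LinearOperator(
        (m, n),
        matvec=lambda x: A @ solve_triangular(R, x),
        rmatvec=lambda y: solve_triangular(R, A.T @ y, trans='T'),
    )
    y = lsqr(ARinv, b, atol=1e-14, btol=1e-14)[0]
    return solve_triangular(R, y)          # undo the preconditioning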
High-Performance Kernel Machines With Implicit Distributed Optimization and Randomization
We propose a framework for massive-scale training of kernel-based statistical models, based on combining distributed convex optimization with randomization techniques. Our approach is based on a block-splitting variant of the alternating direction method of multipliers (ADMM), carefully reconfigured to handle very large random feature matrices under memory constraints, while exploiting hybrid parallelism typically found in modern clusters of multicore machines. Our high-performance implementation supports a variety of statistical learning tasks by enabling several loss functions, regularization schemes, kernels, and layers of randomized approximations for both dense and sparse datasets, in an extensible framework. We evaluate our implementation on large-scale model construction tasks and provide a comparison against existing sequential and parallel libraries. Supplementary materials for this article are available online.
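The "layers of randomized approximations" mentioned here include random feature maps. Below is a single-machine sketch of that ingredient only (random Fourier features for a Gaussian kernel, followed by ridge regression in the feature space); the distributed block-splitting ADMM machinery is not reproduced, and all names are illustrative.

```python
import numpy as np

def rff_fit(X, y, D=500, sigma=1.0, lam=1e-3, seed=0):
    """Random Fourier features for the Gaussian kernel + ridge regression:
    a minimal single-machine sketch of the randomization layer."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], D)) / sigma        # spectral samples
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)               # random phases
    Z = np.sqrt(2.0 / D) * np.cos(X @ W + b)                # explicit feature map, n x D
    w = np.linalg.solve(Z.T @ Z + lam * np.eye(D), Z.T @ y) # ridge normal equations
    return W, b, w

def rff_predict(X, W, b, w):
    D = W.shape[1]
    return np.sqrt(2.0 / D) * np.cos(X @ W + b) @ w
```

Because the kernel is replaced by an explicit D-dimensional map, training reduces to a linear problem whose data can be partitioned across machines, which is what makes the distributed-optimization formulation in the paper possible.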