60 result(s) for "Grigori, Laura"
Communication-optimal Parallel and Sequential QR and LU Factorizations
We present parallel and sequential dense QR factorization algorithms that are both optimal (up to polylogarithmic factors) in the amount of communication they perform and just as stable as Householder QR. We prove optimality by deriving new lower bounds for the number of multiplications done by "non-Strassen-like" QR, and using these in known communication lower bounds that are proportional to the number of multiplications. We not only show that our QR algorithms attain these lower bounds (up to polylogarithmic factors), but that existing LAPACK and ScaLAPACK algorithms perform asymptotically more communication. We derive analogous communication lower bounds for LU factorization and point out recent LU algorithms in the literature that attain at least some of these lower bounds. The sequential and parallel QR algorithms for tall and skinny matrices lead to significant speedups in practice over some of the existing algorithms, including LAPACK and ScaLAPACK, for example, up to 6.7 times over ScaLAPACK. A performance model for the parallel algorithm for general rectangular matrices predicts significant speedups over ScaLAPACK.
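The tall-and-skinny case mentioned above (TSQR) can be sketched in a few lines: factor row blocks independently, then reduce only the small R factors. This is a minimal one-level (flat) reduction, not the binary-tree reduction of the paper; the function name and block count are illustrative choices.

```python
import numpy as np

def tsqr(A, n_blocks=4):
    """Sketch of communication-avoiding QR for a tall-skinny matrix.

    Each row block is factored independently (in parallel on a real
    machine); the small R factors are stacked and reduced with one
    more QR, so only O(n^2) words would move between processors.
    """
    blocks = np.array_split(A, n_blocks, axis=0)
    Qs, Rs = zip(*(np.linalg.qr(B) for B in blocks))
    # Reduce the stacked R factors with a second, small QR.
    Q2, R = np.linalg.qr(np.vstack(Rs))
    # Recover the full Q by applying each slice of Q2 to its local Q.
    Q2_blocks = np.split(Q2, np.cumsum([Ri.shape[0] for Ri in Rs])[:-1], axis=0)
    Q = np.vstack([Qi @ Q2i for Qi, Q2i in zip(Qs, Q2_blocks)])
    return Q, R

rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 8))
Q, R = tsqr(A)
```

On a distributed machine each `np.linalg.qr` on a block runs on its own processor, and only the n-by-n R factors are communicated.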
Adaptive linear solution process for single-phase Darcy flow
This article presents an adaptive approach for solving linear systems arising from self-adjoint Partial Differential Equations (PDE) problems discretized by cell-centered finite volume method and stemming from single-phase flow simulations. This approach aims at reducing the algebraic error in targeted parts of the domain using a posteriori error estimates. Numerical results of a reservoir simulation example for heterogeneous porous media in two dimensions are discussed. Using the adaptive solve procedure, we obtain a significant gain in terms of the number of time steps and iterations compared to a standard solve.
S-Step BiCGStab Algorithms for Geoscience Dynamic Simulations
In basin and reservoir simulations, the most expensive and time-consuming phase is solving systems of linear equations using Krylov subspace methods such as BiCGStab; this step can account for up to 80% of total simulation time, so the performance of these simulators depends strongly on the efficiency of their linear solvers. Since modern parallel machines provide large numbers of processors and massively parallel compute units, we explore communication-avoiding Krylov subspace methods (s-step BiCGStab), which restructure the algorithm to reduce communication and thereby speed up convergence time on modern architectures. We also introduce variants of s-step BiCGStab with better numerical stability for the targeted systems.
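The kernel that makes s-step methods communication-avoiding is computing a block of Krylov basis vectors in one sweep; the block of inner products needed later then costs a single global reduction per s iterations instead of several per iteration. A minimal sketch of the (monomial) basis construction, with illustrative names:

```python
import numpy as np

def monomial_basis(A, r, s):
    """Matrix-powers kernel used by s-step Krylov methods.

    Builds the s+1 vectors [r, A r, ..., A^s r] in a single sweep
    over A. The monomial basis can become ill-conditioned as s
    grows, which is the kind of instability the stabler s-step
    BiCGStab variants in the paper are designed to mitigate.
    """
    V = np.empty((r.size, s + 1))
    V[:, 0] = r
    for j in range(s):
        V[:, j + 1] = A @ V[:, j]
    return V

rng = np.random.default_rng(1)
A = rng.standard_normal((50, 50))
r = rng.standard_normal(50)
V = monomial_basis(A, r, 4)
```

In a distributed setting the sweep can be arranged so each processor computes its rows of all s+1 vectors with one exchange of ghost values, rather than one exchange per matrix-vector product.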
A Class of Efficient Locally Constructed Preconditioners Based on Coarse Spaces
In this paper we present a class of robust and fully algebraic two-level preconditioners for symmetric positive definite (SPD) matrices. We introduce the notion of an algebraic local symmetric positive semidefinite splitting of an SPD matrix and give a characterization of this splitting. The splitting allows us to construct, algebraically and locally, a class of efficient coarse spaces that bound the spectral condition number of the preconditioned system by a number chosen a priori. We also introduce the tau-filtering subspace, a concept that helps compare the dimension minimality of coarse spaces; some PDE-dependent preconditioners correspond to a special case. The exact algebraic coarse spaces presented in this paper are too expensive to construct in practice, so we propose an inexpensive heuristic approximation. Numerical experiments illustrate the efficiency of the proposed method.
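The general shape of such a two-level preconditioner is local subdomain solves plus a coarse correction. The sketch below uses piecewise-constant coarse vectors as a stand-in for the paper's algebraically constructed coarse space (a loud assumption, for illustration only), on a 1D Laplacian test problem:

```python
import numpy as np

def two_level_precond(A, subdomains, Z):
    """Two-level additive preconditioner: local solves + coarse correction.

    M^{-1} r = sum_i R_i^T (R_i A R_i^T)^{-1} R_i r + Z (Z^T A Z)^{-1} Z^T r
    Columns of Z span the coarse space. Here Z is piecewise constant
    per subdomain, NOT the paper's construction from local SPSD splittings.
    """
    locals_ = [(idx, np.linalg.inv(A[np.ix_(idx, idx)])) for idx in subdomains]
    Ac_inv = np.linalg.inv(Z.T @ A @ Z)          # small coarse operator
    def apply(r):
        z = Z @ (Ac_inv @ (Z.T @ r))             # coarse correction
        for idx, Ai_inv in locals_:
            z[idx] += Ai_inv @ r[idx]            # local subdomain solves
        return z
    return apply

# 1D Dirichlet Laplacian, four non-overlapping subdomains.
n = 64
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
subdomains = [np.arange(i, i + 16) for i in range(0, n, 16)]
Z = np.zeros((n, len(subdomains)))
for j, idx in enumerate(subdomains):
    Z[idx, j] = 1.0
M = two_level_precond(A, subdomains, Z)
z = M(np.ones(n))
```

The resulting operator is symmetric, so `M` can be used inside preconditioned CG; the coarse term is what gives the condition-number bound independent of the number of subdomains.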
Numerical algorithms for high-performance computational science
A number of features of today’s high-performance computers make it challenging to exploit these machines fully for computational science. These include increasing core counts but stagnant clock frequencies; the high cost of data movement; use of accelerators (GPUs, FPGAs, coprocessors), making architectures increasingly heterogeneous; and multiple precisions of floating-point arithmetic, including half-precision. Moreover, as well as maximizing speed and accuracy, minimizing energy consumption is an important criterion. New generations of algorithms are needed to tackle these challenges. We discuss some approaches that we can take to develop numerical algorithms for high-performance computational science, with a view to exploiting the next generation of supercomputers. This article is part of a discussion meeting issue ‘Numerical algorithms for high-performance computational science’.
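A classic way to exploit the multiple floating-point precisions mentioned above is mixed-precision iterative refinement: do the expensive O(n³) solve in low precision, then recover full accuracy with cheap O(n²) residual corrections in high precision. A minimal sketch (a real implementation would factor the matrix once and reuse the factors; here we call `solve` repeatedly for brevity):

```python
import numpy as np

def mixed_precision_solve(A, b, iters=3):
    """Iterative refinement: solve in float32, correct in float64.

    The low-precision solve exploits fast reduced-precision units;
    the residual and update in double precision restore accuracy,
    provided A is reasonably well conditioned.
    """
    A32 = A.astype(np.float32)
    x = np.linalg.solve(A32, b.astype(np.float32)).astype(np.float64)
    for _ in range(iters):
        r = b - A @ x                              # residual in double
        d = np.linalg.solve(A32, r.astype(np.float32))
        x += d.astype(np.float64)                  # cheap correction
    return x

rng = np.random.default_rng(2)
A = rng.standard_normal((200, 200)) + 200 * np.eye(200)  # well conditioned
b = rng.standard_normal(200)
x = mixed_precision_solve(A, b)
```

On hardware with fast half- or single-precision units, the factorization runs several times faster than a full double-precision solve while the refined solution reaches double-precision accuracy.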
Enlarged Krylov Subspace Conjugate Gradient Methods for Reducing Communication
In this paper we introduce a new approach for reducing communication in Krylov subspace methods that consists of enlarging the Krylov subspace by a maximum of $t$ vectors per iteration, based on a domain decomposition of the graph of $A$. The obtained enlarged Krylov subspace $\mathscr{K}_{k,t}(A,r_0)$ is a superset of the Krylov subspace $\mathcal{K}_k(A,r_0)$: $\mathcal{K}_k(A,r_0) \subset \mathscr{K}_{k,t}(A,r_0)$. Thus, we search for the solution of the system $Ax=b$ in $\mathscr{K}_{k,t}(A,r_0)$ instead of $\mathcal{K}_k(A,r_0)$. Moreover, we show that the enlarged Krylov projection subspace methods lead to faster convergence in terms of iterations and to parallelizable algorithms with less communication than classical Krylov methods.
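The enlargement starts by splitting the initial residual into $t$ vectors supported on the $t$ subdomains of the graph partition; these replace the single starting vector of CG. A minimal sketch of that splitting step (names illustrative):

```python
import numpy as np

def split_residual(r0, domains):
    """Split r0 into t vectors supported on the t subdomains.

    The columns sum back to r0, so span{T, A T, ...} contains the
    ordinary Krylov subspace generated by r0; searching in the
    enlarged space converges in fewer iterations, and each iteration
    operates on a block of t vectors, amortizing communication.
    """
    T = np.zeros((r0.size, len(domains)))
    for i, idx in enumerate(domains):
        T[idx, i] = r0[idx]
    return T

r0 = np.arange(8.0)
domains = [np.arange(0, 4), np.arange(4, 8)]
T = split_residual(r0, domains)
```

The subsequent enlarged-CG iteration (block updates, block orthogonalization) builds on this block of vectors in place of a single residual.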
Interpretation of parareal as a two-level additive Schwarz in time preconditioner and its acceleration with GMRES
We describe an interpretation of parareal as a two-level additive Schwarz preconditioner in the time domain. We show that this two-level preconditioner in time is equivalent to parareal and to multigrid reduction in time (MGRIT) with F-relaxation. We also discuss the case when additional fine or coarse propagation steps are applied in the preconditioner. This leads to procedures equivalent to MGRIT with FCF-relaxation and to MGRIT with F(CF)²-relaxation or overlapping parareal. Numerical results show that these variants have faster convergence in some cases. In addition, we also apply a Krylov subspace method, namely GMRES (generalized minimal residual), to accelerate the parareal algorithm. Better convergence is obtained, especially for the advection-reaction-diffusion equation in the case when advection and reaction coefficients are large.
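For reference, plain parareal alternates a cheap sequential coarse propagator G with fine propagators F that can run on all time slices in parallel, via the correction U(k+1, j+1) = F(U(k, j)) + G(U(k+1, j)) - G(U(k, j)). A minimal sketch on the scalar test problem u' = -u (propagator definitions are illustrative; this is the baseline algorithm, not the preconditioned/GMRES-accelerated variants of the paper):

```python
import numpy as np

def parareal(G, F, u0, n_slices, n_iters):
    """Plain parareal iteration over n_slices time slices.

    G: cheap coarse propagator over one slice (applied sequentially).
    F: expensive fine propagator over one slice (parallel-in-time
       in a real implementation; sequential here for simplicity).
    """
    U = [u0]
    for _ in range(n_slices):                 # coarse initial guess
        U.append(G(U[-1]))
    for _ in range(n_iters):
        Fvals = [F(U[j]) for j in range(n_slices)]   # parallelizable
        Unew = [u0]
        for j in range(n_slices):             # sequential correction
            Unew.append(Fvals[j] + G(Unew[-1]) - G(U[j]))
        U = Unew
    return U

# Test problem u' = -u on [0, 1]: G = one Euler step per slice,
# F = 100 Euler substeps per slice.
lam, dt = -1.0, 1.0 / 10
G = lambda u: u * (1 + lam * dt)
def F(u):
    for _ in range(100):
        u = u * (1 + lam * dt / 100)
    return u
U = parareal(G, F, 1.0, n_slices=10, n_iters=5)
```

After k iterations the first k slices are exact with respect to the fine propagator; the paper's contribution is recognizing this iteration as a two-level additive Schwarz preconditioner in time, which opens the door to the MGRIT-style variants and GMRES acceleration described above.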