77 results for "sparse linear solver"
An Algebraic Sparsified Nested Dissection Algorithm Using Low-Rank Approximations
Here, we propose a new algorithm for the fast solution of large, sparse, symmetric positive-definite linear systems: spaND (sparsified Nested Dissection). It is based on nested dissection, sparsification, and low-rank compression. After eliminating all interiors at a given level of the elimination tree, the algorithm sparsifies all separators corresponding to the interiors. This operation reduces the size of the separators by eliminating some degrees of freedom without introducing any fill-in, at the expense of a small and controllable approximation error. The result is an approximate factorization that can be used as an efficient preconditioner. We then perform several numerical experiments to evaluate this algorithm. We demonstrate that a version using orthogonal factorization and block-diagonal scaling takes fewer CG iterations to converge than previous similar algorithms on various kinds of problems. Furthermore, the algorithm provably never breaks down, and the matrix remains symmetric positive-definite throughout the process. We evaluate the algorithm on several large problems and show that it exhibits near-linear scaling: the factorization time is roughly $\mathcal{O}(N)$, and the number of iterations grows slowly with $N$.
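The central pattern in this abstract, an approximate factorization with a controllable error used as a preconditioner for CG, can be sketched in a few lines of SciPy. This is emphatically not the spaND code: SciPy's incomplete LU stands in for the sparsified nested-dissection factorization, and the drop tolerance plays the role of the controllable approximation error.

```python
# Minimal sketch: an approximate factorization (here, incomplete LU with a
# drop tolerance) used as a CG preconditioner on an SPD model problem.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# 2D Poisson matrix: a standard SPD test problem.
n = 64
T = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n))
A = (sp.kron(sp.eye(n), T) + sp.kron(T, sp.eye(n))).tocsc()
b = np.ones(A.shape[0])

# Approximate factorization: the drop tolerance trades accuracy for sparsity,
# a crude analogue of spaND's controllable compression error.
ilu = spla.spilu(A, drop_tol=1e-3, fill_factor=10)
M = spla.LinearOperator(A.shape, matvec=ilu.solve)

def niters(**kw):
    count = []
    x, info = spla.cg(A, b, callback=lambda xk: count.append(1), **kw)
    return len(count), info

plain, _ = niters()
prec, _ = niters(M=M)
print(f"CG iterations: {plain} unpreconditioned vs {prec} preconditioned")
```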
αSetup-AMG: an adaptive-setup-based parallel AMG solver for sequence of sparse linear systems
The algebraic multigrid (AMG) method is one of the most frequently used algorithms for solving the large-scale sparse linear systems that arise in realistic simulations of science and engineering applications. However, as the concurrency of supercomputers increases, the AMG solver suffers increasingly poor parallel scalability due to the coarse-level construction in its setup phase. In this paper, to improve the parallel scalability of traditional AMG when solving the sequences of sparse linear systems arising from PDE-based simulations, we propose a new AMG procedure, αSetup-AMG, based on an adaptive setup strategy. The main idea behind αSetup-AMG is the introduction of a setup condition into the coarsening process, so that the hierarchy is constructed as needed rather than in advance in an independent phase, as in the traditional procedure. As a result, αSetup-AMG incurs lower setup cost and fewer levels over the sequence of linear systems. Numerical results on thousands of cores for a radiation hydrodynamics simulation in the inertial confinement fusion (ICF) application show a significant improvement in the efficiency of the αSetup-AMG solver.
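The baseline that αSetup-AMG improves on is easy to see in code: classical AMG pays a setup cost to build the hierarchy, which is then amortized over a sequence of solves. The sketch below uses PyAMG, a third-party package, and shows only this traditional setup-once/solve-many pattern, not the paper's adaptive setup condition.

```python
# Sketch of the setup/solve split in classical AMG (assumes the third-party
# PyAMG package; this is not the aSetup-AMG code).
import numpy as np
import pyamg

A = pyamg.gallery.poisson((200, 200), format='csr')  # model sparse system
ml = pyamg.ruge_stuben_solver(A)   # setup phase: coarse levels built up front

rng = np.random.default_rng(0)
for step in range(5):              # sequence of solves, e.g. time steps
    b = rng.standard_normal(A.shape[0])
    x = ml.solve(b, tol=1e-8)      # solve phase only; the hierarchy is reused
    print(step, np.linalg.norm(b - A @ x) / np.linalg.norm(b))
```

αSetup-AMG's point is that, for a sequence of systems, even this up-front setup is wasteful at scale, so the hierarchy should instead be extended lazily, level by level, as the solves demand it.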
Reordering Strategy for Blocking Optimization in Sparse Linear Solvers
Solving sparse linear systems is a problem that arises in many scientific applications, and sparse direct solvers are a time-consuming, key kernel both for those applications and for more advanced solvers such as hybrid direct-iterative solvers. For this reason, optimizing their performance on modern architectures is critical. The preprocessing steps of sparse direct solvers, ordering and block-symbolic factorization, are two major steps that reduce the amount of computation and memory and improve the task granularity needed to reach a good level of performance with BLAS kernels. With the advent of GPUs, the granularity of the block computation has become more important than ever. In this paper, we present a reordering strategy that increases this block granularity. The strategy relies on block-symbolic factorization to refine the ordering produced by tools such as Metis or Scotch, without affecting the number of operations required to solve the problem. We integrate this algorithm in the PaStiX solver and show a substantial reduction in the number of off-diagonal blocks on a large spectrum of matrices. This improvement leads to an increase in efficiency of up to 20% on GPUs.

1. Introduction. Many scientific applications, such as electromagnetism, astrophysics, and computational fluid dynamics, use numerical models that require solving linear systems of the form $Ax = b$. In these problems, the matrix $A$ can be considered either dense (almost no zero entries) or sparse (mostly zero entries). Due to the multiple structural and numerical differences that appear in these problems, many different solution methods exist. In this paper, we focus on problems leading to sparse systems with a symmetric pattern and, more specifically, on direct methods that factorize the matrix $A$ as $LL^t$, $LDL^t$, or $LU$, with $L$, $D$, and $U$, respectively, unit lower triangular, diagonal, and upper triangular, according to the numerical properties of the problem. Such sparse matrices appear mostly when discretizing partial differential equations (PDEs) on two-dimensional (2D) and three-dimensional (3D) finite element or finite volume meshes. The main issue with such factorizations is the fill-in (zero entries becoming nonzero) that appears in the factorized form of $A$ during the execution of the algorithm. If not correctly handled, the fill-in can turn the sparse matrix into a dense one that may not fit in memory. In this context, sparse direct solvers rely on two important preprocessing steps to reduce and control this fill-in. The first finds a suitable ordering of the unknowns that aims at minimizing the fill-in, limiting the memory overhead and the floating-point operations (Flops) required to complete the factorization. The problem is then transformed into $(PAP^t)(Px) = Pb$.
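The permuted system $(PAP^t)(Px) = Pb$ is mechanical to set up, as the sketch below shows. SciPy ships only the reverse Cuthill-McKee ordering, a bandwidth reducer, so this illustrates the transformation itself rather than the paper's pipeline, which refines nested-dissection orderings from Metis or Scotch inside PaStiX.

```python
# Minimal illustration of solving (P A P^t)(P x) = P b with a fill/bandwidth-
# reducing ordering, using SciPy's reverse Cuthill-McKee as the permutation.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla
from scipy.sparse.csgraph import reverse_cuthill_mckee

n = 100
T = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n))
A = (sp.kron(sp.eye(n), T) + sp.kron(T, sp.eye(n))).tocsr()
b = np.ones(A.shape[0])

perm = reverse_cuthill_mckee(A, symmetric_mode=True)  # the permutation P
Ap = A[perm, :][:, perm].tocsc()                      # P A P^t
bp = b[perm]                                          # P b

xp = spla.spsolve(Ap, bp)   # solve the reordered system for P x
x = np.empty_like(xp)
x[perm] = xp                # undo the permutation to recover x
print(np.linalg.norm(A @ x - b))
```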
A parallel log barrier-based mesh warping algorithm for distributed memory machines
Parallel dynamic meshes are essential for computational simulations of large-scale scientific applications involving motion. To address this need, we propose parallel LBWARP, a parallel log-barrier-based tetrahedral mesh warping algorithm for distributed memory machines. Our algorithm is a general-purpose, geometric mesh warping algorithm that parallelizes the sequential LBWARP algorithm proposed by Shontz and Vavasis. The first step of the algorithm computes a set of local weights for each interior node that describe the relative distances of the node to each of its neighbors. The weight computation step is the most time-consuming in the parallel algorithm. Based on our choice of mesh partition and the corresponding distribution of data and assignment of tasks to processors, communication among processors is avoided in an embarrassingly parallel computation of the weights. Once this representation of the initial mesh is determined, a target deformation of the boundary is applied, also in an embarrassingly parallel manner. Finally, the new coordinates of the interior nodes are obtained by solving a system of linear equations with multiple right-hand sides, based on the weights and the boundary deformation. This linear system can be solved using one of three parallel sparse linear solvers, i.e., the distributed block BiCG, block GMRES, or LU algorithm, all of which support linear systems with multiple right-hand-side vectors. Our numerical results demonstrate good efficiency and strong scalability of parallel LBWARP on up to 64 processors, with close to linear speedup in all cases; weak scalability is also demonstrated. The performance of the parallel sparse linear solvers depends on factors such as the mesh size, the amount of available memory, and the number of processors. For example, the distributed LU algorithm performs better on small meshes, whereas the distributed block BiCG and block GMRES algorithms perform better when available memory is limited. Finally, we demonstrate parallel LBWARP's performance on a sequence of mesh deformations, which can significantly reduce the overall runtime: when applied to k deformations with the distributed LU solver, parallel LBWARP reuses the weight matrix computed during the first deformation, giving close to a k-fold speedup for sufficiently many deformations.
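The "factor once, solve for many right-hand sides" pattern that makes the LU variant attractive for repeated deformations is easy to demonstrate in serial SciPy. This sketch is not the distributed solvers of parallel LBWARP; a diagonally dominant random matrix stands in for the mesh-weight system, with one right-hand side per coordinate.

```python
# Sketch: one sparse LU factorization reused across multiple right-hand sides.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 2000
rng = np.random.default_rng(1)
# Diagonally dominant sparse matrix standing in for the weight system.
A = sp.random(n, n, density=0.002, random_state=rng)
A = (A + A.T + 10.0 * sp.eye(n)).tocsc()

lu = spla.splu(A)                 # factor once: the expensive step

B = rng.standard_normal((n, 3))   # multiple right-hand sides (x, y, z)
X = lu.solve(B)                   # cheap triangular solves, reusing L and U
print(np.linalg.norm(A @ X - B))
```

For a sequence of k deformations, only the right-hand sides change, so the factorization cost is paid once and each subsequent solve is a pair of triangular solves, which is the source of the near k-fold speedup reported above.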
Non-intrusive parallelization of multibody system dynamic simulations
This paper evaluates two non-intrusive parallelization techniques for multibody system dynamics: parallel sparse linear equation solvers and OpenMP. Both techniques can be applied to existing simulation software with minimal changes to the code structure, a major advantage over the Message Passing Interface, the standard parallelization method in multibody dynamics. Both techniques have been applied to parallelize an existing sequential implementation of a global index-3 augmented Lagrangian formulation, combined with the trapezoidal rule as numerical integrator, to solve the forward dynamics of a variable-loop four-bar mechanism. Numerical experiments measured the efficiency as a function of problem size and matrix filling. Results show that the best parallel solver (Pardiso) outperforms the best sequential solver (CHOLMOD) for multibody problems of large and medium size with matrix fillings above 10. OpenMP also proved advantageous even for small problems. Both techniques delivered speedups above 70% of the maximum theoretical values for a wide range of multibody problems.
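The "non-intrusive" point can be reduced to a single swapped call: replace the sequential sparse solve with a shared-memory parallel one and leave the rest of the simulation untouched. The sketch below assumes the third-party pypardiso package, which wraps the Pardiso solver behind a scipy-compatible spsolve; it illustrates the drop-in pattern, not the authors' setup.

```python
# Sketch of a drop-in parallel sparse solve (pypardiso is an assumed
# third-party dependency; the guard keeps the script runnable without it).
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 5000
T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format='csr')
b = np.ones(n)

x_seq = spla.spsolve(T.tocsc(), b)    # existing sequential call ...

try:
    import pypardiso
    x_par = pypardiso.spsolve(T, b)   # ... replaced by a parallel drop-in
    print(np.linalg.norm(x_seq - x_par))
except ImportError:
    print("pypardiso not installed; sequential path only")
```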
On the Easy Use of Scientific Computing Services for Large Scale Linear Algebra and Parallel Decision Making with the P-Grade Portal
Scientific research is becoming increasingly dependent on the large-scale analysis of data using distributed computing infrastructures (Grid, cloud, GPU, etc.). Scientific computing (Petitet et al. 1999) aims at constructing mathematical models and numerical solution techniques for problems arising in science and engineering. In this paper, we describe the services of an integrated portal, based on the P-Grade (Parallel Grid Run-time and Application Development Environment) portal (http://www.p-grade.hu), that enables the solution of large-scale linear systems of equations using direct solvers, simplifies the use of parallel block iterative algorithms, and provides an interface for parallel decision-making algorithms. The ultimate goal is to develop a single sign-on, integrated, multi-service environment providing easy access to different kinds of mathematical calculations and algorithms on hybrid distributed computing infrastructures, combining the benefits of large clusters, Grids, or clouds when needed.
SlabLU: a two-level sparse direct solver for elliptic PDEs
The paper describes a sparse direct solver for the linear systems that arise from the discretization of an elliptic PDE on a two-dimensional domain. The scheme decomposes the domain into thin subdomains, or "slabs," and uses a two-level approach designed with parallelization in mind. The scheme takes advantage of the $\mathcal{H}^2$-matrix structure emerging during factorization and uses randomized algorithms to efficiently recover this structure. As opposed to multi-level nested dissection schemes that use $\mathcal{H}$- or $\mathcal{H}^2$-matrices for a hierarchy of front sizes, SlabLU is a two-level scheme that only uses $\mathcal{H}^2$-matrix algebra for fronts of roughly the same size. This simplicity allows the scheme to be easily tuned for performance on modern architectures and GPUs. The solver is compatible with a range of different local discretizations, and numerical experiments demonstrate its performance for regular discretizations of rectangular and curved geometries. The technique becomes particularly efficient when combined with very high-order accurate multidomain spectral collocation schemes. With this discretization, a Helmholtz problem on a domain of size $1000\lambda \times 1000\lambda$ (for which $N = 100$M) is solved in 15 minutes to 6 correct digits on a high-powered desktop with GPU acceleration.
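A toy version of the two-level slab decomposition fits in a short script: eliminate the slab interiors, which form independent blocks and could be factored in parallel, then solve the reduced interface (Schur complement) system. The grid size, slab width, and variable names below are ours, and the sketch deliberately omits what makes SlabLU fast, namely the $\mathcal{H}^2$ compression and randomized recovery of the fronts.

```python
# Toy two-level "slab" solve for 2D Poisson: interiors eliminated blockwise,
# then a dense reduced system on the interfaces.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n, w = 64, 8                      # n x n grid, slabs w grid-columns wide
T = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n))
A = (sp.kron(sp.eye(n), T) + sp.kron(T, sp.eye(n))).tocsc()
b = np.ones(n * n)

# Grid column j holds unknowns j*n .. j*n+n-1; every w-th column separates slabs.
cols = np.arange(n)
is_iface = (cols % w == w - 1) & (cols < n - 1)
F = np.flatnonzero(np.repeat(is_iface, n))    # interface unknowns
I = np.flatnonzero(np.repeat(~is_iface, n))   # slab-interior unknowns

A_II = A[I][:, I].tocsc(); A_IF = A[I][:, F].tocsc()
A_FI = A[F][:, I].tocsc(); A_FF = A[F][:, F].toarray()

# A_II is block diagonal across slabs, so this factorization decouples into
# independent per-slab factorizations (the parallel-friendly level 1).
lu = spla.splu(A_II)
S = A_FF - A_FI @ lu.solve(A_IF.toarray())    # reduced interface system (level 2)

x = np.empty(n * n)
x[F] = np.linalg.solve(S, b[F] - A_FI @ lu.solve(b[I]))
x[I] = lu.solve(b[I] - A_IF @ x[F])
print(np.linalg.norm(A @ x - b))
```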
Parallel Sparse Matrix-Matrix Multiplication and Indexing: Implementation and Experiments
Generalized sparse matrix-matrix multiplication (or SpGEMM) is a key primitive for many high performance graph algorithms as well as for some linear solvers, such as algebraic multigrid. Here we show that SpGEMM also yields efficient algorithms for general sparse-matrix indexing in distributed memory, provided that the underlying SpGEMM implementation is sufficiently flexible and scalable. We demonstrate that our parallel SpGEMM methods, which use two-dimensional block data distributions with serial hypersparse kernels, are indeed highly flexible, scalable, and memory-efficient in the general case. This algorithm is the first to yield increasing speedup on an unbounded number of processors; our experiments show scaling up to thousands of processors in a variety of test scenarios.
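The indexing-via-SpGEMM formulation is compact: the submatrix A(I, J) is the triple product R A Q, where R selects rows of the identity and Q selects columns. The serial SciPy sketch below shows the formulation itself; the paper's contribution is running the same products on a flexible, scalable distributed SpGEMM with hypersparse kernels, for which SciPy's local product merely stands in.

```python
# Sketch: extracting A(I, J) as R * A * Q, i.e., two sparse matrix products.
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(2)
A = sp.random(1000, 1000, density=0.01, random_state=rng, format='csr')
I = np.array([3, 14, 159, 265])   # rows to extract
J = np.array([2, 71, 828])        # columns to extract

E = sp.eye(1000, format='csr')
R = E[I, :]                       # selection matrix: rows of the identity
Q = E[:, J]                       # selection matrix: columns of the identity
sub = R @ A @ Q                   # SpGEMM twice = A(I, J)

# Matches direct fancy indexing.
assert np.allclose(sub.toarray(), A[I, :][:, J].toarray())
print(sub.nnz, "nonzeros extracted")
```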
Approximating sparse Hessian matrices using large-scale linear least squares
Large-scale optimization algorithms frequently require sparse Hessian matrices that are not readily available, and existing methods for approximating large sparse Hessian matrices have limitations. To overcome these, we propose a novel approach that reformulates the problem as the solution of a large linear least squares problem. The least squares problem is sparse, but it can include a number of rows containing significantly more entries than the others, which are regarded as dense. We exploit recent work on solving such problems using either the normal equations or an augmented system to derive a robust approach for computing approximate sparse Hessian matrices. Example sparse Hessians from the CUTEst test problem collection for optimization illustrate the effectiveness and robustness of the new method.
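The least squares reformulation can be sketched on synthetic data: the unknowns are the entries of the Hessian's sparsity pattern, and each probe direction s with gradient difference y ≈ H s contributes one equation per row of H. The sketch below solves the resulting sparse system with plain LSQR; the paper instead develops normal-equations and augmented-system approaches that handle the dense rows robustly, and uses real CUTEst Hessians rather than a random symmetric matrix.

```python
# Sketch: recover the pattern entries of a sparse symmetric "Hessian" from
# probe products y = H s by solving one sparse linear least squares problem.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

rng = np.random.default_rng(3)
n, m = 50, 15                                  # size and number of probes
H = sp.random(n, n, density=0.05, random_state=rng)
H = (H + H.T).tocoo()                          # symmetric sparse test matrix

# Unknowns: one value per upper-triangle nonzero (i <= j); symmetry shares it.
pat = sorted({(i, j) for i, j in zip(H.row, H.col) if i <= j})
unk = {p: k for k, p in enumerate(pat)}

rows, cols, vals, rhs = [], [], [], []
for k in range(m):                             # m probe directions s
    s = rng.standard_normal(n)
    y = H @ s                                  # gradient difference ~ H s
    for i in range(n):                         # equation: sum_j H_ij s_j = y_i
        for (a, c), u in unk.items():
            if a == i or c == i:
                j = c if a == i else a         # the "other" index (symmetry)
                rows.append(k * n + i); cols.append(u); vals.append(s[j])
        rhs.append(y[i])

M = sp.coo_matrix((vals, (rows, cols)), shape=(m * n, len(unk)))
x = spla.lsqr(M.tocsr(), np.array(rhs))[0]     # sparse least squares solve

h_exact = np.array([H.tocsr()[i, j] for (i, j) in pat])
print(np.linalg.norm(x - h_exact))             # small: entries recovered
```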