2,649 results for "Sparse matrices"
Parallel Sparse Matrix-Matrix Multiplication and Indexing: Implementation and Experiments
Generalized sparse matrix-matrix multiplication (or SpGEMM) is a key primitive for many high performance graph algorithms as well as for some linear solvers, such as algebraic multigrid. Here we show that SpGEMM also yields efficient algorithms for general sparse-matrix indexing in distributed memory, provided that the underlying SpGEMM implementation is sufficiently flexible and scalable. We demonstrate that our parallel SpGEMM methods, which use two-dimensional block data distributions with serial hypersparse kernels, are indeed highly flexible, scalable, and memory-efficient in the general case. This algorithm is the first to yield increasing speedup on an unbounded number of processors; our experiments show scaling up to thousands of processors in a variety of test scenarios.
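The indexing reduction this abstract refers to can be made concrete: extracting a submatrix A[I, J] is expressible as two SpGEMM calls with sparse selection matrices. Below is a minimal sketch of that idea using SciPy; the function name and setup are ours, not the paper's Combinatorial BLAS implementation.

```python
import numpy as np
import scipy.sparse as sp

def submatrix_via_spgemm(A, I, J):
    """Compute A[I, J] as R @ A @ Q, where R selects rows and Q selects
    columns; each selection matrix has exactly one nonzero per row/column."""
    m, n = A.shape
    R = sp.csr_matrix((np.ones(len(I)), (np.arange(len(I)), I)), shape=(len(I), m))
    Q = sp.csr_matrix((np.ones(len(J)), (J, np.arange(len(J)))), shape=(n, len(J)))
    return R @ A @ Q  # two sparse matrix-matrix multiplications

A = sp.random(1000, 1000, density=0.01, format="csr", random_state=0)
I, J = [3, 7, 500], [0, 42, 999]
assert np.allclose(submatrix_via_spgemm(A, I, J).toarray(), A[I][:, J].toarray())
```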
Register-Aware Optimizations for Parallel Sparse Matrix–Matrix Multiplication
General sparse matrix–matrix multiplication (SpGEMM) is a fundamental building block of a number of high-level algorithms and real-world applications. In recent years, several efficient SpGEMM algorithms have been proposed for many-core processors such as GPUs. However, their implementations of sparse accumulators, the core component of SpGEMM, mostly use low-speed on-chip shared memory and global memory, while high-speed registers are seriously underutilised. In this paper, we propose three novel register-aware SpGEMM algorithms for three representative sparse accumulators: sort, merge and hash, respectively. We fully utilise GPU registers to fetch data, perform computations and store results. In the experiments, our algorithms deliver excellent performance on a benchmark suite of 205 sparse matrices from the SuiteSparse Matrix Collection. Specifically, on an Nvidia Pascal P100 GPU, our three register-aware sparse accumulators achieve on average 2.0× (up to 5.4×), 2.6× (up to 10.5×) and 1.7× (up to 5.2×) speedups over their original implementations in the libraries bhSPARSE, RMerge and NSPARSE, respectively.
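For readers unfamiliar with the "sparse accumulator" this abstract names, here is a minimal hash-style accumulator for one output row of C = A @ B, written in plain Python for clarity. The paper's contribution, keeping this structure in GPU registers, is not reproduced; the data layout below is our own illustration.

```python
def spgemm_row_hash(a_cols, a_vals, B_rows):
    """Accumulate one row of C = A @ B with a hash accumulator.
    a_cols/a_vals: column indices and values of one row of A.
    B_rows: dict mapping a row index of B to its list of (col, val) pairs."""
    acc = {}                              # hash accumulator: column -> partial sum
    for k, a in zip(a_cols, a_vals):      # for each nonzero A[i, k] ...
        for j, b in B_rows.get(k, []):    # ... merge in the scaled row B[k, :]
            acc[j] = acc.get(j, 0.0) + a * b
    return sorted(acc.items())            # compressed, column-ordered output row

B_rows = {0: [(1, 2.0), (3, 1.0)], 2: [(1, -1.0)]}
print(spgemm_row_hash([0, 2], [1.0, 3.0], B_rows))  # [(1, -1.0), (3, 1.0)]
```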
TileSpTRSV: a tiled algorithm for parallel sparse triangular solve on GPUs
Sparse triangular solve (SpTRSV) is one of the most important level-2 kernels in the sparse basic linear algebra subprograms (BLAS). Compared to sparse matrix–vector multiplication (SpMV), the other level-2 sparse BLAS kernel, SpTRSV is in general harder to parallelize on many-core processors such as GPUs. Much recent work focuses on reducing dependencies and synchronizations in the level-set and Sync-free algorithms for SpTRSV, but less work makes good use of the sparse spatial structure of the matrix on GPUs. In this paper, we propose a tiled algorithm called TileSpTRSV for optimizing SpTRSV on GPUs by exploiting the 2D spatial structure of sparse matrices. We design two implementations, TileSpTRSV_level-set and TileSpTRSV_sync-free, on top of the level-set and Sync-free algorithms, respectively. Testing 16 representative matrices on a recent NVIDIA GPU, TileSpTRSV_level-set gives on average 5.29× (up to 38.10×), 5.33× (up to 21.32×) and 2.62× (up to 12.87×) speedups over the cuSPARSE, Sync-free and Recblock algorithms, respectively.
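The level-set algorithm this abstract builds on assigns each unknown a "level" equal to the longest dependency chain reaching it, so all rows in a level can be solved in parallel. A minimal sketch, assuming a CSR lower-triangular matrix (the helper and example are ours):

```python
import numpy as np
import scipy.sparse as sp

def level_sets(L):
    """level[i] = 1 + max level over dependencies j < i with L[i, j] != 0."""
    n = L.shape[0]
    level = np.zeros(n, dtype=int)
    indptr, indices = L.indptr, L.indices
    for i in range(n):
        for j in indices[indptr[i]:indptr[i + 1]]:
            if j < i:
                level[i] = max(level[i], level[j] + 1)
    return level  # rows sharing a level have no mutual dependencies

L = sp.csr_matrix(np.array([[2., 0, 0], [1., 3., 0], [0, 4., 5.]]))
print(level_sets(L))  # [0 1 2]: this L is a chain, so no parallelism
```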
Optimizing partitioned CSR-based SpGEMM on the Sunway TaihuLight
General sparse matrix-sparse matrix multiplication (SpGEMM) is one of the basic kernels in a great many applications, and several works focus on various optimizations for it. To fully exploit the powerful computing capability of the Sunway TaihuLight supercomputer for SpGEMM, this paper designs a partitioning method and parallelization of CSR-based SpGEMM that match the Sunway architecture well. In addition, this paper optimizes the partitioning method based on the distribution of the floating-point calculations of the CSR-based SpGEMM to achieve load balance and performance improvement on the Sunway. We analyze the performance, including the memory footprint and the execution time, of the parallel CSR-based SpGEMM and of the optimized CSR-based SpGEMM on the Sunway. The experimental results show that the optimized CSR-based SpGEMM outperforms the parallel CSR-based SpGEMM and has good scalability on the Sunway.
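The FLOP-based partitioning the abstract describes can be sketched as follows: for CSR inputs, the multiply work of output row i of C = A @ B is the sum of nnz(B[k, :]) over the nonzero columns k of A[i, :], and rows are then cut into ranges of roughly equal total work. All names below are ours, not the Sunway implementation's.

```python
import numpy as np
import scipy.sparse as sp

def balanced_row_partitions(A, B, nparts):
    """Split the rows of A into nparts contiguous ranges of similar SpGEMM work."""
    nnz_b_rows = np.diff(B.indptr)                 # nnz of each row of B
    flops = np.array([nnz_b_rows[A.indices[A.indptr[i]:A.indptr[i + 1]]].sum()
                      for i in range(A.shape[0])])
    cum = np.cumsum(flops)                         # prefix sums of per-row work
    cuts = np.searchsorted(cum, np.linspace(0, cum[-1], nparts + 1)[1:-1])
    return np.split(np.arange(A.shape[0]), cuts)   # row ranges, one per worker

A = sp.random(8, 8, density=0.4, format="csr", random_state=1)
B = sp.random(8, 8, density=0.4, format="csr", random_state=2)
print([p.tolist() for p in balanced_row_partitions(A, B, 3)])
```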
Faster algorithms for sparse ILP and hypergraph multi-packing/multi-cover problems
In our paper, we consider the following general problems: check feasibility, count the number of feasible solutions, find an optimal solution, and count the number of optimal solutions in P ∩ Z^n, assuming that P is a polyhedron defined by a system Ax ≤ b or Ax = b, x ≥ 0 with a sparse matrix A. We develop algorithms for these problems that outperform state-of-the-art ILP and counting algorithms on sparse instances with bounded elements in terms of computational complexity. Assuming that the matrix A has bounded elements, our complexity bounds have the form s^{O(n)}, where s is the smaller of the maximum number of non-zeroes in any column and in any row of A. For s = o(log n), this bound outperforms the state-of-the-art ILP feasibility complexity bound (log n)^{O(n)}, due to Reis & Rothvoss (in: 2023 IEEE 64th Annual Symposium on Foundations of Computer Science (FOCS), IEEE, pp. 974–988). For s = ϕ^{o(log n)}, where ϕ denotes the input bit-encoding length, it outperforms the state-of-the-art ILP counting complexity bound ϕ^{O(n log n)}, due to Barvinok et al. (in: Proceedings of 1993 IEEE 34th Annual Foundations of Computer Science, pp. 566–572, https://doi.org/10.1109/SFCS.1993.366830, 1993), Dyer & Kannan (Math Oper Res 22(3):545–549, https://doi.org/10.1287/moor.22.3.545, 1997), Barvinok & Pommersheim (Algebr Combin 38:91–147, 1999), and Barvinok (in: European Mathematical Society, ETH-Zentrum, Zurich, 2008). We use known and new methods to develop new exponential algorithms for Edge/Vertex Multi-Packing/Multi-Cover Problems on graphs and hypergraphs. This framework covers many different problems, such as the Stable Multi-set, Vertex Multi-cover, Dominating Multi-set, Set Multi-cover, Multi-set Multi-cover, and Hypergraph Multi-matching problems, which are natural generalizations of the standard Stable Set, Vertex Cover, Dominating Set, Set Cover, and Maximum Matching problems.
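As a concrete reading of the sparsity parameter (our interpretation of the abstract: s is the smaller of the maximum number of non-zeroes in any column and in any row of A), the following sketch computes it for a SciPy matrix:

```python
import numpy as np
import scipy.sparse as sp

def sparsity_parameter(A):
    """s = min(max nonzeros per row, max nonzeros per column)."""
    A = sp.csr_matrix(A)
    row_nnz = np.diff(A.indptr)                 # nonzeros in each row
    col_nnz = np.diff(sp.csc_matrix(A).indptr)  # nonzeros in each column
    return int(min(row_nnz.max(), col_nnz.max()))

A = sp.csr_matrix(np.array([[1, 0, 2], [0, 3, 0], [0, 0, 4]]))
print(sparsity_parameter(A))  # 2: densest row and densest column each hold 2
```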
Fast and accurate pseudoinverse with sparse matrix reordering and incremental approach
How can we compute the pseudoinverse of a sparse feature matrix efficiently and accurately for solving optimization problems? A pseudoinverse is a generalization of the matrix inverse and has been extensively utilized as a fundamental building block for solving linear systems in machine learning. However, even an approximate computation of the pseudoinverse, let alone an exact one, is very time-consuming due to its demanding time complexity, which prevents it from being applied to large data. In this paper, we propose FastPI (Fast PseudoInverse), a novel incremental singular value decomposition (SVD) based pseudoinverse method for sparse matrices. Based on the observation that many real-world feature matrices are sparse and highly skewed, FastPI reorders and divides the feature matrix and incrementally computes a low-rank SVD from the divided components. To show the efficacy of FastPI, we apply it to real-world multi-label linear regression problems. Through extensive experiments, we demonstrate that FastPI computes the pseudoinverse faster than other approximate methods without loss of accuracy. The results imply that our method efficiently computes the low-rank pseudoinverse of large, sparse matrices that existing methods cannot handle within limited time and space.
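The SVD route to the pseudoinverse that FastPI accelerates is A⁺ = V Σ⁺ Uᵀ. A minimal dense sketch using NumPy is below; the paper's actual contributions (reordering, dividing, and incremental low-rank SVD) are not reproduced here.

```python
import numpy as np

def pinv_via_svd(A, tol=1e-12):
    """Pseudoinverse from the SVD: invert singular values above a cutoff."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_inv = np.divide(1.0, s, out=np.zeros_like(s), where=s > tol * s[0])
    return Vt.T @ (s_inv[:, None] * U.T)   # V @ Sigma^+ @ U^T

A = np.array([[1., 0.], [0., 0.], [0., 2.]])
assert np.allclose(pinv_via_svd(A), np.linalg.pinv(A))
```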
Performance Analysis of Sparse Matrix-Vector Multiplication (SpMV) on Graphics Processing Units (GPUs)
Graphics processing units (GPUs) have delivered remarkable performance for a variety of high performance computing (HPC) applications through massive parallelism. One such application is sparse matrix-vector multiplication (SpMV), which is central to many scientific, engineering, and other applications, including machine learning. No single SpMV storage or computation scheme provides consistently high performance for all matrices, due to their varying sparsity patterns. An extensive literature review reveals that the performance of SpMV techniques on GPUs has not been studied in sufficient detail. In this paper, we provide a detailed analysis of SpMV performance on GPUs using four notable sparse matrix storage schemes (compressed sparse row (CSR), ELLPACK (ELL), hybrid ELL/COO (HYB), and compressed sparse row 5 (CSR5)), five performance metrics (execution time, giga floating point operations per second (GFLOPS), achieved occupancy, instructions per warp, and warp execution efficiency), five matrix sparsity features (nnz, anpr, npr variance, maxnpr, and distavg), and 17 sparse matrices from 10 application domains (chemical simulations, computational fluid dynamics (CFD), electromagnetics, linear programming, economics, etc.). Based on the deeper insights gained through this analysis, we then propose a technique called the heterogeneous CPU–GPU Hybrid (HCGHYB) scheme. It utilizes both the CPU and GPU in parallel and outperforms the HYB format by an average speedup of 1.7x. Heterogeneous computing is an important direction for SpMV and other application areas. Moreover, to the best of our knowledge, this is the first work to discuss SpMV performance on GPUs in such depth. We believe that this performance analysis and the heterogeneous scheme will open up many new directions and improvements for SpMV computing in the future.
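As a reference point for the storage schemes compared above, here is the baseline CSR SpMV kernel written out in plain Python (ELL, HYB and CSR5 are not reproduced; the harness is ours):

```python
import numpy as np
import scipy.sparse as sp

def spmv_csr(indptr, indices, data, x):
    """y = A @ x for A in CSR form: one dot product per output row."""
    y = np.zeros(len(indptr) - 1)
    for i in range(len(y)):
        lo, hi = indptr[i], indptr[i + 1]          # this row's nonzero range
        y[i] = data[lo:hi] @ x[indices[lo:hi]]     # dot over the row's nonzeros
    return y

A = sp.random(100, 100, density=0.05, format="csr", random_state=0)
x = np.random.default_rng(0).standard_normal(100)
assert np.allclose(spmv_csr(A.indptr, A.indices, A.data, x), A @ x)
```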
A novel image hashing with low-rank sparse matrix decomposition and feature distance
Image hashing is an efficient image processing technique for various applications, such as retrieval, copy detection and authentication. In this paper, we design a novel image hashing algorithm based on low-rank sparse matrix decomposition (LRSMD) and feature distances. First, an input image is preprocessed by interpolation, Gaussian blur and color space conversion. Next, the preprocessed image is fed into the LRSMD to learn a low-rank matrix. Then, statistical features of non-overlapping blocks in the low-rank matrix are extracted. Finally, the hash code is obtained by calculating feature distances. Experiments on public datasets demonstrate the robustness and discrimination of the proposed algorithm, and the results show that it outperforms several state-of-the-art algorithms in balancing robustness and discrimination.
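A minimal sketch of the pipeline this abstract outlines, with one loud simplification: a rank-r truncated SVD stands in for the paper's LRSMD, and the block statistic and thresholding below are illustrative choices of ours, not the published design.

```python
import numpy as np

def image_hash(img, rank=8, block=16):
    """Low-rank approximation -> per-block statistics -> binary hash code."""
    U, s, Vt = np.linalg.svd(img, full_matrices=False)
    low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]          # low-rank part
    h, w = low_rank.shape
    feats = np.array([low_rank[i:i + block, j:j + block].mean()  # block statistic
                      for i in range(0, h, block)
                      for j in range(0, w, block)])
    dists = np.abs(feats - feats.mean())                     # feature distances
    return (dists > np.median(dists)).astype(np.uint8)       # binarize to a hash

img = np.random.default_rng(0).random((64, 64))              # stand-in "image"
print(image_hash(img))
```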