2,649 results for "Sparse matrices"
Parallel Sparse Matrix-Matrix Multiplication and Indexing: Implementation and Experiments
Generalized sparse matrix-matrix multiplication (or SpGEMM) is a key primitive for many high performance graph algorithms as well as for some linear solvers, such as algebraic multigrid. Here we show that SpGEMM also yields efficient algorithms for general sparse-matrix indexing in distributed memory, provided that the underlying SpGEMM implementation is sufficiently flexible and scalable. We demonstrate that our parallel SpGEMM methods, which use two-dimensional block data distributions with serial hypersparse kernels, are indeed highly flexible, scalable, and memory-efficient in the general case. This algorithm is the first to yield increasing speedup on an unbounded number of processors; our experiments show scaling up to thousands of processors in a variety of test scenarios.
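The indexing reduction this abstract refers to can be made concrete: extracting a submatrix A[I, J] is expressible as two SpGEMM calls with sparse selection matrices. Below is a minimal sketch of that idea using SciPy; the function name and setup are ours, not the paper's Combinatorial BLAS implementation.

```python
import numpy as np
import scipy.sparse as sp

def submatrix_via_spgemm(A, I, J):
    """Compute A[I, J] as R @ A @ Q, where R selects rows and Q selects
    columns; each selection matrix has exactly one nonzero per row/column."""
    m, n = A.shape
    R = sp.csr_matrix((np.ones(len(I)), (np.arange(len(I)), I)), shape=(len(I), m))
    Q = sp.csr_matrix((np.ones(len(J)), (J, np.arange(len(J)))), shape=(n, len(J)))
    return R @ A @ Q  # two sparse matrix-matrix multiplications

A = sp.random(1000, 1000, density=0.01, format="csr", random_state=0)
I, J = [3, 7, 500], [0, 42, 999]
assert np.allclose(submatrix_via_spgemm(A, I, J).toarray(), A[I][:, J].toarray())
```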
Register-Aware Optimizations for Parallel Sparse Matrix–Matrix Multiplication
General sparse matrix–matrix multiplication (SpGEMM) is a fundamental building block of a number of high-level algorithms and real-world applications. In recent years, several efficient SpGEMM algorithms have been proposed for many-core processors such as GPUs. However, their implementations of sparse accumulators, the core component of SpGEMM, mostly use low-speed on-chip shared memory and global memory, while high-speed registers are seriously underutilised. In this paper, we propose three novel register-aware SpGEMM algorithms for three representative sparse accumulators: sort, merge and hash, respectively. We fully utilise GPU registers to fetch data, perform computations and store results. In the experiments, our algorithms deliver excellent performance on a benchmark suite of 205 sparse matrices from the SuiteSparse Matrix Collection. Specifically, on an Nvidia Pascal P100 GPU, our three register-aware sparse accumulators achieve on average 2.0× (up to 5.4×), 2.6× (up to 10.5×) and 1.7× (up to 5.2×) speedups over their original implementations in the libraries bhSPARSE, RMerge and NSPARSE, respectively.
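For readers unfamiliar with the "sparse accumulator" this abstract names, here is a minimal hash-style accumulator for one output row of C = A @ B, written in plain Python for clarity. The paper's contribution, keeping this structure in GPU registers, is not reproduced; the data layout below is our own illustration.

```python
def spgemm_row_hash(a_cols, a_vals, B_rows):
    """Accumulate one row of C = A @ B with a hash accumulator.
    a_cols/a_vals: column indices and values of one row of A.
    B_rows: dict mapping a row index of B to its list of (col, val) pairs."""
    acc = {}                              # hash accumulator: column -> partial sum
    for k, a in zip(a_cols, a_vals):      # for each nonzero A[i, k] ...
        for j, b in B_rows.get(k, []):    # ... merge in the scaled row B[k, :]
            acc[j] = acc.get(j, 0.0) + a * b
    return sorted(acc.items())            # compressed, column-ordered output row

B_rows = {0: [(1, 2.0), (3, 1.0)], 2: [(1, -1.0)]}
print(spgemm_row_hash([0, 2], [1.0, 3.0], B_rows))  # [(1, -1.0), (3, 1.0)]
```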
TileSpTRSV: a tiled algorithm for parallel sparse triangular solve on GPUs
Sparse triangular solve (SpTRSV) is one of the most important level-2 kernels in the sparse basic linear algebra subprograms (BLAS). Compared to sparse matrix–vector multiplication (SpMV), the other level-2 sparse BLAS kernel, SpTRSV is in general harder to parallelize on many-core processors such as GPUs. Much recent work focuses on reducing dependencies and synchronizations in the level-set and Sync-free algorithms for SpTRSV, but less work makes good use of the sparse spatial structure of the matrix on GPUs. In this paper, we propose a tiled algorithm called TileSpTRSV for optimizing SpTRSV on GPUs by exploiting the 2D spatial structure of sparse matrices. We design two implementations, TileSpTRSV_level-set and TileSpTRSV_sync-free, on top of the level-set and Sync-free algorithms, respectively. Testing 16 representative matrices on a recent NVIDIA GPU, TileSpTRSV_level-set gives on average 5.29× (up to 38.10×), 5.33× (up to 21.32×) and 2.62× (up to 12.87×) speedups over the cuSPARSE, Sync-free and Recblock algorithms, respectively.
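The level-set algorithm this abstract builds on assigns each unknown a "level" equal to the longest dependency chain reaching it, so all rows in a level can be solved in parallel. A minimal sketch, assuming a CSR lower-triangular matrix (the helper and example are ours):

```python
import numpy as np
import scipy.sparse as sp

def level_sets(L):
    """level[i] = 1 + max level over dependencies j < i with L[i, j] != 0."""
    n = L.shape[0]
    level = np.zeros(n, dtype=int)
    indptr, indices = L.indptr, L.indices
    for i in range(n):
        for j in indices[indptr[i]:indptr[i + 1]]:
            if j < i:
                level[i] = max(level[i], level[j] + 1)
    return level  # rows sharing a level have no mutual dependencies

L = sp.csr_matrix(np.array([[2., 0, 0], [1., 3., 0], [0, 4., 5.]]))
print(level_sets(L))  # [0 1 2]: this L is a chain, so no parallelism
```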
Optimizing partitioned CSR-based SpGEMM on the Sunway TaihuLight
General sparse matrix-sparse matrix multiplication (SpGEMM) is one of the basic kernels in a great many applications, and several works focus on various optimizations for it. To fully exploit the powerful computing capability of the Sunway TaihuLight supercomputer for SpGEMM, this paper designs a partitioning method and parallelization of CSR-based SpGEMM that match the Sunway architecture well. In addition, this paper optimizes the partitioning method based on the distribution of the floating-point calculations of the CSR-based SpGEMM to achieve load balance and performance improvement on the Sunway. We analyze the performance, including the memory footprint and the execution time, of the parallel CSR-based SpGEMM and of the optimized CSR-based SpGEMM on the Sunway. The experimental results show that the optimized CSR-based SpGEMM outperforms the parallel CSR-based SpGEMM and has good scalability on the Sunway.
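The FLOP-based partitioning the abstract describes can be sketched as follows: for CSR inputs, the multiply work of output row i of C = A @ B is the sum of nnz(B[k, :]) over the nonzero columns k of A[i, :], and rows are then cut into ranges of roughly equal total work. All names below are ours, not the Sunway implementation's.

```python
import numpy as np
import scipy.sparse as sp

def balanced_row_partitions(A, B, nparts):
    """Split the rows of A into nparts contiguous ranges of similar SpGEMM work."""
    nnz_b_rows = np.diff(B.indptr)                 # nnz of each row of B
    flops = np.array([nnz_b_rows[A.indices[A.indptr[i]:A.indptr[i + 1]]].sum()
                      for i in range(A.shape[0])])
    cum = np.cumsum(flops)                         # prefix sums of per-row work
    cuts = np.searchsorted(cum, np.linspace(0, cum[-1], nparts + 1)[1:-1])
    return np.split(np.arange(A.shape[0]), cuts)   # row ranges, one per worker

A = sp.random(8, 8, density=0.4, format="csr", random_state=1)
B = sp.random(8, 8, density=0.4, format="csr", random_state=2)
print([p.tolist() for p in balanced_row_partitions(A, B, 3)])
```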
Faster algorithms for sparse ILP and hypergraph multi-packing/multi-cover problems
In our paper, we consider the following general problems: check feasibility, count the number of feasible solutions, find an optimal solution, and count the number of optimal solutions in P ∩ Z^n, assuming that P is a polyhedron defined by a system Ax ≤ b or Ax = b, x ≥ 0 with a sparse matrix A. We develop algorithms for these problems that outperform state-of-the-art ILP and counting algorithms on sparse instances with bounded elements in terms of computational complexity. Assuming that the matrix A has bounded elements, our complexity bounds have the form s^{O(n)}, where s is the smaller of the maximum number of non-zeroes in any column and in any row of A. For s = o(log n), this bound outperforms the state-of-the-art ILP feasibility complexity bound (log n)^{O(n)}, due to Reis & Rothvoss (in: 2023 IEEE 64th Annual Symposium on Foundations of Computer Science (FOCS), IEEE, pp. 974–988). For s = ϕ^{o(log n)}, where ϕ denotes the input bit-encoding length, it outperforms the state-of-the-art ILP counting complexity bound ϕ^{O(n log n)}, due to Barvinok et al. (in: Proceedings of 1993 IEEE 34th Annual Foundations of Computer Science, pp. 566–572, https://doi.org/10.1109/SFCS.1993.366830, 1993), Dyer & Kannan (Math Oper Res 22(3):545–549, https://doi.org/10.1287/moor.22.3.545, 1997), Barvinok & Pommersheim (Algebr Combin 38:91–147, 1999), and Barvinok (in: European Mathematical Society, ETH-Zentrum, Zurich, 2008). We use known and new methods to develop new exponential algorithms for Edge/Vertex Multi-Packing/Multi-Cover Problems on graphs and hypergraphs. This framework covers many different problems, such as the Stable Multi-set, Vertex Multi-cover, Dominating Multi-set, Set Multi-cover, Multi-set Multi-cover, and Hypergraph Multi-matching problems, which are natural generalizations of the standard Stable Set, Vertex Cover, Dominating Set, Set Cover, and Maximum Matching problems.
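As a concrete reading of the sparsity parameter (our interpretation of the abstract: s is the smaller of the maximum number of non-zeroes in any column and in any row of A), the following sketch computes it for a SciPy matrix:

```python
import numpy as np
import scipy.sparse as sp

def sparsity_parameter(A):
    """s = min(max nonzeros per row, max nonzeros per column)."""
    A = sp.csr_matrix(A)
    row_nnz = np.diff(A.indptr)                 # nonzeros in each row
    col_nnz = np.diff(sp.csc_matrix(A).indptr)  # nonzeros in each column
    return int(min(row_nnz.max(), col_nnz.max()))

A = sp.csr_matrix(np.array([[1, 0, 2], [0, 3, 0], [0, 0, 4]]))
print(sparsity_parameter(A))  # 2: densest row and densest column each hold 2
```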
Fast and accurate pseudoinverse with sparse matrix reordering and incremental approach
How can we compute the pseudoinverse of a sparse feature matrix efficiently and accurately for solving optimization problems? A pseudoinverse is a generalization of the matrix inverse and has been extensively utilized as a fundamental building block for solving linear systems in machine learning. However, even an approximate computation of the pseudoinverse, let alone an exact one, is very time-consuming due to its demanding time complexity, which prevents it from being applied to large data. In this paper, we propose FastPI (Fast PseudoInverse), a novel incremental singular value decomposition (SVD) based pseudoinverse method for sparse matrices. Based on the observation that many real-world feature matrices are sparse and highly skewed, FastPI reorders and divides the feature matrix and incrementally computes a low-rank SVD from the divided components. To show the efficacy of FastPI, we apply it to real-world multi-label linear regression problems. Through extensive experiments, we demonstrate that FastPI computes the pseudoinverse faster than other approximate methods without loss of accuracy. The results imply that our method efficiently computes the low-rank pseudoinverse of large, sparse matrices that existing methods cannot handle within limited time and space.
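The SVD route to the pseudoinverse that FastPI accelerates is A⁺ = V Σ⁺ Uᵀ. A minimal dense sketch using NumPy is below; the paper's actual contributions (reordering, dividing, and incremental low-rank SVD) are not reproduced here.

```python
import numpy as np

def pinv_via_svd(A, tol=1e-12):
    """Pseudoinverse from the SVD: invert singular values above a cutoff."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_inv = np.divide(1.0, s, out=np.zeros_like(s), where=s > tol * s[0])
    return Vt.T @ (s_inv[:, None] * U.T)   # V @ Sigma^+ @ U^T

A = np.array([[1., 0.], [0., 0.], [0., 2.]])
assert np.allclose(pinv_via_svd(A), np.linalg.pinv(A))
```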
Performance Analysis of Sparse Matrix-Vector Multiplication (SpMV) on Graphics Processing Units (GPUs)
Graphics processing units (GPUs) have delivered remarkable performance for a variety of high performance computing (HPC) applications through massive parallelism. One such application is sparse matrix-vector multiplication (SpMV), which is central to many scientific, engineering, and other applications, including machine learning. No single SpMV storage or computation scheme provides consistently high performance for all matrices, due to their varying sparsity patterns. An extensive literature review reveals that the performance of SpMV techniques on GPUs has not been studied in sufficient detail. In this paper, we provide a detailed analysis of SpMV performance on GPUs using four notable sparse matrix storage schemes (compressed sparse row (CSR), ELLPACK (ELL), hybrid ELL/COO (HYB), and compressed sparse row 5 (CSR5)), five performance metrics (execution time, giga floating point operations per second (GFLOPS), achieved occupancy, instructions per warp, and warp execution efficiency), five matrix sparsity features (nnz, anpr, npr variance, maxnpr, and distavg), and 17 sparse matrices from 10 application domains (chemical simulations, computational fluid dynamics (CFD), electromagnetics, linear programming, economics, etc.). Based on the deeper insights gained through this analysis, we then propose a technique called the heterogeneous CPU–GPU Hybrid (HCGHYB) scheme. It utilizes both the CPU and GPU in parallel and outperforms the HYB format by an average speedup of 1.7x. Heterogeneous computing is an important direction for SpMV and other application areas. Moreover, to the best of our knowledge, this is the first work to discuss SpMV performance on GPUs in such depth. We believe that this performance analysis and the heterogeneous scheme will open up many new directions and improvements for SpMV computing in the future.
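As a reference point for the storage schemes compared above, here is the baseline CSR SpMV kernel written out in plain Python (ELL, HYB and CSR5 are not reproduced; the harness is ours):

```python
import numpy as np
import scipy.sparse as sp

def spmv_csr(indptr, indices, data, x):
    """y = A @ x for A in CSR form: one dot product per output row."""
    y = np.zeros(len(indptr) - 1)
    for i in range(len(y)):
        lo, hi = indptr[i], indptr[i + 1]          # this row's nonzero range
        y[i] = data[lo:hi] @ x[indices[lo:hi]]     # dot over the row's nonzeros
    return y

A = sp.random(100, 100, density=0.05, format="csr", random_state=0)
x = np.random.default_rng(0).standard_normal(100)
assert np.allclose(spmv_csr(A.indptr, A.indices, A.data, x), A @ x)
```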
A novel image hashing with low-rank sparse matrix decomposition and feature distance
Image hashing is an efficient image processing technique for various applications, such as retrieval, copy detection and authentication. In this paper, we design a novel image hashing algorithm based on low-rank sparse matrix decomposition (LRSMD) and feature distances. First, an input image is preprocessed by interpolation, Gaussian blur and color space conversion. Next, the preprocessed image is fed into the LRSMD to learn a low-rank matrix. Then, statistical features of non-overlapping blocks in the low-rank matrix are extracted. Finally, the hash code is obtained by calculating feature distances. Experiments on public datasets demonstrate the robustness and discrimination of the proposed algorithm, and the results show that it outperforms several state-of-the-art algorithms in balancing robustness and discrimination.
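A minimal sketch of the pipeline this abstract outlines, with one loud simplification: a rank-r truncated SVD stands in for the paper's LRSMD, and the block statistic and thresholding below are illustrative choices of ours, not the published design.

```python
import numpy as np

def image_hash(img, rank=8, block=16):
    """Low-rank approximation -> per-block statistics -> binary hash code."""
    U, s, Vt = np.linalg.svd(img, full_matrices=False)
    low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]          # low-rank part
    h, w = low_rank.shape
    feats = np.array([low_rank[i:i + block, j:j + block].mean()  # block statistic
                      for i in range(0, h, block)
                      for j in range(0, w, block)])
    dists = np.abs(feats - feats.mean())                     # feature distances
    return (dists > np.median(dists)).astype(np.uint8)       # binarize to a hash

img = np.random.default_rng(0).random((64, 64))              # stand-in "image"
print(image_hash(img))
```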