Catalogue Search | MBRL

Acceleration of Approximate Matrix Multiplications on GPUs

by Okuyama, Takuya; Röhm, André; Mihana, Takatomo in Acceleration, Accuracy, Algorithms

2023

Matrix multiplication is important in various information-processing applications, including eigenvalue and eigenvector computations and combinatorial optimization algorithms. Therefore, reducing the computation time of matrix products is essential to speed up scientific and practical calculations. Several approaches have been proposed to speed up this process, including GPUs, fast matrix multiplication libraries, custom hardware, and efficient approximate matrix multiplication (AMM) algorithms. However, research to date has not focused on accelerating AMMs for general matrices on GPUs, despite the potential of GPUs to perform fast and accurate matrix product calculations. In this paper, we propose a method for improving Monte Carlo AMMs. We also give an analytical solution for the optimal values of the hyperparameters in the proposed method. The proposed method improves the approximation of the matrix product without increasing the computation time compared with conventional AMMs. It is also designed to work well with parallel operations on GPUs and can be incorporated into various algorithms. Finally, the proposed method is applied to a power method used for eigenvalue computation. We demonstrate that, on an NVIDIA A100 GPU, the computation time can be halved compared with the conventional power method using cuBLAS.
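For reference, the conventional Monte Carlo AMM that this work builds on can be sketched as the classic column-row sampling estimator: sample a few column-row pairs of A and B, rescale them, and sum their outer products. This is a minimal NumPy sketch of that baseline only, not the authors' improved method or their analytically tuned hyperparameters; the function name `monte_carlo_amm` is hypothetical.

```python
import numpy as np

def monte_carlo_amm(A, B, c, rng):
    """Approximate A @ B from c sampled column-row outer products (baseline AMM)."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    n = A.shape[1]
    if B.shape[0] != n:
        raise ValueError("inner dimensions of A and B must match")
    # Importance weights ||A[:, k]|| * ||B[k, :]||; these sampling probabilities
    # minimize the expected Frobenius-norm error of this estimator.
    weights = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
    p = weights / weights.sum()
    # Draw c inner indices with replacement according to p.
    idx = rng.choice(n, size=c, replace=True, p=p)
    # Rescale each sampled pair by 1 / (c * p_k) so the estimate is unbiased,
    # then form the result as one small GEMM of shape (m, c) x (c, q).
    scale = 1.0 / (c * p[idx])
    return (A[:, idx] * scale) @ B[idx, :]
```

The sample count `c` trades computation for accuracy: the expected squared Frobenius error of this estimator shrinks in proportion to 1/c, while the work is a gather plus a GEMM with inner dimension `c` instead of `n`.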

Journal Article

Machine Learning in Python: Main Developments and Technology Trends in Data Science, Machine Learning, and Artificial Intelligence

by Patterson, Joshua; Raschka, Sebastian; Nolet, Corey in data science, deep learning, GPU computing

2020

Smarter applications are making better use of the insights gleaned from data, having an impact on every industry and research discipline. At the core of this revolution lie the tools and methods that are driving it, from processing the massive piles of data generated each day to learning from them and taking useful action. Deep neural networks, along with advancements in classical machine learning and scalable general-purpose graphics processing unit (GPU) computing, have become critical components of artificial intelligence, enabling many of these astounding breakthroughs and lowering the barrier to adoption. Python continues to be the preferred language for scientific computing, data science, and machine learning, boosting both performance and productivity by enabling the use of low-level libraries and clean high-level APIs. This survey offers insight into the field of machine learning with Python, taking a tour through important topics to identify some of the core hardware and software paradigms that have enabled it. We cover widely used libraries and concepts, collected together for holistic comparison, with the goal of educating the reader and driving the field of Python machine learning forward.

Journal Article

Fast Gravitational-wave Parameter Estimation without Compromises

by Wong, Kaze W. K.; Edwards, Thomas D. P.; Isi, Maximiliano in Astrophysics, Gravitational waves, Heterodyning

2023

We present a lightweight, flexible, and high-performance framework for inferring the properties of gravitational-wave events. By combining likelihood heterodyning, automatically differentiable and accelerator-compatible waveforms, and gradient-based Markov chain Monte Carlo sampling enhanced by normalizing flows, we achieve full Bayesian parameter estimation for real events such as GW150914 and GW170817 within a minute of sampling time. Our framework requires neither pretraining nor explicit reparameterizations and can be generalized to higher-dimensional problems. We present the details of our implementation and discuss trade-offs and future developments in the context of other proposed strategies for real-time parameter estimation. Our code for running the analysis is publicly available on GitHub at https://github.com/kazewong/jim.

Journal Article

CGOLS V: Disk-wide Stellar Feedback and Observational Implications of the Cholla Galactic Wind Model

by Schneider, Evan E.; Mao, S. Alwin in Absorption, Energy, Energy flux

2024

We present the fifth simulation in the Cholla Galactic OutfLow Simulation (CGOLS) project—a set of isolated starburst galaxy simulations modeled over large scales (10 kpc) at uniformly high resolution (Δx ≈ 5 pc). Supernova feedback in this simulation is implemented as a disk-wide distribution of clusters, and we assess the impact of this geometry on several features of the resulting outflow, including the radial profiles of various phases; mass, momentum, and energy outflow rates; covering fraction of cool gas; mock absorption-line spectra; and X-ray surface brightness. In general, we find that the outflow generated by this model is cooler, slower, and contains more mass in the cool phase than a more centrally concentrated outflow driven by a similar number of supernovae. In addition, the energy loading factors in the hot phase are an order of magnitude lower, indicating much larger losses due to radiative cooling in the outflow. However, coupling between the hot and cool phases is more efficient than in the nuclear burst case, with almost 50% of the total outflowing energy flux carried by the cool phase at a radial distance of 5 kpc. These physical differences have corresponding signatures in observable quantities: the covering fraction of cool gas is much larger, and there is greater evidence of absorption in low and intermediate ionization energy lines. Taken together, our simulations indicate that centrally concentrated starbursts are more effective at driving hot, low-density outflows that will expand far into the halo, while galaxy-wide bursts may be more effective at removing cool gas from the disk.

Journal Article

Mixed-precision iterative refinement using tensor cores on GPUs to accelerate solution of linear systems

by Haidar, Azzam; Higham, Nicholas J.; Dongarra, Jack in GMRES, GPU computing, half precision arithmetic

2020

Double-precision floating-point arithmetic (FP64) has been the de facto standard for engineering and scientific simulations for several decades. Problem complexity and the sheer volume of data coming from various instruments and sensors motivate researchers to mix and match various approaches to optimize compute resources, including different levels of floating-point precision. In recent years, machine learning has motivated hardware support for half-precision floating-point arithmetic. A primary challenge in high-performance computing is to leverage reduced-precision and mixed-precision hardware. We show how the FP16/FP32 Tensor Cores on NVIDIA GPUs can be exploited to accelerate the solution of linear systems of equations Ax = b without sacrificing numerical stability. The techniques we employ include multiprecision LU factorization, the preconditioned generalized minimal residual algorithm (GMRES), and scaling and auto-adaptive rounding to avoid overflow. We also show how to efficiently handle systems with multiple right-hand sides. On the NVIDIA Quadro GV100 (Volta) GPU, we achieve a 4× to 5× performance increase and 5× better energy efficiency compared with the standard FP64 implementation while maintaining FP64-level numerical stability.
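The core idea of mixed-precision iterative refinement is to do the expensive O(n³) factorization once in low precision and then recover FP64 accuracy with cheap corrections driven by FP64 residuals. The NumPy sketch below illustrates only that loop, under simplifying assumptions: float32 stands in for the FP16/FP32 Tensor Core arithmetic, and classic refinement replaces the GMRES-based variant, scaling, and auto-adaptive rounding described in the abstract. All function names are hypothetical.

```python
import numpy as np

def lu_factor_lowp(A, dtype=np.float32):
    """LU factorization with partial pivoting, carried out in reduced precision."""
    LU = np.array(A, dtype=dtype)   # working copy in low precision
    n = LU.shape[0]
    piv = np.arange(n)              # piv[i] = original row now in position i
    for k in range(n - 1):
        p = k + int(np.argmax(np.abs(LU[k:, k])))
        if p != k:                  # swap whole rows (L and U parts) and record it
            LU[[k, p]] = LU[[p, k]]
            piv[[k, p]] = piv[[p, k]]
        LU[k + 1:, k] /= LU[k, k]   # multipliers: column k of L
        LU[k + 1:, k + 1:] -= np.outer(LU[k + 1:, k], LU[k, k + 1:])
    return LU, piv

def lu_solve_lowp(LU, piv, rhs):
    """Solve A x = rhs using the low-precision factors of P A = L U."""
    y = np.asarray(rhs)[piv].astype(LU.dtype)
    n = LU.shape[0]
    for i in range(n):              # forward substitution, unit lower triangle
        y[i] -= LU[i, :i] @ y[:i]
    for i in range(n - 1, -1, -1):  # back substitution, upper triangle
        y[i] = (y[i] - LU[i, i + 1:] @ y[i + 1:]) / LU[i, i]
    return y

def mixed_precision_solve(A, b, max_iter=10, tol=1e-12, low=np.float32):
    """Iterative refinement: low-precision LU, FP64 residuals and updates."""
    A64 = np.asarray(A, dtype=np.float64)
    b64 = np.asarray(b, dtype=np.float64)
    LU, piv = lu_factor_lowp(A64, dtype=low)   # the O(n^3) work, done once
    x = lu_solve_lowp(LU, piv, b64).astype(np.float64)
    for _ in range(max_iter):
        r = b64 - A64 @ x                      # residual computed in FP64
        if np.linalg.norm(r) <= tol * np.linalg.norm(b64):
            break
        x += lu_solve_lowp(LU, piv, r).astype(np.float64)  # cheap correction
    return x
```

For a well-conditioned system, each correction step shrinks the error by roughly the low-precision unit roundoff times the condition number, so a few iterations typically bring a float32-accurate solution to FP64 accuracy; the Python loops here are purely illustrative and say nothing about GPU performance.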

Journal Article

BindsNET: A Machine Learning-Oriented Spiking Neural Networks Library in Python

by Sanghavi, Darpan T.; Hazan, Hananel; Khan, Hassaan in Algorithms, Artificial intelligence, Back propagation

2018

The development of spiking neural network simulation software is a critical component enabling the modeling of neural systems and the development of biologically inspired algorithms. Existing software frameworks support a wide range of neural functionality, software abstraction levels, and hardware devices, yet are typically not suitable for rapid prototyping or application to problems in the domain of machine learning. In this paper, we describe a new Python package for the simulation of spiking neural networks, specifically geared toward machine learning and reinforcement learning. Our software, called BindsNET, enables rapid building and simulation of spiking networks and features user-friendly, concise syntax. BindsNET is built on the PyTorch deep neural networks library, facilitating the implementation of spiking neural networks on fast CPU and GPU computational platforms. Moreover, the BindsNET framework can be adjusted to utilize other existing computing and hardware backends, e.g., TensorFlow and SpiNNaker. We provide an interface with the OpenAI gym library, allowing for training and evaluation of spiking networks on reinforcement learning environments. We argue that this package facilitates the use of spiking networks for large-scale machine learning problems and show some simple examples of using BindsNET in practice.

Journal Article

Tofu: a fast, versatile and user‐friendly image processing toolkit for computed tomography

by Helfen, Lukas; Emslie, Iain; Zuber, Marcus in 3D reconstruction, Algorithms, artifact removal

2022

Tofu is a toolkit for processing large amounts of images and for tomographic reconstruction. Complex image processing tasks are organized as workflows of individual processing steps. The toolkit is able to reconstruct parallel and cone beam as well as tomographic and laminographic geometries. Many pre- and post-processing algorithms needed for high-quality 3D reconstruction are available, e.g. phase retrieval, ring removal and de-noising. Tofu is optimized for stand-alone GPU workstations, on which it achieves reconstruction speeds comparable to those of costly CPU clusters. It automatically utilizes all GPUs in the system and generates 3D reconstruction code with a minimal number of instructions for the given input geometry (parallel/cone beam, tomography/laminography), hence yielding optimal run-time performance. To improve accessibility for researchers with no previous knowledge of programming, tofu contains graphical user interfaces for both optimization of 3D reconstruction parameters and batch processing of data with pre-configured workflows for typical computed tomography reconstruction. The toolkit is open source, and extensive documentation is available for both end-users and developers. Thanks to these features, tofu is suitable both for expert users with specialized image processing needs (e.g. when dealing with data from custom-built computed tomography scanners) and for application-specific end-users who just need to reconstruct their data on off-the-shelf hardware. The versatile and user-friendly image processing toolkit tofu, optimized for 3D reconstruction of parallel beam, cone beam, tomography and laminography data, is presented.

Journal Article

FastQSL: A Fast Computation Method for Quasi-separatrix Layers

by Liu, Rui; Chen, Jun; Zhang, PeiJin in Astronomy, Astrophysics, Data transfer (computers)

2022

Magnetic reconnection preferentially takes place at the intersection of two separatrices or two quasi-separatrix layers, which can be quantified by the squashing factor Q, whose calculation is computationally expensive because as many field lines as possible must be traced. We developed a method (FastQSL) optimized for obtaining Q and the twist number in a 3D data cube. FastQSL utilizes the hardware acceleration of the graphics processing unit and adopts a step-size adaptive scheme for the most computationally intensive part: tracing magnetic field lines. As a result, it achieves a computational efficiency of 4.53 million Q values per second. FastQSL is open source and user-friendly for data import, export, and visualization.
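Once field lines have been traced, Q follows from the Jacobian of the footpoint mapping (x, y) → (X, Y) between the two ends of each field line. With a = ∂X/∂x, b = ∂X/∂y, c = ∂Y/∂x, d = ∂Y/∂y, the squashing factor is Q = (a² + b² + c² + d²) / |ad - bc|, which is at least 2 and grows where neighbouring field lines diverge strongly. The sketch below estimates that Jacobian by central differences and evaluates Q. It assumes a hypothetical `footpoint_map` function standing in for the GPU field-line tracing that FastQSL actually accelerates, and it does not reproduce FastQSL's adaptive step-size scheme.

```python
import numpy as np

def squashing_factor(footpoint_map, x, y, h=1e-4):
    """Q = (a^2 + b^2 + c^2 + d^2) / |ad - bc| for the mapping (x, y) -> (X, Y).

    footpoint_map(x, y) -> (X, Y) is a hypothetical stand-in for tracing a
    field line from one footpoint to the other; x and y may be scalars or arrays.
    """
    # Central differences in x give the first column of the Jacobian.
    Xp, Yp = footpoint_map(x + h, y)
    Xm, Ym = footpoint_map(x - h, y)
    a, c = (Xp - Xm) / (2 * h), (Yp - Ym) / (2 * h)  # dX/dx, dY/dx
    # Central differences in y give the second column.
    Xp, Yp = footpoint_map(x, y + h)
    Xm, Ym = footpoint_map(x, y - h)
    b, d = (Xp - Xm) / (2 * h), (Yp - Ym) / (2 * h)  # dX/dy, dY/dy
    return (a**2 + b**2 + c**2 + d**2) / np.abs(a * d - b * c)
```

For example, the identity mapping gives the minimum Q = 2, while a shear (x, y) → (x + 3y, y) gives Q = 11: each Q evaluation needs several traced field lines, which is why the tracing step dominates the cost.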

Journal Article

Performance-portable Binary Neutron Star Mergers with AthenaK

by Cook, William; Fields, Jacob; Radice, David in Accretion disks, Binary stars, Black holes

2025

We introduce an extension to the AthenaK code for general-relativistic magnetohydrodynamics (GRMHD) in dynamical spacetimes using a 3+1 conservative Eulerian formulation. As in the fixed-spacetime GRMHD solver, we use standard finite-volume methods to evolve the fluid and a constrained-transport scheme to preserve the divergence-free constraint for the magnetic field. We also use a first-order flux correction (FOFC) scheme to reduce the need for an artificial atmosphere and optionally enforce a maximum principle to improve robustness. We demonstrate the accuracy of AthenaK using a set of standard tests in flat and curved spacetimes. Using a SANE accretion disk around a Kerr black hole, we compare the new solver to the existing solver for stationary spacetimes using the so-called “HARM-like” formulation. We find that both formulations converge to similar results. We also include the first published binary neutron star (BNS) mergers performed on graphics processing units (GPUs). Thanks to the FOFC scheme, our BNS mergers maintain a relative error of O(10⁻¹¹) or better in baryon mass conservation up to collapse. Finally, we perform scaling tests of AthenaK on OLCF Frontier, where we show excellent weak scaling of ≥80% efficiency up to 32,768 GPUs and 74% up to 65,536 GPUs for a GRMHD problem in dynamical spacetimes with six levels of mesh refinement. AthenaK achieves an order-of-magnitude speedup using GPUs compared with CPUs, demonstrating that it is suitable for performing numerical relativity problems on modern exascale resources.

Journal Article

Accelerating Non-LTE Synthesis and Inversions with Graph Networks

by Esteban Pozuelo, S.; Vicente Arévalo, A.; Asensio Ramos, A. in Chromosphere, Computing costs, Datasets

2022

The computational cost of fast non-LTE synthesis is one of the challenges that limits the development of 2D and 3D inversion codes. It also makes the interpretation of observations of lines formed in the chromosphere and transition region a slow and computationally costly process, which restricts the inference of physical properties to rather small fields of view. Having access to a fast way of computing the deviation from the LTE regime through the departure coefficients could largely alleviate this problem. We propose to build and train a graph network that quickly predicts the atomic level populations without solving the non-LTE problem. We find an optimal architecture for the graph network for predicting the departure coefficients of the levels of an atom from the physical conditions of a model atmosphere. A suitable data set with a representative sample of potential model atmospheres is used for training. This data set has been computed using existing non-LTE synthesis codes. The graph network has been integrated into existing synthesis and inversion codes for the particular case of Ca II. We demonstrate orders-of-magnitude gains in computing speed. We analyze the generalization capabilities of the graph network and demonstrate that it produces good predicted departure coefficients for unseen models. We implement this approach in Hazel2 and show that the inversions compare well with those obtained with standard non-LTE inversion codes. Our approximate method opens up the possibility of extracting physical information from the chromosphere on large fields of view with time evolution.

Journal Article
