Catalogue Search | MBRL

General-purpose graphics processor architectures

by Aamodt, Tor M., author , Fung, Wilson Wai Lun, author , Rogers, Timothy G., author in Graphics processing units , Computer architecture

Originally developed to support video games, graphics processor units (GPUs) are now increasingly used for general-purpose (non-graphics) applications ranging from machine learning to mining of cryptographic currencies. GPUs can achieve improved performance and efficiency versus central processing units (CPUs) by dedicating a larger fraction of hardware resources to computation. In addition, their general-purpose programmability makes contemporary GPUs appealing to software developers in comparison to domain-specific accelerators. This book provides an introduction to those interested in studying the architecture of GPUs that support general-purpose computing. It collects together information currently only found among a wide range of disparate sources. The authors led development of the GPGPU-Sim simulator widely used in academic research on GPU architectures. The first chapter of this book describes the basic hardware structure of GPUs and provides a brief overview of their history. Chapter 2 provides a summary of GPU programming models relevant to the rest of the book. Chapter 3 explores the architecture of GPU compute cores. Chapter 4 explores the architecture of the GPU memory system. After describing the architecture of existing systems, Chapters 3 and 4 provide an overview of related research. Chapter 5 summarizes cross-cutting research impacting both the compute core and memory system. This book should provide a valuable resource for those wishing to understand the architecture of graphics processor units (GPUs) used for acceleration of general-purpose applications and to those who want to obtain an introduction to the rapidly growing body of research exploring how to improve the architecture of these GPUs.

Book


GPU Computing Gems Emerald Edition

by Hwu, Wen-mei W in Computer graphics , Digital techniques , Graphics Processing Unit (GPU)

2011

GPU Computing Gems Emerald Edition offers practical techniques in parallel computing using graphics processing units (GPUs) to enhance scientific research. The first volume in Morgan Kaufmann's Applications of GPU Computing Series, this book offers the latest insights and research in computer vision, electronic design automation, and emerging.

eBook


GPU pro 5 : advanced rendering techniques

by Engel, Wolfgang F., editor in Rendering (Computer graphics) , Graphics processing units Programming , Computer graphics

Book


A distributed parallel multiple-relaxation-time lattice Boltzmann method on general-purpose graphics processing units for the rapid and scalable computation of absolute permeability from high-resolution 3D micro-CT images

by Hofmann, R. , Dietderich, J. , Gray, F. in Accuracy , Carbonates , Components

2018

Digital rock physics (DRP) is a rapidly evolving technology targeting fast turnaround times for repeatable core analysis and multi-physics simulation of rock properties. We develop and validate a rapid and scalable distributed-parallel single-phase pore-scale flow simulator for permeability estimation on real 3D pore-scale micro-CT images using a novel variant of the lattice Boltzmann method (LBM). The LBM code implementation is designed to take maximum advantage of distributed computing on multiple general-purpose graphics processing units (GPGPUs). We describe and extensively test the distributed parallel implementation of an innovative LBM algorithm for simulating flow in pore-scale media based on the multiple-relaxation-time (MRT) model that utilizes a precise treatment of body force. While the individual components of the resulting simulator can be separately found in various references, our novel contributions are (1) the integration of all of the mathematical and high-performance computing components together with a highly optimized code implementation and (2) the delivery of quantitative results with the simulator in terms of robustness, accuracy, and computational efficiency for a variety of flow geometries including various types of real rock images. We report on extensive validations of the simulator in terms of accuracy and provide near-ideal distributed parallel scalability results on large pore-scale image volumes that were largely computationally inaccessible prior to our implementation. We validate the accuracy of the MRT-LBM simulator on model geometries with analytical solutions. Permeability estimation results are then provided on large 3D binary microstructures including a sphere pack and rocks from various sandstone and carbonate formations. 
We quantify the scalability behavior of the distributed parallel implementation of MRT-LBM as a function of model type/size and the number of utilized GPGPUs for a panoply of permeability estimation problems.

Journal Article


GPU pro 7 : advanced rendering techniques

by Engel, Wolfgang in Computer graphics , Rendering (Computer graphics) , Real-time data processing

Book


DIESEL: A novel deep learning-based tool for SpMV computations and solving sparse linear equation systems

by Katib, Iyad , Albeshri, Aiiad , Mohammed, Thaha in Accuracy , Algorithms , Artificial intelligence

2021

Sparse linear algebra is central to many areas of engineering, science, and business. The community has done considerable work on proposing new methods for sparse matrix-vector multiplication (SpMV) computations and iterative sparse solvers on graphics processing units (GPUs). Due to vast variations in matrix features, no single method performs well across all sparse matrices. A few tools for automatically predicting the best-performing SpMV kernel have emerged recently, but much more effort is required to fully realize their potential. Existing SpMV kernels use far less than the full capacity of a GPU. Moreover, the development and performance analysis of SpMV techniques on GPUs have not been studied in sufficient depth. This paper proposes DIESEL, a deep learning-based tool that predicts and executes the best-performing SpMV kernel for a given matrix using a feature set carefully devised by us through rigorous empirical and mathematical instruments. The dataset comprises 1056 matrices from 26 different real-life application domains including computational fluid dynamics, materials, electromagnetics, economics, and more. We propose a range of new metrics and methods for performance analysis, visualization, and comparison of SpMV tools. DIESEL achieves an accuracy of 88.2%, a workload accuracy of 91.96%, and an average relative loss of 4.4%, compared to 85.9%, 85.31%, and 7.65% for the next best-performing artificial intelligence (AI)-based SpMV tool. The extensive results and analyses presented in this paper provide several key insights into the performance of the SpMV tools and how these relate to the matrix datasets and the performance metrics, allowing the community to further improve and compare basic and AI-based SpMV tools in the future.
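For readers unfamiliar with the operation this abstract refers to, the following is a minimal sketch of SpMV over a matrix stored in compressed sparse row (CSR) form. It is a plain sequential reference in Python, not one of the GPU kernels DIESEL selects among; the function name `spmv_csr` and the example matrix are assumptions made for illustration.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a sparse matrix A stored in CSR form.

    row_ptr[r]..row_ptr[r + 1] delimits the nonzeros of row r inside
    col_idx (column indices) and values (nonzero entries).
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        # Only the stored nonzeros of this row contribute to the dot product.
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# Illustrative 3x3 matrix (hypothetical data):
# [[4, 0, 1],
#  [0, 2, 0],
#  [3, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 2.0, 3.0, 5.0]
result = spmv_csr(row_ptr, col_idx, values, [1.0, 2.0, 3.0])
```

Because the work per row depends on how many nonzeros that row holds, the best storage format and kernel vary from matrix to matrix, which is the selection problem the abstract describes.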

Journal Article


Heterogeneous parallel computing accelerated iterative subpixel digital image correlation

by Chen, Wei , Zhou, LiCheng , Liu, ZeJia in Algorithms , Central processing units , CPUs

2018

Parallel computing techniques have been introduced into digital image correlation (DIC) in recent years and have led to a surge in computation speed. Graphics processing unit (GPU)-based parallel computing has demonstrated a surprising effect on accelerating iterative subpixel DIC compared with CPU-based parallel computing. In this paper, the performance of the two kinds of parallel computing techniques is compared for the previously proposed path-independent DIC method, in which the initial guess for the inverse compositional Gauss-Newton (IC-GN) algorithm at each point of interest (POI) is estimated through the fast Fourier transform-based cross-correlation (FFT-CC) algorithm. Based on the performance evaluation, a heterogeneous parallel computing (HPC) model with a hybrid mode of parallelism is proposed in order to combine the computing power of the GPU and the multicore CPU. A trial computation scheme is developed to optimize the configuration of the HPC model on a specific computer. The proposed HPC model shows excellent performance on a mid-range desktop computer for real-time subpixel DIC at a high resolution of more than 10000 POIs per frame.

Journal Article


A heterogeneous parallel Red–Black SOR technique and the numerical study on SIMPLE

by Li, Ruitian , Gong, Liang , Xu, Minghai in Algorithms , Color , Computing time

2020

A basic heterogeneous parallel Red–Black successive over-relaxation (SOR) implementation, the mono-color floating-point scheme, was developed on graphics processing units (GPUs) with the OpenCL platform. Designed with fine granularity, a compact data structure, and a stencil function, it uses a concise mapping relationship to implicitly describe the complex rules for searching neighbor elements, which avoids the low GPU utilization of the traditional Red–Black SOR scheme. The new mono-color floating-point scheme was applied to build a fast Semi-Implicit Method for Pressure Linked Equations (SIMPLE) solver with OpenCL and OpenMP on a heterogeneous parallel computing device. Compared with the SIMPLE solver in the traditional Red–Black SOR scheme, the new scheme can achieve 1.7 to 1.8 faster accelerative performance on the same GPU. The scheme also eliminates the complex searching module of the mono-color logical scheme and outperforms that scheme with 20–30% acceleration. Numerical cases in double precision showed that the SIMPLE solver on a GPU with the new Red–Black SOR scheme could save up to 92% of computing time compared with the serial solver on a CPU.
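As background for the abstract above, here is a minimal sketch of the traditional two-color Red–Black SOR iteration, applied to the 2D Laplace equation with fixed boundary values. It is a sequential Python illustration, not the paper's mono-color floating-point scheme or its OpenCL code; the function name `red_black_sor`, the grid size, and the relaxation factor are assumptions.

```python
def red_black_sor(u, omega=1.5, sweeps=200):
    """Relax the 2D Laplace equation in place on grid u (list of lists).

    Boundary entries of u are held fixed; only interior points are updated.
    """
    n_rows, n_cols = len(u), len(u[0])
    for _ in range(sweeps):
        # color 0: "red" points with (i + j) even; color 1: "black" points.
        # Points of one color depend only on neighbors of the other color,
        # so each half-sweep could update all of its points in parallel.
        for color in (0, 1):
            for i in range(1, n_rows - 1):
                # First interior column j with (i + j) % 2 == color.
                j_start = 1 + ((i + 1 + color) % 2)
                for j in range(j_start, n_cols - 1, 2):
                    neighbor_mean = 0.25 * (
                        u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1]
                    )
                    u[i][j] += omega * (neighbor_mean - u[i][j])
    return u


# Hypothetical test problem: boundary values vary linearly with the column
# index, so the exact solution is u[i][j] = j / (n - 1) everywhere.
n = 6
grid = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(n):
        if i in (0, n - 1) or j in (0, n - 1):
            grid[i][j] = j / (n - 1)
red_black_sor(grid)
```

The strided inner loop is where a straightforward GPU port tends to leave threads idle, since each half-sweep touches only every other point; this is the utilization problem the abstract says the new scheme addresses.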

Journal Article


Direct simulation of pore-scale two-phase visco-capillary flow on large digital rock images using a phase-field lattice Boltzmann method on general-purpose graphics processing units

by Dietderich, J. , Saxena, N. , Alpak, F. O. in Accuracy , BGK model , Boltzmann transport equation

2019

We describe the underlying mathematics, validation, and applications of a novel Helmholtz free-energy-minimizing phase-field model solved within the framework of the lattice Boltzmann method (LBM) for efficiently simulating two-phase pore-scale flow directly on large 3D images of real rocks obtained from micro-computed tomography (micro-CT) scanning. The technique, termed eLBM (energy-based LBM), is implemented in the CUDA programming language to take maximum advantage of accelerated computing on multinode general-purpose graphics processing units (GPGPUs). eLBM’s momentum-balance solver is based on the multiple-relaxation-time (MRT) model. The Boltzmann equation is discretized in space, velocity (momentum), and time coordinates using a 3D 19-velocity grid (D3Q19 scheme), which provides the best compromise between accuracy and computational efficiency. The benefits of the MRT model over the conventional single-relaxation-time Bhatnagar-Gross-Krook (BGK) model are (I) enhanced numerical stability, (II) independent bulk and shear viscosities, and (III) viscosity-independent, nonslip boundary conditions. The drawback of the MRT model is that it is slightly more computationally demanding than the BGK model. This minor hurdle is easily overcome through a GPGPU implementation of the MRT model for eLBM. eLBM is, to our knowledge, the first industrial-grade distributed parallel implementation of an energy-based LBM taking advantage of multiple GPGPU nodes. The Cahn-Hilliard equation that governs the order-parameter distribution is fully integrated into the LBM framework, which significantly accelerates pore-scale simulation on real systems. 
While individual components of the eLBM simulator can be separately found in various references, our novel contributions are (1) integrating all computational and high-performance computing components together into a unified implementation and (2) providing comprehensive and definitive quantitative validation results with eLBM in terms of robustness and accuracy for a variety of flow domains including various types of real rock images. We successfully validate and apply the eLBM on several transient two-phase flow problems of gradually increasing complexity. Investigated problems include the following: (1) snap-off in constricted capillary tubes; (2) Haines jumps on a micromodel (during drainage), Ketton limestone image, and Fontainebleau and Castlegate sandstone images (during drainage and subsequent imbibition); and (3) capillary desaturation simulations on a Berea sandstone image including a comparison of numerically computed residual non-wetting-phase saturations (as a function of the capillary number) to data reported in the literature. Extensive physical validation tests and applications on large 3D rock images demonstrate the reliability, robustness, and efficacy of the eLBM as a direct visco-capillary pore-scale two-phase flow simulator for digital rock physics workflows.

Journal Article


Evolutionary induction of a decision tree for large-scale data: a GPU-based approach

by Czajkowski, Marcin , Jurczuk, Krzysztof , Kretowski, Marek in Algorithms , Artificial Intelligence , Central processing units

2017

Evolutionary induction of decision trees is an emerging alternative to greedy top-down approaches. Its growing popularity results from good prediction performance and less complex output trees. However, one of the major drawbacks associated with the application of evolutionary algorithms is the tree induction time, especially for large-scale data. In this paper, we design and implement a graphics processing unit (GPU)-based parallelization of evolutionary induction of decision trees. We apply the Compute Unified Device Architecture programming model, which supports general-purpose computation on a GPU (GPGPU). The selection and genetic operators are performed sequentially on a CPU, while the evaluation process for the individuals in the population is parallelized. A data-parallel approach is applied, so parts of the dataset are spread over GPU cores. Each core processes its assigned chunk of the data. Finally, the results from all GPU cores are merged and the sought tree metrics are sent to the CPU. Computational performance of the proposed approach is validated experimentally on artificial and real-life datasets. A comparison with the traditional CPU version shows that evolutionary induction of decision trees supported by GPGPU can be accelerated significantly (even up to 800 times) and allows for processing of much larger datasets.
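The chunk-and-merge evaluation described in this abstract can be sketched roughly as follows. This is a sequential Python illustration of the data-parallel pattern only, not the authors' CUDA implementation; the tuple-based tree encoding and the names `predict`, `count_errors`, and `evaluate` are hypothetical.

```python
def predict(tree, sample):
    """Classify one sample.

    Hypothetical encoding: a leaf is a class label (any non-tuple value); an
    internal node is (feature_index, threshold, left_subtree, right_subtree).
    """
    while isinstance(tree, tuple):
        feature, threshold, left, right = tree
        tree = left if sample[feature] <= threshold else right
    return tree


def count_errors(tree, chunk):
    """Work assigned to one core: count misclassified samples in its chunk."""
    return sum(1 for features, label in chunk if predict(tree, features) != label)


def evaluate(tree, dataset, n_chunks=4):
    """Split the dataset into chunks, score each chunk, and merge the results."""
    if not dataset:
        raise ValueError("dataset must not be empty")
    size = -(-len(dataset) // n_chunks)  # ceiling division
    chunks = [dataset[i:i + size] for i in range(0, len(dataset), size)]
    # On a GPU each chunk would be scored concurrently; here the chunks are
    # processed one after another and the partial counts are summed.
    partial_counts = [count_errors(tree, chunk) for chunk in chunks]
    return sum(partial_counts) / len(dataset)


# Hypothetical one-split tree and five labeled samples.
stump = (0, 0.5, "a", "b")
samples = [((0.1,), "a"), ((0.9,), "b"), ((0.4,), "b"), ((0.7,), "b"), ((0.2,), "a")]
error_rate = evaluate(stump, samples)
```

Only the per-chunk counts need to be merged, which keeps the amount of data returned to the CPU small compared with the dataset itself.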

Journal Article

