8,741 results for "graphics processing unit"
SPH-DEM coupling method based on GPU and its application to the landslide tsunami. Part I: method and validation
A landslide-induced tsunami is a complex fluid–solid coupling process that plays a crucial role in the study of a disaster chain. To simulate the coupling behavior between the fluid and the solid, a graphics processing unit-based coupled smoothed particle hydrodynamics (SPH)–discrete element method (DEM) code is developed. A series of numerical tests, based on the laboratory experiments of Koshizuka et al. (Particle method for calculating splashing of incompressible viscous fluid, 1995) and Kleefsman et al. (J Comput Phys 206:363–393, 2005), is carried out to study the influence of the parameters and to verify the accuracy of the developed SPH code. To ensure accurate SPH results, values are suggested for the diffusion term, the particle resolution (1/25 of the characteristic length), and the smoothing length (1.2 times the particle interval). The ratio of the SPH particle size to the DEM particle diameter influences the accuracy of the coupled simulation of solid particles and water. For the coupled simulation of a single particle, or of a loose particle assembly whose particles are not in contact with each other, this ratio should be smaller than 1/20; for a dense particle assembly, a ratio smaller than 1/6 is sufficient.
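To make the quoted resolution guidelines concrete, the sketch below checks them for hypothetical inputs; the helper function, variable names, and example values are illustrative and are not taken from the paper's code.

```python
# Illustrative check of the resolution guidelines quoted in the abstract
# (hypothetical helper; not the authors' code).

def sph_dem_resolution_check(char_length, dem_diameter, dense_assembly=False):
    """Return a suggested SPH particle spacing and smoothing length, and test the
    SPH/DEM size-ratio criterion stated in the abstract."""
    dx = char_length / 25.0      # particle resolution: 1/25 of the characteristic length
    h = 1.2 * dx                 # smoothing length: 1.2 times the particle interval
    ratio = dx / dem_diameter    # SPH particle size / DEM particle diameter
    limit = 1.0 / 6.0 if dense_assembly else 1.0 / 20.0
    return dx, h, ratio, ratio <= limit

# Example: 1 m characteristic length, 5 cm DEM grains in a loose assembly
dx, h, ratio, ok = sph_dem_resolution_check(1.0, 0.05, dense_assembly=False)
print(f"dx={dx:.3f} m, h={h:.3f} m, size ratio={ratio:.2f}, criterion met: {ok}")
```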
Machine Learning and Deep Learning frameworks and libraries for large-scale data mining: a survey
The combined impact of new computing resources and techniques with an increasing avalanche of large datasets is transforming many research areas and may lead to technological breakthroughs usable by billions of people. In recent years, Machine Learning and especially its subfield Deep Learning have seen impressive advances. Techniques developed within these two fields are now able to analyze and learn from huge amounts of real-world examples in disparate formats. While the number of Machine Learning algorithms is extensive and growing, so is the number of frameworks and libraries that implement them. Software development in this field is fast-paced, with a large body of open-source software coming from academia, industry, start-ups, and wider open-source communities. This survey presents a comprehensive overview of a recent time slice, with comparisons as well as trends in the development and usage of cutting-edge Artificial Intelligence software. It also provides an overview of massive parallelism support that is capable of scaling computation effectively and efficiently in the era of Big Data.
Direct simulation of pore-scale two-phase visco-capillary flow on large digital rock images using a phase-field lattice Boltzmann method on general-purpose graphics processing units
We describe the underlying mathematics, validation, and applications of a novel Helmholtz free-energy-minimizing phase-field model solved within the framework of the lattice Boltzmann method (LBM) for efficiently simulating two-phase pore-scale flow directly on large 3D images of real rocks obtained from micro-computed tomography (micro-CT) scanning. The code implementation of the technique, coined the eLBM (energy-based LBM), is written in the CUDA programming language to take maximum advantage of accelerated computing on multinode general-purpose graphics processing units (GPGPUs). eLBM's momentum-balance solver is based on the multiple-relaxation-time (MRT) model. The Boltzmann equation is discretized in space, velocity (momentum), and time using a 3D 19-velocity grid (the D3Q19 scheme), which provides the best compromise between accuracy and computational efficiency. The benefits of the MRT model over the conventional single-relaxation-time Bhatnagar-Gross-Krook (BGK) model are (1) enhanced numerical stability, (2) independent bulk and shear viscosities, and (3) viscosity-independent, no-slip boundary conditions. The drawback of the MRT model is that it is slightly more computationally demanding than the BGK model; this minor hurdle is easily overcome through a GPGPU implementation of the MRT model for eLBM. eLBM is, to our knowledge, the first industrial-grade distributed parallel implementation of an energy-based LBM taking advantage of multiple GPGPU nodes. The Cahn-Hilliard equation that governs the order-parameter distribution is fully integrated into the LBM framework, which accelerates pore-scale simulation on real systems significantly. While individual components of the eLBM simulator can be found separately in various references, our novel contributions are (1) integrating all computational and high-performance-computing components into a unified implementation and (2) providing comprehensive and definitive quantitative validation results for eLBM in terms of robustness and accuracy for a variety of flow domains, including various types of real rock images. We successfully validate and apply the eLBM on several transient two-phase flow problems of gradually increasing complexity. The investigated problems include (1) snap-off in constricted capillary tubes; (2) Haines jumps in a micromodel (during drainage), a Ketton limestone image, and Fontainebleau and Castlegate sandstone images (during drainage and subsequent imbibition); and (3) capillary desaturation simulations on a Berea sandstone image, including a comparison of numerically computed residual non-wetting-phase saturations (as a function of the capillary number) to data reported in the literature. Extensive physical validation tests and applications on large 3D rock images demonstrate the reliability, robustness, and efficacy of the eLBM as a direct visco-capillary pore-scale two-phase flow simulator for digital rock physics workflows.
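The abstract names the D3Q19 discretization and contrasts the MRT collision model with the simpler single-relaxation-time BGK model. As a rough illustration of what one lattice-Boltzmann update involves, the sketch below sets up the standard D3Q19 velocity set and performs a single BGK collision step in NumPy; it is a simplified stand-in for exposition, not the eLBM's CUDA MRT implementation, and all names are assumptions.

```python
import numpy as np

# D3Q19 lattice: 1 rest velocity, 6 face neighbours, 12 edge neighbours
c = np.array([[ 0, 0, 0],
              [ 1, 0, 0], [-1, 0, 0], [ 0, 1, 0], [ 0,-1, 0], [ 0, 0, 1], [ 0, 0,-1],
              [ 1, 1, 0], [-1,-1, 0], [ 1,-1, 0], [-1, 1, 0],
              [ 1, 0, 1], [-1, 0,-1], [ 1, 0,-1], [-1, 0, 1],
              [ 0, 1, 1], [ 0,-1,-1], [ 0, 1,-1], [ 0,-1, 1]])
w = np.array([1/3] + [1/18] * 6 + [1/36] * 12)   # standard D3Q19 weights

def bgk_collide(f, tau):
    """One BGK collision step on a population array f of shape (19, nx, ny, nz).
    Relaxes every distribution toward its local equilibrium with relaxation time tau."""
    rho = f.sum(axis=0)                            # density
    u = np.tensordot(c.T, f, axes=1) / rho         # velocity, shape (3, nx, ny, nz)
    cu = np.tensordot(c, u, axes=1)                # c_i . u for each direction
    usq = (u * u).sum(axis=0)
    feq = w[:, None, None, None] * rho * (1 + 3 * cu + 4.5 * cu**2 - 1.5 * usq)
    return f - (f - feq) / tau
```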
A distributed parallel multiple-relaxation-time lattice Boltzmann method on general-purpose graphics processing units for the rapid and scalable computation of absolute permeability from high-resolution 3D micro-CT images
Digital rock physics (DRP) is a rapidly evolving technology targeting fast turnaround times for repeatable core analysis and multi-physics simulation of rock properties. We develop and validate a rapid and scalable distributed-parallel single-phase pore-scale flow simulator for permeability estimation on real 3D pore-scale micro-CT images using a novel variant of the lattice Boltzmann method (LBM). The LBM code implementation is designed to take maximum advantage of distributed computing on multiple general-purpose graphics processing units (GPGPUs). We describe and extensively test the distributed parallel implementation of an innovative LBM algorithm for simulating flow in pore-scale media based on the multiple-relaxation-time (MRT) model that utilizes a precise treatment of body force. While the individual components of the resulting simulator can be separately found in various references, our novel contributions are (1) the integration of all of the mathematical and high-performance computing components together with a highly optimized code implementation and (2) the delivery of quantitative results with the simulator in terms of robustness, accuracy, and computational efficiency for a variety of flow geometries including various types of real rock images. We report on extensive validations of the simulator in terms of accuracy and provide near-ideal distributed parallel scalability results on large pore-scale image volumes that were largely computationally inaccessible prior to our implementation. We validate the accuracy of the MRT-LBM simulator on model geometries with analytical solutions. Permeability estimation results are then provided on large 3D binary microstructures including a sphere pack and rocks from various sandstone and carbonate formations. We quantify the scalability behavior of the distributed parallel implementation of MRT-LBM as a function of model type/size and the number of utilized GPGPUs for a panoply of permeability estimation problems.
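Once such a single-phase simulation has converged, absolute permeability is commonly recovered from Darcy's law. The sketch below is a minimal post-processing example for a body-force-driven flow; the function, its argument names, and the example values are hypothetical and are not part of the paper's simulator.

```python
import numpy as np

def absolute_permeability(uz, pore_mask, nu, g, dx):
    """Estimate absolute permeability from a converged body-force-driven LBM run.
    uz: axial velocity field in lattice units, pore_mask: boolean pore voxels,
    nu: lattice kinematic viscosity, g: driving body-force acceleration,
    dx: voxel size in metres."""
    # Darcy (superficial) velocity: average over the *total* volume, solids included
    darcy_u = uz[pore_mask].sum() / uz.size
    k_lattice = nu * darcy_u / g          # Darcy's law with the body force as the driver
    return k_lattice * dx**2              # convert from lattice units to m^2

# Hypothetical usage on a segmented micro-CT subvolume (img > 0 marks pore space):
# k = absolute_permeability(uz, img > 0, nu=1/6, g=1e-6, dx=2.0e-6)
```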
Deep Learning with Microfluidics for Biotechnology
Advances in high-throughput and multiplexed microfluidics have rewarded biotechnology researchers with vast amounts of data but not necessarily the ability to analyze complex data effectively. Over the past few years, deep artificial neural networks (ANNs) leveraging modern graphics processing units (GPUs) have enabled the rapid analysis of structured input data – sequences, images, videos – to predict complex outputs with unprecedented accuracy. While there have been early successes in flow cytometry, for example, the extensive potential of pairing microfluidics (to acquire data) and deep learning (to analyze data) to tackle biotechnology challenges remains largely untapped. Here we provide a roadmap to integrating deep learning and microfluidics in biotechnology laboratories that matches computational architectures to problem types, and provide an outlook on emerging opportunities. High-throughput microfluidics has revolutionized biotechnology assays, enabling intriguing new approaches often at the single-cell level. Combining deep learning (to analyze data) with microfluidics (to acquire data) represents an emerging opportunity in biotechnology that remains largely untapped. Deep learning architectures have been developed to tackle raw structured data and address problems common to microfluidics applications in biotechnology. With the abundance of open-source training materials and low-cost graphics processing units, the barriers to entry for microfluidics labs have never been lower.
Toward Optimal Computation of Ultrasound Image Reconstruction Using CPU and GPU
An ultrasound image is reconstructed from echo signals received by the array elements of a transducer. The time of flight of the echo depends on the distance from the focus to each array element. The received echo signals have to be delayed to make their wavefronts and phases coherent before the signals are summed. In digital beamforming, the required delays do not always fall on the sampled points. Generally, the values of the delayed signals are estimated by the values of the nearest samples. This method is fast and easy, but inaccurate. Other methods are available for increasing the accuracy of the delayed signals and, consequently, the quality of the beamformed signals; for example, in-phase (I)/quadrature (Q) interpolation, which is more time-consuming but provides more accurate values than the nearest samples. This paper compares the signals after dynamic receive beamforming in which the echo signals are delayed using two methods: the nearest-sample method and the I/Q interpolation method. Comparisons of the visual quality of the reconstructed images and the quality of the beamformed signals are reported. Moreover, the computational speed of these methods is optimized by reorganizing the data-processing flow and by applying the graphics processing unit (GPU). The use of single- and double-precision floating-point formats for the intermediate data is also considered. The speeds with and without these optimizations are compared.
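As an illustration of the nearest-sample delay-and-sum beamforming the abstract describes, the sketch below focuses a single image point, assuming a simplified geometry with one transmit focus along the depth axis; the I/Q interpolation variant would instead interpolate complex baseband data before summation. The function and parameter names are assumptions, not the paper's code.

```python
import numpy as np

def das_nearest(rf, fs, c, elem_x, focus_x, focus_z):
    """Delay-and-sum one focal point using nearest-sample delays.
    rf: (n_elements, n_samples) echo data, fs: sampling rate [Hz],
    c: speed of sound [m/s], elem_x: element x-positions [m]."""
    # Two-way time of flight: transmit down to the focus, then back to each element
    t_tx = focus_z / c
    t_rx = np.sqrt((elem_x - focus_x) ** 2 + focus_z ** 2) / c
    idx = np.round((t_tx + t_rx) * fs).astype(int)     # nearest sampled point per element
    idx = np.clip(idx, 0, rf.shape[1] - 1)
    return rf[np.arange(rf.shape[0]), idx].sum()       # coherent sum across elements
```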
Convolution hierarchical deep-learning neural network (C-HiDeNN) with graphics processing unit (GPU) acceleration
We propose the Convolution Hierarchical Deep-learning Neural Network (C-HiDeNN), which can be tuned to have superior accuracy, higher smoothness, and faster convergence rates, like higher-order finite element methods (FEM), while using only the degrees of freedom of linear elements. It is based on our newly developed convolution interpolation theory (Lu et al. in Comput Mech, 2023), and this article focuses on the deep-learning interpretation of C-HiDeNN with graphics processing unit (GPU) programming using the JAX library in Python. Instead of increasing the degrees of freedom like higher-order FEM, C-HiDeNN takes advantage of neighboring elements to construct so-called convolution patch functions. The computational overhead of C-HiDeNN is reduced by GPU programming, and the total solution time is brought down to the same order as commercial FEM software running on a CPU, while delivering orders of magnitude better accuracy and faster convergence rates. C-HiDeNN is locking-free regardless of element type (even with 3-node triangular elements or 4-node tetrahedral elements). C-HiDeNN is also capable of r-h-p mesh adaptivity like its predecessor HiDeNN (Zhang et al. in Comput Mech 67:207–230, 2021), with an additional "a" (dilation parameter) adaptivity that stems from the convolution patch function, and "p" adaptivity that yields higher accuracy with the same degrees of freedom as linear finite elements. C-HiDeNN potentially has myriad future applications in multiscale analysis, additive and advanced manufacturing process simulations, and high-resolution topology optimization. Details on these applications can be found in the companion papers (Lu et al. 2023; Saha et al. in Comput Mech, 2023; Li et al. in Comput Mech, 2023) published in this special issue.
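The article's GPU acceleration relies on JAX's jit/vmap compilation. The toy sketch below shows the general pattern of a patch-based, dilation-controlled interpolation kernel compiled with JAX; the weight function is a placeholder chosen for simplicity and does not reproduce the C-HiDeNN convolution patch functions, and every name is hypothetical.

```python
import jax
import jax.numpy as jnp

@jax.jit
def patch_interpolate(x, patch_nodes, patch_values, a):
    """Toy patch interpolation: the value at x is a normalised, distance-weighted
    sum over neighbouring nodes within a dilation radius 'a'.  A cartoon of borrowing
    information from neighbouring elements, not the C-HiDeNN formulation."""
    r = jnp.linalg.norm(patch_nodes - x, axis=1) / a     # scaled distances to patch nodes
    w = jnp.maximum(1.0 - r, 0.0) ** 2                   # compact-support weights
    w = w / jnp.sum(w)                                   # normalise so weights sum to one
    return jnp.dot(w, patch_values)

# vmap evaluates many points at once; jit compiles the kernel for CPU/GPU via XLA.
interpolate_many = jax.vmap(patch_interpolate, in_axes=(0, None, None, None))
```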
GEYSER: 3D thermo-hydrodynamic reactive transport numerical simulator including porosity and permeability evolution using GPU clusters
GEYSER, an acronym for Graphics processing unit (GPU) cluster computing for Enhanced hYdrothermal SystEms with Reactive transport, is a 3D simulator of mass and heat transport in fractured geological media that includes porosity and permeability evolution. The simulator also accounts for porosity and permeability evolution in response to dehydration reactions of hydrous minerals. GEYSER uses a finite-difference scheme to solve the governing PDEs associated with 3D large-scale hydrothermal systems or geothermal reservoirs. The tool is a high-performance code for GPU workstations or cluster technology. The physical processes implemented in the code are those associated with deep hydrogeological complexes, where high fluid pressures generated by dehydration reactions can be sufficient to induce hydrofractures that significantly influence the porosity and permeability structure within geological formations. The governing equations are described, implemented, and applied to a simplified 3D model of a magmatic intrusion at depth underlying a deep sedimentary cover. Close-to-ideal weak scaling is demonstrated on GPU clusters with up to 128 GPUs. The numerical model can be used to investigate and understand coupled, time-dependent hydromechanical and thermodynamic processes at high resolution across the 3D computational domain. Applications include the hydrogeology of volcanic environments and the exploitation of sediment-hosted geothermal resources. The code is also suited to modeling porosity and permeability evolution driven by pressure- and temperature-dependent reaction rates for rock decarbonization during CO2 sequestration in deep sedimentary formations.
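GEYSER's transport equations are solved with finite differences. As a minimal single-node illustration of that building block, the sketch below advances a 3D heat-diffusion field by one explicit finite-difference step; it is a toy with assumed names, not the GEYSER solver, which additionally couples flow, reactions, and porosity/permeability change across GPU clusters.

```python
import numpy as np

def heat_step(T, kappa, dx, dt):
    """One explicit finite-difference step of dT/dt = kappa * laplacian(T) on a
    regular 3D grid (interior points only; boundary values held fixed)."""
    lap = (T[2:, 1:-1, 1:-1] + T[:-2, 1:-1, 1:-1] +
           T[1:-1, 2:, 1:-1] + T[1:-1, :-2, 1:-1] +
           T[1:-1, 1:-1, 2:] + T[1:-1, 1:-1, :-2] -
           6.0 * T[1:-1, 1:-1, 1:-1]) / dx**2
    T_new = T.copy()
    T_new[1:-1, 1:-1, 1:-1] += dt * kappa * lap
    return T_new

# Stability of this explicit scheme requires dt <= dx**2 / (6 * kappa).
```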
Evaluation of NVIDIA Xavier NX Platform for Real-Time Image Processing for Plasma Diagnostics
Machine protection is a core task of real-time image diagnostics aiming for steady-state operation in nuclear fusion devices. The paper evaluates the applicability of the newest low-power NVIDIA Jetson Xavier NX platform for plasma image diagnostics. This embedded NVIDIA Tegra System-on-a-Chip (SoC) integrates a Graphics Processing Unit (GPU) and a Central Processing Unit (CPU) on a single chip. The hardware differences and features compared with the previous NVIDIA Jetson TX2 are highlighted. The implemented algorithms detect thermal events in real time, utilising the high parallelism provided by the embedded General-Purpose computing on Graphics Processing Units (GPGPU). Performance and accuracy are evaluated on experimental data from the Wendelstein 7-X (W7-X) stellarator. Strike-line and reflection events are investigated primarily, yet benchmarks for overload hotspots, surface layers and visualisation algorithms are also included. Their detection might allow real-time risk evaluation to be automated as part of the divertor protection system in W7-X. For the first time, the paper demonstrates the feasibility of complex real-time image processing in nuclear fusion applications on low-power embedded devices. Moreover, GPU-accelerated reference processing pipelines yielding higher accuracy than literature results are proposed, and a remarkable performance improvement resulting from the upgrade to the Xavier NX platform is attained.
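A minimal CPU sketch of the kind of overload-hotspot detection such a pipeline benchmarks is shown below: threshold an infrared frame and keep sufficiently large connected regions. The threshold, the minimum region size, and the use of SciPy are assumptions; the paper's pipelines run as GPU kernels on the Jetson to meet real-time constraints.

```python
import numpy as np
from scipy import ndimage  # assumed available for connected-component labelling

def detect_hotspots(ir_frame, temp_limit, min_pixels=10):
    """Flag connected regions of an infrared frame exceeding a temperature limit.
    Returns the labels of regions with at least min_pixels hot pixels."""
    mask = ir_frame > temp_limit
    labels, n = ndimage.label(mask)                       # connected components of hot pixels
    sizes = ndimage.sum(mask, labels, range(1, n + 1))    # pixel count per component
    return [region for region, size in enumerate(sizes, start=1) if size >= min_pixels]
```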
CWLP: coordinated warp scheduling and locality-protected cache allocation on GPUs
As we approach the exascale era in supercomputing, designing a balanced computer system with powerful computing ability and low power requirements has become increasingly important. The graphics processing unit (GPU) is an accelerator widely used in most recent supercomputers. It adopts a large number of threads to hide long latencies with high energy efficiency. In contrast to their powerful computing ability, GPUs have only a few megabytes of fast on-chip memory storage per streaming multiprocessor (SM). The GPU cache is inefficient due to a mismatch between the throughput-oriented execution model and the cache hierarchy design. At the same time, current GPUs fail to handle burst-mode long access latencies due to the GPU's poor warp scheduling method. Thus, the benefits of the GPU's high computing ability are reduced dramatically by poor cache management and warp scheduling, which limit system performance and energy efficiency. In this paper, we put forward a coordinated warp scheduling and locality-protected (CWLP) cache allocation scheme to make full use of data locality and hide latency. We first present a locality-protected cache allocation method based on the instruction program counter (LPC) to improve cache performance. Specifically, we use a PC-based locality detector to collect the reuse information of each cache line and employ a prioritised cache allocation unit (PCAU) that coordinates the data-reuse information with time-stamp information to evict the lines with the least reuse possibility. Moreover, the locality information is used by the warp scheduler to create an intelligent warp-reordering scheme that captures locality and hides latency. Simulation results show that CWLP provides a speedup of up to 19.8% and an average improvement of 8.8% over the baseline methods.
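The eviction idea described here, combining per-PC reuse counts with recency so that the lines with the least predicted reuse are evicted first, can be sketched as a small software cache model. The class below is illustrative only, since the paper evaluates a hardware mechanism inside a GPU SM; all names and the scoring rule are hypothetical.

```python
class LocalityProtectedCache:
    """Toy software model of reuse-aware eviction: each line records the PC that
    loaded it and its last access time; per-PC reuse counts are learned from hits,
    and on a miss the victim is the line with the lowest (reuse, recency) score."""

    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.lines = {}       # address -> (load PC, last access time)
        self.pc_reuse = {}    # load PC -> observed reuse count
        self.clock = 0

    def access(self, address, pc):
        self.clock += 1
        if address in self.lines:                      # hit: credit the PC with a reuse
            self.pc_reuse[pc] = self.pc_reuse.get(pc, 0) + 1
            self.lines[address] = (pc, self.clock)
            return True
        if len(self.lines) >= self.num_lines:          # miss: evict least-reusable line
            victim = min(self.lines,
                         key=lambda a: (self.pc_reuse.get(self.lines[a][0], 0),
                                        self.lines[a][1]))
            del self.lines[victim]
        self.lines[address] = (pc, self.clock)
        return False
```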