Catalogue Search | MBRL
2,570 result(s) for "GPU"
Accelerating AutoDock Vina with GPUs
2022
AutoDock Vina is one of the most popular molecular docking tools. In the latest benchmark for comparative assessment of scoring functions, CASF-2016, AutoDock Vina achieved the best docking power among all the docking tools. Modern drug discovery commonly faces large virtual screens for drug hits across huge compound databases. Because the AutoDock Vina algorithm is inherently serial, there is no successful report of its parallel acceleration with GPUs. Current acceleration of AutoDock Vina typically relies on stacking computing power and on allocating resources and tasks, as in the VirtualFlow platform. The vast resource expenditure and the high access threshold for users greatly limit the popularity of AutoDock Vina and the flexibility of its usage in modern drug discovery. In this work, we propose a new method, Vina-GPU, for accelerating AutoDock Vina with GPUs, which is greatly needed to reduce the investment required for large virtual screens and to enable wider application of large-scale virtual screening on personal computers, station servers, or cloud computing. Our proposed method is based on a modified Monte Carlo method using a simulated annealing AI algorithm. It greatly raises the number of initial random conformations and reduces the search depth of each thread. Moreover, a classic optimizer named BFGS is adopted to optimize the ligand conformations during the docking process, and a heterogeneous OpenCL implementation was developed to realize its parallel acceleration, leveraging thousands of GPU cores. Large benchmark tests show that Vina-GPU reaches an average of 21-fold and a maximum of 50-fold docking acceleration against the original AutoDock Vina while maintaining comparable docking accuracy, indicating its potential for popularizing AutoDock Vina in large virtual screens.
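The search strategy described in this abstract (many random starting conformations, each explored to a shallow depth) can be illustrated with a toy sketch. This is not the Vina-GPU implementation: the one-dimensional `energy` function, the fixed temperature, the step size, and all function names are assumptions made for illustration, and neither the annealing schedule nor the BFGS refinement step is included.

```python
import math
import random


def energy(x):
    # Hypothetical toy "scoring function" with several local minima.
    return 0.1 * x * x + math.sin(3.0 * x)


def shallow_search(x0, steps, rng, temperature=0.5, step_size=0.3):
    """Run a short Metropolis Monte Carlo walk from x0; return the best point seen."""
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for _ in range(steps):
        cand = x + rng.uniform(-step_size, step_size)
        cand_e = energy(cand)
        # Metropolis criterion: always accept downhill moves, sometimes uphill ones.
        if cand_e <= e or rng.random() < math.exp((e - cand_e) / temperature):
            x, e = cand, cand_e
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


def many_starts(n_starts, steps, seed=0):
    """Launch many independent shallow searches and keep the overall best result."""
    rng = random.Random(seed)
    starts = [rng.uniform(-5.0, 5.0) for _ in range(n_starts)]
    # Each search is independent of the others, which is what would make
    # a one-search-per-thread mapping onto a GPU possible.
    results = [shallow_search(x0, steps, rng) for x0 in starts]
    return starts, min(results, key=lambda r: r[1])
```

Calling `many_starts(256, 20)` runs 256 walks of 20 steps each. Because every walk is independent, the loop over starts is the part that could be distributed across threads.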
Journal Article
Machine Learning and Deep Learning frameworks and libraries for large-scale data mining: a survey
by Bobák, Martin; Tran, Viet; Dlugolinsky, Stefan
in Algorithms, Artificial intelligence, Big Data
2019
The combined impact of new computing resources and techniques with an increasing avalanche of large datasets is transforming many research areas and may lead to technological breakthroughs that can be used by billions of people. In recent years, Machine Learning and especially its subfield Deep Learning have seen impressive advances. Techniques developed within these two fields are now able to analyze and learn from huge amounts of real-world examples in disparate formats. While the number of Machine Learning algorithms is extensive and growing, so is the number of their implementations in frameworks and libraries. Software development in this field is fast paced, with a large amount of open-source software coming from academia, industry, start-ups, and wider open-source communities. This survey presents a recent time-slide comprehensive overview with comparisons as well as trends in development and usage of cutting-edge Artificial Intelligence software. It also provides an overview of massive parallelism support that is capable of scaling computation effectively and efficiently in the era of Big Data.
Journal Article
FUNWAVE‐GPU: Multiple‐GPU Acceleration of a Boussinesq‐Type Wave Model
2020
This paper documents development of a multiple‐Graphics Processing Unit (GPU) version of FUNWAVE‐Total Variation Diminishing (TVD), an open‐source model for solving the fully nonlinear Boussinesq wave equations using a high‐order TVD solver. The numerical schemes of FUNWAVE‐TVD, including Cartesian and spherical coordinates, are rewritten using CUDA Fortran, with inter‐GPU communication facilitated by the Message Passing Interface. Since FUNWAVE‐TVD involves the discretization of high‐order dispersive derivatives, the on‐chip shared memory is utilized to reduce global memory access. To further optimize performance, the batched tridiagonal solver is scheduled simultaneously in multiple‐GPU streams, which can reduce the GPU execution time by 20–30%. The GPU version is validated through a benchmark test for wave runup on a complex shoreline geometry, as well as a basin‐scale tsunami simulation of the 2011 Tohoku‐oki event. Efficiency evaluation shows that, in comparison with the CPU version running on a 36‐core HPC node, speedup ratios of 4–7 and above 10 can be observed for single‐ and double‐GPU runs, respectively. The performance metrics of the multiple‐GPU implementation need to be further evaluated when appropriate.
Plain Language Summary
Numerical modeling of surface wave dynamics is necessary for coastal infrastructure design. FUNWAVE‐Total Variation Diminishing is a widely accepted open‐source wave model for simulating surface wave propagation and wave‐driven processes in the nearshore region, as well as tsunami wave propagation at oceanic scales. Due to the complexity of the governing equations and corresponding numerical methods, the modeling of wave dynamics usually depends on the use of High Performance Clusters, which are both expensive and power consuming. To address this problem, Graphics Processing Unit (GPU)‐accelerated computing is introduced in FUNWAVE‐Total Variation Diminishing for wave dynamics modeling in this study. GPUs were originally used for image processing and visualization purposes in personal computers. Because GPUs have thousands of “cores” that can perform arithmetic computations simultaneously, they are now widely employed to facilitate computing‐intensive tasks such as deep learning and engineering computations. We find that by porting the wave model to GPU devices, the modeling of surface wave dynamics over a large domain can be achieved by an affordable stand‐alone PC with GPU cards installed.
Key Points
The fully nonlinear Boussinesq wave model FUNWAVE‐TVD is ported to multiple‐GPU for acceleration
The GPU version is ideal for solving wave problems over large computational domains in a stand‐alone machine
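The abstract mentions a batched tridiagonal solver without describing it. As a rough illustration of what such a solver computes, the sketch below applies the standard Thomas algorithm to a list of independent systems. It is sequential Python, not the CUDA Fortran code of FUNWAVE‐GPU, and the function names and argument layout are assumptions made for this example.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve one tridiagonal system with the Thomas algorithm.

    lower[i] multiplies x[i-1] (lower[0] is unused), diag[i] multiplies x[i],
    and upper[i] multiplies x[i+1] (upper[-1] is unused). No pivoting is done,
    so the matrix is assumed to be diagonally dominant.
    """
    n = len(rhs)
    c_prime = [0.0] * n
    d_prime = [0.0] * n
    c_prime[0] = upper[0] / diag[0]
    d_prime[0] = rhs[0] / diag[0]
    # Forward sweep: eliminate the sub-diagonal.
    for i in range(1, n):
        denom = diag[i] - lower[i] * c_prime[i - 1]
        c_prime[i] = upper[i] / denom
        d_prime[i] = (rhs[i] - lower[i] * d_prime[i - 1]) / denom
    # Back substitution.
    x = [0.0] * n
    x[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d_prime[i] - c_prime[i] * x[i + 1]
    return x


def solve_batch(systems):
    """Solve many independent systems given as (lower, diag, upper, rhs) tuples."""
    # The systems do not depend on each other, so a GPU version could
    # process them concurrently; here they are simply solved in turn.
    return [solve_tridiagonal(*system) for system in systems]
```

Each system in the batch is independent of the others, which is the property that allows a batch to be spread across threads or streams.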
Journal Article
GPU accelerated adaptive banded event alignment for rapid comparative nanopore signal analysis
by Simpson, Jared T.; Smith, Martin A.; Parameswaran, Sri
in Adaptive algorithms, Algorithms, Alignment
2020
Background
Nanopore sequencing enables portable, real-time sequencing applications, including point-of-care diagnostics and in-the-field genotyping. Achieving these outcomes requires efficient bioinformatic algorithms for the analysis of raw nanopore signal data. However, comparing raw nanopore signals to a biological reference sequence is a computationally complex task. The dynamic programming algorithm called Adaptive Banded Event Alignment (ABEA) is a crucial step in polishing sequencing data and identifying non-standard nucleotides, such as measuring DNA methylation. Here, we parallelise and optimise an implementation of the ABEA algorithm (termed f5c) to efficiently run on heterogeneous CPU-GPU architectures.
Results
By optimising memory, computations and load balancing between CPU and GPU, we demonstrate how f5c can perform ∼3–5× faster than an optimised version of the original CPU-only implementation of ABEA in the Nanopolish software package. We also show that f5c enables DNA methylation detection on-the-fly using an embedded System on Chip (SoC) equipped with GPUs.
Conclusions
Our work not only demonstrates that complex genomics analyses can be performed on lightweight computing systems, but also benefits High-Performance Computing (HPC). The associated source code for f5c along with GPU optimised ABEA is available at https://github.com/hasindu2008/f5c.
Journal Article
Multi-GPU multi-display rendering of extremely large 3D environments
2023
In real-time rendering applications, mesh rendering quality suffers from limited GPU memory capacity and display resolution. Due to the increased complexity of models and the demand for higher display resolutions, people have started building commodity workstations with multiple GPUs at low cost. As a result, more GPU memory is available across multiple GPUs, and a higher display resolution can be achieved by connecting each GPU to a display monitor, resulting in a large tiled display configuration. However, a multi-GPU workstation may not efficiently handle a complex model that cannot fit into the GPU memory, due to (1) the unified configuration treating GPUs as one hardware entity and requiring the same data replicated in all GPUs, and (2) the lack of scalability to reduce, balance, and stream data dynamically between the CPU and GPUs as well as among the GPUs. In this work, we present a fine-grained parallel rendering approach that integrates a view-dependent LOD selection strategy with an inter-GPU load balancing method to ensure each GPU handles only the portion of data it rasterizes, without data replication. A new multi-GPU out-of-core method minimizes the amount of data transferred from the CPU to each GPU by taking advantage of frame-to-frame coherence. A comprehensive evaluation is presented to understand the efficiency and scalability of the execution components over extremely large scenes.
Journal Article
A parallel image encryption algorithm based on the piecewise linear chaotic map and hyper-chaotic map
by Ding, Xuemei; Luo, Yuling; Liu, Junxiu
in Algorithms, Automotive Engineering, Classical Mechanics
2018
This paper proposes a parallel digital image encryption algorithm based on a piecewise linear chaotic map (PWLCM) and a four-dimensional hyper-chaotic map (FDHCM). Firstly, two decimals are obtained based on the plain-image and external keys, using a novel parallel quantification method. They are used as the initial value and control parameter for the PWLCM. Then, an encryption matrix and four chaotic sequences are constructed using the PWLCM and FDHCM, which control the permutation and diffusion processes. The proposed algorithm is implemented and tested in parallel based on a graphics processing unit device. Numerical analysis and experimental results show that the proposed algorithm achieves a high encryption speed and a good security performance, which provides a potential solution for real-time image encryption applications.
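For readers unfamiliar with the PWLCM named in this abstract, the sketch below iterates the map in its commonly used textbook form and derives a byte stream from it. It is not the paper's algorithm: the quantization to bytes, the burn-in length, the XOR step, and all function names are assumptions made for illustration, and the FDHCM, the permutation stage, and the parallel quantification method are not modeled.

```python
def pwlcm(x, p):
    """One iteration of the piecewise linear chaotic map on [0, 1].

    p is the control parameter and must lie in (0, 0.5).
    """
    if not 0.0 < p < 0.5:
        raise ValueError("control parameter p must be in (0, 0.5)")
    # The map is symmetric about 0.5, so fold the upper half onto the lower.
    if x > 0.5:
        x = 1.0 - x
    if x < p:
        return x / p
    return (x - p) / (0.5 - p)


def pwlcm_bytes(x0, p, count, burn_in=100):
    """Derive `count` bytes from the PWLCM orbit starting at x0 (illustrative only)."""
    x = x0
    for _ in range(burn_in):  # discard the initial transient
        x = pwlcm(x, p)
    out = bytearray()
    for _ in range(count):
        x = pwlcm(x, p)
        out.append(min(int(x * 256), 255))  # quantize [0, 1] to one byte
    return bytes(out)


def xor_bytes(data, key):
    """XOR two equal-length byte strings; applying it twice restores the input."""
    return bytes(d ^ k for d, k in zip(data, key))
```

Here the initial value `x0` and control parameter `p` play the role of the key material. The same pair regenerates the same byte stream, so applying `xor_bytes` a second time recovers the original data.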
Journal Article
Machine Learning in Python: Main Developments and Technology Trends in Data Science, Machine Learning, and Artificial Intelligence
by Patterson, Joshua; Raschka, Sebastian; Nolet, Corey
in data science, deep learning, GPU computing
2020
Smarter applications are making better use of the insights gleaned from data, having an impact on every industry and research discipline. At the core of this revolution lie the tools and the methods that are driving it, from processing the massive piles of data generated each day to learning from and taking useful action. Deep neural networks, along with advancements in classical machine learning and scalable general-purpose graphics processing unit (GPU) computing, have become critical components of artificial intelligence, enabling many of these astounding breakthroughs and lowering the barrier to adoption. Python continues to be the most preferred language for scientific computing, data science, and machine learning, boosting both performance and productivity by enabling the use of low-level libraries and clean high-level APIs. This survey offers insight into the field of machine learning with Python, taking a tour through important topics to identify some of the core hardware and software paradigms that have enabled it. We cover widely used libraries and concepts, collected together for holistic comparison, with the goal of educating the reader and driving the field of Python machine learning forward.
Journal Article
Numerical behavior of NVIDIA tensor cores
by Pranesh, Srikara; Higham, Nicholas J.; Mikaitis, Mantas
in Accelerators, Algorithms and Analysis of Algorithms, Binary16
2021
We explore the floating-point arithmetic implemented in the NVIDIA tensor cores, which are hardware accelerators for mixed-precision matrix multiplication available on the Volta, Turing, and Ampere microarchitectures. Using Volta V100, Turing T4, and Ampere A100 graphics cards, we determine what precision is used for the intermediate results, whether subnormal numbers are supported, what rounding mode is used, in which order the operations underlying the matrix multiplication are performed, and whether partial sums are normalized. These aspects are not documented by NVIDIA, and we gain insight by running carefully designed numerical experiments on these hardware units. Knowing the answers to these questions is important if one wishes to: (1) accurately simulate NVIDIA tensor cores on conventional hardware; (2) understand the differences between results produced by code that utilizes tensor cores and code that uses only IEEE 754-compliant arithmetic operations; and (3) build custom hardware whose behavior matches that of NVIDIA tensor cores. As part of this work we provide a test suite that can be easily adapted to test newer versions of the NVIDIA tensor cores as well as similar accelerators from other vendors, as they become available. Moreover, we identify a non-monotonicity issue affecting floating point multi-operand adders if the intermediate results are not normalized after each step.
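One question the abstract raises, what precision is used for intermediate results, can be motivated with a small experiment on ordinary hardware. The sketch below does not model tensor cores or reproduce the paper's test suite. It only contrasts a binary16 accumulator with a wider one, using Python's `struct` format `"e"` to round to binary16 and Python's binary64 float as a stand-in for the wider accumulator. The function names are assumptions.

```python
import struct


def to_half(x):
    """Round a Python float to the nearest IEEE 754 binary16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def sum_half_accumulator(values):
    """Sum binary16 inputs, rounding the running total to binary16 after each add."""
    acc = 0.0
    for v in values:
        acc = to_half(acc + to_half(v))
    return acc


def sum_wide_accumulator(values):
    """Sum binary16 inputs, keeping the running total in a wider format."""
    acc = 0.0
    for v in values:
        acc += to_half(v)
    return acc


ones = [1.0] * 4096
narrow = sum_half_accumulator(ones)  # stagnates once the spacing exceeds 1
wide = sum_wide_accumulator(ones)
```

With 4096 ones, the binary16 accumulator stops at 2048, because 2049 is not representable in binary16 and rounds back to 2048, while the wider accumulator reaches 4096. Differences of this kind are why the precision of intermediate results affects the final answer.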
Journal Article
Fast Gravitational-wave Parameter Estimation without Compromises
by Wong, Kaze W. K; Edwards, Thomas D. P; Isi, Maximiliano
in Astrophysics, Gravitational waves, Heterodyning
2023
We present a lightweight, flexible, and high-performance framework for inferring the properties of gravitational-wave events. By combining likelihood heterodyning, automatically differentiable and accelerator-compatible waveforms, and gradient-based Markov Chain Monte Carlo sampling enhanced by normalizing flows, we achieve full Bayesian parameter estimation for real events like GW150914 and GW170817 within a minute of sampling time. Our framework does not require pretraining or explicit reparameterizations and can be generalized to handle higher dimensional problems. We present the details of our implementation and discuss trade-offs and future developments in the context of other proposed strategies for real-time parameter estimation. Our code for running the analysis is publicly available on GitHub at https://github.com/kazewong/jim.
Journal Article
A survey on parallel clustering algorithms for Big Data
2021
Data clustering is one of the most studied data mining tasks. It aims, through various methods, to discover previously unknown groups within the data sets. In the past years, considerable progress has been made in this field leading to the development of innovative and promising clustering algorithms. These traditional clustering algorithms present some serious issues in connection with the speed-up, the throughput, and the scalability. Thus, they can no longer be directly used in the context of Big Data, where data are mainly characterized by their volume, velocity, and variety. In order to overcome their limitations, the research today is heading to the parallel computing concept by giving rise to the so-called parallel clustering algorithms. This paper presents an overview of the latest parallel clustering algorithms categorized according to the computing platforms used to handle the Big Data, namely, the horizontal and vertical scaling platforms. The former category includes peer-to-peer networks, MapReduce, and Spark platforms, while the latter category includes Multi-core processors, Graphics Processing Unit, and Field Programmable Gate Arrays platforms. In addition, it includes a comparison of the performance of the reviewed algorithms based on some common criteria of clustering validation in the Big Data context. Therefore, it provides the reader with an overall vision of the current parallel clustering techniques.
Journal Article