Catalogue Search | MBRL
Search Results
Explore the vast range of titles available.
2 result(s) for "OpenCL/CUDA"
A study of graphics hardware accelerated particle swarm optimization with digital pheromones
by Winer, Eliot; Kalivarapu, Vijay
in Central processing units; Computational Mathematics and Numerical Analysis; Computing costs
2015
Programmable Graphics Processing Units (GPUs) have lately become a promising means to perform scientific computations. Modern GPUs have proven to deliver far more floating-point operations per second than traditional Central Processing Units (CPUs), thanks to their inherently data-parallel architecture and higher memory bandwidth. They allow scientific computations to be performed, without noticeable degradation in accuracy, in a fraction of the time required by traditional CPUs and at substantially reduced cost, making them viable alternatives to expensive computer clusters or workstations. GPU programmability, however, has fostered the development of a variety of programming languages, making it challenging to select a computing language and use it consistently without the pitfall of obsolescence. Some GPU languages are hardware specific and are designed to extract performance gains when used with their host GPUs (e.g., Nvidia CUDA). Others are operating-system specific (e.g., Microsoft HLSL). A few are platform agnostic and lend themselves to use on a workstation with any CPU and GPU (e.g., GLSL, OpenCL).
Of the companies and organizations that incorporate formal optimization into their processes, only a few utilize GPUs, either because the others are heavily invested in CPU-based computing or because they are not fully aware of the benefits of implementing population-based optimization routines on GPUs. The literature shows a large number of research publications in the field of optimization utilizing GPUs; however, most are limited to specific GPU hardware or address specific problems. The diversity of current GPU hardware and software APIs presents an overwhelming number of choices, making it challenging to decide where and how to begin transitioning to GPU-based computing and impeding a promising computing avenue that is relatively very cost effective. In this paper, the authors address some of these issues by broadly classifying GPU APIs into three categories: 1) hardware-vendor-dependent GPU APIs, 2) graphical-in-context APIs, and 3) platform-agnostic APIs. Prior work by the authors demonstrated the capability of digital pheromones within Particle Swarm Optimization (PSO) for searching n-dimensional design spaces with improved accuracy, efficiency, and reliability in serial and parallel CPU computing environments. To study the impact of GPUs, the authors took this digital pheromone variant of PSO and implemented it on three GPU APIs, one representing each category listed above, in a simplistic sense – delegating unconstrained explicit objective function evaluations to the GPU. While this approach itself cannot be considered novel, the takeaways from implementing it on different GPU APIs provide a wealth of information that the authors believe can help optimization companies and organizations make informed decisions about implementing GPUs in their processes.
Journal Article
Hybrid/Heterogeneous Programming with OMPSS and Its Software/Hardware Implications
by
Duran, Alejandro; Bueno, Javier; Etsion, Yoav
in clustered heterogeneous multi‐/many‐core systems; computing systems; message passing interface
2017
This chapter describes how OmpSs extends the OpenMP 3.0 node programming model and how it leverages the message passing interface (MPI) and OpenCL/CUDA, enabling efficient programming of the clustered heterogeneous multi‐/many‐core systems that will be available in current and future computing systems. It describes the language extensions and the implementation of OmpSs, focusing on the intelligence that needs to be embedded in the runtime system to effectively lower the programmability wall, and on the opportunities to implement new mechanisms and policies. The chapter reasons about the overheads related to task management in OmpSs (detecting inter-task data dependencies, identifying task-level parallelism, and executing tasks out of order), examining how far a software implementation can go in coping with fine-grained parallelism and opening the door to novel hardware mechanisms for emerging multicore architectures. It also provides a brief description of the OmpSs execution model to aid understanding of the programming model extensions.
Book Chapter