Catalogue Search | MBRL
725 result(s) for "Heterogeneous computing"
Blockchain in the Industrial Internet of Things
by Ramasamy, Lakshmana Kumar; Kadry, Seifedine
in Heterogeneous distributed computing systems, Smart contracts
2022
Blockchain and the Internet of Things are each regarded as highly capable, popular technologies. Combining them gives Blockchain for the Industrial Internet of Things, which overcomes the security issues associated with the IoT and can further the development of Industry 4.0.
Linked data management
"With the growing popularity of the Semantic Web, more and more semantic data and data sources become available and accessible for everyone. By establishing semantic links between the data, answers to (complex) queries can be evaluated based on the data on multiple providers instead of just one. This book motivates, introduces, and details techniques for processing heterogeneous structured data on the Web by providing a comprehensive overview for database researchers and practitioners about this new publishing paradigm on the web, and shows how the abundance of data published as Linked Data can serve as a fertile ground for database research and experimentation"-- Provided by publisher.
Transition of HPC towards exascale computing
by D'Hollander, E.
in Heterogeneous computing-Congresses, High performance computing-Congresses, Supercomputers-Congresses
2013
The US, Europe, Japan and China are racing to develop the next generation of supercomputers - exascale machines capable of 10^18 calculations per second - by 2020.
Heterogeneous Computing
2019
If you look around you will find that all computer systems, from your portable devices to the strongest supercomputers, are heterogeneous in nature. The most obvious heterogeneity is the existence of computing nodes of different capabilities (e.g. multicore, GPUs, FPGAs, ...). But there are also other heterogeneity factors that exist in computing systems, like the memory system components, interconnection, etc. The main reason for these different types of heterogeneity is to have good performance with power efficiency. Heterogeneous computing results in both challenges and opportunities. This book discusses both. It shows that we need to deal with these challenges at all levels of the computing stack: from algorithms all the way to process technology. We discuss the topic of heterogeneous computing from different angles: hardware challenges, current hardware state-of-the-art, software issues, how to make the best use of the current heterogeneous systems, and what lies ahead. The aim of this book is to introduce the big picture of heterogeneous computing. Whether you are a hardware designer or a software developer, you need to know how the pieces of the puzzle fit together. The main goal is to bring researchers and engineers to the forefront of the research frontier in the new era that started a few years ago and is expected to continue for decades. We believe that academics, researchers, practitioners, and students will benefit from this book and will be prepared to tackle the big wave of heterogeneous computing that is here to stay.
Perspective: an optoelectronic future for heterogeneous, dendritic computing
by Abdelghany, Mahmoud; Lee, Yun-Jhu; Ambethkar, Hari Rakul
in analog computing, dendritic computing, heterogeneous computing
2024
With the increasing number of applications reliant on large neural network models, the pursuit of more suitable computing architectures is becoming increasingly relevant. Progress toward co-integrated silicon photonic and CMOS circuits provides new opportunities for computing architectures with high bandwidth optical networks and high-speed computing. In this paper, we discuss trends in neuromorphic computing architecture and outline an optoelectronic future for heterogeneous, dendritic neuromorphic computing.
Journal Article
Graphics processing unit-accelerated mesh-based Monte Carlo photon transport simulations
2019
The mesh-based Monte Carlo (MMC) algorithm is increasingly used as the gold standard for developing new biophotonics modeling techniques in 3-D complex tissues, including both diffusion-based and various Monte Carlo (MC)-based methods. Compared to multilayered and voxel-based MCs, MMC can utilize tetrahedral meshes to gain improved anatomical accuracy, but this also results in higher computational and memory demands. Previous attempts at accelerating MMC using graphics processing units (GPUs) have yielded limited performance improvement and are not publicly available. We report a highly efficient MMC implementation, MMCL, using the OpenCL heterogeneous computing framework and demonstrate a speedup ratio of up to 420× compared to state-of-the-art single-threaded CPU simulations. The MMCL simulator supports almost all advanced features found in our widely disseminated MMC software, such as support for a dozen complex source forms, wide-field detectors, boundary reflection, photon replay, and storing a rich set of detected photon information. Furthermore, this tool supports a wide range of GPUs/CPUs across vendors and is freely available with full source code and benchmark suites at http://mcx.space/#mmc.
Journal Article
Implementation of a Heterogeneous Accelerator for Multi-Class Traffic Sign Classification Based on an Improved LeNet
2024
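This paper proposes an implementation of a heterogeneous accelerator for multi-class traffic sign classification based on an improved LeNet. The accelerator uses an ARM+FPGA heterogeneous platform and deploys the forward inference of the improved LeNet on the FPGA to achieve parallel computation. On the FPGA side, it adopts the AXI-Stream protocol, moves data at high speed via DMA, and uses techniques such as array partitioning and multi-stage pipelining to process data in parallel. On the ARM side, the PYNQ framework handles data updates and accelerator scheduling. Experimental results on the GTSRB dataset show that, at a working clock frequency of 50 MHz, the design achieves an average inference time of 14.489 ms, compared with 710 ms on an MCU, for a speedup of up to 49, which is significant for edge applications of multi-class traffic sign classification.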
Journal Article
To Exascale and Beyond—The Simple Cloud‐Resolving E3SM Atmosphere Model (SCREAM), a Performance Portable Global Atmosphere Model for Cloud‐Resolving Scales
by Donahue, A. S.; Bogenschutz, P. A.; Sreepathi, S.
in Aerosols, Atmosphere, Atmospheric models
2024
The new generation of heterogeneous CPU/GPU computer systems offers much greater computational performance but is not yet widely used for climate modeling. One reason for this is that traditional climate models were written before GPUs were available and would require an extensive overhaul to run on these new machines. In addition, even conventional "high-resolution" simulations don't currently provide enough parallel work to keep GPUs busy, so the benefits of such an overhaul would be limited for the types of simulations climate scientists are accustomed to. The vision of the Simple Cloud-Resolving Energy Exascale Earth System (E3SM) Atmosphere Model (SCREAM) project is to create a global atmospheric model with the architecture to efficiently use GPUs and horizontal resolution sufficient to fully take advantage of GPU parallelism. After 5 years of model development, SCREAM is finally ready for use. In this paper, we describe the design of this new code, its performance on both CPU and heterogeneous machines, and its ability to simulate real-world climate via a set of four 40-day simulations covering all 4 seasons of the year.
Plain Language Summary: This paper describes the design and development of a 3 km version of the Energy Exascale Earth System Model (E3SM) atmosphere model, which has been fully rewritten in C++ using the Kokkos library for performance portability. This newly rewritten model is able to take advantage of state-of-the-science high performance computing systems, which use graphics processing units (GPUs) to mitigate much of the computational expense that typically plagues high-resolution global modeling. Taking advantage of this high performance, we are able to run four seasons of simulations at 3 km global resolution. We discuss the biases, including the diurnal cycle, by comparing model results with satellite and Atmospheric Radiation Measurement ground-based site data.
Key Points:
Describes the C++/Kokkos implementation of the Simple Cloud-Resolving E3SM Atmosphere Model (SCREAMv1).
SCREAMv1 leverages GPUs to surpass one simulated year per compute day at global 3 km resolution.
High resolution improves some meso-scale features and the diurnal cycle, but large-scale biases require improvement across all four seasons.
Journal Article
Research on Convolutional Neural Network Inference Acceleration and Performance Optimization for Edge Intelligence
by Yang, Zhenhao; Lin, Daoqian; Tan, Junwen
in Algorithms, Artificial intelligence, Deep learning
2023
In recent years, edge intelligence (EI) has emerged, combining edge computing with AI, and specifically deep learning, to run AI algorithms directly on edge devices. In practical applications, EI faces challenges related to computational power, power consumption, size, and cost, with the primary challenge being the trade-off between computational power and power consumption. This has rendered traditional computing platforms unsustainable, making heterogeneous parallel computing platforms a crucial pathway for implementing EI. In our research, we leveraged the Xilinx Zynq 7000 heterogeneous computing platform, employed high-level synthesis (HLS) for design, and implemented two different accelerators for LeNet-5 using loop unrolling and pipelining optimization techniques. The experimental results show that when running at a clock speed of 100 MHz, the PIPELINE accelerator, compared to the UNROLL accelerator, experiences an 8.09% increase in power consumption but speeds up by 14.972 times, making the PIPELINE accelerator superior in performance. Compared to the CPU, the PIPELINE accelerator reduces power consumption by 91.37% and speeds up by 70.387 times, while compared to the GPU, it reduces power consumption by 93.35%. This study provides two different optimization schemes for edge intelligence applications through design and experimentation and demonstrates the impact of different quantization methods on FPGA resource consumption. These experimental results can provide a reference for practical applications, thereby providing a reference hardware acceleration scheme for edge intelligence applications.
Journal Article