Catalogue Search | MBRL
Explore the vast range of titles available.
351 result(s) for "Reconfigurable computing"
A survey of FPGA-based accelerators for convolutional neural networks
2020
Deep convolutional neural networks (CNNs) have recently shown very high accuracy in a wide range of cognitive tasks and, as a result, have received significant interest from researchers. Given the high computational demands of CNNs, custom hardware accelerators are vital for boosting their performance. The high energy efficiency, computing capability, and reconfigurability of FPGAs make them a promising platform for hardware acceleration of CNNs. In this paper, we present a survey of techniques for implementing and optimizing CNN algorithms on FPGAs. We organize the works into several categories to bring out their similarities and differences. This paper is expected to be useful for researchers in the areas of artificial intelligence, hardware architecture, and system design.
Journal Article
A Survey of Convolutional Neural Networks on Edge with Reconfigurable Computing
2019
The convolutional neural network (CNN) is one of the most widely used deep learning models for image detection and classification, owing to its high accuracy compared to other machine learning algorithms. CNNs achieve better results at the cost of higher computing and memory requirements, so inference of convolutional neural networks is usually done on centralized high-performance platforms. However, many CNN-based applications are migrating to edge devices near the source of the data, driven by the unreliability of the transmission channel to a central server, channel latency that many applications cannot tolerate, and security and data privacy concerns. While advantageous, deep learning on the edge is quite challenging because edge devices are usually limited in performance, cost, and energy. Reconfigurable computing is being considered for inference on the edge due to its high performance and energy efficiency, while retaining the hardware flexibility that allows the target computing platform to be easily adapted to the CNN model. In this paper, we describe the features of the most common CNNs, the capabilities of reconfigurable computing for running CNNs, the state of the art of reconfigurable computing implementations proposed to run CNN models, and the trends and challenges for future edge reconfigurable platforms.
Journal Article
A reconfigurable processor for mix-precision CNNs on FPGA
by
CHANG, Libo
,
ZHANG, Shengbing
in
convolutional neural network accelerator
,
mixed-precision quantization
,
reconfigurable computing
2022
To solve the problem of the low computing efficiency of existing accelerators for convolutional neural networks (CNNs), caused by their inability to adapt to the computing modes and caching characteristics of mixed-precision quantized CNN models, we propose a reconfigurable CNN processor consisting of a reconfigurable adaptable computing unit, a flexible on-chip cache unit, and a macro-instruction set. The multi-core CNN processor can be reconfigured according to the structure of the CNN model and the constraints of the reconfigurable resources, improving the utilization of computing resources. The elastic on-chip buffer, together with a data access approach that dynamically configures addresses, makes better use of on-chip memory. The macro-instruction set architecture (mISA) fully expresses the characteristics of mixed-precision CNN models and reconfigurable processors, reducing the complexity of mapping CNNs with different network structures and computing modes onto the reconfigurable CNN processor. For the well-known CNNs VGG16 and ResNet-50, the proposed processor has been implemented on the Ultra96-V2 and ZCU102 FPGAs, achieving throughputs of 216.6 GOPS and 214 GOPS and computing efficiencies of 0.63 GOPS/DSP and 0.64 GOPS/DSP on Ultra96-V2, respectively, better than CNN accelerators based on fixed bit widths. Meanwhile, for ResNet-50 on ZCU102, the throughput and computing efficiency reach 931.8 GOPS and 0.40 GOPS/DSP, respectively. In addition, this is up to 55.4% higher throughput than state-of-the-art CNN accelerators.
To solve the problem of low accelerator efficiency in existing convolutional neural network (CNN) accelerators, which cannot adapt to the computing modes and memory-access characteristics of mixed-precision quantized CNN models, we designed a reconfigurable computing unit, an elastic on-chip cache unit, and a macro dataflow instruction set adapted to mixed-precision models. A multi-core structure that can be reconfigured according to the CNN model's architecture improves computing-resource utilization; an elastic storage structure with a tile-based dynamic cache-partitioning strategy improves on-chip data reuse; and a macro dataflow instruction set that effectively expresses mixed-precision CNN computation and the characteristics of the reconfigurable processor reduces the complexity of the mapping strategy. On the Ultra96-V2 platform, VGG-16 and ResNet-50 achieve computing performance of 216.6 and 214 GOPS, with computing efficiencies of 0.63 and 0.64 GOPS/DSP. On the ZCU102 platform, ResNet-50 reaches 931.8 GOPS and 0.40 GOPS/DSP; compared with similar CNN accelerators, the computing performance and computing efficiency are improved by up to 55.4% and 100%, respectively.
Journal Article
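The per-layer bit-width idea behind mixed-precision quantization can be sketched in a few lines. The uniform symmetric scheme and all names below are illustrative assumptions, not details taken from the paper.

```python
def quantize(values, bits):
    """Uniform symmetric quantization of a list of weights to a signed
    integer range; an illustrative scheme, not the paper's exact method."""
    qmax = 2 ** (bits - 1) - 1                  # e.g. 127 for 8 bits
    scale = max(abs(v) for v in values) / qmax
    if scale == 0.0:                            # all-zero tensor: any scale works
        scale = 1.0
    q = [max(-qmax - 1, min(qmax, round(v / scale))) for v in values]
    return q, scale

# A mixed-precision model assigns a different bit width per layer,
# e.g. 8 bits where accuracy is sensitive and 4 bits elsewhere.
layers = {"conv1": [-1.0, 0.25, 1.0], "conv2": [-1.0, 0.25, 1.0]}
bit_widths = {"conv1": 8, "conv2": 4}
quantized = {name: quantize(w, bit_widths[name]) for name, w in layers.items()}
```

A fixed-bit-width accelerator must run the 4-bit layer through 8-bit datapaths anyway; adapting the computing units and caches to each layer's precision is the waste the reconfigurable processor targets.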
Reconfigurable processing for satellite on-board automatic cloud cover assessment
2009
Clouds play a critical role in many studies, such as weather- and climate-related investigations. However, they are a source of error in many applications, and cloud contamination can hinder the use of satellite data. In addition, sending cloudy data to ground stations results in inefficient use of the communication bandwidth. This calls for on-board cloud detection capability to mask out cloudy pixels from further processing. Remote sensing satellite missions have always demanded smaller size, lower cost, more flexibility, and higher computational power. Reconfigurable Computers (RCs) combine the flexibility of traditional microprocessors with the power of Field Programmable Gate Arrays (FPGAs), making them a promising candidate for on-board preprocessing. This paper presents the design and implementation of an RC-based real-time cloud detection system. We investigate the potential of RCs for on-board preprocessing by prototyping the Landsat 7 ETM+ ACCA algorithm on a state-of-the-art reconfigurable platform, the SRC-6. We show that our work provides higher detection accuracy and over an order of magnitude improvement in performance compared to previously reported investigations.
Journal Article
Innovations in mathematical modeling, AI, and optimization techniques
by
Yasuo, Nobuaki
,
Takata, Masami
,
Ohue, Masahito
in
Aircraft
,
Artificial intelligence
,
Chemical compounds
2025
This special issue is dedicated to examining the rapidly evolving fields of artificial intelligence, mathematical modeling, and optimization, with particular emphasis on their growing importance in computational science. It features the most notable papers from the "Mathematical Modeling and Problem Solving" workshop at PDPTA'24, the 30th International Conference on Parallel and Distributed Processing Techniques and Applications. The issue showcases pioneering research in areas such as natural language processing, system optimization, and high-performance computing. The nine selected studies include novel AI-driven methods for chemical compound generation, historical text recognition, and music recommendation, along with advancements in hardware optimization through reconfigurable accelerators and vector register sharing. Additionally, evolutionary and hyper-heuristic algorithms are explored for sophisticated problem-solving in engineering design, and innovative techniques are introduced for high-speed numerical methods in large-scale systems. Collectively, these contributions demonstrate the significance of AI, supercomputing, and advanced algorithms in driving the next generation of scientific discovery.
Journal Article
A twofold bio-inspired system for mitigating SEUs in the controllers of digital system deployed on FPGA
2024
Reconfigurable hardware, extensively employed in mission-critical digital applications such as space and military electronics due to its adaptability, faces the issue of soft errors, especially in control-path elements, which can result in functional failure. Among the various system-level fault tolerance methodologies, this paper implements a bio-inspired technique called evolvable hardware (EHW). The preferred implementation of an EHW system hosts the evolutionary algorithm on a processor alongside the reconfigurable hardware. However, this approach suffers delays in communicating the evolved circuit between the reconfigurable hardware and the processor. To address this issue, the paper proposes a two-tier architecture to achieve absolute fault mitigation in the controller. In this architecture, Tier-1 is a digital implementation of the genetic algorithm on the reconfigurable hardware that mitigates errors in the controller, while Tier-2 mitigates errors occurring in Tier-1. The aim is to establish an absolute, self-resilient controller hardware to mitigate faults. The study simulates faults in the target circuit and the genetic module as a proof of concept. The proposed two-tier single event upset (SEU) mitigation is deployed on Microsemi's ProAsic3e FPGA (Field Programmable Gate Array), achieving an average efficiency of 91%, with one-tenth the resource utilization of traditional methodologies and 30% faster operation compared with hybrid evolvable systems.
Journal Article
Noise-Adaptive Visible Light Communications Receiver for Automotive Applications: A Step Toward Self-Awareness
by
Căilean, Alin-Mihai
,
Dimian, Mihai
,
Popa, Valentin
in
adaptive communication
,
context-adaptive receiver
,
Design
2020
Visible light communications are considered a promising solution for inter-vehicle communications, which in turn can significantly enhance traffic safety and efficiency. However, the vehicular visible light communications (VLC) channel is highly dynamic, very unpredictable, and subject to many noise sources. Enhancing VLC systems with self-aware capabilities would maximize communication performance and efficiency regardless of the environmental conditions. Within this context, this letter proposes a novel signal-to-noise ratio (SNR)-adaptive visible light communication receiver architecture aimed at automotive applications. The novelty of this letter lies in an open-loop signal processing technique in which the signal-treatment complexity is established from a real-time SNR analysis. The receiver evaluates the SNR and, based on this assessment, reconfigures its structure to ensure proper signal treatment while providing an optimal tradeoff between communication performance and computational resource usage. This approach, based on software reconfiguration, has the potential to give the system enhanced flexibility and enables its use in resource-sharing applications. As far as we know, this approach has not been considered in vehicular VLC systems. The performance of the proposed architecture is demonstrated by simulations, which confirm the SNR-adaptive capability and the optimized performance.
Journal Article
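The open-loop idea above (estimate the SNR, then pick the cheapest adequate processing chain, with no feedback from decoding success) can be sketched in a few lines. The thresholds and stage names below are illustrative assumptions, not values from the letter.

```python
def select_processing(snr_db):
    """Map a real-time SNR estimate (in dB) to a signal-treatment chain.
    Thresholds and chain names are illustrative assumptions only."""
    if snr_db >= 20:
        return "threshold-detection"        # clean channel: cheapest chain
    if snr_db >= 10:
        return "matched-filter"             # moderate noise
    return "matched-filter+equalizer"       # noisy channel: full chain

# Open loop: the choice follows the SNR estimate directly.
chain = select_processing(15.2)
```

On reconfigurable hardware, each return value would correspond to a different receiver configuration being loaded, trading computational resources for robustness.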
A Reconfigurable Framework for Hybrid Quantum–Classical Computing
2025
Hybrid quantum–classical (HQC) computing refers to the approach of executing algorithms coherently on both quantum and classical resources. This approach makes the best use of current or near-term quantum computers by sharing the workload with classical high-performance computing. However, HQC algorithms often require a back-and-forth exchange of data between quantum and classical processors, causing system bottlenecks and leading to high latency in applications. The objective of this study is to investigate novel frameworks that unify quantum and reconfigurable resources for HQC and mitigate system bottleneck and latency issues. In this paper, we propose a reconfigurable framework for hybrid quantum–classical computing. The proposed framework integrates field-programmable gate arrays (FPGAs) with quantum processing units (QPUs) for deploying HQC algorithms. The classical subroutines of the algorithms are accelerated on FPGA fabric using a high-throughput processing pipeline, while quantum subroutines are executed on the QPUs. High-level software is used to seamlessly facilitate data exchange between classical and quantum workloads through high-performance channels. To evaluate the proposed framework, an HQC algorithm, namely variational quantum classification, and the MNIST dataset are used as a test case. We present a quantitative comparison of the proposed framework with a state-of-the-art quantum software framework running on a server-grade CPU. The results demonstrate that the FPGA pipeline achieves up to 8× improvement in runtime compared to the CPU baseline.
Journal Article
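The back-and-forth structure that creates the latency bottleneck described above can be sketched as a classical optimization loop wrapped around a quantum evaluation. The stub below stands in for a QPU circuit call, and the finite-difference optimizer is an illustrative assumption (variational methods more commonly use parameter-shift gradients); nothing here reproduces the paper's actual framework.

```python
import math

def quantum_expectation(params, x):
    """Stub standing in for a QPU evaluation; a real framework would
    submit a parameterized circuit here and return a measured value."""
    return math.cos(params[0] + x) * math.cos(params[1])

def loss(params, data):
    # One classical reduction over many quantum evaluations.
    return sum((quantum_expectation(params, x) - y) ** 2 for x, y in data)

def hybrid_train(data, steps=50, lr=0.1, eps=1e-3):
    """Classical optimizer driving quantum evaluations: every step
    exchanges data between the classical and quantum sides, which is
    exactly the traffic an FPGA-side pipeline would accelerate."""
    params = [0.5, 0.5]
    for _ in range(steps):
        base = loss(params, data)
        grad = []
        for i in range(len(params)):
            shifted = list(params)
            shifted[i] += eps
            grad.append((loss(shifted, data) - base) / eps)  # finite difference
        params = [p - lr * g for p, g in zip(params, grad)]  # classical update
    return params
```

Each optimizer step triggers several quantum evaluations followed by a classical update, so round-trip latency between the two sides dominates unless the classical subroutines sit close to the QPU.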
High-Performance Parallel Implementation of Genetic Algorithm on FPGA
by
Torquato, Matheus F
,
Fernandes, Marcelo A C
in
Field programmable gate arrays
,
Genetic algorithms
,
Hardware
2019
Genetic algorithms (GAs) are used to solve search and optimization problems in which an optimal solution can be found through an iterative process with probabilistic, non-deterministic transitions. However, depending on the nature of the problem, the time required to find a solution on sequential machines can be high due to the computational complexity of genetic algorithms. This work proposes a fully parallel implementation of a genetic algorithm on a field-programmable gate array (FPGA). Optimizing the system's processing time is the main goal of this project. Results for processing time and area occupancy (on the FPGA) are analyzed for various population sizes. The accuracy of the GA's response when optimizing functions of two variables was also evaluated for the hardware implementation; the high-performance implementation proposed in this paper can handle more variables with some adjustments to the hardware architecture. The results show that the fully parallel GA implementation achieved a throughput of about 16 million generations per second and speedups between 17 and 170,000 relative to several works proposed in the literature.
Journal Article
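The generational loop that such an FPGA design parallelizes can be sketched sequentially. The operators below (truncation selection, averaging crossover, Gaussian mutation) and all parameter values are illustrative choices, not the paper's configuration; on hardware, the fitness evaluations and genetic operators for the whole population run concurrently rather than in these Python loops.

```python
import random

def evolve(fitness, bounds, pop_size=32, generations=100, seed=1):
    """Minimal generational GA minimizing a two-variable function."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi), rng.uniform(lo, hi)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)                    # best (lowest) first
        parents = pop[: pop_size // 2]           # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            child = [(a[0] + b[0]) / 2, (a[1] + b[1]) / 2]  # averaging crossover
            if rng.random() < 0.2:                           # Gaussian mutation
                child[rng.randrange(2)] += rng.gauss(0, 0.1)
            children.append(child)
        pop = parents + children
    return min(pop, key=fitness)

# Usage: minimize the sphere function, whose optimum is at (0, 0).
best = evolve(lambda v: v[0] ** 2 + v[1] ** 2, (-5.0, 5.0))
```

In a sequential run, the fitness calls inside the loop dominate the runtime; that is the work a fully parallel FPGA implementation performs for all individuals at once, which is where throughputs of millions of generations per second come from.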
Field‐programmable gate array acceleration of the Tersoff potential in LAMMPS
2025
Molecular dynamics simulation is a common method for understanding the microscopic world. Traditional general-purpose high-performance computing platforms are hindered by low computational and power efficiency, constraining the practical application of large-scale, long-duration many-body molecular dynamics simulations. To address these problems, a novel molecular dynamics accelerator for the Tersoff potential is designed for field-programmable gate array (FPGA) platforms, enabling the acceleration of LAMMPS with FPGAs. First, an on-the-fly method is proposed to build neighbor lists and reduce storage usage. In addition, multilevel parallelization allows the accelerator to be flexibly deployed on FPGAs of different scales while achieving good performance. Finally, mathematical models of the accelerator are built, and a method for using these models to determine the optimal-performance parameters is proposed. Experimental results show that, when tested on the Xilinx Alveo U200, the proposed accelerator achieves a performance of 9.51 ns/day for a Tersoff simulation of a 55,296-atom system, a 2.00× increase in performance compared to the Intel i7-8700K and 1.70× compared to the NVIDIA Tesla K40c under the same test case. In terms of computational efficiency and power efficiency, the proposed accelerator achieves improvements of 2.00× and 7.19× over the Intel i7-8700K, and 4.33× and 2.11× over the NVIDIA Titan Xp, respectively. We propose an FPGA-based molecular dynamics accelerator with a customized computing architecture for the Tersoff potential. The designed accelerator achieves good acceleration of the Tersoff potential, showing the potential of extending LAMMPS to FPGAs for high power efficiency and high computational efficiency.
Journal Article
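For context, the cutoff test at the heart of neighbor-list construction is simple. The brute-force sketch below only illustrates the criterion that an on-the-fly method would evaluate as atom pairs stream through a pipeline; it does not reproduce the paper's storage-saving scheme, and the example coordinates are invented.

```python
def neighbor_pairs(positions, cutoff):
    """Brute-force search for atom pairs within a cutoff radius.
    positions: list of (x, y, z) tuples; returns index pairs (i, j), i < j."""
    pairs = []
    c2 = cutoff * cutoff
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            d2 = sum((a - b) ** 2 for a, b in zip(positions[i], positions[j]))
            if d2 <= c2:              # compare squared distances: no sqrt needed
                pairs.append((i, j))
    return pairs

# Three atoms on the x-axis; only the first two fall within the cutoff.
atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
close = neighbor_pairs(atoms, 2.0)
```

Storing all such pairs for tens of thousands of atoms is what consumes on-chip memory; an on-the-fly approach regenerates the qualifying pairs as the force pipeline needs them instead of materializing the list.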