Catalogue Search | MBRL
Search Results
Explore the vast range of titles available.
628 result(s) for "artificial intelligence accelerator"
The promise of training deep neural networks on CPUs: A survey
This survey presents a comprehensive analysis of the potential benefits and challenges of training deep neural networks (DNNs) on CPUs, summarizing existing research in the field. Five distinct DNN models are examined: Ternary Neural Networks (TNNs), Binary Neural Networks (BNNs), Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and a novel method called Sub-Linear Deep Learning Engine (SLIDE), specifically designed for CPU-based network training. The survey emphasizes the advantages of using CPUs for DNN training, such as low cost, compact size, and broad applicability across various domains. It also catalogues concerns about CPU acceleration, including the absence of a unified programming model and inefficiencies in DNN training caused by the increased number of floating-point operations. To tackle these issues, the survey explores algorithmic and hardware optimization strategies, including compressed network structures, techniques such as SLIDE, and the RISC-V instruction set. It concludes that CPUs are likely to become a practical alternative for developers with limited resources. Through continued algorithm optimization and hardware enhancements, CPUs can provide more cost-efficient neural network training solutions, excelling in areas such as mobile servers and edge computing.
Journal Article
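As a concrete illustration of the SLIDE idea mentioned in the abstract above, here is a minimal Python sketch of LSH-based sparse neuron selection: hash tables built over neuron weight vectors let a forward pass touch only the neurons that collide with the input, rather than the full layer. All sizes and the sign-hash scheme are illustrative assumptions, not details taken from the survey.

```python
import numpy as np

# Hypothetical toy illustration of SLIDE-style sparse computation on a CPU:
# locality-sensitive hashing (LSH) picks a small set of likely-high-activation
# neurons, so only their dot products are computed instead of the full layer.

rng = np.random.default_rng(0)
D, N, K, L = 64, 4096, 8, 4              # input dim, neurons, hash bits, tables

W = rng.standard_normal((N, D))          # layer weights (one row per neuron)
planes = rng.standard_normal((L, K, D))  # random hyperplanes per table

def signature(v, table):
    """K-bit sign hash of vector v under one table's hyperplanes."""
    bits = (planes[table] @ v) > 0
    return bits.dot(1 << np.arange(K))

# Index every neuron into each hash table once, up front.
tables = [dict() for _ in range(L)]
for t in range(L):
    for n in range(N):
        tables[t].setdefault(signature(W[n], t), []).append(n)

def sparse_forward(x):
    """Activate only neurons that collide with x in at least one table."""
    active = set()
    for t in range(L):
        active.update(tables[t].get(signature(x, t), []))
    idx = np.fromiter(active, dtype=int)
    out = np.zeros(N)
    out[idx] = np.maximum(W[idx] @ x, 0.0)   # ReLU over the sparse subset only
    return out, len(idx)

out, n_active = sparse_forward(rng.standard_normal(D))
print(f"computed {n_active}/{N} neurons")    # typically a small fraction of N
```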
Characterization of Single-Event Effects in a Microcontroller with an Artificial Neural Network Accelerator
by Santos, Douglas A.; Kastriotou, Maria; Cazzaniga, Carlo
in Algorithms, Artificial intelligence, Artificial neural networks
2024
Artificial neural networks (ANNs) have become essential components in various safety-critical applications, including autonomous vehicles, medical devices, and avionics, where system failures can lead to severe risks. Edge AI devices, which process data locally without relying on the cloud, are increasingly used to meet the performance and real-time demands of these applications. However, their reliability in radiation-prone environments is a significant concern. In this context, this paper evaluates the MAX78000, an ultra-low-power Edge AI microcontroller with a hardware-based convolutional neural network (CNN) accelerator, focusing on its behavior in radiation environments. To assess the reliability of the MAX78000, we performed a test campaign at the ChipIR neutron irradiation facility using two different ANNs. We implemented techniques to improve system observability during ANN inference and analyzed the radiation-induced errors observed. The results present a comparative analysis between the two ANN architectures, which shows that the complexity of the ANN directly impacts its reliability.
Journal Article
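One way to make radiation-induced errors observable, broadly in the spirit of the campaign above, is to compare every inference against a fault-free golden output and classify the mismatches. The sketch below is a hypothetical software analogue; the function names, tolerance, and injection model are mine, not the paper's MAX78000 instrumentation.

```python
import numpy as np

# Minimal, hypothetical sketch of golden-output comparison during a
# radiation test campaign: each inference under beam is checked against
# a reference computed in a fault-free run, then binned by severity.

def classify_run(golden_logits, observed_logits, tol=1e-5):
    """Classify one inference under beam against the golden reference."""
    if np.allclose(golden_logits, observed_logits, atol=tol):
        return "masked"        # no observable effect
    if np.argmax(golden_logits) == np.argmax(observed_logits):
        return "tolerable"     # values differ, prediction intact
    return "critical"          # silent data corruption: wrong class

# Example bookkeeping over a (simulated) campaign of inferences.
rng = np.random.default_rng(1)
golden = rng.standard_normal(10)
counts = {"masked": 0, "tolerable": 0, "critical": 0}
for _ in range(10_000):
    observed = golden.copy()
    if rng.random() < 0.01:    # inject an occasional upset-like disturbance
        observed[rng.integers(10)] += rng.standard_normal() * 5
    counts[classify_run(golden, observed)] += 1
print(counts)
```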
Bridging HDL Bits and Caffe: An Educational Path to AI Accelerator Design
by Elsharkawy, Mostafa Osama Abdellatif; Shanker, Manjusha; Agarwal, Harsh
in artificial intelligence accelerator, caffe, co-simulation
2025
An integrated educational pipeline is presented that bridges hardware description language (HDL) exercises with the Caffe deep learning framework, enabling progression from Verilog fundamentals to the deployment of convolutional neural network accelerators on field-programmable gate arrays (FPGAs). Parameterized, pipelined Verilog modules for convolution, pooling, ReLU activation, and fully connected layers are developed and verified using fixed test vectors. Meanwhile, a LeNet-5 model is defined and trained in Google Colab using an AccDNN-enabled Caffe build. Trained weights are exported, quantized to eight-bit fixed point using Ristretto, and loaded into Verilog testbenches. A Python-based co-simulation harness is provided to automate parameter extraction, regression testing, and bit-accurate comparison of outputs for hundreds of MNIST samples. The entire design is synthesized on an Artix-7 FPGA, achieving 25% LUT utilization, 50% DSP utilization, and 30% block RAM utilization at 100 MHz. An end-to-end inference latency of 0.156 ms is reported, corresponding to a throughput of 6,400 images per second. Through the combination of HDL assignments, Caffe-based training, quantization analysis, and FPGA synthesis, learners are equipped with a complete workflow for accelerator design, verification, and performance evaluation. A controlled experiment involving 10 participants shows that the proposed method is effective in supporting AI learning.
Journal Article
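The two software-side steps this abstract describes, Ristretto-style eight-bit fixed-point quantization and bit-accurate output comparison, can be sketched in a few lines of Python. The Q-format choice and array shapes below are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Hypothetical sketch: (1) quantize trained float weights to signed 8-bit
# fixed point, as a Ristretto-style flow does; (2) compare integers
# bit-accurately against values read back from an HDL simulation, as a
# co-simulation harness would.

def to_fixed_q(w, frac_bits):
    """Quantize a float array to signed 8-bit fixed point."""
    scaled = np.round(w * (1 << frac_bits))
    return np.clip(scaled, -128, 127).astype(np.int8)

def from_fixed_q(q, frac_bits):
    """Recover the float value a fixed-point word represents."""
    return q.astype(np.float64) / (1 << frac_bits)

w = np.random.default_rng(2).uniform(-0.9, 0.9, size=(4, 4))
q = to_fixed_q(w, frac_bits=6)     # Q1.6: 1 sign/integer bit, 6 fraction bits
print("max quantization error:", np.abs(w - from_fixed_q(q, 6)).max())

# Bit-accurate check: integers produced by the RTL must match exactly.
dumped_by_hdl_sim = q.copy()       # stand-in for values read from the HDL run
assert np.array_equal(q, dumped_by_hdl_sim), "mismatch between model and RTL"
```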
Artificial Intelligence Accelerators Based on Graphene Optoelectronic Devices
2021
Optical and optoelectronic approaches to performing matrix–vector multiplication (MVM) operations have shown great promise for accelerating machine learning (ML) algorithms with unprecedented performance. The incorporation of nanomaterials into such systems can further improve device and system performance thanks to their extraordinary properties, but the nonuniformity and variation of nanostructures at the macroscopic scale pose severe limitations for large-scale hardware deployment. Here, a new optoelectronic architecture is presented, consisting of spatial light modulators and tunable-responsivity photodetector arrays made from graphene to perform MVM. The ultrahigh carrier mobility of graphene, high-power-efficiency electro-optic control, and extreme parallelism suggest ultrahigh data throughput and ultralow energy consumption. Moreover, a methodology for performing accurate calculations with imperfect components is developed, laying the foundation for scalable systems. Finally, a few representative ML algorithms are demonstrated, including singular value decomposition, support vector machine, and deep neural networks, to show the versatility and generality of the platform.
Optoelectronic hardware for accelerating artificial intelligence with ultrahigh parallelism, throughput, and ultralow energy consumption is proposed and analyzed, consisting of graphene spatial light modulators and a photodetector array. Procedures for performing accurate calculation using imperfect components are developed, which are crucial for large-scale deployment. A few representative machine learning algorithms are demonstrated to show the versatility and generality of the system.
Journal Article
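A toy numerical model can make the "accurate calculations with imperfect components" claim concrete: encode a matrix in per-detector responsivities, add a fixed unknown gain error per element, and recover accurate products by probing once with basis vectors. This is a generic calibration sketch, not the paper's methodology; all numbers are invented.

```python
import numpy as np

# Toy model (not the paper's method) of optoelectronic MVM: each detector
# sums gain-distorted elementwise products, and a one-time calibration
# with unit basis vectors measures the effective (imperfect) matrix.

rng = np.random.default_rng(3)
M, N = 6, 4
A = rng.uniform(0, 1, (M, N))                  # target matrix (nonnegative, like intensities)
gain = 1 + 0.1 * rng.standard_normal((M, N))   # static device nonuniformity

def optical_mvm(x):
    """Hardware model: y = (gain * A) @ x, i.e. the distorted product."""
    return (gain * A * x).sum(axis=1)

# Calibration: probe with unit basis vectors to learn the effective matrix.
A_eff = np.stack([optical_mvm(np.eye(N)[j]) for j in range(N)], axis=1)

x = rng.uniform(0, 1, N)
raw = optical_mvm(x)
x_hat = np.linalg.lstsq(A_eff, raw, rcond=None)[0]   # invert effective response
corrected = A @ x_hat                                # accurate product from imperfect parts
print("raw error:      ", np.abs(raw - A @ x).max())
print("corrected error:", np.abs(corrected - A @ x).max())
```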
A graph placement methodology for fast chip design
2021
Chip floorplanning is the engineering task of designing the physical layout of a computer chip. Despite five decades of research [1], chip floorplanning has defied automation, requiring months of intense effort by physical design engineers to produce manufacturable layouts. Here we present a deep reinforcement learning approach to chip floorplanning. In under six hours, our method automatically generates chip floorplans that are superior or comparable to those produced by humans in all key metrics, including power consumption, performance and chip area. To achieve this, we pose chip floorplanning as a reinforcement learning problem, and develop an edge-based graph convolutional neural network architecture capable of learning rich and transferable representations of the chip. As a result, our method utilizes past experience to become better and faster at solving new instances of the problem, allowing chip design to be performed by artificial agents with more experience than any human designer. Our method was used to design the next generation of Google’s artificial intelligence (AI) accelerators, and has the potential to save thousands of hours of human effort for each new generation. Finally, we believe that more powerful AI-designed hardware will fuel advances in AI, creating a symbiotic relationship between the two fields.
Machine learning tools are used to greatly accelerate chip layout design, by posing chip floorplanning as a reinforcement learning problem and using neural networks to generate high-performance chip layouts.
Journal Article
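The reinforcement-learning formulation can be made concrete with a toy environment: the state is the set of macros placed so far, an action places the next macro on a grid cell, and the terminal reward is negative half-perimeter wirelength. A random policy stands in for the paper's edge-based graph convolutional policy network; the netlist and grid size below are invented for illustration.

```python
import numpy as np

# Toy sketch of floorplanning as an RL problem: one macro is placed per
# step, and the episode's reward is the negative half-perimeter
# wirelength (HPWL), so shorter wires mean higher reward.

rng = np.random.default_rng(4)
GRID = 8
nets = [(0, 1), (1, 2), (2, 3), (0, 3)]    # hypothetical netlist of two-pin nets

def hpwl(positions):
    """Half-perimeter wirelength over all two-pin nets."""
    total = 0
    for a, b in nets:
        (r1, c1), (r2, c2) = positions[a], positions[b]
        total += abs(r1 - r2) + abs(c1 - c2)
    return total

def episode(policy):
    occupied, positions = set(), {}
    for macro in range(4):                 # action: choose a free grid cell
        free = [(r, c) for r in range(GRID) for c in range(GRID)
                if (r, c) not in occupied]
        cell = policy(free, positions)
        occupied.add(cell)
        positions[macro] = cell
    return -hpwl(positions)                # terminal reward

random_policy = lambda free, state: free[rng.integers(len(free))]
best = max(episode(random_policy) for _ in range(200))
print("best reward over 200 random rollouts:", best)
```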
Inference in artificial intelligence with deep optics and photonics
by Ozcan, Aydogan; Gigan, Sylvain; Miller, David A. B.
in Accelerators, Algorithms, Artificial intelligence
2020
Artificial intelligence tasks across numerous applications require accelerators for fast and low-power execution. Optical computing systems may be able to meet these domain-specific needs but, despite half a century of research, general-purpose optical computing systems have yet to mature into a practical technology. Artificial intelligence inference, however, especially for visual computing applications, may offer opportunities for inference based on optical and photonic systems. In this Perspective, we review recent work on optical computing for artificial intelligence applications and discuss its promise and challenges.
Recent work on optical computing for artificial intelligence applications is reviewed and the potential and challenges of all-optical and hybrid optical networks are discussed.
Journal Article
Single-chip photonic deep neural network with forward-only training
by Bandyopadhyay, Saumil; Krastanov, Stefan; Harris, Nicholas
in 639/624/1075/1079, 639/624/1075/401, 639/624/399/1099
2024
As deep neural networks revolutionize machine learning, energy consumption and throughput are emerging as fundamental limitations of complementary metal–oxide–semiconductor (CMOS) electronics. This has motivated a search for new hardware architectures optimized for artificial intelligence, such as electronic systolic arrays, memristor crossbar arrays and optical accelerators. Optical systems can perform linear matrix operations at an exceptionally high rate and efficiency, motivating recent demonstrations of low-latency matrix accelerators and optoelectronic image classifiers. However, demonstrating coherent, ultralow-latency optical processing of deep neural networks has remained an outstanding challenge. Here we realize such a system in a scalable photonic integrated circuit that monolithically integrates multiple coherent optical processor units for matrix algebra and nonlinear activation functions into a single chip. We experimentally demonstrate this fully integrated coherent optical neural network architecture for a deep neural network with six neurons and three layers that optically computes both linear and nonlinear functions with a latency of 410 ps, unlocking new applications that require ultrafast, direct processing of optical signals. We implement backpropagation-free in situ training on this system, achieving 92.5% accuracy on a six-class vowel classification task, which is comparable to the accuracy obtained on a digital computer. This work lends experimental evidence to theoretical proposals for in situ training, enabling orders of magnitude improvements in the throughput of training data. Moreover, the fully integrated coherent optical neural network opens the path to inference at nanosecond latency and femtojoule per operation energy efficiency.
Researchers experimentally demonstrate a fully integrated coherent optical neural network. The system, with six neurons and three layers, operates with a latency of 410 ps.
Journal Article
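One generic way to train with forward passes only is simultaneous random weight perturbation (an SPSA-style update), which needs no backpropagation. The sketch below is a software analogue of that forward-only idea under my own assumptions; the paper's in situ protocol on the photonic chip is its own scheme.

```python
import numpy as np

# Hedged sketch of backpropagation-free training: estimate the gradient
# from two extra forward passes with a random sign perturbation, then
# step the weights along that estimate (SPSA-style).

rng = np.random.default_rng(5)
W = rng.standard_normal((3, 6)) * 0.1     # tiny linear classifier: 6 in, 3 classes
X = rng.standard_normal((64, 6))
y = rng.integers(0, 3, 64)

def loss(W):
    """Cross-entropy of a softmax linear model, forward pass only."""
    logits = X @ W.T
    logits -= logits.max(axis=1, keepdims=True)
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)
    return -np.log(p[np.arange(len(y)), y]).mean()

eps, lr = 1e-3, 0.5
for step in range(300):
    delta = rng.choice([-1.0, 1.0], size=W.shape)    # random sign perturbation
    g = (loss(W + eps * delta) - loss(W - eps * delta)) / (2 * eps)
    W -= lr * g * delta                              # forward-only update
print("final loss:", loss(W))
```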
Large-Scale Optical Neural Networks Based on Photoelectric Multiplication
by Bernstein, Liane; Hamerly, Ryan; Englund, Dirk
in Accelerators, Artificial intelligence, Artificial neural networks
2019
Recent success in deep neural networks has generated strong interest in hardware accelerators to improve speed and energy consumption. This paper presents a new type of photonic accelerator based on coherent detection that is scalable to large (N ≳ 10⁶) networks and can be operated at high (gigahertz) speeds and very low (subattojoule) energies per multiply and accumulate (MAC), using the massive spatial multiplexing enabled by standard free-space optical components. In contrast to previous approaches, both weights and inputs are optically encoded so that the network can be reprogrammed and trained on the fly. Simulations of the network using models for digit and image classification reveal a “standard quantum limit” for optical neural networks, set by photodetector shot noise. This bound, which can be as low as 50 zJ/MAC, suggests that performance below the thermodynamic (Landauer) limit for digital irreversible computation is theoretically possible in this device. The proposed accelerator can implement both fully connected and convolutional networks. We also present a scheme for backpropagation and training that can be performed in the same hardware. This architecture will enable a new class of ultralow-energy processors for deep learning.
Journal Article
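A quick back-of-the-envelope calculation puts the quoted 50 zJ/MAC bound in context: at an assumed telecom wavelength of 1550 nm (my assumption; the abstract states none), a single photon carries about 128 zJ, so the bound corresponds to well under one photon per MAC. That is meaningful because optical fan-out shares each photon's energy across the many MACs it participates in.

```python
# Arithmetic relating the abstract's ~50 zJ/MAC quantum limit to photon
# energy. The operating wavelength is an assumption, not from the paper.
h = 6.62607015e-34            # Planck constant, J*s
c = 2.99792458e8              # speed of light, m/s
photon_energy = h * c / 1.55e-6                                   # 1550 nm
print(f"photon energy at 1550 nm: {photon_energy * 1e21:.0f} zJ")  # ~128 zJ
print(f"50 zJ/MAC corresponds to {50e-21 / photon_energy:.2f} photons per MAC")
```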
Conceptual Understanding through Efficient Automated Design of Quantum Optical Experiments
by Tischler, Nora; Aspuru-Guzik, Alán; Kottmann, Jakob S.
in Algorithms, Artificial intelligence, Astronomy
2021
Artificial intelligence (AI) is a potentially disruptive tool for physics and science in general. One crucial question is how this technology can contribute at a conceptual level to help acquire new scientific understanding. Scientists have used AI techniques to rediscover previously known concepts; so far, however, no cases have been reported in which AI is applied to open problems to yield genuinely new scientific concepts and ideas. Here, we present Theseus, an algorithm that can provide new conceptual understanding, and we demonstrate its applications in the field of experimental quantum optics. To do so, we make four crucial contributions. (i) We introduce a graph-based representation of quantum optical experiments that can be interpreted and used algorithmically. (ii) We develop an automated design approach for new quantum experiments, which is orders of magnitude faster than the best previous algorithms at concrete design tasks for experimental configuration. (iii) We solve several crucial open questions in experimental quantum optics, involving practical blueprints of resource states in photonic quantum technology as well as quantum states and transformations that allow for new foundational quantum experiments. Finally, and most importantly, (iv) the interpretable representation and enormous speed-up allow us to produce solutions that a human scientist can interpret directly and from which new scientific concepts can be drawn. We anticipate that Theseus will become an essential tool in quantum optics for developing new experiments and photonic hardware. It can further be generalized to answer open questions and provide new concepts in a large number of other quantum physical questions beyond quantum optical experiments. Theseus is a demonstration of explainable AI (XAI) in physics that shows how AI algorithms can contribute to science on a conceptual level.
Journal Article
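The graph-based representation in contribution (i) can be sketched briefly: in the published Theseus picture, vertices are photon paths, weighted edges are pair-creation events, and the generated quantum state is a superposition over the graph's perfect matchings, each weighted by the product of its edge weights. The edge values below are invented for illustration.

```python
# Minimal sketch of the graph representation: enumerate the perfect
# matchings of a small weighted graph and compute each matching's
# amplitude as the product of its edge weights. Two or more matchings
# correspond to a superposition, i.e. an entangled multiphoton term.

edges = {(0, 1): 1.0, (2, 3): 1.0, (0, 2): 0.5, (1, 3): -0.5}

def perfect_matchings(vertices, available):
    """Yield all perfect matchings of `vertices` as frozensets of edges."""
    if not vertices:
        yield frozenset()
        return
    v = min(vertices)
    for u in vertices - {v}:
        e = (min(v, u), max(v, u))
        if e in available:
            for m in perfect_matchings(vertices - {v, u}, available):
                yield m | {e}

for m in perfect_matchings({0, 1, 2, 3}, edges):
    amp = 1.0
    for e in m:
        amp *= edges[e]
    print(sorted(m), "amplitude:", amp)
```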
A survey of FPGA-based accelerators for convolutional neural networks
2020
Deep convolutional neural networks (CNNs) have recently shown very high accuracy in a wide range of cognitive tasks and, as a result, have received significant interest from researchers. Given the high computational demands of CNNs, custom hardware accelerators are vital for boosting their performance. The high energy efficiency, computing capability and reconfigurability of FPGAs make them a promising platform for hardware acceleration of CNNs. In this paper, we present a survey of techniques for implementing and optimizing CNN algorithms on FPGAs. We organize the works into several categories to bring out their similarities and differences. This paper is expected to be useful for researchers in the areas of artificial intelligence, hardware architecture and system design.
Journal Article
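A recurring optimization family in such surveys is loop tiling and unrolling of the convolution loop nest. The Python model below mirrors that structure: the inner tile computation stands in for the processing-element array an FPGA would unroll, while the outer loops stream tiles through on-chip buffers. Tile and layer sizes are illustrative, not drawn from any surveyed design.

```python
import numpy as np

# Software model of a loop-tiled convolution, checked against a direct
# convolution. On an FPGA, the tile-sized inner update would be unrolled
# into parallel multiply-accumulate units.

H = W_ = 8
C_IN, C_OUT, K, T = 4, 8, 3, 4            # channels, kernel size, tile size
x = np.random.rand(C_IN, H + K - 1, W_ + K - 1)
w = np.random.rand(C_OUT, C_IN, K, K)
y = np.zeros((C_OUT, H, W_))

for oc in range(C_OUT):                   # outer loops: tile scheduling
    for h0 in range(0, H, T):
        for w0 in range(0, W_, T):
            for ic in range(C_IN):        # buffered input-channel loop
                for dh in range(K):
                    for dw in range(K):
                        # inner tile update: the part hardware unrolls
                        y[oc, h0:h0+T, w0:w0+T] += (
                            w[oc, ic, dh, dw]
                            * x[ic, h0+dh:h0+dh+T, w0+dw:w0+dw+T]
                        )

ref = np.zeros_like(y)                    # direct convolution as a check
for oc in range(C_OUT):
    for ic in range(C_IN):
        for dh in range(K):
            for dw in range(K):
                ref[oc] += w[oc, ic, dh, dw] * x[ic, dh:dh+H, dw:dw+W_]
print("matches direct convolution:", np.allclose(y, ref))
```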