Catalogue Search | MBRL

Accurate deep neural network inference using computational phase-change memory

by Boybat, Irem , Eleftheriou, Evangelos , Joshi, Vinay in 639/705/117 , 639/925/927/1007 , Accuracy

2020

In-memory computing using resistive memory devices is a promising non-von Neumann approach for making energy-efficient deep learning inference hardware. However, due to device variability and noise, the network needs to be trained in a specific way so that transferring the digitally trained weights to the analog resistive memory devices will not result in significant loss of accuracy. Here, we introduce a methodology to train ResNet-type convolutional neural networks that results in no appreciable accuracy loss when transferring weights to phase-change memory (PCM) devices. We also propose a compensation technique that exploits the batch normalization parameters to improve the accuracy retention over time. We achieve a classification accuracy of 93.7% on CIFAR-10 and a top-1 accuracy of 71.6% on ImageNet benchmarks after mapping the trained weights to PCM. Our hardware results on CIFAR-10 with ResNet-32 demonstrate an accuracy above 93.5% retained over a one-day period, where each of the 361,722 synaptic weights is programmed on just two PCM devices organized in a differential configuration. Designing deep learning inference hardware based on in-memory computing remains a challenge. Here, the authors propose a strategy to train ResNet-type convolutional neural networks which results in reduced accuracy loss when transferring weights to in-memory computing hardware based on phase-change memory.

Journal Article

Neuromorphic computing with multi-memristive synapses

by Boybat, Irem , Eleftheriou, Evangelos , Moraitis, Timoleon in 631/378/116/1925 , 639/705/1042 , 639/925/927/1007

2018

Neuromorphic computing has emerged as a promising avenue towards building the next generation of intelligent computing systems. It has been proposed that memristive devices, which exhibit history-dependent conductivity modulation, could efficiently represent the synaptic weights in artificial neural networks. However, precise modulation of the device conductance over a wide dynamic range, necessary to maintain high network accuracy, is proving to be challenging. To address this, we present a multi-memristive synaptic architecture with an efficient global counter-based arbitration scheme. We focus on phase change memory devices, develop a comprehensive model and demonstrate via simulations the effectiveness of the concept for both spiking and non-spiking neural networks. Moreover, we present experimental results involving over a million phase change memory devices for unsupervised learning of temporal correlations using a spiking neural network. The work presents a significant step towards the realization of large-scale and energy-efficient neuromorphic computing systems. Memristive technology is a promising avenue towards realizing efficient non-von Neumann neuromorphic hardware. Boybat et al. propose a multi-memristive synaptic architecture with a counter-based global arbitration scheme to address challenges associated with the non-ideal memristive device behavior.

Journal Article

Experimental Demonstration of Supervised Learning in Spiking Neural Networks with Phase-Change Memory Synapses

by Nandakumar, S. R. , Le Gallo, Manuel , Boybat, Irem in 639/166/987 , 639/925/927 , Action Potentials

2020

Spiking neural networks (SNNs) are computational models inspired by the brain’s ability to naturally encode and process information in the time domain. The added temporal dimension is believed to render them more computationally efficient than the conventional artificial neural networks, though their full computational capabilities are yet to be explored. Recently, in-memory computing architectures based on non-volatile memory crossbar arrays have shown great promise to implement parallel computations in artificial and spiking neural networks. In this work, we evaluate the feasibility of realizing high-performance event-driven in-situ supervised learning systems using nanoscale and stochastic analog memory synapses. For the first time, the potential of analog memory synapses to generate precisely timed spikes in SNNs is experimentally demonstrated. The experiment targets applications that directly integrate spike-encoded signals generated from bio-mimetic sensors with in-memory-computing-based learning systems to generate precisely timed control signal spikes for neuromorphic actuators. More than 170,000 phase-change memory (PCM) based synapses from our prototype chip were trained based on an event-driven learning rule, to generate spike patterns with more than 85% of the spikes within a 25 ms tolerance interval in a 1250 ms long spike pattern. We observe that the accuracy is mainly limited by the imprecision related to device programming and temporal drift of conductance values. We show that an array level scaling scheme can significantly improve the retention of the trained SNN states in the presence of conductance drift in the PCM. Combining the computational potential of supervised SNNs with the parallel compute power of in-memory computing, this work paves the way for the next generation of efficient brain-inspired systems.

Journal Article

Mixed-Precision Deep Learning Based on Computational Memory

by Mariani, Giovanni , Piveteau, Christophe , Le Gallo, Manuel in Arrays , Artificial intelligence , Cognitive ability

2020

Deep neural networks (DNNs) have revolutionized the field of artificial intelligence and have achieved unprecedented success in cognitive tasks such as image and speech recognition. Training of large DNNs, however, is computationally intensive and this has motivated the search for novel computing architectures targeting this application. A computational memory unit with nanoscale resistive memory devices organized in crossbar arrays could store the synaptic weights in their conductance states and perform the expensive weighted summations in place in a non-von Neumann manner. However, updating the conductance states in a reliable manner during the weight update process is a fundamental challenge that limits the training accuracy of such an implementation. Here, we propose a mixed-precision architecture that combines a computational memory unit performing the weighted summations and imprecise conductance updates with a digital processing unit that accumulates the weight updates in high precision. A combined hardware/software training experiment of a multilayer perceptron based on the proposed architecture using a phase-change memory (PCM) array achieves 97.73% test accuracy on the task of classifying handwritten digits (based on the MNIST dataset), within 0.6% of the software baseline. The architecture is further evaluated using accurate behavioral models of PCM on a wide class of networks, namely convolutional neural networks, long short-term memory networks, and generative adversarial networks. Accuracies comparable to those of floating-point implementations are achieved without being constrained by the non-idealities associated with the PCM devices. A system-level study demonstrates a 172× improvement in energy efficiency of the architecture when used for training a multilayer perceptron compared with a dedicated fully digital 32-bit implementation.

Journal Article

Neuromorphic computing using non-volatile memory

by Kim, Sangbum , Shelby, Robert M. , Kim, Seyoung in Algorithms , Arrays , Artificial neural networks

2017

Dense crossbar arrays of non-volatile memory (NVM) devices represent one possible path for implementing massively-parallel and highly energy-efficient neuromorphic computing systems. We first review recent advances in the application of NVM devices to three computing paradigms: spiking neural networks (SNNs), deep neural networks (DNNs), and 'Memcomputing'. In SNNs, NVM synaptic connections are updated by a local learning rule such as spike-timing-dependent-plasticity, a computational approach directly inspired by biology. For DNNs, NVM arrays can represent matrices of synaptic weights, implementing the matrix-vector multiplication needed for algorithms such as backpropagation in an analog yet massively-parallel fashion. This approach could provide significant improvements in power and speed compared to GPU-based DNN training, for applications of commercial significance. We then survey recent research in which different types of NVM devices - including phase change memory, conductive-bridging RAM, filamentary and non-filamentary RRAM, and other NVMs - have been proposed, either as a synapse or as a neuron, for use within a neuromorphic computing application. The relevant virtues and limitations of these devices are assessed, in terms of properties such as conductance dynamic range, (non)linearity and (a)symmetry of conductance response, retention, endurance, required switching power, and device variability.

Journal Article

Equivalent-accuracy accelerated neural-network training using analogue memory

by Shelby, Robert M. , Narayanan, Pritish , Burr, Geoffrey W. in 639/166/987 , 639/705/258 , 639/766/119/995

2018

Neural-network training can be slow and energy intensive, owing to the need to transfer the weight data for the network between conventional digital memory chips and processor chips. Analogue non-volatile memory can accelerate the neural-network training algorithm known as backpropagation by performing parallelized multiply–accumulate operations in the analogue domain at the location of the weight data. However, the classification accuracies of such in situ training using non-volatile-memory hardware have generally been less than those of software-based training, owing to insufficient dynamic range and excessive weight-update asymmetry. Here we demonstrate mixed hardware–software neural-network implementations that involve up to 204,900 synapses and that combine long-term storage in phase-change memory, near-linear updates of volatile capacitors and weight-data transfer with ‘polarity inversion’ to cancel out inherent device-to-device variations. We achieve generalization accuracies (on previously unseen data) equivalent to those of software-based training on various commonly used machine-learning test datasets (MNIST, MNIST-backrand, CIFAR-10 and CIFAR-100). The computational energy efficiency of 28,065 billion operations per second per watt and throughput per area of 3.6 trillion operations per second per square millimetre that we calculate for our implementation exceed those of today’s graphical processing units by two orders of magnitude. This work provides a path towards hardware accelerators that are both fast and energy efficient, particularly on fully connected neural-network layers. Analogue-memory-based neural-network training using non-volatile-memory hardware augmented by circuit simulations achieves the same accuracy as software-based training but with much improved energy efficiency and speed.

Journal Article

Supernetwork-based efficient mapping of deep learning applications to mixed-precision hardware using model adaptation

by Lammie, Corey , Boybat, Irem , Burr, Geoffrey W.

2026

The rapid proliferation of Artificial Intelligence applications necessitates scalable solutions that perform efficiently under real-world constraints. Heterogeneous accelerators combining specialized analog and digital units offer localized, energy-efficient neural network computations. However, achieving optimal performance on these platforms requires balancing energy efficiency and model accuracy through optimized neural network layer mapping. To this end, we introduce Mixed-Precision Supernetwork, a unified framework for training mixed-precision supernetworks that seamlessly integrate quantized layers with analog noise-sensitive layers. Mixed-Precision Supernetwork incorporates a mapping-aware adaptation strategy, dynamically optimizing layer assignments while refining the neural network via hardware-aware architecture search. This dual innovation establishes Mixed-Precision Supernetwork as a groundbreaking approach for deploying deep learning models efficiently on heterogeneous accelerators. On average, Mixed-Precision Supernetwork produces mappings ~2.2× faster and achieves a ~3.4% increase in model accuracy over a fully analog approach, and it improves energy efficiency by mapping up to 80% of the model's weights to analog hardware while maintaining full-precision accuracy.

Journal Article

Efficient scaling of large language models with mixture of experts and 3D analog in-memory computing

by Boybat, Irem , Burr, Geoffrey W. , Filipiak, Bill in 639/705 , 639/925/927 , Computation

2025

Large language models (LLMs), with their remarkable generative capacities, have greatly impacted a range of fields, but they face scalability challenges due to their large parameter counts, which result in high costs for training and inference. The trend of increasing model sizes is exacerbating these challenges, particularly in terms of memory footprint, latency and energy consumption. Here we explore the deployment of ‘mixture of experts’ (MoEs) networks—networks that use conditional computing to keep computational demands low despite having many parameters—on three-dimensional (3D) non-volatile memory (NVM)-based analog in-memory computing (AIMC) hardware. When combined with the MoE architecture, this hardware, utilizing stacked NVM devices arranged in a crossbar array, offers a solution to the parameter-fetching bottleneck typical in traditional models deployed on conventional von-Neumann-based architectures. By simulating the deployment of MoEs on an abstract 3D AIMC system, we demonstrate that, due to their conditional compute mechanism, MoEs are inherently better suited to this hardware than conventional, dense model architectures. Our findings suggest that MoEs, in conjunction with emerging 3D NVM-based AIMC, can substantially reduce the inference costs of state-of-the-art LLMs, making them more accessible and energy-efficient. This study shows a viable pathway to the efficient deployment of state-of-the-art large language models using mixture of experts on 3D analog in-memory computing hardware.

Journal Article

CiMBA: Accelerating Genome Sequencing through On-Device Basecalling via Compute-in-Memory

by Boybat, Irem , Jain, Shubham , Ferro, Elena in Co-design , Data communication , Gene sequencing

2025

As genome sequencing is finding utility in a wide variety of domains beyond the confines of traditional medical settings, its computational pipeline faces two significant challenges. First, the creation of up to 0.5 GB of data per minute imposes substantial communication and storage overheads. Second, the sequencing pipeline is bottlenecked at the basecalling step, consuming >40% of genome analysis time. A range of proposals have attempted to address these challenges, with limited success. We propose to address these challenges with a Compute-in-Memory Basecalling Accelerator (CiMBA), the first embedded (~25 mm²) accelerator capable of real-time, on-device basecalling, coupled with AnaLog (AL)-Dorado, a new family of analog focused basecalling DNNs. Our resulting hardware/software co-design greatly reduces data communication overhead, is capable of a throughput of 4.77 million bases per second, 24x that required for real-time operation, and achieves 17x/27x power/area efficiency over the best prior basecalling embedded accelerator while maintaining a high accuracy comparable to state-of-the-art software basecallers.

Paper

LionHeart: A Layer-based Mapping Framework for Heterogeneous Systems with Analog In-Memory Computing Tiles

by Lammie, Corey , Boybat, Irem , Zapater, Marina in Accuracy , Artificial neural networks , Deep learning

2025

When arranged in a crossbar configuration, resistive memory devices can be used to execute Matrix-Vector Multiplications (MVMs), the most dominant operation of many Machine Learning (ML) algorithms, in constant time complexity. Nonetheless, when performing computations in the analog domain, novel challenges are introduced in terms of arithmetic precision and stochasticity, due to non-ideal circuit and device behaviour. Moreover, these non-idealities have a temporal dimension, resulting in a degrading application accuracy over time. Facing these challenges, we propose a novel framework, named LionHeart, to obtain hybrid analog-digital mappings to execute Deep Learning (DL) inference workloads using heterogeneous accelerators. The accuracy-constrained mappings derived by LionHeart showcase, across different Convolutional Neural Networks (CNNs) and one transformer-based network, high accuracy and potential for speedup. The results of the full system simulations highlight run-time reductions and energy efficiency gains that exceed 6X, with a user-defined accuracy threshold for a fully digital floating point implementation. LionHeart is open-sourced here: https://github.com/IBM/lionheart.

Paper
