26 result(s) for "Graves, Catherine E."
Analog content-addressable memories with memristors
A content-addressable memory compares an input search word against all rows of stored words in an array in a highly parallel manner. While supplying a very powerful functionality for many applications in pattern matching and search, it suffers from large area, cost and power consumption, limiting its use. Past improvements have been realized by using memristors to replace the static random-access memory cell in conventional designs, but employ similar schemes based only on binary or ternary states for storage and search. We propose a new analog content-addressable memory concept and circuit to overcome these limitations by utilizing the analog conductance tunability of memristors. Our analog content-addressable memory stores data within the programmable conductance and can take as input either analog or digital search values. Experimental demonstrations, scaled simulations and analysis show that our analog content-addressable memory can reduce area and power consumption, which not only enables the acceleration of existing applications but also opens new computing application areas. Designing low power and high performance content-addressable memory remains a challenge. Here, the authors demonstrate a content-addressable memory concept and circuit which leverages the analog conductance tunability of memristors, reduces power consumption, and enables new functionalities and applications.
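To make the analog-CAM search idea above concrete, here is a minimal behavioral sketch (not the authors' circuit; the cell ranges, query values and function names are illustrative): each stored word is a row of acceptance ranges, encoded in the paper by memristor conductances, and a row matches when every element of an analog query falls inside its corresponding range.

```python
# Minimal behavioral sketch of analog content-addressable memory (CAM) search.
# Each stored "word" is a row of acceptance ranges (lo, hi); in hardware these
# bounds would be encoded by memristor conductances. A row matches when every
# element of the analog input lies inside the corresponding cell's range.
# Illustrative model only, not the authors' circuit.

from typing import List, Tuple

Range = Tuple[float, float]          # (low, high) acceptance bounds per cell

def acam_search(rows: List[List[Range]], query: List[float]) -> List[int]:
    """Return indices of rows whose every cell range contains the query value."""
    matches = []
    for i, row in enumerate(rows):
        if all(lo <= q <= hi for (lo, hi), q in zip(row, query)):
            matches.append(i)
    return matches

# Example: three stored words searched with a 2-element analog query.
rows = [
    [(0.0, 0.3), (0.5, 1.0)],        # matches queries like [0.2, 0.7]
    [(0.4, 0.9), (0.0, 0.2)],
    [(0.0, 1.0), (0.0, 1.0)],        # wildcard row ("don't care" on both cells)
]
print(acam_search(rows, [0.2, 0.7]))  # -> [0, 2]
```

Note that a "don't care" cell is simply a range spanning the full input swing, which is how ternary wildcards carry over to the analog case.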
Tree-based machine learning performed in-memory with memristive analog CAM
Tree-based machine learning techniques, such as Decision Trees and Random Forests, are top performers in several domains as they do well with limited training datasets and offer improved interpretability compared to Deep Neural Networks (DNN). However, these models are difficult to optimize for fast inference at scale without accuracy loss in von Neumann architectures due to non-uniform memory access patterns. Recently, we proposed a novel analog content addressable memory (CAM) based on emerging memristor devices for fast look-up table operations. Here, we propose for the first time to use the analog CAM as an in-memory computational primitive to accelerate tree-based model inference. We demonstrate an efficient mapping algorithm leveraging the new analog CAM capabilities such that each root-to-leaf path of a Decision Tree is programmed into a row. This new in-memory compute concept enables few-cycle model inference, dramatically increasing throughput by 10³× over conventional approaches. Tree-based machine learning algorithms are known to be explainable and effective even when trained on limited datasets, yet they are difficult to optimize on conventional digital hardware. The authors apply analog content addressable memory to accelerate tree-based model inference for improved performance.
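The mapping described above, with each root-to-leaf path programmed into one CAM row, can be sketched in a few lines of Python. The tiny tree, thresholds and helper names below are made up for illustration and stand in for the authors' mapping algorithm rather than reproducing it:

```python
# Toy sketch of mapping a decision tree to analog-CAM rows: each root-to-leaf
# path collapses into per-feature intervals, and inference reduces to a single
# parallel range match instead of sequential branch traversal.
# Illustrative only; the tiny tree, thresholds and features are made up.

import math

# Decision tree as nested tuples: (feature_index, threshold, left, right) or a leaf label.
tree = (0, 0.5,
        (1, 0.3, "A", "B"),          # branch taken when x0 <= 0.5
        "C")                         # branch taken when x0 >  0.5

def paths_to_rows(node, bounds=None, n_features=2):
    """Flatten every root-to-leaf path into ([(lo, hi)] * n_features, label)."""
    if bounds is None:
        bounds = [(-math.inf, math.inf)] * n_features
    if isinstance(node, str):                        # leaf reached
        return [(list(bounds), node)]
    f, t, left, right = node
    lo, hi = bounds[f]
    left_b, right_b = list(bounds), list(bounds)
    left_b[f], right_b[f] = (lo, min(hi, t)), (max(lo, t), hi)
    return paths_to_rows(left, left_b, n_features) + paths_to_rows(right, right_b, n_features)

def cam_infer(rows, x):
    """Single parallel lookup: return the label of the row whose ranges all contain x."""
    for bounds, label in rows:
        if all(lo < xi <= hi for (lo, hi), xi in zip(bounds, x)):
            return label
    return None

rows = paths_to_rows(tree)
print(cam_infer(rows, [0.4, 0.2]))   # -> "A"
print(cam_infer(rows, [0.9, 0.9]))   # -> "C"
```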
Experimentally validated memristive memory augmented neural network with efficient hashing and similarity search
Lifelong on-device learning is a key challenge for machine intelligence, and this requires learning from few, often single, samples. Memory-augmented neural networks have been proposed to achieve this goal, but the memory module must be stored in off-chip memory, heavily limiting their practical use. In this work, we experimentally validated that all the different structures in the memory-augmented neural network can be implemented in a fully integrated memristive crossbar platform with an accuracy that closely matches digital hardware. The successful demonstration is supported by implementing new functions in crossbars, including the crossbar-based content-addressable memory and locality-sensitive hashing exploiting the intrinsic stochasticity of memristor devices. Simulations show that such an implementation can be efficiently scaled up for one-shot learning on more complex tasks. The successful demonstration paves the way for practical on-device lifelong learning and opens possibilities for novel attention-based algorithms that were not possible in conventional hardware. Memory-augmented neural networks for lifelong on-device learning are bottlenecked by the limited bandwidth of conventional hardware. Here, the authors demonstrate an efficient in-memristor realization with accuracy close to software, supported by hashing and similarity search in crossbars.
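As a rough software stand-in for the hashing and similarity search the paper maps onto crossbars, the sketch below uses conventional random-hyperplane locality-sensitive hashing followed by a Hamming-distance lookup. The random projection matrix here is generated with numpy, whereas the paper derives its randomness from intrinsic memristor stochasticity, and all dimensions and data are arbitrary:

```python
# Software stand-in for locality-sensitive hashing (LSH) + similarity search.
# Random signed projections hash a feature vector to a binary code, and memory
# lookup becomes a Hamming-distance search over stored codes. In the paper the
# random weights come from memristor stochasticity and the search runs in
# crossbars; here everything is plain numpy for illustration.

import numpy as np

rng = np.random.default_rng(0)
d, n_bits, n_memory = 64, 16, 100               # arbitrary sizes

planes = rng.standard_normal((n_bits, d))       # random hyperplanes (crossbar weights)
memory_keys = rng.standard_normal((n_memory, d))

def lsh_code(x):
    """Binary signature: sign of each random projection."""
    return (planes @ x > 0).astype(np.uint8)

memory_codes = np.array([lsh_code(k) for k in memory_keys])

def nearest(query):
    """Return the index of the stored key with the smallest Hamming distance."""
    q = lsh_code(query)
    return int(np.argmin((memory_codes != q).sum(axis=1)))

# A noisy copy of a stored key should hash close to the original.
probe = memory_keys[7] + 0.05 * rng.standard_normal(d)
print(nearest(probe))                           # -> 7 (with high probability)
```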
Multiplexing in photonics as a resource for optical ternary content-addressable memory functionality
In this paper, we combine a Content-Addressable Memory (CAM) encoding scheme previously proposed for analog electronic CAMs (E-CAMs) with optical multiplexing techniques to create two new photonic CAM architectures: wavelength-division multiplexing (WDM) optical ternary CAM (O-TCAM) and time-division multiplexing (TDM) O-TCAM. As an example, we show how these two O-TCAM schemes can be implemented with minor modifications to microring-based silicon photonic (SiPh) circuits originally optimized for exascale interconnects. Here, our SiPh O-TCAM designs include not only the actual search engine but also the transmitter circuits. For the first time, we experimentally demonstrate O-TCAM functionality in SiPh, and we prove in simulation feasibility for speeds up to 10 Gbps, 10 times faster than typical E-TCAMs, at the expense of higher energy consumption per symbol in our O-TCAM search-engine circuits than in the corresponding E-TCAMs. Finally, we identify which hardware and architecture modifications are required to improve the O-CAM’s energy efficiency towards the level of E-CAMs.
Analogue signal and image processing with large memristor crossbars
Memristor crossbars offer reconfigurable non-volatile resistance states and could remove the speed and energy efficiency bottleneck in vector-matrix multiplication, a core computing task in signal and image processing. Using such systems to multiply an analogue-voltage-amplitude-vector by an analogue-conductance-matrix at a reasonably large scale has, however, proved challenging due to difficulties in device engineering and array integration. Here we show that reconfigurable memristor crossbars composed of hafnium oxide memristors on top of metal-oxide-semiconductor transistors are capable of analogue vector-matrix multiplication with array sizes of up to 128 × 64 cells. Our output precision (5–8 bits, depending on the array size) is the result of high device yield (99.8%) and the multilevel, stable states of the memristors, while the linear device current–voltage characteristics and low wire resistance between cells lead to high accuracy. With the large memristor crossbars, we demonstrate signal processing, image compression and convolutional filtering, which are expected to be important applications in the development of the Internet of Things (IoT) and edge computing. Memristor crossbars with array sizes of up to 128 × 64 cells are capable of analogue vector-matrix multiplication and can be used for signal processing, image compression and convolutional filtering.
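The analogue vector-matrix multiplication at the heart of this work follows directly from Ohm's law and Kirchhoff's current law: driving the rows with a voltage vector v produces column currents i = Gᵀv for a conductance matrix G. The toy model below illustrates the operation and the effect of imperfect conductance programming; the 1% write-noise figure is an assumption for illustration, not a number from the paper.

```python
# Idealized model of analogue vector-matrix multiplication on a memristor
# crossbar: input voltages drive the rows, each cell contributes I = G * V
# (Ohm's law), and column currents sum by Kirchhoff's current law, so the
# output current vector is G.T @ v. The noise level is an assumed figure.

import numpy as np

rng = np.random.default_rng(1)
rows, cols = 128, 64                       # array size reported in the paper

g_target = rng.uniform(10e-6, 100e-6, size=(rows, cols))                  # conductances (S)
g_programmed = g_target * (1 + 0.01 * rng.standard_normal((rows, cols)))  # assumed 1% write noise
v_in = rng.uniform(0.0, 0.2, size=rows)    # input voltage vector (V)

i_ideal = g_target.T @ v_in                # exact vector-matrix product (A)
i_meas = g_programmed.T @ v_in             # what the imperfectly programmed array would output

rel_err = np.abs(i_meas - i_ideal) / np.abs(i_ideal)
print(f"median relative error: {np.median(rel_err):.4f}")
```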
Solving Boolean satisfiability problems with resistive content addressable memories
Solving optimization problems is a highly demanding workload requiring high-performance computing systems. Optimization solvers are usually difficult to parallelize in conventional digital architectures, particularly when stochastic decisions are involved. Recently, analog computing architectures for accelerating stochastic optimization solvers have been presented, but they were limited to academic problems in quadratic polynomial format. Here we present KLIMA, a k-Local In-Memory Accelerator with resistive Content Addressable Memories (CAMs) and Dot-Product Engines (DPEs) to accelerate the solution of high-order, industry-relevant optimization problems, in particular Boolean Satisfiability. By co-designing the optimization heuristics and circuit architecture we improve the speed and energy to solution by up to 182× compared with the digital state of the art.
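One generic way a ternary CAM can flag violated SAT clauses in parallel, which is the kind of operation a CAM-based SAT accelerator exploits, is sketched below; this is an illustrative TCAM-style model, not necessarily the KLIMA mapping or its heuristics:

```python
# Illustrative sketch: each CAM row stores, per variable, the single value that
# falsifies the clause's literal (0/1) or a wildcard if the variable is absent;
# a row that fully matches the current assignment marks a violated clause.
# Generic TCAM-style view only, not necessarily the KLIMA mapping.

WILDCARD = None

def clause_to_row(clause, n_vars):
    """clause: iterable of signed ints, e.g. (1, -3) meaning (x1 OR NOT x3)."""
    row = [WILDCARD] * n_vars
    for lit in clause:
        row[abs(lit) - 1] = 0 if lit > 0 else 1   # value that falsifies the literal
    return row

def violated_clauses(rows, assignment):
    """Parallel-match model: return indices of rows matching the assignment."""
    return [i for i, row in enumerate(rows)
            if all(cell is WILDCARD or cell == bit for cell, bit in zip(row, assignment))]

clauses = [(1, 2), (-1, 3), (-2, -3)]             # (x1 or x2), (!x1 or x3), (!x2 or !x3)
rows = [clause_to_row(c, n_vars=3) for c in clauses]
print(violated_clauses(rows, [0, 1, 1]))          # -> [2]: x2 and x3 both true
print(violated_clauses(rows, [1, 0, 1]))          # -> []: satisfying assignment
```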
Tree-based machine learning performed in-memory with memristive analog CAM
Tree-based machine learning techniques, such as Decision Trees and Random Forests, are top performers in several domains as they do well with limited training datasets and offer improved interpretability compared to Deep Neural Networks (DNN). However, while easier to train, they are difficult to optimize for fast inference without accuracy loss in von Neumann architectures due to non-uniform memory access patterns. Recently, we proposed a novel analog, or multi-bit, content addressable memory (CAM) for fast look-up table operations. Here, we propose a design utilizing this as a computational primitive for rapid tree-based inference. Large random forest models are mapped to arrays of analog CAMs coupled to traditional analog random access memory (RAM), and the unique features of the analog CAM enable compression and high performance. An optimized architecture is compared with previously proposed tree-based model accelerators, showing improvements in energy to decision by orders of magnitude for common image classification tasks. The results demonstrate the potential for non-volatile analog CAM hardware in accelerating large tree-based machine learning models.
Analog content addressable memories with memristors
A content-addressable memory compares an input search word against all rows of stored words in an array in a highly parallel manner. While supplying a very powerful functionality for many applications in pattern matching and search, it suffers from large area, cost and power consumption, limiting its use. Past improvements have been realized by using memristors to replace the static random-access memory cell in conventional designs, but employ similar schemes based only on binary or ternary states for storage and search. We propose a new analog content-addressable memory concept and circuit to overcome these limitations by utilizing the analog conductance tunability of memristors. Our analog content-addressable memory stores data within the programmable conductance and can take as input either analog or digital search values. Experimental demonstrations, scaled simulations and analysis show that our analog content-addressable memory can reduce area and power consumption, which not only enables the acceleration of existing applications but also opens new computing application areas.
X-TIME: An in-memory engine for accelerating machine learning on tabular data with CAMs
Structured, or tabular, data is the most common format in data science. While deep learning models have proven formidable in learning from unstructured data such as images or speech, they are less accurate than simpler approaches when learning from tabular data. In contrast, modern tree-based Machine Learning (ML) models shine in extracting relevant information from structured data. An essential requirement in data science is to reduce model inference latency in cases where, for example, models are used in a closed loop with simulation to accelerate scientific discovery. However, the hardware acceleration community has mostly focused on deep neural networks and largely ignored other forms of machine learning. Previous work has described the use of an analog content addressable memory (CAM) component for efficiently mapping random forests. In this work, we focus on an overall analog-digital architecture implementing a novel increased precision analog CAM and a programmable network on chip allowing the inference of state-of-the-art tree-based ML models, such as XGBoost and CatBoost. Results evaluated in a single chip at 16nm technology show 119x lower latency at 9740x higher throughput compared with a state-of-the-art GPU, with a 19W peak power consumption.
Experimentally realized memristive memory augmented neural network
Lifelong on-device learning is a key challenge for machine intelligence, and this requires learning from few, often single, samples. Memory-augmented neural networks have been proposed to achieve this goal, but the memory module has to be stored in off-chip memory due to its size, which has heavily limited their practical use. Previous works on emerging-memory-based implementations have had difficulty scaling up because different modules with various structures are difficult to integrate on the same chip and because the small sense margin of the content-addressable memory used for the memory module heavily limits the degree-of-mismatch calculation. In this work, we implement the entire memory-augmented neural network architecture in a fully integrated memristive crossbar platform and achieve an accuracy that closely matches standard software on digital hardware for the Omniglot dataset. The successful demonstration is supported by implementing new functions in crossbars in addition to widely reported matrix multiplications. For example, the locality-sensitive hashing operation is implemented in crossbar arrays by exploiting the intrinsic stochasticity of memristor devices. In addition, the content-addressable memory module is realized in crossbars and also supports computing the degree of mismatch. Simulations based on experimentally validated models show that such an implementation can be efficiently scaled up for one-shot learning on the Mini-ImageNet dataset. The successful demonstration paves the way for practical on-device lifelong learning and opens possibilities for novel attention-based algorithms not possible in conventional hardware.