Catalogue Search | MBRL

ReCIM: A SRAM‐Based Digital–Analogue Hybrid CIM Reformer Accelerator Macro

by Liu, Yu; Li, Hao; Wu, Xiulong in CMOS integrated circuits, hybrid integrated circuits, integrated circuit design

2025

Reformer reduces redundant self‐attention computations via hash bucketing. In this study, we introduce an SRAM‐based digital‐analogue hybrid reformer computing‐in‐memory (ReCIM) accelerator macro. This macro presents an absolute maximum value addressing circuit which facilitates the hash bucketing process and enables the utilisation of strongly‐correlated (S‐C) vectors for attention mechanism computations, thereby improving computational efficiency and saving memory space. Additionally, we introduce a reusable weight array which is suitable for matrix operations across various processes of self‐attention, minimising unnecessary area overhead and enhancing device reusability. The proposed 4 Kb ReCIM macro was analysed using 28‐nm CMOS technology. Simulation results demonstrate that the macro achieves a frequency of 500 MHz at a supply voltage of 0.9 V. During the hash bucketing process, energy efficiency reaches 9.74 TOPS/W.

Journal Article
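
The hash bucketing step mentioned in this abstract can be illustrated in software. The sketch below follows the angular locality-sensitive hashing used by Reformer, where a vector x is projected by a random matrix R and its bucket is the argmax over the concatenation [xR, −xR]. That is the same as finding the projection with the largest absolute value and reading its sign, which is one plausible reading of what an absolute maximum value addressing circuit computes. The function names, the Gaussian projection and the seed are assumptions made for illustration and are not taken from the ReCIM paper.

```python
import random


def make_rotation(dim, n_half_buckets, seed=0):
    """Random projection matrix R (dim x n_half_buckets) with Gaussian entries.

    Hypothetical helper: the paper does not describe how R is generated.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n_half_buckets)]
            for _ in range(dim)]


def bucket_of(x, rotation):
    """Angular LSH bucket of x, i.e. argmax over [xR, -xR].

    Implemented as: find the projection with the largest absolute value,
    then use its sign to choose between the two halves of the bucket range.
    """
    n_half = len(rotation[0])
    proj = [sum(x[i] * rotation[i][j] for i in range(len(x)))
            for j in range(n_half)]
    j = max(range(n_half), key=lambda k: abs(proj[k]))
    return j if proj[j] >= 0 else j + n_half


def group_by_bucket(vectors, rotation):
    """Map each bucket index to the list of vector indices hashed into it."""
    buckets = {}
    for idx, v in enumerate(vectors):
        buckets.setdefault(bucket_of(v, rotation), []).append(idx)
    return buckets
```

Vectors that share a bucket index are the candidates that attend to each other. Scaling a vector leaves its bucket unchanged, and negating it moves it to the opposite half of the bucket range.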

In‐memory multibit multiplication and accumulation based on an automatic pulse generation circuit

by Wu, Xiulong; Peng, Chunyu; Liu, Yunlong in Accuracy, Analog circuits, digital circuits

2023

Computing‐in‐memory (CIM) is a promising technique for solving the ‘memory wall’ and ‘power consumption wall’ problems. However, calculations in the analog domain are limited in terms of accuracy and sensitivity to process, voltage, and temperature changes. In this study, the authors proposed a CIM multiply‐and‐accumulate (MAC) circuit that could generate pulses spontaneously. The MAC result was reflected by the pulse edge and converted into the final digital output using a dual‐edge counter quantization circuit, thereby improving the accuracy of the MAC operation and reducing the difficulty of quantization. The performance of the proposed CIM circuit was evaluated using a 28‐nm process. It could achieve 4‐bit multiplication without errors, with an energy efficiency of 24.38 to 670.86 TOPS/W.

Journal Article

An energy‐efficient floating‐point compute SRAM with pipelined in‐memory bit‐parallel exponent and bitwise mantissa processing

by Wang, Mingyu; Mai, Yangzhan; Zhang, Chuanghao in Accuracy, Artificial neural networks, Computation

2023

The promise of compute‐in‐memory (CIM) for energy‐efficient deep neural network (DNN) tasks has been demonstrated. However, most previous CIM works focus on low‐precision DNN computing. To enable high‐precision DNN computing, this work presents a novel SRAM‐CIM design that fully supports half‐precision floating‐point (FP16) MAC operations. To maximize the energy efficiency, an efficient in‐memory bit‐parallel approach for conducting exponent operations and a bitwise in‐memory Booth encoder for reducing mantissa multiplication latency are proposed. Moreover, by pipelining exponent and mantissa processing, the hardware utilization is improved and high throughput is achieved. The proposed design is analyzed in 40 nm CMOS technology. The evaluation shows that the SRAM‐CIM achieves a frequency of 714 MHz and a peak energy efficiency of 1.53 TFLOPS/W.

Journal Article
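
The split between exponent and mantissa processing described in this abstract rests on how a floating‐point product decomposes: the exponents are added and the mantissas are multiplied as integers. The sketch below shows that decomposition for normal FP16 values in plain software. It is only an arithmetic illustration under stated assumptions: the function names are hypothetical, subnormals, infinities and NaNs are rejected, the result is not rounded back to FP16, and neither the in‐memory bit‐parallel exponent logic nor the Booth encoder of the paper is modelled.

```python
import struct


def fp16_fields(value):
    """Return (sign, biased_exponent, fraction) of value encoded as FP16."""
    (bits,) = struct.unpack("<H", struct.pack("<e", value))
    return bits >> 15, (bits >> 10) & 0x1F, bits & 0x3FF


def fp16_multiply_exact(a, b):
    """Multiply two normal FP16 numbers via separate exponent and mantissa paths.

    Returns the exact product as a Python float (no rounding back to FP16).
    """
    sa, ea, fa = fp16_fields(a)
    sb, eb, fb = fp16_fields(b)
    if not (0 < ea < 31 and 0 < eb < 31):
        raise ValueError("this sketch handles normal FP16 numbers only")
    sign = sa ^ sb
    # Exponent path: remove the bias (15) and add as integers.
    exponent = (ea - 15) + (eb - 15)
    # Mantissa path: restore the hidden leading 1, then an 11 x 11 bit multiply.
    mantissa = (fa | 0x400) * (fb | 0x400)
    # Each mantissa carries 10 fractional bits, so the product carries 20.
    result = mantissa * 2.0 ** (exponent - 20)
    return -result if sign else result
```

For inputs that are exactly representable in FP16, the result matches the ordinary product, for example `fp16_multiply_exact(1.5, -2.25)` gives `-3.375`. Because the two paths share no data until the final scaling, they can in principle be processed in separate pipeline stages.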

An Area‐ and Energy‐Efficient RRAM‐Based 6T1R Non‐Volatile SRAM Cell for Edge Devices

by Gao, Hanghang; Han, Zhongze; An, Junjie in Arrays, Back up systems, CMOS analogue integrated circuits

2025

This work proposes a 6T1R non‐volatile SRAM (nvSRAM) cell based on resistive memory (RRAM) with a small area overhead and low store power compared to previous designs. It features (1) reusing the transistors in the SRAM cell for accessing the RRAM cell, (2) a voltage‐division (VD)‐based restore process with reduced DC current and (3) a trimmable multi‐cycle (TMC) store process to reduce data backup and recovery errors. We fabricated a 1 kb VD‐6T1R nvSRAM test array with back‐end‐of‐line integrated metal oxide RRAM cells in a 180 nm CMOS process. The reuse of transistors allows the VD‐6T1R cell structure to occupy only 1.14× the area of a standard 6T SRAM cell. The store and restore operations were experimentally verified at the array level. The restore error rates of the fabricated test array can be effectively suppressed using TMC store cycles. The restore errors in the fabricated 1 kb cell array can be eliminated after five cycles.

Journal Article

Single bit-line 11T SRAM cell for low power and improved stability

by Pailly, Roy; Lorenzo, Rohit in bit‐line discharge, Cosmic rays, delay improvement

2020

This study presents a new 11T static random access memory (SRAM) cell that uses power-gating transistors and a transmission gate for low leakage and reliable write operation. The proposed cell has separate read and write paths, which improves both read and write ability. Furthermore, it solves the row half-select disturbance and utilises a row-based virtual ground signal to eliminate unnecessary bit-line discharge in the un-selected row, thus decreasing energy consumption. The cell also achieves low power due to the stack effect. To show the effectiveness of the cell, its design metrics are compared with other published SRAM cells, namely, conventional 6T, 10T, 9T, and power-gated 9T (PG9T). In standby mode, a leakage power reduction of 6.71 to 7.37% is observed for this cell at an operating voltage of 1.2 V, along with improvements of 29.21 to 58.68% in write power and 32.74 to 71.11% in read power over the other cells. The proposed cell exhibits higher write and read static noise margins, with improvements of 13.54 and 63.28%, respectively, compared to the conventional 6T SRAM cell. The cell provides a write delay improvement of 29.77 to 49.40% and a read delay improvement of 7 to 12% compared to the 9T, 10T, and PG9T cells.

Journal Article

Design of 10T SRAM cell with improved read performance and expanded write margin

by Sachdeva, Ashish; Tomar, V. K. in Architecture, Circuit design, CMOS memory circuits

2021

The need to improve processor performance drives the demand for reliable, low‐power and fast memories. Several challenges accompany this improvement at lower technology nodes. The impact of process, voltage and temperature variability on different performance parameters turns out to be the most relevant issue in nanometre SRAM design. The authors propose a 10T SRAM circuit that reduces read power dissipation while maintaining fair performance and stability. The impact of process parameter variations on design metrics such as read power, read current and data retention voltage of the proposed cell is presented and compared with previously proposed SRAM cells. The proposed topology offers differential read and single‐ended write operation. The read margin and write margin are enhanced by 8.69% and 16.85%, respectively, in comparison to the standard 6T SRAM cell, even though a single‐ended write operation is performed. Furthermore, the read and write delays of the proposed topology improve by 1.78× and 2.326×, respectively, in comparison with the conventional 6T SRAM cell. In the FF process corner, the proposed topology shows the lowest data retention voltage (DRV) and the minimum variation in DRV with temperature. Of all the considered topologies, the proposed circuit achieves the minimum power delay product during read operation. Further, the standby power and read power of the proposed 10T cell are reduced by 34.65% and 2.03×, respectively, in contrast to the conventional 6T SRAM at a 0.9 V supply voltage. An analysis of the tolerance of read power and read current to process variations is also presented, using a 45 nm generic process design kit technology file and the Cadence Virtuoso tool.

Journal Article

Lower complexity error location detection block of adjacent error correcting decoder for SRAMs

by Maity, Raj Kumar; Samanta, Jagannath; Bhaumik, Jaydeb in adjacent error correcting codes, adjacent error correcting decoder, Application specific integrated circuits

2020

Multiple cell upsets (MCUs) caused by radiation are an important issue related to the reliability of embedded static random access memories (SRAMs). Multiple random and adjacent error correcting codes have been extensively employed for several years to protect stored data in SRAMs against MCUs. A compact and fast error correcting codec is desirable in most of these applications. In this study, simplified expressions for the error location detection (ELD) block of single error correction-double error detection-double adjacent error correction (SEC-DED-DAEC) and single error correction-double error detection-triple adjacent error correction (SEC-DED-TAEC) decoders have been obtained by employing Karnaugh maps. The conventional SEC-DED-DAEC and SEC-DED-TAEC decoders have been designed and implemented on both field-programmable gate array (FPGA) and ASIC platforms using these simplified ELD expressions. On the FPGA platform, the proposed SEC-DED-DAEC and SEC-DED-TAEC decoder designs achieve a 1.37–28.40% improvement in area and up to a 14.74% improvement in delay compared to existing designs, whereas the ASIC-based designs provide a 2.20–26.81% reduction in area and a 0.30–28.96% reduction in delay compared to existing related works. The proposed design can therefore be considered an efficient alternative to traditional adjacent error correcting decoders in resource-constrained applications.

Journal Article

Reliable SRAM using NAND‐NOR Gate in beyond‐CMOS QCA technology

by Raj, Marshal; Gopalakrishnan, Lakshminarayanan; Ko, Seok‐Bum in Cells, Cellular automata, Circuits

2021

The rise in complementary metal‐oxide semiconductor (CMOS) limitations has urged the industry to shift its focus towards beyond‐CMOS technologies to keep pace with Moore’s law. Quantum‐dot cellular automata (QCA) is considered to be a prominent paradigm among the emerging beyond‐CMOS technologies. Since QCA is an emerging technology with no proper layout tools, layout generation from hardware description language (HDL) can be done by implementing circuits using NAND‐NOR logic. In QCA, NAND‐NOR logic is realised by combining a majority gate and an inverter or by using some dedicated structures. The Radius of Effect (RoE) is a critical factor that depends on the permittivity of the material used, and it influences the Coulombic interaction, polarisation and kink energy. Lower Radius of Effect values will have an impact on the performance of the circuit. In this work, a cost‐efficient NAND‐NOR gate using a Single Rotated Cell (SRC) inverter is proposed which can operate with a lower Radius of Effect. Using the proposed gate, a multiplexer, a decoder, and an innovative memory cell are implemented. In order to demonstrate the ability to implement larger circuits using NAND‐NOR logic and the proposed blocks, a 16×16 SRAM is implemented. QCADesigner is used for the simulation and validation of the proposed designs.

Journal Article

Write-variation aware alternatives to replace SRAM buffers with non-volatile buffers in on-chip interconnects

by Kapoor, Hemangee K.; Rani, Khushboo in buffer circuits, Buffers, cache storage

2019

With the advancement in CMOS technology and multiple processors on the chip, communication across these cores is managed by a network-on-chip (NoC). Power and performance of these NoC interconnects have become a significant factor. The authors aim to reduce the leakage power consumption of NoC buffers by the use of non-volatile spin transfer torque random access memory (STT-RAM)-based buffers. STT-RAM technology has the advantages of high density and low leakage but suffers from low endurance. This low endurance has an impact on the lifetime of the router as a whole due to unwanted write-variations governed by virtual channel (VC) allocation policies. Here, various VC allocation policies that help the uniform distribution of the writes across the buffers are proposed. Iso-capacity and iso-area-based alternatives to replace SRAM buffers with STT-RAM buffers are also presented. Pure STT-RAM buffers, however, impact the network latency. To mitigate this, a hybrid variant of the proposed policies which uses alternative VCs made of SRAM technology in the case of heavy network traffic is proposed. Experimental evaluation of full system simulation shows that the proposed policies reduce the write variation by 99% and improve lifetime by 3.2 times and 1093 times, respectively. Also, a 55.5% gain in the energy delay product is obtained.

Journal Article
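
The link between VC allocation policy and write variation in this abstract can be seen in a toy model. The sketch below is not one of the authors' policies: it contrasts a fixed "lowest free VC" choice with a plain round‐robin choice, which is swapped in here only as a familiar example of a policy that evens out writes. It assumes light traffic (every VC is free when a packet arrives) and one buffer write per flit. All names and parameters are hypothetical.

```python
def count_writes(policy, n_vcs=4, n_packets=1000, flits_per_packet=4):
    """Count buffer writes per virtual channel (VC) under a toy allocation policy.

    Assumes light traffic: every VC is free whenever a packet arrives.
    """
    writes = [0] * n_vcs
    next_vc = 0
    for _ in range(n_packets):
        if policy == "lowest_free":
            vc = 0                          # always the first free VC
        elif policy == "round_robin":
            vc = next_vc                    # rotate through the VCs in turn
            next_vc = (next_vc + 1) % n_vcs
        else:
            raise ValueError(f"unknown policy: {policy}")
        writes[vc] += flits_per_packet      # one buffer write per flit
    return writes


def write_spread(writes):
    """Difference between the most and least written VC buffers."""
    return max(writes) - min(writes)
```

With the default parameters, the fixed choice sends all 4000 writes to a single buffer, while round‐robin gives each of the four buffers 1000 writes. Since endurance limits apply per buffer, the most heavily written buffer bounds the lifetime in this model.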

Automatic diagnosis of single fault in interconnect testing of SRAM‐based FPGA

by Nirmalraj, T.; Radhakrishnan, S.; Pandiyan, S.K. in Comparative analysis, Configurations, Digital integrated circuits

2021

Fault detection and diagnosis of a Field‐Programmable Gate Array (FPGA) in a short period is vital, particularly in reducing the dead time of critical applications that are running on FPGAs. Thus, this paper proposes a new technique that is able to uniquely identify any single stuck‐at fault's location along with the type of fault. Also, the presented technique is able to locate any single pair‐wise bridging fault and distinguish between the two types of common faults. The presented technique uses the Walsh Code method to significantly reduce the number of test configurations when compared with previous methods. Extensive testing of the proposed method is carried out on a series of ISCAS’89 benchmark circuits being implemented in different FPGA families. From the simulation results, the maximum number of configurations needed for interconnect fault detection and diagnosis is log2(n(n−1)/2)+3, where n is the number of nets under test. It is noted that the proposed method is able to reduce the total number of test configurations by log2(n+2) when compared with previously published methods available in the literature.

Journal Article
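
To get a feel for the configuration bound quoted in this abstract, it can be evaluated for a few net counts. The sketch below takes the bound to be log2(n(n−1)/2) + 3, that is, the base‐2 logarithm of the number of net pairs plus 3. Rounding the logarithm up to a whole number of configurations is an assumption made here, as the abstract does not state how non‐integer values are handled. The function name is hypothetical.

```python
import math


def max_configurations(n):
    """Evaluate ceil(log2(n(n-1)/2)) + 3 for n nets under test.

    The ceiling is an assumption: a configuration count must be an integer,
    but the abstract does not say how the bound is rounded.
    """
    if n < 2:
        raise ValueError("at least two nets are needed to form a pair")
    pairs = n * (n - 1) // 2        # number of distinct net pairs
    return math.ceil(math.log2(pairs)) + 3
```

Under these assumptions, 16 nets give 120 pairs and a bound of 10 configurations, so the count grows only logarithmically with the number of nets.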
