Catalogue Search | MBRL

Design of synchronous frequency dividers in 5‐nm FinFET CMOS technology

by Kossel, Marcel , Brändli, Matthias , Morf, Thomas in CMOS , Conflicts of interest , counting circuits

2023

A method is presented for the design of high‐speed frequency dividers in which the divided output signals are phase aligned by means of a scheme based on cascaded retiming. The objective of the design method proposed is to break the accumulation of propagation delay occurring in a divider chain that may limit the speed of the phase synchronization. Compared to alternative approaches where the phase synchronization is achieved with additional logical gates applied to the divider outputs, the authors’ approach only uses latches that are identical to those already employed in the divider chain itself without any additional synchronization logic. Hence, a better uniformity and homogeneity of the layout can be achieved, which helps improve the phase balancing. The method proposed to design synchronous dividers has been implemented in 5‐nm FinFET CMOS technology by means of a synchronous 8b‐counter providing the division factors 1/2 through 1/256. Its output phase synchronization has been verified in measurements at 10 GHz. The measured power consumption is 720 μW and the silicon area of the divider implemented is 79 μm². A design method for the synchronization of frequency dividers is proposed that is based on the application of cascaded retiming to break the accumulation of propagation delay occurring in conventional ripple carry counters, thus achieving higher frequencies and uniform phase balancing. The method proposed to design synchronous dividers is implemented in 5‐nm FinFET CMOS technology by means of a synchronous 8b‐counter providing the division factors 1/2 through 1/256 and is verified in measurements at 10 GHz.

Journal Article

A 64-core mixed-signal in-memory compute chip based on phase-change memory for deep neural network inference

by Kersting, Benedikt , Philip, Timothy , Francese, Pier Andrea in 639/166/987 , 639/705/258 , Accuracy

2023

Analogue in-memory computing (AIMC) with resistive memory devices could reduce the latency and energy consumption of deep neural network inference tasks by directly performing computations within memory. However, to achieve end-to-end improvements in latency and energy consumption, AIMC must be combined with on-chip digital operations and on-chip communication. Here we report a multicore AIMC chip designed and fabricated in 14 nm complementary metal–oxide–semiconductor technology with backend-integrated phase-change memory. The fully integrated chip features 64 AIMC cores interconnected via an on-chip communication network. It also implements the digital activation functions and additional processing involved in individual convolutional layers and long short-term memory units. With this approach, we demonstrate near-software-equivalent inference accuracy with ResNet and long short-term memory networks, while implementing all the computations associated with the weight layers and the activation functions on the chip. For 8-bit input/output matrix–vector multiplications, in the four-phase (high-precision) or one-phase (low-precision) operational read mode, the chip can achieve a maximum throughput of 16.1 or 63.1 tera-operations per second at an energy efficiency of 2.48 or 9.76 tera-operations per second per watt, respectively. A multicore analogue in-memory computing chip that is designed and fabricated in 14 nm complementary metal–oxide–semiconductor technology with backend-integrated phase-change memory can be used for deep neural network inference.

Journal Article

Experimental Efficiency Evaluation of Stacked Transistor Half-Bridge Topologies in 14 nm CMOS Technology

by Brunschwiler, Thomas , Krismer, Florian , Martins Bezerra, Pedro André in CMOS , Converters , Efficiency

2021

Different Half-Bridge (HB) converter topologies for an Integrated Voltage Regulator (IVR), which serves as a microprocessor application, were evaluated. The HB circuits were implemented with Stacked Transistors (HBSTs) in a cutting-edge 14 nm CMOS technology node in order to enable the integration on the microprocessor die. Compared to a conventional realization of the HBST, it was found that the Active Neutral-Point Clamped (ANPC) HBST topology with Independent Clamp Switches (ICSs) not only ensured balanced blocking voltages across the series-connected transistors, but also featured a more robust operation and achieved higher efficiencies at high output currents. The IVR achieved a maximum efficiency of 85.3% at an output current of 300 mA and a switching frequency of 50 MHz. At the maximum measured output current of 780 mA, the efficiency was 83.1%. The active part of the IVR (power switches, gate-drivers, and level shifters) realized a high maximum current density of 24.7 A/mm².

Journal Article

A cryogenic SRAM based arbitrary waveform generator in 14 nm for spin qubit control

by Mueller, Peter , Kossel, Marcel , Zota, Cezar in Carrier frequencies , Closed loops , Digital to analog converters

2022

Realization of qubit gate sequences requires coherent microwave control pulses with programmable amplitude, duration, spacing and phase. We propose an SRAM-based arbitrary waveform generator (AWG) for cryogenic control of spin qubits. In this work, we demonstrate the cryogenic operation of a fully programmable radio frequency arbitrary waveform generator in 14 nm FinFET technology. The waveform sequence from a control processor can be stored in an SRAM memory array, which can be programmed in real time. The waveform pattern is converted to microwave pulses by a source-series-terminated digital to analog converter. The chip is operational at 4 K and is capable of generating an arbitrary envelope shape at the desired carrier frequency. Total power consumption of the AWG is 40-140 mW at 4 K, depending upon the baud rate. A wide signal band of 1-17 GHz is measured at 4 K, while multiple qubit control can be achieved using frequency division multiplexing at an average spurious free dynamic range of 40 dB. This work paves the way to optimal qubit control and closed loop feedback control, which is necessary to achieve low latency error mitigation.

Paper

A system design approach toward integrated cryogenic quantum control systems

by Mueller, Peter , Kossel, Marcel , Zota, Cezar in Analog to digital converters , Chains , CMOS

2022

In this paper, we provide a system-level perspective on the design of control electronics for large-scale quantum systems. Quantum computing systems with high-fidelity control and readout, coherent coupling, calibrated gates, and reconfigurable circuits with low error rates are expected to have superior quantum volumes. Cryogenic CMOS plays a crucial role in the realization of scalable quantum computers by minimizing the feature size, lowering cost and power consumption, and implementing low-latency error correction. Our approach toward achieving scalable feedback-based control systems includes the design of memory-based arbitrary waveform generators (AWGs), wide band radio frequency analog to digital converters, an integrated amplifier chain, and state discriminators that can be synchronized with gate sequences. Digitally assisted designs, when implemented in an advanced CMOS node such as 7 nm, can reap the benefits of low power due to scaling. A qubit readout chain demands several amplification stages before the digitizer. We propose the co-integration of our in-house developed InP HEMT LNAs with CMOS LNA stages to achieve the required gain at the digitizer input with minimal area. Our approach using high impedance matching between the HEMT LNA and the cryogenic CMOS receiver can relax the design constraints of an inverter-based CMOS LNA, paving the way toward a fully integrated qubit readout chain. The qubit state discriminator consists of a digital signal processor that computes the qubit state from the digitizer output and a pre-determined threshold. The proposed system realizes feedback-based optimal control for error mitigation and reduction of the required data rate through the serial interface to room temperature electronics.

Paper

A 64-core mixed-signal in-memory compute chip based on phase-change memory for deep neural network inference

by Kersting, Benedikt , Manuel Le Gallo , Philip, Timothy in Artificial neural networks , Chips (memory devices) , CMOS

2022

The need to repeatedly shuttle around synaptic weight values from memory to processing units has been a key source of energy inefficiency associated with hardware implementation of artificial neural networks. Analog in-memory computing (AIMC) with spatially instantiated synaptic weights holds high promise to overcome this challenge, by performing matrix-vector multiplications (MVMs) directly within the network weights stored on a chip to execute an inference workload. However, to achieve end-to-end improvements in latency and energy consumption, AIMC must be combined with on-chip digital operations and communication to move towards configurations in which a full inference workload is realized entirely on-chip. Moreover, it is highly desirable to achieve high MVM and inference accuracy without application-wise re-tuning of the chip. Here, we present a multi-core AIMC chip designed and fabricated in 14-nm complementary metal-oxide-semiconductor (CMOS) technology with backend-integrated phase-change memory (PCM). The fully-integrated chip features 64 256x256 AIMC cores interconnected via an on-chip communication network. It also implements the digital activation functions and processing involved in ResNet convolutional neural networks and long short-term memory (LSTM) networks. We demonstrate near software-equivalent inference accuracy with ResNet and LSTM networks while implementing all the computations associated with the weight layers and the activation functions on-chip. The chip can achieve a maximal throughput of 63.1 TOPS at an energy efficiency of 9.76 TOPS/W for 8-bit input/output matrix-vector multiplications.

Paper

5 Parallel Prism: A topology for pipelined implementations of convolutional neural networks using computational memory

by Eleftheriou, Evangelos , Francese, Pier Andrea , Benini, Luca in Artificial neural networks , Bandwidths , Communication

2019

In-memory computing is an emerging computing paradigm that could enable deep learning inference at significantly higher energy efficiency and reduced latency. The essential idea is to map the synaptic weights corresponding to each layer to one or more computational memory (CM) cores. During inference, these cores perform the associated matrix-vector multiply operations in place with O(1) time complexity, thus obviating the need to move the synaptic weights to an additional processing unit. Moreover, this architecture could enable the execution of these networks in a highly pipelined fashion. However, a key challenge is to design an efficient communication fabric for the CM cores. Here, we present one such communication fabric based on a graph topology that is well suited for the widely successful convolutional neural networks (CNNs). We show that this communication fabric facilitates the pipelined execution of all state-of-the-art CNNs by proving the existence of a homomorphism between one graph representation of these networks and the proposed graph topology. We then present a quantitative comparison with established communication topologies and show that our proposed topology achieves the lowest bandwidth requirements per communication channel. Finally, we present a concrete example of mapping ResNet-32 onto an array of CM cores.

Paper
