Catalogue Search | MBRL
2,217 result(s) for "Instruction sets"
Eight-Bit Vector SoftFloat Extension for the RISC-V Spike Simulator
by Marcelli, Andrea; Mastrandrea, Antonio; Menichelli, Francesco
in Accuracy, Array processors, Artificial intelligence
2025
The recent demand for 8-bit floating-point (FP) formats is driven by their potential to accelerate domain-specific applications with intensive vector computations (e.g., machine learning, graphics, and data compression). This paper presents the design, implementation, and application of the software model of an 8-bit FP vector arithmetic operation set, compliant with the RISC-V vector instruction set architecture. The model has been developed as an extension of the SoftFloat library and integrated into the RISC-V reference instruction-level simulator Spike, providing the first open-source 8-bit SoftFloat extension for an instruction-set simulator. Based on the SoftFloat library templates for standard FP formats, the proposed extension implements the two widely used 8-bit formats E4M3 and E5M2 in both Open Compute Project (OCP) and IEEE 754 variants. In host-time micro-kernels, FP8 delivers +2–4% more elements per second versus FP32 (across vfadd/vfsub/vfmul) and ≈5% lower RSS; E4M3 and E5M2 perform similarly. Enabling FP8 in Spike increases the stripped binary by ~1.8% (mostly .text). The proposed extension was used to fully verify and correct errors in the vector FP unit design for the eProcessor European project, and continues to be used to verify other 8-bit FP unit implementations.
Journal Article
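The two 8-bit formats named in this abstract can be illustrated with a small decoder. This is a minimal sketch, not the SoftFloat extension itself: it assumes the OCP conventions (E4M3 has bias 7, no infinities, and a single NaN mantissa pattern; E5M2 has bias 15 and IEEE-style infinities and NaNs), and the function names are hypothetical. The IEEE 754 style variants mentioned in the abstract are not modelled.

```python
import math


def decode_fp8(byte, exp_bits, man_bits, ocp_e4m3=False):
    """Decode one 8-bit pattern (sign | exponent | mantissa) into a float."""
    bias = (1 << (exp_bits - 1)) - 1
    exp_max = (1 << exp_bits) - 1
    man_max = (1 << man_bits) - 1
    sign = -1.0 if (byte >> 7) & 1 else 1.0
    exp = (byte >> man_bits) & exp_max
    man = byte & man_max

    if ocp_e4m3:
        # Assumed OCP E4M3 rule: only the all-ones pattern is NaN, no infinities.
        if exp == exp_max and man == man_max:
            return math.nan
    elif exp == exp_max:
        # IEEE-style specials, used here for E5M2.
        return sign * math.inf if man == 0 else math.nan

    if exp == 0:
        # Subnormal: no implicit leading one.
        return sign * man * 2.0 ** (1 - bias - man_bits)
    return sign * (1 + man / (1 << man_bits)) * 2.0 ** (exp - bias)


def decode_e4m3(byte):
    return decode_fp8(byte, exp_bits=4, man_bits=3, ocp_e4m3=True)


def decode_e5m2(byte):
    return decode_fp8(byte, exp_bits=5, man_bits=2)
```

Under these assumptions, `decode_e4m3(0x7E)` gives the largest finite E4M3 value, 448.0, and `decode_e5m2(0x7B)` gives the largest finite E5M2 value, 57344.0, which shows the trade-off between the two formats: E4M3 has more precision, E5M2 more range.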
Polaris 23: a high throughput neuromorphic processing element by RISC-V customized instruction extension for spiking neural network (RV-SNN 2.0) and SIMD-style implementation of LIF model with backpropagation STDP
by Wang, Jiulong; Li, Guirun; Wu, Ruopu
in Algorithms, Back propagation, Back propagation networks
2025
With the rapid evolution of neuromorphic computing, particularly in the realm of spiking neural networks, the need for high-performance neuromorphic chips has escalated significantly. These chips must exhibit exceptional data throughput, necessitating both robust computing capabilities and neuronal transmission bandwidth. Addressing this imperative, our research presents a neuromorphic processing unit (NPU) that boasts both high data throughput and a customized spiking neural network instruction set with backpropagation acceleration functionality. The cornerstone of this NPU is the Polaris 23 Processing Element (PE), which leverages a multi-issue super-scalar architecture to enhance instruction parallelism and mitigate the average latency of high-delay instructions. Furthermore, to ensure high-bandwidth neuronal and synaptic state transmission, Polaris 23 incorporates multi-bank caches utilizing SRAM arrays and facilitates efficient data access. Rigorous hardware and software testing has been conducted on Polaris 23. The results are compelling, demonstrating that, when compared to the PE of SpiNNaker 2, a leading neuromorphic chip, Polaris 23 doubles the neuronal transmission throughput, achieving a remarkable 16 GBps/GHz. Additionally, it surpasses SpiNNaker 2 in neuron precision, maintaining the same neuronal computing efficiency. Notably, the MNIST model implemented on the Polaris 23 platform achieves an impressive accuracy of 91%.
Journal Article
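The leaky integrate-and-fire (LIF) model named in this title can be sketched in a few lines. This is an illustrative discrete-time version only: the leak factor, threshold, and reset value are assumed parameters, the function names are hypothetical, and nothing here reflects the RV-SNN 2.0 instructions or the backpropagation STDP rule. The per-neuron loop stands in for what a SIMD implementation would apply to many neurons at once.

```python
def lif_step(potentials, inputs, leak=0.9, threshold=1.0, reset=0.0):
    """Advance every neuron by one time step.

    Each membrane potential decays by `leak`, integrates its input current,
    and emits a spike (then resets) once it reaches `threshold`.
    """
    new_potentials, spikes = [], []
    for v, i_in in zip(potentials, inputs):
        v = leak * v + i_in
        fired = v >= threshold
        spikes.append(1 if fired else 0)
        new_potentials.append(reset if fired else v)
    return new_potentials, spikes


def simulate(currents, steps):
    """Drive each neuron with a constant current; return the spike raster."""
    potentials = [0.0] * len(currents)
    raster = []
    for _ in range(steps):
        potentials, spikes = lif_step(potentials, currents)
        raster.append(spikes)
    return raster
```

With these assumed parameters, a neuron driven by a constant current of 0.3 fires on every fourth step, while a neuron with no input never fires.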
A custom reduced instruction set computer-V based architecture for real-time electrocardiogram feature extraction
The growing demand for energy-efficient and real-time biomedical signal processing in wearable devices has necessitated the development of application-specific and reconfigurable embedded hardware architectures. This paper presents the register transfer level (RTL) design and simulation of a custom reduced instruction set computer-V (RISC-V) based hardware architecture tailored for real-time electrocardiogram (ECG) feature extraction, focusing on R-peak detection and heart rate (HR) calculation. The proposed system combines ECG-specific functional blocks, including a specialized ECG arithmetic logic unit and a finite state machine-based ECG control unit, with a compact 16-bit RISC-V control core. Hardware-optimized algorithms are used to carry out pre-processing activities such as high-pass and low-pass filtering, as well as feature extraction processes including moving average filtering, derivative calculation, and threshold-based peak identification. Designed to reduce memory footprint and control complexity, a custom instruction set architecture supports modular reconfigurability. Functional validation is carried out in Xilinx Vivado by simulating RTL components described in very high speed integrated circuit (VHSIC) hardware description language (VHDL). The present work shows successful simulation of important architectural components, complete system-level integration, and custom ECG data validation. This work provides the basis for an application-specific, reconfigurable, power-efficient hardware solution for embedded health-monitoring devices.
Journal Article
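The feature-extraction steps listed in this abstract (moving average filtering, derivative calculation, threshold-based peak identification, heart rate calculation) can be sketched in software as follows. This is a simplified illustration, not the paper's hardware design: the window width, threshold, refractory period, and all function names are assumptions, and the high-pass and low-pass pre-filters are omitted.

```python
def moving_average(samples, width):
    """Running mean over the last `width` samples (shorter at the start)."""
    out, acc = [], 0.0
    for i, value in enumerate(samples):
        acc += value
        if i >= width:
            acc -= samples[i - width]
        out.append(acc / min(i + 1, width))
    return out


def derivative(samples):
    """First difference; emphasises the steep slopes of the QRS complex."""
    return [0.0] + [samples[i] - samples[i - 1] for i in range(1, len(samples))]


def detect_peaks(samples, threshold, refractory):
    """Local maxima above `threshold`, at least `refractory` samples apart."""
    peaks = []
    for i in range(1, len(samples) - 1):
        is_local_max = samples[i] >= samples[i - 1] and samples[i] > samples[i + 1]
        if samples[i] > threshold and is_local_max:
            if not peaks or i - peaks[-1] >= refractory:
                peaks.append(i)
    return peaks


def estimate_heart_rate(ecg, fs, width=3, threshold=0.05, refractory=20):
    """Return (peak_indices, bpm); bpm is None with fewer than two peaks."""
    smoothed = moving_average(ecg, width)
    energy = [d * d for d in derivative(smoothed)]  # squared slope
    peaks = detect_peaks(energy, threshold, refractory)
    if len(peaks) < 2:
        return peaks, None
    rr = [(b - a) / fs for a, b in zip(peaks, peaks[1:])]  # R-R intervals in s
    return peaks, 60.0 / (sum(rr) / len(rr))
```

For a synthetic signal sampled at 100 Hz with one unit impulse per second, this sketch reports 60 beats per minute. Real ECG data would need the omitted pre-filtering and an adaptive threshold.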
Design, development and testing of a 16-bit reduced instruction set computer architecture based processor
by Shah, Dhaval; Kanzariya, Het; Masharu, Yesha
in Computer architecture, Control algorithms, Design
2023
Low-power embedded systems need efficient processors with customized functionality. A 16-bit processor suits such systems better than a 32-bit processor because of its lower power consumption. In this paper, we propose a design of a 16-bit processor based on reduced instruction set computer (RISC) architecture using a multicycle data path. The design, development, and verification were carried out using Xilinx Vivado, Xilinx Power Estimator, and ModelSim tools. The processor is implemented on a Spartan 7 (XC7S6-2CPGA196C) FPGA board using the Verilog hardware description language (HDL). The designed processor is verified by executing a set of instructions. The proposed RISC processor design utilizes about half of the computing resources of traditional 16-bit processors and hence achieves significantly lower power consumption.
Journal Article
I Drum, Therefore I Am
2013, 2016
Despite their central role in many forms of music-making, drummers have been largely neglected in the scholarly literature on music and education. Drawing on data collected from in-depth interviews and questionnaires, Gareth Dylan Smith explores the identities, practices and learning of teenage and adult kit drummers in and around London. As a London-based drummer and teacher of drummers, Smith uses his own identity as participant-researcher to inform and interpret other drummers' accounts of their experiences. Drummers drum; therefore they are, they do, and they learn - in a rich tapestry of means and contexts.
Cross-layer analysis of clock glitch fault injection while fetching variable-length instructions
by Burghoorn, Gijs; Maistri, Paolo; Deleuze, Christophe
in Behavior, Circuits and Systems, Communications Engineering
2024
With the increasing complexity of embedded systems, the use of variable-length instruction sets has become essential, so that higher code density and better performance can be achieved. Security aspects are closely linked, considering the continuous improvement of attack techniques and equipment. Fault injection is among the most interesting and rising physical attack techniques. However, hardware designers and software developers lack accurate fault models to evaluate the vulnerabilities of their designs or codes in the presence of such attacks. In this article, we provide a proper characterization, at the instruction set architecture (ISA) level, of several faulty behaviors that are experimentally observed when a processor running a variable-length instruction set is targeted. We include the binary encoding of instructions, and show how the obtained behaviors depend on the alignment in memory. Moreover, we give deeper insight into previous results from the literature that were left unexplained. Additionally, we move further down in the system and consider the register-transfer level (RTL) to perform RTL fault simulation. This enables a better understanding of fault propagation, validates the inferred fault models at the ISA level, and reveals the origin of such faults at the microarchitectural level. Finally, applying the given fault models leads us to provide a vulnerability analysis of three different implementations of AES.
Journal Article
Uniform instruction set extensions for multiplications in contemporary and post-quantum cryptography
by Fritzmann, Tim; Pöppelmann, Thomas; Oberhansl, Felix
in Algorithms, Circuits and Systems, Communications Engineering
2024
Hybrid key encapsulation is in the process of becoming the de facto standard for integration of post-quantum cryptography (PQC). Supporting two cryptographic primitives is a challenging task for constrained embedded systems. Both contemporary cryptography based on elliptic curves or RSA and PQC based on lattices require costly multiplications. Recent works have shown how to implement lattice-based cryptography on big-integer coprocessors. We propose a novel hardware design that natively supports the multiplication of polynomials and big integers, integrate it into a RISC-V core, and extend the RISC-V ISA accordingly. We provide an implementation of Saber and X25519 to demonstrate that both lattice- and elliptic-curve-based cryptography benefit from our extension. Our implementation requires only intermediate logic overhead, while significantly outperforming optimized ARM Cortex M4 implementations, other hardware/software codesigns, and designs that rely on contemporary accelerators.
Journal Article
Single-Instruction-Multiple-Data Instruction-Set-Based Heat Ranking Optimization for Massive Network Flow
2023
To cope with the massive scale of traffic and reduce the memory overhead of traffic statistics, the traffic statistics method based on the Sketch algorithm has become a research hotspot. This paper studies the problem of top-k flow statistics based on the Sketch algorithm and proposes a method to estimate the flow heat from massive network traffic using the Sketch algorithm and identify the kth flow with the highest heat by using a bitonic sort algorithm. In view of the performance difficulties of applying multiple hash functions in the implementation of the Sketch algorithm, the Single-Instruction-Multiple-Data (SIMD) instruction set is adopted to improve the performance of the Sketch algorithm so that SIMD instructions can process multiple fragments of data in a single step, implement multiple hash operations at the same time, and compare and sort multiple flow tables at the same time. Thus, the throughput of the execution task is improved. Firstly, the elements of data flow are described and stored in the form of vectors, while the construction, analysis, and operation of data vectors are realized by SIMD instructions. Secondly, the multi-hash operation is simplified into a single vector operation, which reduces the CPU computing resource consumption of the Sketch algorithm. At the same time, the SIMD instruction set is used to optimize the parallel comparison operation of the flow table in a bitonic sort algorithm. Finally, the SIMD instruction set is used to optimize the functions in the Sketch algorithm and top-k sorting algorithm program, and the optimized code is tested and analyzed. The experimental results show that the time consumed by the advanced vector extensions (AVX)-instructions-optimized version is significantly lower than that of the original version.
When the length of KEY is 96 bytes, the instructions consumed by multiple hash functions account for less in the entire Sketch algorithm, and the time consumed by the optimized version of AVX is about 67.2% of that in the original version. As the length of KEY gradually increases to 256 bytes, the time consumed by the optimized version of AVX decreases to 53.8% of the original version. The simulation results show that the AVX optimization algorithm is effective in improving the measurement efficiency of network flow.
Journal Article
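The sketch-based heat estimation described in this abstract can be illustrated with a count-min sketch, one common member of the Sketch family. This is a scalar sketch under stated assumptions: the abstract does not say which Sketch variant the paper uses, the class and function names are hypothetical, salted BLAKE2b stands in for the paper's hash functions, and an ordinary sort replaces the SIMD bitonic sort. The per-row loop in `add` is the multi-hash step that the paper collapses into a single vector operation.

```python
import hashlib


class CountMinSketch:
    """Fixed-size table of counters; estimates never undercount a key."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row, key):
        # One independent hash per row, derived by salting with the row number.
        digest = hashlib.blake2b(
            key, digest_size=8, salt=row.to_bytes(8, "little")
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, key, count=1):
        for row in range(self.depth):
            self.rows[row][self._index(row, key)] += count

    def estimate(self, key):
        # The minimum over rows is the least collision-inflated counter.
        return min(
            self.rows[row][self._index(row, key)] for row in range(self.depth)
        )


def top_k(sketch, candidate_keys, k):
    """Rank candidate flows by estimated heat, hottest first."""
    ranked = sorted(set(candidate_keys), key=lambda f: (-sketch.estimate(f), f))
    return ranked[:k]
```

A caller would feed each packet's flow key (as bytes) to `add` and then query `top_k` over the flows of interest. Memory use is fixed by `width` and `depth`, regardless of how many distinct flows appear.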
Towards Integration of a Dedicated Memory Controller and Its Instruction Set to Improve Performance of Systems Containing Computational SRAM
by Mambu, Kévin; Charles, Henri-Pierre; Dumas, Julie
in compilation, Computer architecture, Computer Science
2022
In-memory computing (IMC) aims to solve the performance gap between CPU and memories introduced by the memory wall. However, it does not address the energy wall problem caused by data transfer over memory hierarchies. This paper proposes the data-locality management unit (DMU) to efficiently transfer data from a DRAM memory to a computational SRAM (C-SRAM) memory allowing IMC operations. The DMU is tightly coupled within the C-SRAM and allows one to align the data structure in order to perform effective in-memory computation. We propose a dedicated instruction set within the DMU to issue data transfers. The performance evaluation of a system integrating C-SRAM within the DMU compared to a reference scalar system architecture shows an increase from ×5.73 to ×11.01 in speed-up and from ×29.49 to ×46.67 in energy reduction, versus a system integrating C-SRAM without any transfer mechanism compared to a reference scalar system architecture.
Journal Article
Indicator-based lightweight steganography on 32-bit RISC architectures for IoT security
by Rengarajan Amirtharajan; Thenmozhi, K; John Bosco Balaguru Rayappan
in Algorithms, Correlation analysis, Cross correlation
2019
Embedded devices with highly constrained resources are emerging in numerous application areas, which include wireless sensor networks, Radio-Frequency IDentification (RFID) tags, and the Internet of Things (IoT). These devices typically need to communicate small payloads in the form of text/image/audio, for which security is essential. Considering the resource limitations of constrained devices, many crypto algorithms and a few stego algorithms have been designed with lightweight properties. The majority of these algorithms have been tested for the lightweight property based only on their algorithmic attributes. However, for IoT applications it is also necessary to confirm such lightweight characteristics by analysing whether the algorithms can reside and run in a constrained environment, based on the device's architectural attributes. This paper aims to contribute by proposing an indicator-based lightweight Least Significant Bit (LSB) steganography algorithm and comparing its algorithmic and device-dependent implementation aspects with similar algorithms on popular 32-bit Reduced Instruction Set Computer (RISC) microcontrollers used in IoT platforms. The proposed variable embedding algorithm achieves a Peak Signal to Noise Ratio (PSNR) of over 46 dB, with Normalised Cross Correlation (NCC) and Structural Similarity Index Measure (SSIM) being 0.9999 and 0.9998 respectively, for an average embedding capacity of 1.5 bits per pixel. In addition to the above-mentioned benchmarking parameter results, the Regular & Singular (RS) group and Sample Pair (SP) steganalysis were also carried out to validate the security level of the proposed algorithm. On analysing the suitability of the proposed algorithm in terms of timing performance and memory requirements by implementing it on different IoT hardware, the microcontroller with a PIC32 core achieves a higher embedding throughput of over 2.7 megabits per second with a smaller memory footprint of less than 2 KB.
Finally, the results obtained from the proposed work outperform the microcontroller implementation of stego algorithms reported in the literature.
Journal Article
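The LSB embedding and the PSNR metric mentioned in this abstract can be illustrated with a basic example. This is plain one-bit-per-pixel LSB substitution on 8-bit grayscale values, not the indicator-based variable embedding the paper proposes; the function names are hypothetical and the peak value of 255 is an assumption about the pixel format.

```python
import math


def embed_lsb(pixels, bits):
    """Overwrite the least significant bit of the first len(bits) pixels."""
    if len(bits) > len(pixels):
        raise ValueError("payload does not fit in the cover image")
    stego = list(pixels)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0xFE) | (bit & 1)
    return stego


def extract_lsb(pixels, n_bits):
    """Read back the first n_bits embedded bits."""
    return [p & 1 for p in pixels[:n_bits]]


def psnr(original, modified, peak=255):
    """Peak signal-to-noise ratio in dB between two equal-length images."""
    mse = sum((a - b) ** 2 for a, b in zip(original, modified)) / len(original)
    return math.inf if mse == 0 else 10 * math.log10(peak * peak / mse)
```

Because each embedded bit changes a pixel by at most one grey level, the mean squared error is at most 1 and the PSNR of this basic scheme cannot fall below about 48.1 dB at one bit per pixel.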