Catalogue Search | MBRL

A natively flexible 32-bit Arm microprocessor

by Williamson, Ken , Biggs, John , Ramsdale, Catherine in 639/166/987 , 639/301/1005/1007 , 639/766/1130/2798

2021

Nearly 50 years ago, Intel created the world’s first commercially produced microprocessor, the 4004 (ref. 1), a modest 4-bit CPU (central processing unit) with 2,300 transistors fabricated using 10 μm process technology in silicon and capable only of simple arithmetic calculations. Since this ground-breaking achievement, there has been continuous technological development with increasing sophistication to the stage where state-of-the-art silicon 64-bit microprocessors now have 30 billion transistors (for example, the AWS Graviton2 (ref. 2) microprocessor, fabricated using 7 nm process technology). The microprocessor is now so embedded within our culture that it has become a meta-invention; that is, it is a tool that allows other inventions to be realized, most recently enabling the big data analysis needed for a COVID-19 vaccine to be developed in record time. Here we report a 32-bit Arm (a reduced instruction set computing (RISC) architecture) microprocessor developed with metal-oxide thin-film transistor technology on a flexible substrate (which we call the PlasticARM). Separate from the mainstream semiconductor industry, flexible electronics operate within a domain that seamlessly integrates with everyday objects through a combination of ultrathin form factor, conformability, extreme low cost and potential for mass-scale production. PlasticARM pioneers the embedding of billions of low-cost, ultrathin microprocessors into everyday objects. Flexible electronic platforms would enable the integration of functional electronic circuitry with many everyday objects; here, a low-cost and fully flexible 32-bit microprocessor is produced.

Journal Article

Modeling and Simulation of Dual-Active-Bridge Based on PI Control

by Zuo, Dongsheng , Zhang, Ye , Ai, Xiaorui in Instruction sets (computers) , Microprocessors , Physics

2022

The dual-active-bridge (DAB) is a DC/DC converter that is commonly used in solid-state transformers (SSTs) and electric vehicles (EVs). To obtain the expected output voltage, the converter needs to be modeled and controlled. First, the working modes in the different time intervals of the switching cycle under single-phase-shift (SPS) modulation are analyzed, and mathematical models of the output voltage, current stress and phase-shifting duty cycle are constructed. Then, a simulation model is built in Simulink, and a PI controller is used for closed-loop voltage control. The accuracy of the mathematical model is verified.
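
The closed-loop idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' Simulink model: it assumes the commonly quoted averaged SPS relation for a lossless DAB, I_out = n·V_in·D·(1 − D) / (2·f_sw·L_k), a resistive load with an output capacitor, and hypothetical component values and PI gains chosen only so that the loop settles.

```python
def dab_average_output_current(v_in, d, n=1.0, f_sw=20e3, l_k=50e-6):
    """Average output current of a lossless DAB under SPS modulation.

    Assumed textbook averaged model; d is the phase-shift ratio in [0, 0.5].
    n, f_sw and l_k are hypothetical turns ratio, switching frequency (Hz)
    and leakage inductance (H).
    """
    return n * v_in * d * (1.0 - d) / (2.0 * f_sw * l_k)


def simulate_dab_pi(v_ref=48.0, v_in=100.0, r_load=10.0, c_out=470e-6,
                    kp=0.005, ki=2.0, dt=1e-5, steps=20000):
    """Forward-Euler simulation of PI voltage control acting on the phase shift."""
    v_out = 0.0
    integral = 0.0
    d = 0.0
    for _ in range(steps):
        error = v_ref - v_out
        candidate = integral + error * dt
        d_unsat = kp * error + ki * candidate
        # Clamp the phase-shift ratio to the monotonic region of the power curve.
        d = min(max(d_unsat, 0.0), 0.5)
        # Simple anti-windup: only keep the integral update when not saturated.
        if d == d_unsat:
            integral = candidate
        i_out = dab_average_output_current(v_in, d)
        # Output capacitor charged by the bridge current, discharged by the load.
        v_out += dt * (i_out - v_out / r_load) / c_out
    return v_out, d


if __name__ == "__main__":
    v_final, d_final = simulate_dab_pi()
    print(f"final output voltage: {v_final:.2f} V, phase-shift ratio: {d_final:.3f}")
```

With these assumed values the integral term drives the output towards the 48 V reference; a switching-level model such as the one described in the abstract would additionally capture current stress and the individual working modes.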

Journal Article

Multi-qubit DC gates over an inhomogeneous array of quantum dots

by Qi, Jiaan , Xu, Hongqi , Liu, Zhi-Hai in Arrays , Couplings , Error correction

2025

The prospect of large-scale quantum computation with an integrated chip of spin qubits is imminent as technology improves. This invites us to think beyond the traditional two-qubit-gate framework and consider a naturally supported ‘instruction set’ of multi-qubit gates. In this work, we systematically study such a family of multi-qubit gates implementable over an array of quantum dots by DC evolution. A useful representation of the computational Hamiltonian is proposed for a dot-array with strong spin–orbit coupling effects, distinctive g-factor tensors and varying interdot couplings. Adopting a perturbative treatment, we model a multi-qubit DC gate by the first-order dynamics in the qubit frame and develop a detailed formalism for decomposing the resulting gate, estimating and optimizing the coherent gate errors with appropriate local phase shifts for arbitrary array connectivity. Examples of such multi-qubit gates and their applications in quantum error correction and quantum algorithms are also explored, demonstrating their potential advantage in accelerating complex tasks and reducing overall errors.

Journal Article

Enhancing software-hardware co-design for HEP by low-overhead profiling of single- and multi-threaded programs on diverse architectures with Adaptyst

by Roiser, Stefan , Graczyk, Maksymilian in Central processing units , Co-design , CPUs

2025

Given the recent technological trends and novel computing paradigms spanning both software and hardware, physicists and software developers can no longer just rely on computers becoming faster to meet the ever-increasing computing demands of their research. Adapting systems to the new environment may be difficult though, especially in the case of large and complex applications. Therefore, we introduce Adaptyst (formerly AdaptivePerf): an open-source and architecture-agnostic tool that aims to make these computational and procurement challenges easier to address. At the moment, Adaptyst profiles on- and off-CPU activity of codes, traces all threads and processes spawned by them, and analyses low-level software-hardware interactions to the extent supported by hardware. The tool addresses the main shortcomings of Linux “perf” and has been successfully tested on x86-64, arm64, and RISC-V instruction set architectures. Adaptyst is planned to evolve into a software-hardware co-design framework which scales from embedded to high-performance computing in both legacy and new applications and takes into account a bigger picture than merely choosing between CPUs and GPUs. Our paper describes the current development of the project and its roadmap.

Journal Article

Polaris 23: a high throughput neuromorphic processing element by RISC-V customized instruction extension for spiking neural network (RV-SNN 2.0) and SIMD-style implementation of LIF model with backpropagation STDP

by Wang, Jiulong , Li, Guirun , Wu, Ruopu in Algorithms , Back propagation , Back propagation networks

2025

With the rapid evolution of neuromorphic computing, particularly in the realm of spiking neural networks, the need for high-performance neuromorphic chips has escalated significantly. These chips must exhibit exceptional data throughput, necessitating both robust computing capabilities and neuronal transmission bandwidth. Addressing this imperative, our research presents a neuromorphic processing unit (NPU) that boasts both high data throughput and a customized spiking neural network instruction set with backpropagation acceleration functionality. The cornerstone of this NPU is the Polaris 23 Processing Element (PE), which leverages a multi-issue super-scalar architecture to enhance instruction parallelism and mitigate the average latency of high-delay instructions. Furthermore, to ensure high-bandwidth neuronal and synaptic state transmission, Polaris 23 incorporates multi-bank caches utilizing SRAM arrays and facilitates efficient data access. Rigorous hardware and software testing has been conducted on Polaris 23. The results demonstrate that, when compared to the PE of SpiNNaker 2, a leading neuromorphic chip, Polaris 23 doubles the neuronal transmission throughput, achieving 16 GBps/GHz. Additionally, it surpasses SpiNNaker 2 in neuron precision while maintaining the same neuronal computing efficiency. Notably, the MNIST model implemented on the Polaris 23 platform achieves an accuracy of 91%.

Journal Article

Artificial intelligence based personalized student feedback system 'Sisu Athwala' to enhance exam performance of medical undergraduates

by Seneviratne, Thilanka , Manathunga, Supun , Idirisingha, Wathsala in Academic achievement , Application programming interface , Applications programs

2025

In medical education, mentoring and feedback play crucial roles. Providing feedback on exam performance is a vital component as it allows students to improve. Feedback has to be tailor-made and specific to the individual student. This needs a lot of time and human resources, which are not always in abundance. Use of artificial intelligence (AI) is a promising proposition, yet it comes with the integral problem of large language models (LLMs) generating inaccurate responses. To minimize this, we have developed our unique model 'Sisu Athwala' using retrieval-augmented generation (RAG) with custom LLMs. The objectives were to design and implement an AI-based tool using RAG to provide customized feedback to medical students to enhance their exam performance, minimizing the risk of the LLMs generating inaccurate responses, and to have the AI tool evaluated by expert student mentors and by the end users. The study was conducted at the Faculty of Medicine, University of Peradeniya, Sri Lanka. An AI-based feedback tool was developed, powered by the Generative Pre-trained Transformer 4 (GPT-4) LLM using a RAG pipeline. Expert instruction sets were used to develop the database through an embedding model to minimize potential inaccuracies and biases. To generate user queries, students were provided with a self-evaluation form which was processed using Representative Vector Summarization (RVS). Hence the most critical concerns of each student are distilled and captured accurately, minimizing noise or irrelevant details. The role of the AI tool was defined as a counsellor during Pre-processional alignment, allowing a professional manner throughout the interaction. User queries were processed using the OpenAI Application Programming Interface (API), utilizing the GPT-4-turbo LLM. Students were invited to engage in conversations with the newly developed feedback tool.
The AI tool was evaluated by the expert student mentors on its ability to give personalized feedback, use varied language expressions, and introduce novel perspectives to students. End user perception of the AI tool was assessed using a questionnaire. The post-implementation end user survey of the Sisu Athwala AI tool was largely positive. 92% mentioned that the advice given by the tool on stress management was helpful. 60% believed that the study techniques suggested were useful. A further 60% felt comfortable using the tool. 52% found the advice on exam performance helpful. In their open comments, some suggested having the tool as a mobile app. Fifteen expert student mentors took part in evaluating the tool. 100% agreed that it effectively addressed key points of student strengths and identified areas for improvement, going by the Pendleton model. 90% agreed that Sisu Athwala gives clear actionable plans. The Sisu Athwala AI tool provided comprehensive tailor-made feedback and guidance to medical students, which was well received by the end users. The expert student mentors' evaluation of the material generated by the AI tool was quite positive. Though this is not a replacement for human mentors, it supports mentoring to be delivered while circumventing human resource constraints.

Journal Article

Kalman filter tracking on parallel architectures

by Cerati, G , Wittich, P , Lantz, S in Algorithms , Combinatorial analysis , Instruction sets (computers)

2017

We report on the progress of our studies towards a Kalman filter track reconstruction algorithm with optimal performance on manycore architectures. The combinatorial structure of these algorithms is not immediately compatible with an efficient SIMD (or SIMT) implementation; the challenge for us is to recast the existing software so it can readily generate hundreds of shared-memory threads that exploit the underlying instruction set of modern processors. We show how the data and associated tasks can be organized in a way that is conducive to both multithreading and vectorization. We demonstrate very good performance on Intel Xeon and Xeon Phi architectures, as well as promising first results on Nvidia GPUs.
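
As a rough illustration of organizing track data so that the same arithmetic is applied to many tracks at once, the Python sketch below applies one Kalman measurement update to a batch of independent one-dimensional tracks stored as parallel lists (a structure-of-arrays layout). It is a deliberately simplified assumption, not the authors' code: real track fits use multi-dimensional states and covariance matrices, and the function name `kalman_update_batch` is hypothetical. In a compiled implementation the loop body would map onto SIMD lanes or GPU threads.

```python
def kalman_update_batch(x, p, z, r):
    """Apply one scalar Kalman measurement update to N independent tracks.

    x: state estimates, p: state variances, z: measurements (parallel lists),
    r: measurement variance shared by all tracks.
    """
    x_new, p_new = [], []
    for xi, pi, zi in zip(x, p, z):
        gain = pi / (pi + r)                 # Kalman gain for this track
        x_new.append(xi + gain * (zi - xi))  # pull the state towards the hit
        p_new.append((1.0 - gain) * pi)      # uncertainty shrinks after the hit
    return x_new, p_new


if __name__ == "__main__":
    states, variances = kalman_update_batch(
        x=[0.0, 10.0], p=[1.0, 1.0], z=[2.0, 10.0], r=1.0
    )
    print(states, variances)
```

Because every track performs identical operations with no data-dependent branching, the per-track work is uniform, which is the property that makes this kind of layout amenable to vectorization.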

Journal Article

chatHPC: Empowering HPC users with large language models

by Hines, Jesse , Wang, Feiyi , Herron, Emily in Automation , Compilers , Computer Science

2025

The ever-growing number of pre-trained large language models (LLMs) across scientific domains presents a challenge for application developers. While these models offer vast potential, fine-tuning them with custom data, aligning them for specific tasks, and evaluating their performance remain crucial steps for effective utilization. However, applying these techniques to models with tens of billions of parameters can take days or even weeks on modern workstations, making the cumulative cost of model comparison and evaluation a significant barrier to LLM-based application development. To address this challenge, we introduce an end-to-end pipeline specifically designed for building conversational and programmable AI agents on high performance computing (HPC) platforms. Our comprehensive pipeline encompasses model pre-training, fine-tuning, and web and API service deployment, along with crucial evaluations for lexical coherence, semantic accuracy, hallucination detection, and privacy considerations. We demonstrate our pipeline through the development of chatHPC, a chatbot for HPC question answering and script generation. Leveraging our scalable pipeline, we achieve end-to-end LLM alignment in under an hour on the Frontier supercomputer. We propose a novel self-improved, self-instruction method for instruction set generation, investigate scaling and fine-tuning strategies, and conduct a systematic evaluation of model performance. The established practices within chatHPC will serve as valuable guidance for future LLM-based application development on HPC platforms.

Journal Article

Design, development and testing of a 16-bit reduced instruction set computer architecture based processor

by Shah, Dhaval , Kanzariya, Het , Masharu, Yesha in Computer architecture , Control algorithms , Design

2023

Efficient processors with customized functionality are needed for low-power embedded systems. A 16-bit processor is more suitable for such systems than a 32-bit processor because of its low power consumption. In this paper, we propose a design for a 16-bit processor based on the reduced instruction set computer (RISC) architecture using a multicycle data path. The design, development, and verification were carried out using the Xilinx Vivado, Xilinx Power Estimator, and ModelSim tools. The processor design is implemented on a Spartan 7 (XC7S6-2CPGA196C) FPGA board using the Verilog hardware description language (HDL). The designed processor is verified by executing a set of instructions. The proposed RISC processor design utilizes about half of the computing resources of traditional 16-bit processors and hence achieves significantly lower power consumption.

Journal Article

Comprehensive analysis of energy efficiency and performance of ARM and RISC-V SoCs

by Almeida, Francisco , Blanco, Vicente , Suárez, Daniel in Benchmarks , Chips (electronics) , Comparative analysis

2024

Over the past few years, ARM has been the dominant player in embedded systems and System-on-Chips (SoCs). With the emergence of hardware platforms based on the RISC-V architecture, a practical comparison focusing on their energy efficiency and performance is needed. In this study, our goal is to comprehensively evaluate the energy efficiency and performance of ARM and RISC-V SoCs in three different systems. We will conduct benchmark tests to measure power consumption and overall system performance. The results of our study are valuable to developers and researchers looking for the most appropriate hardware platform for energy-efficient computing applications. Our observations suggest that RISC-V Instruction Set Architecture (ISA) implementations may demonstrate lower average power consumption than ARM, but this does not automatically imply a superior performance per watt ratio for RISC-V. The primary focus of the study is to evaluate and compare these ISA implementations, aiming to identify potential areas for enhancing their energy efficiency. Furthermore, to ensure the practical applicability of our findings, we will use the Computational Fluid Dynamics software OpenFOAM. This step serves to validate the relevance of our results in real-world scenarios. It allows us to fine-tune execution parameters based on the insights gained from our initial study. By doing so, we aim not only to provide meaningful conclusions but also to investigate the transferability of our results to practical applications. Our analysis will also scrutinize the capabilities of these SoCs when handling nonsynthetic software workloads, thereby broadening the scope of our evaluation.

Journal Article
