Catalogue Search | MBRL

CMOS Capacitive Fingerprint Sensor Based on Differential Sensing Circuit with Noise Cancellation

by Hassan, Hossam , Kim, Hyung-Won in capacitive fingerprint sensor , capacitive-sensing , charge integration

2018

In this paper, we introduce a differential sensing technique for CMOS capacitive fingerprint detection. It employs a new capacitive-sensing cell structure with a charge-sharing detection and readout circuit. The proposed technique can also eliminate the effect of parasitic capacitances by employing a parasitic-insensitive switched-capacitor structure, which increases the sensitivity even under severely noisy conditions. It can also overcome the performance degradation caused by varying finger surface conditions by using a differential integrator and adjusting its number of integrations. In addition, the proposed architecture allows parallel detection of all sensing channels. It can, therefore, substantially speed up the detection process compared with conventional architectures. We implemented a prototype fingerprint sensor chip with an array of 20 × 16 sensor cells using a 130 nm CMOS process. Simulation experiments demonstrated that the proposed architecture provided an SNR gain of 54 dB, whereas conventional single-line sensing gave an SNR gain of only 13 dB.

Journal Article

A sample average approximation algorithm for selective disassembly sequencing with abnormal disassembly operations and random operation times

by Lee, Dong-Ho , Kim, Hyung-Won in Algorithms , Approximation , Computer-Aided Engineering (CAD, CAE) and Design

2018

Selective disassembly sequencing is the problem of determining the sequence of disassembly operations to extract one or more target components of a product. This study addresses a stochastic version of the problem in which abnormal disassembly operations and random operation times are considered under the parallel disassembly environment, i.e., one or more components that can be disassembled further remain after a disassembly operation is done. Abnormal disassembly operations are defined as those in which fasteners can be removed by additional random destructive operations without damaging target components. After representing all possible sequences using the extended process graph, a stochastic integer programming model is developed that minimizes the sum of disassembly and penalty costs, where the disassembly cost consists of sequence-dependent setup and operation costs, and the penalty cost is the expectation of the costs incurred when the total disassembly time exceeds a given threshold value. A sample average approximation algorithm is proposed that incorporates a branch and bound algorithm to optimally solve the deterministic problem under a given scenario of abnormal operations and operation times. Finally, the algorithm is illustrated with a hand-light example and a larger instance.
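The sample average approximation idea described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' algorithm: the function names, the exponential distribution for operation times, and the linear penalty are all assumptions, and brute-force enumeration over a given candidate list stands in for the branch and bound step.

```python
import random

def sample_scenarios(mean_times, n_scenarios, rng):
    """Draw scenarios of operation times (exponential times are an assumption)."""
    return [
        {op: rng.expovariate(1.0 / mean) for op, mean in mean_times.items()}
        for _ in range(n_scenarios)
    ]

def saa_cost(sequence, fixed_cost, scenarios, threshold, penalty_rate):
    """Fixed disassembly cost plus the sample average of the lateness penalty."""
    total_penalty = 0.0
    for times in scenarios:
        total_time = sum(times[op] for op in sequence)
        # Penalty is incurred only when total time exceeds the threshold.
        total_penalty += penalty_rate * max(0.0, total_time - threshold)
    return fixed_cost + total_penalty / len(scenarios)

def solve_saa(candidates, mean_times, threshold, penalty_rate,
              n_scenarios=200, seed=0):
    """Pick the (sequence, fixed_cost) pair with the lowest SAA objective.

    Enumeration replaces branch and bound here (hypothetical simplification).
    """
    rng = random.Random(seed)
    scenarios = sample_scenarios(mean_times, n_scenarios, rng)
    return min(
        candidates,
        key=lambda c: saa_cost(c[0], c[1], scenarios, threshold, penalty_rate),
    )
```

The expectation in the penalty term is replaced by an average over a fixed set of sampled scenarios, so each candidate sequence is evaluated against a deterministic problem.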

Journal Article

Low-Complexity Hardware Architecture for Batch Normalization of CNN Training Accelerator

by Park, Sang-Bo , Junaid, Muhammad , Park, Gi-Tae in Accuracy , Artificial intelligence , Artificial neural networks

2025

On-device Artificial Intelligence (AI) accelerators capable of not only inference but also training neural network models are in increasing demand in the industrial AI field, where frequent retraining is crucial due to frequent production changes. Batch normalization (BN) is fundamental to training convolutional neural networks (CNNs), but its implementation in compact accelerator chips remains challenging due to computational complexity, particularly in calculating statistical parameters and gradients across mini-batches. Existing accelerator architectures either compromise the training accuracy of CNNs through approximations or require substantial computational resources, limiting their practical deployment. We present a hardware-optimized BN accelerator that maintains training accuracy while significantly reducing computational overhead through three novel techniques: (1) resource-sharing for efficient resource utilization across forward and backward passes, (2) interleaved buffering for reduced dynamic random-access memory (DRAM) access latencies, and (3) zero-skipping for minimal gradient computation. Implemented on a VCU118 Field Programmable Gate Array (FPGA) at 100 MHz and validated using You Only Look Once version 2-tiny (YOLOv2-tiny) on the PASCAL Visual Object Classes (VOC) dataset, our normalization accelerator achieves a 72% reduction in processing time and 83% lower power consumption compared to a 2.4 GHz Intel Central Processing Unit (CPU) software normalization implementation, while maintaining accuracy (0.51% mean Average Precision (mAP) drop at floating-point 32 bits (FP32), 1.35% at brain floating-point 16 bits (bfloat16)). When integrated into a neural processing unit (NPU), the design demonstrates 63% and 97% performance improvements over AMD CPU and Reduced Instruction Set Computing-V (RISC-V) implementations, respectively. These results confirm that standard batch normalization can be implemented efficiently in hardware without sacrificing accuracy, enabling practical on-device CNN training with significantly reduced computational and power requirements.
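For readers unfamiliar with the statistics and gradients that make batch normalization costly, here is a minimal software sketch of the standard forward and backward passes for a single channel. It is the textbook formulation only, not the accelerator design from the paper; the function names and the `eps` default are assumptions.

```python
import math

def bn_forward(x, gamma, beta, eps=1e-5):
    """Normalize a mini-batch x (list of floats), then scale and shift."""
    m = len(x)
    mean = sum(x) / m
    var = sum((v - mean) ** 2 for v in x) / m  # biased batch variance
    inv_std = 1.0 / math.sqrt(var + eps)
    x_hat = [(v - mean) * inv_std for v in x]
    y = [gamma * h + beta for h in x_hat]
    return y, (x_hat, inv_std, gamma)

def bn_backward(dy, cache):
    """Return gradients (dx, dgamma, dbeta) given upstream gradients dy."""
    x_hat, inv_std, gamma = cache
    m = len(dy)
    dbeta = sum(dy)
    dgamma = sum(g * h for g, h in zip(dy, x_hat))
    # Both batch-wide sums are needed before any dx element can be produced.
    dx = [
        gamma * inv_std / m * (m * g - dbeta - h * dgamma)
        for g, h in zip(dy, x_hat)
    ]
    return dx, dgamma, dbeta
```

Both passes need reductions over the whole mini-batch (mean and variance forward, two gradient sums backward) before per-element results are available, which is the kind of dependency the abstract identifies as challenging in compact hardware.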

Journal Article

High-Speed CNN Accelerator SoC Design Based on a Flexible Diagonal Cyclic Array

by Sim, Sang-Hoon , Lee, Dong-Yeong , Lee, Keon-Myung in Acceleration , Accuracy , Algorithms

2024

The latest convolutional neural network (CNN) models for object detection include complex layered connections to process inference data. Each layer utilizes different types of kernel modes, so the hardware needs to support all kernel modes at an optimized speed. In this paper, we propose a high-speed and optimized CNN accelerator with flexible diagonal cyclic arrays (FDCA) that supports the acceleration of CNNs with various kernel sizes and significantly reduces the time required for inference processing. The accelerator uses four FDCAs to simultaneously calculate 16 input channels and 8 output channels. Each FDCA features a 4 × 8 systolic array that contains a 3 × 3 processing element (PE) array and is designed to handle the most commonly used kernel sizes. To evaluate the proposed CNN accelerator, we mapped the widely used YOLOv5 CNN model and evaluated the performance of its implementation on the Zynq UltraScale+ MPSoC ZCU102 FPGA. The design consumes 249,357 logic cells, 2304 DSP blocks, and only 567 KB BRAM. In our evaluation, the YOLOv5n model achieves an accuracy of 43.1% (mAP@0.5). A prototype accelerator has been implemented using Samsung’s 14 nm CMOS technology. It achieves a peak performance of 1.075 TOPS at a 400 MHz clock frequency.

Journal Article

CNN Accelerator Using Proposed Diagonal Cyclic Array for Minimizing Memory Accesses

by Son, Hyun-Wook , Lee, Dong-Yeong , A. Al-Hamid, Ali in Artificial neural networks , Chips (memory devices) , Computation

2023

This paper presents the architecture of a Convolutional Neural Network (CNN) accelerator based on a new processing element (PE) array called a diagonal cyclic array (DCA). As demonstrated, it can significantly reduce the burden of repeated memory accesses for feature data and weight parameters of the CNN models, which maximizes the data reuse rate and improves the computation speed. Furthermore, an integrated computation architecture has been implemented for max-pooling and the activation function after the convolution calculation, reducing the hardware resources. To evaluate the effectiveness of the proposed architecture, a CNN accelerator has been implemented for You Only Look Once version 2 (YOLOv2)-Tiny consisting of 9 layers. Furthermore, the methodology to optimize the local buffer size with little sacrifice of inference speed is presented in this work. We implemented the proposed CNN accelerator using a Xilinx Zynq ZCU102 Ultrascale+ Field Programmable Gate Array (FPGA) and ISE Design Suite. The FPGA implementation uses 34,336 Look Up Tables (LUTs), 576 Digital Signal Processing (DSP) blocks, and an on-chip memory of only 58 KB. It achieves accuracies of 57.92% and 56.42% mean Average Precision at an intersection-over-union threshold of 0.5 (mAP@0.5) using quantized 16-bit and 8-bit full integer data manipulation, respectively, with a loss of only 0.68% for the 8-bit version. The computation times are 137.9 ms and 69 ms per input image, respectively, at a clock speed of 200 MHz. These speeds are expected to improve roughly fivefold at a clock speed of 1 GHz if implemented in a silicon System on Chip (SoC) using a sub-micron process.
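The clock-scaling projection at the end of this abstract can be checked with simple arithmetic. The sketch below assumes that per-image latency scales inversely with clock frequency, which ignores memory bandwidth and other effects that do not scale with the clock; the function name is hypothetical.

```python
def scale_latency_ms(latency_ms, f_measured_mhz, f_target_mhz):
    """Project latency to a new clock, assuming latency is proportional to 1/f."""
    return latency_ms * f_measured_mhz / f_target_mhz

# Reported per-image times at 200 MHz, projected to 1 GHz (1000 MHz).
for label, latency in (("16-bit", 137.9), ("8-bit", 69.0)):
    projected = scale_latency_ms(latency, 200.0, 1000.0)
    print(f"{label}: {projected:.2f} ms per image")
```

Under this assumption, moving from 200 MHz to 1 GHz gives a 5× reduction in per-image latency.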

Journal Article

Composition and Pollution Characteristics of Precipitation in Jeju Island, Korea for 1997–2015

by Kim, Won-Hyung , Bu, Jun-Oh , Ko, Hee-Jung in Accuracy , Acid rain , Acidification

2021

This study focuses on the long-range chemical composition and pollution characteristics of precipitation components. Samples were collected from Jeju Island in 1997–2015, and their major ionic components were analyzed. Comparison of ion balance, electrical conductivity, and acid fraction of precipitation samples yielded correlation coefficients in the range of 0.937–0.980. The volume-weighted mean pH and electrical conductivity of the wet precipitation of the Jeju area were 4.81 and 21.7 μS/cm, respectively. Ionic strengths of the wet precipitation samples were within the range of 0.24 ± 0.26 mM, indicating that more than 30% of the total precipitation satisfied the pure precipitation criterion. Of the total precipitation in the Jeju area, 44% exhibited a pH in the range of 4.5–5.0, indicating weak acidity. The proportions of sea salts and secondary pollutants in the precipitation were 56.8% and 28.7%, respectively, indicating that the precipitation in the Jeju area was affected by the surrounding coastal area. The acidity contributions by inorganic and organic acids were 92.3% and 7.7%, respectively, whereas the neutralization factors for ammonia and calcium carbonate were 47.0% and 20.0%, respectively. Clustered back trajectory analysis indicates that the concentrations of most ionic components were higher in the airflow pathways to the Jeju area.
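A volume-weighted mean such as the one reported in this abstract is conventionally computed by weighting each sample by its precipitation amount. For pH, the weighting is applied to the hydrogen-ion concentration and not to the pH values themselves. The sketch below shows that calculation; the function names are hypothetical and no data from the study are used.

```python
import math

def volume_weighted_mean(values, volumes):
    """Mean of per-sample values weighted by precipitation volume."""
    return sum(v * w for v, w in zip(values, volumes)) / sum(volumes)

def volume_weighted_mean_ph(ph_values, volumes):
    """Average [H+] by volume, then convert the mean back to pH."""
    h_conc = [10.0 ** (-ph) for ph in ph_values]  # mol/L
    return -math.log10(volume_weighted_mean(h_conc, volumes))
```

For example, two equal-volume samples at pH 4.0 and pH 5.0 give a volume-weighted mean pH of about 4.26, below the arithmetic mean of 4.5, because the more acidic sample dominates the hydrogen-ion average.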

Journal Article

Scheduling algorithms for two-stage reentrant hybrid flow shops: minimizing makespan under the maximum allowable due dates

by Lee, Dong-Ho , Yun, Chang Yeon , Chae, Kevin B. in Algorithms , Computer-Aided Engineering (CAD, CAE) and Design

2009

We consider the scheduling problem in hybrid flow shops that consist of two stages in series, each of which has multiple identical parallel machines. Each job has reentrant flow, i.e., the job visits each production stage several times. The problem is to determine the allocation of jobs to machines as well as the sequence of the jobs assigned to each machine for the objective of minimizing makespan subject to the maximum allowable due dates in the form of a constraint set with a certain allowance. To solve the problem, two types of algorithms are suggested: (a) a branch and bound algorithm that gives optimal semi-permutation schedules; and (b) heuristic algorithms that give non-permutation schedules. To evaluate their performance, computational experiments were conducted on a number of test problems, and the results are reported. In particular, one of the heuristics is competitive with the branch and bound algorithm with respect to solution quality while requiring much shorter computation times.

Journal Article

Loading algorithms for flexible manufacturing systems with partially grouped unrelated machines and additional tooling constraints

by Yu, Jae-Min , Lee, Dong-Ho , Kim, Ji-Su in Algorithms , Computer-Aided Engineering (CAD, CAE) and Design

2012

This paper considers the loading problem for flexible manufacturing systems with highly flexible partial machine grouping, i.e., machines are tooled differently, but each operation can be assigned to multiple machines. Loading is the problem of allocating operations and their associated cutting tools to machines for a given set of parts. As an extension of the existing studies, we consider unrelated machines, i.e., the processing time of an operation depends on the speed of the machine to which it is allocated, and dedicated machines, i.e., certain part types must be processed on a specific machine. We also consider the constraints associated with cutting tools: (a) tool life restrictions and (b) the number of available tool copies. An integer linear programming model is suggested for the objective of balancing the workloads assigned to machines. Because of the complexity of the problem, we then suggest two-stage heuristics in which an initial solution is obtained using modified bin-packing algorithms and is then improved by a simple search technique. The two-stage heuristics were tested on various test instances, and the results show that they can give reasonable-quality solutions within a very short computation time. In addition, a sensitivity analysis was done on the tightness of the tooling constraints, and the results are reported.

Journal Article

Posterior circulation involvement and collateral flow pattern in moyamoya disease with the RNF213 polymorphism

by Kim, Won-Hyung , Nam, Myung-Hyun , Lee, Hae-Bin in Adenosine Triphosphatases - genetics , Adult , Aneurysms

2019

Purpose: Moyamoya disease is a chronic cerebrovascular disorder characterized by progressive stenosis of the circle of Willis with a compensatory collateral vessel network. Recent studies have identified the ring finger protein 213 gene (RNF213) as the unique susceptibility gene for moyamoya disease. The purpose of this study was to compare clinical features of moyamoya disease, especially angiographic findings, between patients with and without the RNF213 mutation. Methods: Blood samples from 35 patients with moyamoya disease were obtained between May 2016 and May 2017. Information on age at the time of diagnosis, sex, and initial symptoms was obtained via retrospective chart review. Angiographic records were evaluated. Results: RNF213 variants were detected in 28 of 35 patients (80%), including all pediatric patients (100%) and 18 of 25 adult patients (72%) in our cohort. Leptomeningeal collateral flow from posterior to anterior circulation was more frequent in the RNF213-negative group than in the RNF213-positive group (100% versus 38.9%; p = 0.020). Posterior cerebral arterial territorial involvement was more frequently observed in RNF213-positive patients than in RNF213-negative patients (50% versus 0%; p = 0.027). Conclusions: RNF213 may play a significant role in the development of collateral anastomoses.

Journal Article

Reference-Free Dynamic Voltage Scaler Based on Swapping Switched-Capacitors

by Ragheb, A. N. , Kim, Hyung Won in Circuits , Computer engineering , dynamic voltage scaler

2019

This paper introduces a reference-free, scalable, and energy-efficient dynamic voltage scaler (DVS) that can be reconfigured for multiple outputs. The proposed DVS employs a novel swapping switched-capacitor (SSC) technique, which can generate target output voltages with higher resolution and smaller ripple voltages than conventional voltage scalers based on switched capacitors. The proposed DVS consists of a cascaded 2:1 converter based on swapping capacitors, which is essential to achieve both very small voltage ripple and fine-grain conversion ratios. One of the serious drawbacks of conventional voltage scalers is the need for external reference voltages to maintain the target output voltage. The proposed SSC, however, eliminates the need for any reference voltages. This significant benefit is achieved by the self-charging ability of the SSC, which can recharge all its capacitors to the configured voltage by simply swapping the two capacitors in each stage. The proposed SSC-DVS was designed with a resolution of 16 output levels and implemented using a 130 nm CMOS (Complementary Metal Oxide Semiconductor) process. Measurements and post-layout simulations with an input voltage of 1.5 V and an output voltage range of 0.085–1.4 V demonstrated a power efficiency of 85% for a load current of 550 µA, with a voltage ripple as low as 2.656 mV for a 2 kΩ resistor load.

Journal Article
