Catalogue Search | MBRL
Explore the vast range of titles available.
403 result(s) for "VLSI architecture"
Comparison of four different methods for massive MIMO detection and VLSI design
2023
The use of Multiple-input-multiple-output (MIMO) technology has proven to be successful. To make MIMO more accessible, it is necessary to simplify the processor operation. This paper proposes four approaches for achieving this goal: symmetric successive overrelaxation (SSOR), Gauss-Seidel (GS), approximate matrix inversion based on Neumann series, and Conjugate Gradients. The performance and VLSI of these methods are compared to determine their effectiveness.
Journal Article
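As a rough illustration of one of the iterative solvers named in the abstract above, the following sketch applies Gauss-Seidel iteration to a small linear system A x = b instead of inverting A explicitly. It is only a generic textbook form under stated assumptions: the function name `gauss_seidel`, the real-valued 2 × 2 example, and the fixed iteration count are hypothetical and are not taken from the paper.

```python
def gauss_seidel(A, b, iterations=25):
    """Approximately solve A x = b by Gauss-Seidel sweeps.

    Assumes A is square with nonzero diagonal entries; convergence is
    guaranteed, for example, when A is symmetric positive definite.
    """
    n = len(b)
    x = [0.0] * n  # start from the all-zero estimate
    for _ in range(iterations):
        for i in range(n):
            # Use the most recent values of the other unknowns.
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / A[i][i]
    return x


# Hypothetical example system (not from the paper).
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = gauss_seidel(A, b)
```

Each sweep needs only multiplications, additions, and one division per unknown, which is the kind of simplification over explicit matrix inversion that the abstract refers to.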
Design and implementation of power and area efficient architectures of circular symmetry 2-D FIR filters using CSOA-based CSD
by Odugu, Venkata Krishna; Vimala Juliet, A.; Srilatha Reddy, V.
in Artificial Intelligence, Circuits and Systems, Delay
2024
An efficient 2-D Finite Impulse Response (FIR) filter is designed using modified McClellan transformations with optimized coefficients. The P3 transformation is considered to attain sharp circular symmetry filters to reduce the complexity of the architecture of the 2-D FIR filter. The filter coefficients are represented in Canonical Signed Digit (CSD) space to construct the filter architecture by multiplierless design. The CSD representation is optimized using the Cuckoo Search Algorithm (CSA) with fitness function Mean Square Error (MSE). Further, a Fully Direct (FD) type architecture of a 2-D FIR filter is implemented according to the obtained CSD-based coefficients for the length of N × N = 11 × 11. Each row filter structure is realized and explored. All the hardware structures of row filters were realized and integrated using Verilog HDL and synthesized by Genus tools provided by the CADENCE Vendor in a 45 nm CMOS generic library. The area, delay, and power reports are generated by this synthesis tool and compared with the existing 2-D FIR filter architectures. The area, power, and delay values of the proposed filter architecture are decreased by 28.9%, 49.59%, and 36.02%, respectively to the conventional filter architecture. The Power-Delay-Product (PDP) and Area-Delay-Product (ADP) values of the proposed filter architecture are reduced by a minimum of 2.14 and 1.96 times, and a maximum of 4.31 and 66 times to the existing filter architectures respectively.
Journal Article
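The Canonical Signed Digit (CSD) representation mentioned in the abstract above can be sketched for integers as follows. This is a minimal illustration of the standard non-adjacent form, in which digits are drawn from {-1, 0, +1} and no two adjacent digits are nonzero; the function name `to_csd` is hypothetical, and the sketch does not reproduce the paper's coefficient optimization or its fixed-point scaling.

```python
def to_csd(n: int) -> list[int]:
    """Return the CSD (non-adjacent form) digits of n, least significant first."""
    digits = []
    while n != 0:
        if n & 1:
            # n mod 4 == 1 gives digit +1, n mod 4 == 3 gives digit -1,
            # which forces the next digit to be zero.
            d = 2 - (n & 3)
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits


def from_csd(digits: list[int]) -> int:
    """Rebuild the integer from its signed digits."""
    return sum(d << i for i, d in enumerate(digits))


csd_of_7 = to_csd(7)  # 7 = 8 - 1
```

Because each nonzero digit corresponds to one shifted add or subtract, a constant such as 7 costs a single subtraction (8 - 1) rather than two additions (4 + 2 + 1), which is why the representation suits multiplierless filter structures.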
FPGA implementation of cost-effective robust Canny edge detection algorithm
2019
Implementation of Canny edge detection algorithm significantly outperforms the existing edge detection techniques in many computer vision algorithms. However, the Canny edge detection algorithm is a complex, time-consuming process with high hardware cost. To overcome these issues, a novel Canny edge detection algorithm is proposed in block level to detect edges without any loss. It uses the Sobel operator, approximation methods to compute gradient magnitude and orientation for replacing complex operations with reduced hardware cost, existing non-maximum suppression, block classification for adaptive thresholding and existing hysteresis thresholding. Pipelining is introduced to reduce latency. The proposed algorithm is implemented on Xilinx Virtex-5 FPGA and it provides better performance compared to frame-level Canny edge detection algorithm. The synthesized architecture reduces execution time by 6.8% and utilizes fewer resources to detect edges of a 512 × 512 image compared to the existing distributed Canny edge detection algorithm.
Journal Article
Hardware-efficient FrWF-based architecture for joint image dehazing and denoising framework for visual sensors
2025
In addition to the haze, the captured images may also contain some amount of noise. The image dehazing approaches may be invalid when the input hazy image contains significant noises. To mitigate the effects of both haze and noise, a joint framework for image dehazing and denoising is vital. An effective framework for joint dehazing and denoising for single-image in real-time is proposed here. The VLSI design for the proposed framework is also presented. In the low-frequency (LF) subband, a hardware-friendly dehazing algorithm that makes use of the saturation-based transmission map (TM) estimation technique is used. In the high-frequency (HF) subbands, wavelet denoising combined with hard thresholding rule is used to improve the denoising capabilities. This research displays a competitive performance in the image quality of the dehazed visuals and computational effectiveness in the presence of Gaussian noise when compared to previous sophisticated dehazing algorithms. The hardware complexity of the suggested framework is reduced by using discrete wavelet transform (DWT) structures based on fractional wavelet filter (FrWF) and canonical-signed-digit (CSD) method. To the best of our knowledge, this is the first attempt to design and implement VLSI architecture for simultaneous dehazing and denoising in the wavelet domain. The proposed architecture is defined using Verilog hardware description language (HDL) and synthesized using the Cadence genus compiler. When employing the CSD technique, the proposed framework reduces area and power by 5.09% and 1.75%, respectively. The maximum operating frequency of the proposed architecture is 96.25 MHz.
Journal Article
Low-Complexity Square-Root Unscented Kalman Filter Design Methodology
2023
Square-root unscented Kalman filter (SRUKF) is a widely used state estimator for several state-of-the-art, highly nonlinear, and critical applications. It improves the stability and numerical accuracy of the system compared to the non-square root formulation, the unscented Kalman filter (UKF). At the same time, SRUKF is less computationally intensive compared to UKF, making it suitable for portable and battery-powered applications. This paper proposes a low-complexity and power-efficient architecture design methodology for SRUKF presented with a use case of the simultaneous localization and mapping (SLAM) problem. Implementation results show that the proposed SRUKF methodology is highly stable and achieves higher accuracy than the extensively used extended Kalman filter and UKF when developed for highly critical nonlinear applications such as SLAM. The design is synthesized and implemented on the resource-constrained Zynq-7000 XC7Z020 FPGA-based Zedboard development kit and compared with the state-of-the-art Kalman filter-based FPGA designs. Synthesis results show that the architecture is highly stable and has significant computation savings in DSP cores and clock cycles. The power consumption was reduced by 64% compared to the state-of-the-art UKF design methodology. ASIC design was synthesized using UMC 90-nm technology, and the results for on-chip area and power consumption have been discussed.
Journal Article
High-Throughput Post-Quantum Cryptographic System: CRYSTALS-Kyber with Computational Scheduling and Architecture Optimization
by Chou, Shih-Hsiang; Yang, Yu-Hua; Chen, Ci
in Algorithms, Binomial distribution, Computer architecture
2025
With the development of a quantum computer in the near future, classical public-key cryptography will face the challenge of being vulnerable to quantum algorithms, such as Shor’s algorithm. As communication technology advances rapidly, a great deal of personal information is being transmitted over the Internet. Based on our observation that the Kyber algorithm exhibits a significant number of idle cycles during execution when implemented following the conventional software procedure, this paper proposes a high-throughput scheduling for Kyber by parallelizing the SHA-3 function, the sampling algorithm, and the NTT computations to improve hardware utilization and reduce latency. We also introduce the 8-stage pipelined SHA-3 architecture and multi-mode polynomial arithmetic module to increase area efficiency. By also optimizing the hardware architecture of the various computational modules used by Kyber, according to the implementation result, an aggregate throughput of 877.192 kOPS in Kyber KEM can be achieved on TSMC 40 nm. In addition, our design not only achieves the highest throughput among existing studies but also improves the area and power efficiencies.
Journal Article
Performance analysis of multi-folded pipelined successive cancellation decoder architecture for polar code
2024
Polar codes are the popular error-correcting codes and increased their attention after being adopted for the control channel in fifth-generation new radio (5G NR) standards. An efficient hardware architecture for polar code is often required with minimal encoding and decoding complexity. This work proposes a Multi-folded pipelined architecture and analyzes the performance in terms of latency, hardware utilization, and throughput. The designed architecture has two folded architectures interconnected in parallel to output 4-bits simultaneously. Folding transformations are used to reduce the number of idle processing elements (PEs) in every stage leading to the effective utilization of PE. Precomputation is effectively utilized in the PE to reduce the critical path delay, which improves the maximum operating frequency. A Loop-based shifting register (LSR) is employed to reduce the number of registers used. The analytical model for latency and utilization rate has been derived from the scheduling of the proposed architecture. The proposed design shows 63–71% higher hardware utilization than conventional semi-parallel design for code length N = 512 suitable for the physical downlink control channel (PDCCH) in 5G NR. The architecture is also implemented in Virtex-6, ZYNQ-Ultrascale+ MPSoC device for maximum supported code length of 5G NR, i.e., up to 2^10, compared with the existing decoders. The proposed design also has the benefit of lesser look-up-table (LUT) consumption and zero random-access-memory (RAM) usage with some additional registers, making it suitable for resource-constraint applications.
Journal Article
VLSI Architecture of Modified Complex Harmonic Wavelet Transform
2023
The complex harmonic wavelet (CHW) for the discrete signal is orthogonal, compact support in the frequency domain but is not complex conjugate symmetric for positive and negative half-planes. The modified complex harmonic wavelet (MCHW) transform is the improved version of CHW as it is the complex conjugate symmetry along with other desired properties of CHW such as orthogonality, and compact support in the frequency domain. Due to the complex conjugate symmetry, MCHW has lesser computational complexity compared to CHW. This paper introduces a new VLSI architecture for MCHW for hardware implementation and prototyped on a commercially available Virtex-5 field-programmable gate array (FPGA). For the validation of the proposed implementation, the real-time captured results in the logic analyzer are verified with simulation results. The maximum operating frequency targeting the above-mentioned FPGA device is reported as 92.82 MHz. The total on-chip power of the above implementation is 1.117 W, out of which 84 mW is the dynamic power dissipation at a toggle rate of 12.5%. Finally, for the area utilization of the above implementation, its resource utilization targeting the above FPGA device is reported.
Journal Article
Efficient very large-scale integration architecture design of proportionate-type least mean square adaptive filters
by Shrinivasan, Lakshmi; Narasimhaiah, Divya Muddenahalli; Narayanappa, Chikkajala Krishnappa
in Adaptation, Adaptive algorithms, Adaptive filters
2024
The effectiveness of adaptive filters is mainly dependent on the design techniques and the algorithm of adaptation. The most common adaptation technique used is least mean square (LMS) due to its computational simplicity. The application depends on the adaptive filter configuration used and are well known for system identification and real time applications. In this work, a modified delayed µ-law proportionate normalized least mean square (DMPNLMS) algorithm has been proposed. It is the improvised version of the µ-law proportionate normalized least mean square (MPNLMS) algorithm. The algorithm is realized using Ladner-Fischer type of parallel prefix logarithmic adder to reduce the silicon area. The simulation and implementation of very large-scale integration (VLSI) architecture are done using MATLAB, Vivado suite and complementary metal–oxide–semiconductor (CMOS) 90 nm technology node using Cadence register transfer level (RTL) Genus Compiler respectively. The DMPNLMS method exhibits a reduction in mean square error, a higher rate of convergence, and more stability. The synthesis results demonstrate that it is area and delay effective, making it practical for applications where a faster operating speed is required.
Journal Article
Hardware Architecture for Guessing Random Additive Noise Decoding Markov Order (GRAND-MO)
2022
Communication channels with memory are often sensitive to burst noise, which drastically reduces the decoding performance of standard channel code decoders, and this degradation worsens as channel memory increases. Hence, interleavers and de-interleavers are usually used to reduce the effects of burst noise at the expense of increased latency in the communication system. The delay imposed by interleavers/de-interleavers and the performance deterioration induced by channel memory are unacceptable in novel applications that require ultra-low latency and high decoding performance. Guessing Random Additive Noise Decoding (GRAND) is a universal Maximum Likelihood (ML) decoding technique for short-length and high-rate channel codes. GRAND Markov Order (GRAND-MO) is a hard-input variant of GRAND that has been specifically developed for communication channels with memory that are subject to burst noise. GRAND-MO can be used directly on hard demodulated channel signals, removing the requirement for extra interleavers/de-interleavers and considerably reducing overall latency in communication systems. This paper describes a high-throughput GRAND-MO VLSI architecture that can achieve an average throughput of up to 52 Gbps and 64 Gbps for code lengths of 128 and 79, respectively. Furthermore, we propose improvements to the GRAND-MO algorithm to simplify hardware implementation and reduce decoding complexity. When compared to GRANDAB, a hard-input variant of GRAND, the proposed improved GRAND-MO algorithm yields a decoding performance gain of 2∼3 dB at a target FER of 10^-5. Similarly, as compared to the (79, 64) BCH code decoder, the proposed GRAND-MO decoder has a 33% reduced worst-case latency and a 2 dB gain in decoding performance at a target FER of 10^-5.
Journal Article