Catalogue Search | MBRL
Explore the vast range of titles available.
6,016 result(s) for "Array processors"
High-frame rate homography and visual odometry by tracking binary features from the focal plane
by
Murai, Riku
,
Saeedi, Sajad
,
Kelly, Paul H. J
in
Analog to digital conversion
,
Array processors
,
Digital imaging
2023
Robotics faces a long-standing obstacle in which the speed of the vision system’s scene understanding is insufficient, impeding the robot’s ability to perform agile tasks. Consequently, robots must often rely on interpolation and extrapolation of the vision data to accomplish tasks in a timely and effective manner. One of the primary reasons for these delays is the analog-to-digital conversion that occurs on a per-pixel basis across the image sensor, along with the transfer of pixel-intensity information to the host device. This results in significant delays and power consumption in modern visual processing pipelines. The SCAMP-5, the general-purpose Focal-plane Sensor-processor array (FPSP) used in this research, performs computations in the analog domain prior to analog-to-digital conversion. By extracting features from the image on the focal plane, the amount of data that needs to be digitised and transferred is reduced. This allows the SCAMP-5 to achieve a high frame rate with low energy consumption. The focus of our work is on localising the camera within the scene, which is crucial for scene understanding and for any downstream robotics tasks. We present a localisation system that utilises the FPSP in two parts. First, a 6-DoF odometry system is introduced, which efficiently estimates its position against a known marker at over 400 FPS. Second, our work is extended to implement BIT-VO, a 6-DoF visual odometry system which operates in unknown natural environments at 300 FPS.
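The homography tracking in the title can be illustrated with a minimal sketch (mine, not from the paper; the matrix values are invented for illustration): a 3×3 homography H maps homogeneous pixel coordinates from one view to another.

```python
import numpy as np

# Toy sketch: applying a 3x3 homography H to a pixel coordinate.
# H here is a pure-translation homography chosen for illustration.
H = np.array([[1.0, 0.0, 5.0],
              [0.0, 1.0, -3.0],
              [0.0, 0.0, 1.0]])

p = np.array([10.0, 20.0, 1.0])  # homogeneous pixel coordinate (x, y, 1)
q = H @ p
q = q / q[2]                     # normalise back to (x', y', 1)
# q[:2] is the tracked pixel's position in the next frame
```

In a real system, H would be estimated from the binary features tracked on the focal plane rather than given.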
Journal Article
Comparing reliabilities of centralized and distributed switching architectures for reconfigurable 2D arrays
2021
Whether used as main processing engines or as special-purpose adjuncts, processor arrays are capable of boosting performance for a variety of computation-intensive applications. For large processor arrays, needed to achieve the required performance level in the age of big data, processor malfunctions, resulting in loss of computational capabilities, form a primary concern. There is no shortage of alternative reconfiguration architectures and associated algorithms for building robust processor arrays. However, a commensurately extensive body of knowledge about the reliability modeling aspects of such arrays is lacking. We study differences between 2D arrays with centralized and distributed switching, pointing out the advantages of the latter in terms of reliability, regularity, modularity, and VLSI realizability. Notions of reliability inversion (where modeling uncertainties might lead us to choose a less-reliable system over one with higher reliability) and modelability (a system property that makes the derivation of tight reliability bounds possible, thus making reliability inversion much less likely) follow as important byproducts of our study.
Journal Article
Word-Based Processor Structure for Montgomery Modular Multiplier Suitable for Compact IoT Edge Devices
2023
The Internet of Things (IoT) is an emerging technology that forms a huge network of different objects and intelligent devices. IoT security is becoming more important due to the exchange of sensitive sensor data and the growing merging of the virtual and real worlds. IoT edge devices create serious security threats to network systems. Due to their limited resources, it is challenging to implement cryptographic protocols on these devices to secure them. Addressing this problem requires compact implementations of cryptographic algorithms on such devices. At the heart of most cryptographic algorithms is the modular multiplication operation; its efficient implementation therefore has a great impact on the implementation of the whole cryptographic protocol. In this paper, we focus on a resource- and energy-efficient hardware implementation of the adopted Montgomery modular multiplication algorithm over GF(2^m). The main building block of the proposed word-based processor structure is a processor array that has a modular structure with local connectivity between its processing elements. The main benefit of the suggested hardware structure is the ability to manage the savings in area, delay, and consumed energy. We used ASIC technology to implement the suggested word-based processor structure. The final results show an average area reduction of 86.3% when compared with competitive word-based multiplier structures. Additionally, the recommended design achieves significant average savings in area-time product, power, and consumed energy of 53.7%, 83.2%, and 72.6%, respectively, over the competitive ones. The obtained results show that the provided processor structure is best suited for application in compact IoT edge devices with limited resources.
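For context, the core operation this abstract describes can be sketched in a few lines. This is a generic bit-serial Montgomery multiplier over GF(2^m) (a textbook formulation, not the paper's word-based processor array); polynomials are encoded as Python integers with bit i holding the coefficient of x^i.

```python
def mont_mul_gf2m(a, b, f, m):
    """Compute a * b * x^(-m) mod f over GF(2^m) (Montgomery product).

    a, b: field elements as integers (bit i = coefficient of x^i).
    f: irreducible field polynomial of degree m with constant term 1.
    """
    t = 0
    for i in range(m):
        if (a >> i) & 1:
            t ^= b        # t += a_i * b  (addition in GF(2) is XOR)
        if t & 1:
            t ^= f        # make t divisible by x (constant term of f is 1)
        t >>= 1           # exact division by x
    return t

# Example field: GF(2^4) with f = x^4 + x + 1; R2 = x^8 mod f = x^2 + 1
f, m, R2 = 0b10011, 4, 0b0101
a = 0b0110                           # an arbitrary field element
abar = mont_mul_gf2m(a, R2, f, m)    # into Montgomery form: a * x^m mod f
back = mont_mul_gf2m(abar, 1, f, m)  # out of Montgomery form; back == a
```

A word-based hardware structure like the one in the paper processes several coefficients of a per cycle instead of one, but the reduction step is the same.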
Journal Article
Design of reconfigurable array processor for multimedia application
by
Jiang, Lin
,
Zhu, Yun
,
Li, Xueting
in
Array processors
,
Computational efficiency
,
Energy consumption
2018
With the rapid growth of the amount of computation and power consumption, there is a pressing need for a high power-efficiency architecture that balances computational efficiency and flexibility of application. This paper proposes an array-processor architecture for multimedia applications which is programmable and self-reconfigurable and consists of 1024 thin-core processing elements (PEs). The performance and power dissipation are demonstrated with different multimedia algorithms such as hashing and fractional motion estimation (FME). The results show that the proposed architecture can provide high performance with less energy consumption using parallel computation.
Journal Article
Performance of the ATLAS Level-1 topological trigger in Run 2
2022
During LHC Run 2 (2015–2018) the ATLAS Level-1 topological trigger allowed efficient data-taking by the ATLAS experiment at luminosities up to 2.1×10^34 cm^-2 s^-1, which exceeds the design value by a factor of two. The system was installed in 2016 and operated in 2017 and 2018. It uses Field Programmable Gate Array processors to select interesting events by placing kinematic and angular requirements on electromagnetic clusters, jets, τ-leptons, muons, and the missing transverse energy. It significantly improved background event rejection and signal event acceptance, in particular for Higgs and B-physics processes.
Journal Article
Comprehensive performance comparison of high-resolution array platforms for genome-wide Copy Number Variation (CNV) analysis in humans
by
Abyzov, Alexej
,
Haraksingh, Rajini R.
,
Urban, Alexander Eckehart
in
Algorithms
,
Animal Genetics and Genomics
,
Array Comparative Genome Hybridization (aCGH)
2017
Background
High-resolution microarray technology is routinely used in basic research and clinical practice to efficiently detect copy number variants (CNVs) across the entire human genome. A new generation of arrays combining high probe densities with optimized designs will comprise essential tools for genome analysis in the coming years. We systematically compared the genome-wide CNV detection power of all 17 available array designs from the Affymetrix, Agilent, and Illumina platforms by hybridizing the well-characterized genome of 1000 Genomes Project subject NA12878 to all arrays, and performing data analysis using both manufacturer-recommended and platform-independent software. We benchmarked the resulting CNV call sets from each array using a gold standard set of CNVs for this genome derived from 1000 Genomes Project whole genome sequencing data.
Results
The arrays tested comprise both SNP and aCGH platforms with varying designs and contain between ~0.5 and ~4.6 million probes. Across the arrays CNV detection varied widely in number of CNV calls (4–489), CNV size range (~40 bp to ~8 Mbp), and percentage of non-validated CNVs (0–86%). We discovered strikingly strong effects of specific array design principles on performance. For example, some SNP array designs with the largest numbers of probes and extensive exonic coverage produced a considerable number of CNV calls that could not be validated, compared to designs with probe numbers that are sometimes an order of magnitude smaller. This effect was only partially ameliorated using different analysis software and optimizing data analysis parameters.
Conclusions
High-resolution microarrays will continue to be used as reliable, cost- and time-efficient tools for CNV analysis. However, different applications tolerate different limitations in CNV detection. Our study quantified how these arrays differ in total number and size range of detected CNVs as well as sensitivity, and determined how each array balances these attributes. This analysis will inform appropriate array selection for future CNV studies, and allow better assessment of the CNV-analytical power of both published and ongoing array-based genomics studies. Furthermore, our findings emphasize the importance of concurrent use of multiple analysis algorithms and independent experimental validation in array-based CNV detection studies.
Journal Article
SIMD-Optimized Indexing for Columnar Databases: Benchmarking Performance in Real-Time Analytical Workloads
2025
Query engines enable users to execute queries quickly and gather results, supporting data retrieval across multiple data sources without needing custom code. The exponential growth of data volumes places increasing demands on modern databases, requiring higher performance, scalability, and efficient real-time query processing. These demands motivated the creation of alternative Database Management System (DBMS) architectures. Unlike traditional systems optimized for quick read-and-write operations on small datasets for transactional workloads, other architectures prioritize statistical insights.

Columnar query engines have become a prominent architecture for analytical processing, as they efficiently store and handle large datasets and optimize analytics extraction. These engines leverage columnar storage formats to improve query performance, particularly for data scans and aggregations.

SIMD instructions allow CPUs to simultaneously execute the same operation across multiple data elements organized in vectors, significantly reducing execution time. This technique is particularly beneficial for column-oriented databases due to their inherent memory locality.

Indexes provide an additional method for enhancing database performance. Traditional indexing techniques like B-trees are optimized for relational DBMSs to accelerate row-level retrievals. In contrast, columnar systems focus on large-scale scans and aggregations, where conventional indexes are less effective. Recent research, however, has refined indexing techniques to be more compatible with OLAP queries and analytical workloads.

This dissertation investigates how combining indexing techniques with columnar databases and vectorization improves performance in real-time analytics and query systems. It addresses limitations in existing approaches by integrating index structures, such as bitmap and tree-based indexes, with optimizations tailored for real-time analytics performance.

A systematic evaluation methodology is employed to validate the proposed solution using industry-standard benchmarks, including TPC-H and TPC-DS. These benchmarks measure query latency, I/O operations, and resource utilization. Experiments cover multiple configurations, including tests with unindexed data, to isolate and demonstrate the contributions of the proposed techniques. Performance metrics such as CPU and memory usage are analyzed to identify bottlenecks and opportunities for further optimization.

The results confirm that integrating vectorized indexing techniques can improve query performance by reducing latency, depending on the use case. However, the research also examines inherent trade-offs, including increased data structure size, additional write overhead, and hardware usage. These findings validate the proposed approach and underscore its potential to address the challenges of modern analytical workloads. They also suggest that SIMD-optimized indexes improve performance in OLAP workloads and warrant further research into their integration in columnar query engines.
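A toy illustration of the bitmap-index-plus-vectorization idea (mine, not from the dissertation): build one bitmap per distinct value of a low-cardinality column, then answer a membership predicate by OR-ing bitmaps, which NumPy evaluates with SIMD-style element-wise operations.

```python
import numpy as np

# Toy column store: one low-cardinality column of 6 rows.
region = np.array(["US", "EU", "US", "APAC", "EU", "US"])

# Bitmap index: one boolean vector per distinct value.
bitmaps = {v: region == v for v in np.unique(region)}

# Query: SELECT rows WHERE region IN ('US', 'EU')
mask = bitmaps["US"] | bitmaps["EU"]   # vectorized OR over the bitmaps
rows = np.flatnonzero(mask)            # qualifying row ids
```

Real systems compress the bitmaps (e.g. run-length encoding) and fuse the OR with the downstream scan, but the vectorized-predicate shape is the same.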
Dissertation
STATISTICAL THRESHOLDS FOR TENSOR PCA
by
Miolane, Léo
,
Lopatto, Patrick
,
Jagannath, Aukosh
in
Array processors
,
Estimating techniques
,
Mathematical analysis
2020
We study the statistical limits of testing and estimation for a rank one deformation of a Gaussian random tensor. We compute the sharp thresholds for hypothesis testing and estimation by maximum likelihood and show that they are the same. Furthermore, we find that the maximum likelihood estimator achieves the maximal correlation with the planted vector among measurable estimators above the estimation threshold. In this setting, the maximum likelihood estimator exhibits a discontinuous BBP-type transition: below the critical threshold the estimator is orthogonal to the planted vector, but above the critical threshold, it achieves positive correlation which is uniformly bounded away from zero.
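For readers unfamiliar with the setting, the spiked tensor model studied in this line of work (a standard formulation, stated here for context rather than quoted from the paper) is:

\[
Y = \lambda\, v^{\otimes p} + W,
\]

where $v$ is a unit vector (the planted spike), $\lambda \ge 0$ is the signal-to-noise ratio, and $W$ is a Gaussian random tensor. Hypothesis testing asks whether $\lambda = 0$ or $\lambda > 0$; estimation asks to recover $v$, e.g. via the maximum likelihood estimator $\hat v = \arg\max_{\|u\|=1} \langle Y, u^{\otimes p}\rangle$.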
Journal Article
Multi‐objective single‐shot neural architecture search via efficient convolutional filters
by
Shariatzadeh, Seyed Mahdi
,
Fathy, Mahmood
,
Berangi, Reza
in
Array processors
,
artificial intelligence
,
Artificial neural networks
2023
This paper presents a novel approach for fast neural architecture search (NAS) in Convolutional Neural Networks (CNNs) for end-to-end License Plate Recognition (LPR). The authors propose a one-shot schema that considers the efficiency of different convolutional filters to create a search space for more efficient architectures on vector processing cores. The authors’ approach utilizes a super-network for LPR using Connectionist-Temporal-Cost (CTC) and ranks the importance of filters to generate a fine-grained list of architectures. These architectures are evaluated in a multi-objective manner, resulting in several Pareto-optimal architectures with different computational costs and validation errors. Rather than using a single complicated building block for all layers, the authors’ method allows each stage to select a custom building block with fewer or more operations. The authors show that their super-network is flexible enough to calculate filters of any required size and stride in each stage while keeping it efficient through structural pruning. The authors’ experiments, which were performed on Iranian LPR, demonstrate that this method produces a variety of fast and efficient CNNs. Furthermore, the authors discuss the potential of this method for use in other areas of CNN application.
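The multi-objective selection step described in this abstract reduces to computing a Pareto front over (cost, error) pairs. A minimal sketch (my illustration, not the authors' code; the example numbers are invented):

```python
def pareto_front(points):
    """Return the Pareto-optimal (cost, error) pairs; lower is better in both."""
    front = []
    for cost, err in sorted(points):          # ascending cost
        # keep a point only if it strictly improves the best error so far
        if not front or err < front[-1][1]:
            front.append((cost, err))
    return front

# Hypothetical candidate architectures: (compute cost, validation error %)
archs = [(3.0, 2.1), (1.0, 9.0), (2.0, 4.5), (4.0, 1.8), (2.0, 8.0)]
best = pareto_front(archs)
```

Each surviving pair is an architecture no other candidate beats in both cost and error, which is exactly the set the abstract says the search returns.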
Journal Article
A high speed multi-level-parallel array processor for vision chips
2014
This paper proposes a high-speed multi-level-parallel array processor for programmable vision chips. The processor includes a 2-D pixel-parallel processing element (PE) array and a 1-D row-parallel row processor (RP) array. Both arrays operate in a single-instruction multiple-data (SIMD) fashion and share a common instruction decoder. The sizes of the arrays are scalable according to dedicated applications. In the PE array, each PE can communicate not only with its nearest-neighbor PEs but also with the next-near neighbor PEs in the diagonal directions. This connectivity helps speed up local operations in low-level image processing. Global operations in mid-level processing, on the other hand, are accelerated by the skipping chain and binary boosters in the RP array. The array processor was implemented on an FPGA device and was successfully tested on various algorithms, including real-time face detection based on the PPED algorithm. The results show that the image processing speed of the proposed processor is much higher than that of state-of-the-art digital vision chips.
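The pixel-parallel neighbour communication described here can be mimicked in a few lines of NumPy (a software analogy of the SIMD PE array, not the chip's implementation): shifting the whole image moves every PE's neighbour value into place in one step, including the diagonal next-near neighbours.

```python
import numpy as np

img = np.arange(16, dtype=np.int32).reshape(4, 4)  # one value per PE

# Every PE reads its north neighbour in one SIMD-style step
# (edges wrap here for simplicity; hardware uses boundary values).
north = np.roll(img, 1, axis=0)

# Diagonal (north-east) next-near neighbour: compose two shifts.
north_east = np.roll(north, -1, axis=1)
```

A 3x3 local operation (e.g. an erosion or edge filter) is then just an element-wise combination of a few such shifted arrays, which is why diagonal links shorten low-level image operations.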
Journal Article