Catalogue Search | MBRL

A prefetch control strategy based on improved hill-climbing method in asymmetric multi-core architecture

by Cai, Min , Fang, Juan , Xu, Yixiang in Accuracy , Aggressiveness , Asymmetry

2023

Cache prefetching is a traditional way to reduce memory access latency. In multi-core systems, aggressive prefetching may harm the system. In the past, prefetch throttling strategies usually set thresholds based on certain factors; when a threshold is exceeded, the strategy throttles the aggressive prefetcher. However, these strategies usually work well in homogeneous multi-core systems and poorly in heterogeneous ones. This paper considers the performance difference between cores under an asymmetric multi-core architecture. Using an improved hill-climbing method, it controls the prefetching aggressiveness of the different cores and improves per-core IPC. Experiments show that, compared with the previous strategy, average big-core performance improves by more than 3% and average little-core performance by more than 24%.
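
The idea of steering prefetch aggressiveness by hill climbing on IPC can be sketched as follows. This is a minimal illustration of plain hill climbing, not the improved method from the paper: the function name, the discrete aggressiveness levels, and the `measure_ipc` callback are all assumptions made for the example.

```python
from typing import Callable


def hill_climb_aggressiveness(measure_ipc: Callable[[int], float],
                              levels: int = 5,
                              start: int = 2,
                              epochs: int = 20) -> int:
    """Pick a prefetch aggressiveness level by plain hill climbing on IPC.

    measure_ipc(level) is a hypothetical callback that returns the IPC
    observed over one sampling interval at the given level.
    """
    level = start
    direction = 1  # +1 tries a more aggressive level, -1 a less aggressive one
    best_ipc = measure_ipc(level)
    for _ in range(epochs):
        # Clamp the candidate to the valid range of levels.
        candidate = min(max(level + direction, 0), levels - 1)
        if candidate == level:
            # Hit the edge of the range: search in the other direction.
            direction = -direction
            continue
        ipc = measure_ipc(candidate)
        if ipc > best_ipc:
            # The move helped: keep it and continue in the same direction.
            level, best_ipc = candidate, ipc
        else:
            # The move did not help: stay put and reverse direction.
            direction = -direction
    return level
```

In an asymmetric system, one such controller could run per core, each with its own IPC measurements, so that a big core and a little core may settle on different levels.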

Journal Article

Implementation of Real-Time Space Target Detection and Tracking Algorithm for Space-Based Surveillance

by Su, Yueqi , Cang, Chen , Chen, Xin in Aerospace environments , Aerospace safety , Algorithms

2023

Space-based target surveillance is important for aerospace safety. However, with the increasing complexity of the space environment, stellar targets and strong noise interference make space target detection difficult. At the same time, it is hard to balance real-time processing with computational performance on the onboard processing platform owing to resource limitations. The heterogeneous multi-core architecture has corresponding processing capabilities, providing a hardware implementation platform with real-time and computational performance for space-based applications. This paper first developed a multi-stage joint detection and tracking model (MJDTM) for space targets in optical image sequences. This model combined an improved local contrast method and the Kalman filter to detect and track potential targets and used differences in movement status to suppress stellar targets. Then, a heterogeneous multi-core processing system based on a field-programmable gate array (FPGA) and a digital signal processor (DSP) was established as the space-based image processing system. Finally, MJDTM was optimized and implemented on this image processing system. Experiments conducted with simulated and actual image sequences examine the accuracy and efficiency of the MJDTM, which has a 95% detection probability at a false alarm rate of 10⁻⁴. According to the experimental results, the hardware implementation of the algorithm can detect targets in a 1024 × 1024 pixel image in just 22.064 ms, which satisfies the real-time requirements of space-based surveillance.
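
The tracking step mentioned in the abstract relies on the Kalman filter. As a rough illustration only, the sketch below tracks one image coordinate with a textbook constant-velocity Kalman filter. It is not the MJDTM implementation; the function name, noise parameters `q` and `r`, and the initial covariance are assumptions chosen for the example.

```python
def kalman_track(measurements, dt=1.0, q=1e-3, r=1.0):
    """Track one coordinate with a constant-velocity Kalman filter.

    State is (position, velocity); only position is measured.
    q is the process-noise variance added to each state component,
    r is the measurement-noise variance (both illustrative values).
    Returns a list of (position, velocity) estimates, one per measurement.
    """
    p, v = measurements[0], 0.0              # initial state guess
    p00, p01, p10, p11 = 10.0, 0.0, 0.0, 10.0  # initial covariance (assumed)
    estimates = []
    for z in measurements:
        # Predict: x = F x, P = F P F^T + Q with F = [[1, dt], [0, 1]].
        p = p + dt * v
        n00 = p00 + dt * (p01 + p10) + dt * dt * p11 + q
        n01 = p01 + dt * p11
        n10 = p10 + dt * p11
        n11 = p11 + q
        # Update with measurement matrix H = [1, 0].
        s = n00 + r                # innovation variance
        k0, k1 = n00 / s, n10 / s  # Kalman gain
        y = z - p                  # innovation
        p, v = p + k0 * y, v + k1 * y
        # Covariance update: P = (I - K H) P.
        p00, p01 = (1.0 - k0) * n00, (1.0 - k0) * n01
        p10, p11 = n10 - k1 * n00, n11 - k1 * n01
        estimates.append((p, v))
    return estimates
```

The estimated velocity is the kind of movement-status cue that could, in principle, help separate a moving target from stars that share a common apparent motion across frames.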

Journal Article

Task-Level Aware Scheduling of Energy-Constrained Applications on Heterogeneous Multi-Core System

by Chen, Siheng , Tao, Wei , Li, Xiaobo in Algorithms , Communication , Constraints

2020

Minimizing the schedule length of parallel applications, which run on a heterogeneous multi-core system and are subject to energy consumption constraints, has recently attracted much attention. The key point of this problem is the strategy to pre-allocate the energy consumption of unscheduled tasks. Previous articles used the minimum value, average value or a power consumption weight value as the pre-allocation energy consumption of tasks. However, they all ignored the different levels of tasks. The tasks in different task levels have different impact on the overall schedule length when they are allocated the same energy consumption. Considering the task levels, we designed a novel task energy consumption pre-allocation strategy that is conducive to minimizing the scheduling time and developed a novel task schedule algorithm based on it. After getting the preliminary scheduling results, we also proposed a task execution frequency re-adjustment mechanism that can re-adjust the execution frequency of tasks, to further reduce the overall schedule length. We carried out a considerable number of experiments with practical parallel application models. The results of the experiments show that our method can reach better performance compared with the existing algorithms.

Journal Article

A Vision-Based Driver Nighttime Assistance and Surveillance System Based on Intelligent Image Sensing Techniques and a Heterogamous Dual-Core Embedded System Architecture

by Chiang, Chuan-Yen , Liu, Chuan-Ming , Yuan, Shyan-Ming in CCD sensors , Computer engineering , Computer science

2012

This study proposes a vision-based intelligent nighttime driver assistance and surveillance system (VIDASS system) implemented by a set of embedded software components and modules, and integrates these modules to accomplish a component-based system framework on an embedded heterogamous dual-core platform. Therefore, this study develops and implements computer vision and sensing techniques of nighttime vehicle detection, collision warning determination, and traffic event recording. The proposed system processes the road-scene frames in front of the host car captured from CCD sensors mounted on the host vehicle. These vision-based sensing and processing technologies are integrated and implemented on an ARM-DSP heterogamous dual-core embedded platform. Peripheral devices, including image grabbing devices, communication modules, and other in-vehicle control devices, are also integrated to form an in-vehicle-embedded vision-based nighttime driver assistance and surveillance system.

Journal Article

Hybrid/Heterogeneous Programming with OMPSS and Its Software/Hardware Implications

by Duran, Alejandro , Bueno, Javier , Etsion, Yoav in clustered heterogeneous multi‐/many‐core systems , computing systems , message passing interface

2017

This chapter describes how OmpSs extends the OpenMP 3.0 node programming model and how it leverages message passing interface (MPI) and OpenCL/CUDA, mastering the efficient programming of the clustered heterogeneous multi‐/many‐core systems that will be available in current and future computing systems. It describes the language extensions and the implementation of OmpSs, focusing on the intelligence that needs to be embedded in the runtime system to effectively lower the programmability wall and the opportunities to implement new mechanisms and policies. The chapter reasons about the overheads related to task management (detecting intertask data dependencies, identifying task‐level parallelism, and executing tasks out of order) in OmpSs, examining how far a software implementation can go to cope with fine‐grain parallelism and opening the door to novel hardware mechanisms for emerging multicore architectures. The chapter provides a brief description of the OmpSs execution model to understand the programming model extensions.

Book Chapter

Research and optimization of task scheduling algorithm based on heterogeneous multi-core processor

by Ding, Yongkang , Liu, Junnan , Liu, Yifan in Accuracy , Algorithms , Completion time

2024

A heterogeneous multi-core processor can switch between different types of cores to perform tasks, which offers more room for running computer systems efficiently and improving computing power. Current research focuses on heterogeneous multiprocessor systems with high performance or low power consumption to reduce system energy consumption. However, some studies have shown that excessive voltage reduction may increase transient failure rates, reducing system reliability. This paper studies the energy-optimal scheduling problem of HMSS with DVFS (dynamic voltage and frequency scaling) under minimum-time and reliability constraints, and proposes an improved wild horse optimization algorithm (OIWHO) that improves the efficiency of heterogeneous task scheduling and shortens task completion time. The algorithm uses the learning and chaos perturbation strategies based on opposition and crossover strategies to balance the search and utilization capabilities, which can further improve the performance of OIWHO. The proposed algorithm compares favorably with existing algorithms. Experimental results show that the average computing time of the OIWHO algorithm is 12.58%, 11.42%, 7.53%, 4.20% and 3.21% faster than DRNN-BWO, PSO, GWO-GA, GACSH and OIWOAH, respectively. Especially when solving large-scale problems, our algorithm takes less time than the other algorithms.

Journal Article

MT-office: parallel password recovery program for office on domestic heterogeneous multi-core processor

by Liu, Jie , Wen, Jinmin , Xiao, Tiaojie in Acceleration , Access control , Algorithms

2023

With growing security awareness, more advanced and secure encryption algorithms are applied to Microsoft Office to guarantee information security, and people set more complex passwords. However, once the initial password is forgotten, the encrypted information needs to be retrieved. Conventional brute-force cracking methods and password recovery programs can hardly meet actual deciphering needs. To this end, we develop a distributed parallel password recovery program (MT-Office) for Microsoft Office on the domestic heterogeneous multi-core processor (MT-3000). MT-Office takes full advantage of the multi-core and heterogeneous features of MT-3000, and is optimized in both vectorization and global computing. MT-Office also provides multiple recovery strategies in password generation to improve recovery efficiency. Compared with other platforms (e.g., Intel platforms and FT platforms), the MT-3000 heterogeneous platform achieves a 60×–218× speedup. For Office2010, we perform a strong scalability test on the new-generation supercomputer at the National Supercomputer Center in Tianjin. MT-Office scales to 65,536 acceleration clusters on this system, showing good scalability and an almost linear speedup. For Office2007, compared with other password recovery programs, MT-Office achieves a 2.5×–131.1× speedup. MT-Office thus exploits the advantages of MT-3000 well: it has good scalability and parallelism as well as faster deciphering speed, and can be applied in practical engineering applications.

Journal Article

Python Non-Uniform Fast Fourier Transform (PyNUFFT): An Accelerated Non-Cartesian MRI Package on a Heterogeneous Platform (CPU/GPU)

by Lin, Jyh-Miin in Algorithms , Cartesian coordinates , Central processing units

2018

A Python non-uniform fast Fourier transform (PyNUFFT) package has been developed to accelerate multidimensional non-Cartesian image reconstruction on heterogeneous platforms. Since scientific computing with Python encompasses a mature and integrated environment, the time efficiency of the NUFFT algorithm has been a major obstacle to real-time non-Cartesian image reconstruction with Python. The current PyNUFFT software enables multi-dimensional NUFFT accelerated on a heterogeneous platform, which yields an efficient solution to many non-Cartesian imaging problems. The PyNUFFT also provides several solvers, including the conjugate gradient method, ℓ1 total variation regularized ordinary least square (L1TV-OLS), and ℓ1 total variation regularized least absolute deviation (L1TV-LAD). Metaprogramming libraries have been employed to accelerate PyNUFFT. The PyNUFFT package has been tested on multi-core central processing units (CPUs) and graphic processing units (GPUs), with acceleration factors of 6.3–9.5× on a 32-thread CPU platform and 5.4–13× on a GPU.

Journal Article

Contribution to Speeding-Up the Solving of Nonlinear Ordinary Differential Equations on Parallel/Multi-Core Platforms for Sensing Systems

by Tavakkoli, Vahid , Kyamakya, Kyandoghere , Chedjou, Jean Chamberlain in Accuracy , Algorithms , Decomposition

2020

Solving ordinary differential equations (ODEs) on heterogeneous or multi-core/parallel embedded systems significantly increases the operational capacity of many sensing systems for processing tasks such as self-calibration, model-based measurement, and self-diagnostics. The main challenge is usually the complexity of the processing task at hand, which requires too much processing power, which may not be available, to ensure real-time processing. Therefore, distributed solving involving multiple cores or nodes is a good option. Also, speeding-up the processing does also result in significant energy consumption or sensor nodes involved. Several methods exist for solving differential equations on single processors, but most are not suitable for implementation on parallel (i.e., multi-core) systems because of growing communication-related network delays between computing nodes, which become a serious bottleneck in a parallel computing context. Most of the problems faced relate to the very nature of differential equations: normally, the calculations of a previous step must be completed before they can be used in the next step. It also appears that increasing performance (e.g., through increasing step sizes) may decrease the accuracy of calculations on parallel/multi-core systems like GPUs. In this paper, we create a new adaptive algorithm based on the Adams–Moulton and Parareal methods (we call it PAMCL) and compare it with other relevant implementations/schemes such as DOPRI5, PAM, etc. Our algorithm (PAMCL) shows very good performance (i.e., speed-up) compared with competing algorithms while ensuring reasonable accuracy. For better usage of computing resources, the OpenCL platform is selected and the ODE solver algorithms are optimized to work on both GPUs and CPUs. This platform enables high flexibility in the use of heterogeneous computing resources and results in very efficient utilization of available resources compared with other competing algorithm/scheme implementations.

Journal Article

Extending OpenMP to Survive the Heterogeneous Multi-Core Era

by Duran, Alejandro , Mayo, Rafael , Igual, Francisco in Accelerators , Computer Science , Enginyeria de la telecomunicació

2010

This paper advances the state of the art in programming models for exploiting task-level parallelism on heterogeneous many-core systems, presenting a number of extensions to the OpenMP language inspired by the StarSs programming model. The proposed extensions allow the programmer to write portable code easily for a number of different platforms, relieving him/her from developing the specific code to off-load tasks to the accelerators and the synchronization of tasks. Our results obtained from the StarSs instantiations for SMPs, the Cell, and GPUs report reasonable parallel performance. However, the real impact of our approach is the productivity gains it yields for the programmer.

Journal Article
