18 result(s) for "heterogeneous multi-core system"
A prefetch control strategy based on improved hill-climbing method in asymmetric multi-core architecture
Cache prefetching is a traditional way to reduce memory access latency. In multi-core systems, aggressive prefetching may harm the system. In the past, prefetch throttling strategies usually set thresholds based on certain factors. When a threshold is exceeded, the throttling strategy restrains the aggressive prefetcher. However, these strategies usually work well in homogeneous multi-core systems and do not work well in heterogeneous multi-core systems. This paper considers the performance difference between cores under the asymmetric multi-core architecture. Through the improved hill-climbing method, the aggressiveness of prefetching for different cores is controlled, and the IPC of the core is improved. Through experiments, it is found that compared with the previous strategy, the average performance of the big core is improved by more than 3%, and the average performance of the little cores is improved by more than 24%.
Implementation of Real-Time Space Target Detection and Tracking Algorithm for Space-Based Surveillance
Space-based target surveillance is important for aerospace safety. However, with the increasing complexity of the space environment, the stellar target and strong noise interference pose difficulties for space target detection. Simultaneously, it is hard to balance real-time processing with computational performance for the onboard processing platform owing to resource limitations. The heterogeneous multi-core architecture has corresponding processing capabilities, providing a hardware implementation platform with real-time and computational performance for space-based applications. This paper first developed a multi-stage joint detection and tracking model (MJDTM) for space targets in optical image sequences. This model combined an improved local contrast method and the Kalman filter to detect and track the potential targets and use differences in movement status to suppress the stellar targets. Then, a heterogeneous multi-core processing system based on a field-programmable gate array (FPGA) and digital signal processor (DSP) was established as the space-based image processing system. Finally, MJDTM was optimized and implemented on the above image processing system. The experiments conducted with simulated and actual image sequences examine the accuracy and efficiency of the MJDTM, which has a 95% detection probability while the false alarm rate is 10⁻⁴. According to the experimental results, the algorithm hardware implementation can detect targets in an image with 1024 × 1024 pixels in just 22.064 ms, which satisfies the real-time requirements of space-based surveillance.
Task-Level Aware Scheduling of Energy-Constrained Applications on Heterogeneous Multi-Core System
Minimizing the schedule length of parallel applications, which run on a heterogeneous multi-core system and are subject to energy consumption constraints, has recently attracted much attention. The key point of this problem is the strategy to pre-allocate the energy consumption of unscheduled tasks. Previous articles used the minimum value, average value or a power consumption weight value as the pre-allocation energy consumption of tasks. However, they all ignored the different levels of tasks. The tasks in different task levels have different impact on the overall schedule length when they are allocated the same energy consumption. Considering the task levels, we designed a novel task energy consumption pre-allocation strategy that is conducive to minimizing the scheduling time and developed a novel task schedule algorithm based on it. After getting the preliminary scheduling results, we also proposed a task execution frequency re-adjustment mechanism that can re-adjust the execution frequency of tasks, to further reduce the overall schedule length. We carried out a considerable number of experiments with practical parallel application models. The results of the experiments show that our method can reach better performance compared with the existing algorithms.
A Vision-Based Driver Nighttime Assistance and Surveillance System Based on Intelligent Image Sensing Techniques and a Heterogamous Dual-Core Embedded System Architecture
This study proposes a vision-based intelligent nighttime driver assistance and surveillance system (VIDASS system) implemented by a set of embedded software components and modules, and integrates these modules to accomplish a component-based system framework on an embedded heterogamous dual-core platform. Therefore, this study develops and implements computer vision and sensing techniques of nighttime vehicle detection, collision warning determination, and traffic event recording. The proposed system processes the road-scene frames in front of the host car captured from CCD sensors mounted on the host vehicle. These vision-based sensing and processing technologies are integrated and implemented on an ARM-DSP heterogamous dual-core embedded platform. Peripheral devices, including image grabbing devices, communication modules, and other in-vehicle control devices, are also integrated to form an in-vehicle-embedded vision-based nighttime driver assistance and surveillance system.
Hybrid/Heterogeneous Programming with OMPSS and Its Software/Hardware Implications
This chapter describes how OmpSs extends the OpenMP 3.0 node programming model and how it leverages message passing interface (MPI) and OpenCL/CUDA, mastering the efficient programming of the clustered heterogeneous multi‐/many‐core systems that will be available in current and future computing systems. It describes the language extensions and the implementation of OmpSs, focusing on the intelligence that needs to be embedded in the runtime system to effectively lower the programmability wall and the opportunities to implement new mechanisms and policies. The chapter reasons about the overheads related with task management (detecting intertask data dependencies, identifying task‐level parallelism and executing tasks out of order) in OmpSs examining how far a software implementation can go to cope with fine‐grain parallelism and opening the door to novel hardware mechanisms for emerging multicore architectures. The chapter provides a brief description of the OmpSs execution model to understand the programming model extensions.
Research and optimization of task scheduling algorithm based on heterogeneous multi-core processor
Heterogeneous multi-core processor has the ability to switch between different types of cores to perform tasks, which provides more space and possibility for realizing efficient operation of computer system and improving computer computing power. Current research focuses on heterogeneous multiprocessor systems with high performance or low power consumption to reduce system energy consumption. However, some studies have shown that excessive voltage reduction may lead to an increase in transient failure rates, reducing system reliability. This paper studies the energy optimal scheduling problem of HMSS with DVFS under the constraints of minimum time and reliability, and proposes an improved wild horse optimization algorithm (OIWHO), which improves the efficiency of heterogeneous task scheduling and shortens the task completion time. The algorithm uses the learning and chaos perturbation strategies based on opposition and crossover strategies to balance the search and utilization capabilities, and can further improve the performance of OIWHO. Compared with previous work, our proposed algorithm has more advantages than existing algorithms. Experimental results show that the average computing time of OIWHO algorithm is 12.58%, 11.42%, 7.53%, 4.20% and 3.21% faster than DRNN-BWO, PSO, GWO-GA, GACSH and OIWOAH, respectively. Especially when solving large-scale problems, our algorithm takes less time than other algorithms.
MT-office: parallel password recovery program for office on domestic heterogeneous multi-core processor
With the improvement of security awareness, in order to guarantee information security, more advanced and secure encryption algorithms are applied to Microsoft Office. People also set more complex encryption passwords. However, once the initial password is forgotten, the encrypted information needs to be retrieved. The conventional brute force cracking methods and password recovery programs can hardly meet the actual deciphering needs. To this end, we develop a distributed parallel password recovery program (MT-Office) for Microsoft Office on the domestic heterogeneous multi-core processor (MT-3000). MT-Office takes full advantage of the multi-core and heterogeneous features of MT-3000, and is optimized and improved in both vectorization and global computing. At the same time, MT-Office provides multiple recovery strategies in password generation to improve the recovery efficiency. Compared with other platforms (e.g., Intel platforms and FT platforms), MT-3000 heterogeneous platform can achieve 60×–218× speedup ratio. For Office2010, we perform a strong scalability test on the new-generation supercomputer in National Supercomputer Center in Tianjin. MT-Office not only extends to 65,536 acceleration clusters on this system, shows good scalability, but also achieves almost linear speedup ratio. For Office2007, compared with other password recovery programs, MT-Office can achieve 2.5×–131.1× speedup ratio. It can be seen that MT-Office can better exploit the advantages of MT-3000, which not only has good scalability and parallelism, but also has faster deciphering speed and can be applied to practical engineering application.
Python Non-Uniform Fast Fourier Transform (PyNUFFT): An Accelerated Non-Cartesian MRI Package on a Heterogeneous Platform (CPU/GPU)
A Python non-uniform fast Fourier transform (PyNUFFT) package has been developed to accelerate multidimensional non-Cartesian image reconstruction on heterogeneous platforms. Since scientific computing with Python encompasses a mature and integrated environment, the time efficiency of the NUFFT algorithm has been a major obstacle to real-time non-Cartesian image reconstruction with Python. The current PyNUFFT software enables multi-dimensional NUFFT accelerated on a heterogeneous platform, which yields an efficient solution to many non-Cartesian imaging problems. The PyNUFFT also provides several solvers, including the conjugate gradient method, ℓ1 total variation regularized ordinary least square (L1TV-OLS), and ℓ1 total variation regularized least absolute deviation (L1TV-LAD). Metaprogramming libraries have been employed to accelerate PyNUFFT. The PyNUFFT package has been tested on multi-core central processing units (CPUs) and graphic processing units (GPUs), with acceleration factors of 6.3–9.5× on a 32-thread CPU platform and 5.4–13× on a GPU.
Contribution to Speeding-Up the Solving of Nonlinear Ordinary Differential Equations on Parallel/Multi-Core Platforms for Sensing Systems
Solving ordinary differential equations (ODE) on heterogeneous or multi-core/parallel embedded systems does significantly increase the operational capacity of many sensing systems in view of processing tasks such as self-calibration, model-based measurement and self-diagnostics. The main challenge is usually related to the complexity of the processing task at hand which costs/requires too much processing power, which may not be available, to ensure a real-time processing. Therefore, a distributed solving involving multiple cores or nodes is a good/precious option. Also, speeding-up the processing does also result in significant energy consumption or sensor nodes involved. There exist several methods for solving differential equations on single processors. But most of them are not suitable for an implementation on parallel (i.e., multi-core) systems due to the increasing communication related network delays between computing nodes, which become a main and serious bottleneck to solve such problems in a parallel computing context. Most of the problems faced relate to the very nature of differential equations. Normally, one should first complete calculations of a previous step in order to use it in the next/following step. Hereby, it appears also that increasing performance (e.g., through increasing step sizes) may possibly result in decreasing the accuracy of calculations on parallel/multi-core systems like GPUs. In this paper, we do create a new adaptive algorithm based on the Adams–Moulton and Parareal method (we call it PAMCL) and we do compare this novel method with other most relevant implementations/schemes such as the so-called DOPRI5, PAM, etc. Our algorithm (PAMCL) is showing very good performance (i.e., speed-up) while compared to related competing algorithms, while thereby ensuring a reasonable accuracy. For a better usage of computing units/resources, the OpenCL platform is selected and ODE solver algorithms are optimized to work on both GPUs and CPUs. This platform does ensure/enable a high flexibility in the use of heterogeneous computing resources and does result in a very efficient utilization of available resources when compared to other comparable/competing algorithm/schemes implementations.
Extending OpenMP to Survive the Heterogeneous Multi-Core Era
This paper advances the state-of-the-art in programming models for exploiting task-level parallelism on heterogeneous many-core systems, presenting a number of extensions to the OpenMP language inspired by the StarSs programming model. The proposed extensions allow the programmer to write portable code easily for a number of different platforms, relieving him/her from developing the specific code to off-load tasks to the accelerators and the synchronization of tasks. Our results obtained from the StarSs instantiations for SMPs, the Cell, and GPUs report reasonable parallel performance. However, the real impact of our approach is the productivity gains it yields for the programmer.