Catalogue Search | MBRL

HEALERS: a heterogeneous energy-aware low-overhead real-time scheduler

by Devaraj, Rajesh , Sarkar, Arnab , Moulik, Sanjay in Algorithms , Deadlines , DVFS

2019

Devising energy-efficient scheduling strategies for real-time periodic tasks on heterogeneous platforms is a challenging and computationally demanding problem. This study proposes a low-overhead heuristic strategy called HEALERS for energy-aware scheduling of a set of periodic tasks executing on a heterogeneous multi-core system enabled with both dynamic voltage and frequency scaling (DVFS) and dynamic power management (DPM). The presented strategy first applies deadline-partitioning to acquire a set of distinct time-slices. At each time-slice boundary, the following three-phase operation is applied to obtain a schedule for the next time-slice. First, it computes the fragments of the execution demands of all tasks on each of the different processing cores in the platform. Next, it generates a schedule for each task on one or more processing cores such that the total execution demand of all tasks is satisfied. Finally, HEALERS applies DVFS and DPM on all processing cores so that energy consumption within the time-slice is minimized without jeopardising the execution requirements of the scheduled tasks. Experimental results show that the proposed scheme not only achieves appreciable energy savings with respect to the state of the art (5–42% on average) but also enables a significant improvement in resource utilisation (as high as 58%).
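The deadline-partitioning step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes implicit deadlines (each deadline equals the task's period), all tasks released at time 0, and a proportional per-slice share in the style of deadline-partitioned fair scheduling. The function names `deadline_partition`, `time_slices` and `slice_share` are hypothetical.

```python
from math import lcm  # multi-argument lcm requires Python 3.9+


def deadline_partition(periods):
    """Return sorted time-slice boundaries within one hyperperiod.

    Assumption: implicit deadlines, so every multiple of a task's
    period is a deadline and therefore a slice boundary.
    """
    hyperperiod = lcm(*periods)
    boundaries = {0}
    for period in periods:
        # Every deadline of this task up to the hyperperiod is a boundary.
        boundaries.update(range(period, hyperperiod + 1, period))
    return sorted(boundaries)


def time_slices(periods):
    """Pair consecutive boundaries into (start, end) time-slices."""
    bounds = deadline_partition(periods)
    return list(zip(bounds, bounds[1:]))


def slice_share(exec_time_on_core, period, slice_length):
    """Execution demand of one task on one core within a slice.

    Assumption: the task receives a share proportional to its
    utilisation on that core (exec_time_on_core / period).
    """
    return exec_time_on_core / period * slice_length


if __name__ == "__main__":
    for start, end in time_slices([4, 6]):
        print(start, end, slice_share(2, 4, end - start))
```

On a heterogeneous platform the execution time of a task differs per core, so `slice_share` would be evaluated once per task and core type; how HEALERS then combines those shares into a schedule is described only at the level of the abstract.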

Journal Article


Task Scheduling for Heterogeneous Multi‐Core Processors Based on Deep Reinforcement Learning

by Chen, Wei , Tan, Qiguang , Liu, Dake in Algorithms , Artificial neural networks , Communication

2025

Heterogeneous multicore processor systems are commonly used for scheduling tasks of DAG applications. Deep reinforcement learning, with its superior ability to perceive decisions directly and handle high‐dimensional state actions, has become a prevalent solution for scheduling these systems. However, the incomplete environment models and large action spaces of deep reinforcement learning present significant challenges to scheduling. This paper investigates a scheduling problem in a heterogeneous multicore processor environment. Initially, system environment information is extracted and encoded using a graph convolutional neural network based on integrating adapter and AdapterFusion into the transformer architecture. Then, by separating task selection and processor allocation, the decision space is reduced: the former uses a deep neural network to learn to select nodes, and the latter allocates processors using a heuristic scheduling algorithm combining earliest completion time‐based node replication and rolling technology. The entire scheduling process is a Markov decision problem. Therefore, the PPO algorithm with dynamic adjustment of the clipping factor, combined with an advantage actor‐critic network, is employed for training, optimizing, and evaluating the algorithm to find the optimal scheduling strategy. The training process adopts a reward function for the time and power consumption required for completed task scheduling to ensure that multiple DAG application task scheduling can achieve optimal performance. Experiments conducted in various environments with different parameters show that, compared to other algorithms, this algorithm reduces the overall execution time and power consumption cost of heterogeneous multicore processor tasks by 11.09%.
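For readers unfamiliar with the clipping factor mentioned in the abstract, the standard PPO clipped surrogate objective for a single sample can be sketched as follows. This is the textbook form of the objective, not the paper's variant: the abstract does not say how the clipping factor is adjusted dynamically, so `epsilon` is simply a parameter here, and the function name `ppo_clipped_objective` is hypothetical.

```python
def ppo_clipped_objective(ratio, advantage, epsilon=0.2):
    """Per-sample PPO clipped surrogate objective.

    ratio     : new-policy probability / old-policy probability
    advantage : advantage estimate for the sampled action
    epsilon   : clipping factor (assumed fixed here; the paper
                adjusts it dynamically by a rule not given in
                the abstract)
    """
    # Restrict the probability ratio to [1 - epsilon, 1 + epsilon].
    clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
    # Take the more pessimistic of the unclipped and clipped terms.
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # A large policy change with positive advantage is capped.
    print(ppo_clipped_objective(1.5, 1.0))
    # A large policy change with negative advantage is also bounded.
    print(ppo_clipped_objective(0.5, -1.0))
```

A schedule that changes `epsilon` over training would be passed in by the caller; any particular schedule would be an assumption beyond what the abstract states.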

Journal Article


Co-scheduling tasks on multi-core heterogeneous systems: An energy-aware perspective

by Libutti, Simone , Fornaciari, William , Massari, Giuseppe in Algorithms , Bandwidths , coscheduling tasks

2016

Single-ISA heterogeneous multi-core processors trade off power against performance; however, threads that co-run on shared resources suffer from resource contention, which induces performance degradation and energy inefficiency. The authors introduce a novel approach to optimise the co-scheduling of multi-threaded applications on heterogeneous processors. The approach is based on the concept of a stakes function, which represents the trade-off between isolation and sharing of resources. The authors also develop a co-scheduling algorithm that uses stakes functions to optimise resource usage while mitigating resource contention, thus improving performance and energy efficiency. They validated the approach using applications from the Princeton Application Repository for Shared-Memory Computers (PARSEC) benchmark suite, obtaining up to 12.88% performance speed-up, 13.65% energy speed-up and 28.29% energy delay speed-up with respect to the standard Linux heterogeneous multi-processing scheduler.

Journal Article


Mixed Cryptography Constrained Optimization for Heterogeneous, Multicore, and Distributed Embedded Systems

by Nam, Hyunsuk , Lysecky, Roman in adaptive system , Adaptive systems , Algorithms

2018

Embedded systems continue to execute computational- and memory-intensive applications with vast data sets, dynamic workloads, and dynamic execution characteristics. Adaptive distributed and heterogeneous embedded systems are increasingly critical in supporting dynamic execution requirements. With pervasive network access within these systems, security is a critical design concern that must be considered and optimized within such dynamically adaptive systems. This paper presents a modeling and optimization framework for distributed, heterogeneous embedded systems. A dataflow-based modeling framework for adaptive streaming applications integrates models for computational latency, mixed cryptographic implementations for inter-task and intra-task communication, security levels, communication latency, and power consumption. For the security model, we present a level-based modeling of cryptographic algorithms using mixed cryptographic implementations. This level-based security model enables the development of an efficient, multi-objective genetic optimization algorithm to optimize security and energy consumption subject to current application requirements and security policy constraints. The presented methodology is evaluated using a video-based object detection and tracking application and several synthetic benchmarks representing various application types and dynamic execution characteristics. Experimental results demonstrate the benefits of a mixed cryptographic algorithm security model compared to using a single, fixed cryptographic algorithm. Results also highlight how security policy constraints can yield increased security strength and cryptographic diversity for the same energy constraint.

Journal Article


Application-oriented cache memory configuration for energy efficiency in multi-cores

by Diniz, Pedro C. , Cuminato, Lucas A. , Delbem, Alexandre C.B. in Analogies , application‐oriented cache memory configuration , Architecture

2015

This study describes and evaluates an automated technique that exploits the potential of heterogeneous multi-core processor (HMP) systems when customised with respect to the number of cores and L1 cache memory sizes, using a field programmable gate array fitted with LEON3 cores as its base. The authors evaluated the real energy consumption of the HMP system tuned for a set of 50 application codes, using a data-mining tool to find code similarities and select HMP configurations. The selected HMP system configuration requires a small cache configuration and consumes less energy than a homogeneous system with the same number of cores, with only a very modest increase in execution time.

Journal Article


Optimal tilt and orientation maps: a multi-algorithm approach for heterogeneous multicore-GPU systems

by Villegas, A , Tabik, S , Romero, L. F in Accuracy , Algorithms , Altitude

2013

This paper presents a new Geographic Information Systems (GIS) tool to compute optimal solar-panel positioning maps on large high-resolution Digital Elevation Models (DEMs). In particular, this software determines (1) the maximum solar energy input that can be captured on a surface located at a specific height at each point of the DEM, and then (2) the optimal tilt and orientation that allow this amount of energy to be captured. The radiation and horizon algorithms we developed in previous works were used as the baseline for this tool (Romero et al. in Comput. Phys. Commun. 178(11):800–808, 2008; Tabik et al. in Int. J. Geogr. Inf. Sci. 25(4):541–555, 2011). A multi-method approach is analyzed to make the hybrid implementation of this tool especially appropriate for heterogeneous multicore-GPU architectures. The experimental results show high numerical accuracy with linear scalability.

Journal Article


Optimization and Implementation Performance of Sequence Alignment on the Intel Xeon Phi-based Heterogeneous System

by Chen, Shaolong , Wang, Wenle , Luo, Zhenzhen in Alignment , Annotations , Bioinformatics

2021

Heterogeneous systems that combine different architectures have become a convenient solution in high-performance computing research for handling the expanding volume of sequence data in bioinformatics analysis. The Intel Xeon Phi-based cluster is one of the most widely used heterogeneous systems in recent years. Without accurate results from sequence alignment, the remaining two steps in variant analysis, variant calling and variant annotation, cannot produce correct results. However, most sequence aligners are developed for multicore systems and cannot take advantage of an Intel Xeon Phi-based cluster. This paper explores the implementation modes on the Intel Xeon Phi-based heterogeneous system, including native, offload and symmetric modes. By evaluating the scalability of the various modes for sequence alignment, we show that native mode cannot take advantage of an Intel Xeon Phi-based cluster. Although offload mode is promising, it is not easy to enhance performance without comprehensive coding ability. Finally, symmetric mode could provide a low-complexity solution that supports significant improvements in performance.

Journal Article


Complexity Analysis of a Versatile Video Coding Decoder over Embedded Systems and General Purpose Processors

by Saha, Anup , Pescador, Fernando , Chassaigne, Kheyter in codec , complexity analysis , Efficiency

2021

The increase in high-quality video consumption requires increasingly efficient video coding algorithms. Versatile video coding (VVC) is the current state-of-the-art video coding standard. Compared to the previous video standard, high efficiency video coding (HEVC), VVC achieves approximately 50% higher video compression while maintaining the same quality, at the cost of significantly increased computational complexity. In this study, coarse-grain profiling of a VVC decoder over two different platforms was performed: one platform was based on a high-performance general purpose processor (HGPP), and the other platform was based on an embedded general purpose processor (EGPP). For the most computationally intensive modules, fine-grain profiling was also performed. The results allowed the identification of the most computationally intensive modules, which is necessary to carry out subsequent acceleration processes. Additionally, the correlation between the performance of each module on both platforms was determined to identify the influence of the hardware architecture.

Journal Article


A Multilevel Hierarchical Parallel Algorithm for Large-Scale Finite Element Modal Analysis

by Dong, Hang , Li, Junjie , Lou, Yunfeng in Algorithms , Communication , Computing time

2023

The strict, high-standard requirements for the safety and stability of major engineering systems make large-scale finite element modal analysis a tough challenge. At the same time, realizing the systematic analysis of the entire large structure of these engineering systems is extremely meaningful in practice. This article proposes a multilevel hierarchical parallel algorithm for large-scale finite element modal analysis that reduces the loss of parallel computational efficiency when heterogeneous multicore distributed storage computers are used for such analyses. Based on two-level partitioning and four-transformation strategies, the proposed algorithm not only improves the memory access rate through the sparsely distributed storage of a large amount of data but also reduces the solution time by reducing the scale of the generalized characteristic equations (GCEs). Moreover, a multilevel hierarchical parallelization approach is introduced during the computational procedure to separate the communication between nodes, within nodes, between heterogeneous core groups (HCGs), and inside HCGs by mapping computing tasks to various hardware layers. This method can efficiently achieve load balancing at different layers and significantly improve the communication rate through hierarchical communication. Therefore, it can enhance the efficiency of parallel computing of large-scale finite element modal analysis by fully exploiting the architectural characteristics of heterogeneous multicore clusters. Finally, typical numerical experiments were used to validate the correctness and efficiency of the proposed method. Then a parallel modal analysis example of a cross-river tunnel with over ten million degrees of freedom (DOFs) was performed, and ten thousand core processors were applied to verify the feasibility of the algorithm.

Journal Article


Typhoon Case Comparison Analysis Between Heterogeneous Many-Core and Homogenous Multicore Supercomputing Platforms

by Xu, Da , Wang, Chengzhi , Han, Qiqi in Earth and Environmental Science , Earth Sciences , Heat budget

2023

In this paper, a typical experiment is carried out based on a high-resolution air-sea coupled model, namely, the coupled ocean-atmosphere-wave-sediment transport (COAWST) model, on both heterogeneous many-core (SW) and homogenous multicore (Intel) supercomputing platforms. We construct a hindcast of Typhoon Lekima on both the SW and Intel platforms, compare the simulation results between these two platforms and compare the key elements of the atmospheric and ocean modules to reanalysis data. The comparative experiment in this typhoon case indicates that the domestic many-core computing platform and general cluster yield almost no differences in the simulated typhoon path and intensity, and the differences in surface pressure (PSFC) in the WRF model and sea surface temperature (SST) in the short-range forecast are very small, whereas a major difference can be identified at high latitudes after the first 10 days. Further heat budget analysis verifies that the differences in SST after 10 days are mainly caused by shortwave radiation variations, as influenced by typhoons subsequently generated in the system. These typhoons, generated in the hindcast after the first 10 days, follow clearly different trajectories on the two platforms.

Journal Article

