Catalogue Search | MBRL

Cloud-Based Parameter-Driven Statistical Services and Resource Allocation in a Heterogeneous Platform on Enterprise Environment

by Sungju Lee , Taikyeong Jeong in Cloud computing , cloud computing environments , data analysis , statistical analysis , data mining , heterogeneous platform , enterprise system

2016

Cloud-based, parameter-driven statistical services are fundamental for enterprise users and have had a substantial impact on companies worldwide. In this paper, we demonstrate statistical analysis for selected data-related criteria and apply it to a cloud server to compare results. In addition, we present a statistical analysis and cloud-based resource allocation method for a heterogeneous platform environment, analyzing data and information with consideration of the application workload and the server capacity, and subsequently propose a service prediction model based on polynomial regression. In particular, our aim is to provide stable service in a given large-scale enterprise cloud computing environment. The virtual machines (VMs) for cloud-based services are assigned to each server with a special methodology to satisfy the uniform utilization distribution model. The methodology is also implemented between users and the platform, which is a main idea of our cloud computing system. Based on the experimental results, we confirm that our prediction model can provide sufficient resources for statistical services to large-scale users while satisfying the uniform utilization distribution.

Journal Article

pocl: A Performance-Portable OpenCL Implementation

by Raiskila, Kalle , de La Lama, Carlos Sánchez , Jääskeläinen, Pekka in Analysis , Compilers , Computer programming

2015

OpenCL is a standard for parallel programming of heterogeneous systems. The benefits of a common programming standard are clear; multiple vendors can provide support for application descriptions written according to the standard, thus reducing the program porting effort. While the standard brings the obvious benefits of platform portability, the performance portability aspects are largely left to the programmer. The situation is made worse due to multiple proprietary vendor implementations with different characteristics, and, thus, required optimization strategies. In this paper, we propose an OpenCL implementation that is both portable and performance portable. At its core is a kernel compiler that can be used to exploit the data parallelism of OpenCL programs on multiple platforms with different parallel hardware styles. The kernel compiler is modularized to perform target-independent parallel region formation separately from the target-specific parallel mapping of the regions to enable support for various styles of fine-grained parallel resources such as subword SIMD extensions, SIMD datapaths and static multi-issue. Unlike previous similar techniques that work on the source level, the parallel region formation retains the information of the data parallelism using the LLVM IR and its metadata infrastructure. This data can be exploited by the later generic compiler passes for efficient parallelization. The proposed open source implementation of OpenCL is also platform portable, enabling OpenCL on a wide range of architectures, both already commercialized and on those that are still under research. The paper describes how the portability of the implementation is achieved. We test the two aspects of portability by utilizing the kernel compiler and the OpenCL implementation to run OpenCL applications on various platforms with different styles of parallel resources. The results show that most of the benchmarked applications, when compiled using pocl, were faster or close to as fast as the best proprietary OpenCL implementation for the platform at hand.

Journal Article

A hybrid meta-heuristic scheduler algorithm for optimization of workflow scheduling in cloud heterogeneous computing environment

by Motameni, Homayun , Mirsaeid Hosseini Shirvani , Reza Noorian Talouki in Cloud computing , Genetic algorithms , Heuristic

2022

Purpose: Improvement of workflow scheduling in distributed engineering systems. Design/methodology/approach: The authors proposed a hybrid meta-heuristic optimization algorithm. Findings: The authors have improved the hybrid approach by exploiting the strengths of the genetic algorithm and simulated annealing. Originality/value: To the best of the authors’ knowledge, this paper presents a novel theorem and novel hybrid approach.

Journal Article

A Hybrid Machine Learning Model for Code Optimization

by Baghdadi, Riyadh , Hakimi, Yacine , Challal, Yacine in Algorithms , Classification , Complexity

2023

The complexity of programming modern heterogeneous systems raises huge challenges. Over the past two decades, researchers have aimed to alleviate these difficulties by employing classical Machine Learning and Deep Learning techniques within compilers to optimize code automatically. This work presents a novel approach that optimizes code using Classical Machine Learning and Deep Learning techniques together, maximizing their benefits while mitigating their drawbacks. Our proposed model extracts features from the code using Deep Learning and then applies Classical Machine Learning to map these features to specific outputs for various tasks. The effectiveness of our model is evaluated on three downstream tasks: device mapping, optimal thread coarsening, and algorithm classification. Our experimental results demonstrate that our model outperforms previous models in device mapping, with an average accuracy of 91.60% on two datasets, and in the optimal thread coarsening task, where we are the first to achieve a positive speedup on all four platforms, while achieving a comparable result of 91.48% in the algorithm classification task. Notably, our approach yields better results even with a small dataset without requiring a pre-training phase or a complex code representation, offering the advantage of reducing training time and data volume requirements.

Journal Article

A multi-task scheduling algorithm for heterogeneous information security in the Internet of Things for electricity

by Wang, Yanfeng , Zhu, Wenwei , Guo, Jingen in 68W01 , Algorithms , Cluster Analysis Method

2025

This paper constructs a security transmission model to calculate the power data, uses cluster analysis to analyze and integrate the collected data, and forms a security transmission fuzzy set, so as to construct a security system that realizes the secure transmission of power data information. Multiple tasks in the heterogeneous platform are represented using a directed acyclic graph (DAG), and suitable CPUs are classified according to the task characteristics. Two optimization objectives, minimizing the average completion time of the task and maximizing the security guarantee factor of the task, are proposed, and a mathematical model is established. Computing nodes are created and matched according to the actual application of the power IoT information security service. The QPSO algorithm is used to complete task scheduling and optimize task completion time. An environment that simulates the operation of the electric power IoT is constructed, and the performance of the constructed model is analyzed. Simulation experiments show that the interception probability is significantly reduced when using the VCM threshold transmission control scheme, from more than 90% without the scheme to 0~20% with it, indicating that the VCM threshold transmission control scheme can effectively improve the security of information transmission. Comparing the load rate of computing nodes across the three algorithms, the real-time load of this paper’s algorithm stays within the range of 58%~78% and changes more smoothly, so this paper’s algorithm is more suitable for scheduling scenarios in the electric power Internet of Things.

Journal Article

Dynamic Task Planning for Heterogeneous Platforms via Spatio-Temporal and Capability Dual-Driven Framework

by Zhu, Guangxi , Wang, Gang , Han, Changxing in Algorithms , Collaboration , Coordination

2026

Dynamic task planning for heterogeneous platforms across land, sea, air, and space is essential for achieving integrated situational awareness, yet current systems suffer from limited spatiotemporal coverage and inefficient resource scheduling. To address these challenges, we propose a novel mission planning method that integrates spatiotemporal segmentation with Deep Reinforcement Learning (DRL). The approach establishes a multidimensional spatiotemporal decomposition model to break down complex observation scenarios into manageable subtasks, while incorporating a unified accessibility–visibility computation framework that accounts for Earth curvature, platform dynamics, and sensor constraints. Using a Spatio-Temporal Adaptive Scheduling Network (STAS-Net) algorithm optimized with a multi-objective reward function covering mission completion rate, temporal coordination, and residual detection capacity, the method enables intelligent coordination of heterogeneous platforms. Experimental results across small-, medium-, and large-scale scenarios demonstrate that the proposed framework consistently achieves high target coverage (up to 98.4% in small-scale and 89.7% in large-scale tasks), with a reduction in coverage loss that is only about half of that exhibited by greedy and genetic algorithms as task scale expands. Moreover, STAS-Net maintains low planning time (as low as 9.5 s in small-scale and only 18.3 s in large-scale scenarios) and high resource utilization (reaching 86.8% under large-scale settings), substantially outperforming both baseline methods in scalability and scheduling efficiency. The framework not only establishes a solid theoretical foundation but also provides a practical and feasible solution for enhancing the overall performance of multi-platform cooperative observation systems.

Journal Article

ETA-HP: an energy and temperature-aware real-time scheduler for heterogeneous platforms

by Chakraborty, Shounak , Moulik, Sanjay , Sharma, Yanshul in Empirical analysis , Energy consumption , Energy management

2022

Modern real-time systems are based on heterogeneous multicore platforms, which help them productively meet the applications’ diverse and high computational requirements. Managing the energy and temperature of these computational platforms has attracted intense interest from researchers and specialists over recent years. This paper presents a heuristic technique, named ETA-HP, for energy and temperature efficient scheduling of a set of real-time periodic tasks on a DVFS-enabled heterogeneous multicore system. The proposed strategy operates in four stages, namely Deadline Partitioning, Task-to-Core Allocation, Temperature-Aware Scheduling, and Energy-Aware Scheduling. Our empirical analysis shows that with a variation in system workload from 50% to 100%, ETA-HP can schedule more tasks (2.52% on average) compared to the state of the art while achieving 7.29% average energy savings with a 9.59 °C reduction in the average temperature of our considered heterogeneous chip-multiprocessor consisting of 4 in-order and 4 out-of-order cores.

Journal Article

HEALERS: a heterogeneous energy-aware low-overhead real-time scheduler

by Devaraj, Rajesh , Sarkar, Arnab , Moulik, Sanjay in Algorithms , Deadlines , DVFS

2019

Devising energy-efficient scheduling strategies for real-time periodic tasks on heterogeneous platforms is a challenging and computationally demanding problem. This study proposes a low-overhead heuristic strategy called HEALERS for dynamic voltage and frequency scaling (DVFS)-cum-dynamic power management (DPM) enabled energy-aware scheduling of a set of periodic tasks executing on a heterogeneous multi-core system. The presented strategy first applies deadline-partitioning to acquire a set of distinct time-slices. At any time-slice boundary, the following three-phase operations are applied to obtain a schedule for the next time-slice: first, it computes the fragments of the execution demands of all tasks onto each of the different processing cores in the platform. Next, it generates a schedule for each task on one or more processing cores such that the total execution demand of all tasks is satisfied. Finally, HEALERS applies DVFS and DPM on all processing cores so that energy consumption within the time-slice may be minimized while not jeopardising execution requirements of the scheduled tasks. Experimental results show that the proposed scheme is not only able to achieve appreciable energy savings with respect to the state of the art (5–42% on average) but also enables a significant improvement in resource utilisation (as high as 58%).
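
The deadline-partitioning step mentioned in the abstract can be illustrated with a minimal sketch. It assumes integer task periods and implicit deadlines (each deadline equals the period), which the abstract does not state; the function name `deadline_partition` is hypothetical and this is not the authors' implementation. Time-slice boundaries are taken to be all task deadlines within one hyperperiod.

```python
from math import lcm  # available in Python 3.9+


def deadline_partition(periods):
    """Return consecutive (start, end) time-slices within one hyperperiod.

    Assumption: implicit deadlines, so every multiple of a task's period
    is a deadline and therefore a slice boundary.
    """
    hyperperiod = lcm(*periods)
    # Collect every deadline of every task up to the hyperperiod.
    boundaries = sorted(
        {k * p for p in periods for k in range(hyperperiod // p + 1)}
    )
    # Pair each boundary with the next one to form the slices.
    return list(zip(boundaries, boundaries[1:]))


slices = deadline_partition([4, 6])
print(slices)  # [(0, 4), (4, 6), (6, 8), (8, 12)]
```

Within each slice no task deadline occurs, so a scheduler of this kind only needs to decide how much of each task to run before the slice ends.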

Journal Article

Training deep neural networks: a static load balancing approach

by Haut, Juan M , Paoletti, Mercedes E , Rico-Gallego, Juan A in Accuracy , Artificial neural networks , Communication

2020

Deep neural networks are currently trained under data-parallel setups on high-performance computing (HPC) platforms, so that a replica of the full model is charged to each computational resource using non-overlapped subsets known as batches. Replicas combine the computed gradients to update their local copies at the end of each batch. However, differences in performance of resources assigned to replicas in current heterogeneous platforms induce waiting times when synchronously combining gradients, leading to an overall performance degradation. Although asynchronous communication of gradients has been proposed as an alternative, it suffers from the so-called staleness problem: the training in each replica is computed using a stale version of the parameters, which negatively impacts the accuracy of the resulting model. In this work, we study the application of well-known HPC static load balancing techniques to the distributed training of deep models. Our approach assigns a different batch size to each replica, proportional to its relative computing capacity, hence minimizing the staleness problem. Our experimental results (obtained in the context of a remotely sensed hyperspectral image processing application) show that, while the classification accuracy is kept constant, the training time substantially decreases with respect to unbalanced training. This is illustrated using heterogeneous computing platforms, made up of CPUs and GPUs with different performance.
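
The idea of giving each replica a batch size proportional to its relative computing capacity can be sketched as follows. The function name `proportional_batch_sizes`, the `speeds` input (for example, measured samples per second per device), and the largest-remainder rounding are assumptions made for illustration; the abstract does not say how capacities are measured or how fractional shares are rounded.

```python
def proportional_batch_sizes(global_batch, speeds):
    """Split global_batch across replicas in proportion to their speeds.

    speeds: hypothetical relative computing capacities, one per replica.
    """
    total = sum(speeds)
    # Ideal fractional share of the global batch for each replica.
    shares = [global_batch * s / total for s in speeds]
    sizes = [int(x) for x in shares]  # round down first
    # Give the leftover samples to the largest fractional remainders.
    leftover = global_batch - sum(sizes)
    order = sorted(
        range(len(speeds)), key=lambda i: shares[i] - sizes[i], reverse=True
    )
    for i in order[:leftover]:
        sizes[i] += 1
    return sizes


print(proportional_batch_sizes(256, [1.0, 1.0, 2.0]))  # [64, 64, 128]
```

With sizes chosen this way, a device twice as fast processes twice as many samples per step, so replicas should finish each step at roughly the same time.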

Journal Article

Heterogeneous gradient computing optimization for scalable deep neural networks

by Paoletti, Mercedes E , Haut, Juan M , Rico-Gallego, Juan A in Accuracy , Artificial neural networks , Classification

2022

Nowadays, data processing applications based on neural networks must cope with the growth in the amount of data to be processed and with the increase in both the depth and complexity of neural network architectures, and hence in the number of parameters to be learned. High-performance computing platforms are provided with fast computing resources, including multi-core processors and graphical processing units, to manage the computational burden of deep neural network applications. A common optimization technique is to distribute the workload between the processes deployed on the resources of the platform. This approach is known as data-parallelism. Each process, known as a replica, trains its own copy of the model on a disjoint data partition. Nevertheless, the heterogeneity of the computational resources composing the platform requires distributing the workload unevenly between the replicas according to their computational capabilities, to optimize the overall execution performance. Since the amount of data to be processed is different in each replica, the influence of the gradients computed by the replicas on the global parameter update should be different. This work proposes a modification of the gradient computation method that considers the different speeds of the replicas, and hence the amount of data assigned to each. The experiments were conducted on heterogeneous high-performance computing platforms for a wide range of models and datasets, showing an improvement in the final accuracy with respect to current techniques, with a comparable performance.
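
One plausible reading of "the influence of the gradients should be different" is a weighted average in which each replica's gradient counts in proportion to the number of samples it processed. The sketch below shows only that generic idea; the function name `weighted_gradient` and the weighting rule are assumptions, and the abstract does not give the authors' actual formula.

```python
def weighted_gradient(grads, batch_sizes):
    """Combine per-replica gradients, weighting each by its local batch size.

    grads: list of equal-length gradient vectors, one per replica.
    batch_sizes: number of samples each replica processed in this step.
    """
    total = sum(batch_sizes)
    dim = len(grads[0])
    # Weighted sum over replicas for each parameter, then normalize.
    return [
        sum(g[j] * b for g, b in zip(grads, batch_sizes)) / total
        for j in range(dim)
    ]


print(weighted_gradient([[1.0, 2.0], [3.0, 4.0]], [1, 3]))  # [2.5, 3.5]
```

When all batch sizes are equal, this reduces to the plain average used in homogeneous data-parallel training.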

Journal Article
