Catalogue Search | MBRL

Compiler-directed scratchpad memory data transfer optimization for multithreaded applications on a heterogeneous many-core architecture

by Pang, Jianmin; Zhu, Yu; Xu, Jinlong in Chips (memory devices), Compilers, Computation

2021

The heterogeneous many-core architecture plays an important role in the fields of high-performance computing and scientific computing. It uses accelerator cores with on-chip memories to improve performance and reduce energy consumption. Scratchpad memory (SPM) is a kind of fast on-chip memory with lower energy consumption compared with a hardware cache. However, data transfer between SPM and off-chip memory can be managed only by a programmer or compiler. In this paper, we propose a compiler-directed multithreaded SPM data transfer model (MSDTM) to optimize the process of data transfer in a heterogeneous many-core architecture. We use compile-time analysis to classify data accesses, check dependences and determine the allocation of data transfer operations. We further present the data transfer performance model to derive the optimal granularity of data transfer and select the most profitable data transfer strategy. We implement the proposed MSDTM on the GCC compiler and evaluate it on Sunway TaihuLight with selected test cases from benchmarks and scientific computing applications. The experimental results show that the proposed MSDTM improves the application execution time by 5.49× and achieves an energy saving of 5.16× on average.

Journal Article

Deadlocks Detection in Multithreaded Applications Based on Source Code Analysis

by Giebas, Damian; Wojszczyk, Rafał in Communication, deadlock, Libraries

2020

This paper extends the multithreaded application source code model and shows how to use it to detect deadlocks in C language applications. Four known deadlock scenarios from the literature can be detected using our model. For every scenario we created theorems and proofs whose fulfillment guarantees the occurrence of deadlocks in multithreaded applications. The paper also compares the multithreaded application source code model with Petri nets and describes the advantages and disadvantages of both.

Journal Article

Atomicity Violation in Multithreaded Applications and Its Detection in Static Code Analysis Process

by Giebas, Damian; Wojszczyk, Rafał in Application, atomicity violation, Error-correcting codes

2020

This paper is a contribution to the field of research dealing with parallel computing, which is used in multithreaded applications. The paper discusses the characteristics of atomicity violation in multithreaded applications and develops a new definition of atomicity violation, based on previously defined relationships between operations, that can be used for atomicity violation detection. A method for detecting the conflicts that cause atomicity violations was also developed using the source code model of multithreaded applications that predicts errors in the software.

Journal Article

Co-scheduling tasks on multi-core heterogeneous systems: An energy-aware perspective

by Libutti, Simone; Fornaciari, William; Massari, Giuseppe in Algorithms, Bandwidths, coscheduling tasks

2016

Single-ISA heterogeneous multi-core processors trade off power against performance; however, threads that co-run on shared resources suffer from resource contention, which induces performance degradation and energy inefficiency. The authors introduce a novel approach to optimise the co-scheduling of multi-threaded applications on heterogeneous processors. The approach is based on the concept of a stakes function, which represents the trade-off between isolation and sharing of resources. The authors also develop a co-scheduling algorithm that uses stakes functions to optimise resource usage while mitigating resource contention, thus improving performance and energy efficiency. They validated the approach using applications from the Princeton Application Repository for Shared-Memory Computers (PARSEC) benchmark suite, obtaining up to 12.88% performance speed-up, 13.65% energy speed-up and 28.29% energy delay speed-up with respect to the standard Linux heterogeneous multi-processing scheduler.

Journal Article

PS-Cache: an energy-efficient cache design for chip multiprocessors

by Ros, Alberto; Gomez, Maria E; Valls, Joan J in Associativity, Classification, Design

2015

Power consumption has become a major design concern in current high-performance chip multiprocessors, and the problem is exacerbated as core counts grow. A significant fraction of the total power budget is often consumed by on-chip caches, so important research has focused on reducing energy consumption in these structures. To enhance performance, on-chip caches are being deployed with a high degree of associativity. Consequently, concurrently accessing all the ways in a cache set is costly in terms of energy. This paper presents the PS-Cache architecture, an energy-efficient cache design that reduces the number of accessed ways without hurting performance. The PS-Cache takes advantage of the private-shared knowledge of the referenced block to reduce energy by accessing only those ways holding the kind of block looked up. Experimental results show that, on average, the PS-Cache architecture can reduce the dynamic energy consumption of L1 and L2 caches by 22% and 40%, respectively.

Journal Article

Multithreaded Programming with C++

by Gregoire, Marc in atomic operations, atomic types, C++ programmer

2021

This chapter introduces the readers to multithreaded programming using the standard threading library. It helps the readers learn how to launch threads using std::thread and how C++20 makes it easier to write cancellable threads using std::jthread. The chapter explains how the readers can use atomic types and atomic operations to operate on shared data without having to use an explicit synchronization mechanism. It also helps the readers learn about the new synchronization primitives that C++20 brings to the table: semaphores, latches, and barriers. The chapter shows how promises and futures represent a simple inter‐thread communication channel; the readers can use futures to more easily get a result back from a thread. It concludes with a brief introduction to coroutines and a number of best practices for multithreaded application design.

Book Chapter

A Methodology for Optimizing Multithreaded System Scalability on Multicores

by Gunther, Neil; Parvu, Stefan; Subramanyam, Shanti in controlled performance measurements, J2EE applications, memcached scalability

2017

This chapter shows how scalability can be quantified using the universal scalability law (USL) by applying it to controlled performance measurements of memcached (MCD), Java 2 Platform, Enterprise Edition (J2EE) and WebLogic. When doing scalability analysis of multithreaded applications, it is important to collect the data using controlled measurements. Data collected from controlled performance measurements are only as good as the workload used to run the tests. A well‐designed workload should have the following characteristics: predictability; repeatability; and scalability. There are many well‐known techniques for achieving better scalability: collocation, caching, pooling and parallelism, to name a few. Performance models are essential not only for prediction but also for interpreting scalability measurements. Many performance modeling tools, such as event‐based simulators and analytic solvers, are based on a queueing paradigm that requires measured service times as modeling inputs. The chapter presents some case studies that demonstrate how USL methodology has been successfully applied.
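As a rough illustration of the universal scalability law mentioned in this abstract, the sketch below evaluates the commonly cited form C(N) = N / (1 + α(N − 1) + βN(N − 1)), where α models contention and β models coherency delay. The function names and the coefficient values are assumptions chosen for illustration; they are not taken from the chapter or its case studies.

```python
import math


def usl_capacity(n, alpha, beta):
    """Relative capacity C(N) at load N under the universal scalability law."""
    return n / (1.0 + alpha * (n - 1) + beta * n * (n - 1))


def usl_peak_load(alpha, beta):
    """Load N* at which C(N) is maximal: sqrt((1 - alpha) / beta)."""
    return math.sqrt((1.0 - alpha) / beta)


# Hypothetical coefficients for illustration only, not measured values.
ALPHA, BETA = 0.05, 0.001

for n in (1, 8, 32, 64):
    print(n, round(usl_capacity(n, ALPHA, BETA), 2))
print("peak near N =", round(usl_peak_load(ALPHA, BETA), 1))
```

In practice α and β would be fitted to controlled throughput measurements; with a nonzero β the curve peaks and then declines, which is why the model is useful for interpreting measurements as well as for prediction.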

Book Chapter

Parallel and automatic mesh sizing field generation for complicated CAD models

by Liu, Tiantian; Leng, Juelin; Xu, Quan

2024

Journal Article

An optimized security solution based on trust value for multithreaded wireless body area network communication

by Anvekar, Dinesh K; Saha, Sanchari in Algorithms, Battlefields, Body area networks

2021

Purpose: Security of wireless body area network (WBAN) communication is highly important as it directly impacts human life. This paper focuses on the battlefield application area of WBAN, where data must be protected against various possible attacks before being delivered over a public network. Design/methodology/approach: Providing a strong security system is still a research challenge because the sensors used have low computational power for protecting transmitted data. The authors propose an optimized security solution for multithreaded wireless body area network (MWBAN) using a trust-based distributed group key management technique to overcome the drawbacks of the existing elliptical curve cryptography-homomorphism (ECC-Homomorphism) scheme and the coded cooperative data exchange group key management (CCDE_GKM) scheme. Findings: The proposed optimized security solution is implemented for a particular deployment strategy and test runs are conducted. When the number of attack nodes is increased to 25, the proposed trust-based distributed group key management technique improves on ECC-Homomorphism and CCDE_GKM in several performance parameters: throughput drops only to 10.11 Kbps, average delay is just 3.4 s, maximum energy consumption is 29 joules, packet loss is only 12.3 per cent, true attack detection is 90.9 per cent, false attack detection is only 8.9 per cent and true negative detection is 84 per cent. Social implications: Medical care can be provided to human beings with much ease and flexibility via remote monitoring. Users can be anywhere and carry on their everyday work while their health parameters are monitored remotely and their data are transmitted securely to the health-care center for medical service when needed. Originality/value: This paper presents an optimized security solution for MWBAN using a trust-based distributed group key management technique in which bilinear pairing theory is the major cryptographic base. The optimal key is selected based on trust value, and attack nodes are also detected based on trust value to control participation in communication.

Journal Article

Parallelization Strategy for Elementary Morphological Operators on Graphs: Distance-Based Algorithms and Implementation on Multicore Shared-Memory Architecture

by Saouli, Rachida; Youkana, Imane; Cousty, Jean in Algorithms, Applications of Mathematics, Computer architecture

2017

This article focuses on the (unweighted) graph-based mathematical morphology operators presented in Cousty et al. (CVIU 117(4):370–385, 2013). These operators depend on a size parameter that specifies the number of iterations of elementary dilations/erosions. Thus, the associated running times increase with the size parameter, the algorithms running in O(λ·n) time, where n is the size of the underlying graph and λ is the size parameter. In this article, we present distance maps that allow us to recover (by thresholding) all considered dilations and erosions. The algorithms based on distance maps allow the operators to be computed with a single linear O(n) time iteration, without any dependence on the size parameter. Then, we investigate a parallelization strategy to compute these distance maps. The idea is to iteratively build the successive level-sets of the distance maps, each level-set being traversed in parallel. Under some reasonable assumptions about the graph and the sets to be dilated, our parallel algorithm runs in O(n/p + K log₂ p) time, where n, p, and K are the size of the graph, the number of available processors, and the number of distinct level-sets of the distance map, respectively. Then, implementations of the proposed algorithm on a shared-memory multicore architecture are described and assessed on datasets of 45 images and 6 textured three-dimensional meshes, showing a reduction of the processing time by a factor of up to 55 over the previously available implementations on an 8-core architecture.
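The distance-map idea in this abstract can be sketched in a few lines. The example below is a simplified, sequential illustration on vertex sets only, not the authors' algorithm: a multi-source breadth-first search builds the distance map level-set by level-set, and a dilation of any size is then recovered by thresholding. The function names and the adjacency-dictionary representation are assumptions made for the example.

```python
def distance_map(adjacency, seeds):
    """Multi-source BFS: hop distance from each reachable vertex to the nearest seed."""
    dist = {v: 0 for v in seeds}
    frontier = list(seeds)  # current level-set
    level = 0
    while frontier:
        level += 1
        next_frontier = []
        # Vertices of one level-set are independent of each other; this is the
        # loop a parallel version could split across processors.
        for u in frontier:
            for v in adjacency[u]:
                if v not in dist:
                    dist[v] = level
                    next_frontier.append(v)
        frontier = next_frontier
    return dist


def dilation(adjacency, seeds, size):
    """Dilation of `seeds` by `size` steps, recovered by thresholding the map."""
    dist = distance_map(adjacency, seeds)
    return {v for v, d in dist.items() if d <= size}


# Hypothetical toy graph: a path 0 - 1 - 2 - 3 - 4.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(sorted(dilation(path, {0}, 2)))
```

Once the map is computed, dilations of every size come from the same dictionary, which is what removes the dependence on the size parameter from the running time.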

Journal Article
