Catalogue Search | MBRL

Looping with Disney Pixar Finding Dory

by Loya, Allyssa, author , Pixar (Firm) , Loya, Allyssa. Disney coding adventures in Finding Dory (Motion picture) Juvenile literature. , Finding Dory (Motion picture). , Loop tiling (Computer science) Juvenile literature.

2019

A simple, low-level, unplugged introduction to looping designed for young readers not yet ready for coding on computers. Beloved characters Dory and Nemo, from the world-famous Disney movie Finding Dory, draw in readers new to coding concepts-- Provided by publisher.

Book

Share this book

Add to My Shelf

A Methodology for Efficient Tile Size Selection for Affine Loop Kernels

by Voros, Nikolaos , Djemame, Karim , Keramidas, Georgios in Associativity , Empirical analysis , Fourier transforms

2022

Reducing the number of data accesses in memory hierarchy is of paramount importance on modern computer systems. One of the key optimizations addressing this problem is loop tiling, a well-known loop transformation that enhances data locality in memory hierarchy. The selection of an appropriate tile size is tackled by using both static (analytical) and dynamic empirical (auto-tuning) methods. Current analytical models are not accurate enough to effectively model the complex modern memory hierarchies and loop kernels with diverse characteristics, while auto-tuning methods are either too time-consuming (due to the huge search space) or less accurate (when heuristics are used to reduce the search space). In this paper, we reveal two important inefficiencies of current analytical loop tiling methods and we provide the theoretical background on how current methods can address these inefficiencies. To this end, we propose a new loop tiling method for affine loop kernels where the cache size, cache line size and cache associativity are better utilized, compared to the existing methods. Our evaluation results prove the efficiency of the proposed method in terms of cache misses and execution time, against related works, icc/gcc compilers and Pluto tool, on x86 and ARM based platforms.

Journal Article

Share this book

Add to My Shelf

Nothing loopy about this : what are loops and conditionals?

by Cleary, Brian P., 1959- author , Goneau, Martin, illustrator in Loop tiling (Computer science) Juvenile literature. , Conditionals (Logic) Juvenile literature. , Computer programming Juvenile literature.

Book

Share this book

Add to My Shelf

Time and Energy Benefits of Using Automatic Optimization Compilers for NPDP Tasks

by Palkowski, Marek , Gruzewski, Mateusz in Algorithms , Analysis , Benchmarks

2023

In this article, we analyze the program codes generated automatically using three advanced optimizers: Pluto, Traco, and Dapt, which are specifically tailored for the NPDP benchmark set. This benchmark set comprises ten program loops, predominantly from the field of bioinformatics. The codes exemplify dynamic programming, a challenging task for well-known tools used in program loop optimization. Given the intricacy involved, we opted for three automatic compilers based on the polyhedral model and various loop-tiling strategies. During our evaluation of the code’s performance, we meticulously considered locality and concurrency to accurately estimate time and energy efficiency. Notably, we dedicated significant attention to the latest Dapt compiler, which applies space–time loop tiling to generate highly efficient code for the NPDP benchmark suite loops. By employing the aforementioned optimizers and conducting an in-depth analysis, we aim to demonstrate the effectiveness and potential of automatic transformation techniques in enhancing the performance and energy efficiency of dynamic programming codes.

Journal Article

Share this book

Add to My Shelf

Generating Loop Patterns with a Genetic Algorithm and a Probabilistic Cellular Automata Rule

by Hoffmann, Rolf in Algorithms , Cellular automata , Evolution

2023

The objective is to find a Cellular Automata (CA) rule that can generate “loop patterns”. A loop pattern is given by ones on a zero background showing loops. In order to find out how loop patterns can be locally defined, tentative loop patterns are generated by a genetic algorithm in a preliminary stage. A set of local matching tiles is designed and checked whether they can produce the aimed loop patterns by the genetic algorithm. After having approved a certain set of tiles, a probabilistic CA rule is designed in a methodical way. Templates are derived from the tiles, which then are used in the CA rule for matching. In order to drive the evolution to the desired patterns, noise is injected if the templates do not match or other constraints are not fulfilled. Simulations illustrate that loops and connected loops can be evolved by the CA rule.

Journal Article

Share this book

Add to My Shelf

Space-Time Loop Tiling for Dynamic Programming Codes

by Palkowski, Marek , Bielecki, Wlodzimierz in Affine transformations , Algorithms , Codes

2021

We present a new space-time loop tiling approach and demonstrate its application for the generation of parallel tiled code of enhanced locality for three dynamic programming algorithms. The technique envisages that, for each loop nest statement, sub-spaces are first generated so that the intersection of them results in space tiles. Space tiles can be enumerated in lexicographical order or in parallel by using the wave-front technique. Then, within each space tile, time slices are formed, which are enumerated in lexicographical order. Target tiles are represented with multiple time slices within each space tile. We explain the basic idea of space-time loop tiling and then illustrate it by means of an example. Then, we present a formal algorithm and prove its correctness. The algorithm is implemented in the publicly available TRACO compiler. Experimental results demonstrate that parallel codes generated by means of the presented approach outperform closely related manually generated ones or those generated by using affine transformations. The main advantage of code generated by means of the presented approach is its enhanced locality due to splitting each larger space tile into multiple smaller tiles represented with time slices.

Journal Article

Share this book

Add to My Shelf

A methodology pruning the search space of six compiler transformations by addressing them together as one problem and by exploiting the hardware architecture details

by Kelefouras, Vasilios in Artificial Intelligence , Associativity , Compilers

2017

Today’s compilers have a plethora of optimizations-transformations to choose from, and the correct choice, order as well parameters of transformations have a significant/large impact on performance; choosing the correct order and parameters of optimizations has been a long standing problem in compilation research, which until now remains unsolved; the separate sub-problems optimization gives a different schedule/binary for each sub-problem and these schedules cannot coexist, as by refining one degrades the other. Researchers try to solve this problem by using iterative compilation techniques but the search space is so big that it cannot be searched even by using modern supercomputers. Moreover, compiler transformations do not take into account the hardware architecture details and data reuse in an efficient way. In this paper, a new iterative compilation methodology is presented which reduces the search space of six compiler transformations by addressing the above problems; the search space is reduced by many orders of magnitude and thus an efficient solution is now capable to be found. The transformations are the following: loop tiling (including the number of the levels of tiling), loop unroll, register allocation, scalar replacement, loop interchange and data array layouts. The search space is reduced (a) by addressing the aforementioned transformations together as one problem and not separately, (b) by taking into account the custom hardware architecture details (e.g., cache size and associativity) and algorithm characteristics (e.g., data reuse). The proposed methodology has been evaluated over iterative compilation and gcc/icc compilers, on both embedded and general purpose processors; it achieves significant performance gains at many orders of magnitude lower compilation time.

Journal Article

Share this book

Add to My Shelf

Memory-Optimized Wavefront Parallelism on GPUs

by Li, Yuanzhe , Schwiebert, Loren in Alignment , Configuration management , Harnesses

2020

Wavefront parallelism is a well-known technique for exploiting the concurrency of applications that execute nested loops with uniform data dependencies. Recent research of such applications, which range from sequence alignment tools to partial differential equation solvers, has used GPUs to benefit from the massively parallel computing resources. To achieve optimal performance, tiling has been introduced as a popular solution to achieve a balanced workload. However, the use of hyperplane tiles increases the cost of synchronization and leads to poor data locality. In this paper, we present a highly optimized implementation of the wavefront parallelism technique that harnesses the GPU architecture. A balanced workload and maximum resource utilization are achieved with an extremely low synchronization overhead. We design the kernel configuration to significantly reduce the minimum number of synchronizations required and also introduce an inter-block lock to minimize the overhead of each synchronization. In addition, shared memory is used in place of the L1 cache. The well-tailored mapping of the operations to the shared memory improves both spatial and temporal locality. We evaluate the performance of our proposed technique for four different applications: sequence alignment, edit distance, summed-area table, and 2D-SOR. The performance results demonstrate that our method achieves speedups of up to six times compared to the previous best-known hyperplane tiling-based GPU implementation.

Journal Article

Share this book

Add to My Shelf

Parallel Tiled Code for Computing General Linear Recurrence Equations

by Bielecki, Włodzimierz , Błaszyński, Piotr in Chemical partition , Computation , Multidimensional data

2021

In this article, we present a technique that allows us to generate parallel tiled code to calculate general linear recursion equations (GLRE). That code deals with multidimensional data and it is computing-intensive. We demonstrate that data dependencies available in an original code computing GLREs do not allow us to generate any parallel code because there is only one solution to the time partition constraints built for that program. We show how to transform the original code to another one that exposes dependencies such that there are two linear distinct solutions to the time partition restrictions derived from these dependencies. This allows us to generate parallel 2D tiled code computing GLREs. The wavefront technique is used to achieve parallelism, and the generated code conforms to the OpenMP C/C++ standard. The experiments that we conducted with the resulting parallel 2D tiled code show that this code is much more efficient than the original serial code computing GLREs. Code performance improvement is achieved by allowing parallelism and better locality of the target code.

Journal Article

Share this book

Add to My Shelf

Revisiting the Parallel Strategy for DOACROSS Loops

by Cui, Yuan-Zhen , Zhang, Dong , Wu, Wei-Guo in Artificial Intelligence , Communication , Computer engineering

2019

DOACROSS loops are significant parts in many important scientific and engineering applications, which are generally exploited pipeline/wave-front parallelism by loop transformations. However, previous work almost statically performs iterations in parallel threads, thus causing a waste of computing resources in thread synchronization. This paper proposes a brand-new parallel strategy for DOACROSS loops that provides a dynamic task assignment with reduced dependences to achieve wave-front parallelism through loop tiling. The proposed strategy uses a master-slave parallel mode and some customized structures to realize dynamic and flexible parallelization, which effectively avoids threads from waiting in communication. An efficient tile size selection (TSS) approach is also proposed to preserve data reuse in cache for tiled codes. The experimental results show that the proposed parallel strategy obtains good and stable speedups over six typical benchmarks with different problem sizes and different numbers of threads on an Intel ® Xeon ® 32-core server. And it outperforms two static strategies, a barrier-based strategy and a post/wait-based strategy, by 32% and 20% in average performance, respectively. This strategy also yields a better performance than a mutex-based dynamic strategy. Besides, it has been demonstrated that the proposed TSS approach can achieve a near-optimal performance and is comparable with a state-of-the-art TSS approach.

Journal Article

Share this book

Add to My Shelf

Language Selector

MBRLGlobalSearch

Language Selector

Catalogue Search | MBRL

Search Results Heading

Explore the vast range of titles available.

MBRLSearchResults

MBRLHappinessMeter