48 result(s) for "Optimizing compilers."
Space-Time Loop Tiling for Dynamic Programming Codes
We present a new space-time loop tiling approach and demonstrate its application for the generation of parallel tiled code of enhanced locality for three dynamic programming algorithms. The technique envisages that, for each loop nest statement, sub-spaces are first generated so that the intersection of them results in space tiles. Space tiles can be enumerated in lexicographical order or in parallel by using the wave-front technique. Then, within each space tile, time slices are formed, which are enumerated in lexicographical order. Target tiles are represented with multiple time slices within each space tile. We explain the basic idea of space-time loop tiling and then illustrate it by means of an example. Then, we present a formal algorithm and prove its correctness. The algorithm is implemented in the publicly available TRACO compiler. Experimental results demonstrate that parallel codes generated by means of the presented approach outperform closely related manually generated ones or those generated by using affine transformations. The main advantage of code generated by means of the presented approach is its enhanced locality due to splitting each larger space tile into multiple smaller tiles represented with time slices.
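As a rough illustration of what tiling a loop nest means, the sketch below applies plain rectangular tiling to a two-dimensional iteration space. It is not the space-time technique of the abstract and does not use TRACO; the function names `untiled` and `tiled` and the tile sizes are assumptions made for the example. It only shows that the tiled loop visits the same iterations, grouped tile by tile. Whether such a reordering is legal for a given program depends on its dependences, which this sketch ignores.

```python
def untiled(n, m):
    """Visit the n x m iteration space in plain lexicographic order."""
    return [(i, j) for i in range(n) for j in range(m)]


def tiled(n, m, ti, tj):
    """Visit the same space tile by tile, using tiles of size ti x tj.

    Hypothetical helper for illustration: tiles are enumerated in
    lexicographic order, and so are the points inside each tile.
    """
    order = []
    for ii in range(0, n, ti):                      # tile row origin
        for jj in range(0, m, tj):                  # tile column origin
            for i in range(ii, min(ii + ti, n)):    # points inside one tile
                for j in range(jj, min(jj + tj, m)):
                    order.append((i, j))
    return order


if __name__ == "__main__":
    # Same set of iterations, different visiting order.
    print(sorted(tiled(5, 7, 2, 3)) == untiled(5, 7))
    print(tiled(4, 4, 2, 2)[:4])
```

Iterations that fall in the same tile are executed close together in time, which is where the locality benefit of tiling comes from.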
Representing Integer Sequences Using Piecewise-Affine Loops
A formal, high-level representation of programs is typically needed for static and dynamic analyses performed by compilers. However, the source code of target applications is not always available in an analyzable form, e.g., to protect intellectual property. To reason on such applications, it becomes necessary to build models from observations of their execution. This paper details an algebraic approach which, taking as input the trace of memory addresses accessed by a single memory reference, synthesizes an affine loop with a single perfectly nested reference that generates the original trace. This approach is extended to support the synthesis of unions of affine loops, useful for minimally modeling traces generated by automatic transformations of polyhedral programs, such as tiling. The resulting system is capable of processing hundreds of gigabytes of trace data in minutes, minimally reconstructing 100% of the static control parts in PolyBench/C applications and 99.99% in the Pluto-tiled versions of these benchmarks. As an application example of the trace modeling method, trace compression is explored. The affine representations built for the memory traces of PolyBench/C codes achieve compression factors of the order of 10^6 and 10^3 with respect to gzip for the original and tiled versions of the traces, respectively.
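To make the idea of recovering an affine loop from an address trace concrete, here is a deliberately simplified sketch. It is not the algebraic method of the paper: it handles only a rectangular nest of depth at most two, and the names `fit_affine_2d` and `replay` are hypothetical. It guesses the inner stride and trip count from the first run of constant differences, then accepts the model only if replaying it reproduces the trace exactly.

```python
def replay(model):
    """Regenerate the address trace described by an affine model."""
    base, s_out, n_out, s_in, n_in = model
    return [base + s_out * i + s_in * j
            for i in range(n_out) for j in range(n_in)]


def fit_affine_2d(trace):
    """Try to model trace as base + s_out*i + s_in*j (i < n_out, j < n_in).

    Returns (base, s_out, n_out, s_in, n_in), or None if the trace is not
    generated by such a rectangular two-level loop. Illustrative only.
    """
    if len(trace) < 2:
        return None
    base = trace[0]
    s_in = trace[1] - trace[0]
    # Inner trip count: length of the first run with constant stride s_in.
    n_in = 1
    while n_in < len(trace) and trace[n_in] - trace[n_in - 1] == s_in:
        n_in += 1
    if len(trace) % n_in != 0:
        return None
    n_out = len(trace) // n_in
    s_out = trace[n_in] - base if n_out > 1 else 0
    model = (base, s_out, n_out, s_in, n_in)
    # Accept the model only if it reproduces the observed trace exactly.
    return model if replay(model) == list(trace) else None


if __name__ == "__main__":
    # Addresses of a 3 x 4 access with 8-byte elements and a 100-byte row pitch.
    trace = [100 * i + 8 * j for i in range(3) for j in range(4)]
    print(fit_affine_2d(trace))
```

The five integers of the model stand in for the whole trace, which is the intuition behind using such representations for trace compression.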
Using an evolutionary approach based on shortest common supersequence problem for loop fusion
In the literature, loop fusion is an effective optimization technique which tries to enhance parallelizing compilers’ performance via memory hierarchy management, and all its competing criteria create an NP-hard problem. This paper proposes an evolutionary algorithm that aims to achieve a profitable loop order which maximizes fusion, taking into account register size, parallelism and data reuse advancement. In addition, this method preserves prerequisite relations between the loops by encoding each distinct loop sequence as the shortest common supersequence (SCS) of the related dependence graph. Among the related optimization methods that focus only on fusion, this set of metrics, an evolutionary algorithm and the shortest common supersequence problem have not been considered before in this area. Despite all the envisaged complexities, experimental results confirm the accuracy and advantage of the proposed approach. However, because evolutionary methods increase compilation time, the proposed algorithm is applicable only when compilation time matters less than the quality of the outcome.
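For readers unfamiliar with the shortest common supersequence problem, the sketch below computes an SCS of two sequences with the textbook dynamic program. It is not the evolutionary algorithm of the paper, and the loop labels `L1` to `L4` are made up for the example. It only illustrates the property the abstract relies on: a supersequence keeps the relative order of every input sequence.

```python
def shortest_common_supersequence(a, b):
    """Return one shortest sequence containing both a and b as subsequences."""
    n, m = len(a), len(b)
    # dp[i][j] = length of an SCS of the suffixes a[i:] and b[j:].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if i == n:
                dp[i][j] = m - j
            elif j == m:
                dp[i][j] = n - i
            elif a[i] == b[j]:
                dp[i][j] = 1 + dp[i + 1][j + 1]
            else:
                dp[i][j] = 1 + min(dp[i + 1][j], dp[i][j + 1])
    # Walk the table from the front to rebuild one optimal supersequence.
    out, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            out.append(a[i]); i += 1; j += 1
        elif dp[i + 1][j] <= dp[i][j + 1]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out


def is_subsequence(small, big):
    """True if small can be obtained from big by deleting elements."""
    it = iter(big)
    return all(x in it for x in small)


if __name__ == "__main__":
    # Two hypothetical ordering constraints over loops L1..L4.
    order_a = ["L1", "L2", "L4"]
    order_b = ["L1", "L3", "L4"]
    scs = shortest_common_supersequence(order_a, order_b)
    print(scs, is_subsequence(order_a, scs), is_subsequence(order_b, scs))
```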
Generation of parallel synchronization-free tiled code
A novel approach to generation of parallel synchronization-free tiled code for the loop nest is presented. It is derived via a combination of the Polyhedral and Iteration Space Slicing frameworks. It uses the transitive closure of loop nest dependence graphs to carry out corrections of original rectangular tiles so that all dependences of the original loop nest are preserved under the lexicographic order of target (corrected) tiles. Then parallel synchronization-free tiled code is generated on the basis of valid (corrected) tiles applying the transitive closure of dependence graphs. The main contribution of the paper is demonstrating that the presented technique is able to generate parallel synchronization-free tiled code, provided that the exact transitive closure of a dependence graph can be calculated and there exist synchronization-free slices on the statement instance level in the loop nest. We show that the presented approach extracts such a parallelism when well-known techniques fail to extract it. Enlarging the scope of loop nests, for which synchronization-free tiled code can be generated, is achieved by means of applying the intersection of extracted slices and generated valid tiles, in contrast to forming slices of valid tiles as suggested in previously published techniques based on the transitive closure of a dependence graph. The presented approach is implemented in the publicly available TC optimizing compiler. Results of experiments demonstrating the effectiveness of the approach and the efficiency of parallel programs generated by means of it are discussed.
Checking inside the black box: regression testing by comparing value spectra
Comparing behaviors of program versions has become an important task in software maintenance and regression testing. Black-box program outputs have been used to characterize program behaviors and they are compared over program versions in traditional regression testing. Program spectra have recently been proposed to characterize a program's behavior inside the black box. Comparing program spectra of program versions offers insights into the internal behavioral differences between versions. In this paper, we present a new class of program spectra, value spectra, that enriches the existing program spectra family. We compare the value spectra of a program's old version and new version to detect internal behavioral deviations in the new version. We use a deviation-propagation call tree to present the deviation details. Based on the deviation-propagation call tree, we propose two heuristics to locate deviation roots, which are program locations that trigger the behavioral deviations. We also use path spectra (previously proposed program spectra) to approximate the program states in value spectra. We then similarly compare path spectra to detect behavioral deviations and locate deviation roots in the new version. We have conducted an experiment on eight C programs to evaluate our spectra-comparison approach. The results show that both value-spectra-comparison and path-spectra-comparison approaches can effectively expose program behavioral differences between program versions even when their program outputs are the same, and our value-spectra-comparison approach reports deviation roots with high accuracy for most programs.
Using hammock graphs to structure programs
Advanced computer architectures rely mainly on compiler optimizations for parallelization, vectorization, and pipelining. Efficient-code generation is based on a control dependence analysis to find the basic blocks and to determine the regions of control. However, unstructured branch statements, such as jumps and goto's, make control flow analysis difficult and time-consuming and result in poor code generation. Branches are part of many programming languages and occur in legacy and maintenance code as well as in assembler, intermediate languages, and byte code. A simple and effective technique is presented to convert unstructured branches into hammock graph control structures. Using three basic transformations, an equivalent program is obtained in which all control statements have a well-defined scope. In the interest of predication and branch prediction, the number of control variables has been minimized, thereby allowing a limited code replication. The correctness of the transformations has been proven using an axiomatic proof rule system. With respect to previous work, the algorithm is simpler and the branch conditions are less complex, making the program more readable and the code generation more efficient. Additionally, hammock graphs define single entry single exit regions and therefore allow localized optimizations. The restructuring method has been implemented in the parallelizing compiler FPT and makes it possible to extract parallelism from unstructured programs. The use of hammock graph transformations in other application areas such as vectorization, decompilation, and assembly program restructuring is also demonstrated.
A Compile/Run-time Environment for the Automatic Transformation of Linked List Data Structures
Irregular access patterns are a major problem for today’s optimizing compilers. In this paper, a novel approach will be presented that enables transformations that were designed for regular loop structures to be applied to linked list data structures. This is achieved by linearizing access to a linked list, after which further data restructuring can be performed. Two subsequent optimization paths will be considered: annihilation and sublimation, which are driven by the occurring regular and irregular access patterns in the applications. These intermediate codes are amenable to traditional compiler optimizations targeting regular loops. In the case of sublimation, a run-time step is involved which takes the access pattern into account and thus generates a data instance specific optimized code. Both approaches are applied to a sparse matrix multiplication algorithm and an iterative solver: preconditioned conjugate gradient. The resulting transformed code is evaluated using the major compilers for the x86 platform, GCC and the Intel C compiler.
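The basic idea of linearizing access to a linked list can be sketched as follows. This is a hand-written illustration, not the compile/run-time environment described in the abstract, and it does not reproduce the annihilation or sublimation paths; the names `Node`, `linearize`, `sum_pointer_chasing` and `sum_linearized` are hypothetical. The pointer-chasing traversal is replaced by a one-time copy into a contiguous array, after which the computation is an ordinary counted loop over indices.

```python
class Node:
    """Singly linked list node (hypothetical type for this example)."""

    def __init__(self, value, next=None):
        self.value = value
        self.next = next


def sum_pointer_chasing(head):
    """Original form: the trip count and addresses are unknown up front."""
    total, node = 0, head
    while node is not None:
        total += node.value
        node = node.next
    return total


def linearize(head):
    """Copy the list payload into a contiguous array, in traversal order."""
    values = []
    node = head
    while node is not None:
        values.append(node.value)
        node = node.next
    return values


def sum_linearized(values):
    """Regular form: a counted loop over indices with a known trip count."""
    total = 0
    for i in range(len(values)):
        total += values[i]
    return total


if __name__ == "__main__":
    head = Node(3, Node(1, Node(4, Node(1, Node(5)))))
    values = linearize(head)
    print(values, sum_pointer_chasing(head) == sum_linearized(values))
```

Once the data is in array form, the loop has the regular shape that loop transformations designed for arrays expect.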
Is IT for Geeks Only?
For the IT profession to truly excel and flourish, IT professionals must understand how to present information in both technical and nontechnical terms. Editorial board member Wes Chou describes how to move beyond "geek think" and truly excel at serving the overall organization, from users to customers.
Unpredication, unscheduling, unspeculation: reverse engineering Itanium executables
EPIC (explicitly parallel instruction computing) architectures, exemplified by the Intel Itanium, support a number of advanced architectural features, such as explicit instruction-level parallelism, instruction predication, and speculative loads from memory. However, compiler optimizations that take advantage of these features can profoundly restructure the program's code, making it potentially difficult to reconstruct the original program logic from an optimized Itanium executable. This paper describes techniques to undo some of the effects of such optimizations and thereby improve the quality of reverse engineering such executables.