Catalogue Search | MBRL
Explore the vast range of titles available.
676 result(s) for "Supercomputers Programming."
High Performance Computing
2010
Covering all three levels of parallelism, this book presents techniques that address performance issues in the programming of HPC applications. Drawing on their experience with AMD chips and with Cray Inc. systems, interconnects, and software, the authors explore the problems that create bottlenecks in attaining good performance. After discussing architectural and software challenges, they outline a strategy for porting and optimizing an existing application to a large MPP system. They also introduce the use of GPGPUs for carrying out HPC computations.
Development of Parallel Methods for a 1024-Processor Hypercube
by Benner, Robert E.; Montry, Gary R.; Gustafson, John L.
in Algorithms, Applied sciences, Approximation
1988
We have developed highly efficient parallel solutions for three practical, full-scale scientific problems: wave mechanics, fluid dynamics, and structural analysis. Several algorithmic techniques are used to keep communication and serial overhead small as both problem size and number of processors are varied. A new parameter, operation efficiency, is introduced that quantifies the tradeoff between communication and redundant computation. A 1024-processor MIMD ensemble is measured to be 502 to 637 times as fast as a single processor when problem size for the ensemble is fixed, and 1009 to 1020 times as fast as a single processor when problem size per processor is fixed. The latter measure, denoted scaled speedup, is developed and contrasted with the traditional measure of parallel speedup. The scaled-problem paradigm better reveals the capabilities of large ensembles, and permits detection of subtle hardware-induced load imbalances (such as error correction and data-dependent MFLOPS rates) that may become increasingly important as parallel processors increase in node count. Sustained performance for the applications is 70 to 130 MFLOPS, validating the massively parallel ensemble approach as a practical alternative to more conventional processing methods. The techniques presented appear extensible to even higher levels of parallelism than the 1024-processor level explored here.
Journal Article
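The scaled-speedup measure contrasted with traditional (fixed-size) speedup in the abstract above can be sketched as two simple models. This is an illustrative sketch only, assuming a serial fraction `s` of the work; the numbers are not the paper's measured data.

```python
# Fixed-size (Amdahl-style) vs scaled (Gustafson-style) speedup models,
# as contrasted in the abstract. s = serial fraction, p = processor count.

def fixed_size_speedup(s, p):
    """Problem size held fixed as processors are added."""
    return 1.0 / (s + (1.0 - s) / p)

def scaled_speedup(s, p):
    """Problem size per processor held fixed (scaled speedup)."""
    return s + (1.0 - s) * p

p = 1024
print(f"fixed-size speedup (s=1%): {fixed_size_speedup(0.01, p):.1f}")
print(f"scaled speedup     (s=1%): {scaled_speedup(0.01, p):.1f}")
```

Even a 1% serial fraction caps fixed-size speedup far below 1024, while the scaled measure stays near the full processor count, which is why the scaled-problem paradigm better reveals the capabilities of large ensembles.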
A Note on Downdating the Cholesky Factorization
by van Dooren, P.; Brent, R. P.; de Hoog, F. R.
in Algorithms, Error analysis, Exact sciences and technology
1987
We analyse and compare three algorithms for "downdating" the Cholesky factorization of a positive definite matrix. Although the algorithms are closely related, their numerical properties differ. Two algorithms are stable in a certain "mixed" sense while the other is unstable. In addition to comparing the numerical properties of the algorithms, we compare their computational complexity and their suitability for implementation on parallel or vector computers.
Journal Article
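To make the downdating problem concrete: given an upper-triangular factor $R$ with $A = R^T R$, one seeks $\tilde R$ with $\tilde R^T \tilde R = A - xx^T$. The sketch below uses one common approach (LINPACK-style hyperbolic rotations); the paper compares three related algorithms, and this is not necessarily any one of them.

```python
import numpy as np

def cholesky_downdate(R, x):
    """Given upper-triangular R with A = R.T @ R, return upper-triangular
    R1 with R1.T @ R1 = A - outer(x, x), via hyperbolic rotations.
    Requires A - x x^T to remain positive definite."""
    R = R.astype(float).copy()
    x = x.astype(float).copy()
    n = R.shape[0]
    for k in range(n):
        r = np.sqrt(R[k, k]**2 - x[k]**2)     # NaN here => not downdatable
        c, s = r / R[k, k], x[k] / R[k, k]
        R[k, k] = r
        R[k, k+1:] = (R[k, k+1:] - s * x[k+1:]) / c
        x[k+1:] = c * x[k+1:] - s * R[k, k+1:]
    return R

# Usage: downdate, then check against the desired identity.
rng = np.random.default_rng(0)
B = rng.standard_normal((6, 4))
A = B.T @ B + np.eye(4)              # positive definite
R = np.linalg.cholesky(A).T          # upper-triangular factor
x = 0.1 * rng.standard_normal(4)     # small, so A - x x^T stays PD
R1 = cholesky_downdate(R, x)
assert np.allclose(R1.T @ R1, A - np.outer(x, x))
```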
A Parallel and Vector Variant of the Cyclic Reduction Algorithm
1988
The Buneman variant of the block cyclic reduction algorithm begins as a highly parallel algorithm, but collapses with each reduction to a very serial one. Using partial fraction expansions of rational matrix functions, it is shown how to regain the parallelism. The resulting algorithm using $n^2$ processors runs in $O(\log^2 n)$ time.
Journal Article
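The key trick in the abstract, rewriting a product of shifted inverses as a sum of independent shifted solves via a scalar partial-fraction expansion, can be verified on a toy example. The matrix and shifts below are illustrative assumptions; the only requirement is that the shifts be distinct and away from the spectrum.

```python
import numpy as np

# Product of shifted inverses, evaluated serially vs. as a partial-fraction
# sum of independent solves (each solve could run in parallel):
#   prod_i (A - t_i I)^{-1} v  ==  sum_i alpha_i (A - t_i I)^{-1} v,
#   alpha_i = 1 / prod_{j != i} (t_i - t_j),  for distinct shifts t_i.

rng = np.random.default_rng(1)
n = 5
A = rng.standard_normal((n, n))
v = rng.standard_normal(n)
taus = np.array([5.0, 8.0, 12.0])    # distinct shifts, outside the spectrum

# Serial evaluation: one solve feeds the next.
y_serial = v.copy()
for t in taus:
    y_serial = np.linalg.solve(A - t * np.eye(n), y_serial)

# Partial fractions: every solve uses the same right-hand side v.
y_parallel = np.zeros(n)
for i, ti in enumerate(taus):
    alpha = 1.0 / np.prod([ti - tj for j, tj in enumerate(taus) if j != i])
    y_parallel += alpha * np.linalg.solve(A - ti * np.eye(n), v)

assert np.allclose(y_serial, y_parallel)
```

Because all the factors are rational functions of the same matrix $A$, they commute, so the scalar identity carries over; this is what lets the collapsed serial tail of cyclic reduction be re-parallelized.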
A Parallel Triangular Solver for a Distributed-Memory Multiprocessor
by Li, Guangye; Coleman, Thomas F.
in Algorithms, Applied mathematics, Exact sciences and technology
1988
We consider solving triangular systems of linear equations on a distributed-memory multiprocessor which allows for a ring embedding. Specifically, we propose a parallel algorithm, applicable when the triangular matrix is distributed by column in a wrap fashion. Numerical experiments indicate that the new algorithm is very efficient in some circumstances (in particular, when the size of the problem is sufficiently large relative to the number of processors). A theoretical analysis confirms that the total running time varies linearly, with respect to the matrix order, up to a threshold value of the matrix order, after which the dependence is quadratic. Moreover, we show that total message traffic is essentially the minimum possible. Finally, we describe an analogous row-oriented algorithm.
Journal Article
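The column-wrap distribution in the abstract can be sketched with a column-oriented substitution in which column $j$ is owned by processor $j \bmod p$. This is a minimal single-process simulation of the data distribution only; the ring communication the paper analyzes is not modeled, and the processor count is an assumption.

```python
import numpy as np

def wrapped_column_solve(L, b, p=4):
    """Solve L y = b (L lower triangular) column by column, tracking which
    'processor' owns each column under wrap mapping: owner(j) = j % p."""
    n = L.shape[0]
    y = b.astype(float).copy()
    columns_done = [0] * p               # per-processor column count
    for j in range(n):
        y[j] /= L[j, j]                  # owner of column j finishes y[j]
        columns_done[j % p] += 1
        y[j+1:] -= L[j+1:, j] * y[j]     # column update by the owner of j
    return y, columns_done

rng = np.random.default_rng(2)
n = 8
L = np.tril(rng.standard_normal((n, n))) + 4 * np.eye(n)
b = rng.standard_normal(n)
y, columns_done = wrapped_column_solve(L, b, p=4)
assert np.allclose(L @ y, b)
assert columns_done == [2, 2, 2, 2]      # wrap mapping balances the columns
```

The wrap mapping keeps work balanced even though later columns of a triangular matrix carry less work, which is one reason the column-wrap layout suits this algorithm.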
A Nearly Optimal Parallel Algorithm for Constructing Depth First Spanning Trees in Planar Graphs
by He, Xin; Yesha, Yaacov
in Algorithmics. Computability. Computer arithmetics, Algorithms, Applied sciences
1988
This paper presents a parallel algorithm for constructing depth first spanning trees in planar graphs. The algorithm takes $O(\log^2 n)$ time with $O(n)$ processors on a concurrent read concurrent write parallel random access machine (PRAM). The best previously known algorithm for the problem takes $O(\log^3 n)$ time with $O(n^4)$ processors on a PRAM. Our algorithm is within an $O(\log^2 n)$ factor of optimality.
Journal Article
On Maintaining Dynamic Information in a Concurrent Environment
This paper considers the amount of cooperation required for independent asynchronous processes to share a simple dynamic data structure. We present a scheme for designing efficient concurrent algorithms to add and remove elements from a shared pool of elements. The efficiency is measured mainly by the number of non-local operations that a process may have to make. Non-local operations may involve writing into a shared variable, locking, or sending a message, hence they introduce interference (or require cooperation). We derive upper and lower bounds on the interference in the worst case. Applications to distributed computation are also discussed.
Journal Article
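The interference measure in the abstract, the number of non-local operations a process must make, can be made concrete with a deliberately naive baseline: a shared pool guarded by a single lock, where every add or remove costs one non-local operation (a lock acquisition). The class and counter below are illustrative assumptions, not the paper's scheme, which is designed to do better than this baseline.

```python
import threading

class LockedPool:
    """Baseline shared pool: every operation acquires the shared lock,
    i.e. performs exactly one non-local operation."""
    def __init__(self):
        self._items = []
        self._lock = threading.Lock()
        self.nonlocal_ops = 0            # lock acquisitions = interference

    def add(self, x):
        with self._lock:
            self.nonlocal_ops += 1
            self._items.append(x)

    def remove(self):
        with self._lock:
            self.nonlocal_ops += 1
            return self._items.pop() if self._items else None

pool = LockedPool()
threads = [threading.Thread(target=pool.add, args=(i,)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert pool.nonlocal_ops == 8            # one non-local op per add
assert sorted(pool._items) == list(range(8))
```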
Necessary and Sufficient Conditions for the Existence of Local Matrix Decompositions
1988
Let $D = (V, E)$ be a directed graph with $n$ vertices. We define the notion of a local matrix with respect to $D$ and we show that every $n \times n$ matrix, over the real or complex numbers, can be factored into a product of local matrices with respect to $D$ if and only if $D$ is strongly connected and contains all loops. We discuss the significance of this result with respect to parallel computation of linear transforms on SIMD processor arrays. We observe that the result can be used to associate with certain irreducible $n \times n$ matrices a generating set of the semigroup of all $n \times n$ matrices under matrix multiplication.
Journal Article
A Fully Parallel Algorithm for the Symmetric Eigenvalue Problem
1987
In this paper we present a parallel algorithm for the symmetric algebraic eigenvalue problem. The algorithm is based upon a divide and conquer scheme suggested by Cuppen for computing the eigensystem of a symmetric tridiagonal matrix. We extend this idea to obtain a parallel algorithm that retains a number of active parallel processes that is greater than or equal to the initial number throughout the course of the computation. We give a new deflation technique which together with a robust root finding technique will assure computation of an eigensystem to full accuracy in the residuals and in the orthogonality of eigenvectors. A brief analysis of the numerical properties and sensitivity to round off error is presented to indicate where numerical difficulties may occur. The algorithm is able to exploit parallelism at all levels of the computation and is well suited to a variety of architectures. Computational results are presented for several machines. These results are very encouraging with respect to both accuracy and speedup. A surprising result is that the parallel algorithm, even when run in serial mode, can be significantly faster than the previously best sequential algorithm on large problems, and is effective on moderate size problems when run in serial mode.
Journal Article
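The divide step of the divide-and-conquer scheme described above can be sketched: a symmetric tridiagonal $T$ is torn at position $k$ into two independent tridiagonal subproblems plus a rank-one correction $\beta\, vv^T$ with $v = e_k + e_{k+1}$. The toy matrix below is an assumption; the conquer step (the secular-equation root finding and deflation the abstract describes) is not shown.

```python
import numpy as np

def tear(T, k):
    """Return (T1, T2, beta, v) with T = blockdiag(T1, T2) + beta * v v^T,
    where beta is the off-diagonal entry at the tear point and
    v = e_k + e_{k+1}. T1 and T2 can then be solved independently."""
    n = T.shape[0]
    beta = T[k, k + 1]
    T1 = T[:k + 1, :k + 1].copy()
    T2 = T[k + 1:, k + 1:].copy()
    T1[k, k] -= beta                 # compensate diagonals for the
    T2[0, 0] -= beta                 # rank-one correction
    v = np.zeros(n)
    v[k] = v[k + 1] = 1.0
    return T1, T2, beta, v

n = 6
d = np.arange(1.0, n + 1)            # toy diagonal
e = np.full(n - 1, 0.5)              # toy off-diagonal
T = np.diag(d) + np.diag(e, 1) + np.diag(e, -1)
T1, T2, beta, v = tear(T, 2)
block = np.block([[T1, np.zeros((3, 3))], [np.zeros((3, 3)), T2]])
assert np.allclose(block + beta * np.outer(v, v), T)
# Eigensystems of T1 and T2 then seed the secular-equation solve for T.
```

Because the two subproblems are independent, they can be solved on separate processors and the tearing applied recursively, which is how the scheme keeps the number of active parallel processes from shrinking as the computation proceeds.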