Asset Details
Improving reliability and performance of high performance computing applications
by Ma, Hongyi
in Computer science
2015
Dissertation
Overview
With the growing adoption of parallel programming on multi-core, multiprocessor, and multithreaded hardware, more and more applications are written in concurrent programming models such as MPI, OpenMP, and hybrid MPI/OpenMP. However, developing concurrent programs is extremely difficult: concurrency introduces errors that do not occur in traditional sequential programs, such as data races, deadlocks, and thread-safety violations. The performance of concurrent programs poses a further research challenge. This dissertation presents an integrated static and dynamic program analysis framework that addresses these concurrency issues in OpenMP multithreaded applications and in the hybrid OpenMP/MPI programming model. It also introduces an approach for reallocating computing resources to improve the performance of MPI parallel programs in container-based virtual clouds.

First, we present the OpenMP Analysis Toolkit (OAT), which uses symbolic analysis backed by a Satisfiability Modulo Theories (SMT) solver to detect data races and deadlocks in OpenMP applications. Our approach approximates the real execution schedules of an OpenMP program through schedule permutation with partial order reduction, which improves analysis efficiency. We conducted experiments on real-world OpenMP benchmarks, comparing OAT with two commercial dynamic analysis tools, Intel Thread Checker and Sun Thread Analyzer, and one commercial static analysis tool, Viva64 PVS-Studio. The experiments show that our symbolic analysis approach is more accurate than static analysis and more efficient and scalable than the dynamic analysis tools, with fewer false positives and false negatives.

The second part of the dissertation proposes an approach that integrates static and dynamic program analyses to check for thread-safety violations in hybrid MPI/OpenMP programs.
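The abstract does not spell out how OAT's schedule permutation with partial order reduction works internally, but the underlying idea — many interleavings are behaviourally equivalent, so only one representative per equivalence class needs to be analysed — can be illustrated with a small pure-Python sketch. The `Op` type, the canonicalisation via Mazurkiewicz-trace normal forms, and the conflict check below are our own simplification for illustration, not OAT code (OAT itself works via SMT-based symbolic analysis):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Op:
    tid: int     # thread id
    var: str     # shared variable accessed
    write: bool  # True for a write access

def dependent(a: Op, b: Op) -> bool:
    """Program order (same thread) and conflicting accesses to the same
    variable (at least one write) must keep their relative order."""
    return a.tid == b.tid or (a.var == b.var and (a.write or b.write))

def interleavings(t1, t2):
    """All ways to merge two per-thread operation sequences."""
    if not t1:
        yield list(t2); return
    if not t2:
        yield list(t1); return
    for rest in interleavings(t1[1:], t2):
        yield [t1[0]] + rest
    for rest in interleavings(t1, t2[1:]):
        yield [t2[0]] + rest

def canonical(trace):
    """Normal form of a trace: repeatedly swap adjacent *independent* ops
    so the lower thread id comes first.  Interleavings with the same
    normal form are behaviourally equivalent, so a reduced analysis only
    needs one representative per class."""
    t = list(trace)
    changed = True
    while changed:
        changed = False
        for i in range(len(t) - 1):
            a, b = t[i], t[i + 1]
            if not dependent(a, b) and a.tid > b.tid:
                t[i], t[i + 1] = b, a
                changed = True
    return tuple(t)

def races(trace):
    """Variables with conflicting accesses from different threads."""
    return {a.var for i, a in enumerate(trace)
            for b in trace[i + 1:]
            if a.tid != b.tid and dependent(a, b)}

# Thread 0 writes x then reads a; thread 1 reads x then reads b.
t1 = [Op(0, "x", True),  Op(0, "a", False)]
t2 = [Op(1, "x", False), Op(1, "b", False)]

all_scheds = list(interleavings(t1, t2))
classes = {canonical(s) for s in all_scheds}
print(len(all_scheds), len(classes), sorted(races(t1 + t2)))
```

Here six raw interleavings collapse to two equivalence classes (determined solely by whether the write to `x` precedes the read of `x`), which is the kind of search-space reduction that makes exhaustive schedule exploration tractable.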
We use an innovative method that transforms thread-safety violation problems into race-condition problems. In our approach, static analysis identifies the MPI calls related to thread-safety violations and replaces them with our own MPI wrappers, which access designated shared variables. Because the static analysis avoids instrumenting unrelated code, runtime overhead is significantly reduced. In the dynamic analysis, both happens-before and lockset-based race detection algorithms check for races on these shared variables; by detecting races, we identify thread-safety violations according to the calls' specifications. Our experimental evaluation on real-world applications shows that the approach is both accurate and efficient.

Finally, the dissertation describes an approach that uses adaptive resource management, enabled by container-based virtualization, to automatically tune the performance of MPI programs in the cloud. High Performance Computing (HPC) in the cloud has great potential as an effective and convenient option for launching HPC applications, but many open problems remain before the cloud is fully amenable to HPC workloads. To tune the performance of MPI applications at runtime, many traditional techniques balance workloads by distributing datasets approximately equally across computing nodes; however, resource imbalance can still arise from data skew, and such imbalance is nontrivial to foresee. In our approach, the containers running on physical hosts dynamically allocate CPU resources to MPI processes according to the current program execution state and system resource status. Resource allocation among MPI processes is adjusted at two levels: the intra-host level, which dynamically adjusts resources within a host, and the inter-host level, which migrates containers together with their MPI processes from one host to another.
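As a toy illustration of the lockset half of the race detection described above (the class and method names are hypothetical, not taken from the dissertation's implementation), an Eraser-style checker keeps, for each shared variable, the intersection of the locks held at its accesses; an empty candidate set means no single lock consistently protects the variable:

```python
from typing import Dict, Set

class LocksetChecker:
    """Eraser-style lockset refinement over shared-variable accesses,
    such as those exposed by instrumented MPI wrappers."""

    def __init__(self):
        self.candidates: Dict[str, Set[str]] = {}  # var -> candidate locks
        self.held: Dict[int, Set[str]] = {}        # tid -> locks held now

    def acquire(self, tid: int, lock: str) -> None:
        self.held.setdefault(tid, set()).add(lock)

    def release(self, tid: int, lock: str) -> None:
        self.held.setdefault(tid, set()).discard(lock)

    def access(self, tid: int, var: str) -> bool:
        """Record an access; return True if a race is suspected
        (no lock consistently protects var across its accesses)."""
        locks = self.held.get(tid, set())
        if var not in self.candidates:
            self.candidates[var] = set(locks)   # first access seeds the set
        else:
            self.candidates[var] &= locks       # refine by intersection
        return not self.candidates[var]

# Consistently protected variable: no race suspected.
ok = LocksetChecker()
ok.acquire(0, "m"); ok.access(0, "x")
ok.acquire(1, "m"); print(ok.access(1, "x"))    # both hold "m"

# Unprotected second access: candidate set empties, race suspected.
bad = LocksetChecker()
bad.acquire(0, "m"); bad.access(0, "buf")
print(bad.access(1, "buf"))                      # thread 1 holds no lock
```

A production detector adds the happens-before component (e.g. vector clocks) to suppress false alarms on accesses that are ordered by synchronization, which is why the dissertation combines both algorithms.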
We implemented and evaluated our approach on the Amazon EC2 platform using real-world scientific benchmarks and applications. The results demonstrate that performance can be improved by up to 31.1% (15.6% on average) compared with the baseline application runtime.
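The intra-host adjustment could, under our assumptions, resemble a proportional-share heuristic: processes that have made less progress (stragglers) receive a larger CPU share. The function below is an illustrative stand-in for a controller that writes per-container CPU weights (in the spirit of cgroup `cpu.shares`); it is not the dissertation's actual algorithm:

```python
def rebalance(progress, total_shares=4096):
    """Split total_shares among MPI processes in proportion to their
    estimated remaining work, so stragglers get more CPU.

    progress: per-process work completed (e.g. iterations done).
    Returns a per-process share list summing to ~total_shares.
    """
    # Remaining-work estimate: distance from the fastest process,
    # plus 1 so every process stays schedulable.
    need = [max(progress) - p + 1 for p in progress]
    total = sum(need)
    return [round(total_shares * n / total) for n in need]

# Four MPI ranks; ranks 1 and 3 lag behind and get larger shares.
print(rebalance([100, 80, 100, 60]))
```

A real controller would periodically sample progress counters from the running MPI processes, apply the new weights to their containers, and fall back to inter-host container migration when a single host cannot satisfy a straggler's demand.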