Catalogue Search | MBRL
Explore the vast range of titles available.
226 result(s) for "message passing interface"
MPI-RCDD: A Framework for MPI Runtime Communication Deadlock Detection
2020
The message passing interface (MPI) has become a de facto standard for high-performance computing programming models, but its rich and flexible interface semantics make programs prone to communication deadlocks, which seriously affect system usability. Existing detection tools for MPI communication deadlocks, however, do not scale well enough to keep pace with ever-growing system sizes. In this context, we propose a framework for MPI runtime communication deadlock detection, MPI-RCDD, which comprises three main mechanisms. First, MPI-RCDD includes a message logging protocol coupled to deadlock detection that ensures the communication messages required for deadlock analysis are not lost. Second, it uses the asynchronous processing thread provided by MPI to transfer dependencies between processes, so that multiple processes can participate in deadlock detection simultaneously, alleviating the performance bottleneck of centralized analysis. Third, it uses an AND⊕OR-model-based algorithm named AODA to perform the deadlock analysis. AODA combines the advantages of timeout-based and dependency-based deadlock analysis approaches, allowing processes in the timeout state to search for a deadlock circle or knot during dependency transfer. Furthermore, AODA produces no false positives and pinpoints the source of a deadlock accurately. Experimental results on typical MPI communication deadlock benchmarks such as the Umpire Test Suite demonstrate the capability of MPI-RCDD, and experiments on the NPB benchmarks show a satisfactory performance cost, indicating that MPI-RCDD has strong scalability.
Journal Article
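The dependency-based analysis described in this abstract can be illustrated with a toy sketch: model each process as a node in a wait-for graph and look for a cycle of mutually waiting processes. This is a simplification under pure AND semantics (the paper's AND⊕OR model and AODA algorithm are richer); the function name and graph encoding are illustrative, not MPI-RCDD's API.

```python
# Hypothetical sketch: dependency-based deadlock detection on a wait-for
# graph. Each key is a process rank; its list holds the ranks it blocks on.
# A cycle of gray (in-progress) nodes during DFS is a deadlock circle.

def find_deadlock_cycle(wait_for):
    """Return one cycle of mutually waiting processes, or None."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {p: WHITE for p in wait_for}
    stack = []

    def dfs(p):
        color[p] = GRAY
        stack.append(p)
        for q in wait_for.get(p, ()):
            if color.get(q, WHITE) == GRAY:      # back edge -> cycle found
                return stack[stack.index(q):] + [q]
            if color.get(q, WHITE) == WHITE:
                cyc = dfs(q)
                if cyc:
                    return cyc
        color[p] = BLACK
        stack.pop()
        return None

    for p in list(wait_for):
        if color[p] == WHITE:
            cyc = dfs(p)
            if cyc:
                return cyc
    return None

# Ranks 0 -> 1 -> 2 -> 0 all block on each other: a deadlock circle.
print(find_deadlock_cycle({0: [1], 1: [2], 2: [0], 3: []}))  # [0, 1, 2, 0]
```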
PyCAC: The concurrent atomistic-continuum simulation environment
by
Xu, Shuozhi
,
Xiong, Liming
,
McDowell, David L.
in
Aerospace engineering
,
Algorithms
,
Applied and Technical Physics
2018
We present a novel distributed-memory parallel implementation of the concurrent atomistic-continuum (CAC) method. Written mostly in Fortran 2008 and wrapped with a Python scripting interface, the CAC simulator in PyCAC runs in parallel using the Message Passing Interface (MPI) with a spatial decomposition algorithm. Built upon the underlying Fortran code, the Python interface provides a robust and versatile way for users to build system configurations, run CAC simulations, and analyze results. In this paper, following a brief introduction to the theoretical background of the CAC method, we discuss the serial algorithms of dynamic, quasistatic, and hybrid CAC, along with some programming techniques used in the code. We then illustrate the parallel algorithm, quantify the parallel scalability, and discuss some software specifications of PyCAC; more information can be found in the PyCAC user’s manual that is hosted on www.pycac.org.
Journal Article
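The spatial decomposition mentioned in this abstract can be sketched in miniature: partition the simulation box into a grid of subdomains and assign each particle to the rank that owns its cell. The 2D box, the 2x2 process grid, and the function name below are illustrative assumptions, not PyCAC's actual interface.

```python
# Minimal spatial-decomposition sketch: map a position in a rectangular
# box to the MPI rank owning that grid cell (row-major rank numbering).

def owner_rank(pos, box, grid):
    """Return the rank owning the subdomain containing a 2D position."""
    gx, gy = grid
    ix = min(int(pos[0] / box[0] * gx), gx - 1)  # clamp points on the edge
    iy = min(int(pos[1] / box[1] * gy), gy - 1)
    return iy * gx + ix

box, grid = (10.0, 10.0), (2, 2)
atoms = [(1.0, 1.0), (9.0, 1.0), (1.0, 9.0), (9.0, 9.0)]
print([owner_rank(a, box, grid) for a in atoms])  # [0, 1, 2, 3]
```

In a real MPI run, each rank would keep only its own particles and exchange boundary ("halo") regions with neighboring subdomains each step.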
Distributed Singular Value Decomposition Method for Fast Data Processing in Recommendation Systems
by
Przystupa, Krzysztof
,
Beshley, Mykola
,
Selech, Jarosław
in
big data (BD)
,
distributed systems (DS)
,
hadoop
2021
Analyzing large amounts of user data to determine preferences and, on that basis, recommend new products is an important problem: depending on the correctness and timeliness of the recommendations, significant profits or losses can result. This analysis of a company's service users is performed by dedicated recommendation systems. With a large number of users, however, the volume of data to process becomes very large, which complicates the operation of recommendation systems. For efficient data analysis in commercial systems, the Singular Value Decomposition (SVD) method can perform intelligent analysis of the information. For large volumes of data, we propose using distributed systems, an approach that reduces the time needed to process data and deliver recommendations to users. For the experimental study, we implemented the distributed SVD method using Message Passing Interface, Hadoop, and Spark technologies, and found that data processing time is reduced when distributed systems are used compared to non-distributed ones.
Journal Article
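The SVD step at the core of this abstract can be sketched single-node with NumPy: factor the user-item rating matrix, keep the top-k singular values, and use the low-rank reconstruction to score items. The distributed MPI/Hadoop/Spark variants in the paper partition this computation across nodes; the toy matrix and function name here are illustrative only.

```python
# Rank-k SVD reconstruction of a user-item rating matrix: the low-rank
# approximation smooths the observed ratings and fills in scores for
# unrated items.
import numpy as np

def low_rank_scores(ratings, k):
    U, s, Vt = np.linalg.svd(ratings, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]   # rank-k approximation

# Toy matrix: two similar users and one with opposite taste.
R = np.array([[5.0, 4.0, 0.0],
              [4.0, 5.0, 1.0],
              [1.0, 1.0, 5.0]])
approx = low_rank_scores(R, k=2)
print(np.round(approx, 1))
```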
An MPI-based parallel genetic algorithm for multiple geographical feature label placement based on the hybrid of fixed-sliding models
by
Guo, Zhiyong
,
Lessani, M. Naser
,
Deng, Jiqiu
in
Algorithms
,
fixed position
,
Genetic algorithms
2025
Multiple Geographical Feature Label Placement (MGFLP) has been a fundamental problem in geographic information visualization for decades. Moreover, label positioning has proven to be an NP-hard (nondeterministic polynomial-time hard) problem. Although advances in computer technology and robust approaches have addressed the problem of label positioning, the lengthy running time of MGFLP has not been a major focus of recent studies. Based on a hybrid of the fixed-position and sliding models, a Message Passing Interface (MPI) parallel genetic algorithm is proposed in the present study for MGFLP to label mixed types of geographical features. To evaluate the quality of label placement, a quality function is defined based on four quality metrics: label-feature conflict, label-label conflict, label association with the corresponding feature, and label position priority for all three types of features. The experimental results show that the proposed algorithm outperforms the DDEGA, DDEGA-NM, and Parallel-MS in both label placement quality and computation time efficiency. Across three datasets, compared to Parallel-MS, running times decreased from 118.45 to 8.34, 45.98 to 3.51, and 20.01 to 0.43 min, with further reductions in label-label and label-feature conflicts.
Journal Article
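One ingredient of the quality function described in this abstract, counting label-label conflicts, can be sketched as overlap tests between axis-aligned label boxes. The box layout and function names below are illustrative assumptions; the paper's full quality function also weighs label-feature conflicts, feature association, and position priority.

```python
# Count label-label conflicts as pairwise intersections of axis-aligned
# label rectangles, one term a genetic algorithm's fitness could penalize.

def overlaps(a, b):
    """Axis-aligned rectangles (x, y, w, h): do they intersect?"""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def label_label_conflicts(boxes):
    return sum(overlaps(boxes[i], boxes[j])
               for i in range(len(boxes))
               for j in range(i + 1, len(boxes)))

boxes = [(0, 0, 4, 2), (3, 1, 4, 2), (10, 10, 4, 2)]
print(label_label_conflicts(boxes))  # 1: only the first two boxes overlap
```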
An O(log₂N) Fully-Balanced Resampling Algorithm for Particle Filters on Distributed Memory Architectures
2021
Resampling is a well-known statistical algorithm that is commonly applied in the context of Particle Filters (PFs) in order to perform state estimation for non-linear non-Gaussian dynamic models. As the models become more complex and accurate, the run-time of PF applications becomes increasingly slow. Parallel computing can help to address this. However, resampling (and, hence, PFs as well) necessarily involves a bottleneck, the redistribution step, which is notoriously challenging to parallelize if using textbook parallel computing techniques. A state-of-the-art redistribution takes O((log₂N)²) computations on Distributed Memory (DM) architectures, which most supercomputers adopt, whereas redistribution can be performed in O(log₂N) on Shared Memory (SM) architectures, such as GPU or mainstream CPUs. In this paper, we propose a novel parallel redistribution for DM that achieves an O(log₂N) time complexity. We also present empirical results that indicate that our novel approach outperforms the O((log₂N)²) approach.
Journal Article
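The serial resampling step whose redistribution this paper parallelizes can be sketched with the common systematic scheme: each particle is copied a number of times proportional to its weight, and those copies are what must then be redistributed across ranks. This is a generic textbook version, not the paper's fully-balanced algorithm.

```python
# Systematic resampling: place n evenly spaced pointers u0 + k/n over the
# cumulative weights and copy the particle each pointer lands on.
import random

def systematic_resample(weights, u0=None):
    """Return the index each of the n offspring particles copies."""
    n = len(weights)
    u0 = random.random() / n if u0 is None else u0
    idx, cum, i = [], weights[0], 0
    for k in range(n):
        u = u0 + k / n
        while u > cum:          # advance to the particle covering u
            i += 1
            cum += weights[i]
        idx.append(i)
    return idx

# Particle 1 carries most of the weight, so it gets most offspring.
print(systematic_resample([0.1, 0.6, 0.2, 0.1], u0=0.125))  # [1, 1, 1, 2]
```

On distributed memory, the hard part is not computing `idx` but moving the duplicated particle states so every rank ends up with an equal share, which is the redistribution step the paper reduces to O(log₂N).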
A practical guide to replica-exchange Wang-Landau simulations
by
Landau, David P.
,
Li, Ying Wai
,
Vogel, Thomas
in
Algorithms
,
Message passing
,
Parallel programming
2018
This paper is based on a series of tutorial lectures about the replica-exchange Wang-Landau (REWL) method given at the IX Brazilian Meeting on Simulational Physics (BMSP 2017). It provides a practical guide for the implementation of the method. A complete example code for a model system is available online. In this paper, we discuss the main parallel features of this code after a brief introduction to the REWL algorithm. The tutorial section is mainly directed at users who have written a single-walker Wang-Landau program already but might have just taken their first steps in parallel programming using the Message Passing Interface (MPI). In the last section, we answer "frequently asked questions" from users about the implementation of REWL for different scientific problems.
Journal Article
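The single-walker Wang-Landau program the tutorial assumes as a starting point can be sketched on a toy model: estimating the density of states g(E) for the sum of two fair dice. In REWL, several such walkers cover overlapping energy windows and exchange configurations via MPI; everything below (model, parameters, names) is an illustrative assumption, not the tutorial's example code.

```python
# Single-walker Wang-Landau on the sum of two dice. The walk accumulates
# log g(E); ln(f) is halved each time the visit histogram is flat.
import math
import random

def wang_landau(lnf_final=1e-3, flatness=0.8, seed=1):
    rng = random.Random(seed)
    state = [1, 1]                            # two dice
    log_g = {e: 0.0 for e in range(2, 13)}
    hist = {e: 0 for e in range(2, 13)}
    lnf = 1.0                                 # ln of the modification factor
    while lnf > lnf_final:
        i = rng.randrange(2)                  # propose re-rolling one die
        new = rng.randint(1, 6)
        e_old = sum(state)
        e_new = e_old - state[i] + new
        # Accept with probability min(1, g(E_old) / g(E_new)).
        if math.log(rng.random()) < log_g[e_old] - log_g[e_new]:
            state[i] = new
        e = sum(state)
        log_g[e] += lnf
        hist[e] += 1
        # Flatness check: halve ln(f) once all energies are visited evenly.
        if min(hist.values()) > flatness * sum(hist.values()) / len(hist):
            lnf /= 2.0
            hist = dict.fromkeys(hist, 0)
    return log_g

log_g = wang_landau()
# g(7)/g(2) should approach 6: six ways to roll a 7, one way to roll a 2.
print(math.exp(log_g[7] - log_g[2]))
```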
Analyzing the impact of the MPI allreduce in distributed training of convolutional neural networks
by
Castelló, Adrián
,
Quintana-Ortí, Enrique S
,
Catalán, Mar
in
Algorithms
,
Artificial neural networks
,
Clusters
2023
For many distributed applications, data communication poses an important bottleneck from the points of view of performance and energy consumption. As more cores are integrated per node, in general the global performance of the system increases yet eventually becomes limited by the interconnection network. This is the case for distributed data-parallel training of convolutional neural networks (CNNs), which usually proceeds on a cluster with a small to moderate number of nodes. In this paper, we analyze the performance of the Allreduce collective communication primitive, a key to the efficient data-parallel distributed training of CNNs. Our study targets the distinct realizations of this primitive in three high performance instances of Message Passing Interface (MPI), namely MPICH, OpenMPI, and IntelMPI, and employs a cluster equipped with state-of-the-art processor and network technologies. In addition, we apply the insights gained from the experimental analysis to the optimization of the TensorFlow framework when running on top of Horovod. Our study reveals that a careful selection of the most convenient MPI library and Allreduce (ARD) realization accelerates the training throughput by a factor of 1.2× compared with the default algorithm in the same MPI library, and up to 2.8× when comparing distinct MPI libraries in a number of relevant combinations of CNN model+dataset.
Journal Article
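The Allreduce primitive this paper benchmarks combines a vector across all ranks and leaves the result on every rank. One widely used realization (among the algorithm choices MPI libraries select between) is the ring: a reduce-scatter phase followed by an allgather, each taking p-1 steps. The single-process simulation below is illustrative of the data movement only; a real run would use `MPI_Allreduce`.

```python
# Simulate a ring allreduce (sum) over p "ranks" in one process. Each
# rank's vector is split into p chunks that travel around the ring.

def ring_allreduce(vectors):
    p, n = len(vectors), len(vectors[0])
    assert n % p == 0, "vector length must divide evenly into p chunks"
    c = n // p
    data = [list(v) for v in vectors]
    # Phase 1, reduce-scatter: after p-1 steps, rank r holds the full
    # sum of chunk (r+1) % p.
    for s in range(p - 1):
        for dst in range(p):
            src = (dst - 1) % p
            k = (src - s) % p            # chunk src forwards this step
            for j in range(k * c, (k + 1) * c):
                data[dst][j] += data[src][j]
    # Phase 2, allgather: circulate the completed chunks around the ring.
    for s in range(p - 1):
        for dst in range(p):
            src = (dst - 1) % p
            k = (src + 1 - s) % p        # completed chunk src forwards
            data[dst][k * c:(k + 1) * c] = data[src][k * c:(k + 1) * c]
    return data

ranks = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
print(ring_allreduce(ranks))  # every rank ends with [111, 222, 333]
```

Each rank sends 2(p-1)/p of the vector in total, which is why the ring variant is bandwidth-efficient for the large gradient vectors exchanged in data-parallel CNN training.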
Numerical Prediction of Local Meteorological Processes above a City with a Supercomputer
by
Kizhner, L I
,
Starchenko, A V
,
Danilkin, E A
in
Algorithms
,
Mathematical models
,
Message passing
2021
This paper presents a parallel algorithm for numerical solution of equations for the non-hydrostatic mesoscale meteorological model TSUNM3. To justify the choice of the Message Passing Interface technology, computational experiments were performed to compare the effectiveness of the following parallelizing technologies: MPI, OpenMP, OpenACC, and CUDA. 2D decomposition of the gridded domain for the TSUNM3 model has been carried out with MPI technology. It allows for a numerical forecast for the next day in 17 minutes of CPU time on 144 cores of the TSU Cyberia computing cluster.
Journal Article
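The 2D decomposition of the gridded domain mentioned in this abstract can be sketched as index arithmetic: split an nx-by-ny mesh across a px-by-py process grid, with remainder cells going to the low-index ranks. The mesh and grid sizes below are illustrative, not TSUNM3's configuration.

```python
# Compute the block of grid cells owned by each MPI rank in a 2D
# (px x py) process grid with row-major rank numbering.

def block_range(n, parts, p):
    """Half-open index range [lo, hi) of part p out of `parts`."""
    base, rem = divmod(n, parts)
    lo = p * base + min(p, rem)
    return lo, lo + base + (1 if p < rem else 0)

def subdomain(rank, nx, ny, px, py):
    ix, iy = rank % px, rank // px
    return block_range(nx, px, ix), block_range(ny, py, iy)

# 100x90 mesh on a 3x2 process grid: rank 4 sits at grid position (1, 1).
print(subdomain(4, 100, 90, 3, 2))  # ((34, 67), (45, 90))
```

In the model itself, each rank would also exchange one or more rows/columns of halo cells with its four grid neighbors after every time step.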
Static Analysis Techniques for Fixing Software Defects in MPI-Based Parallel Programs
by
Sharaf, Sanaa Abdullah
,
Eassa, Fathy Elbouraey
,
Al-Johany, Norah Abdullah
in
Defects
,
Distributed memory
,
Message passing
2024
The Message Passing Interface (MPI) is a widely accepted standard for parallel computing on distributed memory systems. However, MPI implementations can contain defects that impact the reliability and performance of parallel applications. Detecting and correcting these defects is crucial, yet there is a lack of published models specifically designed for correcting MPI defects. To address this, we propose a model for detecting and correcting MPI defects (DC_MPI), which aims to detect and correct defects in various types of MPI communication, including blocking point-to-point (BPTP), nonblocking point-to-point (NBPTP), and collective communication (CC). The defects addressed by the DC_MPI model include illegal MPI calls, deadlocks (DL), race conditions (RC), and message mismatches (MM). To assess the effectiveness of the DC_MPI model, we performed experiments on a dataset consisting of 40 MPI codes. The results indicate that the model achieved a detection rate of 37 out of 40 codes, resulting in an overall detection accuracy of 92.5%. Additionally, the execution duration of the DC_MPI model ranged from 0.81 to 1.36 s. These findings show that the DC_MPI model is useful in detecting and correcting defects in MPI implementations, thereby enhancing the reliability and performance of parallel applications. The DC_MPI model fills an important research gap and provides a valuable tool for improving the quality of MPI-based parallel computing systems.
Journal Article
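A toy static check in the spirit of one DC_MPI defect class can be sketched with pattern matching: flag a nonblocking call (`MPI_Isend`/`MPI_Irecv`) whose request variable never reaches `MPI_Wait`/`MPI_Waitall`. This regex sketch is far cruder than the paper's model and is for illustration only; real static analyzers work on an abstract syntax tree or control-flow graph, not raw text.

```python
# Flag request handles started by nonblocking MPI calls but never waited
# on, a pattern that can leak requests or reorder message completion.
import re

def unwaited_requests(source):
    started = set(re.findall(r'MPI_I(?:send|recv)\([^;]*&(\w+)\s*\)', source))
    waited = set(re.findall(r'MPI_Wait(?:all)?\(\s*&?(\w+)', source))
    return sorted(started - waited)

code = """
MPI_Isend(buf, n, MPI_INT, 1, 0, MPI_COMM_WORLD, &req1);
MPI_Irecv(buf2, n, MPI_INT, 1, 0, MPI_COMM_WORLD, &req2);
MPI_Wait(&req1, MPI_STATUS_IGNORE);
"""
print(unwaited_requests(code))  # ['req2']
```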
An Efficient Parallel Multi-Scale Segmentation Method for Remote Sensing Imagery
by
Liu, Zhengjun
,
Soergel, Uwe
,
Han, Yanshun
in
Airborne sensing
,
Algorithms
,
fractal net evolution approach
2018
Remote sensing (RS) image segmentation is an essential step in geographic object-based image analysis (GEOBIA) to ultimately derive “meaningful objects”. While many segmentation methods exist, most of them are not efficient for large data sets. Thus, the goal of this research is to develop an efficient parallel multi-scale segmentation method for RS imagery by combining graph theory and the fractal net evolution approach (FNEA). Specifically, a minimum spanning tree (MST) algorithm in graph theory is proposed to be combined with a minimum heterogeneity rule (MHR) algorithm that is used in FNEA. The MST algorithm is used for the initial segmentation while the MHR algorithm is used for object merging. An efficient implementation of the segmentation strategy is presented using data partition and the “reverse searching-forward processing” chain based on message passing interface (MPI) parallel technology. Segmentation results of the proposed method using images from multiple sensors (airborne, SPECIM AISA EAGLE II, WorldView-2, RADARSAT-2) and different selected landscapes (residential/industrial, residential/agriculture) covering four test sites demonstrated its efficiency in terms of both accuracy and speed. We conclude that the proposed method is applicable and efficient for the segmentation of a variety of RS imagery (airborne optical, satellite optical, SAR, hyperspectral), while its accuracy is comparable with that of the FNEA method.
Journal Article
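The MST-based initial segmentation described in this abstract can be sketched at toy scale: build a graph over adjacent pixels weighted by spectral difference, then merge nodes joined by light edges, so that heavy edges (strong boundaries) separate segments. This Kruskal-with-threshold sketch is illustrative of the idea only, not the paper's exact formulation or its MHR merging step.

```python
# Kruskal-style segmentation: union pixels connected by edges whose
# weight (spectral difference) stays below a threshold; the surviving
# heavy edges delimit segments.

def segment(n_nodes, edges, threshold):
    """edges: (weight, a, b) tuples; returns a segment label per node."""
    parent = list(range(n_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for w, a, b in sorted(edges):
        if w <= threshold:
            parent[find(a)] = find(b)       # merge the two segments
    return [find(x) for x in range(n_nodes)]

# 4 pixels in a row; the large jump between pixels 1 and 2 splits them.
edges = [(0.1, 0, 1), (0.9, 1, 2), (0.2, 2, 3)]
print(segment(4, edges, threshold=0.5))  # [1, 1, 3, 3]: two segments
```

In the parallel version the paper describes, the image is partitioned across ranks and segments touching partition borders are reconciled via MPI, which is where the "reverse searching-forward processing" chain comes in.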