Analyzing the impact of the MPI allreduce in distributed training of convolutional neural networks

by Castelló, Adrián; Quintana-Ortí, Enrique S.; Catalán, Mar; Dolz, Manuel F.; Duato, José

in Algorithms / Artificial neural networks / Clusters / Data communication / Energy consumption / Impact analysis / Libraries / Message passing / Microprocessors / Neural networks / Optimization / Training

2023

Journal Article

Overview

For many distributed applications, data communication poses an important bottleneck from the points of view of performance and energy consumption. As more cores are integrated per node, in general the global performance of the system increases yet eventually becomes limited by the interconnection network. This is the case for distributed data-parallel training of convolutional neural networks (CNNs), which usually proceeds on a cluster with a small to moderate number of nodes. In this paper, we analyze the performance of the Allreduce collective communication primitive, a key to the efficient data-parallel distributed training of CNNs. Our study targets the distinct realizations of this primitive in three high performance instances of Message Passing Interface (MPI), namely MPICH, OpenMPI, and IntelMPI, and employs a cluster equipped with state-of-the-art processor and network technologies. In addition, we apply the insights gained from the experimental analysis to the optimization of the TensorFlow framework when running on top of Horovod. Our study reveals that a careful selection of the most convenient MPI library and Allreduce (ARD) realization accelerates the training throughput by a factor of 1.2× compared with the default algorithm in the same MPI library, and up to 2.8× when comparing distinct MPI libraries in a number of relevant combinations of CNN model+dataset.
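As an illustration of the primitive discussed in the overview, the following is a minimal sketch of what an Allreduce (sum) does to per-worker gradient buffers in data-parallel training. It simulates a ring-style algorithm in plain Python, with no MPI involved. The ring scheme is one common textbook realization and is an assumption here; the overview does not say which algorithms MPICH, OpenMPI, or IntelMPI select. The function name `ring_allreduce` and the sample values are hypothetical.

```python
from typing import List


def ring_allreduce(grads: List[List[float]]) -> List[List[float]]:
    """Simulate a ring Allreduce (sum) over p workers.

    grads[r] is the local buffer of worker r; all buffers have equal length.
    Returns the final buffer of every worker (all identical).
    """
    p = len(grads)
    n = len(grads[0])
    bufs = [list(g) for g in grads]  # copy so the inputs stay untouched
    # Chunk c covers indices bounds[c]:bounds[c + 1].
    bounds = [c * n // p for c in range(p + 1)]

    # Phase 1 (reduce-scatter): after p - 1 steps, worker r holds the
    # fully summed chunk (r + 1) % p.
    for step in range(p - 1):
        # Snapshot all outgoing chunks first to model simultaneous sends.
        sends = []
        for r in range(p):
            c = (r - step) % p
            sends.append((c, bufs[r][bounds[c]:bounds[c + 1]]))
        for r in range(p):
            dst = (r + 1) % p
            c, data = sends[r]
            for i, v in enumerate(data):
                bufs[dst][bounds[c] + i] += v

    # Phase 2 (allgather): circulate the completed chunks around the ring.
    for step in range(p - 1):
        sends = []
        for r in range(p):
            c = (r + 1 - step) % p
            sends.append((c, bufs[r][bounds[c]:bounds[c + 1]]))
        for r in range(p):
            dst = (r + 1) % p
            c, data = sends[r]
            bufs[dst][bounds[c]:bounds[c + 1]] = data

    return bufs


# Three simulated workers, each with a local gradient of four values.
local_grads = [
    [1.0, 2.0, 3.0, 4.0],
    [10.0, 20.0, 30.0, 40.0],
    [100.0, 200.0, 300.0, 400.0],
]
result = ring_allreduce(local_grads)
# Dividing by the worker count gives the averaged gradient applied in training.
averaged = [[v / len(local_grads) for v in buf] for buf in result]
```

Every worker ends with the same summed buffer, which is why the cost of this one collective can dominate each training step as the number of nodes grows.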

Publisher

Springer Nature B.V.