Asset Details
Analyzing the impact of the MPI allreduce in distributed training of convolutional neural networks
Journal Article
by Castelló, Adrián; Quintana-Ortí, Enrique S.; Catalán, Mar; Dolz, Manuel F.; Duato, José
in Algorithms / Artificial neural networks / Clusters / Data communication / Energy consumption / Impact analysis / Libraries / Message passing / Microprocessors / Neural networks / Optimization / Training
2023
Overview
For many distributed applications, data communication poses an important bottleneck from the points of view of performance and energy consumption. As more cores are integrated per node, the global performance of the system generally increases, yet it eventually becomes limited by the interconnection network. This is the case for distributed data-parallel training of convolutional neural networks (CNNs), which usually proceeds on a cluster with a small to moderate number of nodes. In this paper, we analyze the performance of the Allreduce collective communication primitive, which is key to efficient data-parallel distributed training of CNNs. Our study targets the distinct realizations of this primitive in three high-performance instances of the Message Passing Interface (MPI), namely MPICH, OpenMPI, and IntelMPI, and employs a cluster equipped with state-of-the-art processor and network technologies. In addition, we apply the insights gained from the experimental analysis to the optimization of the TensorFlow framework when running on top of Horovod. Our study reveals that a careful selection of the most convenient MPI library and Allreduce (ARD) realization accelerates training throughput by a factor of 1.2× compared with the default algorithm in the same MPI library, and by up to 2.8× when comparing distinct MPI libraries, for a number of relevant combinations of CNN model and dataset.
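As a rough illustration of the primitive under study, the sketch below shows how data-parallel workers could average their local gradients with an Allreduce. It is a minimal, hypothetical example: it uses the mpi4py binding and a NumPy buffer that stands in for a flattened gradient tensor, neither of which appears in the paper (the study exercises MPICH, OpenMPI, and IntelMPI directly and tunes TensorFlow running on top of Horovod).

```python
# Minimal sketch (not the paper's code): gradient averaging with MPI Allreduce,
# using the mpi4py binding and a NumPy buffer for illustration.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD            # one process per worker in data-parallel training
rank = comm.Get_rank()
size = comm.Get_size()

# Placeholder for the flattened gradient computed on this worker's shard
# of the global mini-batch.
local_grad = np.random.rand(1_000_000).astype(np.float32)

# Allreduce sums the gradients of all workers into a receive buffer; the
# internal realization of this collective (ring, recursive doubling, ...)
# is what differs across MPI libraries and algorithm choices.
global_grad = np.empty_like(local_grad)
comm.Allreduce(local_grad, global_grad, op=MPI.SUM)

# Average so that every worker applies the same update to its model replica.
global_grad /= size

if rank == 0:
    print(f"Allreduce over {size} workers complete")
```

A sketch like this would typically be launched with something like `mpirun -np 4 python allreduce_sketch.py`. The concrete Allreduce algorithm can usually be steered per library through documented tuning knobs, for example the `I_MPI_ADJUST_ALLREDUCE` environment variable in IntelMPI or OpenMPI's `coll_tuned_allreduce_algorithm` MCA parameter; the specific settings evaluated in the paper are not reproduced here.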
Publisher
Springer Nature B.V.