Catalogue Search | MBRL

Minimal Repetition Dynamic Checkpointing Algorithm for Unsteady Adjoint Calculation

by Iaccarino, Gianluca , Wang, Qiqi , Moin, Parviz in Adjoints , Algorithms , Checkpointing

2009

Adjoint equations of differential equations have seen widespread applications in optimization, inverse problems, and uncertainty quantification. A major challenge in solving adjoint equations for time dependent systems has been the need to use the solution of the original system in the adjoint calculation and the associated memory requirement. In applications where storing the entire solution history is impractical, checkpointing methods have frequently been used. However, traditional checkpointing algorithms such as revolve require a priori knowledge of the number of time steps, making these methods incompatible with adaptive time stepping. We propose a dynamic checkpointing algorithm applicable when the number of time steps is a priori unknown. Our algorithm maintains a specified number of checkpoints on the fly as time integration proceeds for an arbitrary number of time steps. The resulting checkpoints at any snapshot during the time integration have the optimal repetition number. The efficiency of our algorithm is demonstrated both analytically and experimentally in solving adjoint equations. This algorithm also has significant advantage in automatic differentiation when the length of execution is variable.
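To make the memory/recomputation trade-off in this abstract concrete, here is a minimal sketch of checkpointed adjoint calculation. It is not the dynamic algorithm of the paper, nor revolve: it uses a fixed, uniform checkpoint stride, and the model problem (forward Euler on du/dt = -u^2 with objective J = u_N) and all function names are assumptions chosen for illustration only.

```python
def step(u, dt):
    # One forward-Euler step of du/dt = -u**2 (illustrative model problem).
    return u + dt * (-u * u)


def step_jacobian(u, dt):
    # Derivative of step(u, dt) with respect to u.
    return 1.0 - 2.0 * dt * u


def adjoint_with_checkpoints(u0, dt, n_steps, stride):
    """Return (u_N, dJ/du_0) for J = u_N, storing only every `stride`-th state."""
    assert n_steps >= 1 and stride >= 1

    # Forward sweep: keep the initial state and every `stride`-th state.
    checkpoints = {0: u0}
    u = u0
    for n in range(n_steps):
        u = step(u, dt)
        if (n + 1) % stride == 0 and (n + 1) < n_steps:
            checkpoints[n + 1] = u
    u_final = u

    # Reverse sweep: recompute each segment from its checkpoint, then
    # propagate the adjoint variable backwards through that segment.
    lam = 1.0  # dJ/du_N for J = u_N
    end = n_steps
    for start in sorted(checkpoints, reverse=True):
        states = [checkpoints[start]]          # states[k] = u_{start + k}
        for _ in range(start, end - 1):
            states.append(step(states[-1], dt))
        for u_n in reversed(states):
            lam *= step_jacobian(u_n, dt)
        end = start
    return u_final, lam
```

With a stride of s, this sketch holds roughly N/s checkpoints plus one recomputed segment of s states, at the cost of one extra forward pass. A uniform stride needs N in advance to balance the two, which is the limitation that the abstract says dynamic checkpointing removes.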

Journal Article

LLMQ: Efficient Lower-Precision Pretraining for Consumer GPUs

by Schultheis, Erik , Alistarh, Dan in Checkpointing

2025

We present LLMQ, an end-to-end CUDA/C++ implementation for medium-sized language-model training, e.g. 3B to 32B parameters, on affordable, commodity GPUs. These devices are characterized by low memory availability and slow communication compared to datacentre-grade GPUs. Consequently, we showcase a range of optimizations that target these bottlenecks, including activation checkpointing, offloading, and copy-engine based collectives. LLMQ is able to train or fine-tune a 7B model on a single 16GB mid-range gaming card, or a 32B model on a workstation equipped with 4 RTX 4090s. This is achieved while executing a standard 8-bit training pipeline, without additional algorithmic approximations, and maintaining FLOP utilization of around 50%. The efficiency of LLMQ rivals that of production-scale systems on much more expensive cloud-grade GPUs.

Paper

ByteCheckpoint: A Unified Checkpointing System for Large Foundation Model Development

by Peng, Yanghua , Lin, Haibin , Wu, Chuan in Checkpointing

2025

Checkpointing to preserve training states is crucial during the development of Large Foundation Models (LFMs), for training resumption upon various failures or changes in GPU resources and parallelism configurations. In addition, saved checkpoints are dispatched to evaluation tasks or transferred across different training stages (e.g., from pre-training to post-training). All these scenarios require resharding distributed checkpoints from one parallelism to another. In production environments, different LFMs are trained with various frameworks and storage backends, depending on model sizes and training scales. A high-performance checkpointing system is needed to enable efficient checkpoint management at scale throughout the lifecycle of LFM development. We introduce ByteCheckpoint, an industrial-grade checkpointing system for large-scale LFM training. ByteCheckpoint features: a parallelism-agnostic checkpoint representation that enables efficient load-time checkpoint resharding; a generic checkpoint saving/loading workflow to accommodate multiple training frameworks and support different storage backends; full-stack optimizations to ensure high I/O efficiency and scalability; a suite of monitoring tools to streamline large-scale performance analysis and bottleneck detection. Compared to existing open-source checkpointing systems [52, 58], ByteCheckpoint significantly reduces runtime checkpoint stalls, achieving an average reduction of 54.20x. For saving and loading times, ByteCheckpoint achieves improvements of up to 9.96x and 8.80x, respectively.

Paper

Boost Training Goodput: How Continuous Checkpointing Optimizes Reliability in Orbax and MaxText

in Checkpointing

2026

Web Resource

Implementation-Oblivious Transparent Checkpoint-Restart for MPI

by Xu, Yao , Belyaev, Leonid , Cooperman, Gene in Checkpointing

2023

This work presents experience with traditional use cases of checkpointing on a novel platform. A single codebase (MANA) transparently checkpoints production workloads for major available MPI implementations: "develop once, run everywhere". The new platform enables application developers to compile their application against any of the available standards-compliant MPI implementations, and test each MPI implementation according to performance or other features.

Paper

A Convolution Neural Network-Based Seed Classification System

by Journaux, Ludovic , Hamid, Yasir , Alwan, Ali A. in Agricultural sciences , Agriculture , Artificial Intelligence

2020

Over the last few years, the research into agriculture has gained momentum, showing signs of rapid growth. The latest to appear on the scene is bringing convenience in how agriculture can be done by employing various computational technologies. There are lots of factors that affect agricultural production, with seed quality topping the list. Seed classification can provide additional knowledge about quality production, seed quality control and impurity identification. The process of categorising seeds has been traditionally done based on characteristics like colour, shape and texture. Generally, this is performed by specialists by visually inspecting each sample, which is a very tedious and time-consuming task. This procedure can be easily automated, providing a significantly more efficient method for seed sorting than having them be inspected using human labour. In related areas, computer vision technology based on machine learning (ML), symmetry and, more particularly, convolutional neural networks (CNNs) have been generously applied, often resulting in increased work efficiency. Considering the success of the computational intelligence methods in other image classification problems, this research proposes a classification system for seeds by employing CNN and transfer learning. The proposed system contains a model that classifies 14 commonly known seeds with the application of advanced deep learning techniques. The techniques applied in this research include decayed learning rate, model checkpointing and hybrid weight adjustment. This research applies symmetry when sampling the images of the seeds during data formation. The application of symmetry generates homogeneity with regards to resizing and labelling the images to extract their features. This resulted in 99% classification accuracy on the training set. The proposed model produced results with an accuracy of 99% for the test set, which contained 234 images. These results were much higher than the results reported in related research.

Journal Article

SweeD: Likelihood-Based Detection of Selective Sweeps in Thousands of Genomes

by Alachiotis, Nikolaos , Stamatakis, Alexandros , Živković, Daniel in Algorithms , Checkpointing , Data analysis

2013

The advent of modern DNA sequencing technology is the driving force in obtaining complete intra-specific genomes that can be used to detect loci that have been subject to positive selection in the recent past. Based on selective sweep theory, beneficial loci can be detected by examining the single nucleotide polymorphism patterns in intraspecific genome alignments. In the last decade, a plethora of algorithms for identifying selective sweeps have been developed. However, the majority of these algorithms have not been designed for analyzing whole-genome data. We present SweeD (Sweep Detector), an open-source tool for the rapid detection of selective sweeps in whole genomes. It analyzes site frequency spectra and represents a substantial extension of the widely used SweepFinder program. The sequential version of SweeD is up to 22 times faster than SweepFinder and, more importantly, is able to analyze thousands of sequences. We also provide a parallel implementation of SweeD for multi-core processors. Furthermore, we implemented a checkpointing mechanism that allows to deploy SweeD on cluster systems with queue execution time restrictions, as well as to resume long-running analyses after processor failures. In addition, the user can specify various demographic models via the command-line to calculate their theoretically expected site frequency spectra. Therefore, (in contrast to SweepFinder) the neutral site frequencies can optionally be directly calculated from a given demographic model. We show that an increase of sample size results in more precise detection of positive selection. Thus, the ability to analyze substantially larger sample sizes by using SweeD leads to more accurate sweep detection. We validate SweeD via simulations and by scanning the first chromosome from the 1000 human Genomes project for selective sweeps. We compare SweeD results with results from a linkage-disequilibrium-based approach and identify common outliers.

Journal Article

Design and Analysis of an Efficient Energy Algorithm in Wireless Social Sensor Networks

by Xiong, Naixue , Vasilakos, Athanasios , Imran, Muhammad in ad hoc network , Algorithms , asynchronous checkpointing

2017

Because mobile ad hoc networks have characteristics such as lack of center nodes, multi-hop routing and changeable topology, the existing checkpoint technologies for normal mobile networks cannot be applied well to mobile ad hoc networks. Considering the multi-frequency hierarchy structure of ad hoc networks, this paper proposes a hybrid checkpointing strategy which combines the techniques of synchronous checkpointing with asynchronous checkpointing, namely the checkpoints of mobile terminals in the same cluster remain synchronous, and the checkpoints in different clusters remain asynchronous. This strategy could not only avoid cascading rollback among the processes in the same cluster, but also avoid too many message transmissions among the processes in different clusters. What is more, it can reduce the communication delay. In order to assure the consistency of the global states, this paper discusses the correctness criteria of hybrid checkpointing, which includes the criteria of checkpoint taking, rollback recovery and indelibility. Based on the designed Intra-Cluster Checkpoint Dependence Graph and Inter-Cluster Checkpoint Dependence Graph, the elimination rules for different kinds of checkpoints are discussed, and the algorithms for the same cluster checkpoints, different cluster checkpoints, and rollback recovery are also given. Experimental results demonstrate the proposed hybrid checkpointing strategy is a preferable trade-off method, which not only synthetically takes all kinds of resource constraints of Ad hoc networks into account, but also outperforms the existing schemes in terms of the dependence to cluster heads, the recovery time compared to the pure synchronous, and the pure asynchronous checkpoint advantage.

Journal Article

Time-stamp Incremental Checkpointing and Its Application for an Optimization of Execution Model to Improve Performance of CAPE

by Renault, Éric , Do, Xuan Huyen , Ha, Viet Hai in Checkpointing , Computer memory , Computer Science

2018

CAPE, which stands for Checkpointing-Aided Parallel Execution, is a checkpoint-based approach to automatically translate and execute OpenMP programs on distributed-memory architectures. This approach demonstrates high performance and complete compatibility with OpenMP on distributed-memory systems. In CAPE, checkpointing is one of the main factors affecting the performance of the system. This is shown over two versions of CAPE: the first version, based on complete checkpoints, is too slow compared to the second version, based on Discontinuous Incremental Checkpointing. This paper presents an improvement of Discontinuous Incremental Checkpointing and a new execution model for CAPE using new checkpointing techniques. It contributes to improving the performance and making CAPE even more flexible.
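As a rough illustration of the difference between complete and incremental checkpoints mentioned in this abstract, the sketch below records only the entries of a state that changed since the previous checkpoint. It is not CAPE's time-stamp mechanism: it works on a plain Python dictionary rather than on process memory, and the function names `take_incremental` and `apply_incremental` are hypothetical.

```python
def take_incremental(previous, current):
    """Return a delta holding only what changed between two state dicts."""
    changed = {
        key: value
        for key, value in current.items()
        if key not in previous or previous[key] != value
    }
    removed = [key for key in previous if key not in current]
    return {"changed": changed, "removed": removed}


def apply_incremental(base, delta):
    """Rebuild the newer state from an older state plus its delta."""
    state = dict(base)
    for key in delta["removed"]:
        state.pop(key, None)
    state.update(delta["changed"])
    return state


# Usage: only the modified and new entries are stored in the delta.
old_state = {"a": 1, "b": 2, "c": 3}
new_state = {"a": 1, "b": 5, "d": 7}
delta = take_incremental(old_state, new_state)
restored = apply_incremental(old_state, delta)
```

When few entries change between checkpoints, the delta is much smaller than a full copy of the state, which is the general reason an incremental scheme can outperform complete checkpoints.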

Journal Article

Achieving Reliability in Cloud Computing by a Novel Hybrid Approach

by Alam, Muhammad Mansoor , Shahid, Muhammad Asim , Su’ud, Mazliham Mohd in Accuracy , Algorithms , Cloud computing

2023

Cloud computing (CC) benefits and opportunities are among the fastest growing technologies in the computer industry. Cloud computing’s challenges include resource allocation, security, quality of service, availability, privacy, data management, performance compatibility, and fault tolerance. Fault tolerance (FT) refers to a system’s ability to continue performing its intended task in the presence of defects. Fault-tolerance challenges include heterogeneity and a lack of standards, the need for automation, cloud downtime reliability, consideration for recovery point objects, recovery time objects, and cloud workload. The proposed research includes machine learning (ML) algorithms such as naïve Bayes (NB), library support vector machine (LibSVM), multinomial logistic regression (MLR), sequential minimal optimization (SMO), K-nearest neighbor (KNN), and random forest (RF) as well as a fault-tolerance method known as delta-checkpointing to achieve higher accuracy, lesser fault prediction error, and reliability. Furthermore, the secondary data were collected from the homonymous, experimental high-performance computing (HPC) system at the Swiss Federal Institute of Technology (ETH), Zurich, and the primary data were generated using virtual machines (VMs) to select the best machine learning classifier. In this article, the secondary and primary data were divided into two split ratios of 80/20 and 70/30, respectively, and cross-validation (5-fold) was used to identify more accuracy and less prediction of faults in terms of true, false, repair, and failure of virtual machines. Secondary data results show that naïve Bayes performed exceptionally well on CPU-Mem mono and multi blocks, and sequential minimal optimization performed very well on HDD mono and multi blocks in terms of accuracy and fault prediction. 
In the case of greater accuracy and less fault prediction, primary data results revealed that random forest performed very well in terms of accuracy and fault prediction but not with good time complexity. Sequential minimal optimization has good time complexity with minor differences in random forest accuracy and fault prediction. We decided to modify sequential minimal optimization. Finally, the modified sequential minimal optimization (MSMO) algorithm with the fault-tolerance delta-checkpointing (D-CP) method is proposed to improve accuracy, fault prediction error, and reliability in cloud computing.

Journal Article
