Archives for HPDL Blog
Parcae: Proactive Liveput-Optimized DNN Training on Preemptible Instances
Characterization of Large Language Model Development in the Datacenter
EasyScale: Elastic Training with Consistent Accuracy and Improved Utilization on GPUs
Efficient Fault Tolerance for Recommendation Model Training via Erasure Coding
ElasticFlow: An Elastic Serverless Training Platform for Distributed Deep Learning
Oobleck: Resilient Distributed Training of Large Models Using Pipeline Templates
PGLBox: Multi-GPU Graph Learning Framework for Web-Scale Recommendation
QLoRA: Efficient Finetuning of Quantized LLMs
Some Methods for Mixed Precision Training
Toward Efficient Low-Precision Training: Data Format Optimization and Hysteresis Quantization
Colossal-Auto: Unified Automation of Parallelization and Activation Checkpoint for Large-scale Models
FlexMoE: Scaling Large-scale Sparse Pre-trained Model Training via Dynamic Device Placement
MariusGNN: Resource-Efficient Out-of-Core Training of Graph Neural Networks
Overlap Communication with Dependent Computation via Decomposition in Large Deep Learning Models
Elastic Averaging for Efficient Pipelined DNN Training
MiCS: Near-linear Scaling for Training Gigantic Model on Public Cloud
Persia: Optimizations for Large-Scale Recommendation Models; and Tutel: Optimizations for Large-Scale MoE Models
TSPLIT: Fine-grained GPU Memory Management for Efficient DNN Training via Tensor Splitting
Deep Neural Network Training With Distributed K-FAC
Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning
Themis: A Network Bandwidth-Aware Collective Scheduling Policy for Distributed Training of DL Models
GNNLab: A Factored System for Sample-based GNN Training over GPUs
Rematerialization and Swapping
PaGraph: Scaling GNN Training on Large Graphs via Computation-aware Caching
GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
Piper: Multidimensional Planner for DNN Parallelization
ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning
Ultra-Low Precision 4-bit Training of Deep Neural Networks
GNNAdvisor: An Efficient Runtime System for GNN Acceleration on GPUs
TeraPipe: Token-Level Pipeline Parallelism for Training Large-Scale Language Models
Combining Label Propagation and Simple Models Out-performs Graph Neural Networks
Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM
Chimera: Efficiently Training Large-Scale Neural Networks with Bidirectional Pipelines