Archives for HPDL Blog

Parcae: Proactive, Liveput-Optimized DNN Training on Preemptible Instances

Characterization of Large Language Model Development in the Datacenter

EasyScale: Elastic Training with Consistent Accuracy and Improved Utilization on GPUs

Efficient Fault Tolerance for Recommendation Model Training via Erasure Coding

ElasticFlow: An Elastic Serverless Training Platform for Distributed Deep Learning

Oobleck: Resilient Distributed Training of Large Models Using Pipeline Templates

PGLBox: Multi-GPU Graph Learning Framework for Web-Scale Recommendation

QLoRA: Efficient Finetuning of Quantized LLMs

Some Methods for Mixed Precision Training

Toward Efficient Low-Precision Training: Data Format Optimization and Hysteresis Quantization

Colossal-Auto: Unified Automation of Parallelization and Activation Checkpoint for Large-scale Models

FlexMoE: Scaling Large-scale Sparse Pre-trained Model Training via Dynamic Device Placement

MariusGNN: Resource-Efficient Out-of-Core Training of Graph Neural Networks

MPress: Democratizing Billion-Scale Model Training on Multi-GPU Servers via Memory-Saving Inter-Operator Parallelism

Overlap Communication with Dependent Computation via Decomposition in Large Deep Learning Models

Elastic Averaging for Efficient Pipelined DNN Training

MiCS: Near-linear Scaling for Training Gigantic Model on Public Cloud

SANCUS: Staleness-Aware Communication-Avoiding Full-Graph Decentralized Training in Large-Scale Graph Neural Networks

Persia: Optimizations for Large-Scale Recommendation Models; and Tutel: Optimizations for Large-Scale MoE Models

TSPLIT: Fine-grained GPU Memory Management for Efficient DNN Training via Tensor Splitting

Deep Neural Network Training With Distributed K-FAC

DeepSpeed-MoE

Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning

Themis: A Network Bandwidth-Aware Collective Scheduling Policy for Distributed Training of DL Models

GNNLab: A Factored System for Sample-based GNN Training over GPUs

2D, 2.5D, and 3D Tensor Parallelism in Colossal-AI

Rematerialization and Swapping

ICLR 2017 MoE

PaGraph: Scaling GNN Training on Large Graphs via Computation-aware Caching

GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding

Piper: Multidimensional Planner for DNN Parallelization

Paper Walkthrough: FlexFlow and Automatic Parallelism

ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning

Ultra-Low Precision 4-bit Training of Deep Neural Networks

GNNAdvisor: An Efficient Runtime System for GNN Acceleration on GPUs

TeraPipe: Token-Level Pipeline Parallelism for Training Large-Scale Language Models

Combining Label Propagation and Simple Models Out-performs Graph Neural Networks

Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM

Pelican User Manual

Chimera: Efficiently Training Large-Scale Neural Networks with Bidirectional Pipelines