2024  6

April  2

Megatron-LM Explained: Principles and Implementation of DDP

April 27, 2024 · 1 min · Dawson Chen

MoE to Dense: An Introduction and a Quick Survey of Related Papers

April 22, 2024 · 1 min · Dawson Chen

March  1

Megatron-LM Explained: How MoE Is Implemented

March 19, 2024 · 5 min · Dawson Chen

February  1

Megatron-LM Explained: Pipeline Parallelism Principles and Code Walkthrough

February 5, 2024 · 2 min · Dawson Chen

January  2

Miscellaneous Notes on Batch Size

January 22, 2024 · 1 min · Dawson Chen

DeepSpeed-HybridEngine Development Guide

January 7, 2024 · 2 min · Dawson Chen

2023  7

December  2

The Scaling Law of MoE

December 22, 2023 · 2 min · Dawson Chen

MoE (Mixture of Experts) Technical Survey

December 20, 2023 · 5 min · Dawson Chen

November  1

Practical Experience with PPO

November 14, 2023 · 1 min · Dawson Chen

August  1

The Mathematical Imagination Behind RoPE

August 6, 2023 · 1 min · Dawson Chen

July  2

DeepSpeed Principles (Handwritten Notes)

July 5, 2023 · 1 min · Dawson Chen

Mixed-Precision Training

July 5, 2023 · 1 min · Dawson Chen

April  1

ChatGPT Plugins: How They Work, with Discussion

April 7, 2023 · 2 min · Dawson Chen