🍊 Latent Atlas 🍉

Scaling

March 28, 2026 · 1 min read

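The Scaling module covers the relationships among model parameter count, training tokens, compute, loss, and training cost. It sits upstream of pretraining planning: first understand how the scale variables affect performance, then decide how to allocate the budget, estimate GPU memory, organize data, and set up the training plan.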

Suggested reading order:

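Scaling Law: understand the empirical regularities in how loss changes with scale, and the differences between Kaplan and Chinchilla.
Model Data and Compute: build basic estimates for $N$, $D$, $C$, batch tokens, and wall-clock time.
Compute Optimal: understand how to split a fixed compute budget between parameter count and training tokens.
Training Budget: turn the theoretical scale into budgets for GPUs, GPU memory, time, checkpoints, and evaluation.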
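
The estimates named in the reading list above can be sketched in a few lines of Python. This is a minimal sketch, not the notes' own method: it assumes the common approximation $C \approx 6ND$ for training FLOPs and a rule-of-thumb ratio of about 20 training tokens per parameter that is often associated with the Chinchilla results. The function names, the per-GPU peak throughput, and the utilization figure are hypothetical placeholders chosen for illustration.

```python
import math


def train_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute, assuming C ~ 6 * N * D."""
    return 6.0 * n_params * n_tokens


def steps_needed(n_tokens: float, batch_tokens: float) -> int:
    """Optimizer steps required to consume n_tokens at a fixed batch size in tokens."""
    return math.ceil(n_tokens / batch_tokens)


def wall_clock_days(flops: float, n_gpus: int,
                    peak_flops_per_gpu: float, utilization: float) -> float:
    """Wall-clock time in days for a given sustained fraction of peak throughput."""
    seconds = flops / (n_gpus * peak_flops_per_gpu * utilization)
    return seconds / 86_400


def compute_optimal_split(flops: float, tokens_per_param: float = 20.0):
    """Split a fixed budget C into (N, D), assuming C = 6 * N * D and D = r * N.

    Substituting D = r * N gives N = sqrt(C / (6 * r)).
    The default r = 20 is a rule of thumb, not a fitted constant.
    """
    n_params = math.sqrt(flops / (6.0 * tokens_per_param))
    return n_params, tokens_per_param * n_params


if __name__ == "__main__":
    N, D = 1e9, 20e9                      # 1B parameters, 20B tokens
    C = train_flops(N, D)
    print(f"C     = {C:.2e} FLOPs")
    print(f"steps = {steps_needed(D, batch_tokens=2**21)}")
    # Hypothetical hardware: 8 GPUs, 3.12e14 peak FLOP/s each, 40% utilization.
    print(f"days  = {wall_clock_days(C, 8, 3.12e14, 0.40):.2f}")
    n_opt, d_opt = compute_optimal_split(C)
    print(f"N*    = {n_opt:.2e}, D* = {d_opt:.2e}")
```

Because the split assumes a fixed token-to-parameter ratio, changing `tokens_per_param` is the simplest way to explore how a different fitted ratio would shift the allocation for the same budget.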

Notes

Scaling Law
Model Data and Compute
Training Budget
Compute Optimal

Related Source TODOs

Scaling Laws for Neural Language Models
Training Compute-Optimal Large Language Models

4 notes in this folder.

March 28, 2026
Compute Optimal
- scaling
- compute-optimal
March 28, 2026
Model Data and Compute
- scaling
- compute
March 28, 2026
Scaling Law
- training
- scaling
March 28, 2026
Training Budget
- scaling
- budget
