🍊 Latent Atlas 🍉

DeepSeekMath

May 29, 2026 · 1 min read

source
paper
reasoning
grpo
math

Basic information

Title: DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Source type: paper
Related topic notes: GRPO, Rejection Sampling, RLHF, Knowledge Distillation

TODO

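Read the original paper and summarize the mathematical reasoning data, SFT, RL, and GRPO training pipeline.
Fill in the mechanism of group-relative policy optimization on verifiable math tasks, along with the experimental conclusions.
Add the relationship between verifier reward, number of samples, pass rate, and reasoning ability.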
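
As a starting point for the GRPO and verifier-reward items above, here is a minimal sketch of the group-relative advantage: each sampled answer's reward is normalized by the mean and standard deviation of its own sampling group. The function names, the binary 0/1 verifier reward, the group size of 8, the use of population standard deviation, and the `eps` term are all assumptions made for illustration. They are not taken from the paper's code and should be checked against the original when the TODO is filled in.

```python
import math
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each reward by the mean and std of its own sampling group.

    Assumption: population std plus a small eps to avoid division by zero.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in rewards]


def pass_rate(rewards: List[float]) -> float:
    """Fraction of samples in the group that the verifier accepted."""
    return sum(1 for r in rewards if r > 0) / len(rewards)


# Hypothetical group: 8 sampled answers to one problem,
# scored 1.0 if a verifier accepts the final answer, else 0.0.
rewards = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
advantages = group_relative_advantages(rewards)

print(f"pass rate: {pass_rate(rewards):.2f}")
print("advantages:", [round(a, 3) for a in advantages])

# A group where every sample passes (or every sample fails) has zero
# variance, so every advantage is 0 and the group gives no learning signal.
print(group_relative_advantages([1.0, 1.0, 1.0, 1.0]))
```

Under these assumptions, the rarer a correct answer is within its group, the larger its positive advantage, which is one concrete way the number of samples and the pass rate interact with the reward signal.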

Backlinks

Papers
GRPO
Post-training
Knowledge Distillation
On-policy KD
Rejection Sampling
Sequence-level Distillation
