🍊 Latent Atlas 🍉

Training language models to follow instructions with human feedback

May 29, 2026 · 1 min read

source
paper
instructgpt
rlhf
sft
reward-model

Basic information

Title: Training language models to follow instructions with human feedback
Source type: paper
Related topic notes: SFT, RLHF, Reward Model, PPO

TODO

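Read the original paper and summarize InstructGPT's three-stage pipeline: SFT, Reward Model, PPO.
Fill in details on human preference data, labeler ranking, the KL penalty, and evaluation with real users.
Add a note on the paper's significance as a paradigm for modern assistant post-training pipelines.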
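
As a placeholder until the items above are filled in, the two quantities they name (the reward model's pairwise ranking loss over labeler comparisons, and the KL-penalized reward used during PPO) can be sketched in a few lines. This is a minimal illustration of the general form, not the paper's implementation. The function names, the scalar inputs, and the value of `beta` are assumptions chosen for this note, and the pretraining-mix term of the PPO-ptx variant is omitted.

```python
import math


def pairwise_ranking_loss(reward_preferred: float, reward_rejected: float) -> float:
    """Loss for one labeler comparison: -log(sigmoid(r_preferred - r_rejected)).

    Hypothetical helper: a real reward model averages this over all pairs
    drawn from each ranked set of completions.
    """
    margin = reward_preferred - reward_rejected
    # Numerically stable softplus(-margin), which equals -log(sigmoid(margin)).
    return math.log1p(math.exp(-abs(margin))) + max(-margin, 0.0)


def kl_penalized_reward(
    rm_score: float,
    logprob_policy: float,
    logprob_sft: float,
    beta: float = 0.1,  # illustrative value, not taken from the paper
) -> float:
    """Reward model score minus a penalty for drifting from the SFT policy.

    The penalty is beta * (log pi_RL(y|x) - log pi_SFT(y|x)) for the
    sampled completion y.
    """
    return rm_score - beta * (logprob_policy - logprob_sft)


if __name__ == "__main__":
    # Ranking the preferred completion higher gives a smaller loss.
    print(pairwise_ranking_loss(2.0, 0.0), pairwise_ranking_loss(0.0, 2.0))
    # A completion the policy favors more than the SFT model does is penalized.
    print(kl_penalized_reward(1.0, logprob_policy=-1.0, logprob_sft=-2.0))
```

The ranking loss only depends on the difference between the two scores, which is why the reward model's absolute scale is arbitrary; the KL term keeps the PPO policy from exploiting that model too far from the SFT starting point.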

Backlinks

Papers
DPO
Post-training
Instruction Tuning
On-policy KD
PPO
Rejection Sampling
Reward Model
RLHF
SFT
