🍊 Latent Atlas 🍉

Training language models to follow instructions with human feedback

May 29, 2026 · 1 min read

  • source
  • paper
  • instructgpt
  • rlhf
  • sft
  • reward-model

Basic information

  • Title: Training language models to follow instructions with human feedback
  • Source type: paper
  • Related topic notes: SFT, RLHF, Reward Model, PPO

TODO

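  • Read the original paper and write up InstructGPT's three-stage pipeline: SFT, Reward Model, PPO.
  • Fill in the details on human preference data, labeler ranking, the KL penalty, and real-user evaluation.
  • Add the paper's significance as a paradigm for modern assistant post-training pipelines.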
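
As a placeholder until the notes above are filled in, here is a minimal sketch of two pieces the TODO list names: the pairwise loss that turns a labeler ranking into a reward-model training signal, and the KL-penalized reward used in the PPO stage. It assumes scalar reward-model scores and sequence-level log-probabilities are already available. The function names and the `beta` value are hypothetical, chosen for illustration and not taken from the paper's code, and the paper's additional pretraining-mix term is left out.

```python
import math
from itertools import combinations


def pairwise_ranking_loss(rewards_best_first):
    """Mean of -log(sigmoid(r_w - r_l)) over all pairs in one labeler ranking.

    `rewards_best_first` holds reward-model scores for K responses to the
    same prompt, ordered from most to least preferred by the labeler.
    """
    # combinations() preserves input order, so the first element of each
    # pair is always the response the labeler ranked higher.
    pairs = list(combinations(rewards_best_first, 2))
    total = 0.0
    for r_w, r_l in pairs:
        # -log(sigmoid(d)) == log(1 + exp(-d))
        total += math.log1p(math.exp(-(r_w - r_l)))
    return total / len(pairs)


def kl_penalized_reward(rm_score, logp_policy, logp_sft, beta):
    """Reward-model score minus beta * log(pi_policy / pi_sft) for one sample.

    `beta` is the KL penalty coefficient; any value passed here is an
    illustrative assumption.
    """
    return rm_score - beta * (logp_policy - logp_sft)


if __name__ == "__main__":
    # Scores that agree with the labeler's ranking give a lower loss
    # than scores that contradict it.
    print(pairwise_ranking_loss([2.0, 1.0, 0.0]))
    print(pairwise_ranking_loss([0.0, 1.0, 2.0]))
    # The policy is more confident than the SFT model on this sample,
    # so the penalty lowers the reward.
    print(kl_penalized_reward(1.0, logp_policy=-1.0, logp_sft=-2.0, beta=0.5))
```

The penalty term grows as the policy drifts away from the SFT model on a sampled response, which is what keeps PPO from over-optimizing the reward model.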

Backlinks

  • Papers
  • DPO
  • Post-training
  • Instruction Tuning
  • On-policy KD
  • PPO
  • Rejection Sampling
  • Reward Model
  • RLHF
  • SFT
