🍊 Latent Atlas 🍉

Tag: rlhf

4 notes with this tag.

May 29, 2026
Deep Reinforcement Learning from Human Preferences
May 29, 2026
Learning to summarize from human feedback
May 29, 2026
Training language models to follow instructions with human feedback
March 7, 2026
PPO
