🍊 Latent Atlas 🍉

Tag: reward-model

3 notes with this tag.

May 29, 2026
Deep Reinforcement Learning from Human Preferences
May 29, 2026
Learning to summarize from human feedback
May 29, 2026
Training language models to follow instructions with human feedback

🍊 Latent Atlas 🍉 · An AI knowledge atlas built with Quartz © 2026