🍊 Latent Atlas 🍉

Constitutional AI

May 29, 2026 · 1 min read

source
paper
alignment
rlaif
safety

Basic Information

Title: Constitutional AI: Harmlessness from AI Feedback
Source type: paper
Related topic notes: RLHF, Reward Model, SFT

TODO

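Read the original paper and write up Constitutional AI's critique-revision, AI feedback, and harmlessness training pipeline.
Fill in the relationship between RLAIF and RLHF, and how the safety preference data is constructed.
Add notes on the constitution's principles, refusal boundaries, and the limitations of the preference model.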

Backlinks

Papers
Reward Model
RLHF
