Value Alignment - Search News

One-way AI alignment no longer works in generative AI world: Here's why

The authors argue that generative AI introduces a new class of alignment risks because interaction itself becomes a mechanism of influence. Humans adapt their behavior in response to AI outputs, ...

Harvard Medical School

SEAL: Systematic Error Analysis for Value ALignment

With coauthors from HLS and OpenAI, Manon Revel introduces evaluative metrics for reward models' alignment with values expressed in training datasets. "The importance of having a high-quality ...


