ai alignment
teaching a genie to want what you meant, not just what you said.
means: the field of making ai systems pursue goals that actually match human values and intentions, rather than some technically correct but disastrous shortcut.
from: the term grew out of norbert wiener's 1960s warnings about machines optimizing the wrong thing, and was later sharpened by stuart russell and effective altruist researchers, who worried that a literal-minded, powerful ai would satisfy its instructions while wrecking everything nobody thought to specify.
paperclip thought experiment: a goal this simple could consume the planet
rlhf: humans rank ai answers to nudge behavior
inner vs outer: alignment can fail even after training succeeds
funding surge: labs now spend billions specifically chasing this
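
to make the rlhf item above concrete, here is a minimal sketch of how human rankings can become a training signal. it assumes a bradley-terry style pairwise loss, which is one common choice for reward models and not necessarily what any particular lab uses. the two features (helpfulness, rudeness), the `PAIRS` data, and every function name are hypothetical and exist only for illustration; real systems score text with a neural network rather than two hand-made numbers.

```python
import math

# Hypothetical data: each answer is reduced to two made-up features,
# [helpfulness, rudeness]. Each pair is (answer the human preferred, answer they rejected).
PAIRS = [
    ([1.0, 0.0], [0.2, 0.9]),
    ([0.8, 0.1], [0.9, 1.0]),
    ([0.6, 0.0], [0.5, 0.7]),
]


def sigmoid(x: float) -> float:
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def score(weights, features) -> float:
    """Toy linear reward model: a weighted sum of the features."""
    return sum(w * f for w, f in zip(weights, features))


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise loss -log(sigmoid(r_chosen - r_rejected)), computed stably."""
    margin = reward_chosen - reward_rejected
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def train_reward_model(pairs, n_features, lr=0.5, epochs=200):
    """Fit the weights by plain gradient descent on the pairwise loss."""
    weights = [0.0] * n_features
    for _ in range(epochs):
        for chosen, rejected in pairs:
            diff = [c - r for c, r in zip(chosen, rejected)]
            margin = score(weights, diff)
            # d(loss)/d(weights) = -(1 - sigmoid(margin)) * diff, so step the other way.
            step = lr * (1.0 - sigmoid(margin))
            weights = [w + step * d for w, d in zip(weights, diff)]
    return weights


if __name__ == "__main__":
    learned = train_reward_model(PAIRS, n_features=2)
    print("learned weights:", learned)
    for chosen, rejected in PAIRS:
        print(score(learned, chosen), ">", score(learned, rejected))
```

in a full rlhf pipeline the learned reward would then be used to fine-tune the model itself; that step is left out here. the sketch also hints at the outer alignment worry: the reward only captures whatever the rankings happened to reveal, so anything the raters never compared stays unspecified.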