the.com/define/ai alignment

teaching a genie to want what you meant, not just what you said.

means: the field of making ai systems pursue goals that actually match human values and intentions, not some technically correct but disastrous shortcut.

from: the term grew out of norbert wiener's 1960s warnings about machines optimizing the wrong thing, then got sharpened by stuart russell and effective altruist researchers, who worried that a literal-minded, powerful ai would satisfy its instructions while wrecking everything nobody thought to specify.

paperclip thought experiment: a goal this simple could consume the planet

rlhf: humans rank ai answers to nudge behavior

inner vs outer: alignment can fail even after training succeeds

funding surge: labs now spend billions specifically chasing this
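
to make the rlhf entry above concrete, here is a minimal sketch of how one human ranking could become a training signal. it assumes a bradley-terry style pairwise preference model, which is one common choice and not something this page specifies. the function names and the example scores are hypothetical, and in a real system a learned reward model would produce the scores.

```python
import math


def preference_probability(reward_a: float, reward_b: float) -> float:
    """Probability that answer A is preferred over answer B under the
    assumed Bradley-Terry model: sigmoid(reward_a - reward_b)."""
    margin = reward_a - reward_b
    # Branch on the sign so math.exp never receives a large positive argument.
    if margin >= 0:
        return 1.0 / (1.0 + math.exp(-margin))
    exp_margin = math.exp(margin)
    return exp_margin / (1.0 + exp_margin)


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Negative log-likelihood of the human's choice:
    -log(sigmoid(reward_chosen - reward_rejected)), computed stably."""
    x = reward_rejected - reward_chosen
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


if __name__ == "__main__":
    # Hypothetical scores for two answers; the human ranked the first one higher.
    agrees = preference_loss(reward_chosen=2.0, reward_rejected=0.0)
    disagrees = preference_loss(reward_chosen=0.0, reward_rejected=2.0)
    print(f"loss when the scores agree with the human:    {agrees:.4f}")
    print(f"loss when the scores disagree with the human: {disagrees:.4f}")
```

the loss is small when the scores already agree with the human ranking and large when they disagree, so minimizing it nudges the scores toward the human's preferences. a later training stage, not shown here, would then tune the ai against those scores.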
