the.com/utf-8
The universal translator that let the internet stop fighting over which alphabet wins.
means: A character encoding that represents every Unicode character, and so text in any written language, as a variable-length sequence of one to four bytes. Basic English letters take one byte each, while most emoji take four.
from: Created in 1992 by Ken Thompson and Rob Pike at Bell Labs as a pragmatic fix to Unicode's bloat problem. Unicode assigns every character a number (a code point), but storing each one as a fixed 32-bit integer wastes space: plain English text grows fourfold. UTF-8 (Unicode Transformation Format, 8-bit) was the elegant answer: use one byte for ASCII and multiple bytes for everything else. It became the internet's default encoding almost by accident, because it was simple, backward-compatible, and efficient.
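The bit layout behind "one byte for ASCII, multiple bytes for everything else" can be sketched as a hand-written encoder. This is a minimal Python sketch assuming the post-2003 limits (code points up to U+10FFFF, at most four bytes per code point); `utf8_encode` and `encode_string` are hypothetical names chosen for illustration, and the output is checked against Python's built-in `str.encode("utf-8")`.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode a single Unicode code point as UTF-8 by hand (RFC 3629 limits)."""
    if code_point < 0 or code_point > 0x10FFFF:
        raise ValueError("outside the Unicode range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded in UTF-8")
    if code_point < 0x80:
        # 1 byte: 0xxxxxxx, byte-for-byte identical to ASCII
        return bytes([code_point])
    if code_point < 0x800:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


def encode_string(text: str) -> bytes:
    """Encode a whole string by concatenating each code point's bytes."""
    return b"".join(utf8_encode(ord(ch)) for ch in text)


for sample in ["A", "é", "€", "😀"]:
    encoded = encode_string(sample)
    assert encoded == sample.encode("utf-8")  # agrees with Python's codec
    print(sample, len(encoded), encoded.hex(" "))
```

Running it prints one line per sample: `A` takes 1 byte (`41`), `é` takes 2 (`c3 a9`), `€` takes 3 (`e2 82 ac`) and `😀` takes 4 (`f0 9f 98 80`). The high bits of each first byte (`0`, `110`, `1110`, `11110`) are what tell a decoder how long the sequence is.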
backward compat magic: Old ASCII files are already valid UTF-8, with zero changes needed
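why variable length works: The first byte's high bits signal how many bytes follow, and every continuation byte starts with the bits 10, so a decoder that lands mid-character can skip ahead to the next character boundary instead of garbling the rest of the text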
dominance now: Used by roughly 97% of web pages; every script Unicode covers can be encoded, so one encoding serves every major language
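emoji efficiency: Each emoji code point takes at most 4 bytes (some older ones, such as ☺ U+263A, take 3). The original 1992 design allowed sequences of up to 6 bytes, but RFC 3629 (2003) capped UTF-8 at 4, so no code point needs more. Composite emoji such as flags or ZWJ family sequences chain several code points, though, so one visible emoji can occupy 8, 18, or more bytes.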
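The mechanical claims in these facts (the first byte announces the sequence length, continuation bytes let a decoder resynchronize, ASCII passes through untouched, and one visible emoji can span several code points) can be checked with a short Python sketch. The helpers `sequence_length` and `next_boundary` are hypothetical names written for illustration, and the length check is deliberately simplified rather than a fully validating decoder.

```python
def sequence_length(lead: int) -> int:
    """Bytes in a UTF-8 sequence, read from its first byte (0 = not a lead byte).

    Simplified: it does not reject lead bytes that RFC 3629 forbids
    (0xC0, 0xC1, 0xF5 to 0xF7).
    """
    if lead < 0x80:
        return 1  # 0xxxxxxx: plain ASCII
    if lead >> 5 == 0b110:
        return 2  # 110xxxxx
    if lead >> 4 == 0b1110:
        return 3  # 1110xxxx
    if lead >> 3 == 0b11110:
        return 4  # 11110xxx
    return 0      # 10xxxxxx continuation byte, or 0xF8 to 0xFF


def next_boundary(data: bytes, pos: int) -> int:
    """Skip continuation bytes (10xxxxxx) to reach the next character start."""
    while pos < len(data) and data[pos] >> 6 == 0b10:
        pos += 1
    return pos


# Lead bytes announce the sequence length.
print([sequence_length(s.encode("utf-8")[0]) for s in "Aé€😀"])  # [1, 2, 3, 4]

# Resynchronize after landing inside the emoji's 4-byte sequence.
data = "😀A".encode("utf-8")
print(data[next_boundary(data, 1):].decode("utf-8"))  # A

# Pure ASCII bytes decode unchanged as UTF-8.
assert b"plain ASCII".decode("utf-8") == "plain ASCII"

# One visible emoji can span several code points.
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"  # man, ZWJ, woman, ZWJ, girl
print(len(family), len(family.encode("utf-8")))  # 5 18
```

The family emoji shows why "4 bytes per emoji" holds only per code point: three 4-byte people joined by two 3-byte zero-width joiners (U+200D) add up to 18 bytes for what renders as a single glyph.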