the.com/llm
Statistical pattern-matching dressed up in words, with no actual understanding underneath.
means A large language model: a neural network trained on vast amounts of text to predict the next token so convincingly that it seems intelligent, though underneath it is just probability distilled into weights.
from Emerged circa 2017 as the transformer architecture was scaled to absurd dimensions. The transformer paper, "Attention Is All You Need" (2017), described the mechanism; GPT (2018) showed it could generate coherent text; GPT-3 (2020) shocked nearly everyone by seeming almost sentient. The acronym stuck because 'big neural network trained on text' is a mouthful.
no actual reasoning: generates next token via matrix math, not thought
hallucination endemic: confident wrongness baked into the architecture
training data cutoff: frozen in time, unaware of events after training
token inefficiency: often reinvents reasoning instead of retrieving facts
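
The "next token via matrix math" item can be made concrete with a toy sketch. Everything here is an assumption made for illustration: the four-word vocabulary, the hand-picked `OUTPUT_WEIGHTS`, and the three-number hidden state are hypothetical stand-ins for what a real model learns across billions of weights. The shape of the final step is the point: multiply a hidden state by a weight matrix, turn the scores into probabilities, and pick a token.

```python
import math
import random

# Hypothetical toy vocabulary; real models use tens of thousands of tokens.
VOCAB = ["the", "cat", "sat", "mat"]

# Hypothetical output weights, one row per vocabulary entry.
# A real model learns these values during training.
OUTPUT_WEIGHTS = [
    [0.2, -0.1, 0.4],
    [1.5, 0.3, -0.2],
    [-0.5, 2.0, 0.1],
    [0.0, 0.5, 1.0],
]


def logits_from_hidden(hidden):
    """Matrix-vector product: one raw score (logit) per vocabulary token."""
    return [sum(w * h for w, h in zip(row, hidden)) for row in OUTPUT_WEIGHTS]


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def next_token(hidden, rng=None):
    """Pick the next token: greedy if rng is None, otherwise sampled."""
    probs = softmax(logits_from_hidden(hidden))
    if rng is None:
        return VOCAB[probs.index(max(probs))]
    return rng.choices(VOCAB, weights=probs, k=1)[0]


# A made-up hidden state standing in for "everything the model has read so far".
hidden_state = [1.0, 0.0, 0.0]
print(next_token(hidden_state))                    # greedy choice
print(next_token(hidden_state, random.Random(0)))  # sampled choice
```

Passing an `rng` samples from the distribution instead of always taking the top score, which is roughly why the same prompt can produce different answers. Nothing in the loop checks whether the chosen token is true, only whether it is probable.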