the.com/softmax

turns a pile of raw scores into a popularity contest that sums to one.

means

a function that converts a vector of numbers into probabilities, exaggerating the biggest ones so the model can commit to an answer while still hedging a little.
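a minimal sketch of that definition in plain python, assuming the standard formula (exponentiate each score, then divide by the total). the function name and the sample scores are illustrative, not taken from any particular library.

```python
import math

def softmax(scores):
    """Convert a list of raw scores into probabilities that sum to one."""
    # Subtracting the largest score leaves the result unchanged
    # but keeps exp() from overflowing on large inputs.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative scores: the largest one ends up with most of the probability.
probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]
```

the top score is only twice the second one, yet it takes roughly 66% of the probability while the others keep a nonzero share: that is the exaggerate-but-hedge behavior described above.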

from

borrowed from statistical mechanics, where the boltzmann distribution describes how likely each energy state is to be occupied; economists used the same form as the multinomial logit model, and neural network researchers named it softmax at the end of the 1980s because it behaves like a smooth, differentiable version of picking the max (strictly, the arg max).

for instance

gpt token prediction: softmax over 100000+ vocabulary entries gives the probabilities each next token is sampled from

imagenet classifiers: final layer turns 1000 class scores into probabilities

alphago move selection: policy network uses softmax to weigh candidate moves
