the.com/chaos engineering
you break your own systems on purpose so the internet doesn't do it for you.
means a practice of deliberately injecting failures into production systems to find weaknesses before they cause real outages.
from born at netflix around 2010 when engineers built chaos monkey, a tool that randomly killed servers during business hours to force resilient design.
business hours onlynetflix ran it while engineers were awake to fix things
the zoospawned chaos gorilla and chaos kong for bigger blasts
principle firstformalized as a discipline with actual hypotheses, not just sabotage
for instance
netflix chaos monkey — randomly terminates instances since 2011, open-sourced 2012
amazon gameday — aws teams simulate outages internally to rehearse response
google dirt exercises — disaster recovery testing that once knocked out internal tools by mistake