the.com/datalake
where structured data goes to become somebody else's problem, indefinitely.
means: a storage repository that holds vast amounts of raw data in its native format until someone decides what to do with it.
from: coined around 2010 by james dixon of pentaho, who contrasted it with a data mart: if a mart is bottled water, a lake is the whole ecosystem, untreated and unfiltered.
contrast term: data warehouse, which stores refined, structured, query-ready data
failure mode: often becomes a data swamp nobody can navigate
schema rule: schema-on-read (structure is applied when the data is queried), where warehouses use schema-on-write
scale: petabytes are casual conversation here
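
the schema rule above is easier to see in code. what follows is a minimal sketch, not anybody's official method: the sample records, the field names, and the helper `read_with_schema` are all made up for illustration. raw lines are kept exactly as they arrived, and a schema is only imposed when someone reads them.

```python
import json

# Hypothetical raw records, stored exactly as they arrived (nothing validated on write).
RAW_LINES = [
    '{"user": "ana", "amount": "12.50", "ts": "2024-01-01"}',
    '{"user": "bo", "amount": 7}',
    'not even json',
]


def read_with_schema(lines, schema):
    """Apply a schema at read time: keep rows that can be coerced, skip the rest."""
    rows = []
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # the malformed line stays in the lake; this reader just skips it
        if not isinstance(record, dict):
            continue
        try:
            # Select only the fields this reader cares about and cast each one.
            rows.append({field: cast(record[field]) for field, cast in schema.items()})
        except (KeyError, TypeError, ValueError):
            continue  # missing field or uncastable value for this particular schema
    return rows


# Two readers, two schemas, same untouched raw data (both schemas are assumptions).
spend_rows = read_with_schema(RAW_LINES, {"user": str, "amount": float})
event_rows = read_with_schema(RAW_LINES, {"user": str, "ts": str})
```

a schema-on-write system would have rejected or reshaped the second and third lines at load time. here they sit in storage untouched, and each reader decides what counts as a valid row, which is also roughly how the swamp starts.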