🟣 Terminology: ETL and Data Storage
ETL = Extract, Transform, Load. The pipeline for moving data into an analytics system.
- Extract: pull raw data from databases, APIs, logs
- Transform: clean, validate, restructure, aggregate
- Load: write into the target (warehouse)
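Quick sketch of the three ETL stages. Everything here is an assumption for illustration: the sample records, the function names, the `user_spend` table, and SQLite standing in for a real warehouse.

```python
import sqlite3

# Hypothetical raw records standing in for data pulled from a database, API, or log.
RAW_EVENTS = [
    {"user": " alice ", "amount": "12.50"},
    {"user": "bob", "amount": "7.25"},
    {"user": "alice", "amount": "3.00"},
    {"user": "", "amount": "oops"},  # invalid row, dropped during transform
]


def extract():
    """Extract: pull raw records from the source (here, an in-memory list)."""
    return list(RAW_EVENTS)


def transform(rows):
    """Transform: clean names, validate amounts, aggregate spend per user."""
    totals = {}
    for row in rows:
        user = row["user"].strip()
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount is not numeric
        if not user:
            continue  # skip rows with no user
        totals[user] = totals.get(user, 0.0) + amount
    return sorted(totals.items())


def load(conn, totals):
    """Load: write the processed rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS user_spend (username TEXT PRIMARY KEY, total REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO user_spend VALUES (?, ?)", totals)
    conn.commit()


def run_etl(conn):
    """Run extract, transform, load in order and return what landed in the target."""
    load(conn, transform(extract()))
    return conn.execute(
        "SELECT username, total FROM user_spend ORDER BY username"
    ).fetchall()


if __name__ == "__main__":
    print(run_etl(sqlite3.connect(":memory:")))
```

Note that only cleaned, aggregated rows ever reach the target; the invalid record never gets loaded.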
Modern twist: ELT (Extract, Load, Transform): load raw first, transform inside the warehouse. Cheap storage and scalable warehouse compute make this viable.
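Sketch of the ELT ordering, for contrast: raw rows land untouched, and the cleanup runs as SQL inside the target. The sample data, table names (`raw_orders`, `order_totals`), and function names are assumptions, and SQLite again stands in for a real warehouse.

```python
import sqlite3

# Hypothetical raw records; values are kept as untyped text, exactly as received.
RAW_ORDERS = [
    (" alice ", "12.50"),
    ("bob", "7.25"),
    ("alice", "3.00"),
    ("", "1.00"),  # missing user, filtered out later by the SQL transform
]


def load_raw(conn, rows):
    """Load: write raw rows as-is into a staging table, with no cleaning."""
    conn.execute("CREATE TABLE IF NOT EXISTS raw_orders (username TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?)", rows)
    conn.commit()


def transform_in_warehouse(conn):
    """Transform: clean and aggregate with SQL, inside the same database."""
    conn.execute("DROP TABLE IF EXISTS order_totals")
    conn.execute(
        """
        CREATE TABLE order_totals AS
        SELECT TRIM(username) AS username,
               SUM(CAST(amount AS REAL)) AS total
        FROM raw_orders
        WHERE TRIM(username) <> ''
        GROUP BY TRIM(username)
        """
    )
    conn.commit()
    return conn.execute(
        "SELECT username, total FROM order_totals ORDER BY username"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_raw(conn, RAW_ORDERS)
    print(transform_in_warehouse(conn))
```

The raw table is still there after the transform, so the cleanup logic can be changed and rerun without extracting from the source again.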
Data Warehouse = structured, processed data, optimized for analytical queries (Snowflake, BigQuery, Redshift). Think "organized library."
Data Lake = raw data in ANY format at any scale (S3, Azure Data Lake). Think "dump everything, organize later." Risk: becomes a "data swamp" without governance.
Practice Questions
Q: Your company stores raw JSON logs from 50 different services. Where does this go — warehouse or lake?