🟣 Terminology: ETL and Data Storage

ETL = Extract, Transform, Load. The pipeline for moving data into an analytics system.
- Extract: pull raw data from databases, APIs, logs
- Transform: clean, validate, restructure, aggregate
- Load: write into the target (warehouse)
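
The three stages can be sketched in a few lines of Python. This is a toy illustration, not a production pipeline: the `RAW_EVENTS` records, the `revenue_by_country` table, and the function names are all made up for the example, and an in-memory SQLite database stands in for a real warehouse.

```python
import json
import sqlite3

# Hypothetical raw records, standing in for rows pulled from an API or log file.
RAW_EVENTS = [
    '{"user": "ana", "amount": "19.90", "country": "br"}',
    '{"user": "bob", "amount": "5.00", "country": "US"}',
    '{"user": "", "amount": "oops", "country": "us"}',  # invalid, dropped in transform
    '{"user": "cy", "amount": "12.10", "country": "us"}',
]


def extract(lines):
    """Extract: parse raw JSON strings into dicts."""
    return [json.loads(line) for line in lines]


def transform(records):
    """Transform: validate, clean, and aggregate revenue per country."""
    totals = {}
    for rec in records:
        try:
            amount = float(rec["amount"])
        except (KeyError, ValueError):
            continue  # validate: skip rows without a numeric amount
        if not rec.get("user"):
            continue  # validate: skip rows without a user
        country = rec["country"].upper()  # clean: normalize country codes
        totals[country] = totals.get(country, 0.0) + amount  # aggregate
    return sorted(totals.items())


def load(rows, conn):
    """Load: write the finished rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS revenue_by_country "
        "(country TEXT PRIMARY KEY, total REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO revenue_by_country VALUES (?, ?)", rows)
    conn.commit()


def run_etl(conn):
    """Run all three stages and return what ended up in the target table."""
    load(transform(extract(RAW_EVENTS)), conn)
    return conn.execute(
        "SELECT country, total FROM revenue_by_country ORDER BY country"
    ).fetchall()
```

Calling `run_etl(sqlite3.connect(":memory:"))` returns one row per country. The point to notice is that only cleaned, aggregated rows ever reach the target; the invalid record never leaves the transform step.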

Modern twist: ELT (Extract, Load, Transform). Load the raw data first, then transform it inside the warehouse. Cheap storage and scalable warehouse compute make this viable.
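
For contrast, here is a rough ELT sketch under the same assumptions (made-up `RAW_ROWS`, hypothetical table names, in-memory SQLite standing in for the warehouse). The raw rows are landed untouched as text, and the cleaning and aggregation happen afterwards in SQL.

```python
import sqlite3

# Hypothetical raw rows, landed exactly as received (every value is text).
RAW_ROWS = [
    ("ana", "19.90", "br"),
    ("bob", "5.00", "US"),
    ("", "oops", "us"),  # bad row: still loaded, filtered later in SQL
    ("cy", "12.10", "us"),
]


def run_elt(conn):
    # Load: land the raw data first, with no cleaning at all.
    conn.execute(
        "CREATE TABLE raw_events (user_id TEXT, amount TEXT, country TEXT)"
    )
    conn.executemany("INSERT INTO raw_events VALUES (?, ?, ?)", RAW_ROWS)

    # Transform: done inside the "warehouse" with SQL, after loading.
    conn.execute(
        """
        CREATE TABLE revenue_by_country AS
        SELECT UPPER(country) AS country,
               SUM(CAST(amount AS REAL)) AS total
        FROM raw_events
        WHERE user_id <> ''
          AND amount GLOB '[0-9]*.[0-9]*'
        GROUP BY UPPER(country)
        """
    )
    conn.commit()
    return conn.execute(
        "SELECT country, total FROM revenue_by_country ORDER BY country"
    ).fetchall()
```

Unlike the ETL version, the bad row is still sitting in `raw_events` after the run, so the transform can be rewritten and re-run later without extracting the data again.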

Data Warehouse = structured, processed data, optimized for analytical queries (Snowflake, BigQuery, Redshift). Think "organized library."

Data Lake = raw data in ANY format at any scale (S3, Azure Data Lake). Think "dump everything, organize later." Risk: becomes a "data swamp" without governance.
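
To make "dump everything, organize later" concrete, here is a small sketch in which a local directory stands in for object storage such as S3. The `service=.../date=...` path layout, the `events.jsonl` file name, and both function names are assumptions chosen for illustration, not a required convention.

```python
import json
import tempfile
from pathlib import Path


def land_raw(root: Path, service: str, day: str, lines: list[str]) -> Path:
    """Write raw log lines untouched under a partitioned path (hypothetical layout)."""
    part = root / f"service={service}" / f"date={day}"
    part.mkdir(parents=True, exist_ok=True)
    target = part / "events.jsonl"
    with target.open("a", encoding="utf-8") as fh:
        for line in lines:
            fh.write(line.rstrip("\n") + "\n")  # no parsing, no validation
    return target


def read_service(root: Path, service: str) -> list:
    """Parse only at read time, skipping lines that are not valid JSON."""
    records = []
    for path in sorted(root.glob(f"service={service}/date=*/events.jsonl")):
        for line in path.read_text(encoding="utf-8").splitlines():
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError:
                continue  # the lake accepted it; the reader decides what to do
    return records


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        lake = Path(tmp)
        land_raw(lake, "auth", "2024-01-01", ['{"event": "login"}', "not json"])
        print(read_service(lake, "auth"))
```

Nothing is checked on the way in, which is what makes a lake cheap and flexible. It is also why, without naming conventions and ownership rules like the partitioned paths above, it can drift into a "data swamp."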

Practice Questions

Q: Your company stores raw JSON logs from 50 different services. Where does this go — warehouse or lake?