terminology

🟣 Terminology: CAP Theorem and MapReduce

CAP Theorem A distributed system can guarantee at most 2 of 3: - Consistency — all nodes see the same data simultaneously - Availability — every request gets a response - Partition tolerance — system works despite network failures

Network partitions WILL happen, so you actually choose between: - CP (consistent, may reject during partitions) — banks, financial systems - AP (always responds, may serve stale data) — social media, caching

MapReduce Processing huge datasets across many machines: - Map: Apply function to each record → key-value pairs - Shuffle: Group values by key - Reduce: Aggregate per key

Word count: Map: each doc → (word, 1). Reduce: sum per word → (word, total).

Practice Questions

Q: You're building a real-time fraud detector. CP or AP system? Why?