🟠ML: Gradient Descent
What it is: An optimization algorithm that iteratively adjusts parameters in the direction of the negative gradient (steepest descent) to minimize the loss function.
Intuition: You're on a hilly landscape in fog. You can only feel the slope beneath your feet. You take a step downhill. Repeat until you reach the bottom.
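A minimal sketch of that loop on a toy 1-D loss; the names (`loss`, `grad`, `theta`, `lr`) and the quadratic loss are illustrative assumptions, not part of the notes:

```python
# Toy loss L(theta) = (theta - 3)^2, whose minimum sits at theta = 3.
def loss(theta):
    return (theta - 3.0) ** 2

def grad(theta):
    return 2.0 * (theta - 3.0)      # dL/dtheta, the "slope under your feet"

theta = 0.0                          # start anywhere on the landscape
lr = 0.1                             # learning rate = step size downhill

for step in range(50):
    theta -= lr * grad(theta)        # step opposite the gradient

print(theta)                         # ~3.0, the bottom of the valley
```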
Learning rate:
- Too high → overshoots the minimum, bounces around, or diverges
- Too low → converges very slowly; can stall on a plateau or in a shallow local minimum
- Just right → smooth convergence
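You can see all three regimes by rerunning the same toy loop with different step sizes; the specific values here are illustrative, not tuned recommendations:

```python
# Same toy loss (theta - 3)^2; only the learning rate changes.
def run(lr, steps=30, theta=0.0):
    for _ in range(steps):
        theta -= lr * 2.0 * (theta - 3.0)   # gradient of (theta - 3)^2
    return theta

print(run(1.5))    # too high: each step overshoots and |theta| blows up
print(run(0.001))  # too low: still nowhere near 3 after 30 steps
print(run(0.1))    # about right: lands very close to the minimum at 3
```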
Variants:
- Batch: use ALL data per step; stable but slow
- Stochastic (SGD): use ONE random sample per step; fast but noisy
- Mini-batch: use a subset (e.g., 32-256 samples); best of both worlds, the standard in practice
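A sketch of the mini-batch variant on toy linear regression; the data, the batch size of 32, and all variable names are assumptions made for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))                  # 1000 samples, 3 features
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=1000)    # noisy targets

w = np.zeros(3)
lr, batch_size = 0.05, 32

for epoch in range(20):
    order = rng.permutation(len(X))             # reshuffle every epoch
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]   # one mini-batch
        Xb, yb = X[idx], y[idx]
        g = 2 * Xb.T @ (Xb @ w - yb) / len(idx) # gradient of MSE on the batch
        w -= lr * g                             # same update rule, per batch

print(w)   # recovers roughly [2.0, -1.0, 0.5]
```

Batch gradient descent would compute `g` over all 1000 rows each step; plain SGD would use a single row. Mini-batches keep the noise low enough to converge smoothly while staying cheap per step.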
Practice Questions
Q: "What IS a loss function?" (Explain simply.)