🟠ML: Gradient Descent
What it is: An optimization algorithm that iteratively adjusts parameters in the direction of the negative gradient (steepest descent) to minimize the loss function.
Intuition: You're on a hilly landscape in fog. You can only feel the slope beneath your feet. You take a step downhill. Repeat until you reach the bottom.
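A minimal sketch of that loop on a toy 1-D loss; the names (`loss`, `grad`, `theta`, `lr`) and the quadratic loss are illustrative assumptions, not part of the notes:

```python
# Toy loss L(theta) = (theta - 3)^2, whose minimum sits at theta = 3.
def loss(theta):
    return (theta - 3.0) ** 2

def grad(theta):
    return 2.0 * (theta - 3.0)      # dL/dtheta, the "slope under your feet"

theta = 0.0                          # start anywhere on the landscape
lr = 0.1                             # learning rate = step size downhill

for step in range(50):
    theta -= lr * grad(theta)        # step opposite the gradient

print(theta)                         # ~3.0, the bottom of the valley
```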
Learning rate:
- Too high → overshoots the minimum, bounces around, or diverges
- Too low → converges very slowly; can stall on a plateau or in a shallow local minimum
- Just right → smooth convergence
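You can see all three regimes by rerunning the same toy loop with different step sizes; the specific values here are illustrative, not tuned recommendations:

```python
# Same toy loss (theta - 3)^2; only the learning rate changes.
def run(lr, steps=30, theta=0.0):
    for _ in range(steps):
        theta -= lr * 2.0 * (theta - 3.0)   # gradient of (theta - 3)^2
    return theta

print(run(1.5))    # too high: each step overshoots and |theta| blows up
print(run(0.001))  # too low: still nowhere near 3 after 30 steps
print(run(0.1))    # about right: lands very close to the minimum at 3
```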
Variants:
- Batch: use ALL data per step; stable but slow
- Stochastic (SGD): use ONE random sample per step; fast but noisy
- Mini-batch: use a subset (e.g., 32-256 samples); best of both worlds, the standard in practice
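A sketch of the mini-batch variant on toy linear regression; the data, the batch size of 32, and all variable names are assumptions made for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))                  # 1000 samples, 3 features
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=1000)    # noisy targets

w = np.zeros(3)
lr, batch_size = 0.05, 32

for epoch in range(20):
    order = rng.permutation(len(X))             # reshuffle every epoch
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]   # one mini-batch
        Xb, yb = X[idx], y[idx]
        g = 2 * Xb.T @ (Xb @ w - yb) / len(idx) # gradient of MSE on the batch
        w -= lr * g                             # same update rule, per batch

print(w)   # recovers roughly [2.0, -1.0, 0.5]
```

Batch gradient descent would compute `g` over all 1000 rows each step; plain SGD would use a single row. Mini-batches keep the noise low enough to converge smoothly while staying cheap per step.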
Practice Questions
Q: "What IS a loss function?" (Explain simply.)