
🟠 ML: L1 (Lasso) vs L2 (Ridge) Regularization

Regularization adds a penalty to the loss function to discourage overly complex models.

L1 (Lasso): Penalty = λ × Σ|βᵢ|
- Drives coefficients to exactly zero → built-in feature selection
- Produces sparse models (few features left)
- Use when many features are irrelevant

L2 (Ridge): Penalty = λ × Σβᵢ²
- Shrinks coefficients toward zero but never exactly zero
- Keeps all features with smaller weights
- Use when features are correlated (distributes weight among them)
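A minimal sketch of the difference in scikit-learn; the dataset shape and alpha (λ) values below are illustrative assumptions, not tuned choices:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# 100 features, only 10 of which actually drive the target
X, y = make_regression(n_samples=500, n_features=100, n_informative=10,
                       noise=10.0, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)   # L1 penalty
ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty

print("Lasso zero coefficients:", int(np.sum(lasso.coef_ == 0)))  # most are exactly 0
print("Ridge zero coefficients:", int(np.sum(ridge.coef_ == 0)))  # typically none
```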

Elastic Net: Penalty = α×L1 + (1−α)×L2, with α controlling the mix. Best of both worlds: L1's sparsity plus L2's stability on correlated features.
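In scikit-learn the mixing weight is exposed as `l1_ratio` (playing the role of α above); a minimal sketch with illustrative values:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet

X, y = make_regression(n_samples=500, n_features=100, n_informative=10,
                       noise=10.0, random_state=0)

# l1_ratio = 1.0 is pure L1 (Lasso), 0.0 is pure L2 (Ridge)
enet = ElasticNet(alpha=1.0, l1_ratio=0.5).fit(X, y)
print("Elastic Net zero coefficients:", int((enet.coef_ == 0).sum()))
```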

🚨 THE TRAP QUESTION

"Are L1/L2 applicable to Random Forest?"

Answer: No. L1/L2 penalize model coefficients, and trees have no coefficients — they split on feature thresholds. Tree models regularize via capacity controls instead: max_depth, min_samples_leaf, min_samples_split, n_estimators (averaging reduces variance), and learning rate (boosting only). See the sketch below.
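A minimal sketch of what "regularizing" a tree ensemble looks like in scikit-learn; the hyperparameter values here are illustrative assumptions, not tuned settings:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Random Forest: regularize by limiting tree capacity and averaging many trees
rf = RandomForestClassifier(
    n_estimators=200,       # more trees -> lower variance through averaging
    max_depth=8,            # cap tree depth
    min_samples_leaf=5,     # each leaf must cover enough samples
    min_samples_split=10,   # each split must cover enough samples
    random_state=0,
).fit(X, y)

# Boosting adds shrinkage: learning_rate scales each tree's contribution
gb = GradientBoostingClassifier(
    n_estimators=300,
    learning_rate=0.05,
    max_depth=3,
    random_state=0,
).fit(X, y)
```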

Practice Questions

Q: You have 500 features but suspect only ~20 matter. L1 or L2?

A: L1 (Lasso). It drives the coefficients of the irrelevant ~480 features to exactly zero, giving you feature selection for free.
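A minimal sketch of this scenario on synthetic data; the alpha value is an illustrative assumption and would normally be tuned by cross-validation (e.g. LassoCV):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# 500 features, ~20 informative, mirroring the question
X, y = make_regression(n_samples=1000, n_features=500, n_informative=20,
                       noise=5.0, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)
kept = np.flatnonzero(lasso.coef_)
print(f"Lasso kept {kept.size} of 500 features")  # close to the ~20 that matter
```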