🟠ML: L1 (Lasso) vs L2 (Ridge) Regularization
Regularization adds a penalty to the loss function to discourage overly complex models.
L1 (Lasso): Penalty = λ × Σ|βᵢ|
- Drives coefficients to exactly zero → built-in feature selection
- Produces sparse models (few features remain)
- Use when many features are irrelevant
L2 (Ridge): Penalty = λ × Σβᵢ²
- Shrinks coefficients toward zero but never to exactly zero
- Keeps all features, just with smaller weights
- Use when features are correlated (distributes weight among them)
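A minimal sketch of the contrast, assuming scikit-learn and synthetic data (the alpha value, which plays the role of λ, is illustrative, not tuned):

```python
# Lasso zeroes out irrelevant coefficients; Ridge only shrinks them.
# Synthetic data: 10 informative features out of 50.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

X, y = make_regression(n_samples=200, n_features=50, n_informative=10,
                       noise=10.0, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)  # alpha = penalty strength λ
ridge = Ridge(alpha=1.0).fit(X, y)

print("Lasso nonzero coefs:", np.sum(lasso.coef_ != 0))  # sparse, typically ≈ 10
print("Ridge nonzero coefs:", np.sum(ridge.coef_ != 0))  # all 50, just shrunk
```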
Elastic Net = α×L1 + (1−α)×L2: combines both penalties, keeping L1's sparsity while handling correlated features like L2.
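A short sketch, again assuming scikit-learn, where the l1_ratio parameter plays the role of α above:

```python
# ElasticNet mixes both penalties in one estimator.
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet

X, y = make_regression(n_samples=200, n_features=50, n_informative=10,
                       noise=10.0, random_state=0)

# l1_ratio=1.0 → pure Lasso; l1_ratio=0.0 → pure Ridge; in between mixes the two.
enet = ElasticNet(alpha=1.0, l1_ratio=0.5).fit(X, y)
print("ElasticNet nonzero coefs:", (enet.coef_ != 0).sum())
```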
🚨 THE TRAP QUESTION
"Are L1/L2 applicable to Random Forest?"
Answer: No. L1/L2 penalize the magnitude of model coefficients, and trees have no coefficients; they partition the feature space by splitting on features. Trees regularize via structural constraints instead: max_depth, min_samples_leaf, min_samples_split, max_features, n_estimators. The learning rate is a boosting-only knob (shrinkage); Random Forest has none.
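A sketch of what tree regularization looks like in practice, assuming scikit-learn (the parameter values are illustrative, not tuned):

```python
# Trees are regularized structurally, not with coefficient penalties.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

rf = RandomForestClassifier(
    n_estimators=200,
    max_depth=6,           # cap tree depth
    min_samples_leaf=5,    # require enough samples in each leaf
    min_samples_split=10,  # require enough samples to attempt a split
    max_features="sqrt",   # subsample features per split to decorrelate trees
).fit(X, y)

gbm = GradientBoostingClassifier(
    n_estimators=200,
    learning_rate=0.05,    # shrinkage: exists only in boosting, not Random Forest
    max_depth=3,
).fit(X, y)
```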