
Round 4: ML (Explain the Tradeoff)

16. Bias-variance: your model has 97% train accuracy, 68% test accuracy. That 29-point gap is overfitting (high variance). Fixes: more data, regularization, a simpler model, dropout (for neural nets).
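
A minimal sketch of the diagnosis and one fix, using synthetic data and illustrative numbers (the dataset, depth cap, and scores are assumptions, not from the notes): an unconstrained tree memorizes the training set, while capping depth trades a little bias for much lower variance.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=40, n_informative=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Unconstrained tree: memorizes training data -> large train/test gap (high variance).
overfit = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
print("deep tree   train/test:", overfit.score(X_tr, y_tr), overfit.score(X_te, y_te))

# Simpler model (capped depth): slightly more bias, much lower variance, smaller gap.
simpler = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_tr, y_tr)
print("capped tree train/test:", simpler.score(X_tr, y_tr), simpler.score(X_te, y_te))
```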

17. L1 vs L2: 300 features, you think ~20 matter. L1 (Lasso) — drives irrelevant coefficients to exactly zero, so you get built-in feature selection. L2 (Ridge) only shrinks coefficients; it never zeroes them out.
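
A small sketch of that difference on synthetic data mirroring the scenario (300 features, ~20 informative); the alpha value is an illustrative assumption, so the exact non-zero count will vary, but Lasso keeps far fewer than 300.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

X, y = make_regression(n_samples=500, n_features=300, n_informative=20, noise=5.0, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

# Lasso: most coefficients driven to exactly zero -> implicit feature selection.
print("Lasso non-zero coefficients:", int(np.sum(lasso.coef_ != 0)))
# Ridge: all 300 coefficients shrunk, but none exactly zero.
print("Ridge non-zero coefficients:", int(np.sum(ridge.coef_ != 0)))
```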

18. RF vs XGBoost: you have noisy data and 1 hour before the deadline. Random Forest — sensible defaults, resistant to noise, hard to screw up. XGBoost can win with careful tuning, but it overfits noisy data more easily and you have no time to tune.
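
A quick sketch of the "low-tuning" option under these constraints; the synthetic dataset, label-noise rate, and tree count are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in with 10% label noise (flip_y) to mimic a noisy dataset.
X, y = make_classification(n_samples=3000, n_features=30, n_informative=12,
                           flip_y=0.1, random_state=0)

# Near-default Random Forest: averaging many decorrelated trees damps the noise
# without any hyperparameter search.
rf = RandomForestClassifier(n_estimators=300, n_jobs=-1, random_state=0)
print("5-fold CV accuracy:", cross_val_score(rf, X, y, cv=5).mean())
```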

19. You built a fraud model with 99.5% accuracy. Your manager is impressed. Should you be? No — if fraud is 0.5% of transactions, always predicting "not fraud" gives 99.5% accuracy. Check precision, recall, F1, and PR-AUC instead.
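
A minimal sketch of why accuracy misleads at 0.5% prevalence (simulated labels, assumed prevalence): a majority-class "model" scores ~99.5% accuracy while catching zero fraud.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, average_precision_score)

rng = np.random.default_rng(0)
y_true = (rng.random(100_000) < 0.005).astype(int)    # ~0.5% positives (fraud)
y_pred = np.zeros_like(y_true)                         # always predict "not fraud"
y_score = np.zeros_like(y_true, dtype=float)           # constant scores for PR-AUC

print("accuracy :", accuracy_score(y_true, y_pred))                    # ~0.995
print("precision:", precision_score(y_true, y_pred, zero_division=0))  # 0.0
print("recall   :", recall_score(y_true, y_pred))                      # 0.0 (no fraud caught)
print("F1       :", f1_score(y_true, y_pred))                          # 0.0
print("PR-AUC   :", average_precision_score(y_true, y_score))          # ~prevalence, ~0.005
```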

20. Can you apply L1/L2 to Random Forest? No. L1/L2 penalize coefficients, and trees have no coefficients to penalize. Trees regularize through structural constraints instead: max_depth, min_samples_leaf, min_samples_split, etc.
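
A short sketch of what "regularizing" a forest actually looks like in scikit-learn; the specific parameter values are illustrative assumptions.

```python
from sklearn.ensemble import RandomForestClassifier

# Regularization via structural constraints, not a penalty term:
rf = RandomForestClassifier(
    n_estimators=300,
    max_depth=8,            # cap how deep each tree can grow
    min_samples_leaf=20,    # require enough samples in every leaf
    max_features="sqrt",    # limit features considered at each split
    random_state=0,
)
# There is no coefficient vector to penalize (no rf.coef_), hence no L1/L2 term.
```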

Practice Questions

Q: Bias-variance: your model has 97% train accuracy, 68% test accuracy.
Q: L1 vs L2: 300 features, you think ~20 matter.
Q: RF vs XGBoost: you have noisy data and 1 hour before deadline.
Q: You built a fraud model with 99.5% accuracy. Your manager is impressed. Should you be?
Q: Can you apply L1/L2 to Random Forest?