🟠ML: Decision Trees — The Intuitive Model
What they do: Repeatedly split data on feature thresholds that best separate the target. Like playing 20 questions.
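A minimal sketch of the "20 questions" view using scikit-learn (the iris dataset here is just an illustrative choice): `export_text` prints each learned split as a feature-vs-threshold question.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Each node reads like one question: "is feature <= threshold?"
print(export_text(clf, feature_names=load_iris().feature_names))
```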
How splits are chosen:
- Classification: minimize Gini impurity = 1 − Σ pᵢ², where pᵢ = the proportion of class i in the node. Gini = 0 → pure node; Gini = 0.5 → maximum impurity (binary classification, 50/50 classes).
- Regression: minimize the variance (MSE) of the target within each resulting group.
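To make the classification criterion concrete, here is a small from-scratch sketch (`gini` and `split_impurity` are illustrative helper names, not library functions) that scores one candidate threshold split by the weighted Gini of the two resulting groups:

```python
import numpy as np

def gini(labels):
    """Gini impurity: 1 - sum(p_i^2) over class proportions p_i."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def split_impurity(feature, labels, threshold):
    """Weighted Gini of the two groups produced by one threshold split."""
    left, right = labels[feature <= threshold], labels[feature > threshold]
    n = len(labels)
    return (len(left) / n) * gini(left) + (len(right) / n) * gini(right)

# Toy check: a 50/50 binary node has Gini 0.5 (maximum impurity);
# a threshold that separates the classes perfectly scores 0 (pure groups).
y = np.array([0, 0, 1, 1])
x = np.array([1.0, 2.0, 3.0, 4.0])
print(gini(y))                    # 0.5
print(split_impurity(x, y, 2.5))  # 0.0
```

The tree greedily picks the feature/threshold pair with the lowest weighted impurity at each node.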
Pros: Interpretable, handles nonlinear relationships, no feature scaling needed, handles mixed feature types.
Cons: Overfits easily, unstable (a small change in the data can produce a very different tree), biased toward features with many distinct values.
This is WHY Random Forest and boosting exist — they address the instability and overfitting of single trees.
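A quick illustrative comparison on synthetic data (`make_classification` is just a stand-in dataset; exact scores will vary): an unconstrained single tree typically hits ~100% training accuracy but drops on held-out data, while averaging many trees narrows that gap.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# Expect a large train/test gap for the single tree, a smaller one for the forest.
print("tree   train/test:", tree.score(X_tr, y_tr), tree.score(X_te, y_te))
print("forest train/test:", forest.score(X_tr, y_tr), forest.score(X_te, y_te))
```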
Practice Questions
Q: Your decision tree has 100% training accuracy. Good or bad?
A: Almost certainly bad: the tree has likely memorized the training data (the "overfits easily" con above). Check accuracy on a held-out set; a large train/test gap confirms overfitting.