🔶 Pandas: transform() vs apply() vs agg()

This is one of the trickiest Pandas interview questions. The key distinction is the shape of what each method returns:

agg() → returns ONE row per group (like GROUP BY in SQL)

df.groupby('dept')['salary'].agg('mean')
# dept A → 75000
# dept B → 82000

transform() → returns SAME number of rows (broadcasts group result back)

df.groupby('dept')['salary'].transform('mean')
# [75000, 75000, 75000, 82000, 82000, ...]  (one per original row)

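apply() → hands each whole group to your function; flexible, but usually slower than the built-in agg()/transform() paths, and the output shape depends on what your function returns (scalar, Series, or DataFrame)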
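
To make the shape difference concrete, here is a minimal sketch on a toy DataFrame. The data is made up for illustration (chosen so the department means match the 75000 / 82000 figures above), and the variable names are placeholders, not anything from a real codebase.

```python
import pandas as pd

# Hypothetical toy data, invented for this illustration.
df = pd.DataFrame({
    "dept": ["A", "A", "B", "B", "B"],
    "salary": [70000, 80000, 80000, 82000, 84000],
})

grouped = df.groupby("dept")["salary"]

# agg: one value per group, indexed by dept.
per_group = grouped.agg("mean")

# transform: one value per original row, aligned to df's index.
per_row = grouped.transform("mean")

# apply: the function receives each group's salary Series and decides
# what to return. Here it returns a scalar (max minus min), so the
# result happens to look like an agg result.
salary_range = grouped.apply(lambda s: s.max() - s.min())

# Compare result lengths: groups vs. original rows.
print(len(per_group), len(per_row), len(salary_range))
```

With 2 departments and 5 rows, agg() and this particular apply() give 2 values each, while transform() gives 5.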

When to use transform: When you want a group-level stat as a new column WITHOUT losing rows.

# THE classic interview pattern: employees above their department's average
df[df['salary'] > df.groupby('dept')['salary'].transform('mean')]

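# Each employee's share of their department's total salary
# (a fraction between 0 and 1; multiply by 100 for a percentage)
df['pct_of_dept'] = df['salary'] / df.groupby('dept')['salary'].transform('sum')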
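
A runnable sketch of the two patterns above, again on made-up data (the names and salaries are hypothetical). It works because transform() returns a Series aligned to the original index, so it can be compared with or divided into the salary column row by row.

```python
import pandas as pd

# Hypothetical toy data, invented for this illustration.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cy", "Dee", "Eve"],
    "dept": ["A", "A", "B", "B", "B"],
    "salary": [70000, 80000, 80000, 82000, 84000],
})

# Per-row department mean, same length and index as df.
dept_mean = df.groupby("dept")["salary"].transform("mean")

# Boolean mask filter: keep employees strictly above their dept average.
above_avg = df[df["salary"] > dept_mean]

# Share of department total, stored as a fraction.
dept_total = df.groupby("dept")["salary"].transform("sum")
df["pct_of_dept"] = df["salary"] / dept_total

# Sanity check: shares within each department should sum to 1.
share_sums = df.groupby("dept")["pct_of_dept"].sum()

print(above_avg["name"].tolist())
print(share_sums)
```

Note that Dee earns exactly the dept B average, so the strict > comparison leaves her out; use >= if ties should count.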

Q: You want to add a column showing what percentage of their department's total salary each employee represents. Which function?

A: transform():