🔶 Pandas: GroupBy and Merge
merge() — SQL-style joins in Pandas
# Inner join on a column
pd.merge(df1, df2, on='customer_id', how='inner')
# Left join with different column names
pd.merge(df1, df2, left_on='id', right_on='cust_id', how='left')
# Debug joins with indicator
pd.merge(df1, df2, on='id', how='left', indicator=True)
# Adds _merge column: 'left_only', 'right_only', 'both'
groupby() basics
# Single aggregation
df.groupby('department')['salary'].mean()
# Multiple aggregations
df.groupby('department').agg(
avg_salary=('salary', 'mean'),
headcount=('employee_id', 'count'),
max_salary=('salary', 'max')
)
Practice Questions
Q: You want to find customers in df1 that do NOT exist in df2. How?