python

🔶 Pandas: GroupBy and Merge

merge() — SQL-style joins in Pandas

# Inner join on a column
pd.merge(df1, df2, on='customer_id', how='inner')

# Left join with different column names
pd.merge(df1, df2, left_on='id', right_on='cust_id', how='left')

# Debug joins with indicator
pd.merge(df1, df2, on='id', how='left', indicator=True)
# Adds _merge column: 'left_only', 'right_only', 'both'

groupby() basics

# Single aggregation
df.groupby('department')['salary'].mean()

# Multiple aggregations
df.groupby('department').agg(
    avg_salary=('salary', 'mean'),
    headcount=('employee_id', 'count'),
    max_salary=('salary', 'max')
)

Practice Questions

Q: You want to find customers in df1 that do NOT exist in df2. How?