
🔶 Python: Handling Missing Data

# Detect
df.isnull().sum()                     # NaN count per column
df.isnull().mean()                    # Fraction missing per column

# Remove
df.dropna()                           # Drop every row that has at least one NaN
df.dropna(subset=['critical_col'])    # Drop rows only where critical_col is NaN

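# Fill (each call returns a new Series; assign it back to keep the result)
df['col'].fillna(0)                   # Constant
df['col'].fillna(df['col'].mean())    # Column mean
df['col'].ffill()                     # Forward fill (last known value); fillna(method='ffill') is deprecated since pandas 2.1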
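
To see what each call actually returns, here is a minimal sketch on a toy DataFrame. The data is made up for illustration; only the column names 'critical_col' and 'col' come from the snippets above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, invented for this example.
df = pd.DataFrame({
    "critical_col": [1.0, np.nan, 3.0, 4.0],
    "col": [10.0, np.nan, np.nan, 40.0],
})

# Detect
missing_counts = df.isnull().sum()      # critical_col: 1, col: 2
missing_fraction = df.isnull().mean()   # critical_col: 0.25, col: 0.5

# Remove (both calls return a new DataFrame; df itself is unchanged)
no_nan_rows = df.dropna()                              # keeps rows 0 and 3
critical_present = df.dropna(subset=["critical_col"])  # keeps rows 0, 2, 3

# Fill (each call returns a new Series)
filled_zero = df["col"].fillna(0)                 # 10, 0, 0, 40
filled_mean = df["col"].fillna(df["col"].mean())  # mean of 10 and 40 is 25
filled_ffill = df["col"].ffill()                  # 10, 10, 10, 40
```

None of these calls modify df in place, so the original NaN values are still there until a result is assigned back.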

Interview question: "When would you drop NaN vs fill it?"

Answer: Drop when the missingness is random AND only a small share of rows is affected (a common rule of thumb is under 5%). Fill when the rows are too valuable to lose, when the missingness follows a pattern that suggests a natural fill (time series → ffill), or when the column is critical. NEVER fill with the mean blindly: first check whether the missingness is itself informative (e.g., income=NaN might mean "refused to answer," which IS information).
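
One way to keep that information is to record a missingness flag before filling. The sketch below assumes a hypothetical survey table; the data and the names 'income_missing' and 'income_filled' are illustrative, not a standard convention.

```python
import numpy as np
import pandas as pd

# Hypothetical survey data, invented for this example.
df = pd.DataFrame({"income": [50000.0, np.nan, 60000.0, np.nan, 40000.0]})

# How much is missing? Here 2 of 5 rows, far too many to drop casually.
fraction_missing = df["income"].isna().mean()

# Record which rows were missing BEFORE filling, so the signal survives.
df["income_missing"] = df["income"].isna()

# Fill afterwards; the mean of the observed values is 50000.
df["income_filled"] = df["income"].fillna(df["income"].mean())
```

A downstream model or analysis can then use 'income_missing' as its own feature, instead of treating the imputed rows as if they were real observations.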