🔶 Pandas: Memory Optimization

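import pandas as pd

# Check current memory ('deep' also counts the actual string contents of object columns)
# df is assumed to be an existing DataFrame
df.info(memory_usage='deep')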

# 1. Categorical columns (low-cardinality strings: each unique value is stored once, plus a small integer code per row)
df['status'] = df['status'].astype('category')  # e.g. 'active'/'inactive' repeated 1M times

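# 2. Downcast numerics (picks the smallest integer type that holds every value)
# Note: a column containing NaN is float and will not be downcast to an integer type
df['age'] = pd.to_numeric(df['age'], downcast='integer')  # int64 → int8 if values fit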

# 3. float64 → float32 (halves memory, but keeps only about 7 significant digits)
float_cols = df.select_dtypes(include=['float64']).columns
df[float_cols] = df[float_cols].astype('float32')

# 4. Load only needed columns
df = pd.read_csv('big.csv', usecols=['col1', 'col2', 'col3'])
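
The four steps above can be combined and measured end to end. The following is a minimal sketch, not a definitive implementation: the helper name `shrink`, its `max_unique_ratio` threshold, and the synthetic columns (`status`, `age`, `score`, `notes`) are assumptions made for illustration, and an in-memory CSV stands in for `big.csv`.

```python
import io

import numpy as np
import pandas as pd


def shrink(df: pd.DataFrame, max_unique_ratio: float = 0.5) -> pd.DataFrame:
    """Return a copy of df with smaller dtypes where the values allow it.

    Hypothetical helper: max_unique_ratio is an illustrative cutoff for
    deciding when a string column is "low cardinality".
    """
    out = df.copy()
    for col in out.columns:
        s = out[col]
        if pd.api.types.is_integer_dtype(s):
            # Step 2: smallest integer type that fits every value
            out[col] = pd.to_numeric(s, downcast="integer")
        elif s.dtype == "float64":
            # Step 3: half the bytes, reduced precision
            out[col] = s.astype("float32")
        elif pd.api.types.is_object_dtype(s) or pd.api.types.is_string_dtype(s):
            # Step 1: only convert when few distinct values repeat many times
            if s.nunique() / max(len(s), 1) < max_unique_ratio:
                out[col] = s.astype("category")
    return out


# Synthetic data written to an in-memory CSV (stand-in for a real file)
rng = np.random.default_rng(0)
n = 100_000
raw = pd.DataFrame({
    "status": rng.choice(["active", "inactive"], size=n),
    "age": rng.integers(0, 100, size=n),
    "score": rng.random(n),
    "notes": "free text that the analysis does not need",
})
csv_text = raw.to_csv(index=False)

# Step 4: load only the needed columns, then shrink the dtypes
df = pd.read_csv(io.StringIO(csv_text), usecols=["status", "age", "score"])
small = shrink(df)

before = int(df.memory_usage(deep=True).sum())
after = int(small.memory_usage(deep=True).sum())
print(f"{before:,} bytes -> {after:,} bytes")
print(small.dtypes)
```

Returning a copy keeps the original frame available for comparison; on very large frames, converting columns in place may be preferable to avoid holding two copies in memory.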

Practice Questions

Q: A column has 2 million rows but only 5 unique string values. How do you reduce its memory?

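A: df['col'] = df['col'].astype('category'). This stores the 5 unique strings once, plus one small integer code per row (int8 for 5 categories), instead of 2M separate Python string objects. For object-dtype strings this typically cuts the column's memory by 95% or more.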
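
The saving is easy to check on synthetic data. This is a rough sketch under assumptions: the five labels are invented, the column is forced to object dtype, and the row count is scaled down to 200,000 to keep it quick (the ratio, not the absolute size, is what matters).

```python
import numpy as np
import pandas as pd

# Hypothetical labels; any 5 repeated strings behave similarly
labels = ["pending", "active", "inactive", "suspended", "closed"]

rng = np.random.default_rng(42)
n = 200_000
col = pd.Series(rng.choice(labels, size=n), dtype=object)

# Same data stored as 5 categories plus one integer code per row
cat = col.astype("category")

obj_bytes = int(col.memory_usage(index=False, deep=True))
cat_bytes = int(cat.memory_usage(index=False, deep=True))
saving = 1 - cat_bytes / obj_bytes

print(f"object:   {obj_bytes:,} bytes")
print(f"category: {cat_bytes:,} bytes")
print(f"saving:   {saving:.1%}")
print(f"codes dtype: {cat.cat.codes.dtype}")
```

The exact percentage depends on string length and on how the pandas version in use stores strings, so treat the printed figure as indicative rather than guaranteed.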