🔶 Pandas: Memory Optimization
import pandas as pd

# Check current memory ('deep' measures actual string contents, not just object pointers)
df.info(memory_usage='deep')
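# Per-column breakdown with the same deep accounting (bytes per column, including
# the index) — handy for spotting which columns dominate before optimizing
df.memory_usage(deep=True)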
# 1. Categorical columns (low-cardinality strings → massive savings)
df['status'] = df['status'].astype('category')  # 'active'/'inactive' repeated 1M times → each unique string stored once, plus small integer codes
# 2. Downcast numerics
df['age'] = pd.to_numeric(df['age'], downcast='integer') # int64 → int8 if values fit
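# (Same idea applied frame-wide, as a sketch: select the int64 columns and let
#  pandas pick the smallest type that fits; downcast='unsigned' is an option
#  when values are known to be non-negative.)
int_cols = df.select_dtypes(include=['int64']).columns
df[int_cols] = df[int_cols].apply(pd.to_numeric, downcast='integer')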
# 3. float64 → float32 (halves memory; float32 keeps only ~7 significant digits, so check precision needs)
float_cols = df.select_dtypes(include=['float64']).columns
df[float_cols] = df[float_cols].astype('float32')
# 4. Load only needed columns
df = pd.read_csv('big.csv', usecols=['col1', 'col2', 'col3'])
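Putting it together: a minimal, self-contained sketch on synthetic data (column names and sizes are illustrative) that applies steps 1-3 and prints total memory before and after.
import numpy as np
import pandas as pd

n = 1_000_000
df = pd.DataFrame({
    'status': np.random.choice(['active', 'inactive'], size=n),  # low-cardinality strings
    'age': np.random.randint(0, 100, size=n),                    # int64 on most platforms
    'score': np.random.rand(n),                                  # float64
})
before = df.memory_usage(deep=True).sum()

df['status'] = df['status'].astype('category')             # step 1: categorical
df['age'] = pd.to_numeric(df['age'], downcast='integer')   # step 2: downcast ints
df['score'] = df['score'].astype('float32')                # step 3: float64 → float32
after = df.memory_usage(deep=True).sum()

print(f'{before / 1e6:.1f} MB → {after / 1e6:.1f} MB')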
Practice Questions
Q: A column has 2 million rows but only 5 unique string values. How do you reduce its memory?
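A: Cast it to the categorical dtype: pandas stores the 5 unique strings once and keeps a small integer code per row, instead of 2 million separate Python string objects. A quick sketch (the column name is illustrative):
df['plan'] = df['plan'].astype('category')
df['plan'].memory_usage(deep=True)  # compare against the original object-dtype column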