Round 5: Python/Pandas (What's Wrong?)

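21. What's wrong with: df[df['x'] > 5]['y'] = 10 ? Chained indexing: df[df['x'] > 5] returns a copy, so the assignment lands on that temporary object and df is left unchanged (pandas usually emits a warning). Fix: df.loc[df['x'] > 5, 'y'] = 10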
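A minimal sketch of question 21, assuming a small made-up DataFrame (the column values are hypothetical). It runs the chained assignment first, with warnings silenced only to keep the output clean, and then the single-step .loc version.

```python
import warnings

import pandas as pd

# Hypothetical toy data, used only for illustration.
df = pd.DataFrame({"x": [1, 6, 8, 3], "y": [0, 0, 0, 0]})

# Chained indexing: df[mask] builds a new object and ['y'] = 10 writes to it.
# Warnings are suppressed here only so the demo prints cleanly.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    df[df["x"] > 5]["y"] = 10
after_chained = df["y"].tolist()  # df itself was not modified

# Single-step assignment: row mask and column are resolved in one .loc call.
df.loc[df["x"] > 5, "y"] = 10
after_loc = df["y"].tolist()

print(after_chained)  # [0, 0, 0, 0]
print(after_loc)      # [0, 10, 10, 0]
```

The point of the .loc form is that there is only one indexing operation, so there is no intermediate object for the write to get lost in.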

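22. You need each employee's department average as a new column. agg(), transform(), or apply()? transform(): it returns one value per original row, aligned to the original index, so the result can be assigned directly as a column. agg() collapses each group to a single row.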
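To make question 22 concrete, here is a short sketch contrasting a group reduction with transform(). The employee names, departments, and salaries are invented for the example.

```python
import pandas as pd

# Hypothetical employee table, for illustration only.
df = pd.DataFrame({
    "employee": ["Ana", "Ben", "Cho", "Dev"],
    "dept": ["eng", "eng", "ops", "ops"],
    "salary": [100.0, 120.0, 80.0, 90.0],
})

# Reduction: one row per department (2 rows here).
per_dept = df.groupby("dept")["salary"].mean()

# transform: one value per original row (4 rows), aligned to df's index,
# so it can be assigned straight back as a new column.
df["dept_avg"] = df.groupby("dept")["salary"].transform("mean")

print(len(per_dept))             # 2
print(df["dept_avg"].tolist())   # [110.0, 110.0, 85.0, 85.0]
```

The reduced result could also be merged back onto df, but transform() avoids the extra join.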

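23. df.loc[0:3] returns how many rows? 4 rows, assuming the default integer index (labels 0, 1, 2, 3): loc slices by label and is inclusive on both ends, unlike iloc[0:3], which slices by position and returns 3 rows.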
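A quick check of question 23, assuming a DataFrame with the default RangeIndex (the column name v is arbitrary). It compares label-based and position-based slicing over the same bounds.

```python
import pandas as pd

# Ten rows with the default integer index 0..9.
df = pd.DataFrame({"v": range(10)})

by_label = df.loc[0:3]      # labels 0, 1, 2, 3 (end label included)
by_position = df.iloc[0:3]  # positions 0, 1, 2 (end position excluded)

print(len(by_label), len(by_position))  # 4 3
```

If the index is not the default one (for example after filtering or sorting), loc[0:3] follows the labels, so the row count can differ.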

24. Why is NumPy faster than Python lists? Contiguous memory (cache-friendly), homogeneous types (no per-element type checking), vectorized C operations (no Python loop overhead). Typically 10-100x faster for element-wise numeric work, depending on the operation and array size.
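The three reasons in question 24 can be observed with a rough timing sketch. The array size and repeat count below are arbitrary choices, and the measured ratio will vary by machine, so no specific speedup is claimed.

```python
import timeit

import numpy as np

n = 200_000  # arbitrary size, chosen to keep the demo fast
py_list = list(range(n))
arr = np.arange(n, dtype=np.int64)

# One dtype for the whole array, stored in a single contiguous buffer.
print(arr.dtype, arr.flags["C_CONTIGUOUS"], arr.itemsize)

# Same computation two ways: Python-level loop vs one vectorized call.
loop_result = [v * 2 for v in py_list]
vec_result = arr * 2

loop_time = timeit.timeit(lambda: [v * 2 for v in py_list], number=5)
vec_time = timeit.timeit(lambda: arr * 2, number=5)
print(f"list comprehension: {loop_time:.4f}s, numpy: {vec_time:.4f}s")
```

Both versions produce the same values; the difference is that the NumPy multiply runs as one compiled loop over raw int64 data instead of dispatching on each Python object.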

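25. List vs generator for 10M items? Generator, if the items are consumed once and in order: O(1) memory (yields one item at a time) vs O(n) memory for a list (stores all items at once). Use a list when you need indexing, len(), or more than one pass.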
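A small sketch of the memory difference in question 25. It uses 1M items instead of 10M so it runs quickly (an assumption made for the demo; the same pattern applies at 10M), and it relies on sys.getsizeof, which for a list counts only the container's pointer array, not the integer objects it refers to.

```python
import sys

n = 1_000_000  # scaled down from 10M to keep the demo light

# Generator: holds only its current state, regardless of n.
squares_gen = (i * i for i in range(n))
gen_size = sys.getsizeof(squares_gen)

# List: holds a reference to every element at once.
squares_list = [i * i for i in range(n)]
list_size = sys.getsizeof(squares_list)  # container only, elements excluded

print(f"generator: {gen_size} bytes, list container: {list_size} bytes")

# Both give the same total, but the generator can be consumed only once.
gen_total = sum(squares_gen)
list_total = sum(squares_list)
second_pass = sum(squares_gen)  # generator is already exhausted
```

The exhausted second pass is the trade-off: the generator saves memory but does not support re-iteration or indexing.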

Practice Questions

Q: What's wrong with: df[df['x'] > 5]['y'] = 10

A: Chained indexing; the assignment may modify a copy. Fix: df.loc[df['x'] > 5, 'y'] = 10

Q: You need each employee's department average as a new column. agg(), transform(), or apply()?

A: transform(), which keeps the same number of rows.

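Q: df.loc[0:3] returns how many rows?

A: 4 rows (labels 0, 1, 2, 3; loc is inclusive on both ends).

Q: Why is NumPy faster than Python lists?

A: Contiguous memory, homogeneous types, and vectorized C operations.

Q: List vs generator for 10M items?

A: Generator: O(1) memory vs O(n) memory for a list.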