The Iterator Protocol
Iteration is at the heart of Python. Every time you write for item in collection:, Python uses the iterator protocol behind the scenes. Understanding this protocol unlocks powerful patterns for processing data efficiently.
An iterable is any object you can loop over (lists, strings, dicts, files). An iterator is the object that does the actual stepping through. When Python sees for x in items:, it calls iter(items) to get an iterator, then calls next() on it repeatedly until it raises StopIteration.
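That handshake is easy to watch by hand. As a quick sketch, this drives a list's iterator manually, the same way a for loop does under the hood:

```python
items = ["a", "b", "c"]

it = iter(items)   # a for loop starts by calling iter()
print(next(it))    # "a": each next() advances one step
print(next(it))    # "b"
print(next(it))    # "c"

try:
    next(it)       # the loop stops when StopIteration is raised
except StopIteration:
    print("done")
```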
Creating a Custom Iterator
To make your own iterable class, implement __iter__ (returns the iterator) and __next__ (returns the next value or raises StopIteration):
class Countdown:
    """Counts down from start to 1."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        self.current = self.start
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

# Use in a for loop
for n in Countdown(5):
    print(n, end=" ")
print()

# Also works with list(), sum(), etc.
print(list(Countdown(3)))

5 4 3 2 1
[3, 2, 1]
This works, but it's a lot of boilerplate for such a simple pattern. That's where generators come in.
Generators with yield
Generators are the Pythonic way to create iterators. Instead of a class with __iter__ and __next__, you write a function with the yield keyword. When a generator function is called, it returns a generator object that produces values lazily — one at a time, on demand.
Think of yield as a pause button. The function runs until it hits yield, produces a value, then pauses. The next time you ask for a value, it resumes right where it left off:
def countdown(start):
    """Same countdown, but as a generator — much simpler!"""
    current = start
    while current > 0:
        yield current
        current -= 1

for n in countdown(5):
    print(n, end=" ")
print()

# Fibonacci sequence — infinite generator!
def fibonacci():
    a, b = 0, 1
    while True:  # Infinite! But that's fine with generators
        yield a
        a, b = b, a + b

# Take the first 10 Fibonacci numbers
from itertools import islice
fib_10 = list(islice(fibonacci(), 10))
print(fib_10)

5 4 3 2 1
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
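The pause-and-resume behavior is easy to observe by calling next() on a generator yourself. A small sketch, with the countdown generator repeated so the snippet stands alone:

```python
def countdown(start):
    """Count down from start to 1 (redefined here so the snippet is self-contained)."""
    current = start
    while current > 0:
        yield current
        current -= 1

gen = countdown(3)
print(next(gen))  # 3: runs until the first yield, then pauses
print(next(gen))  # 2: resumes right after the yield
print(next(gen))  # 1
# One more next(gen) would raise StopIteration
```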
Generator Expressions
Generator expressions look like list comprehensions but use parentheses instead of brackets. They're lazy — values are computed on demand, not all at once:
import sys

# List comprehension — creates ALL values in memory
squares_list = [x**2 for x in range(1_000_000)]
print(f"List size: {sys.getsizeof(squares_list):,} bytes")

# Generator expression — computes on demand
squares_gen = (x**2 for x in range(1_000_000))
print(f"Generator size: {sys.getsizeof(squares_gen):,} bytes")

# Both produce the same results
print(sum(x**2 for x in range(10)))  # No extra parentheses needed inside a function call

List size: 8,448,728 bytes
Generator size: 200 bytes
285
Use generator expressions when you only need to iterate once and don't need to store the results. They're especially useful inside sum(), max(), min(), and any()/all().
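For example, a short sketch of generator expressions feeding aggregates directly (the words list is invented for illustration):

```python
words = ["apple", "banana", "cherry"]

# any() and all() short-circuit, so the generator may stop early
print(any(w.startswith("b") for w in words))  # True
print(all(len(w) > 4 for w in words))         # True

# max() consumes the whole generator and keeps only the running maximum
print(max(len(w) for w in words))             # 6
```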
yield from
yield from delegates to another generator. It's perfect for flattening nested structures or chaining generators:
def flatten(nested):
    """Recursively flatten nested lists."""
    for item in nested:
        if isinstance(item, list):
            yield from flatten(item)  # Delegate to the recursive call
        else:
            yield item

data = [1, [2, 3], [4, [5, 6]], 7]
print(list(flatten(data)))

[1, 2, 3, 4, 5, 6, 7]
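Chaining works the same way. As an illustrative sketch (the generator names here are made up), yield from stitches two generators into one stream:

```python
def evens(limit):
    for n in range(0, limit, 2):
        yield n

def odds(limit):
    for n in range(1, limit, 2):
        yield n

def evens_then_odds(limit):
    yield from evens(limit)  # produce every value from the first generator...
    yield from odds(limit)   # ...then every value from the second

print(list(evens_then_odds(6)))  # [0, 2, 4, 1, 3, 5]
```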
itertools — The Iterator Toolbox
The itertools module provides a collection of fast, memory-efficient building blocks for working with iterators. These are the most commonly used:
from itertools import chain, islice, groupby, combinations, count

# chain — combine multiple iterables
combined = list(chain([1, 2], [3, 4], [5, 6]))
print(f"chain: {combined}")

# islice — slice any iterator (like list slicing but for generators)
first_5_evens = list(islice(count(0, 2), 5))
print(f"islice: {first_5_evens}")

# combinations — all possible pairs/triples/etc
teams = list(combinations(["Alice", "Bob", "Charlie"], 2))
print(f"combinations: {teams}")

# groupby — group consecutive items (must be sorted by the key first!)
data = [("fruit", "apple"), ("fruit", "banana"), ("veg", "carrot"), ("veg", "pea")]
for category, items in groupby(data, key=lambda x: x[0]):
    print(f"  {category}: {[item[1] for item in items]}")
chain: [1, 2, 3, 4, 5, 6]
islice: [0, 2, 4, 6, 8]
combinations: [('Alice', 'Bob'), ('Alice', 'Charlie'), ('Bob', 'Charlie')]
fruit: ['apple', 'banana']
veg: ['carrot', 'pea']
Practical Example: Processing Large Files
Generators are perfect for processing files that are too large to fit in memory. Read one line at a time, transform it, and pass it along:
def read_large_csv(filepath):
    """Read a CSV file line by line as dictionaries."""
    # Naive comma split for illustration; use the csv module for real-world files
    with open(filepath) as f:
        headers = next(f).strip().split(",")
        for line in f:
            values = line.strip().split(",")
            yield dict(zip(headers, values))

def filter_active(records):
    """Only yield active records."""
    for record in records:
        if record.get("status") == "active":
            yield record

# Pipeline: read -> filter -> process (no list in memory!)
# records = read_large_csv("users.csv")
# active = filter_active(records)
# for user in active:
#     process(user)
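To watch the pipeline run end to end without a real file, the same stages can be fed from an in-memory list of lines. A sketch with invented data; parse_csv_lines is a hypothetical variant of read_large_csv that accepts any iterable of lines:

```python
def parse_csv_lines(lines):
    """Like read_large_csv, but over any iterable of lines (hypothetical helper)."""
    lines = iter(lines)
    headers = next(lines).split(",")
    for line in lines:
        yield dict(zip(headers, line.split(",")))

def filter_active(records):
    """Only yield records whose status is 'active'."""
    for record in records:
        if record.get("status") == "active":
            yield record

raw = [
    "name,status",
    "ada,active",
    "bob,inactive",
    "eve,active",
]

for user in filter_active(parse_csv_lines(raw)):
    print(user["name"])  # prints ada, then eve; one record in flight at a time
```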
🔍 Deep Dive: Generator Pipelines
Generators can be chained into pipelines where each stage processes one item at a time. This is similar to Unix pipes (cat file | grep pattern | sort). The data flows through the pipeline lazily — only one item is in memory at a time, regardless of how large the input is. The same lazy, one-item-at-a-time model appears throughout data processing frameworks such as PySpark, and it is exactly the model the itertools building blocks are designed around.
⚠️ Common Mistake: Exhausting a Generator
Wrong:
gen = (x**2 for x in range(5))
print(list(gen))  # [0, 1, 4, 9, 16]
print(list(gen))  # [] — Empty! Generator is exhausted!
Why: Generators can only be iterated once. After all values are consumed, calling next() raises StopIteration and the generator is done.
Instead: If you need to iterate multiple times, convert to a list first, or recreate the generator.
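A sketch of both fixes: wrap the expression in a function so a fresh generator is created on each call, or use itertools.tee when you need independent copies of one stream (squares is an illustrative name):

```python
from itertools import tee

def squares():
    """Return a brand-new generator on every call."""
    return (x**2 for x in range(5))

print(list(squares()))  # [0, 1, 4, 9, 16]
print(list(squares()))  # [0, 1, 4, 9, 16]; a fresh generator each time

# tee splits one generator into independent iterators
a, b = tee(x**2 for x in range(5))
print(list(a))  # [0, 1, 4, 9, 16]
print(list(b))  # [0, 1, 4, 9, 16]; b is unaffected by exhausting a
```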