Iterators & Generators

Master the iterator protocol, yield, generator expressions, yield from, and itertools for memory-efficient data processing.

Intermediate 35 min read 🐍 Python

The Iterator Protocol

Iteration is at the heart of Python. Every time you write for item in collection:, Python uses the iterator protocol behind the scenes. Understanding this protocol unlocks powerful patterns for processing data efficiently.

An iterable is any object you can loop over (lists, strings, dicts, files). An iterator is the object that does the actual stepping through. When Python sees for x in items:, it calls iter(items) to get an iterator, then calls next() on that iterator repeatedly until the iterator raises StopIteration.
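You can drive the protocol by hand to see exactly what a for loop does behind the scenes:

letters = iter(["a", "b", "c"])  # ask the iterable for its iterator
print(next(letters))  # a
print(next(letters))  # b
print(next(letters))  # c
# One more next(letters) would raise StopIteration, the signal the loop uses to stop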

Creating a Custom Iterator

To make your own iterable class, implement __iter__ (returns the iterator) and __next__ (returns the next value or raises StopIteration):

class Countdown:
    """Counts down from start to 1."""
    def __init__(self, start):
        self.start = start

    def __iter__(self):
        self.current = self.start
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

# Use in a for loop
for n in Countdown(5):
    print(n, end=" ")
print()

# Also works with list(), sum(), etc.
print(list(Countdown(3)))
Output
5 4 3 2 1
[3, 2, 1]

This works, but it's a lot of boilerplate for such a simple pattern. That's where generators come in.

Generators with yield

Generators are the Pythonic way to create iterators. Instead of a class with __iter__ and __next__, you write a function with the yield keyword. When a generator function is called, it returns a generator object that produces values lazily — one at a time, on demand.

Think of yield as a pause button. The function runs until it hits yield, produces a value, then pauses. The next time you ask for a value, it resumes right where it left off:

def countdown(start):
    """Same countdown, but as a generator — much simpler!"""
    current = start
    while current > 0:
        yield current
        current -= 1

for n in countdown(5):
    print(n, end=" ")
print()

# Fibonacci sequence — infinite generator!
def fibonacci():
    a, b = 0, 1
    while True:  # Infinite! But that's fine with generators
        yield a
        a, b = b, a + b

# Take the first 10 Fibonacci numbers
from itertools import islice
fib_10 = list(islice(fibonacci(), 10))
print(fib_10)
Output
5 4 3 2 1
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
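You can watch the pause-and-resume behavior by stepping the countdown generator manually with next():

gen = countdown(3)
print(next(gen))  # 3 (runs until the first yield, then pauses)
print(next(gen))  # 2 (resumes right after the yield)
print(next(gen))  # 1
# A fourth next(gen) would raise StopIteration: the generator is exhausted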
Key Takeaway: Generators are memory efficient because they produce values one at a time instead of storing everything in a list. A generator over 1 billion numbers occupies a couple hundred bytes, while a list of 1 billion integers would need roughly 8 GB for its pointer array alone, before counting the integer objects themselves.

Generator Expressions

Generator expressions look like list comprehensions but use parentheses instead of brackets. They're lazy — values are computed on demand, not all at once:

import sys

# List comprehension — creates ALL values in memory
squares_list = [x**2 for x in range(1_000_000)]
print(f"List size: {sys.getsizeof(squares_list):,} bytes")

# Generator expression — computes on demand
squares_gen = (x**2 for x in range(1_000_000))
print(f"Generator size: {sys.getsizeof(squares_gen):,} bytes")

# Both produce the same results
print(sum(x**2 for x in range(10)))  # Extra parentheses are optional when the genexp is the sole argument
Output
List size: 8,448,728 bytes
Generator size: 200 bytes
285

Use generator expressions when you only need to iterate once and don't need to store the results. They're especially useful inside sum(), max(), min(), and any()/all().
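For example, any() paired with a generator expression short-circuits, so it stops pulling values as soon as the answer is known:

nums = [3, 7, 12, 9, 4]
print(any(x > 10 for x in nums))  # True (stops at 12; 9 and 4 are never computed)
print(all(x > 0 for x in nums))   # True (every value checked, all positive)
print(max(x % 5 for x in nums))   # 4 (works with max()/min() too)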

yield from

yield from delegates to another generator. It's perfect for flattening nested structures or chaining generators:

def flatten(nested):
    """Recursively flatten nested lists."""
    for item in nested:
        if isinstance(item, list):
            yield from flatten(item)  # Delegate to recursive call
        else:
            yield item

data = [1, [2, 3], [4, [5, 6]], 7]
print(list(flatten(data)))
Output
[1, 2, 3, 4, 5, 6, 7]
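Chaining works the same way. Here is a small illustrative helper (the name merged_feed is invented for this example) that drains several sources in turn, reusing the countdown generator from earlier:

def merged_feed(*sources):
    """Yield everything from each source, one source at a time."""
    for source in sources:
        yield from source  # yield from accepts any iterable, not just generators

print(list(merged_feed(countdown(3), "ab", range(2))))
Output
[3, 2, 1, 'a', 'b', 0, 1]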

itertools — The Iterator Toolbox

The itertools module provides a collection of fast, memory-efficient building blocks for working with iterators. Here are some of the most commonly used:

from itertools import chain, islice, groupby, combinations, count

# chain — combine multiple iterables
combined = list(chain([1, 2], [3, 4], [5, 6]))
print(f"chain: {combined}")

# islice — slice any iterator (like list slicing but for generators)
first_5_evens = list(islice(count(0, 2), 5))  # count(0, 2) yields 0, 2, 4, ...
print(f"islice: {first_5_evens}")

# combinations — all possible pairs/triples/etc
teams = list(combinations(["Alice", "Bob", "Charlie"], 2))
print(f"combinations: {teams}")

# groupby — group consecutive items (must be sorted first!)
data = [("fruit", "apple"), ("fruit", "banana"), ("veg", "carrot"), ("veg", "pea")]
for category, items in groupby(data, key=lambda x: x[0]):
    print(f"  {category}: {[item[1] for item in items]}")
Output
chain: [1, 2, 3, 4, 5, 6]
islice: [0, 2, 4, 6, 8]
combinations: [('Alice', 'Bob'), ('Alice', 'Charlie'), ('Bob', 'Charlie')]
  fruit: ['apple', 'banana']
  veg: ['carrot', 'pea']
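The sorted-first caveat matters because groupby only groups adjacent items; on unsorted data you get fragmented groups. Sorting by the same key first gives the expected result:

from itertools import groupby

unsorted = [("veg", "carrot"), ("fruit", "apple"), ("veg", "pea"), ("fruit", "banana")]
by_category = sorted(unsorted, key=lambda x: x[0])  # sort by the grouping key first
for category, items in groupby(by_category, key=lambda x: x[0]):
    print(f"  {category}: {[item[1] for item in items]}")
Output
  fruit: ['apple', 'banana']
  veg: ['carrot', 'pea']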

Practical Example: Processing Large Files

Generators are perfect for processing files that are too large to fit in memory. Read one line at a time, transform it, and pass it along:

def read_large_csv(filepath):
    """Read a CSV file line by line as dictionaries."""
    with open(filepath) as f:
        headers = next(f).strip().split(",")
        for line in f:
            values = line.strip().split(",")
            yield dict(zip(headers, values))

def filter_active(records):
    """Only yield active records."""
    for record in records:
        if record.get("status") == "active":
            yield record

# Pipeline: read -> filter -> process (no list in memory!)
# records = read_large_csv("users.csv")
# active = filter_active(records)
# for user in active:
#     process(user)
🔍 Deep Dive: Generator Pipelines

Generators can be chained into pipelines where each stage processes one item at a time. This is similar to Unix pipes (cat file | grep pattern | sort). The data flows through the pipeline lazily: only one item is in memory at a time, regardless of how large the input is. This pattern is used extensively in data-processing tools such as PySpark, and the same lazy model underpins Python's own itertools module.
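Here is a minimal, self-contained sketch of such a pipeline (the stage names are invented for illustration):

def numbers():
    """Stage 1: produce values."""
    yield from range(10)

def evens(stream):
    """Stage 2: keep only even values."""
    return (n for n in stream if n % 2 == 0)

def squared(stream):
    """Stage 3: transform each value."""
    return (n * n for n in stream)

pipeline = squared(evens(numbers()))
print(list(pipeline))  # [0, 4, 16, 36, 64]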

⚠️ Common Mistake: Exhausting a Generator

Wrong:

gen = (x**2 for x in range(5))
print(list(gen))  # [0, 1, 4, 9, 16]
print(list(gen))  # [] — Empty! Generator is exhausted!

Why: Generators can only be iterated once. After all values are consumed, every further next() call raises StopIteration; an exhausted generator cannot be rewound or restarted.

Instead: If you need to iterate multiple times, convert to a list first, or recreate the generator.
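Both fixes, as a quick sketch:

# Option 1: materialize once, then iterate freely
squares = list(x**2 for x in range(5))
print(squares)  # [0, 1, 4, 9, 16]
print(squares)  # [0, 1, 4, 9, 16] (lists can be re-iterated)

# Option 2: wrap the expression in a function and recreate it on demand
def make_squares():
    return (x**2 for x in range(5))

print(list(make_squares()))  # [0, 1, 4, 9, 16]
print(list(make_squares()))  # [0, 1, 4, 9, 16] (a fresh generator each call)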