
Most of us have heard of Python generators or may even use them without fully understanding how they work. Today, I’ll explain how generators function behind the scenes in Python.
Let’s start with the basics and gradually dive deeper into the topic.
What Is a Generator?
Any Python function that has the yield keyword in its body is a generator function: a function which, when called, returns a generator object.
In other words, a generator function is a generator factory.
Python generators are a type of iterable, like lists or tuples, but they generate items on the fly instead of storing all items in memory.
Key Features of Generators
- Memory Efficiency: Generators only compute one item at a time, making them far more memory-efficient than lists, especially with large data.
- Lazy Evaluation: They produce values only when needed, allowing for efficient iteration.
- Infinite Sequences: Generators can represent infinite sequences since they don’t require all values to be stored.
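For example, here is a minimal sketch of an infinite counter (naturals is just an illustrative name); itertools.islice from the standard library takes a finite slice of it without ever materializing the whole sequence:
import itertools

def naturals():
    # Yields 1, 2, 3, ... forever; each value is computed only on demand
    n = 1
    while True:
        yield n
        n += 1

first_five = list(itertools.islice(naturals(), 5))
print(first_five)  # [1, 2, 3, 4, 5]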
How to Create Generators
Generators in Python are implemented in two main ways:
1. Generator Functions: Functions that use the yield keyword instead of return to produce a series of values.
def count_up_to(limit):
    count = 1
    while count <= limit:
        yield count
        count += 1
2. Generator Expressions: Similar to list comprehensions, but written with parentheses instead of brackets, which creates a generator object instead of a list.
squares = (x * x for x in range(10))
Accessing Generators
You can iterate over a generator using a for loop or convert it to a list if you need to access all values at once (but this loses memory efficiency):
for number in count_up_to(5):
    print(number)  # Prints numbers 1 through 5
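You can also drive a generator manually with next(), or materialize the remaining values with list() — a quick sketch reusing count_up_to from above:
gen = count_up_to(3)
print(next(gen))  # 1
print(next(gen))  # 2
print(list(gen))  # [3] — list() consumes whatever values remain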
Limitations of Generators
- No Random Access: You cannot access elements by index in a generator.
- One-Time Use: Once you iterate over a generator, you can’t restart it. If you need to re-use the generator’s values, you must re-create it.
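The one-time-use behavior is easy to see with count_up_to:
gen = count_up_to(3)
print(list(gen))             # [1, 2, 3]
print(list(gen))             # [] — the generator is exhausted
print(list(count_up_to(3)))  # [1, 2, 3] — a fresh generator must be created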
Alright! This is just the beginning…
These are the basics of Python generators — concepts most people are familiar with nowadays. Let’s dive deeper.
Have you ever wondered how generators actually work? How does Python manage them in the background to ensure memory efficiency? Let’s explore the details in five points.
1. Generator Objects: Turning Functions into Generators
When you define a generator function using the yield keyword, Python recognizes it as a generator function. Instead of running the function’s code right away, calling a generator function returns a generator object. This generator object is an instance of the generator class, which is an iterator. This means it has a well-defined interface (__iter__ and __next__) that allows Python to control the flow of the function and yield values as needed.
def my_generator():
    x = 1
    print("You will not see this print statement when calling my_generator()!")
    yield x
    print("You also won't see this print statement when calling my_generator()!")
    x = x + 1
    yield x
gen = my_generator() # Creates a generator object, but no code is executed yet
At this point, gen is a generator object with all the code in my_generator encapsulated within it, ready to execute.
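Only when you call next() does the body actually run, pausing at each yield:
print(next(gen))  # Runs up to the first yield: the first message prints, then 1
print(next(gen))  # Resumes after the first yield: the second message prints, then 2
# A third next(gen) would raise StopIteration, since the body has no more yields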
2. Coroutine Frames: Storing the Execution State (Most Important)
When you call a generator function, Python allocates a coroutine frame for it, and every yield statement suspends execution inside that frame rather than tearing it down. This frame is a special data structure that holds:
- The current instruction pointer, which tracks the location in the function (this includes the position after the last yield).
- All local variables and their values up to the point of the yield.
- The internal state of the function, which includes the call stack and exception handling details if applicable.
This coroutine frame is what allows the function to “pause” at a yield statement and then “resume” from the same place when next() is called on the generator object.
def counter():
    x = 0
    while True:
        yield x
        x += 1
gen = counter()
next(gen)
next(gen)
- When gen = counter() is called, Python allocates a coroutine frame but doesn’t execute any code yet.
- When next(gen) is called the first time, Python:
  - Begins executing the function’s body inside the coroutine frame allocated earlier.
  - Runs until it reaches the first yield x, where it pauses and returns the current value of x (0).
  - Stores the state (including the value of x and the current position in the code) in the coroutine frame.
- When next(gen) is called again, Python:
  - Uses the saved coroutine frame to restore the function’s previous state.
  - Continues from the line after yield, updating x to x + 1.
  - Yields the new value of x (1) and stores the updated state again.
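You can actually peek at this saved state in CPython, where the frame is exposed on the generator object (gi_frame, f_locals, and f_lasti are CPython implementation details, so treat this as an illustrative sketch):
gen = counter()
next(gen)                     # Run to the first yield
print(gen.gi_frame.f_locals)  # {'x': 0} — the saved local variables
print(gen.gi_frame.f_lasti)   # Offset of the last bytecode instruction executed
next(gen)
print(gen.gi_frame.f_locals)  # {'x': 1} — the frame was updated and suspended again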
3. __next__ and __iter__ Methods: Controlling Generator Execution
The generator object implements both __next__ and __iter__, which control the flow of execution:
__next__: Each call to next() (or implicitly via a for loop) invokes __next__. It runs the generator until it encounters a yield, where it pauses and returns the yielded value.
If there are no more values to yield (end of function), __next__ raises a StopIteration exception.
__iter__: This allows the generator to be used as an iterator in constructs like for loops by returning self, the generator object itself.
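A for loop is essentially sugar for this protocol — roughly what happens under the hood:
gen = count_up_to(3)
iterator = iter(gen)    # Calls gen.__iter__(), which simply returns gen itself
print(iterator is gen)  # True
while True:
    try:
        value = next(iterator)  # Calls gen.__next__()
    except StopIteration:
        break                   # A for loop catches this and stops silently
    print(value)                # 1, 2, 3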
4. Exception Handling and Cleanup
When a generator function finishes, the generator raises a StopIteration exception, signaling that there are no more values to yield. Constructs like for loops catch this exception automatically and stop iterating gracefully.
If you want to terminate a generator prematurely, you can call gen.close(). This raises a GeneratorExit exception inside the generator function, allowing it to clean up resources if needed.
gen.close()
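Here is a sketch of what that cleanup can look like (ticker is just an illustrative name); the finally block runs when close() injects GeneratorExit at the paused yield:
def ticker():
    try:
        n = 0
        while True:
            yield n
            n += 1
    finally:
        # Runs both on normal exhaustion and when close() raises GeneratorExit
        print("Cleaning up!")

gen = ticker()
print(next(gen))  # 0
gen.close()       # Prints "Cleaning up!" before the generator is discarded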
5. Memory Management and Efficiency
Since a generator yields values one at a time and maintains only the state needed for the next iteration, it is extremely memory-efficient. The coroutine frame uses far less memory than storing a full sequence because it holds only essential variables and instruction pointers, not the full sequence of yielded values.
For example, iterating over a large file line-by-line can be done with a generator without loading the entire file into memory:
def read_large_file(file_path):
    with open(file_path, 'r') as file:
        for line in file:
            yield line
Using a generator here means only one line is read into memory at a time, making it possible to handle very large files.
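Usage looks like ordinary iteration; the file path and the 'ERROR' filter below are just placeholders:
for line in read_large_file('server.log'):  # 'server.log' is a hypothetical path
    if 'ERROR' in line:
        print(line.rstrip())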
Comparing Memory Usage for Generators vs. Lists
Let’s compare a generator to a list by implementing a function that computes squares of a large range of numbers (say from 1 to 10 million). We’ll see how much memory each approach uses and understand the efficiency advantage of generators for large data.
- List Approach: We’ll create a list that stores the square of each number from 1 to 10 million.
- Generator Approach: We’ll use a generator to calculate and yield the square of each number from 1 to 10 million, one at a time.
import sys

# List approach
def square_list(n):
    return [i * i for i in range(1, n + 1)]

# Generator approach
def square_generator(n):
    for i in range(1, n + 1):
        yield i * i

n = 10000000  # Ten million

# Memory usage for list
squares_list = square_list(n)
list_memory = sys.getsizeof(squares_list)
print(f"Memory used by list: {list_memory} bytes ({list_memory / (1024 ** 2):.6f} MB)")

# Memory usage for generator
squares_gen = square_generator(n)
generator_memory = sys.getsizeof(squares_gen)
print(f"Memory used by generator: {generator_memory} bytes ({generator_memory / (1024 ** 2):.6f} MB)")
Output:
Memory used by list: 89095160 bytes (84.967766 MB)
Memory used by generator: 216 bytes (0.000206 MB)
Conclusion
Python generators provide an elegant way to handle large data sequences efficiently. By producing values on the fly, they avoid the memory overhead of lists, which store all values at once. Generators are an essential tool in Python for improving memory efficiency, especially when dealing with large datasets, data streams, or infinite sequences.
Next time you’re handling a large sequence or an iterable with indefinite size, consider using a generator for better performance and memory efficiency!
Happy coding! 😊
