Crash course in Python context managers
Context managers are the pythonic way to ensure that your program doesn’t leak resources. If you’ve ever done file IO in Python, chances are you’ve seen something like this:
    with open("myfile.txt") as f:
        process_file(f)
That is roughly equivalent to
    f = open("myfile.txt")
    try:
        process_file(f)
    finally:
        f.close()
But you never saw f.close() in the first snippet. The file descriptor is nevertheless not leaked! That’s because wrapping the file in a with block ensures that the file is closed no matter what. The first snippet also does it with fewer lines of code, which means less room for programming errors.
There are other types of resources that a programmer must remember not to leak: network connections, database transactions, and even locks in multithreaded code. Context managers are a flexible way to manage different types of resources.
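As a small illustration of the lock case, here is a sketch using threading.Lock from the standard library. The counter and the increment() helper are made up for this example; the point is only that the with block releases the lock on every exit path.

```python
import threading

counter = 0
lock = threading.Lock()


def increment(times):
    """Hypothetical worker: bump the shared counter under the lock."""
    global counter
    for _ in range(times):
        # The lock is acquired on entry and released on exit,
        # even if the body raises.
        with lock:
            counter += 1


threads = [threading.Thread(target=increment, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Without the with block, each worker would need its own acquire()/try/finally/release() dance.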
There is a powerful tool for writing context managers in the Python standard library: contextlib. Given how useful it is, I weep a little every time I’m dealing with code whose author missed an opportunity to use it. Let’s look at some patterns that are going to make your fellow pythonists happy.
Writing context managers with the @contextmanager decorator
How do context managers work under the hood? The Python documentation explains how an object and the with statement interact. When entering a with block, the Python runtime calls the __enter__() method of the object, binding whatever it returns to the target of the with statement. When exiting the block, the Python runtime calls the __exit__() method. If the block exits with an exception, its type, value, and traceback are passed as arguments to __exit__(); otherwise all three arguments are None. The return value of __exit__() then determines whether the exception is suppressed (a true value) or passed up the call stack.
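To make the protocol concrete, here is a minimal hand-written context manager. The Recorder class is invented for this illustration; it only logs which methods were called and chooses to suppress ValueError.

```python
class Recorder:
    """Toy context manager that records which protocol methods ran."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append("enter")
        return self  # this is what gets bound to the name after "as"

    def __exit__(self, exc_type, exc_value, traceback):
        # exc_type is None when the block finished without an exception.
        self.events.append(("exit", exc_type))
        # A true return value suppresses the exception.
        # Here only ValueError (and its subclasses) is swallowed.
        return exc_type is not None and issubclass(exc_type, ValueError)


recorder = Recorder()
with recorder as r:
    r.events.append("body")
    raise ValueError("suppressed by __exit__")

# Execution continues here because __exit__() returned True.
after_block = True
```

Any other exception type would make __exit__() return False and propagate to the caller as usual.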
Sometimes all that is a bit more low-level than you’d like. Luckily, contextlib provides the @contextmanager decorator, which turns a generator function into a context manager and deals with the low-level details behind the scenes. Take a look at the following imaginary database library:
    import contextlib

    @contextlib.contextmanager
    def connect_database():
        handle = acquire_connection()
        try:
            yield handle
        finally:
            # Runs on normal exit and on exceptions alike.
            handle.release_connection()
There is magic behind the scenes (there often is when Python decorators are involved!). The decorated connect_database() function returns an object that acts as a context manager. Its __enter__() method runs the generator up to the yield statement and returns whatever is yielded. Its __exit__() method resumes the generator and runs it to completion; if the with block raised an exception, that exception is re-raised inside the generator at the yield. And as guaranteed by the Python runtime, whatever is under the finally block gets executed whether or not there is an exception.
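The order of execution is easier to see with a toy generator that records each step. The traced() function and the events list below are made up for this sketch; only contextlib.contextmanager comes from the standard library.

```python
import contextlib

events = []


@contextlib.contextmanager
def traced(name):
    events.append(f"enter {name}")       # runs as part of __enter__()
    try:
        yield name.upper()               # the yielded value is bound by "as"
    finally:
        events.append(f"exit {name}")    # runs as part of __exit__()


with traced("db") as value:
    events.append(f"body got {value}")

try:
    with traced("failing"):
        raise RuntimeError("boom")
except RuntimeError:
    # The generator did not swallow the exception, so it reaches us.
    events.append("error propagated")
```

Both with blocks record their exit step, and the second one still lets the RuntimeError through to the caller.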
Context managers are good for more than releasing resources. They are useful for wrapping all kinds of code blocks and ensuring some action gets performed afterward. For example, the database library above could support transactions that either commit or roll back the modifications at the end of the transaction block.
    import contextlib

    @contextlib.contextmanager
    def transaction(handle):
        handle.begin_transaction()
        try:
            yield
        except BaseException:
            # Same reach as a bare "except:"; undo the changes and re-raise.
            handle.rollback()
            raise
        else:
            handle.commit()

    def make_transaction(handle):
        with transaction(handle):
            handle.update_record("record1", "foo")
            handle.insert_record("record2", "bar")
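Since the database library is imaginary, the transaction pattern can be tried out with a stand-in. FakeHandle below is a hypothetical class that just logs calls; it is an assumption of this sketch, not part of any real library.

```python
import contextlib


class FakeHandle:
    """Hypothetical stand-in for the imaginary database handle."""

    def __init__(self):
        self.log = []

    def begin_transaction(self):
        self.log.append("begin")

    def commit(self):
        self.log.append("commit")

    def rollback(self):
        self.log.append("rollback")

    def update_record(self, key, value):
        self.log.append(f"update {key}")


@contextlib.contextmanager
def transaction(handle):
    handle.begin_transaction()
    try:
        yield
    except BaseException:
        handle.rollback()
        raise
    else:
        handle.commit()


ok = FakeHandle()
with transaction(ok):
    ok.update_record("record1", "foo")

failed = FakeHandle()
try:
    with transaction(failed):
        failed.update_record("record1", "foo")
        raise ValueError("constraint violated")
except ValueError:
    pass  # the error still reaches the caller after the rollback
```

The successful block ends in a commit, while the failing one ends in a rollback and re-raises the original error.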
Not all context managers are naturally implementable as generator functions. An example of this is the file object, which is both the managed resource and an implementation of the context manager protocol. Such types can benefit from inheriting from the AbstractContextManager class, which provides a default implementation of __enter__() that returns self, leaving only __exit__() to be written.
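Here is a minimal sketch of a class built on contextlib.AbstractContextManager. The Connection class is hypothetical; it relies on the inherited __enter__(), which returns self, and supplies only __exit__().

```python
import contextlib


class Connection(contextlib.AbstractContextManager):
    """Hypothetical resource that is its own context manager."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    # __enter__() is inherited from AbstractContextManager and returns self.

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        return None  # a false value: exceptions are not suppressed


with Connection() as conn:
    open_inside_block = not conn.closed
```

After the block, conn.closed is True, just as a file object is closed after its with block.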
closing() non-managed objects
First of all, if you’re writing a new class that abstracts a resource the programmer needs to close, there are zero excuses for not making it a context manager. But say you’re maintaining code someone else wrote, and you’d like to convert a dangerous unmanaged resource into a canny managed one. This use case is typical enough that a function called closing() has been included in the standard library.
Here’s a snippet your reckless colleague wrote:
    def get_data_from_database():
        handle = connect_database()     # Why can't I wrap this in a context manager??
        data = handle.query_database()  # I sure hope this doesn't raise an exception!
        handle.close()
        return data
And here’s how you fix it:
    import contextlib

    def get_data_from_database():
        with contextlib.closing(connect_database()) as handle:
            return handle.query_database()  # There, it won't leak!
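To check that closing() really calls close() when the block fails, here is a sketch with a hypothetical LegacyHandle class whose query always raises. The class is invented for the demonstration; contextlib.closing() is the real standard library function.

```python
import contextlib


class LegacyHandle:
    """Hypothetical unmanaged resource: has close() but no __enter__/__exit__."""

    def __init__(self):
        self.closed = False

    def query_database(self):
        raise RuntimeError("query failed")

    def close(self):
        self.closed = True


handle = LegacyHandle()
try:
    # closing() hands back the wrapped object and calls its close() on exit.
    with contextlib.closing(handle) as h:
        h.query_database()
except RuntimeError:
    pass  # the error propagates, but only after close() has run
```

The only requirement closing() places on the wrapped object is that it has a close() method.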
Function accepting either a file or a path
How many times I’ve seen it! A helpful colleague authored a handy function to retrieve and process some data from a file, but the API she provided will only accept a path to a file. And I’d like to process a file that I already opened. Or there is a function that opens a new database connection, and I’d like to reuse a handle I already have. Let’s use this function as an example:
    def count_lines_in_file(path):
        with open(path) as f:
            return len(f.readlines())

I want to extend this function to also accept a file object, but remain backward compatible and keep accepting a path. Let’s do that:

    def count_lines_in_file(path_or_file):
        if isinstance(path_or_file, str):
            f = open(path_or_file)
        else:
            f = path_or_file
        with f:
            return len(f.readlines())
There is a problem. If I use a file as an argument, I end up with a closed file when the function returns. And I don’t want that because it’s my file I manage. However, if I use a path argument, it’s someone else’s file, and I want it closed.
It’s possible to have two branches in the function, one closing the file and the other not. But that’s ugly and repetitive. For this kind of situation, there is the nullcontext() utility. It creates something that can be wrapped in a with statement, but which doesn’t do anything. This is best explained with a code snippet:
    import contextlib

    def count_lines_in_file(path_or_file):
        if isinstance(path_or_file, str):
            cm = open(path_or_file)
        else:
            # nullcontext() yields the file back and does nothing on exit.
            cm = contextlib.nullcontext(path_or_file)
        with cm as f:
            return len(f.readlines())
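A quick way to try both call styles is to pass an in-memory file and a path to a temporary file. The sample data and file name below are made up for this sketch; the function is repeated so that the snippet runs on its own.

```python
import contextlib
import io
import os
import tempfile


def count_lines_in_file(path_or_file):
    if isinstance(path_or_file, str):
        cm = open(path_or_file)
    else:
        cm = contextlib.nullcontext(path_or_file)
    with cm as f:
        return len(f.readlines())


# Caller-owned file object: it must stay open after the call.
buffer = io.StringIO("one\ntwo\nthree\n")
lines_from_file = count_lines_in_file(buffer)
still_open = not buffer.closed

# Path argument: the function opens and closes the file itself.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    with open(path, "w") as out:
        out.write("a\nb\n")
    lines_from_path = count_lines_in_file(path)
```

The StringIO buffer stays usable after the call, which is exactly the ownership rule described above.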
I’m not a fan of having multiple ways to use my APIs (file or path). A function processing a file should take a file as an argument. But it’s a neat way to extend an API that isn’t as general as you’d like it to be.
The contextlib page is one of the most well-written parts of the already eloquent Python documentation. It includes several recipes that show how to use contextlib to manage resources in a pythonic way.
The heaviest tool in the contextlib toolbox is undoubtedly the ExitStack class, which allows managing several context managers and non-managed resources at once. The downside is a verbose API. For some reason, I haven’t found too much use for it myself, since one of the simpler tools in the library is usually good enough for the job.
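For a taste of ExitStack, here is a small sketch that keeps a variable number of files open at once and registers an extra cleanup callback. The concatenate() helper and the log list are hypothetical; enter_context() and callback() are real ExitStack methods.

```python
import contextlib
import io

log = []


def concatenate(sources):
    """Read every source while all of them are open at the same time."""
    with contextlib.ExitStack() as stack:
        # Each entered context manager is exited when the stack unwinds.
        files = [stack.enter_context(src) for src in sources]
        # Plain callables can be registered for cleanup too.
        stack.callback(log.append, "cleanup callback ran")
        return "".join(f.read() for f in files)


sources = [io.StringIO("a"), io.StringIO("b"), io.StringIO("c")]
result = concatenate(sources)
```

When the with block ends, the stack runs the callback and closes the files in reverse order of registration.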
There is also the asynchronous with statement (async with) in Python. Asynchronous versions of the contextlib utilities, like AsyncExitStack, are there to help with writing asynchronous context managers.
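As a rough sketch of the asynchronous variants, the example below combines contextlib.asynccontextmanager with AsyncExitStack. The connection() function and its names are invented for illustration, and asyncio.sleep(0) merely stands in for real asynchronous I/O.

```python
import asyncio
import contextlib

events = []


@contextlib.asynccontextmanager
async def connection(name):
    events.append(f"open {name}")
    await asyncio.sleep(0)   # placeholder for an awaitable connect step
    try:
        yield name
    finally:
        await asyncio.sleep(0)   # placeholder for an awaitable close step
        events.append(f"close {name}")


async def main():
    async with contextlib.AsyncExitStack() as stack:
        first = await stack.enter_async_context(connection("a"))
        second = await stack.enter_async_context(connection("b"))
        events.append(f"using {first} and {second}")


asyncio.run(main())
```

As with the synchronous ExitStack, the connections are closed in reverse order of opening.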
I hope this post helps you the next time you need to write readable code that doesn’t leak resources!