Apparatus

You should be using contextlib in Python

Captain Picard meme: Your Python class needs to cleanup? Then make it support with statement.

Crash course to Python context managers

Context managers are the pythonic way to ensure that your program doesn’t leak resources. If you’ve ever done file IO in Python, chances are you’ve seen something like this:

with open("myfile.txt") as f:
    process_file(f)

That is roughly equivalent to

f = open("myfile.txt")
try:
    process_file(f)
finally:
    f.close()

You never see f.close() in the first snippet, yet the file descriptor is not leaked. Wrapping the file in a with block guarantees that the file is closed when the block exits, whether it exits normally or through an exception. The first snippet also does it with fewer lines of code, which means less room for programming errors.

But there are other types of resources that a programmer must remember not to leak: network connections, database transactions, and even locks in multithreaded code. Context managers are a flexible way to manage different types of resources.

There is a powerful tool for writing context managers in the Python standard library: contextlib. Given how useful it is, I weep a little every time I’m dealing with code whose author missed an opportunity to use it. Let’s look at some patterns that are going to make your fellow pythonists happy.

Writing context managers with the @contextmanager decorator

How do context managers work under the hood? The Python documentation explains how an object and the with statement interact: When entering a with block, the Python runtime calls the __enter__() method of the object, binding whatever it returns to the target of the with statement. When exiting the block, the Python runtime calls the __exit__() method. If the block exits with an exception, its type, value, and traceback are passed as arguments to the __exit__() method; otherwise all three arguments are None. The return value of __exit__() then determines if the exception is suppressed (a true value) or passed up the call stack.
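
To make the protocol concrete, here is a minimal sketch of a hand-written context manager. The Tracker class and its suppress flag are made up for illustration; only the __enter__() and __exit__() signatures come from the language itself.

```python
class Tracker:
    """Hypothetical context manager that records what the protocol does."""

    def __init__(self, suppress=False):
        self.suppress = suppress
        self.events = []

    def __enter__(self):
        self.events.append("enter")
        return self  # this is what gets bound to the name after "as"

    def __exit__(self, exc_type, exc_value, traceback):
        # exc_type is None when the block finished without an exception.
        self.events.append(("exit", exc_type))
        return self.suppress  # a true value swallows the exception


with Tracker(suppress=True) as t:
    raise ValueError("swallowed")

# Execution continues here because __exit__() returned True.
print(t.events)
```

With suppress left at False, the ValueError would travel up the call stack as usual after __exit__() has run.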

Sometimes all that is a bit more low-level than you’d like. Luckily, contextlib provides the @contextmanager decorator that turns a generator function into a context manager and deals with the low-level details behind the scenes. Take a look at the following imaginary database library:

import contextlib

@contextlib.contextmanager
def connect_database():
    handle = acquire_connection()
    try:
        yield handle
    finally:
        handle.release_connection()

There is magic behind the scenes (there often is when Python decorators are involved!). The decorated connect_database() function returns something that acts as a context manager. The __enter__() method of that thing runs the original generator until its yield statement, and returns whatever is yielded. The __exit__() method resumes the generator and runs it to completion; if the with block raised an exception, that exception is re-raised inside the generator at the yield. And as guaranteed by the Python runtime, whatever is under the finally block gets executed whether or not there is an exception.
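
If you want to see the order of execution for yourself, a small tracing sketch helps. The traced() function and the events list are invented for this demonstration.

```python
import contextlib

events = []


@contextlib.contextmanager
def traced():
    events.append("setup")        # runs inside __enter__()
    try:
        yield "resource"          # the value bound by "as"
    finally:
        events.append("cleanup")  # runs inside __exit__()


try:
    with traced() as value:
        events.append(f"body got {value}")
        raise RuntimeError("boom")
except RuntimeError:
    events.append("exception propagated")

print(events)
```

The cleanup step is recorded before the exception reaches the outer except clause, which is exactly the guarantee you want from a resource manager.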

Context managers are good for more than releasing resources. They are useful for wrapping all kinds of code blocks and ensuring some action gets performed afterward. For example, the database library above could support transactions that either commit or roll back the modifications after the transaction block.

import contextlib

@contextlib.contextmanager
def transaction(handle):
    handle.begin_transaction()
    try:
        yield
    except:
        handle.rollback()
        raise
    else:
        handle.commit()

def make_transaction(handle):
    with transaction(handle):
        handle.update_record("record1", "foo")
        handle.insert_record("record2", "bar")
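
A quick way to convince yourself that both paths work is to run the transaction against a fake handle. RecordingHandle below is a hypothetical stand-in that only logs the calls it receives, and transaction() is repeated so that the sketch runs on its own.

```python
import contextlib


class RecordingHandle:
    """Hypothetical handle that only logs the calls it receives."""

    def __init__(self):
        self.log = []

    def begin_transaction(self):
        self.log.append("begin")

    def commit(self):
        self.log.append("commit")

    def rollback(self):
        self.log.append("rollback")


@contextlib.contextmanager
def transaction(handle):
    handle.begin_transaction()
    try:
        yield
    except:  # bare on purpose: roll back on KeyboardInterrupt too
        handle.rollback()
        raise
    else:
        handle.commit()


ok = RecordingHandle()
with transaction(ok):
    pass  # block finishes normally, so the transaction commits

failed = RecordingHandle()
try:
    with transaction(failed):
        raise ValueError("bad record")  # block fails, so it rolls back
except ValueError:
    pass

print(ok.log, failed.log)
```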

Not all context managers are naturally implementable as generator functions. An example of this is the file object, which is both a managed object and implements the context manager protocol. Such types can benefit from inheriting from the AbstractContextManager class, which provides a default __enter__() that returns self and leaves __exit__() as an abstract method for you to implement.
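
As a rough sketch of that pattern, here is a made-up Connection class that is its own context manager. Only AbstractContextManager comes from the standard library; the class, its closed attribute, and its close() method are assumptions for the example.

```python
import contextlib


class Connection(contextlib.AbstractContextManager):
    """Hypothetical resource that is also its own context manager."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    # __enter__() is inherited from AbstractContextManager and returns self.

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        return None  # a false value: exceptions are not suppressed


with Connection() as conn:
    was_open_inside = not conn.closed

print(was_open_inside, conn.closed)
```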

Automatically closing() non-managed objects

First of all, if you’re writing a new class that abstracts a resource the programmer needs to close, there are zero excuses for not making it a context manager. But say you’re maintaining code someone else wrote, and you’d like to convert a dangerous unmanaged resource into a safely managed one. This use case is common enough that a function called closing() has been included in the standard library. It wraps any object that has a close() method and calls that method when the with block exits.

Here’s a snippet your reckless colleague wrote:

def get_data_from_database():
    handle = connect_database()     # Why I can't wrap this in context manager??
    data = handle.query_database()  # I sure hope this doesn't raise exception!
    handle.close()
    return data

And here’s how you fix it:

import contextlib

def get_data_from_database():
    with contextlib.closing(connect_database()) as handle:
        return handle.query_database()  # There, it won't leak!
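
To check that closing() really does its job when the query blows up, here is a sketch with a stand-in handle. FakeHandle is hypothetical; it only mimics an object that offers close() and a failing query_database().

```python
import contextlib


class FakeHandle:
    """Hypothetical stand-in for a handle that only offers close()."""

    def __init__(self):
        self.closed = False

    def query_database(self):
        raise RuntimeError("query failed")

    def close(self):
        self.closed = True


handle = FakeHandle()
try:
    # closing() hands back the wrapped object and calls its close() on exit.
    with contextlib.closing(handle) as h:
        h.query_database()
except RuntimeError:
    pass

print(handle.closed)
```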

Function accepting either a file or a path

How many times I’ve seen it! A helpful colleague authored a handy function to retrieve and process some data from a file, but the API she provided will only accept a path to a file. And I’d like to process a file that I already opened. Or there is a function that opens a new database connection, and I’d like to reuse a handle I already have. Let’s use this function as an example:

def count_lines_in_file(path):
    with open(path) as f:
        return len(f.readlines())

I want to extend this function to also accept a file object, but remain backward compatible and keep accepting a path. Let’s do that:

def count_lines_in_file(path_or_file):
    if isinstance(path_or_file, str):
        f = open(path_or_file)
    else:
        f = path_or_file
    with f:
        return len(f.readlines())

There is a problem. If I use a file as an argument, I end up with a closed file when the function returns. And I don’t want that because it’s my file and I manage it. However, if I use a path argument, the function opens the file itself, and I want it closed.

It’s possible to have two branches in the function, one closing the file and the other not. But that’s ugly and repetitive. For this kind of situation, there is the nullcontext() utility (available since Python 3.7). It creates something that can be used in a with statement, hands back the object you gave it, and does nothing on exit. This is best explained with a code snippet:

import contextlib

def count_lines_in_file(path_or_file):
    if isinstance(path_or_file, str):
        cm = open(path_or_file)
    else:
        cm = contextlib.nullcontext(path_or_file)
    with cm as f:
        return len(f.readlines())
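
Here is a small check of the behavior with an in-memory file. The function is repeated under the shorter, made-up name count_lines() so the sketch runs on its own, and io.StringIO stands in for a file I already have open.

```python
import contextlib
import io


def count_lines(path_or_file):
    if isinstance(path_or_file, str):
        cm = open(path_or_file)  # our file: closed when the block exits
    else:
        cm = contextlib.nullcontext(path_or_file)  # caller's file: left alone
    with cm as f:
        return len(f.readlines())


buffer = io.StringIO("one\ntwo\nthree\n")
line_count = count_lines(buffer)
still_open = not buffer.closed

print(line_count, still_open)
```

The caller’s file object survives the call, so I can seek back and keep using it.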

I’m not a fan of having multiple ways to use my APIs (file or path). A function processing a file should take a file as an argument. But it’s a neat way to extend an API that isn’t as general as you’d like it to be.

There’s more

The contextlib page is one of the best-written parts of the already eloquent Python documentation. It includes several recipes that show how to use contextlib to manage resources in a pythonic way.

The heaviest tool in the contextlib toolbox is undoubtedly the ExitStack class, which allows managing several context managers and non-managed resources at once. The downside is a verbose API. For some reason, I haven’t found too much use for it myself, since one of the simpler tools in the library is usually good enough for the job.
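
For the curious, a minimal ExitStack sketch could look like this. The in-memory files and the released list are made up for the example; enter_context() and callback() are the actual ExitStack methods.

```python
import contextlib
import io

sources = [io.StringIO("a\n"), io.StringIO("b\n"), io.StringIO("c\n")]
released = []

with contextlib.ExitStack() as stack:
    # enter_context() enters a context manager and schedules its __exit__().
    files = [stack.enter_context(src) for src in sources]
    # callback() registers an arbitrary cleanup for something unmanaged.
    stack.callback(released.append, "unmanaged resource")
    combined = "".join(f.read() for f in files)

# Leaving the block unwinds everything in reverse order of registration.
print(combined, released, [src.closed for src in sources])
```

This is handy when the number of resources is only known at runtime, which a fixed with statement cannot express.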

Python also has an asynchronous with statement (async with). Asynchronous versions of the contextlib utilities, like @asynccontextmanager and AsyncExitStack, are there to help with writing asynchronous context managers.
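
As a sketch of the asynchronous flavor, here is the generator pattern from earlier rewritten with @asynccontextmanager. The connect() function and the events list are invented, and asyncio.sleep(0) merely stands in for real asynchronous setup and teardown.

```python
import asyncio
import contextlib

events = []


@contextlib.asynccontextmanager
async def connect():
    events.append("connect")
    await asyncio.sleep(0)  # stands in for real asynchronous setup
    try:
        yield "handle"
    finally:
        await asyncio.sleep(0)  # stands in for real asynchronous teardown
        events.append("disconnect")


async def main():
    async with connect() as handle:
        events.append(f"using {handle}")


asyncio.run(main())
print(events)
```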

I hope this post helps you the next time you need to write readable code that doesn’t leak resources!