Apparatus

Building C++ containers using Docker and CMake

I wanted to containerize a C++ application that depends on some external libraries. Some of the libraries were readily available via package repositories, while others needed to be built from source. I’m relatively new to Docker and started searching online for recipes for such cases. Much of the Docker documentation and most examples across the web concentrate on creating images for interpreted languages.

While containerizing a C++ application isn’t hard, I ended up learning a few things along the way. To spare you the details I’ll use an example program, but the strategies presented across the versions below are what I actually did and learned.

The program I’ll use as an example in this post is a simple echo server using ZeroMQ as the transport library. To make things a bit more interesting, I’ll use the cppzmq library, which provides basic C++ bindings on top of ZeroMQ. And while the ZeroMQ kernel is widely available via package managers, cppzmq isn’t, so it will be built and installed from source.
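To give an idea of the program’s scale, here is a minimal sketch of what the server side might look like (the real code lives in the repository linked below; the endpoint tcp://*:5555 is an illustrative choice, not necessarily the one the actual program uses):

```cpp
#include <zmq.hpp>  // cppzmq header-only bindings

int main() {
    zmq::context_t ctx;
    zmq::socket_t sock(ctx, zmq::socket_type::rep);
    sock.bind("tcp://*:5555");  // illustrative endpoint

    for (;;) {
        zmq::message_t request;
        // Block until a request arrives, then echo it back verbatim.
        if (sock.recv(request, zmq::recv_flags::none)) {
            sock.send(request, zmq::send_flags::none);
        }
    }
}
```

Everything above the `main` body comes from cppzmq; the only build-time requirement is that the compiler finds zmq.hpp and the linker finds libzmq, which is exactly what the CMake configuration below arranges.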

Crash course in CMake

The source code for the server and the client is available on GitHub. The build is configured using CMake. Following the example of the official MySQL image, both the server and the client are packaged inside the same image.

CMake is a meta build system that uses a high-level language to describe the build configuration. The high-level description can then be compiled into the actual Makefiles, IDE projects, etc. consumed by your usual toolchains. In this simple case the CMakeLists.txt file consumed by CMake is short:

cmake_minimum_required(VERSION 3.13)
project(example)

find_package(cppzmq)

foreach(TGT server client)
  add_executable(${TGT} "${TGT}.cc")
  target_link_libraries(${TGT} cppzmq)
  set_property(TARGET ${TGT} PROPERTY CXX_STANDARD 17)
endforeach()

install(TARGETS server client RUNTIME DESTINATION bin)

The first two lines declare the required CMake version and the project name. The loop defines two targets (client and server), along with their dependencies. Note that “linking” a target in CMake parlance means both that the compiler is pointed to the necessary headers and that the linker is pointed to the necessary shared libraries.

Also note that although cppzmq is just a header-only wrapper on top of the ZeroMQ kernel, the cppzmq target carries a dependency on the ZeroMQ kernel. The CMakeLists.txt doesn’t need to specify this kind of transitive dependency explicitly.
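This propagation works because find_package(cppzmq) brings in an imported target whose usage requirements travel with it. Conceptually, the installed package config file does something along these lines (a simplified sketch, not the actual cppzmq config file):

```cmake
# Simplified sketch of what an installed package config file provides.
# Consumers that link against cppzmq inherit both properties transitively.
add_library(cppzmq INTERFACE IMPORTED)
set_target_properties(cppzmq PROPERTIES
  INTERFACE_INCLUDE_DIRECTORIES "/usr/local/include"
  INTERFACE_LINK_LIBRARIES "libzmq")
```

Because the include directories and the link to libzmq are properties of the target itself, a single target_link_libraries call is enough to pull in the whole dependency chain.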

First version: One big image containing the build and runtime environments

The simplest approach is to base the image on a “batteries included” base image containing the compiler, install the extra libraries, and finally build the example application. Here’s the example Dockerfile:

FROM gcc:9

RUN set -ex;                                                                      \
    apt-get update;                                                               \
    apt-get install -y cmake libzmq3-dev;                                         \
    mkdir -p /usr/src;                                                            \
    cd /usr/src;                                                                  \
    curl -L https://github.com/zeromq/cppzmq/archive/v4.6.0.tar.gz | tar -zxf -;  \
    cd /usr/src/cppzmq-4.6.0;                                                     \
    cmake -D CPPZMQ_BUILD_TESTS:BOOL=OFF .; make; make install

COPY . /usr/src/example

RUN set -ex;              \
    cd /usr/src/example;  \
    cmake .; make; make install

ENTRYPOINT ["server"]

The gcc:9 image is based on Debian, and since cppzmq isn’t included in the Debian package repository, it’s downloaded and installed directly from GitHub. cppzmq itself is a header-only library, so just dropping the header file into the include path would suffice if I authored a find module for cppzmq myself. But luckily cppzmq also provides an upstream CMake config file that is installed alongside the headers when doing a full installation.

A very careful approach to installing cppzmq would also include verifying the checksum of the archive in case GitHub or the certificate trust chain were compromised. But if I were to apply that kind of scrutiny, I might as well package cppzmq and serve it (with signatures) from a private apt repository.

The COPY instruction will copy the CMakeLists.txt and the source code of the application from the build context. What’s left is compiling and installing the application, and defining the default entry point to the container.

Let’s see what happens when the image is built:

$ docker build -q -f Dockerfile-v1 -t example .
sha256:292b974eb3d0719290f833838b9ae7edbbb8a621f6d4716750bef9d5f40c02f0
$ docker images example
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
example             latest              292b974eb3d0        5 seconds ago       1.19GB

Understandably, keeping the whole build-time environment in the image brings its size up. Still, the amount of stuff that comes with the C++ compilers and development libraries is astounding.

Second version: Breaking the build into stages

Docker supports multi-stage builds designed to solve the problem described in the previous section. The idea is to build two images: the first one containing the whole build-time environment with compilers and development libraries, and the second one containing just the necessary runtime libraries and compiled artifacts.

Let’s take a look at the new Dockerfile:

# The first stage will install build dependencies and compile the executable

FROM debian:buster AS builder

RUN set -ex;                                                                      \
    apt-get update;                                                               \
    apt-get install -y g++ curl cmake libzmq3-dev;                                \
    mkdir -p /usr/src;                                                            \
    cd /usr/src;                                                                  \
    curl -L https://github.com/zeromq/cppzmq/archive/v4.6.0.tar.gz | tar -zxf -;  \
    cd /usr/src/cppzmq-4.6.0;                                                     \
    cmake -D CPPZMQ_BUILD_TESTS:BOOL=OFF .; make; make install

COPY . /usr/src/example

RUN set -ex;              \
    cd /usr/src/example;  \
    cmake .; make; make install

# The second stage will install the runtime dependencies only and copy
# the compiled executables

FROM debian:buster AS runtime

RUN set -ex;         \
    apt-get update;  \
    apt-get install -y libzmq5

COPY --from=builder /usr/local/bin /usr/local/bin

ENTRYPOINT ["server"]

There are two FROM instructions in the Dockerfile, corresponding to two separate images named builder and runtime. The builder image doesn’t have an entry point because its purpose is to produce the executables and install them into /usr/local/bin. The second image is built by installing just the necessary runtime libraries and copying the compiled binaries from the builder. Notice the syntax of the COPY instruction with the --from option.

The naming of the ZeroMQ library is confusing, but libzmq3-dev is indeed the development package for libzmq5.

The final image size is just a bit over one-tenth of the first example:

$ docker build -q -f Dockerfile-v2 -t example .
sha256:46ce892a013c58dcb0ec4217725dc053fe98d195f12bfacf7e431c472239ff80
$ docker images example
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
example             latest              46ce892a013c        3 seconds ago       142MB

Much better!

Interlude: Pitfalls of distributing C++ code in binary format

You may have noticed that I changed the base image from gcc:9 to debian:buster when going from the first to the second example. I initially tried to use gcc:9 as the base image for the builder but ran into issues. Let me explain with an example:

FROM gcc:9 AS builder

RUN set -ex;                                                                      \
    apt-get update;                                                               \
    apt-get install -y cmake libzmq3-dev;                                         \
    mkdir -p /usr/src;                                                            \
    cd /usr/src;                                                                  \
    curl -L https://github.com/zeromq/cppzmq/archive/v4.6.0.tar.gz | tar -zxf -;  \
    cd /usr/src/cppzmq-4.6.0;                                                     \
    cmake -D CPPZMQ_BUILD_TESTS:BOOL=OFF .; make; make install

COPY . /usr/src/example

RUN set -ex;              \
    cd /usr/src/example;  \
    cmake .; make; make install

FROM debian:buster

RUN set -ex;         \
    apt-get update;  \
    apt-get install -y libzmq5

COPY --from=builder /usr/local/bin /usr/local/bin

ENTRYPOINT ["server"]

What happened is that the client program wouldn’t run:

$ docker build -q -f Dockerfile-v1.5 -t example .
sha256:f4dc81209386d95f015907e2ab65e26299b26597f9ca322f8099c32c034f1115
$ docker run --rm --entrypoint=client example
client: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.26' not found (required by client)

It turns out that debian:buster ships GCC 8 instead of GCC 9, and libstdc++ is backward but not forward compatible: old binaries run fine against a newer libstdc++, but a binary built with GCC 9 may require symbol versions (here GLIBCXX_3.4.26) that the libstdc++ shipped with GCC 8 doesn’t provide. Compiling and linking an application with GCC 9 and then running it against the older libstdc++ doesn’t work. Thankfully libstdc++ checks for this instead of letting the incompatibility silently corrupt the program.

The complexity of the C++ ABI is why portable C++ libraries are typically either header-only or have a C API. ZeroMQ itself is a good example of this: the kernel is written in C++, but its public API has C linkage, which cppzmq then wraps back into a C++ API.
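The pattern can be sketched in a few lines. The names below are made up for illustration (this is not ZeroMQ’s actual API); the point is that only C types and C linkage cross the library boundary, so the caller’s compiler and standard library version don’t have to match the library’s:

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// C++ implementation detail, hidden inside the library. Its ABI (name
// mangling, std::string layout) never leaks out to callers.
namespace impl {
inline std::string shout(const std::string& s) {
    std::string out = s;
    std::transform(out.begin(), out.end(), out.begin(),
                   [](unsigned char c) { return std::toupper(c); });
    return out;
}
}  // namespace impl

// Public API with C linkage: no name mangling, only C types cross the
// boundary. Returns the result length, or -1 if the buffer is too small.
extern "C" int echo_shout(const char* in, char* out, int out_size) {
    const std::string result = impl::shout(in);
    if (static_cast<int>(result.size()) >= out_size) return -1;
    result.copy(out, result.size());
    out[result.size()] = '\0';
    return static_cast<int>(result.size());
}
```

A C++ wrapper in the spirit of cppzmq would then sit on the caller’s side, converting this C API back into idiomatic C++ using the caller’s own standard library.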

I could have worked around the problem by using gcc:8 as the “batteries included” base image for the builder stage. But I realized it’s probably best just to use a common base image for both stages, and install the missing pieces of the build-time environment by hand. After all, containerizing an application is all about controlling the dependencies for maximum compatibility. More on that in the next section.

Third version: Sharing the runtime libraries between the stages

In the second example, both the builder and the runtime images are based directly on debian:buster and install their environments separately. In the spirit of the DRY principle, I wanted to refactor the Dockerfile to eliminate this redundancy. I ended up with an intermediate stage containing the runtime libraries, which both the builder and runtime stages build on:

# The new base image to contain runtime dependencies

FROM debian:buster AS base

RUN set -ex;         \
    apt-get update;  \
    apt-get install -y libzmq5

# The first stage will install build dependencies on top of the
# runtime dependencies, and then compile

FROM base AS builder

RUN set -ex;                                                                      \
    apt-get install -y g++ curl cmake libzmq3-dev;                                \
    mkdir -p /usr/src;                                                            \
    cd /usr/src;                                                                  \
    curl -L https://github.com/zeromq/cppzmq/archive/v4.6.0.tar.gz | tar -zxf -;  \
    cd /usr/src/cppzmq-4.6.0;                                                     \
    cmake -D CPPZMQ_BUILD_TESTS:BOOL=OFF .; make; make install

COPY . /usr/src/example

RUN set -ex;              \
    cd /usr/src/example;  \
    cmake .; make; make install

# The second stage will already contain all dependencies, just copy
# the compiled executables

FROM base AS runtime

COPY --from=builder /usr/local/bin /usr/local/bin

ENTRYPOINT ["server"]

Let’s build and see what happens:

$ docker build -q -f Dockerfile-v3 -t example .
sha256:46ce892a013c58dcb0ec4217725dc053fe98d195f12bfacf7e431c472239ff80
$ docker images example
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
example             latest              46ce892a013c        2 minutes ago       142MB

The third version produced, in fact, the exact same image (compare the hashes). This comes as no surprise, since the same instructions are used to build the runtime image. However, I see the following benefits in structuring the Dockerfile like this:

  • Instead of invoking apt-get update twice like in the second version, the third version requires it only once. This speeds up the build.
  • The builder and the runtime images manifestly share the same runtime libraries. This means that the builder links the executables against the same shared libraries the runtime image contains. While I would normally expect to get the same library version when downloading from the same package repository within a short timespan, this approach makes the guarantee explicit.
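The second point can be checked by hand by inspecting the dynamic linkage inside the finished runtime image (assuming the image is tagged example, as in the transcripts above):

```shell
# List the shared objects the server binary resolves against inside the
# runtime image; libzmq should come from the image's own library path.
docker run --rm --entrypoint=ldd example /usr/local/bin/server
```

If the builder and runtime stages ever drifted apart, unresolved or mismatched libraries would show up here rather than as a surprise at container startup.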