Speeding up Docker build (Part II) - Ordering Dockerfile

Because of the properties of a union mount file system, the order in which the build instructions are written in a Dockerfile is important.

By Yihang Ho
Guillaume Bolduc

Have you ever felt that running docker build is slow? Looking at the logs, it seems like Docker is trying to download your dependencies over and over again. Can't Docker intelligently reuse those?

This is a series on understanding why this problem occurs and how we can improve the situation. Previously, we explored how a union mount file system works. In this post, we will look at two specific strategies to write performant Dockerfiles.

Pre-requisite knowledge

If you are not familiar with the NPM and RubyGems ecosystems, please take a moment to read footnote 1 at the end of this post, as our examples use them. It is fine if you don't usually work with NPM or RubyGems: our discussion here is not specific to any one programming language, and we use NPM and RubyGems only as examples.

Copying dependency specifications first

Many Docker tutorials show us something like this:

FROM node:8

WORKDIR /app
COPY . .
RUN npm install

Writing a Dockerfile like this can slow down builds unnecessarily. As discussed in the previous post, if any file in the project changes, COPY . . re-runs, and so does every instruction after it, including RUN npm install. Ideally, we want RUN npm install to re-run only when needed, that is, when we add, remove, or update a package.

The solution is simple -- we copy package.json and package-lock.json first:

FROM node:8

WORKDIR /app
COPY package.json package-lock.json ./
RUN npm install

COPY . .

This idea can be easily transferred to other languages -- requirements.txt for Python, composer.json for PHP, mix.exs and mix.lock for Elixir, etc.
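
As a rough sketch of how the same ordering carries over to Python: the example below assumes a project with a requirements.txt at its root and uses the official python:3 image. Both are assumptions made for illustration, not part of the examples above.

```dockerfile
FROM python:3

WORKDIR /app

# Copy only the dependency specification first. This layer, and the
# pip install layer after it, are rebuilt only when requirements.txt changes.
COPY requirements.txt ./
RUN pip install -r requirements.txt

# Copy the rest of the source last. Editing application code invalidates
# only this layer, not the dependency installation above.
COPY . .
```

The shape is the same as in the NPM version: the files that describe dependencies go in before the install step, and everything else goes in after it.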

Multi-stage builds

The previous strategy falls short if we use more than one package manager. This sounds silly but is quite common. Many full-stack web frameworks do that -- one for the server-side language and one for the client-side JavaScript. For example, Ruby on Rails uses RubyGems for the server code and yarn for client-side JavaScript. In this case, assuming a base image that has both yarn and Bundler available, we end up with a Dockerfile that looks like this:

FROM node:8

WORKDIR /app
COPY package.json yarn.lock ./
RUN yarn install

COPY Gemfile Gemfile.lock ./
RUN bundle install

COPY . .

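However, when we change package.json, the Ruby dependencies are re-installed as well, because the bundle install layer comes after the yarn install layer. We can swap the order, but that only reverses the problem: a change to Gemfile would then force the JavaScript dependencies to be re-installed.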

Docker 17.05 introduced multi-stage builds. In a multi-stage build, we can think of building one or more temporary images and aggregating their results into one. In our case, we can use a temporary image to install the NPM dependencies and copy whatever it downloaded into the final image, which installs the RubyGems dependencies itself. This is how it can be done:

FROM node:8 AS npm

WORKDIR /app
COPY package.json yarn.lock ./
RUN yarn install # This produces /app/node_modules

FROM ruby:2.5

WORKDIR /app
COPY Gemfile Gemfile.lock ./
RUN bundle install

COPY --from=npm /app/node_modules ./node_modules

COPY . .

Lines 1-5 denote the temporary image. We name it npm. The second half of the file is for the main image that we are building. What's interesting is the penultimate instruction:

COPY --from=npm /app/node_modules ./node_modules

It copies the downloaded packages from another image into the final image that we are building. Note that COPY copies the contents of a source directory rather than the directory itself, so the destination has to name node_modules explicitly. With such a setup, we end up re-installing the dependencies only when absolutely necessary.

Conclusion

I am sure there are many variants of what I presented here. The big idea is to let Docker reuse as much as it can. We do so by moving things that rarely change to the beginning of the Dockerfile. In the future, if you find that Docker is taking a long time to rebuild your image, look for opportunities to re-order your Dockerfile.


  1. Projects that have NPM dependencies usually have a package.json, which (among other things) lists the direct dependencies and their version constraints, and a package-lock.json or yarn.lock, which specifies the exact versions of all the dependencies (direct or indirect) used. The package manager is either npm or yarn. In the Ruby world, the analogous files are called Gemfile and Gemfile.lock, and the package manager is Bundler (with a CLI tool called bundle).