Have you ever felt that running docker build is slow? Looking at the logs, it seems like Docker is downloading your dependencies over and over again. Can't Docker intelligently reuse those?
This post is part of a series on understanding why this problem occurs and how we can improve the situation. Previously, we explored how a union mount file system works. In this post, we will look at two specific strategies for writing performant Dockerfiles.
If you are not familiar with the NPM and RubyGems ecosystems, please take a moment to read this footnote, as our examples will be using them. 1 Note that it is OK if you don't usually work with NPM or RubyGems; our discussion here is not specific to any one programming language -- we are using NPM and RubyGems only as examples.
Many Docker tutorials show us something like this:
    FROM node:8
    WORKDIR /app
    COPY . .
    RUN npm install
Writing a Dockerfile like this can unnecessarily slow down builds. As discussed in the previous post, if any file in the project changes, both COPY . . and RUN npm install will re-run. Ideally, we want RUN npm install to re-run only when needed, that is, when we add or remove a package.
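The solution is simple -- we copy only package.json and package-lock.json first, run npm install, and then copy the rest of the files: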
    FROM node:8
    WORKDIR /app
    COPY package.json package-lock.json ./
    RUN npm install
    COPY . .
This idea can be easily transferred to other languages -- requirements.txt for Python, composer.json for PHP, mix.lock for Elixir, etc.
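As a rough illustration of how the same ordering might look for a Python project, here is a minimal sketch. The python:3 image tag and the /app directory are assumptions made for this example, not something prescribed by the post.

```dockerfile
# Sketch only: the python:3 tag is an assumption; pick the version your project needs.
FROM python:3
WORKDIR /app

# Copy only the dependency list first, so this layer and the install below
# are reused from cache until requirements.txt itself changes.
COPY requirements.txt ./
RUN pip install -r requirements.txt

# Copy the rest of the source last; editing code only invalidates this layer.
COPY . .
```

With this layout, editing application code should leave the pip install layer cached, while editing requirements.txt triggers a fresh install.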
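Things get trickier when a project uses two package managers, for example yarn for JavaScript and Bundler for Ruby. Applying the same idea (and assuming, for illustration, a base image that has both toolchains installed), we get:

    FROM node:8
    WORKDIR /app
    COPY package.json yarn.lock ./
    RUN yarn install
    COPY Gemfile Gemfile.lock ./
    RUN bundle install
    COPY . .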
However, when we change package.json, the Ruby dependencies are forced to be re-installed as well. We can swap the order, but that just reverses the problem: a change to the Gemfile would then force the JavaScript dependencies to be re-installed.
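To make the reversed problem concrete, here is a sketch of the swapped ordering. It mirrors the single-image example above and carries the same assumption that the base image provides both Ruby with Bundler and Node.js with yarn.

```dockerfile
# Assumption: the base image provides both toolchains; node:8 is kept
# only to mirror the earlier single-image example.
FROM node:8
WORKDIR /app

# Ruby dependencies are now installed first...
COPY Gemfile Gemfile.lock ./
RUN bundle install

# ...so any change to Gemfile or Gemfile.lock invalidates every layer below,
# and yarn install re-runs even though package.json did not change.
COPY package.json yarn.lock ./
RUN yarn install

COPY . .
```

Whichever install comes second is always at the mercy of the first, because a cache miss in one layer invalidates every layer after it.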
Docker 17.05 introduced multi-stage builds. We can think of a multi-stage build as building multiple temporary images and aggregating their results into one. In our case, we can have a temporary image for the NPM dependencies and copy whatever it downloaded into the final image, where the RubyGems dependencies are installed. This is how it can be done:
    FROM node:8 AS npm
    WORKDIR /app
    COPY package.json yarn.lock ./
    RUN yarn install
    # This produces /app/node_modules

    FROM ruby:2.5
    WORKDIR /app
    COPY Gemfile Gemfile.lock ./
    RUN bundle install
    COPY --from=npm /app/node_modules ./node_modules
    COPY . .

Lines 1-5 denote the temporary image, which we name npm. The second half of the file is for the main image that we are building. What's interesting is the penultimate instruction:

    COPY --from=npm /app/node_modules ./node_modules

It copies the downloaded packages from the npm image into the final image that we are building. The destination is spelled out as ./node_modules because, when the source is a directory, COPY copies the directory's contents rather than the directory itself. With such a setup, each set of dependencies is re-installed only when its own manifest files change.
I am sure there are many variants of what I presented here. The big idea is to let Docker reuse as much as it can. We do so by moving things that rarely change to the beginning of the Dockerfile. In the future, if you find that Docker is taking a long time to rebuild your image, look for opportunities to re-order your Dockerfile.
Projects that have NPM dependencies usually have package.json, which is used to (among other things) list the direct dependencies and their version constraints, and a lock file (package-lock.json or yarn.lock), which specifies the exact versions of all the dependencies (direct or indirect) used. The package manager used with NPM is either npm or yarn. In the Ruby world, the analogous files are called Gemfile and Gemfile.lock. Its package manager is Bundler (with CLI tool called