Generating SBOMs for Your Image with BuildKit
https://www.docker.com/blog/generate-sboms-with-buildkit/

The latest release series of BuildKit, v0.11, introduces support for build-time attestations and SBOMs, allowing publishers to create images with records of how the image was built. This makes it easier for you to answer common questions, like which packages are in the image, where the image was built from, and whether you can reproduce the same results locally.

This new data helps you make informed decisions about the security of the images you consume — without needing to do all the manual work yourself.

In this blog post, we’ll discuss what attestations and SBOMs are, how to build images that contain SBOMs, and how to start analyzing the resulting data!

What are attestations?

An attestation is a declaration that a statement is true. With software, an attestation is a record that specifies a statement about a software artifact. For example, it could include who built it and when, what inputs it was built with, what outputs it produced, etc.

By writing these attestations and distributing them alongside the artifacts themselves, you make details visible that might otherwise be tricky to find. To get this kind of information without attestations, you'd have to reverse-engineer how the image was built, tracking down the original source code and possibly attempting to reproduce the build yourself.

To provide this valuable information to the end-users of your images, BuildKit v0.11 lets you build these attestations as part of your normal build process. All it takes is adding a few options to your build step.

BuildKit supports attestations in the in-toto format (from the in-toto framework). Currently, the Dockerfile frontend produces two types of attestations that answer two different questions:

  • SBOM (Software Bill of Materials) – An SBOM contains a list of software components inside an image. This will include the names of various packages installed, their version numbers, and any other associated metadata. You can use this to see, at a glance, if an image contains a specific package or determine if an image is vulnerable to specific CVEs.
  • SLSA Provenance – The provenance of the image describes details of the build process, such as which materials (images, URLs, files, and so on) were consumed, what build parameters were set, and the source maps that allow mapping the resulting image back to the Dockerfile that created it. You can use this to analyze how an image was built, determine whether the sources consumed all appear legitimate, and even attempt to rebuild the image yourself.

Users can also define their own custom attestation types via a custom BuildKit frontend. In this post, we’ll focus on SBOMs and how to use them with Dockerfiles.

Getting the latest release

Building attestations into your images requires the latest releases of both Buildx and BuildKit; you can get them by updating Docker Desktop to the most recent version.

You can check your version number, and ensure it matches the buildx v0.10 release series:

$ docker buildx version
github.com/docker/buildx 0.10.0 ...

To use the latest release of BuildKit, create a docker-container builder using buildx:

$ docker buildx create --use --name=buildkit-container --driver=docker-container

You can check that the new builder is configured correctly, and ensure it matches the buildkit v0.11 release series:

$ docker buildx inspect | grep -i buildkit
Buildkit:  v0.11.1

If you’re using the docker/setup-buildx-action in GitHub Actions, then you’ll get this all automatically without needing to update.

With that out of the way, you can move on to building an image containing SBOMs!

Adding SBOMs to your images

You’re now ready to generate an SBOM for your image!

Let’s start with the following Dockerfile to create an nginx web server:

# syntax=docker/dockerfile:1.5

FROM nginx:latest
COPY ./static /usr/share/nginx/html

You can build and push this image, along with its SBOM, in one step:

$ docker buildx build --sbom=true -t <myorg>/<myimage> --push .

That’s all you need! In your build output, you should spot a message about generating the SBOM:

...
=> [linux/amd64] generating sbom using docker.io/docker/buildkit-syft-scanner:stable-1                           	0.2s
...

BuildKit generates SBOMs using scanner plugins. By default, it uses buildkit-syft-scanner, a scanner built on top of Anchore’s Syft open-source project, to do the heavy lifting. If you like, you can use another scanner by specifying the generator= option. 
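
For example, a minimal sketch of pointing BuildKit at a different generator (where <custom-scanner> is a placeholder for your scanner's image reference):

$ docker buildx build --sbom=generator=<custom-scanner> -t <myorg>/<myimage> --push .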

Here’s how you view the generated SBOM using buildx imagetools:

$ docker buildx imagetools inspect <myorg>/<myimage> --format "{{ json .SBOM.SPDX }}"
{
	"spdxVersion": "SPDX-2.3",
	"dataLicense": "CC0-1.0",
	"SPDXID": "SPDXRef-DOCUMENT",
	"name": "/run/src/core/sbom",
	"documentNamespace": "https://anchore.com/syft/dir/run/src/core/sbom-a589a536-b5fb-49e8-9120-6a12ce988b67",
	"creationInfo": {
	"licenseListVersion": "3.18",
	"creators": [
	"Organization: Anchore, Inc",
	"Tool: syft-v0.65.0",
	"Tool: buildkit-v0.11.0"
	],
	"created": "2023-01-05T16:13:17.47415867Z"
	},
	...

SBOMs also work with the local and tar exporters. When you export with these exporters, instead of attaching the attestations directly to the output image, the attestations are exported as separate files into the output filesystem:

$ docker buildx build --sbom=true -o ./image .
$ ls -lh ./image
-rw-------  1 user user 6.5M Jan 17 14:36 sbom.spdx.json
...

Viewing the SBOM in this case is as simple as cat-ing the result:

$ cat ./image/sbom.spdx.json | jq .predicate
{
	"spdxVersion": "SPDX-2.3",
	"dataLicense": "CC0-1.0",
	"SPDXID": "SPDXRef-DOCUMENT",
	…

Supplementing SBOMs

Generating SBOMs using a scanner is a good start! But some packages won't be correctly detected because they've been installed in a slightly unconventional way.

If that’s the case, you can still get this information into your SBOMs with a bit of manual interaction.

Let’s suppose you’ve installed foo v1.2.3 into your image by downloading it using curl:

RUN curl https://example.com/releases/foo-v1.2.3-amd64.tar.gz | tar xzf - && \
    mv foo /usr/local/bin/

Software installed this way likely won’t appear in your SBOM unless the SBOM generator you’re using has special support for this binary (for example, Syft has support for detecting certain known binaries).

You can manually generate an SBOM for this piece of software by writing an SPDX snippet to a location of your choice on the image filesystem using a Dockerfile heredoc:

COPY <<"EOT" /usr/local/share/sbom/foo.spdx.json
{
	"spdxVersion": "SPDX-2.3",
	"SPDXID": "SPDXRef-DOCUMENT",
	"name": "foo-v1.2.3",
	...
}
EOT

This SBOM should then be picked up by your SBOM generator and included in the final SBOM for the whole image. This behavior is included out-of-the-box in buildkit-syft-scanner, but may not be part of every generator’s toolkit.

Even more SBOMs!

While the above section is good for scanning a basic image, it might struggle to provide more detailed package and file information. BuildKit can help you scan additional components of your build, including intermediate stages and your build context, using the BUILDKIT_SBOM_SCAN_STAGE and BUILDKIT_SBOM_SCAN_CONTEXT arguments, respectively.
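
For example, to also scan your build context when building with --sbom=true, a minimal sketch declares the argument globally before the first FROM (the stage-scanning variant is shown in the multi-stage example below):

# syntax=docker/dockerfile:1.5
ARG BUILDKIT_SBOM_SCAN_CONTEXT=true

FROM nginx:latest
COPY ./static /usr/share/nginx/html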

In the case of multi-stage builds, this allows you to track dependencies from previous stages, even though that software might not appear in your final image.

For example, for a demo C/C++ program, you might have the following Dockerfile:

# syntax=docker/dockerfile:1.5

FROM ubuntu:22.04 AS build
ARG BUILDKIT_SBOM_SCAN_STAGE=true
RUN apt-get update && apt-get install -y git build-essential
WORKDIR /src
RUN git clone https://example.com/myorg/myrepo.git .
RUN make build

FROM scratch
COPY --from=build /src/build/ /

If you just scanned the resulting image, it wouldn’t reveal that the build tools, like Git or GCC (included in the build-essential package), were ever used in the build process! By integrating SBOM scanning into your build using the BUILDKIT_SBOM_SCAN_STAGE build argument, you can get much richer information that would otherwise have been completely lost.

You can access these additionally generated SBOM documents in imagetools as well:

$ docker buildx imagetools inspect <myorg>/<myimage> --format "{{ range .SBOM.AdditionalSPDXs }}{{ json . }}{{ end }}"
{
	"spdxVersion": "SPDX-2.3",
	"SPDXID": "SPDXRef-DOCUMENT",
	...
}
{
	"spdxVersion": "SPDX-2.3",
	"SPDXID": "SPDXRef-DOCUMENT",
	...
}
...

For the local and tar exporters, these will appear as separate files in your output directory:

$ docker buildx build --sbom=true -o ./image .
$ ls -lh ./image
-rw------- 1 user user 4.3M Jan 17 14:40 sbom-build.spdx.json
-rw------- 1 user user  877 Jan 17 14:40 sbom.spdx.json
...

Analyzing images

Now that you’re publishing images containing SBOMs, it’s important to find a way to analyze them to take advantage of this additional data.

As mentioned above, you can extract the attached SBOM attestation using the imagetools subcommand:

$ docker buildx imagetools inspect <myorg>/<myimage> --format "{{json .SBOM.SPDX}}"
{
	"spdxVersion": "SPDX-2.3",
	"dataLicense": "CC0-1.0",
	"SPDXID": "SPDXRef-DOCUMENT",
	...

If your target image is built for multiple architectures using the --platform flag, then you’ll need a slightly different syntax to extract the SBOM attestation:

$ docker buildx imagetools inspect <myorg>/<myimage> --format '{{ json (index .SBOM "linux/amd64").SPDX }}'
{
	"spdxVersion": "SPDX-2.3",
	"dataLicense": "CC0-1.0",
	"SPDXID": "SPDXRef-DOCUMENT",
	...

Now suppose you want to list all of the packages, and their versions, inside an image. You can modify the value passed to the --format flag to be a go template that lists the packages:

$ docker buildx imagetools inspect <myorg>/<myimage> --format '{{ range .SBOM.SPDX.packages }}{{ println .name .versionInfo }}{{ end }}' | sort
adduser 3.118
apt 2.2.4
base-files 11.1+deb11u6
base-passwd 3.5.51
bash 5.1-2+deb11u1
bsdutils 1:2.36.1-8+deb11u1
ca-certificates 20210119
coreutils 8.32-4+b1
curl 7.74.0-1.3+deb11u3
...

Alternatively, you might want to get the version information for a piece of software that you know is installed:

$ docker buildx imagetools inspect <myorg>/<myimage> --format '{{ range .SBOM.SPDX.packages }}{{ if eq .name "nginx" }}{{ println .versionInfo }}{{ end }}{{ end }}'
1.23.3-1~bullseye

You can even take the whole SBOM and use it to scan for CVEs using a tool that can use SBOMs to search for CVEs (like Anchore’s Grype):

$ docker buildx imagetools inspect <myorg>/<myimage> --format '{{ json .SBOM.SPDX }}' | grype
NAME          	INSTALLED            	FIXED-IN 	TYPE  VULNERABILITY 	SEVERITY   
apt           	2.2.4                             	deb   CVE-2011-3374 	Negligible  
bash          	5.1-2+deb11u1        	(won't fix) deb   CVE-2022-3715 	 
...

These operations should complete super quickly! Because the SBOM was already generated at build, you’re just querying already-existing data from Docker Hub instead of needing to generate it from scratch every time.

Going further

In this post, we've only covered the absolute basics of getting started with BuildKit and SBOMs. You can find out more about the topics we've covered on docs.docker.com, as well as about the other features in the latest BuildKit v0.11 release.

Highlights from the BuildKit v0.11 Release
https://www.docker.com/blog/highlights-buildkit-v0-11-release/

BuildKit v0.11 is now available, along with Buildx v0.10 and v1.5 of the Dockerfile syntax. We've released new features, bug fixes, performance improvements, and improved documentation for all of the Docker Build tools.

Let's dive into what's new! We'll cover the highlights, but you can get the whole story in the full changelogs.

1. SLSA Provenance

BuildKit can now create SLSA Provenance attestations to trace a build back to its source and make it easier to understand how it was created. Images built with new versions of Buildx and BuildKit include metadata like links to source code, build timestamps, and the materials used during the build. To attach the new provenance, BuildKit now defaults to creating OCI-compliant images.

Although docker buildx will add a provenance attestation to all new images by default, you can also opt into more detail. These additional details include your Dockerfile source, source maps, and the intermediate representations used by BuildKit. You can enable all of these new provenance records using the new --provenance flag in Buildx:

$ docker buildx build --provenance=true -t <myorg>/<myimage> --push .

Or manually set the provenance generation mode to either min or max (read more about the different modes):

$ docker buildx build --provenance=mode=max -t <myorg>/<myimage> --push .

You can inspect the provenance of an image using the imagetools subcommand. For example, here’s what it looks like on the moby/buildkit image itself:

$ docker buildx imagetools inspect moby/buildkit:latest --format '{{ json .Provenance }}'
{
  "linux/amd64": {
    "SLSA": {
      "buildConfig": {

You can use this provenance to find key information about the build environment, such as the git repository it was built from:

$ docker buildx imagetools inspect moby/buildkit:latest --format '{{ json (index .Provenance "linux/amd64").SLSA.invocation.configSource }}'
{
  "digest": {
	"sha1": "830288a71f447b46ad44ad5f7bd45148ec450d44"
  },
  "entryPoint": "Dockerfile",
  "uri": "https://github.com/moby/buildkit.git#refs/tags/v0.11.0"
}

Or even the CI job that built it in GitHub actions:

$ docker buildx imagetools inspect moby/buildkit:latest --format '{{ (index .Provenance "linux/amd64").SLSA.builder.id }}'
https://github.com/moby/buildkit/actions/runs/3878249653

Read the documentation to learn more about SLSA Provenance attestations or to explore BuildKit’s SLSA fields.

2. Software Bill of Materials

While provenance attestations help to record how a build was completed, Software Bills of Materials (SBOMs) record what components are used. This is similar to tools like docker sbom, but, instead of requiring you to perform your own scans, the author of the image can build the results into the image.

You can enable built-in SBOMs with the new --sbom flag in Buildx:

$ docker buildx build --sbom=true -t <myorg>/<myimage> --push .

By default, BuildKit uses docker/buildkit-syft-scanner (powered by Anchore’s Syft project) to build an SBOM from the resulting image. But any scanner that follows the BuildKit SBOM scanning protocol can be used here:

$ docker buildx build --sbom=generator=<custom-scanner> -t <myorg>/<myimage> --push .

Similar to SLSA provenance, you can use imagetools to query SBOMs attached to images. For example, if you list all of the discovered dependencies used in moby/buildkit, you get this:

$ docker buildx imagetools inspect moby/buildkit:latest --format '{{ range (index .SBOM "linux/amd64").SPDX.packages }}{{ println .name }}{{ end }}'
github.com/Azure/azure-sdk-for-go/sdk/azcore
github.com/Azure/azure-sdk-for-go/sdk/azidentity
github.com/Azure/azure-sdk-for-go/sdk/internal
github.com/Azure/azure-sdk-for-go/sdk/storage/azblob
...

Read the SBOM attestations documentation to learn more.

3. SOURCE_DATE_EPOCH

Getting reproducible builds out of Dockerfiles has historically been quite tricky — a full reproducible build requires bit-for-bit accuracy that produces the exact same result each time. Even builds that are fully deterministic would get different timestamps between runs.

The new SOURCE_DATE_EPOCH build argument helps resolve this, following the standardized environment variable from the Reproducible Builds project. If the build argument is set or detected in the environment by Buildx, then BuildKit will set timestamps in the image config and layers to be the specified Unix timestamp. This helps you get perfect bit-for-bit reproducibility in your builds.

SOURCE_DATE_EPOCH is automatically detected by Buildx from the environment. To force all the timestamps in the image to the Unix epoch:

$ SOURCE_DATE_EPOCH=0 docker buildx build -t <myorg>/<myimage> .

Alternatively, to set it to the timestamp of the most recent commit:

$ SOURCE_DATE_EPOCH=$(git log -1 --pretty=%ct) docker buildx build -t <myorg>/<myimage> .

Read the documentation to find out more about how BuildKit handles SOURCE_DATE_EPOCH.

4. OCI image layouts as named contexts

BuildKit has been able to export OCI image layouts for a while now. As of v0.11, BuildKit can import those results again using named contexts. This makes it easier to build contexts entirely locally — without needing to push intermediate results to a registry.

For example, suppose you want to build your own custom intermediate image based on Alpine that contains some development tools.
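
A minimal sketch of such an intermediate.Dockerfile might look like this (the exact packages are just an assumption for the example):

# intermediate.Dockerfile
FROM alpine
RUN apk add --no-cache build-base git

You can then build it and export the result locally: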

$ docker buildx build . -f intermediate.Dockerfile --output type=oci,dest=./intermediate,tar=false

This builds the contents of intermediate.Dockerfile and exports it as an OCI image layout into the intermediate/ directory (using the new tar=false option for OCI exports). To use this intermediate result in a Dockerfile, refer to it using any name you like in the FROM statement of your main Dockerfile:

FROM base
RUN ... # use the development tools in the intermediate image

You can then connect this Dockerfile to your OCI layout using the new oci-layout:// URI schema for the --build-context flag:

$ docker buildx build . -t <myorg>/<myimage> --build-context base=oci-layout://intermediate

Instead of resolving the base image against Docker Hub, BuildKit will read it from oci-layout://intermediate in the current directory, so you don't need to push the intermediate image to a remote registry to use it.

Refer to the documentation to find out more about using oci-layout:// with the --build-context flag.

5. Cloud cache backends

To get good build performance when building in ephemeral environments, such as CI pipelines, you need to store the cache in a remote backend. The newest release of BuildKit supports two new storage backends: Amazon S3 and Azure Blob Storage.

When you build images, you can provide the details of your S3 bucket or Azure Blob store, and BuildKit will automatically store your build cache there and pull it into future builds. This build cache means that even though your CI or local runners might be destroyed and recreated, you can still access your remote cache to get quick builds when nothing has changed.

To use the new backends, you can specify them using the --cache-to and --cache-from flags:

$ docker buildx build --push -t <user>/<image> \
  --cache-to type=s3,region=<region>,bucket=<bucket>,name=<cache-image>[,parameters...] \
  --cache-from type=s3,region=<region>,bucket=<bucket>,name=<cache-image> .

$ docker buildx build --push -t <registry>/<image> \
  --cache-to type=azblob,name=<cache-image>[,parameters...] \
  --cache-from type=azblob,name=<cache-image>[,parameters...] .

You also don't have to choose between one cache backend and the other: BuildKit v0.11 supports multiple cache exports at a time, so you can use as many as you'd like.
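
For example, here's a sketch that combines an S3 cache with a registry cache in a single build (the registry cache ref is just a placeholder):

$ docker buildx build --push -t <user>/<image> \
  --cache-to type=s3,region=<region>,bucket=<bucket>,name=<cache-image> \
  --cache-to type=registry,ref=<registry>/<cache-image>:buildcache \
  --cache-from type=s3,region=<region>,bucket=<bucket>,name=<cache-image> .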

Find more information in the Amazon S3 cache and Azure Blob Storage cache backend documentation.

6. OCI Image annotations

OCI image annotations allow attaching metadata to container images at the manifest level. They're a more generic alternative to labels, and they can be attached to multi-platform images more easily.

All BuildKit image exporters now support setting annotations. To set the annotations of your choice, use the --output flag:

$ docker buildx build ... \
    --output "type=image,name=foo,annotation.org.opencontainers.image.title=Foo"

You can set annotations at any level of the output, for example, on the image index:

$ docker buildx build ... \
    --output "type=image,name=foo,annotation-index.org.opencontainers.image.title=Foo"

Or even set different annotations for each platform:

$ docker buildx build ... \
    --output "type=image,name=foo,annotation[linux/amd64].org.opencontainers.image.title=Foo,annotation[linux/arm64].org.opencontainers.image.title=Bar"

You can find out more about creating OCI annotations on BuildKit images in the documentation.

7. Build inspection with --print

If you are starting out in a codebase with existing Dockerfiles, understanding how to use them can be tricky. Buildx supports the new --print flag to print details about a build. You can use this flag to get quick and easy information about the required build arguments and secrets, and the targets you can build.

For example, here’s how you get an outline of BuildKit’s Dockerfile:

$ BUILDX_EXPERIMENTAL=1 docker buildx build --print=outline https://github.com/moby/buildkit.git
TARGET:      buildkit
DESCRIPTION: builds the buildkit container image

BUILD ARG                  VALUE    DESCRIPTION
RUNC_VERSION               v1.1.4
ALPINE_VERSION             3.17
BUILDKITD_TAGS                      defines additional Go build tags for compiling buildkitd
BUILDKIT_SBOM_SCAN_STAGE   true

We can also list all the different targets to build:

$ BUILDX_EXPERIMENTAL=1 docker buildx build --print=targets https://github.com/moby/buildkit.git
TARGET             	DESCRIPTION
alpine-amd64      	 
alpine-arm        	 
alpine-arm64      	 
alpine-s390x      	 

Any frontend that implements the BuildKit subrequests interface can be used with the buildx --print flag. Frontends can even define their own print functions and aren't limited to just outline or targets.

The --print feature is still experimental, so the interface may change, and we may add new functionality over time. If you have feedback, please open an issue or discussion on the docker/buildx GitHub repository; we'd love to hear your thoughts!

8. Bake features

The Bake file format for orchestrating builds has also been improved.

Bake now supports more powerful variable interpolation, allowing you to use fields from the same or other blocks. This can reduce duplication and make your bake files easier to read:

target "foo" {
  dockerfile = target.foo.name + ".Dockerfile"
  tags       = [target.foo.name]
}

Bake also supports null values for build arguments and labels, so they fall back to the defaults set in your Dockerfile instead of being overridden by your Bake definition:

variable "GO_VERSION" {
  default = null
}
target "default" {
  args = {
    GO_VERSION = GO_VERSION
  }
}

Read the Bake documentation to learn more. 

More improvements and bug fixes 

In this post, we've only scratched the surface of the new features in the latest release. Along with all the above features, the latest releases include quality-of-life improvements and bug fixes. Read the full changelogs to learn more.

We welcome bug reports and contributions, so if you find an issue in the releases, let us know by opening a GitHub issue or pull request, or get in contact in the #buildkit channel in the Docker Community Slack.

Faster Multi-Platform Builds: Dockerfile Cross-Compilation Guide
https://www.docker.com/blog/faster-multi-platform-builds-dockerfile-cross-compilation-guide/

There are some important changes happening in the software industry. With Apple moving all of their machines to their custom ARM-based silicon and AWS offering the best performance-per-cost ratio with their Graviton2 instances, one can no longer expect that all software only needs to run on x86 processors. If you work with containers, there is good tooling available for building multi-platform images, whether your development teams use different architectures or you want to deploy to a different architecture than the one you develop on. In this post, I'll show some patterns you can use to get the best performance out of such builds.

In order to build multi-platform container images, we will use the docker buildx command. Buildx is a Docker component that enables many powerful build features with a familiar Docker user experience. All builds executed via Buildx run with the Moby BuildKit builder engine. Buildx can also be used standalone or, for example, to run builds in a Kubernetes cluster. In the next version of the Docker CLI, the docker build command will also start to use Buildx by default.

By default, a build executed with Buildx will build an image for the architecture that matches your machine. This way, you get an image that runs on the same machine you are working on. In order to build for a different architecture, you can set the --platform flag, e.g. --platform=linux/arm64. To build for multiple platforms together, you can set multiple values with a comma separator.

# building an image for two platforms
docker buildx build --platform=linux/amd64,linux/arm64 .

In order to build multi-platform images, we also need to create a builder instance, as building multi-platform images is currently only supported when using BuildKit with the docker-container and kubernetes drivers. Setting a single target platform is allowed on all buildx drivers.

docker buildx create --use
# building an image for two platforms
docker buildx build --platform=linux/amd64,linux/arm64 .

When building a multi-platform image from a Dockerfile, effectively your Dockerfile gets built once for each platform. At the end of the build, all of these images are merged together into a single multi-platform image.

FROM alpine
RUN echo "Hello" > /hello

For example, in the case of a simple Dockerfile like this that is built for two architectures, BuildKit will pull two different versions of the Alpine image, one containing x86 binaries and another containing arm64 binaries, and then run their respective shell binary on each of them.

Different methods of building

Generally, the CPU of your machine can only run binaries for its native architecture. An x86 CPU can't run ARM binaries and vice versa. So when we run the above example on an Intel machine, how can it run the shell binary for ARM? It does this by running the binary through a software emulator instead of running it directly.

docker buildx ls shows what emulators are installed for each of the builders. If you don't see them listed for your system, you can install them with the tonistiigi/binfmt image.
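
One common way to install them (as described for the tonistiigi/binfmt image) is:

$ docker run --privileged --rm tonistiigi/binfmt --install all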

Using an emulator this way is very easy. We don’t need to modify our Dockerfile at all and can build for multiple platforms automatically. But it doesn’t come without downsides. The binaries running this way need to constantly convert their instructions between architectures and therefore don’t run with native speed. Occasionally you might also find a case that triggers a bug in the emulation layer.

One way to avoid this overhead is to modify your Dockerfile so that the longest-running commands don’t run through an emulator. Instead, we can use a cross-compilation stage.

The difference between emulation and cross-compilation is that in the former, we emulate the full system of another architecture in software, while in cross-compilation we only use binaries built for our native architecture, with a special configuration option that makes the compiler generate binaries for our target architecture. As the name suggests, this technique cannot be used for all processes; it mostly applies when you are running a compiler. Luckily, the two techniques can be combined. For example, your Dockerfile can use emulation to install packages from the package manager and use cross-compilation to build your source code.

Emulation vs. cross-compilation build with "--platform=linux/amd64,linux/arm64" as run on an Intel/AMD machine. Blue contains x86 binaries, yellow ARM binaries.

When deciding whether to use emulation or cross-compilation, the most important thing to consider is whether your process uses a lot of CPU processing power. Emulation is usually a fine approach for installing packages or if you need to create some files or run a one-off script. But if cross-compilation can make your builds (possibly tens of) minutes faster, it is probably worth updating your Dockerfile. Cross-compilation cannot help if you want to run tests as part of the build; for the best performance in that case, another option is to use a remote build cluster with machines of different architectures.

Preparing Dockerfile

In order to add cross-compilation to our Dockerfile, we will use multi-stage builds. The most common pattern in multi-stage builds is to define one or more build stages where we prepare our build artifacts and a runtime stage that we export as the final image. We will use the same method here, with an extra condition: we want our build stage to always run binaries for our native architecture and our runtime stage to contain binaries for the target architecture.

When we start a build stage with a command like FROM debian, it instructs the builder to pull the Debian image that matches the value set with the --platform flag during your build. What we want to do instead is to make sure this Debian image is always native to our current machine. On an x86 system, we could instead use a command like FROM --platform=linux/amd64 debian. Now, no matter what platform was set during the build, this stage will always be based on amd64. But what happens if we switch to an ARM machine like the new Apple Macs? Do we need to change all our Dockerfiles? The answer is no; instead of writing a constant platform value into our Dockerfile, we should use a variable: FROM --platform=$BUILDPLATFORM debian.

BUILDPLATFORM is part of a set of automatically defined (global scope) build arguments that you can use. It will always match the platform of your current system, and the builder will fill in the correct value for us.

Here is a complete list of such variables:

  • BUILDPLATFORM: matches the current machine (e.g. linux/amd64)
  • BUILDOS: OS component of BUILDPLATFORM (e.g. linux)
  • BUILDARCH: architecture component of BUILDPLATFORM (e.g. amd64, arm64, riscv64)
  • BUILDVARIANT: used to set the ARM variant (e.g. v7)
  • TARGETPLATFORM: the value set with the --platform flag on build
  • TARGETOS: OS component of TARGETPLATFORM (e.g. linux)
  • TARGETARCH: architecture component of TARGETPLATFORM (e.g. arm64)
  • TARGETVARIANT: variant component of TARGETPLATFORM (e.g. the ARM variant, v7)
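
To see the values these arguments take, a minimal sketch like the following simply prints them during the build (the echo is only for illustration):

FROM --platform=$BUILDPLATFORM alpine
# global build arguments must be redeclared inside the stage before use
ARG TARGETPLATFORM BUILDPLATFORM
RUN echo "running on $BUILDPLATFORM, building for $TARGETPLATFORM"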

Now in our build stage, we can pull in our source code, install the compiler package we want to use, etc. These commands should be identical to the ones you are already using in your single-platform or emulation-based Dockerfile.

The only additional change that needs to be made is that when you call your compiler, you need to pass it a parameter that configures it to produce artifacts for your actual target architecture. Remember that our build stage now always contains binaries for the host's native architecture, so the compiler can't determine the target architecture automatically from the environment anymore.

In order to pass the target architecture, we can use the same automatically defined build arguments shown before, this time with the TARGET* prefix. As we are using these build arguments inside the stage, they need to be in the local scope and declared with an ARG command before being used.

FROM --platform=$BUILDPLATFORM alpine AS build
# RUN <install build dependencies/compiler>
# COPY <source> .
ARG TARGETPLATFORM
RUN compile --target=$TARGETPLATFORM -o /out/mybinary

The only thing left to do now is to create a runtime stage that we will export as the result of our build. For this stage, we will not use --platform in the FROM definition. We could write FROM --platform=$TARGETPLATFORM, but that is the default value for all build stages anyway, so using the flag is redundant.

FROM alpine
# RUN <install runtime dependencies installed via emulation>
COPY --from=build /out/mybinary /bin

To confirm, let's look at what happens if the above Dockerfile is built for two platforms with docker buildx build --platform=linux/amd64,linux/arm64 . invoked on an ARM64-based system like a new Apple M1 machine.

First, the builder will pull down the Alpine image for ARM64, install the build dependencies and copy over the source. Note that these steps execute only once, even though we are building for two different platforms. BuildKit is smart enough to understand that both of these platforms depend on the same compiler and source code and automatically deduplicates the steps.

Now two separate instances of containers running the compiler process will be invoked, with a different value passed to the --target flag.

For the export stage, BuildKit now pulls down both ARM64 and x86 versions of the Alpine image. If any runtime packages were used then the x86 versions are installed with the help of the emulation layer. All these steps already ran in parallel to the build stage as they did not share dependencies. As the last step, the binary created by the respective compiler process is copied to the stage.

Sample Dockerfile commands as run on an Apple M1 machine. Blue contains x86 binaries, yellow ARM binaries.

Both of the runtime stages are then converted into an OCI image, and BuildKit will prepare an OCI Image Index structure (also called a manifest list) that contains both of these images.
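
If you push the result to a registry, you can confirm that it contains entries for both platforms by inspecting it with imagetools (using the image name from your own build):

$ docker buildx imagetools inspect <myorg>/<myimage>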

Go example

For a functional example, let's look at a project written in the Go programming language.

A typical multi-stage Dockerfile building a simple Go application would look something like:

FROM golang:1.17-alpine AS build
WORKDIR /src
COPY . .
RUN go build -o /out/myapp .

FROM alpine
COPY --from=build /out/myapp /bin

Using cross-compilation in Go is very easy. The only thing you need to do is pass the target architecture with environment variables: go build understands the GOOS and GOARCH environment variables. There is also GOARM for specifying the ARM version on 32-bit systems.
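
For example, outside of Docker, producing a linux/arm64 binary from any host is just a matter of setting those variables (the output path here is arbitrary):

$ GOOS=linux GOARCH=arm64 go build -o ./bin/myapp .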

The GOOS and GOARCH values use the same format as the TARGETOS and TARGETARCH values that, as we saw earlier, BuildKit makes available inside the Dockerfile.

When we apply all the steps we learned before (pinning the build stage to the build platform, defining the ARG TARGET* variables, and passing cross-compilation parameters to the compiler), we end up with:

FROM --platform=$BUILDPLATFORM golang:1.17-alpine AS build
WORKDIR /src
COPY . .
ARG TARGETOS TARGETARCH
RUN GOOS=$TARGETOS GOARCH=$TARGETARCH go build -o /out/myapp .

FROM alpine
COPY --from=build /out/myapp /bin

As you can see, we only needed to make three small modifications, and our Dockerfile is much more powerful. There are no downsides to these changes: the Dockerfile is still portable and works on all architectures. The only difference is that builds for a non-native architecture are now much faster.

Let’s look at some additional optimizations you might want to consider as well.

When Go applications depend on other Go modules, they usually either include the sources of those dependencies inside a vendor directory, or, if the project does not include such a directory, the Go compiler pulls the dependencies listed in the go.mod file while the go build command is running.

In the latter case, because the go build process is invoked twice for our multi-platform build (even though our own source code was copied only once), these dependencies would also be pulled twice. It's better to avoid that by telling Go to download the dependencies before we branch our build stage with the ARG TARGETARCH command.

FROM --platform=$BUILDPLATFORM golang:1.17-alpine AS build
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
ARG TARGETOS TARGETARCH
RUN GOOS=$TARGETOS GOARCH=$TARGETARCH go build -o /out/myapp .

FROM alpine
COPY --from=build /out/myapp /bin

Now when the two go build processes run, they already have access to the pre-pulled dependencies. We also copied only the go.mod and go.sum files before downloading the packages, so that changes to our regular source code don't invalidate the cache for the module downloads.

For completeness, let's also include cache mounts in our Dockerfile. RUN --mount options allow exposing new mountpoints to the command, which may be used for accessing your source code, build secrets, and temporary and cache directories. Cache mounts create persistent directories where you can write application-specific cache files that reappear the next time you invoke the builder. This results in big performance gains when you are doing incremental builds after making changes to your source code.

In Go, the directories that you want to turn into cache mounts are /root/.cache/go-build and /go/pkg. The first is the default location of the Go build cache, and the second is where go mod downloads modules. This assumes your user is root and GOPATH is /go.

RUN --mount=type=cache,target=/root/.cache/go-build \
    --mount=type=cache,target=/go/pkg \
    GOOS=$TARGETOS GOARCH=$TARGETARCH go build -o /out/myapp .

You can also use a type=bind mount (the default type) to mount in your source code. This helps avoid the overhead of actually copying the files with COPY. When cross-compiling in a Dockerfile, this is sometimes especially important if you don't want to copy your source before defining ARG TARGETPLATFORM, as changes in the source code would invalidate the cache for your target-specific dependencies. Note that type=bind mounts are mounted read-only by default. If the commands you are running need to write files to your source code, you might still want to use COPY or set the rw option for the mount.
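
For example, here is a sketch of the same build step using a writable bind mount instead of COPY (writes are allowed during the step but discarded afterward, so they don't modify your local source):

RUN --mount=target=.,rw \
    GOOS=$TARGETOS GOARCH=$TARGETARCH go build -o /out/myapp .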

This leads to our complete, fully-optimized cross-compiling Go Dockerfile:

FROM --platform=$BUILDPLATFORM golang:1.17-alpine AS build
WORKDIR /src
ARG TARGETOS TARGETARCH
RUN --mount=target=. \
    --mount=type=cache,target=/root/.cache/go-build \
    --mount=type=cache,target=/go/pkg \
    GOOS=$TARGETOS GOARCH=$TARGETARCH go build -o /out/myapp .

FROM alpine
COPY --from=build /out/myapp /bin

As an example, I measured how much time it takes to build the Docker CLI binary with the multi-stage Dockerfile we started with and then the one with the optimizations applied. As you can see, the difference is quite drastic.

https://github.com/docker/cli build time with test Dockerfiles (seconds, lower is better)

For the initial build, targeting only our native architecture, the difference is minimal: only a small change from not needing to run the COPY instruction. But when we build an image for both ARM and x86 CPUs, the difference is huge. For our new Dockerfile, doubling the architectures increases the build time by only 70% (because some parts of the builds are shared), while with the second architecture built under QEMU emulation, the build time is almost seven times longer.

With the additional help of the cache mounts we added, incremental rebuilds after a Go source code change reach a ridiculous 100x speed improvement compared to the old Dockerfile.
