Cheap Docker images with Nix (lethalman.blogspot.com)
171 points by Lethalman on April 15, 2016 | 50 comments



Why isn't he using the Alpine-based Redis image when comparing final image sizes?

It's unfair to say the official Redis image is 177 MB because the Alpine version is available on the Docker Hub[1] and it's only 15.95 MB.

Alpine is pretty awesome if your main goal is to shrink images without any effort[2].

[1] https://hub.docker.com/_/redis/

[2] http://nickjanetakis.com/blog/alpine-based-docker-images-mak...


You are right. It's unfair, I'm going to modify the post to mention alpine.

The post is both about tiny images AND how to build docker images with Nix. I think it was interesting tooling to build for our community. Some Nix people are already using it for obvious reasons.


I commented on the original post, but actually I believe the strength of the Nix-based approach is to provide package management capabilities outside of the target image. In other words, it makes the "scratch" image actually usable.

So now, if you take an Alpine-like approach to the problem (musl, no extra stuff) in Nix, you can get much smaller images. And the reason is you don't have to pay for the limitations of the Dockerfile-based approach.

As a proof of concept, here's an extension of the Nix recipe to produce a 1.2MB redis image: https://gist.github.com/sigma/9887c299da60955734f0fff6e2faee...

Now, the numbers start getting a little bit meaningless (although that's still an order of magnitude...), but the point is that regardless of how great Alpine is (and it is definitely great), as a base image for a Dockerfile it'll always contain way too much stuff compared to what's really needed for the application itself.
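
For readers who haven't seen the Nix tooling being discussed, a plain (non-musl) dockerTools.buildImage call looks roughly like the sketch below. The file name, image name and Cmd path are just illustrative, not taken from the gist, which goes further (musl, trimmed closure) to reach 1.2MB.

    # Sketch: build a Redis image with Nix and load it into Docker.
    cat > redis-image.nix <<'EOF'
    { pkgs ? import <nixpkgs> {} }:

    pkgs.dockerTools.buildImage {
      name = "redis-nix";
      contents = [ pkgs.redis ];          # only redis and its closure, no distro
      config = {
        Cmd = [ "/bin/redis-server" ];
        ExposedPorts = { "6379/tcp" = {}; };
      };
    }
    EOF

    nix-build redis-image.nix             # builds an image tarball at ./result
    docker load < result                  # now visible in 'docker images'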


Agreed, and alpine is the official Docker base distro at this time. I've built some very small images for haproxy and iperf with it.


A common problem with Docker is that after running a compiler/preprocessor during an image build, one ends up with the build tools inside the image. A workaround is to use a shell script that first gets the image with the compiler, runs it, and then passes the output to the Docker build. But this is non-standard and encourages running random scripts on the build host, defeating the advantage of using Docker during development to isolate the build system. It is nice that Nix addresses this.


Or you could remove the tools at the end of your Dockerfile...

Edit: am I missing something? This is a legitimate solution to the problem. Install the tools, compile, and remove them. The parent is suggesting a very clumsy approach (build on the host and pass the binary to the container as it's being built).


(I didn't downvote you)

Disclaimer: I work at Docker.

Your approach is the logical one... But Docker currently has a limitation in how it handles removing files in a build. After each build step, the intermediary state is committed as a layer, just like a git commit. So removing files in a docker build is like removing files in git: they are still taking up space in the history.

The long-term solution is to support image "squashing" or "flattening" in docker-build.

A less clumsy short-term solution is to build a Docker image of the build environment; then 'docker run' that image to produce the final artifact. At least that way you get rid of the dependency on the host, which keeps your build more portable (if not as convenient as a single 'docker build').
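
Roughly, that two-step flow looks like this (the image names and Dockerfile names below are made up for illustration):

    # 1. Build-environment image: compilers, headers, etc.
    docker build -t myapp-build -f Dockerfile.build .

    # 2. Run it to produce the artifact on a mounted volume instead of in a layer.
    docker run --rm -v "$PWD/out:/out" myapp-build \
        sh -c 'make && cp myapp /out/'

    # 3. The runtime Dockerfile only COPYs the prebuilt binary onto a small base.
    docker build -t myapp -f Dockerfile.runtime .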


Our approach is to view Docker as part of our overall development process and then develop stage-specific containers.

For example, we have development containers, build containers and runtime containers. Runtime containers are further segmented into product demo containers, testing containers and production containers.

I just published a new article on Docker this morning: http://www.dynomitedb.com/blog/2016/04/13/docker-containers/

An important point is that the build containers produce binaries that are used in both native package managers (ex. apt) and in Docker containers.

If you're interested in seeing this in action then checkout our source on GH: https://github.com/DynomiteDB

IMHO, a well-designed approach to UnionFS layers is vital to high-quality container architecture.

While we're focused on container use for databases (both in-memory and on-disk), much of our approach applies equally well to application layer containers.


Nice reference to https://www.projectcalico.org. At some point, the insanity of using Ethernet on top of UDP to carry IP traffic between containers must stop.


Straight from the horse's mouth--admire your product, Mr. Hykes!

I love how you can run docker inside of a container. What I've done sometimes is run docker inside my build environment container. I use Docker Machine (OSX), so I just send the same machine environment variables over to the container, but on Linux you could just link the socket file. In fact, I have a container just for Google Cloud that maintains my GKE config and makes it easy for me to prepare new deployments to the cloud.
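
For anyone wondering what "link the socket file" means in practice, the usual Linux pattern is to bind-mount the daemon socket; the image name here is just an example and needs a docker client installed inside:

    docker run --rm -it \
        -v /var/run/docker.sock:/var/run/docker.sock \
        my-build-env \
        docker ps   # talks to the host daemon, not a nested one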


Could you elaborate on this process of deploying your container from inside docker via socket linking? I'm not sure I follow.


> But Docker currently has a limitation in how it handles removing files in a build. After each build step, the intermediary state is committed as a layer, just like a git commit. So removing files in a docker build is like removing files in git: they are still taking up space in the history.

That's true unless you do the "everything in a single RUN statement" trick that is very popular.


"Very popular" for the one case of installing things via package manager.


The proper solution is to build packages first using a build environment, then install the built binary packages in the container, like any other package.


There's also another workaround to the ones mentioned by other posters: You could install your compilers, do your build, and clean up all your build tools within just one shell script invoked by a RUN line in Dockerfile. It's not pretty but it works.
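
A sketch of that single-RUN pattern (the base image, packages and Makefile are only illustrative):

    cat > Dockerfile <<'EOF'
    FROM debian:jessie
    COPY . /usr/src/app
    # One layer: install the toolchain, compile, then remove the toolchain again
    # so it never survives into a committed layer.
    RUN apt-get update \
     && apt-get install -y --no-install-recommends gcc make libc6-dev \
     && make -C /usr/src/app \
     && apt-get purge -y --auto-remove gcc make libc6-dev \
     && rm -rf /var/lib/apt/lists/*
    EOF
    docker build -t myapp .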


Yes that is a common workaround as well.

By the way: there is an open request for contributions to help improve this. The core Docker team very much wants to improve squashing in build, but it's a matter of time and resources.

If somebody cares enough to take the time to put together a design proposal and then a patch, we would be happy to support that effort!


The idea is not to build anything on the host. Rather, it is more like a staged build initiated through a shell script: first pull/build an image with the compiler, then docker run it to compile the application, and finally build the final image with the application.


The layered fs that Docker uses is based on additive snapshots, so removing tools at the end will paradoxically increase your image size with a useless snapshot.


You can do everything within a single RUN statement to avoid that.


Yes, but at the expense of readability, development speed, and incremental updates to images (where typically the dependencies layer changes slower than your target code).


The idea is actually to do something similar to a heroku buildpack where you have a container with build tools that generates binary assets. You then inject the built binary into a new image that has only runtime dependencies installed.

I've experimented with (and use) a variant of this workflow myself, built around my marina tool [1]. The basic idea is to define a file that uses a dev/builder image to build, then exports a tarball into a runner image.

[1] https://pypi.python.org/pypi/marina


It is much easier to just use a standard package format (rpm/deb/apk/etc.) and a standard installer (yum/dnf/apt/apk/etc.). Of course, you can invent your own build and installation system; it will work too.


That depends on a lot of factors. The advantage of an approach like this is that every package is built in a clean-room container independent of the host. For example my host is os x and I'm building binary tarballs to run on ubuntu. If you have a build server obviously this is less of an issue.


Just build .debs instead of tarballs. Use "alien" to convert a .tgz into a .deb, for example. I see no reason to invent my own build system, package format and package management software. I've been building my rpm packages in a clean-room chroot (using mock) for about a decade. It works fine in Docker too.


Nix has been a very cool project to watch over the years.

You can address part of the problem of picking up extra data in final images by declaring temporary build locations, such as `/var/lib/cache`, as a volume. Anything written to a volume won't be included in the final image.
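
A sketch of that volume trick (the build command and cache path are illustrative; note it only applies to writes made after the VOLUME line):

    cat > Dockerfile <<'EOF'
    FROM debian:jessie
    VOLUME /var/lib/cache
    # Data written under /var/lib/cache from here on lands in an anonymous
    # volume and is discarded, rather than committed as an image layer.
    RUN my-build-step --cache-dir /var/lib/cache
    EOF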


If the goal is solely Docker images with a standard size in the 20-40MB range, this can be achieved without additional tooling. After switching our development and deployment flow to docker, my team quickly tired of the 200-400MB images that seemed to be accepted as commonplace. We started basing our containers on alpine (essentially, busybox with a package manager) or alpine derivatives, and dropped into that target size immediately. Spinning up 8-10 microservices locally for a full frontend development stack is a shockingly better experience when that involves a 200MB download rather than a 2GB one.

This is in no way a negative commentary on Nix; it looks like an interesting solution to a well-known problem.


Same here! Switching to Alpine for most services was essentially painless. To go a step further, the images with binaries that have no dependencies (mostly programs written in Go) use scratch Docker images. This way we get 5MB images, where the size overhead of Docker is nothing.
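
For reference, the scratch pattern for a static Go binary is roughly this (the program and image names are just examples):

    # Fully static binary: no cgo, so no libc needed in the image.
    CGO_ENABLED=0 GOOS=linux go build -o app .

    cat > Dockerfile <<'EOF'
    FROM scratch
    COPY app /app
    ENTRYPOINT ["/app"]
    EOF

    docker build -t myservice .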


I have found that images with a single executable and perhaps /etc/passwd, with no other files, prevent you from using docker exec as a valuable debugging/poking tool. My preference is to have a single image with all the services and basic tools included, and to use it to run all the containers on the machine.


Once https://github.com/NixOS/nixpkgs/pull/14711 is merged, the images might also be binary deterministic (depending on what packages you use).


Already merged :)


@Lethalman: Can you expound on this?

> Be aware that things like PAM configurations, or other stuff, created to be suitable for Debian may not work with Nix programs that use a different glibc.

So this would not be a factor using the method which has no base? The Debian base approach seems like a non-starter if negative emergent behaviors like PAM config mismatches are common.

Also, to be sure, I can do this "no base Docker build" using Nix on let's say CentOS 7? Meaning, I'm not required to use NixOS natively?

I plan to read the post more closely later today, so feel free to ignore these questions if they are answered in the post, but I usually don't post to HN from my work computer, so I thought I'd get my questions out here early in case the thread drops and I forget to ask. :-)

Nice work!


Yes, AFAIK you can build docker containers with Nix on any Linux machine with the nix package manager installed. You can even build them on OSX if you configure your OSX Nix install with a remote build machine that runs Linux (yes, Nix can automatically and transparently distribute your builds).


Haha, very good.

So Nix makes even the use of Docker better, while some Nix user here claimed that you don't even need Docker if you're using Nix(OS).


You don't, in fact. But sometimes you are forced to use Docker anyway.


I installed emacs with nix last weekend to use with spacemacs. I got three errors at startup that didn't make sense, but I had a feeling that nix was the problem, so I rolled back and reinstalled with brew. It worked perfectly. Nix has some great ideas, but when the install of emacs took 10 times as long as homebrew and then didn't work correctly, it didn't leave me wanting to use it for anything work-related like my docker images.

Hopefully, with time, their planned CLI improvements, and binary caching, it will be a contender, but that feels a ways off at this point.


> Nix has some great ideas, but when the install of emacs took 10 times as long as homebrew and then didn't work correctly, it didn't leave me wanting to use it for anything work-related like my docker images.

I understand it's not fun to watch a long build process finish only to have the final product not work, but that assessment is not really fair. Nix itself is not the problem there; the problem is in the definition of that package. The same thing could just as easily occur in a package defined in homebrew. Saying that nix itself is at fault when an individual package (among hundreds of thousands) is faulty is sort of akin to encountering a buggy program and equating it to a bug in the language the program was written in.

I'd also add that due to the closed-source nature of OSX core libraries, it's hard to achieve the same degree of robust determinism on OSX that nix allows on other platforms. Fortunately things tend to be very solid on Linux. My company has been using Nix in production for about a year now and it has been a huge benefit to platform stability and speed of deployment.


Nix does have binary caching (but only if you use `/nix/store` for the Nix store path); I'm using it all the time on Linux. OSX support is definitely much less stable though, and there have been problems with build machines for OSX in the past, so I don't know what the current situation on OSX is regarding binary caches.

Regarding your point about build time, I'm not sure why installing with nix would make that much of a difference. Nix still just executes the build system of the underlying packages. However, nix might have to build more packages since it doesn't rely on the underlying system to provide dependencies, which helps make it more robust. Additionally, emacs has various build configurations, which might also require different dependencies, affecting build time.


If there are any bug reports you could file, that would be much appreciated. Installing Emacs should just be a case of downloading the prebuilt binaries, which shouldn't take long at all. I use Spacemacs and have never encountered any problems. I have a feeling things are still a lot worse on OSX though...


I really like the Nix package manager; however, is there an upside to using Nix to build a Docker image over just writing a regular Dockerfile? Is this an odd use case? Maybe it's just for demo purposes? Is there a benefit I'm overlooking?


For reference, I left a comment further down:

https://news.ycombinator.com/item?id=11509065


Interesting. Not being that familiar with Nix(OS), how much of a moving target is Nix? Can you do these kinds of things with stable versions or do you need to keep up with HEAD?


The blog post code is supposed to run with the latest master as of a few days ago. We've merged a big change that greatly reduces the closure size of our packages.

Nix moves fast enough, in the sense that we usually do a good job of not breaking things. Yet we necessarily have to introduce innovations in our frameworks.


Now all we need is a non-Docker image push and I can remove docker-in-docker from my build system.


I'm wondering what the advantage would be of using Nix versus building on Alpine Linux with a good understanding of how Docker layers work. My main reason to be skeptical about Nix is the need to learn a new single-purpose language as opposed to just using shell like you do in Dockerfiles.


Both the Nix language and the Dockerfile language involve embedding shell scripts. Let's not pretend that Docker doesn't have its own DSL to learn. Dockerfiles are imperative, whereas Nix is declarative and functional, which is a big improvement.


I see Docker's DSL as a rather thin layer of abstraction as compared with Nix. Re: declarative vs. functional, I believe this does not matter all that much in containerland. As long as I can get sh*t done deterministically, I could not care less about the programming paradigm that got me there.


>As long as I can get sh*t done deterministically

Docker is non-deterministic. If you and I build the same image, we are not going to get the same result. See https://reproducible-builds.org for more information on the subject.


This is the same thing I was asking. As much as I like the idea of a declarative, functional package manager, what value does it provide if you are just building docker images?


Here's a copy of a comment I left on the post:

1. Better abstraction (e.g. the example of a function that produces docker images; see the sketch after this list).

2. The Hydra build/CI server obviates the need to pay for (or administer a self-hosted) docker registry, and avoids the imperative push-and-pull model. Because a docker image is just another Nix package, you get distributed building, caching and signing for free.

3. Because Nix caches intermediate package builds, building a Docker image via Nix will likely be faster than letting Docker do it.

4. Determinism. With Docker, you're not guaranteed that you'll build the same image across two machines (imagine the state of package repositories changing -- it's trivial to find different versions of packages across two builds of the same Dockerfile). With Nix, you're guaranteed that you have the same determinism that any other Nix package has (e.g. everything builds in a chroot without network access (unless you provide a hash of the result, for e.g. tarball downloads))
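
To illustrate point 1, here is a sketch of such a function (the helper name and package choices are made up, not from the post):

    cat > images.nix <<'EOF'
    { pkgs ? import <nixpkgs> {} }:

    let
      # Illustrative helper: wrap any package as a Docker image.
      mkServiceImage = { name, pkg, cmd }:
        pkgs.dockerTools.buildImage {
          inherit name;
          contents = [ pkg ];
          config.Cmd = cmd;
        };
    in {
      redis     = mkServiceImage { name = "redis";     pkg = pkgs.redis;     cmd = [ "/bin/redis-server" ]; };
      memcached = mkServiceImage { name = "memcached"; pkg = pkgs.memcached; cmd = [ "/bin/memcached" ]; };
    }
    EOF

    nix-build images.nix -A redis   # each attribute builds (and is cached) independently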


What docker build extensions would make it possible to do this without 'rolling your own' tarball docker layer? Having access to volumes at build time?



