Cheap Docker images with Nix (lethalman.blogspot.com)
171 points by Lethalman on April 15, 2016 | 50 comments



Why isn't he using the Alpine-based Redis image when comparing final image sizes?

It's unfair to say the official Redis image is 177 MB because the Alpine version is available on the Docker Hub[1] and it's only 15.95 MB.

Alpine is pretty awesome if your main goal is to shrink images without any effort[2].

[1] https://hub.docker.com/_/redis/

[2] http://nickjanetakis.com/blog/alpine-based-docker-images-mak...


You are right. It's unfair, I'm going to modify the post to mention alpine.

The post is both about tiny images AND how to build docker images with Nix. I think it was interesting tooling to build for our community. Some Nix people are already using it for obvious reasons.


I commented on the original post, but actually I believe the strength of the Nix-based approach is to provide package management capabilities outside of the target image. In other words, it makes the "scratch" image actually usable.

So now, if you take an Alpine-like approach to the problem (musl, no extra stuff) in Nix, you can get much smaller images. And the reason is you don't have to pay for the limitations of the Dockerfile-based approach.

As a proof of concept, here's an extension of the Nix recipe to produce a 1.2MB redis image: https://gist.github.com/sigma/9887c299da60955734f0fff6e2faee...

Now, the numbers start getting a little bit meaningless (although that's still an order of magnitude...), but the point is that regardless of how great Alpine is (and it is definitely great), as a base image for a Dockerfile it'll always contain way too much stuff compared to what's really needed for the application itself.
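
For readers who haven't seen the Nix tooling being discussed, a plain (non-musl) dockerTools.buildImage call looks roughly like the sketch below. The file name, image name and Cmd path are just illustrative, not taken from the gist, which goes further (musl, trimmed closure) to reach 1.2MB.

    # Sketch: build a Redis image with Nix and load it into Docker.
    cat > redis-image.nix <<'EOF'
    { pkgs ? import <nixpkgs> {} }:

    pkgs.dockerTools.buildImage {
      name = "redis-nix";
      contents = [ pkgs.redis ];          # only redis and its closure, no distro
      config = {
        Cmd = [ "/bin/redis-server" ];
        ExposedPorts = { "6379/tcp" = {}; };
      };
    }
    EOF

    nix-build redis-image.nix             # builds an image tarball at ./result
    docker load < result                  # now visible in 'docker images'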


Agreed, and alpine is the official Docker base distro at this time. I've built some very small images for haproxy and iperf with it.


A common problem with Docker is that after running a compiler/preprocessor during an image build, one ends up with the build tools inside the image. A workaround is to use a shell script that first gets the image with the compiler, runs it, and then passes the output to the Docker build. But this is non-standard and encourages running random scripts on the build host, defeating the advantage of using Docker during development to isolate the build system. It is nice that Nix addresses this.


Or you could remove the tools at the end of your Dockerfile...

Edit: am I missing something? This is a legitimate solution to the problem. Install the tools, compile, and remove them. The parent is suggesting a very clumsy approach (build on the host and pass the binary to the container as it's being built).


(I didn't downvote you)

Disclaimer: I work at Docker.

Your approach is the logical one... But Docker currently has a limitation in how it handles removing files in a build. After each build step, the intermediary state is committed as a layer, just like a git commit. So removing files in a docker build is like removing files in git: they are still taking up space in the history.

The long-term solution is to support image "squashing" or "flattening" in docker-build.

A less clumsy short-term solution is to build a Docker image of the build environment; then 'docker run' that image to produce the final artifact. At least that way you get rid of the dependency on the host, which keeps your build more portable (if not as convenient as a single 'docker build').
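
Roughly, that two-step flow looks like this (the image names and Dockerfile names below are made up for illustration):

    # 1. Build-environment image: compilers, headers, etc.
    docker build -t myapp-build -f Dockerfile.build .

    # 2. Run it to produce the artifact on a mounted volume instead of in a layer.
    docker run --rm -v "$PWD/out:/out" myapp-build \
        sh -c 'make && cp myapp /out/'

    # 3. The runtime Dockerfile only COPYs the prebuilt binary onto a small base.
    docker build -t myapp -f Dockerfile.runtime .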


Our approach is to view Docker as part of our overall development process and then develop stage-specific containers.

For example, we have development containers, build containers and runtime containers. Runtime containers are further segmented into product demo containers, testing containers and production containers.

I just published a new article on Docker this morning: http://www.dynomitedb.com/blog/2016/04/13/docker-containers/

An important point is that the build containers produce binaries that are used in both native package managers (ex. apt) and in Docker containers.

If you're interested in seeing this in action then checkout our source on GH: https://github.com/DynomiteDB

IMHO, a well-designed approach to UnionFS layers is vital to high-quality container architecture.

While we're focused on container use for databases (both in-memory and on-disk), much of our approach applies equally well to application layer containers.


Nice reference to https://www.projectcalico.org. At some point, the insanity of using Ethernet on top of UDP to carry IP traffic between containers must stop.


Straight from the horse's mouth--admire your product, Mr. Hykes!

I love how you can run docker inside of a container. What I've done sometimes is run docker inside my build environment container. I use Docker Machine (OSX), so I just send the same machine environment variables over to the container, but on Linux you could just link the socket file. In fact, I have a container just for Google Cloud that maintains my GKE config and makes it easy for me to prepare new deployments to the cloud.
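
For anyone wondering what "link the socket file" means in practice, the usual Linux pattern is to bind-mount the daemon socket; the image name here is just an example and needs a docker client installed inside:

    docker run --rm -it \
        -v /var/run/docker.sock:/var/run/docker.sock \
        my-build-env \
        docker ps   # talks to the host daemon, not a nested one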


Could you elaborate on this process of deploying your container from inside docker via socket linking? I'm not sure I follow.


> But Docker currently has a limitation in how it handles removing files in a build. After each build step, the intermediary state is committed as a layer, just like a git commit. So removing files in a docker build is like removing files in git: they are still taking up space in the history.

That's true unless you do the "everything in a single RUN statement" trick that is very popular.


"Very popular" for the one case of installing things via package manager.


The proper solution is to build packages first using a build environment, then install the built binary packages in the container, like any other package.


There's also another workaround to the ones mentioned by other posters: You could install your compilers, do your build, and clean up all your build tools within just one shell script invoked by a RUN line in Dockerfile. It's not pretty but it works.
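
A sketch of that single-RUN pattern (the base image, packages and Makefile are only illustrative):

    cat > Dockerfile <<'EOF'
    FROM debian:jessie
    COPY . /usr/src/app
    # One layer: install the toolchain, compile, then remove the toolchain again
    # so it never survives into a committed layer.
    RUN apt-get update \
     && apt-get install -y --no-install-recommends gcc make libc6-dev \
     && make -C /usr/src/app \
     && apt-get purge -y --auto-remove gcc make libc6-dev \
     && rm -rf /var/lib/apt/lists/*
    EOF
    docker build -t myapp .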


Yes that is a common workaround as well.

By the way: there is an open request for contributions to help improve this. The core Docker team very much wants to improve squashing in build, but it's a matter of time and resources.

If somebody cares enough to take the time to put together a design proposal and then a patch, we would be happy to support that effort!


The idea is not to build anything on the host. Rather, it is more like a staged build initiated through a shell script: first pull/build an image with the compiler, then docker run it to compile the application, and finally build the final image with the application.


The layered fs that Docker uses is based on additive snapshots, so removing tools at the end will paradoxically increase your image size with a useless snapshot.


You can do everything within a single RUN statement to avoid that.


Yes, but at the expense of readability, development speed, and incremental updates to images (where typically the dependencies layer changes slower than your target code).


The idea is actually to do something similar to a heroku buildpack where you have a container with build tools that generates binary assets. You then inject the built binary into a new image that has only runtime dependencies installed.

I've experimented with (and use) a variant of this workflow myself, built around my marina tool [1]. The basic idea is to define a file that uses a dev/builder image to build, then exports a tarball into a runner image.

[1] https://pypi.python.org/pypi/marina


It is much easier to just use a standard package format (rpm/deb/apk/etc.) and a standard installer (yum/dnf/apt/apk/etc.). Of course, you can invent your own build and installation system; it will work too.


That depends on a lot of factors. The advantage of an approach like this is that every package is built in a clean-room container independent of the host. For example my host is os x and I'm building binary tarballs to run on ubuntu. If you have a build server obviously this is less of an issue.


Just build .debs instead of tarballs. Use "alien" to convert a .tgz into a .deb, for example. I see no reason to invent my own build system, package format and package management software. I've been building my rpm packages in a clean-room chroot (using mock) for about a decade. It works fine in Docker too.


Nix has been a very cool project to watch over the years.

You can address part of the problem of picking up extra data in final images by declaring temporary build locations, such as `/var/lib/cache`, as a volume. Anything written to a volume won't be included in the final image.
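
A sketch of that volume trick (the build command and cache path are illustrative; note it only applies to writes made after the VOLUME line):

    cat > Dockerfile <<'EOF'
    FROM debian:jessie
    VOLUME /var/lib/cache
    # Data written under /var/lib/cache from here on lands in an anonymous
    # volume and is discarded, rather than committed as an image layer.
    RUN my-build-step --cache-dir /var/lib/cache
    EOF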


If the goal is solely Docker images with a standard size in the 20-40MB range, this can be achieved without additional tooling. After switching our development and deployment flow to docker, my team quickly tired of the 200-400MB images that seemed to be accepted as commonplace. We started basing our containers on alpine (essentially, busybox with a package manager) or alpine derivatives, and dropped into that target size immediately. Spinning up 8-10 microservices locally for a full frontend development stack is a shockingly better experience when that involves a 200MB download rather than a 2GB one.

This is in no way a negative commentary on Nix; it looks like an interesting solution to a well-known problem.


Same here! Switching to Alpine for most services was essentially painless. To go a step further, the images with binaries that have no dependencies (mostly programs written in Go) use scratch Docker images. This way we get 5MB images, where the size overhead of Docker is nothing.
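
For reference, the scratch pattern for a static Go binary is roughly this (the program and image names are just examples):

    # Fully static binary: no cgo, so no libc needed in the image.
    CGO_ENABLED=0 GOOS=linux go build -o app .

    cat > Dockerfile <<'EOF'
    FROM scratch
    COPY app /app
    ENTRYPOINT ["/app"]
    EOF

    docker build -t myservice .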


I have found that images with a single executable and perhaps /etc/passwd, with no other files, prevent you from using docker exec as a valuable debugging/poking tool. My preference is to have a single image with all the services and basic tools included, and to use it to run all the containers on the machine.


Once https://github.com/NixOS/nixpkgs/pull/14711 is merged, the images might also be binary deterministic (depending on what packages you use).


Already merged :)


@Lethalman: Can you expound on this?

> Be aware that things like PAM configurations, or other stuff, created to be suitable for Debian may not work with Nix programs that use a different glibc.

So this would not be a factor using the method which has no base? The Debian base approach seems like a non-starter if negative emergent behaviors like PAM config mismatches are common.

Also, to be sure, I can do this "no base Docker build" using Nix on let's say CentOS 7? Meaning, I'm not required to use NixOS natively?

I plan to read the post more closely later today, so feel free to ignore these questions if they are answered in the post, but I usually don't post to HN from my work computer, so I thought I'd get my questions out here early in case the thread drops and I forget to ask. :-)

Nice work!


Yes, AFAIK you can build docker containers with Nix on any Linux machine with the nix package manager installed. You can even build them on OSX if you configure your OSX Nix install with a remote build machine that runs Linux (yes, Nix can automatically and transparently distribute your builds).


Haha, very good.

So Nix makes even the use of Docker better, while some Nix user here claimed that you don't even need Docker if you're using Nix(OS).


You don't, in fact. But sometimes you are forced to use Docker anyway.


I installed emacs with nix last weekend to use with spacemacs. I got three errors at startup that didn't make sense, but I had a feeling that nix was the problem, so I rolled back and reinstalled with brew. It worked perfectly. Nix has some great ideas, but when the install of emacs took 10 times as long as homebrew and then didn't work correctly, it didn't leave me wanting to use it for anything work-related like my docker images.

Hopefully, with time, their planned CLI improvements, and binary caching, it will be a contender, but that feels a ways off at this point.


> Nix has some great ideas, but when the install of emacs took 10 times as long as homebrew and then didn't work correctly, it didn't leave me wanting to use it for anything work-related like my docker images.

I understand it's not fun to watch a long build process finish only to have the final product not work, but that assessment is not really fair. Nix itself is not the problem there; the problem is in the definition of that package. The same thing could just as easily occur in a package defined in homebrew. Saying that nix itself is at fault when an individual package (among hundreds of thousands) is faulty is sort of akin to encountering a buggy program and equating it to a bug in the language the program was written in.

I'd also add that due to the closed-source nature of OSX core libraries, it's hard to achieve the same degree of robust determinism on OSX that nix allows on other platforms. Fortunately things tend to be very solid on Linux. My company has been using Nix in production for about a year now and it has been a huge benefit to platform stability and speed of deployment.


Nix does have binary caching (but only if you use `/nix/store` for the Nix store path); I'm using it all the time on Linux. OSX support is definitely much less stable though, and there have been problems with build machines for OSX in the past, so I don't know what the current situation on OSX is regarding binary caches.

Regarding your point about build time, I'm not sure why installing with nix would make that much of a difference. Nix still just executes the build system of the underlying packages. However, nix might have to build more packages since it doesn't rely on the underlying system to provide dependencies, which helps make it more robust. Additionally, emacs has various build configurations, which might also require different dependencies, affecting build time.


If there are any bug reports you could file, that would be much appreciated. Installing Emacs should just be a case of downloading the prebuilt binaries, which shouldn't take long at all. I use Spacemacs and have never encountered any problems. I have a feeling things are still a lot worse on OSX though...


I really like the Nix package manager; however, is there an upside to using Nix to build a Docker image over just writing a regular Dockerfile? Is this an odd use case? Maybe it's just for demo purposes? Is there a benefit I'm overlooking?


For reference, I left a comment further down:

https://news.ycombinator.com/item?id=11509065


Interesting. Not being that familiar with Nix(OS), how much of a moving target is Nix? Can you do these kinds of things with stable versions or do you need to keep up with HEAD?


The blog post code is supposed to run with the latest master as of a few days ago. We've merged a big change that greatly reduces the closure size of our packages.

Nix moves fast enough, in the sense that we usually do a good job of not breaking things. Yet we necessarily have to introduce innovations in our frameworks.


Now all we need is a non-Docker image push and I can remove docker-in-docker from my build system.


I'm wondering what the advantage would be of using Nix versus building on Alpine Linux with a good understanding of how Docker layers work. My main reason to be skeptical about Nix is the need to learn a new single-purpose language as opposed to just using shell like you do in Dockerfiles.


Both the Nix language and the Dockerfile language involve embedding shell scripts. Let's not pretend that Docker doesn't have its own DSL to learn. Dockerfiles are imperative, whereas Nix is declarative and functional, which is a big improvement.


I see Docker's DSL as a rather thin layer of abstraction as compared with Nix. Re: declarative vs. functional, I believe this does not matter all that much in containerland. As long as I can get sh*t done deterministically, I could not care less about the programming paradigm that got me there.


>As long as I can get sh*t done deterministically

Docker is non-deterministic. If you and I build the same image, we are not going to get the same result. See https://reproducible-builds.org for more information on the subject.


This is the same thing I was asking. As much as I like the idea of a declarative, functional package manager, what value does it provide if you are just building docker images?


Here's a copy of a comment I left on the post:

1. Better abstraction (e.g. the example of a function that produces docker images; see the sketch after this list).

2. The Hydra build/CI server obviates the need to pay for (or administer a self-hosted) docker registry, and avoids the imperative push-and-pull model. Because a docker image is just another Nix package, you get distributed building, caching and signing for free.

3. Because Nix caches intermediate package builds, building a Docker image via Nix will likely be faster than letting Docker do it.

4. Determinism. With Docker, you're not guaranteed that you'll build the same image across two machines (imagine the state of package repositories changing -- it's trivial to find different versions of packages across two builds of the same Dockerfile). With Nix, you're guaranteed that you have the same determinism that any other Nix package has (e.g. everything builds in a chroot without network access (unless you provide a hash of the result, for e.g. tarball downloads))
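
To illustrate point 1, here is a sketch of such a function (the helper name and package choices are made up, not from the post):

    cat > images.nix <<'EOF'
    { pkgs ? import <nixpkgs> {} }:

    let
      # Illustrative helper: wrap any package as a Docker image.
      mkServiceImage = { name, pkg, cmd }:
        pkgs.dockerTools.buildImage {
          inherit name;
          contents = [ pkg ];
          config.Cmd = cmd;
        };
    in {
      redis     = mkServiceImage { name = "redis";     pkg = pkgs.redis;     cmd = [ "/bin/redis-server" ]; };
      memcached = mkServiceImage { name = "memcached"; pkg = pkgs.memcached; cmd = [ "/bin/memcached" ]; };
    }
    EOF

    nix-build images.nix -A redis   # each attribute builds (and is cached) independently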


What docker build extensions would make it possible to do this without 'rolling your own' tarball docker layer? Having access to volumes at build time?



