Docker Development Patterns (hokstad.com)
251 points by ingve on Oct 22, 2014 | 30 comments



To be honest, I'm very much against this approach to containerized development.

The project should have a single Dockerfile from which dev/test/staging/production all work.

There are certainly things you may want to change in these envs (particularly dev and test), but these can be runtime configurations (through env vars or by changing the command that gets run in the container).

When I do dev, if I need to work interactively I build from my main Dockerfile and do `docker run -v $(pwd):/opt/myapp myapp bash`, which mounts my live code into the container, shadowing the code that was built in. I'll mess around with some things, exit, and adjust my Dockerfile if needed based on those changes.

To run tests (as an example): `docker run -e RAILS_ENV=test -v $(pwd):/opt/myapp myapp rake test`. If all is well and I'm ready to push my code up, I go ahead and rebuild the image with the changed code, run the tests again, etc.

Sometimes it is necessary to have tools like gdb, strace, etc. available within the container. For this you can start the container as you normally would, then (with Docker 1.3) `docker exec -it <container> /bin/bash`, install the tools you need to debug something, and get out.
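For example (a rough sketch, assuming a Debian/Ubuntu-based image and a container named `myapp`):

    # attach a shell to the already-running container (Docker 1.3+)
    docker exec -it myapp /bin/bash
    # inside the container: pull in the debugging tools temporarily
    apt-get update && apt-get install -y gdb strace
    # ...debug, then exit; the tools are gone once the container is recreated from the image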

In general it is just going to be bad practice to have a dev environment significantly different from the production env, and that is indeed why dev handoff to production is often fraught with trouble.


> The project should have a single Dockerfile from which dev/test/staging/production all work.

I largely disagree with this, though I think that's because we actually substantially agree about the need to test regularly, during development, in an environment identical to staging/production. I DO agree that there should be a single Dockerfile to be used as the basis, and I DO agree that you should regularly run that Dockerfile unmodified in dev/test/staging/production.

But part of the point is that there is no reason to limit yourself to a single dev instance of your app. I appreciate that might have been somewhat obscured by some of my examples, which use as their base a separate dev container setup that I rely on for some projects, including as a base for the production container for my blog.

So to reiterate, I absolutely agree with having a Dockerfile identical to your live environment that you can bring up an instance of during development.

But that is not a reason to not _also_ have a container that brings it up with a code reloader, or a container that lets you test in different environments. And it certainly is not a reason to avoid isolating dev/build dependencies into yet other containers.

I actually much prefer that to the "Rails way" of having separate environments "built into" the app, and that's probably part of the difference. Instead of doing that, I use "FROM" in Dockerfiles to "inherit" from bases to provide me with different "lenses" to work on the project from different viewpoints. Not just dev vs. live, but also with debugging tools, or architecture differences etc.
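As a sketch of what I mean (the image and command names here are just illustrative, not taken from the article):

    # Dockerfile.dev - one "lens" on the same app, inheriting the pristine base
    FROM myapp-base
    # add dev-only conveniences on top of the unmodified base image
    RUN gem install rerun
    # run the app under a code reloader instead of the production command
    CMD ["rerun", "bundle exec rackup"]

The production Dockerfile keeps its own CMD; the dev image just layers a different entry point on top of the same base.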

> Sometimes it is necessary to have tools like gdb, strace, etc. available within the container. For this you can start the container as you normally would, then (with Docker 1.3) `docker exec -it <container> /bin/bash`, install the tools you need to debug something, and get out.

But there's no reason to keep redoing these steps when they can be encapsulated in a Dockerfile that inherits from your basic container. And you can easily run it in parallel with your "pristine" containers, or bring it up in seconds as needed.
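E.g. something like this (a minimal sketch, assuming a Debian-based base image and hypothetical names):

    # Dockerfile.debug - debugging "lens" layered on top of the pristine base
    FROM myapp-base
    # bake the debug tooling in once instead of re-installing it by hand
    RUN apt-get update && apt-get install -y gdb strace ltrace
    # drop into a shell by default; override on the command line when needed
    CMD ["/bin/bash"]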

> In general it is just going to be bad practice to have a dev environment significantly different from the production env, and that is indeed why dev handoff to production is often fraught with trouble.

We fully agree with this. It's a decade of frustration doing devops that has made me enjoy the opportunities Docker brings to layer situation-specific dependencies on top of a pristine base: It takes away any excuse for letting the basic setup diverge, or for being unable to run full tests on an environment identical to live.


The only difference between dev/stage/prod/what-have-you with Docker is the topology of the servers. For the (local) dev case, the topology is one box: an "all-in-one" system. This does not require you to have different Dockerfiles per environment; it just means you have to write Dockerfiles that don't care about the topology!
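E.g. (just a sketch; the variable and image names are made up) the image can take its wiring from the environment, so the same Dockerfile works whether everything runs on one box or is spread out:

    # dev: everything on one box
    docker run -e DB_HOST=localhost -e CACHE_HOST=localhost myapp
    # prod: the same image, pointed at dedicated hosts
    docker run -e DB_HOST=db1.internal -e CACHE_HOST=cache1.internal myapp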


I couldn't agree more.


Totally agree!


From a sideline viewer's perspective, persistence in production for Docker containers is a big problem that I've yet to see a good solution for. You end up having to keep DB servers and such outside of your container pool.


You don't. That's the point of volumes. You do need to be careful to ensure you mount volumes for everything that needs to persist, but in practice that's not a very onerous limitation.

Since volumes are bind-mounted from the outside, you can put the volumes on whatever storage pool you want that you can bring up on the host (as long as it meets your app's requirements, e.g. POSIX semantics, locking, etc.).

E.g. at work we have a private Docker repository that runs on top of GlusterFS volumes that are mounted on the host and then bind-mounted in as a volume in the container.

I also run my postgres instances inside Docker, with the data on persistent volumes.
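Roughly like this (the image, paths and names are just an example, not necessarily how any particular setup looks):

    # the data lives on host storage of your choosing; the container stays disposable
    docker run -d --name pg \
      -v /srv/pgdata:/var/lib/postgresql/data \
      postgres:9.3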

We could certainly use better tools to manage it, though.

Another pattern I ought to have mentioned (that I've used myself) is to set up "empty" containers whose only purpose is to act as a storage volume for another container. I don't like that as much, mostly since I've not had as much long-term experience with how the layering Docker uses would impact it.
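For reference, that pattern looks roughly like this (hypothetical names):

    # "empty" container whose only job is to own a volume
    docker run --name myapp-data -v /var/lib/myapp busybox true
    # other containers borrow the volume from it
    docker run --rm --volumes-from myapp-data myapp backup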


https://github.com/clusterhq/flocker seems to be doing some interesting stuff in that direction.


And we're working with the folks @ clusterhq to bring some of that into Docker.


Having persistent containers strikes me as the infrastructural equivalent of a code smell. I don't use in-container storage for anything that needs to be persisted at all, and I'm not sure why you would in a modern environment. Everything can fail and fail hard, and writing meaningful (i.e., volume'd) data to disk seems like asking for trouble. Ephemeral containers just seem to fit the model of Docker's capabilities much better than maintaining state.

The exception to this would be, I guess, that you could wedge in Docker containers for database server isolation or something, but my databases don't run on multi-tenant instances so there isn't a huge win to it. (I use RDS most of the time; let somebody else manage that problem.)


> Having persistent containers strikes me as the infrastructural equivalent of a code smell.

Which is why pretty much all advice regarding Docker is to use volumes, so that the persistent data is managed from outside the container, and the container itself can be discarded at will without affecting the data volume.

My preferred method is bind-mounted volumes from the host. They are not part of the containers, and the purpose is exactly to remove the need for persistent containers. A lot of the examples I gave in my article rely heavily on this.

This leaves the form of persistence up to the administrator. And of course that means you can do stupid things, or not. On my home server that means a mirrored pair of drives combined with regular snapshots to a third disk + nightly offsite backups. At work, we're increasingly using GlusterFS on top of RAID6, so we can lose multiple drives per server, or whole servers, before the cluster is in jeopardy (and even then we have regular offsite snapshots throughout the day + nightly backups).

If you are referring to the pattern of creating "empty" containers to act as storage volumes, then I sort-of agree with you, but mostly because of questions about Docker's maturity. After all, nothing stops you from putting the Docker storage itself on equally safeguarded storage. It's not really the risk of losing storage that makes me prefer stateless containers, but that separating state and data out of the containers substantially reduces the volume of data that needs to be secured (since we can spin up new stateless containers in seconds, we only really care about preventing loss of the persistent data volumes).


"Persistent containers" is the wrong term; "on-host storage" might be a better one. I don't store anything on my compute nodes. Everything is in an HA datastore or in S3. I kind of feel like architectures that rely on storing any data you can't immediately blow away without shedding a tear are, in modern environments, dangerous, and volumes seem (to me) to lead to the kind of non-fault-tolerant statefulness that'll bite you in the end.

I respect GlusterFS, so it sounds reasonable in your own use case, but it still makes me intensely uncomfortable to have apps managing their own data. I try to build systems where each component does one thing well, and for the components that I think work well in a Docker container, data storage is then kind of out-of-scope. YMMV.


We decided not to use Docker containers for our Postgres DB in production; volume mounts just don't let me sleep easy. We use an S3-backed private Docker registry to store our images/repositories.


What is your issue with volumes? Bind mounts have been battle-tested over many years. For example, I have production Gluster volumes bind-mounted into LXC containers that have been running uninterrupted for 5+ years.


Nothing wrong with mounting volumes, but certainly not for production DBs, at least not at this point. I don't have much insight to share on this, besides my intuition based on several years working on HA systems. The clusterhq folks are working on this problem though, and I am following keenly.


Docker volumes are just bind-mounted directories. Of all the things you should worry about when running HA systems, bind mounts are definitely not one of them. They are essentially overhead-free and completely stable. This is completely orthogonal to Docker - after setting up the bind mounts, Docker gets completely out of the way of the critical path.

We have definitely used them at large scale in production for several years with no issues.


Thanks shykes for chiming in on this. We use docker heavily for our applications in production (and all other envs), and would absolutely like to extend that to our Postgres DBs. Are there any examples you can point us to with a pattern for Postgres HA clusters with docker? Thanks.


With docker specifically, I've had frustrations with the user permissions on the volume between {pg container, data container, host OS}, with extra trouble when osx/boot2docker is added to the stack.

Also, docker doesn't add as much value for something like postgres that likely lives on its own machine.


> I've had frustrations with the user permissions on the volume between {pg container, data container, host OS}

Then don't use data containers. I don't see much benefit from that either. The stuff we put on data volumes is stuff we want to manage the availability of very carefully, so I prefer more direct control.

And so when I use volumes it's always bind mounts from the host. Some of them are local disk, some of them are network filesystems.

We have some Gluster volumes, for example, that are exported from Docker containers that import the raw storage via bind mounts from their respective hosts, and which are then mounted on other hosts and bind-mounted into our other containers, just to make things convoluted - it works great for high availability. (I'm not recommending Gluster for Postgres, btw; it "should" work with the right config, but I'd not dare without very, very extensive testing; nothing specific to Gluster, I'm just generally terrified of databases on distributed filesystems.)

> for something like postgres that likely lives on it's own machine.

We usually colocate all our postgres instances with other stuff. There's generally a huge discrepancy between the most cost-effective amounts of storage/IOPS, RAM and processing power if you're aiming for high-density colocation, so it's far cheaper for us that way.


I'd be interested to know what your concerns are with volumes.


So, we're making volumes better! Please see https://github.com/docker/docker/pull/8484 - and that is only the beginning.

It may be presumptuous to say the referenced PR will be in 1.4, but that's certainly what I'm pushing for.


Any plans on making it possible to control what's backing those volumes more directly? For me, when I'm using data volumes it's generally because I have specific requirements (e.g. I want my database on my expensive SSD arrays, or my high-availability file storage on a Gluster volume or similar) that make co-locating them with the container storage uninteresting.

I don't want to care if the container storage is totally destroyed - I want to just re-create it from our registry on a different host. I keep threatening the devs I work with that I'll wipe the containers regularly, for a reason, and they're specifically and intentionally not backed up.

From what I hear, the idea of keeping the containers totally disposable is a key appeal of Docker for many (me included), so I would just not use any volume-related functionality that co-mingles the data volumes with the container storage, other than in very special circumstances (e.g. let's say we had static datasets that we'd like to mix with an application in different configurations; but I don't have any actual, real-life use cases for that at the moment; basically I'd only do that if I could then also push those volumes into our registry).


First, volumes are 100% separated from containers. If you remove a container, the volumes are not removed unless you explicitly tell Docker to (`docker rm -v <container>`). And even then, it won't remove the volume if other containers are using it.

That said, it is hard to re-use a volume right now if you removed the last container referencing that volume. The linked PR solves this issue.

You can mount these SSDs on the host and then use bind mounts (`docker run -v /host/path:/container/path`) to get those into the container. Or you can add the devices to the containers directly (`docker run --device /dev/sdb`, for example), but you'd need privileged access to actually mount the device.
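Concretely, the first option looks something like this (the device and paths are made up):

    # mount the fast storage on the host once...
    mount /dev/sdb1 /mnt/ssd
    # ...then hand a directory on it to a container as a volume
    docker run -v /mnt/ssd/data:/data myimage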

The referenced PR itself doesn't really make using things like specialized disks any easier, except you can register them with the volumes subsystem and it will track them for you, i.e. `docker volumes create --path /path/to/data --name my_speedy_disks`.

There was some discussion around being able to deal with devices directly with Docker instead of expecting the admin to handle mounting those devices onto the host so they can be used as volumes. Nothing finalized here. Here is where that discussion happened: https://botbot.me/freenode/docker-dev/2014-10-21/?msg=239115...


I think the "docker volumes create --path ..." solves my use case just fine. Thanks for providing details - that seems quite useful.


I haven't used Docker in production but isn't the standard practice for persistence to mount a host directory as a data volume?


I'm extremely surprised that Packer (http://www.packer.io) hasn't received more attention in the Docker arena. Its Docker builder, coupled with any of its provisioners, makes it a pretty attractive alternative to the simplistic layout and limitations of Dockerfiles.


Packer currently doesn't snapshot the build at each step, so you don't get a history that you can check out or push from. It's a feature that's on the way, but it's a pretty big one that some teams need.

We use packer for a lot of other things though and it's great.


We are doing something similar at work: we specify the exact few dependencies needed on the host and their versions (mostly docker, tar, etc.), then make a chroot environment using debootstrap from a known few packages and versions, build a Docker image from that (`tar cf - chroot | docker import -`) and use it as the base for the build environment. We could use pacstrap or whatever too; the initial selection of packages is what's important.
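In rough outline (the suite, mirror and image name here are made up):

    # build a minimal chroot from a small, pinned set of packages...
    debootstrap --variant=minbase trusty ./chroot http://archive.ubuntu.com/ubuntu
    # ...and turn it into the base image for the build environment
    tar -C ./chroot -cf - . | docker import - build-base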

The packages we need to build - and we have many, with dependencies between them - are specified in a Makefile, because make is good at dependency tracking. When make needs to (re)build a package, it does so using the Docker build-container base image, and whatever that package does is committed for later analysis under the name of the package built plus a timestamp, for traceability.

For every package built, the resulting Docker image (named build-environment-and-package-plus-timestamp) is then run again, this time trying to install the package, and that image is committed as well with an 'install' tag - so we basically also verify the install step.
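A stripped-down sketch of the per-package step that make would invoke (the names, paths and commands are illustrative, not our actual setup):

    # invoked by make for one package: build it inside the pinned base image,
    # commit the result for traceability, then verify the install step
    PKG=mypkg
    STAMP=$(date +%Y%m%d-%H%M%S)
    # the source is assumed to be baked into (or fetched by) the build image
    docker run --name build-$PKG build-base sh -c "cd /src/$PKG && make package"
    docker commit build-$PKG $PKG:$STAMP
    # re-run the committed image and check that the package installs cleanly
    docker run --name install-$PKG $PKG:$STAMP sh -c "dpkg -i /src/$PKG/*.deb"
    docker commit install-$PKG $PKG:$STAMP-install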

Btw, this is for OpenStack: many components, packages and dependencies all over the place.


[dead]


You just caused the mods to remove a useful piece of information from the title.

Pluralization in an article like this implies a "few". The word eight tells us that there are more than a few. Eight is also not round, so it tells us that the author didn't stretch to get 10 in the style of those articles you disparage.


Mods routinely remove arbitrary numbers from titles, because HN's guidelines exclude them as a bait device. This rule has served the site well over the years. Yes, there are borderline cases, but surprisingly few.



