Thanks "geofft" and "cesnja". converting certain global variables How do the cop...

geofft · on Aug 15, 2016

It depends on the thing you're unsharing.

For a chroot, you get a pointer to a subdirectory of the host root directory. Changes within that directory are visible in both directions. CLONE_NEWPID and CLONE_NEWUSER work similarly; every process has a PID and a UID outside of the container (that is, in the root namespace), but a subset of PIDs and UIDs are visible in the container, with their own values. Creating a process in a PID namespace causes it to get a PID counting from 1 in that namespace, as well as an ordinary PID in the parent namespace. A user account in a user namespace has a value (which could be 0) in the namespace, as well as a mapped value in the parent namespace.
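
To make the UID mapping concrete: the kernel exposes each user namespace's mapping in `/proc/<pid>/uid_map`, one range per line, as three numbers (first ID inside the namespace, first ID in the parent namespace, range length). Below is a minimal sketch of that translation. The helper name `map_uid` and the `0 100000 65536` mapping are made up for illustration.

```shell
#!/bin/sh
# map_uid is a hypothetical helper, not a standard tool.
# Usage: map_uid INSIDE_UID < uid_map_file
# Each uid_map line is: <first ID inside ns> <first ID in parent ns> <length>
map_uid() {
    awk -v uid="$1" '
        # The UID falls inside this range: shift it by the same offset.
        uid >= $1 && uid < $1 + $3 { print $2 + (uid - $1); found = 1; exit }
        END { if (!found) print "unmapped" }
    '
}

# Made-up example mapping: UID 0 in the container is UID 100000 outside.
printf '0 100000 65536\n' | map_uid 0       # prints 100000
printf '0 100000 65536\n' | map_uid 1000    # prints 101000
printf '0 100000 65536\n' | map_uid 70000   # prints unmapped
```

On a real system you would feed it an actual mapping, for example `map_uid 0 < /proc/self/uid_map`.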

CLONE_NEWIPC and CLONE_NEWNET create new, empty structures for the IPC namespace and network stack. Changes in one namespace aren't visible to another. You can move network devices between namespaces by using `ip link set dev eth1 netns 1234`, which will move eth1 out of the current process's network namespace and into process 1234's namespace. (This is occasionally useful with physical devices, but more useful with virtual devices like veth and macvlan.)
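
If you want to check whether two processes actually share a network namespace, one way is to compare the `/proc/<pid>/ns/net` symlinks, which resolve to an identifier of the form `net:[<inode number>]`. This is a rough sketch; `netns_of` and `same_netns` are hypothetical helper names, and reading another user's process usually needs extra privileges.

```shell
#!/bin/sh
# netns_of and same_netns are hypothetical helper names.

# Print the network-namespace identifier of a process, e.g. "net:[<inode>]".
netns_of() {
    readlink "/proc/$1/ns/net"
}

# Succeed only if both PIDs are readable and live in the same network namespace.
same_netns() {
    a=$(netns_of "$1") && b=$(netns_of "$2") && [ "$a" = "$b" ]
}

if same_netns "$$" "$PPID"; then
    echo "this shell and its parent share a network namespace"
else
    echo "different network namespaces (or no permission to look)"
fi
```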

CLONE_NEWNS and CLONE_NEWUTS create a deep copy of the current namespace's mount table and hostname/domainname strings, respectively. Further changes in one namespace do not affect the other.
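
A quick way to see the copy semantics for the hostname is sketched below. It assumes util-linux `unshare` and a `hostname` command are installed and that unprivileged user namespaces are enabled (otherwise run it as root); the name `sandbox` is arbitrary.

```shell
#!/bin/sh
# Change the hostname inside a fresh UTS namespace and print what is seen there.
# --user --map-root-user lets this run without root where unprivileged
# user namespaces are allowed; it fails on systems that disable them.
uts_demo() {
    unshare --user --map-root-user --uts sh -c 'hostname sandbox && uname -n'
}

before=$(uname -n)
inside=$(uts_demo 2>/dev/null) || inside="(unshare not permitted here)"
after=$(uname -n)

echo "host before:              $before"
echo "inside new UTS namespace: $inside"
echo "host after:               $after"
```

The new namespace starts with a copy of the host's name, and the change made inside it never reaches the host, so the "before" and "after" lines should match.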

mafribe · on Aug 15, 2016

Thanks, this is very useful.

Has that been written up somewhere in a suitably abstract form?

Is there a list of variables that are affected, and how they are affected?

In particular, I wonder about network interfaces. Say your hardware has a network interface that's got MAC address 0b:21:b5:e2:11:22 and IPv4 address 123.234.34.45, how are these addresses affected by cloning?

geofft · on Aug 18, 2016

You get a completely separate network stack. None of the network devices are copied/cloned. You can move a network device into the container, but it's no longer accessible in the host.

Since most people don't have spare physical devices, there are a couple of approaches using virtual devices. You can create a "veth" device pair, which is basically a two-ended loopback connection. Move one end of the veth into the container, configure them as 192.168.1.1 and 2 (or whatever), and set up NAT. Or you can create a "macvlan" device, which hangs off an existing device and generates a new MAC address. Any traffic destined for the macvlan's MAC address goes to the macvlan device; any other traffic goes to the parent device. So I can move the macvlan into the container and assign it the address of 123.234.34.46, and it will ARP for that address using its own MAC address.
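
Roughly, the two approaches look like the sketch below. It is a dry run by default (commands are printed, not executed); set `RUN=1` and run it as root to apply them. The namespace name `demo`, the interface names, the uplink `eth0`, and the `/24` prefix lengths are all assumptions for illustration.

```shell
#!/bin/sh
# Dry run by default: print each command instead of executing it.
RUN="${RUN:-0}"
run() {
    if [ "$RUN" = "1" ]; then "$@"; else echo "+ $*"; fi
}

NS=demo             # hypothetical namespace name
UPLINK=eth0         # assumed name of the host's real interface
HOST_IF=veth-host   # hypothetical veth end that stays in the host
CT_IF=veth-ct       # hypothetical veth end that moves into the container
MV_IF=macvlan0      # hypothetical macvlan device name

# Approach 1: veth pair plus NAT on the host.
setup_veth() {
    run ip netns add "$NS"
    run ip link add "$HOST_IF" type veth peer name "$CT_IF"
    run ip link set dev "$CT_IF" netns "$NS"
    run ip addr add 192.168.1.1/24 dev "$HOST_IF"
    run ip link set dev "$HOST_IF" up
    run ip netns exec "$NS" ip addr add 192.168.1.2/24 dev "$CT_IF"
    run ip netns exec "$NS" ip link set dev "$CT_IF" up
    run ip netns exec "$NS" ip route add default via 192.168.1.1
    run sysctl -w net.ipv4.ip_forward=1
    run iptables -t nat -A POSTROUTING -s 192.168.1.0/24 -o "$UPLINK" -j MASQUERADE
}

# Approach 2: macvlan hanging off the uplink (assumes the namespace exists).
setup_macvlan() {
    run ip link add link "$UPLINK" name "$MV_IF" type macvlan mode bridge
    run ip link set dev "$MV_IF" netns "$NS"
    run ip netns exec "$NS" ip addr add 123.234.34.46/24 dev "$MV_IF"
    run ip netns exec "$NS" ip link set dev "$MV_IF" up
}

setup_veth
setup_macvlan
```

With the veth approach the container sits behind the host's address; with macvlan it appears on the physical network as a separate machine with its own MAC address.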

The container also has its own routing table, iptables (firewall) rule set, etc. And anyone listening on a broadcast interface in the host won't get packets destined to the container, or vice versa. It's basically like a virtual machine.

mafribe · on Aug 18, 2016

Thanks for the description. I guess container networking like Weave works by 'hijacking' veth and executing NAT-like address translation.

   like a virtual machine.

That's surprising. Containers were advertised to me as being much more lightweight than conventional virtualisation (e.g. VMware, Xen), because the former, unlike the latter, share the ambient operating system.

    completely separate network stack

What does that mean exactly? Does the container copy the actual code, or is the network stack's code shared and just run in a separate address space?