A nit:

> Users should not use AES-CBC or GCM for encryption. Secretbox should be the default mode of storing information and users should be encouraged to use KMS.
I see where this is coming from and agree in spirit, but GCM is actually idiomatic Go and implemented through the crypto/cipher AEAD interface, which does about as good a job as any library at being user-proof.
I too would probably prefer code that used NaCl primitives over Seal/Open, but I would probably not flag code that didn't.
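For context, here is roughly what that AEAD path looks like in Go -- a minimal sketch with a placeholder key, where a real deployment would pull the key from a KMS rather than generate it inline:

    // Minimal sketch of AES-GCM through Go's cipher.AEAD interface. The key
    // and message are placeholders; in Kubernetes the key comes from the
    // encryption provider config or a KMS, never from inline generation.
    package main

    import (
    	"crypto/aes"
    	"crypto/cipher"
    	"crypto/rand"
    	"fmt"
    )

    func main() {
    	key := make([]byte, 32) // AES-256 key (placeholder)
    	if _, err := rand.Read(key); err != nil {
    		panic(err)
    	}

    	block, err := aes.NewCipher(key)
    	if err != nil {
    		panic(err)
    	}
    	aead, err := cipher.NewGCM(block) // same Seal/Open shape as secretbox-style APIs
    	if err != nil {
    		panic(err)
    	}

    	// One fresh random nonce per message. GCM's nonce is 12 bytes, which
    	// is the crux of the random-nonce discussion downthread.
    	nonce := make([]byte, aead.NonceSize())
    	if _, err := rand.Read(nonce); err != nil {
    		panic(err)
    	}

    	ciphertext := aead.Seal(nil, nonce, []byte("secret value"), nil)

    	plaintext, err := aead.Open(nil, nonce, ciphertext, nil)
    	if err != nil {
    		panic(err)
    	}
    	fmt.Printf("%s\n", plaintext)
    }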
> I see where this is coming from and agree in spirit, but GCM is actually idiomatic Go and implemented through the crypto/cipher AEAD interface, which does about as good a job as any library at being user-proof.
Good point, and I appreciate that the (updated) Kubernetes docs do a pretty good job of telling you what the implications of using aesgcm vs secretbox are.
However, I was surprised that XChaCha20-Poly1305 wasn't recommended. XChaCha appears to check all the boxes you mentioned and is nonce-misuse resistant.
It's "NMR" in the sense that the nonce is long enough to safely use random nonces, you mean? In practice, Kubernetes can use random GCM nonces safely too. Real NMR ciphers don't just have misuse-resistant ergonomics, but also better failure modes when the ergonomics fail: if you reuse a Chapoly nonce, it blows up. That doesn't happen with AEZ or SIV.
I agree that both can be used safely. And, yes, to be clear, NMR here means "less likely to happen", not "better able to handle failure." Unfortunately, AES-GCM-SIV (or AEZ) aren't yet in Go's standard library.
But, why not use XChaCha20-Poly1305 over AES-GCM in Go? Both are "implemented through the crypto/aead" and -- to my eyes -- seem equally user-proof. Why not take the bigger nonce size?
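(For reference, the XChaCha construction drops into the same cipher.AEAD interface via golang.org/x/crypto/chacha20poly1305 -- a sketch with a placeholder key:)

    // Sketch of XChaCha20-Poly1305 via golang.org/x/crypto/chacha20poly1305.
    // Same Seal/Open interface as GCM; the difference under discussion is the
    // 24-byte nonce, which is large enough to draw at random without worrying
    // about collisions.
    package main

    import (
    	"crypto/rand"
    	"fmt"

    	"golang.org/x/crypto/chacha20poly1305"
    )

    func main() {
    	key := make([]byte, chacha20poly1305.KeySize) // 32 bytes (placeholder)
    	if _, err := rand.Read(key); err != nil {
    		panic(err)
    	}

    	aead, err := chacha20poly1305.NewX(key) // "X" variant: 24-byte nonces
    	if err != nil {
    		panic(err)
    	}

    	nonce := make([]byte, aead.NonceSize()) // 24 bytes here, vs 12 for GCM
    	if _, err := rand.Read(nonce); err != nil {
    		panic(err)
    	}

    	ct := aead.Seal(nil, nonce, []byte("secret value"), nil)
    	pt, err := aead.Open(nil, nonce, ct, nil)
    	if err != nil {
    		panic(err)
    	}
    	fmt.Printf("%s\n", pt)
    }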
AES-GCM, or even CBC for that matter, is not vulnerable/broken. Why did they recommend Secretbox? Is there an implementation error? I am not talking about the potential for making mistakes when using platform-supported constructs.
Does it make sense to make this recommendation even if the dev did not choose a vulnerable algorithm and there aren't any issues with implementation?
First: it's not as simple as "broken" or "not broken". GCM in Go is provided through an AEAD abstraction that is in fact pretty close to secretbox, ergonomically. In Python, Fernet provides AES-CBC with HMAC-SHA2 with similar ergonomics. So you can't just look at the constructions in isolation.
Using CBC in a Go program would be bad indeed.
Second, while you can make CBC secure, it isn't secure by default. New designs should generally avoid CBC mode in favor of a mainstream AEAD. So while I'd happily recommend Fernet to people --- it also dates back to a time when AEAD ciphers were a little less mainstream than they've become --- I would see CBC as a design smell in a newer library.
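To make that concrete, here's roughly the machinery CBC needs in Go to be used safely -- encrypt-then-MAC, separate keys, PKCS#7 padding, a random IV. A sketch only, and every line of it is a place to get something wrong:

    // Sketch of encrypt-then-MAC around AES-CBC (roughly the shape of what
    // Fernet hands you for free). Keys and message are placeholders, and
    // decryption -- where padding-oracle mistakes usually live -- is left out.
    package main

    import (
    	"bytes"
    	"crypto/aes"
    	"crypto/cipher"
    	"crypto/hmac"
    	"crypto/rand"
    	"crypto/sha256"
    	"fmt"
    )

    func encryptThenMAC(encKey, macKey, plaintext []byte) ([]byte, error) {
    	block, err := aes.NewCipher(encKey)
    	if err != nil {
    		return nil, err
    	}

    	// PKCS#7 padding to a whole number of blocks.
    	pad := aes.BlockSize - len(plaintext)%aes.BlockSize
    	padded := append(append([]byte{}, plaintext...), bytes.Repeat([]byte{byte(pad)}, pad)...)

    	// Random IV, prepended to the ciphertext.
    	out := make([]byte, aes.BlockSize+len(padded))
    	if _, err := rand.Read(out[:aes.BlockSize]); err != nil {
    		return nil, err
    	}
    	cipher.NewCBCEncrypter(block, out[:aes.BlockSize]).CryptBlocks(out[aes.BlockSize:], padded)

    	// HMAC-SHA-256 over IV||ciphertext, appended last (encrypt-then-MAC).
    	mac := hmac.New(sha256.New, macKey)
    	mac.Write(out)
    	return mac.Sum(out), nil
    }

    func main() {
    	encKey := make([]byte, 32)
    	macKey := make([]byte, 32)
    	if _, err := rand.Read(encKey); err != nil {
    		panic(err)
    	}
    	if _, err := rand.Read(macKey); err != nil {
    		panic(err)
    	}
    	token, err := encryptThenMAC(encKey, macKey, []byte("secret value"))
    	fmt.Println(len(token), err)
    }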
In the document they say that AES-CBC is vulnerable to padding oracle attacks, and AES-GCM uses random nonces and requires key rotation after so many iterations.
CBC is vulnerable to error oracles if you don't encrypt-then-MAC it properly (without the MAC it's also malleable, which is a game-over flaw). GCM is vulnerable to a bunch of its own misuse issues; it doesn't "use" random nonces, it is conceivably (though not really realistically) unsafe to use random nonces, and if you screw up nonce handling it blows up worse than CBC does.
My point is just, these things all have rough edges.
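(For reference, the secretbox path the audit recommends -- golang.org/x/crypto/nacl/secretbox -- looks like this in Go; ergonomically it really is close to the AEAD sketch upthread. Placeholder key, as before:)

    // Sketch of NaCl secretbox in Go. The fixed-size key and nonce array
    // types are most of the misuse-resistance argument; the key here is a
    // placeholder.
    package main

    import (
    	"crypto/rand"
    	"fmt"

    	"golang.org/x/crypto/nacl/secretbox"
    )

    func main() {
    	var key [32]byte
    	if _, err := rand.Read(key[:]); err != nil {
    		panic(err)
    	}

    	// 24-byte nonce, so random nonces are fine; prepend it to the box.
    	var nonce [24]byte
    	if _, err := rand.Read(nonce[:]); err != nil {
    		panic(err)
    	}
    	sealed := secretbox.Seal(nonce[:], []byte("secret value"), &nonce, &key)

    	// To open, peel the nonce back off the front.
    	var n [24]byte
    	copy(n[:], sealed[:24])
    	opened, ok := secretbox.Open(nil, sealed[24:], &n, &key)
    	if !ok {
    		panic("decryption failed")
    	}
    	fmt.Printf("%s\n", opened)
    }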
I know the Kubernetes assessment was the one to make all the news, but the teams actually audited a bunch of CNCF projects. Here is the one for the Vitess project:
> A database clustering system for horizontal scaling of MySQL
> Vitess combines many important MySQL features with the scalability of a NoSQL database. Its built-in sharding features let you grow your database without adding sharding logic to your application.
What a quirky project. Is this for folks who started out with MySQL then find themselves needing to scale out in "NoSQL" style?
> Vitess automatically rewrites queries that hurt database performance.
And to understand scaling and extremes: FB basically uses RocksDB and/or MySQL as a low-level storage layer for whatever they want to store. (And on top they build the clustering stuff, with the particular CAP choices they think are best for that particular service/purpose.)
It's part of the CNCF graduation criteria now that any project moving to "graduated" status has to have a third-party security review, so you should be able to get one for any of the projects in that category.
Cure53 did the Vitess audit. I think they've done others for the CNCF, too. The Kubernetes audit was done by Trail of Bits. It was a different team that did the assessment.
> The container manager used in kubelet checks for docker daemon process either via pidfile or process name. While the pidfile points to the docker daemon process PID, the dockerProcessName constant stores a docker cli name (docker) instead of docker daemon name (dockerd).
They're trying to look up the process by a name the process isn't using.
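A hypothetical sketch of the mismatch (not the actual kubelet code, just an illustration): the pidfile path works, but the by-name fallback searches for the CLI's name rather than the daemon's, so it can never find the process it's meant to check.

    // Illustration only -- not kubelet source. The fallback lookup uses the
    // CLI binary name ("docker") where the daemon's process name ("dockerd")
    // is what actually needs to be found, so the by-name path silently
    // checks the wrong thing.
    package main

    import (
    	"fmt"
    	"os"
    	"os/exec"
    	"strconv"
    	"strings"
    )

    const dockerProcessName = "docker" // the bug: the daemon is "dockerd"

    // pidFromFile is the primary path: read the daemon PID from its pidfile.
    func pidFromFile(path string) (int, error) {
    	b, err := os.ReadFile(path)
    	if err != nil {
    		return 0, err
    	}
    	return strconv.Atoi(strings.TrimSpace(string(b)))
    }

    // pidByName is the fallback: look the process up by name (pgrep here).
    // With dockerProcessName set to "docker", this matches the CLI, not dockerd.
    func pidByName(name string) (int, error) {
    	out, err := exec.Command("pgrep", "-x", name).Output()
    	if err != nil {
    		return 0, fmt.Errorf("no process named %q: %w", name, err)
    	}
    	first := strings.SplitN(strings.TrimSpace(string(out)), "\n", 2)[0]
    	return strconv.Atoi(first)
    }

    func main() {
    	pid, err := pidFromFile("/var/run/docker.pid")
    	if err != nil {
    		pid, err = pidByName(dockerProcessName) // looks for the wrong name
    	}
    	fmt.Println(pid, err)
    }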
I think the HTTP-proxy-based architecture is just weird and inherently insecure. Everything would be much simpler and easier to analyze in a normal end-to-end scenario.
Kubernetes is 5 years old. This is very, very young for mission-critical infrastructure management software.
Having a certain level of doubt in young open source projects is responsible, in my opinion. I'm interested to hear other people's perspective on production-readiness of k8s for mission-critical applications.
If security got to be the number one concern for whether things were deployed or not, then sure we could likely take a more conservative view.
However, realistically, k8s is in heavy deployment in a wide variety of industries, including the public sector, financial services, retail, technology ... and it's clear that this kind of concern is not the primary consideration.
The tradeoffs Monzo made are not ones that apply to most business. For most businesses, you have a profitable and sustainable model and you want to mitigate the possibility that you sink the ship by screwing the pooch on security or availability.
Monzo, on the other hand, was default-dead, so betting the farm on a relatively unproven technology perhaps wasn't risking as much. Nobody talks about the startups that used unproven tech and sank.
I don't think Monzo had to adopt k8s to survive. It's an infrastructure technology, not something that provides a unique advantage from an app-development perspective.
In most other industries, saying something is in "heavy development" is usually the same as "unstable". (Unstable is usually interpreted as bad in software engineering -- but the dictionary definition of "unstable" only means "prone to change", which I think is an accurate characterization of k8s considering its degree of maturity.)
Whether or not something is a smart choice to use in mission-critical production applications doesn't depend on the number of big banks or big tech companies that use the technology.
At the end of the day, Kubernetes is a tool that will change very rapidly over the next 5 years. I could see k8s being a decent choice to use in a tech project that you expect to actively maintain and improve for the next 5+ years, AND if you (and your developers) are willing to invest time (potentially a lot of time) every year keeping up to speed with how k8s evolves through every version release. That's the primary risk in using something like k8s.
Sure rapid development is likely to equal lots of change, but it's far from alone in that regard.
The last decade has been dominated by rapid adoption of technologies that were under heavy development at the time, from Ruby on Rails to Node.js to Golang to Rust.
The simple reality of modern IT is that companies are unwilling to wait until a technology has stabilized before making use of it.
Personally I'd rather they did, but my opinion has little weight in that regard.
Kubernetes has already seen far more production-hours of operation than most infrastructure management software will ever see. Age is no substitute for experience.
The better question is: when you deploy k8s in production, how do you ensure none of the risks are being exploited?
Given today's landscape of hardware and software exploits, adding a complex orchestration layer with identified issues seems like less than prudent behavior.
I currently work on Kubernetes in production and am migrating large clients onto these systems. I see a distinct lack of knowledge around securing systems, and even more so when adding Kubernetes.
I'm not running antagonistic workloads in k8s though, I'm just running my own junk, each component of which also has its own laundry list of security nightmares.
"Only two remote holes in the default install, in a heck of a long time!"
Due respect to smart acquaintances who work on OpenBSD, but to most people who secure application deployment environments, this is not the reassuring statement OpenBSD seems to think it is.
What's funny about it is, if you're going to make up a benchmark (and theirs is contrived; it was "no remote vulnerabilities", as I recall, when I was involved with the project, then "no remote vulnerabilities in the default install", then "only one remote vulnerability in the default install"), make up one where your number is zero, not "just 2 in a heck of a long time".
But more substantively: the reason you run an operating system is to do stuff on it. It isn't 1996 any more and nobody gets public shell accounts on Linux systems or OpenBSD systems; similarly, remotely-exploitable vulnerabilities in other operating systems are also exceedingly rare, and so OpenBSD's benchmark excludes the LPEs that actually make up the meaningful attack surface of a modern OS.
The more important question is what features the operating system provides to harden the non-default programs that inevitably have to run on it. OpenBSD has historically lagged here, though they're upping their game recently.
Despite briefly being involved with the project during "The OpenBSD Security Audit" in the late 1990s, I have a longstanding bias against OpenBSD that I should be up front with: we shipped an appliance on OpenBSD at Arbor Networks, and I spent several days debugging a VMM problem that would zombify pages of memory and gradually suffocate our systems. When I presented evidence to Theo, he said (not a literal quote) "don't bother me about this, Chuck Cranor" --- I think it's Chuck Cranor but could be wrong --- "wrote this VMM as his graduate project and I've got nothing to do with it". For whatever that's worth, I've felt OpenBSD is an unserious option for deploying real systems other than near-stateless network middleboxes ever since.
To be fair, hardly anyone uses OpenBSD compared to Kubernetes. And last I checked, most OpenBSD services are disabled by default, which makes it hard to break into, but also unusable in its default state.
If we have to count the exploits in every new thing against some grand total of allowable exploits then there will never be new things. The question was not whether k8s added to the universe of exploits, but whether the exploits make it unready for production. Personally I was more bothered by some of the code quality issues than the list of specific high severity exploits. It's a large project and issues like this will be found.
When the complexity of the attack surface gets to the degree of k8s, I would say that is a problem.
The fact that very few (and I do mean very few) people understand the low-level functions going on (like the multiple layers of NAT via iptables), and that they are simply struggling to keep it running, makes it pretty obvious they aren't qualified to run this in production.
I have been at Google HQ in Kubernetes discussions and it's frightening how little people know about its internals.
We already depend upon layer after layer of highly complex software. I'd argue that the complexity of k8s is not out of line with its scope. I don't want to get into a debate about specific things like netfilter. Yeah it's an odd setup and full of warts, but it's completely pluggable. On GKE for example you can now run in a mode where the pod networking is handled as a VPC subnet with load balancing directly to pods. And that's sort of the point: it's the maturing abstractions that are valuable, not the specific implementation of a part like networking.
As for struggling to run it, our experience has been different. Granted we're a small user. Our largest cluster has just over 100 nodes. Our highest volume service hits about 15k req/sec at peak. We're on GKE which is a well-managed implementation and that also makes it less risky. In two years of production the platform has been extremely reliable. Moreover we've been able to do things that would have been a lot harder before, such as autoscaling the service I mentioned above so that we're not paying for capacity we don't need off peak.
You keep saying that the attack surface is high, but is it higher than all other software we consider suitable for this purpose?
Does anyone understand the JVM and servlet containers? Does anyone understand OpenSSL's state machine? Does anyone understand hardware load balancers? Does anyone understand speculative execution? Does anyone understand the Postgres query planner? Does anyone understand all the same-origin policies? Does anyone understand their laptop's power supply?
I've seen a lot of people build a lot of successful systems on things they don't know every detail of, even when not knowing those details is quite dangerous. That Kubernetes is yet another one of these building blocks isn't an indictment of Kubernetes, it's an indictment of the compulsion to understand everything.
Can you name one security vulnerability from this document that, in a functionally-similar architecture that used OpenBSD and didn't use Kubernetes, would have been prevented by OpenBSD's security model?
("Don't build the system you want to build, build the system I want you to build" isn't an answer.)
Thing is, everyone I have worked with uses k8s because it's the new cool toy. None of them have a requirement for a large, expensive platform that costs more than simple hardware just so the company can bring products to market faster.
Everyone thinks they can save money with k8s. You won't, especially in AWS.
It's production-ready; folks have been running it in production and will keep running it in production. Sure, it has issues from inside the cluster. But if you secure it and it's not accessible from outside, it's good to go. Probably more secure than trying to run 500 boxes at once.
Yes, it significantly reduces the number of machines. That's the main benefit. You can binpack your pods by sizing them well and maxing out resources on each machine.
Yeah, that one is kind of interesting; it really needs more detail. I think what they're talking about is that it's possible to configure insecure connections between the different components.
However, if that's the case, that's a distribution-specific issue and not really anything intrinsic to k8s.