Hitch – A Scalable TLS Proxy by Varnish (github.com/varnish)
89 points by kolev on June 9, 2015 | 41 comments



From reading over the blog post, it seems hitch is a forked and patched version of stud. Nice and all, but it's difficult to see what advantage it could have over using haproxy for termination.

From skimming the github page the only thing that stands out is shared memory-based SSL contexts and UDP peer communication across processes/machines. Not sure, though, if this is something haproxy can also do? Never had a need for that level of performance so have never looked.


Interesting to see people recommending haproxy for ssl termination now. HAProxy has only supported that since June 2014. Before that, you saw people using projects like stud with it, and others recommending just using nginx in front.


True, but if they were like me, they were using those as stopgaps until SSL landed in haproxy 1.5 stable. Yes, stable support arrived 'only' a year ago, but that was after years of development work :) The first dev release with basic SSL support was back in September 2012.


We use stud for TLS termination at work, because we can put it in front of all of our various HTTP servers and get consistent TLS support. Almost all of the services have the same stud config, just different certs. Stud is very amenable to running in a jail too (so we've limited the damage of the next OpenSSL vulnerability): the config references one cert path, and we just put in the right cert for the machine when we assemble the jail. The only other usual difference is the number of localhost IPs to use, so the script detects that as well.

Haproxy would probably work fine too, but seems a bit overkill for running a local termination proxy.


haproxy is basically identical for your use case. I guess whether one views it as overkill or not is a matter of taste.


Yes, haproxy can do the same job, but it can also do a whole bunch more that we wouldn't be using. It's overkill in my opinion to have load balancing, status checking, request inspection, etc. available when all I need is to listen on port 443, strip TLS, add a proxy header, and send to localhost. (There's nothing wrong with haproxy, and I would consider it if I needed the other features.)


Advantages are that it is faster, and that it is a small and simple program that does a single thing well.


From reading the launch posts it seems the real advantage is that it allows Varnish Software to bring SSL under the same roof and offer commercial support for it, i.e. more of a business advantage than a technical one?


"By default, hitch has an overhead of ~200KB per connection"

Ouch. This default means 16GB only lets you handle 80k connections. I would hardly call this "scalable". In 2015 you find blog posts left and right showing you how to reach 1 million connections on a single machine with language X or tool Y or framework Z. Maybe the developers should change this default.

https://news.ycombinator.com/item?id=3028741


Preallocating memory is usually an optimization for throughput.

Servers can have up to 1TB of RAM without becoming overpriced.

But starving 80k users due to low buffering will be expensive in the long run. Way more expensive than RAM ;-)


"Preallocating memory is usually an opimization for throughput."

Still, 200kB is excessive. A program with buffers no larger than 10kB can _easily_ saturate a 1 Gbit/s NIC. Hitch is designed to handle many concurrent connections, so even if it handled a paltry 10 connections it could easily saturate a 10 Gbit/s NIC with 10kB buffers. If not, then there is a design flaw somewhere.
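Back-of-the-envelope arithmetic behind that claim (my numbers, not anything measured on hitch):

    1 Gbit/s ≈ 125 MB/s of payload
    125 MB/s ÷ 10 kB per buffer ≈ 12,500 buffer refills per second, total

A refill rate like that is trivial for an event loop; buffer size mostly trades memory against syscall count, it doesn't cap throughput.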

"Servers can have up to 1TB of ram without becoming overpriced"

This is irrelevant. If a Hitch version had a default overhead of 10kB per connection, it could in theory scale to 20x as many connections as this version of Hitch, for a given amount of RAM (no matter the amount). Maximizing the use you get out of a given amount of hardware resources should be your priority when writing scalable software.


What do you think the CPU usage would be with 10kB buffer sizes? And since we're throwing numbers out in the air, why stop at 10kB? If we reduce it to 1kB, that should give us MUCH MORE connections!!11

Let me ask a leading question: how much of this do you think is openssl overhead?

Please consider optimising for a real usage scenario, not some fantasy benchmarking setup.


I am not picking my numbers randomly. On an x86/x86-64 Linux kernel, one socket (one connection) will use at least one 4kB physical memory page. So if userland also allocates one or two 4kB pages for its own needs, you need at minimum 8 to 12kB per connection. That's why I quoted ~10kB.

The minimum theoretical memory usage is 4kB per connection: 1 page in kernel space, and nothing in userland (e.g. you use zero-copy to transfer data between sockets or to/from file descriptors).
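For that "nothing in userland" case, here is a minimal sketch of zero-copy forwarding between two already-connected sockets on Linux, using splice(2) through an intermediary pipe (relay_zero_copy and the 64kB chunk size are made up for illustration; TLS termination itself can't do this, since it has to read the bytes to decrypt them):

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <unistd.h>

    /* Relay bytes from sock_in to sock_out without copying them through
     * userland: socket -> pipe -> socket, all inside the kernel. */
    static int relay_zero_copy(int sock_in, int sock_out)
    {
        int p[2];
        if (pipe(p) == -1)
            return -1;
        for (;;) {
            ssize_t n = splice(sock_in, NULL, p[1], NULL,
                               65536, SPLICE_F_MOVE | SPLICE_F_MORE);
            if (n <= 0)
                break;                              /* EOF or error */
            while (n > 0) {
                ssize_t m = splice(p[0], NULL, sock_out, NULL,
                                   (size_t)n, SPLICE_F_MOVE | SPLICE_F_MORE);
                if (m <= 0)
                    goto out;
                n -= m;
            }
        }
    out:
        close(p[0]);
        close(p[1]);
        return 0;
    }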

At Google, our SSL/TLS overhead per connection is 10kB: https://www.imperialviolet.org/2010/06/25/overclocking-ssl.h...


Thanks for the data point.


Are those blog posts referring to setups with TLS? Comparing plaintext HTTP to TLS is comparing apples and oranges.


If you want to handle 1M connections, you can tune this. It will probably be the easiest thing to tune of many. Note that 1M connections terminated by stud/hitch is actually 3M sockets: 1M inbound to stud, 1M initiated by stud and 1M terminated by your underlying server. That's a lot of connections on localhost (on the plus side, loopback is a whole /8).
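Rough numbers on why the /8 matters (my arithmetic, not from the hitch docs): every stud->backend connection needs a unique (source IP, source port, destination IP, destination port) tuple, and there are at most 65,535 source ports per source/destination pair.

    1,000,000 connections ÷ ~64k ports per (src IP, backend IP:port) pair ≈ 16
    with the default Linux ephemeral range (~28k ports) it's closer to ~36

So the stud->backend leg needs a few dozen loopback source addresses (or extra backend ports), which 127.0.0.0/8 provides in abundance.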


If you're splitting 10 Gbit/s across 80,000 users, that leaves 125 kbps per user. Split it across 1 million users, and it leaves 10 kbps per user. Sure, you could have more than 10 Gbit/s of bandwidth from a single server to the Internet in theory -- but at that point I don't think sticking to 16GB of RAM makes much difference.


Millions of connections usually means websocket connections or HTTP keep-alive connections. In those cases there's not much traffic over those connections. Imagine a game server, for example. Latency is more important than bandwidth, and 10 kbps is enough for many tasks.


I wonder how much of a hit typical websocket use cases would take from swapping to SSD? For games I'd think one might prefer just using connectionless UDP, though?


Websockets are for browser clients; there's not much choice there, unfortunately.


This is TLS, NOT TCP/HTTP.

Secure sockets have a lot more overhead than plain TCP sockets, and on top of that there's all of the overhead that a proxy has per connection.


One big overhead in SSL can be the zlib compression buffers. Setting SSL_OP_NO_COMPRESSION can help quite a bit.


That's true, and you shouldn't support TLS compression anyway (to resolve attacks like CRIME).

You should also set SSL_MODE_RELEASE_BUFFERS to reclaim memory from idle SSL connections.
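A minimal sketch of both of those knobs in plain OpenSSL (make_server_ctx and the paths are made up for illustration, error handling is trimmed, and this is not hitch's actual code):

    #include <openssl/ssl.h>

    /* Assumes SSL_library_init() has already been called (OpenSSL 1.0.x). */
    SSL_CTX *make_server_ctx(const char *cert_path, const char *key_path)
    {
        SSL_CTX *ctx = SSL_CTX_new(SSLv23_server_method());
        if (ctx == NULL)
            return NULL;

        /* No TLS compression: closes the CRIME hole and avoids the zlib
         * buffers that would otherwise be allocated per connection. */
        SSL_CTX_set_options(ctx, SSL_OP_NO_COMPRESSION);

        /* Let OpenSSL free the read/write buffers of idle connections
         * instead of holding them for the connection's lifetime. */
        SSL_CTX_set_mode(ctx, SSL_MODE_RELEASE_BUFFERS);

        SSL_CTX_use_certificate_chain_file(ctx, cert_path);
        SSL_CTX_use_PrivateKey_file(ctx, key_path, SSL_FILETYPE_PEM);
        return ctx;
    }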


Yes, doing hard crypto for all users has costs. Welcome to the real world. :-)



After reading the source, I realized that I have no idea what modern, idiomatic C looks like. There are tons of uses of #define, some even in the middle of typedefs. I'm pretty sure a whole queue data structure is defined in macros in vqueue.h. Is this normal?


From vqueue.h header:

Copyright (c) 1991, 1993 The Regents of the University of California. All rights reserved.

It's very much based on this:

http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/sys/queue.h?anno...

Not very modern, in other words.


C does not have generics or templates so macroses are the only way to define data structures and algorithms over arbitrary types without repetition. The Linux kernel uses macroses to define some of its data structures. Many other projects do the same.

Whether that is normal or not is up to the developer. It's possible to use as small a subset of C++ as the developer wants, e.g. only templates. But staying in the C realm might be better for portability.
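A toy illustration of the macros-as-templates idea (DEFINE_LIST is made up here and has nothing to do with hitch's actual macros): one macro stamps out a typed node struct and a push function per element type.

    #include <stdio.h>
    #include <stdlib.h>

    /* "Instantiate" a singly linked list for an arbitrary element type. */
    #define DEFINE_LIST(name, type)                                      \
        struct name { type value; struct name *next; };                  \
        static struct name *name##_push(struct name *head, type v) {     \
            struct name *n = malloc(sizeof *n);                          \
            n->value = v;                                                \
            n->next = head;                                              \
            return n;                                                    \
        }

    DEFINE_LIST(intlist, int)            /* a list of ints */
    DEFINE_LIST(strlist, const char *)   /* a list of strings */

    int main(void)
    {
        struct intlist *xs = intlist_push(NULL, 42);
        struct strlist *ss = strlist_push(NULL, "hello");
        printf("%d %s\n", xs->value, ss->value);
        return 0;
    }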


> C does not have generics or templates so macroses

When I read this, I couldn't help but think of "macroses" as pronounced like "neuroses", which seems appropriate.


Pretty normal for the BSD kernels. A lot of the very basic data structures are defined as macros. For instance, here's queue.h: https://svnweb.freebsd.org/base/head/sys/sys/queue.h?revisio...


While not 100% necessary, header macros to implement lists, queues (and even trees, see: http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/sys/tree.h?...) are common in C.

They first appeared for kernel usage, where the fact that they expand to inline code inside functions avoids creating extra stack frames and helps optimization.

They do have the advantage of not relying on casting everything to "void *" or resorting to callbacks for walking (see TAILQ_FOREACH, for instance).
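For the curious, a minimal self-contained example of the queue.h style under discussion (struct conn is just for illustration; glibc and the BSDs both ship <sys/queue.h>). Both the head and the iteration are fully typed, with no void * in sight:

    #include <sys/queue.h>
    #include <stdio.h>
    #include <stdlib.h>

    struct conn {
        int fd;
        TAILQ_ENTRY(conn) list;          /* embedded next/prev pointers */
    };
    TAILQ_HEAD(connhead, conn);          /* declares the typed head */

    int main(void)
    {
        struct connhead head = TAILQ_HEAD_INITIALIZER(head);
        for (int i = 0; i < 3; i++) {
            struct conn *c = malloc(sizeof *c);
            c->fd = i;
            TAILQ_INSERT_TAIL(&head, c, list);
        }
        struct conn *c;
        TAILQ_FOREACH(c, &head, list)    /* expands to an ordinary for loop */
            printf("fd=%d\n", c->fd);
        return 0;
    }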


I'm already using pound for SSL termination http://www.apsis.ch/pound/. Does Hitch provide any advantages over pound?


The one I'm seeing is that this might work nicer with wildcard certificates and SNI.

Unless it's changed since I last attempted it, Pound only supported a single wildcard cert when it came to SNI, whereas looking at the hitch code suggests it might play nicely with multiple wildcard certificates.

Edit: To clarify, I don't think Pound technically supported any wildcard certificates; the wildcard cert had to be the default to work.
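On the hitch side, if I'm reading its docs right (hedge: I haven't tested this, and the hostnames/paths below are made up), you can list multiple pem-file entries and it will pick the certificate by SNI, with the first one acting as the default:

    frontend = "[*]:443"
    backend  = "[127.0.0.1]:8080"
    pem-file = "/etc/hitch/default.example.com.pem"    # default certificate
    pem-file = "/etc/hitch/wildcard.example.net.pem"   # selected via SNI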


What is the advantage over just using nginx to terminate SSL?


Presumably the "do one thing and one thing well" principle (assuming hitch actually does it well). Your attack surface is reduced by orders of magnitude if you're worried about future OpenSSL vulnerabilities.


Wonder why LibreSSL wasn't used... :/


The reason is pretty simple: LibreSSL isn't available/packaged on the distributions we care about, and we don't have the will, money or knowledge to do it ourselves. (with my VS hat on)

We're positive about merging any code changes necessary to get it running with LibreSSL, though.


I hadn't realized Stud had gone unmaintained. Great to see Varnish taking over the project!


Yeah, it's worrisome that people still use the bumptech version unpatched. The last commit to the official stud GitHub repo is from 2012: https://github.com/bumptech/stud/commits/master (although there are many forks).


For those that don't know, varnish was always a bit reluctant to take on the challenges of TLS termination. They traditionally had more of a 'do one thing well' policy. Here's a good description of their reasoning around that:

https://www.varnish-cache.org/docs/trunk/phk/ssl.html


The problem with putting forward a 'do one thing well' rationale is that it treats TLS as a separate problem from serving HTTP. It simply is not, and even in 2006 the writing was on the wall: HTTPS will be the standard web transport protocol within the next few years, and plain HTTP will cease to be a viable option for production.

This has pros and cons, but besides the current CA situation I think it's pretty clearly better than what we have today. That's not really the point though; it's going to happen, regardless of flaws.

Using software like Varnish that is intentionally HTTP-only will always be possible, but it introduces architectural and operational handicaps. It may not matter for a lot of use cases, but at large scale you are going to pay for the architectural choice to separate these functional units into multiple processes (or even boxes).

As much as I appreciate some of what Varnish can do, the no-SSL stance and associated mindset really puts me off of it.



