Same here. I have a hobby: on any RPC framework I encounter, I file a GitHub issue, "did you think of TCP_NODELAY or can this framework do only 20 calls per second?".
I disagree on the "not a good / bad option" though.
It's a kernel-side heuristic for "magically fixing" badly behaved applications.
As the article states, no sensible application does 1-byte network write() syscalls. Software that does that should be fixed.
It makes sense only when you are the sysadmin of the machine and somehow cannot fix the software that runs on it, maybe for team-political reasons. I claim that's pretty rare.
For all other cases, it makes sane software extra complicated: you need to explicitly opt out of odd magic that gives poorly-written software slightly more throughput while saddling correctly-written software with huge, surprising latency.
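For reference, the opt-out in question is a single setsockopt call per socket. A minimal sketch, assuming an already-connected TCP socket:

    #include <netinet/in.h>   /* IPPROTO_TCP */
    #include <netinet/tcp.h>  /* TCP_NODELAY */
    #include <sys/socket.h>   /* setsockopt */

    /* Opt out of Nagle's algorithm on an already-connected TCP socket.
     * Returns 0 on success, -1 on failure (errno is set by setsockopt). */
    static int disable_nagle(int sock)
    {
        int one = 1;
        return setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
    }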
John Nagle says here and in the linked threads that Delayed Acks are even worse. I agree. But the Send/Send/Receive pattern that Nagle's Algorithm degrades is a totally valid and common use case, including anything that does pipelined RPC over TCP.
Both Delayed Acks and Nagle's Algorithm should be opt-in, in my opinion. The option should be called TCP_DELAY, which you can opt into if you can't be bothered to implement basic userspace buffering.
People shouldn't /need/ to know about these. Make the default case be the unsurprising one.
"As the article states, no sensible application does 1-byte network write() syscalls." - the problem that this flag was meant to solve was that when a user was typing at a remote terminal, which used to be a pretty common use case in the 80's (think telnet), there was one byte available to send at a time over a network with a bandwidth (and latency) severely limited compared to today's networks. The user was happy to see that the typed character arrived to the other side. This problem is no longer significant, and the world has changed so that this flag has become a common issue in many current use cases.
Was terminal software poorly written? I don't feel comfortable making such a judgement. It was designed for a constrained environment with different priorities.
Sure, but we do so over much better networks than in the 80s. The extra overhead is not going to matter when even a bad network nowadays is measured in megabits per second per user. The 80s had no such luxury.
Not really. Buildout in less-developed areas tends to be done with newer equipment. (E.g., some areas in Africa never got a POTS network, but went straight to wireless.)
Yes, but isn't the effect on the network different now? With encryption and authentication, your single-character input gets amplified significantly long before it reaches the TCP stack. The extra overhead from the TCP header is still there, but it is far less significant in percentage terms, so it's best to address the problem at the application layer.
It was not just a bandwidth issue. I remember my first encounter with the Internet was on an HP workstation in Germany connected to South Africa via telnet. The connection went over a Datex-P (X.25) 2400 baud line. The issue with X.25 nets was that they were expensive: the monthly rent was around 500 DM, and each packet sent also had to be paid for, a few cents apiece. You would really try to optimize the use of the line, and interactive rsh or telnet traffic was definitely not ideal.
> As the article states, no sensible application does 1-byte network write() syscalls. Software that does that should be fixed.
Yes! And worse, those that do are not going to be "fixed" by delays either. In this day and age of fast internet connections, a syscall per byte will bottleneck the CPU way before it saturates the network path. When I've been tuning buffers, the CPU limit has been somewhere in the 4k-32k range for roughly 10 Gbps.
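As a concrete (purely hypothetical) sketch of what basic userspace buffering can look like: accumulate small writes in a 16 KiB buffer, inside the 4k-32k range mentioned above, and flush in large chunks, so the socket sees a few big write() calls instead of one per byte. All names here are illustrative.

    #include <string.h>   /* memcpy */
    #include <unistd.h>   /* write, ssize_t */

    #define OUTBUF_SIZE (16 * 1024)

    struct outbuf {
        int    fd;               /* connected socket */
        size_t used;             /* bytes currently buffered */
        char   data[OUTBUF_SIZE];
    };

    /* Flush everything buffered so far; 0 on success, -1 on write error. */
    static int outbuf_flush(struct outbuf *b)
    {
        size_t off = 0;
        while (off < b->used) {
            ssize_t n = write(b->fd, b->data + off, b->used - off);
            if (n < 0)
                return -1;
            off += (size_t)n;
        }
        b->used = 0;
        return 0;
    }

    /* Append bytes, flushing whenever the buffer fills up. */
    static int outbuf_put(struct outbuf *b, const void *p, size_t len)
    {
        const char *src = p;
        while (len > 0) {
            if (b->used == OUTBUF_SIZE && outbuf_flush(b) < 0)
                return -1;
            size_t chunk = OUTBUF_SIZE - b->used;
            if (chunk > len)
                chunk = len;
            memcpy(b->data + b->used, src, chunk);
            b->used += chunk;
            src += chunk;
            len  -= chunk;
        }
        return 0;
    }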
> Both Delayed Acks and Nagle's Algorithm should be opt-in, in my opinion.
Agreed, it causes more problems than it solves and is very outdated. Now, the challenge is rolling out such a change as smoothly as possible, which requires coordination and a lot of trivia knowledge of legacy systems. Migrations are never trivial.
I doubt the libc default in established systems can change now, but newer languages and libraries can learn the lesson and do the right thing. For instance, Go sets TCP_NODELAY by default: https://news.ycombinator.com/item?id=34181846
The problem with making it opt in is that the point of the protocol was to fix apps that, while they perform fine for the developer on his LAN, would be hell on internet routers. So the people who benefit are the ones who don't know what they are doing and only use the defaults.
1. Write four bytes (length of frame)
2. Write the frame itself
The easiest fix in C code, with the least chance of introducing a buffer overflow or bad performance, is to keep these two pieces of information in separate buffers and use writev. (How portable is that compared to send?)
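A minimal sketch of that writev approach, assuming a connected socket and the frame payload already in its own buffer (the function name is illustrative):

    #include <arpa/inet.h>   /* htonl */
    #include <stdint.h>
    #include <sys/uio.h>     /* writev, struct iovec */

    /* Send the 4-byte length prefix and the frame payload with a single
     * writev() call, so the kernel sees one write instead of two. */
    static ssize_t send_frame(int sock, const void *frame, uint32_t frame_len)
    {
        uint32_t prefix = htonl(frame_len);   /* network byte order */
        struct iovec iov[2] = {
            { .iov_base = &prefix,       .iov_len = sizeof(prefix) },
            { .iov_base = (void *)frame, .iov_len = frame_len      },
        };
        /* A robust version would loop on short writes; omitted for brevity. */
        return writev(sock, iov, 2);
    }

As for portability: writev is specified by POSIX, so on Unix-like systems it is roughly as portable as send; on Windows the closest analogue is WSASend with multiple buffers.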
If you have to combine the two into one flat frame, you're looking at allocating and copying memory.
Linux has something called corking: you can "cork" a socket (so that it doesn't transmit), write some stuff to it multiple times, and then "uncork" it. It's extra syscalls, though, yuck.
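Roughly like this (TCP_CORK is Linux-specific; a sketch only, with error handling omitted):

    #include <netinet/in.h>    /* IPPROTO_TCP */
    #include <netinet/tcp.h>   /* TCP_CORK (Linux-specific) */
    #include <sys/socket.h>
    #include <unistd.h>

    /* Cork the socket, do the two small writes, then uncork so the kernel
     * coalesces them into as few segments as possible. */
    static void send_frame_corked(int sock,
                                  const void *prefix, size_t prefix_len,
                                  const void *frame, size_t frame_len)
    {
        int on = 1, off = 0;

        setsockopt(sock, IPPROTO_TCP, TCP_CORK, &on, sizeof(on));
        write(sock, prefix, prefix_len);
        write(sock, frame, frame_len);
        setsockopt(sock, IPPROTO_TCP, TCP_CORK, &off, sizeof(off));  /* uncork flushes */
    }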
You could use a buffered stream where you control flushes: basically another copying layer.
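In C that could be stdio layered over the socket, flushing once per frame. A sketch, assuming a POSIX system where fdopen works on a socket descriptor (function names are illustrative):

    #include <arpa/inet.h>   /* htonl */
    #include <stdint.h>
    #include <stdio.h>       /* fdopen, setvbuf, fwrite, fflush */

    /* Wrap the socket in a fully buffered stdio stream; bytes only reach
     * the socket on fflush() or when the buffer fills, so the length
     * prefix and the frame go out together. */
    static FILE *buffered_socket(int sock)
    {
        FILE *f = fdopen(sock, "r+");
        if (f != NULL)
            setvbuf(f, NULL, _IOFBF, 64 * 1024);  /* fully buffered, 64 KiB */
        return f;
    }

    static int write_frame(FILE *f, const void *frame, uint32_t frame_len)
    {
        uint32_t prefix = htonl(frame_len);
        if (fwrite(&prefix, sizeof(prefix), 1, f) != 1)
            return -1;
        if (fwrite(frame, 1, frame_len, f) != frame_len)
            return -1;
        return fflush(f) == 0 ? 0 : -1;  /* one flush per frame (or per batch) */
    }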
I have a hobby: on any RPC framework I encounter, I file a GitHub issue, "did you think of TCP_NODELAY or can this framework do only 20 calls per second?".
So true. Just last month we had to apply the TCP_NODELAY fix to one of our libraries. :)
Would one not also get clobbered by all the syscalls from doing many small packets? It feels like coalescing in userspace is a much better strategy all round, if that's desired, but I'm not super experienced.
So far, it's found a bug every single time.
Some examples: https://cloud-haskell.atlassian.net/browse/DP-108 or https://github.com/agentm/curryer/issues/3