If the feature already technically exists in TCP, it's either broken or disabled...

voxic11 · 2024-05-09T20:43:22 1715287402

Keepalives are an optional TCP feature, so not every TCP implementation supports them, and they therefore default to off even when supported.

dilyevsky · 2024-05-09T23:28:34 1715297314

Where is it off? Most Linux distros have it on; it's just that the default kickoff timer is ridiculously long (like 2 hours iirc). Besides, TCP keepalives won't help with the issue at hand and were put in for a totally different purpose (gc'ing idle connections). Most of the time you don't even need them because the other side will send an RST packet if it already closed the socket.

halter73 · 2024-05-10T01:06:46 1715303206

AFAIK, all Linux distros plus Windows and macOS have TCP keepalives off by default, as mandated by RFC 1122 [0]. Even when they are optionally turned on using SO_KEEPALIVE, the interval defaults to two hours because that is the minimum default interval allowed by the spec. That can then be optionally reduced with something like /proc/sys/net/ipv4/tcp_keepalive_time (system wide) or TCP_KEEPIDLE (per socket).
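For illustration, here is a minimal Python sketch of the per-socket route described above. The helper name `enable_keepalive` and the timer values (60 s idle, 10 s between probes, 5 probes) are made up for this example, not recommendations. `TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT` are the Linux option names and are not exposed on every platform, so the sketch only sets the ones Python's `socket` module actually provides.

```python
import socket


def enable_keepalive(sock, idle_s=60, interval_s=10, probes=5):
    """Enable TCP keepalive on one socket and shorten its timers where possible.

    Hypothetical helper for illustration. Returns the approximate worst-case
    time (in seconds) before a silently dead peer is detected, assuming all
    three timer options were applied.
    """
    # Keepalive is off by default, so it has to be switched on per socket.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # The per-socket timer overrides are platform-specific; skip any option
    # that this platform's socket module does not define.
    if hasattr(socket, "TCP_KEEPIDLE"):
        # Idle time before the first probe (the two-hour default).
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    if hasattr(socket, "TCP_KEEPINTVL"):
        # Delay between unanswered probes.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    if hasattr(socket, "TCP_KEEPCNT"):
        # Number of unanswered probes before the connection is dropped.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)

    return idle_s + interval_s * probes


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    print("worst-case detection time (s):", enable_keepalive(s))
    s.close()
```

With these example values a dead peer would be noticed after roughly idle + interval × probes seconds, instead of after the two-hour default idle period plus the probe phase.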

By default, completely idle TCP connections will stay alive indefinitely from the perspective of both peers even if their physical connection is severed.

            Implementors MAY include "keep-alives" in their TCP
            implementations, although this practice is not universally
            accepted.  If keep-alives are included, the application MUST
            be able to turn them on or off for each TCP connection, and
            they MUST default to off.

            Keep-alive packets MUST only be sent when no data or
            acknowledgement packets have been received for the
            connection within an interval.  This interval MUST be
            configurable and MUST default to no less than two hours.

[0]: https://datatracker.ietf.org/doc/html/rfc1122#page-101

dilyevsky · 2024-05-10T01:17:57 1715303877

OK you're right - it's coming back to me now. I've been spoiled by software that enables keep-alive on sockets.

mort96 · 2024-05-10T07:17:10 1715325430

So we need a protocol with some kind of non-optional default-enabled keepalive.

josefx · 2024-05-10T07:43:50 1715327030

Now your connections start to randomly fail in production because the implementation defaults to 20ms and your local tests never caught that.

mort96 · 2024-05-10T08:50:33 1715331033

I'm sure there's some middle ground between "never time out" and "time out after 20ms" that works reasonably well for most use cases.

hi-v-rocknroll · 2024-05-10T05:02:33 1715317353

You're conflating all optional TCP features of all operating systems, network devices, and RFCs together. This lack of nuance fails to appreciate that different applications have different needs for how they use TCP: ( server | client ) x ( one way | chatty bidirectional | idle tinygram | mixed ). If a feature needs to be used on a particular connection, then use it. ;)