> You have to plan to handle the special cases or not be surprised that you can't handle them.
What do you propose then? You send valid traffic and get a completely broken response - I don't see any space left for handling special cases here. Like they said in the article, it's the first packet and you have no information about the other side yet. This is not something you can avoid or work around.
As soon as you think "I send a valid packet, so everything must work," you're missing the point. Even the OP worries at the end about the actual client communication not working. In the end it's not about who is "theoretically" right, it's about whether you can make it work given the real-world limitations, and those include implementations that were never tested against your "clever, better than the competition" dynamic timeout modification. It can be clever, but then be cleverer still and plan for the cases where it runs up against real-life limitations. And don't cry foul. It's you who moved into less tested territory, expecting to be better than the "competition".
Just as I don't complain about the herd reactions here: they are probable, and therefore expected.
Sigh... let me say this again clearly, because you keep repeating the same point. This case has nothing to do with timeout modification. It will happen with or without it. If you use a lower timeout you will run into the issue more often; with a higher timeout, less often. But you will run into the issue anyway, and there is no way to fix it.
So just tell us your proposed solution, or the implied more-tested territory. You try to start a connection, send a SYN, and don't see a response for X (500 ms, 1 s, whatever you choose). What do you propose to do that isn't either retransmitting (which breaks the other side) or choosing new sequence numbers (which breaks good behaviour on your side by starting two connections and possibly tripping flood detection if you do it too often)?
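To make the two options concrete, here is a minimal sketch of the dilemma. The helper names (`send_syn`, `wait_for_synack`) are hypothetical, not from the article; it just illustrates that after the initial SYN times out, the only moves are a same-ISN retransmit or a fresh attempt with a new ISN, and both have a downside against the peer described in the article.

```python
import random

def connect_with_timeout(send_syn, wait_for_synack, initial_timeout_s=0.5,
                         max_attempts=3, reuse_isn=True):
    """Illustrative only: try to establish a connection, showing the two options."""
    isn = random.randrange(2**32)             # initial sequence number
    timeout = initial_timeout_s
    for attempt in range(max_attempts):
        send_syn(isn)
        synack = wait_for_synack(timeout)
        if synack is not None:
            return synack
        if reuse_isn:
            # Option 1: classic retransmit with the same ISN. The buggy peer
            # from the article answers the duplicate SYN with a broken response.
            pass
        else:
            # Option 2: a new ISN per attempt. Now the peer may see two
            # half-open connections, and repeated fresh SYNs can look like
            # a SYN flood to middleboxes.
            isn = random.randrange(2**32)
        timeout *= 2                           # the usual exponential backoff
    return None
```

Neither branch avoids the problem; the choice of initial timeout only changes how often you end up in it.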
Yes, the buggy implementation is buggy in an idealistic sense. But it doesn't matter. The fact is that the bug was de facto not visible until the OP introduced the more "clever" (shorter timeout) code. If the node were truly, utterly problematic, it wouldn't still be on the internet.
What to do in this case? Well, what do you want to achieve? Want it to work with everything just the same? Then start with the 1 s timeout, like other TCP stacks do. That's what was tested, and your behaviour towards these nodes wouldn't differ from the rest of the internet. If you don't want to do that for all of your connections (because you'd lose the advantage over competitors), then be ready to introduce a list of nodes where you observe the behaviour and assign a different starting timeout to them (a rough sketch follows below). And so on. There is always a solution; the wrong approach is "it can't be solved because it's against the RFC." The solution is making something work under real-life limitations, not waiting for a world with no bad implementations (a state that's impossible to reach).
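A rough sketch of that "list of nodes with a different starting timeout" idea, under assumed names and values (the 250 ms default and the set-based lookup are made up for illustration): keep the aggressive default, but fall back to the conventional ~1 s initial SYN timeout for peers observed to mishandle early retransmits.

```python
AGGRESSIVE_INITIAL_TIMEOUT_S = 0.25    # the "clever" default (assumed value)
CONSERVATIVE_INITIAL_TIMEOUT_S = 1.0   # what common TCP stacks start with

# Populated from monitoring: peers seen responding badly to a retransmitted SYN.
known_problem_peers = set()

def initial_syn_timeout(peer_addr):
    """Pick the initial SYN timeout for a given destination."""
    if peer_addr in known_problem_peers:
        return CONSERVATIVE_INITIAL_TIMEOUT_S
    return AGGRESSIVE_INITIAL_TIMEOUT_S

def record_broken_handshake(peer_addr):
    """Call when a peer is observed mishandling a retransmitted SYN."""
    known_problem_peers.add(peer_addr)
```

The point isn't this exact mechanism, it's that the policy can adapt to observed peers instead of treating the RFC violation as unsolvable.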
BTW, I have the impression we don't understand each other because you take a "mathematical" approach (find one counterexample, my theorem is disproved, throw my hands in the air in helplessness). I'm an engineer. There's real life out there. Everything you can imagine will have a real-life counterexample. Deal with it. Whatever you do, it's your decision: ignore it, adapt, whatever, it depends on what you want to do. Just don't settle on "the problem is that others are wrong, while I'm right and that's it." Unless you want to do this for show. But then allow me to claim it's just a show.
Imagine Google declaring around 2000: "we won't index badly formed HTML pages, because they are against the standard and our XML processor would never work with them." There wouldn't be a Google today.