Peer-to-Peer Communication Across Network Address Translators (2005) (brynosaurus.com)
119 points by alanfranz on Aug 17, 2017 | 45 comments



We've been doing this at Vonage since 2001. This was our "secret sauce", since at the time there wasn't a well documented approach or an industry standard for how to do this. Eventually SIP-through-NAT papers and RFCs did appear, but we already had our way of doing it that was just as effective.

We found that renewing the NAT entry at a rate of once per 20 seconds was good enough. This made it work through all the routers and gateways we came across.

EDIT: SIP is short for Session Initiation Protocol. It's the protocol used to connect VoIP phone calls. We used UDP. SIP's standard ports are 5060 and 5061. We had to switch to port 10,000 (any port other than the standard ones would do) because too many high end routers and gateways would manipulate our packets. These routers were trying to implement SIP through NAT on their own. I applauded their efforts but it caused problems for us. I still remember the day that I took it upon myself to change production to use port 10,000. I was really scared I was going to break something but everything worked out in the end.


I used to be a Vonage customer (service was nice, as long as my Cox internet was working - but I shut it off after my wife and I finally both had cell phones - this was at least a decade or more ago).

I had always wondered how you guys were doing it, but I had always thought that the router Vonage handed out was something special that used its own reserved ports or something that wasn't easily visible. I never looked into it deeper. It's interesting that instead something like this "hole punching" technique was used.

I've heard about the technique before, but never investigated it before...


The solution was very simple. It came down to this.

- SIP end points register every so often. Each registration had a TTL. We set the TTL to 20 seconds. This actually leads to a registration every 16 seconds, since SIP end points register at 80% into their TTL.

- When a SIP end point registers, do not register the end point at the IP:port that it is advertising in the SIP Registration message. Instead use the source IP and port of the IP/UDP packet/datagram.

- Inbound calls to that SIP end point must go through the server that the SIP end point registered with and it must use the destination port (10,000) as the source port. We need to match the NAT entry in the router.

- When a phone call comes in for that end point, send the SIP Invite message to source IP:port of the SIP end point. Since the device is registering every 16 sec, there is sure to be a NAT entry in the router to allow our packets into the network and routed to the SIP end point. It is vital that we use the same server that the SIP end point registered with.
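
A minimal Python sketch of the registration logic in the bullets above. It is an illustration, not Vonage's actual code: the names `Registrar`, `Binding`, `invite_target` and `refresh_interval` are hypothetical, and no real SIP parsing is done. It only shows the two ideas described: store the datagram's source IP:port instead of the advertised contact, and refresh at 80% of a 20 second TTL.

```python
from dataclasses import dataclass

REGISTRATION_TTL = 20    # seconds, the TTL mentioned in the comment
REFRESH_FRACTION = 0.8   # end points re-register 80% of the way into the TTL
SERVER_PORT = 10000      # the non-standard port the end points register to


@dataclass
class Binding:
    nat_ip: str        # source IP seen on the UDP datagram
    nat_port: int      # source port seen on the UDP datagram
    expires_at: float  # time at which the registration lapses


class Registrar:
    """Hypothetical registrar that trusts the packet source, not the SIP body."""

    def __init__(self):
        self.bindings = {}

    def register(self, user, advertised, packet_source, now):
        # `advertised` is the private IP:port from the SIP message; it is
        # deliberately ignored because it is unreachable from outside the NAT.
        ip, port = packet_source
        self.bindings[user] = Binding(ip, port, now + REGISTRATION_TTL)

    def invite_target(self, user, now):
        # Returns (destination, source_port) for an INVITE, or None if the
        # registration has expired. The source port must equal the port the
        # end point registered to, so the packet matches the NAT entry.
        binding = self.bindings.get(user)
        if binding is None or binding.expires_at <= now:
            return None
        return (binding.nat_ip, binding.nat_port), SERVER_PORT


def refresh_interval(ttl=REGISTRATION_TTL):
    # 20 s * 0.8 = 16 s between registrations
    return ttl * REFRESH_FRACTION
```

For example, an end point advertising 192.168.1.10:5060 whose datagram arrives from 203.0.113.7:40211 would be stored under the latter address, and an INVITE for it would be sent there from port 10000.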


Everything is simple if you know what you are doing. Kudos! I had my own share of adventures with SIP across NAT so I know it must have been lots of experimentation and studying to find this simple solution.


It sounds like they are describing "symmetric RTP" which is still a common way to get SIP and RTP to work. Another way is to use a VPN and route the blasted things - which was/is a classic cop out with a gloss of security added in. Then you've got to fix the pain of subtly incompatible IPsec implementations instead. Strangely enough all VoIP phones with a VPN built in (that I know of) use OpenVPN and not IPsec. OpenVPN uses a single port which may be 1194/udp or not.

Another "fix" to SIP - a protocol that works just as successfully over NAT as ftp (mmm active or passive? 1:1 NAT bodge/fix? etc etc) - is to mix the control and data into one stream: eg IAX(2). IAX uses a single well known destination port (4569/udp) which is as easy to implement as DNS or NTP, which are both generally UDP but DNS can require TCP, through a NAT. IAX can make a fine

Sometimes folk decide to run SIP over TCP instead of UDP and may even decide to run that inside a TCP tunnel eg OpenVPN over TCP. That's fine until you discover how TCP inside TCP can result in exponential standoffs when network conditions are sub-optimal. That can really bugger up voice traffic that really needs sub 100ms point to point latency to be acceptable. Then there is dealing with packet fragmentation if changes of media require it and of course the fact that TCP packet streams can arrive out of order and be re-assembled. That latency won't worry your web browser but it makes voice sound horrid.


Lots of confusion on this thread. SIP never handles actual voice traffic. It is just a signaling protocol used to initiate the ICE process of actually establishing the connection, either a direct p2p connection or a relayed connection using TURN when the NATs can't be traversed. That's when traffic is typically UDP, as it's easier to traverse NATs with UDP. The transport SIP uses is largely irrelevant, although it's pretty silly for it to use anything but TCP because reliability of sending the offer and the answer is pretty important, and they can be sent in multiple packets.


SIP is just the signaling protocol. The actual NAT traversal really takes place in ICE, which uses STUN. ICE does the heavy lifting though.


Strange to see this relatively old article appear. NATs are still with us. The main progress has been WebRTC. Otherwise, researchers who looked at TCP at the time (e.g., STUNT) for simultaneous TCP SYN packets failed. It was too hard to synchronize timing. Most middleware still ignores NATs and hopes they will go away (they won't). NATs are the reason REST won over CORBA et al.


"NATs are the reason REST won over CORBA et al."

CORBA "et al" lost because they are fragile, not because of network issues. SOAP, for instance, is no more difficult from a connectivity perspective than REST; it rides on the same TCP/HTTP rails as REST. Yet hardly anyone, given the choice, will choose SOAP today.

Not that REST (really just HTTP APIs, mostly) is all that robust; it just asks far less of everyone involved to specify, coordinate and troubleshoot. You can hit an endpoint with your browser, pop open the browser developer tools and figure out what is going on. That _ease of use_, which CORBA, SOAP, "et al" completely forgo, lowers cost, speeds development and causes an order of magnitude less nausea for developers. The market has proven the value of these benefits, despite what holdouts wish to believe.


   > The main progress has been WebRTC
Yes, it is a kind of combination of the best approaches to NAT traversal from the SIP world, supporting both STUN & TURN and ICE with the Trickle addon.

Some very good references on the topic I found recently: https://webrtchacks.com/ and https://www.html5rocks.com/en/tutorials/webrtc/infrastructur....


The WebRTC stuff was based on libjingle originally. We built that as part of the Windows Google Talk client. It was all based on XMPP and not SIP.

The techniques were used in many contexts at the time.


Yeah exactly. The signaling protocol doesn't really matter, and both SIP and XMPP to me are way over engineered and super inefficient; it's silly that they're text based at all. Essentially like HTTP 1.0 and 1.1 vs HTTP/2 - we just don't have the latter in the signaling case yet, or at least not standardized that I know of.


NATs are the reason REST won over CORBA et al.

I don't think NAT had an awful lot to do with that.


CORBA just didn't work across organizations' internal networks when NATs started appearing in the late 90s. I am convinced that otherwise it would have had such a lead it would have dominated. Web services appeared to rectify NATs/CORBA, but it sucked as a standard. Then came REST to fix web services. The root cause was NATs, IMO.


While NATs might have contributed to its demise, there were more fundamental reasons why CORBA failed.

For anyone interested in thorough analysis instead of opinions I recommend this really good article from 1994:

https://github.com/papers-we-love/papers-we-love/blob/master...


The root cause was not NAT. Web Services were CORBA-inspired so that's a pretty solid data point that it wasn't NAT.

You might enjoy this for a laugh, though (Aug. '96 vintage).

https://i.imgur.com/iVDUypy.jpg


I wonder why this comes up here. This is a quite old method. I think it is used in many peer-to-peer applications. I'm quite sure that Skype has used it also in the past.

We have also implemented that in our game OpenLieroX. We call it UDP NAT traversal if you search for it in the code.

Main code: https://github.com/albertz/openlierox/

UDP master server: https://github.com/albertz/openlierox/tree/0.59/tools/UDPMas...


Because, while old, it's still used and useful. And a lot of (relatively) young people don't know about it.


And consequently they don't think end-to-end communication is feasible without "WebRTC"?


Sigh.

WebRTC just uses hole punching, and its implementation (ICE+STUN+TURN) is as over-engineered as you would expect from a modern protocol stack.


Successful peer-to-peer networking goes back many years. I would guess gamers solved it first, but I am not sure. Could be a fun project to try to document all the history, if that is even possible.

1999: http://alumnus.caltech.edu/~dank/peer-nat.html

I have seen reliable, fast overlay networks accomplished with a relatively small amount of code. As a result, this is where I set the bar. The more voluminous the project, the more quickly I lose interest.


I, for one, am glad it came up. I've been fretting about how to do such a thing for a multiplayer game I'm working on, but couldn't remember the name of the technique to look up. Thanks to the post and the comments here, I have plenty to use for a solution.


Adam Ierymenko (@api here and reddit) has a modern and concise take on NAT

https://www.zerotier.com/blog/state-of-nat-traversal.shtml


That's still the blog article that gets the most hits. I wrote it in response to someone on HN claiming that "you can't do peer to peer because of NAT" when we were doing it.

There are the standard ways -- hole punching as described there, UPnP/NAT-PMP, etc. -- and then there is a long tail of hacks for getting to that 90%+ success rate you want. Here's an incomplete list:

* Port prediction for symmetric NATs that increment ports

* Creating multiple sockets to work around buggy NATs that can't hole punch if multiple devices have the same local port, etc. There are also buggy NATs where hole punching breaks if you also open a port with NAT-PMP or UPnP. There are a lot of buggy NATs. (Factoid: we have found zero correlation between the cost of a NAT device and its buggitude.)

* Getting the timing as tight as possible -- we use a triangular three-party rendezvous method to try to get it so both sides see a send before the "reply."

* Sending UDP packets with incrementing TTLs ahead of the real hole punch message to open multiple layers of NAT, and randomizing the few-byte payload of those low-TTL packets because some NATs care about this. Yes people plug NATs into NATs into NATs. It's pretty common. Ugh. As long as none are symmetric it works.

* If at first you don't succeed try try again... forever. For long-lived links (common with ZeroTier since they are virtual LANs) if you keep trying and include some random port trials you will eventually punch a symmetric NAT. There are only 65535 ports.
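
A rough Python sketch of the first and last items in the list, port prediction plus a few random trials. This is not ZeroTier's code: `candidate_ports` and `wrap` are hypothetical names, and the sketch assumes a rendezvous server has already reported the peer's last one or two external ports (at least one is required).

```python
import random

LOW, HIGH = 1024, 65535  # assumed range from which the NAT allocates ports


def wrap(port):
    # Keep a predicted port inside LOW..HIGH, wrapping around at either end.
    return LOW + (port - LOW) % (HIGH - LOW + 1)


def candidate_ports(observed, count=5, random_trials=3, rng=None):
    """Guess which external ports the peer's NAT may use next.

    observed: external ports seen by the rendezvous server, oldest first.
    """
    rng = rng or random.Random()
    last = observed[-1]
    # The step between the two most recent mappings; negative if the NAT
    # decrements, zero if it keeps reusing the same port.
    step = observed[-1] - observed[-2] if len(observed) >= 2 else 0
    if step == 0:
        predicted = [last]
    else:
        predicted = [wrap(last + step * i) for i in range(1, count + 1)]
    # A handful of random ports for the "try, try again" long tail.
    extras = []
    while len(extras) < random_trials:
        port = rng.randint(LOW, HIGH)
        if port not in predicted and port not in extras:
            extras.append(port)
    return predicted + extras
```

With observations 40000 then 40002, the predicted ports start at 40004, 40006, 40008; the caller would send hole punch packets to each candidate in turn.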

ZeroTier still has to relay though if both parties are behind hostile NATs, but we can provide free relaying for everyone since it's only a small percentage of traffic and cloud bandwidth is cheap.

Edit: there are more sophisticated methods that work more of the time, but those unfortunately can trip intrusion detection systems. We try to walk the line with ZeroTier between high success and not tripping your IDS.

Edit #2: wow there's a lot of outright nonsense on this thread. I continue to be saddened by how many developers (even top notch ones) know so little about networking and have so many misconceptions about it. Networking is still such a black art. Of course if it wasn't maybe there would not be a niche for us. :)


That's a good list of hacks I've come across. But I thought we stopped using the 'symmetric NAT' terminology a decade ago when the IETF BEHAVE specification came out? https://tools.ietf.org/html/draft-ietf-behave-rfc3489bis-18 The specification classifies NAT behaviour along three dimensions: (1) when a port mapping can be reused (for any outgoing connection, for an outgoing connection to the same IP, or for an outgoing connection to the same IP:port endpoint), (2) what port is allocated when a new mapping is created (the same as the private port if possible, a contiguous port generated by a global counter, or a random port), and (3) how incoming packets are filtered/dropped based on the source IP:port (not filtered, OK from the same IP, or must be from the same IP:port). Most NATs can be described using some combination of these behaviours. See this paper for details: https://www.hivestreaming.com/natcracker-nat-combinations-ma...
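
One way to write that three-part classification down as a data structure, sketched in Python. The type names (`Dependence`, `Allocation`, `NatBehaviour`) are hypothetical. The `legacy_name` helper, which maps a combination back to the older cone/symmetric labels, is an addition for illustration following the commonly cited correspondence, not something the comment or the linked draft states in this form.

```python
from dataclasses import dataclass
from enum import Enum


class Dependence(Enum):
    # Used for both mapping reuse (1) and inbound filtering (3).
    ENDPOINT_INDEPENDENT = "any remote endpoint"
    ADDRESS_DEPENDENT = "same remote IP"
    ADDRESS_AND_PORT_DEPENDENT = "same remote IP:port"


class Allocation(Enum):
    # How a new external port is chosen (2).
    PORT_PRESERVING = "same as the private port if possible"
    CONTIGUOUS = "next port from a global counter"
    RANDOM = "random port"


@dataclass(frozen=True)
class NatBehaviour:
    mapping: Dependence
    allocation: Allocation
    filtering: Dependence


def legacy_name(nat):
    # Assumed correspondence to the older terminology.
    if nat.mapping is not Dependence.ENDPOINT_INDEPENDENT:
        return "symmetric"
    return {
        Dependence.ENDPOINT_INDEPENDENT: "full cone",
        Dependence.ADDRESS_DEPENDENT: "restricted cone",
        Dependence.ADDRESS_AND_PORT_DEPENDENT: "port-restricted cone",
    }[nat.filtering]
```

Three values on each of three axes gives 27 combinations, which is why the single word "symmetric" loses information: it says nothing about how the ports are allocated, and that is what decides whether port prediction can work.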


Implementing a TCP/IP stack used to be a rite of passage. I think the fact that everything is now layered on top of HTTP is what causes the problems observed in this thread. In a sense HTTP replaced TCP: even though it sits at a higher layer, people are shoehorning everything into it because it tends to be passed on transparently if a proxy or some other gateway doesn't know what to do with the payload.


First, Zero Tier is fantastic. Second, there is tons of nonsense here. When I wrote Firestr (http://firestr.com) I used the simplest thing that could work, udp hole punching, and it works most of the time. Zero Tier does even more magic and it just works. P2P is possible, but the art is almost lost. We must keep it alive!


Reading that, I was prompted (translation: nerdsniped) into thinking about the Same Network Problem: how do you identify two machines that are actually on the same LAN and can communicate without NAT, without the risk of revealing any information about your LAN to a third party? (See the article linked above for a fuller explanation of why this is nontrivial.) Here's my plan:

The client enumerates all the machines it can see on its local LAN, and for each one, assembles a list of information about it, probably something like IP address, machine name, assigned user, maybe MAC address, etc. Hash this using SHA256 or something (unlike the suggestion in the ZeroTier article about IP addresses, this approach has enough information that brute force reversal is out of the question). Assemble a list of, say, 20 of these hashes - the client's own hash is first, followed by the rest of the LAN. If you have fewer than 20 nodes, fill out the list with random numbers; if you have more than 20, include yourself and 19 others at random (it's important that you pick them at random, not just the first 19 in your LAN listing). Send the list to the server - this reveals nothing about your LAN to it.

The server compares the hash lists sent by clients. If any two lists have any hashes in common, send a message to each of the clients that sent those lists: "You seem to be on the same LAN as the client with hash XYZ." The client can then look up the hash in its local table, and try to contact the corresponding node. (The list intersection check is why it's important to pick your sample at random - the birthday paradox is now working for us, and there's a good chance of at least one match even if your LAN has hundreds of nodes.)
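
Here is roughly what that plan could look like in Python, as a sketch only. The function names (`machine_hash`, `build_hash_list`, `shared_hashes`) and the exact fields hashed are assumptions; LAN enumeration itself is left out and the machine records are passed in as tuples.

```python
import hashlib
import random
import secrets

LIST_SIZE = 20  # own hash plus up to 19 others, as proposed above


def machine_hash(ip, name, user, mac):
    # Join the fields with a separator that cannot appear in them, then hash.
    record = "\x1f".join((ip, name, user, mac)).encode("utf-8")
    return hashlib.sha256(record).hexdigest()


def build_hash_list(own, neighbours, rng=None):
    """Client side: own hash first, then a random sample of the LAN, padded."""
    rng = rng or random.SystemRandom()
    others = [machine_hash(*n) for n in neighbours]
    if len(others) > LIST_SIZE - 1:
        # Sample at random so two clients on a big LAN are likely to overlap.
        others = rng.sample(others, LIST_SIZE - 1)
    # Pad short lists with random values that look like SHA-256 digests.
    padding = [secrets.token_hex(32) for _ in range(LIST_SIZE - 1 - len(others))]
    return [machine_hash(*own)] + others + padding


def shared_hashes(list_a, list_b):
    """Server side: any common hash suggests the two clients share a LAN."""
    return set(list_a) & set(list_b)
```

If `shared_hashes` returns a non-empty set, the server would tell each client the other's first list entry (its own hash), which the client can look up in its local table.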



Very good stuff, on paper. I couldn't make it work, by the way, from Mac to Linux. I'll try again in a different scenario.



Keep in mind that this paper was outdated the moment it was published and announced on p2p-hackers mailing list [1].

It doesn't deal with the "long tail" of NAT devices that increment/decrement ports in a (somewhat) predictable manner and _most importantly_ it describes hole punching as a client-driven process.

The latter is the crucial point. By tasking a dedicated (mediating) server with coordinating the punching sequence it becomes possible to time the process much more precisely and to help predictions to actually match the reality. Combined with somewhat smarter port prediction it brings the success rate from 80% to 95-97%... or at least it did 10 years ago when I was using this for Hamachi P2P VPN, though I suspect that very little has changed in terms of NAT type distribution since then.

[1] http://copilotco.com/mail-archives/p2p-hackers.2005/msg00126...


I didn't read the whole thing so maybe it's addressed, but from what I understand it sends packets to both private and public IPs of the person you want to talk to, and use the first one you get an answer from.

What if the private IP from the other peer is in another NAT, but in your local NAT, you have another peer with that same private IP? He would answer you, and you would establish communication with the wrong peer?

Probably would need an extra step to validate the peer's public IP address is also the same?


Yes, it addresses that concern, and it proposes that the packets sent should be validated using some previously agreed method.
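
The paper leaves the validation method open. One possible form, sketched in Python, is a challenge-response over a key both peers obtained beforehand (for instance via the rendezvous server). The use of HMAC-SHA256 and all function names here are assumptions for illustration, not what the paper specifies.

```python
import hashlib
import hmac
import secrets


def make_challenge():
    # Fresh random nonce so an old answer cannot be replayed.
    return secrets.token_bytes(16)


def answer_challenge(shared_key, challenge, peer_id):
    # Only a host holding shared_key can compute this for the given peer_id.
    return hmac.new(shared_key, challenge + peer_id, hashlib.sha256).digest()


def verify_answer(shared_key, challenge, peer_id, answer):
    expected = answer_challenge(shared_key, challenge, peer_id)
    # Constant-time comparison to avoid leaking how many bytes matched.
    return hmac.compare_digest(expected, answer)
```

A stranger on your own LAN who happens to own the same private IP would not hold the key, so its reply would fail `verify_answer` and you would keep waiting for the real peer on the public address.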


If I could piggyback on the topic here, can anyone speak to the practical usage of these (or other) techniques on mobile devices? Are there working peer-to-peer apps?


This is a wonderfully succinct description of the topic and probably the most approachable explanation I have read. Really interesting that this was written in 2005 since the concepts described form the basis of how the later WebRTC and its supporting infrastructure STUN/TURN work.

As brilliant a hack as hole punching is I hope someday we can move to a world of IPv6 where it is no longer necessary.


Overlay networks are a good solution to NAT, I think. I'd say they should be anonymous, but something that's just private like cjdns is pretty good too. There are some performance and latency issues to deal with, but when you overlay you don't have to wait for network operators to fix NAT.


IPv6


This technique will remain useful with IPv6. NAT will disappear, but the stateful firewalls (which are often combined with NAT) will remain.


Yes, but IPv6 helps a ton. Hole punching with IPv6 is almost 100% successful as long as your firewall allows it (and most do unless you are whitelist-only). IPv6 kills symmetric NAT and over-provisioned NAT.


This is the method Vizio TVs use to register with the Inscape mothership to report to the system what is being watched on every Vizio TV in existence.


Why though? Was the Inscape mothership behind a NAT they didn't control?


No, the TV needs to initiate the connection to the mothership to punch the hole. This then allows the mothership to send commands to the TV over UDP to manage its activities.


That's not P2P hole punching. That's just a UDP to a cloud server. Stateful firewalls remember UDP sessions for a period of time.

UDP hole punching is when both parties are behind NAT and it's used almost exclusively for P2P: torrents, VoIP phones, video chat, multiplayer games, ZeroTier, WebRTC, IPFS, etc.
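
For anyone who hasn't seen it, the client side of a punch can be sketched in a few lines of Python. The sketch assumes a rendezvous server has already told each peer the other's public IP:port; the `punch` function and the `PROBE` payload are hypothetical names. Both peers run the same loop at roughly the same time, so each side's outbound probe opens the NAT entry that lets the other side's probe in.

```python
import socket

PROBE = b"punch"  # placeholder payload; a real protocol would authenticate it


def punch(sock, peer_addr, attempts=5, timeout=0.2):
    """Send probes to peer_addr until a datagram arrives from that address."""
    sock.settimeout(timeout)
    for _ in range(attempts):
        # The outbound packet creates (or refreshes) our own NAT mapping.
        sock.sendto(PROBE, peer_addr)
        try:
            _data, addr = sock.recvfrom(1024)
        except socket.timeout:
            continue  # nothing got through yet, probe again
        if addr == peer_addr:
            return True  # the peer's packet made it through: hole is open
    return False
```

The socket passed in should be the same one used to talk to the rendezvous server, since the NAT mapping the server observed belongs to that local port.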


tl;dr: UDP Hole Punching using a server.



