I've been looking at Intel's DPDK recently [1], and it looks really interesting ...

pktgen · on Feb 23, 2014

> 1. Data Plane Development Kit, which lets you skip the kernel IP stack (which takes thousands of CPU cycles to process) and do packet processing in userland taking just tens to hundreds of cycles per packet. http://dpdk.org/

I wish OS developers saw this as a problem. There is no reason kernel stacks should be so slow for tasks where all processing is done in the kernel. (For packets destined to userspace, you've got the syscall overhead to deal with.)

I recently tested the Linux network stack's PPS performance with an Intel X520 10GbE NIC. I used Debian testing, with the 3.12 kernel. My destination machine was an i7-3930K at stock speed. I wrote a simple kernel module adding an NF_INET_PRE_ROUTING hook (the in-kernel name for the old NF_IP_PRE_ROUTING) returning NF_DROP with no processing, which would be the simplest possible code path. For a packet generator, I used another older machine with another X520, using the "pfsend" tool included with PF_RING, and the card in PF_RING DNA mode. That was easily able to saturate the link at line rate (14.8M PPS).
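
For reference, the 14.8M PPS line-rate figure follows from minimum-size Ethernet frames: each 64-byte frame also occupies 20 bytes on the wire for the preamble, start-of-frame delimiter and inter-frame gap. A rough sketch of that arithmetic in Python (the function and constant names are just illustrative, not from any tool mentioned here):

```python
# Per-frame wire overhead on Ethernet:
# 7-byte preamble + 1-byte start-of-frame delimiter + 12-byte inter-frame gap.
WIRE_OVERHEAD_BYTES = 20


def line_rate_pps(link_bps: float, frame_bytes: int = 64) -> float:
    """Theoretical max frames/s for back-to-back frames of the given size (FCS included)."""
    bits_per_frame = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_bps / bits_per_frame


if __name__ == "__main__":
    # 10GbE with minimum-size (64-byte) frames
    print(f"{line_rate_pps(10e9) / 1e6:.2f} Mpps")
```

With larger frames the packet rate drops quickly, which is why PPS tests like this one are normally run with 64-byte packets.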

The result: the kernel was only able to sustain about 2.8M PPS.
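
To put that number in terms of the "cycles per packet" framing from the quote, here is a back-of-the-envelope sketch. It assumes the i7-3930K's 3.2 GHz base clock, ignores turbo, and pretends all of the work lands on a single core, which I haven't verified for this test; the helper name is made up for illustration.

```python
CLOCK_HZ = 3.2e9  # assumption: i7-3930K base clock, turbo ignored


def cycles_per_packet(clock_hz: float, pps: float) -> float:
    """Average CPU cycles available (or spent) per packet on one core at a given packet rate."""
    return clock_hz / pps


if __name__ == "__main__":
    # Rate the kernel sustained in the test above
    print(f"kernel at 2.8 Mpps: {cycles_per_packet(CLOCK_HZ, 2.8e6):.0f} cycles/packet")
    # 10GbE line rate with 64-byte frames
    print(f"line rate at 14.88 Mpps: {cycles_per_packet(CLOCK_HZ, 14.88e6):.0f} cycles/packet")
```

Under those assumptions, keeping up with line rate on one core leaves a budget of roughly a couple hundred cycles per packet, while the measured kernel rate corresponds to over a thousand.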

I then loaded the DNA driver on the destination machine and ran the included "pfcount" tool. It showed no packet drops: it was receiving the full 14.8M PPS.

I tested DPDK recently and had similar results.

I also modified the Linux ixgbe driver's ixgbe_clean_rx_irq() function, adding a step in between the "fetch packets from RX ring" and "put packet in SKB and send to network stack" stages. Even when I added a bunch of useless comparisons for each packet, I was able to get ~12-13M PPS. I could get line rate by just dropping the packets at that point and not doing any processing.

rdl · on Feb 23, 2014

I wonder if CloudFlare does any of this.

pktgen · on Feb 23, 2014

They do. They use Solarflare's OpenOnload: http://blog.cloudflare.com/a-tour-inside-cloudflares-latest-...

rdl · on Feb 23, 2014

Those look nice, but presumably Intel NICs would be an order of magnitude cheaper, especially if built into the chipset.

pktgen · on Feb 23, 2014

Definitely. I am not sure what metrics Solarflare beats others (presumably including Intel) on, besides software support (OpenOnload implements the BSD sockets API and doesn't require software changes, while DPDK and others implement their own APIs). In my testing, the X520s have been capable of both sending and receiving at line rate.

It's possible the Solarflare NICs could be better at some metric like latency, by a matter of microseconds (only speculating here), but I can't see how they'd consider that worth the extra money for HTTP servers.

rdl · on Feb 23, 2014

I wonder how low the price of an Atom C2358-based system with 4x Intel Gig-E and either mini-PCI or some other solution for wireless would be. That would probably be what I'd build a "high end" home router, or low-end business router, on -- you'd still be able to stay within the PoE+ power budget (802.3at-2009, about 25W).

DPDK supports Atom. This Atom has both virtualization and AES-NI, so you could do some interesting security things (the FireEye-style "run malware in a VM", and line-rate crypto).

~$120 in parts means a retail price around $500. That doesn't seem at all unreasonable to me, but there's always the "sell it for $250 and make $5-10/mo MRC" option. I think Cisco/Linksys tried that but it was pretty weak.