Four Key Features of a Performance 10GbE NIC

Behind processor, motherboard & memory selection, which 10GbE adapter to use is your fourth most critical design decision when selecting the hardware for your next server. Some products are focused on performance while others are commodities. With 10GbE, one of the major performance metrics is latency, a fancy way of saying the time it takes to get your data into & out of main memory. We’re going to look at four areas that highlight significant performance differences between these two approaches: transmit & receive queues, MSI-X interrupt vectors, Receive Side Scaling (RSS) queues, and physical/virtual functions (a foundational approach to supporting virtualized computing).

Just to review, a NIC receives information from an external network and places it into your server's memory; it also takes information handed to it and places that on the network. When designing for latency, one might look toward servicing the financial markets of the world, the folks who dollarize every nanosecond of the trading day. With this in mind, a performance NIC is best structured with 1,024 transmit & receive queues, or Virtual NICs (vNICs), connected to each 10GbE port. On the Ethernet controller chip one should then also place a layer-2 network switch in front of those 1,024 vNICs. This switch can use each packet’s VLAN tag to intelligently steer it to the vNIC assigned to that VLAN. In contrast, commodity NICs typically have only 128 receive/transmit queues attached to each port, or 1/8th of what is found on performance products.
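
To make the VLAN-based steering idea concrete, here is a minimal Python sketch of what that embedded layer-2 switch does conceptually: read the 802.1Q tag from an incoming frame and pick a vNIC queue for it. The vNIC count and the modulo fallback are illustrative assumptions, not the actual silicon logic.

    import struct

    NUM_VNICS = 1024        # per-port vNIC count on a performance NIC (per the article)
    vlan_to_vnic = {}       # VLAN ID -> vNIC index, as the embedded switch would maintain

    def steer(frame: bytes) -> int:
        """Pick a vNIC (receive queue) for a frame based on its 802.1Q VLAN tag."""
        # Bytes 12-13 hold the EtherType; 0x8100 means an 802.1Q VLAN tag follows.
        ethertype = struct.unpack("!H", frame[12:14])[0]
        if ethertype != 0x8100:
            return 0                      # untagged traffic lands on the default vNIC
        tci = struct.unpack("!H", frame[14:16])[0]
        vlan_id = tci & 0x0FFF            # low 12 bits of the TCI are the VLAN ID
        return vlan_to_vnic.get(vlan_id, vlan_id % NUM_VNICS)

    # Example: a frame tagged with VLAN 42 (zeroed MACs, 802.1Q tag, inner IPv4 EtherType)
    frame = bytes(12) + struct.pack("!HH", 0x8100, 42) + struct.pack("!H", 0x0800)
    print(steer(frame))                   # -> 42 with the default modulo mapping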

Next we have Message Signaled Interrupts (MSI-X), a very common way to inform the processor that data is waiting at an I/O device to be picked up. With PCI Express we no longer have dedicated hardware interrupt request lines, so I/O devices have to use a shared messaging interface to tell the processor data is waiting. Performance products typically support 1,024 MSI-X interrupt vectors compared to a commodity product's 128. Again, that is 1/8th of the underlying infrastructure needed to pass high-performance data to the host CPU complex.
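
If you want to see how many MSI-X vectors your adapter's driver has actually claimed, a quick way on a Linux host is to count the entries sysfs exposes for the device. A small Python sketch, assuming an interface name of eth0 and a driver that has enabled MSI or MSI-X:

    import os
    import sys

    def msix_vector_count(iface: str = "eth0") -> int:
        """Count the MSI/MSI-X vectors currently allocated to a network interface.

        Reads the Linux sysfs directory listing the device's message-signaled
        interrupts; requires the driver to have enabled MSI or MSI-X."""
        msi_dir = f"/sys/class/net/{iface}/device/msi_irqs"
        try:
            return len(os.listdir(msi_dir))
        except FileNotFoundError:
            return 0   # interface missing, or the device is using legacy interrupts

    if __name__ == "__main__":
        iface = sys.argv[1] if len(sys.argv) > 1 else "eth0"
        print(f"{iface}: {msix_vector_count(iface)} MSI/MSI-X vectors allocated")

The same vectors also show up in /proc/interrupts, one line per allocated vector.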

As computers moved from one processor chip to two, and from single-core chips to today's 18 cores per chip (Intel's latest Haswell CPUs), the challenge has always been mapping pathways from I/O devices directly to these processor cores. One of the most efficient mechanisms for linking cores to Ethernet receive queues is known as Receive Side Scaling (RSS). On Intel servers, PCI Express slots have an affinity for a specific CPU socket, so for optimal performance you align your 10GbE NICs with specific CPUs by the PCI Express slot you install them into. Suppose, for example, you have a state-of-the-art dual-socket Haswell server, and each socket has an 18-core processor. For optimal performance you might install two dual-port 10GbE adapters, one in a slot that maps to CPU socket 0 and the second in a slot that maps to CPU socket 1. With this approach you can achieve peak network performance on your server. There’s a problem, though: commodity NICs typically support only 16 RSS queues per port, so two cores on each of your sockets will see less than optimal performance because their traffic has to be routed through other cores. Performance NICs, on the other hand, have 64 RSS queues per port and can easily spread traffic over multiple paths to every core in your server. Up to this point, all of these numbers have been per port.
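
On Linux you can check both halves of this alignment, which CPU socket (NUMA node) a NIC's PCIe slot attaches to and how many receive queues its driver exposes for RSS, straight from sysfs. A small sketch, again assuming an interface name of eth0:

    import glob
    import sys

    def rss_summary(iface: str = "eth0") -> None:
        """Report which NUMA node (CPU socket) a NIC's PCIe slot attaches to and
        how many receive queues the driver has exposed for RSS."""
        dev = f"/sys/class/net/{iface}"
        try:
            numa_node = open(f"{dev}/device/numa_node").read().strip()
        except FileNotFoundError:
            numa_node = "unknown"
        rx_queues = len(glob.glob(f"{dev}/queues/rx-*"))
        print(f"{iface}: NUMA node {numa_node}, {rx_queues} receive queues")

    if __name__ == "__main__":
        rss_summary(sys.argv[1] if len(sys.argv) > 1 else "eth0")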

Today many servers rely on virtualization to fully utilize all the resources of the hardware. To support this, network adapters expose what are called Physical & Virtual Functions. Physical Functions (PFs) are a method for presenting to the Virtual Machine’s hypervisor what is essentially a complete physical instance of the network adapter; here performance NICs support 16 Physical Functions while commodity products only support two. Virtual Functions (VFs) are a method for creating fully virtualized NICs; here performance NICs support 240 while commodity ones only have 128. Note that these PF & VF numbers are per adapter.
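
These PFs and VFs are what Linux exposes through SR-IOV, and you can query the counts for your own adapter from sysfs. A short sketch, assuming an interface name of eth0 that is an SR-IOV-capable Physical Function:

    import sys

    def sriov_summary(iface: str = "eth0") -> None:
        """Show how many SR-IOV Virtual Functions a NIC's Physical Function supports
        and how many are currently enabled, via the standard Linux sysfs attributes."""
        dev = f"/sys/class/net/{iface}/device"
        try:
            total = open(f"{dev}/sriov_totalvfs").read().strip()
            active = open(f"{dev}/sriov_numvfs").read().strip()
        except FileNotFoundError:
            print(f"{iface}: SR-IOV not supported (or not a Physical Function)")
            return
        print(f"{iface}: {active} of {total} Virtual Functions enabled")

    if __name__ == "__main__":
        sriov_summary(sys.argv[1] if len(sys.argv) > 1 else "eth0")

Enabling VFs is then a matter of writing the desired count into sriov_numvfs as root.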

Earlier we mentioned that performance NICs were designed around latency, yet we haven’t put a number on it. The generic kernel device driver on performance NICs typically delivers sub-4-microsecond latency for a half round trip (a single send & receive combined). Generic drivers on commodity NICs often take more than double that. When it comes to overall server performance, network adapter latency really does matter.
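
If you want a rough feel for half-round-trip latency on your own hardware, a simple UDP ping-pong gives you the same send-plus-receive measurement, though through the full kernel socket stack, so expect numbers well above a tuned NIC benchmark. A sketch, assuming a UDP echo server is already listening at the target host and port:

    import socket
    import time

    def half_round_trip_us(host: str, port: int = 9000, samples: int = 1000) -> float:
        """Average half-round-trip time (RTT / 2) in microseconds against a UDP
        echo server: one send & one receive per sample, matching the article's
        definition of a half round trip."""
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.connect((host, port))
        payload = b"x" * 64
        start = time.perf_counter()
        for _ in range(samples):
            sock.send(payload)
            sock.recv(2048)          # wait for the echo before sending the next probe
        elapsed = time.perf_counter() - start
        return (elapsed / samples) / 2 * 1e6

    if __name__ == "__main__":
        print(f"{half_round_trip_us('127.0.0.1'):.1f} us half round trip")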

So next time you’re selecting components for a server deployment please consider what you’ve learned above about 10GbE NICs, and “Choose Wisely.”
