Yes, an Octadeca Core Processor

Before you head off and Google ‘octadeca’, rest assured that this post is NOT another rant along the lines of “why on earth do we need quad-core processors in mobile phones?” Which is not to say that I am done with that particular line of thought; on the contrary, I’ll most certainly revisit it before the end of the year. But I digress.

The conversation now underway continues my recent thread toward servers, datacenters and cloud computing. I am somewhat surprised that the semiconductor device under discussion has not been tagged “cloud computing on a chip” by some overly clever marketing person. Anyone who manages to put EIGHTEEN 64-bit CPU cores on a single, monolithic die in 2014 has earned the right to a little hyperbole.

Intel recently announced the mother (for now) of all server chips in the Xeon E5-2699v3. Note to the clever marketing people at Intel: I don’t know if you’ve actually got a numbering scheme here, but I do know one thing … you seem to have missed the one number I would want jammed in there, namely the number of cores. 18 full Haswell cores running at a 2.3 GHz base clock, capable of running up to 3.6 GHz (under careful on-die supervision, so as to avoid melting through the concrete floor of the datacenter). In addition to the CPUs, there are a few other odds and ends:

  • Four memory controllers, supporting DDR4-2133 memory. For those playing along at home, that’s an aggregate memory bandwidth of roughly 68 GBps (the quick arithmetic follows just after this list).
  • A pair of QPI links, each running at 9.6 GTps (giga-transfers per second).
  • Not one, but TWO ring buses; each ring bus is actually a PAIR (one running clockwise, the other anti-clockwise) to reduce latency; a toy model of why that helps also follows after this list. And all of this is connected by a pair of buffered switches: the damn thing has got its own NETWORK on die to keep the processors fed.
  • Sit down and take a few deep breaths before reading the next bullet.
  • 45 MB of L3 cache. On die. It’s actually two separate blocks: a 25 MB L3 cache serving 10 cores and a 20 MB L3 cache serving the other 8.
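
Here is the quick arithmetic behind that 68 GBps figure, as a minimal Python sketch; the only assumption is the standard 64-bit (8-byte) data path per DDR4 channel:

```python
# Back-of-the-envelope aggregate bandwidth for four channels of DDR4-2133.
transfers_per_sec = 2133e6    # DDR4-2133: 2133 mega-transfers per second per channel
bytes_per_transfer = 8        # 64-bit data bus per channel
channels = 4

bandwidth_gb_s = transfers_per_sec * bytes_per_transfer * channels / 1e9
print(f"Aggregate memory bandwidth: {bandwidth_gb_s:.1f} GB/s")  # ~68.3 GB/s
```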

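As for the ring buses I teased above: here is a toy model, purely my own sketch and not Intel’s implementation, of why adding an anti-clockwise ring roughly halves the worst-case hop count:

```python
# Toy model of hop counts on an N-stop ring interconnect.
# A single unidirectional ring forces some traffic the "long way" around;
# a clockwise/anti-clockwise pair lets each message take the shorter arc.
def hops_unidirectional(src: int, dst: int, n: int) -> int:
    return (dst - src) % n

def hops_bidirectional(src: int, dst: int, n: int) -> int:
    forward = (dst - src) % n
    return min(forward, n - forward)

n = 10  # e.g., treating each core/cache stop on the larger ring as one hop
print(max(hops_unidirectional(0, d, n) for d in range(n)))  # worst case: 9 hops
print(max(hops_bidirectional(0, d, n) for d in range(n)))   # worst case: 5 hops
```
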
Now I am going to date myself here something awful: the very first server I personally configured was an IBM PC/AT running Novell NetWare. This was HOT STUFF at the time, delivering minicomputer-class networking at a tiny fraction of minicomputer-class cost; many of you will be unfamiliar with the term ‘minicomputer’, so now would be a good time to move on to the point at hand. NetWare supported disk mirroring (it would be two more years before Patterson et al. brought the term ‘RAID’ to the popular vernacular) for high availability, which was especially valuable given the MTBF of spinning media at the time. Finally, the punchline: this très chic server was built with a pair of leading-edge 40 MB hard drives. Yup, the Xeon E5-2699v3 has more cache memory than my server had storage.

Dual socket Xeon motherboards are extremely common in today’s massive cloud computing datacenters. As a matter of fact, they are the compute node of choice for Amazon Web Services (AWS). These being proper Haswell cores, they bring the full gamut of virtualization features plus hyperthreading. So a single AWS motherboard can now carry THREE DOZEN physical cores, which appear to the hypervisor as SIX DOZEN logical processors.
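
The arithmetic, for anyone keeping score (two hardware threads per core is standard hyperthreading):

```python
# Dual-socket Haswell-EP node: physical cores vs. what the hypervisor sees.
sockets = 2
cores_per_socket = 18
threads_per_core = 2   # Hyper-Threading

physical_cores = sockets * cores_per_socket              # 36
logical_processors = physical_cores * threads_per_core   # 72
print(physical_cores, logical_processors)
```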

OK, there is one number that may be more impressive than the 45 MB (!!!) of on-die cache. Due in no small part to said cache, the die size is 662 sq-mm. If the die were square, that would be just over an inch on a side; very large, to be sure, but perfectly manufacturable. Speaking of which, this bad boy is coming off a mature 22nm line, so yields should be solid.
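
For the skeptics, the die-size arithmetic, assuming a perfectly square die (which real dies never quite are):

```python
import math

# 662 mm^2 die, assumed perfectly square: edge length in mm and inches.
area_mm2 = 662
edge_mm = math.sqrt(area_mm2)   # ~25.7 mm
edge_in = edge_mm / 25.4        # ~1.01 inches
print(f"{edge_mm:.1f} mm per side, i.e. {edge_in:.2f} inches")
```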

Then there is the matter of power. 18 Haswell cores with the assorted bulleted goodies carry a 145 W TDP … while that is a lot of power, it is VERY manageable in a 2U rack chassis. I’ve studied a pile of benchmarks and one in particular caught my eye: power consumption normalized to compute horsepower. Long story short (as if that were possible at this juncture), the E5-2699v3 runs 12% cooler on normalized compute horsepower than its immediate predecessor, the E5-2697v2 (with a piddling 12 cores) … and both of these devices are on the same 22nm process.
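
What does “runs cooler on normalized compute horsepower” actually mean? The normalization I have in mind is simply watts divided by benchmark score; the sketch below uses made-up placeholder numbers, chosen only so the arithmetic lands near that 12% figure, and not taken from any published benchmark:

```python
# Power normalized to compute horsepower: watts per unit of benchmark score.
# Every number below is a PLACEHOLDER for illustration; none are measured results.
def watts_per_score(avg_watts: float, benchmark_score: float) -> float:
    return avg_watts / benchmark_score

e5_2699_v3 = watts_per_score(avg_watts=300.0, benchmark_score=1000.0)  # hypothetical
e5_2697_v2 = watts_per_score(avg_watts=280.0, benchmark_score=820.0)   # hypothetical

improvement = 1 - e5_2699_v3 / e5_2697_v2
print(f"{improvement:.0%} less power per unit of compute")  # ~12% with these inputs
```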

OK, OK, enough of the jazzy numbers. Many of you likely thought of a GREAT question way back: “why are you spilling all this ink on what is effectively a product announcement?” This is all about the Big Iron versus Microserver battle for massive cloud datacenters. Some months back I observed:

In short, there is NO WAY that Intel will allow ‘microserver’ to become synonymous with ‘ARM’. With that said, Intel’s gross-margin-dollar-printing-press will remain firmly rooted down the big iron Xeon fork in the road. And they are going to pave that road silky smooth to make it as attractive as possible.

And now we have a feel for what is shaping up on Road Big Iron: massively parallel Xeon devices delivering increasing performance with decreasing power-per-performance. Straight out of the Intel playbook, we’re seeing this on the super-stable 22nm node with the previous-generation Haswell core. Next up? Xeon [insert better numbering scheme here] with at least two dozen Broadwell cores on the 14nm node. More cores, check. More instructions per cycle, check. Lower power-per-performance, check. I LOVE competition, so I will keep encouraging the microserver ecosystem … but right now, Road Big Iron looks mighty attractive. From Silicon Valley.
