Tuesday, 28 April 2015

AMD Zen CPU Core Block Diagram Leaked – Features 512bit Wide Floating Point Unit And A Wider Integer Pipeline

A remarkable leak landed on our doorstep today, one pertaining to AMD’s upcoming high performance Zen CPU core. A slide showing the block diagram for the new CPU architecture has found its way onto the internet.
Before we dive into what this slide shows, we should mention that at present we cannot verify its authenticity. However, it won’t be long before we find out for certain whether the slide is legitimate, as it’s supposed to be part of AMD’s upcoming roadmap unveiling at the company’s Financial Analyst Day event on May 6th.

A First Glimpse Into AMD’s Next Generation High Performance x86 Zen CPU Core

We first broke the news about AMD’s next generation high performance core back in September of last year, when AMD’s then CEO Rory Read revealed the code name for the company’s upcoming high performance x86 CPU architecture. Before then we only knew of its sister ARMv8 core, code named K12.
Recently, however, we’ve witnessed a flood of leaks pertaining to AMD’s brand new design. Three months ago we heard that AMD was preparing an entirely new line-up of CPUs on a brand new platform. We learned that the platform was code named Summit Ridge and would feature an entirely new socket and an updated feature set including DDR4 memory support. More importantly, we learned that the new platform would feature mainstream CPUs with up to eight Zen cores.
Two months later we learned that AMD was also working on a monstrous High Performance Computing APU with 16 Zen cores, a huge integrated GPU and stacked High Bandwidth Memory. Finally, last week we learned that AMD would also introduce high performance server CPUs with up to 32 Zen CPU cores. Hearing about all of those different SKUs was jolly exciting but also quite frustrating, as we had no idea what to expect from Zen itself. That is until today, because for the first time we are being given a glimpse into the CPU core rather than the different SKUs it will show up in.
We don’t have a die shot of the core but we have the next best thing, a block diagram. Below you can see Zen on the right, compared to Excavator, AMD’s upcoming and final Clustered Multithreading (CMT) CPU core. Excavator is the fourth and last of AMD’s Bulldozer family of cores. It will debut with AMD’s upcoming Carrizo APU, which the company hails as the most power efficient mainstream APU it has ever made.
AMD Zen Core Block Diagram
The first thing we can spot is that there is only one integer cluster in a Zen core rather than two as in the Excavator module on the left. Those two integer clusters are what form the two separate CPU cores / threads in each Excavator module. Zen takes on a more traditional AMD CPU layout resembling that of the K series cores in Phenom and Athlon, featuring a single large integer cluster and one equally large floating point unit.

This is an important distinction because the Bulldozer family of cores achieved very high integer throughput but sacrificed floating point performance to get it, since each pair of cores shared one floating point unit. Although that floating point unit was larger and more capable than the one found in AMD’s previous K10 CPU core in the Phenom II line of chips, floating point performance still lagged behind integer, simply because the design was heavily weighted towards integer, as can be seen above.
Because Zen forgoes the CMT design of the Bulldozer family, AMD has returned to a single fetch and single decode unit on the front end, as opposed to the dual decoders that were introduced with Steamroller, Excavator’s predecessor found in the 7000 series Kaveri APUs.
Right off the bat, just by looking at the block diagram, we can tell that Zen should have substantially higher single threaded performance than Excavator and the rest of the Bulldozer family, in both integer and floating point workloads. The Bulldozer family will likely maintain higher total integer throughput if you compare a single Excavator module with two cores against a single Zen core, but this is a sacrifice that has to be made for Zen to achieve better per thread performance. Moreover, Bulldozer’s integer throughput was already quite phenomenal, as it rivaled Intel’s extreme i7 parts. Overall throughput once you add up all the threads was never the problem; per thread / single threaded performance was.
Comparing the floating point units of Excavator and Zen, we can see that AMD has introduced a floating point unit that’s twice as wide as Excavator’s. It features two 256-bit FMAC units that in all probability will be able to fuse together to process 512-bit AVX floating point instructions. This compares to the two 128-bit FMAC units found in AMD’s Bulldozer family, which can either process one 128-bit SIMD instruction each per clock or fuse together to process a single 256-bit AVX instruction per cycle. Hence our assumption that Zen’s FPU could behave similarly, with both FMACs cooperating to process 512-bit AVX instructions.
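The width comparison above can be turned into rough peak-throughput arithmetic. The sketch below is illustrative only: it assumes every lane of every FMAC unit completes one fused multiply-add per cycle (counted as two floating point operations), and it takes the unit counts and widths from the slide, which remains unverified. The function name `peak_flops_per_cycle` is our own invention.

```python
def peak_flops_per_cycle(fmac_units: int, unit_width_bits: int, element_bits: int = 32) -> int:
    """Theoretical peak FLOPs per cycle for an FPU built from FMAC units.

    Assumption: every lane of every unit completes one fused multiply-add
    (counted as 2 floating point operations) each cycle.
    """
    lanes_per_unit = unit_width_bits // element_bits
    return fmac_units * lanes_per_unit * 2


# Excavator module: two 128-bit FMACs shared by its two cores.
excavator_sp = peak_flops_per_cycle(fmac_units=2, unit_width_bits=128)

# Zen core as drawn on the unverified slide: two 256-bit FMACs.
zen_sp = peak_flops_per_cycle(fmac_units=2, unit_width_bits=256)

print(excavator_sp, zen_sp, zen_sp / excavator_sp)  # prints: 16 32 2.0
```

Under these assumptions the single precision peaks come out to 16 and 32 operations per cycle, which is the doubling the slide implies. Real throughput also depends on clock speed, scheduling and memory bandwidth, none of which the diagram reveals.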
Bulldozer FPU
Besides enabling 512-bit AVX support, the wider floating point unit also means that Zen will be able to process less complex instructions at double the rate of Excavator. That is a massive boost in floating point performance, an area in which AMD had historically excelled but which was put aside with Bulldozer.
I should mention that AVX-512 support was not listed for Zen in the official Linux patch that revealed the instruction set extensions the upcoming processor will support. This is slightly odd but could be explained by a possible lack of 512-bit integer support in Zen, which is required for the AVX-512 extension.
Moving on to the integer cluster: based on the diagram above, Zen will feature a 50% wider integer pipeline than a single Excavator core, which should also dramatically improve the single threaded / per core performance of Zen versus the Bulldozer family of cores.
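To make the trade-off concrete, here is a small sketch of relative integer width per thread and in total, using only the 50% ratio quoted above. The numbers are normalised so that one Excavator core equals 1.0; they describe pipeline width as drawn on the unverified slide, not measured performance, and the variable names are ours.

```python
# Relative integer pipeline width, normalised to one Excavator core = 1.0.
# Assumption: the 50% figure is read off the unverified slide.
EXCAVATOR_CORE_WIDTH = 1.0
ZEN_CORE_WIDTH = 1.5 * EXCAVATOR_CORE_WIDTH

# An Excavator module holds two integer clusters (two cores / threads).
excavator_module_total = 2 * EXCAVATOR_CORE_WIDTH
zen_core_total = ZEN_CORE_WIDTH

per_thread_gain = ZEN_CORE_WIDTH / EXCAVATOR_CORE_WIDTH  # Zen vs one Excavator core
total_ratio = zen_core_total / excavator_module_total    # Zen core vs whole module

print(f"per-thread width: {per_thread_gain:.2f}x")       # per-thread width: 1.50x
print(f"total width vs module: {total_ratio:.2f}x")      # total width vs module: 0.75x
```

In other words, a single thread gets 1.5 times the integer width on Zen, while one Zen core offers 0.75 times the combined width of a two-core Excavator module. That is consistent with the idea that a full module may keep an edge in aggregate integer throughput while Zen wins per thread.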
In summary, based on the diagram above we’re looking at a significantly beefed up integer cluster, a doubling in floating point capability and a streamlined front end. Coupled with a more advanced 14nm process from Samsung/GlobalFoundries, the net result should be a significantly faster, leaner, smaller and more power efficient CPU core than Excavator.
Products based on the new core are scheduled to come out next year, and if this leak is anything to go by we should hear a lot more about the new core in two weeks’ time. In the meantime, stay tuned so we can keep you updated.
