Friday 27 March 2015

Intel Knights Landing Further Detailed – 16 GB High-Bandwidth On-Package Memory, 384 GB DDR4 System Memory Support and 8 Billion Transistors

Intel has further detailed its Knights Landing Xeon Phi co-processor, which is built for HPC (High Performance Computing). The Knights Landing Xeon Phi is designed to take on HPC accelerators from NVIDIA (Tesla) and IBM's Power8, and it will offer far higher floating point performance than previous generation accelerators.
[Image: Intel Knights Landing Xeon Phi die]

Intel Knights Landing Xeon Phi Further Detailed – Massive Die With Massive Potential In HPC

Unlike the first generation Knights Corner, which was sold only as a co-processor, the Knights Landing Xeon Phi family will be available in three variants: a co-processor, a standalone processor, and a second standalone processor with integrated fabric. All three will come in various SKUs with different core counts and TDPs, but every variant is expected to deliver more than 3 TFlops of double precision floating point performance. With the basic details covered, let's get on with the new details reported by The Platform.
Starting with the specifications, the Knights Landing Xeon Phi will feature up to 8 billion transistors crammed into a massive die, as the image above shows. As a MIC (Many Integrated Core) design, the new Xeon Phi die packs over 60 cores derived from the Silvermont core architecture and built on a 14nm process node. Silvermont previously shipped on the 22nm node, but Intel has redesigned the core completely, and it is now referred to as the Knights core. The processor will remain compatible with Linux and Windows applications while adding much higher AVX floating point performance.
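The "over 3 TFlops" double precision figure can be sanity-checked from the vector hardware described in this section (two 512-bit AVX units per core). The sketch below is a back-of-the-envelope estimate, not Intel's methodology: it assumes each vector unit retires one fused multiply-add (counted as 2 FLOPs) on eight 64-bit doubles per cycle. Intel has not disclosed clock speeds, so instead of guessing one, the code solves for the clock that 3 TFlops would imply at 60 and 72 cores.

```python
# Back-of-the-envelope peak double precision FLOPS for a Knights Landing-style chip.
# Assumptions (not confirmed by the article): each 512-bit vector unit retires one
# fused multiply-add (FMA) per cycle on 8 doubles, and an FMA counts as 2 FLOPs.

DOUBLES_PER_512_BIT_VECTOR = 512 // 64  # 8 lanes of 64-bit doubles
FLOPS_PER_FMA = 2                       # one multiply plus one add


def flops_per_cycle(cores: int, vector_units_per_core: int) -> int:
    """Double precision FLOPs the whole chip can retire per clock cycle."""
    return cores * vector_units_per_core * DOUBLES_PER_512_BIT_VECTOR * FLOPS_PER_FMA


def peak_dp_flops(cores: int, vector_units_per_core: int, clock_ghz: float) -> float:
    """Theoretical peak in FLOP/s at a given (hypothetical) clock in GHz."""
    return flops_per_cycle(cores, vector_units_per_core) * clock_ghz * 1e9


def clock_needed_ghz(target_flops: float, cores: int, vector_units_per_core: int) -> float:
    """Clock in GHz required to reach target_flops under the same assumptions."""
    return target_flops / flops_per_cycle(cores, vector_units_per_core) / 1e9


if __name__ == "__main__":
    # Intel has not published clocks; these are the clocks implied by 3 TFlops.
    for cores in (60, 72):
        ghz = clock_needed_ghz(3e12, cores, 2)
        print(f"{cores} cores -> about {ghz:.2f} GHz for 3 TFlops")
```

Under these assumptions a 60-core part would need roughly 1.56 GHz to hit 3 TFlops, while a 72-core part would need only about 1.3 GHz, which is one reason extra cores matter more than clock speed for this kind of chip.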
The chip is divided into tiles, each holding two cores. Each core has 32 KB + 32 KB of L1 cache (instruction/data) and a pair of custom 512-bit AVX vector units that use the same instruction set as Xeon chips, which puts the total at 120 AVX units on the top-end Xeon Phi accelerator. Unlike the regular Silvermont core, the Knights cores are reworked to deliver x86 performance on par with a full-fledged core. The two cores in each tile share 1 MB of L2 cache, which adds up to 30 MB of L2 cache across the chip. The chip also has two independent DDR4 memory controllers providing six memory channels (three per controller), allowing the complete platform to support up to 384 GB of RAM, plus a separate memory controller for the on-package memory, which will be detailed in a bit.
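The tile totals quoted above follow directly from the per-tile layout. This small sketch tallies them, assuming the "over 60 cores" of the top-end part is exactly 60 (the count that matches the 120 AVX units and 30 MB of L2 in the article). The per-channel DDR4 capacity is derived by assuming the 384 GB maximum is spread evenly over the six channels.

```python
# Tally the per-tile resources described in the article for the top-end part.
# Assumption: the core count is taken as exactly 60, which matches the quoted
# 120 AVX units and 30 MB of L2 cache.
from dataclasses import dataclass


@dataclass(frozen=True)
class TileConfig:
    cores_per_tile: int = 2
    vector_units_per_core: int = 2
    l1_kb_per_core: int = 32 + 32  # instruction + data
    l2_mb_per_tile: int = 1        # shared by the tile's two cores


def chip_totals(cores: int, tile: TileConfig = TileConfig()) -> dict:
    """Chip-wide totals for a given core count built from identical tiles."""
    tiles = cores // tile.cores_per_tile
    return {
        "tiles": tiles,
        "vector_units": cores * tile.vector_units_per_core,
        "l1_kb": cores * tile.l1_kb_per_core,
        "l2_mb": tiles * tile.l2_mb_per_tile,
    }


def gb_per_channel(max_ram_gb: int = 384, controllers: int = 2,
                   channels_per_controller: int = 3) -> float:
    """DDR4 capacity per channel, assuming an even split across all channels."""
    return max_ram_gb / (controllers * channels_per_controller)


print(chip_totals(60))   # {'tiles': 30, 'vector_units': 120, 'l1_kb': 3840, 'l2_mb': 30}
print(gb_per_channel())  # 64.0
```

So reaching the 384 GB ceiling would take 64 GB per channel across the six DDR4 channels.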

16 GB of High-Bandwidth On-Package Memory

The on-die additions Intel has made with Knights Landing are quite interesting. Alongside integrated Omni-Path, which provides a fast interconnect, and an I/O controller with up to 36 PCI-E 3.0 lanes, Intel has placed 8 high-bandwidth memory banks on the package, which is the reason for its massive size. The goal is to deliver fast memory access close to the die itself rather than relying on system memory. This high-performance memory should not be confused with either HBM (High Bandwidth Memory) or HMC (Hybrid Memory Cube). It was created by Intel in collaboration with memory maker Micron Technology and is known as MCDRAM, a variant derived from the Hybrid Memory Cube design. The top Xeon Phi SKU will feature up to 16 GB of this very fast memory, delivering up to 400 GB/s of memory bandwidth in addition to the roughly 90 GB/s supplied by the DDR4 system RAM alone.
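To put the on-package numbers in perspective, the sketch below splits them per memory bank and compares them with DDR4. It assumes capacity and bandwidth are divided evenly across the eight banks, which Intel has not stated, and all inputs are the approximate figures Intel gave. The combined figure is simply the sum of the two peaks, not a measured result.

```python
# Rough per-bank and comparative figures for the memory numbers quoted above.
# Assumption: capacity and bandwidth are split evenly across the eight banks;
# Intel has only given approximate aggregate figures.
MCDRAM_BANKS = 8
MCDRAM_CAPACITY_GB = 16
MCDRAM_BANDWIDTH_GBS = 400  # "up to", approximate
DDR4_BANDWIDTH_GBS = 90     # approximate

per_bank_gb = MCDRAM_CAPACITY_GB / MCDRAM_BANKS        # 2.0 GB per bank
per_bank_gbs = MCDRAM_BANDWIDTH_GBS / MCDRAM_BANKS     # 50.0 GB/s per bank
ratio = MCDRAM_BANDWIDTH_GBS / DDR4_BANDWIDTH_GBS      # about 4.4x
combined_gbs = MCDRAM_BANDWIDTH_GBS + DDR4_BANDWIDTH_GBS  # theoretical sum of peaks

print(f"{per_bank_gb} GB and {per_bank_gbs} GB/s per bank")
print(f"MCDRAM/DDR4 bandwidth ratio: {ratio:.2f}x")
print(f"Sum of peak bandwidths: {combined_gbs} GB/s")
```

The roughly 4.4x ratio matches the factor Intel's chief architect quotes below.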

Intel Knights Corner Die Pictures (Courtesy of TweakTown):

Avinash Sodani, chief architect of the Knights Landing chip at Intel, tells The Platform that the DDR4 far memory has about 90 GB/sec of bandwidth, which is on par with a Xeon server chip. Which means it is not enough to keep the 60-plus hungry cores on the Knights Landing die well fed. The eight memory chunks that make up the near memory, what Intel is calling high bandwidth memory or HBM, deliver more than 400 GB/sec of aggregate memory bandwidth, a factor of 4.4 more bandwidth than is coming out of the DDR4 channels. These are approximate memory bandwidth numbers because Intel does not want to reveal the precise numbers and thus help people figure out the clock speeds ahead of the Knights Landing launch. That near memory on the Knights Landing chip delivers about five times the performance on the STREAM memory bandwidth benchmark test than its DDR4 far memory if you turn the near memory off. (via The Platform)
With over 5x the STREAM bandwidth of DDR4 and proper NUMA memory support, the Xeon Phi will offer flexible memory modes, including cache and flat modes, and the on-package memory design is 5 times more energy efficient and 3 times denser than GDDR5 memory. Intel also explained why the new Xeon Phi is restricted to a single socket rather than supporting multiple sockets like other Xeon processors.
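The difference between the two memory modes can be illustrated with a toy capacity model. This is not Intel's software interface; it only captures the idea that in flat mode the MCDRAM is exposed as ordinary addressable memory next to DDR4, while in cache mode it sits in front of DDR4 as a cache and adds no capacity. The placement function is a hypothetical flat-mode policy (put a buffer in MCDRAM if it fits, otherwise in DDR4) chosen purely for illustration.

```python
# Toy model of the flat and cache memory modes (not Intel's software interface).
# Flat mode: MCDRAM is addressable memory alongside DDR4.
# Cache mode: MCDRAM acts as a cache in front of DDR4, so it adds no capacity.
from enum import Enum


class MemoryMode(Enum):
    FLAT = "flat"
    CACHE = "cache"


def addressable_gb(mode: MemoryMode, mcdram_gb: int = 16, ddr4_gb: int = 384) -> int:
    """Memory capacity visible to software in the given mode."""
    if mode is MemoryMode.FLAT:
        return mcdram_gb + ddr4_gb
    return ddr4_gb  # in cache mode the MCDRAM is not directly addressable


def place_allocation(size_gb: float, mcdram_free_gb: float) -> str:
    """Hypothetical flat-mode policy: use MCDRAM if the buffer fits, else DDR4."""
    return "mcdram" if size_gb <= mcdram_free_gb else "ddr4"


print(addressable_gb(MemoryMode.FLAT), addressable_gb(MemoryMode.CACHE))
print(place_allocation(4, 16), place_allocation(20, 16))
```

The trade-off the model shows: flat mode gives software more capacity and explicit control over which data lands in fast memory, while cache mode needs no code changes but leaves the 16 GB out of the addressable total.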
“Actually, we did debate that quite a bit, making a two-socket,” says Sodani, “One of the big reasons for not doing it was that given the amount of memory bandwidth we support on the die – we have 400 GB/sec plus of bandwidth – even if you make the thing extremely NUMA aware and even if only five percent of the time you have to snoop on the other side, that itself would be 25 GB/sec worth of snoop and that would swamp any QPI channel.”
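Sodani's snoop argument is simple proportional arithmetic: some fraction of the on-package bandwidth would have to cross the socket-to-socket link. At exactly 400 GB/s, 5 percent is 20 GB/s; his 25 GB/s figure corresponds to about 500 GB/s, consistent with his "400 GB/sec plus." The sketch below just reproduces that calculation; the function name is illustrative and the bandwidth inputs are the approximate figures from the quote.

```python
# Snoop traffic implied by Sodani's argument: a fixed fraction of the
# on-package memory bandwidth would have to cross the inter-socket link.

def snoop_traffic_gbs(bandwidth_gbs: float, remote_fraction: float) -> float:
    """Cross-socket snoop traffic in GB/s for a given remote-access fraction."""
    return bandwidth_gbs * remote_fraction


# "400 GB/sec plus" of bandwidth, with 5 percent of accesses snooping remotely.
for bw in (400, 500):
    print(f"{bw} GB/s -> {snoop_traffic_gbs(bw, 0.05):.0f} GB/s of snoop traffic")
```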
[Image: Intel Knights Landing Xeon Phi variants]
Intel's Knights Landing will be available in the second half of 2015, possibly including 72-core variants, as previous rumors indicate. The design could be a major win for Intel in the HPC sector, where it competes with IBM and NVIDIA, both of which have powerful accelerators available now and more on the way (NVIDIA has skipped Tesla parts in the Maxwell generation). The next logical step in the evolution of Intel's Xeon Phi line is Knights Hill, which is expected in 2017 and will be built on 10nm process technology with the second generation Intel Omni-Path architecture.
