Sunday 21 June 2015

AMD R9 Nano Performance Indirectly Revealed – More Compute Power Than A Titan X

In this piece we dive deep into the data that AMD shared with us about the Radeon R9 Nano at E3 to find out how well it should perform when it launches later in the summer. AMD did not directly publish a quotable performance figure for the R9 Nano. We’re told that we have to wait and see, but we managed to figure out almost exactly how fast it is anyway. Through the application of the ancient, almost forgotten art of math and a bit of common sense, you can learn a lot about almost anything. Hear that, kids? Stay in school!
[Image: AMD Radeon R9 Nano]
On a more serious note, let’s do a quick recap of what the AMD Radeon R9 Nano is and why it’s quite possibly the most valuable gem to come out of AMD’s E3 event. In a nutshell, the R9 Nano is a compact, mini-ITX sized, high performance graphics card with a laser-sharp focus on three major attributes: power efficiency, performance per watt and performance density relative to board size.

The AMD Radeon R9 Nano Is Small But Packs A Powerful Punch

Based on AMD’s latest GPU, code named “Fiji”, the R9 Nano boasts some seriously impressive stats: 2X the performance per watt and 2X the performance density of AMD’s previous flagship, the R9 290X. The result is a six inch long, 175W graphics card that we’re told is “significantly faster” than the 290X, making the R9 Nano the most powerful mini-ITX graphics card we’ve seen so far. But how fast is it exactly? Well, let’s find out!

More Compute Power Than A Titan X, You Say? Surely That Can’t Be Right!?

We’re just as surprised as you probably are, frankly, but it is true. I should caveat this statement by saying that “compute power” refers to how many FP32 TFLOPS this card achieves, which doesn’t necessarily translate to gaming performance in every scenario. That largely depends on how each game is set up to take advantage of compute resources over other resources in the GPU. But that’s a story for another time.
Now how exactly did AMD indirectly reveal this? I use the word “indirectly” here because AMD didn’t give us a peak FP32 number in LA, and when we asked directly we were told to wait and see. However, AMD did give us just enough data points to work out that number on our own. I jokingly mentioned “the ancient art of math” at the beginning of the article, but just by knowing the card’s TDP and its performance per watt in relation to the R9 290X, we can calculate how powerful the R9 Nano should be.
We know that AMD defines performance per watt as the peak FP32 throughput divided by the “typical board power”, which for the Nano is 175W. We also happen to know the peak FP32 performance of the R9 290X and its TDP. I should briefly mention that AMD’s official “typical board power” rating for the R9 290X is 250W, not the often referenced 290W. I made sure to ask AMD’s head of technical communications Robert Hallock about this before I ran the numbers, and he confirmed that it is indeed 250W. I’m still going to reference this slide, just for good measure.
[Slide: R9 295X2, R9 290X and R9 290 specs]
Since we know that performance per watt is peak FP32 divided by TDP, we can go ahead and calculate the power efficiency of the R9 290X.
R9 290X’s peak FP32 = 5.6 TFLOPs, in other words 5600 GFLOPs, and its TDP is 250W.
Perf/W = 5600 GFLOPs/250W = 22.4 GFLOPs/W
We also know that the R9 Nano has 2X the perf/watt of the R9 290X.
Which means it’s 2X (5.6TFLOP/250W)
= 2X 22.4 GFLOPs/W
= 44.8 GFLOPs/W.
Thus the perf/watt rating of the R9 Nano is 44.8GFLOPs/W.
Incidentally we also have the TDP for the Nano, and that’s the last missing piece in the puzzle.
R9 Nano
Perf/watt = FP32 in GFLOPs (unknown) / TDP (175)
44.8 = FP32 (unknown) / 175
44.8 x 175 = FP32 (unknown)
44.8 x 175 = 7840 GFLOPs or 7.84 TFLOPs.
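For anyone who wants to rerun the arithmetic above, here is a minimal Python sketch of the same steps. The constant and function names are my own, chosen for illustration, and the 2X multiplier is AMD’s claim taken at face value rather than a measured result.

```python
# Inputs taken from the article: R9 290X peak FP32 and typical board power,
# the Nano's typical board power, and AMD's claimed 2X perf/watt uplift.
R9_290X_GFLOPS = 5600.0
R9_290X_POWER_W = 250.0
NANO_POWER_W = 175.0
NANO_PERF_PER_WATT_MULTIPLIER = 2.0  # AMD's claim, not independently verified


def perf_per_watt(gflops: float, board_power_w: float) -> float:
    """Peak FP32 GFLOPs divided by typical board power in watts."""
    return gflops / board_power_w


r9_290x_efficiency = perf_per_watt(R9_290X_GFLOPS, R9_290X_POWER_W)
nano_efficiency = NANO_PERF_PER_WATT_MULTIPLIER * r9_290x_efficiency
nano_gflops = nano_efficiency * NANO_POWER_W  # rearranged perf/watt formula

print(f"R9 290X: {r9_290x_efficiency:.1f} GFLOPs/W")    # 22.4 GFLOPs/W
print(f"R9 Nano: {nano_efficiency:.1f} GFLOPs/W")       # 44.8 GFLOPs/W
print(f"R9 Nano: {nano_gflops / 1000:.2f} TFLOPs FP32")  # 7.84 TFLOPs
```

Changing the multiplier or the board power shows how sensitive the 7.84 TFLOPs estimate is to each of AMD’s claims.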

We should point out that it’s still debatable whether AMD can actually reach this 7.84 TFLOPs figure with the R9 Nano. It would still be an impressive feat if the Nano ends up close to the 7 TFLOPs mark. To put this compute power into perspective, let’s put it side by side with the Nano’s bigger brother the Fury X and Nvidia’s flagship the GeForce GTX Titan X, as well as the GPUs inside the PS4 and the Xbox One. Let’s throw in the fastest compact mini-ITX graphics card out there right now as well, the GTX 970.
WCCFTech | R9 Nano | Fury X | Titan X | GTX 970 | PS4 | XONE
FP32 Compute in TFLOPs | 7.84 | 8.6 | 6.14-6.6 | 3.49-3.88 | 1.84 | 1.23
Typical Board Power | 175W | 275W | 250W | 145W | 150W | 100W
Power Efficiency in GFLOPs/W | 44.8 | 31.3 | 26.4 | 26.2 | 12.3 | 12.3

For the GTX Titan X and the GTX 970 both the base and peak FP32 compute figures are listed.

I should remind everyone again that FP32 compute doesn’t directly correlate with gaming performance, especially when comparing different architectures such as AMD’s GCN and Nvidia’s Maxwell. FP32 reflects the capability of the chip’s compute engine and not necessarily the capability of the graphics pipeline.
So it should not be taken as a metric by which we can compare the gaming performance of different GPUs, except when we’re talking about chips based on the same architecture, for example the R9 Nano, the R9 Fury X and the console chips. In that case it can be a fairly accurate indicator of gaming performance.
Had GPU cryptocurrency mining still been viable, this would without a doubt have been the most popular mining card of all. For its size and power rating the R9 Nano outclasses every other high performance graphics chip we’ve seen so far, including its bigger brother the Fury X, which the Nano exceeds in power efficiency by 43%.
The R9 Nano would edge out the Titan X in FP32 compute and deliver roughly 1.7 times the GFLOPs/W. Compared to the GTX 970, which is currently the fastest graphics card available in a mini-ITX form factor, the Nano delivers about double the peak FP32 compute and again roughly 1.7 times the perf/watt. Finally we come to the console hardware, where the Nano achieves over three times the GFLOPs/W and 4-6X the peak FP32 compute.
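These ratios can be recomputed from the comparison table. The sketch below is only illustrative: the dictionary and function names are hypothetical, the Nano entry is the projected 7.84 TFLOPs rather than a measured figure, and for the Titan X and GTX 970 I assume the peak (upper) FP32 value from the table.

```python
# (peak FP32 TFLOPs, GFLOPs/W) copied from the comparison table.
# Assumption: peak rather than base FP32 is used for the Nvidia cards.
CARDS = {
    "R9 Nano": (7.84, 44.8),  # projected, not measured
    "Fury X": (8.6, 31.3),
    "Titan X": (6.6, 26.4),
    "GTX 970": (3.88, 26.2),
    "PS4": (1.84, 12.3),
    "XONE": (1.23, 12.3),
}


def nano_advantage(card: str) -> tuple[float, float]:
    """Return (compute ratio, perf/watt ratio) of the R9 Nano over `card`."""
    nano_tflops, nano_efficiency = CARDS["R9 Nano"]
    tflops, efficiency = CARDS[card]
    return nano_tflops / tflops, nano_efficiency / efficiency


for name in CARDS:
    if name == "R9 Nano":
        continue
    compute_ratio, efficiency_ratio = nano_advantage(name)
    print(f"{name:8s} compute x{compute_ratio:.2f}  perf/watt x{efficiency_ratio:.2f}")
```

A compute ratio below 1.0, as with the Fury X, simply means the other card has the higher peak FP32 figure.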

What About Gaming Performance?

For a definitive verdict on gaming performance we will have to wait until the card is launched because there are several factors in play here. AMD’s performance and power claims will need to be validated through independent testing first and that won’t happen until closer to launch.
Peak 32-bit floating point performance figures are fine and dandy, but as mentioned previously these numbers don’t correlate with gaming performance, especially when comparing chips based on different architectures. The Titan X and 980 Ti will almost certainly be faster gaming cards than the Nano and will compete with the Fury X. There are also questions about clock speeds and throttling. Will the Nano be able to maintain its clock speed to consistently deliver that 7.84 TFLOPs target? How well will the cooler cope? Many questions need to be answered.
If we assume that it all pans out and the card is indeed as good as it’s being portrayed, it should fall somewhere between the GTX 980 Ti and the GTX 980 in terms of performance, perhaps closer to the GTX 980. That isn’t too shabby for a mini-ITX 175W card.
It’s quite remarkable that reducing the throughput of Fiji by a mere 9% is enough to drop power by 100W, a reduction of roughly 36% from the Fury X’s 275W. There’s no doubt that HBM helped AMD reduce power significantly. But I suspect the trick with Nano is going to be about selecting the very best Fiji chips and employing very aggressive power management. A more mature 28nm TSMC process certainly doesn’t hurt this effort either.
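The throughput-versus-power trade-off can be checked against the Fury X and Nano figures from the table. This is a quick sketch under the assumption that the projected 7.84 TFLOPs holds; the names are mine and purely illustrative.

```python
# Fury X and (projected) R9 Nano figures from the comparison table.
FURY_X_TFLOPS = 8.6
FURY_X_POWER_W = 275.0
NANO_TFLOPS = 7.84   # projected from AMD's perf/watt claim
NANO_POWER_W = 175.0


def percent_reduction(before: float, after: float) -> float:
    """Relative drop from `before` to `after`, expressed in percent."""
    return (before - after) / before * 100.0


throughput_drop = percent_reduction(FURY_X_TFLOPS, NANO_TFLOPS)
power_drop = percent_reduction(FURY_X_POWER_W, NANO_POWER_W)

print(f"Throughput reduction: {throughput_drop:.1f}%")  # 8.8%
print(f"Power reduction: {power_drop:.1f}%")            # 36.4%
```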
I want to end this on an important note. It’s nearly impossible to tell how Nano will perform in the real world until it’s actually tested. The purpose of this article is merely to form an idea of the performance that AMD expects out of Nano, and how that performance would measure up if it comes to fruition. Whether AMD can deliver on its promise is an entirely different matter.
At E3 AMD announced that this card is coming this summer, but it has yet to announce a specific date. We’ll make sure to keep you updated as we learn more, so stay tuned.
