From my perspective, AMD is currently the most fascinating company in tech. Its Zen CPU microarchitecture and Ryzen desktop CPUs met or even exceeded expectations, realizing comparable performance and IPC to Broadwell and providing Intel with real competition in the X86 space for the first time in years. I am increasingly convinced by CEO Lisa Su’s efforts to turn around the company from the dire straits it was in until the launch of Zen.
AMD’s new Vega GPU architecture has been especially interesting to follow in recent months. I will caveat this article by saying I’m not very familiar with desktop parts, especially GPUs, so I don’t really know much beyond the basics.
What I don’t think most people know, though, is that GPUs are process-constrained. Vega is fabbed on GlobalFoundries’ 14LPP process, which is licensed from Samsung Foundry. TSMC’s 16FF+ was a little better than 14LPP in terms of power and performance, though it’s at least possible process maturity may have closed some of the gap over time. Quite how so many people expected Vega 10 to outperform NVIDIA’s GP104 GPU escapes me, then, given that the two GPUs are fabbed on very comparable processes. (HBM2 memory should make a difference, though, on paper.) I think many people simply assumed that if Vega came out later than NVIDIA’s Pascal, then it must be better.
If you are not familiar with the current state of the PC GPU market, NVIDIA has had a significant efficiency advantage since the introduction of its Maxwell architecture in 2014. It was later revealed that NVIDIA had adopted a tile-based rasterizer, which played a major though not exclusive role in eeking out this efficiency advantage.
Beyond that, it was apparent once AMD announced the TBPs (typical board power ratings) for the first Vega cards that the architecture is fairly terrible on power efficiency. This is not good, because power efficiency is pretty much the most important metric for any IC. To speculate on the reasons behind it at this point would be wild guessing, but it does appear that some things went wrong.
Speaking from experience in the mobile space, I’ve seen vendors who are uncompetitive to some degree on efficiency often boost performance to match the competition on benchmarks, by operating their silicon at more inefficient points in the performance/watt curve. That said, Vega’s being able to match Pascal’s performance is not something to be taken for granted either, and is thankfully the case. Vega's clock speeds are also a non-worry.
Software-wise, AMD’s drivers were clearly running very late. Software historically has not been AMD’s strength, though I am optimistic things will be improving from now on. However, one wonders why the drivers and various new features are so delayed.
Everyone knows that Vega was late. While HBM2 yields likely played a role, there’s probably more to it. Someone smart said that AMD did the right thing to delay the products (as opposed to ostensibly doing something stupid).
To me, it looks like AMD probably had enough issues with Vega that it had to rush out a respin. On the one hand, that would clearly not be good. On the other hand, if so, I’m really glad AMD paid to do it and delayed the non-Frontier cards. Respins are really expensive, and in mobile consumers are often not so lucky to get them. That is the extent of my familiarity with these things at least. The situation with Vega is not the end of the world since its performance is still competitive, and Vega will sell out for quite a long time regardless.
For architecture and competitive analysis, I recommend reading AnandTech (and only AnandTech), though of course useful benchmarks are found on many sites. I would also recommend waiting a week or two to see how the AT review gets updated, because it's impossible to actually analyze much of anything before a review embargo.
And as much as this will probably pain gamers to hear, I consider Vega’s performance on deep learning operations to be much more important than its gaming credentials. There is an inordinate amount of money at stake if AMD can manage to move the needle with Radeon Instinct and HIP against NVIDIA’s domination in deep learning.