Apple discusses its 2017 silicon

There are several errors in this article, one of which is bad: the CPU is not 70% more efficient.

Nonetheless, I appreciate it when Apple's silicon team provides some information. I think the only new technical disclosure in this piece is the redesigned secure element (the Secure Enclave, or SEP); I believe we already knew everything else.

A three-year design lead time for the Neural Engine is to be expected. Apple's "fully custom" GPU still uses tile-based deferred rendering (TBDR), which is Imagination Technologies' IP.

Apple’s silicon team is, for example, obsessed with energy efficiency, but never at the expense of responsiveness.

Meh. I appreciate that pushing single-threaded performance is insanely hard, but I've never been sold on this philosophy. You know what would be a lot more efficient? Getting the software teams to truly care about locked-60fps performance again, as they did until iOS 7. Every Apple device still drops tons of frames, no matter how fast the silicon, and ProMotion isn't a magic bullet solution. That's not the silicon teams' fault.

“We’re thinking ahead, I’ll tell you that, and I don’t think we’ll be limited,” and then he added, almost as a post-script, “It’s getting harder.”

That it is indeed.

Thoughts on the iPhone X

Yesterday Apple revealed the iPhone X, which had been extremely hyped for years. The highlight feature of the phone in my opinion is clearly its HDR display. Apple is the first mobile vendor to ship end-to-end support for HDR, while Samsung’s Galaxy S8 was the first device to at least feature full hardware support.

I wrote about this at great length over the past couple months, so for all the details check out these three articles. The first article was mostly wrong about the UI elements, but overall they were hopefully pretty accurate.

Much to my surprise, though, the X’s display really appears to use a diamond PenTile subpixel layout. Apple’s own marketing states that it “uses subpixel anti-aliasing to tune individual pixels for smooth, distortion-free edges.” That’s effectively confirmation, and exactly the same as what Samsung Electronics has always done with OLED. This means there’s definitively no near-term hope that S-Stripe can be scaled up economically to large phone panel sizes at high pixel densities. Samsung hasn’t been holding back.

One thing I will add is that while I appreciate Apple's intention in advertising a contrast ratio of 1,000,000:1 as opposed to an infinite one, it's also OK to say it's infinite for OLED because of the near-perfect blacks. Not shipping ProMotion is probably due to power budgeting, though it's possibly because of performance constraints.
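As a back-of-envelope illustration of where a finite figure like 1,000,000:1 comes from (the luminance numbers here are mine and purely illustrative, not Apple's measurement conditions):

```python
# Contrast ratio is just peak white luminance divided by black luminance.
# Illustrative, assumed numbers; not Apple's measurement conditions.
white_nits = 600.0      # assumed peak white level
black_nits = 0.0006     # a tiny but nonzero measured black level

print(f"{white_nits / black_nits:,.0f}:1")  # -> 1,000,000:1

# A truly "off" OLED pixel would make the denominator zero, which is
# why calling the contrast infinite is also a defensible description.
```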

There are pretty much zero surprises on the silicon side overall. I might write more about the A11 another time, but the IPC improvements are relatively modest. I know many people will raise an eyebrow at this, but I would encourage you to completely ignore Geekbench. And I can’t emphasize enough how almost everything written online about Apple’s CPUs is wrong.

If you subtract out the efficiency gains from removing 32-bit support, you’re left with maybe very roughly a 15% improvement in CPU IPC for the big cores, assuming equivalent clocks to the A10. Apple could have pushed performance and efficiency further, if not for 10FF being really bad. The era of the hyper Moore’s Law curve in mobile is officially over, in my opinion, though maybe the A10 already signaled that. It’s all rough sledding from here on out, based on the state of foundry challenges.

The design of the CPU itself is completely unsurprising. Once it was leaked that it was hexacore, it was obvious that the smaller cores would have been designed for a higher performance target than last year’s Zephyr cores. The design is fully cache coherent but also clearly not ARM SMP.

Regarding “Apple’s first custom GPU,” well… I’ll put it this way: there are probably even people at Apple who don’t really consider it to be fully custom. Personally, I don’t consider it to be custom based on what I know, but I can’t say more. Sorry to be vague, but it’s complicated.

In terms of performance, Apple claimed only a 30% improvement for the GPU, which is not massive. 50% power at iso-performance on 10nm is also not necessarily impressive if you think the GPU in the A10 burned too much power in the first place. The reality, again, is that TSMC's 10FF is really bad, so Apple probably couldn't have achieved more. That Apple went with a tri-core design is interesting. New architectural features are the biggest thing Apple is touting, but I know nothing about graphics and can't comment on those.

Apple didn't say anything about the other interesting things it did in silicon, so I won't talk about them either.

Setting aside the (no doubt really expensive) IR system, the single most impressive advancement might be the new camera color filter. This is a really big deal, because it's insanely hard to improve on the Bayer RGBG color filter array. We haven't seen any vendor attempt to do this in mobile in years, but it was inevitable that some vendor would try again to ship an alternative. Whether Apple's implementation is RGBW à la Aptina or another subpixel arrangement, I have no idea. I know essentially nothing about image filtering and demosaicing, so there's nothing more I can say. As for the cameras themselves, Apple has finally shipped larger image sensors, though we don't know the pixel size yet.
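For anyone who hasn't seen one, here is what the conventional Bayer mosaic actually looks like. This is a generic sketch of the standard RGGB layout, nothing specific to Apple's new filter:

```python
import numpy as np

# A conventional Bayer color filter array: each sensor pixel samples one
# channel, and half of all pixels are green (hence "RGBG"). The two
# missing channels at every pixel are what demosaicing has to estimate.
# This sketches the standard layout only, not Apple's new filter.
def bayer_pattern(height, width):
    cfa = np.empty((height, width), dtype="<U1")
    cfa[0::2, 0::2] = "R"   # even rows, even columns
    cfa[0::2, 1::2] = "G"   # even rows, odd columns
    cfa[1::2, 0::2] = "G"   # odd rows, even columns
    cfa[1::2, 1::2] = "B"   # odd rows, odd columns
    return cfa

print(bayer_pattern(4, 4))
```

Any alternative filter changes that sampling pattern, which is part of why the whole demosaicing pipeline has to be rebuilt around it.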

The A11’s video encoding performance is also extremely impressive, and it’s fair to say Apple is way ahead of the competition. 4K60 encode simply requires a massive amount of data bandwidth. I’m not sure at this point if deep learning is being applied in this area or not, but Apple is definitely employing its own special techniques to accomplish this.
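For a rough sense of what "massive" means here, consider the raw pixel rate alone (the 8-bit 4:2:0 input format is my assumption):

```python
# Raw pixel rate for 4K60 before any compression, assuming an 8-bit
# 4:2:0 source (1.5 bytes per pixel) -- the format is my assumption.
width, height, fps = 3840, 2160, 60
bytes_per_pixel = 1.5

raw_mb_per_second = width * height * fps * bytes_per_pixel / 1e6
print(f"~{raw_mb_per_second:.0f} MB/s of raw pixels")  # ~746 MB/s

# The encoder also re-reads reference frames for motion search, so real
# memory traffic is a multiple of this figure.
```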

Face ID seems to be significantly slower than Touch ID but more secure. I was skeptical about the latter claim until Apple explained the sensors and the dot projector. That seems like more than enough data, but keep in mind there will be corner-case problems, bugs, and oh right, sometimes the deep learning classifier will just miss. Reliability and robustness also matter, in other words. And as Apple tried to lightly dismiss, it won't work for identical twins. Overall, Face ID should be considered a convenience regression from Touch ID, a necessary concession to the thin bezels of the display, since Apple failed to get fingerprint recognition to work through the display. It works, but Apple is surely unhappy internally.

I recently almost published an article on things I believed Apple needed to improve about the iPhone. The five areas I was going to highlight were speaker quality, portrait mode quality, shipping a dedicated DLA, camera sensor pixel size, and switching several imaging algorithms to deep learning implementations. As a pleasant surprise, Apple addressed all five of these areas (though I had strong hunches it would do so).

Waterproofing hurts speaker quality, and the primary speaker in the iPhone 7 regressed in some regards despite the improvements to volume and dynamic range. Portrait mode on the iPhone 7 Plus can honestly produce some pretty poor results. The people who work on these algorithms have PhDs in computational photography, though, so I probably shouldn’t criticize what I don't know. I don’t think the previous implementation used deep learning, but I could be wrong. For portrait mode on the front camera of the iPhone X at least, Apple appears to have switched to a deep learning implementation, if I understood Phil Schiller correctly.

Dedicated deep learning ASICs like Apple's Neural Engine are clearly the direction the industry is moving, so it's hardly unique, or honestly that hard, for Apple to ship one as well. Inference is too important not to specifically accelerate, so this is something Apple and everyone else clearly need. Implementations should be all over the map and vary quite a bit in terms of results. No one knows how these chips should ideally be designed, so everyone will be experimenting for many years. Apple did disclose a tiny amount of detail on the Neural Engine, such as its being a dual-core design, which was nice of them to do.

There are also a new accelerometer and gyroscope, which is unsurprising if you're familiar with sensor design standards for VR platforms such as Samsung's Oculus-powered Gear VR or Google's Daydream. Previously Apple has preferred to source lower-power sensor implementations, so perhaps these new sensors are more accurate but draw more power. The cameras themselves are now also individually calibrated, which is probably really important.

Additionally, there is now hardware codec support for FLAC, which was to be expected given software codec support in iOS 11. ALAC is basically irrelevant now.

The biggest mystery to me going into yesterday’s keynote was the iPhone X’s wireless charging. Since it was revealed to use completely standard Qi charging, I will hazard a guess as to what this is really all about. It’s no secret that Apple wants to push for a completely wireless future, and the Lightning connector’s days are clearly numbered. To get to that point will require far-field wireless charging using resonant technologies.

My understanding, and what I don’t think people realize, is that a resonant specification depends on an inductive specification. If this is the reasoning, then Apple has to help propagate an industry standard to make inductive charging as ubiquitous as possible around the world. Thus, Apple would be pushing for wireless charging now even if it doesn’t see a ton of value in inductive alone. This is just my theory, so I could certainly be wrong.

As an aside, Schiller had to directly contradict his own past comments about the utility of wireless charging (and NFC), which I think is an excellent example of why you should generally avoid saying negative things about other companies or people.

I might write more about the Apple Watch Series 3 in the future, but I at least previously explained all of the cellular details here.

Technical corrections are always appreciated.

"Hardware Architectures for Deep Neural Networks"

This presentation from MIT provides an excellent overview of current techniques and hardware implementations for efficient deep learning computation. There is also an associated paper.

The material requires familiarity with the basics of deep learning and its terminology. Provided you have that, though, the presentation is very accessible and easy to follow even if you aren't a machine learning researcher.

“How is ARCore better than ARKit?”

These are basically the same conclusions I’ve drawn over the past couple days and shared with subscribers. Despite the insecurities of fanboys on both sides, ARKit and ARCore seem pretty comparable overall.

It’s refreshing to see technical blogging that’s fair to all sides, and thus actually accurate. The statements made also match up completely with my limited understanding of AR and hardware product development. I recommend reading the entire article to see what I mean.

I did a little digging yesterday, and there doesn’t seem to be much to ARCore’s Nougat API requirement. Android device support probably boils down to Google’s dedication to validation, and not really hardware or fragmentation.

I’ve also been writing on AR and hardware for subscribers, so check that out if you're interested.

Will the iMac Pro throttle?

As was expected to happen at some point, Intel today introduced its new Xeon-W workstation CPUs.

Intel has had to overhaul its product portfolio due to the massive challenges and delays of its 10nm process. Apple actually pre-announced these new Xeons at WWDC while giving minimal detail. It was clear, however, that the CPUs would essentially be Skylake-X.

Xeon CPUs are not “faster” than Core CPUs, because they use the same microarchitecture. Xeon platforms, however, come with important features for workstations and servers, such as ECC RAM support.

To oversimplify, Intel advertises TDPs (thermal design power ratings) of up to 140W for a Xeon-W. AMD's Vega 56 and 64 GPUs are rated at TBPs (typical board power ratings) of 210W and 295W, respectively*.
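Purely to set the scale, using the ratings quoted above (and with the footnote's caveat that TDP and TBP are defined differently, so they aren't strictly additive):

```python
# The vendor ratings quoted above. TDP and TBP are defined differently
# (see the footnote), so this sum only sets the scale; it is not a
# real power budget.
xeon_w_tdp_watts = 140      # Intel's top advertised Xeon-W TDP
vega_64_tbp_watts = 295     # AMD's typical board power for RX Vega 64

print(f"{xeon_w_tdp_watts + vega_64_tbp_watts} W of rated silicon")  # 435 W
```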

For the iMac Pro, Apple has redesigned the airflow of the chassis and finally added a second fan. Philosophically, though, the company has a low tolerance for fan noise.

Will the iMac Pro significantly throttle? We’ll see.

 

* I’m only referencing the official vendor thermal ratings, and TDP and TBP mean different things.

Vega

From my perspective, AMD is currently the most fascinating company in tech. Its Zen CPU microarchitecture and Ryzen desktop CPUs met or even exceeded expectations, realizing performance and IPC comparable to Broadwell and providing Intel with real competition in the x86 space for the first time in years. I am increasingly convinced by CEO Lisa Su's efforts to turn the company around from the dire straits it was in until the launch of Zen.

AMD’s new Vega GPU architecture has been especially interesting to follow in recent months. I will caveat this article by saying I’m not very familiar with desktop parts, especially GPUs, so I don’t really know much beyond the basics.

What I don’t think most people know, though, is that GPUs are process-constrained. Vega is fabbed on GlobalFoundries’ 14LPP process, which is licensed from Samsung Foundry. TSMC’s 16FF+ was a little better than 14LPP in terms of power and performance, though it’s at least possible process maturity may have closed some of the gap over time. Quite how so many people expected Vega 10 to outperform NVIDIA’s GP104 GPU escapes me, then, given that the two GPUs are fabbed on very comparable processes. (HBM2 memory should make a difference, though, on paper.) I think many people simply assumed that if Vega came out later than NVIDIA’s Pascal, then it must be better.

If you are not familiar with the current state of the PC GPU market, NVIDIA has had a significant efficiency advantage since the introduction of its Maxwell architecture in 2014. It was later revealed that NVIDIA had adopted a tile-based rasterizer, which played a major though not exclusive role in eking out this efficiency advantage.
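If tiling is unfamiliar, the basic idea is to bin triangles into small screen tiles so per-pixel framebuffer work can stay in fast on-chip memory. The toy sketch below shows the binning step only, and is in no way NVIDIA's undisclosed implementation:

```python
# Toy sketch of the binning step in tile-based rasterization: triangles
# are sorted into small screen tiles first, so each tile's pixels can be
# shaded and blended in on-chip memory and written to DRAM once. This is
# the general idea only, not NVIDIA's undisclosed implementation.
TILE = 16  # tile size in pixels

def tiles_touched(tri, screen_w, screen_h):
    """Return the (tile_x, tile_y) bins covered by a triangle's bounding box."""
    xs = [v[0] for v in tri]
    ys = [v[1] for v in tri]
    x0, x1 = max(0, min(xs)), min(screen_w - 1, max(xs))
    y0, y1 = max(0, min(ys)), min(screen_h - 1, max(ys))
    return {(tx, ty)
            for tx in range(x0 // TILE, x1 // TILE + 1)
            for ty in range(y0 // TILE, y1 // TILE + 1)}

# A triangle spanning a 40x40 pixel area touches only 9 of the screen's
# 16x16 tiles; all framebuffer work for those tiles stays local.
print(len(tiles_touched([(100, 100), (140, 100), (100, 140)], 1920, 1080)))
```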

Beyond that, it was apparent once AMD announced the TBPs (typical board power ratings) for the first Vega cards that the architecture is fairly terrible on power efficiency. This is not good, because power efficiency is pretty much the most important metric for any IC. To speculate on the reasons behind it at this point would be wild guessing, but it does appear that some things went wrong.

Speaking from experience in the mobile space, I've often seen vendors who are somewhat uncompetitive on efficiency boost performance to match the competition on benchmarks, by operating their silicon at more inefficient points on the performance-per-watt curve. That said, Vega matching Pascal's performance is not something to be taken for granted either, and thankfully it does. Vega's clock speeds are also not a worry.
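As a rough sketch of why that tactic costs so much power: dynamic power scales roughly with C·V²·f, and higher clocks generally need higher voltage, so power climbs far faster than performance near the top of the curve. The voltage and frequency pairs below are invented for illustration, not Vega's actual DVFS table:

```python
# Dynamic power scales roughly with C * V^2 * f, and higher frequency
# generally needs higher voltage, so power rises much faster than
# performance near the top of the curve. These voltage/frequency pairs
# are invented for illustration, not Vega's actual DVFS table.
operating_points = [
    # (frequency in MHz, voltage in volts)
    (1200, 0.90),
    (1400, 1.00),
    (1600, 1.15),
]

base_f, base_v = operating_points[0]
for f, v in operating_points:
    rel_perf = f / base_f
    rel_power = (f / base_f) * (v / base_v) ** 2  # ~ C*V^2*f with fixed C
    print(f"{f} MHz: ~{rel_perf:.2f}x perf for ~{rel_power:.2f}x power")
```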

Software-wise, AMD’s drivers were clearly running very late. Software historically has not been AMD’s strength, though I am optimistic things will be improving from now on. However, one wonders why the drivers and various new features are so delayed.

Everyone knows that Vega was late. While HBM2 yields likely played a role, there's probably more to it. Someone smart said that AMD did the right thing by delaying the products, as opposed to it ostensibly being a stupid move.

To me, it looks like AMD probably had enough issues with Vega that it had to rush out a respin. On the one hand, that would clearly not be good. On the other hand, if so, I'm really glad AMD paid to do it and delayed the non-Frontier cards. Respins are really expensive, and in mobile, consumers often aren't lucky enough to get them. That is the extent of my familiarity with these things, at least. The situation with Vega is not the end of the world, since its performance is still competitive and Vega will sell out for quite a long time regardless.

For architecture and competitive analysis, I recommend reading AnandTech (and only AnandTech), though of course useful benchmarks can be found on many sites. I would also recommend waiting a week or two to see how the AT review gets updated, because it's impossible to actually analyze much of anything in the time before a review embargo lifts.

And as much as this will probably pain gamers to hear, I consider Vega’s performance on deep learning operations to be much more important than its gaming credentials. There is an inordinate amount of money at stake if AMD can manage to move the needle with Radeon Instinct and HIP against NVIDIA’s domination in deep learning.

A2DP and HFP were switched over to BLEA as of iOS 9

A former Apple engineer has shared that Apple switched A2DP and HFP over to its own Bluetooth LE audio standard in iOS 9. This blew my mind.

For background, here are some of the basics. There are two “Bluetooths”: Classic and Low Energy (LE). The former is the streaming standard that everyone knows through wireless headsets and speakers, while the latter is basically what every modern peripheral device or hardware accessory, such as a smartwatch, uses to transmit data.

LE is also called Bluetooth Smart. LE is bursty and lower power (though not necessarily inherently more efficient), and was designed to enable devices running on coin cell batteries. You can do crazy things like stream video over it, though, if you so desire. (Don't do that.)

I've been vaguely keeping track of progress on BLE audio for a few years. I knew that the Bluetooth SIG was working on an LE audio standard, but am amazed that Apple secretly deployed its own in 2015. It's not magic, though, and is still based on LE: "Configuring the HAs [hearing aids] is performed through LE services & characteristics, but the audio streaming channel is secret sauce."

Bluetooth LEA, as Apple calls it, is not used by the AirPods. I’m not sure why, but it may simply be because LEA’s quality is still inferior to Classic audio streaming. Streaming audio is inherently difficult because of LE’s lower duty cycle, which is what makes LE more efficient in general.
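To put very rough numbers on the duty-cycle point, here is a sketch comparing typical accessory traffic with a continuous audio stream over an LE 4.x link. The bitrates and the overhead factor are my assumptions, not anything Apple has disclosed about LEA:

```python
# Rough duty-cycle comparison for a Bluetooth LE 4.x radio (1 Mbps
# on-air PHY). The bitrates and the 2x protocol-overhead factor are
# assumptions for illustration, not Apple's LEA parameters.
PHY_BITS_PER_SECOND = 1_000_000

def airtime_fraction(payload_bits_per_second, overhead_factor=2.0):
    """Very approximate fraction of time the radio must be active."""
    return payload_bits_per_second * overhead_factor / PHY_BITS_PER_SECOND

sensor_stream = 20 * 8        # a smartwatch-style 20-byte/s notification
audio_stream = 160_000        # a hypothetical ~160 kbps compressed stream

print(f"Sensor: ~{airtime_fraction(sensor_stream):.3%} radio duty cycle")
print(f"Audio:  ~{airtime_fraction(audio_stream):.0%} radio duty cycle")
# Audio keeps the radio awake orders of magnitude more often, which
# collides with the low-duty-cycle assumption behind LE's efficiency.
```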

Pairing is the same as for the AirPods, using standard LE protocols, though there may be specific codec features that Apple depends on. To emphasize, this is all still built on top of standard Bluetooth. And I believe the SIG is working on a similar pairing UX feature. (Keep in mind that pairing is not required with LE as it is with Classic. Otherwise, say, Bluetooth beacons wouldn’t exist.)

 

Aside:

I frequently see people complaining that “Bluetooth sucks” or “Bluetooth is always supposed to get better next year.” Before they were announced, for some reason people even wondered if Apple was going to replace “Bluetooth” for its AirPods. The problem is that people are almost always thinking of the wrong Bluetooth.

I won’t fully explain it here, but basically Classic and LE are different radios. To oversimplify: you can think of Bluetooth 4.0 and later as a completely different spec than 3.0 and earlier. For example, Bluetooth 5 has absolutely nothing to do with the Bluetooth that people normally think of (Classic).

 

* Thanks to Brendan Sharks for suggesting a correction to the article title.

ARM's brand refresh

I can't think of any semiconductor companies with well-designed logos or wordmarks, but at least this one is better than the old one?

I will definitely never get used to writing "Arm."

SoC suicide

This isn’t really a fix. Sky-high voltages + thermal stress = it’s dead, Jim.

Worth noting: AnandTech got a lot of grief when it didn’t recommend the Nexus 6P or any other Snapdragon 810 or 808 device.

The HDR iPhone 8

I want to write about what should be the highlight feature of the iPhone 8: its HDR display...

This article is available for subscribers on Patreon.

"The New Firefox and Ridiculous Numbers of Tabs"

I’m going to switch to Firefox for a while to try this out.

Even though Firefox is not my main browser, its “Don't load tabs until selected” option has always been my favorite browser feature. The number of tabs I want to load on first launch is exactly one. In an ideal world, the resource overhead of tabs you’re not currently looking at should be as close to zero as possible.