As AI chips improve, is TOPS the best way to measure their power?

Once in a while, a young company will claim more experience than seems logical — a just-opened law firm might tout 60 years of legal experience, yet actually consist of three people who have each practiced law for 20 years. The number “60” catches your eye and summarizes something, but it might leave you wondering whether you’d instead prefer one lawyer with 60 years of experience. There’s no universally correct answer; your choice should be based on the type of services you’re looking for. A single lawyer might be superb at certain tasks and not great at others, while three lawyers with solid experience could canvass a wider range of subjects.

If you understand that example, you also understand the challenge of evaluating AI chip performance using “TOPS,” a metric short for trillions of operations per second (sometimes read as “tera operations per second”). Over the past few years, mobile and laptop chips have grown to include dedicated AI processors, typically measured by TOPS as an abstract measure of capability. Apple’s A14 Bionic brings 11 TOPS of “machine learning performance” to the new iPad Air tablet, while Qualcomm’s smartphone-ready Snapdragon 865 claims a faster AI processing speed of 15 TOPS.

But whether you’re an executive considering the purchase of new AI-capable computers for an enterprise, or an end user hoping to understand just how much power your next phone will have, you’re probably wondering what these TOPS numbers really mean. To demystify the concept and put it in some perspective, let’s take a high-level look at the concept of TOPS, as well as some examples of how companies are marketing chips using this metric.

TOPS, explained

Though some people dislike the use of abstract performance metrics when evaluating computing capabilities, customers tend to prefer simple, seemingly understandable distillations to the alternative, and perhaps rightfully so. TOPS is a classic example of a simplifying metric: It tells you in a single number how many computing operations an AI chip can handle in one second — in other words, how many basic math problems a chip can solve in that very short period of time. While TOPS doesn’t differentiate between the types or quality of operations a chip can process, if one AI chip offers 5 TOPS and another offers 10 TOPS, you can reasonably assume that the second can, at peak, churn through twice as many operations as the first.
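To make the metric a bit more concrete: vendors typically derive a TOPS figure from the chip’s peak theoretical throughput. A common back-of-the-envelope approach — a sketch, not any particular vendor’s disclosed methodology — counts each multiply-accumulate (MAC) unit as two operations per clock cycle. The accelerator figures below are hypothetical.

```python
def estimated_tops(mac_units: int, clock_hz: float) -> float:
    """Rough peak-throughput estimate: each multiply-accumulate (MAC)
    unit is counted as 2 operations (a multiply plus an add) per cycle."""
    ops_per_second = mac_units * 2 * clock_hz
    return ops_per_second / 1e12  # convert to trillions of ops/sec

# Hypothetical accelerator: 4,096 MAC units clocked at 1 GHz
print(estimated_tops(4096, 1e9))  # prints 8.192
```

Note that this is a peak number: real workloads rarely keep every MAC unit busy every cycle, which is one reason TOPS alone can mislead.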

Yes, holding all else equal, a chip that does twice as much in one second as last year’s version could be a big leap forward. As AI chips blossom and mature, the year-to-year AI processing improvement may even be as much as nine times, not just two. But from chip to chip, there may be multiple processing cores tackling AI tasks, as well as differences in the types of operations and tasks certain chips specialize in. One company’s solution might be optimized for common computer vision tasks, or able to compress deep learning models, giving it edges over less purpose-specific rivals; another may just be solid across the board, regardless of what’s thrown at it. Just like the law firm example above, distilling everything down to one number removes the nuance of how that number was arrived at, potentially distracting customers from specializations that make a big difference to developers.

Simple measures like TOPS have their appeal, but over time, they tend to lose whatever meaning and marketing appeal they might initially have had. Video game consoles were once measured by “bits” until the Atari Jaguar arrived as the first “64-bit” console, demonstrating the foolishness of focusing on a single metric when total system performance was more important. Sony’s “32-bit” PlayStation ultimately outsold the Jaguar by a 400:1 ratio, and Nintendo’s 64-bit console by a 3:1 ratio, all but ending reliance on bits as a proxy for capability. Megahertz and gigahertz, the classic measures of CPU speeds, have similarly become less relevant in determining overall computer performance in recent years.

Apple on TOPS

Apple has tried to reduce its use of abstract numeric performance metrics over the years: Try as you might, you won’t find references on Apple’s website to the gigahertz speeds of its A13 Bionic or A14 Bionic chips, nor the specific capacities of its iPhone batteries — at most, it will describe the A14’s processing performance as “mind-blowing,” and offer examples of the number of hours one can expect from various battery usage scenarios. But as interest in AI-powered applications has grown, Apple has atypically called attention to how many trillion operations its latest AI chips can process in a second, even if you have to hunt a little to find the details.

Apple’s just-introduced A14 Bionic chip will power the 2020 iPad Air, as well as multiple iPhone 12 models slated for announcement next month. At this point, Apple hasn’t said a lot about the A14 Bionic’s performance, beyond noting that it enables the iPad Air to be faster than its predecessor, and has more transistors inside. But it offered several details about the A14’s “next-generation 16-core Neural Engine,” a dedicated AI chip with 11 TOPS of processing performance — a “2x increase in machine learning performance” over the A13 Bionic, which has an 8-core Neural Engine with 5 TOPS.

Previously, Apple noted that the A13’s Neural Engine was dedicated to machine learning, but assisted by two machine learning accelerators on the CPU, plus a Machine Learning Controller to automatically balance efficiency and performance. Depending on the task and current system-wide allocation of resources, the Controller can dynamically assign machine learning operations to the CPU, GPU, or Neural Engine, so AI tasks get done as quickly as possible by whatever processor and cores are available.

Some confusion comes in when you notice that Apple is also claiming a 10x improvement in calculation speeds between the A14 and A12. That appears to refer specifically to the machine learning accelerators on the CPU, which may act as the primary processor for certain tasks, or as the secondary processor when the Neural Engine or GPU are otherwise occupied. Apple doesn’t break down exactly how the A14 routes specific AI/ML tasks, presumably because it doesn’t think most users care to know the details.

Qualcomm on TOPS

Apple’s “tell them only a little more than they need to know” approach contrasts mightily with Qualcomm’s, which generally requires both engineering expertise and an atypically long attention span to digest. When Qualcomm talks about a new flagship-class Snapdragon chipset, it’s open about the fact that it distributes various AI tasks to multiple specialized processors, but provides a TOPS figure as a simple summary metric. For the smartphone-focused Snapdragon 865, that AI number is 15 TOPS, while its new second-generation Snapdragon 8cx laptop chip promises 9 TOPS of AI performance.

The confusion comes in when you try to figure out how exactly Qualcomm arrives at those numbers. Like prior Snapdragon chips, the 865 includes a “Qualcomm AI Engine” that aggregates AI performance across multiple processors, ranging from the Kryo CPU and Adreno GPU to a Hexagon digital signal processor (DSP). Qualcomm’s latest AI Engine is “fifth-generation,” including an Adreno 650 GPU promising 2x higher TOPS for AI than the prior generation, plus new AI mixed precision instructions, and a Hexagon 698 DSP claiming 4x higher TOPS and a compression feature that reduces the bandwidth required by deep learning models. It appears that Qualcomm is adding the separate processors’ numbers together to arrive at its 15 TOPS total; you can decide whether you prefer getting multiple diamonds with a large total carat weight, or one diamond with a similar but slightly lower weight.
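The aggregation itself is simple arithmetic; what matters is that the headline number hides the breakdown. The per-processor figures below are purely illustrative — Qualcomm does not publish this split — but they show how a single marketing total can mask very different underlying hardware.

```python
# Hypothetical per-processor peak figures feeding an aggregate
# "AI Engine" rating. These numbers are illustrative only, not
# Qualcomm's actual (undisclosed) breakdown.
per_processor_tops = {
    "CPU (ML accelerators)": 1.0,
    "GPU": 4.0,
    "DSP (tensor/vector/scalar)": 10.0,
}

aggregate = sum(per_processor_tops.values())
print(f"Advertised total: {aggregate} TOPS")  # prints: Advertised total: 15.0 TOPS
```

Two chips advertising the same total could thus behave very differently on a workload that only one processor handles well.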

If those details weren’t enough to get your head spinning, Qualcomm also notes that the Hexagon 698 includes AI-boosting features such as tensor, scalar, and vector acceleration, as well as the Sensing Hub, an always-on processor that draws minimal power while awaiting either camera or voice activation. These AI features aren’t necessarily exclusive to Snapdragons, but the company tends to spotlight them in ways Apple does not, and its software partners — including Google and Microsoft — aren’t afraid to use the hardware to push the edge of what AI-powered mobile devices can do. While Microsoft might want to use AI features to improve a laptop’s or tablet’s user authentication, Google might rely on an AI-powered camera to let a phone self-detect whether it’s in a car, office, or movie theater, and adjust its behaviors accordingly.

Though the new Snapdragon 8cx has fewer TOPS than the 865’s 15 — it offers 9 TOPS, compared with 6 TOPS for the less expensive Snapdragon 8c and 5 TOPS for the 7c — note that Qualcomm is ahead of the curve just by including dedicated AI processing functionality in a laptop chipset, one benefit of building laptop platforms upwards from a mobile foundation. This gives the Snapdragon laptop chips baked-in advantages over Intel processors for AI applications, and we can reasonably expect to see Apple use the same strategy to differentiate Macs when they start moving to “Apple Silicon” later this year. It wouldn’t be surprising to see Apple’s first Mac chips stomp Snapdragons in both overall and AI performance, but we’ll probably have to wait until November to hear the details.

Huawei, Mediatek, and Samsung on TOPS

There are options beyond Apple’s and Qualcomm’s AI chips. China’s Huawei, Taiwan’s Mediatek, and South Korea’s Samsung all make their own mobile processors with AI capabilities.

Huawei’s HiSilicon division made flagship chips called the Kirin 990 and Kirin 990 5G, which differentiate their Da Vinci neural processing units with either two- or three-core designs. Both Da Vinci NPUs include one “tiny core,” but the 5G version jumps from one to two “big cores,” giving the higher-end chip extra power. The company says the tiny core can deliver up to 24 times the efficiency of a big core for AI facial recognition, while the big core handles larger AI tasks. It doesn’t disclose the number of TOPS for either Kirin 990 variant, and both chips have apparently been discontinued due to a ban by the U.S. government.

Mediatek’s current flagship, the Dimensity 1000+, includes an AI Processing Unit called the APU 3.0. Alternately described as a hexa-core processor or a six AI processor solution, the APU 3.0 promises “up to 4.5 TOPS performance” for use with AI camera, AI assistant, in-app, and OS-level AI needs. Since Mediatek chips are typically destined for midrange smartphones and affordable smart devices such as speakers and TVs, it’s simultaneously unsurprising that it’s not leading the pack in performance, and interesting to think of how much AI capability will soon be considered table stakes for inexpensive “smart” products.

Last but not least, Samsung’s Exynos 990 has a “dual-core neural processing unit” paired with a DSP, promising “approximately 15 TOPS.” The company says its AI features enable smartphones to include “intelligent camera, virtual assistant and extended reality” features, including camera scene recognition for improved image optimization. Samsung notably uses Qualcomm’s Snapdragon 865 as an alternative to the Exynos 990 in many markets, which many observers have taken as a sign that Exynos chips just can’t match Snapdragons, even though Samsung has full control over its own manufacturing and pricing.

Top of the TOPS

Mobile processors have become popular and critically important, but they’re not the only chips with dedicated AI hardware in the marketplace, nor are they the most powerful. Designed for data centers, Qualcomm’s Cloud AI 100 inference accelerator promises up to 400 TOPS of AI performance with 75 watts of power, though the company uses another metric — ResNet-50 deep neural network processing — to favorably compare its inference performance to rival solutions such as Intel’s 100-watt Habana Goya ASIC (~4x faster) and Nvidia’s 70-watt Tesla T4 (~10x faster). Many high-end AI chipsets are offered at multiple speed levels based on the power supplied by various server-class form factors, any of which will be considerably more than a smartphone or tablet can offer with a small rechargeable battery pack.
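Because data-center accelerators operate under very different power budgets than phones, raw TOPS is often normalized by power draw. A quick TOPS-per-watt comparison — using only the figures cited above, with the caveat that peak TOPS and sustained real-world throughput are not the same thing — might look like this:

```python
def tops_per_watt(tops: float, watts: float) -> float:
    """Peak AI throughput per watt of power draw, a common (if crude)
    efficiency yardstick for inference accelerators."""
    return tops / watts

# Qualcomm Cloud AI 100, as described above: up to 400 TOPS at 75 W
print(round(tops_per_watt(400, 75), 2))  # prints 5.33
```

By this crude measure the Cloud AI 100 delivers roughly 5.3 TOPS per watt, which is why Qualcomm leans on efficiency as well as benchmarks like ResNet-50 when comparing itself to rivals.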

Another key factor to consider is the comparative role of an AI processor in an overall hardware package. Whereas an Nvidia or Qualcomm inference accelerator might well have been designed to handle machine learning tasks all day, every day, the AI processors in smartphones, tablets, and computers are typically not the star features of their respective devices. In years past, no one even considered devoting a chip full time to AI functionality, but as AI becomes an increasingly compelling selling point for all sorts of devices, efforts to engineer and market more performant solutions will continue.

Just as was the case in the console and computer performance wars of years past, relying on TOPS as a singular data point when assessing the AI processing potential of any solution probably isn’t wise — and if you’re reading this as an AI expert or developer, you probably knew as much before opening this article. End users considering the purchase of AI-powered devices should look past simple numbers in favor of solutions that perform the tasks that matter to them, while businesses should weigh TOPS alongside other metrics and features — such as the presence or absence of specific accelerators — to make investments in AI hardware that will be worth keeping around for years to come.
