ARM’s edge-based AI vision enhanced by first DynamIQ cores

ARM has unveiled two new Cortex processor cores, the A75 and A55, the first based on its new DynamIQ architecture. DynamIQ is designed to increase flexibility in the Cortex-A range by allowing different types of cores to be mixed and matched for different applications. While the new additions can address many uses, ARM is particularly talking up their place in powering AI-based analytics at the network edge.

Fitting into the fog computing/analytics trend, the chips open up options for developers looking to move compute cycles from the cloud to the network edge – a tricky balancing act when it comes to bandwidth and batteries.

The battle for AI chips:

While artificial intelligence (AI) is still the hottest technology buzzword, real applications are already embracing the machine learning, computer vision, and natural language processing improvements enabled by modeling computer systems on the human brain. While we are still a long way from HAL or other science fiction AIs, AI-based cloud software is bringing about a new field of applications.

There are still disputes about the types of processors that will be used for these functions, with GPUs from the likes of Nvidia, Intel’s high-performance server processors, and custom ASICs like Google’s TPUs all vying for position in a rare growth market. GPUs have proved very popular among early-stage developers, thanks to their performance compared to traditional server CPUs, but it seems likely that specialist chips will emerge, optimized to run AI-specific software tasks and often working in conjunction with standard CPUs.

However, those chips are all bound to the data center, or at the very least a very powerful desktop workstation. While Intel is battling the encroachment of new approaches into what looked like a source of continuous growth for its Xeon shipments, ARM is proposing to move those compute cycles to the network edge – onto mobile devices and gateways that could tackle the endemic issue of round-trip latency in the cloud-compute model.

This is true to ARM’s heritage and strengths. Although it has developed cores for server processors, and companies like Cavium and Qualcomm have commercial chips, ARM’s DNA is in distributed, low-power architectures. So it is unsurprising that it argues that a cloud-centric approach to replicating the functions of a human brain “is not an optimal long term solution if we want to make the life-changing potential of AI ubiquitous and closer to the user for real time inference and greater privacy”.

Consequently, the company is aiming to improve the performance of its AI chips by 50 times over the next 3-5 years, which would be well ahead of the regular Moore’s Law curve to which the industry has become so accustomed. As such, the new Cortex-A75 and Cortex-A55 cores are something of a baseline for ARM, which will presumably follow up with optimized firmware and software, as well as new silicon designs, in the near future.
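For scale, a back-of-the-envelope comparison shows why the 50x target outruns the usual curve. This is a hypothetical sketch: the two-year doubling period is an assumption, not a figure from ARM.

```python
# Back-of-the-envelope: ARM's stated 50x AI-performance target vs. a
# conventional Moore's Law cadence. The doubling period is assumed.
doubling_period = 2.0  # years per doubling (assumption)

for years in (3, 5):
    moores_gain = 2 ** (years / doubling_period)
    print(f"Moore's Law over {years} years: ~{moores_gain:.1f}x (target: 50x)")
```

Under that assumption, conventional scaling yields roughly 3x to 6x over the same window – an order of magnitude short of ARM’s stated goal.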

“Enabling secure and ubiquitous AI is a fundamental guiding design principle for ARM considering our technologies currently reach 70% of the global population. As such ARM has a responsibility to re-architect the compute experience for AI and other human-like compute experiences. To do this, we need to enable faster, more efficient, and more secure distributed intelligence between computing at the edge of the network and into the cloud,” wrote ARM’s Nandan Nayampally, VP of marketing and strategy.

Edge vs centralized processing?

A core dilemma for cloud computing, big data and AI tasks such as deep learning is whether to carry out most of the computation at the edge, on the device that is collecting the information, or on a cloud computing instance, once the data has been transported from the edge to the center. While a local area gateway could also be used to carry out the processing, instead of the IoT sensor hub itself, broadly the question still stands – central cloud or distributed cloud, data center or edge, server-class CPUs or mobile CPUs?

Because no two installations are the same, there isn’t a single answer to the question. For edge devices with plentiful power, like a gateway plugged into the mains or a solar-charged battery pack, compute cycles are not the problem – they can crunch as many numbers as they can fit into the allocated time slot.

For edge devices that are bandwidth-constrained, either because of the limits of a data plan or the power ‘cost’ of sending a message on a finite battery capacity, there’s an obvious disincentive to streaming data towards the cloud – but there is also a balance to strike when processing that data on the edge device, as those CPU cycles will consume precious battery life.

In both instances, more efficient compute resources have clear benefits, and for battery-powered devices, a more efficient CPU could enable a new kind of application at the edge that was previously off the table because of battery constraints. These edge processing applications will prove more popular in remote locations, where the bandwidth required to drive a near-real-time application is either cost-prohibitive or simply unavailable.
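To make that trade-off concrete, here is a minimal sketch of the decision a battery-powered node faces: spend radio energy offloading raw data to the cloud, or spend CPU energy reducing it locally and send only the result. All constants are illustrative assumptions, not measured values.

```python
# Toy energy model for the edge-vs-cloud decision on a battery-powered node.
# Every constant here is an illustrative assumption, not a measurement.
RADIO_UJ_PER_BYTE = 2.0   # assumed radio energy per byte sent (microjoules)
CPU_UJ_PER_CYCLE = 0.001  # assumed CPU energy per cycle (microjoules)

def offload_cost(raw_bytes: int) -> float:
    """Energy to ship the raw sample stream to the cloud."""
    return raw_bytes * RADIO_UJ_PER_BYTE

def local_cost(raw_bytes: int, cycles_per_byte: int, result_bytes: int) -> float:
    """Energy to process locally, then transmit only the compact result."""
    compute = raw_bytes * cycles_per_byte * CPU_UJ_PER_CYCLE
    return compute + result_bytes * RADIO_UJ_PER_BYTE

raw = 100_000  # e.g. one burst of sensor data; assumed size in bytes
for cpb in (100, 1_000, 10_000):
    print(f"{cpb:>6} cycles/byte: offload {offload_cost(raw):>9.0f} uJ, "
          f"local {local_cost(raw, cpb, result_bytes=64):>9.0f} uJ")
```

Under these made-up numbers, local processing wins until the workload becomes compute-heavy enough for CPU energy to dominate the radio saving – exactly the balance that a more efficient core shifts in the edge’s favor.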

DynamIQ’s first cores make their debut:

With ARM targeting the distributed approach, it has unveiled the first two designs based on its DynamIQ architecture, which allows up to eight different processor cores to be combined in various ways according to the needs of a particular application – essentially an evolution of big.LITTLE, which used low-power cores for basic tasks but could fire up more powerful cores when heavy lifting was required. The Cortex-A75 is the core fueling the 50x improvement claim, and is geared towards high-end applications and devices, while the Cortex-A55 boasts a 2.5x improvement in performance per milliwatt compared to the previous-generation Cortex-A53, which was launched in 2013 and has shipped some 1.5bn units. Of course, a chip can combine the two new cores in the same design, up to the current eight-core limit.
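The scheduling idea DynamIQ extends can be sketched in a few lines. The dispatcher below is a hypothetical toy, not ARM’s hardware logic or any operating system’s actual scheduler: it models a mixed cluster of ‘big’ A75 and ‘little’ A55 cores and routes each task to the lowest-power core that can meet its demand. The performance and power figures are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Core:
    name: str
    perf: float   # relative throughput; invented figure
    power: float  # relative power draw; invented figure

# A DynamIQ-style cluster: up to eight cores of mixed types.
CLUSTER = (
    [Core("Cortex-A55", perf=1.0, power=1.0) for _ in range(6)]
    + [Core("Cortex-A75", perf=2.2, power=3.0) for _ in range(2)]
)

def place(task_demand: float) -> Core:
    """Pick the lowest-power core whose throughput covers the task,
    falling back to the fastest core when none can."""
    capable = [c for c in CLUSTER if c.perf >= task_demand]
    if capable:
        return min(capable, key=lambda c: c.power)
    return max(CLUSTER, key=lambda c: c.perf)

print(place(0.5).name)  # light task -> a little Cortex-A55
print(place(2.0).name)  # heavy task -> a big Cortex-A75
```

In a real system this placement lives in the OS scheduler (energy-aware scheduling in Linux, for example), with DynamIQ providing the hardware hooks – per-core voltage and frequency domains – that make such fine-grained decisions worthwhile.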

The A75 apparently achieves a 20% improvement in mobile performance, and a far more notable 40% improvement in infrastructure tasks, this time compared with the older A72. Improvements in the chip’s microarchitecture – including a superscalar design and out-of-order execution, as well as private L2 caches and a unified L3 cache – are behind the increases in performance.

Both cores can be configured for any of the Cortex-A’s target applications, whereas previous members of the family have been optimized for specific purposes (the A73 for mobile applications or the A72 for servers, for instance).

“DynamIQ is a fundamental change to the way we build Cortex-A clusters,” John Ronco, VP of marketing for the CPU Group at ARM, told EETimes. “There can now be eight CPU cores in a cluster that are totally different – different micro-architectures, different implementations, they can run on different voltage domains, at different frequencies… a lot more flexibility has been introduced.”

Also unveiled was the new Mali-G72 GPU, for machine learning tasks inside the system-on-chip that might also house the new Cortex CPUs. Based on ARM’s new Bifrost architecture, the GPU is being pitched as a machine learning processor as well as a GPU fit to handle virtual/augmented reality applications on mobile devices.

Read the source article at Rethink Technology Research.