Lately, I’ve been covering the overengineering and overprovisioning of resources in support of AI, both in the cloud and not in the cloud. AI architects are putting high-powered processors, such as GPUs, on their AI platform shopping lists without stopping to consider whether they will return business value.
I’ve found myself in more than a few heated disagreements with other IT architects about using these resources for AI. There seem to be two camps forming. The first camp believes AI will need all the processing and storage power that we can afford now, so they beef up the systems ahead of need. In their view, money and carbon footprint don’t need to be considered.
The second camp is configuring a minimum viable platform (MVP) that can support the core functions of AI systems. The idea is to keep it as lean as possible and use lower-powered platforms, such as edge and mobile computing.
Who’s right?
The trend to go small
As we trek into the latter half of 2024, it’s clear that a paradigm shift is reshaping the landscape: AI is downsizing its hardware appetite. In an era where digital efficiency reigns supreme, today’s cutting-edge AI technologies are shedding bulky resource dependencies and morphing into lean and agile models.
The traditional narrative for AI development has long been one of high demand. However, the narrative is undergoing a dramatic rewrite, largely thanks to new advancements in AI algorithms and hardware design.
The development of more efficient neural network architectures, such as transformers and lossless compression algorithms, has played a pivotal role. These innovations have downsized the data required for training and inference, thus reducing the computational effort. This trend is significantly lowering the barrier to entry and offering much smaller and more affordable platforms, in or out of the cloud.
More efficient and cost-effective
A critical milestone in this evolution was the advent of specialized AI processors, such as tensor processing units (TPUs) and neural processing units (NPUs). Unlike their more general-purpose counterparts, such as GPUs, these processors are optimized for the specific demands of AI workloads. They perform more computations per watt, translating to better performance with lower energy consumption.
We’re likely to see more efficient and cost-effective processors as the billions of dollars flowing into the processor space create better options than hugely expensive GPUs. Less processing power, and thus device-centered AI, is where AI systems are heading. That direction is less focused on the major large language models (LLMs) that define the generative AI space.
As I’ve mentioned many times, businesses won’t be building LLMs for their AI implementations; for the next few years, they will be building smaller models for tactical use cases. That is where the investments need to be made.
On the software front, frameworks like TensorFlow Lite and ONNX enable developers to build high-efficiency AI models that scale down appropriately for edge devices. The focus of AI systems development seems to be shifting here; businesses are finding more benefits in building lighter-weight AI systems that can provide more business value with less investment.
One must recognize the magic woven by edge computing. This once-futuristic notion is now very much a reality, driving data processing towards the network’s periphery. By harnessing edge devices—ranging from IoT gadgets to smartphones—AI workloads are becoming more distributed and decentralized. This alleviates bandwidth congestion and latency issues and supports a trend towards minimalistic yet powerful processors.
Bigger isn’t always better
Fast forward to 2024, and our reliance on massive data infrastructures is steadily evaporating. Complex AI systems seamlessly run on devices that fit in the palm of your hand. These are not LLMs and don’t pretend to be LLMs, but they can reach out to LLMs when needed and can process 95% of what they need to process on the device. This is the idea behind the yet-to-be-deployed Apple Intelligence features that will be delivered in the next version of iOS. Of course, this may be intended to drive iPhone upgrades rather than drive more efficiency to AI.
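To make the "process most of it on the device, reach out to an LLM when needed" pattern concrete, here is a minimal sketch in Python. It is an illustration only and not how Apple Intelligence or any other product is implemented. The names answer, local_model, cloud_llm, and min_confidence are hypothetical, and the sketch assumes the on-device model can report a confidence score with its answer.

```python
def answer(request, local_model, cloud_llm, min_confidence=0.8):
    """Route a request: stay on the device when the small model is confident.

    local_model: hypothetical callable returning (text, confidence in [0, 1]).
    cloud_llm:   hypothetical callable returning text from a remote LLM.
    min_confidence: assumed threshold; a real system would tune this.
    """
    text, confidence = local_model(request)
    if confidence >= min_confidence:
        # Handled locally: no network round trip, no cloud compute cost.
        return text, "device"
    # The small model is unsure, so escalate to the larger remote model.
    return cloud_llm(request), "cloud"


# Stand-in models for illustration only; they are not real AI models.
def toy_local_model(request):
    known = {"set a timer": "Timer set."}
    if request in known:
        return known[request], 0.95
    return "", 0.10


def toy_cloud_llm(request):
    return "Cloud answer for: " + request


if __name__ == "__main__":
    print(answer("set a timer", toy_local_model, toy_cloud_llm))
    print(answer("plan my trip", toy_local_model, toy_cloud_llm))
```

The design choice worth noticing is that the expensive resource is the exception path, not the default. The share of requests that stays on the device depends entirely on the workload and the threshold chosen.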
Consider the breakthrough of embedded intelligence in smartphones. Processors like Apple’s A16 Bionic and Qualcomm’s Snapdragon 8 Gen 2 have integrated AI capabilities, spurring a revolution in mobile computing. These chips have machine learning accelerators that manage tasks like real-time language translation, augmented reality-based gaming, and sophisticated photo processing.
Moreover, AI models can now be “trimmed down” without losing efficacy. Model quantization, pruning, and knowledge distillation allow designers to pare down models and streamline them for deployment in resource-limited environments.
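For readers who want to see what "trimming down" looks like, here is a minimal, dependency-free Python sketch of two of these techniques: symmetric 8-bit quantization and magnitude-based pruning. It is a toy illustration on a flat list of weights, not how TensorFlow Lite, ONNX, or any production toolchain does it. The function names and the sample weights are assumptions made up for this example.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127].

    Returns the integer values plus the scale needed to recover approximate floats.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from their 8-bit representation."""
    return [q * scale for q in quantized]


def prune_by_magnitude(weights, sparsity):
    """Zero out roughly the smallest-magnitude `sparsity` fraction of weights.

    Weights tied with the cutoff value are also zeroed, so ties can prune slightly more.
    """
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]


if __name__ == "__main__":
    sample = [0.5, -1.27, 0.01, 0.9]  # made-up weights for illustration
    q, scale = quantize_int8(sample)
    print(q, scale)
    print(dequantize(q, scale))
    print(prune_by_magnitude(sample, 0.5))
```

Each weight stored as an 8-bit integer takes a quarter of the space of a 32-bit float, and pruned weights can be skipped or compressed away. Whether accuracy holds up after either step has to be measured on the actual model.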
This pushes back on the current narrative. Most larger consulting and technology companies are driving partnerships with processor providers. That will be a bell that is hard to unring. Should we be concerned when decisions are based more on business obligations than business requirements, and we keep attempting to stuff expensive and power-hungry GPUs into clouds and data centers? We’re expecting enterprises to create and operate huge AI systems that burn twice as much power and cost twice as much money as they currently do. That is a scary outcome.
This does not mean that we’re going to limit the power that AI needs. We should be concerned with rightsizing our resources and using AI more efficiently. We’re not in a race to see who can build the biggest, most powerful system. It’s about adding business value by taking a minimalist approach to this technology.