CIOs have been moving workloads from legacy platforms to the cloud for more than a decade, but the rush to AI may breathe new life into an old enterprise friend: the mainframe.
At least IBM believes so. Rather than pull away from big iron in the AI era, Big Blue is leaning into it, with plans to release its next-generation Z mainframe in 2025. Featuring a Telum II processor and Spyre AI Accelerator Card, the system is positioned to run large language models (LLMs) and machine learning models for fraud detection and other use cases.
IBM’s current z16 mainframe has baseline AI infusion for machine learning models. In the next iteration, IBM will integrate AI into the core mainframe stack and add AI acceleration at the processor level.
The bet on big iron for AI is intriguing, analysts and IT leaders say.
“As enterprises enter into this new AI-native era, they are discovering that their data assets are their strategic moat. It’s not the technologies they use. It’s not the cloud provider they use. It’s the data they create, maintain, and manage that becomes the strategic model and potential new source of business models going forward,” says Chirag Dekate, analyst at Gartner.
“Mainframes have a very specific role to play in this modern ecosystem,” he continues. “Many enterprise core data assets in financial services, manufacturing, healthcare, and retail rely on mainframes quite extensively. IBM is enabling enterprises to leverage the crown jewels that are managed using mainframes as a first-class citizen in the AI journey.”
Some CIOs, especially from large enterprises that still rely on the mainframe’s batch-processing prowess, are taking a hard look at IBM’s next-gen mainframe to run — but not train — generative AI models.
“IBM continues to demonstrate that it has an advanced approach to AI, which includes embedding AI into the z16. We believe the release of an AI accelerator card is a natural extension of IBM’s roadmap for the mainframe and is likely the next step to enable Watsonx and the mainframe as a true AI platform,” says James Brouhard, director of consulting at FNTS, a wholly owned subsidiary of First National of Nebraska Inc. (FNNI), parent company to First National Bank of Omaha.
“Industries have relied on the mainframe for advanced processing needs for decades,” Brouhard says. “In some ways, industry experts now realize the broader need for the processing power of IBM Mainframe and Power Systems, and AI helps to maintain relevancy.”
Next-gen mainframe AI
The market for mainframes and midrange server systems has been in decline for a decade, according to Gartner research, falling from more than $10.7 billion in 2015 to less than $6.5 billion in 2023, with periodic spikes whenever IBM has introduced a new mainframe generation.
Even so, more than 70% of credit card transactions are processed through mainframes, analysts say. CIOs of many of the largest banks, financial firms, and insurance giants will likely continue to rely on big iron for the foreseeable future — especially if additional AI capabilities on the mainframe reduce their inclination to re-platform on the cloud.
“There are very few platforms out there that can offer hardware-assisted AI. Everybody thinks that GPUs are the only way you can run AI, and that’s hardly true,” says IDC analyst Ashish Nadkarni. “IBM has managed to maintain the installed base of mainframes, and in fact, in some cases, grow revenue by offering unique features like security or other features their customers want. There is not one way to do AI.”
Data security is one major advantage of running machine learning models and LLMs on the Z mainframe. Without needing to distribute data to disparate systems for AI analysis, enterprises will be less likely to compromise on their data governance and security.
Huge savings on hardware, particularly GPUs, are another.
Lisa Dyer, SVP for product LOB at Ensono, an IBM consultancy, says IBM’s “ensemble AI” (machine learning models intermixed with LLMs) running within real-time transactions will be of great interest to IBM Z mainframe customers.
“There are use cases where I can definitely see LLMs being run with data that’s on a mainframe. It’s a natural fit and will be interesting to see how these ensemble AI models work and what use cases will go from experimentation to production,” says Dyer. “It’s a really interesting combination — the fact that you can do all that on the platform where your operational data and your customer data is and where it originates.”
Building on big iron’s unique strengths
At Hot Chips in late August, IBM unveiled a preview of its Telum II chip, I/O acceleration unit, and Spyre Accelerator. Versus the first-generation Telum chip, the Telum II offers increased frequency, more memory capacity, 40% more cache, an integrated AI accelerator core, and a coherently attached Data Processing Unit (DPU), IBM announced. The new processor is expected to support enterprise compute solutions for LLMs. The new DPU is built to accelerate complex I/O protocols for networking and storage on the mainframe.
IBM Spyre is an add-on AI compute capability designed to complement the Telum II processor. The two processors offer a scalable architecture that enables “ensemble methods” of AI modeling — the practice of combining multiple machine learning or deep learning AI models with encoder LLMs, IBM claims.
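To make that pattern concrete, here is a minimal sketch of ensemble scoring in Python, using scikit-learn and synthetic data as stand-ins. One model scores structured transaction fields, a second scores text embeddings (which a real deployment would produce with an encoder LLM), and a meta-model stacks the two scores into a final fraud probability. Everything here, from the data to the model choices to the stacking design, is an illustrative assumption, not IBM's implementation.

```python
# Illustrative "ensemble AI" sketch: classic ML over structured
# transaction fields plus a second model over text embeddings,
# combined by a meta-model. Synthetic data throughout.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1_000

# Structured features (e.g., amount, merchant risk, velocity).
X_tab = rng.normal(size=(n, 3))
# Placeholder for encoder-LLM embeddings of free-text fields such as
# a transaction memo; a real system would compute these with an
# encoder model rather than sample them randomly.
X_emb = rng.normal(size=(n, 16))
# Synthetic fraud labels correlated with both feature sets.
y = (X_tab[:, 0] + X_emb[:, 0] + rng.normal(scale=0.5, size=n) > 1).astype(int)

train, test = slice(0, 800), slice(800, n)

# Model 1: gradient boosting over the structured fields.
tab_model = GradientBoostingClassifier().fit(X_tab[train], y[train])
# Model 2: linear classifier over the text embeddings.
emb_model = LogisticRegression(max_iter=1_000).fit(X_emb[train], y[train])

def scores(idx):
    # Stack each model's fraud probability into a two-column matrix.
    return np.column_stack([
        tab_model.predict_proba(X_tab[idx])[:, 1],
        emb_model.predict_proba(X_emb[idx])[:, 1],
    ])

# Meta-model: combine the two scores into one fraud probability.
meta = LogisticRegression().fit(scores(train), y[train])
print("fraud probabilities:", meta.predict_proba(scores(test))[:5, 1])
```

The appeal on a mainframe is that the component models and the meta-model would all score a transaction in place, next to the operational data, rather than shipping records to a separate AI system.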
CIOs see the potential value, depending on how well Telum II and Spyre handle and execute LLMs.
“If the organization is transaction-heavy, in highly regulated industries, needs an ultra-secure environment for sensitive AI workloads, or is operating under strict data sovereignty restrictions, IBM’s zNext is worth considering as part of a hybrid architecture strategy,” says Chris Nardecchia, SVP and CDIO at Rockwell Automation.
“If I had an existing mainframe environment, I would consider running inference models on the zNext with the AI accelerator card for specific use cases, especially if the business’s core operations demand the mainframe’s unique strengths in reliability and speed,” Nardecchia says. “However, it would depend on the AI strategy, scalability requirements, and the diversity of the AI workloads anticipated. A combination of mainframe and cloud for different tasks might be a more flexible, cost-effective solution.”
Several factors, including processing power, workload specialization, data sovereignty and compliance, and cost and complexity should influence the CIO’s decision, he says.
“While fraud detection is a strong use case for AI on the zNext due to the batch-processing capabilities and the need for constant monitoring, I can see potential beyond fraud detection. LLMs can drive significant insights in compliance, regulatory reporting, risk management, and customer service automation in financial services. Running batch numerical data and text-based LLMs together on the same platform could offer comprehensive insights across operations,” Nardecchia says, adding that running generative AI workloads on-premises might alleviate concerns around data sovereignty, privacy, and governance compared to running models in the cloud. “The data privacy and sovereignty laws around the globe, as well as geopolitical events, would be a strong argument for on-premise and localized compute.”
But will enterprises bite?
Tom Peck, EVP and CIDO at Sysco, is not yet sold on the notion of running gen AI workloads on his company’s mainframe.
“We still use IBM mainframes to run some of our most mission-critical apps, and they aren’t going away. Every $1 invested in retiring mainframes is $1 not invested in growing your business today. But where we have gen AI use cases, we rely on GPUs from our cloud partners — namely Azure, GCP, and/or AWS,” he says. “It’s not something we are currently considering given the many other alternative choices.”
Yogs Jayaprakasam, chief technology and digital officer at Deluxe, says the payments and data solutions firm offloads data from its mainframes onto the company’s data lakehouse before running advanced gen AI and ML models in the public cloud.
Nevertheless, he acknowledges that IBM’s latest AI development will benefit large global enterprises that cannot find a practical way off mainframe systems.
“The introduction of the IBM Telum II processor and the Spyre AI accelerator card is particularly intriguing, as one can now anticipate acceleration of the use of traditional ML models and large language AI models directly into the Z mainframe environments. While GPUs have been the go-to for AI workloads, the integration of AI acceleration directly into the mainframe architecture offers a compelling alternative for realizing immediate value,” he says.
“The ability to run AI models alongside traditional enterprise workloads on a single platform can streamline operations and reduce latency, which is critical for real-time applications,” he adds.
Still, mainframes from IBM and other vendors are not going to replace the cloud for gen AI experimentation and development, as gen AI models can’t be trained on big iron.
As such, Sathish Muthukrishnan, chief information, data, and digital officer at Ally Financial, is electing to stick with the cloud for the time being.
“Gen AI is a nascent and fast evolving technology. From my standpoint, there is plenty to learn, experiment, and execute within the available infrastructure today,” he says. “Based on what we’ve learned and seen so far, we think continuing to connect to large language models through our Ally.ai platform running on the cloud makes sense for Ally.”
Visa, MasterCard, Citibank, and major airlines and insurers are among the 5,000 customers globally that run IBM Z mainframes, and there are about 200 use cases for customers that want to employ AI within mainframe transactions, says Steven Dickens, chief technology advisor at Futurum.
“Thinking about operational deployment of AI, they’re going to look at this huge data set of the most current and fresh data, and [that is] where we’ll see those use cases develop and Spyre as an accelerator is really going to bring value,” he says.
IBM’s next-generation AI-enabled Z mainframe and AI Accelerator Card may not have all the bells and whistles of generative AI development, particularly for training models, but the ability to run inference on LLMs from Hugging Face or developed in IBM Watsonx may be the perfect antidote for committed Z customers, some CIOs say.
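As a rough illustration of that inference-only path, the sketch below loads a small open encoder model through the Hugging Face transformers pipeline API. The model named here is a generic sentiment classifier chosen purely as a placeholder; a real compliance or fraud use case would substitute a domain-tuned encoder, and nothing in the snippet is specific to IBM Z or Watsonx.

```python
# Minimal inference-only sketch using an open model from Hugging Face.
# The model is a small, generic text classifier standing in for
# whatever domain-tuned encoder a bank would actually deploy.
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

result = classifier("Wire transfer of $9,900 split across three new payees")
print(result)  # e.g., [{'label': 'NEGATIVE', 'score': 0.98}]
```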
“I’ve seen the wake that the mainframe leaves and the lift it takes to transition, which is one of the reasons why I could see those kinds of companies trying to hang onto the platform and take advantage of gen AI if they can,” says Nate Melby, VP and CIO of Dairyland Power Cooperative, whose former employer’s operations included a mainframe.