Just as Japanese Kanban techniques revolutionized manufacturing several decades ago, similar “just-in-time” methods are paying dividends as companies get their feet wet with generative AI.
“The timeliness is critical. You don’t want to do the work too much in advance because you want that real-time context. We activate the AI just in time,” says Sastry Durvasula, chief information and client services officer at financial services firm TIAA.
TIAA has launched a generative AI implementation, internally referred to as “Research Buddy,” that pulls together relevant facts and insights from publicly available documents for Nuveen, TIAA’s asset management arm, on an as-needed basis.
“When the research analysts want the research, that’s when the AI gets activated. It takes the input from the analyst, provides the responses to analysts’ questions, and generates the report,” explains Durvasula.
However, timeliness isn’t the only reason for a just-in-time approach to AI. The expense of gen AI processing is at least as important. “The cost of AI can be astronomically high and not always justified in terms of business value,” notes Durvasula.
Not all the time
Forrester analyst Mike Gualtieri says the just-in-time approach is great — but only sometimes.
“It’s a concept I hear a lot about but I’m not sure I agree with what people are saying,” he says, adding that most leaders are interested in just-in-time approaches because they think gen AI is expensive. It might be for low-margin customer interactions, but for times when millions of dollars are on the line, the cost of invoking generative AI is a pittance, Gualtieri says.
“If it costs you a million dollars and saves you $10 million, then cost should not hold you back,” he asserts.
Gualtieri says IT leaders should know when cost is a factor for their AI workloads, and when it’s not. For example, because they generally use pre-trained large language models (LLMs), most organizations aren’t spending exorbitant amounts on infrastructure and the cost of training the models. And although AI talent is expensive, the use of pre-trained models also makes high-priced data-science talent unnecessary.
“They just need their software development team to incorporate that [gen AI] component into an application, so talent is no longer a limiting factor,” the analyst claims.
The use of retrieval-augmented generation (RAG) services is one way to keep AI costs down, he says. RAG improves quality and relevance of gen AI output while reducing the need for custom model training and keeping a lid on costs. “Vendors are providing built-in RAG solutions so enterprises won’t have to build them themselves. Google has come up with a RAG service. You use a model and then inject the content at the last minute when you need it,” Gualtieri explains.
That last point, however, sums up the value proposition of just-in-time approaches to generative AI: making model calls only when necessary, and at the last minute of need. Indeed, teams that have adopted techniques such as RAG treat them as best practices for AI-infused operations, both to deliver maximum business value and to minimize the AI load of their targeted use cases and workflows.
Such techniques enable enterprises to make use of off-the-shelf, pre-trained LLMs without the need to further train them against their specific data sets, and to engineer workflows that, through RAG, emphasize software development work over more expensive, and scarcer, data science talent. This is part of the ethos of just-in-time AI.
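To make the "inject the content at the last minute" idea concrete, here is a minimal Python sketch of on-demand retrieval-augmented generation. It is an illustration only, not TIAA's, Google's, or any vendor's implementation: the names `Document`, `retrieve`, `build_prompt`, `answer_on_demand`, and `call_model` are hypothetical, and simple word overlap stands in for the vector search a production RAG service would more likely use. The model itself is passed in as a plain function, so any pre-trained LLM client could be plugged in.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class Document:
    """A hypothetical source document, e.g. a public filing."""
    title: str
    text: str


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, corpus: list[Document], k: int = 2) -> list[Document]:
    """Rank documents by word overlap with the question; keep the top k that match."""
    q = tokenize(question)
    scored = [(len(q & tokenize(d.text)), d) for d in corpus]
    scored = [(score, d) for score, d in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:k]]


def build_prompt(question: str, context: list[Document]) -> str:
    """Inject the retrieved content into the prompt at request time."""
    sources = "\n".join(f"[{d.title}] {d.text}" for d in context)
    return f"Answer using only these sources:\n{sources}\n\nQuestion: {question}"


def answer_on_demand(
    question: str,
    corpus: list[Document],
    call_model: Callable[[str], str],
) -> str:
    """Do retrieval and the model call only when a question actually arrives."""
    context = retrieve(question, corpus)
    if not context:
        # Nothing relevant was found, so skip the (costly) model call entirely.
        return "No relevant sources found."
    return call_model(build_prompt(question, context))


if __name__ == "__main__":
    corpus = [
        Document("bonds", "Municipal bond yields rose this quarter."),
        Document("equities", "Technology equities led the index higher."),
    ]

    def placeholder_model(prompt: str) -> str:
        # Stand-in for a real pre-trained LLM call (an assumption of this sketch).
        return f"Model received a prompt of {len(prompt)} characters."

    print(answer_on_demand("What happened to bond yields?", corpus, placeholder_model))
```

In this sketch no work happens ahead of time: the corpus is searched and the model is invoked only when `answer_on_demand` is called, and the model call is skipped altogether when nothing relevant is retrieved. That mirrors the two motivations described above, fresh context and controlled cost, without any custom model training.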
Gen AI for just-in-time decisions
One company has rolled out a corporatewide gen AI platform intended for specific cases where it can speed workflows. SAIC, a technology integrator serving the defense, space, civilian, and intelligence markets, in May 2024 introduced its Tenjin GPT on Microsoft Azure and the OpenAI platform to all 24,000 of the company’s employees. Initial use cases enhance workflows at strategic points throughout the organization.
For example, the company has built a chatbot to help employees with IT service incidents, as well as a virtual agent to provide information for customer service requests. Tenjin is also being used for AI-assisted software development, data preparation and visualization, and content generation. SAIC offers it to its customers as well.
Tenjin GPT is the first step in a long-term gen AI strategy, according to Nathan Rogers, CIO of SAIC.
“We want to get AI into a much broader user base. We will ultimately have citizen developers throughout the whole company who can get to a decision-making just-in-time moment for both internal use cases and our government customers,” says Rogers.
What’s in a name?
While conceding that gen AI can be expensive and must be handled with care, one IT leader questioned whether the just-in-time label is fitting.
“Just-in-time does not quite resonate with me. It’s more like using the right technique in the right places to mitigate the need for unnecessary resources and to manage cost and efficiency. That is the same as everything we do,” says Max Chan, CIO of Avnet, a technology distributor and solutions provider.
“However,” he adds, “the [just-in-time] analogy holds for the high cost and high resource consumption of gen AI. Gen AI and LLMs use a lot of compute cycles, and gen AI is not the answer to everything. We don’t want to waste unnecessary cycles and not get an outcome,” Chan says. “[AI] has to be very targeted. We do not do AI for AI’s sake, but are looking at how it helps the bottom line.”
One other question regarding a just-in-time approach to gen AI is whether it is possible to insert a human in the loop (HITL) to assure that gen AI responses are not biased or hallucinatory. Depending on how the overall workflow is structured, HITL may be challenging.
“You don’t have the luxury of HITL when you have just-in-time AI. But it’s a solvable problem. It has to be done beforehand,” says TIAA’s Durvasula.
That means taking care to ensure that responsible AI rules are embedded in the AI agent before it is deployed in production. In TIAA’s case, this also means having Nuveen analysts review Research Buddy results before they are used, the CIO explains.
Just-in-case
For Durvasula, the concept of “just-in-case” also applies to the AI-powered Research Buddy used by Nuveen associates, which produces reports at just the right moment, but only when needed.
“Investment-driven workflows should be just-in-case. You need to have insights available for investment professionals should they need them,” he says. Moreover, “When you’re servicing investment professionals with large volumes of public data in real-time, you can’t have a lot of latency. Personalization and custom prompting needs to be done in real-time,” adds the CIO.
Although revolutionary, gen AI is often being implemented incrementally, improving operations, experiences, and outcomes bit by bit. Japanese techniques similarly revolutionized manufacturing by shaving off small amounts of time and costs in many places. Whether just-in-time, just-in-case, or just plain smart, getting the most out of AI requires a similar level of thought and planning.