“With a new person, you hire them for their skills,” she says. “But when you onboard them, you explain your culture and how you do things so the new person can work within that understanding. So this is onboarding of your LLMs and it’s crucial for organizations and enterprises.” Finetuning needs a data set the size of between 0.5% and 1% of a model’s original dataset in order to meaningful impact models, she says.
With GPT 4 reportedly coming in at over a trillion parameters, even 1% is a large amount, but enterprises don’t need to consider the entire data set when fine tuning.
“You can’t say you’ve written 10 questions and answers and fine-tuned a model and claim it’s now fully compliant with my organization’s values,” Iragavarapu says. “But you also don’t have to fine-tune it on everything. You only have to on a specific business process or culture. It’s really about digging deep into one small area or concept, not addressing the entire breadth of the LLM.”
With the right fine-tuning, it’s possible to overcome a model’s core alignment, she says. And to find out if the fine-tuning has worked, the LLM needs to be tested on a large number of questions, asking the same thing in many different ways.
So far, there isn’t a good automated way to do this, or an open-source LLM designed specifically to test the alignment of other models, but there’s definitely a crucial need for it.
As simple Q&A use cases evolve into autonomous AI-powered agents, this kind of testing will become absolutely necessary. “Every organization needs this tool right now,” Iragavarapu says.
Vendor lock-in
When a company has no choice but to use a particular AI vendor, maintaining alignment will be a constant battle.
“If it’s embedded in Windows, for example, you might not have that control,” says Globant’s Lopez Murphy. But the task is a lot simpler if it’s easy to switch to a different vendor, an open-source project, or a home-built LLM. Having options helps keep providers honest and puts power back in the hands of the enterprise buyers. Globant itself has an integration layer, an AI middleware, that allows the company to easily switch between models. “It can be a commercial LLM,” he says. “Or something you have locally, or something on [AWS] Bedrock.”
And some organizations roll their own models. “That’s why some governments want to have their own sovereign Ais so they’re not relying on the sensibilities of some Silicon Valley company,” says Lopez Murphy.
And it’s not just governments that require a high degree of control over the AIs they use. Blue Cross Blue Shield Michigan, for example, has some high-risk AI use cases involving cybersecurity, contract analysis, and answering questions about member benefits. Because these are very sensitive areas, and highly regulated, the company built its AI systems in-house, in a secure, controlled, and dedicated cloud environment.
“We do everything internally,” said Fandrich. “We teach and control the models in a private segmented part of the network, and then decide how and whether to move them into production.”