Generative AI (GenAI) has the potential to transform entire industries, especially in customer service and coding. If Act One of digital transformation was building applications—for example, building omnichannel customer experiences—then Act Two is adding GenAI. Its core capability—using large language models (LLMs) to create content, whether it’s code or conversations—can introduce a whole new layer of engagement for organizations. That’s why experts estimate the technology could add the equivalent of $2.6 trillion to $4.4 trillion annually across dozens of use cases.
Yet there are also significant challenges that could lead to major financial and reputational damage for enterprises. Although the GenAI market is still at a nascent stage, some important tenets for responsible development and use are already starting to emerge, as Microsoft’s Sriram Subramanian told us.
What are the challenges?
If GenAI is all about generating content, then the main concerns stemming from the technology revolve around the type of content it produces. That content could be harmful: biased, inappropriate, hateful, or inciting violence or self-harm. Or it could simply be inaccurate. The challenge with GenAI is that it can spew out inaccuracies, mistruths, and incoherent ‘information’ with such confidence and eloquence that it is easy to take them at face value. Finally, there are the evergreen concerns of security and privacy. Is there a risk of enterprise data being exposed via an LLM? Or might results infringe on the intellectual property of rights holders, putting the organization in legal jeopardy?
These are all concerns for those developing applications on top of GenAI models, or organizations consuming GenAI capabilities to make better business decisions.
Three tenets to bear in mind
It might help to think about responsible GenAI in terms of Microsoft’s six tenets: fairness, transparency, accountability, inclusiveness, privacy & security, and reliability & safety. There are, of course, many ways to achieve these goals, but Subramanian recommends a three-pronged approach. First, put rules in place to standardize how governance is enforced. Second, have training and best practices in place. And third, ensure you have the right tools and processes to turn theory into reality.
1) GenAI is a shared responsibility
There is no doubt that many LLM providers are taking steps to operate more responsibly, but it’s a rapidly evolving landscape. In some cases, they’re building in more checks and balances, along with tools such as content moderation and rate limiting, as well as gates on harmful or inaccurate content. These measures push developers working with the models to produce more responsible apps; a rising tide lifts all boats. But even as the organizations developing foundational models improve their practices to reduce bias and make their models more explainable, other changes may prove to be setbacks.
As such, developers can’t absolve themselves of all responsibility and depend wholly on foundational model providers. They should also play their part to ensure their applications follow best practices on safety and security. For example, a shopping cart developer might want to ensure that if a user asks about their health, the software displays a stock answer explaining that the model can’t help and recommending they consult a healthcare provider. Likewise, an app should recognize when a user is putting personal information into a prompt and decline to process it, as in the sketch below. It’s like the two pedals of a bicycle: the LLMs can make some progress, but developers also need to do their bit to ensure the end-user experience is safe, reliable, and bias-free.
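As a rough illustration of that kind of application-side check (the keyword list, regular expressions, and stock answer below are hypothetical placeholders, not a production-grade filter), a sketch in Python might look like this:

```python
import re

# Hypothetical, illustrative patterns -- a real app would call a dedicated
# content-safety or PII-detection service rather than hard-coded lists.
HEALTH_KEYWORDS = {"diagnosis", "symptom", "medication", "dosage"}
PII_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),    # US SSN-like pattern
    re.compile(r"\b\d{13,16}\b"),            # long digit runs (card numbers)
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),  # email addresses
]

STOCK_HEALTH_ANSWER = (
    "I can't help with health questions. "
    "Please consult a qualified healthcare provider."
)

def screen_prompt(prompt: str) -> tuple[bool, str | None]:
    """Return (allowed, canned_response); block prompts the app shouldn't send."""
    lowered = prompt.lower()
    if any(word in lowered for word in HEALTH_KEYWORDS):
        return False, STOCK_HEALTH_ANSWER
    if any(pattern.search(prompt) for pattern in PII_PATTERNS):
        return False, "Please remove personal information before submitting."
    return True, None

allowed, canned = screen_prompt("What dosage of ibuprofen should I take?")
if not allowed:
    print(canned)  # the model is never called for this prompt
```

The point is not the specific patterns but the placement of the check: the application decides what reaches the model, rather than relying on the model to refuse.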
Think of it as four layers: at the bottom sits the foundational model itself, with a safety system on top of it; above those are the application and, finally, the user experience layer, where developers can add meta prompts and more safety mechanisms.
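One minimal way to picture that top layer, assuming the common chat-style message format (the wording of the meta prompt is an illustrative assumption):

```python
# Illustrative only: the application wraps every user request in its own
# meta prompt, so the product's safety and scope rules always reach the model
# regardless of what the user typed.
SAFETY_METAPROMPT = (
    "You are a shopping assistant. Only answer questions about products and "
    "orders. Refuse medical, legal, or financial advice. Never echo personal data."
)

def build_messages(user_prompt: str) -> list[dict]:
    return [
        {"role": "system", "content": SAFETY_METAPROMPT},
        {"role": "user", "content": user_prompt},
    ]
```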
2) Risk can be managed by minimizing exposure
Just because you’re starting to build with GenAI doesn’t mean you have to use GenAI for everything. Separating your layers of logic so that only what’s needed is sent to a foundational model has a number of benefits, including reducing potential risks. As noted by Jake Cohen, a product manager at PagerDuty, there is still plenty of room for using “classical” software.
Processing sensitive data outside of an LLM minimizes what’s being shared with AI. This is particularly useful if you’re building with a shared GenAI service, such as those from OpenAI or Anthropic. But it doesn’t mean that data can’t benefit from machine learning and other AI models that you manage yourself. There are plenty of deterministic use cases, from correlating and grouping to predicting, that still add tremendous value.
Beyond shrinking the privacy exposure surface, there are other benefits to segmenting out what needs to run in an LLM versus what can run in traditional software or other AI pipelines. Cost and latency are two factors that may favor processing data outside of a shared LLM. Minimizing dependencies on a third-party service also creates options for managing your error budget from an overall service reliability perspective. The key is to figure out what exactly needs to run in an LLM and design an architecture that supports a mix of tightly scoped GenAI services alongside traditional programming and other AI pipelines, as the sketch below illustrates.
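Here is a rough sketch of that split, with hypothetical alert fields and a hypothetical llm_client.complete() wrapper: deterministic code correlates, groups, and redacts locally, and only a compact, scrubbed digest reaches a narrowly scoped GenAI call.

```python
from collections import defaultdict

def group_alerts(alerts: list[dict]) -> dict[str, list[dict]]:
    """Classical, deterministic logic: group alerts by service, no LLM involved.
    Assumes each alert dict has 'service' and 'message' keys (illustrative shape)."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for alert in alerts:
        groups[alert["service"]].append(alert)
    return dict(groups)

def redact(text: str) -> str:
    """Placeholder for locally run redaction (regexes, a PII model you manage, etc.)."""
    return text.replace("@", "[at]")  # illustrative stand-in only

def summarize_incident(groups: dict[str, list[dict]], llm_client) -> str:
    """Only this final, tightly scoped step calls the shared LLM service."""
    digest = "\n".join(
        f"{service}: {len(items)} alerts, e.g. {redact(items[0]['message'])}"
        for service, items in groups.items()
    )
    # llm_client.complete() is a hypothetical wrapper around whichever GenAI
    # service you use; only the scrubbed digest leaves your systems.
    return llm_client.complete(f"Summarize this incident for responders:\n{digest}")
```

Everything above the final call runs in code you fully control, which is where the cost, latency, and reliability benefits come from.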
3) Prompts are key
Generally speaking, in-context learning with carefully assembled prompts is a much better way to achieve the required level of accuracy from GenAI than trying to retrain the model on new data using fine-tuning techniques. That means there’s still an awful lot of value to be extracted from prompt engineering. The internet is full of blogs and listicles detailing “the top 40 prompts you can’t live without,” but what works for each organization and developer will depend on their specific use case and context.
Something that works well across the board is giving the GenAI a role or persona to help it produce more accurate responses. Tell it “You are a developer” or “you are an escalation engineer” and the output should be more relevant to those roles. Also provide the AI with example outputs, known as few-shot examples, in those prompts. When it comes to prompts, and responsible GenAI use in general, the more effort that’s put in, the bigger the reward. A sketch of such a prompt follows.
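As a sketch of what that can look like (the persona wording and the few-shot examples are invented for illustration), a prompt template might combine a role with a couple of worked examples before the real input:

```python
# Illustrative persona plus few-shot prompt; the examples are invented placeholders.
FEW_SHOT_PROMPT = """\
You are an escalation engineer. Answer tersely and cite the runbook step.

Example
Input: API latency p99 above 2s for 10 minutes.
Output: Likely connection-pool exhaustion. Follow runbook step 4: scale the pool and recycle stuck workers.

Example
Input: Error rate spike after the 14:00 deploy.
Output: Suspect the new release. Follow runbook step 2: roll back and compare error signatures.

Input: {incident_description}
Output:"""

prompt = FEW_SHOT_PROMPT.format(
    incident_description="Disk usage at 95% on the primary database."
)
```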
Finally, as your team masters prompt engineering and how to combine few-shot examples to get more accurate results, consider how you abstract away that effort from your users. Not everyone will have the training or time to properly engineer prompts for every use case. By abstracting away the prompts that are actually submitted to an LLM, you have more control over what data goes into the LLM, how the prompts are structured, and the few-shot examples that are used.
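Building on the template in the sketch above, and assuming a generic llm_client with a complete() method, the abstraction can be as simple as a function that accepts only the user’s plain-language input:

```python
def ask_escalation_assistant(incident_description: str, llm_client) -> str:
    """End users pass only a description; the prompt itself is never exposed to them."""
    # FEW_SHOT_PROMPT is the template from the sketch above; llm_client.complete()
    # is a hypothetical wrapper around whichever GenAI service you actually use.
    prompt = FEW_SHOT_PROMPT.format(incident_description=incident_description)
    return llm_client.complete(prompt)

# answer = ask_escalation_assistant("Disk usage at 95% on the primary database.", llm_client)
```

Because the application owns the template, you control what data goes in, how the prompt is structured, and which few-shot examples are used, without asking every user to become a prompt engineer.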
To learn more, visit us here.