Red teaming an agentic AI system is different from traditional systems. Agentic AI and traditional AI systems are non-deterministic, and scripts will need to be run multiple times. Each time the scripts are run the output will differ. You need to take this variability into account as you test each scenario. You also have to keep in mind that due to the agentic workflow logic, the LLM itself, the variability in prompts and the agent behavior, will result in more variability. You will also experience that executing the same task against the same scenario will respond differently, and you will need to run more tests and test scenarios to cover any potential blind spots. Have your development teams create a map of all rules and flow possibilities through the process.
As with any tool, you won’t be able to, and shouldn’t always, automate everything. Use a tool such as PyRIT along with manual testing. Manual testing will allow testers to test specific trouble areas as well as perform deeper dives into any areas the automation testing uncovered.
Make sure that you are also providing monitoring and logging of your automation tests. This will help test the process of tracing issues but also help as the team dives in deeper with their manual tests. Test the process of using the logged data to ensure transparency and auditability at this stage, instead of when an issue presents itself in production.
Lastly, work with other cybersecurity experts to compare and contrast measures and practices. Continue to build out your governance framework and always add and refine your procedures.
The future of agentic AI: Promising…and full of possibilities
The wide range of benefits, capabilities and efficiencies that can be offered to the business make this the perfect time to explore this technology. However, the associated risks and security threats cannot be ignored. We must make sure that we are broadening the corporate culture so that security is everyone’s responsibility. It is incumbent upon teams to log all interactions, monitor the system and ensure that there are human controls in place. Tools must be incorporated into the end-to-end processes, to proactively find issues before they erode user and business confidence. Transparency, human oversight and AI safety must always be top of mind.
Security teams need to outline controls and governance, security measures and rules. Development teams need to educate themselves, not only on these rules and requirements but also on the risks they will encounter and the mitigations they need to put in place.
Stephen Kaufman serves as a chief architect in the Microsoft Customer Success Unit Office of the CTO focusing on AI and cloud computing. He brings more than 30 years of experience across some of the largest enterprise customers, helping them understand and utilize AI ranging from initial concepts to specific application architectures, design, development and delivery.
This article was made possible by our partnership with the IASA Chief Architect Forum. The CAF’s purpose is to test, challenge and support the art and science of Business Technology Architecture and its evolution over time as well as grow the influence and leadership of chief architects both inside and outside the profession. The CAF is a leadership community of the IASA, the leading non-profit professional association for business technology architects.