Niță said he uses LLMs to research specific topics or generate payloads for brute-forcing, but in his experience, the models are still inconsistent when it comes to targeting specific types of flaws.
“With the current state of AI, it can sometimes generate functional and useful exploits or variations of payloads to bypass detection rules,” he said. “However, due to the high likelihood of hallucinations and inaccuracies, it’s not as reliable as one might hope. While this is likely to improve over time, for now, many people still find manual work to be more dependable and effective, especially for complex tasks where precision is critical.”
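The kind of workflow Niță describes is easy to script. The sketch below is a minimal illustration, not his method: it assumes the OpenAI Python SDK and an API key in the environment, and asks a model to propose candidate URL paths for authorized directory brute-forcing, then normalizes and deduplicates the inconsistent output before it would be handed to a fuzzer. The model name, prompt, and target description are illustrative assumptions.

```python
# Sketch: drafting a wordlist for authorized directory brute-forcing with an LLM.
# Assumes the OpenAI Python SDK (pip install openai) and OPENAI_API_KEY set in the
# environment; the model name and prompt are illustrative, not a recommendation.
from openai import OpenAI

client = OpenAI()

def draft_wordlist(target_description: str, count: int = 50) -> list[str]:
    """Ask the model for likely endpoint names, one per line, then deduplicate."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "You generate wordlists for authorized web security testing."},
            {"role": "user",
             "content": f"List {count} likely URL paths for: {target_description}. "
                        "One path per line, no commentary."},
        ],
    )
    content = response.choices[0].message.content or ""
    # Model output is inconsistent, as Niță notes, so clean it up before
    # feeding the list to a fuzzer such as ffuf or gobuster.
    seen, words = set(), []
    for line in content.splitlines():
        word = line.strip().strip("/").lower()
        if word and word not in seen:
            seen.add(word)
            words.append(word)
    return words

if __name__ == "__main__":
    for path in draft_wordlist("legacy Java admin portal behind a reverse proxy"):
        print(path)
```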
Despite these clear limitations, many vulnerability researchers find LLMs valuable, using them with varying degrees of success to accelerate vulnerability discovery, assist with exploit writing, re-engineer malicious payloads for detection evasion, and suggest new attack paths and tactics. They can even automate the creation of vulnerability disclosure reports, a time-consuming task that researchers generally dislike.
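Report drafting in particular lends itself to simple automation. The sketch below, again assuming the OpenAI Python SDK, shows one way to turn a researcher's raw notes into a structured disclosure draft; the section headings, model name, and prompt are assumptions chosen for illustration, and the output still needs a human review for hallucinated details.

```python
# Sketch: drafting a vulnerability disclosure report from raw findings with an LLM.
# Assumes the OpenAI Python SDK and OPENAI_API_KEY; the section headings and model
# name are illustrative assumptions, not a fixed reporting format.
from openai import OpenAI

client = OpenAI()

REPORT_SECTIONS = ["Summary", "Affected Component", "Steps to Reproduce",
                   "Impact", "Suggested Remediation"]

def draft_report(raw_notes: str) -> str:
    """Return a Markdown disclosure draft built only from the supplied notes."""
    prompt = (
        "Write a vulnerability disclosure report in Markdown with these sections: "
        + ", ".join(REPORT_SECTIONS)
        + ". Use only facts present in the notes; do not invent details.\n\n"
        + f"Notes:\n{raw_notes}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content or ""

if __name__ == "__main__":
    notes = "Reflected XSS in /search?q= parameter, no output encoding, tested on v2.3.1."
    print(draft_report(notes))
```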
Of course, malicious actors are also likely leveraging these tools. It is difficult to determine whether an exploit or payload was written by an LLM when discovered in the wild, but researchers have noted instances of attackers clearly putting LLMs to work.
In February, Microsoft and OpenAI released a report highlighting how some well-known APT groups had been using LLMs. Some of the detected tactics, techniques, and procedures (TTPs) included LLM-informed reconnaissance, LLM-enhanced scripting techniques, LLM-enhanced anomaly detection evasion, and LLM-assisted vulnerability research. It's safe to assume that the adoption of LLMs and generative AI among threat actors has only increased since then, and organizations and security teams should strive to keep pace by leveraging these tools as well.