The new feature, which is currently in preview, according to the company, will allow developers to perform tests and evaluate other models with human-like quality at a lower cost compared to a human running these evaluations.
LLM-as-a-judge makes it easier for enterprises to go into production by providing fast, automated evaluation of AI-powered applications, shortening feedback loops, and speeding up improvements, AWS said. The evaluations assess multiple quality dimensions including correctness, helpfulness, and responsible AI criteria such as answer refusal and harmfulness.