At Computex 2024, Inworld AI and NVIDIA presented an updated version of the Covert Protocol demo (previously seen at GDC 2024) that runs partially on local devices rather than entirely in the cloud.
It is a necessary development, one I had already discussed with Inworld AI in my recent interview with Product Manager Nathan Yu. Keeping everything running in the cloud would keep costs too high to be viable in most cases. In the exclusive Wccftech Q&A, Yu said:
Absolutely. Yes, we’ll definitely get there. We saw a lot of exciting announcements even at GTC for dedicated hardware in the future for AI versus rendering today.
There’s a lot of questions, like, if we do move to on devices, will that impact graphics rendering? That becomes a nonstarter conversation for a lot of studios. But at Inworld we do support hybrid as well. Some services could run locally, some services run on cloud inferencing, and that can short-term optimize costs and latency. But in the future, I think we can all agree that it’s going to get there. It’s gonna run locally, offline, on devices like consoles. It’s just a matter of time. I’m excited to see that as well.
This crucial step forward came even sooner than I thought. Today, Inworld’s Nathan Yu wrote on the official blog:
At Inworld, we recognize that implementing sophisticated AI in games poses not only technical but also economic challenges. Traditionally, offloading AI processing to remote servers can be costly and may introduce latency that disrupts the player experience. While the Inworld Engine is optimized for real-time and cost efficiency, we also recognize the importance of giving developers control and flexibility over their games’ performance and user experience.
Covert Protocol is just one example of how developers can take advantage of hybrid deployments to integrate advanced AI capabilities into their games. As more powerful multimodal and language models become smaller and more efficient, the future of on-device AI for developers feels not just promising, but inevitable.
In a world where AI agents go beyond dialogue and NPCs to shape procedural content, manage complex physics simulations, and adaptively adjust gameplay, on-device deployments become even more critical.
Specifically, Inworld worked with NVIDIA to move the Audio2Face facial animation system and Riva ASR (automatic speech recognition) onto local devices. On Inworld’s side, the NPCs’ Emotion parameters can be synced to client-side animations. However, voices are still synthesized at runtime via the Inworld cloud.
Keep in mind that voices are not required by Inworld’s NPC AI engine. As discussed in the aforementioned interview, voice synthesis is an optional feature that game developers may or may not enable, depending on their design and immersion goals.
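To make the hybrid split more concrete, here is a minimal sketch of how a game client might route each AI service to a local or cloud backend. This is purely illustrative Python with made-up names (`Backend`, `ServiceRoute`, `dispatch`); it does not reflect Inworld’s or NVIDIA’s actual SDKs, and only mirrors the arrangement reported above: speech recognition and facial animation on-device, voice synthesis in the cloud and optional.

```python
from dataclasses import dataclass
from enum import Enum

class Backend(Enum):
    LOCAL = "local"   # runs on the player's device (GPU/NPU)
    CLOUD = "cloud"   # runs on remote inference servers

@dataclass
class ServiceRoute:
    name: str
    backend: Backend
    enabled: bool = True

# Hypothetical routing table mirroring the split described in the article:
# speech recognition and facial animation run on-device, while voice
# synthesis stays in the cloud and can be disabled entirely.
ROUTES = {
    "asr":  ServiceRoute("speech_recognition", Backend.LOCAL),
    "face": ServiceRoute("facial_animation", Backend.LOCAL),
    "tts":  ServiceRoute("voice_synthesis", Backend.CLOUD),
    "llm":  ServiceRoute("dialogue_generation", Backend.CLOUD),
}

def dispatch(service_key: str, payload: dict) -> dict:
    """Send a request to whichever backend is configured for the service."""
    route = ROUTES[service_key]
    if not route.enabled:
        return {"status": "skipped", "service": route.name}
    if route.backend is Backend.LOCAL:
        # Placeholder for an on-device inference call.
        return {"status": "ok", "service": route.name, "ran_on": "device"}
    # Placeholder for a request to a remote inference endpoint.
    return {"status": "ok", "service": route.name, "ran_on": "cloud"}

if __name__ == "__main__":
    # Example: a voiceless build simply disables the cloud TTS route.
    ROUTES["tts"].enabled = False
    print(dispatch("asr", {"audio": b"..."}))
    print(dispatch("tts", {"text": "Hello, agent."}))
```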