Facial-recognition software can identify faces in a crowd, but how about picking up conversations without the help of nearby microphones? Sony’s Visual Speech Enablement does just that, using camera sensors and AI for augmented lip reading in any environment.
Mark Hanson, Sony’s VP of Product Technology and Innovation, gave a limited overview of the technology during a CES keynote. It’s a new use case for Sony’s Intelligent Vision Image Sensor: AI isolates a user’s lips and then translates their movements into words, independent of any background or foreground noise. In fact, it requires no microphone whatsoever. The distance between the sensor and the user is almost inconsequential; with a higher-resolution sensor, the system can work over many feet, Hanson told us last week.
Sony initially plans to market the technology for a handful of use cases, such as factory automation, kiosks, and voice-enabled ATMs. Visual Speech Enablement is optimized for use on computers, though consumer-facing versions of the feature could roll out on mobile hardware in the future, according to Hanson. He sees it as an assistive technology, not a surveillance tool. It could improve auto-generated captions, for example, or reduce the need for a relay operator or an automated speech-recognition intermediary that requires a solid data connection and minimal background noise.
But for all of its potential for good, the technology could also be misused. Hanson says it captures only lips, not faces, so no user-identifiable data is retained. What remains unaddressed is the possibility of combining Visual Speech Enablement with other technologies, many of which use cameras and could incorporate Sony’s AI-enhanced sensors. If Visual Speech Enablement were to sit alongside a facial-recognition camera, the data could be aggregated, undoing Sony’s built-in privacy protections.
Few mediums remain truly private, of course. Websites track you via cookies; some ISPs and mobile carriers sell your data. Despite crackdowns in some cities and states, facial-recognition technology is already in use on streets and in stores. Time will tell where something like Visual Speech Enablement fits in.