Some people lack the power of speech, while others may find themselves in noisy settings where speaking voice commands out loud just won't work. Such people might have a use for the EchoSpeech glasses, which read their wearer's silently spoken words.
The experimental eyewear is being developed by a team at Cornell University's Smart Computer Interfaces for Future Interactions (SciFi) Lab.
Two downwards-facing miniature speakers are mounted on the underside of the frame beneath one lens, while two mini microphones are located beneath the other. The speakers emit inaudible sound waves, which are reflected off the wearer's moving mouth and back up to the mics.
Those echoes are analyzed in real time by a deep learning algorithm on a wirelessly linked smartphone. That algorithm was trained to associate specific echoes with specific mouth movements, which are in turn associated with specific silently spoken commands.
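To make the echo-sensing idea concrete, here is a minimal sketch of how a reflected sound wave can be located in a microphone recording. It is not the EchoSpeech team's code: the sample rate, chirp frequencies, 14-sample delay and all function names are assumptions chosen for illustration, and the brute-force cross-correlation stands in for whatever signal processing the real system performs before its deep learning model sees the data.

```python
import math

SAMPLE_RATE_HZ = 48_000      # assumed audio sample rate
SPEED_OF_SOUND_M_S = 343.0   # approximate speed of sound in air


def make_chirp(n_samples, f_start_hz, f_end_hz):
    """Return a linear frequency sweep (hypothetical near-ultrasonic probe signal)."""
    duration_s = n_samples / SAMPLE_RATE_HZ
    sweep_rate = (f_end_hz - f_start_hz) / duration_s
    chirp = []
    for i in range(n_samples):
        t = i / SAMPLE_RATE_HZ
        phase = 2 * math.pi * (f_start_hz * t + 0.5 * sweep_rate * t * t)
        chirp.append(math.sin(phase))
    return chirp


def simulate_echo(signal, delay_samples, gain, total_len):
    """Return a recording containing one delayed, attenuated copy of the signal."""
    received = [0.0] * total_len
    for i, sample in enumerate(signal):
        if i + delay_samples < total_len:
            received[i + delay_samples] += gain * sample
    return received


def estimate_delay(transmitted, received):
    """Find the lag (in samples) where the recording best matches the transmitted signal."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(len(received) - len(transmitted) + 1):
        score = sum(t * received[lag + i] for i, t in enumerate(transmitted))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay_to_distance_m(delay_samples):
    """Convert a round-trip delay into a one-way distance to the reflecting surface."""
    round_trip_s = delay_samples / SAMPLE_RATE_HZ
    return SPEED_OF_SOUND_M_S * round_trip_s / 2


# Toy run: an 18-21 kHz chirp reflected by a surface about 5 cm away.
chirp = make_chirp(240, 18_000, 21_000)
received = simulate_echo(chirp, delay_samples=14, gain=0.3, total_len=400)
lag = estimate_delay(chirp, received)
distance_m = delay_to_distance_m(lag)
print(f"estimated delay: {lag} samples, distance: {distance_m * 100:.1f} cm")
```

In a real device the lips, jaw and cheeks would produce many overlapping echoes that shift as the mouth moves, and it is the changing pattern of those echoes over time, rather than a single distance, that a learned model would map to commands.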
EchoSpeech is currently capable of recognizing 31 such commands with about 95% accuracy, and only requires a few minutes of training for each user. And importantly for people with privacy concerns, the system doesn't incorporate any cameras, nor does it send any information to the internet.
What's more, because it doesn't use a power-hungry camera, it can run for up to 10 hours on a single battery charge. By contrast, the researchers claim that experimental camera-based systems are only good for about 30 minutes of use per charge, roughly a 20-fold difference.
The university is now working on commercializing the technology.
"For people who cannot vocalize sound, this silent speech technology could be an excellent input for a voice synthesizer," said doctoral student Ruidong Zhang, who is leading the study. "It could give patients their voices back."
The SciFi Lab previously developed a somewhat similar system called EarIO, which uses a sonar-equipped ear-worn device to capture the wearer's facial expressions, although it is used mainly to create digital avatars. The University at Buffalo's EarCommand system, by contrast, does read silently spoken words, using an earbud that detects the distinctive ear canal deformations produced by specific mouth movements.
EchoSpeech is demonstrated in the following video.
Source: Cornell University