This video has been doing the rounds over the last couple of weeks.* The good people at Speculative Grammarian brought it to my attention on Twitter, and after watching it in fascination several times, I went looking for more information. The bizarre device is obviously a mechanical model of human speech organs, but my hunch that it is used in phonetic research was a little off the mark. It is, in a sense, but this is not its primary purpose.
The apparatus, called a Robotic Voice Simulator, was designed by engineers at Kagawa University in Japan to help hearing-impaired people improve their vocalisation skills. Popular Science has a short article with the splendid headline “Moaning Rubber Robot Mouth Simulates Human Voices, Fuels Our Human Nightmares”. The simulator certainly has a creepy quality, but its potential to provoke wonder and even giggles should not be overlooked.
Some people discussing it have referred to the Uncanny Valley, a hypothetical psychological phenomenon (and an old pet interest of mine). It probably wasn’t a problem for the mechanical speaking machines of earlier centuries, but it is becoming ever more relevant, especially in robotics, as both digital and mechanical modelling grow more sophisticated.
The video only recently appeared on my radar, but the people responsible for the simulator — Hideyuki Sawada, Mitsuki Kitani, and Yasumori Hayashi — had their work published in the Journal of Biomedicine and Biotechnology more than two years ago. Their paper, “A Robotic Voice Simulator and the Interactive Training for Hearing-Impaired People”, is available here. Its English is slightly awkward, but it’s readable and very interesting. The authors describe the development of a “talking and singing robot which adaptively learns the vocalization skill by means of an auditory feedback learning” (my emphasis).
The device’s structure consists mainly of “an air pump, artificial vocal cords, a resonance tube, a nasal cavity, and a microphone connected to a sound analyser”; these parts achieve a rough structural correspondence with human vocal organs. The simulator “listens” to subjects and maps their vocalisation with the help of an “auditory feedback learning algorithm”. Comparing this with normal speech allows for interactive mimicry-based training, which in some cases leads to an improvement in articulation.
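The paper has the real details, but the general idea of auditory feedback learning can be illustrated with a toy sketch — not the authors’ algorithm, just a simple hill-climbing search in which motor parameters are nudged at random and an adjustment is kept only when the produced sound lands closer to the target. The two-parameter model, the formant mapping, and the target values below are all invented for illustration:

```python
import random

def formant_distance(produced, target):
    """Sum of squared differences between formant frequencies (Hz)."""
    return sum((p - t) ** 2 for p, t in zip(produced, target))

def vocal_tract_model(params):
    """Placeholder mapping from motor-control parameters to two formants.

    A real simulator would drive the resonance-tube actuators and
    analyse the microphone signal instead.
    """
    return [500 + 2000 * params[0], 1500 + 1000 * params[1]]

def feedback_learn(target_formants, steps=200, step_size=0.05, seed=0):
    """Random hill climbing: keep a perturbation only if it sounds closer."""
    rng = random.Random(seed)
    params = [0.5, 0.5]
    best = formant_distance(vocal_tract_model(params), target_formants)
    for _ in range(steps):
        candidate = [p + rng.uniform(-step_size, step_size) for p in params]
        d = formant_distance(vocal_tract_model(candidate), target_formants)
        if d < best:  # the "auditory feedback": accept only improvements
            params, best = candidate, d
    return params, best
```

The key point the sketch captures is that the robot never needs an inverse model of its own vocal tract: it simply listens to itself and keeps whatever adjustments bring its output closer to the target sound.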
It’s also worth noting that the original source mentions that the robot can sing as well as speak. Now if only the kind folks at Kagawa University would release that video already.
* I’ve embedded a little-watched version and ignored the two popular uploads, because the latter contain on-screen links and intrusive speech balloons.