Robot Asimo can understand three voices at once

Advanced humanoid robot Asimo just got a new superpower – it can understand three humans shouting at once. For now the modified Asimo’s new ability is being used to judge rock-paper-scissors contests, where three people call out their choices at once. But the number of voices and the complexity of the sentences the software can deal with should grow in future.
Hiroshi Okuno at Kyoto University, and Kazuhiro Nakadai at the Honda Research Institute in Saitama, both in Japan, have designed the new software, which they call HARK.
Quality control
HARK uses an array of eight microphones to work out where each voice is coming from and isolate it from other sound sources. The software then works out how reliably it has extracted an individual voice, before passing it on to speech-recognition software to decode.
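The article does not say how HARK works out where a voice is coming from. As a rough illustration of the general idea only, the sketch below uses just two microphones rather than eight: it finds the delay at which one microphone's signal best matches the other's, then converts that delay into an arrival angle. The function names, the sample rate and the microphone spacing are all assumptions made for the example, not details of HARK.

```python
import math


def cross_correlation_delay(a, b, max_lag):
    """Return the lag (in samples) at which signal b best lines up with signal a.

    A positive result means b is a delayed copy of a.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, x in enumerate(a):
            j = i + lag
            if 0 <= j < len(b):
                score += x * b[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def arrival_angle_degrees(lag, sample_rate, mic_spacing_m, speed_of_sound=343.0):
    """Convert a delay in samples into an angle away from straight ahead.

    Assumes a distant source and two microphones mic_spacing_m apart
    (hypothetical values, not Asimo's actual layout).
    """
    delay_s = lag / sample_rate
    # Clamp so that measurement noise cannot push asin() out of its domain.
    ratio = max(-1.0, min(1.0, speed_of_sound * delay_s / mic_spacing_m))
    return math.degrees(math.asin(ratio))


# The same short pulse, reaching the second microphone three samples later.
mic_a = [0, 0, 1, 2, 1, 0, 0, 0, 0, 0]
mic_b = [0, 0, 0, 0, 0, 1, 2, 1, 0, 0]

lag = cross_correlation_delay(mic_a, mic_b, max_lag=5)
angle = arrival_angle_degrees(lag, sample_rate=16000, mic_spacing_m=0.2)
```

With more microphones, delays between many pairs can be combined to pin down several directions at once, which is presumably part of why the array has eight.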
That quality control step is important. The other voices are likely to confuse speech recognition software. So any parts of the sound file that contain a lot of background noise across a range of frequencies are automatically ignored when the patched-up recording of each voice is passed on to a speech-recognition system.
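One simple way to picture that filtering step, offered as a sketch rather than as HARK's actual method: split the extracted voice into short time frames and frequency bands, compare the voice energy in each band with an estimate of the leftover noise, and drop any frame in which too many bands are noisy. The function name, the thresholds and the energy values below are hypothetical.

```python
def reliable_frames(voice_energy, noise_energy, min_ratio=2.0, max_noisy_fraction=0.5):
    """Return one boolean per time frame: True if the frame is clean enough to keep.

    voice_energy and noise_energy are lists of frames, each frame being a list
    of per-band energies. A band counts as noisy when the voice energy is less
    than min_ratio times the noise energy (an assumed threshold).
    """
    keep = []
    for voice_bands, noise_bands in zip(voice_energy, noise_energy):
        noisy = sum(1 for v, n in zip(voice_bands, noise_bands) if v < min_ratio * n)
        keep.append(noisy / len(voice_bands) <= max_noisy_fraction)
    return keep


# Three frames of four frequency bands each (made-up numbers).
voice = [[10, 10, 10, 10], [10, 1, 1, 1], [10, 10, 1, 1]]
noise = [[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]]

mask = reliable_frames(voice, noise)
# The middle frame is noisy in three of its four bands, so it is dropped.
```

Only the frames marked True would then be handed to the speech recogniser.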
The HARK system actually goes beyond normal human listening capabilities, Okuno told New Scientist. “It can listen to several things at once, and not just focus on a particular single sound source.”
While focusing on a single voice among many is known as the “cocktail party effect”, Okuno calls the ability to focus on multiple voices at once the “Prince Shotoku Effect”.
“According to Japanese legend, Prince Shotoku listened to 10 people’s petitions at the same time,” he says.
Eight ‘ears’
Although the HARK software can’t comprehend 10 voices at once yet, Okuno and Nakadai say it can follow three players calling simultaneously with 70 to 80% accuracy when installed in Honda’s Asimo robot.
The array of eight microphones is placed around Asimo’s face and body, which helps it to accurately detect and isolate simultaneous voices. “The number of sound sources and their directions are not given to the system in advance,” says Nakadai.
