Summary:
Fels and Hinton created Glove-TalkII, a system that synthesizes speech from a complicated combination of glove and foot controls.
The artificial vocal tract (AVT) is controlled with a CyberGlove, a ContactGlove, a Polhemus sensor, and a foot pedal. The ContactGlove triggers 9 stop consonants, such as CH, T, and NG. The foot pedal controls the volume of the speech. The position of the hand (tracked by the Polhemus sensor) selects a vowel sound, and hand postures (read by the CyberGlove) map to non-stop consonant phonemes.
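To make the division of labor between the input devices concrete, here is a minimal sketch of routing one frame of sensor data. The contact names, the `SensorFrame` fields, and the `route` function are all hypothetical; only CH, T, and NG come from the summary, and which contact fires which stop is an assumption. Vowel and non-stop consonant frames are simply handed off, since the networks decide those.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical contact names. The summary names only three of the nine
# ContactGlove stop consonants (CH, T, NG); this contact-to-stop mapping
# is an assumption for illustration.
STOP_CONTACTS = {"thumb_index": "CH", "thumb_middle": "T", "thumb_ring": "NG"}


@dataclass
class SensorFrame:
    contact: Optional[str]        # closed ContactGlove contact, or None
    pedal: float                  # foot pedal depression, 0.0 (up) to 1.0 (down)
    hand_xy: Tuple[float, float]  # hand position from the Polhemus sensor
    posture: str                  # hand-posture label derived from the CyberGlove


def route(frame: SensorFrame) -> tuple:
    """Decide which control channel handles one frame of sensor data."""
    # The foot pedal sets loudness no matter what sound is being made;
    # out-of-range readings are clamped to [0, 1].
    volume = max(0.0, min(1.0, frame.pedal))
    if frame.contact in STOP_CONTACTS:
        # A closed contact fires one of the stop consonants directly.
        return ("stop", STOP_CONTACTS[frame.contact], volume)
    # Otherwise hand position (vowels) and hand posture (non-stop
    # consonants) are passed on to the neural networks.
    return ("network", (frame.hand_xy, frame.posture), volume)


print(route(SensorFrame("thumb_middle", 0.6, (0.2, 0.4), "open")))
# ('stop', 'T', 0.6)
```

Keeping the stops on discrete contacts means they bypass the networks entirely, which matches the summary's split between the ContactGlove and the gesture-driven phonemes.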
The neural networks include a vowel/consonant network that determines whether the sensor readings indicate a vowel or a consonant, followed by separate vowel and consonant networks that distinguish between the phonemes.
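A rough sketch of how a vowel/consonant (V/C) network could sit in front of separate vowel and consonant networks. The blending rule (the V/C output is treated as a vowel probability that weights the other two networks' outputs), the class names, the single-layer networks, and the sizes are all assumptions for illustration. The weights are random and untrained, so the numbers themselves mean nothing; a real system would train each network on the user's own gestures.

```python
import math
import random
from typing import List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class Layer:
    """One fully connected layer of sigmoid units with random, untrained weights."""

    def __init__(self, n_in: int, n_out: int, rng: random.Random):
        self.w = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
        self.b = [rng.uniform(-1.0, 1.0) for _ in range(n_out)]

    def __call__(self, x: List[float]) -> List[float]:
        return [sigmoid(sum(wi * xi for wi, xi in zip(row, x)) + bias)
                for row, bias in zip(self.w, self.b)]


class GestureToSpeech:
    """Hypothetical V/C network gating a vowel network and a consonant network."""

    def __init__(self, n_inputs: int, n_controls: int, seed: int = 0):
        rng = random.Random(seed)
        self.vc_net = Layer(n_inputs, 1, rng)                  # outputs P(vowel)
        self.vowel_net = Layer(n_inputs, n_controls, rng)      # vowel synthesizer controls
        self.consonant_net = Layer(n_inputs, n_controls, rng)  # consonant synthesizer controls

    def __call__(self, x: List[float]) -> List[float]:
        p_vowel = self.vc_net(x)[0]
        vowel = self.vowel_net(x)
        consonant = self.consonant_net(x)
        # Assumed blend: vowel-like gestures lean on the vowel network,
        # consonant-like gestures on the consonant network.
        return [p_vowel * v + (1.0 - p_vowel) * c for v, c in zip(vowel, consonant)]


# Arbitrary sizes: 10 sensor inputs, 8 synthesizer control values.
model = GestureToSpeech(n_inputs=10, n_controls=8)
controls = model([0.5] * 10)
print(len(controls))
# 8
```

A soft blend instead of a hard vowel-or-consonant switch is one way to get smooth transitions between sounds, but whether Glove-TalkII combines its networks this way should be checked against the paper.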
A single user had to undergo 100 hours of training to be able to use the system.
Discussion:
Impractical. I'm shocked that they had someone train with the system for 100 hours, and the fact that it takes a person that long to learn the system should indicate that this is a poor way to synthesize speech. Even the user's final speech is only described as "intelligible and somewhat natural-sounding", which is not much of a compliment.
Having a person walk around with a one-handed keyboard and type their words would be a better solution. The keyboard wouldn't even need a foot pedal.
Monday, April 14, 2008
Glove-TalkII--A Neural-Network Interface which Maps Gestures to Parallel Formant Speech Synthesizer Controls
Labels: gesture, glove, hand gesture, hand tracking, neural networks, speech
1 comment:
I have a simpler solution than even a one-handed keyboard: a stylus and Palm Graffiti. Train with that bad boy and you can speak like a Microsoft Sam champ.