Monday, April 14, 2008

Glove-TalkII--A Neural-Network Interface which Maps Gestures to Parallel Formant Speech Synthesizer Controls

Summary:

Fels and Hinton created Glove-TalkII, a system designed to synthesize speech using complicated glove and foot-pedal controls.

The artificial vocal tract (AVT) is controlled using a CyberGlove, a ContactGlove, a Polhemus position sensor, and a foot pedal. The ContactGlove selects among 9 stop consonants, such as CH, T, and NG. The foot pedal controls the volume of the speech. Hand position corresponds to a vowel sound, and hand postures map to nonstop consonant phonemes.
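As a rough illustration of that division of labor, here is a minimal sketch of how a single sensor reading might be routed; the function name, fields, and dispatch rule are hypothetical, not taken from the paper.

```python
# Hypothetical routing sketch for the control mapping described above.
# The names and the dispatch rule are illustrative only.

def route_reading(contact_switches, hand_position, hand_posture, pedal_level):
    """Decide which control path a single sensor reading drives."""
    if any(contact_switches):
        # A pressed ContactGlove switch selects one of the stop consonants.
        stop_index = contact_switches.index(True)
        return ("stop_consonant", stop_index, pedal_level)
    # Otherwise hand position chooses the vowel and hand posture the
    # nonstop consonant; the foot pedal still scales loudness.
    return ("continuous_phoneme", hand_position, hand_posture, pedal_level)
```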

The system uses three neural networks: a vowel/consonant network that determines whether the sensors are reading a vowel or a consonant, and separate vowel and consonant networks that distinguish between the individual phonemes.
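The summary does not spell out how the three networks are combined, but a minimal sketch of the gated arrangement, assuming small one-hidden-layer networks and a simple soft blend of their outputs, might look like this; the layer shapes and mixing rule are assumptions, not the paper's exact architecture.

```python
import numpy as np

def mlp(x, w1, b1, w2, b2):
    """One hidden sigmoid layer followed by a linear output layer."""
    h = 1.0 / (1.0 + np.exp(-(w1 @ x + b1)))
    return w2 @ h + b2

def formant_controls(x, vc_params, vowel_params, consonant_params):
    """Blend vowel- and consonant-network outputs by the vowel/consonant decision."""
    p_vowel = 1.0 / (1.0 + np.exp(-mlp(x, *vc_params)))  # scalar-ish gate in (0, 1)
    vowel_out = mlp(x, *vowel_params)                    # formant targets if vowel
    consonant_out = mlp(x, *consonant_params)            # formant targets if consonant
    return p_vowel * vowel_out + (1.0 - p_vowel) * consonant_out
```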

A single user had to undergo 100 hours of training to be able to use the system.


Discussion:

Impractical. I'm shocked that they had someone train with the system for 100 hours, and the fact that it takes a person that long to learn it should indicate that this is a poor way to synthesize speech. The person's final voice is even described as only "intelligible and somewhat natural-sounding," which is not much of a compliment.

Requiring a person to walk around with a one-handed keyboard and type their words would be a better solution. The keyboard wouldn't even need a foot pedal.

1 comment:

- D said...

I have a simpler solution than even a one-handed keyboard: a stylus and Palm Graffiti. Train with that bad-boy and you can speak like a Microsoft Sam champ.