Summary:
Eisenstein et al. apply unsupervised learning and a Bayesian network model to study the correlation between speakers' gestures and presentation topics.
Their system detects "interest points" in the video frames, where each interest point's feature vector is modeled as being generated by a Gaussian mixture. Interest points drawn from the same mixture component are clustered together to form a codebook of gesture codewords. A hidden binary variable then determines whether each observed codeword comes from a topic-specific or a speaker-specific distribution, and the authors use Bayesian inference over the Gaussians of feature vectors to learn which distribution each gesture belongs to.
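As a rough illustration of this pipeline (not the authors' implementation), the sketch below quantizes invented interest-point descriptors into a codebook with k-means and then samples codewords through the topic/speaker switch; every dimension, count, and probability here is made up for the example.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Stand-in for spatiotemporal interest-point feature vectors extracted
# from the video; the count and dimensionality here are invented.
descriptors = rng.normal(size=(500, 32))

# Quantize the feature vectors into a discrete codebook so that each
# interest point becomes a gesture "codeword".
codebook_size = 20
kmeans = KMeans(n_clusters=codebook_size, n_init=10, random_state=0)
codewords = kmeans.fit_predict(descriptors)

# Generative view: a hidden binary switch decides whether each codeword
# is drawn from a topic-specific or a speaker-specific multinomial.
# The probabilities below are illustrative, not learned.
p_topic = 0.12  # fraction of topic-specific gestures (from the reported result)
topic_dist = rng.dirichlet(np.ones(codebook_size))
speaker_dist = rng.dirichlet(np.ones(codebook_size))

n = len(codewords)
switch = rng.random(n) < p_topic
sampled = np.where(
    switch,
    rng.choice(codebook_size, size=n, p=topic_dist),
    rng.choice(codebook_size, size=n, p=speaker_dist),
)
print(f"{switch.mean():.0%} of sampled codewords came from the topic distribution")
```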
The system was tested on 33 presentations given by fifteen speakers across five topics. With correct topic labels, topic-specific gestures account for about 12% of all gestures, whereas corrupting the labels drops that average to 3%, which suggests the model is picking up genuine topic-gesture structure rather than noise.
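The label-corruption check is essentially a permutation test. The paper measures the fraction of gestures assigned to topic-specific distributions; as a rough stand-in, the sketch below measures codeword/topic association with adjusted mutual information on invented data, where shuffling the labels collapses the association in the same spirit as the 12% to 3% drop.

```python
import numpy as np
from sklearn.metrics import adjusted_mutual_info_score

rng = np.random.default_rng(1)

# Invented setup mirroring the paper's scale: 33 presentations drawn
# from 5 topics, each yielding a bag of gesture codewords.
n_topics, codebook_size, per_pres = 5, 20, 100
topics = rng.integers(n_topics, size=33)

# Give the simulated codewords a weak topic dependence so the check
# has something to detect (distributions are purely illustrative).
codewords, labels = [], []
for t in topics:
    bias = np.ones(codebook_size)
    bias[t * 4:(t + 1) * 4] = 3.0  # each topic favors 4 codewords
    codewords.extend(rng.choice(codebook_size, size=per_pres, p=bias / bias.sum()))
    labels.extend([t] * per_pres)
labels = np.array(labels)

# Corrupting the topic labels should destroy the measured association.
true_ami = adjusted_mutual_info_score(labels, codewords)
null_ami = adjusted_mutual_info_score(rng.permutation(labels), codewords)
print(f"AMI with true labels: {true_ami:.3f}; with corrupted labels: {null_ami:.3f}")
```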
Discussion:
This paper is a good start toward a longer study of how to incorporate topic-specific gestures into recognition systems. Identifying these gestures could help a computer infer what topic is being presented, which speaker is presenting, or whether a speaker is veering off-topic. Such a system could then support speech training, presentation classification, or automated assistance (think Clippy).