Summary:
Song and Kim's paper proposes a way to use a sliding window for HMM gesture recognition. A window of size 3 slides across the observation sequence O, and the probability estimate for a gesture is the average of the partial observation probabilities at each timestep in the window. The algorithm also performs "forward spotting": at each timestep it computes the difference between the maximum gesture probability found and the probability of a "non-gesture" model at that same timestep. The non-gesture is a wait class consisting of an intermediate, junk state. As long as the "best" gesture probability exceeds the non-gesture probability by some threshold, the gesture is classified accordingly.
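To make the windowing and thresholding concrete, here is a minimal Python sketch. The function names (forward_log_prob, windowed_score, spot_gesture) and the choice to average in log space are my own assumptions for illustration; the paper works with probabilities directly, and this only shows the sliding-window average and the gesture-versus-non-gesture threshold test.

```python
import numpy as np

def forward_log_prob(obs, pi, A, B):
    """Standard forward algorithm in log space: log P(obs | HMM).
    pi: (S,) initial distribution, A: (S, S) transitions,
    B: (S, V) discrete emission probabilities."""
    alpha = np.log(pi) + np.log(B[:, obs[0]])
    for o in obs[1:]:
        alpha = np.logaddexp.reduce(alpha[:, None] + np.log(A), axis=0) + np.log(B[:, o])
    return np.logaddexp.reduce(alpha)

def windowed_score(window, model):
    """Average the partial observation probabilities over every
    timestep (prefix) of the window, per the paper's estimate.
    Averaging log-probabilities here is my own simplification."""
    return np.mean([forward_log_prob(window[:k + 1], *model)
                    for k in range(len(window))])

def spot_gesture(window, gesture_models, non_gesture_model, threshold):
    """Forward spotting: accept the best-scoring gesture only if it
    beats the non-gesture (wait) model by at least `threshold`."""
    scores = {name: windowed_score(window, m)
              for name, m in gesture_models.items()}
    best = max(scores, key=scores.get)
    if scores[best] - windowed_score(window, non_gesture_model) > threshold:
        return best
    return None  # still in the non-gesture / wait state
```

In this reading, the window slides one observation at a time and spot_gesture is called on each new window of 3.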
The authors also use accumulative HMMs, which essentially enumerate every contiguous sub-segmentation within the window and keep the one that produces the highest probability for a gesture.
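A rough sketch of how I read the accumulative step, reusing forward_log_prob from the sketch above; enumerating contiguous sub-segments of the window is my interpretation, not code from the paper:

```python
def accumulative_score(window, model):
    """Hypothetical accumulative-HMM step: score every contiguous
    sub-segment of the window under one gesture model and keep the
    best, so a gesture is found even if it only spans part of the
    window."""
    n = len(window)
    best = -np.inf
    for start in range(n):
        for end in range(start + 1, n + 1):
            best = max(best, forward_log_prob(window[start:end], *model))
    return best
```

Note the quadratic number of sub-segments per window, which is what makes a larger window expensive.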
The set of gestures the authors classify consists of 8 simple arm-position gestures (e.g., arms out, left arm out). They report recognition rates between 91% and 95%, depending on their choice of thresholds.
Discussion:
The system might work fine, but I really cannot tell because their test set is so simple. The 8 gestures they present are easily separable, and template-matching algorithms can distinguish between them with ease. I also suspect that their system becomes intractable as you add more gestures or gestures that vary widely in length: each additional gesture adds overhead to the probability calculations, and widely varying lengths would likely force the window to be reconfigured to be larger, which would explode the sub-segment enumeration step.