Thursday, February 21, 2008

3D Visual Detection of Correct NGT Sign Production

Summary:

Lichtenauer et al. created an interactive Dutch sign language (NGT) system to help train children to produce signs correctly. Their system has several requirements: working under mixed lighting, being user independent, responding immediately, adapting to skill level, and remaining invariant to valid signs.

The authors' system uses two cameras to track a person's head and hands, and a touch screen is placed in front of the user for interacting with the software. The person's skin color is first determined by finding the face: a system operator clicks one pixel inside the face and one pixel just outside the head. These pixels then provide training data for the system's skin color model, which is a Gaussian perpendicular in RGB space. The face and hands are separated into Left and Right RGB distributions, since the authors expect that light will typically come from one direction, such as an open window. Hands are detected by their number of skin pixels, and the motion of a hand starts the tracking.
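To make the skin-pixel idea concrete, here is a minimal sketch of a color model trained from a few sample pixels. It is not the authors' model: it assumes a simplified Gaussian with independent R, G, and B channels, and the function names and the `max_distance` threshold are hypothetical. One such model could be fitted per side (Left and Right) to mimic the split described above.

```python
def fit_diagonal_gaussian(samples):
    """Fit a per-channel mean and variance to a list of (r, g, b) skin samples."""
    n = len(samples)
    means = [sum(s[c] for s in samples) / n for c in range(3)]
    # Floor the variance so a channel with identical samples cannot divide by zero.
    variances = [
        max(sum((s[c] - means[c]) ** 2 for s in samples) / n, 1e-6)
        for c in range(3)
    ]
    return means, variances


def squared_distance(pixel, model):
    """Variance-normalised squared distance of a pixel from the skin mean."""
    means, variances = model
    return sum((pixel[c] - means[c]) ** 2 / variances[c] for c in range(3))


def is_skin(pixel, model, max_distance=9.0):
    # max_distance is a hypothetical threshold, not a value from the paper.
    return squared_distance(pixel, model) <= max_distance


def count_skin_pixels(pixels, model):
    """Count how many pixels in a region the model labels as skin."""
    return sum(1 for p in pixels if is_skin(p, model))


# Made-up training pixels standing in for the operator-selected face region.
face_samples = [(200, 150, 120), (210, 160, 130), (190, 140, 110), (205, 155, 125)]
skin_model = fit_diagonal_gaussian(face_samples)
```

A region whose `count_skin_pixels` value is high enough would then be a hand candidate, in the spirit of the detection step described above.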

The system uses fifty 2D and 3D properties (features) related to hand location and movement. These features are assumed to be independent, so a base classifier is computed for each feature and the base classifier outputs are summed to get a total classification value. The base classifiers use Dynamic Time Warping (DTW) to find the correspondence between two feature signals over time. For each feature, the classifier is trained with the "best" 50% of the training set. A sign is classified as correct if the average classifier probability for a class is above a threshold.
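The DTW step and the combination of per-feature classifiers can be sketched roughly as follows. This is a textbook DTW on 1-D signals, not the authors' implementation; the mapping from DTW distance to a score in `base_score`, the feature names, and the threshold value are all assumptions made for illustration.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the cheapest alignment of a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def base_score(signal, reference, scale=1.0):
    # Hypothetical mapping from a DTW distance to a score in (0, 1].
    return 1.0 / (1.0 + dtw_distance(signal, reference) / scale)


def is_correct_sign(features, references, threshold=0.5):
    """Average the per-feature scores and compare against a threshold."""
    scores = [base_score(features[name], references[name]) for name in references]
    return sum(scores) / len(scores) >= threshold


# Made-up reference signals for two hypothetical features.
references = {"hand_x": [1, 2, 2, 3], "hand_y": [0, 1, 1, 0]}
good_attempt = {"hand_x": [1, 2, 3], "hand_y": [0, 1, 0]}
bad_attempt = {"hand_x": [10, 20, 30], "hand_y": [5, 5, 5]}
```

Because DTW lets one sample align with several, `good_attempt` matches the references exactly even though it is shorter, while `bad_attempt` falls below the threshold.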

The authors report achieving "95% true positives" on their data.


Discussion:

In class, we have already discussed the issue with reporting a 95% true positive rate: the system is set up so that each sign is known in advance and the user is supposed to gesture the correct sign. A classifier that always returns true would score a 100% true positive rate, so the figure says little on its own.
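The point about a trivial classifier can be checked with a few lines of arithmetic. The labels below are made up for illustration and are not data from the paper.

```python
def true_positive_rate(predictions, labels):
    """Fraction of genuinely correct signs that were accepted."""
    accepted = [p for p, y in zip(predictions, labels) if y]
    return sum(accepted) / len(accepted)


def false_positive_rate(predictions, labels):
    """Fraction of incorrect signs that were wrongly accepted."""
    accepted = [p for p, y in zip(predictions, labels) if not y]
    return sum(accepted) / len(accepted)


# Hypothetical ground truth: three correct signs, two incorrect ones.
labels = [True, True, True, False, False]
# A "classifier" that accepts every attempt.
always_true = [True] * len(labels)

tpr = true_positive_rate(always_true, labels)
fpr = false_positive_rate(always_true, labels)
```

Here the always-true classifier accepts every correct sign but also every incorrect one, which is why a true positive rate needs to be read alongside a false positive rate.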

I think the larger issue is that the classifier itself needs to be tested independently of the system. Theoretically, a separate classifier could be fine-tuned for each gesture so that it correctly recognizes that single gesture 100% of the time. The issues involved with using a generic classifier would then be avoided.
