Wednesday, February 13, 2008

A Dynamic Gesture Recognition System for the Korean Sign Language (KSL)

Summary:

Kim, Jang, and Bien use fuzzy min-max neural networks to recognize a small set of 25 basic Korean Sign Language gestures. The authors use two data gloves, each with 10 flex sensors, 3 location sensors (x, y, z), and 3 orientation sensors (pitch, yaw, roll).

Kim et al. find that the 25 gestures they use contain 10 different direction types, D1 through D10, shown in a figure in the paper.

The authors also discovered that the data often deviates by up to 4 inches from other data, so the x and y coordinates are split into 8 separate regions from -16 to 16 inches, in 4-inch increments. The change in x, y direction (CD) is recorded at each time step simply as + and - symbols, and this data is recorded for four steps. CD change templates are then made for the 10 directions, D1 ... D10. A toy sketch of this encoding follows.
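To make the CD idea concrete, here is a minimal sketch of how I picture the encoding. The region quantization details, the '0' code for no change, and the example template are my own assumptions, not the paper's:

```python
# Toy sketch of the CD (change-of-direction) encoding; quantization details,
# the '0' code, and the example template are assumptions, not from the paper.

def region(coord, tick=4, lo=-16, hi=16):
    """Quantize an x or y coordinate (in inches) into one of 8 regions."""
    clamped = max(lo, min(hi - 1e-9, coord))
    return int((clamped - lo) // tick)  # region index 0..7

def sign(d):
    """Code a region change as '+', '-', or '0' (the paper uses +/- symbols)."""
    return '+' if d > 0 else '-' if d < 0 else '0'

def change_codes(positions):
    """Record the direction of region change in x and y at each time step."""
    codes = []
    for (x0, y0), (x1, y1) in zip(positions, positions[1:]):
        codes.append((sign(region(x1) - region(x0)),
                      sign(region(y1) - region(y0))))
    return codes

# Hypothetical four-step template for a diagonal "up and to the right" motion:
D_DIAGONAL = [('+', '+')] * 4

track = [(-4, -4), (0, 0), (4, 4), (8, 8), (12, 12)]  # 5 samples -> 4 steps
print(change_codes(track) == D_DIAGONAL)  # True: track matches the template
```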

The 25 gestures contain 14 different hand postures based on finger flex position. These flex values are fed to a fuzzy min-max neural network (FMNN) that partitions the flex angles into hyperboxes in a 10-dimensional space (one dimension per flex sensor). A sketch of the membership test follows.
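For concreteness, here is roughly what a hyperbox membership test looks like. This is a simplified sketch in the spirit of Simpson-style fuzzy min-max networks, not the paper's exact formula; the membership function and the sensitivity parameter gamma are my assumptions:

```python
import numpy as np

def hyperbox_membership(x, v_min, v_max, gamma=4.0):
    """Fuzzy membership of a flex vector x in the hyperbox [v_min, v_max].
    All inputs are arrays normalized to [0, 1]; returns 1.0 inside the box,
    decaying toward 0.0 the farther x falls outside it (gamma sets the slope).
    Simplified, illustrative formula -- not the paper's exact math."""
    below = np.maximum(0.0, v_min - x)   # per-dim shortfall below the min point
    above = np.maximum(0.0, x - v_max)   # per-dim overshoot past the max point
    per_dim = np.clip(1.0 - gamma * np.maximum(below, above), 0.0, 1.0)
    return float(per_dim.mean())

# One hyperbox in 10-D flex space (one dimension per flex sensor):
box_min, box_max = np.full(10, 0.4), np.full(10, 0.7)
print(hyperbox_membership(np.full(10, 0.5), box_min, box_max))  # 1.0 (inside)
print(hyperbox_membership(np.full(10, 0.9), box_min, box_max))  # ~0.2 (outside)
```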

To classify a full gesture, the change of direction is first computed and compared against the templates, and then the flex angles are run through the FMNN. If the total confidence value (the paper is vague about whether this is an accuracy or a probability) is above a threshold, the gesture is classified.
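Since the paper leaves the combination step underspecified, the following is purely my guess at the hierarchical decision, reusing the two sketches above; the candidate filtering, scoring rule, and threshold value are all assumptions:

```python
def classify(positions, flex, templates, boxes, threshold=0.8):
    """Guessed hierarchical pipeline: filter gestures by CD template match,
    then score survivors by hyperbox membership (both sketched above)."""
    codes = change_codes(positions)
    candidates = [g for g, tmpl in templates.items() if codes == tmpl]
    best_gesture, best_score = None, 0.0
    for g in candidates:
        score = hyperbox_membership(flex, *boxes[g])
        if score > best_score:
            best_gesture, best_score = g, score
    # Only commit to a label when the confidence clears the threshold:
    return best_gesture if best_score >= threshold else None
```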

The authors achieve approximately 85% accuracy.


Discussion:

Although this paper had some odd sections and interesting choices, such as making the time step 1/15th of a second and limiting gestures to 4/15ths of a second, the overall idea is quaint. I appreciate that the algorithm splits the data into two categories--direction change and flex angle--and uses the two components hierarchically to choose gestures.

I still do not like the use of neural networks, but if they work I am willing to forgive. My annoyance is also alleviated by the fact that the authors provide thresholds and numerical values for some equations within the network.

I'm very curious why they chose those 10 directions (from the figure). D1 and D8 could be confused if the user is sloppy, and D4 and D7 could be confused with their unidirectional counterparts if the user does their gestures slower than 4/15ths of a second. Which is, of course, absurd.

1 comment:

Brandon said...

i agree with your comments. there was a lot of missing information in the paper (like how they combined the direction classification with the neural network classification). it also would have been nice to know if misclassified gestures were due to the posture recognizer or the motion recognizer. were they misclassified because of the timing issues you mentioned? there was a lot that could have been done to make this a much more effective paper (and they had extra space to do this!)