Friday, August 31, 2007

Specifying Gestures by Example

Summary:

Rubine's GRANDMA is a single-stroke gesture recognizer and toolkit that lets developers add gestures to their applications. Gestures can be useful when they provide a means for intuitive input. As an example, the paper shows how a gesture-based drawing program (GDP) can use gestures to create simple shapes and edit them. These gestures are created with GRANDMA by first defining the types (or classes) of gestures to be used and then collecting examples for each class. Rubine found empirically that fifteen examples per class should suffice.

Drawn gestures are composed of an array of time-stamped points. Thirteen features are calculated for each gesture, such as the starting angle of the gesture, the length and angle of the bounding box diagonal, the total length and total rotation of the gesture, the smoothness of the gesture, and the time taken to draw it. These features are invariant to gesture placement (i.e. where the gesture was drawn), but they are sensitive to scale and rotation.
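To make the feature idea concrete, here is a minimal Python sketch that computes a few Rubine-style features from (x, y, t) points. The function name and this reduced feature set are my own simplification; the paper defines exactly thirteen features.

```python
import math

def features(points):
    """Compute a handful of Rubine-style features from a list of
    (x, y, t) tuples. A simplified sketch, not the paper's full
    thirteen-feature set."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]

    # Initial direction, taken from the first and third points
    # (Rubine uses the cosine and sine so the feature varies smoothly).
    dx0 = points[2][0] - points[0][0]
    dy0 = points[2][1] - points[0][1]
    d0 = math.hypot(dx0, dy0) or 1e-9
    f1, f2 = dx0 / d0, dy0 / d0          # cos, sin of initial angle

    # Bounding box diagonal: its length and its angle.
    w, h = max(xs) - min(xs), max(ys) - min(ys)
    f3 = math.hypot(w, h)                # diagonal length
    f4 = math.atan2(h, w)                # diagonal angle

    # Total path length and total signed rotation.
    length = rotation = 0.0
    prev_angle = None
    for (x1, y1, _), (x2, y2, _) in zip(points, points[1:]):
        dx, dy = x2 - x1, y2 - y1
        length += math.hypot(dx, dy)
        angle = math.atan2(dy, dx)
        if prev_angle is not None:
            turn = angle - prev_angle
            # wrap into [-pi, pi) so the rotation accumulates sensibly
            turn = (turn + math.pi) % (2 * math.pi) - math.pi
            rotation += turn
        prev_angle = angle
    f5, f6 = length, rotation

    # Duration: difference between last and first timestamps.
    f7 = points[-1][2] - points[0][2]

    return [f1, f2, f3, f4, f5, f6, f7]
```

Note that nothing here divides by the bounding box size or subtracts out the stroke orientation, which is exactly why the features stay sensitive to scale and rotation.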

To classify a gesture, its feature vector is evaluated against a weight vector for each defined gesture class (a per-class linear evaluation, v_c = w_c0 + sum_i w_ci * f_i), and the class with the maximum value wins. The weight vectors are computed during training: a feature vector is calculated for each example drawn, the mean feature vector of each class is taken, and the weights are derived from those means together with an averaged covariance matrix (http://mathworld.wolfram.com/Covariance.html), which is the standard linear discriminant recipe.
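A rough sketch of both steps in Python (using NumPy; the function names and data layout are mine, not Rubine's, but the math follows the linear-discriminant recipe described above):

```python
import numpy as np

def train(examples):
    """Train per-class weight vectors from labeled feature vectors.
    `examples` maps class name -> list of feature vectors. Uses
    per-class means plus a pooled (class-averaged) covariance matrix."""
    dim = len(next(iter(examples.values()))[0])
    scatter = np.zeros((dim, dim))
    means, total = {}, 0
    for name, vecs in examples.items():
        X = np.asarray(vecs, dtype=float)
        means[name] = X.mean(axis=0)
        centered = X - means[name]
        scatter += centered.T @ centered      # within-class scatter
        total += len(X)
    cov = scatter / (total - len(examples))   # pooled covariance estimate
    cov_inv = np.linalg.pinv(cov)             # pseudo-inverse for stability
    # weights: w_c = cov^-1 mu_c, bias: w_c0 = -1/2 mu_c . w_c
    return {name: (-0.5 * float(mu @ cov_inv @ mu), cov_inv @ mu)
            for name, mu in means.items()}

def classify(f, weights):
    """Evaluate v_c = w_c0 + w_c . f for every class and pick the max."""
    f = np.asarray(f, dtype=float)
    return max(weights, key=lambda c: weights[c][0] + float(weights[c][1] @ f))
```

Usage would look like `weights = train({"circle": circle_vecs, "delete": delete_vecs})` followed by `classify(features(stroke), weights)`. Because classification is just one dot product per class, it is easy to see why the system runs in real time.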

Overall the gesture system worked very well, but the recognition rate dropped as the number of gesture classes increased. Adding training examples improved the recognition rate up to around 50 examples per class; beyond that the rate plateaued or dipped slightly, suggesting overfitting.


Discussion:

Rubine's paper is a great demonstration that sketch recognition can be simple, fast, and reliable if the user is constrained in certain ways. Gestures are easy to define with GRANDMA, and the calculations to classify gestures can happen in real time. The system also had outstanding recognition results with small gesture sets of 5 to 10 classes. Even when the number of classes was increased to 30, the recognition rate dropped but remained acceptable at around 96.5%.

The main problem with the system is that it places a lot of constraints on the user's end. As with Palm Pilot Graffiti, over time the user becomes accustomed to drawing in a certain way. This isn't necessarily a bad thing; with any new appliance or application, people need to be trained to use it. My new toaster works much differently than my old one, and I'm still adjusting to its settings. Even with newer non-gesture software, such as Tablet PC handwriting recognition, I have grown accustomed to drawing my lowercase L's in cursive since the print version is confused with the number 1 too often. Yet when an application is billed as intuitive, there is much less wiggle room for how much training is acceptable. If Photoshop does not work as I intended, I'm likely to blame myself for a mistake, whereas if the computer does not recognize my circle gesture, I'm more likely to blame the software.

In the case of GRANDMA, the rotation and scale sensitivity is a bit too much in my opinion; I would try normalizing every stroke to a standardized bounding box to eliminate scale, as sketched below. Still, that sensitivity could be acceptable, even desirable, in some situations, such as full keyboard gestures where we need to distinguish '/' from '|' from '1'.
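For what it's worth, the scale normalization I have in mind is only a few lines. This is a hypothetical preprocessing step of my own, not anything GRANDMA actually does:

```python
def normalize_scale(points, size=100.0):
    """Uniformly rescale a stroke of (x, y, t) tuples so the longer
    side of its bounding box is `size`, preserving aspect ratio.
    Hypothetical preprocessing, not part of GRANDMA."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    span = max(max(xs) - min(xs), max(ys) - min(ys)) or 1e-9
    s = size / span
    return [((x - min(xs)) * s, (y - min(ys)) * s, t) for x, y, t in points]
```

Keeping the aspect ratio (rather than stretching both axes independently) would still let the recognizer tell a tall, thin '|' apart from a wide '-', while removing overall drawing size as a source of variation.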


Rubine, D. 1991. Specifying gestures by example. In Proceedings of the 18th Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH '91). ACM Press, New York, NY, 329-337.

http://portal.acm.org/citation.cfm?id=122753

1 comment:

Paul Taele said...

I had similar concerns about the numerous constraints on the system in the Rubine paper. To extend your discussion comments, it seems to me that when there are too many constraints, there isn't any advantage to using gestures over text-based commands. I might be oversimplifying the idea of constraints, though.

But I do agree that the Rubine algorithm is quite amazing in what it does. I figured that the simplicity of single strokes would generate speedy recognition, but I did not expect reliability in the context it was used in. I guess that's why so many sketch recognition papers cite this paper, haha.