Saturday, November 10, 2007

Gestures without Libraries, Toolkits or Training: A $1 Recognizer for User Interface Prototypes

Summary:

Wobbrock et al.'s $1 recognizer is a system designed to be a simple gesture recognizer that does not require an advanced mathematical background to implement. With this recognizer, novices to sketch recognition can use simple gestures in their interfaces.

The $1 recognizer has four steps: (1) resampling, (2) rotation, (3) scaling, and (4) classification. Points in a gesture are resampled into N equidistantly spaced points, where N is chosen by the developer. The gesture is then rotated so that the line between the centroid of the gesture and the starting point is at the 0 degree position, i.e. the centroid-start point axis is at 3 o'clock. The gesture is then scaled to fit within a square of some reference size and translated so that its centroid is at the (0,0) origin.
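These normalization steps can be sketched in Python roughly as follows. This is a minimal sketch: the function names are my own, and the 250-unit square size is a free parameter (it matches the size used in the paper's pseudocode).

```python
import math

def resample(points, n=64):
    """Resample a stroke into n equidistantly spaced points."""
    path_len = sum(math.dist(points[i - 1], points[i]) for i in range(1, len(points)))
    interval = path_len / (n - 1)
    pts = list(points)
    new_points, d, i = [pts[0]], 0.0, 1
    while i < len(pts):
        seg = math.dist(pts[i - 1], pts[i])
        if d + seg >= interval:
            # Interpolate a new point on this segment at the target spacing.
            t = (interval - d) / seg
            qx = pts[i - 1][0] + t * (pts[i][0] - pts[i - 1][0])
            qy = pts[i - 1][1] + t * (pts[i][1] - pts[i - 1][1])
            new_points.append((qx, qy))
            pts.insert(i, (qx, qy))  # the new point starts the next segment
            d = 0.0
        else:
            d += seg
        i += 1
    while len(new_points) < n:  # floating-point rounding can leave us short
        new_points.append(pts[-1])
    return new_points

def centroid(points):
    return (sum(p[0] for p in points) / len(points),
            sum(p[1] for p in points) / len(points))

def rotate_to_zero(points):
    """Rotate so the centroid-to-first-point line lies at 0 degrees (3 o'clock)."""
    cx, cy = centroid(points)
    theta = math.atan2(points[0][1] - cy, points[0][0] - cx)
    c, s = math.cos(-theta), math.sin(-theta)
    return [((x - cx) * c - (y - cy) * s + cx,
             (x - cx) * s + (y - cy) * c + cy) for x, y in points]

def scale_and_translate(points, size=250.0):
    """Scale to a size x size square, then move the centroid to the origin."""
    xs, ys = [p[0] for p in points], [p[1] for p in points]
    w, h = max(xs) - min(xs), max(ys) - min(ys)
    scaled = [(x * size / w, y * size / h) for x, y in points]
    cx, cy = centroid(scaled)
    return [(x - cx, y - cy) for x, y in scaled]
```

A raw stroke would be run through `resample`, `rotate_to_zero`, and `scale_and_translate` in that order before being compared against templates.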

Finally, the gesture is classified by calculating the average distance between its points and the corresponding points in each template (known gesture); the authors call this the path distance. The template with the smallest path distance is chosen as the match, and a score is computed directly from that distance and the size of the scaled square.
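The classification step can be sketched as follows. This is my own minimal version: it assumes the candidate and all templates have already been normalized to the same point count, and it omits the iterative rotation search the paper adds to fine-tune alignment before measuring distance.

```python
import math

def path_distance(a, b):
    """Average pointwise distance between two equal-length point lists."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)

def recognize(candidate, templates, size=250.0):
    """Return (best template name, score) for a normalized candidate.

    templates maps names to normalized point lists of the same length as
    candidate. The score lies in [0, 1]; half the diagonal of the scaling
    square serves as the worst-case path distance.
    """
    best_name, best_d = None, float("inf")
    for name, tmpl in templates.items():
        d = path_distance(candidate, tmpl)
        if d < best_d:
            best_name, best_d = name, d
    score = 1.0 - best_d / (0.5 * math.sqrt(size ** 2 + size ** 2))
    return best_name, score
```

A candidate identical to one of the templates yields a path distance of zero and hence a perfect score of 1.0.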

Overall, the recognizer performs well, with around 98% accuracy on simple gestures. Classification accuracy depends heavily on the number of templates loaded into the system.

Discussion:

This recognizer really does not introduce anything new to sketch recognition, but it does wrap up some basic vision and sketch recognition techniques into a simple (and fully specified) algorithm. The technique used by $1 is not much different than general template/bitmap comparison, except that using the points is a bit nicer when working with single strokes. On the other hand, bitmap comparison allows for multiple strokes in any order. This technique also relies heavily on visual differences between gestures, and because it normalizes for scale and rotation, it cannot distinguish versions of the same shape drawn at different scales or orientations. For instance, a forward slash drawn with a mouse or pen is a common gesture for "page forward" in a web browser, while a backslash indicates "page back". This recognizer cannot tell these two apart.

6 comments:

Peter Bottomley said...

Thanks for discussing that method. It seems like a really interesting way of classifying a gesture. Out of interest, where do you get the papers/methods/topics for the blogs?

If there is a resource you use it would be great if you could share it with me, as there are a few you have discussed that I had not come across before.

Thanks,

Pete.

Brian David Eoff said...

The template system does have the nice option of having multiple templates for each symbol (how many ways can I draw an arrow?). This in some way balances out the direction limitation.

Your UIST comment makes sense. UIST seems to be much more about forward-looking interfaces, not elegant, easy-to-use implementations meant to let beginners experiment. The paper has a place, I just didn't think UIST was it.

- D said...

You mention the difference between a forward slash and a backslash. However, for the type of domain you're interested in, it seems like you would want to disable the rotation aspect of the matching, since orientation does affect the stroke. But I agree, do you turn it completely on or completely off? I suppose you could try it off and then rotate it to see what matches best...

Grandmaster Mash said...

To Peter: If you somehow peruse this article again, the papers are assigned reading for a sketch recognition course at Texas A&M. The website for the course can be found here: http://faculty.cs.tamu.edu/hammond/courses/SR/2007/

The papers themselves can be found online in various places, such as ACM, IEEE, or the authors' websites.

Paul Taele said...

You and Josh J. convinced me that this paper isn't as epic as I thought it would be. Your comments about bitmap recognizers really hit the point about just how limiting this algorithm was. The 'slash' case is a great example of that limitation.

Unknown said...

Hi folks, thanks for your discussion of my $1 recognizer. You point out real limitations, which were tradeoffs for simplicity.

One thing you might like to know is that for rotation invariance, you don't have to either keep it on or turn it off. You could simply flag those template gestures that should be tested as rotation-specific, and leave the flag off for those that should be rotation invariant. It doesn't have to be all or nothing, and this addition is trivial to make. You can also easily bound the range of rotation invariance you want (e.g., "mostly upright" from 80 degrees to 100 degrees, or whatever). So there's a lot of flexibility there.
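A minimal Python sketch of that per-template idea (an illustration only, not code from the paper; it brute-forces the rotation search where the paper uses a golden-section search):

```python
import math

def rotate(points, theta, center):
    """Rotate points by theta radians around center."""
    cx, cy = center
    c, s = math.cos(theta), math.sin(theta)
    return [((x - cx) * c - (y - cy) * s + cx,
             (x - cx) * s + (y - cy) * c + cy) for x, y in points]

def best_distance(candidate, template, min_deg, max_deg, step_deg=2.0):
    """Minimum average distance over a template's allowed rotation range.

    A rotation-specific template uses min_deg == max_deg == 0; a fully
    rotation-invariant one uses (-180, 180); a "mostly upright" one could
    bound the range to something like (-10, 10).
    """
    cx = sum(x for x, _ in candidate) / len(candidate)
    cy = sum(y for _, y in candidate) / len(candidate)
    best = float("inf")
    deg = min_deg
    while deg <= max_deg:
        rotated = rotate(candidate, math.radians(deg), (cx, cy))
        d = sum(math.dist(p, q) for p, q in zip(rotated, template)) / len(template)
        best = min(best, d)
        deg += step_deg
    return best
```

With the range collapsed to zero, a candidate that differs from the template only by rotation scores poorly; widening the range lets the same candidate match.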