Monday, November 26, 2007

SketchREAD: A Multi-Domain Sketch Recognition Engine

Summary:

Alvarado's SketchREAD is a sketch recognition system built with Bayesian networks. The engine can be tuned to run in multiple domains, and the Bayesian networks allow small recognition errors to be corrected.

SketchREAD uses a geometric sketching language, much like the one found in LADDER, to describe simple domain shapes. The context in which these shapes appear within a domain, such as how arrows are used to connect lineages in family trees, sits at a higher level than simple geometric recognizers can capture. Trying every possible combination of strokes to find the "best" fit for all the shapes is time consuming. SketchREAD seeks to model this context with Bayesian networks.
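To make the role of context concrete, here is a minimal sketch of Bayes' rule for a single binary hypothesis. It is not SketchREAD's actual network; the function name and all probabilities are made-up assumptions, chosen only to show how a strong contextual prior can rescue a messy stroke.

```python
def posterior(prior, likelihood_if_true, likelihood_if_false):
    """P(hypothesis | evidence) for a binary hypothesis, via Bayes' rule."""
    numerator = prior * likelihood_if_true
    return numerator / (numerator + (1.0 - prior) * likelihood_if_false)


# Hypothetical numbers: a sloppy stroke is weak evidence for "this is a line".
p_stroke_given_line = 0.4
p_stroke_given_not_line = 0.6

# Without context, the recognizer has no reason to expect a line.
no_context = posterior(0.5, p_stroke_given_line, p_stroke_given_not_line)

# With context (say, an arrow is expected here), the prior for a line is high.
with_context = posterior(0.9, p_stroke_given_line, p_stroke_given_not_line)

print(f"no context:   {no_context:.3f}")
print(f"with context: {with_context:.3f}")
```

With these invented numbers, the same weak evidence goes from unlikely to likely once the surrounding context raises the prior, which is the kind of correction the summary attributes to the Bayesian networks.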

Shapes themselves have hypotheses linking to primitives and constraints. For instance, the hypothesis for an Arrow would cause three Lines and the constraints between them. Higher context models can also be portrayed, such as a Mother-Son link causing a Mother, Son, and a Line. Partial hypotheses can also be generated by incorporating "virtual" nodes that are primitive hypotheses not linked to observations.
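The hypothesis structure described above can be sketched as a small tree. This is only an illustration under my own assumptions: the `Hypothesis` class, its field names, and the string constraints are hypothetical, and in the real system the constraints are nodes in the Bayesian network rather than labels.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Hypothesis:
    name: str
    children: List["Hypothesis"] = field(default_factory=list)
    constraints: List[str] = field(default_factory=list)
    stroke_id: Optional[int] = None  # observation backing a primitive, if any

    def is_primitive(self) -> bool:
        return not self.children

    def is_virtual(self) -> bool:
        # A "virtual" node is a primitive hypothesis with no observation.
        return self.is_primitive() and self.stroke_id is None

    def missing(self) -> List["Hypothesis"]:
        """Collect the virtual primitives beneath this hypothesis."""
        if self.is_primitive():
            return [self] if self.is_virtual() else []
        return [m for child in self.children for m in child.missing()]


def arrow(shaft=None, head1=None, head2=None) -> Hypothesis:
    """An Arrow hypothesis: three Lines plus the constraints between them."""
    return Hypothesis(
        "Arrow",
        children=[
            Hypothesis("Line:shaft", stroke_id=shaft),
            Hypothesis("Line:head1", stroke_id=head1),
            Hypothesis("Line:head2", stroke_id=head2),
        ],
        constraints=["connected(shaft, head1)", "connected(shaft, head2)"],
    )


# A partial hypothesis: two strokes observed, the third line still virtual.
partial = arrow(shaft=0, head1=1)
print([h.name for h in partial.missing()])
```

A higher-level hypothesis such as Mother-Son could be built the same way, with Mother, Son, and a connecting shape as its children.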

To generate hypotheses, SketchREAD has three steps:
  1. Bottom-up: Strokes that the user draws are recognized as primitives and low-level shapes
  2. Top-down: System attempts to find subshapes missing from possible interpretations. Strokes can be reinterpreted.
  3. Pruning: Unlikely interpretations are removed from consideration.
As an example, Alvarado proposes that an ellipse is drawn in a family tree domain. This ellipse is recognized as a low-level shape, and then an interpretation for the ellipse is created, as well as partial interpretations for Mother-Son, Mother-Daughter, etc. These partial hypotheses are templates, and the shape drawn is fit into a single slot of the template. Later, the shape can be shuffled within the template. To keep the interpretations from exploding and becoming intractable, high-level hypotheses are generated in the Bayesian network from only complete templates. Also, any polyline is assumed to be part of only one shape/interpretation.
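The three steps and the template idea can be sketched roughly as follows. Everything here is an assumption for illustration: the `TEMPLATES` table, the slot names, the shape chosen for each slot, and the score threshold are invented, and the scores passed to `prune` stand in for probabilities the Bayesian network would supply.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical templates: slot name -> low-level shape required in that slot.
TEMPLATES = {
    "Mother-Son": {"mother": "Ellipse", "son": "Rectangle", "link": "Arrow"},
    "Mother-Daughter": {"mother": "Ellipse", "daughter": "Ellipse", "link": "Arrow"},
}


@dataclass
class Partial:
    template: str
    slots: Dict[str, Optional[int]]  # slot name -> shape id, or None if empty

    def complete(self) -> bool:
        return all(v is not None for v in self.slots.values())


def bottom_up(shape_id: int, shape_type: str) -> List[Partial]:
    """Fit a newly recognized shape into one slot of every matching template."""
    partials = []
    for name, spec in TEMPLATES.items():
        for slot, required in spec.items():
            if required == shape_type:
                slots = {s: None for s in spec}
                slots[slot] = shape_id
                partials.append(Partial(name, slots))
                break  # a single slot per template; it may be shuffled later
    return partials


def top_down(partial: Partial, shapes: Dict[int, str]) -> Partial:
    """Look for already drawn shapes that could fill the empty slots."""
    spec = TEMPLATES[partial.template]
    used = {v for v in partial.slots.values() if v is not None}
    for slot in list(partial.slots):
        if partial.slots[slot] is None:
            for sid, stype in shapes.items():
                if stype == spec[slot] and sid not in used:
                    partial.slots[slot] = sid
                    used.add(sid)
                    break
    return partial


def prune(partials: List[Partial], scores: List[float], threshold: float = 0.1):
    """Drop interpretations whose (externally supplied) score is too low."""
    return [p for p, s in zip(partials, scores) if s >= threshold]


# The user draws an ellipse (id 0); later a rectangle and an arrow exist too.
partials = bottom_up(0, "Ellipse")
shapes = {0: "Ellipse", 1: "Rectangle", 2: "Arrow"}
filled = [top_down(p, shapes) for p in partials]
kept = prune(filled, scores=[0.8, 0.05])
print([(p.template, p.complete()) for p in kept])
```

In this toy run only the Mother-Son template becomes complete, so under the rule described above it would be the only one promoted to a high-level hypothesis in the network.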

In the domain of family trees, SketchREAD improves over baseline performance in symbol recognition by reducing the errors in recognition by over 50%. Circuit diagrams provide a harder domain, and here SketchREAD improves over a baseline by reducing the number of errors by 17%. The time it takes to process each stroke increases with the stroke number.

Discussion:

Although SketchREAD improves the accuracy for the tested domains, the final accuracy was not yet good enough to be used within any complex domain's interface, which was one of the goals of the system. Alvarado also mentions this in the paper's discussion. Also, allowing a polyline to be part of only one interpretation greatly hurts circuit diagram domains, since many circuit symbols can be drawn with a single stroke.

2 comments:

- D said...

Alvarado's assumption about polylines, and the errors that result, seem to be at odds with her "Properties" paper written with Lazzareschi. In it, they found that most single strokes were only used to draw one stroke. They do note this is probably domain dependent, however.

Paul Taele said...

I guess I'm on the other side of the fence, having few qualms with the polyline assumption Alvarado made in this paper. In my book, I would just blame that fault on the sketcher and tell them to draw it better. But you do bring up a decent counter-point with circuit symbols that users can sometimes draw with just a single stroke. I think the note that Josh also brought up above with it being domain dependent may have something to do with that.