Alvarado's SketchREAD is a sketch recognition system built with Bayesian networks. The engine can be tuned to run in multiple domains, and the Bayesian networks allow small recognition errors to be corrected.
SketchREAD uses a geometric sketching language, much like the one found in LADDER, to describe simple domain shapes. The context in which these shapes appear within a domain, such as how arrows are used to connect lineages in family trees, sits at a higher level than simple geometric recognizers can capture. Trying every possible combination of strokes to find the "best" fit for all the shapes is time-consuming. SketchREAD seeks to model this context with Bayesian networks.
Shapes themselves have hypotheses linking to primitives and constraints. For instance, in the network the hypothesis for an Arrow would cause three Lines and the constraints between them. Higher-level context can also be modeled, such as a Mother-Son link causing a Mother, a Son, and a Line. Partial hypotheses can also be generated by incorporating "virtual" nodes, which are primitive hypotheses not linked to observations.
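To make the structure of a hypothesis concrete, here is a minimal sketch in Python. It is not SketchREAD's actual data model: the class names, the `stroke_id` field, and the constraint strings are all assumptions made up for illustration. It only shows how an Arrow hypothesis can point at three Line primitives, with any Line that has no observed stroke behind it standing in as a "virtual" node.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Primitive:
    """A primitive hypothesis such as a Line (hypothetical structure)."""
    kind: str
    stroke_id: Optional[int] = None  # None means no observation: a "virtual" node

    @property
    def is_virtual(self) -> bool:
        return self.stroke_id is None


@dataclass
class ShapeHypothesis:
    """A shape hypothesis linking to primitives and constraints."""
    name: str
    parts: List[Primitive] = field(default_factory=list)
    constraints: List[str] = field(default_factory=list)

    def is_partial(self) -> bool:
        # A hypothesis is partial if any of its parts is still virtual.
        return any(p.is_virtual for p in self.parts)


def arrow_hypothesis(stroke_ids) -> ShapeHypothesis:
    """Build an Arrow hypothesis from up to three observed line strokes."""
    ids = list(stroke_ids)[:3]
    parts = [Primitive("Line", sid) for sid in ids]
    # Fill the remaining slots with virtual Lines.
    parts += [Primitive("Line") for _ in range(3 - len(parts))]
    # Illustrative constraint labels only; not taken from the paper.
    constraints = [
        "head1 touches the shaft tip",
        "head2 touches the shaft tip",
        "head1 and head2 are shorter than the shaft",
    ]
    return ShapeHypothesis("Arrow", parts, constraints)
```

Calling `arrow_hypothesis([7, 8])` gives a partial Arrow with one virtual Line, which is the kind of hypothesis the system can later try to complete.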
To generate hypotheses, SketchREAD has three steps:
- Bottom-up: Strokes that the user draws are recognized as primitives and low-level shapes.
- Top-down: The system attempts to find subshapes missing from possible interpretations. Strokes can be reinterpreted.
- Pruning: Unlikely interpretations are removed from consideration.
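The three steps above can be sketched as a small pipeline. This is a toy illustration under heavy assumptions, not the paper's algorithm: the `TEMPLATES` table, the pre-labeled strokes, and the scoring rule are all hypothetical. In particular, the score here is just the fraction of required parts that were observed, a simple stand-in for the posterior that SketchREAD would compute with its Bayesian network.

```python
from collections import Counter

# Hypothetical shape templates: shape name -> required primitive counts.
TEMPLATES = {
    "Arrow": Counter({"Line": 3}),
    "Box": Counter({"Line": 4}),
    "Circle": Counter({"Ellipse": 1}),
}


def bottom_up(strokes):
    """Step 1: turn strokes into primitives.

    Each stroke is (stroke_id, kind); the kind is assumed to be given,
    standing in for a real low-level recognizer.
    """
    return Counter(kind for _, kind in strokes)


def top_down(primitives):
    """Step 2: propose shapes that match some primitives, noting missing parts."""
    hypotheses = []
    for name, required in TEMPLATES.items():
        found = sum(min(primitives[k], n) for k, n in required.items())
        if found == 0:
            continue  # nothing observed supports this shape
        total = sum(required.values())
        missing = required - primitives  # Counter subtraction keeps positive counts
        hypotheses.append(
            {"shape": name, "score": found / total, "missing": dict(missing)}
        )
    return hypotheses


def prune(hypotheses, threshold=0.5):
    """Step 3: drop interpretations whose score falls below the threshold."""
    return [h for h in hypotheses if h["score"] >= threshold]


def interpret(strokes, threshold=0.5):
    return prune(top_down(bottom_up(strokes)), threshold)
```

With two Line strokes and `threshold=0.6`, only the Arrow interpretation survives, and its `missing` entry records the one Line the system would still look for. A real system would also revisit earlier stroke labels during the top-down step, which this sketch leaves out.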
In the domain of family trees, SketchREAD reduces symbol recognition errors by over 50% relative to the baseline. Circuit diagrams are a harder domain, and here SketchREAD reduces the number of errors by 17% relative to a baseline. The time it takes to process each stroke increases as more strokes are added.
Discussion:
Although SketchREAD improves accuracy in the tested domains, the final accuracy is not yet good enough for use in a complex domain's interface, which was one of the goals of the system. Alvarado acknowledges this in the paper's discussion. In addition, the restriction that a polyline can be part of only one interpretation greatly hurts the circuit diagram domain, since many circuit symbols can be drawn with a single stroke.