Monday, October 29, 2007

Envisioning Sketch Recognition: A Local Feature Based Approach to Recognizing Informal Sketches

Summary:

Oltmans' Ph.D. thesis uses computer vision techniques to recognize freely drawn symbols and sketches. Free-form sketching does not constrain the user's drawing style, so many issues need to be taken into account. For instance, stroke overtracing is a large problem in free-form sketches, especially when doodles or notes are involved. Also, noise is more prevalent, and temporal data cannot be relied on because strokes can be drawn in any order.

To combat these issues, Oltmans uses computer-vision-based techniques that lessen the effects of overtracing and noise while ignoring any temporal features of the sketch. The technique, which Oltmans dubs the "bullseye," consists of a radial partition of the space around a point. Each partition corresponds to a histogram bin that counts the stroke points falling within that section (bucket, or slice). Stroke points are preprocessed to be roughly equidistant from one another. Each symbol has a corresponding set of bullseye patterns stored in the system's "codebook." Matching the histograms found in a sketch against the trained codebook patterns allows for symbol classification. The system is trained using an SVM on known data.
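To make the bullseye idea concrete, here is a minimal sketch of how such a histogram could be computed. It is not the thesis's implementation: the function names, the number of rings and wedges, and the equal-width rings are my own assumptions, and only the 40-pixel radius comes from the thesis. Points are resampled to be roughly equidistant first, then each point within the radius is counted in the bin given by its distance and angle from the center.

```python
import math

def resample(points, spacing=1.0):
    """Return points spaced roughly `spacing` apart along the polyline."""
    out = [points[0]]
    carried = 0.0  # distance travelled since the last emitted point
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        if seg == 0:
            continue  # skip duplicate points
        d = spacing - carried
        while d <= seg:
            t = d / seg
            out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            d += spacing
        carried = seg - (d - spacing)
    return out

def bullseye_histogram(points, center, radius=40.0, n_rings=4, n_wedges=8):
    """Count stroke points per (ring, wedge) bin around `center`.

    Assumption: rings have equal width; bin counts are hypothetical defaults.
    """
    hist = [[0] * n_wedges for _ in range(n_rings)]
    cx, cy = center
    for x, y in points:
        dx, dy = x - cx, y - cy
        dist = math.hypot(dx, dy)
        if dist >= radius:
            continue  # point lies outside the bullseye
        ring = min(int(dist / radius * n_rings), n_rings - 1)
        angle = math.atan2(dy, dx) % (2 * math.pi)
        wedge = int(angle / (2 * math.pi) * n_wedges) % n_wedges
        hist[ring][wedge] += 1
    return hist

# Usage: a short horizontal stroke, with the bullseye centered on its start.
stroke = resample([(0, 0), (10, 0)], spacing=1.0)
hist = bullseye_histogram(stroke, center=(0, 0))
```

Because the histogram only counts points per bin, drawing order does not matter, and a retraced stroke changes the counts without changing which bins are occupied.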

Finding a symbol within an entire sketch, though, is rather difficult. Oltmans tries to find "candidate regions" for a symbol by looking at small, overlapping windows of points and then combining these windows into larger regions. The larger regions are found using EM clustering to group the smaller windows together. Any cluster that is considered too large is split into smaller clusters. The bounding box of each cluster is then taken to be a symbol's region, and the bullseye histogram is computed for the ink within the box.
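The candidate-region step can be sketched roughly as follows. This is only an illustration under my own assumptions: the windows are laid out on a fixed spatial grid, the window size, step, and minimum point count are hypothetical, and a simple hard-assignment clustering (k-means with farthest-point initialization) stands in for the EM clustering used in the thesis. The splitting of oversized clusters is left out.

```python
import math

def ink_windows(points, size=20.0, step=10.0, min_points=3):
    """Centers of overlapping square windows that contain enough ink."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    centers = []
    y = min(ys)
    while y <= max(ys):
        x = min(xs)
        while x <= max(xs):
            n = sum(1 for px, py in points
                    if abs(px - x) <= size / 2 and abs(py - y) <= size / 2)
            if n >= min_points:
                centers.append((x, y))
            x += step
        y += step
    return centers

def cluster_windows(centers, k, iters=20):
    """Hard-assignment clustering of window centers (a stand-in for EM)."""
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])
    # Farthest-point initialization keeps the starting means well separated.
    means = [centers[0]]
    while len(means) < k:
        means.append(max(centers, key=lambda c: min(dist(c, m) for m in means)))
    labels = [0] * len(centers)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist(c, means[j])) for c in centers]
        for j in range(k):
            members = [c for c, lab in zip(centers, labels) if lab == j]
            if members:
                means[j] = (sum(m[0] for m in members) / len(members),
                            sum(m[1] for m in members) / len(members))
    return labels

def candidate_boxes(points, k, size=20.0, step=10.0):
    """Bounding box (min_x, min_y, max_x, max_y) of the ink in each cluster."""
    centers = ink_windows(points, size, step)
    if not centers:
        return []
    labels = cluster_windows(centers, k)
    boxes = []
    for j in range(k):
        wins = [c for c, lab in zip(centers, labels) if lab == j]
        ink = [p for p in points
               if any(abs(p[0] - cx) <= size / 2 and abs(p[1] - cy) <= size / 2
                      for cx, cy in wins)]
        if ink:
            boxes.append((min(p[0] for p in ink), min(p[1] for p in ink),
                          max(p[0] for p in ink), max(p[1] for p in ink)))
    return boxes
```

Each returned box would then be handed to the bullseye stage, which describes the ink inside it. Note that the number of clusters `k` has to be supplied here, which is one of the things a real system has to estimate.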

Overall, the system works well for individual symbols (94.4% accuracy), especially when compared to existing systems for noisy data. The system fared slightly worse when taking the entire sketch into account, achieving an accuracy of 92.3%.

Discussion:

The use of vision techniques to classify freely drawn symbols is a good idea because stroke data has so much noise. Vision-based representations, whether histograms or a simple pixel overlay, tend to avoid corner segmentation and overtracing issues.

I find the results for Oltmans' full sketch tests slightly skewed. The vast majority of the shapes tested in the full sketches were wire segments and resistors. Since the wires are broken down into smaller segments, there were roughly 9,000 wires out of only about 14,000-15,000 shapes total. The accuracy for resistors (about 2,000 shapes) was also extremely high, but the accuracy for the rest of the shapes was around 60-80%. So over 2/3 of the shapes were easy to detect with high accuracy because they were either a straight line (wire) or unique (resistor). It seems like the system has a very hard time distinguishing between similar shapes, such as batteries vs. capacitors, mainly because the histograms are just not precise enough to catch small variations in noisy data.
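A quick back-of-the-envelope check of that skew, using the rough counts above. The per-class accuracies in the second half are hypothetical placeholders that I picked only to show how a count-weighted average can hide weak classes; they are not figures from the thesis.

```python
def share(part, total):
    """Fraction of all shapes that belong to the given classes."""
    return part / total

def micro_accuracy(counts, accs):
    """Count-weighted accuracy: large classes dominate the result."""
    return sum(n * a for n, a in zip(counts, accs)) / sum(counts)

def macro_accuracy(accs):
    """Unweighted mean of per-class accuracies: every class counts equally."""
    return sum(accs) / len(accs)

easy = 9000 + 2000  # wires + resistors, rough counts from the post
for total in (14000, 15000):
    print(total, round(share(easy, total), 2))

# Hypothetical accuracies for wires, resistors, and everything else.
counts = [9000, 2000, 3500]
accs = [0.98, 0.98, 0.70]
print(micro_accuracy(counts, accs), macro_accuracy(accs))
```

With either total, wires and resistors make up well over two thirds of the shapes, so an overall accuracy figure mostly reflects how well those two classes are recognized.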

I'm also interested to see how the histograms could be tweaked depending on the scale of the image. The histograms used were hard-coded to a 40-pixel radius, and a variable histogram size might help.
