Summary:
Oltman's Ph.D. thesis uses computer vision techniques to recognize freely drawn symbols and sketches. Freely drawn sketches do not constrain the user's drawing style, so many issues need to be taken into account. For instance, stroke overtracing is a major problem in free-form sketches, especially when doodles or notes are involved. Noise is also more prevalent, and temporal data cannot be used because strokes can be drawn in any order.
To combat these issues, Oltman uses computer-vision-based techniques that reduce the effects of overtracing and noise while ignoring any temporal features of the sketch. The technique, which Oltman dubs the "bullseye," is a radial partition of the space around a point. The radial partitions correspond to a histogram that counts the number of stroke points falling within each section (bucket or slice). Stroke points are preprocessed so that they are roughly equidistant from one another. Each symbol has a corresponding set of bullseye patterns stored in the system's "codebook," and matching the histograms found in a sketch against the trained codebook patterns allows the symbol to be classified. The system is trained using an SVM on known data.
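The preprocessing and bullseye steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Oltman's implementation: the helper names `resample`, `bullseye`, and `nearest_codeword` are hypothetical, the rings are assumed to be uniformly spaced in a 3-ring by 8-wedge layout, and the codebook lookup uses plain Euclidean distance as an assumed metric. The thesis's actual bin layout and matching procedure may differ.

```python
import math


def resample(stroke, spacing):
    """Return points spaced `spacing` apart along the polyline `stroke`."""
    out = [tuple(stroke[0])]
    need = spacing  # path length still needed before the next point is emitted
    for (x0, y0), (x1, y1) in zip(stroke, stroke[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        pos = 0.0  # distance already consumed along this segment
        while seg - pos >= need:
            pos += need
            t = pos / seg
            out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            need = spacing
        need -= seg - pos
    return out


def bullseye(points, center, radius=40.0, rings=3, wedges=8):
    """Count points in each (ring, wedge) bin of a radial partition around `center`.

    Rings are uniformly spaced here (an assumption); points at or beyond
    `radius` are ignored.
    """
    cx, cy = center
    hist = [[0] * wedges for _ in range(rings)]
    for x, y in points:
        dx, dy = x - cx, y - cy
        r = math.hypot(dx, dy)
        if r >= radius:
            continue
        ring = int(r / radius * rings)
        angle = math.atan2(dy, dx) % (2 * math.pi)
        wedge = int(angle / (2 * math.pi) * wedges) % wedges
        hist[ring][wedge] += 1
    return hist


def nearest_codeword(hist, codebook):
    """Index of the codebook entry closest (Euclidean, assumed) to the flattened histogram."""
    flat = [count for ring in hist for count in ring]
    return min(
        range(len(codebook)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(flat, codebook[i])),
    )
```

For example, resampling the segment from (0, 0) to (10, 0) with `spacing=2` yields six points, and `bullseye(pts, (0, 0), radius=10, rings=2, wedges=4)` places three of them in the inner ring and two in the outer ring; the endpoint at distance 10 falls outside the partition.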
Finding a symbol within an entire sketch, though, is rather difficult. Oltman finds "candidate regions" for a symbol by looking at small, overlapping windows of points and then combining these windows into larger regions. The larger regions are found by using EM clustering to group the smaller regions together, and any cluster considered too large is split into smaller clusters. The bounding box of each cluster is then taken to be a symbol's region, and the bullseye histograms are computed for the ink within the box.
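A simplified sketch of the candidate-region step, assuming each small window is summarized by its center point: an isotropic 2-D Gaussian mixture is fit to the centers with EM, any cluster whose bounding box exceeds a hypothetical `max_size` is re-split into two, and each surviving cluster's bounding box becomes a candidate region. The function names, the farthest-point initialization, and the splitting rule are assumptions for illustration, not details taken from the thesis.

```python
import math


def em_cluster(points, k, iters=50):
    """Fit a k-component isotropic 2-D Gaussian mixture with EM; return hard labels."""
    n = len(points)
    # Deterministic farthest-point initialisation of the component means (assumed).
    means = [tuple(points[0])]
    while len(means) < k:
        means.append(tuple(max(points, key=lambda p: min(math.dist(p, m) for m in means))))
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    var0 = sum((p[0] - mx) ** 2 + (p[1] - my) ** 2 for p in points) / (2 * n) + 1e-6
    variances = [var0] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibilities, computed in log space for numerical stability.
        resp = []
        for p in points:
            logs = [
                math.log(weights[j])
                - math.log(2 * math.pi * variances[j])
                - math.dist(p, means[j]) ** 2 / (2 * variances[j])
                for j in range(k)
            ]
            top = max(logs)
            w = [math.exp(v - top) for v in logs]
            total = sum(w)
            resp.append([v / total for v in w])
        # M-step: re-estimate mixing weights, means and isotropic variances.
        for j in range(k):
            nk = sum(r[j] for r in resp) + 1e-12
            weights[j] = nk / n
            means[j] = (
                sum(r[j] * p[0] for r, p in zip(resp, points)) / nk,
                sum(r[j] * p[1] for r, p in zip(resp, points)) / nk,
            )
            variances[j] = (
                sum(r[j] * math.dist(p, means[j]) ** 2 for r, p in zip(resp, points)) / (2 * nk)
                + 1e-6
            )
    return [max(range(k), key=lambda j: r[j]) for r in resp]


def box(pts):
    """Axis-aligned bounding box (x_min, y_min, x_max, y_max) of a point list."""
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    return (min(xs), min(ys), max(xs), max(ys))


def candidate_regions(centers, k, max_size):
    """Cluster window centres with EM, re-split oversized clusters in two, return boxes."""
    labels = em_cluster(centers, k)
    groups = [[c for c, lab in zip(centers, labels) if lab == j] for j in sorted(set(labels))]
    if k > 1 and len(groups) == 1:
        return [box(centers)]  # EM could not separate these points; stop splitting
    regions = []
    for members in groups:
        x0, y0, x1, y1 = box(members)
        if max(x1 - x0, y1 - y0) > max_size:
            regions.extend(candidate_regions(members, 2, max_size))
        else:
            regions.append((x0, y0, x1, y1))
    return regions
```

Starting with a deliberately small `k` and letting the size check force further splits mirrors the "split clusters that are too large" step; each recursive split produces strictly smaller groups, so the recursion terminates.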
Overall, the system works well for individual symbols (94.4% accuracy), especially when compared to existing systems on noisy data. The system fared slightly worse on entire sketches, achieving an accuracy of 92.3%.
Discussion:
Using vision techniques to classify freely drawn symbols is a good idea because stroke data contains so much noise. Vision-based matching, whether with histograms or simple pixel overlay, tends to sidestep corner segmentation and overtracing issues.
I find the results of Oltman's full-sketch tests slightly skewed. The vast majority of the shapes in the full-sketch test were wire segments and resistors. Since wires are broken down into smaller segments, there were roughly 9,000 wires out of only about 14,000 to 15,000 shapes in total. The accuracy for resistors (2,000 shapes) was also extremely high, but the accuracy for the remaining shapes was around 60-80%. So over two-thirds of the shapes were easy to detect with high accuracy because they were either a straight line (wire) or visually distinctive (resistor). The system seems to have a hard time distinguishing between similar shapes, such as batteries and capacitors, most likely because the histograms are not precise enough to capture small variations in noisy data.
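Using the approximate counts above, wires and resistors together account for roughly three-quarters of the full-sketch test set, comfortably above two-thirds whichever end of the 14,000 to 15,000 range is used:

$$\frac{9000 + 2000}{15000} \approx 0.73, \qquad \frac{9000 + 2000}{14000} \approx 0.79$$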
I'm also interested in how the histograms could be adjusted for the scale of the image. The histograms used a hard-coded 40-pixel radius, and a variable histogram size might help.
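One way to try the variable-size idea is to tie the outer radius to the size of the ink being examined. The sketch below is purely hypothetical: the function name, the 0.25 fraction of the bounding-box diagonal, and the 10 to 80 pixel clamp are illustrative choices, not values from the thesis.

```python
import math


def adaptive_radius(points, fraction=0.25, min_r=10.0, max_r=80.0):
    """Hypothetical rule: outer bullseye radius as a fraction of the ink's bounding-box diagonal."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    diagonal = math.hypot(max(xs) - min(xs), max(ys) - min(ys))
    # Clamp so that tiny or huge symbols still get a usable neighbourhood.
    return min(max_r, max(min_r, fraction * diagonal))
```

For ink spanning 120 by 160 pixels (a 200-pixel diagonal) this gives a 50-pixel radius, which would stand in for the fixed 40-pixel value.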
Monday, October 29, 2007
Monday, October 15, 2007
Graphical Input Through Machine Recognition of Sketches
Summary:
Herot's short paper gives a brief but comprehensive look at sketch interaction systems in the mid-1970s.
The paper first looks at a general recognizer, HUNCH, which investigates whether useful knowledge can be extracted from a sketch without relying on a specific domain. The system takes data drawn on a large tablet with a special pencil, and the raw input data is recorded by the computer. HUNCH used another program, called STRAIT, that found corners in the data by examining the user's pen speed. The system also used a process called latching to snap the endpoints of nearby lines together. Unfortunately, HUNCH had problems with consistency across users: users drawing at different pen speeds produced different corners, and latching sometimes distorted the intended image, for example by oversnapping lines in a cube. The system also handles overtraced lines by merging them, provides some 3D image inference through techniques the paper does not explain, and can create floor plans by looking at boxed rooms and doorways.
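The paper does not give STRAIT's exact rule or the latching tolerance, so the following is only a sketch of the two ideas under stated assumptions: a sample is treated as a corner if its pen speed is a local minimum below a hypothetical fraction of the stroke's mean speed, and line endpoints within a hypothetical tolerance `tol` of each other are snapped to their shared centroid. The function names and both thresholds are illustrative, not taken from Herot.

```python
import math


def speed_corners(points, times, fraction=0.5):
    """Indices of interior samples whose pen speed is a local minimum below fraction * mean speed."""
    # Speed of each segment between consecutive timestamped samples.
    seg = []
    for i in range(1, len(points)):
        dt = times[i] - times[i - 1]
        seg.append(math.dist(points[i], points[i - 1]) / dt if dt > 0 else 0.0)
    # Speed at each sample: average of its neighbouring segment speeds.
    v = [seg[0]] + [(a + b) / 2 for a, b in zip(seg, seg[1:])] + [seg[-1]]
    threshold = fraction * sum(v) / len(v)
    return [
        i for i in range(1, len(v) - 1)
        if v[i] < threshold and v[i] <= v[i - 1] and v[i] <= v[i + 1]
    ]


def latch(segments, tol=5.0):
    """Snap endpoints of line segments lying within `tol` of each other to their centroid."""
    endpoints = [p for seg in segments for p in seg]
    groups = []  # each group holds indices into `endpoints`
    for i, p in enumerate(endpoints):
        for g in groups:
            if math.dist(p, endpoints[g[0]]) <= tol:
                g.append(i)
                break
        else:
            groups.append([i])
    snapped = list(endpoints)
    for g in groups:
        cx = sum(endpoints[i][0] for i in g) / len(g)
        cy = sum(endpoints[i][1] for i in g) / len(g)
        for i in g:
            snapped[i] = (cx, cy)
    return [(snapped[2 * k], snapped[2 * k + 1]) for k in range(len(segments))]
```

Because `fraction` and `tol` are single global constants here, a setting tuned for one user may not suit another, and a short edge whose two endpoints lie within `tol` of each other collapses to a single point, which resembles the oversnapped cube lines.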
Context is an important part of a sketch, and Herot recognizes this by arguing that data interpretations should incorporate context information. The context should be specified to the computer so as to avoid the problem of recognizing the domain. Herot briefly describes top-down processing for recognizing sketches within a context-based architecture.
Lastly, Herot mentions that user input is a key component of a sketch recognition system that should not be ignored. More complex interfaces need to be developed so that a user can interact with a program and correct mistakes, and corner finding algorithms need to be tuned for an individual user.
Discussion:
Although none of the topics Herot discusses are new to me, it is surprising that all of these issues were raised in a paper written in 1976. For instance, I had assumed that using pen speed to detect corners was a relatively recent fad.
I am also very surprised that the system attempted (and, judging from the one example, succeeded at) incorporating 3D image analysis. I remember reading a paper about using edges and vertices to determine whether an image is 3D, but I cannot recall its author, so it's hard for me to place that research on a timeline.
Labels:
3D inference,
corner finding,
interface,
overtracing,
sketch recognition