Wednesday, September 26, 2007

Recognizing and Beautifying Low-level Sketch Shapes with Two New Features and Ranking Algorithm

Summary:

Paulson's paper describes a sketch recognition system for recognizing primitive shapes, and combinations of these shapes, with high accuracy. The system constrains the user to draw each shape in a single stroke, but does not constrain drawing style (i.e., the shapes are not gestures). The shapes that the system recognizes include lines, polylines, circles, ellipses, arcs, curves, spirals, and helixes. Complex fits can be formed from combinations of these eight primitives.

Feature calculation for each stroke involves computing direction, speed, and curvature graphs, as well as finding the corners of the stroke. Furthermore, Paulson introduces two new metrics: NDDE and DCR. NDDE, the normalized distance between direction extremes, is the stroke length between the points with the highest and lowest direction values, divided by the total stroke length. Arcs and curves tend to have high NDDE values, whereas polylines have lower NDDE values. DCR, the direction change ratio, is the maximum change in direction divided by the average change in direction. Arcs tend to have small, gradual changes in direction, whereas the direction graph of a polyline contains spikes at its corners.
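
Both features are easy to compute from the stroke's direction graph. Here is a rough Python sketch of my reading of the definitions (not Paulson's code); it assumes the stroke is a list of (x, y) points and that a direction value has already been computed for each point:

```python
import math

def arc_lengths(points):
    """Cumulative arc length at each point of the stroke."""
    lengths = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        lengths.append(lengths[-1] + math.hypot(x1 - x0, y1 - y0))
    return lengths

def ndde(points, directions):
    """Normalized distance between direction extremes: the stroke length
    between the points with the highest and lowest direction values,
    divided by the total stroke length."""
    lengths = arc_lengths(points)
    hi = max(range(len(directions)), key=lambda i: directions[i])
    lo = min(range(len(directions)), key=lambda i: directions[i])
    return abs(lengths[hi] - lengths[lo]) / lengths[-1]

def dcr(directions):
    """Direction change ratio: maximum direction change divided by the
    average direction change along the stroke."""
    changes = [abs(b - a) for a, b in zip(directions, directions[1:])]
    return max(changes) / (sum(changes) / len(changes))
```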

After this preprocessing, the stroke and computed features are passed to individual shape testers, one per primitive (line, arc, etc.), which check whether the stroke satisfies a set of rules for that primitive. For instance, in the case of the circle test, the stroke's minor-to-major axis ratio must be close to 1.0, its NDDE value must be high, and its feature area error must be low. Each shape tester returns whether the stroke passed the test.
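
A circle tester along those lines might look like the sketch below. The thresholds and the area-error calculation are illustrative guesses of mine, not Paulson's values, and it reuses the ndde helper from the earlier sketch:

```python
import math

def circle_test(points, directions,
                axis_ratio_thresh=0.85, ndde_thresh=0.8, area_error_thresh=0.3):
    """Rough circle tester: axis ratio near 1, high NDDE, low area error."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    width, height = max(xs) - min(xs), max(ys) - min(ys)
    major, minor = max(width, height), min(width, height)
    if major == 0 or minor / major < axis_ratio_thresh:
        return False          # too elongated to be a circle
    if ndde(points, directions) < ndde_thresh:
        return False          # direction extremes too close together
    # Stand-in for the feature area error: how far the stroke's points
    # deviate from an ideal circle fit through the centroid.
    cx, cy = sum(xs) / len(xs), sum(ys) / len(ys)
    radii = [math.hypot(x - cx, y - cy) for x, y in points]
    r = sum(radii) / len(radii)
    error = sum(abs(d - r) for d in radii) / (len(radii) * r)
    return error <= area_error_thresh
```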

Finally, for each shape test passed, the interpretation is placed in a sorted hierarchy. The hierarchy assigns weights to shapes based on how common the shape is and how complex the fit is. The interpretations are added to the hierarchy in a certain order, described in the paper.

Paulson's system obtains a 98.6% recognition rate overall, and a 99.89% rate when the correct interpretation only needs to be within the top three handed to the user. The recognition gain comes from his two new features, which increase the system's accuracy by over 20%. The ranking scheme helps sort the interpretations so that the system is 7% more likely to rank the correct interpretation first.

Discussion:

Paulson's results are quite amazing in that the system's recognition improves by 20% with just two new features. When analyzing his results table, I noticed that Sezgin's algorithm worked better than Paulson's when returning complex fits, but performed worse on every other shape test. Since Sezgin's method heavily favors complex fits, these results seem logical.

I noticed that circles and ellipses are given the same hierarchy score (5) in the system, which struck me as odd since I'd want to favor circles over ellipses. Since a circle is always an ellipse, but an ellipse is not always a circle, I would want to favor circles in the final ranking. That way the system would slightly prefer circles over ellipses when both tests pass.

Monday, September 24, 2007

Streaming Algorithms for Line Simplification

Summary:

Abam et al. discuss an algorithm to approximate a drawn stroke in real time. With a possibly unbounded number of points in a stroke and a limited amount of memory, simplifying the stroke as it's being drawn saves the computer storage space.

The algorithm runs in O(1) time for three specific cases: under the Hausdorff distance when the path is convex (i.e., every angle on the path is less than or equal to 180 degrees) or when the path is xy-monotone (any horizontal or vertical line intersects the path at most once), and under the Frechet distance for general paths. Abam et al. never define the Hausdorff distance, but the Frechet distance between two paths is equal to the maximum distance the paths are away from each other at corresponding points as you walk along both (I think...).
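
For what it's worth, the discrete version of the Frechet distance is straightforward to compute with dynamic programming. This is the standard textbook formulation, not anything specific to Abam et al.'s streaming setting:

```python
import math

def discrete_frechet(P, Q):
    """Discrete Frechet distance between polylines P and Q (lists of (x, y)).
    Intuitively: the shortest 'leash' that lets two walkers traverse their
    paths monotonically while staying connected."""
    n, m = len(P), len(Q)
    d = lambda i, j: math.hypot(P[i][0] - Q[j][0], P[i][1] - Q[j][1])
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            best = d(i, j)
            if i == 0 and j == 0:
                ca[i][j] = best
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], best)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], best)
            else:
                ca[i][j] = max(min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1]),
                               best)
    return ca[n - 1][m - 1]
```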

The algorithm to simplify a stroke takes the new point to add to the simplification and computes an error for the new addition. The point with the minimum error is then removed from the simplification, and the errors for each remaining segment are recalculated.
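
My best attempt at the general shape of the algorithm is below, with a made-up perpendicular-distance error standing in for whichever error measure the paper actually maintains, so take it as a guess rather than the paper's method:

```python
import math

def perp_error(a, b, c):
    """Perpendicular distance from point b to the segment a-c: a stand-in
    for the error introduced by removing b from the simplification."""
    ax, ay = a; bx, by = b; cx, cy = c
    seg = math.hypot(cx - ax, cy - ay)
    if seg == 0:
        return math.hypot(bx - ax, by - ay)
    return abs((cx - ax) * (by - ay) - (cy - ay) * (bx - ax)) / seg

def stream_simplify(points, k, error=perp_error):
    """Keep at most k points (k >= 2): as each new point arrives, append it,
    then drop the interior point whose removal introduces the least error."""
    simplification = []
    for p in points:
        simplification.append(p)
        if len(simplification) > k:
            cheapest = min(range(1, len(simplification) - 1),
                           key=lambda i: error(simplification[i - 1],
                                               simplification[i],
                                               simplification[i + 1]))
            del simplification[cheapest]
    return simplification
```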

Discussion:

I don't really understand how this algorithm works. If the algorithm keeps removing points from the segmentation, shouldn't the number of points always be constant? But then how do we build up points to have a segmentation in the first place? I need to reread the paper a few more times before I understand everything.

Also, on a personal note, I tend to dislike papers that use words like "simply" and "clearly." Especially when I don't think the paper is clear.

Thursday, September 20, 2007

A Curvature Estimation for Pen Input Segmentation in Sketch-based Modeling

Summary:

Kim and Kim propose a new corner detection system that utilizes local changes around corners.

The system starts by resampling the points of the stroke so that all points within the stroke are equidistant from one another. Curvature at a point is typically defined as the change in direction around the point divided by the change in stroke length. By making the stroke distance between points constant, the system lets the changes in direction (calculated the same way as in Sezgin and Yu) serve directly as curvature values.
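
Equidistant resampling is the same preprocessing trick used in several recognizers. A simple version (my own code, not Kim and Kim's) walks the polyline and emits a point every fixed arc-length step:

```python
import math

def resample(points, spacing):
    """Resample a stroke so that consecutive points are a fixed arc-length
    apart, walking along the original polyline."""
    resampled = [points[0]]
    pts = list(points)
    carried = 0.0
    i = 1
    while i < len(pts):
        (x0, y0), (x1, y1) = pts[i - 1], pts[i]
        d = math.hypot(x1 - x0, y1 - y0)
        if d > 0 and carried + d >= spacing:
            t = (spacing - carried) / d
            q = (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
            resampled.append(q)
            pts.insert(i, q)   # the new point starts the next segment
            carried = 0.0
        else:
            carried += d
        i += 1
    return resampled
```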

The final curvature value around each point is determined by two new metrics: local convexity and local monotonicity. Local convexity measures the "support" for a potential corner point: it looks at a window of points around the candidate and sums all of the curvature values with the same sign as the candidate's curvature. Local monotonicity examines the same window of points, but this time each point is examined in sequence, moving outward from the center. A point's curvature is added only if it is smaller than the previous point's and has the same sign; if either requirement fails, the accumulation stops and the current sum is returned.
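
A sketch of how I read the two metrics (the window handling and sign conventions are my own guesses, not the paper's implementation):

```python
def local_convexity(curvatures, i, window):
    """Sum the curvature values within the window around point i that share
    the sign of the curvature at i ("support" for a corner at i)."""
    positive = curvatures[i] >= 0
    lo = max(0, i - window)
    hi = min(len(curvatures), i + window + 1)
    return sum(c for c in curvatures[lo:hi] if (c >= 0) == positive)

def local_monotonicity(curvatures, i, window):
    """Walk outward from point i, accumulating curvature only while each
    successive value keeps the same sign and a smaller magnitude than the
    previous one; stop (in that direction) as soon as either fails."""
    positive = curvatures[i] >= 0
    total = curvatures[i]
    for step in (-1, 1):                      # walk left, then right
        prev = abs(curvatures[i])
        for k in range(1, window + 1):
            j = i + step * k
            if j < 0 or j >= len(curvatures):
                break
            c = curvatures[j]
            if ((c >= 0) != positive) or abs(c) >= prev:
                break
            total += c
            prev = abs(c)
    return total
```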

The algorithm found the correct corners approximately 95% of the time.


Discussion:

Proposing two new (seemingly easy to implement) curvature metrics could give corner finders more techniques for distinguishing good corners from noisy data. The paper was only published recently, so I'm not sure how much of an impact it has had yet.

The main issue I would have liked discussed more in the paper is where this corner finder's results differ from others'. I know what types of shapes Sezgin's corner finder does well with (polylines, simple arcs), but I have yet to see a corner finder that can consistently distinguish truly tough corners, such as a smooth line-arc transition. Kim and Kim's paper doesn't even count this transition as a corner (see Fig. 14, shapes 13 and 20). The paper also only vaguely states its accuracy as being "about 95 percent." I would think that if the curvature calculations were significantly better than previous corner finders', the authors would want to show more proof. I guess we'll see how well the new metrics work this weekend...

Monday, September 17, 2007

A Domain-Independent System for Sketch Recognition

Summary:

Yu and Cai's paper attempts to find corners in strokes by utilizing the stroke's direction and curvature information. Unlike Sezgin's corner finder, Yu and Cai's does not use speed data.

The corner finder first checks to see if the stroke can be approximated by a primitive shape such as a line, arc, or circle. If not, the stroke is divided into two smaller strokes at the point of highest curvature. The system again checks whether each of the subsections can be approximated by a primitive, and the cycle of splitting and checking continues.
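
The split-and-test loop could be sketched like this; the primitive test and the curvature function are placeholders for the paper's actual implementations:

```python
def segment(stroke, fits_primitive, curvature_at):
    """Recursively split a stroke at its highest-curvature interior point
    until every piece passes a primitive test. fits_primitive(points) and
    curvature_at(points, i) are stand-ins for the paper's tests."""
    if len(stroke) < 3 or fits_primitive(stroke):
        return [stroke]
    # split at the interior point of highest curvature
    split = max(range(1, len(stroke) - 1),
                key=lambda i: abs(curvature_at(stroke, i)))
    return (segment(stroke[:split + 1], fits_primitive, curvature_at) +
            segment(stroke[split:], fits_primitive, curvature_at))
```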

At each primitive approximation step, the corner finder looks at the direction graph of the segment to determine whether the segment is more likely a line or an arc. The direction graph of a line is flat, whereas the direction graphs of arcs and circles are sloped. A line is fit to a segment with a least squares fit. To fit an arc, a line is first fit between the first and last points of the segment; an arc is then formed through the two end points and the point on the original stroke perpendicular to the midpoint of that line. The "center" of this arc (which can be viewed as a segment of a circle) is found, and the error of the arc fit is the difference between the area enclosed by the original stroke and the area of the beautified arc fit.
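
Finding the "center" of the fitted arc amounts to finding the center of the circle through three points (the two endpoints and the projected midpoint). The standard circumcenter formula does this; this is generic geometry, not code from the paper:

```python
def arc_center(p1, p2, p3):
    """Center of the circle through three points, e.g. the two endpoints and
    the midpoint-projected stroke point; returns None if they are collinear."""
    ax, ay = p1; bx, by = p2; cx, cy = p3
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-9:
        return None
    ux = ((ax * ax + ay * ay) * (by - cy) + (bx * bx + by * by) * (cy - ay)
          + (cx * cx + cy * cy) * (ay - by)) / d
    uy = ((ax * ax + ay * ay) * (cx - bx) + (bx * bx + by * by) * (ax - cx)
          + (cx * cx + cy * cy) * (bx - ax)) / d
    return (ux, uy)
```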

A circle is just another case of arc fitting, except the direction graph should climb by roughly 2 pi or more over the stroke.

If the stroke intersects itself, another fit should be computed that takes into account the self-intersection points as corners. The best fit between the original computation method and the new, self-intersection method is taken to be the final corner set.

The corner finder "cleans up" the best fit by merging or removing segments. If a segment is short in length and not connected to two other segments it is removed (eliminates hooks). If a segment is short in length and connected to two other segments it is split down its midpoint and the two sections are attached to the larger shapes. If two segments have similar angles and proximity (touching, almost connected) then they are merged.

The paper also includes some basic object recognition, but it really is no more than some simple geometric classification combined with the fitting methods already described.

The system worked very well on lines, arcs, and polylines; correct corner rates in these cases were in the mid to upper 90s. Yet on hybrid line-and-arc shapes the system had mixed results. It still failed to find corners where lines and arcs smoothly transition into one another, obtaining 70% accuracy in those cases; accuracy for the other hybrid cases was around 90%.


Discussion:

Yu and Cai's paper shows another method for corner finding, this time without using speed data. Yet, their approach was very similar to Sezgin's. Although it was described differently, the system still essentially looks at the current corner fit, adds another possible corner, and analyzes the fit again.

The main change in this system is the way that it handles circles and arcs. The approach would seem to be more accurate on some cases, such as circles. Also, looking at the slope of a large portion of the direction graph seems like a good technique to distinguish between lines and arcs.

Overall I was disappointed with the system's accuracy; it doesn't appear to be significantly better than Sezgin's. The main issue I have with the corner finder is that the system still only uses curvature values to find corners. In a project I worked on last year I had to find edges within a digital image, and I think a modified edge detection algorithm might work well for finding corners. You start by computing a least squares fit over a small segment of points. Keep adding chunks of points to the segment as long as the fit error stays level. When the error starts to change again, backtrack a little and add a corner. Then repeat, starting a new segment at your new corner.

The method can detect more subtle differences in curvature that a global curvature threshold will not detect (the threshold for each segment is local to that segment). I'll probably implement this later if I have the time and see how it goes. I know one issue with this approach is that it can be computationally intensive if you have long segments.
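
To make the idea concrete, here is a rough sketch of what I have in mind; the chunk size and jump factor are guesses, and the error is just the mean perpendicular distance to the segment's least squares line:

```python
import math

def line_fit_error(seg):
    """Mean perpendicular distance of seg's points to their least squares line."""
    n = len(seg)
    mx = sum(x for x, _ in seg) / n
    my = sum(y for _, y in seg) / n
    sxx = sum((x - mx) ** 2 for x, _ in seg)
    sxy = sum((x - mx) * (y - my) for x, y in seg)
    if sxx == 0:                                   # vertical line
        return sum(abs(x - mx) for x, _ in seg) / n
    m = sxy / sxx
    b = my - m * mx
    return sum(abs(m * x - y + b) / math.sqrt(m * m + 1) for x, y in seg) / n

def incremental_corners(points, chunk=5, jump=2.0):
    """Grow a segment chunk by chunk; when the fit error suddenly jumps
    relative to its previous level, back up a chunk and call it a corner."""
    corners, start, i = [], 0, chunk
    prev = None
    while i < len(points):
        err = line_fit_error(points[start:i + 1])
        if prev is not None and err > jump * max(prev, 1e-3):
            corner = max(start + 2, i - chunk)     # backtrack a little
            corners.append(corner)
            start, prev = corner, None
        else:
            prev = err
        i += chunk
    return corners
```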

Wednesday, September 12, 2007

Sketch Based Interfaces: Early Processing for Sketch Understanding

Summary:

Sezgin et al.'s paper describes a way to find corners (vertices) of freely drawn symbols in order to break the symbol into lines and arcs. These lines and arcs can then be used to define the symbol, which can be passed into a geometrical recognizer for classification.

The vertex finding algorithm works by locating points of high curvature and minimum pen speed. Curvature is defined as the change in direction at a given point, and speed is the change in arc length over time. Although the paper does not mention this, the curvature value at a point can be found by taking a least squares fit over a window of points from p - k to p + k and using the slope of the fit (while watching for vertical lines, of course). Once we have the curvature and speed values for each point in the stroke, we find the local minima that drop below a speed threshold and the local maxima that rise above a curvature threshold. In this paper the speed threshold is 90% of the average speed, and the curvature threshold is the average curvature. These minima and maxima are the candidate vertices from speed and curvature, respectively.
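
The thresholding step is easy to sketch, assuming speed and (signed) curvature values have already been computed for each point; this is my reading of the procedure, not Sezgin et al.'s code:

```python
def candidate_corners(speeds, curvatures):
    """Candidates: local speed minima below 90% of the average speed, and
    local curvature maxima above the average curvature magnitude."""
    speed_thresh = 0.9 * sum(speeds) / len(speeds)
    curv_thresh = sum(abs(c) for c in curvatures) / len(curvatures)

    speed_corners, curve_corners = [], []
    for i in range(1, len(speeds) - 1):
        if (speeds[i] < speed_thresh
                and speeds[i] <= speeds[i - 1] and speeds[i] <= speeds[i + 1]):
            speed_corners.append(i)
        c = abs(curvatures[i])
        if (c > curv_thresh
                and c >= abs(curvatures[i - 1]) and c >= abs(curvatures[i + 1])):
            curve_corners.append(i)
    return speed_corners, curve_corners
```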

Sezgin et al. take the intersection of the speed and curvature corners as the initial corner set, then compute certainty metrics for the remaining candidates. The curvature metric is the magnitude of the difference between the direction values a window of k points away on either side, divided by the arc length between those points. The speed metric is simply 1 - (speed at the point / maximum speed). The remaining corner candidates are then sorted in their respective lists by decreasing metric value.
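
The two certainty metrics, as I understand them (lengths is the cumulative arc length at each point, as in the earlier sketches):

```python
def curvature_metric(directions, lengths, i, k):
    """Certainty of a curvature candidate: magnitude of the direction change
    across a window of k points on either side, divided by the arc length
    spanned by that window."""
    lo, hi = max(0, i - k), min(len(directions) - 1, i + k)
    span = lengths[hi] - lengths[lo]
    return abs(directions[hi] - directions[lo]) / span if span else 0.0

def speed_metric(speeds, i):
    """Certainty of a speed candidate: 1 - v_i / v_max."""
    return 1.0 - speeds[i] / max(speeds)
```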

The algorithm then takes one remaining corner from each set (remaining speed and remaining curvature) and generates two new "hybrid fits": the current corner set plus the speed candidate, and the current corner set plus the curvature candidate. The hybrid fits are tested using a least squares error between each pair of vertices, and the fit with the least error becomes another possible corner fit for the stroke. More fits are generated until all remaining candidate corners are used up. The final fit is chosen by applying an error threshold and then picking the fit with the fewest corners from those below the threshold.
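
A sketch of that loop as I read it; fit_error(stroke, corners) stands in for the least squares/Bezier error computation, and the candidate lists are assumed to be sorted by decreasing certainty:

```python
def best_hybrid_fit(stroke, initial, speed_cands, curve_cands,
                    fit_error, error_thresh):
    """Generate hybrid fits by repeatedly adding the best remaining speed or
    curvature candidate, then pick the smallest fit under the error threshold."""
    speed_cands, curve_cands = list(speed_cands), list(curve_cands)
    current = sorted(set(initial))
    fits = [current]
    while speed_cands or curve_cands:
        trials = []
        if speed_cands:
            trials.append(('speed', sorted(set(current) | {speed_cands[0]})))
        if curve_cands:
            trials.append(('curve', sorted(set(current) | {curve_cands[0]})))
        kind, current = min(trials, key=lambda t: fit_error(stroke, t[1]))
        (speed_cands if kind == 'speed' else curve_cands).pop(0)
        fits.append(current)
    good = [f for f in fits if fit_error(stroke, f) < error_thresh]
    if good:
        return min(good, key=len)
    return min(fits, key=lambda f: fit_error(stroke, f))
```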

Since a least squares line fit does not work for curved regions, the algorithm approximates any arcs in the stroke with Bezier curves. If the error of a curve approximation is greater than a threshold, the arc is split at its middle and two new curve approximations are created. These curves are then used when computing the fit errors.

The system can then beautify the sketch, which is a trivial process if we already have all of the vertices and the Bezier curve approximations. Overall the system had great results, and users "praised" the research because it allowed them to draw objects free-hand. The system accurately found the corners 96% of the time.


Discussion:

This paper is a cornerstone for vertex finding, pun intended. The research is well discussed, the algorithms are relatively easy to implement, and the results are fantastic.

The main thing that I would have liked to see included in the paper was a mention of where and how the algorithm can make a mistake. From my experience I know that corner finding is difficult on poorly drawn circles. Speed variations for circles can be slight, and if the circle looks more like an oval (as in the number 0) there can be a small protrusion or bump at the bottom of the circle that acts as a corner. I'd be interested to see how his system performed on these types of shapes.

Also, thresholds are very tricky when you take into account shapes that include both polylines and arcs. If a complex shape needs to be drawn in a single stroke, a user can sometimes pause in the middle of drawing to think about how to draw the next section. This pause will definitely register as a vertex, but it also skews the speed average and hurts the detection of more subtle corners.

Thursday, September 6, 2007

MARQS: Retrieving Sketches Using Domain- and Style-Independent Features Learned from a Single Example Using a Dual-Classifier

Summary:

Brandon's paper proposes a search algorithm that can search a laboratory notebook using sketches. The algorithm itself needs to be invariant to user drawing styles and account for rotation and scaling issues in sketches. It also must be able to recognize sketched symbols from only one drawing since a user searching through their notes might have only drawn a certain sketch one time. Brandon created the MARQS system, a media library with sketch queries, in order to test the algorithm.

To reduce rotation variance the algorithm rotates each sketch so that its major axis is horizontal. The major axis is defined as the line between the two farthest points in the sketch. The four features used are: bounding box aspect ratio, pixel density, average curvature, and the number of corners found. If only one example sketch has been drawn, a query runs a single classifier that simply computes the error between the query's features and the example's. A linear classifier is used when more than one example sketch is available.
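
The rotation normalization and the first feature are simple enough to sketch (my own code, not the MARQS implementation; the brute-force farthest-pair search is just for illustration):

```python
import math

def normalize_rotation(points):
    """Rotate a sketch so its major axis (the line between its two farthest
    points) lies horizontal. O(n^2) farthest-pair search, fine for a sketch."""
    p, q = max(((a, b) for a in points for b in points),
               key=lambda ab: math.hypot(ab[1][0] - ab[0][0],
                                         ab[1][1] - ab[0][1]))
    theta = -math.atan2(q[1] - p[1], q[0] - p[0])
    c, s = math.cos(theta), math.sin(theta)
    return [(x * c - y * s, x * s + y * c) for x, y in points]

def aspect_ratio(points):
    """Bounding box aspect ratio, one of the four MARQS features."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    height = max(ys) - min(ys)
    return (max(xs) - min(xs)) / height if height else float('inf')
```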

The average rank of the correct sketch for each query was 1.51. As the system was used more often, the results showed that the ranking improved, since the system moved from the single-example classifier to the linear classifier.


Discussion:

The system seems to work pretty well for such a simplistic search algorithm, especially since it only uses four features. The downward trend in the average rank is also very promising, since searches will likely be performed often.

MARQS' main limitation is in the features it can use, since the system runs a linear classification on a freely drawn sketch. Each feature has to be invariant to the number of strokes, rotation, and scale. Yet the number of perceived corners could easily change between sketches if somebody drew a tree with one jagged line one time and with small dashes another. Merging close endpoints into one larger stroke might benefit the system, and then other features, like the average stroke length, could be used.

Another thing to think about is if the journal system is ever implemented, how do you distinguish between a sketch and handwriting? I recently saw some great research from Auckland, but I can't find the paper online to link here. If you really want to know more ask me for a copy.

Tuesday, September 4, 2007

"Those Look Similar!" Issues in Automating Gesture Design Advice

Summary:

Gestures that are similar are hard for a computer to classify and difficult for a user to remember. Long and Landay created a tool called quill that helps designers improve the gestures they want to use by highlighting any gesture similarities. Designers can then take these similarities into account and redesign their gestures to be more distinct.

When using quill, designers take a gesture they want implemented in their system and draw some examples into the tool. After more than one gesture class has been drawn, quill begins to analyze the gestures for similarities. Similarity was determined to be based heavily upon gesture curviness and angle, as well as the density of the gesture. For more on how similarity was determined, see "Visual Similarity of Pen Gestures."

The paper goes into detail about how presenting advice to the user is a large interface challenge. Long and Landay did not want quill to annoy expert gesture designers, but they also wanted to provide enough advice that both novices and experts could benefit from the system. In the end they decided to delay advice until the designer decides to test a gesture, reasoning that designers are probably done tweaking their initial gestures by the time they test them. Yet quill also provides more subtle warnings to the designers whenever they pause.


Discussion:

Improving gesture designs by letting the computer point out when gestures might be confused is a great way to guard against poor gesture choices. Deciding when to present advice to a user is also a large UI problem outside of quill; for instance, when should a stroke/symbol be beautified, or when should a symbol be recognized? Long and Landay provided a good overview of their decisions, and I really liked their choice to present advice when the designer decides to test their classes.

I'm curious how good the advice actually is and how annoying it might be if the same advice keeps appearing every time I test my classes. If I design a left-to-right horizontal line to be a "Page Forward" gesture and a right-to-left horizontal line to be "Page Back," the computer might tell me that the gestures are similar, but the gestures map so well to forward and back arrows that users shouldn't confuse the two. I wouldn't want the system to constantly warn me about this.