In Sharon and van de Panne's paper, sketch objects are represented with constellation models. A constellation model is a visual model that captures individual and pairwise features between strokes. For their model the site features include:
- The x-coordinate of the stroke's bounding box center
- The y-coordinate of the stroke's bounding box center
- The length of the bounding box diagonal
- The angle of the bounding box diagonal
- Delta x between the strokes
- Delta y between the strokes
- The minimum distance between the endpoints of stroke a and any point on stroke b
- The minimum distance between the endpoints of stroke b and any point on stroke a
Each element of the feature vectors (site F and interaction G) have their mean and covariances computed for the training data set. The labels in the training data then have and individual probabilistic model computed for the label's features.
For the entire sketch, a likelihood function estimates the probability of an entire labeling L. This function is given in the paper. The overall function tries to maximize the probability of an individual label, as well as the probability of the interactions between two individual labels.
A maximize likelihood search tries to maximize the likelihood function defined, i.e. the find maximum probable labeling. The search is multipass, with the first pass labeling only mandatory strokes objects. All possible label assignments are then searched for using a "branch-and-bound search tree [where each] node in the search tree represents a partial labeling of the sketch." The depth of the tree corresponds to the number of labels applied. The search advances by choosing the best assignment of mandatory labels, and the "cost" of this assignment bounds the rest of the search. The multipass algorithm also bounds branches of the tree before a full labeling is found. If a new likelihood is worse than the bound, then the branches associated with the likelihood's labeling are pruned. If no complete solution is found with the bounding, the bound is loosened until a labeling is found.
In the results section, the multipass thresholding/bounding greatly reduces the time required to find a complete labeling.
Discussion:
Although the paper discussed how using a branch-and-bound search tree with multipass thresholding improves the runtime, no mention was given to the accuracy of the final system. I've worked with systems similar to this before, and I know that the accuracy can drop drastically as the number of labels increases. The authors seem to curb this by having "optional" labels, and by forcing each component to be drawn in a single stroke, but results for how the system's accuracy scales with the number of labels would have been benefitial.
No comments:
Post a Comment