Summary:
Certain sketch domains contain temporal information that can assist in symbol recognition. For instance, circuit diagrams can be highly time-dependent when restricted to certain symbols: resistors are typically drawn with a consistent stroke order, as are capacitors and batteries. Using HMMs to take advantage of this temporal information can improve sketch recognition accuracy.
Sezgin uses an HMM modeled with DBNs to maximize the likelihood of the observable features given the grouping's label. The DBN model takes observables as input, obtained through features computed on the grouping's primitives, and infers the probability of a stroke-level model given the observables. The observables are also modeled with a mixture of Gaussians, although I'm not sure what the mixture model is used for. When this DBN is combined into an HMM, the two other nodes added are an object hypothesis and an ending hypothesis. The object hypothesis predicts the object type (Resistor, Wire, etc.), whereas the ending hypothesis predicts when the symbol has finished being drawn.
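For what it's worth, one common role for a Gaussian mixture in an HMM is as the emission density of a continuous observable: each hidden state scores a feature value with a weighted sum of Gaussians, which lets one state cover several typical values. Whether the paper uses the mixture exactly this way is my assumption. The sketch below is only an illustration, and the feature (a stroke orientation in degrees) and all parameters are made up.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def mixture_pdf(x, components):
    """Density of a 1-D Gaussian mixture; components are (weight, mean, std)."""
    return sum(w * gaussian_pdf(x, m, s) for w, m, s in components)

# Hypothetical emission model for one hidden state: the orientation feature
# clusters around two typical values, so a single Gaussian would fit poorly.
state_emission = [(0.5, 45.0, 10.0), (0.5, 135.0, 10.0)]

likely = mixture_pdf(45.0, state_emission)    # near one of the two modes
unlikely = mixture_pdf(90.0, state_emission)  # between the modes
```

Here `likely` comes out much larger than `unlikely`, so a stroke at 45 degrees supports this state far more than one at 90 degrees.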
Inference on a DBN is linear in the sequence length T, whereas inference on an HHMM (hierarchical HMM) is O(T^3). Therefore, Sezgin converts the model to a DBN before inference is conducted. The conversion step was not explained. During training, the use of continuous variables could cause numerical underflow during belief propagation. A specialized algorithm, the Lauritzen-Jensen belief propagation algorithm, was used to avoid the instability issues.
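To make the linear-time and underflow points concrete, here is a minimal sketch of the forward pass for a flat, discrete HMM computed in log space. This is not the Lauritzen-Jensen algorithm and not Sezgin's model: it is the simplest stand-in I know of that shows one pass over the sequence (cost linear in T) and why working with log-probabilities avoids multiplying thousands of small numbers together. The two-state model and its parameters are hypothetical.

```python
import math

def logsumexp(values):
    """Stable log(sum(exp(v))) for a list of log-values."""
    top = max(values)
    if top == -math.inf:
        return -math.inf
    return top + math.log(sum(math.exp(v - top) for v in values))

def forward_loglik(obs, log_start, log_trans, log_emit):
    """Log-likelihood of obs under a discrete HMM; one pass over obs."""
    n = len(log_start)
    # alpha[s] = log P(observations so far, current state = s)
    alpha = [log_start[s] + log_emit[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        alpha = [
            logsumexp([alpha[p] + log_trans[p][s] for p in range(n)])
            + log_emit[s][o]
            for s in range(n)
        ]
    return logsumexp(alpha)

# Hypothetical two-state model over two observation symbols.
log = math.log
log_start = [log(0.6), log(0.4)]
log_trans = [[log(0.7), log(0.3)], [log(0.4), log(0.6)]]
log_emit = [[log(0.9), log(0.1)], [log(0.2), log(0.8)]]

long_obs = [0, 1] * 2000  # 4000 observations
loglik = forward_loglik(long_obs, log_start, log_trans, log_emit)
```

For a sequence this long the raw likelihood is far too small to represent as a double, but `loglik` stays an ordinary finite negative number.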
Overall, the model worked well in the domain and lowered the recognition error rates for all 8 participants involved in the test. Since the model relies on time, any interspersing (drawing two or more objects simultaneously) introduces errors. This causes primitives to be missed in sketches, with over 6% of the primitives missed on average due to this issue.
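To pin down what interspersing means in terms of stroke order, here is a small sketch that flags objects whose strokes are not drawn as one contiguous run. The function name and the object ids are hypothetical, and this only checks a labeled stroke sequence; it is not part of the recognizer described in the paper.

```python
def interspersed_objects(stroke_labels):
    """Return the set of object ids whose strokes are not drawn contiguously."""
    finished = set()   # objects the user has already moved away from
    broken = set()     # objects the user came back to later
    previous = None
    for label in stroke_labels:
        if label != previous:
            if label in finished:
                broken.add(label)
            if previous is not None:
                finished.add(previous)
            previous = label
    return broken

# The user starts a resistor, draws a wire, then returns to the resistor.
mixed = interspersed_objects(["R1", "R1", "W1", "R1", "C1"])
clean = interspersed_objects(["R1", "R1", "W1", "W1"])
```

In the first sequence only the resistor is flagged, since the user left it and came back; the second sequence has no interspersing.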
Discussion:
Relying on time data is tricky with sketch recognition, since time information can only be used in certain domains. Circuit diagram recognition is not necessarily one of these domains, as shown by the interspersing data. Raising the model above first order might account for some of these issues, but then the model would not be able to run in real time, which was a major selling point of the system.
Wednesday, November 14, 2007
Sketch Interpretation Using Multiscale Models of Temporal Patterns
Labels:
belief propagation,
DBN,
HMM,
likelihood,
sketch recognition
1 comment:
I wonder how many of the errors introduced by out of order strokes could be accounted for by student inexperience. For example, would a professional/expert be more inclined to draw a symbol completely and then move on? Or would they be more inclined to follow the path of least resistance, dragging the pen along as far as they could, and then returning to finish up later? I can see both ways, at the same time, meaning temporal information may not be that useful on its own. I think it would be powerful coupled with geometric data, however.