Wednesday, October 24, 2007

Naturally Conveyed Explanations of Device Behavior

Summary:

Oltmans and Davis present ASSISTANCE, a multimodal system capable of understanding simple 2D physics diagrams. The diagrams can contain bodies, pin joints, springs, pulleys, and rods. Movement of objects is described with arrows as well as verbal cues.

In ASSISTANCE, the user first draws the system they want to model. Then the user verbally describes the system while pointing at objects in the drawing. ASSISTANCE continually updates its interpretation of the drawing, and the user can ask for the computer's interpretation at any time. This interpretation is a "causal model" for the drawn system (i.e., a sequence of cause-and-effect actions).

To generate the causal model, ASSISTANCE first finds the degrees of freedom each object has, such as rotational or translational freedom. The system then uses the verbal description of the system, as well as any arrows the user draws. Verbal information is parsed to extract the key objects and actions. For example, the phrase "Body 2 pushes Body 3" will parse into "Body 2", "pushes", and "Body 3". These verbal phrases, as well as the drawn bodies and arrows, are converted into propositional statements, and ASSISTANCE performs reasoning over them using a forward-chaining algorithm and a truth maintenance system.
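The paper doesn't give pseudocode, so here is a minimal Python sketch of the flavor of that pipeline. The parse format, rule structure, and proposition names are my own illustrative assumptions, not the paper's actual representation.

    # Sketch of the verbal-phrase-to-proposition step plus forward chaining.
    # The rule format is hypothetical; ASSISTANCE's real reasoner also uses
    # a truth maintenance system to retract conclusions when the user
    # changes the drawing.

    def parse_phrase(phrase):
        """Split a phrase like 'Body 2 pushes Body 3' into (agent, verb, patient)."""
        words = phrase.split()
        # Assumes phrases of the form "<Name> <num> <verb> <Name> <num>".
        return (" ".join(words[:2]), words[2], " ".join(words[3:]))

    def forward_chain(facts, rules):
        """Apply (premise, conclusion) rules to the fact set until a fixed point."""
        facts = set(facts)
        changed = True
        while changed:
            changed = False
            for premise, conclusion in rules:
                for fact in list(facts):
                    if premise(fact):
                        derived = conclusion(fact)
                        if derived not in facts:
                            facts.add(derived)
                            changed = True
        return facts

    facts = {parse_phrase("Body 2 pushes Body 3")}

    # Hypothetical rule: if A pushes B, then B moves.
    rules = [(lambda f: f[1] == "pushes",
              lambda f: (f[2], "moves", None))]

    print(forward_chain(facts, rules))
    # Contains ('Body 2', 'pushes', 'Body 3') and ('Body 3', 'moves', None).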

Often, the same action is described in multiple ways, such as with both a verbal description and an arrow indicating movement. When this happens, the two events are merged. The system assumes that only one motion can affect a body, so multiple descriptions affecting the same body are taken to describe the same event.
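A toy version of that merging heuristic, assuming each description has already been reduced to a (body, motion, source) triple. That triple is my simplification, not the paper's data structure.

    # Because only one motion can affect a body, two descriptions of the
    # same body are treated as one event seen through different modalities.

    def merge_events(descriptions):
        merged = {}
        for body, motion, source in descriptions:
            if body in merged:
                merged[body]["sources"].add(source)  # same event, new modality
            else:
                merged[body] = {"motion": motion, "sources": {source}}
        return merged

    descriptions = [
        ("Body 3", "moves", "speech"),  # from "Body 2 pushes Body 3"
        ("Body 3", "moves", "arrow"),   # from an arrow drawn on Body 3
    ]
    print(merge_events(descriptions))
    # {'Body 3': {'motion': 'moves', 'sources': {'speech', 'arrow'}}}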

The final causal model is created by examining the causal events and constructing the most likely model for the system, given the description. To do this, ASSISTANCE uses known causal events and plausible causal events, along with constraint propagation. Events without a known cause are considered merely plausible and require an implicit cause from an outside force. The system tries to minimize these implicit causes, and the model is complete once every event has a cause.
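A rough sketch of the "minimize implicit causes" idea: if events are nodes and known causal links are edges, then any event with no incoming link must be driven by an outside force. ASSISTANCE's constraint propagation is more involved than this simple count; the code below is just my reading of the intuition.

    def implicit_causes(events, causal_links):
        """Return events with no known cause (they need an outside force)."""
        caused = {effect for cause, effect in causal_links}
        return [e for e in events if e not in caused]

    events = ["Body 2 moves", "Body 3 moves"]
    causal_links = [("Body 2 moves", "Body 3 moves")]  # Body 2 pushes Body 3

    print(implicit_causes(events, causal_links))
    # ['Body 2 moves'] -- this motion must be supplied from outside the system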

Discussion:

ASSISTANCE seems to be a great system, and I'm really curious how users would evaluate it. There was no formal evaluation in this paper, but since we're reading Oltmans's thesis next week, I'll find out what users say shortly.

I love multimodal systems, but I also understand why there are not many multimodal applications commercially available. Being able to describe a drawing verbally and with gestural cues is great, and using both input modes can improve the system's accuracy when the two modes share information with each other. On the other hand, if the system does not require users to use every input mode, then the recognition accuracy of each individual mode still has to be very high, since the system cannot count on another mode being available to disambiguate.

1 comment:

Brian David Eoff said...

His thesis doesn't really deal with this subject. I liked the Rube Goldberg-like sketching system. I think it would be an interesting application to play with.