Wednesday, November 7, 2007

Speech and Sketching: An Empirical Study of Multimodal Interaction

Summary:

In this paper, Adler and Davis explore multimodal speech and sketch interfaces through a user study. Their goal is for the computer to provide feedback as the user talks and draws, influencing the design during this process by asking questions and clarifying information. Having the computer understand everything about the design is not the goal; instead, the computer should know enough to ask motivating questions when necessary in order to engage the user. The system also should not constrain the user's drawing or speech style.

The user study was a Wizard-of-Oz study involving 18 users. The users were asked to design a floor plan, a full adder, an AC/DC transformer, and a digital circuit. Sketching was done on Tablet PCs in software that allowed drawing and highlighting in 5 different colors. During the study, the experimenter sat at a table across from the user. The sessions were filmed, and the audio, video, and sketching components were synchronized.

The study showed some interesting results concerning color, questions, and speech timing. Users tended to rely on multiple colors to distinguish portions of the sketch: color linked parts of the sketch together, referred back to previous parts, and reflected the real-world colors of objects. When thinking aloud, users typically repeated words and phrases, which could allow the computer to discern key words from the user-computer dialogue. Questions from the computer also prompted users to repeat the question, and even simple questions could elicit more information than was asked for. Some users redesigned their drawings after simple questions, such as an inquiry about whether two objects were similar. Speech and sketching started at roughly the same time in the study; however, longer speech units such as entire phrases tended to begin before the corresponding sketch, while isolated key words tended to be heard after a sketch was started.
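The observation that repeated words signal importance suggests a lightweight way to flag salient terms without a large vocabulary. As a rough sketch (my own illustration, not the authors' system), a simple repetition counter over a transcribed utterance, ignoring common function words, already surfaces candidate key words:

```python
from collections import Counter

# A small, assumed stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "so", "this", "is", "it",
             "to", "of", "um", "uh", "like", "we", "i"}

def salient_words(transcript, min_repeats=2):
    """Return content words repeated at least `min_repeats` times,
    on the assumption that repetition signals importance."""
    words = [w.strip(".,?!").lower() for w in transcript.split()]
    counts = Counter(w for w in words if w and w not in STOPWORDS)
    return {w for w, n in counts.items() if n >= min_repeats}

utterance = ("so the adder, um, the adder takes two bits, "
             "two bits and a carry, and the carry goes here")
print(salient_words(utterance))  # {'adder', 'two', 'bits', 'carry'}, in some order
```

The point is only that repetition is a cheap salience cue; disambiguating which repeated word refers to which sketch element would still require the timing and grouping information discussed below.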

Discussion:

The two best components of Adler's study show how computers can assist humans during design by relying on the human's design and thought process, instead of requiring genuine understanding. Rather than training the computer to understand all of the components of a design, a basic grasp of object similarity and grouping should be enough to produce a motivating dialogue. Also, the fact that users constantly repeat words gives the computer an indication of important information without the need for a large vocabulary.

I wish the study had also gone into more interface issues, such as when the computer should ask a question (e.g., during sketching, during a pause, etc.). It would also have been helpful to see the average pause length and whether users were vocalizing during pauses by saying "hmm" or something similar. Do pauses in sketching indicate that the user is speaking, and do pauses in speech indicate that the user is sketching? Do the pauses in both modes line up?
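The question of whether pauses in the two modes line up could be answered directly from the synchronized session logs. As a hypothetical sketch (the interval data and function names here are my own, not from the paper), one could compute the fraction of pauses in one mode that coincide with activity in the other:

```python
def overlaps(interval_a, interval_b):
    """True if two (start, end) intervals in seconds overlap."""
    return interval_a[0] < interval_b[1] and interval_b[0] < interval_a[1]

def pause_overlap_fraction(pauses_a, activity_b):
    """Fraction of pauses in one mode that coincide with
    activity intervals in the other mode."""
    if not pauses_a:
        return 0.0
    hits = sum(1 for p in pauses_a
               if any(overlaps(p, act) for act in activity_b))
    return hits / len(pauses_a)

# Hypothetical timestamps (seconds) from a synchronized session log.
speech_pauses = [(2.0, 3.5), (8.0, 9.0)]
sketch_strokes = [(1.5, 4.0), (10.0, 12.0)]
print(pause_overlap_fraction(speech_pauses, sketch_strokes))  # 0.5
```

A value near 1.0 would suggest the user fills speech pauses with sketching (and vice versa), which would bear directly on when the computer can safely interject a question.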

1 comment:

- D said...

I would think that this sort of speech feedback could get very frustrating very quickly. As an example, when I'm on the phone with one of those stupid voice-driven menu systems, I have to wait for it to stop talking before I can make my selection. Sometimes I just want it to shut up so I can talk. I wholeheartedly agree with your desire for more investigation into the interface aspects of the experiment.

Sometimes I feel lonely, and call these menus just for company.