Monday, January 28, 2008

An Architecture for Gesture-Based Control of Mobile Robots

Iba, S., J. M. V. Weghe, et al. (1999). An architecture for gesture-based control of mobile robots. In Proceedings of the 1999 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS '99).


Summary:


Iba et al. describe a gesture-based control scheme for robots. Hidden Markov models (HMMs) are used to recognize seven gestures: closed fist, open hand, wave left, wave right, pointing, opening, and "wait". These gestures correspond to actions that a robot can take, such as accelerating and turning.

The mobile robot used in the system has IR sensors, sonar sensors, a camera, and a wireless transmitter. Gesture capture is done with a CyberGlove equipped with 18 sensors.

Gesture recognition is performed with an HMM-based recognizer. The recognizer first preprocesses the sensor data, reducing the 18-dimensional sensor readings to a 10-dimensional feature vector. The derivative of each feature is computed as well, producing a 20-dimensional vector. Each vector is then quantized against a codebook of 32 possible codewords, turning the continuous features into discrete symbols. The codebook is trained offline, and at runtime each feature vector is mapped to its nearest codeword.
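To make the quantization step concrete, here is a minimal sketch in Python. The dimensions (20-D vectors, 32 codewords) come from the summary, but the training method shown is plain k-means, which may differ from the paper's actual procedure; all function names and data here are hypothetical.

```python
import numpy as np

FEATURE_DIM = 20     # 10 features plus their derivatives, per the summary
NUM_CODEWORDS = 32   # codebook size reported in the paper

def train_codebook(features, k=NUM_CODEWORDS, iters=20, seed=0):
    """Offline codebook training via plain k-means (a stand-in for
    whatever clustering the authors actually used)."""
    rng = np.random.default_rng(seed)
    # Initialize codewords from random training vectors.
    codebook = features[rng.choice(len(features), k, replace=False)]
    for _ in range(iters):
        # Assign each training vector to its nearest codeword.
        dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        labels = np.argmin(dists, axis=1)
        # Move each codeword to the mean of its assigned vectors.
        for j in range(k):
            members = features[labels == j]
            if len(members):
                codebook[j] = members.mean(axis=0)
    return codebook

def quantize(vector, codebook):
    """Runtime step: map one feature vector to a codeword index."""
    return int(np.argmin(((codebook - vector) ** 2).sum(axis=1)))

# Usage: quantize a stream of (synthetic) feature vectors into symbols.
train = np.random.default_rng(1).normal(size=(500, FEATURE_DIM))
codebook = train_codebook(train)
symbols = [quantize(v, codebook) for v in train[:5]]
```

The payoff of this step is that the downstream HMMs only ever see integers in `[0, 32)`, so they can use simple discrete emission tables instead of continuous densities.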

The HMM takes a sequence of codewords and determines which gesture the user is performing. Importantly, if no suitable gesture is found, the recognizer can return "none". To overcome some HMM spotting problems, the "wait" state is the first node in the model and transitions to the other six gestures. While no gesture is being performed, the wait state remains the most probable. As incoming observations push the model toward another gesture, that gesture's probability rises, and the gesture spotter picks the gesture with the highest score.
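The scoring-and-rejection idea can be sketched as follows: score the codeword sequence against a discrete HMM per gesture with the standard forward algorithm, pick the best, and return "none" below a threshold. The threshold-based rejection and all model parameters here are placeholder assumptions, not values from the paper.

```python
import numpy as np

NUM_SYMBOLS = 32  # codeword alphabet size from the quantization step

def forward_log_likelihood(obs, pi, A, B):
    """Standard forward algorithm: log P(obs | model) for a discrete HMM.
    pi: initial state probs, A: transition matrix, B: emission matrix."""
    alpha = pi * B[:, obs[0]]
    log_p = np.log(alpha.sum())
    alpha /= alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        s = alpha.sum()          # rescale each step to avoid underflow
        log_p += np.log(s)
        alpha /= s
    return log_p

def spot_gesture(obs, models, threshold=-50.0):
    """Score the sequence against every gesture HMM; return the best,
    or "none" if nothing scores above a (hypothetical) threshold."""
    scores = {name: forward_log_likelihood(obs, *m)
              for name, m in models.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > threshold else "none"

def random_hmm(n_states=3, seed=0):
    """Placeholder model with random stochastic matrices."""
    rng = np.random.default_rng(seed)
    pi = rng.dirichlet(np.ones(n_states))
    A = rng.dirichlet(np.ones(n_states), size=n_states)
    B = rng.dirichlet(np.ones(NUM_SYMBOLS), size=n_states)
    return pi, A, B

# Usage: spot a gesture from a short codeword sequence.
models = {g: random_hmm(seed=i) for i, g in enumerate(
    ["fist", "open_hand", "wave_left", "wave_right", "point", "opening"])}
result = spot_gesture([3, 17, 5, 5, 21], models)
```

The paper's wait-state trick replaces this kind of hard threshold inside a single combined model, but the competitive max-over-models scoring is the same.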


Discussion:

I'd have liked to know the intuition behind using 32 codewords. The inclusion of the wait state is also odd in combination with the "opening" state, which does not seem to be mapped to anything. So technically the opening state is a wait+1 for either the closed-fist or open-hand gesture. I don't have much more to say on this one.

1 comment:

- D said...

Yeah, I'd also like to know why they picked 32, that specific value. It might just be the best fit after they ran LBG to quantize the training set. I do like the quantization, since it makes the HMMs discrete and compresses the space of possible inputs. I also like the wait state, as it allows the HMM to eat up garbage until a "real" gesture comes along.