Summary:
Krahnstoever et al. use RFID tags in conjunction with computer vision tracking to interpret what is happening within a scene.
A person model tracks a human's movement through the head and hands. The head state X_h is a 3D Cartesian position, and each hand state X_q is described in spherical coordinates (r, phi, theta) with respect to the head. The motion models for the head, p(X_h^t | X_h^{t-1}), and for the hands, p(X_q^t | X_q^{t-1}), had to be learned, as did the priors p(X_q | X_h) on where the hands sit relative to the head. Both the head and hands are segmented using skin color.
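The paper doesn't spell out its axis conventions, but a minimal sketch of this state representation, assuming the standard physics convention (theta = polar angle from vertical, phi = azimuth), might look like this:

```python
import numpy as np

def hand_world_position(head_xyz, r, phi, theta):
    """Convert a hand state given in spherical coordinates (r, phi, theta)
    relative to the head into a world-frame 3D Cartesian position.

    Assumes the standard convention: theta is the polar angle measured from
    the +z axis, phi is the azimuth in the x-y plane. The paper's exact
    axis conventions are not specified, so this is only illustrative.
    """
    head = np.asarray(head_xyz, dtype=float)
    offset = r * np.array([
        np.sin(theta) * np.cos(phi),   # x offset from the head
        np.sin(theta) * np.sin(phi),   # y offset from the head
        np.cos(theta),                 # z offset from the head
    ])
    return head + offset

# Example: head at (0, 0, 1.7) m, hand 0.6 m away, hanging mostly downward.
print(hand_world_position([0.0, 0.0, 1.7], r=0.6, phi=0.0, theta=2.5))
```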
Each pixel within a given image frame belongs either to the background or to the foreground (a body part). The likelihood of the image given the body-part configuration is built from models trained with Improved Iterative Scaling (IIS), evaluated on each image section against a body part's bounding box and summed over the parts and sections. I still don't really understand how IIS works, beyond that it is an iterative method for fitting the weights of a maximum-entropy (log-linear) model.
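Since I can't reproduce the IIS-based likelihood, here is a much simpler stand-in for the same pixel-labeling idea: score a hypothesized body-part bounding box by explaining the pixels inside it with a foreground (skin) color model and everything else with a background model. The Gaussian color models and box format here are my own simplification, not the paper's:

```python
import numpy as np

def gaussian_loglik(pixels, mean, var):
    """Per-pixel log-likelihood under an isotropic Gaussian color model."""
    d = pixels - mean
    return -0.5 * np.sum(d * d, axis=-1) / var - 1.5 * np.log(2 * np.pi * var)

def box_log_likelihood(image, box, fg_mean, fg_var, bg_mean, bg_var):
    """Score a hypothesized body-part bounding box: pixels inside the box
    are explained by the foreground (skin) model, the rest by background.

    image: HxWx3 float array; box: (x0, y0, x1, y1). This is a simplified
    stand-in for the paper's IIS-based likelihood, not a reproduction of it.
    """
    x0, y0, x1, y1 = box
    fg = image[y0:y1, x0:x1].reshape(-1, 3)
    mask = np.ones(image.shape[:2], dtype=bool)
    mask[y0:y1, x0:x1] = False
    bg = image[mask].reshape(-1, 3)
    return (gaussian_loglik(fg, fg_mean, fg_var).sum()
            + gaussian_loglik(bg, bg_mean, bg_var).sum())

# Toy example with made-up color statistics.
img = np.random.rand(40, 40, 3)
print(box_log_likelihood(img, (10, 10, 20, 20),
                         fg_mean=np.array([0.8, 0.5, 0.4]), fg_var=0.02,
                         bg_mean=np.array([0.3, 0.3, 0.3]), bg_var=0.1))
```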
RFID tags provide movement and orientation information in 3D space. The amount of energy a tag harvests depends on its angle to the wave source: a tag oriented perpendicular to the field receives essentially no energy, while one parallel to it receives the most. The tag's response signal then carries the tag's ID along with orientation and field-strength information.
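The paper doesn't give the exact coupling law, but the usual dipole approximation (harvested energy falling off as cos^2 of the angle between the tag's antenna axis and the field) reproduces that perpendicular-gets-nothing, parallel-gets-most behavior:

```python
import numpy as np

def received_energy_fraction(tag_axis, field_dir):
    """Toy model of orientation-dependent RFID coupling.

    Assumes the energy a tag harvests scales as cos^2 of the angle between
    the tag's antenna axis and the reader's field direction (a standard
    dipole-coupling approximation; the paper does not state the exact law).
    Returns 1.0 when aligned (parallel), 0.0 when perpendicular.
    """
    a = np.asarray(tag_axis, dtype=float)
    f = np.asarray(field_dir, dtype=float)
    cos_angle = np.dot(a, f) / (np.linalg.norm(a) * np.linalg.norm(f))
    return cos_angle ** 2

print(received_energy_fraction([1, 0, 0], [1, 0, 0]))  # parallel -> 1.0
print(received_energy_fraction([0, 1, 0], [1, 0, 0]))  # perpendicular -> 0.0
```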
The authors combine the RFID information with the tracked head and hand positions to interpret what is happening in a scene. This fusion is handled by activity "agents", although the paper never really explains how the agents work.
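As a pure guess at what one of these agents might look like (the event name, threshold, and data layout below are all invented), an agent could watch for a hand coming close to a tagged object whose RFID readings indicate motion:

```python
# Purely illustrative guess at an activity "agent": the paper does not
# describe the agent logic, and the event name, threshold, and data layout
# below are invented for this sketch.
import numpy as np

PICKUP_RADIUS = 0.25  # meters; hypothetical proximity threshold

def detect_pickup(hand_xyz, tag_reads):
    """Emit a 'pickup' event when a hand is near a tagged object whose
    RFID response indicates the tag is moving (signal changing).

    tag_reads: dict mapping tag_id -> (object_xyz, is_moving flag).
    """
    events = []
    for tag_id, (obj_xyz, is_moving) in tag_reads.items():
        dist = np.linalg.norm(np.asarray(hand_xyz) - np.asarray(obj_xyz))
        if dist < PICKUP_RADIUS and is_moving:
            events.append(("pickup", tag_id))
    return events

print(detect_pickup([0.4, 0.1, 1.1], {"mug": ([0.5, 0.1, 1.1], True)}))
```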
Discussion:
The RFID information does appear to help recognize what is happening within a scene, but I would have liked to see a comparison between a pure-vision system and the combined vision-plus-RFID system. Running that comparison could be a bit difficult, but it would strengthen the paper.
I also would have liked an actual description of the activity agent system.