A Sketchy Blog

Change Blindness and Its Implications for Complex Monitoring and Control Systems Design and Operator Training

2009-06-11T23:29:00.003-05:00

Durlach, P. 2004. Change blindness and its implications for complex monitoring and control systems design and operator training. Hum.-Comput. Interact. 19, 4 (Dec. 2004), 423-451. DOI= http://dx.doi.org/10.1207/s15327051hci1904_10

Summary:

Durlach, from the Army Research Institute, discussed various aspects of change blindness's effects on important monitoring systems, such as air traffic control.

One factor mentioned in the study is the length of the interruption between screen updates (e.g., a blank screen or a distraction): the longer the interruption, the more likely change blindness is to occur. If the blank between screen updates is almost non-existent, the changes are detected in 1-2 flashes. If the blank screens last ~80 ms, detection takes about 17 alternations.

Other factors that affect change blindness include: distractions, discriminability (red vs. burgundy, red vs. white), categorization (tank vs. truck), biased serial search (rescanning same areas), amount of information, external attention capture, prior learning from a task (repeated or predictable change), meaningfulness of the change, and the user's expertise in the change area.

To help eliminate change blindness, Durlach proposes to reduce screen clutter, make any items easily discriminable, and train users on the systems.

Discussion:

There's no silver bullet to combat change blindness and inattentional blindness, and Durlach recognizes this. Her suggestions make sense, and she has a great list of pros and cons to accompany them. As tasks become more complex, there's always a sacrifice with making software that can handle the complexity while minimizing potential user errors.

Beyond Modularity

2009-06-11T20:32:00.003-05:00

Karmiloff-Smith, A. Beyond Modularity: A Developmental Perspective on Cognitive Science. MIT Press. November, 1992.

Summary:

Piaget's child development theory describes how children develop when their minds mature with age. He observed transitions and major events when this occurs, such as when children learn object permanence.

Karmiloff-Smith presents research challenging the assumption that growth happens in such steps. Instead, the author shows how many human functions (language, math, physics, drawing) are innate in very young children even before they are verbal. For instance, infants look longer at images that even humans would consider novel (such as objects that do not obey a perceived grouping, p. 68).

Discussion:

This book was really well-written and I enjoyed the break from regular computer science reading to read an almost pure psychology book. The idea that drawing is innate in humans reaffirms our lab's claim that sketching is "natural and intuitive".

Drawing and the Non-verbal Mind

2009-06-11T15:08:00.002-05:00

Lange-Kuttner, C. and Vinter, A. "Drawing and the Non-Verbal Mind: A Life-Span Perspective." Cambridge University Press, September 15, 2008.

Summary:

The editors discussed hundreds of experiments dealing with drawings, most focused on children.

Some interesting points of note are:

Young children (3-4 yrs) often cannot recognize their own drawings after some time has passed since the original. (p. 55)
Children often have a "constant depiction strategy", such as drawing everything as a sunburst or as a scribbled dot. The depiction looks closer to the actual object with age. (p. 64)
A drawing can be affected by the question and how the child interprets the objects, such as individual objects or in a group. Grouping of objects can happen more often if the objects are similar: "two circles" vs. "circle and triangle" (pp. 165-173)
People suffering from diseases such as semantic dementia often forget the distinguishing characteristic of an object they should be drawing after a short period of time. (Rhino -> generic animal, p. 286)

Discussion:

The findings presented are too numerous to list, so I simply mentioned the ones I found most interesting. Actual child or mental development would be difficult to measure using sketch recognition techniques (the drawings are simply too abstract). If I ever work with children, items 2 and 3 will probably be helpful with either distinguishing between children or simply with the phrasing of the questions to the children.

Brain Mechanisms of Vision

2009-06-11T14:41:00.004-05:00

Hubel DH, Wiesel TN. Brain Mechanisms of Vision. Scientific American. 1979 Sep; 241(3):150-62

Summary:

The brain's primary visual cortex processes images in a modular, distorted way. The rods and cones in the eyes send messages from the retina to the geniculate cells in the brain, which then relay the message to the visual cortex. These geniculate cells are in a layer called Layer IV, are unsophisticated, and receive the bulk of the visual input.

Cells outside of layer IV have "orientation specificity", where a bar of light falling in a certain orientation will activate some cells and have no effect on others. Each cell appears to respond over a range of around 10-20 degrees from its preferred orientation, beyond which the response is lessened or abolished.

At the time (1979), there was no evidence to support that the orientation specific cells had anything to do with visual perception.

As electric signals moved into more complex layers of the visual cortex, some patterns emerged. Cells close to one another often have the same optimal stimulus orientation. Changes in orientation happen in small increments, such as 25-50 micrometers between cell groups mapping to a change of 10 degrees with varying direction reversals.

Discussion:

Really interesting information on the structure and hierarchy of the primary visual cortex. Although the orientation information did not prove that the brain recognizes shapes using features such as line orientations, other papers citing this one might. I'll have to find some...

Unseen and Unaware: Implications of Recent Research on Failures of Visual Awareness for Human–Computer Interface Design

2009-06-11T13:40:00.003-05:00

Varakin, D. A., Levin, D. T., and Fidler, R. 2004. Unseen and unaware: implications of recent research on failures of visual awareness for human-computer interface design. Hum.-Comput. Interact. 19, 4 (Dec. 2004), 389-422. DOI= http://dx.doi.org/10.1207/s15327051hci1904_9

Summary:

The authors mention some research on inattentional blindness and change blindness and provide anecdotal evidence of their relevance to computer interfaces.

Inattentional Blindness - user is unaware of a change occurring within the same field of view
Change Blindness - user is unaware of a change occurring across multiple views

Change blindness: past, present, and future

2009-06-11T12:42:00.011-05:00

Daniel J. Simons, Ronald A. Rensink, Change blindness: past, present, and future, Trends in Cognitive Sciences, Volume 9, Issue 1, January 2005, Pages 16-20, ISSN 1364-6613, DOI: 10.1016/j.tics.2004.11.006. (http://www.sciencedirect.com/science/article/B6VH9-4DXTHVD-2/2/d3451247e53c70b0b390450a275a475a)

Summary:
The authors provide an overview of the current understanding of change blindness, such as how research has shown that change blindness occurs often during eye movement or when a user's attention wanes.

The main contribution of the paper is the idea that change blindness research does not confirm the thought that visual representations of a scene are 'sparse'. The authors propose four requirements that change blindness evidence must meet to support the idea of sparse representations:

Evidence must eliminate the possibility that detailed visual representations exist but fade from memory before the representations can be compared with others to perceive changes
Evidence must eliminate the possibility that detailed visual representations exist, but in a different visual processing section (of the brain?) that cannot compare with the currently viewed representation for change detection
Evidence must eliminate the possibility that any stored detailed representation is in a format that cannot be compared with another representation
Evidence must eliminate the possibility that both the stored detailed representation and the viewed representation can be compared, but are not for some reason

Discussion:

The paper's final thoughts on how representations are stored do not concern me. Instead, this paper has a wide bibliography of change blindness research that should help me to look for related work.

CogSketch: Open-domain sketch understanding for cognitive science research and for education

2009-02-22T22:33:00.003-06:00

Summary

CogSketch presents a sketch recognition system wrapped in psychological syntax. Users draw single stroke glyphs that can be containment glyphs (symbols) or connection glyphs (relationships). Glyphs are recognized through a focused knowledge base of information that can be specified by the user. Inter-glyph relationships are computed using RCC-8.
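To make the RCC-8 idea concrete for myself, here is a minimal sketch of the eight relations (DC, EC, PO, EQ, TPP, NTPP, TPPi, NTPPi). It assumes each glyph is approximated by a circle (x, y, radius), which is my own simplification and not how CogSketch represents ink; the function name and tolerance are hypothetical.

```python
import math

def rcc8_circles(a, b, tol=1e-9):
    """Return the RCC-8 relation between circles a and b, each (x, y, radius)."""
    (ax, ay, ar), (bx, by, br) = a, b
    d = math.hypot(ax - bx, ay - by)  # distance between centers

    def close(u, v):
        return math.isclose(u, v, abs_tol=tol)

    if close(d, 0.0) and close(ar, br):
        return "EQ"      # same region
    if close(d, ar + br):
        return "EC"      # externally connected: touching from outside
    if d > ar + br:
        return "DC"      # disconnected
    if close(d + ar, br):
        return "TPP"     # a is a tangential proper part of b
    if d + ar < br:
        return "NTPP"    # a is strictly inside b
    if close(d + br, ar):
        return "TPPi"    # b is a tangential proper part of a
    if d + br < ar:
        return "NTPPi"   # b is strictly inside a
    return "PO"          # partial overlap

print(rcc8_circles((0, 0, 1), (0, 0, 3)))
```

A containment glyph drawn fully around a smaller glyph would come out as NTPP or NTPPi under this approximation, depending on argument order.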

On the interface side, the system contains layers that have modes.

Lastly, CogSketch has simulations that can be conducted. The two simulations are analogies (A is to B as C is to ?) and spatial language learning (inside, above, below, etc.).

Do Background Images Improve “Draw a Secret” Graphical Passwords?

2009-02-22T16:15:00.003-06:00

Dunphy, P. and Yan, J. 2007. Do background images improve "draw a secret" graphical passwords?. In Proceedings of the 14th ACM Conference on Computer and Communications Security (Alexandria, Virginia, USA, October 28 - 31, 2007). CCS '07. ACM, New York, NY, 36-47.

Summary

The authors use DAS passwords in conjunction with background images in order to improve the complexity of the passwords without harming user recall. A user would typically choose a small portion of an image to draw on, which could increase the complexity of the password if the image itself was complex.

The paper contains great user studies focusing on the recall of passwords, the complexity of images, what images users chose to draw on, and what recall errors occurred.

Discussion

This is another DAS paper, and, like the previously blogged one, shows how much room the graphical password field has to grow. The studies in this paper were phenomenally thorough, and if we ever start a sketching passwords project this is the paper we should all read.

Graphical Passwords & Qualitative Spatial Relations

2009-02-22T14:43:00.003-06:00

Lin, D., Dunphy, P., Olivier, P., and Yan, J. 2007. Graphical passwords & qualitative spatial relations. In Proceedings of the 3rd Symposium on Usable Privacy and Security (Pittsburgh, Pennsylvania, July 18 - 20, 2007). SOUPS '07, vol. 229. ACM, New York, NY, 161-162.

Summary

The authors modify the Draw-a-Secret (DAS) scheme where users draw a graphical password in a grid so that the "looking over the shoulder" phenomenon could be reduced. A DAS password is a simple encoding of a drawn stroke using a grid and directions, such as up, right, up.
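A rough sketch of the grid-and-directions encoding described above, written from my reading of the summary rather than from the paper's definition. It assumes square cells, screen coordinates with y growing downward, and hypothetical function names; any move that is not to a directly adjacent cell is simply labeled "jump".

```python
# Map a cell-to-cell step (dcol, drow) to a direction name.
# Assumes y grows downward, as in typical screen coordinates.
DIRECTIONS = {(0, -1): "up", (0, 1): "down", (-1, 0): "left", (1, 0): "right"}

def stroke_to_cells(points, cell_size):
    """Map (x, y) stroke points to (col, row) cells, dropping consecutive repeats."""
    cells = []
    for x, y in points:
        cell = (int(x // cell_size), int(y // cell_size))
        if not cells or cells[-1] != cell:
            cells.append(cell)
    return cells

def cells_to_directions(cells):
    """Turn a cell sequence into direction names between consecutive cells."""
    moves = []
    for (c0, r0), (c1, r1) in zip(cells, cells[1:]):
        moves.append(DIRECTIONS.get((c1 - c0, r1 - r0), "jump"))
    return moves

stroke = [(5, 25), (5, 15), (15, 15), (15, 5)]
print(cells_to_directions(stroke_to_cells(stroke, cell_size=10)))
```

Two strokes that wobble differently but cross the same cells in the same order produce the same encoding, which is what makes the password repeatable.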

The extended abstract presents a Qualitative Draw-a-Secret (QDAS) scheme that changes DAS by first assigning a number to each grid. Then, the grid itself varies based on the direction changes of the stroke. The grid changes based on cell height and width.

Discussion

Although this extended abstract wasn't too informative, it did give me some thoughts about how we could use sketch recognition techniques to improve upon drawn passwords.

SKIT: A Computer-Assisted Sketch Instruction Tool

2009-02-16T22:13:00.007-06:00

Greg Coombe and Brian Salomon
Department of Computer Science, University of North Carolina

This paper discusses a system, SKIT, that assists users in sketching line drawings by using some artist techniques. The system breaks a full outline sketch into subdrawings that the user can draw at various sizes. This allows the user to see the model they are drawing as a set of smaller, more geometric objects. The subdrawings are then merged back together in the end.

The small user study was more qualitative and showed users improved when using SKIT.

Discussion:

The paper presents a technique or two that might be helpful in starting a user-training program for sketching.

Invariant features for 3-D gesture recognition

2008-04-28T16:58:00.002-05:00

Summary:

Campbell et al. use HMMs and a list of features to find a good recognition rate for a set of T'ai Chi gestures that are performed by users in a swivel chair; a hand gesture's change in polar coordinates provided the highest recognition for the 18 gestures tested.

Discussion:

Performing T'ai Chi in a chair kind of defeats the purpose of T'ai Chi. That's like trying to study race car drivers by observing people who take the bus.

FreeDrawer - A Free-Form Sketching System on the Responsive Workbench

2008-04-23T16:58:00.007-05:00

Summary:

Wesche et al. created a 3D sketching tool where the skeleton of a model is created. A user can draw curves in a virtual space. A new curve can be drawn anywhere, but additional curves must be merged with the present model. Altering curves can be done on a local or global scale. Surfaces can be filled in at closed curve loops. Surfaces can also be smoothed.

Discussion:

This paper had some nice pictures, but very little material was actually presented. How does the computer know where the pen point is? How does the user interact with the pen? Range of motion? Gestures?

Interacting with human physiology

2008-04-23T16:57:00.012-05:00

Summary:

The authors Pavlidis et al. propose a system to monitor humans for stress levels and altered psychological states using high-end infrared cameras. This system could then be used for a variety of purposes such as stress management of UIs, illness detection, or lie detection.

The system tracks the user's face through tandem tracking, following small, key sections of the face. These sections include the nose, forehead, and temporal regions. The tracker models each region by its center of mass and orientation. Blood flow is tracked in the face through a perfusion model and directional model. The model involves a differential equation set to measure the "volumetric metabolic heat" flow in the face. Other measurements tracked include pulse, heat transfer in areas, and breathing rate.

Discussion:

The ideas behind this system are great, although talking with Pavlidis showed us that there are issues with the current system's usability. Sweat and minor body temperature fluctuations can alter the system's reliability (since the system is trying to measure minor fluctuations). Unfortunately, the cost for one of these high-end cameras is $60k, so we won't be seeing this any time soon.

3D Object Modeling Using Spatial and Pictographic Gestures

2008-04-23T16:56:00.009-05:00

Summary:

Nishino et al. designed a 3D object modeling system that uses stereoscopic glasses, CyberGloves, and Polhemus trackers.

The system allows the creation of superellipsoids that can have smooth or squarish parameters. These primitive shapes can be bent, stretched, twisted, and merged with other shapes. Hand postures control these actions, such as grasping and pointing. Virtual hands are displayed on a 200-inch arched screen, along with the object, in stereoscopic mode. The virtual hands allow the user to easily see where they can touch and modify the 3D model.

The authors tested the system by having users attempt two types of objects: symmetric and asymmetric. The symmetric object was a bottle, and the asymmetric object was a teapot.
Creation of the objects took up to 120 minutes. The size of the stored objects was much less than a competing program, Open Inventor.

Discussion:

For a paper in 1998, this was a pretty advanced system and seemed to offer some benefits over other systems. I would have liked to have seen feedback from the users, though, since I'm not sure how hard the system is to use.

Toward Natural Gesture/Speech HCI: A Case Study of Weather Narration

2008-04-23T16:56:00.006-05:00

Summary:

Poddar et al. use HMMs and speech to recognize complementary keyword-gesture pairs in weather reporting. The multi-modal system would combine speech and gesture recognition to improve recognition accuracy.

HMMs work well for separating 20 isolated gestures from weather narrators, but continuous gestures drop the accuracy to between 50% and 80% for sequences. Co-occurrence analysis studies the meaning behind keywords occurring with gestures, the frequency of these occurrences, and the temporal alignment of the gestures and keywords. A table they presented shows that certain types of gestures (contour, area, and pointing) are more heavily associated with certain keywords ("here", "direction", "location"). The accuracy of recognizing the gestures can improve with both video (gesture) and audio (keywords).
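The counting part of a co-occurrence analysis can be sketched as follows. This is only an illustration under my own assumptions: gestures and keywords are given as timestamped events, and a pair "co-occurs" when the two timestamps fall within a fixed window. The function name, data layout, and window are hypothetical and not taken from the paper.

```python
from collections import Counter

def cooccurrence_counts(gestures, keywords, window):
    """Count (gesture_type, keyword) pairs whose timestamps differ by <= window.

    gestures: list of (time_seconds, gesture_type)
    keywords: list of (time_seconds, word)
    """
    counts = Counter()
    for g_time, g_type in gestures:
        for k_time, word in keywords:
            if abs(g_time - k_time) <= window:
                counts[(g_type, word)] += 1
    return counts

gestures = [(1.0, "pointing"), (4.0, "contour"), (9.0, "pointing")]
keywords = [(1.2, "here"), (4.5, "direction"), (9.1, "here")]
print(cooccurrence_counts(gestures, keywords, window=0.5))
```

Normalizing each count by the total for its gesture type would give the kind of association table the summary mentions.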

Discussion:

This paper does not necessarily add much to solutions, but it was written ten years ago and did show some nice results that combining speech and gesture can improve recognition. Since the authors did not use a speech recognition system, the errors with that system would also produce interesting results that differ from the given accuracies.

Discourse Topic and Gestural Form

2008-04-17T17:06:00.006-05:00

Summary:

Eisenstein et al. applied an unsupervised learning technique and a Bayesian network model to study the correlation between gestures and presentation topics.

Their system looks at "interest points" within a video image, where each interest point is said to have come from a mixture model. Interest points from a similar model are clustered together to create a codebook. A hidden variable determines whether the observation gesture codeword is from a topic-specific or a speaker-specific distribution. The authors use a Bayesian model to learn what distribution each gesture belongs to, based off Gaussians of feature vectors.

The system was tested with fifteen users giving 33 presentations picked from five topics. The experiments show that with correct labels, the topic-specific gestures account for 12% of the gestures, whereas corrupting these labels drops the average to 3%.

Discussion:

This paper is a good start to a longer study on how to incorporate topic-specific gestures into recognition systems. Finding these gestures can help computers understand what topics might be presented, as well as what speakers are presenting a topic or if a speaker is veering off-topic. The system can then be used for speech training, presentation classification, or assistance (Clippy).

Feature selection for grasp recognition from optical markers

2008-04-14T15:52:00.004-05:00

Summary:

Chang et al. reduced the number of markers needed on a vision-based hand grasp system from 30 to 5 while retaining around a 90% recognition rate.

Six different grasps are used for classification: cylindrical, spherical, lumbrical, two-finger pinch, tripod, and lateral tripod. The posterior probabilities for a class yk are modeled with a softmax function: the exp value of the class weights applied to an observation, divided by the sum of the exp(weights * obs) values over all classes.
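For reference, a minimal softmax sketch: each class has a weight vector, the score is the dot product with the observation, and the posterior is exp(score) normalized over all classes. The function name and the toy weights are my own; subtracting the maximum score is a standard numerical-stability step, not something the paper mentions.

```python
import math

def softmax_posteriors(weights, x):
    """Return p(class k | x) for each weight vector in `weights`."""
    scores = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in weights]
    top = max(scores)                        # shift scores to avoid overflow in exp
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Toy example: three classes, two features (values are made up).
weights = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
print(softmax_posteriors(weights, [2.0, 0.0]))
```

The class with the largest dot product always gets the largest posterior, and the posteriors sum to 1.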

The weight values are determined by maximum conditional likelihood estimation from the training set of observations and classes (X, Y). Gradient descent is used to maximize the log likelihood with respect to the weights (equivalently, to minimize the negative log likelihood). Input features are found using a "sequential wrapper algorithm" that examines one feature at a time with respect to a target class.

Grasp data measured 38 objects being grasped with a full set of 30 markers. An "optimal", small set of markers was chosen by forward and backward selection.
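Greedy forward selection can be sketched in a few lines. This is a generic version, not the authors' exact procedure: `score` is a hypothetical callback that evaluates a candidate marker subset (in practice it would train and validate the classifier), and the toy scoring below is made up purely so the example runs.

```python
def forward_selection(features, score, k):
    """Greedily pick k features, adding whichever most improves score(subset)."""
    selected = []
    remaining = list(features)
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy score: each marker has a fixed, made-up usefulness value.
usefulness = {"thumb_tip": 3, "index_tip": 2, "wrist": 1}
toy_score = lambda subset: sum(usefulness[m] for m in subset)
print(forward_selection(usefulness, toy_score, k=2))
```

Backward selection is the mirror image: start from all 30 markers and repeatedly drop the one whose removal hurts the score least.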

The results indicate that the small set of 5 markers has an "accuracy retention" rate between 92% and 97%.

Discussion:

Reducing the number of sensors using the forward and backward selection is nice, but simply having a few more sensors increases the accuracy to the actual plateau point. From 10 on there is almost no change in accuracy, but between 5 and 10 sensors the accuracy can jump 5%, or 1/20, which is a huge percentage when taking into account user frustration.

Glove-TalkII--A Neural-Network Interface which Maps Gestures to Parallel Formant Speech Synthesizer Controls

2008-04-14T13:53:00.005-05:00

Summary:

Fels and Hinton created Glove-TalkII, a system designed to synthesize voice using complicated glove and feet controls.

The artificial vocal tract (AVT) is controlled using a CyberGlove, ContactGlove, Polhemus sensor, and foot pedal. The ContactGlove controls 9 stop consonants, such as CH, T, and NG. The foot pedal controls the volume of the speech. Hand position corresponds to a vowel sound. Hand postures map to nonstop consonant phonemes.

The neural networks used include a vowel/consonant network to determine if the sensors are reading a vowel or consonant, and then separate vowel and consonant networks to distinguish between the phonemes.

A single user had to undergo 100 hours of training to be able to use the system.

Discussion:

Impractical. I'm shocked that they had someone train on the system for 100 hours, and the fact that it takes a person that long to learn the system should indicate that this is a poor way to synthesize voice. The person's final voice is even described as "intelligible and somewhat natural-sounding", which is not a good compliment.

Requiring a person to walk around with a one-handed keyboard and type their words is a better solution. The keyboard wouldn't even have a foot pedal.

RFID-enabled Target Tracking and Following with a Mobile Robot Using Direction Finding Antennas

2008-04-09T14:15:00.006-05:00

Summary:

Kim et al. use dual-direction antennas to find the direction of arrival for RF signals transmitted from an RFID tag. The two spiral antennas are perpendicular to each other and their signal strengths are different depending on the angle to the RFID tag.

Obstacles in front of the antennas/tag increase the error in determining the direction. The object can still be tracked, though. In experimental results it worked pretty well.

Discussion:

It works pretty well for its domain. Probably less accurate for incredibly small movements (e.g., finger bends). Seems like every now and then it goes crazy off-track (Figure 8).

Gesture Recognition Using an Acceleration Sensor and Its Application to Musical Performance Control

2008-04-04T11:47:00.003-05:00

Summary:

Sawada and Hashimoto use accelerometer data to extract features of gestures and create a music tempo system.

The extracting of features is basic: projections onto certain planes, such as xy or yz, and the bounding box of the acceleration values. Changes of acceleration are measured using a fuzzy partition of radial angles.
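The two simplest features mentioned here are easy to sketch. The code below assumes a gesture is a list of (x, y, z) acceleration samples; the function names are hypothetical, and the fuzzy partition of radial angles is left out.

```python
def project(samples, axes):
    """Project 3D samples onto a coordinate plane, e.g. axes=(0, 1) for xy."""
    i, j = axes
    return [(s[i], s[j]) for s in samples]

def bounding_box(samples):
    """Return (mins, maxs) of the acceleration values along x, y, and z."""
    mins = tuple(min(s[a] for s in samples) for a in range(3))
    maxs = tuple(max(s[a] for s in samples) for a in range(3))
    return mins, maxs

gesture = [(0.1, -0.4, 0.9), (0.5, 0.2, 1.1), (-0.3, 0.0, 0.8)]
print(project(gesture, (0, 1)))   # xy-plane projection
print(bounding_box(gesture))
```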

The authors recognize or classify gestures using squared error. The actual gesture recognition is trivial.
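Classification by squared error amounts to nearest-template matching. Here is a minimal sketch under the assumption that each gesture class is stored as one fixed-length feature vector; the template values and names are invented for illustration.

```python
def squared_error(a, b):
    """Sum of squared differences between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def classify(feature_vector, templates):
    """Return the template name with the smallest squared error to the input."""
    return min(templates, key=lambda name: squared_error(feature_vector, templates[name]))

templates = {"up": [0.0, 1.0], "down": [0.0, -1.0], "diagonal": [0.7, 0.7]}
print(classify([0.1, 0.8], templates))
```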

The music tempo program is where the paper is more interesting, as the system has to predict where a beat has been hit in real-time. Systems already existed where a marker is placed on a baton, but the visual processing of these systems usually has a delay of 0.1s (in 1997 computational power). In the authors' system, gestures for up, down, and diagonal swings are used to indicate tempo. Other gestures can map to other elements of conducting.

A score is stored in the computer and the user conducts to the score. Often the computer and human are slightly off, and the two try to balance to each other. A simple function for balancing the tempo is given.

Discussion:

The system they use isn't a true conducting system since it relies on defined (and trained) gestures, but the ideas behind the tempo system are good and the simple execution and equations are appreciated.

Activity Recognition using Visual Tracking and RFID

2008-04-02T15:32:00.005-05:00

Summary:

Krahnstoever et al. use RFID tags in conjunction with computer vision tracking to interpret what is happening within a scene.

A person model tracks a human's movement through their head and hands. The head is a 3D cartesian coordinate location, and each hand is described in spherical coordinates (r, phi, theta) with respect to the head. The models for where the head will be, p(X^t_h, X^t-1_h), and the hands p(X^t_q, X^t-1_h) had to be learned. The priors p(X_q | X_h) also had to be learned. Both hands and head are segmented using skin color.
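As a quick illustration of the hand representation, the sketch below converts a hand position into (r, phi, theta) relative to the head. It assumes the common physics convention (theta measured from the +z axis, phi as the azimuth in the xy-plane); the paper may use a different convention, and the function name is my own.

```python
import math

def hand_relative_to_head(head, hand):
    """Return (r, phi, theta) of the hand with the head as origin."""
    dx, dy, dz = (hand[i] - head[i] for i in range(3))
    r = math.sqrt(dx * dx + dy * dy + dz * dz)
    if r == 0.0:
        return 0.0, 0.0, 0.0          # hand at the head position: angles undefined
    theta = math.acos(dz / r)         # polar angle from the +z axis
    phi = math.atan2(dy, dx)          # azimuth in the xy-plane
    return r, phi, theta

print(hand_relative_to_head((0.0, 0.0, 1.7), (0.3, 0.2, 1.0)))
```

Expressing the hands relative to the head means the hand coordinates stay the same when the whole person moves through the scene.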

Each pixel within a given image frame can belong to either the background or the foreground (body part). The likelihood for an image given the observations is taken to be the Improved Iterative Scaling (IIS) of the image section and bounding box of a body part, summed over the parts and sections. I have no idea how IIS works.

RFID tags provide movement and orientation information in 3D spaces. The amount of charge the RFID tag receives depends on its angle to the wave source, where a perpendicular angle receives no energy and a parallel angle is the greatest. The tag then outputs the tag's ID, orientation, and field strength to the signal.

The authors use the RFID information along with the hand and head positions to interpret what is happening in a scene. Agents are somehow used to do this.

Discussion:

The RFID information looks like it helps recognize what is happening within a scene, but I would have liked to have seen a comparison between a pure vision system and a system with the RFID. This could be a bit difficult, but it might help the strength of the paper.

I also would have liked an actual description of the activity agent system.

Enabling fast and effortless customisation in accelerometer based gesture interaction

2008-03-31T13:23:00.003-05:00

Summary:

Mäntyjärvi et al. apply discrete HMMs to accelerometer data for gesture recognition. The authors had a previous study that indicated users prefer defining their own gestures, or they prefer intuitive gestures.

The authors add noise into the gestures to increase the recognition of user-defined gestures under certain conditions. This supposedly speeds up the training process since fewer gestures need to be "drawn". Adding Gaussian noise versus uniform noise might improve the recognition. But not really.

Discussion:

This paper changed courses in the middle and moved from customization to noise addition. The gesture set they tested on was super easy and can be done with Rubine's recognizer. I'd like to see some data that users created and the differences between the user-defined gesture and the DVD gestures.

Gesture Recognition with a Wii Controller

2008-03-27T12:49:00.002-05:00

Summary:

Schlomer et al. showed that the Wii controller is pretty good at recognizing tennis gestures.

Discussion:

Here's a good evaluation study.

SPIDAR G&G: A Two-Handed Haptic Interface for Bimanual VR Interaction

2008-03-27T12:35:00.003-05:00

Summary:

Murayama et al. presented a two-handed computer control device that allowed the manipulation of on-screen objects. The system, called SPIDAR G&G, consisted of two balls suspended in two horseshoe apparatus with six strings each. The user moved these balls with six degrees of freedom, which translated onto a cursor or object on the computer. The strings also had pull and resisted movement through small motors. Each ball included a pressure button that detected grip.

The authors evaluated the system using a pointer and a target object. The users had to manipulate the pointer and object with both balls in order to accomplish a goal. Three people tested their system and found that the use of two SPIDAR balls, as opposed to one and a keyboard, allowed the users to manipulate the objects faster. Also, haptic feedback helped.

Discussion:

Although the system sounds interesting, I have a lot of issues with the evaluation. The authors used only three people familiar with VR interfaces, which is quite low. A greater concern is that the system was only tested against another form of itself. SPIDAR G&G was only compared against SPIDAR G + keyboard, when really SPIDAR G&G should have been compared to a mouse and keyboard interface, or a joystick and mouse, or two joysticks, or a roller ball, or any number of more common peripherals. As it stands, I have no basis to say that the suspended ball manipulation method is any better than traditional interfaces. The only definite conclusion is that two balls are better than one, and having the balls touch back is beneficial.

Taiwan sign language (TSL) recognition based on 3D data and neural networks

2008-03-23T14:19:00.003-05:00

Summary:

Lee and Tsai implemented a vision-based hand gesture recognition system to classify 20 TSL hand signs. The system used hand features based on visual distances, and 8 reflective markers were placed on the hand to assist in these readings. The features are then sent into a back-propagation neural network (BPNN) that had 15 features as inputs and the 20 gesture probabilities as outputs.

The features used include the distances between a wrist point and the finger tips, and the distances between each finger pair (spread).
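A small sketch of distance features of this kind. It assumes one wrist point and five fingertip points in 3D and takes "each finger pair" to mean all pairs of fingertips; under that reading the vector has 5 + 10 = 15 entries, but that breakdown is my guess and not something I confirmed against the paper. The coordinates below are made up.

```python
import math
from itertools import combinations

def hand_features(wrist, fingertips):
    """Wrist-to-fingertip distances followed by all pairwise fingertip distances."""
    wrist_dists = [math.dist(wrist, tip) for tip in fingertips]
    spread = [math.dist(a, b) for a, b in combinations(fingertips, 2)]
    return wrist_dists + spread

wrist = (0.0, 0.0, 0.0)
tips = [(-4.0, 5.0, 0.0), (-2.0, 9.0, 0.0), (0.0, 10.0, 0.0),
        (2.0, 9.0, 0.0), (4.0, 7.0, 0.0)]
features = hand_features(wrist, tips)
print(len(features))
```

Bending a finger shortens its wrist-to-tip distance, which is presumably how features like these separate bent from straight fingers.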

10 students tested the system and produced 2788 gestures, of which half went to training and the other half to testing. The authors tested on neural networks with 2 hidden layers varying in size from 25 x 25 to 250 x 250. The best results were with the BPNN with 250 x 250 hidden nodes, with a testing accuracy of 94.65%. Two gestures were heavily confused because the only difference was the length of the finger shown (i.e., the fingers were bent in one gesture).

Discussion:

This was a pretty decent use of neural nets, and I'm glad that they gave the results at different hidden layers and the recognition rates for each gesture. In fact, now that I think about it, I'm just glad they gave results. These are definitely the best results I've seen and quite promising: one of their main issues was a good feature to distinguish between bent fingers and non-bent fingers.

The differences between 150x150 and 250x250 are statistically insignificant, but they might be more significant when more gestures are added. I especially like that there is little discrepancy between training and testing sets, which hopefully indicates that their approach works for the general user.