<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-3022028357504155915</id><updated>2011-11-27T18:12:02.212-06:00</updated><category term='graphical models'/><category term='inattentional blindness'/><category term='orientation specificity'/><category term='hand gesture'/><category term='graphical passwords'/><category term='SVD'/><category term='multimodal'/><category term='beautification'/><category term='3D modeling'/><category term='low-level recognizer'/><category term='hand tracking'/><category term='assist'/><category term='grasp'/><category term='change blindness'/><category term='POMDP'/><category term='glove'/><category term='ambiguity'/><category term='posture'/><category term='sign language'/><category term='perception'/><category term='interface'/><category term='constraint satisfaction problem'/><category term='decision tree'/><category term='psychology'/><category term='augmented reality'/><category term='intelligence'/><category term='survey'/><category term='sensors'/><category term='feature similarity'/><category term='constellation models'/><category term='bayesian networks'/><category term='belief propagation'/><category term='DBN'/><category term='learning'/><category term='vertex detection'/><category term='haptics'/><category term='overtracing'/><category term='motion tracking'/><category term='gesture'/><category term='adaboost'/><category term='linear classifier'/><category term='children'/><category term='drawing'/><category term='vision'/><category term='neural networks'/><category term='tension lines'/><category term='virtual environments'/><category term='3D inference'/><category term='robotics'/><category 
term='HCI'/><category term='radial histogram'/><category term='geometric recognizer'/><category term='music'/><category term='reasoning'/><category term='sketch recognition'/><category term='ICA'/><category term='cognitive psychology'/><category term='wii remote'/><category term='vibration'/><category term='constraints'/><category term='likelihood'/><category term='text recognition'/><category term='online learning'/><category term='corner finding'/><category term='visual cortex'/><category term='HMM'/><category term='search'/><category term='features'/><category term='speech'/><category term='singularity'/><category term='RFID'/><category term='user study'/><category term='user interfaces'/><category term='line approximation'/><category term='segmentation'/><category term='sketching'/><category term='dynamic time warping'/><category term='PCA'/><title type='text'>A Sketchy Blog</title><subtitle type='html'>A collection of sketch recognition and haptics paper summaries.</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' 
uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>84</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-2749244378790071235</id><published>2009-06-11T23:29:00.003-05:00</published><updated>2009-06-12T11:02:12.091-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='user interfaces'/><category scheme='http://www.blogger.com/atom/ns#' term='change blindness'/><category scheme='http://www.blogger.com/atom/ns#' term='inattentional blindness'/><category scheme='http://www.blogger.com/atom/ns#' term='cognitive psychology'/><title type='text'>Change Blindness and Its Implications for Complex Monitoring and Control Systems Design and Operator Training</title><content type='html'>Durlach, P. 2004. Change blindness and its implications for complex monitoring and control systems design and operator training. &lt;i&gt;Hum.-Comput. Interact.&lt;/i&gt; 19, 4 (Dec. 2004), 423-451. DOI= http://dx.doi.org/10.1207/s15327051hci1904_10&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Summary:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Durlach, from the Army Research Institute, discussed various effects of change blindness on important monitoring systems, such as airport traffic control.&lt;br /&gt;&lt;br /&gt;One factor mentioned in the study is that the longer the gap between screen updates (e.g., distractions), the more likely change blindness is to occur.  If the gap between updates is almost non-existent, the changes are detected in 1-2 flashes.  If the blank screens last ~80 ms, the changes are detected after about 17 alternations.&lt;br /&gt;&lt;br /&gt;Other factors that affect change blindness include: distractions, discriminability (red vs. burgundy, red vs. white), categorization (tank vs. 
truck), biased serial search (rescanning same areas), amount of information, external attention capture, prior learning from a task (repeated or predictable change), meaningfulness of the change, and the user's expertise in the change area.&lt;br /&gt;&lt;br /&gt;To help eliminate change blindness, Durlach proposes to reduce screen clutter, make any items easily discriminable, and train users on the systems.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;There's no silver bullet to combat change blindness and inattentional blindness, and Durlach recognizes this.  Her suggestions make sense, and she has a great list of pros and cons to accompany them.  As tasks become more complex, there's always a sacrifice with making software that can handle the complexity while minimizing potential user errors.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-2749244378790071235?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/2749244378790071235/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=2749244378790071235' title='36 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/2749244378790071235'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/2749244378790071235'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2009/06/change-blindness-and-its-implications.html' title='Change Blindness and Its Implications for Complex Monitoring and Control Systems Design and Operator Training'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>36</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-4074883978334489513</id><published>2009-06-11T20:32:00.003-05:00</published><updated>2009-06-12T10:56:22.690-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='children'/><category scheme='http://www.blogger.com/atom/ns#' term='cognitive psychology'/><title type='text'>Beyond Modularity</title><content type='html'>Karmiloff-Smith, A. Beyond Modularity: &lt;span class="bodycopy"&gt;A Developmental Perspective on Cognitive Science.  MIT Press. November, 1992.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Summary:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Piaget's child development theory&lt;/span&gt;&lt;span class="bodycopy"&gt; describes how children develop when their minds mature with age.  He observed transitions and major events when this occurs, such as when children learn object permanence.&lt;br /&gt;&lt;br /&gt;Karmiloff-Smith presents research challenging the assumption that growth happens in such steps.  Instead, the author shows how many human functions (language, math, physics, drawing) are innate in very young children even before they are verbal.  For instance, infants look longer at images that even humans would consider novel (such as objects that do not obey a perceived grouping, p. 68).&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;This book was really well-written and I enjoyed the break from regular computer science reading to read an almost pure psychology book.  
The idea that drawing is innate in humans reaffirms our lab's claim that sketching is "natural and intuitive".&lt;br /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-4074883978334489513?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/4074883978334489513/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=4074883978334489513' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/4074883978334489513'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/4074883978334489513'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2009/06/beyond-modularity.html' title='Beyond Modularity'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-4530437892003244369</id><published>2009-06-11T15:08:00.002-05:00</published><updated>2009-06-11T15:36:26.475-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='sketching'/><category scheme='http://www.blogger.com/atom/ns#' term='drawing'/><category scheme='http://www.blogger.com/atom/ns#' term='children'/><category scheme='http://www.blogger.com/atom/ns#' term='cognitive psychology'/><title type='text'>Drawing and the Non-verbal Mind</title><content type='html'>&lt;div style="text-align: left;"&gt;&lt;span&gt;&lt;a 
href="http://www.amazon.com/exec/obidos/search-handle-url/ref=ntt_athr_dp_sr_1?%5Fencoding=UTF8&amp;amp;search-type=ss&amp;amp;index=books&amp;amp;field-author=Chris%20Lange-K%C3%BCttner"&gt;&lt;span style="font-weight: bold;"&gt;&lt;/span&gt;&lt;/a&gt;&lt;/span&gt;&lt;span style="color: rgb(0, 0, 0);"&gt;Lange-Küttner, C. and Vinter, A. "Drawing and the Non-Verbal Mind: A Life-Span Perspective." &lt;/span&gt;Cambridge University Press, September 15, 2008.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Summary:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The editors discussed hundreds of experiments dealing with drawings, most focused on children.&lt;br /&gt;&lt;br /&gt;Some interesting points of note are:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Young children (3-4 yrs) often cannot recognize their own drawings after some time has passed since the original. (p. 55)&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Children often have a "constant depiction strategy", such as drawing everything as a sunburst or as a scribbled dot.  The depiction looks closer to the actual object with age. (p. 64)&lt;br /&gt;&lt;/li&gt;&lt;li&gt;A drawing can be affected by how the question is phrased and by how the child interprets the objects, such as individually or as a group.  Grouping of objects can happen more often if the objects are similar: "two circles" vs. "circle and triangle" (pp. 165-173)&lt;br /&gt;&lt;/li&gt;&lt;li&gt;People suffering from diseases such as semantic dementia often forget the distinguishing characteristic of an object they should be drawing after a short period of time. (Rhino -&gt; generic animal, p. 286)&lt;/li&gt;&lt;/ol&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The findings presented are too numerous to list, so I simply mentioned the ones I found most interesting.  Actual child or mental development would be difficult to measure using sketch recognition techniques (the drawings are simply too abstract).  
If I ever work with children, items 2 and 3 will probably be helpful with either distinguishing between children or simply with the phrasing of the questions to the children.&lt;br /&gt;&lt;span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-4530437892003244369?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/4530437892003244369/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=4530437892003244369' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/4530437892003244369'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/4530437892003244369'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2009/06/drawing-and-non-verbal-mind.html' title='Drawing and the Non-verbal Mind'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-6188136476377243978</id><published>2009-06-11T14:41:00.004-05:00</published><updated>2009-06-11T15:06:37.427-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='vision'/><category scheme='http://www.blogger.com/atom/ns#' term='orientation specificity'/><category scheme='http://www.blogger.com/atom/ns#' term='visual cortex'/><title type='text'>Brain Mechanisms of Vision</title><content type='html'>&lt;span&gt;Hubel DH, Wiesel TN. Brain Mechanisms of Vision. Scientific American. 
1979 Sep; 241(3):150-62&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Summary:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The brain's primary visual cortex processes images in a modular, distorted way.  The rods and cones in the eyes send messages from the retina to the geniculate cells in the brain, which then relay the message to the visual cortex.  These geniculate cells are in a layer called Layer IV, are unsophisticated, and receive the bulk of the visual input.&lt;br /&gt;&lt;br /&gt;Cells outside of layer IV have "orientation specificity", where a bar of light falling in a certain orientation will activate some cells and have no effect on others.  Each cell's response appears to span around 10-20 degrees of orientation, beyond which the response is lessened or abolished. &lt;br /&gt;&lt;br /&gt;At the time (1979), there was no evidence that the orientation-specific cells had anything to do with visual perception.&lt;br /&gt;&lt;br /&gt;As electric signals moved into more complex layers of the visual cortex, some patterns emerged.  Cells close to one another often have the same optimal stimulus orientation.  Changes in orientation happen in small increments, such as 25-50 micrometers between cell groups mapping to a change of 10 degrees with varying direction reversals.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Really interesting information on the structure and hierarchy of the primary visual cortex.  Although the orientation information did not prove that the brain recognizes shapes using features such as line orientations, other papers citing this one might.  
I'll have to find some...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-6188136476377243978?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/6188136476377243978/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=6188136476377243978' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/6188136476377243978'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/6188136476377243978'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2009/06/brain-mechanisms-of-vision.html' title='Brain Mechanisms of Vision'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-5921551193433891602</id><published>2009-06-11T13:40:00.003-05:00</published><updated>2009-06-11T15:07:05.436-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='change blindness'/><category scheme='http://www.blogger.com/atom/ns#' term='inattentional blindness'/><category scheme='http://www.blogger.com/atom/ns#' term='HCI'/><category scheme='http://www.blogger.com/atom/ns#' term='cognitive psychology'/><title type='text'>Unseen and Unaware:  Implications of Recent Research on Failures of Visual Awareness for Human–Computer Interface Design</title><content type='html'>Varakin, D. A., Levin, D. T., and Fidler, R. 2004. 
Unseen and unaware: implications of recent research on failures of visual awareness for human-computer interface design. &lt;i&gt;Hum.-Comput. Interact.&lt;/i&gt;&lt;span&gt; 19, 4 (Dec. 2004), 389-422. DOI= &lt;a href="http://dx.doi.org/10.1207/s15327051hci1904_9" class="smarterwiki-linkify"&gt;http://dx.doi.org/10.1207/s15327051hci1904_9&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Summary:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The authors mention some research on inattentional blindness and change blindness and provide anecdotal evidence of their usage in computer interfaces.&lt;br /&gt;&lt;/span&gt;&lt;ul&gt;&lt;li&gt;&lt;span&gt;Inattentional Blindness - user is unaware of a change occurring within &lt;span style="font-style: italic;"&gt;the same&lt;/span&gt; field of view&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span&gt;Change Blindness - user is unaware of a change occurring across &lt;span style="font-style: italic;"&gt;multiple&lt;/span&gt; views&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-5921551193433891602?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/5921551193433891602/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=5921551193433891602' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/5921551193433891602'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/5921551193433891602'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2009/06/unseen-and-unaware-implications-of.html' title='Unseen and Unaware:  Implications of Recent Research on 
Failures of Visual Awareness for Human–Computer Interface Design'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-4494373417873192082</id><published>2009-06-11T12:42:00.011-05:00</published><updated>2009-06-11T15:07:17.840-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='change blindness'/><category scheme='http://www.blogger.com/atom/ns#' term='cognitive psychology'/><title type='text'>Change blindness: past, present, and future</title><content type='html'>&lt;span&gt;Daniel J. Simons, Ronald A. Rensink, Change blindness: past, present, and future, Trends in Cognitive Sciences, Volume 9, Issue 1, January 2005, Pages 16-20, ISSN 1364-6613, DOI: 10.1016/j.tics.2004.11.006. (http://www.sciencedirect.com/science/article/B6VH9-4DXTHVD-2/2/d3451247e53c70b0b390450a275a475a)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Summary:&lt;/span&gt;&lt;br /&gt;&lt;span&gt;The authors provide an overview of change blindness understanding, such as how research has shown that change blindness occurs often during eye movement or when a user's attention wanes.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span&gt;The main contribution of the paper is the idea that change blindness research does not confirm the thought that visual representations of a scene are 'sparse'.  
The authors propose four requirements for change blindness to support the idea of sparse representations:&lt;/span&gt;&lt;br /&gt;&lt;ol style=""&gt;&lt;li&gt;Evidence must eliminate the possibility that detailed visual representations exist but fade from memory before the representations can be compared with others to perceive changes&lt;/li&gt;&lt;li&gt;Evidence must eliminate the possibility that detailed visual representations exist, but in a different visual processing section (of the brain?) that cannot compare with the currently viewed representation for change detection&lt;/li&gt;&lt;li&gt;Evidence must eliminate the possibility that any stored detailed representation is in a format that cannot be compared with another representation&lt;/li&gt;&lt;li&gt;Evidence must eliminate the possibility that both the stored detailed representation and the viewed representation can be compared, but are not for some reason&lt;/li&gt;&lt;/ol&gt;&lt;span style="font-weight: bold;"&gt;Discussion:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span&gt;The paper's final thoughts on how representations are stored do not concern me.  
Instead, this paper has a wide bibliography of change blindness research that should help me to look for related work.&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-4494373417873192082?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/4494373417873192082/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=4494373417873192082' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/4494373417873192082'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/4494373417873192082'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2009/06/change-blindness-past-present-and.html' title='Change blindness: past, present, and future'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-4915181951384094157</id><published>2009-02-22T22:33:00.003-06:00</published><updated>2009-02-22T23:35:41.302-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='sketch recognition'/><category scheme='http://www.blogger.com/atom/ns#' term='psychology'/><title type='text'>CogSketch: Open-domain sketch understanding for cognitive science research and for education</title><content type='html'>&lt;span style="font-weight: bold;"&gt;Summary&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;CogSketch presents a sketch recognition system wrapped in psychological syntax.  
Users draw single stroke &lt;span style="font-style: italic;"&gt;glyphs&lt;/span&gt; that can be &lt;span style="font-style: italic;"&gt;containment glyphs &lt;/span&gt;(symbols) or &lt;span style="font-style: italic;"&gt;connection glyphs&lt;/span&gt; (relationships).  Glyphs are recognized through a focused knowledge base of information that can be specified by the user.  Inter-glyph relationships are computed using &lt;a href="http://portal.acm.org/citation.cfm?id=225387"&gt;RCC-8&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;On the interface side, the system contains layers that have modes.&lt;br /&gt;&lt;br /&gt;Lastly, CogSketch has simulations that can be conducted.  The two simulations are analogies (A is to B as C is to ?) and spatial language learning (inside, above, below, etc.).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-4915181951384094157?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/4915181951384094157/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=4915181951384094157' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/4915181951384094157'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/4915181951384094157'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2009/02/cogsketch-open-domain-sketch.html' title='CogSketch: Open-domain sketch understanding for cognitive science research and for education'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-4242026568796731271</id><published>2009-02-22T16:15:00.003-06:00</published><updated>2009-02-22T16:38:52.873-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='graphical passwords'/><title type='text'>Do Background Images Improve “Draw a Secret” Graphical Passwords?</title><content type='html'>Dunphy, P. and Yan, J. 2007. Do background images improve "draw a secret" graphical passwords?. In &lt;i&gt;Proceedings of the 14th ACM Conference on Computer and Communications Security&lt;/i&gt; (Alexandria, Virginia, USA, October 28 - 31, 2007). CCS '07. ACM, New York, NY, 36-47.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Summary&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The authors use DAS passwords in conjunction with background images in order to improve the complexity of the passwords without harming user recall.  A user would typically choose a small portion of an image to draw on, which could increase the complexity of the password if the image itself was complex.&lt;br /&gt;&lt;br /&gt;The paper contains great user studies focusing on the recall of passwords, the complexity of images, what images users chose to draw on, and what recall errors occurred.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;This is another DAS paper, and, like the previously blogged one, shows how much room the graphical password field has to grow.  
The studies in this paper were phenomenally thorough, and if we ever start a sketching passwords project this is the paper we should all read.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-4242026568796731271?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/4242026568796731271/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=4242026568796731271' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/4242026568796731271'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/4242026568796731271'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2009/02/do-background-images-improve-draw.html' title='Do Background Images Improve “Draw a Secret” Graphical Passwords?'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-7200129360265405404</id><published>2009-02-22T14:43:00.003-06:00</published><updated>2009-02-22T16:15:01.434-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='graphical passwords'/><title type='text'>Graphical Passwords &amp; Qualitative Spatial Relations</title><content type='html'>Lin, D., Dunphy, P., Olivier, P., and Yan, J. 2007. Graphical passwords &amp;amp; qualitative spatial relations. In &lt;i&gt;Proceedings of the 3rd Symposium on Usable Privacy and Security&lt;/i&gt; (Pittsburgh, Pennsylvania, July 18 - 20, 2007). 
SOUPS '07, vol. 229. ACM, New York, NY, 161-162.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Summary&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The authors modify the Draw-a-Secret (DAS) scheme where users draw a graphical password in a grid so that the "looking over the shoulder" phenomenon could be reduced.  A DAS password is a simple encoding of a drawn stroke using a grid and directions, such as up, right, up.&lt;br /&gt;&lt;br /&gt;The extended abstract presents a Qualitative Draw-a-Secret (QDAS) scheme that changes DAS by first assigning a number to each grid.  Then, the grid itself varies based on the direction changes of the stroke.  The grid changes based on cell height and width.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Although this extended abstract wasn't too informative, it did give me some thoughts about how we could use sketch recognition techniques to improve upon drawn passwords.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-7200129360265405404?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/7200129360265405404/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=7200129360265405404' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/7200129360265405404'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/7200129360265405404'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2009/02/graphical-passwords-qualitative-spatial.html' title='Graphical Passwords &amp; Qualitative Spatial Relations'/><author><name>Grandmaster 
Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-7459126366619515095</id><published>2009-02-16T22:13:00.007-06:00</published><updated>2009-02-16T22:51:43.268-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='sketching'/><category scheme='http://www.blogger.com/atom/ns#' term='psychology'/><category scheme='http://www.blogger.com/atom/ns#' term='learning'/><category scheme='http://www.blogger.com/atom/ns#' term='assist'/><title type='text'>SKIT: A Computer-Assisted Sketch Instruction Tool</title><content type='html'>Greg Coombe and Brian Salomon&lt;br /&gt;Department of Computer Science, University of North Carolina&lt;br /&gt;&lt;br /&gt;This paper discusses a system, SKIT, that assists users in sketching line drawings by using some artist techniques.  The system breaks a full outline sketch into subdrawings that the user can draw at various sizes.  This allows the user to see the model they are drawing as a set of smaller, more geometric objects.  
The subdrawings are then merged back together in the end.&lt;br /&gt;&lt;br /&gt;The small user study was more qualitative and showed users improved when using SKIT.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The paper presents a technique or two that might be helpful in starting a user-training program for sketching.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-7459126366619515095?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/7459126366619515095/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=7459126366619515095' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/7459126366619515095'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/7459126366619515095'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2009/02/skit-computer-assisted-sketch.html' title='SKIT: A Computer-Assisted Sketch Instruction Tool'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-4982954658763849582</id><published>2008-04-28T16:58:00.002-05:00</published><updated>2008-04-28T17:05:33.572-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='HMM'/><category scheme='http://www.blogger.com/atom/ns#' term='vision'/><category scheme='http://www.blogger.com/atom/ns#' term='hand 
gesture'/><title type='text'>Invariant features for 3-D gesture recognition</title><content type='html'>&lt;span style="font-weight: bold;"&gt;Summary:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Campbell &lt;span style="font-style: italic;"&gt;et al.&lt;/span&gt; use HMMs and a list of features to find a good recognition rate for a set of T'ai Chi  gestures that are performed by users in a swivel chair; a hand gesture's change in polar coordinates provided the highest recognition for the 18 gestures tested.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Performing T'ai Chi in a chair kind of defeats the purpose of T'ai Chi.  That's like trying to study race car drivers by observing people who take the bus.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-4982954658763849582?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/4982954658763849582/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=4982954658763849582' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/4982954658763849582'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/4982954658763849582'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2008/04/invariant-features-for-3-d-gesture.html' title='Invariant features for 3-D gesture recognition'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-5726030902336992649</id><published>2008-04-23T16:58:00.007-05:00</published><updated>2009-06-11T20:58:24.530-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='user interfaces'/><category scheme='http://www.blogger.com/atom/ns#' term='3D modeling'/><title type='text'>FreeDrawer - A Free-Form Sketching System on the Responsive Workbench</title><content type='html'>&lt;span style="font-weight: bold;"&gt;Summary:&lt;/span&gt;&lt;span&gt;&lt;br /&gt;&lt;br /&gt;Wesche &lt;span style="font-style: italic;"&gt;et al.&lt;/span&gt; created a 3D sketching tool where the skeleton of a model is created.  A user can draw curves in a virtual space.  A new curve can be drawn anywhere, but additional curves must be merged with the present model.  Altering curves can be done on a local or global scale.  Surfaces can be filled in at closed curve loops.  Surfaces can also be smoothed.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;This paper had some nice pictures, but very little material was actually presented.  How does the computer know where the pen point is?  How does the user interact with the pen?  Range of motion?  
Gestures?&lt;br /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-5726030902336992649?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/5726030902336992649/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=5726030902336992649' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/5726030902336992649'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/5726030902336992649'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2008/04/freedrawer-free-form-sketchign-system.html' title='FreeDrawer - A Free-Form Sketching System on the Responsive Workbench'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-3588910954660038410</id><published>2008-04-23T16:57:00.012-05:00</published><updated>2008-04-23T22:49:12.280-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='user interfaces'/><category scheme='http://www.blogger.com/atom/ns#' term='vision'/><category scheme='http://www.blogger.com/atom/ns#' term='motion tracking'/><title type='text'>Interacting with human physiology</title><content type='html'>&lt;span style="font-weight: bold;"&gt;Summary:&lt;span style="font-weight: bold;"&gt;&lt;/span&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;The authors &lt;span&gt;Pavlidis&lt;/span&gt;&lt;span style="font-style: italic;"&gt; et al.&lt;/span&gt; 
propose a system to monitor humans for stress levels and altered psychological states using high-end infrared cameras.  This system could then be used for a variety of purposes such as stress management in UIs, illness detection, or lie detection.&lt;br /&gt;&lt;br /&gt;The system uses &lt;span style="font-style: italic;"&gt;tandem tracking&lt;/span&gt; to follow a few small, key sections of the user's face.  These sections include the nose, forehead, and temporal regions.  The tracker models each region by its center of mass and orientation.  Blood flow is tracked in the face through a perfusion model and directional model.  The model involves a set of differential equations to measure the "volumetric metabolic heat" flow in the face.  Other measurements tracked include pulse, heat transfer in areas, and breathing rate.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The ideas behind this system are great, although talking with &lt;span&gt;Pavlidis showed us that there are issues with the current system's usability.  Sweat and minor body temperature fluctuations can alter the system's reliability (since the system is trying to measure minor fluctuations).  
&lt;/span&gt;Unfortunately, the cost for one of these high-end cameras is $60k, so we won't be seeing this any time soon.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-3588910954660038410?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/3588910954660038410/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=3588910954660038410' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/3588910954660038410'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/3588910954660038410'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2008/04/interacting-with-human-physiology.html' title='Interacting with human physiology'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-508184901789343455</id><published>2008-04-23T16:56:00.009-05:00</published><updated>2008-04-24T14:51:07.611-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='user interfaces'/><category scheme='http://www.blogger.com/atom/ns#' term='augmented reality'/><category scheme='http://www.blogger.com/atom/ns#' term='hand tracking'/><category scheme='http://www.blogger.com/atom/ns#' term='grasp'/><category scheme='http://www.blogger.com/atom/ns#' term='3D modeling'/><title type='text'>3D Object Modeling Using Spatial and Pictographic Gestures</title><content type='html'>&lt;span 
style="font-weight: bold;"&gt;Summary:&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;span&gt;Nishino &lt;span style="font-style: italic;"&gt;et al.&lt;/span&gt; designed a 3D object modeling system that uses stereoscopic glasses, CyberGloves, and Polhemus trackers.&lt;br /&gt;&lt;br /&gt;The system allows the creation of superellipsoids whose parameters make them smooth or squarish.  These primitive shapes can be bent, stretched, twisted, and merged with other shapes.  Hand postures, such as grasping and pointing, control these actions.  Virtual hands are displayed on a 200-inch arched screen, along with the object, in stereoscopic mode.  The virtual hands allow the user to easily see where they can touch and modify the 3D model.&lt;br /&gt;&lt;br /&gt;The authors tested the system by having users create two types of objects: symmetric and asymmetric.  The symmetric object was a bottle, and the asymmetric object was a teapot. &lt;br /&gt;Creation of the objects took up to 120 minutes.  The size of the stored objects was much smaller than with a competing program, Open Inventor.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion:&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;For a paper from 1998, this was a pretty advanced system and seemed to offer some benefits over other systems.  
I would have liked to have seen feedback from the users, though, since I'm not sure how hard the system is to use.&lt;br /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-508184901789343455?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/508184901789343455/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=508184901789343455' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/508184901789343455'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/508184901789343455'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2008/04/3d-object-modeling-using-spatial-and.html' title='3D Object Modeling Using Spatial and Pictographic Gestures'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-1346781579713340320</id><published>2008-04-23T16:56:00.006-05:00</published><updated>2008-04-23T17:28:32.015-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='HMM'/><category scheme='http://www.blogger.com/atom/ns#' term='gesture'/><category scheme='http://www.blogger.com/atom/ns#' term='multimodal'/><category scheme='http://www.blogger.com/atom/ns#' term='speech'/><title type='text'>Toward Natural Gesture/Speech HCI: A Case Study of Weather Narration</title><content type='html'>&lt;span style="font-weight: 
bold;"&gt;Summary:&lt;/span&gt;&lt;span&gt;&lt;br /&gt;&lt;br /&gt;Poddar &lt;span style="font-style: italic;"&gt;et al.&lt;/span&gt; use HMMs and speech to recognize complementary keyword-gesture pairs in weather reporting.  The multi-modal system would combine speech and gesture recognition to improve recognition accuracy.&lt;br /&gt;&lt;br /&gt;HMMs work well for separating 20 isolated gestures from weather narrators, but continuous gestures drop the accuracy to between 50% and 80% for sequences.  Co-occurrence analysis seeks to study the meaning behind keywords occurring with gestures, the frequency of these co-occurrences, and the temporal alignment of the gestures and keywords.  A table they presented shows that certain types of gestures (contour, area, and pointing) are more heavily associated with certain keywords ("here", "direction", "location").  The accuracy of recognizing the gestures can improve when both video (gesture) and audio (keywords) are used.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;This paper does not necessarily contribute much toward a solution, but it was written ten years ago and did show some nice results that combining speech and gesture can improve recognition.  
Since the authors did not use a speech recognition system, the errors with that system would also produce interesting results that differ from the given accuracies.&lt;br /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-1346781579713340320?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/1346781579713340320/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=1346781579713340320' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/1346781579713340320'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/1346781579713340320'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2008/04/toward-natural-gesturespeech-hci-case.html' title='Toward Natural Gesture/Speech HCI: A Case Study of Weather Narration'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-7873943278829945145</id><published>2008-04-17T17:06:00.006-05:00</published><updated>2008-04-23T16:56:07.501-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='vision'/><category scheme='http://www.blogger.com/atom/ns#' term='gesture'/><category scheme='http://www.blogger.com/atom/ns#' term='bayesian networks'/><title type='text'>Discourse Topic and Gestural Form</title><content type='html'>&lt;span style="font-weight: bold;"&gt;Summary:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Eisenstein 
&lt;span style="font-style: italic;"&gt;et al.&lt;/span&gt; applied an unsupervised learning technique and a Bayesian network model to study the correlation between gestures and presentation topics.&lt;br /&gt;&lt;br /&gt;Their system looks at "interest points" within a video image, where each interest point is said to have come from a mixture model.  Interest points from a similar model are clustered together to create a codebook.  A hidden variable determines whether the observation gesture codeword is from a topic-specific or a speaker-specific distribution.  The authors use a Bayesian model to learn what distribution each gesture belongs to, based off Gaussians of feature vectors.&lt;br /&gt;&lt;br /&gt;The system was tested with fifteen users giving 33 presentations picked from five topics.  The experiments show that with correct labels, the topic-specific gestures account for 12% of the gestures, whereas corrupting these labels drops the average to 3%.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;This paper is a good start to a longer study on how to incorporate topic-specific gestures into recognition systems.  Finding these gestures can help computers understand what topics might be presented, as well as what speakers are presenting a topic or if a speaker is veering off-topic.  
The system can then be used for speech training, presentation classification, or assistance (Clippy).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-7873943278829945145?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/7873943278829945145/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=7873943278829945145' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/7873943278829945145'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/7873943278829945145'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2008/04/discourse-topic-and-gestural-form.html' title='Discourse Topic and Gestural Form'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-1374407887871868119</id><published>2008-04-14T15:52:00.004-05:00</published><updated>2008-04-17T17:02:34.010-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='adaboost'/><category scheme='http://www.blogger.com/atom/ns#' term='grasp'/><category scheme='http://www.blogger.com/atom/ns#' term='glove'/><title type='text'>Feature selection for grasp recognition from optical markers</title><content type='html'>&lt;span style="font-weight: bold;"&gt;Summary:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Chang et al. 
reduced the number of markers needed on a vision-based hand grasp system from 30 to 5 while retaining around a 90% recognition rate.&lt;br /&gt;&lt;br /&gt;Six different grasps are used for classification: cylindrical, spherical, lumbrical, two-finger pinch, tripod, and lateral tripod.   The posterior probabilities for a class &lt;span style="font-style: italic;"&gt;y&lt;span style="font-size:78%;"&gt;k&lt;/span&gt;&lt;/span&gt; are modeled with a softmax function: exp(weights * obs) for that class, divided by the sum of the exp(weights * obs) values over all classes.&lt;br /&gt;&lt;br /&gt;The weight values are determined by maximum conditional likelihood estimation from the training set of observations and classes (&lt;span style="font-style: italic;"&gt;X, Y&lt;/span&gt;).  Gradient descent is used to optimize the log likelihood with respect to the weights.  Input features are found using a "sequential wrapper algorithm" that examines one feature at a time with respect to a target class.&lt;br /&gt;&lt;br /&gt;Grasp data measured 38 objects being grasped with a full set of 30 markers.  An "optimal", small set of markers was chosen by forward and backward selection.&lt;br /&gt;&lt;br /&gt;The results indicate that the small marker set of 5 markers has a 92-97% "accuracy retention" rate.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion:&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;Reducing the number of sensors using the forward and backward selection is nice, but simply having a few more sensors increases the accuracy to the actual plateau point.  
From 10 on there is almost no change in accuracy, but between 5 and 10 sensors the accuracy can jump 5%, or 1/20, which is a huge percentage when taking into account user frustration.&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-1374407887871868119?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/1374407887871868119/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=1374407887871868119' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/1374407887871868119'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/1374407887871868119'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2008/04/feature-selection-for-grasp-recognition.html' title='Feature selection for grasp recognition from optical markers'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-8110036810309405512</id><published>2008-04-14T13:53:00.005-05:00</published><updated>2008-04-14T14:27:25.051-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='neural networks'/><category scheme='http://www.blogger.com/atom/ns#' term='gesture'/><category scheme='http://www.blogger.com/atom/ns#' term='speech'/><category scheme='http://www.blogger.com/atom/ns#' term='hand tracking'/><category scheme='http://www.blogger.com/atom/ns#' 
term='glove'/><category scheme='http://www.blogger.com/atom/ns#' term='hand gesture'/><title type='text'>Glove-TalkII--A Neural-Network Interface which Maps Gestures to Parallel Formant Speech Synthesizer Controls</title><content type='html'>&lt;span style="font-weight: bold;"&gt;Summary:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Fels and Hinton created Glove-TalkII, a system designed to synthesize voice using complicated glove and foot controls.&lt;br /&gt;&lt;br /&gt;The artificial vocal tract (AVT) is controlled using a CyberGlove, ContactGlove, Polhemus sensor, and foot pedal.  The ContactGlove controls 9 stop consonants, such as CH, T, and NG.  The foot pedal controls the volume of the speech.  Hand position corresponds to a vowel sound.  Hand postures map to nonstop consonant phonemes.&lt;br /&gt;&lt;br /&gt;The neural networks used include a vowel/consonant network to determine if the sensors are reading a vowel or consonant, and then separate vowel and consonant networks to distinguish between the phonemes.&lt;br /&gt;&lt;br /&gt;A single user had to undergo 100 hours of training to be able to use the system.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion:&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;Impractical.  I'm shocked that they had someone train with the system for 100 hours, and the fact that it takes a person that long to learn it should indicate that this is a poor way to synthesize voice.  The person's final voice is even described as "intelligible and somewhat natural-sounding", which is not a good compliment.&lt;br /&gt;&lt;br /&gt;Requiring a person to walk around with a one-handed keyboard and type their words is a better solution.  
The keyboard wouldn't even have a foot pedal.&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-8110036810309405512?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/8110036810309405512/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=8110036810309405512' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/8110036810309405512'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/8110036810309405512'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2008/04/glove-talkii-neural-network-interface.html' title='Glove-TalkII--A Neural-Network Interface which Maps Gestures to Parallel Formant Speech Synthesizer Controls'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-8968284448029145691</id><published>2008-04-09T14:15:00.006-05:00</published><updated>2008-04-09T14:33:16.653-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='robotics'/><category scheme='http://www.blogger.com/atom/ns#' term='RFID'/><title type='text'>RFID-enabled Target Tracking and Following with a Mobile Robot Using Direction Finding Antennas</title><content type='html'>&lt;span style="font-weight: bold;"&gt;Summary:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Kim &lt;span style="font-style: italic;"&gt;et al.&lt;/span&gt; 
use dual-direction antennas to find the direction of arrival for RF signals transmitted from an RFID tag.  The two spiral antennas are perpendicular to each other and their signal strengths are different depending on the angle to the RFID tag.&lt;br /&gt;&lt;br /&gt;Obstacles in front of the antennas/tag increase the error in determining the direction.  The object can still be tracked, though.  In experimental results it worked pretty well.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;It works pretty well for its domain.  Probably less accurate for incredibly small movements (e.g., finger bends).  Seems like every now and then it goes crazy off-track (Figure 8).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-8968284448029145691?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/8968284448029145691/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=8968284448029145691' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/8968284448029145691'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/8968284448029145691'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2008/04/rfid-enabled-target-tracking-and.html' title='RFID-enabled Target Tracking and Following with a Mobile Robot Using Direction Finding Antennas'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-7910833979943963199</id><published>2008-04-04T11:47:00.003-05:00</published><updated>2008-04-04T12:23:32.170-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='gesture'/><category scheme='http://www.blogger.com/atom/ns#' term='music'/><title type='text'>Gesture Recognition Using an Acceleration Sensor and Its Application to Musical Performance Control</title><content type='html'>&lt;span style="font-weight: bold;"&gt;Summary:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Sawada and Hashimoto use accelerometer data to extract features of gestures and create a music tempo system.&lt;br /&gt;&lt;br /&gt;The feature extraction is basic:  projections onto certain planes, such as &lt;span style="font-style: italic;"&gt;xy&lt;/span&gt; or &lt;span style="font-style: italic;"&gt;yz&lt;/span&gt;, and the bounding box of the acceleration values.  Changes of acceleration are measured using a fuzzy partition of radial angles.&lt;br /&gt;&lt;br /&gt;The authors recognize or classify gestures using squared error.  The actual gesture recognition is trivial.&lt;br /&gt;&lt;br /&gt;The music tempo program is the more interesting part of the paper, since the system has to predict in real time where a beat has been hit.  Systems already existed where a marker is placed on a baton, but the visual processing of these systems usually has a delay of 0.1s (with 1997 computational power).  In the authors' system, gestures for up, down, and diagonal swings are used to indicate tempo.  Other gestures can map to other elements of conducting.&lt;br /&gt;&lt;br /&gt;A score is stored in the computer and the user conducts to the score.  Often the computer and human are slightly off, and the two try to balance to each other.  
A simple function for balancing the tempo is given.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The system they use isn't a true conducting system since it relies on defined (and trained) gestures, but the ideas behind the tempo system are good and the simple execution and equations are appreciated.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-7910833979943963199?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/7910833979943963199/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=7910833979943963199' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/7910833979943963199'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/7910833979943963199'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2008/04/gesture-recognition-using-acceleration.html' title='Gesture Recognition Using an Acceleration Sensor and Its Application to Musical Performance Control'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-973930034550000699</id><published>2008-04-02T15:32:00.005-05:00</published><updated>2008-04-04T11:47:25.366-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='vision'/><category scheme='http://www.blogger.com/atom/ns#' term='RFID'/><title type='text'>Activity 
Recognition using Visual Tracking and RFID</title><content type='html'>&lt;span style="font-weight: bold;"&gt;Summary:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Krahnstoever &lt;span style="font-style: italic;"&gt;et al.&lt;/span&gt; use RFID tags in conjunction with computer vision tracking to interpret what is happening within a scene.&lt;br /&gt;&lt;br /&gt;A person model tracks a human's movement through their head and hands.  The head is a 3D cartesian coordinate location, and each hand is described in spherical coordinates (r, phi, theta) with respect to the head.  The models for where the head will be, &lt;span style="font-style: italic;"&gt;p&lt;/span&gt;(X^&lt;span style="font-style: italic;"&gt;t&lt;/span&gt;_&lt;span style="font-style: italic;"&gt;h&lt;/span&gt;, X^&lt;span style="font-style: italic;"&gt;t-1&lt;/span&gt;_&lt;span style="font-style: italic;"&gt;h&lt;/span&gt;), and the hands &lt;span style="font-style: italic;"&gt;p&lt;/span&gt;(X^&lt;span style="font-style: italic;"&gt;t&lt;/span&gt;_&lt;span style="font-style: italic;"&gt;q&lt;/span&gt;, X^&lt;span style="font-style: italic;"&gt;t-1&lt;/span&gt;_&lt;span style="font-style: italic;"&gt;h&lt;/span&gt;) had to be learned.  The priors &lt;span style="font-style: italic;"&gt;p&lt;/span&gt;(X_&lt;span style="font-style: italic;"&gt;q&lt;/span&gt; | X_&lt;span style="font-style: italic;"&gt;h&lt;/span&gt;) also had to be learned.  Both hands and head are segmented using skin color.&lt;br /&gt;&lt;br /&gt;Each pixel within a given image frame can belong to either the background or the foreground (body part).  The likelihood for an image given the observations is taken to be the Improved Iterative Scaling (IIS) of the image section and bounding box of a body part, summed over the parts and sections.  I have no idea how IIS works.&lt;br /&gt;&lt;br /&gt;RFID tags provide movement and orientation information in 3D spaces.  
The amount of charge the RFID tag receives depends on its angle to the wave source, where a perpendicular angle receives no energy and a parallel angle is the greatest.  The tag then outputs the tag's ID, orientation, and field strength to the signal.&lt;br /&gt;&lt;br /&gt;The authors use the RFID information along with the hand and head positions to interpret what is happening in a scene.  Agents are somehow used to do this.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The RFID information looks like it helps recognize what is happening within a scene, but I would have liked to have seen a comparison between a pure vision system and a system with the RFID.  This could be a bit difficult, but it might help the strength of the paper.&lt;br /&gt;&lt;br /&gt;I also would have liked an actual description of the activity agent system.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-973930034550000699?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/973930034550000699/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=973930034550000699' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/973930034550000699'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/973930034550000699'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2008/04/activity-recognition-using-visual.html' title='Activity Recognition using Visual Tracking and RFID'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-3305302858179745393</id><published>2008-03-31T13:23:00.003-05:00</published><updated>2008-04-23T16:59:22.773-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='HMM'/><category scheme='http://www.blogger.com/atom/ns#' term='gesture'/><title type='text'>Enabling fast and effortless customisation in accelerometer based gesture interaction</title><content type='html'>&lt;span style="font-weight: bold;"&gt;Summary:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Mäntyjärvi &lt;span style="font-style: italic;"&gt;et al. &lt;/span&gt;apply discrete HMMs to accelerometer data for gesture recognition.  The authors had a previous study that indicated users prefer defining their own gestures, or they prefer intuitive gestures.&lt;br /&gt;&lt;br /&gt;The authors add noise to the gestures to increase the recognition of user-defined gestures under certain conditions.  This supposedly speeds up the training process since fewer gestures need to be "drawn".  Adding Gaussian noise versus uniform noise might improve the recognition.  But not really.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion:&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;This paper changed course in the middle and moved from customization to noise addition.  The gesture set they tested on was super easy and can be done with Rubine's recognizer.  
I'd like to see some data that users created and the differences between the user-defined gesture and the DVD gestures.&lt;span style="font-weight: bold;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-3305302858179745393?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/3305302858179745393/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=3305302858179745393' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/3305302858179745393'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/3305302858179745393'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2008/03/enabling-fast-and-effortless.html' title='Enabling fast and effortless customisation in accelerometer based gesture interaction'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-5713079949568680995</id><published>2008-03-27T12:49:00.002-05:00</published><updated>2008-03-27T13:01:04.956-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='wii remote'/><category scheme='http://www.blogger.com/atom/ns#' term='HMM'/><category scheme='http://www.blogger.com/atom/ns#' term='gesture'/><title type='text'>Gesture Recognition with a Wii Controller</title><content type='html'>&lt;span style="font-weight: bold;"&gt;Summary:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Schlomer 
&lt;span style="font-style: italic;"&gt;et al.&lt;/span&gt; showed that the Wii controller is pretty good at recognizing tennis gestures.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion:&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;a href="http://www.amazon.com/Nintendo-RVL-RSPE-USA-Wii-Sports/dp/B000TK0G78/ref=sr_1_1?ie=UTF8&amp;amp;s=videogames&amp;amp;qid=1206640715&amp;amp;sr=1-1"&gt;Here's a good evaluation study.&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-5713079949568680995?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/5713079949568680995/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=5713079949568680995' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/5713079949568680995'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/5713079949568680995'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2008/03/gesture-recognition-with-wii-controller.html' title='Gesture Recognition with a Wii Controller'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-5991796310059840576</id><published>2008-03-27T12:35:00.003-05:00</published><updated>2008-03-27T12:49:53.254-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='user interfaces'/><category scheme='http://www.blogger.com/atom/ns#' 
term='haptics'/><category scheme='http://www.blogger.com/atom/ns#' term='user study'/><title type='text'>SPIDAR G&amp;G: A Two-Handed Haptic Interface for Bimanual VR Interaction</title><content type='html'>&lt;span style="font-weight: bold;"&gt;Summary:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Murayama &lt;span style="font-style: italic;"&gt;et al.&lt;/span&gt; presented a two-handed computer control device that allowed the manipulation of on-screen objects.  The system, called SPIDAR G&amp;amp;G, consisted of two balls suspended in two horseshoe apparatus with six strings each.  The user moved these balls with six degrees of freedom, which translated to a cursor or object on the computer.  The strings also had pull and resisted movement through small motors.  Each ball included a pressure button that detected grip.&lt;br /&gt;&lt;br /&gt;The authors evaluated the system using a pointer and a target object.  The users had to manipulate the pointer and object with both balls in order to accomplish a goal.  Three people tested their system and found that the use of two SPIDAR balls, as opposed to one and a keyboard, allowed the users to manipulate the objects faster.  Also, haptic feedback helped.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Although the system sounds interesting, I have a lot of issues with the evaluation.  The authors used only three people familiar with VR interfaces, which is quite low.  A greater concern is that the system was only tested against another form of itself.  SPIDAR G&amp;amp;G was only compared against SPIDAR G + keyboard, when really SPIDAR G&amp;amp;G should have been compared to a mouse and keyboard interface, or a joystick and mouse, or two joysticks, or a roller ball, or any number of more common peripherals.  As it stands, I have no basis to say that the suspended ball manipulation method is any better than traditional interfaces.  
The only definite conclusion is that two balls are better than one, and having the balls touch back is beneficial.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-5991796310059840576?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/5991796310059840576/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=5991796310059840576' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/5991796310059840576'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/5991796310059840576'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2008/03/spidar-g-two-handed-haptic-interface.html' title='SPIDAR G&amp;G: A Two-Handed Haptic Interface for Bimanual VR Interaction'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-7513651492683413587</id><published>2008-03-23T14:19:00.003-05:00</published><updated>2008-03-23T14:36:27.441-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='sign language'/><category scheme='http://www.blogger.com/atom/ns#' term='neural networks'/><category scheme='http://www.blogger.com/atom/ns#' term='vision'/><category scheme='http://www.blogger.com/atom/ns#' term='hand gesture'/><title type='text'>Taiwan sign language (TSL) recognition based on 3D data and neural networks</title><content type='html'>&lt;span style="font-weight: 
bold;"&gt;Summary:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Lee and Tsai implemented a vision-based hand gesture recognition system to classify 20 TSL hand signs.  The system used hand features based on visual distances, and 8 reflective markers were placed on the hand to assist in these readings.  The features are then sent into a back-propagation neural network (BPNN) that had 15 features as inputs and the 20 gesture probabilities as outputs.&lt;br /&gt;&lt;br /&gt;The features used include the distances between a wrist point and the finger tips, and the distances between each finger pair (spread).&lt;br /&gt;&lt;br /&gt;10 students tested the system and produced 2788 gestures, of which half went to training and the other half to testing.  The authors tested on neural networks with 2 hidden layers varying in size from 25 x 25 to 250 x 250.  The best results were with the BPNN with 250 x 250 hidden nodes, with a testing accuracy of 94.65%.  Two gestures were heavily confused because the only difference was the length of the finger shown (i.e., the fingers were bent in one gesture).&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;This was a pretty decent use of neural nets, and I'm glad that they gave the results at different hidden layers and the recognition rates for each gesture.  In fact, now that I think about it, I'm just glad they gave results.  These are definitely the best results I've seen and quite promising: one of their main issues was a good feature to distinguish between bent fingers and non-bent fingers.&lt;br /&gt;&lt;br /&gt;The differences between 150x150 and 250x250 are statistically insignificant, but they might be more significant when more gestures are added.  
I especially like that there is little discrepancy between training and testing sets, which hopefully indicates that their approach works for the general user.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-7513651492683413587?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/7513651492683413587/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=7513651492683413587' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/7513651492683413587'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/7513651492683413587'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2008/03/taiwan-sign-language-tsl-recognition.html' title='Taiwan sign language (TSL) recognition based on 3D data and neural networks'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-734929163844044899</id><published>2008-03-18T13:40:00.004-05:00</published><updated>2008-03-18T13:54:04.931-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='wii remote'/><category scheme='http://www.blogger.com/atom/ns#' term='HMM'/><category scheme='http://www.blogger.com/atom/ns#' term='gesture'/><title type='text'>Wiizards: 3D Gesture Recognition for Game Play Input</title><content type='html'>&lt;span style="font-weight: bold;"&gt;Summary:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Kratz, Smith, and Lee use 
Wiimotes in a game where two wizards cast spells to damage one another.   Each spell consists of a series of gestures and modifiers, and a wizard can block a spell by performing a blocking gesture and then mimicking their opponent's casting gestures.&lt;br /&gt;&lt;br /&gt;Wii controller accelerometer data is used to  gather a 3-dimensional  gravitational reading for the three &lt;span style="font-style: italic;"&gt;x&lt;/span&gt;, &lt;span style="font-style: italic;"&gt;y&lt;/span&gt;, &lt;span style="font-style: italic;"&gt;z&lt;/span&gt; axes.   An observation vector is a collection of these data values, and Gaussians are applied to the observations to determine distribution probabilities.  Classification maximizes over the probability that a gesture sequence was performed, given the observation data.&lt;br /&gt;&lt;br /&gt;Without training, their system's HMM model with 15 states has around 50% accuracy and varies widely.  Training can boost the accuracy to around 90%, but training cannot be performed in a real-time environment.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;I'm curious as to how long it actually takes the system to train.  The axis for the training figure did not specify, and if it only takes 30 seconds to train, this is not much longer than an initial load screen (and it would only have to happen once).  If it takes 30 minutes to train, then we have a problem.&lt;br /&gt;&lt;br /&gt;Also, the number of gestures in the system would hurt this time factor.  
Even 10 seconds over 100 gestures is unacceptable.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-734929163844044899?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/734929163844044899/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=734929163844044899' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/734929163844044899'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/734929163844044899'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2008/03/wiizards-3d-gesture-recognition-for.html' title='Wiizards: 3D Gesture Recognition for Game Play Input'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-8076381359233817643</id><published>2008-03-18T13:16:00.002-05:00</published><updated>2008-03-18T13:36:12.858-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='haptics'/><category scheme='http://www.blogger.com/atom/ns#' term='sensors'/><category scheme='http://www.blogger.com/atom/ns#' term='vibration'/><title type='text'>TIKL: Development of a Wearable Vibrotactile Feedback Suit for Improved Human Motor Learning</title><content type='html'>&lt;span style="font-weight: bold;"&gt;Summary:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Lieberman and Breazeal created a system to train human motor movements.  
Their hardware uses small vibrotactile actuators built into a sensory vest and sleeve.  For each joint/sensor, if the angle of the student's joint differs from the (known) angle of the teacher's joint, then feedback is given to the joint.  Higher errors increase the vibrating 'force field' effect.&lt;br /&gt;&lt;br /&gt;To test their system, the authors had 40 subjects split into a pure visual group and a visual/haptic group.  These subjects were given the task to mimic movements of a teacher.  Overall, the error for the feedback group was much less than the error for the visual group, even after repeated trials.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Well-written paper, clear subject matter, and a damn good results and evaluation section.  I don't have much more to say than that.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-8076381359233817643?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/8076381359233817643/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=8076381359233817643' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/8076381359233817643'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/8076381359233817643'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2008/03/tikl-development-of-wearable.html' title='TIKL: Development of a Wearable Vibrotactile Feedback Suit for Improved Human Motor Learning'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-8329711775214967949</id><published>2008-03-16T16:49:00.003-05:00</published><updated>2008-03-16T17:24:04.909-05:00</updated><title type='text'>AStEINDR</title><content type='html'>&lt;span style="font-weight: bold;"&gt;S:&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;&lt;span style="font-weight: bold;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;ST-Isomap PCA LLE LSCTN KNTN CTN ATN MDS SCTN DOF WTF&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;D:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;I might come back to this later when I'm not so mad.&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;&lt;span style="font-weight: bold;"&gt;&lt;span style="font-weight: bold;"&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-8329711775214967949?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/8329711775214967949/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=8329711775214967949' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/8329711775214967949'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/8329711775214967949'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2008/03/asteindr.html' title='AStEINDR'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' 
width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-2370154685364000140</id><published>2008-03-16T16:22:00.003-05:00</published><updated>2008-03-16T16:48:58.563-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='vision'/><category scheme='http://www.blogger.com/atom/ns#' term='hand tracking'/><category scheme='http://www.blogger.com/atom/ns#' term='PCA'/><category scheme='http://www.blogger.com/atom/ns#' term='ICA'/><title type='text'>Articulated Hand Tracking by PCA-ICA Approach</title><content type='html'>&lt;span style="font-weight: bold;"&gt;Summary:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Kato &lt;span style="font-style: italic;"&gt;et al. &lt;/span&gt;used Independent Component Analysis (ICA) to find basis vectors for hand motion features.  The authors first use Principal Component Analysis (PCA) to reduce the dimensionality of their data, and then they use ICA to find a set of vectors that are statistically independent of each other (i.e., basis vectors).&lt;br /&gt;&lt;br /&gt;Data on 20 angles was collected with a glove.  
The authors then sandwiched all of the data for the 20 sensors together into one large vector; each sensor was sampled across 100 time points, and the data from all 20 sensors was merged into a 2000-dimension vector.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp3.blogger.com/_wO3RkO--l7I/R92SVNd4TfI/AAAAAAAAAAU/1RbHAIjh7Qc/s1600-h/Untitled.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://bp3.blogger.com/_wO3RkO--l7I/R92SVNd4TfI/AAAAAAAAAAU/1RbHAIjh7Qc/s320/Untitled.jpg" alt="" id="BLOGGER_PHOTO_ID_5178456039635832306" border="0" /&gt;&lt;/a&gt;ICA is used to find the basis vectors for a hand such that a linear combination of these vectors will produce a desired hand movement.  The basis vectors &lt;span style="font-weight: bold;"&gt;U&lt;/span&gt; are found through a weight matrix &lt;span style="font-weight: bold;"&gt;W&lt;/span&gt; and a sample of motion data &lt;span style="font-weight: bold;"&gt;X &lt;/span&gt;(where &lt;span style="font-weight: bold;"&gt;X&lt;/span&gt; is a matrix of hand motions).  A neural learning algorithm (in this case, gradient descent) is used to calculate the weights.  The resulting 5 basis vectors are the movement of each finger individually.&lt;br /&gt;&lt;br /&gt;The authors then deviated from their abstract and discussed actually tracking a hand using particle filtering.  A hand's current position can be estimated from its prior positions, so each basis vector can estimate where it believes the finger will be given its prior positions.  
The authors also segment a hand out of an image by doing some thresholding on an image and overlaying a hand model in the image to find the hand location.&lt;br /&gt;&lt;br /&gt;There are no results.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;There are no results.&lt;br /&gt;The basis vectors seem obvious, but I'm glad that ICA found them.&lt;br /&gt;There are no results.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-2370154685364000140?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/2370154685364000140/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=2370154685364000140' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/2370154685364000140'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/2370154685364000140'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2008/03/articulated-hand-tracking-by-pca-ica.html' title='Articulated Hand Tracking by PCA-ICA Approach'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://bp3.blogger.com/_wO3RkO--l7I/R92SVNd4TfI/AAAAAAAAAAU/1RbHAIjh7Qc/s72-c/Untitled.jpg' height='72' 
width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-7348503089825618040</id><published>2008-03-06T12:57:00.002-06:00</published><updated>2008-03-06T13:16:51.331-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='HMM'/><category scheme='http://www.blogger.com/atom/ns#' term='glove'/><category scheme='http://www.blogger.com/atom/ns#' term='hand gesture'/><title type='text'>A Hidden Markov Model Based Sensor Fusion Approach for Recognizing Continuous Human Grasping Sequences</title><content type='html'>&lt;span style="font-weight: bold;"&gt;Summary:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Bernardin et al. created a system to recognize human grasping gestures using a CyberGlove and pressure sensor data.  Basic grasps are distinguished in 14 different ways by Kamakura's grasping primitives.  These grasps include "5 power grasps, 4 intermediate grasps, 4 precision grasps, and one thumbless grasp".&lt;br /&gt;&lt;br /&gt;To recognize these grasps, an 18 sensor CyberGlove is used, along with finger tip and palm sensors.  14 different pressure sensors are sewn into a glove, which is worn under the CyberGlove.  The sensor data is passed into HMMs for recognition.  A 9-state HMM is built for each gesture using the HTK.  After each grasp, the grasped object must be released.&lt;br /&gt;&lt;br /&gt;On a total of 112 training gestures from 4 users, the user dependent models were between 77 and 92%, whereas a user independent model was in the low 90s for all 4 users.  This is most likely due to the increase in training data when all the user data is combined.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;I thought the use of HMMs in this paper was actually quite good.  The problem I have with HMMs is that they are absolutely horrible and explode when data is not properly segmented.  
In the case of grasps, it is probably less likely that somebody is going to go from one grasp to another without releasing the object they are holding.  For most general cases, the computer can assume that the lack of tactile input from the palm would indicate a grasp has ended.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-7348503089825618040?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/7348503089825618040/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=7348503089825618040' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/7348503089825618040'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/7348503089825618040'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2008/03/hidden-markov-model-based-sensor-fusion.html' title='A Hidden Markov Model Based Sensor Fusion Approach for Recognizing Continuous Human Grasping Sequences'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-4393490763489995223</id><published>2008-03-06T12:40:00.003-06:00</published><updated>2008-03-06T12:57:08.680-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='user interfaces'/><category scheme='http://www.blogger.com/atom/ns#' term='sketching'/><category scheme='http://www.blogger.com/atom/ns#' term='3D inference'/><category 
scheme='http://www.blogger.com/atom/ns#' term='user study'/><title type='text'>The 3D Tractus: A Three-Dimensional Drawing Board</title><content type='html'>&lt;span style="font-weight: bold;"&gt;Summary:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Lapides et al. designed and built a Tablet PC stand that can move vertically, allowing for a 3D drawing platform that switches the screen's view as the table is moved.  The authors state that using the 3D Tractus will allow for a "direct mapping between physical and virtual spaces."&lt;br /&gt;&lt;br /&gt;The frame of the 3D Tractus consists of aluminum bars and a table top, along with a counterweight that balances the weight of the tablet and allows the table top to slide up and down more easily.  The counterweight has to be tuned for each tablet's weight.  A height sensor is built into the frame.&lt;br /&gt;&lt;br /&gt;The drawing software for the system takes into account the height of the table when displaying a viewing angle to the user.  The system uses line width as a depth cue, with farther lines thin and closer lines thick.  An orthographic (cube) projection is used to demonstrate 3D depth, as well.  Also, nothing of the sketch is displayed above the current tablet surface.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Although the idea of having a tactile way to sketch in 3D sounds appealing, the system could be implemented much better without a tactile, movable desk.  
Instead, having a z-axis button/wheel/control in the software would alleviate the issues with custom counterweights, a height constraint, awkward hand/arm positioning, and lack of mobility.&lt;br /&gt;&lt;br /&gt;Also, the system is rather constraining for large sketches, since the user can move without limit within the tablet's plane, but vertical travel is limited to something like 40 centimeters.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-4393490763489995223?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/4393490763489995223/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=4393490763489995223' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/4393490763489995223'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/4393490763489995223'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2008/03/3d-tractus-three-dimensional-drawing.html' title='The 3D Tractus: A Three-Dimensional Drawing Board'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-2572888293127278395</id><published>2008-03-05T11:41:00.003-06:00</published><updated>2008-03-05T12:26:54.631-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='sign language'/><category scheme='http://www.blogger.com/atom/ns#' term='decision tree'/><category 
scheme='http://www.blogger.com/atom/ns#' term='gesture'/><category scheme='http://www.blogger.com/atom/ns#' term='adaboost'/><category scheme='http://www.blogger.com/atom/ns#' term='hand gesture'/><title type='text'>Temporal Classification: Extending the Classification Paradigm to Multivariate Time Series</title><content type='html'>&lt;span style="font-weight: bold;"&gt;Summary:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Kadous created TClass, which uses metafeatures of gesture data to form a syntactic representation of a gesture.&lt;br /&gt;&lt;br /&gt;The system was tested on two data sets: Auslan and Nintendo data.  Auslan is Australian Sign Language, whose signs differ from those of ASL, although the overall ideas of hand shape, location, and movement are present in the language.  The Nintendo data comes from a Powerglove test set composed of 95 basic hand movements.&lt;br /&gt;&lt;br /&gt;Tests were conducted using a Powerglove (P5) and a Flock of Birds.  On Kadous's tests, the initial error rate for TClass was extremely high compared to the best error rates (for both sets of data).   Using AdaBoost, the system's accuracy became more tolerable, but it was never as good as a fine, hand-picked set of features.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion&lt;/span&gt;:&lt;br /&gt;&lt;br /&gt;I have mixed feelings about this system.  I like the addition of metafeatures that are readable with TClass, but I also don't quite know what to make of the system's poor accuracy in some cases.  The accuracy results presentation was confusing, since the author gave horrible results first, then semi-poor results afterward when using AdaBoost, but the horrible results also included a TClass with AdaBoost (AB) field, so what the hell is going on?  
Also, the explanation that the Nintendo dataset is "hard" does not fly; if a "naive" algorithm beats you, you cannot say that poor results are because of the test set.&lt;br /&gt;&lt;br /&gt;Nevertheless, I think that research in this area of trying to find both accurate and understandable results is worthwhile.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-2572888293127278395?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/2572888293127278395/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=2572888293127278395' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/2572888293127278395'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/2572888293127278395'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2008/03/temporal-classification-extending.html' title='Temporal Classification: Extending the Classification Paradigm to Multivariate Time Series'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-5335241445144238438</id><published>2008-03-04T13:14:00.005-06:00</published><updated>2008-03-05T11:41:30.547-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='motion tracking'/><category scheme='http://www.blogger.com/atom/ns#' term='hand tracking'/><category scheme='http://www.blogger.com/atom/ns#' term='sensors'/><title type='text'>Using 
Ultrasonic Hand Tracking to Augment Motion Analysis Based Recognition of Manipulative Gestures</title><content type='html'>&lt;span style="font-weight: bold;"&gt;Summary:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Ogris et al. use ultrasonics to track hand motion in a 3D environment.  The ultrasonics, when combined with other data from motion sensors, can greatly improve recognition rates.&lt;br /&gt;&lt;br /&gt;Ultrasonics emit a sound beacon, which is then reflected back to sensors.  Because ultrasonics use sound waves, the beacon is susceptible to reflection, occlusion, and temporal issues.  Reflection is where the wave reflects off a surface at an odd angle, occlusions are blocked signals, and the temporal issues involve the time it takes for the sound to bounce back and forth.  These issues limit ultrasonics to controlled, indoor scenarios.  Placing the sensors on hands or other moving appendages is also a problem with ultrasonics, since all of the above problems can occur with fast moving parts.&lt;br /&gt;&lt;br /&gt;To test the ultrasonics, the authors used a bicycle repair setup where the performer had 3 ultrasonic sensors and 9 gyroscopes on their arms, legs, and body.  The performer then made various bicycle repair gestures, such as screwing/unscrewing, pumping, and wheel spinning.&lt;br /&gt;&lt;br /&gt;Using a k-nearest-neighbor (kNN) approach to classification, the accuracy of the system jumps when using ultrasonics as opposed to just using motion sensors.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The use of ultrasonics probably does help the system.  I am still not convinced that the ultrasonics themselves are useful, though.  
More sensors can almost always improve accuracy of a system, but since they "overlapped" the gyroscopes with ultrasonics at points, the accuracy jump must be from the sensor type and not quantity.&lt;br /&gt;&lt;br /&gt;My main issue is that ultrasonics seem to have an incredibly low sampling rate, or at least the sensors the authors were using were quite poor.  Furthermore, noise problems (via bouncing signals, background sonics, or fast-moving sensors) seem to heavily detract from the usefulness of ultrasonics.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-5335241445144238438?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/5335241445144238438/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=5335241445144238438' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/5335241445144238438'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/5335241445144238438'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2008/03/using-ultrasonic-hand-tracking-to.html' title='Using Ultrasonic Hand Tracking to Augment Motion Analysis Based Recognition of Manipulative Gestures'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-8243862956841546382</id><published>2008-02-27T17:15:00.005-06:00</published><updated>2008-02-28T14:03:12.186-06:00</updated><category 
scheme='http://www.blogger.com/atom/ns#' term='sign language'/><category scheme='http://www.blogger.com/atom/ns#' term='HMM'/><category scheme='http://www.blogger.com/atom/ns#' term='vision'/><category scheme='http://www.blogger.com/atom/ns#' term='gesture'/><category scheme='http://www.blogger.com/atom/ns#' term='glove'/><category scheme='http://www.blogger.com/atom/ns#' term='user study'/><category scheme='http://www.blogger.com/atom/ns#' term='hand gesture'/><title type='text'>American Sign Language Recognition in Game Development for Deaf Children</title><content type='html'>&lt;span style="font-weight: bold;"&gt;Summary:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Brashear et al. use GT2k to create an American Sign Language game for deaf children.  The system, called CopyCat, teaches language skills to children by having them sign various sentences to interact with a game environment.&lt;br /&gt;&lt;br /&gt;A Wizard of Oz study was used to gather data and design their interface.  A desk, mouse, and chair were used in the study, along with a pink glove.  The students pushed a button and then signed a gesture, and the data was collected using the glove and an IEEE 1394 video camera.  The users were 9- to 11-year-olds.&lt;br /&gt;&lt;br /&gt;The hand is pulled from the video image by its bright color.  The image pixel data is converted to an HSV color space histogram, which is used to binarize the data and find the hand.  Accelerometers are also used to track hand movement in &lt;span style="font-style: italic;"&gt;x&lt;/span&gt;, &lt;span style="font-style: italic;"&gt;y&lt;/span&gt;, and &lt;span style="font-style: italic;"&gt;z&lt;/span&gt; positions.&lt;br /&gt;&lt;br /&gt;The data from five children was analyzed for user-dependent and -independent models.  User-dependence was validated in a 90/10 (training/testing) split, with word accuracy in the low 90s and sentence accuracy around 70%.  
The standard deviation for the sentence accuracy is very high, at approximately 12%.&lt;br /&gt;&lt;br /&gt;User-independent models were lower with an average word accuracy of 86.6% and a sentence accuracy of 50.64%.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;I like the authors' user study with the Wizard of Oz to collect real-world data from children.  The system's performance (in essence, GT2k's performance) was very low with sentences, which indicates that segmentation is the largest issue with the toolkit.  I'm also worried about the 90/10 split for the user-dependent models.  That is a huge ratio of training to testing data, and it might be skewing the results to show higher than normal accuracy.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-8243862956841546382?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/8243862956841546382/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=8243862956841546382' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/8243862956841546382'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/8243862956841546382'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2008/02/american-sign-language-recognition-in.html' title='American Sign Language Recognition in Game Development for Deaf Children'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-2680954953146253290</id><published>2008-02-27T16:30:00.002-06:00</published><updated>2008-02-28T11:22:47.740-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='segmentation'/><category scheme='http://www.blogger.com/atom/ns#' term='gesture'/><category scheme='http://www.blogger.com/atom/ns#' term='hand gesture'/><title type='text'>A Method for Recognizing a Sequence of Sign Language Words Represented in a Japanese Sign Language Sentence</title><content type='html'>&lt;span style="font-weight: bold;"&gt;Summary:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Sagawa and Takeuchi created a Japanese Sign Language recognition system that uses "rule-based matching" and segments gestures based on hand velocity and direction changes.&lt;br /&gt;&lt;br /&gt;Segmentation is driven by thresholds on changes in the direction vector.  The system must also determine which hand (or both) is being used for a gesture, and this is decided by the direction and velocity change thresholds.&lt;br /&gt;&lt;br /&gt;The system achieved 86.6% accuracy for signed words, and 58% accuracy for signed sentences.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion:&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;There's not much to discuss with this paper.  The "nugget" of research is with the use of direction and velocity changes to segment the gestures.  
I became more interested in this paper since I learned it was published a year before Sezgin's, but not by much.&lt;span style="font-weight: bold;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-2680954953146253290?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/2680954953146253290/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=2680954953146253290' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/2680954953146253290'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/2680954953146253290'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2008/02/method-for-recognizing-sequence-of-sign.html' title='A Method for Recognizing a Sequence of Sign Language Words Represented in a Japanese Sign Language Sentence'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-322463547814483574</id><published>2008-02-25T15:58:00.003-06:00</published><updated>2008-02-26T13:29:42.328-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='HMM'/><category scheme='http://www.blogger.com/atom/ns#' term='gesture'/><title type='text'>Georgia Tech Gesture Toolkit: Supporting Experiments in Gesture Recognition</title><content type='html'>&lt;span style="font-weight: bold;"&gt;Summary:&lt;span style="font-weight: bold;"&gt;&lt;span 
style="font-weight: bold;"&gt;&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;Researchers from Georgia Tech have created a gesture toolkit called GT2k.  The purpose behind GT2k is to allow researchers to focus on system development instead of recognition.  The toolkit works in conjunction with the Hidden Markov Model Toolkit (HTK) to provide HMM tools to a developer.  GT2k usage can be divided into four categories: preparation, training, validation, and recognition.&lt;br /&gt;&lt;br /&gt;Preparation involves the developer setting up an initial gesture model, semantic gesture descriptions, and gesture examples.  Each model is a separate HMM, and GT2k allows either automatic model generation for novices, or user-generated for experts.  Grammars for the model are created in a rule-based fashion and allow for the definition of complex gestures based on simpler ones.  Data collection is done with whatever sensing devices are needed.&lt;br /&gt;&lt;br /&gt;Training the GT2k models can be done in two ways: cross-validation and leave-one-out.  Cross-validation involves separating the data into 2/3 for training and 1/3 for testing.  Leave-one-out involves training on the entire set minus one data element, and repeating this process for each element in the set.  The results for cross-validation are computed in a batch, whereas the overall statistics for leave-one-out are calculated by each model's performance.&lt;br /&gt;&lt;br /&gt;Validation checks to see that the training provided a model that is "accurate enough" for recognition.  The process uses substitution, insertion, and deletion errors to calculate this accuracy.&lt;br /&gt;&lt;br /&gt;Recognition occurs once valid data is received by a trained model.  
GT2k abstracts this process away from the user of the system and calculates the likelihood of each model using the Viterbi algorithm.&lt;br /&gt;&lt;br /&gt;The remainder of the paper listed possible applications for GT2k including: a gesture panel for controlling a car stereo, a blink recognition system, a mobile sign language system, and a "smart" workshop that understands what actions a user is performing.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;GT2k seems like a good system that can help beginning researchers more easily add HMMs into their gesture systems without worrying about implementation issues.  Yet, the applications mentioned for GT2k are rather weak in both their concept and their results.  HMMs are really only "needed" for one of the applications (sign language), whereas the other applications can be done more easily with simple techniques or moving the sensors away from a hand gesture.&lt;br /&gt;&lt;br /&gt;This was a decent paper in writing style, presentation, and (possibly) contribution, but I'm curious to know what researchers have used GT2k and the systems they have created with it.&lt;br /&gt;&lt;br /&gt;As a side note, I also am unclear as to why leave-one-out training is good, since with a large data set training the system could take a hell of a long time.&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;&lt;span style="font-weight: bold;"&gt;&lt;span style="font-weight: bold;"&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-322463547814483574?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/322463547814483574/comments/default' title='Post Comments'/><link rel='replies' type='text/html' 
href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=322463547814483574' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/322463547814483574'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/322463547814483574'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2008/02/georgia-tech-gesture-toolkit-supporting.html' title='Georgia Tech Gesture Toolkit: Supporting Experiments in Gesture Recognition'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-807897709802292523</id><published>2008-02-25T15:20:00.005-06:00</published><updated>2008-02-25T15:58:41.174-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='virtual environments'/><category scheme='http://www.blogger.com/atom/ns#' term='vision'/><category scheme='http://www.blogger.com/atom/ns#' term='augmented reality'/><category scheme='http://www.blogger.com/atom/ns#' term='hand gesture'/><title type='text'>Computer Vision-Based Gesture Recognition For An Augmented Reality Interface</title><content type='html'>&lt;span style="font-weight: bold;"&gt;Summary:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Storring &lt;span style="font-style: italic;"&gt;et al. &lt;/span&gt;from Aalborg University created an augmented reality system to create a "less obtrusive and more intuitive" interface.&lt;br /&gt;&lt;br /&gt;The gestures used in the system are mapped to the hand signs for 0-6, i.e. no fist, index finger, index and middle, etc.  This gesture set can be recognized in a 2D plane with a camera.  
In order for these gestures to work, the hand needs to be segmented from the image.  The authors use normalized RGB values, called chromaticities, to minimize the variance of the color intensity.  The distributions for the background and skin chromaticities are found and are modeled as 2D Gaussians.  Hand regions are assumed to contain between a minimum and a maximum number of pixels.&lt;br /&gt;&lt;br /&gt;Gestures are found by counting the number of fingers.  A polar transformation counts the number of spikes (fingers) currently shown on the hand.  Click gestures can be found by comparing the bounding box width of the hand between the regular index finger gesture and a "thumb click" addition.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;For a system that is supposed to be less obtrusive and more intuitive than current interfaces, virtual reality with unintuitive gestures does not seem like a good solution.  Using "finger numbers" is a poor choice, and having a gigantic head-mounted display with cameras is probably less comfortable than looking at a computer screen.  
Furthermore, if the authors are focusing on using head equipment, why not just use gloves to increase the gesture possibilities?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-807897709802292523?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/807897709802292523/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=807897709802292523' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/807897709802292523'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/807897709802292523'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2008/02/computer-vision-based-gesture.html' title='Computer Vision-Based Gesture Recognition For An Augmented Reality Interface'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-7927904930069765922</id><published>2008-02-21T13:30:00.004-06:00</published><updated>2008-02-25T15:20:18.525-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='sign language'/><category scheme='http://www.blogger.com/atom/ns#' term='dynamic time warping'/><category scheme='http://www.blogger.com/atom/ns#' term='vision'/><category scheme='http://www.blogger.com/atom/ns#' term='hand gesture'/><title type='text'>3D Visual Detections of Correct NGT Sign Production</title><content type='html'>&lt;span style="font-weight: 
bold;"&gt;Summary:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Lichtenauer &lt;span style="font-style: italic;"&gt;et al.&lt;/span&gt; created an interactive Dutch sign language system that would help train children to use the correct gesture.  Their system has various requirements including: working under mixed lighting, being user independent, having immediate response, adapting to skill level, and being invariant to valid signs.&lt;br /&gt;&lt;br /&gt;The authors' system uses two cameras to digitally track a person's head and hands, and a touch screen is placed in front of the user for software interactivity.  The skin color of the person is first determined by finding the face, which is done by having a system's operator press a pixel inside of the face and a pixel around the outside of the head.  These pixels then provide a way to train the skin color model of the system, which is a Gaussian perpendicular in RGB space.  The face and hands are separated into a Left and Right RGB distribution; the authors feel that a light source will typically be coming from one direction, such as an open window.  Hands are detected through their number of skin pixels, and the motion of a hand starts the tracking.&lt;br /&gt;&lt;br /&gt;The system uses fifty 2D and 3D properties (features) related to hand location and movement.  These properties are assumed to be independent, and base classifiers for each feature are computed and summed together to get a total classification value.  These base classifiers use Dynamic Time Warping (DTW) to find the correspondence between two feature signals over time.  These classifiers are trained with the "best" 50% of the training set for each feature.  A sign is classified as correct if the average classifier probability for a class is above a threshold.&lt;br /&gt;&lt;br /&gt;The results from the authors mention that they achieve "95% true positives" of the data. 
&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;In class, we have already discussed the issue of having a 95% true positive rate, since the system is set up so that each symbol is known and the user is supposed to gesture the correct sign.  Always returning true will produce 100% accuracy.&lt;br /&gt;&lt;br /&gt;I think the larger issue is that the classifier itself needs to be tested independent of the system.  Theoretically, a separate classifier can be fine tuned for each gesture so that it can correctly recognize a single gesture 100% of the time.  The issues involved with using a generic classifier will then be avoided.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-7927904930069765922?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/7927904930069765922/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=7927904930069765922' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/7927904930069765922'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/7927904930069765922'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2008/02/3d-visual-detections-of-correct-ngt.html' title='3D Visual Detections of Correct NGT Sign Production'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-1580012042468562940</id><published>2008-02-20T15:11:00.004-06:00</published><updated>2008-02-21T13:30:24.940-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='user interfaces'/><category scheme='http://www.blogger.com/atom/ns#' term='vision'/><category scheme='http://www.blogger.com/atom/ns#' term='hand gesture'/><title type='text'>Television Control by Hand Gesture</title><content type='html'>&lt;span style="font-weight: bold;"&gt;Summary:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Freeman and Weissman devised a way to control a TV with hand gestures using computer vision.  In their system, the user's hand acts as a mouse.  The user moves their open hand in front of the camera, palm facing toward the television, and the computer detects their hand and maps it to an on-screen mouse.  When the user holds their hand over a control for a brief time period, the control is executed.  Closing their hand or moving it out of the computer's vision deactivates the mouse.&lt;br /&gt;&lt;br /&gt;The hand movement is detected by checking the angle difference between two vectors of pixels, where the pixels correspond to the pixels in an image frame and its offset.  The dx and dy information is calculated for the image gradient, and this provides an orientation that can be handled in different lighting scenarios.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;This paper was quaint.  The actual algorithms used were rather simple, but the concept of controlling a TV via hand waving intrigued me.  My main concern is that this application would train people watching a TV to not make any sudden movements so that the on-screen menu would not appear.  
Also, it forces people to walk through a living room slowly so that the TV does not catch their hand in any rapid movements.  Some better gestures would benefit this system, such as twisting motions for channel or volume control.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-1580012042468562940?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/1580012042468562940/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=1580012042468562940' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/1580012042468562940'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/1580012042468562940'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2008/02/television-control-by-hand-gesture.html' title='Television Control by Hand Gesture'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-1927843593737631232</id><published>2008-02-20T13:33:00.003-06:00</published><updated>2008-02-20T15:11:52.984-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='gesture'/><category scheme='http://www.blogger.com/atom/ns#' term='survey'/><category scheme='http://www.blogger.com/atom/ns#' term='posture'/><category scheme='http://www.blogger.com/atom/ns#' term='hand gesture'/><title type='text'>A Survey of Hand Posture and Gesture Recognition Techniques and Technology</title><content 
type='html'>&lt;span style="font-weight: bold;"&gt;Summary:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;This paper by LaViola presented a summary of key gesture recognition techniques.   Hand posture and gesture recognition was divided up into several categories: feature extraction, statistics, models, and learning approaches.  Some approaches, such as template matching, are more suited for postures, whereas HMMs are used solely for gestures.  Feature extraction is used for both, but the feature set can be computationally heavy for high-dimensional spaces.&lt;br /&gt;&lt;br /&gt;Possible applications for gestures and postures include sign language, presentation assistance, 3D modeling, and virtual environments.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion:&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;This paper is a good summary of current techniques and their strengths and weaknesses.  There's not much to summarize here, since summarizing an 80-page survey is rather dull and pointless, but I will be referring back to this paper for any future work.&lt;span style="font-weight: bold;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-1927843593737631232?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/1927843593737631232/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=1927843593737631232' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/1927843593737631232'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/1927843593737631232'/><link rel='alternate' type='text/html' 
href='http://awolin.blogspot.com/2008/02/survey-of-hand-posture-and-gesture.html' title='A Survey of Hand Posture and Gesture Recognition Techniques and Technology'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-1348085835791191059</id><published>2008-02-18T17:17:00.002-06:00</published><updated>2008-02-20T13:33:16.293-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='user interfaces'/><category scheme='http://www.blogger.com/atom/ns#' term='glove'/><category scheme='http://www.blogger.com/atom/ns#' term='hand gesture'/><title type='text'>Real-time Locomotion Control by Sensing Gloves</title><content type='html'>&lt;span style="font-weight: bold;"&gt;Summary:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Komura and Lam propose using P5 gloves to control character motion.  The authors feel that using "walking fingers" can provide a more tangible interface for controlling motion than traditional joystick or keyboard techniques.&lt;br /&gt;&lt;br /&gt;The authors use a P5 glove for their gesture capture, and the user first calibrates the fingers by moving them in time with a given walking animation displayed on a computer screen.  This calibration is done with a simple function that compares the cycle of the user's fingers with the cycle of the animation.&lt;br /&gt;&lt;br /&gt;After calibration, the user's fingers should be in sync with the walking motions.  For animating quadrupeds, there might need to be a phase shift between the back and front legs.&lt;br /&gt;&lt;br /&gt;To test their system, the authors used a CyberGlove and had users play mock games with characters jumping and navigating a maze.  
Their results showed that navigating with the glove is potentially easier in terms of the number of collisions in a maze, and the glove and keyboard controls allow maze navigation in approximately the same time.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;There's not much to say about this paper.  The results that they gave were odd, since User 2 completed the maze with a keyboard in 18 seconds but had 22 collisions, and with the glove in 31 seconds with 3 collisions.  I'm not sure what to make of that data...&lt;br /&gt;&lt;br /&gt;Other than that, the research aspect of this paper basically took a finger sine and mapped it to an animation's sine.  It might make navigating in certain games easier, but only if you need to control the speed of the character with better precision.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-1348085835791191059?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/1348085835791191059/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=1348085835791191059' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/1348085835791191059'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/1348085835791191059'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2008/02/real-time-locomotion-control-by-sensing.html' title='Real-time Locomotion Control by Sensing Gloves'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-48359739676035417</id><published>2008-02-13T16:28:00.003-06:00</published><updated>2008-02-13T16:49:06.422-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='virtual environments'/><category scheme='http://www.blogger.com/atom/ns#' term='gesture'/><category scheme='http://www.blogger.com/atom/ns#' term='haptics'/><category scheme='http://www.blogger.com/atom/ns#' term='hand gesture'/><title type='text'>Shape Your Imagination: Iconic Gestural-Based Interaction</title><content type='html'>&lt;span style="font-weight: bold;"&gt;Summary:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Marsh and Watt performed a user study to determine how people represent different types of objects using only hand gestures.  Gestures can be either substitutive (where the gestures act as if the object is being interacted with) or virtual (which describe the object in a virtual world).&lt;br /&gt;&lt;br /&gt;The authors had 12 subjects of varying academic degree and major make gestures for the primitives &lt;span style="font-style: italic;"&gt;circle, triangle, square, cube, cylinder, sphere, &lt;/span&gt;and&lt;span style="font-style: italic;"&gt; pyramid.&lt;/span&gt;  The users also gestured the complex and compound shapes for &lt;span style="font-style: italic;"&gt;football, chair, French baguette, table, vase, car, house, &lt;/span&gt;and &lt;span style="font-style: italic;"&gt;table-lamp.  &lt;/span&gt;The users were told to describe the shapes with non-verbal hand gestures.&lt;br /&gt;&lt;br /&gt;Overall, users used virtual hand depictions (75%) over substitutive (17.9%), with some objects having both gestures (7.1%).  3D shapes were always expressed with two hands, whereas primitives had some one-handed gestures  (27.8%), like &lt;span style="font-style: italic;"&gt;circle.  
&lt;/span&gt;Some objects were too hard for certain users to gesture, such as &lt;span style="font-style: italic;"&gt;chair&lt;/span&gt; (4) and &lt;span style="font-style: italic;"&gt;French baguette &lt;/span&gt;(1).&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The user study was interesting in some respects, such as seeing how the majority of people describe objects by their virtual shapes, but overall I was disappointed by the paper.  Images showing the various stages of depiction would have really helped, as well as actual answers from the questionnaire.&lt;br /&gt;&lt;br /&gt;I was confused as to whether the authors were looking for only hand gestures or allowed full body movement, since the authors mention that they asked the users for hand gestures but they did not seem to care that many users walked around the room.  That's a pretty large detail that they glossed over.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-48359739676035417?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/48359739676035417/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=48359739676035417' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/48359739676035417'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/48359739676035417'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2008/02/shape-your-imagination-iconic-gestural.html' title='Shape Your Imagination: Iconic Gestural-Based Interaction'/><author><name>Grandmaster 
Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-7602704832163042234</id><published>2008-02-13T15:06:00.004-06:00</published><updated>2008-02-13T15:37:58.601-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='neural networks'/><category scheme='http://www.blogger.com/atom/ns#' term='gesture'/><category scheme='http://www.blogger.com/atom/ns#' term='haptics'/><category scheme='http://www.blogger.com/atom/ns#' term='glove'/><category scheme='http://www.blogger.com/atom/ns#' term='hand gesture'/><title type='text'>A Dynamic Gesture Recognition System for the Korean Sign Language (KSL)</title><content type='html'>&lt;span style="font-weight: bold;"&gt;Summary:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Kim, Jang, and Bien use fuzzy min-max neural networks to recognize a small set of 25 basic Korean Sign Language gestures. The authors use two data gloves, each with 10 flex sensors, 3 location (&lt;span style="font-style: italic;"&gt;x, y, z&lt;/span&gt;) sensors, and 3 orientation (&lt;span style="font-style: italic;"&gt;pitch, yaw, roll&lt;/span&gt;) sensors.&lt;br /&gt;&lt;br /&gt;Kim et al. 
find that the 25 gestures they use contain 10 different direction types, shown below:&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp2.blogger.com/_wO3RkO--l7I/R7NdRdAE_0I/AAAAAAAAAAM/viV1sHa1uDc/s1600-h/directions.bmp"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://bp2.blogger.com/_wO3RkO--l7I/R7NdRdAE_0I/AAAAAAAAAAM/viV1sHa1uDc/s320/directions.bmp" alt="" id="BLOGGER_PHOTO_ID_5166575751948205890" border="0" /&gt;&lt;/a&gt;The authors also discovered that the data often has deviations within 4 inches of other data, so the &lt;span style="font-style: italic;"&gt;x&lt;/span&gt; and &lt;span style="font-style: italic;"&gt;y&lt;/span&gt; coordinates are split into 8 separate regions from -16 to 16 inches, with 4-inch ticks.  The change in &lt;span style="font-style: italic;"&gt;x, y&lt;/span&gt; direction (CD) is recorded for each time step simply as + and - symbols, and this data is recorded for four steps.  CD change templates are then made for the 10 directions, D&lt;span style="font-size:78%;"&gt;1&lt;/span&gt; ... D&lt;span style="font-size:78%;"&gt;10&lt;span style="font-size:100%;"&gt;.&lt;br /&gt;&lt;br /&gt;The 25 gestures contain 14 different hand gestures based on finger flex position.  This flex value is sent to a fuzzy min-max neural network (FMMN) that separates the flex angles within a 10-dimensional "hyper box".&lt;br /&gt;&lt;br /&gt;To classify a full gesture, the change of direction is first taken and compared against the templates, and then the flex angles are run through the FMMN.  
If the total (accuracy/probability) value is above a threshold, the gesture is classified.&lt;br /&gt;&lt;br /&gt;The authors achieve approximately 85% accuracy.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Although this paper had some odd sections and interesting choices, such as making the time step 1/15th of a second and having gestures over 4/15ths of a second, the overall idea is quaint.  I appreciate that the algorithm separates the data into two categories--direction change and flex angle--and separates the two components to hierarchically choose gestures. &lt;br /&gt;&lt;br /&gt;I still do not like the use of neural networks, but if they work I am willing to forgive.  My annoyance is also alleviated by the fact that the authors provide thresholds and numerical values for some equations within the network.&lt;br /&gt;&lt;br /&gt;I'm very curious why they chose those 10 directions (from the figure).  D&lt;span style="font-size:78%;"&gt;1&lt;/span&gt; and D&lt;span style="font-size:78%;"&gt;8&lt;/span&gt; could be confused if the user is sloppy, and D&lt;span style="font-size:78%;"&gt;4&lt;/span&gt; and D&lt;span style="font-size:78%;"&gt;7&lt;/span&gt; can be confused with their unidirectional counterparts if the user performs their gestures slower than 1/4 of a second.  
Which is, of course, absurd.&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-7602704832163042234?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/7602704832163042234/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=7602704832163042234' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/7602704832163042234'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/7602704832163042234'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2008/02/dynamic-gesture-recognition-system-for.html' title='A Dynamic Gesture Recognition System for the Korean Sign Language (KSL)'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://bp2.blogger.com/_wO3RkO--l7I/R7NdRdAE_0I/AAAAAAAAAAM/viV1sHa1uDc/s72-c/directions.bmp' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-6180261717404947849</id><published>2008-02-11T17:00:00.000-06:00</published><updated>2008-02-12T13:41:07.537-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='POMDP'/><title type='text'>A Survey of POMDP Applications</title><content type='html'>&lt;span style="font-weight: bold;"&gt;Summary:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Cassandra's survey summarizes some uses for partially observable Markov decision processes (POMDPs).  
MDPs are useful in artificial intelligence and planning applications.  The overall structure of these problems involves states and transitions between the states, with costs associated with the transitions and states.  The goal of a robot/problem is to find an optimal solution (policy) to a problem in the least number of transitions.&lt;br /&gt;&lt;br /&gt;The POMDP model consists of:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;States&lt;/li&gt;&lt;li&gt;Actions&lt;/li&gt;&lt;li&gt;Observations&lt;/li&gt;&lt;li&gt;A state transition function&lt;/li&gt;&lt;li&gt;An observation function&lt;/li&gt;&lt;li&gt;An immediate reward function&lt;/li&gt;&lt;/ul&gt;Cassandra's paper focuses on examples of using POMDPs, but he describes them in more detail here:  http://www.pomdp.org/pomdp/index.shtml.  Basically, they are MDPs in which you cannot observe the entire state.&lt;br /&gt;&lt;br /&gt;Some example applications include:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Machine maintenance - parts of the machine are modeled as states, and the goal is to minimize the repair costs or maximize the up-time on the machine.&lt;/li&gt;&lt;li&gt;Autonomous robots - robots need to navigate or accomplish a goal with a set of actions, and the world is not always observable.&lt;/li&gt;&lt;li&gt;Machine vision - determining where to focus the higher resolution (i.e., fovea) of the computer image, such as on the hands and heads of people.&lt;/li&gt;&lt;/ul&gt;POMDPs have a number of limitations.  One limitation is that the states need to be discrete.  Although continuous states can be discretized, some domains can have trouble with this step.  The main issue with POMDPs is their computational cost.  
POMDPs become intractable rather quickly since their state spaces are exponential.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion:&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;This paper had little to nothing to do with what we've been currently discussing in class.  Although POMDPs are interesting from a theoretical standpoint, their intractability is a huge factor for avoiding them in any practical domain.  I've been trying to think of how to even apply them to gesture recognition, and one idea I came up with included modeling hand positions as states for a single gesture, but then it just becomes an HMM with a reward function, and I'm not sure how beneficial a reward function is when taking the computation costs into account. &lt;span style="font-weight: bold;"&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-6180261717404947849?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/6180261717404947849/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=6180261717404947849' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/6180261717404947849'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/6180261717404947849'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2008/02/survey-of-pomdp-applications.html' title='A Survey of POMDP Applications'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-2412409746687650662</id><published>2008-02-11T16:24:00.000-06:00</published><updated>2008-02-12T13:03:00.488-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='HMM'/><category scheme='http://www.blogger.com/atom/ns#' term='gesture'/><category scheme='http://www.blogger.com/atom/ns#' term='haptics'/><category scheme='http://www.blogger.com/atom/ns#' term='hand gesture'/><title type='text'>Simultaneous Gesture Segmentation and Recognition based on Forward Spotting Accumulative HMMs*</title><content type='html'>&lt;span style="font-weight: bold;"&gt;Summary:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Song and Kim's paper proposes a way to use a sliding window for HMM gesture recognition.  The window of 3 slides across observation sequences O, and a probability estimate for a gesture is determined to be the average of the partially observable probabilities at each timestep in the window.  The algorithm also performs "forward spotting", which has something to do with the difference between the maximum probability for a gesture we find and the probability of a "non-gesture" at the same timestep.   The non-gesture is a wait class that consists of an intermediate, junk state.  As long as the "best" gesture probability is greater than the non-gesture probability by some threshold, then the gesture is classified accordingly.&lt;br /&gt;&lt;br /&gt;The authors also use accumulative HMMs, which basically take the power set of continuous segmentations within a window and find the combination that produces the highest probability for a gesture.&lt;br /&gt;&lt;br /&gt;The set of gestures that the authors classify consists of 8 simple arm position gestures (e.g., arms out, left arm out, etc.).  
They report recognition rates between 91% and 95%, depending on their choice of thresholds.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion:&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;The system might work fine, but I really cannot tell because their test set is so simple.  The 8 gestures they present are easily separable, and template matching algorithms can distinguish between them with ease.  I also feel that their system is intractable as you start adding more gestures or gestures that vary widely in time length--adding more gestures adds an overhead to the probability calculations, and varying the length would likely cause the window to be reconfigured to be larger, which would explode the power set step.&lt;span style="font-weight: bold;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-2412409746687650662?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/2412409746687650662/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=2412409746687650662' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/2412409746687650662'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/2412409746687650662'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2008/02/simultaneous-gesture-segmentation-and.html' title='Simultaneous Gesture Segmentation and Recognition based on Forward Spotting Accumulative HMMs*'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-173381145280271193</id><published>2008-02-07T13:33:00.000-06:00</published><updated>2008-02-10T19:21:55.545-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='segmentation'/><category scheme='http://www.blogger.com/atom/ns#' term='haptics'/><category scheme='http://www.blogger.com/atom/ns#' term='SVD'/><category scheme='http://www.blogger.com/atom/ns#' term='PCA'/><title type='text'>A similarity measure for motion stream segmentation and recognition</title><content type='html'>&lt;div&gt;Chuanjun, L. and B. Prabhakaran (2005). A similarity measure for motion stream segmentation and recognition. &lt;em&gt;Proceedings of the 6th international workshop on Multimedia data mining: mining integrated media and complex data&lt;/em&gt;. Chicago, Illinois, ACM.&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;&lt;div&gt;&lt;strong&gt;Summary:&lt;/strong&gt; &lt;/div&gt;&lt;br /&gt;Li and Prabhakaran propose a way to "segment" streams of motion data by using singular value decomposition (SVD).  SVD is similar to principal component analysis (PCA), and the technique finds the underlying geometric structure of a matrix (i.e., its eigenvectors and values).  By using the singular values of matrices storing motion data, the matrices can be compared in similarity by measuring the angular differences (dot products) of these vectors. &lt;br /&gt;&lt;br /&gt;The authors store motion data in a matrix consisting of columns of features and rows of timesteps.  The first 6 eigenvectors are used when comparing matrix similarity; this value was empirically determined.  
The segmentation part of the paper involves separating this stream of data after every &lt;span style="font-style: italic;"&gt;l&lt;/span&gt; timesteps, and then comparing the similarity of the segmented matrix to stored eigenvectors and values for a known motion.&lt;br /&gt;&lt;br /&gt;To test their system, the authors merged individual motions together into a "stream" of data and inserted noise in between motions.  The authors noted that the number of eigenvectors needed to distinguish between matrices (originally, &lt;span style="font-style: italic;"&gt;k = &lt;/span&gt;6) varied depending on the data collection method.  Their paper reported recognition rates in the mid-90% range, but these results depend on how similar motions are to one another.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Although the paper has little to do with segmentation, the actual algorithm for comparing motion data seems interesting and appears to achieve relatively accurate results.  I would like to know the actual motions that users performed, since I have no idea what motions are required in Taiqi and Indian dances.  
They also did not mention the number of people involved in the data capturing, and I assume this number to be close to 1 since they needed a user to wear a motion suit.&lt;br /&gt;&lt;div&gt;&lt;/div&gt;&lt;div&gt; &lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-173381145280271193?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/173381145280271193/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=173381145280271193' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/173381145280271193'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/173381145280271193'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2008/02/similarity-measure-for-motion-stream.html' title='A similarity measure for motion stream segmentation and recognition'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-5971947806532148786</id><published>2008-02-06T16:00:00.000-06:00</published><updated>2008-02-07T13:38:22.140-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='haptics'/><category scheme='http://www.blogger.com/atom/ns#' term='music'/><category scheme='http://www.blogger.com/atom/ns#' term='glove'/><category scheme='http://www.blogger.com/atom/ns#' term='hand gesture'/><title type='text'>Cyber Composer: Hand Gesture-Driven Intelligent Music Composition 
and Generation</title><content type='html'>&lt;span style="font-size:+0;"&gt;Ip, H. H. S., K. C. K. Law, et al. (2005). Cyber Composer: Hand Gesture-Driven Intelligent Music Composition and Generation. &lt;em&gt;Proceedings of the 11th International Multimedia Modelling Conference&lt;/em&gt;, 2005.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="FONT-WEIGHT: bold"&gt;Summary:&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;Ip et al. created Cyber Composer, a music generation program controlled via hand gestures. The authors' motivation is to inspire both musicians and casual listeners to experience music in a new way.&lt;br /&gt;&lt;br /&gt;The authors split music composition into three parts: melody, rhythm, and tone. The melody is the "main" part of the music and mainly includes the treble parts, such as the singer. The rhythm keeps the beat of the music and is played by the drums and bass. Tonal accompaniment involves creating harmony across all parts.&lt;br /&gt;&lt;br /&gt;In order to keep the tone (harmony) of the music interesting and flowing, the authors create a small "chord affinity" matrix that describes certain chord lead/following strengths. During music composition, chords with high affinity are chosen automatically. Melody notes are also chosen automatically to create musical "tension".&lt;br /&gt;&lt;br /&gt;The system was implemented using two 22-sensor CyberGloves and two Polhemus positioning receivers. MIDI was used to produce the musical notes.&lt;br /&gt;&lt;br /&gt;The seven gestures used in the system control rhythm, pitch, pitch-shifting, dynamics, volume, dual-instrument mode, and cadence. Rhythm is controlled by the flexing of the right wrist. Pitch is controlled by right-hand height, and it is reset at the beginning of each bar. The user can also "shift" the pitch by performing a similar gesture. Note dynamics and volume are controlled by the right-hand finger flex, with fully flexed fingers forcing forte notes. 
Dual-instrument mode allows a harmony melody or unison melody to be played along with the main instrument; this mode is activated using the left hand. To end the piece, the left-hand fingers are closed.&lt;br /&gt;&lt;br /&gt;There are no results.&lt;br /&gt;&lt;br /&gt;&lt;span style="FONT-WEIGHT: bold"&gt;&lt;br /&gt;Discussion:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;This paper piqued my interest. Some of the gestures they defined were intuitive, such as opening and closing of the fingers for volume and moving the hand up and down for notes. Other gestures just seem awkward, such as the ambiguous dual-instrument mode and constantly flapping your wrist (ouch?) to drive the melody.&lt;br /&gt;&lt;br /&gt;I'm familiar with building music composition programs (including "smart" programs that use music theory to assist composition), and I think this program was trying to market itself as something that it could never become. A music tool has to be either robust enough for experts to use, pared down enough to be simple for novices, or simply fun for the casual listener. In the expert category I would place Finale, and on the casual end I would place music games such as Guitar Hero. Novice programs are harder to come by, and the tool I worked on was ImproVisor--a system that used intelligent databases to analyze input notes and determine if the notes "sounded good".&lt;br /&gt;&lt;br /&gt;Cyber Composer is trying to do everything at once and failing. The lack of &lt;span style="FONT-STYLE: italic"&gt;any&lt;/span&gt; results, even an offhand comment from a casual user, tells me that the system is rather convoluted to use or poor for composition. 
The hand gestures cannot really control notes precisely enough for experts to use the system, novices will not understand the theory behind why their hand waving sounds good or bad, and casual musicians will probably have no idea what is going on.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="FONT-WEIGHT: bold"&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-5971947806532148786?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/5971947806532148786/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=5971947806532148786' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/5971947806532148786'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/5971947806532148786'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2008/02/cyber-composer-hand-gesture-driven.html' title='Cyber Composer: Hand Gesture-Driven Intelligent Music Composition and Generation'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-5913817382407049045</id><published>2008-02-03T21:34:00.000-06:00</published><updated>2008-02-04T15:47:57.014-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='segmentation'/><category scheme='http://www.blogger.com/atom/ns#' term='haptics'/><category scheme='http://www.blogger.com/atom/ns#' term='hand gesture'/><title type='text'>Hand 
Tension as a Gesture Segmentation Cue</title><content type='html'>Philip A. Harling and Alistair D. N. Edwards. Hand tension as a gesture segmentation cue. Progress in Gestural Interaction: Proceedings of Gesture Workshop '96, pages 75--87, Springer, Berlin et al., 1997&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Summary:&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;span&gt;Harling and Edwards describe a way to segment hand gestures based on hand tension.  The basic idea is that as a user dynamically moves between static postures &lt;span style="font-style: italic;"&gt;a&lt;/span&gt; and &lt;span style="font-style: italic;"&gt;b&lt;/span&gt;, their hand will reach a "relaxed", low-tension minimum position &lt;span style="font-style: italic;"&gt;c&lt;/span&gt; that is less tense than either &lt;span style="font-style: italic;"&gt;a&lt;/span&gt; or &lt;span style="font-style: italic;"&gt;b&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;Smaller details:&lt;br /&gt;&lt;/span&gt;&lt;ol&gt;&lt;li&gt;&lt;span&gt;To find the tension for each finger, the authors use Hooke's Law and treat a finger as if it were a spring&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span&gt;The total hand tension is the sum of the finger tensions&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span&gt;They used a Mattel PowerGlove&lt;br /&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt;&lt;span&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The idea behind the paper was actually quite good for distinguishing between static postures.  I have a feeling that the hand tension will not work well for moving gestures since there would be small segmentations within the gesture.&lt;br /&gt;&lt;br /&gt;I'm disappointed at their lack of results.  I can forgive other papers that were user studies, but I cannot forgive a paper that does not report easily obtainable results when the authors spent 8 pages discussing a topic that I summarized in one sentence.  
Segmentation is rather simple to gather data for, and a published paper should at least attempt to find an accuracy number.&lt;br /&gt;&lt;br /&gt;On a technical note, I'm curious as to how hand tension is affected by the type of glove worn.  I have a feeling that my "hand relaxed" position is going to be different for a P5 glove than it will be for a CyberGlove or even a CyberGlove with a Flock of Birds attached.  All the extra weight will most likely force my hand into resting upon the equipment for support.&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;&lt;/span&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-5913817382407049045?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/5913817382407049045/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=5913817382407049045' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/5913817382407049045'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/5913817382407049045'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2008/02/hand-tension-as-gesture-segmentation.html' title='Hand Tension as a Gesture Segmentation Cue'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-2903174340566948133</id><published>2008-02-03T20:16:00.000-06:00</published><updated>2008-02-03T21:33:20.260-06:00</updated><category 
scheme='http://www.blogger.com/atom/ns#' term='decision tree'/><category scheme='http://www.blogger.com/atom/ns#' term='haptics'/><category scheme='http://www.blogger.com/atom/ns#' term='glove'/><category scheme='http://www.blogger.com/atom/ns#' term='hand gesture'/><title type='text'>A Multi-Class Pattern Recognition System for Practical Finger Spelling Translation</title><content type='html'>Hernandez-Rebollar, J. L., R. W. Lindeman, et al. (2002). A multi-class pattern recognition system for practical finger spelling translation. Multimodal Interfaces, 2002. Proceedings. Fourth IEEE International Conference on.&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;&lt;br /&gt;Summary:&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;Hernandez-Rebollar et al. have a two-part paper: they present a glove (AcceleGlove) and they have a test platform for the glove that uses decision trees.&lt;br /&gt;&lt;br /&gt;The AcceleGlove contains 5 accelerometer sensors placed at the middle joint of fingers.  Each accelerometer has x and y angles that can be measured, giving a total of 10 sensor readings every 10 milliseconds.   The raw data matrix consisting of just the x and y values is transformed into separate Xg, Yg, and yi values.  Xg (x global) measures the finger orientation, roll, and spread.  Yg measures how bent the fingers of the hand are.  The third component classifies the hand into three values: closed, horizontal, and vertical.  This third component is actually only the index finger's y-component (only in the ASL letters 'F' and 'D' is the index finger not accurate for this measurement).&lt;br /&gt;&lt;br /&gt;To classify a posture/gesture, the decision tree first breaks up the letters into vertical, horizontal, and closed.    
Then the gestures are classified further as rolled, flat, or pinky up, and these subgroups are then split into the actual letters.&lt;br /&gt;&lt;br /&gt;They report a 100% recognition rate for 21 of the gestures, with 78% being the worst per-gesture accuracy.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;I like this paper for 2 main reasons:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;There are no HMMs&lt;/li&gt;&lt;li&gt;They did not use a CyberGlove&lt;br /&gt;&lt;/li&gt;&lt;/ol&gt;The paper's results and decision tree theory are a bit lacking, but I think that the ideas behind the paper were good and refreshingly different from the ochoish other papers we've read.&lt;br /&gt;&lt;br /&gt;I'm curious as to how well the glove they designed can work with gestures instead of postures.  The glove polls each accelerometer sequentially, which could be a problem with very quick gestures.  This issue is probably not too important, but it might introduce slightly more error than a batch poll.&lt;br /&gt;&lt;br /&gt;I'm also curious as to how they designed their decision tree.  
The intuition behind the partitioning is not made clear, except for the main partition of vertical/horizontal/closed.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-2903174340566948133?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/2903174340566948133/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=2903174340566948133' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/2903174340566948133'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/2903174340566948133'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2008/02/summary-hernandez-rebollar-et-al.html' title='A Multi-Class Pattern Recognition System for Practical Finger Spelling Translation'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-52970480842586101</id><published>2008-01-30T14:56:00.000-06:00</published><updated>2008-02-04T15:49:51.742-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='virtual environments'/><category scheme='http://www.blogger.com/atom/ns#' term='segmentation'/><category scheme='http://www.blogger.com/atom/ns#' term='HMM'/><category scheme='http://www.blogger.com/atom/ns#' term='haptics'/><category scheme='http://www.blogger.com/atom/ns#' term='hand gesture'/><title type='text'>A Dynamic Gesture Interface for Virtual Environments Based on Hidden Markov 
Models</title><content type='html'>Qing, C., A. El-Sawah, et al. (2005). A dynamic gesture interface for virtual environments based on hidden Markov models. Haptic Audio Visual Environments and their Applications, 2005. IEEE International Workshop on.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Summary:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The authors of this paper used the HMM &amp;amp; CyberGlove dynamic duo in conjunction with standard deviations.&lt;br /&gt;&lt;br /&gt;Qing et al. claim that using the standard deviation of finger positions allows them to fix the "gesture spotting" (segmentation/fragmentation) issue with a continuous data stream.  The glove data is sampled at 10Hz, and then the standard deviations of each sensor are calculated.  The standard deviations also help transform a series of vectors (observations) into a single vector.  They then take this vector and perform VQ on it to get a discrete value.&lt;br /&gt;&lt;br /&gt;The three gestures they used to test their system controlled the rotation of a cube.  The gestures included 1 finger bending, 2 fingers bending, and a twisting motion with your thumb.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Sigh, no results.  I have no idea how the system actually solves the gesture spotting problem because they are just trading the "is this observation the start of a gesture?" problem for a "does this standard deviation vector look like it might be the start of a gesture?" problem.&lt;br /&gt;&lt;br /&gt;Also, with only three gestures, standard deviations might work for distinguishing between gestures.  
But continually moving one's hand indicates that the standard deviation for every finger will be fluctuating wildly.&lt;br /&gt;&lt;br /&gt;I now know more about the bone structure of a hand.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-52970480842586101?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/52970480842586101/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=52970480842586101' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/52970480842586101'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/52970480842586101'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2008/01/dynamic-gesture-interface-for-virtual.html' title='A Dynamic Gesture Interface for Virtual Environments Based on Hidden Markov Models'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-970816867223576818</id><published>2008-01-30T13:46:00.000-06:00</published><updated>2008-01-30T14:48:32.990-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='online learning'/><category scheme='http://www.blogger.com/atom/ns#' term='HMM'/><category scheme='http://www.blogger.com/atom/ns#' term='haptics'/><category scheme='http://www.blogger.com/atom/ns#' term='robotics'/><category scheme='http://www.blogger.com/atom/ns#' term='hand gesture'/><title type='text'>Online, 
Interactive Learning of Gestures for Human/Robot Interfaces</title><content type='html'>Lee, C. and Y. Xu (1996). Online, interactive learning of gestures for human/robot interfaces. Robotics and Automation, 1996. Proceedings., 1996 IEEE International Conference on.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Summary:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Lee and Xu created an HMM system that allows for online updating of gestures.  If the system is certain about a gesture (i.e., above or below a threshold), then the system performs the action associated with the gesture.  Otherwise, the system asks the user for the gesture's confirmation.  The HMM is then updated using the Baum-Welch algorithm (an EM algorithm for estimating an HMM's transition and observation probabilities from data).&lt;br /&gt;&lt;br /&gt;Their system uses a CyberGlove to capture the hand gestures.  The gestures are first captured from the glove, then resampled and smoothed before performing vector quantization.  Gestures are segmented by having the user stop or remain still for a short time.&lt;br /&gt;&lt;br /&gt;Gestures are evaluated on a logarithmic scale of the sums of the probability of the model / probability of the observation sequence.  If the gesture is below a threshold it is considered correct, and if it is above the threshold it is considered suspect or incorrect.&lt;br /&gt;&lt;br /&gt;The domain for testing the system was 14 sign language letters that were distinct enough to be used with VQ.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion:&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;I'm very confused by the graphs they give.  They mention that if their "V" values corresponding to the correct/incorrect threshold are below -2, then the gesture is correct.  Yet their graphs only show 2 examples ever even bordering on the -2 mark; all other values were way below -2.  
Does this mean that their system was always confident?&lt;br /&gt;&lt;br /&gt;I also have an issue with telling the computer what the correct gesture is.  Although I've done almost the exact same thing in recent work, hand-gesturing systems are geared toward non-keyboard-monitor use.  For instance, to control a robot, I'd probably be looking at the robot and not a monitor.  In the field I would not want to turn around, find my keyboard, punch up the correct gesture, and continue.&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-970816867223576818?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/970816867223576818/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=970816867223576818' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/970816867223576818'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/970816867223576818'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2008/01/online-interactive-learning-of-gestures.html' title='Online, Interactive Learning of Gestures for Human/Robot Interfaces'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-3557747231237230933</id><published>2008-01-28T16:31:00.001-06:00</published><updated>2008-01-30T13:46:20.650-06:00</updated><category 
scheme='http://www.blogger.com/atom/ns#' term='HMM'/><category scheme='http://www.blogger.com/atom/ns#' term='haptics'/><category scheme='http://www.blogger.com/atom/ns#' term='robotics'/><category scheme='http://www.blogger.com/atom/ns#' term='hand gesture'/><title type='text'>An Architecture for Gesture-Based Control of Mobile Robots</title><content type='html'>Iba, S., J. M. V. Weghe, et al. (1999). An architecture for gesture-based control of mobile robots. Intelligent Robots and Systems, 1999.  IROS '99. Proceedings. 1999 IEEE/RSJ International Conference on.&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;&lt;br /&gt;&lt;br /&gt;Summary:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Iba et al.  describe a gesture-based control scheme for robots.  HMMs are used to define seven gestures: closed fist, open hand, wave left, wave right, pointing, opening, and "wait".  These gestures correspond to actions that a robot can take, such as accelerating and turning.&lt;br /&gt;&lt;br /&gt;The mobile robot that the system uses has IR sensors, sonar sensors, a camera, and a wireless transmitter.  The gesture capturing is done with a CyberGlove with 18 sensors.&lt;br /&gt;&lt;br /&gt;Gesture recognition is performed with an HMM-based recognizer.  The recognizer first preprocesses the sensor data to change the 18-dimensional sensor data into a 10-dimensional feature vector.  The derivatives of each feature are computed as well, to produce a 20-dimensional column.  Each column is then reduced to a "codeword" that maps the input to one of 32 possible codewords, or actions.  This codebook is trained offline, and at runtime the feature vectors are mapped to a codeword.&lt;br /&gt;&lt;br /&gt;The HMM takes a sequence of codewords and determines which gesture the user is performing.  It is important to note that if no suitable gesture is found, the recognizer can return "none".  
To overcome some HMM problems, the "wait state" is the first node in the model and transitions to the other 6 gestures.  If no gesture is currently seen, the wait state is the most probable.   As more observations push the gesture toward another state, the correct gesture probability is altered and the gesture spotter picks the gesture with the highest score.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion:&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;span&gt;I'd have liked to know the intuition behind using 32 codewords.  The inclusion of the wait state is also odd in combination with the "opening" state, which does not seem to be mapped to anything.  So technically the opening state is a wait+1 for either the closed or open state.  I don't have much more to say on this one.&lt;/span&gt;&lt;span style="font-weight: bold;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-3557747231237230933?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/3557747231237230933/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=3557747231237230933' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/3557747231237230933'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/3557747231237230933'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2008/01/architecture-for-gesture-based-control.html' title='An Architecture for Gesture-Based Control of Mobile Robots'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-3568725396146996957</id><published>2008-01-28T15:22:00.000-06:00</published><updated>2008-01-28T16:05:55.530-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='virtual environments'/><category scheme='http://www.blogger.com/atom/ns#' term='user interfaces'/><category scheme='http://www.blogger.com/atom/ns#' term='sketching'/><category scheme='http://www.blogger.com/atom/ns#' term='haptics'/><title type='text'>HoloSketch: A Virtual Reality Sketching / Animation Tool</title><content type='html'>&lt;div class="skiptwolines"&gt;Deering, Michael F.  &lt;span style="font-style: italic;"&gt;HoloSketch: A Virtual Reality Sketching/Animation Tool.&lt;/span&gt;  (1995)&lt;span style="font-style: italic;"&gt;&lt;/span&gt; ACM Transactions on Computer-Human Interaction.&lt;br /&gt;&lt;/div&gt;&lt;span style="font-weight: bold;"&gt;&lt;br /&gt;Summary:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Deering's 3D VR system, HoloSketch, aimed to allow the creation of three-dimensional objects in a virtual reality environment.  Users donned VR goggles with a supercool 960x680 20'' CRT monitor and interacted with the virtual world via a six-axis wand.  The head-tracking goggles allow the user to look around images hovering in front of them.&lt;br /&gt;&lt;br /&gt;HoloSketch prides itself on displaying stable images that do not "float" or "swim" as the user moves their head.  They accomplish this by having a highly accurate absolute orientation tracker in the goggles.  The use of a flat-screen CRT also helps, as do program corrections for interocular distances.&lt;br /&gt;&lt;br /&gt;A good chunk of the paper focused on user interactions, such as menu navigation.  
Deering's system uses a 3D pie (radial) menu that can be activated by holding down the right click on the wand.  The user can then navigate the menu while holding the button and "poke" the menu to activate submenus and items.   To create and draw objects, the user first selects a primitive from the menu and then places the primitive by hitting a button on the wand.  The user can then rotate, size, and position the object using a combination of wand-waving and keyboard buttons.&lt;br /&gt;&lt;br /&gt;Users can also create animations with the system.  Some animations require still shots of slightly altered objects that can be grouped temporally (like a VR flipbook).  Other animations can be added to objects or groups, such as a rotor property or blinking colors.&lt;br /&gt;&lt;br /&gt;An artist tested the system for a month and provided feedback.  Overall the artist found the tool easy to work with after a few days, although some of the features available in other applications were missing from HoloSketch.  One issue that Deering noticed was the lack of head movement when users tried to view the object; users are so used to keeping their heads still that examining an object from different angles was not intuitive.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;HoloSketch seems like an interesting application and provides a variety of ideas, some of which I believe are beneficial, while others are not.  The "poking" of menus seems intuitive, and if the system has a high absolute accuracy this should work well.  Yet, Deering mentioned how users' arms can get tired and are unstable, and supporting an arm and wrist is out of the question when you try to make an environment natural.  
Instead HoloSketch had some button that reduced the jitter somehow when activated, which seems like a hack that allows for a quick fix of a potentially serious issue with using the system.&lt;br /&gt;&lt;br /&gt;I also understand why people would not want to constantly move their head around the display.  If the display were on a round table this would be a non-issue, but constantly moving around in a chair and leaning in different directions is a strain on the user.  Furthermore, the 20" CRT is not so large a screen that the user would be able to "see" all around the object;  I would have liked to know the actual viewing angle.&lt;br /&gt;&lt;br /&gt;Overall, though, I liked the system and the paper itself was well-written and gave a good overview of the features.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-3568725396146996957?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/3568725396146996957/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=3568725396146996957' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/3568725396146996957'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/3568725396146996957'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2008/01/holosketch-virtual-reality-sketching.html' title='HoloSketch: A Virtual Reality Sketching / Animation Tool'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-7873717584306415474</id><published>2008-01-24T19:06:00.001-06:00</published><updated>2008-01-24T19:49:27.514-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='HMM'/><category scheme='http://www.blogger.com/atom/ns#' term='graphical models'/><title type='text'>An Introduction to Hidden Markov Models</title><content type='html'>&lt;span style="font-weight: bold;"&gt;Summary:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Rabiner and Juang's paper on Hidden Markov Models (HMMs) introduces the models, defines the three main problems associated with HMMs, and provides examples for utilizing HMMs.&lt;br /&gt;&lt;br /&gt;HMMs are time-dependent models that consist of observations and hidden states.  As an example, the authors discuss possible coin flip models that can have coins of varying probability (states) and transitions that probabilistically determine which coin will be flipped.  One person could continuously flip coins and record the data.  Another person is only receiving the outcomes of the flips, i.e., &lt;span style="font-style: italic;"&gt;O = O&lt;span style="font-size:78%;"&gt;1&lt;/span&gt;, ..., O&lt;span style="font-size:78%;"&gt;T&lt;/span&gt;&lt;/span&gt;.  
The person flipping is hidden to the observer.&lt;br /&gt;&lt;br /&gt;Rabiner and Juang define three main elements of HMMs as:&lt;br /&gt;&lt;br /&gt;1)  HMMs have a finite number of states, &lt;span style="font-style: italic;"&gt;N&lt;/span&gt;&lt;br /&gt;2)  A "new" state is entered at time, &lt;span style="font-style: italic;"&gt;t&lt;/span&gt;, depending on a given transition probability distribution.&lt;br /&gt;3)  Observable output is made after each transition, and this output depends on the current state.&lt;br /&gt;&lt;br /&gt;The formal notation for an HMM is:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;T&lt;/span&gt; = the time length of the observable sequences (i.e., how many observations seen)&lt;br /&gt;&lt;span style="font-style: italic;"&gt;N&lt;/span&gt; = the number of states&lt;br /&gt;&lt;span style="font-style: italic;"&gt;M&lt;/span&gt; = the number of observation symbols (if observations are discrete)&lt;br /&gt;&lt;span style="font-style: italic;"&gt;Q&lt;/span&gt; = the states {&lt;span style="font-style: italic;"&gt;q&lt;span style="font-size:78%;"&gt;1&lt;/span&gt;, q&lt;span style="font-size:78%;"&gt;2&lt;/span&gt;, ... , q&lt;span style="font-size:78%;"&gt;N&lt;/span&gt;&lt;/span&gt;}&lt;br /&gt;&lt;span style="font-style: italic;"&gt;V&lt;/span&gt; = the observations {&lt;span style="font-style: italic;"&gt;v&lt;span style="font-size:78%;"&gt;1&lt;/span&gt;, v&lt;span style="font-size:78%;"&gt;2&lt;/span&gt;, ... 
, v&lt;span style="font-size:78%;"&gt;M&lt;/span&gt;&lt;/span&gt;}&lt;br /&gt;&lt;span style="font-style: italic;"&gt;A&lt;/span&gt; = the state transition probability distribution {&lt;span style="font-style: italic;"&gt;a&lt;span style="font-size:78%;"&gt;ij&lt;/span&gt;&lt;/span&gt;}, &lt;span style="font-style: italic;"&gt;a&lt;span style="font-size:78%;"&gt;ij&lt;/span&gt; &lt;/span&gt;= &lt;span style="font-style: italic;"&gt;P(q&lt;span style="font-size:78%;"&gt;j&lt;/span&gt; &lt;/span&gt;at&lt;span style="font-style: italic;"&gt; t + 1 | q&lt;span style="font-size:78%;"&gt;i&lt;/span&gt; &lt;/span&gt;at&lt;span style="font-style: italic;"&gt; t)&lt;/span&gt;.  This is the probability that we are in &lt;span style="font-style: italic;"&gt;q&lt;span style="font-size:78%;"&gt;j&lt;/span&gt;&lt;/span&gt; given that we were in &lt;span style="font-style: italic;"&gt;q&lt;span style="font-size:78%;"&gt;i&lt;/span&gt;&lt;/span&gt; at the previous timestep.&lt;br /&gt;&lt;span style="font-style: italic;"&gt;B&lt;/span&gt; = the observation symbol probability distribution in state &lt;span style="font-style: italic;"&gt;j&lt;/span&gt;, {&lt;span style="font-style: italic;"&gt;b&lt;span style="font-size:78%;"&gt;j&lt;/span&gt;&lt;/span&gt;(&lt;span style="font-style: italic;"&gt;k&lt;/span&gt;)}, &lt;span style="font-style: italic;"&gt;b&lt;span style="font-size:78%;"&gt;j&lt;/span&gt;&lt;/span&gt;(&lt;span style="font-style: italic;"&gt;k&lt;/span&gt;) = &lt;span style="font-style: italic;"&gt;P(v&lt;span style="font-size:78%;"&gt;k&lt;/span&gt; &lt;/span&gt;at&lt;span style="font-style: italic;"&gt; t | q&lt;span style="font-size:78%;"&gt;j&lt;/span&gt; &lt;/span&gt;at&lt;span style="font-style: italic;"&gt; t)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;p&lt;span style="font-size:78%;"&gt;i&lt;/span&gt;&lt;/span&gt; = initial state distribution, &lt;span style="font-style: italic;"&gt;p&lt;span style="font-size:78%;"&gt;i&lt;/span&gt;&lt;/span&gt; = &lt;span style="font-style: italic;"&gt;P(q&lt;span style="font-size:78%;"&gt;i&lt;/span&gt; &lt;/span&gt;at&lt;span 
style="font-style: italic;"&gt; t = 1)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The three problems for HMMs are:&lt;br /&gt;&lt;br /&gt;1)  Given an observation sequence &lt;span style="font-style: italic;"&gt;O = O&lt;span style="font-size:78%;"&gt;1&lt;/span&gt;, ..., O&lt;span style="font-size:78%;"&gt;T&lt;/span&gt;&lt;/span&gt; and the model, how do we compute the probability of the observation sequence?&lt;br /&gt;2)  Given the observation sequence and the model, how do we choose a state sequence that is optimal in some meaningful sense?&lt;br /&gt;3)  How do we adjust the model parameters to maximize the probability of the observation sequence?&lt;br /&gt;&lt;br /&gt;Solutions to these problems are presented in the paper, but mathematical symbols are difficult to represent in the blog, and many of the images used are illegible.  Instead, I'll jump to the authors' discussion of uses and issues.&lt;br /&gt;&lt;br /&gt;One issue with HMMs is numerical underflow, since the forward and backward variables &lt;span style="font-style: italic;"&gt;α&lt;span style="font-size:78%;"&gt;t&lt;/span&gt;(i) &lt;/span&gt;and &lt;span style="font-style: italic;"&gt;β&lt;span style="font-size:78%;"&gt;t&lt;/span&gt;(i) &lt;/span&gt;approach zero very quickly (they are products of probabilities between 0.0 and 1.0).  Another issue is how to actually build HMMs, i.e., what are the transitions and states?&lt;br /&gt;&lt;br /&gt;HMMs are good for modeling sequential information where the current state relies only on the previous (or previous 2) states.  Such models, for example those for isolated word recognition, are easy to build and not too computationally intensive.  People usually do not insert random sounds into the middle of a word, so the probability distributions for these models are easy to build.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion:&lt;br /&gt;&lt;/span&gt;&lt;span&gt;&lt;br /&gt;Overall the HMM paper is a good overview of HMMs.  
I really don't have much to say about this paper, except that I wish I had page 14 and I wish that the figures were readable.&lt;br /&gt;&lt;br /&gt;As far as HMMs in hand gestures go, I have always shied away from using HMMs because I feel that the power you get from them is offset by huge constraints and a large overhead in implementation issues and computation time.  The class could theoretically model some types of sign gestures with HMMs, but we'll have to see what data the class collects and whether any usable probability distributions present themselves.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-7873717584306415474?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/7873717584306415474/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=7873717584306415474' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/7873717584306415474'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/7873717584306415474'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2008/01/introduction-to-hidden-markov-models.html' title='An Introduction to Hidden Markov Models'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-8907868754425181708</id><published>2008-01-23T16:53:00.000-06:00</published><updated>2008-01-23T17:19:48.236-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='neural networks'/><category scheme='http://www.blogger.com/atom/ns#' term='haptics'/><category scheme='http://www.blogger.com/atom/ns#' term='glove'/><category scheme='http://www.blogger.com/atom/ns#' term='hand gesture'/><title type='text'>American Sign Language Finger Spelling Recognition System</title><content type='html'>Allen, J., Pierre, K., and Foulds, R.  &lt;span style="font-style: italic;"&gt;American Sign Language Finger Spelling Recognition System.&lt;/span&gt;  (2003) IEEE.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Summary:&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;Allen et al. created an ASL recognition system using neural networks and an 18-sensor CyberGlove. The authors propose that a wearable glove recognition system can help translate ASL into English and assist deaf (and even blind) people by allowing them to converse with hearing people.&lt;br /&gt;&lt;br /&gt;The authors used a character set of 24 letters, omitting 'J' and 'Z' because they require arm motions.  The remaining 24 characters use only hand positions.  Data from the CyberGlove was collected and recognized in a Matlab program, and a second program called Labview would output the corresponding audio for a recognized character.&lt;br /&gt;&lt;br /&gt;The recognition system for ASLFSR is a perceptron network with an input of 18x24 (18 sensors, 24 characters) and a desired output of 24x24 (identity matrix for the recognized symbols).  
The network was trained with an "adapt" function.&lt;br /&gt;&lt;br /&gt;The system worked well for a single user and achieved accuracy of up to 90%.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The authors claim that they can achieve a better level of accuracy by training the network on data from multiple subjects, but I completely disagree.  That's like saying a hand-tailored suit fits alright, but the pin-stripe at the blue light special is better since it has been designed for the average Joe.&lt;br /&gt;&lt;br /&gt;To improve their accuracy they should improve their model.  Perceptrons are not that powerful since they clobber values, and using some different neurons (Adalines?) might improve their results.  Also, neural networks sometimes work better with more than just 2 layers, and data from 18 non-distinct inputs would probably benefit from even a 3-layer NN.  Multi-layer NNs are notoriously tricky to design "well" (i.e. 
guess and check).&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-8907868754425181708?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/8907868754425181708/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=8907868754425181708' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/8907868754425181708'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/8907868754425181708'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2008/01/american-sign-language-finger-spelling.html' title='American Sign Language Finger Spelling Recognition System'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-6296131169483939246</id><published>2008-01-23T00:39:00.000-06:00</published><updated>2008-01-23T01:01:07.838-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='virtual environments'/><category scheme='http://www.blogger.com/atom/ns#' term='gesture'/><category scheme='http://www.blogger.com/atom/ns#' term='hand gesture'/><title type='text'>Flexible Gesture Recognition for Immersive Virtual Environments</title><content type='html'>Deller, M., A. Ebert, et al. (2006). Flexible Gesture Recognition for Immersive Virtual Environments. Information Visualization, 2006. IV 2006. 
Tenth International Conference on.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Summary:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Deller et al. used hand gestures with a P5 glove to control various aspects of a desktop environment.  The glove allows users to manipulate virtual objects in three dimensions.&lt;br /&gt;&lt;br /&gt;The apparatus that the authors used is the P5 glove, which has 5 finger sensors and an infrared tracking system.  The glove was used to create hand gestures, where a gesture is a hand position held for approximately half a second.  Gestures are stored as sensor vector templates, and each new gesture is compared against the gesture library via a simple distance measurement.&lt;br /&gt;&lt;br /&gt;The authors had users test the system.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The approach to hand gestures is simple, for example in its use of a distance measure for gesture classification.  Using a more complex classifier might improve their accuracy, but with only 5 sensors the gestures might be simple and different enough that a simple solution is sufficient.&lt;br /&gt;&lt;br /&gt;I hope that presenting some results, at least in user study form, is the norm for the remaining papers we read.  I cannot really take anything from this paper since I'm not sure if anything works well.  
The methods are so simple that I can implement them quickly, but it would be nice to have a baseline to compare to.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-6296131169483939246?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/6296131169483939246/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=6296131169483939246' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/6296131169483939246'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/6296131169483939246'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2008/01/flexible-gesture-recognition-for.html' title='Flexible Gesture Recognition for Immersive Virtual Environments'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-5622292951291535724</id><published>2008-01-22T23:14:00.000-06:00</published><updated>2008-01-23T00:38:16.497-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='virtual environments'/><title type='text'>Environmental Technology: Making the Real World Virtual</title><content type='html'>Krueger, M. W. (1993). "Environmental technology: making the real world virtual." Commun. 
ACM 36(7): 36-37.&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;&lt;br /&gt;Summary:&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;Krueger's short paper described applications possible with a sensor-filled environment.  Krueger focused on having a human be the mechanism for interaction, i.e. a person's hand and body would interact with non-wearable sensory equipment.&lt;br /&gt;&lt;br /&gt;One application had a user interact with a 1000-sensor room to project images onto a screen.  Depending on a user's position, the user would be projected into a maze or control musical notes.&lt;br /&gt;Another application showed hand projections from two people miles apart via a teleconference.  The two people could interact in a shared space and discuss objects by pointing at them.&lt;br /&gt;&lt;br /&gt;A "windshield" application allowed a user to "fly" across a graphical world by manipulating their hand positions.  This application existed in Krueger's VIDEOPLACE environment, which is basically a collection of these types of virtual world creations and interactions.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Krueger's paper mentions a great number of interesting applications but does not discuss any in detail.  Since the applications mentioned are listed as references I'll have to look them up sometime.  From the paper it sounds like some of the applications are impressive, but they were also created in the 70s and 80s so they might not work well with respect to their network and graphical capabilities.  
I'm also interested to see what he has done since then.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-5622292951291535724?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/5622292951291535724/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=5622292951291535724' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/5622292951291535724'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/5622292951291535724'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2008/01/environmental-technology-making-real.html' title='Environmental Technology: Making the Real World Virtual'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-1609138447912906736</id><published>2007-12-05T14:49:00.000-06:00</published><updated>2007-12-10T00:47:47.452-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='reasoning'/><category scheme='http://www.blogger.com/atom/ns#' term='intelligence'/><title type='text'>What Are Intelligence? And Why?</title><content type='html'>&lt;span style="font-weight: bold;"&gt;Summary:&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;Randall Davis's 1996 presidential address was an overview of human intelligence.  
In order to understand how artificial intelligence might be created, it is important to learn the theories involved with current human and animal reasoning.  The five views of reasoning are mathematical logic, psychology, biology, statistics, and economics.&lt;br /&gt;&lt;br /&gt;Since the paper was mainly a general overview, I'm more interested in discussing some aspects of the paper as they relate to AI.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;This summary is being written post-Davis conference call, so I've had a bit of time to think about our discussion with him, as well as my thoughts from before. &lt;br /&gt;&lt;br /&gt;The main conclusion I reached after our call is that we should not have expected an artificial intelligence breakthrough by now.  From an evolutionary perspective, Davis mentions that intelligence is built over time from individually formed pieces.  Each part of an organ is developed over time by incrementally building on previous developments.  Even the brain is composed of different sections that are compartmentalized.  Sometime in &lt;span style="font-style: italic;"&gt;Homo sapiens'&lt;/span&gt; past, these compartments connected to each other in such a way that intelligence was formed.&lt;br /&gt;&lt;br /&gt;Examining the development of AI, I noticed that each subfield of AI is just like another organ or section of the brain.  By themselves, the subfields are too focused to offer true intelligence.  Tools to recognize language are not built to recognize images, image recognition engines cannot develop plans, and planners cannot understand speech.  The only way to increase the intelligence of a system is to find ways to interconnect all of these components to offer reasoning.&lt;br /&gt;&lt;br /&gt;Each subfield should also be as developed as possible.  Right now, the evolution of AI is in individual specialization.  
As highly accurate, complete systems are developed and become widely available and easy to use, diverse systems will be created and the focus on specialization will fade.&lt;br /&gt;&lt;br /&gt;My main concern is that research on merging fields will be rather slow.  Current graduate students are less likely to work on areas that require expertise in multiple fields, since Ph.D.s focus on high specialization.  As the number of fields required to improve the intelligence of systems grows, research on artificial intelligence systems will require more effort from multiple professors and students.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-1609138447912906736?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/1609138447912906736/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=1609138447912906736' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/1609138447912906736'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/1609138447912906736'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2007/12/what-are-intelligence-and-why.html' title='What Are Intelligence? 
And Why?'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-745281015254428666</id><published>2007-11-26T15:17:00.000-06:00</published><updated>2007-11-27T00:18:56.820-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='sketch recognition'/><category scheme='http://www.blogger.com/atom/ns#' term='graphical models'/><category scheme='http://www.blogger.com/atom/ns#' term='geometric recognizer'/><category scheme='http://www.blogger.com/atom/ns#' term='DBN'/><category scheme='http://www.blogger.com/atom/ns#' term='bayesian networks'/><title type='text'>SketchREAD: A Multi-Domain Sketch Recognition Engine</title><content type='html'>&lt;span style="font-weight: bold;"&gt;Summary:&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;Alvarado's SketchREAD is a sketch recognition system built with Bayesian networks.  The engine can be tuned to run in multiple domains, and Bayesian networks allow for small errors to be corrected.&lt;br /&gt;&lt;br /&gt;SketchREAD uses a geometric sketching language, much like the one found in LADDER, to describe simple domain shapes.  The context of how these shapes appear within a domain, such as how arrows are used to connect lineages together in family trees, operates at a higher level than simple geometric recognizers.  Trying every possible combination of strokes to find the "best" fit for all the shapes is time consuming.  SketchREAD seeks to model this context with Bayesian networks.&lt;br /&gt;&lt;br /&gt;Shapes themselves have hypotheses linking to primitives and constraints.  For instance, the hypothesis for an &lt;span style="font-style: italic;"&gt;Arrow&lt;/span&gt; would cause three &lt;span style="font-style: italic;"&gt;Lines&lt;/span&gt; and the constraints between them.  
Higher context models can also be portrayed, such as a &lt;span style="font-style: italic;"&gt;Mother-Son&lt;/span&gt; link causing a &lt;span style="font-style: italic;"&gt;Mother&lt;/span&gt;, &lt;span style="font-style: italic;"&gt;Son&lt;/span&gt;, and a &lt;span style="font-style: italic;"&gt;Line.  &lt;/span&gt;Partial hypotheses can also be generated by incorporating "virtual" nodes that are primitive hypotheses not linked to observations.&lt;br /&gt;&lt;br /&gt;To generate hypotheses, SketchREAD has three steps:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Bottom-up: Strokes that the user draws are recognized as primitives and low-level shapes&lt;/li&gt;&lt;li&gt;Top-down: System attempts to find subshapes missing from possible interpretations.  Strokes can be reinterpreted.&lt;/li&gt;&lt;li&gt;Pruning: Unlikely interpretations are removed from consideration.&lt;/li&gt;&lt;/ol&gt;As an example, Alvarado proposes that an ellipse is drawn in a family tree domain.  This ellipse is recognized as a low-level shape, and then an interpretation for the ellipse is created, as well as partial interpretations for Mother-Son, Mother-Daughter, etc.  These partial hypotheses are templates, and the shape drawn is fit into a single slot of the template.  Later, the shape can be shuffled within the template.  To keep the interpretations from exploding and becoming intractable, high-level hypotheses are generated in the Bayesian network from only complete templates.  Also, any polyline is assumed to be part of only one shape/interpretation.&lt;br /&gt;&lt;br /&gt;In the domain of family trees, SketchREAD improves over baseline performance in symbol recognition by reducing the errors in recognition by over 50%.  Circuit diagrams provide a harder domain, and here SketchREAD improves over a baseline by reducing the number of errors by 17%.  
The time it takes to process each stroke increases with the stroke number.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion:&lt;br /&gt;&lt;/span&gt;&lt;span style="font-weight: bold;"&gt;&lt;br /&gt;&lt;/span&gt;Although SketchREAD improves the accuracy for the tested domains, the final accuracy was not yet good enough to be used within any complex domain's interface, which was one of the goals of the system.  In the paper's discussion, Alvarado also mentioned this.  Also, the restriction that a polyline can be part of only one interpretation greatly hurts circuit diagram domains, since many circuit symbols can be drawn with a single stroke.&lt;span style="font-weight: bold;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-745281015254428666?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/745281015254428666/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=745281015254428666' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/745281015254428666'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/745281015254428666'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2007/11/sketchread-multi-domain-sketch.html' title='SketchREAD: A Multi-Domain Sketch Recognition Engine'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-6024557069365658757</id><published>2007-11-19T14:38:00.001-06:00</published><updated>2007-11-19T15:03:11.841-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='sketch recognition'/><category scheme='http://www.blogger.com/atom/ns#' term='features'/><category scheme='http://www.blogger.com/atom/ns#' term='text recognition'/><title type='text'>Ink Features for Diagram Recognition</title><content type='html'>&lt;span style="font-weight: bold;"&gt;Summary:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Patel et al. use ink features to divide a sketch into text and shapes via a dividing tree. Forty-six features were defined by the authors for use in the divider, all of which are listed in an appendix.&lt;br /&gt;&lt;br /&gt;Sketch data was collected from 26 people, each person drawing 9 sketches.  Each stroke in the sketch was then labeled as being part of text or a shape, and analysis was performed on these sketches to determine the relevant features distinguishing between the two components.  The authors used a statistical partitioning technique available in the R statistical package to divide the data into two components in the feature space and determine the most relevant features.  A classification tree was then built with the most relevant feature as the root of the tree, and each node defines a threshold that separates the space into text and shape components.&lt;br /&gt;&lt;br /&gt;The authors obtained great results with the simple classification tree technique.  The new classifier greatly reduced the number of misclassified strokes as a whole compared with both Microsoft's classifier and InkKit's.  Although in the test results the new classifier misclassifies text a bit more than either baseline system, shape misclassification is much lower in the new system. 
&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The actual accuracy numbers are not impressive, since the system still misclassifies too many strokes to be considered "accurate", but the system's improvement is very impressive and pushes the research on text/shape classification in a new direction.  The authors' analysis of the features (and inclusion of the features in an appendix!) was appreciated, and they discussed which features the &lt;span style="font-style: italic;"&gt;Rpart&lt;/span&gt; partitioning system found most helpful.  Since this model should run very quickly after training, it could easily be combined with other models to improve text/shape recognition even more.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-6024557069365658757?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/6024557069365658757/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=6024557069365658757' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/6024557069365658757'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/6024557069365658757'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2007/11/ink-features-for-diagram-recognition.html' title='Ink Features for Diagram Recognition'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-7527080983921380203</id><published>2007-11-19T14:00:00.000-06:00</published><updated>2007-11-19T14:37:59.468-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='graphical models'/><category scheme='http://www.blogger.com/atom/ns#' term='bayesian networks'/><title type='text'>Bayesian Networks Without Tears</title><content type='html'>&lt;span style="font-weight: bold;"&gt;Summary:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Charniak provides a decent overview of Bayes nets in this publication.  Bayesian networks model causality and the probabilities associated with cause-and-effect relationships.  Since Bayesian networks are a highly graphical and visual model, it would be rather convoluted to try and describe them in detail here.  Instead, I will provide brief comments on topics brought up in the paper.&lt;br /&gt;&lt;br /&gt;Bayesian networks are DAGs, or directed acyclic graphs.  The reason for this is so that there are no infinite cause-and-effect loops between two or more nodes.  The nodes themselves are states in the world, and arcs (edges) in the graph specify the causal connections between the nodes.  Root nodes must have prior probabilities specified, and every other node needs conditional probabilities given its parents; these probabilities are defined by experts or empirical data.&lt;br /&gt;&lt;br /&gt;One of the main attractions of using Bayesian networks is the incredible savings in calculating joint probabilities for the graph.  In a typical case where there are &lt;span style="font-style: italic;"&gt;n&lt;/span&gt; binary variables, there would be 2^&lt;span style="font-style: italic;"&gt;n - &lt;/span&gt;1 joint probabilities in order to obtain a complete distribution.  Yet, in Bayesian networks, the model grows exponentially only with the causal connections entering &lt;span style="font-style: italic;"&gt;per node.  
&lt;/span&gt;If a node has only 1 incoming connection, then the node needs 2 conditional probabilities.  2 connections require 4 conditional probabilities, 3 connections 8 conditional probabilities, etc.  These savings come from the independence assumptions built into Bayesian networks.  Two nodes are dependent on each other only if there is a &lt;span style="font-style: italic;"&gt;d-connecting path&lt;/span&gt; between the two nodes, where a path is d-connecting with respect to the evidence E if every interior node on it is either linear or diverging and not in E, or converging with either itself or one of its descendants in E.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-7527080983921380203?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/7527080983921380203/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=7527080983921380203' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/7527080983921380203'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/7527080983921380203'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2007/11/bayesian-networks-without-tears.html' title='Bayesian Networks Without Tears'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-1381197741595828633</id><published>2007-11-14T15:05:00.001-06:00</published><updated>2007-11-19T14:00:46.319-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='sketch recognition'/><category scheme='http://www.blogger.com/atom/ns#' term='HMM'/><category scheme='http://www.blogger.com/atom/ns#' term='belief propagation'/><category scheme='http://www.blogger.com/atom/ns#' term='likelihood'/><category scheme='http://www.blogger.com/atom/ns#' term='DBN'/><title type='text'>Sketch Interpretation Using Multiscale Models of Temporal Patterns</title><content type='html'>&lt;span style="font-weight: bold;"&gt;Summary:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Certain sketch domains contain appropriate temporal information that can assist in symbol recognition.  For instance, digital circuit diagrams can be highly time-dependent when restricted to certain symbols.  Resistors are typically drawn in order, as are capacitors and batteries.  Using HMMs to take advantage of this temporal information can improve sketch recognition accuracy.&lt;br /&gt;&lt;br /&gt;Sezgin uses an HMM modeled with DBNs to maximize the likelihood of the observable features given the grouping's label.  The DBN model takes observables as input, obtained through features computed on the grouping's primitives, and infers the probability of a stroke-level model given the observables.  The observables are also modeled with a mixture of Gaussians, although I'm not sure what the mixture model is used for.  When this DBN is combined into an HMM, the two other nodes added are an object hypothesis and an ending hypothesis.  
The object hypothesis predicts the object type (Resistor, Wire, etc.), whereas the ending hypothesis predicts when the symbol is finished drawing.&lt;br /&gt;&lt;br /&gt;The inference of a DBN is linear, whereas the inference on an HHMM (hierarchical HMM) is &lt;span style="font-style: italic;"&gt;O&lt;/span&gt;(&lt;span style="font-style: italic;"&gt;T&lt;/span&gt;^3).  Therefore, Sezgin converts the model to a DBN before inference is conducted.  This step was not explained.  During training, the use of continuous variables could cause numerical underflow during belief propagation.  A specialized algorithm, the Lauritzen-Jensen belief propagation algorithm, was used to avoid the instability issues.&lt;br /&gt;&lt;br /&gt;Overall, the model worked well in the domain and improved the recognition (lowered the error rates) for all 8 participants involved in the test.  Since the model relies on time, any interspersing (drawing two or more objects simultaneously) introduces errors.  This causes primitives to be missed in sketches, with over 6% of the primitives missed on average due to this issue.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Relying on time data is tricky with sketch recognition, since time information can only be used in certain domains.  Circuit diagram recognition is not necessarily one of these domains, as shown by the interspersing data.  
Extending the model beyond first order might account for some of these issues, but then the model would not be able to run in real time, which was a major selling point of the system.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-1381197741595828633?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/1381197741595828633/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=1381197741595828633' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/1381197741595828633'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/1381197741595828633'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2007/11/sketch-interpretation-using-multiscale.html' title='Sketch Interpretation Using Multiscale Models of Temporal Patterns'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-4912456325483840745</id><published>2007-11-10T21:43:00.000-06:00</published><updated>2007-11-10T22:22:03.049-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='sketch recognition'/><category scheme='http://www.blogger.com/atom/ns#' term='vision'/><category scheme='http://www.blogger.com/atom/ns#' term='low-level recognizer'/><title type='text'>Gestures without Libraries, Toolkits or Training: A $1 Recognizer for User Interface Prototypes</title><content 
type='html'>&lt;span style="font-weight: bold;"&gt;Summary:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Wobbrock et al.'s $1 recognizer is a system designed to be a simple recognizer that does not rely on any mathematical background.  With this recognizer, novices to sketch recognition can use simple gestures in their interfaces.&lt;br /&gt;&lt;br /&gt;The $1 recognizer has four steps: (1) resampling, (2) rotation, (3) scaling, and (4) classification.  Points in a gesture are resampled into &lt;span style="font-style: italic;"&gt;N&lt;/span&gt; equidistant points, where &lt;span style="font-style: italic;"&gt;N&lt;/span&gt; is defined by the developer.  The gesture is then rotated so that the line between the center of the gesture and the starting point is at the 0 degree position, i.e. the center-start point axis is at 3 o'clock.  The gesture is then scaled to fit within a square of some size and translated so that the center of the gesture is at the (0,0) origin.&lt;br /&gt;&lt;br /&gt;Finally, the gesture is classified by first calculating the average distance between corresponding points in the gesture and in each template (known gesture).  This is called the path-distance by the authors.  A score is determined directly from this path distance and the scaled square's size.&lt;br /&gt;&lt;br /&gt;Overall, the recognizer has good results with around 98% accuracy for simple gestures.  The classification relies heavily on the number of templates in the system.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;This recognizer really does not introduce anything new to sketch recognition, but it does wrap up some basic vision and sketch recognition techniques into a simple (and given) algorithm.  The technique used by $1 is not much different than general template/bitmap comparison, except using the points is a bit nicer when working with single strokes.  On the other hand, bitmap comparison allows for multiple strokes in any order.  
This technique also relies heavily on visual differences between gestures, and because it normalizes rotation and scale, it cannot distinguish gestures that differ only in orientation or size.  For instance, a forward slash gesture with a mouse or pen is a common way to indicate "page forward" in a web browser.  Similarly, a backwards slash/dash indicates "page back".  This recognizer cannot handle these cases.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-4912456325483840745?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/4912456325483840745/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=4912456325483840745' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/4912456325483840745'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/4912456325483840745'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2007/11/gestures-without-libraries-toolkits-or.html' title='Gestures without Libraries, Toolkits or Training: A $1 Recognizer for User Interface Prototypes'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-964920556513796953</id><published>2007-11-07T14:59:00.000-06:00</published><updated>2007-11-08T11:59:24.908-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='sketching'/><category scheme='http://www.blogger.com/atom/ns#' term='multimodal'/><category 
scheme='http://www.blogger.com/atom/ns#' term='speech'/><category scheme='http://www.blogger.com/atom/ns#' term='user study'/><title type='text'>Speech and Sketching: An Empirical Study of Multimodal Interaction</title><content type='html'>&lt;span style="font-weight: bold;"&gt;Summary:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;In this paper, Adler and Davis explore multimodal speech and sketch interfaces through a user study.  Their goal is to allow the computer to provide feedback to the user as the user talks and draws, and the computer will influence the design during this process by asking questions and clarifying information.  Having the computer understand everything about the design is not the goal; instead, the computer should know enough to ask motivating questions when necessary in order to engage the user.  The system also does not want to constrain the user's drawing or speech style.&lt;br /&gt;&lt;br /&gt;The Wizard-of-Oz user study involved 18 users.  The users were asked to design a floor plan, full adder, AC/DC transformer, and a digital circuit.  Sketching was done on Tablet PCs in software that allowed for drawing and highlighting in 5 different colors.  During the study, the experimenter sat at a table across from the user.  The study was filmed and the audio, visual, and sketching components of the study were synchronized.&lt;br /&gt;&lt;br /&gt;The study showed some interesting results concerning color, questions, and speech timing.  Users tended to rely on multiple colors to indicate portions of the sketch.  The color linked parts of the sketch together, referred back to previous parts, and reflected the real-world colors of objects.  When speaking, users typically had phrase and word repetition when they were thinking aloud.  This could allow the computer to discern key words from the user-computer dialogue.  
Responses from computer questions also caused the user to repeat the questions, and simple questions could prompt more information than what was asked.  Some users even redesigned their drawings after simple questions were asked, such as inquiring if two objects were similar.  Speech and sketching started simultaneously in the study.  Yet, certain parts of the speech, such as an entire phrase, tended to start before the sketch, and certain key words said alone tended to be heard after a sketch was started.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The two best components of Adler's study show how computers can assist humans during design steps by relying on the human design and thought process, instead of having an actual understanding.  In lieu of training the computer to understand all of the components of a design, basic understanding of object similarity and grouping should be enough to produce a motivating dialogue.  Also, the fact that the user constantly repeats words provides the computer with an indication of important information without the need of a large vocabulary.&lt;br /&gt;&lt;br /&gt;I wish the study also went into more interface issues, such as when the computer should ask a question (e.g. during sketching, during a pause, etc.).  Also, it would have been beneficial to see the average pause time of a user and if the user was speaking or mumbling during the pause by going "hmm" or something similar.  Do the pauses for sketch indicate that the user is speaking, and do pauses for speaking indicate the user is sketching?  
Do the pauses for both modes line up?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-964920556513796953?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/964920556513796953/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=964920556513796953' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/964920556513796953'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/964920556513796953'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2007/11/speech-and-sketching-empirical-study-of.html' title='Speech and Sketching: An Empirical Study of Multimodal Interaction'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-4592671586470089874</id><published>2007-11-05T14:14:00.000-06:00</published><updated>2007-11-05T19:56:08.006-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='sketch recognition'/><category scheme='http://www.blogger.com/atom/ns#' term='constraint satisfaction problem'/><title type='text'>Three main concerns in sketch recognition and an approach to addressing them</title><content type='html'>&lt;span style="font-weight: bold;"&gt;Summary:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Mahoney and Fromherz discuss problems involved with matching models of hand-drawn sketches of stick figures.  
The figures are in simple polylines, but can be in any configuration with other figures or distracting objects in the background.&lt;br /&gt;&lt;br /&gt;The system input (drawn figure) is highly variable, and the authors define certain problems in the variability.  &lt;span style="font-style: italic;"&gt;Failures of co-termination&lt;/span&gt; involve strokes over- or under-shooting one another (i.e. the endpoints of two disjoint strokes that are supposed to be connected do not touch).  &lt;span style="font-style: italic;"&gt;Articulation &lt;/span&gt;problems are encountered when strokes are over- or under-segmented in preprocessing.  &lt;span style="font-style: italic;"&gt;Interaction with background context&lt;/span&gt; arises when the figure to be matched or recognized is set against a background of context strokes, other figures, or noise and distracting data.&lt;br /&gt;&lt;br /&gt;In this paper, the matching process involves creating a graph of the model (the figure to find) and searching for a mapping between the model and a data subgraph.  Ambiguity is handled by adding alternative, plausible substructures to the graph.  This happens through &lt;span style="font-style: italic;"&gt;proximity linking&lt;/span&gt;, &lt;span style="font-style: italic;"&gt;virtual junction splitting&lt;/span&gt;,&lt;span style="font-style: italic;"&gt; spurious segment jumping&lt;/span&gt;, and &lt;span style="font-style: italic;"&gt;continuity tracing&lt;/span&gt;.  All four of these methods involve creating new links in the structure by searching for subtle connections, splitting current strokes, merging strokes together, and creating new stroke segments.&lt;br /&gt;&lt;br /&gt;Subgraph matching is translated into a constraint satisfaction problem (CSP), where each stroke (node) in the graph is a variable, and the constraints are edges between the nodes.  The final match tries to have the smallest link length, and the length of the segments should be in the appropriate ratios.  
Matching can also be influenced by &lt;span style="font-style: italic;"&gt;a priori&lt;/span&gt; knowledge that defines certain components.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The use of a CSP seems to work very well in this case.  The fact that the system can discern a stick figure in a sea of seemingly random lines is rather amazing, since I can barely see the figure myself.&lt;br /&gt;&lt;br /&gt;One issue is that this system cannot seem to discern "fixed" figures, i.e. a square versus a parallelogram.   For "loose" figures, the figures must also be connected with endpoint to endpoint.  For a system with only a few strokes this can be simple, but if the user was trying to draw, say, a centipede, the system could find many good examples within a collection of strokes.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-4592671586470089874?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/4592671586470089874/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=4592671586470089874' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/4592671586470089874'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/4592671586470089874'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2007/11/three-main-concerns-in-sketch.html' title='Three main concerns in sketch recognition and an approach to addressing them'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-4813671628702933844</id><published>2007-11-05T11:56:00.000-06:00</published><updated>2007-11-05T14:14:19.927-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='sketch recognition'/><category scheme='http://www.blogger.com/atom/ns#' term='graphical models'/><category scheme='http://www.blogger.com/atom/ns#' term='features'/><category scheme='http://www.blogger.com/atom/ns#' term='likelihood'/><category scheme='http://www.blogger.com/atom/ns#' term='constellation models'/><title type='text'>Constellation Models for Sketch Recognition</title><content type='html'>&lt;span style="font-weight: bold;"&gt;Summary:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;In Sharon and van de Panne's paper, sketch objects are represented with constellation models. A constellation model is a visual model that captures individual and pairwise features between strokes. 
For their model the site features include:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;The x-coordinate of the stroke's bounding box center&lt;/li&gt;&lt;li&gt;The y-coordinate of the stroke's bounding box center&lt;/li&gt;&lt;li&gt;The length of the bounding box diagonal&lt;/li&gt;&lt;li&gt;The angle of the bounding box diagonal&lt;/li&gt;&lt;/ul&gt;Interaction features include:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Delta x between the strokes&lt;/li&gt;&lt;li&gt;Delta y between the strokes&lt;/li&gt;&lt;li&gt;The minimum distance between the endpoints of stroke &lt;span style="font-style: italic;"&gt;a&lt;/span&gt; and any point on stroke &lt;span style="font-style: italic;"&gt;b&lt;/span&gt;&lt;/li&gt;&lt;li&gt;The minimum distance between the endpoints of stroke &lt;span style="font-style: italic;"&gt;b&lt;/span&gt; and any point on stroke &lt;span style="font-style: italic;"&gt;a&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;Since the interaction features are pairwise, the model scales with the number of strokes by &lt;span style="font-style: italic;"&gt;O&lt;/span&gt;(&lt;span style="font-style: italic;"&gt;n&lt;/span&gt;^2).  The authors chose to allow labels to be optional or mandatory in order to reduce the run time of the model.&lt;br /&gt;&lt;br /&gt;The mean and covariance of the feature vectors (site &lt;span style="font-style: italic;"&gt;F&lt;/span&gt; and interaction &lt;span style="font-style: italic;"&gt;G&lt;/span&gt;) are computed over the training data set. Each label in the training data then has an individual probabilistic model computed from that label's features.&lt;br /&gt;&lt;br /&gt;For the entire sketch, a likelihood function estimates the probability of an entire labeling &lt;span style="font-style: italic;"&gt;L&lt;/span&gt;. This function is given in the paper. 
The overall function tries to maximize the probability of an individual label, as well as the probability of the interactions between two individual labels.&lt;br /&gt;&lt;br /&gt;A maximum-likelihood search tries to maximize the likelihood function defined, i.e. to find the most probable labeling.   The search is multipass, with the first pass assigning only the mandatory labels.  All possible label assignments are then searched for using a "branch-and-bound search tree [where each] node in the search tree represents a partial labeling of the sketch."  The depth of the tree corresponds to the number of labels applied.  The search advances by choosing the best assignment of mandatory labels, and the "cost" of this assignment bounds the rest of the search.  The multipass algorithm also bounds branches of the tree before a full labeling is found.  If a new likelihood is worse than the bound, then the branches associated with the likelihood's labeling are pruned.  If no complete solution is found with the bounding, the bound is loosened until a labeling is found.&lt;br /&gt;&lt;br /&gt;In the results section, the multipass thresholding/bounding greatly reduces the time required to find a complete labeling.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Although the paper discussed how using a branch-and-bound search tree with multipass thresholding improves the runtime, no mention was made of the accuracy of the final system.  I've worked with systems similar to this before, and I know that the accuracy can drop drastically as the number of labels increases.  
The authors seem to curb this by having "optional" labels, and by forcing each component to be drawn in a single stroke, but results for how the system's accuracy scales with the number of labels would have been beneficial.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-4813671628702933844?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/4813671628702933844/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=4813671628702933844' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/4813671628702933844'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/4813671628702933844'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2007/11/constellation-models-for-sketch.html' title='Constellation Models for Sketch Recognition'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-820669555801714178</id><published>2007-10-29T15:24:00.000-05:00</published><updated>2007-10-29T18:22:47.253-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='sketch recognition'/><category scheme='http://www.blogger.com/atom/ns#' term='radial histogram'/><category scheme='http://www.blogger.com/atom/ns#' term='vision'/><category scheme='http://www.blogger.com/atom/ns#' term='overtracing'/><title type='text'>Envisioning Sketch Recognition: A Local Feature Based Approach to 
Recognizing Informal Sketches</title><content type='html'>&lt;span style="font-weight: bold;"&gt;Summary:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Oltmans's Ph.D. thesis uses computer vision techniques to recognize freely-drawn symbols and sketches.  Freely-drawn sketches do not constrain the user in drawing style, so many issues need to be taken into account.  For instance, stroke overtracing is a large problem in free-form sketches, especially when doodles or notes are involved.  Also, noise is more prevalent, and temporal data cannot be utilized because strokes can be drawn in any order.&lt;br /&gt;&lt;br /&gt;To combat these issues, Oltmans uses computer-vision based techniques that lessen the issues from overtracing and noise while ignoring any temporal features of the sketch.  The technique used is dubbed "bullseye" by Oltmans, and consists of a radial partition of space around a point.  The radial partitions correspond to a histogram that keeps track of the number of stroke points within each section/bucket/slice.  Stroke points are preprocessed to be relatively equidistant from one another.  Each symbol has a corresponding set of bullseye patterns stored in the "codebook" for the system.  Matching a series of found histograms with the trained codebook patterns allows for symbol classification.  The system is trained using an SVM on known data.&lt;br /&gt;&lt;br /&gt;To find a symbol within an entire sketch, though, is rather difficult.  Oltmans tries to find "candidate regions" for a symbol by looking at small, overlapping windows of points and then combining these regions into larger regions.  The large regions are found using EM clustering techniques to group the smaller regions together.  Any cluster that is considered too large is split into smaller clusters.  
The bounding box of the cluster is then taken to be the symbol's region, and the bullseye histogram is taken for the ink within the box.&lt;br /&gt;&lt;br /&gt;Overall, the system works well for individual symbols (94.4%), especially when compared to existing systems for noisy data.  The system fared slightly worse when taking the entire sketch into account, achieving an accuracy of 92.3%.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The use of vision techniques to classify freely-drawn symbols is a good idea because stroke data has so much noise.  Using vision mapping, whether it is histogram or simple pixel overlay, tends to avoid corner segmentation and overtracing issues.&lt;br /&gt;&lt;br /&gt;I find the results for Oltmans's full sketch tests slightly skewed.  The vast majority of the shapes tested in the full sketch were wire &lt;span style="font-style: italic;"&gt;segments&lt;/span&gt; and resistors. Since the wires are broken down into smaller segments, there were roughly 9,000 wires, and there were only approximately 14,000-15,000 shapes total. The accuracy for resistors (2000 shapes) was also extremely high, but the accuracy for the rest of the shapes was around 60-80%.  So over 2/3 of the shapes were easy to detect with high accuracy because they were either a straight line (wire) or unique (resistor).  It seems like the system has a very hard time distinguishing between similar shapes, such as batteries vs. capacitors, mainly because the histograms are just not accurate enough to catch small variations with noisy data.&lt;br /&gt;&lt;br /&gt;I'm also interested to see how the histograms can be tweaked depending on the scale of the image.  
The histograms used were hard-coded to 40 pixel radii, and possibly having a variable histogram size would help.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-820669555801714178?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/820669555801714178/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=820669555801714178' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/820669555801714178'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/820669555801714178'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2007/10/envisioning-sketch-recognition-local.html' title='Envisioning Sketch Recognition: A Local Feature Based Approach to Recognizing Informal Sketches'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-5003736173952328967</id><published>2007-10-24T20:33:00.000-05:00</published><updated>2007-10-24T23:03:10.846-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='sketch recognition'/><category scheme='http://www.blogger.com/atom/ns#' term='user interfaces'/><category scheme='http://www.blogger.com/atom/ns#' term='multimodal'/><title type='text'>Naturally Conveyed Explanations of Device Behavior</title><content type='html'>&lt;span style="font-weight: bold;"&gt;Summary:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Oltmans and Davis present 
ASSISTANCE, a multimodal system capable of understanding simple 2D physics diagrams.  The diagrams can contain bodies, pin joints, springs, pulleys, and rods.  Arrows, as well as verbal cues, are used to describe the movement of objects.&lt;br /&gt;&lt;br /&gt;In ASSISTANCE, the user first draws the system they want to model.  Then, the user verbally describes the system while pointing at objects in the drawing.  ASSISTANCE constantly updates its interpretation of the drawing, and the user can ask for the computer's interpretation at any time.  This interpretation is a "causal model" for the drawn system (i.e. a sequence of cause and effect actions).&lt;br /&gt;&lt;br /&gt;To generate the causal model, ASSISTANCE first finds the degrees of freedom each object has, such as rotational or translational freedom.  The system then utilizes the verbal description of the system, as well as any arrows the user draws.  Verbal information is parsed to separate key objects and actions.  For example, the phrase "Body 2 pushes Body 3" will parse into "Body 2", "pushes", and "Body 3".  These verbal phrases, as well as the drawn bodies and arrows, are converted into propositional statements, and ASSISTANCE performs reasoning using a forward-chaining algorithm and a truth maintenance system.&lt;br /&gt;&lt;br /&gt;Often, the same action will be described in multiple ways, such as with a verbal description and an arrow indicating movement.  When this happens, the two events are merged.  The system assumes that only one motion can affect a body, so multiple descriptions affecting the same body would indicate that the descriptions describe the same event.&lt;br /&gt;&lt;br /&gt;The final causal model is created by examining the causal events and constructing the most likely model for the system, given the description.  To do this, ASSISTANCE uses known causal events and plausible causal events, along with constraint propagation.  
Events that do not have a cause are considered to be plausible and require an implicit cause by an outside force.  The system tries to minimize these plausible causes, and the model is created when all events have causes.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;ASSISTANCE seems to be a great system, and I'm really curious how users evaluated it.  There was no formal evaluation for this paper, but since we're reading his thesis next week I'll find out what users say shortly.&lt;br /&gt;&lt;br /&gt;I love multimodal systems, but I also understand why there are not many multimodal applications commercially available.  Being able to describe a drawing verbally and with gestural cues is great, and using both input modes can improve the system's accuracy when the two modes rely on each other for information.  On the other hand, if the system does not force users to use all input modes, then the accuracy rate for each individual input still has to be very high, as if the separate input modes could not be relied on.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-5003736173952328967?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/5003736173952328967/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=5003736173952328967' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/5003736173952328967'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/5003736173952328967'/><link rel='alternate' type='text/html' 
href='http://awolin.blogspot.com/2007/10/naturally-conveyed-explanations-of.html' title='Naturally Conveyed Explanations of Device Behavior'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-3077925096759056731</id><published>2007-10-17T13:52:00.000-05:00</published><updated>2007-10-17T23:42:03.690-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='sketch recognition'/><category scheme='http://www.blogger.com/atom/ns#' term='ambiguity'/><category scheme='http://www.blogger.com/atom/ns#' term='interface'/><category scheme='http://www.blogger.com/atom/ns#' term='low-level recognizer'/><title type='text'>Ambiguous Intentions: a Paper-like Interface for Creative Design</title><content type='html'>&lt;span style="font-weight: bold;"&gt;Summary:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Gross and Do describe the Electronic Cocktail Napkin program that allows users to draw ambiguously as if they were sketching a design on a piece of scrap paper.  The Napkin then takes the sketch, examines objects within the ambiguous drawing, interprets the drawing's context, and recognizes the objects.&lt;br /&gt;&lt;br /&gt;Domain recognizers are user defined: the user draws examples and the Napkin identifies the symbols and patterns.  Symbols can span multiple domains, such as a circle acting as a table in a floor plan domain, and the same circle representing a node in a graph domain.&lt;br /&gt;&lt;br /&gt;As a user draws in a new, blank Napkin, the system discerns whether each stroke or symbol can be recognized, is ambiguous, or has failed to be recognized.  If a symbol is recognized, then it can be recognized in only one domain, the known one.  
If a symbol is ambiguous then it can be recognized in multiple domains.  Otherwise, the system cannot recognize the symbol at the time and stores that information for later.  A user can also manually specify a symbol, if need be. &lt;br /&gt;&lt;br /&gt;Symbols are related to each other via constraints specific to a domain.  For instance, graphs and circuits have connected constraints, and floor plans can have perpendicular constraints for walls.  Low-level recognition is accomplished with 3x3 normalized "glyphs" with features.  Glyphs can also be grouped into configurations, which are ranked hierarchically above individual glyphs.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion:&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;Overall, I think the Electronic Cocktail Napkin was a good idea that was too far ahead of its time and too tied down to one system.  The authors even mentioned that the system suffered from not having access to the best type of technology for the system, which they identified as LCD digitizing displays that the user could draw on.  
The equipment that the Napkin was using involved a separate digitizing pad and display, which breaks the entire mapping from napkin to sketch pad.&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-3077925096759056731?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/3077925096759056731/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=3077925096759056731' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/3077925096759056731'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/3077925096759056731'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2007/10/ambiguous-intentions-paper-like.html' title='Ambiguous Intentions: a Paper-like Interface for Creative Design'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-123002715498955613</id><published>2007-10-15T14:13:00.000-05:00</published><updated>2007-10-15T14:56:16.289-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='sketch recognition'/><category scheme='http://www.blogger.com/atom/ns#' term='corner finding'/><category scheme='http://www.blogger.com/atom/ns#' term='interface'/><category scheme='http://www.blogger.com/atom/ns#' term='3D inference'/><category scheme='http://www.blogger.com/atom/ns#' term='overtracing'/><title 
type='text'>Graphical Input Through Machine Recognition of Sketches</title><content type='html'>&lt;span style="font-family: georgia;"&gt;&lt;span style="font-weight: bold;"&gt;Summary:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Herot's short paper gave a brief, but comprehensive, look at sketch interaction systems in the mid-1970s.&lt;br /&gt;&lt;br /&gt;The paper first looks at a general recognizer, HUNCH, that tries to see if accurate knowledge can be obtained without using a specific domain.   The system takes data drawn on a large tablet with a special pencil, and the raw input data is recorded by the computer.  The HUNCH system used another application, called STRAIT, that found corners in data by examining the user's pen speed.  The system also used a process called latching to snap endpoints of close lines together.  Unfortunately, the HUNCH system had problems with consistency between different users.  Users drawing at different pen speeds produced different corners, and the latching technique sometimes distorted an intended image, such as oversnapping lines in a cube.  The system also handles overtraced lines by merging lines together, provides some 3D image inference through unexplained techniques, and can create floor maps by looking at boxed rooms and doorways.&lt;br /&gt;&lt;br /&gt;Context is an important part of a sketch, and Herot recognizes this fact by mentioning how data interpretations should have context information.  The context should be specified to the computer so as to avoid issues of recognizing the domain.  Herot briefly mentions a top-down process for recognizing sketches with a context architecture.&lt;br /&gt;&lt;br /&gt;Lastly, Herot mentions that user input is a key component of a sketch recognition system that should not be ignored.  
More complex interfaces need to be developed so that a user can interact with a program and correct mistakes, and corner finding algorithms need to be tuned for an individual user.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Although none of the topics mentioned in Herot are new to me, the fact that all of these issues were mentioned in a paper written in 1976 is surprising.  For instance, I had been under the assumption that using pen speed to detect corners was a relatively new fad.&lt;br /&gt;&lt;br /&gt;I also am very surprised that the system tried to incorporate 3D image analysis (and, from the one example, succeeded).  I remember reading a paper about using edges and vertices to detect whether an image is 3D, but I cannot seem to recall the author involved, so it's hard for me to construct a timeline for that research.&lt;br /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-123002715498955613?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/123002715498955613/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=123002715498955613' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/123002715498955613'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/123002715498955613'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2007/10/graphical-input-through-machine.html' title='Graphical Input Through Machine Recognition of Sketches'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-9202490236964964513</id><published>2007-10-10T18:09:00.000-05:00</published><updated>2007-10-10T20:44:10.448-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='perception'/><category scheme='http://www.blogger.com/atom/ns#' term='sketch recognition'/><category scheme='http://www.blogger.com/atom/ns#' term='tension lines'/><category scheme='http://www.blogger.com/atom/ns#' term='geometric recognizer'/><category scheme='http://www.blogger.com/atom/ns#' term='singularity'/><title type='text'>Perceptually Based Learning of Shape Descriptions for Sketch Recognition</title><content type='html'>&lt;span style="font-weight: bold;"&gt;Summary:&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;Veselova and Davis present a system to train a recognizer after only one drawn symbol example.  The system relies on perceptual information in symbols, which was learned from previous experiments conducted with humans.  Goldmeier is the main source of Veselova's perceptual information, and his research shows that people rely on &lt;span style="font-style: italic;"&gt;singularities&lt;/span&gt; when distinguishing between objects.  Singularities are important geometric properties that are highly sensitive to change, such as parallelism, vertical and horizontal lines, and symmetry.  Goldmeier's research also shows that the vertical axis of symmetry of an object is more important than the horizontal axis.&lt;br /&gt;&lt;br /&gt;Veselova and Davis use this singularity information, as well as other constraints they define to be important, and create a ranked list of important constraints.  
They also use work from Arnheim, who describes &lt;span style="font-style: italic;"&gt;tension lines&lt;/span&gt;, which are the important vertical, horizontal, and diagonal lines of an image.  Gestalt principles for grouping are also used in their system to group small objects into larger wholes.  These groupings also have hierarchy.&lt;br /&gt;&lt;br /&gt;The user study for the system involved a group of people comparing a description against 20 shapes, 10 of which agree with the description and 10 of which do not.  Humans then ranked which shapes agree and disagree with the description, and the recognition system did the same.  If the vast majority of humans agreed on an example, the system performed very well and matched the human performance.  Overall, the system performed about 5% worse than humans when matching examples and descriptions.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;I really enjoyed this paper.  Cognitive science techniques, such as Gestalt groupings and preservation of symmetry, are not necessarily new to computer science (ScanScribe), but the explanation behind the principles was decent for the short length of the paper. 
&lt;br /&gt;&lt;br /&gt;I am surprised that the system did not incorporate "must not" constraints, since the addition of nots shouldn't have been too difficult and is important in perception.&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;&lt;span style="font-weight: bold;"&gt;&lt;/span&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-9202490236964964513?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/9202490236964964513/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=9202490236964964513' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/9202490236964964513'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/9202490236964964513'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2007/10/perceptually-based-learning-of-shape.html' title='Perceptually Based Learning of Shape Descriptions for Sketch Recognition'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-1768076474022199351</id><published>2007-10-09T11:23:00.000-05:00</published><updated>2007-10-09T12:32:34.452-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='sketch recognition'/><category scheme='http://www.blogger.com/atom/ns#' term='constraints'/><category scheme='http://www.blogger.com/atom/ns#' term='geometric recognizer'/><title type='text'>Interactive 
Learning of Structural Shape Descriptions from Automatically Generated Near-miss Examples</title><content type='html'>&lt;span style="font-weight: bold;"&gt;Summary:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Hammond and Davis present a debugging system in LADDER for shape definition errors.  Errors can be either syntactical or conceptual, where a syntactical error is an error within the definition expressions themselves, whereas a conceptual error deals with over- or under-constrained definitions. An under-constrained definition finds many false positives due to too few constraints, whereas an over-constrained definition has many false negatives by allowing only very specific instances of a shape to be recognized.&lt;br /&gt;&lt;br /&gt;The paper does not go into depth on handling syntactical errors, but the GUI for LADDER notifies the developer of any syntactical errors by highlighting definition problems in red.  For over-constrained errors, the system first needs to have a drawn example and a definition that positively recognizes that example.  Once this is given, the system scales and rotates the drawn shape and asks the user whether the new shape should be positively recognized.  The system removes constraints from the initial list if needed.  Then, the system generates near-miss examples by looking at the remaining constraints and testing them one at a time.  Each constraint is negated, and if the user indicates that the resulting image should be recognized, then the tested constraint is removed from the list.&lt;br /&gt;&lt;br /&gt;For under-constrained errors, the system again requires a drawn example and a definition.  A list of possible constraints is then generated for the image and the constraints are tested one at a time by adding the negation of the constraint into the current constraint list.  
If the new set with the negation constraint should be positively recognized, then the constraint is not added to the system.&lt;br /&gt;&lt;br /&gt;To generate the shapes during the testing process, each shape is converted into a set of numerical features that can be used in equations to stretch, rotate, and translate parts of the shape.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion:&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;Knowing that this system existed in LADDER before I started creating my shapes would have been helpful.  The idea behind eliminating or adding constraints is relatively simple, since &lt;span style="font-weight: bold;"&gt;constraint&lt;/span&gt; and &lt;span style="font-weight: bold;"&gt;not constraint&lt;/span&gt; cannot both exist in a definition.&lt;br /&gt;&lt;br /&gt;Yet, after working with LADDER, I have found that one of my most frequent errors is not with over- or under-constraint, but more with the constraints themselves.  For instance, when drawing a rectangle, I want the endpoints of lines to be touching or close to touching.  But I cannot edit the value for "close", so instead I have to either force the user to draw a rectangle in a certain fashion so I know that the lines are connected, or I have to eliminate the close endpoints constraint.  
Neither option is optimal, and I'll experiment with the LADDER debugging system to determine if the system finds new constraints I did not know existed.&lt;span style="font-weight: bold;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-1768076474022199351?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/1768076474022199351/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=1768076474022199351' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/1768076474022199351'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/1768076474022199351'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2007/10/interactive-learning-of-structural.html' title='Interactive Learning of Structural Shape Descriptions from Automatically Generated Near-miss Examples'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-1606512287686613453</id><published>2007-10-03T13:18:00.001-05:00</published><updated>2007-10-03T22:52:10.114-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='sketch recognition'/><category scheme='http://www.blogger.com/atom/ns#' term='geometric recognizer'/><title type='text'>LADDER, a sketching language for user interface developers</title><content type='html'>&lt;span style="font-weight: bold;"&gt;Summary:&lt;br 
/&gt;&lt;span style="font-weight: bold;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;Hammond and Davis created a sketch language, LADDER, that allows developers to easily design a  geometric shape recognition system.  The language handles shape drawing, displaying, and editing, and shapes are defined using primitives, constraints, and other shapes.  Many primitives and constraints are already defined for the user, with primitives such as Lines, Arcs, and Spirals.  Constraints are for individual shapes, such as if the shape has a positive slope, and constraints can also describe shape interactions, such as if two shapes are intersecting.&lt;br /&gt;&lt;br /&gt;Shape definitions are hierarchical and incorporate geometric information about primitives and their relationship to each other.  For instance, an arrow is defined as a shaft and a head, where the shaft is defined by a line and the head is defined as two lines meeting at an acute angle.  One of the shaft points must also touch the head where the two head lines meet.  Shape groups are also formed from shapes that are often drawn together.  One example the authors give involves arrows touching polygons, which are defined as forces within a mechanical engineering domain. Shapes can also have editing constraints.  A shape can be allowed to be "rubber-banded", or stretched, on a fixed point, rotated around a center, or forced to remain static if the system designer chooses.&lt;br /&gt;&lt;br /&gt;Shapes, when displayed to the user, are displayed by their ideal strokes.  If only lines are allowed in a domain, then only shapes made from lines can be drawn.  Creating these ideal shapes happens through a series of mathematical equations.  
The features of a shape are quantified, and then the constraints are translated into equations that the features are plugged into.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;span&gt;The LADDER system provides a great foundation for geometric recognition systems.  It seems as if entire applications can be built relatively quickly, as long as the domains involved do not have complex shapes that cannot be broken down into primitives.&lt;br /&gt;&lt;br /&gt;Since I haven't used the system too much, I'm not sure how tough it is to build complex applications.  I also do not know how easy it would be to integrate the language into an existing application, or if LADDER is only designed to build complete systems.  Having the ability to pick and choose what LADDER offers might also be beneficial; for instance, I might not want the system to clean the strokes as they are drawn.  
The paper did not provide many details about customizability in this regard.&lt;br /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-1606512287686613453?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/1606512287686613453/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=1606512287686613453' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/1606512287686613453'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/1606512287686613453'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2007/10/ladder-sketching-language-for-user.html' title='LADDER, a sketching language for user interface developers'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-8193367723016174414</id><published>2007-09-26T13:07:00.001-05:00</published><updated>2007-10-03T13:15:26.272-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='sketch recognition'/><category scheme='http://www.blogger.com/atom/ns#' term='linear classifier'/><category scheme='http://www.blogger.com/atom/ns#' term='features'/><category scheme='http://www.blogger.com/atom/ns#' term='beautification'/><category scheme='http://www.blogger.com/atom/ns#' term='low-level recognizer'/><title type='text'>Recognizing and Beautifying Low-level Sketch Shapes with Two New Features and Ranking 
Algorithm</title><content type='html'>&lt;span style="font-weight: bold;"&gt;Summary:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Paulson's paper describes a sketch recognition system for recognizing primitive shapes, and combinations of these shapes, with high accuracy. The system constrains the user to draw each shape to recognize in a single stroke, but does not constrain the user in drawing style (i.e. not gestures). The shapes that the system recognizes include lines, polylines, circles, ellipses, arcs, curves, spirals, and helixes. Complex fits can be formed from these eight shapes.&lt;br /&gt;&lt;br /&gt;Feature calculation for each stroke incorporates computing direction, speed, and curvature graphs, as well as finding the corners of the stroke. Furthermore, Paulson introduces two new metrics: NDDE and DCR. NDDE is the &lt;span style="font-style: italic;"&gt;normalized distance between direction extremes&lt;/span&gt;, and is calculated by taking the stroke length between the points with the highest and lowest direction values in the stroke and dividing this segment length by the total stroke length. Arcs and curves tend to have high NDDE values, whereas polylines have lower NDDE values. DCR is the &lt;span style="font-style: italic;"&gt;direction change ratio&lt;/span&gt; of a stroke, and is calculated by dividing the maximum change in direction by the average direction change. Arcs tend to have little change in direction over time, whereas the direction graph for a polyline contains spikes in direction change.&lt;br /&gt;&lt;br /&gt;After this preprocessing, the stroke and computed features are passed to individual shape testers. These testers, one for each shape (line, arc, etc.), see if the stroke follows set rules for each primitive. For instance, in the case of a circle test, the stroke must have a minor and major axis ratio close to 1.0, the stroke's NDDE value must be high, and the feature area error of the stroke must be low. 
Each shape tester returns whether the stroke passed the test.&lt;br /&gt;&lt;br /&gt;Finally, for each shape test passed, the interpretation is placed in a sorted hierarchy. The hierarchy assigns weights for shapes, based on the shape's commonness and how complex a fit is. The interpretations are added to the hierarchy in a certain order, described in the paper.&lt;br /&gt;&lt;br /&gt;Paulson's system obtains a 98.6% recognition rate overall, and a 99.89% rate where the correct interpretation is within the top 3 handed to the user. The recognition gain comes from his two new features, which increase the system's accuracy by over 20%. The ranking system helps sort the interpretations so that the system is 7% more likely to rank the correct interpretation as the top interpretation.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Paulson's results are quite amazing in that the system's recognition improves by 20% with just two new features. When analyzing his results table, I noticed that Sezgin's algorithm worked better than Paulson's when returning complex fits, but performed worse on every other shape test. Since Sezgin's method heavily favors complex fits, these results seem logical.&lt;br /&gt;&lt;br /&gt;I noticed that circles and ellipses are given the same hierarchy score (5) in the system, which struck me as odd since I'd want to try and favor circles more than ellipses. Since a circle is always an ellipse, but an ellipse is not always a circle, I would want to favor circles in the final ranking. 
In this way, the system will slightly prefer circles over ellipses when the two tests pass.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-8193367723016174414?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/8193367723016174414/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=8193367723016174414' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/8193367723016174414'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/8193367723016174414'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2007/09/recognizing-and-beautifying-low-level.html' title='Recognizing and Beautifying Low-level Sketch Shapes with Two New Features and Ranking Algorithm'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-6754759490941823590</id><published>2007-09-24T19:03:00.000-05:00</published><updated>2007-09-24T20:22:26.043-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='line approximation'/><title type='text'>Streaming Algorithms for Line Simplification</title><content type='html'>&lt;span style="font-weight: bold;"&gt;Summary:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Abam et al. discussed an algorithm to approximate a drawn stroke in real time.  
With a possibly infinite number of points in a stroke and a limited amount of memory, simplifying the stroke as it's being drawn will save the computer storage space.&lt;br /&gt;&lt;br /&gt;The algorithm runs in O(1) time for three specific cases: when using the Hausdorff distance if the path is convex (i.e. every angle on the path is less than or equal to 180 degrees) or if the path is xy-monotone (any horizontal or vertical line intersects the path at most once), and when using the Frechet distance for general paths. Aban et al. never defined the Hausdorff distance, but the Frechet distance between two paths is equal to the maximum distance the paths are away from each other at two points (I think...)&lt;br /&gt;&lt;br /&gt;The algorithm to simplify a stroke takes the new point to add to the simplification and computes an error for the new addition. The point with the minimum error is then removed from the simplification, and the new errors for each remaining segment are calculated.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion&lt;/span&gt;:&lt;br /&gt;&lt;br /&gt;I don't really understand how this algorithm works. If the algorithm keeps removing points from the segmentation, shouldn't the number of points always be constant? But then how do we build up points to have a segmentation in the first place? I need to reread the paper a few more times before I understand everything.&lt;br /&gt;&lt;br /&gt;Also, on a personal note, I tend to dislike papers that use words like "simply" and "clearly." 
Especially when I don't think the paper is clear.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-6754759490941823590?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/6754759490941823590/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=6754759490941823590' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/6754759490941823590'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/6754759490941823590'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2007/09/streaming-algorithms-for-line.html' title='Streaming Algorithms for Line Simplification'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-7918678186754960990</id><published>2007-09-20T19:52:00.000-05:00</published><updated>2007-10-03T13:15:49.345-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='corner finding'/><category scheme='http://www.blogger.com/atom/ns#' term='vertex detection'/><title type='text'>A Curvature Estimation for Pen Input Segmentation in Sketch-based Modeling</title><content type='html'>&lt;span style="font-weight: bold;"&gt;Summary:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Kim and Kim propose a new corner detection system that utilizes local changes around corners.&lt;br /&gt;&lt;br /&gt;The system starts by resampling the points of the stroke so that all points within 
the stroke are equidistant from one another. The curvature value at a point is typically the change in direction around the point divided by the change in stroke length. By keeping a constant distance between points, the system allows the changes in direction (calculated the same as in Sezgin and Yu) to serve as the curvature values.&lt;br /&gt;&lt;br /&gt;The final curvature value around each point is determined by two new metrics: local convexity and local monotonicity. Local convexity measures the "support" for a potential corner point. This value is calculated by looking at a window around the point and adding the curvature values of all points that have the same sign as the point's curvature. Local monotonicity examines the same window of points around a possible corner, but this time each point is examined in sequence, starting from the center. A curvature value can only be added if it is less than the previous point's and has the same sign; if these two requirements are not met the algorithm stops and returns the current curvature.&lt;br /&gt;&lt;br /&gt;The algorithm found the correct corners approximately 95% of the time.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion:&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;Proposing two new (seemingly easy to implement) curvature metrics could provide corner finders with more techniques to distinguish between good corners and noisy data. The paper was only published recently so I'm not sure how much of an impact it has had yet.&lt;br /&gt;&lt;br /&gt;The main thing I would have liked discussed more in the paper is where this corner finder differs from others in its results. I know what types of shapes Sezgin's corner finder does well with (polylines, simple arcs) but I have yet to see a corner finder that can consistently distinguish truly tough corners, such as a smooth line-arc transition. Kim &amp;amp; Kim's paper doesn't even count this transition as a corner (see Fig. 14, shapes 13 and 20). 
Their paper even vaguely states their accuracy rating as being "about 95 percent." I would think that if the curvature calculations were significantly better than previous corner finders' that the authors would want to show more proof. I guess we'll see how well the new metrics work this weekend...&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-7918678186754960990?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/7918678186754960990/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=7918678186754960990' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/7918678186754960990'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/7918678186754960990'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2007/09/curvature-estimation-for-pen-input.html' title='A Curvature Estimation for Pen Input Segmentation in Sketch-based Modeling'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-3065406644448173526</id><published>2007-09-17T14:51:00.001-05:00</published><updated>2007-10-03T13:17:54.773-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='sketch recognition'/><category scheme='http://www.blogger.com/atom/ns#' term='corner finding'/><category 
scheme='http://www.blogger.com/atom/ns#' term='vertex detection'/><category scheme='http://www.blogger.com/atom/ns#' term='beautification'/><category scheme='http://www.blogger.com/atom/ns#' term='geometric recognizer'/><title type='text'>A Domain-Independent System for Sketch Recognition</title><content type='html'>&lt;span style="font-weight: bold;"&gt;Summary:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Yu and Cai's paper attempts to find corners in strokes by utilizing the stroke's direction and curvature information. Unlike Sezgin's corner finder, Yu and Cai's does not use speed data.&lt;br /&gt;&lt;br /&gt;The corner finder first checks to see if the stroke can be approximated by a primitive shape such as a line, arc, or circle. If not, the stroke is divided into two smaller strokes at the point of highest curvature. The system again checks to see if each of the subsections can be approximated by primitives, and the cycle of splitting the stroke and checking continues.&lt;br /&gt;&lt;br /&gt;At each primitive approximation step, the corner finder looks at the direction graph of the segment to determine whether the segment is likely to be a line or an arc. The direction graph for a line is flat, whereas direction graphs for arcs and circles are sloped. To fit a line to the segment a least squares fit is used. To fit an arc to a stroke, a line is fitted between the first and last points of the segment. An arc is then formed to include the two end points and the point on the original stroke perpendicular to the midpoint of the line. 
The "center" of this arc (which can be viewed as a segment of a circle) is found, and to determine the error of the arc fit the corner finder calculates the difference between the area of the original stroke and the area of the beautified arc fit.&lt;br /&gt;&lt;br /&gt;A circle is just another case for arc fitting, except the slope of the arc should be above or around 2 pi.&lt;br /&gt;&lt;br /&gt;If the stroke intersects itself, another fit should be computed that takes into account the self-intersection points as corners. The best fit between the original computation method and the new, self-intersection method is taken to be the final corner set.&lt;br /&gt;&lt;br /&gt;The corner finder "cleans up" the best fit by merging or removing segments. If a segment is short in length and not connected to two other segments it is removed (eliminates hooks). If a segment is short in length and connected to two other segments it is split down its midpoint and the two sections are attached to the larger shapes. If two segments have similar angles and proximity (touching, almost connected) then they are merged.&lt;br /&gt;&lt;br /&gt;The paper also includes some basic object recognition, but it really is no more than some simple geometric classification combined with the fitting methods already described.&lt;br /&gt;&lt;br /&gt;The system worked very well on lines, arcs, and polylines. Correct corner rates on these cases were in the mid to upper 90s. Yet, on hybrid line and arc shapes the system had mixed results. The system still failed to find corners where lines and arcs smoothly transition into one another and obtained a 70% accuracy in these cases. The accuracy for other cases was around 90%.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Yu and Cai's paper shows another method for corner finding, this time without using speed data. Yet, their approach was very similar to Sezgin's. 
Although it was described differently, the system still essentially looks at the current corner fit, adds another possible corner, and analyzes the fit again.&lt;br /&gt;&lt;br /&gt;The main change in this system is the way that it handles circles and arcs. The approach would seem to be more accurate on some cases, such as circles. Also, looking at the slope of a large portion of the direction graph seems like a good technique to distinguish between lines and arcs.&lt;br /&gt;&lt;br /&gt;Overall I was disappointed with the system's accuracy. It doesn't appear to be significantly different from Sezgin's. The main issue I have with the corner finder is that the system still only uses curvature values to find corners. In a project I worked on last year I had to find edges within a digital image, and I think a modified edge detection algorithm might work well for finding corners. You start off by looking at a small segment of points and determine the least squares fit. Keep adding chunks of points to the segment until your least squares fit starts to level. When the fit starts to change again, backtrack a little and add a corner. Then repeat, creating a new segment starting at your new corner.&lt;br /&gt;&lt;br /&gt;The method can detect more subtle differences in curvature that a global curvature threshold will not detect (the threshold for each segment is the local change). I'll probably implement this later if I have the time and see how it goes. 
I know one issue with this approach is that it can be computationally intensive if you have long segments.&lt;span style="font-style: italic;"&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-3065406644448173526?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/3065406644448173526/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=3065406644448173526' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/3065406644448173526'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/3065406644448173526'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2007/09/domain-independent-system-for-sketch.html' title='A Domain-Independent System for Sketch Recognition'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-7121840176504865451</id><published>2007-09-12T19:02:00.000-05:00</published><updated>2007-10-03T13:17:16.819-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='sketch recognition'/><category scheme='http://www.blogger.com/atom/ns#' term='corner finding'/><category scheme='http://www.blogger.com/atom/ns#' term='vertex detection'/><category scheme='http://www.blogger.com/atom/ns#' term='beautification'/><title type='text'>Sketch Based Interfaces: Early Processing for Sketch</title><content type='html'>&lt;span style="font-weight: 
bold;"&gt;Summary:&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;Sezgin et al.'s paper describes a way to find corners (vertices) of freely drawn symbols in order to break the symbol into lines and arcs. These lines and arcs can then be used to define the symbol, which can be passed into a geometrical recognizer for classification.&lt;br /&gt;&lt;br /&gt;The vertex finding algorithm works by locating points of high curvature and minimum pen speed. Curvature is defined as being the change in direction at a given point, and speed is the change in arc length over time. Although the paper does not mention this, the curvature value for a point can be found by using a least squares fit over a window of points from &lt;span style="font-style: italic;"&gt;p - k &lt;/span&gt;to &lt;span style="font-style: italic;"&gt;p + k&lt;/span&gt;, and taking the slope of the fit (while watching for vertical lines, of course). Once we have the curvature and speed values for each point in the stroke we can find the local minima points that drop below a speed threshold and local maxima points that rise above a curvature threshold. In this paper the speed threshold is taken to be 90% of the average speed, and the curvature threshold is equal to the average curvature. These minima and maxima are possible vertices for speed and curvature, respectively.&lt;br /&gt;&lt;br /&gt;Take the intersection of both the speed and curvature corners to get a starting set of possible corners, and now Sezgin et al. go through the remaining corners and calculate "fits" for each. The curvature metric is equal to the magnitude of the average of two curvature values some &lt;span style="font-style: italic;"&gt;k&lt;/span&gt; window of points away, divided by the arc length between those points. The speed metric is just 1 - (speed at point / max speed). 
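&lt;br /&gt;&lt;br /&gt;To make the thresholding step concrete, here is a minimal sketch of how I read it (my own illustration, not code from the paper). It assumes the per-point speed and curvature values have already been computed and that the curvature values are magnitudes; the function names are made up, and flat stretches will report every point on the plateau.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;
```python
from statistics import mean


def candidate_vertices(speed, curvature):
    """Return (speed_candidates, curvature_candidates) as lists of indices.

    Assumes speed and curvature are equal-length lists with one value per
    stroke point; both are hypothetical inputs computed elsewhere.
    """
    speed_threshold = 0.9 * mean(speed)    # 90% of the average speed
    curvature_threshold = mean(curvature)  # the average curvature

    speed_candidates = []
    curvature_candidates = []
    for i in range(1, len(speed) - 1):
        # Local speed minimum that drops below the speed threshold.
        is_speed_min = speed[i - 1] >= speed[i] and speed[i + 1] >= speed[i]
        if is_speed_min and speed_threshold > speed[i]:
            speed_candidates.append(i)
        # Local curvature maximum that rises above the curvature threshold.
        is_curv_max = (curvature[i] >= curvature[i - 1]
                       and curvature[i] >= curvature[i + 1])
        if is_curv_max and curvature[i] > curvature_threshold:
            curvature_candidates.append(i)
    return speed_candidates, curvature_candidates


def speed_metric(speed, i):
    """Speed metric from the summary: 1 - (speed at point / max speed)."""
    return 1 - speed[i] / max(speed)
```
&lt;/pre&gt;The two index lists play the role of the speed and curvature candidates, and the slower the pen is at a candidate, the closer its speed metric gets to 1.&lt;br /&gt;&lt;br /&gt;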
The remaining corner candidates are then sorted in their respective lists from highest to lowest metric value.&lt;br /&gt;&lt;br /&gt;The algorithm then takes one remaining corner from each set (remaining speed and remaining curvature) and generates two new "hybrid fits": current corner set + speed candidate, current corner set + curvature candidate. The hybrid fits are then tested using a least squares algorithm between each vertex, and the fit with the least error becomes another possible corner fit for our stroke. More fits are generated until all remaining candidate corners are used up. The best fit is chosen by first keeping the fits whose error falls below a threshold and then taking the one with the fewest corners.&lt;br /&gt;&lt;br /&gt;Since a least squares fitting does not work with curved regions, for any arcs in a stroke the algorithm approximates the arc using a Bezier curve approximation. If the error on the curve we are trying to approximate is greater than a threshold we split the arc at the middle and create two new curve approximations. These curves are used to determine the fit errors.&lt;br /&gt;&lt;br /&gt;The system can then beautify the sketch, which is a trivial process if we already have all of the vertices and the Bezier curve approximations. Overall the system had great results, and users "praised" the research because it allowed them to draw objects free-hand. The system accurately found the corners 96% of the time.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;This paper is a cornerstone for vertex finding, pun intended. The research is well discussed, the algorithms are relatively easy to implement, and the results are fantastic.&lt;br /&gt;&lt;br /&gt;The main thing that I would have liked to see included in the paper was a mention of where and how the algorithm can make a mistake. 
From my experience I know that corner finding is difficult on poorly drawn circles. Speed variations for circles can be slight, and if the circle looks more like an oval (as in the number 0) there can be a small protrusion or bump at the bottom of the circle that acts as a corner. I'd be interested to see how his system performed on these types of shapes.&lt;br /&gt;&lt;br /&gt;Also, thresholds are very tricky when you take into account shapes that include both polylines and arcs. If it is a complex shape that needs to be drawn in a single stroke a user can sometimes pause in the middle of drawing to think about how to draw the next section. This pause will definitely hit as being a vertex, but it destroys the speed average and hurts the finding of more subtle corners.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-7121840176504865451?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/7121840176504865451/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=7121840176504865451' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/7121840176504865451'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/7121840176504865451'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2007/09/sketch-based-interfaces-early.html' title='Sketch Based Interfaces: Early Processing for Sketch'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-4206983115299685720</id><published>2007-09-06T15:22:00.000-05:00</published><updated>2007-10-03T13:16:42.765-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='sketch recognition'/><category scheme='http://www.blogger.com/atom/ns#' term='linear classifier'/><category scheme='http://www.blogger.com/atom/ns#' term='features'/><category scheme='http://www.blogger.com/atom/ns#' term='search'/><title type='text'>MARQS: Retrieving Sketches Using Domain- and Style-Independent Features Learned from a Single Example Using a Dual-Classifier</title><content type='html'>&lt;span style="font-weight: bold;"&gt;Summary:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Brandon's paper proposes a search algorithm that can search a laboratory notebook using sketches. The algorithm itself needs to be invariant to user drawing styles and account for rotation and scaling issues in sketches. It also must be able to recognize sketched symbols from only one drawing since a user searching through their notes might have only drawn a certain sketch one time. Brandon created the MARQS system, a media library with sketch queries, in order to test the algorithm.&lt;br /&gt;&lt;br /&gt;To reduce rotation variance the algorithm rotates each sketch so that its major axis is horizontal. The major axis is defined as being the line between the two farthest points within the sketch. The four features used are: bounding box aspect ratio, pixel density, average curvature, and the number of corners found. If only one sketch has been drawn before, a query search will run a single classifier which will do a simple check for the error between the query features and the drawn example. 
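&lt;br /&gt;&lt;br /&gt;Going back to the rotation step for a moment, here is a rough sketch of how the major-axis alignment could look (my own illustration, not code from the paper). It assumes a sketch is just a list of (x, y) tuples, finds the farthest pair by brute force, and uses made-up function names; the aspect ratio helper shows one of the four features being computed afterwards.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;
```python
import math
from itertools import combinations


def rotate_to_major_axis(points):
    """Rotate points so the line between the two farthest points is horizontal."""
    # Brute-force search for the farthest pair; O(n^2), fine for a sketch.
    a, b = max(combinations(points, 2),
               key=lambda pair: math.dist(pair[0], pair[1]))
    angle = math.atan2(b[1] - a[1], b[0] - a[0])
    cos_a, sin_a = math.cos(-angle), math.sin(-angle)
    # Rotate every point about the first endpoint of the major axis.
    return [
        (
            (x - a[0]) * cos_a - (y - a[1]) * sin_a,
            (x - a[0]) * sin_a + (y - a[1]) * cos_a,
        )
        for x, y in points
    ]


def bounding_box_aspect_ratio(points):
    """Width divided by height of the axis-aligned bounding box."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    width, height = max(xs) - min(xs), max(ys) - min(ys)
    return width / height if height else float("inf")
```
&lt;/pre&gt;After the rotation, features like the bounding box aspect ratio no longer depend on the angle the sketch was drawn at.&lt;br /&gt;&lt;br /&gt;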
A linear classifier is used when there is more than one example sketch during the query.&lt;br /&gt;&lt;br /&gt;The average ranking (or classifier rank) for each sketch query was 1.51. As the system was used more often, the results showed that the ranking improved as the system moved from the single-example classifier to the linear classifier.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion:&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;The system seems to work pretty well for such a simple search algorithm, especially since it only uses four features. The downward trend in the ranking preference is also very promising since searches will likely be performed often.&lt;br /&gt;&lt;br /&gt;MARQS' main limitation is in the features it can use since the system runs a linear classification on a freely drawn sketch. Each feature has to be invariant to the number of strokes, rotation, and scale. Yet, the feature for the number of perceived corners could easily change between sketches if somebody drew a tree with one jagged line one time and with small dashes another time. Merging close endpoints together into one larger stroke might be beneficial to the system, and then some other features like average length of strokes might be used.&lt;br /&gt;&lt;br /&gt;Another thing to think about: if the journal system is ever implemented, how do you distinguish between a sketch and handwriting? I recently saw some great research from Auckland, but I can't find the paper online to link here. 
If you really want to know more ask me for a copy.&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-4206983115299685720?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/4206983115299685720/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=4206983115299685720' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/4206983115299685720'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/4206983115299685720'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2007/09/marqs-retrieving-sketches-using-domain.html' title='MARQS: Retrieving Sketches Using Domain- and Style-Independent Features Learned from a Single Example Using a Dual-Classifier'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-5907337036605730762</id><published>2007-09-04T21:57:00.000-05:00</published><updated>2007-10-03T13:14:22.921-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='sketch recognition'/><category scheme='http://www.blogger.com/atom/ns#' term='linear classifier'/><category scheme='http://www.blogger.com/atom/ns#' term='gesture'/><category scheme='http://www.blogger.com/atom/ns#' term='features'/><category scheme='http://www.blogger.com/atom/ns#' term='feature similarity'/><title type='text'>"Those Look 
Similar!" Issues in Automating Gesture Design Advice</title><content type='html'>&lt;span style="font-weight: bold;"&gt;Summary:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Gestures that are similar are hard for a computer to classify and difficult for a user to remember. Long and Landay created a tool called &lt;span style="font-style: italic;"&gt;quill &lt;/span&gt;that helps designers improve any gestures they want to use by highlighting any gesture similarities. Designers can then take these similarities into account and redesign their gestures to be more distinct.&lt;br /&gt;&lt;br /&gt;When using &lt;span style="font-style: italic;"&gt;quill&lt;/span&gt;, designers take a gesture they want implemented in their system and draw some examples into the tool. After more than one gesture class has been drawn, &lt;span style="font-style: italic;"&gt;quill&lt;/span&gt; begins to analyze the gestures for similarities. Similarity was determined to be based heavily upon gesture curviness and angle, as well as the density of the gesture. For more on how similarity was determined see "Visual Similarity of Pen Gestures."&lt;br /&gt;&lt;br /&gt;The paper goes into detail about how presenting advice to the user is a large interface challenge. Long and Landay did not want &lt;span style="font-style: italic;"&gt;quill&lt;/span&gt; to annoy expert gesture designers, but they also wanted to provide enough advice that both novices and experts could benefit from the system. In the end they decided to delay advice until the designer decides to test a gesture. The authors reasoned that designers were probably done tweaking their initial gestures by the time they were testing. 
Yet, &lt;span style="font-style: italic;"&gt;quill&lt;/span&gt; also provides more subtle warnings to the designers whenever they pause.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Improving gesture designs by letting the computer point out when gestures might be confusing is a great way to safeguard against poor gesture choices. The question of when to present advice to a user is also a large UI problem outside of &lt;span style="font-style: italic;"&gt;quill&lt;/span&gt;; for instance, when should a stroke/symbol be beautified or when should a symbol be recognized? Long and Landay provided a good overview of their decisions, and I really liked their choice to present advice when the designer decides to test their classes.&lt;br /&gt;&lt;br /&gt;I'm curious about how good the advice actually is and how annoying it might be if the same advice keeps appearing every time I test my classes. If I design a left-to-right horizontal line to be a "Page Forward" gesture and a right-to-left horizontal line to be "Page Back" the computer might tell me that the gestures are similar, but the gestures have such a good mapping to forward and back arrows that users shouldn't confuse the two. 
I wouldn't want the system to constantly be warning me of this issue.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-5907337036605730762?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/5907337036605730762/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=5907337036605730762' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/5907337036605730762'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/5907337036605730762'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2007/09/those-look-similar-issues-in-automating.html' title='&quot;Those Look Similar!&quot; Issues in Automating Gesture Design Advice'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-1894361589885567387</id><published>2007-08-31T13:53:00.000-05:00</published><updated>2007-10-03T13:14:56.224-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='sketch recognition'/><category scheme='http://www.blogger.com/atom/ns#' term='linear classifier'/><category scheme='http://www.blogger.com/atom/ns#' term='gesture'/><category scheme='http://www.blogger.com/atom/ns#' term='features'/><title type='text'>Specifying Gestures by Example</title><content type='html'>&lt;span style="font-weight: bold;"&gt;Summary:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Rubine's gesture recognition system, GRANDMA, 
is a single-stroke gesture recognizer and toolkit that lets developers add gestures to their applications. Gestures can be useful when they provide a means for intuitive input. As an example, the paper shows how a gesture-based drawing program (GDP) could use gestures to create simple shapes and edit them. These gestures could be created with GRANDMA by first defining what types (or classes) of gestures will be used and then collecting examples for each class. It was empirically determined that fifteen gesture examples should suffice for each class.&lt;br /&gt;&lt;br /&gt;Drawn gestures are composed of an array of time-stamped points. Thirteen features are calculated for each gesture, such as the starting angles of the gesture, the angle of the bounding box, length of the bounding box diagonal, the total length and rotation of the gesture, the smoothness of the gesture, and the time taken to draw the gesture. These features are invariant to gesture placement (i.e. where the gesture was drawn), but they do take into account scaling and rotation.&lt;br /&gt;&lt;br /&gt;To classify the gesture we take the dot product of the feature vector with the weight vector of each defined gesture class. The class with the maximum dot product is taken as the classification. The weight vectors are computed during gesture training. In training, a feature vector is calculated for each drawn example, and the average feature vector of the examples is taken. A weight vector for the class is then found by trying to find the defining features for the set of examples using covariance matrices (http://mathworld.wolfram.com/Covariance.html).&lt;br /&gt;&lt;br /&gt;Overall the gesture system worked very well, but as the number of gesture classes increased the recognition rate lowered. 
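&lt;br /&gt;&lt;br /&gt;To make the classification step described above concrete, here is a minimal sketch (my own illustration, not Rubine's code). It only covers the dot product and the pick-the-maximum step as I summarized it; the function name, the class names, and the weight values in the comment are all made up, and real weights would come out of the training step.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;
```python
def classify(features, class_weights):
    """Return the class whose weight vector gives the largest dot product.

    features is a list of feature values for one gesture; class_weights maps
    a class name to a weight list of the same length. Both are hypothetical
    stand-ins for what the training step would produce.
    """
    def score(weights):
        return sum(w * f for w, f in zip(weights, features))

    return max(class_weights, key=lambda name: score(class_weights[name]))


# Made-up weights for two classes over three features (not trained values):
# weights = {"line": [1.0, -0.5, 0.0], "circle": [-0.2, 0.8, 1.0]}
# classify([0.9, 0.1, 0.0], weights) picks "line" with these numbers.
```
&lt;/pre&gt;Since classification is just one dot product per class, it is easy to see why it can run in real time.&lt;br /&gt;&lt;br /&gt;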
Adding training examples raised the recognition rate up to around 50 examples, but after that the rate appeared either to plateau or to suffer from overfitting.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Rubine's paper is a great one that shows that sketch recognition can be simple, fast, and reliable if the user is constrained in certain ways. Gestures are easy to define with GRANDMA, and the calculations to classify gestures can happen in real time. The system also had outstanding recognition results with small sets of 5 to 10 gestures. Even when the number of classes was increased to 30, the recognition rate dropped but was still acceptable at around 96.5%.&lt;br /&gt;&lt;br /&gt;The main problem with the system is that it does require a lot of constraints on the user's end. Like Palm Pilot Graffiti, over time the user will become accustomed to drawing in a certain way. This isn't necessarily a bad thing. With any new appliance or application people need to be trained to use it. My new toaster works much differently than my old one and I'm still adjusting to the settings. Even with newer non-gesture software, such as Tablet PC handwriting recognition, I have grown accustomed to drawing my lower case Ls in cursive since the print version is confused with the number 1 too often. Yet, when an application is thought of as being intuitive there is much less wiggle room for how much training is needed. If Photoshop does not work as I intended I'm likely to blame myself for a mistake, whereas if the computer does not recognize my circle gesture I'm more likely to blame the software.&lt;br /&gt;&lt;br /&gt;In the case of GRANDMA the rotation and scale constraints are a bit too much in my opinion; I would try to normalize everything to a standardized bounding box to eliminate scale. 
Yet these could be acceptable in some situations, such as full keyboard gestures where we try to recognize '/' versus '|' versus '1'.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Rubine, D. 1991. Specifying gestures by example. In &lt;i&gt;Proceedings of the 18th Annual Conference on Computer Graphics and interactive Techniques&lt;/i&gt; SIGGRAPH '91. ACM Press, New York, NY, 329-337.&lt;br /&gt;&lt;br /&gt;http://portal.acm.org/citation.cfm?id=122753&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-1894361589885567387?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/1894361589885567387/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=1894361589885567387' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/1894361589885567387'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/1894361589885567387'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2007/08/specifying-gestures-by-example.html' title='Specifying Gestures by Example'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-4653991074363179496</id><published>2007-08-29T20:33:00.000-05:00</published><updated>2007-10-03T13:11:08.334-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='sketch recognition'/><title type='text'>Introduction to Sketch Recognition</title><content type='html'>&lt;span 
style="font-weight: bold;"&gt;Summary:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;    &lt;/span&gt;&lt;/span&gt;The paper is a brief but comprehensive look at the sketch recognition field. Some of the topics covered include current pen-based hardware, pen-centric software, and various uses for Tablet PCs.&lt;br /&gt; &lt;br /&gt;The discussion of Tablet PC use focuses on how they could be used in an educational environment. Pen-based technology has been shown to have a mixed effect on student performance in a classroom, but overall the reception by students has been positive. Teachers feel that some Tablets and software help with class lectures, such as using Windows Journal to create presentation templates. Yet, Tablet hardware also limits teachers since they are tethered to a projector through the cables or forced to use a small amount of space. Tablet software can also help students learn on their own. MathPad allows students to check their math notes and homework, and Physics Simulator allows students to model and simulate mechanical engineering diagrams.&lt;br /&gt;&lt;br /&gt;In two case studies, teachers in middle and high school evaluated Tablet PCs in their classes. Both teachers used the technology differently but enjoyed using the computers to enhance their teaching. One teacher used Tablets to record demos, either at home or in class, for archiving. The other used Tablets to create presentation templates that they could save and write over during class. The templates could be reused annually to save the teacher class preparation time.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Discussion:&lt;br /&gt;&lt;br /&gt;   &lt;/span&gt;Good discussion topics for this paper would be "Where do you think this technology is heading?" and "How can we improve it further for classroom use?" Tablet technology is getting cheaper and software is providing better support for pens. 
Although a limitation right now is cost of technology (smart boards vs. regular white boards &amp;amp; projectors, notebook vs. tablet) what people should think about is "What if we had no limitations?" It's a fun science fiction question to ask at this time.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-4653991074363179496?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/4653991074363179496/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=4653991074363179496' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/4653991074363179496'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/4653991074363179496'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2007/08/introduction-to-sketch-recognition.html' title='Introduction to Sketch Recognition'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-8001946218408116540</id><published>2007-08-29T20:26:00.000-05:00</published><updated>2007-10-03T13:12:49.569-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='sketch recognition'/><title type='text'>Sketchpad</title><content type='html'>&lt;span style="font-weight: bold;"&gt;Summary:&lt;/span&gt;  &lt;p class="MsoNormal"&gt;&lt;span style=""&gt;                &lt;/span&gt;Sketchpad was the first sketch recognition system and was developed in the 
early sixties.&lt;span style=""&gt;  &lt;/span&gt;The system used a light pen and keyboard buttons to interact with the computer and create design-quality drawings.&lt;span style=""&gt;  &lt;/span&gt;The light pen would draw or select objects on a computer screen while the keyboard would toggle various constraints to place on the current drawing or selection.&lt;/p&gt;  &lt;p class="MsoNormal" style="text-indent: 0.5in;"&gt;The constraint system was one of the key features in Sketchpad that allowed the system to produce drawings that looked good.&lt;span style=""&gt;  &lt;/span&gt;With constraints a user could draw perfect lines and circles without having to worry about pen noise; they only had to hold a button indicating that what would be drawn next was a line or circle.&lt;span style=""&gt;  &lt;/span&gt;Furthermore, touching up a drawing was easy with constraints that could force selected groups of lines to be horizontal, parallel, or perpendicular.&lt;span style=""&gt;  &lt;/span&gt;Sketchpad also allowed corner snapping constraints which locked a corner of one object onto an edge or corner of another.&lt;span style=""&gt;  &lt;/span&gt;The constraint system in Sketchpad was implemented using logic, where constraints were variables and the list of constraints on a system could be evaluated to ensure that the constraints were satisfied.&lt;span style=""&gt;  &lt;/span&gt;Relaxation and one pass are the two constraint satisfaction methods mentioned.&lt;/p&gt;  &lt;p class="MsoNormal" style="text-indent: 0.5in;"&gt;Sketchpad worked best when a sketch required a lot of repeated patterns and shapes.&lt;span style=""&gt;  &lt;/span&gt;The system could create instances (shallow copies) or copies (deep copies) of drawn objects, which could then be resized, rotated, and moved around the drawing space.&lt;span style=""&gt;  &lt;/span&gt;A design requiring a lot of repetition, such as the hexagonal example in the paper, can be created very quickly without having to 
repeatedly draw hundreds of hexagons.&lt;/p&gt;  &lt;p class="MsoNormal" style="text-indent: 0.5in;"&gt;&lt;o:p&gt; &lt;/o:p&gt;&lt;/p&gt;  &lt;p style="font-weight: bold;" class="MsoNormal"&gt;Discussion:&lt;/p&gt;  &lt;p class="MsoNormal"&gt;&lt;span style=""&gt;                &lt;/span&gt;Sketchpad is a great system that was revolutionary for its time.&lt;span style=""&gt;  &lt;/span&gt;As we discussed in class, it was the first sketch recognition system, and the paper highlighted some key areas where sketch recognition technology could be beneficial, such as architecture, art, and engineering. &lt;span style=""&gt; &lt;/span&gt;Sketchpad also established some object-oriented programming techniques by providing shallow and deep copies of drawn objects. &lt;/p&gt;  &lt;p class="MsoNormal" style="text-indent: 0.5in;"&gt;The use of constraints was a good method to ensure that the drawing looked good, but it also forced the user to remember many keyboard commands.&lt;span style=""&gt;  &lt;/span&gt;Also, the keyboard can only handle as many constraints as the number of keys if the constraints are kept on separate keys; key combinations could allow for many more constraints, but at a high cost to usability.&lt;span style=""&gt;  &lt;/span&gt;To alleviate this burden it might have been better to have two separate screens: the drawing screen and a constraint menu screen.&lt;span style=""&gt;  &lt;/span&gt;The constraint buttons on the side of the Sketchpad window could be shifted to the new constraint menu screen with each button pointing to a menu option.&lt;span style=""&gt;  &lt;/span&gt;The options would follow from the constraints described in the paper, with drawing and selection constraints in hierarchical menus.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-8001946218408116540?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link 
rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/8001946218408116540/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=8001946218408116540' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/8001946218408116540'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/8001946218408116540'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2007/08/sketchpad.html' title='Sketchpad'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3022028357504155915.post-3274320639208450376</id><published>2007-08-29T19:33:00.000-05:00</published><updated>2007-08-29T20:25:37.132-05:00</updated><title type='text'>Introduction</title><content type='html'>&lt;p class="MsoNormal"&gt;&lt;span style=""&gt;&lt;span style="font-weight: bold;"&gt;Name:&lt;/span&gt;    Aaron Wolin&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;span style="font-weight: bold;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;span style="font-weight: bold;"&gt;Year:&lt;/span&gt;      First Year PhD student&lt;span style="font-weight: bold;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;span style="font-weight: bold;"&gt;Email:&lt;/span&gt;     awolin at neo dot tamu dot edu&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;span style="font-weight: bold;"&gt;Academic Interests:&lt;/span&gt;&lt;span style=""&gt;&lt;/span&gt;&lt;/p&gt;  &lt;p class="MsoNormal"&gt;&lt;span style=""&gt;    &lt;/span&gt;My current academic interests consist of Sketch Recognition, AI, and 
HCI.&lt;span style=""&gt;  &lt;/span&gt;I’ve been involved with sketch recognition projects for over a year now and find them very exciting from an HCI and AI perspective. &lt;span style=""&gt; &lt;/span&gt;The use of a pen as input heavily constrains traditional mouse and keyboard input possibilities while simultaneously allowing for new applications, such as handwriting programs.&lt;span style=""&gt;  &lt;/span&gt;Sketch recognition also requires a great deal of AI since computers need to have knowledge of what is being drawn.&lt;span style="font-weight: bold;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;span style="font-weight: bold;"&gt;Relevant Experience:&lt;/span&gt;&lt;span style=""&gt;&lt;span style="font-weight: bold;"&gt;&lt;/span&gt; &lt;/span&gt;&lt;/p&gt;  &lt;ul&gt;&lt;li&gt;&lt;!--[if !supportLists]--&gt;&lt;span style="font-family: Symbol;"&gt;&lt;span style=""&gt;&lt;span style="font-family: &amp;quot;Times New Roman&amp;quot;; font-style: normal; font-variant: normal; font-weight: normal; font-size: 7pt; line-height: normal; font-size-adjust: none; font-stretch: normal;"&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;Impro-Visor – Research application from Harvey Mudd College that teaches amateur jazz musicians to compose solos.&lt;span style=""&gt;  &lt;/span&gt;Musicians enter the notes of a solo within a composition window, and advice concerning the musical “correctness” is displayed to them.&lt;span style=""&gt;  &lt;/span&gt;The advice manager also allows students to pick from good scales and tones that might fit their solo.&lt;a href="http://www.cs.hmc.edu/%7Ekeller/jazz/improvisor.html"&gt; http://www.cs.hmc.edu/~keller/jazz/improvisor.html&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;      &lt;ul&gt;&lt;li&gt;&lt;!--[if !supportLists]--&gt;&lt;span style="font-family: Symbol;"&gt;&lt;span style=""&gt;&lt;span style="font-family: &amp;quot;Times New Roman&amp;quot;; font-style: normal; font-variant: normal; font-weight: normal; 
font-size: 7pt; line-height: normal; font-size-adjust: none; font-stretch: normal;"&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;Circuit Diagram Recognizer – This is a continuing sketch recognition research project at Harvey Mudd College.&lt;span style=""&gt;  &lt;/span&gt;The overall goal of the project is to provide students feedback on their drawn circuit diagrams, such as from class notes.&lt;span style=""&gt;  &lt;/span&gt;An ideal situation would be for an engineering student to draw a diagram freehand with an accompanying truth table, run it through our recognizer, and then see if the circuit diagram has been implemented correctly based on the truth table provided.&lt;span style=""&gt;  &lt;/span&gt;The project is heavily AI based since we do not constrain student drawing styles. &lt;a href="http://www.cs.hmc.edu/%7Ealvarado/research/sketch.html"&gt;http://www.cs.hmc.edu/~alvarado/research/sketch.html&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;      &lt;ul&gt;&lt;li&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;span style="font-family: Symbol;"&gt;&lt;span style=""&gt;&lt;span style="font-family: &amp;quot;Times New Roman&amp;quot;; font-style: normal; font-variant: normal; font-weight: normal; font-size: 7pt; line-height: normal; font-size-adjust: none; font-stretch: normal;"&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;!--[endif]--&gt;Document Finding (before OCR) – Another project at Harvey Mudd College involved me working with a document digitizing company called Laserfiche.&lt;span style=""&gt;  &lt;/span&gt;Our project’s focus was to find and crop documents in pictures taken by digital cameras.&lt;span style=""&gt;  &lt;/span&gt;This essentially uses the camera as a scanner and would make document digitizing and OCR software more portable.&lt;/li&gt;&lt;/ul&gt;  &lt;p class="MsoNormal"&gt;&lt;span style="font-weight: bold;"&gt;Why I'm taking this class:&lt;/span&gt;&lt;span style=""&gt;&lt;/span&gt;&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;span style=""&gt;    
&lt;/span&gt;Although I’ve already had some sketch recognition experience my research has been focused on free-style sketch recognition.&lt;span style=""&gt;  &lt;/span&gt;There are many other areas (gesture based systems, sketch beautification) that I have not explicitly worked with, and I want to expand my knowledge of the field.&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;span style="font-weight: bold;"&gt;What I hope to gain:&lt;/span&gt;&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;span style="font-weight: bold;"&gt;    &lt;/span&gt;A general expansion of my sketch recognition knowledge, as well as having fun and learning techniques that can be applied to my research.&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;span style="font-weight: bold;"&gt;What I'll be doing in 5 years:&lt;/span&gt;&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;span style="font-weight: bold;"&gt;    &lt;/span&gt;I'll (hopefully) be finishing up my research and time here at Texas A&amp;amp;M.&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;span style="font-weight: bold;"&gt;What I'll be doing in 10 years:&lt;/span&gt;&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;span style="font-weight: bold;"&gt;    &lt;/span&gt;I don't know, going to Mars.  My current thoughts are to go into industrial research, but anything can change in 10 years.  Four years ago I wouldn't have said I'd be going to graduate school.  In 10 years anything can happen.&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;span style="font-weight: bold;"&gt;Non-academic interests:&lt;/span&gt;&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;span style="font-weight: bold;"&gt;&lt;span style="font-weight: bold;"&gt;    &lt;/span&gt;&lt;/span&gt;Reading, movies, playing poker, concerts, board games, mixology&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;span style="font-weight: bold;"&gt;Fun Story:&lt;/span&gt;&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;span style="font-weight: bold;"&gt;    &lt;/span&gt;Every year at Harvey Mudd my friends and I would make our own sushi.  
One of my friends had some sushi making supplies and he taught everybody how to make rolls.   During our third year we discovered a website where you can get really fresh fish, but the catch was that you had to order over $50 worth in order to qualify for shipping.  The last two years we had so much fish after we had gorged ourselves we were shouting at people outside our door to come on in and eat sushi.&lt;br /&gt;&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;span style="font-weight: bold;"&gt;&lt;span style="font-weight: bold;"&gt;&lt;span style="font-weight: bold;"&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;span style="font-weight: bold;"&gt;&lt;/span&gt;&lt;br /&gt;&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3022028357504155915-3274320639208450376?l=awolin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://awolin.blogspot.com/feeds/3274320639208450376/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3022028357504155915&amp;postID=3274320639208450376' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/3274320639208450376'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3022028357504155915/posts/default/3274320639208450376'/><link rel='alternate' type='text/html' href='http://awolin.blogspot.com/2007/08/introduction.html' title='Introduction'/><author><name>Grandmaster Mash</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry></feed>
