Wednesday, December 5, 2007

What Are Intelligence? And Why?

Summary:

Randall Davis's 1996 presidential address was an overview of human intelligence. To understand how artificial intelligence might be created, it is important to understand the theories behind current human and animal reasoning. The five views of reasoning discussed are mathematical logic, psychology, biology, statistics, and economics.

Since the paper was mainly a general overview, I'm more interested in discussing some aspects of the paper as they relate to AI.

Discussion:

This summary is being written post-Davis conference call, so I've had a bit of time to think about our discussion with him, as well as my thoughts from before.

The main conclusion I reached after our call is that we should not expect an artificial intelligence breakthrough to have happened by now. From an evolutionary perspective, Davis mentions that intelligence is built over time from individually formed pieces. Each part of an organ develops incrementally by building on previous developments. Even the brain is composed of different, compartmentalized sections. Sometime in Homo sapiens' past, these compartments connected to each other in a unique way, and intelligence was formed.

Examining the development of AI, I noticed that each subfield of AI is just like another organ or section of the brain. By themselves, the subfields are too focused to offer true intelligence. Tools to recognize language are not built to recognize images, image recognition engines cannot develop plans, and planners cannot understand speech. The only way to increase the intelligence of a system is to find ways to interconnect all of these components to offer reasoning.

Each subfield should also be as developed as possible. Right now, the evolution of AI is in individual specialization. As highly accurate, complete systems are developed, become widely available, and are easy to use, more diverse systems will be created and the focus on specialization will fade.

My main concern is that research on merging fields will be rather slow. Current graduate students are less likely to work on areas that require expertise in multiple fields, since Ph.D.s focus on high specialization. As the number of fields required to improve the intelligence of systems grows, research on artificial intelligence systems will require coordinated effort from multiple professors and students.

Monday, November 26, 2007

SketchREAD: A Multi-Domain Sketch Recognition Engine

Summary:

Alvarado's SketchREAD is a sketch recognition system built with Bayesian networks. The engine can be tuned to run in multiple domains, and the Bayesian networks allow small errors to be corrected.

SketchREAD uses a geometric sketching language, much like the one found in LADDER, to describe simple domain shapes. The context of how these shapes appear within a domain, such as how arrows are used to connect family members in family trees, sits at a higher level than simple geometric recognizers. Trying every possible combination of strokes to find the "best" fit for all the shapes is time consuming, so SketchREAD instead models this context with Bayesian networks.

Shapes themselves have hypotheses linking to primitives and constraints. For instance, the hypothesis for an Arrow would cause three Lines and the constraints between them. Higher-level context can also be modeled, such as a Mother-Son link causing a Mother, a Son, and a Line. Partial hypotheses can also be generated by incorporating "virtual" nodes, which are primitive hypotheses not linked to observations.

To generate hypotheses, SketchREAD has three steps:
  1. Bottom-up: Strokes that the user draws are recognized as primitives and low-level shapes
  2. Top-down: System attempts to find subshapes missing from possible interpretations. Strokes can be reinterpreted.
  3. Pruning: Unlikely interpretations are removed from considerations.
As an example, Alvarado proposes that an ellipse is drawn in a family tree domain. The ellipse is recognized as a low-level shape, and then an interpretation for the ellipse is created, as well as partial interpretations for Mother-Son, Mother-Daughter, etc. These partial hypotheses are templates, and the drawn shape is fit into a single slot of the template. Later, the shape can be shuffled within the template. To keep the number of interpretations from exploding and becoming intractable, high-level hypotheses are generated in the Bayesian network only from complete templates. Also, any polyline is assumed to be part of only one shape/interpretation.
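
To make the template mechanic concrete, here is a minimal sketch of a partial hypothesis as a slot-filling template (the class layout and shape names are my own illustration, not SketchREAD's code):

```python
# A partial hypothesis as a slot-filling template; only complete templates
# would spawn high-level hypotheses in the Bayesian network, per the pruning
# rule above.  (Structure and names are illustrative, not the paper's code.)
class Template:
    def __init__(self, label, slot_types):
        self.label = label
        self.slots = [(t, None) for t in slot_types]  # typed, unfilled slots

    def try_fill(self, shape_type, shape):
        """Fit a newly recognized low-level shape into the first open slot of its type."""
        for i, (t, filled) in enumerate(self.slots):
            if t == shape_type and filled is None:
                self.slots[i] = (t, shape)
                return True
        return False

    def is_complete(self):
        return all(filled is not None for _, filled in self.slots)

# Bottom-up step: recognizing an ellipse spawns partial interpretations for
# every high-level shape that could contain one.
ellipse = {'type': 'Ellipse', 'points': []}  # stands in for real stroke data
candidates = [Template('Mother-Son', ['Ellipse', 'Quadrilateral', 'Line']),
              Template('Mother-Daughter', ['Ellipse', 'Ellipse', 'Line'])]
for t in candidates:
    t.try_fill('Ellipse', ellipse)
```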

In the domain of family trees, SketchREAD improves over baseline performance in symbol recognition by reducing recognition errors by over 50%. Circuit diagrams provide a harder domain, and here SketchREAD reduces the number of errors by 17% over a baseline. The time it takes to process each stroke increases with the number of strokes already drawn.

Discussion:

Although SketchREAD improves the accuracy for the tested domains, the final accuracy was not yet good enough to be used within any complex domain's interface, which was one of the goals of the system; Alvarado acknowledges this in the paper's discussion. Also, allowing polylines to be part of only one interpretation greatly hurts circuit diagram domains, since many circuit symbols can be drawn with a single stroke.

Monday, November 19, 2007

Ink Features for Diagram Recognition

Summary:

Patel et al. use ink features to divide a sketch into text and shape strokes with a classification tree. The authors define forty-six features for use in the divider, all of which are listed in an appendix.

Sketch data was collected from 26 people, each person drawing 9 sketches. Each stroke in the sketch was then labeled as being part of text or a shape, and analysis was performed on these sketches to determine the relevant features distinguishing between the two components. The authors used a statistical partitioning technique available in the R statistical package to divide the data into two components in the feature space and determine the most relevant features. A classification tree was then built with the most relevant feature as the root, where each node defines a threshold that separates the space into text and shape components.
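
As a toy analogue of this pipeline, a shallow decision tree can be fit on per-stroke features and its learned thresholds read off directly (the three features below are illustrative stand-ins for the paper's forty-six, and scikit-learn stands in for R's partitioning):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Rows are strokes; columns are made-up ink features, not the paper's list.
X = np.array([[12.0, 0.8, 40.0],     # short, curvy, small bbox   -> text-like
              [300.0, 0.1, 9000.0],  # long, straight, large bbox -> shape-like
              [15.0, 0.9, 55.0],
              [280.0, 0.2, 8500.0]])
y = np.array(['text', 'shape', 'text', 'shape'])

tree = DecisionTreeClassifier(max_depth=3).fit(X, y)
print(export_text(tree, feature_names=['length', 'curvature', 'bbox_area']))
print(tree.predict([[20.0, 0.7, 60.0]]))  # -> ['text']
```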

The authors obtained great results with this simple classification tree technique. The new classifier greatly reduced the overall number of misclassified strokes compared to both Microsoft's classifier and InkKit's. Although in the test results the new classifier misclassifies shapes a bit more than either baseline system, text misclassification is much lower in the new system.

Discussion:

The actual accuracy numbers are not impressive, since the system still misclassifies too many strokes to be considered "accurate", but the improvement over the baselines is very impressive and pushes research on text/shape classification in a new direction. The authors' analysis of the features (and inclusion of the features in an appendix!) was appreciated, and they discussed which features the Rpart partitioning system found most helpful. Since this model should run very quickly after training, it could easily be combined with other models to improve text/shape recognition even more.

Bayesian Networks Without Tears

Summary:

Charniak provides a decent overview of Bayes nets in this publication. Bayesian networks model causality and the probabilities associated with cause-and-effect relationships. Since Bayesian networks are a highly graphical and visual model, it would be rather convoluted to try and describe them in detail here. Instead, I will provide brief comments on topics brought up in the paper.

Bayesian networks are DAGs, or directed acyclic graphs. The reason for this is that there can be no infinite cause-and-effect loops between two or more nodes. The nodes themselves are states of the world, and arcs (edges) in the graph specify the causal connections between the nodes. Root nodes in the graph must have prior probabilities specified, and every other node needs probabilities conditioned on its parents; these probabilities are defined by experts or from empirical data.

One of the main attractions of Bayesian networks is the incredible savings in calculating joint probabilities for the graph. In a typical case with n binary variables, 2^n - 1 probabilities are needed to specify a complete joint distribution. In a Bayesian network, however, the number of probabilities grows exponentially only with the number of causal connections entering each node: a node with 1 incoming connection needs 2 conditional probabilities, 2 connections need 4, 3 connections need 8, and so on. This savings is due to the independence assumptions present in Bayesian networks. Two nodes are dependent on each other only if there is a d-connecting path between them, where (roughly) a path d-connects given evidence E if every interior node along it is either linear or diverging and not in E, or converging and in E (or with a descendant in E).
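
The savings is easy to sanity-check with a few lines (the parent counts below are made up for illustration):

```python
# Full joint over n binary variables vs. per-node conditional tables.
def full_joint_size(n):
    return 2 ** n - 1

def bayes_net_size(parent_counts):
    # one table of 2^k entries per node with k incoming arcs
    return sum(2 ** k for k in parent_counts)

print(full_joint_size(10))                             # 1023
print(bayes_net_size([0, 1, 1, 2, 2, 1, 0, 3, 1, 2]))  # 30
```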

Wednesday, November 14, 2007

Sketch Interpretation Using Multiscale Models of Temporal Patterns

Summary:

Certain sketch domains contain temporal information that can assist in symbol recognition. For instance, digital circuit diagrams can be highly time-dependent when restricted to certain symbols: resistors are typically drawn with a consistent stroke order, as are capacitors and batteries. Using HMMs to take advantage of this temporal information can improve sketch recognition accuracy.

Sezgin uses an HMM modeled as a DBN to maximize the likelihood of the observable features given the grouping's label. The DBN takes observables as input, obtained through features computed on the grouping's primitives, and infers the probability of a stroke-level model given the observables. The observables are also modeled with a mixture of Gaussians, although I'm not sure what the mixture model is used for. When this DBN is combined into an HMM, the two other nodes added are an object hypothesis and an ending hypothesis. The object hypothesis predicts the object type (Resistor, Wire, etc.), whereas the ending hypothesis predicts when the symbol is finished being drawn.

Inference in a DBN is linear in the sequence length, whereas inference in an HHMM (hierarchical HMM) is O(T^3). Therefore, Sezgin converts the model to a DBN before inference is conducted; this step was not explained. During training, the use of continuous variables could cause numerical underflow during belief propagation, so a specialized algorithm, Lauritzen-Jensen belief propagation, was used to avoid the instability issues.
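
Sezgin's actual model is a DBN with continuous observations, but the core temporal idea can be illustrated with an ordinary discrete HMM forward pass, where belief over the object label is updated as primitives arrive in drawing order (the labels, probabilities, and zigzag/straight observation alphabet are all invented for the example):

```python
import numpy as np

states = ['resistor', 'wire']           # hypothetical object labels
trans = np.array([[0.7, 0.3],           # P(next state | current state)
                  [0.4, 0.6]])
emit = np.array([[0.8, 0.2],            # P(observation | state);
                 [0.1, 0.9]])           # observation 0 = zigzag, 1 = straight
prior = np.array([0.5, 0.5])

def forward(observations):
    """Return P(state | observations so far) after each primitive."""
    belief = prior * emit[:, observations[0]]
    belief /= belief.sum()
    beliefs = [belief]
    for o in observations[1:]:
        belief = (trans.T @ belief) * emit[:, o]  # predict, then weigh by evidence
        belief /= belief.sum()
        beliefs.append(belief)
    return beliefs

# Three zigzag primitives in a row: belief in 'resistor' climbs.
for b in forward([0, 0, 0]):
    print(dict(zip(states, b.round(3))))
```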

Overall, the model worked well in the domain and improved recognition (lowered the error rates) for all 8 participants in the test. Since the model relies on time, any interspersing (drawing two or more objects simultaneously) introduces errors, causing primitives to be missed; over 6% of the primitives were missed on average due to this issue.

Discussion:

Relying on time data is tricky in sketch recognition, since temporal information can only be used in certain domains. Circuit diagram recognition is not necessarily one of these domains, as shown by the interspersing data. Increasing the model beyond first-order might account for some of these issues, but then the model would not be able to run in real time, which was a major selling point of the system.

Saturday, November 10, 2007

Gestures without Libraries, Toolkits or Training: A $1 Recognizer for User Interface Prototypes

Summary:

Wobbrock et al.'s $1 recognizer is designed to be a simple recognizer that does not demand a deep mathematical background from its implementer. With this recognizer, novices to sketch recognition can use simple gestures in their interfaces.

The $1 recognizer has four steps: (1) resampling, (2) rotation, (3) scaling, and (4) classification. Points in a gesture are resampled into N equidistant points defined by the developer. The gesture is then rotated so that the line between the center of the gesture and the starting point is at the 0 degree position, i.e. the center-start point axis is at 3 o'clock. The gesture is then scaled to fit within a square of some size and translated so that the center of the gesture is at the (0,0) origin.

Finally, the gesture is classified by first calculating the average distance between points in the gesture and the points in all templates (known gestures). This is called the path-distance by the authors. A score is determined directly from this path distance and the scaled square's size.
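Since the steps are fully spelled out above, they fit in a short sketch (N and the square size are developer choices; the published recognizer also searches a small range of rotations for the best angular alignment, which I omit here):

```python
import math

N, SIZE = 64, 250.0

def resample(pts, n=N):
    """Step 1: redistribute the stroke into n equidistant points."""
    total = sum(math.dist(a, b) for a, b in zip(pts, pts[1:]))
    step, acc, out = total / (n - 1), 0.0, [pts[0]]
    for a, b in zip(pts, pts[1:]):
        d = math.dist(a, b)
        while acc + d >= step and d > 0:
            t = (step - acc) / d
            q = (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
            out.append(q)
            a, d, acc = q, d - (step - acc), 0.0
        acc += d
    while len(out) < n:                  # guard against floating-point shortfall
        out.append(pts[-1])
    return out

def centroid(pts):
    return (sum(x for x, _ in pts) / len(pts), sum(y for _, y in pts) / len(pts))

def rotate_to_zero(pts):
    """Step 2: put the centroid-to-first-point axis at 0 degrees (3 o'clock)."""
    cx, cy = centroid(pts)
    a = -math.atan2(pts[0][1] - cy, pts[0][0] - cx)
    cos, sin = math.cos(a), math.sin(a)
    return [((x - cx) * cos - (y - cy) * sin, (x - cx) * sin + (y - cy) * cos)
            for x, y in pts]

def scale_translate(pts, size=SIZE):
    """Step 3: scale to a size x size square, then center on the origin."""
    xs, ys = [x for x, _ in pts], [y for _, y in pts]
    w, h = (max(xs) - min(xs)) or 1.0, (max(ys) - min(ys)) or 1.0
    pts = [(x * size / w, y * size / h) for x, y in pts]
    cx, cy = centroid(pts)
    return [(x - cx, y - cy) for x, y in pts]

def recognize(stroke, templates):
    """Step 4: templates maps name -> already-normalized point list."""
    cand = scale_translate(rotate_to_zero(resample(stroke)))
    best, best_d = None, float('inf')
    for name, tpl in templates.items():
        d = sum(math.dist(p, q) for p, q in zip(cand, tpl)) / N  # path distance
        if d < best_d:
            best, best_d = name, d
    score = 1 - best_d / (0.5 * math.hypot(SIZE, SIZE))  # vs. square's diagonal
    return best, score
```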

Overall, the recognizer has good results, with around 98% accuracy for simple gestures. Classification accuracy relies heavily on the number of templates in the system.

Discussion:

This recognizer really does not introduce anything new to sketch recognition, but it does wrap up some basic vision and sketch recognition techniques into a simple (and given) algorithm. The technique used by $1 is not much different from general template/bitmap comparison, except that using the points is a bit nicer when working with single strokes. On the other hand, bitmap comparison allows for multiple strokes in any order. This technique also relies heavily on visual differences between gestures, and it cannot distinguish the same gesture drawn at different scales or rotations. For instance, a forward slash drawn with a mouse or pen is a common gesture for "page forward" in a web browser, while a backward slash indicates "page back". This recognizer cannot tell these two apart.

Wednesday, November 7, 2007

Speech and Sketching: An Empirical Study of Multimodal Interaction

Summary:

In this paper, Adler and Davis explore multimodal speech and sketch interfaces through a user study. Their goal is for the computer to provide feedback as the user talks and draws, influencing the design during this process by asking questions and clarifying information. Having the computer understand everything about the design is not the goal; instead, the computer should know enough to ask motivating questions when necessary in order to engage the user. The system also should not constrain the user's drawing or speech style.

The user study involved 18 users in a Wizard-of-Oz setup. The users were asked to design a floor plan, a full adder, an AC/DC transformer, and a digital circuit. Sketching was done on Tablet PCs in software that allowed drawing and highlighting in 5 different colors. During the study, the experimenter sat at a table across from the user. The study was filmed, and the audio, visual, and sketching components were synchronized.

The study showed some interesting results concerning color, questions, and speech timing. Users tended to rely on multiple colors to indicate portions of the sketch: color linked parts of the sketch together, referred back to previous parts, and reflected the real-world colors of objects. When speaking, users typically repeated phrases and words while thinking aloud, which could allow the computer to discern key words from the user-computer dialogue. Responses to the computer's questions also led users to repeat the question's wording, and simple questions could prompt more information than what was asked; some users even redesigned their drawings after simple questions, such as asking whether two objects were similar. Speech and sketching generally started simultaneously, yet entire phrases tended to start before the corresponding sketch, while isolated key words tended to be heard after a sketch was started.

Discussion:

The two best components of Adler's study show how computers can assist humans during design steps by relying on the human design and thought process, instead of having an actual understanding. In lieu of training the computer to understand all of the components of a design, basic understanding of object similarity and grouping should be enough to produce a motivating dialogue. Also, the fact that the user constantly repeats words provides the computer with an indication of important information without the need of a large vocabulary.

I wish the study also went into more interface issues, such as when the computer should ask a question (e.g. during sketching, during a pause, etc.). Also, it would have been beneficial to see the average pause time of a user and if the user was speaking or mumbling during the pause by going "hmm" or something similar. Do the pauses for sketch indicate that the user is speaking, and do pauses for speaking indicate the user is sketching? Do the pauses for both modes line up?

Monday, November 5, 2007

Three main concerns in sketch recognition and an approach to addressing them

Summary:

Mahoney and Fromherz discuss problems involved with matching models of hand-drawn stick figures. The figures are simple polylines, but they can appear in any configuration alongside other figures or distracting objects in the background.

The system input (the drawn figure) is highly variable, and the authors define several problems arising from this variability. Failures of co-termination involve strokes over- or under-shooting one another (i.e., the endpoints of two disjoint strokes that are supposed to be connected do not touch). Articulation problems occur when strokes are over- or under-segmented in preprocessing. Interaction with background context involves the figure to be matched being set against a background of context strokes, other figures, or noisy and distracting data.

In this paper, the matching process involves creating a graph of the model to find (figure) and searching for a mapping between the model and a data subgraph. Ambiguity is handled by adding alternative, plausible substructures to the graph. This happens through proximity linking, virtual junction splitting, spurious segment jumping, and continuity tracing. All four of these methods involve creating new links in the structure by searching for subtle connections, splitting current strokes, merging strokes together, and creating new stroke segments.

Subgraph matching is translated into a constraint satisfaction problem (CSP), where each stroke (node) in the graph is a variable and the constraints are edges between the nodes. The final match tries to minimize total link length, and the lengths of the segments should be in appropriate ratios. Matching can also be influenced by a priori knowledge that defines certain components.
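
A bare-bones version of this reduction fits in a few lines: model strokes become variables, data strokes become domain values, and model links become binary constraints. This ignores the link-length and ratio scoring, so it is only the skeleton of the matcher:

```python
# Tiny backtracking CSP for subgraph matching (illustrative sketch only).
def match(model_edges, data_edges, model_nodes, data_nodes):
    """Return an injective model-node -> data-node mapping preserving edges."""
    data_adj = {frozenset(e) for e in data_edges}

    def consistent(assign, var, val):
        if val in assign.values():                 # injectivity
            return False
        for u, v in model_edges:
            if u == var and v in assign and frozenset((val, assign[v])) not in data_adj:
                return False
            if v == var and u in assign and frozenset((assign[u], val)) not in data_adj:
                return False
        return True

    def backtrack(assign):
        if len(assign) == len(model_nodes):
            return assign
        var = next(n for n in model_nodes if n not in assign)
        for val in data_nodes:
            if consistent(assign, var, val):
                assign[var] = val
                result = backtrack(assign)
                if result:
                    return result
                del assign[var]
        return None

    return backtrack({})

# Find a head-torso-arm chain inside a larger scribble graph:
print(match([('head', 'torso'), ('torso', 'arm')],
            [(1, 2), (2, 3), (3, 4), (2, 5)],
            ['head', 'torso', 'arm'], [1, 2, 3, 4, 5]))
```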

Discussion:

The use of a CSP seems to work very well in this case. The fact that the system can discern a stick figure in a sea of seemingly random lines is rather amazing, since I can barely see the figure myself.

One issue is that this system does not seem able to discern "fixed" figures, e.g., a square versus a parallelogram. For "loose" figures, the strokes must also be connected endpoint to endpoint. For a figure with only a few strokes this can be simple, but if the user were trying to draw, say, a centipede, the system could find many plausible matches within a collection of strokes.

Constellation Models for Sketch Recognition

Summary:

In Sharon and van de Panne's paper, sketch objects are represented with constellation models: visual models that capture features of individual strokes and pairwise features between strokes. For their model, the site features include:
  • The x-coordinate of the stroke's bounding box center
  • The y-coordinate of the stroke's bounding box center
  • The length of the bounding box diagonal
  • The angle of the bounding box diagonal
Interaction features include:
  • Delta x between the strokes
  • Delta y between the strokes
  • The minimum distance between the endpoints of stroke a and any point on stroke b
  • The minimum distance between the endpoints of stroke b and any point on stroke a
Since the interaction features are pairwise, the model scales as O(n^2) in the number of strokes. The authors chose to allow labels to be optional or mandatory in order to reduce the run time of the model.
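
Read straight off the two lists above, the features are a few lines of geometry (a plain-Python reading of the lists, not the authors' code):

```python
import math

def site_features(stroke):
    """Bounding-box center, diagonal length, and diagonal angle for one stroke."""
    xs, ys = [x for x, _ in stroke], [y for _, y in stroke]
    cx, cy = (min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2
    dx, dy = max(xs) - min(xs), max(ys) - min(ys)
    return (cx, cy, math.hypot(dx, dy), math.atan2(dy, dx))

def interaction_features(a, b):
    """Center offsets plus minimum endpoint-to-stroke distances, both ways."""
    (ax, ay, *_), (bx, by, *_) = site_features(a), site_features(b)
    min_a_to_b = min(math.dist(e, q) for e in (a[0], a[-1]) for q in b)
    min_b_to_a = min(math.dist(e, q) for e in (b[0], b[-1]) for q in a)
    return (bx - ax, by - ay, min_a_to_b, min_b_to_a)
```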

Each element of the feature vectors (site F and interaction G) has its mean and covariance computed over the training data set. Each label in the training data then has an individual probabilistic model computed for its features.

For the entire sketch, a likelihood function estimates the probability of an entire labeling L. This function is given in the paper. The overall function tries to maximize the probability of an individual label, as well as the probability of the interactions between two individual labels.

A maximum likelihood search tries to maximize the likelihood function, i.e., find the most probable labeling. The search is multipass, with the first pass labeling only mandatory objects. All possible label assignments are then searched using a "branch-and-bound search tree [where each] node in the search tree represents a partial labeling of the sketch." The depth of the tree corresponds to the number of labels applied. The search advances by choosing the best assignment of mandatory labels, and the "cost" of this assignment bounds the rest of the search. The multipass algorithm also bounds branches of the tree before a full labeling is found: if a new likelihood is worse than the bound, the branches associated with that labeling are pruned. If no complete solution is found within the bound, the bound is loosened until a labeling is found.
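
The pruning logic is the classic branch-and-bound pattern; a generic version over per-stroke and pairwise log-likelihood tables looks roughly like this (illustrative, not the authors' algorithm; pair keys are assumed ordered (i, j) with i < j):

```python
import math

def bnb_label(n_strokes, labels, site, pair):
    """site[i][l]: per-stroke log-likelihood; pair[(i,j)][(li,lj)]: pairwise."""
    best = {'score': -math.inf, 'labeling': None}
    opt_site = [max(site[i][l] for l in labels) for i in range(n_strokes)]
    opt_pair = {ij: max(t.values()) for ij, t in pair.items()}

    def bound(depth, score):
        # optimistic completion: best remaining site scores plus any pairwise
        # score that could still help (negatives are optimistically skipped)
        b = score + sum(opt_site[depth:])
        return b + sum(v for (i, j), v in opt_pair.items() if j >= depth and v > 0)

    def recurse(depth, assign, score):
        if depth == n_strokes:
            if score > best['score']:
                best.update(score=score, labeling=list(assign))
            return
        if bound(depth, score) <= best['score']:
            return                                  # prune this whole branch
        for l in labels:
            s = score + site[depth][l]
            for (i, j), table in pair.items():
                if j == depth:
                    s += table[(assign[i], l)]
            assign.append(l)
            recurse(depth + 1, assign, s)
            assign.pop()

    recurse(0, [], 0.0)
    return best
```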

In the results section, the multipass thresholding/bounding greatly reduces the time required to find a complete labeling.

Discussion:

Although the paper discussed how the branch-and-bound search tree with multipass thresholding improves the runtime, no mention was made of the accuracy of the final system. I've worked with similar systems before, and I know that accuracy can drop drastically as the number of labels increases. The authors seem to curb this by having "optional" labels and by forcing each component to be drawn in a single stroke, but results showing how the system's accuracy scales with the number of labels would have been beneficial.

Monday, October 29, 2007

Envisioning Sketch Recognition: A Local Feature Based Approach to Recognizing Informal Sketches

Summary:

Oltmans's Ph.D. thesis uses computer vision techniques to recognize freely-drawn symbols and sketches. Freely-drawn sketches do not constrain the user's drawing style, so many issues need to be taken into account. For instance, stroke overtracing is a large problem in free-form sketches, especially when doodles or notes are involved. Noise is also more prevalent, and temporal data cannot be utilized because strokes can be drawn in any order.

To combat these issues, Oltmans uses computer vision techniques that lessen the problems of overtracing and noise while ignoring any temporal features of the sketch. The technique, dubbed the "bullseye" by Oltmans, consists of a radial partition of the space around a point. The radial partitions correspond to a histogram that keeps track of the number of stroke points within each section/bucket/slice. Stroke points are preprocessed to be roughly equidistant from one another. Each symbol has a corresponding set of bullseye patterns stored in the system's "codebook", and matching a series of found histograms against the trained codebook patterns allows for symbol classification. The system is trained using an SVM on known data.
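
A sketch of the descriptor itself (ring and wedge counts are my own choices; the 40-pixel radius comes up again in the discussion below):

```python
import math

def bullseye(center, points, radius=40.0, n_rings=4, n_wedges=8):
    """Histogram of neighboring ink over radial rings and angular wedges."""
    cx, cy = center
    hist = [0] * (n_rings * n_wedges)
    for x, y in points:
        d = math.hypot(x - cx, y - cy)
        if d == 0 or d > radius:
            continue
        ring = min(int(d / (radius / n_rings)), n_rings - 1)
        ang = (math.atan2(y - cy, x - cx) + math.pi) / (2 * math.pi)  # in 0..1
        wedge = int(ang * n_wedges) % n_wedges
        hist[ring * n_wedges + wedge] += 1
    total = sum(hist) or 1
    return [h / total for h in hist]  # normalize so counts compare across symbols
```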

Finding a symbol within an entire sketch, though, is rather difficult. Oltmans finds "candidate regions" for a symbol by looking at small, overlapping windows of points and then combining these into larger regions using EM clustering. Any cluster considered too large is split into smaller clusters. The bounding box of the cluster is then taken to be the symbol's region, and the bullseye histogram is computed for the ink within the box.

Overall, the system works well for individual symbols (94.4%), especially when compared to existing systems on noisy data. The system fared slightly worse on entire sketches, achieving an accuracy of 92.3%.

Discussion:

The use of vision techniques to classify freely-drawn symbols is a good idea because stroke data has so much noise. Using vision mapping, whether it is histogram or simple pixel overlay, tends to avoid corner segmentation and overtracing issues.

I find the results for Oltmans's full-sketch tests slightly skewed. The vast majority of the shapes tested were wire segments and resistors. Since the wires are broken down into smaller segments, there were roughly 9,000 wires out of only approximately 14,000-15,000 shapes total. The accuracy for resistors (2,000 shapes) was also extremely high, but the accuracy for the remaining shapes was around 60-80%. So over two-thirds of the shapes were easy to detect with high accuracy because they were either a straight line (wire) or visually unique (resistor). It seems the system has a very hard time distinguishing between similar shapes, such as batteries vs. capacitors, mainly because the histograms are not accurate enough to catch small variations in noisy data.

I'm also interested to see how the histograms can be tweaked depending on the scale of the image. The histograms used were hard-coded to 40 pixel radii, and possibly having a variable histogram size would help.

Wednesday, October 24, 2007

Naturally Conveyed Explanations of Device Behavior

Summary:

Oltmans and Davis present ASSISTANCE, a multimodal system capable of understanding simple 2D physics diagrams. The diagrams can contain bodies, pin joints, springs, pulleys, and rods. Arrows and verbal cues are used to describe the movement of objects.

In ASSISTANCE, the user first draws the system they want to model. Then, the user verbally describes the system while pointing at objects in the drawing. ASSISTANCE constantly updates its interpretation of the drawing, and the user can ask for the computer's interpretation at any time. This interpretation is a "causal model" for the drawn system (i.e. a sequence of cause and effect actions).

To generate the causal model, ASSISTANCE first finds the degrees of freedom each object has, such as rotational or translational freedom. The system then utilizes the verbal description of the system, as well as any arrows the user draws. Verbal information is parsed to separate key objects and actions. For example, the phrase "Body 2 pushes Body 3" parses into "Body 2", "pushes", and "Body 3". These verbal phrases, as well as the drawn bodies and arrows, are converted into propositional statements, and ASSISTANCE performs reasoning using a forward-chaining algorithm and a truth maintenance system.

Often, the same action will be described in multiple ways, such as with a verbal description and an arrow indicating movement. When this happens, the two events are merged. The system assumes that only one motion can affect a body, so multiple descriptions affecting the same body would indicate that the descriptions describe the same event.

The final causal model is created by examining the causal events and constructing the most likely model for the system, given the description. To do this, ASSISTANCE uses known causal events and plausible causal events, along with constraint propagation. Events that do not have a cause are considered merely plausible and require an implicit cause by an outside force. The system tries to minimize these plausible causes, and the model is complete when every event has a cause.

Discussion:

ASSISTANCE seems to be a great system, and I'm really curious how users evaluated it. There was no formal evaluation for this paper, but since we're reading his thesis next week I'll find out what users say shortly.

I love multimodal systems, but I also understand why there are not many multimodal applications commercially available. Being able to describe a drawing verbally and with gestural cues is great, and using both input modes can improve the system's accuracy when the two modes rely on each other for information. On the other hand, if the system does not force users to use all input modes, then the accuracy rate for each individual mode still has to be very high, as though the other modes could not be relied on.

Wednesday, October 17, 2007

Ambiguous Intentions: a Paper-like Interface for Creative Design

Summary:

Gross and Do describe the Electronic Cocktail Napkin program that allows users to draw ambiguously as if they were sketching a design on a piece of scrap paper. The Napkin then takes the sketch, examines objects within the ambiguous drawing, interprets the drawing's context, and recognizes the objects.

Domain recognizers are user defined, where the user draws examples and the Napkin identifies the symbols and patterns. Symbols can span across multiple domains, such as a circle acting as a table in a floor plan domain, and the same circle representing a node in a graph domain.

As a user draws on a new, blank Napkin, the system determines whether each stroke or symbol is recognized, ambiguous, or unrecognized. A symbol is recognized if it matches in only one domain (or only in the currently established domain); it is ambiguous if it can be recognized in multiple domains. Otherwise, the system cannot recognize the symbol at the time and stores that information for later. A user can also manually specify a symbol if need be.

Symbols are related to each other via constraints specific to a domain. For instance, graphs and circuits have connectivity constraints, and floor plans can have perpendicularity constraints for walls. Low-level recognition is accomplished with normalized 3x3 "glyphs" plus features. Glyphs can also be grouped into configurations, which sit hierarchically above individual glyphs.

Discussion:

Overall, I think the Electronic Cocktail Napkin was a good idea that was ahead of its time and too tied to one hardware setup. The authors themselves mention that the system suffered from lacking the right technology, namely LCD digitizing displays the user could draw on directly. The equipment the Napkin used involved a separate digitizing pad and display, which breaks the entire napkin-to-sketch-pad metaphor.

Monday, October 15, 2007

Graphical Input Through Machine Recognition of Sketches

Summary:

Herot's short paper gave a brief, but comprehensive, look at sketch interaction systems in the mid 70s.

The paper first looks at a general recognizer, HUNCH, which tries to see whether accurate knowledge can be obtained without a specific domain. The system takes data drawn on a large tablet with a special pencil, and the raw input data is recorded by the computer. HUNCH used another application, called STRAIT, that found corners by examining the user's pen speed. The system also used a process called latching to snap the endpoints of nearby lines together. Unfortunately, HUNCH had problems with consistency between different users: users drawing at different pen speeds produced different corners, and the latching technique sometimes distorted the intended image, such as oversnapping the lines of a cube. The system also handled overtraced lines by merging them, provided some 3D image inference through unexplained techniques, and could create floor plans by looking at boxed rooms and doorways.

Context is an important part of a sketch, and Herot recognizes this by arguing that data interpretations should carry context information. The context should be specified to the computer so as to avoid the problem of recognizing the domain itself. Herot briefly mentions top-down processing for recognizing sketches within a context architecture.

Lastly, Herot mentions that user input is a key component of a sketch recognition system that should not be ignored. More complex interfaces need to be developed so that a user can interact with a program and correct mistakes, and corner finding algorithms need to be tuned for an individual user.

Discussion:

Although none of the topics mentioned in Herot are new to me, the fact that all of these issues were mentioned in a paper written in 1976 is surprising. For instance, I had been under the assumption that using pen speed to detect corners was a relatively new fad.

I also am very surprised that the system tried (and from the one example, succeeded) at incorporating 3D image analysis. I remember reading a paper about using edges and vertices to detect whether an image is 3D, but I cannot seem to recall the author involved, so it's hard for me to construct a timeline for that research.

Wednesday, October 10, 2007

Perceptually Based Learning of Shape Descriptions for Sketch Recognition

Summary:

Veselova and Davis present a system that trains a recognizer from a single drawn symbol example. The system relies on perceptual information about symbols, drawn from previous experiments with humans. Goldmeier is the main source of Veselova's perceptual information; his research shows that people rely on singularities when distinguishing between objects. Singularities are important geometric properties that are highly sensitive to change, such as parallelism, vertical and horizontal lines, and symmetry. Goldmeier's research also shows that an object's vertical axis of symmetry is more important than its horizontal axis.

Veselova and Davis use this singularity information, along with other constraints they define as important, to create a ranked list of constraints. They also use work from Arnheim, who describes tension lines: the important vertical, horizontal, and diagonal lines of an image. Gestalt grouping principles are also used to group small objects into larger wholes, and these groupings are hierarchical.

The user study for the system involved a group of people judging a description against 20 shapes, 10 of which agreed with the description and 10 of which did not. The humans ranked which shapes agreed and disagreed with the description, and the recognition system did the same. When the vast majority of humans agreed on an example, the system performed very well and matched human performance. Overall, the system performed about 5% worse than humans at matching examples to descriptions.


Discussion:

I really enjoyed this paper. Using cognitive science techniques such as Gestalt groupings and preservation of symmetry is not necessarily new to computer science (ScanScribe), but the explanation behind the principles was solid for the short length of the paper.

I am surprised that the system did not incorporate "must not" constraints, since adding negated constraints shouldn't have been too difficult, and they are important in perception.

Tuesday, October 9, 2007

Interactive Learning of Structural Shape Descriptions from Automatically Generated Near-miss Examples

Summary:

Hammond and Davis present a debugging system in LADDER for shape definition errors. Errors can be either syntactical or conceptual, where a syntactical error is an error within the definition expressions themselves, whereas a conceptual error deals with over- or under-constrained definitions. An under-constrained definition finds many false positives due to too few constraints, whereas an over-constrained definition has many false negatives by allowing only very specific instances of a shape to be recognized.

The paper does not go into depth on handling syntactical errors, but the GUI for LADDER notifies the developer of any syntactical errors by highlighting definition problems in red. For over-constrained errors, the system first needs to have a drawn example and a definition that positively recognizes that example. Once this is given, the system scales and rotates the drawn shape and asks the user whether the new shape should be positively recognized. The system removes constraints from the initial list if needed. Then, the system generates near-miss examples by looking at the remaining constraints and testing them one at a time. Each constraint is negated, and if the user indicates that the resulting image should be recognized, then the tested constraint is removed from the list.

For under-constrained errors, the system again requires a drawn example and a definition. A list of possible constraints is then generated for the image and the constraints are tested one at a time by adding the negation of the constraint into the current constraint list. If the new set with the negation constraint should be positively recognized, then the constraint is not added to the system.

To generate the shapes during the testing process, each shape is converted into a set of numerical features that can be used in equations to stretch, rotate, and translate parts of the shape.
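
Both loops reduce to the same near-miss pattern: flip one constraint, render the result, and ask the user. A minimal sketch of the over-constrained case, with render and ask_user as hypothetical stand-ins for LADDER's shape generator and GUI:

```python
def negate(constraint):
    return ('not', constraint)

def debug_overconstrained(constraints, render, ask_user):
    """Drop any constraint whose negation still yields an acceptable shape."""
    kept = list(constraints)
    for c in list(kept):
        others = [k for k in kept if k is not c]
        near_miss = render(others + [negate(c)])  # only c is violated
        if ask_user(near_miss):                   # user: "yes, recognize this"
            kept = others                         # then c was unnecessary
    return kept
```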

Discussion:

Knowing that this system existed in LADDER before I started creating my shapes would have been helpful. The idea behind eliminating or adding constraints is relatively simple, since a constraint and its negation cannot both exist in a definition.

Yet, after working with LADDER, I have found that one of my most frequent errors is not over- or under-constraint, but the constraints themselves. For instance, when drawing a rectangle, I want the endpoints of lines to be touching or close to touching. But I cannot edit the value for "close", so I have to either force the user to draw the rectangle in a certain fashion so I know the lines are connected, or eliminate the close-endpoints constraint. Neither option is optimal, and I'll experiment with the LADDER debugging system to determine whether it finds constraints I did not know existed.

Wednesday, October 3, 2007

LADDER, a sketching language for user interface developers

Summary:

Hammond and Davis created a sketch language, LADDER, that allows developers to easily design a geometric shape recognition system. The language handles shape drawing, displaying, and editing, and shapes are defined using primitives, constraints, and other shapes. Many primitives and constraints are already defined for the user, with primitives such as Lines, Arcs, and Spirals. Constraints are for individual shapes, such as if the shape has a positive slope, and constraints can also describe shape interactions, such as if two shapes are intersecting.

Shape definitions are hierarchical and incorporate geometric information about primitives and their relationship to each other. For instance, an arrow is defined as a shaft and a head, where the shaft is defined by a line and the head is defined as two lines meeting at an acute angle. One of the shaft points must also touch the head where the two head lines meet. Shape groups are also formed from shapes that are often drawn together. One example the authors give involves arrows touching polygons, which are defined as forces within a mechanical engineering domain. Shapes can also have editing constraints. A shape can be allowed to be "rubber-banded", or stretched, on a fixed point, rotated around a center, or forced to remain static if the system designer chooses.
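
The arrow example reads almost directly as a shape definition. Below is a paraphrase in Python data structures; LADDER's actual syntax differs, and the 'shorter' constraints are my own additions to make the head look like a head:

```python
# A LADDER-style arrow definition, paraphrased (not LADDER's real syntax).
arrow = {
    'components': {'shaft': 'Line', 'head1': 'Line', 'head2': 'Line'},
    'constraints': [
        ('coincident', 'head1.p1', 'head2.p1'),  # head lines meet at a point
        ('coincident', 'shaft.p2', 'head1.p1'),  # shaft touches that meeting point
        ('acuteAngle', 'head1', 'head2'),        # head lines meet at an acute angle
        ('shorter', 'head1', 'shaft'),           # illustrative extra constraints
        ('shorter', 'head2', 'shaft'),
    ],
    'editing': [('rubberBand', 'shaft.p1')],     # stretchable from the tail point
}
```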

When displayed to the user, shapes are redrawn with their ideal strokes. If only lines are allowed in a domain, then only shapes made from lines can be drawn. Creating these ideal shapes happens through a series of mathematical equations: the features of a shape are quantified, and the constraints are translated into equations into which those features are plugged.


Discussion:

The LADDER system provides a great foundation for geometric recognition systems. It seems as if entire applications can be built relatively quickly, as long as the domains involved do not have complex shapes that cannot be broken down into primitives.

Since I haven't used the system much, I'm not sure how tough it is to build complex applications. I also do not know how easy it would be to integrate the language into an existing application, or if LADDER is only designed to build complete systems. Having the ability to pick and choose what LADDER offers might also be beneficial; for instance, I might not want the system to clean strokes as they are drawn. The paper did not provide many details about customizability in this regard.

Wednesday, September 26, 2007

Recognizing and Beautifying Low-level Sketch Shapes with Two New Features and Ranking Algorithm

Summary:

Paulson's paper describes a sketch recognition system for recognizing primitive shapes, and combinations of those shapes, with high accuracy. The system constrains the user to draw each shape in a single stroke, but does not constrain drawing style (i.e., these are not gestures). The shapes the system recognizes are lines, polylines, circles, ellipses, arcs, curves, spirals, and helixes. Complex fits can be formed from these eight primitives.

Feature calculation for each stroke involves computing direction, speed, and curvature graphs, as well as finding the corners of the stroke. Furthermore, Paulson introduces two new metrics: NDDE and DCR. NDDE, the normalized distance between direction extremes, is the stroke length between the points with the highest and lowest direction values, divided by the total stroke length. Arcs and curves tend to have high NDDE values, whereas polylines have lower ones. DCR, the direction change ratio, is the maximum change in direction divided by the average direction change. Arcs tend to have small, steady changes in direction, whereas the direction graph of a polyline contains spikes.
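
Both metrics come straight from a stroke's direction values, so they are short to compute (direction here is the angle between consecutive resampled points; the indexing details are my own):

```python
import math

def directions(stroke):
    return [math.atan2(by - ay, bx - ax)
            for (ax, ay), (bx, by) in zip(stroke, stroke[1:])]

def arc_length(stroke, i, j):
    return sum(math.dist(a, b) for a, b in zip(stroke[i:j], stroke[i + 1:j + 1]))

def ndde(stroke):
    """Stroke length between the max- and min-direction points / total length."""
    d = directions(stroke)
    i, j = sorted((d.index(max(d)), d.index(min(d))))
    total = arc_length(stroke, 0, len(stroke) - 1)
    return arc_length(stroke, i, j) / total if total else 0.0

def dcr(stroke):
    """Maximum direction change divided by the average direction change."""
    d = directions(stroke)
    changes = [abs(b - a) for a, b in zip(d, d[1:])]
    avg = sum(changes) / len(changes)
    return max(changes) / avg if avg else 0.0
```

An arc should score a high NDDE and a low DCR; a polyline the reverse.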

After this preprocessing, the stroke and its computed features are passed to individual shape testers, one for each primitive (line, arc, etc.), which check whether the stroke follows set rules for that primitive. For instance, in the circle test, the stroke's minor-to-major axis ratio must be close to 1.0, its NDDE value must be high, and its feature area error must be low. Each shape tester returns whether the stroke passed.

Finally, for each shape test passed, the interpretation is placed in a sorted hierarchy. The hierarchy assigns weights to shapes based on the shape's commonness and the complexity of the fit. The interpretations are added to the hierarchy in a certain order, described in the paper.

Paulson's system obtains a 98.6% recognition rate overall, and a 99.89% rate when the correct interpretation only needs to appear within the top 3 handed to the user. The recognition gain comes from the two new features, which increase the system's accuracy by over 20%. The ranking system sorts the interpretations so that the correct one is 7% more likely to be ranked first.

Discussion:

Paulson's results are quite amazing in that the system's recognition improves by 20% with just two new features. When analyzing his results table, I noticed that Sezgin's algorithm worked better than Paulson's when returning complex fits, but performed worse on every other shape test. Since Sezgin's method heavily favors complex fits, these results seem logical.

I noticed that circles and ellipses are given the same hierarchy score (5) in the system, which struck me as odd, since I would want to favor circles over ellipses. A circle is always an ellipse, but an ellipse is not always a circle, so the system should slightly prefer circles when both tests pass.

Monday, September 24, 2007

Streaming Algorithms for Line Simplification

Summary:

Abam et al. discuss an algorithm to approximate a drawn stroke in real time. With a possibly unbounded number of points in a stroke and a limited amount of memory, simplifying the stroke as it is being drawn saves storage space.

The algorithm achieves an O(1) competitive ratio (its error is within a constant factor of the optimal simplification) in three specific cases: under the Hausdorff distance when the path is convex (i.e., every angle on the path is at most 180 degrees) or xy-monotone (any horizontal or vertical line intersects the path at most once), and under the Frechet distance for general paths. Abam et al. never define the Hausdorff distance in the paper; the Frechet distance between two paths is, if I understand it correctly, the minimum leash length that lets a man and a dog each walk their own path from start to end without backtracking.

To simplify a stroke, the algorithm takes each new point as it arrives and computes an error for adding it. The point whose removal introduces the minimum error is then dropped from the simplification, and the errors for the remaining segments are recalculated.
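
My reading of the basic loop, with a fixed memory budget k and plain point-to-segment distance standing in for the paper's Hausdorff/Frechet machinery:

```python
import math

def seg_dist(p, a, b):
    """Distance from p to segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == dy == 0:
        return math.dist(p, a)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.dist(p, (ax + t * dx, ay + t * dy))

def simplify_stream(points, k):
    """Keep at most k points of a (possibly endless) stream of stroke points."""
    kept = []
    for p in points:
        kept.append(p)
        if len(kept) > k:
            # error of dropping kept[i] = its distance to the segment joining
            # its neighbors; the two endpoints are never dropped
            errs = [(seg_dist(kept[i], kept[i - 1], kept[i + 1]), i)
                    for i in range(1, len(kept) - 1)]
            _, i = min(errs)
            del kept[i]
    return kept
```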

Discussion:

I don't really understand how this algorithm works. If the algorithm keeps removing points from the simplification, shouldn't the number of points always be constant? But then how do we build up points to have a simplification in the first place? I need to reread the paper a few more times before I understand everything.

Also, on a personal note, I tend to dislike papers that use words like "simply" and "clearly." Especially when I don't think the paper is clear.

Thursday, September 20, 2007

A Curvature Estimation for Pen Input Segmentation in Sketch-based Modeling

Summary:

Kim and Kim propose a new corner detection system that utilizes local changes around corners.

The system starts by resampling the points of the stroke so that all points are equidistant from one another. The curvature value at a point is typically the change in direction around the point divided by the stroke length over which that change occurs. By enforcing a constant spacing between points, the system can use the direction changes themselves (calculated as in Sezgin and Yu) as curvature values.

The final curvature value around each point is determined by two new metrics: local convexity and local monotonicity. Local convexity measures the "support" for a potential corner point: the system looks at a window around the point and sums all of the curvature values with the same sign as the point's curvature. Local monotonicity examines the same window, but this time each point is examined in sequence moving outward from the center, and a curvature value is added only if it is smaller in magnitude than the previous point's and has the same sign; if either requirement fails, the scan stops and the current sum is returned.
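
My reading of the two metrics in code (the window size w is a free parameter, and the outward scan in both directions is my interpretation of "starting from the center"):

```python
def local_convexity(curv, i, w=5):
    """Sum the same-signed curvature values in the window around point i."""
    sign = 1 if curv[i] >= 0 else -1
    lo, hi = max(0, i - w), min(len(curv), i + w + 1)
    return sum(c for c in curv[lo:hi] if c * sign > 0) * sign

def local_monotonicity(curv, i, w=5):
    """Accumulate curvature outward from i while it keeps shrinking in
    magnitude and keeps the same sign; stop at the first violation."""
    sign = 1 if curv[i] >= 0 else -1
    total = abs(curv[i])
    for step in (1, -1):                 # scan right of i, then left
        prev, j = abs(curv[i]), i + step
        while 0 <= j < len(curv) and abs(j - i) <= w:
            if curv[j] * sign <= 0 or abs(curv[j]) > prev:
                break                    # sign flip or growth: stop this side
            total += abs(curv[j])
            prev, j = abs(curv[j]), j + step
    return total * sign
```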

The algorithm found the correct corners approximately 95% of the time.


Discussion:

Proposing two new (seemingly easy to implement) curvature metrics could give corner finders more techniques for separating good corners from noisy data. The paper was only published recently, so I'm not sure how much of an impact it has had yet.

The main thing I would have liked the paper to discuss is where this corner finder's results differ from others'. I know what types of shapes Sezgin's corner finder does well with (polylines, simple arcs), but I have yet to see a corner finder that can consistently distinguish truly tough corners, such as a smooth line-arc transition. Kim & Kim's paper doesn't even count this transition as a corner (see Fig. 14, shapes 13 and 20), and it only vaguely states the accuracy as being "about 95 percent." I would think that if the curvature calculations were significantly better than previous corner finders', the authors would want to show more proof. I guess we'll see how well the new metrics work this weekend...

Monday, September 17, 2007

A Domain-Independent System for Sketch Recognition

Summary:

Yu and Cai's paper attempts to find corners in strokes by utilizing the stroke's direction and curvature information. Unlike Sezgin's corner finder, Yu and Cai's does not use speed data.

The corner finder first checks to see if the stroke can be approximated by a primitive shape such as a line, arc, or circle. If not, the stroke is divided into two smaller strokes at the point of highest curvature. The system again checks whether each of the subsections can be approximated by primitives, and the cycle of splitting and checking continues.

At each primitive approximation step, the corner finder looks at the direction graph of the segment to determine whether the segment is more likely a line or an arc: the direction graph of a line is flat, whereas the direction graphs of arcs and circles are sloped. To fit a line to a segment, a least squares fit is used. To fit an arc, a line is first fitted between the first and last points of the segment. An arc is then formed through the two endpoints and the point on the original stroke perpendicular to the midpoint of that line. The "center" of this arc (which can be viewed as a segment of a circle) is found, and the error of the arc fit is the difference between the area under the original stroke and the area under the beautified arc.
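
The arc fit can be sketched directly from that description: chord the endpoints, take the stroke point nearest the chord's perpendicular through its midpoint, and pass a circle through the three points (circle-through-three-points is standard geometry; the paper's actual area-based error measure is skipped here):

```python
import math

def circumcenter(a, b, c):
    """Center of the circle through three points, or None if collinear."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-9:
        return None
    ux = ((ax**2 + ay**2) * (by - cy) + (bx**2 + by**2) * (cy - ay)
          + (cx**2 + cy**2) * (ay - by)) / d
    uy = ((ax**2 + ay**2) * (cx - bx) + (bx**2 + by**2) * (ax - cx)
          + (cx**2 + cy**2) * (bx - ax)) / d
    return (ux, uy)

def fit_arc(stroke):
    a, b = stroke[0], stroke[-1]
    dx, dy = b[0] - a[0], b[1] - a[1]
    chord2 = dx * dx + dy * dy or 1.0
    # stroke point whose projection onto the chord lands nearest its midpoint
    m = min(stroke, key=lambda p: abs(((p[0] - a[0]) * dx
                                       + (p[1] - a[1]) * dy) / chord2 - 0.5))
    center = circumcenter(a, m, b)
    if center is None:
        return None                       # degenerate: better fit as a line
    return center, math.dist(center, a)   # (center, radius)
```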

A circle is just another case of arc fitting, except that the total change in the direction graph should be at or around 2π.

If the stroke intersects itself, another fit should be computed that takes into account the self-intersection points as corners. The best fit between the original computation method and the new, self-intersection method is taken to be the final corner set.

The corner finder "cleans up" the best fit by merging or removing segments. If a segment is short and not connected to two other segments, it is removed (eliminating hooks). If a segment is short and connected to two other segments, it is split at its midpoint and the two halves are attached to the larger neighboring shapes. If two segments have similar angles and are in close proximity (touching or almost connected), they are merged.

The paper also includes some basic object recognition, but it really is no more than some simple geometric classification combined with the fitting methods already described.

The system worked very well on lines, arcs, and polylines, with correct corner rates in the mid-to-upper 90s. On hybrid line-and-arc shapes, though, the system had mixed results: it still failed to find corners where lines and arcs smoothly transition into one another, obtaining 70% accuracy in those cases and around 90% in the other hybrid cases.


Discussion:

Yu and Cai's paper shows another method for corner finding, this time without speed data. Yet their approach is very similar to Sezgin's: although it is described differently, the system still essentially looks at the current corner fit, adds another possible corner, and analyzes the fit again.

The main change in this system is the way that it handles circles and arcs. The approach would seem to be more accurate on some cases, such as circles. Also, looking at the slope of a large portion of the direction graph seems like a good technique to distinguish between lines and arcs.

Overall I was disappointed with the system's accuracy; it doesn't appear to be significantly different from Sezgin's. The main issue I have with the corner finder is that it still relies only on curvature values to find corners. In a project I worked on last year I had to find edges within a digital image, and I think a modified edge detection algorithm might work well for finding corners: start by looking at a small segment of points and determine the least squares fit, then keep adding chunks of points to the segment until the fit starts to level off. When the fit starts to change again, backtrack a little and add a corner. Then repeat, creating a new segment starting at the new corner.

The method can detect more subtle differences in curvature that a global curvature threshold will not detect (the threshold for each segment is the local change). I'll probably implement this later if I have the time and see how it goes. I know one issue with this approach is that it can be computationally intensive if you have long segments.

Wednesday, September 12, 2007

Sketch Based Interfaces: Early Processing for Sketch

Summary:

Sezgin et al.'s paper describes a way to find corners (vertices) of freely drawn symbols in order to break the symbol into lines and arcs. These lines and arcs can then be used to define the symbol, which can be passed into a geometrical recognizer for classification.

The vertex finding algorithm works by locating points of high curvature and minimum pen speed. Curvature is defined as being the change in direction at a given point, and speed is the change in arc length over time. Although the paper does not mention this, the curvature value for a point can be found by using a least squares fit over a window of points from p - k to p + k, and taking the slope of the fit (while watching for vertical lines, of course). Once we have the curvature and speed values for each point in the stroke we can find the local minima points that drop below a speed threshold and local maxima points that rise above a curvature threshold. In this paper the speed threshold is taken to be 90% of the average speed, and the curvature threshold is equal to the average curvature. These minima and maxima are possible vertices for speed and curvature, respectively.
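
The candidate-generation step is mechanical enough to sketch (speed and curvature are assumed precomputed per point, as described above):

```python
def corner_candidates(speed, curvature):
    """Initial corners = speed minima below 90% of mean speed intersected with
    curvature maxima above mean curvature; the rest feed the hybrid fits."""
    s_thresh = 0.9 * sum(speed) / len(speed)
    c_thresh = sum(curvature) / len(curvature)
    s_cands = {i for i in range(1, len(speed) - 1)
               if speed[i] < s_thresh
               and speed[i] <= speed[i - 1] and speed[i] <= speed[i + 1]}
    c_cands = {i for i in range(1, len(curvature) - 1)
               if curvature[i] > c_thresh
               and curvature[i] >= curvature[i - 1]
               and curvature[i] >= curvature[i + 1]}
    initial = sorted(s_cands & c_cands)    # corners both signals agree on
    remaining = sorted(s_cands ^ c_cands)  # scored and added via hybrid fits
    return initial, remaining
```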

Taking the intersection of the speed and curvature candidates gives a starting set of corners, and Sezgin et al. then go through the remaining candidates and calculate "fits" for each. The curvature metric is the magnitude of the average of two curvature values some k points away, divided by the arc length between those points. The speed metric is 1 - (speed at point / max speed). The remaining corner candidates are then sorted in their respective lists in descending order of metric value.

The algorithm then takes one remaining corner from each set (remaining speed and remaining curvature) and generates two new "hybrid fits": current corner set + speed candidate, and current corner set + curvature candidate. The hybrid fits are then tested using a least squares fit between each vertex, and the fit with the least error becomes another possible corner fit for the stroke. More fits are generated until all remaining candidate corners are used up. The best fit is chosen based on an error threshold: from the fits below the threshold, the one with the fewest corners is selected.

Since least squares fitting does not work on curved regions, the algorithm approximates any arcs in a stroke with Bezier curves. If the error on a curve being approximated is greater than a threshold, the arc is split at its middle and two new curve approximations are created. These curves are used when computing the fit errors.

The system can then beautify the sketch, which is a trivial process if we already have all of the vertices and the Bezier curve approximations. Overall the system had great results, and users "praised" the research because it allowed them to draw objects free-hand. The system accurately found the corners 96% of the time.


Discussion:

This paper is a cornerstone for vertex finding, pun intended. The research is well discussed, the algorithms are relatively easy to implement, and the results are fantastic.

The main thing that I would have liked to see included in the paper was a mention of where and how the algorithm can make a mistake. From my experience I know that corner finding is difficult on poorly drawn circles. Speed variations for circles can be slight, and if the circle looks more like an oval (as in the number 0) there can be a small protrusion or bump at the bottom of the circle that acts as a corner. I'd be interested to see how his system performed on these types of shapes.

Also, thresholds are very tricky when you take into account shapes that include both polylines and arcs. If a complex shape needs to be drawn in a single stroke, a user can sometimes pause in the middle of drawing to think about how to draw the next section. This pause will definitely register as a vertex, but it throws off the speed average and hurts the detection of more subtle corners.

Thursday, September 6, 2007

MARQS: Retrieving Sketches Using Domain- and Style-Independent Features Learned from a Single Example Using a Dual-Classifier

Summary:

Brandon's paper proposes an algorithm for searching a laboratory notebook using sketches. The algorithm needs to be invariant to user drawing styles and account for rotation and scaling issues in sketches. It must also be able to recognize a sketched symbol from only one drawing, since a user searching through their notes might have drawn a given sketch only once. Brandon created the MARQS system, a media library with sketch queries, in order to test the algorithm.

To reduce rotation variance the algorithm rotates each sketch so that its major axis is horizontal. The major axis is defined as the line between the two farthest points in the sketch. The four features used are: bounding box aspect ratio, pixel density, average curvature, and the number of corners found. If only one example sketch has been drawn, a query runs a single classifier that simply checks the error between the query's features and the example's. A linear classifier is used when there is more than one example sketch at query time.
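
As a minimal sketch of the feature computation and the single-example query, assuming strokes are lists of (x, y) points already rotated so the major axis is horizontal; average_curvature and corner_count below stand in for the paper's actual computations:

    def features(sketch):
        # sketch: list of strokes; each stroke is a list of (x, y) points.
        pts = [p for stroke in sketch for p in stroke]
        xs = [p[0] for p in pts]
        ys = [p[1] for p in pts]
        w = (max(xs) - min(xs)) or 1.0
        h = (max(ys) - min(ys)) or 1.0
        return [w / h,                      # bounding box aspect ratio
                len(pts) / (w * h),         # pixel density (a rough proxy)
                average_curvature(sketch),  # assumed helper
                corner_count(sketch)]       # assumed helper

    def single_example_error(query, example):
        # With only one stored example, rank results by summed feature error.
        return sum(abs(q - e) for q, e in zip(features(query), features(example)))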

The average ranking (or classifier rank) for each sketch query was 1.51. As the system was used more often the results showed that the ranking improved, since the system shifted from the single-example classifier to the linear classifier as more examples accumulated.


Discussion:

The system seems to work pretty well for such a simple search algorithm, especially since it only uses four features. The downward trend in average rank is also very promising since searches will likely be performed often.

MARQS' main limitation is in the features it can use, since the system runs a linear classification on a freely drawn sketch. Each feature has to be invariant to the number of strokes, rotation, and scale. Yet the feature for the number of perceived corners could easily change between sketches: somebody might draw a tree with one jagged line one time and in small dashes another. Merging close endpoints into one larger stroke might benefit the system, and then other features like average stroke length could be used.

Another thing to think about: if the journal system is ever implemented, how do you distinguish between sketches and handwriting? I recently saw some great research from Auckland, but I can't find the paper online to link here. If you really want to know more, ask me for a copy.

Tuesday, September 4, 2007

"Those Look Similar!" Issues in Automating Gesture Design Advice

Summary:

Gestures that are similar are hard for a computer to classify and difficult for a user to remember. Long and Landay created a tool called quill that helps designers improve the gestures they want to use by highlighting gesture similarities. Designers can then take these similarities into account and redesign their gestures to be more unique.

When using quill, designers take a gesture they want implemented in their system and draw some examples into the tool. After more than one gesture class has been drawn, quill begins to analyze the gestures for similarities. Similarity was determined to be based heavily upon gesture curviness and angle, as well as the density of the gesture. For more on how similarity was determined, see "Visual Similarity of Pen Gestures."
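
The exact similarity model lives in that companion paper, so this is only a hypothetical sketch of the idea: a weighted combination of feature differences, with weights fit to human similarity judgments.

    def similarity(g1, g2, weights):
        # g1, g2: feature dicts (e.g. curviness, total angle, density).
        # weights: fit to human judgments in the companion paper; any
        # values plugged in here are placeholders, not published ones.
        return -sum(w * abs(g1[k] - g2[k]) for k, w in weights.items())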

The paper goes into detail about how presenting advice to the user is a large interface challenge. Long and Landay did not want quill to annoy expert gesture designers, but they also wanted to provide enough advice that both novices and experts could benefit from the system. In the end they decided to delay advice until the designer decides to test a gesture, reasoning that designers are probably done tweaking their initial gestures by the time they test them. quill also provides more subtle warnings to designers whenever they pause.


Discussion:

Improving gesture designs by letting the computer point out when gestures might be confusing is a great way to safeguard against poor gesture choices. The analysis of when to present advice to a user is also a large UI problem outside of quill; for instance, when should a stroke/symbol be beautified, or when should a symbol be recognized? Long and Landay provided a good overview of their decisions, and I really liked their choice to present advice when the designer decides to test their classes.

I'm curious how good the advice actually is and how annoying it might be if the same advice keeps appearing every time I test my classes. If I design a left-to-right horizontal line to be a "Page Forward" gesture and a right-to-left horizontal line to be "Page Back," the computer might tell me that the gestures are similar, but the gestures have such a good mapping to forward and back arrows that users shouldn't confuse the two. I wouldn't want the system to constantly warn me about this issue.

Friday, August 31, 2007

Specifying Gestures by Example

Summary:

Rubine's gesture recognition system, GRANDMA, is a single-stroke gesture recognizer and toolkit that lets developers add gestures to their applications. Gestures can be useful when they provide a means for intuitive input. As an example, the paper shows how a gesture-based drawing program (GDP) could use gestures to create simple shapes and edit them. These gestures could be created with GRANDMA by first defining what types (or classes) of gestures will be used and then collecting examples for each class. It was empirically determined that fifteen examples should suffice for each class.

Drawn gestures are composed of an array of time-stamped points. Thirteen features are calculated for each gesture, such as the initial angles of the gesture, the angle of the bounding box, the length of the bounding box diagonal, the total length and total rotation of the gesture, the smoothness of the gesture, and the time taken to draw it. These features are invariant to gesture placement (i.e., where the gesture was drawn), but they are sensitive to scale and rotation.

To classify a gesture, we take the dot product of its feature vector with a weight vector for each defined gesture class; the class whose dot product is maximal is chosen. The weight vectors are computed during training: each example gesture has its feature vector calculated, the per-class average feature vector is taken, and the weights that best discriminate between classes are found using covariance matrices (http://mathworld.wolfram.com/Covariance.html).
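
A minimal numpy sketch of the training and classification steps under those definitions; the weights fall out of a pooled covariance matrix. This omits Rubine's rejection heuristics and assumes several examples per class:

    import numpy as np

    def train(examples):
        # examples: {class name: list of 13-dimensional feature vectors}
        means = {c: np.mean(v, axis=0) for c, v in examples.items()}
        # Pooled (common) covariance matrix across all classes.
        n = sum(len(v) for v in examples.values())
        scatter = sum(np.cov(np.array(v).T, bias=True) * len(v)
                      for v in examples.values())
        inv = np.linalg.pinv(scatter / (n - len(examples)))
        # Per-class (bias, weight vector) pairs.
        return {c: (-0.5 * mu @ inv @ mu, inv @ mu) for c, mu in means.items()}

    def classify(weights, f):
        # Evaluate w0 + w . f for each class and pick the maximum.
        f = np.asarray(f)
        return max(weights, key=lambda c: weights[c][0] + weights[c][1] @ f)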

Overall the gesture system worked very well, but as the number of gesture classes increased, the recognition rate dropped. Increasing the number of training examples improved the recognition rate up to around 50 examples, after which there appeared to be either a plateau or overfitting.


Discussion:

Rubine's paper is a great demonstration that sketch recognition can be simple, fast, and reliable if the user is constrained in certain ways. Gestures are easy to define with GRANDMA, and the calculations to classify gestures can happen in real time. The system also had outstanding recognition results with small numbers of gestures, between 5 and 10 classes. Even as the number of classes was increased to 30, the recognition rate dropped but remained acceptable at around 96.5%.

The main problem with the system is that it requires a lot of constraints on the user's end. As with Palm Pilot Graffiti, over time the user will grow accustomed to drawing in a certain way. This isn't necessarily a bad thing; with any new appliance or application, people need to be trained to use it. My new toaster works much differently than my old one and I'm still getting adjusted to the settings. Even with newer non-gesture software, such as Tablet PC handwriting recognition, I have grown accustomed to drawing my lower case Ls in cursive since the print version is confused with the number 1 too often. Yet when an application is thought of as being intuitive, there is much less wiggle room for how much training is needed. If Photoshop does not work as I intended, I'm likely to blame myself for a mistake, whereas if the computer does not recognize my circle gesture, I'm more likely to blame the software.

In the case of GRANDMA the rotation and scale constraints are a bit too much in my opinion; I would try to normalize everything to a standardized bounding box to eliminate scale. Yet these could be acceptable in some situations, such as full keyboard gestures where we try to recognize '/' versus '|' versus '1'.


Rubine, D. 1991. Specifying gestures by example. In Proceedings of the 18th Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH '91). ACM Press, New York, NY, 329-337.

http://portal.acm.org/citation.cfm?id=122753

Wednesday, August 29, 2007

Introduction to Sketch Recognition

Summary:

The paper comprises a brief but comprehensive look at the sketch recognition field. Topics covered include current pen-based hardware, pen-centric software, and various uses for Tablet PCs.

The main focus of the Tablet PC discussion is an overview of how they could be used in an educational environment. Pen-based technology has been shown to have a mixed effect on student performance in the classroom, but overall the reception by students has been positive. Teachers feel that some Tablets and software help with class lectures, such as using Windows Journal to create presentation templates. Yet Tablet hardware also limits teachers, who are tethered to a projector by cables or forced to work in a small amount of space. Tablet software can also help students learn on their own: MathPad allows students to check their math notes and homework, and Physics Simulator allows students to model and simulate mechanical engineering diagrams.

In two case studies, middle and high school teachers evaluated Tablet PCs in their classes. Both teachers used the technology differently but enjoyed using the computers to enhance their teaching. One teacher used Tablets to record demos, either at home or in class, for archiving. The other used Tablets to create presentation templates that could be saved and written over during class. The templates could be reused annually, saving the teacher class preparation time.


Discussion:

A good discussion topic for this paper would be "Where do you think this technology is heading?" and "How can we improve it further for classroom use?" Tablet technology is getting cheaper and software is providing better support for pens. Although cost is a limitation right now (smart boards vs. regular white boards and projectors, notebooks vs. tablets), what people should think about is "What if we had no limitations?" It's a fun science fiction question to ask at this time.

Sketchpad

Summary:

Sketchpad was the first sketch recognition system and was developed in the early sixties. The system used a light pen and keyboard buttons to interact with the computer and create design-quality drawings. The light pen would draw or select objects on the computer screen while the keyboard would toggle various constraints to place on the current drawing or selection.

The constraint system was one of the key features that allowed Sketchpad to produce good-looking drawings. With constraints a user could draw perfect lines and circles without having to worry about pen noise; they would simply hold a button indicating that what will be drawn next is a line or circle. Furthermore, touching up a drawing was easy with constraints that could force selected groups of lines to be horizontal, parallel, or perpendicular. Sketchpad also allowed corner snapping constraints, which locked a corner of one object onto an edge or corner of another. The constraint system was implemented symbolically: constraints were stored with the variables they affected, and the list of constraints on a drawing could be evaluated to ensure they were all satisfied. Relaxation and one-pass are the two constraint satisfaction methods mentioned.
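
To illustrate relaxation with a toy sketch (my own, not Sketchpad's actual implementation): each constraint nudges the points it touches to reduce its own error, and the loop repeats until the drawing settles.

    import math

    def relax(points, constraints, iters=200, step=0.5):
        # points: {name: [x, y]}; constraints: callables that return a
        # list of (point name, dx, dy) nudges given the current points.
        for _ in range(iters):
            for constraint in constraints:
                for name, dx, dy in constraint(points):
                    points[name][0] += step * dx
                    points[name][1] += step * dy
        return points

    def fixed_distance(a, b, d):
        # Example constraint: keep points a and b exactly d apart.
        def constraint(points):
            (ax, ay), (bx, by) = points[a], points[b]
            dist = math.hypot(bx - ax, by - ay) or 1e-9
            err = (dist - d) / dist
            return [(a, 0.5 * err * (bx - ax), 0.5 * err * (by - ay)),
                    (b, -0.5 * err * (bx - ax), -0.5 * err * (by - ay))]
        return constraint

    # Pull two points to a distance of 10 units apart.
    pts = relax({"a": [0.0, 0.0], "b": [3.0, 4.0]}, [fixed_distance("a", "b", 10.0)])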

Sketchpad worked best when a sketch required a lot of repeated patterns and shapes. The system could create instances (shallow copies) or copies (deep copies) of drawn objects, which could then be resized, rotated, and moved around the drawing space. A design requiring a lot of repetition, such as the hexagonal example in the paper, can be created very quickly without having to repeatedly draw hundreds of hexagons.

Discussion:

Sketchpad is a great system that was revolutionary for its time. As we discussed in class it was the first sketch recognition system and the paper highlighted some key areas that sketch recognition technology could be beneficial, such as in architecture, art, and engineering. Sketchpad also established some object oriented programming techniques by providing shallow and deep copies of drawn objects.

The use of constraints was a good method to ensure that the drawing looked good, but it also forced the user to remember many keyboard commands. Also, the keyboard can only handle as many constraints as the number of keys if the constraints are kept on separate keys; multiple key combinations could allow for n-factorial more constraints but at a high cost of usability. To alleviate this burden it might have been better to have two separate screens: the drawing screen and a constraint menu screen. The constraint buttons on the side of the Sketchpad window could be shifted to the new constraint menu screen with each button pointing to a menu option. The options would follow from the constraints described in the paper, with drawing and selection constraints in hierarchical menus.

Introduction

Name: Aaron Wolin

Year: First Year PhD student

Email: awolin at neo dot tamu dot edu

Academic Interests:

My current academic interests consist of Sketch Recognition, AI, and HCI. I’ve been involved with sketch recognition projects for over a year now and find them very exciting from an HCI and AI perspective. The use of a pen as input heavily constrains traditional mouse and keyboard input possibilities while simultaneously allowing for new applications, such as handwriting programs. Sketch recognition also requires a great deal of AI since computers need to have knowledge of what is being drawn.

Relevant Experience:

  • Impro-Visor – Research application from Harvey Mudd College that teaches amateur jazz musicians to compose solos. Musicians enter the notes of a solo within a composition window, and advice concerning the solo’s musical “correctness” is displayed to them. The advice manager also allows students to pick from good scales and tones that might fit their solo. http://www.cs.hmc.edu/~keller/jazz/improvisor.html
  • Circuit Diagram Recognizer – A continuing sketch recognition research project at Harvey Mudd College. The overall goal of the project is to give students feedback on their drawn circuit diagrams, such as those from class notes. An ideal scenario: an engineering student draws a diagram free-hand with an accompanying truth table, runs it through our recognizer, and then sees whether the circuit has been implemented correctly based on the truth table provided. The project is heavily AI based since we do not constrain student drawing styles. http://www.cs.hmc.edu/~alvarado/research/sketch.html
  • Document Finding (before OCR) – Another Harvey Mudd College project, in which I worked with a document digitizing company called Laserfiche. Our project’s focus was to find and crop documents in pictures taken by digital cameras. This essentially uses the camera as a scanner and would allow for more portable document digitizing and OCR software.

Why I'm taking this class:

Although I’ve already had some sketch recognition experience, my research has been focused on free-style sketch recognition. There are many other areas (gesture-based systems, sketch beautification) that I have not explicitly worked with, and I want to expand my knowledge of the field.

What I hope to gain:

A general expansion of my sketch recognition knowledge, as well as having fun and learning techniques that can be applied to my research.

What I'll be doing in 5 years:

I'll (hopefully) be finishing up my research and time here at Texas A&M.

What I'll be doing in 10 years:

I don't know, going to Mars. My current thoughts are to go into industrial research, but anything can change in 10 years. Four years ago I wouldn't have said I'd be going to graduate school. In 10 years anything can happen.

Non-academic interests:

Reading, movies, playing poker, concerts, board games, mixology

Fun Story:

Every year at Harvey Mudd my friends and I would make our own sushi. One of my friends had some sushi making supplies and he taught everybody how to make rolls. During our third year we discovered a website where you can get really fresh fish, but the catch was that you had to order over $50 worth to qualify for shipping. The last two years we had so much fish that, after we had gorged ourselves, we were shouting at people outside our door to come in and eat sushi.