2008 Book InformationVisualization
Editorial Board
David Hutchison
Lancaster University, UK
Takeo Kanade
Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler
University of Surrey, Guildford, UK
Jon M. Kleinberg
Cornell University, Ithaca, NY, USA
Alfred Kobsa
University of California, Irvine, CA, USA
Friedemann Mattern
ETH Zurich, Switzerland
John C. Mitchell
Stanford University, CA, USA
Moni Naor
Weizmann Institute of Science, Rehovot, Israel
Oscar Nierstrasz
University of Bern, Switzerland
C. Pandu Rangan
Indian Institute of Technology, Madras, India
Bernhard Steffen
University of Dortmund, Germany
Madhu Sudan
Massachusetts Institute of Technology, MA, USA
Demetri Terzopoulos
University of California, Los Angeles, CA, USA
Doug Tygar
University of California, Berkeley, CA, USA
Gerhard Weikum
Max-Planck Institute of Computer Science, Saarbruecken, Germany
Andreas Kerren John T. Stasko
Jean-Daniel Fekete Chris North (Eds.)
Information
Visualization
Human-Centered Issues and Perspectives
Volume Editors
Andreas Kerren
Växjö University
School of Mathematics and Systems Engineering
Computer Science Department
Vejdes Plats 7, 351 95 Växjö, Sweden
E-mail: [email protected]
John T. Stasko
Georgia Institute of Technology
School of Interactive Computing and GVU Center
85 5th St., NW, Atlanta, GA 30332-0760, USA
E-mail: [email protected]
Jean-Daniel Fekete
INRIA Saclay - Île-de-France Research Centre
Bat. 490, Université Paris-Sud
91405 Orsay Cedex, France
E-mail: [email protected]
Chris North
Virginia Tech, Department of Computer Science
and Center for Human-Computer Interaction
2202 Kraft Drive, Blacksburg, VA 24061-0106, USA
E-mail: [email protected]
ISSN 0302-9743
ISBN-10 3-540-70955-X Springer Berlin Heidelberg New York
ISBN-13 978-3-540-70955-8 Springer Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is
concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting,
reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication
or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965,
in its current version, and permission for use must always be obtained from Springer. Violations are liable
to prosecution under the German Copyright Law.
Springer is a part of Springer Science+Business Media
springer.com
© Springer-Verlag Berlin Heidelberg 2008
Printed in Germany
Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India
Printed on acid-free paper SPIN: 12438815 06/3180 543210
Preface
Book Structure
Paper 1 “The Value of Information Visualization”, provides a discussion of
issues surrounding the utility and benefits of InfoVis. The paper identifies
why communicating the value of InfoVis is more difficult than in many other
areas, and it provides a number of arguments and examples that help to
illustrate InfoVis’ value.
Paper 2 “Evaluating Information Visualizations”, discusses the challenges as-
sociated with the evaluation of InfoVis tools and approaches. Different types
of evaluation are described as well as the advantages and disadvantages of
different empirical methodologies.
Paper 3 “Theoretical Foundations of Information Visualization”, addresses an
important issue: InfoVis, being related to many other diverse disciplines, suf-
fers from not being based on a clear underlying theory. Drawing on theories
within associated disciplines, three different approaches to theoretical foun-
dations of information visualization are presented: data-centric predictive
theory, information theory, and scientific modeling.
Paper 4 “Teaching Information Visualization”, presents the results of a survey
about InfoVis-related courses that was distributed to the Dagstuhl atten-
dees during the seminar. It also summarizes the discussions about teaching
held by the attendees during a special session on that topic. The paper in-
cludes the perspectives of three seminar participants in relation to their own
InfoVis-related teaching experiences.
Paper 5 “Creation and Collaboration: Engaging New Audiences for Informa-
tion Visualization”, discusses creation and collaboration tools for interactive
visualization. The paper characterizes the increasingly diverse audience for
visualization technology, and it formulates a design space for new creative
and collaborative tools to support these users.
Paper 6 “Process and Pitfalls in Writing Information Visualization Research
Papers”, identifies a set of pitfalls and problems that recur in many InfoVis-
related papers, using a chronological model of the research process. The
aim of this paper is to help authors avoid these pitfalls and write better
papers. Reviewers might also find these pitfalls interesting to consider when
evaluating the merits of a paper.
Paper 7 “Visual Analytics: Definition, Process, and Challenges”, describes the
related and growing field of visual analytics. The paper explains the perceived
difference between visual analytics and InfoVis, and it identifies the technical
challenges faced by visual analytics researchers. The paper concludes by
describing a number of visual analytics applications.
Acknowledgments
We would like to thank all those who participated in the seminar for the lively
discussions as well as the scientific directorate of Dagstuhl Castle for giving us
the opportunity to organize this event. The abstracts and talks can be found on
the Dagstuhl website for this seminar2 . In addition, we are also grateful to all
the authors for their valuable time and contributions to the book. Last but not
least, the seminar would not have been possible without the great help of the
staff of Dagstuhl Castle. We would like to thank all of them for their assistance.
2 http://www.dagstuhl.de/07221
Table of Contents
Jean-Daniel Fekete1, Jarke J. van Wijk2, John T. Stasko3, and Chris North4
1 Université Paris-Sud, INRIA, Bât 490, F-91405 Orsay Cedex, France, [email protected], http://www.aviz.fr/~fekete/
2 Department of Mathematics and Computing Science, Eindhoven University of Technology, P.O. Box 513, 5600 MB Eindhoven, The Netherlands, [email protected], http://www.win.tue.nl/~vanwijk/
3 School of Interactive Computing, College of Computing & GVU Center, Georgia Institute of Technology, 85 5th St., NW, Atlanta, GA 30332-0760, USA, [email protected], http://www.cc.gatech.edu/~john.stasko
4 Dept of Computer Science, 2202 Kraft Drive, Virginia Tech, Blacksburg, VA 24061-0106, USA, [email protected], http://people.cs.vt.edu/~north/
A. Kerren et al. (Eds.): Information Visualization, LNCS 4950, pp. 1–18, 2008.
© Springer-Verlag Berlin Heidelberg 2008
2 J.-D. Fekete et al.
to show that their inventions are measurably better than the existing state of
the art.
In broad analytic fields, of which we include InfoVis as a member, the exis-
tence of a ground truth for a problem can greatly facilitate evaluations of value.
For instance, consider the field of computer vision and algorithms for identifying
objects from scenes. It is very easy to create a library of images upon which new
algorithms can be tested. From that, one can measure how well each algorithm
performs and compare results precisely. The TREC [29] and MUC [3] Contests
are examples of this type of evaluation.
Even with a human in the loop, certain fields lend themselves very well to
quantifiable evaluations. Consider systems that support search for particular
documents or facts. Even though different people will perform differently using
a system, researchers can run repeated search trials and measure how often a
person is able to find the target and how long the search took. Averaged over a
large number of human participants, this task yields quantifiable results that can
be measured and communicated quite easily. People or organizations then using
the technology can make well-informed judgments about the value of new tools.
So why is identifying the value of InfoVis so difficult? To help answer that
question, let us turn to what is probably the most accepted definition of InfoVis,
one that comes from Card, Mackinlay, and Shneiderman and that actually is
their definition for “visualization.” They describe visualization as “the use of
computer-supported, interactive visual representations of data to amplify cog-
nition.” [2] The last three words of their definition communicate the ultimate
purpose of visualization, to amplify cognition. So, returning to our discussion
above, is the amplification of cognition something with a ground truth that is
easily and precisely measurable? Clearly it is not, and therein lies the key challenge
in communicating the value of InfoVis.
Further examining the use and purpose of InfoVis helps explain why
communicating its value is so difficult. InfoVis systems are best applied to
exploratory tasks, ones that involve browsing a large information space. Frequently,
the person using the InfoVis system may not have a specific goal or question in
mind. Instead, the person simply may be examining the data to learn more about
it, to make new discoveries, or to gain insight about it. The exploratory process
itself may influence the questions and tasks that arise.
Conversely, one might argue that when a person does have a specific question
to be answered, InfoVis systems are often not the best tools to use. Instead, the
person may formulate his or her question into a query that can be dispatched
to a database or to a search engine that is likely to provide the answer to that
precise question quickly and accurately.
InfoVis systems, on the other hand, appear to be most useful when a person
simply does not know what questions to ask about the data or when the person
wants to ask better, more meaningful questions. InfoVis systems help people to
rapidly narrow in from a large space and find parts of the data to study more
carefully.
Unfortunately, however, activities like exploration, browsing, gaining insight,
and asking better questions are not ones that are easily amenable to establishing
The Value of Information Visualization 3
and measuring a ground truth. This realization is at the core of all the issues
involved in communicating the value of InfoVis. By its very nature, by its very
purpose, InfoVis presents fundamental challenges for identifying and measuring
value. For instance, how does one measure insight? How does one quantify the
benefits of an InfoVis system used for exploring an information space to gain a
broad understanding of it? For these reasons and others, InfoVis is fundamentally
challenging to evaluate [17].
If we accept that InfoVis may be most valuable as an exploratory aid, then
identifying situations where browsing is useful can help to determine scenarios
most likely to illustrate InfoVis’ value. Lin [11] describes a number of conditions
in which browsing is useful:
– When there is a good underlying structure so that items close to one another
can be inferred to be similar
– When users are unfamiliar with a collection’s contents
– When users have limited understanding of how a system is organized and
prefer a less cognitively loaded method of exploration
– When users have difficulty verbalizing the underlying information need
– When information is easier to recognize than describe
These conditions serve as good criteria for determining situations in which the
value of InfoVis may be most evident.
example shown in Figure 1. Part (a) shows a spreadsheet with data for the 50
states and the District of Columbia in the U.S. Also shown are the percentage
of citizens of each state with a college degree and the per capita income of the
states’ citizens.
Given just the spreadsheet, answering a question such as, “Which state has
the highest average income?” is not too difficult. A simple scan of the income
column likely will produce the correct answer in a few seconds. More complex
questions can be quite challenging given just the data, however. For example,
are the college degree percentage and income correlated? If they are correlated,
are there particular states that are outliers to the correlation? These questions
are much more difficult to answer using only the spreadsheet.
Now, let us turn to a graphical visualization of the data. If we simply draw
the data in a scatterplot as shown in part (b), the questions now become much
easier to answer. Specifically, there does appear to be an overall correlation
between the two attributes and states such as Nevada and Utah are outliers on
the correlation. The simple act of plotting the spreadsheet data in this more
meaningfully communicative form makes these kinds of analytic queries easier
to answer correctly and more rapidly.
Note that the spreadsheet itself is a visual representation of the data that
facilitates queries as well. Consider how difficult the three questions would be if
the data for each state were recorded on a separate piece of paper or webpage.
Or worse yet, what if the data values were read to you and you had to answer
According to information theory, vision is the sense with the largest
bandwidth: 100 Mb/s [30]. Audition only has around 100 b/s. In that respect,
the visual channel is the best suited to carrying information to the brain.
According to Ware [30], there are two main psychological theories that explain
how vision can be used effectively to perceive features and shapes. At a low
level, preattentive processing theory [23] explains which visual features can be
effectively processed. At a higher cognitive level, Gestalt theory [9] describes
some principles used by our brain to understand an image.
Preattentive processing theory explains that some visual features can be
perceived very rapidly and accurately by our low-level visual system. For example,
when looking at the group of blue circles in Figure 2, it takes almost no time or
effort to see the red circle in the middle. It would be as easy and fast to see that
there is no red circle, or to estimate the relative quantities of red and blue
circles. Color is one type of feature that can be processed preattentively, but only
for some tasks and within some limits. For example, if there were more than
seven colors used in Figure 2, answering the question could not be done with
preattentive processing and would require sequential scanning, a much longer
process.
Fig. 2. Example of a preattentively processed task: finding whether there is a red circle
among the blue circles
There is a long list of visual features that can be preattentively processed for
some tasks, including line orientation, line length or width, closure, curvature,
color and many more. Information visualization relies on this theory to choose
the visual encoding used to display data to allow the most interesting visual
queries to be done preattentively.
Gestalt theory explains important principles followed by the visual system
when it tries to understand an image. According to Ware [30], it is based on the
following principles:
– Proximity: Things that are close together are perceptually grouped together;
– Similarity: Similar elements tend to be grouped together;
– Continuity: Visual elements that are smoothly connected or continuous tend
to be grouped;
– Symmetry: Two symmetrically arranged visual elements are more likely to be
perceived as a whole;
– Closure: A closed contour tends to be seen as an object;
– Relative Size: Smaller components of a pattern tend to be perceived as objects,
whereas larger ones tend to be perceived as background.
3 Success Stories
Static examples used by most InfoVis courses include the map of Napoleon’s
1812 March on Moscow drawn in 1869 by M. Minard (Figure 3) and the map of
London in 1854, overlaid with marks locating cholera victims, that led John
Snow to discover the origin of the epidemic: infected water drawn from a
water pump at the center of the marks (Figure 4).
In general, good examples show known facts (although sometimes forgotten)
and reveal several unexpected insights at once. Minard’s map can help answer the
question: “What were the casualties of Napoleon’s Russian invasion in 1812?”.
The map reveals at once the magnitude of casualties (from 400,000 to 10,000
Fig. 3. Napoleon’s March on Moscow depicted by M. Minard [12]. Width indicates the
number of soldiers. Temperature during the retreat is presented below the map. Image
courtesy of École Nationale des Ponts et Chaussées.
soldiers) as well as the devastating effect of crossing the Berezina river (50,000
soldiers before, 25,000 after). The depiction confirms that Napoleon lost the
invasion (a well known fact) and reveals many other facts, such as the continuous
death rate due to disease and the “scorched earth” tactics of Russia instead of
specific death tolls of large battles.
John Snow’s map was made to answer the question: “What is the origin of the
London cholera epidemics?”. Contrary to the previous map, the answer requires
some thinking. Black rectangles indicate location of deaths. At the center of the
infected zone lies a water pump that John Snow found to be responsible for the
infection. Once again, choosing the right representation was essential for finding
the answer. As a side-effect, the map reveals the magnitude of the epidemic.
Figure 1 answers the question: “Is there a relationship between income and
college degree?” by showing a scatter plot of income by degree for each US state.
The answer is the obvious one: yes, but there is much more. There seems to be a
linear correlation between them, and some outliers such as Nevada (likely due to
Las Vegas) and Utah do exist, raising new unexpected questions.
Information Visualization couples interaction and visual representation, so its
power is better demonstrated interactively. The simplest demonstration suited
to the largest audience is probably the Dynamic HomeFinder5 [32]. It shows the
map of the Washington D.C. area overlaid with all the homes for sale (Figure 5).
Dynamic queries implemented by sliders and check-boxes interactively filter out
homes that do not fit specific criteria such as cost or number of bedrooms.
Using the interactive controls, it becomes easy to find homes with the desired
attributes or understand how attributes’ constraints should be relaxed to find
some matching homes. Unexpectedly, the Dynamic HomeFinder also reveals the
5 http://www.cs.umd.edu/hcil/pubs/dq-home.zip
Fig. 4. Illustration of John Snow’s deduction that a cholera epidemic was caused by a
bad water pump, circa 1854 [4]. Black rectangles indicate location of deaths.
unpopular neighborhoods around Washington D.C. since they are places where
the homes are cheaper, and the wealthy ones where the houses are more expen-
sive.
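The filtering behaviour of dynamic queries can be sketched in a few lines. This is a minimal illustration and not the Dynamic HomeFinder's actual implementation: the `Home` record, the `dynamic_query` function, and the listings are all hypothetical, and slider positions are modeled as plain function arguments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Home:
    """Hypothetical listing with the two attributes named in the text."""
    address: str
    price: int      # asking price in dollars
    bedrooms: int


def dynamic_query(homes, max_price, min_bedrooms):
    """Return the homes that satisfy the current slider settings."""
    return [
        home for home in homes
        if home.price <= max_price and home.bedrooms >= min_bedrooms
    ]


# Invented sample data, for illustration only.
HOMES = [
    Home("A Street", 250_000, 2),
    Home("B Avenue", 410_000, 3),
    Home("C Road", 320_000, 4),
    Home("D Lane", 590_000, 5),
]

# A strict setting matches nothing ...
strict = dynamic_query(HOMES, max_price=300_000, min_bedrooms=3)
# ... and moving the price slider shows how far the constraint must be relaxed.
relaxed = dynamic_query(HOMES, max_price=350_000, min_bedrooms=3)

print([home.address for home in strict])
print([home.address for home in relaxed])
```

In an interactive tool the query would be re-run on every slider movement and the map redrawn immediately, which is what lets a user see at once how loosening one constraint changes the set of matching homes.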
Many more examples can be found to demonstrate that InfoVis is effective.
The Map of the Market6, represented by a squarified treemap, is interesting
for people holding stocks or interested in economic matters. The InfoZoom video
on the analysis of Formula 1 results7 is interesting for car racing enthusiasts.
The video8 comparing two large biological classification trees is interesting to
some biologists. The Baby Name Wizard’s NameVoyager9 is useful for people
searching for a name for their expected baby, and for a large number of other
people, as witnessed by [31].
With the advent of Social InfoVis through web sites such as Swivel10 or IBM’s
Many-Eyes11, more examples can be found to convince specific audiences. Still,
the process of explaining how InfoVis works remains the same: ask a question
that interests people, show the right representation, let the audience understand
the representation, answer the question and realize how many more unexpected
findings and questions arise.
6 http://www.smartmoney.com/marketmap/
7 http://www.infozoom.com/enu/infozoom/video.htm
8 http://www.fit.fraunhofer.de/~cici/InfoVis2003/StandardForm/Flash/InfoZoomTrees.html
9 http://babynamewizard.com/namevoyager/
10 http://www.swivel.com
11 http://www.many-eyes.com
Fig. 5. Dynamic HomeFinder showing the Washington D.C. area with homes available
for sale and controls to filter them according to several criteria.
3.2 Testimonials
One effective line of argumentation about the value of InfoVis is through re-
porting the success of projects that used InfoVis techniques. These stories exist
but have not been advertised in general scientific publications until recently
[20,16,13]. One problem with trying to report on the success of a project is that
visualization is rarely the only method used to achieve that success. For example,
in biological research, the insights gained by an InfoVis system can lead to an
important discovery that is difficult to attribute mainly to the visualization since
it also required months of experimentation to verify the theory formulated from
the insights. In fact, most good human-computer interaction systems allow users
to forget about the system and focus on their task only, which is probably one
reason why success stories are not so common in the InfoVis literature.
Besides these stories, which provide empirical evidence of the utility of information
visualization, there are strong theoretical arguments for how and why information
visualization works.
4.1 Statistics
More than statistics, the goal of data mining is to automatically find interesting
facts in large datasets. It is thus legitimate to wonder whether data mining, as a
competitor of InfoVis, can overcome and replace the visual capacity of humans.
This question has been addressed by Spence and Garrison [22], who
describe a simple plot called the Hertzsprung-Russell diagram (Figure 7a).
It represents the temperature of stars on the X axis and their magnitude on
the Y axis. Asking a person to summarize the diagram produces Figure 7b. It
turns out that no automatic analysis method has been able to find the same
summarization, due to the noise and artifacts on the data such as the vertical
bands.

12 See http://astro.swarthmore.edu/astro121/anscombe.html for details

       I               II              III              IV
   x      y        x      y        x      y        x      y
 10.0   8.04     10.0   9.14     10.0   7.46      8.0   6.58
  8.0   6.95      8.0   8.14      8.0   6.77      8.0   5.76
 13.0   7.58     13.0   8.74     13.0  12.74      8.0   7.71
  9.0   8.81      9.0   8.77      9.0   7.11      8.0   8.84
 11.0   8.33     11.0   9.26     11.0   7.81      8.0   8.47
 14.0   9.96     14.0   8.10     14.0   8.84      8.0   7.04
  6.0   7.24      6.0   6.13      6.0   6.08      8.0   5.25
  4.0   4.26      4.0   3.10      4.0   5.39     19.0  12.50
 12.0  10.84     12.0   9.13     12.0   8.15      8.0   5.56
  7.0   4.82      7.0   7.26      7.0   6.42      8.0   7.91
  5.0   5.68      5.0   4.74      5.0   5.73      8.0   6.89
(a) Four datasets with different values and the same statistical profile
(b) Dot Plot of the four datasets
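The claim in the caption of the four-dataset table above (commonly known as Anscombe's quartet) can be checked numerically. The sketch below transcribes the tabulated values and computes a small statistical profile for each dataset; the choice of statistics (mean, sample variance, Pearson correlation) and the rounding precision are assumptions made for this illustration, as are the helper names `pearson_r` and `profile`.

```python
from statistics import mean, variance

# x values shared by datasets I, II and III; dataset IV has its own x column.
X_SHARED = [10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0]
X_FOURTH = [8.0] * 7 + [19.0] + [8.0] * 3

DATASETS = {
    "I": (X_SHARED, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                     7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                      6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                       6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (X_FOURTH, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
                      5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed from its definition."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def profile(xs, ys):
    """Rounded summary: mean x, variance x, mean y, variance y, correlation."""
    return (
        round(mean(xs), 1),
        round(variance(xs), 1),
        round(mean(ys), 1),
        round(variance(ys), 1),
        round(pearson_r(xs, ys), 2),
    )


PROFILES = {name: profile(xs, ys) for name, (xs, ys) in DATASETS.items()}
for name, summary in PROFILES.items():
    print(name, summary)
```

At this precision the four summaries coincide, although the rows of the table are plainly different. Summary statistics alone cannot tell the datasets apart, whereas plotting them does.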
Our vision system has evolved with the human species to help us survive
in a hostile world. We have trained it to avoid obstacles from the time we learned
to walk. It remains remarkably effective at filtering out noise from useful data, a very
important capability for hunters in deep forests who must distinguish prey moving
behind leaves. We have relied on it and trained it to survive for millennia, and it
still surpasses automatic data mining methods at spotting interesting patterns. Data
mining still needs to improve to match these pattern-matching capabilities.
One important question is how to assess the value of visualization, ranging from
the evaluation of one specific use-case to the discipline in general. If we know
how to do this, then this might lead to an assessment of the current status as
well as the identification of success factors. An attempt was made by van Wijk
[27] and is summarized here. After a short overview of his model, we discuss how
this model can be applied to InfoVis.
Visualization can be considered as a technology, a collection of methods,
techniques, and tools developed and applied to satisfy a need. Hence, standard
technological measures apply: visualization has to be effective and efficient. To
measure these, an economic point of view is adopted. Instead of trying to
understand why visualization works (see previous sections), here visualization is
considered from the outside, and an attempt is made to measure its profit. The
profit of visualization is defined as the difference between the value of the increase
in knowledge and the costs incurred to obtain this insight. Obviously, in practice
these are hard to quantify, but it is illuminating to attempt to do so. A schematic
model is considered: one visualization method V is used by n users to visualize
a data set m times each, where each session takes k explorative steps. The
value of an increase in knowledge (or insight) has to be judged by the user.
Users can be satisfied intrinsically by new knowledge, as an enrichment of their
understanding of the world. A more pragmatic and operational point of view
is to consider whether the new knowledge influences decisions, leads to actions, and,
hopefully, improves the quality of these. The overall gain now is nm W(∆K),
where W(∆K) represents the value of the increase in knowledge.
Concerning the costs for the use of (a specific) visualization V, these can
be split into various factors. Initial research and development costs Ci have to
be made; each user incurs initial costs Cu, because the user has to spend time to
select and acquire V and understand how to use it; per-session initial costs Cs
are incurred, such as conversion of the data; and finally, during a session a
user incurs costs Ce per explorative step, because the user has to spend time to
watch and understand the visualization and interactively explore the data set.
The overall profit now is

$$F = nm\,\bigl(W(\Delta K) - C_s - k\,C_e\bigr) - C_i - n\,C_u$$

In other words, this leads to the obvious insight that a great visualization method
is used by many people, who use it routinely to obtain highly valuable knowledge,
while having to spend little time and money on hardware, software, and effort.
And also, no alternatives that are more cost-effective should be available.
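The model lends itself to a small numeric sketch. The code below assumes that the exploration cost Ce is paid once per explorative step, so that total costs are Ci + n·Cu + n·m·Cs + n·m·k·Ce and the gain is n·m·W(∆K). The class name, field names, and all numbers are hypothetical and chosen only for illustration; as the text notes, real values are hard to quantify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VisualizationUse:
    """Hypothetical parameters of the economic model of visualization."""
    n_users: int              # n: number of users
    sessions_per_user: int    # m: sessions per user
    steps_per_session: int    # k: explorative steps per session
    value_per_session: float  # W(dK): value of the knowledge gained per session
    c_initial: float          # Ci: one-time research and development cost
    c_user: float             # Cu: per-user cost of acquiring and learning V
    c_session: float          # Cs: per-session cost, e.g. data conversion
    c_explore: float          # Ce: cost per explorative step (assumed per step)

    def gain(self) -> float:
        return self.n_users * self.sessions_per_user * self.value_per_session

    def cost(self) -> float:
        sessions = self.n_users * self.sessions_per_user
        return (
            self.c_initial
            + self.n_users * self.c_user
            + sessions * self.c_session
            + sessions * self.steps_per_session * self.c_explore
        )

    def profit(self) -> float:
        return self.gain() - self.cost()


# Invented numbers: a widely used tool with cheap sessions.
popular = VisualizationUse(
    n_users=1000, sessions_per_user=20, steps_per_session=10,
    value_per_session=5.0, c_initial=20_000.0, c_user=10.0,
    c_session=0.5, c_explore=0.125,
)
# The same tool with only ten users no longer recovers its initial cost.
niche = VisualizationUse(
    n_users=10, sessions_per_user=20, steps_per_session=10,
    value_per_session=5.0, c_initial=20_000.0, c_user=10.0,
    c_session=0.5, c_explore=0.125,
)

print(popular.profit())
print(niche.profit())
```

With identical per-session value and costs, the sign of the profit here depends only on the number of users, which mirrors the observation that a great visualization method is one used routinely by many people.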
In the original paper a number of examples of more or less successful visual-
ization methods are given, viewed in terms of this model. One InfoVis application
was considered: SequoiaView, a tool to visualize the contents of a hard disk, us-
ing cushion treemaps [28]. The popularity of this tool can be explained from
the concrete and useful insights obtained, as well as the low costs in all respects
associated with its application.
When we consider InfoVis in general, we can also come to positive conclu-
sions for almost all parameters, and hence predict a bright future for our field.
The number of potential users is very large. Data in the form of tables,
hierarchies, and networks is ubiquitous, as is the need to gain insight into these.
This holds for professional applications, but also for private use at home. Many
people have a need to get an overview of their email, financial transfers, and media
collections, and to search external databases, for instance to find a house,
vacation destination, or another product that meets their needs. Methods and
techniques from InfoVis, in the form of separate tools or integrated in custom
applications, can be highly effective here to provide such overviews. Also, many
of these activities will be repeated regularly, hence both n and m are high. The
growing field of Casual InfoVis [19] further illustrates how InfoVis techniques
are becoming more common in people’s everyday lives.
The costs Ce that have to be made to understand visualizations depend on
the prior experience of the users as well as the complexity of the imagery shown.
On the positive side, the use of graphics to show data is highly familiar, and
bar-charts, pie-charts, and other forms of business graphics are ubiquitous. On
the other hand, one should not overestimate familiarity. The scatterplot seems
to be at the boundary: considered trivial in the InfoVis community, but too
hard to understand (if the horizontal axis does not represent time) for a lay
audience, according to Matthew Ericson, deputy graphics director of the New
York Times, in his keynote presentation at IEEE InfoVis 2007. Visual literacy is
an area where more work can be done, but InfoVis does have
a strong edge compared to non-visual methods here. Also, there are examples of
areas where complex visual coding has been a great success, with the invention
of writing as the prime example.
The costs Cs per session and Cu per user can be reduced by tight integration
with applications. The average user will not be interested in producing
visualizations; her focus will be on solving her own problem, where visualization is one
of the means to this end. Separate InfoVis tools are useful for specialists, who
use them on a day-to-day basis. For many other users, integration within their
favourite tool is much more effective. An example of an environment that offers
such a tight integration is the ubiquitous spreadsheet, where storage, manipu-
lation, and presentation of data are offered; or the graphs and maps shown on
many web sites (and newspapers!) to show data. From an InfoVis point of view,
the presentations offered here can often be improved, and also, the interaction
provided is often limited. Nevertheless, all these examples acknowledge the value
of visualization for many applications.
The initial costs Ci for new InfoVis methods and techniques roughly fall into
two categories: Research and Development. Research costs can be high, because
it is often hard to improve on the state of the art, and because many experiments
(ranging from the development of prototypes to user experiments) are needed.
On the other hand, when problems are addressed with many potential usages,
these costs are still quite limited. Development costs can also be high. It takes
time and effort to produce software that is stable and useful under all conditions,
and that is tightly integrated with its context, but here also one has to take
advantage of the large potential market. Development and availability of suitable
middleware, for instance as libraries or plug-ins that can easily be customized for
the problem at hand, is an obvious route here.
One intriguing aspect here is how much customization is needed to solve the
problem concerned. On one hand, in many applications one of the standard data
types of InfoVis is central (table, tree, graph, text), and when the number of
items is not too high, the problem is not too hard to solve. On the other hand,
for large numbers of items one typically has to exploit all a priori knowledge of
the data set and tune the visualization accordingly; also, for applications such
as software visualizations all these data types pop up simultaneously, which also
strongly increases the complexity of the problem. So, for the time being, research
and innovation will be needed to come up with solutions for such problems as
well.
In conclusion, graphics has been adopted already on a large scale to com-
municate and present abstract data, which shows that its value has been ac-
knowledged, and we expect that due to the increase in size and complexity of
data available, the need for more powerful and effective information visualization
methods and techniques will only grow.
6 Conclusion
In this paper we have described the challenges in identifying and communicating
the value of InfoVis. We have cited and posed a number of answers to the
question, “How and why is InfoVis useful?” Hopefully, the examples shown
in the paper provide convincing arguments about InfoVis’ value as an analytic
tool. Ultimately, however, we believe that it is up to the community of InfoVis
researchers and practitioners to create techniques and systems that clearly il-
lustrate the value of the field. When someone has an InfoVis system that they
use in meaningful and important ways, this person likely will not need to be
convinced of the value of InfoVis.
References
20. Saraiya, P., North, C., Lam, V., Duca, K.: An insight-based longitudinal study
of visual analytics. IEEE Transactions on Visualization and Computer Graph-
ics 12(6), 1511–1522 (2006)
21. Shannon, C.E., Weaver, W.: A Mathematical Theory of Communication. Univer-
sity of Illinois Press, Champaign (1963)
22. Spence, I., Garrison, R.F.: A remarkable scatterplot. The American Statistician,
12–19 (1993)
23. Triesman, A.: Preattentive processing in vision. Computer Vision, Graphics, and
Image Processing 31(2), 156–177 (1985)
24. Tufte, E.R.: The Visual Display of Quantitative Information. Graphics Press,
Cheshire (1983)
25. Tufte, E.R.: Envisioning Information. Graphics Press, Cheshire (1990)
26. Tufte, E.R.: Visual Explanations: Images and Quantities, Evidence and Narrative.
Graphics Press, Cheshire (1997)
27. van Wijk, J.J.: The value of visualization. In: Proceedings IEEE Visualization
2005, pp. 79–86 (2005)
28. van Wijk, J.J., van de Wetering, H.: Cushion treemaps. In: Proceedings 1999
IEEE Symposium on Information Visualization (InfoVis’99), pp. 73–78. IEEE
Computer Society Press, Los Alamitos (1999)
29. Voorhees, E., Harman, D.: Overview of the sixth Text Retrieval Conference. In-
formation Processing and Management 36(1), 3–35 (2000)
30. Ware, C.: Information Visualization: Perception for Design. Morgan Kaufmann
Publishers Inc., San Francisco (2004)
31. Wattenberg, M., Kriss, J.: Designing for social data analysis. IEEE Transactions
on Visualization and Computer Graphics 12(4), 549–557 (2006)
32. Williamson, C., Shneiderman, B.: The dynamic homefinder: evaluating dynamic
queries in a real-estate information exploration system. In: SIGIR ’92: Proceed-
ings of the 15th annual international ACM SIGIR conference on Research and
development in information retrieval, pp. 338–346. ACM Press, New York (1992)
Evaluating Information Visualizations
Sheelagh Carpendale
1 Introduction
Information visualization research is becoming more established, and as a result, it is
becoming increasingly important that research in this field is validated. With the gen-
eral increase in information visualization research there has also been an increase,
albeit disproportionately small, in the amount of empirical work directly focused on
information visualization. The purpose of this paper is to increase awareness of
empirical research in general and of its relationship to information visualization in
particular; to emphasize its importance; and to encourage thoughtful application of a
greater variety of evaluative research methodologies in information visualization.
One reason that it may be important to discuss the evaluation of information visu-
alization, in general, is that it has been suggested that current evaluations are not con-
vincing enough to encourage widespread adoption of information visualization tools
[57]. Reasons given include that information visualizations are often evaluated using
small datasets, with university student participants, and using simple tasks. To en-
courage interest by potential adopters, information visualizations need to be tested
with real users, real tasks, and also with large and complex datasets. For instance, it is
not sufficient to know that an information visualization is usable with 100 data items
if 20,000 is more likely to be the real-world case. Running evaluations with full data
sets, domain specific tasks, and domain experts as participants will help develop
much more concrete and realistic evidence of the effectiveness of a given information
visualization. However, choosing such a realistic setting will make it difficult to get a
large enough participant sample, to control for extraneous variables, or to get precise
measurements. This makes it difficult to make definite statements or generalize from
the results. Rather than looking to a single methodology to provide an answer, it will
probably take a variety of evaluative methodologies that together may start to
approach the kind of answers sought.
The paper is organized as follows. Section 2 discusses the challenges in evaluating
information visualizations. Section 3 outlines different types of evaluations and dis-
cusses the advantages and disadvantages of different empirical methodologies and the
trade-offs among them. Section 4 focuses on empirical laboratory experiments and the
generation of quantitative results. Section 5 discusses qualitative approaches and the
different kinds of advantages offered by pursuing this type of empirical research.
Section 6 concludes the paper.
A. Kerren et al. (Eds.): Information Visualization, LNCS 4950, pp. 19 – 45, 2008.
© Springer-Verlag Berlin Heidelberg 2008
• Precision: a result is precise to the degree to which one can be definite about the
measurements that were taken and about the control of the factors that were not
intended to be studied.
• Realism: a result is considered realistic to the extent to which the context in
which it was studied is like the context in which it will be used.
Figure 1 (adapted and simplified from McGrath [50]) shows the span of common
methodologies currently in practice in the social sciences. They are positioned around
the circle according to the labels: most precision, most generalizability and most real-
ism. The closer a methodology is placed to a particular label, the more that label ap-
plies to that methodology. Next, these methodologies are briefly described. For fuller
descriptions see McGrath 1995.
Field Study: A field study is typically conducted in the actual situation, and the ob-
server tries as much as possible to be unobtrusive. That is, the ideal is that the pres-
ence of the observer does not affect what is being observed. While one can put con-
siderable effort into minimizing the impact of the presence of an observer, this is not
completely possible [50]. Examples of this type of research include ethnographic
work in cultural anthropology, field studies in sociology, and case studies in industry.
In this type of study the realism is high but the results are not particularly precise and
likely not particularly generalizable. These studies typically generate a focused but
rich description of the situation being studied.
Field Experiment: A field experiment is usually also conducted in a realistic setting;
however, an experimenter trades some degree of unobtrusiveness in order to obtain
more precision in observations. For instance, the experimenter may ask the partici-
pants to perform a specific task while the experimenter is present. While realism is
still high, it has been reduced slightly by experimental manipulation. However, the
long observation periods that would otherwise be needed may be shortened, results may be
more readily interpretable, and specific questions are more likely to be answered.
Laboratory Experiment: In a laboratory experiment the experimenters fully design
the study. They establish what the setting will be, how the study will be conducted,
what tasks the participants will do, and thus plan the whole study procedure. Then the
experimenter gets people to participate as fully as possible following the rules of the
procedure within the set situation. Carefully done, this can provide for considerable
precision. In addition, non-realistic behaviour that provides the experimenter more
information can be requested such as a ‘think aloud’ protocol [43]. Behaviour can be
measured, partly because it is reasonably well known when and where the behaviour
of interest may happen. However, realism is largely lost and the degree to which the
experimenter introduces aspects of realism will likely reduce the possible precision.
Experimental Simulation: With an experimental simulation the experimenter tries to
keep as much of the precision as possible while introducing some realism via simula-
tion. There are examples where this approach is essential such as studying driving
while using a cell phone or under some substance’s influence by using a driving
simulator. Use of simulation can avoid risky or unethical situations. Similarly, although
less dramatically, non-existent computer programs can be studied using the ‘Wizard
of Oz’ approach in which a hidden experimenter simulates a computer program. This
type of study can provide us with considerable information while reducing the dan-
gers and costs of a more realistic experiment.
Judgment Study: In a judgment study the purpose is to gather a person’s response to
a set of stimuli in a situation where the setting is made irrelevant. Much attention is
paid to creating ‘neutral conditions’. Ideally, the environment would not affect the
result. Perceptual studies often use this approach. Examples of this type of research
include the series of studies that examine what types of surface textures best support
the perception of 3D shape (e.g. [34, 38]), and the earlier related work about the per-
ception of shape from shading [39]. However, in assessing information visualizations
this idea of setting a study in neutral conditions must be considered carefully, as wit-
nessed by Reilly and Inkpen’s [62] study which showed that the necessity for an in-
teractive technique developed to support a person’s mental model during transition
from viewing one map to another (subway map to surface map) was dependent on the
distractions in the setting. This transition technique relates to ideas of morphing and
distortion in that aspects of the map remain visible while shifting. In a more neutral
experimental setting the technique showed little benefit, while the same tasks in a noisy,
distracting setting showed considerable benefit.
Sample Survey: In a sample survey the experimenter is interested in discovering
relationships between a set of variables in a given population. Examples of these
types of questions include: of those people who discover web information visualiza-
tion tools how many return frequently and are their activities social or work related?
Of those people who have information visualization software available at work what
is the frequency of use? Considering the increased examples of information visualiza-
tion results and software on the web, is the general population’s awareness of and/or
use of information visualization increasing? In these types of studies proper sampling
Since quantitative empirical evaluations have evolved over the centuries the method-
ology has become relatively established (Figure 2). This brief overview is included
for completeness; the interested reader should refer to the many good books on this
subject [15, 17, 33]. This methodology includes:
• Hypothesis Development: Much of the success of a study depends on asking an
interesting and relevant question. This question should ideally be of interest to
the broader research community, and hopefully answering it will lead to a deeper
or new understanding of open research questions. Commonly the importance of
the study findings results from a well thought through hypothesis, and formulat-
ing this question precisely will help the development of the study.
• Identification of the Independent Variables: The independent variables are
the factors to be studied which may (or may not) affect the hypothesis. Ideally
the number of independent variables is kept low to provide more clarity and pre-
cision in the results.
• Control of the Independent Variables: In designing the experiment the ex-
perimenter decides the manner in which the independent variables will be
changed.
• Elimination of Complexity: In order to be clear that it is actually the change in
the independent variable that caused the study’s result, it is often the case that
other factors in the environment need to be controlled.
• Measurement of the Dependent Variables: Observations and measurements
are focused on the dependent variables as they change or do not change in
Even though these types of experiments have been long and effectively used across all
branches of science, there remain many challenges to conducting a useful study. We
mention different types of commonly discussed errors and validity concerns and relate
these to McGrath’s discussion as outlined in Section 3. In this discussion we
will use a simple, abstract example of an experiment that looks at the effect of two
visualization techniques, VisA and VisB, on performance in search. There are several
widely discussed issues that can interfere with the validity of a study.
Conclusion Validity: Is there a relationship? This concept asks whether within the
study there is a relationship between the independent and the dependent variables.
Important factors in conclusion validity are finding a relationship when one does not
exist (type I error) and not finding a relationship when one does exist (type II error).
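The two error types can be made concrete with a small simulation of the abstract VisA/VisB search experiment. The following is only a sketch under stated assumptions: a between-participants design, normally distributed search times with a made-up mean of 10 seconds and standard deviation of 2 seconds, and a permutation test on the difference of means as the significance test. None of these choices come from the study design discussed here, and the function names are hypothetical.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_p_value(times_a, times_b, rng, n_permutations=200):
    """Two-sided permutation test on the difference of mean search times."""
    observed = abs(mean(times_a) - mean(times_b))
    pooled = list(times_a) + list(times_b)
    n_a = len(times_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign times to the two techniques
        if abs(mean(pooled[:n_a]) - mean(pooled[n_a:])) >= observed:
            at_least_as_extreme += 1
    return (at_least_as_extreme + 1) / (n_permutations + 1)


def wrong_conclusion_rate(true_difference, n_per_group=12, n_studies=200,
                          alpha=0.05, seed=1):
    """Fraction of simulated studies that reach the wrong conclusion.

    true_difference == 0: a "significant" result is a type I error.
    true_difference != 0: a non-significant result is a type II error.
    """
    rng = random.Random(seed)
    wrong = 0
    for _ in range(n_studies):
        # Assumed (hypothetical) search times in seconds for each technique.
        vis_a = [rng.gauss(10.0, 2.0) for _ in range(n_per_group)]
        vis_b = [rng.gauss(10.0 + true_difference, 2.0) for _ in range(n_per_group)]
        significant = permutation_p_value(vis_a, vis_b, rng) < alpha
        if true_difference == 0:
            wrong += significant
        else:
            wrong += not significant
    return wrong / n_studies


type_1_rate = wrong_conclusion_rate(true_difference=0.0)
type_2_rate = wrong_conclusion_rate(true_difference=1.0)
print(f"type I error rate (no real difference): {type_1_rate:.2f}")
print(f"type II error rate (1 s real difference): {type_2_rate:.2f}")
```

In a sketch like this the type I rate is governed by the chosen significance level, whereas the type II rate depends on the number of participants and on how large the real difference is relative to the variation between participants.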
At the heart of qualitative methods is the skill and sensitivity with which data is gath-
ered. Whether the records of the data gathered are collected as field notes, artefacts,
video tapes, audio tapes, computer records and logs, or all of these, in qualitative
empirical approaches there are really only two primary methods for gathering data:
observations and interviews. Observation and interview records are usually kept
continually as they occur, as field notes or regular journal entries, and are often also
recorded as video or audio tapes. Artefacts are collected when appropriate. These can
be documents, drawings, sketches, diagrams, and other objects of use in the process
being observed. These artefacts are sometimes annotated as part of use practices or in
explanation. Also, since the communities we are observing are often technology us-
ers, technology-based records can also include logs, traces, screen captures, etc. Both
observation and interviewing are skills and as such develop with practice and can, at
least to some extent, be learnt. For full discussions on these skills there are many
useful books such as Seidman [65] and Lofland and Lofland [45].
• Try to keep jotting down notes unobtrusively. Ideally, notes are taken as obser-
vations occur; however, if one becomes aware that one’s note taking is having
an impact on the observations, consider writing notes during breaks, when
shielded, or at the end of the day.
• Minimize the time gap from observations to note taking. Memory can be quite
good for a few hours but does tend to drop off rapidly.
• Include in observations the setting, a description of the physical setup, the time,
who is present, etc. Drawing maps of layouts and activities can be very useful.
• Remember to include both the overt and covert in activities and communica-
tions. For example, that which is communicated in body language and gestures,
especially if it gets understood and acted upon, is just as important as spoken
communications. But be careful of that grey area where one is not sure to what
extent a communication occurred.
• Remember to include both the positive and negative. Observed frustrations and
difficulties can be extremely important in developing a fuller understanding.
• Do not write notes on both sides of a paper. This may seem trivial but experi-
enced observers say this is a must [6]. You can search for hours, passing over
many times that important note that is on the back of another note.
• Be concrete whenever possible.
• Distinguish between word-for-word or verbatim accounts and those you have
paraphrased and/or remembered.
ily quantitative results of a laboratory experiment and in this they play an important
but secondary role.
Think-Aloud Protocol: This technique, which involves encouraging participants to
speak their thoughts as they progress through the experiment, was introduced to the
human-computer-interaction community by [43]. Discussions about this protocol in
psychology date back to 1980 [19, 20, 21]. Like most methodologies, this one also
involves tradeoffs. While it gives the experimenter/observer the possibility of being
aware of the participants’ thoughts, it is not natural for most people and can make a
participant feel awkward; thus, think aloud provides additional insight while also
reducing the realism of the study. However, the advantages of hearing about a
participant’s thoughts, plans, and frustrations frequently outweigh the disadvantages, and
this is a commonly used technique. Several variations have been introduced such as
‘talk aloud’ which asks a participant to more simply announce their actions rather
than their thoughts [21].
Collecting Participant Opinions: Most laboratory experiments include some method
by which participant opinions and preferences are collected. This may take the form
of a simple questionnaire or perhaps semi-structured interviews. Most largely quanti-
tative studies such as laboratory experiments do ask these types of questions, often
partially quantifying the participant’s response by such methods as using a Likert
scale [44]. A Likert scale asks a participant to rate their attitude according to degree.
For instance, instead of simply asking a participant ‘did you like it?’, a Likert scale
might ask the participant to choose one of a range of answers ‘strongly disliked,’
‘disliked,’ ‘neutral,’ ‘liked,’ or ‘strongly liked.’
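As an illustration of how such responses are partially quantified, the sketch below codes the five answers above as the integers 1 to 5 and summarizes them. The numeric coding, the function name, and the sample responses are assumptions made for this example only. The median is reported rather than the mean because the answers are ordered categories, not measurements on an interval scale.

```python
from collections import Counter
from statistics import median

# The five answer options from the text, in order of increasing approval.
LIKERT_LABELS = ["strongly disliked", "disliked", "neutral", "liked", "strongly liked"]
# Assumed coding: 1 = strongly disliked ... 5 = strongly liked.
LIKERT_CODES = {label: code for code, label in enumerate(LIKERT_LABELS, start=1)}


def summarize_likert(responses):
    """Return the median code and the count per answer option."""
    codes = [LIKERT_CODES[response] for response in responses]
    counts = Counter(responses)
    return {
        "median": median(codes),
        "counts": {label: counts.get(label, 0) for label in LIKERT_LABELS},
    }


# Made-up responses from five hypothetical participants.
responses = ["liked", "strongly liked", "neutral", "liked", "disliked"]
summary = summarize_likert(responses)
print(summary["median"])
print(summary["counts"])
```

Keeping the per-option counts alongside the median preserves the shape of the distribution, which a single summary number would hide.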
Summary of Nested Qualitative Methods: The nested qualitative methods men-
tioned in this section may be commonplace to many readers. The point to be made
here is that in the small, that is as part of a laboratory experiment, inclusion of some
qualitative methods is not only commonplace, its value is well recognized. This type
of inclusion of qualitative approaches adds insight, explanations and new questions. It
also can help confirm results. For instance, if participants’ opinions are in line with
quantitative measures – such as the fastest techniques being the most liked – this
confirms the interpretation of the fastest technique being the right one to choose.
However, if they contradict – such as the fastest techniques not being preferred –
interesting questions are raised including questioning the notion that fastest is always best.
each heuristic [54]. The original use indicated that in most situations three evaluators
would be cost effective and find most usability problems [54]. However, subsequent
use of heuristics for web site analysis appears to sometimes need more evaluators [9,
69]. Further, this may depend on the product. While application of heuristics has not
yet been formally studied in terms of information visualization, these findings do
introduce the possibility that information visualization heuristics may also need to be
data, task or purpose specific.
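One way to see why the number of evaluators needed may vary by product is a simple independence model. This model is an assumption of the sketch and is not taken from this chapter: each evaluator is assumed to find any given problem independently with the same probability. The detection rates below are hypothetical values chosen for illustration.

```python
def proportion_found(n_evaluators, detection_rate):
    """Expected share of problems found by at least one of n evaluators,
    assuming each evaluator independently finds a problem with the given rate."""
    if not 0.0 < detection_rate <= 1.0:
        raise ValueError("detection_rate must be in (0, 1]")
    return 1.0 - (1.0 - detection_rate) ** n_evaluators


def evaluators_needed(target, detection_rate):
    """Smallest number of evaluators whose expected coverage reaches the target."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must be in (0, 1)")
    n = 1
    while proportion_found(n, detection_rate) < target:
        n += 1
    return n


# Hypothetical per-evaluator detection rates for three kinds of product.
for rate in (0.2, 0.35, 0.5):
    share = proportion_found(3, rate)
    needed = evaluators_needed(0.8, rate)
    print(f"rate {rate:.2f}: 3 evaluators find {share:.0%}, "
          f"{needed} evaluators needed for 80%")
```

Under this model, three evaluators suffice only when each one detects a fairly large share of the problems; when problems are harder to spot, the required number grows quickly.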
Heuristics are akin to the design term guidelines in that both provide a list of ad-
vice. Design guidelines are often usefully applied in a relatively ad hoc manner as
factors to keep in mind during the design process and heuristic lists can definitely be
similarly used. While there are definitely benefits that accrue in the use of guidelines
and heuristics, it is important to bear in mind that they are based on what is known to
be successful and thus tend not to favour the unusual and the inventive. In the design
world, common advice is that while working without knowledge of guidelines is fool-
ish, following them completely is even worse.
within the group. The results of these experiments are regarded as providing impor-
tant information about what group processes to support and some indication about
how this might be done. This type of research can be particularly important in com-
plex or sensitive scenarios such as health care situations [72]. Brereton and McGarry
[11] observed groups of engineering students and professional designers using physical
objects to prototype designs. They found that the interpretation and use of physical
objects depended greatly on the context of their placement, indicating that the context of
people's work is important and difficult to capture quantitatively. Their goal was to
determine implications for the design of tangible interfaces. Other examples include
Saraiya et al. [63] who used domain expert assessments of insight to evaluate bioin-
formatics visualizations, while Mazza and Berre [48] used focus groups and semi-
structured interviews in their analysis of visualization approaches to support instruc-
tors in web-based distance education.
The following are simply examples of empirical methods in which gathering of
qualitative data is primary. There are many others; for instance, Moggridge [51] men-
tions that his group makes active use of fifty-one qualitative methods in their design
processes.
In Situ Observational Studies: These studies are at the heart of field studies. Here,
the experimenter gets permission to observe activities as they take place in situ. In
these studies the observer does their best to remain unobtrusive during the observa-
tions. The ideal in Moggridge’s terms is to become as a ‘fly on the wall’ that no one
notices [51]. This can be hard to achieve in an actual setting. However, over time a
good observer does usually fade into the background. Sometimes observations can be
collected via video and audio tapes to avoid the more obvious presence of a person as
observer but sometimes making such recordings is not appropriate as in medical situa-
tions. In these studies the intention is usually to gather a rich description of the situa-
tion being observed. However, there is both a difference and an overlap in the type of
observations to be gathered when the intention is (a) to better understand the particular
activities in a given setting, or (b) to use these observations to inform technology
design. Thus, because different details are of prime interest it is important that
our research community conducts these types of observational studies to better inform
initial design as well as to better understand the effectiveness of new technology in
use. These studies have high realism, result in rich context explicit data and are time
and labour intensive when it comes to both data collection and data analysis.
Participatory Observation: This practice is the opposite of participatory design.
Here an information visualization expert becomes part of the application expert’s
team to experience the work practices first hand rather than application experts be-
coming part of the information visualization design team. In participatory observa-
tion, additional insights can be gained through first-hand observer experience of the
tasks and processes of interest in the context of the real world situation. Here, rather
than endeavouring to be unobtrusive, the observer works towards becoming an accepted
part of the community. Participatory observation can be an effective approach since, as
trust and rapport develop, an increasingly in-depth understanding becomes possible. Our
research community is interested in being able to better understand the
work practices of many different types of knowledge workers. These workers are
usually highly trained, highly paid, and often under considerable time pressures. Not
surprisingly, they are seldom willing to accept an untrained observer as part of their
team. Since information visualization researchers are of necessity highly trained
themselves, it is rare that an information visualization researcher will have the neces-
sary additional training to become accepted as a participatory observer. However,
domain expertise is not always essential for successful participatory observation.
Expert study participants can train an observer on typical data analysis tasks – a process
which may take several hours – and then “put them to work” on data analysis using
their existing tools and techniques. The observer keeps a journal of the experience, and
the outcomes of the analysis are reviewed with the domain experts for validity.
Even as a peripheral participant, valuable understandings of domain, tasks, and work
culture can be developed which help clarify values and assumptions about data, visu-
alizations, decision making and data insights important to the application domain.
These understandings and constructs can be important to the information visualization
community in the development of realistic tools.
Laboratory Observational Studies: These studies use observational methodologies
in a laboratory setting. A disadvantage of in situ observations is that they often require
lengthy observations. For instance, if the observer is interested in how an analyst uses
visual data, they will have to wait patiently until the analyst does this task. Since an
analyst may have many other tasks – meetings, conference calls, reports, etc. – this
may take hours or even days. One alternative to the lengthy in situ wait is to
design an observational experiment in which, similarly to a laboratory experiment, the
experimenter designs a setting, a procedure and perhaps even a set of tasks. Consider,
for example, developing information visualizations to support co-located collabora-
tion. Some design advice on co-located collaborative aspects is available in the com-
puter supported cooperative work literature [35]. However, while this advice is useful,
it does not inform us specifically about how teams engage in collaborative tasks when
using visual information. Details such as how and when visualizations will be shared
and what types of analysis processes need to be specifically supported in collaborative
information visualization systems were missing. Here, an observational approach is
appropriate because the purpose is to better understand the flow and nature of the
collaboration among participants, rather than answering quantifiable lower-level ques-
tions. In order to avoid temporal biases in existing software, pencil and paper based
visualizations were used. This allowed for the observation of free arrangement of
data, annotation practices, and collaborative processes unconstrained by any particular
visualization software [36].
Contextual Interviews: As noted in Section 5.1, interviewing in itself is core to
qualitative research. Conducting an interview about a task, setting, or application of
interest within the context in which this work usually takes place is just one method
that can enrich the interview process. Here the realism of the setting helps provide the
context that can bring to mind the day-to-day realities during the interview process
(for further discussion see Holtzblatt and Beyer 1998). For example, to study how
best to support the challenging problem of medical diagnosis, observing and inter-
viewing physicians in their current work environment might help to provide insights
into their thought processes that would be difficult to capture with other methodolo-
gies. A major benefit of qualitative study can be seeing the big picture – the context in
which a new visualization support may be used. The participants' motives, misgiv-
ings, and opinions shed light on how they relate to existing support, and can effec-
tively guide the development of new support. This type of knowledge can be very
important at the early stage of determining what types of information visualizations
may be of value.
Summary of qualitative methods as primary: These four methods are just exam-
ples of a huge variety of possibilities. Other methods include action research [42],
focus groups [48], and many more. All these types of qualitative methods have the
potential to lessen the task and data comprehension divide between ourselves as visu-
alization experts and the domain experts for whom we are creating visualizations.
That is, while we cannot become analysts, doctors, or linguists, we can gain a deeper
understanding of how they work and think. These methods can open up the design
space, revealing new possibilities for information visualizations, as well as additional
criteria on which to measure success.
5.3.2 Subjectivity
Experimenter subjectivity can be seen as an asset because of the sensitivity that can
be brought to the observation process. The quality of the data gathering and analysis
is dependent on the experience of the investigator [56]. However, the process of gath-
ering any data must be concerned with obtaining representative data. The questions
circle about whether the observer has heard or understood fully and whether these
observations are reported accurately. Considerations include:
• Is this a first person direct report? Otherwise normal common sense about 2nd,
3rd, and 4th hand reports needs to be considered.
• Does the spatial location of the observer provide an adequate vantage point from
which to observe, or might it have led to omissions?
• Are the social relationships of the observer free from associations that might in-
duce bias?
• Does the report appear to be self-serving? Does it benefit the experimenter to the
extent that it should be questioned?
• Is the report internally consistent? Do the facts within the report support each
other?
• Is the report externally consistent? Do the facts in the report agree with other in-
dependent reports?
As a result it is important to be explicit about data collection methods, the position of
the researcher with respect to the subject matter, analysis processes, and codes. These
details make it possible for other researchers to verify results.
In qualitative research it is acknowledged that the researcher's views, research con-
text, and interpretations are an essential part of the qualitative research method as
long as they are grounded in the collected data [3]. This does not, however, mean that
qualitative evaluations are less trustworthy compared to quantitative research. Auer-
bach suggests using the concept of ‘transferability’ rather than ‘generalizability’ when
thinking about the concepts of reliability and validity in qualitative research [3]. It is
more important that the theoretical understanding we have gained can also be found in
other research situations or systems and can be extended and developed further when
applied to other scenarios. This stands in contrast to the concept of generalizability in
quantitative research that wants to prove statistically that the results are universally
applicable within the population under study.
Sometimes the point has been raised that if results do not generalize, how can they
be of use when designing software for general use? For example, qualitative methods
might be used to obtain a rich description of a particular situation perhaps only ob-
serving the processes of two or three people. The results of a study like this may or
may not generalize and the study itself provides no proof that they do. What we have
is an existence proof: that such processes are in use in at least two or three instances.
Consider the worst case; that is that this rich description is an outlier that occurs only
rarely. For design purposes, outliers are also important, and sensitive design for
outliers has often been shown to create better designs for all. For example, motion
sensors to open doors may have been designed for wheelchairs but actually are useful
features for all.
Qualitative studies can be a powerful methodology by which one can capture salient
aspects of a problem that may provide useful design and evaluation criteria. Quantita-
tive evaluation is naturally precision-oriented, but a shift from high precision to high
fidelity may be made with the addition of qualitative evaluations. In particular, while
qualitative evaluations can be used throughout the entire development life cycle in
other research areas such as CSCW [41, 52, 64, 73], observational studies have been
found to be especially useful for informing design. Yet these techniques are under-
used and under-reported in the information visualization literature. Broader ap-
proaches to evaluation, different units of analysis and sensitivity to context are impor-
tant when complex issues such as insight, discovery, confidence and collaboration
need to be assessed. In more general terms, we would like to draw attention to
qualitative research approaches which may help to address difficult types of evaluation
questions. As noted by Isenberg et al. [36], a sign in Albert Einstein's office which read,
‘Everything that can be counted does not necessarily count; everything that counts
cannot necessarily be counted’ is particularly salient to this discussion in reminding
us to include empirical research about important data that cannot necessarily be
counted.
6 Conclusions
In this paper we have made a two-pronged call: one for more evaluations in general
and one for a broader appreciation of the variety of and importance of many different
types of empirical methodologies. To achieve this, we as a research community need
to both conduct more empirical research and to be more welcoming of this research in
our publication venues. As noted in Section 4, even empirical laboratory experiments,
our best-known type of empirical methodology, are often difficult to publish. One
factor in this is that no empirical method is perfect. That is, there is always a trade-off
between generalizability, precision, and realism. An inexperienced reviewer may
recommend rejection because one of these factors is not present, while realistically
at least one will always be compromised. Empirical research is a slow, labour-intensive
process in which understanding and insight can develop through time. That
said, there are several important factors to consider when publishing empirical re-
search. These include:
Evaluating Information Visualizations 41
Acknowledgments. The ideas presented in this paper have evolved out of many dis-
cussions with many people. In particular this includes: Christopher Collins, Marian
Dörk, Saul Greenberg, Carl Gutwin, Mark S. Hancock, Uta Hinrichs, Petra Isenberg,
Stacey Scott, Amy Voida, and Torre Zuk.
References
1. Amar, R.A., Stasko, J.T.: Knowledge Precepts for Design and Evaluation of Information
Visualizations. IEEE Transactions on Visualization and Computer Graphics 11(4), 432–
442 (2005)
2. Andrews, K.: Evaluating Information Visualisations. In: Proceedings of the 2006 AVI Work-
shop on BEyond Time and Errors: Novel Evaluation Methods for Information Visualiza-
tion, pp. 1–5 (2006)
3. Auerbach, C.: Qualitative Data: An Introduction to Coding and Analysis. University Press,
New York (2003)
42 S. Carpendale
4. Baker, K., Greenberg, S., Gutwin, C.: Empirical Development of a Heuristic Evaluation
Methodology for Shared Workspace Groupware. In: Proceedings of the ACM Conference
on Computer Supported Cooperative Work, pp. 96–105. ACM Press, New York (2002)
5. Baldonado, M., Woodruff, A., Kuchinsky, A.: Guidelines for Using Multiple Views in
Information Visualization. In: Proceedings of the Conference on Advanced Visual
Interfaces (AVI), pp. 110–119. ACM Press, New York (2000)
6. Barzun, J., Graff, H.: The Modern Researcher, 3rd edn. Harcourt Brace Jovanovich, New
York (1977)
7. BELIV 2006, accessed http://www.dis.uniroma1.it/~beliv06/ (February 4,
2008)
8. Bertin, J.: Semiology of Graphics (Translation: William J. Berg). University of Wisconsin
Press (1983)
9. Bevan, N., Barnum, C., Cockton, G., Nielsen, J., Spool, J., Wixon, W.: The “Magic Num-
ber 5”: Is It Enough for Web Testing? In: CHI Extended Abstracts, pp. 698–699. ACM
Press, New York (2003)
10. Boyatzis, R.: Transforming Qualitative Information: Thematic Analysis and Code Devel-
opment. Sage Publications, London (1998)
11. Brereton, M., McGarry, B.: An Observational Study of How Objects Support Engineering
Design Thinking and Communication: Implications for the Design of Tangible Media. In:
Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI’00),
pp. 217–224. ACM Press, New York (2000)
12. Chen, C., Czerwinski, M.: Introduction to the Special Issue on Empirical Evaluation of
Information Visualizations. International Journal of Human-Computer Studies 53(5), 631–
635 (2000)
13. Corbin, J., Strauss, A.: Basics of Qualitative Research: Techniques and Procedures for
Developing Grounded Theory, 3rd edn. Sage Publications, Los Angeles (2008)
14. Creswell, J.: Qualitative Inquiry and Research Design: Choosing Among Five Traditions.
Sage Publications, London (1998)
15. Dix, A., Finlay, J., Abowd, G., Beale, R.: Human Computer Interaction, 2nd edn. Prentice-
Hall, Englewood Cliffs (1998)
16. Dumais, S., Cutrell, E., Chen, H.: Optimizing Search by Showing Results In Context. In:
Proc. CHI’01, pp. 277–284. ACM Press, New York (2001)
17. Eberts, R.E.: User Interface Design. Prentice-Hall, Englewood Cliffs (1994)
18. Ellis, E., Dix, A.: An Explorative Analysis of User Evaluation Studies in Information
Visualization. In: Proceedings of the Workshop on Beyond Time and Errors: Novel
Evaluation Methods for Information Visualization, BELIV (2006)
19. Ericsson, K., Simon, H.: Verbal Reports as Data. Psychological Review 87(3), 215–251
(1980)
20. Ericsson, K., Simon, H.: Verbal Reports on Thinking. In: Faerch, C., Kasper, G. (eds.)
Introspection in Second Language Research, pp. 24–54. Multilingual Matters, Clevedon,
Avon (1987)
21. Ericsson, K., Simon, H.: Protocol Analysis: Verbal Reports as Data, 2nd edn. MIT Press,
Boston (1993)
22. Fall, J., Fall, A.: SELES: A Spatially Explicit Landscape Event Simulator. In: Proceedings
of GIS and Environmental Modeling, pp. 104–112. National Center for Geographic Infor-
mation and Analysis (1996)
23. Forlines, C., Shen, C., Wigdor, D., Balakrishnan, R.: Exploring the effects of group size
and display configuration on visual search. In: Computer Supported Cooperative Work
2006 Conference Proceedings, pp. 11–20 (2006)
24. Garfinkel, H.: Studies in Ethnomethodology. Polity Press, Cambridge (1967)
25. Gonzalez, V., Kobsa, A.: A Workplace Study of the Adoption of Information Visualization
systems. In: Proceedings of the International Conference on Knowledge Management, pp.
92–102 (2003)
26. Gorard, S.: Combining Methods in Educational Research. McGraw-Hill, New York (2004)
27. Greenberg, S., Buxton, B.: Usability Evaluation Considered Harmful (Some of the Time).
In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (2008)
28. Greene, J., Caracelli, V., Graham, W.: Toward a Conceptual Framework for Mixed-Method
Evaluation Design. Educational Evaluation and Policy Analysis 11(3), 255–274 (1989)
29. Gutwin, C., Greenberg, S.: The Mechanics of Collaboration: Developing Low Cost Usabil-
ity Evaluation Methods for Shared Workspaces. In: Proceedings WETICE, pp. 98–103.
IEEE Computer Society Press, Los Alamitos (2000)
30. Healey, C.G.: On the Use of Perceptual Cues and Data Mining for Effective Visualization
of Scientific Datasets. In: Proceedings of Graphics Interface, pp. 177–184 (1998)
31. Heer, J., Viegas, F., Wattenberg, M.: Voyagers and Voyeurs: Supporting Asynchronous
Collaborative Information Visualization. In: Proceedings of the Conference on Human
Factors in Computing Systems (CHI’07), pp. 1029–1038. ACM Press, New York (2007)
32. Holtzblatt, K., Beyer, H.: Contextual Design: Defining Customer-Centered Systems. Mor-
gan Kaufmann, San Francisco (1998)
33. Huck, S.W.: Reading Statistics and Research, 4th edn. Pearson Education Inc., Boston (2004)
34. Interrante, V.: Illustrating Surface Shape in Volume Data via Principal Direction-Driven 3D
Line Integral Convolution. Computer Graphics, Annual Conference Series, pp. 109–116
(1997)
35. Isenberg, P., Carpendale, S.: Interactive Tree Comparison for Co-located Collaborative In-
formation Visualization. IEEE Transactions on Visualization and Computer Graphics 12(5)
(2007)
36. Isenberg, P., Tang, A., Carpendale, S.: An Exploratory Study of Visual Information Analy-
sis. In: Proceedings of the Conference on Human Factors in Computing Systems (CHI’08),
ACM Press, New York (to appear, 2008)
37. Kay, J., Reiger, H., Boyle, M., Francis, G.: An Ecosystem Approach for Sustainability:
Addressing the Challenge of Complexity. Futures 31(7), 721–742 (1999)
38. Kim, S., Hagh-Shenas, H., Interrante, V.: Conveying Shape with Texture: Experimental
Investigations of Texture’s Effects on Shape Categorization Judgments. IEEE Transactions
on Visualization and Computer Graphics 10(4), 471–483 (2004)
39. Kleffner, D.A., Ramachandran, V.S.: On the Perception of Shape from Shading. Percep-
tion and Psychophysics 52(1), 18–36 (1992)
40. Kobsa, A.: User Experiments with Tree Visualization Systems. In: Proceedings of the
IEEE Symposium on Information Visualization, pp. 9–26 (2004)
41. Kruger, R., Carpendale, S., Scott, S.D., Greenberg, S.: Roles of Orientation in Tabletop
Collaboration: Comprehension, Coordination and Communication. Journal of Computer
Supported Collaborative Work 13(5–6), 501–537 (2004)
42. Lewin, C. (ed.): Research Methods in the Social Sciences. Sage Publications, London (2004)
43. Lewis, C., Rieman, J.: Task-Centered User Interface Design: A Practical Introduction (1993)
44. Likert, R.: A Technique for the Measurement of Attitudes. Archives of Psychology 140, 1–55
(1932)
45. Lofland, J., Lofland, L.: Analyzing Social Settings: A Guide to Qualitative Observation
and Analysis. Wadsworth Publishing Company, CA, USA (1995)
46. Mankoff, J., Dey, A., Hsieh, G., Kientz, J., Lederer, S., Ames, A.: Heuristic Evaluation of
Ambient Displays. In: Proceedings of CHI ’03, pp. 169–176. ACM Press, New York (2003)
47. Mark, G., Kobsa, A., Gonzalez, V.: Do Four Eyes See Better Than Two? Collaborative
Versus Individual Discovery in Data Visualization Systems. In: Proceedings of the IEEE
Conference on Information Visualization (IV’02), July 2002, pp. 249–255. IEEE Press,
Los Alamitos (2002)
48. Mazza, R., Berre, A.: Focus Group Methodology for Evaluating Information Visualization
Techniques and Tools. In: Proceedings of the International Conference on Information
Visualization IV (2007)
49. McCarthy, D.: Normal Science and Post-Normal Inquiry: A Context for Methodology (2004)
50. McGrath, J.: Methodology Matters: Doing Research in the Social and Behavioural Sci-
ences. In: Readings in Human-Computer Interaction: Toward the Year 2000, Morgan
Kaufmann, San Francisco (1995)
51. Moggridge, B.: Design Interactions. MIT Press, Cambridge (2006)
52. Morris, M.R., Ryall, K., Shen, C., Forlines, C., Vernier, F.: Beyond “Social Protocols”:
Multi-User Coordination Policies for Co-located Groupware. In: Proceedings of the ACM
Conference on Computer-Supported Cooperative Work (CSCW, Chicago, IL, USA), CHI
Letters, November 6-10, 2004, pp. 262–265. ACM Press, New York (2004)
53. Morse, E., Lewis, M., Olsen, K.: Evaluating Visualizations: Using a Taxonomic Guide.
Int. J. Human-Computer Studies 53, 637–662 (2000)
54. Nielsen, J., Mack, R.: Usability Inspection Methods. John Wiley & Sons, Chichester (1994)
55. North, C.: Toward Measuring Visualization Insight. IEEE Computer Graphics and Appli-
cations 26(3), 6–9 (2006)
56. Patton, M.Q.: Qualitative Research and Evaluation Methods, 3rd edn. Sage Publications,
London (2001)
57. Plaisant, C.: The Challenge of Information Visualization Evaluation. In: Proceedings of the
Working Conference on Advanced Visual Interfaces, pp. 109–116 (2004)
58. Purchase, H.C., Hoggan, E., Görg, C.: How Important Is the “Mental Map”? – An Empiri-
cal Investigation of a Dynamic Graph Layout Algorithm. In: Kaufmann, M., Wagner, D.
(eds.) GD 2006. LNCS, vol. 4372, pp. 184–195. Springer, Heidelberg (2007)
59. Purchase, H.C.: Effective Information Visualisation: A Study of Graph Drawing Aesthet-
ics and Algorithms. Interacting with Computers 13(2), 477–506 (2000)
60. Purchase, H.C.: Performance of Layout Algorithms: Comprehension, Not Computation.
Journal of Visual Languages and Computing 9, 647–657 (1998)
61. Brandenburg, F.J. (ed.): GD 1995. LNCS, vol. 1027. Springer, Heidelberg (1996)
62. Reilly, D., Inkpen, K.: White Rooms and Morphing Don’t Mix: Setting and the Evaluation
of Visualization Techniques. In: Proceedings of the SIGCHI Conference on Human Fac-
tors in Computing Systems, pp. 111–120 (2007)
63. Saraiya, P., North, C., Duca, K.: An Insight-Based Methodology for Evaluating Bioinfor-
matics Visualizations. IEEE Transactions on Visualization and Computer Graphics 11(4),
443–456 (2005)
64. Scott, S.D., Carpendale, S., Inkpen, K.: Territoriality in Collaborative Tabletop Workspaces.
In: Proceedings of the ACM Conference on Computer-Supported Cooperative Work
(CSCW, Chicago, IL, USA), CHI Letters, November 6-10, 2004, pp. 294–303. ACM Press,
New York (2004)
65. Seidman, I.: Interviewing as Qualitative Research: A Guide for Researchers in Education
and the Social Sciences. Teachers’ College Press, New York (1998)
66. Shneiderman, B.: The Eyes Have It: A Task by Data Type Taxonomy for Information
Visualizations. In: Proceedings of the IEEE Symposium on Visual Languages, pp. 336–
343. IEEE Computer Society Press, Los Alamitos (1996)
67. Shneiderman, B., Plaisant, C.: Strategies for Evaluating Information Visualization Tools:
Multi-Dimensional In-Depth Long-Term Case Studies. In: Proceedings of the Workshop
on BEyond Time and Errors: Novel Evaluation Methods for Information Visualization,
BELIV (2006)
68. Spence, R.: Information Visualization, 2nd edn. Addison-Wesley, Reading (2007)
69. Spool, J., Schroeder, W.: Testing Web Sites: Five Users is Nowhere Near Enough. In: CHI
’01 Extended Abstracts, pp. 285–286. ACM Press, New York (2001)
70. Strauss, A.L., Corbin, J.: Basics of Qualitative Research: Techniques and Procedures for
Developing Grounded Theory. Sage Publications, London (1998)
71. Tang, A., Tory, M., Po, B., Neumann, P., Carpendale, S.: Collaborative Coupling over
Tabletop Displays. In: Proceedings of the Conference on Human Factors in Computing
Systems (CHI’06), pp. 1181–1190. ACM Press, New York (2006)
72. Tang, A., Carpendale, S.: An observational study on information flow during nurses’ shift
change. In: Proc. of the ACM Conf. on Human Factors in Computing Systems (CHI), pp.
219–228. ACM Press, New York (2007)
73. Tang, J.C.: Findings from observational studies of collaborative work. International Jour-
nal of Man-Machine Studies 34(2), 143–160 (1991)
74. Tory, M., Möller, T.: Evaluating Visualizations: Do Expert Reviews Work. IEEE Com-
puter Graphics and Applications 25(5), 8–11 (2005)
75. Tufte, E.: The Visual Display of Quantitative Information. Graphics Press, Cheshire (1986)
76. Tufte, E.: Envisioning Information. Graphics Press, Cheshire (1990)
77. Tufte, E.: Visual Explanations. Images and Quantities, Evidence and Narrative. Graphics
Press, Cheshire (1997)
78. Viegas, F.B., Wattenberg, M., van Ham, F., Kriss, J., McKeon, M.: Many Eyes: A Site for
Visualization at Internet Scale. IEEE Transactions on Visualization and Computer Graph-
ics (Proceedings Visualization / Information Visualization 2007) 12(5), 1121–1128 (2007)
79. Ware, C.: Information Visualization: Perception for Design, 2nd edn. Morgan Kaufmann,
San Francisco (2004)
80. Wigdor, D., Shen, C., Forlines, C., Balakrishnan, R.: Perception of Elementary Graphical
Elements in Tabletop and Multi-surface Environments. In: Proceedings of the Confer-
ence on Human Factors in Computing Systems (CHI’07), pp. 473–482. ACM Press,
New York (2007)
81. Willett, W., Heer, J., Agrawala, M.: Scented Widgets: Improving Navigation Cues with
Embedded Visualizations. In: INFOVIS 2007. IEEE Symposium on Information Visuali-
zation (2007)
82. Yost, B., North, C.: The Perceptual Scalability of Visualization. IEEE Transactions on
Visualization and Computer Graphics 12(5), 837–844 (2006)
83. Zuk, T.: Uncertainty Visualizations. PhD thesis. Department of Computer Science,
University of Calgary (2007)
84. Zuk, T., Carpendale, S.: Theoretical Analysis of Uncertainty Visualizations. In: Proceedings
of SPIE Conference Electronic Imaging, Vol. 6060: Visualization and Data Analysis (2006)
85. Zuk, T., Schlesier, L., Neumann, P., Hancock, M.S., Carpendale, S.: Heuristics for Infor-
mation Visualization Evaluation. In: Proceedings of the Workshop BEyond Time and Er-
rors: Novel Evaluation Methods for Information Visualization (BELIV 2006), held in con-
junction with the Working Conference on Advanced Visual Interfaces (AVI 2006), ACM
Press, New York (2006)
Theoretical Foundations of Information Visualization
1 Introduction
Information Visualization suffers from not being based on a clearly defined underly-
ing theory, making the tools we produce difficult to validate and defend, and meaning
that the worth of a new visualization method cannot be predicted in advance of im-
plementation. There is much unease in the community as to the lack of theoretical
basis for the many impressive and useful tools that are designed, implemented and
evaluated by Information Visualization researchers.
The purpose of a theory is to provide a framework within which to explain phe-
nomena. This framework can then be used to both evaluate and predict events, in this
case, users’ insight or understanding of visualization, and their use of it. An Informa-
tion Visualization theory would enable us to evaluate visualizations with reference to
an established and agreed framework, and to predict the effect of a novel visualization
method.
This is not to say that a single theory would be able to encapsulate the whole of
the Information Visualization field; it may be that multiple theories at different levels
are needed. We already make use of many existing cognitive and perceptual theories,
as well as established statistical methods. It might be that the complexity of Informa-
A. Kerren et al. (Eds.): Information Visualization, LNCS 4950, pp. 46 – 64, 2008.
© Springer-Verlag Berlin Heidelberg 2008
The three sections that follow each take a different approach to suggesting a theory
for Information Visualization. While they were not originally developed with the
above linguistic model in mind, each can be related in some way to this framework.
Natalia Andrienko takes a data-centric view, focusing on the dataset itself, and the
tokens that describe it. She considers how the characteristics of the dataset and the
requirements of the visualization for a task may be matched to determine patterns,
thus predicting the most appropriate visualization tool for the given task. Thus, this
section describes the exploration of the data model so as to identify the best syntax to
use for given tokens (taking into account their referents and the desired semantics).
She highlights the usefulness of systems which can explore the data model, predict
the patterns in datasets, and facilitate the perception of these patterns.
Matthew Ward’s starting point is communication theory, and this section is clearly
focused on information content – the meaning of the visualization and maintaining the
flow of information through all stages of the visualization pipeline. He discusses how
we may assess our progress in designing and enhancing visualizations through con-
sidering measurements of information transfer, content or loss, thus providing a useful
theoretical means for validating visualizations. In this case, there is no internal explo-
ration of the data, but it is the validity of the data after transfer from internal model to
external representation that is considered important.
T.J. Jankun-Kelly introduces two useful models for a scientific approach to
visualization, both of which are in their infancy. The visual exploration model describes
and captures the dynamic process of user exploration and manipulation of a visualization
in order to effect its redesign, thus using the pragmatic response of the user to
determine a new syntactical arrangement. The second model, visual transformation
design, uses transformation functions applied to the data model to provide design
guidance based on visualization parameters, thus performing an initial exploration of
the data model to suggest syntax to enhance the pragmatic response of the user.
The paper concludes with a summary, and suggestions for future research.
crease and decrease are examples of patterns. A model may be a synthesis of several
patterns each representing some part or aspect of the data. Thus, when the observation
of the morning temperatures is performed over a sufficiently long period, the model
will probably incorporate the patterns of both increase and decrease of the tempera-
ture. Furthermore, patterns may also be composed of sub-patterns. For instance, the
behavior of the temperature may be conceptualized as a repeated “wave” where in-
crease is followed by decrease. Here, increase and decrease are basic, or atomic, pat-
terns, the “wave” is a composite pattern including the increase and decrease patterns,
and the repetition of the “wave” is a pattern of a yet higher level, which incorporates
the “wave” pattern.
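The hierarchy of atomic and composite patterns described above can be illustrated with a minimal sketch. The function names, the "stability" label for unchanged values, and the temperature values are assumptions made for illustration only; they are not part of the proposed theory.

```python
from itertools import groupby


def atomic_patterns(series):
    """Return the sequence of atomic patterns found in a series of measurements.

    Consecutive steps in the same direction are merged into one pattern,
    so a long rise counts as a single 'increase'.
    """
    def direction(a, b):
        if b > a:
            return "increase"
        if b < a:
            return "decrease"
        return "stability"

    steps = [direction(a, b) for a, b in zip(series, series[1:])]
    # groupby collapses each run of identical labels into one entry.
    return [label for label, _ in groupby(steps)]


def count_waves(patterns):
    """Count composite 'wave' patterns: an increase directly followed by a decrease."""
    return sum(
        1
        for first, second in zip(patterns, patterns[1:])
        if first == "increase" and second == "decrease"
    )


# Hypothetical morning temperatures (made-up values).
morning_temps = [2, 5, 9, 7, 3, 4, 8, 6]
patterns = atomic_patterns(morning_temps)
waves = count_waves(patterns)
```

Here `patterns` holds the atomic patterns, each pair of adjacent "increase" and "decrease" entries forms a "wave", and a wave count above one corresponds to the higher-level pattern of a repeated "wave".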
The main role of Information Visualization tools can be understood as helping the
user to perceive patterns that could be used for building an appropriate model. This
means, in particular, that a tool should facilitate the perception of (sub)sets of data
items as units. For an appropriate support of the detection of patterns, a tool designer
should know in advance what types of patterns need to be perceived (or otherwise
detected) with the use of the tool. Then, after the tool is ready, it will be easy to ex-
plain to the users the purpose of the tool and instruct them how to detect the types of
patterns the tool is oriented to.
The types of patterns that may be meaningful for the user depend on the structure
and properties of the data under analysis. Thus, in the analysis of a temporal series of
numeric measurements (such as temperatures) it makes sense to look for such basic
patterns as increase, decrease, stability, fluctuation, peak, and low point. However,
when numeric measurements refer to a discrete unordered set (for example, the melting
temperatures of various substances), the possible types of patterns may be groupings of
elements with close measurement values and frequency-related patterns:
prevalence of certain values or value intervals, frequent values, or exceptional values
(outliers).
To support the designers and users of Information Visualization tools in the way
described above, there is a need for a theory that makes it possible to predict,
for a given dataset or a given class of datasets, what types of patterns may be
found there. We especially emphasize the term types to exclude the possible impression
of attempting to predict (and on this basis automatically detect) all specific patterns
hidden in specific data. Thus, a prediction that a dataset may contain groups
(clusters) of objects with similar characteristics does not define what specific clusters
are there. However, it orients tool designers, who will know that the tool must help
the users to detect clusters, and users, who will know that they need a tool facilitating
the detection of clusters. Then, if each Information Visualization tool and technique is
supplied with an appropriate signature (i.e. what kind of data it is suitable for and
what types of patterns it is oriented to), the user will be able to choose the right tool.
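The idea of a tool signature can be sketched as a simple lookup. Everything in this sketch is hypothetical: the `Signature` class, the tool names `tool_a` and `tool_b`, and the labels used for data kinds and pattern types are placeholders, not a catalogue of real techniques.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    """Hypothetical signature of a tool or technique."""
    data_kinds: frozenset      # kinds of data the tool is suitable for
    pattern_types: frozenset   # types of patterns the tool is oriented to


# Made-up catalogue; the assignments below are illustrative assumptions.
TOOLS = {
    "tool_a": Signature(
        frozenset({"temporal series"}),
        frozenset({"increase", "decrease", "peak"}),
    ),
    "tool_b": Signature(
        frozenset({"unordered set"}),
        frozenset({"cluster", "outlier"}),
    ),
}


def suitable_tools(data_kind, wanted_patterns):
    """Return the names of tools whose signature covers the data kind and all wanted patterns."""
    wanted = frozenset(wanted_patterns)
    return sorted(
        name
        for name, signature in TOOLS.items()
        if data_kind in signature.data_kinds and wanted <= signature.pattern_types
    )
```

A user who expects clusters in an unordered set would call `suitable_tools("unordered set", {"cluster"})` and be pointed to the tools whose signatures match.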
The theory we are advocating in this section can be called data-centered predictive
theory. The theory needs to include
1. an appropriate generic framework for the characterization of various data
types and structures;
2. a general typology of patterns;
3. a mechanism for deriving possible pattern types from data characterizations.
Here, we present some preliminary ideas concerning these components of the theory.
50 H.C. Purchase et al.
Data may be viewed abstractly as a set of records with a common structure, each
record being a sequence of elements (such as numbers or strings) which either reflect
the results of some observations or measurements or specify the context in which the
observations or measurements were obtained. The context may include, for example,
the place and the time of observation or measurement, and the object or group of
objects observed. The elements that a data record consists of are called values.
All records of a dataset are assumed to have a common structure, with each posi-
tion having its specific meaning, which is common to all values appearing in it. These
positions may be named to distinguish between them. The positions are usually called
components of the data.
Definition: Characteristic component, or attribute, is a data component correspond-
ing to a measured or observed property of the phenomenon reflected in the data.
Characteristic is a value of a single attribute or a combination of values of several
dataset attributes.
Definition: Referential component, or referrer, is a data component reflecting an
aspect of the context in which the observations or measurements were made. Refer-
ence is the value of a single referrer or the combination of values of several referrers
that fully specifies the context of some observation(s) or measurement(s).
Definition: Reference set of a dataset is the set of all references occurring in this
dataset.
Definition: Characteristic set of a dataset is the set of all possible characteristics (i.e.,
combinations of values of the dataset attributes).
Definition: Multidimensional dataset is a dataset having two or more referrers. De-
pending on the number of referrers, a dataset may be called one-dimensional, two-
dimensional, three-dimensional, and so on.
For example, the geographical location and the time are referrers for measurements of
properties of the climate such as air temperature or wind direction, which are attrib-
utes. Each combination of location and time is a reference, and the corresponding
combination of air temperature and wind direction is a characteristic. This is a two-
dimensional dataset as it has two referrers; the attributes are not counted as dimen-
sions. Referrers are independent components and attributes are dependent since the
values of attributes depend on the context in which they are observed. In data analy-
sis, it is possible to deal with selected attributes independently from the others; how-
ever, all referrers present in a dataset need to be handled simultaneously.
Data may be viewed formally as a function, in the mathematical sense, with the re-
ferrers being independent variables and the attributes being dependent variables. The
function defines the correspondence between the references and the characteristics
where for each combination of values of the referential components there is at most
one combination of values of the attributes.
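The view of data as a function from references to characteristics can be made concrete with a small sketch. The record layout, the helper name `as_function`, and all values are assumptions chosen for illustration; the sketch only enforces the condition that each reference corresponds to at most one characteristic.

```python
# Each record is (location, time, air_temperature, wind_direction).
# The first two positions are referrers, the remaining ones are attributes.
# All values are made up.
records = [
    ("A", "06:00", 4.5, "NW"),
    ("A", "12:00", 11.0, "W"),
    ("B", "06:00", -2.0, "N"),
]
N_REFERRERS = 2


def as_function(records, n_referrers):
    """Build the mapping reference -> characteristic.

    Raises ValueError if one reference would map to two different
    characteristics, since the data would then not define a function.
    """
    mapping = {}
    for record in records:
        reference = record[:n_referrers]
        characteristic = record[n_referrers:]
        if mapping.get(reference, characteristic) != characteristic:
            raise ValueError(f"reference {reference} has two different characteristics")
        mapping[reference] = characteristic
    return mapping


data = as_function(records, N_REFERRERS)
reference_set = set(data)        # all references occurring in the dataset
dimensionality = N_REFERRERS     # attributes are not counted as dimensions
```

With two referrers the example is a two-dimensional dataset, and `data[("A", "12:00")]` yields the characteristic observed at that location and time.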
The structure of a dataset is characterized by specifying which components it
includes, which of them are referrers, and which ones are attributes. In addition,
it is necessary to specify the properties of the components. The relevant properties
are:
• whether distances exist between the elements. Any continuous set such as
time, space, and values of temperature has distances, but there may be dis-
tances also in discrete sets such as a set of integer values denoting numbers of
some items. The discrete set of substances has no distances.
• whether and how the elements are ordered. Thus, time moments are linearly
ordered and may also be cyclically ordered, depending on the time span of
observations.
It should be noted that a set consisting of combinations of values of several compo-
nents does not inherit the properties of the individual components. Thus, a set of
combinations of values of melting temperature and atomic weight is only partly or-
dered although the value sets of the original attributes are fully ordered. This data
characterization framework is presented in more detail in [3].
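Why a set of value combinations is only partly ordered can be seen in a short sketch. It assumes the component-wise (product) order as one natural way of comparing combinations; the text does not prescribe a particular order, and the numbers are made up rather than real measurements.

```python
def precedes(a, b):
    """True if combination a is less than or equal to b in every component."""
    return all(x <= y for x, y in zip(a, b))


def comparable(a, b):
    """Two combinations are comparable if one precedes the other."""
    return precedes(a, b) or precedes(b, a)


# Hypothetical (melting temperature, atomic weight) combinations.
p = (100, 20)
q = (200, 30)
r = (150, 10)

p_q_comparable = comparable(p, q)   # p is lower in both components
p_r_comparable = comparable(p, r)   # p is lower in one component, higher in the other
```

Each component on its own is fully ordered, yet `p` and `r` cannot be ranked against each other, which is what makes the set of combinations only partly ordered under this assumption.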
2.2 Patterns
This is a more generic definition than is given in data mining: “a pattern is an
expression E in some language L describing facts in a subset F_E of a set of facts F [i.e.
a dataset, in our terms] so that E is simpler than the enumeration of all facts in F_E” [4].
In our definition, we mean any kind of representation, for example, graphical or
mental.
We posit that all existing and imaginable patterns may be considered as instan-
tiations of certain archetypes (or, simply, types). It is quite reasonable to assume
that such archetypes may exist in the mind of a data analyst and drive the process of
visual data analysis, which is commonly believed to be based on pattern recogni-
tion: the analyst looks for constructs that can be regarded as instances of the exist-
ing archetypes.
A pattern-instance may be characterized by referring to its type and specifying its
individual properties, in particular, the reference (sub)set on which the pattern is
based. Properties may be type-specific (for example, amount and rate of increase).
The following table defines the basic types of patterns in relation to the characteristics
of data for which such pattern types are relevant. We cannot guarantee at the present
moment that this typology is complete; further work is obviously needed. Note that
neither the columns nor the rows of the table are mutually exclusive. Thus, when the
characteristic set is ordered and has distances, the pattern types from all columns are
relevant. Similarly, when the reference set is linearly and cyclically ordered, the pat-
terns from all rows are possible.
These basic pattern types may be included in composite patterns. The types of
composite patterns depend on the properties of the reference set:
1. For any kind of reference set: repeated pattern, frequent pattern, infrequent
pattern, prevailing pattern;
2. For a linearly ordered reference set: specific sequence of patterns, alterna-
tion;
3. For a cyclically ordered reference set: cyclically repeated pattern;
4. For a reference set with distances: constant distance between repetitions of a
pattern, patterns occurring close to each other.
Any composite pattern may, in turn, be included in a bigger composite pattern, for
example, a frequently repeated pattern where increase is followed by decrease.
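The dependence of composite pattern types on the properties of the reference set, as enumerated above, could be encoded as a simple derivation rule. This is only a sketch of the kind of mechanism the theory calls for; the function name and the boolean parameters are assumptions.

```python
def composite_pattern_types(linearly_ordered=False, cyclically_ordered=False,
                            has_distances=False):
    """List the composite pattern types that are possible for a reference set.

    The rules mirror the enumeration in the text; the properties are not
    mutually exclusive, so several groups may apply at once.
    """
    # Possible for any kind of reference set.
    types = ["repeated pattern", "frequent pattern",
             "infrequent pattern", "prevailing pattern"]
    if linearly_ordered:
        types += ["specific sequence of patterns", "alternation"]
    if cyclically_ordered:
        types += ["cyclically repeated pattern"]
    if has_distances:
        types += ["constant distance between repetitions of a pattern",
                  "patterns occurring close to each other"]
    return types


# Time is linearly ordered, may be cyclically ordered, and has distances.
time_types = composite_pattern_types(linearly_ordered=True,
                                     cyclically_ordered=True,
                                     has_distances=True)
# A discrete unordered set without distances, such as a set of substances.
substance_types = composite_pattern_types()
```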
What is presented in this section is only an initial sketch of the data-centered predic-
tive theory. Further work is required to ensure the comprehensiveness of the pattern
typology. Particular attention needs to be paid to multi-dimensional data. It is also
necessary to define pattern types used to represent relationships between attributes or
between phenomena (represented by several datasets differing in structure) such as
correlation (co-occurrence) or influence.
Then, it is necessary to evaluate Information Visualization techniques according to
the types of data they are suited for and the types of patterns they help to elicit. This
can form an appropriate basis for instructive books and courses for users of Informa-
tion Visualization tools.
3 Information Theory
3.1 Visual Communication
There have been many efforts to date to quantify the amount of information in a
communication stream. If we think of plain text, there are numerous quantifiable
features, including:
– The total number of words per minute
– The occurrence of specific words
– The frequency of occurrence for each word
– The occurrence of word pairs, triples, phrases, and sentences.
There are problems, however, with such simplistic, syntax-only measurement. Words
can have variable significance; some are unnecessary or redundant. Many words can
encode the same concept. In fact, reading text or hearing speech may have no effect
on one’s uncertainty regarding the subject of the text, e.g., you may already have
known it, or you don’t understand the meaning of the words or their implied concepts.
This implies that the measurement of information content or volume can be specific to
the individual receiver and, as we’ll see later, the task that is being performed based
on the communication.
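The syntax-only measures listed above can be sketched in a few lines of Python. This is an illustration under simple assumptions (lower-cased words split on non-letters, a made-up sample sentence, a hypothetical function name); a rate such as words per minute would additionally need timing information that plain text does not carry.

```python
from collections import Counter
import re


def text_measures(text):
    """Return total word count, per-word frequencies, and word-pair frequencies."""
    words = re.findall(r"[a-z']+", text.lower())
    word_counts = Counter(words)
    # Adjacent word pairs; triples and longer phrases follow the same idea.
    pair_counts = Counter(zip(words, words[1:]))
    return len(words), word_counts, pair_counts


sample = "the cat saw the dog and the dog saw the cat"
total, word_counts, pair_counts = text_measures(sample)
```

None of these counts reflects what the receiver already knows or understands, which is exactly the limitation of syntax-only measurement described above.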
Can we perform similar analysis on a dataset? Consider a table of numeric values.
Features of potential interest in the dataset include:
– The number of entries or dimensions
– The values
– Clusters and their attributes (number, size, relations, …)
– Trends and their attributes (size, rate of change, …)
– Outliers and their attributes (number, degree of outlierness, relation to dense
regions, …)
– Associations, correlations and any features between records, dimensions, or
individual values.
In fact, we can observe that a featureless dataset is not differentiable from random
noise: all values are equally likely. Features and relations can also vary in their mag-
nitude, certainty, complexity, and importance. Clusters may differ in size; outliers
may vary in their distance to the main body of data; features may be composed of
many sub-features; in many cases, a feature that is significant to one observer may be
considered noise by another. Recently, researchers have proposed measuring and
counting insights [9], which are new knowledge gained during visual analysis. These
insights are generally specific to a particular task, some of which include [10]:
– Identify data characteristics
– Locate boundaries, critical points, other features
– Distinguish regions of different characteristics
– Categorize or classify
– Rank based on some order
– Compare to find similarities and differences
– Associate into relations
– Correlate by classifying relations.
For each of these tasks, we might have different accuracy requirements as well, which
can influence the resolution at which feature extraction is accomplished during
communication. Thus, for example, the tasks of detecting, classifying, and measuring a
particular phenomenon each have their own accuracy demands. The tasks to be per-
formed also have an implication on the types of information that the visualization
must be able to convey; categorization and ranking imply that the visualization must
have high selective information content, while identifying characteristics and bounda-
ries are part of building a mental model and thus require good descriptive information
content.
Returning to our dataset and the simplistic features and relations that are contained
in it, we can try to quantify the volume of information and then measure how much of
this volume a visualization technique is capable of effectively conveying. If we as-
sume a table of scalar values (M records, N dimensions or variables), the number of
individual values to be communicated is M*N, and the maximum resolution required
is the number of significant digits. Often, however, the available visual resolution is
far less than that of the data. We can then count all the pairwise relations between
records, or dimensions, or even values. For records, this would be M*(M−1)/2, and
similar for dimensions and values. Then there are relations that are 3-way, 4-way, or
even among an arbitrary number of elements, e.g., in clustering tasks. Clearly, there
are too many possibilities to consider them all, so perhaps we need a different tactic.
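To give a feel for how quickly these counts grow, the following sketch evaluates them for a table of M records and N dimensions. The function name and the example sizes (1000 records, 10 dimensions) are assumptions chosen purely for illustration.

```python
from math import comb


def relation_counts(m, n):
    """Count individual values and some k-way relations for an M x N table."""
    values = m * n
    return {
        "values": values,                 # M*N individual values
        "record_pairs": comb(m, 2),       # M*(M-1)/2 pairwise relations between records
        "dimension_pairs": comb(n, 2),    # N*(N-1)/2 pairwise relations between dimensions
        "value_pairs": comb(values, 2),   # pairwise relations between individual values
        "record_triples": comb(m, 3),     # 3-way relations between records
    }


counts = relation_counts(1000, 10)
```

Even for this modest table, the pairwise relations between values run into the tens of millions and the 3-way relations between records into the hundreds of millions, which supports the point that exhaustive enumeration is impractical.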
wise relations. Many researchers have studied ways of selecting a useful ordering,
including Bertin’s reorderable matrix [12], Seo and Shneiderman’s rank-by-feature
techniques [13], and Peng et al.’s reordering for visual clutter reduction [14]. In all
cases, a user should be able to prune orderings to emphasize those that show trends,
groupings, or other discernable patterns. Thus far most research has focused on sim-
ple 1-D orderings, but higher level orderings and structures (e.g., hierarchies) have
also been studied.
This discussion of information content would not be complete without also consider-
ing the limitations imposed by the visual communication channel (i.e., the display)
and receiver (the human visual perception system). Regarding the channel and its
capacity, modern displays are limited to somewhere on the order of one to nine mil-
lion pixels, although tiled displays can increase this substantially. The color palette
generally has a size of 2^24 possible values, although the limitations of human color
perception take a big chunk out of this. Finally, the refresh rate of the system, typi-
cally between 20 and 30 frames per second, limits how fast the values on the screen
change, though again the human limitations of change detection mean that much of
this capability is moot.
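The display figures quoted above can be combined into a crude upper bound on the raw capacity of the channel. The sketch below assumes 24 bits of color per pixel and simply multiplies pixels, bits per pixel, and frames per second; the function name is hypothetical, and the result deliberately ignores the perceptual limits of the receiver.

```python
def raw_display_capacity(pixels, bits_per_pixel=24, frames_per_second=30):
    """Upper bound on bits per second that a display can present (perception ignored)."""
    return pixels * bits_per_pixel * frames_per_second


# Low end: one million pixels refreshed 20 times per second.
low = raw_display_capacity(1_000_000, frames_per_second=20)

# High end: nine million pixels refreshed 30 times per second.
high = raw_display_capacity(9_000_000, frames_per_second=30)

# Size of a 24-bit color palette.
colors = 2 ** 24
```

Under these assumptions the raw capacity lies between roughly 0.5 and 6.5 gigabits per second, far more than a viewer can take in, so the receiver is the tighter constraint.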
Regarding these human limitations, from the study of human physiology we know
that there are approximately 800k fibers in the optic nerve. We can perceive 8-9 levels
of intensity gradation, and require a 0.1 second accumulation period to register a
visual stimulus. In addition, we have a limited viewable area at any particular time,
and a variable density of receptors (much less dense in the peripheral vision). Studies
have shown we have a limited ability to distinguish and measure size, position, and
color, and the duration of exposure affects our capacity. Finally, it has been shown
that our abilities are also related to the task at hand; we are much better at relative
judgment tasks than absolute judgment ones.
We now look at methods that have been used in the past for measuring the information
content in a data or information visualization. For completeness' sake, some of
these are quite trivial. For example, simply counting the number of data values shown
is a valid measure. The issue in this case would be how to deal with partial occlusion.
In some cases this would be acceptable if sufficient information remains visible to
make identification or recognition possible. Tufte [15] suggested the data-ink ratio as
an indicator of information content, though tick marks, labels, and axes are often
essential for appropriate identification. Many researchers have used counts of the
number of features or patterns found in a particular amount of time. Ward and Ther-
oux [16] counted the number of clusters and outliers found by users in different visu-
alizations, while Saraiya et al. [9] counted insights discovered. In each case, a ground
truth is needed to verify that what was found was really present. Similar experiments
have been used to measure classification, measurement, and recall accuracy.
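One way to check reported findings against a ground truth is sketched below. It treats both as sets of feature labels and derives hits, false alarms, misses, precision, and recall. These are standard set-based measures introduced here for illustration; they are not necessarily the scoring used in the cited studies, and all labels and names are hypothetical.

```python
def score_findings(reported, ground_truth):
    """Compare features reported by a user with those known to be in the data."""
    reported, ground_truth = set(reported), set(ground_truth)
    hits = reported & ground_truth          # found and really present
    false_alarms = reported - ground_truth  # reported but not present
    misses = ground_truth - reported        # present but not found
    precision = len(hits) / len(reported) if reported else 0.0
    recall = len(hits) / len(ground_truth) if ground_truth else 0.0
    return {
        "hits": len(hits),
        "false_alarms": len(false_alarms),
        "misses": len(misses),
        "precision": precision,
        "recall": recall,
    }


# Hypothetical ground truth and one user's reported findings.
truth = {"cluster_A", "cluster_B", "outlier_1", "outlier_2"}
found = {"cluster_A", "outlier_1", "outlier_9"}
result = score_findings(found, truth)
```

A raw count of findings would credit this user with three features, whereas the comparison shows that only two were really present and two were missed.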
There are many other issues when attempting to measure information in a visuali-
zation. As mentioned earlier, distortion and other transformations can improve the
readability of a visualization, but introduce errors in the data themselves. Data may
have uncertainty attributes associated with them, which can interfere with the meas-
urement. On the other hand, there are numerous examples of improving information
content by using novel layout, shape, and color strategies or augmenting the visualiza-
tion with links, boundaries, and even white space. The amount of information con-
tained may also be enhanced using redundant mappings, which improves the chances
of successful reception by the viewer. Finally, the use of animation to communicate
information in an incremental fashion can be quite effective; it is lossy communica-
tion, as viewers quickly forget some of what they have seen, but the ability to replay
the animation can replace some of this lost information.
3.7 Conclusions
potentially reduce the amount of ad hocness in the field. The key is to define meas-
ures of information transfer, content, or loss at all stages of the pipeline as a means of
assessing our progress in the development of new visualization techniques and en-
hancement of existing ones.
Fig. 1. Topics in a Formalized Information Visualization Course. Dark grey topics are based
upon formal foundations in other disciplines; light grey topics are yet-to-be-developed visuali-
zation-specific formal foundations.
There have been several recent calls for an establishment of a “theory” and “science”
behind visualization [24,25]; this need can be partially addressed via formal scientific
models. If we accept that information visualization needs a formal foundation, the
question remains whether the existing models from perceptual psychology and cogni-
tive science are sufficient. The problem with these formalisms is they do not address
the specific problems of visualization. While they provide general guidelines, models
from non-visualization fields do not consider the context of the visualization envi-
ronment - the user and the computer. What is needed is a set of formal foundations
that bridges the gap between the general human experience and the visualization do-
main (Figure 1). We propose two models for this purpose: an exploration model that
incorporates the user's interaction with the visualization and the dynamic aspects of
their analysis, and a transform design model which encapsulates the depiction and
constructive aspects of the visualization. These models would abstract fundamental
principles of visualization science and design, and thus prescribe (via their predictive
power) empirically driven practices.
Fig. 3. A depiction of the revised network routing visualization transform. Nodes represent the
state of the data (e.g., a table of events) while edges represent operators or interactions (e.g.,
parsing the data). In this example, the network visualization is combined with a graph visualiza-
tion by embedding the results of the former within the latter. The graph itself is a composition,
merging a spanning tree and the original graph to lay out the selected sub-graph.
Back-propagation of state due to interaction is included. The depiction is based upon an extended
Data State transform model.
5 Conclusion
The three discussions of Information Visualization presented here draw on existing
theories of data-centric prediction, information communication and scientific model-
ing, and relate in different ways to the linguistic framework defined in the introduc-
tion. A single uniting theory of Information Visualization may be impossible due to
its strong relationship to and use of several other diverse disciplines (e.g. psychology
(perception, cognition and learning), graphic design and aesthetics).
Investigating theoretical approaches used in other disciplines, and their relation to
Information Visualization, is an obvious way forward, and can provide a useful way
for researchers in the area to present, discuss and validate their ideas; it is hoped that
the over-arching linguistics-based framework of representation, user exploration and
manipulation, and system exploration and manipulation will prove useful in linking
the constituent theories together. The more solid theoretical analyses that Information
Visualization researchers or tool designers can call on in defending or validating their
work, the more secure the discipline will be.
References
1. de Saussure, F.: Writings in General Linguistics. In: Bouquet, S., Engler, R., Sanders, C.,
Pires, M. (eds.), Oxford University Press, Oxford (2006)
2. Bakhtin, M.: The Dialogic Imagination, University of Texas Press (1981), quoted in Ball,
A.F., Freedman, S.W.: Bakhtinian Perspectives on Language, Literacy, and Learning,
Cambridge University Press, Cambridge (2004)
3. Andrienko, N., Andrienko, G.: Exploratory Analysis of Spatial and Temporal Data: A Sys-
tematic Approach. Springer, Heidelberg (2006)
4. Fayyad, U., Piatetsky-Shapiro, G., Smyth, P.: From data mining to knowledge discovery in
databases. AI Magazine 17, 37–54 (1996)
5. Card, S.K., Moran, T.P., Newell, A.: The Psychology of Human-Computer Interaction.
Erlbaum Associates, Hillsdale (1983)
6. Schneider, T.D.: Information Theory Primer. http://www.lecb.ncifcrf.gov/~toms/paper/primer
(April 14, 2007)
7. Cherry, C.: On Human Communication, 2nd edn. MIT Press, Cambridge (1966)
8. MacKay, D.: Information, Mechanism and Meaning. MIT Press, Cambridge (1969)
9. Saraiya, P., North, C., Duka, K.: An evaluation of microarray visualization tools for bio-
logical insight. In: Proc. IEEE Symposium on Information Visualization, pp. 1–8 (2004)
10. Keller, P., Keller, M.: Visual cues: Practical Data Visualization. IEEE Computer Society
Press, Los Alamitos (1993)
11. Cui, Q., Ward, M., Rundensteiner, E., Yang, J.: Measuring data abstraction quality in
multiresolution visualization. In: Proc. IEEE Symposium on Information Visualization, pp.
709–716 (2006)
12. Bertin, J.: Matrix theory of graphics. Information Design 10(1), 5–19 (2001)
13. Seo, J., Shneiderman, B.: A rank-by-feature framework for interactive exploration of multi-
dimensional data. In: Proc. IEEE Symposium on Information Visualization, pp. 96–113
(2005)
14. Peng, W., Ward, M., Rundensteiner, E.: Clutter reduction in multi-dimensional data visu-
alization using dimension reordering. In: Proc. IEEE Symposium on Information Visuali-
zation, pp. 89–96 (2004)
15. Tufte, E.: The Visual Display of Quantitative Information. Computer Graphics Press, Chesh-
ire (1983)
16. Ward, M., Theroux, K.: Perceptual benchmarking for multivariate data visualization. In:
Proc. Dagstuhl Seminar on Scientific Visualization, pp. 314–328 (1997)
17. Fua, Y.-H., Ward, M., Rundensteiner, E.: Hierarchical parallel coordinates for exploration
of large datasets. In: Proc. IEEE Conference on Visualization, pp. 43–50 (1999)
18. Novotny, M., Hauser, H.: Outlier-preserving focus+context visualization in parallel coor-
dinates. IEEE Trans. Visualization and Computer Graphics 12, 893–900 (2006)
19. Domik, G.: Do We Need Formal Education in Visualization? IEEE Computer Graphics
and Applications 20(4), 16–19 (2000)
20. Borland, D., Taylor, R.M.: Rainbow Color Map (Still) Considered Harmful. IEEE Com-
puter Graphics and Applications 27(2), 14–17 (2007)
21. Brewer, C.A.: Color use guidelines for data representation. In: Proceedings of the Section
on Statistical Graphics, American Statistical Association, pp. 50–60 (1999)
22. Ware, C.: Information Visualization: Perception for Design. Morgan Kaufmann, San Fran-
cisco (2004)
23. van Wijk, J.J.: The Value of Visualization. In: Proceedings of IEEE Visualization, pp. 79–
86. IEEE Computer Society Press, Los Alamitos (2005)
24. Johnson, C., Moorhead, R., Munzner, T., Pfister, H., Rheingans, P., Yoo, T.S.: NIH-NSF
Visualization Research Challenges Report, 1st edn. IEEE Computer Society Press, Los
Alamitos (2006)
25. Thomas, J.J., Cook, K.A.: Illuminating the Path: The Research and Development Agenda
for Visual Analytics. IEEE Computer Society Press, Los Alamitos (2005)
26. Anderson, J.R.: Cognitive Psychology and its Implications, 6th edn. Worth (2005)
27. Pirolli, P., Card, S.K.: Information Foraging. Psychological Review 4, 643–674 (1999)
28. Brodlie, K., Poon, A., Wright, H., Brankin, L., Banecki, G., Gray, A.: Problem Solving
Environment Integrating Computation And Visualization. In: Nielson, G.M., Bergeron,
R.D. (eds.) Proceedings of the 4th IEEE Conference on Visualization, pp. 102–109 (1993)
29. Jankun-Kelly, T.J., Ma, K.-L., Gertz, M.: A Model and Framework for Visualization Ex-
ploration. IEEE Transactions on Visualization and Computer Graphics 13, 357–369 (2007)
30. Lee, J.P., Grinstein, G.G.: An Architecture For Retaining And Analyzing Visual Explora-
tions Of Databases. In: Nielson, G.M., Silver, D. (eds.) Proceedings of the 6th IEEE Con-
ference on Visualization, pp. 101–108 (1995)
31. Pirolli, P., Card, S.K., Van Der Wege, M.M.: Visual information foraging in a focus +
context visualization. In: Proceedings of the SIGCHI conference on Human factors in
computing systems, pp. 506–513. ACM Press, New York (2001)
32. Pirolli, P.: Rational Analyses of Information Foraging on the Web. Cognitive Sci-
ence 29(3), 343–373 (2005)
33. Jankun-Kelly, T.J., Ma, K.-L., Gertz, M.: A Model for the Visualization Exploration Process.
In: Moorhead, R.J., Gross, M., Joy, K.I. (eds.) Proceedings of the 13th IEEE Conference
on Visualization (Vis ’02), pp. 323–330 (2002)
34. Lee, P.J.: A Systems and Process Model for Data Exploration. PhD thesis. University of
Massachusetts, Lowell (1998)
35. Teoh, S.T., Jankun-Kelly, T.J., Ma, K.-L., Wu, S.F.: Visual Data Analysis for Detecting
Flaws and Intruders in Computer Network Systems. In: IEEE Computer Graphics and Ap-
plications, p. 24. IEEE Computer Society Press, Los Alamitos (2004)
36. Abram, G., Treinish, L.: An Extended Data-Flow Architecture For Data Analysis And Visu-
alization. In: Nielson, G.M., Silver, D. (eds.) Proceedings of the IEEE Conference on Visu-
alization 1995 (Vis ’95), pp. 263–270. IEEE Computer Society Press, Los Alamitos (1995)
37. Chi, E.H., Riedl, J.T.: An Operator Interaction Framework For Visualization Systems. In:
Dill, J., Wills, G. (eds.) Proceedings of the IEEE Symposium on Information Visualiza-
tion, pp. 63–70 (1998)
38. Haber, R.B., McNabb, D.A.: Visualization Idioms: A Conceptual Model for Scientific
Visualization Systems. In: Nielson, G.M., Shriver, B., Rosenblum, L. (eds.) Visualization
in Scientific Computing, pp. 74–93. IEEE Computer Society Press, Los Alamitos (1990)
39. Hibbard, W.L., Dyer, C.R., Paul, B.E.: A Lattice Model for Data Display. In: Bergeron,
R.D., Kaufman, A.E. (eds.) Proceedings of the 5th IEEE Conference on Visualization (Vis
’94), pp. 310–317 (1994)
40. Schroeder, W.J., Martin, K.M., Lorensen, W.E.: The Design and Implementation of an
Object-Oriented Toolkit for 3D Graphics and Visualization. In: Yagel, R., Nielson, G.M.
(eds.) Proceedings of the 7th IEEE Conference on Visualization, pp. 93–100 (1996)
41. Casner, S.M.: Task-analytic approach to the automated design of graphic presentations.
ACM Transactions on Graphics 10(2), 111–151 (1991)
42. Mackinlay, J.: Automating the Design of Graphical Presentations of Relational
Information. ACM Transactions on Graphics 5(2), 110–141 (1986)
43. Roth, S.F., Mattis, J.: Data Characterization for Intelligent Graphics Presentation. In:
Proceedings on Human Factors in Computing Systems (CHI’90), pp. 193–200 (1990)
44. Bavoil, L., Callahan, S.P., Crossno, P.J., Freire, J., Scheidegger, C.E., Silva, C.T., Vo, H.T.:
VisTrails: Enabling Interactive Multiple-View Visualizations. In: Proceedings of the 16th
IEEE Conference on Visualization (2005)
45. Weaver, C.: Building Highly-Coordinated Visualizations in Improvise. In: Proceedings
2004 IEEE Symposium on Information Visualization, pp. 159–166. IEEE Computer Soci-
ety Press, Los Alamitos (2004)
Teaching Information Visualization
1 Introduction
Education is an important aspect of any emerging and rapidly evolving disci-
pline and this is certainly the case in Information Visualization (InfoVis) with its
emphasis on the exploratory development of knowledge. Most of the researchers
participating in the Dagstuhl seminar and contributing to this volume are in-
volved in helping students graduate with competencies in visualization. The
growing number of courses in Information Visualization is matched by the vari-
ety of styles of courses offered, in terms of course content, materials used, and
evaluation methodologies. Attendees at the Dagstuhl seminar were curious to learn
about the courses others offered and the approaches and resources that were be-
ing used, and so a session on Information Visualization teaching and education
was held.
To prepare for that session and benchmark current offerings, Keith Andrews
from Graz University, Austria, prepared a survey about InfoVis-related courses
and distributed it to the attendees. The survey was intended to gather a va-
riety of information, mostly demographic, including teaching styles, textbooks,
enrollments, teaching aids, examinations, etc. Nineteen participants completed
the survey and described their courses. This paper presents the survey results
and includes the perspectives of some of the participants in relation to their
own teaching experience in light of these and discussions amongst colleagues at
Dagstuhl.
A. Kerren et al. (Eds.): Information Visualization, LNCS 4950, pp. 65–91, 2008.
© Springer-Verlag Berlin Heidelberg 2008
66 A. Kerren, J.T. Stasko, and J. Dykes
The survey consisted of four different parts. The specific questions included
in each part are listed below.
1. General Information
(a) Instructor name
(b) Educational organization
(c) Title of course
(d) Course home page (URL)
(e) Last taught (date)
(f) Course level (graduate or undergraduate)
(g) Course hours per week
(h) Course number of weeks
(i) Enrollment (number of students)
2. Teaching Aids
(a) Do you use one or more textbooks (yes, no)?
i. If so, which ones?
(b) Do you assign papers for compulsory assigned reading (yes, no)?
i. If so, which ones?
(c) Do you have your own set of lecture notes (yes, no)?
i. URL (if available)?
(d) Do you have teaching assistants for the course (yes, no)?
i. If so, how many?
3. Practical Exercises (Projects)
(a) Do you use practical exercises or projects (yes, no)?
i. If so, please describe a typical exercise or project.
ii. If so, how do you grade the practical exercises or project?
4. Examination or Test
(a) Do you have an examination (yes, no)?
i. If so, written or oral exam?
ii. If so, please describe a typical exam question.
First, we briefly review the results of the survey. Section 3 summarizes the
topics discussed during the interactive session on teaching at the seminar.
Finally, in Section 4, a selection of participants reflects on how these issues
relate to their own experience of teaching Information Visualization.
2 Results
We present the results in four sections, one for each of the sections of the survey.
The first part of the survey gives an overview of the courses offered at the dif-
ferent universities represented by the Dagstuhl participants and provides some
details about the courses themselves. Table 1 shows the responses obtained from
this part. A balance of European and North-American universities were repre-
sented by participants in the survey results. The majority of courses were focused
on the “core field” of information visualization (about 68%). Two courses were
about visualization/computer graphics in general, and the rest were about appli-
cation fields (e.g. geographic visualization) or broader topics, such as information
interfaces or visual communication. This scope reflects the broad and interdisci-
plinary nature of Information Visualization and provides some indications as to
why developing an agreed Information Visualization curriculum may be difficult.
Most of the courses (79%) had their own publicly accessible web page provid-
ing access to course related information. Nearly all the referenced courses were
given in 2006 and 2007. Since all the responding instructors are active researchers
in the field as well, we can assume that all these courses covered the current state
of the art in information visualization. Because the detailed curriculum for the
courses was not part of the survey, we do not have details about actual course
content. The web pages associated with each of the courses are a rich source of
information however and we used these to gather keywords associated with the
curricula of each. Figure 1 shows a tag cloud generated from these keywords that
gives a flavor of the variety and importance of different topics across the courses.
The dominant words reflect some of the tensions in Information Visualization
education, with a collective need to focus on data—its dimensionality and struc-
ture, techniques for layout and visual encoding and people and their responses
to these methods and the systems through which they are accessed. Perhaps
the tag cloud and the varied responses suggest a need for systematic research to
learn about the range of approaches that are used in teaching Information Vi-
sualization and related topics. The session discussion, summarized in Section 3,
led to more insight about this, but it was not a comprehensive examination.
Most courses were taught at the graduate level, with only two being un-
dergraduate courses. At the bottom of Table 1, descriptive statistics about the
results of questions Q1g-Q1i on course duration and size are provided. The av-
erage duration of a course and the number of hours of weekly meeting time are
relatively consistent across the group. Note that most class sizes are relatively
small, echoing the fact that Information Visualization is still a relatively new
and growing area. Here, the undergraduate course #16 seems to be an outlier
because of its large enrollment. However, this course is a compulsory course
on computer graphics and visualization, and the instructor plans to divide this
course into two parts in the future.
Textbooks (Q2a): About 72% of all the instructors (13 in total) used one or
more textbooks in their courses. The most popular books were those by Colin
Table 1. Results of Part 1 of the Survey on General Information together with a brief descriptive Analysis.
# | Instructor (Q1a) | Organization (Q1b) | Course title (Q1c) | Home page (Q1d) | Last taught (Q1e) | Level (Q1f) | Hours/week (Q1g) | Weeks (Q1h) | Enrollment (Q1i)
1 | Keith Andrews | TU Graz, Austria | Information Visualisation | [4] | Summer 2007 | g | 3 | 14 | 15
2 | Jason Dykes | City University London, UK | GeoVisualization | – | Spring 2007 | g | 3 | 12 | 22
3 | Achim Ebert | TU Kaiserslautern, Germany | Information Visualization | – | Winter 04/05 | g | 2 | 14 | 18
4 | Helwig Hauser | TU Vienna, Austria | Information Visualization | [23] | Summer 2007 | g | 3 | 15 | 30
5 | Jeffrey Heer | UC Berkeley, USA | Visualization | [25] | Spring 2006 | g | 3 | 16 | 20
6 | T.J. Jankun-Kelly | Mississippi State University, USA | Information Visualization | [28] | Fall 2006 | g | 3 | 15 | 12
7 | Daniel Keim | University of Constance, Germany | Information Visualization | [29] | Summer 2007 | g | 5 | 14 | 20
8 | Andreas Kerren | TU Kaiserslautern, Germany | Information Visualization | [34] | Winter 06/07 | g | 2 | 14 | 9
9 | Robert Kosara | UNC Charlotte, USA | Visual Communication in Computer Graphics and Art | [39] | Spring 2007 | g | 3 | 18 | 15
Fig. 1. Tag cloud of course topics. Includes courses with Web pages in the English
language for which a URL was provided.
Ware and Robert Spence. The following list shows all books used by instructors
of InfoVis courses in descending order of popularity.
Some respondents noted that they also used Tufte’s other books [69, 68] for
specific aspects of the course or as a focused topic, rather than as a general
textbook. Other courses cover more general fields, such as (Data) Visualization
(#5, #18, . . . ), with information visualization as part of them. In these courses,
other textbooks were used, for example the VTK Book [57], Designing Visual
Interfaces by Mullet and Sano [50], or [58, 30, 42, 16, 18]. Those teaching visual-
ization in a particular domain (e.g. GeoVisualization (#2)) used more specific
texts associated with the relevant discipline [60,15,49]. Figure 2 shows the usage
of books for all courses listed in Table 1.
selected from the IEEE InfoVis and Vis Conference proceedings as well as from
the ACM CHI proceedings.
Students typically had to prepare a short presentation about a research paper
in these courses. This helps students gain skills in oral communication (partic-
ularly if presentations are critiqued in class) and helps the courses to explore a
variety of different visualization approaches and techniques in discussion. Such
a presentation also could be part of a larger practical exercise or project (see
Section 2.3).
Own Lecture Notes (Q2c): Interestingly, all the instructors used their own
course lecture notes (many as PowerPoint slides). 58% published their lecture
notes on the course web site without any restriction. We assume that the re-
maining instructors either offered their notes on the web with restricted access
or simply used the notes to lecture from.
Teaching Assistants (Q2d): More than half of all instructors (58%) had
no teaching assistant (TA) to support their course. In these cases, we can assume
that instructors also supervised practical exercises which were offered in almost
all courses. Only two courses were supported by two TAs, and the rest had
one TA. These figures may be due to small classes that were reported in most
cases, but might be indicative of a lack of support for teaching and learning in
Information Visualization, which is a very practical activity. Any such trend may
be of concern to InfoVis educators.
2.4 Examination
More than half (63%) of the surveyed courses had examinations of some
kind. The most common form was a written exam (37%), particularly in the
United States, but a notable portion (21%) used oral exams and 5% used both.
Oral examinations were used in Europe only, where there is a tradition of oral
examination for advanced level courses.
The survey also gave some insight into the different kinds of typical exam
questions (Q4a[ii]). There are many variants; a selection of the most
frequently asked questions includes:
– Explain technique X for the visualization of problem Y .
– Given is a concrete problem and a task to be fulfilled. Which technique would
you use?
– Compare technique X with technique Y .
– Explain the construction of a Treemap, Starplot, Circle Segments, ...
– What are the advantages/disadvantages of technique X?
– What is a preattentive feature?
– What are the principles of using color?
This list gives only a rough overview of the issues that are important for the
instructors. The course web pages do not provide additional detail; we found
no specific exam questions or model answers when investigating methods of
examination further. In general, we could observe that examiners focus not only
on technical approaches or methods, but also on students’ capacity for critical
reflection and their working knowledge of human visual perception.
3 Seminar Discussions
In the seminar session about teaching, Keith Andrews first presented the initial
survey result data. Next, workshop attendees discussed a variety of issues related
to teaching including “best practices” and ways to improve all our courses.
Study Materials: The discussion on the use of research papers for compulsory
reading identified many different strategies for doing so. Many attendees echoed
a frustration about the difficulty in getting students to actually read assigned
articles, so many of the strategies addressed this particular issue. Several
participants reported on their own experiences and ideas, a number of which are
listed below.
– Papers are assigned, and students must present them. This takes place in
parallel with the regular lectures. This procedure seems to be pedagogically
beneficial because students learn to read actual research work, to prepare
a short talk and to give a presentation in the classroom. A disadvantage is
that student presentations vary greatly in quality. Some colleagues reported
on students losing interest in and not learning from poor presentations—
they would prefer the instructor to do all the lecturing. It is unclear if this
is truly a disadvantage, however. Perhaps, the amortized learning benefit is
high enough and this would justify the approach.
– The instructor gives lectures in mandatory meetings. Students can pick
specific topics and lecture about a topic that the other students have not read
about.
– Papers are assigned, and there are written/oral questions on readings.
– Papers are assigned, and students must write a structured critical review
(about half a page) of them, i.e., a paragraph on the paper’s content, an
evaluative paragraph and an indication as to whether and why other students
might read the paper.
Using Other Media: Workshop attendees agreed that a large and comprehensive
public collection of InfoVis-related images and videos would be very
helpful for instructors. Videos of interaction scenarios that show the usability
and interaction capabilities of the tools would be especially beneficial. Images
also could help to illuminate the history of InfoVis and illustrate different vi-
sualization techniques. Unfortunately, gathering a collection of images or videos
in this way could cause copyright problems. This may be why many instructors
have their own image/video archives with private access. The HCC Digital Li-
brary [24] of Georgia Tech is an example of an effort to gather a large collection
of educational resources, but it is focused broadly on HCI, not just InfoVis.
Another way to obtain video material is to examine conference DVDs,
such as the annual VIS/InfoVis/VAST DVD. Many contributions provide an
additional video to clarify the usage and interaction techniques of their work.
Again, it may be beneficial to encourage attendees to develop video summaries
of their work, specifically for teaching and learning.
4 Personal Perspectives
In this section, three of the workshop attendees provide their own perspectives
on teaching InfoVis and InfoVis-related topics.
mation graphics and visual systems. More specifically, the learning outcomes for
the course include
– Students should gain an in-depth understanding of the field of Information
Visualization including key concepts and techniques.
– Students should be able to critique visualization designs and make sugges-
tions to improve them.
– Students should be able to design effective visualization solutions given new
problems and data domains.
– Students should learn about the spectrum of commercial system solutions
available in this area and how to choose one for a particular task or problem.
Perhaps the main challenge that I have faced in this course over the years is to
construct a coherent syllabus and flow of topics throughout the term. Information
Visualization is still a new area that is growing and maturing. Consequently,
it does not exhibit a well-understood and agreed-upon set of topics that flow
smoothly from one to the next. In my experience teaching the course, a number
of key ideas have risen to the surface and I make these the important components
of the course:
– Data foundations - A description and model of the different types of data
that are encountered and how this data is transformed and stored for easier
subsequent manipulation.
– Cognitive issues - A discussion of the user’s goals and tasks in using an
information visualization system. What cognitive benefits can visualization
provide?
– Visualization techniques - A description of the different visual represen-
tations and interaction techniques that have been invented.
– Interaction - A discussion of the different types and the many issues sur-
rounding interaction.
– Data types/structures - An introduction to specific types of data (e.g.,
time series, hierarchical, textual) and the visualization techniques that are
well suited to representing those data types.
– Data domains - An examination of different domains (e.g., software engi-
neering, social computing, finance and business) and the visualization tech-
niques that are helpful to people working in those areas.
– Evaluation - A dialog about the challenges of evaluation in information
visualization and a review of different evaluation techniques that have been
used in the area.
Some of these topics are fundamentally interwoven so the flow of concepts is
not clearly self-contained and independent. For instance, certain visualization
techniques are best used for specific data types (e.g., treemaps for hierarchical
data). In organizing the course content, I feel this tension and often struggle
with which topics to teach first. Nonetheless, my course uses this progression of
topics as its organizational framework.
The course is lecture-based but I try to engage the students in discussions
about the different concepts being studied. In some terms I have used Bob Spence’s
textbook augmented by selected papers, and in other terms I have used only
research papers. I have settled on having students read one or possibly two
research papers for each class. Typically, the paper is an important one for that
topic or it is a good overview of the issues involved. When I have assigned more
papers than this, I find that the students often do not adequately prepare and
read all the papers. To cover more recent research, I typically select two or
three recent articles on the topic of the day and I assign two or three students
who must recap and describe their particular paper’s key ideas to the class in
less than five minutes. I believe that experience giving presentations like this is
important and valuable to the students. All of my lecture slides can be found at
the course website and in 2007 I created in-studio video versions of each lecture.
These videos can be found at the website http://vadl.cc.gatech.edu.
I use a number of relatively small homework assignments in the course,
along with one larger homework and a group project. I will frequently employ
a midterm or final exam as well. The small homeworks often involve a visual-
ization design exercise (on paper) given a data set. Of course, such assignments
do not engage the interactive component of information visualization that is so
important, so they are fundamentally limited.
The larger homework assignment is a commercial tools critique. Students
are given five example datasets and asked to choose the two that they find
most interesting. Before using any systems, the students examine the datasets
and generate questions about them. Next, the students use a few information
visualization systems to explore the data and try to answer those questions.
I also ask the students to note any serendipitous findings that occur during
exploration. Finally, the students must write a report in which they critique the
different systems used, the visualization techniques each employs, and whether
the systems led to insights and discoveries. I have used systems such as Spotfire,
SeeIt, Advizor, Eureka (Table Lens), InfoZoom, InfoScope, and Grokker over
the years. I find this assignment to be extremely valuable to the students as it
allows them to gain hands-on experience with sophisticated systems and shows
them how visualizations can (or cannot) be helpful in analysis and exploration.
This particular assignment even led to an interesting research contribution
by my group. We studied the analytic queries generated by students over many
years of the course and clustered these inquiries into different low-level analytic
tasks that visualizations may assist. Our taxonomy of these tasks was presented
at the 2005 Symposium on InfoVis [1].
I also employ a group project in the course in which students design and build
a visualization system for a particular problem and data set. Teams of three or
four students work together for most of the term and find a client with a data
analysis problem or they simply choose a data set and envision the kinds of ana-
lytic queries that one would expect on it. The students explore different visualiza-
tion designs, then they choose one to implement. In the past, student teams have
often chosen to work on the contest datasets from the IEEE InfoVis or VAST
Conferences. In fact, student teams from my course have won these contests on
multiple occasions or have had competitive entries in the contests [20,54]. Group
projects have even led to full papers at the InfoVis Symposium as well [12].
One ongoing tension with the group project is simply when to begin the
assignment. By initiating the project early in the term, students have more
time to work on it and make better progress. However, at that early point,
students have engaged with very little course material and so their understanding of
information visualization concepts and ideas is not as rich. I have found that
simply the topic chosen for the project can have a profound impact on the results,
and better knowledge of the information visualization area leads students to
make better choices in project topics. This has led me to wait until the midterm
point to distribute the project in some semesters, but then the students have
much less time to work on it.
Overall, the Information Visualization course has been valuable to me in
many different ways. Perhaps most importantly, the process of preparing lec-
tures and course material has made me reflect on the topics that I would be
discussing and question “accepted” knowledge in the domain. I believe that this
has made me a better researcher and it has generated ideas for new projects and
investigations.
[Figure 3: course structure diagram. Labels in the diagram: Introduction,
Perception, Basics, Interaction & Techniques, Dynamic Queries, Zoom & Pan,
Focus & Context, Visual Structures, 1D, 2D, 3D, 4D,…, Trees, Graphs & Networks,
Special Data, Time-dependent data, Text & Docs, SoftVis, Applications, GeoVis,
WebVis, BioVis, Evaluation, Challenges.]
Fig. 3. Course structure of the WS06/07 InfoVis course given by Andreas Kerren at
TU Kaiserslautern.
commercial products. Of course, this was a challenge and sometimes a little bit
subjective because of missing quality metrics or missing evaluations of tools and
techniques.
The design of the syllabus was partly influenced by courses given at Georgia
Tech and TU Vienna in 2005 as well as by the textbooks of Spence [61], Ware [71],
a preliminary version of the textbook of Kerren et al. [35], and many research papers.
The course is divided into three parts, which are illustrated in Figure 3:
1. In the first part, I discuss basic knowledge that is important for the design
or analysis of InfoVis concepts. As a first step, I introduce the field itself,
motivate the need for it, and present several traditional and modern
examples, mainly from Tufte’s [67, 69] and Spence’s books. It is important to
differentiate between InfoVis and SciVis, even though that is not easy in
some cases. After this introduction, a larger discussion on perception and
cognitive issues is given. Here, I provide information about the perception
of colors, textures, etc., preattentive features, and Gestalt laws. This course
component is mainly based on the book of Ware, but also on many examples
and animations that can be found on the Web. The last lecture of this
first part describes basics, such as the InfoVis Reference Model (data tables,
visual mapping, interaction, etc.) or data types and dimensionality. These
issues are mostly based on the book of Card et al. [8].
2. The second part is the largest one of my course. Here, I first discuss the most
important interaction techniques, for example Dynamic Queries,
Zoom&Pan, and several Focus&Context related techniques. This component
largely follows the InfoVis Reference Model [8], i.e., I distinguish
interaction by means of data transformations, visual mapping, and view
transformations. I use actual research papers to exemplify the different
approaches. From a didactic point of view, this is a little bit tricky, because I
presume some knowledge of visual representations or structures to explain
my examples. I decided to discuss interaction first and the visual structures
for different data types afterwards. One advantage is that I can later
refer directly to the interaction techniques already discussed with less additional
explanation. My experience with students shows that they accept this order,
and that they have no problem in understanding the differences/correlations.
However, the ordering should be communicated in advance.
As described before, a discussion of the most important visual structures for
more basic data types follows the interaction component. Here, I introduce
visualization techniques for multivariate data, hierarchies and graphs mostly
on the basis of research papers as well as the books of Spence and Kerren et
al. Individual solutions for special kinds of data, e.g., time-series data,
text, or software, follow this component directly. The second part finishes
with visualization techniques for different data domains, such as BioVis,
WebVis, GeoVis, etc. Over the past years, I have varied this part of the lecture a
little bit depending on my current research interests or hot topics. Resources
for these lectures are current papers and articles, but also the second part
of the textbook [35].
Each course component of this second part is accompanied by short video or
tool demonstrations. From my perspective, this is essential, especially
for the different interaction techniques and their interplay with visual
structures. It is fun to keep an eye on the students during such demonstrations,
and as a result they are motivated to ask deeper questions. One
interesting and traditional example is the requirement to preserve the mental map
in dynamic graph drawing. Only with the help of a video or demo is it
possible to illustrate the difference between morphing and other techniques, such
as foresighted layout [14]. However, suitable videos or tools are not always
available.
3. My course concludes with 1-2 lectures on possible evaluation techniques and
the most important InfoVis challenges for the next years. Because time is
short at the end of the semester, I focus on specific aspects of these issues.
For example, the intended learning aim is to give students an overview
of basic evaluation techniques and, perhaps more important, an
impression of the difficulties of performing such an evaluation. The final
discussion of the most important challenges gives an idea of the current state
of the field and encourages students to take part in it, for example, by working
on a thesis in my research group. Good sources for these issues are the corresponding
chapters in Kerren et al. [35].
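The three kinds of interaction distinguished in part 2 (data transformations, visual mapping, and view transformations) can be made concrete with a small sketch. The Python below is an illustration only, not material from the course: the toy data table, the function names, the use of a range filter as the dynamic query, and the zoom-and-pan formula are all assumptions made for this example.

```python
from dataclasses import dataclass

# Hypothetical toy data table: each row is one data case.
CARS = [
    {"name": "A", "mpg": 18.0, "hp": 130},
    {"name": "B", "mpg": 32.0, "hp": 65},
    {"name": "C", "mpg": 24.0, "hp": 95},
    {"name": "D", "mpg": 15.0, "hp": 165},
]


@dataclass
class Mark:
    """A point mark with a label and a position."""
    label: str
    x: float
    y: float


def dynamic_query(rows, attr, low, high):
    """Data transformation: keep rows whose attribute lies in the slider range."""
    return [r for r in rows if low <= r[attr] <= high]


def visual_mapping(rows, x_attr, y_attr):
    """Visual mapping: turn each row into a point mark in data coordinates."""
    return [Mark(r["name"], float(r[x_attr]), float(r[y_attr])) for r in rows]


def view_transform(marks, center, zoom, size=100.0):
    """View transformation (zoom and pan): data coordinates to screen coordinates.

    The data point `center` is placed in the middle of a size x size viewport.
    """
    cx, cy = center
    half = size / 2.0
    return [Mark(m.label, half + (m.x - cx) * zoom, half + (m.y - cy) * zoom)
            for m in marks]


def render(rows, mpg_range, center, zoom):
    """Run the three stages in order and return the marks to draw."""
    filtered = dynamic_query(rows, "mpg", *mpg_range)
    marks = visual_mapping(filtered, "hp", "mpg")
    return view_transform(marks, center, zoom)


if __name__ == "__main__":
    for mark in render(CARS, (16.0, 30.0), center=(100.0, 20.0), zoom=0.5):
        print(mark)
```

Moving a range slider would change only the arguments of `dynamic_query`, while zooming or panning would change only `center` and `zoom`; keeping the stages separate is what allows each interaction to be discussed on its own.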
The assignments are another important part of the course. They are composed
of a brief presentation, a software implementation, and a short software
demonstration at the end of the semester. Each student or student group
(consisting of at most two students) chooses a specific research paper from a list
given on the course web page. I take care that the papers’ topics and the
presented approaches are not too complex. The final aim of the assignment is that
the main idea of a paper should be implemented in any programming language.
Not all features or interaction possibilities need to be implemented, but
a GUI is mandatory in order to give me and the other students the chance to
load another input file etc. Data sets depend on the paper topic, e.g., if the paper
presents a new treemap layout then the students can choose their own input,
such as the hierarchical file system on their own personal computer. All these
topics should be discussed in the first presentation in the middle of the course.
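As an illustration of the kind of core idea such an assignment might start from, the following is a minimal sketch of the classic slice-and-dice treemap layout applied to a small hierarchy. It is not taken from any student project: the nested-dictionary input (standing in for a directory tree with file sizes) and all names are hypothetical, and a real submission would add the GUI and file loading described above.

```python
def subtree_size(node):
    """Size of a node: its own number for a leaf, the sum of its children otherwise."""
    if isinstance(node, dict):
        return sum(subtree_size(child) for child in node.values())
    return node


def slice_and_dice(node, x, y, w, h, depth=0, name="root"):
    """Return (name, x, y, w, h) rectangles for all leaves of the hierarchy.

    The split direction alternates with depth: children are laid out side by
    side at even depths and stacked vertically at odd depths. Each child gets
    a share of the parent's rectangle proportional to its size.
    """
    if not isinstance(node, dict):
        return [(name, x, y, w, h)]
    total = subtree_size(node)
    rects = []
    offset = 0.0  # fraction of the parent already used by earlier siblings
    for child_name, child in node.items():
        share = subtree_size(child) / total if total else 0.0
        if depth % 2 == 0:
            rects += slice_and_dice(child, x + offset * w, y, w * share, h,
                                    depth + 1, child_name)
        else:
            rects += slice_and_dice(child, x, y + offset * h, w, h * share,
                                    depth + 1, child_name)
        offset += share
    return rects


# Hypothetical directory tree: nested dicts are folders, numbers are file sizes.
FILES = {
    "docs": {"a.txt": 10, "b.txt": 30},
    "src": {"main.py": 40},
    "readme.md": 20,
}

if __name__ == "__main__":
    for rect in slice_and_dice(FILES, 0.0, 0.0, 100.0, 100.0):
        print(rect)
```

In this layout the area of each leaf rectangle is proportional to its size, which is the property a student implementation could check first before moving on to the layout proposed in the chosen paper.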
In a first step, each student or group prepares a presentation (10 minutes plus
3-5 minutes of discussion) about the chosen paper followed by a working plan.
In this way, I can steer the processes, give hints, and prevent nasty surprises. At
the end of the course, all implementations are presented and discussed in class.
I have found that this division into two presentations (the paper talk and the
final demo) helps students to think about important concepts. Furthermore, they have
already learned the most important theoretical concepts during the course before
they start to program. My overall impression of this practical project is very
positive. At the beginning, the students often had doubts because it appears
time-consuming and complex, but they had a lot of fun over the course of the
semester. The results were mostly really great; for me it is important that they
learn to see the difficulties and to carefully reflect on the paper, not so much
the result itself. Often, however, the resulting programs were amazingly good.
My pedagogical concept, especially for the assignments, clearly follows moderate
constructivist learning approaches, as described in the following section written by
Jason Dykes and in some of my papers on learning concepts in the context of
Software Visualization techniques [33, 32, 59].
The course evaluation by my students led to very good results for this course.
They liked the way I structured the course, the motivating examples and videos,
and they had the subjective feeling that they had learned a lot of interesting
things. On the whole, it was not difficult to motivate students for InfoVis. It is a very
interesting field that is also suitable for solving practical problems. For that reason,
it was sometimes not so easy to explain why people cannot find more InfoVis in
standard software products. This leads to a problem that is discussed in the paper
The Value of Information Visualization [17] in this book.
I would like to discuss one further issue that is important to me: Finding a
good balance between giving a good overview of the field as compared to explain-
ing the details of specific visualization techniques is pretty difficult, especially in
the frame of 15 course lectures. Some students liked getting an overview
of InfoVis, but they also disliked that some topics were only briefly covered.
For instance, I used 1-2 lectures for the visualization of graphs. It is enough time
to explain the most important things, but not enough time to explain the dif-
ferent graph drawing techniques in detail. Thus, I abstracted in many cases,
but some students would have liked to learn more. The appropriate level of
detail/abstraction in teaching InfoVis is not obvious. In general, my solution for
this problem is to offer a seminar back-to-back after the InfoVis course, where
interested students can choose a specific topic and prepare a presentation on it.
This allows for deeper discussions. Additionally, such a seminar is a good starting
point for subsequent thesis work.
Typically, my courses terminate with an oral examination. According to our
survey results, this is consistent with the examination practice of many colleagues
from Europe (cf. Section 2.4).
Each of these sources provides useful criteria to structure critiques and against
which judgments can be made. Amar and Stasko’s framework [2] also offers
opportunities and Dagstuhl has drawn attention to the scope for using more
knowledge from InfoVis in developing critiquing criteria.
Learning by Doing:
been achieved. Amongst other competencies, outcomes require that students are
able to . . .
– explain the complex issues associated with GeoVisualization with clarity and
from an informed perspective by drawing upon recent academic research;
– design maps and data graphics that are effective, informative and consistent
and that exhibit graphical excellence and graphical integrity;
– use data graphics, maps and visualization tools to present and explore mul-
tifaceted data sets in a manner that is professional, informed and ethically
sound;
– evaluate data graphics, maps and visualization tools by drawing upon prin-
ciples and theories of design.
The approach seems to work nicely and addresses some of the issues discussed at
the Dagstuhl meeting. It may be useful for those wishing to help students learn
to critique and assess their developing skills. I have received favorable feedback
from students and internal and external evaluators.
It seems particularly appropriate for developing skills in graphicacy where
Tufte’s concept of ‘redesign’ is a key element. Portfolios or long-term developing
group projects that provide opportunities for feedback and critique can be very
beneficial here. It should be noted that portfolios are frequently used in the arts
where critiquing and redesign are key learning activities.
are used to support learning against broad aims and outcomes provide a way
forward. It is well worth using the URLs listed in the Dagstuhl survey to learn
from colleagues who use this approach (see Stasko, Munzner and Heer’s courses
for example). The kinds of repositories of examples and teaching materials dis-
cussed and suggested in Section 4.1 will help, as will a focus on generic methods
of teaching such as those that involve critique and active learning rather than
developing monolithic curricula that will age rapidly in response to new devel-
opments. Portfolio-based assessment that involves the focused combination of a
series of activities supports this flexible approach.
Adoption of the ideas discussed here would continue to move visualization
education away from core Computer Science. Doing so will continue the trend of
enabling more students to participate in visualization education and help address
the difficulties associated with multi-disciplinary domains - how do we focus si-
multaneously on the concepts listed in our tag cloud of the scope of Information
Visualization education (Figure 1)—computer science and algorithms, the sci-
ence of perception and cognitive studies and concepts derived from the arts such
as composition and design? The Dagstuhl survey has certainly helped inform
my approach to visualization education. Perhaps this discussion will help the
community when considering the nature of Information Visualization education
and how best it might be supported and developed. I’d certainly be delighted
to debate the ideas and their relevance further.
5 Conclusion
This paper describes the results of our teaching survey based on the information
given by the Dagstuhl attendees. It covers several aspects of offered InfoVis
courses that range from different kinds of study materials to practical exercises.
We have reproduced the discussion during the Dagstuhl Seminar and added
our own experiences. In this regard, we have found that teaching InfoVis is
challenging because it is a new and growing field. There are many open
questions regarding the syllabus, a consistent theory, or the abstraction level of
single topics. In consequence, it is also a great subject for teachers, not only for
students: we are convinced that teaching InfoVis also leads to a better reflection
on the topics and to new ideas which can induce new projects. Finally, we hope
that this paper can serve as an interesting and helpful source for current and
future InfoVis teachers.
References
1. Amar, R., Eagan, J., Stasko, J.: Low-level components of analytic activity in
information visualization. In: Proceedings of the 2005 IEEE Symposium on In-
formation Visualization - INFOVIS ’05, October 2005, pp. 111–117 (2005)
2. Amar, R., Stasko, J.: A knowledge task-based framework for design and evaluation
of information visualizations. In: IEEE Symposium on Information Visualization
(InfoVis), Austin, TX, pp. 143–150. IEEE Computer Society Press, Los Alamitos
(2004)
3. Amar, R.A., Stasko, J.T.: Knowledge precepts for design and evaluation of infor-
mation visualizations. IEEE Transactions on Visualization and Computer Graph-
ics 11(4), 432–442 (2005)
4. Andrews, K.: Course: Information Visualisation (2007),
http://courses.iicm.tugraz.at/ivis
5. BENELUX Bologna Secretariat: About the Bologna Process (2008),
http://www.ond.vlaanderen.be/hogeronderwijs/bologna/about/
6. Bertin, J.: Semiology of Graphics: Diagrams, Networks, Maps. Translation of
Sémiologie Graphique. University of Wisconsin Press, Madison (1983)
7. Brewer, C.A.: Designing Better Maps: A Guide for GIS Users. ESRI Press, Red-
lands (2005)
8. Card, S., Mackinlay, J., Shneiderman, B. (eds.): Readings in Information Visual-
ization: Using Vision to Think. Morgan Kaufmann, San Francisco (1999)
9. Chen, C.: Information Visualization: Beyond the Horizon. Springer, Heidelberg
(2006)
10. Cleveland, W.S., McGill, R.: Graphical perception: Theory, experimentation and
application to the development of graphical methods. Journal of the American
Statistical Association 79, 531–554 (1984)
11. Crampton, J.: Interactivity types in geographic visualization. Cartography and
Geographic Information Science 29(2), 85–98 (2002)
12. Csallner, C., Handte, M., Lehmann, O., Stasko, J.: FundExplorer: Supporting the
diversification of mutual fund portfolios using Context Treemaps. In: Proceedings
of the 2003 IEEE Symposium on Information Visualization - INFOVIS 2003,
October 2003, pp. 203–208 (2003)
13. Dagstuhl: Seminar 07221 Information Visualization – Human-Centered Issues and
Perspectives (2007),
http://www.dagstuhl.de/07221
14. Diehl, S., Görg, C., Kerren, A.: Preserving the Mental Map using Foresighted
Layout. In: Proceedings of Joint Eurographics – IEEE TCVG Symposium on Vi-
sualization (VisSym ’01), Eurographics, pp. 175–184. Springer, Heidelberg (2001)
15. Dykes, J., MacEachren, A.M., Kraak, M.-J.: Exploring Geovisualization. Perga-
mon Press, Oxford (2005)
16. Fayyad, U., Grinstein, G., Wierse, A.: Information Visualization in Data Mining
and Knowledge Discovery. Morgan Kaufmann, San Francisco (2001)
17. Fekete, J.-D., van Wijk, J.J., Stasko, J.T., North, C.: The Value of Information
Visualization. In: Kerren, A., Stasko, J.T., Fekete, J.-D., North, C.J. (eds.) Infor-
mation Visualization. LNCS, vol. 4950, Springer, Heidelberg (2008)
18. Few, S.: Show Me The Numbers. Analytics Press (2004)
19. Fosnot, C.T., Perry, R.S.: Constructivism: A psychological theory of learning. In:
Fosnot, C.T. (ed.) Constructivism: Theory, Perspective and Practice, 2nd edn.,
pp. 8–38. Teacher’s College Press, New York (2005)
20. Grinstein, G., O’Connell, T., Laskowski, S., Plaisant, C., Scholtz, J., Whiting,
M.: VAST 2006 contest - a tale of Alderwood. In: Proceedings of the 2006 IEEE
Symposium on Visual Analytics Science and Technology - VAST 2006, October
2006, pp. 215–216 (2006)
21. Harrower, M.A.: Tips for designing effective animated maps. Cartographic Per-
spectives 44, 63–65 (2003)
22. Harrower, M.A., Brewer, C.A.: Colorbrewer.org: An online tool for selecting color
schemes for maps. The Cartographic Journal 40(1), 27–37 (2003)
23. Hauser, H.: Course: Information Visualization (2007),
http://www.cg.tuwien.ac.at/courses/InfoVis/
24. HCC: HCC Education Digital Library (2007), http://hcc.cc.gatech.edu/
25. Heer, J.: Course: Visualization (2006),
http://vis.berkeley.edu/courses/cs294-10-sp06/
26. ILOG Visualization Suite: ILOG, Inc. (2007),
http://www.ilog.com/products/visualization/index.cfm
27. InfoZoom: humanIT Software GmbH (2007), http://www.infozoom.com/enu/
28. Jankun-Kelly, T.: Course: Information Visualization (2006),
http://www.cse.msstate.edu/~tjk/teaching/cse8990/
29. Keim, D.: Course: Information Visualization (2007),
http://infovis.uni-konstanz.de/index.php?region=teach&event=ss07&course=infovis
30. Keller, P.R., Keller, M.M.: Visual Cues – Practical Data Visualization. IEEE
Computer Society Press, Los Alamitos (1993)
31. Kent, M., Gilbertson, D.D., Hunt, C.O.: Fieldwork in geography teaching: a crit-
ical review of the literature and approaches. Journal of Geography in Higher
Education 21(3), 313–332 (1997)
32. Kerren, A.: Generation as Method for Explorative Learning in Computer Sci-
ence Education. In: Proceedings of the 9th Annual Conference on Innovation and
Technology in Computer Science Education (ITiCSE ’04), Leeds, UK, pp. 77–81.
ACM Press, New York (2004)
33. Kerren, A.: Learning by Generation in Computer Science Education. Journal of
Computer Science and Technology (JCS&T) 4(2), 84–90 (2004)
34. Kerren, A.: Course: Information Visualization (2007),
http://w3.msi.vxu.se/~kerren/courses/lecture/ws06/infovis/
35. Kerren, A., Ebert, A., Meyer, J. (eds.): Human-Centered Visualization Environ-
ments. LNCS, vol. 4417. Springer, Heidelberg (2007)
36. Kerren, A., Stasko, J.T., Fekete, J.-D., North, C.: Workshop Report: Information
Visualization – Human-centered Issues in Visual Representation, Interaction, and
Evaluation. Information Visualization 6(3), 189–196 (2007)
37. Kerren, A., Stasko, J.T., Fekete, J.-D., North, C.J. (eds.): Information Visualiza-
tion. LNCS, vol. 4950. Springer, Heidelberg (2008)
38. KidPad: University of Maryland (2007), http://www.kidpad.org
39. Kosara, R.: Course: Visual Communication in Computer Graphics and Art (2007),
http://eagereyes.org/VisComm
40. Kosara, R.: Visualization Criticism: One Building Block for a Theory of
Visualization. In: Kerren, A., Stasko, J.T., Fekete, J.-D., North, C. (eds.) Abstracts
Collection – Information Visualization - Human-Centered Issues in Visual
Representation, Interaction, and Evaluation. Dagstuhl Seminar Proceedings, Dagstuhl,
Germany, vol. 07221, Internationales Begegnungs- und Forschungszentrum für
Informatik (IBFI) (2007), http://drops.dagstuhl.de/opus/volltexte/2007/1136
41. Krygier, J.B., Reeves, C., DiBiase, D.W., Cupp, J.: Design, implementation and
evaluation of multimedia resources for geography and earth science education.
Journal of Geography in Higher Education 21(1), 17–39 (1997)
42. Lichtenbelt, B., Crane, R., Naqvi, S.: Introduction to Volume Rendering. Prentice-
Hall, Englewood Cliffs (1998)
43. Ma, K.-L.: Course: Information Visualization (2006),
http://www.cs.ucdavis.edu/~ma/ECS272/
Jeffrey Heer1, Frank van Ham2, Sheelagh Carpendale3, Chris Weaver4, and
Petra Isenberg3
1 Electrical Engineering and Computer Sciences,
University of California, Berkeley,
360 Hearst Memorial Mining Building, Berkeley, CA 94720-1776, USA,
[email protected]
2 IBM Research, Visual Communications Lab,
1 Rogers Street, Cambridge, MA 02142, USA,
[email protected]
3 Department of Computer Science, University of Calgary,
2500 University Dr. NW, Calgary, AB, Canada T2N 1N4,
{sheelagh, petra.isenberg}@ucalgary.ca
4 GeoVISTA Center and the North-East Visualization and Analytics Center,
Department of Geography, Penn State University,
302 Walker Building, University Park, PA 16802, USA,
[email protected]
1 Introduction
A. Kerren et al. (Eds.): Information Visualization, LNCS 4950, pp. 92–133, 2008.
© Springer-Verlag Berlin Heidelberg 2008
Creation and Collaboration 93
2.1 Data
Interplay of Data Types: Note that the distinction between types of data
is not always clear cut and many data sets could fall into different categories
depending on their use. For example, a community data set on World of Warcraft
users and their interactions might be considered a scientific data set by social
scientists, while the personal data of celebrities might have a broad general
appeal. Visualizations of all these types of data can be shared, albeit for different
purposes. Personal data might be shared with other users as a means of personal
expression. Community data is often shared to spark broad discussion, while
scientific data often needs to be shared because it is too complex for one person to
analyze on their own or because it requires multiple specialized skills to analyze.
The recent trend toward visual analytics [91] is driven by the increasing need
to support open-ended management and exploration of large, loosely-connected,
and often unstructured information sources as well as the smaller, isolated, struc-
tured data sets typical of information visualization applications. Information col-
lection often involves assembling “shoeboxes” of loosely related nuggets and data
sets [107]. Visual analysis of information occurs by following chains of evidence,
evaluating formal hypotheses [27], testing competing explanations [86], or telling
stories [37] using visual metaphors to convey relationships and dynamics. These
activities are particularly challenging in intelligence analysis, emergency man-
agement, epidemiology, and other critical areas that involve high-dimensional
abstract information [83] and large geospatial datastores [36]. However, the
heterogeneous and idiosyncratic nature of the data sets and analysis activities in
these endeavors is similar to that in everyday domains, making it likely that
the outcomes of visual analytics research will translate readily into visualization
approaches that will help to engage broad audiences.
2.2 Skills
Novice Users: By novice users we mean users who have experience operating
a computer, but no experience with programming in general, let alone program-
ming visualization techniques. The vast majority of novice visualization users
act as consumers: they will interact with the visualization within the possibil-
ities offered but will rarely extend existing functionality to suit their analysis
needs. If we want these users to be able to produce visualizations, we have to
take care to make this process as easy as possible. Some points of consideration
when designing visualizations for novice users are:
Data Input: We cannot expect a novice user to write their own data parser,
write database queries that export data to a particular format or understand
the file formats for more complex data types. Most novice users seem to take to
using spreadsheet programs such as Microsoft Excel to store and analyze their
data. One useful input format, then, is a simple tab-delimited input file, as this
format is both human readable and can be directly copied from the spreadsheet
editor.
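As a minimal sketch of how a tool might accept such input, the following Python fragment reads tab-delimited text as it would arrive when pasted from a spreadsheet. The function name parse_tab_delimited and the sample values are illustrative assumptions, not part of any tool discussed in this chapter.

```python
import csv
import io

def parse_tab_delimited(text):
    """Parse spreadsheet-style text: the first row is the header, the rest are records."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = []
    for record in reader:
        row = {}
        for column, value in record.items():
            # Convert numeric-looking cells to float; keep everything else as text.
            try:
                row[column] = float(value)
            except (TypeError, ValueError):
                row[column] = value
        rows.append(row)
    return rows

# Hypothetical sample, as it might be copied from a spreadsheet editor.
sample = "item\tcount\napples\t3\npears\t5\n"
table = parse_tab_delimited(sample)
```

The point of the sketch is that the user never sees a parser: the header row names the columns, and simple type guessing spares the user from declaring a schema.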
Useful Defaults: Novice users likely will not spend time tuning an ugly-looking
visualization to fit their needs. It is therefore important to provide a set of
sensible defaults for data and view parameters (such as scales, colors, item sizes
and viewpoints) to help constrain the parameter space that users have to explore.
Multiple combinations of these parameters can be offered by providing a preset
list. As an added bonus, a good set of presets can show users what is possible
and educate them on what is sensible.
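One way to picture defaults and presets is as a small table of named parameter combinations that the user can pick from and, if desired, adjust. The sketch below is only an illustration under our own assumptions: the parameter names, preset names, and values are hypothetical.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class ViewParams:
    # Hypothetical view parameters, each with a sensible default.
    scale: str = "linear"
    color_scheme: str = "categorical"
    item_size: int = 6
    viewpoint: str = "overview"

# A preset list offers a few vetted combinations instead of the full parameter space.
PRESETS = {
    "default": ViewParams(),
    "skewed data": ViewParams(scale="log"),
    "dense data": ViewParams(item_size=2),
}

def view_params(preset="default", **overrides):
    """Start from a named preset and apply any explicit user overrides."""
    return replace(PRESETS[preset], **overrides)
```

A novice can call view_params() and get a usable view, while view_params("dense data", color_scheme="sequential") changes only what the user explicitly asks for.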
Savvy Users: By savvy users we mean people who have experience performing
relatively sophisticated data organization and manipulation, using a combination
of manual processing and limited amounts of programming or scripting. Because
savvy users are a small but non-trivial part of the population of visualization
consumers, they are a critical bridge between experts and novices. As such, savvy
visualization users may act variously as:
– experts who train or guide novice users in the use of particular visualizations
by clarifying exploratory and analytic functionality in terms of interface
appearance and behavior,
– designers who plan, construct, debug, test, and deploy new visualizations
for ongoing evaluation and routine operation by novice users,
– end-users who can bring more extensive experience to bear when using ex-
isting visualizations to analyze data from their own knowledge domains, to
browse data with which they are less familiar, and to share their results with
others, and
– explorers (or user-designers) who combine the roles of designer and end-user
by extending and redesigning visualizations on the fly during open-ended
exploration of their data.
Expert Users: By expert users we mean people who have extensive experience
with interactive graphical software development and the theory and applica-
tion of data modeling, data processing, and visual data representation. As such,
visualization experts may act both as:
– researchers who invent, specify, and evaluate methods for accessing, query-
ing, rendering, and interacting with data, often with an eye toward extending
and enhancing the functionality of existing visualization systems and tools,
and
2.3 Goals
One of the traditional rationales for information visualization is that the human
visual system has high input bandwidth and has evolved as an excellent tool
for spotting patterns and outliers in our surroundings. If we then map large
amounts of data into visual form, we can use these innate human abilities to
explore the data to find patterns that would have been exceedingly difficult to
identify through purely automated techniques. A current prominent example is
bioinformatics research that visually explores gigabytes of gene experiments to
investigate the mechanisms that drive a particular disease. Such “explorative”
use-cases have dominated most of the research in visualization over the past
two decades. Explorative use can either be open-ended, where the user wants
to browse their data without having a predefined question in mind, or analyti-
cally driven, in which the user has a particular question in mind and uses the
visualization to answer it. Often these two types of exploration will be
intertwined: a user will explore a previously unknown data set without a par-
ticular question in mind, stumble on an interesting data point and then use the
analytic features in the visualization to either answer the question or redirect
their open-ended exploration.
connect data to views. Design typically occurs directly within the interface that
contains data views, and design changes often take effect immediately without the need for
a separate compilation or build stage. This live, amodal approach to interface
design allows users to switch rapidly between building and browsing tasks during
exploration and analysis. The result is a form of exploration that is free form and
open-ended, particularly during initial inspection of newly encountered data sets.
IVEE [2], DEVise [58], DataSplash [71], Snap-Together Visualization [69],
GeoVISTA Studio [88], Improvise [102], and Tableau/Show Me [60] are a few
of many well-known visualization environments that support open-ended data
exploration to various degrees. Such environments typically consist of a graphical
user interface on top of a library of visualization components which may or
may not be exposed as a visualization programming toolkit in its own right.
This combination of user interface and underlying library can enable open-ended
exploration in a very broad sense if it bridges the activities of visualization users
performing various roles with different levels of expertise, whether as individuals
or in collaborative groups.
To connect developers and designers, a key advantage for open-ended explo-
ration is an extensible library that provides an application programming inter-
face (API) for adding new software modules for various visualization components
(including data access, queries and other data transformation algorithms, views,
and visual data encodings). In particular, the most useful APIs support the def-
inition of new data transformation operators—including appropriate input and
output data object types—that give designers the ability to express rich relation-
ships between data, queries, and views. This requirement is essential for applying
newly discovered visualization techniques to emerging sources and forms of in-
formation, without needing to constantly architect and implement new toolkits
(and retrain visualization designers in their use).
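To make the idea of an extensible operator API concrete, here is a minimal sketch in Python. It is not the API of any toolkit named above; the Operator and OperatorRegistry names, and the two example operators, are assumptions chosen for illustration. Each operator declares its input and output types so that a chain of operators can be checked as it runs.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

@dataclass(frozen=True)
class Operator:
    name: str
    input_type: type
    output_type: type
    apply: Callable[[Any], Any]

class OperatorRegistry:
    """Hypothetical extension point: developers register operators, designers chain them."""

    def __init__(self):
        self._operators: Dict[str, Operator] = {}

    def register(self, op):
        self._operators[op.name] = op

    def run(self, names, data):
        for name in names:
            op = self._operators[name]
            # Check the declared input type before applying the operator.
            if not isinstance(data, op.input_type):
                raise TypeError(f"{name} expects {op.input_type.__name__}, "
                                f"got {type(data).__name__}")
            data = op.apply(data)
            # Check that the operator honored its declared output type.
            if not isinstance(data, op.output_type):
                raise TypeError(f"{name} must return {op.output_type.__name__}")
        return data

registry = OperatorRegistry()
registry.register(Operator("positive_only", list, list,
                           lambda xs: [x for x in xs if x > 0]))
registry.register(Operator("total", list, float, lambda xs: float(sum(xs))))
result = registry.run(["positive_only", "total"], [3, -1, 4])
```

Declaring types on each operator is what lets a design environment reject an ill-formed chain (for example, running "total" before "positive_only") instead of failing somewhere inside a view.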
To connect designers with users, the user interface must support the abil-
ity to access data sets (and metadata) from local or remote sources in various
formats, create and position views on the screen, specify how navigation and
selection affect views, specify queries on data, parameterize queries in terms of
interaction, and attach data sets and queries to views. In particular, designers
should be able to specify the appearance and behavior of their visualizations
directly within the user interface, without resorting to programming or other
workarounds for interface limitations. To do otherwise would effectively require
that designers be trained as developers.
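The following sketch suggests what such a user interface might maintain behind the scenes when a designer attaches a data set and a query to a view and parameterizes the query by a selection. The designer would do this through direct manipulation, not by writing this code; the View class, the by_selected_region query, and the sample records are all hypothetical.

```python
class View:
    """A view shows the result of a query over a data set, given interaction parameters."""

    def __init__(self, name, data, query):
        self.name = name
        self.data = data
        self.query = query      # function(data, params) -> rows to display
        self.params = {}
        self.rows = query(data, self.params)

    def update(self, **params):
        # Called when an interaction (e.g., a selection in another view) changes.
        self.params.update(params)
        self.rows = self.query(self.data, self.params)

def by_selected_region(data, params):
    """Query parameterized by the currently selected region; no selection shows all rows."""
    region = params.get("region")
    return [row for row in data if region is None or row["region"] == region]

# Hypothetical records for illustration only.
sales = [
    {"region": "north", "amount": 10},
    {"region": "south", "amount": 7},
    {"region": "north", "amount": 5},
]
detail = View("detail", sales, by_selected_region)
detail.update(region="north")   # as if the user had clicked "north" in an overview
```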
User interfaces that truly support open-ended exploration would exceed the
requirements of basic visualization design and operation by: supporting live
building of complete browser interfaces, including immediate designing, debug-
ging, and testing of intended functionality; facilitating collaboration between
end-users and designers to turn analytical questions into structural changes
(through remote, nearby, or side-by-side efforts to communicate and effect rapid
visualization prototyping and polishing); and enabling rapid switching between
building and browsing to perform more extensive exploratory visualization by
modifying visualization views and queries on the fly. In particular, it is highly de-
sirable for explorers to be able to see all raw data quickly to make decisions about
how to visualize it, rapidly create and lay out views, rapidly attach data and
queries to views, rapidly modify queries, store, copy, and reuse views, copy-and-
paste/drag-and-drop visualization components, and use macros to build com-
mon multiple view constructions. Many of these capabilities are also desirable
for non-exploring designers who prepare visualizations for domain analysts.
In all of this, availability of common and familiar interface functionality is
essential to broad adoption. The user interface should run in the user’s normal
working environment, require no programming or design activities, and provide
a way to disseminate analytical results. For communication and collaboration,
it is highly desirable for the user interface to run easily on any platform, allow
visualizations to be opened and saved as normal documents for sharing between
users, and provide the ability to bookmark or screen capture visualizations in
different graphical states.
Fig. 1. A few of the many available information visualization tools, roughly mapped
according to targeted end-user and targeted goal. Light lines connect toolkits and
development environments to examples of visualizations created in them. Dark lines
roughly capture similar ranges of user/goal targets for relevant tools.
2.4 Tools
Note that most real-world uses of information visualization will form a combina-
tion of the use-cases and roles described in the preceding sections. A researcher
might program a new visualization technique to explore his complex data and
then present findings to a manager by sending a screenshot. In this case the
researcher takes on the roles of both consumer and developer and performs both
exploration and communication. Most current information visualization tools
and toolkits are geared towards one particular user skill and goal, although a
recent trend towards more flexible tools can be observed. To illustrate the rough
classification outlined in the previous subsections, in this section we give an
indicative sample of an end-user visualization tool for each user skill and goal
combination. Fig. 1 illustrates a number of available visualization tools
categorized according to the skill level of the target user base and the degree to
which the tools support analytic and communicative tasks. Systems that span
a range of tasks or skills are presented as line segments indicating the range of
users and usage.
2.5 Directions
Current end user visualization tools are becoming more and more flexible in
the types of scenarios and goals they can handle. Tools like Many Eyes allow
novice users to create advanced visualizations with very little effort and also
support communicative use-cases by allowing flexible sharing of visualization
states. Tools like Improvise allow tight integration of many different types of
visualizations, but require some programming skills on the side of the end-user,
an expectation that is not always reasonable of domain experts dealing with the
visualization. Tableau allows end users to set up and pivot different types of ba-
sic visualizations in a fairly intuitive manner and the recent addition of Tableau
Server allows sharing of and commenting on these visualizations in an online envi-
ronment, making it also suitable for communicative purposes. Although flexible,
the only visualization types allowed are 2-dimensional small-multiple displays,
which limits the visualization and analysis types to basic business graphics.
In our opinion, the ultimate goal of letting novice users flexibly specify their
visualization needs and couple different types of views together has not been fully
realized yet. We expect that users’ visual literacy will increase as information
visualization becomes more mainstream, and that users will start demanding advanced
visualizations beyond the trusted bar chart. Integrating advanced visualizations
in a flexible, collaborative, and easy-to-understand framework for open-ended
104 J. Heer et al.
Given the choice, it is common and natural for people to work together. This is
not a new phenomenon. Small groups of people gather for all kinds of reasons,
including many that are work related, such as to get a job done faster, to share
expertise for a complex task, and to benefit from the different insights of different
people. Also, given the rapid growth in the size and complexity of
datasets, it is not surprising that it is increasingly unrealistic for an individual
to analyze an entire dataset. Instead, the analysis of these information-rich
datasets, and the informed decisions based on them, are
often best accomplished by a team [91]. For instance, imagine a team of medical
practitioners examining a patient’s medical record to plan an operation, a team
of biologists looking at test results to find causes for a disease, or a team of
businessmen planning next year’s budgets based on a large financial dataset. All
of these situations involve a group of people making use of visual information to
proceed with their work. Research towards supporting these team-based infor-
mation processes will expand the situations in which information visualization
can be used and is part of considering how to best support people in their normal
everyday information work practices.
This section draws from a wide variety of literature to shed light on questions
and issues that need to be considered during the development of co-located
collaborative information visualizations. We do not consider this discussion to be
exhaustive; rather it is our intention that the discussion will form the beginning of
design guidelines and considerations that will be modified and extended through
future research in collaborative information visualization.
Research in information visualization draws from the intellectual history of
several traditions, including computer graphics, human-computer interaction,
cognitive psychology, semiotics, graphic design, statistical graphics, cartogra-
phy, and art [64]. The synthesis of relevant ideas from these fields is critical
for the design and evaluation of information visualization in general and it is
only sensible to think that fields concerned with collaborative work also add
valuable information to our understanding of requirements for collaborative
information visualization systems. Our sources include work in co-located
collaboration in computer-supported cooperative work [39,53,75,73,76,77,80,81,82,90],
information visualization [85,105,109,110,111], and empirical work investigating
collaborative visualization use [61,68,72].
The organization of this section is as follows. A brief overview of existing
research that relates to co-located collaborative information analysis is given in
section 3.1. Next, section 3.2 discusses the impact of recent advances in hardware
configurations and section 3.3 focuses on more general human-computer interaction
issues important for the support of the co-located collaborative process,
sessions. Their study also noted that participants showed a strong tendency for
independent work, if the option was available. Isenberg et al. studied co-located
collaborative data analysis scenarios and posit an eight-process framework that
relates to previous work on the Sensemaking Cycle [17] and the two studies by
Mark and Kobsa [61] and Park et al. [72]. However, a common temporal order
of analysis processes as posited by some previous work did not emerge.
Input: In the common desktop setup, input is provided for one person through
one keyboard and one mouse. To support collaboration, ideally, each person would
have at least one means of input. In addition, it would be helpful if this input were
identifiable, making it possible to personalize system responses. If a collaborative
system supports multi-user input, the access to a shared visualization and data set
has to be coordinated. Also, synchronous interactions on a single representation
may require the design and implementation of new types of multi-focus visualiza-
tions. Ryall et al. [77] have examined the problem of personalization of parameter
changes for widget design, allowing widgets to be dynamically adapted for indi-
viduals within a group. Similar ideas could be implemented for personalization of
information visualizations during collaborative work.
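A minimal sketch of identifiable input and coordinated access, under assumptions of our own: every input event carries the identity of the person who produced it, and an item of the shared visualization can be held by only one collaborator at a time. The InputEvent and SharedSelection names and the select/release policy are hypothetical, not taken from the systems cited above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class InputEvent:
    user: str     # identifiable input: who produced the event
    item: str     # which item of the shared visualization it targets
    action: str   # "select" or "release"

class SharedSelection:
    """Tracks which collaborator currently holds each item of a shared view."""

    def __init__(self):
        self.owner = {}   # item -> user currently holding it

    def handle(self, event):
        """Apply the event; return True if it was accepted, False if it was refused."""
        holder = self.owner.get(event.item)
        if event.action == "select":
            if holder is None or holder == event.user:
                self.owner[event.item] = event.user
                return True
            return False   # someone else is working on this item
        if event.action == "release" and holder == event.user:
            del self.owner[event.item]
            return True
        return False
```

Because each event is attributed to a person, the same mechanism could also drive personalized responses, for instance by drawing each collaborator's selections in a distinct color.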
Resolution: Resolution is an issue both for the output (the display) and for the
input. The display resolution has a great influence on the legibility of information
visualizations. Large display technology currently often suffers from relatively
low display resolution, so visualizations might have to be redesigned so that the
readability of text, color, and size is not affected by display resolution. Also, large
interactive displays are often operated using fingers or pens which have a rather
low input resolution. Since information visualizations often display large data
sets with many relatively small items, the question of how to select these small
items using low input resolution techniques becomes an additional challenge that
needs special attention [48].
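One simple way to cope with coarse input, shown here purely as an illustration and not as the technique of [48], is to treat a touch as a small area and pick the nearest item within it. The function name pick_item, the coordinate convention, and the radius are assumptions.

```python
import math

def pick_item(touch, items, radius):
    """Return the name of the item nearest to the touch point within radius, else None.

    touch is an (x, y) pair; items maps item names to (x, y) centers, all in pixels.
    """
    best, best_dist = None, radius
    for name, (x, y) in items.items():
        dist = math.hypot(x - touch[0], y - touch[1])
        if dist <= best_dist:
            best, best_dist = name, dist
    return best
```

With items {"a": (10, 10), "b": (14, 10)} and a finger landing at (13, 11), a radius of 5 pixels resolves the touch to "b" even though the finger did not hit either item exactly.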
Presentation Issues: Presentation has been defined as ‘something set forth for
the attention of the mind’ [63] and as ‘the way in which suitably encoded data
is laid out within available display space and time’ [87]. From these definitions
it is clear that changing display configurations, as is usually the case when supporting
co-located collaboration, will impact the types of presentation techniques that
are possible and/or appropriate. Common presentation techniques include pan
& zoom, focus & context, overview & detail, filtering, scrolling, clutter reduc-
tion, etc.
A common theme in information visualization is the development of presentation
techniques that overcome the problem of limited display space (e.g.,
[4,20,49]). In collaborative scenarios, information visualizations might have to
cover larger areas than in a single-user scenario, as group members might prefer
to work at a socially acceptable distance from each other. The display space
might also have to be big enough to display several copies of one representation
if team members want to work in parallel.
If groups are working over a shared presentation of data, presentations might
have to be adapted to allow collaborators to drill down and explore different parts
of the data in parallel. Collaborative information visualizations will likely have
to support multiple simultaneous state changes. This poses additional problems
of information context. Team members might want to explore different parts of
a dataset and place different foci if the dataset is large and parts of the display
have to be filtered out. Information presentations might have to be changed to
allow for multi-focus exploration that does not interfere with the needs of more
than one collaborator. DOI Trees [18] and hyperbolic trees [55], for example, are
tree visualizations in which only one focus on the visualization is
currently possible. ArcTrees [67] and TreeJuxtaposer [65] allow
for multiple foci over one tree display, but these were not designed to take the
information needs of multiple collaborators into account and might still occlude
valuable information.
Interaction Issues: Most interaction issues deal with interaction with repre-
sentations, presentations and views, thus discussing them here would overlap
with points raised under these headings. However, there are some more general
interaction issues. When people are co-located, they are in the situation in which
people naturally collaborate, the situation in which people have collaborated for
centuries. When face-to-face, people naturally know how to collaborate and are
so used to picking up subtle cues from each other that they may do this without
even being conscious of the precise details of the underlying coordination and
communication practices that are in play. As the developers of co-located collab-
orative information visualizations, our task is to facilitate information access and
exploration without interfering with the social protocols that make collaboration
effective. However, to do this we have to understand what these social collabo-
ration practices are and specifically if there are any differences when people are
collaborating using visual information. Some factors are:
Interactive Response Rates: Information visualization has always been demanding,
both because it deals with extremely large and complex data sets and
because rendering these complex representations can have considerable graphics
requirements. Adding larger screens, more screens, higher pixel counts, multiple
simultaneous inputs, and possibly multiple representations will increase the
computational load, making it harder to maintain good
interactive rates. Thus, implementations of collaborative information visualizations
will have to be carefully designed for efficiency. While continued hardware
advances will mitigate this to some extent, it will be important to address issues
in both efficient data processing and fast graphic rendering.
Interaction History: A history task has been defined as a task that involves
keeping a history of actions to support undo, replay, and progressive refinement
[85]. In a collaborative scenario keeping such a history can have other benefits.
If a visualization tracks and reveals which data items have been visited and by
whom, this information could be valuable for collaborators, helping them understand
their team members’ actions, find unexplored parts of a visualization, or
confirm discoveries made by others. A visualized interaction history may support
collaboration by promoting mutual understanding of team members’ involvement
in the task [24] and may help keep group members aware of each other’s actions
as people shift from individual to shared views of the data [39]. An exploration
history can be useful in such activities as validating work done, in explaining a
discovery process to other team members, and in supporting discussions about
data explorations.
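The history described above can be sketched as a shared log of who visited which data item. The class below is a minimal illustration under our own assumptions (the InteractionHistory name and its methods are hypothetical); it supports per-user undo and answers the two collaborative questions raised in the text: who has looked at an item, and what has nobody looked at yet.

```python
class InteractionHistory:
    """Shared log of (user, item) visits over a known set of data items."""

    def __init__(self, all_items):
        self.all_items = set(all_items)
        self.log = []   # (user, item) pairs in the order they occurred

    def visit(self, user, item):
        self.log.append((user, item))

    def undo(self, user):
        """Remove and return that user's most recent visit, or None if there is none."""
        for i in range(len(self.log) - 1, -1, -1):
            if self.log[i][0] == user:
                return self.log.pop(i)
        return None

    def visitors(self, item):
        """Who has visited this item? Useful for confirming others' discoveries."""
        return {user for user, visited in self.log if visited == item}

    def unexplored(self):
        """Items nobody has visited yet."""
        return self.all_items - {visited for _, visited in self.log}
```

Keeping the log ordered also allows replay, so a discovery process can be stepped through when explaining it to other team members.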
Information Access: Exactly how to handle information access is an important
collaboration issue. The main themes in the research discussion thus far have
been motivated by social protocol issues and data centric concerns. While these
have not been seen as mutually exclusive, they are quite distinct ideas. The social
protocol theme has made considerable use of observational studies to better
understand exactly what the social protocols are and how they impact collaboration.
These understandings are then used as a basis for software design.
The data centric approach discusses questions such as: Who has (or does not have)
rights to which parts of the data? Who can change the scale, zoom, or rotation
settings for a shared view of the data? How does a data item get passed
between team members (hand-off)? Restriction has been suggested as a means to
stop certain members from making unexpected global changes to the data that
might change other members’ view of the same data [75]. Similar issues pertaining
to workspace awareness (individual vs. shared views), artefact manipulation
(who can make which changes), and view representation have been raised [39].
Is a single shared representation adequate? Should a system allow for multiple
representations? Should the exploration on multiple representations of the same
dataset be linked or be completely independent?
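As a sketch of the data centric questions above, the fragment below restricts global changes to a shared view according to per-user rights. It is an illustration only: the SharedView class, the set of global actions, and the rights table are hypothetical, and a real system would need to weigh such restrictions against the social protocols discussed earlier.

```python
GLOBAL_ACTIONS = {"zoom", "rotate", "scale"}

class SharedView:
    """A shared view whose global settings can be changed only by permitted users."""

    def __init__(self, rights):
        self.rights = rights   # user -> set of global actions that user may perform
        self.settings = {"zoom": 1.0, "rotate": 0.0, "scale": 1.0}

    def change(self, user, action, value):
        """Apply a global change if the user holds the right; return whether it applied."""
        if action not in GLOBAL_ACTIONS:
            raise ValueError(f"unknown global action: {action}")
        if action not in self.rights.get(user, set()):
            return False   # refused: would alter other members' view of the data
        self.settings[action] = value
        return True
```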
to create comments with pointers into the visualization provides an easy way to
choreograph a step-by-step presentation.
Swivel: Sharing Data on the Web: Swivel.com is a web site that supports
sharing and discussion around data. The service appears to be modeled on sites
such as YouTube that support sharing of other media. In keeping with this
model, Swivel allows users to upload data sets and talk about them in attached
discussion forums. In addition, the site automatically generates graphs by com-
bining columns from uploaded data sets into bar charts, pie charts, and scatter
plots. Pointing behavior on the site appears limited.
Although the graphs on Swivel are not interactive, the site provides an exam-
ple of social data analysis in action, in particular the importance of collaborative
publishing and sharing of visualizations. While there do not seem to be many
extensive conversations in Swivel’s discussion area, there has been significant use
of Swivel’s graphs among bloggers to discuss statistics. In other words, it appears
that the ability to publish graphs for use in other contexts is most valuable to
Swivel’s users.
photos across the globe, along with the ability to select geographic regions for
annotation with names and additional data (Fig. 2). View sharing is supported
through automatically updating URLs. As the view is panned or zoomed, the
current URL updates dynamically to reflect the current zoom level and latitude
and longitude values. Pointing is supported through annotations. Users can draw
rectangular and polygonal annotations, which scale appropriately as the map is
zoomed. To avoid clutter, annotations are filtered as the view is zoomed; the
viewer does not see annotations that are too small to be legible or so large they
engulf the entire display, improving the scalability of the system.
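The two mechanisms just described, view sharing through the URL and size-based filtering of annotations, can be sketched as follows. This is not Wikimapia's actual URL scheme or filtering rule; the base address, parameter names (lat, lon, zoom), and pixel thresholds are assumptions made for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlparse

def view_to_url(base, lat, lon, zoom):
    """Encode the current map view in a URL that can be shared or bookmarked."""
    return f"{base}?{urlencode({'lat': lat, 'lon': lon, 'zoom': zoom})}"

def url_to_view(url):
    """Recover (lat, lon, zoom) from a shared URL."""
    query = parse_qs(urlparse(url).query)
    return float(query["lat"][0]), float(query["lon"][0]), int(query["zoom"][0])

def visible_annotations(annotations, pixels_per_degree, min_px, max_px):
    """Keep annotations whose on-screen width lies between min_px and max_px.

    annotations is a list of (name, width_in_degrees) pairs.
    """
    shown = []
    for name, width_degrees in annotations:
        width_px = width_degrees * pixels_per_degree
        if min_px <= width_px <= max_px:
            shown.append(name)
    return shown

# Hypothetical base address; any viewer opening this URL would restore the same view.
url = view_to_url("https://example.org/map", 20.5, -105.25, 12)
```

Since the complete view state lives in the URL, sharing a view requires no server-side session: pasting the link into an e-mail or blog post is enough.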
Wikimapia supports conversation using an embedded discussion technique.
Each annotation is a link to editable text. Descriptive text about a geographic
region can then be edited by anyone, similar to articles on Wikipedia. Discussion
also occurs through voting. When annotations are new, users can vote on whether
they agree or disagree with the annotation. Annotations that are voted down
are removed from the system. For instance, the small town of Yelapa, Mexico
is located on an inlet in a bay near Puerto Vallarta. However, the bay has a
number of inlets very close together. As a result, multiple conflicting annotations
for Yelapa appeared. Through voting, the incorrect regions were discarded and
the correct annotation was preserved.
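The voting mechanism can be sketched as a simple tally. The text does not state the rule by which annotations are removed, so the threshold below is purely an assumption, as are the function name and the example vote counts.

```python
def resolve_by_votes(annotations, removal_threshold=-3):
    """Keep annotations whose net vote stays above the removal threshold.

    annotations maps an annotation name to its list of votes,
    where +1 means agree and -1 means disagree.
    """
    return {
        name: votes
        for name, votes in annotations.items()
        if sum(votes) > removal_threshold
    }
```

With two conflicting annotations for the same town, for instance {"inlet A": [1, 1, 1], "inlet B": [-1, -1, -1, -1]}, only "inlet A" would survive under this assumed rule.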
social activity. Users who tired of exploring visualizations turned their focus to
the comment listings. Reading others’ comments sparked new questions that led
users back into the visualization, stimulating further analysis. The sense.us pro-
totype was initially available on a corporate intranet which provided employees
with blogs and a social bookmarking service. Users of sense.us found ways to
publish their findings, typically by taking screenshots and then placing them on
blogs or the bookmarking service with application bookmarks. These published
visualizations drew additional traffic to the site.
(a) Collaborative activity might be introduced at any phase of the information
visualization pipeline.
(b) The sensemaking model in [17] can be applied to identify potential mechanisms
for collaborative analysis (e.g., [43]).
visualizations already enjoy game-like properties, being highly visual, highly in-
teractive, and often animated. Heer [42] discusses various examples in which
playful activity contributes to analysis, applying insights from an existing the-
ory of playful behavior [16] that analyzes the competitive, visceral, and teamwork
building aspects of play. For example, scoring mechanisms could be applied to
create competitive social-psychological incentives. Game design might also be
used to allocate attention, for example, by creating a team-oriented “scavenger
hunt” analysis game focused on a particular subject matter. Salen and Zim-
merman [78] provide a thorough resource for the further study of game design
concepts.
4.3 Summary
In this section, we introduce an emerging use of interactive visualization: collab-
orative visual analysis across space and time. The Web has opened up new possi-
bilities for large-scale collaboration around visualizations and holds the potential
for improved analysis and dissemination of complex data sets. A new class of
systems explores these possibilities, enabling web-based data access, exploration,
view sharing, and discussion around both static and interactive visualizations.
Already, these systems exhibit the promise of web-based collaboration, provid-
ing examples of collective data analysis in which group members combine their
knowledge to make sense of observed data trends and disseminate their findings.
Still, many research questions remain on how to structure collaboration. For
example, how can we move beyond simple textual comments to better scale and
integrate diverse contributions? Interested readers may wish to consult [96,46,43]
for further discussions on this topic. As described in section 2, another open
question is how to design for particular audiences. Different scenarios – includ-
ing scientific collaboration, business intelligence, and public data consumption –
involve different skill sets, scales of collaboration, and standards of quality. Going
forward, case studies in these scenarios are crucial to better tailoring visualiza-
tion tools to such varied audiences. By enabling users to collectively explore
data, share views and findings, and debate competing hypotheses, the resulting
collaborative visual analysis systems hold the potential to improve the number
and quality of insights gained from our ever-increasing collections of data.
5 Conclusion
The adoption of visualization technologies by people from different walks of life
has important implications for visualization research and development. Visual-
ization construction tools are lowering barriers to entry, resulting in end-user
created visualizations of every kind of data set imaginable. Concurrently, new
technologies enabling collaborative use of visualizations in both physical and
online settings hold the potential to change the way we explore, analyze, and
communicate. In this paper, we have sought to identify these emerging trends
and provide preliminary design considerations for advancing the state-of-the-art
of visualization and visual analytic tools.
As a parting comment, we note that the release of visualization tools “into
the wild” will undoubtedly result in a plethora of unexpected developments.
Equipped with new creation and collaboration tools, users will almost certainly
re-appropriate these technologies for unexpected purposes. Already, use of sys-
tems like Many-Eyes has revealed new genres of data-oriented play and self-
expression that complement more traditional analytic activities.
As researchers, it is imperative that we interface with these developments
in a productive fashion. It is likely that visualization tools will not only be
used in unexpected ways, but in ways we actively dislike. As new audiences
are exposed to visualization technologies, “bad” or “chart junk” visualizations
will be generated. Furthermore, visualizations will be used to support actions
126 J. Heer et al.
or points of view we may find distasteful, and any communication medium that
is sufficiently powerful to inform may also be used to lie or misrepresent. We
as a community should not be so concerned with trying to control the medium
or prevent people from lying or creating bad visualizations. As audiences get
more comfortable communicating with visualizations, we optimistically expect
the quality of visualizations and nuance of interpretation to improve.
However, this proscription does not mean that researchers should idly sit on
their hands. Rather, there will be an expanded role for visualization experts to
play. Issues of data provenance, cleaning, and integrity will force the research
community to focus on the visualization pipeline in a more holistic manner.
Supporting data at varied levels of structure will become increasingly necessary.
New genres of visualization use may require new designs and new systems to
support emerging practices, and the design of visual exploration tools that both
empower and educate will take on new importance. Consequently, the entrance
of visualization technologies into the mainstream offers a new horizon of research
opportunities.
References
1. Agrawala, M., Beers, A.C., McDowall, I., Fröhlich, B., Bolas, M., Hanrahan, P.:
The Two-User Responsive Workbench: Support for Collaboration Through In-
dividual Views of a Shared Space. In: International Conference on Computer
Graphics and Interactive Techniques (Siggraph ’97), pp. 327–332. ACM Press,
New York (1997), doi:10.1145/258734.258875
2. Ahlberg, C., Wistrand, E.: IVEE: An information visualization & exploration
environment. In: Proceedings of the IEEE Symposium on Information Visualiza-
tion, Atlanta, GA, October 1995, pp. 66–73. IEEE Computer Society Press, Los
Alamitos (1995)
3. Anupam, V., Bajaj, C.L., Schikore, D., Schikore, M.: Distributed and collaborative
visualization. IEEE Computer 27(7), 37–43 (1994)
4. Artero, A.O., Ferreira de Oliveira, M.C., Levkowitz, H.: Uncovering clusters in
crowded parallel coordinates visualizations. In: Proceedings of the IEEE Sympo-
sium on Information Visualization (InfoVis), pp. 81–88. IEEE Computer Society
Press, Los Alamitos (2004)
5. Baldonado, M.Q.W., Woodruff, A., Kuchinsky, A.: Guidelines for using multiple
views in information visualization. In: Proceedings of AVI ’00, pp. 110–119. ACM
Press, New York (2000), doi:10.1145/345513.345271
6. Benbunan-Fich, R., Hiltz, S.R., Turoff, M.: A comparative content analysis of face-
to-face vs. asynchronous group decision making. Decision Support Systems 34(4),
457–469 (2003)
7. Benkler, Y.: Coase’s penguin, or, linux and the nature of the firm. Yale Law
Journal 112(369) (2002)
8. Benko, H., Wilson, A.D., Baudisch, P.: Precise Selection Techniques for Multi-
Touch Screens. In: Proceedings of the Conference on Human Factors in Comput-
ing Systems (CHI’06), Montréal, Canada, April 22-27, 2006, pp. 1263–1272. ACM
Press, New York (2006), doi:10.1145/1124772.1124963
9. Benko, H., Ishak, E.W., Feiner, S.: Collaborative mixed reality visualization of
an archaeological excavation. In: IEEE International Symposium on Mixed and
Augmented Reality (ISMAR 2004), Arlington, VA, pp. 132–140 (2004)
Creation and Collaboration 127
10. Bertin, J.: Semiology of Graphics: Diagrams Networks Maps (Translation of:
Sémiologie graphique). The University of Wisconsin Press, Madison (1983)
11. Billman, D., Convertino, G., Shrager, J., Pirolli, P., Massar, J.P.: Collaborative
intelligence analysis with cache and its effects on information gathering and cog-
nitive bias. In: Human Computer Interaction Consortium Workshop (2006)
12. Brennan, S.E.: How conversation is shaped by visual and spoken evidence. In:
Trueswell, Tanenhaus (eds.) Approaches to studying world-situated language use:
Bridging the language-as-product and language-as-action traditions, pp. 95–129.
MIT Press, Cambridge (2005)
13. Brennan, S.E., Mueller, K., Zelinsky, G., Ramakrishnan, I.V., Warren, D.S., Kauf-
man, A.: Toward a multi-analyst, collaborative framework for visual analytics. In:
IEEE Symposium on Visual Analytics Science and Technology (2006)
14. Brodlie, K.W., Duce, D.A., Gallop, J.R., Walton, J.P.R.B., Wood, J.D.: Distrib-
uted and collaborative visualization. Computer Graphics Forum 23(2), 223–251
(2004)
15. Brush, A.J., Bargeron, D., Grudin, J., Gupta, A.: Notification for shared anno-
tation of digital documents. In: Proc. ACM Conference on Human Factors in
Computing Systems (CHI’02) (2002)
16. Caillois, R.: Man, Play, and Games. Free Press of Glencoe (1961)
17. Card, S., Mackinlay, J.D., Shneiderman, B.: Readings In Information Visualization:
Using Vision To Think. Morgan Kaufmann Publishers, Inc., San Francisco
(1999)
18. Card, S.K., Nation, D.: Degree-of-Interest Trees: A component of an attention-
reactive user interface. In: Proceedings of the Working Conference on Advanced
Visual Interfaces, May 2002, pp. 231–245. ACM Press, New York (2002),
http://www2.parc.com/istl/projects/uir/pubs/items/UIR-2002-11-Card-AVI-DOITree.pdf
19. Carroll, J., Rosson, M.B., Convertino, G., Ganoe, C.H.: Awareness and teamwork
in computer-supported collaborations. Interacting with Computers 18(1), 21–46
(2005)
20. Carpendale, M.S.T., Montagnese, C.: A framework for unifying presentation
space. In: Schilit, B. (ed.) Proceedings of ACM Symposium on User Interface
Software and Technology (UIST), pp. 61–70. ACM Press, New York (2001),
http://pages.cpsc.ucalgary.ca/~sheelagh/personal/pubs/2001/carpendaleuist01.pdf,
doi:10.1145/502348.502358
21. Carter, S., Mankoff, J., Goddi, P.: Building connections among loosely cou-
pled groups: Hebb’s rule at work. Journal of Computer-Supported Cooperative
Work 13(3), 305–327 (2004)
22. Cheshire, C.: Selective incentives and generalized information exchange. Social
Psychology Quarterly 70(1) (2007)
23. Chi, E.H.-H., Riedl, J.T.: An Operator Interaction Framework for Visualization
Systems. In: Wills, G., Dill, J. (eds.) Proceedings of the IEEE Symposium on
Information Visualization (InfoVis ’98), pp. 63–70. IEEE Computer Society Press,
Los Alamitos (1998)
24. Chuah, M.C., Roth, S.F.: Visualizing Common Ground. In: Proc. of the Conf. on
Information Visualization (IV), pp. 365–372. IEEE Computer Society Press, Los
Alamitos (2003)
25. Chui, Y.-P., Heng, P.-A.: Enhancing view consistency in collaborative medical
visualization systems using predictive-based attitude estimation. In: First IEEE
International Workshop on Medical Imaging and Augmented Reality (MIAR’01),
Hong Kong, China (2001)
26. Clark, H.H.: Pointing and placing. In: Kita, S. (ed.) Pointing. Where language,
culture, and cognition meet, pp. 243–268. Lawrence Erlbaum, Mahwah (2003)
27. Cluxton, D., Eick, S.G., Yun, J.: Hypothesis visualization. In: Proceedings of the
IEEE Symposium on Information Visualization (Posters Compendium), Austin,
TX, October 2004, pp. 9–10. IEEE Computer Society Press, Los Alamitos (2004)
28. Dietz, P.H., Leigh, D.L.: Diamondtouch: A multi-user touch technology. In: Proc.
ACM Symposium on User Interface Software and Technology, pp. 219–226 (2001)
29. Dourish, P., Bellotti, V.: Awareness and coordination in shared workspaces. In:
Proc. ACM Conference on Computer-Supported Cooperative Work, Toronto, On-
tario, pp. 107–114 (1992)
30. Dourish, P., Chalmers, M.: Running out of space: Models of information naviga-
tion. In: Proc. Human Computer Interaction (HCI’94) (1994)
31. General Dynamics: Command post of the future. Website (accessed November 2007)
32. Eccles, R., Kapler, T., Harper, R., Wright, W.: Stories in geotime. In: Proc. IEEE
Symposium on Visual Analytics Science and Technology (2007)
33. Ellson, J., Gansner, E.R., Koutsofios, E., North, S.C., Woodhull, G.: Graphviz
and Dynagraph – static and dynamic graph drawing tools. Online Documentation
(accessed November 2007)
34. Fekete, J.-D.: The Infovis Toolkit. In: Ward, M., Munzner, T. (eds.) Proceedings
of the IEEE Symposium on Information Visualization (InfoVis), pp. 167–174.
IEEE Computer Society Press, Los Alamitos (2004)
35. Forlines, C., Shen, C.: DTLens: Multi-user Tabletop Spatial Data Exploration.
In: Proc. of User Interface Software and Technology (UIST), pp. 119–122. ACM
Press, New York (2005), doi:10.1145/1095034.1095055
36. Gahegan, M., Wachowicz, M., Harrower, M., Rhyne, T.-M.: The integration of
geographic visualization with knowledge discovery in databases and geocomputation.
Cartography and Geographic Information Science 28(1), 29–44 (2001)
37. Gershon, N.: What storytelling can do for information visualization. Communi-
cations of the ACM 44(8), 31–37 (2001)
38. Grimstead, I.J., Walker, D.W., Avis, N.J.: Collaborative visualization: A review
and taxonomy. In: Proceedings of the Symposium on Distributed Simulation and
Real-Time Applications, pp. 61–69. IEEE Computer Society Press, Los Alamitos
(2005)
39. Gutwin, C., Greenberg, S.: Design for individuals, design for groups: Tradeoffs
between power and workspace awareness. In: Proceedings of Computer Sup-
ported Cooperative Work (CSCW), pp. 207–216. ACM Press, New York (1998),
doi:10.1145/289444.289495
40. Hancock, M., Carpendale, S.: Supporting multiple off-axis viewpoints at a table-
top display. In: Proceedings of Tabletop, pp. 171–178. IEEE Computer Society
Press, Los Alamitos (2007)
41. Heer, J.: The flare visualization toolkit. Website (accessed November 2007)
42. Heer, J.: Socializing visualization. In: Proc. CHI 2006 Workshop on Social Visu-
alization (2006)
43. Heer, J., Agrawala, M.: Design Considerations for Collaborative Visual Analyt-
ics. In: IEEE Symposium on Visual Analytics Science and Technology (VAST),
pp. 171–178. IEEE Computer Society Press, Los Alamitos (2007),
http://vis.berkeley.edu/papers/design collab vis/2007-DesignCollabVis-VAST.pdf
44. Heer, J., Card, S.K., Landay, J.A.: prefuse: A toolkit for interactive infor-
mation visualization. In: Proceedings of the Conference on Human Factors
in Computing Systems (CHI), pp. 421–430. ACM Press, New York (2005),
doi:10.1145/1054972.1055031
45. Heer, J., boyd, d.: Vizster: Visualizing Online Social Networks. In: Proceedings
of the IEEE Symposium on Information Visualization (InfoVis ’05), pp. 33–40.
IEEE Computer Society Press, Los Alamitos (2005)
46. Heer, J., Viégas, F.B., Wattenberg, M.: Voyagers and voyeurs: Supporting asyn-
chronous collaborative information visualization. In: Proceedings of the Confer-
ence on Human Factors in Computing Systems (CHI), pp. 1029–1038. ACM Press,
New York (2007)
47. Hill, W.C., Hollan, J.D.: Deixis and the future of visualization excellence. In:
Proc. of IEEE Visualization, pp. 314–319 (1991)
48. Isenberg, T., Neumann, P., Carpendale, S., Nix, S., Greenberg, S.: Interac-
tive annotations on large, high-resolution information displays. In: Confer-
ence Compendium of IEEE VIS, InfoVis, and VAST, pp. 124–125. IEEE
Computer Society Press, Los Alamitos (2006),
http://cpsc.ucalgary.ca/~isenberg/papers/Isenberg 2006 IAL.pdf
49. Jerding, D.F., Stasko, J.T.: The Information Mural: A Technique for Displaying
and Navigating Large Information Spaces. In: Proceedings of the IEEE Sympo-
sium on Information Visualization (InfoVis), pp. 43–50. IEEE Computer Society
Press, Los Alamitos (1995)
50. Johnson, B., Shneiderman, B.: Tree-maps: A space-filling approach to the visual-
ization of hierarchical information structures. In: Proceedings of IEEE Visualiza-
tion, pp. 284–291. IEEE Computer Society Press, Los Alamitos (1991)
51. Kerren, A., Stasko, J.T., Fekete, J.-D., North, C.J. (eds.): Information Visualiza-
tion. LNCS, vol. 4950. Springer, Heidelberg (2008)
52. Kleinmuntz, D.N., Schkade, D.A.: Information Displays and Decision Processes.
Psychological Science 4(4), 221–227 (1993)
53. Kruger, R., Carpendale, S., Scott, S.D., Greenberg, S.: Roles of orientation in
tabletop collaboration: Comprehension, coordination and communication. Journal
of Computer Supported Cooperative Work 13(5–6), 501–537 (2004)
54. Kruger, R., Carpendale, S., Scott, S.D., Tang, A.: Fluid integration of rotation
and translation. In: Proceedings of Human Factors in Computing Systems (CHI),
pp. 601–610. ACM Press, New York (2005), doi:10.1145/1054972.1055055
55. Lamping, J., Rao, R., Pirolli, P.: A focus + context technique based on hyperbolic
geometry for visualizing large hierarchies. In: Proceedings of the Conference of
Human Factors in Computing Systems,CHI, pp. 401–408. ACM Press, New York
(1995), doi:10.1145/223904.223956
56. Ling, K., Beenen, G., Ludford, P., Wang, X., Chang, K., Li, X., Cosley, D.,
Frankowski, D., Terveen, L., Rashid, A.M., Resnick, P., Kraut, R.: Using so-
cial psychology to motivate contributions to online communities. Journal of
Computer-Mediated Communication 10(4) (2005)
57. Liu, Y., Gahegan, M., Macgill, J.: Increasing geocomputational interoperability:
Towards a standard geocomputation API. In: Proceedings of GeoComputation,
Ann Arbor, MI (2005)
58. Livny, M., Ramakrishnan, R., Beyer, K., Chen, G., Donjerkovic, D., Lawande,
S., Myllymaki, J., Wenger, K.: DEVise: Integrated querying and visualization of
large datasets. In: Proceedings of the International Conference on Management
of Data (SIGMOD), Tucson, AZ, pp. 301–312. ACM Press, New York (1997)
59. Mackinlay, J.D.: Automating the design of graphical presentations of relational
information. ACM Transactions on Graphics 5(2), 110–141 (1986)
60. Mackinlay, J.D., Hanrahan, P., Stolte, C.: Show me: Automatic presentation
for visual analysis. IEEE Transactions on Visualization and Computer Graph-
ics 13(6), 1137–1144 (2007)
61. Mark, G., Kobsa, A.: The effects of collaboration and system transparency on
CIVE usage: An empirical study and model. Presence 14(1), 60–80 (2005)
62. Marr, D.: Vision: A Computational Investigation into the Human Representation
and Processing of Visual Information. W.H. Freeman, New York (1982)
63. Merriam-Webster: Webster’s english dictionary. Website (accessed November
2007), http://www.cs.chalmers.se/~hallgren/wget.cgi?presentation
64. Munzner, T.: Guest Editor’s Introduction: Information Visualization. Computer
Graphics and Applications 22(1), 20–21 (2002)
65. Munzner, T., Guimbretière, F., Tasiran, S., Zhang, L., Zhou, Y.: TreeJuxtaposer:
Scalable Tree Comparison Using Focus+Context with Guaranteed Visibility.
ACM Transactions on Graphics 22(3), 453–462 (2003)
66. Nacenta, M.A., Sakurai, S., Yamaguchi, T., Miki, Y., Itoh, Y., Kitamura, Y., Sub-
ramanian, S., Gutwin, C.: E-conic: a perspective-aware interface for multi-display
environments. In: Proceedings of ACM Symposium on User Interface Software
and Technology (UIST’07), pp. 279–288. ACM Press, New York (2007)
67. Neumann, P., Schlechtweg, S., Carpendale, M.S.T.: ArcTrees: Visualizing Rela-
tions in Hierarchical Data. In: Proceedings of Eurographics / IEEE VGTC Sym-
posium on Visualization (EuroVis 2005, June 1–3, 2005, Leeds, England, UK),
Aire-la-Ville. Eurographics Workshop Series, pp. 53–60. Eurographics (2005)
68. Neumann, P., Tang, A., Carpendale, S.: A Framework for Visual Information
Analysis. Technical Report 2007-87123, University of Calgary, Calgary, AB,
Canada (July 2007)
69. North, C., Shneiderman, B.: Snap-together visualization: A user interface for co-
ordinating visualizations via relational schemata. In: Proceedings of the Working
Conference on Advanced Visual Interfaces (AVI), May 2000, pp. 128–135. ACM
Press, New York (2000)
70. Olson, G.M., Olson, J.S.: Distance Matters. Human-Computer Interaction 15(2
& 3), 139–178 (2000)
71. Olston, C., Woodruff, A., Aiken, A., Chu, M., Ercegovac, V., Lin, M., Spalding,
M., Stonebraker, M.: Datasplash. In: Proceedings of the International Conference
on Management of Data (SIGMOD), Seattle, WA, June 1998, pp. 550–552. ACM
Press, New York (1998)
72. Park, K.S., Kapoor, A., Leigh, J.: Lessons learned from employing multiple per-
spectives in a collaborative virtual environment for visualizing scientific data.
In: Proceedings of Collaborative Virtual Environments (CVE), pp. 73–82. ACM
Press, New York (2000), doi:10.1145/351006.351015
73. Pinelle, D., Gutwin, C., Greenberg, S.: Task analysis for groupware usability eval-
uation: Modeling shared-workspace tasks with the mechanics of collaboration.
ACM Transactions on Computer-Human Interaction 10(4), 281–311 (2003)
74. Rensink, R.A.: chapter Change Blindness. In: McGraw-Hill Yearbook of Science
and Technology, pp. 44–46. McGraw-Hill, New York (2005)
75. Ringel Morris, M., Ryall, K., Shen, C., Forlines, C., Vernier, F.: Beyond “social
protocols”: Multi-user coordination policies for co-located groupware. In: Proceedings
of Computer-Supported Cooperative Work (CSCW), pp. 262–265. ACM Press,
New York (2004), doi:10.1145/1031607.1031648
76. Rogers, Y., Lindley, S.: Collaborating around vertical and horizontal large inter-
active displays: Which way is best? Interacting with Computers 16(6), 1133–1152
(2004)
77. Ryall, K., Esenther, A., Forlines, C., Shen, C., Shipman, S., Ringel Morris, M.,
Everitt, K., Vernier, F.: Identity-differentiating widgets for multiuser interactive
surfaces. IEEE Computer Graphics and Applications 26(5), 56–64 (2006)
78. Salen, K., Zimmerman, E.: Rules of Play: Fundamentals of Game Design. MIT
Press, Cambridge (2003)
79. Saraiya, P., North, C., Duca, K.: An Insight-Based Methodology for Evaluating
Bioinformatics Visualizations. IEEE Transactions on Visualization and Computer
Graphics 11(4), 443–456 (2005)
80. Scott, S.D., Carpendale, M.S.T., Habelski, S.: Storage bins: Mobile storage for
collaborative tabletop displays. IEEE Computer Graphics and Applications 25(4),
58–65 (2005), http://doi.ieeecomputersociety.org/10.1109/MCG.2005.86
81. Scott, S.D., Carpendale, M.S.T., Inkpen, K.M.: Territoriality in collaborative
tabletop workspaces. In: Proceedings of Computer-Supported Cooperative Work
(CSCW), pp. 294–303. ACM Press, New York (2004),
http://innovis.cpsc.ucalgary.ca/pubs/2004/Territoriality.CSCW/scott cscw2004.pdf,
doi:10.1145/1031607.1031655
82. Scott, S.D., Grant, K.D., Mandryk, R.L.: System guidelines for co-located collab-
orative work on a tabletop display. In: Proceedings of the European Conference
on Computer-Supported Cooperative Work (ECSCW), pp. 159–178. Kluwer Aca-
demic Publishers, Dordrecht (2003),
http://www.ecscw.uni-siegen.de/2003/009Scott_ecscw03.pdf
83. Seo, J., Shneiderman, B.: Knowledge discovery in high dimensional data: Case
studies and a user survey for an information visualization tool. IEEE Transactions
on Visualization and Computer Graphics 12(3), 311–322 (2006)
84. Shen, C., Lesh, N., Vernier, F.: Personal digital historian: Story sharing around
the table. ACM Interactions 10(2), 15–22 (2003)
85. Shneiderman, B.: The eyes have it: A task by data type taxonomy for information
visualizations. In: Proceedings of the IEEE Symposium on Visual Languages, pp.
336–343. IEEE Computer Society Press, Los Alamitos (1996)
86. Shum, S.B., Li, V.U.G., Domingue, J., Motta, E.: Visualizing internetworked ar-
gumentation. In: Kirschner, P.A., Shum, S.J.B., Carr, C.S. (eds.) Visualizing Ar-
gumentation: Software Tools for Collaborative and Educational Sense-Making,
December 2002, pp. 185–204. Springer, Heidelberg (2002)
87. Spence, R.: Information Visualization, 2nd edn. Pearson Education Limited, Har-
low (2007)
88. Takatsuka, M., Gahegan, M.: GeoVISTA Studio: A codeless visual programming
environment for geoscientific data analysis and visualization. Computers &
Geosciences 28(10), 1131–1144 (2002)
89. Tandler, P., Prante, T., Müller-Tomfelde, C., Streitz, N., Steinmetz, R.:
ConnecTables: Dynamic Coupling of Displays for the Flexible Creation of Shared
Workspaces. In: Proceedings of User Interface Software and Technology (UIST),
pp. 11–20. ACM Press, New York (2001)
90. Tang, A., Tory, M., Po, B., Neumann, P., Carpendale, S.: Collaborative
coupling over tabletop displays. In: Proceedings of Human Factors in Computing
Systems (CHI), pp. 1181–1190. ACM Press, New York (2006),
doi:10.1145/1124772.1124950
91. Thomas, J.J., Cook, K.A.: Illuminating the Path: The Research and Development
Agenda for Visual Analytics. National Visualization and Analytics Center (2005),
http://nvac.pnl.gov/agenda.stm
92. Tufte, E.R.: The Visual Display of Quantitative Information. Graphics Press,
Cheshire (2001)
93. van Wijk, J.J.: The value of visualization. In: Proceedings of IEEE Visualiza-
tion (VIS), pp. 79–86. IEEE Computer Society Press, Los Alamitos (2005),
http://www.win.tue.nl/~vanwijk/vov.pdf
94. Vernier, F., Lesh, N., Shen, C.: Visualization techniques for circular tabletop
interfaces. In: Proceedings of Advanced Visual Interfaces (AVI), pp. 257–263.
ACM Press, New York (2002)
95. Viégas, F.B., boyd, d., Nguyen, D.H., Potter, J., Donath, J.: Digital artifacts for
remembering and storytelling: PostHistory and social network fragments. In:
Proceedings of the Hawaii International Conference on System Sciences (HICSS),
pp. 105–111 (2004)
96. Viégas, F.B., Wattenberg, M.: Communication-minded visualization: A call to
action. IBM Systems Journal 45(4), 801–812 (2006), doi:10.1147/sj.454.0801
97. Viégas, F.B., Wattenberg, M., van Ham, F., Kriss, J., McKeon, M.: Many Eyes:
A site for visualization at internet scale. IEEE Transactions on Visualization
and Computer Graphics (Proceedings Visualization / Information Visualization
2007) 13(6), 1121–1128 (2007),
http://www.research.ibm.com/visual/papers/viegasinfovis07.pdf
98. von Ahn, L.: Games with a purpose. Computer 39(6), 92–94 (2006)
99. Ware, C.: Information Visualization – Perception for Design, 2nd edn. Morgan
Kaufmann Series in Interactive Technologies. Morgan Kaufmann Publishers, San
Francisco (2004)
100. Wattenberg, M., Kriss, J.: Designing for Social Data Analysis. IEEE Transactions
on Visualization and Computer Graphics 12(4), 549–557 (2006)
101. Weaver, C., Fyfe, D., Robinson, A., Holdsworth, D.W., Peuquet, D.J., Mac-
Eachren, A.M.: Visual analysis of historic hotel visitation patterns. In: Proceed-
ings of the Symposium on Visual Analytics Science and Technology (VAST), Bal-
timore, MD, October 31–November 2 2006, pp. 35–42. IEEE Computer Society
Press, Los Alamitos (2006)
102. Weaver, C.E.: Improvise: A User Interface for Interactive Construction of Highly-
Coordinated Visualizations. Phd thesis, University of Wisconsin–Madison (June
2006)
103. Wesche, G., Wind, J., Göbe, M., Rosenblum, L., Durbin, J., Doyle, R., Tate,
D., King, R., Fröhlich, B., Fischer, M., Agrawala, M., Beers, A., Hanrahan, P.,
Bryson, S.: Application of the Responsive Workbench. IEEE Computer Graphics
and Applications 17(4), 10–15 (1997), doi:10.1109/38.595260
104. Weskamp, M.: newsmap. Website (accessed November 2007),
http://marumushi.com/apps/newsmap/index.cfm
105. Wigdor, D., Shen, C., Forlines, C., Balakrishnan, R.: Perception of elementary
graphical elements in tabletop and multi-surface environments. In: Proceedings
of Human Factors in Computing Systems (CHI), pp. 473–482. ACM Press, New
York (2007)
106. Willett, W., Heer, J., Agrawala, M.: Scented widgets: Improving navigation cues
with embedded visualizations. IEEE Transactions on Visualization and Computer
Graphics 13(6), 1129–1136 (2007)
107. Wright, W., Schroh, D., Proulx, P., Skaburskis, A., Cort, B.: Advances in nSpace
– the sandbox for analysis. In: International Conference on Intelligence Analysis,
McLean, VA (May 2005)
108. Yang, D., Rundensteiner, E.A., Ward, M.O.: Analysis guided visual exploration
to multivariate data. In: Proc. IEEE Visual Analytics Science and Technology
(2007)
109. Yost, B., North, C.: The perceptual scalability of visualization. IEEE Transactions
on Visualization and Computer Graphics 12(5), 837–844 (2006)
110. Zhang, J., Norman, D.A.: Representations in distributed cognitive tasks. Cogni-
tive Science 18(1), 87–122 (1994)
111. Zuk, T., Schlesier, L., Neumann, P., Hancock, M.S., Carpendale, M.S.T.: Heuris-
tics for Information Visualization Evaluation. In: Proceedings of the Workshop
Beyond Time and Errors (BELIV), held in conjunction with AVI, pp. 55–60. ACM
Press, New York (2006), doi:10.1145/1168149.1168162
Process and Pitfalls in Writing
Information Visualization Research Papers
Tamara Munzner
Abstract. The goal of this paper is to help authors recognize and avoid
a set of pitfalls that recur in many rejected information visualization
papers, using a chronological model of the research process. Selecting a
target paper type in the initial stage can avert an inappropriate choice
of validation methods. Pitfalls involving the design of a visual encoding
may occur during the middle stages of a project. In a later stage when
the bulk of the research is finished and the paper writeup begins, the
possible pitfalls are strategic choices for the content and structure of the
paper as a whole, tactical problems localized to specific sections, and
unconvincing ways to present the results. Final-stage pitfalls of writing
style can be checked after a full paper draft exists, and the last set of
problems pertain to submission.
1 Introduction
Many rejected information visualization research papers have similar flaws. In
this paper, I categorize these common pitfalls in the context of stages of the
research process. My main goal is to help authors escape these pitfalls, espe-
cially graduate students or those new to the field of information visualization.
Reviewers might also find these pitfalls an interesting point of departure when
considering the merits of a paper.
This paper is structured around a chronological model of the information
visualization research process. I argue that a project should begin with a careful
consideration of the type of paper that is the desired outcome, in order to avoid
the pitfalls of unconvincing validation approaches. Research projects that involve
the design of a new visual encoding would benefit from checking for several
middle-stage pitfalls in unjustified or inappropriate encoding choices. Another
critical checkpoint is the late stage of the project, after the bulk of the work is
done but before diving into writing up results. At this point, you should consider
strategic pitfalls about the high-level structure of the entire paper, tactical
pitfalls that affect one or a few sections, and possible pitfalls in the specifics of
your approach to the results section. At a final stage, when there is a complete
paper draft, you can check for lower-level pitfalls of writing style, and avoid
submission-time pitfalls.
A. Kerren et al. (Eds.): Information Visualization, LNCS 4950, pp. 134–153, 2008.
© Springer-Verlag Berlin Heidelberg 2008
Writing InfoVis Research Papers 135
I have chosen a breezy style, following in the footsteps of Levin and Re-
dell [22] and Shewchuk [34]. My intent is serious, but I have tried to invent
catchy – sometimes even snide – titles in hopes of making these pitfalls more
memorable. Guides to writing research papers have been written in several sub-
fields of computer science, including systems [22], software engineering [33], pro-
gramming languages [19], networking [28], and graphics [20]. Many of the pitfalls
in the middle and later project stages apply to research writing in general, not
just information visualization, and have been mentioned in one or many of these
previous papers.
My first pass at providing advice for authors and reviewers in the field of
information visualization, abbreviated as infovis, was the creation of the author
guide for the annual conference. When I was Posters Chair of InfoVis 2002, the
IEEE Symposium on Information Visualization, I read the roughly 300 reviews
of the 78 rejected papers in order to decide which to invite as poster submissions.
The experience convinced me that future paper authors would benefit from more
specific guidance. When I became Papers Chair in 2003, with co-chair Stephen
North, we completely rewrote the Call for Papers. We introduced five categories
of papers, with an explicit discussion of the expectations for each, in a guide for
authors that has been kept unchanged through the 2007 conference.
This second pass is motivated by the patterns of mistakes I saw during my
two-year term as InfoVis Papers Co-Chair, when I read more than 700 reviews covering
all 189 submitted papers, and while personally writing nearly 100 reviews in the
subsequent three years. My discussion of paper types below expands considerably on the
previous author guide, and I provide concrete examples of strong papers for each
type. The advice I offer is neither complete nor objective; although I draw on
my experience as a papers chair, my conclusions may be idiosyncratic and reflect
my personal biases. I do not perform any quantitative analysis. Doing so in the
domain of infovis would, no doubt, be fruitful future work, given the interesting
results from software engineering [33] and human-computer interaction [17].
None of these pitfalls are aimed at any particular individual: I have seen
multiple instances of each one. Often a single major pitfall was enough to doom
a paper to rejection, although in some cases I have seen other strengths outweigh
a particular weakness. I hasten to point out that I, myself, have committed some
of the errors listed below, and despite my best efforts I may well fall prey to them
in the future.
In any particular paper, the constraints of researcher time and page limits force
authors to select a subset of these approaches to validation. The taxonomy of
paper types below can provide you with considerable guidance in choosing ap-
propriate validation approaches, leading to a paper structure where your results
back up your claims. The five paper types guide the presentation of your research
by distinguishing between the following possibilities for your primary contribu-
tion: an algorithm, a design, a system, a user study, or a model.
2.2 Technique
Technique papers focus on novel algorithms, and an implementation is expected.
The most straightforward case is where the research contribution is a new algo-
rithm that refines or improves a technique proposed in previous work. A typical
claim is that the new algorithm is faster, more scalable, or provides better visual
quality than the previously proposed one. The MillionVis system [5], hierarchical
parallel coordinates [6], and hierarchical edge bundling [15] are good exemplars
for this category.
Typical results to back up such a claim would be algorithm complexity anal-
ysis, quantitative timing measurements of the implementation, and a qualitative
discussion of images created by the new algorithm. Quantitative metrics of image
quality, for example edge crossings in graph layout, are also appropriate. You
need to compare these results side by side against those from competing algo-
rithms. You might collect this information through some combination of using
results from previous publications, running publicly available code, or imple-
menting them yourself. In this case, there is very little or no design justification
for whether the technique is actually suitable for the proposed problem domain
in the paper itself: there is an implicit assumption that the previous cited work
makes such arguments.
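As an illustration of a quantitative image-quality metric, the sketch below counts edge crossings in a straight-line node-link layout. It is a minimal example under stated assumptions, not a method taken from the cited work: the function names, the brute-force pairwise test, and the toy four-node layouts are all hypothetical, and only proper crossings between edges that share no endpoint are counted.

```python
from itertools import combinations


def orientation(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1, -1, or 0 if collinear."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)


def segments_cross(a, b, c, d):
    """True if segments ab and cd properly cross (touching or overlap is ignored)."""
    return (orientation(a, b, c) * orientation(a, b, d) < 0
            and orientation(c, d, a) * orientation(c, d, b) < 0)


def count_edge_crossings(positions, edges):
    """Count crossing edge pairs in a straight-line layout (quadratic in edge count)."""
    crossings = 0
    for (u1, v1), (u2, v2) in combinations(edges, 2):
        if len({u1, v1, u2, v2}) < 4:
            continue  # edges that share a node are not counted as crossing
        if segments_cross(positions[u1], positions[v1],
                          positions[u2], positions[v2]):
            crossings += 1
    return crossings


# Hypothetical toy data: the complete graph on four nodes, laid out two ways.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a"), ("a", "c"), ("b", "d")]
square_layout = {"a": (0, 0), "b": (1, 0), "c": (1, 1), "d": (0, 1)}
nested_layout = {"a": (0, 0), "b": (1, 0), "c": (1, 1), "d": (0.75, 0.25)}

for name, layout in [("square", square_layout), ("nested", nested_layout)]:
    print(name, count_edge_crossings(layout, edges))
```

Reporting such a count for each competing layout on the same dataset gives one row of a side-by-side comparison. The pairwise loop is adequate for small examples; a sweep-line approach would be a more suitable choice for benchmark-sized graphs.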
In retrospect, a better name for this category might be Algorithms. Many
authors who design new visual representations might think that a paper docu-
menting a new technique belongs in the Technique category. However, the ques-
tion to ask is whether your primary contribution is the algorithm itself, or the
design. If your algorithm is sophisticated enough that it requires several pages
of description for replicability, then you probably have a primary algorithmic
Writing InfoVis Research Papers 137
2.4 Systems
Systems papers focus on the architectural choices made in the design of an in-
frastructure, framework, or toolkit. A systems paper typically does not introduce
new techniques or algorithms. A systems paper also does not introduce a new
design for an application that solves a specific problem; that would be a de-
sign study. The research contribution of a systems paper is the discussion of
architectural design choices and abstractions in a framework or library, not just
a single application. A good example is the prefuse systems paper [14], which
138 T. Munzner
2.5 Evaluation
Evaluation papers focus on assessing how an infovis system or technique is used
by some target population. Evaluation papers typically do not introduce new
techniques or algorithms, and often use implementations described in previous
work. The most common approach in infovis thus far has been formal user stud-
ies conducted in a laboratory setting, using carefully abstracted tasks that can be
quantitatively measured in terms of time and accuracy, and analyzed with sta-
tistical methods. A typical claim would be that the tested tasks are ecologically
valid; that is, they correspond to those actually undertaken by target users in
a target domain. A typical result would be a statistically significant main ef-
fect of an experimental factor, or interaction effect between factors. The work of
Yost and North on perceptual scalability is a good example of this subtype [44].
A different approach to studying user behavior is field studies, where a system
is deployed in a real-world setting with its target users. In these studies, the
number of participants is usually smaller, with no attempt to achieve statis-
tical significance, and the time span is usually weeks or months rather than
hours. However, the study design axes of field versus laboratory, short-term ver-
sus long-term, and size are all orthogonal. Both quantitative and qualitative
measurements may be collected. For example, usage patterns may be studied
through quantitative logging of mouse actions or eyegaze. The work of Hornbæk
and Hertzum on untangling fisheye menus is a good example of this subtype
[16]. Usage patterns can also be studied through qualitative observations during
the test itself or later via coding of videotaped sessions. Trafton et al.’s field
study of how meteorologists use visual representations is an excellent example
of the power of video coding [39].
2.6 Model
Model papers present formalisms and abstractions as opposed to the design
or evaluation of any particular technique or system. This category is for meta-
research papers, where the broad purpose is to help other researchers think about
their own work.
The most common subcategory is Taxonomy, where the goal is to propose
categories that help researchers better understand the structure of the space of
possibilities for some topic. Some boundaries will inevitably be fuzzy, but the
goal is to be as comprehensive and complete as possible. As opposed to a survey
paper, where the goal is simply to summarize the previous work, a taxonomy
paper proposes some new categorization or expands upon a previous one and
may presume the reader’s familiarity with the previous work. Good examples
are Card and Mackinlay’s taxonomy of visual encodings [3] and Amar et al.’s
task taxonomy [1].
A second subcategory is Formalism, for papers that present new models,
definitions, or terminology to describe techniques or phenomena. A key attribute
of these kinds of papers is reflective observation. The authors look at what is
going on in a field and provide a new way of thinking about it that is clear,
insightful, and summative. An influential example is the space-scale diagram
work of Furnas and Bederson [7], and an interesting recent example is the casual
infovis definition from Pousman et al. [31].
A third subcategory is Commentary, where the authors advocate a position
and argue to support it. Typical arguments would be “the field needs to do more
X”, “we should be pushing for more Y”, or “avoid doing Z because of these
drawbacks”. A good example is the fisheye followup from Furnas [8]. These kinds
of papers often cite many examples and may also introduce new terminology.
Model papers can provide both a valuable summary of a topic and a vocab-
ulary to more concisely discuss concepts in the area. They can be valuable for
both established researchers and newcomers to a field, and are often used as
assigned readings in courses. I think this category name is appropriate and do
not suggest changing it.
2.7 Combinations
These categories are not hard and fast: some papers are a mixture. For example,
a design study where the primary contribution is the design might include a
secondary contribution of summative evaluation in the form of a lab or field
study. Similarly, a design study may have a secondary contribution in the form of
a novel algorithm. Conversely, a technique paper where the primary contribution
is a novel algorithm may also include a secondary design contribution in the form
of a task analysis or design requirements. However, beware the Neither Fish Nor
Fowl pitfall I discuss below.
Carefully consider the primary contribution of your work to avoid the pitfalls
that arise from a mismatch between the strengths of your project and the paper
type you choose.
Application Bingo versus Design Study: Don’t apply some random tech-
nique to a new problem without thoroughly thinking about what the problem
is, whether the technique is suitable, and to what extent it solves the problem.
I define ‘application bingo’ as the game where you pick a narrowly defined
problem domain, a random technique, and then write an application paper with
the claim of novelty for this particular domain-technique combination. Applica-
tion bingo is a bad game to play because an overwhelming number of the many
combinatorial possibilities lead to a bad design.
Although application bingo is admittedly a caricature, the important ques-
tion is how we can distinguish those who inadvertently play it from those who
genuinely solve a domain problem with an effective visualization. Some visualiza-
tion venues distinguish between research papers and what are called applications
or case studies. This paper category is often implicitly or explicitly considered
to be a way to gather data from a community outside of visualization itself.
Although that goal is laudable, the mechanism has dangers. A very common
pitfall is that application paper submissions simply describe an instantiation of
a previous technique in great detail. Many do not have an adequate descrip-
tion of the domain problem. Most do not have an adequate justification of why
the technique is suitable for the problem. Most do not close the loop with a
validation that the proposed solution is effective for the target users.
In contrast, a strong design study would be rather difficult for an outsider
unfamiliar with the infovis literature to write. Two critical aspects require a
thorough understanding of the strengths and weaknesses of many visualization
techniques. First, although a guideline like “clearly state the problem” might
seem straightforward at first glance, the job of abstracting from a problem in
some target domain to design requirements that can be addressed through visu-
alization techniques requires knowing those techniques. Second, justifying why
the chosen techniques are more appropriate than other techniques again requires
knowledge of the array of possible techniques.
The flip side of this situation is that design studies where visualization re-
searchers do not have close contact with the target users are usually also weak. A
good methodology is collaboration between visualization researchers and target
users with driving problems [18, Chapter 3.4].
Neither Fish nor Fowl: Papers that try to straddle multiple categories often
fail to succeed in any of them. Be ruthlessly clear about identifying your most
important contribution as primary, and explicitly categorize any other contribu-
tions as secondary. Then make structural and validation choices based on the
category of the single primary contribution.
If you have chosen the design route, then a major concern in the middle stages
of a project should be whether your visual encoding choices are appropriate and
justifiable.
design requirements, it is very hard to convince a reader that your model will
solve the problem. In particular, you should consider how to make the case that
the structure you are visually showing actually benefits the target end user. For
example, many authors new to information visualization simply assert, without
justification, that showing the hyperlink structure of the web will benefit end
users who are searching for information. One of my own early papers fell prey
to this very pitfall [26]. However, after a more careful task analysis, I concluded
that most searchers do not need to build a mental model of the structure of the
search space, so showing them that structure adds cognitive load rather than
reduces it. In a later paper [25], I argued that a visual representation of that
hyperlink structure could indeed benefit a specific target community, that of
webmasters and content creators responsible for a particular site.
The foundation of information visualization is the characterization of how
known facts about human perception should guide visual encoding of abstract
datasets. The effectiveness of perceptual channels such as spatial position, color,
size, shape, and so on depends on whether the data to encode is categorical,
ordered, or quantitative [24]. Many individual perceptual channels are preatten-
tively processed in parallel, yet most combinations of these channels must be
serially searched [12]. Some perceptual channels are easily separable, but other
combinations are not [41, Chapter 5]. These principles, and many others, are
a critical part of infovis theory. The last three pitfalls in this section are a few
particularly egregious examples of ignoring this body of knowledge.
Hammer in Search of Nail: If you simply propose a nifty new technique with
no discussion of who might ever need it, it’s difficult to judge its worth. I am
not arguing that all new techniques need to be motivated by specific domain
problems: infovis research that begins from a technique-driven starting place
can be interesting and stimulating. Moreover, it may be necessary to build an
interactive prototype and use it for dataset exploration before it’s possible to
understand the capabilities of a proposed technique.
However, before you write up the paper about that hammer, I urge you to
construct an understanding of what kind of nails it can handle. Characterize, at
least with some high-level arguments, the kinds of problems where your new
technique shines as opposed to those where it performs poorly.
2D Good, 3D Better: The use of 3D rather than 2D for the spatial layout of
an abstract dataset requires careful justification that the benefits outweigh the
costs [36]. The use of 3D is easy to justify when a meaningful 3D representation is
implicit in the dataset, as in airflow over an airplane wing in flow visualization or
skeletal structure in medical visualization. The benefit of providing the familiar
view is clear, because it matches the mental model of the user. However, when the
spatial layout is chosen rather than given, as in the abstract datasets addressed
through infovis, there is an explicit choice about which variables to map to
spatial position. It is unacceptable, but all too common with naive approaches
to infovis, to simply assert that using an extra dimension must be a good idea.
The most serious problem with a 3D layout is occlusion. The ability to in-
teractively change the point of view with navigational controls does not solve
Color Cacophony: An infovis paper loses credibility when you make design de-
cisions with blatant disregard for basic color perception facts. Examples include
having huge areas of highly saturated color, hoping that color coding will be
distinguishable in tiny regions, using more nominal categories than the roughly
one dozen that can be distinguished with color coding, or using a sequential
scheme for diverging data. Using a red/green hue coding is justifiable only when
strong domain conventions exist, and should usually be redundantly coded with
luminance differences to be distinguishable to the 10% of men who are color-
blind. You should not attempt to visually encode three variables through the
three channels of red, green, and blue; they are not separable because they are
integrated by the visual system into a combined percept of color. These principles
have been clearly explained by many authors, including Ware [41, Chapter 4].
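One way to check whether a red/green pair is also separated in luminance is to compute the relative luminance of each color. The sketch below uses the sRGB relative luminance formula from the WCAG 2.x guidelines, chosen here for illustration rather than prescribed by the text; the function names and the example colors are hypothetical.

```python
def srgb_to_linear(channel):
    """Undo the sRGB transfer curve for one 8-bit channel value (0 to 255)."""
    c = channel / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance in [0, 1]: 0 for black, about 1 for white."""
    r, g, b = (srgb_to_linear(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def luminance_gap(rgb_a, rgb_b):
    """Absolute luminance difference between two colors."""
    return abs(relative_luminance(rgb_a) - relative_luminance(rgb_b))


# Hypothetical pairs: the first differs in hue but hardly in luminance,
# the second differs in both hue and luminance.
near_isoluminant = ((255, 0, 0), (0, 148, 0))
redundantly_coded = ((139, 0, 0), (144, 238, 144))

for label, (red, green) in [("near-isoluminant", near_isoluminant),
                            ("redundantly coded", redundantly_coded)]:
    print(label, round(luminance_gap(red, green), 3))
```

A pair with a small gap relies on hue alone, which is exactly the case that fails for red/green colorblind readers; a pair with a large gap remains distinguishable as dark versus light.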
Rainbows Just like in the Sky: The unjustified use of a continuous rainbow
colormap is a color pitfall so common that I give it a separate title. The most
critical problem is that the standard rainbow colormap is perceptually nonlinear.
A fixed range of values that are indistinguishable in the green region would
clearly show change in other regions such as where orange changes to yellow
or cyan changes to blue. Moreover, hue does not have an implicit perceptual
ordering, in contrast to other visual attributes such as greyscale or saturation. If
the important aspect of the information to be encoded is low-frequency change,
then use a colormap that changes from just one hue to another, or has a single hue
that changes saturation. If you are showing high-frequency information, where
it is important to distinguish and discuss several nameable regions, then a good
strategy is to explicitly quantize your data into a segmented rainbow colormap.
These ideas are discussed articulately by Rogowitz and Treinish [32].
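The two alternatives to a continuous rainbow can be sketched in a few lines. This is an illustrative example only: it uses the HSV color model from the Python standard library, which is convenient but not perceptually uniform, and the function names, default hue, and bin count are assumptions rather than recommendations from the cited work.

```python
import colorsys


def clamp01(value):
    """Restrict a value to the interval [0, 1]."""
    return min(max(value, 0.0), 1.0)


def single_hue_ramp(value, hue=0.6):
    """Single hue whose saturation grows with value: white at 0, saturated at 1."""
    return colorsys.hsv_to_rgb(hue, clamp01(value), 1.0)


def segmented_rainbow(value, n_segments=6):
    """Quantize value into n_segments bins, one discrete hue per bin.

    Returns (bin_index, rgb); hues run from blue (lowest bin) to red (highest).
    """
    if n_segments < 2:
        raise ValueError("need at least two segments")
    index = min(int(clamp01(value) * n_segments), n_segments - 1)
    hue = (2.0 / 3.0) * (1.0 - index / (n_segments - 1))
    return index, colorsys.hsv_to_rgb(hue, 1.0, 1.0)


# Hypothetical data values normalized to [0, 1].
for v in (0.0, 0.1, 0.5, 1.0):
    print(v, single_hue_ramp(v), segmented_rainbow(v))
```

In the segmented version, all values that fall in the same bin receive exactly the same color, so each region is nameable and the nonlinearity between hues no longer suggests spurious gradients within a region.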
Least Publishable Unit: Do not try to squeeze too many papers out of the
same project, where you parcel out some tiny increment of research contribution
beyond your own previous work. The determination of what is a paper-sized unit
of work is admittedly a very individual judgement call, and I will not attempt
to define the scope here. As a reviewer, I apply the “I know it when I see it”
standard.
Bad Slice and Dice: If you have done two papers’ worth of work and choose
to write two papers, you can still make the wrong choice about how to split up
the work between them. In this pitfall, neither paper is truly standalone, yet
both repeat too much content found in the other. Repartitioning can solve this
problem.
The tactical pitfalls are localized to one or a few sections, as opposed to the
paper-level strategy problems above.
I highly recommend having a sentence near the end of the introduction that
starts, “The contribution of this work is”, and using a bulleted list if there are
multiple contributions. More subtle ways of stating contributions, using verbs
like ‘present’ and ‘propose’, can make it more difficult for readers and reviewers
to ferret out which of your many sentences is that all-important contributions
statement. Also, do not assume that the reader can glean your overall contri-
butions from a close reading of the arguments in your previous work section.
While it is critical to have a clear previous work section that states how you ad-
dress the limitations of the previous work, as I discuss below, your paper should
clearly communicate your contributions even if the reader has skipped the entire
previous work section.
I find that articulating the contributions requires very careful consideration
and is one of the hardest parts of writing up a paper. They are often quite
different from the original goals of the project, and often can only be determined
in retrospect. What can we do that wasn’t possible before? How can we do
something better than before? What do we know that was unknown or unclear
before? The answers to these questions should guide all aspects of the paper,
from the high-level message to the choice of which details are worth discussing.
And yet, as an author I find that it’s hard to pin these down at the beginning of
the writing process. This reason is one of the many to start writing early enough
that there is time to refine through multiple drafts. After writing a complete
draft, then reading through it critically, I can better refine the contributions
spin in a next pass.
is not enough. You must explain why this previous work does not itself solve your
problem, and what specific limitations of that previous work your approach does
address. Every paper you cite in the previous work section is a fundamental chal-
lenge to the very existence of your project. Your job is to convince a skeptical
reader that the world needs your new thing because it is somehow better than a
particular old thing. Moreover, it’s not even enough to just make the case that
yours is different – yours must be better. The claims you make must, of course,
be backed up by your validation in a subsequent results section.
A good way to approach the previous work section is to tell a story to the
reader. Figure out the messages you want to get across to the
reader, in what order, and then use the references to help you tell this story. It
is possible to group the previous work into categories, and to usefully discuss
the limitations of the entire category.
Several pitfalls on how to validate your claims can occur in the results section
of your paper.
Unfettered by Time: Do not omit time performance from your writeup, be-
cause it is almost always interesting and worth documenting. The level of detail
at which you should report this result depends on the paper type and the contri-
bution claims. For instance, a very high-level statement like “interactive response
for all datasets shown on a desktop PC” may suffice for an evaluation paper or
a design study paper. However, for a technique paper with a contribution claim
of better performance than previous techniques, detailed comparison timings in
tables or charts would be a better choice.
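For the detailed end of that spectrum, a small harness that times each algorithm on datasets of increasing size produces the rows of such a table. The sketch below is a minimal illustration, not a prescribed benchmarking method: the function names are hypothetical, the built-in `sorted` stands in for the algorithm under test, and the harness assumes the algorithm does not modify its input in place.

```python
import statistics
import time


def time_algorithm(algorithm, make_dataset, sizes, repeats=5):
    """Return (size, median_seconds, best_seconds) rows for one algorithm."""
    rows = []
    for n in sizes:
        data = make_dataset(n)  # build the input outside the timed region
        samples = []
        for _ in range(repeats):
            start = time.perf_counter()
            algorithm(data)
            samples.append(time.perf_counter() - start)
        rows.append((n, statistics.median(samples), min(samples)))
    return rows


def format_table(name, rows):
    """Render timing rows as plain text with explicit dataset sizes."""
    lines = [f"{name}: items, median (s), best (s)"]
    for n, median_s, best_s in rows:
        lines.append(f"{n:>10d} {median_s:.6f} {best_s:.6f}")
    return "\n".join(lines)


def make_dataset(n):
    """Hypothetical stand-in dataset: n integers in descending order."""
    return list(range(n, 0, -1))


rows = time_algorithm(sorted, make_dataset, sizes=[1_000, 10_000])
print(format_table("sorted", rows))
```

Running the same harness over a competing implementation with the same datasets and hardware gives directly comparable numbers; the hardware and the dataset sizes should be stated alongside the table.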
Tiny Toy Datasets: Avoid using only tiny toy datasets in technique papers
that refine previously proposed visual encodings. While small synthetic bench-
marks can be useful for expository purposes, your validation should include
datasets of the same size used by state-of-the-art approaches. Similarly, you
should use datasets characteristic of those for your target application.
On the other hand, relatively small datasets may well be appropriate for a
user study, if they are carefully chosen in conjunction with some specific target
task and this choice is explained and justified.
But My Friends Liked It: Positive informal evaluation of a new infovis sys-
tem by a few of your infovis-expert labmates is not very compelling evidence
that a new technique is useful for novices or scientists in other domains. While
the guerrilla/discount methodology is great for finding usability problems with
products [27], a stronger approach would be informal evaluation with more rep-
resentative subjects, or formal evaluation with rigorous methodology.
Unjustified Tasks: Beware of running a user study where the tasks are not
justified. A study is not very interesting if it shows a nice result for a task
that nobody will ever actually do, or a task much less common or important
than some other task. You need to convince the reader that your tasks are a
reasonable abstraction of the real-world tasks done by your target users. If you
are the designer of one of the systems studied, be particularly careful to make a
convincing case that you did not cherry-pick tasks with a bias to the strengths
of your own system.
Deadly Detail Dump: When writing a paper, do not simply dump out all
the details and declare victory. The details are the how of what you did and do
belong at the heart of your paper. But you must first say what you did and why
you did it before the how. This advice holds at multiple levels. For the high-level
paper structure, start with motivation: why should I, the reader, care about
what you’ve done? Then provide an overview: a big-picture view of what you
did. The algorithmic details can then appear after the stage has been set. At the
section, subsection, and sometimes even paragraph level, stating the what before
the how will make your writing clearer.
My Picture Speaks for Itself: You should talk the reader through how your
visual representation exposes meaningful structure in the dataset, rather than
simply assuming the superiority of your method is obvious to all readers from
unassisted inspection of your result images. Technique and design study papers
usually include images in the results section showing the visual encodings cre-
ated by a technique or system on example datasets. The best way to carry out
this qualitative evaluation is to compare your method side-by-side with repre-
sentations created by competing methods on the same dataset.
Mistakes Were Made: Avoid the passive voice as much as possible. I call
out this particular grammar issue because it directly pertains to making your
research contribution clear. Is the thing under discussion part of your research
contribution, or something that was done or suggested by others? The problem
with the passive voice is its ambiguity: the reader does not have enough infor-
mation to determine who did something. This very ambiguity can be the lure
of the passive voice to a slothful or overly modest writer. I urge you to use the
active voice and make such distinctions explicitly.
Jargon Attack: Avoid jargon as much as possible, and if you must use it then
define it first. Definitions are critical both for unfamiliar terms or acronyms
and for standard English words being used in a specific technical sense.
Nonspecific Use of Large: Never just use the words ‘large’ or ‘huge’ to describe
a dataset or the scalability of a technique without giving numbers to clarify
the order of magnitude under discussion: hundreds, tens of thousands, millions?
Every author has a different idea of what these words mean, ranging from 128
to billions, so be specific. Also, you should provide the size of all datasets used
in results figures, so that readers don’t have to count dots in an image to guess
the numbers.
Finally, I caution against pitfalls at the very end of the project, when submitting
your paper.
6 Pitfalls By Generality
7 Conclusion
I have advocated an approach to conducting infovis research that begins with an
explicit consideration of paper types. I have exhorted authors to avoid pitfalls
at several stages of the research process, including visual encoding during design,
a checkpoint before starting to write, and after a full paper draft exists. My
description and categorization of these pitfalls reflects my own experiences as
author, reviewer, and papers chair. I offer it in hopes of steering and stimulating
discussion in our field.
Acknowledgments. This paper has grown in part from a series of talks. The
impetus to begin articulating my thoughts was the Publishing Your Visualiza-
tion Research panel at the IEEE Visualization 2006 Doctoral Colloquium. I
benefited from discussion with many participants at the 2007 Dagstuhl Seminar
on Information Visualization and the 2007 Visual Interactive Effective Worlds
workshop at the Lorentz Center, after speaking on the same topic. I also appre-
ciate a discussion with Pat Hanrahan and Lyn Bartram on design studies. John
Stasko’s thoughts considerably strengthened my discussion of model papers. I
thank Aaron Barsky, Hamish Carr, Jeff Heer, Stephen Ingram, Ciarán Llachlan
Leavitt, Peter McLachlan, James Slack, and Matt Ward for feedback on paper
drafts. I particularly thank Heidi Lam, Torsten Möller, John Stasko, and Melanie
Tory for extensive discussions beyond the call of duty.
References
1. Amar, R., Eagan, J., Stasko, J.: Low-level components of analytic activity in in-
formation visualization. In: Proc. IEEE Symposium on Information Visualization
(InfoVis), pp. 111–117 (2005)
2. Becker, R.A., Cleveland, W.S., Shyu, M.J.: The visual design and control of trellis
display. Journal of Computational and Graphical Statistics 5, 123–155 (1996)
3. Card, S., Mackinlay, J.: The structure of the information visualization design
space. In: Proc. IEEE Symposium on Information Visualization (InfoVis), pp.
92–99 (1997)
22. Levin, R., Redell, D.D.: An evaluation of the ninth SOSP submissions; or, how
(and how not) to write a good systems paper. ACM SIGOPS Operating Systems
Review 17(3), 35–40 (1983),
http://www.usenix.org/events/samples/submit/advice.html
23. MacEachren, A., Dai, X., Hardisty, F., Guo, D., Lengerich, G.: Exploring high-D
spaces with multiform matrices and small multiples. In: Proc. IEEE Symposium
on Information Visualization (InfoVis), pp. 31–38 (2003)
24. Mackinlay, J.D.: Automating the Design of Graphical Presentations of Relational
Information. ACM Trans. on Graphics (TOG) 5(2), 111–141 (1986)
25. Munzner, T.: Drawing large graphs with H3Viewer and Site Manager. In: White-
sides, S.H. (ed.) GD 1998. LNCS, vol. 1547, pp. 384–393. Springer, Heidelberg
(1999)
26. Munzner, T., Burchard, P.: Visualizing the structure of the world wide web in
3D hyperbolic space. In: Proc. Virtual Reality Modelling Language Symposium
(VRML), pp. 33–38. ACM SIGGRAPH (1995)
27. Nielsen, J.: Guerrilla HCI: Using discount usability engineering to penetrate the
intimidation barrier. In: Bias, R.G., Mayhew, D.J. (eds.) Cost-justifying usability,
pp. 245–272. Academic Press, London (1994)
28. Partridge, C.: How to increase the chances your paper is accepted at ACM
SIGCOMM. ACM SIGCOMM Computer Communication Review 28(3) (1998),
http://www.sigcomm.org/conference-misc/author-guide.html
29. Plaisant, C.: The challenge of information visualization evaluation. In: Proc. Ad-
vanced Visual Interfaces (AVI), pp. 109–116 (2004)
30. Plumlee, M., Ware, C.: Zooming versus multiple window interfaces: Cognitive
costs of visual comparisons. ACM Trans. on Computer-Human Interaction
(ToCHI) 13(2), 179–209 (2006)
31. Pousman, Z., Stasko, J.T., Mateas, M.: Casual information visualization: Depic-
tions of data in everyday life. IEEE Trans. Visualization and Computer Graphics
(TVCG) (Proc. InfoVis 07) 13(6), 1145–1152 (2007)
32. Rogowitz, B.E., Treinish, L.A.: How not to lie with visualization. Computers In
Physics 10(3), 268–273 (1996),
http://www.research.ibm.com/dx/proceedings/pravda/truevis.htm
33. Shaw, M.: Mini-tutorial: Writing good software engineering research papers.
In: Proc. Intl. Conf. on Software Engineering (ICSE), pp. 726–736 (2003),
http://www.cs.cmu.edu/~Compose/shaw-icse03.pdf
34. Shewchuk, J.: Three sins of authors in computer science and math (1997),
http://www.cs.cmu.edu/jrs/sins.html
35. Shneiderman, B., Plaisant, C.: Strategies for evaluating information visualization
tools: Multi-dimensional in-depth long-term case studies. In: Proc. AVI Workshop
on BEyond time and errors: novel evaLuation methods for Information Visualiza-
tion (BELIV), pp. 38–43 (2006)
36. Smallman, H.S., John, M.S., Oonk, H.M., Cowen, M.B.: Information availability in
2D and 3D displays. IEEE Computer Graphics and Applications (CG&A) 21(5),
51–57 (2001)
37. Tang, D., Stolte, C., Bosch, R.: Design choices when architecting visualizations.
Information Visualization 3(2), 65–79 (2004)
38. Tory, M., Kirkpatrick, A.E., Atkins, M.S., Möller, T.: Visualization task perfor-
mance with 2D, 3D, and combination displays. IEEE Trans. Visualization and
Computer Graphics (TVCG) 12(1), 2–13 (2006)
39. Trafton, J.G., Kirschenbaum, S.S., Tsui, T.L., Miyamoto, R.T., Ballas, J.A., Ray-
mond, P.D.: Turning pictures into numbers: Extracting and generating informa-
tion from complex visualizations. Intl. Journ. Human Computer Studies 53(5),
827–850 (2000)
40. van Wijk, J.J., van Selow, E.R.: Cluster and calendar based visualization of time
series data. In: Proc. IEEE Symposium on Information Visualization (InfoVis),
pp. 4–9 (1999)
41. Ware, C.: Information Visualization: Perception for Design, 2nd edn. Morgan
Kaufmann/Academic Press, London (2004)
42. Weaver, C., Fyfe, D., Robinson, A., Holdsworth, D.W., Peuquet, D.J.,
MacEachren, A.M.: Visual analysis of historic hotel visitation patterns. Infor-
mation Visualization 6(1), 89–103 (2007)
43. Wilkinson, L., Anand, A., Grossman, R.: Graph-theoretic scagnostics. In: Proc.
IEEE Symposium on Information Visualization (InfoVis), pp. 157–164 (2005)
44. Yost, B., North, C.: The perceptual scalability of visualization. IEEE Trans. Vi-
sualization and Computer Graphics (TVCG) (Proc. InfoVis 06) 12(5), 837–844
(2006)
Visual Analytics:
Definition, Process, and Challenges
A. Kerren et al. (Eds.): Information Visualization, LNCS 4950, pp. 154–175, 2008.
© Springer-Verlag Berlin Heidelberg 2008
Due to information overload, time and money are wasted, and scientific and
industrial opportunities are lost, because we still lack the ability to deal with
the enormous data volumes properly. People in both their business and private
lives (decision-makers, analysts, engineers, and emergency response teams alike)
are often confronted with massive amounts of disparate, conflicting, and dynamic
information, which is available from multiple heterogeneous sources. We want to
simply and effectively exploit and use the hidden opportunities and knowledge
resting in unexplored data sources.
In many application areas success depends on the right information being
available at the right time. Nowadays, the acquisition of raw data is no longer
the driving problem: it is the ability to identify methods and models that can
turn the data into reliable and provable knowledge. Any technology that claims
to overcome the information overload problem has to provide answers to the
following questions:
– Who or what defines the “relevance of information” for a given task?
– How can appropriate procedures in a complex decision making process be
identified?
– How can the resulting information be presented in a decision- or task-oriented
way?
– What kinds of interaction can facilitate problem solving and decision mak-
ing?
With every new “real-life” application, procedures are put to the test, possibly
under circumstances completely different from the ones under which they were
established. Awareness of the problem of how to understand and analyse
our data has greatly increased in the last decade. Even as we implement
more powerful tools for automated data analysis, we still face the problem of un-
derstanding and “analysing our analyses” in the future: Fully-automated search,
filter and analysis only work reliably for well-defined and well-understood prob-
lems. The path from data to decision is typically quite complex. Even as fully-
automated data processing methods represent the knowledge of their creators,
they lack the ability to communicate their knowledge. This ability is crucial: If
decisions that emerge from the results of these methods turn out to be wrong,
it is especially important to examine the procedures.
The overarching driving vision of Visual Analytics is to turn the information
overload into an opportunity: just as information visualization has changed our
view on databases, the goal of Visual Analytics is to make our way of processing
data and information transparent for an analytic discourse. The visualization of
these processes will provide the means of communicating about them, instead
of being left with only the results. Visual Analytics will foster the
constructive evaluation, correction and rapid improvement of our processes and
models and, ultimately, the improvement of our knowledge and our decisions
(see Figure 1).
On a grand scale, visual analytics solutions provide technology that combines
the strengths of human and electronic data processing. Visualization becomes
the medium of a semi-automated analytical process, where humans and machines
cooperate using their respective distinct capabilities for the most effective results.
156 D. Keim et al.
Fig. 1. Tight integration of visual and automatic data analysis methods with database
technology for a scalable interactive decision support.
The user has to be the ultimate authority in giving the direction of the analysis
along his or her specific task. At the same time, the system has to provide
effective means of interaction to concentrate on this specific task. On top of
that, in many applications different people work along the path from data to
decision. A visual representation will sketch this path and provide a reference
for their collaboration across different tasks and abstraction levels.
The diversity of these tasks cannot be tackled with a single theory. Visual
analytics research is highly interdisciplinary and combines various related
research areas such as visualization, data mining, data management, data fusion,
statistics and cognitive science (among others). Visualization has to
continuously challenge the perception, held in many of the applying sciences,
that visualization is not a scientific discipline in its own right. Even where
there is awareness that scientific analysis and results must be visualized in
one way or another, this often results in ad hoc solutions by application
scientists, which rarely match the state of the art in interactive
visualization science, much less the full complexity of the problems. In fact,
all related research areas in the context of visual analytics research conduct
rigorous, serious science, each in a vibrant research community. Increasing the
awareness of their work and its implications for visual analytics research
clearly emerges as one main goal of the international visual analytics
community (see Figure 2).
Because visual analytics research can be regarded as an integrating discipline,
application-specific research areas should contribute their existing procedures
and models. Emerging from highly application-oriented research, dispersed
research communities have worked on specific solutions using the repertoire and
standards of their specific fields. The requirements of visual analytics
introduce new dependencies between these fields.
Fig. 2. Visual analytics integrates scientific disciplines to improve the division of labor
between human and machine.
Fig. 3. Visual Analytics integrates Scientific and Information Visualization with core
adjacent disciplines: Data management and analysis, spatio-temporal data, and human
perception and cognition. Successful Visual Analytics research also depends on the
availability of appropriate infrastructure and evaluation facilities.
3.1 Visualization
Visualization has emerged as a new research discipline during the last two
decades. It can be broadly classified into Scientific and Information
Visualization.
In Scientific Visualization, the data entities to be visualized are typically 3D
geometries or can be understood as scalar, vectorial, or tensorial fields with
explicit references to time and space. A survey of current visualization
techniques can be found in [22,35,23]. Often, 3D scalar fields are visualized by
isosurfaces or semi-transparent point clouds (direct volume rendering) [15]. To
this end, methods based on optical emission or absorption models are used, which
visualize the volume by ray-tracing or projection. Also, in recent years
significant work has focused on the visualization of complex 3-dimensional flow
data relevant, e.g., in aerospace engineering [40]. While current research has
focused mainly on the efficiency of visualization techniques to enable
interactive exploration, methods to automatically derive relevant visualization
parameters are increasingly coming into the focus of research. Also, interaction
techniques such as focus&context [28] are gaining importance in scientific
visualization.
Information Visualization during the last decade has developed methods
for the visualization of abstract data where no explicit spatial references are
given [38,8,24,41]. Typical examples include business data, demographics data,
network graphs and scientific data from, e.g., molecular biology. The data
considered often comprises hundreds of dimensions and does not have a natural
mapping to display space, which renders standard visualization techniques such
as (x, y) plots, line charts and bar charts ineffective. Therefore, novel
visualization techniques are being developed, employing, e.g., Parallel
Coordinates and their numerous extensions [20], Treemaps [36], and glyph-based
[17] and pixel-based [25] visual data representations. Data with inherent
network structure may be visualized using graph-based approaches. In many
visualization application areas, the typically huge volumes of data require the
appropriate use of automatic data analysis techniques such as clustering or
classification as preprocessing prior to visualization. Research in this
direction is just emerging.
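As an illustration of how multidimensional records without a natural spatial
mapping can be placed in display space, the following minimal sketch computes
the polyline vertices of a Parallel Coordinates plot. The function name, the
record layout and the min-max scaling are assumptions made for this example and
are not taken from [20]; rendering and axis reordering are left out.

```python
from typing import Dict, List, Sequence, Tuple


def parallel_coordinates_polylines(
    records: Sequence[Dict[str, float]],
    dimensions: Sequence[str],
) -> List[List[Tuple[float, float]]]:
    """Map each record to a polyline with one vertex per dimension.

    x is the position of the axis (evenly spaced in 0..1) and y is the
    record's value on that axis, min-max normalized to 0..1.
    """
    if not records or not dimensions:
        return []

    # Per-dimension value range, used to scale every axis independently.
    lows = {d: min(r[d] for r in records) for d in dimensions}
    highs = {d: max(r[d] for r in records) for d in dimensions}

    n_axes = len(dimensions)
    polylines = []
    for record in records:
        line = []
        for i, d in enumerate(dimensions):
            x = i / (n_axes - 1) if n_axes > 1 else 0.0
            span = highs[d] - lows[d]
            # A constant dimension has no spread; place it mid-axis (assumption).
            y = (record[d] - lows[d]) / span if span > 0 else 0.5
            line.append((x, y))
        polylines.append(line)
    return polylines
```

Each returned polyline can be drawn across the vertical axes; with hundreds of
dimensions or many records, the clustering or classification step mentioned
above would typically be applied first to reduce overplotting.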
technology. But the availability of heterogeneous data not only requires the
mapping of database schemata but also includes the cleaning and harmonization of
uncertain and missing data in the volumes of heterogeneous data. Modern
applications require such intelligent data fusion to be feasible in near
real-time and as automatically as possible [32]. New forms of information
sources such as data streams [11], sensor networks [30] or automatic extraction
of information from large document collections (e.g., text, HTML) result in a
difficult data analysis problem, the support of which is currently a focus of
database research [43].
The relationship between Data Management, Data Analysis and Visualization
is such that Data Management techniques increasingly rely on intelligent data
analysis techniques, as well as on interaction and visualization, to arrive at
optimal results. On the other hand, modern database systems provide the input
data sources which are to be visually analyzed.
regarding the main data types and user tasks [2] to be supported are highly
desirable for shaping visual analytics research. A common understanding of data
and problem dimensions and structure, and acceptance of evaluation standards,
will make research results more comparable and improve research productivity.
Also, there is an obvious need to build repositories of available analysis and
visualization algorithms, which researchers can build upon in their work without
having to re-implement already proven solutions.
How to assess the value of visualization is a topic of lively debate [42,33]. A
common ground that can be used to position and compare future developments
in the field of data analysis is needed. The current diversification and
dispersion of visual analytics research and development has resulted from its
focus on specific application areas. While this approach may suit the
requirements of each of these applications, a more rigorous and overall
scientific perspective will lead to a better understanding of the field and a
more effective and efficient development of innovative methods and techniques.
3.7 Sub-communities
Spatio-Temporal Data: While many different data types exist, one of the
most prominent and ubiquitous is data with references to time and space. The
importance of this data type has been recognized by a research community which
formed around spatio-temporal data management and analysis [14]. Geospatial data
research considers data with references in the real world, coming from, e.g.,
geographic measurements, GPS position data, and remote sensing applications.
Finding spatial relationships and patterns in this data is of special interest,
requiring the development of appropriate management, representation and analysis
functions. For example, developing efficient data structures and defining
distance and similarity functions are a focus of research. Visualization often
plays a key role in the successful analysis of geospatial data [6,26].
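One common example of a distance function for GPS position data is the
great-circle (haversine) distance. The sketch below is an illustration only: it
assumes a spherical Earth with a mean radius of 6371 km, and the function name
and constant are chosen for this example rather than taken from the cited work.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points
    given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # Clamp to 1.0 to guard against rounding error for near-antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))
```

Such a function can serve as the distance measure for spatial clustering or
nearest-neighbour queries; for higher accuracy an ellipsoidal model would be
needed.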
In temporal data, the data elements can be regarded as a function of time.
Important analysis tasks here include the identification of patterns (either
linear or periodic), trends and correlations of the data elements over time.
Application-dependent analysis functions and similarity metrics have been
proposed in fields such as finance, science, and engineering. Again,
visualization of time-related data is important to arrive at good analysis
results [1].
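A simple way to look for periodic patterns is to aggregate events over the
weekly and daily cycles. The following minimal sketch counts timestamps per
weekday and per hour of day; the function name and the choice of these two
cycles are assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, List, Tuple


def cyclic_histograms(
    timestamps: Iterable[datetime],
) -> Tuple[List[int], List[int]]:
    """Return (counts per weekday, counts per hour of day).

    The weekday list has 7 bins (Monday is index 0); the hour list has
    24 bins (0..23).
    """
    by_weekday: Counter = Counter()
    by_hour: Counter = Counter()
    for ts in timestamps:
        by_weekday[ts.weekday()] += 1
        by_hour[ts.hour] += 1
    weekly = [by_weekday[d] for d in range(7)]
    daily = [by_hour[h] for h in range(24)]
    return weekly, daily
```

Plotting the two lists as bar charts gives temporal histograms in which
recurring peaks, such as weekday mornings, become visible.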
The analysis of data with references both in space and in time is a chal-
lenging research topic. Major research challenges include [4]: scale, as it is often
necessary to consider spatio-temporal data at different spatio-temporal scales;
the uncertainty of the data as data are often incomplete, interpolated, collected
at different times, or based upon different assumptions; complexity of geograph-
ical space and time, since in addition to metric properties of space and time
and topological/temporal relations between objects, it is necessary to take into
account the heterogeneity of the space and structure of time; and complexity of
spatial decision making processes, because a decision process may involve
heterogeneous actors with different roles, interests, levels of knowledge of the
problem domain and the territory.
Network and Graph Data: Graphs are flexible and powerful mathematical tools
for modeling real-life situations. They naturally map to transportation networks
and electric power grids, and they are also used as artifacts to study
complex data such as observed interactions between people, or induced
interactions between various biological entities. Graphs are successful at
turning semantic proximity into topological connectivity, making it possible to
address issues based on algorithmics and combinatorial analysis.
Graphs appear as essential modeling and analytical objects, and as effective
visual analytics paradigms. Major research challenges are to produce scalable
analytical methods to identify key components both structurally and visually.
Efforts are needed to design processes capable of dealing with large datasets
while producing readable and usable graphical representations that allow proper
user interaction. Special efforts are required to deal with dynamically changing
networks, in order to assess structural changes at various scales.
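As a small illustration of identifying components structurally, the sketch
below finds the connected components of an undirected graph by breadth-first
search and counts node degrees. The function names and the edge-list input
format are assumptions for this example; the text does not prescribe a
particular method, and large or dynamic networks call for more scalable
techniques.

```python
from collections import defaultdict, deque
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def connected_components(edges: Iterable[Edge]) -> List[Set[Hashable]]:
    """Connected components of an undirected graph, largest first."""
    adjacency: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    seen: Set[Hashable] = set()
    components: List[Set[Hashable]] = []
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search collects every node reachable from start.
        seen.add(start)
        queue = deque([start])
        component: Set[Hashable] = set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    components.sort(key=len, reverse=True)
    return components


def node_degrees(edges: Iterable[Edge]) -> Dict[Hashable, int]:
    """Number of edge endpoints per node; a simple indicator of hubs."""
    degrees: Dict[Hashable, int] = defaultdict(int)
    for u, v in edges:
        degrees[u] += 1
        degrees[v] += 1
    return dict(degrees)
```

Components and high-degree nodes computed this way could then be highlighted or
collapsed in a node-link drawing to keep the representation readable.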
Fig. 4. The sense-making loop for Visual Analytics based on the simple model of
visualization by van Wijk [42].
5 Application Challenges
Visual Analytics is a highly application-oriented discipline driven by practical
requirements in important domains. Without attempting a complete survey of
all possible application areas, we sketch the potential applicability of Visual
Analytics technology in a few key domains.
In the Engineering domain, Visual Analytics can help to shorten development
time for products, materials, tools and production methods by offering
more effective, intelligent access to the wealth of complex information
resulting from prototype development, experimental test series, customers’
feedback, and many other performance metrics. One key goal of applied Visual
Analytics in the engineering domain will be the analysis of the complexity of
production systems in correlation with the achieved output, for an efficient and
effective improvement of the production environments.
Financial Analysis is a prototypical promising application area for Visual
Analytics. Analysts in this domain are confronted with streams of heterogeneous
information from different sources available at high update rates, and of varying
6 Technical Challenges
The primary goal of Visual Analytics is the analysis of vast amounts of data to
identify and visually distill the most valuable and relevant information content.
The visual representation should reveal structural patterns and relevant data
properties for easy perception by the analyst. A number of key requirements
need to be addressed by advanced Visual Analytics solutions. We next outline
important scientific challenges in this context.
Fig. 5. A visual display of a large amount of position records is unreadable and not
suitable for analysis.
Fig. 6. Positions of stops have been extracted from the database. By means of cluster-
ing, frequently visited places have been detected.
Fig. 7. The temporal histograms show the distribution of the stops in the frequently
visited places (Figure 6) with respect to the weekly (left) and daily (right) cycles.
Fig. 8. A result of clustering and summarization of movement data: the routes between
the significant places.
8 Conclusions
The problems addressed by Visual Analytics are generic. Virtually all sciences
and many industries rely on the ability to identify methods and models that
can turn data into reliable and provable knowledge. Ever since the dawn of
modern science, researchers have needed to find methodologies to create new
hypotheses, to compare them with alternative hypotheses, and to validate their
results. In a collaborative environment this process includes a large number of
specialized people, each having a different educational background. The ability
to communicate results to peers will become crucial for scientific discourse.
Currently, no technological approach can claim to give answers to all three
key questions that have been outlined in the first section, regarding the
– relevance of a specific piece of information
– adequacy of data processing methods and validity of results
– acceptability of the presentation of results for a given task
Visual Analytics research does not focus on specific methods to address these
questions in a single “best-practice”. Each specific domain contributes a reper-
toire of approaches to initiate an interdisciplinary creation of solutions.
Visual Analytics literally maps the connection between different alternative
solutions, leaving the opportunity for the human user to view these options in
the context of the complete knowledge generation process and to discuss these
options with peers on common ground.
References
1. Aigner, W., Miksch, S., Müller, W., Schumann, H., Tominski, C.: Visual meth-
ods for analyzing time-oriented data. IEEE Transactions on Visualization and
Computer Graphics 14(1), 47–60 (2008)
2. Amar, R.A., Eagan, J., Stasko, J.T.: Low-level components of analytic activity in
information visualization. In: INFOVIS, p. 15 (2005)
3. Amiel, M., Melançon, G., Rozenblat, C.: Réseaux multi-niveaux: l’exemple des
échanges aériens mondiaux. M@ppemonde 79(3) (2005)
4. Andrienko, G., Andrienko, N., Jankowski, P., Keim, D., Kraak, M.-J.,
MacEachren, A., Wrobel, S.: Geovisual analytics for spatial decision support:
Setting the research agenda. Special issue of the International Journal of Geo-
graphical Information Science 21(8), 839–857 (2007)
5. Andrienko, G., Andrienko, N., Wrobel, S.: Visual analytics tools for analysis of
movement data. ACM SIGKDD Explorations 9(2) (2007)
6. Andrienko, N., Andrienko, G.: Exploratory Analysis of Spatial and Temporal
Data. Springer, Heidelberg (2005)
7. Auber, D., Chiricota, Y., Jourdan, F., Melançon, G.: Multiscale visualization of
small world networks. In: INFOVIS (2003)
8. Card, S.K., Mackinlay, J., Shneiderman, B.: Readings in Information Visualiza-
tion: Using Vision to Think. Morgan Kaufmann, San Francisco (1999)
9. Ceglar, A., Roddick, J.F., Calder, P.: Guiding knowledge discovery through in-
teractive data mining, pp. 45–87. IGI Publishing, Hershey (2003)
10. Chiricota, Y., Melançon, G.: Visually mining relational data. International Review
on Computers and Software (2005)
11. Das, A., Gehrke, J., Riedewald, M.: Semantic approximation of data stream
joins. IEEE Transactions on Knowledge and Data Engineering 17(1), 44–59 (2005)
12. Dix, A., Finlay, J.E., Abowd, G.D., Beale, R.: Human-Computer Interaction,
3rd edn. Prentice-Hall, Inc., Upper Saddle River (2003)
13. Duda, R., Hart, P., Stork, D.: Pattern Classification. John Wiley and Sons Inc.,
Chichester (2000)
14. Dykes, J., MacEachren, A., Kraak, M.-J.: Exploring geovisualization. Elsevier
Science, Amsterdam (2005)
15. Engel, K., Hadwiger, M., Kniss, J.M., Rezk-salama, C., Weiskopf, D.: Real-time
Volume Graphics. A. K. Peters, Ltd., Natick (2006)
16. Ester, M., Sander, J.: Knowledge Discovery in Databases - Techniken und An-
wendungen. Springer, Heidelberg (2000)
17. Forsell, C., Seipel, S., Lind, M.: Simple 3d glyphs for spatial multivariate data.
In: INFOVIS, p. 16 (2005)
18. Han, J., Kamber, M. (eds.): Data Mining: Concepts and Techniques. Morgan
Kaufmann, San Francisco (2000)
19. Hand, D., Mannila, H., Smyth, P. (eds.): Principles of Data Mining. MIT Press,
Cambridge (2001)
20. Inselberg, A., Dimsdale, B.: Parallel Coordinates: A Tool for Visualizing Multi-
variate Relations (chapter 9), pp. 199–233. Plenum Publishing Corporation, New
York (1991)
21. Jacko, J.A., Sears, A.: The Handbook for Human Computer Interaction. Lawrence
Erlbaum & Associates, Mahwah (2003)
22. Johnson, C., Hanson, C. (eds.): Visualization Handbook. Kolam Publishing (2004)
23. Keim, D., Ertl, T.: Scientific visualization (in german). Information Technol-
ogy 46(3), 148–153 (2004)
24. Keim, D., Ward, M.: Visual Data Mining Techniques (chapter 11). Springer, New
York (2003)
25. Keim, D.A., Ankerst, M., Kriegel, H.-P.: Recursive pattern: A technique for visu-
alizing very large amounts of data. In: VIS ’95: Proceedings of the 6th conference
on Visualization ’95, Washington, DC, USA, p. 279. IEEE Computer Society
Press, Los Alamitos (1995)
26. Keim, D.A., Panse, C., Sips, M., North, S.C.: Pixel based visual data mining of
geo-spatial data. Computers & Graphics 28(3), 327–344 (2004)
27. Kerren, A., Stasko, J.T., Fekete, J.-D., North, C.J. (eds.): Information Visualiza-
tion. LNCS, vol. 4950. Springer, Heidelberg (2008)
28. Krüger, J., Schneider, J., Westermann, R.: Clearview: An interactive context
preserving hotspot visualization technique. IEEE Transactions on Visualization
and Computer Graphics 12(5), 941–948 (2006)
29. Maimon, O., Rokach, L. (eds.): The Data Mining and Knowledge Discovery Hand-
book. Springer, Heidelberg (2005)
30. Meliou, A., Chu, D., Guestrin, C., Hellerstein, J., Hong, W.: Data gathering tours
in sensor networks. In: IPSN (2006)
31. Mitchell, T.M.: Machine Learning. McGraw-Hill, Berkeley (1997)
32. Naumann, F., Bilke, A., Bleiholder, J., Weis, M.: Data fusion in three steps:
Resolving schema, tuple, and value inconsistencies. IEEE Data Eng. Bull. 29(2),
21–31 (2006)
33. North, C.: Toward measuring visualization insight. IEEE Comput. Graph.
Appl. 26(3), 6–9 (2006)
34. Perner, P. (ed.): Data Mining on Multimedia Data. LNCS, vol. 2558. Springer,
Heidelberg (2002)
35. Schumann, H., Müller, W.: Visualisierung - Grundlagen und allgemeine Metho-
den. Springer, Heidelberg (2000)
36. Shneiderman, B.: Tree visualization with tree-maps: 2-d space-filling approach.
ACM Trans. Graph. 11(1), 92–99 (1992)
37. Shneiderman, B., Plaisant, C.: Designing the User Interface. Addison-Wesley,
Reading (2004)
38. Spence, R.: Information Visualization. ACM Press, New York (2001)
39. Thomas, J.J., Cook, K.A.: Illuminating the Path. IEEE Computer Society Press,
Los Alamitos (2005)
40. Tricoche, X., Scheuermann, G., Hagen, H.: Tensor topology tracking: A visual-
ization method for time-dependent 2d symmetric tensor fields. Comput. Graph.
Forum 20(3) (2001)
41. Unwin, A., Theus, M., Hofmann, H.: Graphics of Large Datasets: Visualizing a
Million (Statistics and Computing). Springer, New York (2006)
42. van Wijk, J.J.: The value of visualization. In: IEEE Visualization, p. 11 (2005)
43. Widom, J.: Trio: A system for integrated management of data, accuracy, and
lineage. In: CIDR, pp. 262–276 (2005)
44. Yi, J.S., Kang, Y.a., Stasko, J.T., Jacko, J.A.: Toward a deeper understanding
of the role of interaction in information visualization. IEEE Trans. Vis. Comput.
Graph. 13(6), 1224–1231 (2007)
Author Index