Abstract

This paper presents a corpus study that explores the extent to which captions con-

[Figure: “Local bankruptcy personal filings” (graphic not reproduced; axis ticks 2500 and 3000 survive)]
Proceedings of the 43rd Annual Meeting of the ACL, pages 223–230, Ann Arbor, June 2005. ©2005 Association for Computational Linguistics
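Figure 2: Two Alternative Graphs from the Same Data
[Graphic not reproduced. Panel (a) orders the age-group bars 0−6, 7−19, 20−34, 35−49, 50−64, 65−79, 80+; panel (b) orders them 80+, 0−6, 65−79, 7−19, 50−64, 20−34, 35−49. Value-axis ticks in both panels: 5, 10, 15.]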
uation showing the system’s success, with particular attention given to the impact of evidence from shallow processing of the caption, and Section 7 discusses future work.

Although we believe that our findings are extendible to other kinds of information graphics, our current work focuses on bar charts. This research is part of a larger project whose goal is a natural language system that will provide effective access to information graphics for individuals with sight impairments, by inferring the intended message underlying the graphic, providing an initial summary of the graphic that includes the intended message along with notable features of the graphic, and then responding to follow-up questions from the user.

2 Related Work

Our work is related to efforts on graph summarization. (Yu et al., 2002) used pattern recognition techniques to summarize interesting features of automatically generated graphs of time-series data from a gas turbine engine. (Futrelle and Nikolakis, 1995) developed a constraint grammar for parsing vector-based visual displays and producing representations of the elements comprising the display. The goal of Futrelle’s project is to produce a graphic that summarizes one or more graphics from a document (Futrelle, 1999). The summary graphic might be a simplification of a graphic or a merger of several graphics from the document, along with an appropriate summary caption. Thus the end result of summarization will itself be a graphic. The long range goal of our project, on the other hand, is to provide alternative access to information graphics via an initial textual summary followed by an interactive follow-up component for additional information. The intended message of the graphic will be an important component of the initial summary, and hypothesizing it is the goal of our current work.

3 Evidence about the Intended Message

The graphic designer has many alternative ways of designing a graphic; different designs contain different communicative signals and thus convey different communicative intents. For example, consider the two graphics in Figure 2. The graphic in Figure 2a conveys that average doctor visits per year is U-shaped by age; it starts out high when one is very young, decreases into middle age, and then rises again as one ages. The graphic in Figure 2b presents the same data; but instead of conveying a trend, this graphic seems to convey that the elderly and the young have the highest number of doctor visits per year. These graphics illustrate how choice of design affects the message that the graphic conveys.

Following the AutoBrief work (Kerpedjiev and Roth, 2000) (Green et al., 2004) on generating graphics that fulfill communicative goals, we hypothesize that the designer chooses a design that best facilitates the perceptual and cognitive tasks that are most important to conveying his intended message, subject to the constraints imposed by competing tasks. By perceptual tasks we mean tasks that can be performed by simply viewing the graphic, such as finding the top of a bar in a bar chart; by cognitive tasks we mean tasks that are done via mental computations, such as computing the difference between two numbers.

Thus one source of evidence about the intended message is the relative difficulty of the perceptual tasks that the viewer would need to perform in order to recognize the message. For example, determining
the entity with maximum value in a bar chart will be easiest if the bars are arranged in ascending or descending order of height. We have constructed a set of rules, based on research by cognitive psychologists, that estimate the relative difficulty of performing different perceptual tasks; these rules have been validated by eye-tracking experiments and are presented in (Elzer et al., 2004).

Another source of evidence is entities that have been made salient in the graphic by some kind of focusing device, such as coloring some elements of the graphic, annotations such as an asterisk, or an arrow pointing to a particular location in a graphic. Entities that have been made salient suggest particular instantiations of perceptual tasks that the viewer is expected to perform, such as comparing the heights of two highlighted bars in a bar chart.

And lastly, one would expect captions to help convey the intended message of an information graphic. The next section describes a corpus study that we performed in order to explore the usefulness of captions and how we might exploit evidence from them.

4 A Corpus Study of Captions

Although one might suggest relying almost exclusively on captions to interpret an information graphic, (Corio and Lapalme, 1999) found in a corpus study that captions are often very general. The objective of their corpus study was to categorize the kinds of information in captions so that their findings could be used in forming rules for generating graphics with captions.

Our project is instead concerned with recognizing the intended message of an information graphic. To investigate how captions might be used in a system for understanding information graphics, we performed a corpus study in which we analyzed the first 100 bar charts from our corpus of information graphics; this corpus contains a variety of bar charts from different publication venues. The following subsections present the results of this corpus study.

4.1 Do Captions Convey the Intended Message?

Category                                       #
Category-1: Captures intention (mostly)       34
Category-2: Captures intention (somewhat)     15
Category-3: Hints at intention                 7
Category-4: No contribution to intention      44

Figure 3: Analysis of 100 Captions on Bar Charts

Our first investigation explored the extent to which captions capture the intended message of an information graphic. We extracted the first 100 graphics from our corpus of bar charts. The intended message of each bar chart had been previously annotated by two coders. The coders were asked to identify 1) the intended message of the graphic using a list of 12 high-level intentions (see Section 5 for examples) and 2) the instantiation of the parameters. For example, if the coder classified the intended message of a graphic as Change-trend, the coder was also asked to identify where the first trend began, its general slope (increasing, decreasing, or stable), where the change in trend occurred, the end of the second trend, and the slope of the second trend. If there was disagreement between the coders on either the intention or the instantiation of the parameters, we utilized consensus-based annotation (Ang et al., 2002), in which the coders discussed the graphic to try to come to an agreement. As observed by (Ang et al., 2002), this allowed us to include the “harder” or less obvious graphics in our study, thus lowering our expected system performance. We then examined the caption of each graphic, and determined to what extent the caption captured the graphic’s intended message. Figure 3 shows the results. 44% of the captions in our corpus did not convey to any extent the message of the information graphic. The following categorizes the purposes that these captions served, along with an example of each:

• general heading (8 captions): “UGI Monthly Gas Rates” on a graphic conveying a recent spike in home heating bills.

• reference to dependent axis (15 captions): “Lancaster rainfall totals for July” on a graphic conveying that July-02 was the driest of the previous decade.

• commentary relevant to graphic (4 captions): “Basic performers: One look at the best performing stocks in the Standard&Poor’s 500 index this year shows that companies with basic businesses are rewarding investors” on a
graphic conveying the relative rank of different stocks, some of which were basic businesses and some of which were not. This type of information was classified as deductive by (Corio and Lapalme, 1999) since it draws a conclusion from the data depicted in the graphic.

• commentary extending message of graphic (8 captions): “Profits are getting squeezed” on a graphic conveying that Southwest Airlines net income is estimated to increase in 2003 after falling the preceding three years. Here the commentary does not draw a conclusion from the data in the graphic but instead supplements the graphic’s message. However this type of caption would probably fall into the deductive class in (Corio and Lapalme, 1999).

• humor (7 captions): “The Sound of Sales” on a graphic conveying the changing trend (downward after years of increase) in record album sales. This caption has nothing to do with the change-trend message of the graphic, but appears to be an attempt at humor.

• conclusion unwarranted by graphic (2 captions): “Defense spending declines” on a graphic that in fact conveys that recent defense spending is increasing.

Slightly over half the captions (56%) contributed to understanding the graphic’s intended message. 34% were judged to convey most of the intended message. For example, the caption “Tennis players top nominees” appeared on a graphic whose intended message is to convey that more tennis players were nominated for the 2003 Laureus World Sports Award than athletes from any other sport. Since we argue that captions alone are insufficient for interpreting information graphics, in the few cases where it was unclear whether a caption should be placed in Category-1 or Category-2, we erred on the side of over-rating the contribution of a caption to the graphic’s intended message. For example, consider the caption “Chirac is riding high in the polls” which appeared on a graphic conveying that there has been a steady increase in Chirac’s approval ratings from 55% to about 75%. Although this caption does not fully capture the communicative intention of the graphic (since it does not capture the steady increase conveyed by the graphic), we placed it in the first category since one might argue that riding high in the polls would suggest both high and improving ratings.

15% of the captions were judged to convey only part of the graphic’s intended message; an example is “Drug spending for young outpace seniors” that appears on a graphic whose intended message appears to be that there is a downward trend by age for increased drug spending; we classified the caption in Category-2 since the caption fails to capture that the graphic is talking about percent increases in drug spending, not absolute drug spending, and that the graphic conveys the downward trend for increases in drug spending by age group, not just that increases for the young were greater than for the elderly.

7% of the captions were judged to only hint at the graphic’s message. An example is “GM’s Money Machine” which appeared on a graphic whose intended message was a contrast of recent performance against the previous trend, ie., that although there had been a steady decrease in the percentage of GM’s overall income produced by its finance unit, there was now a substantial increase in the percentage provided by the finance unit. Since the term money machine is a colloquialism that suggests making a lot of money, the caption was judged to hint at the graphic’s intended message.

4.2 Understanding Captions

For the 49 captions in Category 1 or 2 (where the caption conveyed at least some of the message of the graphic), we examined how well the caption could be parsed and understood by a natural language system. We found that 47% were fragments (for example, “A Growing Biotech Market”), or involved some other kind of ill-formedness (for example, “Running tops in sneaker wear in 2002” or “More seek financial aid”[1]). 16% would require extensive domain knowledge or analogical reasoning to understand. One example is “Chirac is riding high in the polls” which would require understanding the meaning of riding high in the polls. Another example is “Bad Moon Rising”; here the verb rising suggests that something is increasing, but the system would need to understand that a bad moon refers to something undesirable (in this case, delinquent loans).

[1] Here we judge the caption to be ill-formed due to the ellipsis since More should be More students.

4.3 Simple Evidence from Captions

Although our corpus analysis showed that captions can be helpful in understanding the message conveyed by an information graphic, it also showed that full understanding of a caption would be problematic; moreover, once the caption was understood, we would still need to relate it to the information extracted from the graphic itself, which appears to be a difficult problem.

Thus we began investigating whether shallow processing of the caption might provide evidence that could be effectively combined with other evidence obtained from the graphic itself. Our analysis provided the following observations:

• Verbs in a caption often suggest the kind of message being conveyed by the graphic. An example from our corpus is “Boating deaths decline”; the verb decline suggests that the graphic conveys a decreasing trend. Another example from our corpus is “American Express total billings still lag”; the verb lag suggests that the graphic conveys that some entity (in this case American Express) is ranked behind some others.

• Adjectives in a caption also often suggest the kind of message being conveyed by the graphic. An example from our corpus is “Air Force has largest percentage of women”; the adjective largest suggests that the graphic is conveying an entity whose value is largest. Adjectives derived from verbs function similarly to verbs. An example from our corpus is “Soaring Demand for Servers” which is the caption on a graphic that conveys the rapid increase in demand for servers. Here the adjective soaring is derived from the verb soar, and suggests that the graphic is conveying a strong increase.

• Nouns in a caption often refer to an entity that is a label on the independent axis. When this occurs, the caption brings the entity into focus and suggests that it is part of the intended message of the graphic. An example from our corpus is “Germans miss their marks” where the graphic displays a bar chart that is intended to convey that Germans are the least happy with the Euro. Words that usually appear as verbs, but are used in the caption as a noun, may function similarly to verbs. An example is “Cable On The Rise”; in this caption, rise is used as a noun, but suggests that the graphic is conveying an increase.

5 Utilizing Evidence

We developed and implemented a probabilistic framework for utilizing evidence from a graphic and its caption to hypothesize the graphic’s intended message. To identify the intended message of a new information graphic, the graphic is first given to a Visual Extraction Module (Chester and Elzer, 2005) that is responsible for recognizing the individual components of a graphic, identifying the relationship of the components to one another and to the graphic as a whole, and classifying the graphic as to type (bar chart, line graph, etc.); the result is an XML file that describes the graphic and all of its components.

Next a Caption Processing Module analyzes the caption. To utilize verb-related evidence from captions, we identified a set of verbs that would indicate each category of high-level goal[2], such as recover for Change-trend and beats for Relative-difference; we then extended the set of verbs by examining WordNet for verbs that were closely related in meaning, and constructed a verb class for each set of closely related verbs. Adjectives such as more and most were handled in a similar manner. The Caption Processing Module applies a part-of-speech tagger and a stemmer to the caption in order to identify nouns, adjectives, and the root form of verbs and adjectives derived from verbs. The XML representation of the graphic is augmented to indicate any independent axis labels that match nouns in the caption, and the presence of a verb or adjective class in the caption.

[2] As described in the next paragraph, there are 12 categories of high-level goals.

The Intention Recognition Module then analyzes the XML file to build the appropriate Bayesian network; the current system is limited to bar charts, but
the principles underlying the system should be extendible to other kinds of information graphics. The network is described in (Elzer et al., 2005). Very briefly, our analysis of simple bar charts has shown that the intended message can be classified into one of 12 high-level goals; examples of such goals include:

• Change-trend: Viewer to believe that there is a <slope-1> trend from <param1> to <param2> and a significantly different <slope-2> trend from <param3> to <param4>

• Relative-difference: Viewer to believe that the value of element <param1> is <comparison> the value of element <param2> where <comparison> is greater-than, less-than, or equal-to.

Each category of high-level goal is represented by a node in the network (whose parent is the top-level goal node), and instances of these goals (ie., goals with their parameters instantiated) appear as children with inhibitory links (Huber et al., 1994) capturing their mutual exclusivity. Each goal is broken down further into subtasks (perceptual or cognitive) that the viewer would need to perform in order to accomplish the goal of the parent node. The network is built dynamically when the system is presented with a new information graphic, so that nodes are added to the network only as suggested by the graphic. For example, low-level nodes are added for the easiest primitive perceptual tasks and for perceptual tasks in which a parameter is instantiated with a salient entity (such as an entity colored differently from others in the graphic or an entity that appears as a noun in the caption), since the graphic designer might have intended the viewer to perform these tasks; then higher-level goals that involve these tasks are added, until eventually a link is established to the top-level goal node.

Next evidence nodes are added to the network to capture the kinds of evidence noted in Sections 3 and 4.3. For example, evidence nodes are added to the network as children of each low-level perceptual task; these evidence nodes capture the relative difficulty (categorized as easy, medium, hard, or impossible) of performing the perceptual task as estimated by our effort estimation rules mentioned in Section 3, whether a parameter in the task refers to an entity that is salient in the graphic, and whether a parameter in the task refers to an entity that is a noun in the caption. An evidence node, indicating for each verb class whether that verb class appears in the caption (either as a verb, or as an adjective derived from a verb, or as a noun that can also serve as a verb) is added as a child of the top level goal node. Adjectives such as more and most that provide evidence are handled in a similar manner.

In a Bayesian network, conditional probability tables capture the conditional probability of a child node given the value of its parent(s). For example, the network requires the conditional probability of an entity appearing as a noun in the caption given that recognizing the intended message entails performing a particular perceptual task involving that entity. Similarly, the network requires the conditional probability, for each class of verb, that the verb class appears in the caption given that the intended message falls into a particular intention category. These probabilities are learned from our corpus of graphics, as described in (Elzer et al., 2005).

6 Evaluation

In this paper, we are particularly interested in whether shallow processing of captions can contribute to recognizing the intended message of an information graphic. As mentioned earlier, the intended message of each information graphic in our corpus of bar charts had been previously annotated by two coders. To evaluate our approach, we used leave-one-out cross validation. We performed a series of experiments in which each graphic in the corpus is selected once as the test graphic, the probability tables in the Bayesian network are learned from the remaining graphics, and the test graphic is presented to the system as a test case. The system was judged to fail if either its top-rated hypothesis did not match the intended message that was assigned to the graphic by the coders or the probability rating of the system’s top-rated hypothesis did not exceed 50%. Overall success was then computed by averaging together the results of the whole series of experiments.

Each experiment consisted of two parts, one in
which captions were not taken into account in the Bayesian network and one in which the Bayesian network included evidence from captions. Our overall accuracy without the caption evidence was 64.5%, while the inclusion of caption evidence increased accuracy to 79.1% for an absolute increase in accuracy of 14.6% and a relative improvement of 22.6% over the system’s accuracy without caption evidence. Thus we conclude that shallow processing of a caption provides evidence that can be effectively utilized in a Bayesian network to recognize the intended message of an information graphic.

Our analysis of the results provides some interesting insights on the role of elements of the caption. There appear to be two primary functions of verbs. The first is to reflect what is in the data, thereby strengthening the message that would be recognized without the caption. One example from our corpus is a graphic with the caption “Legal immigration to the U.S. has been rising for decades”. Although the early part of the graphic displays a change from decreasing immigration to a steadily increasing immigration trend, most of the graphic focuses on the decades of increasing immigration and the caption strengthens increasing trend in immigration as the intended message of the graphic. If we do not include the caption, our system hypothesizes an increasing trend message with a probability of 66.4%; other hypotheses include an intended message that emphasizes the change in trend with a probability of 15.3%. However, when the verb increasing from the caption is taken into account, the probability of increasing trend in immigration being the intended message rises to 97.9%.

Figure 4: A Graphic from Business Week[3]
[Graphic not reproduced. Horizontal bars labeled Visa, Mastercard, American Express, Discover, and Diner’s Club; horizontal axis “Total credit card purchases per year in billions” with ticks at 200, 400, and 600.]

[3] This is a slight variation of the graphic from Business Week. In the Business Week graphic, the labels sometimes appear on the bars and sometimes next to them, and the heading for the dependent axis appears in the empty white space of the graphic instead of below the values on the horizontal axis as we show it. Our vision system does not yet have heuristics for recognizing non-standard placement of labels and axis headings.

The second function of a verb is to focus attention on some aspect of the data. For example, consider the graphic in Figure 4. Without a caption, our system hypothesizes that the graphic is intended to convey the relative rank in billings of different credit card issuers and assigns it a probability of 72.7%. Other possibilities have some probability assigned to them. For example, the intention of conveying that Visa has the highest billings is assigned a probability of 26%. Suppose that the graphic had a caption of “Billings still lag”; if the verb lag is taken into account, our system hypothesizes an intended message of conveying the credit card issuer whose billings are lowest, namely Diner’s Club; the probability assigned to this intention is now 88.4%, and the probability assigned to the intention of conveying the relative rank of different credit card issuers drops to 7.8%. This is because the verb class containing lag appeared in our corpus as part of the caption for graphics whose message conveyed an entity with a minimum value, and not with graphics whose message conveyed the relative rank of all the depicted entities. On the other hand, if the caption is “American Express total billings still lag” (which is the caption associated with the graphic in our corpus), then we have two pieces of evidence from the caption: the verb lag, and the noun American Express which matches a label. In this case, the probabilities change dramatically; the hypothesis that the graphic is intended to convey the rank of American Express (namely third behind Visa and Mastercard) is assigned a probability of 76% and the probability drops to 24% that the graphic is intended to convey that Diner’s Club has the lowest billings. This is not surprising. The presence of the noun American Express in the caption makes that entity salient and is very strong evidence that the intended message places an emphasis on American Express, thus significantly affecting the probabilities of the different hypotheses. On the other hand, the verb class containing lag occurred both in the caption of graphics whose message was judged to convey the entity with the minimum value and in the caption of graphics
that conveyed an entity ranked behind some others. Therefore, conveying the entity with minimum value is still assigned a non-negligible probability.

7 Future Work

It is rare that a caption contains more than one verb class; when it does happen, our current system by default uses the first one that appears. We need to examine how to handle the occurrence of multiple verb classes in a caption. Occasionally, labels in the graphic appear differently in the caption. An example is DJIA (for Dow Jones Industrial Average) that occurs in one graphic as a label but appears as Dow in the caption. We need to investigate resolving such coreferences.

We currently limit ourselves to recognizing what appears to be the primary communicative intention of an information graphic; in the future we will also consider secondary intentions. We will also extend our work to other kinds of information graphics such as line graphs and pie charts, and to complex graphics, such as grouped and composite bar charts.

8 Summary

To our knowledge, our project is the first to investigate the problem of understanding the intended message of an information graphic. This paper has focused on the communicative evidence present in an information graphic and how it can be used in a probabilistic framework to reason about the graphic’s intended message. The paper has given particular attention to evidence provided by the graphic’s caption. Our corpus study showed that about half of all captions contain some evidence that contributes to understanding the graphic’s message, but that fully understanding captions is a difficult problem. We presented a strategy for extracting evidence from a shallow analysis of the caption and utilizing it, along with communicative signals from the graphic itself, in a Bayesian network that hypothesizes the intended message of an information graphic, and our results demonstrate the effectiveness of our methodology. Our research is part of a larger project aimed at providing alternative access to information graphics for individuals with sight impairments.

References

J. Ang, R. Dhillon, A. Krupski, E. Shriberg, and A. Stolcke. 2002. Prosody-based automatic detection of annoyance and frustration in human-computer dialog. In Proc. of the Int’l Conf. on Spoken Language Processing (ICSLP).

D. Chester and S. Elzer. 2005. Getting computers to see information graphics so users do not have to. To appear in Proc. of the 15th Int’l Symposium on Methodologies for Intelligent Systems.

H. Clark. 1996. Using Language. Cambridge University Press.

M. Corio and G. Lapalme. 1999. Generation of texts for information graphics. In Proc. of the 7th European Workshop on Natural Language Generation, 49–58.

S. Elzer, S. Carberry, N. Green, and J. Hoffman. 2004. Incorporating perceptual task effort into the recognition of intention in information graphics. In Proceedings of the 3rd Int’l Conference on Diagrams, LNAI 2980, 255–270.

S. Elzer, S. Carberry, I. Zukerman, D. Chester, N. Green, and S. Demir. 2005. A probabilistic framework for recognizing intention in information graphics. To appear in Proceedings of the Int’l Joint Conf. on AI (IJCAI).

R. Futrelle and N. Nikolakis. 1995. Efficient analysis of complex diagrams using constraint-based parsing. In Proc. of the Third International Conference on Document Analysis and Recognition.

R. Futrelle. 1999. Summarization of diagrams in documents. In I. Mani and M. Maybury, editors, Advances in Automated Text Summarization. MIT Press.

N. Green, G. Carenini, S. Kerpedjiev, J. Mattis, J. Moore, and S. Roth. 2004. AutoBrief: an experimental system for the automatic generation of briefings in integrated text and information graphics. International Journal of Human-Computer Studies, 61(1):32–70.

H. P. Grice. 1969. Utterer’s Meaning and Intentions. Philosophical Review, 68:147–177.

M. Huber, E. Durfee, and M. Wellman. 1994. The automated mapping of plans for plan recognition. In Proc. of Uncertainty in AI, 344–351.

S. Kerpedjiev and S. Roth. 2000. Mapping communicative goals into conceptual tasks to generate graphics in discourse. In Proc. of Int. Conf. on Intelligent User Interfaces, 60–67.

J. Yu, J. Hunter, E. Reiter, and S. Sripada. 2002. Recognising visual patterns to communicate gas turbine time-series data. In ES2002, 105–118.
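
Illustrative sketch: shallow caption evidence (Section 5)

The shallow caption processing described in Section 5 can be illustrated with a short Python sketch. It is not the Caption Processing Module itself. The verb classes and their members, and the names `VERB_CLASSES`, `stem`, and `caption_evidence`, are hypothetical. The crude suffix stripper stands in for the part-of-speech tagger and stemmer, and the hand-listed members stand in for classes extended through WordNet. As noted in Section 7, only the first verb class found is reported.

```python
import re

# Hypothetical verb classes. The paper builds its classes from seed verbs
# extended with WordNet and does not list their members.
VERB_CLASSES = {
    "decline": {"decline", "fall", "drop"},
    "rise": {"rise", "soar", "grow"},
    "lag": {"lag", "trail"},
}


def stem(word):
    """Crude suffix stripper standing in for a real stemmer (assumption)."""
    word = word.lower()
    for suffix in ("ing", "es", "ed", "s", "e"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Map the stem of every class member to the name of its class.
STEM_TO_CLASS = {
    stem(verb): name for name, verbs in VERB_CLASSES.items() for verb in verbs
}


def caption_evidence(caption, axis_labels):
    """Return (first verb class found or None, axis labels named in the caption)."""
    words = re.findall(r"[A-Za-z']+", caption)
    verb_class = next(
        (STEM_TO_CLASS[stem(w)] for w in words if stem(w) in STEM_TO_CLASS), None
    )
    lowered = caption.lower()
    matched = [label for label in axis_labels if label.lower() in lowered]
    return verb_class, matched


labels = ["Visa", "Mastercard", "American Express", "Discover", "Diner's Club"]
print(caption_evidence("American Express total billings still lag", labels))
# ('lag', ['American Express'])
print(caption_evidence("Billings still lag", labels))
# ('lag', [])
```

Because the sketch has no part-of-speech information, it treats a verb, an adjective derived from a verb (“Soaring”), and a noun that can also serve as a verb (“Rise”) identically, which mirrors how the single verb-class evidence node is described in Section 5.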
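
Illustrative sketch: the evaluation criterion (Section 6)

The leave-one-out protocol and failure criterion of Section 6 can be written down compactly. The Python sketch below is only an illustration under stated assumptions: `train` and `predict` are caller-supplied stand-ins for learning the probability tables and running the Bayesian network, the hypothesis labels are hypothetical strings, and the annotated message used in the example is assumed rather than taken from the text. The last lines recompute the reported absolute and relative improvements from the two reported accuracies.

```python
def is_success(hypotheses, annotated_message, threshold=0.5):
    """True only if the top-rated hypothesis matches the coders' annotation
    and its probability exceeds the threshold (50% in Section 6)."""
    top_message, top_prob = max(hypotheses.items(), key=lambda kv: kv[1])
    return top_message == annotated_message and top_prob > threshold


def leave_one_out_accuracy(corpus, train, predict):
    """corpus is a list of (graphic, annotated_message) pairs.

    train(rest) and predict(model, graphic) are hypothetical callables;
    predict must return a dict mapping each hypothesis to its probability.
    """
    successes = 0
    for i, (graphic, gold) in enumerate(corpus):
        model = train(corpus[:i] + corpus[i + 1:])  # learn from the remaining graphics
        successes += is_success(predict(model, graphic), gold)
    return successes / len(corpus)


# Probabilities reported for Figure 4 with its actual caption. The annotation
# "rank(American Express)" is an assumption made for this example.
with_caption_hypotheses = {
    "rank(American Express)": 0.76,
    "min-value(Diner's Club)": 0.24,
}
print(is_success(with_caption_hypotheses, "rank(American Express)"))  # True

# Reported accuracies without and with caption evidence.
without_caption, with_caption = 64.5, 79.1
absolute = with_caption - without_caption
relative = 100 * absolute / without_caption
print(round(absolute, 1), round(relative, 1))  # 14.6 22.6
```

Note that under this criterion a correct top-rated hypothesis still counts as a failure when its probability is 50% or lower, so the accuracy figures are stricter than a plain top-1 match.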