Polysemy: Theoretical and Computational Approaches

Yael Ravin and Claudia Leacock (editors)
(IBM T.J. Watson Research Center and Educational Testing Service)

New York: Oxford University Press, 2000, xi+227 pp; hardbound, ISBN 0-19-823842-8,
$74.00, £45.00; paperbound, ISBN 0-19-925086-3, $????, £14.99

Reviewed by
Jean Véronis
Université de Provence, Aix-en-Provence

As the editors remind us, polysemy has been a vexing issue for the understanding of
language since Antiquity.[1] For half a century it has been a major bottleneck for natural
language processing. It contributed to the failure of early machine translation research
(remember Bar-Hillel’s famous pen and box example) and is still plaguing most natural
language processing and information retrieval applications. A recent issue of this journal
described the state of the art in automatic sense disambiguation (Ide and Véronis 1998),
and Senseval system competitions have revealed the immense difficulty of the task.[2]
However, no significant progress can be made on the computational aspects of polysemy
without serious advances in theoretical issues. At the same time, theoretical work can be
fostered by computational results and problems, and language processing applications
can provide a unique test-bed for theories. It was therefore an excellent idea to gather
both theoretical and applied contributions in the same book.
Yael Ravin and Claudia Leacock are well-known names to those who work on the
theoretical and computational aspects of word meaning. In this volume, they bring
together a collection of essays from leading researchers in the field. As far as I can tell,
these essays are not reprints or expanded versions of conference papers, as is often the
case for edited works, but seem to have been specially commissioned for the purposes
of this book, which makes it even more exciting to examine.
The book is composed of 11 chapters. It is not formally divided into parts, but
chapters dealing more specifically with the computational aspects of polysemy are
grouped together at the end (and constitute about one third of the volume).
Chapter 1 is an overview written by the volume editors. Yael Ravin and Claudia Leacock
provide a survey of the main theories of meaning and their treatment of polysemy.
These include the classical Aristotelian approach revived by Katz and Fodor (1963),
Rosch’s (1977) prototype approach, which has its roots in Wittgenstein’s Philosophical
Investigations (1953), and the relational approach recently exemplified by WordNet
(Fellbaum 1998), which (although the authors do not mention it) can be traced back to
Peirce’s (1931–1958) and Selz’s (1913, 1922) graphs and gained popularity with Quillian’s
(1968) semantic networks. In the course of this overview, Ravin and Leacock put the
individual chapters into perspective by relating them to the various theories.
In Chapter 2, “Aspects of the micro-structure of word meanings”, D. Alan Cruse
addresses the issue of the extreme context-sensitivity of word meaning, which can
result in an almost infinite subdivision of senses. However, Cruse believes that there are
“regions of higher semantic density” within this extreme variability, which he calls
sense-nodules, “lumps of meaning with greater or lesser stability under contextual change”.
As Cruse admits, this is only a metaphor, and as such, may not be highly useful to
the researcher. In the rest of the chapter, Cruse attempts to build a typology of these
nodules, listing their properties and providing tests to detect them. The tests (e.g., the
zeugma effect in sentences such as John and his driving license expired yesterday) are not
entirely new (e.g., Quine 1960; Cruse 1986; Geeraerts 1993), but are integrated here into
a coherent framework that places context-dependency at the very heart of the theory.

[1] The editors, citing Robins (1967), attribute the first observations of the “complex relations between
meanings and words” to the Stoics, but reflection on polysemy can be traced back at least to Aristotle.
[2] http://www.sle.sharp.co.uk/senseval2

Chapter 3 by Christiane Fellbaum is devoted to “autotroponymy”. This term requires
a two-step explanation. Troponyms are verb hyponyms that refer to specific manners of
performing actions denoted by other verbs. For example, in English, stammer, babble,
whisper, and shout are troponyms of talk. Autotroponymy is a special case that occurs
when the verbs linked by this relation share the same form, as in The children behaved /
The children behaved well. The author explains autotroponymy in terms of conflation of a
meaning component not expressed on the surface. For example, in The children behaved,
the verb includes a hidden adverbial (well / satisfactorily / appropriately). Fellbaum gives
a typology of autotroponyms that is based on the nature of the conflated element (noun,
adjective, adverbial), and she discusses their syntactic and semantic properties in detail.
In Chapter 4, “Lexical shadowing and argument closure”, James Pustejovsky explores
verbs such as butter, which block the expression of a generic argument, as in *Mary
buttered her bread with butter, while allowing for a specific one, as in Mary buttered
her bread with expensive butter from Wisconsin (see Levin 1993), and verbs such as risk,
which can occur in contradictory contexts with roughly the same meaning, as in Mary
risked death to save her son / Mary risked her life to save her son (see Fillmore and Atkins
1992). Pustejovsky introduces the concept of “lexical shadowing”, which he defines
as “the relation between an argument and the underlying semantic expression, which
blocks its syntactic projection in the syntax”. For example, the underlying semantics of
the verb butter “shadows” the expression of the substance that is spread, and allows
only for specialization of the shadowed argument. For verbs such as risk, the shadowing
is of a different type: it is the expression of one argument that shadows the expression
of another, in a strictly complementary fashion. Pustejovsky explains these cases of
argument optionality or complementarity in the framework of the Generative Lexicon
(Pustejovsky 1995) and its various devices, among which “coercion” plays a central role.
Chapter 5, by Charles Fillmore and Sue Atkins, is a case study in lexicography. They
analyze the sense divisions and definitions of the verb crawl in various dictionaries,
and compare them with corpus evidence from the British National Corpus. It is a well-
known fact that dictionaries exhibit large discrepancies, and although they claim to be
based on the analysis of corpus data, many sense distinctions that show up in a corpus
are not reflected in dictionary entries. This is not entirely unexpected, since after all,
no dictionary claims exhaustive coverage of a language, and some selection must be
made by the lexicographer. This is even an explicit goal in four of the six dictionaries
examined here, which are learners’ dictionaries that attempt to provide an illustration of
the “core” uses of words for learners of English. It is striking, however, to see the extent
to which lexicographers differ as to their choices and assessment of what constitutes
an important meaning a learner should acquire. Fillmore and Atkins are perfectly right
in noting that lexicographers lack objective criteria for sense division and information
extraction from corpora. The FrameNet project they describe in an appendix[3] is an
attempt to achieve a systematic understanding and description of the meanings of lexical
items and grammatical constructions by looking at a large number of attested examples,
sorting them according to the conceptual structures (semantic “frames”) that underlie
their meanings, and describing the associated information in terms of semantic roles,
phrase types, and grammatical functions. The numerous observations regarding sense
connections in the corpus examples result in a network-like organization of meanings,
which can be used in both monolingual and bilingual lexicography. The last section of
the chapter illustrates this possibility using the verb ramper, the French equivalent of to
crawl.

[3] See http://www.icsi.berkeley.edu/~framenet/

Chapter 6, “ ‘The garden swarms with bees’ and the fallacy of ‘argument alternation’ ”
by David Dowty, comes back to the argument problem already tackled by Fellbaum and
Pustejovsky in their respective chapters and proposes syntactic structures as an explanatory
principle for alternations in meaning. The author is concerned with agent / location
alternations such as Bees swarm in the garden / The garden swarms with bees. He departs
from the usual point of view that such pairs express the same meaning and differ only
in syntactic form. Using the large set of examples in Salkoff (1983), Dowty groups verbs
that participate in such alternations into five semantic classes, and then shows that the
two forms exhibit many semantic differences related to the informational structure of the
sentence. The locative-subject form makes the location the topic of discourse, with the
predicate ascribing an abstract property to the location. Some tests show the difference
in meaning. For example, the with-phrase object must be semantically “unquantified” in
the locative-subject form (compare A roach crawled on the wall / *The wall crawled with a
roach), the locative-subject form is more suited to metaphor than the agent-subject form,
and so forth.
Chapter 7 by Cliff Goddard outlines Wierzbicka’s “natural semantic metalanguage”
(NSM) approach to semantic analysis (Wierzbicka 1996, etc.), which is based on the
idea that every language possesses a core of undefinable words (“semantic primes”).
Complex expressions (words or grammatical constructions) can be described by means
of explanatory reductive paraphrases composed of combinations of semantic primes.
This “definitional” framework provides a diagnostic technique for detecting polysemy.
For any given word, one can first assume that it has a single meaning and try to state it
in a reductive paraphrase. If this turns out to be impossible and several paraphrases are
needed to describe the word’s range of uses, then the word has distinct meanings. For
example, there is no single paraphrase in terms of primes that could predict the range of
uses of the French word fille, meaning both daughter and girl, and therefore the word must
be split into two distinct meanings. Using this test, Goddard shows that dictionaries very
often posit false and unnecessary polysemy, and occasionally false monosemy. He also
shows how the technique can be used on grammatical constructions, and applies it in
detail to “have a VP” expressions (have a stroll, have a chat, etc.). The chapter ends with a
discussion of how aspects of figurative language can be handled within this framework.
In Chapter 8, “Lexical representations for sentence processing”, George Miller and
Claudia Leacock raise the following question: “Why isn’t a dictionary a good theory of
the lexical component of language?” They share Fillmore and Atkins’s dissatisfaction
with dictionary making. For them, the main shortcoming of dictionaries is their lack of
contextual information that would enable a user to make the correct association between
senses and actual contexts. In their introduction, they give a convincing example from
previous experiments. School children given dictionary definitions of English words
produced sentences such as Our family erodes a lot, which sounds bizarre until you read
the definition of erode: “eat out, eat away”. According to Miller and Leacock, what is
missing from dictionaries is a satisfactory treatment of the lexical aspects of sentence
processing. The rest of the chapter is devoted to a discussion of the two types of context
that can be used to associate a given occurrence with a particular word sense: local context
(the immediate neighbors of the word under focus), and topical context (the general topic
or domain of the text or conversation). The authors show that local context cues are very
precise when they occur, but often simply do not occur. On the other hand, topical context
is very effective in helping discriminate between homographs, but not very helpful for
identifying the different senses of a polysemous word. Miller and Leacock consider the
combination of the two sources to be a major avenue of research.
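The distinction between the two types of context can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions: the function names, the window size, the stopword list, and the example sentence are all hypothetical and are not taken from Miller and Leacock's chapter.

```python
import re

# Toy stopword list (an assumption made for this sketch only).
STOPWORDS = frozenset({"the", "a", "of", "to", "in", "and", "his", "at"})


def tokenize(text):
    """Lowercase a text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def local_context(tokens, index, window=2):
    """Return the immediate neighbours of tokens[index] as (offset, word) pairs."""
    features = []
    for offset in range(-window, window + 1):
        pos = index + offset
        if offset != 0 and 0 <= pos < len(tokens):
            features.append((offset, tokens[pos]))
    return features


def topical_context(tokens, index, stopwords=STOPWORDS):
    """Return the unordered set of other content words in the passage."""
    return {
        tok
        for i, tok in enumerate(tokens)
        if i != index and tok not in stopwords
    }


tokens = tokenize("He deposited his pay cheque at the bank in the village")
target = tokens.index("bank")
print(local_context(tokens, target))
print(sorted(topical_context(tokens, target)))
```

In this toy sentence the local window around bank contains only function words, whereas the topical set contains deposited, pay, and cheque, which is one way to picture why the two sources might complement each other.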
Mark Stevenson and Yorick Wilks tackle this issue in Chapter 9, “Large vocabulary
word sense disambiguation”, in which they propose a methodology for combining several
knowledge sources into a word-sense disambiguation system. Their first source of
information is syntactic in nature and is provided by the Brill part-of-speech tagger. The
semantic information present in the local context is then used in two ways. The overlap
between LDOCE (Longman Dictionary of Contemporary English) definitions and the local
context is computed by means of an improved version of Cowie, Guthrie, and Guthrie’s
(1992) simulated-annealing technique, and selectional restrictions are resolved by means
of LDOCE semantic classes. The larger context is handled with techniques that map it to
the subject categories provided by LDOCE for each sense (“pragmatic codes”). The accuracy
of each of these modules taken separately ranges from 44 to 79 percent, but Stevenson
and Wilks show that, using machine-learning techniques, the modules can be combined
effectively to produce 90 percent correct disambiguation, which is quite high for an
unrestricted vocabulary system.
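The idea of scoring senses by the overlap between dictionary definitions and the local context can be sketched as follows. This is a deliberately simplified, one-word-at-a-time overlap count, not the simulated-annealing procedure of Cowie, Guthrie, and Guthrie (1992) and not Stevenson and Wilks's module; the glosses, sense labels, and function names are invented for illustration and do not come from LDOCE.

```python
import re

# Toy stopword list (an assumption made for this sketch only).
STOPWORDS = frozenset({
    "a", "an", "the", "of", "in", "on", "to", "and", "or",
    "that", "for", "is", "with", "he", "his",
})

# Hypothetical glosses written for this example; they are not LDOCE entries.
TOY_SENSES = {
    "bank": {
        "bank/finance": "an institution that keeps money and lends money to customers",
        "bank/river": "the sloping land along the edge of a river or lake",
    }
}


def content_words(text):
    """Return the set of lowercase non-stopword tokens in a text."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def overlap_scores(word, context, senses=TOY_SENSES):
    """Count the content words shared by the context and each sense gloss."""
    ctx = content_words(context) - {word}
    return {
        label: len(content_words(gloss) & ctx)
        for label, gloss in senses[word].items()
    }


def best_sense(word, context, senses=TOY_SENSES):
    """Pick the sense whose gloss overlaps most with the context (first wins on ties)."""
    scores = overlap_scores(word, context, senses)
    return max(scores, key=scores.get)


print(overlap_scores("bank", "He asked the bank to lend him money for a house"))
print(best_sense("bank", "They fished from the bank of the river"))
```

The sketch matches exact word forms only, so lend in the context does not match lends in the gloss. That kind of sparsity is one reason a single overlap module might be expected to fall well short of a combined system.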
In Chapter 10, “Polysemy in a broad-coverage natural language processing system”,
William Dolan, Lucy Vanderwende, and Steven Richardson describe the approach to
polysemy processing taken in the MS-NLP broad-coverage natural language understanding
system. The core of their system is MindNet, a network-structured computational lexicon
extracted from machine-readable dictionaries (MRDs) augmented with corpus information.
MindNet uses the same general approach as the MRD-based spreading activation
networks proposed by Véronis and Ide (1990), although in a much more sophisticated
version including labeled connections, backward links, weighted paths, etc. Dolan et al.
depart from most computational approaches to polysemy in that they believe that word
meaning is “inherently flexible”, that pre-defined inventories of discrete senses
are unsuitable for broad-coverage applications, and that no sharp boundaries should be
drawn between senses. Their approach is reminiscent of Cruse’s, presented earlier in
this book. For these authors, “understanding” is no more than identifying an activation
pattern in the network.
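To give a rough idea of what spreading activation over a lexical network involves, here is a minimal Python sketch. It is not MindNet and not the network of Véronis and Ide (1990): the graph, the link weights, the decay factor, and the function names are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical toy network: weighted, undirected links between words and sense nodes.
EDGES = [
    ("pen/writing", "ink", 1.0),
    ("pen/writing", "write", 1.0),
    ("pen/enclosure", "sheep", 1.0),
    ("pen/enclosure", "fence", 1.0),
    ("ink", "write", 0.5),
    ("sheep", "farm", 0.5),
]


def build_graph(edges):
    """Build a symmetric adjacency map {node: {neighbour: weight}}."""
    graph = defaultdict(dict)
    for a, b, weight in edges:
        graph[a][b] = weight
        graph[b][a] = weight
    return graph


def spread(graph, sources, steps=2, decay=0.5):
    """Propagate activation from the source nodes for a fixed number of steps."""
    activation = defaultdict(float)
    frontier = {node: 1.0 for node in sources}
    for node, value in frontier.items():
        activation[node] += value
    for _ in range(steps):
        next_frontier = defaultdict(float)
        for node, value in frontier.items():
            for neighbour, weight in graph[node].items():
                # Each hop passes on a decayed share of the incoming activation.
                next_frontier[neighbour] += value * weight * decay
        for node, value in next_frontier.items():
            activation[node] += value
        frontier = next_frontier
    return dict(activation)


GRAPH = build_graph(EDGES)
result = spread(GRAPH, ["sheep", "farm"])
for sense in ("pen/writing", "pen/enclosure"):
    print(sense, result.get(sense, 0.0))
```

With sheep and farm as context words, activation reaches the enclosure node but never the writing node. On this view the resulting activation pattern plays the role that a discrete sense label would play in other systems.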
In previous publications, Hinrich Schütze held a position similar to Dolan et al.’s
with respect to predefined sense inventories. For Schütze, many problems require
discrimination among senses but do not require explicit sense labeling, and the techniques
he has proposed extract the sense divisions from the corpus itself (see Schütze 1998): a
sense is a group of contextually similar occurrences of a word. This approach is almost
the opposite of Goddard’s. In Chapter 11, Schütze looks at word sense disambiguation
from the perspective of connectionism. After a survey of some of the literature on
disambiguation, he presents an algorithm that has grown out of two major concerns
in connectionist research: psychological plausibility and large-scale applicability. He
describes an application to information retrieval that demonstrates that his algorithm
can be applied to very large text collections (500 megabytes of text from the Wall Street
Journal).
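The notion that a sense is a group of contextually similar occurrences can be illustrated with a toy grouping procedure. The Python below uses a greedy single-pass grouping over bag-of-words vectors with cosine similarity. It is a stand-in chosen for brevity and should not be read as Schütze's algorithm; the sentences, the stopword list, the threshold, and the function names are all hypothetical.

```python
import math
import re
from collections import Counter

# Toy stopword list (an assumption made for this sketch only).
STOPWORDS = frozenset({"the", "and", "from", "of", "was", "after", "she"})


def context_vector(sentence, target):
    """Bag-of-words vector of a sentence, without the target word and stopwords."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return Counter(w for w in words if w != target and w not in STOPWORDS)


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[word] for word, count in u.items() if word in v)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(
        sum(c * c for c in v.values())
    )
    return dot / norm if norm else 0.0


def group_occurrences(sentences, target, threshold=0.2):
    """Greedily assign each occurrence to the most similar existing group."""
    groups = []
    for index, sentence in enumerate(sentences):
        vec = context_vector(sentence, target)
        best, best_sim = None, threshold
        for group in groups:
            sim = cosine(vec, group["centroid"])
            if sim >= best_sim:
                best, best_sim = group, sim
        if best is None:
            # No group is similar enough: this occurrence starts a new one.
            groups.append({"centroid": Counter(vec), "members": [index]})
        else:
            best["centroid"].update(vec)
            best["members"].append(index)
    return [group["members"] for group in groups]


SENTENCES = [
    "The bank approved the loan and the mortgage",
    "She fished from the muddy bank of the river",
    "The loan from the bank covered the mortgage",
    "The river bank was muddy after the flood",
]
print(group_occurrences(SENTENCES, "bank"))
```

No sense labels are supplied at any point: the two groups emerge from the contexts alone, which is the sense in which discrimination differs from labeling against a predefined inventory.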
The most noticeable feature of this book is probably its wide range of contributors
and the broad scope of the topics it encompasses. As the title implies, it addresses both
theoretical and computational aspects of polysemy, and within these two areas, very
different research trends are pursued. The book gives a very good overall picture of
current issues in polysemy and of the diverse ways of approaching the topic. It should
therefore hold an important place on the shelves of any researcher in the fields of lexical
semantics and word sense disambiguation, and will certainly be valued by many of our
graduate students.
The wide-angle snapshot offered by this book also reveals a very striking fact about
current lexical semantics. Apart from one chapter, all theoretical discussions are supported
solely by invented examples. Lexical semantics, and probably semantics in general,
has not yet made the paradigm shift that has occurred or is occurring in other
branches of linguistics, such as syntax, where empirical evidence now replaces intuition
as the normal body of data to be studied. Another recent book (Sampson 2001) quite
brilliantly shows how the lack of objective evidence has been misleading linguistic research
for decades and has placed the discipline on the fringe of modern science. The
lack of objective evidence is probably even more dangerous in semantics than in other
areas of linguistics. The extreme flimsiness of introspection-based tests is acknowledged
by lexical semanticists themselves—for instance, how much agreement would there be
on whether or not a given coordination is a zeugma?—and such tests make it almost
impossible for semantics to satisfy the minimal requirement that science has demanded
since Karl Popper, that of refutability.
Interestingly enough, the one chapter that does use corpus examples (Chapter 5
by Fillmore and Atkins) pertains to lexicography. Lexicographers indeed have a long
tradition of examining objective evidence, which computer tools and electronic corpora
have made it possible to systematize. However, several chapters (Fillmore and Atkins,
Goddard, Miller and Leacock) express their dissatisfaction with current dictionaries, on
the grounds that they lack theoretical criteria to back their organization. It is also worth
noting that the only computational approaches to word sense disambiguation able to
claim some minimal degree of effectiveness are linguistically blind ones (like those reported
in this book), as if an insurmountable gap existed between theories and applications.
A paradigm shift in lexical semantics is therefore not just a scientific necessity; it is
also a practical one. I am convinced that no major breakthrough in language processing
applications and lexicography can be made until theories of meaning are based on the
observation of real data.

References

Cowie, Jim, Joe A. Guthrie, and Louise Guthrie. 1992. Lexical disambiguation using simulated annealing. Proceedings of the 14th International Conference on Computational Linguistics, COLING-92, 23–28 August, Nantes, France, vol. 1, 359–365.
Cruse, D. Alan. 1986. Lexical Semantics. Cambridge University Press, Cambridge, U.K.
Fellbaum, Christiane. 1998. WordNet: An Electronic Lexical Database. The MIT Press, Cambridge, MA.
Fillmore, Charles J. and B. T. S. Atkins. 1992. Toward a frame-based lexicon: the semantics of risk. In James Pustejovsky and Sabine Bergler (eds.), Lexical Semantics and Knowledge Representation. Springer Verlag, New York.
Geeraerts, Dirk. 1993. Vagueness’s puzzles, polysemy’s vagaries. Cognitive Linguistics, 4(3), 223–272.
Ide, Nancy M. and Jean Véronis. 1998. Introduction to the special issue on word sense disambiguation: the state of the art. Computational Linguistics, 24(1), 1–40.
Katz, Jerrold J. and Jerry A. Fodor. 1963. The structure of a semantic theory. Language, 39, 170–210.
Levin, Beth. 1993. English verb classes and alternations: A preliminary investigation. The University of Chicago Press.
Peirce, Charles Sanders. 1931–1958. Collected papers of C.S. Peirce, ed. C. Hartshorne, P. Weiss, and A. Burks, 8 vols., Harvard University Press, Cambridge, MA.
Pustejovsky, James. 1995. The Generative Lexicon. The MIT Press, Cambridge, MA.
Quillian, M. Ross. 1968. Semantic memory. In Marvin Minsky (ed.), Semantic Information Processing, The MIT Press, Cambridge, MA.
Quine, Willard Van Orman. 1960. Word and Object. The MIT Press, Cambridge, MA.
Robins, Robert H. 1967. A Short History of Linguistics. Indiana University Press, Bloomington.
Rosch, Eleanor. 1977. Human categorization. In N. Warren (ed.), Advances in Cross-Cultural Psychology, vol. 7. Academic Press, London.
Salkoff, Morris. 1983. Bees are swarming in the garden. Language, 59(2), 288–346.
Sampson, Geoffrey. 2001. Empirical Linguistics. Continuum, London.
Schütze, Hinrich. 1998. Automatic word sense discrimination. Computational Linguistics, 24(1), 97–124.
Selz, Otto. 1913. Über die Gesetze des geordneten Denkverlaufs. Spemann, Stuttgart.
Selz, Otto. 1922. Zur Psychologie des produktiven Denkens und des Irrtums. Friedrich Cohen, Bonn.
Véronis, Jean and Nancy Ide. 1990. Word sense disambiguation with very large neural networks extracted from machine readable dictionaries. Proceedings of the 13th International Conference on Computational Linguistics, COLING-90, Helsinki, Finland, vol. 2, 389–394.
Wierzbicka, Anna. 1996. Semantics: Primes and Universals. Oxford University Press, Oxford.
Wittgenstein, Ludwig. 1953. Philosophical Investigations (translated by G. E. M. Anscombe). Macmillan, New York.

Jean Véronis is a professor of linguistics and computer science at the Université de Provence in
Aix-en-Provence, France, where he heads a research team specializing in French corpus linguistics.
His academic interests include word sense disambiguation, computer lexicography, translation
corpora and parallel text alignment, prosody and speech synthesis. Véronis’s address is:
Université de Provence, 29 av. Robert Schuman, 13621 Aix-en-Provence Cedex 1, France; e-mail:
[email protected]; URL: www.up.univ-mrs.fr/~veronis
