Semantic Density Analysis: Comparing Word Meaning Across Time and Phonetic Space

This document proposes a new statistical method for detecting and tracking changes in word meaning over time based on Latent Semantic Analysis (LSA). It involves calculating semantic vectors for each occurrence of a word based on surrounding context, and measuring the "density" or clustering of these vectors to determine how consistently a word was used. The method is applied to analyze changes in words like "dog", "do", and "deer" in early English texts, and to examine relationships between phonetic forms and meanings.


Semantic Density Analysis:

Comparing word meaning across time and phonetic space


Eyal Sagi
Northwestern University
Evanston, Illinois, USA
[email protected]

Stefan Kaufmann
Northwestern University
Evanston, Illinois, USA
[email protected]

Brady Clark
Northwestern University
Evanston, Illinois, USA
[email protected]

Abstract

This paper presents a new statistical method for detecting and tracking changes in word meaning, based on Latent Semantic Analysis. By comparing the density of semantic vector clusters this method allows researchers to make statistical inferences on questions such as whether the meaning of a word changed across time or if a phonetic cluster is associated with a specific meaning. Possible applications of this method are then illustrated in tracing the semantic change of 'dog', 'do', and 'deer' in early English and examining and comparing phonaesthemes.

Proceedings of the EACL 2009 Workshop on GEMS: GEometrical Models of Natural Language Semantics, pages 104–111, Athens, Greece, 31 March 2009. ©2009 Association for Computational Linguistics

1 Introduction

The increase in available computing power over the last few decades has led to an explosion in the application of statistical methods to the analysis of texts. Researchers have applied these methods to a wide range of tasks, from word-sense disambiguation (Levin et al., 2006) to the summarization of texts (Marcu, 2003) and the automatic scoring of student essays (Riedel et al., 2006). However, some fields of linguistics that have traditionally employed corpora as their source material, such as historical semantics, have yet to benefit from the application of these statistical methods.

In this paper we demonstrate how an existing statistical tool (Latent Semantic Analysis) can be adapted and used to automate and enhance some aspects of research in historical semantics and other fields whose focus is on the comparative analysis of word meanings within a corpus. Our method allows us to assess the semantic variation within the set of individual occurrences of a given word type. This variation is inversely related to a property of types that we call density – intuitively, a tendency to occur in highly similar contexts. In terms of our LSA-based spatial semantic model, we calculate vectors representing the context of each occurrence of a given term, and estimate the term's cohesiveness as the density with which these token context vectors are "packed" in space.

2 The method

Latent Semantic Analysis (LSA) is a collective term for a family of related methods, all of which involve building numerical representations of words based on occurrence patterns in a training corpus. The basic underlying assumption is that co-occurrence within the same contexts can be used as a stand-in measure of semantic relatedness (see Firth, 1957; Halliday and Hasan, 1976; Hoey, 1991, for early articulations of this idea). The success of the method in technical applications such as information retrieval and its popularity as a research tool in psychology, education, linguistics and other disciplines suggest that this hypothesis holds up well for the purposes of those applications.

The relevant notion of "context" varies. The first and still widely used implementation of the idea, developed in Information Retrieval and originally known as Latent Semantic Indexing (Deerwester et al., 1990), assembles a term-document matrix in which each vocabulary item (term) is associated with an n-dimensional vector recording its distribution over the n documents in the corpus. In contrast, the version we applied in this work measures co-occurrence in a way that is more independent of the characteristics of the documents in the training corpus, building instead a term-term matrix associating vocabulary items with vectors representing their frequency of co-occurrence with each of a list of "content-bearing" words. This approach originated with the "WordSpace" paradigm developed by Schütze (1996). The software we used is a version of the "Infomap" package developed at Stanford University and freely available (see also Takayama et al., 1999). We describe it and the steps we took in our experiments in some detail below.

2.1 Word vectors

The information encoded in the co-occurrence matrix, and thus ultimately the similarity measure, depends greatly on the genre and subject matter of the training corpus (Takayama et al., 1999; Kaufmann, 2000). In our case, we used the entire available corpus as our training corpus. The word types in the training corpus are ranked by frequency of occurrence, and the Infomap system automatically selects (i) a vocabulary W for which vector representations are to be collected, and (ii) a set C of 1,000 "content-bearing" words whose occurrence or non-occurrence is taken to be indicative of the subject matter of a given passage of text. Usually, these choices are guided by a stoplist of (mostly closed-class) lexical items that are to be excluded, but because we were interested in tracing changes in the meaning of lexical items we reduced this stoplist to a bare minimum. To compensate, we increased the number of "content-bearing" words to 2,000. The vocabulary W consisted of the 40,000 most frequent non-stoplist words. The set C of content-bearing words contained the 50th through 2,049th most frequent non-stoplist words. This method may seem rather blunt, but it has the advantage of not requiring any human intervention or antecedently given information about the domain.

The cells in the resulting matrix of 40,000 rows and 2,000 columns were filled with co-occurrence counts recording, for each pair (w, c) ∈ W × C, the number of times a token of c occurred in the context of a token of w in the corpus.[1] The "context" of a token w_i in our implementation is the set of tokens in a fixed-width window from the 15th item preceding w_i to the 15th item following it (less if a document boundary intervenes). The matrix was transformed by Singular Value Decomposition (SVD), whose implementation in the Infomap system relies on the SVDPACKC package (Berry, 1992; Berry et al., 1993). The output was a reduced 40,000 × 100 matrix. Thus each item w ∈ W is associated with a 100-dimensional vector.

[1] Two details are glossed over here: First, the Infomap system weighs this raw count with a tf.idf measure of the column label c, calculated as follows: tf.idf(c) = tf(c) × (log(D + 1) − log(df(c))), where tf(c) and df(c) are the number of occurrences of c and the number of documents in which c occurs, respectively, and D is the total number of documents. Second, the number in each cell is replaced with its square root, in order to approximate a normal distribution of counts and attenuate the potentially distorting influence of high base frequencies (cf. Takayama, et al. 1998; Widdows, 2004).

2.2 Context vectors

Once the vector space is obtained from the training corpus, vectors can be calculated for any multi-word unit of text (e.g. paragraphs, queries, or documents), regardless of whether it occurs in the original training corpus or not, as the normalized sum of the vectors associated with the words it contains. In this way, for each occurrence of a target word type under investigation, we calculated a context vector from the 15 words preceding and the 15 words following that occurrence. Context vectors were first used in Word Sense Discrimination by Schütze (1998). Similarly to that application, we assume that these "second-order" vectors encode the aggregate meaning, or topic, of the segment they represent, and thus, following the reasoning behind LSA, are indicative of the meaning with which it is being used on that particular occurrence. Consequently, for each target word of interest, the context vectors associated with its occurrences constitute the data points. The analysis is then a matter of grouping these data points according to some criterion (e.g., the period in which the text was written) and conducting an appropriate statistical test. In some cases it might also be possible to use regression or apply a clustering analysis.

2.3 Semantic Density Analysis

Conducting statistical tests comparing groups of vectors is not trivial. Fortunately, some questions can be answered based on the similarity of vectors within each group rather than the vectors themselves. The similarity between two vectors w, v is measured as the cosine between them.[2]

[2] While the cosine measure is the accepted measure of similarity, the cosine function is non-linear and therefore problematic for many statistical methods. Several transformations can be used to correct this (e.g., Fisher's z). In this paper we will use the angle, in degrees, between the two vectors (i.e., cos⁻¹) because it is easily interpretable.
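As a concrete illustration of the construction just described (term-term counts, SVD reduction, context vectors as normalized sums, and angles between them), the following toy sketch may help. It is not the Infomap implementation: the corpus is invented, the window is 2 tokens rather than 15, the reduced space has 2 dimensions rather than 100, and the weighting follows one reading of footnote 1.

```python
import numpy as np

# Toy corpus: one "document" per line. In the paper, W holds the 40,000 most
# frequent types, C holds 2,000 content-bearing words, and the window is +/-15.
docs = ["the dog chased the cat", "the dog bit the man", "a deer ran in the wood"]
tokens = [d.split() for d in docs]

vocab = sorted({w for t in tokens for w in t})   # stands in for W
content = vocab                                  # stands in for C
w_ix = {w: i for i, w in enumerate(vocab)}

# Raw co-occurrence counts within a +/-2 token window around each token.
counts = np.zeros((len(vocab), len(content)))
for toks in tokens:
    for i, w in enumerate(toks):
        for c in toks[max(0, i - 2):i + 3]:
            if c != w:
                counts[w_ix[w], w_ix[c]] += 1

# Footnote 1 (one reading): weight each column by an idf-style factor,
# and replace each cell with its square root.
D = len(docs)
df = np.array([sum(c in t for t in tokens) for c in content])
counts = np.sqrt(counts) * (np.log(D + 1) - np.log(df))

# SVD reduction (40,000 x 100 in the paper; 2 dimensions at this toy scale).
U, S, _ = np.linalg.svd(counts, full_matrices=False)
word_vecs = U[:, :2] * S[:2]

def context_vector(words):
    """Normalized sum of the word vectors in a context window (Section 2.2)."""
    v = sum(word_vecs[w_ix[w]] for w in words if w in w_ix)
    return v / np.linalg.norm(v)

def angle(u, v):
    """Angle in degrees between two vectors (footnote 2)."""
    cosine = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cosine, -1.0, 1.0)))
```

On the real corpus the matrix is large and sparse and the SVD is computed with SVDPACKC; numpy's dense SVD is only practical at this toy scale.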
cos(w, v) = (w · v) / (|w| |v|)

The average similarity of a group of vectors is indicative of its density – a dense group of highly similar vectors will have a high average cosine (and a correspondingly low average angle) whereas a sparse group of dissimilar vectors will have an average cosine that approaches zero (and a correspondingly high average angle).[3] Thus, since a word that has a single, highly restricted meaning (e.g. 'palindrome') is likely to occur in a very restricted set of contexts, its context vectors are also likely to have a low average angle between them, compared to a word that is highly polysemous or appears in a large variety of contexts (e.g. 'bank', 'do'). From this observation, it follows that it should be possible to compare the cohesiveness of groups of vectors in terms of the average pairwise similarity of the vectors of which they are comprised. Because the number of such pairings tends to be prohibitively large (e.g., nearly 1,000,000 for a group of 1,000 vectors), it is useful to use only a sub-sample in any single analysis. A Monte-Carlo analysis in which n pair-wise similarity values are chosen at random from each group of vectors is therefore appropriate.[4]

[3] Since the cosine ranges from -1 to +1, it is possible in principle to obtain negative average cosines. In practice, however, the overwhelming majority of vocabulary items have a non-negative cosine with any given target word, hence the average cosine usually does not fall below zero.

[4] It is important to note that the number of independent samples in the analysis is determined not by the number of similarity values compared but by the number of individual vectors used in the analysis.

However, there is one final complication to consider in the analysis. The passage of time influences not only the meaning of words, but also styles and variety of writing. For example, texts in the 11th century were much less varied, on average, than those written in the 15th century.[5] This will influence the calculation of context vectors as those depend, in part, on the text they are taken from. Because the document as a whole is represented by a vector that is the average of all of its words, it is possible to predict that, if no other factors exist, two contexts are likely to be related to one another to the same degree that their documents are. Controlling for this effect can therefore be achieved by subtracting from the angle between two context vectors the angle between the documents in which they appear.

[5] Tracking changes in the distribution of the document vectors in a corpus over time might itself be of interest to some researchers but is beyond the scope of the current paper.

3 Applications to Research

3.1 A Diachronic Investigation: Semantic Change

One of the central questions of historical semantics is the following (Traugott, 1999):[6]

Given the form-meaning pair L (lexeme), what changes did meaning M undergo?

[6] This is the semasiological perspective on semantic change. Other perspectives include the onomasiological perspective ("Given the concept C, what lexemes can it be expressed by?"). See Traugott 1999 for discussion.

For example, the form as long as underwent the change 'equal in length' > 'equal in time' > 'provided that'. Evidence for semantic change comes from written records, cognates, and structural analysis (Bloomfield, 1933). Traditional categories of semantic change include (Traugott, 2005: 2-4; Campbell, 2004: 254-262; Forston, 2003: 648-650):

 Broadening (generalization, extension, borrowing): A restricted meaning becomes less restricted (e.g. Late Old English docga 'a (specific) powerful breed of dog' > dog 'any member of the species Canis familiaris')

 Narrowing (specialization, restriction): A relatively general meaning becomes more specific (e.g. Old English deor 'animal' > deer)

 Pejoration (degeneration): A meaning becomes more negative (e.g. Old English sælig 'blessed, blissful' > sely 'happy, innocent, pitiable' > silly 'foolish, stupid')

Semantic change results from the use of language in context, whether linguistic or extralinguistic. Later meanings of forms are connected to earlier ones, where all semantic change arises by polysemy, i.e. new meanings coexist with earlier ones, typically in restricted contexts. Sometimes new meanings split off from earlier ones and are no longer considered variants by language users (e.g. mistress 'woman in a position of authority, head of household' > 'woman in a continuing extra-marital relationship with a man').

Semantic change is often considered unsystematic (Hock and Joseph, 1996: 252). However, recent work (Traugott and Dasher, 2002) suggests that there is, in fact, significant cross-linguistic regularity in semantic change.
For example, in the Invited Inferencing Model of Semantic Change proposed by Traugott and Dasher (2002) the main mechanism of semantic change is argued to be the semanticization of conversational implicatures, where conversational implicatures are a component of speaker meaning that arises from the interaction between what the speaker says and rational principles of communication (Grice, 1989 [1975]). Conversational implicatures are suggested by an utterance but not entailed. For example, the utterance Some students came to the party strongly suggests that some but not all students came to the party, even though the utterance would be true strictly speaking if all students came to the party. According to the Invited Inferencing Model, conversational implicatures become part of the semantic polysemies of particular forms over time.

Such changes in meaning should be evident when examining the contexts in which the lexeme of interest appears. In other words, changes in the meaning of a type should translate to differences in the contexts in which its tokens are used. For instance, semantic broadening results in a meaning that is less restricted and as a result can be used in a larger variety of contexts. In a semantic space that encompasses the period of such a change, this increase in variety can be measured as a decrease in vector density across the time span of the corpus. This decrease translates into an increase in the average angle between the context vectors for the word. For instance, because the Old English word 'docga' applied to a specific breed of dog, we predicted that earlier occurrences of the lexemes 'docga' and 'dog', in a corpus of documents of the appropriate time period, will show less variety than later occurrences.

An even more extreme case of semantic broadening is predicted to occur as part of the process of grammaticalization (Traugott and Dasher, 2002) in which a content word becomes a function word. Because, as a general rule, a function word can be used in a much larger variety of contexts than a content word, a word that underwent grammaticalization should appear in a substantially larger variety of contexts than it did prior to becoming a function word. One well studied case of grammaticalization is that of periphrastic 'do'. While in Old English 'do' was used as a verb with a causative and habitual sense (e.g. 'do you harm'), later in English it took on a functional role that is nearly devoid of meaning (e.g. 'do you know him?'). Because this change occurred in Middle English, we predicted that earlier occurrences of 'do' will show less variety than later ones.

In contrast with broadening, semantic narrowing results in a meaning that is more restricted, and is therefore applicable in fewer contexts than before. This decrease in variety results in an increase in vector density and can be directly measured as a decrease in the average angle between the context vectors for the word. As an example, the Old English word 'deor' denoted a larger group of living creatures than does the Modern English word 'deer'. We therefore predicted that earlier occurrences of the lexemes 'deor' and 'deer', in a corpus of the appropriate time period, will show more variety than later occurrences.

We tested our predictions using a corpus derived from the Helsinki corpus (Rissanen, 1994). The Helsinki corpus is comprised of texts spanning the periods of Old English (prior to 1150 A.D.), Middle English (1150-1500 A.D.), and Early Modern English (1500-1710 A.D.). Because spelling in Old English was highly variable, we decided to exclude that part of the corpus and focused our analysis on the Middle English and Early Modern English periods. The resulting corpus included 504 distinct documents totaling approximately 1.1 million words.

To test our predictions regarding semantic change in the words 'dog', 'do', and 'deer', we collected all of the contexts in which they appear in our subset of the Helsinki corpus. This resulted in 112 contexts for 'dog', 4298 contexts for 'do', and 61 contexts for 'deer'. Because there were relatively few occurrences of 'dog' and 'deer' in the corpus it was practical to compute the angles between all possible pairs of context vectors. As a result, we elected to forgo the Monte-Carlo analysis for those two words in favor of a full analysis. The results of our analysis for all three words are given in Table 1. These results were congruent with our prediction: The density of the contexts decreases over time for both 'dog' (t(110) = 2.17, p < .05) and 'do' (F(2,2997) = 409.41, p < .01) while in the case of 'deer' there is an increase in the density of the contexts over time (t(36) = 3.05, p < .01).

Table 1 – Mean angle between context vectors for target words in different periods in the Helsinki corpus (standard deviations are given in parentheses)

           n      Unknown comp.    Early Middle     Late Middle      Early Modern
                  date (<1250)     English          English          English
                                   (1150-1350)      (1350-1500)      (1500-1710)
  dog      112                                      15.47 (14.19)    24.73 (10.43)
  do       4298                    10.31 (13.57)    13.02 (9.50)     24.54 (11.2)
  deer     61     38.72 (17.59)                     20.6 (18.18)     20.5 (9.82)
  science  79                                       13.56 (13.33)    28.31 (12.24)

Furthermore, our analysis corresponds with the data collected by Ellegård (1953). Ellegård traced the grammaticalization of 'do' by manually examining changes in the proportions of its various uses between 1400 and 1700. His data identifies an overall shift in the pattern of use that occurred mainly between 1475 and 1575. Our analysis identifies a similar shift in patterns between the time periods spanning 1350-1500 and 1500-1570. Figure 1 depicts an overlay of both datasets. The relative scale of the two sets was set so that the proportions of 'do' uses at 1400 and 1700 (the beginning and end of Ellegård's data, respectively) match the semantic density measured by our method at those times.

[Figure 1 – A comparison of the rise of periphrastic 'do' as measured by semantic density in our study and the proportion of periphrastic 'do' uses by Ellegård (1953). Left axis: mean angle between vectors (current study); right axis: % of periphrastic 'do' uses (Ellegård, 1953); horizontal axis: year (1200-1700).]

Finally, our method can be used not only to test predictions based on established cases of semantic change, but also to identify new ones. For instance, in examining the contexts of the word 'science' we can identify that it underwent semantic broadening shortly after it first appeared in the 14th century (t(77) = 4.51, p < .01). A subsequent examination of the contexts in which the word appears indicated that this is probably the result of a shift from a meaning related to generalized knowledge (e.g., '…and learn science of school', John of Trevisa's Polychronicon, 1387) to one that can also be used to refer to more specific disciplines (e.g., '…of the seven liberal sciences', Simon Forman's Diary, 1602).

Our long term goal with respect to this type of analysis is to use this method in a computer-based tool that can scan a diachronic corpus and automatically identify probable cases of semantic change within it. Researchers can then use these results to focus on identifying the specifics of such changes, as well as examine the overall patterns of change that exist in the corpus. It is our belief that such a use will enable a more rigorous testing and refinement of existing theories of semantic change.

3.2 A Synchronic Investigation: Phonaesthemes

In addition to examining changes in meaning across time, it is also possible to employ our method to examine how the semantic space relates to other possible partitionings of the lexemes represented by it. For instance, while the relationship between the phonetic representation and semantic content is largely considered to be arbitrary, there are some notable exceptions. One interesting case is that of phonaesthemes (Firth, 1930), sub-morphemic units that have a predictable effect on the meaning of the word as a whole. In English, one of the more frequently mentioned phonaesthemes is a word-initial gl-, which is common in words related to the visual modality (e.g., 'glance', 'gleam'). While there have been some scholastic explorations of these non-morphological relationships between sound and meaning, they have not been thoroughly explored by behavioral and computational research (with some notable exceptions; e.g. Hutchins, 1998; Bergen, 2004). Recently, Otis and Sagi (2008) used the semantic density of the cluster of words sharing a phonaestheme as a measure of the strength of the relationship between the phonetic cluster and its proposed meaning.
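The intuition behind that density measure can be sketched with invented vectors: members of a phonaestheme-like cluster share a direction in semantic space (a shared meaning component), so their mean pairwise angle is smaller than that of a randomly assembled cluster. Everything below is made-up illustration data, not the Gutenberg-derived word vectors of the actual study:

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_pairwise_angle(vectors, n_pairs, rng):
    """Monte-Carlo estimate of the mean angle (degrees) between vector pairs."""
    total = 0.0
    for _ in range(n_pairs):
        i, j = rng.choice(len(vectors), size=2, replace=False)
        u, v = vectors[i], vectors[j]
        cosine = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
        total += np.degrees(np.arccos(np.clip(cosine, -1.0, 1.0)))
    return total / n_pairs

# Invented stand-ins: the phonaestheme-like cluster scatters around a shared
# direction; the control cluster is drawn at random, as in the comparison step.
shared = rng.normal(size=100)
cluster = [shared + 0.5 * rng.normal(size=100) for _ in range(40)]
control = [rng.normal(size=100) for _ in range(40)]

dense = mean_pairwise_angle(cluster, 200, rng)
sparse = mean_pairwise_angle(control, 200, rng)
# A denser (phonaestheme-like) cluster yields the smaller mean angle.
assert dense < sparse
```

The actual analysis, described next, formalized this comparison with repeated t-tests over sampled word pairs and a binomial threshold on the number of significant tests.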
Otis and Sagi used a corpus derived from Project Gutenberg (http://www.gutenberg.org/) as the basis for their analysis. Specifically, they used the bulk of the English language literary works available through the project's website. This resulted in a corpus of 4034 separate documents consisting of over 290 million words.

The bulk of the candidate phonaesthemes they tested were taken from the list used by Hutchins (1998), with the addition of two candidate phonaesthemes (kn- and -ign). Two letter combinations that were considered unlikely to be phonaesthemes (br- and z-) were also included in order to test the method's capacity for discriminating between phonaesthemes and non-phonaesthemes. Overall Otis and Sagi (2008) examined 47 possible phonaesthemes.

In cases where a phonetic cluster represents a phonaestheme, it intuitively follows that pairs of words sharing that phonetic cluster are more likely to share some aspect of their meaning than pairs of words chosen at random. Otis and Sagi tested whether this was true for any specific candidate phonaestheme using a Monte-Carlo analysis. First they identified all of the words in the corpus sharing a conjectured phonaestheme[7] and chose the most frequent representative word form for each stem, resulting in a cluster of word types representing each candidate phonaestheme.[8] Next they tested the statistical significance of this relationship by running 100 t-test comparisons. Each of these tests compared the relationship of 50 pairs of words chosen at random from the conjectured cluster with 50 pairs of words chosen at random from a similarly sized cluster, randomly generated from the entire corpus. The number of times these t-tests resulted in a statistically significant difference (α = .05) was recorded. This analysis was repeated 3 times for each conjectured phonaestheme and the median value was used as the final result.

[7] It is important to note that due to the nature of a written corpus, the match was orthographical rather than phonetic. However, in most cases the two are highly congruent.

[8] Because, in this case, Otis and Sagi were not interested in temporal changes in meaning, they used the overall word vectors rather than look at each context individually. As a result, each of the vectors used in the analysis is based on occurrences in many different documents and there was no need to control for the variability of the documents.

To determine whether a conjectured phonaestheme was statistically supported by their analysis Otis and Sagi compared the overall frequency of statistically significant t-tests with the binomial distribution for their α (.05). After applying a Bonferroni correction for performing 50 comparisons, the threshold for statistical significance of the binomial test was for 14 t-tests out of 100 to turn out as significant, with a frequency of 13 being marginally significant. Therefore, if the significance frequency (#Sig below) of a candidate phonaestheme was 15 or higher, that phonaestheme was judged as being supported by statistical evidence. Significance frequencies of 13 and 14 were considered as indicative of a phonaestheme for which there was only marginal statistical support.

Among Hutchins' original list of 44 possible phonaesthemes, 26 were found to be statistically reliable and 2 were marginally reliable. Overall the results were in line with the empirical data collected by Hutchins. By way of comparing the two datasets, #Sig and Hutchins' average rating measure were well correlated (r = .53). Neither of the unlikely phonaestheme candidates we examined were statistically supported phonaesthemes (#Sig br- = 6; #Sig z- = 5), whereas both of our newly hypothesized phonaesthemes were statistically supported (#Sig kn- = 28; #Sig -ign = 23). In addition to being able to use this measure as a decision criterion as to whether a specific phonetic cluster might be phonaesthemic, it can also be used to compare the relative strength of two such clusters. For instance, in the Gutenberg corpus the phonaesthemic ending -owl (e.g., 'growl', 'howl'; #Sig = 97) was comprised of a cluster of words that were more similar to one another than -oop (e.g., 'hoop', 'loop'; #Sig = 32).

Such results can then be used to test the cognitive effects of phonaesthemes. For instance, following the comparison above, we might hypothesize that the word 'growl' might be a better semantic prime for 'howl' than the word 'hoop' is for the word 'loop'. In contrast, because a word-initial br- is not phonaesthemic, the word 'breeze' is unlikely to be a semantic prime for the word 'brick'. In addition, it might be interesting to combine the diachronic analysis from the previous section with the synchronic analysis in this section to investigate questions such as when and how phonaesthemes come to be part of a language and what factors might affect the strength of a phonaestheme.

4 Discussion

While the method presented in this paper is aimed towards quantifying semantic relationships that were previously difficult to quantify, it also raises an interesting theoretical issue, namely the relationship between the statistically computed semantic space and the actual semantic content of words. On the one hand, simulations based on Latent Semantic Analysis have been shown to correlate with cognitive factors such as the acquisition of vocabulary and the categorization of texts (cf. Landauer & Dumais, 1997). On the other hand, in reality speakers' use of language relies on more than simple patterns of word co-occurrence – for instance, we use syntactic structures and pragmatic reasoning to supplement the meaning of the individual lexemes we come across (e.g., Fodor, 1995; Grice, 1989 [1975]). It is therefore likely that while LSA captures some of the variability in meaning exhibited by words in context, it does not capture all of it. Indeed, there is a growing body of methods that propose to integrate these two disparate sources of linguistic information (e.g., Pado and Lapata, 2007; Widdows, 2003).

Certainly, the results reported in this paper suggest that enough of the meaning of words and contexts is captured to allow interesting inferences about semantic change and the relatedness of words to be drawn with a reasonable degree of certainty. However, it is possible that some important aspects of meaning are systematically ignored by the analysis. For instance, it remains to be seen whether this method can distinguish between processes like pejoration and amelioration as they require a fine grained distinction between 'good' and 'bad' meanings.

Regardless of any such limitations, it is clear that important information about meaning can be gathered through a systematic analysis of the contexts in which words appear. Furthermore, phenomena such as the existence of phonaesthemes and the success of LSA in predicting vocabulary acquisition rates suggest that the acquisition of new vocabulary involves the gleaning of the meaning of words through their context. The role of context in semantic change is therefore likely to be an active one – when a listener encounters a word they are unfamiliar with they are likely to use the context in which it appears, as well as its phonetic composition, as clues to its meaning. Furthermore, if a word is likewise encountered in a context in which it is unlikely, this unexpected observation may induce the listener to adjust their representation of both the context and the word in order to increase the overall coherence of the utterance or sentence. As a result, it is possible that examining the contexts in which a word is used in different documents and time periods might be useful not only as a tool for examining the history of a semantic change but also as an instrument for predicting its future progress. Overall, this suggests a dynamic view of the field of semantics – semantics as an ever-changing landscape of meaning. In such a view, semantic change is the norm as the perceived meaning of words keeps shifting to accommodate the contexts in which they are used.

References

Bergen, B. (2004). The Psychological Reality of Phonaesthemes. Language, 80(2), 291-311.

Berry, M. W. (1992) SVDPACK: A Fortran-77 software library for the sparse singular value decomposition. Tech. Rep. CS-92-159, Knoxville, TN: University of Tennessee.

Berry, M. W., Do, T., O'Brien, G., Vijay, K., Varadhan, S. (1993) SVDPACKC (Version 1.0) User's Guide. Tech. Rep. UT-CS-93-194, Knoxville, TN: University of Tennessee.

Bloomfield, L. (1933). Language. New York, NY: Holt, Rinehart and Winston.

Campbell, L. (2004) Historical linguistics: An introduction, 2nd ed. Cambridge, MA: The MIT Press.

Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., and Harshman, R. (1990) Indexing by Latent Semantic Analysis. Journal of the American Society for Information Science, 41, 391-407.

Ellegård, A. (1953) The Auxiliary Do: the Establishment and Regulation of its Use in English. Gothenburg Studies in English, 2. Stockholm: Almqvist and Wiksell.

Firth, J. (1930) Speech. London: Oxford University Press.

Firth, J. (1957) Papers in Linguistics, 1934-1951. Oxford University Press.

Fodor, J. D. (1995) Comprehending sentence structure. In L. R. Gleitman and M. Liberman (Eds.), Invitation to Cognitive Science, volume 1. MIT Press, Cambridge, MA. 209-246.

Forston, B. W. (2003) An Approach to Semantic Change. In B. D. Joseph and R. D. Janda (Eds.), The Handbook of Historical Linguistics. Malden, MA: Blackwell Publishing. 648-666.
Grice, H. P. (1989 [1975]). Logic and Conversation. In Studies in the Way of Words. Cambridge, MA: Harvard University Press. 22-40.

Halliday, M. A. K., & Hasan, R. (1976). Cohesion in English. London: Longman.

Hock, H. H., & Joseph, B. D. (1996). Language History, Language Change, and Language Relationship: An Introduction to Historical and Comparative Linguistics. Berlin: Mouton de Gruyter.

Hoey, M. (1991). Patterns of Lexis in Text. London: Oxford University Press.

Hutchins, S. S. (1998). The psychological reality, variability, and compositionality of English phonesthemes. Dissertation Abstracts International, 59(08), 4500B. (University Microfilms No. AAT 9901857).

Infomap [Computer Software]. (2007). http://infomap-nlp.sourceforge.net/. Stanford, CA.

Kaufmann, S. (2000). Second-order cohesion. Computational Intelligence, 16, 511-524.

Landauer, T. K., & Dumais, S. T. (1997). A solution to Plato's problem: The Latent Semantic Analysis theory of the acquisition, induction, and representation of knowledge. Psychological Review, 104, 211-240.

Levin, E., Sharifi, M., & Ball, J. (2006). Evaluation of utility of LSA for word sense discrimination. In Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers, New York City. 77-80.

Marcu, D. (2003). Automatic Abstracting. In M. A. Drake (Ed.), Encyclopedia of Library and Information Science. 245-256.

Otis, K., & Sagi, E. (2008). Phonaesthemes: A Corpora-based Analysis. In B. C. Love, K. McRae, & V. M. Sloutsky (Eds.), Proceedings of the 30th Annual Conference of the Cognitive Science Society. Austin, TX: Cognitive Science Society.

Pado, S., & Lapata, M. (2007). Dependency-based Construction of Semantic Space Models. Computational Linguistics, 33, 161-199.

Riedel, E., Dexter, S. L., Scharber, C., & Doering, A. (2006). Experimental Evidence on the Effectiveness of Automated Essay Scoring in Teacher Education Cases. Journal of Educational Computing Research, 35, 267-287.

Rissanen, M. (1994). The Helsinki Corpus of English Texts. In M. Kytö, M. Rissanen, & S. Wright (Eds.), Corpora Across the Centuries: Proceedings of the First International Colloquium on English Diachronic Corpora. Amsterdam: Rodopi.

Schütze, H. (1996). Ambiguity in language learning: computational and cognitive models. Stanford, CA.

Schütze, H. (1998). Automatic word sense discrimination. Computational Linguistics, 24(1), 97-124.

Takayama, Y., Flournoy, R., & Kaufmann, S. (1998). Information Mapping: Concept-Based Information Retrieval Based on Word Associations. CSLI Tech Report. Stanford, CA.

Takayama, Y., Flournoy, R., Kaufmann, S., & Peters, S. (1999). Information retrieval based on domain-specific word associations. In N. Cercone & K. Naruedomkul (Eds.), Proceedings of the Pacific Association for Computational Linguistics (PACLING'99), Waterloo, Canada. 155-161.

Traugott, E. C. (1999). The Role of Pragmatics in Semantic Change. In J. Verschueren (Ed.), Pragmatics in 1998: Selected Papers from the 6th International Pragmatics Conference, vol. II. Antwerp: International Pragmatics Association. 93-102.

Traugott, E. C. (2005). Semantic Change. In K. Brown (Ed.), Encyclopedia of Language and Linguistics, 2nd ed. Oxford: Elsevier.

Traugott, E. C., & Dasher, R. B. (2002). Regularity in Semantic Change. Cambridge: Cambridge University Press.

Widdows, D. (2003). Unsupervised methods for developing taxonomies by combining syntactic and statistical information. In Proceedings of the joint Human Language Technology Conference and Annual Meeting of the North American Chapter of the Association for Computational Linguistics, Edmonton, Canada. 197-204.

Widdows, D. (2004). Geometry and Meaning. Stanford, CA: CSLI Publications.
