Variation in Language: System- and Usage-based Approaches

linguae & litterae

Publications of the School of Language & Literature


Freiburg Institute for Advanced Studies

Edited by
Peter Auer, Gesa von Essen, Werner Frick

Editorial Board
Michel Espagne (Paris), Marino Freschi (Rom), Ekkehard König (Berlin),
Michael Lackner (Erlangen-Nürnberg), Per Linell (Linköping),
Angelika Linke (Zürich), Christine Maillard (Strasbourg),
Lorenza Mondada (Basel), Pieter Muysken (Nijmegen),
Wolfgang Raible (Freiburg), Monika Schmitz-Emans (Bochum)

Volume 50
Variation in Language: System- and Usage-based Approaches

Edited by
Aria Adli, Marco García García and Göz Kaufmann
ISBN 978-3-11-034355-7
e-ISBN (PDF) 978-3-11-034685-5
e-ISBN (EPUB) 978-3-11-038457-4
ISSN 1869-7054

Library of Congress Cataloging-in-Publication Data


A CIP catalog record for this book has been applied for at the Library of Congress.

Bibliographic information published by the Deutsche Nationalbibliothek


The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie;
detailed bibliographic data are available on the Internet at http://dnb.dnb.de.

© 2015 Walter de Gruyter GmbH, Berlin/Boston


Typesetting: epline, Kirchheim unter Teck
Printing: Hubert & Co. GmbH & Co. KG, Göttingen
♾ Printed on acid-free paper
Printed in Germany

www.degruyter.com
Contents
Aria Adli, Marco García García, Göz Kaufmann
System and usage: (Never) mind the gap  1

Part 1: System, usage, and variation

Frederick J. Newmeyer
Language variation and the autonomy of grammar  29

Gregory R. Guy
The grammar of use and the use of grammar  47

Richard Cameron
Looking for structure-dependence, category-sensitive processes,
and long-distance dependencies in usage  69

Mary A. Kato
Variation in syntax: Two case studies on Brazilian Portuguese  91

Part 2: Rare phenomena and variation

Göz Kaufmann
Rare phenomena revealing basic syntactic mechanisms: The case of
unexpected verb-object sequences in Mennonite Low German  113

Leonie Cornips
The no man’s land between syntax and variationist sociolinguistics:
The case of idiolectal variability  147

Aria Adli
What you like is not what you do:
Acceptability and frequency in syntactic variation  173

Part 3: Grammar, evolution, and diachrony

Hubert Haider
“Intelligent design” of grammars – a result of cognitive evolution  203

Guido Seiler
Syntactization, analogy and the distinction between
proximate and evolutionary causations  239

Rena Torres Cacoullos
Gradual loss of analyzability: Diachronic priming effects  265

Malte Rosemeyer
How usage rescues the system: Persistence as conservation  289
Aria Adli, University of Cologne
Marco García García, University of Cologne
Göz Kaufmann, University of Freiburg
System and usage: (Never) mind the gap1

1 System- and usage-based approaches

1.1 What is at stake?

At least since the Saussurian distinction between langue and parole, the relation
between grammar and language use has been a central topic of linguistic thought.
The present volume deals with this relation by focusing on language variation.
The improved possibilities of working with large corpora and the increased refine-
ment of experimental designs make this – once again – a worthwhile undertaking.
Quite unsurprisingly, different linguistic subfields make different uses of these
new possibilities, uses which reflect their respective theoretical frames. Many
sociolinguists apply usage-based approaches while most, though not all, syn-
tacticians adhere to system-based approaches. However, both usage and system
are heterogeneous and even somewhat fuzzy concepts. Therefore, the question
arises whether this distinction is at all meaningful. It may be more appropriate
to conceive of a continuum between system- and usage-based approaches. Such a
continuum includes intermediate positions, several of which can be found in this
volume. On the system-side, the endpoint of such a continuum may be seen in
generative grammar. An important common denominator of generative (system-
based) approaches is the assumption that grammar is independent from usage
and that language use obeys the rules of a grammatical system. On the usage-
side, the endpoint may be represented by the model of emergent grammar, which
refers to the idea that linguistic structures and regularities are no more than an
epiphenomenon, i. e. “not the source of understanding a communication but
a by-product of it” (Hopper 1998: 156). An important common denominator of
usage-based approaches is that grammar is essentially shaped by usage patterns
and frequency.

1 The editors would like to thank the authors for their contributions and for their willingness to
participate in the process of internal reviewing. Our thanks also go to two external reviewers, to
Peter Auer for his continuous help and to Elin Arbin for checking the English of authors whose
native language is not English. Obviously, all remaining shortcomings are our responsibility.
Finally, we would like to thank the Freiburg Institute for Advanced Studies (FRIAS) for financing
both the workshop System, Usage, and Society, which took place in Freiburg in November 2011,
and the publication of this volume.

When bringing together system, usage and variation, some basic questions
have to be answered: (i) Given that systematic variation is one of the most stable
findings in the analysis of language(s), how do (more) system-based and (more)
usage-based approaches explain such variation? (ii) How do (more) system-
and (more) usage-based approaches define the relation between theory and
data? (iii) Are there empirical facts that can only be explained satisfactorily by
an autonomous grammatical system? (iv) Can we find distributional patterns
whose quantitative analysis constitutes strong evidence for the assumption that
frequency, i. e. usage, shapes the speakers’ cognitive representation of language?
The first two questions will be dealt with in this introductory chapter, the last two
in the contributions to this volume.

1.2 Gradience and change

Experience and gradience are defined differently in usage- and system-based
approaches. According to Bybee (2006: 711), experience with language is the
basic element in usage-based models: “While all linguists are likely to agree that
grammar is the cognitive organization of language, a usage-based theorist would
make the more specific proposal that grammar is the cognitive organization of
one’s experience with language”. However, experience cannot be the trigger for
an abrupt change of the cognitive representation of language; it rather leads
to gradual processes. Bybee (2010: 2) compares language to a constantly and
smoothly changing sand dune:

The primary reason for viewing language as a complex adaptive system, that is, as being
more like sand dunes than like a planned structure, such as a building, is that language
exhibits a great deal of variation and gradience. Gradience refers to the fact that many cat-
egories of language or grammar are difficult to distinguish, usually because change occurs
over time in a gradual way, moving an element along a continuum from one category to
another.

By contrast, generative syntacticians assume the existence of an autonomous
syntactic core module. Importantly, this module is not shaped by the individual’s
experience with language since it is considered to be part of the biological
equipment of humans. Universal Grammar is the genotype and each adult
grammar constitutes a possible phenotype. Thus, experience only explains
the development of a specific phenotype within the restricted limits
imposed by Universal Grammar. During first language acquisition, parameters are
set to specific values, a process which is seen as fairly robust (Meisel 2011). Light-
foot (2006: 6) points out that “a person’s system, his/her grammar, grows in the
first few years of life and varies at the edges depending on a number of factors”
[highlighting added by us]. By way of illustration, Lightfoot (2006: 4–5) uses the
mold, not the sand dune, as metaphor:

The biology of life is similar in all species, from yeasts to humans. Small differences in
factors like the timing of cell mechanisms can produce large differences in the resulting
organism, the difference, say, between a shark and a butterfly. Similarly the languages of
the world are cast from the same mold, their essential properties being determined by fixed,
universal principles. The differences are not due to biological properties but to environ-
mental factors.

Within this approach, variation and gradience require other explanations than in
usage-based theories. Many generative syntacticians see their locus at the com-
munity level in the sense that during a period of change, multiple competing
grammars coexist. Introducing the constant-rate hypothesis, Kroch (1989: 200)
presents a quantitative corpus study of the rise of periphrastic do in English ques-
tions and negations. He claims that “when one grammatical option replaces
another with which it is in competition within the community across a set of lin-
guistic contexts, the rate of replacement, properly measured, is the same in all of
them”.2
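Kroch’s rate of replacement “properly measured” is usually understood as the slope of a logistic (S-shaped) curve fitted to the proportion of the new variant over time, where each context may have its own intercept. The Python sketch below illustrates only that idea under stated assumptions: the parameter values (`SHARED_SLOPE`, `INTERCEPTS`) and the context labels are hypothetical, not Kroch’s data or his fitted estimates. With a shared slope, the log-odds gap between two contexts stays constant, even though the difference in raw percentages changes over time.

```python
import math

# Hypothetical illustration of the constant-rate hypothesis (Kroch 1989):
# every context follows logit(p) = k + s * t with its own intercept k
# but one shared slope s. All numbers below are invented.
SHARED_SLOPE = 0.05  # hypothetical rate of change per year, in log-odds
INTERCEPTS = {
    "context A": -2.0,  # hypothetical: change already further advanced
    "context B": -4.0,  # hypothetical: change lags behind
}


def logistic(t: float, k: float, s: float) -> float:
    """Proportion of the new variant at time t under logit(p) = k + s * t."""
    return 1.0 / (1.0 + math.exp(-(k + s * t)))


def logit(p: float) -> float:
    """Log-odds of a proportion p with 0 < p < 1."""
    return math.log(p / (1.0 - p))


def estimate_slope(p1: float, p2: float, t1: float, t2: float) -> float:
    """Rate of change between two observations, measured on the log-odds scale."""
    return (logit(p2) - logit(p1)) / (t2 - t1)


if __name__ == "__main__":
    years = [0, 50, 100, 150]
    curves = {ctx: [logistic(t, k, SHARED_SLOPE) for t in years]
              for ctx, k in INTERCEPTS.items()}
    for i, t in enumerate(years):
        a, b = curves["context A"][i], curves["context B"][i]
        # The raw percentage gap changes over time; the log-odds gap does not.
        print(f"t={t:>3}  A={a:.3f}  B={b:.3f}  "
              f"diff={a - b:.3f}  logit diff={logit(a) - logit(b):.3f}")
    for ctx, ps in curves.items():
        slope = estimate_slope(ps[0], ps[-1], years[0], years[-1])
        print(ctx, "estimated slope:", round(slope, 6))
```

In this sketch, checking the hypothesis amounts to comparing the estimated slopes across contexts; actual studies fit such curves statistically to corpus counts rather than reading them off two points.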
However, for usage-based linguists, the locus of change (and of gradience)
is not exclusively the community where typically a generational change between
caretakers and children takes place, but also the mature individual whose
linguistic knowledge changes over the lifespan. Usage-based theorists also point
out that frequency should not be seen in isolation, but rather in “interaction and
competition with various other factors of language use, such as recency, salience
and context” (Behrens et al. to appear: section 9).
Taking a usage-based stance, Torres Cacoullos (this volume) discusses
both change over the lifespan and recency or priming effects. She studies the
historical development of complex verbal constructions in Spanish (locative estar +
gerund) into a single periphrastic unit of progressive aspect. In doing so, she shows
that progressive estar-constructions are primed by preceding non-progressive
estar-constructions. Torres Cacoullos argues that the priming effect (including
its changing intensity over time) is the result of the analyzability of the progressive
estar-construction. The capacity for this type of analyzability and the gradual
change over time connected to it are assumed to exist within an individual’s
grammar, given that priming is a psycholinguistic phenomenon based on individual
cognitive processes. Furthermore, Torres Cacoullos argues that Kroch’s (1989)
above-mentioned constant-rate hypothesis does not hold for the probability of
selecting the progressive variant.

2 Yang (2000: 248) extends Kroch’s (1989) interpretation, assuming that multiple grammars
exist not only within a community but also within an individual’s mind. He claims that “there
is evidence of multiple grammars in mature speakers during the course of language change.”

Rosemeyer (this volume) is another contribution from a usage-based per-
spective. On the basis of a quantitative historical corpus, he studies Spanish split
auxiliary selection, i. e. the question whether writers chose BE or HAVE in analytic
perfect constructions. Like Torres Cacoullos, he analyzes priming effects, namely
effects of persistence (linked to temporally close activation) and entrenchment
(linked to repeated activation) (in the sense of Langacker 1987: 59; Bybee 2002;
Szmrecsanyi 2005). Rosemeyer points out that both persistence and entrench-
ment have conserving effects on diachronic grammatical development, thereby
creating systematicity in the patterns of change.
The generative view on frequency effects is quite different. It is crucial to dis-
tinguish, as Meisel (2011: 3) has pointed out, between grammatical change that
involves parameter resetting in the sense of Universal Grammar (see e. g. Light-
foot 2006) and change that is not attributable to new parameter values:

As Sankoff (2005) and Sankoff and Blondeau (2007) have demonstrated, individuals may,
in fact, adapt their language use during adulthood to innovative patterns resulting from
generational change. Such lifespan changes may have profound consequences, but they
do not involve reanalysis of grammars, i. e. we do not find evidence suggesting that mental
representations of parameterized grammatical knowledge are subject to modifications after
childhood. Even attrition of syntactic knowledge only seems to affect a person’s ability to
use the knowledge developed early on in life.

For example, the frequency of subject pronoun realization can vary substantially
from one null subject language (NSL) to another (Otheguy, Zentella and Livert
2007). However, from a generative point of view a critical threshold must be
reached which leads to a parameter resetting from [+NSL] to [–NSL] or vice versa
(or possibly also to [+partial NSL] in the sense of Holmberg, Nayudu and Sheehan
2009). Beyond that critical threshold, change is not gradual but abrupt, because
in this case I-grammar changes. This has been clearly expressed by Lightfoot
(2006: 158):

I submit that work on abrupt creolization, the acquisition of signed languages, and on cata-
strophic historical change shows us that children do not necessarily converge on grammars
that match input. This work invites us to think of children as cue-based learners: they do
not rate the generative capacity of grammars against the sets of expressions they encounter
but rather they scan the environment for necessary elements of I-language in unembedded
domains, and build their grammars cue by cue. The cues are not in the input directly, but
they are derived from the input, in the mental representations yielded as children under-
stand and “parse” the E-language to which they are exposed [...]. We may seek to quantify
the degree to which cues are expressed by the PLD [Primary Linguistic Data], showing
that abrupt, catastrophic change takes place when those cues are expressed below some
threshold of robustness and are eliminated.

Thus, one crucial question is whether there are different types of change, gradual
non-parametric ones that are better accounted for with usage-based models and
abrupt parametric ones that can be better explained within the generative model.3
Newmeyer (2003: 693–694) highlights one aspect that contradicts the usage-
based view: He shows that grammars are not always useful in the sense of opti-
mally responding to users’ pragmatic and social needs, i. e., languages often lack
lexical and/or grammatical properties that may arguably be useful and possess
properties that are not useful at all. As an example of the lack of distinctions that
would be useful in everyday communication, Newmeyer (2003: 693) mentions the
conspicuous rarity of the inclusive/exclusive pronoun distinction in the world’s
languages. If grammar were as adaptive and fluid as suggested by the sand dune
metaphor (most clearly expressed by the idea of emergent grammar in Bybee and
Hopper 2001), one would expect, as he states, such useful features to occur in
the great majority of languages. Conversely, characteristics of dubious usefulness
such as the homonymy between English you (2sg) and you (2pl) should not occur.
Likewise, we would not expect abrupt phenomena of change under the assumption of
a fully adaptive and fluid grammar.

1.3 Frequency and probabilities in grammar

The debate in the journal Language in the years 2003 to 2007 brought together diamet-
rically opposed opinions about the (ir)relevance of frequency and probabilities
in grammar. In his paper, Newmeyer (2003) questioned the representativeness of
empirical data in many usage-based corpus studies on syntax (cf. also the dis-
cussion on the relation between theory and data in section 2.1). Guy (2005, 2007)
replied that this is no more (but also no less) than a challenge that can be over-
come. Indeed, as Guy points out, quantitative sociolinguistic research has already
established high methodological standards with regard to corpus data. Furthermore,
the explanatory power of linguistic theory would be unnecessarily diminished if
quantitative correlations with other language-internal and social factors were not
taken into account.

3 We know from research in dynamic modeling that phenomena of change in very different
fields do incorporate both gradual and abrupt changes (cf. e. g. Thom 1980). Future research
needs to show whether grammatical change follows a similar pattern and whether gradual and
abrupt language change can be integrated into one theory.

At this point, it is important to highlight that variationist sociolinguistics is
not “inherently usage-based”. However, the importance attached to frequency or
probability constitutes a major zone of overlap between variationist and usage-based
models. It does not come as a surprise that one sociolinguistic subfield, namely
cognitive sociolinguistics (Kristiansen and Dirven 2008; Geeraerts, Kristiansen
and Peirsman 2010), builds on the premises of usage-based linguistics.
Likewise, relying on corpus data from actual language use in formal-syntactic
research does not by itself lead to the incorporation of usage-based positions into
generative thinking. The difference between theorists from both persuasions with
regard to the role of experience on the cognitive organization of the (child and
adult) speaker remains. Yet, taking corpus data seriously leads to a closer
consideration of frequency and related phenomena such as gradience, recency,
and variation in usage. To put it in Barbiers’ (2013: 3) words, we could then “shift
away from the methodology of idealization of the data in search of the universal
syntactic properties of natural language, towards a methodology that takes into
account the full range of syntactic variation that can be found in colloquial lan-
guage”. However, most generative linguists still do not analyze social variation
in their research and those who do have abandoned classic generative premises.4
Seeing the social perspective as irrelevant goes back to Chomsky’s (1965: 3–5)
notion of the ideal speaker-listener. The following quotation, which is embedded
in Chomsky’s (2000: 31) critique of externalist philosophy, highlights this:

Suppose, for example, that “following a rule” is analyzed in terms of communities: Jones
follows a rule if he conforms to the practice or norms of the community. If the “community”
is homogeneous, reference to it contributes nothing (the notions norm, practice, con-
vention, etc. raise further questions). If the “community” is heterogeneous – apart from the
even greater unclarity of the notion of norms (practice, etc.) for this case – several problems
arise. One is that the proposed analysis is descriptively inaccurate. Typically, we attribute
rule-following in the case of notable lack of conformity to prescriptive practice or alleged
norms. […] The more serious objection is that the notion of “community” or “common
language” makes as much sense as the notion “nearby city” or “look alike”, without further
specification of interests, leaving the analysis vacuous.

4 However, not all generative syntacticians follow this approach. Wilson and Henry (1998: 8)
point out that social variation and change are constrained by Universal Grammar, which defines
the set of possible grammars. A corresponding observation is that grammatical introspection –
the most important empirical source in generative syntax – is subject to systematic social vari-
ation (Adli 2013: 508; cf. also Bender 2007; Eckert 2000: 45).

In the classic generative position, particularly widespread during the early years of
generative grammar, frequency effects and correlations between the use of
a construction and other language-internal and language-external factors were
(and regrettably sometimes still are) considered to be epiphenomenal, a position
most clearly expressed in Chomsky’s (1965: 3) famous statement:

Linguistic theory is concerned primarily with an ideal speaker-listener, in a completely
homogeneous speech-community, who knows its language perfectly and is unaffected by
such grammatically irrelevant conditions as memory limitations, distractions, shifts of
attention and interest, and errors (random or characteristic) in applying his knowledge of
the language in actual performance.

According to mainstream generative theory, understanding language use is
secondary in the sense that any theory of use presupposes a theory of the system
(Chomsky 1965: 9). Actual linguistic experience and general non-linguistic capacities
are treated as factors distinct from UG in this research enterprise (the second and
third factors, respectively), as becomes clear in one of Chomsky’s (2009: 25) recent
writings:

Assuming that language has general properties of other biological systems, we should be
seeking three factors that enter into its growth in the individual: (1) genetic factors, the topic
of UG; (2) experience, which permits variation within a fairly narrow range; (3) principles
not specific to language. The third factor includes principles of efficient computation,
which would be expected to be of particular significance for systems such as language. UG
is the residue when third-factor effects are abstracted.

1.4 Variation as a bone of contention

In usage-based approaches variation does not constitute a serious problem
because it is seen as a core property of language and of the speakers’ knowledge
of language. As we have seen, variationist sociolinguists, who are often but not
always in line with usage-based positions, describe variation by means of vari-
able rules, calculating the probability p for a given form in a subpopulation q
(Guy 1991; Labov 1969). Guy (this volume) writes that “any grammar of a lan-
guage, or theory of grammar, that fails to account for variability is inadequate
on its face – it does not even reach Chomsky’s most elementary level of ‘observa-
tional’ adequacy”.
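Variable rules are commonly implemented with the logistic model used in Varbrul-type analyses: the log-odds of the variant equals the log-odds of an input probability plus the log-odds of one weight per factor, where a weight of 0.5 is neutral. The Python sketch below assumes that formulation; the factor groups (`WEIGHTS`) and all numerical values are hypothetical and serve only to show how linguistic context and speaker subpopulation jointly determine the probability p of a form.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability p with 0 < p < 1."""
    return math.log(p / (1.0 - p))


def inv_logit(x: float) -> float:
    """Inverse of logit: maps log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def rule_probability(input_prob: float, factor_weights: list[float]) -> float:
    """Logistic variable-rule model: logit(p) = logit(p0) + sum of logit(p_i).

    Weights above 0.5 favor the variant, weights below 0.5 disfavor it,
    and a weight of exactly 0.5 leaves the probability unchanged.
    """
    return inv_logit(logit(input_prob) + sum(logit(w) for w in factor_weights))


# Hypothetical factor groups and weights, invented for illustration only.
WEIGHTS = {
    "linguistic context": {"favoring": 0.70, "disfavoring": 0.30},
    "speaker group": {"group 1": 0.60, "group 2": 0.40},
}

if __name__ == "__main__":
    p0 = 0.45  # hypothetical input probability
    for ctx, w_ctx in WEIGHTS["linguistic context"].items():
        for grp, w_grp in WEIGHTS["speaker group"].items():
            p = rule_probability(p0, [w_ctx, w_grp])
            print(f"{ctx:>11} / {grp}: p = {p:.3f}")
```

The additive log-odds form is what lets such models treat each factor’s effect as independent of the others, which is one reason it became standard in variationist work.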
Generative syntacticians usually do not adhere to the idea of “optional” variants
of one underlying form, which is a major conceptual obstacle in integrating
variation into their model. Rather, optionality is described as “apparent”, either
by assuming that the input of the sentences differs (Adger and Smith 2010) –
with or without differences in meaning – or by assuming that the sentences are
produced by multiple competing grammars (Tortora and den Dikken 2010; Kroch
1989). One example illustrating the debate on optionality is French wh-questions,
in which the wh-element can appear in situ as in (1a) or fronted as in (1b).

(1a) Ton frère est allé où ?
     your brother has gone where

(1b) Où ton frère est allé ?
     where your brother has gone
     ‘Where did your brother go?’

It is telling that optionality, the central notion of variationist sociolinguistics
(“alternate ways of saying the same thing”, cf. Labov 1972: 118), is rejected by
mainstream generative syntax. Belletti and Rizzi (2002: 34) express the generative
position very clearly with regard to word order variation: “The movement-as-last
resort approach implies that there is no truly optional movement. This has made
it necessary to reanalyze apparent cases of optionality, often leading to the dis-
covery of subtle interpretive differences”. The wh-fronted order in (1b) could be
an example of this. Some linguists would claim that the word order differences in
(1a) and (1b) represent pragmatic or semantic differences, for example in terms
of givenness of the non-wh-entities or presupposition of the wh-element, yet it is
far from clear whether such differences are systematic. The idea, as pointed out
by Newmeyer (2006: 705), is that “as far as syntactic variation is concerned, since
variants typically differ in meaning, the probabilities are likely to be more a func-
tion of the meaning to be conveyed than a characteristic inherent to the structure
itself”. Functionally motivated variation, “the more systematic aspects of one’s
extragrammatical knowledge”, is taken into account by Newmeyer (2003: 698);
however, it is not seen as part of the system but of the “user’s manual”.5

5 The notion of a user’s manual goes back to Culy (1996: 112), who sees its roots in pragmatics.
His initial idea was to explain systematic grammatical differences between registers/styles,
such as the distinctive use of zero objects in English recipes or the frequency of use of particular
grammatical forms or constructions. The user’s manual in Culy’s (1996: 114) terms is roughly
described as specifying “the characteristics of registers and, within each register, […] character-
istics of different styles”. The user’s manual essentially carries information on frequency of use
of items and constructions and default interpretations of variables in valency relations. This
notion was later taken up by Zwicky (1999), yet without proposing a notably refined definition.

For most syntacticians, the criteria that optionality has to fulfill are much stricter
than those usually applied in variationist sociolinguistics. While many syntacticians
reject optionality if two variants are not fully identical in meaning and distribution,
it is good enough for sociolinguists if two variants are interchangeable in most
contexts. This phenomenon is called “neutralization in discourse” (Sankoff 1988: 153).
In his debate with Newmeyer, Guy
(2007: 3) points out that “the prevailing consensus is that, while certain struc-
tures may have different meanings in some of the contexts they occur in, there are
often other contexts in which they function as alternants. Therefore, productive
variationist analyses can be conducted, given careful attention to contexts and
meaning”.
Syntacticians who try to incorporate optionality into the grammar system
represent a minority. Kato (this volume) is one example. She presents a study
within the generative paradigm which nevertheless takes a critical stance
towards standard minimalist assumptions. Kato attempts to account for variation
and optionality in Brazilian Portuguese syntax. First, she discusses the variation
between null and overt subject pronouns. It is noteworthy that she sees the locus
of this variation inside a person’s I-language without resorting to the idea of
multiple internal grammars (as in Yang 2000; see also Roeper’s 1999 more radical
approach of “universal bilingualism”). In doing so, she refers to the distinction,
introduced by Chomsky (1981: 8), between the core and the extended periphery of
grammar, both constitutive of a person’s I-language. Kato builds on her earlier work (Kato 2011),
where core grammar is linked to early childhood acquisition and the extended
periphery in syntax to late childhood acquisition. It turns out that overt subject
pronouns, typical of current Brazilian Portuguese, are acquired before schooling
and null subjects during schooling. This means that the null subject is used by
older children and adults, yet its late acquisition “does not affect grammar as
a system”. The second phenomenon Kato discusses concerns optional surface
orders of Brazilian Portuguese wh-questions, namely the variation between
“fronted” and “in-situ” wh-constituents. Her study suggests that the positioning
of the wh-constituent is acquired in early childhood. Thus, Kato argues that the
variants belong to the child’s core grammar.
Barbiers (2005) is another example of a generative syntactician who incorporates
the notion of optionality into grammar.6 He assumes that “variation and
optionality are an inherent property of grammatical systems. Individual speakers
and communities pick their choice from the options provided by their grammatical
system, but they never pick beyond these options” (but see Cornips, this volume,
on non-V2 root clauses in Germanic varieties as a possible counterexample).7
We have already seen in the previous section that the issue of frequency and
probability constitutes a dividing line between generative syntacticians and vari-
ationist sociolinguists. This question is also an essential aspect of the discussion
on how to account for variation: in essence, the question is whether grammar
contains numbers or not. Variationist sociolinguists believe that “variability and
quantitative properties are found in the system, inside the grammar” (Guy, this
volume). Guy underlines the fact that functional and usage-based approaches
that do not work with probabilities and limit their explanations to notions such
as functional load “fail to predict any specific quantitative relation”. In this
sense, variationist sociolinguists claim to go further than usage-based linguists
by integrating mathematical operations and probabilities into the grammar.
These regular patterns of probability have been described by Weinreich, Labov
and Herzog (1968: 100) as “orderly heterogeneity”, which is considered a core
property of language and a basic aspect of speakers’ knowledge. Newmeyer (this
volume) rejects this view, claiming that “grammars do not contain numbers”.
He states that probabilistic observations in variation are an “interaction of the
formal grammar and extragrammatical faculties, as modulated by the user’s
manual”. Newmeyer uses the notions of core and interface in order to further
pinpoint the user’s manual. He adopts what he calls a “modular approach to variation”
with grammatical competence (the system) as its core. According to him,
the user’s manual presents “one face to grammatical competence and one face
to the external factors that shape grammar. In a nutshell, it tells us what to do
with our grammars and how often to do it”. It is not surprising that Guy (2007: 4)
strongly criticizes “an ill-defined ‘user’s manual’, from which the quantitative
generalizations emerge epiphenomenally, without any necessary connection to
the principles contained in the grammar”.

6 Other examples are Fukui (1993), Saito and Fukui (1998), Henry (2002), Haider and Rosengren
(2003), Biberauer and Richards (2006), and Adli (2013). Barbiers (2005) builds his conclusions
on data from SAND (Syntactische Atlas van de Nederlandse Dialecten, cf. www.meertens.knaw.nl/sand
and Barbiers 2013: 2–3 on the European Dialect Syntax Project). Other projects which combine
quantitative analyses of microdialectal variation with modern syntactic theory are the Dialect
Syntax of Swiss German (University of Zurich) and Kaufmann’s (2007) study on the verbal syntax
of Mennonite Low German.
7 For an interesting formal proposal that accounts for (genuine or apparent) optionality within
the minimalist framework, see the algorithm dubbed “combinatorial variability” presented by
Adger (2006) and Adger and Smith (2010).

1.5 A dash of epistemology

At the present stage of knowledge, it is hard to see how the dispute for or against
a clear distinction between grammar and usage can be settled empirically. One way to
proceed is to take a step back in order to engage in an epistemological
discussion. Cameron (this volume) takes a critical stance against a binary view of
system and usage. He emphasizes that this distinction is better described as a
“fundamental assumption that contributes to theory building” and that “as such,
the distinction itself may not actually be falsifiable in a broad sense”. Essentially,
he takes up three phenomena cited by Newmeyer (2003) in favor of a binary dis-
tinction (long-distance dependencies, category-sensitive processes and structure
dependency) and shows that these phenomena have analogues or parallels in
usage. Cameron concludes: “I guess what I am arguing for is a new set of terms,
something other than grammar and usage or competence and performance,
something not binary, something n-nary”.
The contributions of Seiler and Haider go one step further by integrating
central ideas from other fields of science, notably evolutionary biology. Seiler (this volume) engages in
a conceptual discussion on the foundation and the implications of the system-
usage debate. He states that usage-based linguistics has a certain kinship with
functionalist approaches to grammar (“in a formalist view on syntax, syntactic
structure is to some degree immune against usage”), while formalist theories
often embody the idea of an autonomous syntactic system. Yet he points out
that formalist and functionalist approaches to language should not be seen as
antagonistic since they may explain different aspects of language. For Seiler (this
volume), a strict system-usage dichotomy would be ill-fated, just as ill-fated as
the unproductive formalist-functionalist dichotomy in biology:

The fundamental structure of the debates in biology and linguistics is astonishingly similar.
In both disciplines, two schools defended their way of explaining aspects of nature as the
only possible one at their time: proximate vs. evolutionary in biology, formal vs. functional
in linguistics. The main difference between biology and linguistics lies in the fact that the
complementarity (and compatibility) of the two kinds of explanation has been widely
accepted by biologists since the modern evolutionary synthesis some seventy years ago. A
modern linguistic synthesis is still yet to come. For linguists, this is not exactly a reason to
be proud of.

On the surface, Haider’s (this volume) opinion with regard to functionalist and
structuralist (formalist) schools in linguistics seems comparable to Seiler’s, but
Haider is more radical, considering both approaches to be wrong: “The dispute
[between structuralism and functionalism in biology] turned out to be completely
irrelevant after Darwin’s theory of evolution gained ground”. Unlike Seiler,
12 Aria Adli, Marco García García, and Göz Kaufmann

Haider does not expect progress from the “complementarity (and
compatibility) of the two kinds of explanation”, i. e., functionalist and structuralist
approaches, thus running counter, to a certain degree, to a central goal of this volume. He
sees in grammar a cognitive organism which undergoes cognitive evolution and
claims that “the descent of species and the descent of languages encompass the
same abstract mechanism (self-replication, variation, selection) in two different
domains”, a comparison suggesting that linguistics will need a figure on a par
with Darwin in order to advance.
So far, we have dealt with the first question mentioned at the end of section
1.1, namely the question of how usage- and system-based approaches tackle vari-
ation in language. Bringing contributions from disparate theoretical perspectives
together in one volume, however, is also a good opportunity to raise fundamental
methodological issues. The rest of this introduction will, therefore, be dedicated
to the second question mentioned above, namely the relationship between theory
and data.

2 Looking behind the scenes: The old problem of theory and data

2.1 Empiricism and methodological issues

One aspect in the discussion of system- and usage-based approaches concerns
empiricism. The central importance of empiricism for any linguistic school is
well expressed by Labov (1975: 7), who writes that “if one linguist cannot per-
suade another that his facts are facts, he can hardly persuade him that his theory
is right, or even show him that he is dealing with the same subject matter”.
Although at first glance it may seem somewhat far-fetched to suggest that
linguists of different orientations do not deal with the same subject matter, a closer
look at one part of Chomsky’s (1965: 3) famous dictum, which has already been
cited above, leaves little room for doubt: “Linguistic theory is
concerned primarily with an ideal speaker-listener, in a completely homogeneous
speech-community”. Section 1 has already stressed the central role of variation
in the discussion about system and usage. The issue of whether one accepts vari-
ation as a core fact of language, whether one locates variation inside or outside
grammar proper, and whether one deems it necessary to abstract away from vari-
ation is present in several of the papers in this volume. We therefore must address
the following questions: What kind of empirical data should linguists use and
which role should these data play in theory-building and theory-testing? These
questions will be approached by discussing an older paper by Labov (1975) and
a more recent one by Featherston (2007a) (cf. also Schütze 1996). Focusing
on the empirical base of generative grammar and the critique it spawned should
make the positions of both system- and usage-based (variationist) approaches
clear. The fact that Featherston (2007a) criticizes some of the same things Labov
(1975) criticized 32 years earlier shows that the latter was right in concluding that
there was little room for optimism with regard to a possible rapprochement of the
different positions: “Ideological positions are too well established, and habits of
work are too firmly set to believe that there will be an immediate convergence of
thinking on these issues” (Labov 1975: 54).
With regard to the question of what kind of empirical data linguists should
use, Featherston (2007a: 271 and 308) considers the frequently applied practice of
using judgments of a single person to be “inadequate”, adding that “[t]here can
be little satisfaction in producing or reading work which so clearly fails to satisfy
scientific and academic standards”. Although his paper drew many critical reactions, Featherston
also received support. Haider (2007: 389) writes:

Generative Grammar is not free of post-modern extravagances that praise an extravagant
idea simply because of its intriguing and novel intricacies as if novelty and extravagance by
itself would guarantee empirical appropriateness. In arts this may suffice, in science it does
not. Contemporary papers too often enjoy a naive verificationist style and seem to com-
pletely waive the need of independent evidence for non-evident assumptions. The rigorous
call for testable and successfully tested independent evidence is likely to disturb many
playful approaches to syntax and guide the field eventually into the direction of a serious
science. At the moment we are at best in a pre-scientific phase of orientation, on the way
from philology to cognitive science.8

The fact that both Featherston and Haider work in the generative framework and
nevertheless take such a critical stance, and the fact that some generative
linguists use historical corpora or elicited data for their analyses (Lightfoot 1999;
Kroch 1989; Barbiers 2005, etc.), may raise our hopes. In general, however, the
empirical base of much generative work continues to be of a rather dubious nature.
Newmeyer (2007: 395) still describes the use of a “single person’s judgments”
as “standard practice in the field”, not just in generative linguistics but in
cognitive and functional linguistics as well. The problem with a “single person’s
judgments” is that generative linguists work with the judgments of conscious
and – even more importantly – self-conscious human beings and not with
unconscious matter as classical natural sciences like physics or chemistry do.

8 Pullum (2007: 36) discusses the same point: “Looking back at the syntax published a couple of
decades ago makes it rather clear that much of it is going to have to be redone from the ground
up just to reach minimal levels of empirical accuracy. Faced with data flaws of these proportions,
biology journals issue retractions, and researchers are disciplined or dismissed”.

Due to
this, the problem of separating acceptability from grammaticality, i. e., of filtering out
the “noise” of acceptability from the supposedly pure essence of grammaticality,
may not be solvable in principle (cf. Schütze 1996: 25–27 and 48–52; Featherston
2007b: 401–403; Newmeyer 2007: 396–398), regardless of whether one aggregates
judgments of hundreds of disinterested informants or whether one uses the intro-
spection of a definitely not disinterested linguist.
Besides the acceptability-grammaticality issue, there is a whole array of
further problems; for example, the still unclear relationship between speakers’
judgments and speakers’ language production:9 Kempen and Harbusch (2005:
342) analyze sequences of (pronominal) arguments in the midfield of German
clauses and write that “[a]rgument orderings that embody mild violations of the
[linearization] rule, receive medium-range grammaticality scores […] but are vir-
tually absent from the corpora because the grammatical encoding mechanism
in speakers/writers does not (or hardly ever) produce them”. Unfortunately, the
theoretical relevance of such grammaticality-production mismatches is rarely the
focus of research (but cf. Adli this volume). Why do sequences that are rated as
medium-range grammatical not occur more often, and what exactly does it mean
if a sentence is of medium-range grammaticality? Be this as it may, the mismatch
between medium-range grammaticality (judgment) and lack of occurrence (pro-
duction) does not constitute a fundamental problem for generative grammar. A
more threatening issue, however, is the mismatch of supposed ungrammaticality
(judgment) and occurrence (production). A case in point is the “prohibition
against the deletion of the relative pronouns which are subjects” (Labov 1975:
41–42). In spite of this judgment-based prohibition, this kind of deletion occurred
in fourteen out of 336 possible tokens in a corpus from Philadelphia (4.2 %).
Granted, 4.2 % is not a very high share, but fourteen occurrences is a robust
enough number for linguists to wonder how one could account for the existence
of these tokens. As many rare phenomena will not form part of the idiolects of
linguists, these linguists will simply not (be able to) submit them to their own
grammaticality judgments. Thus, the linguist who refuses to work with perform-
ance data (E-language) is bound to overlook possibly crucial linguistic facts since
many of these rare phenomena are only detected in large corpora (cf. section 2.2
for a more thorough discussion of rare phenomena). Without analyzing such
phenomena, we will not be able to produce a grammar which generates all possible
sentences of a language.

9 Schütze (1996: 48) comments: “Over the history of generative grammar, much has made [sic!]
of its heavy reliance on introspective judgments and their nonequivalence to production and
comprehension”. Featherston (2007a: 271) also mentions production data as a possible source
for studies in the generative framework: “This focus [on judgment data] is in no way intended to
belittle the value of corpus data or make out that this data type is any less relevant”.

Another problem with regard to the use of grammaticality judgments can be
illustrated by means of the reactions of Grewendorf (2007) and Den Dikken et al.
(2007) to Featherston’s (2007a) article. Grewendorf (2007: 376) notes:

Nevertheless, it may eventually turn out that differences in grammaticality judgments
between a group and an individual linguist cannot be attributed to “inadequate research
practice” of the latter but clearly exhibit differences between I-languages. In this respect,
the grammatical intuitions of the individual cannot be falsified by the results of accepta-
bility experiments carried out with a group.

Even more pointedly, Den Dikken et al. (2007: 343 – Footnote 4) state:

As a side point, we do not see how the mean value of the judgments of a group of speakers
can confirm or disconfirm an individual’s judgments: one’s judgments are one’s judgments,
no matter what other speakers of ‘the same language’ might think.

With regard to this “‘my idiolect’ gambit” (Featherston 2007a: 279; Schütze 1996:
4–5), one can only hope that Haider (2007: 382 – Footnote 1) is not right when he
claims: “More often, the problem [the risk of having to give up a “dearly fostered
hypothesis”] is solved pragmatically. Conflicting evidence is simply ignored or
repressed”.10 In any case, accepting Grewendorf’s and Den Dikken et al.’s con-
victions would constitute the end of science as we know it. This was already seen
by Labov (1975: 14, 26, and 30), who writes that “[t]he study of introspective judg-
ments is thus effectively isolated from any contradiction from competing data.”
He adds that “[u]ntil more solid evidence is provided by those who have no theo-
retical stake in the matter, the most reasonable position is to assume that such
dialects [idiosyncratic dialects, i. e. idiolects] do not exist” and that “the uncon-
trolled intuitions of linguists must be looked on with grave suspicion”. What is at
stake here is not Labov’s question of whether idiolects exist or not – they proba-
bly do; what is at stake is the lack of control in the ‘my idiolect’ gambit and the
conflict of interest of researchers who base their theories on their evaluation of
sentences constructed by them. The fact that Grewendorf and Den Dikken et al.
still put forward positions which were convincingly rejected thirty years ago is telling
proof of the existence of what Labov calls an “ideological position”.

10 Pullum (2007: 38) adds another possible technique for saving a “dearly fostered hypothesis”:
“In syntax, if you want some sequence of words to be grammatical (because it would back up
your hypothesis), the temptation is to just cite it as good, and probably you won’t be challenged.
If you are challenged, just say it’s good for you, but other dialects may differ”.

In spite of these problems, one must not forget the unprecedented progress in
our understanding of language that generative grammar has made possible.
Fanselow (2007: 353) rightly emphasizes that “it [generative syntax] has broadened
the data base for syntactic research in a very profound way”. But while one’s own
intuitions may have been sufficient in the beginning of generative linguistics and
may still be sufficient for basic syntactic phenomena (cf. Labov’s 1975: 14 and 27
discussion of Chomsky’s clear cases), the ‘my idiolect’ gambit cannot be applied
to rare or controversial phenomena. Relying on one’s own intuitions alone is simply not appropriate
for a field aiming to overcome a “pre-scientific phase of orientation” (cf. Haider’s 2007:
389 quote above). This does not mean that Grewendorf’s (2007: 373) grammar is
deviant “because [his] judgments do not correspond to the judgments of Feather-
ston’s group”, it just means that nobody should devise a theory based exclusively
on his/her own judgments; i. e., like Pullum (2007: 38) we should stop thinking
that the “how-does-it-sound-to-you-today method can continue to be regarded as
a respectable data-gathering technique”.
Due to many generativists’ still widespread lack of interest in the empirical side of their
research, generative grammar has not yet made the methodological and
analytical progress that sociolinguistics and variation linguistics have achieved. With regard
to quantification and especially with regard to categorization, Labov’s early
methodology must be regarded as naïve (e. g., reading word lists and recounting
near-death experiences representing formal and informal styles, respectively).
But this objection has to be qualified, since these methods were used to establish
a new field and have been improved dramatically ever since. In contrast, many
generativists still use the same empirical base their colleagues used fifty years
ago, despite the fact that the field has gone through at least four major theoretical
phases (from Standard Theory to Minimalism).
Leaving aside the question of what kind of empirical data linguists should use, we
will briefly focus on the second point mentioned at the beginning of this section:
the question of which role empirical data should play in theory-testing and
theory-building. It does not come as a surprise that many generativists do not hold a
balanced view of theory-oriented and data-driven approaches, i. e., we are still far
away from Featherston’s (2007b: 408) conviction that “data and theory are indeed
in a mutually dependent relationship, both affecting the credibility of the other”.
Even for Barbiers (2005: 258), a generative linguist with much experience in the
analysis of elicited judgment/production data, sociolinguistics seems secondary
to generative linguistics: “Finally, it was argued that there are certain patterns
in individual and geographic variation about which generative linguistics has
nothing to say. That is where sociolinguistics comes in”. Surprisingly, Featherston
(2007a: 310 and 314) himself takes a rather one-sided view of
this interplay, putting data first: The most criticized claims in his paper are that
“[l]inguists need to look at the data first and develop their models afterwards […]”
and that “[d]ata is a pre-condition for theory, and the quality of a theory can never
exceed the quality of the data set which it is based on”. One can be sure that few
linguists, let alone generative linguists, would subscribe to this division of labor
(cf. especially the comments of Grewendorf 2007: 377–379).
In any case, as long as data-driven approaches only come in when formal
approaches fail and as long as elicited data are only seen as a source for checking
hypotheses at best, we cannot take full advantage of their potential. Therefore,
the question is not only whether to use new types of empirical data in system-
based approaches, but how to use them and how to correctly evaluate their use.
The present volume contains analyses of different types of elicited language data,
some of which are analyzed within the framework of system-based approaches:
Cornips (this volume) uses elicited language data, but also judgment tests in the
case of clusters with three verbal elements. Adli (this volume) combines judg-
ment and production data and offers comparisons between these data types for
wh-questions in French. With this, he tries to tackle the grammaticality-produc-
tion mismatch mentioned above. Kato (this volume) analyzes data from child
language acquisition and Kaufmann (this volume) uses translation data. His
informants were asked to translate Spanish, Portuguese and English stimulus
sentences into Mennonite Low German. As with all research methods, there exist
advantages (amount and comparability of the data, cf. Schütze 1996: 2) and dis-
advantages to such an approach (no natural speech, possible priming effects,
cf. for the latter Kaufmann 2005). Torres Cacoullos (this volume) and Rosemeyer
concentrate on structural changes in the verbal domain of Spanish by analyzing
historical texts. As mentioned above, both of them work within a usage-based
framework (unlike Cornips, Adli, Kato and Kaufmann). In this case, too, the
methodological problems are manifold, but Torres Cacoullos and Rosemeyer can
hardly be held responsible for this. Historical data reflect oral speech only to a
certain degree and one can only use the data one has, i. e., one cannot go back
and ask speakers/writers how they rate constructions for which one does not find
evidence in the written record.

2.2 Anti-frequency or the problem of rare data

One especially interesting case with regard to linguistic data is that of rare
phenomena, i. e., generally uncommon but nevertheless robust linguistic facts. Rare
phenomena raise some essential empirical and theoretical questions for the study
of language, in particular concerning system- and usage-based approaches.
But what exactly are rare phenomena? In our view, it seems reasonable to dis-
tinguish between two different types, namely (i) rare phenomena in a typological
sense and (ii) rare or anti-frequent phenomena in a given language (or language
family). The former meaning is by far the more common one. Following Plank’s
characterization in the introductory notes to his famous Raritätenkabinett,11 a
rare phenomenon (rarum)12 can be defined as

a trait (of any conceivable sort: a form, a relationship between forms, a matching of form
and meaning, a category, a construction, a rule, a constraint, a relationship between rules
or constraints, ...) which is so uncommon across languages as not even to occur in all
members of a single […] family or diffusion area (for short: sprachbund), although it may
occur in a few languages from a few different families or sprachbünde.

For several reasons, the study of rare phenomena remains an important linguistic
task (cf. also Cysouw and Wohlgemuth 2010: 3–4). First of all, it seems obvious
that the consideration of rara will provide an empirically much more detailed
picture of what is (im)possible in the languages of the world. Given that rare phe-
nomena may contradict or even falsify cross-linguistic assumptions and linguis-
tic universals, they may help us to formulate more adequate generalizations. As
a consequence, we may get better linguistic descriptions, which in turn may offer
more adequate explanations.
This also holds true for the other type of rare phenomena, namely those that
are anti-frequent in a given language (or language family). Under this label, we
refer to all kinds of linguistic traits that are very infrequently attested with respect
to other paradigmatic alternatives in the language(s) under consideration. The
rareness of these linguistic traits is, in principle, irrespective of the distribu-
tion and frequency of these traits in other languages, i. e., a phenomenon that
hardly occurs in a given language may or may not be rare in other languages.
Some examples of this kind of rare phenomena are differential object marking
with inanimate objects in Spanish (cf. García García 2014), wh-clefts and other
wh-variants in French (cf. Adli this volume), verb-second order violations in Ger-
manic languages (cf. Cornips this volume) or non-verb-final dependent clauses in
Mennonite Low German (cf. Kaufmann this volume).

11 Das grammatische Raritätenkabinett is an online database comprising at present 147 rare
phenomena (http://typo.uni-konstanz.de/rara/intro/index.php).
12 In addition to rarum, Plank also uses the terms rarissimum and singulare, which refer to even
rarer or uniquely attested traits, respectively. For further definitions and specifications of this
kind of rare phenomena, see Cysouw and Wohlgemuth (2010: 1–6).
Anti-frequent phenomena present a special problem for usage-based
approaches given the leading idea of this framework which “seeks explanations
in terms of the recurrent processes that operate in language use” (cf. Bybee
2010: 13). Since frequency plays a decisive role in usage-based approaches (for
instance, for the conventionalization of a linguistic structure), one wonders how
scarcely attested phenomena can be(come) grammatical. Of course, this type of
rare phenomena is also problematic for formal approaches to language, at least
as long as the phenomenon in question cannot be derived from the interaction of
other, more frequent phenomena (cf. Kaufmann this volume).
This volume presents three studies that touch on these questions. All of them
pertain to phenomena that are rare in a given language (or language group). The
first one concerns violations of V2 in Germanic languages. This is an especially
intriguing case because the rare variant in question (non-V2) is the marked variant
within most Germanic languages, but it is the common variant from a global typo-
logical point of view (V2 is rarum number 79 in the Raritätenkabinett). Cornips
(this volume) describes these violations as the consequence of both system- and
society-based factors. About the latter she writes that “these ‘violations of V2’, or
to be more precise, Adv-S-Vfin instances [adverb-subject-finite verb] occur only in
peer conversations”. This restriction shows that an exclusively formal, system-based
argumentation cannot explain the existence of non-V2 main clauses in Germanic
languages, since probably all speakers would judge such clauses to be
outright ungrammatical (perhaps even the very persons who use them). Thus,
we are faced with a synchronic mismatch between grammaticality and accepta-
bility. This mismatch may be “resolved” by the speech community provided the
hitherto rare phenomenon occurs in a sufficiently robust number (cf. Lightfoot
1999: 156) and provided it starts occurring outside of peer conversations. Current
acceptability in peer conversation may thus eventually turn into grammaticality.
This type of language change cannot be explained (or predicted) by systemic con-
siderations because at a certain moment in time, the phenomenon in question is
ungrammatical. However, as such rare phenomena are detected by meticulous
data analysis, they have to be accounted for.
The second example of a rare phenomenon can be found in Kaufmann (this
volume). Kaufmann deals with a likewise marked syntactic variant, namely the
occurrence of dependent clauses in Mennonite Low German, where the only
verbal element surfaces before its internal complement. In German varieties,
which are all SOV, one would expect the finite verb to surface after its internal
complement in a dependent clause. Kaufmann shows that the tokens of the rare
phenomenon can be explained as an analogical extension of the informants’
derivational preferences with regard to verb projection raising and scrambling, the
two syntactic mechanisms he claims are responsible for different cluster variants
in dependent clauses with two verbal elements. The problem in analyzing the
phenomenon in question is therefore not the lack of formal explanations, but the
necessity of finding the grammatical mechanisms whose interaction causes the
rare phenomenon.
The third case involving rare phenomena is presented in Adli (this volume).
It deals with the variation of wh-constructions attested in Modern French, where
nine different types of wh-variants can be distinguished, among them the wh-in-
situ construction (e. g. Tu fais le dessin quand ? ‘When do you do the drawing?’),
the whVS construction (e. g. Quand fais-tu le dessin ?) or the wh-cleft construction
(e. g. Quand c’est que tu fais le dessin ?). As already mentioned, Adli’s study draws
on production data as well as gradient acceptability judgments (both types of lin-
guistic evidence were provided by the same set of individuals). The results show
that some variants, such as the wh-in-situ construction, are very frequent, while
others, such as the whVS construction or the wh-cleft construction, are only rarely
attested or do not occur at all. Yet all of these variants were rated as acceptable.
What is more, the variants belonging to a rather formal register were evaluated as
being more acceptable than those pertaining to a colloquial register. For example,
the formal whVS construction received the highest acceptability scores. However,
this preference in acceptability is not reflected in the production data. Thus, Adli’s
study reveals an interesting mismatch between usage and speaker judgments,
showing that some variants hardly occur although they are rated as acceptable.
Adli suggests that this frequency-acceptability mismatch is at least partly due
to register: “While frequency data from spontaneous speech […] provide insight
into colloquial language, acceptability data reflect the entire range of registers
available to a speaker”. Moreover, he concludes that acceptability judgments are
influenced by normative pressure, especially in the case of French.
Comparing the rare phenomena presented in Cornips (this
volume), i. e., V2 violations in Germanic languages, with the rare wh-variants
studied in Adli (this volume), one sees that both are socially conditioned. However,
there exists an obvious difference. While the V2 violation is the result of a collo-
quial innovation process that is at present confined to peer conversations, some
of the scarcely produced wh-variants (e. g. the whVS construction) seem to be the
result of a socially determined conservation process.

3 Structure of the volume

The volume is divided into three parts. The first part, entitled “System, usage, and
variation”, opens with two central and complementary points of view: Frederick
Newmeyer’s “Language variation and the autonomy of grammar” and Gregory
Guy’s “The grammar of use and the use of grammar”. Newmeyer discusses the
question of whether language variation calls into question the hypothesis of the
autonomy of grammar. On the basis of a modular approach to variation, he argues
that variation and probabilities do not pertain to linguistic knowledge proper.
Rather, they should be viewed as the result of the interaction between gram-
matical competence and extragrammatical factors such as processing pressure
or social factors. As already mentioned above, Newmeyer proposes that this
interaction is modulated by the user’s manual, which he conceives of as an interface
between grammatical competence and extragrammatical factors.
Guy takes a different stance on the relation between grammar and variation
and argues that language is “a uniquely social phenomenon”. For him, linguistic
knowledge is exclusively derived from usage and interaction with other users.
Accordingly, the core linguistic knowledge is assumed to include knowledge
about variation, probabilities and social factors. This does not mean that it is
devoid of abstract operations and mental representations. However, abstract
operations and mental representations are inferred from usage and are thus
inherently probabilistic and variable in nature.
The following two contributions are the papers of Richard Cameron and
Mary Kato. Cameron’s article “Looking for structure-dependence, category-sen-
sitive processes, and long-distance dependencies in usage” deals with specific
problems of the system-usage dichotomy, while Mary Kato’s paper “Variation in
syntax: Two case studies on Brazilian Portuguese” introduces the dimension of
variation as the central topic.
The dimension of variation is most strongly concentrated in the second part
of this volume, entitled “Rare phenomena and variation”. All contributions in
this part have a strong empirical orientation and take into account rare phenom-
ena. Göz Kaufmann writes about “Rare phenomena revealing basic syntactic
mechanisms: The case of unexpected verb-object sequences in Mennonite Low
German.” He stresses the importance of a thorough analysis of the possible inter-
play of seemingly unrelated syntactic mechanisms. Leonie Cornips’ paper “The
no man’s land between syntax and variationist sociolinguistics: The case of
idiolectal variability” deals with the central question of intra-speaker variation, which
she exemplifies by means of four case studies. Finally, Aria Adli’s contribution
“What you like is not what you do: Acceptability and frequency in syntactic vari-
ation” is especially important in view of the topics dealt with in section 2.1, where
advantages and disadvantages of different types of empirical data are discussed.
All papers in the third part of this volume, entitled “Grammar, evolution, and
diachrony”, deal with the dimension of time. The papers “Gradual loss of analyz-
ability: Diachronic priming effects” by Rena Torres Cacoullos and “How usage
rescues the system: Persistence as conservation” by Malte Rosemeyer analyze
22 Aria Adli, Marco García García, and Göz Kaufmann

the causes of historical change in the Spanish progressive and in Spanish auxiliary
selection, respectively. Longer time periods and more abstract questions are the
focus of the contributions by Hubert Haider and Guido Seiler. Both papers, “‘Intelligent
design’ of grammars – a result of cognitive evolution” and “Syntactization,
analogy and the distinction between proximate and evolutionary causations”,
apply the biological concept of evolution to language and suggest how
the gap between functional and formal approaches may be narrowed.

References
Adger, David (2006): Combinatorial variability. Journal of Linguistics 42: 503–530.
Adger, David and Jennifer Smith (2010): Variation in agreement: A lexical feature-based
approach. Lingua 120(5): 1109–1134.
Adli, Aria (2013): Syntactic variation in French wh-questions: a quantitative study from the
angle of Bourdieu’s sociocultural theory. Linguistics 51(3): 473–515.
Barbiers, Sjef (2005): Word order variation in three-verb clusters and the division of labour
between generative linguistics and sociolinguistics. In: Leonie Cornips and Karen
P. Corrigan (eds.), Syntax and Variation: Reconciling the Biological and the Social,
233–264. Amsterdam: John Benjamins.
Barbiers, Sjef (2013): Where is syntactic variation? In: Peter Auer, Javier Caro Reina and Göz
Kaufmann (eds.), Language Variation – European Perspectives IV: Selected Papers from
the Sixth International Conference on Language Variation in Europe (ICLaVE 6), 1–26.
Amsterdam/Philadelphia: John Benjamins.
Behrens, Heike, Stefan Pfänder, Peter Auer, Daniel Jacob, Rolf Kailuweit, Lars Konieczny, Bernd
Kortmann, Christian Mair and Gerhard Strube (to appear): Introduction. In: Heike Behrens
and Stefan Pfänder (eds.), Experience counts: Frequency effects in language.
Berlin/Boston: Mouton de Gruyter.
Belletti, Adriana and Luigi Rizzi (2002): Editors’ introduction: some concepts and issues in
linguistic theory. In: Adriana Belletti and Luigi Rizzi (eds.), Noam Chomsky: On Nature and
Language, 1–44. Cambridge: Cambridge University Press.
Bender, Emily M. (2007): Socially meaningful syntactic variation in sign-based grammar.
English Language and Linguistics (Special Issue on Variation in English Dialect
Syntax: Theoretical Perspectives) 11(2): 347–381.
Biberauer, Theresa and Marc Richards (2006): True optionality: When the grammar doesn’t
mind. In: Cedric Boeckx (ed.), Minimalist Essays, 35–67. Amsterdam: John Benjamins.
Bybee, Joan L. (2002): Sequentiality as the basis of constituent structure. In: Talmy Givón
and Bertram F. Malle (eds.), The Evolution of Language out of Pre-language, 109–132.
Amsterdam/Philadelphia: John Benjamins.
Bybee, Joan L. (2006): From Usage to Grammar: The Mind’s Response to Repetition. Language
82(4): 711–733.
Bybee, Joan L. (2010): Language, Usage and Cognition. Cambridge: Cambridge University
Press.
System and usage: (Never) mind the gap 23

Bybee, Joan L. and Paul J. Hopper (2001): Introduction to frequency and the emergence
of linguistic structure. In: Joan L. Bybee and Paul J. Hopper (eds.), Frequency and the
Emergence of Linguistic Structure, 1–24. Amsterdam: John Benjamins.
Chomsky, Noam (1965): Aspects of the Theory of Syntax. Cambridge: MIT Press.
Chomsky, Noam (1981): Lectures on Government and Binding. Dordrecht: Foris.
Chomsky, Noam (2000): New Horizons in the Study of Language and Mind. Cambridge:
Cambridge University Press.
Chomsky, Noam (2009): Opening remarks. In: Massimo Piattelli-Palmarini, Juan Uriagereka
and Pello Salaburu (eds.), Of Minds and Language: A Dialogue with Noam Chomsky in the
Basque Country, 13–43. Oxford: Oxford University Press.
Culy, Christopher (1996): Null objects in English recipes. Language Variation and Change 8(1):
91–124.
Cysouw, Michael and Jan Wohlgemuth (2010): The other end of universals: theory and typology
of rara. In: Jan Wohlgemuth and Michael Cysouw (eds.), Rethinking Universals, 1–10.
Berlin/New York: Mouton de Gruyter.
Den Dikken, Marcel, Judy B. Bernstein, Christina Tortora and Raffaella Zanuttini (2007): Data
and grammar: Means and individuals. Theoretical Linguistics 33(3): 335–352.
Eckert, Penelope (2000): Linguistic Variation as Social Practice. Malden, MA/Oxford: Blackwell.
Fanselow, Gisbert (2007): Carrots perfect as vegetables, but please not as a main dish.
Theoretical Linguistics 33(3): 353–367.
Featherston, Sam (2007a): Data in generative grammar: the stick and the carrot. Theoretical
Linguistics 33(3): 269–318.
Featherston, Sam (2007b): Reply. Theoretical Linguistics 33(3): 401–413.
Fukui, Naoki (1993): Parameters and optionality. Linguistic Inquiry 24: 399–420.
García García, Marco (2014): Differentielle Objektmarkierung bei unbelebten Objekten im
Spanischen. Berlin/Boston: de Gruyter.
Geeraerts, Dirk, Gitte Kristiansen and Yves Peirsman (eds.) (2010): Advances in cognitive
sociolinguistics. Berlin/New York: Mouton de Gruyter.
Grewendorf, Günther (2007): Empirical evidence and theoretical reasoning in generative
grammar. Theoretical Linguistics 33(3): 369–380.
Guy, Gregory R. (1991): Explanation in variable phonology: an exponential model of morpho-
logical constraints. Language Variation and Change 3(1): 1–22.
Guy, Gregory R. (2005): Grammar and usage: A variationist response (Letters to Language).
Language 81(3): 561–563.
Guy, Gregory R. (2007): Grammar and usage: The discussion continues (Letters to Language).
Language 83(1): 2–4.
Haider, Hubert (2007): As a matter of facts – comments on Featherston’s sticks and carrots.
Theoretical Linguistics 33(3): 381–394.
Haider, Hubert and Inger Rosengren (2003): Scrambling: nontriggered chain formation in OV
languages. Journal of Germanic Linguistics 15(3): 203–267.
Henry, Alison (2002): Variation and syntactic theory. In: Jack K. Chambers, Peter Trudgill and
Natalie Schilling-Estes (eds.), The Handbook of Language Variation and Change, 267–282.
Oxford: Blackwell.
Holmberg, Anders, Aarti Nayudu and Michelle Sheehan (2009): Three partial null-subject
languages: a comparison of Brazilian Portuguese, Finnish and Marathi. Studia Linguistica
(Special Issue: Partial Pro-drop) 63(1): 59–97.
Hopper, Paul (1998): Emergent grammar. In: Michael Tomasello (ed.), The New Psychology of
Language: Cognitive and Functional Approaches to Language Structure, 155–175. Mahwah:
Lawrence Erlbaum.
Kato, Mary A. (2011): Acquisition in the context of language change: the case of Brazilian
Portuguese null subjects. In: Esther Rinke and Tanja Kupisch (eds.), The Development of
Grammar: Language Acquisition and Diachronic Change – In Honour of Jürgen M. Meisel,
309–330. Amsterdam/New York: John Benjamins.
Kaufmann, Göz (2005): Der eigensinnige Informant: Ärgernis bei der Datenerhebung oder
Chance zum analytischen Mehrwert? In: Friedrich Lenz and Stefan Schierholz (eds.),
Corpuslinguistik in Lexik und Grammatik, 61–95. Tübingen: Stauffenberg.
Kaufmann, Göz (2007): The verb cluster in Mennonite Low German: A new approach to an old
topic. Linguistische Berichte 210: 147–207.
Kempen, Gerard and Karin Harbusch (2005): The relationship between grammaticality ratings
and corpus frequencies: a case study into word order variability in the midfield of German
clauses. In: Stephan Kepser and Marga Reis (eds.), Linguistic Evidence: Empirical,
Theoretical and Computational Perspectives, 329–349. Berlin/New York: Mouton de
Gruyter.
Kristiansen, Gitte and René Dirven (eds.) (2008): Cognitive Sociolinguistics: Language
Variation, Cultural Models, Social Systems. Berlin/New York: Mouton de Gruyter.
Kroch, Anthony (1989): Reflexes of grammar in patterns of language change. Language
Variation and Change 1: 199–244.
Labov, William (1969): Contraction, deletion, and inherent variability of the English copula.
Language 45(4): 716–762.
Labov, William (1972): Sociolinguistic Patterns. Philadelphia: University of Pennsylvania Press.
Labov, William (1975): What Is a Linguistic Fact? Lisse: The Peter de Ridder Press.
Langacker, Ronald W. (1987): Foundations of Cognitive Grammar, Vol. 1: Theoretical prereq-
uisites. Palo Alto: Stanford University Press.
Lightfoot, David (1999): The Development of Language: Acquisition, Change, and Evolution.
Oxford/Malden, MA: Blackwell.
Lightfoot, David (2006): How new languages emerge. Cambridge: Cambridge University Press.
Meisel, Jürgen M. (2011): Bilingual language acquisition and theories of diachronic change:
Bilingualism as cause and effect of grammatical change. Bilingualism: Language and
Cognition 14(2): 121–145.
Newmeyer, Frederick J. (2003): Grammar is grammar and usage is usage. Language 79(4):
682–707.
Newmeyer, Frederick J. (2006): Grammar and usage: A response to Gregory R. Guy (Letters to
Language). Language 82(4): 705–708.
Newmeyer, Frederick J. (2007): Commentary on Sam Featherston, Data in generative grammar:
The stick and the carrot. Theoretical Linguistics 33(3): 395–399.
Otheguy, Ricardo, Ana Celia Zentella and David Livert (2007): Language and dialect contact in
Spanish in New York. Language 83: 770–802.
Pullum, Geoffrey K. (2007): Ungrammaticality, rarity, and corpus use. Corpus Linguistics and
Linguistic Theory 3(1): 33–47.
Roeper, Thomas (1999): Universal bilingualism. Bilingualism: Language and Cognition 2(3):
169–186.
Saito, Mamoru and Naoki Fukui (1998): Order in phrase structure and movement. Linguistic
Inquiry 29(3): 439–474.
Sankoff, David (1988): Sociolinguistics and syntactic variation. In: Frederick J. Newmeyer (ed.),
Linguistics: the Cambridge Survey. Vol. IV: The Socio-cultural Context, 140–161. Cambridge:
Cambridge University Press.
Sankoff, Gillian (2005): Cross-sectional and longitudinal studies. In: Ulrich Ammon, Norbert
Dittmar, Klaus J. Mattheier and Peter Trudgill (eds.), An International Handbook of the
Science of Language and Society, Volume 2, 2, 1003–1013. Berlin/New York: Mouton de
Gruyter.
Sankoff, Gillian and Hélène Blondeau (2007): Language change across the lifespan: /r/ in
Montreal French. Language 83(3): 560–588.
Schütze, Carson T. (1996): The Empirical Base of Linguistics: Grammaticality Judgments and
Linguistic Methodology. Chicago: University of Chicago Press.
Szmrecsanyi, Benedikt (2005): Language users as creatures of habit: A corpus-based analysis
of persistence in spoken English. Corpus Linguistics and Linguistic Theory 1(1): 113–150.
Thom, René (1980): Modèles mathématiques de la morphogenèse. Paris: C. Bourgois.
Tortora, Christina and Marcel den Dikken (2010): Subject agreement variation: Support for the
configurational approach. Lingua 120(5): 1089–1108.
Weinreich, Uriel, William Labov and Marvin I. Herzog (1968): Empirical foundations for a theory
of language change. In: Winfred P. Lehmann and Yakov Malkiel (eds.), Directions for Historical
Linguistics: A Symposium, 95–195. Austin, TX: University of Texas Press.
Wilson, John and Alison Henry (1998): Parameter setting within a socially realistic linguistics.
Language in Society 27: 1–21.
Yang, Charles D. (2000): Internal and external forces in language change. Language Variation
and Change 12(3): 231–250.
Zwicky, Arnold M. (1999): The grammar and the user’s manual. Forum Lecture presented at the
LSA Linguistic Institute, University of Illinois at Urbana-Champaign.
Part 1: System, usage, and variation
Frederick J. Newmeyer, University of Washington, University of
British Columbia and Simon Fraser University, Canada

Language variation and the autonomy of grammar1

Abstract: This paper takes on the question of whether the facts of language vari-
ation call into question the hypothesis of the autonomy of grammar. A significant
number of sociolinguists and advocates of stochastic approaches to grammar
feel that such is the case. However, it will be argued that there is no incompati-
bility between grammatical autonomy and observed generalizations concerning
variation.

1 Introduction

The point of departure of this paper is a set of propositions which, while not
universally accepted among linguists, have at least a wide and ever-increasing
currency. They are, first, that a comprehensive theory of language has to account
for variation (Weinreich, Labov and Herzog 1968 and much subsequent work);
second, that much of everyday variability in speech is systematic, showing both
social and linguistic regularities (Labov 1969 and much subsequent work); third,
that language users are highly sensitive to frequencies, a fact that has left its mark
on the design of grammars (Hooper 1976 and much subsequent work); and fourth,
that an overreliance on introspective data is fraught with dangers (Derwing 1973
and much subsequent work). The question to be probed is whether, given these
propositions, one can reasonably hypothesize that grammar is autonomous with
respect to use. The paper is organized as follows. Section 2 introduces the concept
of the ‘autonomy of grammar’, along with some theoretical and methodological
considerations relevant to its understanding. The central section 3 examines and
attempts to refute recent claims that the facts surrounding language variation
show that autonomy is untenable. Section 4 is a brief conclusion.

1 I would like to thank Marco García García and Hubert Haider for their comments on the entire
pre-final manuscript. Thanks also to Ralph Fasold, David Odden, and Panayiotis Pappas for
fruitful discussion on the topic of this paper. They are not to be held responsible for any errors.
30 Frederick J. Newmeyer

2 The autonomy of grammar

I characterize the Autonomy of Grammar as follows:

(1) The Autonomy of Grammar (AG):

A speaker’s knowledge of language includes a structural system composed of formal princi-
ples relating sound and meaning. These principles, and the elements to which they apply,
are discrete entities. This structural system can be affected over time by the probabilities of
occurrence of particular grammatical forms and by other aspects of language use. However,
the system itself does not directly represent probabilities or other aspects of language use.

Put informally, grammars do not contain numbers. My conclusion will be that,
even adopting the points of departure above, AG is a motivated hypothesis.
Three important methodological considerations will be assumed throughout
the paper:

(2) Three considerations of methodology:

a. It is incorrect to attribute to grammar per se what is adequately explained by extra-
grammatical principles.
b. Given that grammars are models of speaker knowledge, facts that a speaker cannot
reasonably be expected to know should not be attributed to grammar.
c. Knowledge of the nature of some grammatical construct is not the same kind of knowl-
edge as that of how often that grammatical construct is called upon in language use.

I begin with some fairly obvious and somewhat trivial examples of these points
and then turn to more complex cases. As far as (2a) is concerned, nobody would
suggest that speakers with a serious head cold should be endowed with a separate
grammar, even though their vowels are consistently more nasalized than those of
the healthier members of their speech community. Appeal to the partial blockage
of the passages involved in speech production suffices to explain the phenome-
non. Many different types of generalizations fall under (2b). For example, speakers
might know that adjectives like asleep, awake, and ajar are different from most
other adjectives in that they do not occur prenominally. But they can hardly know
that the reason for their aberrant behavior derives from the fact that these adjec-
tives were historically grammaticalizations of prepositional phrases (awake was
originally at wake). A child in acquiring his or her language does not learn the
history of that language. Along the same lines, children acquiring German learn
the principles involved in V2 order and those acquiring English learn to produce
the retroflex ‘r’ sound. But neither group learns that these elements of their languages
are typologically quite rare. Likewise, speakers cannot be assumed to know epi-
phenomenal facts, that is, properties of their language that are the byproduct of
other properties (which may or may not be part of knowledge). Speakers know
Language variation and the autonomy of grammar 31

principles of Universal Grammar and they know (implicitly) that such-and-such
a sentence is ungrammatical. But they do not know that the ungrammaticality
results in part from a particular principle. It takes complex scientific reasoning to
arrive at such a conclusion. In other words, not everything that a linguist knows
is necessarily known by the speaker. As two linguists whose attention to data is
legendary put it:

Not every regularity in the use of language is a matter of grammar. (Zwicky and Pullum 1987:
330, cited in Yang 2008: 219)

Finally, to exemplify (2c), I know the meaning of the definite article the, its privi-
leges of occurrence, and its pronunciation. I also happen to know that I am more
likely to use that word than any other word of English. These, however, are
different ‘kinds of knowledge’. I learned the former as an automatic consequence
of acquiring competence in English. The latter is a metalinguistic fact that arose
from conscious observation and speculation about my language.
So given these considerations, how can we know what to include in models
of grammatical competence and what to exclude from it? In particular, given
the theme of this volume, how can we know to what extent (if at all) variabil-
ity is encoded in the grammar itself? As it turns out, classical formal grammar
has nothing to say about probabilistic aspects of grammatical processes, except
to hypothesize that where we find variability we have ‘optional’ grammatical
rules. For example, in Chomsky (1957) active and passive pairs were related by
an optional transformational rule. No attempt was made to capture as part of the
rule the fact that actives are used more frequently than passives and are used in
different discourse circumstances. In fact, an approach to grammar excluding the
direct representation of probabilities might be the best one to take if it could be
shown, in line with (2a–c), that the probabilities in question are a different sort
of knowledge from grammatical knowledge or are not in any reasonable sense
‘knowledge’ at all.
So the crucial question is to what extent speakers actually ‘know’ the prob-
abilities associated with points of variation and if they do know them, then what
kind of knowledge that is. One alternative to their knowing probabilities might
be that quantitative aspects of speaker behavior are no more than a reflection of
principles that, in their interaction, lead them to act in a certain way a certain per-
centage of the time. Let me give an example of an epiphenomenal consequence
of interacting principles that is drawn from everyday life. My place of work is four
blocks north of where I live and four blocks west. I could construct a ‘grammar
of my walk to work’ to characterize my procedure for proceeding from my home
to my office. Each intersection that I cross has a traffic light. If the first light that
I come to is green, I continue straight on to the north. If the light is red, I turn
to the left (to the west). I follow this procedure at each intersection, taking care
not to overshoot my mark to the west or to the north. This leads
to the possibility of more than a dozen different routes for getting to work. But
in practice they are not equally frequent, because traffic lights differ from each
other in the percentage of time that they are green or red. It would not be hard,
if I wanted to do it, to calculate the percentage of time that I take each route. But
I do not ‘know’ these percentages, in any reasonable sense of the word ‘know’. I
certainly have a vague feeling that I take some routes more than others. But the
percentages are not encoded in my ‘grammar of walking to work’. They fall out as
an epiphenomenal by-product of the interaction of my grammar of walking and
the timing of traffic lights.
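The walking-to-work analogy can be made concrete with a short simulation. The sketch below is a hypothetical illustration, not something the paper provides: `next_step` is a number-free ‘grammar of walking’ that follows the procedure just described, while `p_green` (invented values, one per intersection) belongs to the traffic lights. Route percentages are computed from the two together and are stored in neither.

```python
"""Toy 'grammar of walking to work' (hypothetical sketch, not from the source)."""

BLOCKS_NORTH = 4  # office is four blocks north of home ...
BLOCKS_WEST = 4   # ... and four blocks west


def next_step(light, north_done, west_done):
    """Categorical walking rule: it mentions light colours, never frequencies."""
    if north_done == BLOCKS_NORTH:  # going north again would overshoot
        return "W"
    if west_done == BLOCKS_WEST:    # going west again would overshoot
        return "N"
    return "N" if light == "green" else "W"


def route_probabilities(p_green):
    """Exact distribution over routes, given each light's chance of showing green.

    p_green maps an intersection (north_done, west_done) to a probability;
    these numbers belong to the environment, not to next_step.
    """
    dist = {}

    def walk(n, w, route, prob):
        if n == BLOCKS_NORTH and w == BLOCKS_WEST:
            dist[route] = dist.get(route, 0.0) + prob
            return
        g = p_green[(n, w)]
        # Branch on both light colours; the categorical rule decides the step.
        for light, p in (("green", g), ("red", 1.0 - g)):
            if next_step(light, n, w) == "N":
                walk(n + 1, w, route + "N", prob * p)
            else:
                walk(n, w + 1, route + "W", prob * p)

    walk(0, 0, "", 1.0)
    return dist


if __name__ == "__main__":
    # Hypothetical timing: every light is green half the time.
    lights = {(n, w): 0.5
              for n in range(BLOCKS_NORTH + 1) for w in range(BLOCKS_WEST + 1)}
    dist = route_probabilities(lights)
    print(len(dist))           # 70 possible routes
    print(dist["NNNNWWWW"])    # 0.0625
    print(dist["NWNWNWNW"])    # 0.0078125
```

Even with every light green half the time, the 70 possible routes are not equally frequent: a route that reaches the northern or western limit early skips the remaining choices, so NNNNWWWW is eight times as likely as NWNWNWNW. The frequencies fall out of the interaction of rule and environment.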
I think that at this point the reader can see where I am heading, as far as
probabilistic generalizations in linguistics are concerned. To the extent that var-
iability is predictable externally, it does not need to be encoded in the grammar.
But before turning to linguistic examples, the reader might well ask about my
walking to work story: ‘Is the percentage of the time that I take any particular
route to work completely predicted by reference only to my grammar of walking
and to the timing of traffic lights?’ The answer is ‘no’, and the reason for that neg-
ative answer is quite relevant to how we should handle linguistic data that are not
fully predictable. We return to this problem below.
Turning to language, a huge number of facts that one might be tempted to put
in the grammar no more belong there than probabilities belong in my grammar of
walking to work. So consider a pair of sentences from Manning (2002):

(3) a. It is unlikely that the company will be able to meet this year’s revenue forecasts.
b. That the company will be able to meet this year’s revenue forecasts is unlikely.

Manning points out that we are far more likely to say (3a) than (3b) and suggests
that this likelihood forms part of our knowledge of grammar. No, it does not. It is
part of our use of language that, for both processing and stylistic reasons, speakers
tend to avoid sentences with heavy subjects (see Hawkins 1994; 2004). As a con-
sequence, one is more likely to say things like (3a) than (3b). It would be super-
fluous to repeat in the grammar what is adequately accounted for outside of it.
The probability of using some grammatical element might arise as much from
real-world knowledge and behavior as from parsing ease. For example, Wasow
(2002) notes that we are much more likely to use the verb walk intransitively than
transitively, as in (4a–b):

(4) a. Sandy walked (to the store). [frequent intransitive usage]
b. Sandy walked the dog. [infrequent transitive usage]

He takes that fact as evidence that stochastic information needs to be associated
with subcategorization frames. But to explain the greater frequency of sentence
types like (4a) than (4b), it suffices to observe that walking oneself is a more
common activity than walking some other creature. It is not a fact about grammar.
What I offer then is a classically modular approach to variation, that is, an
approach which posits an autonomous grammatical module at its core. The
observed complexities of language result from the interaction of core competence
with other systems involved in language. Figure 1 illustrates:

Figure 1: A modular approach to variation2

In this view, the observed probability of a particular variant results from the inter-
action of the formal grammar and extragrammatical faculties, as modulated by
the user’s manual.
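A minimal sketch of the modular picture in Figure 1, under loudly stated assumptions: `grammar`, `users_manual` and `speak` are hypothetical names, and the heaviness weighting (one over one plus the number of words in the clausal subject) is an invented stand-in for the processing and stylistic pressures mentioned above, not a model from the literature. The grammar only licenses the two variants in (3); all numbers live in the user’s manual.

```python
"""Modular sketch: a number-free grammar plus a hypothetical 'user's manual'."""
import random


def grammar(predicate, clause):
    """Competence: returns the licit variants only; it contains no probabilities."""
    return {
        "extraposed": f"It {predicate} that {clause}.",
        "in_situ": f"That {clause} {predicate}.",
    }


def users_manual(variants, clause):
    """Assign usage probabilities from an extragrammatical factor (subject heaviness).

    Invented weighting: the in-situ variant is penalized more as the clausal
    subject gets longer; the extraposed variant keeps a constant weight.
    """
    heaviness = len(clause.split())  # crude proxy: words in the clausal subject
    weights = {"extraposed": 1.0, "in_situ": 1.0 / (1 + heaviness)}
    total = sum(weights[v] for v in variants)
    return {v: weights[v] / total for v in variants}


def speak(predicate, clause, rng):
    """Performance: pick one licit variant according to the manual's probabilities."""
    variants = grammar(predicate, clause)
    probs = users_manual(variants, clause)
    choice = rng.choices(list(probs), weights=list(probs.values()))[0]
    return variants[choice]


if __name__ == "__main__":
    clause = "the company will be able to meet this year's revenue forecasts"
    print(users_manual(grammar("is unlikely", clause), clause))
    print(speak("is unlikely", clause, random.Random(1)))
```

For the clause in (3), this weighting gives the extraposed variant a probability of 12/13, and lighter subjects shift the balance toward the in-situ variant. Changing that preference touches only `users_manual`, never `grammar`.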
What then is the nature and function of the user’s manual? This construct
presents one face to grammatical competence and one face to the external factors
that shape grammar. In a nutshell, it tells us what to do with our grammars and
how often to do it. For example, it might tell an English speaker to avoid stranding
prepositions in formal writing, to extrapose heavy subjects, and to avoid certain
vulgar expressions in polite conversation. These usage conventions are not totally
arbitrary, of course. They are shaped by stylistic level, processing pressure, and
social factors, respectively. But they are not totally predictable either. Clearly
language varieties differ in the degree to which one factor predominates in a partic-
ular situation. So there is a certain degree of arbitrariness in the user’s manual.

2 A referee poses the question of where information structure fits into this picture. While space
does not permit a comprehensive reply, in my view, generalizations pertaining to the grammar-discourse
interface are partly subsumed under ‘grammatical competence’, partly handled in the
user’s manual, and partly shaped by external factors such as the exigencies of constructing
a coherent discourse.

Let me give a concrete example of the functioning of the user’s manual. For
several decades there has been a debate on the nature of Subjacency and other
constraints on extraction.3 Two counterposed views have been put forward:

(5) Two views on the nature of Subjacency:

a. It is a universal constraint on (competence) grammars
(Chomsky 1973 and most subsequent formalist work).
b. It is a performance condition resulting from parsing (and other) pressure
(Kuno 1973 and most subsequent functionalist work).

In support of (5a), it is typically pointed out that the effects of Subjacency differ
somewhat from language to language and that there is no one-to-one correspond-
ence, in any particular language, between parsing and other pressure and what is
ruled out by the constraint (see Fodor 1984; Newmeyer 2005). In support of (5b)
it is typically pointed out that, to a very great extent, the effects of Subjacency
do follow from external pressure (Deane 1992; Kluender 1992). Furthermore, the
effects of Subjacency are variable, in that some island effects are stronger than
others. In English, for example, it is more difficult to extract from finite clauses
than from non-finite clauses (see Szabolcsi and den Dikken 1999).
In my view, both the formalists and the functionalists are partly right and
partly wrong. Subjacency is just the sort of phenomenon we would expect to find
localized in the user’s manual. The interaction between external pressure, in this
case largely parsing pressure, and the language-particular grammar generates
Subjacency effects. But I use the term ‘generates’ advisedly. The interaction of
grammar and performance invites, so to speak, the existence of Subjacency
effects, but it does not fully predict them. The user’s manual, for reasons of his-
torical accident, the vagaries of use, and so on, necessarily accommodates a
certain degree of arbitrariness.

3 Subjacency is a constraint that rules out extractions of elements in particular syntactic con-
figurations. For example, the deviant English sentences *What did you wonder where Bill put?
and *The woman who I believe the claim that Mary talked to are Subjacency violations.

3 Language variation is compatible with grammatical autonomy

Let us now turn to some key studies of language variation that might seem to
call into question the correctness of the hypothesis of grammatical autonomy.
Throughout this section, I elaborate on the point, discussed in the previous
section, that not all generalizations about grammatical patterning are necessarily
handled the same way. Some are encoded directly in the grammar and some are
not. Take /-t, -d/ deletion in English, that is, the deletion of a coronal stop in final
clusters:

(6) a. I don’ think so.
b. Then he pass’ me his plate.
c. She tol’ a lie.

The probability of deletion is tied to the nature of the preceding and following
segment, as is partly illustrated in Table 1:

Table 1: Following segment effect on English /-t, -d/ deletion (Guy 1980: 14)

Following context                              Obstruent   Liquid   Glide   Vowel
Rate of deletion (Varbrul 1 factor weights)    1.0         .77      .59     .40

Does that mean we need to complicate the rule of deletion by including this infor-
mation? In other words, do we need a variable rule? Not necessarily, if the knowl-
edge of how often we delete is a different kind of knowledge from whether we are
allowed to delete at all. And it is a different kind of knowledge. Let’s look at that
point in more detail.
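The difference between the two kinds of knowledge can be illustrated by keeping them in separate functions. This is a sketch under assumptions: it uses the multiplicative model associated with the original Varbrul program (p = p0 × w1 × w2 × …, with the most favoring factor in each group set to 1.0), takes its factor weights from Table 1, and uses a hypothetical input probability `p0`; the segmentation and the vowel set are crude illustrative stand-ins.

```python
"""Variable-rule sketch: a categorical deletion rule plus multiplicative weights.

Assumes a Varbrul-1-style multiplicative model, p = p0 * w1 * w2 * ...;
only the factor weights come from Table 1, p0 is hypothetical.
"""

VOWELS = set("aeiou")  # crude orthographic stand-in for a vowel inventory

FOLLOWING_SEGMENT = {  # Table 1 (Guy 1980: 14)
    "obstruent": 1.0,
    "liquid": 0.77,
    "glide": 0.59,
    "vowel": 0.40,
}


def deletion_applicable(segments):
    """Whether deletion may apply at all: word-final /t/ or /d/ after a consonant."""
    return (len(segments) >= 2
            and segments[-1] in {"t", "d"}
            and segments[-2] not in VOWELS)


def deletion_probability(segments, following, p0):
    """How often it applies: 0 if the rule is inapplicable, else p0 times the weight."""
    if not deletion_applicable(segments):
        return 0.0
    return p0 * FOLLOWING_SEGMENT[following]


if __name__ == "__main__":
    told = ["t", "o", "l", "d"]  # 'told', simplified segmentation
    for context in FOLLOWING_SEGMENT:
        print(context, round(deletion_probability(told, context, p0=0.6), 3))
```

Here `deletion_applicable` encodes whether deletion is allowed and `deletion_probability` encodes how often it happens. A variable rule fuses the two; the position argued for in this paper keeps only the first in the grammar.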
Guy (1997) has constructed several arguments with the goal of demonstrating
that the variable weights need to be stated in the rule itself.
For example, he has argued that if the regularities of variability were stated
in some separate performance component, then we would need to state the same
constraint twice, once in the grammar and once in performance. So consider the
fact that final /t/ and /d/ are never pronounced after another /t/ or /d/:

(7) a. *paint#t *raid#d
b. painted raided (with epenthesis)

More generally, as in (8):

(8) The more shared features between the /t/ and /d/ and what precede them, the less likely the
sequence will be realized in actual speech (Guy and Boberg 1994: n. p.).

As Guy notes, these appear to be Obligatory Contour Principle (OCP) effects, and
he remarks:

Now, if the variable data arise not because of the competence OCP constraint, but stem from
a separate performance OCP constraint, it should come as a theoretical surprise, a random
coincidence, that the two are so similar in nature and direction of effect. (Guy 1997: 134)

But there is no ‘competence OCP constraint’, in the sense of there being a principle
of universal grammar called the OCP. As Odden (1986) and others have pointed
out, what are called OCP effects differ wildly from language to language and are
not even present in some languages. What we have is universal articulatory- and
acoustic-based pressure to avoid sequences of segments that are ‘too close’ to
each other. English grammaticalizes this pressure to a certain extent, more so than
some languages and less so than others. This pressure is responsible for the impossibility
of forms like (7a). But that has nothing to do with the rule of /-t, -d/ deletion
per se, much less a variable condition that needs to be imposed on it. In fact, we
never find geminate /t/’s and /d/’s in English. What we have then is an interface
principle of the user’s manual that looks at English phonology in one direction
and looks at external pressure in the other direction and generates the statistical
generalization.
Is this interface principle an automatic, exceptionless consequence of the
interaction of English phonology and phonetic pressure? Certainly not, but the
fact that variable data cannot be derived in their entirety from universal principles
does not mean that they need to be stated ad hoc in the competence grammar. Let
me draw another analogy with the grammar of my walking to work. There are,
in fact, more factors than the timing of traffic lights that affect the probability
of my taking one route more often than another. Some are ‘global’, in that they
would affect anybody following the same strategy. For example, one intersection
might be blocked by construction and therefore likely to be avoided. Some con-
straints are what one might call ‘local’ or ‘personal’. For example, I might opt
more often for a particular route because the view appeals to me. But the fact that
the probabilities are not fully predictable does not mean that one needs to revert
to encoding them in the grammar of walking per se. One derives what one can
with the understanding that there will always be a residue of contributing factors
that are unexplained and perhaps inexplicable.
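Generalization (8) can be given a concrete, if simplified, form by counting the features that a final stop shares with the segment before it. The feature values below are hypothetical and deliberately minimal, chosen only to make the counting visible; on the view defended here, a measure like this would be consulted by the user’s manual rather than written into the /-t, -d/ deletion rule.

```python
"""Counting shared features, in the spirit of generalization (8).

The binary feature values are simplified and hypothetical, chosen only to
make the counting concrete; they are not a claim about the right analysis.
"""

FEATURES = ("voice", "sonorant", "continuant", "coronal", "nasal")
SEGMENTS = {
    "t": (0, 0, 0, 1, 0),
    "d": (1, 0, 0, 1, 0),
    "s": (0, 0, 1, 1, 0),
    "n": (1, 1, 0, 1, 1),
    "k": (0, 0, 0, 0, 0),
}


def shared_features(a, b):
    """Number of features on which segments a and b agree."""
    return sum(x == y for x, y in zip(SEGMENTS[a], SEGMENTS[b]))


def rank_preceding(final, preceders):
    """Order preceding segments from most to least similar to the final stop.

    Under (8), realization of the final stop should be less likely toward the
    front of this list; the function ranks contexts but predicts no rates.
    Ties keep their input order because sorted() is stable.
    """
    return sorted(preceders, key=lambda p: shared_features(final, p), reverse=True)


if __name__ == "__main__":
    print(rank_preceding("t", ["n", "s", "k", "t"]))
```

A /t/ after another /t/ agrees on every feature, which corresponds to the fully blocked forms in (7a). Consistent with the point that variable data cannot be fully derived from such principles, the ranking orders contexts without fixing their rates.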
Another argument from Guy (1997) involves the fact that deletion is much
more frequent before /l/ than before /r/. Without deletion, resyllabification
would result in /tl-/ and /dl-/ onsets, which, as Guy points out, are lexically impossible
in English. But they are not impossible universally, so, he argues, there is
no hope of deriving the facts from articulatory universals. But it is not necessary
to derive them from articulatory universals. They can be derived in part from the
fact, already noted, that English bans /tl-/ and /dl-/ onsets. In other words, the
probability-computing function of the user’s manual has access to the mental
grammar. The user’s manual is also aware that the rule of /-t, -d/ deletion, which
can produce such onsets, is optional. Taking into account these two facts, it
instructs the speaker to delete /t/ and /d/ before the relevant /l/’s with a high
frequency rate. Of course there is a lot more to be said than that. For example, the
user’s manual also has information from the other direction. For phonetically-
based reasons, /tl-/ and /dl-/ onsets are relatively rare. That fact surely influences
the probability of /-t, -d/ deletion in this case, though I am not in a position to
specify precisely how.
When we turn to syntax, it is even easier to pinpoint the problems with purely
grammatical approaches to variability. At least in phonology, we can usually
say with confidence that two variants are just different ways of saying the same
thing. That is much less true in syntax. In an important paper, Beatriz Lavandera
(1978) pointed out that the choice of syntactic variants is determined in part by
the meaning that they convey. Viewed from that angle, assigning probabilities to
rules, structures, or constraints seems especially problematic. The probabilities
may be more a function of one’s intended meaning than of some inherent property
of the linguistic unit itself. No, it is not the case that all syntactic variants differ
in meaning. But the great majority do, if our definition of ‘meaning’ includes the
full range of discourse-pragmatic aspects of interpretation. Let’s take the various
possibilities of post-verbal orderings of elements in English as an example:

(9) Heavy NP Shift
a. The waiter brought the wine we had ordered to the table.
b. The waiter brought to the table the wine we had ordered.

(10) Dative Alternation
a. Chris gave a bowl of Mom’s traditional cranberry sauce to Terry.
b. Chris gave Terry a bowl of Mom’s traditional cranberry sauce.

(11) Verb-Particle
a. Sandy picked the freshly baked apple pie up.
b. Sandy picked up the freshly baked apple pie.
38 Frederick J. Newmeyer

Arnold, Wasow, Losongco and Ginstrom (2000) calculated the probabilities for
speaker choice of one ordering variant or the other and found a complex inter-
action of meaning factors, in particular whether the constituent is new to the dis-
course or not, and processing factors, such as the ‘heaviness’ (that is, the processing
complexity) of the constituent. In other words, the (a) and (b) variants in
(9–11) can be used to convey different meanings. Hence we have a good example
here of why we would not want to tie variability to particular rules or grammatical
elements. Since the heavy NP shift alternants, the dative alternants, and the verb-
particle alternants do not mean the same thing, the alternant that is chosen in
discourse is a function in part of the meaning that the speaker wishes to convey.
That fact would be obscured by a probabilistic rule relating the variants in ques-
tion. So we see how incorporating variability into a particular rule can mask an
explanation of the underlying generalization.
Another example can be drawn from variable subject-verb agreement in
Brazilian Portuguese (BP; see Guy 2005 for a summary). In that language, sub-
jects can occur both preverbally and postverbally. But interestingly, subject-verb
agreement is disfavored, but not categorically impossible, with postposed sub-
jects. Guy, following a popular, but not universally accepted, analysis, suggests
that subjects of unaccusative verbs are originally VP-internal and have to raise
across the verb to trigger the feature checking that accomplishes agreement. In
his analysis, the variability in agreement is a property of the feature checking
process (and hence purely grammar-internal). I offer as an alternative the idea
that agreement in post-verbal position is completely optional, as far as the
competence grammars of BP speakers are concerned. Why is that a better alternative?
As a first point, it needs to be stressed that preverbal and postverbal subjects
do not have the same meaning. In BP, as in all Romance languages, preverbal
and postverbal subjects differ in their discourse properties (Naro and Votre 1999).
These meaning differences would be obscured by a variable rule relating the two
subject positions. But there is more to be said than that. If postverbal subjects
do in fact originate in object position, then we have an independent explanation
for the agreement facts. Verb-object agreement is crosslinguistically significantly
less common than subject-verb agreement (Siewierska and Bakker 1996), a fact
which is rooted ultimately in the greater topicality of subjects vis-à-vis objects
(Corbett 2005). In the approach advocated here, all of the relevant generalizations
can be accommodated. The grammar of BP allows agreement with arguments in
both positions. The user’s manual interfaces discourse-based and functional
factors on the one hand and the grammar on the other hand to derive the statis-
tical generalizations.
As far as syntactic rules and meaning are concerned, there certainly are
processes that have little or no effect on meaning. So it is sometimes claimed
that there is no meaning difference between sentences in English where the sub-
ordinate clause is marked by the complementizer that and those where it is not.
(12a–b) are examples:

(12) a. I think that I’ll make a shopping list today.
b. I think I’ll make a shopping list today.

So at first glance, that-deletion might seem to be an appropriate candidate for
a variable rule. I do not think that this would be the most promising way to
proceed. Strictly speaking, the variants might be identical in meaning, but
nevertheless a huge number of interacting factors determine the retention
or omission of that:

(13) The presence or absence of that is affected by (Bolinger 1972; Quirk, Greenbaum, Leech and
Svartvik 1985; Thompson and Mulac 1991; Biber, Johansson, Leech, Conrad and Finegan
1999; Hawkins 2001; Dor 2005; Kaltenböck 2006; Kearns 2007; Dehé and Wichmann 2010):
a. the type and frequency of the matrix verb
b. the type of the main clause subject (pronominal vs. full noun phrase)
c. the choice of matrix clause pronoun
d. the length, type, and reference of the embedded subject
e. the position and function of the embedded clause
f. the voice of the main clause (active vs. passive)
g. ambiguity avoidance
h. the linear adjacency or not of the matrix verb and that
i. the speech register
j. the ‘truth claim’ (Dor 2005) to the proposition of the embedded clause
k. the rhythmic pattern of the utterance

No doubt with sufficient ingenuity one could write a variable rule of that-deletion
sensitive to all of the conditioning factors in (13a–k). But that would be a mistake,
since each of the conditioning factors conditions other processes in English. For
example, consider (13a). The matrix verbs that inhibit that-deletion, for example,
factive verbs like regret, are the same ones that resist infinitival complements and
resist extraction from the complement:

(14) a. I regret *(that) he left.
b. *I regret to have left.
c. *What did he regret that he saw?

Clearly, the generalization is much broader than something expressible by a
probabilistic condition on that-deletion. Such a condition would in fact mask the
crosslinguistic generalization that factive verbs are less malleable, so to speak,
than nonfactive ones (see Givón 1980).

Even when variants have the same meaning, it is clear that they can differ
stylistically. That fact poses more than a small problem for handling variation
grammar-internally. Put simply, it would lead to a different set of probabilities
for each genre, carrying the idea of handling variation grammar-internally to an
unacceptable conclusion. It is sometimes claimed that stylistic variation poses no
problems, since it is said to be quantitatively simple, involving raising or lower-
ing the selection frequency of socially sensitive variables without altering other
grammatical constraints on variant selection (Boersma and Hayes 2001; Guy
2005). In fact, Guy (2005: 562) has written that “it is commonly assumed in VR
analyses that the grammar is unchanged in stylistic variation.” The research on
register does not support such an idea. Biber has shown that there are at least six
‘dimensions’ in which genres interact:

(15) The six ‘dimensions’ in which genres interact (Biber 1988):
a. Involved versus informational production
b. Narrative versus non-narrative concerns
c. Explicit versus situation-dependent reference
d. Overt expression of persuasion
e. Abstract versus non-abstract information
f. On-line informational elaboration

Different genres, and the grammatical variability that they manifest, map differ-
ently onto each dimension. Along Dimension (15e), for example, we find differ-
ences within press reportage genres. Passives and other past participial construc-
tions are much more probable in spot news broadcasts than in financial reporting.
We find similar statistical differences between scientific and humanistic writing.
As far as spoken language is concerned, there are systematic differences along
Dimensions (15a), (15c), and (15f) with respect to different types of telephone con-
versations. What all of this shows, and Biber gives many more examples, is that
each speaker of English would need to be endowed with a multitude of different
variable rule-containing grammars if one were serious about handling variation
grammar-internally.
The question that has to be raised is: ‘If variable rules are so well motivated
and have been so successful, then why have people all but stopped formulating
them?’ As long as twenty years ago, Ralph Fasold was writing about ‘The quiet
demise of variable rules’ (Fasold 1991). It is true that there are a lot of people doing
probabilistic approaches to grammar these days. But by and large they have engi-
neering tasks as their ultimate goal. They are not building models of grammatical
knowledge. The models mix speech forms from different speech communities and
styles willy-nilly. As one well-known practitioner of this approach has remarked:
‘As far as I’m concerned, if I can Google it, it’s English’ (attributed, perhaps
apocryphally, to Christopher Manning). As far as sociolinguistics is concerned, what
one sees in the majority of papers analyzing variable phenomena are tables of
constraints with their associated VARBRUL probabilities and no indication of
where these numbers fit into an explicit statement of linguistic structure.
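The VARBRUL probabilities mentioned above are factor weights estimated under a logistic model. As a minimal sketch, assuming the standard logistic formulation used in variable-rule analysis, an input probability and the weights of the factors present in a given context are added on the log-odds (logit) scale and converted back into a predicted rate of application. The function names `logit` and `predicted_rate` and the numbers in the example are hypothetical, chosen only for illustration; they are not taken from any published analysis or from the VARBRUL/Goldvarb programs themselves.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def predicted_rate(input_prob: float, factor_weights: list[float]) -> float:
    """Combine an input probability with factor weights on the logit scale.

    Hypothetical helper for illustration: a weight of 0.5 is neutral
    (logit 0), weights above 0.5 favor the variant, and weights below
    0.5 disfavor it.
    """
    total = logit(input_prob) + sum(logit(w) for w in factor_weights)
    # Convert the summed log-odds back into a probability.
    return 1.0 / (1.0 + math.exp(-total))


# Invented numbers: a neutral input probability of 0.5 and two
# favoring factors, each with a weight of 0.8.
print(round(predicted_rate(0.5, [0.8, 0.8]), 3))  # prints 0.941
```

Under this formulation a weight of 0.5 leaves the input probability unchanged, which is why factor weights are conventionally read as favoring (above 0.5) or disfavoring (below 0.5) the variant.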
Probably one reason that one sees fewer and fewer variable rules is that there
has been an increasing realization that the units of variation do not mesh very
well with the units of analysis arrived at by grammarians in their grammatical
models. This problem has been known since the 1980s. For example, Romaine
(1982) looked at the variation among three possible occupants of the comple-
mentizer position in English:

(16) Possible occupants of the complementizer position in English relative clauses (Romaine
1982):
a. She’s the person who I saw (Wh-phrase)
b. She’s the person that I saw (that-complementizer)
c. She’s the person ___ I saw (φ)

She toyed with the idea of writing a variable rule associating the three options,
but soon realized that formal grammatical analysis does not relate the three
options by means of the same rule. Who is generally regarded as belonging to
the system of fronted wh-elements, while that is a complementizer. So a variable
rule relating the three options would not be a simple matter of adding a set of
probabilities to an existing motivated grammatical rule. Rather, it would involve
adopting a grammatical analysis accepted by few if any grammarians.
One could make the same point about the relationship between sentences
like (3a–b) above, which I repeat here as (17a–b):

(17) a. It is unlikely that the company will be able to meet this year’s revenue forecasts.
b. ?That the company will be able to meet this year’s revenue forecasts is unlikely.

In older versions of transformational grammar, it is true that a rule of Extraposition
derived (17a) from (17b). No doubt such a rule could have been reinterpreted
as a variable rule. But movement approaches to the relationship between these
sentence types are no longer current. It is not clear how the variability could be
encoded into a rule, or what that rule would be.
I find it both very interesting and very puzzling that practically everybody
who has proposed a probabilistic rule has implicitly or explicitly kept social
factors out of the statement of the rule. Such factors are there in principle. Con-
sider David Sankoff’s characterization of what a variable rule is and does:

Whenever a choice among two (or more) discrete alternatives can be perceived as having
been made in the course of linguistic performance, and where this choice may have been
influenced by factors such as features in the phonological environment, the syntactic
context, discursive function of the utterance, topic, style, interactional situation or personal or
sociodemographic characteristics of the speaker or other participants, then it is appropriate
to invoke the statistical notions and methods known to students of linguistic variation as
variable rules. (Fodor 1984; Sankoff 1988: 986; emphasis added)

I can hardly pretend to have mastered the entire body of sociolinguistic literature,
but I am not aware of a paper in which gender, class, identity, and so on have
been incorporated into the statement of a variable rule. In other words, advocates
of variable rules themselves have adopted, to an extent, a modular approach to
linguistic variation. So I am not suggesting anything radical to variationists – just
that they follow through and make their approach a consistently modular one.4

4 Conclusion

To conclude in one sentence, there is no incompatibility between the facts of
language variation and the correctness of the hypothesis of the autonomy of
grammar. Lest there be any doubt on the question, I feel that the discovery of sys-
tematic variability in language is one of the great breakthroughs of 20th century
linguistics and I have said as much in print (Newmeyer 1996). The only issue is
its formal implementation. I hope to have made a convincing case that a treat-
ment of systematic variability centered on a grammatical system interacting with
usage-based facts, but not itself incorporating those facts, is the best-motivated
approach.

References
Anttila, Arto (1997): Deriving variation from grammar. In: Frans Hinskens, Roeland van Hout and
W. Leo Wetzels (eds.), Variation, change, and phonological theory, 35–68. Amsterdam:
John Benjamins.
Anttila, Arto (2002): Variation and phonological theory. In: Jack K. Chambers, Peter Trudgill and
Natalie Schilling-Estes (eds.), Handbook of language variation and change, 206–243.
Oxford: Blackwell.

4 Hubert Haider has observed (personal communication) that “[f]rom a European perspective,
a sociolinguistic concept of variable rules for covering language variation appears to be amus-
ingly naïve. Only in a context like that of the US, without historically grown, easily identifiable,
regional dialects, could such a position be at all tenable.”

Arnold, Jennifer E., Thomas Wasow, Anthony Losongco and Ryan Ginstrom (2000): Heaviness
vs. newness: The effects of structural complexity and discourse status on constituent
ordering. Language 76: 28–55.
Biber, Douglas (1988): Variation across speech and writing. Cambridge: Cambridge University
Press.
Biber, Douglas, Stig Johansson, Geoffrey Leech, Susan Conrad and Edward Finegan (1999):
Longman grammar of spoken and written English. London: Longman.
Boersma, Paul and Bruce Hayes (2001): Empirical tests of the Gradual Learning Algorithm.
Linguistic Inquiry 32: 45–86.
Bolinger, Dwight (1972): That’s that. The Hague: Mouton.
Chomsky, Noam (1957): Syntactic structures. The Hague: Mouton.
Chomsky, Noam (1973): Conditions on transformations. In: Stephen R. Anderson and Paul
Kiparsky (eds.), A festschrift for Morris Halle, 232–286. New York: Holt Rinehart and
Winston.
Corbett, Greville G. (2005): Number of genders. In: Martin Haspelmath, Matthew S. Dryer, David
Gil and Bernard Comrie (eds.), The world atlas of language structures, 126–129. Oxford:
Oxford University Press.
Deane, Paul D. (1992): Grammar in mind and brain: Explorations in cognitive syntax. Berlin/New
York: Mouton de Gruyter.
Dehé, Nicole and Anne Wichmann (2010): Sentence-initial I think (that) and I believe (that):
Prosodic evidence for use as main clause, comment clause and discourse marker. Studies
in Language 34: 36–74.
Derwing, Bruce L. (1973): Transformational grammar as a theory of language acquisition: A
study in the empirical, conceptual, and methodological foundations of contemporary
linguistic theory. Cambridge: Cambridge University Press.
Dor, Daniel (2005): Toward a semantic account of that-deletion in English. Linguistics 43:
345–382.
Fasold, Ralph (1991): The quiet demise of variable rules. American Speech 66: 3–21.
Fodor, Janet D. (1984): Constraints on gaps: Is the parser a significant influence? In: Brian
Butterworth, Bernard Comrie and Östen Dahl (eds.), Explanations for language universals,
9–34. Berlin/New York: Mouton.
Givón, Talmy (1980): The binding hierarchy and the typology of complements. Studies in
Language 4: 333–377.
Guy, Gregory R. (1980): Variation in the group and the individual: The case of final stop
deletion. In: William Labov (ed.), Locating language in time and space, 1–36. New York:
Academic Press.
Guy, Gregory R. (1997): Competence, performance, and the generative grammar of variation.
In: Frans Hinskens, Roeland van Hout and W. Leo Wetzels (eds.), Variation, change, and
phonological theory, 125–143. Amsterdam: John Benjamins.
Guy, Gregory R. (2005): Grammar and usage: A variationist response. Language 81: 561–563.
Guy, Gregory R. and Charles Boberg (1994): The obligatory contour principle and sociolinguistic
variation. Toronto Working Papers in Linguistics: Proceedings of the Canadian Linguistics
Association 1994 Annual Meeting.
Hawkins, John A. (1994): A performance theory of order and constituency. Cambridge:
Cambridge University Press.
Hawkins, John A. (2001): Why are categories adjacent? Journal of Linguistics 37: 1–34.
Hawkins, John A. (2004): Efficiency and complexity in grammars. Oxford: Oxford University
Press.
Hooper, Joan B. (1976): Word frequency in lexical diffusion and the source of morphophonemic
change. In: William M. Christie (ed.), Current progress in historical linguistics, 95–106.
Amsterdam: North-Holland.
Kaltenböck, Gunther (2006): ‘…That is the question’: Complementizer omission in extraposed
that-clauses. English Language and Linguistics 10: 371–396.
Kearns, Katherine S. (2007): Epistemic verbs and zero complementizer. English Language and
Linguistics 11: 475–505.
Kluender, Robert (1992): Deriving island constraints from principles of predication. In:
H. Goodluck and M. Rochemont (eds.), Island constraints: Theory, acquisition, and
processing, 223–258. Dordrecht: Kluwer.
Kuno, Susumu (1973): Constraints on internal clauses and sentential subjects. Linguistic
Inquiry 4: 363–385.
Labov, William (1969): Contraction, deletion, and inherent variability of the English copula.
Language 45: 716–762.
Lavandera, Beatriz R. (1978): Where does the sociolinguistic variable stop? Language in Society
7: 171–182.
Manning, Christopher D. (2002): Probabilistic syntax. In: Rens Bod, Jennifer Hay and Stefanie
Jannedy (eds.), Probabilistic linguistics, 289–341. Cambridge, MA: MIT Press.
Naro, Anthony J. and Sebastião J. Votre (1999): Discourse motivations for linguistic regularities:
Verb/subject order in spoken Brazilian Portuguese. Probus 11: 75–100.
Newmeyer, Frederick J. (1986): Linguistic theory in America: Second edition. New York:
Academic Press.
Newmeyer, Frederick J. (1996): Benchmarks: 35 years of linguistics. The Sciences 36: 13.
Newmeyer, Frederick J. (1998): Language form and language function. Cambridge, MA: MIT
Press.
Newmeyer, Frederick J. (2002): Optimality and functionality: A critique of functionally-based
optimality-theoretic syntax. Natural Language and Linguistic Theory 20: 43–80.
Newmeyer, Frederick J. (2005): Possible and probable languages: A generative perspective on
linguistic typology. Oxford: Oxford University Press.
Odden, David (1986): On the role of the Obligatory Contour Principle in phonological theory.
Language 62: 353–383.
Quirk, Randolph, Sidney Greenbaum, Geoffrey Leech and Jan Svartvik (1985): A comprehensive
grammar of the English language. Harlow: Longman.
Romaine, Suzanne (ed.) (1982): Sociolinguistic variation in speech communities. London:
Edward Arnold.
Sankoff, David (1988): Variable rules. In: Ulrich Ammon, Norbert Dittmar and Klaus J. Mattheier
(eds.), Sociolinguistics: An international handbook of the science of language and society,
984–997. Berlin/New York: Walter de Gruyter.
Saussure, Ferdinand de (1916/1966): Course in general linguistics. New York: McGraw-Hill.
[Translation of Cours de linguistique générale. Paris: Payot, 1916].
Siewierska, Anna and Dik Bakker (1996): The distribution of subject and object agreement and
word order type. Studies in Language 20: 115–161.
Szabolcsi, Anna and Marcel den Dikken (1999): Islands. Glot International 4–6: 3–8.
Thompson, Sandra A. and Anthony Mulac (1991): The discourse conditions for the use of the
complementizer that in conversational English. Journal of Pragmatics 15: 237–251.
Wasow, Thomas (2002): Postverbal behavior. Stanford, CA: CSLI Publications.
Weinreich, Uriel, William Labov and Marvin I. Herzog (1968): Empirical foundations for a
theory of language change. In: W. Lehmann and Y. Malkiel (eds.), Directions for historical
linguistics, 95–188. Austin, TX: University of Texas Press.
Yang, Charles D. (2008): The great number crunch. Journal of Linguistics 44: 205–228.
Zwicky, Arnold M. and Geoffrey K. Pullum (1987): Plain morphology and expressive morphology.
Berkeley Linguistics Society 13: 330–340.
Gregory R. Guy, New York University
The grammar of use and the use of grammar

Abstract: Language is a uniquely social phenomenon – it is acquired exclusively
through interaction with other users. The use of language is further characterized
by orderly heterogeneity: regular patterns of social and linguistic conditioning.
Since a speaker’s knowledge of language derives from and includes knowledge
of usage, it incorporates knowledge about variability, frequency, and social
significance. An adequate linguistic theory must account for this social, inter-
active nature of language. Dichotomous oppositions between system and usage,
competence and performance, I-language and E-language, etc., are misleading if
they incline us to favor puristic concepts of the mental grammar as an idealized,
invariant, categorical system. But an entirely usage-based model, devoid of
abstract operations, is also inadequate. Rather, the empirical evidence indicates
that speakers do construct mental representations, abstractions, and operations
to guide their production, but these are probabilistic and variable, rather than
deterministic and discrete. From probabilistically patterned input, speakers infer
inherently variable grammars, from which they generate productions that display
orderly diversity.

1 Language as a social phenomenon

Language is a social phenomenon, seated in society, and acquired by the individual
only through social interaction – i.e., usage. This means that the primary
information people possess about their languages is information about how they
are used in communities by speakers. Such information about social usage pro-
vides the base for learning language, and for using it productively and compe-
tently; ultimately this is what we know when we know a language. Consequently,
knowledge of language necessarily incorporates some level of familiarity with all
the linguistic diversity, complexity, and variability that we encounter in the world
around us. Therefore I will argue that the linguistic systems we construct during
the course of language acquisition and use are necessarily formed by, sensitive
to, and generative of, variability and use. This has the consequence that the tra-
ditional distinctions in linguistics that oppose system vs. usage, competence vs.
performance, etc., are false dichotomies – false in the sense that they are not
very helpful in constructing coherent and adequate theories of language. This
does not mean, however, that there is only usage, or that there is no linguistic
system or grammar; rather, I conclude that linguistic systems – in the sense of
abstract representations and processes – clearly do exist, but they emerge from
use and experience, are constructed out of it and dependent upon it. The empir-
ical evidence will belie both kinds of theoretical purism – that which denies the
relevance of usage and that which denies abstraction and system.
Recognizing the fundamentally social nature of language – its status as
both the product and medium of social interaction – is essential to achieving an
adequate account of language, and hence an adequate linguistic theory. If lin-
guists conceptualize language as primarily or exclusively the property of the indi-
vidual mind, we are bound to go wrong. Consider, first, the origins of language,
both phylogenetically in the species, and ontogenetically in the individual. In
the individual, it is clear that social interaction is the sine qua non of language
development: without it, there is no language. This is evident in several ways.
Humans who have no social interaction as children, who are not part of a speech
community, do not develop any language at all. Child language development
occurs exclusively through social interaction, so that the fortunately rare cases
of isolated individuals – so-called ‘wild’ children, and children isolated due to
abuse – do not develop language (cf. Fromkin 1974). But given a human com-
munity and social interaction, some communicative system will emerge, even if
no prior language is available, or if there are other hindrances to communica-
tion. Thus we have the emergence of contact languages, pidgins, home signing
systems, and the like, appearing in situations where people interact but cannot
do so on the basis of normal language acquisition. On this evidence we may
further conclude that the phylogenetic situation was the same: the raison d’être,
the ultimate motivation for language evolution in the first place, was social
interaction and communication, not, as some would have it, an aid to the solitary
activity of thought. The evolutionary competitive advantage of homo loquens ‘the
talking hominid’ was social communication (Mufwene 2011). The appropriate
conclusion is that societies have languages; individuals do not. Indeed, specific
‘languages’ such as English, Cantonese, and Kikuyu only persist and continue
to have regular patterns and continuities because they are the ongoing means
of communication in complete human societies. When a language ceases to be
used by a speech community, it dies. But when an individual dies, the languages
he or she spoke survive, so long as they are used by a community. Thus the struc-
tures and system of language do not exist in isolation from human interaction,
from usage.
Usage and interaction therefore provide both the data and the motivation for
language acquisition: for developing the mental capacity to speak ourselves. And
let us be clear, usage provides not just some of the data, but all of the data: we
have in fact no other kinds of evidence about how our languages work or how
to be a speaker except what we hear and perceive from those around us.1 This
is true of the child language learner, and it is also true of the linguist; even a
linguist’s intuition about grammaticality is a product of the system. The idea
that intuition and introspection give us a usage-free pathway to inspect the inner
workings of language directly is misguided. But given that our information comes
only from usage, the next questions for the linguist are, what do we do with that
information, and what does the mental capacity to be a speaker consist of?
These are the questions that point us to the ‘system’ part of the title of this
volume. But we must be cautious in how we use the terminological distinction
between ‘system’ and ‘usage’, lest we fall into the essentialist trap of believing
that two different labels must refer to two essentially distinct things – that the
‘system’ is something separate from all the evidence obtained from usage. I take
this pair of terms to be a contemporary update of the familiar dichotomy that has
beguiled and bedeviled linguistic theory since Saussure famously distinguished
langue from parole. In my view, the grand Swiss scholar, widely considered to be
the father of modern linguistics, did the discipline a disservice with his love of
dichotomies, particularly those opposing synchronic and diachronic linguistics,
and langue and parole. Linguistics has been mesmerized ever since by the seduc-
tive metaphorical opposition between a system and its products, subsequently
restated as competence vs. performance, I-language vs. E-language, grammar vs.
usage.
Before analyzing the substance of this dichotomy, let us consider how these
concepts have worked in terms of the sociology of the field – what have they
meant for the people and practices in linguistics? From this perspective, I believe
the dichotomy has flourished for two reasons. First, it’s a simplifying assumption:
it designates all variability, heterogeneity, idiosyncrasy, and other messy stuff as
belonging to something other than ‘real’ language, and allows us to set it aside
while we figure out the general patterns. This is a typical step in the early stages of
a scientific field: it is easier to work out generalizations and models if we can start
by ignoring some of the complexity of reality. Thus a simple model of mechani-
cal motion might start by ignoring friction, and a simple model of gravity might
start by ignoring relativity. These were in fact the ways those theories developed
in physics, so we can surely forgive our predecessors in linguistics for doing the
same thing when they postulated a categorical, homogeneous, abstract mental
grammar, ignoring diversity in society and variability in the individual. But when

1 I neglect here the still-debated issue of innateness; even if individuals possess an innate
mental faculty that aids in language acquisition, it does not aid them in acquiring any particular
language in the absence of linguistic interaction with other speakers.
a science achieves greater maturity and self-confidence, it needs to revisit the
simplifications and incorporate the inconvenient facts previously ignored, if it
is to continue to progress. I think linguistics is clearly at that point: we need to
marry our models of language with linguistic reality – focusing on questions like:
how does language work, how can we communicate and understand each other
by means of speech, and what do we do when we are doing that?
However, it is obvious that not all linguists agree on this point. There is a tend-
ency in the field to reify these simplifying assumptions as if they were a funda-
mental truth about the nature of reality; this approach leads to Chomsky’s (1965:
4) surprising position (hardly altered in the last half-century) that “observed use
of language … surely cannot constitute the actual subject matter of linguistics,
if this is to be a serious discipline”. This remarkable but once widely accepted
position deserves serious reflection. Why would linguists adopt such a stance
towards the very substance of language? To me, it seems to contradict reality,
basic scientific empiricism, and even common sense, to assert that a serious
science cannot be based on the study of what people do when they communicate
with language. So why has such a position been so influential in our discipline?
One possible answer that merits consideration is that the Chomskyan position –
opposing competence/system to performance/usage while simultaneously defin-
ing the study of competence as the ‘serious’ science – is a self-justifying ideology
in the service of the interests of a particular school and a particular methodology.
It licenses the practitioner to ignore messy data from production, and avoids tire-
some empirical testing of one’s theories. It saves one the trouble of doing field-
work, and allows one to work while introspecting comfortably in an armchair.
Even better, it exalts theory constructors over data analyzers and fieldworkers.
This is all very attractive – Chomsky offers linguists a way to make our problems
more tractable, to make our working conditions more pleasant, and to feel supe-
rior, all at once. Of course, even within generative syntax there has been increas-
ing discomfort with the shakiness of badly gathered introspection (cf. Schütze
1996), but I-language, not E-language, continues to constitute the fundamental
focus of Chomskyan linguistics.
The problem that constantly confronts such an approach is that the diversity
of linguistic reality keeps undermining the dichotomy. Looking at the world, it is
clear that system and usage interpenetrate: within the linguistic system there is
variability, reflecting patterns of usage, and within variable usage there is system
and structure. There are social, usage-based constraints on the grammar, and sys-
tematic, grammar-based constraints on social usage. So at the present state of our
knowledge, the dichotomy has little explanatory or interpretive value. It doesn’t
even have much practical value, if it is now doing more to obstruct progress in the
field than to facilitate it. It is time to abandon this conceptualization of our
problems, and move to dealing with language as it is, not as we would imagine it to be.
The grammar of use and the use of grammar 51

2 Orderly heterogeneity: The systematic nature of usage

A more illuminating approach begins with the two observations enunciated by
Weinreich, Labov and Herzog (1968) as elements of their “Empirical Foundations
for a Theory of Language Change”. These are “inherent variability” and “orderly
heterogeneity”. Orderly heterogeneity is the observation that variability in lan-
guage is not random and arbitrary, but structured and systematic. It is linguis-
tically structured, quantitatively constrained by the linguistic system. It is also
socially structured: speakers in a community are not randomly different from one
another; rather, linguistic diversity systematically reflects social organization:
people’s usage follows regular patterning by age, sex, class, ethnicity, linguis-
tic experience, interlocutor, purpose, context, acts of identity, and so on. Our
data about language, and our knowledge of it as users, look like the following
examples.
In New York City, where I live and work, users of English hear the social pat-
terning in the use of coda /r/ that Labov famously reported in 1966, reproduced
here as Figure 1. The rates of /r/ production show simultaneous class and stylis-
tic differentiation, such that higher status speakers use more /r/, and everybody
uses more /r/ in their more careful or formal styles. Crucially, these patterns are
regular, systematic, and pervasive, and they are regularly produced, perceived,
and interpreted by NYC English speakers.

[Figure 1: Coda /r/ production in New York City English: Stratification by socioeconomic class and speech style (A: casual speech, B: careful speech, C: reading style, D: word lists, D′: minimal pairs), from Labov (2006: 152). y-axis: (r) index; x-axis: style; one line per socioeconomic class group.]

Another pattern of class differentiation is regularly encountered when something
about the language is changing – specifically, when the change is a spontaneous
innovation emerging within the speech community. In such situations we typically
encounter a curvilinear distribution, with a peak in the lower middle class
or upper working class, as illustrated in Figure 2 from Labov’s (1980: 261) study of
two ongoing vowel changes in Philadelphia English, the fronting of the nucleus
of (aw) (e. g., ounce, house) and the raising of the nucleus of (ey) in closed
syllables (e. g., made, take).

[Figure 2: Curvilinear social stratification of two vowel changes in Philadelphia English, (aw) and (eyC) (from Labov 1980: 261). y-axis: regression coefficients (Hz) for F2; x-axis: socioeconomic class (LWC, UWC, LMC, UMC, UC; index groups 0–5, 6–9, 10–12, 13–14, 16); significance levels from p < .10 to p < .001 are marked.]

Another social dimension that is regularly reflected in orderly linguistic
heterogeneity is gender. Thus women typically lead in language changes of several
kinds, as illustrated in Figure 3, showing the class and gender distribution from
Guy et al.’s (1986: 37) study of an intonational change in progress in Australian
English, namely the innovative use of a rising intonation in declarative clauses.

[Figure 3: Gender and class distribution of an intonational change in progress in Australian English (data from Guy et al. 1986: 37). y-axis: probability; x-axis: social class (LWC, UWC, MC); separate lines for female and male speakers.]

Another regularly patterned feature of every human being’s experience of language
use is age differentiation, and this also turns out to be crucially associated
with ongoing language change. The classic time course of language change is the
s-shaped curve. This is illustrated in Figure 4, showing Chambers’ (2002: 363)
data for the loss of /h/ before /w/ in words like which, whine in Canadian English.
The age distributions are similar in four different regions (Montreal, Southern
Ontario (GH), the Ottawa Valley and Quebec City). At a given moment in time,
only 30–40 % of the oldest speakers were using deletion, while the youngest
people were approaching 100 %.
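The s-shaped curve is commonly formalized as a logistic function. The sketch below only illustrates that shape and is not a model fitted to Chambers’ data. The function name `logistic_share` and its slope and midpoint values are hypothetical. Time can be read either as speakers’ year of birth (apparent time) or as calendar year (real time).

```python
import math


def logistic_share(t, slope, midpoint):
    """Proportion of the incoming variant at time t on a logistic s-curve.

    slope (hypothetical) sets how quickly the change proceeds;
    midpoint (hypothetical) is the time at which the new form reaches 50 %.
    """
    return 1.0 / (1.0 + math.exp(-slope * (t - midpoint)))


# Illustrative change: slow start, rapid middle phase, slow completion.
for birth_year in range(1900, 2001, 20):
    share = logistic_share(birth_year, slope=0.08, midpoint=1945)
    print(birth_year, f"{100 * share:.1f} %")
```

Older cohorts sit on the flat lower tail and the youngest approach the upper asymptote, which matches the general shape of Figures 4 and 5.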

[Figure 4: Loss of /h/ before /w/ in Canadian English (from Chambers 2002: 363). y-axis: % [w]; x-axis: age group (over 80, 70–79, 60–69, 50–59, 40–49, 30–39, 20–29, 14–19); one line per region (M, GH, OV, QC).]

The generality of this pattern of age differentiation extends beyond segmental
phonology. In Figure 5 the same s-shaped curve appears in Chambers and
Heisler’s (1999: 39) data on a morpholexical change, the substitution of snuck for
historical sneaked as the past tense of sneak.

[Figure 5: Replacement of sneaked by snuck in two dialects of Canadian English (from Chambers and Heisler 1999: 39). y-axis: % snuck; x-axis: age group (over 80 to 14–19); QC = Quebec City, GH = Golden Horseshoe, i. e. Southeastern Ontario.]

Strikingly, the s-shaped curve of linguistic change also appears in real time, as
seen in Figure 6, from Kroch’s (1989) study of the syntactic change in English that
introduced do as an auxiliary verb in questions and negations. The pervasiveness
of such patterns implies that during language change, speakers systematically
encounter, in usage, information about the direction of the change, and engage
with older and newer forms at the same time. Grandparents, parents, and
children all speak differently, and since these generational differences are
an intimate part of everyone’s linguistic experience, speakers effectively know
which way the change is heading – they can hear what is new and what is old
in the voices of their own speech community. More generally, they are regularly
exposed to systematic differences in the frequency of occurrence of linguistic
variables, and through the social distribution of these frequency patterns, they are
aware of the social significance of quantitative information. This must have
implications for how they construct and operate their linguistic systems.

[Figure 6: The rise of periphrastic do in English (Kroch 1989: 223, based on data from Ellegård 1953). y-axis: % do; x-axis: year, 1400–1700; separate lines for affirmative transitive adverbial & yes/no questions, negative questions, affirmative intransitive adverbial & yes/no questions, affirmative object questions, and negative declaratives.]

All the above examples illustrate the point that the diversity of language in use is
not chaotic; rather, it is orderly along social dimensions. But it is also orderly in
the linguistic sense, that is, the linguistic constraints on variability systematically
reflect the organizing principles of language. Note that Kroch’s data in Figure 6
show systematic conditioning by syntactic construction that persists across three
centuries. A further example comes from my research on Brazilian Portuguese
(Guy 1981). Plural marking in popular Brazilian Portuguese is highly variable.
Unlike other dialects of Portuguese, and standard varieties of Spanish, Italian,
etc., where number agreement is obligatory across the NP, as in (1), Popular Bra-
zilian Portuguese (PBP) shows optional or variable plural marking, as in exam-
ples like (2).
(1) Standard Portuguese: categorical number agreement in NP
    a. os velhos amigos ‘the old friends’ (cf. sg. o velho amigo)
    b. as casas brancas ‘the white houses’ (cf. sg. a casa branca)

(2) Popular Brazilian Portuguese: variable number marking in NP
    a. os amigo, os velho amigo, os velhos amigo
    b. as casa, as casa branca, as casas branca

Instead of obligatory agreement, PBP shows high, nearly categorical rates of
plural marking in the first position in an NP, with marked declines in subsequent
positions, as shown in Table 1 (from Guy 1981: 179).

Table 1: Plural marking in popular Brazilian Portuguese, by position in NP

Position in NP       % plural marked     N
first                95                  5247
second               28                  3947
third                21                  552
fourth and fifth     11                  42

This is a regular, systematic constraint on plural marking in PBP, replicated in
many studies of the phenomenon. Plural marking is variable, but not chaotic or
random (nor idiolectally differentiated). The constraint is syntactically defined:
it is not a function of word-class – first position words get plural markers whether
they are heads (amigos), determiners (os amigo), adjectives (velhos amigo, bons
homem ‘good men’), etc. This is highly systematic – it is extremely rare to hear
a Brazilian speaker say something like o amigos, with initial zero followed by
second position plural marker, or as casa novas, with first position marker, second
position zero, and third position marked. Instead, it looks like a rule that marks
the first word, followed by a low probability copying rule, from left to right: very
orderly, but heterogeneous and probabilistic. This is grammar in usage, orderly
heterogeneity. The simplistic dichotomy that postulates a categorical invariant
grammar, and consigns frequencies and variability to usage, obscures this order.
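The rule described above can be sketched as a small simulation. This is an illustration, not Guy’s (1981) analysis. `mark_np`, `p_first` and `p_copy` are hypothetical names, and the values .95 and .3 were chosen only for illustration. A single constant copy probability is also a simplification that is not fitted to the rates in Table 1. What the sketch does show is that a mark-then-copy rule can only produce contiguous left-to-right marking. It never yields gapped patterns like o amigos or as casa novas.

```python
import random


def is_left_to_right_prefix(marks):
    """True if the plural-marked words form an unbroken run from the first word.

    (True, True, False) -> True; gapped patterns like o amigos (False, True)
    or as casa novas (True, False, True) -> False.
    """
    seen_unmarked = False
    for marked in marks:
        if marked and seen_unmarked:
            return False
        if not marked:
            seen_unmarked = True
    return True


def mark_np(n_words, p_first, p_copy, rng):
    """Hypothetical two-step rule: mark word 1 with probability p_first, then
    copy the plural rightward with probability p_copy per word, stopping at
    the first word that is not marked."""
    marks = [False] * n_words
    if rng.random() < p_first:
        marks[0] = True
        for i in range(1, n_words):
            if rng.random() >= p_copy:
                break
            marks[i] = True
    return marks


rng = random.Random(2015)
samples = [mark_np(3, p_first=0.95, p_copy=0.3, rng=rng) for _ in range(10_000)]
for position in range(3):
    rate = sum(marks[position] for marks in samples) / len(samples)
    print(f"position {position + 1}: {100 * rate:.1f} % marked")
print("all patterns left-to-right:", all(is_left_to_right_prefix(m) for m in samples))
```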

3 Inherent variability: The diversity of the linguistic system

The cases presented above are but a small sample of an overwhelming set of
empirical observations demonstrating that variability is a pervasive but orderly
feature of language use. On the basis of such evidence, Weinreich, Labov and
Herzog (1968) further argue that variability is ‘inherent’ in language – i. e., it
permeates the linguistic system. The classic dichotomous approach, which
distinguishes the linguistic system from usage, postulates that the object of study of
linguistics is a homogeneous monostylistic idiolect, which is internally invariant (cf.
Chomsky 1965). Alas, this imaginary object does not exist in the world; indeed, no
observation of language even approximates it. Every speech community includes
diverse speakers, every individual commands different styles, every utterance
includes variable elements. Speakers clearly perceive, process, produce, compre-
hend, and manipulate variability in all aspects of language. Thus variability is, in
a word, an inherent feature of the linguistic system, and no adequate account
of the linguistic system can fail to accommodate variability. Consequently, any
grammar of a language, or theory of grammar, that fails to account for variability
is inadequate on its face – it does not even reach Chomsky’s most elementary level
of ‘observational’ adequacy. Furthermore, if variability is an intrinsic part of the
grammar, then we lose one of the motivations for the grammar/usage distinction.
Now, elsewhere in this volume Newmeyer has objected that quantitative
properties like those illustrated above are not ‘in’ the grammar, but lie outside
it, in a ‘user’s manual’, or derive from interactions between the grammar and
various grammar-external factors. Nevertheless, the evidence shows that this
restatement of the system/usage dichotomy continues to be inadequate, as well
as theoretically profligate rather than parsimonious. A cogent case in point is the
deletion of final coronal stops in English, which shows an exponential relation
among retention rates in three morphological categories.
The relevant categories reflect three derivational levels of words in English:
underived or monomorphemic forms like best, old, which have the full consonant
cluster in their dictionary entries, have the highest rates of deletion and lowest
rates of retention. Irregular ‘semiweak’ verbs like left, told are derived early in
the phonology, and have intermediate rates of retention. And finally, the highest
rates of retention are found in regular past tense forms like missed, rolled, which
are derived late in the phonology. Strikingly, the retention rates in the three cat-
egories are exponentially related. I have argued in previous work (Guy 1991, 1992)
that this shows an iterated application of a single variable rule, with a constant
‘base’ probability of applying, which operates throughout the several stages of
a derivation in a multilevel phonology. The highest retention rates are found in
past tense forms that are exposed once to the deletion process at the final level of
the phonology; forms that are exposed at two levels have the square of the basic
retention rate, and the underived forms that are exposed at three levels show the
cube of the base rate. This relationship was demonstrated in my original work,
and it has subsequently been confirmed in a large number of other studies. The
data from three such studies appear in Table 2, which shows, in the first three
columns, the number of tokens analyzed, the percentage of those forms that
had a retained /t/ or /d/, and the probability of retention predicted by my model,
using the ‘best fit’ for the base rate of retention shown in the fourth column. Thus
Santa Ana’s speakers appear to have an underlying base probability of retention
of .75, predicting retention rates of 75 % in regular pasts, the square of this value
– 56.3 % – in irregular pasts, and the cube of this value – 42.2 % – in underived
forms. These numbers fit very closely to the observed percentages of retention
(74.3 %, 59.3 %, and 42.1 %). A chi-square test shows that the differences between
the predictions of the model and the observations are not significant (p = .57,
where the criterion for significance is typically taken to be p < .05).
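The same relationship can be written in symbols. The notation is introduced here for illustration and is not the author’s. Let p be the base probability that a final stop survives one application of the variable rule, and let n be the number of times a word class is exposed to that rule. The expected retention rate of the class is then R_n = p^n. With Santa Ana’s best-fit value p = .75, this gives:

```latex
R_n = p^{\,n}, \qquad
n = \begin{cases}
1 & \text{regular past (\textit{missed, rolled})}\\
2 & \text{irregular past (\textit{left, told})}\\
3 & \text{underived (\textit{best, old})}
\end{cases}

R_1 = .75, \qquad R_2 = .75^2 = .5625 \approx 56.3\,\%, \qquad R_3 = .75^3 = .421875 \approx 42.2\,\%
```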

Table 2: The exponential relationship: coronal stop retention in three data sets

                     N      Observed      Predicted by exponential   Model fitting
                            % retained    progression
Corpus 1 (Guy 1991: 5; 7 speakers)
  Underived          658    61.9          .614 (n = 3)               Best-fit pr = .85
  Irregular Past      56    66.1          .723 (n = 2)               Chi-square = 1.28, p = .55
  Regular Past       181    84.0          .850 (n = 1)
Corpus 2 (Santa Ana 1992: 280; 45 speakers)
  Underived         3724    42.1          .422                       Best-fit pr = .75
  Irregular Past     297    59.3          .563                       Chi-square = 1.17, p = .57
  Regular Past       836    74.3          .750
Corpus 3 (Bayley 1994: 310; 32 speakers)
  Underived         2384    43.7          .436                       Best-fit pr = .758
  Regular Past       685    75.6          .758
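The model-fitting step can be sketched in Python. This sketch rests on several assumptions and is not the procedure behind Table 2. It estimates the base rate by maximum likelihood on a grid. It then computes a Pearson chi-square over retained and deleted tokens, with df equal to the number of classes minus one, since one parameter is estimated. The counts are rebuilt from the table’s rounded percentages. As a result, the printed statistics need not match the published values exactly. All function names are hypothetical.

```python
import math

# Number of times each derivational class is exposed to the deletion rule.
EXPOSURES = {"Underived": 3, "Irregular Past": 2, "Regular Past": 1}


def predicted_retention(p, category):
    """Exponential model: retention = p ** (number of exposures)."""
    return p ** EXPOSURES[category]


def log_likelihood(p, data):
    """Binomial log-likelihood of retained/deleted counts given base rate p."""
    total = 0.0
    for category, (n_tokens, retained) in data.items():
        r = predicted_retention(p, category)
        total += retained * math.log(r) + (n_tokens - retained) * math.log(1 - r)
    return total


def fit_base_rate(data, step=0.001):
    """Grid-search maximum-likelihood estimate of p on (0, 1)."""
    grid = [i * step for i in range(1, round(1 / step))]
    return max(grid, key=lambda p: log_likelihood(p, data))


def pearson_chi_square(p, data):
    """Pearson chi-square over the retained and deleted cells of each class."""
    chi2 = 0.0
    for category, (n_tokens, retained) in data.items():
        exp_ret = n_tokens * predicted_retention(p, category)
        exp_del = n_tokens - exp_ret
        chi2 += (retained - exp_ret) ** 2 / exp_ret
        chi2 += ((n_tokens - retained) - exp_del) ** 2 / exp_del
    return chi2


def chi_square_p_value(chi2, df):
    """Upper-tail p-value; closed forms are used, so only df = 1 or 2."""
    if df == 1:
        return math.erfc(math.sqrt(chi2 / 2))
    if df == 2:
        return math.exp(-chi2 / 2)
    raise ValueError("this sketch only handles df = 1 or df = 2")


# Counts reconstructed from N and % retained in Table 2 (Santa Ana 1992);
# approximate, because the table reports rounded percentages.
santa_ana = {
    "Underived": (3724, round(3724 * 0.421)),
    "Irregular Past": (297, round(297 * 0.593)),
    "Regular Past": (836, round(836 * 0.743)),
}

p_hat = fit_base_rate(santa_ana)
chi2 = pearson_chi_square(p_hat, santa_ana)
df = len(santa_ana) - 1  # one free parameter (p) is estimated
print(f"best-fit p = {p_hat:.3f}, chi-square = {chi2:.2f}, "
      f"p = {chi_square_p_value(chi2, df):.2f}")
```

For Bayley’s two-class corpus, the same functions apply with df = 1.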

Five further tests of the model by Labov and his students appear in Table 3. In
no case is the exponential model statistically rejected (i. e., the predictions of the
model are never significantly different, with p < .05, from the observed data), and
in the studies that have compared it to alternative models, it fits as well as or better
than the alternatives.
This is thus a robust and systematic quantitative feature of English phonol-
ogy. It is eminently rule-governed; it is not sporadic or random; and it shows a
highly specific mathematical relationship. English speakers do not simply delete
final stops more in underived words and less when the stop represents an affix;
rather, they delete these categories in a specific ratio. It is hard to imagine any
other process than iterated application of a single probabilistic operation that
can generate these numbers. For instance, it can’t be adequately modeled by just
assigning separate probabilities to the three categories. Functional and usage-based
accounts that differentiate these classes by their functional load (e. g., avoid
deleting tense markers because of their communicative content) fail to predict
any specific quantitative relation among them. The only model that explains the
exponential relationship is one in which a single operation (stochastic deletion)
is recursively applied in the derivation of forms, with the mathematical result
that the associated probability enters the outcome as a factor one, two, or three
times (p, p², p³). Recursion and derivation in language are ordinarily understood
as grammatical operations. In this case, those operations are quantified.

Table 3: Five tests of the exponential relationship in coronal stop deletion, 1991–1997
[W. Labov, p. c.]

Year             Regular Past       Irregular Past   Underived Words   Best-fit pr   Chi-square, sig.
                 (missed, rolled)   (lost, told)     (best, old)
1991  N ret/tot  79/100             29/53            221/539
      pr         .79                .73              .74               .74           .37, p > .85
1992  N ret/tot  93/116             32/64            250/583
      pr         .80                .71              .75               .75           .93, p > .65
1995  N ret/tot  323/404            149/229          496/922
      pr         .80                .81              .81               .81           .67, p > .75
1996  N ret/tot  85/96              62/82            219/374
      pr         .88                .87              .84               .84           .56, p > .75
1997  N ret/tot  209/258            71/90            491/906
      pr         .81                .89              .82               .82           1.99, p > .40
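The per-class pr values in Table 3 appear to be the base rate each class implies once its exposure count is undone. That is, each value looks like the n-th root of the class’s observed retention rate. This is our reading, not something the table states, so treat it as an assumption. Under that reading, the exponential model predicts that the three roots for a given year should roughly agree. The sketch below recomputes them from the ret/tot counts. On that reading it reproduces the table’s pr values to within about .01; rounding in the source may account for the small differences. `implied_base_rate` is a hypothetical name.

```python
EXPOSURES = {"regular": 1, "irregular": 2, "underived": 3}

# Retained/total counts from Table 3 (W. Labov, p. c.).
TABLE_3 = {
    1991: {"regular": (79, 100), "irregular": (29, 53), "underived": (221, 539)},
    1992: {"regular": (93, 116), "irregular": (32, 64), "underived": (250, 583)},
    1995: {"regular": (323, 404), "irregular": (149, 229), "underived": (496, 922)},
    1996: {"regular": (85, 96), "irregular": (62, 82), "underived": (219, 374)},
    1997: {"regular": (209, 258), "irregular": (71, 90), "underived": (491, 906)},
}


def implied_base_rate(retained, total, exposures):
    """Invert R = p ** n to recover the base rate p implied by one class."""
    return (retained / total) ** (1 / exposures)


for year, cells in TABLE_3.items():
    implied = {
        cls: implied_base_rate(ret, tot, EXPOSURES[cls])
        for cls, (ret, tot) in cells.items()
    }
    print(year, {cls: round(p, 2) for cls, p in implied.items()})
```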
Strong confirmation of this model is found in the way the process interacts
with other constraints. Those that are external to the word, such as the favoring
effect of a following consonant, are not multiplied during the derivation; rather,
they apply only once in the postlexical phonology, after words are inserted into a
phrase marker. Consequently, they are constant in magnitude across the different
derivational classes. However, internal constraints such as the effect of a preced-
ing consonant are indeed present throughout a derivation; consequently they
appear magnified in underived words, which experience them repeatedly, com-
pared to regular past tense forms, which experience their effects only once. These
predictions are quantitatively confirmed, as shown in Table 4, which expresses
the contextual effects on the process as partial probabilities of deletion occurring
in a context (a value of 1 implies categorical deletion, a value of 0 implies
categorical retention; intermediate values above .5 indicate deletion-favoring
contexts, and those below .5 indicate deletion-inhibiting contexts). The word-internal
constraint shows a much larger range between favoring and disfavoring contexts
in the underived words than in the regular past tense forms (.39 vs. .26), the
predicted amplification through iterated applications of the deletion process,
whereas the word-external constraint is uniform in magnitude for both
morphological classes (.42 vs. .41).

Table 4: Internal vs. external constraints on coronal stop deletion: interaction with derivational
class (Guy 1992: 233 and 235) (partial probabilities from separate analyses of morphological
classes)

4a: Internal constraint: Preceding segment effect on coronal stop deletion

                                          Morphological class
Preceding segment                         Underived    Regular Past
Sibilants                                 .66          .67
Obstruents (stops, other fricatives)      .49          .46
Nasals                                    .59          .41
Liquids                                   .27          .44
Range                                     .39     >    .26

4b: External constraint: Following segment effect on coronal stop deletion

                                          Morphological class
Following segment                         Underived    Regular Past
Consonants                                .73          .65
Vowels                                    .31          .24
Pause                                     .45          .63
Range                                     .42     =    .41
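The Range rows of Table 4 are the maximum minus the minimum factor weight in each column. The first part of the sketch below recomputes them.

The second part is a purely hypothetical illustration of why an effect that applies at every derivational level grows. It uses invented per-pass retention rates of .9 and .8 for two contexts. The gap between them widens from .1 after one pass to about .22 after three. Factor weights in a variable rule analysis sit on a logistic scale, so this multiplicative arithmetic only illustrates the direction of the amplification. It does not reconstruct Table 4. The names `weight_range` and `retention` are hypothetical.

```python
# Factor weights from Table 4 (Guy 1992), as (underived, regular past) pairs.
PRECEDING = {  # word-internal constraint (Table 4a)
    "sibilant": (0.66, 0.67),
    "obstruent": (0.49, 0.46),
    "nasal": (0.59, 0.41),
    "liquid": (0.27, 0.44),
}
FOLLOWING = {  # word-external constraint (Table 4b)
    "consonant": (0.73, 0.65),
    "vowel": (0.31, 0.24),
    "pause": (0.45, 0.63),
}
UNDERIVED, REGULAR_PAST = 0, 1


def weight_range(weights, column):
    """Range (max - min) of the factor weights in one morphological class."""
    values = [pair[column] for pair in weights.values()]
    return max(values) - min(values)


def retention(per_pass_retention, passes):
    """Hypothetical illustration: survival after repeated passes of a rule."""
    return per_pass_retention ** passes


for name, table in (("internal", PRECEDING), ("external", FOLLOWING)):
    print(name, f"{weight_range(table, UNDERIVED):.2f}",
          f"{weight_range(table, REGULAR_PAST):.2f}")

# Two invented word-internal contexts differing by .1 per pass:
for passes in (1, 2, 3):
    gap = retention(0.9, passes) - retention(0.8, passes)
    print(passes, f"{gap:.3f}")
```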

What this implies is that variability and quantitative properties are found in the
system, inside the grammar. And as we saw in the previous section, systematic,
regular ‘grammatical’ properties are also found within the use of language. So the
dichotomy that opposes system and usage, assigning invariant and categorical
properties to the system/grammar, and variable and probabilistic properties to
usage, is turning into an obstacle to explanation, rather than a facilitator.

4 Towards an integrated theory: Grammar emerges from experience

What then are the elements of a more coherent vision that eschews the facile
system vs. usage dichotomy in pursuit of a model of the fundamental unity of
grammar and use? We can start where everyone starts, as a child encountering
the language in use in the community around us. Usage constitutes our entire
input. We have an intelligent mind, perhaps even endowed with specialized
neural networks that facilitate language processing. But whether or not language
is cerebrally special, we face the general problem of identifying units, colloca-
tions, and productive principles that will allow us not simply to reproduce the
specific utterances that we have heard, but to form our own novel utterances that
will be correctly interpreted by others. We must make our output well-formed, so
we have to figure out what ‘well-formedness’ consists of.
Basically, we have to find patterns. The patterns are the grammar, the
system. Where do they come from? The usage-based perspective on this issue,
associated with linguists like Bybee (2001, 2002) and Pierrehumbert (2001, 2006),
argues that system is emergent, consisting of generalizations across observed
usage. Let me present an example from my research with Daniel Erker on Spanish
pro-drop (Erker and Guy 2012).
Spanish has optional use of subject personal pronouns (SPPs). They can
occur overtly or be omitted, as in (3), where both full and omitted forms com-
municate the same meaning.

(3) Overt subject personal pronoun       Yo quiero ‘I want’
    Omitted subject personal pronoun     Quiero ‘I want’

So how does a speaker know or decide where to use one or the other? Much
previous research on this topic has turned up some systematic and widely generalizable
patterns of use that are governed by morphosyntactic and discursive structures
of Spanish. For example, SPP occurrence is regularly constrained by verbal mor-
phology, verb semantics, and discourse reference (cf., inter alia, Otheguy and
Zentella 2012). The morphological constraint contrasts different tense/mood/
aspect forms, with the result that TMA categories with more distinctive verbal
inflection (e. g., the preterit, where every person/number category has a distinc-
tive inflected form) are associated with lower probabilities of pronoun occurrence
than those with less distinctive inflection (e. g., the imperfect, where first and
third person singular forms are systematically identical). The verbal semantics
constraint favors SPP use for verbs of mental activity, while those of external
activity show less SPP occurrence. And the discourse level constraint considers
the flow of reference in a text: a subject which makes reference to a different
person than the subject of a preceding sentence (switch reference) is more likely
to be expressed by an overt pronoun. These patterns are confirmed in our data,
as shown in Table 5.
Table 5: Three constraints on Spanish SPP occurrence (from Erker and Guy 2012: 540–541)

                                               N       % overt SPP
Tense-Mood-Aspect form of verb
  Imperfect Indicative                         708     43 %
  Present Indicative                           2695    36 %
  Preterit Indicative                          877     29 %
  F = 9, p < .001
Semantic content of verb
  Mental Activity                              840     45 %
  Stative Verb                                 1438    36 %
  External Activity                            2601    31 %
  F = 27.8, p < .001
Switch Reference
  Switch in reference from previous clause     2653    40 %
  No switch in reference                       2233    29 %
  T = 8.1, p < .001

These effects are quite regular and systematic, recurring in many studies. They
constitute valid generalizations about Spanish syntax. But when Erker and I
looked at their distribution with respect to the lexical frequency of the verb, we
found that these patterns are primarily associated with high frequency words.
First consider TMA. Dividing the verbs into low frequency and high frequency
forms, we find that the main effect of TMA is primarily a phenomenon of the
frequent words. The differences among the TMA categories are modest (although
significant) in infrequent verbs, but dramatically greater among high frequency
words. This is graphically illustrated in Figure 7; in this and subsequent figures
the diverging lines indicate stronger constraint effects in the high frequency
words, and significant differences between contexts are indicated by dark stars.
Similar results emerge for the other constraints on Spanish pro-drop. In the
case of semantic content, the picture is even clearer: no significant differences
among semantic categories in the infrequent forms, but among the frequent
forms, the contrast is forcefully evident (Figure 8).
And for switch reference (Figure 9), clause sequences involving a switch in
reference favor more pronoun expression than those with no switch for both low
and high frequency verbs, but again the effect is significantly greater for high
frequency words.
[Figure 7: TMA and frequency in Spanish pro-drop (from Erker and Guy 2012: 544). y-axis: % pronouns present; x-axis: infrequent vs. frequent forms; lines for imperfect, present and preterit indicative. Infrequent forms: significant difference, F = 9.9, p < .001 (n = pres 1861, pret 823, imp 524); frequent forms: significant difference, F = 5.3, p < .001 (n = pres 834, pret 54, imp 184).]

[Figure 8: Semantic content and frequency in Spanish pro-drop (Erker and Guy 2012: 543). y-axis: % pronouns present; x-axis: infrequent vs. frequent forms; lines for mental activity, stative and external activity verbs. Infrequent forms: no difference, F = .94, p = .38 (n = 494, 924, 2393); frequent forms: significant difference, F = 51.2, p < .001 (n = 346, 514, 208).]

[Figure 9: Switch reference and frequency in Spanish pro-drop (Erker and Guy 2012: 544). y-axis: % pronouns present; x-axis: infrequent vs. frequent forms; lines for switch in referent and not a switch. Infrequent forms: significant difference, F = 38.8, p < .001 (n = not 1771, switch 2037); frequent forms: significant difference, F = 27.3, p < .001 (n = not 462, switch 608).]

The robust generalization that emerges from these results is that the various
constraints on Spanish pro-drop that have been well documented in many previous
studies are regularly much stronger for high frequency verbs. The standard
interpretation of these constraints has been to suppose that a speaker’s grammatical
representation of verbal properties and discourse structure governs their
probabilistic choice between using a null or overt pronoun. But these results
indicate that the grammatical properties such as tense-mood form or semantic
category are actually activated among the words that speakers most often encounter
and use in speech: this is a classic characteristic of usage, not a property of
grammar in the traditional sense.
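The frequency-split comparison behind these figures can be sketched as follows. The function names, the median-split cut-off and the toy tokens are all invented for illustration. Erker and Guy’s actual frequency bands and data are not reproduced here. The toy data are built so that a semantic-class effect shows up only among frequent verbs, mirroring the pattern just described.

```python
from collections import defaultdict
from statistics import median


def overt_rates_by_band(tokens, verb_freq):
    """% overt subject pronouns per verb class, split into frequency bands.

    tokens: (verb, verb_class, overt) triples; verb_freq: corpus frequency
    of each verb. Verbs above the median frequency count as "frequent"
    (a hypothetical cut-off chosen for this sketch).
    """
    cutoff = median(verb_freq.values())
    tallies = defaultdict(lambda: [0, 0])  # (band, class) -> [overt, total]
    for verb, verb_class, overt in tokens:
        band = "frequent" if verb_freq[verb] > cutoff else "infrequent"
        tally = tallies[(band, verb_class)]
        tally[0] += int(overt)
        tally[1] += 1
    rates = defaultdict(dict)
    for (band, verb_class), (overt, total) in tallies.items():
        rates[band][verb_class] = 100 * overt / total
    return dict(rates)


def effect_size(class_rates):
    """Spread between the highest and lowest class rate, in percentage points."""
    return max(class_rates.values()) - min(class_rates.values())


# Invented toy data (not Erker and Guy's): 10 tokens per verb.
verb_freq = {"pensar": 50, "trabajar": 45, "meditar": 2, "cosechar": 3}
tokens = (
    [("pensar", "mental", True)] * 6 + [("pensar", "mental", False)] * 4
    + [("trabajar", "external", True)] * 2 + [("trabajar", "external", False)] * 8
    + [("meditar", "mental", True)] * 3 + [("meditar", "mental", False)] * 7
    + [("cosechar", "external", True)] * 3 + [("cosechar", "external", False)] * 7
)

rates = overt_rates_by_band(tokens, verb_freq)
for band in ("infrequent", "frequent"):
    print(band, rates[band], "spread:", effect_size(rates[band]))
```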
Importantly, our data show that this pattern is consistent across two different
nationality groups – Dominicans and Mexicans – that have substantially differ-
ent overall rates of pro-drop. Dominican Spanish is well-known to use a much
higher rate of overt SPPs than Mexican Spanish, but both dialects show the same
pattern of constraint effects and interaction with lexical frequency, as is illus-
trated for semantic content in Figure 10.

[Figure 10: Frequency and semantic content effects on pro-drop in Dominican and Mexican Spanish (Erker and Guy 2012: 547). Panels: Dominican Republic, Mexico; y-axis: percent pronouns present; x-axis: infrequent vs. frequent verbs; lines for mental activity, stative and external activity verbs.]

Erker and Guy argue that these results show that the grammatical analysis, in
terms of TMA classes, semantic classes, and even the discourse patterning, is
emergent, rather than primary. Speakers presumably need some minimal level of
experience with words and structures to begin to discern patterns and formulate
hypotheses. Consequently, they display robust patterns in high frequency forms,
but weak to nonexistent patterns in the low frequency forms. In fact, the low
frequency forms basically default to the dialect-specific overall rate of pronoun
occurrence. But for verbs that are frequently encountered in usage, distinct gen-
eralizations begin to emerge.

5 Towards an integrated theory: Grammar governs production

The evidence just presented suggests that speakers infer grammatical properties
and ‘rules’ from experience, as shown by the fact that they do a ‘better’ – or at
least more robust – job of inferring them about words that they hear and use more
often. Grammar is thus emergent from and derivative of experience, rather than
a priori or primary. But we should not leap from such evidence to the conclusion
that there is no mental grammar at all, that speakers simply replicate the quanti-
tative data that they encounter in their linguistic input, without constructing any
mental apparatus of abstract representations, patterns, and operations. On this
point I think I part company with positions taken by Bybee (2001) and some other
usage-based theorists: the evidence suggests that speakers do in fact construct
mental grammars – abstract analyses, categories, and operations – to enable and
govern their own productions.
An illustration of this point comes from my work with Sally Boyd (Guy and
Boyd 1990) on the acquisition of the morphological constraint on English coronal
stop deletion that was discussed above. There is a distinctive developmental
pattern in the treatment of the irregular, semiweak past tense forms with respect
to stop deletion: deletion rates in this category decline with age, as shown in
Figure 11.

[Figure 11: Age grading of English final stop deletion in irregular semiweak past tense verbs (from Guy and Boyd 1990: 8). y-axis: factor weight; x-axis: speaker’s age (0–80).]

Boyd and Guy interpret this pattern in terms of the mental representation of these
categories. The youngest children (aged 4–5 in this study) evidently have just
two form classes for English past tenses: strong and weak. The semiweak forms,
having salient root vowel changes, are perforce assigned to the strong class, with
mental representations like tell~toll, keep~kep, so they have essentially categor-
ical absence of the final stop. In our study this appears as very high rates of t,d
deletion for the youngest children. In the next developmental stage, speakers set
up distinct representations for these words that incorporate the final coronal stop,
but treat these as suppletive allomorphs, not derived forms. At this stage, reached
in adolescence for our speakers, the final stops in these words are deleted at approximately the
same rate as in underived words. The lowest rate of deletion is not reached for most
people until adult life, and represents the development of a derived representa-
tion in which the final -t,d is identified as an affix, and attached at the first level
of the lexicon. This then generates the lowered deletion rate in these forms.
The grammar of use and the use of grammar 65

One consequence of this acquisitional pattern is that children do not simply
reproduce the deletion rates that they hear from their parents in semiweak verbs,
because they do not have the same mental representations. Although they are
capable of matching parental input with extraordinary precision in those morphological
categories where they share the same representations, in the semiweak
verbs their deletion rates reflect their own mental grammar, not the productions
they hear from their parents, as illustrated in Labov’s figures for a family from a
Philadelphia suburb shown in Figure 12. The seven-year-old son converges closely
on his parents’ deletion rates for underived and regular past tense forms, but for
the tell-told, keep-kept class, he is maximally distinct, using the high deletion
rates associated with the grammar typical of his stage of language development.

Figure 12: Generational differences in coronal stop deletion in one family (Labov, seminar presentation at the University of Pennsylvania in March 2008)

Consider the cognitive task confronted by the language learner who seeks to
discern an optimal representation for this fragment of English morphophonol-
ogy. They encounter massive evidence that English distinguishes present and past
tense verb forms, and that it has two form classes – strong verbs with root vowel
changes, and weak verbs with the coronal stop affix. But there are only about 14
lexical items that have both of these alternations, like tell-told, leave-left, keep-
kept, etc. Hence in order to set up a distinct representation for such forms, the child
must first pick the relevant words out of the crowd, recognize that they have vari-
ably occurring final stops, and then further recognize that the rate of stop deletion
is subtly different in the holistic and derived representations. This takes appre-
ciable amounts of time and data, so in childhood English speakers are unable
to replicate the adult treatment of the semiweak verbs. Roberts’ (1993) study of
Philadelphia children demonstrates this with large Ns, as shown in Figure 13.

Figure 13: Generational differences in coronal stop deletion in a large corpus (Roberts 1993: 102). [Plot: probability of deletion (y-axis, 0–1) for children (N=1841) and parents (N=604) in three morphological categories: monomorphemic, derivational, preterit.]

The appropriate conclusion is that, here again, the grammatical representation
emerges from – or is inferred from – usage and experience, but developing this
representation is a process that each individual must go through, and that may
not even lead them to exactly the same analysis in the end. Nevertheless, it is
the grammatical representation that the speaker has inferred that governs their
production, rather than a simple repetition of what they have heard.

6 Conclusion

The evidence drawn from the study of actual human language is incompatible
with an idealized model that seeks to characterize linguistic systems in isolation
from use; it is also incompatible with a model that seeks to characterize usage in
isolation from any kind of abstract system. Hence I conclude that puristic models
at both extremes of the theoretical spectrum on these issues are destined to fail:
a puristic generative model, which keeps a strict separation between grammar
and usage, fails to give an adequate account of the interpenetration of structure,
variability, and probabilistic properties in both grammar and usage, while a puris-
tic usage-based model, which denies abstraction, fails to account for grammat-
ically governed divergences between experience and production, like the results
in Figures 12 and 13. An adequate model requires an integrated approach: usage
supplies language acquirers with all of their data, including a vastly enriched
fountain of information about social diversity, directions of change, and the
orderly linguistic structure of inherent variability. From this input, they construct
the set of inferences, representations, and operations that we call the grammar.
Crucially, the grammar incorporates and encompasses variability and quantifica-
tion, enabling speakers to do the fine quantitative tuning of their productions that
is so fundamental to situating one’s speech appropriately in the social universe,
and conveying appropriate messages about interactive stance, speech style, and
identity construction. The mental grammar thus mediates between experience
and production: production does not derive directly from input, but is governed
by the internalized grammar.
In conclusion, doing linguistics without usage is like doing zoology without
animals – you can try it, but it yields a very limited theory that suffers from a
gaping, self-inflicted wound, a theory which is liable one day to be trampled by
elephants. But at the same time, studying usage without addressing systematicity
and generalization is not doing linguistics at all: the language user does in fact
construct patterns, generalizations, and processes, and uses them to speak. Lin-
guistics must therefore ultimately reject the Saussurean dichotomy, and embrace
all the faces that language presents to us: the usage, the systems that emerge
from it, and the society that has created and maintained it. Rather than policing
the hypothetical frontier between system and use, linguists can move on to more
productive and interesting pursuits, investigating precisely how these facets of
human linguistic experience interact.

References
Bayley, Robert (1994): Consonant cluster reduction in Tejano English. Language Variation and
Change 6: 303–326.
Bybee, Joan (2001): Phonology and language use. Cambridge: Cambridge University Press.
Bybee, Joan (2002): Word frequency and context of use in the lexical diffusion of phonetically
conditioned sound change. Language Variation and Change 14: 261–290.
Chambers, Jack K. (2002): Patterns of variation including change. In: Jack K. Chambers, Peter
Trudgill and Natalie Schilling-Estes (eds.), The handbook of language variation and
change, 349–372. Malden, MA: Blackwell.
Chambers, Jack K. and Troy Heisler (1999): Dialect topography of Quebec City English. Canadian
Journal of Linguistics 44: 23–48.
Chomsky, Noam (1965): Aspects of the theory of syntax. Cambridge, MA: MIT Press.
Ellegård, Alvar (1953): The auxiliary do: The establishment and regulation of its use in English.
Gothenburg: Acta Universitatis Gothoburgensis.
Erker, Daniel and Gregory R. Guy (2012): The role of lexical frequency in syntactic variability:
Variable subject personal pronoun expression in Spanish. Language 88: 526–557.
Fromkin, Victoria (1974): The linguistic development of Genie. Language 50: 528–554.
Guy, Gregory R. (1981): Linguistic variation in Brazilian Portuguese: Aspects of phonology,
syntax and language history. Ph.D. dissertation, University of Pennsylvania. Ann Arbor:
University Microfilms.
Guy, Gregory R. (1991): Explanation in variable phonology: An exponential model of morpho-
logical constraints. Language Variation and Change 3: 1–22.
Guy, Gregory R. (1992): Contextual conditioning in variable lexical phonology. Language
Variation and Change 3: 223–239.
Guy, Gregory R. and Sally Boyd (1990): The development of a morphological class. Language
Variation and Change 2–1: 1–18.
Guy, Gregory R., Barbara Horvath, Julia Vonwiller, Elaine Daisley and Inge Rogers (1986): An
intonational change in progress in Australian English. Language in Society 15: 23–52.
Kroch, Anthony (1989): Reflexes of grammar in patterns of language change. Language
Variation and Change 1–3: 199–244.
Labov, William (1980): The social origins of sound change. In: William Labov (ed.), Locating
language in time and space, 251–265. New York: Academic Press.
Labov, William (2006): The social stratification of English in New York City (2nd edition).
Cambridge: Cambridge University Press.
Mufwene, Salikoko (2011): Language evolution: an ecological perspective. Perspectives 4.
Réseau français des instituts d’études avancées (www.rfiea.fr).
Otheguy, Ricardo and Ana Celia Zentella (2012): Spanish in New York. New York: Oxford
University Press.
Pierrehumbert, Janet (2001): Exemplar dynamics: Word frequency, lenition, and contrast. In:
Joan Bybee and Paul Hopper (eds.), Frequency effects and the emergence of linguistic
structure, 137–157. Amsterdam: John Benjamins.
Pierrehumbert, Janet (2006): The next toolkit. Journal of Phonetics 34: 516–530.
Roberts, Julie (1993): The acquisition of variable rules: t,d deletion and -ing production in
pre-school children. Ph.D. dissertation, University of Pennsylvania.
Santa Ana, Otto (1992): Chicano English evidence for the exponential hypothesis: A variable
rule pervades lexical phonology. Language Variation and Change 4: 275–289.
Schütze, Carson (1996): The empirical base of linguistics: Grammaticality judgments and
linguistic methodology. Chicago: University of Chicago Press.
Weinreich, Uriel, William Labov and Marvin Herzog (1968): Empirical foundations for a theory
of language change. In: Winfred Philipp Lehmann and Yakov Malkiel (eds.), Directions for
historical linguistics, 95–195. Austin: University of Texas Press.
Richard Cameron, University of Illinois at Chicago
Looking for structure-dependence,
category-sensitive processes, and
long-distance dependencies in usage1

Abstract: Distinguishing grammar from usage, Newmeyer (2003: 695) claims “a
world of difference between what a grammar is and what we do.” In the process,
he identifies three “basic aspects of grammar [:] long-distance dependencies, cat-
egory-sensitive processes, [and] structure-dependence” (Newmeyer 2003: 687). If
“there is a world of difference between what a grammar is and what we do,” these
basic aspects of grammar should not characterize usage. However, if patterns of
usage display structural parallels, this will give us cause to rethink the “world
of difference” claim. I start with (1) Structure Dependence, (2) move to Category-
Sensitive Processes, and then (3) Long-Distance Dependencies. When discussing
Structure Dependence, I explore Adjacency Pairs and the concept of “signifi-
cant silence” from Conversation Analysis. For discussion of Category-Sensitive
Processes and Long-Distance Dependencies, I draw on Variationist treatments of
internal constraints on English (ing) and Spanish null/pronominal subject alter-
nations. Usage displays parallels, at least for these three aspects.

1 Introduction

Newmeyer’s (2003) compelling defense of the Saussurean distinction between
langue and parole or grammar and usage provides the reader with a wide and rich
set of ideas. Here, I explore some of these ideas using findings and analyses from
the fields of Conversation Analysis and Variationist Sociolinguistics, two usage-
based approaches. I do not see the distinction between grammar and usage as a
theory so much as a fundamental assumption that contributes to theory building
and to the definition of research objects. As such, the distinction itself may not
actually be falsifiable in a broad sense. Nonetheless, I will attempt to problem-
atize one claim from Newmeyer (2003: 695) where he states that “there is a world

1 I send special thanks to Luis López and Kay González-Vilbazo for their help in articulating
ideas on structure-dependence and long-distance dependencies. Also a very warm thank you to
Aria Adli, Marco García García and Göz Kaufmann for that most surprising of invitations.
70 Richard Cameron

of difference between what a grammar is and what we do.” By “problematize,” I
mean to create some doubt. Doubt about what? Not doubt that grammar and
usage are different, but doubt about how we articulate the content and degree of
that difference. Taking as point of departure Newmeyer’s (2003: 687) identifica-
tion of three key aspects of grammar, “long-distance dependencies, category-sen-
sitive processes, [and] structure-dependence,” I will identify parallels between
structures of grammar and structures of use. These parallels, at points, make the
“world of difference” problematic.
In broad strokes, I agree with Newmeyer’s basic claim that grammar and
usage differ in the following senses. Grammar occurs in the minds of socially and
linguistically related individuals in a complicated modular network, some (not
all) of it biologically programmed, in which information passes from one activated
site to another via the chemical and electrical excitability of neurons. Think
of Wernicke’s and Broca’s areas and the debated role of the arcuate fasciculus (Bernal and
Ardila 2009). In other words, physiology gives rise to psychology and grammar is
part of both.
Usage occurs in real-time interactions between embodied, excitable, interde-
pendent, mutually-monitoring agents. Call them human beings, big and small,
who are subject to bounded rationality, memory, and emotion (Cameron and
Schwenter 2013: 476), with agendas to pursue in an uncertain but sensuous world
where issues of face are ever present and personal biography is always socially
situated. As these human beings interact, they consistently engage in co-constructed
identity displays. Following Carnap’s (1942/1961: 9) early proposal
that research in pragmatics must refer to “the speaker,” we may also suspect that
such identity displays are not mere social facts. They are also necessary for an
understanding of meaning as constructed through “the ancient art of talking!”
Here I cite the words of eight-year-old Evan from Oak Park, Illinois, whom I interviewed
in 2004. This art of talking also involves a mismatch between what is said and
what is intended, a key idea from pragmatics. In other words, nearly all of what
we say to one another underdetermines the range of meanings that we intend.
Thus, as we interact, we infer, and we expect others will infer, meanings above
and beyond what is said.
In this respect, I also agree with a point Newmeyer (2003: 688) made in his
section 6 entitled “Full grammatical structure is cognitively represented.” There
he stated that, “[t]hese observations all lead to the conclusion that speakers
mentally represent the full grammatical structure, even if they utter only frag-
ments” (Newmeyer 2003: 690). So, not only does what we say underdetermine
the content of what we mean but what we say can underrepresent the content and
architecture of our grammar.
Looking for structure-dependence 71

However, I would also call this ancient art of talking the engine for language
change. Because language change emerges from usage within communities, it
follows that usage contributes to the shaping of grammar. Precisely how and how
much, when, where, why, and by whom, I take as empirical questions in line with
the agenda of Weinreich, Labov, and Herzog (1968).
Now, an issue entailed by the very title of Newmeyer’s work, as well as by my
exposition here on grammar and usage, is what I would call a binary approach to
the organization of argument. If we claim that “Grammar is grammar and usage
is usage,” we have set up an either/or binary relationship between the two with
two attendant implications. First, the boundary between grammar and usage is
actually and always recognizable. Second, any subsequent response, be it debate
or research, will also assume binarism as a ground rule. As such, binarism serves
as framing ideology, be it scientifically motivated or not. Consider a few of Eagle-
ton’s (1991: 1–2) multiple definitions of ideology as, “a body of ideas characteristic
of a particular social group or class” or “the medium in which conscious social
actors make sense of their world” or “action-oriented sets of beliefs.” In other
words, we might say that binarism serves as a belief that enables some linguists
to make sense of language as they pursue the actions of research. One might
also draw on some of Eagleton’s pejorative definitions of ideology such as “false
ideas which help to legitimate a dominant political power” or “socially necessary
illusion.” I prefer the nonjudgmental first set of definitions as more useful. My
preference is also the stuff of ideology.
Of course, binarism has long organized research and argument in linguistics
at least since Saussure’s distinction of langue and parole. Other contemporary
examples are core vs. periphery or I-language vs. E-language or competence vs.
performance. In all of these, a binary method of organization sets the research
agenda from the beginning.
A more recent example is Hauser, Chomsky and Fitch (2002) on the nature
of the language faculty as being narrow (FLN) vs. broad (FLB) with the key claim
that the FLN is exclusively characterized by recursion. In response, Jackendoff
(2011) asked if recursion could be found in other domains of cognition such as
vision. If so, because recursion would not be unique to language per se, the key
proposal about the nature of the FLN is falsified, though the possible existence of
an FLN and an FLB is not.
I will pursue a strategy analogous to that of Jackendoff, but considerably
less ambitious. My point of departure emerges from Newmeyer’s (2003: 687) cri-
tique of early connectionist models of grammar where he asserted that they are
“hopeless at capturing even the most basic aspects of grammar, such as long-
distance dependencies, category-sensitive processes, structure-dependence, and
so on.” This provides the basis for a prediction. If “there is a world of difference
between what a grammar is and what we do,” these basic aspects of grammar
should not have parallels in usage. However, if patterns of usage provide struc-
tural parallels, this will give us cause to rethink the “world of difference” claim as
observationally inadequate, at least with respect to Long-Distance Dependencies,
Category-Sensitive Processes, and Structure-Dependence.
As Jackendoff (2011) focused on the issue of recursion because Hauser,
Chomsky and Fitch (2002) highlighted recursion in their claim about the nature
of FLN relative to FLB, so will I focus on these three aspects because Newmeyer
identified them as distinctive of grammar and grammar as different from usage. I
explore the prediction first with (1) Structure Dependence, (2) move to Category-
Sensitive Processes, and then (3) Long-Distance Dependencies. When discussing
Structure Dependence, I explore Adjacency Pairs and the concept of “signifi-
cant silence” from Conversation Analysis. For discussion of Category-Sensitive
Processes and Long-Distance Dependencies, I draw on Variationist treatments
of internal constraints on English (ing) and Spanish null/pronominal subject
alternations.
In my pursuit of these basic aspects of grammar in usage, I operate on the
basis of an assumption or two. First, I assume a simple model of grammar. Within
most conceptions of generative grammar that I am aware of, binarism falls away.
Linguistic grammars are typically composed of at least three interfaced components:
phonology, syntax, and semantics, with debate remaining on the status
of phonetics, pragmatics and perhaps morphology. As far as I can tell, it is also
the case that the structural properties and outputs of these different modules of
the grammar are understood to be similar yet different. The similarities would
involve some sort of hierarchical organization and the basic idea that a grammar
consists of recognizably discrete units and methods for combining them into
larger units. Second, if phonology and semantics can differ from syntax yet share
certain features with it, and thereby count as partners in grammar, then usage that
differs from grammar yet shares certain of its features gives us cause to rethink
the binary distinction between them as problematic or, as I noted before, as
observationally inadequate. Key to this is the discovery of structures in usage that
parallel structures in grammar.

2 Structure Dependence in usage

Structure-Dependence refers to a putative universal characteristic of syntactic
processes such that the application of these processes depends on syntactic con-
stituency, not on mere linear sequences of words or individual words that cannot
or do not serve as sentence constituents. Recognizing and assigning constituency
is also a syntactic process. Groupings of words into non-constituents or the use
of non-constituents to derive a sentence would be structure-independent. Thus,
in a sentence like “Many executives eat in fancy restaurants”, a non-constituent
grouping would be “in fancy”, as evidenced by our inability to topicalize
“in fancy.” Doing so creates the ungrammatical “*In fancy, many executives eat
restaurants”.
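The constituency test can be made concrete with a small sketch. The Python below is an illustration only: it assumes a simplified, hand-written phrase-structure tree for the example sentence (the labels S, NP, VP, PP, Adj, N, V, P are assumptions, not an analysis taken from the text) and asks whether a word string is exactly the yield of some node in that tree.

```python
from typing import List, Tuple, Union

# A tree is (label, children); a leaf is a plain string (a word).
Tree = Union[str, Tuple[str, list]]

# Hypothetical, simplified bracketing of the example sentence.
SENTENCE: Tree = (
    "S", [
        ("NP", [("Adj", ["Many"]), ("N", ["executives"])]),
        ("VP", [
            ("V", ["eat"]),
            ("PP", [
                ("P", ["in"]),
                ("NP", [("Adj", ["fancy"]), ("N", ["restaurants"])]),
            ]),
        ]),
    ],
)


def constituent_yields(tree: Tree) -> List[Tuple[str, ...]]:
    """Return the word sequence (yield) of every node; the node's own yield comes last."""
    if isinstance(tree, str):
        return []
    _, children = tree
    words: List[str] = []
    yields: List[Tuple[str, ...]] = []
    for child in children:
        if isinstance(child, str):
            words.append(child)
        else:
            child_yields = constituent_yields(child)
            yields.extend(child_yields)
            words.extend(child_yields[-1])  # the child's own yield
    yields.append(tuple(words))
    return yields


def is_constituent(tree: Tree, phrase: str) -> bool:
    """True if the phrase is exactly the yield of some node in the tree."""
    return tuple(phrase.split()) in constituent_yields(tree)


for phrase in ["in fancy restaurants", "fancy restaurants", "in fancy"]:
    print(f"{phrase!r}: constituent = {is_constituent(SENTENCE, phrase)}")
```

Under this bracketing, “in fancy restaurants” (a PP) and “fancy restaurants” (an NP) are constituents, while “in fancy” is not. A rule such as topicalization that is stated over constituents therefore cannot apply to it.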
The presence of structure-dependence provides for a basic prediction. In the
speech of very young children who are learning a language and who display evidence of
having some sort of syntax, certain types of developmental errors, in particular
structure-independent errors, will not occur. This prediction follows from the
assumption that the structure-dependence of syntactic processes, provided by
the linguistically structured and structuring mind, constrains and enables what
we produce and how we process, parse, and interpret what is said.
The oft-cited work of Crain and Nakayama (1987) supports this prediction.
Using the Star Wars movie character Jabba the Hutt, they elicited yes-no questions
from young children in the frame of: “Ask Jabba if the boy who is watching
Mickey Mouse is happy” (Crain and Nakayama 1987: 528). The goal was to see
whether children would produce responses like “Is the boy who watching Mickey is
happy?”, in which they invert the first verbal element, the one inside the subject
relative clause, by analogy to simple sentences in which “The boy is happy.” becomes
“Is the boy happy?” The children did produce a small number of errors of the specific
sort Crain and Nakayama sought. But overwhelmingly they did not, and Crain and
Nakayama explained away the few such errors the children did produce as processing errors.
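The contrast tested in this elicitation study can be sketched as two competing rules for forming yes-no questions. The Python below is a toy illustration, not the study's procedure. It assumes the length of the subject noun phrase is supplied by hand, as a stand-in for a real parse. The structure-independent rule fronts the linearly first “is”; the structure-dependent rule fronts the “is” that follows the whole subject constituent.

```python
from typing import List


def front_first_is(words: List[str]) -> List[str]:
    """Structure-independent rule: move the linearly first 'is' to the front."""
    i = words.index("is")
    return ["is"] + words[:i] + words[i + 1:]


def front_main_clause_is(words: List[str], subject_length: int) -> List[str]:
    """Structure-dependent rule: move the 'is' that follows the subject NP.

    subject_length (number of words in the subject constituent) is supplied
    by hand here; it stands in for a real syntactic parse.
    """
    i = subject_length
    if words[i] != "is":
        raise ValueError("expected the main-clause auxiliary right after the subject")
    return ["is"] + words[:i] + words[i + 1:]


def as_question(words: List[str]) -> str:
    """Join words, capitalize only the first letter, and add a question mark."""
    sentence = " ".join(words)
    return sentence[0].upper() + sentence[1:] + "?"


declarative = "the boy who is watching Mickey Mouse is happy".split()
subject_len = 7  # "the boy who is watching Mickey Mouse"

print(as_question(front_first_is(declarative)))
# Is the boy who watching Mickey Mouse is happy?
print(as_question(front_main_clause_is(declarative, subject_len)))
# Is the boy who is watching Mickey Mouse happy?
```

For a simple sentence such as “the boy is happy”, both rules give “Is the boy happy?” Only sentences with an auxiliary inside the subject tell the rules apart, which is why such sentences were elicited.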
Notice that the elicitation technique, asking a question based on prior
information such that the question sets up the possibility of an answer from a
different person, is directly analogous to a fundamental building block of
conversational interaction called the Adjacency Pair. Such pairs of utterances are also
deeply connected to the most fundamental method of conversational interaction
called turn taking.
Is conversation structured? If so, does it show evidence of Structure-Depend-
ence that is putatively universal and which constrains and enables what we
produce and how we process, parse, and interpret what is said?
Research in pragmatics and computational linguistics with a focus on dis-
course, speaker plans, and plan deduction has long proposed that discourse, like
sentences, has hierarchical organization (Green 1996: 13; Jurafsky and Martin
2009: 850–856). However, Conversation Analysis, which usually eschews refer-
ence to speaker intention, typically speaks of conversational structure as being
fundamentally sequential.
Do Conversation Analysts talk of conversation like grammarians? Yes and no.
No, in that they do not speak of hierarchical organization, though I will show
that some of their findings do display hierarchical organization. Yet, yes in the
following fashion. Recall the simple model of a grammar as consisting of discrete
units and methods for combining them. In Conversation Analysis, those units are
called “turn constructional units” or speaking turns. These units are combined
by what Sacks, Schegloff and Jefferson (1974: 696) identify as a “prominent type
of social organization” called turn-taking. Central to the workings of turn taking
and to all sequential organization in conversation is the Adjacency Pair.
An Adjacency Pair consists of two utterances which recognizably stand in an
action sequence such that the first pair part, spoken by one person, sets up and
requires the second pair part spoken by another person engaged in interaction
with the first (Schegloff 2007: 12–27). In other words, the first pair part projects the
necessity of the second pair part. As far as I know, both the Adjacency Pair and
Turn-Taking are universals of human spoken interaction.
Does the Adjacency Pair format constrain and enable both what we produce
and how we process, parse and interpret what is said as well? Consider the
concept of “significant silence.” Within the format of the Adjacency Pair, the rel-
evance of the second pair part as set up by the first pair part is referred to as “con-
ditional relevance” (Levinson 1983: 293; Schegloff 2007: 20). A case of “significant
silence” occurs when speech in the second pair part is relevantly absent. Con-
sider example (1) as cited by Levinson (1983: 320). T1, T2, etc. refer to turns. In line
T2, the (2.0) refers to seconds of silence.

(1) T1 C: So I was wondering would you be in your office on Monday (.) by any chance?
T2 (2.0)
T3 C: Probably not.
T4 R: Hmm yes =
T5 C: =You would?
T6 R: Ya
T7 C: So if we came by could you give us ten minutes of your time?

In T2, the two-second silence occurs sequentially after C’s request in T1.
Sequentially, the silence is in the slot where the second pair part is expected and which
would have been produced by R in keeping with the format of the Adjacency Pair.
Moreover, using the format, we are able to assign ownership of the silence to one
of the two participants. It is the silence of R because it is R’s turn to speak. The
absence of R’s response is, then, interpreted by C as a negative response to the
request of T1 as evidenced by C’s utterance in T3. Hence, the silence is signifi-
cant. The Adjacency Pair format, and associated expectations, in turn constrains
how this silence is to be interpreted. In other words, interpretation of silence is
structure-dependent.
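The reasoning that assigns ownership of the silence can be written out as a small procedure. The Python below is a minimal sketch resting on several illustrative assumptions: a two-party exchange; a transcript encoded as (speaker, text) pairs, with speaker None marking a timed silence; and a hand-supplied set of turn indices that count as first pair parts. A silence that immediately follows a first pair part is attributed to the other participant, the one whose second pair part is conditionally relevant.

```python
from typing import List, Optional, Set, Tuple

Turn = Tuple[Optional[str], str]  # (speaker, text); speaker None marks a silence


def attribute_silences(
    turns: List[Turn], participants: Tuple[str, str], first_pair_parts: Set[int]
) -> List[Tuple[int, str]]:
    """Return (turn index, owner) for each silence that follows a first pair part.

    In a two-party exchange the owner is the participant who did not produce
    the first pair part, i.e. the one due to supply the second pair part.
    """
    owned: List[Tuple[int, str]] = []
    a, b = participants
    for i, (speaker, _) in enumerate(turns):
        if speaker is None and i - 1 in first_pair_parts:
            prior_speaker = turns[i - 1][0]
            owned.append((i, b if prior_speaker == a else a))
    return owned


# Turns T1-T3 of example (1), encoded by hand; index 0 corresponds to T1.
example = [
    ("C", "So I was wondering would you be in your office on Monday (.) by any chance?"),
    (None, "(2.0)"),
    ("C", "Probably not."),
]
print(attribute_silences(example, ("C", "R"), first_pair_parts={0}))
# [(1, 'R')]
```

The sketch only locates and attributes the silence. Reading it as a refusal, as C does in T3, is a further interpretive step that it does not model.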

Finally, sequences of Adjacency Pairs may occur such that relative to a
given pair, various kinds of Pre-Sequences or Post-Expansions may be added. In
addition, what Schegloff (2007: 111) terms “multiple insert expansions” or Lev-
inson (1983: 304) simply cites as “insertion sequences” may occur in a fashion
similar to center-embedding, a type of syntactic operation repeatedly discussed
in arguments about the interactions of grammar and processing. The possibility
of embedding is also tied to the possibility of recursion (Karlsson 2007). Thus, to
find evidence for center embedding in the framework of Adjacency Pairs would
also suggest that infinite recursion is, at least in principle if never in practice, a
characteristic of interactive usage. A similar line of argument is found in Levinson
(2013: 154–157).
Consider example (2) from Merritt (1976: 333):

(2) Restaurant (S24–5)


Q1 C: May I have a bottle of Mich?
Q2 S: Are you twenty-one?
A2 C: No
A1 S: No

Notice that centrally embedded within the Adjacency Pair parts of Q1 and A1 are
Q2 and A2, another Adjacency Pair, which go towards establishing conditions
necessary for providing A1. Thus, completion of A1 depends on completion of Q2
and A2. Schegloff (2007: 111–114) provides a much longer example with multiple
embeddings.
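The nesting in example (2) behaves like balanced brackets, and a stack captures this. The Python below is a minimal sketch. It assumes each turn has been labelled by hand as opening a pair (“Q”) or closing one (“A”), with an index tying the parts together as in Merritt’s Q1/Q2/A2/A1 labels. The function checks that every second pair part closes the innermost pair that is still open, and it reports the maximum embedding depth.

```python
from typing import List, Tuple

Part = Tuple[str, int]  # ("Q" or "A", pair index), e.g. ("Q", 1) for Q1


def check_nesting(parts: List[Part]) -> int:
    """Return the maximum embedding depth; raise ValueError if pairs cross."""
    stack: List[int] = []
    max_depth = 0
    for kind, index in parts:
        if kind == "Q":
            stack.append(index)
            max_depth = max(max_depth, len(stack))
        elif kind == "A":
            if not stack or stack[-1] != index:
                raise ValueError(f"A{index} does not close the innermost open pair")
            stack.pop()
        else:
            raise ValueError(f"unknown part type {kind!r}")
    if stack:
        raise ValueError(f"unanswered first pair parts: {stack}")
    return max_depth


# Example (2): Q1 (C), Q2 (S), A2 (C), A1 (S)
print(check_nesting([("Q", 1), ("Q", 2), ("A", 2), ("A", 1)]))
# 2
```

By design, the sketch rejects crossed sequences such as Q1 Q2 A1 A2. That restriction is a modelling assumption of the sketch, not a claim made in the text.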
Given that Adjacency Pairs, and more broadly, Turn Taking are putative uni-
versals which can be shown to constrain and enable both what we produce and
how we process, parse and interpret what is said, and given that Adjacency Pairs
are subject to center-embedding, I conclude that this type of usage displays the
basic grammatical feature of Structure-Dependence.

3 Category-sensitive processes in usage

Turning away from Conversation Analysis, we move on to Variationist Sociolinguistics
for analyses of usage. In variationist research, a research project unfolds
roughly like this. First, we identify a sociolinguistic variable and its variants. A
sociolinguistic variable is available to speakers of a language when their grammar
provides them with two or more ways of saying or accomplishing the same thing
or something similar relative to the speaker’s intention. These options, or vari-
ants of the variable, need not be absolutely equivalent in referential meaning as
“meaningful variation” (Guy, Horvath, Vonwiller, Daisley and Rogers 1986: 28;
Kroch 1989) has also been investigated both synchronically and diachronically.
Indeed, Torres Cacoullos and Schwenter (2008) take on a meaningful variable,
variable se-marking on Spanish verbs, and use quantitative methods to sort out
what meanings may be associated with the variable’s
variants. Second, we identify those contexts in which variation does not occur
so as to get at contexts where variation can occur. At the same time, we generate
initial hunches or hypotheses about what types of constraints or correlations
may provide the bases for statistical patterning. Here is where category-sensitive
processes may come into play as possible constraints. Third, we analyze the data
which may push us to go back through the first two steps. Fourth, we generate
results. Fifth, we interpret the results either on the basis of a theoretically motivated
prediction or via ex post facto interpretation.
What are category-sensitive processes? I will assume that these are linguistic
processes which are sensitive or responsive to certain types of categories and not
others. Examples that come to mind are case assignment and subcategorization.
Also consider Napoli’s (1993: 50–53) textbook discussion of the suffix [-er] in
English. The nominalizing agentive suffix of [-er] in English selects as input a
verb in order to generate a noun as in ‘organize’ to ‘organizer.’ Thus, this partic-
ular suffix is sensitive or responsive to a specific category as input. Without that
input, the nominalization fails. Consider the unhappy sequence of ‘organization’
to ‘organizationer.’ Likewise, the homophonous comparative suffix of [-er] must
select either an adjective or an adverb that is either monosyllabic or disyllabic
with a light syllable. Thus, ‘fast’ to ‘faster’ and ‘pretty’ to ‘prettier’ but not ‘beautiful’
to ‘beautifuller.’ Note that we are discussing lexical categories like nouns
or verbs or adjectives or adverbs. Indeed, in Carnie’s (2007: 483) more recent
textbook, if we look for ‘category’ in the index, we find the entry of “see parts
of speech.” In these examples, it seems clear that a certain co-occurrence must hold
between type of suffix and type of input for the process to proceed. Can
sociolinguistic variation be analyzed in a similar fashion?
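Napoli’s two [-er] suffixes can be sketched as rules that inspect the category of their input before applying. In the Python below, the tiny lexicon, its category labels, and the syllable information are hand-coded assumptions. The sketch does not compute syllable weight or produce spelled outputs such as ‘prettier’. It only reports which output category, if any, each suffix licenses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    form: str
    category: str                     # "V", "N", "Adj", "Adv"
    syllables: int
    has_light_syllable: bool = False  # hand-coded; only consulted for disyllables


def agentive_er(entry: Entry) -> Optional[str]:
    """Agentive -er: selects a verb and yields a noun; otherwise fails (None)."""
    return "N" if entry.category == "V" else None


def comparative_er(entry: Entry) -> Optional[str]:
    """Comparative -er: selects a short adjective or adverb, keeps its category."""
    if entry.category not in ("Adj", "Adv"):
        return None
    if entry.syllables == 1 or (entry.syllables == 2 and entry.has_light_syllable):
        return entry.category
    return None


LEXICON = [
    Entry("organize", "V", 3),
    Entry("organization", "N", 5),
    Entry("fast", "Adj", 1),
    Entry("pretty", "Adj", 2, has_light_syllable=True),
    Entry("beautiful", "Adj", 3),
]

for e in LEXICON:
    print(e.form, agentive_er(e), comparative_er(e))
```

Each rule is a co-occurrence requirement between suffix and input category: ‘organize’ feeds only the agentive rule, ‘fast’ and ‘pretty’ feed only the comparative rule, and ‘organization’ and ‘beautiful’ feed neither.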
When studying variation, we seek to establish statistical correlations between the
variants of a variable and something else. A statistical correlation, or in variationist
terms a constraint, is a type of co-occurrence. Constraints are factors external
to the set of variants of a variable that influence selections within the set. The
factors that we discover may be of three types: linguistic, stylistic, or social (such
as class, gender, or ethnicity). To say that factors influence selections within the
set of variants is equivalent to saying that the variants of a variable are more or
less sensitive or responsive to certain types of factors and not others. Or, more
accurately, as speakers use variants they are sensitive or responsive to certain
types of factors and not others.
Looking for structure-dependence 77

Can the different variants of a variable show correlations with lexical categories
and, thus, indicate the presence of category-sensitivity?
Yes. Three come to mind: Puerto Rican and Madrid Spanish subject pronoun/null
subject alternations, English (t/d) deletion, and English (ing). Briefly, in
Cameron (1993), I showed that in Spanish four morphological classes of verbs,
defined by degrees of potential ambiguity, correlate with increasing or decreasing
probabilities of subject pronoun expression. For English (t/d) deletion, Guy (1991)
reported a scalar path of deletion going from monomorphemes with the highest
rate (mist), through irregular past forms with the second highest (lost), to regular
past forms (talked) with the lowest.
Another clear example is the English variable (ing). This occurs only in
word-final position in multisyllabic words, where two variants alternate: a velar
and an alveolar nasal, as in "running" vs. "runnin'". If the distinction between
monosyllabic and multisyllabic words is also a difference between categories, we
could also say that the mere presence of (ing) as a variable is category-sensitive:
yes to multisyllabic words, no to monosyllabic words.
It appears that (ing) has been variable in English for centuries. The different
variants historically derive from different morphological origins. The alveolar
variant is associated with the Old English present participle; the velar variant
is connected to nominals. For a review, see Hazen (2008: 117–119), Labov (2001:
86–90), and Houston (1985). Current linguistic constraints on (ing) reflect this
diachronic portrait in that nominal forms statistically favor the velar variant and
verbal forms favor the alveolar variant.
In Cameron (2010), a study of the acquisition of statistical gender differences,
I analyzed (ing) in a group of 30 elementary school children from Oak Park,
Illinois. The children were of African-American and European-American descent,
but for my purposes here, I will discuss them as a combined group. Drawing on the
work of Hazen, Labov, and Houston, I developed a set of 8 categories to
investigate. My goal in doing so was to rule out skewing in the data, not to pursue a
theoretically motivated study of the linguistic constraints. However, I found what
others have found as well. I list the 8 categories here in alphabetical order along
with examples from the children. All names are pseudonyms.

1) Adjective:
Abby: Sometimes my sister’s annoying.

2) Gerund: Where the nominal form may be clearly associated with a verbal
form.
Beyonce: And we bring it to every book club meeting, every month.
78 Richard Cameron

3) Gerund-Participle:
Kenny: Yea, Koby got accused of...cheating on his wife.

4) Progressives/Quasi-Progressives:
Kevin: I’m savin’ my money.
Delia: And I started laughing very very hard.

5) Nothing/Something: (everything & anything were excluded as invariantly velar.)
Kenny: Yea, it does... It look... like a... like something you put in a radio except for
smaller.

Either of these items, nothing or something, may also be used as an adjective.
This is infrequent; when it occurred, I classified that token as an adjective.
Consider this example from the second grader, Abby, who responded to my question:
Richard: Where are you guys goin’?
Abby: Um something museum. I forget.

6) Noun: Where the nominal form is not clearly associated with a verbal form
(from the child’s perspective) or where the form is monomorphemic, as in
“morning” or “evening”.
Beyonce: I only eat the um cookie part. I don’t eat the frosting.

7) Or something: (a hedge; see Cameron 2010 for more analysis.)
Rebecca: Like, Nick’s, you can get better deals cause you can get huge pieces and a pop for
like 3 bucks or something, I don’t know.

8) Prepositions:
Elizabeth: It was during the summer so we went on vacation a lot.

Turning to Table 1, I provide an overview of the rates of the two variants, the velar
[ŋ] and the alveolar [n], in the children's speech, drawn from interviews I did with
them in pairs over pizza and cups of juice at lunch. I rank the categories from the
highest to the lowest rate of the velar variant.
Notice that the categories which form the basis for the statistical correlations are
lexical categories such as nouns, adjectives, and verbs. Moreover, they can be
ranked from most to least favoring of the velar variant. Thus, we may say that the
statistical expression of the velar variant is Category-Sensitive: velars like nouns
and dislike verbs, and alveolars like verbs and dislike nouns. I conclude, therefore,
that Category-Sensitivity, another basic feature of grammar, is also a structural
characteristic of usage.

Table 1: Overall rates for (ing) by grammatical category, arranged from most to least [ŋ]

Category             [ŋ] N (%)     [n] N (%)     Total N
Preposition          5 (100)       0 (0)         5
Noun                 28 (90)       3 (10)        31
Adjective            84 (87)       13 (13)       97
Gerund               68 (85)       12 (15)       80
Gerund Participle    57 (71)       23 (29)       80
Or something         43 (65)       23 (35)       66
Nothing/Something    28 (61)       18 (39)       46
Progressives         337 (55)      276 (45)      613
Total                650 (64)      368 (36)      1018

χ² = 72.5, d.f. = 7, sig. at .001
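The rates and the test statistic in Table 1 can be recomputed from the raw counts. The sketch below transcribes the counts into a hypothetical dictionary `TABLE_1`, ranks the categories by their share of [ŋ], and computes a Pearson chi-square for the 8 × 2 table without any continuity correction; under that assumption it yields a value of about 72.5 with 7 degrees of freedom. Note that the Preposition row has expected counts below 5, so the chi-square approximation is rough for that cell.

```python
# Counts transcribed from Table 1: category -> ([ŋ] tokens, [n] tokens).
TABLE_1 = {
    "Preposition": (5, 0),
    "Noun": (28, 3),
    "Adjective": (84, 13),
    "Gerund": (68, 12),
    "Gerund Participle": (57, 23),
    "Or something": (43, 23),
    "Nothing/Something": (28, 18),
    "Progressives": (337, 276),
}

def velar_rates(table):
    """Return (category, share of [ŋ]) pairs, highest rate first."""
    rates = {cat: velar / (velar + alveolar)
             for cat, (velar, alveolar) in table.items()}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)

def pearson_chi_square(table):
    """Pearson chi-square for an r x 2 table (no continuity correction)."""
    rows = list(table.values())
    col_totals = [sum(row[j] for row in rows) for j in range(2)]
    grand_total = sum(col_totals)
    chi2 = 0.0
    for row in rows:
        row_total = sum(row)
        for j, observed in enumerate(row):
            expected = row_total * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(rows) - 1) * (2 - 1)
    return chi2, dof

for category, rate in velar_rates(TABLE_1):
    print(f"{category:<18} {rate:6.1%}")
chi2, dof = pearson_chi_square(TABLE_1)
print(f"chi-square = {chi2:.1f}, d.f. = {dof}")
```

The ranking produced by `velar_rates` matches the order of the rows in Table 1, from Preposition down to Progressives.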

4 Long-Distance Dependencies in usage

Long-distance dependencies are instantiated by such processes as wh-questions,
relative clauses, topicalizations, and tough-movement. Agreement and anaphora
are also analyzed in terms of long-distance dependency (Boeckx 2009; Napoli
1993). In all cases, the term "long-distance" refers to the construal of some
relationship between one item and another across clause boundaries, in which the
relationship is necessary for identifying the grammatical form and/or function
of one of the items. In the cases of wh-questions or topicalization, the long-distance
relation entails a dependency between a phrase and a node from which, at some
point, the phrase moved. Consider the topicalization "Him, I know," in which
the phrase "Him" may be construed as the direct object of the two-place
predicate 'to know.' Yet, when topicalized, the phrase "Him" is not in canonical
direct object position, where it would receive case. Non-movement analyses have
also been proposed (Culicover and Jackendoff 2005: 311).

Though such movements can be very long, they are subject to various kinds
of island constraints. In current terms, they may also be motivated by the need to
check features. Checking features and, in particular, island constraints seem to
me to be quite grammar-internal. As such, here I think Newmeyer is right to claim
a clear difference between what a grammar is and what we do. I cannot think of
any clear analogue or parallel to island constraints in usage, though the potential
relationship between processing and island constraints does come to mind. Yet,
even here, it is unclear that island constraints can be reduced to processing
limitations (Sprouse, Wagers and Phillips 2012).
Nonetheless, notice that island constraints apply to wh-movement or
topicalizations within the frame of a given sentence. Long-distance agreement may
differ in at least one key respect. Whereas the long-distance dependencies of
wh-questions or topicalizations occur, ultimately, within the boundaries of a given
sentence, long-distance agreement may occur across multiple sentence
boundaries, not simply across clauses within a sentence. Consider this invented
example in Spanish from my colleague Kay González-Vilbazo.

(3) A: ¿De quién es la motocicleta? (Whose is the motorcycle?)
B: No sé. Pensé que era la tuya. (I don't know. I thought it was yours.)

In the first sentence, la motocicleta ('the motorcycle') is singular and feminine. At
the end of the third sentence, la tuya ('yours') agrees with it in both number and
gender. Indeed, we could invent additional intervening sentences and still end up
with la tuya in agreement with la motocicleta at a greater distance.
Do analogues to long-distance dependencies occur in usage? Here I revisit a
variationist analysis of pronominal and null subject expression in Spanish that
has the flavor of a long-distance dependency across multiple sentences and/or
clauses (Cameron 1992, 1995). In other words, I will construe a relationship
between the choice of a pronominal or a null subject and the antecedents of that
subject at varying distances back in the preceding discourse. This analysis of
usage may also provide the basis for an account of a frequently reported
generalization in variationist work on the alternation of null and explicit
pronominal subjects in Spanish.
I start off with that generalization. When we quantify the rate of selecting a
pronoun or a null subject in Spanish, singular subjects typically show a higher
rate of pronoun expression than do plural subjects and, conversely, plural subjects
favor null subject expression relative to singular subjects. In Table 2, I provide
data, reported in Cameron (1992: 241), based on interviews with 10 speakers of
Spanish from San Juan, Puerto Rico, and 10 from Madrid, Spain.

Table 2: Singular subjects versus plural subjects (rate of +Pro; total N in parentheses)

              San Juan        Madrid
              +Pro (total)    +Pro (total)
Singular      50 % (1764)     26 % (1512)
Plural        19 % (358)      7 % (549)

By +Pro I mean that the subject was expressed as an explicit pronoun rather than
as a null subject. Given that this particular finding has emerged in numerous
studies, we may ask why it occurs. By way of an answer, I will detour into a very
brief discussion of what I have termed Switch Reference, following on the earlier
work of Silva-Corvalán (1982). Switch Reference is a central constraint on the
variable alternation of pronominal and null subject expression. In turn, I will
identify shortcomings of this constraint and show that by pursuing them, we end
up with a basis for answering why the pattern occurs. Note that both the finding
and the answer come from an analysis of usage.
Switch Reference refers to two related reference relations that may hold
between two subject noun phrases. When these two noun phrases have different
referents, they are ‘switch’ in reference. When these two noun phrases share the
same referent, they are ‘same’ in reference.
Because my concern is with subject pronouns, I limit my focus to the subjects
of tensed verbs. Therefore, the relationship of switch or same reference holds
between two noun phrases in any stretch of discourse in which the second noun
phrase is the human subject of the first tensed verb that occurs after the preceding
subject noun phrase of a tensed verb. To clarify this, we may visualize the
relationship in the following manner:

NP(1) + Tensed V (X) (Y) NP(2) + Tensed V (Z)

I call NP(1) the Trigger and NP(2) the Target. The Target is the subject NP which we
may identify as either switch or same in reference relative to the preceding Trigger
NP. When quantifying variation of pronominal and null subjects, the focus is on
the Target.
The following exchange from a radio talk show² illustrates this:

(4) (1) 5s: Y es la única casa que no tiene luz.
(2) A: Sí sí an- anoche el teléfono que usted me dió?
(3) 5s: uhhuh
(4) A: De esa casa nosotros la llamamos.

(1) 5s: And it's the only house that has no electricity.
(2) A: Yes yes las- last night the number that you gave me?
(3) 5s: uhhuh
(4) A: From that house we called it.

2 Broadcasting date: Oct. 12, 1989, 6:00–6:30; Notiuno 1320 AM, San Juan; 5s = Fifth Caller,
A = Alcalde or Mayor of San Juan.

A switch in reference occurs in the subject NP of line 4 relative to the subject NP
of line 2. The referent of the NP of line 4, nosotros ('we'), is different from the
referent of the NP in line 2, usted ('you'). Both noun phrases are subjects of tensed
verbs. Sequentially, the subject of line 2, usted, is the Trigger, and the following
subject in line 4, nosotros, is the Target.
An example of same reference is found in (5):

(5) (Interview 16: Carina; Age > 25)

(1) Pero ella se enfogonaba tanto.
(2) y entonces ∅ era bien gritona, ¡Ay dios!

(1) But she would get so mad.
(2) and then (she) was such a shouter, Oh my god!

Sameness of reference is maintained across lines 1 and 2. In both cases, the
subject refers to ella ('she'), who in this case was a particularly loud elementary
school teacher. In terms of Triggers and Targets, the subject of line 1 is the Trigger
for the Target of line 2.
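The Trigger/Target procedure can be sketched as a small coding routine. Everything here is a hypothetical illustration: the `Subject` record, the referent labels, and the simplification that every listed subject is already the human subject of a tensed verb (the human-subject filter itself is not modeled). Same reference requires strict identity of referents, as in the definition above. The two calls code the Targets of examples (4) and (5); example (4) starts at line 2, as in the discussion.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Subject:
    """Hypothetical coding record for the subject of a tensed verb."""
    line: int       # line number in the transcript
    referent: str   # label for the subject's referent
    overt: bool     # True = explicit pronoun (+Pro), False = null subject

def code_switch_reference(subjects: List[Subject]) -> List[Optional[str]]:
    """Code each Target as 'same' or 'switch' relative to its Trigger, i.e.
    the immediately preceding subject in the list. The first subject has no
    Trigger in the excerpt, so it is left uncoded (None)."""
    if not subjects:
        return []
    codes: List[Optional[str]] = [None]
    for trigger, target in zip(subjects, subjects[1:]):
        codes.append("same" if target.referent == trigger.referent else "switch")
    return codes

radio_talk = [Subject(2, "usted", overt=True), Subject(4, "nosotros", overt=True)]
carina = [Subject(1, "ella", overt=True), Subject(2, "ella", overt=False)]
print(code_switch_reference(radio_talk))  # [None, 'switch']
print(code_switch_reference(carina))      # [None, 'same']
```

Once each Target is coded, its `overt` value can be cross-tabulated against the code to produce tables of the kind that follow.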
How does Switch Reference statistically pattern in San Juan and Madrid?

Table 3: Switch reference in San Juan and Madrid (frequencies and rates)

San Juan
Relation to Trigger   Same           Switch         Total
+Pro                  316 (31 %)     630 (57 %)     946 (45 %)
Null                  689 (69 %)     475 (43 %)     1164 (55 %)
Total                 1005 (100 %)   1105 (100 %)   2110 (100 %)

χ² = 139.141, sig. at .001

Madrid
Relation to Trigger   Same           Switch         Total
+Pro                  114 (11 %)     315 (30 %)     429 (21 %)
Null                  903 (89 %)     726 (70 %)     1629 (79 %)
Total                 1017 (100 %)   1041 (100 %)   2058 (100 %)

χ² = 113.142, sig. at .001
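The column rates and both chi-square values in Table 3 follow directly from the four cell counts per dialect. The helper below, `two_by_two`, is a hypothetical name; it uses the closed-form Pearson statistic for a 2 × 2 table without Yates's continuity correction, which is the assumption under which it matches the reported 139.141 and 113.142.

```python
def two_by_two(pro_same: int, pro_switch: int, null_same: int, null_switch: int):
    """Return (+Pro rate in Same, +Pro rate in Switch, Pearson chi-square)
    for a 2 x 2 table, using the closed form without continuity correction."""
    n = pro_same + pro_switch + null_same + null_switch
    same_total = pro_same + null_same
    switch_total = pro_switch + null_switch
    pro_total = pro_same + pro_switch
    null_total = null_same + null_switch
    cross_diff = pro_same * null_switch - pro_switch * null_same
    chi2 = n * cross_diff ** 2 / (pro_total * null_total * same_total * switch_total)
    return pro_same / same_total, pro_switch / switch_total, chi2

# Cell counts transcribed from Table 3: (+Pro same, +Pro switch, null same, null switch).
for dialect, cells in (("San Juan", (316, 630, 689, 475)),
                       ("Madrid", (114, 315, 903, 726))):
    same_rate, switch_rate, chi2 = two_by_two(*cells)
    print(f"{dialect}: Same {same_rate:.0%}, Switch {switch_rate:.0%}, chi2 = {chi2:.3f}")
```

The printed rates (31 % vs. 57 % for San Juan, 11 % vs. 30 % for Madrid) agree with the rounded percentages in the table.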



Both dialects display a significantly higher rate of pronoun expression in the
switch context than in the context where the Trigger and Target are the same
in reference. Also, if we submit these data to a Varbrul analysis, the probabilities
associated with the expression of an explicit subject are virtually the same in these
two dialects, separated as they are by the Atlantic Ocean.

Table 4: Varbrul weights for pronoun expression by switch reference: San Juan vs. Madrid³

          San Juan   Madrid
Switch    .64        .65
Same      .34        .34
Input     .44        .16
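One way to see how nearly identical weights can coexist with very different rates is to combine the weights in Table 4 with each dialect's input. The sketch below assumes the standard logistic form of the Varbrul model, in which log-odds add: logit(p) = logit(input) + logit(weight). The function names are hypothetical, and this single-factor prediction need not reproduce the raw rates of Table 3, since the original analysis may have included other factor groups.

```python
import math

def logit(p: float) -> float:
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1 - p))

def varbrul_prediction(input_prob: float, *weights: float) -> float:
    """Predicted probability of the application value (+Pro) under the
    logistic Varbrul model: logit(p) = logit(input) + sum of logit(weight).
    A weight of .50 is neutral; above .50 favors, below disfavors."""
    log_odds = logit(input_prob) + sum(logit(w) for w in weights)
    return 1 / (1 + math.exp(-log_odds))

# Inputs and weights transcribed from Table 4.
for dialect, input_prob, switch_w, same_w in (("San Juan", 0.44, 0.64, 0.34),
                                              ("Madrid", 0.16, 0.65, 0.34)):
    print(dialect,
          f"switch {varbrul_prediction(input_prob, switch_w):.2f}",
          f"same {varbrul_prediction(input_prob, same_w):.2f}")
```

With these numbers the sketch gives about .58 vs. .29 for San Juan and .26 vs. .09 for Madrid: the same direction and similar relative effect of switch reference, shifted by the much lower Madrid input.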

But we have a problem. In my interview with Carina, we discussed childhood
experiences in elementary school. One experience involved jealousy among her
friends because she had a noviocito ('little boyfriend') who era uno de los nenes
lindos ('was one of the good-looking boys'). I then asked what had happened to
her boyfriend. She replied:

(Interview 16: Carina; Age > 25)

(1) Bueno, cuando ∅ nos graduamos del sexto grado,
(2) yo fuí a una escuela
(3) y él fue a otra.
(4) Pues ahí ∅ nos teníamos que dejar
(5) porque ∅ no nos íbamos a ver.

(1) Well, when (we) graduated from sixth grade,
(2) I went to one school
(3) and he went to another.
(4) Well, then (we) had to separate
(5) because (we) weren't going to see each other.

In line 1, the speaker and her boyfriend are referred to by the null first person
plural subject (we). In other words, the first person plural subject of line 1 is
understandable as a set constituted by the set members of the speaker and her
boyfriend. The individual members of this set are expressed in lines 2 and 3. In
line 4, Carina reintroduces the subject set that was initially expressed in line
1. The subject set of line 4 is partially co-referential, or same in reference, to the
subject of line 3 because the subject set of line 4 properly includes the subject of
line 3. The subject of line 3, él ('he'), is a set member of the set in line 4. Yet the
two subjects are not identical in reference and are, therefore, switch in reference.
However, if we combine the noun phrases of line 2 and line 3, then the subject
noun phrase of line 4 is same in reference to the set elements expressed in lines 2
and 3 as well as to the initial first person plural subject of line 1.

3 Note that all subjects, singular and plural, were included.
A definition of switch reference in which two noun phrases are considered the
same in reference if and only if the Target and the preceding Trigger are
referentially identical cannot capture, statistically, how speakers respond to the
explicit or inferable presence of set members of plural subject sets in the
preceding discourse beyond the preceding Trigger noun phrase. Therefore, turning
to plural subjects only, I ask the following questions about the preceding discourse.

Have the set members or the set itself been explicitly or inferably mentioned in the
preceding discourse:
(a) within the 5 preceding clauses?
(b) within 6 to 10 preceding clauses but not within 5?
(c) more than 10 clauses back but not within the preceding 10?
(d) not mentioned either explicitly or inferably?

The gradation of 5, then 10, then more than 10 clauses back in the discourse is an
attempt to operationally capture the effects of the saliency of the given information
that the set members represent. I hypothesize that plural subject noun phrases
whose elements have been introduced into the discourse within the 5 preceding
clauses will more frequently be null than those plural subject noun phrases
whose elements were mentioned more than 10 clauses before but have not been
mentioned since. This strategy of analysis follows on the use of sets and set
elements in the work of Prince (1984) on left-dislocations and topicalizations, two
types of long-distance dependencies. I call this Set-to-Elements Saliency. Also
relevant here is the work of Givón (1983) and Gundel, Hedberg, and Zacharski (1993).
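The four distance questions can be operationalized as a lookup followed by binning. The sketch below is a hypothetical operationalization, not the original coding protocol: each preceding clause is represented by the set of referents it mentions (explicit or inferable mentions are assumed to be recorded by the analyst), and the distance is taken to the nearest clause mentioning any member of the plural subject's set. Applied to Carina's lines 1 to 4, it also shows the contrast with strict switch-reference coding.

```python
from typing import List, Optional, Set

def clauses_back(target_set: Set[str], preceding: List[Set[str]]) -> Optional[int]:
    """Clauses back to the nearest preceding clause that mentions any member
    of the plural subject's set (a mention of the whole set also counts).
    `preceding` is ordered oldest first; returns None if nothing matches."""
    for distance, mentioned in enumerate(reversed(preceding), start=1):
        if mentioned & target_set:  # non-empty overlap = a mention
            return distance
    return None

def saliency_category(distance: Optional[int]) -> str:
    """Bin a distance into categories (a)-(d) of the questions above."""
    if distance is None:
        return "(d) not mentioned"
    if distance <= 5:
        return "(a) within 5"
    if distance <= 10:
        return "(b) 6 to 10"
    return "(c) beyond 10"

# Carina's lines 1-3 as referent sets; line 4's null 'we' is the target.
preceding = [{"speaker", "boyfriend"},  # (1) null 'we'
             {"speaker"},               # (2) yo
             {"boyfriend"}]             # (3) él
target = {"speaker", "boyfriend"}       # (4) null 'we'
print(target == preceding[-1])          # False: strict coding yields 'switch'
print(saliency_category(clauses_back(target, preceding)))  # (a) within 5
```

Under strict identity, line 4 counts as a switch from line 3, whereas the set-based lookup places it in the most salient category, which is the case the questions above are designed to capture.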
As with Switch Reference, despite differences in their rates, the speakers from
San Juan and Madrid show parallel behavior and patterning for Set-to-Elements
Saliency. See Table 5.

Table 5: Set-to-elements saliency: San Juan vs. Madrid

San Juan
Distance back of set elements   % of subjects that are +Pro   Total N (% of total N)
(1) Within 5 clauses            14 %                          317 (89 %)
(2) Within 6 to 10              44 %                          18 (5 %)
(3) Beyond 10                   67 %                          12 (3 %)
(4) New                         67 %                          9 (3 %)

Madrid
Distance back of set elements   % of subjects that are +Pro   Total N (% of total N)
(1) Within 5 clauses            5 %                           507 (93 %)
(2) Within 6 to 10              17 %                          18 (3 %)
(3) Beyond 10                   35 %                          17 (3 %)
(4) New                         67 %                          6 (1 %)

Therefore, despite the difference in the rate of subject pronoun expression between
the two dialects, both dialects respond in kind to the preceding presence of either
the set members or the set itself for plural subjects. This similarity becomes even
more explicit when the data are submitted to Varbrul analysis, because the Varbrul
weights associated with the different factors for San Juan and Madrid are very
close in value.

Table 6: Varbrul weights for set-to-elements saliency: San Juan vs. Madrid
(Nosotros(as) & Ellos(as) only)⁴

Distance back          San Juan   Madrid
(1) Within 5 clauses   .45        .46
(2) Within 6 to 10     .78        .72
(3) Beyond 10          .90        .88
(4) New                .91        .97
Input                  .16        .05

These results suggest the following. As with long-distance dependencies, where
we construe a relationship between one item and another across clause
boundaries, here we also construe a relationship between the form that plural
subject expression statistically takes, either a pronominal or a null subject, and
the presence of the subject's set members scattered across clauses or sentences
in the preceding discourse. I conclude, therefore, that long-distance dependency,
another basic feature of grammar, can also be a structural characteristic of usage.
Now we are in a position to answer the question posed earlier. Why do singular
subjects typically show a higher rate of pronoun expression than plural subjects,
or, conversely, why do plural subjects favor null subjects relative to singular
subjects? If we focus only on singular subjects within the useful framework
of Switch Reference, we find the following distributions of rates and frequencies:

4 Cases of switch reference were also included.



Table 7: Switch reference on singular subjects only: San Juan vs. Madrid (percentages = rates of
pronoun expression)

            San Juan        Madrid
            +Pro (total)    +Pro (total)
Switch      66 % (879)      38 % (740)
Same        35 % (876)      14 % (769)

Notice that again we find a pattern for singular subjects that is similar to the one
we found for all subjects combined. Also observe how the singular subjects are
distributed across the categories of switch and same reference. For the San
Juan speakers, of the total of 1,755 subjects analyzed, 50 % fall in the category of
switch and 50 % in the category of same. A similar pattern holds in Madrid: of
the total of 1,509 subjects, 49 % fall in the category of switch and 51 % in that of
same. Thus, the category which favors null subjects receives a share of the data
basically equivalent to that of the category favoring pronoun expression.
This is not the case for plural subjects. Returning to Table 5, notice that the
first distance category, "Within 5 clauses," which strongly favors null subject
expression, contains the largest raw number of plural subjects in both San Juan
and Madrid: it accounts for 89 % of the data in San Juan and 93 % in Madrid.
This indicates that usage or discourse is typically structured such that plural
subjects occur more frequently than singular subjects in a context that favors null
subject expression. I conclude, therefore, that usage has a recurrent organization
that can and does influence speakers' choice of options within the syntax, and
which can, at times, also provide a basis for an account of that structure as it is
spoken by everyday people in interaction.
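The argument can be checked arithmetically: an overall rate is the token-weighted average of the context rates, so the share of data falling into each context matters as much as the rates themselves. The sketch below, with a hypothetical helper `overall_rate`, recombines the rounded percentages and token counts of Tables 5 and 7. Because it works from rounded percentages and those tables' totals differ slightly from Table 2's, the results are approximate.

```python
def overall_rate(cells):
    """Weighted +Pro rate from (rate, N) cells: sum(rate * N) / sum(N)."""
    total_n = sum(n for _, n in cells)
    return sum(rate * n for rate, n in cells) / total_n

# (+Pro rate, N) per context, transcribed from Table 7 (singular: switch, same).
SINGULAR = {"San Juan": [(0.66, 879), (0.35, 876)],
            "Madrid": [(0.38, 740), (0.14, 769)]}
# (+Pro rate, N) per distance category, transcribed from Table 5 (plural).
PLURAL = {"San Juan": [(0.14, 317), (0.44, 18), (0.67, 12), (0.67, 9)],
          "Madrid": [(0.05, 507), (0.17, 18), (0.35, 17), (0.67, 6)]}

for dialect in ("San Juan", "Madrid"):
    print(f"{dialect}: singular {overall_rate(SINGULAR[dialect]):.1%}, "
          f"plural {overall_rate(PLURAL[dialect]):.1%}")
```

This gives roughly 50.5 % (singular) and 18.6 % (plural) for San Juan and 25.8 % and 7.0 % for Madrid, close to the 50 %, 19 %, 26 %, and 7 % of Table 2: the low plural rate follows from nearly all plural tokens falling in the null-favoring "Within 5 clauses" context.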

5 Conclusion

I did not set out to falsify the distinction between grammar and usage. Indeed,
as already mentioned, I see this distinction not so much as a theory as a
fundamental assumption that contributes to theory building and research design. As
noted, I also find many of Newmeyer's arguments for distinguishing the two quite
compelling. However, I do find the binarism inherent in the distinction curious,
at times amounting to a type of ideology. Ideologies can change. Therefore, I have
sought to problematize that binarism. If there are certain structural characteristics
of grammar which have analogues or parallels, not exact replicas, in usage,
then the binarism which organizes the ground rules for the argument is
problematic, and "the world of difference between what a grammar is and what we do,"
as Newmeyer (2003: 687) poetically claimed, may be less a world of difference
and more a world of parallel universes that raggedly overlap. By "raggedly," I mean
that sometimes they overlap and sometimes they do not. Consider my points
about long-distance dependencies. It appears that such aspects as island
constraints or feature checking are very much grammar-internal, whereas the ability
to construe a relationship between one item and others across distance is a
feature of both grammar and usage in Newmeyer's terms.
I am reminded of Wasow’s (2009: 269) discussion of gradient data and the
possibility of gradient grammars as well as his point that:

the location of the competence/performance boundary is so hard to pin down. The
determined defender of purely categorical grammars always has the option of stipulating
that any gradient information is extra-grammatical and only enters into use of language
through some performance mechanism. In the absence of some principled basis for
assigning information to competence or performance, however, this reduces the debate to
an uninteresting terminological one.

I guess what I am arguing for is a new set of terms, something other than grammar
and usage or competence and performance, something not binary, something
n-ary.

References
Bernal, Byron and Alfredo Ardila (2009): The role of the arcuate fasciculus in conduction
aphasia. Brain 132: 2309–2316.
Boeckx, Cedric (2009): On long-distance agree. Iberia 1: 1–31.
Cameron, Richard and Scott Schwenter (2013): Pragmatics and variationist sociolinguistics. In:
Bayley, Robert, Richard Cameron and Ceil Lucas (eds.), The Oxford Handbook of Sociolin-
guistics, 464–483. Oxford: Oxford University Press.
Cameron, Richard (2010): Growing up and apart: Gender divergences in a Chicagoland
elementary school. Language Variation and Change 22: 279–319.
Cameron, Richard (1996): A community-based test of a linguistic hypothesis. Language in
Society 25(1): 61–111.
Cameron, Richard (1995): The scope and limits of switch reference as a constraint on
pronominal subject expression. Hispanic Linguistics 6/7: 1–27.
Cameron, Richard (1993): Ambiguous agreement, functional compensation, and nonspecific
tú in the Spanish of San Juan, Puerto Rico and Madrid, Spain. Language Variation and
Change 5: 305–334.
Cameron, Richard (1992): Pronominal and Null Subject Variation in Spanish: Constraints,
Dialects, and Functional Compensation. Ph.D. dissertation, University of Pennsylvania,
Philadelphia (distributed as IRCS Report 92–22 by The Institute for Research in Cognitive
Science, University of Pennsylvania, Philadelphia).

Carnap, Rudolf (1942/1961): Introduction to semantics and formalization of logic. Cambridge,
MA: Harvard University Press.
Carnie, Andrew (2007): Syntax: A generative introduction. Malden, MA: Blackwell.
Crain, Stephen and Mineharu Nakayama (1987): Structure dependence in grammar formation.
Language 63: 522–543.
Culicover, Peter and Ray Jackendoff (2005): Simpler syntax. Oxford: Oxford University Press.
Eagleton, Terry (1991): An introduction to ideology. London: Verso.
Givón, Talmy (1983): Topic continuity in discourse: An introduction. In: Talmy Givón (ed.), Topic
continuity in discourse: A quantitative cross-language study, 1–41. (Typological Studies in
Language 3). Philadelphia: John Benjamins.
Green, Georgia (1996): Pragmatics and Natural Language Understanding. Mahwah, NJ:
Lawrence Erlbaum.
Gundel, Jeannette, Nancy Hedberg and Ron Zacharski (1993): Cognitive status in the form of
referring expressions in discourse. Language 69: 274–307.
Guy, Gregory (1991): Explanation in variable phonology: An exponential model of morphological
constraints. Language Variation and Change 3: 1–22.
Guy, Gregory, Barbara Horvath, Julia Vonwiller, Elaine Daisley and Inge Rogers (1986): An
intonational change in progress in Australian English. Language in Society 15: 23–52.
Hauser, Marc, Noam Chomsky and W. Tecumseh Fitch (2002): The faculty of language: What is
it, who has it, and how did it evolve? Science 298: 1569–1579.
Hazen, Kirk (2008): (ing): A vernacular baseline for English in Appalachia. American Speech 83:
116–140.
Houston, Ann (1985): Continuity and change in English morphology: The variable (ing). Ph.D.
dissertation, University of Pennsylvania, Philadelphia.
Jackendoff, Ray (2011): What is the human language faculty? Two views. Language 87:
587–624.
Jurafsky, Daniel and James Martin (2009): Speech and language processing: An introduction to
natural language processing, computational linguistics, and speech recognition. Upper
Saddle River, NJ: Pearson-Prentice Hall.
Karlsson, Fred (2007): Constraints on multiple center-embedding of clauses. Journal of
Linguistics 43: 365–392.
Kroch, Anthony (1989): Reflexes of grammar in patterns of language change. Language
Variation and Change 1: 199–244.
Labov, William (2001): Principles of linguistic change, Volume 2: Social factors. Malden, MA:
Blackwell.
Levinson, Stephen (2013): Recursion in pragmatics. Language 89: 149–162.
Levinson, Stephen (1983): Pragmatics. Cambridge: Cambridge University Press.
Merritt, Marilyn (1976): On questions following questions in service encounters. Language in
Society 5: 315–357.
Napoli, Donna Jo (1993): Syntax: Theory and problems. Oxford: Oxford University Press.
Newmeyer, Frederick (2003): Grammar is grammar and usage is usage. Language 79: 682–707.
Prince, Ellen (1984): Topicalization and left-dislocation: A functional analysis. In: Sheila White
and Virginia Teller (eds.), Discourses in reading and linguistics. Annals of the New York
Academy of Sciences, 213–225. New York: New York Academy of Sciences.
Sacks, Harvey, Emanuel Schegloff and Gail Jefferson (1974): A simplest systematics for the
organization of turn-taking for conversation. Language 50: 696–735.

Schegloff, Emanuel (2007): Sequence organization in interaction: A primer in conversation
analysis. Volume 1. Cambridge: Cambridge University Press.
Silva-Corvalán, Carmen (1982): Subject expression and placement in Mexican-American
Spanish. In: Jon Amastae and Lucía Elías-Olivares (eds.), Spanish in the United States:
Sociolinguistic aspects, 93–120. New York: Cambridge University Press.
Sprouse, Jon, Matt Wagers and Colin Phillips (2012): A test of the relation between working-
memory capacity and syntactic island effects. Language 88: 82–123.
Torres Cacoullos, Rena and Scott Schwenter (2008): Constructions and pragmatics: variable
middle marking in Spanish subir(se) ‘go up’ and bajar(se) ‘go down’. Journal of Pragmatics
40: 1455–1477.
Wasow, Thomas (2009): Gradient data and gradient grammars. In: Nikki Adams, Adam Cooper,
Fay Parrill and Thomas Wier (eds.), Proceedings of the 43rd Annual Meeting of the Chicago
Linguistic Society, 255–271. Chicago: Chicago Linguistic Society.
Weinreich, Uriel, William Labov and Marvin Herzog (1968): Empirical foundations for a theory of
language change. In: Winfred Lehmann and Yakov Malkiel (eds.), Directions for historical
linguistics: A symposium, 95–188. Austin: University of Texas Press.
Mary A. Kato, University of Campinas
Variation in syntax: Two case studies on
Brazilian Portuguese*

Abstract: This article deals with two types of syntactic variation in Brazilian
Portuguese that are found both in the community and in the individual. The first
concerns the possibility of null or overt third-person pronouns in embedded clauses,
and the second concerns the possibility of "fronted" or "in-situ" wh-constituents. In
both cases, variation is related to syntactic change, but the former variation is not
found in the core grammar of children, while the latter is. We propose that the
null subject, the old form, is the result of schooling and is used more frequently
with increasing age, whereas variation in the position of the wh-constituent, which
has to do with a change in the position of focus, is present in the core grammars of
children and reflects the frequency of the variants in the input.

1 Introduction

Diachronic studies on Brazilian Portuguese (BP) have shown some major changes
in its syntax since the beginning of the nineteenth century, a scenario in which
variation in the community is expected to occur: "The spread of a new parameter
setting through a speech community is typically manifested by categorically
different usage on the part of different authors rather than by variation within
the usage of individuals, although the data are sometimes not as clear as that
idealization would suggest, because a writer often commands more than one form of
a language". (Lightfoot 1991: 162, highlighted by the author)
Here, however, I will discuss two cases where variation, or "optionality"¹, is
found both in the community and in the individual. The first case concerns the
variation in BP between the null subject and the overt pronominal subject when
the antecedent is a c-commanding element. In the same context, European
Portuguese (EP) licenses only the null variant.

* This study had the support of CNPq grant 305515/2011–2017. I thank the audience at the
Workshop on System, Usage, and Society, in Freiburg, November 2011, for discussions and
suggestions. I also thank the volume organizers for their careful review and innumerable
comments. Needless to say, all remaining faults are mine.
1 I use variation and optionality as quasi-synonyms.

(1) a. O [pai do Joãoₖ]ᵢ disse que eleᵢ/ₖ/Øᵢ/*ₖ estava cansado   BP
the father of-the John said that he/Ø was tired

b. O [pai do Joãoₖ]ᵢ disse que Øᵢ/ⱼ/ele*ᵢ/ₖ estava cansado   EP
the father of-the John said that Ø/he was tired
'John's father said that he was tired.'

The second case is the optionality of wh-movement present in the grammar of
contemporary Brazilians:²

(2) a. O que a Ana viu?   Wh-SV   BP *EP³
what the Ana saw

b. A Ana viu o que?   Wh-in-situ   BP %EP⁴
the Ana saw what
'What did Ana see?'

In both cases of variation, one of the alternatives has become increasingly
frequent in BP: overt pronominals in the first case and wh-in-situ in the second.
The two studies show the synchronic variation in the Brazilian community as the
result of diachronic change. My claim in this study, however, is that such
variation can be present in the grammars of single individuals, a fact that poses
problems for a Minimalist framework,⁵ in which variation/optionality is not
expected in grammatical derivations that involve checking operations (Saito and
Fukui 1998).⁶
In section 2, I will describe the variation between the null and the overt subject in complement clauses in the Brazilian adult I-language and will explain why such a “doublet” is possible, based on Kato (2011). In that article, Kato proposes that syntactic variation is not possible in children’s core grammars, but that adults can exhibit such variation if they acquire the null subject late, in which case it will appear in an extended periphery of their core grammar, within their I-language. Chomsky (1981: 8) introduces this concept of an extended, marked periphery: “For such reasons as these, it is reasonable

2 See Lopes-Rossi 1996 and Kato and Duarte 2002.
3 EP disallows such word order. It only licenses the Wh-SV order with D-linked wh-elements
(Que livro a Ana comprou? ‘Which book has Ana bought?’)
4 According to Lopes-Rossi (1996), EP has a more restricted occurrence of wh-in-situ, signaled
here with the symbol “%”.
5 If fronted wh-questions involve movement, and movement is a costly operation, speakers
should always prefer the wh-in-situ construction.
6 According to these authors, heavy NP-shift, for example, is optional as no checking operation
is involved.
Variation in syntax: Two case studies on Brazilian Portuguese 93

to assume that UG determines a set of core grammars and that what is actually
represented in the mind of an individual even under the idealization to a homogeneous speech community would be a core grammar with a periphery of marked
elements and constructions”.
In section 3, we will see variation not only between plain wh-questions, as in (2), but also among cleft wh-questions. Here the hypotheses in Kato (2011) do not work, since children already exhibit different types of wh-questions in their core grammars. A refined formal analysis will show that variants with the same meaning but a different numeration, namely the initial vocabulary of the derivation, may not appear in children’s core grammars, and that forms yielded by erasure operations at PF count as phonological variants of a single syntactic variant. Section 4 provides a brief conclusion.

2 The variation between null subjects (NS) vs. overt pronominal subjects in Brazilian Portuguese

2.1 The change in the null subject (NS) parameter

Duarte (1995) shows that the overt subject pronouns (OSP) have been replacing
the null referential subject since the nineteenth century in Brazilian Portuguese,
thereby disobeying Chomsky’s (1981) well-known Avoid Pronoun principle.

Figure 1: Null referential subjects in Brazilian Portuguese over seven time periods (1845, 1882, 1918, 1937, 1955, 1975, 1992), in percent (adapted from Duarte 1995: 17)

Kato (2000) shows that BP has been losing other properties of the NS parameter, namely free inversion7 and long clitic climbing,8 a sign that the change has a parametric nature. However, there are contexts of resistance that deserve our attention: (a) the subject of complement clauses, where variation was seen in (1), (b) non-referential subjects, where the NS expletive is still retained9 (cf. (3)), and (c) minimal answers, where the NS is still categorical (cf. (4)).

7 See also Berlinck (2000).
8 See also Pagotto 1993 and Cyrino 1993.

(3) Ø chove em São Paulo, Ø faz sol no Rio
Ø rains in São Paulo, Ø makes sun in Rio
‘It rains in São Paulo, it is sunny in Rio.’

(4) a. você quer café?
‘you want coffee?’

b. Ø quero
Ø want
‘I do.’

Observe, however, that in the minimal answer it is not only the subject that is null: the object and other possible complements or adjuncts may also be elliptical. In Kato (2009) the apparent null subject is analyzed not as an NS, but as the result of focus extraction of the verb followed by IP ellipsis (for Finnish, cf. Holmberg 2001). Notice that having an NS or an OSP does not affect the result of ellipsis, which is why EP and BP exhibit the same type of minimal answer.

(5) Eu/∅10 quero café → [F queroV [IP eu/∅ queroV café]] BP EP

The most intriguing NS is the anaphoric one in complement clauses. Several analyses have been proposed,11 but Kato (2009) interprets it as a logophoric pronoun,12 which occurs as the subject of complements of verba dicendi. The logophoric NS (LNS) therefore occurs in a subset of the contexts of prototypical pro identified by inflection.

9 Kato and Negrão (2000) have subdivided NS languages into those where both referential and non-referential subjects can be null and those where only expletives can be null. But in BP, change has affected these structures as well, as it is becoming more and more common to have an alternative construction in which an adjunct is raised to satisfy the EPP instead of merging a null pronoun (Kato and Duarte 2003).
(i) São Paulo chove, Rio faz sol.
São Paulo rains, Rio makes sun.
‘It rains in São Paulo, it is sunny in Rio.’
10 In order to avoid representing the NS as pro, we will represent it as Ø.
11 It has been proposed to be a variable (Modesto 2000) or an A-trace (Ferreira 2000).
12 In Kuno’s (1972) definition, logophoric pronouns are pronouns that refer to either the speaker or the addressee of the direct discourse and that appear in the third person in indirect discourse.

2.2 The problem of the interpretation of null subjects in BP

In a non-NS language like English (EN), in an anaphoric context, the overt pronoun with a c-commanding antecedent covers exactly the same range of references as the NS in a prototypical Null Subject Language (NSL) like European Portuguese:

(6) a. Johni said that hei/k was tired EN [-NSL]

b. O Joãoi disse que Øi/k estava cansado EP [+NSL]
the John said that Ø was tired

c. O Joãoi disse que elei/k estava cansado BP [=EN]

d. O Joãoi disse que Øi/*k estava cansado BP [≠EP]

Brazilian Portuguese, on the other hand, seems to have a “morphological doublet” when the subject of the complement clause corefers with the subject of the main clause: both the weak pronoun ele and the NS are possible. In BP, the overt pronoun is interpreted like its English counterpart. However, the BP NS, unlike the EP one, is always coreferent with the c-commanding antecedent. Thus the BP NS occurs in only a subset of the contexts of the EP null subject, namely those with a c-commanding antecedent.13 Recall that Kroch (1994) excludes “morphological doublets” in a single grammar:

Syntactic heads, we believe, behave like morphological formatives generally in being subject to the well-known ‘Blocking Effect’ (Aronoff 1976), which excludes morphological doublets, and more generally, it seems, any coexisting formatives that are not functionally differentiated. (…) This exclusion, however, does not mean, either for morphology or for syntax, that languages never exhibit doublets. Rather it means that doublets are always reflections of unstable competition between mutually exclusive grammatical options. (Kroch 1994: 181, highlighted by the author)

The intriguing question that I aim to answer here is: why does BP allow mor-
phological “doublets”?

13 See studies by Kato 2000, Figueiredo Silva 2000, Modesto 2000, Ferreira 2000 and Rodrigues
2004 and the consensual judgment of these data.

2.3 Core grammar and periphery

My analysis of the existence of the apparent “doublet” NS/OSP is based on the following hypotheses:14
(a) core grammars, attained by parameter selection, have no doublets
(b) core grammars are attained before a learner starts school
(c) if the child is the agent of syntactic changes (Lightfoot 1991), the child’s core grammar should always have the innovative variant, namely the OSP in BP
(d) I-language may contain doublets if the adult has an extended periphery obtained through late acquisition.

The empirical confirmation of my hypotheses comes from Magalhães’ (2006) study of the acquisition of subjects by a BP-speaking child, Ana, whose speech data were collected between 2002 and 2004, from age 2;4.11 to 2;10.29. The data reveal the following findings:

Figure 2: Types of subject produced by Ana from 2;4.11 to 2;10.2915 (apud Magalhães 2006: 66). Subject types: NS, expletives, weak pronouns, demonstratives, nouns; x-axis: age (years;months.days); y-axis: percentage

(a) In the early stages, her acquisition is similar to that of children acquiring other languages: she produces root infinitives (Rizzi 1992) and imperatives, but also minimal answers, which can be analyzed as extractions of the focalized element of the questions.
(b) As the child comes close to the target grammar, however, the OSP takes over and the NS becomes less frequent; the two do not constitute “doublets”.

14 A full version of the work where I develop these hypotheses can be read in Kato (2011).
15 MLU (mean length of utterance): 1.5–2.7.

But the most interesting data come from Magalhães’ (2003) study on the acqui-
sition of the logophoric NS (LNS) through schooling.

Table 1: Pronominal and logophoric null subjects in complement clauses (adapted from Magalhães 2003)

       1st-graders   3rd/4th-graders16   7th/8th-graders17
OSP    97.89 %       78.0 %              50.38 %
LNS     2.11 %       22.0 %              49.62 %

In sum, children exhibit an insignificant rate of LNS when they start school, but
show real variation between OSP and LNS when they end secondary education.
Considering that the LNS is learned by instruction, its place is in the periphery of
the literate Brazilian I-language. The individuals “code-switch” between the OSP,
learned through selection during language acquisition, and the LNS, learned
through instruction.18

Figure 3: Core and periphery in BP grammar (Kato 2011: 327)

3 Variation in wh-question constructions in BP

3.1 The existing list of wh-constructions in BP

A more complicated case is the variation in BP regarding wh-constructions. The two wh-constructions in (2) (O que a Ana viu?/A Ana viu o que?) are not the only types of wh-constructions in BP. The following are all the types that have existed since Old Portuguese (OP).19 Here, we also include the judgments regarding the grammar of EP.

16 Around 7 and 8 years old.
17 Around 13 and 14 years old.
18 The same happens with third-person clitics in BP, which are late acquisition items, and are in variation with weak non-clitic pronouns (cf. Kato, Cyrino and Correa 2009).

(7) a. (o) que20 comprou a Ana? Wh-VS type OP EP %BP21
what bought the Anne?

b. (o) que é que a Ana comprou? Wh-é-que SV type OP EP BP
what is that the Anne bought

c. é o que que a Ana comprou?22 É-Wh-que type *OP *EP BP
is what that the Anne bought

d. (o) que que a Ana comprou? Wh-que-SV type *OP *EP BP
what that the Anne bought

e. o que a Ana comprou? Wh-SV type *OP23 *EP BP
what the Anne bought

f. a Ana comprou o que? Wh-in-situ type *OP %EP24 BP
the Anne bought what
‘What has Anne bought?’

Below are the quantitative data found in Lopes-Rossi’s (1996) study covering the
nineteenth century to today (Table 2).
Wh-questions in Brazilian Portuguese underwent two basic changes: a) a change in word order, from Wh-VS to Wh-SV to wh-in-situ, and b) a shift from the VS type to the cleft wh-question introduced by the copula, with SV order (Wh-é-que-SV); this cleft type is also found in EP. The reduced type of the cleft construction (7d), however, is a Brazilian phenomenon. The most impressive quantitative change, exclusive to BP, can be seen in the wh-in-situ construction: from zero occurrences in the first half of the nineteenth century, it reaches 32.4 % in TV spoken language,26 which has more mixed types of speakers than the spoken corpus of educated Brazilians (NURC) and probably reflects the Brazilian vernacular better.

Table 2: The evolution of wh-questions over time in BP (adapted from Lopes-Rossi 1996)

                           Written language                              Spoken language
Type of wh-question        1800–1850  1850–1900  1900–1950  1950–2000    NURC corpus25   TV
Wh V(S) (a)         %      95.8       88.0       67.0       21.9         18.3            13.9
                    no.    (137)      (29)       (25)       (13)         (13)            (39)
Wh é que (S)V (b)   %      2.8        6.0        29.8       29.7         37.0            18.0
                    no.    (4)        (2)        (6)        (34)         (27)            (52)
Wh que (S)V (d)     %      0          0          0          7.9          21.0            18.6
                    no.    (0)        (0)        (0)        (9)          (15)            (53)
Wh SV (e)           %      1.4        3.0        3.0        12.3         11.2            16.1
                    no.    (2)        (1)        (1)        (14)         (8)             (45)
Wh-in-situ (f)      %      0          3.0        0          28.1         12.5            32.4
                    no.    (0)        (1)        (0)        (32)         (9)             (90)
Total                      100 %      100 %      100 %      100 %        100 %           100 %

19 Old Portuguese also had the reverse pseudo-cleft, which disappeared with the appearance of the reverse cleft type. Here we do not include its analysis (see examples and analysis in Kato and Ribeiro 2009).
20 Que and o que are in variation, the former preferred by Europeans and the latter by Brazilians.
21 In BP, the order Wh-VS is only acceptable with formulaic expressions, unaccusative verbs and wh-constructions of the quanto type (cf. Kato 2000; Kato and Duarte 2002).
(i) Como vai o Pedro? (ii) Quando chegam eles? (iii) Quanto ganha a Ana?
‘How is Peter?’ ‘When do they arrive?’ ‘How much does Ana earn?’
22 Examples of this type were not found in the author’s corpus, but are common in children’s speech, as will be seen below.
23 Like EP, OP licenses the order Wh-SV if the wh-expression is of the D-linked type:
(i) Que carro a Maria comprou? (ii) Em que cidade o João nasceu?
‘Which car has Maria bought?’ ‘In which city was John born?’
24 While BP has no structural restrictions on wh-in-situ structures, EP is more constrained, and its frequency of wh-in-situ is much lower.

3.2 Complementing the data with child language

In the analysis of an adult corpus of Brazilian Portuguese, we do not find the


in-situ cleft like the one in (7c) (É o que que a Ana comprou?). However, we do
find such forms in French, according to Noonan (1989),27 which exhibits a similar
variation in wh-questions:

(8) a. (c’est) où que t’as mis les oranges?
it is where that you have put the oranges
b. où que t’as mis les oranges?
where that you have put the oranges
c. où t’as mis les oranges?
where you have put the oranges
‘Where have you put the oranges?’

25 From NURC (Norma Urbana Culta), a corpus of educated Brazilians.
26 Mostly from soap operas.
27 Noonan (1989) provides a deletion analysis similar to the one provided here.

What is surprising is that Lessa (2003) found in-situ clefts with an initial copula, like (9a–d), both in Brazilian children’s speech and in their mothers’ input.28

(9) a. é o que que cê qué, filha? (mother’s input)
is what that you want, baby?
‘What is it that you want, baby?’

b. é quem que tá tomano banho? (mother’s input)
is who that is taking bath?
‘Who is it that is taking a shower?’

c. é quem que tá tocano o violão? (Luana, 02;03.22)
is who that is playing the guitar?
‘Who is it that is playing the guitar?’

d. é que que tá gravano? (Luana, 02;03.22)
is who that is recording?
‘Who is it that is recording?’
(apud Lessa 2003: 41–45)

Normally in historical linguistics we only make use of written data. But, as became clear from wh-constructions in BP, in order to determine what is possible in the I-language of Brazilians, it was also necessary to appeal to what the child produces as part of his/her core grammar. Surprisingly, in-situ clefts are also found in the mothers’ input. Although adults rarely produce them, when these sentences were later tested with some speakers, the general reaction was that they would not use them themselves, but that they could easily understand them when hearing them.
The most important things that we learn through Lessa’s (2003) study are the
following:
a) Contrary to expectations, the two children produce the full range of variation
in wh-questions.
b) The author correlates the earliness of each variant with the frequency of the variant in the mothers’ input.
c) In both children the in-situ case is the earliest one.29

28 Notice that although the wh-element is not in initial position, the main stress falls on it; with the copula in initial position, the prosodic pattern changes, with the stress on the second syllable.
29 The author claims that in some dialects the in-situ type is not the most frequent one, and in
this case the most frequent variant would be acquired first. According to her, this would account
for the late acquisition of wh-in-situ in children born in São Paulo compared to the early acqui-
sition of children born in the Brazilian Northeast. This is very much in line with recent work by
Yang (2002), who claims that frequency in the input accounts for early acquisition.

Here, we will adapt Lessa’s study, using not the frequency in each mother’s speech but the general frequency found in adults talking on TV, as reported in Lopes-Rossi’s (1996) study. The conclusions are the same: the most frequent form in the input is the first to be acquired by children, and the least frequent pattern in adult speech is the last to be learned.

Table 3: Frequency of adult types of wh-questions and earliness of acquisition of each type by the children L and E

                Adult TV (%)                       Children30: order of acquisition
                (adapted from Lopes-Rossi 1996)    (adapted from Lessa 2003)
                                                   L       E
Wh-in-situ      32.4                               1st     1st
Wh-que-SV       19.6                               2nd     4th
Wh-é-que-SV     18.0                               4th     3rd
Wh-SV           16.1                               3rd     2nd
É-que-que-SV     0                                 5th     5th
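Table 3 reports ranks, not a correlation statistic. As a purely illustrative sketch, and not part of Lessa’s (2003) or Lopes-Rossi’s (1996) analyses, the Python snippet below computes Spearman’s rank correlation between the adult TV frequency ranking and each child’s order of acquisition, using only the figures in Table 3. The dictionaries and function names are our own hypothetical choices, and the formula used assumes untied ranks.

```python
# Hypothetical illustration: Spearman rank correlation between the adult TV
# frequency ranking in Table 3 and each child's order of acquisition.
# The statistic is not computed in the source; it only summarizes Table 3.

ADULT_TV_FREQ = {  # adult TV percentages, as given in Table 3
    "Wh-in-situ": 32.4,
    "Wh-que-SV": 19.6,
    "Wh-é-que-SV": 18.0,
    "Wh-SV": 16.1,
    "É-que-que-SV": 0.0,
}

ACQUISITION_ORDER = {  # 1 = acquired first, as given in Table 3
    "L": {"Wh-in-situ": 1, "Wh-que-SV": 2, "Wh-é-que-SV": 4, "Wh-SV": 3, "É-que-que-SV": 5},
    "E": {"Wh-in-situ": 1, "Wh-que-SV": 4, "Wh-é-que-SV": 3, "Wh-SV": 2, "É-que-que-SV": 5},
}


def frequency_ranks(freqs):
    """Rank types by descending frequency (1 = most frequent); assumes no ties."""
    ordered = sorted(freqs, key=freqs.get, reverse=True)
    return {wh_type: i + 1 for i, wh_type in enumerate(ordered)}


def spearman_rho(ranks_a, ranks_b):
    """Spearman's rho for untied ranks: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(ranks_a)
    d_squared = sum((ranks_a[t] - ranks_b[t]) ** 2 for t in ranks_a)
    return 1 - 6 * d_squared / (n * (n * n - 1))


adult_ranks = frequency_ranks(ADULT_TV_FREQ)
for child, order in ACQUISITION_ORDER.items():
    print(child, round(spearman_rho(adult_ranks, order), 2))
```

Run as is, it prints 0.9 for L and 0.6 for E: both children acquire the most frequent type first and the unattested É-que-que-SV type last, while the middle ranks diverge from the adult ranking, more so for E.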

An explanation as to why children exhibit variation in their core grammar, apparently in contrast to our assumption in section 2, will be provided after the structural analysis.

3.3 The structure of wh-questions in BP

3.3.1 The structure of fronted wh-questions in Portuguese


The present analysis assumes a cartographic approach, with a fine-grained left sentential periphery (Rizzi 1997) and a low vP periphery as in Belletti (2004).

(10) [ForceP [TopP [FocP [TP T [TopP [FocP [vP … [VP …

Any interrogative sentence, whether it is a wh-question or a yes/no question, must have the head of ForceP filled with a null Q in order to be interpreted as a question. In Portuguese, Q has only a prosodic manifestation, while in languages like Japanese it is realized as the clause-typing interrogative morpheme ka/no.

30 L and E are the initials of the children.

Focus (F) is a syncretic head which checks both focus and wh-elements in Portuguese.31 In OP and EP, the verb has been analyzed as moving from T to the head of FocP, or traditionally CP, producing the V2 effect.32 Here, however, I follow Kato and Raposo’s (1996) analysis, in which the wh-operator moves to Spec of FocP, but the verb stays in T, with F null (Ø), though bearing Focus features. The subject stays in vP, unlike in Germanic languages. The only context where the head F is not null is when the verb moves to it in OP and EP, and Spec of FocP is left empty. In the representation below, a wh-question has a silent Q in the head of ForceP, marking the sentence as interrogative, a wh-element in Spec of FocP, whose head is null, and a VP containing the copy/trace of the wh-element.

(11) [ForcePQ [FocP O que Ø+F [TP comprou [vP a Ana tcomprou [VP … to que]]]]]

In Kato and Raposo’s (1996) analysis, the verb moves to the Focus head in OP and EP only when the whole sentence is focalized, resulting in enclisis (cf. (13a)). When only the wh-element or the focalized element moves to Spec of FocP, the verb stays in T and the resulting pattern is proclisis (cf. (13b, c)). In BP, the verb always stays in T, the subject moves to Spec of T, and the result is generalized proclisis (cf. (13d)).

(12) a. amou-as o Pedro OP EP *BP
loved-them the Peter
‘Peter loved them.’

b. muitas mulheres o amaram OP EP BP
many women him loved
‘Many women loved him.’

c. quem te amou? OP EP BP33
who you loved?
‘Who loved you?’

d. o Pedro me amou *OP *EP BP
the Pedro me loved
‘Peter loved me.’

31 In Portuguese wherever you find a wh-element, you also find a marked focus element.
(i) Quem é que chegou? (ii) A ANA é que chegou.
‘Who is it that arrived?’ ‘It is ANA who arrived.’
32 See Ambar 1992 and Lopes-Rossi 1996.
33 BP has lost the third-person clitics, but proclisis is possible with first- and second-person
clitics.

(13) a. [FP amouV+T Ø+F [TP as tV+T [vP o Pedro [VP tV tas]]]]
b. [FP MUITAS MULHERESi Ø+F [TP o amaramV [vP ti [VP tV to]]]]
c. [ForceP Q [FP quemwh Ø+F [TP te amouV [vP tquem [VP tV tte]]]]]
d. [ForceP Q Ø+F [TP o Pedroi me amou [vP ti tV [VP tV tme]]]]

3.3.2 The cleft types of wh-questions


Starting in the seventeenth century, the old Wh-VS competed with the reverse
cleft,34 with the copula occupying T. We have here a sort of grammaticalization of
T, as in wh-questions no thematic verb raises from V to T. But we still have a kind
of V2 effect with the copula.

(14) [ForcePQ [FocP o que Ø+F [TP é [vP té [CP que [TP a Ana comprou to que]]]]]]

The great innovation, exclusive to BP, took place in the second half of the nineteenth century and the first half of the twentieth century. Recall that the above patterns (7c, d, e, f), repeated below as (15a, b, c, d), are exclusively Brazilian. We consider the wh-in-situ in BP to have two different types: an echo-question type with rising intonation, which BP shares with EP, and an ordinary question type with falling intonation, which seems to be exclusive to BP.35

(15) a. É o que que a Ana comprou? In-situ cleft type *OP *EP BP
b. (O) que que a Ana comprou? Reduced cleft type *OP *EP BP
c. O que a Ana comprou? Wh-SV type *OP *EP BP
d. A Ana comprou o que? Wh-in-situ type *OP %EP BP

Starting in the nineteenth century, instead of using the FocP position in the periphery of the sentence, BP began using the low FocP position adjacent to the copula, à la Belletti (2004).

(16) [ForceP Q [TP foi [FocP o que Ø+F [vP tfoi [VP tfoi que [TP a Ana comprou to que]]]]]]

At this stage, the copula undergoes a process of grammaticalization, becoming invariable and no longer agreeing in tense or person.

(17) a. é quem que chegou? (vs. foi quem que chegou?, the old form)
Is who that arrived?
‘Who is it that arrived?’

34 Recall that in OP we had only reverse pseudo-clefts or wh-clefts.
35 There may be dialectal differences in Portugal, with some dialects sharing the same distinction as in BP.

b. é A MARIA que chegou (vs. foi A MARIA que chegou)
is the Maria that arrived
‘It was Mary that arrived.’

c. é AS CRIANÇAS que chegaram (vs. foram AS CRIANÇAS que chegaram)
is the children that arrived
‘It was the children that arrived.’

In a study unrelated to clefts, Kato (2007) shows that in BP, when the present-tense copula is sentence-initial, it is usually dropped.

(18) a. O seu menino é inteligente > *Seu menino inteligente
the your boy is intelligent > your boy intelligent

b. É inteligente o seu menino > Inteligente o seu menino
is intelligent the your boy > intelligent the your boy
‘He is quite intelligent, your boy is.’

This change in the focus position allowed V1 constructions, which in turn allowed
the copula in initial position, favoring its erasure.

(19) a. (é) quem que chegou?
is who that arrived?
‘Who is it that arrived?’

b. (é) o que que ele quer?
is what that he wants?
‘What is it that he wants?’

c. (é) de que que ele está rindo?
is of what that he is laughing?
‘What is he laughing at?’

Notice that with the erasure of the copula, we find expressions that lead to the
phenomenon of haplology, a sound change that involves the loss of a syllable
when it is next to a phonetically identical (or similar) one:

(20) a. Quem que chegou? > a’. Quem ( ) chegou?
b. O que que ele quer? > b’. O que ( ) ele quer?
c. De que que ele está rindo? > c’. De que ( ) ele está rindo?

This is probably because haplology leads literate speakers to reject these forms on stylistic grounds. The result is the PF erasure of the complementizer.

Recall that the resulting cases have been analyzed as a zero complementizer in
previous work by me and others.36
In (21) we can see the evolution of wh-questions over time:
(i) from a. to b. the raising of a thematic verb is replaced by the raising of the
copula;
(ii) from b. to c. the position of the FocusP projection changes, from the high
sentential periphery to the low vP adjacent position;
(iii) from c. to d. the copula grammaticalizes, becomes invariable and allows
erasure;
(iv) from d. to e. the resulting form can undergo haplology in PF, licensing the
erasure of the complementizer.

(21) a. [FocP o que Ø [IP comprou [vP a Maria [VP tcomprou to que]]]]
b. [FocP o que Ø [IP é [VP té [CP que [IP a Maria comprou to que]]]]]
c. [IP é [FocP o que Ø [VP té [CP que [IP a Maria comprou to que]]]]]
d. [IP ( ) [FocP o que Ø [VP té [CP que [IP a Maria comprou to que]]]]]
e. o que (que) a Maria comprou

It is important to note that the only structural change was the use of the lower focus position, adjacent to vP, whereas in the older periods the focus was always fronted to the sentential periphery.37
The old form shown in (21a) is no longer in the child’s core grammar, but the reverse cleft, still very much present in the child’s input, is part of the child’s core grammar. The truly innovative types derived from the in-situ cleft are all there, and they do not count as doublets because they are phonological, not syntactic, variants.
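The derivational steps (iii) and (iv) above can be mimicked with a toy string-level model. The sketch below is a hypothetical illustration, not Kato’s formalism: it represents the in-situ cleft as a flat list of phrases and applies two optional PF erasures in a fixed order, first erasure of the copula (only when it is sentence-initial), then haplology of the complementizer (only right after a wh-phrase ending in que or quem, our rough approximation of “phonetically identical (or similar)”). The names erase_copula, haplology and pf_variants are our own.

```python
# Toy model (hypothetical): surface variants of one in-situ cleft obtained by
# optional PF erasures, in the order (21c) > (21d) > (21e).
from typing import List, Optional

COPULA = "é"   # invariable copula
COMP = "que"   # complementizer


def erase_copula(units: List[str]) -> Optional[List[str]]:
    """Erase the copula, but only in sentence-initial position."""
    if units and units[0] == COPULA:
        return units[1:]
    return None  # rule not applicable


def haplology(units: List[str]) -> Optional[List[str]]:
    """Erase the complementizer right after a wh-phrase ending in que/quem.

    Assumption: applies only once the wh-phrase is initial (copula gone).
    """
    if len(units) >= 2 and units[0] != COPULA and units[1] == COMP:
        if units[0].split()[-1] in {"que", "quem"}:
            return [units[0]] + units[2:]
    return None


def pf_variants(units: List[str]) -> List[str]:
    """Apply the erasures in order, collecting every surface form reached."""
    forms = [units]
    for rule in (erase_copula, haplology):
        reduced = rule(forms[-1])
        if reduced is None:
            break  # the later erasure presupposes the earlier one
        forms.append(reduced)
    return [" ".join(form) + "?" for form in forms]


for sentence in pf_variants(["é", "o que", "que", "a Ana comprou"]):
    print(sentence)
# é o que que a Ana comprou?
# o que que a Ana comprou?
# o que a Ana comprou?
```

All three outputs derive from the same input list, i.e. the same lexical items, which is the sense in which they are phonological rather than syntactic variants. A wh-in-situ string such as ["a Ana comprou o que"] contains neither a copula nor a complementizer, so no erasure applies and it surfaces unchanged.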

36 See, for instance, Kato and Duarte 2002 and Hornstein, Nunes and Grohmann 2005. For the
former, the loss of the VS order in wh-questions was triggered by the loss of the null subject,
which caused Spec of TP to be filled with a weak pronoun. We may assume that we had two
independent triggers and, for some speakers who lost the null subject earlier, the origin of the
order WH-SV may have had a structural motivation.
37 However, in declarative focalization, the focalized element makes use of this low FocP, as
seen in examples such as (i) and (ii):
(i) Foi A MARIA que comprou a casa.
(ii) [IP Foi [FocP A MARIA [vP ser [que a Maria comprou a casa]]]]

3.3.3 The structure of wh-in-situ interrogatives38


In the early days of the Principles and Parameters theory, Huang (1982) proposed a wh-parameter, according to which languages are divided into two types: (a) those that move the wh-element overtly in syntax (e. g. English), and (b) those that move it only in LF (e. g. Japanese). BP has been shown to be a peculiar language, since it looks as if wh-movement is optional.39 Moreover, it has two types of wh-in-situ: (a) an ordinary question, with a falling intonation, and (b) an echo question, with a rising one, as in yes/no questions.

(22) a. onde ele foi agora?
where he went now?
‘Where did he go now?’

b. ele foi onde agora? (ordinary question: falling intonation)
he went where now?
‘Where did he go now?’

c. ele foi agora onde? (echo question: rising intonation)
he went now where?
‘He went where now?’

The analysis of the echo wh-in-situ question will not be discussed in detail here. My view is that the only real in-situ type of wh-question is the echo one. For it, I assume Kayne’s (1994) analysis of wh-in-situ in general, in which what moves to Spec of F is the entire sentence, as with yes/no questions.

(23) a. [ Q [FocP [TP você viu a Maria] [TP tTP]]]
you saw the Maria
‘Did you see Mary?’
b. [ Q [FocP [TP você viu quem ] [TP tTP] ]]
you saw who

This analysis captures the fact that in BP echo questions have the same prosody as yes/no questions. However, it does not apply to authentic wh-questions, which have falling intonation in BP.
With regard to the latter type of wh-questions, Kato (2013) analyzes BP as always having an obligatory last-resort wh-movement of a short type. The idea was inspired by Miyagawa (2001), for whom Japanese has wh-movement of a short type to T. Kato considers the wh-in-situ cases in BP to be fake in-situ constructions, with the wh-element undergoing a short movement. However, in contrast to Miyagawa, Kato claims that the movement is to a low vP-adjacent FocP, a position proposed by Belletti (2004).

38 The first version of this work, where two types of wh-in-situ were identified, was presented in Kato (2013).
39 French is also a language with a kind of optional wh-in-situ, but the prosodic description is different, as it has only a rising intonation.
As in Miyagawa’s analysis of Japanese (see also Cheng and Rooryck’s 2000 analysis of French), Kato assumes a Q in CP (or ForceP, in Rizzi’s 1997 terms), which clause-types interrogative sentences. In Japanese this Comp is overt (ka/no), but in BP it is null; like Japanese ka/no, it also appears in yes/no questions and has no wh-feature to check. The difference in intonation between yes/no questions and authentic wh-in-situ questions, which share the same Q-morpheme in Comp, is due to the clause-internal FocP position, which is responsible for the falling intonation:

(24) a. [ForcePQ [TP você conheceu a Maria/ quem]] (/) (echo-question)
you met the Maria/who
‘Have you met Maria?’

b. [ForcePQ [TP vocêi conheceu [FocP quemq [Ø+wh [vP ti tv [VP tv tq]]]]]] (\) (wh-question)
you met who
‘Who have you met?’

Notice, moreover, that the position of the wh-element is the same as the one that
is exclusive to BP in the other cleft wh-constructions:

(25) a. [ForceP Q [TP é [FocP o que Ø+wh [VP té [CP que [TP a Maria comprou to que]]]]]]
b. [ForceP Q [TP ( ) [FocP o que Ø+wh [VP té [CP que [TP a Maria comprou to que]]]]]]
c. [ForceP Q [TP A Mariai comprouV [FocP o que Ø+wh [vP ti [VP tV to que]]]]]

There is no optionality between (25a) and (25b) on the one hand and the wh-in-situ case on the other, as they have different numerations: the in-situ case has no copula or complementizer.
To summarize the structure of wh-questions, there are indeed only two basic structures: one that results in the wh-in-situ structure, and one that results in all the others, derived from a canonical cleft through successive phonological erasure. But the two structures belong to the same grammar, as in both the wh-element occupies the same low vP-adjacent FocP position.

4 Conclusion

This article has discussed two types of variation in Brazilian Portuguese: (a) the variation of overt pronouns and null categories as subjects of complement clauses and (b) the variation in wh-questions. The comparison is interesting in that, with regard to the former, variation has to do with usage, as the null subject is acquired as a consequence of instruction. Society, through education, attempts to preserve the old form as a variant of the OSP, which is acquired through selection. Other properties of the null subject parameter, such as subject-verb inversion, whose loss has affected the grammar of Brazilian Portuguese (cf. Berlinck 2000; Figueiredo Silva 2000, inter alia), are thus ignored by schools. Thus, the learning of the null subject as a logophoric element does not affect grammar as a system.
The variation in wh-questions also has to do with grammatical change. However, society has no conscious knowledge of it and does not try to preserve the old form through schooling. The change of the Focus position, from the left periphery to the internal vP-adjacent position, is compensated for by the appearance of the cleft question with the copula and of the so-called wh-in-situ construction. With the weakening of agreement, grammaticalization processes that mainly affect morphology and phonology lead to the erasure of the copula in the cleft question. All of these steps are purely grammatical. Except for the oldest form, Wh-VS, Lessa’s (2003) children were shown to produce all the variants of a single grammar, with the frequency of each variant in consonance with its frequency in the input. Thus, the child’s production unconsciously reflects the adult’s use of variation (cf. Yang 2002).

References
Ambar, Manuela (1992): Para uma Sintaxe da Inversão Sujeito-Verbo em Português. Lisboa: Ed.
Colibri.
Aronoff, Mark (1976): Word Formation in Generative Grammar. Cambridge, MA: MIT Press.
Belletti, Adriana (2004): Aspects of the low IP area. In: Luigi Rizzi (ed.), The Structure of IP and
CP. The Cartography of Syntactic Structures, 16–51. (Oxford Studies in Comparative Syntax
2.) New York: Oxford University Press.
Berlinck, Rosane (2000): Brazilian Portuguese VS order: a diachronic analysis. In: Mary A. Kato
and Esmeralda V. Negrão (eds.), Brazilian Portuguese and the Null Subject Parameter,
175–194. Frankfurt a. M./Madrid: Vervuert/Iberoamericana.
Cheng, Lisa and John Rooryck (2000): Licensing wh-in-situ. Syntax 3/1: 1–19.
Chomsky, Noam (1981): Lectures on Government and Binding. Dordrecht: Foris.
Chomsky, Noam (1988): Language and Problems of Knowledge: The Managua Lectures. Cambridge, MA: MIT Press.
Cyrino, Sonia M. L. (1993): Observações sobre a mudança diacrônica no português do Brasil:
objeto nulo e clíticos. In: Ian Roberts and Mary A. Kato (eds.), Português Brasileiro: Uma
viagem diacrônica, 163–184. Campinas: Editora da UNICAMP.
Duarte, M. Eugenia L. (1995): Perda do princípio “evite pronome” no Português Brasileiro.
Ph.D. dissertation, Instituto de Estudos da Linguagem, UNICAMP.
Variation in syntax: Two case studies on Brazilian Portuguese 109

Ferreira, Marcelo B. (2000): Argumentos Nulos em Português Brasileiro. MA thesis, UNICAMP.
Figueiredo Silva, M. Cristina (2000): Main and embedded null subjects in Brazilian Portuguese.
In: Mary A. Kato and Esmeralda V. Negrão (eds.), Brazilian Portuguese and the Null Subject
Parameter, 127–146. Frankfurt a. M./Madrid: Vervuert/Iberoamericana.
Holmberg, Anders (2001): The syntax of yes and no in Finnish. Studia Linguistica 55: 141–175.
Hornstein, Norbert, Jairo Nunes and Kleanthes Grohmann (2005): Understanding Minimalism.
New York: Cambridge University Press.
Huang, Cheng-Teh James (1982): Logical Relations in Chinese and the Theory of Grammar.
Ph.D. dissertation, MIT.
Kato, Mary A. (2000): The partial pro-drop nature and the restricted VS order in Brazilian
Portuguese. In: Mary A. Kato and Esmeralda V. Negrão (eds.), Brazilian Portuguese and the
Null Subject Parameter, 223–258. Frankfurt a. M./Madrid: Vervuert/Iberoamericana.
Kato, Mary A. (2007): Free and dependent small clauses in Brazilian Portuguese. DELTA 23
(Especial): Homenagem a Lucia Lobato: 85–111.
Kato, Mary A. (2009): O sujeito nulo revisitado no Português Brasileiro. In: M. Aparecida Morais
and M. Lucia O de Andrade (eds.), História do Português Paulista. Volume 2, 61–82.
Campinas: Editora da UNICAMP.
Kato, Mary A. (2011): Acquisition in the context of language change: the case of Brazilian
Portuguese null subjects. In: Esther Rinke and Tanja Kupisch (eds.), The Development
of Grammar: Language Acquisition and Diachronic Change, 309–330. Amsterdam/
Philadelphia: John Benjamins.
Kato, Mary A. (2013): Deriving wh-in-situ through movement. In: Victoria Camacho-Taboada,
Ángel Giménez-Fernández, Javier Martín-González and Mariano Reyes-Tejedor (eds.),
Information Structure and Agreement, 175–191. (Linguistik Aktuell/Linguistics Today 197.)
Amsterdam/Philadelphia: John Benjamins.
Kato, Mary A., Sonia Cyrino and Vilma R. Correa (2009): Brazilian Portuguese and the recovery
of lost clitics through schooling. In: Acrisio Pires and Jason Rothman (eds.), Minimalist
Inquiries into Child and Adult Language Acquisition: Case Studies across Portuguese,
245–272. (Studies in Language Acquisition 35.) Berlin/New York: Mouton de Gruyter.
Kato, Mary A. and M. Eugenia Duarte (2002): Diachronic analysis of Brazilian Portuguese
wh-questions. Santa Barbara Portuguese Studies, Volume VI, 326–339. University of
California at Santa Barbara: Center for Portuguese Studies.
Kato, Mary A. and M. Eugenia Duarte (2003): Semantic and phonological constraints in the
distribution of null subjects in Brazilian Portuguese. Paper presented at the NWAVE 32,
Philadelphia.
Kato, Mary A. and Esmeralda V. Negrão (eds.) (2000): Brazilian Portuguese and the Null Subject
Parameter. Frankfurt a. M./Madrid: Vervuert/Iberoamericana.
Kato, Mary A. and Eduardo Raposo (1996): European and Brazilian word order: questions,
focus and topic constructions. In: Claudia Parodi, Antonio Carlos Quicoli, Mario Saltarelli
and M. Luisa Zubizarreta (eds.), Aspects of Romance Linguistics, 267–277. Washington:
Georgetown University Press.
Kato, Mary A. and Ilza Ribeiro (2009): Cleft sentences from Old Portuguese to Modern Brazilian
Portuguese. In: Andreas Dufter and Daniel Jacob (eds.), Focus and Background in Romance
Languages, 123–154. Amsterdam/Philadelphia: John Benjamins.
Kayne, Richard (1994): The Antisymmetry of Syntax. Cambridge, MA: MIT Press.
Kroch, Anthony (1994): Morphosyntactic variation. In: Katharine Beals, Jeannette Denton,
Robert Knippen, Lynette Melnar, Hisami Suzuki and Erica Zeinfeld (eds.), Papers from the
30th Regional Meeting of the Chicago Linguistic Society: Parasession on Variation and
Linguistic Theory, 180–201. Chicago, IL: Chicago Linguistic Society.
Kuno, Susumu (1972): Pronominalization, reflexivization, and direct discourse. Linguistic
Inquiry 3/2: 161–195.
Lessa de Oliveira, Adriana (2003): Aquisição de constituintes-Qu em dois dialetos do português
brasileiro. MA thesis, UNICAMP.
Lightfoot, David (1991): How to Set Parameters. Cambridge, MA: MIT Press.
Lopes-Rossi, M. Aparecida (1996): A sintaxe diacrônica das Interrogativas-Q do Português.
Ph.D. dissertation, Instituto de Estudos da Linguagem, UNICAMP.
Magalhães, Telma V. (2003): Aprendendo o sujeito nulo na escola. Letras de Hoje 36/1:
189–202.
Magalhães, Telma V. (2006): O sistema pronominal sujeito e objeto na aquisição do Português
Europeu e Português Brasileiro. Ph.D. dissertation, Instituto de Estudos da Linguagem,
UNICAMP.
Miyagawa, Shigeru (2001): The EPP, Scrambling, and wh-in situ. In: Ken Hale and Michael
Kenstowicz (eds.), A Life in Language, 293–338. Cambridge, MA: MIT Press.
Modesto, Marcello (2000): Null subjects without rich agreement. In: Mary A. Kato and
Esmeralda V. Negrão (eds.), Brazilian Portuguese and the Null Subject Parameter, 147–174.
Frankfurt a. M./Madrid: Vervuert/Iberoamericana.
Noonan, Maire (1989): Operator licensing and the case of French interrogatives. In: E. Jane
Fee and Kathryn Hunt (eds.), Proceedings of the 8th West Coast Conference on Formal
Linguistics, 315–330. University of British Columbia: Stanford Linguistics Association.
Pagotto, Emilio (1993): Clíticos, mudança e seleção natural. In: Ian Roberts and Mary A. Kato
(eds.), Português Brasileiro: Uma viagem diacrônica, 185–206. Campinas: Editora da
UNICAMP.
Rizzi, Luigi (1994): Early null subjects and root null subjects. In: Teun Hoekstra and Bonnie
Schwartz (eds.), Language Acquisition Studies in Generative Grammar, 151–176.
(Language Acquisition and Language Disorders 8.) Amsterdam/Philadelphia: John
Benjamins.
Rizzi, Luigi (1997): The fine structure of the left periphery. In: Liliane M. Haegeman (ed.),
Elements of Grammar: Handbook of Generative Syntax, 281–337. Dordrecht: Kluwer.
Rodrigues, Cilene (2004): Morphology and null subjects in Brazilian Portuguese. In: David
Lightfoot (ed.), Syntactic Effects of Morphological Change, 160–178. Oxford/New York:
Oxford University Press.
Saito, Mamoru and Naoki Fukui (1998): Order in phrase structure and movement. Linguistic
Inquiry 29/3: 439–474.
Yang, Charles (2002): Knowledge and Learning in Natural Language. Oxford: Oxford University
Press.
Part 2: Rare phenomena and variation
Göz Kaufmann, University of Freiburg
Rare phenomena revealing basic syntactic mechanisms: The case of unexpected verb-object sequences in Mennonite Low German1

Abstract: The main focus of this article is dependent clauses with one verbal element in Mennonite Low German. In some of these clauses, the complement surfaces after the verb, thereby defying the expected word order of German varieties. This unexpected linearization pattern does not constitute a case of ungrammaticality but one of syntactic analogy, in line with the informants’ syntactic behavior with regard to verb clusters in dependent clauses with two verbal elements. Due to this relationship, the analysis of dependent clauses with one verbal element also sheds light on the structure of verb clusters.

1 Introduction

This article analyzes the distribution and structure of dependent clauses with one verbal element in which the ObjNP/PP (noun/prepositional phrase functioning as complement) unexpectedly surfaces after the finite verb, as illustrated in (1a). This example forms part of a data set of roughly 14,000 sentences translated from English, Spanish, or Portuguese into Mennonite Low German (MLG). The translations were elicited in six Mennonite colonies in North and South America between 1999 and 2002. Example (2a) shows a dependent clause with two verbal elements, again with the ObjNP surfacing in post-verbal position. Both (1a) and (2a) contrast with the expected serializations in (1b) and (2b):

1 I would like to thank Leonie Cornips, Martin Pfeiffer, Peter Öhl, Peter Auer, and Aria Adli for
their helpful comments. The usual disclaimers apply.
114 Göz Kaufmann

stimulus <11> Spanish: Si él firma ese contrato, va a perder mucho dinero
English: If he signs this contract, he will lose a lot of money

(1) a. wann hei unterschrieft [0.4] diesen contrato [0.6] dann verliest der viel Geld2
(Mex-26; m/34/MLG3)
if he signs-VERB […] this contract […] then loses he much money

b. wann hei det Kontrakt [ehm] unterschrieft dann wird her viel Geld verlieren
(Mex-77; f/46/MLG)
if he the.NEUTER contract [ehm] signs-VERB then will he much money lose

stimulus <26> Spanish: Necesita lentes porque no puede ver el pizarrón
English: He needs glasses because he can’t see the blackboard

(2) a. dü bruuks: [0.7] Brill wiels dü nich sehne kanns die Tofel (Bol-4; m/44/MLG)
you need […] glass because you not see-VERB2 can-VERB1 the blackboard

b. de bruukt ne Brill wegens her nich de [0.6] Tofel sehne kann (Bol-8; m/20/MLG)
he needs a glass because he not the.REDUCED […] blackboard see-VERB2 can-VERB1

The translations in (1a) and (2a) are rare responses to stimulus sentences <11> and <26>:4 (1a) appears once in eighty translations with one verbal element (1.3 %); (2a) appears twice among 311 translations with two verbal elements (0.6 %). Their rarity is probably caused by the unexpected post-verbal position of the complements, i. e. diesen contrato (‘this contract’) and die Tofel (‘the blackboard’). At first sight, one may assume that a priming effect in (1a) and (2a) causes the marked sequence (but cf. the discussion of Table 4 below). This could be either a case

2 The representation of MLG does not claim phonetic accuracy. Filled pauses are indicated in brackets ([eh] or [ehm]); unfilled pauses are indicated with their length if they are longer than 0.3 seconds. Break-offs or repairs are marked with a hyphen; a colon represents a markedly prolonged pronunciation of a phonetic segment. The parts of the translations relevant for the analysis are underlined. In the glosses, only relevant grammatical information such as the hierarchy among verbal elements, particles, and deviating gender or case of ObjNPs is given. Underlined elements in the glosses represent semantic deviations from the stimulus sentence; a ∅ represents an element which was not translated. Crossed-out elements represent cases where the informant included words not present in the stimulus sentence. Whenever the interview language was Spanish or Portuguese, the stimulus version is given both in that language and in English.
3 All translations presented are coded according to the informants’ origin (Mex = Mexico;
USA = USA; Bra = Brazil; Men = Menno, Paraguay; Fern = Fernheim, Paraguay; Bol = Bolivia) and
their coding number. Also given are the sex of the informant (m(ale) or f(emale)), his or her age
in years, and his or her dominant language(s) (MLG (Mennonite Low German), SHG (Standard
High German), Engl(ish), Span(ish), or Port(uguese)).
4 The rare phenomenon dealt with in this article is not rare in the typological sense, i. e. the
sequence verb-object in dependent clauses with one verbal element is obviously not a rare phe-
nomenon in the languages of the world. It is, however, a rare phenomenon in MLG and in most
Continental West Germanic languages.
Rare phenomena revealing basic syntactic mechanisms 115

of short-term priming due to the translation task (all stimulus languages feature the sequence verb1-(verb2-)ObjNP/PP) or the long-term consequence of contact-induced language change. In (2a), however, priming could only be part of the explanation, since the two verbal elements do not appear in the order of the Spanish stimulus sentence (sehne-VERB2 kanns-VERB1 vs. puede-VERB1 ver-VERB2; both ‘can see’).
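The percentages in the preceding paragraph are simple token shares. As a minimal sketch (the helper name `share` is ours, not the author's), they can be reproduced with half-up rounding to one decimal place. Python's built-in `round` rounds exact halves to even and would turn 1.25 into 1.2, which is why the `decimal` module is used here.

```python
from decimal import Decimal, ROUND_HALF_UP


def share(tokens: int, total: int) -> Decimal:
    """Percentage of `tokens` in `total`, rounded half-up to one decimal place."""
    percent = Decimal(100 * tokens) / Decimal(total)
    return percent.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


# Counts as reported in the text for stimulus sentences <11> and <26>.
print(share(1, 80))   # (1a): one token among 80 one-verb translations -> 1.3
print(share(2, 311))  # (2a): two tokens among 311 two-verb translations -> 0.6
```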
In the Mennonite data set, there are 27 tokens like (2a) in a data pool of 3,120 comparable clauses with two or more verbal elements (0.9 %). Due to space limitations, we will not be able to analyze these tokens thoroughly. The structure of (2a) and the explanations needed for its occurrence are, however, completely different from those of (1a). The major structural difference is that the ObjNP in (2a) appears after two or more verbal elements, while there is just one verbal element in (1a). The need for different explanations is underlined by the fact that the informants who produce tokens like (2a) are significantly older than the ones who produce tokens like (1a) (36.3 vs. 27.5 years; F(1, 84) = 10.2, p = 0.002**), that they come from different colonies, and that their syntactic behavior with regard to verb clusters with two verbal elements is not comparable. We will, therefore, focus on tokens like (1a): there are 59 tokens like (1a) in 1,837 translations with one verbal element (3.2 %; only stimulus sentences with at least one token like (1a) are included in this calculation). Their occurrence is unrelated to translation problems, since almost all informants producing these tokens are fluent in both MLG and the respective majority language. In Section 4.2.3, we will see that the occurrence of these translations is best explained by the informants’ syntactic preferences with regard to clusters with two verbal elements (for an earlier analysis of this phenomenon, cf. Kaufmann 2007: 193–198).
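The reported F(1, 84) can be cross-checked against the token counts. For a one-way ANOVA with k groups and N observations, the degrees of freedom are k − 1 and N − k. Assuming (our reading, not stated in the text) that the age comparison runs over the 59 tokens like (1a) and the 27 tokens like (2a) rather than over informants, N = 86 and the degrees of freedom come out as reported. An informant-level test could not yield 84 within-group degrees of freedom, since 46 informants produced the tokens like (1a) and at most 27 can have produced the tokens like (2a). The function name below is hypothetical.

```python
def one_way_anova_df(group_sizes: list[int]) -> tuple[int, int]:
    """Return (between-groups df, within-groups df) for a one-way ANOVA."""
    k = len(group_sizes)
    n = sum(group_sizes)
    return k - 1, n - k


# Assumption: one age value per token, grouped by variant.
TOKENS_LIKE_1A = 59
TOKENS_LIKE_2A = 27
print(one_way_anova_df([TOKENS_LIKE_1A, TOKENS_LIKE_2A]))  # (1, 84)

# Upper bound if the test were run over informants instead:
# 46 informants produced the (1a) tokens, at most 27 the (2a) tokens.
print(one_way_anova_df([46, 27]))  # (1, 71), i.e. not the reported df
```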
Section 2 gives some historical and linguistic information about the Mennonites in the Americas; Section 3 explains the elicitation procedure. The central Section 4 is dedicated to the (socio)linguistic analysis of the variant represented by (1a). In this section, we will also develop the structure of this variant. In Sections 4.4.1 and 4.4.2, the influence of the morphological shape of the complement, i. e. whether it is marked by a preposition or not and whether it is definite or indefinite, will be dealt with. Section 5 concludes by showing how empirical studies of rare phenomena can contribute to the advancement of linguistic theory.

2 History and languages of the American Mennonites

The Mennonites originated in East Holland, Frisia, Flanders, and what is today northwest Germany. In these regions, Anabaptist communities had formed during the Reformation. Due to religious persecution, many of these Anabaptists emigrated to West and East Prussia during the sixteenth century. There, a koiné was formed out of the varieties the Mennonites had used in their homelands and the local varieties of Low German. Moreover, the Mennonites began to use Standard High German (SHG) instead of Dutch for official purposes such as church service and schooling. When the Prussian government imposed stricter rules on the Mennonites in the eighteenth century, some of them accepted an invitation by Catherine II of Russia to settle in Ukraine. Around 1870, however, Russian officials introduced laws to ensure a certain degree of integration, causing the more tradition-bound Mennonites to emigrate to Canada. During and after World War I, the situation of German-speaking immigrants in Canada became difficult. Again, the more conservative members did not yield to outside pressure. Some decided to move to Mexico, where most of them settled in the northern state of Chihuahua (Ciudad Cuauhtémoc; 40,000 people; colonies for which the number of inhabitants is given are analyzed in this paper). Other Mennonites found a new home in Paraguay and set up the colony of Menno (9,000 people). Mennonites from Mexico and from Menno founded several daughter colonies, most importantly Santa Cruz de la Sierra in Bolivia (50,000 people), various communities in Belize, and one in Seminole, Texas (4,000 people). The Mennonites who stayed in Russia in the 1870s accepted the new situation and introduced a more modern school system with more emphasis on SHG. Due to their economic success, these Mennonites faced severe persecution in the Soviet Union when Stalin gained absolute power in 1927. Because of these unfavorable prospects, many Mennonites tried to leave the Soviet Union, and some succeeded in emigrating to Canada, Paraguay (colony Fernheim; 4,000 people), and Brazil (Colônia Nova; 1,000 people) in 1930.
These different migration paths led to different language repertoires. Besides MLG and SHG, these include the majority language of each colony’s host country and possibly other languages such as Guaraní and local tribal languages in Paraguay. MLG is still the unrivaled in-group language in Mexico, Bolivia, and the Paraguayan colony Menno. It is weakest in the United States and Brazil, the two colonies where competence in the respective majority language is best. In the Paraguayan colony Fernheim, some families already use SHG instead of MLG at home. With regard to SHG, the two Paraguayan colonies benefit from their modern school system, in which this language is both a subject of learning and a medium of instruction. Granted, this is also true for many schools in the more conservative colonies in the USA, Mexico, and Bolivia, but these schools can hardly be called modern. SHG used to play an important role in the Brazilian colony as well, but due to political intervention and the size of the colony it has lost this position.

3 The data set

The data analyzed consist of the oral translation of 46 stimulus sentences from English, Spanish, or Portuguese into MLG. The 313 informants5 did not have access to a written version of the stimulus sentences. The stimuli were read aloud one at a time, and each was translated immediately. As the central interest of the project is clause-final verb clusters, the stimulus sentences were designed to allow the analysis of three linguistic factors: (a) the type of finite verb, (b) the number of verbal elements, and, for dependent and introduced clauses, (c) the type of clause. The different cluster types are distributed over six main clauses and four types of dependent clauses: ten restrictive relative clauses, ten preposed conditional clauses, ten extraposed causal clauses, and ten extraposed complement clauses. All main verbs in the stimulus sentences are transitive, i. e. they govern an ObjNP/PP. Some sentences additionally contain an adverb.
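The stimulus design described above can be written down as a small table of clause types and counts. This is only an illustrative encoding (the dictionary name and its labels are ours), but it makes explicit that six main clauses plus four dependent-clause types with ten sentences each yield the 46 stimulus sentences.

```python
# Illustrative encoding of the stimulus design (counts taken from the text).
STIMULUS_DESIGN: dict[str, int] = {
    "main clause": 6,
    "restrictive relative clause": 10,
    "preposed conditional clause": 10,
    "extraposed causal clause": 10,
    "extraposed complement clause": 10,
}

total_sentences = sum(STIMULUS_DESIGN.values())
dependent_sentences = total_sentences - STIMULUS_DESIGN["main clause"]

print(total_sentences)      # 46
print(dependent_sentences)  # 40
```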
190 of the 313 informants claimed MLG as the language they knew best (60.7 %). Another 31 informants indicated comparable knowledge of MLG and one of the majority languages (9.9 %; 10 in Brazil, 9 in the USA), and twelve of MLG and SHG (3.8 %; 9 in Paraguay). 63 informants claimed that one of the majority languages was their strongest language (20.1 %; 31 in the USA, 19 in Brazil); seventeen assigned this status to SHG (5.4 %; 16 in Paraguay). Only six of the 313 informants can be classified as semi-speakers of MLG. This means that most speakers with a dominant language other than MLG still speak MLG well or even very well. Analyzing the informants’ language dominance with regard to the marked variant in (1a), we see that 29 of the 59 tokens are produced by informants whose dominant language is MLG (49.2 %), six (10.2 %) by informants equally competent in MLG and one of the majority languages, and 24 (40.7 %) by informants dominant in a majority language, mostly English. Only one token is produced by one of the six possible semi-speakers; none is produced by an informant (co-)dominant in SHG. It is also important to realize that the marked variant cannot be reduced to a “deviant” grammar of just a few informants: a total of 46 Mennonites (14.7 % of the 313 informants) produced the 59 tokens.
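The shares in this paragraph and in footnote 5 all refer to fixed totals (313 informants, 59 marked tokens), so a quick consistency check is possible. The sketch below (group labels, constant names, and the `share` helper are ours) verifies that the groups sum to the stated totals and reproduces the reported percentages with half-up rounding.

```python
from decimal import Decimal, ROUND_HALF_UP


def share(part: int, whole: int) -> Decimal:
    """Percentage rounded half-up to one decimal place."""
    return (Decimal(100 * part) / Decimal(whole)).quantize(
        Decimal("0.1"), rounding=ROUND_HALF_UP
    )


# Informants by self-reported dominant language (N = 313).
INFORMANTS = {
    "MLG": 190,
    "MLG + majority language": 31,
    "MLG + SHG": 12,
    "majority language": 63,
    "SHG": 17,
}
# Tokens of the marked variant by the producer's dominant language (N = 59).
MARKED_TOKENS = {"MLG": 29, "MLG + majority language": 6, "majority language": 24}
# Informants per colony (footnote 5).
COLONIES = {"Mexico": 103, "USA": 67, "Brazil": 56,
            "Menno": 42, "Fernheim": 37, "Bolivia": 8}

assert sum(INFORMANTS.values()) == sum(COLONIES.values()) == 313
assert sum(MARKED_TOKENS.values()) == 59

for group, n in INFORMANTS.items():
    print(f"{group}: {share(n, 313)} %")
for group, n in MARKED_TOKENS.items():
    print(f"{group}: {share(n, 59)} % of marked tokens")
print(f"informants producing the variant: {share(46, 313)} %")
```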

5 103 informants come from Mexico, 67 from the USA, 56 from Brazil, 42 from Menno (Paraguay),
37 from Fernheim (Paraguay) and eight from Bolivia.

4 Analysis of the verb-object sequence

4.1 Presentation of the phenomenon

In the translated complement, conditional, and relative clauses with one verbal element (with or without a verbal particle), there are 59 tokens (3.2 % of 1,837 tokens) in which the internal complement selected by the verb surfaces after this verb. In 53 of the 59 cases, the verb occupies, on the surface, the second position of the clause. Examples (3) through (6) present two complement clauses, one conditional clause, and one relative clause. As in (1) and (2), the (a) examples illustrate the rare sequence verb-ObjNP/PP, whereas the (b) examples represent the unmarked sequence ObjNP/PP-verb.

stimulus <4> English: Can’t you see that I am wearing a new dress?

(3) a. kos nich sehen daut ik ha en nüet Kleid an (USA-22; f/15/Engl)
can ∅ not see that I have-VERB a new dress on-PARTICLE

b. kos dü daut nich sehen daut ik en nüet Kleid anha (USA-29; f/19/MLG)
can you that not see that I a new dress on-PARTICLE-have-VERB

stimulus <5> Portuguese: O Enrique não sabe que ele pode sair do país
English: Henry doesn’t know that he can leave the country

(4) a. Hein weit daut nich daut hei darf [0.4] üt dem [0.3] Laund rüter (Bra-5; f/22/MLG+Port)
Henry knows that not that he may-VERB […] out the […] country out-PARTICLE

b. Hein weit nich daut hei üt dem Laund rüterdarf (Bra-52; m/30/MLG)
Henry knows not that he out the country out-PARTICLE-may-VERB

stimulus <12> English: If he does his homework, he can have some ice cream

(5) a. wann der dät den sine Arbeit dann kaun her etz some ice cream eten (USA-77; f/42/MLG)
if he does-VERB the his homework then can he now some ice cream eat

b. wann her sinen [1.1] homework dät dann kaun her ice cream han
(USA-64; f/41/Engl)
if he his.MASCULINE […] homework does-VERB then can he ∅ ice cream have

stimulus <32> Portuguese: As estorias que ele está contando para os homens são muito tristes
English: The stories that he is telling the men are very sad

(6) a. Die Geschichte waut hei vertahlt für de Manner is sehr trürig (Bra-37; m/34/Port)
the story that he tells-VERB for the men is very sad

b. die Geschichte waut hei to de Männer vertahlt sind sehr trürig (Bra-6; f/23/MLG)
the stories that he to the men tells-VERB are very sad

The examples are not structurally uniform: (i) 21 of the relevant tokens (35.6 %) feature a verb with a particle (cf. (3a) and (4a)); the rest are verbs with (cf. (6a)) or without a non-separable prefix (cf. (5a)). This difference does not have any measurable influence on the frequency of the rare phenomenon. (ii) In most cases, the ObjNPs/PPs in the tokens with the sequence verb-ObjNP/PP are definite (with a definite article as in (4a) or a possessive article as in (5a)); only seventeen complements are indefinite (28.8 %; mostly with an indefinite article as in (3a); cf. Section 4.4.2). (iii) 22 of the tokens (37.3 %; cf. (4a) and (6a)) feature a PP as complement (cf. Section 4.4.1). As Continental West Germanic varieties such as Dutch show a certain tendency to extrapose ObjPPs into the postfield, and as such a movement would undermine our line of argumentation, some tokens which seem to belong to the variant represented by (1a) were excluded. Translations of stimulus sentence <5> Henry doesn’t know that he can leave the country, for example, were only accepted if the particle surfaced at the end of the clause after the ObjPP (cf. (4a)). In such a case, extraposition of the ObjPP into the postfield is not a possible analysis. Conversely, translations such as (7) were not included in the analyses because the particle rüt (‘out’) surfaces in a non-final position, strongly suggesting an extraposed ObjPP:

stimulus <5> Spanish: Enrique no sabe que puede salir del país
English: Henry doesn’t know that he can leave the country

(7) Heinrik weit daut hei nicht kann rüt [0.6] üt diese- [0.4] üt det Land (Mex-45; m/59/MLG)
Henry knows ∅ that he not can-VERB out-PARTICLE […] out this- […] out the country

Another excluded variant is illustrated by (8). In this token, extraposition also seems to be the correct analysis because the particle surfaces not only in front of the ObjPP but also in front of the verbal element:

stimulus <43> Spanish: Antes de irme de casa, siempre apago las luces
English: Before leaving the house, I always turn off the lights

(8) immer wann ik weggo von Hüs dann du ik immer daut Lich ütmeaken
(Mex-82; m/52/MLG)
always when I away-PARTICLE-go-VERB from home then do I always the light out-make

We will come back to examples like (8) at the end of Section 4.4.1 in order to show that the informants producing them have different syntactic preferences from those of the informants producing the marked variant in the (a) examples in (1) and (3) through (6). Unlike in (4a), the structural position of the indirect ObjPP in (6a) is not clear-cut; in principle, extraposition into the postfield might be a possible derivation. The difference from the probably extraposed directive ObjPPs in (7) and (8) is that für in für de Manner (‘to the men’) is not selected by the verb but marks an indirect object. This means that für in (6a) is semantically vacuous and, more importantly, optional: most Mennonite informants do not mark indirect objects prepositionally. By contrast, the prepositions üt (‘out’) and von (‘from’) in (7) and (8) add semantic value to the verbal proposition. As für de Manner is syntactically closer to indirect ObjNPs than to directive ObjPPs like those in (7) and (8), and as indirect ObjNPs cannot be extraposed in MLG, extraposition does not seem to be a possible explanation for (6a).

4.2 Distribution of the phenomenon

4.2.1 Origin of the informants


The sequence verb-ObjNP/PP is well known in causal clauses and some other adjunct clauses in colloquial German (cf., e. g., Keller 1993). In these clauses, the finite verb occupies the second position of the clause not only superficially but also structurally.6 An ObjNP/PP that surfaces to the right of its governing verb in other types of dependent and introduced clauses constitutes a rare phenomenon in Continental West Germanic varieties (but cf. Larrew 2005, who analyzes verb-second word order in German relative and complement clauses). In view of this, it is unlikely that the finite verb in tokens such as (1a) and (3a) through (6a) structurally occupies the verb-second position in these introduced clauses, i. e. the head position of CP (cf. also the second point in Section 4.3.2). We will therefore have to find a different structural explanation for these tokens. In order to do this, it is important to discuss the distribution of these tokens according to relevant (socio)linguistic criteria. Table 1 shows the distribution of the marked variant with regard to the origin of the informants. Besides the number of tokens per column and the distribution of the two variants, the reader also finds the share of tokens with an ObjPP, the share of tokens with an indefinite NP in the ObjNP/PP, and the share of tokens in complement clauses. This information is given because these three specifications will be shown to influence the informants’ syntactic behavior (cf. Sections 4.2.3, 4.4.1, and 4.4.2). Importantly, however, these characteristics only influence the overall probability of the marked variant, not its relative preference by certain groups of speakers.

6 In these cases, we are dealing with a dependent main clause with the finite verb occupying the head position of CP. The Mennonite data also contain many verb-second causal clauses with one verbal element, especially in the North American colonies (cf. Table 9 and Kaufmann 2003: 188–189). For this reason, these tokens are not included in the analyses. We will, however, deal with causal clauses from the South American colonies in Section 4.4.3.

Table 1: Distribution of the two variants in dependent non-causal clauses with one verbal
element in all colonies separated according to the informants’ origin (obj = ObjNPs/PPs;
part = particle)

The distribution in Table 1 is highly significant, but with a value of 0.14 its association strength is weak.7 With regard to the phenomenon in question, we can distinguish three types of colonies. The informants in the United States show by far the highest share of the non-verb-final variant (8.4 % of their tokens; 24 tokens instead of the 9.2 expected). The other extreme is represented by the Paraguayan colonies, which produce only three tokens instead of the 18.2 expected (0.5 % of the 567 Paraguayan tokens). The remaining three colonies range from 3.1 % to 4.3 %. The difference between these three groups of colonies seems to be connected to the informants’ different levels of competence in SHG. Much contact with SHG, as in the Paraguayan colonies, correlates with very few non-verb-final tokens; hardly any contact with SHG, as in the US-American colony, correlates with a much higher number of non-verb-final tokens.
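The expected token counts in this paragraph are the cell frequencies that a chi-square test assumes under independence of variant and colony: row total × column total / N. A minimal sketch using only figures stated in the text (1,837 tokens in all, 59 of them marked, 567 from Paraguay) reproduces the 18.2 expected Paraguayan tokens. The observed/expected ratios for Paraguay and the United States are added only to make the size of the deviation visible; the function and constant names are ours.

```python
def expected_count(row_total: float, col_total: float, grand_total: float) -> float:
    """Expected cell frequency under independence (Pearson chi-square)."""
    return row_total * col_total / grand_total


N_TOKENS = 1837        # non-causal dependent clauses with one verbal element
MARKED_TOKENS = 59     # tokens with the sequence verb-ObjNP/PP
PARAGUAY_TOKENS = 567  # all Paraguayan tokens (both variants)

expected_paraguay = expected_count(MARKED_TOKENS, PARAGUAY_TOKENS, N_TOKENS)
print(round(expected_paraguay, 1))      # 18.2
print(round(3 / expected_paraguay, 2))  # observed/expected in Paraguay: 0.16
print(round(24 / 9.2, 1))               # observed/expected in the USA: 2.6
```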

7 Shading in tables is only used when the distribution in a line represents a significant difference. For token frequencies, Pearson’s chi-square test is used. As this test is sensitive to the number of tokens, tests for the strength of association are also carried out (Cramér’s V or Phi). The number of cells with fewer than five expected tokens in the distribution is always given (in especially vulnerable distributions with one degree of freedom, the result of Fisher’s exact test is added). For interval-scale variables such as age or the indexes for verb projection raising and scrambling, a one-way ANOVA is used. The level of statistical significance is given with its precise value. One asterisk * means that SPSS calculates the probability of a Type I error to lie between 1 % and 5 % (0.01 ≤ p < 0.05), two asterisks ** that the probability is smaller than 1 % (0 < p < 0.01), and three asterisks *** that it is virtually 0 % (p = 0). We are aware of the fact that this value can never be reached, but follow the indication of SPSS. One asterisk in brackets (*) indicates a statistical tendency where the error margin lies between 5 % and 10 % (0.05 ≤ p < 0.1).
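The asterisk convention in this footnote amounts to a simple mapping from p-values to markers, and the association measure it mentions is Cramér's V, V = √(χ² / (N · (min(r, c) − 1))), which for a table with two rows (two variants) reduces to Phi. The following sketch encodes both; the function names are ours, and treating a p-value displayed by SPSS as 0 as exactly 0.0 is an assumption about how such values were transcribed.

```python
import math


def significance_marker(p: float) -> str:
    """Marker for a p-value following the convention described in footnote 7."""
    if p == 0:        # SPSS displays the value as 0 ("virtually 0 %")
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    if p < 0.1:
        return "(*)"  # statistical tendency
    return ""         # not significant


def cramers_v(chi2: float, n: int, n_rows: int, n_cols: int) -> float:
    """Cramér's V for an r x c table; equals Phi when min(r, c) == 2."""
    return math.sqrt(chi2 / (n * (min(n_rows, n_cols) - 1)))


print(significance_marker(0.002))  # ** (cf. the age comparison in Section 1)
print(significance_marker(0.07))   # (*)
```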

4.2.2 Types of clauses


There is a total of fourteen clauses which feature the variant with the marked
sequence verb-ObjNP/PP. Its share ranges from 0.6 % to 15.3 % in thirteen of
these clauses (the last clause shows a higher share of 25 %, but there are only
four usable tokens). There are many factors which may have influenced the vari-
ant’s widely differing appearance. Such factors are differences in the factivity of
complement clause compounds (cf. Barbiers 2000: 191–193), the syntactic role
of relative markers, the position of relative clauses within their sentence
compounds, or the type of subject (full NP or pronoun; first, second, or third person;
cf. Auer 1998: 296–297). The most marked difference, however, proved to be the
type of clause. This is represented in Table 2:

Table 2: Distribution of the two variants in dependent non-causal clauses with one verbal
element in all colonies separated according to the type of clause (part=particle)

The distribution in Table 2 is highly significant, but again the association strength
is weak. The extraposed complement clauses show a much higher share of the
marked variant (5.8 %; 41 instead of expected 22.9 tokens) than conditional and
relative clauses (2 % and 1.1 %, respectively; 13 and 5 instead of expected 21.1
and 15 tokens). This does not change if we only consider clauses with definite
ObjNPs, i. e. the significant concentrations of indefinite ObjNPs/PPs in complement
clauses and of ObjPPs in relative clauses skew the result to some degree, but they
do not change the general picture. As already mentioned, almost all tokens
of the marked non-verb-final variant are superficially verb-second and thus share
a central characteristic with main clauses. In view of this, one could assume an
iconic relationship between this surface shape and a low degree of embedding,
since the surface shape may remind speakers (and hearers) of independent
verb-second main clauses. If this were the case, one could use the share of
superficially verb-second clauses in MLG as an indicator of the degree of embedding
of dependent clauses. We do not have the space to pursue this line of reasoning
further, but the analyses of the variation in dependent clauses with two verbal
elements also point in this direction.

Rare phenomena revealing basic syntactic mechanisms 123

4.2.3 General syntactic preferences of the informants


So far, we have seen that the origin of the informants (cf. Table 1) and the type of
clause (cf. Table 2) have an influence on the phenomenon in question. The most
important distributional fact, however, is the correlation of the variant in ques-
tion with the informants’ general preference for certain linearization patterns in
clause-final clusters with two verbal elements (cf. Kaufmann (2007: 147–148) and
especially Haider (2010: 323–338) for further detail with regard to verb clusters).
In complement, relative and conditional clauses with two verbal elements, the
Mennonites show four patterns of serialization:

stimulus <15> Portuguese: Se ele tiver que vender a casa agora, ele vai ficar muito triste
English: If he has to sell the house now, he will be very sorry

NR-variant I⁸: introductory element – subject – adverb – ObjNP – V2 – V1


(9) wann der nü daut Hüs verköpe mut [0.5] dann wird der sehr trürig (Bra-59; f/56/MLG)
if he now-ADVERB the house sell-VERB2 must-VERB1 […] then turns he very sad

NR-variant II: introductory element – subject – ObjNP – adverb – V2 – V1


(10) wann hei sin Hüs nu verköpe soll dann wird her sehr trürig sene (Bra-2; m/55/MLG)
if he his house now-ADVERB sell-VERB2 shall-VERB1 then will he very sad be

VPR-variant: introductory element – subject – V1 [– adverb] – ObjNP [– adverb] – V2


(11) wann her mut daut Hus nu verköpe wird her sehr trürig bliewe (Bra-24; m/36/MLG+Port)
if he must-VERB1 the house now-ADVERB sell-VERB2 will he very sad remain

VR-variant: introductory element – subject [– adverb] – ObjNP [– adverb] – V1 – V2


(12) wann hei daut Hüs nu mut verköpe wird her sehr trürig sene (Bra-4; m/40/Port)
if he the house now-ADVERB must-VERB1 sell-VERB2 will he very sad be

All four variants can be derived by three movements, assuming that MLG is head-final
with regard to both VP and IP. Disregarding the higher clausal positions and
not including adverbs⁹ in the structural description in (13a–c2ii), the proposed
derivation works like this (each moved element carries the same subscript as its trace):

8 The labels for the different variants follow tradition (NR = non-raising; VPR = verb projection
raising; VR = verb raising).
9 With regard to the precise position of the ObjNP, we do not differentiate between the two
possible sequences of ObjNP and adverb in the V(P)R-variants (11) and (12). With regard to the
VPR-variant in (11), the sequence ObjNP-adverb is merely a case of short movement of the ObjNP
not leaving VP2; in the case of the VR-variant in (12), the ObjNP is moved outside VP2 regardless
of whether it lands before or after the adverb. Unfortunately, we cannot be sure whether the
scrambled ObjNP in the sequence ObjNP-adverb in (10) has really left VP2 or whether the supposedly
non-scrambled ObjNP in the sequence adverb-ObjNP in (9) is still in VP2, but the normalizing
technique used for grouping the informants minimizes this problem (cf. the explanation in this
section).

(13) a.    basic structure           [CP … [IP … [VP1 [VP2 daut Hus verköpe] mut] ∅(3PS)]]

     b.    gaining finiteness        [CP … [IP … [VP1 [VP2 daut Hus verköpe] t_a] mut_a-∅]]
                                     (head movement from V1⁰ to I⁰ resulting in the NR-variant I)

     c1.   verb projection raising   [CP … [IP [IP … [VP1 t_b t_a] mut_a-∅] [VP2 daut Hus verköpe]_b]]
                                     (raising and adjunction of VP2 to IP resulting in the VPR-variant)

     c2i.  scrambling¹⁰              [CP … [IP daut Hus_c [IP … [VP1 [VP2 t_c verköpe] t_a] mut_a-∅]]]
                                     (scrambling of ObjNP out of VP2 to IP resulting in the NR-variant II)

     c2ii. verb projection raising   [CP … [IP [IP daut Hus_c [IP … [VP1 t_b t_a] mut_a-∅]] [VP2 t_c verköpe]_b]]
                                     (raising and adjunction of remnant VP2 to IP resulting in the VR-variant)

This approach to verb clusters can be found in the older literature (cf. Den Besten
and Broekhuis 1989, as quoted in Haegeman 1994: 512),¹¹ but apart from Kaufmann
(2007), it has found little support in recent work. Its central claim is that
although on the surface the VR-variant seems to involve the movement of less
material than the VPR-variant (just the verbal head instead of the complete verb
phrase), it actually involves more movement, namely scrambling plus verb projection
raising (moving the complete verb phrase, not just the verbal head). The consequence
of the derivational assumptions put forward in (13a–c2ii) is that verb clusters are
nothing more than an epiphenomenon of two independent syntactic mechanisms. Table 3
illustrates the four possible crossings of verb projection raising and scrambling
exemplified by tokens (9) through (12):

10 With regard to our hypothesis, it is immaterial whether scrambling occurs before or after verb
projection raising. However, as many theoretical considerations speak against movement out of
a moved constituent (cf. Wexler & Culicover 1980), we opt for the application of scrambling
before raising. This means that verb projection raising in the formation of the VR-variant is a
case of remnant movement (cf. (13c2ii)).
11 The English translation of Den Besten and Broekhuis' central argument is: "[…] VR is
interpreted as the limiting case of VPR, an instantiation of VPR where all nonverbal material has
been scrambled out of the adjoined VP".

Table 3: Cross tabulation of the example clauses (9) through (12) by means of verb projection
raising and scrambling

                 -verb projection raising                     +verb projection raising

-scrambling      (9) wann der nü daut Hüs verköpe mut […]     (11) wann her mut daut Hus nu verköpe […]

+scrambling      (10) wann hei sin Hüs nu verköpe soll […]    (12) wann hei daut Hüs nu mut verköpe […]
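The claim that the four cells of Table 3 result from crossing two independent operations can be made concrete with a toy model. The sketch below is a hypothetical illustration, not part of the original analysis: it encodes only the steps (13b) through (13c2ii), leaves out the CP layer, the subject, adverbs and the agreement morpheme, and uses invented helper names (`phrase`, `bracket`, `surface`, `derive`). Traces are written t_a, t_b, t_c as in (13).

```python
def phrase(label, *kids, idx=""):
    """A labelled constituent; idx is the movement index, e.g. "_b"."""
    return {"label": label, "kids": list(kids), "idx": idx}


def bracket(node):
    """Labelled bracketing with traces, in the notation of (13)."""
    if isinstance(node, str):
        return node
    inner = " ".join(bracket(k) for k in node["kids"])
    return f"[{node['label']} {inner}]{node['idx']}"


def surface(node):
    """Pronounced words: traces (t_x) are silent, indices are dropped."""
    if isinstance(node, str):
        return [] if node.startswith("t_") else [node.split("_")[0]]
    return [w for k in node["kids"] for w in surface(k)]


def derive(vpr, scrambling, obj="daut Hus", v2="verköpe", v1="mut"):
    # (13b) gaining finiteness: V1 moves to I0 and leaves the trace t_a.
    # (13c2i) scrambling: the ObjNP leaves VP2 and leaves the trace t_c.
    vp2 = phrase("VP2", "t_c" if scrambling else obj, v2,
                 idx="_b" if vpr else "")
    # (13c1)/(13c2ii) raising: VP2 leaves VP1 and leaves the trace t_b.
    vp1 = phrase("VP1", "t_b" if vpr else vp2, "t_a")
    ip = phrase("IP", vp1, v1 + "_a")
    if scrambling:  # the scrambled ObjNP adjoins to IP on the left
        ip = phrase("IP", obj + "_c", ip)
    if vpr:  # the raised (possibly remnant) VP2 adjoins to IP on the right
        ip = phrase("IP", ip, vp2)
    return ip


VARIANTS = {
    "NR-variant I": (False, False),
    "NR-variant II": (False, True),
    "VPR-variant": (True, False),
    "VR-variant": (True, True),
}

for name, (vpr, scr) in VARIANTS.items():
    tree = derive(vpr, scr)
    print(f"{name:14} {' '.join(surface(tree)):26} {bracket(tree)}")
```

Without an adverb, NR-variant I and NR-variant II come out as the same string (daut Hus verköpe mut). In this model, the scrambled position of the ObjNP can only be diagnosed relative to an adverb, as in (9) versus (10).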

The reader may have fewer problems in accepting (11) and (12) as raised than in
accepting (10) and (12) as scrambled since the term scrambling is used in a rather
loose way to cover non-prototypical cases of argument movement (the prototypical
case being the re-ordering of arguments in the midfield of a German clause; cf.
Haider 2010: 152 (property (vii)) and 184–185). Both the position of the ObjNP in
front of an adverb in a clause with the sequence verb2-verb1 (cf. (10), a case Zwart
(1996: 230–231) accepts as scrambling, but Haider (2010: 157–158) dismisses out-
right) and the non-adjacency of the ObjNP and its governing verb in a clause with
the sequence verb1-verb2 in (12) are supposed to be the consequence of scrambling.
The reason for this assumption is the fact that informants who prefer serialization
patterns represented by (11) to those represented by (12) also prefer serialization
patterns represented by (9) to those represented by (10). And informants preferring
(10) over (9) also prefer (12) over (11). These correlations obviously do not prove
that scrambling is the reason for (10) and (12), but they represent good evidence
for the claim that the two phenomena are of the same nature. Moreover, in Sections
4.4.1 and 4.4.2, we will see that the position of ObjNPs/PPs in our tokens is indeed
sensitive to typical characteristics facilitating or barring scrambling.
Assuming that different serialization patterns in verb clusters are the consequence
of the (non-)application of verb projection raising and scrambling, the
313 informants were characterized according to their syntactic behavior in dependent
clauses with two verbal elements. In order to do this, we did not simply count the
informants' tokens representing each of the variants in (9) through (12) (as done
in Kaufmann 2007). Instead, a normalized measure for the probability of verb
projection raising and scrambling was calculated for each of the ten clauses for
which enough good-quality translations were available, and the informants' actual
behavior was compared to the expected behavior for each clause.¹² With this
method, it no longer mattered whether informants translated all ten clauses (most
did not) or which clauses they actually translated. Metric index values for
both raising and scrambling could be allotted to 282 of the 313 informants. These
indexes were used to group these informants into four types of speakers by
means of a cluster analysis.

12 For verb projection raising, 1,905 translations of nine stimulus sentences could be used (an
average of 6.1 tokens per informant); for scrambling, 1,167 translations of ten stimulus sentences
(an average of 3.7 tokens per informant) (cf. the description of this methodology in Kaufmann
2011: 209–210).
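Footnote 12 refers to Kaufmann (2011: 209–210) for the exact procedure, which is not reproduced here. The following is only a plausible sketch under two assumptions: that the "expected behavior" for a clause is the share of all its tokens showing the feature (raising or scrambling), and that an informant's index is the mean deviation of his or her tokens from these clause-specific expectations. Function names and data are hypothetical.

```python
from collections import defaultdict


def clause_rates(tokens):
    """Share of tokens per stimulus clause showing the feature (1) rather than not (0)."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for _informant, clause, feature in tokens:
        totals[clause] += 1
        hits[clause] += feature
    return {clause: hits[clause] / totals[clause] for clause in totals}


def informant_indexes(tokens):
    """Mean deviation of each informant's tokens from the rate expected for that clause.

    Positive values: more raising (or scrambling) than expected; negative: less.
    Informants stay comparable even if they translated different clauses.
    """
    rates = clause_rates(tokens)
    deviations = defaultdict(list)
    for informant, clause, feature in tokens:
        deviations[informant].append(feature - rates[clause])
    return {inf: sum(devs) / len(devs) for inf, devs in deviations.items()}


# Invented tokens: (informant, stimulus clause, 1 = raised / 0 = not raised)
tokens = [
    ("inf1", "clause_A", 1), ("inf2", "clause_A", 0),
    ("inf1", "clause_B", 1), ("inf2", "clause_B", 0),
    ("inf3", "clause_B", 0), ("inf4", "clause_B", 0),
]
print(informant_indexes(tokens))
```

Because each token is measured against its own clause, inf3 and inf4 receive comparable values although they translated only one clause. Informants without any usable token simply receive no index, which is consistent with only 282 of the 313 informants being classified.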
We can now focus on the decisive question, namely how the four types of
informants behave with regard to the phenomenon in question. Table 4 furnishes
this information; the distribution is highly significant and this time it shows a
somewhat higher strength of association:

Table 4: Distribution of the two variants in dependent non-causal clauses with one verbal
element in all colonies separated according to the informants’ behavior in dependent non-
causal clauses with two verbal elements (vpr = verb projection raising; part = particle)

informants preferring the    NR-variant I   NR-variant II   VPR-variant    VR-variant     total
characteristics              -vpr           -vpr            +vpr           +vpr
                             -scrambling    +scrambling     -scrambling    +scrambling

n                            260            773             200            415            1648
n (ObjPPs)                   44 (16.9%)     133 (17.2%)     36 (18%)       68 (16.4%)     281 (17.1%)
n (indefinite ObjNP/PP)      37 (14.2%)     108 (14%)       31 (15.5%)     55 (13.3%)     231 (14%)
n (complement clauses)       101 (38.8%)    315 (40.8%)     78 (39%)       154 (37.1%)    648 (39.3%)

obj-verb(-part)              253 (97.3%)    770 (99.6%)     169 (84.5%)    399 (96.1%)    1591 (96.5%)
verb-obj(-part)              7 (2.7%)       3 (0.4%)        31 (15.5%)     16 (3.9%)      57 (3.5%)

χ² (df=3, n=1648) = 109.3, p=0*** / Cramer's V: 0.26 / 0 cells (0%) with less than 5 expected tokens
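The test statistics below Table 4 and the expected token counts cited in the following paragraph can be recomputed from the two variant rows alone. This is a minimal sketch in plain Python (hypothetical helper names; no statistics library, so no p-value is computed):

```python
import math

# Counts from Table 4: rows = variants, columns = informant groups
GROUPS = ["NR-variant I", "NR-variant II", "VPR-variant", "VR-variant"]
TABLE_4 = [
    [253, 770, 169, 399],  # obj-verb(-part)
    [7, 3, 31, 16],        # verb-obj(-part)
]


def expected_counts(table):
    """Expected cell counts under independence: row total * column total / n."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    return [[r * c / n for c in col_tot] for r in row_tot]


def pearson_chi_square(table):
    exp = expected_counts(table)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, exp)
               for o, e in zip(obs_row, exp_row))


def cramers_v(table):
    n = sum(map(sum, table))
    k = min(len(table), len(table[0]))
    return math.sqrt(pearson_chi_square(table) / (n * (k - 1)))


exp = expected_counts(TABLE_4)
chi2 = pearson_chi_square(TABLE_4)
df = (len(TABLE_4) - 1) * (len(TABLE_4[0]) - 1)
print(f"chi2 = {chi2:.1f}, df = {df}, Cramer's V = {cramers_v(TABLE_4):.2f}")
for group, e in zip(GROUPS, exp[1]):
    print(f"{group:14} expected verb-obj tokens: {e:.1f}")
print("cells with expected < 5:", sum(e < 5 for row in exp for e in row))
```

Run as is, this reproduces χ² = 109.3 with df = 3, Cramer's V = 0.26, no cell below five expected tokens, and the expected values of 6.9 (VPR-variant) and 26.7 (NR-variant II) tokens of the marked variant.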

The non-verb-final variant is strongly concentrated among the informants who
prefer the VPR-variant in dependent clauses with two verbal elements. Although
this group contributes the lowest number of tokens to the analysis (200, i. e. 12.1 %
of the 1,648 tokens), it produces more than half of the tokens of the variant in
question (31 instead of 6.9 expected tokens, i. e. 54.4 % of the 57 tokens). Having
identified this type of informant as the most productive one, let us look at the
informants who are least productive. These are the informants who prefer the
NR-variant II; they contribute 773 tokens in total (46.9 %), but only three instead of
the expected 26.7 tokens of the variant in question (5.3 % of the 57 tokens). As these
two groups have opposite preferences with regard to verb projection raising and
scrambling (cf. the line "characteristics" in Table 4), we believe that a positive setting
for verb projection raising and a negative one for scrambling are the decisive
factors promoting the occurrence of the non-verb-final variant. If this conclusion
is correct, the other two clusters should show intermediate performances, since
each coincides with the most productive VPR-informants in one of the two
dimensions. Table 4 confirms this hypothesis, showing intermediate shares of 2.7 % and
3.9 %.
The high concentration of the marked variant in one particular group of
informants precludes the possibility of accounting for this variant by means
of priming since priming should have a comparable effect on all informants.
Another conclusion can be drawn from the fact that the group of informants who
prefer the VPR-variant has a share of the marked variant almost four times as large
as that of the group of informants who prefer the VR-variant. If we look more closely
at this difference, we can see how important it is to characterize the informants
according to their general syntactic behavior. The group that prefers the VPR-variant
has fewer US-American informants than the group that prefers the VR-variant.
Half of the informants of the latter group come from the US, the colony with
the strongest concentration of the marked variant (8.4 %; cf. Table 1). In spite of
this, the group’s share of the marked variant is only 3.9 %. Among the informants
who prefer the VPR-variant, however, this share is 15.5 % although only one third
of these informants come from the United States.
In spite of the telling distribution in Table 4, the statistical analysis can
still be refined. As we calculated a value for each informant's probability of verb
projection raising and scrambling, we can compare the average values in these
two dimensions for the Mennonites who produce the unmarked variant with the
sequence ObjNP/PP-verb and for those who produce the marked variant
verb-ObjNP/PP. Besides being able to use parametric tests (the values can be
considered quasi-interval), we can also include more tokens, since some informants
have a value for one dimension, but not for the other one. These informants had
to be excluded from Table 4. In Table 5, their value for either raising or scrambling
can be included.

Table 5: Average values for verb projection raising and scrambling for the informants
who produced the two variants in dependent non-causal clauses with one verbal element
(obj = ObjNPs/PPs; part = particle)

                             total      obj(-part)-verb    verb-obj(-part)

n                            1788       1730               58
verb projection raising      +0.027     +0.015             +0.374

n                            1689       1631               58
scrambling                   +0.007     +0.015             -0.202
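Footnote 7 states that interval-scale values such as the raising and scrambling indexes are compared with a one-way ANOVA. The sketch below (hypothetical function name `one_way_anova`, toy numbers rather than the study's data) shows how F and its degrees of freedom arise. It assumes that, as the n values in Table 5 suggest, each token is entered with the index value of the informant who produced it.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over lists of values."""
    values = [x for group in groups for x in group]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = [sum(group) / len(group) for group in groups]
    # Variation of the group means around the grand mean, weighted by group size
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the individual values around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Toy index values (not the study's data): one value per token,
# grouped by the variant the token represents.
unmarked = [0.1, -0.2, 0.0, 0.1]
marked = [0.5, 0.3, 0.4]
print(one_way_anova([unmarked, marked]))
```

With two groups, the degrees of freedom are (2 − 1, N − 2). This gives (1, 1786) for the 1,788 tokens in the raising rows of Table 5 and (1, 1687) for the 1,689 tokens in the scrambling rows, matching the F-tests reported for this table.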

The highly significant difference between the raising values of the informants
producing the two variants is 0.359 (F (1,1786) = 72.8, p = 0***). With regard to
scrambling, the difference is also highly significant, but it is somewhat smaller
at 0.217 (F (1,1687) = 40.6, p = 0***). We can therefore conclude – now with even
more confidence – that a tendency towards raising VP2 in dependent clauses
with two verbal elements and towards not scrambling the ObjNP/PP out of VP2
promotes the occurrence of the non-verb-final variant in dependent clauses with
one verbal element. Informants who produce clauses like

(11) wann her mut daut Hus nu verköpe […]


if he must-VERB1 the house now-ADVERB sell-VERB2 […]

show a strongly above-average tendency towards producing clauses like

(1a) wann hei unterschrieft [0.4] diesen contrato […]


if he signs-VERB […] this contract […]

Conversely, informants producing clauses like

(10) wann hei sin Hüs nu verköpe soll […]


if he his house now-ADVERB sell-VERB2 shall-VERB1 […]

almost exclusively produce clauses like

(1b) wann hei det Kontrakt [ehm] unterschrieft […]


if he the contract [ehm] signs-VERB […]

4.3 Structure of the variant with the verb-object sequence

4.3.1 Structural description


The previous section has shown that the informants who produce tokens like
(11) are the ones who produce the highest number of tokens like (1a). Unlike the
informants who produce tokens like (10) and (1b), these informants produce the
ObjNP/PP sometimes to the left as in (11) and sometimes to the right of its govern-
ing verb as in (1a). An analysis assuming that the finite verb is in second position
in both these cases does not do justice to the data (cf. the second point in Section
4.3.2). Nevertheless, the precise position of the complement is unclear and, therefore,
it is not easy to know whether the finite verbs in (11) and (1a) are in the
same position or not. One may suggest for (1a) the existence of another functional
phrase with an initial head (besides CP), but this would be a rather far-reaching
assumption hardly justified by so low a number of tokens.

A more coherent explanation is probably that speakers show syntactic
preferences in general and not exclusively for clauses with a certain number of
verbal elements. In order to check this hypothesis, we have to apply the structural
derivations developed for dependent clauses with two verbal elements (cf.
(13a–c2ii)) to dependent clauses with one verbal element. Applying the first three
steps – which resulted in the VPR-variant in (13c1) – we should obtain (1a) wann
hei unterschrieft [0.4] diesen contrato […], since the highest concentration of this
variant has been found among the informants who prefer the VPR-variant:

(14) a.    basic structure           [CP … [IP … [VP diesen Contrato unterschriew] t(3PS)]]

     b.    gaining finiteness        [CP … [IP … [VP diesen Contrato t_a] unterschrief_a-t]]
                                     (head movement from V⁰ to I⁰)

     c1.   verb projection raising   [CP … [IP [IP … t_b unterschrief_a-t] [VP diesen Contrato t_a]_b]]
                                     (raising and adjunction of VP to IP)

The derivational predictions illustrated in (13a–c1) and (14a–c1) corroborate the
outcome of the distributional analysis in Section 4.2.3. The marked variant is the
consequence of raising a VP that contains the non-scrambled ObjNP and the trace of
the moved verb. The existence of this trace is the decisive point: The verb
has left VP and has been moved to the clause-final head position of IP prior to
the raising of VP, thus causing the superficially final position of the ObjNP.
Structurally, however, the phonetically non-realized trace occupies the last position.
Applying these derivational steps to clusters with two verbal elements, we end up
with the VPR-variant, which does not feature the ObjNP in final position,
because its governing verb is not moved out of the raised VP2. This derivation,
therefore, constitutes a possible counterexample to Haider's (2010: 54–68)
claim that there are no head-final functional phrases in OV-languages like
German, since it depends crucially on the assumption of head-final functional
phrases. As these conclusions are far-reaching, we will dedicate Section 4.3.2 to
dispelling possible doubts and Sections 4.4.1 through 4.4.3 to presenting additional
empirical facts supporting our analysis.
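The point that the marked order arises only from raising a VP that still contains the non-scrambled ObjNP can be checked with a toy model of (14) and (16). This is a hypothetical sketch, not the author's formalism: helper names are invented, the CP layer, subject and agreement morpheme are omitted, and the inflected form unterschrieft stands in for unterschrief_a-t.

```python
def phrase(label, *kids, idx=""):
    """A labelled constituent; idx is the movement index, e.g. "_b"."""
    return {"label": label, "kids": list(kids), "idx": idx}


def bracket(node):
    """Labelled bracketing with traces."""
    if isinstance(node, str):
        return node
    inner = " ".join(bracket(k) for k in node["kids"])
    return f"[{node['label']} {inner}]{node['idx']}"


def surface(node):
    """Pronounced words: traces (t_x) are silent, indices are dropped."""
    if isinstance(node, str):
        return [] if node.startswith("t_") else [node.split("_")[0]]
    return [w for k in node["kids"] for w in surface(k)]


def derive_one_verb(vpr, scrambling, obj="diesen Contrato", verb="unterschrieft"):
    """Steps (14b)-(14c1) and (16b)-(16c2ii) for a clause with one verbal element."""
    # (14b)/(16b): the only verb moves to I0 and leaves the trace t_a inside VP.
    vp = phrase("VP", "t_c" if scrambling else obj, "t_a",
                idx="_b" if vpr else "")
    ip = phrase("IP", "t_b" if vpr else vp, verb + "_a")
    if scrambling:  # (16c2i): the ObjNP leaves VP (trace t_c) and adjoins to IP
        ip = phrase("IP", obj + "_c", ip)
    if vpr:  # (14c1)/(16c2ii): the VP, with whatever it still contains, adjoins right
        ip = phrase("IP", ip, vp)
    return ip


for vpr in (False, True):
    for scr in (False, True):
        tree = derive_one_verb(vpr, scr)
        label = f"vpr={vpr!s:5} scrambling={scr!s:5}"
        print(label, " ".join(surface(tree)), "|", bracket(tree))
```

Of the four combinations, only raising without scrambling yields unterschrieft diesen Contrato; raising after scrambling leaves a VP made of two traces and is string-vacuous.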

4.3.2 Possible objections to the structural description


The first question one may ask is why verb projection raising without scrambling
in clauses with one verbal element is so infrequent in comparison to verb
projection raising without scrambling in clauses with two verbal elements. In
the nine dependent clauses used for the calculation of the informants' raising
propensity (cf. footnote 12), the VPR-variant has a share of 12.3 %, i. e. almost four
times the 3.2 % of the verb-ObjNP/PP-variant (cf. Tables 1 and 2). We assume
that this difference is connected to parsing: In Kaufmann (2007: 198–202), it was
shown that completely right-branching structures are more frequent when the
dependent clause has more verbal elements. This means that the sequence
verb-ObjNP/PP in clauses with one verbal element is less frequent than the sequence
(ObjNP/PP-)verb1(-ObjNP/PP)-verb2 in clauses with two verbal elements, and
this sequence is in turn less frequent than the sequence
(ObjNP/PP-)verb1(-ObjNP/PP)-verb2(-ObjNP/PP)-verb3 in clauses with three verbal
elements. The reason for this rise in raising is that parsing-unfriendly left-branching
structures in a language with head-final verb phrases become more complex with
every additional verb phrase. These increasingly complex structures, however, can be
broken up by raising the embedded verb phrases, i. e. VP2 in clusters with two verbal
elements and VP2 containing VP3 in clusters with three verbal elements (frequently
with a second cycle raising VP3 out of VP2).
We see that the lack of a propensity for verb projection raising can be over-
ridden by parsing necessities, a case of usage outweighing the system. The
Paraguayan informants, for example, produce rigidly left-branching clusters
with regard to two verbal elements almost exclusively (ObjNP/PP-verb2-verb1).
This SHG-like behavior contrasts markedly with their non-SHG-like behavior
concerning dependent clauses with three verbal elements. In these clauses,
they frequently raise VP2 containing VP3 and then raise VP3, producing rigidly
right-branching clusters with the sequence
(ObjNP/PP-)verb1(-ObjNP/PP)-verb2(-ObjNP/PP)-verb3 (cf. Kaufmann 2011: 204, Table 2).¹³
A propensity for verb projection raising, however, should remain latent when
speakers produce short, non-complex clauses. If they nevertheless apply verb
projection raising in such clauses, they produce the variant we are interested in.
Here, then, we are faced with a case of the system winning over usage. One may
call this a case of syntactic misfiring because it goes against Zwart’s (1996: 233)
conviction that “[i]n more [not less!] complex verb clusters, tendencies tend to
become rule”.
The second point we need to discuss concerns the clauses with the VPR-variant
used for the calculation of the informants' raising propensity. All of these
clauses were superficially verb-second, i. e. extant tokens with the sequence
adverb-verb1-ObjNP-verb2 were not included. Could it therefore be that the
sequence verb-ObjNP/PP in clauses with one verbal element is simply verb-second,
the result of the regular verb movement from V⁰ to I⁰ to C⁰ in German main
clauses? Three arguments speak against such an analysis: First, unlike
causal clauses, complement, relative and conditional clauses do not show a strong
tendency to appear as dependent main clauses with an introductory element.
Second, 26 of the 57 tokens (45.6 %) in Table 4 are produced by informants who
do not show a propensity for the VPR-variant. Third, six of the 59 tokens of the
variant in question are clearly not verb-second (cf. (15)).

13 In SHG, we find a comparable phenomenon: Although verb projection raising does not exist
with two verbal elements, let alone with one, it does exist with three or more verbal elements
when certain morphological conditions are met (e. g., the so-called IPP-effect). The consequence
is that even SHG has a partially right-branching, supposedly more parsing-friendly verbal
sequence, namely (ObjNP/PP-)verb1-(ObjNP/PP-)verb3-verb2.

stimulus <2> Spanish: Juan no cree que conozcas bien a tus amigos
English: John doesn’t think that you know your friends well

(15) [eh] Johann gleuf nich daut dü: gut kenns sine Frend (Mex-26; m/34/MLG)
[eh] John believes not that you good-ADVERB know-VERB his friends

If verb-second really were the reason for the marked sequence verb-ObjNP/PP, we
would expect these six non-verb-second tokens not to be produced by the
informants who prefer the VPR-variant. This, however, is not the case: Three of
the six tokens are produced by these informants, i. e. they contribute to tokens like
(15) to the same extent as to all other tokens with the marked sequence (the other
three tokens are distributed among the other three groups of informants).
The last point to discuss here concerns the informants that prefer the VR-
variant in dependent clauses with two verbal elements. If we continue applying
the derivations as in (13b–c2ii), we obtain (16b–c2ii) for clauses with one verbal
element. We start from the second step, i. e. from head movement from V⁰ to I⁰:

(16) b.    gaining finiteness        [CP … [IP … [VP diesen Contrato t_a] unterschrief_a-t]]
                                     (head movement from V⁰ to I⁰)

     c2i.  scrambling                [CP … [IP diesen Contrato_c [IP … [VP t_c t_a] unterschrief_a-t]]]
                                     (scrambling of ObjNP out of VP to IP)

     c2ii. verb projection raising   [CP … [IP [IP diesen Contrato_c [IP … t_b unterschrief_a-t]] [VP t_c t_a]_b]]
                                     (raising and adjunction of a completely emptied VP to IP)

In (16b–c2ii), raising and scrambling neutralize each other, resulting in complete
string-vacuity. Therefore, we cannot show directly that a phonetically completely
emptied VP consisting of two traces can be raised. There may, however, be
a way to check whether steps (16b) through (16c2ii) are empirically possible in
clauses with one verbal element. Assuming that particle verbs can be analyzed
as small clauses (following Bennis 1992 and Zwart 1996: 241), the complement
clause in (3a) […] daut ik ha en nüet Kleid an would be derived like this:

(17) a.    basic structure           [CP … [IP … [VP [SC en nüet Kleid an] ha] ∅(1PS)]]

     b.    gaining finiteness        [CP … [IP … [VP [SC en nüet Kleid an] t_a] ha_a-∅]]
                                     (head movement from V⁰ to I⁰)

     c1.   verb projection raising   [CP … [IP [IP … t_b ha_a-∅] [VP [SC en nüet Kleid an] t_a]_b]]
                                     (raising and adjunction of VP to IP)

The alternative at the bifurcation (17b) would lead us to:

     c2i.  scrambling                [CP … [IP en nüet Kleid_c [IP … [VP [SC t_c an] t_a] ha_a-∅]]]
                                     (scrambling of ObjNP out of the small clause and of VP to IP)

     c2ii. verb projection raising   [CP … [IP [IP en nüet Kleid_c [IP … t_b ha_a-∅]] [VP [SC t_c an] t_a]_b]]
                                     (raising and adjunction of an almost completely emptied VP to IP)
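The particle-verb derivations in (17) can be run through the same kind of toy model, with the small clause [SC ObjNP particle] inside VP. As before, this is a hypothetical sketch with invented helper names; it omits CP, subject and agreement, and it additionally reports which overt words end up inside the raised VP.

```python
def phrase(label, *kids, idx=""):
    """A labelled constituent; idx is the movement index, e.g. "_b"."""
    return {"label": label, "kids": list(kids), "idx": idx}


def bracket(node):
    """Labelled bracketing with traces."""
    if isinstance(node, str):
        return node
    inner = " ".join(bracket(k) for k in node["kids"])
    return f"[{node['label']} {inner}]{node['idx']}"


def surface(node):
    """Pronounced words: traces (t_x) are silent, indices are dropped."""
    if isinstance(node, str):
        return [] if node.startswith("t_") else [node.split("_")[0]]
    return [w for k in node["kids"] for w in surface(k)]


def derive_particle(vpr, scrambling, obj="en nüet Kleid", particle="an", verb="ha"):
    """Steps (17b)-(17c2ii): particle verb analyzed as V + [SC ObjNP particle]."""
    # (17b): the verb moves to I0 (trace t_a); the small clause stays inside VP.
    sc = phrase("SC", "t_c" if scrambling else obj, particle)
    vp = phrase("VP", sc, "t_a", idx="_b" if vpr else "")
    ip = phrase("IP", "t_b" if vpr else vp, verb + "_a")
    if scrambling:  # (17c2i): the ObjNP leaves the small clause and VP (trace t_c)
        ip = phrase("IP", obj + "_c", ip)
    if vpr:  # (17c1)/(17c2ii): VP adjoins to IP on the right
        ip = phrase("IP", ip, vp)
    return ip


for vpr, scr in [(False, False), (True, False), (True, True)]:
    tree = derive_particle(vpr, scr)
    raised = surface(tree["kids"][-1]) if vpr else []
    print(" ".join(surface(tree)), "| overt material in raised VP:", raised)
```

The model yields en nüet Kleid an ha, ha en nüet Kleid an (3a) and en nüet Kleid ha an (17c2ii). In the last case, only the particle is left in the raised VP, which is the configuration the following discussion examines.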

If informants who prefer the VR-variant apply the same derivational steps in
dependent clauses with a particle verb, we should find clauses like (17c2ii) […]
daut ik en nüet Kleid ha an, which we do not. There is only one translation of
stimulus sentence <33> which may constitute a possible exception to this:

stimulus <33> English: This is the journey I am inviting my mother on

(18) det's die Reis wo ik mine [0.4] Mutter friar mit (USA-40; m/36/MLG)
this-is the journey where I my […] mother ?lead?-VERB with-PARTICLE
‘This is the journey on which I am taking my mother.’

Friar in (18) could be a mispronunciation of the root of führen ('lead'), i. e. führ.
In this case, the root-final r would have contaminated the onset of the word.¹⁴
Although this is a possible explanation and although führen ('lead') would fit
semantically, we cannot be sure whether it is the right explanation. It is clear,
however, that the penultimate element in this token is a verb and the last element
the particle of this verb. With regard to the derivation in (17c2ii), this correctly
suggests that it is not the entire small clause [SC mine Mutter mit] which has been
scrambled out of the VP. It is rather the ObjNP mine Mutter which has been
scrambled out of the small clause and out of the VP.

14 This kind of metathesis is frequent. Think, for example, of German fragen ('ask') and
forschen ('investigate'), both connected to Old High German forsca ('question'). The same story
can be told for the Latin cognate percontari, which developed into Spanish preguntar, while
Portuguese maintained the original sequence perguntar.

The assumed combination of raising and scrambling in (17c2ii) suggests an
informant from the raising- and scrambling-friendly group, the one preferring
the VR-variant in dependent clauses with two verbal elements. This is indeed the
case. Nevertheless, the unique token (18) is no more than anecdotal evidence. It
does exist, though, and it does not refute our hypothesis. We still have
to answer the question of why this variant occurs only once among the eighty
informants who prefer the VR-variant. After all, a robust number of seven of their
sixteen non-verb-final tokens feature a particle verb. As the more "proper"
variant for the VR-variant group exists at least derivationally (cf. (17c2ii)), we would
expect more than just one token with the linearization of (18); in fact, we would
expect about twenty tokens.¹⁵
One explanation for the conspicuous lack of such tokens is that the ObjNP
and the particle in a small clause constitute such a coherent, mutually dependent
unit that scrambling the ObjNP on its own would represent a dispreferred
option. Another and perhaps more convincing explanation is that the raising of
completely or almost completely emptied verb phrases (containing just a particle)
constitutes a dispreferred option. The informants who prefer the VR-variant, i. e.
the ones who apply both raising and scrambling, may find themselves in a
lose-lose situation: If they follow their drive for scrambling, they cannot raise due to
little or no phonetic material in the VP; if they follow their drive for raising, they
cannot scramble, since this would reduce the phonetic material in the verb phrase
below the required minimum (importantly, scrambling after raising is not possible;
cf. footnote 10). With regard to clauses with two or more verbal elements, this is not
an issue because the embedded main verb does not move to I⁰, i. e. the raised VP
contains its verbal head and not just its trace. Thus, the reason for the probable
ungrammaticality of (17c2ii), i. e. of a clause like […] daut ik en nüet Kleid ha
an, may be that a VP has to have a certain phonetic weight in order to be eligible
for raising. In the case of a headless, i. e. verbless, VP, this means that the VP
must contain at least an ObjNP/PP; a verbal particle on its own does not seem to
be enough. Consequently, the complement in the unmarked variant with the
sequence ObjNP/PP-verb may or may not be scrambled; raising in dependent
clauses with just one verbal element, however, seems to be possible only if the
complement is not scrambled.

15 Particle verbs appeared only in the translations of stimulus sentences <4> and <5> – two
complement clauses which rendered many tokens with the marked sequence. The informants who
prefer the VPR-variant produced the marked variant in 28.9 % of their tokens in these two clauses
(11 out of 38 tokens). As the marked variant is the fitting variant for these informants and the
sequence of (18) would be the fitting variant for informants who prefer the VR-variant, we may –
if we assume a comparable share of 28.9 % – expect the structure of (17c2ii) to yield 20.2 out of
seventy tokens from the VR-friendly informants in clauses <4> and <5>.
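The estimate in footnote 15 is a simple proportion; the short sketch below recomputes it (variable names are ours). It also shows that the footnote's figure of 20.2 rests on the rounded share of 28.9 %, while the unrounded share gives about 20.3; either way, it contrasts with the single observed token (18).

```python
# Footnote 15: 11 of the 38 tokens of VPR-friendly informants in clauses <4> and <5>
# show the marked variant; the same share is assumed for the 70 VR-friendly tokens.
marked_vpr, tokens_vpr = 11, 38
tokens_vr, observed_vr = 70, 1

share_pct = round(100 * marked_vpr / tokens_vpr, 1)  # 28.9, rounded as in the footnote
expected_vr = tokens_vr * share_pct / 100             # about 20.2 tokens
unrounded = tokens_vr * marked_vpr / tokens_vpr       # about 20.3 without rounding

print(f"share: {share_pct} %, expected: {expected_vr:.1f}, "
      f"unrounded: {unrounded:.1f}, observed: {observed_vr}")
```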

4.4 Supporting evidence for the scrambling hypothesis

In this section, we will present three distributional facts supporting the assump-
tion that the lack of scrambling is crucial in the generation of the marked variant.
In the first two parts, we will take advantage of the fact that there is some
morphological and semantic variation with regard to the complements in the
translations. Section 4.4.1 will deal with the difference between ObjNPs and ObjPPs;
Section 4.4.2 will explore the difference between definite and indefinite ObjNPs/PPs.
The last section will then extend the analysis to causal clauses in the South
American colonies to determine whether the informants behave in a comparable
way with regard to this type of clause.
If the lack of scrambling is the decisive factor for the generation of the marked
variant with one verbal element, this has an effect on the analysis of verb clusters,
too. In Section 4.2.3, it was shown that there is a clear relationship between the
distributional facts for dependent clauses with one verbal element and the distri-
butional facts for dependent clauses with two verbal elements. We can therefore
extrapolate that if the lack of scrambling is responsible for the marked variant
with one verbal element and if this variant is predominantly produced by inform-
ants who prefer the VPR-variant, then the VPR-variant itself should be the con-
sequence of a lack of scrambling. We thus assume that the application of syn-
tactic preferences can differ quantitatively (cf. the first point in Section 4.3.2), but
not qualitatively: Once a scrambler, always a scrambler; once a non-scrambler,
always a non-scrambler. The converse argument also applies: Informants who
prefer the VR-variant produce the marked non-scrambled variant with one verbal
element far less frequently than informants who prefer the VPR-variant. This difference does not come as a surprise because we have shown that the VR-variant is the consequence of scrambling of the complement out of the verb phrase (i. e. VP2; cf. Table 3).

4.4.1 Prepositional and non-prepositional complements


This section deals with the question of whether ObjPPs show a different scrambling behavior from ObjNPs in MLG, i. e. whether they promote the occurrence of the variant in question. Haider (2010: 147, 158, 173) deals extensively with the question of whether ObjPPs can move in German. He confirms this possibility but does not state explicitly whether ObjPPs scramble more or less than ObjNPs. One may see an indirect indication of less scrambling in the fact that Haider (2010: 140) writes that “[p]repositional objects are the lowest ranking objects”, i. e. they constitute the most deeply embedded argument in the verb phrase. It is difficult, however, to say whether more deeply embedded objects scramble less than less deeply embedded ones. Schmitz’ (2006: 44) comment is more telling in this respect; she notes that the scrambling of prepositional objects is possible in German, but that its acceptability is lower than that of scrambled non-prepositional objects. This is also the situation in MLG, not only with regard to acceptability but also with regard to production. In the MLG clauses analyzed here, scrambling ObjPPs is much less frequent than scrambling ObjNPs (a fact which we cannot elaborate on here due to lack of space). Therefore, we can formulate the following expectation with regard to this variable: If the marked variant with one verbal element is indeed the consequence of a lack of scrambling, we expect this variant to occur more often with ObjPPs than with ObjNPs, since ObjPPs do not scramble as easily as ObjNPs in MLG. Two clauses can be used to check this expectation, since they show enough variation with regard to the marking of the indirect object and with regard to its position: <32> The stories that he’s telling the men are very sad and <37> I have found the book that I have given to the children. We have already seen examples of <32> (cf. (6a–b)), so here we will give examples of <37>:

stimulus <37> Spanish: Encontré el libro que les di a los niños
English: I have found the book that I have given to the children

(19) a. ich ha det Bük gefunge waut ik de Kinder gev (Men-11; f/18/MLG)
I have the book found that I the children give-VERB

b. ik fung- funk daut Bük waut ik no de Kinder gov (Men-47; f/60/MLG)
I foun- found the book that I to the children gave-VERB

c. ik hat de Bük gefunge waut ik ge- [0.4] waut ik gov to de Kinder (Men-3; f/38/MLG)
I had the.REDUCED book found that I gi- […] that I gave-VERB to the children

We cannot add an example with an ObjNP for the marked variant since there is
not a single token for the sequence verb-ObjNP. Translation (19c) is especially
interesting because of the repair contained in it. One might have expected the
informant to restart in order to put the ObjPP in the expected preverbal position,
but she actually restarts in order to correct the tense, i. e. past tense instead of
present tense. Other than this, she sticks firmly to her serialization. Table 6 offers
the distributional facts for clauses <32> and <37> (only definite ObjNPs/PPs):

Table 6: Distribution of the two variants in two relative clauses with one verbal element
separated according to the morphological type of object (only definite ObjNPs/PPs)

                  ObjNP         ObjPP         total
n (tokens)        321           36            357
ObjNP/PP-verb     321 (100%)    30 (83.3%)    351 (98.3%)
verb-ObjNP/PP     0 (0%)        6 (16.7%)     6 (1.7%)

χ2 (df=1, n=357) = 54.4, p=0*** / Phi: 0.39 / 1 cell (25%) with less than 5 expected tokens (Fisher’s Exact: p=0***)
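As a check on the figures reported under Table 6, the following Python sketch recomputes Pearson's χ2, Phi and the number of cells with fewer than five expected tokens from the four cell counts. It is an illustration, not the procedure used in this study. It assumes that χ2 was computed without Yates' continuity correction (the uncorrected statistic is the one that matches the reported 54.4) and that Phi = √(χ2/n). The function name chi_square_2x2 is hypothetical.

```python
from math import sqrt


def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction), phi coefficient and
    number of cells whose expected count is below 5, for a 2x2 table."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    # expected count of each cell under independence: row total * column total / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum((obs - exp) ** 2 / exp
               for obs_row, exp_row in zip(table, expected)
               for obs, exp in zip(obs_row, exp_row))
    phi = sqrt(chi2 / n)
    small_cells = sum(exp < 5 for exp_row in expected for exp in exp_row)
    return chi2, phi, small_cells


# Table 6: rows = ObjNP/PP-verb, verb-ObjNP/PP; columns = ObjNP, ObjPP
chi2, phi, small = chi_square_2x2([[321, 30], [0, 6]])
print(f"chi2 = {chi2:.1f}, phi = {phi:.2f}, cells with E < 5: {small}")
# prints: chi2 = 54.4, phi = 0.39, cells with E < 5: 1
```

Applied to the counts of Table 8, chi_square_2x2([[148, 224], [0, 13]]) gives χ2 ≈ 8.4, Phi ≈ 0.15 and one cell below five expected tokens, which also agrees with the reported values.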

The distribution in Table 6 is highly significant according to both Pearson’s Chi-Square and Fisher’s Exact. For these two clauses, it seems that only VPs with ObjPPs, and not VPs with ObjNPs, can be raised, thus surfacing after the finite verb, which has moved to I⁰. The strength of association is considerable (Phi = 0.39). With this result, we have the first independent support for our hypothesis that, besides verb projection raising, it is the lack of scrambling that causes the marked sequence verb-ObjNP/PP. Moreover, there is another fact that argues against extraposition of the ObjPP as an alternative explanation.
Recall translations like (8) immer wann ik weggo von Hüs dann du ik immer daut Lich ütmeaken (always when I away-go from home then do I always the light out-make). For these tokens, extraposition was regarded as a possible explanation due to the position of the ObjPP after the particle and due to the obvious lack of raising in the particle-verb sequence in weggo. Fortunately, the informants who produced most of these presumably extraposed tokens can be characterized with regard to their behavior in verb clusters. Ten of them prefer the scrambled NR-variant II in dependent clauses with two verbal elements. This share of 83.3 % is markedly higher than this group’s share of the marked variant dealt with in Section 4 (5.3 % of the 59 tokens). The informants who prefer the non-scrambled VPR-variant produce only one token with an extraposed ObjPP as in (8) (8.3 %), as compared to 31 tokens of the variant which interests us (54.4 %; the share of the 21 tokens with final ObjPPs is 52.4 %). The syntactic profiles of the mainly non-scrambling informants who produce tokens like (6a) and (19c) and the mainly scrambling informants who produce tokens like (8) could not be further apart.

4.4.2 The definiteness of complements


Indefinite ObjNPs/PPs normally represent new, unknown information; they therefore scramble less easily than definite ObjNPs/PPs (cf. Eisenberg’s (2013: 382) point (1e)). This means that if a lack of scrambling is part of the derivation of the marked sequence verb-ObjNP/PP, we expect the occurrence of this sequence to be promoted by indefinite complements. Ironically, the decisive test for this question is only possible thanks to partly erroneous translations: some informants translated definite articles as indefinite ones or vice versa. Examples are given for sentences <32> and <38>. Tokens with definite arguments have already been presented for <32> (cf. (6a–b)). The examples in (20a–b) show either a preposed indefinite ObjNP or a postposed indefinite ObjPP. The examples in (21a–b) show indefinite ObjNPs on either side of the verb:

stimulus <32> Spanish: Las historias que les está contando a los hombres son muy tristes
English: The stories that he is telling the men are very sad

(20) a. die: Geschichte waut hei nem Mann vertahlt is sehr trürig (Men-39; f/36/MLG)
the story that he a.REDUCED man tells-VERB is very sad

b. die Geschichten waut hei [0.5] vertahlt an em- an ne Männer¹⁶ sin sehr trürig (Mex-93; f/39/MLG)
the stories that he […] tells-VERB to a- to such.REDUCED men are very sad

stimulus <38> Spanish: El hombre que provocó el accidente desapareció
English: The man who caused the accident has disappeared

(21) a. De Omtje waut da en accident hat [0.8] der is furtgekummen (Mex-51; m/22/MLG)
the man that there an.REDUCED accident has-VERB […] he is away-gone

b. der Omtje det hat einen accident [0.5] is wajch (USA-17; f/14/Engl)
the man that-RELATIVE PARTICLE has-VERB an accident […] is away

Table 7 presents the distributional facts. The tokens are split between ObjNPs and ObjPPs (cf. Section 4.4.1). For the ObjNPs, a further separation by clause type had to be introduced, since complement clauses promote the marked variant and the difference in its share between complement and relative clauses is highly significant (cf. Section 4.2.2). Such a difference does not exist with regard to the ObjPPs.

16 Informant Mex-93 starts out with an em (‘to a’), which she then repairs into an ne Männer
(‘to such men’). As (20b) is the only token where an indefinite ObjPP surfaces after the verb, it
is important that this complement is indeed indefinite. For an em, the categorization of em as
a reduced form of the indefinite article is unproblematic since cliticization of definite articles
is not present in the Mennonite data set. The semantically singular ne in an ne Männer is more
problematic: Ne seems to be a reduced form of the complex plural determiner sone (‘such’, a
portmanteau of solch eine ‘such a’, cf. Duden 2006: 330–331), which does occur several times
as a full form in the data set. In spite of the partially “definite” quality of solch in sone, the
characterization of the entire ObjPP as indefinite is justified – firstly because the more important
first attempt an em contains a clearly indefinite article and secondly because it is precisely the
“definite” part solch which is missing in an ne Männer.

Table 7: Distribution of the two variants in five dependent non-causal clauses with one verbal
element in all colonies separated according to the definiteness of the ObjNP/PP and according
to the type of clause (obj=ObjNPs/PPs; part=particle)

                   ObjNPs                                                ObjPPs
                   complement clauses          relative clauses          complement clause <5>
                   <1> and <4>                 <32> and <38>             and relative clause <32>
                   +definite     -definite     +definite     -definite   +definite     -definite
n                  143           247           248           7           105           5
obj-verb(-part)    141 (98.6%)   234 (94.7%)   246 (99.2%)   6 (85.7%)   91 (86.7%)    2 (40%)
verb-obj(-part)    2 (1.4%)      13 (5.3%)     2 (0.8%)      1 (14.3%)   14 (13.3%)    3 (60%)

complement clauses <1> and <4> (ObjNPs): χ2 (df=1, n=390) = 3.7, p=0.056(*) / Phi: 0.1 / 0 cells (0%) with less than 5 expected tokens
relative clauses <32> and <38> (ObjNPs): χ2 (df=1, n=255) = 10.6, p=0.001** / Phi: 0.2 / 2 cells (50%) with less than 5 expected tokens (Fisher’s Exact: p=0.08(*))
complement clause <5> and relative clause <32> (ObjPPs): χ2 (df=1, n=110) = 8, p=0.005** / Phi: 0.27 / 2 cells (50%) with less than 5 expected tokens (Fisher’s Exact: p=0.026*)

All distributions in Table 7 show a concentration of the marked variant in the tokens with an indefinite ObjNP/PP. In the complement clauses with ObjNPs, the distribution shows a statistical tendency (p=0.056). In the other two blocks, the distributions are highly significant, but in each case there are two cells with fewer than five expected tokens. In these cases, Fisher’s Exact was applied; it shows one statistical tendency (relative clauses with ObjNPs) and one significant result (ObjPPs). The fact that all of these distributions follow the same pattern gives us confidence that the concentration of the marked variant in the tokens with an indefinite ObjNP/PP is a valid result in spite of the unequal numbers of tokens in the cross tabulations.
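Because two of the three blocks in Table 7 have cells with fewer than five expected tokens, Fisher’s exact test is reported for them. A minimal two-sided version can be written with the Python standard library alone. This sketch assumes the common convention of summing the probabilities of all tables that are no more probable than the observed one (the chapter does not state which software or convention was used); with that convention it gives p ≈ 0.080 for the relative clauses with ObjNPs and p ≈ 0.026 for the ObjPPs, in line with the reported values. The function name is illustrative.

```python
from math import comb


def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact test for a 2x2 table [[a, b], [c, d]].

    With all row and column totals fixed, the top-left cell follows a
    hypergeometric distribution. The p-value is the summed probability of
    all tables that are no more probable than the observed one.
    """
    (a, b), (c, d) = table
    row1 = a + b          # total of the first row
    col1 = a + c          # total of the first column
    n = a + b + c + d

    def prob(x):
        # probability that the top-left cell equals x, given the fixed totals
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_observed = prob(a)
    lowest, highest = max(0, row1 - (n - col1)), min(row1, col1)
    # a tiny relative tolerance keeps tables tied with the observed one
    return sum(p for p in map(prob, range(lowest, highest + 1))
               if p <= p_observed * (1 + 1e-9))


# Table 7, rows = obj-verb(-part), verb-obj(-part); columns = +definite, -definite
relative_objnp = [[246, 6], [2, 1]]   # relative clauses <32> and <38>
objpp = [[91, 2], [14, 3]]            # complement clause <5> and relative clause <32>
print(round(fisher_exact_two_sided(relative_objnp), 3))  # prints 0.08
print(round(fisher_exact_two_sided(objpp), 3))           # prints 0.026
```

For larger tables, a library routine such as scipy.stats.fisher_exact would normally be preferred; the hand-written version is only meant to make the hypergeometric logic behind the reported p-values visible.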
Moreover, there is additional support: Comparing the hitherto unused clause
<3> Don’t you see that I’m turning on the light with clause <4> Can’t you see that I am
wearing a new dress, we see that the two clauses share almost all characteristics
except the definiteness of the ObjNP. They both have an almost identical matrix
clause, they are both complement clauses, they both feature particle verbs and
they both contain ObjNPs describing a concrete, non-animate concept. Tokens
for clause <4> were already given (cf. (3a–b)); an example for clause <3> follows:

stimulus <3> Spanish: ¿No ves que estoy prendiendo la luz?
English: Don’t you see that I am turning on the light?

(22) siehts dü nich daut ik daut Licht anschalt (Fern-11; m/44/SHG)
see you not that I the light on-PARTICLE-switch-VERB

We cannot present an example of the marked variant for clause <3> because there
is not a single one. Conversely, clause <4> is among the sentences with the highest
share of the marked variant. Table 8 gives the distributional facts for the two
clauses. The distribution is highly significant; as expected, only the indefinite
ObjNPs are found in the marked variant:

Table 8: Distribution of the two variants in two dependent complement clauses with one verbal element in all colonies separated according to the definiteness of the ObjNP (stimulus sentence <3> with definite ObjNPs; stimulus sentence <4> with indefinite ObjNPs; obj = ObjNPs; part = particle)

                   clause <3>    clause <4>    total
                   +definite     -definite
n                  148           237           385
obj-verb(-part)    148 (100%)    224 (94.5%)   372 (96.6%)
verb-obj(-part)    0 (0%)        13 (5.5%)     13 (3.4%)

χ2 (df=1, n=385) = 8.4, p=0.004** / Phi: 0.15 / 1 cell (25%) with less than 5 expected tokens (Fisher’s Exact: p=0.002**)

In Section 4.3.2, we have given a parsing-related explanation for the generally low
number of raised VPs in dependent non-causal clauses with one verbal element.
In this section and in Section 4.4.1, we have shown that morphological and
semantic characteristics of the complements in these clauses – characteristics
known for their suppressing effect on scrambling – clearly promote the occur-
rence of the marked and raised variant. We therefore conclude (i) that the lack
of scrambling causes the rare and highly marked sequence verb-ObjNP/PP and
(ii) that Den Besten and Broekhuis (1989) may be right after all when claiming
that the VR-variant (but not the VPR-variant) is the consequence of scrambling,
at least with regard to MLG (cf. their quote in footnote 11).
Importantly, the analysis of dependent clauses with two verbal elements confirms our hypothesis. Clauses with indefinite ObjNPs/PPs co-occur significantly more frequently with the VPR-variant (no scrambling of the ObjNP/PP) than clauses with definite ObjNPs/PPs; clauses with definite ObjNPs/PPs, in contrast, co-occur significantly more frequently with the VR-variant (scrambling of the ObjNP/PP). Checking the difference between definite ObjNPs and definite ObjPPs in dependent clauses with two verbal elements also yields the expected distribution, i. e. a significant affinity of ObjPPs with the VPR-variant and of ObjNPs with the VR-variant. One question which obviously has to be answered is what causes the movement we have labeled scrambling. For verb projection raising, the avoidance of left-branching structures, which are difficult to parse, was identified as the cause (cf. Section 4.3.2). For scrambling, which is often seen as an optional, pragmatically motivated movement, the answer is less clear. One must not forget that all analyses in this article are based on the translations of context-free sentences where pragmatic considerations are of secondary importance at best. Besides this, the co-occurrence of either two scrambled or two unscrambled variants in clauses with one and two verbal elements does not fit well with the nature of an optional movement, but suggests structural causes such as feature checking. We do not, however, have an answer to this question yet and leave it to further research.

4.4.3 Causal clauses in the South American colonies


So far, we have refrained from including causal clauses in our analyses. The
reason for this is that they – like causal clauses introduced by weil (‘because’) in
colloquial German – behave differently from other types of dependent clauses.
Table 9 shows that almost half of the Mennonite tokens of causal clauses with one verbal element display the “marked” sequence verb-ObjNP (cf. footnote 6).
This is because the introductory element wegen(s) (in South America also wiel(s);
both ‘because’) has been largely reanalyzed as a coordinating element in the two
linguistically most progressive colonies, i. e. the USA and Mexico. Thus, most
informants there generate not just superficial, but structural V2-causal clauses.

Table 9: Distribution of the two variants in causal clauses with one verbal element in all
colonies separated according to the informants’ origin (obj=ObjNPs; part=particle)

If almost three quarters of all tokens in Mexico (and even more in the United
States) show the non-verb-final pattern, reanalysis seems to be the only pos-
sible explanation (cf. Kaufmann 2003: 188–189). We therefore have to reduce
the scope of the following analysis to the South American colonies. Even there,
however, the lowest share of the marked variant (Menno with 8 %) is comparable with the highest share for the other types of clauses (the US-American Mennonites with 8.4 %; in that category, Menno has a share of 0.7 %; cf. Table 1). If our
hypothesis with regard to the iconic relationship between V2-clauses and more
syntactic independence is correct (cf. Section 4.2.2), this relatively high share of
superficial V2-causal clauses correctly indicates that extraposed causal clauses
are indeed more independent and thus less deeply embedded than extraposed
complement clauses, let alone (non-extraposed) relative and conditional clauses.
A possible consequence of such a constellation is reanalysis: the higher the share
of V2-clauses, the higher the probability that superficial V2 turns into structural
V2. This has apparently happened in the MLG of most informants in Mexico and
the USA. In the South American colonies, the process has not yet reached the
point of no return, but even in the SHG-competent Paraguayan colonies, causal
clauses have to be analyzed independently. In these colonies, raising is frequent
in clusters with three verbal elements (cf. Section 4.3.2), but rare in clusters with
two verbal elements and virtually absent in dependent non-causal clauses with
one verbal element (cf. Table 1). Causal clauses in Paraguay, however, are gener-
ated much more frequently with the marked sequence verb-ObjNP and with the
VPR-variant in clusters with two verbal elements. Two examples each are given in (23) and (24); those in (23) feature a particle verb, those in (24) a simple verb:

stimulus <23> Spanish: No te puede escuchar porque está sacando las cosas de la maleta
English: He can’t listen to you because he is unpacking his luggage

(23) a. dei kaun di nich hiere wiels dei riemt grad die Koffer üt (Fern-34; m/25/SHG)
he can you not hear because he packs-VERB just the suitcases out-PARTICLE

b. hei kaun [0.7] di nich hiere wiels hei sinen Koffer ütpackt
(Fern-11; m/44/SHG)
he can [...] you not hear because he his suitcase out-PARTICLE-packs-VERB

stimulus <24> Spanish: No está aquí porque está ayudando a tu padre
English: He is not here because he is helping your father out

(24) a. her is nicht hier wejens hei helpt sin [0.3] Voda (Men-24; m/25/MLG)
he is not here because he helps-VERB his […] father

b. her is nich hier wejens hei dinen Voda halp (Men-15; f/20/MLG)
he is not here because he your father helps-VERB

Table 10 shows the distribution of South American causal clauses with regard to
the different types of informants.

Table 10: Distribution of the two variants in dependent causal clauses with one verbal element
in the South American colonies separated according to the informants’ behavior in dependent
non-causal clauses with two verbal elements (vpr = verb projection raising; obj = ObjNPs;
part = particle)

informants who prefer the   NR-variant I   NR-variant II   VPR-variant   VR-variant   total
characteristics             -vpr           -vpr            +vpr          +vpr
                            -scrambling    +scrambling     -scrambling   +scrambling
n                           104            258             27            22           411
n (indefinite NPs)          64 (61.5%)     161 (62.4%)     21 (77.8%)    12 (54.5%)   258 (62.8%)
obj-verb(-part)             87 (83.7%)     233 (90.3%)     15 (55.6%)    18 (81.8%)   353 (85.9%)
verb-obj(-part)             17 (16.3%)     25 (9.7%)       12 (44.4%)    4 (18.2%)    58 (14.1%)

χ2 (df=3, n=411) = 25.4, p=0*** / Cramér’s V: 0.25 / 2 cells (25%) with less than 5 expected tokens
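The expected token counts cited below (3.8 for the informants who prefer the VPR-variant, 36.4 for those who prefer the NR-variant II) and the statistics under Table 10 can all be derived from the marginal totals. The following Python sketch is not the author's code. It assumes Pearson's χ2 without correction and Cramér's V = √(χ2 / (n · (k − 1))), where k is the smaller of the numbers of rows and columns; the function name is hypothetical.

```python
from math import sqrt


def chi_square_rxc(table):
    """Pearson chi-square, degrees of freedom, Cramer's V and expected
    counts for an r x c contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    # expected count of each cell under independence
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum((obs - exp) ** 2 / exp
               for obs_row, exp_row in zip(table, expected)
               for obs, exp in zip(obs_row, exp_row))
    n_rows, n_cols = len(table), len(table[0])
    dof = (n_rows - 1) * (n_cols - 1)
    cramers_v = sqrt(chi2 / (n * (min(n_rows, n_cols) - 1)))
    return chi2, dof, cramers_v, expected


# Table 10: rows = obj-verb(-part), verb-obj(-part);
# columns = informants preferring NR-variant I, NR-variant II, VPR-variant, VR-variant
table10 = [[87, 233, 15, 18],
           [17, 25, 12, 4]]
chi2, dof, v, expected = chi_square_rxc(table10)
print(f"chi2 (df={dof}) = {chi2:.1f}, Cramer's V = {v:.2f}")
print("expected marked tokens:", [round(e, 1) for e in expected[1]])
# prints: chi2 (df=3) = 25.4, Cramer's V = 0.25
#         expected marked tokens: [14.7, 36.4, 3.8, 3.1]
```

The two expected counts below 5 (3.8 and 3.1, both in the marked row) are the "2 cells (25%)" noted under the table.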

The distribution shows a highly significant difference and the same concentra-
tions we have found in complement, relative and conditional clauses (cf. Table 4).
Informants who prefer the VPR-variant in dependent clauses with two verbal elements produce twelve of the 58 tokens of the marked variant (20.7 %; 3.8 expected tokens), although they account for only 6.6 % of all tokens (27 out of 411). The
other extreme is again found among the informants who prefer the NR-variant
II. They produce 25 instead of 36.4 expected tokens of the marked variant (43.1 %
of the 58 tokens) although they are responsible for 62.8 % of all tokens (258 out
of 411). The other two types of informants again show intermediate shares of the
marked variant. The results for the quasi-interval index values for the informants’
raising and scrambling behavior are given in Table 11:

Table 11: Average values for verb projection raising and scrambling for the informants who
produce the two variants in dependent causal clauses with one verbal element in the South
American colonies

total ObjNP-verb verb-ObjNP

n 454 392 62
verb projection raising -0.157 -0.174 -0.053

n 414 354 60
scrambling -0.028 -0.013 -0.117
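The total column of Table 11 should be the token-weighted average of the two group columns. A short Python check, assuming that each value is a mean over the n tokens listed above it, reproduces the totals to three decimals (the inputs are themselves rounded). It cannot reproduce the F-tests reported below, which would require the individual index values; the helper name pooled_mean is hypothetical.

```python
def pooled_mean(groups):
    """Mean over all tokens, computed from (n, mean) pairs of sub-groups."""
    total_n = sum(n for n, _ in groups)
    return sum(n * mean for n, mean in groups) / total_n


# Table 11: (n, average index value) for ObjNP-verb and verb-ObjNP producers
raising = pooled_mean([(392, -0.174), (62, -0.053)])
scrambling = pooled_mean([(354, -0.013), (60, -0.117)])
print(f"raising: {raising:.3f}, scrambling: {scrambling:.3f}")
# prints: raising: -0.157, scrambling: -0.028
```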

The results are highly significant for both raising (F (1,452) = 18.4, p = 0***) and scrambling (F (1,412) = 7.7, p = 0.006**). This means that, once again, the informants who produce the variant in question are on average more raising-friendly and less inclined to scramble than the informants who produce the unmarked variant. This adds one more piece of independent evidence because (i) we have now analyzed causal clauses and not complement, relative or conditional clauses and (ii) we have now analyzed the data of fewer than half of all Mennonite informants: only 143 of the 313 informants come from the four South American colonies. In spite of these differences with regard to origin and type of clause, the distributional facts are the same. To put it more precisely: the behavior of the linguistically more conservative, i. e. more SHG-like, South American informants in less embedded causal clauses is exactly like the behavior of the linguistically more progressive, i. e. less SHG-like, North American informants in more embedded complement, relative and conditional clauses.

5 Concluding remarks

We have accrued considerable empirical backing for a rather old hypothesis put forward by Den Besten and Broekhuis, who claimed in 1989 that the VR-variant
in clusters with two verbal elements is the consequence of verb projection raising
and scrambling (cf. footnote 11). This assumption implies that the VPR-variant
is the consequence of raising without scrambling. This implication is supported
by our data at least with regard to MLG, a variety whose speakers are confronted
with the VPR-variant and the VR-variant all the time. This constant exposure to
both variants allows Mennonites to “consider” scrambling the distinguishing
factor between them. It does not necessarily imply, however, that the VR-variant
in a language like Standard Dutch is explainable in the same way (cf. Kaufmann
2007: 202), since Dutch speakers are not exposed to the VPR-variant to the same
degree as the Mennonite informants are. The findings of this article can be sum-
marized in the following way:

(i) Mennonite informants who prefer the VPR-variant in dependent clauses with two verbal
elements tend to produce the marked sequence verb-ObjNP/PP.
(ii) We have found much evidence for the assumption that this marked sequence is the con-
sequence of raising without scrambling.
(iii) As these informants prefer the VPR-variant and as syntactic preferences most probably do
not depend on the number of verbal elements, we assume that the VPR-variant in MLG is
also the consequence of raising without scrambling.
(iv) Mennonite informants who prefer the VR-variant in dependent clauses with two verbal
elements produce the marked sequence verb-ObjNP/PP to a much lower extent.

(v) As these informants share their proneness for raising with the informants who prefer the
VPR-variant (their clusters are only differentiated by the position of the ObjNP/PP, not by
the sequence verb1-verb2), the significant difference in the share of the marked variant must
be caused by a differing scrambling behavior.
(vi) Thus we conclude that the VR-variant in MLG is the consequence of raising plus scrambling.

Combining the analysis of the marked variant with the analysis of the more frequent variation in MLG verb clusters with two verbal elements is in line with Rijkhoff’s (2010: 223) demand that “[r]are linguistic features should play in [sic] important role in grammatical theory, if only because a theory that can account for both common and unusual grammatical phenomena is superior to a theory that can only handle common linguistic properties”. Our analysis does exactly this: it explains a common and a rare phenomenon by means of the same theoretical assumptions.
The last point to discuss is whether the variation found in MLG is better explained by a more system-based or a more usage-based approach to language variation. In our view, the occurrence of the rare variant is the consequence of the infrequent and innovative¹⁷ overgeneralization of a system-based preference with regard to two syntactic movements, namely verb projection raising and scrambling. If the frequency of the marked ObjNP/PP-final variant rises in the future, this may affect the formation of new linguistic systems. One possible consequence could be a change from OV to VO, as in the history of English. It is not certain, however, whether we will need a usage-based approach in order to explain such a frequency effect. It may be enough to fall back on Lightfoot’s transparency principle in order to account for such a reanalysis. After all, Lightfoot (1999: 156) connects reanalysis with quantification.
Given these results, we consider the combination of modern syntactic theory and variation linguistics to be a fruitful one. Newmeyer (2005: 160) might be right when he says: “But it is a long way from there [importance of statistical data from corpora] to the conclusion that corpus-derived statistical information is relevant to the nature of the grammar of any individual speaker”. In spite of this, we should not discount the possibility that, although “corpus-derived statistical information is [perhaps not] relevant to the nature of the grammar”, it may be decisive for detecting “the nature of grammar”. Be this as it may, we do hope to have come somewhat closer to a state of the art which Haider (2007: 389) rightly demands for modern linguistics:

Generative Grammar is not free of post-modern extravagances that praise an extravagant idea simply because of its intriguing and novel intricacies as if novelty and extravagance by itself would guarantee empirical appropriateness. In arts this may suffice, in science it does not. Contemporary papers too often enjoy a naive verificationist style and seem to completely waive the need of independent evidence for non-evident assumptions. The rigorous call for testable and successfully tested independent evidence is likely to disturb many playful approaches to syntax and guide the field eventually into the direction of a serious science. At the moment we are at best in a pre-scientific phase of orientation, on the way from philology to cognitive science.

17 The marked variant can be called innovative because the Brazilian informants who use it are on average 8.2 years younger than the ones producing the unmarked variant (F (1,350) = 3.3, p = 0.069(*)). In Mexico, the difference is 7.7 years (F (1,582) = 6.4, p = 0.012*); in the USA, the difference of 3.4 years is smaller and not significant.

References
Auer, Peter (1998): Zwischen Parataxe und Hypotaxe: ‚abhängige Hauptsätze‘ im gesprochenen und geschriebenen Deutsch. Zeitschrift für Germanistische Linguistik 26: 284–307.
Barbiers, Sjef (2000): The right periphery in SOV languages: English and Dutch. In: Peter
Svenonius (ed.), The Derivation of VO and OV, 181–218. Amsterdam/Philadelphia: John
Benjamins.
Bennis, Hans (1992): Long Head Movement: The position of particles in the verbal cluster
in Dutch. In: Reineke Bok-Bennema and Roeland van Hout (eds.), Linguistics in the
Netherlands 1992, 37–47. Amsterdam/Philadelphia: John Benjamins.
Besten, Hans den and Hans Broekhuis (1989): Woordvolgorde in de werkwoordelijke eindreeks.
GLOT 12: 79–137.
Duden (2006): Die Grammatik: Unentbehrlich für richtiges Deutsch. Mannheim: Dudenverlag.
Eisenberg, Peter (in collaboration with Rolf Thieroff) (2013): Grundriss der deutschen
Grammatik – Volume 2: Der Satz. Stuttgart/Weimar: Metzler.
Haegeman, Liliane (1994): Verb raising as verb projection raising: some empirical problems.
Linguistic Inquiry 25/3: 509–522.
Haider, Hubert (2007): As a matter of facts – comments on Featherston’s sticks and carrots.
Theoretical Linguistics 33/3: 381–394.
Haider, Hubert (2010): The Syntax of German. Cambridge: Cambridge University Press.
Kaufmann, Göz (2003): The verb cluster in Mennonite Low German. In: Klaus J. Mattheier
and William Keel (eds.), German Language Varieties Worldwide: Internal and External
Perspectives, 177–198. Frankfurt a. M.: Peter Lang.
Kaufmann, Göz (2007): The verb cluster in Mennonite Low German: A new approach to an old
topic. Linguistische Berichte 210: 147–207.
Kaufmann, Göz (2011): Looking for order in chaos: Standard convergence and divergence in
Mennonite Low German. In: Mike Putnam (ed.), Studies on German-Language Islands,
187–230. Amsterdam/Philadelphia: John Benjamins.
Keller, Rudi (1993): Das epistemische weil. Bedeutungswandel einer Konjunktion. In: Hans
Jürgen Heringer and Georg Stötzel (eds.), Sprachgeschichte und Sprachkritik. Festschrift
für Peter von Polenz zum 65. Geburtstag, 219–247. Berlin/New York: de Gruyter.

Larrew, Olha (2005): Norm, Normen, Normabweichungen: Eine historische und empirische
Untersuchung zur wissenschaftlichen Bewertung morphosyntaktischer Konstruktionen im
Deutschen. Hamburg: Dr. Kovač.
Lightfoot, David (1999): The Development of Language: Acquisition, Change, and Evolution.
Oxford/Malden, MA: Blackwell Publishers.
Newmeyer, Frederick J. (2005): Possible and Probable Languages: A Generative Perspective on
Linguistic Typology. Oxford: Oxford University Press.
Rijkhoff, Jan (2010): Rara and grammatical theory. In: Jan Wohlgemuth and Michael Cysouw
(eds.), Rethinking Universals: How Rarities Affect Linguistic Theory, 223–239. Berlin/New
York: Mouton de Gruyter.
Schmitz, Katrin (2006): Zweisprachigkeit im Fokus: Der Erwerb der Verben mit zwei Objekten
durch bilingual deutsch-französisch und deutsch-italienisch aufwachsende Kinder.
Tübingen: Narr.
Wexler, Kenneth and Peter Culicover (1980): Formal Principles of Language Acquisition.
Cambridge, MA: MIT Press.
Zwart, Jan-Wouter (1996): Verb Clusters in Continental West Germanic Dialects. In: James
R. Black and Virginia Motapanyane (eds.), Microparametric Syntax and Dialect Variation,
229–258. Amsterdam/Philadelphia: John Benjamins.
Leonie Cornips, Meertens Instituut/Maastricht University
The no man’s land between syntax and
variationist sociolinguistics: The case of
idiolectal variability¹

Abstract: The aim of this paper is to focus on the so-called no man’s land where
sociolinguistics and grammatical theory interact. It is argued that E-language
as a social and I-language as a psychological construct do not exist indepen-
dently, but influence each other. In other words, syntactic variation and change
are driven by social factors but constrained by the nature of possible grammars.
The interaction between the social meanings of linguistic forms on the one hand
and grammar on the other brings about complex and multi-layered relationships
between the individual and the group’s or societal grammar. This paper empha-
sizes how individuals are restricted by grammar but, at the same time, able to
overcome these restrictions in specific situated contexts through interactions.
This combined approach enables us to predict why some structures are more
resistant or vulnerable to syntactic variation and change than others and the
route(s) individuals may take to overcome syntactic “restrictions”. In this process
of interdependent relations between the I- and E-languages, the interpretation
and evaluation of linguistic forms through interaction is of crucial importance in
the realization of so-called “impossible” or “unrealized” constructions.

1 Introduction

According to the editors of this volume – Aria Adli, Marco García García and Göz Kaufmann – the system-usage issue remains controversial in modern linguistics, and a single axis with the endpoints “system-based” and “usage-based” does not do justice to the complexity of linguistic reality. In this paper, I will try to go beyond the system-usage dichotomy by viewing syntactic variation as a crossroads where sociolinguistics and generative grammar meet. Syntactic variation can be considered a multilayered phenomenon that results from cognitive capacities (cf. Kayne 1994) and is strongly influenced by social or linguistic practices (cf. Labov 1972, 1994; but more specifically Eckert 2000, 2008; Silverstein 1985). Attempts to let these two approaches interact (Adger 2006; Adger and Trousdale 2007; Adger and Smith 2010; Cornips and Corrigan 2005a, 2005b; Wilson and Henry 1998, among many others) should give us insight into how syntactic variation is driven by social factors but constrained by the nature of possible grammars (cf. Wilson and Henry 1998: 82).

1 Aria Adli suggested the first part of this title to me.

The aim of this paper is to emphasize two crucial issues regarding linguistic complexity that must be tackled within the grammar and usage debate: idiolectal or intraspeaker variation, and the complex and multilayered relationships between the individual and society that have mutual effects on both the individual and the group grammar. In my opinion, it is at the level of the individual speaker that we can best examine the locus and limits of syntactic variation. It is here that we encounter the largest possible variation space and its boundaries, and that we find answers to the questions of whether, why and how speakers cross these boundaries.
I will present four case studies that deepen our insight into why and how speakers in situated contexts cross syntactic boundaries, showing complex and multilayered relationships between the individual and the group grammar. These case studies show that speakers not only differ from each other but also display intraindividual variation. Speakers select and incessantly combine linguistic forms, producing multilayered clusters of linguistic elements in social and geographical space, but this only comes about because speakers continuously (re)interpret these forms (Eckert 2008: 463). Thus, linguistic forms are always available for reinterpretation and carry various social meanings through discourse. The process of reinterpretation, and hence of giving social meaning to linguistic forms, does not entail that speakers can select and combine linguistic forms randomly or arbitrarily. It does imply, however, that ungrammatical or unacceptable constructions may become acceptable and be realized in the course of interaction, since speakers make meaning by producing and interpreting linguistic forms together.
The first case study, which focuses on word order alternations in a particular three-verb cluster in Dutch dialects, shows that speakers accept more than one order in this type of cluster. However, the specific combinations of possible word orders are not distributed randomly: some combinations are categorically absent, while others occur but are restricted in geographical space. The second case study examines variation between overt and null forms of the determiner. It reveals to what extent individual speakers may accommodate to and identify with their interlocutors, resulting in intraindividual variation without any noticeable effort. The third case study deals with contemporary urban vernaculars and is particularly interesting because syntactic restrictions are loosened in these varieties. As a result, new types of constructions emerge as an integral part of grammar even where the phenomenon in question does not seem to allow for variation, as in the case of verb-second word order in Continental West Germanic varieties. Youngsters in large cities may override verb-second word order in situated contexts, but again this variation is not random, since the syntactic route to overcoming it is the same for every speaker. Finally, the case study on the overuse of common gender determiners in Dutch shows that language is not a neutral medium for communication, but rather a medium through which social acts are accomplished. A speaker who uses common-gender definite determiners instead of the required standard neuter one explicitly argues that he cannot use the standard form because it would make a dumb impression on his peers.
The paper is organized as follows: The second and third sections present a brief overview of the history of the generativist and sociolinguistic approaches to variation and address the question of why they diverged and why they seem to be converging now. These sections are important since generativist and sociolinguistic theories have important interfaces: they explain why not all linguistic forms vary to the same degree (grammar component), and why speakers may cross “boundaries” regarding so-called ungrammatical or unacceptable constructions, resulting in language variation and change (sociolinguistic component). The fourth section elaborates on this interplay between I- and E-languages. The fifth section presents the four case studies illustrating the intriguing interplay between linguistic and social organization.

2 Two diverging approaches to syntactic variation

Cornips and Corrigan (2005a, 2005b) were of course not the first to point out that researchers who espouse the frameworks encapsulated by the umbrella terms “grammar” (as in generative theory) and “usage” (as in quantitative or variationist sociolinguistics) diverge quite rigidly in terms of both their methodological approaches and their theoretical persuasions. Although certain formal resonances between the paradigms have endured since their inception, the fundamental differences between them created a schism that persisted through most of the later twentieth century (see references in Cornips and Corrigan 2005a, 2005b).
In the 1960s, both sociolinguistic and formal syntactic models contained formal rules that could be applied obligatorily or optionally.2 Formal rules in the earliest Chomskyan model were transformations that connected “deep” structures with “surface” structures on the basis of rewrite rules. Optional rules, for example, derived passive, negative or interrogative sentences from declarative sentences. Labov introduced the concept of the variable rule as an extension of this optional rule to include social and stylistic dimensions of language use along with linguistic dimensions. However, the two paradigms soon went their separate ways: the successive transformational models assume the existence of categorical rules only, while variationist sociolinguistics has maintained the notion of the optional rule. The two perspectives on the nature of formal rules reflect deep-seated differences between the two models. Variationists claim that the output of a linguistic rule can be probabilistic rather than discrete and that a linguistic constraint can have a quantitative rather than a deterministic effect on the outcome of the process.

2 For a more extensive overview, see Cornips 2005.
Labov nevertheless based his research on the generative linguistic model by putting forward the variable rule as a means of accommodating interspeaker and intraspeaker variation. The variationist sociolinguistic practice that has since evolved from studies of language variation and change takes the principle of accountability as a given (cf. Sankoff 1990: 296). This principle states that the occurrences of the variants belonging to the same syntactic (linguistic) variable must be reported relative to the total number of contexts in the variable environment in which they could occur, counting non-occurrences as well, so that the frequency of each variant ranges between 0 % and 100 % (cf. references in Cornips and Corrigan 2005b). This guarantees that the entire range of variable and categorical occurrences present in the data is taken into account. The notion of the syntactic variable as a structural unit, and the question of which variants belong to this unit, were based on the earliest generative assumption that variants have an identical underlying structure or representation that is subject to variable surface realizations (Winford 1996: 177). The alternation between active and passive constructions is an example of two different surface manifestations of the same underlying or deep structure – that is, of two variants belonging to the same sociolinguistic variable (variable rule). The definition of the syntactic variable as a structural unit inevitably rests on the synonymy principle. This principle is the prerequisite for variants to be assigned to the same linguistic variable; in other words, only syntactic variants that are equivalent with regard to referential meaning, i. e. variants that are “alternate ways of saying the ‘same’ thing” (Labov 1972: 118), belong to the same variable. However, assigning meaning or function to syntactic variants was considered problematic soon after the notion was introduced (Lavandera 1978). Moreover, it has been suggested that some types of morphosyntactic variation rarely serve to differentiate social groups because of their dependence on pragmatic and semantic conditioning (cf. references in Cornips and Corrigan 2005b).
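The principle of accountability reduces to a proportion: the occurrences of a variant divided by all contexts of the variable environment, non-occurrences included. A minimal Python sketch, assuming every coded token is one such context labelled with the variant realized there; the function name `variant_percentages` and the counts are hypothetical illustrations, not data from any study cited here:

```python
from collections import Counter


def variant_percentages(tokens, variants=()):
    """Share (in %) of each variant of one linguistic variable.

    Each token is one context in the variable environment, coded with the
    variant realized there. Variants listed in `variants` but never realized
    are kept with 0 %, so non-occurrences stay visible in the result.
    """
    counts = Counter({variant: 0 for variant in variants})
    counts.update(tokens)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no contexts of the variable were coded")
    return {variant: 100 * n / total for variant, n in counts.items()}


# Hypothetical coding of 8 contexts in which active and passive alternate.
coded = ["active"] * 6 + ["passive"] * 2
print(variant_percentages(coded))  # {'active': 75.0, 'passive': 25.0}
print(variant_percentages(["active"] * 3, variants=("active", "passive")))
# {'active': 100.0, 'passive': 0.0}
```

Because every value is divided by the same total number of contexts, the shares are bounded by 0 % and 100 % and add up to 100 %, which is exactly what the principle requires.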
In the meantime, subsequent generative models abandoned the idea of a derivational model in favor of a configurational model (most recently Minimalism) in which a single representation is subject to various constraints. Formal syntax postulates a blind, deterministic application of a series of procedures given a certain starting point (Chomsky 1995). As a consequence, the original notion of the linguistic variable in sociolinguistics as a structural unit has been lost; more specifically, syntactic variants are now considered to be surface realizations of different underlying structures. Moreover, as theory has progressed, semantic interpretation has come to be seen as applying to the output of transformations rather than to their input: the semantic component interprets the structures provided by autonomous syntax (cf. Adger and Trousdale 2007: 269–270). Given this, the semantic component can be sensitive to different structures provided by the grammar, so that within formal syntax, too, it is an open question whether two structures or two variants have the same meaning (cf. Adger and Trousdale 2007: 269–270). In fact, semantic knowledge is mediated by syntax. Since models of generative grammar have followed each other in quick succession, different conceptions of the structural unit have emerged over the last decades. Nevertheless, as Sells et al. (1996: 173) put it: “Variation theory needs grammatical theory because a satisfactory grammatical characterization of a variable is a pre-requisite to decisions about what to count and how to count it, and it is an essential element in the larger question about where variation is located in speakers’ grammars”.

3 Needed: Reconciling approaches to account for linguistic complexity

The next two sections form the backbone of this paper. Here, it is argued that the ideal speaker-hearer provides us with a decontextualized view of syntactic variation and change, whereas the relationship between speaker and hearer is intrinsically social. We need both approaches to account for the fact that intra- and interindividual variation are not rare phenomena but occur in normal, daily situations. A combined approach is also needed to account for the fact that syntactic variation and change happen all the time. A first task for a combined approach of sociolinguistics and theoretical syntax in tackling linguistic complexity would be to find out: (i) which grammatical considerations are relevant to the definition of syntactic variables; (ii) whether grammatical theory can predict the differential vulnerability of diverse linguistic forms to variation and change; and (iii) how specific situated contexts influence individual grammars (Eckert 2000; Eckert and McConnell-Ginet 1992, 1999; Meyerhoff 2002).
Of course, with respect to theoretical syntax and sociolinguistics, Wilson and Henry (1998: 2) had already noted that “there have been few real attempts to marry these seemingly divergent positions”, and Meechan and Foley (1994: 63) likewise suggest that the two fields “rarely, if ever, cross paths”. However, since the 1990s and especially since the turn of the millennium, serious attempts have been undertaken to integrate grammar and usage.3 Wilson and Henry (1998: 82) argue that syntactic variation is driven by social factors but constrained by the nature of possible grammars. Cornips and Corrigan (2005a) emphasize that the generative approach has much to gain from a perspective in which the organization of the grammar is seen as somehow reflected in patterns of usage. Moreover, usage may lend strong support to a structural analysis and may reveal “a glimpse of grammatical structure” (Meechan and Foley 1994: 82). A core finding of sociolinguistics is that usage data are not amorphous. Rather, as Guy (2005: 563) notes, “linguistic diversity is well mannered and orderly, following observable principles of social and linguistic organization” and “[g]rammar and usage both exhibit structure and order, some of it probabilistic, some of it categorical”. The central point here is that intuitions and usage data differ, but also overlap. Usage data differ from intuitions in that they do not provide direct insight into which syntactic variants are ungrammatical. Moreover, they do not necessarily occur in the contexts that are relevant for specific theoretical concerns. On the other hand, generative practice also demonstrates that the intuitions of individuals who claim to speak the same variety may differ, and that an individual’s intuitions may vary with context as well as over time. Furthermore, usage data coincide with intuitions to a large extent and, even more relevantly, they may contain entirely new phenomena not predicted by theory. Consequently, theory is enhanced by analyzing and accounting for unexpected data and patterns (see for example Jensen and Christensen 2013).
Adger (2006) in fact proposes to account for variation in agreement patterns not only formally but also functionally: variation is not only restricted to linguistic representation but is also related to language in use. In his view, variation in agreement patterns is ultimately a matter of the properties of the lexicon of functional categories. The variants that make up the linguistic variable are “not simply determined by the linguistic context in which they appear, nor are they simply in free variation. Rather, they are more or less likely to be selected depending on the previous discourse, the speaker, the audience, and other psycholinguistics and sociolinguistics factors” (Adger and Trousdale 2007: 268–269). Adger and Smith (2010: 1109) argue that “the variability found in an individual speaker is two-dimensional: it may involve varying featural specification of functional categories and/or underspecification in the mapping between these categories and between morphological forms; the former modeling the kind of variation usually thought of as ‘parametric’ and the latter modeling the kind of variation usually captured by the notion of linguistic variable”. The lexical feature model, despite its limited scope, articulates the most recent formal insights very well. However, although syntax is viewed as completely autonomous in this model, sociolinguistic research shows that syntax continuously varies and changes, not only between generations but also at the level of the individual speaker. The ways in which individuals speak are constrained by grammar; syntactic variation is therefore certainly limited, and some parts of grammar are more resistant to variation than others (for instance, V2 in Germanic languages). But the other side of the story is that individuals are able to overcome so-called syntactic restrictions. In order to detect this, we need to study these individuals in situated contexts in interaction with others.

3 See brief overviews in Adger 2006; Adger and Trousdale 2007; and Cornips and Corrigan 2005a.
These contexts show how individuals divide up the world in which they live and how these divisions take shape linguistically. According to Eckert (2008, 2012), linguistic variation constitutes a robust social semiotic system that is able to express the full range of social concerns in a given community; variation does not simply reflect but constructs social meaning and is hence a force in social change. Speakers recognize syntactic variants as stereotypes, and these may be activated (or avoided) in public performances or in otherwise highly stylized uses of local-sounding speech (Eckert 2000; Rampton 1995). Speakers do not simply reflect “grammars” or “social categories” but are agents as well. Consequently, which linguistic element(s) will become socially meaningful depends on the individual and on the wider societal, political and ideological context of interaction (Cornips 2014). This may lead individuals to cross or stretch the borders of the variation space of syntax in specific situated contexts and social practices.

4 Linguistic complexity: Intraspeaker variation

Syntactic variation at the level of the individual speaker and the community is neither chaotic nor randomly distributed but is governed by social rules (Labov 1972, 1994 and many others). Important questions to be addressed are therefore: How and to what extent do individual and group grammars influence each other? Is it possible to predict which linguistic variants will become socially meaningful in specific situated contexts and which lexical features will ultimately be spelled out?
The relation between an individual and a societal “grammar” is not one of a simple dichotomy between I-language and E-language. I-languages are the product of genetic endowment and individual experience (Chomsky 1995). People who live closely together can understand each other because they share a common genetic endowment (by virtue of being human) and a common (linguistic) experience. This experience, however, is not completely identical and, therefore, one will always find some variation between the I-languages of people who claim to speak the same dialect (see Adger and Trousdale 2007: 271). This corresponds to Guy’s (2005: 562) summary of the principal findings of sociolinguistics, in which he states that individuals are grammatically (more or less) similar due to social proximity. Experience, then, does not take place in a social vacuum: it is shaped in interaction with and by others. Consequently, the opposition between I-language and E-language is not as watertight as suggested in the literature. According to Muysken (1999: 72, 2000: 41–43), the cognitive abilities that shape I-language determine the constraints found in E-language. Conversely, the norms created within E-language make I-language coherent. Thus, it is untenable to picture “grammar” as merely a transparent representation of inner mental events, since language is one of the most important media through which social acts, including linguistic norms, are accomplished. Language use itself is a form of social activity (Widdicombe and Wooffitt 1995: 1).
Henry (2005) has already called for more attention to the phenomenon of idiolectal (intraspeaker) variation from both sociolinguists and theoretical syntacticians. Within the classical grammar perspective, variation refers to structural differences between individual grammars (interspeaker, cross-linguistic variation or variation between closely related dialects) but not within the individual grammar itself. Central questions in current syntactic research are: (i) What are the limits of syntactic variation for the individual speaker and in general? and (ii) What is the locus of syntactic variation in the grammar model? In my opinion, intraspeaker variation is the most challenging kind of variation to examine, and for this reason it may be the key to gaining generative insights and answering these questions. After sixty years of theory development about the internal organization of grammar, the idealized speaker-hearer environment should be left aside and, instead, generative insights should be tested in the realm of language use, where this internal organization has its most complex output. E-language as a social construct and I-language as a psychological construct do not exist independently of one another; rather, their interaction influences the grammars of speakers and the way they speak. The multilayered relationships between language as a social product and language as “grammar” thus continuously influence language norms. These norms are crucial since they determine which linguistic elements are selected (or not selected) by speakers in which contexts and, consequently, relate to the central questions of how people use language in their daily lives (social practices) and how grammar is organized. The norms, the selection of linguistic elements and the daily practices of people influence one another continuously (see the “total linguistic fact” in Silverstein 1985). Only this “holistic” view of language can explain how individual grammars are restricted and, at the same time, how individuals are able to overcome these restrictions in specific situated contexts. This combined approach enables us to predict why some structures are more resistant to syntactic variation and change than others, and which route(s) individuals may take to overcome these syntactic “restrictions”. In this process, the interpretation and evaluation of linguistic forms through interaction is of crucial importance for the acceptance of so-called “impossible” or “unrealized” constructions.
The domain of micro-parametric variation – the syntactic differences between closely related individual grammars in social and geographical proximity – is very likely the best-suited domain for addressing the questions raised above. Micro-parametric variation research clearly shows that each speaker has his/her own grammar that differs minimally from the grammar of everybody else (Cornips and Poletto 2005; Barbiers, Cornips, and Kunst 2007). In this empirical domain, there has been a clear methodological shift in generative grammar. Intuitions or native-speaker introspections in an idealized environment used to be considered the only suitable tool in macro-parametric variation research. This often meant that the resulting analyses reflected the grammaticality judgments of the theorist, who may have been unaware of the considerable degree of syntactic variation that potentially exists within the same speech community (Cornips and Corrigan 2005b). However, recent micro-parametric variation research investigating dialects in many parts of Europe has drawn on the acceptability judgments of non-standard speakers, and it is hence in this domain that generative grammarians and sociolinguists are converging.
Regarding micro-parametric variation in social space, data can be collected by means of the so-called sociolinguistic interview – that is, systematic recordings of conversations between individuals. As a result, the analysis is based on socially situated language samples. Of course, the setting of the sociolinguistic interview is an experimental one (Labov 1972, 1975, 1994). The data for geographical micro-parametric variation, which is the object of large-scale syntactic dialect atlases such as the Syntactic Atlas of the Dutch Dialects (acronym SAND; cf. Barbiers, Cornips, and Kunst 2007), the Northern Italian Syntactic Dialect Atlas (acronym ASIS; cf. Poletto and Benincà 2007) and the Scandinavian Dialect Atlas (acronym ScanDiaSyn; cf. Vangsnes 2007), must be systematically elicited from a sample of community members (in a large geographical area) rather than derived from linguists’ own introspections.4 This not only enhances the empirical basis of syntactic theory, but also reduces the influence of prescriptive rules. The elicitation methodology in micro-parametric variation research relies on prior knowledge of variability within the speech community gained by observational methods, and it is on this basis that hypotheses are formulated and tested. In the domain of micro-parametric variation, systematic recordings of spontaneous speech and elicited acceptability judgments are both necessary and complementary.

4 More information about the ASIS, SAND and ScanDiaSyn projects can be found at: http://asis-cnr.unipd.it, http://www.meertens.nl/projecten/sand/sandeng.html and http://websim.arkivert.uit.no/scandiasyn/index.html%3fLanguage=en, respectively.

5 Idiolectal variation: Four case studies

In this section, I will discuss four case studies revealing intraindividual variation. The recent view on micro-parametric variation challenges the traditional assumption that idiolects are sufficiently similar. Kayne (1996: XV) asks: “Can anyone think of another person with whom they agree 100 % of the time on syntactic judgments (even counting only sharp disagreements)? Or more precisely, are there any two people who have exactly the same syntactic judgments without exception?” (cited in Adger and Trousdale 2007: 266).
The four case studies will show that speakers differ from each other and that they show intraindividual variation even with respect to word order or grammatical gender agreement. The first case study focuses on word order in the MOD-AUX-Vpart three-verb cluster in Dutch dialects. The second case study deals with variation between overt and null forms of the determiner (D) and reveals that individual speakers may accommodate to and identify with their interlocutors, resulting in intraindividual variation. The third case study shows that contact settings among youngsters in large cities may override the so-called verb-second constraint, and the last case study shows how important intraindividual variation is for the construction of a streetwise identity.

5.1 Different (individual) word orders in the three-verb cluster MOD-AUX-Vpart

Let us first focus on intraspeaker variation both in elicited acceptability judgments in the SAND corpus and in recordings of spontaneous speech (cf. Cornips 1994). The SAND corpus is the result of a large-scale dialect syntax project carried out from 2000 to 2003 in the Netherlands and Belgium (cf. Cornips and Jongenburger 2001; Cornips and Poletto 2005; Barbiers, Cornips, and Kunst 2007). Different types of three-verb clusters were presented to and evaluated by 370 native speakers of local dialects throughout the Netherlands and Flanders (Belgium). Social variables of the speakers were controlled for, thus homogenizing the elicited data as much as possible with respect to age, mobility, the functional domains in which the respective dialect is spoken, and education. Only in this way is it possible to detect variation that can be ascribed to geographical differences (cf. Barbiers, Cornips, and Kunst 2007: 60). Among other clusters, we administered the three-verb cluster MOD-AUX-Vpart illustrated in (1):

(1) Jan weet dat hij voor drie uur de wagen moet hebben gemaakt 1-2-3 order
‘Jan knows that he before 3 o’clock the car must have made’

The cluster consists of the modal moet ‘must’ (1, MOD) as the hierarchically highest verb, the perfective auxiliary hebben ‘have’ (2, AUX) in its infinitival form, and the lexical verb (3), the past participle gemaakt ‘made’, as the hierarchically lowest, embedded verb (Vpart). All six possible orders of the three verbs in (1) were offered to the subjects in an indirect relative judgment task, as illustrated in Figure 1, to collect data about variability in word order in three-verb clusters. In this task, the 370 subjects were first asked to answer ‘yes’ or ‘no’ to the question of whether they encounter orders (a) through (f) in their local dialect and, subsequently, to rate these orders on a 5-point scale from least to most acceptable (representing *, ?*, ??, ?, ok). Thus, the subjects were instructed to consider all possible orders in Figure 1 (see also Barbiers 2005; Cornips 2009):

encounter uncommon-common
a. 1-2-3 …dat…moet hebben gemaakt yes/no 1–2–3–4–5
b. 1-3-2 …dat…moet gemaakt hebben yes/no 1–2–3–4–5
c. 2-1-3 …dat…hebben moeten gemaakt yes/no 1–2–3–4–5
d. 2-3-1 …dat…hebben gemaakt moeten yes/no 1–2–3–4–5
e. 3-1-2 …dat…gemaakt moet hebben yes/no 1–2–3–4–5
f. 3-2-1 …dat…gemaakt hebben moet yes/no 1–2–3–4–5

‘Jan knows that he before three o’clock the car repaired/must/have’

Figure 1: The six possible word orders in the verbal cluster MOD-AUX-Vpart

Table 1 presents the quantitative results of the judgment task for the whole area, i. e. the Netherlands and Dutch-speaking Belgium. It shows that the MOD-AUX-Vpart cluster allows four different orders (1-2-3, 1-3-2, 3-1-2 and 3-2-1) when we include only the “yes = 5” answers above a threshold of 10 % (n > 37) (cf. Cornips 2009). The differences in the absolute numbers in the columns “yes = 5” through “yes = 1” depend on whether the construction is acceptable in a smaller or wider geographical area. The 3-2-1 order, for example, appears to be restricted to the Frisian area, while 3-1-2 is quite common almost everywhere (in particular in the eastern part of the Netherlands) (cf. Barbiers 2005).

Table 1: Word orders in MOD-AUX-Vpart three-verb clusters in dialects of Dutch according to an
acceptability scale of 1 through 5 (N = 370, 100 %) (cf. Cornips 2009)

MOD-AUX-Vpart: moet hebben gemaakt ‘must have repaired’

Order      yes (total)    yes = 5   yes = 4   yes = 3   yes = 2   yes = 1   no / no answer
a. 1-2-3   223 = 60.3 %   107       47        37        20        12        147 = 39.7 %
b. 1-3-2   160 = 43.2 %   78        32        33        9         8         210 = 56.8 %
c. 2-1-3   9 = 2.4 %      4         2         0         1         2         361 = 97.6 %
d. 2-3-1   6 = 1.6 %      2         1         0         1         2         364 = 98.4 %
e. 3-1-2   305 = 82.4 %   210       49        30        9         7         65 = 17.6 %
f. 3-2-1   79 = 21.4 %    44        14        15        5         1         291 = 78.6 %
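The 10 % threshold used above can be checked directly against Table 1. The Python sketch below recomputes which orders clear the threshold on the “yes = 5” column and the total share of ‘yes’ answers per order; the function names are illustrative, not taken from Cornips (2009).

```python
# Counts from Table 1: ratings "yes = 5" down to "yes = 1" for each order.
TABLE_1 = {
    "1-2-3": [107, 47, 37, 20, 12],
    "1-3-2": [78, 32, 33, 9, 8],
    "2-1-3": [4, 2, 0, 1, 2],
    "2-3-1": [2, 1, 0, 1, 2],
    "3-1-2": [210, 49, 30, 9, 7],
    "3-2-1": [44, 14, 15, 5, 1],
}
N_SUBJECTS = 370


def accepted_orders(table, n, threshold=0.10):
    """Orders whose 'yes = 5' count strictly exceeds threshold * n."""
    cutoff = threshold * n  # 10 % of 370 = 37 subjects
    return [order for order, counts in table.items() if counts[0] > cutoff]


def yes_share(table, n):
    """Total 'yes' answers per order as a percentage of all subjects."""
    return {order: round(100 * sum(counts) / n, 1) for order, counts in table.items()}


print(accepted_orders(TABLE_1, N_SUBJECTS))
print(yes_share(TABLE_1, N_SUBJECTS))
```

Recomputed this way, the 79 subjects who accept 3-2-1 make up 21.35 % of 370, which rounds to 21.4 %.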

Table 2 shows that individual speakers accept not just one order but often two or even
three. If we again apply a threshold of 10 %, the MOD-AUX-Vpart cluster allows up to
three orders per subject; the share of informants accepting two orders is highest,
followed by the share accepting three orders:

Table 2: Idiolectal variability: the number of orders for the MOD-AUX-Vpart cluster accepted by
the same subject regardless of the acceptability scale value (cf. Cornips 2009: 219–220)

Type of cluster   1 order       2 orders       3 orders      4 orders     5 orders    6 orders
MOD-AUX-Vpart     87 = 24.6 %   139 = 39.4 %   97 = 27.5 %   26 = 7.4 %   2 = 0.6 %   2 = 0.6 %

mean = 2.2; n = 370
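The summary figures in Table 2 can be recomputed from the six counts. In the Python sketch below, the counts sum to 353 rather than the 370 given as n; the printed percentages and the mean of 2.2 are reproduced exactly when 353 is taken as the denominator. That the remaining 17 subjects accepted no order or gave no answer is an inference made here, not something stated in the source.

```python
# Table 2: number of subjects accepting exactly k orders (k = 1..6).
SUBJECTS_BY_K = {1: 87, 2: 139, 3: 97, 4: 26, 5: 2, 6: 2}


def summarize(counts):
    """Return total respondents, percentage per k and mean orders per subject."""
    total = sum(counts.values())
    shares = {k: round(100 * c / total, 1) for k, c in counts.items()}
    mean = sum(k * c for k, c in counts.items()) / total
    return total, shares, round(mean, 1)


total, shares, mean = summarize(SUBJECTS_BY_K)
print(total, shares, mean)
# 353 {1: 24.6, 2: 39.4, 3: 27.5, 4: 7.4, 5: 0.6, 6: 0.6} 2.2
```

Dividing the same counts by 370 instead would give, for example, 23.5 % for one order and a mean of about 2.1.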

The specific combinations of two, three and four orders at the level of the individ-
ual speaker are not distributed randomly. Some combinations, such as the pair
1-2-3/3-2-1, are categorically absent (n = 0 in Table 3). Among subjects who accept
two orders, the preferred combination is 1-2-3 and 3-1-2 (n = 82). Among those who
accept three orders, the combination 1-2-3/1-3-2/3-1-2 (n = 71) is chosen
significantly more often than other possible three-order combinations. The combination
1-2-3/1-3-2/3-1-2/3-2-1 is the most favoured among informants who accept four differ-
ent orders (n = 22):
The no man’s land between syntax and variationist sociolinguistics 159

Table 3: Combinations of accepted orders by individual speakers for the MOD-AUX-Vpart cluster
(more than 4 subjects, cf. Cornips 2009: 220)

Combinations of orders                          MOD-AUX-Vpart
two orders     1-2-3 / 1-3-2                     4
               1-2-3 / 3-1-2                    82
               1-2-3 / 3-2-1                     0
               1-3-2 / 3-1-2                    39
               3-1-2 / 3-2-1                    14
three orders   1-2-3 / 1-3-2 / 3-1-2            71
               1-2-3 / 2-3-1 / 3-1-2             0
               1-2-3 / 3-1-2 / 3-2-1            21
four orders    1-2-3 / 1-3-2 / 3-1-2 / 3-2-1    22
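One way to make “not distributed randomly” concrete is to compare the counts in Table 3 with what would be expected if each subject accepted each order independently, at the overall rates in Table 1. The Python sketch below computes such a baseline. The independence model is an illustrative assumption introduced here, not part of Cornips (2009). Under this assumption, roughly 4.6 subjects would be expected to accept exactly 1-2-3 and 3-2-1, against the 0 observed.

```python
from math import prod

# Table 1, column "yes (total)": subjects accepting each order, out of N = 370.
N = 370
YES_TOTAL = {"1-2-3": 223, "1-3-2": 160, "2-1-3": 9,
             "2-3-1": 6, "3-1-2": 305, "3-2-1": 79}
P_ACCEPT = {order: count / N for order, count in YES_TOTAL.items()}

# Table 3, two-order combinations: observed number of subjects.
OBSERVED_PAIRS = {
    ("1-2-3", "1-3-2"): 4,
    ("1-2-3", "3-1-2"): 82,
    ("1-2-3", "3-2-1"): 0,
    ("1-3-2", "3-1-2"): 39,
    ("3-1-2", "3-2-1"): 14,
}


def expected_exact(combo, p=P_ACCEPT, n=N):
    """Expected number of subjects accepting exactly the orders in `combo`,
    ASSUMING each order is accepted independently with probability p[order]."""
    accepted = prod(p[o] for o in combo)
    rejected = prod(1 - p[o] for o in p if o not in combo)
    return n * accepted * rejected


for combo, observed in OBSERVED_PAIRS.items():
    print(" / ".join(combo), observed, round(expected_exact(combo), 1))
```

The baseline is only a rough reference point, not a significance test; a proper test would require the subject-level data.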

Importantly, the same preferences in Tables 1 and 3 were reflected in the sponta-
neous speech of 67 adult speakers in one location in the southeast Netherlands,
namely the city of Heerlen (cf. Cornips 2009). These 67 speakers produced the
order 3-1-2 in the MOD-AUX-Vpart cluster most frequently and they also produced
the combination 1-2-3/1-3-2/3-1-2. In this respect, spontaneous speech data in one
location converge with acceptability judgments in a large geographical area, i. e.
the Netherlands and the Dutch-speaking part of Belgium.
This consistency in idiolectal variation calls for a combined “grammar” and
“usage” approach. What is needed in order to account for the complexity of the
data at the individual level is a principled answer to the question of why certain
verb clusters and certain combinations of three-verb clusters are almost categor-
ically absent. Moreover, we must analyze the size of the variation space, i. e. the
most frequent and geographically constrained distributions of the various combi-
nations of three-verb clusters. For instance, the order 3-2-1 in the MOD-AUX-Vpart cluster
occurs mostly in the northwest (Friesland), the combination of orders 1-3-2/3-1-2
primarily in Flanders, while the combination of orders 1-2-3/3-1-2 can be found
everywhere in the Netherlands.5 Thus, on the one hand, the MOD-AUX-Vpart cluster
shows a bewildering variation with respect to different word order alternations,
which is a feature of linguistic complexity. On the other hand, this complexity
shows regular patterns, both across closely related individual grammars distributed
geographically and within one individual grammar; that is, they cluster together
in social and geographical space. Clearly, the specific geographical distributions
and specific word order combinations are the products of interactions between
different and changing groups of speakers. To be more specific, both the accepta-
bility judgments and the spontaneous speech data (cf. Cornips 2009) display such
orderly heterogeneity that speakers can be considered members of sociolinguistic
units (cf. Guy 2004). It is these clusters of individual speakers that may give rise to
labels such as “dialect” or “language”, for linguists and laypersons alike.

5 See Map 1 at http://www.meertens.knaw.nl/cms/component/content/article/144225.
In the strongest form of Minimalism (cf. Chomsky 1995), the Universal
Grammar hypothesis states that syntactic variation does not exist. Apparent syn-
tactic variation, such as the different orders in the MOD-AUX-Vpart cluster, should
be reducible to the lexicon, that is, to the parameterization of morphosyntactic fea-
tures, and to phonological form, that is, to different ways of spelling out one and the
same syntactic structure phonologically (Barbiers, Cornips, and Kunst 2007).
The “grammar” perspective should be able to explain why some combinations
of orders are preferred over others (see Table 3) by identifying a grammatical
factor which is responsible for (i) the preferred combination of 1-2-3, 1-3-2 and 3-1-2
and (ii) the exclusion of 2-3-1 and 3-2-1 orders.6 The problematic issue, of course,
is how to account for the fact that individual speakers can use several variants,
assuming that the different word orders all belong to one syntactic variable. This
issue is related to questions within successive generative models about the
locus of syntactic variation, its restrictions and its predictions. In the literature, two
alternative approaches to this “choice” have been suggested (Muysken 2005): either the
“choice” is placed outside the grammatical mechanisms (Adger and Smith 2005,
2010; Adger 2006; Kroch 1989) or it is placed inside the grammar by reintroducing
optional rules (Henry 2002; Wilson and Henry 1998). In the same vein as Henry
(1995), Barbiers (2005) argues that the categorical absence or presence of spe-
cific word orders in the verb clusters discussed above is the result of optional
movement in the syntactic component. An account in terms of optional move-
ment implies that the various word orders differ only superficially from each other
(cf. Harris 1996: 32; Winford 1996; Cornips 1998). In a combined approach, it
should be examined how groups of speakers organize different word orders and
different combinations of word orders (possibly in combination with other lin-
guistic sets) as resources in ways that make sense for them under specific social
conditions (Jørgensen 2008: 167). Accordingly, this type of variation should also
be associated with particular “dialects”, social distinctions and values (Eckert
2000). Such an approach will give us a first glimpse into why inter- and intra-
speaker variation is so overwhelmingly present. Such a combined approach
“challenges the separation of ‘variety’ and ‘practice’ approaches” (Rampton
to appear). Rampton insists that we look for the connections between dialect
systems, reflexivity and interaction. The selective targeting, isolation and formal
description of linguistic features remains an essential analytical task, but if we
are able to construe these features as ingredients in a style or register, then we
need to attend to the ways in which, with varying levels of awareness, their inter-
actional use contributes to participants’ agentive self-positioning in the social
world, aligning them with certain ideological typifications of language, speech
and ways of being (cf. references cited in Rampton to appear). Let us now turn to
a second case study which shows that acceptability judgments differ at the level
of the individual speaker.

6 In Cornips (2009), I argued that the 1-2-3 order as a basic structure is capable of accounting
for the empirical results. The word orders in the MOD-AUX-Vpart cluster and their co-occurrences
show that verb raising of the participle, resulting in 1-3-2 and 3-1-2 order, depends on the verbal or
adjectival status of the participle (see Den Besten and Broekhuis 1989).

5.2 Acceptability judgments and idiolectal variability

In this section, variation between overt and null forms of a functional head
(D) will be considered. This phenomenon is, of course, very different in nature
from word order alternations in verbal clusters. However, both types of variation
show inter- and intraspeaker variation, in written elicitation (the verbal clusters
described above) as well as in oral interviews. In the oral elicitation phase of
the Dutch syntactic atlas project (SAND), 250 locations were selected through-
out the Netherlands. The fieldwork posed a major problem, since the
large majority of the fieldworkers and Ph.D. students spoke only the standard
variety (cf. Cornips 2006). Therefore, we had each subject recruit an acquaintance
as an “assistant interviewer”, so that the subject could be interviewed in his or her
own dialect. This assistant interviewer was asked to translate a standard Dutch
structured elicitation task into his or her local dialect, and these translations were
recorded. In a second session, the recordings were played to the original subject.
In this second session, the entire conversation was restricted to the two dialect
speakers, and the fieldworker did not intervene.
One of the locations involved in the oral interviews was Nieuwenhagen in the
southeast of Limburg province. In the local dialect, as elsewhere in that region,
proper names are obligatorily preceded by the definite determiner et or der ‘the’.
The presence of the definite determiner preceding a proper name, as in (2), is fully
ungrammatical in standard Dutch:

(2) et Marie / der Jan is krank
    the Mary / the Jan is ill
    ‘Mary/Jan is ill.’

The recording of the first session between the standard Dutch-speaking field-
worker and the local assistant interviewer who translates standard Dutch into his
own dialect shows that the definite article is absent from his translation; the proper
names Wim and Els appear without it, as illustrated in (3):

First session (dialect – standard Dutch interviewer)

(3) Ø Wim dach dat ich Ø Els han geprobeerd e kado te geve
Ø Wim thought that I Ø Els have tried a present to give
‘Wim thought that I tried to give a present to Els.’

In the second session, however, in which only the assistant interviewer inter-
views the dialect speaker in the local dialect, the latter utters the definite article
as “required”:

Second session (dialect – dialect speaking interviewer)

(4) Der Wim menet dat ich et Els e boek probeerd ha kado te geve.
the Wim thought that I the Els a book tried has present to give
‘Wim thought that I tried to give Els a book as a present.’

This individual variation may be accounted for in different ways. From a
“grammar” perspective, idiolectal variation may be explained away as a task
effect (Cornips and Poletto 2005), or at least as a type of variation that is not
the focus of the analysis. Alternatively, this type of vari-
ation may indeed be the result of “grammar” in use, as discussed earlier; that is,
the subject accommodates to and identifies with his interlocutor. In this social
context, speakers can adjust to others without noticeable effort. From this per-
spective, different situations of contact bring about different types of linguistic
variation. Thus, to investigate the locus of linguistic variation and to understand
its social function and its spread throughout groups of speakers (communities
of practice), these different contact situations require scrutiny. Hence, it is in
the domain of specific interactions that syntactic restrictions or, put differently,
the borders of the variation space of grammar will be challenged and stretched, and
where so-called impossible or hitherto unrealized constructions will become pos-
sible and realized.

5.3 Peer group settings in multilingual contexts: Beyond V2

In recent years, contemporary urban vernaculars have emerged among adoles-
cents in multilingual settings in large cities throughout Europe (Auer and Dirim
2003; Blommaert 2011; Cornips and Nortier 2008; Cornips and De Rooij 2013;
Hewitt 1986; Jaspers 2005; Rampton 1995, 2005, to appear; Quist and Svendsen
2010). These multilingual contact settings provide situated contexts in which syn-
tactic restrictions are loosened and new types of syntactic variation emerge as an
integral part of grammar. Freywald et al. (to appear) describe how, in urban ver-
naculars in big cities in Norway, Sweden and Germany, the so-called verb-second
constraint (V2) is overridden. Normally, in these languages (as in Dutch and most
other Germanic languages), only one constituent may precede the finite verb in
declarative main clauses. Whenever a declarative clause begins with something other
than the subject, subject-verb inversion is required, as in the German example in
(5) (all examples from (5) through (7) are taken from Freywald et al. to appear):

(5) a. Ich war gestern im Kino. [German]
       I was yesterday at_the cinema

    b. Gestern war ich im Kino.
       yesterday was I at_the cinema

    c. *Gestern ich war im Kino.
       yesterday I was at_the cinema

However, in Norwegian, Swedish and German urban vernaculars, this restriction
seems to be less robust. The typical form of what looks like a violation of
the V2 constraint is the order “adverbial – subject – finite verb” (Adv-S-Vfin), as in
example (6):

(6) a. I dag hun lagde somalisk mat [Norwegian]
       today she made somali food
       ‘Today she made Somali food.’
       (Upus-corpus, Lukas)

    b. GEStern isch war KUdamm [German]
       yesterday I was Kudamm
       ‘Yesterday I was at the Ku’damm.’ [= short for Kurfürstendamm]
       (KiDKo, transcript MuH9WT)

    c. å sen dom börjar DRICKA den [Swedish]
       and then they start drink it
       ‘And then they start drinking it.’
       (SUF-corpus, Cornelia (P47))

The overall picture points to a systematic pattern from both a “grammar” and
a “usage” perspective. With respect to the former, the constituent that directly
precedes the finite verb is almost without exception the subject, and these subject
constituents are in most cases pronominal. The adverbial in front of the subject,
together with the finite verb, situates the event in time and helps to structure the
(narrative) discourse. The most common adverbials are the equivalents of ‘then’,
‘afterwards’ and ‘after this’ (see (6) and (7)). With respect to the latter, these
“violations” of V2, or, to be more precise, Adv-S-Vfin instances, occur almost
exclusively in peer conversations. They are remarkably rare or even entirely absent
in interviews and written texts (cf. Freywald et al. to appear and references cited therein).
In vernacular urban Dutch this phenomenon is very rare. However, on the
sporadic occasions when it does emerge, it obeys the same syntactic restrictions
as described above: an adverbial and a pronominal subject precede the
finite verb, and the adverbial in front of the pronominal subject is a temporal one:

(7) toen we hadden eerst twee autos
    then we had first two cars
    ‘Then, we first had two cars (and later only one).’
    (Utrecht/TCULT-corpus, Badir; 1 out of 20 tokens of potential “then-S-V”)
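The difference between a V2 clause and an Adv-S-Vfin clause comes down to how many constituents precede the finite verb. The minimal Python sketch below assumes that clauses have already been segmented and labelled with a hypothetical tag set (ADV, SUBJ, VFIN, OTHER), not any corpus's actual annotation, and classifies the material in front of the finite verb. It only describes position: whether a given output is grammatical, as with the starred (5c), depends on the variety.

```python
def classify_declarative(labels: list[str]) -> str:
    """Classify a declarative main clause by the material before the finite verb.

    `labels` is a sequence of constituent labels from a HYPOTHETICAL tag set:
    ADV (adverbial), SUBJ (subject), VFIN (finite verb), OTHER (anything else).
    """
    if "VFIN" not in labels:
        raise ValueError("clause has no finite verb")
    prefield = labels[: labels.index("VFIN")]  # constituents before the finite verb
    if len(prefield) == 1:
        return "V2"  # exactly one constituent precedes Vfin
    if prefield == ["ADV", "SUBJ"]:
        return "Adv-S-Vfin"  # the pattern reported for the urban vernaculars
    return "other"


# (5b) Gestern war ich im Kino.     -> V2
print(classify_declarative(["ADV", "VFIN", "SUBJ", "OTHER"]))
# (6a) I dag hun lagde somalisk mat -> Adv-S-Vfin
print(classify_declarative(["ADV", "SUBJ", "VFIN", "OTHER"]))
```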

In sum, in urban vernacular speech in interaction, individuals stretch the bound-
aries of the variation space by producing Adv-S-Vfin instances, which are considered
“ungrammatical” in declarative main clauses. The syntactic route is almost the
same for every speaker. This is thus a clear example of linguistic variation that is
driven by social interaction (contact settings between adolescents in large cities)
but grammatically restricted, since only certain types of adverbials in the first posi-
tion of the main clause trigger V2 violations. Here we encounter the mutual
relationship between individual grammar and group grammar. Let
us now consider another example of a mutual effect of this relationship, namely
variation in gender distinction in the adnominal domain.

5.4 Peer group settings in multilingual contexts: Underspecified gender feature

Dutch grammar classifies nouns into two grammatical gender categories:
common and neuter. Grammatical gender is reflected in a number of agreeing
elements accompanying or referring to the noun. Definite determiners are a clear
case: singular definite determiners vary morphologically according to the gender
of the noun, as illustrated in Figure 2 below. Nouns that take the singular defi-
nite determiner de have common gender; nouns that take the singular definite
determiner het have neuter gender. There is no gender distinction on the singular
indefinite determiner, which is een for both neuter and common nouns, or on the
plural definite determiner, which is de for both genders.
Definite determiners
Gender of noun    Singular    Plural
common            de          de
neuter            het         de

Figure 2: Definite determiners in Dutch
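Figure 2 amounts to a lookup keyed on gender and number, in which het is the only gender-specific form. The Python sketch below encodes that table. The three-noun lexicon is a hypothetical illustration, and the `overgeneralize` flag is a deliberately simplified model of de being used with neuter nouns, the overgeneralization reported for learners and in peer-group speech.

```python
# Figure 2 as a lookup table: (gender, number) -> definite determiner.
DEFINITE = {
    ("common", "sg"): "de",
    ("neuter", "sg"): "het",
    ("common", "pl"): "de",
    ("neuter", "pl"): "de",
}
INDEFINITE_SG = "een"  # same form for both genders

# Hypothetical mini-lexicon for illustration.
LEXICON = {"huis": "neuter", "meisje": "neuter", "auto": "common"}


def definite_determiner(gender: str, number: str) -> str:
    """Target Dutch definite determiner for a given gender and number."""
    return DEFINITE[(gender, number)]


def definite_np_sg(noun: str, overgeneralize: bool = False) -> str:
    """Singular definite NP; with overgeneralize=True every noun is treated
    as common gender, so neuter nouns surface with de instead of het."""
    gender = "common" if overgeneralize else LEXICON[noun]
    return f"{definite_determiner(gender, 'sg')} {noun}"


print(definite_np_sg("huis"))                       # het huis
print(definite_np_sg("huis", overgeneralize=True))  # de huis
```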

The results of experimental tasks have revealed that the monolingual acqui-
sition of the Dutch definite neuter determiner is a long process, as children do
not acquire the target system before the age of seven (cf. Blom, Polišenská, and
Weerman 2008; Polišenská and Weerman 2008). Instead, monolingual children
overgeneralize the definite determiner de and use it incorrectly with neuter nouns
that require the definite determiner het. Bilingual children take even longer
to acquire grammatical neuter gender in Dutch. But although the overgeneral-
ization of the definite determiner de constitutes a linguistic resource for every
bilingual child (and even for monolingual acquirers, cf. Cornips and Hulk 2008),
it only becomes socially meaningful in a process of ethnic and age identifica-
tion (cf. Cornips 2008; Cornips and Hulk 2013). Nortier and Dorleijn (2008) point
out that youngsters of Moroccan descent are learning both linguistic norms and
norms of stylistic appropriateness. The overgeneralization of common gender is
one example of this process, as illustrated by the following excerpt from a con-
versation with S., an informant of Moroccan descent from Rotterdam:

1   S:    Dat is het slechte Nederlands
2   Int1: En heeft dat ook een naam?
3   S:    Ja, niet echt, maar ’t is in principe dan eh lidwoorden die gebruik je
4         dan expres verkeerd
5   Int2: Ja ja die gebruik je dan exprès verkeerd, net als-
6   S:    Ja, dus
7   Int1: Die meisje
8   S:    Die huis zeg ik dan. Terwijl ik weet ik bedoel ik weet heus wel dat het
9         dat huis is maar ’t staat zo dom als ik dat op straat zeg, als ik zeg
10  Int2: Ja
11  S:    Als ik zeg dat huis
12  Int2: Ja ja
13  S:    ’t Is gewoon die huis. Maar als ik met jullie spreek dan wordt ’t
14        gewoon dat huis.

S: That is the bad kind of Dutch.
Int1: Does it have a name?
S: No, not really, but in principle you uhm use the articles deliberately in the wrong way.
Int2: Right! So you use them in the wrong way deliberately? Just like-
S: Yes, like
Int1: that little girl [line 7: die (common) meisje (neuter)]
S: I then say that house [line 8: die (common) huis (neuter)]. While I know, I mean I certainly know,
that it should actually be that house [line 9: dat (neuter) huis (neuter)], but it would make a dumb
impression if I said
Int2: Yes
S: If I said that house [line 11: dat (neuter) huis (neuter)] in the street
Int2: Yes, yes
S: It is just that house [line 13: die (common) huis (neuter)]. But when I speak with you two (the authors,
both Dutch and middle-aged; JN/MD), it is just that house [line 14: dat (neuter) huis (neuter)].

“The speaker in the quotation explicitly says that he has to make errors, deviations
from the standard norm, in order to be recognized as someone who is hanging out
with friends” (cf. Nortier and Dorleijn 2008: 132). This quote suggests that
identity is constructed by means of a language which is governed by the
linguistic norms of the group the speaker wants to belong to and identifies with,
in contrast to others.7 Consequently, the interplay between the individual and the
group is visible in the syntactic output, i. e. in gender marking on forms bearing
gender, namely the determiner.
However, the overuse of common gender on determiners also shows up in
other “languages” whose grammatical gender system is similar to that of Dutch, that
is, languages that have (i) a distinction between neuter and common gender,
(ii) little evidence for grammatical gender, since gender is not marked on the
noun itself, (iii) robust frequency differences such that common nouns outnumber
neuter nouns, and (iv) very few morphophonological cues to the different gender
forms of the determiner. Quist (2008) and Kotsinas (2001: 150) note that in urban
vernacular speech in Copenhagen and Stockholm, neuter indefinite and definite
determiners are being replaced by common determiners, but not vice versa, and
that the overuse of common gender always takes place in peer group settings:

Urban vernacular speech (Copenhagen)
neuter                              varies with common
et job ‘a job’                      en job
det der blad ‘that magazine’        den der blad

Urban vernacular speech (Stockholm)
neuter                              varies with common
ett bord ‘a table’                  en bord

7 Of course, this example also raises interesting points concerning language awareness.

The interplay between individual grammar and group grammar is in line with
Widdicombe and Wooffitt’s (1995: 36) claim that a mental state, such as a grammar,
is socially shared and therefore common to different members of a given
society or group. Cognition, in their view, is an individual’s mental reconstruc-
tion of a shared physical and social environment. Therefore, the creation of norms
and mental reconstruction are agentive processes. In other words, language is
not a neutral medium for transmitting values, attitudes and opinions about a
world of events “out there”, but rather a medium through which social acts are
accomplished; thus, language use is itself a form of social activity (Widdicombe
and Wooffitt 1995: 1–2).
It is in this no man’s land between syntax and variationist sociolinguistics –
that is to say, in the border zone constituting the mutual relationships between
the individual and the social – that we can find the answer to the questions of
why and how society influences the individual (grammar) and vice versa.

6 Summary

The aim of this paper was to emphasize the point that there are two crucial issues
with regard to linguistic complexity that must be tackled within the grammar and
usage debate: the issue of idiolectal or intraspeaker variation, and the complex
and multilayered relationships between the individual and society that bring
about mutual effects on individual grammar and group grammar. It is at the level
of individual speakers that we can best examine the locus and limits of syntactic
variation. Here, we encounter the largest possible variation space and answers to
the questions of whether, why and how speakers enlarge this space, and in doing
so, overcome syntactic constraints.
I have discussed word order alternations in three-verb clusters in Dutch
dialects, the variable use of the definite determiner preceding proper nouns, the
instances of Adv-S-Vfin in main declarative clauses in Germanic V2 languages
and the use of common determiners instead of neuter ones in Dutch. Syntactic
variation can be studied from two perspectives: a theoretical-analytical one in
which grammar relates to genetic endowment and cognitive capabilities, and a
social perspective in which language is a social praxis. Of course, these two sides
of language can be studied in isolation from each other. However, an approach
combining crucial theoretical insights and methodologies from each discipline
brings us a step further in disentangling and understanding the phenomenon of
new and complex patterns in language use data.
A combined approach of the two disciplines is the only one that enters into
this no man’s land between syntax and sociolinguistics, where variation is driven
by social factors but constrained at the level of possible grammars. It will give us
insight into why inter- and intraspeaker variation is so overwhelmingly present
and why ungrammatical or impossible structures are realized. We get closer to
finding answers to the following questions: (i) What are the limits and loci of
syntactic variation? (ii) Why do individual speakers not use all possible linguistic
resources in the construction of regional and social identities? (iii) How and why
do speakers use ungrammatical constructions, i. e. cross the borders of variation space?

References
Adger, David (2006): Combinatorial variability. Journal of Linguistics 42: 503–530.
Adger, David and Jennifer Smith (2005): Variation and the Minimalist Program. In: Leonie
Cornips and Karen P. Corrigan (eds.), Syntax and Variation. Reconciling the Biological with
the Social, 149–178. Amsterdam/Philadelphia: John Benjamins.
Adger, David and Jennifer Smith (2010): Variation in agreement: A lexical feature-based
approach. Lingua 120: 1109–1134.
Adger, David and Graeme Trousdale (2007): Variation in English syntax: theoretical
implications. English Language and Linguistics 11/2: 261–278.
Auer, Peter and İnci Dirim (2003): Socio-cultural orientation, urban youth styles and the
spontaneous acquisition of Turkish by non-Turkish adolescents in Germany. In: Jannis
K. Androutsopoulos and Alexandra Georgakopoulou (eds.), Discourse Constructions of
Youth Identities, 223–246. Amsterdam/Philadelphia: John Benjamins.
Barbiers, Sjef (2005): Theoretical restrictions on word order variation in three-verb clusters.
In: Leonie Cornips and Karen P. Corrigan (eds.), Syntax and Variation. Reconciling the
Biological with the Social, 233–264. Amsterdam/Philadelphia: John Benjamins.
Barbiers, Sjef, Leonie Cornips and Jan-Pieter Kunst (2007): The Syntactic Atlas of the Dutch
Dialects (SAND): A corpus of elicited speech as an on-line Dynamic Atlas. In: Joan C. Beal,
Karen P. Corrigan and Hermann Moisl (eds.), Models and Methods in the Handling of
Unconventional Digital Corpora. Volume 1: Synchronic Corpora, 54–90. Hampshire:
Palgrave-Macmillan.
Besten, Hans den and Hans Broekhuis (1989): Woordvolgorde in de werkwoordelijke eindreeks
[Word order in verb clusters]. Glot 12: 79–137.
Blom, Elma, Daniela Polišenská and Fred Weerman (2008): Articles, adjectives and age of
onset: The acquisition of Dutch grammatical gender. Second Language Research 24:
297–332.
Blommaert, Jan (2011): Supervernaculars and their dialects. Tilburg Papers in Culture Studies 9:
54–90.
Cornips, Leonie (1994): Syntactische variatie in het algemeen Nederlands van Heerlen
[Syntactic variation in Heerlen Dutch]. Ph.D. dissertation, University of Amsterdam.
Cornips, Leonie (1998): Syntactic variation, parameters and their social distribution. Language
Variation and Change 10/1: 1–21.
Cornips, Leonie (2005): Variation and formal theories of syntax, Chomskian. In: Keith Brown
(ed.), Encyclopedia Language & Linguistics, 330–332. Oxford: Elsevier.
Cornips, Leonie (2006): Intermediate syntactic variants in a dialect – standard speech
repertoire and relative acceptability. In: Gisbert Fanselow, Caroline Féry, Ralf Vogel and
Matthias Schlesewsky (eds.), Gradience in Grammar. Generative Perspectives, 85–105.
Oxford: Oxford University Press.
Cornips, Leonie (2008): Loosing grammatical gender in Dutch. The result of bilingual
acquisition and/or an act of identity? International Journal of Bilingualism 12: 105–124.
Cornips, Leonie (2009): Empirical syntax: idiolectal variability in two- and three-verb clusters
in regional standard Dutch and Dutch dialects. In: Andreas Dufter, Jürg Fleischer and Guido
Seiler (eds.), Describing and Modeling Variation in Grammar, 203–224. Berlin/New York:
Mouton de Gruyter.
Cornips, Leonie (2014): Language contact, linguistic variability and the construction of local
identities. In: Tor Åfarli and Britt Mæhlum (eds.), The Sociolinguistics of Grammar, 67–90.
Amsterdam: John Benjamins.
Cornips, Leonie and Karen P. Corrigan (eds.) (2005a): Syntax and Variation. Reconciling the
Biological with the Social. Amsterdam/Philadelphia: John Benjamins.
Cornips, Leonie and Karen P. Corrigan (2005b): Convergence and divergence in grammar. In:
Peter Auer, Frans Hinskens and Paul Kerswill (eds.), Dialect Change: Convergence and
Divergence in European Languages, 96–134. Cambridge: Cambridge University Press.
Cornips, Leonie and Aafke Hulk (2008): Factors of success and failure in the acquisition of
grammatical gender in Dutch. Second Language Research 24: 267–296.
Cornips, Leonie and Aafke Hulk (2013): Late child acquisition and identity construction:
variation in use of the Dutch definite determiners de and het. In: Peter Auer, Javier Caro
Reina and Göz Kaufmann (eds.), Language Variation – European Perspectives IV. Studies in
Language Variation, 57–67. (SILV 14.) Amsterdam/Philadelphia: John Benjamins.
Cornips, Leonie and Willy Jongenburger (2001): Elicitation techniques in a Dutch syntactic
dialect atlas project. In: Hans Broekhuis and Ton van der Wouden (eds.), Linguistics in the
Netherlands 18, 553–563. Amsterdam/Philadelphia: John Benjamins.
Cornips, Leonie and Jacomine Nortier (eds.) (2008): Ethnolects? The Emergence of New Varieties
among Adolescents (Special Issue of the International Journal of Bilingualism 12/1–2).
Cornips, Leonie and Cecilia Poletto (2005): On standardising syntactic elicitation techniques.
PART I. Lingua 115/7: 939–957.
Cornips, Leonie and Vincent de Rooij (2013): Selfing and othering through categories of race,
place, and language among minority youths in Rotterdam, The Netherlands. In: Peter
Siemund, Ingrid Gogolin, Julia Davydova and Monika Schulz (eds.), Multilingualism
and Language Contact in Urban Areas: Acquisition – Development – Teaching –
Communication, 129–164. Amsterdam/Philadelphia: John Benjamins.
Eckert, Penelope (2000): Linguistic Variation as Social Practice. Oxford: Blackwell.
Eckert, Penelope (2008): Variation and the indexical field. Journal of Sociolinguistics 12(4):
453–476.
Eckert, Penelope (2012): Three waves of variation study: the emergence of meaning in the study
of sociolinguistic variation. Annual Review of Anthropology 41: 87–100.
Eckert, Penelope and Sally McConnell-Ginet (1992): Think practically and look locally: language
and gender as community-based practice. Annual Review of Anthropology 21: 461–490.
Eckert, Penelope and Sally McConnell-Ginet (1999): New generalizations and explanations in
language and gender research. Language in Society 28(2): 185–202.
Freywald, Ulrike, Leonie Cornips, Natalia Ganuza, Ingvild Nistov and Toril Opsahl (to appear):
Beyond verb second – a matter of novel information structural effects? Evidence from
Norwegian, Swedish, German and Dutch. In: Jacomine Nortier and Bente A. Svendsen
(eds.), Language Youth & Identity in the 21st Century. Cambridge: Cambridge University
Press.
Guy, Gregory (2004): Dialect unity, dialect contrast: the role of variable constraints. Talk
presented at the Meertens Institute, Amsterdam, August 9.
Guy, Gregory (2005): Grammar and usage: A variationist response. Language 81(3): 561–563.
Harris, John (1996): Syntactic variation and dialect divergence. In: Rajendra Singh (ed.),
Towards a Critical Sociolinguistics, 31–59. Amsterdam/Philadelphia: John Benjamins.
Henry, Alison (1995): Belfast English and Standard English: Dialect Variation and Parameter
Setting. Oxford: Oxford University Press.
Henry, Alison (2002): Variation and syntactic theory. In: Jack Chambers, Peter Trudgill and
Natalie Schilling-Estes (eds.), The Handbook of Language Variation and Change, 267–282.
Malden: Blackwell.
Henry, Alison (2005): Idiolectal variation and syntactic theory. In: Leonie Cornips and Karen
P. Corrigan (eds.), Syntax and Variation. Reconciling the Biological with the Social,
109–122. Amsterdam/Philadelphia: John Benjamins.
Hewitt, Roger (1986): White Talk Black Talk. Inter-racial Friendship and Communication amongst
Adolescents. Cambridge: Cambridge University Press.
Jaspers, Jürgen (2005): Linguistic sabotage in a context of monolingualism and standardization.
Language & Communication 25: 279–297.
Jensen, Torben Juel and Tanya Karoli Christensen (2013): Promoting the demoted: The
distribution and semantics of “main clause word order” in spoken Danish complement
clauses. Lingua 137: 38–58.
Jørgensen, Jens Normann (2008): Polylingual languaging around and among adolescents.
International Journal of Multilingualism 5: 161–176.
Kayne, Richard S. (1994): The Antisymmetry of Syntax. Cambridge, Mass.: MIT Press.
Kayne, Richard S. (1996): Microparametric syntax: some introductory remarks. In: James
R. Black and Virginia Motapanyane (eds.), Microparametric Syntax and Dialect Variation,
ix–xviii. Amsterdam/Philadelphia: John Benjamins.
Kotsinas, Ulla-Britt (2001): Pidginization, creolization and creoloids in Stockholm, Sweden. In:
Norval Smith and Tonjes Veenstra (eds.), Creolization and Contact, 125–156. Amsterdam/
Philadelphia: John Benjamins.
Kroch, Anthony (1989): Reflexes of grammar in patterns of language change. Language
Variation and Change 1: 199–244.
Labov, William (1972): Sociolinguistic Patterns. Philadelphia: University of Pennsylvania Press.
Labov, William (1975): What is a Linguistic Fact? Lisse: Peter de Ridder Press.
Labov, William (1994): Principles of Linguistic Change. Internal Factors. Oxford: Blackwell.
Lavandera, Beatriz (1978): Where does the sociolinguistic variable stop? Language in Society 7:
171–182.
Meechan, Marjory and Michele Foley (1994): On resolving disagreement: linguistic theory and
variation – There’s bridges. Language Variation and Change 6: 83–85.
Meyerhoff, Miriam (2002): Community of practice. In: Jack Chambers, Peter Trudgill and Natalie
Schilling-Estes (eds.), The Handbook of Language Variation and Change, 526–548. Malden:
Blackwell.
Muysken, Pieter (1999): Talen. De Toren van Babel [Languages. The Tower of Babel].
Amsterdam: Amsterdam University Press.
Muysken, Pieter (2000): Radical modularity and the possibility of sociolinguistics. Paper
presented at the Sociolinguistics Symposium 2000, 27–29 April, University of the West of
England, Bristol.
Muysken, Pieter (2005): A modular approach to sociolinguistic variation in syntax: the gerund
in Ecuadorian Spanish. In: Leonie Cornips and Karen P. Corrigan (eds.), Syntax and
Variation. Reconciling the Biological with the Social, 31–54. Amsterdam/Philadelphia:
John Benjamins.
Nortier, Jacomine and Margreet Dorleijn (2008): A Moroccan accent in Dutch: A sociocultural
style restricted to the Moroccan community? International Journal of Bilingualism 12:
125–142.
Pintzuk, Susan (1995): Variation and change in Old English clause structure. Language
Variation and Change 7: 229–260.
Poletto, Cecilia and Paola Benincà (2007): The ASIS enterprise: a view on the construction of a
syntactic atlas for the Northern Italian dialects. Nordlyd 34(1): 35–52.
Quist, Pia (2008): Sociolinguistic approaches to multiethnolect. International Journal of
Bilingualism 12(1–2): 43–62.
Quist, Pia and Bente Ailin Svendsen (eds.) (2010): Multilingual Urban Scandinavia: New
Linguistic Practices. Bristol: Multilingual Matters.
Rampton, Ben (1995): Styling the Other: Introduction. Journal of Sociolinguistics 3(4): 421–427.
Rampton, Ben (2005): Crossing: Language and Ethnicity among Adolescents. Manchester:
St. Jerome Publishing.
Rampton, Ben (to appear): Contemporary urban vernaculars. In: Jacomine Nortier and Bente
A. Svendsen (eds.), Multilingual Urban Sites. Structure, Activity and Ideology. Cambridge:
Cambridge University Press.
Sankoff, Gillian (1990): The grammaticalization of tense and aspect in Tok Pisin and Sranan.
Language Variation and Change 2: 295–312.
Sells, Peter, John R. Rickford and Thomas Wasow (1996): Variation in negative inversion in
AAVE: an optimality theoretic approach. In: Jennifer Arnold, Renee Blake, Brad Davidson,
Scott Schwenter and Julie Solomon (eds.), Sociolinguistic Variation: Data, Theory and
Analysis, 161–176. Stanford: Center for the Study of Language and Information.
Silverstein, Michael (1985): Language and the culture of gender. In: Elizabeth Mertz and Richard
J. Parmentier (eds.), Semiotic Mediation, 219–259. New York: Academic Press.
Vangsnes, Øystein Alexander (2007): Scandinavian dialect syntax (before and after) 2005.
Nordlyd 34(1): 7–24.
Widdicombe, Sue and Robin Wooffitt (1995): The Language of Youth Subcultures: Social Identity
in Action. New York: Harvester Wheatsheaf.
Wilson, John and Alison Henry (1998): Parameter setting within a socially realistic linguistics.
Language in Society 27: 1–21.
Winford, Donald (1996): The problem of syntactic variation. In: Jennifer Arnold, Renee Blake,
Brad Davidson, Scott Schwenter and Julie Solomon (eds.), Sociolinguistic Variation: Data,
Theory and Analysis, 177–192. Stanford: Center for the Study of Language and Information.
Aria Adli, University of Cologne
What you like is not what you do:
Acceptability and frequency in syntactic
variation

Abstract: An interesting property of French wh-questions is the presence of several syntactic variants. This study presents a quantitative analysis of wh-adjunct and wh-object questions, drawing on gradient acceptability judgments as well as frequency of occurrence in spontaneous speech. Both types of evidence were collected from the same set of speakers in order to allow direct comparisons. The results show interesting mismatches between acceptability and frequency. First, the preferred variants differ: Speakers make use of the wh-in-situ variant (the most frequent form), the variant with the particle est-ce que and the whSV form, but not the whVS, SVwhO and wh-cleft variants. However, the judgment data show that all of these variants are acceptable (with formal variants being more acceptable than colloquial ones). Second, the contrasts between wh-adjunct and wh-object questions are clear-cut in spontaneous speech, while they are mild in acceptability. In particular, the particle est-ce que is only used with the wh-object que, although wh-adjunct questions with est-ce que are judged equally acceptable. Third, fine-grained corpus analyses distinguishing between different categories of wh-adjuncts and wh-objects reveal differential behavior among wh-words: wh-reason adjuncts are essentially precluded from the wh-in-situ order. Furthermore, the particle est-ce que is limited to the inanimate wh-object que (‘what’), i. e. it is not used with the animate wh-object qui (‘whom’). The mismatches between acceptability and frequency prove to be less pronounced in fine-grained analyses. With regard to the larger methodological picture, the study shows the potential of combined studies drawing on both acceptability and frequency. In particular, acceptability judgments, which traditionally have a bad standing in sociolinguistics, can help to reveal normative effects and play a crucial role in circumscribing the full envelope of variation.

1 Relation between acceptability and frequency

The two most important sources of evidence in grammar research are acceptability judgments and corpus data. They are closely associated with specific theoretical frameworks and traditions. Acceptability judgments are seen as the royal road in generative grammar, since they are considered a direct reflection of the real object of interest: I-language or competence. Corpus data of actual language use are central to sociolinguistics and usage-based approaches. Apart from theoretical differences between the frameworks, the approaches can also be incompatible at the methodological level, as the following quotations illustrate.
On the one hand, Chomsky (1965: 191) claimed half a century ago: “To maintain, on grounds of methodological purity, that introspective judgments of the informant (often, the linguist himself) should be disregarded is, for the present, to condemn the study of language to utter sterility”. This position still holds for today’s generative syntacticians. On the other hand, Labov (1996: 83) states that “when the use of language is shown to be more consistent than introspective judgments, a valid description of the language will agree with that use rather than with intuitions”. The majority of sociolinguists share his critical stance towards introspection.
Yet, these antagonistic positions have softened slightly, a development supported by generative studies on diachronic syntax and language acquisition. Many linguists would agree that the choice of the type of evidence depends more on the research question than on some inherent quality of the data type itself. Just as corpus data can be extracted in a more or less meaningful way, introspection can be collected in a more or less convincing manner. Whatever method we use, careful methodology and data handling are essential. Nevertheless, one important issue remains open: What is the relation between introspection and language use, and how can we model it? It is very important to find out whether both empirical sources lead to the same answer to a given theoretical question. In many circumstances it is certainly advantageous to corroborate theoretical hypotheses in linguistics, where possible, by means of different types of empirical data. However, the road to such a complementary approach needs to be better paved. More specifically, we need more precise knowledge of the relation between acceptability and frequency in order to better interpret the results of studies working with both types of data.
To this end, this study presents empirical findings on syntactic variation in French wh-questions, using frequency as well as gradient acceptability data. The distinctive aspect of the present study is that both types of data were collected from the same speakers.
The relation between acceptability and frequency is an under-studied issue, and only a few empirical studies have addressed it so far.
Backus and Mos (2011) compare gradient acceptability judgments and corpus data, though not with regard to word order but with regard to two ways of expressing potentiality, namely a derivational morpheme equivalent to English -able and a copula construction. They observe a good match between introspection and production.
Stefanowitsch (2008) discusses, from a cognitive usage-based perspective, the correlation between negative evidence in corpora and acceptability judgments with respect to typical uses of verbs with the ditransitive/dative contrast in English. He essentially suggests that the quantitative difference between the expected and the observed frequency of co-occurrence of linguistic constructions, which speakers are presumably capable of calculating “subconsciously”, correlates with degrees of unacceptability.
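The expected-versus-observed comparison can be made concrete with a 2×2 contingency table. The Python sketch below is only an illustration under stated assumptions. The counts and function names are invented. The sketch computes the raw difference between observed counts and the counts expected under independence (row total × column total / grand total). It does not reproduce the particular statistical test used in that study.

```python
from typing import List

# 2x2 counts: rows = verb (target verb, all other verbs),
# columns = construction (ditransitive, other)
Table = List[List[int]]


def expected_counts(table: Table) -> List[List[float]]:
    """Expected cell counts if verb and construction were independent: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def observed_minus_expected(table: Table) -> List[List[float]]:
    """Negative cells mark combinations that occur less often than chance would predict."""
    expected = expected_counts(table)
    return [[obs - exp for obs, exp in zip(obs_row, exp_row)]
            for obs_row, exp_row in zip(table, expected)]


# Invented counts: a hypothetical verb never occurs in the ditransitive in 10,000 clauses.
counts = [[0, 40],        # target verb: ditransitive, other
          [1000, 8960]]   # all other verbs: ditransitive, other
print(observed_minus_expected(counts))  # → [[-4.0, 4.0], [4.0, -4.0]]
```

The negative top-left cell (0 observed vs. 4.0 expected) is the kind of "missing" co-occurrence at issue. Deciding whether such a gap is meaningful requires a significance test, which this sketch deliberately omits.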
Bybee and Eddington (2006) is another study on the lexicon within the usage-based framework. The authors compare high- and low-frequency verb+adjective expressions in Spanish with respect to their acceptability. They show that high-frequency items are also judged as more acceptable, suggesting that “grammaticality or acceptability judgments are heavily based on familiarity, that is, the speaker’s experience with language” (Bybee and Eddington 2006: 349).
Featherston (2005) analyzes several variants concerning three phenomena discussed in the theoretical literature on German syntax (discourse-linking, parenthesis vs. extraction, object coreference). He shows that only constructions with a relatively high degree of acceptability occur in corpora. In order to explain mismatches between acceptability and frequency, he suggests that human grammar contains both a cumulative and a probabilistic component.
Kempen and Harbusch (2008) compare gradient experimental judgments and frequency counts from corpora with respect to word order variants in German finite subordinate clauses. They observe that only those constructions which scored high in the judgment test were also attested in the corpus. However, these constructions varied considerably in their frequency of occurrence, although they all received high acceptability values. On the basis of their results, Kempen and Harbusch (2008) propose a two-factor theory to explain this mismatch: First, a construction must exceed a frequency minimum to be learnable, i. e. to be included in the child’s grammar.1 Second, repair mechanisms can give a positive bias to ungrammatical constructions, leading to marginal results, which “should not be mistaken for an authentic grammaticality rating” (Kempen and Harbusch 2008: 190). However, we do not think that their two-factor theory is the whole story: If scarcity of a construction were an indicator that it is not part of the grammar, we would end up with a fairly restrictive grammar. Such a grammar would not include many non- or hardly-occurring items (e. g. multiple questions) about which native speakers can nevertheless give surprisingly stable and even nuanced, gradient judgments. What is more, many constructions that most speakers qualify as natural and fully acceptable can be fairly scarce in usage (e. g. wh-indirect object questions, see below; see also the discussion in Sampson 2007).

1 The criterion of learnability replaces a grammaticality threshold previously proposed in Kempen and Harbusch (2005) but now dismissed by the authors.
Bader and Häussler (2010) also compare gradient judgments with corpus data, with respect to the order of subject and object and to verb-cluster linearization in German. They observe a mismatch similar to that found by Kempen and Harbusch (2008) and Adli (2011c), namely that constructions with a high level of acceptability can vary greatly in frequency (this “ceiling effect” was previously underestimated by Featherston 2005). At the same time, extreme scarcity of a construction does not allow us to predict its level of acceptability.
In Adli (2011c), we proposed the concept of a latent construction to refer to fully acceptable but extremely scarce or non-occurring constructions. Furthermore, we proposed a possible scenario for certain types of diachronic change, involving the following steps: (i) a construction X is not available in the grammar, (ii) X becomes available in the grammar but is not used, (iii) X is used as part of a set of optional syntactic variants, (iv) cases of unstable optionality are dissolved, leaving only X (Adli 2011c: 398).
We should also mention the early studies by Greenbaum (1976, 1977), who showed a correlation between acceptability judgments and judgments on the assumed frequency of the same constructions: native speakers believe that the more acceptable a construction is, the more often it occurs. Interestingly, as we know today, this belief is mistaken. Greenbaum’s (1976, 1977) result can also be taken as an indication that speakers are probably largely unaware of the considerable variation in frequency among acceptable constructions.
Given the challenge of explaining why certain constructions are acceptable but hardly occur, the relation between acceptability and frequency is closely linked to the issue of data scarcity of specific constructions in corpora. In this context, we also mention Pullum (2007), who discusses rarity in corpora. Similarly, Foster (2007) and Ayres-Bennett (1994) discuss negative evidence in corpora (see also Stefanowitsch 2008). The issue of rare typological features from a generative point of view has been discussed by Newmeyer (2010) and Rijkhoff (2010). The underlying problems raised above are not new, though they have rarely been discussed in methodological terms: the question is whether rare constructions are also marked constructions. This question was explicitly raised by Baayen et al. (1997: 14) and goes back to Greenberg (1966) and Trubetzkoy (1939). On this matter, Haspelmath (2006: 33) argues in favor of using the notions “rare” and “frequent” directly instead of the fairly polysemous notions “marked” and “unmarked”.
When two measures are compared, standard practice in empirical methodology is to collect both measures from the same subjects. However, apart from Adli (2011c) (and disregarding Greenbaum 1976, 1977, who does not study actual frequency of use), all studies mentioned above compare a sample of speakers who gave introspective judgments with existing corpora. In other words, one measure taken from sample 1 (acceptability judgments) is compared to another measure taken from sample 2 (authors or speakers recorded in corpora). This approach is understandable from a practical point of view, given the lack of appropriate data. Having said this, the innovative aspect of the present study is that spontaneous speech, acceptability data and social information were collected from the same set of speakers and compiled in the database sgs, described below. We can therefore rule out that any difference or similarity observed between acceptability and frequency is due to (social, individual, dialectal, text- or discourse-specific…) differences between two samples. We believe that this approach yields more reliable results on the relation between the two data types. The present study thus extends the research program we started in Adli (2011c) on Spanish. We now turn to another language (and another sample), namely French, and analyze French wh-questions. A further notable aspect of the present study is that it analyzes an envelope of variation (standard practice in quantitative sociolinguistics but not in syntax) with both acceptability and frequency data.

2 Acceptability and frequency in linguistic variation

Before presenting the constructions that are compared in terms of acceptability and frequency, we give a brief overview of syntactic variation in French wh-syntax. Since we aim to take into account the entire set of variants (called “circumscription of the envelope of variation” in variationist terminology), this is an important preliminary step. It builds on the principle of accountability (Labov 1982: 30), which states that the occurrences of each variant belonging to a variable must be reported relative to the total number of potential occurrences in the variable environment, i. e. including the cases in which the variant could have occurred but did not.
The following list illustrates the large repertoire of syntactic variants of French wh-questions. The (a) examples are wh-adjunct questions and the (b) examples are wh-object questions.

(1a) tu fais le dessin quand ? [wh-in-situ]
you make the drawing when
‘When do you make the drawing?’

(1b) tu vois qui devant la fenêtre ? [wh-in-situ]
you see who in front of the window
‘Who do you see in front of the window?’

(2a) quand est-ce que tu fais le dessin ? [wh-ESQ]
when est-ce que you make the drawing

(2b) qui est-ce que tu vois devant la fenêtre ? [wh-ESQ]
who est-ce que you see in front of the window

(3a) quand tu fais le dessin ? [whSV]
when you make the drawing

(3b) qui tu vois devant la fenêtre ? [whSV]
who you see in front of the window

(4a) quand fais -tu le dessin ? [whVSclit (=clitic inv.)]
when make -you the drawing

(4b) qui vois -tu devant la fenêtre ? [whVSclit (=clitic inv.)]
who see -you in front of the window

(5) tu fais quand le dessin ? [SVwhO]
you make when the drawing

(6a) quand les enfants font -ils le dessin ? [complex inversion]
when the children_i make -they_i the drawing
‘When do the children make the drawing?’

(6b) qui les enfants voient -ils devant la fenêtre ? [complex inversion]
who the children_i see -they_i in front of the window
‘Who do the children see in front of the window?’

(7a) c’est quand que tu fais le dessin ? [wh-in-situ cleft]
it is when that you make the drawing

(7b) c’est qui que tu vois devant la fenêtre ? [wh-in-situ cleft]
it is who that you see in front of the window

(8a) quand c’est que tu fais le dessin ? [wh-cleft]
when it is that you make the drawing

(8b) qui c’est que tu vois devant la fenêtre ? [wh-cleft]
who it is that you see in front of the window

(9a) quand est-ce que c’est que tu fais le dessin ? [wh-ESQ cleft]
when est-ce que it is that you make the drawing

(9b) qui est-ce que c’est que tu vois devant la fenêtre ? [wh-ESQ cleft]
who est-ce que it is that you see in front of the window

The list distinguishes, on a first level, between non-clefted wh-questions, (1a) to (6b), and clefted wh-questions, (7a) to (9b) (Lambrecht 2001; Dufter 2008). We find both wh-in-situ constructions as in (1a) and (1b) (Adli 2006; Hamlaoui 2011; Déprez et al. 2013), which are not restricted to echo questions (Reis 1991; Escandell-Vidal 2002; Sobin 2010), and wh-fronted constructions. In the fronted variants, the initial wh-word can be followed by the interrogative particle est-ce que as in (2a) and (2b), by (non-inverted) subject and verb as in (3a) and (3b), or by inverted subject and verb as in (4a) and (4b).2 Furthermore, French allows clefted wh-questions, which exhibit part of the variation already mentioned: wh-clefts can appear with wh-in-situ as in (7a) and (7b), with a fronted wh-element as in (8a) and (8b), or with the particle est-ce que as in (9a) and (9b).
Two of the non-clefted variants are rather restricted. First, (5) is a marked variant of the wh-in-situ question in which the direct object is postposed. This option exists with wh-adjunct questions containing a transitive or ditransitive verb. It resembles the wh-in-situ question with a right-dislocated object (tu le fais quand le dessin), except that the object clitic (le) is missing. Second, (6a) and (6b) are so-called complex inversions, in which the full subject is doubled by a coreferential inverted clitic.

3 Methodology

3.1 Overview of the sgs database

Sgs is a multilingual database that we have been constructing since 2004 (see Adli 2011b). It contains data on four languages (French, Spanish, Catalan and Persian) that have been collected using the same methodological protocol. Every person was first recorded, then participated in a gradient acceptability judgment test, and finally filled out an extensive social questionnaire. Spontaneous speech data were obtained by recording interviewer and interviewee while they played a specifically designed game. Essentially, the interviewee had to solve a fictitious murder case by speaking freely with the (native and well-trained) interviewer. Most interviewees chose a non-formal, rather colloquial register, encouraged by a preceding warm-up or “ice-breaker” phase. We favored this game task over the standard sociolinguistic interview because it elicits a substantial number of declarative and interrogative sentences. Sociolinguistic interviews, by contrast, are restricted in terms of sentence type, in the sense that interviewees hardly ever produce questions. It would have been much more difficult (and costly) to carry out the same study on wh-questions with data obtained from the classic sociolinguistic interview.

2 The construction with an inverted weak subject pronoun as in (4a) and (4b) is often called subject-clitic inversion (Auger 1994; Elsig 2009). A construction with an inverted non-pronominal subject (e. g. Quand fait Jean le dessin?) is often referred to as stylistic inversion (Kayne and Pollock 1978; Drijkoningen and Kampers-Manhe 2008).
The recordings were transcribed, time-stamped and, most importantly, syntactically annotated. We annotated the type and function of all major constituents, including tree relations of subordination and coordination in complex sentences.
The French part of sgs contains 27 hours of recordings or 44,231 main lines,
51 % of which are produced by 101 interviewees and 49 % by the interviewers.
Interviews were carried out in the summer of 2005 in Paris with French native
speakers between the ages of 19 and 49 (mean age: 29). The sample is essentially
gender-balanced (56 % women and 44 % men). Only data of these 101 interview-
ees (and not of the interviewers) are taken into consideration in the following
analyses. According to the transcription and annotation guidelines, one main
line corresponds to one (full or elliptical) sentence, or to single interactional
markers such as a single oui (‘yes’), non (‘no’), or to pragmatic phenomena such
as false starts or interrupted, unfinished sentences. In total, the corpus contains 10,943 full, i. e. non-elliptical, sentences produced by the interviewees, among them 1,721 root wh-questions. To our knowledge, this is the largest set of wh-questions extracted from a single corpus. One should, however, also note Elsig (2009: 147), who extracted 1,055 tokens from the Ottawa-Hull corpus of
modern Canadian French (Poplack 1989). Coveney and Dekhissi (2013) extracted
roughly 1,070 true-information (i. e. non-rhetorical) wh-questions from a corpus
based on selected contemporary French films/screenplays (Dekhissi in prep.).
Druetta (2008: 37) worked with 395 wh-questions from the G. A. R. S. corpus,
recorded essentially in southeastern France. Aside from that, other studies on
wh-questions in spoken language work with rather small numbers (Behnstedt
1973; Coveney 1996).

3.2 Towards an envelope of variation

3.2.1 Descriptive overview


We start by presenting descriptive details on the 1,721 extracted full, root wh-questions from sgs in Table 1. Please recall that this set includes neither embedded wh-questions such as tu sais quand il est parti à Paris? ‘Do you know when he left for Paris?’ nor elliptical wh-questions such as quand? or quand ça? ‘When?’. Furthermore, it only includes true information questions.3

Table 1: Number of tokens of different word order variants of wh-questions

variant                  n    percent   example
wh-in-situ             944    56.2 %    (1a)/(1b)
wh-ESQ                 281    16.7 %    (2a)/(2b)
whSV                   256    15.2 %    (3a)/(3b)
whVS                   167     9.9 %    (4a)/(4b)
SVwhO                    6     0.4 %    (5)
wh-in-situ cleft        17     1.0 %    (7a)/(7b)
wh-cleft                 6     0.4 %    (8a)/(8b)
wh-ESQ cleft             0     0.0 %    (9a)/(9b)
complex inversion        2     0.1 %    (6a)/(6b)
multiple wh-question     1     0.1 %

Please note that the cells in the table are not fully comparable, because they do not represent an envelope of variation. Most notably, wh-subject questions (e. g. qui fait le dessin ‘who does the drawing’), which on the basis of their surface order could be assigned to either the wh-fronted or the wh-in-situ category, were by definition not assigned to the wh-in-situ category. Furthermore, stylistic inversion (e. g. quand dort Jean ‘when does Jean sleep’) and subject-clitic inversion (e. g. quand dort-il ‘when does he sleep’) are aggregated into one category because the table does not differentiate between pronominal and non-pronominal subjects. Finally, the table includes one multiple wh-question.
Nevertheless, Table 1 provides insights into the frequency of different wh-variants in spontaneous speech. First, only four variants are really productive: inversion questions (see (4a)/(4b)), the form with an initial wh-element followed by subject and verb (see (3a)/(3b)), the form with the est-ce que particle (see (2a)/(2b)), and, by far the most frequent, the wh-in-situ form (see (1a)/(1b)). Second, complex inversion (see (6a)/(6b)) is basically absent, which is hardly surprising given its high level of formality. Third, wh-cleft constructions (see (7a) to (9b)) are extremely scarce.
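The percentages in Table 1 are shares of the tokens listed in the table itself, and the counts in the table add up to 1,680. The short Python sketch below recomputes the percentages from the counts. The variant labels are taken from Table 1. Rounding to one decimal place is an assumption chosen to match the table's display.

```python
# Token counts copied from Table 1.
TABLE_1 = {
    "wh-in-situ": 944,
    "wh-ESQ": 281,
    "whSV": 256,
    "whVS": 167,
    "SVwhO": 6,
    "wh-in-situ cleft": 17,
    "wh-cleft": 6,
    "wh-ESQ cleft": 0,
    "complex inversion": 2,
    "multiple wh-question": 1,
}


def percentages(counts: dict) -> dict:
    """Share of each variant in the summed counts, in percent, rounded to one decimal."""
    total = sum(counts.values())
    return {variant: round(100 * n / total, 1) for variant, n in counts.items()}


print(sum(TABLE_1.values()))  # → 1680
for variant, pct in percentages(TABLE_1).items():
    print(f"{variant:22s} {pct:5.1f} %")
```

The recomputed values coincide with all ten percentages printed in Table 1.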

3 There are no echo questions and only nine rhetorical questions in sgs. Here, rhetorical questions are utterances that are pragmatically equivalent to declaratives, with the speaker knowing the answer (see Prieto and Rigau 2007).

Our goal is to obtain a comparable set of constructions for the following analyses. To this end, we further restrict the overall set of wh-questions in Table 1 to sentences with a pronominal subject. Sentences with pronominal and lexical subjects are known to be analyzed quite differently in French: Weak pronouns in spontaneous French are clitics that can be analyzed as mere verbal affixes (under this assumption, colloquial French might in fact be attributed properties of a null subject language, see e. g. Culbertson 2010). Furthermore, limiting ourselves to pronominal subjects removes cases of stylistic inversion from the whVS order and excludes cases of complex inversion like (6a)/(6b). As a result, postverbal subjects only occur as subject-clitic inversions such as (4a)/(4b). The resulting restricted set is shown in Figure 1, in which the non-occurring wh-ESQ cleft constructions such as (9a)/(9b) and multiple wh-questions are no longer represented. Figure 1 shows the number of tokens
for each word order variant in the French part of sgs, also further distinguishing
between wh-adjunct and wh-object questions. One should bear in mind that the
SVwhO order with wh-objects is not a zero frequency but an empty cell, because
this order is only defined for wh-adjunct questions (with transitive or ditransitive
verbs).
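The restriction to pronominal subjects and the distinction between a zero frequency and an empty cell can be sketched as follows. This is a minimal Python illustration, not the sgs annotation scheme. The token format (dictionaries with the hypothetical keys variant, wh_type and subject) and the example tokens are invented. Variants that are defined for a question type but unattested receive a count of 0. SVwhO, which is undefined for wh-object questions, receives None.

```python
from collections import Counter
from typing import Dict, List, Optional

VARIANTS = ["wh-in-situ", "wh-ESQ", "whSV", "whVS", "SVwhO", "wh-in-situ cleft", "wh-cleft"]

# SVwhO is only defined for wh-adjunct questions (with transitive or ditransitive verbs).
DEFINED = {
    "adjunct": set(VARIANTS),
    "object": set(VARIANTS) - {"SVwhO"},
}


def envelope(tokens: List[Dict[str, str]], wh_type: str) -> Dict[str, Optional[int]]:
    """Count the variants of one wh-type, keeping only questions with a pronominal subject.

    Defined but unattested variants get 0 (zero frequency);
    variants outside the envelope get None (empty cell)."""
    counts = Counter(
        t["variant"] for t in tokens
        if t["wh_type"] == wh_type and t["subject"] == "pronoun"
    )
    return {v: (counts[v] if v in DEFINED[wh_type] else None) for v in VARIANTS}


# Hypothetical tokens for illustration only.
tokens = [
    {"variant": "wh-in-situ", "wh_type": "object", "subject": "pronoun"},
    {"variant": "wh-ESQ", "wh_type": "object", "subject": "pronoun"},
    {"variant": "whSV", "wh_type": "adjunct", "subject": "pronoun"},
    # Stylistic inversion with a lexical subject: excluded by the restriction.
    {"variant": "whVS", "wh_type": "adjunct", "subject": "lexical"},
]

print(envelope(tokens, "object"))
print(envelope(tokens, "adjunct"))
```

Keeping None distinct from 0 matters when proportions are computed. An empty cell should not enter the envelope as a variant that speakers could have chosen.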
Figure 1 reveals a very clear distributional difference between wh-adjuncts
and wh-objects, which will be discussed in more detail further below.

[Figure 1: bar chart of absolute frequencies for wh-adjuncts and wh-objects; variants on the x-axis: wh-in-situ, wh-ESQ, whSV, whVS, SVwhO, wh-in-situ cleft, wh-cleft]

Figure 1: Number of tokens of different word order variants of wh-adjunct and wh-object
questions with pronominal subject

3.2.2 Calculating relative frequencies


In the next step, we calculate the proportion (a percentage between 0 and 100, or a relative frequency between 0 and 1) of each of the seven variants of wh-adjunct questions and of each of the six variants of wh-object questions shown in Figure 1. Essentially, there are two ways to calculate this measure. We can first add up all occurrences of each variant in the entire corpus and then calculate their proportions. This means that we treat the entire corpus as a single unit, disregarding the level of individual speakers. This measure is called “single-text-value” in Adli (2011a: section 6.2). Alternatively, we can first calculate the proportions for each speaker and then calculate the mean of these proportions for the sample (called “speaker-sample-value” in Adli 2011a: section 6.2). The two measures are defined in (10a) and (10b). Suppose we calculate the relative frequency of our target variant whSV among the x = 7 variants of wh-adjunct questions in Figure 1 as a single-text-value. We first add up all occurrences of the target variant across all n = 101 speakers. We then divide this number by the sum of all occurrences of the x = 7 variants across all n = 101 speakers. If we want to work with the speaker-sample-value, however, we first calculate the relative frequency of whSV for each of the n = 101 speakers. We then add up these relative frequencies and divide the sum by n = 101.
(10a) relative frequency (single-text-value):

$$\frac{\sum_{i=1}^{n} N_{\text{TARGET-VARIANT}_i}}{\sum_{i=1}^{n} \sum_{j=1}^{x} N_{\text{VARIANT}_{ji}}}$$

(10b) relative frequency (speaker-sample-value):

$$\frac{\sum_{i=1}^{n} \left( \frac{N_{\text{TARGET-VARIANT}_i}}{\sum_{j=1}^{x} N_{\text{VARIANT}_{ji}}} \right)}{n}$$
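The two measures in (10a) and (10b) can be sketched in Python as follows. This is an illustration under assumptions. Per-speaker counts are represented as dictionaries mapping variant labels to token numbers, which is a hypothetical format. Formula (10b) is undefined for a speaker who produced no token of the variable, so the sketch rejects such speakers instead of guessing how they should be handled.

```python
from typing import Dict, List

SpeakerCounts = Dict[str, int]  # variant label -> number of tokens produced by one speaker


def single_text_value(speakers: List[SpeakerCounts], target: str) -> float:
    """(10a): pool the tokens of all n speakers, then take the target variant's share."""
    target_total = sum(s.get(target, 0) for s in speakers)
    grand_total = sum(sum(s.values()) for s in speakers)
    return target_total / grand_total


def speaker_sample_value(speakers: List[SpeakerCounts], target: str) -> float:
    """(10b): the target variant's share per speaker, averaged over the n speakers."""
    shares = []
    for s in speakers:
        total = sum(s.values())
        if total == 0:
            raise ValueError("(10b) is undefined for a speaker without any token of the variable")
        shares.append(s.get(target, 0) / total)
    return sum(shares) / len(shares)


# Invented counts for three speakers (not sgs data).
speakers = [
    {"whSV": 3, "wh-in-situ": 1},
    {"whSV": 0, "wh-in-situ": 4},
    {"whSV": 1, "wh-in-situ": 1},
]
print(single_text_value(speakers, "whSV"))     # 4 / 10
print(speaker_sample_value(speakers, "whSV"))  # (0.75 + 0.0 + 0.5) / 3
```

With these invented counts the two measures differ only slightly. They diverge sharply when token numbers are very unevenly distributed across speakers.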

In order to work with speaker-sample-values, a corpus must meet stricter
requirements: Data must be collected from each speaker under comparable and
controlled conditions – which is the case in sgs. In the following analyses, we
work with speaker-sample-values according to (10b), because they are more robust
than single-text-values, especially when dealing with scarce data: Whenever many
speakers have produced very few tokens of a target variant and a few speakers have
produced a comparatively high number of tokens, single-text-values are dominated
by these few speakers and can overestimate the results, sometimes even distorting
them drastically.
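To make the contrast between (10a) and (10b) concrete, the following minimal Python sketch computes both measures for an invented toy sample (not sgs data): nine speakers produce one token each, none of them whSV, and a single speaker produces twenty tokens, all of them whSV. The function names and the dictionary-based data layout are hypothetical choices made for illustration only.

```python
from typing import Dict, List

# Each speaker is a dict mapping variant name -> token count (toy data, not from sgs).
Speaker = Dict[str, int]


def single_text_value(speakers: List[Speaker], target: str) -> float:
    """(10a): pool all tokens of the corpus, then take the proportion of the target."""
    target_total = sum(s.get(target, 0) for s in speakers)
    all_total = sum(sum(s.values()) for s in speakers)
    return target_total / all_total


def speaker_sample_value(speakers: List[Speaker], target: str) -> float:
    """(10b): proportion of the target per speaker, then the mean over the n speakers."""
    proportions = []
    for s in speakers:
        total = sum(s.values())
        if total == 0:
            # (10b) is undefined for a speaker who produced no token of the variable.
            raise ValueError("speaker without any token of the variable")
        proportions.append(s.get(target, 0) / total)
    return sum(proportions) / len(proportions)


# Nine speakers with 1 non-whSV token each, one prolific speaker with 20 whSV tokens.
toy = [{"wh-in-situ": 1, "whSV": 0}] * 9 + [{"wh-in-situ": 0, "whSV": 20}]

print(round(single_text_value(toy, "whSV"), 3))     # 20 / 29 -> 0.69
print(round(speaker_sample_value(toy, "whSV"), 3))  # (9 * 0.0 + 1.0) / 10 -> 0.1
```

In this toy case, the pooled single-text-value is pushed up to about 0.69 by one prolific speaker, whereas the speaker-sample-value weights every speaker equally and yields 0.1. The sketch assumes that every speaker contributes at least one token of the variable, since (10b) is undefined otherwise.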
Figure 2 quantifies the same constructions as in Figure 1, but it shows relative
frequencies as speaker-sample-values. In the next step, these relative frequencies
can be compared to the gradient acceptability judgments, since (i) the judgments
will also be mapped onto a linear scale from 0 to 1, and (ii) they are also mean scores
of individual values. Since we will later add judgment scores to the diagram, we
use lines and not bars in Figure 2 (multiple lines are often more readable than
multiple bars, especially for spotting interactions).
184 Aria Adli

Figure 2: Relative frequency (speaker-sample-value) of different word order variants of wh-adjunct and wh-object questions with pronominal subject

[Line chart. Series: wh-adjuncts: rel. frequency; wh-objects: rel. frequency. y-axis from 0.00 to 0.70. x-axis: wh-in-situ, wh-ESQ, whSV, whVS, SVwhO, wh-in-situ cleft, wh-cleft.]

3.3 Fine-grained rating of acceptability

3.3.1 Gradient acceptability judgments on a visual analogue scale


Introspective data were collected experimentally using a gradient acceptabil-
ity judgment test, developed in Adli (2004: chapter 3) and already applied in
various studies (e. g. Adli 2010a). This instrument measures acceptability in a
gradient manner. Unlike the magnitude estimation technique (Bard et al. 1996),
it is based on a graphic rating or a visual analogue scale (Freyd 1923; Funke
2010). The scale has two endpoints (totally unacceptable and fully acceptable).
Subjects rate the perceived degree of acceptability by drawing a line with a pen:
The longer the line, the more acceptable the sentence (see Adli 2011b for a
computer-based version of the test). Figure 3 shows an example page taken from the
test material. Two sheets that can be turned independently from each other are
placed in an A4 binder. A (suboptimal) reference sentence that subjects judged at
the end of the training phase is printed on the upper sheet. It remains visible
until the end of the experiment, providing a self-chosen intermediate scale anchor.
Thus, subjects can calibrate their ratings by means of both
endpoints and this anchor. The experimental sentences are printed on the white
sheets on the lower part of the binder. Once the subjects have rated the sen-
tences on a white sheet, they turn it to continue with the next test sentences (see
the Appendix for a list of all experimental items as well as the sentences from
the instruction and training phase).
The experiment starts with a thorough instruction phase (see Adli 2004:
chapter 3 for details), during which subjects learn the concept of gradience – as
opposed to binary good/bad, tripartite good/intermediate/bad judgments, etc.
Furthermore, they are instructed to judge syntactic well-formedness and not
irrelevant extra-grammatical aspects (such as pragmatic plausibility). They are
asked to leave aside normative considerations and to refer to spoken, colloquial
everyday language.

[Example page from the test: [R2] Tous ont regardé qui ? ‘everyone looked at whom?’; [49] Quel pilote conduit quelle voiture dans le championnat ? ‘which driver drives which car in the championship?’; [31] Tu emmènes qui en vacances ? ‘you are taking whom on vacation?’. Each sentence is followed by a rating line running from – to +.]
Figure 3: Gradient acceptability judgment test
During the instruction phase, subjects become accustomed to judging
acceptable, marginal and unacceptable constructions. Finally, their knowledge
of the instrument is verified in a training phase before starting with the actual
experimental items. Each of the constructions shown in Figure 4, namely (1a),
(1b), (2a), (2b), (3a), (3b), (4a), (4b), (5), (8a), and (8b), was presented in three
lexical variants (see Appendix) in order to obtain a more valid test score. Of
the three cleft forms (wh-in-situ cleft, wh-cleft, wh-ESQ cleft), only the wh-cleft
variant (8a)/(8b) was included in the test. To give an example of how the
dependent variables were calculated: a subject’s experimental score for the
wh-object question (1b) is the arithmetic mean of her/his judgments of the three
lexical variants of (1b). Except for the wh-element and the pronominal subject, lexical repetitions
between the test sentences were avoided. The order of the sentences was random-
ized. On average, the instruction and training lasted 15 minutes and the actual
experimental phase 20 minutes. Please note that the acceptability test included
other constructions that are not at the center of the present paper but which can
be considered filler sentences with respect to the present set of experimental
items.
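The scoring procedure described above can be sketched in Python under stated assumptions. The drawn line is assumed to be converted into a score by dividing its length by the full length of the printed scale (`SCALE_LENGTH` is a hypothetical placeholder, since the actual scale length is not given here), and the line lengths are invented. A subject's score for a construction is the mean over its three lexical variants, and the value for a construction is taken as the mean of the individual subject scores.

```python
from statistics import mean
from typing import Dict, List

# Hypothetical length of the printed rating line (e.g. in mm); not given in the text.
SCALE_LENGTH = 100.0


def line_to_score(line_length: float, scale_length: float = SCALE_LENGTH) -> float:
    """Map a drawn line onto [0, 1]: 0 = totally unacceptable, 1 = fully acceptable."""
    if not 0.0 <= line_length <= scale_length:
        raise ValueError("line length must lie within the scale")
    return line_length / scale_length


def subject_score(line_lengths: List[float]) -> float:
    """One subject's score for a construction: mean over its lexical variants."""
    return mean(line_to_score(length) for length in line_lengths)


def construction_score(ratings: Dict[str, List[float]]) -> float:
    """Mean of the individual subject scores for one construction."""
    return mean(subject_score(lengths) for lengths in ratings.values())


# Invented line lengths for construction (1b), three lexical variants per subject.
ratings_1b = {
    "subject_01": [80.0, 90.0, 85.0],
    "subject_02": [60.0, 70.0, 65.0],
}
print(round(construction_score(ratings_1b), 2))  # (0.85 + 0.65) / 2 -> 0.75
```

Averaging first within each subject and then across subjects keeps each subject's weight equal, which parallels the speaker-sample-value used on the frequency side.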

3.3.2 Descriptive overview of the ratings


Figure 4 shows the gradient acceptability values (two lines in the upper part) and
the relative frequencies of Figure 2 (two lines in the lower part).
The value points of wh-adjunct questions are linked by a broken line, and
the value points of wh-object questions by a continuous line. Since both scales
run from 0 to 1 (i. e. both measures take values in the interval [0,1]), they
can be mapped onto the same diagram. Yet, it is important to keep in mind that
these are two qualitatively different measures that cannot be set in a numerical
relation to each other (for example, a statement such as “the acceptability of
wh-in-situ object questions is nearly twice as high as the frequency of wh-in-situ
object questions” would not make sense). One should also keep in mind that
frequency, unlike acceptability, is represented in a relative way, namely as a
relative frequency, i. e. the proportion of one wh-variant among all wh-variants.
The best way to understand Figure 4 is to see it as a superposition of two sheets
of tracing paper, one with a diagram on frequency and the other on acceptability.
Unsurprisingly, the judgment values of all sentences are within the range of
acceptable constructions. For comparison, the judgment test also included
several ungrammatical or suboptimal constructions, which obtained clearly lower
scores. For example, a multiple wh-question with superiority violation such as
qu’achète qui ce soir? ‘what buys who tonight?’ received the average acceptability
value of 0.27 (this is not shown in Figure 4).
The comparison of the frequency and acceptability values in Figure 4 reveals
several facts outlined in the following section.

[Line chart. Series: wh-adjuncts: rel. frequency; wh-objects: rel. frequency; wh-adjuncts: acceptability; wh-objects: acceptability. y-axis from 0.00 to 1.00. x-axis: wh-in-situ, wh-ESQ, whSV, whVS, SVwhO, wh-cleft.]

Figure 4: Relative frequency and gradient acceptability of different word order variants of wh-adjunct and wh-object questions with pronominal subject

4 Comparison of frequency and acceptability of French wh-variants

4.1 Gradience in acceptability, (near-)zero frequency

While all wh-variants score high enough in the judgment test to be considered
within the range of acceptable constructions (i. e. neither ungrammatical nor
marginal), with differences being gradual in nature, we do find categorical dif-
ferences on the frequency side: We can distinguish occurring from (nearly) non-
occurring forms.
wh-clefts such as (8a) and (8b) and the marked wh-in-situ order SVwhO such
as (5) essentially do not occur at all. Several other constructions occur very rarely,
namely the subject-clitic inversion whVS such as (4a) and (4b), the whSV order
with wh-objects such as (3b) and the wh-ESQ form with wh-adjuncts such as
(2a). Table 1 and Figure 2 have shown that the preferred order in usage is wh-in-
situ. This observation is unambiguous for ordinary, non-clefted questions, and it
also seems to apply to clefted questions (though the very low number of clefts
makes this last claim somewhat speculative).
The extreme scarcity of wh-cleft questions (of any type) is somewhat puzzling,
and the frequencies suggest two hypotheses to be pursued in future research:
Either contrastive focus itself is a very scarce phenomenon in spontaneous speech,
or contrastive focus is mainly expressed by prosodic rather than syntactic means in
French wh-questions. What we can observe is that a syntactic device, namely
clefting, exists in French grammar but is hardly ever put to use. If we assume that
contrastive focus as such is not an extremely scarce phenomenon in spontaneous
speech interrogatives, we have to conclude that wh-cleft constructions are not a
standard form of expressing contrastively focused wh-questions in French, con-
tradicting Zubizarreta and Vergnaud (2005). These questions call for research at
the syntax-phonology interface where the context of each sentence would be care-
fully analyzed, too. The question of whether contrastive focus in French wh-ques-
tions is marked by prosodic rather than by syntactic cues remains an open one.
The scarcity of the marked wh-in-situ order SVwhO with postposed object
as in (5), which – as the judgments show – is within the range of acceptable con-
structions, is also somewhat surprising. One possible analysis would be that
a construction like (5), repeated as (11a), is derived from a construction with a
right-dislocated object as in (11b) by omitting the coreferential clitic pronoun in a
process similar to topic drop.

(11a) tu fais quand le dessin?
      you make when the drawing

(11b) tu le_i fais quand le dessin_i?
      you cl make when the drawing

A follow-up analysis reveals that right-dislocated objects are also very scarce
among all wh-questions: We find only 6 occurrences. Thus, the scarcity of (5) is
not surprising under the assumption that the SVwhO order is derived from
right-dislocated objects.
The scarcity of subject-clitic inversion as in (4a) and (4b) in spontaneous
speech – a phenomenon already observed by Coveney (1996) and Culbertson
(2010) – is in line with a clear distinction between “standard” French, the variety
that conforms to normative considerations (and is also employed for writing), and
colloquial French. This distinction can be expressed by a model of diglossia (Zribi-Hertz
2010) or generalized bilingualism (Meisel et al. 2011) of French native speakers.

4.2 Gradience in both acceptability and frequency

We observe clear contrasts between wh-adjunct questions and wh-object questions
in frequency for those three word orders that occur somewhat regularly or that
are at least not very scarce (namely wh-in-situ (1a)/(1b), wh-ESQ (2a)/(2b) and
whSV (3a)/(3b)). However, these contrasts are very subtle (wh-ESQ and whSV) or
non-existent (wh-in-situ) in acceptability. One reason for this observation is
the already-mentioned and surprising fact that wh-ESQ adjunct questions
and whSV-object questions hardly occur in usage. What is more, the adjunct-
object asymmetry has opposite directions in frequency and acceptability for the
wh-ESQ and the whSV order. We will therefore proceed to follow-up analyses of
the adjunct-object-asymmetry in frequency, which should provide some answers
to these puzzling facts.

4.3 Different preferences in acceptability and frequency

The preferences for certain wh-variants revealed by the judgments (recall that
all are nuances within the range of acceptable constructions) do not match the
pattern in usage. This overall acceptability-frequency mismatch is most salient
for the whVS form (4a)/(4b) (subject-clitic inversion receives the highest accepta-
bility scores and hardly occurs in usage), and is also fairly clear for the wh-in-situ
form (1a)/(1b) (its very high frequency is not reflected in the acceptability scores).
Interestingly, these two variants have a “non-neutral” register or style value, with
the wh-in-situ form being [+colloquial] and the subject-clitic-inversion [+formal].

To complete the picture: whSV (3a)/(3b) and SVwhO (5) are also [+colloquial],
while wh-ESQ (2a)/(2b) is often described as “neutral” (Behnstedt 1973: 104;
Coveney 1996: 98) in the sense that it fits into several registers. I represent the
register-neutrality of wh-ESQ by the presence of both [+colloquial] and [+formal].
Dufter (2008) shows that c’est clefts occur 2.5 times more often in corpus data of
spoken French than in corpus data of written French. My interpretation of his
result is that wh-clefts, as scarce as they may be, are [+colloquial] (or, more
precisely, tilt towards the [+colloquial] side). Hence, the two variants with the
highest acceptability values (wh-ESQ (2a)/(2b) and whVS (4a)/(4b)) are precisely
those forms which bear the [+formal] feature. What does this result mean for the
relation between acceptability and frequency? It seems that speakers cannot help
taking the normative perspective into consideration when making acceptability
judgments. Please recall that subjects were thoroughly instructed to leave aside the
normative perspective and to rely on colloquial language. I come back to this
point in Section 5.

4.4 The issue of granularity and the analysis of frequency

In order to understand the adjunct-object asymmetries described in Sections 4.2
and 4.3, we will increase the level of granularity in the corpus queries. So far, we
have analyzed full root-wh questions with a weak pronominal subject, compar-
ing (the aggregation of all lexical/grammatical types of) wh-adjuncts with (the
aggregation of all lexical/grammatical types of) wh-objects. Please note that it is
useful and often necessary to aggregate lexical/grammatical subtypes: It allows
us to cover a range of constructions and thus to further generalize the findings,
and what is more, it helps to reduce the problem of data scarcity. To put it in
methodological terms: There is always a trade-off between internal validity and
consistency (fine-grained query, i. e. fewer data) and external validity and feasi-
bility (coarse-grained query, i. e. more data); for further discussion on the grain
problem, see Manning 2003 and Crocker and Keller 2006.
Granularity is a minor issue for the acceptability judgment test. The test sen-
tences were constructed – in line with standard experimental methodology – in
a consistent manner (see Appendix): In the present study, the wh-adjunct ques-
tions are all quand questions (‘when’ questions) and the wh-object questions are
all qui questions (‘who(m)’ questions).
Although very fine-grained control of constructions is often undesirable in
syntactic corpus queries, here we need to match the corpus constructions more
closely to the test sentences of the acceptability judgment test.
To this end, we have split up wh-adjunct questions into three categories: wh-
reason questions, wh-time questions and other wh-adjunct questions. Each cat-
egory was further subdivided into questions with (phonologically lighter) simple
wh-words (e. g. pourquoi ‘why’, quand ‘when’, où ‘where’, comment ‘how’) and
(phonologically heavier) discourse-linked and/or prepositional wh-expressions
(e. g. pour quelle raison ‘for which reason’, dans quelle pièce ‘in which room’, de
quelle manière ‘which way’). The separation by these categories was motivated
as follows: First, wh-time questions match the test sentences of the acceptability
judgments. Furthermore, there are good reasons to believe that time adjuncts are
placed higher in the syntactic tree than many other adjuncts (e. g. manner, place)
(see e. g. Rigau 2002, who adjoins the former to IP and the latter to VP). Second,
wh-reason adjuncts show a particular behavior in many languages: For example,
only wh-reason questions allow preverbal subjects (as opposed to unmarked
postverbal subjects) in all Spanish varieties (Torrego 1984; Gutiérrez-Bravo 2006;
Adli 2010b). Stepanov and Tsai (2008) argue in a cross-linguistic study that wh-
reason (and wh-purpose) questions differ from other wh-adjuncts by their very
high position in the tree. They place them in a high layer of the CP-system. Third,
all other wh-adjuncts remain aggregated in order to minimize problems of data
scarcity.
With regard to wh-objects, we distinguished (as with wh-adjuncts) between
simple wh-words and D-linked and/or prepositional wh-expressions. In addition,
we distinguished between [+human] and [-human] wh-objects. Please note that
the fine-grained analysis of wh-objects is based on the data of a subsample of
N=48 speakers, unlike the rest of our quantitative analyses, which build on the
results of 101 speakers: Animacy of referential expressions was added to the
annotations later, but only for roughly half of the sample. Nevertheless, 48 speakers
still represent a sufficient sample size. Moreover, since we again compare arithmetic
means of individual relative frequencies below, the different sample sizes are also
unproblematic from a technical perspective. We find 196 instances of [-human]
wh-object questions within this subsample. However, [+human] wh-objects (e. g.
qui ‘who’ or quelle personne ‘which person’) occur only three times and cannot be
analyzed due to data scarcity.4 The extreme frequency discrepancy between [+human]
and [-human] wh-object questions might be – for reasons not yet fully understood –
a general yet surprising property of spontaneous speech: Our findings are in line
with the distribution in the Ottawa-Hull corpus, where Elsig (2009: 157) identified
434 instances of que/quoi ‘what’ compared to only 14 instances of qui ‘who(m)’.

4 We would like to add that indirect wh-questions would not be analyzable either, because they
occur only three times in the entire corpus.

The fine-grained analysis outlined above allows us to construct an exact
match between the test sentences of the acceptability judgments and the data
from spontaneous speech with regard to wh-adjunct questions (with quand).
However, it does not allow a match with regard to wh-object questions because of
the scarcity of [+human] wh-objects in the corpus.
Relative frequencies according to (10b) are calculated based on all eight sub-
types within a category, i. e. four word orders and two weights of the wh-element.5
For example, many wh-time questions can, at least theoretically, be expressed by
one of the four word orders and either by quand or a complex wh-expression.6
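The category-wise calculation can be sketched as follows, assuming a hypothetical data layout in which each speaker's tokens are counted per (word order, weight) cell. Each speaker's denominator is the sum over the eight cells of one category, as described above. How speakers without any token in a category entered the mean is not stated in this section; the sketch simply skips them, which is an assumption. The counts are toy data.

```python
from itertools import product
from typing import Dict, List, Tuple

# The four word orders and two weights of the wh-element that span one category.
ORDERS = ("wh-in-situ", "wh-ESQ", "whSV", "whVS")
WEIGHTS = ("simple", "DL/PP")  # e.g. quand vs. a complex expression such as à quelle heure
Cell = Tuple[str, str]


def cell_relative_frequencies(speakers: List[Dict[Cell, int]]) -> Dict[Cell, float]:
    """Speaker-sample-value (10b) for each of the 4 x 2 cells of one category.

    Each speaker's proportions are computed over all eight cells of the
    category. Speakers without any token in the category are skipped; this
    is an assumption of the sketch, not a documented choice of the study.
    """
    cells = list(product(ORDERS, WEIGHTS))
    sums = {cell: 0.0 for cell in cells}
    contributing = 0
    for counts in speakers:
        total = sum(counts.get(cell, 0) for cell in cells)
        if total == 0:
            continue  # no token of this category: proportion undefined
        contributing += 1
        for cell in cells:
            sums[cell] += counts.get(cell, 0) / total
    if contributing == 0:
        raise ValueError("no speaker produced a token of this category")
    return {cell: value / contributing for cell, value in sums.items()}


# Toy counts for the wh-time category (not the study's data).
toy_time = [
    {("wh-in-situ", "DL/PP"): 3, ("wh-in-situ", "simple"): 1},
    {("whSV", "DL/PP"): 2, ("wh-ESQ", "simple"): 2},
    {},  # a speaker who produced no wh-time question
]
freqs = cell_relative_frequencies(toy_time)
print(round(freqs[("wh-in-situ", "DL/PP")], 3))  # (0.75 + 0.0) / 2 -> 0.375
```

Because the denominator spans both weights, a cell such as whSV with quand is evaluated against all wh-time questions of a speaker, not only against the questions with quand.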

4.5 Results based on the fine-grained corpus queries

Table 2 reveals all in all 37 wh-reason questions, most of them with pourquoi.
We also count 171 wh-time questions, most of which are realized by non-simple
forms (e. g. à quelle heure ‘at which hour’). Furthermore, there are 221 other wh-
adjuncts: The most frequent one in this category is the manner adjunct comment
‘how’, the second most frequent one is the place adjunct où (‘where’). There are
also some non-simple wh-elements, mostly place (e. g. à quel étage, dans quelle
pièce, dans quelle domaine) and manner adjuncts (e. g. de quelle manière, dans
quelles circonstances, en quels termes).

Table 2: Relative and absolute frequencies of different types of wh-adjunct and wh-object questions (cells: relative frequency (absolute number of tokens))

| Word order | wh-reason: pourquoi | wh-reason, +DL/PP | wh-time: quand | wh-time, +DL/PP | wh-VP-adjuncts, –DL | wh-VP-adjuncts, +DL/PP | wh dir. objects, –human: que/quoi | wh dir. objects, –human, +DL |
|---|---|---|---|---|---|---|---|---|
| wh-in-situ | 0.03 (5) | 0.01 (1) | 0.12 (24) | 0.43 (94) | 0.41 (94) | 0.1 (29) | 0.26 (54) | 0.22 (42) |
| wh-ESQ | 0.03 (3) | 0 (0) | 0.04 (11) | 0.01 (3) | 0.02 (7) | 0 (0) | 0.35 (79) | 0 (1) |
| whSV | 0.15 (19) | 0.02 (3) | 0 (1) | 0.1 (26) | 0.28 (73) | 0.02 (7) | 0 (0) | 0.06 (13) |
| whVS | 0.01 (5) | 0.01 (1) | 0 (0) | 0.03 (10) | 0.03 (6) | 0.02 (3) | 0.01 (3) | 0.03 (4) |

5 The rows for the SVwhO, wh-cleft and wh-in-situ cleft variants can be disregarded: With just
four tokens across all wh-types, their relative frequencies are mostly 0.00 (in two cells they are
0.01).
6 This reasoning in terms of true variation of course does not apply to the category of other wh-
adjuncts. The envelope in this case is more approximate.

The coarse-grained analysis in Figure 4 above reveals that the wh-in-situ order is
the most frequent variant for both wh-adjunct and wh-object questions. However,
the fine-grained analysis in Figure 5 reveals two constructions with a different
pattern, both dispreferring wh-in-situ: (i) wh-reason questions with pourquoi are
preferably realized in the whSV form, and (ii) wh-object questions with que/quoi
are preferably realized in the wh-ESQ form.

Figure 5: Relative frequencies of different types of wh-adjunct and wh-object questions

[Line chart. Series: wh-REASON: pourquoi; wh-REASON: +DL or PP; wh-TIME: quand; wh-TIME, +DL or PP; wh-VP-adjuncts, –DL; wh-VP-adjuncts, +DL or PP; wh dir. objects, –human: que/quoi; wh dir. objects, –human, +DL. y-axis from 0.00 to 0.50. x-axis: wh-in-situ, wh-ESQ, whSV, whVS.]

With regard to (i), we notice a clear dispreference of the reason adjunct pourquoi
for the wh-in-situ order. This finding can be seen as corpus-linguistic evidence in
support of the claim made by Stepanov and Tsai (2008), according to which
reason-why is base-generated in a very high CP position. Under their assumption,
we should not find any instance of in-situ reason-why, yet Table 2 contains some
sporadic cases. A closer inspection of these sporadic cases reveals that all except
for one (et ils étaient louches pourquoi? ‘and why were they suspicious?’) are
ambiguous between a reason and a purpose interpretation (e. g. t’es montée pourquoi?
‘why did you go up?’). Thus, the data seem to support the authors’ claim that
reason-why, unlike purpose-why, cannot occur in-situ.
With regard to (ii), we can first state that the high relative frequency of wh-
object questions with wh-ESQ order in Figure 4 is not due to questions with D-
linked wh-elements but to questions with monosyllabic que (recall that Modern
French displays a complementary distribution in wh-object questions with quoi
occurring in-situ and que in all other positions). The est-ce que question particle
occurs mostly with que, which is in line with Elsig (2009) and Druetta (2002,
2003, 2008: 127). Although the absolute frequencies in Table 2 also point to a few
cases of (simple) wh-adjuncts with est-ce que, their relative frequencies remain
minimal. When a variable has variants with relative frequencies close to zero and
at the same time other variants with much higher values, the near-zero-variants
might be unstable phenomena, in which case a “small push” could bring them to
extinction. We conclude that est-ce que is limited to simple, i. e. phonologically
light, mostly monosyllabic wh-forms: The only clearly stable wh-construction
occurring with est-ce que is the wh-object question with que. In the course of its
grammaticalization since its appearance in the sixteenth century (Foulet 1921:
265), est-ce que has lost the meaning of the source construction in wh-questions
(Druetta 2003): Est-ce que wh-questions are no longer emphatic. This source
meaning only remains – to a minor extent – in present-day yes/no questions
(Mosegaard-Hansen 2001: 471). In wh-questions, est-ce que has thus become a
neutral, redundant interrogative particle (redundant because the interrogative
feature is already expressed by the wh-element). The fact that est-ce que is clearly
limited to que in modern spoken French suggests a further change: Est-ce que now
primarily functions as a morpho-phonological host for the wh-clitic que.
Interestingly, que differs from all other French wh-words in that it is not an
independent word but a proclitic requiring a host (Poletto and Pollock 2004). It
can cliticize to a verb (qu’as-tu dit ‘what have you said’), but the whVS order is,
as we assume, not part of colloquial Modern French grammar. Alternatively, it can
cliticize to est-ce que, which is thus the only remaining option for fronting que
in the colloquial variety.
Finally, we notice that the mismatch between acceptability and frequency
for the whSV order shown in Figure 4 would be less pronounced if we took into
account the relative frequency of quand questions (see Figure 5). The very low
relative frequency of whSV quand questions and the low relative frequency of whSV
wh-object questions pattern with the acceptability values of whSV wh-adjunct and
wh-object questions: These judgment values are rather low within the overall
picture of all wh-variants. Furthermore, whSV wh-adjunct questions have a slightly
lower acceptability value than whSV wh-object questions in Figure 4, so the
opposite directionality of the adjunct-object asymmetry in acceptability and
frequency disappears if we restrict ourselves to quand questions in the corpus.

5 Discussion

Having discussed the frequency of wh-ESQ questions in spontaneous speech,
we will now compare their relative frequencies with their acceptability ratings.
Please recall that the test sentences for the acceptability judgment test contain
either the wh-adjunct quand ‘when’ as in (2a) or the wh-object qui ‘who(m)’ as in
(2b). Although wh-ESQ questions with [+human] wh-objects or wh-adjuncts are
extremely scarce or do not occur at all in usage (Table 2), they score high in
acceptability (Figure 4). While the role of est-ce que is essentially limited to
being a cliticization host for que in colloquial French, it remains a broadly
available, optional interrogative particle in wh-questions in standard French. It
is interesting to note, also as an anecdote on standard French prescriptivism, that
est-ce que was not approved by the Académie Française until the 1930s, only to be
“disapproved” again in 1987 (Grevisse 1993: 605/606). Nowadays, the wh-ESQ
question is considered neither elegant nor “popular” from a normative point of
view. Rather, it can be described as neutral. In Section 4.3, this register neutrality
has been represented as the coexistence of both a [+formal] and a [+colloquial]
feature. That being said, how does the normative influence act on the accepta-
bility judgments?
First, the fact that neutral wh-ESQ, like formal whVS, scores highest in
acceptability suggests that normative influence or bias on judgments acts not as
a bonus for the standard variety, but rather as a malus/cost for constructions
that are [+colloquial] only. Consistent with this, colloquial wh-in-situ, the most
frequent variant in usage, is not among the constructions that scored highest.
The acceptability judgments on the highly formal complex inversion ques-
tion – which is not discussed in detail in the present study because it does not
belong to the envelope of variation – also support this assumption. wh-object
questions with complex inversion such as (6b) received a mean acceptability
score of 0.8, and wh-adjunct questions with complex inversion such as (6a)
received a score of 0.94 – yet they hardly ever occur (see Table 1). A compari-
son with the other ratings in Figure 4 reveals that (6a) receives a relatively high
score – irrespective of its particularly high level of formality. Importantly, norma-
tive influence does not explain any categorical difference in terms of acceptabil-
ity vs. unacceptability, but it is one of the factors behind the systematic nuances
within the range of acceptable constructions.
Second, we can observe a difference in the span of registers reflected by
acceptability and frequency data. While frequency data from spontaneous
speech (excluding highly formal speech contexts) provide insight into colloquial
language, acceptability data reflect the entire range of registers available to
a speaker. The results suggest that it is difficult for speakers to judge a variant
that does not exist in register x as unacceptable, as long as it exists in register y.
In other words, speakers seem to accept a construction if it belongs to any
register of their language. However, based on the present results, it is still
unclear whether this phenomenon occurs only if register y is higher than register
x, i. e. whether speakers are unable to disregard only those constructions with a
higher stylistic value. The effect of register spanning in judgments is at least one part of
the explanation as to why certain constructions hardly occur although they are
rated as acceptable. It might also offer a diagnostic tool to distinguish between
diglossia and bilingualism. Bilingualism would be characterized by a better
capacity to keep the languages apart when performing acceptability judgments.
From this point of view, French speakers are diglossic rather than bilingual.
We can thus conclude that acceptability judgments cover a broad range of
registers but are at the same time tinted by norms. The degree of this tinting
depends on the weight of such norms and on how pronounced diglossia is in a
speech community. The results of the present study also suggest that the analogy
often drawn between acceptability judgments and the laws governing the perception
of psychophysical stimuli (Bard et al. 1996) may be an idealization that goes too
far. Many linguists would agree that norms carry considerable weight in France
today, where the “ideology of the standard […] is specially vigorous” (Gadet 2007:
27). Given that normative influences are a sociolinguistic phenomenon, a straight-
forward follow-up question for future research is whether or not this influence is
subject to social variation; if so, this would mean that the relation between accept-
ability and frequency as such is subject to social variation. It has been shown that
acceptability judgments can reflect systematic social differences (Adli 2013). In
statistical terms, this hypothesis would entail an interaction effect: The specific
form of the relation between acceptability and frequency would depend on social factors.
Ironically, acceptability judgments, which have long had a bad reputation in
sociolinguistics, offer interesting yet unexploited potential for sociolinguistic
studies: They contribute to determining the envelope of variation in syntax
(by taking into account acceptable but scarce constructions). Furthermore, they
can help to identify normative influences. A promising path for both sociolinguis-
tic and syntactic research is to work with both types of data, combining forces,
and carefully laying out which aspects each type of data can and cannot reveal.
Finally, the combination of acceptability and frequency can help to analyze syn-
tactic change in progress more precisely: This approach can help to identify con-
structions that are no longer in use, but still exist in a non-colloquial variety of
the speech community.

References
Adli, Aria (2004): Grammatische Variation und Sozialstruktur. Berlin: Akademie Verlag.
Adli, Aria (2006): French wh-in-situ Questions and Syntactic Optionality: Evidence from three
data types. Zeitschrift für Sprachwissenschaft 25(2): 163–203.
Adli, Aria (2010a): Constraint Cumulativity and Gradience: Wh-Scrambling in Persian. Lingua
120(9): 2259–2294.
Adli, Aria (2010b): The Semantic Role of the Wh-Element and Subject Position in Spanish and
Catalan. STUF – Language Typology and Universals 63(2): 103–117.
Adli, Aria (2011a): Gradient Acceptability and Frequency Effects in Information Structure: a
quantitative study on Spanish, Catalan, and Persian. Habilitationsschrift, Universität
Freiburg.

Adli, Aria (2011b): A Heuristic Mathematical Approach for Modeling Constraint Cumulativity:
Contrastive Focus in Spanish and Catalan. The Linguistic Review 28(2): 111–173.
Adli, Aria (2011c): On the Relation between Acceptability and Frequency. In: Esther Rinke and
Tanja Kupisch (eds.), The development of grammar: language acquisition and diachronic
change – In honour of Jürgen M. Meisel, 383–404. Amsterdam/New York: John Benjamins.
Adli, Aria (2013): Syntactic Variation in French Wh-Questions: a quantitative study from the
angle of Bourdieu’s sociocultural theory. Linguistics 51(3): 473–515.
Auger, Julie (1994): Pronominal Clitics in Quebec Colloquial French: a Morphological Analysis.
PhD dissertation, University of Pennsylvania.
Ayres-Bennett, Wendy (1994): Negative Evidence: Or Another Look at the Non-Use of Negative
ne in Seventeenth-Century French. French Studies 48: 63–85.
Baayen, R. Harald, Cristina Burani and Robert Schreuder (1997): Effects of semantic
markedness in the processing of regular nominal singulars and plurals in Italian. In: Geert
Booij and Jaap van Marle (eds.), Yearbook of Morphology 1996, 13–33. Dordrecht: Kluwer.
Backus, Ad and Maria Mos (2011): Islands of (Im)Productivity in Corpus Data and Acceptability
Judgments: Contrasting Two Potentiality Constructions in Dutch. In: Doris Schönefeld
(ed.), Converging Evidence: Methodological and Theoretical Issues for Linguistic Research,
165–192. Amsterdam: John Benjamins.
Bader, Markus and Jana Häussler (2010): Toward a model of grammaticality judgments. Journal
of Linguistics 46(2): 273–330.
Bard, Ellen Gurman, Dan Robertson and Antonella Sorace (1996): Magnitude Estimation of
Linguistic Acceptability. Language 72(1): 32–68.
Behnstedt, Peter (1973): Viens-tu? Est-ce que tu viens? Tu viens ? Formen und Strukturen des
direkten Fragesatzes im Französischen. Tübingen: Narr.
Bybee, Joan L. and David Eddington (2006): A Usage-based Approach to Spanish Verbs of
‘Becoming’. Language 82(2): 323–355.
Chomsky, Noam (1965): Aspects of the Theory of Syntax. Cambridge: MIT Press.
Coveney, Aidan B. (1996): Variability in Spoken French. A Sociolinguistic Study of Interrogation
and Negation. Exeter: Elm Bank.
Coveney, Aidan and Laurie Dekhissi (2013): Variation dans l’emploi des interrogatives partielles
dans le cinéma de banlieue. Paper presented at ‘La syntaxe des interrogatives’, Neuchâtel.
Crocker, Matthew and Frank Keller (2006): Probabilistic grammars as models of gradience. In:
Gisbert Fanselow, Caroline Féry, Ralf Vogel and Matthias Schlesewsky (eds.), Gradience in
Grammar, 227–245. Oxford: Oxford University Press.
Culbertson, Jennifer (2010): Convergent evidence for categorial change in French: From subject
clitic to agreement marker. Language 86(1): 85–132.
Dekhissi, Laurie (in prep.): Un nouveau français populaire? PhD dissertation, University of
Exeter.
Déprez, Viviane, Kristen Syrett and Shigeto Kawahara (2013): The interaction of syntax,
prosody, and discourse in licensing French wh-in-situ questions. Lingua 124: 4–19.
Drijkoningen, Frank and Brigitte Kampers-Manhe (2008): On inversions and the interpretation
of subjects in French. Probus 20(2): 147–209.
Druetta, Ruggero (2002): Qu’est-ce tu fais? État d’avancement de la grammaticalisation de
est-ce que. Première partie. Linguae 2: 67–88.
Druetta, Ruggero (2003): Qu’est-ce tu fais? État d’avancement de la grammaticalisation de
est-ce que. Deuxième partie. Linguae 1: 21–35.
Druetta, Ruggero (2008): La question en français parlé : étude distributionnelle. Torino:
Trauben Edizioni.
Acceptability and frequency in syntactic variation 197

Dufter, Andreas (2008): On explaining the rise of c’est-clefts in French. In: Ulrich Detges and
Richard Waltereit (eds.), The Paradox of Grammatical Change: Perspectives from Romance,
31–56. Amsterdam: John Benjamins.
Elsig, Martin (2009): Grammatical Variation across Space and Time – The French Interrogative
System. Amsterdam/Philadelphia: John Benjamins.
Escandell-Vidal, Victoria (2002): Echo-syntax and metarepresentations. Lingua 112: 871–900.
Featherston, Sam (2005): The Decathlon Model: Design features for an empirical syntax. In:
Stephan Kepser and Marga Reis (eds.), Linguistic Evidence: Empirical, Theoretical, and
Computational Perspectives, 187–208. Berlin/New York: Mouton de Gruyter.
Foster, Jennifer (2007): Real bad grammar: Realistic grammatical description with grammat-
icality. Corpus Linguistics & Linguistic Theory 3(1): 73–86.
Foulet, Lucien (1921): Comment ont évolué les formes de l’interrogation. Romania 47: 243–348.
Freyd, Max (1923): The graphic rating scale. Journal of Educational Psychology 14: 83–102.
Funke, Frederik (2010): Internet-Based Measurement With Visual Analogue Scales: An
Experimental Investigation. PhD dissertation, Universität Tübingen.
Gadet, Françoise (2007): La variation sociale en français (2ème édition). Paris: Ophrys.
Greenbaum, Sidney (1976): Syntactic Frequency and Acceptability. Lingua 40: 99–113.
Greenbaum, Sidney (1977): Judgments of Syntactic Acceptability and Frequency. Studia
Linguistica: Revue de Linguistique Générale et Comparée/Journal of General and
Comparative Linguistics 31: 83–105.
Greenberg, Joseph H. (1966): Language universals: with special reference to feature
hierarchies. The Hague: Mouton.
Grevisse, Maurice (1993): Le bon usage: grammaire française. Paris: Duculot.
Gutiérrez-Bravo, Rodrigo (2006): Structural Markedness and Syntactic Structure: A Study of
Word Order and the Left Periphery in Mexican Spanish. New York/London: Routledge.
Hamlaoui, Fatima (2011): On the role of phonology and discourse in Francilian French
wh-questions. Journal of Linguistics 47(1): 129–162.
Haspelmath, Martin (2006): Against Markedness (and What to Replace It With). Journal of
Linguistics 42(1): 25–70.
Kayne, Richard S. and Jean-Yves Pollock (1978): Stylistic inversion, successive cyclicity and
move-NP in French. Linguistic Inquiry 9: 595–621.
Kempen, Gerard and Karin Harbusch (2005): The Relationship between Grammaticality Ratings
and Corpus Frequencies: A Case Study into Word Order Variability in the Midfield of
German Clauses. In: Stephan Kepser and Marga Reis (eds.), Linguistic Evidence: Empirical,
Theoretical and Computational Perspectives, 329–349. Berlin/New York: Mouton de
Gruyter.
Kempen, Gerard and Karin Harbusch (2008): Comparing Linguistic Judgments and Corpus
Frequencies as Windows on Grammatical Competence: A Study of Argument Linearization
in German Clauses. In: Anita Steube (ed.), The Discourse Potential of Underspecified
Structures, 179–192. Berlin/New York: de Gruyter.
Labov, William (1982): Building on empirical foundations. In: Winfred P. Lehmann and Yakov
Malkiel (eds.), Perspectives on Historical Linguistics, 17–92. Amsterdam/Philadelphia:
John Benjamins.
Labov, William (1996): When intuitions fail. In: Lisa McNair, Kora Singer, Lise M. Dobrin and
Michelle M. AuCoin (eds.), Papers from the Parasession on Theory and Data in Linguistics.
Chicago Linguistic Society 32, 77–106. Chicago: Chicago Linguistic Society.

Lambrecht, Knud (2001): A framework for the analysis of cleft constructions. Linguistics 39(3):
463–516.
Manning, Christopher D. (2003): Probabilistic Syntax. In: Rens Bod, Jennifer Hay and Stefanie
Jannedy (eds.), Probabilistic Linguistics, 289–341. Cambridge: MIT Press.
Meisel, Jürgen M., Martin Elsig and Matthias Bonnesen (2011): Delayed Acquisition of Grammar
in First Language Development: Subject-Verb Inversion and Subject Clitics in French
Interrogatives. Linguistic Approaches to Bilingualism 1(4): 347–390.
Mosegaard-Hansen, Maj-Britt (2001): Syntax in interaction: Form and function of yes/no
interrogatives in spoken standard French. Studies in Language 25(3): 463–520.
Newmeyer, Frederick J. (2010): Accounting for rare typological features in formal syntax: three
strategies and some general remarks. In: Jan Wohlgemuth and Michael Cysouw (eds.),
Rethinking Universals, 195–221. Berlin/New York: Mouton de Gruyter.
Poletto, Cecilia and Jean-Yves Pollock (2004): On wh-clitics and wh-doubling in French and
some North Eastern Italian dialects. Probus 16(2): 241–272.
Poplack, Shana (1989): The care and handling of a mega-corpus. In: Ralph Fasold and Deborah
Schiffrin (eds.), Language Change and Variation, 411–451. Amsterdam: John Benjamins.
Prieto, Pilar and Gemma Rigau (2007): The Syntax-Prosody Interface: Catalan interrogative
sentences headed by que. Journal of Portuguese Linguistics 6(2): 29–59.
Pullum, Geoffrey K. (2007): Ungrammaticality, rarity, and corpus use. Corpus Linguistics &
Linguistic Theory 3(1): 33–47.
Reis, Marga (1991): Echo-w-Sätze und Echo-w-Fragen. In: Marga Reis and Inger Rosengren
(eds.), Fragesätze und Fragen. Referate anlässlich der Jahrestagung der Deutschen
Gesellschaft für Sprachwissenschaft, Saarbrücken 1990, 49–76. Tübingen: Niemeyer.
Rigau, Gemma (2002): Els complements adjunts. In: Joan Solà, Maria-Rosa Lloret, Joan Mascaró
and Manuel Pérez-Saldanya (eds.), Gramàtica del català contemporani, 2045–2110.
Barcelona: Empúries.
Rijkhoff, Jan (2010): Rara and grammatical theory. In: Jan Wohlgemuth and Michael Cysouw
(eds.), Rethinking Universals, 223–239. Berlin/New York: Mouton de Gruyter.
Sampson, Geoffrey R. (2007): Grammar without grammaticality. Corpus Linguistics & Linguistic
Theory 3(1): 1–32.
Sobin, Nicholas (2010): Echo Questions in the Minimalist Program. Linguistic Inquiry 41(1):
131–148.
Stefanowitsch, Anatol (2008): Negative entrenchment: A usage-based approach to negative
evidence. Cognitive Linguistics 19(3): 513–531.
Stepanov, Arthur and Wei-Tien Dylan Tsai (2008): Cartography and licensing of wh- adjuncts: a
cross-linguistic perspective. Natural Language & Linguistic Theory 26(3).
Torrego, Esther (1984): On inversion in Spanish and some of its effects. Linguistic Inquiry 15(1):
103–129.
Trubetzkoy, Nikolaus (1939): Grundzüge der Phonologie. Göttingen: Vandenhoeck & Ruprecht.
Zribi-Hertz, Anne (2010): Pour un modèle diglossique de description du français: quelques
implications théoriques, didactiques et méthodologiques. Journal of French Language
Studies FirstView: 1–26.
Zubizarreta, Maria Luisa and Jean-Roger Vergnaud (2005): Phrasal Stress, Focus, and Syntax.
In: Martin Everaert and Henk van Riemsdijk (eds.), The Syntax Companion, Vol. 3, 522–568.
Malden: Blackwell.

Appendix: Experimental material of the gradient acceptability judgment test

reference sentence: Tous ont regardé qui ?

WH-in-situ (1a)/(1b)
  wh-adjunct: Tu allumes le feu quand ? / Tu nettoies la cuisine quand ? / Tu enlèves ton pansement quand ?
  wh-object: Tu emmènes qui en vacances ? / Tu reçois qui dans ton bureau ? / Tu félicites qui à la cérémonie ?

WH-ESQ (2a)/(2b)
  wh-adjunct: Quand est-ce que tu rends ton livre ? / Quand est-ce que tu récupères ta voiture ? / Quand est-ce que tu prends ton médicament ?
  wh-object: Qui est-ce que tu rejoins à la piscine ? / Qui est-ce que tu amènes à la maison ? / Qui est-ce que tu invites au cinéma ?

WHSV (3a)/(3b)
  wh-adjunct: Quand tu finis ton projet ? / Quand tu achètes le vélo ? / Quand tu signes le contrat ?
  wh-object: Qui tu sers à table ? / Qui tu accueilles chez toi ? / Qui tu soutiens aux élections ?

WHVSclit (4a)/(4b)
  wh-adjunct: Quand manges-tu le gâteau ? / Quand ouvres-tu le cadeau ? / Quand peints-tu la façade ?
  wh-object: Qui vois-tu cet après-midi ? / Qui attends-tu chaque lundi ? / Qui remplaces-tu demain ?

SVWHO (5)
  wh-adjunct: Tu continues quand le repassage ? / Tu jettes quand la poubelle ? / Tu évalues quand les résultats ?

WH-cleft (8a)/(8b)
  wh-adjunct: Quand c’est que tu remplis le formulaire ? / Quand c’est que tu écris ton livre ? / Quand c’est que tu répares la moto ?
  wh-object: Qui c’est que tu entends dans ce hall ? / Qui c’est que tu conduis à l’aéroport ? / Qui c’est que tu déranges à la bibliothèque ?

items from the instruction phase:
  Qui ont-ils tous regardé ?
  Qui tous ont regardé ?
  Tous ont regardé qui ?
  Tous sont regardés qui ?
  Que sont-ils tous regardés qui ?

items from the training phase:
  Qui c’est que tu accompagnes à la gare ?
  Que contrôle quel douanier à la frontière ?
  Tu fermes la porte quand ?
  Dis-moi, Jérémie a pas balayé quoi ?
Part 3: Grammar, evolution, and diachrony
Hubert Haider, University of Salzburg
“Intelligent design” of grammars – a result of cognitive evolution¹

Abstract: It is astounding how closely the short history of linguistics replicates
intellectual hurdles that evolutionary biology had to overcome in its “childhood”.
Until the end of the 19th century, the major divide in biological theorizing was
the divide between structuralism and functionalism. The historical arguments are
strikingly parallel to present day disputes between linguistic functionalists and
generative structuralists. The terminology and argumentation are nearly identical.
This is not surprising since the basic problem is identical, too, namely the ques-
tion of how to explain the “intelligent design” of complex, well-adapted, self-rep-
licating systems in interaction with their environment. In biology, the dispute
has never been resolved. In hindsight we understand that the intellectual combat
between functionalists and structuralists could not have been won by either
party because neither party was right. The dispute turned out to be completely
irrelevant after Darwin’s theory of evolution gained ground. This theory – adap-
tation as an emergent property of natural selection – explained how form and
function are entangled. However, it is not “form in order to function” but “form
that happens to function”. Darwin (1871: 59) had already realized that evolution is
not substance-bound and that there is a parallel between the biological evolution
of species and the diachronic development of languages in terms of adaptation
as a consequence of random variation and non-random but “blind” selection. It
is undeniable that human language grammars are adaptive, but this is neither a
product of biological evolution nor of social engineering. It is the result of evo-
lution on the level of cognitive, self-replicating systems. Grammars are neither
merely biologically grounded nor merely human artefacts. They are results of
an ongoing process of cognitive evolution. Linguistics has not yet arrived at firm
scientific ground. Functionalists (“form follows function”) compete with structuralists
(autonomy of structures; innate programs). The functionalist schools
tend to downplay the strong formal system boundaries while the structuralist

1 I am very much indebted to Fritz Newmeyer and Jon Ringen for a careful and critical reading of
a draft version of this paper (previous title: “Cognitive evolution – why language systems are …”).
Their criticism, encouragement and suggestions have been very instrumental. Thanks galore to
Göz Kaufmann for re-checking the final version. Any remaining shortcomings are to be blamed
on the author, of course.
204 Hubert Haider

schools diligently de-emphasize the role of adaptive properties (in acquisition,
production and perception) in language design. Neither of the two qualities may
be ignored, however, since they are indispensable for an understanding of the
properties of natural language grammar systems. The main claim of this paper
is the following: Evolution by natural selection is substance-independent, as
Darwin (1871: 59) already saw. A process of evolution is at work not only for bio-
logical organisms but also for cognitive “organisms”. In other words, the adaptive
properties of grammars are a consequence of cognitive evolution in the variation
+ selection game of a (substance-neutral) Darwinian evolution, with the auto-
nomous processing capacities of the brain as the selection filter for variants of lin-
guistic structures (and in the long run, variants of the grammar) in the variation
pool of a given language. Cognitive evolution is evolution on the level of cognitive
structures (rather than on the level of biological structures, viz. the genome in
biological evolution). Grammars are self-replicating systems (that replicate them-
selves in the course of grammar acquisition). This process is prone to generate
variants (“mutations”). The variants “compete” for restricted resources (viz. the
number of brains that are “infected” by a given grammar variant, whose “ease” of
processing during acquisition and use is the major selection factor). Adaptation is
the product of constant but “blind” selection. Biological and cognitive evolution
are identical in terms of the abstract processes (self-replication, variation plus
selection), but they clearly differ in terms of their domains of application. In sum,
the descent of species and the descent of languages encompass the same abstract
mechanism (self-replication, variation, selection) in two different domains.
Darwin has opened our eyes to the domain of biology, but he was also aware of
the fact that the mechanism of evolution is applicable to many other domains as
well. One of these domains is the domain of grammars as cognitive organisms
residing in our brains. The basic issue of the dissent between linguistic functionalists
and (“transcendental”) structuralists today is the same as in nineteenth-century
biology, namely the explanation of adaptive design. The path leading to
the correct answer is unnecessarily blocked by the same fallacies that are found
in the corresponding dispute in biology.
“Intelligent design” of grammars – a result of cognitive evolution 205

1 Introduction

We have always been functionalists, from ancient times² until today: Why do sea-dwelling
mammals have fins for limbs? – In water, fins are more useful than legs.
Why do languages employ acoustic signs? – In order to be independent of sight
contact. Why are we fond of functional explanations? – Because we are social
animals whose minds apparently have a predominant disposition for analyzing
complex situations in terms of actors, intentions and purposes.
Functionalism is a deeply entrenched and instinctively attractive common
sense perspective on complex design. The appropriate scientific perspective is
intuitively less accessible, as revealed in the history of science. A fairly recently
lost bastion of functionalism is life science. An initially entirely functionalist
standpoint was given up for a less anthropocentric but more explanatory account,
namely adaptation by evolution. The functionalist predisposition is character-
istic of our explanation-seeking mind but we must not project this on the object
of our scientific enquiry. Our understanding is functionalist; the ontology of the
objects of enquiry is not.
Linguistics is faced with the very same problem that Darwin solved. The
basic question was and is this: What explains functional design in the absence
of a designer? In Dawkins’ (1996) words, the watchmaker in evolution is blind,
but his products work aptly. There is “intelligent” design, but there is no
intelligence that designed it. This reads like a paradox and the anti-Darwinian
camps regarded this as a fatal defect of Darwin’s idea of evolution as fed by
random variation. Intuitively, order out of random variation seems to contradict
experience and the second law of thermodynamics. How could random processes
produce order rather than chaos? What this intuition overlooks is this: Variation
(“mutation”) indeed enhances entropy, but selection is the antagonistic feature.
It eliminates most of the variation. “Natural selection acts as a sieve; it does not
single out the best variations, but it simply destroys the larger number of those
which are, from some cause or another, unfit for their present environment” (De
Vries 1909: 70).
If order and complexity emerge without an organizing ordering force, this
is a result of evolution. The order parameters that happen to emerge are a non-
random result of selection processes (see Heylighen (1999) on emerging com-

2 Why do celestial bodies move? In Aristotle’s view, the source of movement of the outer sphere
that triggers the movements of all inner spheres is not the causal “unmoved movent” as defined
in Physics (VIII). It is “aspiration” [sic] as a final cause (causa finalis); see Metaphysics 1074b, 34.

plexity). What we perceive as problem solving is an emergent property and not a
collectively engineered solution, as functionalism would have it.
The nineteenth-century teleologists – i. e. functionalists – regarded adaptation
and the fit of the organisms to their environment as the primary issue of
research. Their opponents, the morphologists – i. e. structuralists – regarded
commonalities of structure as the primary issue, thereby rejecting the centrality
of adaptive properties (Amundson 1998: 154). “The most important and widespread
biological debate around the time of Darwin was not evolution versus
creation, but biological functionalism versus structuralism” (Amundson 1998:
153). “The primary theoretical goal […] was the explanation of biological form. […]
The continental biologists favoured structural explanations, the British favoured
functional explanations. Functional facts seemed concrete and empirical to the
British, and in comparison the continental structuralist theories (positing
hypothetically-inferred unities) seemed transcendental”³ (Amundson 1998: 171).⁴
In our linguistic context, the functionalist conviction reads as follows.
Language structures (i. e. forms determined by grammar) are the way they are
because this is a function of requirements of language use. A structuralist, on
the other hand, would deny that language structure could be explained solely by
reference to the functions for which the structures are employed. For a structural-
ist, the particular properties of grammars cannot be reduced to conventionalized
routines of usage.
Newmeyer (1998, 2001, 2005 and elsewhere) distinguishes two kinds of
functionalist approaches, namely atomistic vs. holistic functionalism. Accord-
ing to the stricter conviction, namely atomistic functionalism, there is always a
direct causal link between functional motivations and particular properties of
grammars (Newmeyer 2005: 174). Holistic functionalism, on the other hand, is
not committed to any such direct linkage but locates the subtle influence of the
former on the latter in language use and acquisition. Its shaping force becomes

3 “Major transcendentalist figures include Johann Wolfgang von Goethe, Etienne Geoffroy St.-
Hilaire, Louis Agassiz, and Richard Owen. Each advocated the primacy of structure or form over
function, of the unity of type over the conditions of existence”. (Amundson 1998: 155; emphasis
mine)
4 “Many functionalists see rejection of generative theory as a fundamental component of
functionalism” (Dryer 2007: 245). Mainstream generative grammar, with the numerous hypothetical,
hidden unities it employs (e. g. overt and covert movement, covert lowering, remnant movement,
roll-up movement, strong or weak features, overt or covert checking of features, etc.), is a good
example of (transcendental) structural explanation attempts. With or without these
transcendental (i. e. empirically untestable) amendments, an account in terms of usage-independent
principles that determine structure and form would not be accepted as explanation by functionalists.

visible in language change (Newmeyer 2005: 175). For Newmeyer, atomistic functionalism
does not withstand thorough empirical testing. He opts for the explanatorily
weaker theoretical position, namely holistic functionalism.
Dryer (2007: 247) objects to Newmeyer’s retreat to this weaker compromise position,
as it “seems to exclude an intermediate position, that in at least certain cases, a
property of a particular grammar is directly motivated by some functional con-
sideration, but that the locus of this functional explanation was at the level of
historical change. Such a position seems to be a coherent one and is likely to
be widely held by functionalists. […] In other words, Newmeyer’s characteriza-
tion obscures the distinction between two different issues; that is, it conflates
the questions whether there is a direct link between functional explanation and
grammatical properties and whether the locus of functional explanation is at the
level of historical change or somewhere else (such as at the level of language
usage)”.
What is meant by a direct link is a causal relationship between a functional
aspect and a grammatical property in a functional explication of the grammati-
cal property. Obviously, Dryer is willing to partially accept the stronger version,
i. e. atomistic functionalism, and considers it scientifically and empirically
appropriate.
Givón (undated: 7) is equally categorical on this issue, favoring atomistic
functionalism when he refers to “roughly isomorphic matching”: “The process of
change itself, the invisible teleological hand that guides the ever-shifting but still
roughly-isomorphic matching of structures and functions, is driven by adaptive
selection, i. e. by functional-adaptive pressures”.
Givón (undated: 1) presents Aristotle as the founding father of functional-
ism: “Aristotle outlined the governing principle of functionalism, the isomorphic
mapping between form and function”. He explicitly refers to, and emphasizes,
the teleological hand and functional-adaptive pressures. But, in reality, there is
no pressure and no pressure generating device, and there is no teleological hand,
as will be argued below.
The essential drawbacks of this kind of functionalism are exactly these two
leading ideas, namely the “invisible hand” that designs more adaptive forms and
the “pressure” on improving functionality by adopting these forms. An expla-
nation based on the notion – future functionality drives present changes – is
invalid, however. This insight was established in biology more than a century
ago. Functionalist “explanations” are appealing narratives that please our func-
tion-addicted mind, but these narratives are what they are, namely narratives
rather than scientific explanations.
The scientifically correct core of these narratives is the purpose-free process
of evolution. There is no teleology, no final causes (i. e. in the sense of causa

finalis), and no pressure; there is merely variation and selection from constantly
being exposed to a given environment. Darwin’s breakthrough is a scientific
theory of adaptation without any functionalist narratives, a theory of evolution
with purpose-free random variation and purpose-free non-random selection as
major components. What appears to be functionality driven is but the emergent
effect of adaptation by selection. Final causes are not part of this system; they are
merely part of the perceptual filter through which many researchers perceive their
simplified world.
Of course, when biologists tell popular short-cut versions of examples of
evolution, they often talk as if they are telling a functionalist narrative, but this is
merely a façon de parler. They implicitly understand that a functionalist render-
ing is easier to grasp. The basic story, however, is a causal explanation in terms of
variation and selection. In a profound study on Darwin and adaptive design, Ruse
(2003) analyzes the pitfalls of our commonsensical desire for functional under-
standing of complex adaptive design. Our common sense is creationist; it prefers
an engineering perspective on design and our favorite approach for understand-
ing it is (invalid) functionalist reasoning.
Darwin expelled the “argument from design” from biology. His theory does
not have adaptive design built in as a premise; it emerges when evolution does
its work. Evolution presupposes an independently structured system that is replicating.
This is the structural side.⁵ Adaptation by selection covers the apparent
functionalist side of evolutionary developments.
This paper contends that in linguistics, we foster the same fallacies as biologists
did more than a century ago. The point is not structuralism vs. functionalism.
It is “form meets function by means of evolution”. For grammar theory,
this mode of explanation is novel. But of course it is not novel at all, since it is the
standard mode of explanation in biology. The orthogonal viewpoints – structure-
geared vs. function-driven – are wrong if maintained in isolation or as opponent
positions. It is the synthesis in the concept of cognitive evolution that does justice
to the correct insights of each of the competing standpoints, without their respec-
tive drawbacks.
Here is a non-linguistic example to start with: The rhinovirus that success-
fully recruited me as a host organism follows no teleology and is not pushed by
any functional-adaptive pressure. It just happens to be a virus variant that my
immune system is not prepared for. It might have successfully blocked other vari-

5 Here is a simple example. Flying has been ‘invented’ several times in several distinct forms,
e. g. bats, birds, bumblebees, etc. In each case, the predecessor structures originally had served
different functions (e. g. thermoregulation).

ants, but not this one. The success of the rhinovirus family lies in its variability. This
feature guarantees its ability to infect hosts regularly and successfully, and to spread.
If one describes this as an arms race between the “attacking” virus and the
“defending” immune system, its description will be a functionalist narrative. This
narrative is easy to grasp but misleading in a crucial respect.
It suggests a causal functional relation that is not causal at all: Indeed, the
virus changes its appearance quickly and regularly, but not in order to be able to
outfox the immune defense and it is under no functional pressure to do so. The
functionality is not programmed in, it is emergent. If it did not have the property
of changing frequently, it would not be a successful rhinovirus. In other words,
what we see is adaptation, which, according to Darwin, is the exact opposite of a
functionalist narrative, and what we misapprehend by overinterpreting it is a
conjectured functional causality. There is adaptivity without teleology. The virus
does not change in order to spread; it spreads, because it changes.
The rhinovirus is successful and its recipe for success is rapid change. It is
this property which proves successful in the selection process that we interpret
as an arms race between virus propagation and virus elimination by the
potential hosts’ immune systems. The immune system, on the other hand, is the
cause for the particular property of the virus, since the virus is adapting to it due
to selection. Our immune system spurs the rhinovirus into becoming the kind of
virus it is now. Neither the virus nor my immune system anticipates the other.
Although this is an accurate account, it does not fit into our functionalist
narrative scheme. In this scheme, there is always an agent that is coerced into
doing this or that in order to gain this or that. In these narratives there is no role
for an immune system that is as it is (due to an independent line of evolution)
and a virus that changes frequently. Neither the immune system nor the virus
would qualify as teleological agents, but their interaction can often be described
as if they were. If a virus is said to be “under functional pressure” to change, our
commonsensical understanding of functional connections is happy with it, but
it is merely an appealing metaphor. If the virus did not change, it would not have
the chance to spread, and this would be the end of the story. What we see is a
virus population that spreads. Hence it happens to have this property, but not “in
order to” spread. Crucially, our functionalist tunnel vision does not perceive the
numerous other virus populations that lost their chance to spread because they
were sieved out. The “in order to”-supposition is the superfluous and misleading
ingredient we add. If the virus did not have the property of being able to outwit
human immune reactions by constantly changing, it would not spread. Hence,
our supposition is that it changes in order to be able to spread. This is the misattribution
of a teleological component to a complex situation, simply because it
is easiest for our common-sense understanding of adaptivity.
210 Hubert Haider

Our functional narrative is one of progress-seeking, problem-solving changes.
It claims that A changed into, or was replaced by, B because B is a functionally
superior structure and a successful step on the staircase leading to better solutions.
This is a misperception, however. If we have two types, A and B, and B is
selectively favored over some time period, this does not have to do with how well
B is doing in absolute terms, but with how well it is doing in relation to A (Millstein
2006: 648). The change may in fact be a step leading to an impasse. Both
A and B may be inferior to C, and A may be closer to C than B, but C happens
not to belong to the set of variants at this time in this population. The diachrony of
grammar change is full of suboptimal solutions that are local maxima; that is,
the particular solution implemented for a subsystem of grammar is suboptimal
for the complete ensemble of grammar. This is what evolution predicts. It is
“bricolage” (i. e. tinkering), in Jacques Monod’s (1971) words.
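The point about relative fitness (Millstein 2006) can be made concrete with textbook discrete replicator dynamics. The following Python sketch is an illustration under assumptions, not a model proposed in this chapter: the variant labels A, B, C and their fitness values are hypothetical. It shows that (i) only fitness ratios matter, since rescaling all fitness values leaves the outcome unchanged, and (ii) selection can only sort variants that are present, so a better variant C with a share of 0 never appears.

```python
from __future__ import annotations

import math


def replicator_step(shares: dict[str, float], fitness: dict[str, float]) -> dict[str, float]:
    """One generation of discrete replicator dynamics.

    Each variant's share is weighted by its fitness and renormalized by the
    population's mean fitness, so only fitness *ratios* affect the outcome.
    """
    mean_fitness = sum(shares[v] * fitness[v] for v in shares)
    return {v: shares[v] * fitness[v] / mean_fitness for v in shares}


def run(shares: dict[str, float], fitness: dict[str, float], generations: int) -> dict[str, float]:
    """Apply replicator_step repeatedly and return the final shares."""
    for _ in range(generations):
        shares = replicator_step(shares, fitness)
    return shares


# Hypothetical values: C would be the best variant, but it is absent at the start.
start = {"A": 0.9, "B": 0.1, "C": 0.0}
fitness = {"A": 1.0, "B": 1.1, "C": 2.0}
end = run(start, fitness, 200)

# Halving every fitness value changes absolute "quality" but not the ratios.
halved = {v: w / 2 for v, w in fitness.items()}
end_halved = run(start, halved, 200)
same_trajectory = all(math.isclose(end[v], end_halved[v]) for v in end)

# With even a tiny initial share, C would take over.
end_with_c = run({"A": 0.89, "B": 0.1, "C": 0.01}, fitness, 200)

print({v: round(s, 4) for v, s in end.items()})
```

In this toy setting B ends up near fixation merely because it beats A, while C stays at 0; B is a local maximum only because C was never in the pool of variants.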
Our willingness to accept functional narratives as scientific accounts reveals
our profound misunderstanding of the true causality of adaptivity, namely blind
variation, as in the family of Picornaviridae (i. e. the family of viruses of which
the rhinoviruses are a sub-branch), and blind selection as in the case of viruses
that are killed by our immune systems. The emergent property is a virus with a
complex adaptive “behavior”. It did not adapt by functional pressure in a
teleological conspiracy; it is simply the variant that is not eliminated. It has not
been sieved out, in De Vries’ wording above. In retrospect, it is the best-adapted
variant, and only in retrospect can we tell our functional narratives: Had the virus
not changed in this or that sequence, it would not have prevailed. And linguistics
is full of functionalist narratives, too. You can easily identify them by this
recurrent shibboleth: If a given language did not have property A, it could not
function in the way B. In the following section, I shall argue that functionalist
explanations cannot be regarded as satisfactory scientific explanations. A
functionalist perspective on grammar theory may be a useful heuristic, as it is in
biology, but it does not qualify as a full-fledged scientific explanation. A
functionalist explanation is basically a flawed explanation since it presupposes backward
causation. It must be replaced by the empirically adequate notion of adaptation by
evolution. The relevant domain of evolution, I will argue, is cognitive evolution.
This paper advances and elaborates on ideas that were first published in
Haider (1998) and Haider (2001) and adduced in Haider (2013) to explain the adaptive
design of grammars. It is organized as follows: Section 2 reviews the fact that
there is no logically valid deduction scheme for functional explanations.
It has been acknowledged in the theory of science since the 1950s that functional
explanations do not qualify as full-fledged scientific explanations and are not on
a par with causal explanations. Section 3 explicates the parallels between cognitive
evolution and Darwinian biological evolution, and section 4 illustrates cognitive
evolution with linguistic examples. Section 5 briefly examines attempts
at compromising between a Darwinian and a functionalist Lamarckian approach
to language and grammar change. A summary and outlook section reviews the
presented assumptions and some of their implications.

“Intelligent design” of grammars – a result of cognitive evolution 211

2 Functionalist explanations miss the benchmark

Haider (1998) addressed the issue with the intentionally ambiguous title “Form
follows function fails”. This can be read both as “[form follows function] fails” and
as “form follows, function fails”. The combined message is this: The idea
that form follows from function is going to fail, since form follows, but functions
may fail. Purpose or potential for future use does not explain a design, except
for intentionally designed tools. For self-replicating systems with “intelligent
design”, functions may be used to describe them but not to explain the causality
of their design. This has become a commonplace in the theory of science since
the classical work of Hempel and Nagel (Cummins 1975: 742). Functional explanations
are not causal explanations.

Either a property P of a system S is taken to be necessarily present to guarantee a function
F, then this premise is empirically incorrect. F could be guaranteed by alternative means (cf.
Nagel 1961: 403). Or, the presence of P is taken to be a sufficient condition for F, then the
inference from the function F to the necessary presence of P in S is not valid. All we may
infer is that the presence of F contributes to a function. (Hempel 1959: 310)

The grammars of natural languages are a good source of examples of alternative
means of implementing identical functions. For instance, “parts of speech” may
be identified by morphological means (affixes), particles, or word order. Infor-
mation structure properties, like focusing, may be coded by particles or word
order or both, or merely by intonation. In each case, an identification function is
implemented in different ways by different forms.
Why is a functional explanation not causal? In a causal explanation, we
hypothesize that a cause C produces an effect E. Whenever C applies, we expect E
(ceteris paribus). The inference from C to E is valid.6 Conversely, from the
absence of E we infer the absence of C. Finally, if C applies and E is absent, the
particular causal explanation is wrong. In sum, we regard C as an explanation
for E. In a functional explanation, we hypothesize that a system contains a
functionally characterized item F in order to produce a result R. Here, we regard R as
the cause for F. But in this case, R cannot be a causal explanation for F (because
this would amount to backward causation). The inference from R to F, given R, is
not valid. And there are indeed cases in which R is given, but F is not (because
R may be the effect of F′ ≠ F). We cannot readily falsify a functional explanation,
either. Modus tollens7 cannot be applied: If we correctly describe a given result R
and hypothesize a function F as the prerequisite for R, and it turns out that our
predicted F is empirically wrong, this has no consequence for R.8 Of course, the
inference in the other direction, viz. from F to R, would be the familiar causal
explanation of R as the effect of F, but its functionalist inverse is not. For
functional reasoning, there is no logically valid mode of inference. Cummins (1975:
765), who tries to defend some version of functional analysis, offers this as his
conclusion:

When a capacity of a containing system is appropriately explained by analyzing it into
a number of other capacities whose programmed exercise yields a manifestation of the
analyzed capacity, the analyzing capacities emerge as functions. Since the appropriateness
of this sort of explanatory strategy is a matter of degree, so is the appropriateness of
function-ascribing statements.

6 It is reasoning by modus ponens: [C → E] and C, hence E.
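The validity claims about causal and functional inference, including the schemata in footnotes 6–8, can be checked mechanically with a two-variable truth table. The Python sketch below is an illustration, not part of the original argument; the helper names (`implies`, `is_valid`, `FORMS`) are hypothetical. An argument form counts as valid if no assignment of truth values makes all premises true and the conclusion false.

```python
from __future__ import annotations

from itertools import product
from typing import Callable

Prop = Callable[[bool, bool], bool]


def implies(p: bool, q: bool) -> bool:
    """Material conditional p -> q."""
    return (not p) or q


def is_valid(premises: list[Prop], conclusion: Prop) -> bool:
    """Valid iff no truth assignment makes every premise true and the conclusion false."""
    for a, c in product([True, False], repeat=2):
        if all(premise(a, c) for premise in premises) and not conclusion(a, c):
            return False
    return True


# a = antecedent of the conditional (C in fn. 6, R in fn. 7, F in fn. 8)
# c = consequent (E in fn. 6, F in fn. 7, R in fn. 8)
FORMS: dict[str, tuple[list[Prop], Prop]] = {
    "modus ponens": ([lambda a, c: implies(a, c), lambda a, c: a], lambda a, c: c),
    "modus tollens": ([lambda a, c: implies(a, c), lambda a, c: not c], lambda a, c: not a),
    "affirming the consequent": ([lambda a, c: implies(a, c), lambda a, c: c], lambda a, c: a),
    "denying the antecedent": ([lambda a, c: implies(a, c), lambda a, c: not a], lambda a, c: not c),
}

results = {name: is_valid(premises, conclusion) for name, (premises, conclusion) in FORMS.items()}
for name, valid in results.items():
    print(f"{name}: {'valid' if valid else 'invalid'}")
```

Running it reports modus ponens and modus tollens (the schemata of footnotes 6 and 7) as valid and the two forms in footnote 8 as invalid.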

Let us be more concrete and analyze a linguistic example. A functionalist explanation
of “extraposition” usually refers to the increased ease of processing for
structures with extraposed material. In numerous languages, phrases may
optionally be placed at the very end of the containing phrase or clause.9 Evidently,
this facilitates the parsing of otherwise center-embedded structures by
reducing the load on working memory.
A functional explanation postulates that “extraposition” is the active adaptation
of the grammar by its users for the benefit of parsing; cf. Hawkins’ (1994)
“early immediate constituents” (EIC) measure as a functionalist trigger scenario.
The functionalist explanation of extraposition is this functionality in usage.
Grammars are said to provide extraposition “in order to” avoid center embedding
since this is well known to impede parsing. The language users are said to have
shaped this property of their grammar for the indicated functionality.
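To make the “functionalist trigger scenario” concrete, here is a minimal Python sketch of an EIC-style ratio in the spirit of Hawkins (1994): the number of immediate constituents (ICs) divided by the number of words needed to recognize them. It rests on simplifications that are assumptions, not Hawkins’ full procedure: each IC is taken to be recognized by its first word, only the clause level is scored, and the extraposed relative clause is treated as a third IC of the clause. The sentence is a simplified version of the example in footnote 9.

```python
from __future__ import annotations


def eic_ratio(ics: list[list[str]]) -> float:
    """IC-to-word ratio over a constituent recognition domain (CRD).

    Simplifying assumption: each immediate constituent (IC) is recognized by
    its first word, so the CRD runs from the first word of the first IC up to
    and including the first word of the last IC.
    """
    if not ics or any(not ic for ic in ics):
        raise ValueError("need at least one non-empty immediate constituent")
    crd_words = sum(len(ic) for ic in ics[:-1]) + 1
    return len(ics) / crd_words


# Clause-level ICs for a simplified sentence based on footnote 9.
in_situ = [
    "The city that the flood destroyed".split(),  # subject NP with relative clause
    "has been partly rebuilt".split(),            # VP
]
# Hypothetical analysis: the extraposed relative clause counts as a third IC.
extraposed = [
    "The city".split(),
    "has been partly rebuilt".split(),
    "that the flood destroyed".split(),
]

print(f"in situ:    {eic_ratio(in_situ):.2f}")     # 2 ICs / 7 words
print(f"extraposed: {eic_ratio(extraposed):.2f}")  # 3 ICs / 7 words
```

On these assumptions the extraposed order scores higher (about 0.43 vs. 0.29), which is the kind of processing advantage the functionalist account appeals to. Such a score describes a benefit; it does not say why a particular grammar licenses extraposition.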

7 [[R → F] & ¬ F], hence ¬ R.
8 [[[F → R] & R], hence F] is an invalid inference, and so is [[[F → R] & ¬F], hence ¬R].
9 Example: [The city [that the flood [that Hurricane Katrina had triggered] destroyed]] has been
partly rebuilt.
⇒ The city that the flood destroyed that Hurricane Katrina had triggered has been partly rebuilt.
⇒ The city has been partly rebuilt that the flood destroyed that Hurricane Katrina had triggered.

A formal account of extraposition does not deny its functionality, but the
functionality is an epiphenomenon. The primary issue is explaining how a
grammar with extraposition differs from a grammar without extraposition.
The explanation of the extraposition phenomenon is the grammar device that
accounts for it. This is a causal explanation. Grammars that provide this device
will produce extraposition structures as a result. The grammar property is the
cause for extraposition as a language phenomenon. “Functionality” does not
play a role in the explanation. The issue of grammar-parser fit is a higher-order
question. It is relevant only when we compare grammars that provide extraposi-
tion with grammars that do not. The adaptivity of grammars is best accounted for
in a theory that does not assume a direct causal relationship between a process-
ing effect and a formal detail of a grammar. However, ease of processing plays an
important role in language change, and language change is part of the cognitive
evolution of grammars.
As argued above, a functional explanation is not cogent. First, there are
“strict” OV languages that do not admit “extraposition” and have existed for
millennia (e. g. Japanese). This flatly contradicts a direct functional grip
on grammar. An even greater embarrassment for a functional explanation is the
fact that there are languages with extraposition (e. g. German, Dutch) that do not
allow extraposing a class of items in spite of their functional similarity to
extraposable items:

(1) a. Ist [die Erklärung, [die uns hier von ihm angeboten wurde]] wirklich richtig?
‘is [the explanation [that (to)us here by him offered was]] indeed correct’

b. Ist [die [uns hier von ihm angebotene] Erklärung] wirklich richtig?
‘is [the [(to)us here by him offered] explanation] indeed correct?’

c. Ist [die Erklärung] wirklich richtig [die uns hier von ihm angeboten wurde]?

d. *Ist [die Erklärung] wirklich richtig [uns hier von ihm angebotene]?

e. *Ist [die Erklärung [uns hier von ihm angebotene]] wirklich richtig?

The relative clause in (1a) may be extraposed as in (1c), but the complex
participial modifier in (1b), which is functionally (i. e. discourse-functionally)
equivalent to a relative clause, cannot be extraposed, either to the end of the clause
(1d) or to the end of the NP (1e). Obviously, it is the grammar that regulates
extraposition; reducing center embedding does not dictate the particular
grammar design. Note that the participial attribute in (1b) is on a left branch in
the nominal phrase, between the determiner and the head noun. Therefore it
is a much greater obstacle for the parser than the post-nominal relative clause in
(1a). Nevertheless, extraposition is ruled out for (1b).

Second, in extraposing languages, even items may be extraposed that would
not pose any problem for parsing.10 On the other hand, center embedding may be
avoided11 by “alternative means” (cf. Nagel 1961: 403), that is, by paraphrases.
So ease of parsing is neither a necessary nor a sufficient condition for a causal
explanation of extraposition. It is not a sufficient condition because there are
strict OV languages that do not allow extraposition, and it is not a necessary
condition either, since items may be extraposed that do not matter for ease of parsing.
Extraposition is a potential of the system in a subset of human language grammars;
this potential is utilized for, but not explained by, enhancing the ease of parsing.
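The logic of this paragraph can be spelled out as a search for counterexamples. If ease of parsing were sufficient for extraposition, every case with a parsing benefit would allow extraposition; if it were necessary, every case of extraposition would show a parsing benefit. The Python sketch below codes the cases discussed in the text with two yes/no features; the codings are coarse simplifications for illustration, and all names are hypothetical.

```python
from __future__ import annotations

from typing import NamedTuple


class Case(NamedTuple):
    label: str
    parsing_benefit: bool  # would extraposing here ease parsing?
    extraposable: bool     # does the grammar license extraposition here?


# Coarse, hypothetical codings of the cases discussed in the text.
CASES = [
    Case("German relative clause (1a)/(1c)", parsing_benefit=True, extraposable=True),
    Case("German participial attribute (1b)/(1d)", parsing_benefit=True, extraposable=False),
    Case("German 'damit' (fn. 10)", parsing_benefit=False, extraposable=True),
    Case("Japanese (strict OV)", parsing_benefit=True, extraposable=False),
]


def against_sufficiency(cases: list[Case]) -> list[str]:
    """Benefit without extraposition refutes 'benefit suffices for extraposition'."""
    return [c.label for c in cases if c.parsing_benefit and not c.extraposable]


def against_necessity(cases: list[Case]) -> list[str]:
    """Extraposition without benefit refutes 'benefit is required for extraposition'."""
    return [c.label for c in cases if c.extraposable and not c.parsing_benefit]


print("not sufficient:", against_sufficiency(CASES))
print("not necessary: ", against_necessity(CASES))
```

Both lists are non-empty, so on this coding ease of parsing fails both as a sufficient condition (the participial attribute and Japanese) and as a necessary condition (*damit*).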
In general, whatever system of grammar we as humans had at our disposal,
we would have to use it willy-nilly, irrespective of its user(un)friendliness, simply
because there would be no alternative. Nevertheless, it is very plausible that
correlations between forms and contexts of usage would soon have appeared.
Crucially, these correlations are post hoc. Consequently, for natural languages,
we (may) find fairly stable correlations between structure and use. But
use is not the causal factor for shaping structure. Tooby and Cosmides (1990:
761–62) are quite explicit in this respect:

“[…] the only scientifically coherent account for the origin of any complexly organized
functionality is (ultimately) evolution by natural selection. […] All (non-chance)
functionality is ultimately attributable to the operation of adaptations, that is, naturally
selected innate aspects of the cognitive architecture. Cognitive science and the adaptationist
branches of biology are natural intellectual companions” (Tooby and Cosmides 1990: 761).
“It is magical thinking to believe that the “need” to solve a problem automatically endows
one with the equipment to solve it. For this reason, the invocation of social and practical
“needs”, pragmatic factors and acquired heuristics, or “functionalist” hypotheses to
explain language acquisition need to be reformulated in explicitly nativist terms”. (Tooby
and Cosmides 1990: 762)

The latter statement is correct, but too specific in one respect, namely the reference
to “explicitly nativist terms”. This is an unnecessary restriction, and not a very
plausible one. It is far from clear that there was enough time in the biological
evolution of human brains for them to become endowed with a sufficiently rich innate
language faculty. All our language capacities are parasitic on previously evolved
capacities of the brain. The human innovation lies not so much in newly acquired
brain mechanisms as in evolutionarily improved brain capacities that already existed,
and in more efficient crosslinking of them (see Rauschecker and Scott 2009).

10 Example: Er hat nicht damit gerechnet ⇒ Er hat nicht gerechnet damit. (‘He did not reckon
with-it’). ‘Damit’ (‘with-it’) is a single (compound) word. Both variants are equivalent under
complexity measures.
11 Example: The destroyed city has been partly rebuilt. It was destroyed by a flood triggered by
Hurricane Katrina.
Tooby and Cosmides think in terms of an obvious dichotomy for biologists,
namely invalid functionalism vs. valid evolutionary structuralism. In biology, a
structure is usually innately determined (by the genome). In the case of grammars
as cognitive information structures, the structure is a cognitive “organism”
that utilizes organically determined structures (i. e. our brain functions).
Therefore, the phenotype (i. e. language) is not exclusively dependent on innate
qualities. Cognitive evolution is in principle independent of its
biological implementation and substrate.
Biological evolution on the genetic level is not the exclusive source of adaptive
functionality.12 In the case of cognitive capacities, natural selection gets a
chance to operate on cognitive programs and their representations. The theory of
evolution as developed by Darwin is in principle substance-neutral, although it
was developed and explicated as a theory explaining the “origin of species by
means of natural selection” (Darwin 1859). Basically, all it requires is a replicating
system that produces enough variants that are constantly exposed to selection.
Biological selection (which is based on the reproductive success of the
phenotype) could not explain the intricate and biologically irrelevant grammar-
internal details of languages. Nevertheless, the idea of a piece-by-piece biological
evolution of grammar has been ventured by Pinker and Bloom (1990).13 However,
success in biological evolution must translate into reproductive success. It is
hard to see what an accidental change in a cognitively encapsulated system of
formal operations for symbol recombination could contribute to the reproductive
success of those whose brains support the change compared to those whose
brains do not.14

12 “As a causal theory natural selection locates the causally relevant differences that lead to
differential reproduction. These differences are differences in organisms’ fitness to their
environment. Or, more fully, they are differences in various organismic capacities to survive and
reproduce in their environment”. (Stanford Encyclopedia of Philosophy
<http://plato.stanford.edu/entries/natural-selection/>)
13 In the early days, (Friedrich) Max Müller tried to make the strongest possible point against
an all-encompassing concept of evolution. He emphasized the impossibility of the biological
evolution of language as a strong argument against Darwin’s theory of evolution: “Language is
the Rubicon which divides man from beast, and no animal will ever cross it [...] The science of
language will yet enable us to withstand the extreme theories of the Darwinians, and to draw a
hard and fast line between man and brute”. (Müller 1862: lecture IX)
14 See Bierwisch (2000) for a detailed discussion of the conundra and paradoxa of attempts to
explain the emergence of language as a direct product of biological evolution.

For everyday purposes of language use (including cognitive operations
on propositionally structured knowledge representations), a much more primitive
system of grammar would seem to be a flexible enough tool for all the purposes
of the hard life of stone-age humans (and their predecessors). The luxury of the
grammar systems of natural languages is vastly underdetermined by the
functionality of use. Replace “evolutionary” by “functionalist” in the first line of the
following quote, and it becomes an anti-functionalist argument, too. In fact, it is
strong counterevidence whose force has been underestimated. The excess complexity
of grammars is not captured by functionalist approaches, and, as Premack
(1985: 30) stresses, it is not captured by biological evolution either:

Human language is an embarrassment for evolutionary theory because it is vastly more
powerful than one can account for in terms of selective fitness. A semantic language with
simple mapping rules, of a kind one might suppose that the chimpanzee would have,
appears to confer all the advantages one normally associates with discussions of mastodon
hunting or the like. For discussions of that kind, syntactic classes, structure dependent
rules, recursion and the rest, are overly powerful devices, absurdly so.

Why are grammars luxurious15 and diverse? They are luxurious because the neural
substrate freely provides processing capacities for this luxury. What appears to be
superfluous complexity is merely the exploitation of a potential of the human brain
that happens to be available at no cost. The “programmer” of this potential is
neither an “invisible hand” nor a society-based net of communicative needs.16
It is an ongoing process of cognitive evolution. Just as biological evolution produces
luxurious organisms (fantastically colored butterflies or fish populations in coral
reefs, to name just two instances), cognitive evolution produces luxurious systems
of grammar. Their luxury may sometimes even hamper acquisition or usage and then
gets cut back in the course of diachronic change (e. g. the loss of the rich
inflectional morphology of Latin, both in terms of the inventory and the categories,
as a result of the changes that led to the present-day Romance successor languages).

15 See Haider (2001) for an arbitrary and merely illustrative list of communicatively immaterial
details:
i) a language with(out) fronting of finite verbs (cf. Germanic vs. Romance languages vs. strict
OV languages)
ii) a language with(out) a case system consisting of subsystems that correlate with the inflection
class of the verb (e. g. Georgian Nom-Acc and Nom-Ergative system vs. languages with a
plain case system)
iii) a language with(out) clitic pronouns (e. g. Romance vs. Germanic languages)
iv) a language with(out) gender agreement (in the article system: English vs. Dutch vs. German)
v) a language with(out) negative concord (cf. standard German vs. Russian)
vi) a language with(out) multiple fronting of wh-expressions (e. g. Slavic vs. Romance and
Germanic).
16 Here is an example of this claim: “Since communities are defined by shared practice, and
human beings engage in a great variety of joint actions with different groups of people, the
community structure of human society is extremely complex. […] As a consequence, a language as a
population is equally complex”. (Croft 2013: 107)
Functional linguists who insist on a form-function isomorphism grounded
in social interaction must feel at a loss when confronted with grammar systems
that are far more elaborate than social interaction requires. “The nature of language
follows from its role in social interaction” (cf. Beckner et al. 2009: 3) is a bewildering
dogma, given that the “role in social interaction” could be perfectly fulfilled by
much simpler systems. Take, for instance, relatively new languages, viz. creole
languages. In terms of their morphological inventories, they are much simpler than
many present-day Indo-European languages whose diachronic development has been
reconstructed for a period of roughly up to three millennia, such as Icelandic.17
Icelandic differs greatly in morphology from present-day Norwegian varieties. In terms of
morpho-syntactic inventories, Norwegian is much closer to the “impoverished”
system of creole languages, while Icelandic has conserved the grammatical
morphology of Old Norse. It is difficult to see how one could prove that this follows
from their different “roles in social interaction” on the Norwegian or Icelandic
coasts.
The merely rhetorical reference to the “role in social interaction” is a regular
ingredient of functional narratives. It is even less illuminating than the claim that
the shape of dogs follows from their role in social interaction. There are
big bloodhounds and small toy poodles, and this reflects their different
functions in society. This is true, but the reason is that there were breeders who
intentionally selected certain properties for these breeds. For languages, there are no
breeders. Any human language can serve in social interactions, and the purposes
of social interaction do not cover the overall complexities in the grammar systems
that determine these languages.
In sum, a functional analysis of the inventory and processes of the grammars
of human languages may describe aspects of their functionality, but it does
not explain it. You may correctly describe the functionality of the human eye
as a component of visual perception, but you cannot explain it in terms of this
functionality. An instructive example is the structural anatomy of the human eye,
and in fact of the vertebrate eye in general. It suffers from an evident “constructional”
defect. Unlike that of the octopus (cephalopod) eye, its wiring is the result of
“tinkering” design, in Monod’s (1971) terms. The nerves approach the retinal cells
from the side at which the light arrives. The smarter “engineer” of the octopus eye
correctly placed the nerves on the dark sides of the cells. As a consequence of this
design, the human eye has a blind spot (scotoma).18

17 The great-grandmother of Icelandic and of the modern Norwegian varieties is Old Norse. The
Icelandic settlers were Norwegians from North-West Norway.
Functional reasoning may account for the advantage of (some) vision, but it
cannot explain the structures that enable vision and how they developed.
Analogously, functional analysis may classify linguistic structures in terms of their
contexts of use, but this does not explain how they developed and why exactly
these structures are used and not others that would serve the same function even
better.

3 Uniform theory of evolution – different fields of application: Biological or cognitive

3.1 The background

Already in 1871, Darwin pointed out that the theory of evolution is not
substance-dependent and, consequently, that developments in languages appear to be
parallel to biological evolution in terms of adaptation and “struggle for life” as a
consequence of variation and selection. In this publication on human physical
and cultural characteristics, the evolution of culture and differences between the sexes,
to name but a few topics, Darwin (1871: 59) made it clear that his theory of evolution
is substance-neutral: “The formation of different languages and of distinct
species, and the proofs that both have been developed through a gradual process,
are curiously parallel. […] The survival or preservation of certain favored words in
the struggle for existence is natural selection”.
Intriguingly, linguists in those days (and in fact today, if we do not count
metaphorical allusions) did not take this eye-opener seriously.19 Instead, some
linguists attacked Darwin precisely on linguistic grounds, in complete misjudgment
of the nature of the problem (see Müller’s fierce attack, quoted in footnote
13). In hindsight, this is understandable. In the second half of the nineteenth
century, linguistics had neither a concept of grammar as a cognitively
real knowledge system nor an understanding of how this knowledge system
is structured, acquired and put to use. Linguistics was mainly concerned with
developing a method for comparing ancient languages, particularly the (lexical
as well as inflectional) morphology and the sound systems of Indo-European
languages.20

18 There is a blind spot at the back of each eye at the place where the optic nerve passes through
the eyeball, since in this region there is no room for receptor cells. The brain computationally
eliminates the blind spot, and we are not aware of it.
19 But biologists of today do. See Fitch (2007) for quantitative relationships between how
frequently a word is used and how rapidly it changes over time.
Darwin had to convince his audience that the mainstream opinions of that
time were wrong. The leading idea was the theory of Lamarck, a functionalist
theory.21 Organisms would actively adapt to their environments in order to prevail
and pass on these adaptations to their successors (i. e. by heredity of acquired
traits). According to Gould (2002), the first of Lamarck’s ideas was “l’influence des
circonstances” (the influence of circumstances), an adaptive force by which the use
or disuse of traits made organisms more adapted to their environment, taking them
sideways off the path from simple to complex. The second idea was “le pouvoir de
la vie” (the power of life) as a complexifying force, thought to drive organisms
from simple to complex forms. Movements of fluids would allegedly “etch out”
organs from tissues and lead to ever more complex constructions regardless of the
organ’s use or disuse. The third and crucial ingredient was the equally wrong idea
that an organism can impart to its offspring characteristics that it acquired during
its lifetime.
Darwin’s theory of evolution by natural selection, as summarized by Mayr
(1991: chapter 4) and Gould (2002), consists of five independent sub-theories.
Here is a brief summary, with an appended outline of some linguistic implica-
tions for each sub-theory.

i. Evolution as such: The objects of the theory are seen neither as constant nor as recently
created nor as perpetually cycling, but rather as steadily changing. Organisms transform over time.
Linguistics: Grammars of languages are steadily changing, if not impeded by normative
efforts (schooling, script culture, etc.). Changes do not cycle but follow drifts. Acquisition
and language contact are the primary sources of grammar change. Another source of variation
is the drive for linguistic in-group differentiation.

ii. Common descent: This theory states that each group of organisms descended from a
common ancestor, and that all groups of organisms, including animals, plants, and micro-
organisms, ultimately can be traced back to a single origin of life on earth.
Linguistics: Indo-European studies are a success story that illustrates this point. Languages
that descended from a single proto-language have spread as far as Iceland in the North-
West and the province of Xinjiang in China’s North-West (Tocharian). Today these languages
are different beyond superficial recognition, but they are all descendants of a single “mother
language”, with a research depth of about 3500 years.

20 It was a thoroughly history-minded and text-focused science that was gradually growing out
of philological disciplines. Ironically, a cornerstone of Indo-European studies turned out to be
identical with one of Darwin’s sub-theories, namely the theory of common descent.
21 See de Vogelaer (2007) for a confrontation in his explication of a particular grammar change.

iii. Multiplication of species: This theory explains the origin of the enormous organic diversity.
It postulates that species multiply, either by splitting into daughter species or by “budding”,
that is, by the establishment of founder populations that evolve into new species, if
geographically isolated.
Linguistics: “Species” and “subspecies” translate as “language” and “varieties of a
language”. Latin, or more precisely its regional varieties, is the ancestor of a number of languages
(= species).22 What biologists call “budding” is dialect split in language change. Geographic
isolation and “cross-fertilization” (i. e. language contact in bilingual brains) are catalysts for
“budding”.

iv. Gradualism: According to this theory, evolutionary change takes place through the gradual
change of populations and not by the sudden (saltational) production of new individuals
that represent a new type.
Linguistics: Again, this is commonplace. Languages change over generations. Language
change is gradual, usually taking several generations and time spans of centuries. Changes
typically develop out of communities with dialectal variants co-existing for a long time.

v. Natural selection: According to this theory, evolutionary change comes about through the
proliferation of genetic variation in every generation. The individuals who thrive thanks
to a particularly well-adapted combination of inheritable characters give rise to the next
generation.

This last sub-theory is the crucial point. Linguists who would subscribe to i.–iv.
would usually not also assume natural selection to be the mechanism of
language change and of the emergence of new species (= languages). What would
it mean that “individuals” survive and become the founding individuals of a new
“species” of language? All we have to do here is to step back and rethink the
analogies carefully. Of course it is not a question of survival and reproductive
success on the level of the human phenotype. However, there is an exact parallel
to biological evolution on a different and relevant level which has been hitherto
overlooked. It is the level of the cognitive evolution of cognitive representations
of replicating cognitive algorithms, namely grammars.

22 As linguists we know that there are many more descendant languages of Latin than merely
the ‘official’ Romance languages and the already extinct ones (like Dalmatian), from Sicilian,
Neapolitan, Istriot, to Friulian and Piemontese, to name just a few languages on the Apennine
peninsula.
“Intelligent design” of grammars – a result of cognitive evolution 221

Evolution is adaptation by selection operating on a pool of variants. For
grammars as cognitive systems, the selector is the (child’s) processing brain that
must acquire the grammar of a given language merely by being exposed to the
language. As in biological evolution, the winner is the grammar variant of the
given language that “infests” more brains than the competing variants. The
winning variant multiplies itself more often. And just as in biological evolution,
the emergent result is an accumulation of adaptive qualities.
What are adaptive qualities? An adaptive quality, as in biological evolution,
is a quality that takes effect in the environment of the system/organism.
The environment of grammars is language acquisition and processing. Grammar
is the “computational program” of a procedural knowledge system that the brain
employs for language processing, both in reception and in production. As in
biology, selection becomes a crucial issue once there is “competition” for limited
resources.
A limited resource is, for instance, the amount of processing time needed
for a given structure (Haider 1997). This is a limited resource for reception. The
production process may take as much time as it needs, but a listener’s brain must
be finished with processing when the next utterance arrives. If “extraposition”
saves processing time, the competing grammar that allows “extraposition” is
likely to gain a selective advantage. However, just as in biology, the need by itself
does not create the grammar variant it would prefer. An organism may remain
in the same form in its environment for any amount of time if the environment
does not change too much and no rival variant arrives that competes for the available
resources. In other words, ease of processing cannot turn Japanese into an
“extraposing” language as long as Japanese does not undergo a gradual grammar
“mutation” that allows extraposition and is able to spread. Adaptation depends on
the scope of variation: the higher the rate of mutation, the higher the potential for
adaptive change (see the rhino-virus example).
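The point that a processing advantage cannot create the variant it would favor can be illustrated with a toy replicator-dynamics sketch. This is not a model proposed in the chapter: the function names, the `advantage` factor (how much more easily learners acquire variant B than variant A) and the one-way `mutation` rate are hypothetical parameters chosen for illustration only.

```python
def next_share(share: float, advantage: float, mutation: float) -> float:
    """Share of learners acquiring variant B after one generation.

    Hypothetical toy model: `advantage` > 1 means B is easier to acquire
    than A; `mutation` is the fraction of A-input that imperfect
    transmission turns into B-like input.
    """
    # Imperfect transmission: some A-input is reanalysed as B.
    share = share + mutation * (1.0 - share)
    # Learners acquire B in proportion to its learnability-weighted presence.
    weighted_b = share * advantage
    return weighted_b / (weighted_b + (1.0 - share))


def simulate(generations: int, advantage: float, mutation: float,
             start: float = 0.0) -> float:
    """Iterate next_share over the given number of generations."""
    share = start
    for _ in range(generations):
        share = next_share(share, advantage, mutation)
    return share


# No variation, no change: B never appears, however advantageous it would be.
print(simulate(100, advantage=1.5, mutation=0.0))  # 0.0
# A small mutation rate supplies B, and selection then spreads it.
print(simulate(100, advantage=1.5, mutation=0.001))
```

With `advantage=1.0` (no processing difference) and no mutation, any starting mix simply stays where it is; only the combination of a source of variants and a selective bias produces directional change.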
A well-known mutation-prone situation for grammar changes23 is language
contact with extensive bilingualism (cf. Heath 1984; Winford 2003). Take for
instance modern Persian. Many of the grammar changes that separate modern
Persian from its kin languages like Pashto occurred after the Arab conquest with
ensuing Persian-Arabic bilingualism, accompanied by the introduction of Arabic
script for Persian.
Finally, the selecting filter of the processing brain is responsible for the complete
absence of many simple, easy-to-understand potential rules of grammar

23 A parallel in biology would be the direct alteration of a genome, for instance by ionizing
radiation.
222 Hubert Haider

across languages. For instance, there is no language that employs word-by-word
mirroring, i. e. mirror-image inversion, as a grammatical rule.24 The reason for
this is clear. Our brain does not support this type of symbol manipulation. Hence,
no language employs this operation for coding a grammatical rule. It is easy
to program on a computer but very hard to perform with our own “wetware”.
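How trivial mirror-image inversion is for a computer can be shown in a few lines. For contrast, the sketch also implements a simplified version of the operation German actually uses for yes-no questions, fronting the finite verb. Both function names are hypothetical, and `front_finite_verb` assumes that the position of the finite verb is already known, which a real parser would first have to determine.

```python
def mirror(sentence: str) -> str:
    """Reverse the word order: easy for a computer, unattested as a grammar rule."""
    return " ".join(reversed(sentence.split()))


def front_finite_verb(sentence: str, finite_index: int) -> str:
    """Move the finite verb to clause-initial position (simplified question rule).

    Assumes `finite_index` already identifies the finite verb.
    """
    words = sentence.split()
    verb = words.pop(finite_index)
    return " ".join([verb] + words)


declarative = "sie hat das Buch gelesen"   # 'she has the book read'
print(mirror(declarative))                 # gelesen Buch das hat sie
print(front_finite_verb(declarative, 1))   # hat sie das Buch gelesen
```

Only the second output is a German question ('Has she read the book?'). For a three-word clause such as the one in footnote 24, mirroring happens to yield the same word string as the question; for longer clauses the two operations come apart.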

3.2 Grammars as a result of cognitive evolution

Let us now try to answer the relevant questions: Which entities are selected and
how does selection work in the case of language and grammar? We have to
clarify the “what-is-what” in terms of a theory of evolution, namely what the vehicle
is, what the replicators are, what the interactors are and who benefits. The
minimally necessary background for applying these notions is easily accessible
thanks to Lloyd’s (2012) online contribution on units and levels of selection, on
which the following exposition draws:
Dawkins (1978) introduced “replicator” and “vehicle” to stand for different
roles in the evolutionary process. “Vehicle” is defined as “any relatively discrete
entity which houses replicators, and which can be regarded as a machine pro-
grammed to preserve and propagate the replicators that ride inside it” (Dawkins
1982b: 295). According to Dawkins (1982a: 62), most replicators’ phenotypic
effects are represented in vehicles, which are themselves the proximate targets of
natural selection. The term replicator, modified by Hull (1980), is used to refer to
any entity of which copies are made.
An “interactor” (Hull 1980: 318) denotes an entity which interacts, as a cohe-
sive whole, directly with its environment in such a way that replication is differ-
ential – in other words, an entity on which selection acts directly. The process of
evolution by natural selection is “a process in which the differential extinction
and proliferation of interactors cause the differential perpetuation of the replica-
tors that produced them” (Brandon 1982: 317–318).

24 For example:
i. Auch sie lachte (declarative)
   ‘also she laughed’
ii. Lachte sie auch? (interrogative)
   ‘laughed she also?’
In this simple German example (which can be replicated in any of the Germanic V2-languages),
ii., a yes-no question, is the word-by-word mirror image of i., but no child would wrongly jump
to the conclusion that questioning means mirroring the declarative order. No processing routine
of the brain would support such an operation as a rule of grammar.

Next, let us clarify what the corresponding referents are in the domain of cog-
nitive evolution with respect to language. Let us start with “grammar”, regarded
as a cognitively represented program for processing a given language. It is the
program that our language processing capacity for the given language is based on.
The grammar is the “replicator”. It is the entity that is replicated by language acqui-
sition based on productions of the grammar (utterances in the given language).
The replicator is the grammar of a language understood as an information
structure. The parallel to biology is very close. The genome is the information
structure that governs the make-up of an organism. Grammar is the information
structure that governs the operations of the language usage system (in acquisition,
production and reception). The grammar of a language is the system that
determines most properties of the utterances of a speaker of that language.
Next we have to identify the “interactor”. The interactor is the language as a
population of utterances. More precisely, it is the set of utterances the language
users produce and the set of utterances that are the input for language acquisition
by an individual. In other words, the grammatical properties of utterances that
serve as input for language acquisition are the basis for acquiring the grammar
that is responsible for the make-up of these utterances.
We now turn to the “vehicle”. It is the cognitive representation of the grammar
in the individual speaker’s brain. It is the cognitive “software” system that
enables us to produce and understand the language we have acquired. Impor-
tantly, we have to distinguish between grammar as an information structure and
its cognitive representation in the brain. Grammar as an information structure
is a cognitive virus, and the brain is the host. Like any virus, it needs the host
for reproduction: The cognitive grammar guides the brain in language process-
ing and acquisition. The produced language is the input for language acquisition
which carries the cognitive virus into the next language-acquiring brain.
Finally, we have to identify the selection environment. Remember that natural
selection acts directly on interactors and is the “process in which the differential
extinction and proliferation of interactors cause the differential perpetuation of
the replicators that produced them” (Hull 1980: 318).
The replicators are the grammars of a language in the population of users of
the particular language. It is not “the” grammar since the speech community of a
given language typically is not completely homogeneous. There is always variation,
and the variation corresponds to a set of grammar variants that differ minimally.
Selection has an effect on the differential perpetuation of the replicators that
produced them. This is cognitive selection. Some grammar variants win, while
others lose and become extinct.
What is responsible for this selection? It is the system of brain functions
that are recruited for language processing. Let me clarify this with an analogy to
processing functions on a computer: Grammar is the language processing software.
It is implemented as a software package on a (bio-)computer with general
processing properties that are largely independent of the needs of the particular
software package. The software package recruits and employs the general system
architecture for its functioning and it houses the grammar. The grammar is the
replicator; the software package is the vehicle.
Selection becomes operative when the “software package” for the given language
is gradually implemented during language acquisition. Language acquisition is
not unconstrained trial-and-error learning. It is guided by
narrow constraints that follow from the particular combination of recruited brain
functions for language processing. The language software design is constrained
by the room the general system provides for such a system.25
It is this interaction that constitutes the selection environment. If in first lan-
guage acquisition there are competing grammar variants for a given set of utter-
ances, the winning grammar will most likely be the variant that is more easily
accessible for the learning system. It will win the “competition” for becoming
implemented. Since the set of utterances that acquisition is operating on is fed
by speakers whose grammars are not completely identical, there will always be
variation. This pool of variants (pool of interactors) is the biotope for natural
selection during language acquisition.
Finally, who is benefitting? The benefit goes to the replicator that wins. It is
the grammar that becomes implemented in a brain and thereby gets the chance
to replicate again. Crucially, it is not the language user who benefits. For me as a
language user it does not matter whether the grammar that is implemented in my
brain is computationally more or less demanding than an alternative grammar I
could have acquired. It is there and I am using it. The “competition” for becoming
my grammar is not influenced by my intentions. The competition is a process I
am never aware of. It is encapsulated in the operations of my learning capacity
that organizes the neuro-cognitive build-up of the grammar system based on the
language sample it is confronted with.
And how does the implemented grammar obtain the benefit? It has passed
through the sieve of selection while other grammar variants did not. It passed
because of a property that turned out to be less resource consuming or easier
to acquire for the containing system than a corresponding property of the alter-

25 Note that at this level of abstraction it is not essential to decide whether there exists a
domain-specific restriction on possible grammars (i. e. UG). It may exist, or it may merely be the
reflection of just those restrictions that the neuro-cognitive architecture recruited for language
processing imposes on the kind of ‘cognitive programs’ it is able to support.

native grammar variant. The selecting system reacts passively; it merely is the
sieve. It does not actively influence the grammar package. Crucially, it does not
restructure the grammar system – during or after acquisition – by improving its
compatibility with the general system properties. Things that work better gain
admittance, while things that do not work that easily are likely to be rejected
during acquisition.

4 Functional efficiency without functionalism

The degree of functional efficiency is a function of available variation. Only if
selection can get hold of a variant that enhances efficiency can the
language change. No variation, no change. Moreover, change is local. Particular
languages are functionally more efficient in some respects and less so in others.
These are the fingerprints of evolution by mutation and selection, not a reflection
of the alleged needs of a speech community that keeps an eye on improving its
tool of communication. Evolution is always local optimization, but what is locally
advantageous may be a disadvantage on a higher level of the system, since the
local improvement may hamper other functions. Take for instance the restrictions
on expletive subjects in English.
Modern English ‘there’ as an expletive is bound to co-occur with a postverbal
nominal subject while ‘it’ as expletive is an antecedent of a postverbal subject
clause. As a consequence, English is left without an expletive for the subject position
in clauses without a subject argument. This is why English has
become the only Germanic language that has no passive for intransitive verbs
(2a): there is no expletive available. VO languages (like Norwegian, 2b,c), however,
require a lexicalized subject position (‘det’), unlike OV languages (2d).26

(2) a. *Often it/there was phoned
    b. Ofte vart det telefonert          Norwegian
       ‘often was it telephoned’

26 This requirement may be masked by the pro-drop property: VO languages with pro-drop may
drop the unstressed pronominal subject, but they do not tolerate subjectless clauses, that is,
clauses without a subject argument. In VO languages without pro-drop, clauses without a sub-
ject argument require an expletive subject (see Haider 2010a: 35–38, 2013: 221–222). Null-subject
languages are languages in which the pronominal subject argument may be phonetically null,
but it is syntactically present. VO languages may permit null-subject clauses, but they do not permit
subjectless clauses.

    c. Ofte telefoneres det
       ‘often telephones-Passive it’
    d. Oft wurde (*es) telephoniert      German
       ‘often was (it) telephoned’

The alleged discourse-communicative need of being able to leave the reference
to the actor open and the actor argument unmentioned is patently ignored by the
grammar of English, unlike the grammars of many other languages. English is obviously
dysfunctional in this respect.
Another apparently dysfunctional trait of VO languages, compared to OV
languages, is the exclusion of “why” or “how” in combination with a wh-subject.
Why is German (and any other OV language) allowed to ask a question that is
forbidden to the English speaker? The explanation lies in a structural
constraint that happens not to meet “communicative needs” (see Haider 2010a:
chapter 3).

(3) a. *It is unclear who left why
    b. *It is unclear why who left
    c. Es ist unklar wer weshalb weggelaufen ist      German
       ‘it is unclear who why left has’
    d. Es ist unklar weshalb wer weggelaufen ist
       ‘it is unclear why who left has’

Even if grammars are highly efficient, they nevertheless contain quite a few dys-
functional traits. The search for an optimal grammar would be in vain, however,
just like the search for the optimal animal. Efficiency is a matter of degree because
the selectors in the environments correspond to independent and hence some-
times conflicting demands. What is optimal for production may be suboptimal
for perception, and vice versa. What is optimal on the phonological level (e. g.
cluster reduction) may be suboptimal on the morphological level (e. g. cluster
reduction that produces non-distinct forms). This is a well-known and typical
situation for adaptation by selection. It is local and may produce local maxima
that are globally dysfunctional.
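Local optimization ending in a globally suboptimal peak has a familiar computational counterpart in greedy hill climbing. The sketch below is an analogy only, not a model of grammar: the one-dimensional "landscape" of efficiency scores is invented toy data, and each step may move only to an adjacent variant, reflecting the point that selection can act only on variants that are actually available.

```python
# Invented efficiency scores for neighbouring variants (toy data).
LANDSCAPE = [1, 3, 5, 3, 2, 1, 2, 4, 7, 4]


def hill_climb(start: int) -> int:
    """Greedy local search: move to the better neighbour until none is better."""
    position = start
    while True:
        neighbours = [n for n in (position - 1, position + 1)
                      if 0 <= n < len(LANDSCAPE)]
        best = max(neighbours, key=lambda n: LANDSCAPE[n])
        if LANDSCAPE[best] <= LANDSCAPE[position]:
            return position  # no adjacent improvement: a (possibly local) peak
        position = best


print(hill_climb(0))  # 2: stuck on the local peak (score 5)
print(hill_climb(6))  # 8: reaches the global peak (score 7)
```

Starting from variant 0, the search stops at a score of 5 although a score of 7 exists, because reaching it would require passing through less efficient variants first, and a step downhill is never selected.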
A strong case for adaptation by evolution and against society-driven func-
tionalism is the irreversibility of change. Interestingly, the irreversibility is
acknowledged by functionalists and declared a consequence of functionalism
(Givón undated: 8, on the “unidirectionality of change”). Functionalism, however, does not
provide a demonstrable cause for it. The needs of a society are not coherent,
and they may come and go, like trends in fashion. Language change, by contrast,
is generally irreversible. When case morphology has been lost, it is not reintroduced
by the next-but-one generation. When a language becomes verb-second, like all
Germanic languages (except English), the V2 grammar is not abandoned again in favor of
the previous grammar. It is this strong drift that is characteristic of evolution by
natural selection, and not at all characteristic of fluctuating changes in a society.
Linguistic evolution is fed by two sources of grammar “mutations”, as
in genetic evolution. One source is internal. It is the imperfect transmission of
grammar by the language structure it determines. This is parallel to the imperfect
transmission of genetic information. The other major source is external. The
transmission of a grammar is disrupted by external influences on the interactor
(i. e. the given language). For the language learning brain, a disturbing factor is
externally triggered variation. This is typically the case in multidialectal or mul-
tilingual situations. The multilingual adult brain is happy to playfully mix the
languages, but the learner is confronted with enhanced variation in the given
language variety. Let us briefly recapitulate:

Evolution proceeds by the random process of variation, and an environmentally-based non-random
process of selection. For grammar, the environment is the language processing brain.
Individual intentions do not play a role. Organisms do not fabricate what they “need” through
“inner drives” or intentional “use and disuse”.
Mutations are not directed at the overall benefit of the interactor (see cancer-triggering muta-
tions in biogenetics that kill the carrier of the replicator).
Evolution is neither goal-directed nor random. It is fed by variability and driven by the non-
random but non-directed process of selection.

This is true of evolution on the level of the biological genotype as well as evolu-
tion on the level of a cognitive representation (viz. the cognitive representation
of grammar, with grammar as the “genotype” of the language it determines). In
each case, a reproductive system produces variation and this pool of variation is
exposed to blind selection. Selection is an environmental property. In biology, it is
the environment where the phenotype finds its resources, e. g. food. Analogously,
in cognitive selection the environment provides the resources for the phenotype.
The environment for cognitive evolution is the ensemble of brain resources for
language acquisition, production and perception. The brain resources constitute
the “biotope” in which the grammar “virus” resides after it has won the “struggle
for life” in the course of language acquisition.
Here is a concrete case, to give a more vivid impression of the abstract points
raised above. It is the split of the Germanic language
family into a VO and an OV group during the time of the development of the Ger-
manic V2 property. This is a sketch of the crucial points only. For a detailed dis-
cussion please consult Haider (2010a) and Haider (2013: chapters 1 and 5).

In the Old Germanic languages, verb positioning was variable. Its base posi-
tion could be VP-final, VP-initial or VP-medial. Old English (Fischer et al. 2000:
51) is representative here (Haider 2010b; 2013, chapter 5).

(4) a. Se mæssepreost sceal [mannum [bodian þone soþan geleafan]]
       ‘the priest must [people [preach the true faith]]VP’ (Ælet 2 (Wulfstan1) 175)
    b. þæt hi [urum godum [geoffrian magon ðancwurðe onsægednysse]]
       ‘that they [our god [offer may thankful offering]]VP’ (ÆCHom I, 38.592.31)
    c. Ac he sceal [þa sacfullan gesibbian]
       ‘but he must [the contenders reconcile]VP’ (Ælet2(Wulfstan1) 188.256)
    d. Se wolde [gelytlian þone lyfigendan hælend]
       ‘he wanted [humiliate the living saviour]VP’ (Ælet 2 (Wulfstan1) 55.98)

When the V2-pattern27 became grammaticalized, this introduced an additional,
structurally distinct verb position outside the VP, accessible only to the finite
verb. As a result, the structures became highly ambiguous, since the finite verb in
the V2 position now had to be related to its base position. Given the alternative
V positions within the VP, there were always several alternatives for the required
linkage to a base position:

(5) V2 + variable V-positioning inside the VP (pre-change situation)
    a. XP Vfin YP ZP          3 alternative base positions (see 6)
    b. XP Auxfin V YP         3+2 alternative base positions
    c. XP Auxfin YP V         3 alternative base positions
    d. XP Auxfin YP V ZP      3+2 alternative base positions

The three alternative base positions for (5a) are indicated in (6). What this
amounts to is a high degree of indeterminacy for identifying the filler-gap relation
of the fronted finite verb.

(6) XP Vfin YP ZP
a. XP Vfin [YP ZP -i]
b. XP Vfin [YP -i ZP]
c. XP Vfin [-i YP ZP]

The present-day situation is as follows. Every Germanic language with a single
exception28 has changed into a language type with a fixed head position for the

27 [XP [Vfini [ … -i ….]VP]]


28 Yiddish has conserved a grammar that all other Germanic languages changed into one with

verb in the VP. The Northern group is head-initial (VO); the continental Western
group is head-final (OV). This is a unique split within a language family, and it is
parallel with the grammaticalization of V2. In fact, the latter change invited the
former.

(7) a. V2 + fixed V-position in the VP
    b. XP Vfin YP ZP          1 (VO) or (OV)
    c. XP Auxfin V YP         1(+1) (in VO)
    d. XP Auxfin YP V         1 (OV)

The advantage of the change is obvious. It replaces a grammar with a high degree
of indeterminacy by a grammar whose filler-gap relation is easy to determine.
The simpler grammar variant wins, and since there are two possible implementations
(namely OV and VO, masked by V2), it comes as no surprise that both found
their way into the brains of language learners and users. The simpler grammar
wins because it suffices for processing the given language structure and there
is no necessity (in terms of a large residue of patterns that cannot be analyzed
with this grammar) for a more complex analysis that would be imposed by the
previous grammar. This is selection on the level of cognitive representation of
alternative grammar variants.
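The contrast in indeterminacy between the pre-change and the present-day grammars can be made concrete by enumerating where the gap of the fronted finite verb may sit. In this minimal sketch, the function name and the labels `"variable"`, `"VO"` and `"OV"` are hypothetical; a VP is represented as the list of its non-verbal constituents, and `-i` marks the gap. It covers only the simple case XP Vfin YP ZP and does not model the additional options that auxiliaries add.

```python
def gap_sites(vp: list[str], order: str) -> list[list[str]]:
    """Possible VP-internal sites for the gap '-i' of a fronted finite verb.

    order='variable': verb base position free (pre-change Old Germanic);
    order='VO': head-initial VP; order='OV': head-final VP.
    """
    if order == "variable":
        slots = range(len(vp) + 1)  # before, between or after the constituents
    elif order == "VO":
        slots = [0]                 # VP-initial only
    elif order == "OV":
        slots = [len(vp)]           # VP-final only
    else:
        raise ValueError(f"unknown order: {order!r}")
    return [vp[:i] + ["-i"] + vp[i:] for i in slots]


for order in ("variable", "VO", "OV"):
    analyses = gap_sites(["YP", "ZP"], order)
    print(order, len(analyses), [" ".join(a) for a in analyses])
```

A learner facing the variable grammar must choose among three analyses of the same string, the three given in (6), whereas either fixed-head grammar leaves exactly one.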

5 No need for metaphors

5.1 The path of evolution is not paved with intentions

Haspelmath (1999) suggests a compromise: a functionalist characterization of
grammar change as a consequence of evolution. He describes linguistic “evolution”
(without any reference to selection) as a diachronic process driven by cumulative
intentional actions. He is more precise than others who describe it vaguely
as cultural evolution “with” replication, and variation occurring “when we use
language in the service of joint actions between human beings in a community”
(Beckner et al. 2009: 9). “As in biology, observed adaptive patterns in language
can be explained through diachronic evolutionary processes as the unintended
cumulative outcome of numerous individual intentional actions”. (Haspelmath
1999: 180, emphasis mine)

rigid head-positioning. Arguably, this is due to the adstrate effect of being embedded in
Slavic-speaking communities. Slavic languages are languages with flexible head-positioning. They all
show the variation illustrated in (4).

The reference to intentional actions makes this position completely incompatible
with Darwinian evolution. Of course, one may use the term evolution in
a non-technical, metaphorical sense, but the alleged parallel to biology then becomes
a mere equivocation. What is taken for granted in this particular case is an
invisible synchronization of the “individual intentional actions”. The typical
behavior of individual entities in a closed system is governed by the statistical law
of growing entropy (i. e. the second law of thermodynamics), however. Not syn-
chronization but dissipative development is the normal process when a system is
continuously changing locally.
What is crucially missing in this conception is a precise notion of selection.
Evolution is adaptation by natural selection. There is no adaptive change without
selection. Adaptability is relative to the selection mechanisms and these are
diverse and unstable if conceived of as “intentional actions”. Without a precise
characterization of selection, evolution is as insignificant a notion as percentage
figures without a baseline.
Evolution of grammar cannot be based on intentions. Intentions can be served
by many different means, and intentions are not constant. They themselves may
change or be replaced continuously, and they cannot be assumed to be uniform
over a large group of individuals. The selector for the selection process behind
linguistic evolution is not a homogeneous grammar council of users who formulate
their annual motions. It is as blind as the selectors in natural selection in biology.
The only constant selector is the uniformity of our language processing
brains. Every child processes language by means of the same brain structures
and resources. This is the uniform selection environment. When this selector is
exposed to language variation, grammar variation is the pool for selection. To
put it in a simple way, the grammar variants are “competing” for a host brain as a
vehicle for grammar replication. The brain provides the vehicle and the grammar
determines the language structures this human being will use. This language is
the interactor that reflects the grammar that will be picked up by the next lan-
guage-acquiring brain.
The selector is blind. Any feature of a grammar that makes grammar acqui-
sition, reception and production easier than a competing grammar will win
because brains will acquire this grammar more easily than the less efficient com-
peting grammars, and in the end the winner takes them all. There is no intention
at issue, nor could it be. Ease of processing is of course not the only selection
filter. Storage and retrieval are selective, too, as are many other factors at the
neurocognitive interface. The structure of grammar is always cognitively encapsulated.
As a speaker I have no idea how the grammar in my head is structured, and I
have no idea how I could change its structure in order to change its usability
properties.

5.2 The target of evolution is not the utterance, but the grammar

A position similar to Haspelmath’s (1999) but more elaborated in terms of
neo-Darwinian concepts has been worked out by Croft (2000, 2013). According to
Croft (2013: 98): “The evolutionary framework gives us a theory of structured
entities that vary through replication, and a systematic relationship between the
products of language use (the replicators) and the knowledge and behavior of
language speakers (the interactors)”.
Note that utterance types (viz. the products of language use) are seen as the
replicators and speakers as the interactors. This is a fundamental misunder-
standing. Utterances do not replicate; they are produced by the vehicle and are
the interactors. The speakers are the containers of the vehicle. The misidentification
of utterances as replicators is bound to lead to wrong conclusions. The cause
for this misunderstanding seems to be the functionalist conviction that variation
has to be understood as a function of language use in a speech community (Croft
2013: 107): “The population of linguemes (the replicators) – the variants that are
propagated or go extinct – is defined by the population of speakers (the inter-
actors), who replicate the linguemes in language use. Thus we must examine the
speech community and its role in the linguistic selection process”.
This will not work. Selection does not operate on utterance variants; it oper-
ates on grammar variants that determine the structure of utterances. For grammar
change, the target is certainly not the utterance. The target is the grammar. An
utterance is merely a “molecule” of the phenotype of a grammar, that is, the set
of utterances determined by a grammar. In cognitive evolution, the language is
the phenotype and the genotype is the grammar. The replication of an utterance
is clearly not the replication of grammar.
Biological replication means the replication of the genome by expression in
the phenotype. A single utterance, either as a token or as the set of tokens (= type),
crucially is not representative of the phenotype of grammar. Genetic variation
is accessible for selection only by its effects on the phenotype. Analogously, vari-
ation in an utterance type is the reflex of variation of the grammar that deter-
mines the type. Selection operates not on the utterance, but on the grammar that
is accessible via a set of utterances. The grammar is replicated by the language-
acquiring and -using brain.
Croft does not explain how a frequent novel “lingueme” could have a causal
effect on the grammar. The “theory of utterance selection” (Croft 2000) merely
tells us that a given utterance is used frequently if it turns out to be useful for
any context of usage and that this frequency of usage propagates this lingueme.
Crucially, it does not tell us how the frequency of a specific utterance can be the
cause of grammar change.

Here is a counter-example: Several years ago, the ungrammatical utterance
(8a) became popular because a prominent person uttered it sincerely. In fact, a
web search confirms that the ungrammatical utterance (8a) is still very popular.
It is seven times more frequent than its grammatical variant (8b),29 but nobody
would assume that the frequency of this specific utterance could make it a cata-
lyst of grammar change that would replace dative by nominative in the German
passive from now on.

(8) a. *Hier werden sieNom geholfen
       ‘here are you helped’
    b. Hier wird ihnenDat geholfen
       ‘here is you helped’
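The “seven times” figure follows from the search-engine page counts given in footnote 29:

```latex
\frac{2\,900\,000}{394\,000} \approx 7.4
```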

Croft’s “theory of utterance selection” may account for the ongoing process of
lexical change (since this is by and large a Lamarckian kind of change), but not for
changes in the procedural system of language (viz. grammar change). Changes in
the lexicon are ubiquitous and continuous. This is not language change, however.
Language change is not so much a change in the inventory; it is a change in the
computational system. The token frequency of an utterance can only explain the
fossilization of an utterance30 as an idiom or the adoption of a novel lexeme. It is
neither the type frequency nor the token frequency of an utterance that matters.
What matters is the availability of an alternative structuring of an utterance. This
is not a question of token frequency, but rather a question of the size of a type set.
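The difference between the token frequency of a form and the size of a type set can be illustrated with a small counting sketch. The corpus below is invented toy data, not a frequency count from German, and the pattern labels `"N-Poss"` (postposed possessive, as in ‘Vater unser’, see footnote 30) and `"Poss-N"` (prenominal possessive) are hypothetical annotations.

```python
from collections import Counter, defaultdict

# Invented toy corpus of (pattern, form) pairs; counts are illustrative only.
corpus = ([("N-Poss", "Vater unser")] * 500
          + [("Poss-N", form)
             for form in ["unser Vater", "mein Hund", "dein Haus",
                          "sein Buch", "ihr Auto"]] * 20)


def frequencies(pairs):
    """Return token frequency per form and type-set size per pattern."""
    token_freq = Counter(form for _, form in pairs)
    forms_by_pattern = defaultdict(set)
    for pattern, form in pairs:
        forms_by_pattern[pattern].add(form)
    type_size = {p: len(forms) for p, forms in forms_by_pattern.items()}
    return token_freq, type_size


tokens, types = frequencies(corpus)
print(tokens["Vater unser"], types["N-Poss"])  # 500 1
print(tokens["mein Hund"], types["Poss-N"])    # 20 5
```

A single fossilized form can have a very high token frequency while its pattern has a type set of size one; on the view argued here, it is the second figure that bears on whether an alternative structuring is available.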
Dryer (2007: 245) emphasizes the importance of frequency: “One of the central
ideas of functionalist linguistics, especially over the past fifteen or so years, is
that frequency of usage plays a central role in explaining why languages are the
way they are”. Given the discussion above, it should be clear by now that we are
facing a chicken-and-egg problem. Adaptive success means higher utility, and
higher utility is likely to have an effect on the frequency of use. It is not frequency
that drives functionality. Causality goes in the other direction.
Newmeyer (1998: 124) puts it this way: “All linguists would agree that text frequency is a response to a variety of factors, from cognitive complexity to pragmatic usefulness. The question is to what extent frequency itself can legitimately be called upon as an “explanation” for whatever phenomena seem to be sensitive to it”.

29 2.9 million pages for (7a) versus 394,000 pages for (7b) (Google search on April 17th, 2012).
30 ‘Vater unser’ (the Lord’s Prayer, lit. ‘father our’) is ungrammatical in German, but it is the opening line of the prayer every Christian knows. It is the direct translation of ‘pater noster’. Its extremely high token frequency in German has not had any effect on the grammar of German, though. Token frequency merely has the effect of fossilizing forms, but not of generalizing and establishing the type.

“Intelligent design” of grammars – a result of cognitive evolution 233
“Ease” or “naturalness” in processing by the language-processing brain is the explanans, and frequency is the effect.31 In language change, higher frequency may be a correlate and it may be part of an explanation, but it is not the cause. The cause is a grammar change that makes a variant available that prevails over “competing” variants. Prevailing may be reflected in frequency.
Frequency is a topic in first language acquisition, too. Here, an obvious problem with frequency as a causal factor becomes evident. The usual argumentation uses frequency considerations as evidence and as a basis for a suspected functional connection and overlooks that this explanatory move would
work only post hoc. If you know which functionality to look at, you can count
frequencies. But crucially, observing frequencies would not tell the child what to
do with frequency gradients. This is clearly stated by Yang (2004: 452): “Although
infants seem to keep track of statistical information, any conclusion drawn from
such findings must presuppose that children know what kind of statistical infor-
mation to keep track of. After all, an infinite range of statistical correlations exists”
and “statistics requires UG”. Statistical learning “is constrained by what appears
to be innate and domain-specific principles of linguistic structures, which ensure
that learning operates on specific aspects of the input” (Yang 2004: 455).

6 Summary and outlook

Language is both a cultural and a neuro-cognitive phenomenon embedded in a biological substrate. Whether a given kind of language change is a cultural32 or a “natural” phenomenon is therefore an empirical question. This paper argues that the “cognitive evolution” of grammars is an instance of classical evolution that is operative in a different domain. Darwin showed how his theory of evolution operates on the level of biological organisms but was aware that his theory is not substance-bound. Cognitive evolution operates on the level of the structures of cognitive representations of self-replicating cognitive systems. Biology and neuro-cognition are different domains, but the mechanisms of evolution are the same. As Darwin (1871) realized, the development of languages and the development of species follow the same general laws of evolution, but they are implemented in different domains. The laws of grammar change are the laws of evolution that hold for replicating systems whose variants (mutations) are exposed to selection. The effect is adaptation (to the selection parameters) and diversification, with luxurious side effects not excluded.

31 As in biology, frequency is a multi-faceted phenomenon. Typological frequency is a sign of successful adaptation by selection and shows at the population scale, i. e. cross-linguistically. Lightfoot’s (2002) “threshold frequency” is a within-language (i. e. within-population) frequency criterion. It is the critical frequency at which a variant gets a chance to spread.
32 Adaptation as a consequence of cognitive evolution by natural selection is one source of grammar change. There are, of course, other sources as well, whose best explanation may be, for instance, a sociolinguistic one; see Trudgill (2011).
Grammar change is a domain of cognitive evolution. The replicating system
is the grammar of a language. It is an ontologically real object that resides in
human brains as a cognitive virus and makes the brain a servant for the purpose
of its replication (by producing language structures that serve as the basis for the
acquisition of this grammar by other brains).
Cognitive evolution is nothing more than evolution working on cognitive
entities. Hence, the outcome is totally parallel to biological evolution in terms
of yielding systemic adaptation. This paper argues for abandoning the custom-
ary metaphorical allusions to evolution and for taking the ground-breaking Dar-
winian insight seriously, on the appropriate level of theoretical generalization.
Crucially, this does not mean that the evolution of grammars is to be subsumed
under the domain of biological evolution. This would obviously be misguided.
It means that a hitherto overlooked domain of application of the theory of
evolution is the domain of self-replicating cognitive systems. The prominent case is that of grammars, understood as the neuro-cognitive programs for the ensemble of processing systems for language production and reception and, most importantly, for language acquisition. Once cognitive evolution is recognized, the adaptive properties
of grammars find a scientific explanation and the frequently felt desire for invok-
ing (functionalistic) teleological explanations can be satisfied in an unexpected
but logically valid way. It is the very same access road that has been opened and
paved by Darwin in the domain of organismic biology.
The evolution of biological species is an exemplary case of evolution, but it
is not its exclusive domain. Evolution applies to any replicative system that rep-
licates in a domain of restricted resources. Once grammars are seen as natural,
replicative systems that come in variants whose replication depends on limited
resources, Darwinian evolution is predicted to apply, with adaptation and diversification as the inevitable outcome. Ultimately, cognitive evolution is the solution to “Newmeyer’s dilemma”: On the one hand, as he has shown in great detail,
it is hopeless to advocate “atomistic functionalism”. But on the other hand,
“holistic functionalism” is hard to prove since this view maintains that the link
between grammatical constructs and functional motivations is “extremely indi-
rect” (Newmeyer 2005: 225). As he explains: “There is no direct linkage between
external and grammatical properties. The influence of the former on the latter
is played out in language use and acquisition and (therefore) language change
is manifested only typologically” (Newmeyer 2005: 175). This is exactly what we
expect for system adaptivity under natural selection. There is no direct influence
of function on form but nevertheless the system of forms will end up showing an
adaptive design.
Cognitive evolution provides a causal relation and predicts this overall
picture. The result of cognitive evolution is a family of adaptive changes in
grammar systems that may be described, but not explained, from a vantage point
of holistic functionalism. Cognitive evolution of grammars explains adaptation
without any functionalist backward causation and without any direct linkage
between properties of particular grammars and conjectured functional motiva-
tions for each of these properties.
It hardly needs emphasizing that a precise theory of the cognitive evolution of grammars as cognitive systems replicated by brains is at least as far away for us as a precise theory of biological evolution was for Darwin. He developed his theory without any precise
understanding of the real source of variation (i. e. genetic mutations), without any
idea of the real target of selection (viz. the genome), and had to link natural selec-
tion crucially with the idea of heredity long before the basic concepts of genetics
and inheritance were discovered. It took several generations of researchers until
the theory had arrived at its standard form as the modern evolutionary synthesis.
But he dealt with palpable entities, namely animals and plants.
We linguists deal with grammars. As in nineteenth-century biology, linguistics will not make fundamental progress until neuro-cognitive
research offers us a more precise understanding of the selection environments
for various kinds of structurally different cognitive representations. For biolog-
ical evolution, the selection environment was much easier to estimate. Darwin
understood it to be everything that enhances or impedes reproductive success.
The environment for cognitive selection is not so easily accessible, although it is
very close to every theoretician since it is located somewhere between his right
and his left ear.
The other disadvantage in comparison to Darwin is equally basic, namely the
precise identification of the species and their formal distinctions. In biology, it
sufficed to observe, analyze and describe; in grammar theory, observation does
not help. At present, we have fewer than a dozen sufficiently analyzed grammar
systems. Typological cross-linguistic descriptions of human language grammars
are comparable to poor-quality photographs that barely suffice for telling apart a
gnu and a zebra.
For a sufficient understanding of cognitive selection we need a much better understanding of the neuro-cognition of language processing and language learning as an arguably domain-specific capacity, and of its specific place within the ensemble of domain-general processing functions. At the moment, we know hardly anything. As everyone knows, it is difficult to catch a black cat in a dark room, but it helps at least to know that it is there.
For some time this set of circumstances will leave room enough for linguists
who are happy with functional narratives: “Unfortunately, the vast majority of
self-designated functionalists, of whatever sect, tend to expostulate about cogni-
tion without studying the cognitive literature” (Givón undated: 10) – and without
studying the well-documented history on the rise and fall of functionalism in
biology, one might feel tempted to add.

References
Amundson, Ron (1998): Typology reconsidered: two doctrines on the history of evolutionary
biology. Biology and Philosophy 13: 153–177.
Beckner, Clay, R. Blythe, J. Bybee, M. H. Christiansen, W. Croft, N. C. Ellis, J. Holland, J. Ke,
D. Larsen-Freeman, T. Schoeneman (2009): Language is a complex adaptive system.
Language Learning 59: 1–26.
Bierwisch, Manfred (2000): Evolutionstheorie und Linguistik – Drei Probleme. In: Klaus Richter
(ed.), Evolutionstheorie und Geisteswissenschaften, 129–189. University Erfurt: Acta
Academia Scientiarum 5.
Brandon, Robert N. (1982): The levels of selection. Proceedings of the Philosophy of
Science Association 1: 315–323.
Croft, William (2000): Explaining language change: an evolutionary approach. London:
Longman.
Croft, William (2013): Language use and the evolution of languages. In: Kenny Smith and
Philippe Binder (eds.), The Language phenomenon, 93–120. Berlin: Springer.
Cummins, Robert (1975): Functional analysis. The Journal of Philosophy 72: 741–765.
Darwin, Charles (1859): On the origin of species by means of natural selection, or the
preservation of favoured races in the struggle for life. London: John Murray, Albemarle
Street.
Darwin, Charles (1871): The descent of man, and selection in relation to sex. London: John
Murray, Albemarle Street.
Dawkins, Richard (1978): Replicator selection and the extended phenotype. Zeitschrift für
Tierpsychologie 47: 61–76.
Dawkins, Richard (1982a): Replicators and vehicles. In: King’s College Sociobiology Group,
Cambridge (ed.), Current Problems in Sociobiology, 45–64. Cambridge: Cambridge
University Press.
Dawkins, Richard (1982b): The Extended Phenotype. Oxford: Oxford University Press.
Dawkins, Richard (1996) [1986]: The Blind Watchmaker. New York: W. W. Norton.
De Vogelaer, Gunther (2007): Innovative 2pl.-pronouns in English and Dutch – ‘Darwinian’ or
‘Lamarckian’ change? Studies van de BKL 2007 / Travaux du CBL 2007 / Papers of the LSB
(Linguistic Circle of Belgium) 2: 1–14.
De Vries, Hugo (1909): Variation. In: A. C. Seward (ed.), Darwinism and Modern Science, 66–84.
Cambridge: Cambridge University Press.
Dryer, Matthew S. (2007): Review of Newmeyer’s Possible and probable languages. Journal of
Linguistics 43: 244–252.
Fischer, Olga, Ans van Kemenade, Willem Koopman and Wim van der Wurff (2000): The syntax of
Early English. Cambridge: Cambridge University Press.
Fitch, W. Tecumseh (2007): An invisible hand. Nature 449: 665–666.
Givón, Talmy (undated): On the intellectual roots of functionalism in linguistics. Ms. University
of Oregon. URL <http://linguistics.uoregon.edu/wp-content/uploads/2013/03/
FuncLing11Rus.pdf> (18. 4. 2013)
Gould, Stephen Jay (2002): The structure of evolutionary theory. Cambridge, MA: Harvard
University Press.
Haider, Hubert (1997): Economy in Syntax is Projective Economy. In: Chris Wilder, Hans-Martin
Gaertner and Manfred Bierwisch (eds.), The Role of Economy Principles in Linguistic Theory
(Studia Grammatica 40), 205–226. Berlin: Akademie Verlag.
Haider, Hubert (1998): Form follows function fails – as a direct explanation for properties
of grammars. In: Paul Weingartner, Gerhard Schurz and Georg Dorn (eds.), The Role of
Pragmatics in Contemporary Philosophy, 92–108. Vienna: Hölder-Pichler-Tempsky.
Haider, Hubert (2001): Not every why has a wherefore – Notes on the relation between form
and function. In: Walter Bisang (ed.), Aspects of Typology and Universals, 37–52. Berlin:
Akademie-Verlag.
Haider, Hubert (2010a): The Syntax of German. Cambridge: Cambridge University Press.
Haider, Hubert (2010b): Wie wurde Deutsch OV? Zur diachronen Dynamik eines Strukturparameters der germanischen Sprachen. In: Arne Ziegler (ed.), Historische Textgrammatik
und Historische Syntax des Deutschen – Traditionen, Innovationen, Perspektiven, 11–32.
Berlin/New York: De Gruyter.
Haider, Hubert (2013): Symmetry breaking in Syntax. Cambridge: Cambridge University Press.
Haspelmath, Martin (1999): Optimality and diachronic adaptation. Zeitschrift für Sprachwis-
senschaft 18: 180–205.
Hawkins, John A. (1994): A Performance Theory of Order and Constituency. Cambridge:
Cambridge University Press.
Hawkins, John A. (2004): Efficiency and complexity in grammars. Oxford: Oxford University
Press.
Heath, Jeffrey G. (1984): Language Contact and Language Change. Annual Review of
Anthropology 13: 367–384.
Hempel, Carl (1959): The logic of functional analysis. In: Llewellyn Gross (ed.), Symposium on
Sociological Theory, 271–307. New York: Harper & Row.
Heylighen, Francis (1999): The growth of structural and functional complexity during evolution.
In: Francis Heylighen, Johan Bollen and Alexander Riegler (eds.), The Evolution of
Complexity, 17–44. Dordrecht: Kluwer.
Hull, David L. (2001): Science and selection: Essays on biological evolution and the theory of
science. Cambridge: Cambridge University Press.
Lightfoot, David W. (2002): The development of language: Acquisition, change and evolution.
Oxford: Blackwell.
Lloyd, Elisabeth (2012): Units and levels of selection. In: Edward N. Zalta (ed.), The Stanford
Encyclopedia of Philosophy. URL <http://plato.stanford.edu/archives/
win2012/entries/selection-units/> (18. 4. 2013)
Mayr, Ernst (1991): One long argument. Cambridge, MA: Harvard University Press.
Millstein, Roberta L. (2006): Discussion of “four case studies on chance in evolution”:
Philosophical themes and questions. Philosophy of Science 73(5): 678–687.
Monod, Jacques (1971): Chance and necessity: An essay on the natural philosophy of modern
biology. New York: Alfred A. Knopf.
Müller, Friedrich Max (1862): Lectures on the Science of Language. London: Longman, Green,
Longman, and Roberts.
Nagel, Ernest (1961): The structure of science: Problems in the logic of scientific explanation.
New York/Burlingame: Harcourt, Brace and World Inc.
Newmeyer, Frederick J. (1998): Language form and language function. Cambridge, MA: MIT
Press.
Newmeyer, Frederick J. (2001): ‘Where is functional explanation?’ In: Mary Andronis,
Christopher Ball, Heidi Elston and Sylvain Neuvel (eds.), Papers from the thirty-seventh
meeting of the Chicago Linguistic Society. Part 2: The Panels, 99–122. Chicago: Chicago
Linguistic Society.
Newmeyer, Frederick J. (2005): Possible and probable languages. Oxford: Oxford University
Press.
Pinker, Steven and Paul Bloom (1990): Natural language and natural selection. Behavioral and
Brain Sciences 13: 707–784.
Premack, David (1985): “Gavagai!” or the future history of the animal language controversy.
Cambridge, MA: MIT Press.
Rauschecker, Josef P. and Sophie K. Scott (2009): Maps and streams in the auditory cortex:
nonhuman primates illuminate human speech processing. Nature Neuroscience 12:
718–724.
Ruse, Michael (2003): Darwin and design: Does evolution have a purpose? Cambridge, MA:
Harvard University Press.
Tooby, John and Leda Cosmides (1990): Toward an adaptionist psycholinguistics. Behavioral
and Brain Sciences 13(4): 760–763.
Trudgill, Peter (2011): Sociolinguistic Typology: Social determinants of linguistic complexity.
Oxford: Oxford University Press.
Winford, Donald (2003): Contact-induced changes – classification and processes. In: Ohio State
University Working Papers in Linguistics 57: 129–150.
Yang, Charles D. (2004): Universal Grammar, statistics, or both? Trends in Cognitive Sciences 8:
451–456.
Guido Seiler, University of Munich
Syntactization, analogy and
the distinction between proximate and
evolutionary causations1

Abstract: Formalist and functionalist approaches make radically different assumptions about the relationship between syntactic form and communicative
function, or about the status of autonomy of syntactic structure. The present
paper argues that the two positions are ultimately compatible with each other
because they explain different things. We propose the following three hypotheses:
(i) There is autonomy of syntax. (ii) Autonomous syntactic structure is the result
of language use and diachronic language development. (iii) The cognitive mech-
anism by which autonomous syntactic structure is diachronically implemented
is analogy. The argument is based on an empirical case study on prepositional
dative marking (PDM) in Upper German dialects. In several dialects a dative DP
can be introduced by a semantically empty prepositional marker. The example
demonstrates how new variants come into play, spread over larger dialect areas
and are implemented in different ways into the respective systems of grammar.
Whereas in some dialects PDM is an optional variant whose use can easily be
motivated on the basis of extrasyntactic functional principles (such as iconicity), other dialects have analogically extended prepositional dative marking to
all structurally related contexts such that the dative marker must be analyzed as
an expletive element, triggered by a particular syntactic environment and irre-
spective of any dative functional properties. In conclusion, it will be argued that
the question of whether formal or functional explanations in syntax are more
appropriate is misleading. Referring to the distinction between proximate and
evolutionary causations in biology, it will be proposed that both approaches are
explanatory, but at different levels, which therefore makes them compatible with
each other.

1 I thank Aria Adli, Marco García García and Göz Kaufmann – the organizers of the Grammar,
Usage, and Society workshop held in Freiburg, Germany in November 2011 – for giving me the
opportunity to discuss my ideas. I thank the workshop audience for fruitful discussions. I am
particularly grateful to my Freiburg colleague Daniel Jacob and to Peter Culicover who visited my
students and me in September 2011. During a seminar on Simpler Syntax, Daniel and Peter dis-
cussed the idea that autonomy might have to do with change rather than with innateness. They
thus formulated a thought that became central to the present paper.
240 Guido Seiler

1 Introduction: Formalist vs. functionalist explanations in linguistics

Theoretical linguists can be categorized according to the nature of the explanations they generally believe in: formalist (structure-driven, autonomist) or
functionalist (usage-based, motivationist). The present paper argues that both
types of explanation are necessary and ultimately compatible with each other.
If this line of thinking is on the right track, the question “Who’s right, who’s
wrong?” becomes obsolete and must be transformed into a more interesting ques-
tion, namely: What is the exact division of labor between formal and functional
explanations in linguistics? I propose that (i) many aspects of syntactic struc-
ture cannot (and should not) be directly motivated on the basis of extrasyntactic
principles such as communicative function, constraints on language use and the
like; however, (ii) the diachronic pathways that lead to autonomous syntactic
structure can (and must) often be plausibly attributed to usage-based principles.
In a nutshell, the two schools mentioned above can be characterized as
follows: The central claim of formalist linguistics is that syntactic structure (or
at least central aspects of it) is organized on the basis of syntax-specific, purely
formal₁ principles (i. e. without a direct connection to communicative function).
These principles provide the basis for a formal₂ analysis (i. e. in terms of discrete
symbolic computation, combination rules such as phrase structure configura-
tion, and feature specifications) with the goal of predicting the well-formedness
(grammaticality) of sentences.2 This analytic procedure goes hand in hand with
more fundamental assumptions about the nature of syntax: Syntactic structure is
autonomous and cannot be derived from other, more general cognitive capacities
in a direct way. Although the design of syntax is good for communication, com-
municative function does not tell us everything about structural organization. Or,
to put it in Fanselow and Felix’s (1993) words: “What is constitutive for language
is not the fact that it can be used for communication, but by means of which struc-
tural mechanism it can be used for communication” (Fanselow and Felix 1993: 69;
original emphasis).
Under the assumptions of the formalist approach, the autonomous principles which shape the design of syntax are part of the human language faculty. They are innate, thus limiting the variation space for learnable natural languages, yet at the same time open to language-particular parameterizations. In its most radical version, we might call this the “syntax only” approach: In order to understand how syntax works, we must look at syntax first and foremost. A genuine problem of the “syntax only” approach is the fact that many things which happen in syntax cannot be explained syntax-internally. For example, in languages with relatively free constituent order it is impossible to explain the particular choice of an ordering without reference to extrasyntactic, semantic (definiteness, animacy) or pragmatic (information structure) factors, i. e. factors which do not necessarily lead to grammaticality contrasts but nevertheless lead to clear statistical preferences. These generalizations about syntax are lost in a “syntax only” approach. Central aims of the formalist approach can be summarized as follows:

(1) Formalist approach:
– predictions about well-formedness based on algorithmic modeling
– autonomist view
– innateness

2 Cf. Newmeyer (1998: 7–8) on the ambiguity of the term “formal”.

Syntactization, analogy and distinction between causations 241

It is questionable whether the three properties sketched in (1) – which usually co-occur in the formalist approach – necessarily follow from each other. For example, it should also be possible to maintain algorithmic modeling without any reference to innateness.
Functionalist linguistics, on the other hand, assumes that syntactic pattern-
ing can (and must) be explained as a reflex of the ways language is used in com-
municative interaction, i. e. on the basis of semantic, pragmatic or more general
cognitive principles which are not language-specific at all, as well as frequency
of use (which in most cases is epiphenomenal to communicative function3).
Functionalist linguistics emphasizes that syntactic structure is motivated by
something else. Strictly speaking, this means that there must always be an extra-
syntactic explanation available for any aspect of syntactic structure. In its most
radical version, we might call this the “syntax without syntax” approach (Siewierska raises the question “Do functionalists need a model of grammar?” on the cover of her 1991 book). As Halliday (1973) states:

A functional approach to language means, first of all, investigating how language is used: trying to find out what are the purposes that language serves for us, and how we are able to achieve these purposes through speaking and listening, reading and writing. But it also means more than this. It means seeking to explain the nature of language in functional terms: seeing whether language itself has been shaped by use, and if so, in what ways – how the form of language has been determined by the function it has evolved to serve. (Halliday 1973: 7; emphasis mine)

3 As a reviewer points out, one might ask whether sociolinguistic (or stylistic) variant competition and its respective frequency distribution can be regarded as epiphenomenal to communicative function. At least for cases where the choice of a variant is related to social meaning we would like to subsume the construction of social meaning under communicative function, too.

Well-formedness is not a central notion in the functionalist approach. Empirical research focuses on the isolation of statistical distributional patterns rather than on predictions about (un)grammaticality. Thus, many of the analytical devices used to explain the ungrammaticality of a logically possible but ill-formed sentence in a given language are inapplicable in a functionalist approach: Functionalists attempt to find a motivation for what they find in a language but do not have much to say about what is absent from a language (and why).
A problem inherent to the functionalist approach is the fact that functional
motivations are often stated ad hoc in a commonsense fashion. This is most
problematic in cases where a plausible functional motivation is not available
(and perhaps simply unnecessary). Moreover, functional motivations often confound synchrony with diachrony. Finally, function gives us no hint about
the structural makeup of a syntactic pattern. Central aims of the functionalist
approach can be summarized as follows:

(2) Functionalist approach:
– statistical modeling rather than discrete symbolic computation
– motivation of syntactic structure based on communicative function and general (i. e. not language-specific) cognitive principles

Again, one might ask to what degree particular theoretical beliefs (e. g. a motiva-
tionist view) justify a particular method of analyzing syntactic structure.
In practice, many linguists from both schools accept some assumptions or
insights from the other. For example, in the last two decades generative syntax
has made efforts to incorporate insights about information structure into the
analytical framework. However, the aspect which seems most difficult to compro-
mise on is the status of syntactic autonomy: Either you believe in autonomy or in
motivation. In the following we will argue that both autonomist and motivationist
explanations are right, but they explain different things.
The remainder of the present paper is structured as follows. In Section 2, I will
present the central argument which has at its core a novel interpretation of syn-
tactic autonomy and its relationship to mechanisms of change. The argument will
be illustrated in greater detail in Section 3, taking as an example prepositional
dative marking in Upper German dialects. In the concluding Section 4 we will
summarize the main insights and discuss a few enlightening analogies between
linguistics and evolutionary biology.
2 The proposal: Syntactic autonomy because of usage

2.1 There is syntactic autonomy

Many syntactic regularities can be motivated on the basis of extrasyntactic factors such as semantics, pragmatics, communicative function, or general cognitive principles like iconicity. Such phenomena are entirely expected from a functionalist perspective. However, there are also quite a few syntactic phenomena which are difficult (if not impossible) to account for extrasyntactically. We define autonomy of syntax as a cover term for exactly these phenomena. In other words, autonomy of syntax does not mean that everything in syntax is independent of extrasyntactic function; it only means that there are structural traits in syntax which have a “life of their own” (cf. also Aronoff’s marvelous 1994 book title Morphology by itself on similar phenomena in morphology). The existence of such phenomena is crucial for the conclusions we would like to draw in the remainder of the present chapter. In this section we briefly present syntactic phenomena from Standard German which, we believe, are striking examples of syntactic autonomy: Not only are they nonfunctional, but in some cases they are even dysfunctional in terms of extrasyntactic motivation.
Our definition of autonomy thus relies on the arbitrariness aspect (rather
than the self-containedness aspect) of syntactic structure, following Croft’s (1995:
491) definitions of autonomy.4 In Newmeyer’s (1998: 23–26) terms, our definition
centers on the autonomy of syntax as such rather than on the autonomy of knowledge of language (although it is also capable of integrating Newmeyer’s autonomy of grammar as a cognitive system).5

4 Croft defines arbitrariness as follows: “The syntactic component contains elements and rules
of combination that are not derivable from semantic or discourse categories and their combina-
tion” (Croft 1995: 494). Self-containedness describes the assumption that “the rules of the system
interact with each other but do not interact closely with the rules existing elsewhere” (Croft 1995:
495).
5 Newmeyer (1998) defines the three hypotheses of autonomy as follows:
Autonomy of syntax: “Human cognition embodies a system whose primitive terms are nonsemantic and nondiscourse-derived syntactic elements and whose principles of combination
make no reference to system-external factors”. (Newmeyer 1998: 23)
Autonomy of knowledge of language: “Knowledge of language (‘competence’) can and should
be characterized independently of language use (‘performance’) and the social, cognitive, and
communicative factors contributing to use”. (Newmeyer 1998: 24)
Autonomy of grammar as a cognitive system: “Human cognition embodies a system whose prim-
itive terms are structural elements particular to language and whose principles of combination
make no reference to system-external factors”. (Newmeyer 1998: 24)
Expletives: The Standard German third-person singular personal pronoun es serves as a dummy expletive element. It is inserted as a dummy subject in clauses with a finite predicate lacking any argument positions, such as weather verbs:

(3) a. es regnet heute
it rains today

b. heute regnet es
today rains it

Subject expletives are inserted regardless of the position in the clause. However,
expletive es may also be inserted in sentences already containing a subject, yet
this kind of insertion is restricted to the prefield position (SpecCP, the position
before the finite verb in declarative main clauses). SpecCP expletives occur sys-
tematically in the impersonal passive (4) but are not limited to it (5):

(4) a. es wurde heute getanzt
it was today danced

b. *heute wurde es getanzt
today was it danced

c. heute wurde getanzt
today was danced

(5) a. es kamen gestern drei Besucher
it came yesterday three visitors

b. *gestern kamen es drei Besucher
yesterday came it three visitors

c. gestern kamen drei Besucher
yesterday came three visitors

It is difficult to attribute any extrasyntactic function to either kind of expletive
insertion. As for subject expletives, they satisfy the subject condition, the
requirement that every finite clause contain an overt subject (see below for
SpecCP). However, the subject condition cannot be accounted for semantically
or pragmatically:
Subjects: By default, the subject function in German is assigned to the most
prominent thematic role. The prominence of thematic roles can be defined by the
following hierarchy (after Bresnan 2001: 307):

(6) Agent > Beneficiary > Experiencer/Goal > Instrument > Patient/Theme > Locative
Syntactization, analogy and distinction between causations 245

(7) a. AGENT >> PATIENT:
sie öffnet die Tür
she opens the door

b. BENEFICIARY >> PATIENT:
sie bekommt ein Auto
she gets a car

c. PATIENT:
die Tür geht auf
the door goes open
‘the door opens’
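The default linking just described can be made explicit as a small selection procedure. The following Python sketch is purely illustrative and rests on simplifying assumptions: the role labels, the hypothetical function names `rank` and `default_subject`, and the tie handling for roles that share a tier are choices made for the illustration, not part of Bresnan's hierarchy or of the analysis in the text. It returns the phrase bearing the highest-ranked role from (6) and, for a predicate with no argument positions, falls back on the expletive es, mirroring the subject condition.

```python
# Illustrative sketch only: role labels and function names are hypothetical.
# Thematic hierarchy (6), most prominent tier first; roles written with a
# slash in (6) share a tier.
HIERARCHY = [
    {"Agent"},
    {"Beneficiary"},
    {"Experiencer", "Goal"},
    {"Instrument"},
    {"Patient", "Theme"},
    {"Locative"},
]


def rank(role: str) -> int:
    """Return the tier index of a thematic role (0 = most prominent)."""
    for tier_index, tier in enumerate(HIERARCHY):
        if role in tier:
            return tier_index
    raise ValueError(f"unknown thematic role: {role}")


def default_subject(arguments: dict[str, str]) -> str:
    """Return the phrase that receives the subject function by default.

    `arguments` maps thematic roles to the phrases realizing them.
    If the predicate has no argument positions (e.g. a weather verb),
    the subject condition is satisfied by the expletive 'es'.
    """
    if not arguments:
        return "es"
    best_role = min(arguments, key=rank)
    return arguments[best_role]


print(default_subject({"Agent": "sie", "Patient": "die Tür"}))         # → sie (7a)
print(default_subject({"Beneficiary": "sie", "Patient": "ein Auto"}))  # → sie (7b)
print(default_subject({"Patient": "die Tür"}))                         # → die Tür (7c)
print(default_subject({}))                                             # → es, cf. (3)
```

The procedure consults only the relative ranking of roles, so the phrase it selects carries no fixed thematic role, and the fallback to es fills a purely structural slot.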

Crucially, subjects are not inherently linked to any specific thematic role. This
means that subjecthood does not contribute anything to semantic interpretation.
The subject condition is a constraint on clause structure as such. Similarly, sub-
jects are not inherently linked to any specific discourse function. If we adopt
Choi’s (1997: 75) distinction between Topic, Focus, and Tail in German (adapted from
Vallduví 1992), it follows that subjects may be linked to any of these discourse
functions:

(8) a. Subject = Topic:
Obama wird die nächsten Wahlen GEWINNEN
Obama will the next elections win

b. Subject = Focus:
die nächsten Wahlen wird OBAMA gewinnen
the next elections will Obama win

c. Subject = Tail:
die nächsten Wahlen wird Obama SICHER gewinnen
the next elections will Obama surely win

SpecCP: Let us now turn to the other type of expletive insertion. German is a so-
called verb-second language (more precisely: a finite-second language). German
declarative main clauses obey a verb-second constraint: There is one (and only
one) constituent position (SpecCP, the German prefield position) before the finite
verb which must be filled. We have already seen that SpecCP is linked neither to
any syntactic function (subject, object, etc.) nor to any particular thematic role.
As regards information structure, the SpecCP position does not seem to be linked
to a particular discourse function: The relationship between discourse-pragmatic
functions and the respective means of expression is unusually complex, disparate
and partly contradictory in German (see Musan 2002 for an overview). However, it
is uncontroversial that the SpecCP position can be used for topicalization:
246 Guido Seiler

(9) a. [DP dieses Buch] hat Anna heute in die Bibliothek gebracht
this book has Anna today to the library brought

b. [PP in die Bibliothek] hat Anna heute dieses Buch gebracht
to the library has Anna today this book brought

c. [VP dieses Buch in die Bibliothek gebracht] hat Anna heute
this book to the library brought has Anna today

Most interestingly, SpecCP-topicalization is limited to one constituent only. From
the perspective of communicative function one could easily imagine contexts
where topicalization of two constituents would be appropriate, e. g. dieses Buch
and in die Bibliothek. The resulting sentence is understandable and even dis-
course-pragmatically interpretable for speakers, but it is ill-formed due to its
violation of the verb-second constraint:

(9) d. *[DP dieses Buch] [PP in die Bibliothek] hat Anna heute gebracht
this book to the library has Anna today brought
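The verb-second constraint can likewise be stated as a purely positional check that is blind to what the prefield constituent contributes to meaning or discourse. The Python sketch below is hypothetical and deliberately simplified: a declarative main clause is modeled as a list of prefield constituents, a finite verb, and the rest of the clause. An empty prefield is filled by the SpecCP expletive es as in (4a) and (5a), and more than one prefield constituent is rejected as in (9d). It models only the positional requirement, not which constituent a speaker chooses to front.

```python
def linearize_v2(prefield: list[str], finite_verb: str, rest: list[str]) -> str:
    """Linearize a declarative main clause under the verb-second constraint.

    Hypothetical helper: `prefield` holds the constituents placed before the
    finite verb (SpecCP), `rest` everything after it.
    """
    if len(prefield) > 1:
        # (9d): two fronted constituents violate verb-second.
        raise ValueError("verb-second violation: more than one prefield constituent")
    if not prefield:
        # (4a)/(5a): an otherwise empty SpecCP is filled by expletive 'es'.
        prefield = ["es"]
    return " ".join(prefield + [finite_verb] + rest)


print(linearize_v2([], "wurde", ["heute", "getanzt"]))  # → es wurde heute getanzt
print(linearize_v2(["heute"], "wurde", ["getanzt"]))    # → heute wurde getanzt
try:
    linearize_v2(["dieses Buch", "in die Bibliothek"], "hat",
                 ["Anna", "heute", "gebracht"])
except ValueError as err:
    print(err)  # the (9d)-type clause is rejected
```

Nothing in the check refers to topicality: a second fronted constituent is excluded even where it would be communicatively appropriate, which is exactly the dysfunctional side of the constraint discussed above.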

Case marking: Why does a language have case marking? Blake (2001: 1) defines
the function of case as “a system of marking dependent nouns for the type of relationship
they bear to their heads”. Applied to transitive predicates, this means
that nominative and accusative (or ergative and absolutive) have the function of
morphologically distinguishing subjects from objects. In a predicate HIT(Hans,
Peter) it is functional to be able to distinguish between the hitting and the hit par-
ticipant. This can be achieved by means of word order, verbal agreement, or mor-
phological or adpositional case marking. However, in many transitive predicates
grammatical means of subject-object distinction are completely useless:

(10) a. BUY(Hans, the car)
b. EAT(my brother, a tomato)
c. WRITE(Anna, a letter)

A typical transitive predicate combines an argument which is high in definiteness
and animacy with an argument which is low in definiteness and animacy.
Languages with differential subject or object marking have grammaticalized the
prototypical decrease in definiteness and animacy from subject to object insofar
as only less typical subjects or objects require an overt marker of the grammati-
cal function (cf. Aissen 2003 among others). Thus, the fact that such languages
make use of overt case marking especially in those predications with potential
(not necessarily actual) ambiguities is not too surprising from a functional per-
spective on case. What is surprising indeed is the obligatoriness of case marking
in strict case marking languages (i. e. languages without differential subject or
object marking) like German: Here, objects are required to be expressed in the
accusative case regardless of potential or actual ambiguities. In a sense, strict
case marking languages like German are far too explicit with respect to the
subject-object distinction. It seems that these languages make use of the available
case morphology with no regard to communicative function but simply because
the structural configuration requires it. For German, this observation holds even
from a different perspective. In Modern German, the nominative-
accusative distinction of noun phrases containing a determiner is very weak,
for the distinction is overtly expressed only in the masculine singular. In addi-
tion, and even worse, German notoriously lacks any case morphology in proper
names. Yet in the written standard variety proper names are not accompanied
by a determiner. Nominative and accusative of proper names are thus identical
in their form. But proper names are high in both animacy and definiteness, so if
we had the freedom to distribute case morphemes over parts of the lexicon, we
would certainly give them to proper names in the first place! In sum, German case
is dysfunctional in two ways. On the one hand, German uses too much case (in
unambiguous transitive predicates); on the other hand, case is too limited (as far
as proper names are concerned).
To conclude this section, let us briefly return to the general issue
of (absence of) extrasyntactic functionality of the discussed examples. We have
argued that each example poses great difficulties in terms of functional moti-
vation as it is generally understood in the functionalist literature. However, we
are convinced that expletives, verb-second, case marking, etc. do fulfill a certain
function in the linguistic system – just not a function at the level of semantics,
pragmatics, or iconic encoding. In order to identify this function it is necessary
to overcome the bilateral-semiotic view as it is practiced e. g. in Construction
Grammar. At the core of the bilateral-semiotic view lies the assumption that
grammar at all of its levels consists of a collection of constructional schemes, i. e.
form–meaning pairings of varying size. The intuition is that grammar has the job
of packing particular meanings into particular constructions. But grammar has
other functions as well. It guarantees structural well-formedness. Whereas
particular patterns of well-formedness are not related to any particular meaning
or communicative function, grammar as a whole has a function indeed: It makes
processing easier. Thus, ultimately the purpose of structural well-formedness as
such is to make communication more efficient. This can be achieved in various
ways, but it has to be done somehow. How exactly well-formedness is realized is
to a great extent specific to a language (as long as the language remains within
the constraints of possible cross-linguistic variation); some languages put their
objects in front of the verb, others after it – the important thing is that they have to
do it somehow. If we think of typological variation in terms of different rankings
of Optimality-theoretic constraints, it is arbitrary how exactly a single language
ranks the constraints. But it has to rank them somehow in order to function
efficiently in communication: It is not so much the particular constraint ranking
which is functional but the mere fact that constraints are ranked in a language
at all.
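The point that the mere existence of a ranking, rather than any particular ranking, does the work can be illustrated with a toy Optimality-theoretic evaluation. In the hypothetical Python sketch below, the two constraints `obj_before_verb` and `obj_after_verb` and the two-word candidates are invented for illustration. Candidates are compared by their violation profiles in lexicographic order, which is the usual way of modeling strict domination. Either ranking yields a unique winner (OV or VO), whereas an empty ranking leaves both candidates tied.

```python
from typing import Callable

Candidate = list[str]
Constraint = Callable[[Candidate], int]


def obj_before_verb(cand: Candidate) -> int:
    """Hypothetical constraint: one violation if the object follows the verb."""
    return int(cand.index("O") > cand.index("V"))


def obj_after_verb(cand: Candidate) -> int:
    """Hypothetical constraint: one violation if the object precedes the verb."""
    return int(cand.index("O") < cand.index("V"))


def winners(candidates: list[Candidate], ranking: list[Constraint]) -> list[Candidate]:
    """Return all optimal candidates under a strict constraint ranking.

    Violation profiles are compared lexicographically (strict domination):
    one violation of a higher-ranked constraint outweighs any number of
    violations of lower-ranked ones.
    """
    profiles = [tuple(con(c) for con in ranking) for c in candidates]
    best = min(profiles)
    return [c for c, p in zip(candidates, profiles) if p == best]


CANDIDATES = [["O", "V"], ["V", "O"]]
print(winners(CANDIDATES, [obj_before_verb, obj_after_verb]))  # → [['O', 'V']]
print(winners(CANDIDATES, [obj_after_verb, obj_before_verb]))  # → [['V', 'O']]
print(winners(CANDIDATES, []))  # → [['O', 'V'], ['V', 'O']] (no decision)
```

Which of the two rankings a language adopts makes no difference to the evaluation procedure; only the absence of any ranking leaves the choice undetermined.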
Once we accept that well-formedness as such can ultimately be motivated in
terms of communicative function, it is surprising that the notion of
well-formedness plays no role in the functionalist paradigm, which has left the
territory of well-formedness to the formalist school.

2.2 The structure–function paradox and its possible solution

In a formalist view of syntax, syntactic structure is to some degree immune
to usage. Autonomous syntactic structure is taken as evidence for the innateness
of fundamental, abstract principles of structural organization (Universal
Grammar). In our proposal we defend the existence of autonomous syntax (in line
with formalist linguistics, contra functionalism) but we do not follow formalism
in interpreting autonomy as evidence for innate principles. This does not inevita-
bly mean that the idea of Universal Grammar must be rejected altogether. We only
believe that the hypothesis of Universal Grammar does not necessarily (though
possibly) follow from the existence of autonomous syntax. In other words, autonomy
of syntax is not very convincing evidence of Universal Grammar; cross-linguistic
generalizations are a much more compelling type of evidence.
Syntactic structure, autonomous or not, must come from somewhere – if not
from innate principles, it must stem from the function of language in communi-
cation and constraints on language use. As far as autonomous parts of syntax
are concerned this is, of course, a paradox: How can functional factors shape
structural traits which are functionally unmotivated? The solution lies in a literal
understanding of the above-mentioned verb “stem from”: Functional constraints
on usage influence the ways syntax develops over time, i. e. how variants are selected
by speakers increasingly often until they become obligatory. However, the results of
these processes may well be independent of the functional factors which formerly
drove the emergence and spread of a variant. Thus, there is a causal relationship
between structure and function, but an indirect one: Usage drives change, but is
largely irrelevant to the synchronic structural shape of a syntactic pattern. The
idea that diachrony is essential for our understanding of the structure–function
relationship is not new. It is most explicitly formulated by Haspelmath (1999a:
183–184), who observes that constraints on structural markedness as assumed
by Optimality Theory are often functionally motivated. He proposes that the
link between structure and function can be constructed only via diachrony, i. e.
processes of variation and selection (1999a: 187–189). What is new in the present
proposal is the claim that functional factors may become obsolete over time, thus
enhancing autonomy. Our central assumptions, which will be exemplified in
detail in Section 3, are the following:

(i) There is autonomy of syntax.
(ii) Autonomy of syntax is the result of language use and diachronic development.
(iii) The cognitive mechanism by which autonomous syntactic structure is diachronically implemented is analogy.

Ad (i): In Section 2.1 above, we defined autonomy of syntax as a cover term for
those aspects of syntax which cannot be motivated by anything extrasyntactic
such as meaning, communicative function or general cognitive principles (e. g.
iconicity). We thus assume that there are aspects of syntax which are arbitrary
from a functional perspective. Arbitrariness of the linguistic sign is one of the
most fundamental design features of human language (Hockett 1960) and one
of the central insights of modern structural linguistics. It is surprising that
arbitrariness has been disputed at all in the area of syntax, given that arbitrariness
is a standard assumption not only about vocabulary but also about morphology
(cf. morphomes, Aronoff 1994). Even in phonology we find arbitrary traits, i. e.
traits which cannot be motivated in terms of articulation or perception, namely
opacity (Kiparsky 1973). So, if arbitrariness is a fundamental property of human
language structure as a whole, why should syntax be an exception?
Ad (ii): Having accepted that syntactic autonomy exists, one has to ask where
it comes from. I do not see any obvious reason to infer the innateness of basic
principles of syntactic organization from autonomy. Rather, I propose again
that we should learn from phonology and morphology. Phonological opacity goes
back to formerly transparent, phonetically motivated alternations which persist
even at a time when the motivating factor is lost, thus gaining a certain degree of
autonomy, or phonetic arbitrariness. Morphomes, such as arbitrary inflectional
classes, are often the synchronic reflex of transparent, e. g. semantically
motivated distinctions at an earlier stage of the language. Again, the relevance
of the motivating factor has decreased or even been lost entirely. Phonological
and morphological autonomy have in common that they both emerge through
diachronic processes of a special kind: loss of conditioning
environment (henceforth LOCE). LOCE is a very common pathway of
language change, and it would be surprising if syntax were an exception.
The loss of semantic or pragmatic conditioning in the development of syn-
tactic structure was a central observation in early grammaticalization research,
under the term “syntactization”. What Givón (1979) describes in the following
citation can be understood as LOCE at the syntactic level: “Loose, paratactic,
‘pragmatic’ discourse structures develop – over time – into tight, ‘grammaticalized’
syntactic structures. […] Language […] takes discourse structure and condenses
it – via syntactization – into syntactic structure” (Givón 1979: 108; emphasis
mine). From the perspective of LOCE, we can paraphrase syntactization as
follows: At some earlier time, the use of an expression was dependent on the
presence of a particular pragmatic context. It required an extrasyntactic trigger.
Later, the expression gained a certain degree of autonomy with regard to extra-
syntactic factors.
Ad (iii): Strictly speaking, LOCE does not necessarily mean that once an
old distributional pattern is lost a new one emerges: The distribution of expres-
sions may become totally random from a synchronic point of view. However, in
the interesting cases the new distribution of expressions also follows a certain
pattern, but one which no longer reflects the old motivating factor. Thus, in order
for syntactization to work a new distributional pattern must be established which
is syntactic in essence. It is often the case that an expression was formerly used only
under certain extrasyntactic contextual conditions which are then dropped such
that the syntactic environment alone triggers the use of that expression. That is, a
grammatical pattern is extended from a source environment to other cases. This
is, of course, the classical definition of analogical extension. Analogical exten-
sion starts from a source context and affects items in a (larger) target context
which shows some functional or structural similarity to the source context. Ana-
logical extension may affect all items within a given target context, in which case
we speak of obligatorization, or syntactization as far as a syntactic pattern is
concerned. I assume that analogical extension is the mechanism by which auto-
nomous syntactic structure is implemented diachronically.
The term “analogy” describes both a cognitive mechanism and a common
pathway of diachronic change, as Bybee (2010) emphasizes: “It is important to
note that analogy as a type of historical linguistic change is not separate from
analogy as a cognitive processing mechanism” (Bybee 2010: 72). The literature
on analogy is abundant. There is some agreement among authors that analogy
is a more general, domain-independent cognitive principle (cf. Blevins and
Blevins 2009; Itkonen 2005; cf. also Gentner, Holyoak and Kokinov 2001, without
particular reference to language structure). Also, authors emphasize the impor-
tance of similarity relations in analogy (Itkonen 2005: chapter 1.1; Bybee 2010:
57; de Smet 2012: 603). Non-technically speaking, we might understand analog-
ical extension as an instance of the general tendency to use similar strategies
for similar tasks. If you have learned to eat spaghetti by rotating a fork, you will
rotate the fork for linguine, too, and thus eat linguine by analogy with spaghetti. More
specifically, and with regard to language structure, we can distinguish between
similarities in terms of communicative function and similarities in terms of structural
makeup. It is often very difficult to tell functional and structural similarities
apart, and it is probable that both closely interact in analogical change (Itkonen
2005: 1). I illustrate the interaction of functional and structural similarities by
reference to a small part of Old High German inflectional morphology. In early Old
High German a small number of neuter nouns of the so-called “strong” inflec-
tional class displayed a stem allomorphy, e. g. chalb- / chelbir- ‘calf’, whereby
the second allomorph was used when a suffix followed: chalb (Nom.Sg.), but
chelbir-e (Dat.Sg.) (Braune 2004: 188). The majority of nouns of the strong inflec-
tional class did not display this kind of stem allomorphy, cf. wort ‘word’ (Nom.
Sg.), wort-e (Dat.Sg.) (Braune 2004: 184). In the later development of Old High
German the stem allomorphy vanished: chalb (Nom.Sg.), chalb-e (Dat.Sg.). In
terms of functional similarity we can define dative singular formation as the task
which is common to both wort-e and chelbir-e/chalb-e. However, it is interesting
to note that other existing patterns of dative singular formation did not serve as
models here, cf. e. g. hërza–hërzen (‘heart’, neuter, “weak” inflectional class) or
anst–ensti (‘favor’, feminine, strong inflectional class) (Braune 2004: 203, 207).
Obviously, the model for dative singular formation was selected within a specific
structural environment, namely within the limits of the strong inflectional class
of non-feminines (class distinction and gender were already morphomic, and thus
meaningless, in Old High German). We might therefore understand functional
similarity as the dimension along which similar tasks are grouped together and
structural similarity as the dimension along which the potential models for ful-
filling the task are grouped together. In other cases the distinction between func-
tional and structural similarities is even more difficult to draw. Taking German
subject expletives as an example (see Section 2.1), we might understand the
insertion of the dummy pronoun es in finite clauses with weather verbs (lacking
any argument positions) as follows: The task is the formation of a finite clause.
This is fulfilled in analogy to the prototypical case which here serves as the struc-
tural model, i. e. predications involving at least one argument position (of which
the one with the most prominent thematic role is assigned the subject function
by default).
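The restriction of the model to a structural class can be sketched as a four-part (proportional) analogy of the type wort : wort-e = chalb : X, solved only over structurally similar models. The Python sketch below is hypothetical: the mini-lexicon contains only the Old High German forms cited above (after Braune 2004), the class and gender labels are simplified, and the proportion is solved naively by copying the model's suffix, which works only when the model's dative is its nominative plus a suffix. Because candidate models are filtered to strong non-feminines, hërza–hërzen and anst–ensti are never consulted for chalb.

```python
# Hypothetical mini-lexicon of the Old High German forms cited in the text:
# (nominative singular, dative singular, inflectional class, gender)
LEXICON = [
    ("wort", "worte", "strong", "neuter"),
    ("hërza", "hërzen", "weak", "neuter"),
    ("anst", "ensti", "strong", "feminine"),
]


def dative_by_analogy(nom: str, infl_class: str, gender: str) -> str:
    """Solve model_nom : model_dat = nom : X, using only structurally similar models.

    Models must share the target's inflectional class and its
    feminine/non-feminine status (the structural dimension of similarity).
    """
    is_fem = gender == "feminine"
    models = [
        (m_nom, m_dat)
        for m_nom, m_dat, m_class, m_gender in LEXICON
        if m_class == infl_class and (m_gender == "feminine") == is_fem
    ]
    for m_nom, m_dat in models:
        # Naive proportion: usable only if the dative is nominative + suffix.
        if m_dat.startswith(m_nom):
            return nom + m_dat[len(m_nom):]
    raise ValueError(f"no suitable model for {nom!r}")


print(dative_by_analogy("chalb", "strong", "neuter"))  # → chalbe, cf. chalb-e
```

The functional dimension (the task of forming a dative singular) is fixed by the function itself; the filter in the list comprehension stands in for the structural dimension along which potential models are grouped.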
Finally, with regard to analogy-driven change in particular, we understand
analogy as grammar optimization, as proposed by Kiparsky (1982, 2012). Kiparsky
defines analogical change as “the elimination of unmotivated grammatical com-
plexity or idiosyncrasy” (Kiparsky 2012: 21). Thus, analogical change makes a
pattern more general by removing contextual restrictions. This is exactly what
happens in cases of syntactization where an expression’s dependence upon spe-
cific semantic or pragmatic contexts is weakened and eventually dropped.
It is worth noting that the concept of syntactic analogy is not new at all,
although its role has perhaps been underestimated. According to Percival (1971),
it goes back to Neogrammarian concepts of change, in particular to Blümel (1914).
What does the proposed scenario mean for the relationship between struc-
ture and function, and for the division of labor between formal and functional
explanations? Morphosyntactic change is driven by forces which are well under-
stood and described in functionalist terms, such as reanalysis, grammaticaliza-
tion, iconicity and analogical extension. It seems that (the direction of) change
is a direct reflection of the ways language is used by speakers to achieve their
communicative goals. However, frequent use of grammatical patterns may
entrench their structural makeup to such a degree that functional motivations
(which set the process in motion) become obsolete. What a concrete
example of such usage-driven syntactization (with autonomy as its result) might
look like will be discussed in greater detail in Section 3. If the proposal made here
is correct, it follows that functional explanations are especially powerful with
regard to patterns of variant selection and thus ongoing change, but too limited
for a deeper understanding of the resultant, synchronic grammatical structure. It
is primarily formal theories of syntax that are suited to predicting grammatical
well-formedness (and this, of course, is exactly what they are designed for).

3 Variation and change: A case study

3.1 The importance of variation and change for theoretical linguistics

In order to capture how syntactization works, it is essential to understand how
extrasyntactic triggering of an expression may turn into a syntactic one. There is
no better source of data than cross-dialectal variation for a deeper investigation
into the subtle differences in the conditioning factors for various expressions.
We adopt an approach to language change inspired by evolutionary theory
(Haspelmath 1999a; Croft 2000; Seiler 2002, 2003, 2004; de Vogelaer 2007; Rosen-
bach 2008; cf. Haider, in this volume). According to this view, change is a two-
step process: emergence of new variants and selection among available variants.
Croft (2000) terminologically distinguishes between “innovation” (≈ emergence)
and “propagation” (≈ selection). Croft’s “propagation” is limited to the success of
a variant in terms of its sociolinguistic function only. In earlier work (Seiler 2002,
2003, 2004) I proposed supplementing Croft’s “propagation” with the concept
of “implementation” which refers to the status of the variant in the respective
linguistic system (its valeur linguistique), since Croft’s limitation to social factors in
the selection process proved insufficient: it entirely ignores linguistic
factors which may become crucial for the selectional success of a variant (cf. de
Vogelaer 2007 for a similar point).
So, why is cross-dialectal variation important in this context? New variants
emerge at some place at some time (often via processes of relatively mechanical,
“blind” structural reanalysis, as we will see in the following section). They then
gradually spread over larger areas. The first consequence of variant spread for the
infected grammars is just the addition of a new option, i. e. spread leads to variant
competition in larger areas. However, different dialects often deal with a given set
of competing variants in different ways, according to social, functional or struc-
tural factors (one might say that dialects do different things with the same set of
available expressions). Dialects may develop different functional arrangements
between those variants. A variant may become obligatory in dialect A under
certain contextual conditions, but not in dialect B, whereas in dialect C conditions
other than those in dialects A and B are relevant, etc. In short, dialects differ not only
in their inventories of variants, but also in the ways variants are implemented in
their respective systems of grammar.6 Therefore, cross-dialectal variation offers
us the most direct insight into the rise and fall of functional motivations of variant
selection.

3.2 Prepositional dative marking in Upper German

The phenomenon under discussion in this section is relatively widespread in
Upper (southern) German dialects. It occurs in dialects of Alsace, Baden-Württemberg,
German-speaking Switzerland, Bavaria, Austria and South Tyrol. For
details, I refer the reader to previous work (Seiler 2002, 2003, 2004).
In these dialects, a dative noun phrase can be preceded by a prepositional
marker (DM = dative marker):

(11) sàg’s in der frau
say-it DM the:Dsf woman
‘say it to the woman’ (Bavarian: Upper Inn Valley; Schöpf 1866: 286)

(12) er git dr Öpfel a mir, statt a dir
he gives the apple DM me:D instead DM you:D
‘he gives the apple to me, not to you’ (Alemannic: Glarus; Bäbler 1949: 31)

6 We assume that different dialects have different grammars. Dialect variation is just cross-lin-
guistic variation between closely related languages.

The dative marker is homophonous with the local/directional prepositions an ‘at’
or in ‘in’. The distribution of the two sound forms is geographically determined.
However, the dative marker is entirely meaningless, and its historical source is
probably not a local/directional preposition (see below).
The examples above demonstrate that prepositional dative marking is not a
periphrasis, i. e. not a strategy to avoid the dative case, since dative case morphology
is used in this construction, too. Prepositional dative marking is therefore
rather a reinforcement of the dative, which is somewhat surprising since Upper
German dialects have generally preserved dative inflections anyway (in many
dialects the dative is even the only case which is clearly morphologically distinct
from the nominative). Thus, prepositional dative marking cannot be interpreted
as a compensation for eroding case inflections either.
As for the grammatical status of the dative marker, it is not entirely clear
whether we are dealing with a preposition or something else (e. g. a prefix). In most
respects, the dative marker indeed behaves like prototypical prepositions. Most
strikingly, dative marker and preposition occur in complementary distribution:

(13) mit der frau ‘with the:Dsf woman’
in der frau ‘DM the:Dsf woman’
*mit in der frau ‘with DM the:Dsf woman’

Other observations suggest that the dative marker is less independent than other
prepositions. For example, it does not allow scope over two conjuncts. In Seiler
(2003: 148) I analyze the dative marker as an element of the class of prepositions,
whereby it is a special property of the dative marker that it is not able to project
a prepositional phrase but is rather head-adjoined to the following determiner.
As for the emergence of the dative marker, it is argued in Seiler (2003: 215) that
reanalysis of article forms plays a crucial part. Already in Middle High German,
dative article forms, e. g. the singular masculine dëme, formed fusional morphs
with prepositions, whereby the initial dental of the article was dropped:

(14) obem 1280, uf(f)em 1270, am 1277, im 1258, underm 1276, us(s)em 1409,
vom 1277, vorem 1280, hinderm 1403, bim 1280, zem 1245
(Idiotikon XIII: 1191–1192).

In Upper German the form without an initial dental has been generalized over
all other contexts, also in dialects without prepositional dative marking. Thus
(with the exception of extremely conservative dialects) the article form became
əm, with some variation in the vocalism. There exists a whole paradigm of
fusional morphs <preposition_article>, some of which are homophonous with
the bare dative article in unstressed position (namely the equivalents of Standard
German im ‘in_the’, am ‘at_the’; cf. Seiler 2003: chapter 8.1 for details). It is a
fairly natural step to reanalyze a form əm, which is etymologically just <article>,
as having the morphological structure <preposition_article>. This is exactly what
happened in a subset of Upper German dialects which developed prepositional
dative marking. But why should this reanalysis take place at all?
According to Nübling (1992: 221), the most frequent and thus prototypical
context for datives is post-prepositional anyway. More than 90 % of datives are
governed by a “true” preposition in Upper German. Developing prepositional
dative marking means that the prototypical context for dative forms is generalized
even over those contexts where no other preposition is present (e. g. in
indirect object function). We might interpret this process as analogical extension:
Formerly bare datives are realized in analogy to the more frequent, i. e. post-prep-
ositional occurrence type. Reanalysis as a process of mechanical structural vari-
ation produces an element without any particular meaning or function, but with
a category label: the dative marker as an expletive preposition.
In light of the evolutionary framework as outlined in Section 3.1, language
change is a two-step process. Reanalysis simply adds a new variant; indeed, prep-
ositional dative marking and bare datives still coexist in most dialects. However,
different dialects deal with this variant competition in different ways, i. e. they
show different patterns of variant selection. Moreover, the distribution of the bare
vs. prepositional dative can be attributed to more general functional (extrasyn-
tactic) principles. I will focus on the influence of information structure and icon-
icity here (see Seiler 2003: chapter 7 for other factors).
In Alemannic dialects of northern Switzerland there is a strong tendency to
insert the dative marker only if the dative noun phrase is focused and bears main
sentence stress. It is not inserted if another constituent is focused (cf. Seiler 2003:
177–186):

(15) a. Dative ≠ focus:
dasmal han ich etz dr Marte es BUECH gschänkt
this_time have I now the:Dsf Martha a book given
‘this time, I gave Martha a BOOK’ (Alemannic: Schaffhausen)

b. Dative = focus:
ich han s buech i dr MARTE ggëë
I have the book DM the:Dsf Martha given
‘I gave the book to MARTHA’ (Alemannic: Schaffhausen)
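The northern Swiss pattern in (15), together with the complementary distribution in (13), can be summarized as a small conditional rule. The Python sketch below is hypothetical: it idealizes what is described above as a strong tendency into a categorical rule, and the class name `DativeNP`, the function `realize` and its parameters are invented for illustration. Setting `require_focus=False` drops the focus condition altogether, which is one way to picture the loss of conditioning environment (LOCE) discussed in Section 2.2.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DativeNP:
    phrase: str                        # dative article + noun, e.g. "dr Marte"
    focused: bool = False              # is the dative the focus / main stress?
    preposition: Optional[str] = None  # governing "true" preposition, if any


def realize(dat: DativeNP, dm: str = "i", require_focus: bool = True) -> str:
    """Realize a dative noun phrase with or without the dative marker (DM)."""
    if dat.preposition is not None:
        # DM and a true preposition are in complementary distribution, cf. (13).
        return f"{dat.preposition} {dat.phrase}"
    if dat.focused or not require_focus:
        return f"{dm} {dat.phrase}"
    return dat.phrase


print(realize(DativeNP("dr Marte")))                       # → dr Marte, cf. (15a)
print(realize(DativeNP("dr Marte", focused=True)))         # → i dr Marte, cf. (15b)
print(realize(DativeNP("dr Marte"), require_focus=False))  # → i dr Marte
print(realize(DativeNP("der frau", preposition="mit")))    # → mit der frau
```

With the focus condition in place, the rule refers to an extrasyntactic property of the noun phrase; without it, the only remaining condition is structural, namely whether another preposition is already present.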

Is there a functional motivation available for this kind of distribution? According
to Givón (1984), indirect objects are typically secondary topics. It is therefore
unusual for a dative to be the focus of the sentence. Prepositional dative marking
is more explicit and involves more phonological material than bare datives. Thus,
the more marked situation (dative = focus) is expressed by means of the more
marked variant (= prepositional dative marking). This distribution is (construc-
tionally) iconic. A similar point is made by Lambrecht (1994) about the correla-
tion between prosodic prominence and communicative importance:

The interpretation of sentence prosody in terms of communicative intentions is based on
the notion of a correlation between prosodic prominence and the relative communicative
importance of the prosodically highlighted element, the prosodic peak pointing to the
communicatively most important element in the utterance. Prosodic marking is thus in an
important sense iconic, since it involves a more or less direct, rather than purely symbolic,
relationship between meaning and grammatical form. (Lambrecht 1994: 242)

Thus, in northern Switzerland, where both bare and prepositional datives coexist,
their distribution nicely reflects extrasyntactic factors such as information struc-
ture and sentence stress, the concrete realization of which corresponds to more
general cognitive principles such as iconicity.
However, in other dialects prepositional dative marking is obligatory in all
contexts. This is the case e. g. in the Muotathal valley of central Switzerland. Here,
all dative noun phrases are preceded by the dative marker or another preposition,
regardless of discourse function, stress pattern or other factors (distinctiveness
of dative morphology, thematic roles, position, determiner category, etc.). The
dative marker serves as an expletive which is inserted whenever no other prepo-
sition is there already, without respect to any other (in particular extrasyntactic)
factor. We interpret this state of affairs as full implementation of prepositional
dative marking. Diachronically speaking, dative marker insertion is analogically
extended to all datives. Recall that, according to Kiparsky (1982, 2012), analogical
extension can be understood as grammar simplification since contextual con-
straints are dropped.7 A strategy is extended to the whole of a certain context –
and in our case this context is purely syntactic, i. e. the target environment of
analogical extension is defined on purely structural grounds. Assuming that
analogy relies on a similarity relation, similarity here is based on a purely struc-
tural description without any reference to function or meaning.
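The contrast between the two dialect types can be stated as two toy decision rules. The sketch below is a hypothetical simplification, not Seiler's formalization: it treats the Schaffhausen tendency as if it were categorical, reduces "focus" and "already preceded by a true preposition" to two boolean features whose names are invented here, and assumes that neither dialect inserts the dative marker after another preposition. Its point is that the Muotathal rule consults only a structural feature, whereas the Schaffhausen rule also consults information structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DativeNP:
    """Hypothetical feature bundle for one dative noun phrase."""

    focused: bool            # is the dative the focus, bearing main sentence stress?
    after_preposition: bool  # is it already preceded by a "true" preposition?


def dm_schaffhausen(dat: DativeNP) -> bool:
    """Should the dative marker (DM) be inserted? Idealized Schaffhausen rule.

    The source describes a strong tendency, not a categorical rule; this
    sketch treats it as categorical: DM only on focused, otherwise bare datives.
    """
    if dat.after_preposition:
        return False  # assumption: no DM where another preposition is present
    return dat.focused


def dm_muotathal(dat: DativeNP) -> bool:
    """Idealized Muotathal rule: purely structural, focus is never consulted."""
    return not dat.after_preposition


if __name__ == "__main__":
    contexts = {
        "(15a) dative != focus": DativeNP(focused=False, after_preposition=False),
        "(15b) dative = focus": DativeNP(focused=True, after_preposition=False),
    }
    for label, dat in contexts.items():
        print(f"{label}: Schaffhausen={dm_schaffhausen(dat)}, Muotathal={dm_muotathal(dat)}")
```

Applied to the two contexts of (15), the rules agree on the focused dative of (15b) and diverge only on the unfocused dative of (15a), where the Muotathal rule inserts the marker although focus plays no role. That `dm_muotathal` never reads `focused` is the code-level counterpart of "defined on purely structural grounds".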
7 Whereas constraint removal can be understood as the impetus for analogy, its result may also
(and paradoxically) be a complexification of the system, as long as obligatorization is not yet
reached: “As every working historical linguist knows, analogical changes tend towards improving
the system in some way (even if incomplete regularization may paradoxically end up complicating
it)” (Kiparsky 2012: 21).

How do we get from the Schaffhausen to the Muotathal variant of prepositional
dative marking? Is there a way of motivating the analogical extension of the variant
that involves more phonological material? Perhaps it is due to the maxim of
“extravagance”, which, according to Haspelmath (1999b), plays a central role in
grammaticalization processes. Haspelmath discusses why grammaticalization is
irreversible. Pursuing a usage-based approach to change in the spirit of Keller
(1994), he introduces a maxim of extravagance (“Extravagance: talk in such a
way that you are noticed”, Haspelmath 1999b: 1055), which may ultimately cause
grammaticalization processes as the unintended cumulative effect of communicative
actions: “Grammaticalization is a side effect of the maxim of extravagance,
that is, speakers’ use of unusually explicit formulations in order to attract attention”
(Haspelmath 1999b: 1043). As an unintended side effect of increasing use,
the more explicit expression may become obligatory. Increasing obligatoriness,
however, is nothing other than what we called syntactization earlier. Applied
to our case, it means dative marker insertion due to a purely morphosyntactic
constraint on possible environments of dative forms.
In sum, every single step of the gradual implementation of prepositional
dative marking can be relatively easily motivated on the basis of very general,
extrasyntactic, highly usage-based mechanisms such as analogical extension,
iconicity and “extravagance”. However, the result of these processes cannot.
Muotathal speakers certainly do not focus their datives all the time. The example
of prepositional dative marking shows that functional factors provide a plausible
explanation for selectional preferences during a phase of variant competition and
for further implementation of the variant in question. At the same time, it is true
that functional motivations which promote the implementation of a variant may
become obsolete once the variant is implemented further. As for obligatory, fully
syntactized prepositional dative marking, it is not only impossible to ascribe any
extrasyntactic function to it; it is also unnecessary. The dative marker is inserted
because the syntax wants it. Any search for a functional motivation within the
synchronic state of the language misses the generalization.

4 Concluding remarks: Lessons from evolutionary biology

In this chapter it was argued that both formal and functional approaches in lin-
guistics are explanatory, but at different levels. It was shown that syntax con-
tains traits which cannot be motivated on the basis of extrasyntactic function
in any direct way. We called this class of phenomena syntactic autonomy. Meth-
odologically, it seems fully appropriate to us to make use of the analytical tools
provided by the formalist tradition in order to capture abstract, purely structural
regularities and relationships. Functionalist arguments run the risk of
overinterpreting autonomous traits of syntax by searching for extrasyntactic
motivations where none exist. Based on the example of prepositional dative marking
in Upper German, we have shown that the patterns of variant selection found in
some dialects can indeed be motivated extrasyntactically whereas in other dia-
lects dative marker insertion is purely syntactically triggered, which makes the
search for a functional motivation not only a difficult, but also a pointless task:
Here, prepositional dative marking is due to syntactic well-formedness. We have
hypothesized that well-formedness as such does have a communicative function
insofar as it makes communication more efficient, yet the concrete instantiations
of well-formedness in a particular language are often independent of concrete
functional motivations.
According to our hypothesis, autonomy of syntax is the result of diachronic
development – processes of change in variant selection which often reflect more
general, i. e. extrasyntactic cognitive principles such as analogical extension,
iconicity or “extravagance”. These must be understood in functionalist or usage-
based terms. Paradoxically, analogical extension may lead to syntactization
which makes the functional factors formerly promoting the selection of a particu-
lar variant obsolete: Whereas pathways of change may be motivated by language
use and communicative function, these processes may ultimately enhance syn-
tactic autonomy. If this reasoning is on the right track, it means that functional
explanations are actually diachronic explanations. Extrasyntactic motivations
are at play especially as long as a variant is not yet fully syntactized.
Another consequence is the fact that the synchronic structural makeup of
a syntactic pattern is not determined by its function. Knowing the function of
a construction tells us little or nothing about its formal structure. Interestingly,
a similar point can be made from the perspective of evolutionary biology.
Venomous snakes use their venom primarily for hunting and digesting their prey.
It is functional for the snake not to waste venom on defense. There are
two basic strategies which limit the use of venom for defense: camouflage and
warning. As for warning, different species display different patterns: warning
gestures (cobras), warning sounds (rattlesnakes), or warning colors (coral
snakes). Important in our context is the fact that the function of those patterns
does not determine their structural makeup and therefore leaves space for formal
variation.
Also, from a diachronic perspective, form may follow function only on the
basis of inherited traits. Languages can never invent things ex nihilo (even if that
would be extremely functional); they can only transform devices which are there
already. Most aspects of the structure of a language are determined by the fact that
they are inherited from the language spoken by the preceding generation, regard-
less of whether they are functional or not, whether they are good representatives
of a language universal or not, whether they reflect cross-linguistic preferences
or not. Only changes in that structure are in a more direct way interpretable as
adjustments towards more general, structural or functional tendencies. Things
cannot be invented ex nihilo in biology, either. The predecessors of sea urchins
were sessile and had no limbs. Later, pre-sea urchins began to move, perhaps as
a reaction to a change in their environment. Evolution did not invent new limbs
because there was nothing which could be transformed into limbs, due to the
pentaradial-symmetric structure of the pre-sea urchin’s body. But pre-sea urchins
had spines, and indeed today’s sea urchins use their spines for motion (Knop
2008: 9).
Finally, if the synchronic grammar allows for a great degree of autonomy, i. e.
independence from functional motivations, one question remains: Is all syntactic
structure just historical contingency? Given our assumptions, shouldn’t it be the
case that anything goes in syntax, without respect to limitations of possible cross-
linguistic variation? Probably not. First, certain types of change are likely to occur
and produce certain kinds of synchronic structure. This idea has been elaborated
in great detail in the field of phonology (Blevins 2004). According to Blevins’
theory of evolutionary phonology, cross-linguistically recurrent patterns are not
so much due to (innate) language universals but rather due to the fact that these
patterns are the results of common types of phonological change. It is worth con-
sidering to what extent this approach is applicable to syntax as well (evolution-
ary syntax in analogy to evolutionary phonology). Second, even linguists who
are generally skeptical about the idea of Universal Grammar must acknowledge
the striking fact that the syntaxes of all languages have something to say about
constituent structure, recursion, grammatical function, lexical classes and basic
principles of case marking and agreement. Whereas Culicover and Jackendoff
(2005) reject the concrete instantiation of Universal Grammar as suggested by
mainstream generativist linguistics in its technical detail, they nonetheless main-
tain the idea that limitations of cross-linguistic variation cannot be understood
without reference to a downsized version of Universal Grammar, which consists
exactly of the ingredients quoted above (Culicover and Jackendoff 2005: 40).
Let us now construct a last, more far-reaching analogy to evolutionary
biology. We have tried to show that both structure-driven and function-driven
explanations are justified in linguistics: Both structural and functional causa-
tions are at play in syntactic patterning, variation and change. Having accepted
that both explanations are necessary, a central question of theoretical linguistics
must be: In what ways do structure and function interact, and in what sense are
they independent of each other? How can we talk about structural and functional
causations in an objective, non-sectarian way? The answer is clear: by acknowl-
edging that they are complementary. Structural and functional approaches
explain different aspects of language. They are, so to speak, in complementary
distribution, and this is exactly the reason why they are ultimately compatible
with each other. Evolutionary biology could serve as a model for the integration
of different, but compatible levels of explanation.
According to Nesse (2009), biologists distinguish between two levels of
explanation – proximate and evolutionary – which coexist side by side and are
complementary to each other: “The most fundamental distinction in biology
is between proximate and evolutionary explanations. Proximate explanations
are about a trait’s mechanism […]. Evolutionary explanations are about how
the mechanism came to exist. These two kinds of explanation do not compete.
They are fundamentally different. Both are essential for a complete explanation”
(Nesse 2009: 158). Based on the fundamental distinction between proximate and
evolutionary explanations, Tinbergen (1963) even distinguishes between four
questions a biologist must deal with in order to arrive at a complete explanation
of a trait. Tinbergen’s questions supplement the proximate–evolutionary distinction
with the dimensions of ontogeny and phylogeny. They are “now nearly univer-
sal as a foundation for the study of animal behavior […]. Textbooks all begin by
explaining the need for all four kinds of explanation” (Nesse 2009: 159):

(16) Tinbergen’s Four Questions (following Nesse 2009: 159):
1. What is the mechanism? [proximate]
2. What is the ontogeny of the mechanism? [proximate]
3. What is the phylogeny of the mechanism? [evolutionary]
4. What selection forces shaped the mechanism? [evolutionary]

The proximate–evolutionary distinction was introduced and promoted mainly by
evolutionary biologist Ernst Mayr. As Nesse (2009: 159) points out, Mayr’s
terminology has caused confusion insofar as he sometimes calls “ultimate” what
is usually called “evolutionary”, and “functional” what is usually called
“proximate”, as we will see below.
Terminological and technical details aside, the crucial point about the
proximate–evolutionary distinction is its role in the historical development of the
discipline. As Mayr (1997) himself points out, there was an immense controversy
in biology, too, which is surprisingly reminiscent of the formalist vs. functionalist
divide in theoretical linguistics. One camp of biologists claimed that biological
traits must be explained on the basis of the instructions given by genetic pro-
grams. This is the type of biological explanation which we called proximate. The
other camp defended the view that explanations must be formulated in terms of
the function of a trait in its evolutionary context. This is the type of biological
explanation which we called evolutionary. (In Mayr’s own terminology, “func-
tional” refers to proximate explanations, which is the source of the terminologi-
cal confusion mentioned above.) Mayr (1997) states:
Every phenomenon or process in living organisms is the result of two separate causations,
usually referred to as proximate (functional) causations and ultimate (evolutionary)
causations. All the activities or processes involving instructions from a program are prox-
imate causations. [...] Ultimate or evolutionary causations are those that lead to the origin
of new genetic programs or to the modification of existing ones – in other words, all causes
leading to the changes that occur during the process of evolution. [...] It is nearly always
possible to give both a proximate and an ultimate causation as the explanation for a given
biological phenomenon. [...] Many famous controversies in the history of biology came
about because one party considered only proximate causations and the other party consid-
ered only evolutionary ones. (Mayr 1997: 67)

The debates in biology and linguistics do not, of course, match in detail. For
example, one might ask whether functional explanations in linguistics are anal-
ogous to evolutionary explanations of phylogeny in biology, to phenotypic plas-
ticity of organisms (van Buskirk and Schmidt 2000), or to both.8 However, the
fundamental structure of the debates in biology and linguistics is astonishingly
similar. In both disciplines, two schools defended their way of explaining aspects
of nature as the only possible one at their time: proximate vs. evolutionary in
biology, formal vs. functional in linguistics. The main difference between biology
and linguistics lies in the fact that the complementarity (and compatibility) of
the two kinds of explanation has been widely accepted by biologists since the
modern evolutionary synthesis some seventy years ago. A modern linguistic
synthesis is yet to come.
For linguists, this is hardly something to be proud of.

8 Phenotypic plasticity leaves room for direct interactions between traits and environment,
whereas in phylogeny this interaction is mediated by evolution. Nevertheless, the existence of
phenotypic plasticity simultaneously calls for proximate and evolutionary explanations: How
does it work, and how did it come into being?

References
Aissen, Judith (2003): Differential object marking: Iconicity vs. economy. Natural Language and
Linguistic Theory 21: 435–483.
Aronoff, Mark (1994): Morphology by Itself: Stems and Inflectional Classes. Cambridge, MA: MIT
Press.
Bäbler, Heinrich (1949): Glarner Sprachschuel: Mundartsprachbuch für die Mittel- und
Oberstufe der Glarner Schulen. Glarus: Verlag der Erziehungsdirektion.
Blake, Barry (2001): Case. 2nd ed. Cambridge: Cambridge University Press.
Blevins, James P. and Juliette Blevins (eds.) (2009): Analogy in Grammar. Form and Acquisition.
Oxford et al.: Oxford University Press.
Blevins, Juliette (2004): Evolutionary Phonology: The Emergence of Sound Patterns. Cambridge:
Cambridge University Press.
Blümel, Rudolf (1914): Einführung in die Syntax. Heidelberg: Winter.
Braune, Wilhelm (2004): Althochdeutsche Grammatik. Edited by Ingo Reiffenstein. Tübingen:
Niemeyer.
Bresnan, Joan (2001): Lexical-Functional Syntax. Malden, MA/Oxford, UK: Blackwell.
Bybee, Joan (2010): Language, Usage and Cognition. Cambridge et al.: Cambridge University
Press.
Choi, Hye-Won (1997): Optimizing Structure in Context. Scrambling and Information Structure.
Stanford: Center for the Study of Language and Information.
Croft, William (1995): Autonomy and functionalist linguistics. Language 71: 490–532.
Croft, William (2000): Explaining Language Change. Harlow et al.: Longman.
Culicover, Peter W. and Ray Jackendoff (2005): Simpler Syntax. Oxford et al.: Oxford University
Press.
de Smet, Hendrik (2012): The course of actualization. Language 88: 601–633.
de Vogelaer, Gunther (2007): Darwinian or Lamarckian change: innovative 2pl.-pronouns in
English and Dutch. In: Frank Brisard (ed.), Papers of the Linguistic Society of Belgium,
1–14. Bruxelles: Linguistic Society of Belgium.
Fanselow, Gisbert and Sascha W. Felix (1993): Sprachtheorie I: Grundlagen und Zielsetzungen.
3rd ed. Tübingen: Francke.
Gentner, Dedre, Keith J. Holyoak and Boicho N. Kokinov (eds.) (2001): The Analogical Mind:
Perspectives from Cognitive Science. Cambridge, MA/London: MIT Press.
Givón, Talmy (1979): On Understanding Grammar. New York: Academic Press.
Givón, Talmy (1984): Direct object and dative shifting: semantic and pragmatic case. In: Frans
Plank (ed.), Objects. Towards a Theory of Grammatical Relations, 151–182. London/New
York: Academic Press.
Halliday, Michael A. K. (1973): Explorations in the Functions of Language. London: Arnold.
Haspelmath, Martin (1999a): Optimality and diachronic adaptation. Zeitschrift für Sprachwis-
senschaft 18: 180–205.
Haspelmath, Martin (1999b): Why is grammaticalization irreversible? Linguistics 37:
1043–1068.
Hockett, Charles F. (1960): The origin of speech. Scientific American 203: 88–96.
Idiotikon (1881–): Schweizerisches Idiotikon. Wörterbuch der schweizerdeutschen Sprache.
Begonnen von Friedrich Staub und Ludwig Tobler und fortgesetzt unter der Leitung
von Albert Bachmann, Otto Gröger, Hans Wanner, Peter Dalcher, Peter Ott, Hanspeter
Schifferle. Frauenfeld: Huber.
Itkonen, Esa (2005): Analogy as Structure and Process: Approaches in Linguistics, Cognitive
Psychology and Philosophy of Science. Amsterdam/Philadelphia: John Benjamins.
Keller, Rudi (1994): Language Change: The Invisible Hand in Language. London: Routledge.
Kiparsky, Paul (1973): Abstractness, opacity and global rules. In: Osamu Fujimura (ed.), Three
Dimensions of Linguistic Theory, 57–86. Tokyo: Tokyo Institute for Advanced Studies of
Language.
Kiparsky, Paul (1982): Explanation in Phonology. Dordrecht: Foris.
Kiparsky, Paul (2012): Grammaticalization as optimization. In: Dianne Jonas, John Whitman and
Andrew Garrett (eds.), Grammatical Change: Origins, Nature, Outcomes, 15–51. Oxford
et al.: Oxford University Press.
Knop, Daniel (2008): Seeigel im Meerwasseraquarium. Münster: Natur und Tier.
Lambrecht, Knud (1994): Information Structure and Sentence Form. Topic, Focus, and the
Mental Representations of Discourse Referents. Cambridge: Cambridge University Press.
Mayr, Ernst (1997): This is Biology. The Science of the Living World. Cambridge, MA/London:
Harvard University Press.
Musan, Renate (2002): Informationsstrukturelle Dimensionen im Deutschen. Zur Variation der
Wortstellung im Mittelfeld. Zeitschrift für germanistische Linguistik 30: 198–221.
Nesse, Randolph M. (2009): Evolutionary and proximate explanations. In: David Sander and
Klaus R. Scherer (eds.), The Oxford Companion to Emotion and the Affective Sciences,
158–159. Oxford: Oxford University Press.
Nübling, Damaris (1992): Klitika im Deutschen. Tübingen: Narr.
Percival, Keith W. (1971): The Neogrammarian approach to syntactic change. Manuscript
presented at the Twenty-Fourth Annual University of Kentucky Foreign Language
Conference in Lexington, Kentucky, 22–24 April 1971. http://people.ku.edu/~percival/NeogramSyntax.html.
Rosenbach, Annette (2008): Language change as cultural evolution: Evolutionary approaches
to language change. In: Regine Eckardt, Gerhard Jäger and Tonjes Veenstra (eds.),
Variation, Selection, Development: Probing the Evolutionary Model of Language Change –
Proceedings of Blankensee Colloquium 2005, 23–72. Berlin/New York: Mouton de Gruyter.
Schöpf, Johann Baptist (1866): Tirolisches Idiotikon. Innsbruck: Wagner.
Seiler, Guido (2002): Prepositional dative marking in Upper German: a case of syntactic
microvariation. In: Sjef Barbiers, Susanne van der Kleij and Leonie Cornips (eds.), Syntactic
Microvariation, 243–279. Amsterdam: Meertens Instituut. Available at: www.meertens.
knaw.nl/projecten/sand/synmic/.
Seiler, Guido (2003): Präpositionale Dativmarkierung im Oberdeutschen. Stuttgart: Steiner.
Seiler, Guido (2004): The role of functional factors in language change. An evolutionary
approach. In: Ole Nedergaard Thomsen (ed.), Competing Models of Linguistic Change.
Evolution and beyond, 163–182. (Current Issues in Linguistic Theory 279.) Amsterdam/
Philadelphia: John Benjamins.
Siewierska, Anna (1991): Functional Grammar. London: Routledge.
Vallduví, Enric (1992): The Informational Component. New York: Garland.
van Buskirk, Josh and Benedikt R. Schmidt (2000): Predator-induced phenotypic plasticity in
larval newts: trade-offs, selection, and variation in nature. Ecology 81: 3009–3028.
Rena Torres Cacoullos, Penn State University
Gradual loss of analyzability:
Diachronic priming effects

Abstract: Competing accounts of the formation of grammatical units are tested
by deploying the facts of variation of the Spanish Progressive. First, unithood
and frequency measures support usage-based chunking as more tenable than
formal reanalysis as an account of change in constituency. Second, compari-
son of multivariate models of variation over time reveals that the spread of the
Spanish Progressive relative to the simple Present has been differential, as shown
in change in the linguistic conditioning of variant choice, in disagreement with
an abrupt-reanalysis, constant-rate hypothesis but in support of gradual change
in diachrony and inherent variability in synchrony. Third, a priming effect – such
that selection of a given construction is favored by previous use of a related con-
struction (here, priming of the estar Progressive by non-Progressive estar con-
structions) – is introduced as a measure of internal structure, in particular, of
(loss of) analyzability.

1 Introduction

How do grammatical units come about, and how can change in constituency be
observed? Reanalysis is widely invoked by linguists of otherwise different per-
suasions as a pivotal mechanism of syntactic change. Reanalysis is understood
to change underlying structure, including constituency and syntactic-category
labels (Campbell 1998: 284). For example, the English future auxiliary is said to
result from reanalysis of the purposive motion construction of main verb go with
a non-finite clause complement, as represented by rebracketing of some kind:
[BE going [to Verb]] > [BE going to Verb] (Hopper and Traugott 1993: 3). A material
indication of such reanalysis would be phonetic reduction of going to to gonna.
Reanalysis has been seen as abrupt, following from the view that each word
sequence must have a unique constituent analysis, which in turn follows from
the formalist (generative) view that proposed syntactic rules or constraints are
categorical and that syntactic categories, for example, main vs. auxiliary verb,
are discrete. But the facts of synchronic variation, as between going to and gonna
(e. g., Poplack and Tagliamonte 1999: 328–332), disturb an understanding of
grammatical change as abrupt reanalysis.
The probabilistic aspects of grammar (Labov 1969; Cedergren and Sankoff
1974) are now being recognized by more linguists, who are exploring usage-based
and emergentist theories of grammar. Bybee (2010) proposes that constituent
structure is derivable from domain-general mechanisms in operation as speakers
produce and process language. Pertinent here is the fusing of sequential expe-
riences that occurs with repetition or, for language, the chunking of frequent word
sequences as single processing units (Bybee 2010: 34 and references therein). For
example, the vowel of don’t is more likely to reduce to a schwa in I don’t know
than when the main verb is a less frequent one, as in I don’t inhale, even though
the two expressions are apparently of the same syntactic structure (Scheibman
2001: 114).
In a usage-based view, a consequence of frequent repetition and ensuing
chunking of contiguous linguistic units is the loss of analyzability of the sequence
of (erstwhile) units (Bybee 2010: 44–45; see also Croft and Cruse 2004: 250–253;
Langacker 1987: 292). Analyzability is seen as a morphosyntactic parameter that
has to do with the degree to which the internal structure is discernible (akin to
the morphological “decomposability” of complex words (cf. Hay 2001)), which
is not subsumable under a semantic criterion. For example, while pull strings
has a non-transparent meaning that is not predictable from pull and strings, it is
syntactically analyzable, in that speakers presumably recognize the component
parts as individual words and the relation between them, here a verb with its
object.
With schematic (productive) constructions that have open classes of items,
such as [BE going to Verb], loss of analyzability is understood as the weakening
of the association between the erstwhile individual components and other
instances of the same items. In Bybee’s (2003: 618) example, as going to reduces
to gonna, its composite morphemes lose their association with go, to or -ing. But
what observations provide evidence for “association” and its loss?
In this paper, the facts of variation are deployed to tackle the question of
how grammatical units come about. I use variability in the Spanish Progressive
to test gradualness vs. abruptness in the formation of grammatical units and put
forward diachronic priming effects as a gauge of analyzability and its erosion over
time. After presenting the linguistic variable in Section 2, I begin in Section 3
with a recapitulation of unithood indices and frequency measures, which score
an initial point in favor of usage-based chunking as more tenable than formal
reanalysis. I then present a multivariate model of variation between the Progres-
sive and simple Present, in Section 4. The shift in the relative importance of
aspectual reading and locative co-occurrence scores a further point in favor of
gradual change in diachrony and inherent variability in synchrony. In Section 5 I
introduce priming effects as a measure of (erosion of) analyzability.
2 Spanish Progressive ESTAR + Verb-ndo

Latin did not have a dedicated morpheme or construction for progressive aspect,
the simple Present serving this function among others (Allen and Greenough 1916:
293, § 465). Probably the most common source for progressives crosslinguistically
is locative expressions (Bybee, Perkins and Pagliuca 1994: 127–133; cf. Comrie
1976: 98–105). Beginning from the earliest Spanish texts, we find gerunds (-ndo
forms) combining with finite forms of spatial (locational, postural or movement)
verbs. Besides estar ‘be (at)’, these were usually ir ‘go’, andar ‘walk, go around’,
venir ‘come’, salir ‘go out’, quedar ‘remain, stand still’. Examples are (1a), with ir,
and (1b), with venir.

(1a) déxa=me dezir, que se va hazie-ndo noche
let.imp=acc.1sg say.inf that refl go.prs.3sg make-ger night
‘let me speak, it is [literally: goes] becoming night’
(15th c., Celestina, Act VI)

(1b) ¿No oyes lo que viene canta-ndo ese villano?
neg hear.prs.2sg that.rel come.prs.3sg sing-ger that rustic
‘Don’t you hear what that rustic is [literally: comes] singing?’
(17th c., Quijote II, Ch. IX)

Allen and Greenough (1916: 819, § 507) give a medieval Latin example of this
general Spatial Verb + Verb-ndo (gerund) construction, cum una dierum flendo
sedisset, quidam miles generosus iuxta eam equitando venit (Gesta Romanorum,
66 [58]) ‘as one day she sat weeping, a certain knight came riding by’.
In Torres Cacoullos (2000) I adduced evidence for the origins of Spanish
Progressive ESTAR (< Latin stare ‘stand’) + Verb-ndo (gerund) as a locative
expression ‘be located somewhere Verb-ing’ from its early distributions across
co-occurring locatives (most frequently with en ‘in’) and verbs (most frequently
hablando ‘talking’, other verbs of speech, esperando ‘waiting’, and verbs of body
activity). These co-occurring elements are consonant with being stationary in a
given place. A 13th century example appears in (2). In contrast, gerund combina-
tions with motion verbs ir ‘go’ and andar ‘walk (around)’ tended to co-occur with
other kinds of locatives (a ‘to’, por ‘along’) and verb classes (motion, process,
general activity).

(2) u<uest>ros hebreos estan aqui razona-ndo
prs.3pl here discourse-ger
‘your Hebrews are here conferring’
(13th c., General Estoria I, fol. 151r)
The key construct in variation theory is the linguistic variable (Labov 1969), a set
of variants which “are used interchangeably to refer to the same states of affairs”
(Weiner and Labov 1983: 31), i. e. “alternative ways of saying the same thing”
(Labov 1982: 22). In the pair of examples from a 19th century play in (3), the “same
thing”, or grammatical function, is present progressive and the “alternative
ways”, or variants, are the Progressive and simple Present forms. In the English
translation, PROG designates the Progressive – ESTAR + Verb-ndo – as in (3a),
PRS the simple Present, as in (3b). Both forms here express a situation in progress
at the moment of speech.

(3a) EDUARDO. – No me muestres esa compasión. Yo no la merezco.
¿Sabes tú con quién estás habla-ndo?
know.prs.2sg you with rel be.prs.2sg speak-ger
‘EDUARDO: Don’t show me such compassion. I don’t deserve it.
Do you know who you are talking to (PROG)?’
(19th c., Amor de padre, Act 5, Scene 2)

(3b) AGENTE. – ¿Cómo tienes valor?
¿Olvidas que hablas con un republicano?
forget.prs.2sg comp speak.prs.2sg with a republican
‘AGENT: How do you have the courage?
Do you forget that you are talking (PRS) to a republican?’
(19th c., Amor de padre, Act 3, Scene VII)

The variable context is the sum of contexts where distinctions in grammatical
function among different forms may be “neutralized in discourse” (Sankoff 1988a:
153). It is defined here broadly as the domain of present temporal reference,
since the Progressive and simple Present also compete as expressions of
non-progressive present situations (e. g., (8), below). We circumscribe a variable context
in order to adhere to the principle of accountability, which requires that not only
occurrences but also non-occurrences of a given variant be noted (Labov 1982: 30);
here, the non-occurrences are contexts where the Progressive could have materialized
but the simple Present did instead, as in (3b).
Tokens of Present-tense ESTAR + Verb-ndo were exhaustively extracted from
a corpus of 60 texts from three time periods, the 13th–15th, 17th and 19th
centuries (traditionally, Old Spanish, Golden Age Spanish, and Modern Spanish;
the texts are listed in the Appendix). Tokens of the “non-occurrences”, i. e., of the
simple Present, were extracted by taking simple Present-tense occurrences of the
same lexical types that appear in the Progressive in a given text. From this sample,
Present-tense forms with future or past temporal reference were excluded, for
example, estaba […] para montar a caballo […], cuando oigo ¡tras tris, tras tras! ‘I
was […] about to get on the horse, when I hear tras tris, tras tras!’ (Pazos, Ch. XXI).
Also discarded were first- or second-person singular discourse routines (e. g., digo
‘I say’, ya ve(s) ‘you see’) and prefabs involving ser ‘be’ (e. g., es que ‘it’s that’).
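The extraction protocol just described (all Progressive tokens, plus simple Present tokens of lexical types that appear in the Progressive in the same text, minus forms with non-present reference and routines) can be sketched in Python. This is a minimal illustration under stated assumptions, not the procedure actually used for the corpus: the `Token` record and its fields are hypothetical, temporal reference and routine status are assumed to have been hand-coded beforehand, and the lexical types used for matching are taken from all Progressive tokens in a text.

```python
from dataclasses import dataclass


# Hypothetical token record; the field names are illustrative only.
@dataclass(frozen=True)
class Token:
    text_id: str           # corpus text the token comes from
    lemma: str             # lexical type of the main verb, e.g. "hablar"
    form: str              # "PROG" (ESTAR + Verb-ndo) or "PRS" (simple Present)
    reference: str         # hand-coded temporal reference: "present", "past", "future"
    routine: bool = False  # discourse routine or prefab (digo, ya ves, es que)


def build_sample(tokens):
    """Return Progressive tokens plus simple Present tokens of lexical types
    that occur in the Progressive in the same text, excluding non-present
    temporal reference and routines/prefabs."""
    # 1. Collect, per text, the lexical types attested in the Progressive.
    prog_lemmas = {}
    for t in tokens:
        if t.form == "PROG":
            prog_lemmas.setdefault(t.text_id, set()).add(t.lemma)

    # 2. Keep Progressive tokens and lexically matched simple Present tokens.
    sample = []
    for t in tokens:
        if t.reference != "present" or t.routine:
            continue  # exclusions applied to both variants
        if t.form == "PROG":
            sample.append(t)
        elif t.form == "PRS" and t.lemma in prog_lemmas.get(t.text_id, set()):
            sample.append(t)
    return sample


tokens = [
    Token("T1", "hablar", "PROG", "present"),
    Token("T1", "hablar", "PRS", "present"),                # kept: matched type
    Token("T1", "comer", "PRS", "present"),                 # dropped: no PROG "comer" in T1
    Token("T2", "hablar", "PRS", "present"),                # dropped: no PROG in T2
    Token("T1", "hablar", "PRS", "past"),                   # dropped: past reference
    Token("T1", "hablar", "PRS", "present", routine=True),  # dropped: routine
]
sample = build_sample(tokens)
print([(t.text_id, t.lemma, t.form) for t in sample])
```

Matching within each text, rather than across the whole corpus, keeps the simple Present sample tied to the lexical types a given author actually used in the Progressive.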
Gradual loss of analyzability: Diachronic priming effects 269

Following these protocols, a total of 2,272 tokens of the Progressive or simple
Present (616 Progressive, 1,656 simple Present) were retained for the analyses of
variation. Table 1 gives the number of texts, word counts and Ns for the three time periods.

Table 1: Data for the study of Progressive – simple Present variation in present temporal reference contexts

                    13th–15th century   17th century   19th century
No. texts           17                  15             28
Word count          2,500,000           600,000        900,000
N Progressive       119                 180            317
N simple Present    429¹                564            663
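As a consistency check, the Ns in Table 1 can be summed by period and overall. In the sketch below (the dictionary layout is our own), the per-period totals coincide with the Ns of the Input row of Table 4 (548, 744, 980).

```python
# Ns from Table 1: period -> (N Progressive, N simple Present)
TABLE_1 = {
    "13th-15th": (119, 429),
    "17th": (180, 564),
    "19th": (317, 663),
}

# Per-period totals of both variants
per_period = {period: prog + prs for period, (prog, prs) in TABLE_1.items()}

total_prog = sum(prog for prog, _ in TABLE_1.values())
total_prs = sum(prs for _, prs in TABLE_1.values())
grand_total = total_prog + total_prs

print(per_period)                          # {'13th-15th': 548, '17th': 744, '19th': 980}
print(total_prog, total_prs, grand_total)  # 616 1656 2272
```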

All tokens of both forms were coded according to a number of hypotheses about
variant choice, operationalized as factors based on the presence or absence of
linguistic elements of the context in which the token occurs. Included in the factor
groups (independent or predictor variables, or constraints) are co-occurrence of
locative adverbials, aspectual reading and priming. The linguistic conditioning of
variant selection is instantiated in probabilistic associations of forms with
contextual elements. A multivariate model of the variation is presented in Section 4,
after we first consider evidence from distributional analysis and frequency
counts in Section 3.

3 Unithood and frequency

Spanish Progressive ESTAR + Verb-ndo would seem a good candidate for change
via either reanalysis or loss of analyzability. The change in constituent structure
would be from a sequence of two independent units – a finite form of main verb
estar ‘to be (located)’ with a gerund -ndo complement – to a single periphrastic
unit, in which estar is an auxiliary and the gerund is the main verb (4).

(4) [ESTAR]_verb + [Verb-ndo (gerund)]_complement  >  [ESTAR_aux + Verb-ndo_verb]_Progressive
    (> = reanalysis / loss of analyzability)

1 For the 13th–15th century simple Present sample, tokens of lexical types appearing in the
Progressive were not extracted from Grimalte y Gradissa and Crónica de los Reyes Católicos, for which
electronic versions were not available (three Progressive tokens each); also omitted were Present
tokens of frequent decir ‘say’ in Corbacho (of which there was one Progressive token). More on
the texts, the simple Present sampling and exclusions is given in Torres Cacoullos (2012).

Whereas in English going to the items are contiguous, here we have a schematic
construction with an intervening slot for an open class of items, the Verb. In the absence
of surface phonetic reduction such as that of English future gonna, what evidence could
be assembled for the status of ESTAR + Verb-ndo as a unitary constituent?
We may take the obverse of analyzability to be unithood, operationalized as
the proportion of the instances of the construction in which the adjoining items
behave as a single unit, i. e. as one word. In previous work (e. g., Torres Cacoullos
2006; Torres Cacoullos and Walker 2011) we developed unithood indices from
distributional analysis, which tracks proportions of tokens of an expression across
its contexts of occurrence. Increasing unithood of ESTAR + Verb-ndo has been
inferred from a decreasing proportion of occurrences with elements intervening
between estar and the gerund, with more than one gerund per estar, or with the
gerund preceding estar (Torres Cacoullos 2000: 31–55; Bybee and Torres Cacoullos
2009: 201–203; Torres Cacoullos 2012: 79).
A more direct index of unithood is the positioning of object pronouns, which
precede finite verb forms in modern Spanish (Torres Cacoullos 1999b). In (5a) the
object pronoun (le) is postposed to the gerund (is telling him); in (5b) the pronoun
(lo) is preposed to estar (literally, ‘it are saying’). The latter configuration, known
as “clitic climbing”, has been viewed in generative syntax as a restructuring of a
series of verbs into a single verbal complex (e. g., Rizzi 1982). In a functionalist
view, “clitic climbing” has been seen as a manifestation of the grammaticalization
of auxiliaries, as a verb comes to express grammatical (e. g., aspectual,
progressive) more than lexical (e. g., spatial, locative) meaning (Myhill 1988).

(5a) [ESTAR] + [Verb-ndo + object pronoun/clitic]

Yo voy con tu cordon tan alegre: que se me figura que
esta dizie-ndo le alla su coraçon la merced que nos heziste
be.prs.3sg tell-ger dat.3sg there his heart […]
‘I’ll go with your cord so happily, I can almost imagine that his heart there is telling
him of the great favor you have done us’
(15th c., Celestina, Act IV, fol. 32r)

(5b) [object pronoun/clitic + ESTAR + Verb-ndo]

– Pero que nosotros tampoco les vamos a dar cien días. Vamos a decir lo que nos parezca
desde hoy.
– Ya lo estamos dicie-ndo.
already acc.3sg be.prs.1pl tell-ger
‘But we’re not going to give them a hundred days. We’re going to say what we think
starting today.’
‘We [it] are already saying it’
(20th c., CORLEC, CDEB014A, p215–p216)

In the 15th century example in (5a) ESTAR + Verb-ndo is compatible with locative
meaning, indicated by co-occurring allá ‘there’ in the same clause and the motion
verb voy ‘I go’ in a previous clause: the speaker will go to where the person
represented metonymically by his heart is located (está…allá ‘is…there’). At the same
time there is aspectual meaning, as conveyed by se me figura ‘I can imagine’: the
situation referred to by the gerund is in progress at speech time. In comparison,
spatial meaning appears at best attenuated in the 20th century example in (5b),
where aspectual meaning is most prominent, as indicated by the co-occurring temporal
adverbial ya ‘already’: the speaker asserts that the verbal situation (diciendo
‘saying’) is in progress.
In Table 2, though the count of all eligible cases is low, there is a clear trend
of increased rates of placement before estar (proclisis). Increasing placement of
object pronouns before the whole complex (as with single finite verbs), rather
than attached to the gerund, can be taken as an indication of enhanced unithood.2

Table 2: Increasing unithood of ESTAR (Present) + Verb-ndo: placement of object pronouns before
estar (“clitic climbing”)³

13th–15th century   17th century   19th century   20th century⁴
71 % (10/14)        72 % (18/25)   89 % (58/65)   97 % (100/103)
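The rates in Table 2 are simple proportions: tokens with the object clitic placed before estar, divided by all eligible tokens. A minimal sketch that recomputes them follows; the function name `unithood_index` is ours, not an established term. Per note 3, the 13th–15th and 17th century denominators also contain a few pronouns placed between estar and the gerund.

```python
# Table 2 counts: period -> (clitic placed before estar, all eligible tokens)
CLITIC_PLACEMENT = {
    "13th-15th": (10, 14),
    "17th": (18, 25),
    "19th": (58, 65),
    "20th": (100, 103),
}


def unithood_index(preposed, total):
    """Proportion of tokens in which the object clitic precedes the whole
    ESTAR + Verb-ndo sequence, as it would precede a single finite verb."""
    return preposed / total


percent = {period: round(100 * unithood_index(*counts))
           for period, counts in CLITIC_PLACEMENT.items()}
print(percent)  # {'13th-15th': 71, '17th': 72, '19th': 89, '20th': 97}
```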

Unithood is a theory-neutral measure, compatible with either reanalysis or loss of
analyzability. However, the two accounts differ in the role they predict for
frequency. On the one hand, loss of analyzability attributable to chunking
depends on repetition. Applied to the case at hand, with frequent repetition the
sequence ESTAR + Verb-ndo would become a new chunk – more of a fused unit.
On the other hand, a theory of syntactic change based on reanalysis in terms of

2 The 19th and 20th century rates of preposed object clitics shown in Table 2 are higher than those for
all tenses of ESTAR + Verb-ndo: respectively, 70 % (54/77) in the same texts (reported in Bybee
and Torres Cacoullos 2009: 203) and 89 % (103/115) in Mexico City “habla popular” (UNAM 1976;
reported in Torres Cacoullos 1999b: 146). This is consonant with Progressive grammaticalization
advancing in present before past tenses (Torres Cacoullos 2012: 110, n. 3), whereas habitual
markers are said to appear in the past before generalizing to present temporal reference contexts
(Bybee et al. 1994: chapter 5).
3 In χ² tests, the difference between the 17th and 19th centuries is not significant at the .05 level
(p < 0.06); the difference between the 19th and 20th centuries is significant (p < 0.05). The
13th–15th and 17th century totals include object pronouns placed between estar and the gerund
(e. g., ell Aguila esta la remira<n>-do, GEII, fol. 189v) (N = 2 and N = 3, respectively).
4 20th century = CORLEC (Marcos Marín 1992) “conversacional” portion (see Table 1 for remaining data).

operations such as “movement” and “merge” (e. g., Roberts and Roussou 2003)
makes no predictions about frequency of use, under the assumption that usage
does not impinge on grammar (e. g., Newmeyer 2003).
Table 3 displays three frequency counts for ESTAR + Verb-ndo, one absolute,
i. e. the token frequency of the construction, and two relative, namely the
proportion it constitutes of gerund constructions vis-à-vis other spatial auxiliaries and
its rate relative to the simple Present. The first count, in the first row, is
straightforward text frequency normalized per 100,000 words (based on the figures given
in Table 1 above), by which there is an evident rise (cf. Torres Cacoullos 2012: 77).
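The normalization in the first row of Table 3 is count ÷ word count × 100,000. The sketch below recomputes that row from the word counts in Table 1 and, for the 20th century, the approximate CORLEC figures given in note 5 (241K words, 364 tokens). Rounding to whole numbers is our assumption about how the table values were rounded.

```python
# Word counts (Table 1; 20th century from note 5) and N of Present-tense ESTAR + Verb-ndo
WORDS = {"13th-15th": 2_500_000, "17th": 600_000, "19th": 900_000, "20th": 241_000}
PROG_N = {"13th-15th": 119, "17th": 180, "19th": 317, "20th": 364}


def per_100k(count, words):
    """Text frequency normalized to occurrences per 100,000 words."""
    return count / words * 100_000


freq = {period: round(per_100k(PROG_N[period], WORDS[period])) for period in WORDS}
print(freq)  # {'13th-15th': 5, '17th': 30, '19th': 35, '20th': 151}
```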

Table 3: Frequency increase of ESTAR (Present) + Verb-ndo

                                    13th–15th     17th          19th          20th
                                    century       century       century       century⁵
Token frequency per 100k words      5             30            35            151
Proportion of estar gerund          38 %          41 %          62 %          83 %
  constructions with respect to     (45/117)      (54/133)      (93/149)      (364/436)
  other locative-postural-
  movement verbs⁶
Rate relative to simple Present     14 %          24 %          32 %
  occurrences of verbs which        (39/282)⁷     (180/744)     (317/980)
  appear in ESTAR + Verb-ndo

The second frequency measure is the proportion that ESTAR + Verb-ndo
constitutes as an instance of the general gerund construction with finite forms of
spatial verbs. Some of these gerund constructions (especially with ir and, in some
dialects, andar) remain robust in modern varieties of Spanish (Torres Cacoullos
1999a). Nevertheless, as Table 3 shows in the second row, the aspectual auxiliary
is increasingly likely to be estar rather than another spatial verb (cf. Torres
Cacoullos 2000: 55–60). We can think of this as a measure of string frequency, which
may indicate “chunk status” (Brown and Rivas 2011: 42–43).
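The string-frequency measure in the second row of Table 3 is estar's share of all gerund constructions with a finite locative-postural-movement verb. In this sketch the non-estar verbs are lumped together as `other`, since note 6 names only some of them; the `estar_share` helper and the `TABLE_3_ROW_2` layout are illustrative, not part of the original study.

```python
from collections import Counter


def estar_share(aux_counts):
    """Share of estar among all gerund constructions with a finite
    locative-postural-movement verb (estar, ir, andar, venir, ...)."""
    total = sum(aux_counts.values())
    return aux_counts["estar"] / total


# Table 3, second row: period -> (estar tokens, all such gerund constructions)
TABLE_3_ROW_2 = {"13th-15th": (45, 117), "17th": (54, 133),
                 "19th": (93, 149), "20th": (364, 436)}

shares = {period: round(100 * estar_share(Counter(estar=e, other=n - e)))
          for period, (e, n) in TABLE_3_ROW_2.items()}
print(shares)  # {'13th-15th': 38, '17th': 41, '19th': 62, '20th': 83}
```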

5 20th century = CORLEC (Marcos Marín 1992) “conversacional” portion: word count 241K,
N ESTAR (Present) + Verb-ndo 364.
6 Counts from a subset of texts in Table 1: for the 13th–15th century from Calila, GE, Celestina, CRC
(andar (17), ir (39), venir (10)); 17th from Quijote (andar (12), ir (60), quedar (5)); 19th century, from
Pepita, Perfecta, Regenta, Pazos (andar (6), ir (39), seguir (5)); 20th century from CORLEC (ir (47),
seguir (20), venir (4)).
7 Count from 15th century Corbacho and Celestina only, furthermore not counting simple Present
decir ‘say, tell’ in the Corbacho.

A third frequency measure is the increasing rate of ESTAR + Verb-ndo relative
to the simple Present variant (as in the pair of examples in (3)). Here we count the
frequency of Progressive forms relative to simple Present forms of lexical types
appearing in the Progressive in the same text (cf. Table 1). Though such a
lexically based count does not provide overall rates, we again observe a rising trend,
in the third row of Table 3.
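The lexically based rate in the third row of Table 3 divides the Progressive count by the combined count of Progressive and lexically matched simple Present tokens; for the 17th and 19th centuries these are the Ns of Table 1. In the sketch, the 13th–15th century simple Present count is our own derivation (282 − 39) from the fraction reported in the table, and the `rate_relative` name is illustrative.

```python
def rate_relative(n_prog, n_prs):
    """Progressive tokens as a share of Progressive + simple Present tokens
    of the same lexical types (the lexically matched count of Table 3)."""
    return n_prog / (n_prog + n_prs)


# period -> (N Progressive, N lexically matched simple Present)
MATCHED = {
    "13th-15th": (39, 282 - 39),  # Corbacho and Celestina only (note 7)
    "17th": (180, 564),
    "19th": (317, 663),
}

rates = {period: round(100 * rate_relative(*counts)) for period, counts in MATCHED.items()}
print(rates)  # {'13th-15th': 14, '17th': 24, '19th': 32}
```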
In summary, from enhanced unithood of the construction, measured here by
object clitic position, we may infer the increasing absorption of the
auxiliary-in-becoming into a periphrastic unit. Coupled with this inference of loss of
analyzability is the finding that Present-tense ESTAR + Verb-ndo sequences increase in
frequency, by both absolute and relative frequency measures. We have support,
then, for the prediction following from the chunking hypothesis that loss of
analyzability of a word sequence is accompanied by frequency increases. In Section
5 I will put forward priming as a more direct measure of analyzability and its
erosion. But first, in the next section, evidence for gradualness in the evolution
of ESTAR + Verb-ndo is adduced from the comparison of probabilistic models of
its variable use over time.

4 Change as change in linguistic conditioning

4.1 Inherent variability vs. competing grammars: Predictions about contextual effects

The position of Weinreich, Labov and Herzog (1968: 101) is that “command of
heterogeneous structures is not a matter of multidialectalism or ‘mere’ performance,
but is part of unilingual linguistic competence”. In this view, systematic variation
belongs to (a single) grammar (Cedergren and Sankoff 1974: 334).
However, variation and apparent gradualness may be attributed to
competing grammars underlying a given form. Abruptness and discreteness may be
upheld in the face of observed variation by viewing the aggregate data as
reflecting the coexistence of multiple (generative) grammars. Language change is then
modeled as modification in the distribution of competing grammars over time
(e. g., Yang 2000). For example, the spread of English do-support across
different sentence types (e. g., negative declaratives and affirmative wh-object
questions) is seen as “surface manifestations of a single change in grammar” (such
as loss of Verb-to-Infl movement) (Kroch 1989: 199; but see Bybee 2010: chapter
7). Because change is postulated to be a single abrupt change in a parameter
setting and the gradual time course is seen as representing a shift from the old

invariant homogeneous grammar to the new one, the contexts of the change must
be uniform in their effects.
According to this scenario, a new structure such as do-support may be favored
earlier in some contexts than in others and begin in those contexts with a higher
rate, but the rate of change is constant across contexts; in terms of linguistic
conditioning, the effects remain fixed in magnitude and direction as the change is
propagated (Kroch 1989: 206).
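The constant rate hypothesis is commonly stated in logistic terms: each context has its own intercept, all contexts share one slope, so the difference between contexts in log-odds stays fixed while both rates rise. The toy sketch below makes this concrete; the intercepts, slope and context names are invented for illustration and are not fitted to any data.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    return math.log(p / (1.0 - p))


# Hypothetical parameters: each context has its own intercept k_c, but all
# contexts share one slope s (the constant rate hypothesis).
SLOPE = 0.8
INTERCEPTS = {"favoring context": -1.0, "disfavoring context": -2.5}


def rate(context, t, slope=SLOPE):
    """Rate of the new variant in `context` at time t: logistic(k_c + s * t)."""
    return logistic(INTERCEPTS[context] + slope * t)


for t in range(6):
    fav, dis = rate("favoring context", t), rate("disfavoring context", t)
    # Both rates rise, but their logit difference stays at 1.5 throughout.
    print(t, round(fav, 3), round(dis, 3), round(logit(fav) - logit(dis), 6))
```

Giving each context its own slope would make the logit difference change over time, which is the differential spread that a gradual modification of the grammar would allow.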
Returning to ESTAR + Verb-ndo, if the change is abrupt and it is the propagation
of change (e. g., across authors and texts) that is gradual, we should observe
that the frequency of the newer variant (the Progressive) relative to the older
one (the simple Present) increases at a constant rate, uniformly, across linguistic
contexts. But if the change itself is a gradual modification of the grammar, one
with inherent variability, the effects of linguistic contexts and conditions
could vary over the course of the change. That is, the rate of occurrence of the
Progressive could increase differentially across linguistic contexts.8 Does it?

4.2 Shifts in relative magnitude of effect

Table 4 shows three independent Variable-rule analyses (Sankoff, Tagliamonte
and Smith 2005) of the probability that the Progressive variant will be selected,
in 13th–15th century, 17th century and 19th century texts. Variable-rule analysis uses
logistic regression to perform binomial multivariate analysis for a choice between
two variants, here, the Progressive vs. the simple Present (Sankoff 1988b).
There are three lines of evidence in interpreting results of Variable-rule analysis
(Poplack and Tagliamonte 2001: 88–95):

1. statistical significance of effect, determining the factor groups (independent variables or
constraints) that together account for the largest amount of variation (in terms of stepwise
increase of log likelihood, such that the addition of any of the remaining factor groups does
not significantly increase the fit to the model);

2. direction of effect, with probabilities (or factor weights, shown in the Prob
columns) closer to 1 indicating a favoring, and closer to 0 a disfavoring, effect on ESTAR +
Verb-ndo. That is, the closer to 1 the probability, the greater the likelihood of the Progressive
in each of the contexts (factors) listed on the left;9

8 I thank Greg Guy for help in formulating the competing predictions about contextual effects.
Thanks to Shana Poplack and Catherine Travis for extensive comments on an earlier version of this
paper, and also to the editors of the volume, Aria Adli, Göz Kaufmann and Marco García García.
9 Factor weights for non-significant groups, from the first “step down” run, in which all groups

3. relative magnitude of effect, as assessed by the Range (given in the Range rows) between the
favoring and the disfavoring probability within each (binary) factor group.
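Variable-rule analysis rests on a logistic model in which the log-odds of the application value (here, the Progressive) are the sum of the log-odds of the input probability and of the factor weights of the factors present in the context, so a weight of .50 has no effect. The sketch below illustrates this additive combination and the Range computation with 19th-century weights from Table 4. It is a simplification: the other factor groups are assumed to sit at a neutral .50, whereas a real prediction would include a weight from every group in the model. The function names are ours.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def predicted_probability(input_prob, factor_weights):
    """Logistic Variable-rule model: log-odds of the Progressive = input
    log-odds + log-odds of each applicable factor weight."""
    return inv_logit(logit(input_prob) + sum(logit(w) for w in factor_weights))


def factor_range(weights):
    """Range of a factor group: (highest - lowest weight) x 100."""
    return round((max(weights) - min(weights)) * 100)


# 19th-century weights from Table 4; other factor groups assumed neutral (.50)
p = predicted_probability(0.33, [0.71, 0.67])  # limited duration + locative present
print(round(p, 2))                   # 0.71
print(factor_range([0.71, 0.12]))    # Aspect, 19th century: 59
print(factor_range([0.67, 0.49]))    # Locative co-occurrence, 19th century: 18
```

Because the 19th-century input (.33) and the locative weight (.67) are mirror images around .50, their log-odds cancel and the prediction equals the limited-duration weight of .71.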

With respect to significance, Table 4 shows that the factor groups contributing to variant
choice in the 13th–15th century data are Aspect, Locative co-occurrence and Priming (as
well as Polarity-Sentence type and Temporal co-occurrence, but not Stativity).
In the 19th century, Priming no longer significantly increases the fit to the model
(and Stativity has achieved significance). Focusing here on Locative co-occurrence
and Aspect, I base the quantitative argumentation that follows on direction of
effect, which has generally remained stable, and on relative magnitude of
effect, which turns out to be the main locus of change (on Priming, see Section 5;
on the other effects, see Torres Cacoullos 2012).
For the Locative co-occurrence factor group, tokens of both forms were coded
for the presence of a locative in the same clause, as in (6). The hypothesis of
retention (Bybee et al. 1994: 15–19) or persistence (Hopper 1991: 28–29), according to
which grammaticalizing constructions/morphemes have semantic content
deriving from the meaning of their source construction, leads to the prediction that
selection of ESTAR + Verb-ndo will be favored in the presence of a locative (Torres
Cacoullos 2012: 83–85).

(6a) en la galería me está esperando
in the gallery dat.1sg be.prs.3sg wait-ger
‘he is waiting (PROG) for me in the gallery’
(19th c., Perfecta, Ch. X)

(6b) ahí le esperan a Vd. con las caballerías
there dat.3sg wait.prs.3pl acc you with the mounts
‘they are waiting (PRS) for you there with the mounts’
(19th c., Perfecta, Ch. I)

For the Aspect factor group, I coded tokens of both the Progressive and simple
Present as ‘limited duration’ if the aspectual reading was one of progressive or
continuous (Comrie 1976: 33), as in the pair of examples in (3) and (6), above.
Limited duration also applies to stative predicates when the situation is
temporally circumscribed, or bound to speech time, again for both variants, as in (7).
‘Extended duration’, on the other hand, subsumes habitual aspect for dynamic
verbs, and states without temporal limits, which exist indefinitely, as in (8) (on
coding for aspect, see Torres Cacoullos 2012: 87–91).

are included in the regression, are provided within brackets to indicate direction of effect
(Poplack and Tagliamonte 2001: 93–94).
Table 4: Three independent Variable-rule analyses of linguistic factors contributing to selection of the Progressive¹⁰

                                    13th–15th century      17th century           19th century
Input                               .21 (119/548)          .21 (180/744)          .33 (317/980)
                                    Prob  %Prog  N         Prob  %Prog  N         Prob  %Prog  N
Locative co-occurrence
  Present                           .90   72 %   21/29     .76   57 %   34/60     .67   51 %   36/70
  Absent                            .47   19 %   98/519    .47   21 %   142/676   .49   31 %   275/888
  Range                             43                     29                     18
Aspect
  Limited duration                  .68   37 %   62/169    .70   41 %   144/355   .71   56 %   269/481
  Extended duration                 .35   15 %   30/200    .16   6 %    11/182    .12   8 %    16/210
  Range                             33                     54                     59
Priming
  Preceding estar + X construction  .76   56 %   9/16      .69   55 %   13/26     [.49] 30 %   14/46
  Preceding ‘Other’ tenses          .54   21 %   41/194    .53   27 %   70/256    [.56] 38 %   117/310
  Preceding simple Present          .46   19 %   55/296    .47   20 %   84/430    [.46] 27 %   133/486
  Range                             30                     22
Polarity – Sentence type
  Affirmative declarative           .54   24 %   106/440   .57   27 %   164/598   .58   37 %   285/778
  Negative, Interrogative           .31   13 %   11/86     .18   8 %    9/116     .18   13 %   20/160
  Range                             23                     39                     40
Temporal co-occurrence
  Present                           .76   33 %   25/76     [.54] 28 %   29/103    .60   37 %   50/135
  Absent                            .45   20 %   93/469    [.49] 24 %   150/639   .48   32 %   266/839
  Range                             31                                            12
Stativity
  Dynamic predicate                 [.49] 21 %   93/435    [.52] 26 %   153/593   .51   34 %   282/840
  Stative predicate                 [.56] 23 %   26/113    [.43] 18 %   27/151    .44   25 %   35/140
  Range                                                                           7

10 Non-significant factors are shown within square brackets. Ns in some factor groups do not add up to total N because of excluded factors or uncodable tokens.

(7) Limited duration (in progress, bound to speech time)

(7a) fabla p<er>o bermudo por q<ue> estas calla-ndo
speak.imp Pero Bermudo why be.prs.2sg be_silent-ger
‘speak Pero Bermudo, why are you (being) silent (PROG)?’
(13th c., EE II, fol. 240v)

(7b) No me entiendes, Sancho: no quiero decir sino que […]
neg dat.1sg understand.prs.2sg
‘You do not understand (PRS) me, Sancho: all I want to say is that […]’
(17th c., Quijote II, Ch. XXV)

(8) Extended duration (habitual, indefinitely existing)

(8a) los peces son los huéspedes que siempre están calla-ndo
the fish be.prs.3pl the guests rel always be.prs.3pl be_silent-ger
‘the fish are the guests who are always (being) silent (PROG)’
(13th c., Apolonio, verse 506)

(8b) No entiendo otra lengua que la mía
neg understand.prs.1sg other language than the mine
‘I understand (PRS) no language other than my own’
(17th c., Quijote II, Ch. II)

With respect to direction of effect, we note that a co-occurring locative
consistently favors selection of ESTAR + Verb-ndo (with probabilities of .90, .76 and .67
in the 13th–15th, 17th and 19th centuries, respectively; in the absence of a
co-occurring locative, values are close to .50, at .47, .47 and .49). Stability in the direction of
effect for locative co-occurrence supports the retention hypothesis. Also
consistently favoring the Progressive are situations of limited duration. This effect is in
place from the beginning (probability .68 in the 13th–15th century). Early favoring
of ESTAR + Verb-ndo in limited duration situations is congruent with the view
that progressive aspect is an implication of the locative construction, rather than
the result of an abrupt metaphorical space > time leap (Bybee et al. 1994: 137;
Torres Cacoullos 2012: 84). By the direction of effect, then, there is continuity in
the linguistic conditioning of the Progressive.
Is there change? Yes. The locus of change here is in the relative magnitude
of effect. Comparison of the Range values for Aspect and Locative co-occurrence
within each analysis gives an indication of their relative importance.11 We see

11 This is not a strict mathematical rule. The goal of the stepwise procedure in Variable-rule
analysis is to find the set of factor groups which jointly account for the variation and is not
primarily meant to order these factor groups according to any criterion. In the analyses shown in
Table 4, the order of the factor groups according to the Range is consistent with that suggested by

that in the 13th–15th century the Ranges are comparable (with a ratio of 33:43 =
0.8) but that in the 17th century the Range for Aspect is nearly twice as great (54:29 =
1.9) and in the 19th century more than three times greater (59:18 = 3.3). Here we have the
answer to the question of whether the Progressive spreads differentially across
linguistic contexts. It does. This contradicts the hypothesis of gradualness in the
propagation of change but abruptness in grammatical change itself.
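The ratio of Ranges used in this argument is a plain division of the Aspect Range by the Locative co-occurrence Range within each analysis. Computed from the Range values as printed in Table 4 (the `RANGES` layout is ours):

```python
# Range values from Table 4: period -> (Aspect, Locative co-occurrence)
RANGES = {"13th-15th": (33, 43), "17th": (54, 29), "19th": (59, 18)}

ratios = {period: round(aspect / locative, 1)
          for period, (aspect, locative) in RANGES.items()}
print(ratios)  # {'13th-15th': 0.8, '17th': 1.9, '19th': 3.3}
```

A ratio above 1 means that Aspect conditions variant choice more strongly than Locative co-occurrence within that analysis.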
The weakening of the favoring effect of co-occurring locatives, as the
probabilities get farther from 1 over time, may be taken as a measure of the loss of
source-construction meaning, known as semantic bleaching (“depletion” (Givón
1975: 94)), in the course of grammaticalization. Furthermore, we can note that
ESTAR + Verb-ndo is increasingly disfavored in reference to extended duration
(habitual, indefinitely-existing-state) situations, which are becoming more the
province of the simple Present, as probability values get closer to zero (at .35,
.16 and .12 in the 13th–15th, 17th and 19th centuries, respectively). This means that
an aspectual opposition with the simple Present has developed gradually: the
originally more locative construction is used more and more as an aspectual
expression of limited, as opposed to extended, duration. The developing, but
not (yet) obligatory, opposition between the Progressive and the simple Present is
illustrated in (9): sleeping in progress (9a) vs. a habitual mode of sleeping (9b). Thus,
in the course of speakers’ recurrent choices of variants (as in (3), (6)–(8)), among which
distinctions in aspectual function can be neutralized in discourse (Sankoff 1988a),
the newer and older variants may gradually become aspectually more distinct
(Torres Cacoullos 2012).

(9a) Coronel ¡Ah!, está durmie-ndo
Colonel ah be.prs.3sg sleep-ger
Nada, nada, duerma usted, no se moleste.
‘Colonel … Ah, you’re sleeping (PROG). Never mind, never mind, sleep, don’t be bothered’
(19th c., Serafina la devota, Act 1, Scene VI)

the significance of the change in log likelihood, from the first step of the “step down” procedure
when the least important group gets “cut”: most important in Old Spanish are Aspect, Locative
and Temporal (all p = .000), followed by Polarity-Sentence type (p = .001) and Priming (p = .008);
in the 17th century Aspect, Locative and Polarity-Sentence type (all p = .000), followed by Priming
(p = .032); in the 19th century Aspect and Polarity-Sentence type (both p = .000), followed by
Locative (p = .014), Stativity (p = .019), and Temporal (p = .040). The same holds according to the
order of selection in the “step up” procedure, another indication of relative magnitude of effect,
except for the selection of Aspect before Locative co-occurrence in Old Spanish and of Stativity
before Locative co-occurrence in the 19th century.

(9b) Dime cómo duermes
tell.imp how sleep.prs.2sg
y te diré quién eres.
‘Tell me how you sleep (PRS) and I’ll tell you who you are’
(19th c., Regenta I, Ch. III)

To summarize this section: in comparing the “environmental constraints” (Labov
1982: 20) on the selection of the Progressive over time, we have found shifts in
effect strength, or differential increases of the rate of the Progressive by
linguistic context. These are compatible not with abrupt reanalysis but with gradual
modification of the grammar. The findings here match those in another case of
tense-aspect-mood grammaticalization, that of the ‘go’-future in Brazilian
Portuguese. In tracking future temporal reference over five centuries, Poplack and
Malvar (2007: 157–160) found that as a variant receded or advanced, constraints
on its selection changed, leading them to conclude that “the transition period in
linguistic change is not abrupt, but proceeds as a series of small adjustments, as
incoming and outgoing variants jockey for position in the system” (Poplack and
Malvar 2007: 121).
We turn now to the priming effect in the multivariate analysis.

5 Increasing unithood (loss of analyzability): Evidence from priming

Priming (perseveration or persistence) is the repetition of a preceding form or
structure. In their study of constraints on the English agentless passive (The
liquor closet was broken into vs. They broke into the liquor closet) in spontaneous
discourse, Weiner and Labov (1983: 52) found that the passive variant was most
strongly favored by a preceding passive. Priming, defined by Bock and Griffin
(2000: 177) as “the unintentional and pragmatically unmotivated tendency to
repeat the general syntactic pattern of an utterance”, is also robust in
psycholinguistic experiments.
We can first mention that ESTAR + Verb-ndo is primed by itself. That is, when
the verb of the preceding clause is a Progressive estar construction (any tense,
any grammatical person), as in (10), the rate of the Progressive is higher than
average (though Ns are low, at 2/4, 6/6, and 8/10, in the 13th–15th, 17th and 19th
century data, respectively). But such self-priming on its own is unrevealing as to
unithood/analyzability.

(10) Progressive to Progressive priming

Os estoy leye-ndo en el semblante
dat.2sg be.prs.1sg read-ger
lo que está asa-ndo en vuestra alma...
that.rel be.prs.3sg broil-ger
‘I am reading (PROG) on your face what is broiling (PROG) in your heart’
(Amor de padre, Act IV, Scene V)

The priming effect shown in Table 4 is different. Here the question is whether
non-Progressive estar, i. e., in other than a gerund construction, “triggers” the
Progressive. Thus, we consider preceding use of non-Progressive estar
constructions of the schematic form ESTAR + X, including locative (11), predicate adjective
(12) and resultative (13) constructions.

Non-Progressive estar to Progressive estar priming

(11) no sabemos quién está dentro; habla-ndo están.
neg know.prs.1pl who be.prs.3sg inside speak-ger be.prs.3pl
‘we don’t know who is inside; they are talking’
(Celestina, Act XIV)

(12) cuando alguno está mal y al paso de la muerte,
when someone be.prs.3sg ill and at step from the death
están los expectantes roga-ndo a Dios:
be.prs.3pl the expectants beseech-ger acc God
‘when someone is ill and a few steps from death, there are the expectants beseeching God’
(Corbacho IV, II)

(13) están cocidas con sus garbanzos, cebollas y tocino,
be.prs.3pl cook.ptcp.f.pl with their garbanzos onion and bacon
y la hora de ahora están dicie-ndo: “¡comé=me!”
and the hour of now be.prs.3pl say-ger eat.imp=me
‘They are cooked with their garbanzos, onions and bacon and now are saying “eat me!”’
(Quijote II, Ch. LIX)

Tokens were coded as having an estar construction (as in (11)–(13)), a simple
Present, or another finite verb form in the immediately preceding clause. I
omitted discourse formulas (such as digo ‘I say’, ¿qué sé yo? ‘what do I know?’ or
lo mismo da ‘it makes no difference’) but included subordinate clauses
(complement, concessive, relative, temporal). Finer analysis would consider clause type,
lexical repetition, other preceding periphrastic forms, and the distance at which
a preceding ESTAR construction has an effect.12
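The coding of the immediately preceding clause can be sketched as a small classifier plus a rate-by-category tally. This is an illustration under our own assumptions, not the study's coding procedure: the `Clause` record is hypothetical, “omitting” discourse formulas is read as skipping them to reach the nearest substantive clause, and a preceding Progressive is kept as a separate category because self-priming is discussed apart from the estar + X effect. The demonstration rebuilds two 13th–15th century cells of Table 4 (9/16 and 55/296) from synthetic tokens.

```python
from collections import Counter
from dataclasses import dataclass


# Hypothetical record for a preceding clause.
@dataclass(frozen=True)
class Clause:
    lemma: str                # lemma of the clause's finite verb, e.g. "estar"
    form: str                 # "PROG", "PRS" (simple Present) or "OTHER"
    is_formula: bool = False  # discourse formula such as digo, lo mismo da


def prime_category(clause):
    """Map a preceding clause onto a priming category."""
    if clause.form == "PROG":
        return "Progressive"  # self-priming, kept apart (our assumption)
    if clause.lemma == "estar":
        return "estar + X"    # locative, predicate adjective, resultative
    return "simple Present" if clause.form == "PRS" else "Other"


def nearest_prime(preceding):
    """Category of the closest preceding non-formula clause (nearest last)."""
    for clause in reversed(preceding):
        if not clause.is_formula:
            return prime_category(clause)
    return None


def rates_by_prime(tokens):
    """tokens: (preceding clauses, target_is_progressive) pairs."""
    seen, prog = Counter(), Counter()
    for preceding, is_prog in tokens:
        category = nearest_prime(preceding)
        if category is not None:
            seen[category] += 1
            prog[category] += int(is_prog)
    return {category: prog[category] / seen[category] for category in seen}


# Synthetic tokens matching two 13th-15th century cells of Table 4
estar_x, present = [Clause("estar", "PRS")], [Clause("hablar", "PRS")]
tokens = ([(estar_x, True)] * 9 + [(estar_x, False)] * 7
          + [(present, True)] * 55 + [(present, False)] * 241)
rates = rates_by_prime(tokens)
print({c: round(100 * r) for c, r in rates.items()})  # {'estar + X': 56, 'simple Present': 19}
```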
I propose that by considering whether other estar constructions prime ESTAR
+ Verb-ndo we obtain a measure of its analyzability. If ESTAR + Verb-ndo is
“analyzable” – with internal structure and component parts that are recognizable
as individual words (Section 1), namely a finite form of estar and the gerund of
another verb – it should also be primed by other constructions composed of estar
and another unit. If, on the other hand, the Progressive is no longer analyzable,
having become, in reanalysis parlance, a single constituent or periphrastic unit
(Section 3), no such priming effect should hold.
The multivariate analysis in Table 4 shows that the Progressive is favored
by a preceding non-Progressive estar construction in the 13th–15th and in the 17th
century. This corresponds to Szmrecsanyi’s (2005: 139) β-persistence. Such an
effect could be taken as lexical, estar to estar, or as structural, estar + X to estar +
X, priming (where X encompasses various word classes or syntactic roles). Either
way, we can think of this kind of priming as based on associations between sub-
units of constructions (for example, between English auxiliary go in the future
construction and lexical verb go in various motion constructions) as opposed to
priming based on syntactic identity of the entire unit (as when BE going to Verb
primes itself or, here, when ESTAR + Verb-ndo primes itself).
The priming by non-Progressive estar constructions is as expected, if the Pro-
gressive has an analyzable internal structure. It also suggests that, since estar has
independently increased in frequency to the detriment of copula ser ‘be’ in several
constructions (Silva-Corvalán 1994: 94–95 and references therein), priming may
be part of the explanation for the advancement of the Progressive in Spanish.
In the 19th century, however, the period in which, as we saw, choice of the Progressive is most strongly favored by limited duration aspect (section 4.2), the priming effect is no longer significant (nor is there a discernible direction of effect). This
disappearance of the earlier priming effect – ESTAR + X no longer triggers ESTAR

12 On the length of discourse over which priming may operate, see Labov 1994: 567, Travis 2007:
110, 128–129 and references therein. Patterns seemed to be the same when I excluded the first
instance of the variable in a speaker turn (in the plays) or in a stretch of discourse attributed to
a character (in the novels) as when I included such tokens, if the preceding finite verb appeared
in speech directed by the interlocutor to the speaker (for example, -¿Dónde está? –Está … está
viajando ‘Where is she? / She is…is travelling’ (Acertar errando, Act 3, Scene XV)) or if the token
is separated from the same speaker’s preceding finite verb by an interlocutor’s turn having no
finite verb (for example, – […] ¿Estás ya contento? – (Va a arrodillarse para besarle la mano.)
¡Padre mío! – ¿Qué haces, Eduardo? ‘Are you content now? / Dear father! (kneeling to kiss his hand) / What are you doing, Eduardo?’ (Amor de padre, Act 2, Scene I)).
282 Rena Torres Cacoullos

+ Verb-ndo – is consonant with diminished analyzability of the whole, i. e. the absorption of estar into a periphrastic unit. Together with bleaching of locative meaning and aspectual differentiation from the simple Present (weakened favoring by co-occurring locatives and strengthened disfavoring in extended duration situations), this result is indicative of increasing cohesion of ESTAR + Verb-ndo as a new unit.

6 Conclusion

The diachronic quantitative patterns of the Spanish Progressive are consistent with gradual loss of analyzability and inherent variability. First, the ESTAR + Verb-ndo sequence has increased in frequency, on which an abrupt reanalysis account is silent, but which is predicted by an understanding of change in constituency based on chunking that occurs with repetition (cf. Table 3). Second, the spread of the Spanish Progressive relative to the simple Present has been differential, as shown by change in the linguistic conditioning of variant choice, here in the form of shifting relative magnitude of effect, which contradicts an abrupt-reanalysis, constant-rate hypothesis (cf. Table 4).
A measure of internal structure is priming, whereby selection of a given con-
struction is favored by previous use of a different construction with the same
verb. Here, favoring of the Progressive over the simple Present when the preced-
ing clause has a different ESTAR + X construction indicates a greater degree of
analyzability of ESTAR + Verb-ndo in the 13th–15th and 17th than in the 19th century
data, where this priming effect no longer holds (Section 5). My general hypothesis
is that structural priming from related constructions is operative in earlier stages
of grammaticalization. For example, priming of BE going to Verb by lexical go
should weaken over time.
In “Building on empirical foundations”, Labov (1982: 20) wrote:

Change is the process of replacement, not the outcome of that process. When we study the
process directly we are immediately confronted with the heterogeneous character of lin-
guistic systems. Change implies variation; change is variation [italics in original]. […] [The
progress of change] is rarely represented by the categorical replacement of one form by
another, but normally by changes in the relative frequencies of the variants and changes in
their environmental constraints [my italics].

Studying change as change in the linguistic conditioning of variant choice makes gradualness tangible. It is hoped that more quantitative diachronic studies of variation, which include tests of priming effects, will elucidate the transition periods between the endpoints of change.
References
Allen, Henry J. and J. B. Greenough (1916): Allen and Greenough’s new Latin grammar for
schools and colleges, founded on comparative grammar. Boston et al.: Ginn.
Bock, Kathryn J. and Zenzi M. Griffin (2000): The persistence of structural priming: Transient
activation or implicit learning. Journal of Experimental Psychology: General 129(2):
177–192.
Brown, Esther L. and Javier Rivas (2011): Subject-verb order in Spanish interrogatives: a
quantitative analysis of Puerto Rican Spanish. Spanish in Context 8: 23–49.
Bybee, Joan (2003): Mechanisms of change in grammaticization: the role of frequency. In: Brian
D. Joseph and Richard D. Janda (eds.), The handbook of historical linguistics, 602–623.
Oxford: Blackwell.
Bybee, Joan (2010): Language, usage and cognition. Cambridge: Cambridge University Press.
Bybee, Joan, Revere Perkins and William Pagliuca (1994): The evolution of grammar: Tense,
aspect, and modality in the languages of the world. Chicago: University of Chicago Press.
Bybee, Joan and Rena Torres Cacoullos (2009): The role of prefabs in grammaticization: How
the particular and the general interact in language change. In: Roberta L. Corrigan, Edith
A. Moravcsik, Hamid Ouali and Kathleen Wheatley (eds.), Formulaic language: Volume 1.
Distribution and historical change, 187–217. Amsterdam: John Benjamins.
Campbell, Lyle (1998): Historical linguistics: an introduction. Cambridge, MA: MIT Press.
Cedergren, Henrietta J. and David Sankoff (1974): Variable rules: performance as a statistical
reflection of competence. Language 50(2): 333–355.
Comrie, Bernard (1976): Aspect. Cambridge: Cambridge University Press.
Croft, William and Alan D. Cruse (2004): Cognitive linguistics. Cambridge: Cambridge University
Press.
Givón, Talmy (1975): Serial verbs and syntactic change: Niger-Congo. In: Charles N. Li (ed.),
Word order and word order change, 47–112. Austin: University of Texas Press.
Hay, Jennifer (2001): Lexical frequency in morphology: is everything relative? Linguistics 39:
1041–1070.
Hopper, Paul J. (1991): On some principles of grammaticization. In: Elizabeth Closs Traugott
and Bernd Heine (eds.), Approaches to grammaticalization (Volume 1), 17–35. Amsterdam:
John Benjamins.
Hopper, Paul J. and Elizabeth Closs Traugott (1993): Grammaticalization. Cambridge: Cambridge
University Press.
Kroch, Anthony (1989): Reflexes of grammar in patterns of language change. Language
Variation and Change 1: 199–244.
Labov, William (1969): Contraction, deletion, and inherent variability of the English copula.
Language 45: 715–762.
Labov, William (1982): Building on empirical foundations. In: Winfred P. Lehmann and Yakov
Malkiel (eds.), Perspectives on historical linguistics, 11–92. Amsterdam: John Benjamins.
Labov, William (1994): Principles of linguistic change: Internal factors (Volume 1). Oxford:
Blackwell.
Langacker, Ronald (1987): Foundations of cognitive grammar: theoretical prerequisites (Volume
1). Stanford, CA: Stanford University Press.
Marcos Marín, Francisco (dir.) (1992): Corpus de Referencia de la Lengua Española Contemporánea Peninsular (CORLEC), http://www.lllf.uam.es/ING/Info%20Corlec.html (accessed April 2004).
Myhill, John (1988): The grammaticalization of auxiliaries: Spanish clitic climbing. Berkeley
Linguistics Society 14: 352–363.
Newmeyer, Frederick (2003): Grammar is grammar and usage is usage. Language 79(4):
682–707.
Poplack, Shana and Elisabete Malvar (2007): Elucidating the transition period in linguistic
change: The expression of the future in Brazilian Portuguese. Probus 19: 121–169.
Poplack, Shana and Sali Tagliamonte (1999): The grammaticization of going to in (African
American) English. Language Variation and Change 11: 315–342.
Poplack, Shana and Sali Tagliamonte (2001): African American English in the diaspora: tense
and aspect. Oxford: Blackwell.
Rizzi, Luigi (1982): A restructuring rule in Italian syntax. In: Luigi Rizzi (ed.), Issues in Italian
syntax, 1–48. Dordrecht: Foris.
Roberts, Ian and Anna Roussou (2003): Syntactic change: a minimalist approach to grammati-
calization. Cambridge: Cambridge University Press.
Sankoff, David (1988a): Sociolinguistics and syntactic variation. In: Frederick J. Newmeyer (ed.),
Linguistics: The Cambridge survey (Volume IV), 140–161. Cambridge: Cambridge University
Press.
Sankoff, David (1988b): Variable rules. In: Ulrich Ammon, Norbert Dittmar and Klaus J. Mattheier
(eds.), Sociolinguistics: An international handbook of the science of language and society,
984–997. Berlin/New York: Walter de Gruyter.
Sankoff, David, Sali Tagliamonte and Eric Smith (2005): GOLDVARB X: A multivariate analysis application for Macintosh and Windows. <http://individual.utoronto.ca/tagliamonte/Goldvarb/GV_index.htm>.
Scheibman, Joanne (2000): I dunno but... a usage-based account of the phonological reduction
of don’t. Journal of Pragmatics 32: 105–124.
Silva-Corvalán, Carmen (1994): Language contact and change. Spanish in Los Angeles. Oxford:
Clarendon Press.
Szmrecsanyi, Benedikt (2005): Language users as creatures of habit: A corpus-based analysis
of persistence in spoken English. Corpus Linguistics and Linguistic Theory 1(1): 113–150.
Torres Cacoullos, Rena (1999a): Variation and grammaticization in progressives: Spanish -ndo constructions. Studies in Language 23(1): 25–59.
Torres Cacoullos, Rena (1999b): Construction frequency and reductive change: diachronic
and register variation in Spanish clitic climbing. Language Variation and Change 11(2):
143–170.
Torres Cacoullos, Rena (2000): Grammaticization, synchronic variation, and language contact:
a study of Spanish progressive -ndo constructions. Amsterdam: John Benjamins.
Torres Cacoullos, Rena (2006): Relative frequency in the grammaticization of collocations:
nominal to concessive a pesar de. In: Timothy L. Face and Carol A. Klee (eds.), Selected
proceedings of the 8th Hispanic Linguistics Symposium, 37–49. Somerville, MA: Cascadilla
Proceedings Project.
Torres Cacoullos, Rena (2012): Grammaticalization through inherent variability: The
development of a progressive in Spanish. Studies in Language 36(1): 73–122.
Torres Cacoullos, Rena and James A. Walker (2011): Collocations in grammaticalization and
variation. In: Bernd Heine and Heiko Narrog (eds.), Handbook of Grammaticalization,
225–238. Oxford: Oxford University Press.
Travis, Catherine E. (2007): Genre effects on subject expression in Spanish: Priming in narrative
and conversation. Language Variation and Change 19(2): 101–135.
UNAM (1976): El habla popular de la Ciudad de México: Materiales para su estudio. Mexico City:
Centro de Lingüística Hispánica.
Weiner, Judith E. and William Labov (1983): Constraints on the agentless passive. Journal of
Linguistics 19(1): 29–58.
Weinreich, Uriel, William Labov and Marvin Herzog (1968): Empirical foundations for a theory of
language change. In: Winfred P. Lehmann and Yakov Malkiel (eds.), Directions for historical
linguistics, 95–188. Austin: University of Texas Press.
Yang, Charles D. (2000): Internal and external forces in language change. Language Variation
and Change 12(3): 231–250.

Appendix: Texts

Author/Title (listed in chronological order) | Source/Edition (unless otherwise indicated, electronic texts were downloaded from Biblioteca Virtual Miguel de Cervantes, http://www.cervantesvirtual.com/)

Anonymous, Cantar de mio Cid | Texto, gramática y vocabulario, Ramón Menéndez Pidal (ed.), Vol. 3. Madrid: Espasa Calpe, 1944.
Anonymous, Calila e Dimna | Juan Manuel Cacho Blecua & María Jesús Lacarra (eds.). Madrid: Castalia, 1984.
Anonymous, Poema de Fernán González
Anonymous, Libro de Apolonio
Gonzalo de Berceo, Sacrificio de la misa
Alfonso X, General estoria. Primera parte | Antonio G. Solalinde (ed.). Madrid: Centro de Estudios Históricos, 1930
Alfonso X, General Estoria. Segunda parte | Antonio Solalinde, Lloyd Kasten & Victor R. B. Oelshläger (eds.). Madrid: CSIC (Consejo Superior de Investigaciones Científicas), 1957
Alfonso X, Estoria de España (EE1, EE2); Libro de las leyes | The Electronic Texts and Concordances of the Prose Works of Alfonso X, El Sabio, Lloyd Kasten, John Nitti, and Wilhemina Jonxis-Henkemens (eds.). Hispanic Seminary of Medieval Studies, 1995
Anonymous, El libro del caballero Zifar
Arcipreste de Hita, Libro de Buen Amor
Don Juan Manuel, El conde Lucanor | José Manuel Blecua (ed.). Madrid: Castalia, 1971
Alfonso Martínez de Toledo, Corbacho | Michael Gerli (ed.). Madrid: Cátedra, 1978
Juan de Flores, Grimalte y Gradissa | Pamela Waley (ed.). London: Tamesis, 1971
Hernando del Pulgar, Crónica de los Reyes Católicos | Juan de Mata Carriazo (ed.). Madrid: Espasa Calpe, 1943
Fernando de Rojas, La Celestina | Dorothy S. Severin (ed.). Madrid: Cátedra, 1987
Íñigo López de Mendoza, marqués de Santillana, Sonetos
Juan del Enzina [14 Auctos, Eglogas, Representaciones]
Miguel de Cervantes Saavedra, Don Quijote de la Mancha; Comedia famosa de La casa de los celos y selvas de Ardenia
Lope de Vega, La dama boba; Comedia del Príncipe Ynocente; La vengadora de las mujeres
Guillén de Castro, El amor constante
Ruiz de Alarcón y Mendoza, La Amistad castigada
Tirso de Molina, Don Gil de las calzas verdes; Por el sótano y el torno; Amor y celos hacen discretos [Act I]; La villana de Sagra
Gaspar de Ávila, La boca y no el corazón o Fingir por conservar
Calderón de la Barca, Casa con dos puertas mala es de guardar; La dama duende; Amor honor y poder
Leandro Fernández de Moratín, La comedia nueva; El sí de las niñas | J. Dowling & R. Andioc (eds.). Madrid: Castalia, 1975
Manuel Bretón de los Herreros, A Madrid me vuelvo
José María de Carnerero, El afán de figurar
Ventura de la Vega, Acertar errando, o El cambio de diligencia
José de Mariano Larra, Los inseparables
Duque de Rivas, Don Álvaro o la fuerza del sino
Francisco Martínez de la Rosa, La boda y el duelo; Amor de padre
Pablo Alonso de la Avecilla, Los presupuestos
Luis Mariano de Larra, La paloma y los halcones; La primera piedra
Manuel Tamayo y Baus, No hay mal que por bien no venga
Juan Valera, Pepita Jiménez
Benito Pérez Galdós, Doña Perfecta
Leopoldo Alas “Clarín”, La Regenta, vol I & II
Emilia Pardo Bazán, Los pazos de Ulloa
Julio Cuevas, ¡El siete!: juguete cómico en un acto y en prosa / original de Don Julio Cuevas y Don Manuel Labra
Enrique Gaspar, Don Ramón y El señor Ramón; El estómago; La gran comedia; La lengua; La levita; Las circunstancias; Lola; Los niños grandes; Problema; Serafina la devota
Malte Rosemeyer, University of Freiburg
How usage rescues the system:
Persistence as conservation1

Abstract: This paper evaluates the relationship between usage and systematicity in language from the perspective of usage-based linguistics. In particular,
it investigates the diachronic effects of the phenomena of entrenchment and
persistence on the development of morphosyntactic alternations. Both entrench-
ment and persistence depend on a language user’s experience with language:
They lead to a (temporary) strengthening of the cognitive representation of a
linguistic item. For this reason, both processes can lead to the conservation of
disappearing grammatical constructions. In order to evaluate this hypothesis, a
quantitative analysis of the historical changes in Spanish auxiliary selection is
proposed. There is a higher probability for speakers to select ‘be’ over ‘have’ as a
perfect auxiliary if ‘be’ + participle (PtcP) has already appeared in the preceding
co-text. Over time, this effect becomes stronger. The greater dependence of ‘be’
selection on persistence effects in later stages of the process by which ‘be’ was
replaced with ‘have’ suggests that the cognitive mechanism of persistence can be
understood as a type of weak entrenchment with a conserving effect.

1 Introduction

From the perspective of usage-based linguistics (UBL, cf. Langacker 1987; Bybee
2006, 2007, 2010), there is a strong relationship between usage and systematicity.
Whereas many traditional approaches assume that linguistic structure is system-
atic in order to allow for communication, UBL suggests that because a language
is used as a means of communication, its structures acquire systematicity.2 For
UBL, usage frequency is of crucial importance in this process. For one thing,

1 I wish to express my gratitude to the organizers of the Freiburg conference for the invitation.
In addition, I would like to thank the participants and in particular, my reviewers Rena Torres
Cacoullos and Göz Kaufmann, for helping me to develop my ideas.
2 Consequently, it is necessary to distinguish “systematicity” from “system” in the structuralist
sense (a set of paradigmatic oppositions through which (grammatical) meaning arises). System-
aticity is a matter of degree: Some grammatical functions can be expressed more systematically
than others. For instance, the Modern French “system” of intransitive auxiliary selection is rather
usage frequency plays an important role in the emergence of grammatical structures. Studies on language change have shown that the repeated co-occurrence of two linguistic elements leads to the routinization, or entrenchment, of the link between these two elements (Bybee 2002; Bybee and Torres Cacoullos 2009). This
entrenchment can lead to an increase in the productivity of this type of syntag-
matic connection. This phenomenon, usually termed grammaticalization, leads
to the emergence of a new grammatical construction and can thus be interpreted
as generating a rise in systematicity (Bybee 2003). Also relevant to UBL is the
fact that frequency of occurrence can be said to inhibit systematicity. Thus, the
repeated occurrence of a linguistic element leads to a conserving effect: Complex
linguistic elements with a high absolute usage frequency resist analogical regu-
larization processes longer than complex linguistic elements with the same gram-
matical function but a lower absolute usage frequency (Bybee 2006).
The concept of entrenchment thus illustrates that language users are “creatures of habit” (Szmrecsanyi 2005). Trivially, speakers tend to fall back on already-existing linguistic patterns in their utterances. Indeed, without the relative conservatism of speakers, languages could not meet their communicative needs.
Szmrecsanyi’s (2005, 2006) work has shown that this assumption has important
repercussions for the study of morphosyntactic alternations. If language use is
modeled probabilistically, the choice of one construction over another is often
influenced by whether or not one of these constructions appears in the preceding
discourse. The triggering construction thus “persists” in the speaker’s memory,
influencing her or his language use.
This paper investigates the similarities and differences of the concepts of
entrenchment and persistence. In Section 2, it is argued that for both concepts, the
speaker’s preceding experience with language is a crucial factor. Entrenchment
and persistence alike lead to a strengthening of the cognitive representation of a
linguistic element. It is hypothesized that for this reason, the two concepts have
a similar influence on processes of language change. After a presentation of the
data and measurements in Section 3, a quantitative analysis of the morphosyntactic alternation of Spanish split auxiliary selection demonstrates that, similar to entrenchment, persistence has a conserving effect on language change. A generalized linear mixed-effects regression analysis conducted in Section 5 confirms the statistical significance of this effect. The discussion of these findings leads to the
hypothesis that whereas the conserving effect of entrenchment operates locally,
the conserving effect of persistence operates globally. Section 6 gives a brief

inconsistent, with some verb classes – such as verbs that express a change of location – typically,
but not always, selecting être (‘be’) (for an overview, see Kailuweit 2011).
summary of the findings and relates them to one of the general questions evalu-
ated in this volume, i. e. what is the link between language use and the system?

2 Frequency and conservation

Several contributions in this volume (e. g., Haider) argue that there is a close
relationship between systematicity and language change. As already apparent in
Coseriu’s (1974) discussion of Saussure’s attitude to language change, languages
are historical objects. Coseriu argues that the reification of language (“langue”)
as an abstract system that exists independently of its speakers leads to insur-
mountable problems in the description of language change. If the functionality
of a language is the result of its systematicity, language change cannot be due
to system-internal factors. Consequently, from the perspective of Saussurian
structuralism, language change is an “unreal phenomenon caused by ‘external
factors’” (Coseriu 1974: 23, translation M. R.). However, languages do change.
Coseriu (1974: 23–24) argues that the “apparent aporia of language change” arises
from a confusion of perspectives. Languages are not functional because they
are systematic – rather, their functionality creates systematicity. Consequently,
languages change because speakers want to continue to be able to express their
thoughts with them: “A language, however, that is continuously [...] determined by
its function, is not complete, but perpetually emerging from concrete linguistic
actions: it is not εργον [work] but ενέργεια [working]” (Coseriu 1974: 24, trans-
lation M. R.). In contrast, dead languages like Latin are no longer functional
because they have stopped changing. In Coseriu’s view, change is thus an intrin-
sic property of language.
Coseriu’s approach to language change can be directly related to the more
recent concept of “emergentism” in linguistics (Hopper 1987; Bybee and Hopper
2001b; MacWhinney 2006). Regarding grammar, Hopper’s (1987: 142) concept of
Emergent Grammar has become very influential. In Hopper’s words,

[t]he notion of Emergent Grammar is meant to suggest that structure, or regularity, comes
out of discourse and is shaped by discourse as much as it shapes discourse in an on-
going process. Grammar is hence not to be understood as a pre-requisite for discourse, a
prior possession attributable in identical form to both speaker and hearer. Its forms are
not fixed templates but are negotiable in face-to-face interaction in ways that reflect the
individual speakers’ past experience of these forms and their assessment of the present
context, including especially their interlocutors, whose experiences and assessments may
be quite different. Moreover, the term Emergent Grammar points to a grammar which is not
abstractly formulated and abstractly represented, but always anchored in the specific con-
crete form of an utterance.
Emergentism thus understands grammar to be shaped by the language experience of the speakers of that language. Due to the creativity of speakers, grammatical forms are continuously used in new contexts and with new meanings.3 As a result, grammatical categories are not clearly delimited but characterized by “a continual movement towards structure, a postponement or ‘deferral’ of structure, a view of structure as always provisional, always negotiable, and in fact as epiphenomenal” (Hopper 1987: 142).
However, speakers’ language use experience also leads to conservative
behavior. Usage-based linguistics assumes frequency of use to be a defining factor
for this mechanism. Thus, a high token frequency of a linguistic element results
in a stronger cognitive representation of that element because it is repeatedly
accessed. Utterances are not always produced from scratch, but rather formed
from pre-packaged building blocks, so-called “chunks” (Newell 1990: 185–193;
Ellis 1996; Bybee 2010: 33–56). A strong cognitive representation of a frequent
complex linguistic element leads to cognitive chunking: The linguistic element
is increasingly accessed holistically. On the linguistic level, chunking results in
“entrenchment” (as termed by Langacker 1987: 59): The complex element is both
recognized and produced faster by the language user.
Important evidence for the linguistic reality of chunking and entrenchment
comes from usage-based studies in first language acquisition. The hypothesis of
“islands” of grammar development raised by Tomasello (1992) predicts that cat-
egory formation in language acquisition is crucially dependent on item-specific
chunks. In line with this prediction, studies like Lieven, Pine, and Baldwin (1997)
have shown that children’s productivity is item-based. The productivity of chil-
dren is unlike that of adults because early language is typically more formulaic
than adult language. Although there is an ongoing debate on the chronology of
the acquisition of the ability to generalize from specific chunks (Behrens 2009),
this ability appears to depend on similarity relations to other constructions. For instance,
Abbot-Smith and Behrens (2006) show that the acquisition of the complex German

3 Note that this use of the term “creativity” is not synonymous with creativity in Generative
Grammar. At least in one acceptation of the term, creativity in Generative Grammar refers to
the fact that a language user can only ever experience a fraction of all the possible sentences
in a language. However, he can “on the basis of this finite linguistic experience […] produce an
indefinite number of new utterances which are immediately acceptable to other members of his
speech community” (Chomsky 1975: 61). In contrast, the term “creativity” as used here refers
to an individual’s capacity to use a certain linguistic element in a novel function. This reinter-
pretation of the function of a linguistic element – fundamental to historical processes such as
grammaticalization – is possible due to analogical reasoning processes, and is motivated by con-
siderations of expressiveness.
passive is rather quick because of the previous acquisition of the formally similar
perfect construction. Since the learning of grammatical categories is crucially
dependent on highly frequent chunks, grammatical categories are organized in
terms of prototypicality (Goldberg 2006).
The skewed distribution of grammatical categories across lexical items can
have an influence on the directionality of language change. Thus, it has been
argued that entrenchment leads to the loss of analyzability of complex linguistic
items (Bybee and Hopper 2001a; Bybee 2006, 2007, 2010): “The more a sequence
of morphemes or words is used together, the stronger the sequence will become
as a unit and the less associated it will be to its component parts” (Bybee 2010:
48). This loss of analyzability can lead to the conservation of highly frequent
syntagms in processes of language change. Entrenchment causes highly
frequent syntagms to grow more and more autonomous from the constructions
to which they originally belonged. In extreme cases, the paradigmatic relation
between syntagm and mother construction may be severed. If the mother con-
struction is subject to a grammatical change, highly frequent syntagms belong-
ing to that construction will be less affected by that change than other related
but less frequent syntagms: “[…] frequent forms resist regularizing or other morphological change with the well-known result that irregular inflectional forms tend to be of high-frequency. Assuming that regularization occurs when an irregular form is not accessed and instead the regular process is used, it is less likely that high-frequency inflected forms would be subject to regularization” (Bybee 2010: 25). Processes of analogical generalization (in Bybee’s terms,
“regularization”) of a construction leading to the disappearance of another
construction are counteracted by entrenchment. The intrusion of a new con-
struction into the usage contexts of another construction will first affect those
specific syntagms which are used less, and only afterwards specific syntagms
with a high absolute frequency of use. The global disappearance process related
to the analogical transfer of the competing construction is stalled in specific
instances. Consequently, a disappearing construction can survive in particular
instantiations until very late.
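The frequency-conditioned conservation described above can be made concrete with a deliberately simple toy model. The Python sketch below is hypothetical and is not Bybee's own formalization: it assumes that in each period a stored syntagm risks being bypassed in favor of the competing pattern with a probability that shrinks as its token frequency grows (the function k / (k + f) and the constant k are arbitrary choices), and that periods are independent.

```python
def regularization_risk(token_frequency: float, k: float = 10.0) -> float:
    """Hypothetical per-period probability that a stored form is not accessed
    and the competing (regular) pattern is used instead.
    Falls toward 0 as token_frequency grows; k is an arbitrary constant."""
    return k / (k + token_frequency)


def expected_survival(token_frequency: float, periods: int, k: float = 10.0) -> float:
    """Probability that the form has never been replaced after `periods`
    independent periods, under the toy assumption above."""
    return (1.0 - regularization_risk(token_frequency, k)) ** periods


# Hypothetical frequencies: a rare, a moderately frequent and a very frequent syntagm.
for freq in (5, 50, 1000):
    print(freq, round(expected_survival(freq, periods=20), 4))
```

Under these assumptions the most frequent syntagm is far more likely to survive the same number of periods, which mirrors the claim that a disappearing construction can hold out longest in its most frequent instantiations; the actual shape of the frequency effect remains an empirical question.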
A second important aspect of the conservative language behavior of speakers
is covered by the concept termed “persistence” by Szmrecsanyi (2005, 2006).
Persistence refers to the notion of “production priming” in psycholinguistics
and “repetitiveness” in discourse analysis (Szmrecsanyi 2005: 116). Produc-
tion priming has been shown to be important in lexical (Neely 1977, 1991; Hoey
2004, 2005), phonological (Baddeley 1966; Griffin 2002) and syntactic domains
(Gries 2005; Travis 2007; Travis and Torres Cacoullos 2012). Put simply, the use
of a linguistic element raises the probability of the use of a formally or func-
tionally similar element in the following discourse. Persistence thus influences
the speaker’s choice between linguistic elements that compete within a certain
envelope of variation.4 Consequently, “while it is corpus-linguistic standard prac-
tice to view successive occurrences of a variable as independent binomial trials
(like independent, unrelated throws of a dice), there may, in fact, exist inter-
actions between neighboring variables, depending on the syntagmatic proximity
between them” (Szmrecsanyi 2005: 115).
In his analysis of alternations such as that between the English future markers
be going to and will, Szmrecsanyi (2005) shows that the use of one variant in the
preceding co-text significantly increases the probability that a speaker selects
the same variant over the competing one later in the text. Moreover, he
demonstrates that this effect crucially depends on the textual distance between
the persisting element and the text passage where the envelope of variation applies
(Szmrecsanyi 2005: 119–120). Persistence effects thus decrease as the temporal
distance to the original stimulus increases: The stimulus becomes less and less
salient to the speaker. Szmrecsanyi argues that these observations have far-reaching
consequences for quantitative analyses of alternations in language, since
“system-internal” factors governing the speaker’s choice of one variant over the
other may, in some contexts, be neutralized by persistence. Failing to take into
account the dependence of later utterances on earlier ones may distort statistical
models of constraints on language use.
Crucially, Szmrecsanyi recognizes that entrenchment and persistence have the
same cause, namely the activation of cognitive representations of linguistic
experiences:

Along somewhat different lines, persistence may be thought of as a type of short-term
entrenchment [...]. It is true that entrenchment is understood as being a mechanism
operating over longer intervals of time, possibly a speaker’s lifetime – in contrast,
persistence is a phenomenon that probably dissipates after a few minutes. Yet, persistence
as well is due to linguistic patterns, or representations thereof, being activated through
use; in this way, it may make sense to refer to persistence as “micro-entrenchment”, and to
entrenchment as “macro-persistence”. (Szmrecsanyi 2005: 141)

In this sense, persistence is fundamental to entrenchment: The use of a linguistic
element strengthens its cognitive representation and therefore increases its
probability of use in the subsequent discourse. Repetition does not lead to a
qualitatively different phenomenon but merely reinforces this effect.

4 The Labovian notion of a variable presupposes a degree of interchangeability between
linguistic elements in a particular envelope of variation. It is appropriate to speak of a
“choice” between these linguistic elements because they are deemed interchangeable. This
terminology does not imply, however, that the speaker actively and consciously chooses between
the variants. Rather, the “choice” between the variants results from automatic processing.
Likewise, speakers are largely unaware of persistence effects in their speech.
How usage rescues the system: Persistence as conservation 295

This paper aims to evaluate this assumption. As argued above, due to the high
strength of their cognitive representation, entrenched linguistic elements are less
susceptible to ongoing language change than other elements belonging to the same
construction. Since persistence also leads to a (temporarily) higher strength of the
cognitive representation of a linguistic element, it can be hypothesized that
persistence effects play a conservative role in language change. In particular,
persistence should conserve the use of a grammatical construction whose usage
frequency is declining. In disappearance processes, a construction becomes gradually
restricted to specific usage contexts. Its syntactic productivity declines; the
construction typically appears only in a few specific syntagms. Due to this growing
restrictedness, the productivity of the construction increasingly relies on
persistence effects: Whether or not a persisting token occurs in the preceding
co-text becomes a better predictor of the occurrence of tokens of the disappearing
construction.
If this hypothesis is correct, the preceding discourse contexts of late tokens
of a disappearing construction (i. e., when the construction is already scarcely
attested) should have a higher probability of containing a token of the same con-
struction than early tokens (i. e., when the construction is still widely used, and
the change is only incipient).
The remainder of this paper is dedicated to the evaluation of this prediction
for split auxiliary selection in Spanish. Old Spanish possessed two auxiliaries,
aver (‘have’) and ser (‘be’), for compound tense constructions in which the
participle (PtcP) was formed from an intransitive verb (Benzing 1931; Yllera 1980;
Elvira González 2001; García Martín 2001; Aranovich 2003; Romani 2006; Mateu 2009,
among others). As shown in (1–2), participles formed from predicates involving a
change of state typically select ‘be’, whereas participles formed from predicates
that denote unbounded activities or states typically select ‘have’:

(1) Ellos respusiéronle que pues que en aquel logar eran
they answer.him.pst.pfv.3pl that since that to that place be.pst.ipfv.3pl
venidos [...] luego farién quequier que les él
come.ptcp.m.pl later do.prs.cond.3pl whatever that to.them he
mandasse
order.pst.ipfv.sbjv.3sg
‘They answered him that since they had come to this place […] they would do whatever
he told them to’ [GEI]5

5 In the examples, the source texts are indicated with the abbreviations in square brackets. For
a list of the source texts and the abbreviations, cf. the appendix.

(2) e he más luengamente vevido e morado aquí
and have.prs.1sg more long.time live.ptcp.m.sg and lodge.ptcp.m.sg here
que en mi tierra
than in my land
‘And I have lived and lodged here for a longer time than in my land’ [DTL]

Split auxiliary selection in Spanish was subject to a gradual grammatical change
by which ‘be’ + PtcP came to be replaced with ‘have’ + PtcP. Whereas split auxiliary
selection appears to have been relatively stable until the middle of the fifteenth
century (cf. Rosemeyer 2012b), a strong analogical expansion of ‘have’ + PtcP into
the functional domains of ‘be’ + PtcP can be observed after this date (Lapesa 1987:
23–24). Aranovich (2003) observes that this replacement process was gradual and
first affected the verb classes peripheral to ‘be’-selection. Only in the sixteenth
century does a strong trend toward ‘have’-selection with predicates denoting a
change of state or location arise. Aranovich further hypothesizes that the frequency
of occurrence of the corresponding verbs influenced the gradualness of the
replacement process: Less frequent verbs appear to have been more susceptible to
the change than more frequent verbs from the same predicate class. Evidence for this
assumption has been presented in Rosemeyer (2012a) and Rosemeyer (2013). After 1650,
the ‘be’ + PtcP construction becomes less common in Spanish texts.

3 Data and measurements

This study relies on a corpus of 3,732 auxiliary + PtcP tokens from 41 Spanish
historiographical texts dated between 1270 and 1650. The selection of the texts
closely followed the guidelines regarding the authenticity of the source texts’
manuscripts established by Fernández-Ordoñez (2006).6 Most of the editions used
are from the Corpus Diacrónico del Español (CORDE, Real Academia Española 2010),
with the exception of parts of the Gran Conquista de Ultramar (Admyte 1992) and
the Spanish translation of the Roman de Troie commissioned by Alfonso XI (Parker
1977). In his study, Szmrecsanyi (2005) compares a much wider range of data,
including different registers and varieties of English (Szmrecsanyi 2005: 121).
Since he shows persistence effects to be relevant for all of these different
language varieties, restricting the present study to historiographical texts is
not expected to distort the results.

6 Fernández-Ordoñez (2006) establishes a canon of editions of historiographical texts that are
based on original manuscripts, or on manuscripts copied from the original less than 50 years
after the composition of the original text. Restricting the corpus to these source texts allows
us to filter out data that reflect the language of the copyist rather than the language of the
target date.
The tokens were selected and annotated manually on the basis of queries for
participles. These queries accounted for the considerable orthographic variation
in the historical texts, especially the alternations between <b,v,u>,
<z,sz,sc,ç>, <f,ff,h>, <i,y,j,u>, <r,rr>, <s,ss>, and <n,nn,ñ>. Since the query
syntax in CORDE is case-sensitive, additional queries for capitalized participles
were conducted.
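The spelling-tolerant queries can be illustrated with a small regular-expression builder. This is a minimal sketch under our own assumptions: it does not use CORDE's query syntax, the function name variant_pattern is hypothetical, and it treats every grapheme in the alternation sets listed above as interchangeable with all other members of its set (or sets, since <u> occurs twice). This deliberately over-generates; hits would still have to be checked by hand. Case-insensitive matching stands in for the separate queries for capitalized forms.

```python
import re

# Spelling alternations listed in the text; members of a set are treated as
# interchangeable. Note that "u" belongs to two sets.
ALTERNATIONS = [
    {"b", "v", "u"},
    {"z", "sz", "sc", "ç"},
    {"f", "ff", "h"},
    {"i", "y", "j", "u"},
    {"r", "rr"},
    {"s", "ss"},
    {"n", "nn", "ñ"},
]

# All graphemes that take part in an alternation, longest first, so that
# two-letter spellings such as "rr" or "sc" are matched before "r" or "s".
GRAPHEMES = sorted({g for cls in ALTERNATIONS for g in cls}, key=len, reverse=True)


def alternatives(grapheme):
    """Return the union of all alternation sets containing the grapheme."""
    alts = set()
    for cls in ALTERNATIONS:
        if grapheme in cls:
            alts |= cls
    return alts


def variant_pattern(form):
    """Compile a case-insensitive regex matching spelling variants of form."""
    parts = []
    i = 0
    while i < len(form):
        for g in GRAPHEMES:
            if form.startswith(g, i):
                alts = sorted(alternatives(g), key=len, reverse=True)
                parts.append("(?:" + "|".join(map(re.escape, alts)) + ")")
                i += len(g)
                break
        else:
            # Letters outside the alternation sets are matched literally.
            parts.append(re.escape(form[i]))
            i += 1
    return re.compile(r"\b" + "".join(parts) + r"\b", re.IGNORECASE)
```

For instance, variant_pattern("venido") matches uenido, benido or VENIDO but not venidos; inflected participles such as venidos need a pattern of their own.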
The study includes 43 verb lemmata from a wide range of semantic classes of
intransitive verbs: change of location (volver ‘return’, venir ‘come’, etc.), change
of state (morir ‘die’, espantar(se) ‘become frightened’, crecer ‘grow’, etc.),
prolongation of a pre-existing state (quedar ‘stay’, fincar ‘stay’, etc.), and state
(yacer ‘lie’, etc.). Tokens of very frequent verb lemmata were sampled at random:
The upper limit of tokens collected per verb and century was set at 50, since this
quantity allows for statistical modeling. Because CORDE does not offer an automatic
randomization procedure for queries in single books, the randomization was done
manually by selecting random tokens from each section of a book.
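The capped sampling can be sketched as follows, assuming each token is stored as a record with the hypothetical keys verb, century and section. Drawing round-robin from shuffled sections is our approximation of "selecting random tokens from each section of a book"; only the cap of 50 tokens per verb and century is taken from the text.

```python
import random
from collections import defaultdict


def cap_sample(tokens, cap=50, seed=1):
    """Keep at most `cap` tokens per (verb, century), spread over sections.

    tokens: list of dicts with the (hypothetical) keys "verb", "century"
    and "section". Groups at or below the cap are kept in full.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for tok in tokens:
        groups[(tok["verb"], tok["century"])].append(tok)

    sample = []
    for items in groups.values():
        if len(items) <= cap:
            sample.extend(items)
            continue
        # Shuffle the tokens within each section of the book ...
        by_section = defaultdict(list)
        for tok in items:
            by_section[tok["section"]].append(tok)
        queues = list(by_section.values())
        for queue in queues:
            rng.shuffle(queue)
        # ... then draw one token per section in turn until the cap is reached.
        chosen = []
        while len(chosen) < cap:
            for queue in queues:
                if queue and len(chosen) < cap:
                    chosen.append(queue.pop())
        sample.extend(chosen)
    return sample
```

The round-robin draw guarantees that every section contributes, which mirrors the manual procedure more closely than a single random draw over the whole book would.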
Each token was annotated for persistence effects as follows. Szmrecsanyi’s (2005,
2006) work shows that persistence effects crucially hinge on temporal distance
because the effect of the original stimulus decays over time. Consequently,
persistence was modeled as a categorical variable combining the presence or absence
of a persisting token and, if such a token is present, the distance between that
token and the auxiliary + PtcP construction. The variables “PERSIST_BE” and
“PERSIST_HAVE” received the value 0 if no persistence-triggering ‘be’ + PtcP or
‘have’ + PtcP token, respectively, was present in the preceding co-text.7 If such a
token was present within the 200 words preceding the anchor token, the respective
variable received a value between 1 and 4. The value was assigned on the basis of
the quartiles of the distribution of distances and represents the distance in words
between the closest ‘be’ + PtcP or ‘have’ + PtcP token with temporal function and
the anchor token.8 Thus, the value “1” represents a very large number of intervening
words, whereas “4” represents a very small number of intervening words, with “2”
and “3” as intermediate values.

7 Only ‘be’ + PtcP and ‘have’ + PtcP tokens that fall in the envelope of variation were annotated
as persistence triggers. For instance, ‘be’ + PtcP constructions could have a passive function
in Old Spanish, and it is often difficult to distinguish between passive ‘be’ + PtcP tokens and
‘be’ + PtcP tokens with a temporal function (for instance, the verb morir could appear both with
an intransitive meaning (‘die’) and with a transitive meaning (‘kill’), whose passive yields ‘be
killed’). In considering only persistence effects due to form and function priming, the study
limits persistence to Szmrecsanyi’s (2005) notion of “α-persistence”. Thus, persistence effects
due to purely formal similarity (“β-persistence”) are not taken into account.
Although PERSIST_BE and PERSIST_HAVE gave the best results regarding the
synchronic influence of persistence on Spanish auxiliary selection, they proved
to be too fine-grained for the diachronic statistical analysis, because each of
these variables has five levels (0, 1, 2, 3, 4). In many instances, a single time
point did not contain enough tokens to yield a minimum number of occurrences for
each of these levels. For this reason, a second set of persistence variables was
created. PERSIST_BE_BIN and PERSIST_HAVE_BIN are binary variables referring only
to the presence or absence of a persisting ‘be’ + PtcP or ‘have’ + PtcP token,
respectively, in the preceding co-text. As an illustration of this coding
procedure, consider example (3).

(3) desque el Rey fue partido de Seuilla por
after the king be.pst.pfv.3sg leave.ptcp.m.sg from Sevilla because
venir a Madrid, el Maestro de Alcantara et los caualleros
come.inf to Madrid the master of Alcantara and the knights
que eran fincados con el aiustaron en Cordoua
that be.pst.ipfv.3pl stay.ptcp.m.pl with him agree.pst.pfv.3pl in Cordoba
con algunos ricos hombres et concellos de la frontera
with some rich men and councils of the border
‘When the King had left Sevilla for Madrid, the master of Alcantara and the knights
who had stayed with him came to an agreement with some rich men and border cities’
[CRO]

The ‘be’ + PtcP token eran fincados is preceded by the ‘be’ + PtcP token fue partido,
which is similar in function. Consequently, a persistence effect is assumed, and
the example receives the value “TRUE” for the variable PERSIST_BE_BIN. There are
15 intervening words between the first and the second occurrence of ‘be’. Due to
this rather small distance, the example receives the value “4” on the variable
PERSIST_BE. Note that there is no ‘have’ + PtcP token in example (3), nor in the
rest of the preceding co-text (not given in (3)). Consequently, example (3)
receives the value “FALSE” for the variable PERSIST_HAVE_BIN and “0” on the
variable PERSIST_HAVE.
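The two coding schemes can be sketched as a pair of small functions. This is an illustration under stated assumptions: the band limits are copied from the distance ranges reported in Tables 1 and 2 rather than recomputed from quartiles, the function names are hypothetical, and the distance is assumed to be counted in words between the closest trigger and the anchor token. Under these bands, the 15-word distance in example (3) falls into the closest band.

```python
WINDOW = 200  # only triggers within the preceding 200 words count

# Upper bounds (in words) of the distance bands for levels 4, 3 and 2, read
# off Tables 1 and 2; distances above the last bound up to WINDOW are level 1.
BE_BANDS = (31, 66, 111)    # PERSIST_BE: 4 = 1-31, 3 = 32-66, 2 = 67-111, 1 = 112-200
HAVE_BANDS = (29, 62, 108)  # PERSIST_HAVE: 4 = 1-29, 3 = 30-62, 2 = 63-108, 1 = 109-200


def persist_level(distance, bands):
    """Five-level coding: 0 = no trigger in the window, 4 = closest band.

    distance: words between the closest trigger and the anchor token,
    or None if the preceding co-text contains no trigger.
    """
    if distance is None or distance > WINDOW:
        return 0
    for level, upper in zip((4, 3, 2), bands):
        if distance <= upper:
            return level
    return 1


def persist_bin(distance):
    """Binary coding used for the diachronic analysis (*_BIN variables)."""
    return distance is not None and distance <= WINDOW


# Example (3): a 'be' + PtcP trigger 15 words back, no 'have' + PtcP trigger.
print(persist_level(15, BE_BANDS), persist_bin(15))        # 4 True
print(persist_level(None, HAVE_BANDS), persist_bin(None))  # 0 False
```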

8 As a result, the values of the variables PERSIST_BE and PERSIST_HAVE represent slightly
different distances in words between stimulus and anchor token (cf. Tables 1 and 2).

4 Descriptive analysis

Szmrecsanyi’s (2005, 2006) analysis of the influence of persistence on
morphosyntactic alternations predicts that a higher score for the variable
PERSIST_BE leads to a higher frequency of the selection of ‘be’ over ‘have’. By
contrast, a higher score for the variable PERSIST_HAVE would be expected to favor
the selection of ‘have’ over ‘be’. Tables 1 and 2 illustrate that this expectation
is largely borne out by the data.

Table 1: Distribution of ‘have’ + PtcP and ‘be’ + PtcP according to PERSIST_BE

PERSIST_BE                             ‘have’ + PtcP     ‘be’ + PtcP       TOTAL
                                       N       %         N       %         N       %
0: No persisting ‘be’ + PtcP token     1453    49.0      1513    51.0      2966    100
1: Textual distance 112–200 words        46    24.5       142    75.5       188    100
2: Textual distance 67–111 words         50    26.0       142    74.0       192    100
3: Textual distance 32–66 words          52    26.9       141    73.1       193    100
4: Textual distance 1–31 words           49    25.4       144    74.6       193    100
SUM                                    1650              2082              3732    100

Table 2: Distribution of ‘have’ + PtcP and ‘be’ + PtcP according to PERSIST_HAVE

PERSIST_HAVE                           ‘have’ + PtcP     ‘be’ + PtcP       TOTAL
                                       N       %         N       %         N       %
0: No persisting ‘have’ + PtcP token    573    34.7      1080    65.3      1653    100
1: Textual distance 109–200 words       231    44.6       287    55.4       518    100
2: Textual distance 63–108 words        267    50.8       259    49.2       526    100
3: Textual distance 30–62 words         273    52.9       243    47.1       516    100
4: Textual distance 1–29 words          306    59.0       213    41.0       519    100
SUM                                    1650              2082              3732    100

The percentages of use of ‘have’ + PtcP and ‘be’ + PtcP vary within a range of
about 25 percentage points according to whether or not a persistence-triggering
‘have’ + PtcP or ‘be’ + PtcP token is present in the preceding co-text. Table 1
demonstrates that in the absence of a persisting ‘be’ + PtcP token, the distribution
of ‘have’ + PtcP and ‘be’ + PtcP is rather balanced (49 percent vs. 51 percent).
However, in tokens where a persisting ‘be’ + PtcP token is present, ‘have’ + PtcP is
much less frequent than ‘be’ + PtcP (approximately 26 percent vs. 74 percent). Note
that, contrary to expectation, a smaller distance between a persisting ‘be’ + PtcP
token and the anchor token does not appear to reinforce this tendency.
Table 2 demonstrates that the persistence effect operates for both alternatives.
In the absence of a persisting ‘have’ + PtcP token, ‘have’ + PtcP is less frequent
than ‘be’ + PtcP (34.7 percent vs. 65.3 percent). If a ‘have’ + PtcP token is present,
however, the share of ‘have’ + PtcP rises, and from condition 2 onwards ‘have’ + PtcP
is more frequent than ‘be’ + PtcP. This effect increases with decreasing distance in
words and is strongest in condition 4, with the smallest distance in words (1–29
words), where the relative frequency of ‘have’-selection is 59 percent and the
relative frequency of ‘be’-selection is 41 percent.
Note that in absolute numbers, tokens preceded by a persistence-triggering ‘have’ +
PtcP token are almost three times as frequent as tokens preceded by a
persistence-triggering ‘be’ + PtcP token: 2,079 of the 3,732 tokens are preceded by a
persisting ‘have’ + PtcP token, but only 766 by a persisting ‘be’ + PtcP token. This
observation is unsurprising given that ‘have’ + PtcP gradually became the more
frequent variant, replacing ‘be’ + PtcP. In addition, the relative scarcity of tokens
involving a persisting ‘be’ + PtcP token may explain why the descriptive analysis
does not demonstrate a word distance effect for the variable PERSIST_BE.
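The figures cited in the last three paragraphs follow directly from the counts in Tables 1 and 2. The short sketch below, whose helper names are our own, recomputes the row percentages, their spread in percentage points, and the number of tokens preceded by a persisting token (levels 1 to 4).

```python
# Counts from Tables 1 and 2: level -> (N 'have' + PtcP, N 'be' + PtcP)
TABLE1 = {0: (1453, 1513), 1: (46, 142), 2: (50, 142), 3: (52, 141), 4: (49, 144)}
TABLE2 = {0: (573, 1080), 1: (231, 287), 2: (267, 259), 3: (273, 243), 4: (306, 213)}


def have_percentages(table):
    """Row percentage of 'have'-selection for each persistence level."""
    return {level: round(100 * have / (have + be), 1)
            for level, (have, be) in table.items()}


def spread(percentages):
    """Range (max - min) of the row percentages, in percentage points."""
    return max(percentages.values()) - min(percentages.values())


def triggered_tokens(table):
    """Number of tokens preceded by a persisting token (levels 1-4)."""
    return sum(have + be for level, (have, be) in table.items() if level > 0)


pct_be_trigger = have_percentages(TABLE1)
pct_have_trigger = have_percentages(TABLE2)
print(round(spread(pct_be_trigger), 1), round(spread(pct_have_trigger), 1))  # 24.5 24.3
print(triggered_tokens(TABLE2), triggered_tokens(TABLE1))                    # 2079 766
```

The ratio 2079 / 766 ≈ 2.7 is what "almost three times as frequent" refers to.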
With the exception of the effect of the distance between stimulus and anchor
token on ‘be’-selection, the data from Spanish auxiliary selection meets the
expectations regarding the synchronic influence of persistence derived from the
discussion of Szmrecsanyi’s (2005, 2006) analysis. These descriptive findings
thus illustrate the influence of usage on morphosyntactic phenomena such as
auxiliary selection. Spanish auxiliary selection is crucially conditioned by the
verb lemma from which the participle is formed. However, the writers of the
historiographical texts gathered in the corpus did not base their decision to use
one auxiliary over the other solely on factors such as the semantics of the
auxiliated verb. The existence of a persistence effect in the data suggests a view
of competence that is highly dependent on contextual factors, particularly
frequency of occurrence. Persistence effects represent a direct influence of a
speaker’s experience with language on his/her language use.
In order to measure the diachronic development of the influence of these
persistence effects, it is necessary to establish a chronology of the disappearance
of ‘be’ + PtcP in the data. This study employs the variability-based neighbor
clustering (VNC) method developed in Gries and Hilpert (2008) and Hilpert and Gries
(2009).9 VNC offers a data-driven method to statistically identify qualitatively
different temporal stages in the development of a given linguistic phenomenon.
This is achieved by a modified hierarchical agglomerative clustering method. The
algorithm quantifies the dissimilarity of the data points (representing 50-year
intervals) with regard to a specific variable; in this case, this variable is the
number of ‘be’ + PtcP tokens in comparison to the number of ‘have’ + PtcP tokens at
each point in time. Unlike standard hierarchical clustering, it only considers data
points that are adjacent in time: It recursively merges the two adjacent data points
that are most similar to one another into a new data point, until the last two data
points have been merged. The result is a hierarchically organized tree of clustered
data points that allows the identification of temporal stages in a diachronic
process (Gries and Hilpert 2008: 64–65).

9 All statistical tests and plots presented in this paper were produced using the open-source
statistical software R (R Development Core Team 2012).

[Plot: VNC dendrogram (y-axis: distance in summed standard deviations) over the frequency of ‘be’-selection (y-axis: tokens per million words); x-axis: time, 1275–1625]

Figure 1: Variability-based neighbor clustering (VNC) analysis for the development of the
percentage of use of ‘be’-selection in the corpus of historiographical texts

In Figure 1, the line with breakpoints in the background plots the frequency of
‘be’-selection relative to ‘have’-selection, normalized per million words, in each of
the eight time periods. The dendrogram in the foreground illustrates the clustering
proposed by VNC on the basis of the data. It suggests two temporal clusters
separated by the greatest distance in summed standard deviations: a first cluster
spanning the period from the thirteenth century until the mid-fifteenth century,
and a second cluster spanning the period from the mid-fifteenth century until the
mid-seventeenth century. In line with the description given in Section 2, the
replacement of ‘be’ + PtcP with ‘have’ + PtcP did not accelerate until after the
beginning of the fifteenth century. Based on the VNC analysis, the data was
therefore divided into two time periods: Old Spanish (1270–1424) and Early Modern
Spanish (1425–1650).
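The core merging step of VNC can be sketched in a few lines. This is a simplified illustration, not Gries and Hilpert's implementation: it assumes that only temporally adjacent clusters may merge and that the standard deviation of the pooled values serves as the merge criterion, and it ignores details such as how dendrogram heights are summed. The input values in the usage example are invented for illustration and are not the corpus data.

```python
from statistics import stdev


def vnc(values, labels):
    """Simplified variability-based neighbor clustering (VNC).

    Only temporally adjacent clusters may merge; at each step the adjacent
    pair whose pooled values have the smallest standard deviation is merged.
    Returns the merge history and the two clusters left before the final merge.
    """
    clusters = [([v], [lab]) for v, lab in zip(values, labels)]
    history, top_split = [], None
    while len(clusters) > 1:
        if len(clusters) == 2:
            top_split = [labs for _, labs in clusters]
        costs = [stdev(clusters[i][0] + clusters[i + 1][0])
                 for i in range(len(clusters) - 1)]
        i = costs.index(min(costs))  # ties: leftmost pair wins
        merged = (clusters[i][0] + clusters[i + 1][0],
                  clusters[i][1] + clusters[i + 1][1])
        history.append((merged[1], costs[i]))
        clusters[i:i + 2] = [merged]
    return history, top_split


# Invented percentages of 'be'-selection for eight 50-year periods.
periods = [1275, 1325, 1375, 1425, 1475, 1525, 1575, 1625]
toy_values = [40, 42, 41, 43, 20, 12, 8, 5]
history, split = vnc(toy_values, periods)
print(split)  # [[1275, 1325, 1375, 1425], [1475, 1525, 1575, 1625]]
```

The top-level split returned here plays the role of the two-stage division read off the dendrogram in Figure 1.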

If, as hypothesized in this paper, the use of the disappearing construction
‘be’ + PtcP increasingly relies on persistence effects, this dependence should
become palpable only in tokens after the mid-fifteenth century. Figure 2 illustrates
the development of the influence of the binary persistence variable PERSIST_BE_BIN
on Spanish auxiliary selection over time:

[Plot: percentage of ‘be’-selection over ‘have’-selection (y-axis, 20%–100%) by time (x-axis, 1275–1625), with one line for tokens with and one for tokens without a ‘be’ + PtcP token in the previous co-text]

Figure 2: Development of the influence of the binary persistence variable PERSIST_BE_BIN on
Spanish auxiliary selection over time

The distance between the two lines (the percentage of ‘be’-selection in tokens
where PERSIST_BE_BIN = TRUE at a given point in time, and the percentage of
‘be’-selection in tokens where PERSIST_BE_BIN = FALSE) gradually increases. As
expected, this effect gains strength only after the beginning of the fifteenth
century: From 1425 onwards, ‘be’ is selected relatively more often in tokens
preceded by a persisting ‘be’ + PtcP token than in tokens for which no persisting
‘be’ + PtcP token is attested in the co-text. Consequently, the increasing
dependence of ‘be’ + PtcP tokens on persistence appears to be related to the process
of disappearance of ‘be’ + PtcP.
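The comparison plotted in Figure 2 amounts to computing, per period, the percentage of ‘be’-selection separately for tokens with and without a persisting ‘be’ + PtcP token, and taking the difference. A minimal sketch, assuming the tokens are available as hypothetical (period, persist_be_bin, is_be) triples; the usage example uses invented toy data, not the corpus.

```python
from collections import defaultdict


def be_share_by_persistence(tokens):
    """Per period: % 'be' with a persisting 'be' token, % 'be' without, and the gap.

    tokens: iterable of (period, persist_be_bin, is_be) triples.
    Periods lacking tokens in either group are skipped.
    """
    counts = defaultdict(lambda: {True: [0, 0], False: [0, 0]})  # [n_be, n]
    for period, persist, is_be in tokens:
        cell = counts[period][bool(persist)]
        cell[0] += int(is_be)
        cell[1] += 1
    result = {}
    for period in sorted(counts):
        (be_t, n_t), (be_f, n_f) = counts[period][True], counts[period][False]
        if n_t and n_f:
            pct_t, pct_f = 100 * be_t / n_t, 100 * be_f / n_f
            result[period] = (pct_t, pct_f, pct_t - pct_f)
    return result


# Invented toy data: no persistence gap in 1375, a clear gap in 1475.
toy = ([(1375, True, True)] * 3 + [(1375, True, False)]
       + [(1375, False, True)] * 3 + [(1375, False, False)]
       + [(1475, True, True)] * 3 + [(1475, True, False)]
       + [(1475, False, True)] + [(1475, False, False)] * 3)
print(be_share_by_persistence(toy))
# {1375: (75.0, 75.0, 0.0), 1475: (75.0, 25.0, 50.0)}
```

A widening gap over successive periods is the descriptive pattern that the regression model below tests for significance.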

5 Multivariate analysis

The descriptive results summarized in Figure 2 were evaluated for significance
using generalized linear mixed-effect regression modeling. In Section 5.1, the
model selection process is described. Section 5.2 gives a description of the results.
In Section 5.3, these results are discussed.

5.1 Model selection

Generalized linear mixed-effect regression models (henceforth: GLMMs) provide a
way to calculate the degrees of correlation of several predictor variables with a
categorical dependent variable (Pinheiro et al. 2009). Like other regression models,
GLMMs are fitted iteratively: They search for the parameter estimates under which
the observed data are most likely (maximum likelihood), taking all predictor
variables into account simultaneously. GLMMs evaluate the statistical significance
of an effect much more precisely than simple chi-square tests because, as a
multivariate statistical method, they allow the simultaneous assessment of a
number of predictors that may operate on the same phenomenon.
In contrast to simple logistic regression models, GLMMs allow for the inclusion
of random effects. For fixed-effect variables, the model estimates a coefficient
(the slope of the regression line); for random-effect variables, it estimates the
variance of the random intercepts, i. e., how much the individual levels of the
variable deviate from the overall intercept. Including a variable as a random
effect thus allows us to account for variables that represent a more or less
random selection out of a greater population, but which are assumed to influence
the dependent variable (cf. Baayen 2008: chapter 7 for more information). In this
study, the variable VERB LEMMA (the verb lemma from which the participle is
formed) qualifies as a random effect: Its values represent only a subset of all
possible values (more verb lemmata could have been included). Controlling for
VERB LEMMA reduces the risk that an effect found in the data is driven by only a
subset of the data, i. e. by only some of the verbs.
The generalized linear mixed-effect regression model described in this
section measures the probability of a positive value on the dependent variable
BE (‘be’-selection over ‘have’-selection) as a function of the date of the token’s
source text (variable TIME) and the presence of a ‘be’ + PtcP token in the
preceding co-text (variable PERSIST_BE_BIN). The variable VERB LEMMA was included
as a random effect. Both TIME and PERSIST_BE_BIN were modeled as binary variables.
Given that the VNC analysis suggested a binary division between Old Spanish
(1270–1424) and Early Modern Spanish (1425–1650), TIME was coded according to
whether the source text of the token is from Old Spanish (FALSE) or Early Modern
Spanish (TRUE). The persistence variable PERSIST_BE_BIN was modeled as a binary
variable because the descriptive analysis in Section 4 has shown that (a) the data
does not appear to be sufficient to include the word distance between persisting
stimulus and anchor token as a predictor, and (b) the word distance does not
appear to increase the effect of persistence on ‘be’-selection.
As a last predictor variable, an interaction term between TIME and PERSIST_BE_BIN
was included. This interaction term measures whether the effect of a persisting
‘be’ + PtcP token on ‘be’-selection increased or decreased in Early Modern Spanish
in comparison to Old Spanish. Table 3 summarizes the variables employed in the
regression model.

Table 3: Coding summary of the variables employed in the regression model

Variable type        Variable              Coding type     Values
DEPENDENT VARIABLE   BE                    Binary          FALSE: ‘have’ + PtcP selected
                                                           TRUE: ‘be’ + PtcP selected
FIXED EFFECTS        TIME                  Binary          FALSE: Old Spanish (1270–1424)
                                                           TRUE: Early Modern Spanish (1425–1650)
                     PERSIST_BE_BIN        Binary          FALSE: Token is not preceded by a ‘be’ + PtcP
                                                           token with temporal function
                                                           TRUE: Token is preceded by a ‘be’ + PtcP
                                                           token with temporal function
                     TIME:PERSIST_BE_BIN   Binary:Binary   Interaction between the two binary variables
                                                           TIME and PERSIST_BE_BIN
RANDOM EFFECTS       VERB LEMMA            Factor          43 values (i. e., the 43 verb lemmata from
                                                           which the participles are formed)

Using the function lmer() from the lme4 package in R, this statistical setup
yielded the regression formula lmer(BE ~ TIME + PERSIST_BE_BIN + TIME:PERSIST_BE_BIN
+ (1 | `VERB LEMMA`), data = file, family = "binomial"). As evident in the formula,
the model was set to assume a binomial distribution because it is essentially a
logistic regression model with a binary outcome. Table 4 summarizes the results
from the regression model.
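Written out as an equation, the formula corresponds to the following logistic mixed model. The notation (i for tokens, j for verb lemmata, β for fixed-effect coefficients, u for random intercepts) is ours, and TIME and PERSIST (= PERSIST_BE_BIN) are assumed to be coded 0 for FALSE and 1 for TRUE:

```latex
\operatorname{logit} \Pr(\mathrm{BE}_{ij} = 1)
  = \beta_0 + \beta_1 \, \mathrm{TIME}_{ij} + \beta_2 \, \mathrm{PERSIST}_{ij}
  + \beta_3 \, \mathrm{TIME}_{ij} \, \mathrm{PERSIST}_{ij} + u_j ,
  \qquad u_j \sim \mathcal{N}\!\left(0, \sigma^2_{\mathrm{verb}}\right)
```

Here β0 to β3 correspond to the four rows of Table 4, and σ²_verb is the random-intercept variance for VERB LEMMA. Under this coding, β2 is the persistence effect in Old Spanish and β2 + β3 the persistence effect in Early Modern Spanish.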
Before the description of these results in the next section, a short evaluation
of the model fit is in order. The model scores high for the C index of concordance
(0.90 of 1) and Somers’ Dxy (0.80 of 1). Although all of the predictors
significantly enhance the model fit, the good score of the model is above all a
result of the random effect VERB LEMMA. The model calculates a high degree of
variance (4.51) for the random effect VERB LEMMA. As predicted by the literature
reviewed in Section 2, auxiliary selection is determined much more by the verb
lemma from which the participle is formed than by the author of the source text.
For instance, the fact that the event structure template of verbs such as morir
(‘die’) involves a transition is a very potent predictor of Spanish auxiliary
selection.

Table 4: Results from the generalized linear mixed-effect regression model

VARIABLE               ESTIMATE   ODDS RATIO   STANDARD ERROR   P
(Intercept)               1.342        3.826            0.375   0.000
TIME                     -2.778        0.062            0.130   0.000
PERSIST_BE_BIN            0.329        1.390            0.192   0.087
TIME:PERSIST_BE_BIN       0.762        2.143            0.250   0.002

MODEL EVALUATION       C index of concordance = 0.90
                       Somers’ Dxy = 0.80
                       AIC = 3163
                       N = 3732
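The two fit measures are closely related. The C index is the proportion of all pairs consisting of one ‘be’ token and one ‘have’ token in which the model assigns the ‘be’ token the higher predicted probability of ‘be’-selection (ties count as one half), and Somers’ Dxy rescales it as Dxy = 2(C − 0.5), which is why C = 0.90 corresponds to Dxy = 0.80. The sketch below computes both from predicted probabilities; the function names and the toy inputs are ours, not the model's output.

```python
def c_index(probs, outcomes):
    """Concordance index: share of ('be', 'have') token pairs in which the
    'be' token received the higher predicted probability (ties count 0.5)."""
    pos = [p for p, y in zip(probs, outcomes) if y]
    neg = [p for p, y in zip(probs, outcomes) if not y]
    score = 0.0
    for p_pos in pos:
        for p_neg in neg:
            if p_pos > p_neg:
                score += 1.0
            elif p_pos == p_neg:
                score += 0.5
    return score / (len(pos) * len(neg))


def somers_dxy(probs, outcomes):
    """Somers' Dxy as a linear rescaling of the C index."""
    return 2 * (c_index(probs, outcomes) - 0.5)


# Toy predictions (not the model's): one discordant pair out of four.
probs = [0.9, 0.4, 0.6, 0.2]
outcomes = [True, True, False, False]
print(c_index(probs, outcomes), somers_dxy(probs, outcomes))  # 0.75 0.5
```

This pairwise loop is quadratic in the number of tokens, which is unproblematic for a corpus of a few thousand tokens.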
Figure 3 illustrates this fact. It gives the by-word random intercepts calculated
by the model for each verb. Each point in the plot refers to one verb. Its value
on the y-axis represents the adjusted intercept for the respective value of the
variable VERB LEMMA with regard to the dependent variable BE. Thus, verbs with a
random intercept higher than 0 favor the ‘be’ + PtcP construction more strongly
than the average verb, whereas verbs with a random intercept lower than 0 favor
the ‘have’ + PtcP construction more strongly. For the sake of clarity, the names of
some of the highest- and lowest-ranking verbs are given next to the points that
represent them.10

Figure 3: By-word random intercepts for each verb lemma
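To see why VERB LEMMA dominates the fit, the reported random-intercept variance of 4.51 can be translated into probabilities. The sketch below assumes that the random intercepts are normally distributed with that variance (standard deviation of about 2.12) and combines them with the fixed effects from Table 4; it considers hypothetical verbs one standard deviation above or below the average verb in the absence of persistence, not any actual verb shown in Figure 3.

```python
import math

INTERCEPT, B_TIME = 1.342, -2.778   # fixed effects from Table 4
VERB_SD = math.sqrt(4.51)           # SD of the by-verb random intercepts


def inv_logit(x):
    """Map log-odds to a probability."""
    return 1 / (1 + math.exp(-x))


def p_be_for_verb(time, verb_offset_sd):
    """P('be') without persistence for a hypothetical verb whose random
    intercept lies verb_offset_sd standard deviations from the mean verb."""
    return inv_logit(INTERCEPT + B_TIME * time + verb_offset_sd * VERB_SD)


for time, period in ((0, "Old Spanish"), (1, "Early Modern Spanish")):
    low, high = p_be_for_verb(time, -1), p_be_for_verb(time, +1)
    print(f"{period}: {low:.3f} (1 SD below) vs {high:.3f} (1 SD above)")
```

Within each period, moving from one standard deviation below to one above the average verb changes the probability of ‘be’-selection far more than any of the fixed effects other than TIME.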

5.2 Description of the results

The description of the results summarized in Table 4 focuses on two values for
each effect: the odds ratio (OR) and the p-value (P). P indicates the statistical
significance of an effect: Each effect to which the regression model assigns a
p-value lower than the threshold value of 0.05 can be assumed to be statistically
significant. The OR, by contrast, indicates the strength and direction of the
correlation between the predictor variable and the dependent variable. ORs assume a
value between 0 and ∞. If the OR is below 1, a positive value on the predictor
variable lowers the odds, and hence the probability, of a positive value on the
dependent variable (in this case, ‘be’-selection). If the OR is above 1, a positive
value on the predictor variable raises the probability of a positive value on the
dependent variable. Crucially, the size of an OR does not imply statistical
significance as such: An effect with a very high or very low OR might not reach
statistical significance.
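Both columns of Table 4 can be recovered from the estimates and standard errors. The OR is the exponentiated log-odds estimate, and the p-values are consistent with two-sided Wald z-tests (z = estimate / standard error), which is what R's mixed-model software typically reports for binomial models; that this is how Table 4 was produced is our assumption. The helper names below are ours.

```python
import math

# Estimates and standard errors as reported in Table 4.
TABLE4 = {
    "(Intercept)": (1.342, 0.375),
    "TIME": (-2.778, 0.130),
    "PERSIST_BE_BIN": (0.329, 0.192),
    "TIME:PERSIST_BE_BIN": (0.762, 0.250),
}


def odds_ratio(estimate):
    """A log-odds coefficient becomes an odds ratio by exponentiation."""
    return math.exp(estimate)


def wald_p(estimate, se):
    """Two-sided p-value for the Wald statistic z = estimate / se."""
    z = estimate / se
    return math.erfc(abs(z) / math.sqrt(2))


for name, (est, se) in TABLE4.items():
    print(f"{name:<20} OR = {odds_ratio(est):.3f}  p = {wald_p(est, se):.3f}")
```

Because the estimates are rounded to three decimals, the recomputed values can differ from Table 4 in the last digit (the intercept's OR comes out as 3.827 rather than 3.826).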
The model demonstrates a strong effect of TIME on auxiliary selection: In
comparison to tokens from source texts before 1425, the usage frequency of ‘be’ +
PtcP relative to ‘have’ + PtcP drops significantly after 1425 (OR = 0.062,
P < 0.001).
Although PERSIST_BE_BIN reaches only marginal statistical significance in
the regression model, subsequent analyses of a larger corpus in Rosemeyer
(2014) have shown that if more examples are included, the main effect of
persistence reaches statistical significance. If a ‘be’ + PtcP token that falls in
the envelope of variation occurs in the preceding co-text of an auxiliary + PtcP
token, ‘be’-selection becomes more probable (OR = 1.390, P < 0.1). Because the
model includes an interaction with TIME, this main effect describes the persistence
effect in Old Spanish, the reference level of TIME.
The interaction between TIME and PERSIST_BE_BIN has a significant pos-
itive influence on the probability of ‘be’-selection. Although the usage frequency
of ‘be’-selection decreases rapidly in Early Modern Spanish, the negative effect of
TIME on ‘be’-selection is to a certain extent cushioned by PERSIST_BE_BIN. As
predicted, late ‘be’ + PtcP tokens are more likely to involve a persisting ‘be’ + PtcP
token in the preceding co-text than early ‘be’ + PtcP tokens (OR = 2.143, P < 0.01).
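The interaction is easier to read on the probability scale. The sketch below plugs the Table 4 fixed effects into the logistic link for a verb with a random intercept of 0, i.e. an "average" verb (our simplifying assumption); it illustrates the direction of the interaction rather than reproducing a figure from this study. Within Old Spanish the persistence odds ratio is exp(0.329) ≈ 1.39, within Early Modern Spanish it is exp(0.329 + 0.762) ≈ 2.98.

```python
import math

# Fixed-effect estimates from Table 4 (log-odds scale).
B0, B_TIME, B_PERSIST, B_INTER = 1.342, -2.778, 0.329, 0.762


def inv_logit(x):
    """Map log-odds to a probability."""
    return 1 / (1 + math.exp(-x))


def p_be(time, persist, verb_intercept=0.0):
    """Model-implied probability of 'be'-selection; time and persist are 0/1.
    verb_intercept = 0 corresponds to an 'average' verb lemma."""
    eta = (B0 + B_TIME * time + B_PERSIST * persist
           + B_INTER * time * persist + verb_intercept)
    return inv_logit(eta)


for time, period in ((0, "Old Spanish"), (1, "Early Modern Spanish")):
    without, with_ = p_be(time, 0), p_be(time, 1)
    print(f"{period}: {without:.3f} without vs {with_:.3f} with a persisting 'be' token")

# Odds ratio of persistence within each period
print(round(math.exp(B_PERSIST), 2), round(math.exp(B_PERSIST + B_INTER), 2))  # 1.39 2.98
```

For the average verb, a persisting token raises the probability of ‘be’-selection by about 5 percentage points in Old Spanish but by about 22 points in Early Modern Spanish, which is the "cushioning" described above.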

10 See Rosemeyer (2012a) for a more comprehensive discussion of the influence of verb seman-
tics on Spanish auxiliary selection.

5.3 Discussion of the results

The results from the regression model suggest that from a diachronic perspective,
persistence influences a language’s systematicity. In particular, persistence has a
conserving effect: If a token of the disappearing ‘be’ + PtcP construction is used,
the probability that ‘be’ + PtcP is used in the following discourse rises. This leads
to “islands of use” of ‘be’ + PtcP in the texts. Rather than being scattered over
a text, later examples of ‘be’ + PtcP are clustered in specific text passages. Within
these text passages, the use of ‘be’ + PtcP is conserved.
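The link between persistence and “islands of use” can be illustrated with a toy simulation. This is not the author’s model: the probabilities and the helper names `simulate` and `mean_be_run_length` are hypothetical, chosen only to show that when a preceding ‘be’ + PtcP token raises the probability of the next one, ‘be’ tokens form longer runs than they would if each token were chosen independently at the same overall rate.

```python
import random


def simulate(n, p_base, p_after_be, rng):
    """Generate a sequence of auxiliaries ("be"/"have").

    Each token selects 'be' with probability p_after_be if the previous
    token selected 'be' (persistence), and with p_base otherwise.
    Setting p_after_be == p_base removes persistence.
    """
    seq = []
    prev_be = False
    for _ in range(n):
        p = p_after_be if prev_be else p_base
        prev_be = rng.random() < p
        seq.append("be" if prev_be else "have")
    return seq


def mean_be_run_length(seq):
    """Average length of uninterrupted runs of 'be' tokens."""
    runs, current = [], 0
    for aux in seq:
        if aux == "be":
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return sum(runs) / len(runs) if runs else 0.0


rng = random.Random(1)
persistent = simulate(100_000, p_base=0.1, p_after_be=0.5, rng=rng)
# Independent baseline with the same overall share of 'be' tokens.
share_be = persistent.count("be") / len(persistent)
independent = simulate(100_000, p_base=share_be, p_after_be=share_be, rng=rng)
print(round(share_be, 2),
      round(mean_be_run_length(persistent), 2),
      round(mean_be_run_length(independent), 2))
```

With these settings, the expected mean run length of ‘be’ is 1/(1 − 0.5) = 2 tokens under persistence, against about 1.2 tokens for the independent baseline (whose ‘be’ share is roughly 1/6), so the same overall frequency of ‘be’ is concentrated in fewer, longer stretches.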
Although entrenchment and persistence have both been shown to fulfill a conserving
function in diachronic processes, the findings suggest a difference between the
conserving effects of entrenchment and those of persistence. This difference concerns
the question of syntactic productivity (in the sense of Barðdal 2008). Entrenchment
always affects specific linguistic elements: The repeated use of a specific linguistic
element leads to a stronger cognitive representation of that item. As a result,
processes of language change operating on a paradigm have less effect on its highly
frequent members. Although this process conserves systematicity in the sense that an
alternation is conserved, it also creates irregularity in that the paradigm of the
disappearing construction becomes fractured. In the late stages of Spanish auxiliary
selection, for instance, some verbs denoting a change of location usually select ‘be’,
while others select ‘have’.
By contrast, the conserving effect of persistence does not create this type of
irregularity. Crucially, the mixed-effects regression model reported above controls
for verb-specific differences. Although a quantitative correlation between frequency
of use and persistence could be assumed (linguistic elements that appear more
frequently also trigger more persistence effects), the persistence effects observed in
this study do not differ in strength across linguistic elements but operate globally.
This is because the persisting token need not exactly match the ‘be’ + PtcP token it
triggers. Persistence consequently involves processes of pattern recognition, i.e.
analogy. In contrast, analogical thinking is largely irrelevant to entrenchment, in
which the cognitive representation of the exact linguistic item is strengthened. Given
this conceptual difference between entrenchment and persistence, it can be argued that
whereas the conserving effect of entrenchment creates irregularity in the paradigm of
a disappearing construction, the conserving effect of persistence affects all
instantiations of that construction alike. This is an empirical question to be
addressed in future research on frequency effects in language change.
308 Malte Rosemeyer

6 Summary and outlook

This paper has provided further evidence of a strong relationship between usage and
systematicity in language. Two processes crucial for the rise and conservation of
systematicity, entrenchment and persistence, have been described. Since both
entrenchment and persistence (temporarily) strengthen the cognitive representation of
a linguistic element, they have conserving effects in diachronic processes in which a
construction is disappearing from use. As a case study, split auxiliary selection in
Spanish was investigated. Later tokens of the disappearing ‘be’ + PtcP construction
were shown to be more likely than earlier ones to involve a persisting ‘be’ + PtcP
token in the preceding co-text. The use of ‘be’ + PtcP thus appears to rely
increasingly on persistence effects, which is why persistence is argued to have a
conserving effect on disappearing grammatical constructions.
The analysis proposed in this paper thus emphasizes the similarities between
entrenchment and persistence with regard to their effect. However, it also suggests
that the two processes may have different diachronic effects on the systematicity of
the disappearing construction’s paradigm. Whereas conservation due to entrenchment
affects specific linguistic elements and therefore leads to irregular, fractured
paradigms, conservation due to persistence acts globally. Because persistence relies on
analogical thinking, its conserving effect is expected to affect all linguistic
elements belonging to a given construction alike. I leave the investigation of this
hypothesis to future research.
In summary, this paper has illustrated the benefits of a usage-based approach
to historical linguistics. A speaker’s linguistic behavior is crucially determined
by his or her experience with language. The effect of linguistic experience is not
restricted to very recent language events (persistence), but can accumulate over
time (entrenchment). Acknowledging the intricate relationship between the use
of a language and its systematicity offers an explanation for quantitative effects
in linguistic data that are unexpected from the perspective of an abstract “system-
oriented” approach.

References
Abbot-Smith, Kirsten and Heike Behrens (2006): How known constructions influence
the acquisition of other constructions: The German periphrastic passive and future
constructions. Cognitive Science 30: 995–1026.
Admyte (1992): Archivo digital de manuscritos y textos españoles, Tomo 0. Madrid: Micronet.
Aranovich, Raúl (2003): The semantics of auxiliary selection in Old Spanish. Studies in
Language 27(1): 1–37.
Baayen, Harald (2008): Analyzing Linguistic Data. A Practical Introduction to Statistics Using
R. Cambridge: Cambridge University Press.
Baddeley, Alan D. (1966): Short-term memory for word sequences as a function of acoustic,
semantic and formal similarity. Quarterly Journal of Experimental Psychology 18: 362–365.
Barðdal, Jóhanna (2008): Productivity: Evidence from Case and Argument Structure in
Icelandic. Amsterdam: John Benjamins.
Behrens, Heike (2009): Usage-based and emergentist approaches to language acquisition.
Linguistics 47(2): 383–411.
Benzing, Joseph (1931): Zur Geschichte von ser als Hilfszeitwort bei den intransitiven Verben im
Spanischen. Zeitschrift für romanische Philologie 51: 385–460.
Bybee, Joan L. (2002): Sequentiality as the basis of constituent structure. In: Talmy Givón
and Bertram F. Malle (eds.), The Evolution of Language from Pre-Language, 109–132.
Amsterdam/Philadelphia: John Benjamins.
Bybee, Joan L. (2003): Mechanisms of change in grammaticization: the role of frequency. In:
Richard Janda and Brian Joseph (eds.), The Handbook of Historical Linguistics, 624–647.
Oxford: Blackwell.
Bybee, Joan L. (2006): From usage to grammar: the mind’s response to repetition. Language
82(4): 711–733.
Bybee, Joan L. (ed.) (2007): Frequency of Use and the Organization of Language. Oxford: Oxford
University Press.
Bybee, Joan L. (2010): Language, Usage, and Cognition. Cambridge/New York: Cambridge
University Press.
Bybee, Joan L. and Paul J. Hopper (eds.) (2001a): Frequency and the Emergence of Linguistic
Structure. Amsterdam: John Benjamins.
Bybee, Joan L. and Paul J. Hopper (2001b): Introduction to frequency and the emergence
of linguistic structure. In: Joan L. Bybee and Paul J. Hopper (eds.), Frequency and the
Emergence of Linguistic Structure, 1–26. Amsterdam: John Benjamins.
Bybee, Joan L. and Rena Torres Cacoullos (2009): The role of prefabs in grammaticization: how
the particular and the general interact in language change. In: Roberta Corrigan, Edith
A. Moravcsik, Hamid Ouali and Kathleen M. Wheatley (eds.), Formulaic Language, Volume
I: Distribution and Historical Change, 187–217. Amsterdam: John Benjamins.
Chomsky, Noam (1975): The Logical Structure of Linguistic Theory. New York: Plenum Press.
Coseriu, Eugenio (1974): Synchronie, Diachronie und Geschichte: Das Problem des
Sprachwandels: Übersetzt von Helga Sohre. München: Fink.
Ellis, Nick (1996): Sequencing in SLA: phonological memory, chunking and points of order.
Studies in Second Language Acquisition 18: 91–126.
Elvira González, Javier (2001): Intransitividad escindida en español: El uso auxiliar de ser en
español medieval. EdLing 15: 201–245.
Fernández-Ordóñez, Inés (2006): La Historiografía medieval como fuente de datos lingüísticos.
Tradiciones consolidadas y rupturas necesarias. In: José Jesús Bustos Tovar and José Luis
Girón Alconchel (eds.), Actas del VI Congreso Internacional de Historia de la Lengua
Española, 1779–1807. Madrid: Arco Libros.
García Martín, José María (2001): La formación de los tiempos compuestos del verbo en
español medieval y clásico. Aspectos fonológicos, morfológicos y sintácticos. València:
Universitat de València.
Goldberg, Adele E. (2006): Constructions at Work: the Nature of Generalization in Language.
Oxford: Oxford University Press.
Gries, Stefan (2005): Syntactic priming: a corpus-based approach. Journal of Psycholinguistic
Research 34(4): 365–399.
Gries, Stefan and Martin Hilpert (2008): The identification of stages in diachronic data:
variability-based neighbour clustering. Corpora 3(1): 59–81.
Griffin, Zenzi M. (2002): Recency effects for meaning and form in word selection. Brain and
Language 80: 465–487.
Hilpert, Martin and Stefan Gries (2009): Assessing frequency changes in multistage diachronic
corpora: Applications for historical corpus linguistics and the study of language
acquisition. Literary and Linguistic Computing 24(4): 385–401.
Hoey, Michael (2004): The textual priming of lexis. In: Guy Aston, Silvia Bernardini and Dominic
Stewart (eds.), Corpora and Language Learners, 21–41. Amsterdam: John Benjamins.
Hoey, Michael (2005): Lexical Priming. A New Theory of Words and Language. New York:
Routledge.
Hopper, Paul J. (1987): Emergent grammar. Berkeley Linguistics Society 13: 139–157.
Kailuweit, Rolf (2011): Le choix de l’auxiliaire: Être ou avoir en français standard contemporain.
In: Renata Enghels, Machteld Meulleman and Clara Vanderschueren (eds.), Peregrinatio
in Romania: Artículos en homenaje a Eugeen Roegiest con motivo de su 65 cumpleaños,
397–420. Gent: Academia Press.
Langacker, Ronald W. (1987): Foundations of Cognitive Grammar. Theoretical Prerequisites.
Stanford: Stanford University Press.
Lapesa, Rafael (1987): Estudios lingüísticos, literarios y estilísticos. Valencia: Servicio de
Publicación Universitat de Valencia.
Lieven, Elena V. M., Julian M. Pine and Gillian Baldwin (1997): Lexically-based learning and early
grammatical development. Journal of Child Language 24: 187–219.
MacWhinney, Brian (2006): Emergentism – use often and with care. Applied Linguistics 27(4):
729–740.
Mateu, Jaume (2009): Gradience and auxiliary selection in Old Catalan and Old Spanish. In:
Paola Crisma and Giuseppe Longobardi (eds.), Historical Syntax and Linguistic Theory,
176–193. Oxford: Oxford University Press.
Neely, James H. (1977): Semantic priming and retrieval from lexical memory: roles of
inhibitionless spreading activation and limited capacity attention. Journal of Experimental
Psychology: General 106(3): 226–254.
Neely, James H. (1991): Semantic priming effects in visual word recognition: a selective review
of current findings and theories. In: Derek Besner and Glyn W. Humphreys (eds.), Basic
Processes in Reading: Visual Word Recognition, 264–336. Hillsdale, NJ: Erlbaum.
Newell, Allen (1990): Unified Theories of Cognition. Cambridge, MA: Harvard University Press.
Parker, Kelvin M. (1977): La versión de Alfonso XI del Roman de Troie. Ms. H-j-6 del Escorial.
Illinois: Applied Literature Press.
Pinheiro, Jose, Douglas Bates, Saikat DebRoy, Deepayan Sarkar and the R Development Core
Team (2009): nlme: Linear and Nonlinear Mixed Effects Models. R package version 3.1–96.
R Development Core Team (2012): R: A Language and Environment for Statistical Computing,
Reference Index Version GUI 1.40-devel. R Foundation for Statistical Computing, Vienna,
Austria, URL http://www.R-project.org.
Real Academia Española (2010): Banco de datos (CORDE) [en línea]. Corpus diacrónico del
español. <http://www.rae.es> [17. 08. 2010].
Romani, Patrizia (2006): Tiempos de formación romance I: Los tiempos compuestos. In:
Concepción Company Company (ed.), Sintaxis histórica de la lengua española, 243–348.
México: Universidad Nacional Autónoma de México.
Rosemeyer, Malte (2012a): Auxiliary selection in Spanish. Gradience, gradualness, and
conservation. Ph.D. thesis, Albert-Ludwigs-Universität, Freiburg.
Rosemeyer, Malte (2012b): How to measure replacement: Auxiliary selection in Old Spanish
bibles. Folia Linguistica Historica 33(1): 135–174.
Rosemeyer, Malte (2013): Tornar and volver: The interplay of frequency and semantics in
compound tense auxiliary selection in Medieval and Classical Spanish. In: Jóhanna
Barðdal, Elly van Gelderen and Michela Cennamo (eds.), Argument Structure in Flux,
435–458. Amsterdam/Philadelphia: John Benjamins.
Rosemeyer, Malte (2014): Auxiliary Selection in Spanish. Gradience, Gradualness, and
Conservation. Amsterdam/Philadelphia: John Benjamins.
Szmrecsanyi, Benedikt (2005): Language users as creatures of habit: A corpus-based analysis
of persistence in spoken English. Corpus Linguistics and Linguistic Theory 1(1): 113–150.
Szmrecsanyi, Benedikt (2006): Morphosyntactic Persistence in Spoken English. A Corpus
Study at the Intersection of Variationist Sociolinguistics, Psycholinguistics, and Discourse
Analysis. Berlin/New York: Mouton de Gruyter.
Tomasello, Michael (1992): First Verbs: A Case Study of Early Grammatical Development.
Cambridge: Cambridge University Press.
Travis, Catherine E. (2007): Genre effects on subject expression in Spanish: Priming in narrative
and conversation. Language Variation and Change 19(2): 101–135.
Travis, Catherine E. and Rena Torres Cacoullos (2012): What do subject pronouns do in
discourse? Cognitive Linguistics 23(4): 711–748.
Yllera, Alicia (1980): Sintaxis histórica del verbo español: Las perífrasis medievales. Zaragoza:
Universidad de Zaragoza.
Appendix: Texts and editions in the corpus of historiographical texts¹¹

Table 5: Texts and editions in the corpus of historiographical texts

ID | Title | Date | Author | Source | Edition
EDEI | Estoria de Espanna que fizo el muy noble rey don Alfonsso, fijo del rey don Fernando et de la reyna ... | 1270 | Alfonso X | CORDE | Pedro Sánchez Prieto, Alcalá de Henares: Universidad de Alcalá de Henares, 2002
EDEII | Estoria de España, II | 1275 | Alfonso X | CORDE | Lloyd A. Kasten; John J. Nitti, Madison: Hispanic Seminary of Medieval Studies, 1995
GEI | General Estoria. Primera parte | 1277 | Alfonso X | CORDE | Pedro Sánchez Prieto-Borja, Alcalá de Henares: Universidad de Alcalá de Henares, 2002
GEIV | General Estoria. Cuarta parte | 1280 | Alfonso X | CORDE | Pedro Sánchez-Prieto Borja, Alcalá de Henares: Universidad de Alcalá, 2002
GCU | Gran Conquista de Ultramar | 1293 | Anonymous | ADMYTE | ADMYTE
CSA | Crónica de Sancho IV. Ms. 829 BNM | 1340 | Anonymous | CORDE | Pedro Sánchez-Prieto Borja, Alcalá de Henares: Universidad de Alcalá de Henares, 2004
RDT | Roman de Troie | 1345 | Anonymous | BOOK | Kelvin M. Parker, Illinois: Applied Literature Press, 1977
SUM | Sumas de la historia troyana de Leomarte | 1350 | Anonymous | CORDE | Robert G. Black, Madison: Hispanic Seminary of Medieval Studies, 1995
GCE1 | Gran crónica de España, III. BNM, ms. 10134 | 1384 | Fernández de Heredia, Juan | CORDE | Juan Manuel Cacho Blecua, Zaragoza: Universidad de Zaragoza, 2003
GCE2 | Gran crónica de España, I. Ms. 10133 BNM | 1385 | Fernández de Heredia, Juan | CORDE | Regina af Geijerstam, Madison: Hispanic Seminary of Medieval Studies, 1995
CDP | Crónica del rey don Pedro | 1400 | López de Ayala, Pero | CORDE | Germán Orduna, Buenos Aires: SECRIT, 1994
DTL | Traducción de las Décadas de Tito Livio | 1400 | López de Ayala, Pero | CORDE | Curt J. Wittlin, Barcelona: Puvill, 1982
TAM | Historia del gran Tamorlán. BNM 9218 | 1406 | González de Clavijo, Ruy | CORDE | Juan Luis Rodríguez Bravo; María del Mar Martínez Rodríguez, Madison: Hispanic Seminary of Medieval Studies, 1986
CRR | Crónica del rey don Rodrigo, postrimero rey de los godos (Crónica sarracina) | 1430 | Corral, Pedro de | CORDE | James Donald Fogelquist, Madrid: Castalia, 2001
VIC | El victorial | 1440 | Díaz de Games, Gutierre | CORDE | Rafael Beltrán Llavador, Madrid: Taurus, 1994
ATA | Atalaya corónicas. British L 288 | 1449 | Martínez de Toledo, Alfonso | CORDE | James B. Larkin, Madison: Hispanic Seminary of Medieval Studies, 1985
GJU | Guerra de Jugurtha de Caio Salustio Crispo. Escorial G. III.11 | 1450 | Ramírez de Guzmán, Vasco | CORDE | Jerry R. Rank, Madison: Hispanic Seminary of Medieval Studies, 1995
REP | Repertorio de príncipes de España | 1471 | Escavias, Pedro de | CORDE | Michel García, Madrid: Instituto de Estudios Giennenses, 1972
IBF | Istoria de las bienandanzas e fortunas | 1474 | García de Salazar, Lope | CORDE | Ana María Marín Sánchez, Madrid: Corde, 2000
ENRC | Crónica de Enrique IV de Castilla 1454–1474 | 1482 | Anonymous | CORDE | María Pilar Sánchez Parra, Madrid: Ediciones de la Torre, 1991
CRCP | Crónica de los Reyes Católicos (Hernando del Pulgar) | 1482 | Pulgar, Hernando del | CORDE | Juan de Mata Carriazo, Madrid: Espasa-Calpe, 1943
CVC | Claros varones de Castilla | 1486 | Pulgar, Hernando del | CORDE | Óscar Perea Rodríguez, Madrid: Universidad Complutense, 2003
CBC | Compilación de las batallas campales | 1487 | Rodríguez de Almela, Diego | CORDE | Lago Rodríguez López, Madison: Hispanic Seminary of Medieval Studies, 1992
ENRE | Crónica de Enrique IV | 1492 | Enríquez del Castillo, Diego | CORDE | Aureliano Sánchez Martín, Valladolid: Universidad de Valladolid, 1994
MAE | Hechos del Maestre de Alcántara don Alonso de Monroy | 1492 | Maldonado, Alonso | CORDE | Antonio Rodríguez Moñino, Madrid: Revista de Occidente, 1935
TCAF | Traducción de la Corónica de Aragón de fray Gauberto Fabricio de Vagad | 1499 | García de Santa María, Gonzalo | CORDE | José Carlos Pino Jiménez, Madison: Hispanic Seminary of Medieval Studies, 2002
TCAL | Traducción de la Crónica de Aragón de Lucio Marineo Siculo | 1524 | Molina, Juan de | CORDE | Óscar Perea, Madrid: Universidad Complutense de Madrid, 2003
CBE | Crónica burlesca del emperador Carlos V | 1527 | Zúñiga, Francés de | CORDE | José Antonio Sánchez Paso, Salamanca: Universidad de Salamanca, 1989
NAU | Los Naufragios | 1541 | Núñez Cabeza de Vaca, Alvar | CORDE | Enrique Pupo-Walker, Madrid: Castalia, 1992
HDI | Historia de las Indias | 1544 | Casas, Fray Bartolomé de las | CORDE | Paulino Castañeda Delgado, Madrid: Alianza Editorial, 1994
CEC | Crónica del Emperador Carlos V | 1550 | Santa Cruz, Alonso de | CORDE | Ricardo Beltrán y Antonio Blázquez, Madrid: Real Academia de la Historia, 1920
ANA | Anales de la corona de Aragón. Primera parte | 1562 | Zurita, Jerónimo | CORDE | Ángel Canellas López, Zaragoza: CSIC, 1967
GCP | Las guerras civiles peruanas | 1569 | Cieza de León, Pedro | CORDE | Carmelo Sáenz de Santamaría, Madrid: CSIC, 1985
CNE | Historia verdadera de la conquista de la Nueva España | 1572 | Díaz del Castillo, Bernal | CORDE | Carmelo Sáenz de Santa María, Madrid: CSIC, 1982
QUI | Quinquenarios o Historia de las guerras civiles del Perú (1544–1548) y de otros sucesos de las Indias | 1576 | Gutiérrez de Santa Clara, Pedro | CORDE | Madrid: Ediciones Atlas, 1963
GCG | Guerras civiles de Granada. 1ª parte | 1595 | Pérez de Hita, Ginés | CORDE | Shasta M. Bryant, Newark, Delaware: Juan de la Cuesta, 1982
HHC | Historia general de los hechos de los castellanos en las islas y tierra firme. Década primera | 1601 | Herrera y Tordesillas, Antonio de | CORDE | Ángel de Altolaguirre y Duvale, Madrid: Real Academia de la Historia, 1934
HVH | Historia de la vida y hechos del Emperador Carlos V | 1611 | Sandoval, Fray Prudencio de | CORDE | Alicante: Universidad de Alicante, 2003
HFE | Historia de Felipe II, rey de España | 1619 | Cabrera de Córdoba, Luis | CORDE | José Martínez Millán y Carlos Javier de Carlos Morales, Salamanca: Junta de Castilla y León, 1998
HDC | Historia y descripción de la antigüedad y descendencia de la Casa de Córdoba | 1625 | Fernández de Córdoba, Francisco | CORDE | Córdoba: Boletín de la Real Academia de Córdoba, 1954
HCA | Historia de los movimientos, separación y guerra de Cataluña | 1645 | Melo, Francisco Manuel de | CORDE | Joan Estruch Tobella, Madrid: Castalia, 1996

11 When the date of a source book was given as an approximate time span, the mean of that time span (rounded to a whole year) was used as the date. For instance, the Atalaya corónicas [ATA] were presumably written between 1443 and 1454; tokens from this source book were therefore assigned the date 1449.
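The date assignment described in footnote 11 amounts to taking the midpoint of the time span. The following sketch assumes, consistent with the ATA example (1443–1454 assigned 1449), that a midpoint falling on a half year is rounded up; the function name `assign_date` is hypothetical. Note that Python’s built-in `round()` would not reproduce this, because it rounds halves to the nearest even number (`round(1448.5)` gives 1448).

```python
def assign_date(start, end=None):
    """Return the date used for a source text.

    Single dates are kept as-is; for an approximate time span, the midpoint
    is used, with half years rounded up (e.g. 1443-1454 -> 1449).
    """
    if end is None:
        return start
    if end < start:
        raise ValueError("end of span precedes its start")
    # Integer arithmetic avoids float rounding: (a + b + 1) // 2 rounds x.5 up.
    return (start + end + 1) // 2


print(assign_date(1443, 1454))  # 1449
print(assign_date(1400))        # 1400
```

Using integer floor division rather than `round()` keeps the rule explicit and avoids the round-half-to-even behavior that would silently shift some spans down by one year.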
