Language Variation
Michal Starke*
Three decades after the “Principles and Parameters” revolution in language variation,
we still have no theory of variation. Thirty years ago, if some element moved in one
language but not in another, this would be expressed by adding a movement rule to one
language but not to the other. Today, it is expressed by adding a feature “I want to
move” (“EPP”, “strength”, etc.) to the elements of one language but not of the other. In
both cases (and in all attempts between them), we express variation by stipulating it, via
the postulation of a brute-force marker.
This paper shows that you can do variation without inventing any dedicated marker
such as “EPP features” or “strength of features”. The solution is simple: if you allow
lexical items to spell out entire syntactic phrases, some lexemes will spell out bigger
phrases and some will spell out smaller phrases – and I explore the conjecture that this
is all we need for variation.
Before I show you how to do parameters this way, we need to clarify some backdrop:
where Principles & Parameters (P&P) stands today, and why we ended up with [±“I
want to move”] features. On occasions such as this conference, one sometimes comes
across the claim that Minimalism is the successor to P&P, or that P&P failed because
some ideas about parameters are wrong, etc. Such claims rest on a misunderstanding of
what P&P is about.
The current lack of a satisfactory theory of parameters should not be taken to mean that
P&P failed as a framework. On the contrary, it has been working remarkably well at its
core. The “core” of P&P (Chomsky 1981) is the idea that most of grammar – perhaps all
of it – is invariant across languages. I.e. the “core” of P&P is the “principles” part.
Parameters are secondary in that they presuppose the principles: parameters only make
sense against the backdrop of pre-established principles. Parameters are simply the
“residue” left once the invariant cross-linguistic principles are factored out, whatever
that residue turns out to be. The current lack of a satisfying approach to parameters is
something we need to solve – and that's the aim of this paper – but it is certainly not a
good barometer of the overall health of P&P. By extension, any value judgement about
a particular approach to parameters is not a good barometer of the overall health of
P&P.
Once it is clear that P&P is primarily about the idea that most or all of grammar is
cross-linguistically invariant, it becomes obvious that Minimalism is an instance of
P&P, along with remnant-based approaches (Kayne 1994), GB (Chomsky 1981),
cartography (eg. Cinque 1999), nanosyntax (Starke 2002), etc.1 This is because
Minimalism builds on the idea of largely or entirely invariant cross-linguistic principles,
both in its early versions (economy theory, Chomsky 1993) and in its second generation
(phase theory, Chomsky 2000).

* Adapted from a presentation at the Barcelona workshop on parameters. Many thanks to Carme Picallo for her patience and support.
1 Depending on whether the “Principles” in “Principles and Parameters” are implicitly assumed to be inviolable principles only, some versions of Optimality Theory are also instances of P&P.
The question thus stands: given the health of the theory of invariants (“Principles”), can
we construct a non-stipulative theory of the residue (“Parameters”)? Let me first show
you that in trying to resolve this question, we are stuck in a dark and inhospitable place
– in part due to the very success of the invariants.
In the early days of P&P, there was plenty of space in the theoretical apparatus to
implement variation: Grammars could vary either lexically (perhaps complementisers
vary as to whether they induce subjacency effects or not) or in their principles
(grammars might vary as to whether they require wh-movement overtly or covertly). As
research advanced, that theoretical space open for implementing variation gradually
closed down, to the extent that the most promising contemporary approaches lead to a
landscape in which there is apparently no space at all left for variation.
The first move in that direction, a standard move by now, was to eliminate parametric
variation from grammar itself, and restrict it to the lexicon. According to that line of
thinking, principles are invariant across languages, but they are exquisitely sensitive to
the grammatical properties of lexical items. The second move was the rise of fine-
grained representations – ie. cartography, the functional sequence, etc. As years went
by, results piled up showing that the ingredients of syntax are smaller and more
numerous than classically thought, and hence syntactic representations are bigger and
more fine-grained than classically thought. To almost everybody's surprise, the order in
which those new ingredients (“phrases”, “projections”, “functional categories”) occurred
in syntactic representations also turned out to be largely or entirely invariant across
languages. This path of research is quickly leading in the direction of one feature per
terminal, and invariant content and order of phrases across languages. This is a
spectacular success for the invariant part of grammar (“principles”), but it just as
spectacularly shrinks the theoretical space available to express variation: under that
view, both the content and the order of features is invariant cross-linguistically. Not
only are grammatical principles invariant, but the features and underlying
representations they operate on are also invariant.
Not everybody lives at the end of that road (yet), but the further down the road you live,
the more desperate the variation problem looks. And yet, there is a simple way out. No
matter how much invariance you are led to accept in your principles, representations
and features, there is a simple and elegant solution to the parameter problem.
That solution builds on the idea of phrasal spellout. Assume an underlying syntactic
structure of the type:
(1) [a [b [c]]]
The idea of phrasal spellout is that an entire (sub)constituent such as for instance [b [c]]
in the structure above can be spelled out by one lexical item, eg 'how' in English:
(2) [a [b [c]]] – with ‘how’ spelling out the subconstituent [b [c]]
Given this technology, developed in the nanosyntax framework independently of
variation, lexical items come in various “sizes”: they spell out either bigger or smaller
constituents. This opens the possibility that in the next language we look at, the
counterpart of 'how' spells out a slightly different amount of structure; for instance, the
Slovak 'jak' might be:
(3) [a [b [c]]] – with ‘jak’ spelling out the entire constituent [a [b [c]]]
At this point, we have language variation without diacritic markers such as strength of
features: all grammatical processes involving the 'a' layer, aP, will affect the Slovak 'jak'
but not the English 'how' – or more generally Slovak wh-phrases but not English wh-
phrases.
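To make the mechanics concrete, here is a minimal sketch in Python, assuming a toy encoding in which a (sub)tree is a nested tuple of labels, so that ('a', ('b', ('c',))) stands for [a [b [c]]]; the encoding and the function name are illustrative conveniences, not part of the proposal itself:

```python
# A minimal sketch of lexical "size" differences, under a toy encoding:
# a (sub)tree is a nested tuple of labels, ('a', ('b', ('c',))) = [a [b [c]]].

LEXICON = {
    'how': ('b', ('c',)),            # English: stores only [b [c]], cf. (2)
    'jak': ('a', ('b', ('c',))),     # Slovak: stores all of [a [b [c]]], cf. (3)
}

def spells_out_aP(item):
    """True if the item's stored tree includes the aP layer."""
    return LEXICON[item][0] == 'a'

# Processes targeting the aP layer distinguish the two items without any
# diacritic such as "strength" or an EPP feature:
assert spells_out_aP('jak') and not spells_out_aP('how')
```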
The conjecture I am exploring is that such “size differences” are enough to express all
cross-linguistic syntactic variation. What we thought of as “parameters” are just
differing sizes of lexical items. Here, I restrict myself to the theoretical side of this
claim: detailing the logic of the above reasoning and showing how it leads to various
types of grammatical variation.
At first sight, however, it looks like there is a catch. To see it, let's start by looking more
closely at the notion of phrasal spellout. The motivation for it runs something like this
(see nanosyntax work for a more detailed walkthrough): as syntactic representations
become bigger, their terminals become more fine-grained, until they reach the point of
being “sub-morphemic”, ie smaller than individual morphemes. Syntactic trees with
even moderate amounts of functional projections have long passed the “submorphemic
terminals” point. If terminals are smaller than individual morphemes, it follows that
morphemes cannot feed syntax: they are too big, too coarse, they do not provide the
right granularity of ingredients to build syntactic trees. Rather, it is only after some
steps of derivation that a constituent large enough to correspond to a morpheme is
created. It thus follows that the lexicon comes strictly after syntax and lexemes
correspond to entire phrasal constituents.2
(4) features → Computational System → Lexicon

2 Here the architecture remains neutral on whether the features feed syntax from the “outside” or whether they are part of the computational system itself, and also on whether they are differentiated already (“labelled”) or whether that comes later. The point is rather that only atomic features are available at the beginning of a computation. Grouping of features into lexical items strictly follows syntax.
And this is where the catch comes: if the lexicon is strictly after syntax, the lexicon
comes too late to ever influence the course of syntax. And therefore, the size of the
constituent spelled out by eg. 'how' or 'jak' will never be able to influence the working
of the syntactic computational system. If this were so, parameters could not be reduced
to the size of lexical items after all.
R. Kayne: I object to the idea that “phrasal spellout follows from submorphemic
terminals”. One could equally well view the situation in terms of a morpheme-
sized terminal surrounded by several projections headed by null morphemes,
hence dispensing with phrasal spellout.
Starke: Actually, no: even if you do that, you don't escape phrasal spellout. Once
the details of the approach you suggest are worked out, the null morphemes idea
turns out to be a variant of phrasal spellout. Let's take a concrete example to make
the discussion clearer. Take for instance the two synthetic forms of the past tense
in French, “il chant-ait” and “il chant-a”. Both are past tense, both are spelled out
as a single phoneme. But they differ in their “aspectual” properties in complex
ways that can be informally summarised as “imperfective” and “perfective”. Since
tense and aspect are different syntactic projections, I would conclude that a
morpheme such as the French past tense 'a' spells out a phrasal constituent, ie a
constituent containing at least the two terminals T and Asp. You on the other hand
would phrase things in slightly different terms: 'a' spells out one of the terminals,
say T, and co-occurs with a null perfective morpheme in Asp, 'ait' also spells out
T, but co-occurs with a null imperfective morpheme in Asp. The crucial question
is: under your description, how do we ensure that the flavor of T spelled out as 'a'
necessarily co-occurs with the flavor of Asp spelled out by the null “imperfective”
aspect and cannot co-occur with the flavor of Asp spelled out by the null
“perfective” aspect? (Mutatis mutandis for “ait”, of course.) This co-occurrence is
automatic in phrasal spellout, but is not captured by your null morpheme
approach. So a “null morpheme” approach needs to be supplemented by a
mechanism which ensures that the right (null) morphemes occur with the right
overt morpheme. And once we do that, spelling out one (surface) morpheme will
involve a mechanism spanning over an entire stretch of terminals: the terminal
with the morpheme being spelled out, and all the adjacent terminals populated by
null morphemes whose values need to be “synchronised” with that realised
terminal. In effect, you will have recreated phrasal spellout in your system (or
“spans” or “stretches” as some call them). The equivalent of phrasal spellout is
thus unavoidable once terminals become submorphemic.
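For readers who like to see the bookkeeping, here is a rough sketch of the co-occurrence point just made, under the same toy tuple encoding; the labels 'T.past', 'Asp.perf' and 'Asp.imperf' are hypothetical stand-ins for the relevant flavors of T and Asp:

```python
# Under phrasal spellout, one entry covers both T and Asp, so the pairing
# of past tense with (im)perfective aspect comes for free; no separate
# mechanism synchronises two terminals.

PAST_TENSE = {
    'a':   ('T.past', ('Asp.perf',)),     # 'il chant-a': one chunk over T and Asp
    'ait': ('T.past', ('Asp.imperf',)),   # 'il chant-ait'
}

def realise(t, asp):
    """Find the single morpheme spelling out the [T [Asp]] constituent."""
    for phon, stored in PAST_TENSE.items():
        if stored == (t, (asp,)):
            return phon
    return None

assert realise('T.past', 'Asp.perf') == 'a'
assert realise('T.past', 'Asp.imperf') == 'ait'
```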
Moro: So the difference between this and generative semantics would be that the
hierarchy here is fixed and independently motivated?
Starke: The most fundamental architectural claim of generative semantics was that
semantics comes before syntax. Much of the rest of the discussion followed from
that assumption. Since phrasal spellout (or nanosyntax in general) does not adopt
this claim, there is a fundamental difference with generative semantics right at the
outset. That said, there are obvious similarities between modern syntax and some
aspects of generative semantics. In our case, lexical decomposition as practiced in
generative semantics is a historical antecedent to some forms of phrasal spellout.
Interestingly though, if you read through the technical details of how lexical
decomposition was done by generative semantics, they remain heavily tied to the
notion of terminals and shy away from phrasal spellout per se. So even in this
domain, the difference is clear.
Boeckx: Distributed Morphology also has the “too late” problem you were
starting to discuss.
Starke: Not quite: Distributed Morphology still has an early lexicon feeding
syntax – its pre-syntactic list of feature bundles – whereas nanosyntax has
only individual features before syntax. So the issue is not so much “is there a late
lexicon after syntax”, but rather “is there an early lexicon before syntax, feeding
syntax?”. Everybody except nanosyntax says “yes, there is an early lexicon before
syntax” (and perhaps another afterwards). Nanosyntax stands alone in claiming
that “no, there is no lexicon before syntax” and hence it also stands alone with this
new problem on its hands: the lexicon comes too late to be a supplier of
parameters, whatever the format of parameters is.
Luckily, there is a simple solution to this “too late” problem, and in fact, that solution is
already included in the nanosyntax framework. It comes from idioms and semi-regular
morphology, so let's look at that, as it will not only solve our new problem, but it will
also give us tools for the discussion of variation.
Idioms are prima facie an important source of support for phrasal spellout. Within the
traditional approach, there is no easy way to handle multi-word idiomatic expressions,
as witnessed by the clunkiness of the existing attempts at handling idioms while at the
same time confining spellout to terminals. Under phrasal spellout, idioms are natural:
they are cases in which a relatively high-level constituent has been stored. The
traditional example of “kick the bucket” can now be rendered as the lexicon storing an
entire VP, or the modern-day equivalent of a VP (eg. a syntactic layer above AspP).
Kayne: Could you say something else about discontinuous idioms of the type
“give somebody a piece of your mind”, where in English you have the indirect
object that’s completely free? How does that interact with your theory of spellout?
Starke: Yes and no. Yes in that a traditional solution to such facts carries over
unchanged to the phrasal spellout view of idioms: there is a level of constituency
at which [give a piece of your mind] is a constituent excluding the indirect object,
and that is the lexically stored constituent. The indirect object is then
compositionally added to this constituent. On the other hand, looking at idioms in
any detail will quickly reveal that there are many facts and regularities about gaps
that we have little or no understanding of for the moment. So I am reluctant to
invent explanations for the few facts we do know, before we get a better picture of
the overall domain. For instance, there are unnoticed facts in French showing that
you can insert quantifiers into some idioms, and hence there is a gap available in
the middle of the idiom, but it is only available to the type of quantifiers that
correspond to the type of nouns you have in the idiom. That means that you have
a gap in the idiom but you also have some regularity about how the gap can be
filled. And before we have a generalization about that kind of fact about gaps in
idioms, I am reluctant to start inventing a theory of gaps in idioms – we just don’t
know enough about what the facts are.
There are many interesting technical and empirical issues to address about storing
idioms as phrasal constituents, but only one of them is directly relevant to our concerns
here: put simply, multi-word idiomatic expressions are made out of regular words, with
their regular allomorphies and quirks of lexical insertion. Those words making up the
idiom are therefore themselves the result of spellout operations at lower hierarchical
levels of the syntactic structure (eg. the lexical insertion of “bucket” in “kick the
bucket” or of “took” in “took to the cleaners”). We thus have a series of lower-level
spellout operations (“kick”, “the”, “bucket”) and one higher-level spellout operation
identifying the whole VP/AspP as corresponding to the lexical entry of the idiom.
Given this setup, notice that the VP/AspP-level spellout operation must know about the
outcome of the previous lower-level spellout operations. It is only if the lower-level
operations chose “bucket” over “jar”, “horse” or “plate” that the idiom will be
applicable and hence spellout of the higher constituent must somehow “remember” the
result of the lower-level spellout operation.
This means (inter alia) that the architectural schema in (4) above must be modified to
add the equivalent of a feedback loop after spellout:
(5) features → Computational System ⇄ Lexicon (spellout now feeds back into the computation)
This way the choice of “bucket” over “jar” will be visible to the next computational
cycle, and the idiom “kick the bucket” can be correctly restricted to cases where
“bucket” was spelled out earlier.
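The loop can be sketched in the same toy terms; the string labels and the list-based history are illustrative assumptions, the point being only that earlier spellout choices remain visible to later cycles:

```python
# A minimal sketch of the feedback loop in (5): results of each spellout
# cycle are recorded and fed back into the computation.

spellout_history = []                      # past lexical choices, visible later

def spell_out(constituent, choice):
    spellout_history.append((constituent, choice))
    return choice

# Lower-level cycles insert the ordinary words first.
spell_out('V', 'kick')
spell_out('D', 'the')
spell_out('NP', 'bucket')                  # had this been 'jar', no idiom

# The higher-level cycle can now check the recorded choices, so the
# stored idiom is correctly restricted to the 'kick ... bucket' case.
idiom_applies = (('V', 'kick') in spellout_history and
                 ('NP', 'bucket') in spellout_history)
assert idiom_applies
```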
As a byproduct, we have now solved our “too late” problem: given the feedback loop
from spellout back into the syntactic computational system, syntax does have access to
prior lexical choices. Lexical choices can therefore in principle affect further
computation. We are in business again: there is a logical sense in which spelling out a
syntactic structure with “how” may lead to different consequences than spelling out a
syntactic structure with “jak”, given the different lexicalised structures of “how” and
“jak” in (2-3).
In fact, there are at least three different ways in which the “size” (ie. the syntactic
structure spelled out) of a lexical item can be a parameter, ie. affect computation. Let's go from the simplest
to the most involved.
1. Suppose that a structure such as (1) is lexicalised differently in two languages, for
instance as in (2) versus as in (3), both repeated here:
(6) English: [a [b [c]]] with ‘how’ spelling out [b [c]] versus Slovak: [a [b [c]]] with ‘jak’ spelling out the entire constituent
The a layer, aP, will be “eaten up” in one language and hence unavailable for use by
independent lexical items, whereas it will be available for such use in the other
language. As a result, one language will have some visible constructions targeting aP
whereas the other language will lack such constructions.
To make things concrete, here are a couple of illustrations, simplified for the
exposition. Consider for instance indefinites such as English ‘something’, ‘someone’
and their French counterparts ‘quelque chose’, ‘quelqu'un’. Although the English and
the French series share many properties, they curiously differ as to whether they enter
into the “or other” construction: the relevant layer is available in English and “or other”
can attach to it (presumably followed by movement of the indefinite), but it is
unavailable with French indefinites, as illustrated in (11):3
(11) English: [a [b [c]]] with the indefinite spelling out [b [c]], leaving aP free for ‘or other’ versus French: the indefinite spelling out the entire [a [b [c]]], leaving no layer for ‘or other’

3 Readers familiar with nanosyntax may object here that the technical definition of spellout in nanosyntax derives the “superset principle” as a theorem, and hence the French indefinites should be able to shrink to accommodate the “or other” construction. If we were to pursue this analysis of indefinites, we would thus need to treat French indefinites as “unshrinkable”, in a way reminiscent of the R-state adjectival passives in Starke (2006). This would in fact be natural: these indefinites are clearly composite expressions, ie. idioms composed of “quelque” and “chose”, “un”, etc. Technically they would thus be lexical items referring to other lexical items (“pointers”, in technical terms), and such cases are known to be unshrinkable.
Regardless of the empirical plausibility of this particular example, this shows that cross-
linguistic differences can be expressed by simple size differences. In other words, at
least some “parameters” are expressible in terms of structural size.
Here is another example of the same logic, again offered for illustrative purposes. The
Germanic verbs of English can be found in a few constructions which are not
available to the Latinate verbs of English: verb-particle constructions and resultative
constructions. Again, such a situation fits into the same pattern: if Latinate verbs spell
out a larger syntactic structure than Germanic verbs, the Germanic verbs will leave
some layer of structure available for further use while Latinate verbs will “eat up” those
layers:
(12) a. the verb spells out [b [c]], leaving aP available
     b. the verb spells out the entire [a [b [c]]], eating up aP
Again, we would have a case of a “parameter”, ie. the presence vs absence of classical
resultative constructions in English vs Italian, reducing to a simple size effect: the verbs
of Italian spell out a slightly larger syntactic structure than the Germanic verbs of
English.
This reasoning extends to another familiar case: often, some language is described as
“lacking some functional projection” which is present in another language, a situation
which has led to controversies about whether all functional layers are always present but
somewhat silent, or whether they can be genuinely absent. Consider, however, the
situation we just saw: if all verbs of Italian (or French) spell out the bigger structure (12b),
and some other language has verbs spelling out the smaller structure (12a), in such a
situation, aP will never be “visible” in Italian (or French) but will be visible in other
languages. In this case, it will thus appear as if Italian or French “lack a functional
projection” which is present in other languages. Again, a familiar case of language
variation reduces to the differing sizes of lexical items in different languages.
2. Size differences trigger another type of syntactic effect: movement differences. Let us
start with what we could call “spellout driven movement”, before addressing more
classical movements such as wh-movement. Finding selective triggers for movement,
such that there is movement in one language but not the other, is a notoriously difficult
task.
Assume the invariant structure in (13) and the two lexical entries in (14):

(13) [a [b [c [d]]]]

(14) a. ‘kick’ ⇔ [c [d]]
     b. ‘ed’ ⇔ [a [b]]
Assume we came to the stage of the derivation in which we have built cP:
(15) [c [d]] – spelled out as ‘kick’
This constituent can be spelled out with the lexical entry (14a), as “kick”. Compare this
to the situation in which we have built the tree up to aP:
(16) *[a [b [c [d]]]] – ‘kick’ matches [c [d]], but the whole tree cannot be spelled out
Here we have a problem: this tree cannot be spelled out. There is no single lexical item that
covers abcd, so we need to resort to the two lexical items covering ab and cd
respectively. As before, the lexical item (14a) matches the constituent [c [d]] and
presents no problem. The lexical item (14b) on the other hand cannot be used: it
matches a constituent [a [b]], but there is no such constituent in the structure, ie a
constituent made up of ab to the exclusion of anything else.
If this structure is to be spelled out, something must be done to save it. I propose that
this kind of situation is the source of one type of movement: last resort movement
driven by the need to spellout. In this case, [c [d]] moves out as a last resort, so as to
create the configuration:
(17) [[c [d]] [a [b]]] – [c [d]] has moved out, creating the constituent [a [b]]; ‘kick’ and ‘ed’ can now both be inserted
Now the structure can be spelled out: as before, [c [d]] corresponds to “kick” and now
[a [b]] matches the lexical entry (14b) and hence spells out as “ed”, yielding “kick-ed”.
We have now created a situation in which movement is triggered without ever having to
say that there is a trigger or any special movement-related ingredient. Movement
happens because you have to create a configuration that is adequate for spell-out.
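Here is a minimal sketch of this last-resort logic, again under the toy tuple encoding and the entries in (14); the evacuation step is hard-coded for this one configuration, purely for illustration:

```python
# Spellout-driven movement: if no entry matches the whole tree, the
# already-spellable complement is evacuated, creating a constituent the
# lexicon can match, as in (17).

LEXICON = {
    'kick': ('c', ('d',)),    # entry (14a)
    'ed':   ('a', ('b',)),    # entry (14b)
}

def lookup(tree):
    """Phonology of an entry matching exactly this constituent, or None."""
    for phon, stored in LEXICON.items():
        if stored == tree:
            return phon
    return None

def spell_out(tree):
    phon = lookup(tree)
    if phon is not None:                    # (15): [c [d]] -> 'kick'
        return [phon]
    # (16): the whole of [a [b [c [d]]]] has no match, and [a [b]] is not
    # yet a constituent. Move [c [d]] out, then spell out both pieces, (17).
    moved = tree[1][1]                      # the complement [c [d]]
    remnant = (tree[0], (tree[1][0],))      # the newly created [a [b]]
    return spell_out(moved) + spell_out(remnant)

assert spell_out(('c', ('d',))) == ['kick']
assert spell_out(('a', ('b', ('c', ('d',))))) == ['kick', 'ed']   # "kick-ed"
```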
Note that this style of movement trigger is not limited to morphological entities. The
same logic applied high up in the syntactic tree will trigger movement of large syntactic
constituents. For instance a syntactically complex complementiser, perhaps spelling out
ForceP and FinitenessP in Rizzi's (1997) system, will yield movement of the entire clause
around the complementiser, so as to create a configuration in which [force [fin]] is a
constituent matching the lexical entry of the complementiser.
As before, there is a lot more to say on this topic, among other things how to derive the
difference between triggering a “cyclic”-like movement (specifier to specifier) versus a
remnant-like one (complement to specifier within an fseq), but here I limit myself to
illustrating our core point: parameters, including movement parameters, can be done
cleanly, without hacks such as EPP features or feature-strength, once lexical items
correspond to various “sizes” of syntactic structures.
How can we express the fact that some grammars require wh-movement (of the first wh
element), such as English, and some grammars don't, such as French? As is well known,
French-style grammars still show locality effects even in cases in which the wh-word
stays in situ. (What is less well known is that the locality effects associated with wh in
situ are partially different from those of overtly moving wh – see Starke 2001 for a
detailed discussion – but this need not concern us here, as we concentrate on the cases
in which the locality effects are the same).
The locality facts indicate that there is movement in both cases (and the difference in
the locality facts suggests that this is not merely an issue of timing, ie. early vs late
movement). I would like to suggest that this is again a size difference. Assume the
representations (2-3) for wh-elements, now transposed to English versus French:
(18) French wh: spells out [b [c]] versus English wh: spells out the entire [a [b [c]]]
Now assume further that French has a null morpheme spelling out aP. The first
consequence is that bP will move over aP, by the logic of spellout-driven movement
discussed above:
(19) [[b [c]] aP] – French: bP, spelled out by the wh-word, has moved over aP
Assume now that aP is the layer targeted by wh-movement. Since aP is contained in the
wh-words of English, and hence aP remains a constituent with bP and cP, English will
have audible wh-movement. Since aP is not contained in the wh-words of French,
French will have wh-movement of the null morpheme spelling out aP, thereby
providing the twin consequence of locality effects and wh-in-situ.4

4 Notice that this reasoning does not require anything like the lexical integrity principle, at this stage.
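The same toy encoding can express this wh-parameter; ‘quoi’, ‘what’ and the NULL item are illustrative stand-ins, and the sketch only checks which item contains the aP layer:

```python
# Wh-movement targets the aP layer; what varies is only whether aP sits
# inside the overt wh-item (English) or inside a null morpheme (French).

WH_ITEMS = {
    'english': {'what': ('a', ('b', ('c',)))},           # big wh-item, cf. (18)
    'french':  {'quoi': ('b', ('c',)), 'NULL': ('a',)},  # small wh + null aP
}

def wh_moved_item(language):
    """The item displaced by wh-movement, i.e. the one containing aP."""
    for phon, stored in WH_ITEMS[language].items():
        if stored[0] == 'a':
            return phon

assert wh_moved_item('english') == 'what'   # audible wh-movement
assert wh_moved_item('french') == 'NULL'    # the null morpheme moves:
                                            # locality effects, wh in situ
```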
It thus turns out that both types of cross-linguistic variation can be expressed in terms of
lexical elements spelling out bigger or smaller syntactic structures: the presence or
absence of a construction or functional projection in one language but not the other, and
the presence or absence of overt movement in one language but not the other. We have
finally opened the way for a clean theory of parameters, one without technical notions
invented only to notate variation.
Conclusion.
I have shown that “parameters”, i.e. cross-linguistic variation, can be expressed in terms
of lexical elements spelling out bigger or smaller subconstituents of the syntactic
structure being built by the computational system:
French wh: the stored element spells out [b [c]] within [a [b [c]]] versus English wh: the stored element spells out the entire [a [b [c]]]
In each case, the lexically stored element corresponds to a subconstituent of the tree
built by the computational system and waiting to be spelled out: [b [c]] in the first case,
the entire [a [b [c]]] in the second. In such a situation, I have illustrated above that
grammatical processes moving aP will affect the bigger lexical items (the second type),
and grammatical processes populating aP will be available with the smaller lexical
items (the first type).
After three decades of Principles & Parameters tradition, this is to my knowledge the
first explanatory theory of (the format of) parameters – as opposed to notational
diacritics marking loci of variation. In this theory, “parameters” reduce to the size of
structure that lexical items spell out. Of course, being the only principled game in town
is a comfortable position, but it remains to be seen if in this case it is also a correct
position.
References.
den Besten, Hans (1976). “Surface lexicalisation and trace theory”. In H. van Riemsdijk
(ed.), Green Ideas Blown Up: Papers from the Amsterdam Colloquium on Trace
Theory, 4-28. Amsterdam: University of Amsterdam.
Cardinaletti, Anna and Michal Starke (1993/1999). “The typology of structural
deficiency: A case study of the three classes of pronouns”. In Henk van Riemsdijk
(ed.), Clitics in the Languages of Europe, 145-233. Berlin: Mouton de Gruyter.
Chomsky, Noam (1981). Lectures on Government and Binding. Dordrecht: Foris.
Chomsky, Noam (1993). “A minimalist program for linguistic theory”. In Kenneth Hale
and S. Jay Keyser (eds.), The View from Building 20: Essays in Linguistics in Honor
of Sylvain Bromberger, 1-52. Cambridge, MA: MIT Press.
Chomsky, Noam (2000). “Minimalist inquiries: the framework”. In Roger Martin, David
Michaels and Juan Uriagereka (eds.), Step by Step: Essays on Minimalist Syntax in
Honor of Howard Lasnik, 89-155. Cambridge, MA: MIT Press.
Cinque, Guglielmo (1999). Adverbs and Functional Heads: A Cross-Linguistic
Perspective. Oxford: Oxford University Press.
Halle, Morris and Alec Marantz (1993). “Distributed Morphology and the pieces of
inflection”. In Kenneth Hale and S. Jay Keyser (eds.), The View from Building 20.
Cambridge, MA: MIT Press.
Kayne, Richard (1994). The Antisymmetry of Syntax. Cambridge, MA: MIT Press.
Otero, Carlos (1976). “The dictionary in Generative Grammar”. Unpublished ms., UCLA.
Rizzi, Luigi (1997). “The fine structure of the left periphery”. In Liliane Haegeman (ed.),
Elements of Grammar, 281-337. Dordrecht: Kluwer.
Starke, Michal (2001). Move Dissolves into Merge: A Theory of Locality. PhD
dissertation, University of Geneva.
Starke, Michal (2002). The day syntax ate morphology. Class taught at the EGG summer
school, Novi Sad.