Merge and The Strong Minimalist Thesis
www.cambridge.org
Information on this title: www.cambridge.org/9781009462266
DOI: 10.1017/9781009343244
© Noam Chomsky, T. Daniel Seely, Robert C. Berwick, Sandiway Fong,
M. A. C. Huybregts, Hisatsugu Kitahara, Andrew McInnerney, and Yushi Sugimoto
2023
This publication is in copyright. Subject to statutory exception and to the provisions
of relevant collective licensing agreements, no reproduction of any part may take
place without the written permission of Cambridge University Press & Assessment.
First published 2023
A catalogue record for this publication is available from the British Library
First published online: November 2023
Noam Chomsky
University of Arizona
T. Daniel Seely
Eastern Michigan University
Robert C. Berwick
Massachusetts Institute of Technology
Sandiway Fong
University of Arizona
M. A. C. Huybregts
Utrecht University
Hisatsugu Kitahara
Keio University
Andrew McInnerney
University of Michigan
Yushi Sugimoto
University of Tokyo
Author for correspondence: T. Daniel Seely, [email protected]
This Element closely examines Merge, its form, its function, and its central role in
current linguistic theory. It explores what it does (and does not do), why
it has the form it has, and its development over time. The basic idea
behind Merge is quite simple. However, Merge interacts, in intricate
ways, with other components including the language’s interfaces, laws
of nature, and certain language-specific conditions. Because of this,
and because of its fundamental place in the human faculty of language,
this Element’s focus on Merge provides insights into the goals and
development of Generative Grammar more generally, and its prospects
for the future.
Contents
1 Introduction
6 Illustrations
References
Merge and the Strong Minimalist Thesis 1
1 Introduction
A remarkable property of human knowledge of language is that it is infinite. You
are not a simple recording device, capable only of repeating members of the
finite set of sentences you have logged from experience. On the contrary, you
can understand and produce novel utterances; in fact, the vast majority of all
sentences that are understood and produced by humans across the globe are
new. There are so many sentences, indeed an infinite number of new ones, that
we all can, and should, use only our own. Although we all share the set of words
we use, and it’s perfectly ethical for you to use a thesaurus to find just the right
word for the poem you’re writing, the same is not true with sentences. There is
no sentence-thesaurus; and taking someone else’s sentence is very often not
ethical – it’s plagiarism. The infinity of sentences is the basis for this ethical
principle; there are enough sentences to go around for each of us to use only our
own. It’s also the starting point of Generative Grammar, and of our discussion of
Merge: a driving goal of the modern study of language is to determine and
explain this property of discrete infinity.1
Your knowledge of language is infinite, but your memory is finite. Your
knowledge of language therefore can’t be just a list of memorized sentences.
A central component of any theory of language, then, involves generating an
infinity of sentences with finite resources. From a finite set of atomic elements,
lexical items (roughly but not exactly words2) composed of irreducible linguistic
features, the syntax must build an infinite array of hierarchically structured
expressions interpretable at the ‘meaning’ interface and available for
externalization at the ‘form’ (sound/sign3) interface, the so-called basic
property of language.
1 There’s a one-word expression, a two-word expression, and so forth indefinitely, but no one-and-a-half-word expression; that’s “discrete” or “digital infinity.” Among others, see Huybregts (2019).
2 There is a significant difference between the abstract elements in the Lexicon, the “lexical items” that we refer to in the text, and a common-sense notion of a word that may actually be spoken or written. We can leave this aside for present purposes.
3 Despite historical prejudice, the sound modality is not critical to language in the sense intended; the sign modality is equally relevant. Core, mind-internal aspects of language are shared across modalities (Lillo-Martin 1991; Emmorey 2002; Sandler and Lillo-Martin 2006, among others), and different language modalities seem to share modality-independent neural hardware; see Petitto (1987, 2005). For example, when sign-language users have damage in Broca’s area (which is responsible for language production), they show production errors, just as Broca’s aphasia patients who use a spoken language do. See Hickok and colleagues (1998) and Klima and colleagues (2002).
2 Generative Syntax
Lexicon: {m1, m2, . . ., mn}
The Lexicon provides the raw material out of which Merge, the structure-
building device, constructs larger objects. The Lexicon and Merge together
constitute language in the narrow sense of the term.6 We assume that lexical
items are drawn as needed from the Lexicon and inscriptions of them are
available for computation. Thus, a lexical item such as the noun child can be
selected, and as many inscriptions of it as might be needed can be available,
allowing such sentences as One child slept while a second child played with
another child. Similarly, in mathematics there are multiple inscriptions of, say,
the numeral three in an equation like 3x + 3y = 3.
The computational process of structure building takes place within
a Workspace (WS), which is updated in the course of the derivation of some
expression. The WS is the set consisting of the material available for
computation at a given derivational stage. Thus, the WS contains inscriptions of lexical
items that have been entered into it and any objects constructed by Merge at
earlier points:7
4 Merge was first introduced in Chomsky (1994); see also Chomsky (2004a, 2013). For further discussion see Collins and Stabler (2016), Epstein and colleagues (2014, 2015), Epstein (2022), Collins (2017), among others. Section 7 provides a history of the development of Merge in the generative tradition.
5 An even stronger view, to be reviewed in Section 8, is that Merge is the only operation of the syntax.
6 See Hauser and colleagues (2002).
7 We return in Section 3 to the technical specification of the WS, differentiating the set that is the WS from the sets that are syntactic objects within the WS; the WS is a set (that is a simple way to represent it) but the WS itself is not a syntactic object that is joined by Merge with any other object. See Collins and Stabler (2016); see also Kitahara and Seely (in press) and Marcolli and colleagues (in press) for a more complete formalization closer to the version of Merge described here.
(3) Merge:
(i) ‘looks inside’ the WS that it is applying to,
(ii) targets material within the WS,
(iii) builds from that targeted material an object (i.e., it builds a nonatomic
structure), which is now
(iv) a new object within the WS, thereby modifying the WS.
Merge can take this WS as input and target the inscriptions of the lexical items
the and child to create the set {the, child},9 adding that set to the WS, and
yielding (5).10
8 As we’ll see in more detail as we develop the framework, material that enters the WS is generally accessible for computation. However, in certain circumstances, material can be present in the WS but not accessible to Merge – this material is in effect ‘hidden’ from Merge. Generally speaking, an element is accessible unless rendered inaccessible in some way – we’ll trace some of those ways in Section 6, and will consider various complexities associated with the notion ‘accessibility.’
9 The sets constructed by Merge correlate with the traditional construct ‘phrase.’ For ease of exposition, we will sometimes use both terms, set and phrase, designating the same object. We use standard curly brackets to indicate the sets/phrases constructed by Merge; to avoid confusion, we’ll use square brackets, [], to designate the WS, which as pointed out earlier in this section is also a set, but not a syntactic object constructed by Merge.
10 As we will see as we get into additional details, the inscriptions of the and child that were members of the WS do not themselves remain as members of the WS. The objects the and child are, in effect, replaced by the new set {the, child}, a result that will follow from independent third-factor principles, as will be clarified in Section 4.
Merge can then take this modified WS as input, target within it the object it just
created, namely {the, child}, and join that object with see to produce the
predicate phrase see the child,
and so on, leading to the final abstract representation of the sentence I see the
child.
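The derivation just sketched can be made concrete in a few lines of code. The sketch below is our own illustration, not the authors’ formalism: syntactic objects are modeled as strings (lexical-item inscriptions) or frozensets, the WS as a Python set, and, anticipating the point in footnote 10, the two targeted objects are replaced in the WS by the set they form. (Python sets collapse identical members, so the sketch ignores the multiple-inscriptions issue discussed earlier.)

```python
# Illustrative sketch of Merge as binary set formation over a workspace.
# Not the authors' formalism: names and representation choices are ours.

def merge(p, q, ws):
    """Target P and Q within WS, form {P, Q}, and return the updated WS."""
    assert p in ws and q in ws, "Merge only targets material within the WS"
    # The targeted objects are replaced by the new set they form (cf. fn. 10).
    return (ws - {p, q}) | {frozenset([p, q])}

# Derivation sketched in the text: "I see the child"
ws = {"I", "see", "the", "child"}
ws = merge("the", "child", ws)                        # {the, child}
ws = merge("see", frozenset(["the", "child"]), ws)    # {see, {the, child}}
ws = merge("I", frozenset(["see", frozenset(["the", "child"])]), ws)

# The WS now holds a single object: {I, {see, {the, child}}}
assert len(ws) == 1
```

Each call to `merge` yields a new derivational stage; the final stage contains one nested object corresponding to the abstract structure of the sentence.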
As we’ll see as we proceed, building our exposition through successive
stages of complexity, Merge does not in practice freely target any element in
the WS. In fact, there are very general principles, external and internal to
language, that constrain just how Merge applies, with far-reaching empirical
consequences. Note further that Merge is recursive in that it can target the
objects that it creates in subsequent applications. In other words, in principle,
the output of one application of Merge can serve as input to another application
of Merge.11
Putting aside technical details, to be introduced in Section 6, this general
picture provides a solution to the problem of discrete infinity. With a finite
number of atoms and a finite number of computational mechanisms (so far, just
one, Merge), the system has an infinite output; Merge can build a new syntactic
object out of what it has already created. Merge, then, is the central component
of language, where language is understood as a computational device
generating linguistic objects receiving an interpretation at the meaning interface and
a potential externalization at one of the SM interfaces (of sound or sign).
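The finite-means/infinite-output point can also be made concrete. In this sketch (our own illustration), the output of one application of Merge is fed back in as input, so even a tiny lexicon generates unboundedly many distinct objects; the nested objects are not meant as well-formed English, the point is purely combinatorial.

```python
# Sketch (ours, not the authors'): finite means, unbounded output.

def merge_pair(p, q):
    """Binary set formation: {P, Q}."""
    return frozenset({p, q})

def depth(obj):
    """Nesting depth: a lexical atom has depth 0."""
    if isinstance(obj, frozenset):
        return 1 + max(depth(member) for member in obj)
    return 0

obj = merge_pair("the", "child")      # {the, child}, depth 1
stages = [obj]
for _ in range(10):
    obj = merge_pair("see", obj)      # output of one Merge is input to the next
    stages.append(obj)

assert len(set(stages)) == 11         # every stage is a distinct object
assert depth(stages[-1]) == 11        # and depth grows without bound
```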
The goal of this contribution to the Elements series is to closely examine
Merge, its form, its function, and its central role in current linguistic theory. We
explore what it does (and does not do), why it has the form it has, and its
development over time. The basic idea behind Merge is quite simple. However,
Merge interacts, in intricate ways, with other components including the
language’s interfaces, laws of nature, and certain language-specific conditions.
Because of this, and because of its fundamental place in the human faculty of
language, this Element’s focus on Merge provides insights into the goals and
development of Generative Grammar more generally, and its prospects for the
future.
To provide an outline of this Element: In Section 2, we review important
background information, tracing the biolinguistic perspective on language
assumed by Generative Grammar, that is, that language (in the narrow sense
focused on here) involves a computational device embedded within an array of
11 In a more general sense, what Merge builds within the WS at a given stage remains accessible for computation at later stages. Thus, if Merge puts X and Y together to form Z, then the new object Z is available for further computation.
human cognitive systems, and interacting with them. We also stress that
a central goal from the generative perspective is to explain the properties of
language, not merely to describe them. We then consider a number of key
modes of explanation that have been pursued in the generative tradition,
including recent work in minimalism, that seek to deduce seemingly complex
properties of language from a few simple computational operations that interact
with general laws of nature.
Having set the stage, we turn in Section 3 to Merge. The language faculty,
by virtual conceptual necessity, involves a structure-building device. The
goal is to stipulate as little as possible about it, deriving its seemingly
complex properties from more general principles. We start with the simplest
conception of Merge, that is, Merge as it would be in a ‘vacuum’ removed
from other properties of language and from laws of nature – we start with
Merge as a simple computational device. We then add in, step by step,
different principles which affect Merge, and which thereby shape the form
and function of this operation. Being a computational device, Merge
conforms to general efficiency principles that all computation is subject to. But
Merge is also a component of the human language capacity. Thus, Merge is
subject to – and its operation is constrained by – general properties of human
cognition, as well as language-specific principles. We trace such principles
in Sections 4 and 5. Merge itself is maximally simple. But it interacts with
general and language-specific principles in intricate ways, constraining its
application; and these interactions conspire to produce, ideally, just
the empirical effects that we find.
in the species? And, perhaps the most fundamental question of all might be put
this way:
How can there be just one human language and multiple languages at the
same time?16
12 The discussion in this section is meant to outline background assumptions essential to understanding the nature of Merge in current syntactic theory. For further detail, see, among others, Chomsky (1986, 2000, 2004b, 2017a, 2021b).
13 Early works include Chomsky (1955, 1959, 1965, 1966a, 1968), and Lenneberg (1967), among others.
14 As a historical note, the term ‘generative enterprise’ was first used in The Generative Enterprise (Chomsky 1982a; Chomsky 2004b).
15 For a full account of the origins and break with structural linguistics, see Chomsky (1964). For further discussion, see Chomsky (1966b).
16 For discussion, see Huybregts (2017). Given current genomic evidence, the evolution of Merge (or one Faculty of Language, FL) apparently antedates early human dispersals with subsequent distinct means of externalization (multiple languages). The idea that there is one human Faculty of Language, but multiple means of externalizing the products of that faculty, resolves the paradox.
Take any typically developing human baby, place that baby in any linguistic
environment, and the baby will, effortlessly for the most part, grow the ambient
language – there’s a massive naturalistic experiment on the planet right now that
shows this. At a certain level of abstraction, then, there is just one human
language faculty and thus one human capacity for language, which is, by
definition, part of the innate biological endowment of Homo sapiens – your
baby is born with it, your puppy is not. Yet, when we look around the world, we
see literally thousands of mutually unintelligible languages, which are often
characterized as radically different. How is this possible? What must the human
language capacity be like for this to occur?
To understand how such questions are addressed, it’s important to note that,
from the biolinguistic perspective, language is understood as a computational
system, one that builds structured objects from lexical material, where, in
current theory, Merge is the structure-building device. This means, among
other things, that Merge has the general properties of computational devices,
and that language is subject to general principles of computational systems,
something taken up in detail in later discussion.
It is important to note too that, in recent work in the generative enterprise, it is
assumed that language is closely related to thought; a conception captured
simply in William Dwight Whitney’s phrase that language is “audible thought,”
a notion that revives a long tradition dating back thousands of years (Chomsky
2022a). Perhaps language is, or constitutes, thought (see Hinzen 2017; Chomsky
2022a, in press). On this view, syntactic computation
primarily serves the conceptual-intentional (CI) system. Externalization of
not by appeal to the simplest computational rule, adjacency. Rather, the child
reflexively relies on something it never hears: the structure its mind creates. The
child then assigns plurality by virtue of the nature of this abstract structure.
This crucial distinction between I-language and externalization is also
highlighted by homesign or emerging sign languages. As Huybregts and colleagues
(2016) discuss, homesign in deaf isolates or newly emerging sign languages (e.g.,
ABSL, Negev) in communities with a high incidence of congenital hearing loss
17 For recent discussion see, among others, Chomsky (2021b).
18 According to some experimental work, down to eighteen months (Shi et al. 2020).
19 For important discussion, dealing with certain confusions regarding the notion ‘linguistic input,’ see Epstein (2016).
20 The theory of this initial state is often labeled Universal Grammar, UG, adapting a traditional notion to a new context.
21 This fundamental point is missed in state-of-the-art language models (e.g., GPT-3; see, for example, https://garymarcus.substack.com/p/noam-chomsky-and-gpt-3). For important discussion see Chomsky (2022b). See also Chomsky and Moro (2022).
From the outset, the goals regarding the empirical content of the innate compu-
tational system, the first factor, were in conflict. On the one hand, descriptive
adequacy (i.e., getting the facts right) seemed to require that the innate
endowment be rich in available mechanisms, initially including multiple and rather
complex subcomponents for structure building and structure manipulation (see
Section 7 for further details in a historical context). It seemed that the innate
system had to be quite complex if the facts of language were to be accounted
for – even a superficial look shows that language is complex, diverse, mutable.
On the other hand, given the apparently recent evolution of language in the
species, the innate computational system must be simple – a complex,
multifaceted system could not have evolved in so short a time. Explanatory adequacy
(i.e., accounting for the acquisition of language) required uniformity, simplicity,
and an account of the ease and rapidity of language acquisition. Furthermore,
the quest for simplicity is a defining feature of theory building. As Einstein
notes: “The grand aim of all science is to cover the greatest possible number of
empirical facts by logical deduction from the smallest possible number of
hypotheses or axioms” (Einstein 1954, p. 282).
The attempt to deal with these conflicting demands, that the innate computa-
tional system account for complex facts and yet be simple, characterizes much
of the history of the generative enterprise. How can the system be made as
simple as possible while at the same time maintaining descriptive adequacy?
22 See Section 7 for more detailed discussion of the history of the development of Merge and the changes that have occurred in the components of the syntax.
23 As noted earlier, language is a cognitive faculty, understood as a computational device.
Needless to say, this is a challenging task, and marks a central research goal of
the framework.
We’ve taken steps toward explanation when we’ve developed the sim-
plest system that meets certain conditions. One is the condition of learn-
ability: the theory must account for how a child can, through the
interaction of the innate language faculty and the environment, grow an
I-language (recalling our discussion of ‘I-language’ in subsection 2.2).
A second condition is evolvability: the innate computational system must
have emerged in accord with the conditions on the evolution of Homo
sapiens.24 As noted in Chomsky (2021b, p. 8):
With regard to evolvability, genetic studies have shown that humans began to
separate not long after their appearance.25 There are no known differences in
Language Faculty, narrowing the window for its emergence. Furthermore,
there is no meaningful evidence of symbolic behavior prior to emergence of
Homo Sapiens. These facts suggest that language emerged pretty much along
with modern humans, an instant in evolutionary time. If so, we would expect
that the basic structure of language should be quite simple, the result of some
relatively small rewiring of the brain that took place once and has not changed
in the brief period since.
The competing requirements of these first two conditions on explanation are set
in clear relief: to account for the rapidity and uniformity of acquisition, the
innate computational system would seem to need to be complex;26 recent
evolution demands the opposite.
A final condition is that the innate system must accommodate all
possible languages while barring all impossible ones. A series of studies
designed by Andrea Moro, for example, shows that language areas of the
brain react normally when a subject is presented with invented languages
that model actual languages. However, this is not the case with invented
languages that are not like actual ones; in this case, the subject’s brain
activity is that of puzzle solving, not language-area activation. Indeed, in
acquisition, children ignore what is seemingly most accessible to them,
namely, linear order, and appeal instead to the nonlinear structures created
by the mind (i.e., created by Merge) (Moro 2016; see also Chomsky and
Moro 2022).
24 See Berwick and Chomsky (2016). See Huybregts (2017), and see footnote 14 in subsection 2.1.
25 See Ragsdale and colleagues (2023) for recent evidence that this demographic picture might be less tree-like, with a more subtle pattern of weak admixture and divergence occurring prior to 120,000 to 130,000 years ago.
26 In fact, through much of the history of the development of the theory, UG was quite complex, consisting of a variety of disparate mechanisms and language-specific principles.
sign) and meaning that are combined to comprise the atoms of syntax,
abstract lexical items, inscriptions of which are entered into the WS. The
structure-building operation Merge constructs syntactic objects from lexical
material and from the syntactic objects it has already constructed, yielding
hierarchically structured objects that are then available to the interface
systems, the sensorimotor (SM) for the externalization of ‘form’ in
speaking, signing, and writing, and the conceptual-intentional (CI) involving
‘thought’ for planning and other such cognitive activities. We stress that
the CI interface is primary (as we’ll see in more detail as we proceed); SM
is secondary:
27 For important relevant discussion of SMT, see Chomsky (2000b, 2004a) and also Freidin (2021).
28 For example, see D’Arcy Thompson’s work (Thompson 1917). See also Turing (1952).
more general laws. There are many examples of a mechanism being influenced
by forces external to it: thus, in mitosis, the resulting cells are spherical not
because of biology but because of physics; similarly with the hexagonal
structure of honeycombs. And there are many examples of natural systems that are
effectively maximally simple but this simplicity is masked by constraints
imposed by laws of nature so that ‘on the surface’ their behavior seems to
require complex description. Perhaps the most famous example concerns the
motion of the planets when viewed against the apparently fixed background of
stars. As is familiar to astronomers, this planetary motion can be accurately
described by a complex set of forty to sixty partial circles of different diameters
superimposed on each other, called “epicycles.” However, all this complexity
can be eliminated by a single natural constraint, namely, Newton’s law of
gravitational force declining inversely with the square of the distance between the
sun and the planets, as outlined by Kepler and Newton.
Merge takes as input the WS and the elements P1 through Pm selected from it,
and it gives the output WS’, of which the set {P1, . . ., Pm} is a member, where
‘. . .’ represents everything in the WS minus P1, . . ., Pm.
29 Note that the most fundamental operation is essentially Form Set; that is, put any number of objects into a set, adding that set to the WS. What is referred to as ‘Merge’ is, in effect, Form Set restricted to two; thus, Merge is binary set formation. We discuss below why Merge is binary.
Merge thereby constructs the familiar objects of syntax, like noun phrases and
predicate phrases. To put, say, the and man into the set {the, man} is a simple
way to express that the words are now part of a larger object, a phrase,
traditionally referred to as a noun phrase.30 A set is a simple and readily
available notation to express the core idea of a phrase.31 In this way, syntactic
structure is built.
As formulated, Merge is simple. A host of questions immediately arise:
We will address these and other questions step by step, as we proceed. Our
immediate goal is simply to present Merge in its simplest and unconstrained
form.
Our next task is to show that the way Merge applies in language is con-
strained by forces external to it, specifically by third-factor considerations of
computational efficiency and by language-specific constraints. What Merge can
(and cannot) apply to within the WS, and what it yields, is in large part
determined by the fact that Merge is a component of a computational system
operating within the human organism. We first outline third-factor
considerations that constrain how Merge applies (Sections 3 and 4); we then turn to
certain crucial language-specific constraints (Section 5).
30 This structure is, in turn, crucial to interpretation at the interfaces.
31 In Section 7, we outline the development of generative grammar’s evolving views on human phrase structure, ultimately leading to the Merge Operation, including discussion of how the label of a phrase is determined.
targets inscriptions in the WS; thus, it targets the inscription of child and the
inscription {one, child} in the course of the derivation associated with the
sentence above.
Language, like any other formal computational system, is about manipulation
of the inscriptions that serve as its atomic units. And parallel to other formal
systems, there are structurally identical inscriptions. Thus in the formula
[p &amp; ~p] in logic there are structurally identical inscriptions, multiple instances,
of p in this case, and these instances are interpreted in the same way. There are
a number of ways for structurally identical inscriptions to arise in language as
well. The WS for our simple example contained three separate inscriptions of
child (i.e., child was entered into the WS from the Lexicon three times).32
Language has another way for an inscription to arise in the WS: Inscriptions
arise as a result of an application of Merge, as we’ll see in subsection 3.4. What
we stress from the outset, since it plays a critical role in later discussion, is that
language has a unique property, distinguishing it from other formal systems:
Only language has structurally identical inscriptions that are not invariably
interpreted in the same way. Thus, in the sentence Many people praised many
people, the identical inscriptions of the phrase many people are interpreted
differently – clearly the two instances of many people can refer to different
groups.33 All of this will become clearer as we proceed; our immediate goal is to
stress a feature of Merge: it is operating on inscriptions within the WS, just like
any other computational system.
So far, we’ve introduced the basic, unconstrained form of Merge – Merge in
a ‘vacuum,’ as it were; that is, Merge without
32 As we’ll see in more detail as we proceed, we distinguish the identity relation (the very same inscription) from distinct inscriptions that are identical (they have the same form) but are, under specific conditions, (i) interpreted in the same way versus (ii) not interpreted in the same way.
33 Thus, the logical form is roughly: For many x, x people, there are many y, y people, such that x praised y, and not: For many x, x people, x praised x.
34 This is taken for granted in formal systems (in a ‘vacuum’) but has no effects since everything is carried over; in a proof, you can go back to an earlier line, which, in the relevant sense, is still in the WS. It becomes significant for Merge because of third-factor considerations constraining the WS, as we’ll see.
35 This allows Form Set to operate on just a single object. As the simplest illustration, suppose the Lexicon contains just one element, a, and Form Set creates a set. Then, it could create the set {a}, and then {{a}}, and {{{a}}}, and so on ad infinitum, effectively yielding the successor function. The form of Merge that we investigate later in this section is binary (putting two objects into a set), the minimum necessary for natural language syntax. As we’ll see in more detail, language-specific conditions, more precisely Theta Theory, require more than one target of Merge for language.
36 On the economy-based argument for the binarity of Merge, see Collins (1997).
The noun phrase with an ordinal number and an adjective, as in the third yellow
house above, is not adequately represented by the nonbinary set {the, third,
yellow, house}. This would allow the meaning that it’s the third house and it is
yellow since both the modifiers yellow and third are in the same minimal phrase as
the noun house and so should be able to independently modify that noun. What it
actually means is: it’s the third house out of the yellow houses (thus, ‘it’s the third
yellow house but not the third house overall’ is not a contradiction). We can
account for this under binary phrases. If only two objects are assembled at a time,
we can build the larger noun phrase in binary chunks: first, {yellow, house},
where yellow modifies house; then {third, {yellow, house}}, where third modifies
the complex object {yellow, house}; then {the, {third, {yellow, house}}}. Under
the binary structure, third is not directly paired with house and hence we correctly
disallow the unavailable interpretation that it is the third house (which happens to
also have the property that it is yellow). Details aside, our immediate point is that
the empirical evidence for binary phrases is well established.
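The binary-chunk derivation just traced can be simulated directly. The following sketch is ours, not the authors' formalism (`merge2` and `minimal_set_containing` are hypothetical helper names); it encodes syntactic objects as nested `frozenset`s and verifies that third never shares a minimal set with house:

```python
# Sketch: binary set formation; frozenset is used so that sets can nest.
def merge2(p, q):
    return frozenset({p, q})

# {the, {third, {yellow, house}}}: built in binary chunks.
np = merge2("the", merge2("third", merge2("yellow", "house")))

def minimal_set_containing(obj, word):
    """Return the smallest set within obj that has `word` as a member."""
    if isinstance(obj, frozenset):
        if word in obj:
            return obj
        for member in obj:
            found = minimal_set_containing(member, word)
            if found is not None:
                return found
    return None

# house shares its minimal set only with yellow; third is not in it,
# so third cannot directly modify house.
assert "yellow" in minimal_set_containing(np, "house")
assert "third" not in minimal_set_containing(np, "house")
```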
This shapes the definition of Merge as follows:
(12) Merge(P, Q, WS) = WS’ = ({P, Q}, X1, . . ., Xn)   (Binary Merge)
Merge takes a WS as input; it targets two (and only two) terms P, Q within that
WS; it puts P, Q into the set {P, Q}, thereby adding the newly created set to the
WS and yielding a new derivational stage, WS’. The key point here is that the binary property of Merge does not have to be stipulated; rather, it follows from a general notion of simplicity.
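Under our own encoding assumptions (the workspace as a Python list, syntactic objects as nested `frozenset`s), (12) can be sketched as follows; the treatment of the targeted members P, Q anticipates example (32) and the later discussion of resource restriction:

```python
def merge(p, q, ws):
    """Sketch of Merge(P, Q, WS) = WS' = ({P, Q}, X1, ..., Xn):
    P and Q, members of WS, are put into the new set {P, Q}, which is
    added to the workspace; following the pattern of example (32),
    the targeted members do not recur as separate members of WS'."""
    rest = [x for x in ws if x not in (p, q)]
    return [frozenset({p, q})] + rest

# Example (32): WS = [a, b, c]; Merge(a, b, WS); WS' = [{a, b}, c]
ws = ["a", "b", "c"]
ws_prime = merge("a", "b", ws)
```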
Besides how many objects Merge targets (just two), a related question has to do
with what Merge targets: are any two inscriptions in the WS available to Merge,
and what, at an even more basic level, is meant by an object being ‘in’ the WS?
For an operation to apply to objects, it must first locate those objects. Let’s
abstract out the ‘locating’ operation, formulating and exploring its properties –
referring to the operation as Search. Search locates the (two) items in the WS
37 See, for example, the classic work of Kayne (1981, 1983, 1984, 1994); for important discussion,
see also Collins (1997, 2022).
The elements a and {b, c} are members of the WS (13) and are thus also terms of
that WS. The objects b and c, on the other hand, are terms of the WS, not
members of it. Let’s adopt ‘term of’ in order to have a vocabulary to more fully
detail the concept of Minimal Search.40
Search operates in a stepwise fashion: it first locates some P (from the WS),
and in a second step locates some Q to merge with P. Applied to the WS in
(13), that is, [a, {b, c}], Merge will first locate P, where P can be any member of
[a, {b, c}], so either a or {b, c}. Members of the WS are all equally available
with Least Search; the members are the first objects found looking into the WS
and the first Search step can take any one of the members. The terms b and c in
(13) are not members of that WS and hence are not candidates for (first-step)
Search. Staying for now at an informal, intuitive level, locating b or c takes
more computation (a more extended Search) than locating the containing set
38 We’ll see later that some objects are rendered inaccessible to computation if certain conditions
hold, something we can put aside for right now.
39 See Chomsky (in press) and Chomsky (2021b). See also the ‘contains’ relation of Collins and
Stabler (2016, p. 46).
40 For pioneering discussion of the formal nature of the Search procedure of syntax and its
consequences for syntactic theory, see Ke (2019). Ke points out that (i) the search target T and
search domain D depend on the operation O that Search is feeding. If O is Agree, then T is
a particular feature (depending on the probe), and D is the c-command domain of the probe. If
O is Labeling, then T is any feature, and D is the object being labeled. Ke (2022) discusses
examples of ‘least effort/Minimal Search’ in other domains, including visual search and computer science.
{b, c}; simply put: it’s easier to take one step than two; locating a term within
a member of WS is a two-step process involving locating that member and
then locating a term in it, and thus takes more computation than locating that
member or any other member of the WS. It follows that the first step (locating
P) in any application of Search will take any member (and only a member) of
the WS as (the Merge target) P – this is least effort.
What about the second step of Search, locating Q (to merge with P)? In
the second step there are two options. One is that once Search locates P, it looks
inside P for Q. A second option is for Search to locate Q in the same way as it
located P; that is, it takes the WS as the Search domain and by Least Search can
take any member of that WS as Q.41 Overall, this gives us two general modes of
application of Merge, what are referred to as External Merge (EM) and Internal
Merge (IM). With EM the two Merge targets P, Q are separate members of the
WS. With IM, Q is contained within P.
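The two modes of application can be sketched as a classifier over workspaces. This is again our own illustration, with hypothetical helper names `is_term_of` and `merge_mode`:

```python
def is_term_of(q, p):
    """Q is a term of P if Q is a member of P, or a term of a member of P."""
    if not isinstance(p, frozenset):
        return False
    return any(q == m or is_term_of(q, m) for m in p)

def merge_mode(p, q, ws):
    """Classify a prospective application of Merge(P, Q, WS):
    External Merge if P and Q are separate members of WS;
    Internal Merge if Q is contained within P."""
    if p in ws and q in ws and p != q:
        return "EM"
    if p in ws and is_term_of(q, p):
        return "IM"
    return None  # not a licit application under these assumptions

# WS of (13): [a, {b, c}]
ws = ["a", frozenset({"b", "c"})]
assert merge_mode("a", frozenset({"b", "c"}), ws) == "EM"
assert merge_mode(frozenset({"b", "c"}), "b", ws) == "IM"
```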
Over-simplifying for purposes of illustration, consider the key points in the
derivation of a simple passive sentence like The apple was eaten. Assume that
inscriptions of the relevant lexical items have been entered into the WS, and that
the and apple are merged to form {the, apple}. We would then have the WS:
Suppose the first step of Search locates the transitive verb eat(en) (= P) and then
in the second step locates the noun phrase {the, apple} (= Q). Both steps
conform to Least Search since the targeted objects are both members of the
WS. With the objects now located we apply Merge, using those objects:
Since Merge creates within the WS the set {P, Q}, the output of this application
of Merge would be:
41 In the first step of Search, any member of the WS can be selected. In the second step, it’s the first
relevant object, where ‘relevant’ is determined by various factors. Our discussion here is focused
mainly on Merge applying to NP arguments; hence ‘NP’ is relevant, as we’ll see in the example
later in this section.
42 It should be asked: why isn’t the output
where P (= eaten) and Q (= {the, apple}) remain as members of the WS. We return to exactly this
question in subsection 3.3.3.
This yields the passive predicate, which we’ll refer to as the verb phrase (VP).
Suppose Search applies to the WS (18) and in the first Search step locates the
VP, a member of the WS; thus P = VP. As we pointed out above, the second step
of Search can look inside P and, in this case, find within it the NP {the, apple} as
Q.45 Thus,
With EM, P and Q are separate; with IM, Q is contained within P.46 There is
a single operation, Merge, with two modes of application.
Notice further that Internal Merge necessarily results in structurally identical inscriptions of Q, which in this case is {the, apple}. Given its formulation, Merge
combines its targets P and Q in a set; it doesn’t alter their structure.47 This is
a property of all derivations in any formal system. In the present case, it means
43 Note that the earlier construction of {the, apple} is also an instance of EM.
44 Chomsky (in press) proposes that EM always creates such semantic relations, referred to as
Theta Structures; that is, that EM (and only EM) builds the propositional domain. We return to
this idea and to some of its consequences in Section 5.
45 As stated earlier in this section, our focus here is on IM of arguments; hence, we assume that
Merge is searching for an NP; then, the NP {the, apple} is in fact the first NP found looking into
the VP; this is least search – it is the fewest search steps to find an NP.
46 Internal and External Merge, as traced earlier in this section, are the only permissible applications of Merge under the assumptions developed here. See Epstein and colleagues (2012) for
some relevant discussion. Importantly, while prior work has sometimes framed EM as Merge of
separate objects (see e.g., Chomsky 2004a: 110), EM here refers specifically to Merge of two
members of the WS. Other forms of Merge have been proposed, including what is referred to as
Parallel Merge (Citko 2005) and Late Merge (Lebeaux 1988); but we argue later in this section
that these are extensions of Merge, rather than subcases of Merge.
47 In previous literature this is referred to as the No-Tampering Condition: the targets of Merge
remain intact (Chomsky 2007). The No-Tampering Condition follows directly from the formu-
lation of Merge (Freidin 2021).
where P (somehow) changes from {was, {eaten, {the, apple}}} to {was, {eaten}}.
Indeed, altering P in this way would destroy the semantic relation between the
transitive verb and its semantic object; but, crucially, it would simply not be an
instance of Merge (as Merge has been defined) since it would be replacing P with an
entirely new P’. We stress that this is a general property of all computation.
3.3.3 Preservation
The application of Merge is restricted by the third factor, entailing that Merge is
binary and its two targets are located by Minimal Search, allowing only Internal
and External Merge. In this subsection we make explicit another important
property of computation, Preservation: the interpretation of an inscription does
not change in the course of computation.
Preservation is a general constraint, normal for all computation in formal
systems. There can be no valid computation unless each inscription is inter-
preted in one and only one way. Since language is a computational system, we
expect Preservation to hold. It certainly does for material that is not directly
affected by Merge. All nontargets of Merge in the input WS remain in the
output, WS’. To somehow remove such objects is an extreme form of change of
(22)
Preservation can see both the input (WS) to and the output (WS’) of IM, and
thus can see that {the, apple} is the very same inscription in the input/output, as
indicated by the dotted line between the derivational stages. Thus, {the, apple}
in WS is {the, apple} in WS’; it’s one and the same inscription. That is the
identity relation.48 Likewise with the passive VP {was, {eaten, {the, apple}}};
it’s the same inscription in WS and WS’. That is how Preservation must work. It
tracks the same inscription, ensuring that the inscription does not change its
interpretation. The operations of the syntax, however, are Markovian. Thus, the
identity information encoded with the dotted line is lost for the operation Merge.
All Merge sees is WS’, and looking only at WS’, it sees two distinct inscriptions
of {the, apple} that happen to be structurally identical (i.e., that have identical
form). In short, with respect to {the, apple} in the case above, Preservation sees
the very same inscription. Merge, like other operations of the syntax, sees two
inscriptions that are structurally identical; that is, that have the same form, with
the same lexical items hierarchically arranged in the same way. Syntactic
operations have no information about how these inscriptions were created.
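This distinction between the very same inscription and structurally identical inscriptions has a close programming analogue, which may help fix ideas (the analogy is ours, not the authors'): object identity versus structural equality.

```python
# Analogy: Python distinguishes object identity (`is`, the very same
# "inscription") from structural equality (`==`, identical form).
x = frozenset({"the", "apple"})
y = frozenset({"the", "apple"})  # a distinct but structurally identical object

assert x == y      # identical form: all that Merge and the syntax can see
assert x is not y  # distinct objects: telling them apart requires the
                   # derivational history, which a Markovian system lacks
z = x
assert z is x      # one and the same object: the identity relation
```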
48 The strict identity relation X=Y (‘the same inscription’) can only be determined across derivational steps. So, when we say ‘identical inscriptions’ or ‘structurally identical inscriptions’ we
mean two inscriptions that have identical form.
49 As stated in Chomsky (2021b, p. 17, fn 25): “The issue has arisen in the history of mathematics in a debate over validity of Newton’s proofs, at a time when there was no clearly formulated theory of limits. Was there equivocation in his use of zero and ‘as small as possible’?” See Kitcher
(1973). For an alternative view and further discussion of the copy-repetition distinction, see
Freidin (2016).
language, the question arises: When are identical inscriptions interpreted in the
same way and are therefore copies,50 and when are identical inscriptions not
interpreted in the same way and are therefore repetitions? The distinction is
illustrated with the following simplified, abstract representations.
language; it does not freely apply to any and all identical inscriptions. For
language, we can define it as:
The default is that identical inscriptions are repetitions, becoming copies only
if assigned to the copy relation via FC; it’s just that FC is restricted in
language.
To illustrate how FC applies, consider again the (most directly relevant)
derivational steps of the simple passive sentence Many people were praised.
50 ‘Copy’ and ‘repetitions’ are standardly used terms from the literature. See Chomsky (2019a,
2019b, 2021b).
51 Of course, it’s possible for repetitions to be interpreted as co-referential. Thus, in He thinks he’s
smart, the identical inscriptions of he are repetitions, but can still have the same referent.
52 FC does not create structure; rather it is interpretive, assigning the copy relation to structurally
identical inscriptions under certain conditions.
Assume that lexical material is introduced into the WS and multiple instances of
EM have applied, building up to the passive verb phrase:
Here is a critical point: as noted above, operations of the syntax are Markovian,
as is true of computation generally. Merge and FC have access to only the
current state of the WS, prohibiting any ‘look back’ at earlier derivational
points. The syntax sees the WS at the stage represented by (28), not (26) or
anything earlier. The syntax has no way to ‘see’ into the derivational past to
determine whether a single noun phrase has been ‘moving around’ within the
syntactic structure, or whether there are two separately constructed noun
phrases that happen to constitute identical inscriptions.
Let’s assume that we are at the point in the derivation where interpretation is
going to take place,53 that the syntactic object within the current WS, repre-
sented by (28), is ‘opened up’ to interpretation by the interfaces. It is at this point
that FC applies since the copy relation is relevant to the semantics; for example,
FC contributes crucial information for the semantic interpretation of inscrip-
tions. It is an operation that applies to a syntactic object, and which provides
‘instructions’ to the interfaces relevant to interpretation just at the point that the
object is entering semantic interpretation; FC provides the information ‘these
two identical inscriptions are copies’ and the semantics thus knows to interpret
them in the same way. Suppose FC applies to the WS in (28) and, given the
Markovian property, all it sees is (28). FC applies, thus: FC({many, people},
53 In technical terms, the point at which FC applies is referred to as the phase level. It goes beyond
our immediate concerns to go into further detail; it suffices to note that the pieces of structure
built by Merge are organized into chunks called ‘phases’ corresponding to the clausal and verbal
domains. The key idea is that Merge builds to a phase, the phase is opened up to interpretation by
the interfaces, and then Merge continues building to the next phase, and so on. This phase-based,
cyclic computation instantiates third-factor simplicity in that it reduces computation: the system
doesn’t build an object only to retrace its steps and modify that same object later. Relevant
consequences of this phase-based approach are explored in Section 6. Technically, the phases are
CP and v*P – see Section 6. Nothing really hinges on the construct ‘phase’ until we get to more
complex cases; see Section 6 for further comment; but we do want to acknowledge at this point
the phase-based approach that is assumed throughout.
{many, people}), and the copy relation between the two structurally identical
inscriptions is thereby established.
As noted above, the information provided by an application of FC is critically
important to CI (interpret structurally identical inscriptions X, Y in exactly the
same way). It is also relevant to SM. SM uses the ‘copy relation’ information
supplied by FC as an instruction to not pronounce the lower copy, which we
indicate in gray below:
associated with many people. But, for now, further details regarding how FC
works need to be developed.
Since FC applies to terms of the WS, it follows that it, like Merge, must locate
its targets within the WS. This is done by computationally optimal Search.
While Merge builds syntactic structure, FC aids interpretation of those struc-
tures by establishing relations between elements within them. Thus, FC will
apply to two structurally identical inscriptions, X, Y, only if they are found by
Minimal Search. The smallest set in which the copy relation can hold between
X and Y is {X, Z} where Y is a term of Z. Therefore X, as a sister of Z,
constituent-commands (abbreviated c-commands) Z and any element contained
in Z, and thus X c-commands Y.54 In this way, Minimal Search for FC requires
that X and Y are in a c-command configuration (henceforth cc-configuration). In
54 There are three fundamental relations that result from Merge. Relative to elements X, Y: (i) X,
Y can be co-members of a set; (ii) Y can be a term of X; and (iii) Y can be a term of the co-
member of X, which is c-command. Thus, we appeal to the simplest relations.
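The cc-configuration requirement can be sketched computationally. The sketch is ours (`subsets_of` and `c_commands` are hypothetical helpers), and it comes with a caveat: `frozenset`s conflate structurally identical sister members, so this models configurations only, not distinct inscriptions.

```python
def is_term_of(y, z):
    """Y is a term of Z: Y is a member of Z or a term of a member of Z."""
    if not isinstance(z, frozenset):
        return False
    return any(y == m or is_term_of(y, m) for m in z)

def subsets_of(obj):
    """All set-valued terms of obj, including obj itself."""
    if not isinstance(obj, frozenset):
        return []
    out = [obj]
    for m in obj:
        out.extend(subsets_of(m))
    return out

def c_commands(x, y, root):
    """X c-commands Y iff X is a co-member of some set {X, Z} within root
    and Y is Z itself or a term of Z (X's sister)."""
    for s in subsets_of(root):
        if x in s:
            for sister in s:
                if sister != x and (y == sister or is_term_of(y, sister)):
                    return True
    return False

# In {X, {W, Y}}: X c-commands Y, but Y does not c-command X.
root = frozenset({"X", frozenset({"W", "Y"})})
assert c_commands("X", "Y", root)
assert not c_commands("Y", "X", root)
```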
Summarizing the results so far, Merge is binary. Only External and Internal
Merge are possible, and are optimal given Minimal Search. Merge minimally
modifies the WS, by Preservation. Merge yields identical inscriptions, which
may be assigned the copy relation by FC.
(32) WS = [a, b, c]
Merge(a, b, WS)
WS’ = [{a, b}, c]
Shouldn’t Merge minimally disrupt the WS, adding {P, Q}, but leaving the WS
completely undisturbed otherwise, such that every member of the input WS is
also a member of the output WS’? If not, what’s happening to P, Q?
In constructing formulas of propositional calculus, all propositions introduced remain fully available to further manipulation. In a formal proof, if p is in line one and is joined with q, which is on another line, via the rule of conjunction introduction, the inscription p in line one remains, and an identical inscription occurs in the conclusion [p & q]. All else equal, the
computation of language should be no different. The seemingly simplest state-
ment of (third-factor-compliant) Merge is: Merge(P, Q) results in identical
inscriptions of P, Q remaining in the WS; that is, the output (33).
As stressed above, Merge is a component of the human linguistic system and
is, naturally, constrained by the relevant properties of that system. One con-
straint, following the work of Sandiway Fong, is that the computational system
seeks to minimize resources.55 As Fong notes: “The device we call the brain is
a marvelous organ, endowing us with the capacity for symbolic thought,
language and reasoning far beyond what other animals have exhibited.”
But, Fong continues:
this marvel is not the computational powerhouse that we might assume. The
biological unit of computation, the neuron, possesses a slow communication
55 See Fong (2021) and Chomsky (2019b) for further details. As discussed there, the same
problems bar all extensions of Merge: Parallel Merge, Sidewards Merge, Late Merge (which
also violates SMT for independent reasons); such extensions expand the WS.
can detect vibrations smaller than the diameter of a hydrogen atom (Fletcher
& Munson 1933). In case after case, the brain does not make use of the full
resolution of available sensory inputs. Perhaps the answer is that it cannot, as
a slow organic system, it does not possess the necessary bandwidth, and
therefore, it must (selectively) throw away much of the signal. The idea that
this pressure for efficiency also pertains to both data and computation in
language, born out of biological limitations, was termed The Third Factor by
Chomsky (2005). (Fong 2021, p. 3)
(34) WS = [a, b, c]
Merge(b, c, WS) P = b; Q = c
If the targets of Merge remain as members of the WS, the result would be:
But output (35) violates MY: the number of accessible terms in the output WS
has increased by more than one. Not only is the new object {b, c} accessible,
but, recalling our earlier discussion, there are now also two distinct inscriptions
b and two distinct inscriptions c. Another, somewhat more intuitive way to think
about it is this. Merge targets b and c and puts them into the set {b, c} – the
inscriptions b, c in {b, c} are the original inscriptions from the input WS. Thus,
the inscriptions b and c that are members of the output WS (35) are new
elements, new inscriptions, and hence new terms. So Merge is resulting in
56 This principle is arguably not unique to syntax, but more general, potentially relevant for the
acquisition of phonology by ‘forgetting’ unused options (see, for example, Mehler and Dupoux
1994), and see Charles Yang’s work on probability distribution of grammars in acquisition (Yang
2002, 2004).
57 Defined in this way, MY diverges from the earlier principle ‘Restrict Computational Resources’
of Chomsky (2019a) and the similar ‘No proliferation of roots condition’ of De Vries (2009).
These prior conditions limited cardinality of the WS (or ‘number of roots’ in De Vries’s terms).
Instead, MY limits accessibility of terms, which subsumes cardinality.
more than one new term in the output in violation of MY. In order to satisfy MY,
the output WS would have to be:
where only one new term, {b, c}, arises. MY determines the fate of these
‘excess’ inscriptions.
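MY as just described is mechanically checkable. In the sketch below (ours; `accessible_terms` and `satisfies_my` are hypothetical names, and the inaccessibility conditions of footnote 38 are ignored), terms are counted with multiplicity, so retaining the targets b, c inflates the count in the way just discussed for (35):

```python
def accessible_terms(ws):
    """All terms of the workspace: its members plus all of their terms,
    counted with multiplicity (distinct inscriptions count separately)."""
    out = []
    def collect(obj):
        out.append(obj)
        if isinstance(obj, frozenset):
            for m in obj:
                collect(m)
    for member in ws:
        collect(member)
    return out

def satisfies_my(ws_in, ws_out):
    """MY: the number of accessible terms grows by at most one."""
    return len(accessible_terms(ws_out)) <= len(accessible_terms(ws_in)) + 1

# Input WS of (34): [a, b, c]; Merge(b, c, WS).
ws = ["a", "b", "c"]
good = [frozenset({"b", "c"}), "a"]            # one new term: {b, c}
bad = [frozenset({"b", "c"}), "a", "b", "c"]   # (35)-style: three new terms
assert satisfies_my(ws, good)
assert not satisfies_my(ws, bad)
```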
binary and subject to efficient Search. Critically, Merge is also subject to the
first factor; that is, to language-specific conditions like TT. Theta Theory is not
a general efficiency condition constraining formal systems; it is unique (as far as
we know) to language. It’s important, then, to consider the consequences of the
interaction of Merge with such conditions, as it accounts for certain unique
properties of language compared to other formal systems.
58 Predicate/argument structure is the representation of lexical structure, including the theta-role
assigners (predicates) and theta-role assignees (arguments), in the format of set theory. Thus, as
mentioned in the text, a theta role assigner like a transitive verb (say, chase) is combined with an
argument (like the cat) in the set {chase, {the, cat}} to create the configuration for the ‘chase-ee’
theta role to be assigned to {the, cat}, yielding the relevant semantic interpretation.
the ‘chaser’ and Patient-of CHASE, the ‘chase-ee’). In its classic form,59 TT
requires an isomorphism between theta roles and arguments: every theta role
must be assigned to one, but only one argument; and every argument must
receive one, but only one theta role. It captures the fact that gratuitous argu-
ments unconnected to the meaning of a sentence yield gibberish, as in (37)
where the noun phrase arguments the building and Tom have no relation to the
rest of the sentence – they do not receive a theta role. Likewise, we don’t get
(38)
where the theta roles of the verb put, informally ‘the-thing-put’ and ‘location,’
are unassigned. Additionally, a single argument can’t be assigned multiple theta
roles; thus (39)
can’t mean that the dog saw itself (where the dog is assigned both the ‘see-er’
and see-ee’ theta roles).
Interpretation is computed from the meaning of lexical and phrasal items and
the structural relationships between those items. Structure building, and hence
the structure building operation Merge, will be constrained by the (language-
specific) requirements of interpretation, including TT.60 Just what are the
consequences of the interaction of Merge with TT? Just how is Merge constrained?
At the semantic interface, the two types of Merge correlate well with the
duality of semantics that has been studied within generative grammar for
almost forty years, at first in terms of “deep and surface structure interpret-
ation” (and of course with much earlier roots). To a large extent, EM yields
generalized argument structure (θ-roles, the “cartographic” hierarchies, and
59 See Chomsky (1981) and references therein for further discussion.
60 As we’ve noted, Merge itself does not know what will be the interpretation of an object it builds.
However, the interpretative component can act as a ‘filter’ on the output of Merge, and hence
constrain its operation.
If you look quite generally at the interpretation of expressions, it falls into two
categories. There is one category which yields argument structure (theta roles
and the interpretation of complements of functional elements). There is
another category which is involved in displacement, which has kind of
discourse-oriented or information-related properties or scopal properties
and so on, but not argument properties. That’s duality of semantics. If you
think about it a little further, you see that the first type, argument structure, is
invariably given by external MERGE. The second type, non-argument struc-
ture (other factors) is always given by internal MERGE.
Let’s consider how this distinction arises in the current system. The hierarch-
ical position of an argument relative to a predicate (i.e., a theta role assigner)
determines the assignment of theta roles and therefore the interpretation of an
argument. For example, the argument NP the dog merged with a transitive verb
(like chase) forms the {V, NP} configuration associated with the theta role of
Patient (chase-ee).61 The primary theta position is direct object (object of
a transitive verb, or object of an adjective, preposition, or nominal), and this
position can be created only by EM, not IM. Consider a verb and its object. If the
WS is
then since neither V nor NP is a term of the other, only EM, that is, Merge(V, NP,
WS), is possible. Thus,
(41) EM and only EM creates object position, the primary theta position.
The other major theta position is predicate-internal subject (i.e., the external
argument position). For the moment, let’s put aside technical details (see
Section 6 for formal discussion) and illustrate with the simplified structure in (42).
Assume that chased assigns a theta role to the NP argument Bill. In (42), Bill
must have been externally merged into this position; that is, we build the
61 As in, among many others, Williams (1994), Harley (1995), Hale and Keyser (2002).
A relatively recent alternative view takes the Patient theta role to be assigned in a Spec-Head
relation with a dedicated functional head, analogous to assignment of Agent to the specifier of v*
(or Voice in work following Kratzer 1996). See, for example, Borer (2003) and Lohndal (2014).
predicate {chased, dogs} and then combine the subject with this predicate using
EM. This external argument position could never be filled by IM. Suppose, for instance,
that Bill starts in the object position
This violates Preservation. The single inscription Bill is interpreted in more than
one way: by virtue of its initial position, and by virtue of the position into which
it merges by IM. Hence, such an application of IM is disallowed.
What we find, then, is EM (and only EM) creates the primary theta positions.
Suppose we strengthen this to a segregation of EM and IM, yielding the
principle of Duality of Semantics:62
(45) Duality of Semantics: EM, and only EM, creates theta positions.
creates a theta position; any argument internally merged to form a theta position
will necessarily change its interpretation, in violation of Preservation. Thus,
Duality follows.
Note further that Duality as in (45) is an economy condition that sharply
reduces options for application of Merge: merger of an argument that creates
a theta position is necessarily via EM; and IM of an NP argument always creates
a non-theta position. Thus, IM and EM are segregated, contributing to the
overriding meta-condition of Resource Restriction (RR) in a natural way; in
short, choice points are reduced.
Duality has a number of important consequences. Consider, for instance, the
classic distinction between (46) and (47), recalling our earlier notation where
a strikethrough indicates that the element is present in the syntactic object (for
CI interpretation), but not pronounced at the SM interface.
62 See Chomsky (2019b).
63 However, IM from a theta to a non-theta position is clearly allowed; this is natural since the target
NP is not adding to its already-established interpretation.
For (46), suppose that an instance of the NP argument the man is externally merged
into the (semantic) subject position.64 Then, following FC(the man, the man), a TT
violation will result: a single theta role assigner, chased, is assigning more than one
theta role to the same element – where the ‘same element’ is structurally identical
inscriptions of a copy pair formed by FC. If we try to internally merge the man from
object to subject position, we violate Duality (IM is not to a theta position) and
Preservation. (46) is thus correctly disallowed. (47), on the other hand, is perfectly
fine: by Duality, we can’t externally merge (an instance of) the man into subject
position since the subject of the passive is a non-theta position. However, we can in
this case internally merge the man from object to (non-theta) subject position,
running afoul of neither TT/Duality nor Preservation. See Section 6 for full details.
The modes of application of Merge, the conditions Merge is subject to, and
the nature of FC together give us just the right empirical results for standard
cases, as illustrated in this section (see also Section 6 for detailed illustrations).
But, SMT, as conceived here, also has an important enabling function, predict-
ing the existence of phenomena that are otherwise completely unexplained. One
of these is obligatory control.
ideally just Merge and (ii) that the language faculty is an ‘optimal’ solution to
certain language-specific conditions, including TT. We now turn to a further
consequence of adherence to SMT.
In earlier stages of the development of Generative Grammar, control phe-
nomena, as in (48)
64 That is, suppose the, man are selected and then Merged into {the, man}, which is Merged to form
{chased, {the, man}}. Then, the and man are selected from the Lexicon a second time and
entered into the WS. Merge then builds a separate instance of {the, man} and merges it to
{chased, {the, man}}. In this case, two distinct instances of {the, man}, each built separately,
would be EM’ed into their respective positions.
65 See, for instance, Chomsky (1981) for analysis within the Government and Binding Framework.
See also Landau (2013) and Reed (2014).
that (48) was interpreted as: the man x, x tried x to read a book. Under current
assumptions, none of this is required. Rather, the central properties of the
‘control component’ simply fall out as a consequence of SMT; nothing beyond
what we’ve proposed is necessary.
Let’s consider (48), avoiding full technical details for right now (see
Section 6). The argument NP the man can be merged into the lower (theta-)
subject position:
We then build up to
By Duality, {the, man} cannot be internally merged into the higher subject
position; that higher (predicate-internal) position is a theta position of the
predicate try and hence internally merging it would run afoul of Duality. But
nothing prohibits externally merging an identical inscription of {the, man}
(built separately from the first one) into the higher subject position.66 Thus,
the WS would contain the already-constructed object, {tried {to, {{the, man},
{read, {a, book}}}}}. Then, the and man are selected from the Lexicon, entered
into the WS where a new object {the, man} is created. This new object, {the,
man}, is then externally merged with the predicate to give:
(51) {{the, man}, {tried {to, {{the, man}, {read, {a, book}}}}}}
In this case, unlike in the man chased the man reviewed in subsection 5.2, there are two
distinct theta role assigners associated with the inscriptions of the man, namely,
read a book and tried. Thus, Merge can generate the representation (51) – as long
as identical inscriptions of the man are each externally merged into their respect-
ive positions. Now FC can apply to the representation (51), with no knowledge of
how (51) was constructed. The conditions for FC(the man, the man) to apply are
met – it is in fact a cc-configuration, and thus the man, the man are assigned to the
copy relation, the lower copy is unpronounced (as a consequence of the economy
condition discussed above), and (51), the man tried to read a book, results, with
the meaning: the man x, x tried to x read a book. This is exactly the right result,
with significant empirical advantages, as we’ll see in Section 6.
As mentioned in subsection 2.2, one way to understand the Strong Minimalist
Thesis (SMT) is as a thesis about the nature of language, that is, about FL: the
thesis that FL is an optimal solution to certain language-specific conditions,
including TT.
66
Note that there is some similarity to the older Equi-NP Deletion analysis of Rosenbaum (1967).
6 Illustrations
Up to this point, (i) the form and function of Merge have been outlined,
(ii) key third-factor (computational efficiency) and first-factor (language-
specific) principles have been traced, and (iii) the consequences of the
interaction of Merge with these principles have been explored. Merge, and
the way that it can (and can’t) operate, is reasonably clear. In this section,
we turn to a somewhat more technical exploration of Merge, working our
way through a set of central derivation types. In Sections 1–5, we have
tried to keep the discussion at a fairly nontechnical level, focusing on key
concepts and components rather than formal technicalia. The more formal
implementation of the framework is ultimately important, however, and so
in this section we consider technical details, presupposing familiarity with
recent work.
We assume, first, that inscriptions of relevant lexical material are inserted into
the Workspace (WS) as needed.68 Given this, the direct object a pear can be
67
Under current assumptions, Duality constrains Merge, which builds structures, but not FC,
which assigns the copy relation, while such structures with copy relations are interpreted in
accordance with univocality. Both Duality and Univocality are rooted in TT, but the former is
construed as a condition on Merge, whereas the latter is how structures (built by Merge and
assigned copy relations by FC) get interpreted.
68
We leave open the nature of lexical insertion; the assumption is simply that lexical material can
be retrieved from the Lexicon and entered into the WS at any point in the derivation – nothing in
the subsequent discussion hinges on how this is done. One option is insertion of lexical items via
an operation ‘Select’ (Collins and Stabler 2016). Alternatively, lexical items might simply be
freely accessible.
derived via the mapping shown below, where lexical items a and pear are
members of the input WS:69
It is not clear to us at this point whether object shift, whereby the NP comple-
ment of R is raised to create a specifier position of R, is optional or obligatory,72
but we assume it occurs. Such raising of the object must be by IM given
Preservation (the specifier of R73 is a non-theta position). If we were to create
a duplicate instance of the object, a pear, and try to EM it into the Spec-of-R
69
Technically, Merge maps WS directly to WS’. Despite this, we could informally describe the
internal workings of the process like this: Constrained by efficient Search, Merge looks into the
WS (53) and finds the two WS members a and pear (both locatable by Search). Merge creates the
set {a, pear} and adds this to the output WS’. In a vacuum, the inscriptions a and pear would
also remain as accessible members of the output WS’, as in (i):
(i) WS’ = [{a, pear}, a, pear]
However, Merge obeys Minimal Yield (MY), which would be violated in the mapping from (53)
to (i), in that the WS would be expanding beyond just the one new object {a, pear}. Thus, MY
requires that the inscriptions a and pear each appear only once in the output WS’ (54). One could
thus informally think of (i) as an intermediate step between (53) and (54), though it is important
to note that (i) is never computed given the formal apparatus developed above.
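The WS-to-WS’ mapping just described can be made concrete with a small toy program. This is our own illustrative sketch (the names SO, accessible, and merge are ours), loosely in the spirit of set-based formalizations such as Collins and Stabler (2016), not the authors’ official definition:

```python
class SO:
    """A syntactic object: a lexical item or a set-formed object {P, Q}.
    Each SO instance models a distinct 'inscription' (token)."""
    def __init__(self, label=None, parts=None):
        self.label = label   # lexical items: 'a', 'pear', ...
        self.parts = parts   # set-formed objects: a pair (P, Q)
    def __repr__(self):
        return self.label if self.parts is None else "{%r, %r}" % self.parts

def accessible(ws):
    """Every term of every WS member (everything Search can find)."""
    found = []
    def walk(so):
        found.append(so)
        if so.parts:
            for part in so.parts:
                walk(part)
    for member in ws:
        walk(member)
    return found

def merge(p, q, ws):
    """Map WS to WS': form {p, q} and add it to the workspace.
    Minimal Yield: the output contains only one new accessible object,
    {p, q} itself, so p and q do not survive as independent WS members."""
    assert p in accessible(ws) and q in accessible(ws)
    new = SO(parts=(p, q))
    ws_prime = [m for m in ws if m is not p and m is not q]  # per MY
    ws_prime.append(new)
    return ws_prime

# (53) -> (54): External Merge of the lexical items a, pear.
a, pear = SO('a'), SO('pear')
ws = [a, pear]              # (53) WS  = [a, pear]
ws = merge(a, pear, ws)     # (54) WS' = [{a, pear}]
print(ws)                   # [{a, pear}]
print(len(accessible(ws)))  # 3 accessible objects: {a, pear}, a, pear
```

The never-computed intermediate (i) would correspond to keeping a and pear in the list as well; the filtering step in merge is what enforces MY in this toy.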
70
We assume the labeling algorithm of Chomsky (2013, 2015), and assume that proper labeling
occurs at relevant points; that is, at the phase level – we leave the details of labeling aside given
our focus here on Merge itself.
71
Note that the derivation of (52) is built through successive applications of Merge, specifically
External Merge. Note further that Duality of Semantics does not play a role in these applications
of Merge; Duality is relevant only when the merger of arguments (NP and embedded CP) is
involved, where it requires EM of an argument to saturate a theta position; as we’ve seen, IM
cannot be used to fill a theta position.
72
See Lasnik (2022) for extensive discussion; see also Johnson (1991), Lasnik and Saito (1991),
Koizumi (1993, 1995), and Lasnik (2002).
73
We use the traditional expressions ‘Spec-of-X’ and ‘Complement-of-X’ for expository conveni-
ence; these terms (Chomsky 1970), and the notions of subject or object that they represent
(Chomsky 1965, p. 71) were never introduced as syntactic primitives. By ‘Complement-of-R’
position (and then relate the structurally identical inscriptions of a pear, one in
Spec-R and one as object of R, by FC), Duality would be violated – EM of an
argument must be to a theta position – thus, EM is simply not available in this
instance.74 If the object raises to Spec-R, meaning IM has applied, identical
inscriptions of the object will automatically be produced. The original position
of the object as complement of R must remain given Preservation. Thus, we get:
(57) WS = [{{a, pear}, {R, {a, pear}}}] = [{IA, {R, IA}}], IA is the internal argument75
Least Search finds only the higher of the identical inscriptions and hence (as we
saw in Section 3), Minimal Yield is not violated with this instance of IM. Such
‘object shift’ occurs with Exceptional Case Marking,76 something we’ll turn to
a bit later.
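The interaction of Least Search and Minimal Yield under IM can also be sketched. The following toy (again our own encoding, not the authors’ formalism) models Search as a top-down, breadth-first walk that finds an inscription only at its highest occurrence, so IM of the object in (57) adds exactly one new accessible object:

```python
from collections import deque

class SO:
    """A syntactic object; each instance is a distinct inscription."""
    def __init__(self, label=None, parts=None):
        self.label, self.parts = label, parts

def least_search(ws):
    """Top-down, breadth-first Search: an inscription is found only at
    its highest occurrence; a lower occurrence of an already-found
    token (e.g., the pre-IM position of a shifted object) is skipped."""
    found, queue = [], deque(ws)
    while queue:
        so = queue.popleft()
        if any(so is f for f in found):
            continue            # already found higher up: invisible here
        found.append(so)
        if so.parts:
            queue.extend(so.parts)
    return found

# Object shift by IM: the same inscription IA = {a, pear} occurs both
# as complement of R and, after IM, as Spec-of-R, giving (57).
a, pear, R = SO('a'), SO('pear'), SO('R')
IA = SO(parts=(a, pear))
RP = SO(parts=(R, IA))            # {R, IA}
ws_before = [RP]
ws_after = [SO(parts=(IA, RP))]   # (57) {IA, {R, IA}}
# IM adds exactly one new accessible object, satisfying Minimal Yield:
grown = len(least_search(ws_after)) - len(least_search(ws_before))
print(grown)                      # 1
```

Because the two occurrences of IA are the very same token, the breadth-first walk finds it once, at the higher position, mirroring the text’s point that the lower inscription does not count as expanding the WS.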
We then merge in the phase head v*, and externally merge the external
argument (EA) the fox, which would have been constructed in parallel, to yield
We are now at the v*P phase level where Form Copy (FC) can apply. Relevant
here would be FC(IA, IA), assigning the copy relation to the structurally identical
inscriptions of the IA, {a, pear}. The phase-head complement, RP={IA, {R,
IA}}, is accessed by the interfaces and, given the Phase Impenetrability
Condition (PIC),77 is inaccessible to Merge from here on – an expression of
the bottom-up, cyclic nature of the syntax. What this means for interpretation at
the SM interface, if externalization is activated, is that only the higher copy will
be pronounced, yielding the linear effect of object shift. Note further that we
assume the labeling system of Chomsky (2013, 2015); and, specifically, that the
phase-head complement is labeled by the ‘shared prominent feature’ option of
the labeling algorithm; that is, in {IA, {R, IA}} = {{a, pear}, {R, {a, pear}}},
Search finds the phi features inherently borne by the lexical item pear and those
phi features of R, and, assuming these features match, the object is labeled by the
we simply mean the co-set-member of R. ‘Spec-of-R’ would be the co-set-member of {R, IA},
and so on.
74
A question arises regarding how expletives enter into the derivation. Expletive there, for
instance, is not an argument but must be externally merged into a non-theta position. We leave
this matter aside.
75
The external/internal distinction refers to older analyses where an argument that is interpreted as
the subject of a predicate is syntactically external to that predicate, as opposed to an object of
a predicate, which would be internal to the predicate. We use the terms here to distinguish the two
arguments in v*P, for expository convenience.
76
We presuppose familiarity with ECM and related conceptual/empirical issues discussed in the
literature.
77
We assume that CP and v*P are the phases. In Chomsky (2015), feature inheritance was
assumed, along with the ‘phase head’ property. We put aside that complexity here.
phi features themselves. In short, a single unique feature set (the phi features
shared by IA and R) serves as the label.78
Let’s continue the derivation up to the next phase, the C phase. First, we
merge INFL (I).
Note that the grey shading indicates material that is no longer accessible to
Merge given the PIC.
The EA the fox now merges (for labeling purposes) to Spec of INFL (the
syntactic subject position). This must be by IM; the Spec-of-INFL position is
a non-theta position, and hence, by Preservation, merging an NP argument to
this non-theta position can’t be by EM. That is, we can’t build a duplicate of the
fox, EM this duplicate in the Spec-of-INFL position, and then, at the phase level,
use FC to make the identical inscriptions of the fox copies. Thus, EA will
internally merge to Spec of INFL:79
As traced in Section 4, this application of Merge does not violate MY; the lower
instance of EA, the fox, is not found by Search, only the higher identical
inscription of the EA is; thus, the lower instance is inaccessible to Merge
(hence the strikethrough of the lower inscription) and so does not count as
expanding the WS. We then merge the phase head C, reaching the next phase
level, CP:
(61) WS = [{C {EA, {I, {EA, {v*, {IA, {R, IA}} }}}}}]
Now that we are at the phase level, FC(EA, EA) can apply, rendering copies.
Assuming that labeling is appropriately carried out, the derivation is complete,
ultimately yielding the structure pronounced as the fox ate a pear.
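The role FC plays at the phase level can also be put in toy form. In the sketch below (our own encoding: occurrences, c_commands, struct_eq, and form_copy are our names, with the cc-configuration implemented as path-based c-command), FC relates structurally identical occurrences and silences the lower one, as in the object-shift step of this derivation:

```python
class SO:
    """A syntactic object; each instance is a distinct inscription."""
    def __init__(self, label=None, parts=None):
        self.label, self.parts = label, parts

def occurrences(so, path=()):
    """All (path, subterm) pairs; paths distinguish the two positions
    of a single IM-ed inscription."""
    yield path, so
    if so.parts:
        for i, part in enumerate(so.parts):
            yield from occurrences(part, path + (i,))

def c_commands(p, q):
    """Path p c-commands path q iff q branches off under p's sister."""
    return bool(p) and len(q) >= len(p) and p[:-1] == q[:len(p) - 1] \
        and q[len(p) - 1] != p[-1]

def struct_eq(x, y):
    """Structural identity of inscriptions."""
    if x is y:
        return True
    if (x.parts is None) != (y.parts is None):
        return False
    if x.parts is None:
        return x.label == y.label
    (a, b), (c, d) = x.parts, y.parts
    return (struct_eq(a, c) and struct_eq(b, d)) or \
           (struct_eq(a, d) and struct_eq(b, c))

def form_copy(phase):
    """Optional FC at the phase level: relate structurally identical
    occurrences in a cc-configuration; the lower one goes silent."""
    occs = list(occurrences(phase))
    copies, silent = [], set()
    for p1, hi in occs:
        for p2, lo in occs:
            if len(p2) > len(p1) and c_commands(p1, p2) and struct_eq(hi, lo):
                copies.append((p1, p2))
                silent.add(p2)     # economy: lower copy unpronounced
    return copies, silent

# The v*P phase {EA, {v*, {IA, {R, IA}}}} with IA shifted by IM.
the, fox, v, R, a, pear = (SO(x) for x in ('the', 'fox', 'v*', 'R', 'a', 'pear'))
IA = SO(parts=(a, pear))
EA = SO(parts=(the, fox))
phase = SO(parts=(EA, SO(parts=(v, SO(parts=(IA, SO(parts=(R, IA))))))))
copies, silent = form_copy(phase)
print(copies)    # [((1, 1, 0), (1, 1, 1, 1))]: FC(IA, IA)
print(silent)    # {(1, 1, 1, 1)}: the lower occurrence is unpronounced
```

Note that form_copy knows nothing about how the structure was built, matching the text’s point that FC applies to representations with no knowledge of the derivation.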
It’s worth pausing for a moment to consider a few points from this simple
derivation. We stress first that there is just a single operation, Merge. The
expressions ‘internal’ and ‘external’ Merge are for ease of exposition and
have no theoretical significance; they are simply modes of application of
Merge. Merge is doing the same thing with both ‘external’ and ‘internal’
Merge: it is always (i) targeting elements P, Q, and (ii) adding the set {P, Q} to the WS.
78
The full details of the labeling mechanism go beyond the scope of the present discussion; see
Epstein and colleagues (2014).
79
Note, furthermore, that efficiency considerations also favor IM over EM, where possible. On the
one hand, IM requires one instance of Merge; EM, on the other hand, as specified in the text,
requires building a separate instance of {the, fox} and then externally merging it to create the
Spec-of-INFL position.
Sentence (62) is perfectly acceptable, but not (63) on the interpretation “the fox
chased itself.” Let’s work through the derivations under current assumptions.
We first Merge the object and the verbal root R (i.e., CHASE):
At this point, an argument must be merged in Spec-v* to discharge v*’s Agent theta
role. Given Preservation, we can’t achieve this via IM of IA (either from
Complement-R or, with object shift, from Spec-R); only EM can put an NP
argument into a theta position. Thus, IM is blocked here. Suppose, then, that we
build a duplicate of the fox and try to EM this duplicate into the subject position,
yielding
(66) WS = [{X, {v*, {(IA), {R, IA}}}}] where X = a duplicate of {the, fox} built
independently
At the phase level, FC(X, IA) cannot apply: the two structurally identical
inscriptions would each receive a theta role from the same theta role
assigner, namely, the v*-R complex, in violation of univocality.80 There is thus
no way to generate (63) on the intended interpretation, neither by IM nor by
EM. As a final point, we are
adopting the simplest assumption that FC is optional. If FC(X, IA) does not
apply in (66), then we get a legitimate derivation for the well-formed ‘The fox
chased the fox’ (where there are two foxes), the right result.
The passive (62), on the other hand, is perfectly fine. We start again with (65),
and Merge in INFL (I) but since the syntactic subject of a passive construction is
not a position to which a theta role is assigned, nothing prohibits IM of the IA to
create the (non-theta) Spec-of-INFL position.81 The result of such Internal
Merge is:
Since passive v does not assign a theta role to the Spec-of-v position, FC(X, IA)
(where X is now the higher of the two structurally identical inscriptions of the
IA) can apply without violating univocality.82 Thus (X, IA) are copies, the lower
copy is not pronounced, and we derive the surface form The fox was chased.
Similar reasoning holds for object raising in unaccusative sentences, subject-to-
object raising (ECM), and subject-to-subject raising.
Consider, for instance, the key points of the derivation of ECM construction (68):
The lower semantic subject, the fox, is externally merged into the Spec-of-v-EAT:
Now EA can internally merge into the non-theta Spec-of-R position (for Case
and labeling). By Search, the lower EA inscription is inaccessible and hence the
80
Note that there is an alternative explanation for why FC is blocked in (66): the PIC; that is, FC
cannot apply across the phase head. Under this view (66) is disallowed since FC(X, IA) can’t
apply, meaning that X and IA are repetitions, yielding The fox chased the fox (two different
foxes); but not The fox chased (meaning the fox chased itself). In control cases like Bill tried to
Bill leave, FC could apply on the assumption that both instances of Bill are within the same
phase. See Chomsky (in press) for further comments.
81
Here we make the standard distinction between weak v (not a phase head) and strong v* (a
phase head). We put aside issues with the merge of to.
82
And under the alternative view suggested in footnote 80, FC could apply here since both of the
identical inscriptions are included in the same phase. See Chomsky (in press) for further
consequences.
WS has not expanded. Finally, we merge in the higher phase head v* (and its
external argument X):
Here, FC({the, fox}, {the, fox}) applies, ultimately yielding (68), after exter-
nally merging INFL and internally merging external argument X to Spec of
INFL. Raising structures like (72) pattern in essentially the same way.
(72) a. the student seems to be confused by the new concepts/happy about the
assignment/ etc.
b. the student was expected to be confused by the new concepts/happy about
the assignment/ etc.
We see, then, that the various first- and third-factor principles conspire to constrain
the application of Merge, yielding just the right empirical results in these core cases.
The derivation of the control structure the man tried to sleep proceeds as follows.
We first build the lower, infinitival clause.
We assume that there is no lower CP. The matrix root TRY is introduced into the
WS and externally merged with the infinitival, followed by the introduction and
merger of matrix v*:
Given Preservation, the lower external argument (EA) {the, man} can’t be
internally merged from the lower (theta position) to the Spec-v* position, that
is, to the higher external argument position: IM is only to non-theta and not to
theta positions.83 But another option is available. First, we build within the WS
a duplicate instance of {the, man}, completely independent of the first:
83
We leave open the exact status of the infinitive marker to. It could be argued that to is
a morphological reflex of the bare form of the verb. In any case, whether or not the EA moves
through Spec of to, it can’t IM into the higher Spec of TRY position, given Preservation.
We now externally merge this new instance of {the, man} to Spec-v*; this is EM
of an NP argument creating a theta position, sanctioned by Preservation:
This matrix v*P is a phase and thus FC and labeling apply. FC will apply as FC
({the, man}, EA), rendering these structurally identical inscriptions copies. The
lower copy, EA, is not pronounced at SM if externalization is opted for. The
phase-head complement is transferred and now inaccessible for further syntac-
tic operations, given the PIC:
Crucially, the result here does not violate TT since {the, man} and its structur-
ally identical copy, EA (= {the, man}), get a theta role from different theta role
assigners, in complete conformity with univocality.
Next, we construct the higher C phase (79) in three steps, where EA2 stands
for {the, man} in (78):
(79) WS = [{C, {EA2, {I {EA2, {v*, {TRY, {to, {EA, SLEEP}}} }}}}}]
84
We assume transitivity of the FC relation.
85
We assume that FC applies optionally. The assumption leads to an empirical consequence,
namely potentially SM-blocked but CI-convergent derivations of John tried Mary/John to win,
raising factual questions that there is no obvious way of answering.
Note furthermore that FC is constrained by MS, as attested by contrasts like the following:
(80) a. NP seems [NP to win] = the man seems to the man win = ‘trace’
b. NP tried [NP to win] = the man tried to the man win = ‘PRO’
In each of the two structures, the relation between the NPs is the same in that
they are copies resulting from FC(NP, NP) applying at the phase level. The
distinction is not in the featural makeup of the elements themselves (as was the
case with trace vs PRO). Looking just at the representations (80a) and (80b)
themselves, as FC does, there is no distinction between the lower NPs.
However, the derivations associated with the two are distinct and this distinc-
tion is naturally detected by the interpretive component at the point that
interpretation takes place: in (80a), the lower NP was internally merged to the
higher position, whereas in (80b) the lower and higher NPs were created
independently and each was externally merged into its respective theta position.
The lower copies in (80a) and (80b) are structurally identical elements in a theta
position c-commanded by some higher structurally identical NP. The interpret-
ive component can detect a difference by being equipped with thematic infor-
mation: the lower copy functions as ‘PRO’ if its c-commanding identical
element occupies a theta position; otherwise, it is ‘trace.’ When they function
as trace or PRO is thus deducible.
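This deduction can be stated as a single decision rule. A minimal sketch (the function name and boolean encoding are ours):

```python
def label_lower_copy(higher_in_theta_position: bool) -> str:
    """The deduction in the text: the lower copy of an FC pair
    functions as 'PRO' if its c-commanding identical element occupies
    a theta position, and as 'trace' otherwise."""
    return 'PRO' if higher_in_theta_position else 'trace'

# (80a) the man seems [the man to win]: higher copy sits in the
# non-theta Spec-of-INFL of seems -> 'trace' (IM-derived).
print(label_lower_copy(False))   # trace
# (80b) the man tried [the man to win]: higher copy sits in the theta
# position of tried -> 'PRO' (a separately EM-derived inscription).
print(label_lower_copy(True))    # PRO
```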
The crucial distinction is whether the structure is created by IM (for ‘trace’)
or by EM (for ‘PRO’). We’ve already detailed the derivation associated with
control, (80b). Let’s turn then to (80a). Suppose we have built up to:
(81) WS = [{I, {seems, {to, {NP1, win}}}}] (Subscripts are used only for ease of
exposition and NP = {the, man}.)
(82) Merge(NP1, IP, WS) where IP = {I, {seems, {to, {NP1, win}}}}
By looking over the steps of the derivation, which Preservation by definition has
access to, Preservation can detect the identity relation between NP1 in (81) and
NP1 in (83); it is the very same NP1 inscription, and Preservation is satisfied
since the raised NP does not accrue an additional theta role and hence does not
change meaning. As we’ve seen, this identity relation does not hold with
control; with control, separate instances of the NP are created and put into the
copy relation by FC.
In this way, the classic trace versus PRO distinction arises, not by stipulation
but as a natural consequence of SMT. Consider, for instance, the contrast in (84)
from Burzio (1981, 1986)86:
(84) (a) one interpreter each seems [t to have been assigned t to the diplomats]
(b) *one interpreter each tried [PRO to be assigned t to the diplomats]
This distinction arises not because of a difference between PRO and trace, but
because of the difference in the derivations of the two structures: (84a) involves
IM of one interpreter each from the positions marked t, with reconstruction in
those positions, while in (84b), given Preservation, there are two separate
instances of one interpreter each (as we have just detailed) and there is no
reconstruction; rather, an independent element is interpreted in place, and thus
each is stranded. Similar
reasoning holds for the other classic cases of the traditional trace versus PRO
distinction, including (85) versus (86), from Chomsky (1965),87
where (85a) and (85b) have distinct interpretations, while (86a) and (86b) are
interpreted in the same way. The distinction follows from structurally identical
inscriptions being derived from IM, in which case in the eyes of Preservation
they are in fact one and the same inscription, as opposed to the structurally
identical inscriptions being constructed from separate instances of EM. What is
traditionally referred to as ‘trace’ is the identity relation confirmed by FC, while
‘PRO’ is a separate instance of NP put into the copy relation via FC. As noted in
Chomsky (2021b, p. 22): “The distinction between the two kinds of copy seems
well established from several perspectives. It therefore provides empirical
86
There are many additional illustrations of the trace versus PRO distinction. Thus Kayne (1975)
gives:
See also Burzio (1986) for similar cases in Italian. These also follow from the present framework.
See Chomsky (2021b) for a range of additional examples.
87
It should be noted that the Aspects model of Chomsky (1965) did not use ‘trace,’ but,
presupposing object raising, the contrast in (85)/(86) translates to the trace versus PRO
distinction.
support for the assumption that the Duality principle . . ., on which the distinc-
tion rests, is indeed an LSC [Language Specific Condition].”
There are therefore a number of different relations between structurally
identical inscriptions: Identity, Copy, Repetition. The identity relation is one
that can only be detected across steps of a derivation. Thus, in the mapping from
(87) to (88), via Merge(NP1, IP, WS), the inscriptions in the input and output
WS are identical, the very same inscription:
the relation between the structurally identical inscriptions NP1 and NP is one of
copy, assuming FC(NP1, NP) at the phase level. In effect, FC confirms the
identity relation of an application of IM. But FC also applies in some cases of
EM, as in control. Finally, structurally identical inscriptions can be repetitions,
as in, say, Many people praised many people (where, as we’ve seen, there are
two distinct sets of people). In short, structurally identical inscriptions can be
identical or not. In the former case they are IM-derived (IM-trace); in the latter
case they may be either copies by FC (EM-PRO) or repetitions, independently
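The three-way classification can be sketched directly (our own toy encoding: token identity stands in for the identity relation, and an explicit list of FC pairs stands in for the record FC leaves):

```python
def relation(x, y, fc_pairs):
    """Classify two structurally identical inscriptions (toy encoding):
    token identity -> Identity; FC-assigned -> Copy; else Repetition."""
    if x is y:
        return 'identity'       # the very same inscription (IM-trace)
    if any((a is x and b is y) or (a is y and b is x) for a, b in fc_pairs):
        return 'copy'           # related by FC (e.g., EM-PRO)
    return 'repetition'         # independently generated

# Two independently built inscriptions of 'many people':
np1, np2 = ['many', 'people'], ['many', 'people']
print(relation(np1, np1, []))            # identity
print(relation(np1, np2, [(np1, np2)]))  # copy
print(relation(np1, np2, []))            # repetition
```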
generated. The interpretive component, equipped with Duality, can see this
difference relative to the representations to which it applies: the matrix subject
88
For similar comments on the history of Merge, see Epstein and colleagues (2022), from which
this section draws.
89
It should be noted, however, that this is a standard misinterpretation of Humboldt, who was
talking about the ‘creative use’ of an internal system, that is, production (‘performance’), not
generation by a recursive system that made creative use possible (‘competence’), a distinction
that did not exist clearly until mid-twentieth century (Chomsky 1965, 1966a).
where who is interpreted relative to its position as the object of visit, but who is
pronounced relative to its position as the initial element of the sentence.92
So, more than just PS rules were needed. A different set of operations,
embedded within the transformational component, was proposed. These oper-
ations took the hierarchical phrase structure built by PS rules as input, manipu-
lated it in various ways, crucially including displacement, and gave a modified
phrase marker as output. Thus, it seemed during the pre-Merge period that two
fundamentally different components were necessary:
Through most of the history of the theory, displacement (one property of the
transformational operations) was considered an oddity and a complication:93
90
The terms Deep Structure and Surface Structure refer to levels of representation within the
earlier framework (see in particular Chomsky 1965), where Deep Structure was an abstract
representation resulting from the application of certain rules (phrase structure rules plus lexical
insertion) that could then be mapped onto Surface Structure via transformational operations. The
bottom-up Merge-based framework eliminates these levels of representation.
91
The comparison of the Merge system with LSLT is necessarily imprecise as they are conceptu-
ally quite different.
92
Its scopal properties are also determined from the clause-initial position.
93
In fact, various alternative programs emerged, some attempting to eliminate the transform-
ational component; see, for instance, Harman (1968); the GPSG framework of Gazdar; see
why would human linguistic systems have displacement, whereby the sound of
a category is determined relative to one position, but its meaning is determined
relative to another (e.g., who in Who did you visit? as discussed above),
a nonoptimal design with respect to communicative efficiency. As is well
known from the parsing literature, comprehension of sentences displaying
displacement of a wh-phrase, yielding ‘filler-gap’94 dependencies, imposes
a burden on the speech perception device.95 On hearing who at the outset of
such a sentence, the comprehender’s parser must store this NP, continue parsing
the ensuing input, and then identify the position from which who was moved,
thereby recovering the meaning structure. For now, we note that at the time there
was no explanation for displacement beyond the statement that transformations
had defined within them the power to carry out the dissociation.96 As we will see
later, an explanation had to await the introduction of Merge; an explanation for
displacement phenomena in terms of hierarchical structure was essentially
unformulable with only the rewrite rules of phrase structure grammar, whose
concern was the generation of terminal strings (weak generative capacity). The
structure it could express was the kind of structure that could be determined
from any derivational sequence of a generable string. This was unprincipled
because any grammar that fulfilled the task would be a success. But more
importantly, linguistically significant generalizations could not be expressed,
and explanatory adequacy was beyond reach (Chomsky 1956). Structure
dependence of rules and of displacement showed the inadequacy of string-based
rewrite systems (Berwick and colleagues 2011; cf. Chomsky 1980). To overcome
these fundamental shortcomings and attain some level of explanatory adequacy,
transformational rules that map phrase markers (PMs) onto PMs were intro-
duced to explain, rather than merely describe, the structure of displacement. The
Gazdar and colleagues (1985); the LFG framework of Bresnan – see, for example, Bresnan
(1982); see also Steedman (1987) on CCG; and the Functional Grammar of Dik (1987) and
later work.
94
The so-called ‘filler’ refers to who in initial position, and the ‘gap’ is the object position of visit
with respect to which who is interpreted. In fact, as was noted, who has a dual interpretation, an
interpretation relative to object position (its theta role), and an interpretation relative to the
operator position binding the object; that is, who is also interpreted as a quantifier binding
a variable in VP, hence it’s interpreted in the higher position as well.
95
See Chomsky (2019b) for detailed discussion. It would seem that language is not particularly
well designed for efficient communication, contra the twentieth-century behaviorist/structuralist
conception that sees language as fundamentally a system of communication, still virtual dogma
in philosophy of language and most of cognitive science.
96
Nor could displacement be explained in string-producing rewriting systems. At best it could be
‘described’ (weak generative capacity of terminal strings). Tree structure is ruled out for string
sets of mildly context-sensitive languages like Swiss German (Huybregts 1984; Shieber 1985).
Displacement was a problem even for simple context-free grammars, forcing extra devices (the
‘slash features’ of GPSG simulating ‘movement trajectories’). See also Gazdar and colleagues
(1985).
In fact, however, no such rules were ever proposed. Rather, for the major lexical
categories (noun, verb, adjective) what we find is endocentricity. So, we can
ask: why does a rule like (95) have the properties it has?
(95) VP → V NP
For instance, why is the ‘mother node’ on the left labeled VP (and not NP or
something else entirely)? And more generally still, why is there a label at all; for
note that at one level of abstraction (95) is no different than, say, X → Y Z; that
is, it is a pure stipulation that VP is ‘above V.’ Within the earlier theory, these
questions were simply not addressed; rather, PS rules were axiomatic and any
single phrasal category could be rewritten as any sequence of categories and
thus the existence and categorial status of mother labels were pure stipulation;
that VP was above V in (95) is arbitrary. There was no answer to the question:
Why these rules and not others? Such considerations led to the next stage in the
97
For an accessible account of this, see Everaert and colleagues (2015). Chomsky (1956) was the
first important study to show precisely that.
only the general X-bar format (i.e., the abstract structure that all phrases share)
as part of UG.98
Under X-bar theory, all phrases were strictly endocentric: each phrase was
assumed to contain a unique head whose lexical category label determined the
label of the phrase containing it. Thus, exocentric PS rules like (94) above are
excluded, as is the PS rule for sentence in (96) along with the label S.99
(96) S → NP VP
Ultimately, all such cases were ruled out under strict adherence to X-bar theory,
as all structures must be ‘headed.’ A partial explanation for ‘why these rules and
not others’ was thus possible.
Furthermore, massive simplification of the PS component resulted. Rather
than the language-specific PS rules of, say, French or Japanese, there remained
only the general phrasal template, the idea being that all phrases of all languages
had the same abstract structure. Note further that this exposed the artificiality of
‘constructions,’ showing that they have no theoretical status but are rather like
‘terrestrial mammal’ in biology.100
Although it may not have been realized at the time, X-bar theory created the
possibility of factoring out linear order: X-bar projections encoded the structural relations of the elements within, but not their linear order. This was perhaps the first step in dissociating linear order from hierarchical structure, a step that led to the
result that internal language exclusively relies on structure and ignores linear
order,101 which is relevant only in externalization. Standard PS rules conflated
two relations, dominance and precedence. It was eventually realized that X-bar
theory encodes dominance only, pushing linear order into another domain,
ultimately to ‘externalization,’ that is, a property of phonological interpretation,
not meaning. This disentangling of dominance and precedence, along with
explaining their existence as subservient to the interfaces (dominance for
98 Note that there is some similarity to Harris’s Morpheme-to-Utterance procedures of analysis; see Harris (1952). Note furthermore that the X-bar format alone cannot generate structures, so there was the implicit assumption that there must be some structure-building operation; one might argue for something like ‘generate alpha.’ Under X-bar theory, only structures well-formed according to the X-bar template were allowed.
99 It should be pointed out that in the original formulation of X-bar theory, Chomsky (1970), (96) was adopted; the endocentric analysis of clause structure was proposed years later in Chomsky (1986), where it was designated “the optimal hypothesis” (p. 3). Not only did it unify the phrase structure of lexical and nonlexical categories, but it seemed to be empirically motivated in terms of head-to-head relations (selection) which occur between C & T, T & V, and V & C.
100 With the elimination of ‘constructions,’ there was also the elimination of such (rather pointless) questions as whether John was expected to win was a raising or a passive structure.
101 A reviewer points out that “interestingly, though, even classic PS grammar itself created such a possibility.” Thus, Chomsky (1965, pp. 124–126) discusses this point, but, as the reviewer further notes, Chomsky (1965) rejected, at that point, removing linear order from PS grammar.
ment), but the system overall was massively simplified, with no remaining
language- or construction-specific operations. We’ve traced a few of the steps
in the historical development of structure-building operations. We’ve seen:
We also see the apparent need for separate systems: structure building and
transformations. Finally, we see the role of simplification, with steps toward
explanation. Rather than the language-, and construction-specific operations of
standard theory (the rules of French or the rules of Russian; or the rules of
relative clause formation), what emerged are very general operations and
principles, common to human language, while concomitantly deriving linguis-
tic variation and deeper insight into the nature of the language faculty, without
losing empirical coverage. With the emergence of the Principles and Parameters
model, languages are the same in conforming to the X-bar template (and
conforming to general principles), but different in linear order and in the
value of certain parameters.102
We stress too that the general linguistic principles that arose (like, for
instance, subjacency, ECP, binding conditions, or the head movement con-
straint), while not language or construction specific, were nonetheless specific
to the faculty of language, something that underwent a radical change in the
ensuing period of research. Overall, then, progress can be claimed.
But, as we see in the history of science generally, we’re never satisfied.
The unrelenting quest for explanation continued. X-bar theory, for instance,
raised a new set of questions – again, following the theme of a continued
quest for yet deeper explanation. Namely, why should there be projection,
and why should it conform to the X-bar template?103 Furthermore, the
transformational component (now reduced to just Move-alpha, and more
generally still, Affect-alpha, as in Lasnik and Saito 1992) remained
a mystery: why should there be displacement? These questions were taken
up in the next major stage of the development of the theory, what is referred
to as the Minimalist Program.
since. That is, from representations like (97) to those like (98).104
(97)
102 Among many others, see Baker (2001), Lightfoot (1993), and Roberts (2019).
103 In some cases rules were proposed for the basic structure of the clause that did conform to X-bar; various possible heads were explored, such as Infl or T, but these seemed stipulative. Some phrases seemed to be exocentric, leading later to labeling theory; see Chomsky (2013, 2015), among others.
104 For important discussion, see Fukui and Speas (1986).
Guided by this method, one question pursued in BPS is: What’s the least we can
say about human phrase structure; what’s required by virtual conceptual neces-
sity? Certainly, elements larger than lexical elements exist; phrases exist; there
is in fact hierarchical structure. Thus, as Chomsky (1994, p. 4) notes: “One . . .
operation is necessary on conceptual grounds alone: an operation that forms
larger units out of those already constructed, call it Merge . . . ”
So, Merge was introduced in BPS as the central structure-building operation
of the narrow syntax, necessary on conceptual grounds alone.105 And the basic
105 If indeed X-bar theory requires some structure-building operation that generates structures for X-bar schemata to filter, then it is natural to ask:
idea was that Merge takes two syntactic objects, X, Y, and creates a new object
out of them; thus Merge (X, Y) = {X, Y}, and then a label for the new object is
constructed from X or Y. Thus, for BPS, Merge is defined as:
(99) Merge(X, Y) = {Z, {X, Y}}, where Z is the label of the object
Take two objects (binary), X, Y; put X, Y into a set, {X, Y}, and then label that
object with the syntactic category feature(s) of X or Y. BPS Merge, then,
represents an important step in the development of the theory of language;
and, as we’ve seen, it had profound consequences.
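The set-formation-plus-labeling idea behind (99) can be sketched in a few lines of code. This is purely our illustration, not part of the theory: lexical items are modeled as hypothetical (category, form) pairs, and the label is projected from whichever input is designated the head.

```python
# A sketch of BPS Merge as in (99): Merge(X, Y) = {Z, {X, Y}},
# where the label Z is constructed from the head, X or Y.
# The (category, form) tuple encoding of lexical items is illustrative only.

def bps_merge(x, y, head):
    """Form the set {X, Y} and label it with the head's category feature."""
    assert head is x or head is y, "the label must come from X or Y"
    label = head[0]  # the head's syntactic category feature, e.g. "V"
    return (label, frozenset({x, y}))  # the labeled object (Z, {X, Y})

v = ("V", "eat")
np = ("N", "apples")
vp = bps_merge(v, np, head=v)
print(vp[0])  # "V": the verb projects, so the larger object is 'verbal'
```

Note that the output still carries a syntactically represented label Z; the later move to simplest Merge consists precisely in dropping that first component.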
that takes objects X and Y, and forms {X, Y}, as in (99), repeated here
Collins (2002) was the first within the generative tradition to propose that labels
be eliminated from the representation of syntactic objects and thus that the
output of Merge is {X, Y} rather than {Z, {X, Y}},107 thus
The question above makes the transition from X-bar theory to Merge a natural move.
106 See Chomsky (2000, 2007, 2008).
107 See also Seely (2006) and Collins and Seely (2020).
Building on earlier ideas, e.g. Moro (1997, 2000), the absence of syntactic-
ally encoded labels is exploited in important new ways in Chomsky (2013,
2015), where Merge, defined in its simplest form, also applies freely. Of
course, Merge is third-factor compliant; thus it conforms to such principles as
the proposed Inclusiveness Condition, “no new objects are added in the course
of computation apart from arrangements of lexical properties” (Chomsky 1995,
p. 228), and the No-Tampering Condition (NTC), “Merge of X and Y leaves the
two SOs unchanged” (Chomsky 2008, p. 138).108
In adopting simplest Merge, the syntactic objects it creates (as in Chomsky’s
2013 analysis, aka PoP) do not have labels (clearly not in the sense of BPS).
How then is the information encoded by labels derived? The answer in PoP is:
What information does PoP focus on? PoP assumes that syntactic objects must
be identified, not just for interpretation at the CI and SM interfaces, but for
legibility more generally; an object must be identified as verbal, nominal, and so
on. Thus, PoP states:
{X, Y}. That is, it must be provided by X and/or Y, since that’s all there is. And
this is precisely what PoP does. Consider a simple verb phrase. As noted above,
with a classic tree-structure representation, the label VP is providing the infor-
mation that the object, namely V+NP, is ‘verbal.’ Deconstructing the label, we
see that, informally speaking, it has two ‘parts’: the ‘V’ and the ‘P.’ V provides
the information ‘verbal’ by virtue of V bearing verbal features, but note that the
V of ‘VP’ is just a copy of what’s already part of the syntactic object, namely the
verb V itself. The ‘P’ provides the information that it’s a phrase (and not a bare
verb); hence VP ≠ V. Consider now the simplest Merge representation adopted
by PoP for the VP, namely {V, NP}. The information that it’s a phrase is already
(and inherently) encoded by the set brackets {. . .}. It’s a ‘phrase’ because it’s
a set (i.e., it’s not a lexical item); hence the information ‘phrase’ follows
108 Note that in the version of Merge theory proposed in Sections 3–6, NTC is a consequence of Preservation, which also explains Duality and requires that deletion follow from an economy principle of externalization.
automatically. What about the information that the set (i.e., the phrase) is
‘verbal’? Somehow, we need to retrieve the relevant features (verbal vs nom-
inal, etc.) that are inherently borne by individual lexical items. The object-
identification information of phrases does not arise out of the blue; in fact, it’s
provided by lexical material. The ‘verbal’ of VP is clearly derived from the fact
that its head is a verb; it’s the lexical features of the verb that ultimately serve as
the identifier of the larger object. With PoP’s representation {V, NP}, the
identifying features are located in V via the independently available, third-
factor, principle of Minimal Search.
With respect to the SO {V, NP}, at the phase level, Minimal Search MS
‘looks into’ the set and finds its two members: V and NP. NP is itself a set and
qua set, it has no object-identifying features, which is to say that a set has no
lexical features; in fact, it has no linguistic features at all (a set is not a lexical
item). The lexical item V, on the other hand, bears relevant lexical features, in
this case the features ‘verbal.’ This featural information is automatically pro-
vided by third-factor Minimal Search, and the information is used by the
interfaces to identify the object; that is, the information ‘verbal’ is appealed to
for object identification. The search results are freely provided by Minimal
Search; in the case of {V, NP}, it basically says: I found a set (=NP) and a verbal
element V; that is, I found the two members of the set I’m searching. The
interfaces in fact can use the information ‘verbal’ and do so, interpreting the
object as such; the interfaces avail themselves of information that is automatic-
ally given for free by Minimal Search.
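The labeling-by-search procedure just described can be illustrated with a small sketch (our encoding, not PoP’s formalism): frozensets model phrases, which bear no lexical features of their own, tuples model lexical items, and a breadth-first search returns the closest feature-bearing element(s).

```python
# A sketch of Minimal Search over label-free, set-based syntactic objects.
# Tuples stand in for lexical items (feature bearers); frozensets stand in
# for phrases, which bear no lexical features of their own.

def minimal_search(so):
    """Return the closest lexical item(s) inside a set-based object."""
    frontier = [so]
    while frontier:
        lexical = [x for x in frontier if isinstance(x, tuple)]
        if lexical:
            return lexical  # stop at the shallowest feature bearers
        frontier = [member for s in frontier for member in s]  # one layer down
    return []

NP = frozenset({("D", "the"), ("N", "apples")})  # internal structure illustrative
VP = frozenset({("V", "eat"), NP})
print(minimal_search(VP))  # finds ("V", "eat"): V identifies the phrase as 'verbal'

# In an exocentric {NP, VP}, the same search would surface several heads at
# the same depth -- the case for which PoP provides further mechanisms.
```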
The matter gets more complicated with ‘exocentric’ structures such as {NP,
VP}, where there is no single head found by Minimal Search. PoP provides
further, natural mechanisms, ultimately appealing to Minimal Search; we put
aside those additional details here; see Chomsky (2013, 2015), see also Epstein
and colleagues (2016), for discussion.
Thus, PoP takes labeling to be the process of finding the relevant object-
identifying information of {X,Y} generated by Merge. PoP proposes that such
labeling is “just minimal search, presumably appropriating a third factor
principle, as in Agree and other operations” (Chomsky 2013, p. 43). So,
labeling is not syntactically represented. No new categories are created in
the course of a derivation (which, in fact, reduces to Inclusiveness). ‘Labeling’
is simply the name given to the independently motivated Minimal Search
procedure, itself third factor and hence not stipulated. PoP eliminates labels
and projection, replacing them with a labeling algorithm that is an instance of the
general principle of Minimal Computation, hence gaining yet greater depth of
explanation.
of binding theory on which the theorem was based, as well as case theory and the
Barriers theory of bounding, which subsumes the Empty Category Principle
(ECP) and the Constraint on Extraction Domains (CED). The elements entering
into these principles were no longer formulable and had to be eliminated.
However, a formidable problem arose: how do we explain the empirical effects
of these former principles? After all, Principles and Parameters had been an
incredibly rich paradigm that had addressed significant theoretical questions
involving masses of interesting empirical phenomena from a wide variety of
languages. That was not a question that could be immediately answered in every
case. But there were successes, some sooner than others. The
bounding theory has been reformulated in phase-based generation as a Third-
Factor Resource Restriction (see Chomsky 2008). The Empty Category Principle
109 The strongest view is that Merge is not just the only structure-building operation, but the only operation at all, relegating Agree to externalization; see Epstein and colleagues (2022) and Chomsky and colleagues (2019) for discussion.
was unified with EPP under general labeling requirements (see Chomsky 2013).
Obligatory Control has received a principled explanation under the enabling
function of SMT that allowed FC to apply in c-command configurations of structurally
identical inscriptions (see Chomsky 2021b and the discussion in Section 6).
Binding had already been discussed in Chomsky 1993 as an element of CI
interpretation, perhaps a variant of FC. Finally, Subject Island effects, accounted
for by CED, should now be explained under segregation of A- and A-bar systems
(see Section 8).110
A particularly far-reaching example of the maximization of Merge is the
reduction of the transformational component to structure building, that is, the
reduction of PS rules and transformations to a single operation of simplest
Merge, which constructs hierarchical structures with no designated linear order
or labeling.111 We’ve stressed that through much of the history of Generative
Grammar, PS grammar and transformational grammar (TG) were considered
fundamentally distinct, consisting of the unique component-internal operations
of structure building, on the one hand, and structure manipulation on the other.
But, beginning with Kitahara (1995, 1997) and continuing through Chomsky
(2013, 2015) and beyond, we find that PS grammar and TG can be collapsed
into simplest Merge. Merge(X, Y)={X, Y} unifies modes of application: X and
Y can be separate (External Merge) or one of X, Y can be contained within the
other (Internal Merge), and these applications just correspond to structure
building (contiguity) and displacement (discontiguity), respectively. Thus,
Merge can take as input
target the two syntactic objects {the, women} and {eat, apples}, and form from
them the new object,
But Merge can do exactly the same thing with the input in
targeting the two syntactic objects {{the, women} {eat, {which, apples}}} and
{which, apples} resulting in
110 See Freidin and Vergnaud (2001) for important discussion. On the reduction of the control component to movement, see O’Neil (1995) and Hornstein (2001).
111 In early minimalist proposals Merge (composition) and Move (displacement) replaced X-bar and Move-alpha, respectively. Merge was then reformulated as EM and Move as IM, and still later EM/IM were unified under simple Merge.
112 For more detailed discussion of the balance between empirical coverage and the quest for explanation in scientific inquiry, see, among others, Epstein and Seely (2002, 2006).
In Citko’s analysis, both conjuncts share the same what (i.e., what is parallel
merged from one conjunct to the other) and it’s this unique object that is A-bar
extracted. For reasons discussed in Section 4, we conclude that PM does not
apply.113 In fact, however, there is a way to derive the core properties of this
structure (having the same ‘copy’ relations intended on the multidominant
analysis), along with empirical merits,114 with only the necessary mechanisms
(‘bare essentials’) that SMT provides for Merge adopted here. Multiple, separ-
ately created instances of what are externally merged into their respective
conjuncts. Then, one of the inscriptions of what (it doesn’t matter in which
conjunct) can be internally merged into the matrix Spec-C position, at which
point Form Copy (FC) relates it to both lower inscriptions (Chomsky 2021a,
2021b; see Blümel 2014 for related ideas).
(108) a. [CP what1 C [INFLP1 Gretel [vP1 what2 . . .]] and [INFLP2 Hansel [vP2 what3
. . .]]]
b. FC(what1, what2), FC(what1, what3) (indices used only for exposition)
Though space prohibits discussion of the empirical details, this kind of applica-
tion of FC provides promising directions for the analysis of ATB extraction and
related phenomena (e.g., parasitic gaps).115
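The configuration in (108) can be sketched as follows. This toy model abstracts away from the actual derivation (EM of the lower inscriptions, IM to Spec-C): inscriptions are simply modeled as distinct tokens with identical features, and a toy FC relates the raised inscription to each matching lower one. All names and the matching criterion here are our illustrative assumptions.

```python
# A toy model of Form Copy over (108): distinct inscriptions of "what" with
# identical features; FC pairs the high inscription with each lower one.

class Inscription:
    """A token of a lexical expression; object identity distinguishes tokens."""
    def __init__(self, features):
        self.features = features

def form_copy(high, lows):
    """Relate the high inscription to each featurally identical lower one."""
    return [(high, low) for low in lows if low.features == high.features]

what1 = Inscription(("D", "what"))  # the inscription in matrix Spec-C
what2 = Inscription(("D", "what"))  # externally merged in the first conjunct
what3 = Inscription(("D", "what"))  # externally merged in the second conjunct

copies = form_copy(what1, [what2, what3])
print(len(copies))  # 2: FC(what1, what2) and FC(what1, what3)
```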
Consider next a question that arises given the FC-based analysis of obligatory
control. Hornstein (1999 et seq.), building on the work of O’Neil (1995, 1997),
develops a Movement Theory of Control (MTC), where control involves
movement of an NP from one to another theta position. As we’ve seen, the
analysis proposed here is quite different: the FC-based approach posits EM-plus
-FC; there is no IM for control, IM being ruled out by Preservation. One
question involves control into adjuncts.116 A major component of the MTC is
Sideward Movement (subject to the same concerns as Parallel Merge, from the
113 Besides the arguments we’ve already given, PM also requires a distinct notion of ‘copy’ more complex than the one adopted in this Element. For example, ‘copy’ could be implemented with indices as in Collins and Stabler (2016); see also Citko and Gračanin-Yuksek (2021, ch. 2); but note that such indices violate the Inclusiveness and No-Tampering conditions.
114 For example, FC-based derivation also explains the impossibility of covert ATB movement, a result that Citko (2005) derives as a contradiction in linearization.
115 Among the challenges for this approach is to explain the unique properties of Right-Node-Raising, such as cumulative agreement (see Citko 2017 for details).
116 For recent discussion of adjuncts within the minimalist framework, see Bode (2020).
present perspective; see footnote 113), argued to apply for control into adjuncts
as in (109).
The relevant adjuncts occur above the external argument’s base position in
Spec-vP but below its canonical surface position in Spec-INFL.117 Hornstein
(2003, pp. 30–31) argues that this requires Sideward Movement of the subject
from the adjunct into Spec-vP, yielding (109b) as output, and allowing the
derivation to proceed from there to (109c). This derivation is impossible for
us since, as we’ve noted, Sideward Movement is disallowed (and note that this
derivation violates Duality of Semantics/Preservation). The FC-based approach
would, in contrast, construct the vP and the PP of (109b) independently, then
raise one or the other inscription of Priya to Spec-INFL, and then apply FC to
link the high inscription to both lower ones, much like what we suggested for
(108). Note further that under our FC analysis of control, the NP-trace versus
PRO distinction, which is crucial empirically, falls out; as detailed in subsection
6.2, if the antecedent of an NP is in a theta position, it’s ‘PRO,’ otherwise it’s
‘trace.’118
cyclicity. Much previous work (e.g., Chomsky 2001 et seq.) assumes that
successive-cyclic A-bar movement is forced by the Phase Impenetrability
Condition (PIC), the phase level being the derivational point when an object
117 Consider the Condition C effect in (i), showing that such PPs are below Spec-INFL, and the adjunct-stranding VP-ellipsis in (ii), showing that the PPs are above Spec-vP. See also Ernst (2002) and Truswell (2011).
118 Partial Control would thus involve a structure quite different from that of exhaustive, obligatory control discussed in Section 5.3.
facilitating new analyses of, for example, hyper-raising, among other things.
The nature of, and ultimate explanation for, the A/A-bar distinction constitutes
an important question which we leave open here.
Within the framework of this Element, these issues remain essential.
Consider (110b), for instance, derived via IM from (110a).
At first glance, this derivation seems to violate MY; not only is the syntactic
object labeled CP in (110b) constructed by Merge, but now there are two
accessible inscriptions Emre, where before there was only one. However, the
119 That is, DP (Matushansky 2005; Jiménez-Fernández 2009, i.a.), or nP, if n heads the nominal domain.
The present approach could fare better here, given the availability of n-ary
Form-Set: the n-ary operation is available as the general procedure at no cost,
with binary Merge as a limiting case. Nevertheless, coordination retains prob-
120 Crucially, we assume Merge does have knowledge that Emre and Emre1 are formed by IM in generating INFLP, but this information is lost by the next mapping.
121 The empirical puzzles involving respective-predication were first discussed in McCawley (1968).
fact that important judgments can be quite murky. Without a clear understand-
ing of such data, competing analyses can be difficult to compare.
Even more fundamentally, the formal principles underlying island effects are
unclear. Familiar mechanisms like the PIC are inadequate on their own to model
islandhood; the PIC does not block extraction per se, just extraction beyond the
phase edge (with extraction through the edge remaining possible). A challenge
then is to identify plausible formal principles by which extraction could be
constrained. Recent work suggests that island effects derive crucially from the
interaction of the PIC with other, independent principles (e.g., Sichel 2018,
among others, on the interaction of the PIC and Anti-Locality). Additionally, it
has been well known since Miller and Chomsky 1963 that extra-syntactic
factors (e.g., memory organization principles barring self-embedding, and
other factors, Chomsky 1965, pp. 12–14) can have a significant effect in
ostensibly syntactic phenomena. For islands, a range of different extra-
syntactic principles could conceivably be relevant. Given islands’ characteris-
tically complex empirical profile, progress depends on identifying the precise
contribution of independent syntactic and extra-syntactic mechanisms, espe-
cially where judgments are marginal.122
There are additional puzzles, ones surrounding ellipsis and deaccenting,
anaphora and focus, grafts/amalgamation,123 among others. Though we cannot
discuss any of these issues in detail here, promising directions present them-
selves in many cases. Ellipsis phenomena (delineated in Merchant 2018 and
elsewhere), for instance, violate I-language constraints, and share properties
that must be part of performance (e.g., destressing/deletion of repeated material,
parallelism conditions),124 suggesting a crucial role for principles of external-
ization in these phenomena. A similar situation holds for many other topics.
Given the framework of this Element and the set of core principles it provides
(i.e., those that enter into SMT), the frontiers of understanding rest on explana-
tory maximization of core principles and their interactions with independent
processes.
In addition to the topics above, there are interesting questions regarding the
nature of operations other than Merge. Although this Element develops the
hypothesis that Merge is the sole structure-building operation of narrow
syntax, structure building is not all there is to syntactic derivations. There
122 Recent years have seen an increased effort to identify the contribution of extrasyntactic factors, such as processing demands or pragmatic anomaly, in island effects (e.g., Miliorini 2019; Chaves and Putnam 2020; Culicover and Winkler 2022; Namboodiripad et al. 2022, among others). See also Heil and Ebert (2018) and Sedarous (2022) for intrasentential code-switching as a technique isolating syntactic contributions to extraction constraints.
123 See Van Riemsdijk (1998, 2006) and Kluck (2011).
124 See Chomsky and Lasnik (1993), Chomsky (1995, fn31), and Tancredi (1992).
and Samuel J. Keyser (eds.), The View from Building 20: Essays in Linguistics
in Honor of Sylvain Bromberger, 1–52. Cambridge, MA: MIT Press.
Chomsky, Noam. 1994. Bare phrase structure. MIT Occasional Papers in
Linguistics 5. Department of Linguistics and Philosophy, MIT. Reprinted in
Gert Webelhuth (ed.), Government and Binding Theory and the Minimalist
Program, 383–439. Malden: Blackwell.
[Bare phrase structure was also published in 1995 in: Evolution and Revolution in
Linguistic Theory: Essays in Honor of Carlos P. Otero, Hector Campos and
Paula Kempchinsky (eds.), Washington DC: Georgetown University Press,
51–109.]
Chomsky, Noam. 1995. The Minimalist Program. Cambridge, MA: MIT Press.
Chomsky, Noam. 2000. Minimalist inquiries: The framework. In Roger Martin,
David Michaels, and Juan Uriagereka (eds.), Step by Step: Essays on
Minimalist Syntax in Honor of Howard Lasnik, 89–155. Cambridge, MA:
MIT Press.
References 69
Chomsky, Noam. 2021a. Linguistics then and now: Some personal reflections.
Annual Review of Linguistics 7, 1–11. https://doi.org/10.1146/annurev-
linguistics-081720-111352.
Chomsky, Noam. 2021b. Minimalism: Where are we now, and where can we
hope to go. Gengo Kenkyu 160, 1–41.
Chomsky, Noam. 2021c. Reflections. In Nicholas Allott, Terje Lohndal, and
Georges Rey (eds.), A Companion to Chomsky, 582–594 Hoboken, NJ:
Blackwell/Wiley.
Chomsky, Noam. 2022a. SMT and the science of language. Talk at MIT, April 1.
Chomsky, Noam. 2022b. Genuine explanation and the Strong Minimalist
Thesis. Talk at MIT, April 1.
Chomsky, Noam. In press. The Miracle Creed and SMT. In Giuliano Bocci,
Daniele Botteri, Claudia Manetti, and Vicenzo Moscati (eds.), Issues in
Comparative Morpho-syntax and Language Acquisition.
Chomsky, Noam, Ángel J. Gallego, and Dennis Ott. 2019. Generative
Grammar and the faculty of language: Insights, questions, and challenges.
Catalan Journal of Linguistics Special Issue, 229–261.
Chomsky, Noam and Howard Lasnik. 1993. The theory of principles and
parameters. In Joachim Jacobs, Arnim von Stechow, Wolfgang Sternefeld,
and Theo Vennemann (eds.), Syntax: An International Handbook of
Contemporary Research, vol. 1, 506–569. Berlin: Walter de Gruyter.
Reprinted in Chomsky (1995).
Chomsky, Noam and Andrea Moro. 2022. The Secrets of Words. Cambridge,
MA: MIT Press.
Citko, Barbara. 2005. On the nature of Merge: External Merge, Internal Merge,
and Parallel Merge. Linguistic Inquiry 36:4, 475–496.
Citko, Barbara. 2011. Symmetry in Syntax: Merge, Move, and Labels.
Cambridge: Cambridge University Press.
Citko, Barbara. 2017. Right node raising. In Martin Everaert and Henk van
Riemsdijk (eds.), The Wiley Blackwell Companion to Syntax, Second Edition.
https://doi.org/10.1002/9781118358733.wbsyncom020.
Citko, Barbara and Martina Gračanin-Yuksek. 2021. Merge: Binarity in
(Multidominant) Syntax. Cambridge, MA: MIT Press.
Collins, Chris. 1997. Local Economy. Cambridge, MA: MIT Press.
Collins, Chris. 2017. Merge(X,Y) = {X,Y}. In Leah Bauke and Andreas Blümel
(eds.), Labels and Roots, 47–68. Berlin: Mouton de Gruyter.
Collins, Chris. 2022. The complexity of trees, universal grammar and economy
conditions. Biolinguistics 16, 1–13.
Collins, Chris and T. Daniel Seely. 2020. Labeling without labels. Manuscript.
lingbuzz (lingbuzz/005486). [A revised version is to be included in Kleanthes
Fukui, Naoki and Margaret Speas. 1986. Specifiers and projection. In Tova Rapoport
and Elizabeth Sagey (eds.), MIT Working Papers in Linguistics: Papers in
Theoretical Linguistics 8, 128–172. Reprinted in Naoki Fukui. 2006.
Theoretical Comparative Syntax: Studies in Macroparameters. London:
Routledge.
Gallego, Ángel J. (ed.). 2012. Phases: Developing the Framework. Berlin: De
Gruyter Mouton.
Gallego, Ángel J. 2020. Strong and weak “strict cyclicity” in phase theory. In
András Bárány, Theresa Biberauer, Jamie Douglas, and Sten Vikner (eds.),
Syntactic Architecture and Its Consequences II: Between Syntax and
Morphology, 207–226. Berlin: Language Science Press. https://doi.org/10.5281
/zenodo.4280647.
Gazdar, Gerald, Ewan Klein, Geoffrey K. Pullum, and Ivan A. Sag. 1985.
Generalized Phrase Structure Grammar. Oxford: Blackwell, and Cambridge,
MA: Harvard University Press.
Hornstein, Norbert. 1999. Movement and control. Linguistic Inquiry 30:1, 69–96.
Hornstein, Norbert. 2001. Move! A Minimalist Theory of Construal. Malden,
MA: Blackwell.
Hornstein, Norbert. 2003. On control. In Randall Hendrick (ed.), Minimalist
Syntax, 6–81. Oxford: Blackwell. https://doi.org/10.1002/9780470758342.ch1.
Huybregts, M. A. C. (Riny) 1984. The weak inadequacy of context-free phrase
structure grammars. In G. de Haan, M. Trommelen, and W. Zonneveld (eds.),
Van Periferie naar Kern, 81–99. Dordrecht: Foris.
Huybregts, M. A. C. (Riny) 2017. Phonemic clicks and the mapping asym-
metry: How language emerged and speech developed. Neuroscience &
Biobehavioral Reviews 81, Part B, 279–294.
Huybregts, M. A. C. (Riny) 2019. Infinite generation of language unreachable
from a stepwise approach. Frontiers in Psychology 10. https://doi.org/10
.3389/fpsyg.2019.00425.
Huybregts, M. A. C. (Riny), Robert Berwick, and Johan J. Bolhuis. 2016. The
language within. Science 352:6291, 1286.
Koizumi, Masatoshi. 1993. Object agreement phrases and the split VP hypoth-
esis. In Jonathan D. Bobaljik and Colin Phillips (eds.), Papers on Case and
Agreement I: MIT Working Papers in Linguistics 18, 99–148. Cambridge,
MA: MIT.
Koizumi, Masatoshi. 1995. Phrase structure in minimalist syntax. Doctoral
dissertation. Cambridge, MA: MIT.
Kratzer, Angelika. 1996. Severing the external argument from its verb. In
Johan Rooryck and Laurie Zaring (eds.), Phrase Structure and the Lexicon.
Studies in Natural Language and Linguistic Theory, vol. 33, 109–137.
Dordrecht: Springer. https://doi.org/10.1007/978-94-015-8617-7_5.
Landau, Idan. 2007. Movement-resistant aspects of control. In William
D. Davies and Stanley Dubinsky (eds.), New Horizons in the Analysis of
Control and Raising, 293–325. Dordrecht: Springer.
Landau, Idan. 2013. Control in Generative Grammar: A Research Companion.
Cambridge: Cambridge University Press.
Lasnik, Howard. 2002. Clause-mate conditions revisited. Glot International
6:4, 94–96.
Lasnik, Howard. 2011. What kind of computing device is the human language
faculty? In Anna Maria Di Sciullo and Cedric Boeckx (eds.), The Biolinguistic
Enterprise: New Perspectives on the Evolution and Nature of the Human
Language Faculty, 354–365. Oxford: Oxford University Press.
Lasnik, Howard. 2022. On optionality: A brief history and a case study. Talk at
First Biolinguistic Conference of the Université du Québec à Trois-Rivières,
June 24–26.
Nash, Leonard K. 1963. The Nature of the Natural Sciences. Boston, MA:
Little, Brown.
Nunes, Jairo. 2001. Sideward movement. Linguistic Inquiry 32:2, 303–344.
Nunes, Jairo. 2004. Linearization of Chains and Sideward Movement.
Cambridge, MA: MIT Press.
O’Neil, John. 1995. Out of control. North East Linguistics Society 25, Article
25. https://scholarworks.umass.edu/nels/vol25/iss1/25.
O’Neil, John. 1997. Means of control: Deriving the properties of PRO in the
minimalist program. Doctoral dissertation. Cambridge, MA: Harvard University.
Obata, Miki. 2010. Root, Successive-Cyclic and Feature-Splitting Internal
Merge: Implications for Feature-Inheritance and Transfer. Doctoral disserta-
tion. Ann Arbor, MI: University of Michigan.
Obata, Miki and Samuel David Epstein. 2011. Feature-Splitting Internal Merge:
Improper movement, intervention, and the A/A’ distinction. Syntax 14:2,
122–147.
Petitto, Laura Anne. 1987. On the autonomy of language and gesture: Evidence
from the acquisition of personal pronouns in American Sign Language.
Cognition 27:1, 1–52.
Petitto, Laura Anne. 2005. How the brain begets language. In James McGilvray
(ed.), The Cambridge Companion to Chomsky, 85–101. Cambridge: Cambridge
University Press.
Ragsdale, Aaron, Timothy D. Weaver, Elizabeth G. Atkinson, et al. 2023.
A weakly structured stem for human origins in Africa. Nature 617,
755–763. https://doi.org/10.1038/s41586-023-06055-y.
Reed, Lisa A. 2014. Strengthening the PRO Hypothesis. Berlin: De Gruyter
Mouton.
Reinhart, Tanya. 1976. The syntactic domain of anaphora. Doctoral disserta-
tion. Cambridge, MA: MIT.
Reinhart, Tanya. 1981. Definite NP anaphora and c-command domains.
Linguistic Inquiry 12:4, 605–635.
Reinhart, Tanya. 1983. Anaphora and Semantic Interpretation. London: Croom
Helm.
van Riemsdijk, Henk. 1998. Trees and scions: Science and trees. In Fest-Web-
Page for Noam Chomsky. Cambridge, MA: MIT Press.
van Riemsdijk, Henk. 2006. Grafts follow from merge. In Mara Frascarelli
(ed.), Phases of Interpretation, 17–44. Berlin: De Gruyter Mouton.
Roberts, Ian. 2019. Parameter Hierarchies and Universal Grammar. Oxford:
Oxford University Press.
Rosenbaum, Peter S. 1967. The Grammar of English Predicate Complement
Constructions. Cambridge, MA: MIT Press.
University Press.
Tinsley, J. N., M. I. Molodtsov, R. Prevedel, et al. 2016. Direct detection of
a single photon by humans. Nature Communications 7: 12172.
Truswell, Robert. 2011. Events, Phrases, and Questions. Oxford: Oxford
University Press.
Turing, Alan. 1952. The chemical basis of morphogenesis. Philosophical
Transactions of the Royal Society of London. Series B, Biological Sciences
237: 641, 37–72.
van Urk, Coppe. 2015. A uniform syntax for phrasal movement: A case study of
Dinka Bor. Doctoral dissertation. Cambridge, MA: MIT.
van Urk, Coppe. 2020. Successive cyclicity and the syntax of long-distance
dependencies. Annual Review of Linguistics 6, 111–130. https://doi.org/10
.1146/annurev-linguistics-011718-012318.
de Vries, Mark. 2009. On multidominance and linearization. Biolinguistics 3:4,
344–403.
Robert Freidin
Princeton University
Robert Freidin is Emeritus Professor of Linguistics at Princeton University. His research on
syntactic theory has focused on cyclicity, case and binding, with special emphasis on the
evolution of the theory from its mid-twentieth century origins and the conceptual shifts
that have occurred. He is the author of Adventures in English Syntax (Cambridge
2020), Syntax: Basic Concepts and Applications (Cambridge 2012), and Generative Grammar:
Theory and its History (Routledge 2007). He is co-editor with Howard Lasnik of Syntax:
Critical Assessments (6 volumes) (Routledge 2006).
language syntax over the past sixty-five years. It focuses on the underlying principles and
processes that determine the structure of human language, including where this
research may be heading in the future and what outstanding questions remain to be
answered.
Generative Syntax