Nekrasova 2009
The purpose of the present study is to contribute to the ongoing debate about the use
of lexical bundles by first (L1) and second language (L2) speakers of English. The
study consists of two experiments that examined whether L1 and L2 English speakers
displayed any knowledge of lexical bundles as holistic units and whether their knowledge
was affected by the discourse function of the lexical bundles (discourse-organizing or
referential). The participants in Experiment 1 (N = 61) completed a gap-filling activity,
whereas the participants in Experiment 2 (N = 61) carried out a dictation task. Results
showed that the participants’ knowledge differed for specific lexical bundles and that,
overall, they knew more discourse-organizing bundles than referential bundles. The
implications of the study are discussed in terms of current research about the role of
frequency-based language chunks in L1 and L2 speech processing in English.
Since the late 1970s, linguists have established the importance of formulaic
sequences for language processing and production (Hakuta, 1974; Nattinger &
DeCarrico, 1992; Peters, 1983; Wong Fillmore, 1976; Wray, 2002). Typically
defined as frequent multiword combinations that are stored and retrieved holis-
tically from the mental lexicon at the moment of speech, formulaic sequences
have been argued to minimize encoding work for both the speaker and
addressee, thus allowing for the construction of fluent spoken discourse (Erman,
2007; Pawley & Syder, 1983; Raupach, 1984; Wood, 2006). In addition, proper
use of formulaic sequences has been found to be critical for the acquisition of
nativelike language competence (Dufon, 1995; House, 1996).
Preliminary results were reported at AAAL 2007 in Costa Mesa, CA and AAAL 2008 in
Washington, DC. I am grateful to Kim McDonough for her insightful comments on earlier versions of this
article. I also thank Doug Biber, Viviana Cortes, Bethany Gray, and the editor and four anonymous
reviewers of Language Learning for their valuable input, Valeria Kashpur for assistance with data
collection, and Tony Becker for his assistance with data coding and consistent support of this
project through comments and discussion. Any errors are, of course, my own.
Correspondence concerning this article should be addressed to Tatiana M. Nekrasova,
Department of English, Northern Arizona University, P.O. Box 6032, Flagstaff, AZ 86011-6032. Internet:
[email protected]
Formulaic sequences, as a broad category, include many different sub-
classes: proverbs, lexicalized stems, clichés, speech formulae, idioms, recur-
ring utterances, and others. Wray (2002) provided a list of terms that are used
to describe aspects of formulaic language in terms of their place on a contin-
uum from being completely fixed (e.g., idioms and set expressions) to more
compositional (e.g., semi-preconstructed phrases, sentence builders, patterns).
Although formulaic language has been the focus of linguistic inquiry for sev-
eral decades, only a few relatively fixed subclasses of formulaic sequences
have been targeted in traditional linguistic studies conducted in phraseology
and pragmatics. As a result, more compositional subclasses of formulaic se-
quences that differed from idioms and set expressions in their structural and
functional characteristics were largely ignored in linguistic research until the
development of corpus-based methods of data analysis. Ever since
corpus-driven research revealed that, in addition to idioms and set phrases, a much
greater number of language constructions have a tendency to occur together,
attempts have been made to formally describe and classify these units in order
to examine their formulaic nature and identify their importance for language
production and language acquisition. These co-occurring constructions and,
more specifically, the question of their psychological reality are the focus of
the present study. Before turning the current discussion to the topic at hand, the
following sections provide a brief overview of research conducted on formulaic
sequences in phraseology, pragmatics, and corpus linguistics, thus situating the
present study within a broader scope of formulaic language.
Pragmatics
Another subclass of formulaic sequences—speech formulas (or routine
formulas)—has been largely explored in pragmatic research within the frame-
work of speech acts through the work of the linguists who examined the lan-
guage of routine social encounters (Coulmas, 1979, 1981; Ferguson, 1976;
House, 1996). Speech formulas were identified as set expressions that are tied
to particular predictable situations and are used to realize such speech func-
tions as thanking, apologizing, and others (e.g., thank you very much, I am very
sorry). Although semantically more transparent compared to traditional idioms,
speech formulas acquired their formulaic status from their ability to meet cer-
tain functional demands that, subsequently, led to their high predictability and
frequency of occurrence in certain types of social situations. At the same time,
speech formulas were described as similar to idioms in terms of their form:
both subclasses of formulaic sequences are considered to be relatively fixed,
with certain types of speech formulas, however, being defined as more compo-
sitional than others (e.g., Nattinger & DeCarrico, 1992; Van Lancker-Sidtis &
Rallon, 2004).
Corpus Linguistics
The development of corpus-based techniques introduced a new way to explore
a language. Whereas previous language research relied exclusively on native-
speaker intuition when describing language units, corpus linguistics brought
in a more objective frequency-based approach to not only offer new insights
about existing language regularities but also to reveal previously unobserved
phenomena (e.g., Conrad, 2000; Lindemann & Mauranen, 2001). Thus, analyses
of a range of oral and written corpora made it clear that, in addition
to already established classes of formulaic sequences (sayings, proverbs, speech
formulae, and idioms), a large number of language units co-occur
in a preferred order without being governed by specific grammar rules
(Altenberg, 1998; Biber & Conrad, 1999; Granger, 1998; Moon, 1998; Sinclair,
1999; Wray, 2002).
According to Biber, Conrad, and Cortes (2004), the four primary functions
of lexical bundles identified in English academic registers and conversation in-
clude (a) stance bundles that convey interpersonal meanings, such as attitudes
and assessments (e.g., it is important to, I don’t think so, I want you to); (b)
discourse organizers that help reveal relationships between prior and coming
discourse, such as topic introduction and topic elaboration (e.g., nothing to do
with, on the other hand, as well as the); (c) referential bundles that perform an
ideational function and are used to make direct reference to physical or abstract
entities, such as time, place, and text references (e.g., is one of the, in the form
of, as a result of, the nature of the); and (d) special conversational bundles
that are mostly used in conversation to express politeness, inquiry, and report
(e.g., thank you very much, what are you doing, I said to him/her). These four
discourse functions of lexical bundles should be distinguished from pragmatic
functions that other multiword constructions, such as speech formulas, can have
(Coulmas, 1979, 1981). Pragmatic functions are usually associated with highly
conventionalized expressions that are more salient and, thus, are used to effec-
tively communicate certain pragmatic meanings, such as expressing requests,
apologies, or gratitude. Unlike speech formulas that are more interactional in
nature and are highly dependent on conversational context, the majority of
lexical bundles operate on a textual level and are relatively “context-free.” For
example, whereas the occurrence of a speech formula Nice to see you is closely
bound to a social situation of greeting a person, the occurrence of a lexical
bundle nothing to do with is not associated with any specific situation and
can be equally frequent in a variety of contexts. In this regard, lexical bundles
that serve special conversational functions in discourse are more likely to have
pragmatic functions.
oral corpora (De Cock, 2000; De Cock, Granger, Leech, & McEnery, 1998;
Granger, 1998; Warga, 2005). These studies indicated that L1 and L2 (second
language) speakers’ use of recurrent phrases was different both quantitatively
and qualitatively (De Cock, 2000; De Cock et al., 1998). More specifically, L2
speakers were found to be unaware of the more common, yet less salient L2 chunks,
and in order to compensate for their lack of awareness, they often resorted to
L1 transfer. The process of L1 transfer was realized in several ways. First, L2
speakers were found to either modify or avoid using certain L2 constructions
that did not have L1 equivalents. Second, L2 speakers tended to overuse those
L2 constructions whose L1 equivalents were more common. Finally, L2 speakers
misused those constructions whose L2 equivalents did not
match their L1 counterparts. As De Cock (2000) argued, turning to L1 transfer
during L2 production could potentially lead to the “foreign-soundness” of L2
speakers’ speech and writing.
Because lexical bundles are defined as combinations that occur frequently
in a text or a collection of texts, it is logical to assume that the frequency counts
serve as an indication of these units being conventionalized by the speech
community, which would suggest their formulaic nature. At the same time,
some corpus linguists argue that simple frequency counts do not provide enough
grounds to view any corpus-derived construction as formulaic (De Cock, 2000;
De Cock et al., 1998). One of the reasons for skepticism is that frequency
information may not be relevant to how language structure is represented in
one’s mind. For example, the combination of the two words “it” and “is” is extremely
frequent in the English language, mostly because the individual words included
in this combination are closed-class items that occur very frequently in any
corpus. Thus, it is very unlikely that this combination is represented in the
mind as a holistic unit and can be defined as formulaic. Another reason to
question the assumptions about the formulaic nature of lexical bundles comes
from their structural and functional differences from other established classes
of formulaic sequences. In order to contribute to the existing body of research
on lexical bundles and define their place within a broader category of multiword
constructions, the present study explores the issue of psycholinguistic validity
of lexical bundles.
and the majority of lexical bundle studies generally describe the distribution of
these units in different registers (see Biber & Conrad, 1999; Biber, Conrad, &
Cortes, 2004; Biber et al., 1999; Cortes, 2004).
In the only study to date that has examined the issue of psycholinguistic
validity of lexical bundles, Schmitt, Grandage, et al. (2004) questioned whether
corpus-derived recurrent clusters (i.e., lexical bundles) are psycholinguistically
valid and, therefore, stored and processed holistically. After identifying 25
sequences from previous publications, they created a text about a hitchhiker
and embedded the target sequences in it. Both English L1 (n = 34) and L2
(n = 45) participants performed a dictation task during which they listened
to the recorded text and orally reconstructed it sentence by sentence. The
authors argued that the bundles the participants were able to reproduce could
be considered formulaic and were holistically stored in the mind. The findings of
the study suggested that not all corpus-driven clusters were psycholinguistically
valid according to their criteria, with many of them being used idiosyncratically
by the individual speakers. The researchers concluded that both corpus and
psycholinguistic approaches should be used when deciding whether corpus-
driven clusters share the same psycholinguistic characteristics as holistically
stored formulaic chunks. By employing the sequences that varied in length,
frequency, and transparency of meaning, the study did not provide conclusive
evidence to either bridge the two categories (i.e., lexical bundles and formulaic
chunks) or distinguish them.
Schmitt, Grandage et al.’s (2004) study is innovative in that it put a com-
monly accepted assumption to empirical testing. At the same time, this study
displayed several limitations that need to be addressed here. First, not all target
sequences employed in the study could qualify as recurrent bundles: Some of
them were much more frequent in the corpus than others (e.g., you know vs.
to make a long story short). Second, whereas some of the bundles could be
described as more salient in terms of the pragmatic functions they realized in
certain language situations (e.g., I don’t know what to do, go away, to make a
long story short, it’s not too bad), other bundles did not have any pragmatic
function and served more like cohesive devices in a text (e.g., as shown in
figure, is one of the most, what I want to, etc.). Finally, some of the bundles
examined in the study were clearly extracted from academic register, whereas
other bundles came from and were characteristic of a more informal register
(i.e., conversation). Both types of bundles were then embedded in a story about
a hitchhiker, a narrative that had a rather informal tone, which, as the authors
acknowledged, might have made it more difficult for the participants to produce
the academic register bundles. Thus, the choices the
authors made during the initial selection of the target structures and the context
in which they were embedded might have contributed to the inconclusive results.
Present Study
The present study was designed to contribute to the debate concerning the psy-
cholinguistic validity of lexical bundles by addressing some of the limitations
of the previous research conducted in this area. First, all target structures em-
ployed in the study were lexical bundles; thus, they were identified strictly on
the basis of frequency counts. Second, all lexical bundles were homogeneous in
terms of their functional characteristics: They all performed discourse functions
signaling the relationships between different elements (i.e., phrases, clauses,
sentences) in a text. None of the bundles had an advantage of expressing a prag-
matic function by carrying out the meaning related to a certain conversational
context (e.g., See you later in a situation of saying “good-bye” to someone).
Furthermore, the findings of previous L1 corpus-based studies indicated that
the discourse function of lexical bundles was related to the frequency of their use by the
participants (Cortes, 2004, 2006). Therefore, the present study also investigates
the possible effect of two discourse functions of lexical bundles (referential
and discourse-organizing) on their production by L1 and
L2 English speakers. Finally, an attempt was made to ensure that all contexts in
which target lexical bundles were embedded were register-appropriate, that is,
both the target lexical bundles and the contexts belonged to the same registers:
university teaching and textbooks.
The main purpose of the study was to examine if lexical bundles are recog-
nized by L1 and L2 participants as holistic units and, therefore, have psycholog-
ical validity. Following Schmitt, Grandage, et al.’s (2004) study, it was assumed
that no direct nonlaboratory measure was available to determine whether L1
and L2 participants recognize lexical bundles as holistic units. For that reason,
participants’ recognition of lexical bundles as holistic units was operationalized
as (a) their ability to produce them as fixed units in both short and extended
pieces of discourse, (b) their ability to produce lexical bundles in a contextually
appropriate manner, and (c) participants’ use of lexical bundles to ease
the processing burden during text comprehension and subsequent production
(see Wray, 2000, 2002; Wray & Perkins, 2000). The study consists of two
experiments that employed different measures to assess whether L1 and L2
English speakers have knowledge of lexical bundles as holistic units. Whereas
Experiment 1 involved a controlled-production activity (a gap-filling task), Ex-
periment 2 employed an extended production activity (a timed dictation task).
Experiment 1
Method
Participants
The participants were L1 English speakers (n = 20), advanced L2 English
speakers (n = 18), and intermediate L2 English speakers (n = 23), all of
whom were undergraduate and graduate students at a regional university in
the western United States. None of the participants were majoring in applied
linguistics or TESL. The L1 speakers consisted of 4 males and 16 females,
aged between 18 and 45 years (M = 24.3, SD = 7.88). The advanced L2
speakers included 8 males and 10 females, aged between 20 and 43 years (M
= 28.44, SD = 8.32), who had completed between 3 and 16 years (M = 10.56,
SD = 3.71) of formal high school/college education in English and reported
the length of residence in the United States ranging from 1 to 127 months
(M = 20.17, SD = 30.56). The intermediate L2 speakers included 12 males
and 11 females, aged between 17 and 37 years (M = 20.7, SD = 3.85), who
had completed between 4 and 11 years (M = 7.17, SD = 1.99) of formal
English instruction, and their length of residence in the United States was
reported to be between 2 and 48 months (M = 6.39, SD = 9.65). The advanced
and intermediate groups were established on the basis of the participants’
enrollment status. Whereas the advanced L2 speakers were degree-seeking
undergraduate or graduate students, the intermediate L2 speakers were enrolled
in an Intensive English Program. The participants volunteered to take part in
the study and were not compensated.
Materials
Gap-filling task. Following Schmitt, Grandage, et al. (2004), it was as-
sumed that no direct measures were available to assess participants’ knowledge
word within a bundle was deleted with space provided to be filled in by the
participants. Finally, all test items were randomly ordered and presented as a
list of sentences (see Appendix B).
The decision as to which word within a lexical bundle to delete was moti-
vated by two criteria. First, lexical bundles are typically described as incomplete
structural units (Biber et al., 1999), so they usually include a limited set of func-
tion words, such as articles, particles, and prepositions, that are often used to
construct the frame of a lexical bundle (e.g., to __ with the, in the __ of, the __
of the, etc.). Because one frame could be employed in several different lexical
bundles, it would be easier for the participants to produce the missing elements
of the frame; this, however, would not necessarily illustrate their knowledge of
a specific bundle. Thus, the decision was made to delete a content word that is
used uniquely in each bundle to explore if the participants could produce each
individual bundle rather than the frame (e.g., the bundles the rest of the and the
top of the are created from the same frame the __ of the). Second, each frame
could potentially be filled with a number of different content words (e.g., in the
absence of, in the form of, in the case of). Therefore, selecting an appropriate
content word associated with a particular frame in a certain context would provide
more support for the idea that participants recognize a certain lexical bundle
as a unit.
The materials were pilot-tested with 10 L1 and 10 L2 speakers of English.
Based on the pilot test, a few sentences were judged to be too difficult for
intermediate learners and the contexts for the target bundles were replaced. The
replaced contexts were also selected from the corpus. A split-half reliability
procedure was used to measure the internal consistency among the items in the
gap-filling task. Because there were two different structures targeted in the task,
separate Guttman split-half coefficients were obtained for discourse-organizing
and referential bundles, which were .86 and .84, respectively, suggesting that
both sections of the task had sufficient internal consistency.
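As a rough illustration of the reliability check described above, the sketch below computes a Guttman split-half coefficient from two half-test scores per participant. The article does not say how the items were divided into halves or which software was used, so the function name, the odd/even split, and the sample scores are assumptions made only for this example.

```python
from statistics import variance


def guttman_split_half(half_a, half_b):
    """Guttman split-half coefficient: 2 * (1 - (var_a + var_b) / var_total).

    half_a and half_b hold each participant's score on one half of the items,
    in the same participant order.
    """
    if len(half_a) != len(half_b):
        raise ValueError("Both halves must cover the same participants.")
    totals = [a + b for a, b in zip(half_a, half_b)]
    return 2 * (1 - (variance(half_a) + variance(half_b)) / variance(totals))


# Hypothetical half-test scores for five participants (not data from the study).
odd_item_scores = [5, 7, 6, 9, 8]
even_item_scores = [4, 7, 7, 9, 7]
coefficient = guttman_split_half(odd_item_scores, even_item_scores)
print(f"Guttman split-half coefficient: {coefficient:.2f}")
```

In this setup the function would be called once for the discourse-organizing items and once for the referential items, which would yield the two separate coefficients.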
Analysis
The data were scored by the researcher by giving one point to each lexical
bundle for which the participant provided a contextually appropriate word.
Because some lexical bundles could be equally possible (e.g., at the beginning
of, at the end of) or synonymous (e.g., what I want to, what I have to, what I need
to) in certain contexts, the decision was made to give one point to each bundle
that was produced as contextually appropriate. All modifications of the original
bundles were checked in terms of their frequency range against the TOEFL
2000 Spoken and Written Academic Language Corpus and were given one
point if they occurred at least 10 times per million words in any written or oral
register in the corpus (e.g., what I have to do, the order in which). The decision
to use this frequency cutoff was based on the definition of a lexical bundle
as a frequently occurring sequence in a register, originally identified by Biber
et al. (1999) as a unit that occurred at least 10 times per million words. No points
were given if the resultant sequence did not occur frequently in the corpus (i.e.,
did not qualify as a lexical bundle), was not contextually appropriate, or if the
item was left blank. Spelling errors were ignored. An independent rater scored
36% of the test data, and simple percentage agreement with the researcher was
98%. After the data were scored, each lexical bundle was analyzed in terms
of how frequently and how accurately it was completed as well as how much
modification to the original form it exhibited. Due to the unequal number of
discourse-organizing and referential bundles in the gap-filling task, raw scores
were converted into proportions, which were then used in the statistical tests.
In addition to the significance tests, the results were also analyzed in terms of
the effect size to estimate the magnitude of the observed differences, measured
by the standardized difference between the means (Rosenthal & Rubin, 1982).
Alpha was set at .05 for all statistical tests.
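The scoring rule described in this section can be sketched as follows. This is only an illustration under stated assumptions: the corpus counts, the corpus size, and the function names are hypothetical, contextual appropriateness is passed in as a human judgment, and the check is simplified to a single register, whereas the study accepted a sequence that reached the cutoff in any written or oral register.

```python
CUTOFF_PER_MILLION = 10  # frequency threshold stated in the text


def per_million(raw_count, corpus_size_words):
    """Normalize a raw corpus count to occurrences per million words."""
    return raw_count * 1_000_000 / corpus_size_words


def score_item(sequence, is_appropriate, corpus_counts, corpus_size_words):
    """Return 1 for a contextually appropriate, sufficiently frequent sequence."""
    if not sequence or not is_appropriate:
        return 0  # blank or contextually inappropriate responses earn no point
    rate = per_million(corpus_counts.get(sequence, 0), corpus_size_words)
    return 1 if rate >= CUTOFF_PER_MILLION else 0


def proportion_correct(item_scores):
    """Convert raw item scores to a proportion so unequal item sets are comparable."""
    return sum(item_scores) / len(item_scores)


# Hypothetical counts in a hypothetical 2-million-word register (not study data).
counts = {"what i have to do": 31, "to do about this": 3}
size = 2_000_000
scores = [
    score_item("what i have to do", True, counts, size),
    score_item("to do about this", True, counts, size),
    score_item("", False, counts, size),
]
print(scores, proportion_correct(scores))
```

Working with proportions rather than raw totals is what allows the discourse-organizing and referential subscores to be compared even though the two item sets differ in size.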
Results
The research question asked whether English L1 and L2 speakers differed in
their knowledge of lexical bundles that served different discourse functions.
As shown in Table 1, L1 speakers scored the highest on the gap-filling task,
with a mean score of .88 (SD = .06). In terms of the two L2 groups, advanced
L2 speakers scored higher (M = .84, SD = .08) than the intermediate L2
speakers (M = .53, SD = .11). Furthermore, compared to the mean scores for
the referential bundles, the mean scores for discourse organizers were higher
for all three groups, with advanced learners’ scores (M = .93, SD = .06) being
the same as the native speakers’ scores (M = .93, SD = .07).
[Table 1. Columns: Group; N; M and SD for Referential, Discourse-organizing, and Total]
appropriately at least 80% of the time were completed with the same word by
all L1 participants (at the same time, on the other hand, know what I mean, one
of the most, nothing to do with, or something like that). Only one bundle (in
the absence of) was appropriately completed by L1 speakers less than 50% of
the time. Advanced L2 speakers completed 11 bundles appropriately 100% of
the time, six of which overlapped with those completed by L1 speakers (at the
end of, at the same time, on the other hand, I would like to, know what I mean,
one of the most). Furthermore, advanced L2 speakers completed eight lexical
bundles as fixed units (i.e., with the same word), six of which were the same
as those completed by L1 speakers; the two additional bundles were I would
like to and if you look at. Three bundles were completed appropriately less than
50% of the time (e.g., in the absence of, on the basis of, as a result of). Finally,
intermediate L2 speakers appropriately completed only 1 lexical bundle 100%
of the time (what I want to), 7 lexical bundles 80–99% of the time, and 14
bundles less than 50% of the time (e.g., in the absence of, on the basis of, in
terms of the, as a result of, in the case of, in the form of, in the presence of, etc.).
In addition, intermediate L2 speakers completed three lexical bundles as fixed
units, which overlapped with those produced by L1 and advanced L2 speakers
(at the same time, on the other hand, one of the most).
Experiment 2
Method
Participants
L1 speakers. The L1 speakers in this study were L1 speakers of American
English who were students at a regional university in the western United States.
Twenty-one participants were recruited on a voluntary basis from among stu-
dents enrolled in a Freshmen Composition course and were offered five extra
credit points for their participation in the study. None of the participants in
Experiment 2 took part in Experiment 1. The participants consisted of 9 males
and 12 females, aged between 18 and 23 years (M = 18.9, SD = 1.09). Only
one participant reported that they had taken a course that discussed language
acquisition.
L2 speakers. In order to account for possible L1 influence, all of the
L2 participants in Experiment 2 were from the same L1 background. The L2
speakers were English as a Foreign Language (EFL) learners (N = 40) enrolled
in a public university in western Siberia, Russia. The participants consisted
of 6 males and 34 females, aged between 19 and 22 years (M = 20.38, SD
= .93), who were all native speakers of Russian. They completed between 7
and 17 years (M = 11.86, SD = 2.29) of formal English instruction, and none
of them reported that they had lived in/visited countries where English was
spoken as a native language. All participants reported that they had never taken
courses in Second Language Acquisition, Psycholinguistics, Genre/Discourse
Target Structures
The target structures were lexical bundles as defined previously, which repre-
sented two discourse functions: referential and discourse-organizing. Based on
Biber, Conrad, and Cortes’s (2004) corpus-based study of university discourse,
12 lexical bundles (see Appendix C) were selected from the corpus of classroom
teaching and textbooks. Three criteria were considered when selecting target
bundles. First, bundles in both functional categories were matched for the word
length (12.8 letters/bundle for discourse organizers and 12.2 letters/bundle
for referential bundles). Second, the bundles in both groups were matched for
the frequency range with which they occurred in classroom teaching and text-
books: In each category, three more frequent (40–99 times per million words)
and three less frequent (10–19 times per million words) sequences were used.
Furthermore, an attempt was made to select the bundles that were frequent in
both academic prose and conversation: Of six discourse-organizing bundles,
three were frequent in both registers and three were frequent in the academic
prose only. Likewise, two referential bundles were frequent in both registers and
four bundles were frequent in the academic prose only. Of 12 lexical bundles
tested in Experiment 2, seven bundles were previously employed in Experiment
1 and five bundles were new (was one of the, than or equal to, the nature of
the, has to do with, in this chapter we).
Materials
The materials consisted of the dictation activity, a follow-up questionnaire, and
a cloze test.
Dictation activity. To elicit participants’ immediate recall of lexical bun-
dles, a dictation activity was used. Dictation is widely used in L2 classrooms
as a part of a dictogloss, which is claimed to be an effective language learn-
ing task that provides a context for negotiation and facilitates L2 learning
(Kowal & Swain, 1997; Swain, 1998; Wajnryb, 1990). For research purposes,
Design
This study employed a cross-sectional design to test the effect of the partici-
pants’ L1 background and the discourse function of lexical bundles on their
immediate recall of lexical bundles during the dictation activity. The dependent
variable was the participants’ immediate recall of lexical bundles, which was
operationalized as their score on the dictation activity.
Procedure
For the dictation activity, the instructions and the task were recorded as a single
audio file. All participants were tested in a computer lab. The participants
listened to the instructions, completed the practice item, and did the dictation
task during which they listened to the recorded text and recalled it section by
section. Each section of the text was recorded twice, which was followed by
a 1–2-min pause for the participants to do a written recall of the section. The
participants were not allowed to take any notes while they were listening to the
recording. The same procedure was repeated for all 13 sections.
L1 Speakers
The L1 speakers enrolled in the Freshmen Composition course were tested by
their instructor during their scheduled class time. The instructor informed the
students about the experiment and reviewed the consent form with them. Those
students who agreed to participate in the study completed the consent form and
did the dictation task, followed by the questionnaire. Two additional students
who had been absent on the day of testing completed the tasks several days
later.
L2 Speakers
The data from the Russian participants were collected by their instructor during
their scheduled English class. The researcher electronically mailed all test ma-
terials and detailed instructions to the instructor and had a phone conversation
with her to ensure that the same procedure for task administration was fol-
lowed for both participant groups. The instructor informed the students about
the experiment and those students who volunteered to participate in the study
completed the cloze test, followed by the dictation task and the questionnaire.
All typed test answers completed by the students were saved as separate files
and sent electronically to the researcher shortly after the testing. All paper-and-
pencil answers were collected by the instructor, scanned, and electronically
mailed as attached files to the researcher as well.
Analysis
Cloze Test
The cloze test was scored by identifying the two most frequently supplied
answers in the L1 speakers’ pilot tests as the baseline for scoring L2 speakers’
responses. Thus the answer to each test item was scored as either correct if it
matched one of the two possible responses, or as incorrect if it did not. Each
correct response was given a score of one point, for a possible total of 55
points.
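The cloze scoring rule above amounts to checking each response against the two accepted answers for its item. The sketch below shows one way this could be done; the answer key, the responses, and the normalization step (trimming whitespace and ignoring case) are assumptions for illustration and are not described in the article.

```python
def score_cloze(responses, answer_key):
    """Count responses that match one of the accepted answers for their item.

    responses: list of strings, one per test item.
    answer_key: list of sets, each holding the accepted answers for that item.
    """
    if len(responses) != len(answer_key):
        raise ValueError("Each item needs exactly one response slot.")
    return sum(
        1
        for given, accepted in zip(responses, answer_key)
        if given.strip().lower() in accepted
    )


# Hypothetical three-item key and one participant's answers (not study data).
key = [{"the", "a"}, {"went", "walked"}, {"because", "since"}]
answers = ["The", "ran", " since "]
print(score_cloze(answers, key))
```

With the full test, the key would contain 55 entries, so the maximum returned score would be 55.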
Dictation
Two analyses of the recalled texts were carried out. First, the recalled texts were
analyzed by the researcher for the participants’ use of the target lexical bundles.
Each lexical bundle used by a participant in their texts was given a score of
one point. All modifications of the original bundles and new sequences were
checked in the TOEFL 2000 Spoken and Written Academic Language Corpus
to ensure that the resulting sequences occurred at least 10 times per
million words in the corpus. Those new structures and the modifications of
the original form that resulted in sequences that could not be identified in the
corpus were not given any points, as they did not qualify as lexical bundles
(e.g., to do about this, equal to or more, just as if not). Spelling errors were
ignored. The subscores for each discourse type of lexical bundles were totaled
Results
Cloze Test
The mean score for the cloze test was calculated (M = 37.35, SD = 7.58), and
all L2 speakers who scored less than or equal to the mean score were assigned
to a lower proficiency group, with the range of scores from 17 to 37 (M =
30.00, SD = 5.26). The participants who scored higher than the mean score
were assigned to a higher proficiency group (38–46, M = 42.78, SD = 3.04).
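The grouping rule described above (scores at or below the mean go to the lower proficiency group, scores above it to the higher group) can be sketched as follows. The participant labels and scores are hypothetical and serve only to show how a score equal to the mean is handled.

```python
from statistics import mean


def split_by_mean(scores):
    """Split participants into lower (<= mean) and higher (> mean) groups.

    scores: dict mapping a participant label to a cloze test score.
    Returns the cutoff and the two groups as dicts.
    """
    cutoff = mean(scores.values())
    lower = {p: s for p, s in scores.items() if s <= cutoff}
    higher = {p: s for p, s in scores.items() if s > cutoff}
    return cutoff, lower, higher


# Hypothetical scores for three participants (not data from the study).
cutoff, lower, higher = split_by_mean({"p1": 30, "p2": 40, "p3": 35})
print(cutoff, sorted(lower), sorted(higher))
```

Because cloze scores are whole numbers and the reported mean was 37.35, this rule places a score of 37 in the lower group and a score of 38 in the higher group, which matches the ranges reported above.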
Dictation
The first research question asked if English L1 and L2 speakers differed in their
knowledge of lexical bundles that served two different discourse functions. The
scores that the participants received on the dictation activity are presented in
Table 3. The higher proficiency L2 speakers recalled more lexical bundles
(M = 6.83, SD = 2.29) than both the L1 speakers (M = 5.14, SD = 1.85) and
the lower proficiency L2 speakers (M = 3.65, SD = 1.90). Table 3 also shows
that all three participant groups recalled the discourse-organizing bundles more
frequently than the referential bundles.
Group   N   Referential (M, SD)   Discourse-organizing (M, SD)   Total (M, SD)
To address the first research question, the data were analyzed using a linear
mixed model with group as a between-subjects, three-level factor (L1 speakers
vs. higher proficiency L2 speakers vs. lower proficiency L2 speakers) and
function as a repeated two-level factor (discourse-organizing or referential
bundles). Results of evaluation of assumptions of normality and homogeneity
of variance-covariance were satisfactory. The results indicated that group was
a significant factor, F(2, 111.112) = 17.84, p < .05, which suggests that there
were significant differences among the three participant groups in terms of their
recall of lexical bundles. The effect size (ω² = .13) indicated that approximately 13% of
the variation in participants’ scores was attributable to the differences among the
three participant groups. Function was also a significant factor, F(1, 111.112) =
95.34, p < .05, Cohen’s d = 1.56, showing that there was a significant difference
of a large magnitude between the recall of the two types of lexical bundles by
the participants. However, there was no significant interaction between group
and function (p > .05).
A pairwise comparison of the participant groups using a Bonferroni adjustment
(Table 4) indicated that higher proficiency L2 speakers recalled significantly
more lexical bundles than both L1 speakers (p < .05, Cohen’s d = .81) and
lower proficiency L2 speakers (p < .001, Cohen’s d = 1.51). L1 speakers
recalled significantly more bundles than lower proficiency L2 speakers
(p < .05, Cohen’s d = .79).
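The effect sizes for the pairwise comparisons can be related to the group means and standard deviations reported for the dictation task. The article does not state which formula for Cohen’s d was used, so the following Python sketch rests on an assumption: it standardizes the mean difference by the root mean square of the two standard deviations, a form that ignores group sizes. The function name cohens_d is hypothetical.

```python
from math import sqrt


def cohens_d(mean_a, sd_a, mean_b, sd_b):
    """Cohen's d with the root mean square of the two SDs as the standardizer.

    Assumption: group sizes are ignored (the equal-n form of the pooled SD).
    """
    pooled_sd = sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd


# Total recall scores (M, SD) as reported in the text for each group.
higher_l2 = (6.83, 2.29)
l1 = (5.14, 1.85)
lower_l2 = (3.65, 1.90)

d_higher_vs_l1 = cohens_d(*higher_l2, *l1)
d_higher_vs_lower = cohens_d(*higher_l2, *lower_l2)
d_l1_vs_lower = cohens_d(*l1, *lower_l2)
```

Under this assumption, the three values round to .81, 1.51, and .79, which agrees with the effect sizes reported for the pairwise comparisons.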
Additionally, the results of Experiment 2 indicated that only 45% of all
possible bundles were recalled during the dictation activity. Furthermore, L1
speakers recalled only one lexical bundle 100% of the time (nothing to do with)
and three lexical bundles 80–99% of the time (in this chapter we, to do with
the, on the other hand), all of which served a discourse-organizing function. The
other eight bundles were recalled less than 50% of the time, with the three least
recalled bundles being in terms of the (9%), was one of the (14%), and the
nature of the (19%). The higher proficiency L2 speakers recalled one bundle
100% of the time (in this chapter we), seven bundles 80–99% of the time (the
nature of the, than or equal to, nothing to do with, in terms of the, to do with
the, the top of the, on the other hand), and four bundles less than 50% of the
time, two of which were recalled the least: the rest of the (13%), and was one
of the (17%). Finally, the lower proficiency L2 speakers recalled three bundles
80% of the time (in this chapter we, nothing to do with, to do with the) and the
other nine bundles less than 50% of the time, with the least recalled bundles
being as well as the (6%), in terms of the (6%), the rest of the (12%), was one
of the (12%), and than or equal to (12%). All three participant groups recalled
three discourse-organizing bundles as fixed units more than 90% of the time
(in this chapter we, on the other hand, nothing to do with).
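The per-bundle recall percentages reported above are simple proportions of participants who produced a given bundle. A minimal Python sketch of that calculation is given below. The data layout (one 0/1 flag per participant for each bundle), the sample values, and the function name recall_rates are assumptions made for illustration.

```python
def recall_rates(recalls):
    """Percentage of participants who produced each bundle.

    recalls: dict mapping bundle -> list of 0/1 flags, one per participant
    """
    return {
        bundle: 100 * sum(flags) / len(flags)
        for bundle, flags in recalls.items()
    }


# Hypothetical flags for four participants, not the study data.
recalls = {
    "on the other hand": [1, 1, 1, 1],
    "was one of the": [0, 1, 0, 0],
}
rates = recall_rates(recalls)
```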
Research question 2 asked if the use of lexical bundles allowed L1 and
L2 speakers to retain more information during discourse comprehension and
subsequent production. The results of a subsequent analysis of the effect of
lexical bundles on i-unit density indicated that only one group (higher
proficiency L2 learners) recalled considerably more i-units for the sections that
contained lexical bundles in the original text, t(22) = 13.21, p < .001, Cohen’s
d = 2.75. The other two participant groups did not show any difference in their
recall of i-units in relation to the presence of lexical bundles in the text (p >
.05). A Pearson correlation test indicated that there was no reliable relationship
between the length of a section and the number of i-units produced by the
participants (p > .05).
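The relationship between the t statistic and the effect size reported for the higher proficiency group can be made explicit. The sketch below assumes that the comparison was a paired-samples t test on each participant’s i-unit recall for sections with and without lexical bundles, so that 22 degrees of freedom imply 23 participants; the article does not state this directly. The function name paired_t_and_d is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_and_d(with_bundles, without_bundles):
    """Paired-samples t statistic and effect size from matched scores."""
    diffs = [a - b for a, b in zip(with_bundles, without_bundles)]
    n = len(diffs)
    d_z = mean(diffs) / stdev(diffs)  # standardized mean difference
    t = d_z * sqrt(n)                 # t with n - 1 degrees of freedom
    return t, d_z


# Reported statistic: t(22) = 13.21. Assuming a paired design, n = 23.
d_from_reported_t = 13.21 / sqrt(23)
```

Under the paired-design assumption, dividing the reported t by the square root of 23 gives a value that rounds to 2.75, the effect size reported above.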
Questionnaire
To supplement the results of statistical tests, the results of the questionnaire
are reported here. Overall, 64% of the participants reported that they noticed
more bundles than they actually used. To the question of which bundles they
used in their own text reconstructions, 69% of L1 speakers, 44% of higher
proficiency L2 speakers, and 14% of lower proficiency L2 speakers reported
fewer bundles than they actually used. In terms of accuracy, 43% of L1
speakers, 17% of higher proficiency L2 speakers, and 65% of lower proficiency
L2 speakers reported that they either noticed or used bundles that were
not used in the original text. Although L1 speakers reported a variety of new
bundles, there were only two new bundles that lower proficiency L2 speakers
indicated as noticed or used: at the same time and in the case of. To the question
of why they used (or did not use) certain expressions in their recalled texts, the
majority of the participants reported that they were easy to remember (77%),
whereas some said that these expressions stood out in a sentence (12%), helped
to capture the main idea (3%), or helped to link different ideas (2%). To the last
question about which expressions they thought were particularly helpful for
understanding the meaning of the text, the answers for the two language groups
varied. L1 speakers gave more descriptive answers, explaining that the most
useful expressions were the phrases that “introduced the ideas, such as in this
chapter we or the nature of the,” “were the basis of the sentence,” “consisted of
words that often go together,” and “helped combine the phrases in a sentence.”
L2 speakers were more specific in their answers and listed the bundles that
they found particularly useful, among which the most frequent were on the
other hand (63%), nothing to do with (54%), and has to do with (52%), all
discourse-organizing bundles.
Summary of Findings
To summarize the findings of Experiment 2, the three participant groups were
different in their recall of lexical bundles, with higher proficiency L2 speakers
outperforming the L1 speakers and the lower proficiency L2 speakers. Addi-
tionally, all three participant groups recalled more discourse-organizing bundles
than referential bundles. Finally, although participants produced some bundles
more frequently than others, only the higher proficiency L2 group showed dif-
ferent recall rates of i-units for the sections that contained lexical bundles in
the original text compared to those sections that did not. A general discussion
of the findings of both experiments follows.
General Discussion
Holistic Status of Lexical Bundles
The study employed a production criterion to explore if L1 and L2 English
speakers demonstrated any knowledge of lexical bundles as holistic units. In
their study on the psychological validity of corpus-derived recurrent clusters,
Schmitt, Grandage, et al. (2004) argued that not all clusters were produced
intact, hence not all of them could be considered formulaic. At first glance,
Schmitt, Grandage, et al.’s findings seem to be similar to the findings obtained
in the present study. However, because the target sequences utilized in Schmitt,
Grandage, et al.’s study were heterogeneous in terms of their structure (i.e.,
some more structurally complete than others) and functions served in a text
(pragmatic versus discourse), one should be cautious when generalizing these
findings to lexical bundles. Furthermore, Schmitt, Grandage, et al.’s study
employed only one criterion—intact form—to judge whether corpus-derived
clusters were psychologically real; this criterion cannot always be applied to
lexical bundles, which, due to their structural characteristics, allow more vari-
ation in their form. The results from the present study indicated that L1 and L2
speakers did not use all lexical bundles the same way. Although some of the
bundles were consistently produced in a fixed form more frequently than others
(e.g., on the other hand, at the same time, nothing to do with), other bundles
were contextually appropriate but showed more variation in form (e.g., what I
want to, if you look at), whereas other bundles were contextually appropriate
less than 50% of the time (e.g., in the absence of, in terms of the, in the case
of, in the form of, on the basis of, the nature of the). This distribution of lexical
bundles suggests two things. First, form-fixedness could, on the one hand, indi-
cate that a bundle itself is psychologically fixed. On the other hand, the fact that
L1 speakers produced fewer fixed lexical bundles than advanced L2 speakers in
Experiment 1 could suggest that L1 speakers had a larger inventory of lexical
bundles that allowed them to select the most contextually appropriate variant.
Thus, form-fixedness could be an indicator of one’s language proficiency level,
which is discussed in greater detail in the following section. Next, the results of
this study also suggest that more than one criterion should be considered before
a bundle can be defined as a holistic unit: how frequently it is appropriately pro-
duced in a certain context and how frequently it is produced in a fixed form. The
data from both experiments imply that, instead of a binary distinction of either
being treated as a holistic unit or not, a bundle should be described in terms of
its place on a continuum from more holistic to more compositional units. For
example, based on the two criteria discussed here, the following bundles from
Experiment 1 could be defined as leaning more toward the holistic end—one
of the most, at the same time, know what I mean, I would like to, on the other
hand, what I want to, if you look at, a little bit about, the beginning of the—
whereas in the form of, in the case of, in terms of the, in the absence of, and on
the basis of would lean more toward the compositional end of the continuum.
Second, different production tasks seemed to feature different lexical bun-
dles as holistic units. The issue here is not necessarily that one task was more
accurate than the other. Rather, the two tasks measured two different types of
knowledge of lexical bundles: whereas the gap-filling task measured L1 and
L2 English speakers’ knowledge of the particular word needed to complete the
frame, the dictation task measured participants’ knowledge of an entire bundle.
The fact that only 45% of all possible bundles were recalled in Experiment 2 (as
opposed to 74% in Experiment 1) suggests that it was easier for the participants
to display the knowledge of a lexical bundle when a frame was provided, as
they were prompted to refer to this knowledge. Thus, the smaller number of
lexical bundles produced in Experiment 2 does not necessarily indicate that
participants did not have the knowledge of these structures; they simply might
not have been prompted to fully demonstrate this knowledge.
In addition, the results of Experiment 2 showed that the relationship between
the presence of lexical bundles in a text section and the number of i-units
recalled by the participants was found significant only for higher proficiency
L2 speakers. This, however, could lead to two different conclusions. On the
one hand, higher proficiency learners could, indeed, have employed lexical
bundles in order to reduce the processing burden during L2 comprehension
and subsequent production. In this case, it would suggest that the participants
could have recognized the holistic status of lexical bundles. On the other hand,
this difference among the groups might be attributable to differences in the
learning skills acquired by L2 speakers, which increased with L2 proficiency.
This issue is discussed in more detail in the next subsection.
Finally, because the results of Experiment 2 indicated that higher proficiency
L2 learners not only recalled more lexical bundles during the dictation activity,
with most of them being recalled in the original form as presented in the input,
but also recalled significantly more idea units for those sections of the text that
contained lexical bundles, it is reasonable to assume that
producing these units unmodified during text recall helps a language user to
retain more information, which, again, could be an indication of the holistic
nature of lexical bundles. Thus, an interesting question to explore is whether
the holistic nature of lexical bundles, as indicated by the ease of the processing
burden during text recall, is necessarily reflected in a greater number of lexical
bundles produced intact. Consequently, a post hoc analysis was carried out to
determine whether there was any relationship between the number of idea units
recalled during the dictation activity and the overall number of lexical bundles
produced intact by the three participant groups.
In the post hoc analysis, the total number of lexical bundles produced
intact and the number of idea units recalled during the dictation activity were
calculated and then correlated. Whereas the Pearson correlation coefficient
obtained for L1 speakers indicated a weak correlation between the two variables
(r = .26, p > .05), the coefficient obtained for the two L2 speaker groups
showed moderate correlations between the variables (r = .46, p < .05 for the
higher proficiency L2 speakers and r = .57, p < .05 for the lower proficiency
L2 speakers). Thus, the post hoc analysis indicated that whereas there was
a moderate positive relationship between the number of idea units recalled
and the number of lexical bundles produced intact for both L2 groups, the same did not hold
true for L1 speakers, for whom the recall of text information (i.e., idea units)
did not seem to be related to the intact production of lexical bundles. Taken
together, the findings of Experiment 2 suggest that the holistic nature of lexical
bundles might not necessarily be reflected in a greater number of these units
produced in the exact form in which they appeared in the input. This, again,
provides additional support for the idea that no single criterion (e.g., intact
production) is sufficient on its own to identify a lexical bundle
as a formulaic unit.
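The correlation used in the post hoc analysis can be sketched in Python as follows. The per-participant counts in the example are hypothetical and serve only to show the calculation; the function name pearson_r is likewise introduced for illustration, and the sketch does not compute the associated p value.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical counts for five participants, not the study data.
intact_bundles = [2, 4, 5, 7, 8]
idea_units = [10, 14, 13, 18, 20]
r = pearson_r(intact_bundles, idea_units)
```

In the study, a coefficient of this kind would have been computed separately for each of the three participant groups.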
Proficiency Differences
The results of both experiments in terms of the difference (or no difference in
Experiment 1) between L1 and advanced L2 speakers were unexpected. In their
dictation study in which the participants orally reconstructed a story, Schmitt,
Grandage, et al. (2004) found that, on the whole, L1 speakers performed better
in terms of both the accuracy of reproduction and the number of accurately
reproduced chunks. They also discovered that L1 speakers did very little mod-
ification of the original sequence and either used the exact string or did not use
it at all. Furthermore, L2 speakers were found to partially reproduce the strings
or produce them inaccurately. Contrary to these findings, higher proficiency
L2 speakers in Experiment 2 of the present study were found to not only recall
a larger number of lexical bundles but also to use very few modifications of
the original bundles. L1 speakers, however, showed more creativity within the
reproduced bundles and created strings that were very different from the target
bundles (e.g., the following variations of the target sequence than or equal to
were used: better or equal to, important or equal to, equal to or more, etc.).
A great number of the sequences that were modifications of the target bundles
could not be identified in the corpus, suggesting that L1 speakers showed id-
iosyncratic use of these bundles, which might be unrecognizable to others and
not flagged by frequent occurrence in a corpus, a feature that is considered to
be characteristic of L2 speakers’ production (Foster, 2001; Schmitt, Grandage
et al., 2004).
One possible explanation for these results is the nature
of the L2 use that the L2 speakers practiced in their L2 classroom. Being EFL
learners, the L2 participants learned English in a classroom and were constantly
engaged in activities that focused on memorization of lexical items and oral
reproduction of recorded texts. Thus, although gaining more proficiency in
English, higher proficiency learners could have developed both the skills to
hold longer stretches of words in short-term memory and to attend to the
language units used in the text to be able to exactly reproduce them. In contrast,
L1 speakers seemed to grasp an overall idea of the text without paying too much
attention to the units of language used in the original text. The L1 speakers’
answers to the questionnaire, which they completed after the dictation activity,
provide some evidence for the idea expressed above. L1 speakers not only used
more lexical bundles in their recalled texts than the number they reported but
also reported more bundles that were not present in the original text as noticed
and used in their own texts.
Comparing the two nonnative groups, one of the reasons why higher pro-
ficiency L2 speakers performed better than lower proficiency L2 speakers is
that they may have acquired greater lexical knowledge, which may lead to the
enhanced knowledge of lexical bundles.
sooner. The data from Experiment 2 provide more support for this argument:
Whereas the three most frequent bundles used in 90% of all text recalls were
discourse-organizing sequences (in this chapter we, on the other hand, and
nothing to do with), the four referential bundles (the rest of the, in terms of the,
the nature of the, and was one of the) were the least frequent and occurred in
only 20% of all text recalls. Furthermore, in their questionnaires, the majority
of the participants referred to discourse-organizing bundles as being the most
helpful for understanding the meaning of the text by either describing their
characteristics (e.g., introducing new ideas) or listing specific examples (e.g.,
on the other hand, nothing to do with, and has to do with).
Another reason why discourse organizers may have been easier for the
participants to produce than referential bundles might have to do with the char-
acteristics of the frames used to produce these two classes of lexical bundles.
Although both types of bundles are usually incomplete structural units, the
same frame might be used with different discourse organizers much less than
with referential bundles (e.g., compare on the other hand to the rest of the,
the beginning of the, the end of the, the top of the). Thus, the strength of as-
sociation of a specific frame with a particular lexical bundle might be higher
for discourse organizers than for referential bundles, which would make the
former more salient and easier to retrieve than the latter. This, however, is just
a hypothesis, which needs to be further tested on larger samples.
Finally, some discourse-organizing bundles might have the advantage of
being more salient to the participants due to the fact that their usefulness as
transition phrases is explicitly discussed in many language classes in both the
L1 and L2 educational contexts (e.g., Cortes, 2006).
Implications
Pedagogical Implications
One of the findings of the study was that lower proficiency English learners
were not able to accurately produce as many lexical bundles as did L1 speakers
and higher proficiency learners. This finding could be interpreted in two ways:
the learners either did not have the knowledge of how to use certain bundles
appropriately or they preferred to use other structures instead. By underusing
lexical bundles whose function is to signal relationships between smaller and
larger pieces of discourse, L2 learners run the risk of creating texts without
cohesiveness and clear organizational structure. Thus, L2 learners need to
become aware of how using lexical bundles can help them improve their writing.
This finding is consistent with Cortes (2004), who demonstrated that by just
being exposed to lexical bundles in a specific register L1 learners could not
master the use of these structures in their own writing. Cortes argued that in
order for the learners to use lexical bundles appropriately, they need to “notice”
the contexts in which these units are typically used, as well as the discourse
functions they perform in those contexts. Cortes (2006) also suggested that the
exposure to lexical bundles should be long enough for the students to be able
to start using them in their own writing, because these expressions might be
challenging for the learners to acquire. These suggestions could work equally
well for L2 learners who could benefit from more explicit teaching of how
different classes of lexical bundles should be utilized.
Theoretical Implications
One of the most recent tendencies in present-day L2 research is to employ
lexical bundles as target structures in the studies investigating the acquisition
of formulaic sequences (see Jones & Haywood, 2004; Schmitt, Dörnyei et al.,
2004; Warga, 2005). Although it might seem logical to treat the two structures
(i.e., lexical bundles and formulaic sequences) as equivalents to each other
because some of the criteria for their identification might overlap (e.g., phrase
length and frequency of occurrence), lexical bundles and formulaic sequences
do not necessarily reflect the same phenomenon. Although formulaic sequences
are traditionally described as complete units that are stored and retrieved holisti-
cally and used as shortcuts in language processing and production (Wray, 2000,
2002; Wray & Perkins, 2000), the results of the present study indicate that not
all lexical bundles have the same psycholinguistic status. Thus, treating the
two structures as equivalent to each other and employing lexical bundles as tar-
get structures in the research exploring the acquisition of formulaic sequences
should be done with caution.
In terms of determining the psychological validity of lexical bundles, the
results of the present study indicated that participants’ knowledge of these
units was affected by their register characteristics and discourse functions
more than by their frequency of occurrence. For example, whereas most of the
appropriately produced lexical bundles served a discourse-organizing function
(e.g., on the other hand, nothing to do with, at the same time, if you look at),
all of the least appropriately produced bundles served a referential function
and were characteristic of the academic writing register (e.g., in the absence
of, in the form of, on the basis of, the nature of the). This suggests that the
saliency of lexical bundles is determined by the interaction of at least three
factors: frequency of occurrence, distribution in a specific register, and their
discourse function. Furthermore, it appears that lexical bundles, although being
structurally incomplete units, can be further classified into different structural
patterns, with some of them having more productive frames (e.g., in the case of,
in the middle of, in the form of, in the absence of ) than others (e.g., or something
like that, is one of the, on the other hand). These structural differences of lexical
bundles could affect the way they are perceived by L1 and L2 English speakers.
References
Altenberg, B. (1998). On the phraseology of spoken English: The evidence of
recurrent word combinations. In A. P. Cowie (Ed.), Phraseology: Theory, analysis
and applications (pp. 101–122). Oxford: Oxford University Press.
Biber, D., & Conrad, S. (1999). Lexical bundles in conversation and academic prose.
In H. Hasselgård & S. Oksefjell (Eds.), Out of corpora. Studies in honour of Stig
Johansson (pp. 181–190). Amsterdam: Rodopi.
Biber, D., Conrad, S., & Cortes, V. (2004). If you look at. . .: Lexical bundles in
university teaching and textbooks. Applied Linguistics, 25, 371–405.
Biber, D., Conrad, S., Reppen, R., Byrd, P., Helt, M., Clark, V., et al. (2004).
Representing language use in the university: Analysis of the TOEFL 2000 Spoken
and Written Academic Language Corpus. TOEFL Monograph Series. Princeton,
NJ: Educational Testing Service.
Biber, D., Johansson, S., Leech, G., Conrad, S., & Finegan, E. (1999). The Longman
grammar of spoken and written English. London: Longman.
Brown, J. (1980). Relative merits of four methods for scoring cloze tests. Modern
Language Journal, 64, 311–317.
Chiu, C-Y., & Savignon, S. (2006). Writing to mean: Computer-mediated feedback in
online tutoring of multidraft compositions. CALICO Journal, 24, 97–114.
Conrad, S. (2000). Will corpus linguistics revolutionize grammar teaching in the 21st
century? TESOL Quarterly, 34, 548–560.
Cortes, V. (2004). Lexical bundles in published and student disciplinary writing:
Examples from history and biology. English for Specific Purposes, 23, 397–423.
Cortes, V. (2006). Teaching lexical bundles in the disciplines: An example from a
writing intensive history class. Linguistics and Education, 17, 391–406.
Coulmas, F. (1979). On the sociolinguistic relevance of routine formulae. Journal of
Pragmatics, 3, 239–266.
Coulmas, F. (1981). Introduction: conversational routine. In F. Coulmas (Ed.),
Conversational routine (pp. 1–17). The Hague: Mouton.
Cowie, A. (1998). Phraseology: Theory, analysis and applications. Oxford: Oxford
University Press.
De Cock, S. (2000). Repetitive phrasal chunkiness and advanced EFL speech and
writing. In C. Mair & M. Hundt (Eds.), Corpus linguistics and linguistic theory (pp.
51–68). Amsterdam: Rodopi.
De Cock, S., Granger, S., Leech, G., & McEnery, T. (1998). An automated approach to
the phrasicon of EFL learners. In S. Granger (Ed.), Learner English on computer.
(pp. 67–79). New York: Longman.
Dufon, M. (1995). The acquisition of gambits by classroom foreign language learners
of Indonesian. In M. Alves (Ed.), Papers from the 3rd annual meeting of the
Southeast Asian Linguistic Society (pp. 27–42). Tempe: Arizona State University,
Program for Southeast Asian Studies.
Erman, B. (2007). Cognitive processes as evidence of the idiom principle.
International Journal of Corpus Linguistics, 12, 25–53.
Ferguson, C. (1976). The structure and use of politeness formulas. Language in
Society, 5, 137–151.
Ferrante, J. (2003). Sociology: A global perspective (5th ed., pp. 22–24). Belmont, CA:
Wadsworth.
677 Language Learning 59:3, September 2009, pp. 647–686
Nekrasova Knowledge of Lexical Bundles
Foster, P. (2001). Rules and routines: A consideration of their role in the task-based
language production of native and non-native speakers. In M. Bygate, P. Skehan, &
M. Swain (Eds.), Researching pedagogic tasks: Second language learning,
teaching, and testing (pp. 75–93). San Francisco: Pearson Education.
Fotos, S. (1991). The cloze test as an integrative measure of EFL proficiency: A
substitute for essays on college entrance examinations? Language Learning, 41,
313–336.
Gibbs, R., Jr., & Gonzales, G. (1985). Syntactic frozenness in processing and
remembering idioms. Cognition, 20, 243–259.
Gibbs, R., Jr., Nayak, N., & Cutting, C. (1989). How to kick the bucket and not
decompose: Analyzability and idiom processing. Journal of Memory and Language,
28, 576–593.
Granger, S. (1998). Prefabricated patterns in advanced EFL writing: Collocations and
formulae. In A. P. Cowie (Ed.), Phraseology: Theory, analysis, and applications
(pp. 145–160). Oxford: Clarendon Press.
Hakuta, K. (1974). Prefabricated patterns and the emergence of structure in second
language acquisition. Language Learning, 24, 287–297.
Heilenman, L. (1983). The use of a cloze procedure in foreign language placement.
Modern Language Journal, 67, 121–126.
House, J. (1996). Developing pragmatic fluency in English as a foreign language.
Studies in Second Language Acquisition, 18, 225–252.
Hudson, J. (1998). Perspectives on fixedness: Applied and theoretical. Lund, Sweden:
Lund University Press.
Izumi, S. (2002). Output, input enhancement, and the noticing hypothesis: An
experimental study on ESL relativization. Studies in Second Language Acquisition,
24, 541–577.
Jones, M., & Haywood, S. (2004). Facilitating the acquisition of formulaic sequences:
An exploratory study in an EAP context. In N. Schmitt (Ed.), Formulaic sequences:
Acquisition, processing and use (pp. 269–300). Amsterdam: Benjamins.
Kowal, M., & Swain, M. (1997). From semantic to syntactic processing: How can we
promote it in the French immersion classroom? In R. Johnson & M. Swain (Eds.),
Immersion education: International perspectives (pp. 284–309). New York:
Cambridge University Press.
Lindemann, S., & Mauranen, A. (2001). It’s just real messy: the occurrence and
function of just in a corpus of academic speech. English for Specific Purposes, 20,
459–475.
Moon, R. (1998). Fixed expressions and idioms in English. A corpus-based approach.
Oxford: Clarendon Press.
Nattinger, J., & DeCarrico, J. (1992). Lexical phrases and language teaching. Oxford:
Oxford University Press.
Pawley, A., & Syder, F. H. (1983). Two puzzles for linguistic theory: Nativelike
selection and nativelike fluency. In J. C. Richards & R. W. Schmidt (Eds.),
Language and communication (pp. 191–226). New York: Longman.
Wray, A., & Perkins, M. (2000). The functions of formulaic language: An integrated
model. Language and Communication, 20(1), 1–28.
Appendix A
Lexical Bundles Tested in Experiment 1
Appendix B
Test Materials Used in Experiment 1
2. Your class work and homework will be assigned by your instructor. Keep
all your work in an organized binder. At the ___________ of each chapter,
you will be assigned a series of problems to help you write a Chapter
Summary. (____________________)
3. You are responsible for material covered in the readings and in the lectures,
with particular emphasis on the latter. If you ___________ a question do
not hesitate to ask. (___________________)
4. I’m going to return some papers here and talk just a little ___________
about them. (___________________)
5. Even the most highly motivated and intelligent patients are likely
to become noncompliant in the __________ of any symptoms.
(____________________)
6. My name is Melanie Graham, I’m a first year master student in RTC
and I have no idea what I __________ to do yet. I’m still learning.
(___________________)
7. Don’t cross your arms and legs at the same ____________ because some
interviewers may think you are shutting them out (a preconception drawn
from the book Body Language). Don’t manipulate objects (like a pen-
cil or keys) during the interview; try to remain natural and at ease.
(____________________)
8. What I want to ___________ is quickly run through the exercise that
we’re going to do. (___________________)
9. Homework will be assigned regularly. It will be collected at the
___________ of the next class and will not be accepted late.
(____________________)
10. Socialism, on the other ___________, is a type of theory which could only
have arisen in societies where the division of labor is highly developed.
(____________________)
11. Personally, I find that I sometimes get new ideas while I am engaged in
activities that have __________ to do with my research at all, such as
gardening, painting in the house, or even shaving when I get up in the
morning. (____________________)
12. An eligible undergraduate student may be awarded a grant of $100 to
$4,000 on the __________ of financial need. A student must complete
the FAFSA in order to be considered. (____________________)
13. To avoid creating an ethical dilemma for yourself as you prepare your
resume, remember the following: Be honest in ____________ of the
information you include. (____________________)
26. One of the ___________ important factors in your grade is the amount
of time you spend reading the text and applying your knowledge.
(____________________)
27. You will be excused from exams only with a physician’s note or verifiable
personal emergency. In the __________ of an excused absence, your exam
score will be the average of your other exams. (____________________)
28. Three o’clock – I just finish off everything that I did not have time to do.
Then the ____________ of the day is my own. (____________________)
29. [. . .] it’s possible to _____________ at the same graph or table for example
and see different things. (__________________)
30. The most satisfactory way of teaching problem-solving is by the method
of guided discovery, in which the teacher presents the problem usually in
the ____________ of a question. (____________________)
31. These critiques should be typed and doubled spaced and must be one to
two pages in length. Be sure to give the name of the speaker, the title of
the lecture, and the date of the presentation at the __________ of the first
page. (____________________)
32. The will must be in writing. It must be signed by the testator or by some
person in his presence and by his express direction. The signature must be
made or acknowledged by the testator in the __________ of two or more
witnesses, both present at the same time. (____________________)
Appendix C
Lexical Bundles Tested in Experiment 2
Referential bundles    Discourse-organizing bundles
Appendix D
Test Materials Used in Experiment 2
Note. The number in parentheses after each section is the number of words in that section.
Sociology has almost nothing to do with the individual – it looks beyond the
individual and focuses on a larger population of humans. (22)
People belong to various groups, and human behavior can be interpreted in
terms of the interaction patterns that take place in those groups. (23)
As one example of how the sociological knowledge can be used in a real-life
situation, imagine that you work for a company that employs thousands of
workers. (27)
The problem is that employees from different units have never met each
other, so a company picnic is arranged to help employees meet one another.
(24)
At this event, however, everyone talks only with people they already know.
Thus, company executives do not know what to do with the problem. (24)
As a person who understands how groups work, you would think of ways to
“make” people break out of their limited social circles. (23)
For example, you can separate the crowd into 12 groups according to birthday
month and have people introduce themselves by telling the group members
something memorable. (26)
This exercise may not be on the top of the best conversation topics list, but it
will force people to talk to one another. (24)
Sociology can lead to many careers. On the other hand, sociology is not
connected with specific skills, so students must be able to explain what they
can do with this degree. (31)
Adapted from: Ferrante, J. (2003). Sociology: A Global Perspective (5th ed.).
Belmont, CA: Wadsworth, 22–24.
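
The parenthesized word counts can be checked with a short script. The following is a minimal sketch, not the procedure used in the study: it assumes that a word is any run of letters or digits, that hyphenated forms such as "real-life" count as one word, and that dashes and other punctuation are not counted. The function name `count_words` is hypothetical.

```python
import re

# Assumption: a "word" is a run of letters/digits, optionally joined by
# internal hyphens or apostrophes (so "real-life" counts as one word).
WORD = re.compile(r"[A-Za-z0-9]+(?:[-'’][A-Za-z0-9]+)*")


def count_words(section: str) -> int:
    """Return the number of words in one dictation section."""
    return len(WORD.findall(section))


# First section of the passage shown in Appendix D.
section = (
    "Sociology has almost nothing to do with the individual – it looks "
    "beyond the individual and focuses on a larger population of humans."
)

print(count_words(section))  # 22
```

Under these assumptions the first section yields 22 words, which matches the count printed after it.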
Appendix E
Examples of the Computation of Information Units (i-units)
Section 1: When employers, parents, and the rest of the world ask “Why did
you major in sociology?” or “Why take sociology classes?” the reply must be
convincing.
Basic i-unit → When employers, parents, and the rest of the world ask
Supporting i-unit 1 → why did you major in sociology
Supporting i-unit 2 → why take sociology classes
Supporting i-unit 3 → the reply must be convincing
Total i-units per section = 4 i-units
Section 13: Sociology can lead to many careers. On the other hand, sociology
is not connected with specific skills, so students must be able to explain
what they can do with this degree.
Basic i-unit 1 → Sociology can lead to many careers
Basic i-unit 2 → Sociology is not connected with specific skills
Supporting i-unit 1 → students must be able to explain
Supporting i-unit 2 → what they can do
Supporting i-unit 3 → with this degree
Total i-units per section = 5 i-units
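
The totals in the two examples above are the sum of the basic and supporting i-units in a section. The following is a minimal sketch of that tally, assuming the segmentation into i-units has already been done by hand; the `Section` class and its field names are hypothetical and are not part of the original coding scheme.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Section:
    """One dictation section, already segmented into i-units by a coder."""
    basic: List[str]
    supporting: List[str]

    def total_i_units(self) -> int:
        # Total i-units per section = basic i-units + supporting i-units.
        return len(self.basic) + len(self.supporting)


section_1 = Section(
    basic=["When employers, parents, and the rest of the world ask"],
    supporting=[
        "why did you major in sociology",
        "why take sociology classes",
        "the reply must be convincing",
    ],
)

section_13 = Section(
    basic=[
        "Sociology can lead to many careers",
        "Sociology is not connected with specific skills",
    ],
    supporting=[
        "students must be able to explain",
        "what they can do",
        "with this degree",
    ],
)

print(section_1.total_i_units())   # 4
print(section_13.total_i_units())  # 5
```

The sketch only reproduces the arithmetic of the two examples; deciding where one i-unit ends and the next begins remains a manual coding decision.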