The Implications of Bilingualism and Multilingualism For Potential Evolved Language Mechanisms
1. Introduction
In many parts of the world, fluency in multiple languages is the norm. India has
twenty-two official languages, and only 18% of the population are native Hindi
speakers. Half of the population of sub-Saharan Africa is bilingual as well.
Though bilingualism (or multilingualism, as is often the case) has been
investigated in some detail within linguistics and psycholinguistics, it has to date
received scant attention from researchers studying language evolution. An
extremely important issue remains undiscussed. Whatever theoretical
framework one chooses to subscribe to, it is clear that the mental mechanisms
used for language processing allow for the native acquisition of multiple distinct
languages nearly simultaneously. What is not immediately evident is why they
can be used in this way.
On the simplest level, there are two opposing possibilities: either the ability
to acquire, comprehend and produce speech in multiple languages was selected
for or it came for free as a by-product of whatever mechanisms we use for
language. In this paper, we consider a number of the contending theories of
language evolution in terms of their compatibility with bilingual acquisition.
We test one particular type of general learning mechanism, namely sequential
learning, which has been considered a potential mechanism for much of
language processing. We propose a simple recurrent network model of
bilingual processing trained on two artificial grammars with substantially
different syntax, and find a great deal of fine-scale separation by language and
grammatical role between words in each lexicon. These results are substantiated
by recent findings in neuroimaging and eye-tracking studies of fluent bilingual
subjects. We conclude that the bilingual case provides support for the
sequential learning paradigm of language evolution, which posits that the
existence of linguistic universals may stem primarily from the processing
constraints of pre-existing cognitive mechanisms parasitized by language.
5.1. Languages
We used two grammars based on English and Japanese, which were
modeled on child-directed speech corpora (Christiansen & Dale, 2001). Both
grammars contained declarative, imperative and interrogative sentences. The
two grammars were chosen because of their different systems of word order
(SVO vs. SOV). The English lexicon contained 44 words, while the Japanese
was slightly smaller (30 words) due to the language's lack of plural forms.
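The contrast between the two word-order systems can be illustrated with a toy generator. This is a hypothetical miniature, not the grammars actually used: the simulation's sentences came from Dale's sentgen script and his English and Japanese grammars (44 and 30 words respectively), which are not reproduced here.

```python
import random

# Hypothetical miniature lexicons standing in for the real grammars.
ENGLISH = {"S": ["boy", "girl"], "V": ["sees", "chases"], "O": ["dog", "cat"]}
JAPANESE = {"S": ["otoko", "onna"], "V": ["miru", "ou"], "O": ["inu", "neko"]}

def generate(grammar, order):
    """Produce one declarative sentence in the given constituent order."""
    return [random.choice(grammar[role]) for role in order]

english_sentence = generate(ENGLISH, "SVO")    # verb-medial, e.g. ['girl', 'sees', 'cat']
japanese_sentence = generate(JAPANESE, "SOV")  # verb-final, e.g. ['onna', 'inu', 'miru']
```

The only structural difference between the two toy grammars is the constituent order string, mirroring the SVO vs. SOV contrast that motivated the choice of English and Japanese.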
5.2. Model
Our network contained 74 input units corresponding to each word in the
bilingual lexicon, 120 hidden units, 74 output units, and 120 context units¹. The
network's goal was to predict the next word in each sentence. It was trained on
~400,000 sentences (200,000 in each language). Following French (1998), the
language changed with a 1% probability after any given sentence. The
learning rate was set to .01 and momentum to .5.
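The architecture just described, an Elman-style simple recurrent network trained by next-word prediction, can be sketched in NumPy. This is an illustrative reconstruction under the stated parameters (74 one-hot inputs, 120 hidden and context units, learning rate .01, momentum .5), not the authors' code; the random word sequences below are placeholders for the grammar-generated corpus, and biases are omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
N_WORDS, N_HIDDEN = 74, 120          # dimensions from the paper
LR, MOMENTUM = 0.01, 0.5

# Weights: input->hidden, context->hidden, hidden->output.
W_ih = rng.normal(0, 0.1, (N_HIDDEN, N_WORDS))
W_ch = rng.normal(0, 0.1, (N_HIDDEN, N_HIDDEN))
W_ho = rng.normal(0, 0.1, (N_WORDS, N_HIDDEN))
vel = [np.zeros_like(W) for W in (W_ih, W_ch, W_ho)]

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def train_sentence(word_ids):
    """One pass of next-word prediction over a sentence of word indices."""
    context = np.zeros(N_HIDDEN)                       # context units start at rest
    for cur, nxt in zip(word_ids, word_ids[1:]):
        x = np.zeros(N_WORDS); x[cur] = 1.0            # one-hot input
        h = np.tanh(W_ih @ x + W_ch @ context)         # hidden activation
        y = softmax(W_ho @ h)                          # predicted next-word distribution
        t = np.zeros(N_WORDS); t[nxt] = 1.0
        dy = y - t                                     # cross-entropy gradient at output
        dh = (W_ho.T @ dy) * (1 - h**2)                # backprop through tanh
        grads = [np.outer(dh, x), np.outer(dh, context), np.outer(dy, h)]
        for i, (W, g) in enumerate(zip((W_ih, W_ch, W_ho), grads)):
            vel[i] = MOMENTUM * vel[i] - LR * g        # momentum update
            W += vel[i]
        context = h                                    # copy hidden state to context units

# Training regime: after each sentence, switch languages with p = .01
# (the paper used ~400,000 sentences; placeholder sequences shown here).
language = "english"
for _ in range(10):
    train_sentence(rng.integers(0, N_WORDS, size=5).tolist())
    if rng.random() < 0.01:
        language = "japanese" if language == "english" else "english"
```

The copy-back of the hidden layer into the context units is what gives the network its memory for preceding words, the core of the sequential-learning mechanism under test.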
5.3. Results & Discussion
To test for differences between the internal representations of words in the
lexicon, a set of 10,000 test sentences was used to create averaged hidden unit
representations for each word. As a baseline comparison, the labels for the
same 74 vectors were randomly reordered so that they corresponded to a
different word (e.g. the vector for the noun X in English might instead be
associated with the verb Y in Japanese). We then performed a linear
discriminant analysis on the hidden unit representations and compared the
results in chi-square tests for goodness-of-fit. Classifying by language resulted
in 77.0% accuracy compared to 59.5% for the randomized vectors
[χ²(1, n=74) = 5.26, p < .05]. We also created a crude grouping by part of speech.
Though nouns, verbs and adjectives were easy to group, there were a number of
words that served a more functional purpose in the sentence, such as
determiners, common interrogative adverbs (e.g. when, where, why), and
certain pronouns (e.g. that). We classified this set as "function words". This
part-of-speech classification resulted in 48.65% correct classification, compared
with 35.14% for the randomized vectors, but this result was not significant.
¹ One reviewer asked about the significance of the number of hidden units used in the model.
Generally speaking, learning through back-propagation is rather robust to different quantities of
hidden units. It is unlikely that choosing any number of hidden units slightly below or even quite
a bit above the number of input units would yield different results other than in the efficiency of
training (in this case, the amount of training required to reach a proficient state).
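The analysis pipeline above can be sketched as follows. The hidden-unit vectors here are random stand-ins (the real ones come from the trained network), but the chi-square comparison of the reported accuracies (77.0% vs. 59.5% of 74 words, i.e. 57 vs. 44 correct classifications) does recover the statistic reported in the text; a 2×2 test of independence without continuity correction is one way to reproduce it.

```python
import numpy as np
from scipy.stats import chi2_contingency
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Illustrative stand-ins: 74 averaged hidden-unit vectors (120-d), one per
# word, each with a language label (44 English, 30 Japanese words).
rng = np.random.default_rng(0)
vectors = rng.normal(size=(74, 120))
language = np.array([0] * 44 + [1] * 30)

# Linear discriminant analysis: classify each word's vector by language.
lda = LinearDiscriminantAnalysis()
accuracy = lda.fit(vectors, language).score(vectors, language)

# Compare the reported LDA accuracies for real vs. randomized labels:
# 57/74 correct vs. 44/74 correct, tested in a 2x2 chi-square.
table = [[57, 74 - 57], [44, 74 - 44]]
chi2, p, dof, _ = chi2_contingency(table, correction=False)
# chi2 comes out at about 5.27 with df = 1 and p < .05,
# matching the value reported in the text.
```

With the real hidden-unit representations, the `accuracy` value is the quantity reported in the text (77.0% by language for the true labeling).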
General Discussion
The bilingual case, as the most prevalent form of language fluency in the world,
must be considered in any explanation for the existence of human language. We
have argued that it seems difficult to develop a selectionist account of
bilingualism. In contrast, a theory of language origins and evolution via
sequential learning may be more parsimonious in this regard because it seems to
account for bilingualism without needing any major post-hoc revisions. Our
simulation of bilingual acquisition via sequential learning demonstrated
language separation at a very local scale (i.e. within part of speech and
language), rather than the creation of two completely separate lexicons.
Converging evidence from neurological and low-level perceptual studies of
bilingual processing seem to support this finding. Recent neuroimaging data
points to a great deal of overlap in the brain areas used to process different
languages in fluent bilinguals (Chee et al., 1999a, 1999b; Hasegawa et al.,
2002). Eye-tracking studies of fluent bilinguals have also demonstrated partial
activation for phonologically-related words in a language not used in the
experimental task (Spivey & Marian, 1999).
There are many aspects of language that need to be considered in a final
model of bilingual acquisition that were not included in our first model.
However, there are at the moment few contending explanations for how this
ability came to exist. Our work thus far serves as a first step in demonstrating
that sequential learning might be able to account for the ability to process not
only a single language as shown in previous work, but also the ability to process
multiple languages simultaneously.
Acknowledgements
We thank Rick Dale for providing his sentgen script as well as his English and
Japanese grammars, which were used to create the sentences in the simulation.
We also thank Luca Onnis and three anonymous referees for their helpful
comments and feedback on earlier drafts of this paper.
References
Baker, M.C. (2003). Linguistic differences and language design. Trends in
Cognitive Sciences, 7, 349-353.
Botvinick, M., & Plaut, D. C. (2004). Doing without schema hierarchies: A
recurrent connectionist approach to normal and impaired routine sequential
action. Psychological Review, 111, 395-429.
Chee, M.W.L., Tan, E.W.L., & Thiel, T. (1999a). Mandarin and English single
word processing studied with functional magnetic resonance imaging.
Journal of Neuroscience, 19, 3050-3056.
Chee, M.W.L., Caplan, D., Soon, C.S., Sriram, N., Tan, E.W.L., Thiel, T., &
Weekes, B. (1999b). Processing of visually presented sentences in Mandarin
and English studied with fMRI. Neuron, 23, 127-137.
Christiansen, M.H., Allen, J. & Seidenberg, M.S. (1998). Learning to segment
speech using multiple cues: A connectionist model. Language and Cognitive
Processes, 13, 221-268.
Christiansen, M.H. & Chater, N. (Eds.). (2001). Connectionist
Psycholinguistics. Westport, CT: Ablex.
Christiansen, M.H. & Chater, N. (in preparation). Language as an organism:
Language evolution as the adaptation of linguistic structure. Unpublished
manuscript, Cornell University.
Christiansen, M.H., Conway, C.M. & Curtin, S.L. (in press). Multiple-cue
integration in language acquisition: A connectionist model of speech
segmentation and rule-like behavior. In J.W. Minett & W.S.-Y. Wang
(Eds.), Language Evolution, Change, and Emergence: Essays in
Evolutionary Linguistics. Hong Kong: City University of Hong Kong Press.
Christiansen, M.H. & Dale, R. (2001). Integrating distributional, prosodic and
phonological information in a connectionist model of language acquisition.
In Proceedings of the 23rd Annual Conference of the Cognitive Science
Society (pp. 220-225). Mahwah, NJ: Lawrence Erlbaum.
Christiansen, M.H., Kelly, L., Shillcock, R., & Greenfield, K. (2004). Artificial
grammar learning in agrammatism. Unpublished manuscript, Cornell
University.
Conway, C.M. & Christiansen, M.H. (2001). Sequential learning in nonhuman
primates. Trends in Cognitive Sciences, 5, 539-546.
Elman, J.L. (1990). Finding structure in time. Cognitive Science, 14, 179-211.
Elman, J.L. (1993). Learning and development in neural networks: The
importance of starting small. Cognition, 48, 71-99.
French, R.M. (1998). A simple recurrent network model of bilingual memory.
In Proceedings of the 20th Annual Conference of the Cognitive Science
Society. Hillsdale, NJ: Lawrence Erlbaum.
Friederici, A.D., Steinhauer, K., & Pfeifer, E. (2002). Brain signatures in
artificial language processing. Proceedings of the National Academy of
Sciences, 99, 529-534.
Hasegawa, M., Carpenter, P.A., & Just, M.A. (2002). An fMRI study of
bilingual sentence comprehension and workload. Neuroimage, 15, 647-660.
Hoen, M., Golembiowski, M., Guyot, E., Deprez, V., Caplan, D., & Dominey,
P.F. (2003). Training with cognitive sequences improves syntactic
comprehension in agrammatic aphasics. NeuroReport, 495-499.
Koelsch, S., Schroger, E., & Gunter, T.C. (2002). Music matters: preattentive
musicality of the human brain. Psychophysiology, 39, 38-48.
Maess, B., Koelsch, S., Gunter, T., & Friederici, A.D. (2001). Musical syntax is
processed in Broca's area: an MEG study. Nature Neuroscience, 4, 540-545.
Marian, V., Spivey, M.J., & Hirsch, J. (2003). Shared and separate systems in
bilingual language processing: Converging evidence from eyetracking and
brain imaging. Brain and Language, 86, 70-82.
Patel, A.D. (2003). Language, music, syntax and the brain. Nature
Neuroscience, 6, 674-681.
Patel, A.D., Gibson, E., Ratner, J., Besson, M., & Holcomb, P.J. (1998).
Processing syntactic relations in language and music: an event-related
potential study. Journal of Cognitive Neuroscience, 10, 717-733.
Petersson, K.M., Forkstam, C., & Ingvar, M. (2004). Artificial syntactic
violations activate Broca's region. Cognitive Science, 28, 383-407.
Pinker, S., & Bloom, P. (1990). Natural language and natural selection.
Behavioral and Brain Sciences, 13, 707-784.
Scutt, T., & Rickard, O. (1997). Hasta la vista, baby: bilingual and second-language
learning in a recurrent neural network trained on English and
Spanish sentences. In Proceedings of the GALA 97 Conference on
Language Acquisition.
Spivey, M.J. & Marian, V. (1999). Crosstalk between native and second
languages: Partial activation of an irrelevant lexicon. Psychological Science,
10, 281-284.