Hindi-English Code-Switching
Abstract
The aim of this paper is to investigate the rules and constraints of code-switching (CS) in Hindi-English mixed language data. In this
paper, we describe how we collected the mixed language corpus. This corpus is primarily made up of student interview speech. The
speech was manually transcribed and verified by bilingual speakers of Hindi and English. The code-switching cases in the corpus are
discussed and the reasons for code-switching are explained.
articulated in Hindi instead. The English uttered here is not being used to fill the lexical gaps of Hindi, but rather to extend the speaker's style range.
Other reasons suggested by speakers, as reported by Eilert (2006), are:
(i) When there is no appropriate word in Hindi
(ii) When it is easier to communicate with a fellow bilingual to speed up communication
(iii) When the speaker is short of words
(iv) Hindi-English code-switching allows for a wider scope of expression
(v) Others code-switch unintentionally, as it has become a part of their speaking habit
3. Student Interviews
The interview speech data was collected at the Hong Kong University of Science and Technology (HKUST) in the summer of 2012 over the course of one month. The interviewees were summer intern students (in their penultimate year) at the School of Engineering (HKUST), coming from their home institution, the Indian Institute of Technology, Mumbai (IIT). A total of 9 students of Indian origin, who spoke Hindi natively and English near-natively, took part in the experiment.
The criteria we applied when selecting candidates to interview for this project were:
(i) The interviewee must be a native speaker of Hindi
(ii) The interviewee must also speak English fluently
(iii) The interviewee must also be a university student
Since we are based in HK, it is relatively difficult to get hold of native speakers of Hindi who go to school or university here. The majority of the young non-resident Indians (NRI) in HK grow up in HK speaking English, since Hindi is not offered as a second language in any of the schools or tertiary institutions. Therefore, we picked the summer intern students from India over the HK NRIs because of their proficiency in both Hindi and English.
In our research group, we have also been investigating the effect of stress on university students. We have previously conducted research on HK students to check whether we can identify stressed students by analysing their voices. By identifying students who are stressed, the university is able to offer counselling to the affected students and consequently help them recover from stress. We thought it would be interesting to investigate Indian students as well, hence we added the third criterion to our interviewee selection process.
The recordings took place in a quiet conference room with good acoustics, using a high quality microphone (Creative Labs, SB0490). The speech data was recorded in a lossless format with a sampling rate of 16 kHz and 16-bit digitization. The audio was recorded with Audacity, a free, open-source, cross-platform application for recording and editing sounds. Each interviewee was asked a series of 12 questions, and their responses were recorded by the interviewer. In each interview setup, there was only one interviewer and one interviewee inside the conference room. The following questions were asked:
1. Have you seen any good movie/TV series recently? What is your favourite type of movies/TV series?
2. Where have you been for holiday before? (Anywhere you plan to go?) What do you like about the place?
3. Please talk about your hometown, any specialty? What do you recommend? Anywhere worth going? Do you prefer Hong Kong or your hometown? Why?
4. What do you like to do for leisure? Why?
5. What kind of food do you like? Any recommendation? Do you know how to make it or where to get it?
6. What courses do you think are the most difficult? Why?
7. What's your plan after graduation?
8. How's recent work going? Got a deadline to catch? Anything you find difficult? How long will you take to graduate? Have you begun writing a paper? How is it going? Did you sit for an exam recently? Which one was the most difficult? Do you have a lot of homework? Is it hard?
9. How are you adapting to college life? Why did you come to this university? Why did you choose to study your major?
10. How do you get along with other people? Compared to high school, which one is better? How do you get along with local students? Can you integrate into the Hong Kong society? Which place do you prefer between your hometown and Hong Kong? Do you have any close friends here?
11. What kind of things are you anxious about? Employment? Academic life? Relationships? Love life?
12. Do your parents/friends give you any pressure? What kind of pressure?
After all the questions have been answered, the interviewee is given a survey form. In this survey form, the interviewee ticks yes or no for each of the 12 questions to indicate whether he or she was stressed while answering it. The same survey form is also filled in by the interviewer. These two forms capture the perception of stress from the perspective of the interviewee and that of the interviewer.
4. Data Analysis
After the data had been collected, we investigated the most common types of Hindi-to-English code-switching, which gives us insight into where in a sentence a bilingual speaker of English and Hindi is most likely to code-switch. Our observations are listed in this section.
We noticed that pronouns and determiners (e.g. mainne, maim, mujhe, mera, aapne) are not switched to English, whereas the head nouns and adjectives are code-switched (e.g. holidays, graduation, college life, friends, action movies, friendship, calculus, parents, difficult, further studies).
Now consider the following case:
(In Hindi)
Maim ais University ka internship kar raha hoon.
(In English)
I am doing an internship at this university.
Like pronouns and determiners, genitives like 'ka' here are not prone to code-switching to English.
Code-switching within noun phrases is common in the corpus. We can find three different combinations of elements in the noun phrase:
(i) All constituents of the noun phrase are in Hindi (e.g. mera kaam)
(ii) All constituents of the noun phrase are in English (e.g. love life, South Indian vegetarian food, academic life, college life, close friends, major problem, complex concepts)
(iii) The head noun in the noun phrase is in English (e.g. jyada negative, mera hometown, apane friends, kuch pressure, bahut recommend)
One other combination which can be seen in Hindi-English code-switching is when the modifying adjective in the noun phrase is in English (e.g. difficult pariksha). This combination was not prevalent in our recordings.
In Hindi, the compound verb consists of the verb root and operator. The first element of the compound verb determines its meaning, as modified by the operator (Kumar, 1986). In our corpus we have seen code-switching within the verb phrase, where the first element of the verb phrase is usually switched to English, for example:
Integrate karana haim
Recommend karunga tumhe
Surfing karata haim
Code-switches within the noun phrase and the verb phrase are known as insertions. They are the most common type of code-switching encountered in the recorded corpus.
Another form of code-switching that can occur in Hindi-English CS is alternation. Alternations were first described by Muysken (2000). Extended switches into the other language are a common property of alternations. Alternations can happen inter-sententially (at sentence boundaries) as well as intra-sententially (within the utterance/sentence).
Intra-sentential CS example:
Recently, maine ek Russian movie dekhi haim.
Recently, I have seen a Russian movie.
Inter-sentential CS example:
Hum kya kar rahe haim is none of your business.
What we are doing is none of your business.
Alternations were observed to be less common than insertions in the corpus.
Some statistics on the data collected are given below:

            Hindi Words   English Words   % Hindi   % English
Speaker 1       207            104          66.6       33.4
Speaker 2       256            138          65.0       35.0
Speaker 3       263             93          73.9       26.1
Speaker 4       267             88          75.2       24.8
Speaker 5       231            117          66.4       33.6
Speaker 6       184            113          62.0       38.0
Speaker 7       202            165          55.0       45.0
Speaker 8       290            115          71.6       28.4
Speaker 9       308            109          73.9       26.1
Average         245            116          67.7       32.3

Table 1: Relative proportion of code-switching in the corpus

Total duration of transcribed speech is roughly 30 minutes.
We also checked every answer (a total of 12 for each speaker) for inter-sentential and intra-sentential code-switching. If all 12 answers contained at least one intra-sentential code-switch, the score in the 'Intra-sentential' column of Table 2 is 12 for that speaker. From Table 2, it is evident that intra-sentential code-switching is the most prevalent form of code-switching in our corpus.

            Inter-sentential   Intra-sentential
Speaker 1          0                  12
Speaker 2          2                  12
Speaker 3          0                  12
Speaker 4          0                  12
Speaker 5          0                  12
Speaker 6          0                  12
Speaker 7          2                  12
Speaker 8          1                  12
Speaker 9          0                  12

Table 2: Number of answers where each speaker used inter- and intra-sentential code-switching

[Figure 1: bar chart titled "Motivation for Code-Switching". Categories: Ease of Use, Comment, Referential Function, Topic Shift, Dispreference, Personalisation, Emphasis, No Substitute Word, Name Entity, Clarification. Axis: Number of Code Switches Per Speaker on Average (0 to 15).]
Figure 1: Motivation for code-switching among speakers in the collected corpus

Glancing at Figure 1, the motivation for code-switching among Hindi interviewees becomes clear. Most Hindi speakers, during the interviews, tend to switch to English because the English word is easier to use than its Hindi counterpart. In other cases, they switch to English to articulate named entities. Another common reason to switch to English is to clarify their explanations, because it is sometimes easier to grasp a concept in English than in Hindi. Also, whenever there is no well-known Hindi word for an English word, Hindi speakers will switch to English just to say that word and then switch back to Hindi.
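The percentages reported in Table 1 can be re-derived from the word counts alone. The following is a minimal Python sketch of that arithmetic, not the authors' own tooling: the word counts are copied from Table 1, while the function names and the reading of the "Average" row as an unweighted mean over speakers are assumptions made for illustration.

```python
# Word counts copied from Table 1: speaker -> (Hindi words, English words).
WORD_COUNTS = {
    "Speaker 1": (207, 104),
    "Speaker 2": (256, 138),
    "Speaker 3": (263, 93),
    "Speaker 4": (267, 88),
    "Speaker 5": (231, 117),
    "Speaker 6": (184, 113),
    "Speaker 7": (202, 165),
    "Speaker 8": (290, 115),
    "Speaker 9": (308, 109),
}


def language_shares(hindi, english):
    """Return (% Hindi, % English) of one speaker's words, to one decimal."""
    total = hindi + english
    return round(100 * hindi / total, 1), round(100 * english / total, 1)


def average_row(counts):
    """Return the 'Average' row of the table.

    Assumption: the row is the unweighted mean over speakers, for the
    word counts as well as for the percentages.
    """
    n = len(counts)
    mean_hindi = sum(h for h, _ in counts.values()) / n
    mean_english = sum(e for _, e in counts.values()) / n
    mean_pct_hindi = sum(100 * h / (h + e) for h, e in counts.values()) / n
    mean_pct_english = sum(100 * e / (h + e) for h, e in counts.values()) / n
    return (round(mean_hindi), round(mean_english),
            round(mean_pct_hindi, 1), round(mean_pct_english, 1))


if __name__ == "__main__":
    for speaker, (hindi, english) in WORD_COUNTS.items():
        print(speaker, hindi, english, *language_shares(hindi, english))
    print("Average", *average_row(WORD_COUNTS))
```

Under this reading the sketch reproduces the rows of Table 1, including the averages of 67.7% Hindi and 32.3% English. Pooling all words across speakers before dividing would give a slightly different Hindi share (about 67.9%), which suggests the table averages the per-speaker percentages.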
5. Conclusion
In this paper, the collection of a Hindi-English code-
switching corpus is described. The corpus consists of
interviews with 9 students, all proficient in both Hindi and
English. Each interviewee was asked a series of 12
questions and their responses were recorded. The collected
audio data was then transcribed by hand. The data was
used to study the internal rules which Hindi-English code-
switching follows; this can help us determine the most
likely code-switching points within a sentence. On average,
roughly 67% of the words each speaker produced were
Hindi and 33% were English. It is also observed that
intra-sentential code-switching is the most prevalent form
of code-switching in our corpus. Since the interviewees
were recorded just before their examination period, the
questionnaire was designed to bring about stress during the
interaction. Hence this corpus is also suitable for carrying
out experiments and building classifiers to detect stress
among Indian university students. We are going to continue
collecting audio data from new students to expand this
corpus.
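The per-answer tally behind the intra-sentential finding can be illustrated with a short Python sketch. It assumes a hypothetical annotation in which each transcribed answer is a list of sentences and each token carries a language tag ("hi" or "en"); this tagging scheme, the function names, and the simple boundary rule for inter-sentential switches are illustrative assumptions, not the procedure used to build the corpus.

```python
def switch_points(sentence):
    """Return indices i where token i differs in language from token i - 1.

    `sentence` is a list of (word, language_tag) pairs.
    """
    return [i for i in range(1, len(sentence))
            if sentence[i][1] != sentence[i - 1][1]]


def classify_answer(sentences):
    """Report which kinds of code-switching occur in one answer.

    Intra-sentential: the language changes between adjacent tokens of a
    sentence. Inter-sentential (simplified): the language changes across
    a sentence boundary.
    """
    sentences = [s for s in sentences if s]  # ignore empty sentences
    intra = any(switch_points(s) for s in sentences)
    inter = any(prev[-1][1] != nxt[0][1]
                for prev, nxt in zip(sentences, sentences[1:]))
    return {"intra_sentential": intra, "inter_sentential": inter}


def tally(answers):
    """Count how many answers contain each kind of switch (0 to len(answers))."""
    results = [classify_answer(a) for a in answers]
    return {
        "inter_sentential": sum(r["inter_sentential"] for r in results),
        "intra_sentential": sum(r["intra_sentential"] for r in results),
    }


# The intra-sentential example from Section 4, with hand-assigned tags
# (the tags are an assumption of this sketch).
example = [[("Recently", "en"), ("maine", "hi"), ("ek", "hi"),
            ("Russian", "en"), ("movie", "en"),
            ("dekhi", "hi"), ("haim", "hi")]]

if __name__ == "__main__":
    print(switch_points(example[0]))
    print(classify_answer(example))
```

Applied to a speaker's 12 tagged answers, `tally` would yield one pair of counts of the kind reported per speaker, and the indices from `switch_points` mark candidate code-switching positions within a sentence.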
6. Acknowledgements
We would like to thank Abhilash Veeragouni of IIT Bombay
for helping us collect and transcribe the corpus. Special
thanks also go to all the intern students from IIT who
volunteered to help with this project.
7. References
Malhotra, Sunil. “Hindi-English, Code-switching and
Language Choice in Urban, Upper middle-class Indian
Families.” Kansas Working Papers in Linguistics,
Volume 5 (1980): pp. 39-46. JSTOR. Web.
Gumperz, John J. “Linguistic and Social Interaction in Two
Communities.” American Anthropologist (1964): pp.
137-153. JSTOR. Web.
Woolford, E. “Bilingual code-switching and syntactic
theory.” Linguistic Inquiry (1983): pp. 520-36.
Eilert, R. “English in India, a study of native Hindi
speakers in Delhi.” Unpublished master’s thesis,
Australian National University (2006).
Kumar, Ashok. “Certain Aspects of the Form and
Functions of Hindi-English Code-Switching.”
Anthropological Linguistics, vol. 28, no. 2 (1986): pp.
195-205.
Muysken, Pieter. Bilingual Speech: A Typology of Code-
Mixing. Cambridge University Press, 2000.