
Debapriya Sengupta, Goutam Saha - Identification of The Major Language Families of India and Evaluation of Their Mutual Influence 2016


Identification of the major language families of India and evaluation of their mutual influence
Author(s): Debapriya Sengupta and Goutam Saha
Source: Current Science, 25 February 2016, Vol. 110, No. 4 (25 February 2016), pp. 667-681
Published by: Current Science Association

Stable URL: https://www.jstor.org/stable/24907928

RESEARCH ARTICLES

Identification of the major language families of India and evaluation of their mutual influence

Debapriya Sengupta* and Goutam Saha

Department of Electronics and Electrical Communication Engineering, Indian Institute of Technology, Kharagpur 721 302, India

*For correspondence. (e-mail: [email protected])

A language family is a group of languages which have descended from a common mother language. Since the ancestor is common, these languages are expected to be similar in some respect and manifest the similarity in scientific experiments. In language identification, language-specific features are extracted from speech and a model is created which represents the language. This work extends the language identification framework to capture features common to language families and create models which can efficiently represent the language families. Mel frequency cepstral coefficient (MFCC) and speech signal-based frequency cepstral coefficient (SFCC) are used as primary feature extraction tools. A combination of these along with shifted delta coefficient (SDC) gives the final set of features. The work uses Gaussian mixture model (GMM) and support vector machines (SVM) as modelling tools. Different combinations of these feature extraction and modelling techniques are used to get four different systems: MFCC + SDC + GMM, SFCC + SDC + GMM, MFCC + SDC + SVM and SFCC + SDC + SVM. Experiments with these systems show that the language families can be identified with reasonable accuracy. Further, the work tests the influence of one language family on the other and finds that in most cases, the languages which are spoken in areas lying on the boundary of two families are more influenced by the other family. A deviation from it can relate to geo-political isolation of two neighbouring regions and thus can give new insights or corroborate investigations of historians.

Keywords: Feature extraction, language family, modelling techniques, mutual influence.

A systematic study of language families is essential in order to understand the origin and development of languages. In a large country like India, where the number of languages spoken (official and unofficial) is more than 1500, knowledge about language families is important. The mother language from which different child languages are born is called the proto language of the family. With passage of time and increase in the number of speakers, the mother language splits into several pronunciations and dialects, giving rise to different languages. Therefore, language families are a rich source of information to historians. A study of language families tells us how our modern colloquial languages sounded a hundred years ago or even earlier. New historical facts may also be revealed as a result of this study.

Most of the Indian languages belong to two major language families, Indo-European and Dravidian. Indo-European is spoken by 73% of Indians and Dravidian by 24% of Indians1,2. The remaining 3% of Indians speak languages belonging to several minor families like Austro-Asiatic, Tibeto-Burman, etc. The Indo-European family can be further grouped into a number of sub-families. The largest among these sub-families is Indo-Aryan, which contains the Indian languages. About half of the languages in the Indo-European family belong to this sub-family3. Among the Indian languages, Indo-European corresponds to Assamese, Bengali, Bhojpuri, Chhattisgarhi, Dogri, English, Gujarati, Hindi, Kashmiri, Konkani, Manipuri, Marathi, Nagamese, Odia, Punjabi, Sanskrit, Sindhi and Urdu, and Dravidian corresponds to Kannada, Malayalam, Tamil and Telugu4.

Language identification (LID) is a popular research area and many research papers are available on the subject. Zissman5 has discussed LID of telephone speech using four different approaches, namely Gaussian mixture model (GMM), phone recognition language modelling (PRLM), parallel PRLM and parallel phone recognition (PPR). He showed that phone recognizers give a better result than GMM, but the need for phonetically labelled speech is a drawback. An approach using GMM tokenization is discussed by Torres-Carrasquillo et al.6, where system complexity is reduced compared to phone recognizers. Shifted delta coefficients (SDC) are used as features in another study7, which shows the importance of broader temporal features for LID. Campbell et al.8 have used support vector machines (SVM) with a generalized linear discriminant kernel (GLDS) for LID. Results are comparable to GMM classifiers. Li et al.9 have discussed all the tools and techniques of LID available in the literature.

The present work uses some of the popular techniques utilized for language, speaker or speech recognition, for experiments related to identification of language family. For feature extraction, mel frequency cepstral coefficient (MFCC) is considered a standard tool in the speech processing domain10. Speech signal-based frequency


cepstral coefficient (SFCC) is a newer method and is known to work well in speech recognition. SDC has been successfully used for language identification. GMM is extensively used in speaker or language identification for modelling purposes. SVM is a powerful classification tool, particularly suitable for two-class problems.

The rest of the article is arranged as follows: the next section gives a brief description of the feature extraction and modelling techniques that have been used in the study. Next, the database and experimental set-up are described. The results of the experiments are then presented and discussed, followed by the conclusion.

Brief description of the feature extraction and modelling techniques

Feature extraction

We have used MFCC and SFCC as language-dependent features. They are concatenated with SDC to give the final feature vector. Thus the two sets of feature vectors that we have used are MFCC + SDC and SFCC + SDC.

Mel frequency cepstral coefficients: The mel scale is designed to match the human auditory spectrum. The filters in this scale are equally spaced from 0 to 1000 Hz, but their spacing increases continuously beyond 1000 Hz. The mel scale and the linear frequency scale are related by the equation:

$$\text{Mel frequency} = 2595 \log_{10}\left(1 + \frac{f}{700}\right), \tag{1}$$

where $f$ is the linear frequency in Hz.

Figure 1 shows the mel scale filter bank. The speech signal is pre-emphasized, broken into smaller segments of 20 ms, passed through a Hamming window and then its power spectral density (PSD) is calculated. The resulting signal is passed through the mel scale filter bank. The logarithm of the output of the mel filter bank is taken and its discrete cosine transform (DCT) is evaluated. These constitute the mel coefficients11. We have used a frame size of 20 ms with an overlap of 10 ms. The number of filter banks is 20.

Figure 1. Mel frequency cepstral coefficient filter bank. (Horizontal axis: frequency in Hz.)

Speech signal-based frequency cepstral coefficients: SFCC is a frequency warping technique based purely on the properties of the acoustic speech signal. The filter bank width depends on the data. Train data are divided into frames of 20 ms with 10 ms overlap and passed through a Hamming window. The PSD of each frame is calculated and averaged over all frames, and its logarithm is computed. The average energy is computed by summing up the log PSD and dividing it by the number of filter banks. We have used 20 filter banks for our experiments. The upper cut-offs for each filter bank are chosen such that the log energy of each filter bank is equal to the average energy12. The rest of the procedure is the same as that used in the case of MFCC. Figure 2 shows the steps involved in computing SFCC filter banks, whereas Figure 3 shows the SFCC filter bank.

Figure 2. Steps required to compute signal-based frequency cepstral coefficient (SFCC) filter bank.

Figure 3. SFCC filter bank.

Shifted delta coefficients: Delta cepstral features are computed across multiple speech frames and stacked together to give shifted delta cepstral features. Four parameters are used to specify these features: N, d, P and k. N is the number of cepstral coefficients computed at each frame, d is the advance or delay for delta computation, P is the shift in time between two consecutive blocks, and k is the number of blocks whose delta features are concatenated to form the final feature vector. SDC coefficients can be represented by the following equation:

$$\Delta c_i(t) = c(t + iP + d) - c(t + iP - d), \tag{2}$$

where $\Delta c_i(t)$ is the $i$th block of delta cepstral features, $c(t)$ is the vector of cepstral coefficients at time $t$, and $i = 0, 1, 2, \ldots, (k - 1)$.

Values of N, d, P and k in this case are 7, 1, 3 and 7 respectively. We take the first seven MFCC or SFCC coefficients. SDC constitutes 49 coefficients (seven sets of delta coefficients, each set having seven coefficients). So the resulting feature vector has 56 coefficients.

Figure 4. Steps required to derive MFCC/SFCC + shifted delta coefficient (SDC) features of speech. (Flow chart stages: discrete cosine transform, RASTA filter, SDC feature extraction; the forty-nine SDC coefficients and the seven MFCC/SFCC coefficients together give fifty-six MFCC/SFCC + SDC coefficients.)

Figure 4 shows the steps required to get MFCC/SFCC + SDC features from speech. SDC is discussed in the literature7,13,14.

Modelling

For modelling the feature vectors, we have used GMM and SVM.

Gaussian mixture models: In GMM, given the feature vectors of the train data, the aim is to estimate the parameters of the GMM $\lambda$ which best represents the distribution of the feature vectors. A Gaussian mixture density is a weighted sum of $M$ component densities. It can be represented by

$$p(\mathbf{o} \mid \lambda) = \sum_{i=1}^{M} p_i\, b_i(\mathbf{o}), \tag{3}$$

where $\mathbf{o}$ is the $D$-dimensional feature vector, $p_i$, $i = 1, 2, \ldots, M$ are the mixture weights and $b_i(\mathbf{o})$, $i = 1, 2, \ldots, M$ are the component densities. Each component density is a Gaussian function of $D$ dimensions of the form

$$b_i(\mathbf{o}) = \frac{1}{(2\pi)^{D/2}\, |\Sigma_i|^{1/2}} \exp\left\{ -\frac{1}{2} (\mathbf{o} - \mu_i)'\, \Sigma_i^{-1} (\mathbf{o} - \mu_i) \right\}, \tag{4}$$

with mean vector $\mu_i$ and covariance matrix $\Sigma_i$. We assume diagonal covariance because it gives a better result. The constraint on $p_i$ is $\sum_{i=1}^{M} p_i = 1$. The complete GMM can be represented by $\lambda = \{p_i, \mu_i, \Sigma_i\}$, $i = 1, 2, \ldots, M$.
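The shifted delta computation of eq. (2) can be illustrated with a short sketch. This is only an illustration under stated assumptions, not the authors' implementation: the function name `shifted_delta` is hypothetical, the input is a toy sequence rather than real MFCC or SFCC frames, and frames too close to the utterance edges are simply skipped (the text does not say how edges are handled).

```python
def shifted_delta(cepstra, d=1, P=3, k=7):
    """Append k shifted delta blocks to every usable frame.

    cepstra: list of frames, each a list of N static coefficients.
    Block i at frame t is c(t + i*P + d) - c(t + i*P - d), as in eq. (2).
    """
    n_frames = len(cepstra)
    features = []
    # Assumption: frames whose look-ahead or look-back would fall outside
    # the utterance are skipped rather than padded.
    for t in range(d, n_frames - (k - 1) * P - d):
        vector = list(cepstra[t])          # the N static coefficients
        for i in range(k):                 # k delta blocks of N values each
            ahead = cepstra[t + i * P + d]
            behind = cepstra[t + i * P - d]
            vector.extend(a - b for a, b in zip(ahead, behind))
        features.append(vector)
    return features


# Toy input: 40 frames of N = 7 coefficients that grow linearly with time.
toy = [[float(t * (j + 1)) for j in range(7)] for t in range(40)]
sdc = shifted_delta(toy)
print(len(sdc), len(sdc[0]))
```

With N = 7 and k = 7, each output vector holds 7 static values followed by 7 × 7 = 49 delta values, which matches the 56-coefficient feature vector described in the text.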
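As a rough illustration of eqs (3) and (4), the sketch below evaluates a diagonal-covariance Gaussian mixture density in the log domain and picks the family whose model gives the higher total log-likelihood over an utterance. The decision rule (highest summed log-likelihood), the function names and the two-dimensional toy parameters are assumptions made for this example; they are not taken from the paper.

```python
import math


def log_gaussian_diag(o, mean, var):
    """Log of eq. (4) for a diagonal covariance matrix (var = its diagonal)."""
    log_det = sum(math.log(v) for v in var)
    maha = sum((x - m) ** 2 / v for x, m, v in zip(o, mean, var))
    return -0.5 * (len(o) * math.log(2.0 * math.pi) + log_det + maha)


def gmm_log_density(o, weights, means, variances):
    """Log of eq. (3); weights must be positive and sum to 1."""
    terms = [math.log(p) + log_gaussian_diag(o, m, v)
             for p, m, v in zip(weights, means, variances)]
    top = max(terms)                       # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))


def utterance_log_likelihood(frames, gmm):
    """Sum of per-frame log densities, treating frames as independent."""
    return sum(gmm_log_density(o, *gmm) for o in frames)


def identify_family(frames, models):
    """Assumed decision rule: pick the model with the highest log-likelihood."""
    return max(models, key=lambda name: utterance_log_likelihood(frames, models[name]))


# Toy two-component models in two dimensions (made-up parameters).
models = {
    "Indo-European": ([0.5, 0.5], [[0.0, 0.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]),
    "Dravidian": ([0.5, 0.5], [[4.0, 4.0], [5.0, 5.0]], [[1.0, 1.0], [1.0, 1.0]]),
}
frames = [[0.2, -0.1], [0.9, 1.1]]
print(identify_family(frames, models))
```

In the experiments described here the feature vectors have 56 dimensions and the mixtures have 64 or 128 components, so working in the log domain, as above, is the usual way to avoid numerical underflow.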

The parameters $p_i$, $\mu_i$ and $\Sigma_i$, $i = 1, 2, \ldots, M$ are estimated using the maximum likelihood method. This is done using the expectation maximization (EM) algorithm15.

The number of Gaussians $M$ in a GMM is called the model order of the GMM and is determined experimentally. In our experiments we have used two values of $M$: 64 and 128. Every experiment is done twice, once for $M = 64$ and then for $M = 128$.

Support vector machines: SVM is a two-class classifier which maps the input space to a high-dimensional space and then creates a hyperplane which separates the two classes. The hyperplane should be such that it ensures maximum separation between the two classes (Figure 5). This leads to an optimization problem. Solving this problem using Lagrangian multipliers with suitable optimality conditions16, we get

$$f(x) = \sum_{i=1}^{L} \alpha_i t_i K(x, x_i) + \beta, \tag{5}$$

where $t_i$ are the ideal outputs, $\sum_{i=1}^{L} \alpha_i t_i = 0$, the weights $\alpha_i > 0$, $\beta$ is the bias, $x$ is any vector and $x_i$ are the support vectors. Ideally the outputs should be either +1 or -1, depending upon which class the corresponding support vectors belong to. $K(x, y)$ is the kernel function, which is constrained to fulfil certain properties called the Mercer condition, so that it can be expressed as

$$K(x, y) = \phi(x)' \phi(y), \tag{6}$$

where $\phi(x)$ is a mapping from the input space to a high-dimensional space. The Mercer condition ensures that the margin concept is valid, and the optimization of the SVM is bounded8.

We have used a GMM supervector linear kernel in our experiments. The means of the GMM are stacked to form a GMM mean supervector17 (Figure 6). The supervectors can be thought of as a mapping between an utterance and a high-dimensional vector. We use maximum-a-posteriori (MAP) adaptation to compute the means of the GMM. For this, a universal background model (UBM) is developed using the entire train data18. The UBM is a large GMM trained to represent the language-independent distribution of features. For a good model it is required to be trained with languages, tones and speakers of all the variety which the model is expected to encounter during classification.

The linear kernel used in our experiments is represented by

$$K(\varphi_a, \varphi_b) = \sum_{i=1}^{M} p_i\, (\mu_i^{a})'\, \Sigma_i^{-1} \mu_i^{b} = \sum_{i=1}^{M} \left( \sqrt{p_i}\, \Sigma_i^{-1/2} \mu_i^{a} \right)' \left( \sqrt{p_i}\, \Sigma_i^{-1/2} \mu_i^{b} \right), \tag{7}$$

where $\varphi_a$ and $\varphi_b$ are the two utterances under consideration and $\mu^{a}$ and $\mu^{b}$ are the adapted supervectors of means. A detailed description of this method is given by Campbell et al.17.

Two feature extraction and two modelling techniques are described in this section. With these four techniques we get four systems, namely MFCC + SDC + GMM, SFCC + SDC + GMM, MFCC + SDC + SVM and SFCC + SDC + SVM. Our experiments are performed with all four systems in order to ensure that the results are consistent irrespective of the system used.
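The two forms of the supervector kernel in eq. (7) can be checked numerically. The sketch below is a minimal illustration assuming diagonal covariances (as stated for the GMMs above); the function names and the small two-component, two-dimensional parameters are invented for the example, and MAP adaptation itself is not shown.

```python
import math


def kernel_weighted(weights, variances, means_a, means_b):
    """First form of eq. (7): sum_i p_i * mu_a_i' * inv(Sigma_i) * mu_b_i."""
    total = 0.0
    for p, var, ma, mb in zip(weights, variances, means_a, means_b):
        total += p * sum(x * y / v for x, y, v in zip(ma, mb, var))
    return total


def supervector(weights, variances, means):
    """Stack sqrt(p_i) * Sigma_i^(-1/2) * mu_i over all components."""
    stacked = []
    for p, var, m in zip(weights, variances, means):
        stacked.extend(math.sqrt(p) * x / math.sqrt(v) for x, v in zip(m, var))
    return stacked


def kernel_dot(weights, variances, means_a, means_b):
    """Second form of eq. (7): plain inner product of the two supervectors."""
    sa = supervector(weights, variances, means_a)
    sb = supervector(weights, variances, means_b)
    return sum(x * y for x, y in zip(sa, sb))


# Toy shared weights and variances (as if taken from a UBM), plus two sets
# of adapted means, one per utterance. All numbers are made up.
weights = [0.25, 0.75]
variances = [[1.0, 4.0], [0.25, 1.0]]
means_a = [[1.0, 2.0], [0.5, -1.0]]
means_b = [[2.0, 2.0], [1.0, 1.0]]

print(kernel_weighted(weights, variances, means_a, means_b))
print(kernel_dot(weights, variances, means_a, means_b))
```

Because the second form is an ordinary inner product, each utterance can be mapped to its scaled supervector once and then passed to any linear SVM, which is the practical appeal of this kernel.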

Experimental set-up

Database preparation

The database used is prepared from the All India Radio website19. Daily news bulletins of all major Indian languages are available in its repository.

Figure 5. The hyperplane which ensures maximum separation between the two classes.

Figure 6. Formation of Gaussian mixture model (GMM) supervectors. (Block diagram stages: speech input, feature extraction, MAP adaptation, GMM supervector.)


There are various reasons for which we have selected this website as our data source. The quality of speech is good as it contains very little noise. Speech is sufficiently loud and is of sufficiently long duration. A large number of speakers (news readers) are available in each language. This reduces speaker bias. Both male and female speakers are present in almost all the available languages, ensuring little or no gender bias. Speech covers a large variety of topics, which helps capture details of acoustic information.

To prepare the database, we have downloaded all news files that are available in the 22 languages mentioned earlier (18 Indo-European and 4 Dravidian languages). After listening to each of the news files, we have rejected those of poor quality. Almost all news bulletins have a short music of 10 sec at the beginning and the end. These have been removed in our experimental database. The speech is originally in .mp3 format. It is converted to .wav format to enable further processing.

Language family identification

We train the systems to identify speech from each of the two families (Indo-European and Dravidian). Then we test them by giving speech samples from both the families as input and find the percentage of correct identification.

Training: For each family, 4 h of speech is taken for training. This 4 h contains speech from all the languages belonging to the family, i.e. the Indo-European training set contains news files from all the 18 languages in equal proportion. Since each news file is nearly of 10 min duration, we have taken two files from each language on an average. Similarly, since the Dravidian family has four languages, eight files from each language have been taken on an average. This is done so that the systems do not get biased towards any particular language. Also, the duration of male and female speech is nearly equal in each family to reduce gender bias. After the train data are arranged in this format, each speech file is broken into segments of 30 sec duration. Each segment is listened to and segments containing music, long duration of silence or unwanted sounds are deleted. At the end, we have two sets of speech files, one for each family, each having 4 h of speech segmented into 30 sec duration. This is the training set.

Testing: This is done for three different utterance durations, short (3 sec), medium (10 sec) and long (30 sec). For each utterance duration, we have 100 test utterances. The train and test speaker sets are mutually exclusive. News files from test speakers are chosen taking care that the number of male and female-speaking files is nearly equal. These news files are then broken into 3 sec, 10 sec and 30 sec segments. Music, long duration of silence and unwanted voices are deleted as is done for training speech. From these segments, we select 100 utterances for each family and for each duration. These come from all the languages belonging to the particular family. For example, since the Dravidian family has 4 languages, there are 25 utterances from each language, and the Indo-European family has nearly 6 utterances from each language. These 25 or 6 utterances from each language are equally divided between male and female speakers (where exact equal division is not possible, nearly equal division is taken). This is how one test set is prepared. We have prepared three such test sets to verify our results. These three test sets are again mutually exclusive, i.e. the 100 utterances contained in set A are different from the 100 utterances in sets B or C. Figure 7 shows the structure of train and test sets.

Figure 7. Train and test dataset. (Training and testing use separate sets of speakers; each set contains male and female speech from languages Lang 1, Lang 2, ..., Lang L, and test speech is cut into 3 s, 10 s and 30 s data.)

Influence of Dravidian family on Indo-European languages

Two languages spoken in neighbouring regions may be influenced by each other. We have tried to test this experimentally. Also, the extent of influence can be indirectly measured from the accuracy of correct identification. Since languages belonging to the same family sound similar (have similar acoustic features), it is expected that a system trained with a language family can identify whether a language belongs to that family or not, even if that language is not used during training of the system. Based on this idea, we trained the systems by leaving out one of the languages of the Indo-European family. Then, when we test these systems with the language left out, languages having more Dravidian influence are expected to give less identification accuracy compared to those having less Dravidian influence.

Influence on neighbouring Indo-European languages: Indo-European languages are spoken by most of the Central and North Indian states. Dravidian languages are spoken in the southern states of Karnataka (Kannada), Kerala (Malayalam), Tamil Nadu (Tamil) and Andhra Pradesh (Telugu). The neighbouring Indo-European speaking states are Chhattisgarh, Goa, Maharashtra and Odisha, speaking Chhattisgarhi, Konkani, Marathi and Odia respectively. The neighbouring languages are expected to have more Dravidian influence than non-neighbouring languages. We tested the influence of Dravidian languages on each of the neighbouring languages. Figure 8 shows the Dravidian-speaking states and their neighbouring Indo-European speaking states.

Figure 8. Dravidian-speaking states and their neighbours. (Map key: 1) Tamil Nadu, 2) Kerala, 3) Karnataka, 4) Andhra Pradesh are the Dravidian-speaking states; 5) Goa, 6) Maharashtra, 7) Odisha, 8) Chhattisgarh. Source of map: http://wwwxoachingirKiians.com)

Training: In order to test the influence of the Dravidian family on neighbouring Indo-European languages, we have prepared a train dataset which is similar to that described in the section on 'Training', but with a small change. As in the earlier case, we take 4 h of speech for each language family. But here the Indo-European dataset contains 17 languages, unlike 18 in the previous case. We leave out one neighbouring Indo-European language each time. For example, to test the influence of Dravidian on Chhattisgarhi, we train the system with all 17 Indo-European languages except Chhattisgarhi. So, Chhattisgarhi news files are removed from the training set of the earlier section and replaced by news of any other Indo-European language. Care is taken so that the total duration and male-female proportion of the dataset are disturbed as little as possible. It is not necessary that the replaced speech should be from one language. For example, Chhattisgarhi speech can be replaced by one male speech in Hindi and one female speech in Gujarati. The Dravidian dataset remains unchanged. In this way, the systems are trained four times, each time leaving out one neighbouring Indo-European language.

Testing: The test set contains speech of only the left-out language. We select some news files from the left-out language, keeping a nearly equal proportion of male and female speakers. These files are broken into segments of 3 sec, 10 sec and 30 sec duration. We select 100 segments from each duration to prepare the test dataset. A similar procedure is carried out for all four neighbouring Indo-European languages. The only exception is Chhattisgarhi, where 100 test utterances could not be collected for the 30 sec utterance duration due to shortage of data. So tests are done with 86 segments.

Influence on non-neighbouring Indo-European languages: In order to compare the results of Dravidian influence on neighbouring Indo-European languages, we have done similar experiments on non-neighbouring Indo-European languages as well. For this, we have randomly selected four Indo-European languages which do not have a Dravidian-speaking neighbour. The languages we selected are Bengali, Gujarati, Hindi and Punjabi.

Training: The data are prepared in the same way as described earlier. Each system is trained four times, each time leaving out one of the four selected Indo-European languages and compensating for it with other Indo-European languages of equal duration and equal male-female proportion.
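The leave-one-language-out protocol and the even spread of 100 test utterances over the languages of a family can be sketched as follows. The language lists come from the introduction; the function names, the dictionary layout and the rule for which languages receive the extra utterance when 100 does not divide evenly are assumptions made for this illustration only.

```python
INDO_EUROPEAN = [
    "Assamese", "Bengali", "Bhojpuri", "Chhattisgarhi", "Dogri", "English",
    "Gujarati", "Hindi", "Kashmiri", "Konkani", "Manipuri", "Marathi",
    "Nagamese", "Odia", "Punjabi", "Sanskrit", "Sindhi", "Urdu",
]
DRAVIDIAN = ["Kannada", "Malayalam", "Tamil", "Telugu"]


def leave_one_out_plan(held_out):
    """Training languages per family when one language is held out for testing."""
    if held_out not in INDO_EUROPEAN + DRAVIDIAN:
        raise ValueError("unknown language: " + held_out)
    return {
        "Indo-European": [lang for lang in INDO_EUROPEAN if lang != held_out],
        "Dravidian": [lang for lang in DRAVIDIAN if lang != held_out],
        "test": held_out,
    }


def spread_evenly(total, languages):
    """Split `total` utterances as evenly as possible over the languages.

    Assumption: when the split is not exact, the first few languages in the
    list receive one extra utterance each.
    """
    base, extra = divmod(total, len(languages))
    return {lang: base + (1 if idx < extra else 0)
            for idx, lang in enumerate(languages)}


plan = leave_one_out_plan("Chhattisgarhi")
print(len(plan["Indo-European"]), len(plan["Dravidian"]))
print(spread_evenly(100, DRAVIDIAN))
```

For the Indo-European family the even split gives 5 or 6 utterances per language, consistent with the "nearly 6 utterances" mentioned in the text. The sketch does not model the compensation step, in which the removed speech is replaced by speech of equal duration from other languages of the same family.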


Table 1. Language family identification accuracy (%) of three test datasets using MFCC + SDC + GMM and SFCC + SDC + GMM

MFCC + SDC + GMM

| Test duration (sec) | Set A, M.O. 64 | Set A, M.O. 128 | Set B, M.O. 64 | Set B, M.O. 128 | Set C, M.O. 64 | Set C, M.O. 128 |
|---|---|---|---|---|---|---|
| 3 | 74 | 74.5 | 80 | 79 | 76.5 | 78.5 |
| 10 | 81.5 | 84.5 | 83 | 82.5 | 83.5 | 85 |
| 30 | 88.5 | 88.5 | 87 | 87 | 89.5 | 89.5 |

SFCC + SDC + GMM

| Test duration (sec) | Set A, M.O. 64 | Set A, M.O. 128 | Set B, M.O. 64 | Set B, M.O. 128 | Set C, M.O. 64 | Set C, M.O. 128 |
|---|---|---|---|---|---|---|
| 3 | 75.5 | 77 | 78 | 76.5 | 77.5 | 78.5 |
| 10 | 84.5 | 82.5 | 85 | 83.5 | 86 | 85.5 |
| 30 | 90.5 | 90 | 88.5 | 87.5 | 91 | 88.5 |

MFCC, Mel frequency cepstral coefficient; SDC, Shifted delta coefficient; SFCC, Speech signal-based frequency cepstral coefficient; M.O., Model order.

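Table 2. Language family identification accuracy (%) of three test datasets using MFCC + SDC + SVM and SFCC + SDC + SVM

MFCC + SDC + SVM

| Test duration (sec) | Set A, M.O. 64 | Set A, M.O. 128 | Set B, M.O. 64 | Set B, M.O. 128 | Set C, M.O. 64 | Set C, M.O. 128 |
|---|---|---|---|---|---|---|
| 3 | 74 | 75.5 | 78 | 78.5 | 77 | 78 |
| 10 | 86.5 | 88 | 84 | 87 | 84.5 | 88 |
| 30 | 90 | 88.5 | 92 | 92.5 | 94 | 93.5 |

SFCC + SDC + SVM

| Test duration (sec) | Set A, M.O. 64 | Set A, M.O. 128 | Set B, M.O. 64 | Set B, M.O. 128 | Set C, M.O. 64 | Set C, M.O. 128 |
|---|---|---|---|---|---|---|
| 3 | 77 | 77 | 78 | 80 | 78.5 | 80.5 |
| 10 | 90 | 89.5 | 87.5 | 89.5 | 86 | 87 |
| 30 | 93.5 | 92.5 | 91.5 | 93.5 | 95 | 97 |

SVM, Support vector machines.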
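As a small worked check on the MFCC + SDC + GMM half of Table 1, the sketch below averages the six reported accuracies (three test sets, two model orders) for each test duration. Averaging over sets and model orders is a summary chosen for this illustration; the paper reports the individual values only.

```python
# MFCC + SDC + GMM accuracies (%) from Table 1, keyed by test duration (sec).
# Order within each list: Set A (64, 128), Set B (64, 128), Set C (64, 128).
table1_mfcc_gmm = {
    3: [74, 74.5, 80, 79, 76.5, 78.5],
    10: [81.5, 84.5, 83, 82.5, 83.5, 85],
    30: [88.5, 88.5, 87, 87, 89.5, 89.5],
}


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


mean_by_duration = {sec: mean(acc) for sec, acc in table1_mfcc_gmm.items()}
for sec in sorted(mean_by_duration):
    print(sec, round(mean_by_duration[sec], 2))
```

The averages rise from the 3 sec to the 10 sec to the 30 sec condition, in line with the trend of accuracy increasing with utterance duration.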

Testing: Data preparation is similar to that mentioned earlier. Four sets of test data are prepared, one each for Bengali, Gujarati, Hindi and Punjabi, each set having segments of 3, 10 and 30 sec duration and each duration having 100 test utterances.

Influence of Indo-European family on Dravidian languages

We have tested the influence of the Dravidian family on neighbouring and non-neighbouring Indo-European languages. Similarly, we now test the influence of the Indo-European family on neighbouring and non-neighbouring Dravidian languages. Figure 8 shows that the Dravidian-speaking states lying on the boundary of Indo-European speaking states are Karnataka and Andhra Pradesh. The remaining two Dravidian-speaking states, Kerala and Tamil Nadu, which do not have an Indo-European speaking boundary, can be considered as non-neighbouring states.

Influence on neighbouring Dravidian languages: Here we test the influence of the Indo-European family on Kannada and Telugu.

Training: Data are prepared as mentioned earlier. Two sets of data are prepared, leaving out Kannada in the first and Telugu in the next. This is compensated by speech from other Dravidian languages. This time only the Dravidian dataset is modified and the Indo-European data are kept untouched.

Testing: The process is similar to that mentioned earlier. Two sets of test data are prepared, one having Kannada speech segments and the other Telugu. The structure of the datasets is the same as described earlier.

Influence on non-neighbouring Dravidian languages: Here the influence of the Indo-European family on Malayalam and Tamil is tested.

Training: This process is the same as the earlier processes. Two sets of train data are prepared, leaving out Malayalam in one and Tamil in the other. This is compensated by speech from other Dravidian languages.

Testing: The process of data preparation and the structure of the test datasets are similar to the above-mentioned cases. Test data are prepared with Malayalam and Tamil speech segments.

Results and discussion

The results have been divided into three sub-parts, namely results showing language family identification accuracy, influence of Dravidian family on Indo-European languages, and influence of Indo-European family on Dravidian languages.

Language family identification

Tables 1 and 2 show the results of language family identification tests using all four systems, for three sets of test data. The experiments are done using GMM model orders 64 and 128. Figure 9 is a graphical representation of the results.

Figure 9 a-d. Graphical representation of language family identification accuracy of three sets of test data using the four systems. (Panels: MFCC + SDC + GMM, SFCC + SDC + GMM, MFCC + SDC + SVM and SFCC + SDC + SVM, each for model order 64 and model order 128; horizontal axis: test sets A, B and C; bars: 3 s, 10 s and 30 s utterances.)

All four systems can identify the language families with a high rate of accuracy. The accuracy ranges from 74% to 97%. Accuracy increases with test utterance duration, i.e. 30 sec utterances give higher accuracy than 10 sec utterances, which again give higher accuracy than 3 sec utterances. This is justified because higher duration implies more test data. Results are consistent across all the systems. No system gives abruptly high or low accuracy. Model orders 64 and 128 do not show much variation in the results.

Influence of Dravidian family on Indo-European languages

Tables 3-6 show the identification accuracy of Indo-European languages which are neighbouring and non-neighbouring to Dravidian-speaking states using the four systems respectively. Figures 10 and 11 show a graphical representation of the same.

As expected, the neighbouring Indo-European languages give less accuracy than the non-neighbouring languages, which implies that neighbouring languages have higher Dravidian influence. Also, among the

Table 3. Identification accuracy (%) of Indo-European languages which are neighbouring to Dravidian family, using MFCC + SDC + GMM and SFCC + SDC + GMM. Column headings give the language and the model order (M.O.).

MFCC + SDC + GMM

| Test duration (sec) | Chhattisgarhi, 64 | Chhattisgarhi, 128 | Konkani, 64 | Konkani, 128 | Marathi, 64 | Marathi, 128 | Odia, 64 | Odia, 128 |
|---|---|---|---|---|---|---|---|---|
| 3 | 42 | 41 | 69 | 72 | 75 | 79 | 32 | 32 |
| 10 | 41 | 37 | 69 | 76 | 90 | 90 | 34 | 39 |
| 30 | 47.6744 | 36.0465 | 63 | 73 | 94 | 95 | 25 | 29 |

SFCC + SDC + GMM

| Test duration (sec) | Chhattisgarhi, 64 | Chhattisgarhi, 128 | Konkani, 64 | Konkani, 128 | Marathi, 64 | Marathi, 128 | Odia, 64 | Odia, 128 |
|---|---|---|---|---|---|---|---|---|
| 3 | 49 | 41 | 75 | 78 | 78 | 74 | 26 | 28 |
| 10 | 52 | 43 | 78 | 78 | 89 | 87 | 35 | 38 |
| 30 | 48.8372 | 43.0233 | 67 | 69 | 92 | 93 | 23 | 25 |

Table 4. Identification accuracy of Indo-European languages which are neighbouring to Dravidian family, using MFCC + SDC + SVM and SFCC + SDC + SVM

MFCC + SDC + SVM
Test duration   Chhattisgarhi     Konkani           Marathi           Odia
(sec)           M.O. 64  M.O. 128 M.O. 64  M.O. 128 M.O. 64  M.O. 128 M.O. 64  M.O. 128
3               38       28       68       67       81       80       61       64
10              38       37       72       80       91       88       65       73
30              40.6977  48.8372  63       72       95       95       75       77

SFCC + SDC + SVM
Test duration   Chhattisgarhi     Konkani           Marathi           Odia
(sec)           M.O. 64  M.O. 128 M.O. 64  M.O. 128 M.O. 64  M.O. 128 M.O. 64  M.O. 128
3               32       27       62       65       80       77       58       51
10              39       40       80       84       89       90       55       64
30              45.3488  45.3488  69       77       95       97       60       64

Table 5. Identification accuracy of Indo-European languages which are non-neighbouring to Dravidian family, using MFCC + SDC + GMM and SFCC + SDC + GMM

MFCC + SDC + GMM
Test duration   Bengali           Gujarati          Hindi             Punjabi
(sec)           M.O. 64  M.O. 128 M.O. 64  M.O. 128 M.O. 64  M.O. 128 M.O. 64  M.O. 128
3               89       90       59       63       94       91       81       79
10              95       95       74       72       98       99       89       88
30              99       100      64       64       100      100      93       91

SFCC + SDC + GMM
Test duration   Bengali           Gujarati          Hindi             Punjabi
(sec)           M.O. 64  M.O. 128 M.O. 64  M.O. 128 M.O. 64  M.O. 128 M.O. 64  M.O. 128
3               88       87       63       64       90       95       76       74
10              94       94       77       75       99       100      79       83
30              100      100      62       64       100      100      75       71

Table 6. Identification accuracy of Indo-European languages which are non-neighbouring to Dravidian family, using MFCC + SDC + SVM and SFCC + SDC + SVM

MFCC + SDC + SVM
Test duration   Bengali           Gujarati          Hindi             Punjabi
(sec)           M.O. 64  M.O. 128 M.O. 64  M.O. 128 M.O. 64  M.O. 128 M.O. 64  M.O. 128
3               76       72       79       77       83       90       74       77
10              90       91       91       90       94       97       89       88
30              98       95       90       92       98       99       98       94

SFCC + SDC + SVM
Test duration   Bengali           Gujarati          Hindi             Punjabi
(sec)           M.O. 64  M.O. 128 M.O. 64  M.O. 128 M.O. 64  M.O. 128 M.O. 64  M.O. 128
3               76       79       73       70       83       83       73       65
10              87       91       87       87       98       97       84       82
30              93       95       84       81       99       99       89       83
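
The comparison between neighbouring and non-neighbouring Indo-European languages can be checked directly against Tables 3-6. The following is a minimal sketch, not the authors' procedure: the values are transcribed from the four tables, while the helper `per_language_mean` and the use of an unweighted mean over all systems, test durations and model orders are assumptions made only for illustration.

```python
from statistics import mean

# Accuracy values (%) transcribed from Tables 3-6. Each row is one test
# duration (3, 10, 30 s) and has 16 columns: four languages x model order
# (64, 128) for the MFCC-based system, then the same for the SFCC-based one.
TABLE_3 = [  # neighbouring languages, GMM
    [42, 41, 69, 72, 75, 79, 32, 32, 49, 41, 75, 78, 78, 74, 26, 28],
    [41, 37, 69, 76, 90, 90, 34, 39, 52, 43, 78, 78, 89, 87, 35, 38],
    [47.6744, 36.0465, 63, 73, 94, 95, 25, 29,
     48.8372, 43.0233, 67, 69, 92, 93, 23, 25],
]
TABLE_4 = [  # neighbouring languages, SVM
    [38, 28, 68, 67, 81, 80, 61, 64, 32, 27, 62, 65, 80, 77, 58, 51],
    [38, 37, 72, 80, 91, 88, 65, 73, 39, 40, 80, 84, 89, 90, 55, 64],
    [40.6977, 48.8372, 63, 72, 95, 95, 75, 77,
     45.3488, 45.3488, 69, 77, 95, 97, 60, 64],
]
TABLE_5 = [  # non-neighbouring languages, GMM
    [89, 90, 59, 63, 94, 91, 81, 79, 88, 87, 63, 64, 90, 95, 76, 74],
    [95, 95, 74, 72, 98, 99, 89, 88, 94, 94, 77, 75, 99, 100, 79, 83],
    [99, 100, 64, 64, 100, 100, 93, 91, 100, 100, 62, 64, 100, 100, 75, 71],
]
TABLE_6 = [  # non-neighbouring languages, SVM
    [76, 72, 79, 77, 83, 90, 74, 77, 76, 79, 73, 70, 83, 83, 73, 65],
    [90, 91, 91, 90, 94, 97, 89, 88, 87, 91, 87, 87, 98, 97, 84, 82],
    [98, 95, 90, 92, 98, 99, 98, 94, 93, 95, 84, 81, 99, 99, 89, 83],
]

NEIGHBOURING = ["Chhattisgarhi", "Konkani", "Marathi", "Odia"]
NON_NEIGHBOURING = ["Bengali", "Gujarati", "Hindi", "Punjabi"]


def per_language_mean(tables, languages):
    """Hypothetical helper: unweighted mean accuracy per language."""
    scores = {lang: [] for lang in languages}
    for table in tables:
        for row in table:
            for col, value in enumerate(row):
                # Columns repeat every 8 (one system); 2 columns per language.
                scores[languages[(col % 8) // 2]].append(value)
    return {lang: mean(values) for lang, values in scores.items()}


neighbouring = per_language_mean([TABLE_3, TABLE_4], NEIGHBOURING)
non_neighbouring = per_language_mean([TABLE_5, TABLE_6], NON_NEIGHBOURING)

for lang, acc in {**neighbouring, **non_neighbouring}.items():
    print(f"{lang:14s} {acc:5.1f}")
print("neighbouring mean    :", round(mean(list(neighbouring.values())), 1))
print("non-neighbouring mean:", round(mean(list(non_neighbouring.values())), 1))
```

Averaging over every cell weights all systems and durations equally; a different weighting (for example, 30 sec utterances only) would be an equally defensible choice and may shift the individual figures.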

neighbouring languages, Chhattisgarhi and Odia show higher Dravidian influence than Marathi and Konkani. This should be due to historical reasons. Odisha is the modern name of Kalinga. Kalinga comprised most parts of modern-day Odisha and the Andhra region of Andhra Pradesh (Dravidian-speaking state)²⁰. Odisha has been ruled by several rulers from ancient times. Its boundaries have been reformed and reshaped from time to time. During the reign of rulers like Anantavarma Chodagangadeva of the Ganga dynasty, the boundaries of Odisha extended from River Ganga in the north to River Godavari in the south. Godavari is in the


[Bar charts: identification accuracy for MFCC + SDC + GMM, SFCC + SDC + GMM, MFCC + SDC + SVM and SFCC + SDC + SVM, each at model order 64 and 128; x-axis: Languages; bars: 3 s, 10 s and 30 s test utterances.]

Figure 10 a-d. Graphical representation of identification accuracy of Indo-European languages which are neighbouring to Dravidian family, using the four systems.

present-day Andhra Pradesh. Also, the first king of the Surya dynasty, Gajapati Kapilendradeva, extended his empire from River Ganga to River Kaveri, which includes regions of present-day Tamil Nadu (Dravidian-speaking state). All these might have resulted in the intermingling of Odia with the Dravidian languages²¹.

In Chhattisgarh, the Chalukya dynasty established its rule during the middle ages²². This dynasty ruled parts of southern and central India covering modern-day Karnataka, Andhra Pradesh and parts of Maharashtra (Karnataka and Andhra Pradesh are Dravidian-speaking states). This is a possible reason behind the high Dravidian influence on the Chhattisgarhi language.

On the other hand, Konkani, though spoken in a neighbouring state (Goa), has little Dravidian influence. There could be several reasons for this. The most important reason is possibly the Portuguese colonization of Goa shortly after Vasco da Gama entered India²³; Goa gradually became the centre of Portuguese India²⁴. Portuguese is an Indo-European language³. Even prior to Portuguese control, Goa had trade connections with foreign lands from early times. The influence of all these foreign languages must have been greater than the Dravidian influence on Konkani. Earlier, during the Satavahana dynasty, the Konkani language was highly influenced by Maharashtri Prakrit, which was the administrative language of the period²⁵. Prakrit also belongs to the Indo-European group of languages. Goa had a large Persian and Greek Buddhist population. During the Kadamba rule, the port of Goapakapattna in Goa became a centre for trade. It had trade contacts with several Indian states and foreign countries.

Maharashtra was ruled by the Mauryas, the Satavahanas, the Rashtrakutas, the Chalukyas and several other Indian dynasties. But all these do not seem to have influenced the language of the region much. In Maharashtra,


[Bar charts: identification accuracy (0-100) for MFCC + SDC + GMM, SFCC + SDC + GMM, MFCC + SDC + SVM and SFCC + SDC + SVM, each at model order 64 and 128; x-axis: Languages; bars: 3 s, 10 s and 30 s test utterances.]

Figure 11 a-d. Graphical representation of identification accuracy of Indo-European languages which are non-neighbouring to Dravidian family, using the four systems.

the dominant community is Maratha, which is a result of Aryan penetration from north and northeast²⁶. The language has maintained its originality, though small variations have taken place with time.

A look at the four systems shows that Odia accuracy increases when SVM is used. The rest of the languages show consistent results with all the systems.

In the case of non-neighbouring languages, Gujarati results are comparatively lower using GMM. This is overcome using SVM. For the rest of the languages, overall accuracy does not show much variation from system to system. Accuracy is much higher than for the neighbouring languages (in the range 62-100%).

Influence of Indo-European family on Dravidian languages

Tables 7 and 8 show the identification accuracy of Dravidian languages which are neighbouring and non-neighbouring to Indo-European speaking states, using all the four systems respectively. Figures 12 and 13 show a graphical representation of the same.

The results suggest that the neighbouring and non-neighbouring classification is not suitable for these languages because there are only four languages and Malayalam, though a non-neighbouring language, is expected to have Indo-European influence, as suggested by historical evidence and verified by experimental results. Kerala had been in contact with foreign lands since the 15th century, when Vasco da Gama arrived in present-day Kozhikode in 1498 in order to trade spices²⁷⁻²⁹. Gradually, the Portuguese defeated the local rulers and started ruling over Kerala. It is to be noted that similar Portuguese colonization also happened in Goa, which resulted in low Dravidian influence (better identification accuracy of Konkani). After the Portuguese, Kerala came under Dutch rule. Finally, by the end of the 18th century, the whole of Kerala came under British control.


Table 7. Identification accuracy of Dravidian languages which are neighbouring to Indo-European family, using the four systems

MFCC + SDC + GMM
Test duration   Kannada           Telugu
(sec)           M.O. 64  M.O. 128 M.O. 64  M.O. 128
3               40       44       42       59
10              43       47       52       56
30              42       50       54       57

SFCC + SDC + GMM
Test duration   Kannada           Telugu
(sec)           M.O. 64  M.O. 128 M.O. 64  M.O. 128
3               54       51       53       51
10              55       47       54       56
30              64       57       52       52

MFCC + SDC + SVM
Test duration   Kannada           Telugu
(sec)           M.O. 64  M.O. 128 M.O. 64  M.O. 128
3               77       69       54       58
10              83       76       42       42
30              85       83       42       38

SFCC + SDC + SVM
Test duration   Kannada           Telugu
(sec)           M.O. 64  M.O. 128 M.O. 64  M.O. 128
3               78       77       49       55
10              88       84       45       41
30              90       88       39       34
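
The entries of Table 7 can be summarised per language and back-end classifier by averaging over test durations and model orders. The sketch below is illustrative only: the values are transcribed from Table 7, whereas the helper name `mean_by_classifier` and the choice of an unweighted mean are assumptions and not part of the reported experiments.

```python
from statistics import mean

# Accuracy values (%) transcribed from Table 7, keyed by (system, language).
# Each list holds the six cells for that pair: test durations 3, 10 and 30 s,
# each at model order 64 and 128.
TABLE_7 = {
    ("MFCC + SDC + GMM", "Kannada"): [40, 44, 43, 47, 42, 50],
    ("MFCC + SDC + GMM", "Telugu"): [42, 59, 52, 56, 54, 57],
    ("SFCC + SDC + GMM", "Kannada"): [54, 51, 55, 47, 64, 57],
    ("SFCC + SDC + GMM", "Telugu"): [53, 51, 54, 56, 52, 52],
    ("MFCC + SDC + SVM", "Kannada"): [77, 69, 83, 76, 85, 83],
    ("MFCC + SDC + SVM", "Telugu"): [54, 58, 42, 42, 42, 38],
    ("SFCC + SDC + SVM", "Kannada"): [78, 77, 88, 84, 90, 88],
    ("SFCC + SDC + SVM", "Telugu"): [49, 55, 45, 41, 39, 34],
}


def mean_by_classifier(table, language):
    """Hypothetical helper: mean accuracy of one language per classifier."""
    grouped = {"GMM": [], "SVM": []}
    for (system, lang), values in table.items():
        if lang == language:
            # The classifier is the last component of the system name.
            grouped[system.split(" + ")[-1]].extend(values)
    return {clf: mean(vals) for clf, vals in grouped.items()}


kannada = mean_by_classifier(TABLE_7, "Kannada")
telugu = mean_by_classifier(TABLE_7, "Telugu")
print("Kannada:", kannada)
print("Telugu: ", telugu)
```

Under these assumptions, Kannada averages 49.5% with the GMM systems and 81.5% with the SVM systems, while Telugu stays below 60% in every cell of the table.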

[Bar charts: identification accuracy of Kannada and Telugu for MFCC + SDC + GMM, SFCC + SDC + GMM, MFCC + SDC + SVM and SFCC + SDC + SVM, each at model order 64 and 128; x-axis: Languages; bars: 3 s, 10 s and 30 s test utterances.]

Figure 12 a-d. Graphical representation of identification accuracy of Dravidian languages which are neighbouring to Indo-European family, using the four systems.

Unlike Kerala, Tamil Nadu was ruled mostly by Indian rulers like the Pallavas, the Rashtrakutas, the Cholas and the Pandyas. So colonial influence is not expected in Tamil. The Tamil results show better accuracy than Malayalam, which is in accordance with historical evidence.

Telugu is the only Dravidian language which consistently gives low accuracy. This is because Andhra Pradesh


Table 8. Identification accuracy of Dravidian languages which are non-neighbouring to Indo-European family, using the four systems

MFCC + SDC + GMM
Test duration   Malayalam         Tamil
(sec)           M.O. 64  M.O. 128 M.O. 64  M.O. 128
3               65       65       73       76
10              59       63       79       82
30              66       74       87       95

SFCC + SDC + GMM
Test duration   Malayalam         Tamil
(sec)           M.O. 64  M.O. 128 M.O. 64  M.O. 128
3               58       61       67       73
10              69       60       76       85
30              73       63       80       86

MFCC + SDC + SVM
Test duration   Malayalam         Tamil
(sec)           M.O. 64  M.O. 128 M.O. 64  M.O. 128
3               80       81       77       80
10              77       72       79       82
30              76       76       77       86

SFCC + SDC + SVM
Test duration   Malayalam         Tamil
(sec)           M.O. 64  M.O. 128 M.O. 64  M.O. 128
3               70       79       79       79
10              74       71       81       81
30              76       71       83       83
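
The statement that Tamil is identified more accurately than Malayalam can be examined cell by cell in Table 8. The following is a minimal sketch under stated assumptions: the values are transcribed from Table 8, and the pairing of cells by system, test duration and model order (via the hypothetical generator `paired_cells`) is one possible way to compare the two languages, not the authors' method.

```python
# Accuracy values (%) transcribed from Table 8. Each row is one test duration
# (3, 10, 30 s); columns run over the four systems in table order, and within
# each system: Malayalam (M.O. 64, 128) followed by Tamil (M.O. 64, 128).
TABLE_8 = [
    [65, 65, 73, 76, 58, 61, 67, 73, 80, 81, 77, 80, 70, 79, 79, 79],
    [59, 63, 79, 82, 69, 60, 76, 85, 77, 72, 79, 82, 74, 71, 81, 81],
    [66, 74, 87, 95, 73, 63, 80, 86, 76, 76, 77, 86, 76, 71, 83, 83],
]


def paired_cells(table):
    """Yield (malayalam, tamil) accuracy pairs for matching settings."""
    for row in table:
        for start in range(0, len(row), 4):  # one system spans 4 columns
            mal64, mal128, tam64, tam128 = row[start:start + 4]
            yield mal64, tam64
            yield mal128, tam128


pairs = list(paired_cells(TABLE_8))
tamil_higher = sum(t > m for m, t in pairs)
malayalam_higher = sum(m > t for m, t in pairs)
ties = len(pairs) - tamil_higher - malayalam_higher
print(f"Tamil higher in {tamil_higher} of {len(pairs)} settings; "
      f"Malayalam higher in {malayalam_higher}; ties: {ties}")
```

With the transcribed values, Tamil is higher in 21 of the 24 settings, Malayalam in 2 (both with MFCC + SDC + SVM at 3 sec), and one setting is tied.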

[Bar charts: identification accuracy of Malayalam and Tamil for MFCC + SDC + GMM, SFCC + SDC + GMM, MFCC + SDC + SVM and SFCC + SDC + SVM, each at model order 64 and 128; x-axis: Languages; bars: 3 s, 10 s and 30 s test utterances.]

Figure 13 a-d. Graphical representation of identification accuracy of Dravidian languages which are non-neighbouring to Indo-European family, using the four systems.

was under Mughal rule from 14th to 18th century. During this time, Telugu was highly influenced by Urdu. In the latter half of the 17th century, Mughal rule extended further south, culminating in the establishment of the princely state of Hyderabad. This heralded an era of Persian/Arabic influence on Telugu³⁰. Figure 14 shows that Telugu has descended from the Proto-South-Central Dravidian branch, whereas the other three Dravidian

Figure 14. The Dravidian family tree.

languages have descended from the Proto-South-Dravidian branch³. This separates out Telugu from the other three Dravidian languages.

A look at all the systems indicates that Kannada results show significant improvement when SVM is used. For the rest of the languages, results do not show much variation from system to system.

Analysis

There are certain notable facts in the results. Marathi, in spite of being a neighbouring Indo-European language, shows high accuracy, indicating low Dravidian influence. This may be due to non-porosity of the borders between the states. Chhattisgarhi, Odia and Telugu give consistently low results. A look at the geographical location of the three states speaking these languages shows that they are neighbouring states lying in the eastern region of India. There should be some specific historical reason behind this high intermingling of languages in these regions, apart from the ones stated above, which is yet to be found out.

Overall accuracy of non-neighbouring Indo-European languages is higher than overall accuracy of non-neighbouring Dravidian languages. One reason for this could be the geographical extent of the Indo-European speaking states. For example, Punjabi or Bengali (even Hindi in a few cases) is spoken in states which are far away from the Dravidian states. So it is unlikely that Dravidian influence would reach these states. By contrast, even the farthest Dravidian-speaking states are geographically closer to the Indo-European-speaking states.

Conclusion

This study uses machine learning for automatic recognition of the major Indian language families. MFCC and SFCC in conjunction with SDC have been used as feature extraction tools. For modelling of feature vectors, GMM and SVM are used. It is found that all the four systems can identify the language families with high accuracy. We have evaluated the influence of one language family on the other. We see that in most of the cases the neighbouring languages are influenced more by the other family. Also, among the neighbouring languages, some show higher or lower influence than others. This can be linked to certain known historical facts. The work opens new scope of study which will enable us to know our history better.

1. Ishtiaq, M., Language Shifts Among the Scheduled Tribes in India: A Geographical Study, Motilal Banarsidass Publ., 1999.
2. CIA World Factbook; https://www.cia.gov
3. Ethnologue: Languages of the World; http://www.ethnologue.com
4. Encyclopedia Britannica; http://www.britannica.com
5. Zissman, M. A., Automatic language identification of telephone speech. Lincoln Lab. J., 1995, 8(2), 115-144.
6. Torres-Carrasquillo, P. A., Reynolds, D. A. and Deller Jr, J. R., Language identification using Gaussian mixture model tokenization. In International Conference Spoken Language Processing, Denver, Colorado, United States, September 2002.
7. Torres-Carrasquillo, P. A., Singer, E., Kohler, M. A., Greene, R. J., Reynolds, D. A. and Deller Jr, J. R., Approaches to language identification using Gaussian mixture models and shifted delta cepstral features. International Conference Spoken Language Processing, Denver, Colorado, United States, 2002.
8. Campbell, W. M., Campbell, J. P., Reynolds, D. A., Singer, E. and Torres-Carrasquillo, P. A., Support vector machines for speaker and language recognition. Computer Speech and Language, Elsevier, 2006, pp. 210-229.
9. Li, H., Ma, B. and Lee, K. A., Spoken language recognition: from fundamentals to practice. Spoken Language Recogn., Proc. IEEE, December 2012.
10. Davis, S. B. and Mermelstein, P., Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Trans. Acoust., Speech Signal Process., 1980, 28(4), 357-366.
11. Vergin, R., Shaughnessy, D. O. and Farhat, A., Generalized mel frequency cepstral coefficients for large-vocabulary speaker-independent continuous-speech recognition. IEEE Trans. Speech Audio Process., 1999, 7(5), 525-532.
12. Paliwal, K., Shannon, B., Lyons, J. and Wojcicki, K., Speech signal-based frequency warping. IEEE Signal Process. Lett., 2009, 16(4), 319-322.
13. Matejka, P., Burget, L., Schwarz, P. and Cernocky, J., Brno University of Technology System for NIST 2005 Language Recognition Evaluation. In IEEE Odyssey - The Speaker and Language Recognition Workshop, San Juan, Puerto Rico, 28-30 June 2006.
14. Kohler, M. A. and Kennedy, M., Language Identification Using Shifted Delta Cepstra. In Circuits and Systems Conference, IEEE, 4-7 August 2002.
15. Reynolds, D. A. and Rose, R. C., Robust text-independent speaker identification using Gaussian mixture speaker models. IEEE Trans. Speech Audio Process., 1995, 3(1), 72-83.
16. Haykin, S., Neural Networks and Learning Machines, Pearson Education Inc., 2011.
17. Campbell, W. M., Sturim, D. E. and Reynolds, D. A., Support vector machines using GMM supervectors for speaker verification. IEEE Signal Process. Lett., 2006, 13(5), 308-311.
18. Reynolds, D. A., Quatieri, T. F. and Dunn, R. B., Speaker verification using adapted Gaussian mixture models. Digital Signal Process., 2000, 10(1-3), 19-41.
19. Prasar Bharati; http://newsonair.nic.in
20. Majumdar, R. C., Raychaudhuri, H. C. and Datta, K., An Advanced History of India, Macmillan, 1946.
21. http://orissa.gov.in
22. http://chhattisgarh.nic.in
23. Diffie, B. W. and Winius, G. D., Foundations of the Portuguese Empire, 1415-1850, University of Minnesota Press, Minnesota Archive Editions, 1977.
24. Shastry, B. S. and Borges, C. J., Goa-Kanara Portuguese Relations, 1498-1763, The Xavier Centre of Historical Research, 2000.
25. http://www.goakonkaniakademi.org
26. http://www.mu.ac.in
27. Ravindran, P. N., Black Pepper: Piper Nigrum, CRC Press, 2004.
28. Curtin, P. D., Cross-Cultural Trade in World History, Cambridge University Press, 1984.
29. Mathias Mundadan, A., From the Beginning up to the Middle of the Sixteenth Century (up to 1542) (History of Christianity in India), Church History Association of India, 1989.
30. http://www.aponline.gov.in

Received 8 April 2014; accepted 30 August 2015

doi: 10.18520/cs/v110/i4/667-681