A Genetic Algorithm-Based 3D Feature Selection For Lip Reading
Abstract—In lip reading, the selection of features plays a crucial role. Because the database in lip reading applications is video, a three-dimensional (3D) transformation is appropriate for extracting lip motion information. State-of-the-art lip reading is based on frame normalization and frame-wise feature extraction. However, this is not appropriate because information may be lost during frame normalization. Moreover, the frames cannot all be treated equally, as they carry varying amounts of motion information. In this paper, a 3D transform based method is proposed for feature extraction. These features are the input to a Genetic Algorithm (GA) model for discriminative analysis. The GA is used for dimensionality reduction and to improve the performance of the classifiers at low computational cost. The compact feature size reduces both the training and the testing time of the classifier. The CUAVE and Tulips databases of digit utterances are used for experimentation. The results obtained are compared with various feature selectors from the WEKA software. The proposed method is found to give better classification accuracy than the others.

Keywords— BPNN, Feature Selection, Genetic Algorithm, KNN, SVM, 3D-DWT, 3D-DCT.

I. INTRODUCTION
Visual speech recognition, also called lip reading, is a technique for identifying speech from lip movement. A skilled lip reader can understand speech from lip movement alone, and hearing-impaired people rely on it. It has long been known that lip movement carries speech information. Visual speech is important in many applications, such as speech in noisy areas, places where speaking aloud is not possible, and disaster conditions (e.g., earthquakes).

The literature on appearance models is extensive; a few noteworthy works are cited here to give a basic understanding of the challenges in lip reading. E. Petajan [1] experimented on lip reading to enhance speech recognition using visual information. Potamianos et al. [2] used linear image transforms, namely PCA, DWT and DCT. R. Seymour et al. [3] compared image transform features in visual speech recognition of clean and corrupted videos. They evaluated PCA, DCT, Fast Discrete Curvelet Transform (FDCT), and Linear Discriminant Analysis (LDA) methods. Classification performance parameters such as specificity, sensitivity and accuracy are used to test classifiers. Meyor et al. [4] applied the DCT to pixel information for continuous digit recognition and proposed different fusion techniques for audio and video feature data. They found that the Word Error Rate (WER) is higher for continuous digit recognition.

A high-dimensional feature set can negatively affect the performance of pattern or image recognition systems. In other words, too many features sometimes reduce the classification accuracy of the recognition system, since some of the features may be redundant and non-informative [5]. In machine learning, feature selection, also called variable selection or variable subset selection, is the process of obtaining a subset of relevant features, and it is useful in many applications. Many techniques are available for obtaining such subsets, including Principal Component Analysis (PCA), Particle Swarm Optimization (PSO) and Genetic Algorithm (GA) [6]. In recent times, many researchers have employed the Waikato Environment for Knowledge Analysis (WEKA) software for dimensionality reduction. However, WEKA is static in its feature selection approach, as users cannot change the configuration of the feature selectors concerned [7]. GA is known to be a very adaptive and efficient method of feature selection, as reported in [6]. Note that feature selection is inherently a multi-objective problem with two main goals: 1) minimizing the number of features and 2) minimizing the classification error.

The framework of the proposed lip reading model is described in Section II. Section III deals with GA-based feature selection. Experimental results and a description of the test corpus are given in Section IV. Finally, Section V presents our conclusion and the scope for future work.

II. PROPOSED LIP READING FRAMEWORK
A typical lip reading system consists of three major stages: video frame normalization and lip localization, feature extraction, and classification. Our proposed model adds one more stage, feature selection. Fig. 1 shows the major steps used in the proposed lip reading process.
978-1-4799-6272-3/15/$31.00(c)2015 IEEE
Authorized licensed use limited to: KIIT University. Downloaded on July 17,2023 at 04:24:18 UTC from IEEE Xplore. Restrictions apply.
[Block diagram: normalised frames and lip localisation; feature extraction by 3D DWT / 3D DCT; classifiers (SVM, BPNN, KNN)]
Fig. 1. Lip reading process

A. Video Segmentation and Lip Contour Localization
There are large inter- and intra-subject variations in the utterance of a digit, which results in a different number of frames for each utterance. We used audio analysis with the Praat software to segment the time duration and the associated video frames of each uttered digit. On average, 16 frames are sufficient for the utterance of any digit between 0 and 9. Out of these 16 frames we selected 10 significant frames. We used the AdaBoost algorithm for face and mouth detection. A sample result is shown in Fig. 2(a-b).

Fig. 2. (a) Detection of face and lip area for CUAVE database s02m (b) Lip portion

$$W_h(n, j) = \sum_{m=0}^{2n} I(m, j-1)\, g(2n - m) \qquad (2)$$

where W(n, j) is the wavelet output, h(n) and g(n) are the impulse responses of the low-pass and high-pass filters, j is the current level, n is the current input index and I(n, j − 1) is the input signal. The results from [8] motivated us to select the Dmey wavelet, since in lip reading applications the speech information is extracted from visual information.

III. FEATURE SELECTION
A. Genetic Algorithm (GA)
Genetic Algorithm (GA) is a population-based optimization technique and search heuristic that mimics the process of natural evolution [12, 13]. The operations in a GA are iterative procedures that manipulate one population of chromosomes (solution candidates) to produce a new population through genetic operators such as crossover and mutation. S. Sivanandam [14] related the terminology of human genetics to that of GA; Table 1 compares the two. The fitness of each solution candidate (chromosome) is evaluated using a function commonly referred to as the objective or fitness function. The formulation of the fitness function depends on the problem being solved. In this article, maximizing classification accuracy is equivalent to minimizing the classification error rate. Fig. 3 shows the selection of 3D-DWT or 3D-DCT features using the Genetic Algorithm.
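As an illustration of an error-based fitness function of this kind, the following Python sketch scores a binary chromosome by the leave-one-out error of a 1-nearest-neighbour classifier on the selected features, plus a small penalty on the fraction of features kept. The classifier choice, the penalty weight and all function names are assumptions made for illustration; the fitness function actually used in this work is not reproduced here.

```python
from typing import List, Sequence


def select(row: Sequence[float], mask: Sequence[int]) -> List[float]:
    """Keep only the features whose chromosome bit is 1."""
    return [x for x, bit in zip(row, mask) if bit]


def loo_1nn_error(features: Sequence[Sequence[float]],
                  labels: Sequence[int],
                  mask: Sequence[int]) -> float:
    """Leave-one-out 1-NN error rate computed on the masked features."""
    if not any(mask):
        return 1.0  # assumption: an empty subset is treated as worst case
    reduced = [select(row, mask) for row in features]
    errors = 0
    for i, xi in enumerate(reduced):
        best_dist, best_label = float("inf"), None
        for j, xj in enumerate(reduced):
            if i == j:
                continue
            dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
            if dist < best_dist:
                best_dist, best_label = dist, labels[j]
        if best_label != labels[i]:
            errors += 1
    return errors / len(reduced)


def fitness(features, labels, mask, penalty: float = 0.01) -> float:
    """Lower is better: error rate plus a hypothetical size penalty."""
    kept_fraction = sum(mask) / len(mask)
    return loo_1nn_error(features, labels, mask) + penalty * kept_fraction


# Toy data: feature 0 separates the two classes, feature 1 is misleading.
X = [[0.0, 5.0], [0.1, 1.0], [1.0, 5.1], [0.9, 1.1]]
y = [0, 0, 1, 1]
print(fitness(X, y, [1, 0]), fitness(X, y, [1, 1]))
```

On the toy data the chromosome that keeps only the discriminative feature receives the lower (better) fitness value, which is the behaviour a GA-based selector relies on.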
respectively. The three (3) kinds of kids, viz. elite, crossover and mutation, then form the new population (new generation). Crossover (a genetic operator) is the combination of two individuals (chromosomes) to form a crossover kid. The mutation operator, on the other hand, is used for genetic perturbation of the genes in each chromosome through bit flipping, depending on the mutation probability. Using the steps in Fig. 6, GA-based feature selection is explained in this section [6].

TABLE 1 COMPARATIVE TERMINOLOGY BETWEEN HUMAN GENETICS AND GA.

S.No.   Human genetics   GA terminology
1       Chromosomes      Bit strings
2       Genes            Features
3       Allele           Feature value
4       Locus            Bit position
5       Genotype         Encoded string
6       Phenotype        Decoded genotype

TABLE 2 PARAMETERS USED IN GA

population. Tournament selection is performed iteratively until the new population is filled up.

e. Crossover function
Here an XOR operation is performed on the two parent chromosomes, since they are binary:

$$\mathrm{cokids}(n) = p_1 \ \mathrm{xor}\ p_2 \qquad (4)$$

where n is an index running from 1 to the number of kids, p1 is the first parent chromosome and p2 is the second parent chromosome.

f. Mutation function
Mutation is the genetic perturbation of individuals in a population. Mutation ensures genetic diversity and a search over a broader solution space. In this paper uniform mutation is used, and the GA generates a set of random numbers from a uniform distribution.

g. Termination of GA
Once the GA reaches the optimum solution, it stops. Two stopping conditions are applicable: i) Maximum number of generations. Here this value is 100. ii) Stall generation limit. Its value is 0.000001.
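The operators described above can be sketched in Python as follows. This is a minimal illustration and not the implementation used in the experiments: the function names, the tournament size, the stall window of 50 generations, and the reading of 0.000001 as a tolerance on the improvement of the best fitness are all assumptions.

```python
import random
from typing import List, Sequence

Chromosome = List[int]  # binary string, one bit per feature


def xor_crossover(p1: Chromosome, p2: Chromosome) -> Chromosome:
    """Crossover kid as the bitwise XOR of two binary parents, as in (4)."""
    return [a ^ b for a, b in zip(p1, p2)]


def uniform_mutation(chrom: Chromosome, rate: float,
                     rng: random.Random) -> Chromosome:
    """Flip each bit independently when a uniform random draw is below rate."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in chrom]


def tournament(population: Sequence[Chromosome], scores: Sequence[float],
               size: int, rng: random.Random) -> Chromosome:
    """Pick `size` random contenders and return the one with lowest score."""
    contenders = rng.sample(range(len(population)), size)
    winner = min(contenders, key=lambda i: scores[i])
    return population[winner]


def should_stop(best_history: Sequence[float], max_generations: int = 100,
                stall_generations: int = 50, tolerance: float = 1e-6) -> bool:
    """Stop at the generation cap, or when the best fitness has improved by
    less than `tolerance` over the last `stall_generations` generations."""
    generation = len(best_history)
    if generation >= max_generations:
        return True
    if generation > stall_generations:
        improvement = best_history[-stall_generations - 1] - best_history[-1]
        return improvement < tolerance
    return False


rng = random.Random(0)
kid = xor_crossover([1, 0, 1, 1], [1, 1, 0, 1])
print(kid, uniform_mutation(kid, 0.1, rng))
```

Note that an XOR kid has a 1 only where the parents disagree, so two identical parents produce an all-zero chromosome; mutation is what restores diversity in that case.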
Fig. 5. Plot of fitness value vs. generation for 3D-DCT (CUAVE database)

A. CUAVE database
CUAVE [15] (Clemson University Audio Visual Experiments) was recorded by E. K. Patterson of the Department of Electrical and Computer Engineering, Clemson University, US. The database was recorded in an isolated sound booth at a resolution of 720 x 480 with the NTSC standard of 29.97 frames per second using a 1-megapixel CCD camera. It is a speaker-independent database consisting of connected and continuous digits spoken in different situations. The database consists of two major sections: one of speaker pairs and the other of individuals.

B. Tulips Database
Tulips is a small audiovisual database of 12 subjects saying the first 4 digits in English. The subjects are undergraduate students from the Cognitive Science Program at UCSD. The database was compiled at R. Movellan's laboratory. Tulips contains video files in 100 x 75 pixel, 8-bit gray level format, recorded at 30 frames per second. R. Movellan presented work on a speaker-independent visual speech recognition system that used a simple HMM as the classifier and the Tulips database of digits 1 to 4 for testing [16].

C. Results
The results of the GA are compared with two WEKA-based feature reduction techniques. The reduced feature vectors are tested with classifiers such as SVM, BPNN and KNN. Figs. 4 and 5 show the convergence of the GA based on the chosen fitness function. In Figs. 4 and 5, the fitness value is smaller for 3D-DWT than for 3D-DCT. This is also reflected in the performance of 3D-DWT, which is better than that of 3D-DCT. After GA-based feature selection, the features are reduced to 32% of the original features.

TABLE 3 RECOGNITION ACCURACY USING GA AND IG FEATURE SELECTION FOR 3D-DWT FEATURES (CUAVE AND TULIPS DATABASES)

9 CFS+KNN 70.28 61.45

Database: CUAVE
S.No.   Selector+Classifier   Reco. Accuracy (%)
1       GA+BPNN               74.00
2       GA+SVM                72.85
3       GA+KNN                68.28
4       IG+BPNN               69.14
5       IG+SVM                72.41
6       IG+KNN                70.28
7       CFS+BPNN              70
8       CFS+SVM               71.41
9       CFS+KNN               70.28

V. CONCLUSION
In most cases, the difference in the classification accuracy reported by the two approaches is very small. The features selected by both methods on the CUAVE and Tulips databases respectively. Overall, both the GA method and WEKA-CFS, which are wrapper-based feature selectors, produced better classification accuracy than the WEKA ranker (IG), which is filter-based. The main advantage of the method herein lies in its controllability, as the GA can be fine-tuned to produce better results by changing the fitness function. Of all the features selected. With the application of GA for dimensionality reduction, more discriminating features were obtained. After feature selection with GA, the performance of 3D-DWT is better for the BPNN classifier. In future, a combination of 3D-DWT and 3D-DCT will be used to obtain the most discriminative features between them.
[Flowchart of GA-based feature selection: Start; N x 164 chromosomes; fitness function; selection of parent chromosomes; elite kids, crossover kids and mutation kids; new population; fitness evaluation; termination condition; best chromosome; reduced feature vector; End]
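The last two steps of the flowchart, taking the best chromosome and producing the reduced feature vector, amount to applying a binary mask to each feature vector. The short Python sketch below illustrates this under the assumption that each chromosome bit corresponds to one feature column; the function names and the toy data are hypothetical.

```python
from typing import List, Sequence


def reduce_features(feature_matrix: Sequence[Sequence[float]],
                    best_chromosome: Sequence[int]) -> List[List[float]]:
    """Keep only the columns whose bit in the best chromosome is 1."""
    kept = [i for i, bit in enumerate(best_chromosome) if bit]
    return [[row[i] for i in kept] for row in feature_matrix]


def retained_fraction(best_chromosome: Sequence[int]) -> float:
    """Fraction of the original features that survive selection."""
    return sum(best_chromosome) / len(best_chromosome)


# Toy example: 2 samples with 4 features each, chromosome keeps columns 0 and 3.
features = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
best = [1, 0, 0, 1]
print(reduce_features(features, best), retained_fraction(best))
```

The same mask must be applied to both the training and the test feature vectors before they are passed to the classifier.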
[11] S. Morade and S. Patnaik, "Lip reading by using 3-D Discrete Wavelet Transform with Dmey wavelet", IJIP, Vol. 8, 385-396, 2014.
[12] J. Tian, Q. Hu, X. Ma and M. Ha, "An Improved KPCA/GA-SVM Classification Model for Plant Leaf Disease Recognition", Journal of Computational Information Systems, 18, 7737-7745, 2012.
[13] M. Mitchell, "An Introduction to Genetic Algorithms", A Bradford Book, The MIT Press, 1999.
[14] S. N. Sivanandam and S. N. Deepa, "Introduction to Genetic Algorithms", Springer-Verlag, Berlin, Heidelberg, 2008.
[15] E. Patterson, S. Gurbuz, Z. Tufekci and J. Gowdy, "CUAVE: a new audio-visual database for multimodal human-computer interface research", Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, 2017-2020, 2002.
[16] J. R. Movellan, "Visual Speech Recognition with Stochastic Networks", Advances in Neural Information Processing Systems, MIT Press, Cambridge, Vol. 7, 1995.