Application of Soft Computing to Face Recognition
Research scholar: Abdullah Gubbi (4PA09PEM03)
Guide: Dr. Mohammad Fazle Azeem
Department of Electronics & Communication Engineering, P.A. College of Engineering, Mangalore
Contents
Introduction
Literature Survey
Objective of the work
Work carried out so far
Neural Network Based Face Recognition
Summary of the Eigen-face Recognition Procedure
Type-2 Fuzzy Logic for Edge Detection of Gray Scale Images
Edge Detection by Type-2 FIS
Results and Discussions
Further work to be carried out
Conclusion
Introduction
Face recognition is one of the most important abilities that we use in our daily lives. Research in automatic face recognition started in the 1960s. Because of the nature of the problem, not only computer science researchers are interested in it, but also neuroscientists and psychologists. It is widely believed that one can instantly recognise thousands of people with whom one is familiar. As with many perceptual abilities, the ease with which humans can recognise faces disguises the complexity of the task.
Why Face Recognition?
Security
Fight terrorism
Find fugitives
Personal information access
ATM
Sporting events
Home access (no keys or passwords)
Any other application that would want personal identification
Improved human-machine interaction
Personalized advertising
Beauty search
Face Recognition System Requirements
The system should be inexpensive enough to use at many locations, and should match almost instantaneously:
Before the person walks away from the advertisement
Before the fugitive has a chance to run away
Ability to handle a large database
Ability to do recognition in varying environments
Problem Statement
Given still or video images of a scene, identify one or more persons in the scene using a stored database of faces; available collateral information such as race, age, and gender may be used to narrow the search.
What Is Difficult About Face Recognition
Lighting variation
Orientation variation (face angle)
Size variation
Large database
Processor intensive
Time requirements
Facial variations: (a) original image, (b) noise, (c) expression, (d) illumination, (e) pose, and (f) ageing
Challenges
Variations in pose: head position, frontal view, profile view, head tilt, facial expressions
Illumination Changes
Light direction and intensity changes, cluttered background, low quality images
Camera Parameters
Resolution, color balance etc.
Occlusion
Glasses, facial hair and makeup
Face Normalization
Adjustment
Expression, rotation, lighting, scale, head tilt, eye location
General Observations about Fuzzy Logic
Conceptually easy to understand.
Tolerant of imprecise data.
Built on the experience of experts.
Can model nonlinear functions of arbitrary complexity.
Can be blended with conventional control techniques.
ANNs in Real Face Recognition
Many architectures are available, but the MLP trained with the back-propagation algorithm is the most popular. Disadvantages:
Complex and difficult to train
Difficult to implement
Sensitive to lighting variation
General Image Types
Still image (digital photograph): still images can vary a lot from picture to picture, so face detection is needed.
Dynamic image (video camera): dynamic images require motion detection and head tracking.
Static Matching
Mug-shot matching is the most common application in this group. Typically, images in mug-shot applications are of good quality, consistent with existing law-enforcement standards. These standards cover the type of background, illumination, resolution of the camera, and the distance between the camera and the person being photographed.
Dynamic Matching
The images available through a video camera tend to be of low quality, which makes segmenting a face in a crowd difficult. One may also be able to do partial reconstruction of the face image using existing models. One of the strong constraints of this application is the need for real-time recognition.
Literature Survey
The objective is to explore the approaches, algorithms, and technologies available for automated face recognition. Contemporary face recognition algorithms can mainly be classified into two categories.
Model-based schemes: use the shape and texture of the face, along with 3D depth information.
Appearance-based schemes: use holistic texture features.
Model-Based Schemes
Faces vary in shape and in texture pattern across the face. Both shape and texture vary because of differences between individuals, and also because of changes in expression, lighting, and viewpoint. Model-based approaches (statistical models of appearance) rely on a large and representative training set of facial images. A feature-based system, based on elastic bunch graph matching, was developed by Wiskott et al. [12]. A 2D deformable face model through which the face variations are learned was used in [14].
Elastic Matching
Elastic matching is one of the pattern recognition techniques in computer science. Elastic matching (EM) is also known as deformable template, flexible matching, or nonlinear template matching. It can be defined as an optimization problem of two-dimensional warping that specifies corresponding pixels between the images being compared.
Deformable Models
Templates are allowed to translate, rotate, and deform to best fit the shape present in the image.
Wavelet decomposition of the face image is employed as the key element of matching-pursuit filters, to find the subtle differences between faces.
Elastic graph approach, based on the discrete wavelet transform: a set of Gabor wavelets is applied at a set of hand-selected prominent object points, so that each point is represented by a set of filter responses, called a jet.
Facial Fiducial Points
Template Matching Methods
Store a template
Predefined: based on edges or regions
Deformable: based on facial contours (e.g., snakes)
Templates are hand-coded (not learned)
Use correlation to locate faces
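As an illustration of correlation-based template localization, here is a minimal NumPy sketch. The image and template are synthetic placeholders invented for the example, and `ncc_match` is a hypothetical helper name, not part of any system described here.

```python
import numpy as np

def ncc_match(image, template):
    """Slide a template over an image and return the normalized
    cross-correlation score at every valid offset."""
    th, tw = template.shape
    ih, iw = image.shape
    t = template - template.mean()
    t_norm = np.sqrt((t ** 2).sum())
    scores = np.zeros((ih - th + 1, iw - tw + 1))
    for yy in range(scores.shape[0]):
        for xx in range(scores.shape[1]):
            w = image[yy:yy + th, xx:xx + tw]
            wz = w - w.mean()
            denom = np.sqrt((wz ** 2).sum()) * t_norm
            scores[yy, xx] = (wz * t).sum() / denom if denom > 0 else 0.0
    return scores

# Plant the template inside a noisy image; the correlation peak
# should land exactly at the planted position (5, 7).
rng = np.random.default_rng(0)
img = rng.random((30, 30))
tpl = rng.random((8, 8))
img[5:13, 7:15] = tpl
scores = ncc_match(img, tpl)
peak = np.unravel_index(np.argmax(scores), scores.shape)
print(peak)  # (5, 7)
```

The score is invariant to local brightness and contrast, which is why correlation-based face locators normalize each window before comparing it with the template.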
Face Template
Use relative pair-wise ratios of the brightness of facial regions (14x16 pixels): the eyes are usually darker than the surrounding face [Sinha 94]. Use average area intensity values rather than absolute pixel values. See also the Point Distribution Model (PDM) [Lanitis et al. 95].
Ratio template [Sinha 94]
Average shape [Lanitis et al. 95]
Template-Based Methods: Summary
Pros:
Simple
Cons:
Templates need to be initialized near the face images.
It is difficult to enumerate templates for different poses.
Appearance-Based Schemes
Principal Component Analysis (PCA) [15]
Linear Discriminant Analysis (LDA) [16], also called Fisher's LDA (FLD) or the Fisherface method [43]
Independent Component Analysis (ICA) [19]
Locality Preserving Projections (LPP) [20]
Support Vector Machine (SVM) method [8]
A single-framework model combining the merits of the PCA, LDA, and Bayesian subspace approaches was presented in [27].
Principal Component Analysis (PCA)
PCA is an unsupervised learning technique and hence does not use the label information of the data. Given the eigenfaces as a basis for a face subspace, a face image is compactly represented by a low-dimensional feature vector, and a face can be reconstructed as a linear combination of the eigenfaces. The eigenface-based method of face recognition, as proposed by Turk and Pentland, uses PCA to identify the image-space axes with the highest variance in facial characteristics. Much of the discriminatory information required for recognition is contained within the higher-order statistics of the face images [Bartlett et al.].
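The eigenface representation described above can be sketched with NumPy on toy data. The 16x16 "faces" here are random placeholders, not a real face database, and the component count of 5 is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy "face" data: 20 images of 16x16 pixels, flattened to 256-vectors.
X = rng.random((20, 256))
mean_face = X.mean(axis=0)
A = X - mean_face

# Eigenfaces = principal directions of the centred data (via SVD).
U, S, Vt = np.linalg.svd(A, full_matrices=False)
k = 5
eigenfaces = Vt[:k]                  # top-k eigenfaces, shape (5, 256)

# A face is represented by k coefficients, and reconstructed as a
# linear combination of the eigenfaces plus the mean face.
w = eigenfaces @ (X[0] - mean_face)  # feature vector, length 5
recon = mean_face + w @ eigenfaces   # low-rank reconstruction
print(w.shape, recon.shape)          # (5,) (256,)
```

Storing only `w` per face is what makes the representation compact: 5 numbers here instead of 256 pixels.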
Linear Discriminant Analysis (LDA)
Belhumeur et al. proposed to solve face recognition using Linear Discriminant Analysis (LDA), also called Fisherfaces or Fisher's Linear Discriminant (FLD). LDA is a supervised dimensionality-reduction method. Like the eigenface method, it produces a linear projection into a low-dimensional subspace. LDA minimizes the distance between faces of the same person (within-class scatter S_W) while maximizing the distance between faces of different persons (between-class scatter S_B). The within-class scatter is defined as
S_W = sum over classes c of sum over samples xi in c of (xi - x_c)(xi - x_c)^T,
where x_c is the mean of class c, x is the total mean, Nc is the number of samples of class c, and C is the number of classes.
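The two scatter matrices can be sketched as follows on toy two-class data; `scatter_matrices` is a hypothetical helper name. The decomposition S_W + S_B = total scatter is the identity the sketch is checked against.

```python
import numpy as np

def scatter_matrices(X, y):
    """Within-class (S_W) and between-class (S_B) scatter matrices
    for data X (n_samples x n_features) with integer labels y."""
    mean_total = X.mean(axis=0)
    d = X.shape[1]
    S_W = np.zeros((d, d))
    S_B = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        Dc = Xc - mean_c
        S_W += Dc.T @ Dc                         # spread inside class c
        diff = (mean_c - mean_total).reshape(-1, 1)
        S_B += Xc.shape[0] * (diff @ diff.T)     # spread between class means
    return S_W, S_B

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (10, 3)), rng.normal(5, 1, (10, 3))])
y = np.array([0] * 10 + [1] * 10)
S_W, S_B = scatter_matrices(X, y)
print(S_W.shape, S_B.shape)  # (3, 3) (3, 3)
```

Fisherfaces then seek directions w maximizing the ratio (w^T S_B w) / (w^T S_W w), which leads to a generalized eigenproblem on this pair of matrices.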
Independent Component Analysis (ICA)
ICA is very similar to PCA, but where PCA minimizes only the second-order dependencies, ICA also minimizes higher-order dependencies, finding components that are non-Gaussian. ICA originates from solving the blind source separation problem: decomposing the input signal x into a linear combination of independent source signals. ICA is a method for finding underlying factors or components from multivariate (multi-dimensional) statistical data. What distinguishes ICA from other methods is that it looks for components that are both statistically independent and non-Gaussian.
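A minimal blind-source-separation sketch using scikit-learn's `FastICA` (assumed available). The two non-Gaussian sources and the mixing matrix are synthetic, invented purely for illustration.

```python
import numpy as np
from sklearn.decomposition import FastICA

# Two independent non-Gaussian sources, linearly mixed.
rng = np.random.default_rng(3)
t = np.linspace(0, 8, 2000)
S = np.c_[np.sign(np.sin(3 * t)),        # square-wave source
          rng.laplace(size=2000)]        # heavy-tailed source
A = np.array([[1.0, 0.5], [0.5, 1.0]])   # unknown mixing matrix
X = S @ A.T                              # observed mixtures

ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X)             # recovered sources
print(S_est.shape)                       # (2000, 2)
```

ICA can only recover the sources up to permutation and scaling, which is why face-recognition pipelines use the ICA basis images rather than any particular ordering of them.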
ICA vs. PCA decomposition of a 2D data set. (a) The bases of PCA (orthogonal) and ICA (non-orthogonal). (b) Left: the projection of the data onto the top two principal components (PCA). Right: the projection onto the top two independent components (ICA). (From Bartlett et al.)
Locality Preserving Projections (LPP)
LPP, also known as Laplacianfaces, was proposed to optimally preserve the neighborhood structure of the data set [20]. LPP is considered an alternative to the PCA method. Its main objective is to preserve the local structure of the input vector space by explicitly considering the manifold structure. Since it preserves neighborhood information, its classification performance is much better than that of other subspace approaches such as PCA and FLD.
LPP Model
Let there be N input data points (x1, x2, ..., xN) in R^m. The first step of the algorithm is to construct the adjacency graph G of N nodes, such that nodes i and j are linked if xi and xj are close to each other under either of the following two conditions.
k-nearest neighbors: nodes i and j are linked by an edge if i is among the k nearest neighbors of j, or vice versa.
epsilon-neighborhoods: nodes i and j are linked by an edge if ||xi - xj||^2 < epsilon, where ||.|| is the usual Euclidean norm.
The next step is to construct the weight matrix Wt, a sparse symmetric N x N matrix with weight Wt_ij if there is an edge between nodes i and j, and 0 if there is no edge. There are two alternative criteria for constructing the weight matrix.
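A sketch of the graph-construction step, assuming the k-nearest-neighbor criterion and heat-kernel weights Wt_ij = exp(-||xi - xj||^2 / t), one of the two standard weighting choices; the data and the helper name `lpp_weights` are invented for illustration.

```python
import numpy as np

def lpp_weights(X, k=3, t=1.0):
    """k-nearest-neighbour adjacency graph with heat-kernel weights
    Wt_ij = exp(-||xi - xj||^2 / t); Wt_ij = 0 if i, j are not linked."""
    n = X.shape[0]
    # All pairwise squared Euclidean distances.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    Wt = np.zeros((n, n))
    for i in range(n):
        # k nearest neighbours of i (index 0 is i itself, so skip it).
        nn = np.argsort(d2[i])[1:k + 1]
        for j in nn:
            w = np.exp(-d2[i, j] / t)
            Wt[i, j] = w
            Wt[j, i] = w   # link if i is a neighbour of j OR vice versa
    return Wt

rng = np.random.default_rng(4)
X = rng.random((10, 5))
Wt = lpp_weights(X)
print(Wt.shape)  # (10, 10)
```

The resulting Wt is symmetric and sparse in the sense that only neighbour pairs carry nonzero weight, which is what makes the graph encode only local structure.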
LPP Model
The objective function of the LPP model leads to the following generalized eigenvalue-eigenvector problem:
X L X^T a = lambda X D X^T a    (Eq. 2.5)
Note: the matrix X D X^T is always singular because of the high-dimensional nature of the image space. To alleviate this problem, PCA is used as a preprocessing step to reduce the dimensionality of the input vector space.
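The generalized eigenproblem of Eq. 2.5 can be sketched with SciPy on toy data. The chain-graph weight matrix below is invented for illustration; a real LPP run would use the adjacency graph built from the data, and would apply the PCA preprocessing mentioned above when X D X^T is singular.

```python
import numpy as np
from scipy.linalg import eigh

# Toy data: n points in R^d as columns of X, with a simple chain-graph
# weight matrix Wt standing in for the data-driven adjacency graph.
rng = np.random.default_rng(5)
n, d = 20, 4
X = rng.random((d, n))
Wt = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)

Dm = np.diag(Wt.sum(axis=1))    # degree matrix D
L = Dm - Wt                     # graph Laplacian L

# Generalized eigenproblem  (X L X^T) a = lambda (X D X^T) a ;
# the LPP directions are the eigenvectors with the smallest lambda.
lam, vecs = eigh(X @ L @ X.T, X @ Dm @ X.T)
a = vecs[:, 0]                  # first LPP direction
y = a @ X                       # 1-D embedding of the n points
print(y.shape)                  # (20,)
```

`scipy.linalg.eigh(A, B)` solves A v = lambda B v directly, returning eigenvalues in ascending order, so the leading LPP directions are simply the first columns.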
LPP
Consider the mapping xi to yi under the linear constraint yi = w^T xi.
LPP
Laplacian Eigenmaps versus LPP
Both apply a similar idea to compute a low-dimensional representation. Laplacian Eigenmaps does not form an explicit transformation; LPP computes an explicit linear transformation.
Support Vector Machine (SVM)
SVMs were introduced at COLT-92 by Boser, Guyon, and Vapnik, and have become rather popular since. Theoretically well-motivated algorithm: developed from Statistical Learning Theory (Vapnik and Chervonenkis) since the 1960s. Empirically good performance: successful applications in many fields (bioinformatics, text, image recognition, ...). SVMs are supervised learning models with associated learning algorithms that analyze data and recognize patterns, used for classification and regression analysis. The basic SVM takes a set of input data and predicts, for each given input, which of two possible classes forms the output, making it a non-probabilistic binary linear classifier.
Support Vector Machine (SVM)
Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that assigns new examples into one category or the other. An SVM model is a representation of the examples as points in space, mapped so that the examples of the separate categories are divided by a clear gap that is as wide as possible. New examples are then mapped into that same space and predicted to belong to a category based on which side of the gap they fall on. In addition to performing linear classification, SVMs can efficiently perform non-linear classification using what is called the kernel trick, implicitly mapping their inputs into high-dimensional feature spaces.
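The kernel trick can be illustrated on XOR-style data, which no linear boundary separates but an RBF kernel handles; scikit-learn is assumed available, and the data set is synthetic.

```python
import numpy as np
from sklearn.svm import SVC

# XOR-style data: label 1 when the two coordinates have the same sign.
# This is not linearly separable in the input space.
rng = np.random.default_rng(6)
X = rng.uniform(-1, 1, (200, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)

linear = SVC(kernel="linear").fit(X, y)
rbf = SVC(kernel="rbf").fit(X, y)      # kernel trick: implicit feature map
print(linear.score(X, y), rbf.score(X, y))
```

The linear SVM is stuck near chance on this data, while the RBF-kernel SVM draws a curved boundary by implicitly working in a high-dimensional feature space, exactly the mechanism described above.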
Gaussian Functions
Gaussian functions (GFs) modulated by sine waves are called Gabor functions in the field of signal and image processing. GFs form a complete but non-orthogonal basis set. Expanding a signal using this basis provides a localized frequency description: simultaneous localization of spatial and frequency information. The Gabor wavelet transform can extract both time (spatial) and frequency information from a given signal, and the tunable kernel size allows it to perform multi-resolution analysis.
Mixture Model
In statistics, a mixture model is a probabilistic model for representing the presence of sub-populations within an overall population, without requiring that an observed data set identify the sub-population to which an individual observation belongs. Problems associated with "mixture distributions" concern deriving the properties of the overall population from those of the sub-populations; "mixture models" are used to make statistical inferences about the properties of the sub-populations given only observations on the pooled population, without sub-population identity information.
Some ways of implementing mixture models involve steps that attribute postulated sub-population-identities to individual observations (or weights towards such sub-populations), in which case these can be regarded as types of unsupervised learning or clustering procedures. However not all inference procedures involve such steps.
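A minimal mixture-model sketch with scikit-learn's `GaussianMixture`: two synthetic sub-populations are pooled without identity labels, and EM recovers the sub-population structure, exactly the clustering-style inference described above.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Pooled observations from two sub-populations; no identity labels given.
rng = np.random.default_rng(7)
X = np.vstack([rng.normal(0, 1, (150, 2)),
               rng.normal(6, 1, (150, 2))])

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
print(np.round(np.sort(gmm.weights_), 2))   # roughly [0.5, 0.5]
labels = gmm.predict(X)   # postulated sub-population identity per point
```

`predict` attributes a postulated sub-population identity to each observation, while `predict_proba` would give the soft weights toward each sub-population that the text mentions.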
Curvelet Transform
The curvelet transform is a new multi-scale representation, most suitable for objects with curves.
Developed by Candès and Donoho (1999).
Still not fully matured, but it seems promising.
Point and Curve Discontinuities
A point discontinuity affects all the Fourier coefficients in the domain; hence the FT does not handle point discontinuities well.
Using wavelets, a point discontinuity affects only a limited number of coefficients; hence the WT handles point discontinuities well.
A discontinuity across a simple curve affects all the wavelet coefficients on the curve; hence the WT does not handle curve discontinuities well.
Curvelets are designed to handle curves using only a small number of coefficients; hence the CvT handles curve discontinuities well.
Curvelet Transform
The curvelet transform includes four stages:
Sub-band decomposition
Smooth partitioning
Renormalization
Ridgelet analysis
Gabor Wavelet
A simple model for the responses of simple cells in the primary visual cortex. It extracts edge and shape information. It can represent face image in a very compact way.
Gabor Wavelet
The Gabor transform, named after Dennis Gabor, is a special case of the short-time Fourier transform. It is used to determine the sinusoidal frequency and phase content of local sections of a signal as it changes over time. The function to be transformed is first multiplied by a Gaussian function, which can be regarded as a window function, and the resulting function is then transformed with a Fourier transform to derive the time-frequency analysis. The window function means that the part of the signal near the time being analyzed is given higher weight. The Gabor transform of a signal x(t) is defined by:
G_x(tau, omega) = integral of x(t) exp(-pi (t - tau)^2) exp(-j omega t) dt
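A discretized version of this transform can be sketched as follows; the Gaussian window exp(-pi (t - tau)^2) follows the classical definition, while the test signal and helper name are invented for the example.

```python
import numpy as np

def gabor_transform(x, t, tau, omega):
    """Discrete approximation of
    G_x(tau, omega) = integral x(t) exp(-pi (t-tau)^2) exp(-1j w t) dt."""
    dt = t[1] - t[0]
    window = np.exp(-np.pi * (t - tau) ** 2)   # Gaussian window at tau
    return np.sum(x * window * np.exp(-1j * omega * t)) * dt

# A 5 Hz sine: |G| peaks when omega matches the signal frequency.
t = np.linspace(-10, 10, 4000)
x = np.sin(2 * np.pi * 5 * t)
omegas = 2 * np.pi * np.arange(1, 10)          # probe 1..9 Hz
mags = [abs(gabor_transform(x, t, tau=0.0, omega=w)) for w in omegas]
print(np.argmax(mags) + 1)  # 5  (frequency in Hz with largest response)
```

Sweeping `tau` as well would give the full time-frequency map; the Gaussian window is what restricts each measurement to the local section of the signal around `tau`.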
Gabor Wavelet (cont)
Advantages:
Fast
Acceptable accuracy
Small training set
Disadvantages:
Affected by complex backgrounds
Only slight rotation invariance
Gabor Wavelet
Gabor wavelets can be used to extract facial information, matching faces with the features extracted by the Gabor wavelet. The advantages and disadvantages are the same as those listed on the previous slide.
Gabor Wavelet (cont)
Real Part
Imaginary Part
The Face Processing System: Gabor filtering followed by PCA.
The Face Processing System: Gabor filtering followed by ICA.
DiaPCA
DiaPCA is a diagonal PCA method. It maintains the correlations between the rows and the columns of the image during dimensionality reduction. It overcomes a shortcoming of 2DPCA [8], which reflects only the variations between image rows while ignoring the variations between columns, and it keeps the features of the image at a better level while reducing the image dimensions.
Proposed Work Aims to Address the Following Issues:
Give an overview of existing face recognition systems and the current state of research in this field.
Identify the problems associated with existing face recognition systems and possible avenues of research that may help to address these issues.
Improve the effectiveness of existing face recognition algorithms by introducing additional processing steps or adapting the methods.
Analyze and evaluate a range of face recognition systems applied to two-dimensional data, in order to identify the advantages and disadvantages offered by the various approaches.
Determine the most effective method of combining methodologies from the range of face recognition techniques, in order to achieve a more effective face recognition system.
Evaluate this final face recognition system and present results in a standard format that may be compared with other existing face recognition systems.
Identify limitations of the final face recognition system and propose a line of further research to combat these limitations.
Objective of the work
Outline of a typical face recognition system
Work carried out so far
Neural Network Based Face Recognition
Face region chopped
Resized
Histogram Equalized
(A) Pre-processing steps. (B) The histogram of an image before (top) and after (bottom) histogram equalization.
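The histogram-equalization step of the pre-processing pipeline can be sketched in NumPy; the low-contrast input image is synthetic, and `equalize_hist` is a hypothetical helper name.

```python
import numpy as np

def equalize_hist(img):
    """Histogram equalization for an 8-bit grayscale image: map each
    gray level through the normalized cumulative histogram (CDF)."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]       # CDF at the lowest occupied level
    # Classic formula: rescale the CDF to the full 0..255 range.
    lut = np.clip(np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255),
                  0, 255).astype(np.uint8)
    return lut[img]

# A low-contrast image (values squeezed into 100..130) gets spread out.
rng = np.random.default_rng(8)
img = rng.integers(100, 131, size=(32, 32), dtype=np.uint8)
out = equalize_hist(img)
print(out.min(), out.max())  # 0 255
```

Spreading the intensities over the full range is what reduces the sensitivity to lighting variation before the feature vectors are fed to the network.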
(a) The face space and the three projected images on it; here u1 and u2 are the eigenfaces. (b) The projected face from the training database.
Summary of the Eigen-face Recognition Procedure
1. Form a face library that consists of face images of known individuals.
2. Choose a training set that includes a number of images (M) per person, with some variation in expression and lighting.
3. Calculate the M x M matrix L, find its eigenvectors and eigenvalues, and choose the M' eigenvectors with the highest associated eigenvalues.
4. Combine the normalized training-set images to produce the M' eigenfaces.
5. Store these eigenfaces for later use.
6. For each member of the face library, compute and store a feature vector Omega = [w1 w2 ... wM']^T.
7. For each new face image to be identified, calculate its feature vector Omega in the same way.
8. Use these feature vectors as network inputs and simulate the network with these inputs.
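Steps 3 and 4 hinge on diagonalizing the small M x M matrix L = A^T A instead of the huge pixel-space covariance (the Turk-Pentland trick). A NumPy sketch on random stand-in images (not a real face library):

```python
import numpy as np

# M training faces of N pixels each, with M << N.
rng = np.random.default_rng(9)
M, N = 10, 1024                   # 10 images of 32x32 pixels
faces = rng.random((M, N))
mean_face = faces.mean(axis=0)
A = (faces - mean_face).T         # N x M matrix of centred images

Lm = A.T @ A                      # step 3: the small M x M matrix L
vals, vecs = np.linalg.eigh(Lm)
order = np.argsort(vals)[::-1]    # highest eigenvalues first
M_prime = 5
V = vecs[:, order[:M_prime]]

eigenfaces = A @ V                # step 4: combine the training images
eigenfaces /= np.linalg.norm(eigenfaces, axis=0)   # unit-length columns

# Steps 6-7: feature vector Omega = [w1 ... wM'] for any face.
omega = eigenfaces.T @ (faces[0] - mean_face)
print(omega.shape)  # (5,)
```

Each eigenvector of L, multiplied by A, is an eigenvector of the full N x N covariance A A^T, so the expensive eigen-decomposition is avoided entirely.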
(A) Sample faces. (B) Normalized training image set. (C) Mean (average) face of the sample faces. (D) Eigenvalues corresponding to the eigenfaces. (E) The eigenvalues drop off quickly.
Results
Number of Hidden Neurons
How to determine the number of hidden neurons is a perennial discussion topic in neural networks. There are standard rules of thumb, but none of them is guaranteed to work for every network. The rules did not work in our case, so trial and error is still the best way to find the optimal number of hidden units. In our experiments, we trained the neural network with different numbers of hidden units (20 to 45) and recorded the recognition rates.
Recognition rate for the ORL face database: 75%
Recognition rate for the YALE face database: 83.333%
Number of training images = 18
Number of people in the training phase = 1
Total number of test images = 15
Number of hidden nodes = 36
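The trial-and-error sweep over hidden-unit counts can be sketched with scikit-learn; the digits data set stands in for the face feature vectors, and the specific settings below are illustrative, not the ones used in the experiments reported above.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Small labelled dataset standing in for face feature vectors.
X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Trial and error: train one network per hidden-unit count and
# record the recognition rate on the held-out split.
rates = {}
for h in (20, 30, 45):          # a few settings in the 20-45 range
    net = MLPClassifier(hidden_layer_sizes=(h,), max_iter=200,
                        random_state=0)
    net.fit(X_tr, y_tr)
    rates[h] = net.score(X_te, y_te)

best = max(rates, key=rates.get)
print(best, round(rates[best], 3))
```

Scoring on a held-out split, rather than on the training images, is what keeps the sweep from simply picking the network that memorizes best.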
Uncertainty knowledge in Image Processing.
Block Diagram of Type-2 FIS
Edge detection results (original image, gradient magnitude, Type-1 FIS output, Type-2 FIS output) for the test images: Taj Mahal (India), Baboon, and Lena.
Further work to be carried out
The face recognition process consists of two phases: feature selection and classification. Feature selection not only reduces the dimension of the data, but also makes verification more accurate.
Further work to be carried out
Face recognition based on Gabor wavelets and fuzzy logic, aiming to build a new classifier so as to improve recognition accuracy. Using a combination of wavelets and PCA for feature extraction: the wavelet transform is applied as a preprocessing step before PCA. This technique may give a high recognition rate for face images with low variation.
Further work to be carried out
From the literature it is understood that the same number of curvelet coefficients contains more edge information than wavelet coefficients. Since object recognition is driven by edge information, this information can be used more efficiently for better recognition. To investigate the effect of kernel-based learning algorithms such as SVM (Support Vector Machine), Kernel-PCA, and Kernel-FLD (Fisher Linear Discriminant) on the recognition rate: regular linear subspace methods can be made to extract non-linear features through the application of the kernel trick.
Further work to be carried out
Combining domains such as wavelets and/or the Gaussian Mixture Model (GMM) and/or PCA and/or Locality Preserving Projections (LPP). These approaches gain the advantages of both spatial and frequency components through the application of wavelets. They may also speed up the subsequent Expectation-Maximization learning of the GMM step by using a reduced number of wavelet coefficients as components. Their performance under noisy conditions is also to be studied.
Conclusion
Face recognition will remain an active research area until 100% recognition is achieved. Many algorithms and techniques exist, and all of them have pros and cons. One should aim for a higher recognition rate with minimum computation time, and the system should be robust. There is always scope for improvement; further work never ends.
References
1. Webpage1, [Link]
2. [Link], Dr. [Link] and [Link]. Automatic Facial Feature Extraction and Expression Recognition based on Neural Network. (IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 2, No. 1, January 2011.
3. M. Seibert and A. Waxman. Recognizing faces from their parts. In SPIE Proc.: Sensor Fusion IV: Control Paradigms and Data Structures, vol. 1611, 1991, pp. 129-140.
4. S. Akamatsu, T. Sasaki, H. Fukamachi, and Y. Suenaga. A robust face identification scheme: KL expansion of an invariant feature space. In SPIE Proc.: Intell. Robots and Computer Vision X: Algorithms and Techn., vol. 1607, 1991, pp. 71-84.
5. M. Kirby and L. Sirovich. Application of the Karhunen-Loeve procedure for the characterization of human faces. IEEE Trans. Patt. Anal. and Mach. Intell., vol. 12, pp. 103-108, 1990.
6. Z. Hong. Algebraic feature extraction of image for recognition. Patt. Recog., vol. 24, pp. 211-219, 1991.
7. B. A. Golomb and T. J. Sejnowski. SEXNET: A neural network identifies sex from human faces. In Advances in Neural Information Processing Systems 3, D. S. Touretzky and R. Lippmann, Eds. San Mateo, CA: Morgan Kaufmann, 1991.
8. [Link], [Link], [Link], Jorg Lange, [Link], [Link], and [Link]. Distortion invariant object recognition in the dynamic link architecture. IEEE Transactions on Computers, 42(3):300-310, 1993.
9. K. Aizawa et al. Human facial motion analysis and synthesis with application to model-based coding. In Motion Analysis and Image Sequence Processing. Boston, MA: Kluwer, 1993, pp. 317-348.
10. M. Buck and N. Diehl. Model-based image sequence coding. In Motion Analysis and Image Sequence Processing, M. I. Sezan and R. L. Lagendijk, Eds. Boston, MA: Kluwer, 1993, pp. 285-315.
11. H. Li, P. Roivainen, and R. Forchheimer. 3-D motion estimation in model-based facial image coding. IEEE Trans. Patt. Anal. and Mach. Intell., vol. 15, pp. 545-555, 1993.
12. [Link], [Link], [Link], and [Link]. Face recognition by elastic bunch graph matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7):775-779, 1997.
13. Ming Li and Baozong Yuan. 2DLDA: A statistical linear discriminant analysis for image matrix. Pattern Recognition Letters, 26(5):527-532, 2005.
14. [Link], [Link], and [Link]. Active appearance models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(6):681-685, 2001.
15. [Link] and [Link]. Eigenfaces for recognition. Journal of Cognitive Neuroscience, 3(1):71-86, 1991.
16. [Link]. The statistical utilization of multiple measurements. Annals of Eugenics, 8:376-386, 1938.
17. [Link]. Introduction to Statistical Pattern Recognition. Academic Press, second edition, 1990.
18. [Link] and Avinash C. Kak. PCA versus LDA. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(2):228-233, 2001.
19. [Link], [Link], and [Link]. Face recognition by independent component analysis. IEEE Transactions on Neural Networks, 13(6):1450-1464, 2002.
20. Xiaofei He, Shuicheng Yan, Yuxiao Hu, and Partha Niyogi. Face recognition using Laplacianfaces. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(3):328-340, March 2005.
21. Jianxin Wu and Zhi-Hua Zhou. Face recognition with one training image per person. Pattern Recognition Letters, 23:1711-1719, 2002.
22. [Link], Songcan Chen, Z.-H. Zhou, and [Link]. Recognizing partially occluded, expression variant faces from single training image per person with SOM and soft kNN ensemble. IEEE Transactions on Neural Networks, 16(4):875-886, 2005.
23. Daoqiang Zhang, Songcan Chen, and Zhi-Hua Zhou. A new face recognition method based on SVD perturbation for single example image per person. Applied Mathematics and Computation, 163:895-907, 2005.
24. Hongtao Yin, Ping Fu, and Shengwei Meng. Sampled FLDA for face recognition with single training image per person. Neurocomputing, 69(16-18):2443-2445, 2006.
25. [Link] and [Link]. Face recognition using the nearest feature line method. IEEE Transactions on Neural Networks, 10(2):439-443, 1999.
26. Bhagavathula [Link], Marios Savvides, and Chunyan Xie. Correlation pattern recognition for face recognition. Proceedings of the IEEE, 94(11):1963-1976, November 2006.
27. Xiaogang Wang and Xiaoou Tang. A unified framework for subspace face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(9):1222-1227, September 2004.
28. Hyun Chul Kim, Daijin Kim, and Sung Yang Bang. Face recognition using the mixture-of-eigenfaces method. Pattern Recognition Letters, 23:1549-1558, 2002.
29. Hyun Chul Kim, Daijin Kim, and Sung Yang Bang. Face recognition using LDA mixture model. Pattern Recognition Letters, 24:2815-2821, 2003.
30. Jen-Tzung Chien and Chia-Chen. Discriminant waveletfaces and nearest feature classifiers for face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(12):1644-1649, 2002.
31. Wangmeng Zuo, David Zhang, and [Link]. Bidirectional PCA with assembled matrix distance measure. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 36(4):863-872, 2006.
32. Noushath S., Hemantha Kumar G., and Shivakumara P. (2D)2LDA: An efficient approach for face recognition. Pattern Recognition, 39(7):1396-1400, 2006.
33. Ming Li and Baozong Yuan. 2DLDA: A statistical linear discriminant analysis for image matrix. Pattern Recognition Letters, 26(5):527-532, 2005.
34. Dacheng Tao, Xuelong Li, Xindong Wu, Weiming Hu, and Stephen J. Maybank. Supervised tensor learning. Knowledge and Information Systems, 2007.
END