Face Recognition With Local Binary Patterns
1 Introduction
T. Pajdla and J. Matas (Eds.): ECCV 2004, LNCS 3021, pp. 469–481, 2004. © Springer-Verlag Berlin Heidelberg 2004
470 T. Ahonen, A. Hadid, and M. Pietikäinen
Fig. 1. The basic LBP operator: the 3×3 neighbourhood (here 5, 9, 1; 4, 4, 6; 7, 2, 3) is thresholded by its center value, producing the binary string 11010011 and the decimal label 211.
Fig. 2. The circular (8,2) neighbourhood. The pixel values are bilinearly interpolated whenever the sampling point is not in the center of a pixel.
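The basic operator of Fig. 1 can be sketched in a few lines (the function name is my own, not from the paper): the eight neighbours are thresholded against the center pixel and read off as a binary number.

```python
# Minimal sketch of the basic 3x3 LBP operator: threshold the 8 neighbours
# against the center pixel and interpret the bits as a binary number.

def basic_lbp(patch):
    """patch: 3x3 list of lists of grey values; returns the LBP label (0-255)."""
    center = patch[1][1]
    # Clockwise from the top-left corner, matching the binary string in Fig. 1.
    neighbours = [patch[0][0], patch[0][1], patch[0][2],
                  patch[1][2], patch[2][2], patch[2][1],
                  patch[2][0], patch[1][0]]
    bits = ['1' if p >= center else '0' for p in neighbours]
    return int(''.join(bits), 2)

# The example of Fig. 1: values 5 9 1 / 4 [4] 6 / 7 2 3 around center 4.
patch = [[5, 9, 1],
         [4, 4, 6],
         [7, 2, 3]]
print(basic_lbp(patch))  # 211, i.e. binary 11010011
```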
A pattern is called uniform if it contains at most two bitwise transitions from 0 to 1 or vice versa when the binary string is considered circular. For example, 00000000, 00011110 and 10000011 are uniform patterns. Ojala et al. noticed in their experiments with texture images that uniform patterns account for a bit less than 90 % of all patterns when using the (8,1) neighbourhood and for around 70 % with the (16,2) neighbourhood.
We use the following notation for the LBP operator: LBP^{u2}_{P,R}. The subscript represents using the operator in a (P, R) neighbourhood. The superscript u2 stands for using only uniform patterns and labelling all remaining patterns with a single label.
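The uniformity test above is easy to state in code. The sketch below (helper names are mine) counts the circular 0/1 transitions of a P-bit pattern; patterns with at most two transitions are uniform.

```python
# Sketch of the uniformity test behind the u2 labelling: a pattern is uniform
# if its circular binary string has at most two 0/1 transitions.

def transitions(pattern, P=8):
    """Number of 0/1 transitions in the circular P-bit pattern."""
    bits = [(pattern >> i) & 1 for i in range(P)]
    return sum(bits[i] != bits[(i + 1) % P] for i in range(P))

def is_uniform(pattern, P=8):
    return transitions(pattern, P) <= 2

print(is_uniform(0b00000000))  # True
print(is_uniform(0b00011110))  # True
print(is_uniform(0b10000011))  # True (the string is considered circular)
print(is_uniform(0b01010101))  # False
```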
A histogram of the labelled image f_l(x, y) can be defined as

H_i = Σ_{x,y} I{f_l(x, y) = i},  i = 0, …, n − 1,  (1)

in which n is the number of different labels produced by the LBP operator and

I{A} = 1 if A is true, 0 if A is false.
This histogram contains information about the distribution of the local micropatterns, such as edges, spots and flat areas, over the whole image. For efficient face representation, one should also retain spatial information. For this purpose, the image is divided into regions R_0, R_1, …, R_{m−1} (see Figure 5 (a)) and the spatially enhanced histogram is defined as

H_{i,j} = Σ_{x,y} I{f_l(x, y) = i} I{(x, y) ∈ R_j},  i = 0, …, n − 1,  j = 0, …, m − 1.  (2)
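Equation (2) amounts to computing one label histogram per region and concatenating them. A minimal sketch, with my own function and variable names:

```python
# Sketch of the spatially enhanced histogram of Eq. (2): one label histogram
# per region R_j of the labelled image, concatenated into a single vector.

def spatial_histogram(labels, n_labels, grid):
    """labels: 2D list of LBP labels; grid: (rows, cols) of equal regions.
    Returns the concatenated m*n-long histogram H_{i,j}."""
    h, w = len(labels), len(labels[0])
    rows, cols = grid
    hist = []
    for j in range(rows * cols):
        r, c = divmod(j, cols)
        region = [0] * n_labels
        for y in range(r * h // rows, (r + 1) * h // rows):
            for x in range(c * w // cols, (c + 1) * w // cols):
                region[labels[y][x]] += 1
        hist.extend(region)
    return hist

# Toy 4x4 labelled image, 2 labels, 2x2 grid -> 4 regions x 2 bins = 8 values.
labels = [[0, 0, 1, 1],
          [0, 0, 1, 1],
          [1, 1, 0, 0],
          [1, 1, 0, 0]]
print(spatial_histogram(labels, 2, (2, 2)))  # [4, 0, 0, 4, 0, 4, 4, 0]
```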
From the pattern classification point of view, a usual problem in face recognition is having a plethora of classes and only a few, possibly only one, training samples per class. For this reason, instead of a more sophisticated classifier, a nearest-neighbour classifier is used. Several possible dissimilarity measures have been proposed for histograms:
– Histogram intersection:

D(S, M) = Σ_i min(S_i, M_i)  (3)

– Log-likelihood statistic:

L(S, M) = − Σ_i S_i log M_i  (4)
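The measures above (and the χ² statistic compared against them later in the paper) can be sketched as follows; S is the sample and M the model histogram, and the epsilon guard in the log-likelihood is my own addition, not from the paper.

```python
# Sketches of histogram dissimilarity measures: Eqs. (3)-(4) plus chi-square.
import math

def hist_intersection(S, M):
    # Eq. (3): a *similarity*; larger means the histograms overlap more.
    return sum(min(s, m) for s, m in zip(S, M))

def log_likelihood(S, M):
    # Eq. (4); epsilon guards against log(0) for empty model bins.
    eps = 1e-12
    return -sum(s * math.log(m + eps) for s, m in zip(S, M))

def chi_square(S, M):
    # chi^2(S, M) = sum_i (S_i - M_i)^2 / (S_i + M_i), skipping empty bins.
    return sum((s - m) ** 2 / (s + m) for s, m in zip(S, M) if s + m > 0)

S = [0.2, 0.3, 0.5]
M = [0.1, 0.4, 0.5]
print(round(hist_intersection(S, M), 4))  # 0.9
print(round(chi_square(S, M), 4))         # 0.0476
```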
3 Experimental Design
The CSU Face Identification Evaluation System [11] was utilised to test the
performance of the proposed algorithm. The system follows the procedure of
the FERET test for semi-automatic face recognition algorithms [12] with slight
modifications. The system uses the full-frontal face images from the FERET
database and works as follows (see Figure 3):
1. The system preprocesses the images. The images are registered using eye
coordinates and cropped with an elliptical mask to exclude non-face area
from the image. After this, the grey histogram over the non-masked area is
equalised.
2. If needed, the algorithm is trained using a subset of the images.
Fig. 3. The CSU Face Identification Evaluation System: image files and eye coordinates enter preprocessing; the preprocessed image files (and, if needed, an algorithm training subset) feed the experimental algorithm, which outputs a distance matrix; the NN classifier combines this with the gallery and probe image lists to produce rank curves.
3. The preprocessed images are fed into the experimental algorithm which out-
puts a distance matrix containing the distance between each pair of images.
4. Using the distance matrix and different settings for gallery and probe image sets, the system calculates rank curves. These can be calculated for prespecified gallery and probe image sets or by choosing random permutations of one large set as probe and gallery sets and calculating the average performance. The advantage of the former method is that it is easy to measure the performance of the algorithm under certain challenges (e.g. different lighting conditions), whereas the latter is statistically more reliable.
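Step 4 above can be sketched as follows (function and variable names are mine): given a probe-by-gallery distance matrix and the true identity of each image, entry r of the rank curve is the fraction of probes whose correct gallery match appears among the r+1 nearest gallery images.

```python
# Sketch of computing a rank curve ("cumulative score") from a distance matrix.

def rank_curve(dist, probe_ids, gallery_ids, max_rank):
    curve = [0.0] * max_rank
    for p, row in enumerate(dist):
        # Gallery images sorted by distance to this probe, nearest first.
        order = sorted(range(len(row)), key=lambda g: row[g])
        rank = next(i for i, g in enumerate(order)
                    if gallery_ids[g] == probe_ids[p])
        for r in range(rank, max_rank):
            curve[r] += 1.0 / len(dist)
    return curve

# Toy example: 2 probes, 3 gallery images.
dist = [[0.9, 0.1, 0.5],   # probe 0's true match (gallery 1) is nearest: rank 0
        [0.2, 0.3, 0.4]]   # probe 1's true match (gallery 2) is third: rank 2
print(rank_curve(dist, probe_ids=[1, 2], gallery_ids=[0, 1, 2], max_rank=3))
# [0.5, 0.5, 1.0]
```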
The CSU system uses the same gallery and probe image sets that were used
in the original FERET test. Each set contains at most one image per person.
These sets are:
– fa set, used as a gallery set, contains frontal images of 1196 people.
– fb set (1195 images). The subjects were asked for an alternative facial expression to the one in the fa photograph.
– fc set (194 images). The photos were taken under different lighting condi-
tions.
– dup I set (722 images). The photos were taken later in time.
– dup II set (234 images). This is a subset of the dup I set containing those
images that were taken at least a year after the corresponding gallery image.
In this paper, we use two statistics produced by the permutation tool: the mean recognition rate with a 95 % confidence interval and the probability of one algorithm outperforming another [13]. The image list used by the tool¹ contains 4 images of each of the 160 subjects. On each permutation, one image of every subject is selected into the gallery set and another into the probe set. The number of permutations is 10000.

¹ list640.srt in the CSU Face Identification Evaluation System package
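The permutation protocol can be sketched as below. This is my own reading of the procedure, not the CSU tool's code: the helper names are mine, and taking the empirical 2.5/97.5 percentiles of the rate distribution as the 95 % interval is an assumption about how such an interval could be formed.

```python
# Sketch of a permutation test: each permutation draws one gallery and one
# probe image per subject, classifies probes by nearest neighbour, and the
# per-permutation recognition rates are summarised.
import random

def permutation_test(images, distance, n_perm=100, seed=0):
    """images: dict subject_id -> list of feature vectors."""
    rng = random.Random(seed)
    rates = []
    for _ in range(n_perm):
        gallery, probes = {}, {}
        for sid, feats in images.items():
            g, p = rng.sample(range(len(feats)), 2)
            gallery[sid] = feats[g]
            probes[sid] = feats[p]
        correct = 0
        for sid, probe in probes.items():
            best = min(gallery, key=lambda s: distance(probe, gallery[s]))
            correct += (best == sid)
        rates.append(correct / len(probes))
    rates.sort()
    mean = sum(rates) / len(rates)
    # Empirical 95 % interval from the 2.5th and 97.5th percentiles (assumed).
    lo, hi = rates[int(0.025 * len(rates))], rates[int(0.975 * len(rates)) - 1]
    return mean, (lo, hi)

# Toy data: 2 subjects, 2 well-separated images each -> perfect recognition.
images = {0: [[0.0, 0.0], [0.0, 1.0]], 1: [[10.0, 0.0], [10.0, 1.0]]}
sqdist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
print(permutation_test(images, sqdist))  # (1.0, (1.0, 1.0)) on this easy data
```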
The CSU system comes with implementations of the PCA, LDA, Bayesian intra/extrapersonal classifier (BIC) and Elastic Bunch Graph Matching (EBGM) face recognition algorithms. We include the results obtained with PCA, BIC and EBGM here for comparison.
There are some parameters that can be chosen to optimise the performance of the proposed algorithm. The first one is choosing the LBP operator. Choosing an operator that produces a large number of different labels makes the histogram long and thus calculating the distance matrix gets slow. Using a small number of labels makes the feature vector shorter but also means losing more information. A small radius of the operator makes the information encoded in the histogram more local. The number of labels for a neighbourhood of 8 pixels is 256 for standard LBP and 59 for LBP^{u2}. For the 16-neighbourhood the numbers are 65536 and 243, respectively. The usage of uniform patterns is motivated by the fact that most patterns in facial images are uniform: we found out that in the preprocessed FERET images, 79.3 % of all the patterns produced by the LBP_{16,2} operator are uniform.
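The label counts quoted above (59 and 243) can be checked by enumeration: each uniform pattern gets its own label and all non-uniform patterns share one. A sketch with my own helper names:

```python
# Verify the u2 label counts by enumerating all P-bit patterns.

def transitions(pattern, P):
    """Number of 0/1 transitions in the circular P-bit pattern."""
    bits = [(pattern >> i) & 1 for i in range(P)]
    return sum(bits[i] != bits[(i + 1) % P] for i in range(P))

def n_labels_u2(P):
    uniform = sum(1 for p in range(2 ** P) if transitions(p, P) <= 2)
    return uniform + 1  # one shared label for all non-uniform patterns

print(n_labels_u2(8))   # 59
print(n_labels_u2(16))  # 243
```

The counts also follow in closed form: there are P(P − 1) + 2 uniform patterns, so 8·7 + 2 + 1 = 59 and 16·15 + 2 + 1 = 243 labels.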
Another parameter is the division of the images into regions R_0, …, R_{m−1}. The length of the feature vector becomes B = m·B_r, in which m is the number of regions and B_r is the LBP histogram length. A large number of small regions produces long feature vectors causing high memory consumption and slow classification, whereas using large regions causes more spatial information to be lost. We chose to divide the image with a grid into k × k equally sized rectangular regions (windows). See Figure 5 (a) for an example of a preprocessed facial image divided into 49 windows.
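As a worked instance of B = m·B_r, taking the 7 × 7 division of Figure 5 (a) together with the 59-bin LBP^{u2} histogram of the 8-pixel neighbourhood:

```python
# Feature vector length B = m * B_r for a k x k grid of regions.
k = 7          # grid side -> m = k * k regions
B_r = 59       # histogram length per region for LBP^{u2}, 8-neighbourhood
B = k * k * B_r
print(B)  # 2891
```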
4 Results
Operator          Window size  P(HI > LL)  P(χ² > HI)  P(χ² > LL)
LBP^{u2}_{8,1}    18×21        1.000       0.714       1.000
LBP^{u2}_{8,1}    21×25        1.000       0.609       1.000
LBP^{u2}_{8,1}    26×30        0.309       0.806       0.587
LBP^{u2}_{16,2}   18×21        1.000       0.850       1.000
LBP^{u2}_{16,2}   21×25        1.000       0.874       1.000
LBP^{u2}_{16,2}   26×30        1.000       0.918       1.000
LBP^{u2}_{16,2}   32×37        1.000       0.933       1.000
LBP^{u2}_{16,2}   43×50        0.085       0.897       0.418
parameters. Changes in the parameters may cause big differences in the length of the feature vector, but the overall performance is not necessarily affected significantly. For example, changing from LBP^{u2}_{16,2} in 18×21-sized windows to LBP^{u2}_{8,2} in 21×25-sized windows drops the histogram length from 11907 to 2124, while the mean recognition rate reduces from 76.9 % to 73.8 %.
The mean recognition rates for the LBP^{u2}_{16,2}, LBP^{u2}_{8,2} and LBP^{u2}_{8,1} operators as a function of the window size are plotted in Figure 4. The original 130×150 pixel image was divided into k × k windows, k = 4, 5, …, 11, 13, 16, resulting in window sizes from 32×37 to 8×9. The five smallest windows were not tested using the LBP^{u2}_{16,2} operator because of the high dimension of the feature vector that would have been produced. As expected, a larger window size induces a decreased recognition rate because of the loss of spatial information. The LBP^{u2}_{8,2} operator in 18×21 pixel windows was selected since it is a good trade-off between recognition performance and feature vector length.
Fig. 4. The mean recognition rate for three LBP operators (LBP^{u2}_{8,2}, LBP^{u2}_{16,2} and LBP^{u2}_{8,1}) as a function of the window size.
To find the weights w_j for the weighted χ² statistic (Equation 6), the following procedure was adopted: a training set was classified using only one of the 18×21 windows at a time. The recognition rates of corresponding windows on the left and right halves of the face were averaged. Then the windows whose rate lay below the 0.2 percentile of the rates got weight 0, and windows whose rate lay above the 0.8 and 0.9 percentiles got weights 2.0 and 4.0, respectively. The other windows got weight 1.0.
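The weight-assignment rule above can be sketched as follows. The thresholds (0.2, 0.8, 0.9) and weight values come from the text; reading the "percentiles" as positions in the sorted list of rates, and the function name, are my own assumptions.

```python
# Sketch of mapping per-window recognition rates to weights 0, 1, 2 or 4.

def assign_weights(rates):
    ranked = sorted(rates)
    n = len(ranked)
    q20 = ranked[int(0.2 * n)]   # 0.2 percentile threshold (assumed reading)
    q80 = ranked[int(0.8 * n)]   # 0.8 percentile threshold
    q90 = ranked[int(0.9 * n)]   # 0.9 percentile threshold
    weights = []
    for r in rates:
        if r < q20:
            weights.append(0.0)
        elif r >= q90:
            weights.append(4.0)
        elif r >= q80:
            weights.append(2.0)
        else:
            weights.append(1.0)
    return weights

rates = [0.1, 0.4, 0.5, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9]
print(assign_weights(rates))
# [0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 4.0]
```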
The CSU system comes with two training sets, the standard FERET training
set and the CSU training set. As shown in Table 2, these sets are basically subsets
of the fa, fb and dup I sets. Since illumination changes pose a major challenge
to most face recognition algorithms and none of the images in the fc set were
included in the standard training sets, we defined a third training set, called the
subfc training set, which contains half of the fc set (subjects 1013–1109).
Table 2. Number of images in common between different training and testing sets.
The permutation tool was used to compare the weights computed from the
different training sets. The weights obtained using the FERET standard set gave
an average recognition rate of 0.80, the CSU standard set 0.78 and the subfc set
0.81. The pairwise comparison showed that the weights obtained with the subfc
set are likely to be better than the others (P(subfc > FERET)=0.66 and P(subfc
> CSU)=0.88).
The weights computed using the subfc set are illustrated in Figure 5 (b).
The weights were selected without utilising an actual optimisation procedure
and thus they are probably not optimal. Despite that, in comparison with the
nonweighted method, we got an improvement both in the processing time (see
Table 3) and recognition rate (P(weighted > nonweighted)=0.976).
The image set which was used to determine the weights overlaps with the fc
set. To avoid biased results, we preserved the other half of the fc set (subjects
Fig. 5. (a) An example of a facial image divided into 7×7 windows. (b) The weights set for the weighted χ² dissimilarity measure. Black squares indicate weight 0.0, dark grey 1.0, light grey 2.0 and white 4.0.
Table 3. Processing times of weighted and nonweighted LBP on a 1800 MHz AMD
Athlon running Linux. Note that processing FERET images (last column) includes
heavy disk operations, most notably writing the distance matrix of about 400 MB to
disk.
Table 4. The recognition rates of the LBP and comparison algorithms for the FERET
probe sets and the mean recognition rate of the permutation test with a 95 % confidence
interval.
Fig. 6. (a), (b), (c) Rank curves (cumulative score vs. rank) for the fb, fc and dup1 probe sets (from top to down), comparing LBP weighted, LBP nonweighted, Bayesian MAP, PCA MahCosine and EBGM CSU optimal.
[Rank curve (cumulative score vs. rank) comparing LBP weighted, LBP nonweighted, Bayesian MAP, PCA MahCosine and EBGM CSU optimal.]
deviation of 100 random permutations using LBP^{u2}_{16,2}, a window size of 30×37 and χ² as a dissimilarity measure. Window weights were not used. Note that no registration or preprocessing was applied to the images. The good results indicate that our approach is also relatively robust with respect to alignment. However, because of the lack of a standardised protocol for evaluating and comparing systems on the ORL database, it is difficult to include a fair comparison here with other approaches that have been tested using ORL.
The experimental results clearly show that the LBP-based method outper-
forms other approaches on all probe sets (fb, fc, dup I and dup II ). For instance,
our method achieved a recognition rate of 97% in the case of recognising faces
under different facial expressions (fb set), while the best performance among
the tested methods did not exceed 90%. Under different lighting conditions (fc
set), the LBP-based approach has also achieved the best performance with a
recognition rate of 79% against 65%, 37% and 42% for PCA, BIC and EBGM,
respectively. The relatively poor results on the fc set confirm that illumination
change is still a challenge to face recognition. Additionally, recognising duplicate
faces (when the photos are taken later in time) is another challenge, although
our proposed method performed better than the others.
To assess the performance of the LBP-based method on different datasets, we also considered the ORL face database. The experiments not only confirmed the validity of our approach, but also demonstrated its relative robustness against changes in alignment.
Analyzing the different parameters in extracting the face representation, we
noticed a relative insensitivity to the choice of the LBP operator and region
size. This is an interesting result since the other considered approaches are more
sensitive to their free parameters. This means that only simple calculations are
needed for the LBP description while some other methods use exhaustive training
to find their optimal parameters.
In deriving the face representation, we divided the face image into several
regions. We used only rectangular regions each of the same size but other di-
visions are also possible as regions of different sizes and shapes could be used.
To improve our system, we analyzed the importance of each region. This is mo-
tivated by the psychophysical findings which indicate that some facial features
(such as eyes) play more important roles in face recognition than other features
(such as the nose). Thus we calculated and assigned weights from 0 to 4 to the regions (see Figure 5 (b)). Although this kind of simple approach was adopted to
compute the weights, improvements were still obtained. We are currently inves-
tigating approaches for dividing the image into regions and finding more optimal
weights for them.
Although we clearly showed the simplicity of LBP-based face representation
extraction and its robustness with respect to facial expression, aging, illumi-
nation and alignment, some improvements are still possible. For instance, one
drawback of our approach lies in the length of the feature vector which is used for
face representation. Indeed, using a feature vector of length 2301 slows down the recognition speed, especially for very large face databases. A possible direction
is to apply a dimensionality reduction to the face feature vectors. However, due
to the good results we have obtained, we expect that the methodology presented
here is applicable to several other object recognition tasks as well.
References
1. Phillips, P., Grother, P., Micheals, R.J., Blackburn, D.M., Tabassi, E., Bone, J.M.: Face recognition vendor test 2002 results. Technical report (2003)
2. Zhao, W., Chellappa, R., Rosenfeld, A., Phillips, P.J.: Face recognition: a literature survey. Technical Report CAR-TR-948, Center for Automation Research, University of Maryland (2002)
3. Phillips, P.J., Wechsler, H., Huang, J., Rauss, P.: The FERET database and evaluation procedure for face recognition algorithms. Image and Vision Computing 16 (1998) 295–306
4. Turk, M., Pentland, A.: Eigenfaces for recognition. Journal of Cognitive Neuroscience 3 (1991) 71–86
5. Etemad, K., Chellappa, R.: Discriminant analysis for recognition of human face images. Journal of the Optical Society of America 14 (1997) 1724–1733
6. Wiskott, L., Fellous, J.M., Krüger, N., von der Malsburg, C.: Face recognition by elastic bunch graph matching. IEEE Transactions on Pattern Analysis and Machine Intelligence 19 (1997) 775–779
7. Moghaddam, B., Nastar, C., Pentland, A.: A Bayesian similarity measure for direct image matching. In: 13th International Conference on Pattern Recognition. (1996) II: 350–358
8. Ojala, T., Pietikäinen, M., Mäenpää, T.: Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence 24 (2002) 971–987
9. Ojala, T., Pietikäinen, M., Harwood, D.: A comparative study of texture measures with classification based on feature distributions. Pattern Recognition 29 (1996) 51–59
10. Gong, S., McKenna, S.J., Psarrou, A.: Dynamic Vision: From Images to Face Recognition. Imperial College Press, London (2000)
11. Bolme, D.S., Beveridge, J.R., Teixeira, M., Draper, B.A.: The CSU face identification evaluation system: its purpose, features and structure. In: Third International Conference on Computer Vision Systems. (2003) 304–311
12. Phillips, P.J., Moon, H., Rizvi, S.A., Rauss, P.J.: The FERET evaluation methodology for face recognition algorithms. IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (2000) 1090–1104
13. Beveridge, J.R., She, K., Draper, B.A., Givens, G.H.: A nonparametric statistical comparison of principal component and linear discriminant subspaces for face recognition. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition. (2001) I: 535–542
14. Samaria, F.S., Harter, A.C.: Parameterisation of a stochastic model for human face identification. In: IEEE Workshop on Applications of Computer Vision. (1994) 138–142