Face Recognition System
CONTENTS
1. Introduction
1.1 Motivation: Biometric Security Technology
1.2 Face Recognition
2. Analysis
2.1 Problem Statement
2.2 Literature Survey
2.2.1 Eigenface Method
2.2.2 Neural Network Approach
2.2.3 Stochastic Modeling
2.2.4 Geometrical feature matching
2.2.5 Template Matching
2.2.6 Graph Matching
2.2.7 N-tuple classifiers
2.2.8 Line Edge Map
2.3 Line Edge Map Method
2.3.1 Face Detection
2.3.2 Edge Detectors
2.3.3 Thinning Algorithm
2.3.4 Curve Fitting Algorithm
2.3.5 Hausdorff Distance Algorithm
2.4 Use Case Diagram
3. Design
3.1 Class Relationship Diagram
3.2 Class Diagram
3.3 Sequence Diagram
References
CHAPTER 1
INTRODUCTION
WHAT IS A BIOMETRIC?
Compared with traditional authentication tools such as passwords and tokens, a biometric is the most secure and convenient authentication tool. It cannot be borrowed, stolen, or forgotten, and forging one is practically impossible. Biometrics uses an individual's unique physical or behavioral characteristics to recognize or authenticate identity. Common physical biometrics include fingerprints; hand or palm geometry; and retina, iris, or facial characteristics. Behavioral characteristics include signature, voice (which also has a physical component), keystroke pattern, and gait. Of these classes of biometrics, technologies for signature and voice are the most developed.
Figure 1 describes the process involved in using a biometric system for security.
Figure 1: How a biometric system works.
(1) Capture the chosen biometric; (2) process the biometric, extract and enroll the
biometric template; (3) store the template in a local repository, or a central repository, or
a portable token such as a smart card; (4) live-scan the chosen biometric; (5) process the
biometric and extract the biometric template; (6) match the scanned biometric against
stored templates; (7) provide a matching score to business applications; (8) record a
secure audit trail with respect to system use.
Fingerprints
Fingerprint recognition looks at the patterns found on a fingertip. There are different approaches to fingerprint verification. Some emulate the traditional police method of matching minutiae, while others use straight pattern-matching devices or more unusual approaches such as moiré fringe patterns and ultrasonics. Some verification approaches can detect when a live finger is presented; others cannot.
Hand geometry
Hand geometry involves analyzing and measuring the shape of the hand. This biometric
offers a good balance of performance characteristics and is relatively easy to use. It is
suitable for locations with a large number of users. It is also appropriate when users access the system
infrequently and/or are perhaps less disciplined in their approach to the system. Accuracy
can be very high if desired. Flexible performance tuning and configuration can
accommodate a wide range of applications.
Retina
A retina-based biometric involves analyzing the layer of blood vessels situated at the back
of the eye. An established technology, this technique involves using a low-intensity light
source through an optical coupler to scan the unique patterns of the retina. Retinal
scanning can be quite accurate but does require the user to look into a receptacle and
focus on a given point. This is not particularly convenient if you wear glasses or are
concerned about having close contact with the reading device.
Iris
An iris-based biometric, on the other hand, involves analyzing features found in the
colored ring of tissue that surrounds the pupil. Iris scanning, undoubtedly the less
intrusive of the eye-related biometrics, uses a fairly conventional camera element and
requires no close contact between the user and the reader. In addition, it has the potential
for higher-than-average template-matching performance. Iris biometrics work with glasses in place, and iris scanning is one of the few biometric technologies that can work well in identification mode.
Face
Face recognition analyzes facial characteristics. It requires a digital camera to capture a facial image of the user for authentication. Facial features are the most important element in face recognition: the system extracts features from a face image and compares them with those stored in the database for identification.
Signature
Signature verification analyzes the way a user signs his/her name. Signing features such
as speed, velocity, and pressure are as important as the finished signature's static shape.
Voice
Voice authentication is not based on voice recognition but on voice-to-print
authentication, where complex technology transforms voice into text. Voice biometrics
has the maximum potential for growth, as it requires no new hardware.
Physical access
For decades, many highly secure environments have used biometric technology for entry
access. Today, the primary application of biometrics is in physical security i.e. to control
access to secure locations (rooms or buildings). Unlike photo identification cards, which a
security guard must verify, biometrics permits unmanned access control. Biometrics is
useful for high-volume access control. For example, biometrics controlled access of
65,000 people during the 1996 Olympic Games, and Disney World uses a fingerprint
scanner to verify season-pass holders entering the theme park.
Virtual access
For a long time, biometric-based network and computer access were areas often discussed
but rarely implemented. Virtual access is the application that will provide the critical mass
to move biometrics for network and computer access from the realm of science-fiction
devices to regular system components.
Physical lock-downs can protect hardware, and passwords are currently the most popular
way to protect data on a network. Biometrics, however, can increase the ability to protect
data by implementing a more secure key than a password. Biometrics also allows a
hierarchical structure of data protection, making the data further secure. Passwords supply
a minimal level of access to network data, but biometrics is the next level. You can even
layer biometric technologies to enhance security levels.
E-commerce applications
E-commerce developers are exploring the use of biometrics and smart cards to more
accurately verify a trading party's identity. Some are using biometrics to obtain secure
services over the telephone through voice authentication.
Covert surveillance
One of the more challenging research areas involves using biometrics for covert
surveillance. Using facial and body recognition technologies, researchers hope to use
biometrics to automatically identify known suspects entering buildings or traversing
crowded security areas such as airports. The use of biometrics for covert identification as
opposed to authentication must overcome technical challenges such as simultaneously
identifying multiple subjects in a crowd and working with uncooperative subjects. In
these situations, devices cannot count on consistency in pose, viewing angle, or distance
from the detector.
Although companies are using biometrics for authentication in a variety of situations, the
industry is still evolving and emerging. To both guide and support the growth of
biometrics, the Biometric Consortium formed in December 1995.
Standardization
The biometrics industry includes more than 150 separate hardware and software vendors,
each with their own proprietary interfaces, algorithms, and data structures. Standards are
emerging to provide a common software interface, to allow sharing of biometric
templates, and to permit effective comparison and evaluation of different biometric
technologies.
The BioAPI standard defines a common method for interfacing with a given biometric application. BioAPI is an open-systems standard developed by a consortium of more than 60
vendors and government agencies. Written in C, it consists of a set of function calls to
perform basic actions common to all biometric technologies, such as
• Enroll user,
• Verify asserted identity (authentication), and
• Discover identity.
Another draft standard is the Common Biometric Exchange File Format, which defines a
common means of exchanging and storing templates collected from a variety of biometric
devices.
Biometric assurance, i.e., the confidence that a biometric device can achieve the intended level of security, is another active research area. Current metrics for comparing biometric
technologies, such as the crossover error rate and the average enrollment time, are limited
because they lack a standard test bed on which to base their values. Several groups,
including the US Department of Defense's Biometrics Management Office, are
developing standard testing methodologies.
PKI uses public and private-key cryptography for user identification and authentication. It
has some advantages over biometrics: It is mathematically more secure, and it can be
used across the Internet. The main drawback of PKI is the management of the user's
private key. To be secure, the private key must be protected from compromise, while to be
useful, the private key must be portable. The solution to these problems is to store the
private key on a smart card and protect it with a biometric.
With the advancement in computer and automated systems, one is seldom surprised to
find such systems applicable to many visual tasks in our daily activities. Automated
systems on production lines inspect goods for our consumption, and law-enforcement
agencies use computer systems to search databases of fingerprint records. Visual
surveillance of scenes, visual feedback for control, etc. all have potential applications for
automated visual systems.
One area that has grown significantly in importance over the past decade is that of
computer face processing in visual scenes. Researchers attempt to teach the computer to
recognize and analyze human faces from images so as to produce an easy and convenient
platform for interaction between humans and computers. Law enforcement can be
improved by automatically recognizing criminals from a group of suspects. Security can
also be reinforced by identifying that the authorized person is physically present.
Moreover, human facial expressions can be analyzed to direct robot motion to perform
certain secondary, or even primary, tasks in our routine work requirements.
For more than a quarter of a century, research has been done on automatic face recognition. Psychophysicists and neuroscientists have attempted to understand why a human being is able to handle the task of face recognition nearly effortlessly. Engineers had, and still have, the dream of face recognition performed fully automatically by computers, with an accuracy comparable to the human ability to recognize faces. This problem has not been solved yet, and scientists still have a very long way to go to reach this goal.
Face recognition can basically be understood as a complex pattern recognition task.
Thus, most of the techniques that have been applied originate from the field of signal
processing and computer science research.
Probably because of the fast pace of face recognition research and the large number of approaches being pursued in parallel, there is no single textbook that can be recommended. However, several survey articles are useful for becoming acquainted with the subject.
The first attempts at face recognition dealt with the problem by describing features in the image and comparing them with stored data. Several other approaches have applied correlation with stored feature templates.
Automatic face recognition is a technique that can locate and identify faces automatically
in an image and determine “who is who” from a database. It is gaining more and more
attention in the area of computer vision, image processing and pattern recognition. There
are several important steps involved in recognizing face such as detection, representation
and identification. Based on different representations, various approaches can be grouped
into feature-based and image-based.
Usually, every group of researchers uses its own database of manually normalized faces, but such conditions are far removed from the kinds of images expected in practice. In realistic situations a face may appear anywhere in the image, not necessarily in the middle, and the background is typically cluttered.
Many techniques for face recognition have been developed whose principles span several
disciplines, such as image processing, pattern recognition, computer vision, and neural
networks. The increasing interest in face recognition is mainly driven by application
demands, such as non-intrusive identification and verification for credit cards and
automatic teller machine transactions, non-intrusive access-control to buildings,
identification for law enforcement, etc. Machine recognition of faces yields problems that
belong to the following categories whose objectives are briefly outlined:
1. Face Recognition: Given a test face and a set of reference faces in a database
find the N most similar reference faces to the test face.
2. Face Authentication: Given a test face and a reference one, decide if the test
face is identical to the reference face.
Face recognition has been studied more extensively than face authentication. The two
problems are conceptually different. On one hand, a face recognition system usually
assists a human expert to determine the identity of a test face by computing all similarity
scores between the test face and each human face stored in the system database and by
ranking them. On the other hand, a face authentication system should decide itself if a test
face is assigned to a client (i.e., one who claims his/her own identity) or to an impostor
(i.e., one who pretends to be someone else).
Cognitive psychological studies indicated that human beings recognize line drawings as
quickly and almost as accurately as gray-level pictures. These results might imply that
edge images of objects could be used for object recognition and to achieve similar
accuracy as gray-level images. A novel concept, “faces can be recognized using line edge
map,” is proposed. A compact face feature, Line Edge Map (LEM), is extracted for face
coding and recognition.
The faces were encoded into binary edge maps using the Sobel edge detection algorithm. The
Hausdorff distance was chosen to measure the similarity of the two point sets, i.e., the
edge maps of two faces, because the Hausdorff distance can be calculated without an
explicit pairing of points in their respective data sets. A pre-filtering scheme (two-stage
identification) is used to speed up the searching using a 2D pre-filtering vector derived
from the face LEM.
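To make the distance computation concrete, the following is a minimal NumPy/SciPy sketch (not the exact formulation used in the cited system) of a modified Hausdorff distance between two edge-point sets; the point sets are assumed to be N x 2 arrays of (row, column) edge coordinates.

```python
import numpy as np
from scipy.spatial.distance import cdist

def modified_hausdorff(points_a, points_b):
    """Modified Hausdorff distance between two 2-D point sets (N x 2 arrays).

    Illustrative sketch only: for each point in one set, take the distance to its
    nearest neighbour in the other set, average these, and return the larger of
    the two directed values.
    """
    d = cdist(points_a, points_b)      # pairwise Euclidean distances
    h_ab = d.min(axis=1).mean()        # directed distance A -> B
    h_ba = d.min(axis=0).mean()        # directed distance B -> A
    return max(h_ab, h_ba)

# Example: binary edge maps converted to coordinate lists before comparison.
edge_a = np.array([[0, 1, 0], [0, 1, 0], [0, 1, 0]], dtype=bool)
edge_b = np.array([[0, 0, 1], [0, 1, 0], [1, 0, 0]], dtype=bool)
score = modified_hausdorff(np.argwhere(edge_a), np.argwhere(edge_b))
print(f"dissimilarity = {score:.3f}")
```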
A feasibility investigation and evaluation for face recognition based solely on face LEM
is conducted, which covers all the conditions of human face recognition, i.e., face
recognition under controlled/ideal condition, varying lighting condition, varying facial
expression, and varying pose.
Chapter 2
Analysis
As noted in Chapter 1, automatic face recognition has been studied for more than a quarter of a century, yet fully automatic systems still fall well short of the human ability to recognize faces.
Since we are using the LEM method, we will apply a sequence of algorithms to fulfil this task. First, the important face region is located in the image. Edges are then detected in that region and thinned. The thinned edge map is converted to a line edge map, which is stored in the database. When an input image arrives, it is processed through the same steps and finally compared with the images stored in the database.
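A high-level sketch of this pipeline is given below, assuming a grey-level face image has already been cropped; it uses scikit-image for edge detection and thinning and leaves the final line fitting and matching as placeholders, since those steps are detailed later in this chapter. The threshold value and helper names are illustrative assumptions.

```python
import numpy as np
from skimage.filters import sobel
from skimage.morphology import skeletonize

def build_edge_skeleton(face_gray, edge_threshold=0.1):
    """Detect edges in a cropped face image and thin them to 1-pixel width.

    Sketch only: face-region detection, polygonal line fitting, and the
    Hausdorff-style matching described later are not implemented here.
    """
    magnitude = sobel(face_gray)             # gradient magnitude (float image)
    edges = magnitude > edge_threshold       # binary edge map
    return skeletonize(edges)                # thinned edge map

def enroll(face_gray, database):
    """Store the thinned edge map (later: its line edge map) as a template."""
    database.append(build_edge_skeleton(face_gray))

def identify(face_gray, database, distance_fn):
    """Compare a probe face against every enrolled template; return best match."""
    probe = build_edge_skeleton(face_gray)
    scores = [distance_fn(probe, template) for template in database]
    return int(np.argmin(scores)), min(scores)
```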
There are many techniques available for human face recognition, but the major human
face recognition technique applies mostly to frontal faces. Major methods considered for
face recognition are eigen-face (eigen-feature), neural network, dynamic link architecture,
hidden Markov model, geometrical feature matching, and template matching. The
approaches are analyzed in terms of the facial representations they used.
Illumination normalization [6] is usually necessary for the eigenface approach. Zhao and Yang [32] proposed a new method to compute the covariance matrix using three images of an object, each taken under different lighting conditions, to account for arbitrary illumination effects on a Lambertian object. Pentland et al. [8] extended their early work on eigenfaces to eigenfeatures corresponding to face components, such as eyes, nose, and mouth. They
used a modular eigenspace which was composed of the above eigenfeatures (i.e.,
eigeneyes, eigennose, and eigenmouth). This method would be less sensitive to
appearance changes than the standard eigenface method. In summary, eigenface appears
as a fast, simple, and practical method. However, in general, it does not provide
invariance over changes in scale and lighting conditions.
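As a point of comparison, a minimal eigenface sketch is shown below; it simply performs PCA on vectorized training faces with NumPy and projects a probe image onto the leading eigenfaces. It illustrates the general idea only, not any particular system cited above, and the number of components is an arbitrary placeholder.

```python
import numpy as np

def train_eigenfaces(faces, n_components=20):
    """faces: array of shape (n_images, height*width), one flattened face per row."""
    mean_face = faces.mean(axis=0)
    centered = faces - mean_face
    # SVD of the centred data; the rows of vt are the eigenfaces.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    eigenfaces = vt[:n_components]
    weights = centered @ eigenfaces.T        # projections of the training faces
    return mean_face, eigenfaces, weights

def recognize(probe, mean_face, eigenfaces, weights):
    """Return the index of the training face whose projection is closest to the probe's."""
    w = (probe - mean_face) @ eigenfaces.T
    distances = np.linalg.norm(weights - w, axis=1)
    return int(np.argmin(distances))
```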
Lawrence et al. [11] proposed a hybrid neural network, which combined local image
sampling, a self-organizing map (SOM) neural network, and a convolutional neural
network. The SOM provides a quantization of the image samples into a topological space
where inputs that are nearby in the original space are also nearby in the output space,
thereby providing dimension reduction and invariance to minor changes in the image
sample. The convolutional network extracts successively larger features in a hierarchical
set of layers and provides partial invariance to translation, rotation, scale, and
deformation. The authors reported 96.2 percent correct recognition on the ORL database of 400 images of 40 individuals. The classification time is less than 0.5 second, but the training time is as long as 4 hours. Lin et al. [13] used a probabilistic decision-based neural
network (PDBNN), which inherited the modular structure from its predecessor, a decision
based neural network (DBNN) [14].
The PDBNN can be applied effectively as:
1) A face detector, which finds the location of a human face in a cluttered image;
2) An eye localizer, which determines the positions of both eyes in order to generate meaningful feature vectors; and
3) A face recognizer, a hierarchical neural network structure with non-linear basis functions and a competitive credit-assignment scheme.
A PDBNN-based biometric identification system has the merits of both neural networks and statistical approaches, and its distributed computing principle is relatively easy to implement on parallel computers. In [13], it was reported that the PDBNN face recognizer had the capability of recognizing up to 200 people and could achieve up to 96 percent correct recognition in approximately 1 second. However, as the number of persons increases, the computation becomes more demanding. In general, neural network approaches encounter problems when the number of classes (i.e., individuals) increases. Moreover, they are not suitable for single-model-image recognition tasks, because multiple model images per person are necessary to train the system to an optimal parameter setting.
Stochastic modeling of non-stationary vector time series based on hidden Markov models
(HMM) has been very successful for speech applications. Samaria and Fallside [27]
applied this method to human face recognition. Faces were intuitively divided into
regions such as the eyes, nose, mouth, etc., which can be associated with the states of a
hidden Markov model. Since HMMs require a one-dimensional observation sequence and
images are two-dimensional, the images should be converted into either a 1D temporal sequence or a 1D spatial sequence.
In [28], a spatial observation sequence was extracted from a face image by using a band
sampling technique. Each face image was represented by a 1D series of pixel observation vectors. Each observation vector is a block of L lines, and there is an overlap of M lines
between successive observations. An unknown test image is first sampled to an
observation sequence. Then, it is matched against every HMM in the model face
database (each HMM represents a different subject). The match with the highest
likelihood is considered the best match and the relevant model reveals the identity of the
test face. The recognition rate of the HMM approach is 87 percent using the ORL database consisting of 400 images of 40 individuals. A pseudo 2D HMM [28] was reported to
achieve a 95 percent recognition rate in their preliminary experiments. Its classification
time and training time were not given (believed to be very expensive). The choice of
parameters had been based on subjective intuition.
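The band-sampling step can be illustrated with a short sketch: a 2-D face image is turned into a 1-D sequence of overlapping horizontal blocks of L rows with an overlap of M rows. The values of L and M below are arbitrary placeholders, not those used in [28].

```python
import numpy as np

def band_observations(image, block_lines=10, overlap_lines=6):
    """Convert a 2-D grey-level image into a 1-D sequence of observation vectors.

    Each observation is a flattened block of `block_lines` consecutive rows;
    successive blocks overlap by `overlap_lines` rows (illustrative values).
    """
    step = block_lines - overlap_lines
    rows, _ = image.shape
    sequence = []
    for top in range(0, rows - block_lines + 1, step):
        block = image[top:top + block_lines, :]
        sequence.append(block.ravel().astype(float))
    return np.array(sequence)   # shape: (n_observations, block_lines * width)
```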
One of the pioneering works on automated face recognition by using geometrical features
was done by Kanade [19] in 1973. His system achieved a peak performance of 75 percent recognition on a database of 20 people, using two images per person: one as the model and the other as the test image. Goldstein et al. [20] and Kaya and Kobayashi
[18] showed that a face recognition program provided with features extracted manually
could perform recognition apparently with satisfactory results. Brunelli and Poggio [21]
automatically extracted a set of geometrical features from the picture of a face, such as
nose width and length, mouth position, and chin shape. There were 35 features extracted
to form a 35 dimensional vector. The recognition was then performed with a Bayes
classifier. They reported a recognition rate of 90 percent on a database of 47 people.
In Brunelli and Poggio's comparison [21], template matching was superior in recognition (100 percent recognition rate) to geometrical matching (90 percent recognition rate) and was also simpler. Since the principal components (also known as eigenfaces or eigenfeatures) are linear combinations of the templates in the database, the technique cannot achieve better results than correlation [21], but it may be less computationally expensive. One drawback of template matching is its computational complexity. Another problem lies in the description of the templates: since the recognition system has to be tolerant to certain discrepancies between the template and the test image, this tolerance may average out the differences that make individual faces unique. In general, template-based approaches are considered a more logical approach than feature matching.
Graph matching is another approach to face recognition. Lades et al. [15] presented a
dynamic link structure for distortion invariant object recognition, which employed elastic
graph matching to find the closest stored graph. Dynamic link architecture is an extension
to classical artificial neural networks. Memorized objects are represented by sparse
graphs, whose vertices are labeled with a multi-resolution description in terms of a local
power spectrum and whose edges are labeled with geometrical distance vectors. Object
recognition can be formulated as elastic graph matching which is performed by stochastic
optimization of a matching cost function.
Wiskott and von der Malsburg [16] extended the technique and matched human faces
against a gallery of 112 neutral frontal view faces. Probe images were distorted due to
rotation in depth and changing facial expression. Encouraging results on faces with large
rotation angles were obtained. They reported recognition rates of 86.5 percent and 66.4
percent for the matching tests of 111 faces of 15 degree rotation and 110 faces of 30
degree rotation to a gallery of 112 neutral frontal views. In general, dynamic link architecture is superior to other face recognition techniques in terms of rotation invariance; however, the matching process is computationally expensive.
While the traditional n-tuple classifier deals with binary-valued input vectors, methods
using n-tuple systems with integer-valued inputs have also been developed. Allinson and
Kolcz [3] have developed a method of mapping scalar attributes into bit strings based on
a combination of CMAC and Gray coding methods. This method has the property that for
small differences in the arithmetic values of the attributes, the Hamming distance between the bit strings equals the arithmetic difference. For larger values of the arithmetic distance, the Hamming distance is guaranteed to be above a certain threshold.
The continuous n-tuple method also shares some similarity at the architectural level with
the single layer look-up perceptron of Tattersall et al [32], though they differ in the way
the class outputs are calculated, and in the training methods used to configure the contents
of the look-up tables (RAMs).
In summary, no existing technique is free from limitations. Further efforts are required to
improve the performances of face recognition techniques, especially in the wide range of
environments encountered in the real world.
Cognitive psychological studies indicated that human beings recognize line drawings as
quickly and almost as accurately as gray-level pictures. These results might imply that
edge images of objects could be used for object recognition and to achieve similar
accuracy as gray-level images. The faces were encoded into binary edge maps using the Sobel edge detection algorithm. The Hausdorff distance was chosen to measure the similarity of the two point sets, i.e., the edge maps of two faces, because the Hausdorff distance can be calculated without an explicit pairing of points in their respective data sets. The modified Hausdorff distance, in the formulation

h(A, B) = (1/N_A) Σ_{a∈A} min_{b∈B} ||a − b||,   H_mod(A, B) = max( h(A, B), h(B, A) ),

where A and B are the two edge-point sets and N_A is the number of points in A, was used, as it is less sensitive to noise than the maximum or kth-ranked Hausdorff
distance formulations. Takacs argued that the process of face recognition might start at a
much earlier stage and edge images can be used for the recognition of faces without the
involvement of high-level cognitive functions. However, the Hausdorff distance uses only
the spatial information of an edge map without considering the inherent local structural
characteristics inside such a map. A successful object recognition approach might need to combine aspects of feature-based approaches with template matching methods. A Line
Edge Map (LEM) approach extracts lines from a face edge map as features. This
approach can be considered as a combination of template matching and geometrical
feature matching. The LEM approach not only possesses the advantages of feature-based
approaches, such as invariance to illumination and low memory requirements, but also has
the advantage of high recognition performance of template matching. The above three
reasons together with the fact that edges are relatively insensitive to illumination changes
motivated this research.
A novel face feature representation, Line Edge Map (LEM), is proposed here to integrate
the structural information with spatial information of a face image by grouping pixels of
face edge map to line segments. After thinning the edge map, a polygonal line fitting
process is applied to generate the LEM of a face. The LEM representation, which records
only the end points of line segments on curves, further reduces the storage requirement.
Efficient coding of faces is a very important aspect in a face recognition system. LEM is
also expected to be less sensitive to illumination changes due to the fact that it is an
intermediate-level image representation derived from the low-level edge map representation. The basic unit of the LEM is the line segment grouped from pixels of the edge map.
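In code, an LEM can be stored very compactly as a list of line segments, each keeping only its two end points. The sketch below shows one such representation and a rough storage comparison with a raw edge map; the byte counts are illustrative assumptions, not measurements from the cited work.

```python
import numpy as np

# A line edge map is a list of segments; each segment keeps only its two end points.
# Segment format: ((row0, col0), (row1, col1))
def lem_size_bytes(segments):
    """Approximate storage of an LEM: 4 coordinates per segment, 2 bytes each (assumed)."""
    return len(segments) * 4 * 2

def edge_map_size_bytes(edge_map):
    """Storage of a raw edge map listed as (row, col) pixel coordinates, 2 bytes each (assumed)."""
    return int(edge_map.sum()) * 2 * 2

# Toy example: a 100-pixel straight edge collapses to a single two-point segment.
edge_map = np.zeros((100, 100), dtype=bool)
edge_map[50, :] = True
segments = [((50, 0), (50, 99))]
print(edge_map_size_bytes(edge_map), "bytes as pixels vs", lem_size_bytes(segments), "bytes as LEM")
```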
In this study, we explore the information of LEM and investigate the feasibility and
efficiency of human face recognition using LEM. A Line Segment Hausdorff Distance
(LHD) measure is then proposed to match LEMs of faces. LHD has better distinctive
power because it can make use of the additional structural attributes of line orientation,
line-point association, and number disparity in LEM, i.e., it is not encouraged to match
two lines with large orientation difference, and all the points on one line have to match to
points on another line only.
The original algorithm is based on mosaic images of reduced resolution that attempt to
capture the macroscopic features of the human face. It is assumed that there is a
resolution level where the main part of the face occupies an area of about 4x4 cells.
Accordingly, a mosaic image can be created for this resolution level; this is the so-called
quartet image. The grey level of each cell equals the average value of the grey levels of
all pixels included in the cell. An abstract model for the face at the resolution level of the quartet image is depicted in Fig. 1 below.
The main part of the face corresponds to the region of 4x4 cells having an origin cell
marked by “X”. By subdividing each quartet image cell into 2x2 cells of half dimensions, the octet image results, where the main facial features such as the eyebrows/eyes, the
nostrils/nose and the mouth are detected. Therefore, a hierarchical knowledge-based
system can be designed that aims at detecting facial candidates by establishing rules
applied to the quartet image and subsequently at validating the choice of a facial
candidate by establishing rules applied to the octet image for detecting the key facial
features.
Fig. 1: Abstract face model at the quartet-image resolution: a 4x4-cell face region with origin cell “X”, a central 2x2 homogeneous region (light grey), a π-shaped region (black), and a beard region (dark grey).
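The quartet (and octet) image construction itself is straightforward block averaging; a minimal sketch is given below, where the cell dimensions n and m are assumed to be known (their estimation is described next).

```python
import numpy as np

def mosaic_image(image, cell_rows, cell_cols):
    """Reduce an image to a mosaic in which each cell holds the mean grey level.

    Sketch only: the image is cropped so that its size is a multiple of the cell size.
    """
    rows = (image.shape[0] // cell_rows) * cell_rows
    cols = (image.shape[1] // cell_cols) * cell_cols
    cropped = image[:rows, :cols].astype(float)
    blocks = cropped.reshape(rows // cell_rows, cell_rows, cols // cell_cols, cell_cols)
    return blocks.mean(axis=(1, 3))

# The octet image uses cells of half the quartet-cell dimensions, e.g.:
# quartet = mosaic_image(face, n, m); octet = mosaic_image(face, n // 2, m // 2)
```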
As can be seen, the underlying idea is very simple and very attractive, because it is close
to our intuition for the human face. However, the implementation is computationally
intensive. The algorithm is applied iteratively for the entire range of possible cell
dimensions in order to determine the best cell dimensions for creating the quartet image
for each person. Another limitation is that only square cells are employed. In order to
avoid the iterative nature of the original method, we estimate the cell dimensions in the
quartet image by processing the horizontal and the vertical profile of the image. Let us
denote by n and m the vertical and the horizontal quartet cell dimensions, respectively.
The horizontal profile of the image is obtained by averaging all pixel intensities in each image column. By detecting abrupt transitions in the horizontal profile, two significant local minima are determined; these correspond to the left and right sides of the head. Accordingly, the quartet cell dimension in the horizontal direction can easily be estimated.
Similarly, the vertical profile of the image is obtained by averaging all pixel intensities in
each image row. The significant local minima in the vertical profile correspond to the
hair, eyebrows, eyes, mouth and chin. It is fairly easy to locate the row where the
eyebrows/eyes appear in the image by detecting the local minimum after the first abrupt
transition in the vertical profile. Next, the row corresponding to the nose tip is detected; it corresponds to a significant maximum that occurs below the eyes. Then, the steepest minimum below the nose tip is associated with the upper lip. By setting the distance between the rows where
the eyes and the upper lip have been found to 2n, the quartet cell dimension in the vertical
direction can be estimated. It is evident that the proposed preprocessing step also overcomes the drawback of square cells, because the cell dimensions are adapted to each person
separately.
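The profile-based estimation can be sketched as follows: column means give the horizontal profile, row means give the vertical profile, and simple minima/maxima are then located. This is a deliberately crude stand-in; a real implementation would need smoothing and more robust transition detection, and the assumptions about which extrema correspond to which facial features are simplifications.

```python
import numpy as np

def estimate_cell_dimensions(image):
    """Very rough sketch of estimating quartet cell sizes from intensity profiles."""
    horizontal = image.mean(axis=0)     # one value per column
    vertical = image.mean(axis=1)       # one value per row

    # Left/right head boundaries: strongest minima in the left and right halves of
    # the horizontal profile (a crude stand-in for abrupt-transition detection).
    mid = len(horizontal) // 2
    left = int(np.argmin(horizontal[:mid]))
    right = mid + int(np.argmin(horizontal[mid:]))
    m = max(1, (right - left) // 4)     # horizontal cell size for a 4-cell-wide face

    # Eyes row: approximated as the minimum of the upper half of the vertical profile;
    # upper-lip row: approximated as the minimum below the eyes row (both simplified).
    eyes_row = int(np.argmin(vertical[: len(vertical) // 2]))
    lip_row = eyes_row + 1 + int(np.argmin(vertical[eyes_row + 1:]))
    n = max(1, (lip_row - eyes_row) // 2)   # eyes-to-lip distance is taken to be 2n
    return n, m
```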
Having estimated the quartet cell dimensions, we can describe the facial candidate detection rules. Since the system remains hierarchical, it is preferable to decide that a face exists in a scene when there is no actual face than to fail to detect a face that does exist. The decision whether or not a region of 4x4 cells is a facial candidate is based on:
• The detection of a homogeneous region of 2x2 cells in the middle of the model that
is shown in light grey color in fig 1 above.
• The detection of homogeneous connected components having significant length in
the π-shaped region shown in black color in fig 1, or,
• The detection of a beard region shown in dark gray color in fig 1.
Moreover, a significant difference in the average cell intensity between the central 2x2
region and the π-shaped region must be detected. For the sake of completeness, we note
that if there aren’t adequate cells in the vertical direction, the π-shaped region may have a
total length of 12 cells instead of 14 cells. We have found the above-described rules to be successful in detecting facial candidates. Subsequently, eyebrows/eyes,
nostrils/nose and mouth detection rules are developed to validate the facial candidates
determined by the procedure outlined above.
The operators described here are those whose purpose is to identify meaningful image
features on the basis of distributions of pixel grey levels. The two categories of operators
included here are:
1. Edge Pixel Detectors - that assign a value to a pixel in proportion to the likelihood that
the pixel is part of an image edge (i.e. a pixel which is on the boundary between
two regions of different intensity values).
2. Line Pixel Detectors - that assign a value to a pixel in proportion to the likelihood that
the pixel is part of an image line (i.e. a dark narrow region bounded on both sides
by lighter regions, or vice-versa).
Detectors for other features can be defined, such as circular arc detectors in intensity
images (or even more general detectors, as in the generalized Hough transform), or planar
point detectors in range images, etc.
Note that the operators merely identify pixels likely to be part of such a structure. To
actually extract the structure from the image it is then necessary to group together image
pixels (which are usually adjacent).
The Roberts Cross operator performs a simple, quick to compute, 2-D spatial gradient
measurement on an image. It thus highlights regions of high spatial gradient, which often
correspond to edges. In its most common usage, the input to the operator is a greyscale
image, as is the output. Pixel values at each point in the output represent the estimated
absolute magnitude of the spatial gradient of the input image at that point.
How It Works
In theory, the operator consists of a pair of 2×2 convolution masks as shown in Figure 1.
One mask is simply the other rotated by 90°. This is very similar to the Sobel operator.
        Gx              Gy
      +1   0           0  +1
       0  -1          -1   0
These masks are designed to respond maximally to edges running at 45° to the pixel grid,
one mask for each of the two perpendicular orientations. The masks can be applied
separately to the input image, to produce separate measurements of the gradient
component in each orientation (call these Gx and Gy). These can then be combined
together to find the absolute magnitude of the gradient at each point and the orientation of
that gradient.
The angle of orientation of the edge giving rise to the spatial gradient (relative to the pixel
grid orientation) is given by:
θ = arctan(Gy / Gx) − 3π / 4
In this case, orientation 0 is taken to mean that the direction of maximum contrast from
black to white runs from left to right on the image, and other angles are measured anti-
clockwise from this.
Often, the absolute magnitude is the only output the user sees --- the two components of
the gradient are conveniently computed and added in a single pass over the input image
using the pseudo-convolution operator shown in Figure 2.
P1 P2
P3 P4
| G |=| P1 − P4 | + | P2 − P3 |
The main reason for using the Roberts cross operator is that it is very quick to compute.
Only four input pixels need to be examined to determine the value of each output pixel,
and only subtractions and additions are used in the calculation. In addition there are no
parameters to set. Its main disadvantages are that since it uses such a small mask, it is
very sensitive to noise. It also produces very weak responses to genuine edges unless they
are very sharp. The Sobel operator performs much better in this respect.
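A small NumPy/SciPy sketch of the Roberts Cross operator follows; it convolves the image with the two 2x2 masks above and combines the responses using the |Gx| + |Gy| approximation.

```python
import numpy as np
from scipy.ndimage import convolve

def roberts_cross(image):
    """Approximate gradient magnitude using the Roberts Cross masks."""
    gx_mask = np.array([[1, 0],
                        [0, -1]], dtype=float)
    gy_mask = np.array([[0, 1],
                        [-1, 0]], dtype=float)
    img = image.astype(float)
    gx = convolve(img, gx_mask)
    gy = convolve(img, gy_mask)
    return np.abs(gx) + np.abs(gy)      # |G| approximated as |Gx| + |Gy|
```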
The Sobel operator performs a 2-D spatial gradient measurement on an image and so
emphasizes regions of high spatial gradient that correspond to edges. Typically it is used
to find the approximate absolute gradient magnitude at each point in an input greyscale
image.
How It Works
In theory at least, the operator consists of a pair of 3×3 convolution masks as shown in
Figure 1. One mask is simply the other rotated by 90°. This is very similar to the Roberts
Cross operator.
        Gx                    Gy
     -1   0  +1          +1  +2  +1
     -2   0  +2           0   0   0
     -1   0  +1          -1  -2  -1
These masks are designed to respond maximally to edges running vertically and
horizontally relative to the pixel grid, one mask for each of the two perpendicular
orientations. The masks can be applied separately to the input image, to produce separate
measurements of the gradient component in each orientation (call these Gx and Gy).
These can then be combined together to find the absolute magnitude of the gradient at
each point and the orientation of that gradient. The gradient magnitude is given by:

|G| = √( Gx² + Gy² )

Typically, an approximate magnitude is computed using:

|G| = |Gx| + |Gy|

which is much faster to compute.
The angle of orientation of the edge (relative to the pixel grid) giving rise to the spatial
gradient is given by:
θ = arctan(Gy / Gx)
In this case, orientation 0 is taken to mean that the direction of maximum contrast from
black to white runs from left to right on the image, and other angles are measured anti-
clockwise from this.
Often, this absolute magnitude is the only output the user sees --- the two components of
the gradient are conveniently computed and added in a single pass over the input image
using the pseudo-convolution operator shown in Figure 2.
P1 P2 P3
P4 P5 P6
P7 P8 P9
| G |= | ( P1 + 2 × P2 + P3 ) − ( P7 + 2 × P8 + P9 ) | + | ( P3 + 2 × P6 + P9 ) − ( P1 + 2 × P4 + P7 ) |
The Sobel operator is slower to compute than the Roberts Cross operator, but its larger
convolution mask smoothes the input image to a greater extent and so makes the operator
less sensitive to noise. The operator also generally produces considerably higher output
values for similar edges compared with the Roberts Cross.
As with the Roberts Cross operator, output values from the operator can easily overflow
the maximum allowed pixel value for image types that only support smallish integer pixel
values (e.g. 8-bit integer images). When this happens the standard practice is to simply set
overflowing output pixels to the maximum allowed value. The problem can be avoided by
using an image type that supports pixel values with a larger range.
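The corresponding Sobel computation, including both the exact and the approximate magnitude, can be sketched as follows; the result is returned as a floating-point image so that the overflow issue mentioned above does not arise.

```python
import numpy as np
from scipy.ndimage import convolve

def sobel_gradient(image, approximate=False):
    """Return the Sobel gradient magnitude and orientation of a grey-level image."""
    gx_mask = np.array([[-1, 0, 1],
                        [-2, 0, 2],
                        [-1, 0, 1]], dtype=float)
    gy_mask = np.array([[ 1,  2,  1],
                        [ 0,  0,  0],
                        [-1, -2, -1]], dtype=float)
    img = image.astype(float)
    gx = convolve(img, gx_mask)
    gy = convolve(img, gy_mask)
    if approximate:
        magnitude = np.abs(gx) + np.abs(gy)      # faster |Gx| + |Gy| form
    else:
        magnitude = np.sqrt(gx ** 2 + gy ** 2)   # exact sqrt(Gx^2 + Gy^2)
    orientation = np.arctan2(gy, gx)             # edge orientation in radians
    return magnitude, orientation
```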
Natural edges in images often lead to lines in the output image that are several pixels
wide due to the smoothing effect of the Sobel operator. Some thinning may be desirable
to counter this. Failing that, some sort of hysteresis ridge tracking could be used as in the
Canny operator.
The Canny operator was designed to be an optimal edge detector (according to particular
criteria --- there are other detectors around that also claim to be optimal with respect to
slightly different criteria). It takes as input a grey scale image, and produces as output an
image showing the positions of tracked intensity discontinuities.
How It Works
The Canny operator works in a multi-stage process. First of all the image is smoothed by
Gaussian convolution. Then a simple 2-D first derivative operator (somewhat like the
Roberts Cross) is applied to the smoothed image to highlight regions of the image with
high first spatial derivatives. Edges give rise to ridges in the gradient magnitude image.
The algorithm then tracks along the top of these ridges and sets to zero all pixels that are
not actually on the ridge top so as to give a thin line in the output, a process known as
non-maximal suppression. The tracking process exhibits hysteresis controlled by two
thresholds: T1 and T2 with T1 > T2. Tracking can only begin at a point on a ridge higher
than T1. Tracking then continues in both directions out from that point until the height of
the ridge falls below T2. This hysteresis helps to ensure that noisy edges are not broken
up into multiple edge fragments.
The effect of the Canny operator is determined by three parameters --- the width of the
Gaussian mask used in the smoothing phase, and the upper and lower thresholds used by
the tracker. Increasing the width of the Gaussian mask reduces the detector's sensitivity to
noise, at the expense of losing some of the finer detail in the image. The localization error
in the detected edges also increases slightly as the Gaussian width is increased.
Usually, the upper tracking threshold can be set quite high, and the lower threshold quite
low for good results. Setting the lower threshold too high will cause noisy edges to break
up. Setting the upper threshold too low increases the number of spurious and undesirable
edge fragments appearing in the output.
One problem with the basic Canny operator is to do with Y-junctions i.e. places where
three ridges meet in the gradient magnitude image. Such junctions can occur where an
edge is partially occluded by another object. The tracker will treat two of the ridges as a
single line segment, and the third one as a line that approaches, but doesn't quite connect
to, that line segment.
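In practice the Canny operator is usually taken from a library rather than re-implemented; the sketch below uses scikit-image's canny, where sigma controls the width of the Gaussian smoothing and the two thresholds drive the hysteresis tracking. The file name and parameter values are placeholders, not values from this project.

```python
from skimage import io
from skimage.feature import canny

# "face.png" and the parameter values are assumptions; tune per image set.
image = io.imread("face.png", as_gray=True)
edges = canny(image,
              sigma=2.0,            # width of the Gaussian smoothing mask
              low_threshold=0.05,   # T2: tracking continues while the ridge stays above this
              high_threshold=0.15)  # T1: tracking may only start at ridge points above this
# `edges` is a binary image of thin, hysteresis-tracked edge pixels.
```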
Compass Edge Detection
How It Works
When using compass edge detection the image is convolved with a set of (in general 8)
convolution masks, each of which is sensitive to edges in a different orientation. For each
pixel the local edge gradient magnitude is estimated with the maximum response of all 8
masks at this pixel location:
|G| = max( |Gi| : i = 1, ..., n )
where Gi is the response of the mask i at the particular pixel position and n is the number
of convolution masks. The local edge orientation is estimated with the orientation of the
mask, which yields the maximum response.
Various masks can be used for this operation, for the following discussion we will use the
Prewitt mask. Two templates out of the set of 8 are shown in Figure 1:
        0°                   45°
     -1  +1  +1          +1  +1  +1
     -1  -2  +1          -1  -2  +1
     -1  +1  +1          -1  -1  +1
The whole set of 8 masks is produced by taking one of the masks and rotating its
coefficients circularly. Each of the resulting masks is sensitive to another edge orientation
ranging from 0° to 315° in steps of 45°, where 0° corresponds to a vertical edge.
The maximum response |G| for each pixel gives rise to the value of the corresponding
pixel in the output magnitude image. The values for the output orientation image lie
between 1 and 8, depending on which of the 8 masks produced the maximum response.
This edge detection method is also called edge template matching, because a set of edge
templates is matched to the image, each representing an edge in a certain orientation. The
edge magnitude and orientation of a pixel is then determined by the template, which
matches the local area of the pixel the best.
The compass edge detector is an appropriate way to estimate the magnitude and
orientation of an edge. Whereas differential gradient edge detection needs a rather time-
consuming calculation to estimate the orientation from the magnitudes in x- and y-
direction, the compass edge detection obtains the orientation directly from the mask with
the maximum response. The compass operator is limited to (here) 8 possible orientations; however, experience shows that most direct orientation estimates are not much more accurate.
On the other hand, the compass operator needs (here) 8 convolutions for each pixel,
whereas the gradient operator needs only 2, one mask being sensitive to edges in the
vertical direction and one to the horizontal direction. The result for the edge magnitude
image is very similar with both methods, provided the same convolving mask is used.
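A compass detector can be sketched by rotating one mask through the eight orientations and taking the per-pixel maximum response; the version below uses the 0° Prewitt compass template from Figure 1 and a simple circular rotation of the mask's outer ring. The rotation helper is an illustrative assumption rather than a standard library routine.

```python
import numpy as np
from scipy.ndimage import convolve

# Outer-ring positions of a 3x3 mask, listed in circular order.
RING = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]

def rotate_mask(mask, steps):
    """Rotate the 8 outer coefficients of a 3x3 mask circularly by `steps` positions (45° each)."""
    rotated = mask.copy()
    values = [mask[r, c] for r, c in RING]
    for i, (r, c) in enumerate(RING):
        rotated[r, c] = values[(i + steps) % 8]
    return rotated

def compass_edges(image):
    """Prewitt compass detection: maximum response over 8 mask orientations."""
    base = np.array([[-1,  1,  1],
                     [-1, -2,  1],
                     [-1,  1,  1]], dtype=float)    # 0-degree Prewitt compass mask
    img = image.astype(float)
    responses = np.stack([convolve(img, rotate_mask(base, k)) for k in range(8)])
    abs_responses = np.abs(responses)
    magnitude = abs_responses.max(axis=0)       # |G| = max over the 8 masks
    orientation = abs_responses.argmax(axis=0)  # index 0..7 of the winning mask
    return magnitude, orientation
```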
Common Variants
As already mentioned earlier, there are various masks, which can be used for Compass
Edge Detection. The most common ones are shown in Figure 2:
              0°                  45°
  Sobel:
     -1   0  +1           0  +1  +2
     -2   0  +2          -1   0  +1
     -1   0  +1          -2  -1   0

  Kirsch:
     -3  -3  +5          -3  +5  +5
     -3   0  +5          -3   0  +5
     -3  -3  +5          -3  -3  -3

  Robinson:
     -1   0  +1           0  +1  +1
     -1   0  +1          -1   0  +1
     -1   0  +1          -1  -1   0
Figure 2 Some examples for the most common compass edge detecting
masks, each example showing two masks out of the set of eight.
For every template, the set of all eight masks is obtained by shifting the coefficients of the
mask circularly. The result for using different templates is similar; the main difference is
the different scale in the magnitude image. The advantage of Sobel and Robinson masks
is that only 4 out of the 8 magnitude values must be calculated. Since each pair of masks rotated by 180° differs only in sign, each of the remaining four values can be generated by inverting (negating) the result of the opposite mask.
The zero crossing detector looks for places in the Laplacian of an image where the value
of the Laplacian passes through zero --- i.e. points where the Laplacian changes sign.
Such points often occur at `edges' in images --- i.e. points where the intensity of the
image changes rapidly, but they also occur at places that are not as easy to associate with
edges. It is best to think of the zero crossing detector as some sort of feature detector
rather than as a specific edge detector. Zero crossings always lie on closed contours and
so the output from the zero crossing detector is usually a binary image with single pixel
thickness lines showing the positions of the zero crossing points.
The starting point for the zero crossing detector is an image, which has been filtered using
the Laplacian of Gaussian filter. The zero crossings that result are strongly influenced by
the size of the Gaussian used for the smoothing stage of this operator. As the smoothing is
increased then fewer and fewer zero crossing contours will be found, and those that do
remain will correspond to features of larger and larger scale in the image.
How It Works
The core of the zero crossing detector is the Laplacian of Gaussian filter and so
knowledge of that operator is assumed here. As described there, `edges' in images give
rise to zero crossings in the LoG output. For instance, Figure 1 shows the response of a
1-D LoG filter to a step edge in the image.
However, zero crossings also occur at any place where the image intensity gradient starts
increasing or starts decreasing, and this may happen at places that are not obviously
edges. Often zero crossings are found in regions of very low gradient where the intensity
gradient wobbles up and down around zero.
Once the image has been LoG filtered, it only remains to detect the zero crossings. This
can be done in several ways.
The simplest is to simply threshold the LoG output at zero, to produce a binary image
where the boundaries between foreground and background regions represent the locations
of zero crossing points. These boundaries can then be easily detected and marked in a single pass, e.g. using some morphological operator. For instance, to locate all boundary
points, we simply have to mark each foreground point that has at least one background
neighbor.
The problem with this technique is that it will tend to bias the location of the zero crossing
edge to either the light side of the edge, or the dark side of the edge, depending upon
whether it is decided to look for the edges of foreground regions or for the edges of
background regions.
Figure 1 Response of 1-D LoG filter to a step edge. The left hand graph
shows a 1-D image, 200 pixels long, containing a step edge. The right
hand graph shows the response of a 1-D LoG filter with Gaussian standard
deviation 3 pixels.
A better technique is to consider points on both sides of the threshold boundary, and
choose the one with the lowest absolute magnitude of the Laplacian, which will hopefully
be closest to the zero crossing.
Since the zero crossings generally fall in between two pixels in the LoG filtered image, an alternative output representation is an image grid that is spatially shifted half a pixel across and half a pixel down relative to the original image. Such a representation is
known as a dual lattice. This does not actually localize the zero crossing any more
accurately of course. A more accurate approach is to perform some kind of interpolation
to estimate the position of the zero crossing to sub-pixel precision.
The behavior of the LoG zero crossing edge detector is largely governed by the standard
deviation of the Gaussian used in the LoG filter. The higher this value is set, the more small-scale features will be smoothed out of existence, and hence fewer zero crossings will be
produced. Hence, this parameter can be set to remove unwanted detail or noise as desired.
The idea that at different smoothing levels, different sized features become prominent is
referred to as `scale'.
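A minimal zero-crossing detector along these lines is sketched below: the image is filtered with a Laplacian of Gaussian, and a pixel is marked when its value is positive and at least one 8-neighbour is non-positive (the simple thresholding scheme described above, with its known bias toward one side of the edge).

```python
import numpy as np
from scipy.ndimage import gaussian_laplace, minimum_filter

def zero_crossings(image, sigma=3.0):
    """Mark zero crossings of the Laplacian-of-Gaussian filtered image.

    Simple scheme: a pixel is a zero-crossing point if its LoG value is positive
    and any of its 8 neighbours has a non-positive LoG value.
    """
    log = gaussian_laplace(image.astype(float), sigma=sigma)
    neighbourhood_min = minimum_filter(log, size=3)   # smallest value in each 3x3 neighbourhood
    return (log > 0) & (neighbourhood_min <= 0)
```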
Line Detection
While edges (i.e. boundaries between regions with relatively distinct greylevels) are by
far the most common type of discontinuity in an image, instances of thin lines in an image
occur frequently enough that it is useful to have a separate mechanism for detecting them.
Here we present a convolution based technique, which produces a gradient image
description of the thin lines in an input image. Note that the Hough transform can be used
to detect lines; however, in that case, the output is a parametric description of the lines in
an image.
How It Works
The line detection operator consists of a convolution mask tuned to detect the presence of
lines of a particular width n, at a particular orientation θ. Figure 1 shows a collection of
four such masks, which each respond to lines of single pixel width at the particular
orientation shown.
Figure 1 Four line detection masks which respond maximally to horizontal, vertical, and
oblique (+45 and -45 degree) single pixel wide lines.
If Ri denotes the response of mask i, we can apply each of these masks across an image and, for any particular point, if |Ri| > |Rj| for all j ≠ i, that point is more likely to contain a line whose orientation (and width) corresponds to that of mask i. One usually thresholds
Ri to eliminate weak lines corresponding to edges and other features with intensity
gradients which have a different scale than the desired line width. In order to find
complete lines, one must join together line fragments, e.g., with an edge tracking
operator.
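The sketch below applies four single-pixel line masks (horizontal, vertical, and the two diagonals) and keeps the strongest response at each pixel, followed by a threshold. These coefficient values are the standard textbook masks and are assumed rather than taken from Figure 1 above.

```python
import numpy as np
from scipy.ndimage import convolve

# Standard 3x3 line-detection masks for single-pixel-wide lines (assumed values).
LINE_MASKS = {
    "horizontal": np.array([[-1, -1, -1], [ 2,  2,  2], [-1, -1, -1]], dtype=float),
    "vertical":   np.array([[-1,  2, -1], [-1,  2, -1], [-1,  2, -1]], dtype=float),
    "+45":        np.array([[-1, -1,  2], [-1,  2, -1], [ 2, -1, -1]], dtype=float),
    "-45":        np.array([[ 2, -1, -1], [-1,  2, -1], [-1, -1,  2]], dtype=float),
}

def detect_lines(image, threshold):
    """Return the per-pixel best line response (after thresholding) and the winning orientation."""
    img = image.astype(float)
    responses = np.stack([convolve(img, m) for m in LINE_MASKS.values()])
    best = np.abs(responses).max(axis=0)
    orientation = np.abs(responses).argmax(axis=0)   # index into the LINE_MASKS order
    lines = best * (best > threshold)                # suppress weak responses
    return lines, orientation
```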
“Thinning” plays an important role in digital image processing and pattern recognition,
since the outcome of thinning can largely determine the effectiveness and efficiency of
extracting the distinctive features from the images. In image processing and pattern
recognition problems, a digitized binary pattern is normally defined by a matrix, where
each element, called a pixel, is either 1 (foreground pixel) or 0 (background pixel). Thinning is a process that deletes redundant foreground pixels and transforms the pattern into a “thin” line drawing. The resulting thin image is called the skeleton of the original image.
The thinned image must preserve the basic structure and the connectedness of the original
image.
The objective of thinning is to reduce the amount of information in an image pattern to the minimum needed for recognition. A thinned image aids the extraction of important features such as end points, junction points, and connections from image patterns. Many thinning algorithms have therefore been proposed.
Two major approaches of thinning digital patterns can be categorized into iterative
boundary removal algorithms and distance transformation algorithms [38]. Iterative
boundary removal algorithms delete pixels on the boundary of a pattern repeatedly until
only unit pixel-width thinned image remains. Distance transformation algorithms are not
appropriate for general applications since they are not robust, especially for patterns with
highly variable stroke directions and thickness.
Thinning based on iterative boundary removal can be divided into sequential and parallel
algorithms. In a sequential/serial method, the value of a pixel at the nth iteration depends
on a set of pixels for some of which the result of nth iteration is already known. In
parallel processing, the value of a pixel at the nth iteration depends on the values of the
pixel and its neighbors at the (n - 1)th iteration. Thus, all the pixels of the digital pattern
can be thinned simultaneously.
There are two main steps in this thinning algorithm, repeated until the obtained image approaches the medial axis of the original image. In the first step the contour of the image is computed and marked for deletion (this is a serial step), and in the second step the marked contour is deleted (this is a parallel step). The contour of an image is formed by the on-pixels that lie in the outermost positions of the pattern.
The main characteristics of the TA algorithm are: i) it maintains connectivity and preserves the end points; ii) the resulting skeleton approaches the medial axis of the original image; iii) it is practically immune to noise; and iv) its execution time is very fast [39].
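For experimentation, an off-the-shelf iterative thinning routine can stand in for the TA algorithm; the sketch below uses scikit-image, which provides both morphological thinning and skeletonization of a binary edge map. This is a generic iterative boundary-removal thinner, not the specific TA algorithm of [39].

```python
import numpy as np
from skimage.morphology import thin, skeletonize

def thin_edge_map(edge_map):
    """Reduce a binary edge map to unit-pixel-width lines.

    `thin` iteratively removes boundary pixels while preserving connectivity;
    `skeletonize` is a comparable alternative. Neither is the TA algorithm itself.
    """
    edges = np.asarray(edge_map, dtype=bool)
    return thin(edges)

# Example: a 3-pixel-thick vertical bar thins to a single-pixel line.
bar = np.zeros((20, 20), dtype=bool)
bar[2:18, 8:11] = True
print(thin_edge_map(bar).sum(), "pixels remain after thinning")
```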
Strip algorithms for curve fitting have received much attention recently because of their speed advantage. As shown in fig. 1, a strip is defined by one
critical and two boundary lines. A critical line is defined by two reference points, the first
and the second data points (i.e. points O and a in fig. 1) of a curve. Then two boundary
lines which are parallel to the critical line and at a distance d from it are defined. The
distance d is commonly called the error tolerance. These two boundary lines form a strip
to restrict the line fitting process. The curve is then traversed point by point. The process
stops and a line segment is generated when the first point which is outside the strip is
found (e.g. point e in fig. 1). A line segment is then defined by the points O and c. Point c
is used again as the starting point for the next strip fitting mechanism.
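A basic (non-dynamic) strip fit can be sketched as follows: starting from a reference point, the critical line is defined by the first two points, and the curve is traversed until a point lies farther than the tolerance d from that line; the last in-strip point closes the segment and starts the next one. Strip rotation, as introduced by the DSA below, is not shown.

```python
import numpy as np

def point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (ax, ay), (bx, by), (px, py) = a, b, p
    dx, dy = bx - ax, by - ay
    return abs(dx * (py - ay) - dy * (px - ax)) / np.hypot(dx, dy)

def strip_fit(points, tolerance):
    """Approximate a curve (list of 2-D points) by line segments using fixed strips."""
    segments, start = [], 0
    while start < len(points) - 1:
        a, b = points[start], points[start + 1]     # critical line through the first two points
        end = start + 1
        for i in range(start + 2, len(points)):
            if point_line_distance(points[i], a, b) > tolerance:
                break                               # first point outside the strip
            end = i
        segments.append((points[start], points[end]))
        start = end                                 # last in-strip point starts the next strip
    return segments
```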
One major problem with the strip algorithm is that if the second reference point is positioned in such a way that the third point on the curve is outside the strip, the resulting line segment will be very short, which is often undesirable. An example is shown as strip 1 in fig. 2. It can be seen in the same figure that strip 2 is a more desirable strip because it contains more data points.
From the above simple observation, Leung and Yang [40] proposed a Dynamic Strip
algorithm (DSA) which rotates the strip using the starting point as a pivot. The basic idea
is to rotate the strip to enclose as many data points as possible. An example to illustrate
the advantage of the Dynamic Strip algorithm can be seen in fig.3 where case (a)
illustrates the best possible strip without rotation while case (b) illustrates the best
possible strip when rotation is allowed. Orientation of the strip is the only parameter to
vary in the Dynamic Strip Algorithm.
Fig. 1: A strip defined by the critical line through points O and a, with two boundary lines parallel to it at distance d; points b and c fall inside the strip, while point e is the first point outside it.
Fig. 2: Strip 1 is cut short because the third data point falls outside it, whereas strip 2 encloses more data points.
The Dynamic Two-Strip algorithm has two stages. In the first stage, a generator called the
Left-Right Strip Generator (LRSG) is employed to find the best fitted LHS and RHS
strips at each data point. In our convention, a point that is traversed before (after) the data
point P is said to be on the RHS (LHS) of P. The computed strips are used to compute the
figure of merit of the data point. In the second stage, a local maximum detection process
is applied to pick out desirable feature points, i.e. points of high curvature. The
approximated curve is one with the feature points connected by straight lines.
LRSG is an extension of the Dynamic Strip algorithm. The strip is allowed to adjust its
orientation as well as the width dynamically. To simplify our discussion, we assume our
data points are labeled from 0, 1, …, to N-1 and are traversed in either clockwise or
counter-clockwise fashion. Let L_i^Left (L_i^Right) and W_i^Left (W_i^Right) be the length and width of the fitted LHS (RHS) strip at the i-th data point. Initially, a strip with the minimum width (i.e. W_i = w_min) is used in each direction. When no more data points can be included into the strip, the ratio

E_i^Left = L_i^Left / W_i^Left   and   E_i^Right = L_i^Right / W_i^Right

is computed for each side. To see why the strip width must be bounded from below, consider a strip of width W_i with W_i < 1/L_max and another of width W_i' with 1 ≤ W_i' < ∞. In this case we will have E_i > L_i · L_max. Since L_i is bounded from below by 1, E_i would be greater than L_max. On the other hand, E_i' can be at most equal to L_i' (with W_i' = 1); since L_i' is bounded from above by L_max, E_i would be larger than E_i'. Therefore, when widths W_i < 1/L_max are allowed, no strip of width ≥ 1 will ever be chosen and little or no data reduction (noise filtering) is done. In practice, data reduction or noise filtering is desirable.
The result of the above operation is a collection of the longest possible LHS (RHS) strips of different widths at each data point. On each side of the data point, only the strip with the largest E_i is selected.
The LRSG simulates the side detection mechanism. The curvature at a point can then be
determined by the angle subtended by the best fitted left and right strips. In order to
determine if the i-th data point P_i is a feature point, we define a figure of merit (f_i) that
measures the worthiness of P_i to be included in the approximation. f_i is defined as:

f_i = E_i^Left · S_i^θ · E_i^Right

where θ is the angle subtended by the best fitted left and right strips and S_i^θ is the angle
acuteness measure at point i:

S_i^θ = | 180° − θ |,   0° ≤ θ ≤ 360°.

According to this computation, sharper angles will give a larger value of S_i^θ. It can be
seen that a sharp angle subtended by long strips will result in a large f_i whereas a blunt
angle subtended by short strips will result in a small f_i. The above discussion can be
summarized by the following three steps.
(1) Determine E_i^Left and E_i^Right for all i.
(2) Determine the angle θ subtended by the left and right strips and also the value of S_i^θ.
(3) Determine f_i.
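As a compact illustration of steps (1)–(3), the following sketch computes S_i^θ and f_i once the best left and right strips at a point are known; it assumes the strip lengths, widths and the subtended angle have already been obtained (e.g. by the LRSG), and the names used here are illustrative rather than the system's own.

using System;

// Illustrative figure-of-merit computation for one data point.
class MeritSketch
{
    // Angle acuteness measure: angles far from 180 degrees (i.e. sharper corners) score higher.
    static double AngleAcuteness(double thetaDegrees)
    {
        return Math.Abs(180.0 - thetaDegrees);   // valid for 0 <= theta <= 360
    }

    // f_i = E_left * S_theta * E_right, where E = strip length / strip width.
    static double FigureOfMerit(double leftLength, double leftWidth,
                                double rightLength, double rightWidth,
                                double thetaDegrees)
    {
        double eLeft = leftLength / leftWidth;
        double eRight = rightLength / rightWidth;
        return eLeft * AngleAcuteness(thetaDegrees) * eRight;
    }
}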
The local maximum detection process consists of three stages. First, non-local-maximum
points (i.e. points with small f_i compared with their neighbors) are eliminated
temporarily. The second step is to check whether over-elimination has occurred;
consequently, some temporarily eliminated points are added back to the result. The final
step is to fit narrow strips to the remaining points to eliminate points that align
approximately on a straight line. Details of the above steps are described in the following.
Non-local-maximum elimination process: basically, this is a process that allows each data
point P_i with high f_i to eliminate other points that are in the left and right domains of P_i.
A domain is defined by the area or length covered by the best fitted strip of a point. To
simplify the discussion, the left and right domains of P_i are denoted by D_i^Left and
D_i^Right respectively. A point Q being in, say, the left domain of P_i is written as

Q ∈ D_i^Left

An ideal case is shown in fig. 4, where points A and B are local maxima since all the other
points between A and B (e.g. C) have strips subtending an angle of approximately 180°
(fig. 3(a)) or strips of wider widths together with wider angles (fig. 3(b)). In these cases,
points between A and B (e.g. point C) are eliminated.
In the algorithm, a point P_j is eliminated if one of two conditions, (i) and (ii), is satisfied
(see figs. 5(b) and 5(c)). In practice the containment requirement is relaxed: the condition
D_j^Left ⊆ D_{j−m}^Left is said to hold if half of the left domain of P_j is covered by
D_{j−m}^Left. The same can be applied to the right domain.
Another problem that can arise can be understood by considering fig. 4. In fig. 4, if the
lines AB and FG are long enough, the curve BCDF is comparatively insignificant and can
be ignored. On the other hand, if either AB or FG is short, the curve BCDF may be of
significance. The classification can be illustrated by considering the angle at point B. At
point B, the best fitted right strip would be from point B to A. If the line FG is long, the
best fitted left strip of B would be from point B to G. On the other hand, if the line FG is
short, the best fitted left strip may be from B to C, since a narrower strip, which can give a
larger value of E_i^Left, can be used. In the first case the angle subtended by the left and
right strips of point B is obtuse, while in the second case it is acute; points whose
subtended angle is obtuse will be eliminated.
Fig. 3: Strips and domains of a point C lying between two local maximum points B and A:
(a) strips subtending an angle of approximately 180°; (b) strips of wider width and wider angle.
For example (see fig. 4), if the best fitted left strip of point A is from A to G, the process
will examine the points (e.g. B, C, D and F) in between before eliminating any of them. If
the lines AB and FG are long enough, all the points in between will have obtuse angles
and are eliminated. Otherwise, those which have acute angles will be retained. For
example, if point B has an acute angle, only points between A and B will be eliminated by
A. Consequently, the left domain of A is reduced to be from A to B only.
Fig. 4: A curve through points G, F, D, C, B and A, with long segments FG and AB and a
small curve BCDF between them.
In either case, additional feature points are sought and the points involved are reexamined
iteratively (or recursively) until all neighboring points are bridged together. The
additional feature points are sought at the end of the shortened domains by selecting
immediate local maximum points in the neighborhood. For example, in fig. 5(b) at the
end of the shortened right domain of B, the process looks for the first local maximum
starting from point C to B. For the shortened left domain of A, the process starts from
point D to A.
In short, the bridging process checks for the termination condition (i.e. all neighboring
points are bridged together) in each iteration. If the condition is satisfied, the process
terminates. Otherwise, additional feature points are sought and the iteration continues.
Strip fitting process: this is a data reduction process that fits narrow strips to the remaining
points. The reason behind this process is that some consecutive feature points may align
approximately on a straight line, and it is desirable to eliminate the points in between. For
example, if points A, B, C and D are chosen as the feature points after the first two
processes, as shown in fig. 5(b), it is desirable to eliminate points C and D and let the
more prominent points A and B represent the curve ADCB. In practice, the process first
locates the most outstanding points, the local maximum points (e.g. A) among the
remaining points, as starting points. Then two narrow strips of fixed width (one half of
the minimum width) are fitted to the LHS and RHS of the data point, eliminating any
points within the strips that have smaller values of merit (e.g. C and D). The fitting stops
whenever the last point that can be fitted within the strip is found or a point with a larger
value of merit is met. In either case, the last point examined is not eliminated.
Fig. 5(a): An ideal relationship between two local maximum points (A and B) and their domains.
Fig. 5(b): An example of bridges broken with condition (i).
Fig. 5(c): An example of a bridge broken with condition (ii).
2.3.5 Hausdorff Distance Algorithm
The use of the Hausdorff distance for binary image comparison and computer vision was
originally proposed by Huttenlocher and colleagues [41]. In their paper the authors argue
that the method is more tolerant to perturbations in the locations of points than binary
correlation techniques since it measures proximity rather than exact superposition. Unlike
most shape comparison methods, the Hausdorff distance can be calculated without the
explicit pairing of points in their respective data sets, A and B. Furthermore, there is a
natural allowance to compare partial images and the method lends itself to simple and fast
implementation. Formally, given two finite point sets A ={a1, …, ap}, and B = {b1, …,
bq}, the Hausdorff distance is defined as
H(A, B) = max( h(A, B), h(B, A) ),
where the directed distance is
h(A, B) = max_{a ∈ A} min_{b ∈ B} || a − b ||.
In the formulation above, || · || is some underlying norm over the point sets A and B. In the
following discussion, we assume that the distance between any two data points is the
Euclidean distance. h(A, B) can be trivially computed in time O(pq) for point sets of size
p and q, respectively, and this can be improved to O((p + q)log(p + q)). The function
h(A, B) is called the directed Hausdorff distance from set A to B. It identifies the point
a ∈ A that is farthest from any point of B and measures the distance from a to its nearest
neighbor in B. In other words, h(A, B) in effect ranks each point of A based on its distance
to the nearest point in B and then uses the largest ranked such point as the measure of
distance (the most mismatched point of A). Intuitively, if h (A, B) = d, then
measure of distance (the most mismatched point of A). Intuitively, if h (A, B) = d, then
each point of A must be within distance d of some point of B, and there also is some point
of A that is exactly distance d from the nearest point of B.
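The directed distance described above can be computed directly with two nested loops (the brute-force O(pq) form mentioned earlier, not the faster variant). A minimal sketch, assuming points are stored as System.Drawing.PointF and using the Euclidean norm; the names are illustrative.

using System;
using System.Drawing;

class HausdorffSketch
{
    // h(A, B): for every a in A find the distance to its nearest point in B,
    // then return the largest of these nearest-neighbour distances.
    static double DirectedDistance(PointF[] A, PointF[] B)
    {
        double worst = 0.0;
        foreach (PointF a in A)
        {
            double nearest = double.MaxValue;
            foreach (PointF b in B)
            {
                double dx = a.X - b.X, dy = a.Y - b.Y;
                nearest = Math.Min(nearest, Math.Sqrt(dx * dx + dy * dy));
            }
            worst = Math.Max(worst, nearest);
        }
        return worst;
    }

    // H(A, B) = max( h(A, B), h(B, A) ).
    static double Hausdorff(PointF[] A, PointF[] B)
    {
        return Math.Max(DirectedDistance(A, B), DirectedDistance(B, A));
    }
}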
Realizing that there could be many different ways to define the directed (h(A, B), h(B, A))
and undirected (H(A, B)) distances between two point sets A and B, Dubuisson and Jain
revised the original definition of h(A, B) and proposed an improved measure, called the
modified Hausdorff distance (MHD), which is less sensitive to noise. Specifically, in their
formulation
h(A, B) = (1 / N_a) Σ_{a ∈ A} min_{b ∈ B} || a − b ||
where N_a = p, the number of points in set A. In their paper, the authors argue that even the
K-th ranked Hausdorff distance of Huttenlocher presents some problems for object matching
under noisy conditions, and conclude that the modified distance proposed above has the
most desirable behavior for real-world applications.
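In code the change from the directed distance to the MHD is simply the replacement of the maximum over A by an average; a sketch under the same assumptions (and the same using directives) as the one above:

using System;
using System.Drawing;

class MhdSketch
{
    // Modified Hausdorff distance: average, over all a in A, of the distance
    // from a to its nearest point in B (Na = A.Length).
    static double Mhd(PointF[] A, PointF[] B)
    {
        double sum = 0.0;
        foreach (PointF a in A)
        {
            double nearest = double.MaxValue;
            foreach (PointF b in B)
            {
                double dx = a.X - b.X, dy = a.Y - b.Y;
                nearest = Math.Min(nearest, Math.Sqrt(dx * dx + dy * dy));
            }
            sum += nearest;
        }
        return sum / A.Length;
    }
}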
In this paper, we adopt the MHD formulation of Dubuisson, and further improve its
performance by introducing the notion of a neighborhood function (N_B^a) and associated
penalties (P). Specifically, we assume that for each point in set A, the corresponding point
in B must fall within a range of a given diameter. This assumption is valid under the
conditions that (i) the input and reference images are normalized by appropriate
preprocessing algorithms, and (ii) the non-rigid transformation is small and localized. Let
N_B^a be the neighborhood of point a in set B, and let the indicator I = 1 if there exists a
point b ∈ N_B^a, and I = 0 otherwise. The complete formulation of the “doubly” modified
Hausdorff distance (M2HD) can now be written as
h(A, B) = (1 / N_a) Σ_{a ∈ A} d(a, B),   where   d(a, B) = I · min_{b ∈ N_B^a} || a − b || + (1 − I) · P.
The notion of similarity encoded by this modified Hausdorff distance is that each point of
A be near some point of B and vice versa. It requires, however, that all matching pairs fall
within a given neighborhood of each other, consistent with our initial assumption that
local image transformations may take place. If no matching pair can be found, the present
model introduces a penalty mechanism to ensure that images with large overlap are easily
distinguished as well. As a result, the proposed modified Hausdorff measure (M2HD) is
ideal for applications, such as face recognition, where although overall shape similarity is
maintained, the matching algorithm has to account for small, non-rigid local distortions.
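A sketch of the doubly modified measure, consistent with the formulation above: for each point a only the points of B within an assumed neighbourhood radius N of a are considered, and a penalty P is charged when the neighbourhood is empty. The method name and the use of a simple radius (rather than any particular neighbourhood shape) are assumptions made for illustration.

using System;
using System.Drawing;

class M2hdSketch
{
    // Doubly modified Hausdorff distance: restrict the nearest-neighbour search for each a
    // to the points of B within radius N of a; if none exists (indicator I = 0), add the penalty P.
    static double M2hd(PointF[] A, PointF[] B, double N, double P)
    {
        double sum = 0.0;
        foreach (PointF a in A)
        {
            double nearest = double.MaxValue;
            foreach (PointF b in B)
            {
                double dx = a.X - b.X, dy = a.Y - b.Y;
                double dist = Math.Sqrt(dx * dx + dy * dy);
                if (dist <= N) nearest = Math.Min(nearest, dist);   // b lies in the neighbourhood of a
            }
            sum += (nearest < double.MaxValue) ? nearest : P;       // penalty when no neighbour exists
        }
        return sum / A.Length;
    }
}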
Fig. Use Case Diagram: how the User interacts with the system.
Fig. Use Case Diagram for the system's internal process (actors: Trainer and Tester; use
cases include Face Detection, Roberts Cross Edge Detection, Convert Image to Binary
Image, Generate Line Edge Map, Save Image to Database, Find Image, and the Directed,
Modified and Doubly Modified Hausdorff Distance computations).
CHAPTER 3
DESIGN
Package: FaceRecognitionSystem
Package: FaceRecognitionSystem.MainWin
Package: FaceRecognitionSystem.GUI
Package: FaceRecognitionSystem.CreateDB
Package: FaceRecognitionSystem.StoreDB
Package: FaceRecognitionSystem.Support
Package: FaceRecognitionSystem.Binary
Package: FaceRecognitionSystem.FaceRegion
Package: FaceRecognitionSystem.EdgeDetector
Package: FaceRecognitionSystem.Thinning
Package: FaceRecognitionSystem.Dynamic2Strip
Package: FaceRecognitionSystem.HausdorffDistance
This class is used for performing binary operations on an image-matrix.
Class FaceRecognitionSystem.Thinning.HitAndMiss
Class FaceRecognitionSystem.Thinning.SerialThinning
This class will find pixels with local maximum and eliminate other pixels.
Class FaceRecognitionSystem.Dynamic2Strip.LocalMaximum
This class is a left-right strip generator based on the Dynamic Two-Strip algorithm.
Class FaceRecognitionSystem.Dynamic2Strip.LRSG
This class is used to find the Hausdorff distance of an input image to each image stored in the database.
Class FaceRecognitionSystem.HausdorffDistance.HausdorffDistance
This class is an implementation of a Doubly Modified Hausdorff Distance algorithm.
Class FaceRecognitionSystem.HausdorffDistance.M2HD
This class is an implementation of a Modified Hausdorff Distance algorithm.
Class FaceRecognitionSystem.HausdorffDistance.MHD
This class is an implementation of a Directed Hausdorff Distance algorithm.
Class FaceRecognitionSystem.HausdorffDistance.HD
3.3 Sequence Diagram
Fig. Sequence Diagram for Creating Database
Fig. Sequence Diagram for Full Training
Fig. Sequence Diagram for Full Testing
Fig. Sequence Diagram for Step By Step Testing
Class FaceRecognitionSystem.GUI.MainWin
public class MainWin : System.Windows.Forms.Form
{
    private System.Windows.Forms.MainMenu mainMenu1;
    private System.Windows.Forms.MenuItem menuFile;
    private System.Windows.Forms.MenuItem menuOpen;
    private System.Windows.Forms.MenuItem menuExit;
    private System.Windows.Forms.MenuItem menuItem1;
    private System.Windows.Forms.MenuItem menuOptions;
    private System.Windows.Forms.MenuItem menuTraining;
    private System.Windows.Forms.MenuItem menuTesting;
    private System.Windows.Forms.MenuItem menuSBSTraining;
    private System.Windows.Forms.MenuItem menuFTraining;
    private System.Windows.Forms.MenuItem menuSBSTesting;
    private System.Windows.Forms.MenuItem menuFTesting;
    private System.Windows.Forms.OpenFileDialog openFileDialog;

    public MainWin()
    {
        InitializeComponent();
    }

    protected override void Dispose( bool disposing );

    static void Main()
    {
        Application.Run(new MainWin());
    }
}
This is the class from which the Main() method is called. When the application runs, this
form is loaded first, and the other user-defined user controls are placed on it.
Class FaceRecognitionSystem.Support.HSL
public HSL();
}
This class is used as external support. It converts RGB to HSL and vice versa, and is also
used to set/modify brightness, saturation and hue.
Class FaceRecognitionSystem.Support.SuccPredec
This method returns a point, which will be a successor point of p with respect to x
from image-matrix Q.
This method returns a point, which will be a predecessor point of p with respect to
x from image-matrix Q.
Class FaceRecognitionSystem.CreateDB.CreateDB
public CreateDB();
}
This method will create a database named “FaceDB” in SQL Server and then create
a table named “FaceTab” in FaceDB, which is used for storing images into the database
for identification.
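A hedged sketch of what this step might look like with ADO.NET; the connection string, the use of integrated security and the FaceTab column layout (an identity key plus the original and processed images) are assumptions made for illustration, since the report does not list the actual columns.

using System.Data.SqlClient;

// Illustrative only: create the FaceDB database and a FaceTab table.
class CreateDbSketch
{
    public static void Create(string server)
    {
        using (var conn = new SqlConnection("Server=" + server + ";Integrated Security=true"))
        {
            conn.Open();
            // Assumes FaceDB does not already exist on the server.
            new SqlCommand("CREATE DATABASE FaceDB", conn).ExecuteNonQuery();
            conn.ChangeDatabase("FaceDB");
            new SqlCommand("CREATE TABLE FaceTab (Id INT IDENTITY PRIMARY KEY, " +
                           "OriginalImage IMAGE, ProcessedImage IMAGE)", conn).ExecuteNonQuery();
        }
    }
}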
Class FaceRecognitionSystem.StoreDB.DataMgmt
public DataMgmt();
public void insertion(Image OI, Image PI);
public void distroy();
}
public DataMgmt()
Encapsulation : public
Method type : constructor
Method name : DataMgmt
Arguments : N/A
This method will create a connection to the database and connect the dataset to the table
“FaceTab” in the database.
This method will close the connection and dispose the builder and adapter objects.
Class FaceRecognitionSystem.FaceRegion.FaceRegion
This function will find the region of the face from the image passed as an
argument and will return the portion of the image found as an image.
This function will scale image to given size of width and height and return the
scaled image.
This method will extract the region found, specified by the two arrays cols and rows, from
the image I, convert that region to an image and return it.
Interface FaceRecognitionSystem.EdgeDetector.Convolution
interface Convolution
{
    double[][] Convolve(double[][] X, int[][] Y);
}
double[][] Convolve(double[][] X,int[][] Y)
Encapsulation : public
Return type : double[][]
Method name : Convolve
Arguments :
X : 2-D array of double, which will be convolved.
Y : 2-D array of integer, by which X will be convolved.
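One possible implementation of the Convolution interface shown above; boundary positions where the kernel would fall outside the image are simply left at zero, and the kernel is applied without flipping (i.e. as a correlation), both of which are choices made only for this sketch.

// Illustrative implementation of the Convolution interface.
class SimpleConvolution : Convolution
{
    public double[][] Convolve(double[][] X, int[][] Y)
    {
        int rows = X.Length, cols = X[0].Length;
        int kRows = Y.Length, kCols = Y[0].Length;
        var result = new double[rows][];
        for (int i = 0; i < rows; i++) result[i] = new double[cols];

        // Slide the kernel Y over X and accumulate the weighted sum at each position.
        for (int i = 0; i + kRows <= rows; i++)
            for (int j = 0; j + kCols <= cols; j++)
            {
                double sum = 0.0;
                for (int u = 0; u < kRows; u++)
                    for (int v = 0; v < kCols; v++)
                        sum += X[i + u][j + v] * Y[u][v];
                result[i][j] = sum;
            }
        return result;
    }
}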
Class FaceRecognitionSystem.EdgeDetector.ImgMatrix
This method will convert image I to matrix of double values filled with the
intensity values of each pixel and return matrix.
protected abstract Image MatToImg(double[][] X)
Encapsulation : protected
Return type : Image
Method name : MatToImg
Arguments :
X : 2-D array of double, the image-matrix to convert.
This method will convert matrix of double values filled with the intensity values
of each pixel to an image and return an image.
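A sketch of the image-to-matrix conversion, assuming System.Drawing.Bitmap and the standard luminance weights for the intensity of a colour pixel; Bitmap.GetPixel is slow but keeps the example short.

using System.Drawing;

class ImgMatrixSketch
{
    // Convert an image to a matrix of intensity values in the range 0..255.
    static double[][] ImgToMat(Image I)
    {
        var bmp = new Bitmap(I);
        var m = new double[bmp.Height][];
        for (int y = 0; y < bmp.Height; y++)
        {
            m[y] = new double[bmp.Width];
            for (int x = 0; x < bmp.Width; x++)
            {
                Color c = bmp.GetPixel(x, y);
                // Standard luminance approximation of the pixel intensity.
                m[y][x] = 0.299 * c.R + 0.587 * c.G + 0.114 * c.B;
            }
        }
        return m;
    }
}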
Class FaceRecognitionSystem.EdgeDetector.EdgeMap
This method will convolve Z with respect to X and Y and store result into Gx and
Gy respectively.
This method will find magnitude from Gx and Gy and return the result in 2-D
array of double value.
This method will find Angle from Gx and Gy and return result in 2-D array of
double value.
This method will perform convolution of X with respect to Y and return result in
2-D array of double value.
This method will convert an image-matrix which contains intensity values of each
pixel to an image and return that image.
Class FaceRecognitionSystem.EdgeDetector.RobertCross
public RobertCross()
Encapsulation : public
Method type : constructor
Method name : RobertCross
Arguments : N/A
This method will process image I, extract edges from it, and return the extracted edge map
as an image.
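The Roberts cross operator itself reduces to two 2×2 kernels and the combination of the two gradients; the sketch below shows how it could be built from the Convolution interface above, with the standard Roberts masks and an illustrative helper name.

using System;

class RobertsCrossSketch
{
    // Standard Roberts cross kernels.
    static readonly int[][] Kx = { new[] { 1, 0 }, new[] { 0, -1 } };
    static readonly int[][] Ky = { new[] { 0, 1 }, new[] { -1, 0 } };

    // Edge magnitude of a grey-level image matrix, computed from Gx and Gy.
    static double[][] Magnitude(double[][] Z, Convolution conv)
    {
        double[][] gx = conv.Convolve(Z, Kx);
        double[][] gy = conv.Convolve(Z, Ky);
        var mag = new double[gx.Length][];
        for (int i = 0; i < gx.Length; i++)
        {
            mag[i] = new double[gx[i].Length];
            for (int j = 0; j < gx[i].Length; j++)
                mag[i][j] = Math.Sqrt(gx[i][j] * gx[i][j] + gy[i][j] * gy[i][j]);
        }
        return mag;
    }
}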
Class FaceRecognitionSystem.EdgeDetector.Sobel
public Sobel()
Encapsulation : public
Method type : constructor
Method name : Sobel
Arguments : N/A
This method will process image I, extract edges from it, and return the extracted edge map
as an image.
Class FaceRecognitionSystem.Binary.BinImage
This method will convert image I to binary based on some predefined threshold.
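A minimal sketch of the thresholding step, working on the intensity matrix produced by ImgToMat above; the threshold value and the convention that dark pixels become 1 are assumptions for illustration, since the report does not state them.

class BinImageSketch
{
    // Convert an intensity matrix to a binary matrix using a fixed threshold.
    static int[][] ToBinary(double[][] intensity, double threshold)
    {
        var bin = new int[intensity.Length][];
        for (int i = 0; i < intensity.Length; i++)
        {
            bin[i] = new int[intensity[i].Length];
            for (int j = 0; j < intensity[i].Length; j++)
                bin[i][j] = intensity[i][j] < threshold ? 1 : 0;   // 1 = dark (foreground) pixel
        }
        return bin;
    }
}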
Class FaceRecognitionSystem.Thinning.BinMatrix
This method will invert the image-matrix x into a 2-D array and return the result.
This method will OR two images, generate a single resulting image and return it.
This method will AND two images, generate a single resulting image and return it.
This method will convert an image-matrix, which contains intensity values of each
pixel to an image and return that image.
Class FaceRecognitionSystem.Thinning.HitAndMiss
This method will process image I with the structuring element SE, convert the resulting
image into a 2-D array of integers and return it.
Class FaceRecognitionSystem.Thinning.SerialThinning
public SerialThinning(Image I)
Encapsulation : public
Method type : constructor
Method name : SerialThinning
Arguments :
I – image to thin.
This method will perform thinning operation over image-matrix Q and return
thinned image as a result.
This method will return the total number of pixels with value 1 in the neighborhood
of P.
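A sketch of that neighbour-count helper, assuming Q is a binary image-matrix and that the 8-neighbourhood of P is meant; the class and method names are illustrative.

class NeighbourCountSketch
{
    // Count the pixels with value 1 in the 8-neighbourhood of (row, col) in Q.
    static int CountNeighbours(int[][] Q, int row, int col)
    {
        int count = 0;
        for (int dr = -1; dr <= 1; dr++)
            for (int dc = -1; dc <= 1; dc++)
            {
                if (dr == 0 && dc == 0) continue;   // skip P itself
                int r = row + dr, c = col + dc;
                if (r >= 0 && r < Q.Length && c >= 0 && c < Q[r].Length && Q[r][c] == 1)
                    count++;
            }
        return count;
    }
}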
Class FaceRecognitionSystem.Dynamic2Strip.LocalMaximum
Q1 : 2-D array of type integer, an image-matrix, used to store original input image.
Q : 2-D array of type integer, an image-matrix, used for processing.
fi : 2-D array of type float, used to store calculated value of each pixel.
This method will initialize Q, Q1 and fi and then call the other methods to process the
input image. Finally it places the processed image into the picture box whose reference is
passed as an argument.
This method will calculate local maximum of each dark pixel of the image and
store result in matrix f.
This method will eliminate all pixels which are not local maximum.
This method will find strips on both sides of a pixel p and return an array of points
containing all the points within the rectangles generated from the strips.
Class FaceRecognitionSystem.Dynamic2Strip.LRSG
public LRSG()
Encapsulation : public
Method type : constructor
Method name : LRSG
Arguments : N/A
This method will find strips and store their end points into global variables
Lpt1,Lpt2,Lpt3,Lpt4 and Rpt1,Rpt2,Rpt3,Rpt4 as per direction passed.
This method will find the slope of a line passing through p and z and return the slope found.
This method will check if point t lies between two lines passing through pt1 and pt2
with slope m1, and return true if the point is between the two lines, else false.
This method will check if points p and t are on the same side of a line passing through x
with slope m.
This method will find two points, one on each side of p, which will lie on the two strips of p.
This method will find the intersection of two lines passing through p and pt with slopes m1
and m2 respectively.
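Two of the geometric helpers described above, sketched under the assumption that points are integer pixel coordinates; vertical lines (infinite slope) and parallel lines are not handled here, and the class name is illustrative.

using System.Drawing;

class LrsgGeometrySketch
{
    // Slope of the line through p and z (assumes p.X != z.X).
    static double Slope(Point p, Point z)
    {
        return (double)(z.Y - p.Y) / (z.X - p.X);
    }

    // True when p and t lie on the same side of the line through x with slope m.
    static bool SameSide(Point p, Point t, Point x, double m)
    {
        double sp = p.Y - x.Y - m * (p.X - x.X);
        double st = t.Y - x.Y - m * (t.X - x.X);
        return sp * st >= 0;
    }

    // Intersection of the line through p with slope m1 and the line through pt with slope m2
    // (assumes m1 != m2).
    static PointF Intersection(Point p, double m1, Point pt, double m2)
    {
        // y = m1 (x - p.X) + p.Y  and  y = m2 (x - pt.X) + pt.Y
        double x = (m1 * p.X - m2 * pt.X + pt.Y - p.Y) / (m1 - m2);
        double y = m1 * (x - p.X) + p.Y;
        return new PointF((float)x, (float)y);
    }
}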
Class FaceRecognitionSystem.HausdorffDistance.HausdorffDistance
This method will initialize the server address, passed through svr, that will be used in
processing.
This method will find the distance between the image passed as an argument and the images
in the database, and return the requested number of best-matching images in an array.
public Image[] HausdorffDist(Image I1, int P1, int N1, int no)
Encapsulation : public
Return type : Image[]
Method name : HausdorffDist
Arguments :
I1 : Image, passed for searching.
P1 : integer, penalty value passed for calculation.
N1 : integer, radius of neighborhood.
no : integer, number of best match images to return.
This method will find the distance between the image passed as an argument and the images
in the database, and return the requested number of best-matching images in an array.
This method will sort the dist array and return the sorted array of indices.
Class FaceRecognitionSystem.HausdorffDistance.HD
public class HD
{
    public float h(int[][] A, int[][] B)
    public float max(float[] a)
    public float[] min(int[][] A, int[][] B)
    public float minimum(Point p, int[][] B)
}
This method will return the maximum value from float array a.
public float[] min(int[][] A,int[][] B)
Encapsulation : public
Return type : float[]
Method name : min
Arguments :
A : image-matrix, which is to be searched.
B : image-matrix, retrieved from the database.
This method will find, for each point in A, the minimum distance to the points in B, and
return the calculated distances as an array of float values.
This method will find the minimum distance from p to the points in B and return the
calculated distance as a float value.
Class FaceRecognitionSystem.HausdorffDistance.MHD
This method will return the average value of all elements of array m.
Class FaceRecognitionSystem.HausdorffDistance.M2HD
P : integer, penalty
N : integer, radius of neighborhood.
This method will return the values calculated for all points in A with respect to the points
in B, based on the penalty.
This method will find the distance of point p to each point in B and return the calculated
distance based on the penalty.
This method will find the minimum distance from p to the points in B and return the
calculated distance as a float value.
CONCLUSION
REFERENCES
[1] Yongsheng Gao and Maylor K.H. Leung, “Face Recognition Using Line Edge Map”
IEEE Transaction on Pattern Analysis and Machine Intelligence, Vol. 24, No. 6, June
2002.
[2] Surendra Gupta and Krupesh Parmar, “A Combined approach of Serial and Parallel
Thinning Algorithm for Binary Face Image,” Computing-2005, Division IV, CSI
Conference, May 2005.
[3] Y. Gao, “Efficiently comparing face images using a modified Hausdorff distance,”
IEE Proc.-Vis. Image Signal Process., Vol. 150, No. 6, December 2003.
[4] M.K.H. Leung and Y.H. Yang, “Dynamic Two-Strip Algorithm in Curve Fitting,”
Pattern Recognition, vol. 23, pp. 69-79, 1990.
[7] Daniel L. Swets and John (Juyang) Weng, “Using Discriminant Eigenfeatures for Image
Retrieval,” IEEE Trans. Pattern Anal. Machine Intell., vol. 18, August 1996.
[8] C. Kotropoulos and I. Pitas, “Rule-Based Face Detection in Frontal Views,” Proc.
IEEE Int'l Conf. Acoustics, Speech, and Signal Processing (ICASSP-97), vol. 4, pp.
2537-2540, Apr. 1997.
[11]M. Kirby and L. Sirovich, “Application of the Karhunen-Loève Procedure for the
Characterisation of Human Faces,” IEEE Trans. Pattern Analysis and Machine
Intelligence, vol. 12, pp. 831-835, Dec. 1990.
[13]M.A. Grudin, “A Compact Multi-Level Model for the Recognition of Facial Images,”
PhD thesis, Liverpool John Moores Univ., 1997.
[14]L. Zhao and Y.H. Yang, “Theoretical Analysis of Illumination in PCA-Based Vision
Systems,” Pattern Recognition, vol. 32, pp. 547-564, 1999.
[15]A. Pentland, B. Moghaddam, and T. Starner, “View-Based and Modular Eigenspaces
for Face Recognition,” Proc. IEEE CS Conf. Computer Vision and Pattern
Recognition, pp. 84-91, 1994.
[16]T.J. Stonham, “Practical Face Recognition and Verification with WISARD,” Aspects
of Face Processing, pp. 426-441, 1984.
[17]K.K. Sung and T. Poggio, “Learning Human Face Detection in Cluttered Scenes,”
Computer Analysis of Image and Patterns, pp. 432-439, 1995.
[18]S. Lawrence, C.L. Giles, A.C. Tsoi, and A.D. Back, “Face Recognition: A
Convolutional Neural-Network Approach,” IEEE Trans. Neural Networks, vol. 8, pp.
98-113, 1997.
[19]J. Weng, J.S. Huang, and N. Ahuja, “Learning Recognition and Segmentation of 3D
objects from 2D images,” Proc. IEEE Int'l Conf. Computer Vision, pp. 121-128, 1993.
[20]S.H. Lin, S.Y. Kung, and L.J. Lin, “Face Recognition/Detection by Probabilistic
Decision-Based Neural Network,” IEEE Trans. Neural Networks, vol. 8, pp. 114-132,
1997.
[21]S.Y. Kung and J.S. Taur, “Decision-Based Neural Networks with Signal/Image
Classification Applications,” IEEE Trans. Neural Networks, vol. 6, pp. 170-181,
1995.
[22]F. Samaria and F. Fallside, “Face Identification and Feature Extraction Using Hidden
Markov Models,” Image Processing: Theory and Application, G. Vernazza, ed.,
Elsevier, 1993.
[23]F. Samaria and A.C. Harter, “Parameterisation of a Stochastic Model for Human Face
Identification,” Proc. Second IEEE Workshop Applications of Computer Vision, 1994.
[24]S. Tamura, H. Kawa, and H. Mitsumoto, “Male/Female Identification from 8 × 6 Very
Low Resolution Face Images by Neural Network,” Pattern Recognition, vol. 29, pp.
331-335, 1996.
[25]Y. Kaya and K. Kobayashi, “A Basic Study on Human Face Recognition,” Frontiers
of Pattern Recognition, S. Watanabe, ed., p. 265, 1972.
[27]A.J. Goldstein, L.D. Harmon, and A.B. Lesk, “Identification of Human Faces,” Proc.
IEEE, vol. 59, p. 748, 1971.
[28]R. Brunelli and T. Poggio, “Face Recognition: Features versus Templates,” IEEE
Trans. Pattern Analysis and Machine Intelligence, vol. 15, pp. 1042-1052, 1993.
[29]I.J. Cox, J. Ghosn, and P.N. Yianilos, “Feature-Based Face Recognition Using
Mixture-Distance,” Computer Vision and Pattern Recognition, 1996.
[30]B.S. Manjunath, R. Chellappa, and C. von der Malsburg, “A Feature Based Approach
to Face Recognition,” Proc. IEEE CS Conf. Computer Vision and Pattern
Recognition, pp. 373-378, 1992.
[31]B. Takács, “Comparing Face Images Using the Modified Hausdorff Distance,”
Pattern Recognition, vol. 31, pp. 1873-1881, 1998.
[33]P.J.M. van Laarhoven and E.H.L. Aarts, Simulated Annealing: Theory and
Applications. Kluwer Academic Publishers, 1987.
[34]R.H.J.M. Otten and L.P.P.P. van Ginneken, The Annealing Algorithm. Kluwer
Academic Publishers, 1989.
[35]Olivier de Vel and Stefan Aeberhard, “Line-Based Face Recognition under Varying
Pose,” IEEE Transaction on Pattern Analysis and Machine Intelligence, Vol. 21, No.
10, October 1999.
[36]H.I. Kim, S.H. Lee and N.I. Cho, “Rotation-invariant face detection using angular
projections,” Electronics Letters, vol. 40, no. 12, June 2004.
[37]Frank Y. Shih and Wai-Tak Wong, “A New Safe-Point Thinning Algorithm Based on
the Mid-Crack Code Tracing”, IEEE Transactions on Systems, Man, and Cybernetics,
vol. 25, no. 2, pp. 370-377, Feb. 1995.
[38]N. H. Han, C. W. La, and P. K. Rhee, “An Efficient Fully Parallel Thinning
Algorithm”, IEEE,1997.
[40]Leung M.K., Yang Y. “A region based approach for human body motion analysis”,
Pattern Recognition 20:321-339; 1987.