Face Recognition
week #
Outline
1. Introduction
2. History of Face Recognition
3. Technical Foundation
4. Challenges in Face Recognition
5. Popular Algorithms
6. Hands-on/Demonstration
Applications
Face Recognition
Face recognition is a technology that
identifies or verifies a person from
an image or video using facial
features.
How Does It Work?
History of Face Recognition
History: 1960s (Early Beginning)
Woodrow Wilson Bledsoe
developed a system that used a
RAND tablet
to manually record the
coordinates of facial landmarks
such as the eyes, nose, and mouth
Source: [Link]
History: 1970s (First Automated Systems)
Researchers developed algorithms to
automatically detect facial features
Techniques such as template matching were used
A system using 21 subjective markers (e.g., hair
color, lip thickness) to identify faces was
introduced.
Source: [Link]
History: 1980s (Eigenfaces and Machine Learning)
A significant breakthrough came with the
low-dimensional PCA representation of faces by
Sirovich and Kirby (1987), later developed into
the "eigenfaces" method by Turk and Pentland (1991).
This approach used principal component
analysis (PCA) to represent faces as a
combination of eigenvectors, reducing the
complexity of face images into a "face space."
Source: [Link]
History: 1990s-2000s (Robustness in the Real World)
Fisherfaces (Belhumeur et al., 1997) used LDA
to enhance discrimination between individuals
while reducing sensitivity to lighting changes.
The use of 3D sensors to capture facial depth
improved recognition under varying poses and
lighting, though it required specialized
equipment.
Local Binary Patterns (LBP), introduced for face
recognition by Ahonen et al. in 2004,
analyzed local texture patterns, making
systems more resilient to illumination
changes.
Source:
[Link]
[Link]
History: 2010s (Deep Learning Revolution)
DeepFace (2014) used a CNN to extract
hierarchical features from face images, achieving
97.35% accuracy on the Labeled Faces in the Wild
(LFW) dataset
FaceNet (2015) used a CNN to create
128-dimensional face embeddings, trained with
triplet loss to ensure embeddings of the same
person were similar. It achieved 99.63% accuracy
on LFW and 95.12% on the YouTube Faces Database.
Source:
[Link]
[Link]
Current State & Challenges
Systems now handle occlusions (e.g., masks) and
improve fairness by addressing biases in training
data.
Anti-spoofing techniques detect fake inputs like
photos or masks.
Technical Foundations
Main Approach
Face recognition systems use different strategies to “see” and
identify faces. We’ll focus on two key methods: the feature-based
approach and the appearance-based approach. Each has its own
way of tackling the problem, with unique strengths and challenges.
Main Approaches: Feature-based
What It Is: This method marks specific parts of the face, called
facial landmarks—like the eyes, nose, mouth, and chin. Think of
these as key dots on a face map.
How It Works: The system measures things like the distance
between the eyes, the width of the nose, or the angle of the jaw.
These measurements form a unique “template” for each face, kind
of like a fingerprint for your facial features.
Analogy: Imagine recognizing a friend by their standout features—
like their big eyes or sharp cheekbones. You’re focusing on
specific details, not the whole face.
Visual Idea: A face with dots marking the eyes, nose, and mouth,
connected by lines to show distances—almost like a connect-the-
dots drawing.
Source: [Link]
Main Approaches: Feature-based
Strengths: It’s simple and doesn’t need a lot of computing power,
so it’s fast on basic systems.
Weaknesses: It can get tripped up by changes in lighting, head
position (e.g., tilting), or facial expressions, since these shift the
landmarks.
Source: [Link]
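The distance-based "template" idea on this slide can be sketched in a few lines. This is a minimal illustration with hard-coded, hypothetical landmark coordinates; a real system would get them from a landmark detector:

```python
import numpy as np

# Hypothetical landmark coordinates (x, y) in pixels -- in a real system
# these would come from a landmark detector, not be hard-coded.
landmarks = {
    "left_eye":  np.array([120.0, 150.0]),
    "right_eye": np.array([200.0, 150.0]),
    "nose_tip":  np.array([160.0, 210.0]),
    "mouth":     np.array([160.0, 260.0]),
}

def template(points):
    """Build a simple geometric template: all pairwise distances,
    normalised by the inter-eye distance so the template is scale-invariant."""
    names = sorted(points)
    eye_dist = np.linalg.norm(points["left_eye"] - points["right_eye"])
    feats = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            d = np.linalg.norm(points[names[i]] - points[names[j]])
            feats.append(d / eye_dist)
    return np.array(feats)

t = template(landmarks)
print(t.shape)  # 4 landmarks give 6 pairwise distances
```

Two faces are then compared by measuring how close their templates are, which is why shifted landmarks (lighting, pose, expression) hurt this approach.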
Main Approaches: Appearance-based
What It Is: This method looks at the whole face as one big pattern,
instead of breaking it into parts. It’s about the overall “look” of
the face.
How It Works: It uses math tricks (like something called principal
component analysis, or PCA) to analyze the face image and pull
out the most important patterns. These patterns summarize how
the face varies from others.
Analogy: Think of recognizing a friend from far away by their
overall vibe or silhouette, even if you can’t see the details clearly.
Visual Idea: A face image being “broken down” into simpler
pattern pieces (sometimes called eigenfaces), like a puzzle being
simplified.
Source: [Link]
Main Approaches: Appearance-based
Strengths: It handles changes in lighting, pose, or expression
better because it’s not tied to specific points—it sees the big
picture.
Weaknesses: It needs more computing power and lots of data to
work well, so it’s a bit heavier to run.
Source: [Link]
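A minimal sketch of the PCA idea behind eigenfaces, using random vectors as stand-ins for aligned, flattened face images:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in data: 100 "face images" of 32x32 pixels, flattened to vectors.
# In practice these would be aligned grayscale face crops.
faces = rng.normal(size=(100, 32 * 32))

# PCA via SVD: centre the data, then take the top-k right singular vectors.
mean_face = faces.mean(axis=0)
centered = faces - mean_face
U, S, Vt = np.linalg.svd(centered, full_matrices=False)

k = 20                             # keep 20 principal components
eigenfaces = Vt[:k]                # each row is one "eigenface" (length 1024)
weights = centered @ eigenfaces.T  # each face becomes 20 coefficients

print(eigenfaces.shape, weights.shape)
```

Each face is now described by 20 numbers instead of 1024 pixels, which is the "face space" compression the history slide mentioned.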
Machine Learning Approach
Face recognition isn’t just about analyzing faces—it’s about
teaching computers to learn from examples, using a field called
machine learning. We’ll cover two types: supervised learning and
unsupervised learning.
Machine Learning: Supervised
What It Is: The computer is trained with a dataset of face images where each one is
labeled—like “This is Alex” or “This is Priya.”
How It Works: The system studies these labeled examples to figure out what features
or patterns match each person. After training, it can look at a new face and guess who
it is.
Example: Imagine feeding the system thousands of tagged photos. It learns what
makes Alex’s face different from Priya’s—like eye shape or jawline.
Analogy: It’s like teaching a kid to name family members by showing them labeled
photos over and over until they get it.
Alex
Priya
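The labelled-training idea can be illustrated with a toy nearest-centroid classifier; the feature vectors and the names are made up for the sketch:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy labelled "face feature vectors" -- stand-ins for real extracted features.
train = {
    "Alex":  rng.normal(loc=0.0, scale=0.1, size=(10, 8)),
    "Priya": rng.normal(loc=1.0, scale=0.1, size=(10, 8)),
}

# Supervised step: learn one centroid per labelled identity.
centroids = {name: feats.mean(axis=0) for name, feats in train.items()}

def predict(x):
    """Label a new feature vector with the nearest identity centroid."""
    return min(centroids, key=lambda name: np.linalg.norm(x - centroids[name]))

new_face = rng.normal(loc=1.0, scale=0.1, size=8)  # resembles the "Priya" data
print(predict(new_face))
```

Real systems use far richer models than a centroid, but the shape of the problem is the same: labelled examples in, a name for a new face out.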
Machine Learning: Unsupervised
What It Is: This is less common but still useful. The system doesn’t get labels—it just
groups similar faces together based on what it sees.
How It Works: It uses clustering techniques to spot patterns and lump similar faces
into groups. It’s like sorting without knowing who’s who.
Example: Picture a stack of random face photos. The system sorts them into piles
where each pile has faces that look alike, even without names.
Analogy: It’s like sorting a deck of cards by color or shape without knowing the rules—
just grouping what looks similar.
Source: [Link]
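A minimal k-means sketch of this label-free grouping, on synthetic stand-in embeddings with two hidden identities:

```python
import numpy as np

rng = np.random.default_rng(2)

# Unlabelled "face embeddings": two hidden identities, but no names attached.
faces = np.vstack([
    rng.normal(loc=0.0, scale=0.1, size=(15, 8)),
    rng.normal(loc=2.0, scale=0.1, size=(15, 8)),
])

def kmeans(X, k, iters=20, seed=0):
    """Minimal k-means: assign each point to its nearest centre, then move
    each centre to the mean of its assigned points, and repeat."""
    r = np.random.default_rng(seed)
    centers = X[r.choice(len(X), k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None] - centers[None], axis=2)
        labels = dists.argmin(axis=1)
        centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
    return labels

labels = kmeans(faces, k=2)
print(labels)
```

The system ends up with two piles of look-alike faces without ever being told who is in them.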
Challenges in Face Recognition
Lighting Variations
What It Is: The way a face looks changes depending on the light. Bright sunlight might
cast harsh shadows, while a dimly lit room might hide details.
Why It’s a Challenge: A system might not recognize the same person in two photos if
one is taken in sunlight and the other in a dark room. This can cause false negatives
(not identifying a face) or false positives (mistaking one face for another).
Example: Imagine trying to recognize a friend in a shadowy corner versus under a
bright spotlight—it’s tough for humans, and even tougher for machines!
Source: [Link]
Pose Variations
What It Is: The angle of a face in an image—like a front view versus a side profile—
affects its appearance.
Why It’s a Challenge: Many systems are trained on straight-on face images, so they
struggle with side views or tilted angles. This is a big issue for security cameras, which
often capture faces from odd perspectives.
Example: If you only know someone’s face from the front, spotting them from the side
can be tricky. Computers face the same problem.
Source: [Link]
Expression Variations
What It Is: Smiling, frowning, or even blinking changes how a face looks by shifting
features like the mouth or eyes.
Why It’s a Challenge: A system might not match a smiling photo with a neutral one of
the same person because the features don’t line up perfectly.
Example: Think about how different someone looks when they’re laughing versus
when they’re stone-faced—it’s a real puzzle for recognition tech.
Source: [Link]
Occlusion
What It Is: Parts of the face can be hidden by things like glasses, hats, masks, or facial
hair.
Why It’s a Challenge: When key features (like eyes or the nose) are covered, the system
has less to work with, reducing its ability to identify someone.
Example: Picture trying to recognize someone wearing a scarf over half their face—
you’d struggle, and so would a computer.
Source: [Link]
Aging
What It Is: Faces change as people age—skin wrinkles, hair thins or grays, and features
shift over time.
Why It’s a Challenge: A system might not connect a recent photo with one taken years
earlier, which is a problem for things like long-term identity checks or finding missing
people.
Example: Recognizing someone from a decades-old photo is hard for you, and it’s just
as hard for a machine.
Source: [Link]
FaceNet by Google
FaceNet (A Unified Embedding for Face
Recognition and Clustering) by Google
FaceNet is a deep learning algorithm developed by Google to recognize faces. It takes a
face image as input and generates a 128-dimensional vector representation, known as an
embedding, that captures the unique features of that face. This embedding can then be
used for various face recognition tasks.
Source: [Link]
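Once faces are embeddings, recognition reduces to comparing distances. A sketch with random stand-in vectors; the 1.0 threshold is illustrative, real systems tune it on validation data:

```python
import numpy as np

rng = np.random.default_rng(3)

def l2_normalize(v):
    return v / np.linalg.norm(v)

# Stand-in 128-d embeddings -- in the real pipeline these come from the
# FaceNet network, one per detected face crop.
person_a_img1 = l2_normalize(rng.normal(size=128))
person_a_img2 = l2_normalize(person_a_img1 + 0.05 * rng.normal(size=128))
person_b      = l2_normalize(rng.normal(size=128))

def same_person(e1, e2, threshold=1.0):
    """FaceNet-style decision: same identity if the L2 distance between
    unit-norm embeddings falls below a tuned threshold (1.0 is illustrative)."""
    return np.linalg.norm(e1 - e2) < threshold

same = same_person(person_a_img1, person_a_img2)  # small distance
diff = same_person(person_a_img1, person_b)       # large distance
print(same, diff)
```

Verification ("is this Alex?") compares two embeddings; identification ("who is this?") compares one embedding against a whole database.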
What is Triplet Loss?
Common classification loss (softmax): the dataset pairs each image with a class name (Image A with Classname A, Image B with Classname B), and the loss is calculated by how good the predicted class value is.
[Diagram: labelled face images fed into a softmax classifier that predicts class names.]
Source: [Link]
What is Triplet Loss?
Triplet loss: the dataset is formed as triplets (Image A, Image A, Image B), that is, two images of the same person plus one image of a different person, and each image's embedding is mapped into an n-dimensional space.
[Diagram: triplets of face images mapped into embedding space.]
Source: [Link]
What is Triplet Loss?
To keep it simple, assume the
embedding has only 2 dimensions
Condition before training
Source: [Link]
What is Triplet Loss?
To keep it simple, assume the
embedding has only 2 dimensions
Condition after training
Source: [Link]
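The quantity these 2-D pictures illustrate can be written out numerically. A plain-Python sketch of the per-triplet loss (the margin value is illustrative):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """L = max(||a - p||^2 - ||a - n||^2 + margin, 0): pull the positive
    closer to the anchor than the negative, by at least the margin."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(d_pos - d_neg + margin, 0.0)

# 2-D toy embeddings, matching the slides' simplification.
anchor   = np.array([0.0, 0.0])
positive = np.array([0.1, 0.0])   # same person, nearby
negative = np.array([1.0, 1.0])   # different person, far away

ok_loss  = triplet_loss(anchor, positive, negative)  # constraint satisfied
bad_loss = triplet_loss(anchor, negative, positive)  # constraint violated
print(ok_loss, bad_loss)
```

Training with this loss is what moves same-person points together and different-person points apart between the "before" and "after" pictures.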
Build Our Face Recognition
Install Libraries
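The hands-on builds on the PyTorch ecosystem. A plausible install list, assuming the facenet-pytorch implementation of MTCNN and InceptionResnetV1 is used (package names are assumptions; adjust to your environment):

```shell
# Assumed stack for this hands-on (pin versions as needed):
pip install torch torchvision   # PyTorch and image transforms
pip install facenet-pytorch     # MTCNN detector + InceptionResnetV1 (FaceNet)
pip install opencv-python       # webcam capture for the realtime demo
```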
Preparing Dataset
Use the face-detection knowledge from
previous courses (with a pretrained
model or one trained from scratch) to
capture faces automatically.
Preparing Dataset
Here you should prepare your face
images by capturing them automatically.
Example Public Dataset
[Link]
Fine Tuning FaceNet
Importing Libraries
Fine Tuning FaceNet
Create Dataset Class #1
Fine Tuning FaceNet
Create Dataset Class #2
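The original slides showed code screenshots. As a stand-in, here is a minimal Dataset that yields (anchor, positive, negative) triplets from in-memory tensors; a real version would index image paths per identity folder and load/transform them in `__getitem__`:

```python
import random
import torch
from torch.utils.data import Dataset

class TripletFaceDataset(Dataset):
    """Yields (anchor, positive, negative) triplets for triplet-loss training.

    `data` maps identity -> tensor of face crops [N, C, H, W] already in
    memory; this is a simplification for the sketch."""

    def __init__(self, data):
        self.data = data
        self.identities = list(data)

    def __len__(self):
        return sum(len(t) for t in self.data.values())

    def __getitem__(self, idx):
        anchor_id = random.choice(self.identities)
        negative_id = random.choice(
            [i for i in self.identities if i != anchor_id])
        faces = self.data[anchor_id]
        a = faces[random.randrange(len(faces))]
        p = faces[random.randrange(len(faces))]
        others = self.data[negative_id]
        n = others[random.randrange(len(others))]
        return a, p, n

# Dummy data: 2 identities, five 3x160x160 "face crops" each.
data = {name: torch.randn(5, 3, 160, 160) for name in ["alex", "priya"]}
ds = TripletFaceDataset(data)
a, p, n = ds[0]
print(len(ds), tuple(a.shape))
```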
Fine Tuning FaceNet
Create Custom Triplet Loss
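A sketch of a custom triplet-loss module of the kind this slide describes, using squared distances (note that PyTorch's built-in `nn.TripletMarginLoss` uses non-squared distances by default):

```python
import torch
import torch.nn as nn

class TripletLoss(nn.Module):
    """max(||a-p||^2 - ||a-n||^2 + margin, 0), averaged over the batch."""

    def __init__(self, margin=0.2):
        super().__init__()
        self.margin = margin

    def forward(self, anchor, positive, negative):
        d_pos = (anchor - positive).pow(2).sum(dim=1)
        d_neg = (anchor - negative).pow(2).sum(dim=1)
        return torch.clamp(d_pos - d_neg + self.margin, min=0.0).mean()

criterion = TripletLoss(margin=0.2)
a = torch.zeros(4, 128)
p = torch.zeros(4, 128)
n = torch.ones(4, 128)
loss = criterion(a, p, n)  # positives already closer, so loss clamps to 0
print(loss.item())
```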
Fine Tuning FaceNet
Determine used device and
preprocessing object using transform
Fine Tuning FaceNet
Define the dataset object and the dataloader
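A sketch of the device/preprocessing/dataloader setup, assuming facenet-pytorch-style inputs (160x160 crops and its "fixed image standardization"); dummy tensors stand in for the triplet dataset:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Device: use the GPU when available, otherwise fall back to CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def preprocess(img):
    """Map a uint8 HxWxC image tensor to the float CxHxW range that
    facenet-pytorch's 'fixed image standardization' produces."""
    img = img.permute(2, 0, 1).float()
    return (img - 127.5) / 128.0

# Dummy images standing in for real face crops (160x160 is the input size
# used by facenet-pytorch's InceptionResnetV1).
raw = torch.randint(0, 256, (8, 160, 160, 3), dtype=torch.uint8)
batch = torch.stack([preprocess(x) for x in raw])

# In the real pipeline this wraps the triplet dataset from the previous
# slides; here the same tensor plays anchor, positive, and negative.
loader = DataLoader(TensorDataset(batch, batch, batch),
                    batch_size=4, shuffle=True)
a, p, n = next(iter(loader))
print(device, tuple(a.shape))
```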
Fine Tuning FaceNet
Define the model (e.g., InceptionResnetV1) and
freeze some weights
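The freezing pattern can be shown on a small stand-in model; the slide's real model would be `InceptionResnetV1(pretrained='vggface2')` from facenet-pytorch, but the pattern is identical:

```python
import torch.nn as nn

# Stand-in for the pretrained FaceNet backbone.
model = nn.Sequential(
    nn.Conv2d(3, 8, 3), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(8, 128),          # final embedding layer
)

# Freeze everything...
for param in model.parameters():
    param.requires_grad = False

# ...then unfreeze only the last (embedding) layer for fine-tuning.
for param in model[-1].parameters():
    param.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(trainable, total)
```

Freezing the early layers keeps the pretrained general-purpose features and only adapts the embedding to the new faces, which also makes training much cheaper.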
Fine Tuning FaceNet
Define the criterion/tripletloss object,
optimizer, number of epochs, and ensure the
model is in training mode
Fine Tuning FaceNet
Do the training loop
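A self-contained toy version of the loop, covering the criterion, optimizer, epoch count, and training mode from the previous slide; the tiny network and random triplets are stand-ins for the fine-tuned backbone and the real DataLoader:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Tiny embedding net standing in for the (partially frozen) FaceNet model.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 8))
criterion = nn.TripletMarginLoss(margin=0.2)
optimizer = torch.optim.Adam(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3)
num_epochs = 5

model.train()  # ensure dropout/batch-norm layers are in training mode
for epoch in range(num_epochs):
    running_loss = 0.0
    for _ in range(10):  # stand-in for iterating the triplet DataLoader
        a = torch.randn(4, 16)
        p = a + 0.1 * torch.randn(4, 16)   # same "person": nearby inputs
        n = torch.randn(4, 16)             # different "person"
        optimizer.zero_grad()
        loss = criterion(model(a), model(p), model(n))
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    print(f"epoch {epoch + 1}: loss {running_loss / 10:.4f}")
```

The real loop is the same shape: fetch a triplet batch, forward all three through the model, compute the triplet loss, backpropagate, and step the optimizer.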
Fine Tuning FaceNet
The training process
Test The Finetuned Model
Load the known faces
Test The Finetuned Model
Match embedding to known face embedding
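A sketch of these two steps, loading a known-face database and matching by nearest embedding; the names, random stand-in embeddings, and the threshold are all illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)

def unit(v):
    return v / np.linalg.norm(v)

# "Known faces" database: name -> embedding. In the real pipeline each
# embedding comes from running the fine-tuned model on a reference photo.
known = {name: unit(rng.normal(size=128)) for name in ["alex", "priya", "sam"]}

def identify(embedding, known, threshold=1.0):
    """Return the closest known identity, or 'unknown' if every known
    embedding is farther than the (illustrative) threshold."""
    name, dist = min(
        ((n, np.linalg.norm(embedding - e)) for n, e in known.items()),
        key=lambda t: t[1])
    return name if dist < threshold else "unknown"

probe = unit(known["priya"] + 0.05 * rng.normal(size=128))  # noisy re-capture
match = identify(probe, known)
stranger = identify(unit(rng.normal(size=128)), known)
print(match, stranger)
```

The "unknown" branch matters in practice: without a threshold, every stranger would be forced onto the nearest enrolled name.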
Test The Finetuned Model
Determine the used device, create face
detection model using MTCNN, and load the
fine-tuned FaceNet model
Test The Finetuned Model
Initiating video input
Test The Finetuned Model
Do detection and
face recognition
process in realtime
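The last four slides can be sketched as one realtime loop, assuming facenet-pytorch (MTCNN for detection, InceptionResnetV1 for embeddings), OpenCV for video capture, and a hypothetical checkpoint file name. This defines a function only; call `main()` on a machine with a webcam:

```python
def main():
    """Realtime loop: webcam frame -> MTCNN face detection -> FaceNet
    embedding -> comparison against the known-face embeddings (the matching
    step from the earlier slides). 'facenet_finetuned.pth' is an assumed
    file name for the fine-tuned weights."""
    import cv2
    import torch
    from facenet_pytorch import MTCNN, InceptionResnetV1

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    mtcnn = MTCNN(image_size=160, device=device)
    model = InceptionResnetV1(pretrained="vggface2").to(device)
    model.load_state_dict(
        torch.load("facenet_finetuned.pth", map_location=device))
    model.eval()

    cap = cv2.VideoCapture(0)  # initiate video input (default webcam)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        face = mtcnn(rgb)  # aligned 3x160x160 crop, or None if no face
        if face is not None:
            with torch.no_grad():
                emb = model(face.unsqueeze(0).to(device))[0].cpu().numpy()
            # ...compare emb against the known-face embeddings here...
        cv2.imshow("face recognition", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):  # press q to quit
            break
    cap.release()
    cv2.destroyAllWindows()
```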
Got a question?