
Dataset for Face Recognition

Last Updated : 24 Apr, 2025

Face recognition is the task of identifying or verifying a person from their facial features. It is used in security systems, on social media and for unlocking phones. To build and test these systems, researchers and developers need good-quality datasets for training and evaluation. In this article, we discuss some of the best-known datasets for face recognition.


1. Labeled Faces in the Wild (LFW)

  • This dataset contains more than 13,000 face images collected from the web. Each image is labeled with the name of the person shown, and the dataset is widely used to evaluate face verification and recognition algorithms.
  • It was created by the University of Massachusetts, Amherst and is a standard benchmark for measuring how accurately a model can tell whether two face images show the same person.
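LFW is mostly used for face verification: given a pair of images, decide whether they show the same person. The sketch below shows how verification accuracy can be scored once a model has turned each image into an embedding vector. The embeddings, the `pairs` list and the fixed 0.5 threshold are hypothetical toy values, not LFW data; in the standard LFW protocol the threshold is chosen on held-out folds (10-fold cross-validation) rather than fixed by hand.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def verification_accuracy(pairs, threshold):
    """pairs: list of (embedding_1, embedding_2, is_same_person).
    A pair is predicted 'same person' when its similarity >= threshold."""
    correct = 0
    for emb1, emb2, is_same in pairs:
        predicted_same = cosine_similarity(emb1, emb2) >= threshold
        if predicted_same == is_same:
            correct += 1
    return correct / len(pairs)

# Hypothetical toy embeddings (real models often output 128 or 512 dimensions).
pairs = [
    ([1.0, 0.0, 0.0], [0.9, 0.1, 0.0], True),   # same person, similar vectors
    ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], False),  # different people
    ([0.0, 1.0, 0.0], [0.1, 0.9, 0.2], True),
    ([0.0, 0.0, 1.0], [0.7, 0.7, 0.0], False),
]
print(verification_accuracy(pairs, threshold=0.5))  # → 1.0
```

Raising the threshold to 0.999 makes the two genuine pairs fail, so accuracy drops to 0.5; this is why the threshold has to be tuned on separate data.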

2. YouTube Faces DB

  • As the name suggests, this dataset is built from YouTube videos. It contains 3,425 videos of 1,595 people. Because these are videos, each person's face appears in many different ways: moving, turning, smiling or under different lighting.
  • This makes it useful for testing how well a face recognition system handles faces that are not still, as in real life. The dataset was created by researchers at Tel Aviv University and the Open University of Israel.
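Because YouTube Faces provides many frames per video, one simple baseline (an assumption here, not something the dataset prescribes) is to average the per-frame embeddings of a video into a single template and compare templates with cosine similarity. A minimal sketch with hypothetical 3-dimensional frame embeddings; `video_template` and `template_similarity` are illustrative names, not part of any library.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def video_template(frame_embeddings):
    """Average the L2-normalized per-frame embeddings into one unit vector."""
    normed = [l2_normalize(f) for f in frame_embeddings]
    dim = len(normed[0])
    mean = [sum(f[i] for f in normed) / len(normed) for i in range(dim)]
    return l2_normalize(mean)

def template_similarity(video_a, video_b):
    """Cosine similarity of two video templates (both are unit vectors)."""
    ta, tb = video_template(video_a), video_template(video_b)
    return sum(x * y for x, y in zip(ta, tb))

# Hypothetical frame embeddings: videos 1 and 2 show the same person
# in different poses, video 3 shows someone else.
video_1 = [[1.0, 0.1, 0.0], [0.9, 0.3, 0.1], [1.0, 0.0, 0.2]]
video_2 = [[0.95, 0.2, 0.1], [0.8, 0.1, 0.0]]
video_3 = [[0.0, 1.0, 0.1], [0.1, 0.9, 0.0]]

print(round(template_similarity(video_1, video_2), 3))
print(round(template_similarity(video_1, video_3), 3))
```

Averaging smooths out individual frames with unusual pose or lighting, at the cost of discarding per-frame detail; more elaborate set-to-set matching methods exist.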

3. CelebA

  • This dataset contains more than 200,000 images of about 10,000 celebrities. What makes it special is that each photo comes with 40 binary attribute labels such as “Eyeglasses”, “Smiling” and “No_Beard”.
  • This makes it useful not only for training face recognition systems but also for predicting facial attributes. Each image is also annotated with five facial landmark points, so it is used for face detection and landmark localization as well.
  • This dataset was developed by the Multimedia Laboratory at the Chinese University of Hong Kong.
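CelebA's attribute labels come as a plain-text annotation file (`list_attr_celeba.txt` in the official release) in which each attribute is stored as 1 (present) or -1 (absent). The sketch below parses a tiny hand-written excerpt, assuming that layout: first line the image count, second line the attribute names, then one row per image. The excerpt uses only three of the 40 attributes with made-up values, so check the layout against your own copy of the file.

```python
# Hypothetical excerpt in the assumed list_attr_celeba.txt layout
# (the real file has 40 attribute columns and 202,599 rows).
SAMPLE = """3
Eyeglasses No_Beard Smiling
000001.jpg -1  1  1
000002.jpg  1 -1 -1
000003.jpg -1  1 -1
"""

def parse_celeba_attributes(text):
    """Return (attribute_names, {filename: [0/1 labels]})."""
    lines = text.strip().splitlines()
    num_images = int(lines[0])
    attr_names = lines[1].split()
    labels = {}
    for row in lines[2:2 + num_images]:
        parts = row.split()  # split() also handles the double spaces
        filename, values = parts[0], parts[1:]
        # Map -1/1 to 0/1, the usual target format for multi-label training
        labels[filename] = [1 if int(v) == 1 else 0 for v in values]
    return attr_names, labels

names, labels = parse_celeba_attributes(SAMPLE)
print(names)                 # → ['Eyeglasses', 'No_Beard', 'Smiling']
print(labels["000002.jpg"])  # → [1, 0, 0]
```

Each label list can then serve as the target vector for a classifier with one sigmoid output per attribute.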

4. CASIA WebFace

  • The CASIA WebFace dataset contains approximately 500,000 images of 10,575 individuals sourced from the web.
  • It is primarily used for face recognition research and has been instrumental in advancing deep learning techniques in this area.
  • It was created by the Institute of Automation, Chinese Academy of Sciences, and is a common starting point for training a deep face recognition model.

5. FERET Database

  • FERET stands for Facial Recognition Technology. The dataset has been around since the 1990s and includes over 14,000 images of 1,199 individuals.
  • These images were taken under controlled conditions, which makes them well suited for comparing the baseline performance of different face recognition systems.
  • The program was sponsored by the U.S. Department of Defense, and the database is distributed by the National Institute of Standards and Technology (NIST).

6. PubFig

  • The Public Figures Face Database (PubFig) contains 58,797 images of 200 public figures. The images were collected from the web and cover a wide variety of poses, lighting conditions and expressions.
  • PubFig is used to evaluate face recognition systems, particularly in unconstrained settings. It was created by researchers at Columbia University.

7. MS-Celeb-1M

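  • The MS-Celeb-1M dataset is a large-scale face recognition dataset with about 10 million images of roughly 100,000 celebrities; the “1M” refers to the one million celebrities in the benchmark's full target list. Created by Microsoft Research, it provided a massive resource for training and evaluating face recognition models.
  • It was designed to address the challenge of recognizing faces at very large scale and under varied conditions. Microsoft withdrew the dataset from public distribution in 2019.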

8. VGGFace2

  • The VGGFace2 dataset consists of 3.31 million images of 9,131 individuals. It includes a wide range of poses, ages and lighting conditions, which makes it suitable for training robust face recognition models.
  • It is intended for building face recognition models that are more flexible and reliable in real-world conditions. This dataset was created by the Visual Geometry Group at the University of Oxford.

9. MegaFace

  • The MegaFace dataset is designed to evaluate the performance of face recognition algorithms at a large scale. It includes over 1 million images of 690,000 unique individuals.
  • The primary goal of MegaFace is to test face recognition systems under large-scale and real-world conditions. This dataset was developed by the University of Washington.
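MegaFace's large-scale test mixes a probe's true match into a gallery padded with up to a million distractor faces and checks whether the correct match still ranks first. A minimal sketch of rank-1 identification scoring under that idea; the 2-D embeddings, identity names and tiny gallery are hypothetical toy values, and `rank1_accuracy` is an illustrative helper, not MegaFace's official evaluation code.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def rank1_accuracy(probes, gallery):
    """probes: list of (true_identity, embedding).
    gallery: list of (identity, embedding), including distractors.
    A probe is correct when its most similar gallery entry has its identity."""
    hits = 0
    for true_id, probe_emb in probes:
        best_id, _ = max(gallery, key=lambda item: cosine(probe_emb, item[1]))
        if best_id == true_id:
            hits += 1
    return hits / len(probes)

# Hypothetical gallery: one enrolled image per identity plus two distractors.
gallery = [
    ("alice", [1.0, 0.1]),
    ("bob", [0.1, 1.0]),
    ("distractor_1", [0.9, 0.5]),
    ("distractor_2", [0.5, 0.9]),
]
probes = [("alice", [0.95, 0.15]), ("bob", [0.6, 0.8])]

print(rank1_accuracy(probes, gallery))      # → 0.5
print(rank1_accuracy(probes, gallery[:2]))  # → 1.0 (no distractors)
```

The bob probe is matched correctly against a clean gallery but loses to a distractor once one is added, which is exactly the effect MegaFace measures at the scale of a million faces.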

10. UMDFaces

  • The UMDFaces dataset contains over 367,000 face annotations for 8,277 subjects collected from the web.
  • This dataset includes both still images and video frames and was developed by the University of Maryland.

Training effective face recognition systems requires a well-chosen, large, diverse and ethically obtained dataset, and the datasets discussed above are common choices depending on the task.

