Image Net

ImageNet is a large-scale image database containing over 3.2 million labeled images organized according to WordNet hierarchy. It aims to have 50 million labeled images across many categories. ImageNet provides a valuable resource for computer vision research due to its large scale, hierarchical structure based on WordNet, high precision of image labels, and diversity of images. Researchers are working to complete the construction of ImageNet and further develop applications that can leverage its large collection of labeled images.

ImageNet

A Large-Scale Hierarchical Image Database

Prepared by: Medjili Mohamed Naime & Bounab Abdelmounaam


1. Introduction:
The data explosion of the digital era motivates ImageNet, a large-scale image
ontology of 3.2 million images. It is built on the WordNet hierarchy and
constructed with Amazon Mechanical Turk, and it offers a vital resource for
advanced image applications.
2. Properties of ImageNet:
ImageNet is structured hierarchically following WordNet and aims to provide
50 million labeled high-resolution images. The current version focuses on 12
subtrees, notably the mammal and vehicle subtrees.
- Scale: ImageNet contains 3.2 million annotated images across 5247
categories, making it the largest clean image dataset in vision research.
- Hierarchy: ImageNet organizes its synsets in a densely populated semantic
hierarchy, as WordNet does, linking them through relations such as "IS-A".
The resulting density is unmatched: for example, its 147 dog categories are
not found in other vision datasets.
- Accuracy: ImageNet aims for high precision at every level of the WordNet
hierarchy and achieves 99.7% on average, although finer categories deeper in
the hierarchy are harder to distinguish.
- Diversity: Diversity is quantified through the JPG file size of each
synset's average image. A more diverse synset yields a blurrier average
image, which compresses to a smaller file, as demonstrated in comparisons
with Caltech101.
- TinyImage: This dataset holds 80 million low-resolution images of 32x32
pixels. ImageNet, by contrast, offers high-quality synsets (approx. 99%
precision) and full-resolution images of about 400x350 pixels, which makes it
more suitable for developing and evaluating robust algorithms.
- ESP Dataset: Collected through an online game, its labels are biased toward
the "basic level" of the semantic hierarchy, it leaves word senses ambiguous,
and its public availability is limited. ImageNet avoids these issues with a
more balanced distribution across the hierarchy, and it is larger and publicly
accessible.
- LabelMe and Lotus Hill datasets: Both provide detailed object outlines and
so complement ImageNet, but ImageNet has a broader scope, with more categories
and more images, collected from the entire Internet. The Lotus Hill dataset is
available only through purchase.
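The diversity measure from the list above (average the images of a synset, then look at the compressed file size) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes small grayscale images stored as nested lists, uses zlib as a stand-in for lossless JPG compression, and uses random textures as toy data. The names average_image, compressed_size and random_texture are hypothetical.

```python
import random
import zlib

def average_image(images):
    """Pixel-wise mean of equally sized grayscale images (lists of rows)."""
    height, width = len(images[0]), len(images[0][0])
    count = len(images)
    return [
        [round(sum(img[y][x] for img in images) / count) for x in range(width)]
        for y in range(height)
    ]

def compressed_size(image):
    """Size in bytes after lossless compression (zlib stands in for JPG here)."""
    raw = bytes(pixel for row in image for pixel in row)
    return len(zlib.compress(raw, 9))

rng = random.Random(0)  # fixed seed so the toy data is reproducible

def random_texture(size=32):
    """Toy image: a size x size grid of random gray levels (assumption)."""
    return [[rng.randrange(256) for _ in range(size)] for _ in range(size)]

# Low diversity: every image in the synset shows the same detailed texture.
texture = random_texture()
low_diversity = [texture] * 50

# High diversity: every image is different, so the average washes out to gray.
high_diversity = [random_texture() for _ in range(50)]

low_size = compressed_size(average_image(low_diversity))
high_size = compressed_size(average_image(high_diversity))
print("low-diversity synset:", low_size, "bytes")
print("high-diversity synset:", high_size, "bytes")
```

On this toy data the diverse synset's average image is close to uniform gray and compresses to fewer bytes than the average of the uniform synset, which is the effect the measure relies on. Real photographs behave differently from random textures, so the numbers are only illustrative.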
3. Constructing ImageNet:
ImageNet is an ambitious project. Our goal is to complete the construction
of around 50 million images in the next two years. We describe here the method
we use to construct ImageNet, shedding light on how the properties of Sec. 2
can be ensured in this process.
3.1. Collecting Candidate Images:
Internet image search is only about 10% accurate, yet ImageNet aims for
500-1000 clean images per synset, so a much larger pool of candidates must be
collected first. Using WordNet synonyms and translations of the queries into
other languages, ImageNet compiles a diverse pool of over 10,000 candidate
images per synset, laying a strong foundation for computer vision research.
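The arithmetic behind these numbers can be checked with a short sketch. It assumes the 10% search accuracy applies uniformly to every candidate, which is a simplification; the function names are hypothetical.

```python
def candidates_needed(target_clean, accuracy_percent):
    """Candidates to collect so the expected clean count reaches the target."""
    # Ceiling division on integers avoids floating-point rounding surprises.
    return -(-target_clean * 100 // accuracy_percent)

def expected_clean(num_candidates, accuracy_percent):
    """Expected number of clean images among the candidates."""
    return num_candidates * accuracy_percent // 100

for target in (500, 1000):
    print(target, "clean images need about", candidates_needed(target, 10), "candidates")

print("10,000 candidates yield about", expected_clean(10_000, 10), "clean images")
```

Under this assumption, a pool of 10,000 candidates is just enough to reach the upper end of the 500-1000 target.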
3.2. Cleaning Candidate Images:
Human evaluators on Amazon Mechanical Turk verify each candidate image to
ensure the accuracy of the dataset.
Users confirm whether the object of the synset is present in a candidate image
and, to preserve diversity, accept images regardless of occlusions and scene
complexity.
Because users make mistakes and do not always agree, multiple users label each
image independently, and an image counts as positive only with a convincing
majority. An algorithm adjusts the required level of consensus to the semantic
difficulty of each synset. This successfully filters the candidate images and
leaves a high percentage of clean images per synset.
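The voting rule described above can be illustrated with a simplified sketch. It is not the algorithm used for ImageNet: it assumes a fixed agreement threshold per synset, and the threshold values, the minimum vote count and the names below are all hypothetical.

```python
# Hypothetical thresholds: a harder, finer-grained synset demands more agreement.
REQUIRED_AGREEMENT = {
    "cat": 0.6,
    "Burmese cat": 0.8,
}

def is_positive(votes, required_fraction, min_votes=3):
    """Accept an image only if enough independent users voted 'yes'.

    votes: list of booleans, one per user (True means the synset is present).
    """
    if len(votes) < min_votes:
        return False  # not enough independent labels yet
    return sum(votes) / len(votes) >= required_fraction

votes = [True, True, False]  # two of three users say the object is present
for synset, threshold in REQUIRED_AGREEMENT.items():
    print(synset, "->", is_positive(votes, threshold))
```

The same two-out-of-three vote is accepted for the easy synset and rejected for the difficult one, which is the behavior a difficulty-dependent consensus level is meant to produce.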
4. ImageNet Applications:
In this section, we show three applications of ImageNet.
4.1. Non-parametric Object Recognition:
The objective is to determine the object class in an image by comparing it to
similar images in ImageNet.
This work proposes that using a clean set of full-resolution images and exploiting
more feature-level information can lead to more accurate object recognition.
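A minimal sketch of the idea, recognizing an object by comparing it to labeled images, is a nearest-neighbor vote. It assumes each image has already been reduced to a small feature vector; the two-dimensional features, the labels and the function name are made up for illustration and are not the features used in the original experiments.

```python
import math
from collections import Counter

def classify(query, labeled_features, k=3):
    """Label a query by majority vote among its k nearest labeled neighbors."""
    nearest = sorted(
        labeled_features,
        key=lambda item: math.dist(query, item[0]),  # Euclidean distance
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy "database": (feature vector, class label) pairs.
database = [
    ((0.0, 0.0), "dog"), ((0.5, 0.2), "dog"), ((0.1, 0.6), "dog"),
    ((5.0, 5.0), "car"), ((5.2, 4.8), "car"), ((4.9, 5.3), "car"),
]

print(classify((0.2, 0.1), database))
print(classify((5.0, 5.1), database))
```

No model is trained: the labeled images themselves act as the classifier, which is why a large, clean database matters for this kind of method.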
4.2. Tree Based Image Classification:
Compared to other available datasets, ImageNet provides image data in a densely
populated hierarchical structure. Many possible algorithms could be applied to
exploit a hierarchical data structure.
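One simple way to exploit such a hierarchy, given here only as an illustration, is to score a category by the best classifier response found anywhere in its subtree, so that an image of a husky also counts as a dog and as a mammal. The sketch assumes per-category scores already exist; the tree, the scores and the function name are hypothetical.

```python
def tree_max_score(node, scores, children):
    """Best score among a node and all of its descendants in the IS-A tree."""
    best = scores[node]
    for child in children.get(node, []):
        best = max(best, tree_max_score(child, scores, children))
    return best

# Toy IS-A hierarchy: each key lists its direct children.
children = {
    "mammal": ["dog", "cat"],
    "dog": ["husky"],
}

# Hypothetical per-category classifier responses for one image.
scores = {"mammal": 0.2, "dog": 0.4, "cat": 0.1, "husky": 0.9}

for category in ("mammal", "dog", "cat"):
    print(category, tree_max_score(category, scores, children))
```

Here the strong response for "husky" propagates up to "dog" and "mammal", while "cat" keeps its own low score.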
4.3. Automatic Object Localization:
ImageNet can be extended to provide additional information about each
image. One such piece of information is the spatial extent of the objects in
each image. Two application areas come to mind. First, training a robust object
detection algorithm often requires localized objects in different poses and
under different viewpoints. Second, having localized objects in cluttered
scenes enables users to use ImageNet as a benchmark dataset for object
localization algorithms. In this section we present results of localization on
22 categories from different depths of the WordNet hierarchy. The results also
shed light on the diversity of images in each of these categories.
5. Discussion and Future Work:
Our future work has two goals:
5.1. Completing ImageNet:
The current ImageNet covers roughly 10% of the WordNet synsets. To
further speed up the construction process, we will continue to explore more
effective methods to evaluate the AMT user labels and optimize the number of
repetitions needed to accurately verify each image.
5.2. Exploiting ImageNet:
We hope ImageNet will become a central resource for a broad range of
vision-related research.
