ImageNet is a large-scale image database containing over 3.2 million labeled images organized according to the WordNet hierarchy, with the goal of eventually reaching 50 million labeled images across many categories. It is a valuable resource for computer vision research thanks to its large scale, its hierarchical structure inherited from WordNet, the high precision of its image labels, and the diversity of its images. Researchers are working to complete the construction of ImageNet and to develop applications that can leverage its large collection of labeled images.
Prepared by: Medjili Mohamed Naime & Bounab Abdelmounaam
1. Introduction: The digital era's data explosion motivates ImageNet, a large-scale image ontology built on WordNet's hierarchy and populated with the help of Amazon Mechanical Turk. With 3.2 million images already collected, it offers a vital resource for advanced image applications.

2. Properties of ImageNet: ImageNet is structured hierarchically after WordNet and aims for roughly 50 million labeled, high-resolution images; current work focuses on 12 subtrees, notably the mammal and vehicle categories.
- Scale: With 3.2 million annotated images spread over 5,247 categories, ImageNet is the largest clean image dataset available to the vision community.
- Hierarchy: ImageNet inherits WordNet's densely populated semantic hierarchy, in which synsets are interlinked through relations such as "IS-A." The resulting density is unmatched: for example, ImageNet contains 147 dog categories, far more than any other vision dataset.
- Accuracy: ImageNet targets high labeling precision at every level of the WordNet hierarchy, achieving an average precision of 99.7%, even though finer-grained categories deep in the hierarchy are harder to distinguish.
- Diversity: Diversity is quantified through the JPG file size of each synset's average image: the more diverse a synset, the blurrier its average image and the smaller the resulting file. Comparisons with Caltech101 illustrate the measure.
- TinyImage: TinyImage offers 80 million low-resolution (32x32) images, whereas ImageNet provides high-precision synsets (approx. 99%) and full-resolution images (on average about 400x350 pixels), making ImageNet better suited for developing and evaluating robust algorithms.
- ESP Dataset: The ESP dataset, collected through an online game, has a distribution biased toward "basic-level" labels, suffers from word-sense disambiguation problems, and is only partly available to the public. ImageNet's labels are distributed more evenly across the hierarchy, its synsets resolve word senses by construction, and the dataset is both larger and publicly accessible.
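The diversity measure described above can be sketched with standard-library tools. This is a toy illustration, not the authors' code: the zlib-compressed size of the average image stands in for the JPG file size used in the paper, and the two "synsets" are synthetic grayscale images.

```python
import random
import zlib

def average_image(images):
    """Pixel-wise mean of same-sized grayscale images (flat lists of 0-255 ints)."""
    n = len(images)
    return [round(sum(px) / n) for px in zip(*images)]

def diversity_proxy(images):
    """Compressed size of the synset's average image. A diverse synset averages
    out to a blurry, low-detail image that compresses well, so a SMALLER value
    suggests HIGHER diversity. (The paper measures the average image's JPG file
    size; zlib is used here as a stdlib stand-in for an image codec.)"""
    return len(zlib.compress(bytes(average_image(images))))

random.seed(0)
pixels = 32 * 32
template = [random.randint(0, 255) for _ in range(pixels)]
non_diverse = [list(template) for _ in range(20)]           # 20 identical images
diverse = [[random.randint(0, 255) for _ in range(pixels)]  # 20 unrelated images
           for _ in range(20)]

print(diversity_proxy(diverse) < diversity_proxy(non_diverse))  # True
```

The unrelated images average out to a nearly uniform (blurry) image that compresses well, while the repeated image stays sharp and incompressible, matching the expectation that more diverse synsets yield blurrier, smaller average images.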
- LabelMe and Lotus Hill datasets: Both complement ImageNet with detailed object outlines, but ImageNet stands apart through its broader scope and far larger category and image counts, with images sourced from across the entire Internet. The Lotus Hill dataset is only available for purchase.

3. Constructing ImageNet: ImageNet is an ambitious project; the goal is to complete the construction of around 50 million images within the next two years. This section describes the construction method and shows how the properties of Sec. 2 are ensured in the process.

3.1. Collecting Candidate Images: Although Internet image search returns only about 10% accurate results, ImageNet targets 500-1000 clean images per synset. By querying with every WordNet synonym and with translations into multiple languages, it compiles a diverse pool of over 10,000 candidate images per synset, laying a strong foundation for computer vision research.

3.2. Cleaning Candidate Images: Human evaluators on Amazon Mechanical Turk verify each candidate image, confirming whether it contains an object of the target synset; occlusions and scene clutter are deliberately ignored in the labeling task so that image diversity is preserved. Because individual labels are noisy, multiple users label each image independently, and an image is accepted only with a convincing majority of positive votes. An algorithm dynamically adjusts the required consensus according to the semantic difficulty of the synset, successfully filtering the candidate images and ensuring a high percentage of clean images per synset.

4. ImageNet Applications: This section presents three applications of ImageNet.

4.1. Non-parametric Object Recognition: The objective is to determine an image's object class by comparing it against similar images in ImageNet. The results suggest that using a clean set of full-resolution images, and exploiting more feature-level information, leads to more accurate object recognition.
4.2. Tree-Based Image Classification: Compared to other available datasets, ImageNet provides image data in a densely populated hierarchical structure, and many algorithms could be designed to exploit this hierarchy.

4.3. Automatic Object Localization: ImageNet can be extended with additional information about each image, such as the spatial extent of the objects it contains. Two application areas come to mind. First, training a robust object detection algorithm often requires localized objects in different poses and under different viewpoints. Second, localized objects in cluttered scenes would let ImageNet serve as a benchmark dataset for object localization algorithms. Localization results on 22 categories from different depths of the WordNet hierarchy also shed light on the diversity of images within each category.

5. Discussion and Future Work: Future work has two goals:

5.1. Completing ImageNet: The current ImageNet covers roughly 10% of the WordNet synsets. To further speed up construction, more effective methods will be explored for evaluating AMT user labels and for optimizing the number of repetitions needed to accurately verify each image.

5.2. Exploiting ImageNet: The hope is that ImageNet will become a central resource for a broad range of vision-related research.
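As one concrete way of exploiting the hierarchy for classification (Sec. 4.2), here is a minimal sketch of a "tree-max"-style classifier: the score of each synset is the maximum classifier response anywhere in its subtree, so evidence for a specific class (e.g. "dog") also supports its ancestors. The hierarchy, synset names, and raw scores below are hypothetical toy values, not ImageNet data.

```python
# Toy "IS-A" hierarchy (hypothetical synsets).
children = {
    "animal": ["mammal", "bird"],
    "mammal": ["dog", "cat"],
    "bird": [],
    "dog": [],
    "cat": [],
}

def tree_max_scores(raw_scores, children, root):
    """Propagate per-synset classifier responses up the hierarchy:
    score(node) = max(raw score of node, tree-max score of each child)."""
    scores = {}
    def visit(node):
        s = raw_scores.get(node, float("-inf"))
        for child in children.get(node, []):
            s = max(s, visit(child))
        scores[node] = s
        return s
    visit(root)
    return scores

# Hypothetical per-synset responses for a single image.
raw = {"dog": 0.9, "cat": 0.2, "bird": 0.1, "mammal": 0.3, "animal": 0.1}
scores = tree_max_scores(raw, children, "animal")
print(scores["mammal"], scores["animal"])  # 0.9 0.9
```

Strong evidence at the leaf "dog" propagates upward, so the coarser synsets "mammal" and "animal" inherit the 0.9 score, which is one simple way a densely populated hierarchy can be exploited at classification time.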