Product Counting Using Images With Application To Robot-Based Retail Stock Assessment
Abstract—In this paper, we propose a novel method for obtaining product count directly from images recorded using a monocular camera mounted on a mobile robot. This has application in the robot-based retail stock assessment problem, where a mobile robot is used for monitoring the stock levels on the shelves of a retail store. The products are recognized by carrying out a nearest-neighbor search in the template feature space using a k-d tree. Unlike current approaches, which only provide an approximate stock level, we propose a method that can compute the exact number of discrete products visible in a given image. The product count is obtained by fitting a bounding box around each product and removing them sequentially from the image. A second stage of grid-based search is carried out in the neighborhood of each detected product to detect new products that were missed in the previous step. This detection is based on a confidence measure that includes various information such as histogram matching and spatial location. The efficacy of the proposed approach is demonstrated through experiments on different datasets obtained using a robot camera as well as a mobile phone camera. These results show that robot-based retail stock assessment may become a viable alternative to the currently prevailing manual mode of carrying out these surveys.

Index Terms—Retail Robotics, stock assessment, product counting, OOS, object recognition, service robotics

I. INTRODUCTION

In this paper, we look into the problem of carrying out stock monitoring and assessment in retail stores using mobile robots [1] [2] [3]. The robot uses on-board cameras to capture video that contains the images of the shelves on either side of the robot. These images are processed, either on-board or on a remote server, to generate statistics of the products on the shelf and detect various situations like out-of-stock (OOS), misplaced items etc. An illustration of a robot-based retail stock assessment system is shown in Figure 1. The robot may carry a pair of cameras that can move up and down on a shaft or may carry multiple cameras placed at different heights. Use of robots may not only reduce the cost of such surveys, but also increase the accuracy of the data collected by avoiding human-related factors.

The robot has to identify various products, know their location based on a given planogram and detect incidents like out-of-stock situations and misplaced items. A number of methods have been proposed to solve this problem. For instance, Zimmerman [3] decodes a product barcode from the shelf image. It then retrieves the product image from a database and segments the shelf image to match with the retrieved image. If no match is found, an out-of-stock flag is set. On the other hand, Gokturk [4] uses a camera and multiple lighting sources to compute occupancy in an enclosed compartment using triangulation methods. It also suggests using depth sensors or a stereo-vision system for occupancy measurement. There are other patents such as [5], [6] which talk about generic systems that can identify products, generate a planogram, detect out-of-stock situations and provide percentage occupancy of products.

In this paper, we look into the problem of obtaining accurate product count directly from images recorded using an on-board camera. We do not use depth sensors, a stereo-vision system or any other range measuring device like IR or laser for obtaining the product count. We are interested in counting the number of products which are visible in a given image. The method involves two steps: in the first step, the product category or label is identified, and in the second step, the product count is estimated.

The product is identified using interest point features like SURF [7]. A k-d tree is created in the feature space comprising SURF descriptors from all the product templates. For each query image, a nearest-neighbor search is carried out in the descriptor space to identify the matching product templates. We provide two methods for obtaining the product count. The first method involves computing feature repeatability for each product, that is, counting the maximum number of times a particular feature is repeated in a given image. This factor is more or less proportional to the number of products present in the image. The second method consists of obtaining the bounding box for each identified product by using homography coupled with RANSAC [8] and removing them sequentially. A second stage of search based on histogram matching is employed to detect those products which were left out in the previous step. This search is performed by creating a 3 × 3 grid around each detected product. Details are given in the later sections of this paper. This second method provides not only the product count but also the product arrangement in a given shelf.

The main contributions made in this paper are as follows: (1) We provide two novel methods for obtaining accurate product count from images. (2) We have provided performance evaluation over different test cases and carried out experiments with actual robots to demonstrate the utility of the proposed approach. This is in contrast to other works such as [1] [6], where authors have reported systems with similar capabilities but do not provide either the method description or performance evaluation. To our knowledge, such results for retail
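The first counting method outlined in the introduction (feature repeatability) can be illustrated with a minimal sketch. It is not the authors' implementation: descriptors are plain tuples rather than SURF vectors, a brute-force nearest-neighbor search stands in for the k-d tree, and the function names and the `max_dist` threshold are hypothetical choices made for this example only.

```python
import math
from collections import Counter


def nearest_template_feature(descriptor, template_descriptors, max_dist):
    """Return the index of the closest template descriptor, or None if none is within max_dist.

    Brute-force stand-in for the k-d tree search described in the text.
    """
    best_idx, best_dist = None, max_dist
    for idx, template in enumerate(template_descriptors):
        dist = math.dist(descriptor, template)  # Euclidean distance
        if dist <= best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def repeatability_count(query_descriptors, template_descriptors, max_dist=0.2):
    """Maximum number of times any single template feature is matched in the query image.

    max_dist is an assumed acceptance threshold, not a value taken from the paper.
    """
    hits = Counter()
    for descriptor in query_descriptors:
        idx = nearest_template_feature(descriptor, template_descriptors, max_dist)
        if idx is not None:
            hits[idx] += 1
    return max(hits.values(), default=0)


# Toy data: two template features; the query image shows the first one three times.
template = [(0.0, 0.0), (1.0, 1.0)]
query = [(0.0, 0.05), (0.02, 0.0), (0.0, 0.0), (1.0, 1.05), (0.95, 1.0), (5.0, 5.0)]
estimated_count = repeatability_count(query, template)
```

In this toy case the most frequently matched template feature gives the count estimate. On real images the value is only roughly proportional to the number of products, as the text notes.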
Fig. 3. Explaining the method of product counting through pictures.

IV. EXPERIMENT RESULTS

Our robotic system consists of a Turtlebot 2 robot with an on-board USB camera facing the rack on either side of the aisle. The entire software is implemented using the ROS [22] software framework. The image processing is carried out using OpenCV [23]. The images are collected at a rate of 15 frames per second. The robot moves at a speed of 0.1 m/s to avoid blurring of images. The accuracy of product recognition using SURF is 100%. In other words, we are able to identify a product if it is present on the rack and is easily identifiable under ambient illumination. The product counting is carried out using two methods: one based on descriptor repeatability and the other making use of colour along with SURF descriptors. The datasets D1 and D2 are collected using a camera mounted on a mobile robot, while the datasets D3 and D4 are recorded using a mobile phone. So, the latter videos have other effects

Fig. 4. Precision-Recall curve for product counting method I on different datasets (D1 to D4; x-axis: Recall, y-axis: Precision). This is the best performance obtained by varying the user-defined parameters in the algorithm.

V. CONCLUSION

Carrying out stock assessment using robots is still fraught with several challenges. One of the challenges is to reliably detect the stock level on the shelves. The low cost of visual sensors has encouraged people to generate several meaningful statistics by processing images. In this work, we show that it is possible to obtain a very high level of precision in obtaining product count using features like SURF and colour histogram.
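The second-stage search with colour histograms can be illustrated with a short sketch. It rests on several assumptions that are not from the paper: the image is a 2-D list of already-quantised colour bin indices, the cells of the 3 × 3 grid have the same size as the detected bounding box, histogram intersection is the only confidence term (the paper's measure also includes spatial location), and the `threshold` value and all function names are hypothetical.

```python
def histogram(image, box, bins):
    """Normalised histogram of quantised colour values inside box = (x, y, w, h)."""
    x, y, w, h = box
    counts = [0] * bins
    for row in image[y:y + h]:
        for value in row[x:x + w]:
            counts[value] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def intersection(hist_a, hist_b):
    """Histogram intersection score; 1.0 means identical normalised histograms."""
    return sum(min(a, b) for a, b in zip(hist_a, hist_b))


def grid_neighbours(box, width, height):
    """The 8 cells surrounding box in a 3 x 3 grid, keeping only cells inside the image."""
    x, y, w, h = box
    cells = []
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dx == 0 and dy == 0:
                continue  # the centre cell is the product already detected
            nx, ny = x + dx * w, y + dy * h
            if nx >= 0 and ny >= 0 and nx + w <= width and ny + h <= height:
                cells.append((nx, ny, w, h))
    return cells


def search_missed(image, detected_box, template_hist, bins, threshold=0.8):
    """Neighbouring cells whose histogram matches the template well enough."""
    height, width = len(image), len(image[0])
    return [
        cell
        for cell in grid_neighbours(detected_box, width, height)
        if intersection(histogram(image, cell, bins), template_hist) >= threshold
    ]


# Toy shelf: colour bin 1 is the product, bin 0 is empty shelf background.
shelf = [[1, 1, 1, 1, 0, 0],
         [1, 1, 1, 1, 0, 0]]
missed = search_missed(shelf, detected_box=(2, 0, 2, 2), template_hist=[0.0, 1.0], bins=2)
```

Here the product detected at columns 2 to 3 has one neighbouring cell that matches the template (columns 0 to 1) and one that is empty shelf (columns 4 to 5), so only the former is reported as a missed product.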