Object Detection using TensorFlow
Last Updated: 24 Apr, 2025
Identifying and detecting objects within images or videos is a key task in computer vision. It is critical in a variety of applications, ranging from autonomous vehicles and surveillance systems to augmented reality and medical imaging. TensorFlow, Google's open-source machine learning framework, provides a robust collection of tools for developing and deploying object detection models.
In this article, we will go over the fundamentals of using TensorFlow for object detection. TensorFlow provides a flexible and efficient framework, whether you're working on a computer vision research project or building apps that need real-time object detection. Let's walk through how to run a pre-trained object detection model with TensorFlow.
What is Object detection?
Object detection is a computer vision task that involves identifying and locating multiple objects within an image or video. The goal is not just to classify what is in the image but also to precisely outline and pinpoint where each object is located.
Key Concepts in Object Detection:
- Bounding Boxes: Object detection involves drawing bounding boxes around detected objects. A bounding box is a rectangle that encloses an object and is defined by its coordinates, typically (x_min, y_min) for the top-left corner and (x_max, y_max) for the bottom-right corner.
- Object Localization: Localization is the process of determining the object's location within the image. It involves predicting the coordinates of the bounding box that encapsulates the object.
- Class Prediction: Object detection not only locates objects but also categorizes them into different classes (e.g., person, car, dog). Each object is assigned a class label, providing information about what the object is.
- Model Architectures
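To make the bounding box, localization, and class prediction ideas above concrete, here is a minimal sketch of how a single detection could be represented. The `Detection` class and its field names are hypothetical, chosen only for illustration; they are not part of TensorFlow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One detected object: where it is, what it is, and how confident the model is.

    Hypothetical container for illustration; not a TensorFlow type.
    """
    x_min: float  # x of the top-left corner
    y_min: float  # y of the top-left corner
    x_max: float  # x of the bottom-right corner
    y_max: float  # y of the bottom-right corner
    label: str    # predicted class name
    score: float  # confidence between 0 and 1

    @property
    def width(self) -> float:
        return self.x_max - self.x_min

    @property
    def height(self) -> float:
        return self.y_max - self.y_min

    @property
    def area(self) -> float:
        return self.width * self.height


# Made-up values for a single detected object
dog = Detection(x_min=40, y_min=60, x_max=200, y_max=180, label="dog", score=0.91)
print(dog.label, dog.width, dog.height, dog.area)
```

Localization corresponds to the four coordinates, and class prediction to the `label` and `score` fields.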
Object Detection using TensorFlow
Setting Up TensorFlow
Begin by installing TensorFlow using pip:
!pip install tensorflow
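The leading ! runs the command from a notebook cell such as Google Colab; omit it when installing from a regular terminal. Ensure that you have the necessary dependencies, and if you have a compatible GPU, set up TensorFlow with GPU support for faster training and inference.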
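After installing, it can help to confirm that TensorFlow imports and whether it sees a GPU. The following is a small sketch; the helper name `tensorflow_status` is hypothetical, while `tf.__version__` and `tf.config.list_physical_devices` are standard TensorFlow calls.

```python
import importlib.util


def tensorflow_status() -> dict:
    """Report whether TensorFlow is installed and which GPUs it can see.

    Hypothetical helper for illustration.
    """
    if importlib.util.find_spec("tensorflow") is None:
        return {"installed": False, "version": None, "gpus": []}
    import tensorflow as tf  # imported only when the package is present
    gpus = tf.config.list_physical_devices("GPU")
    return {
        "installed": True,
        "version": tf.__version__,
        "gpus": [gpu.name for gpu in gpus],
    }


print(tensorflow_status())
```

An empty `gpus` list means TensorFlow will run on the CPU, which is still sufficient for the lightweight model used in this tutorial.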
Choosing a Pre-trained Model
The TensorFlow 2 Detection Model Zoo provides models pre-trained on large datasets such as COCO (Common Objects in Context). These models serve as a starting point for transfer learning or can be used directly for inference. Available architectures include Faster R-CNN, SSD (Single Shot Multibox Detector), CenterNet, and EfficientDet. YOLO (You Only Look Once) is another widely used detector, but it is not part of this collection. For this tutorial we will be using the ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8 model.
Understanding the ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8 Model
- SSD (Single Shot Multibox Detector): SSD is a popular object detection algorithm known for its speed and accuracy. It's designed to detect objects of different scales and aspect ratios in a single pass.
- MobileNetV2: MobileNetV2 is a lightweight neural network architecture optimized for mobile and edge devices. It strikes a balance between efficiency and performance, making it ideal for real-time applications.
- 640x640: This denotes the input image size the model expects. Larger input sizes often yield more accurate results but require more computational resources. Models built for smaller inputs also tend to be smaller on disk and faster at inference than models built for larger inputs such as 1024x1024. For example:
- centernet_hg104_1024x1024_coco17_tpu-32 is about 1.33 GB, with an inference time of around 42 s in Google Colab.
- ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8 is about 19 MB, with an inference time of under 1 s (reported as 0 s).
- efficientdet_d1_coco17_tpu-32 (for 640x640 images) is about 50 MB, with an inference time of around 4 s.
- In this comparison the larger models are also the slower ones, although the three models differ in architecture as well as in input size.
- COCO (Common Objects in Context) Dataset: The COCO dataset is a large-scale dataset for object detection, segmentation, and captioning. It encompasses a diverse range of object categories and is widely used for training and evaluating computer vision models.
- TPU-8 (Tensor Processing Unit - 8): TPUs are Google's custom hardware accelerators designed for machine learning workloads. The tpu-8 suffix indicates that the model was trained on a TPU configuration with 8 cores. It describes the training hardware only; the exported model can still be run on a CPU or GPU.
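The naming pattern described above can be picked apart programmatically. This is a heuristic sketch based only on the model names mentioned in this article; `parse_model_name` is a hypothetical helper, and other model names may not follow the same pattern.

```python
import re


def parse_model_name(name: str) -> dict:
    """Extract input size, dataset tag, and TPU core count from a model name.

    Heuristic and hypothetical: fields that cannot be found are left as None.
    """
    info = {"name": name, "input_size": None, "dataset": None, "tpu_cores": None}

    size = re.search(r"_(\d+)x(\d+)_", name)       # e.g. "_640x640_"
    if size:
        info["input_size"] = (int(size.group(1)), int(size.group(2)))

    dataset = re.search(r"_(coco\d*)_", name)      # e.g. "_coco17_"
    if dataset:
        info["dataset"] = dataset.group(1)

    tpu = re.search(r"_tpu-(\d+)$", name)          # e.g. "_tpu-8" at the end
    if tpu:
        info["tpu_cores"] = int(tpu.group(1))

    return info


print(parse_model_name("ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8"))
print(parse_model_name("efficientdet_d1_coco17_tpu-32"))
```

The second name carries no explicit input size, so `input_size` stays `None` for it.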
Now that we have everything needed, let's begin with the code:
Step 1: Import Libraries
First let's import the necessary libraries for TensorFlow, NumPy, OpenCV, Pillow, and Matplotlib.
Python3
import tensorflow as tf
import numpy as np
import cv2
from PIL import Image
from matplotlib import pyplot as plt
from random import randint
Step 2: Download, Extract and Load the Pre-trained Model
Now download the model archive, extract it, and load the pre-trained model from the extracted SavedModel directory.
Python3
!wget http://download.tensorflow.org/models/object_detection/tf2/20200711/ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8.tar.gz
!tar -xzvf ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8.tar.gz
model = tf.saved_model.load("ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8/saved_model")
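The `!wget` and `!tar` commands above only work inside a notebook. Outside one, the same download and extraction could be done with the Python standard library. This is a sketch under that assumption; `download_and_extract` is a hypothetical helper, and archives should only be extracted when they come from a trusted source.

```python
import tarfile
import urllib.request
from pathlib import Path


def download_and_extract(url: str, dest_dir: str = ".") -> Path:
    """Download a .tar.gz archive (unless already present) and unpack it into dest_dir.

    Hypothetical helper for illustration.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    # Name the local file after the last path component of the URL
    archive_path = dest / url.rsplit("/", 1)[-1]
    if not archive_path.exists():
        urllib.request.urlretrieve(url, str(archive_path))

    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(dest)
    return dest


# Usage with the archive from this step (requires network access):
# download_and_extract(
#     "http://download.tensorflow.org/models/object_detection/tf2/20200711/"
#     "ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8.tar.gz"
# )
```

After extraction, the `tf.saved_model.load` call shown above can be used unchanged.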
Step 3: Load and Preprocess Image
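In this step, we load an image, convert it to a NumPy array, and preprocess it for input to the model. The model cannot work on an image file directly, so we first convert the image into a tensor with a leading batch dimension.
Python3
# Open the image and force 3-channel RGB so the array has shape (height, width, 3)
image = Image.open("detect.jpg").convert("RGB")
image_np = np.array(image)
# Add a batch dimension: the model expects a uint8 tensor of shape (1, height, width, 3)
input_tensor = tf.convert_to_tensor(np.expand_dims(image_np, 0), dtype=tf.uint8)
image  # display the image in the notebook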
Output: the input image (detect.jpg) is displayed.
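The preprocessing in this step assumes a three-channel image. As a rough sketch of how other inputs could be handled, the hypothetical helper below turns grayscale or RGBA arrays into the batched uint8 layout of shape (1, height, width, 3) using NumPy only.

```python
import numpy as np


def to_model_input(image_np: np.ndarray) -> np.ndarray:
    """Return a uint8 array of shape (1, height, width, 3).

    Hypothetical helper for illustration.
    """
    if image_np.ndim == 2:
        # Grayscale: repeat the single channel three times
        image_np = np.stack([image_np] * 3, axis=-1)
    elif image_np.ndim == 3 and image_np.shape[-1] == 4:
        # RGBA: drop the alpha channel
        image_np = image_np[..., :3]

    if image_np.ndim != 3 or image_np.shape[-1] != 3:
        raise ValueError(f"unsupported image shape: {image_np.shape}")

    # Add the leading batch dimension
    return np.expand_dims(image_np.astype(np.uint8), axis=0)


gray = np.zeros((480, 640), dtype=np.uint8)
rgba = np.zeros((480, 640, 4), dtype=np.uint8)
print(to_model_input(gray).shape, to_model_input(rgba).shape)
```

The result can be passed to `tf.convert_to_tensor` in the same way as the expanded array in the step above.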
Step 5: Perform Object Detection
Here we use the loaded model to perform object detection on the input image and extract bounding box coordinates, class IDs, and scores.
Python3
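# Run inference; the model returns a dictionary of batched output tensors
detection = model(input_tensor)
# Parse the detection results (each array keeps a leading batch dimension of size 1)
boxes = detection['detection_boxes'].numpy()  # normalized [ymin, xmin, ymax, xmax]
classes = detection['detection_classes'].numpy().astype(int)  # integer class IDs
scores = detection['detection_scores'].numpy()  # confidence between 0 and 1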
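The raw outputs usually contain many low-confidence candidates, and the boxes are normalized to the range 0 to 1 in [ymin, xmin, ymax, xmax] order. A minimal sketch of a post-processing step follows; `filter_detections` is a hypothetical helper, the 0.5 threshold is an assumed default rather than a value from this article, and the arrays in the example are synthetic.

```python
import numpy as np


def filter_detections(boxes, classes, scores, image_height, image_width, min_score=0.5):
    """Keep detections with score >= min_score and convert boxes to pixel coordinates.

    boxes:   shape (1, N, 4), normalized [ymin, xmin, ymax, xmax]
    classes: shape (1, N), integer class IDs
    scores:  shape (1, N), confidences between 0 and 1
    """
    # Drop the batch dimension (a single image is assumed)
    boxes, classes, scores = boxes[0], classes[0], scores[0]
    keep = scores >= min_score

    # Scale y values by the image height and x values by the image width
    scale = np.array([image_height, image_width, image_height, image_width])
    pixel_boxes = np.rint(boxes[keep] * scale).astype(int)
    return pixel_boxes, classes[keep], scores[keep]


# Synthetic outputs standing in for the model's results
demo_boxes = np.array([[[0.25, 0.5, 0.75, 1.0], [0.0, 0.0, 1.0, 1.0]]])
demo_classes = np.array([[18, 1]])
demo_scores = np.array([[0.9, 0.3]])

kept_boxes, kept_classes, kept_scores = filter_detections(
    demo_boxes, demo_classes, demo_scores, image_height=100, image_width=200
)
print(kept_boxes, kept_classes, kept_scores)
```

With the real outputs, `image_height` and `image_width` would come from `image_np.shape[:2]`.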
Step 6: Add the COCO Labels
The model only returns integer class IDs from the COCO dataset it was trained on. To translate those IDs into meaningful class names, we need the list of COCO labels, indexed by class ID. COCO class IDs run from 1 to 90 with some IDs left unused, so the list below holds an 'N/A' placeholder at each unused ID to keep every name at the index that matches its ID.
Python3
# Index = COCO class ID; 'N/A' marks IDs that COCO does not use
labels = ['__background__', 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus',
          'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'N/A', 'stop sign',
          'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
          'elephant', 'bear', 'zebra', 'giraffe', 'N/A', 'backpack', 'umbrella', 'N/A',
          'N/A', 'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard',
          'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard',
          'surfboard', 'tennis racket', 'bottle', 'N/A', 'wine glass', 'cup', 'fork',
          'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli',
          'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch', 'potted plant',
          'bed', 'N/A', 'dining table', 'N/A', 'N/A', 'toilet', 'N/A', 'tv', 'laptop',
          'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven', 'toaster',
          'sink', 'refrigerator', 'N/A', 'book', 'clock', 'vase', 'scissors',
          'teddy bear', 'hair drier', 'toothbrush']
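Looking up a name is then a matter of indexing the list with the class ID. The sketch below adds a bounds check so that an unexpected ID does not raise an error; `class_name` is a hypothetical helper, and `demo_labels` is a shortened stand-in for the full list.

```python
def class_name(class_id: int, labels: list) -> str:
    """Return the label for class_id, or a placeholder when the ID is out of range.

    Hypothetical helper for illustration.
    """
    if 0 <= class_id < len(labels):
        return labels[class_id]
    return f"unknown({class_id})"


# Shortened stand-in for the full label list
demo_labels = ['__background__', 'person', 'bicycle', 'car']

print(class_name(1, demo_labels))
print(class_name(42, demo_labels))
```

The explicit range check also prevents a negative ID from silently indexing from the end of the list.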