Object Detection with YOLO and OpenCV
Last Updated: 28 May, 2024
Object detection is a computer vision task that detects objects in an image or video frame. It can be used to recognize objects, count their occurrences for record-keeping, and more. The objective of object detection is to identify and annotate each object present in the media.
YOLO (You Only Look Once) is a state-of-the-art model that detects objects in an image or video quickly and with high accuracy. In this tutorial, we will learn to run object detection with YOLO and plot the frames using OpenCV, on both a recorded video and a live camera.
Steps to Detect Object with YOLO and OpenCV
Step 1: Setup the Environment
We will be using Ultralytics and OpenCV, which can be installed using the following commands:
pip install opencv-python
pip install ultralytics
After installation, create the file main.py, and download the video from the given reference or use any other video.
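To confirm that the environment is set up correctly, a quick version check can be run (a minimal sketch; the printed versions will depend on your installation):
import cv2
import ultralytics

# Print the installed versions to confirm both packages import correctly
print("OpenCV version:", cv2.__version__)
print("Ultralytics version:", ultralytics.__version__)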

Step 2: Importing Necessary Libraries
import cv2
from ultralytics import YOLO
- cv2: The OpenCV Python library.
- YOLO: Imported from ultralytics to load the model and work with it.
Step 3: Define Function to Get Class Colors
To generate distinct colours for the different class labels drawn on the frame during object detection, we use the following method:
- The method stores the three RGB extremes (red, green and blue) as a list of base-colour tuples.
- Next, we compute color_index from the class number to choose the appropriate base colour.
- Then we add increments, scaled by the class number, to the base colour to add variation.
- We apply a modulus of 256 to keep the increment within the valid colour range.
- Finally, we return the colour as a tuple.
# Function to get class colors
def getColours(cls_num):
    base_colors = [(255, 0, 0), (0, 255, 0), (0, 0, 255)]
    color_index = cls_num % len(base_colors)
    increments = [(1, -2, 1), (-2, 1, -1), (1, -1, 2)]
    color = [base_colors[color_index][i] + increments[color_index][i] *
             (cls_num // len(base_colors)) % 256 for i in range(3)]
    return tuple(color)
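A quick sanity check, run after defining the function, shows how the three base colours are assigned to the first few class IDs:
# The first three class IDs map directly to the base colours
print(getColours(0))   # (255, 0, 0)
print(getColours(1))   # (0, 255, 0)
print(getColours(2))   # (0, 0, 255)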
Step 4: Load YOLO Model
We initialize the YOLO object detector with the specified model file (yolov8s.pt), which contains the pre-trained weights and configuration for the YOLOv8s model.
yolo = YOLO('yolov8s.pt')
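Once loaded, the model exposes the same class-name dictionary that we will later read from each result (a quick check you can run):
# Inspect the classes the pretrained COCO model can detect
print(len(yolo.names))   # 80
print(yolo.names[0])     # 'person'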
Step 5: Open Video Capture
We use the small (s) variant of the model: larger variants offer better accuracy but at the cost of speed. When the model is loaded for the first time, the weights are downloaded into the project directory. Next, we capture the video using the VideoCapture(0) method, where 0 selects the default camera.
videoCap = cv2.VideoCapture(0)
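To run the same pipeline on a recorded video instead of the webcam, pass a file path to VideoCapture (the filename here is only a placeholder for whichever video you downloaded):
# Use a recorded video file instead of the default camera
videoCap = cv2.VideoCapture('video.mp4')  # placeholder path - replace with your own file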
Step 6: Process Video Frames
This part of the code continuously captures frames from the video feed (videoCap.read()), processes each frame using YOLO (yolo.track()), and visualizes the detected objects in the frame.
while True:
    ret, frame = videoCap.read()
    if not ret:
        continue
    results = yolo.track(frame, stream=True)
    # Object detection and visualization code
Step 7: Main Loop for Real-time Object Detection
Continuously read frames from the video capture, perform object detection using YOLOv8s, and visualize the results.
while True:
    ret, frame = videoCap.read()
    if not ret:
        continue
    results = yolo.track(frame, stream=True)
    for result in results:
        # get the classes names
        classes_names = result.names
Now we iterate over the results and fetch each one in turn. Each result contains the following attributes (a short inspection sketch follows this list).
- orig_img: The original image as a NumPy array.
- orig_shape: A tuple containing the original frame shape.
- boxes: An array of Ultralytics Box objects, each containing a bounding box. Each box has the following parameters:
  - xyxy: The box coordinates in pixels relative to the frame; this is the format we use in this tutorial.
  - conf: The confidence value of the bounding box, i.e. of the detected object.
  - cls: The class of the object. There are 80 classes in total.
  - id: The ID of the box (assigned when tracking).
  - xywh: Returns the bounding box in xywh format.
  - xyxyn: Returns the bounding box in xyxy format, normalized to the range 0 to 1.
  - xywhn: Returns the bounding box in xywh format, normalized to the range 0 to 1.
- masks: The collection of Mask objects holding segmentation masks.
- probs: The probability of each class (for classification results).
- keypoints: The collection of Keypoints objects storing the detected keypoints.
- obb: An OBB object containing oriented bounding boxes.
- speed: The prediction speed per stage; a faster machine with a GPU gives better speed.
- names: A dictionary mapping class IDs to class names, independent of the prediction; the cls value from a box can be used here to get the respective name.
- path: The path to the image file.
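For a quick look at these attributes, each result can be inspected directly (a small sketch, assuming results comes from yolo.track(frame, stream=True) as above):
for result in results:
    print(result.orig_shape)   # original frame shape
    print(result.speed)        # per-stage prediction times in milliseconds
    print(len(result.boxes))   # number of detected boxes in this frame
    print(result.names[0])     # 'person' - the class name for class id 0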
For this tutorial, we only need the boxes in order to draw the bounding boxes.
Each Box carries a conf value, so we check whether the confidence is greater than 0.4 (40%). If it is, we draw the bounding box and label it with the class name and confidence.
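In code, that per-box check looks roughly like this (a minimal sketch, assuming result and frame come from the loop above; the full version appears in the complete code below):
for box in result.boxes:
    if box.conf[0] > 0.4:                        # keep detections above 40% confidence
        x1, y1, x2, y2 = map(int, box.xyxy[0])   # pixel coordinates of the box
        cls = int(box.cls[0])                    # class id
        colour = getColours(cls)
        cv2.rectangle(frame, (x1, y1), (x2, y2), colour, 2)
        cv2.putText(frame, f'{result.names[cls]} {box.conf[0]:.2f}',
                    (x1, y1), cv2.FONT_HERSHEY_SIMPLEX, 1, colour, 2)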
Step 8: Display Image and Break Loop
Show the annotated frame and break the loop if the 'q' key is pressed.
cv2.imshow('frame', frame)
if cv2.waitKey(1) & 0xFF == ord('q'):
break
Step 9: Release Resources
Release video capture resources and close all windows.
videoCap.release()
cv2.destroyAllWindows()
Complete Code for Object Detection
Python
import cv2
from ultralytics import YOLO

# Load the model
yolo = YOLO('yolov8s.pt')

# Load the video capture
videoCap = cv2.VideoCapture(0)

# Function to get class colors
def getColours(cls_num):
    base_colors = [(255, 0, 0), (0, 255, 0), (0, 0, 255)]
    color_index = cls_num % len(base_colors)
    increments = [(1, -2, 1), (-2, 1, -1), (1, -1, 2)]
    color = [base_colors[color_index][i] + increments[color_index][i] *
             (cls_num // len(base_colors)) % 256 for i in range(3)]
    return tuple(color)

while True:
    ret, frame = videoCap.read()
    if not ret:
        continue
    results = yolo.track(frame, stream=True)

    for result in results:
        # get the classes names
        classes_names = result.names

        # iterate over each box
        for box in result.boxes:
            # check if confidence is greater than 40 percent
            if box.conf[0] > 0.4:
                # get coordinates
                [x1, y1, x2, y2] = box.xyxy[0]
                # convert to int
                x1, y1, x2, y2 = int(x1), int(y1), int(x2), int(y2)

                # get the class
                cls = int(box.cls[0])

                # get the class name
                class_name = classes_names[cls]

                # get the respective colour
                colour = getColours(cls)

                # draw the rectangle
                cv2.rectangle(frame, (x1, y1), (x2, y2), colour, 2)

                # put the class name and confidence on the image
                cv2.putText(frame, f'{class_name} {box.conf[0]:.2f}', (x1, y1),
                            cv2.FONT_HERSHEY_SIMPLEX, 1, colour, 2)

    # show the image
    cv2.imshow('frame', frame)

    # break the loop if 'q' is pressed
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# release the video capture and destroy all windows
videoCap.release()
cv2.destroyAllWindows()
Output: a window displaying the video feed with bounding boxes, class names, and confidence scores drawn on each detected object.
Conclusion
In this tutorial, we learnt to detect objects using YOLO and plot them using OpenCV, with both a camera and a recorded video. We also learnt to work with objects such as Results and Box, which gives us better control over the output.