Unit 5: Introduction to Robot Vision
Study Guide
School of Engineering
Compiled by: Dr. E.M. Migabo (PhD Computer Science & DEng Electrical Engineering)
May, 2023
ELB1502
Study Guide
I. Learning objectives
STUDY UNIT 4: Introduction to Computer Vision for Robotics
Introduction to Electrical Robotics
[Figure: an image processing system takes an image as input and produces a processed image as output; a computer vision system (image processing plus interpretation) takes an image as input and produces information as output.]
Computer vision covers a broad range of tasks, including:
1. Object Detection: recognition (is there a cat in the image?) and localization (where is the cat?). Image: https://tinyurl.com/yanp2o5e
2. Segmentation
3. Image Modifications/Enhancements
4. Image to Text
5. Image Generation (e.g. a style-based generator architecture for GANs; credits: Tero Karras, arXiv 2018)
6. Motion Estimation (e.g. optical flow with the Lucas-Kanade method; credits: https://tinyurl.com/y5rloh3g)
7. 3D Reconstruction from Images (e.g. REMODE, real-time reconstruction; credits: Matia Pizzoli, ICRA 2014)
8. Visual SLAM
9. Biometrics and more (e.g. fingerprint detection, Apple Face ID; credits: https://tinyurl.com/y2a7wybz, TheVerge YouTube)
[Optical illusion image credits: https://tinyurl.com/y6bkhnqa; https://tinyurl.com/y49rp7sd; Wikipedia, Spinning Dancer; Oleg Shuplyak, Pinterest]
The human and computer vision systems
Credits: https://tinyurl.com/y6qen2vb
[Figures: the pinhole camera aperture; a camera calibration rig with world points Pi and their image projections Pci.]
Machine vision is concerned with the sensing of vision data and its interpretation by a computer. The typical vision system consists of the camera and digitizing hardware, a digital computer, and the hardware and software necessary to interface them. This interface hardware and software is often referred to as a pre-processor. The operation of the vision system consists of three functions: sensing and digitizing, image processing and analysis, and application.
1. Sensing and digitizing
The sensing and digitizing functions involve the input of vision data by means of a camera
focused on the scene of interest. Special lighting techniques are frequently used to obtain an
image of sufficient contrast for later processing. The image viewed by the camera is typically
digitized and stored in computer memory.
The digital image is called a frame of vision data and is frequently captured by a hardware device called a frame grabber. These devices are capable of digitizing images at a rate of 30 frames per second.
2. Image processing and analysis
The digitized image matrix for each frame is stored and then subjected to image processing and
analysis functions for data reduction and interpretation of the image. These steps are required to
permit the real-time application of vision analysis required in robot applications.
Typically, an image frame will be thresholded to produce a binary image, and then various feature measurements will further reduce the data representation of the image. This data reduction can change the representation of a frame from several hundred thousand bytes of raw image data to several hundred bytes of feature value data. The resultant feature data can be analysed in the time available for action by the robot system.
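As an illustration, the threshold-and-measure data reduction described above can be sketched with NumPy. The frame contents, the threshold value of 128, and the choice of features (area and centroid) are illustrative assumptions, not part of any particular vision system:

```python
import numpy as np

# A small synthetic grayscale frame (0-255); in a real system this
# would come from the frame grabber. (Illustrative data.)
frame = np.zeros((100, 100), dtype=np.uint8)
frame[30:60, 40:80] = 200  # a bright rectangular "part" on a dark background

# Step 1: threshold the frame to produce a binary image.
binary = frame > 128

# Step 2: reduce the binary image to a handful of feature values.
area = int(binary.sum())                         # number of object pixels
ys, xs = np.nonzero(binary)
centroid = (float(ys.mean()), float(xs.mean()))  # object centre of mass

# 10,000 bytes of raw image data reduced to three feature values.
features = {"area": area,
            "centroid_row": centroid[0],
            "centroid_col": centroid[1]}
```

The robot controller would then act on the few feature values rather than the full pixel array, which is what makes real-time operation feasible.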
3. Application
The third function of a machine vision system is the applications function. The current applications of machine vision in robotics include inspection, part identification, and location and orientation.
Image sensing requires some type of image formation device such as a camera and a digitizer
which stores a video frame in the computer memory. We divide the sensing and digitizing
functions into several steps.
The initial step involves capturing the image of the scene with the vision camera. The image
consists of relative light intensities corresponding to the various portions of the scene. These light
intensities are continuous analogue values which must be sampled and converted into a digital
form.
The second step, digitizing, is achieved by an analogue to digital converter. The A/D converter is
either part of a digital video camera or the front end of a frame grabber. The choice is dependent
on the type of hardware in the system.
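The sampling and A/D conversion step can be illustrated with a short NumPy sketch. The simulated scene values and the 8-bit (256-level) resolution are assumptions for illustration:

```python
import numpy as np

# Simulated continuous analogue light intensities in [0, 1)
# for a small 4x4 patch of the scene. (Illustrative data.)
rng = np.random.default_rng(0)
analogue = rng.random((4, 4))

# 8-bit A/D conversion: quantize each sampled intensity
# to one of 256 discrete grey levels.
digital = np.floor(analogue * 256).astype(np.uint8)
```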
The frame grabber, representing the third step, is an image storage and computational device which stores a given pixel array. Frame grabbers vary in capability, from those which simply store an image to those with significant computational capability.
In the more powerful frame grabbers, thresholding, windowing, and histogram modification calculations can be carried out under computer control. The stored image is subsequently processed and analysed by the combination of the frame grabber and the vision controller.
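A minimal sketch of the windowing, histogram, and thresholding operations mentioned above, using NumPy. The frame contents and the threshold are illustrative assumptions:

```python
import numpy as np

# Synthetic 8-bit frame: dark background with a brighter region of interest.
frame = np.full((64, 64), 50, dtype=np.uint8)
frame[16:48, 16:48] = 180

# Windowing: restrict later processing to a region of interest.
window = frame[16:48, 16:48]

# Histogram: counts of each of the 256 possible grey levels.
hist = np.bincount(frame.ravel(), minlength=256)

# Thresholding under program control, as a powerful frame
# grabber might perform in hardware.
binary = frame > 128
```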
4.1 Robotic applications
Robotic applications of machine vision fall into three broad categories, listed below:
· Inspection
The first category is one in which the primary function is the inspection process. This is carried out by the machine vision system, and the robot is used in a secondary role to support the application.
The objectives of machine vision inspection include checking for gross surface defects, discovery
of flaws in labelling, verification of the presence of components in assembly and checking for the
presence of holes and other features in a part.
When these kinds of inspection operations are performed manually, there is a tendency for human error. Also, the time required by most manual inspection operations creates a strong incentive to carry out the procedures automatically, achieving 100 percent inspection and usually in much less time.
· Identification
This is concerned with applications in which the purpose of the machine vision system is to
recognise and classify an object rather than to inspect it. Inspection implies that the part must be
either accepted or rejected. Identification involves a recognition process in which the part itself, or its position and/or orientation, is determined.
This is usually followed by subsequent decision and action taken by the robot. Identification
applications of machine vision include part sorting, palletizing and depalletizing and picking parts
that are randomly oriented from a conveyer or bin.
· Visual servoing and navigation control
In the third category, visual servoing and navigation control, the purpose of the vision system is to direct the actions of the robot based on its visual input.
The generic example of robot visual servoing is one in which the machine vision system is used to control the trajectory of the robot's end effector toward an object in the workspace. Industrial examples of this application include part positioning, retrieving parts moving along a conveyor, retrieving and reorienting parts moving along a conveyor, assembly, etc.
III. Tutorials
1. Q: What is computer vision? A: Computer vision is the field of study that focuses on
enabling computers to interpret and understand visual information from digital images
or videos.
2. Q: How does human vision differ from machine vision? A: Human vision is a complex
process involving the eyes, brain, and perception, while machine vision refers to the
use of computer algorithms and techniques to extract information from images or
videos.
3. Q: How are images represented in computer vision? A: In computer vision, images are
represented as matrices or grids of pixels, where each pixel stores numerical values
representing the color or intensity of the corresponding image location.
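For illustration, a grayscale image represented as a matrix of pixel values in NumPy; the pixel values and the 3x3 size are arbitrary assumptions:

```python
import numpy as np

# A grayscale image is a 2-D matrix of intensities (0 = black, 255 = white).
img = np.array([[  0,  64, 128],
                [ 64, 128, 192],
                [128, 192, 255]], dtype=np.uint8)

rows, cols = img.shape   # image dimensions
pixel = int(img[1, 2])   # intensity at row 1, column 2

# A colour image adds a third axis: one 2-D matrix per channel (e.g. RGB).
rgb = np.stack([img, img, img], axis=-1)
```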
4. Q: What are the components of the camera model in computer vision? A: The camera
model includes intrinsic parameters (focal length, principal point) and extrinsic
parameters (position and orientation) that describe the relationship between the 3D
world and 2D image coordinates.
5. Q: What are some robotic applications of machine vision? A: Robotic applications of
machine vision include object recognition and localization, robot navigation, industrial
automation, surveillance, autonomous vehicles, and augmented reality.
6. Q: Define computer vision. A: Computer vision is an interdisciplinary field that focuses
on developing algorithms and techniques for machines to extract, analyze, and interpret
information from digital images or videos.
7. Q: How does the human vision system work? A: The human vision system involves
the eyes capturing light, which is then processed by the brain to form visual perception,
including recognition, depth perception, and object tracking.
8. Q: Explain images as matrices in computer vision. A: In computer vision, images are
represented as matrices, where each element in the matrix represents a pixel value that
encodes color or intensity information.
9. Q: What is the camera model in computer vision? A: The camera model describes the
mathematical relationship between 3D points in the world and their projection onto a
2D image plane. It includes intrinsic and extrinsic parameters.
10. Q: Provide examples of robotic applications that utilize machine vision. A: Examples
include industrial robots for quality control, autonomous vehicles for road scene
understanding, surgical robots for precise image-guided procedures, and drones for
object tracking.
11. Q: How would you define computer vision in the context of robotics? A: In robotics,
computer vision refers to the application of image processing and analysis techniques
to enable robots to perceive and interpret visual information from the environment.
12. Q: What are the primary stages of human vision processing? A: Human vision
processing involves image formation on the retina, feature extraction in the visual
cortex, and higher-level interpretation in the brain for object recognition and
understanding.
13. Q: How can images as matrices be manipulated in computer vision? A: Matrices
representing images can be processed using various techniques, such as filtering, edge
detection, morphological operations, and transformations like rotation or scaling.
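A minimal sketch of two such manipulations in NumPy: a simple [-1, 0, 1] gradient filter for vertical-edge detection, and a 90-degree rotation. The test image and the edge threshold are illustrative assumptions:

```python
import numpy as np

# Synthetic image containing one vertical step edge. (Illustrative data.)
img = np.zeros((5, 6), dtype=float)
img[:, 3:] = 255.0

# Horizontal gradient with the simple kernel [-1, 0, 1]:
# large magnitude marks a vertical edge.
gx = np.zeros_like(img)
gx[:, 1:-1] = img[:, 2:] - img[:, :-2]
edges = np.abs(gx) > 100      # binary edge map

# Geometric transformation: rotate the image by 90 degrees.
rotated = np.rot90(img)
```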
14. Q: Explain the concept of intrinsic parameters in the camera model. A: Intrinsic
parameters describe the internal characteristics of the camera, such as focal length,
principal point, and lens distortion, which affect the mapping of 3D points to the image
plane.
15. Q: What are some examples of robotic applications that utilize machine vision for
object recognition? A: Examples include industrial robots identifying and sorting
objects on an assembly line, autonomous drones detecting and avoiding obstacles, and
robots in healthcare assisting in surgical procedures.
16. Q: How does the machine vision process contribute to robot navigation? A: Machine
vision allows robots to perceive and understand the environment by analyzing visual
information, which aids in tasks such as obstacle detection, mapping, and localization.
17. Q: Explain the concept of extrinsic parameters in the camera model. A: Extrinsic
parameters define the position and orientation of the camera in the 3D world coordinate
system, enabling the transformation from 3D world points to the 2D image plane.
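The world-to-camera transformation defined by the extrinsic parameters can be sketched in NumPy. The rotation angle, translation vector, and world point below are illustrative assumptions:

```python
import numpy as np

# Extrinsic parameters: rotation R and translation t map world
# coordinates into the camera frame: X_cam = R @ X_world + t.
theta = np.pi / 2                      # camera yawed 90 degrees (illustrative)
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
t = np.array([0.0, 0.0, 2.0])          # camera offset 2 m along z (illustrative)

X_world = np.array([1.0, 0.0, 0.0])    # a point in world coordinates
X_cam = R @ X_world + t                # the same point in camera coordinates
```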
18. Q: What are some challenges in robotic applications of machine vision? A: Challenges
include handling variations in lighting conditions, occlusions, complex scenes, real-
time processing requirements, and robustness to noise and uncertainties.
19. Q: How does machine vision contribute to industrial automation? A: Machine vision
systems are used in industrial automation for tasks such as quality control, defect
detection, object sorting, robotic assembly, and visual inspection.
20. Q: What is the role of computer vision in robot navigation? A: Computer vision enables
robots to perceive and interpret the environment, allowing them to understand
obstacles, landmarks, and spatial relationships. This information is crucial for tasks
such as mapping, localization, path planning, and obstacle avoidance. By analyzing
visual data from cameras or other sensors, robots can make informed decisions to
navigate their surroundings safely and efficiently. Computer vision provides valuable
input for autonomous navigation systems, enabling robots to adapt to dynamic
environments and handle complex scenarios.
IV. Problems
3. Q: Calculate the total number of pixels in a grayscale image with dimensions 800x600 pixels.
A: The total number of pixels will be 800 * 600 = 480,000 pixels.
4. Q: Given a camera with a focal length of 50 mm and an object distance of 2 meters, calculate the image distance using the thin-lens camera model equation 1/f = 1/d_o + 1/d_i.
A: Solving for the image distance: 1/d_i = 1/f - 1/d_o = 1/0.050 - 1/2 = 19.5 per meter, so d_i ≈ 0.0513 meters, or about 51.3 millimeters.
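The thin-lens calculation can be checked numerically in plain Python:

```python
# Thin-lens equation 1/f = 1/d_o + 1/d_i, solved for the image distance d_i.
f = 0.050    # focal length: 50 mm, in meters
d_o = 2.0    # object distance: 2 m

d_i = 1.0 / (1.0 / f - 1.0 / d_o)   # image distance in meters
```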
5. Q: Determine the intrinsic matrix K given the camera's focal length of 500 pixels and
principal point coordinates (320, 240).
A: The intrinsic matrix K will be:
[500   0  320]
[  0 500  240]
[  0   0    1]
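Using this K, a 3-D point expressed in camera coordinates projects to pixel coordinates as follows; the example point is an illustrative assumption:

```python
import numpy as np

# Intrinsic matrix from the problem: focal length 500 px,
# principal point (320, 240).
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])

# A 3-D point in camera coordinates (X, Y, Z), Z in front of the camera.
X_cam = np.array([0.2, -0.1, 1.0])     # illustrative point

# Homogeneous projection, then divide by the third coordinate.
uvw = K @ X_cam
u, v = uvw[0] / uvw[2], uvw[1] / uvw[2]   # pixel coordinates
```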
13. Problem: A machine vision system converts each captured frame to grayscale. If converting one frame takes 5 milliseconds, what is the total processing time for a video sequence of 100 frames?
Solution: The time taken to convert one frame to grayscale is 5 milliseconds.
Therefore, the total processing time for 100 frames will be 100 frames * 5 milliseconds
= 500 milliseconds or 0.5 seconds.
14. Problem: A robot is equipped with a depth-sensing camera that measures the distance
of objects in a scene. The camera has a depth resolution of 1 millimeter. If the robot
detects an object at a distance of 5 meters, what is the depth measurement accuracy in
centimeters?
Solution: The depth measurement accuracy is equal to the depth resolution, which is 1
millimeter. Converting this to centimeters gives an accuracy of 0.1 centimeters.
15. Problem: A robot is using machine vision to navigate through a maze. The camera
captures images at a resolution of 640x480 pixels. The robot's algorithm requires the
images to be resized to a resolution of 320x240 pixels. If each image resizing operation
takes 10 milliseconds, what is the processing time for a video sequence of 50 frames?
Solution: The time taken to resize one frame is 10 milliseconds. Therefore, the total
processing time for 50 frames will be 50 frames * 10 milliseconds = 500 milliseconds
or 0.5 seconds.
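The resizing step can be sketched in NumPy by nearest-neighbour decimation, i.e. keeping every second pixel; a library resize (such as OpenCV's, with low-pass filtering) would normally be preferred, and the blank frame here is a placeholder assumption:

```python
import numpy as np

# A 640x480 frame (stored height x width); blank placeholder data.
frame = np.zeros((480, 640), dtype=np.uint8)

# Halve both dimensions by keeping every second pixel.
small = frame[::2, ::2]

# Per-frame cost of 10 ms gives the total for 50 frames.
total_ms = 50 * 10
```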