
ELB1502

Study Guide

ELECTRICAL ENGINEERING ROBOTICS

Unit 5: Introduction to Robot Vision

Diploma in Electrical Engineering

In the Department of Electrical Engineering

School of Engineering

College of Science, Engineering & Technology (CSET)

University of South Africa

Compiled by: Dr. E.M. Migabo (PhD Computer Science & DEng Electrical Engineering)

Instructors: Dr. M.E. Migabo & Mr. A.M. Dlamini

May, 2023

I. Learning objectives

The learning objectives for this study unit are:

a. To define what computer vision is.
b. To understand the human vision system and the machine vision system.
c. To understand images as matrices.
d. To understand the camera model.
e. To understand robotic applications of machine vision.
II. Unit summary
The following set of slides summarizes the content of the study unit in terms of learning
objectives a. to d.:

STUDY UNIT 5

Introduction to Computer
Vision for Robotics
Compiled by: Dr. E.M. Migabo (PhD)
Introduction to Electrical Robotics
ELB1502

Introduction to Computer Vision 1


Unit Outline
● Introduction
○ What is CV?
○ Overview of the field
○ A look at history
○ Hard Problem?
● Human Vision System & the Machine
○ The human vision system
○ Fooling humans
○ The computer vision system
● Images as matrices.
○ How cameras work to produce these matrices
○ Meaning of Intensity, Color etc
○ Shoutout to Image Processing



Unit Outline
● Camera Model
○ Pinhole Camera Model
○ Intrinsic Camera Matrix
○ Camera Calibration



Introduction



What is Computer Vision?
[Diagram: Universe → Image → Image Processing → Computer Vision System → Information]


What is Computer Vision?

Image Credits: CS131, Fall ‘18, Stanford



What is Computer Vision?
● Computer Vision deals with extracting information about the 3D world we live in
from a single image or a collection of images.
● Computer Vision, like most other fields today, sits at the junction of numerous
disciplines, from Biology to Computer Science, and has applications limited only
by our imagination.



Overview of the field

Image Credits: XKCD, 1425, 2014



Overview of the field

Image Credits: https://tinyurl.com/y53by9pr


Overview of the field
What kind of Information?
[Diagram: Universe → Image → (Image Processing + ) Computer Vision System]


Overview of the field
What kind of Information?

Image Credits: Karpathy, CVPR’15

Image Credits: https://tinyurl.com/lxuex6o



Overview of the field
The primary themes in Computer Vision are:

1. Object Detection: recognition (is there a cat?) and localization (where is the cat?);
   which objects are here, and where? (Images: https://tinyurl.com/yanp2o5e,
   https://tinyurl.com/y4ly96rd)
2. Segmentation: which pixels belong to which object? (Credits: Own Work)
3. Image Modifications/Enhancements: image colorization from grayscale to colored
   images (Credits: Richard Zhang, CVPR 2016); real-time image enhancement
   (Credits: Michael Gharbi, ACM Graphics 2017); super resolution, upsampling
   images while preserving quality (Credits: https://github.com/tensorlayer/srgan)
4. Image to Text: automatic semantic description for images (Credits: Karpathy,
   CVPR 2015)
5. Image Generation: a style-based generator architecture for GANs (Credits: Tero
   Karras, arXiv 2018)
6. Motion Estimation: optical flow, e.g. the Lucas-Kanade method (Credits:
   https://tinyurl.com/y5rloh3g)
7. 3D Reconstruction from Images: e.g. REMODE, real-time reconstruction (Credits:
   Matia Pizzoli, ICRA 2014)
8. Visual SLAM
9. Biometrics and more: fingerprint detection, Apple Face ID (Credits:
   https://tinyurl.com/y2a7wybz, TheVerge YouTube)


A look at history
● Robert Nathan started writing computer programs for enhancing images from
NASA's spacecraft at the Jet Propulsion Laboratory, NASA. (Credits: EE604,
nasa.gov)
● The Summer Vision Project: a project at MIT to solve a significant part of the
visual system. The primary objective was to divide the image into object,
background and chaos regions, over the course of a summer. (Credits:
https://tinyurl.com/y6bpo4nk)



A look at history

Credits: Prof. Tanaya Guha, EE698K



Hard Problem?
● Why are we still working on roughly the same problem as the “summer vision
project”?
● Why is it that creating 3D models of chairs is easier than identifying them?


➔ There is a large gap between some ~1920x1080x3 numbers and the high-level
abstract meaning we associate with them.
➔ Images are a 2D representation of information from the 3D world.



Human Vision System & Computer Vision System



The human vision system

Credits: https://tinyurl.com/y6bkhnqa





The human vision system

Credits: Ulas Bagci, UCF



Fooling humans

Credits: https://tinyurl.com/y49rp7sd
Credits: Wikipedia, Spinning Dancer
Credits: Oleg Shuplyak, Pinterest
The computer vision system

Credits: CS131, Stanford



Fooling Computers

Credits: https://tinyurl.com/l5pwp6t Credits: Wikipedia, Barber Pole Illusion



Images as Matrices

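As a concrete illustration of "images as matrices", here is a toy example; the pixel values below are made up purely for illustration:

```python
# A tiny 3x3 grayscale "image" as a matrix: each entry is a pixel
# intensity in the range 0 (black) to 255 (white).
image = [
    [  0, 128, 255],
    [ 64, 192,  32],
    [255,   0, 100],
]

height = len(image)         # number of rows
width = len(image[0])       # number of columns
center_pixel = image[1][1]  # intensity at row 1, column 1

print(height, width, center_pixel)  # 3 3 192
```

A real camera frame is the same structure, only much larger (e.g. 480 rows by 640 columns), and a color image simply carries three such matrices, one per channel.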


Camera Models



Camera Models

Not this one, but "models" as in modelling a phenomenon.

Credits: https://tinyurl.com/y6qen2vb


Camera Models
● Like so many things in engineering, we create a simple "model" of a camera
which is easy to understand and approximates the actual functioning of a
camera to a good degree.
● There are different models:
■ Pinhole camera model
■ Lens model
■ ...



Pinhole camera model

[Figure: a pinhole camera; light passes through the aperture. Credits: Wikipedia,
Pinhole Camera Model]


Pinhole camera model

Projection: x'_i = f · x_i / z, where x'_i = y_i and z = x_3.
In pixel coordinates: u = f · x / z + c_x and v = f · y / z + c_y, where c is an
offset in pixels.

Can we make this into a matrix multiplication of the form p' = Mp?
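A minimal numeric sketch of the pinhole projection; the focal length and principal-point offsets below are illustrative values, not taken from the slides:

```python
# Perspective projection with a pinhole camera (sketch, assuming a
# focal length f in pixels and a principal-point offset (cx, cy)).
# A 3D point (x, y, z) in camera coordinates maps to pixel (u, v):
#   u = f * x / z + cx,   v = f * y / z + cy

def project(point, f, cx, cy):
    x, y, z = point
    u = f * x / z + cx
    v = f * y / z + cy
    return (u, v)

u, v = project((0.2, 0.1, 2.0), f=500, cx=320, cy=240)
print(u, v)  # 370.0 265.0
```

Note the division by z: this is what makes the mapping non-linear in Euclidean coordinates, and it is exactly what homogeneous coordinates (the p' = Mp form) absorb.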


Intrinsic camera matrix

[Figure series: derivation of the intrinsic camera matrix. Credits: Edwin Olson,
University of Michigan]


Camera calibration

[Figure: a calibration rig; 3D rig points P_i and their images P_ci. Credits:
Gaurav Pandey, Ford]



Summary notes and Computer Vision Applications to Robotics

Machine vision is concerned with the sensing of vision data and its interpretation by a computer.
The typical vision system consists of the camera and digitizing hardware, a digital computer and
hardware and software necessary to interface them. This interface hardware and software is often
referred to as a pre-processor. The operation of the vision system consists of three functions:

1. Sensing and digitizing image data

The sensing and digitizing functions involve the input of vision data by means of a camera
focused on the scene of interest. Special lighting techniques are frequently used to obtain an
image of sufficient contrast for later processing. The image viewed by the camera is typically
digitized and stored in computer memory.

The digital image is called a frame of vision data and is frequently captured by a hardware device
called a frame grabber. These devices are capable of digitizing images at a rate of 30 frames
per second.

2. Image processing and analysis

The digitized image matrix for each frame is stored and then subjected to image processing and
analysis functions for data reduction and interpretation of the image. These steps are required to
permit the real-time application of vision analysis required in robot applications.

Typically, an image frame is thresholded to produce a binary image, and various feature
measurements then further reduce the data representation of the image. This data reduction can
change the representation of a frame from several hundred thousand bytes of raw image data to
several hundred bytes of feature-value data. The resultant feature data can be analysed in the
time available for action by the robot system.
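The thresholding step described above can be sketched as follows; the threshold value and frame contents are illustrative:

```python
# Thresholding a grayscale frame into a binary image: pixels above
# the threshold become 1 (object), all others 0.
THRESHOLD = 128

frame = [
    [ 10, 200, 180],
    [ 90, 250,  40],
    [130, 120, 255],
]

binary = [[1 if px > THRESHOLD else 0 for px in row] for row in frame]

# A feature measurement (object area in pixels) reduces the whole
# frame to a single number.
object_pixels = sum(sum(row) for row in binary)

print(binary)         # [[0, 1, 1], [0, 1, 0], [1, 0, 1]]
print(object_pixels)  # 5
```

The reduction is dramatic at real frame sizes: a full grayscale frame needs one byte per pixel, while a handful of feature values (area, centroid, bounding box) needs only a few bytes.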

3. Application

The third function of a machine vision system is the applications function. Current
applications of machine vision in robotics include inspection, part identification, location, and
orientation.

The relationship between the three functions is shown in Figure 1.


Figure 1: Relationship between the three functions

4. Sensing and digitizing function in machine vision

Image sensing requires some type of image formation device such as a camera and a digitizer
which stores a video frame in the computer memory. We divide the sensing and digitizing
functions into several steps.

The initial step involves capturing the image of the scene with the vision camera. The image
consists of relative light intensities corresponding to the various portions of the scene. These light
intensities are continuous analogue values which must be sampled and converted into a digital
form.

The second step, digitizing, is achieved by an analogue to digital converter. The A/D converter is
either part of a digital video camera or the front end of a frame grabber. The choice is dependent
on the type of hardware in the system.

The frame grabber, representing the third step, is an image storage and computational device which
stores a given pixel array. Frame grabbers range in capability from those that simply store
an image to those with significant computational capability.

In the more powerful frame grabbers, thresholding, windowing, and histogram modification
calculations can be carried out under computer control. The stored image is then subsequently
processed and analysed by the combination of the frame grabber and the vision controller.


4.1 Robotic applications

Robotic applications of machine vision fall into the three broad categories listed below:

· Inspection

The first category is one in which the primary function is the inspection process. This is carried
out by the machine vision system, and the robot is used in a secondary role to support the application.

The objectives of machine vision inspection include checking for gross surface defects, discovery
of flaws in labelling, verification of the presence of components in assembly and checking for the
presence of holes and other features in a part.

When these kinds of inspection operations are performed manually, there is a tendency for human
error. Also, the time required by most manual inspection operations is excessive; performing the
procedures automatically allows 100 percent inspection, usually in much less time.

· Identification

This is concerned with applications in which the purpose of the machine vision system is to
recognise and classify an object rather than to inspect it. Inspection implies that the part must be
either accepted or rejected. Identification involves a recognition process in which the part itself,
or its position and/or orientation, is determined.

This is usually followed by subsequent decision and action taken by the robot. Identification
applications of machine vision include part sorting, palletizing and depalletizing and picking parts
that are randomly oriented from a conveyer or bin.

· Visual servoing and navigation

In the third category, visual servoing and navigation control, the purpose of the vision system is to
direct the actions of the robot based on its visual input.

The generic example of robot visual servoing is where the machine vision system is used to control
the trajectory of the robot's end effector toward an object in the workspace. Industrial examples
of this application include part positioning, retrieving and reorienting parts moving along a
conveyor, assembly, etc.


III. Tutorials
1. Q: What is computer vision? A: Computer vision is the field of study that focuses on
enabling computers to interpret and understand visual information from digital images
or videos.
2. Q: How does human vision differ from machine vision? A: Human vision is a complex
process involving the eyes, brain, and perception, while machine vision refers to the
use of computer algorithms and techniques to extract information from images or
videos.
3. Q: How are images represented in computer vision? A: In computer vision, images are
represented as matrices or grids of pixels, where each pixel stores numerical values
representing the color or intensity of the corresponding image location.
4. Q: What are the components of the camera model in computer vision? A: The camera
model includes intrinsic parameters (focal length, principal point) and extrinsic
parameters (position and orientation) that describe the relationship between the 3D
world and 2D image coordinates.
5. Q: What are some robotic applications of machine vision? A: Robotic applications of
machine vision include object recognition and localization, robot navigation, industrial
automation, surveillance, autonomous vehicles, and augmented reality.
6. Q: Define computer vision. A: Computer vision is an interdisciplinary field that focuses
on developing algorithms and techniques for machines to extract, analyze, and interpret
information from digital images or videos.
7. Q: How does the human vision system work? A: The human vision system involves
the eyes capturing light, which is then processed by the brain to form visual perception,
including recognition, depth perception, and object tracking.
8. Q: Explain images as matrices in computer vision. A: In computer vision, images are
represented as matrices, where each element in the matrix represents a pixel value that
encodes color or intensity information.
9. Q: What is the camera model in computer vision? A: The camera model describes the
mathematical relationship between 3D points in the world and their projection onto a
2D image plane. It includes intrinsic and extrinsic parameters.
10. Q: Provide examples of robotic applications that utilize machine vision. A: Examples
include industrial robots for quality control, autonomous vehicles for road scene
understanding, surgical robots for precise image-guided procedures, and drones for
object tracking.
11. Q: How would you define computer vision in the context of robotics? A: In robotics,
computer vision refers to the application of image processing and analysis techniques
to enable robots to perceive and interpret visual information from the environment.
12. Q: What are the primary stages of human vision processing? A: Human vision
processing involves image formation on the retina, feature extraction in the visual
cortex, and higher-level interpretation in the brain for object recognition and
understanding.
13. Q: How can images as matrices be manipulated in computer vision? A: Matrices
representing images can be processed using various techniques, such as filtering, edge
detection, morphological operations, and transformations like rotation or scaling.
14. Q: Explain the concept of intrinsic parameters in the camera model. A: Intrinsic
parameters describe the internal characteristics of the camera, such as focal length,
principal point, and lens distortion, which affect the mapping of 3D points to the image
plane.


15. Q: What are some examples of robotic applications that utilize machine vision for
object recognition? A: Examples include industrial robots identifying and sorting
objects on an assembly line, autonomous drones detecting and avoiding obstacles, and
robots in healthcare assisting in surgical procedures.
16. Q: How does the machine vision process contribute to robot navigation? A: Machine
vision allows robots to perceive and understand the environment by analyzing visual
information, which aids in tasks such as obstacle detection, mapping, and localization.
17. Q: Explain the concept of extrinsic parameters in the camera model. A: Extrinsic
parameters define the position and orientation of the camera in the 3D world coordinate
system, enabling the transformation from 3D world points to the 2D image plane.
18. Q: What are some challenges in robotic applications of machine vision? A: Challenges
include handling variations in lighting conditions, occlusions, complex scenes, real-
time processing requirements, and robustness to noise and uncertainties.
19. Q: How does machine vision contribute to industrial automation? A: Machine vision
systems are used in industrial automation for tasks such as quality control, defect
detection, object sorting, robotic assembly, and visual inspection.
20. Q: What is the role of computer vision in robot navigation? A: Computer vision enables
robots to perceive and interpret the environment, allowing them to understand
obstacles, landmarks, and spatial relationships. This information is crucial for tasks
such as mapping, localization, path planning, and obstacle avoidance. By analyzing
visual data from cameras or other sensors, robots can make informed decisions to
navigate their surroundings safely and efficiently. Computer vision provides valuable
input for autonomous navigation systems, enabling robots to adapt to dynamic
environments and handle complex scenarios.

IV. Exercises and problems:


1. Q: Convert a color image with dimensions 640x480 pixels into a grayscale image.
Calculate the resulting image size.
A: The resulting image size will be 640x480 pixels since a grayscale image has only
one channel.
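A sketch of the grayscale conversion behind this exercise. The weighted-sum (luminance) formula used here is one common convention and is an assumption on our part, since the exercise does not specify one:

```python
# Converting one RGB pixel to grayscale (sketch). The luminance
# weights below are a common convention, not the only possibility.
def rgb_to_gray(r, g, b):
    return round(0.299 * r + 0.587 * g + 0.114 * b)

gray = rgb_to_gray(200, 100, 50)
print(gray)  # 124
```

Applying this per pixel collapses a 640x480x3 RGB image to a single-channel 640x480 matrix, which is why the spatial dimensions in the answer are unchanged.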
2. Q: Given an image represented as a 3x3 matrix, perform element-wise multiplication
by a scalar value of 2.
A: If the original image matrix is
[1 2 3]
[4 5 6]
[7 8 9]
then the resulting image matrix will be
[ 2  4  6]
[ 8 10 12]
[14 16 18]
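The element-wise multiplication can be checked directly:

```python
# Element-wise multiplication of an image matrix by the scalar 2,
# matching the 3x3 example above.
image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]

scaled = [[2 * px for px in row] for row in image]
print(scaled)  # [[2, 4, 6], [8, 10, 12], [14, 16, 18]]
```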

3. Q: Calculate the total number of pixels in a grayscale image with dimensions 800x600
pixels.
A: The total number of pixels will be 800 * 600 = 480,000 pixels.
4. Q: Given a camera with a focal length of 50 mm and an object distance of 2 meters,
calculate the image distance using the camera model equation 1/f = 1/d_o + 1/d_i.
A: Using the equation, 1/d_i = 1/f − 1/d_o = 1/0.05 − 1/2 = 19.5, so the image
distance d_i ≈ 0.0513 meters, or about 51.3 millimeters.
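A quick numeric check of the thin-lens equation:

```python
# Thin-lens equation: 1/f = 1/d_o + 1/d_i, solved for d_i.
f = 0.050   # focal length in meters (50 mm)
d_o = 2.0   # object distance in meters

d_i = 1.0 / (1.0 / f - 1.0 / d_o)
print(round(d_i * 1000, 1))  # image distance in mm -> 51.3
```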


5. Q: Determine the intrinsic matrix K given the camera's focal length of 500 pixels and
principal point coordinates (320, 240).
A: The intrinsic matrix K will be:
[500   0 320]
[  0 500 240]
[  0   0   1]
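The projection p' = Kp can be checked with this matrix; the sample 3D point below is illustrative:

```python
# Projecting a 3D camera-frame point with the intrinsic matrix K
# from the exercise (f = 500 px, principal point (320, 240)).
K = [[500,   0, 320],
     [  0, 500, 240],
     [  0,   0,   1]]

def project(K, point):
    x, y, z = point
    # p' = K p, then divide by the homogeneous (third) coordinate z
    u = (K[0][0] * x + K[0][1] * y + K[0][2] * z) / z
    v = (K[1][0] * x + K[1][1] * y + K[1][2] * z) / z
    return (u, v)

print(project(K, (0.5, 0.25, 1.0)))  # (570.0, 365.0)
```

A point on the optical axis, (0, 0, z), maps to the principal point (320, 240), which is a handy sanity check on K.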

6. Q: Calculate the aspect ratio of an image with dimensions 1024x768 pixels.


A: The aspect ratio is calculated by dividing the width by the height, resulting in
1024/768 ≈ 1.3333.
7. Q: Given an RGB image with dimensions 640x480 pixels, calculate the total number
of color channels.
A: RGB images have three color channels (Red, Green, and Blue). So, the total number
of color channels will be 3.
8. Q: Determine the field of view (FOV) of a camera with a focal length of 35 mm and
an image sensor size of 22.3 mm x 14.9 mm.
A: The horizontal FOV can be calculated using the formula:
FOV = 2 × tan⁻¹(sensor_width / (2 × focal_length))
Substituting the values, FOV = 2 × tan⁻¹(22.3 / 70) ≈ 35.3 degrees.
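A numeric check of the FOV formula:

```python
# Horizontal field of view: FOV = 2 * atan(sensor_width / (2 * f)).
import math

sensor_width = 22.3  # mm
focal_length = 35.0  # mm

fov = 2 * math.degrees(math.atan(sensor_width / (2 * focal_length)))
print(round(fov, 1))  # 35.3
```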
9. Q: Given a camera with a pixel size of 5 μm and a resolution of 2048x1536 pixels,
calculate the physical size of the image sensor.
A: The physical size of the image sensor can be calculated by multiplying the pixel
size by the resolution. In this case, it will be 5 μm * 2048 x 5 μm * 1536 ≈ 10.24 mm
x 7.68 mm.
10. Q: Calculate the Euclidean distance between two points A(3, 4) and B(7, 1) in an
image.
A: The Euclidean distance can be calculated using the formula:
d = √((x2 − x1)² + (y2 − y1)²)
Substituting the values, the distance between A and B is
√((7 − 3)² + (1 − 4)²) = √(16 + 9) = √25 = 5 units.
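A direct check of the distance formula:

```python
# Euclidean distance between two 2D points.
import math

def euclidean(p, q):
    return math.sqrt((q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)

print(euclidean((3, 4), (7, 1)))  # 5.0
```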
11. Problem: A robot is equipped with a camera that captures images at a resolution of
800x600 pixels. Each pixel represents a 0.1 cm x 0.1 cm area in the real world. The
robot needs to determine the size of an object in the image. If the object occupies 200
pixels in width, what is its size in centimeters?
Solution: The size of the object in centimeters can be calculated by multiplying the
number of pixels by the pixel size. In this case, the object size is 200 pixels * 0.1
cm/pixel = 20 cm.
12. Problem: A robot is using machine vision to detect defects on manufactured parts. The
camera captures images at a rate of 30 frames per second. If each image processing
operation takes 20 milliseconds to complete, what is the maximum number of parts the
robot can inspect per minute?
Solution: Each image takes 20 milliseconds to process, so processing alone could handle
1000 / 20 = 50 images per second. The camera, however, captures only 30 frames per
second, so the inspection rate is limited to 30 parts per second, i.e. 30 × 60
= 1800 parts per minute.
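A sketch of the throughput calculation; the pipeline is bounded by the slower of the camera's capture rate and the processing rate:

```python
# Inspection throughput: limited by min(capture rate, processing rate).
capture_rate = 30             # frames per second (camera)
processing_rate = 1000 / 20   # 50 frames per second (20 ms per image)

parts_per_minute = min(capture_rate, processing_rate) * 60
print(parts_per_minute)  # 1800
```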
13. Problem: A robot is performing object recognition using machine vision. The camera
captures images at a resolution of 1280x960 pixels. The robot's algorithm requires the
images to be converted to grayscale. If each grayscale conversion operation takes 5

milliseconds, what is the total processing time for a video sequence of 100 frames?
Solution: The time taken to convert one frame to grayscale is 5 milliseconds.
Therefore, the total processing time for 100 frames will be 100 frames * 5 milliseconds
= 500 milliseconds or 0.5 seconds.
14. Problem: A robot is equipped with a depth-sensing camera that measures the distance
of objects in a scene. The camera has a depth resolution of 1 millimeter. If the robot
detects an object at a distance of 5 meters, what is the depth measurement accuracy in
centimeters?
Solution: The depth measurement accuracy is equal to the depth resolution, which is 1
millimeter. Converting this to centimeters gives an accuracy of 0.1 centimeters.
15. Problem: A robot is using machine vision to navigate through a maze. The camera
captures images at a resolution of 640x480 pixels. The robot's algorithm requires the
images to be resized to a resolution of 320x240 pixels. If each image resizing operation
takes 10 milliseconds, what is the processing time for a video sequence of 50 frames?
Solution: The time taken to resize one frame is 10 milliseconds. Therefore, the total
processing time for 50 frames will be 50 frames * 10 milliseconds = 500 milliseconds
or 0.5 seconds.

V. References

[1] J. J. Craig, Introduction to Robotics, Global Edition, 3rd ed. Harlow, England: Pearson
Education Limited, 2014.
