Computer Vision Toolbox™
User's Guide
R2019a
Featured Examples
1
Code Generation for Object Detection Using YOLO v2 . . . . . . . 1-3
Detect SURF Interest Points in a Grayscale Image . . . . . . . . 1-88
Affine Transformations of 3-D Point Cloud . . . . . . . . . . . . . . 1-138
Merge Two Identical Point Clouds Using Box Grid Filter . . 1-141
Point Cloud Processing
2
Point Cloud Registration Workflow . . . . . . . . . . . . . . . . . . . . . . 2-2
Import from Video Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-4
Setting Block Parameters for this Example . . . . . . . . . . . . . . . 4-4
Configuration Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-5
Intensity Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-45
RGB Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-45
Single Camera Calibration . . . . . . . . . . . . . . . . . . . . . . . . . . 6-21
Open the Camera Calibrator . . . . . . . . . . . . . . . . . . . . . . . . . 6-22
Prepare the Pattern, Camera, and Images . . . . . . . . . . . . . . . 6-22
Add Images and Select Camera Model . . . . . . . . . . . . . . . . . 6-26
Calibrate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-30
Evaluate Calibration Results . . . . . . . . . . . . . . . . . . . . . . . . . 6-32
Improve Calibration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-37
Export Camera Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . 6-41
Object Detection
7
How Labeler Apps Store Exported Pixel Labels . . . . . . . . . . . . 7-3
Location of Pixel Label Data Folder . . . . . . . . . . . . . . . . . . . . . 7-4
View Exported Pixel Label Data . . . . . . . . . . . . . . . . . . . . . . . 7-4
Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-4
Advantage of Using Anchor Boxes . . . . . . . . . . . . . . . . . . . . . 7-10
How Do Anchor Boxes Work? . . . . . . . . . . . . . . . . . . . . . . . . 7-11
Anchor Box Size . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-14
Import Custom Algorithm into Labeling App . . . . . . . . . . . . . 7-85
Custom Algorithm Execution . . . . . . . . . . . . . . . . . . . . . . . . 7-85
Temporal Automation Algorithms . . . . . . . . . . . . . . . . . . . . . 7-139
Class Inheritance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-139
Enable Temporal Properties . . . . . . . . . . . . . . . . . . . . . . . . 7-139
Create a Temporal Automation Algorithm to Use with the Ground Truth Labeler . . . . . . 7-139
Image Registration Using Multiple Features . . . . . . . . . . . . 7-177
Track Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-4
Geometric Transformations
9
Rotate an Image . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-2
Statistics and Morphological Operations
11
Correct Nonuniform Illumination . . . . . . . . . . . . . . . . . . . . . . 11-2
Fixed-Point Design
12
Fixed-Point Signal Processing . . . . . . . . . . . . . . . . . . . . . . . . . 12-2
Fixed-Point Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-2
Benefits of Fixed-Point Hardware . . . . . . . . . . . . . . . . . . . . . 12-2
Benefits of Fixed-Point Design with System Toolboxes Software . . . . . . 12-3
Code Generation
13
Code Generation in MATLAB . . . . . . . . . . . . . . . . . . . . . . . . . . 13-2
1
Featured Examples
• “Code Generation for Object Detection Using YOLO v2” on page 1-3
• “Track Vehicles Using Lidar: From Point Cloud to Track List” on page 1-7
• “Object Detection Using YOLO v2 Deep Learning” on page 1-30
• “Estimate Anchor Boxes Using Clustering” on page 1-39
• “Create YOLO v2 Object Detection Network” on page 1-46
• “Semantic Segmentation Using Dilated Convolutions” on page 1-58
• “Define Custom Pixel Classification Layer with Dice Loss” on page 1-64
• “Read and Play a Video File” on page 1-74
• “Find Vertical and Horizontal Edges in Image” on page 1-77
• “Blur an Image Using an Average Filter” on page 1-81
• “Define a Filter to Approximate a Gaussian Second Order Partial Derivative in Y
Direction” on page 1-83
• “Find Corresponding Interest Points Between Pair of Images” on page 1-84
• “Find Corresponding Points Using SURF Features” on page 1-86
• “Detect SURF Interest Points in a Grayscale Image” on page 1-88
• “Using LBP Features to Differentiate Images by Texture” on page 1-89
• “Extract and Plot HOG Features” on page 1-93
• “Recognize Text Within an Image” on page 1-96
• “Run Nonmaximal Suppression on Bounding Boxes Using People Detector”
on page 1-98
• “Train Stop Sign Detector” on page 1-101
• “Track an Occluded Object” on page 1-104
• “Track a Face in Scene” on page 1-108
• “Assign Detections to Tracks in a Single Video Frame” on page 1-113
Code Generation for Object Detection Using YOLO v2
Prerequisites
Use the coder.checkGpuInstall function and verify that the compilers and libraries
needed for running this example are set up correctly.
envCfg = coder.gpuEnvConfig('host');
envCfg.DeepLibTarget = 'cudnn';
envCfg.DeepCodegen = 1;
envCfg.Quiet = 1;
coder.checkGpuInstall(envCfg);
net = getYOLOv2();
The DAG network contains 150 layers including convolution, ReLU, and batch
normalization layers along with the YOLO v2 transform and YOLO v2 output layers. Use
the command net.Layers to see all the layers of the network.
net.Layers
The yolov2_detect.m function takes an image input and runs the detector on the image
using the deep learning network saved in the yolov2ResNet50VehicleExample.mat file. The
function loads the network object from yolov2ResNet50VehicleExample.mat into a
persistent variable yolov2Obj. On subsequent calls to the function, the persistent object is
reused for detection.
type('yolov2_detect.m')
persistent yolov2Obj;
if isempty(yolov2Obj)
    yolov2Obj = coder.loadDeepLearningNetwork('yolov2ResNet50VehicleExample.mat');
end
% Pass the input image to the detector.
[bboxes,~,labels] = yolov2Obj.detect(in,'Threshold',0.5);
To generate CUDA code from the design file yolov2_detect.m, create a GPU code
configuration object for a MEX target and set the target language to C++. Use the
coder.DeepLearningConfig function to create a CuDNN deep learning configuration
object and assign it to the DeepLearningConfig property of the GPU code configuration
object. Run the codegen command specifying an input of size [224,224,3]. This value
corresponds to the input layer size of the YOLO v2 network.
cfg = coder.gpuConfig('mex');
cfg.TargetLang = 'C++';
cfg.DeepLearningConfig = coder.DeepLearningConfig('cudnn');
codegen -config cfg yolov2_detect -args {ones(224,224,3,'uint8')} -report
Set up the video file reader and read the input video. Create a video player to display the
video and the output detections.
videoFile = 'highway_lanechange.mp4';
videoFreader = vision.VideoFileReader(videoFile,'VideoOutputDataType','uint8');
depVideoPlayer = vision.DeployableVideoPlayer('Size','Custom','CustomSize',[640 480]);
Read the video input frame-by-frame and detect the vehicles in the video using the
detector.
cont = ~isDone(videoFreader);
while cont
I = step(videoFreader);
in = imresize(I,[224,224]);
out = yolov2_detect_mex(in);
step(depVideoPlayer, out);
cont = ~isDone(videoFreader) && isOpen(depVideoPlayer); % Exit the loop if the video player window is closed
end
References
[1] Redmon, Joseph, and Ali Farhadi. "YOLO9000: Better, Faster, Stronger." 2017 IEEE
Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2017.
Track Vehicles Using Lidar: From Point Cloud to Track List
Due to the high-resolution capabilities of the lidar sensor, each scan from the sensor contains
a large number of points, commonly known as a point cloud. This raw data must be
preprocessed to extract objects of interest, such as cars, cyclists, and pedestrians. For
more details about segmentation of lidar data into objects such as the ground plane and
obstacles, refer to the “Ground Plane and Obstacle Detection Using Lidar” (Automated
Driving Toolbox) example. In this example, the point clouds belonging to obstacles are
further classified into clusters using the pcsegdist function, and each cluster is
converted to a bounding box detection with the following format:

[x y z l w h]

Here, x, y, and z refer to the x-, y-, and z-positions of the bounding box, and l, w, and h refer to its
length, width, and height, respectively.
The bounding box is fit onto each cluster by using the minimum and maximum coordinates
of the points in each dimension. The detector is implemented by a supporting class
HelperBoundingBoxDetector, which wraps around point cloud segmentation and
clustering functionalities. An object of this class accepts a pointCloud input and returns
a list of objectDetection objects with bounding box measurements.
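For reference, a minimal sketch of the segmentation and box-fitting steps, assuming ptCloudObstacles contains the obstacle points and minDistance is the clustering tolerance:

[labels,numClusters] = pcsegdist(ptCloudObstacles,minDistance);
bboxes = zeros(6,numClusters);
for k = 1:numClusters
    thisPointData = ptCloudObstacles.Location(labels == k,:);
    mins = min(thisPointData,[],1);
    maxs = max(thisPointData,[],1);
    % Center and dimensions of the axis-aligned bounding box, [x y z l w h].
    bboxes(:,k) = [(mins + maxs)/2, maxs - mins]';
end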
The diagram shows the processes involved in the bounding box detector model and the
Computer Vision Toolbox™ functions used to implement each process. It also shows the
properties of the supporting class that control each process.
The first step in tracking an object is defining its state, and the models that define the
transition of state and the corresponding measurement. These two sets of equations are
collectively known as the state-space model of the target. To model the state of vehicles
for tracking using lidar, this example uses a cuboid model with following convention:
The kinematic portion of the state controls the motion of the cuboid center, and the yaw
angle describes its orientation. The length, width, and height of the cuboid are modeled as
constants, whose estimates evolve in time during correction stages of the filter.
In this example, you use two state-space models: a constant velocity (cv) cuboid model
and a constant turn-rate (ct) cuboid model. These models differ in the way they define the
kinematic part of the state, as described below:
For information about their state transition, refer to the helperConstvelCuboid and
helperConstturnCuboid functions used in this example.
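For illustration, a sketch of the constant velocity cuboid state layout, using the same indices that the display object below is configured with (position [1 3 6], velocity [2 4 7], yaw 8, dimensions [9 10 11]); the numeric values here are placeholders:

cvCuboidState = zeros(11,1);
cvCuboidState([1 3 6])   = [20; -3; 0.5];   % x, y, z of the motion center (m)
cvCuboidState([2 4 7])   = [5; 0; 0];       % vx, vy, vz (m/s)
cvCuboidState(8)         = 0;               % yaw angle
cvCuboidState([9 10 11]) = [4.7; 1.8; 1.4]; % length, width, height (m)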
The image below demonstrates the measurement model operating at different state-space
samples. Notice the modeled effects of bounding box shrinkage and center-point offset as
the objects move around the ego vehicle.
The image below shows the complete workflow to obtain a list of tracks from a pointCloud
input.
Now, set up the tracker and the visualization used in the example.
A joint probabilistic data association tracker (trackerJPDA) coupled with an IMM filter
(trackingIMM) is used to track objects in this example. The IMM filter uses a constant
velocity and constant turn model and is initialized using the supporting function,
helperInitIMMFilter, included with this example. The IMM approach helps a track to
switch between motion models and thus achieve good estimation accuracy during events
like maneuvering or lane changing. Set the HasDetectableTrackIDsInput property of
the tracker as true, which enables you to specify a state-dependent probability of
detection. The detection probability of a track is calculated by the
helperCalcDetectability function, listed at the end of this example.
'HasDetectableTrackIDsInput',true,...
'InitializationThreshold',0);
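For reference, a complete construction mirroring the one in the mexLidarTracker function later in this example (the gate, clutter, and threshold variables are assumed to be defined above):

tracker = trackerJPDA('FilterInitializationFcn',@helperInitIMMFilter,...
    'TrackLogic','History',...
    'AssignmentThreshold',assignmentGate,...
    'ClutterDensity',Kc,...
    'ConfirmationThreshold',confThreshold,...
    'DeletionThreshold',delThreshold,...
    'HasDetectableTrackIDsInput',true,...
    'InitializationThreshold',0);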
1 Lidar Preprocessing and Tracking - This display shows the raw point cloud,
segmented ground, and obstacles. It also shows the resulting detections from the
detector model and the tracks of vehicles generated by the tracker.
2 Ego Vehicle Display - This display shows the 2-D bird's-eye view of the scenario. It
shows the obstacle point cloud, bounding box detections, and the tracks generated by
the tracker. For reference, it also displays the image recorded from a camera
mounted on the ego vehicle and its field of view.
3 Tracking Details - This display shows the scenario zoomed around the ego vehicle. It
also shows finer tracking details, such as error covariance in estimated position of
each track and its motion model probabilities, denoted by cv and ct.
% Create display
displayObject = HelperLidarExampleDisplay(imageData{1},...
'PositionIndex',[1 3 6],...
'VelocityIndex',[2 4 7],...
'DimensionIndex',[9 10 11],...
'YawIndex',8,...
'MovieName','',... % Specify a movie name to record a movie.
'RecordGIF',false); % Specify true to record new GIFs
Loop through the recorded lidar data, generate detections from the current point cloud
using the detector model and then process the detections using the tracker.
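A sketch of this loop, assuming dT, the detector object detectorModel, and the tracker have been set up above (the display update shown next is performed inside this loop):

time = 0;
detectableTracksInput = zeros(0,2);   % [TrackID, Pd] pairs, empty to start
for i = 1:numel(lidarData)
    time = time + dT;
    currentLidar = lidarData{i};
    % Generate bounding box detections from the current point cloud.
    [detections,obstacleIndices,groundIndices,croppedIndices] = ...
        detectorModel(currentLidar,time);
    % Update the tracker and the state-dependent detection probabilities.
    [confirmedTracks,~,allTracks] = tracker(detections,time,detectableTracksInput);
    detectableTracksInput = helperCalcDetectability(allTracks,[1 3 6]);
end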
% Update display
if isvalid(displayObject.PointCloudProcessingDisplay.ObstaclePlotter)
% Get current image scan for reference image
currentImage = imageData{i};
if displayObject.RecordGIF
% second input is start frame, third input is end frame and last input
% is a character vector specifying the panel to record.
writeAnimatedGIF(displayObject,10,170,'trackMaintenance','ego');
writeAnimatedGIF(displayObject,310,330,'jpda','processing');
writeAnimatedGIF(displayObject,140,160,'imm','details');
end
The figure above shows the three displays at time = 18 seconds. The tracks are
represented by green bounding boxes. The bounding box detections are represented by
orange bounding boxes. The detections also have orange points inside them, representing
the point cloud segmented as obstacles. The segmented ground is shown in purple. The
cropped or discarded point cloud is shown in blue.
Generate C Code
You can generate C code from the MATLAB® code for the tracking and the preprocessing
algorithm using MATLAB Coder™. C code generation enables you to accelerate MATLAB
code for simulation. To generate C code, the algorithm must be restructured as a
MATLAB function, which can be compiled into a MEX file or a shared library. For this
purpose, the point cloud processing algorithm and the tracking algorithm are restructured
into a MATLAB function, mexLidarTracker. Some variables are defined as persistent to
preserve their state between multiple calls to the function (see persistent). The inputs
and outputs of the function can be observed in the function description provided in the
"Supporting Files" section at the end of this example.
MATLAB Coder requires specifying the properties of all the input arguments. An easy way
to do this is by defining the input properties by example at the command line using the -
args option. For more information, see “Define Input Properties by Example at the
Command Line” (MATLAB Coder). Note that the top-level input arguments cannot be
objects of the handle class. Therefore, the function accepts the x, y and z locations of the
point cloud as an input. From the stored point cloud, this information can be extracted
using the Location property of the pointCloud object. This information is also directly
available as the raw data from the lidar sensor.
% Input lists
inputExample = {lidarData{1}.Location, 0};
% Reset time
time = 0;
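A sketch of the build step; with default naming, the generated MEX function is the mexLidarTracker_mex called in the loop below:

codegen -config:mex mexLidarTracker -args inputExample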
for i = 1:numel(lidarData)
time = time + dT;
currentLidar = lidarData{i};
[detectionsMex,obstacleIndicesMex,groundIndicesMex,croppedIndicesMex,...
    confirmedTracksMex, modelProbsMex] = mexLidarTracker_mex(currentLidar.Location,time);
disp(isequal(numTracks(:,1),numTracks(:,2)));
Notice that the number of confirmed tracks is the same for MATLAB and MEX code
execution. This assures that the lidar preprocessing and tracking algorithm returns the
same results with generated C code as with the MATLAB code.
Results
Now, analyze different events in the scenario and understand how the combination of
lidar measurement model, joint probabilistic data association, and interacting multiple
model filter, helps achieve a good estimation of the vehicle tracks.
Track Maintenance
The animation above shows the simulation between time = 3 seconds and time = 16
seconds. Notice that tracks such as T9 and T6 maintain their IDs and trajectory during
the time span. However, track T10 is lost because the tracked vehicle was missed (not
detected) for a long time by the sensor. Also, notice that the tracked objects are able to
maintain their shape and kinematic center by positioning the detections onto the visible
portions of the vehicles. For example, as Track T7 moves forward, bounding box
detections start to fall on its visible rear portion and the track maintains the actual size of
the vehicle. This illustrates the offset and shrinkage effect modeled in the measurement
functions.
Capturing Maneuvers
The animation shows that using an IMM filter helps the tracker to maintain tracks on
maneuvering vehicles. Notice that the vehicle tracked by T4 changes lanes behind the
ego vehicle. The tracker is able to maintain a track on the vehicle during this maneuvering
event. Also notice in the display that its probability of following the constant turn model,
denoted by ct, increases during the lane change maneuver.
This animation shows that using a joint probabilistic data association tracker helps in
maintaining tracks during ambiguous situations. Here, vehicles tracked by T44 and T97,
have a low probability of detection due to their large distance from the sensor. Notice that
the tracker is able to maintain tracks during events when one of the vehicles is not
detected. During the event, the tracks first coalesce, which is a known phenomenon in
JPDA, and then separate as soon as the vehicle is detected again.
Summary
This example showed how to use a JPDA tracker with an IMM filter to track objects using
a lidar sensor. You learned how a raw point cloud can be preprocessed to generate
detections for conventional trackers, which assume one detection per object per sensor
scan. You also learned how to define a cuboid model to describe the kinematics,
dimensions, and measurements of extended objects being tracked by the JPDA tracker. In
addition, you generated C code from the algorithm and verified its execution results with
the MATLAB simulation.
Supporting Files
helperLidarModel
This function defines the lidar model to simulate shrinkage of the bounding box
measurement and center-point offset. This function is used in the helperCvmeasCuboid
and helperCtmeasCuboid functions to obtain bounding box measurement from the
state.
% Shrink rate
s = 3/50; % 3 meters radial length at 50 meters.
sz = 2/50; % 2 meters height at 50 meters.
az = az - deg2rad(yaw);
% Shrink height.
Hshrink = min(H,sz*r);
Hs = H - Hshrink;
% Measurement format
meas = [x;y;z;Lmeas;Wmeas;Hs];
end
helperInverseLidarModel
This function defines the inverse lidar model to initiate a tracking filter using a lidar
bounding box measurement. This function is used in the helperInitIMMFilter
function to obtain state estimates from a bounding box measurement.
% Shrink rate.
s = 3/50;
sz = 2/50;
[az,~,r] = cart2sph(x,y,z);
shiftX = Lshrink;
shiftY = Wshrink;
shiftZ = Hshrink;
x = x + sign(x).*shiftX/2;
y = y + sign(y).*shiftY/2;
z = z + sign(z).*shiftZ/2;
pos = [x;y;z];
posCov = measCov(1:3,1:3,:);
yaw = zeros(1,numel(x),'like',x);
yawCov = ones(1,1,numel(x),'like',x);
HelperBoundingBoxDetector
% Cropping properties
properties
XLimits = [-70 70];
YLimits = [-6 6];
ZLimits = [-2 10];
end
properties
MeasurementNoise = blkdiag(eye(3),eye(3));
end
methods
function obj = HelperBoundingBoxDetector(varargin)
setProperties(obj,nargin,varargin{:})
end
end
xMin = min(thisPointData(:,1));
xMax = max(thisPointData(:,1));
yMin = min(thisPointData(:,2));
yMax = max(thisPointData(:,2));
zMin = min(thisPointData(:,3));
zMax = max(thisPointData(:,3));
l = (xMax - xMin);
w = (yMax - yMin);
h = (zMax - zMin);
x = (xMin + xMax)/2;
y = (yMin + yMax)/2;
z = (zMin + zMax)/2;
bboxes(:,i) = [x y z l w h]';
isValidCluster(i) = l < 20; % max length of 20 meters
end
end
bboxes = bboxes(:,isValidCluster);
end
ptCloudOut = select(ptCloudIn,indices);
end
mexLidarTracker
This function implements the point cloud preprocessing and the tracking algorithm using
a functional interface for code generation.
function [detections,obstacleIndices,groundIndices,croppedIndices,...
confirmedTracks, modelProbs] = mexLidarTracker(ptCloudLocations,time)
filterInitFcn = @helperInitIMMFilter;
tracker = trackerJPDA('FilterInitializationFcn',filterInitFcn,...
'TrackLogic','History',...
'AssignmentThreshold',assignmentGate,...
'ClutterDensity',Kc,...
'ConfirmationThreshold',confThreshold,...
'DeletionThreshold',delThreshold,...
'HasDetectableTrackIDsInput',true,...
'InitializationThreshold',0,...
'MaxNumTracks',30);
detectableTracksInput = zeros(tracker.MaxNumTracks,2);
currentNumTracks = 0;
end
ptCloud = pointCloud(ptCloudLocations);
% Detector model
[detections,obstacleIndices,groundIndices,croppedIndices] = detectorModel(ptCloud,time);
% Call tracker
[confirmedTracks,~,allTracks] = tracker(detections,time,detectableTracksInput(1:currentNumTracks,:));
% Update the detectability input
currentNumTracks = numel(allTracks);
detectableTracksInput(1:currentNumTracks,:) = helperCalcDetectability(allTracks,[1 3 6]);
end
helperCalcDetectability
The function calculates the probability of detection for each track. This function is used to
generate the "DetectableTrackIDs" input for the trackerJPDA.
% The bounding box detector has a low probability of segmenting point clouds
% into bounding boxes at distances greater than 40 meters. This function
% models this effect using a state-dependent probability of detection for
% each track. Beyond a maximum range, the Pd is set to a high value to
% enable deletion of tracks at a faster rate.
if isempty(tracks)
detectableTracksInput = zeros(0,2);
return;
end
rMax = 75;
rAmbig = 40;
stateSize = numel(tracks(1).State);
posSelector = zeros(3,stateSize);
posSelector(1,posIndices(1)) = 1;
posSelector(2,posIndices(2)) = 1;
posSelector(3,posIndices(3)) = 1;
pos = getTrackPositions(tracks,posSelector);
if coder.target('MATLAB')
trackIDs = [tracks.TrackID];
else
trackIDs = zeros(1,numel(tracks),'uint32');
for i = 1:numel(tracks)
trackIDs(i) = tracks(i).TrackID;
end
end
[~,~,r] = cart2sph(pos(:,1),pos(:,2),pos(:,3));
probDetection = 0.9*ones(numel(tracks),1);
probDetection(r > rAmbig) = 0.4;
probDetection(r > rMax) = 0.99;
detectableTracksInput = [double(trackIDs(:)) probDetection(:)];
end
loadLidarAndImageData
Stitches lidar and camera data for processing using the specified initial and final times.
function [lidarData,imageData] = loadLidarAndImageData(initTime,finalTime)
initFrame = max(1,floor(initTime*10));
lastFrame = min(350,ceil(finalTime*10));
load ('imageData_35seconds.mat','allImageData');
imageData = allImageData(initFrame:lastFrame);
counter = 1;
for i = initFileIndex:lastFileIndex
startFrame = frameIndices(counter);
endFrame = frameIndices(counter + 1) - 1;
load(['lidarData_',num2str(i)],'currentLidarData');
lidarData(startFrame:endFrame) = currentLidarData(1:(endFrame + 1 - startFrame));
counter = counter + 1;
end
end
References
[1] Arya Senna Abdul Rachman. "3D-LIDAR Multi Object Tracking for Autonomous
Driving: Multi-target Detection and Tracking under Urban Road Uncertainties." (2017).
Object Detection Using YOLO v2 Deep Learning
Overview
Deep learning is a powerful machine learning technique that automatically learns image
features required for detection tasks. There are several techniques for object detection
using deep learning such as Faster R-CNN and you only look once (YOLO) v2. This
example trains YOLO v2, which is an efficient deep learning object detector.
Note: This example requires Computer Vision Toolbox™ and Deep Learning Toolbox™.
Parallel Computing Toolbox™ is recommended to train the detector using a CUDA-
capable NVIDIA™ GPU with compute capability 3.0.
This example uses a pretrained detector to allow the example to run without having to
wait for training to complete. If you want to train the detector with the
trainYOLOv2ObjectDetector function, set the doTraining variable to true.
Otherwise, download the pretrained detector.
doTraining = false;
if ~doTraining && ~exist('yolov2ResNet50VehicleExample.mat','file')
% Download pretrained detector.
disp('Downloading pretrained detector (98 MB)...');
pretrainedURL = 'https://www.mathworks.com/supportfiles/vision/data/yolov2ResNet50V
websave('yolov2ResNet50VehicleExample.mat',pretrainedURL);
end
Load Dataset
This example uses a small vehicle data set that contains 295 images. Each image contains
one or two labeled instances of a vehicle. A small data set is useful for exploring the
YOLO v2 training procedure, but in practice, more labeled images are needed to train a
robust detector.
% Unzip vehicle dataset images.
unzip vehicleDatasetImages.zip
The training data is stored in a table. The first column contains the path to the image
files. The remaining columns contain the ROI labels for vehicles.
ans=4×2 table
imageFilename vehicle
_______________________________ ____________
Display one of the images from the data set to understand the type of images it contains.
Split the data set into a training set for training the detector, and a test set for evaluating
the detector. Select 60% of the data for training. Use the rest for evaluation.
The YOLO v2 object detection network can be thought of as having two sub-networks: a
feature extraction network followed by a detection network.
The feature extraction network is typically a pretrained CNN (see “Pretrained Deep
Neural Networks” (Deep Learning Toolbox) for more details). This example uses ResNet-50
for feature extraction.
First, specify the image input size and the number of classes. The image input size should
be at least as big as the images in the training image set. In this example, the images are
224-by-224 RGB images.
Next, specify the size of the anchor boxes. The anchor boxes should be selected based on
the scale and size of objects in the training data. You can use “Estimate Anchor Boxes
Using Clustering” on page 1-39 to determine a good set of anchor boxes based on the training
data. Using this procedure, the anchor boxes for the vehicle dataset are:
anchorBoxes = [
43 59
18 22
23 29
84 109
];
See “Anchor Boxes for Object Detection” on page 7-9 for additional details.
Finally, specify the network and feature extraction layer within that network to use as the
basis of YOLO v2.
The choice of feature extraction layer is a trade-off between spatial resolution and the
strength of the extracted features (features extracted further
down the network encode stronger image features at the cost of spatial resolution).
Choosing the optimal feature extraction layer requires empirical analysis and is another
hyperparameter to tune.
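With those pieces chosen, the network can be assembled with yolov2Layers. In this sketch, the feature extraction network and feature layer names are assumptions (ResNet-50 with its 'activation_40_relu' layer), and inputSize and numClasses are assumed to be defined as described above:

featureExtractionNetwork = resnet50;
featureLayer = 'activation_40_relu';
lgraph = yolov2Layers(inputSize,numClasses,anchorBoxes,...
    featureExtractionNetwork,featureLayer);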
Note that you can also create a custom YOLO v2 network layer by layer. For more
information, see “Design a YOLO v2 Detection Network” on page 7-18.
if doTraining
Note: This example was verified on an NVIDIA™ Titan X with 12 GB of GPU memory. If your
GPU has less memory, you may run out of memory. If this happens, lower the
'MiniBatchSize' using the trainingOptions function. Training this network took
approximately 5 minutes using this setup. Training time varies depending on the
hardware you use.
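Inside the doTraining branch, the training step looks roughly like the following sketch (all option values here are placeholders):

options = trainingOptions('sgdm',...
    'MiniBatchSize',16,...
    'InitialLearnRate',1e-3,...
    'MaxEpochs',20,...
    'Shuffle','every-epoch');
[detector,info] = trainYOLOv2ObjectDetector(trainingData,lgraph,options);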
Evaluate the detector on a large set of images to measure the trained detector's
performance. Computer Vision Toolbox™ provides object detector evaluation functions to
measure common metrics such as average precision (evaluateDetectionPrecision)
and log-average miss rates (evaluateDetectionMissRate). Here, the average
precision metric is used. The average precision provides a single number that
incorporates the ability of the detector to make correct classifications (precision) and the
ability of the detector to find all relevant objects (recall).
The first step for detector evaluation is to collect the detection results by running the
detector on the test set.
% Create a table to hold the bounding boxes, scores, and labels output by
% the detector.
numImages = height(testData);
results = table('Size',[numImages 3],...
'VariableTypes',{'cell','cell','cell'},...
'VariableNames',{'Boxes','Scores','Labels'});
% Run detector on each image in the test set and collect results.
for i = 1:numImages
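    % Sketch of the loop body (assumes the image file names are in the
    % imageFilename column and the trained detector is named detector).
    I = imread(testData.imageFilename{i});
    % Run the detector.
    [bboxes,scores,labels] = detect(detector,I);
    % Collect the results.
    results.Boxes{i} = bboxes;
    results.Scores{i} = scores;
    results.Labels{i} = labels;
end

% Evaluate the object detector using the average precision metric (the
% ground truth boxes are in the remaining columns of the test table).
[ap,recall,precision] = evaluateDetectionPrecision(results,testData(:,2:end));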
The precision/recall (PR) curve highlights how precise a detector is at varying levels of
recall. Ideally, the precision would be 1 at all recall levels. The use of additional layers in
the network can help improve the average precision, but might require additional training
data and longer training time.
% Plot precision/recall curve
plot(recall,precision)
xlabel('Recall')
ylabel('Precision')
grid on
title(sprintf('Average Precision = %.2f', ap))
Code Generation
Once the detector is trained and evaluated, you can generate code for the
yolov2ObjectDetector using GPU Coder™. See “Code Generation for Object
Detection Using YOLO v2” (GPU Coder) example for more details.
Summary
This example showed how to train a vehicle detector using deep learning. You can follow
similar steps to train detectors for traffic signs, pedestrians, or other objects.
To learn more about deep learning, see “Object Detection using Deep Learning”.
References
[1] Redmon, Joseph, and Ali Farhadi. "YOLO9000: Better, Faster, Stronger." 2017 IEEE
Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2017.
Estimate Anchor Boxes Using Clustering
Anchor boxes are important parameters of deep learning object detectors such as Faster
R-CNN and YOLO v2. The shape, scale, and number of anchor boxes impact the efficiency
and accuracy of the detectors. This example shows how to estimate anchor boxes from a
vehicle detector training dataset using the K-Medoids clustering algorithm.
See “Anchor Boxes for Object Detection” on page 7-9 to learn more about anchor
boxes.
Load the vehicle dataset, which contains 295 images and associated box labels.
Variables:
Visualize the labeled boxes to better understand the range of object sizes present in the
dataset.
figure
scatter(area,aspectRatio)
xlabel("Box Area")
ylabel("Aspect Ratio (width/height)");
title("Box area vs. Aspect ratio")
Visually, you see a few groups of objects that are of similar size and shape, but the groups
are spread out. This makes it difficult to manually choose anchor boxes. A better way to
estimate anchor boxes is to use a clustering algorithm that can group similar boxes
together using a meaningful metric.
Cluster the boxes using the kmedoids function with custom intersection-over-union (IoU)
distance metric. Other clustering functions such as clusterdata or dbscan may also be
used.
A distance metric based on IoU is invariant to the size of boxes, unlike the Euclidean
distance metric, which produces larger errors as the box sizes increase [1]. In addition,
an IoU distance metric leads to boxes of similar aspect ratio and sizes being clustered
together, which results in anchor box estimates that fit the data. The IoU distance metric
is implemented in the supporting function, iouDistanceMetric.
Select the number of anchors and estimate the anchor boxes using kmedoids. The
cluster centers returned by kmedoids are the anchor box estimates.
% Select the number of anchor boxes.
numAnchors = 4;
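% Estimate the anchor boxes with k-medoids clustering using the custom IoU
% distance (a sketch; boxes is assumed to be an M-by-2 matrix of
% [width height] values collected from the labeled data).
[clusterAssignments,anchorBoxes,sumd] = kmedoids(boxes,numAnchors,...
    'Distance',@iouDistanceMetric);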
% Display estimated anchor boxes. The box format is the [width height].
anchorBoxes
anchorBoxes = 4×2
59 43
22 18
29 23
109 84
Choosing the number of anchors is another training hyperparameter that requires careful
selection using empirical analysis. One quality measure for judging the estimated anchor
boxes is the mean IoU of the boxes in each cluster. Calculate this using the cluster
assignments produced by kmedoids.
% Count the number of boxes per cluster. Exclude the cluster center while
% counting.
counts = accumarray(clusterAssignments,ones(length(clusterAssignments),1),[],@(x)sum(x)-1);
% Compute the mean IoU within each cluster.
meanIoU = mean(1 - sumd./counts);
meanIoU = 0.8244
The mean IoU should be greater than 0.5 to ensure anchor boxes overlap well with the
boxes in the training data. Increasing the number of anchors may improve the mean IoU
measure. However, using more anchor boxes in an object detector may increase the
computation cost and lead to overfitting, which results in poor detector performance.
Sweep over a range of values and plot the mean IoU versus number of anchor boxes to
measure the trade-off between the number of anchors and the mean IoU.
maxNumAnchors = 15;
for k = 1:maxNumAnchors
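    % Sketch of the loop body: cluster with k anchors and record the mean
    % IoU (the exact bookkeeping here is an assumption).
    [clusterAssignments,~,sumd] = kmedoids(boxes,k,'Distance',@iouDistanceMetric);
    counts = accumarray(clusterAssignments,ones(length(clusterAssignments),1),[],@(x)sum(x)-1);
    meanIoU(k) = mean(1 - sumd./counts);
end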
figure
plot(1:maxNumAnchors, meanIoU,'-o')
ylabel("Mean IoU")
xlabel("Number of Anchors")
title("Number of Anchors vs. Mean IoU")
Two anchor boxes provide a mean IoU above 0.7 and there is marginal improvement in
mean IoU beyond 6 anchor boxes. Given these results, the next step is to train and
evaluate multiple object detectors using values between 2 and 6. This empirical analysis
helps determine the number of anchor boxes required to satisfy application performance
requirements such as detection speed or accuracy.
References
1 Redmon, Joseph, and Ali Farhadi. "YOLO9000: Better, Faster, Stronger." 2017 IEEE
Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2017.
Supporting Functions
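The clustering above relies on the custom IoU distance. A minimal sketch of iouDistanceMetric, assuming each row is a [width height] pair and all boxes are anchored at a common corner (the implementation shipped with the example may differ in detail):

function dist = iouDistanceMetric(boxWH,allBoxesWH)
% Distance is 1 - IoU between boxWH and every row of allBoxesWH, with each
% [width height] pair treated as a box anchored at the same corner.
n = size(allBoxesWH,1);
dist = 1 - bboxOverlapRatio([1 1 boxWH],[ones(n,2) allBoxesWH])';
end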
Create YOLO v2 Object Detection Network
The procedure to convert a pretrained network into a YOLO v2 network is similar to the
transfer learning procedure for image classification.
You can also implement this procedure using the deepNetworkDesigner app.
Load a pretrained MobileNet v2 network using mobilenetv2. This requires the Deep
Learning Toolbox Model for MobileNet v2 Network™.
Change the image size of the network based on the training data requirements. To
illustrate this step, assume the required image size is [300 300 3] for RGB images.
% Create new image input layer. Set the new layer name
% to the original layer name.
imgLayer = imageInputLayer(imageInputSize,"Name","input_1")
imgLayer =
ImageInputLayer with properties:
Name: 'input_1'
InputSize: [300 300 3]
Hyperparameters
DataAugmentation: 'none'
Normalization: 'zerocenter'
AverageImage: []
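To apply the new input size, swap the new layer into the network's layer graph (a sketch; replaceLayer and layerGraph are applied here to the pretrained network returned by mobilenetv2):

net = mobilenetv2;
lgraph = replaceLayer(layerGraph(net),'input_1',imgLayer);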
A good feature extraction layer for YOLO v2 is one where the output feature width and
height are between 8 and 16 times smaller than the input image. This level of
downsampling is a trade-off between spatial resolution and quality of output features. The
analyzeNetwork app or deepNetworkDesigner app can be used to determine the
output sizes of layers within a network. Note that selecting an optimal feature extraction
layer requires empirical evaluation.
Set the feature extraction layer to “block_12_add” from MobileNet v2. Because the
required input size was previously set to [300 300], the output feature size is [19 19]. This
results in a downsampling factor of about 16.
featureExtractionLayer = "block_12_add";
To easily remove layers from a deep network, such as MobileNet v2, use the
deepNetworkDesigner app. Import the network into the app to manually remove the
layers after "block_12_add". Export the modified network to your workspace. This
example uses a pre-saved version of MobileNet v2 which was exported from the app.
Alternatively, if you have a list of layers to remove, you can use the removeLayers
function to remove them manually.
The detection subnetwork consists of groups of serially connected convolution, ReLU, and
batch normalization layers. These layers are followed by a yolov2TransformLayer and a
yolov2OutputLayer.
Create the convolution, ReLU, and batch normalization portion of the detection sub-
network.
detectionLayers = [
    % group 1
    convolution2dLayer(filterSize,numFilters,"Name","yolov2Conv1",...
        "Padding", "same", "WeightsInitializer",@(sz)randn(sz)*0.01)
    batchNormalizationLayer("Name","yolov2Batch1");
    reluLayer("Name","yolov2Relu1");
    % group 2
    convolution2dLayer(filterSize,numFilters,"Name","yolov2Conv2",...
        "Padding", "same", "WeightsInitializer",@(sz)randn(sz)*0.01)
    batchNormalizationLayer("Name","yolov2Batch2");
    reluLayer("Name","yolov2Relu2");
    ]
detectionLayers =
6x1 Layer array with layers:
The remaining layers are configured based on application specific details such as number
of object classes and anchor boxes.
detectionLayers =
9x1 Layer array with layers:
Use analyzeNetwork(lgraph) to check the network and then train a YOLO v2 object
detector using the trainYOLOv2ObjectDetector function.
Semantic Segmentation Using Dilated Convolutions
Semantic segmentation networks like DeepLab [1] make extensive use of dilated
convolutions (also known as atrous convolutions) because they can increase the receptive
field of the layer (the area of the input which the layers can see) without increasing the
number of parameters or computations.
The example uses a simple dataset of 32x32 triangle images for illustration purposes. The
dataset includes accompanying pixel label ground truth data. Load the training data using
an imageDatastore and a pixelLabelDatastore.
dataFolder = fullfile(toolboxdir('vision'),'visiondata','triangleImages');
imageFolderTrain = fullfile(dataFolder,'trainingImages');
labelFolderTrain = fullfile(dataFolder,'trainingLabels');
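% Create the datastores. The class names and pixel label IDs used here are
% assumptions for the triangleImages data set (triangle pixels labeled 255,
% background pixels labeled 0).
imdsTrain = imageDatastore(imageFolderTrain);
classNames = ["triangle","background"];
labels = [255 0];
pxdsTrain = pixelLabelDatastore(labelFolderTrain,classNames,labels)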
pxdsTrain =
PixelLabelDatastore with properties:
Create a data source for training data and get the pixel counts for each label.
pximdsTrain = pixelLabelImageDatastore(imdsTrain,pxdsTrain);
tbl = countEachLabel(pximdsTrain)
tbl=2×3 table
Name PixelCount ImagePixelCount
____________ __________ _______________
The majority of pixel labels are for background. This class imbalance biases the learning
process in favor of the dominant class. To fix this, use class weighting to balance the
classes. There are several methods for computing class weights. One common method is
inverse frequency weighting where the class weights are the inverse of the class
frequencies. This increases weight given to under-represented classes. Calculate the class
weights using inverse frequency weighting.
numberPixels = sum(tbl.PixelCount);
frequency = tbl.PixelCount / numberPixels;
classWeights = 1 ./ frequency;
Create a network for pixel classification with an image input layer with an input size
corresponding to the size of the input images. Next, specify three blocks of convolution,
batch normalization, and ReLU layers. For each convolutional layer, specify 32 3-by-3
filters with increasing dilation factors and specify to pad the inputs to be the same size as
the outputs by setting the 'Padding' option to 'same'. To classify the pixels, include a
convolutional layer with K 1-by-1 convolutions, where K is the number of classes, followed
by a softmax layer and a pixelClassificationLayer with the inverse class weights.
inputSize = [32 32 1];
filterSize = 3;
numFilters = 32;
numClasses = numel(classNames);
layers = [
imageInputLayer(inputSize)
convolution2dLayer(filterSize,numFilters,'DilationFactor',1,'Padding','same')
batchNormalizationLayer
reluLayer
convolution2dLayer(filterSize,numFilters,'DilationFactor',2,'Padding','same')
batchNormalizationLayer
reluLayer
convolution2dLayer(filterSize,numFilters,'DilationFactor',4,'Padding','same')
batchNormalizationLayer
reluLayer
convolution2dLayer(1,numClasses)
softmaxLayer
pixelClassificationLayer('Classes',classNames,'ClassWeights',classWeights)];
Train Network
Specify the training options. Use the SGDM solver and train for 100 epochs with a
mini-batch size of 64 and a learning rate of 0.001.
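A sketch of those options (the solver, number of epochs, mini-batch size, and learning rate follow the values stated above):

options = trainingOptions('sgdm',...
    'MaxEpochs',100,...
    'MiniBatchSize',64,...
    'InitialLearnRate',1e-3);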
net = trainNetwork(pximdsTrain,layers,options);
Test Network
Load the test data. Create an image datastore for the images. Create a
pixelLabelDatastore for the ground truth pixel labels.
imageFolderTest = fullfile(dataFolder,'testImages');
imdsTest = imageDatastore(imageFolderTest);
labelFolderTest = fullfile(dataFolder,'testLabels');
pxdsTest = pixelLabelDatastore(labelFolderTest,classNames,labels);
pxdsPred = semanticseg(imdsTest,net,'WriteLocation',tempdir);
metrics = evaluateSemanticSegmentation(pxdsPred,pxdsTest);
imgTest = imread('triangleTest.jpg');
figure
imshow(imgTest)
Segment the test image using semanticseg and display the results using
labeloverlay.
C = semanticseg(imgTest,net);
B = labeloverlay(imgTest,C);
figure
imshow(B)
References
1 Chen, Liang-Chieh, George Papandreou, Iasonas Kokkinos, Kevin Murphy, and Alan L.
Yuille. "Deeplab: Semantic image segmentation with deep convolutional nets, atrous
convolution, and fully connected crfs." IEEE transactions on pattern analysis and
machine intelligence 40, no. 4 (2018): 834-848.
Define Custom Pixel Classification Layer with Dice Loss
This layer can be used to train semantic segmentation networks. To learn more about
creating custom deep learning layers, see “Define Custom Deep Learning Layers” (Deep
Learning Toolbox).
Dice Loss
The Dice loss is based on the Sørensen-Dice similarity coefficient for measuring overlap
between two segmented images. The generalized Dice loss [1,2], L, for between one
image Y and the corresponding ground truth T is given by
L = 1 - \frac{2\sum_{k=1}^{K} w_k \sum_{m=1}^{M} Y_{km} T_{km}}{\sum_{k=1}^{K} w_k \sum_{m=1}^{M} Y_{km}^2 + T_{km}^2},
where K is the number of classes, M is the number of elements along the first two
dimensions of Y, and w_k is a class-specific weighting factor that controls the contribution
each class makes to the loss. w_k is typically the inverse area of the expected region:

w_k = \frac{1}{\left(\sum_{m=1}^{M} T_{km}\right)^2}
This weighting helps counter the influence of larger regions on the Dice score making it
easier for the network to learn how to segment smaller regions.
Copy the classification layer template into a new file in MATLAB®. This template outlines
the structure of a classification layer and includes the functions that define the layer
behavior. The rest of the example shows how to complete the
dicePixelClassificationLayer.
properties
% Optional properties
end
methods
• Name – Layer name, specified as a character vector or a string scalar. To include this
layer in a layer graph, you must specify a nonempty unique layer name. If you train a
series network with this layer and Name is set to '', then the software automatically
assigns a name at training time.
• Description – One-line description of the layer, specified as a character vector or a
string scalar. This description appears when the layer is displayed in a Layer array. If
you do not specify a layer description, then the software displays the layer class name.
• Type – Type of the layer, specified as a character vector or a string scalar. The value of
Type appears when the layer is displayed in a Layer array. If you do not specify a
layer type, then the software displays 'Classification layer' or 'Regression
layer'.
• Classes – Classes of the output layer, specified as a categorical vector, string array,
cell array of character vectors, or 'auto'. If Classes is 'auto', then the software
automatically sets the classes at training time. If you specify a string array or cell
array of character vectors str, then the software sets the classes of the output layer
to categorical(str,str). The default value is 'auto'.
If the layer has no other properties, then you can omit the properties section.
The Dice loss requires a small constant value to prevent division by zero. Specify the
property, Epsilon, to hold this value.
properties(Constant)
% Small constant to prevent division by zero.
Epsilon = 1e-8;
end
...
end
Create the function that constructs the layer and initializes the layer properties. Specify
any variables required to create the layer as inputs to the constructor function.
Specify an optional input argument name to assign to the Name property at creation.
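A minimal sketch of such a constructor (the description text matches the layer display shown later; the optional-argument handling is an assumption):

        function layer = dicePixelClassificationLayer(name)
            % Set the optional layer name.
            if nargin == 1
                layer.Name = name;
            end
            % Set the layer description.
            layer.Description = 'Dice loss';
        end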
Create a function named forwardLoss that returns the Dice loss
between the predictions made by the network and the training targets. The syntax for
forwardLoss is loss = forwardLoss(layer, Y, T), where Y is the output of the
previous layer and T represents the training targets.
The size of Y depends on the output of the previous layer. To ensure that Y is the same
size as T, you must include a layer that outputs the correct size before the output layer.
For example, to ensure that Y is a 4-D array of prediction scores for K classes, you can
include a fully connected layer of size K or a convolutional layer with K filters followed by
a softmax layer before the output layer.
intersection = sum(sum(Y.*T,1),2);
union = sum(sum(Y.^2 + T.^2, 1),2);
end
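Putting the pieces together, a sketch of forwardLoss that follows the generalized Dice loss defined above (the per-class weights use the inverse squared region size, with Epsilon guarding against division by zero):

        function loss = forwardLoss(layer, Y, T)
            % Weights by inverse of region size.
            W = 1 ./ (sum(sum(T,1),2).^2 + layer.Epsilon);
            intersection = sum(sum(Y.*T,1),2);
            union = sum(sum(Y.^2 + T.^2, 1),2);
            % Per-image Dice score, summed over the K classes.
            numer = 2*sum(W.*intersection,3);
            denom = sum(W.*union,3) + layer.Epsilon;
            dice = numer./denom;
            % Return the average Dice loss over the mini-batch.
            N = size(Y,4);
            loss = sum(1 - dice)/N;
        end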
Create the backward loss function that returns the derivatives of the Dice loss with
respect to the predictions Y. The syntax for backwardLoss is loss =
backwardLoss(layer, Y, T), where Y is the output of the previous layer and T
represents the training targets.
intersection = sum(sum(Y.*T,1),2);
union = sum(sum(Y.^2 + T.^2, 1),2);
N = size(Y,4);
Completed Layer
properties(Constant)
% Small constant to prevent division by zero.
Epsilon = 1e-8;
end
methods
intersection = sum(sum(Y.*T,1),2);
union = sum(sum(Y.^2 + T.^2, 1),2);
end
intersection = sum(sum(Y.*T,1),2);
union = sum(sum(Y.^2 + T.^2, 1),2);
N = size(Y,4);
GPU Compatibility
For GPU compatibility, the layer functions must support inputs and return outputs of type
gpuArray. Any other functions used by the layer must do the same.
layer = dicePixelClassificationLayer('dice');
Check the validity of the layer using checkLayer. Specify the valid input size to be the
size of a single observation of typical input to the layer. The layer expects an
H-by-W-by-K-by-N array input, where K is the number of classes, and N is the number of
observations in the mini-batch.
numClasses = 2;
validInputSize = [4 4 numClasses];
checkLayer(layer,validInputSize, 'ObservationDimension',4)
Running nnet.checklayer.OutputLayerTestCase
.......... .......
Done nnet.checklayer.OutputLayerTestCase
__________
Test Summary:
17 Passed, 0 Failed, 0 Incomplete, 0 Skipped.
Time elapsed: 1.6227 seconds.
The test summary reports the number of passed, failed, incomplete, and skipped tests.
layers =
10x1 Layer array with layers:
4 '' Max Pooling 2x2 max pooling with stride [2 2] and paddi
5 '' Convolution 64 3x3 convolutions with stride [1 1] and p
6 '' ReLU ReLU
7 '' Transposed Convolution 64 4x4 transposed convolutions with stride [
8 '' Convolution 2 1x1 convolutions with stride [1 1] and pa
9 '' Softmax softmax
10 'dice' Classification Output Dice loss
dataSetDir = fullfile(toolboxdir('vision'),'visiondata','triangleImages');
imageDir = fullfile(dataSetDir,'trainingImages');
labelDir = fullfile(dataSetDir,'trainingLabels');
imds = imageDatastore(imageDir);
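% Create the pixel label datastore for the ground truth labels (the class
% names and label IDs here are assumptions for the triangleImages data set).
classNames = ["triangle","background"];
labelIDs = [255 0];
pxds = pixelLabelDatastore(labelDir,classNames,labelIDs);
% Placeholder training options; tune these values for your own training run.
options = trainingOptions('sgdm','InitialLearnRate',1e-3,'MaxEpochs',100);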
ds = pixelLabelImageDatastore(imds,pxds);
net = trainNetwork(ds,layers,options);
Evaluate the trained network by segmenting a test image and displaying the
segmentation result.
I = imread('triangleTest.jpg');
[C,scores] = semanticseg(I,net);
B = labeloverlay(I,C);
figure
imshow(imtile({I,B}))
References
1 Crum, William R., Oscar Camara, and Derek LG Hill. "Generalized overlap measures
for evaluation and validation in medical image analysis." IEEE transactions on
medical imaging 25.11 (2006): 1451-1461.
2 Sudre, Carole H., et al. "Generalised Dice overlap as a deep learning loss function for
highly unbalanced segmentations." Deep Learning in Medical Image Analysis and
Multimodal Learning for Clinical Decision Support. Springer, Cham, 2017. 240-248.
Read and Play a Video File
videoFReader = vision.VideoFileReader('ecolicells.avi');
videoPlayer = vision.VideoPlayer;
Use a while loop to read and play the video frames. Pause for 0.1 seconds after displaying
each frame.
while ~isDone(videoFReader)
videoFrame = videoFReader();
videoPlayer(videoFrame);
pause(0.1)
end
release(videoPlayer);
release(videoFReader);
Find Vertical and Horizontal Edges in Image
I = imread('pout.tif');
intImage = integralImage(I);
Construct Haar-like wavelet filters. Use the dot notation to find the vertical filter from the
horizontal filter.
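A sketch of that construction (the bounding box and weights of the horizontal filter are assumptions):

horiH = integralKernel([1 1 4 3; 1 4 4 3],[-1 1]);  % horizontal edge filter
vertH = horiH.';                                    % transpose gives the vertical filter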
vertH =
integralKernel with properties:
imtool(horiH.Coefficients, 'InitialMagnification','fit');
horiResponse = integralFilter(intImage,horiH);
vertResponse = integralFilter(intImage,vertH);
figure;
imshow(horiResponse,[]);
title('Horizontal edge responses');
figure;
imshow(vertResponse,[]);
title('Vertical edge responses');
Blur an Image Using an Average Filter
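One way to do this is to filter an integral image with an averaging integralKernel (a sketch; the image name and the 7-by-7 kernel size are assumptions):

I = imread('pout.tif');
intImage = integralImage(I);
avgH = integralKernel([1 1 7 7],1/49);   % 7-by-7 box filter that averages
J = integralFilter(intImage,avgH);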
Cast the result back to the same class as the input image.
J = uint8(J);
figure
imshow(J);
Define a Filter to Approximate a Gaussian Second Order Partial Derivative in Y Direction
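One way to build this filter with only two overlapping bounding boxes (a sketch; the middle rows receive weight 1 - 3 = -2, matching the coefficients shown below):

ydH = integralKernel([1 1 5 9; 1 4 5 3],[1 -3]);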
You can also define this filter as integralKernel([1,1,5,3;1,4,5,3;1,7,5,3], [1, -2, 1]).
This filter definition is less efficient because it requires three bounding boxes.
ydH.Coefficients
ans = 9×5
1 1 1 1 1
1 1 1 1 1
1 1 1 1 1
-2 -2 -2 -2 -2
-2 -2 -2 -2 -2
-2 -2 -2 -2 -2
1 1 1 1 1
1 1 1 1 1
1 1 1 1 1
Find Corresponding Interest Points Between Pair of Images
I1 = rgb2gray(imread('viprectification_deskLeft.png'));
I2 = rgb2gray(imread('viprectification_deskRight.png'));
points1 = detectHarrisFeatures(I1);
points2 = detectHarrisFeatures(I2);
[features1,valid_points1] = extractFeatures(I1,points1);
[features2,valid_points2] = extractFeatures(I2,points2);
indexPairs = matchFeatures(features1,features2);
matchedPoints1 = valid_points1(indexPairs(:,1),:);
matchedPoints2 = valid_points2(indexPairs(:,2),:);
Visualize the corresponding points. You can see the effect of translation between the two
images despite several erroneous matches.
figure; showMatchedFeatures(I1,I2,matchedPoints1,matchedPoints2);
Find Corresponding Points Using SURF Features
I1 = imread('cameraman.tif');
I2 = imresize(imrotate(I1,-20),1.2);
points1 = detectSURFFeatures(I1);
points2 = detectSURFFeatures(I2);
[f1,vpts1] = extractFeatures(I1,points1);
[f2,vpts2] = extractFeatures(I2,points2);
indexPairs = matchFeatures(f1,f2) ;
matchedPoints1 = vpts1(indexPairs(:,1));
matchedPoints2 = vpts2(indexPairs(:,2));
Display the matching points. The data still includes several outliers, but you can see the
effects of rotation and scaling on the display of matched features.
figure; showMatchedFeatures(I1,I2,matchedPoints1,matchedPoints2);
legend('matched points 1','matched points 2');
Detect SURF Interest Points in a Grayscale Image
I = imread('cameraman.tif');
points = detectSURFFeatures(I);
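To visualize the results, you can plot the strongest points over the image (a sketch):

% Display the ten strongest SURF points.
imshow(I); hold on;
plot(points.selectStrongest(10));
hold off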
Using LBP Features to Differentiate Images by Texture
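Read the texture images (the image file names here are assumptions for the shipped example images):

brickWall = imread('bricks.jpg');
rotatedBrickWall = imread('bricksRotated.jpg');
carpet = imread('carpet.jpg');
figure
imshow(brickWall)
title('Bricks')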
figure
imshow(rotatedBrickWall)
title('Rotated Bricks')
figure
imshow(carpet)
title('Carpet')
Extract LBP features from the images to encode their texture information.
lbpBricks1 = extractLBPFeatures(brickWall,'Upright',false);
lbpBricks2 = extractLBPFeatures(rotatedBrickWall,'Upright',false);
lbpCarpet = extractLBPFeatures(carpet,'Upright',false);
Gauge the similarity between the LBP features by computing the squared error between
them.
brickVsBrick = (lbpBricks1 - lbpBricks2).^2;
brickVsCarpet = (lbpBricks1 - lbpCarpet).^2;
Visualize the squared error to compare bricks versus bricks and bricks versus carpet. The
squared error is smaller when images have similar texture.
figure
bar([brickVsBrick; brickVsCarpet]','grouped')
title('Squared Error of LBP Histograms')
xlabel('LBP Histogram Bins')
legend('Bricks vs Rotated Bricks','Bricks vs Carpet')
Extract and Plot HOG Features
img = imread('cameraman.tif');
[featureVector,hogVisualization] = extractHOGFeatures(img);
figure;
imshow(img);
hold on;
plot(hogVisualization);
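The image read and the call to ocr that produce ocrResults below are not shown in this excerpt. A minimal sketch, with the image file name as an assumption:
businessCard = imread('businessCard.png'); % file name is an assumption
ocrResults = ocr(businessCard)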
ocrResults =
ocrText with properties:
recognizedText = ocrResults.Text;
figure;
imshow(businessCard);
text(600, 150, recognizedText, 'BackgroundColor', [1 1 1]);
Recognize Text Within an Image
peopleDetector = vision.PeopleDetector('ClassificationThreshold',...
0,'MergeDetections',false);
Read an image, run the people detector, and then insert bounding boxes with confidence
scores.
I = imread('visionteam1.jpg');
[bbox,score] = step(peopleDetector,I);
I1 = insertObjectAnnotation(I,'rectangle',bbox,...
cellstr(num2str(score)),'Color','r');
[selectedBbox,selectedScore] = selectStrongestBbox(bbox,score);
I2 = insertObjectAnnotation(I,'rectangle',selectedBbox,...
cellstr(num2str(selectedScore)),'Color','r');
Run Nonmaximal Suppression on Bounding Boxes Using People Detector
Train Stop Sign Detector
load('stopSignsAndCars.mat');
Select the bounding boxes for stop signs from the table.
positiveInstances = stopSignsAndCars(:,1:2);
imDir = fullfile(matlabroot,'toolbox','vision','visiondata',...
'stopSignImages');
addpath(imDir);
negativeFolder = fullfile(matlabroot,'toolbox','vision','visiondata',...
'nonStopSigns');
negativeImages = imageDatastore(negativeFolder);
Train a cascade object detector called 'stopSignDetector.xml' using HOG features. NOTE:
The command can take several minutes to run.
trainCascadeObjectDetector('stopSignDetector.xml',positiveInstances, ...
negativeFolder,'FalseAlarmRate',0.1,'NumCascadeStages',5);
Training stage 1 of 5
[........................................................................]
Used 42 positive and 84 negative samples
Time to train stage 1: 1 seconds
Training stage 2 of 5
[........................................................................]
Used 42 positive and 84 negative samples
Time to train stage 2: 1 seconds
Training stage 3 of 5
[........................................................................]
Used 42 positive and 84 negative samples
Time to train stage 3: 5 seconds
Training stage 4 of 5
[........................................................................]
Used 42 positive and 84 negative samples
Time to train stage 4: 14 seconds
Training stage 5 of 5
[........................................................................]
Used 42 positive and 17 negative samples
Time to train stage 5: 23 seconds
Training complete
detector = vision.CascadeObjectDetector('stopSignDetector.xml');
img = imread('stopSignTest.jpg');
bbox = step(detector,img);
detectedImg = insertObjectAnnotation(img,'rectangle',bbox,'stop sign'); % annotate the detections (this step is implied by the display below)
figure; imshow(detectedImg);
rmpath(imDir);
Create System objects to read the video frames, detect foreground physical objects, and
display results.
videoReader = vision.VideoFileReader('singleball.mp4');
videoPlayer = vision.VideoPlayer('Position',[100,100,500,400]);
foregroundDetector = vision.ForegroundDetector('NumTrainingFrames',10,...
'InitialVariance',0.05);
blobAnalyzer = vision.BlobAnalysis('AreaOutputPort',false,...
'MinimumBlobArea',70);
Process each video frame to detect and track the ball. After reading the current video
frame, the example searches for the ball by using background subtraction and blob
analysis. When the ball is first detected, the example creates a Kalman filter. The Kalman
filter determines the ball's location, whether it is detected or not. If the ball is detected,
the Kalman filter first predicts its state at the current video frame. The filter then uses the
newly detected location to correct the state, producing a filtered location. If the ball is
missing, the Kalman filter solely relies on its previous state to predict the ball's current
location.
if ~isTrackInitialized
if isObjectDetected
kalmanFilter = configureKalmanFilter('ConstantAcceleration',...
detectedLocation(1,:), [1 1 1]*1e5, [25, 10, 10], 25);
isTrackInitialized = true;
end
label = ''; circle = zeros(0,3);
else
if isObjectDetected
predict(kalmanFilter);
trackedLocation = correct(kalmanFilter, detectedLocation(1,:));
label = 'Corrected';
else
trackedLocation = predict(kalmanFilter);
label = 'Predicted';
end
circle = [trackedLocation, 5];
end
colorImage = insertObjectAnnotation(colorImage,'circle',...
circle,label,'Color','red');
step(videoPlayer,colorImage);
end
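The loop structure and the detection steps that produce isTrackInitialized, isObjectDetected, detectedLocation, and colorImage are not shown in this excerpt. A minimal sketch of how they could be wired to the System objects created above (the exact preprocessing is an assumption):
isTrackInitialized = false;
while ~isDone(videoReader)
    colorImage = videoReader();
    % Background subtraction followed by blob analysis yields candidate locations.
    foregroundMask = foregroundDetector(rgb2gray(im2single(colorImage)));
    detectedLocation = blobAnalyzer(foregroundMask);
    isObjectDetected = size(detectedLocation,1) > 0;
    % ... the tracking, annotation, and display code shown above goes here ...
end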
Release resources.
release(videoPlayer);
release(videoReader);
Track a Face in Scene
videoFileReader = vision.VideoFileReader('visionface.avi');
videoPlayer = vision.VideoPlayer('Position',[100,100,680,520]);
Read the first video frame, which contains the object, and define the object region.
objectFrame = videoFileReader();
objectRegion = [264,122,93,93];
As an alternative, you can use the following commands to select the object region using a
mouse. The object must occupy the majority of the region:
figure; imshow(objectFrame);
objectRegion=round(getPosition(imrect))
objectImage = insertShape(objectFrame,'Rectangle',objectRegion,'Color','red');
figure;
imshow(objectImage);
title('Red box shows object region');
points = detectMinEigenFeatures(rgb2gray(objectFrame),'ROI',objectRegion);
pointImage = insertMarker(objectFrame,points.Location,'+','Color','white');
figure;
imshow(pointImage);
title('Detected interest points');
videoPlayer(out);
end
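Only the last two lines of the tracking loop appear in this excerpt. A minimal sketch of the full loop, assuming a vision.PointTracker object (the tracker itself does not appear above, and its property value is an assumption):
tracker = vision.PointTracker('MaxBidirectionalError',1);
initialize(tracker,points.Location,objectFrame);
while ~isDone(videoFileReader)
    frame = videoFileReader();
    [trackedPoints,validity] = tracker(frame);
    out = insertMarker(frame,trackedPoints(validity,:),'+');
    videoPlayer(out);
end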
release(videoPlayer);
release(videoFileReader);
Assign Detections to Tracks in a Single Video Frame
Set the predicted locations of the objects in the current frame. In practice, you obtain these predictions from a Kalman filter System object; here they are hard-coded for simplicity.
predictions = [1,1;2,2];
Set the locations of the objects detected in the current frame. For this example, there are
2 tracks and 3 new detections. Thus, at least one of the detections is unmatched, which
can indicate a new track.
detections = [1.1,1.1;2.1,2.1;1.5,3];
cost = zeros(size(predictions,1),size(detections,1));
Compute the cost of each prediction matching a detection. The cost here is defined as the Euclidean distance between the prediction and the detection.
for i = 1:size(predictions, 1)
diff = detections - repmat(predictions(i,:),[size(detections,1),1]);
cost(i, :) = sqrt(sum(diff .^ 2,2));
end
Associate detections with predictions. Detection 1 should match to track 1, and detection
2 should match to track 2. Detection 3 should be unmatched.
[assignment,unassignedTracks,unassignedDetections] = ...
assignDetectionsToTracks(cost,0.2);
figure;
plot(predictions(:,1),predictions(:,2),'*',detections(:,1),...
detections(:,2),'ro');
hold on;
legend('predictions','detections');
for i = 1:size(assignment,1)
text(predictions(assignment(i, 1),1)+0.1,...
predictions(assignment(i,1),2)-0.1,num2str(i));
text(detections(assignment(i, 2),1)+0.1,...
detections(assignment(i,2),2)-0.1,num2str(i));
end
for i = 1:length(unassignedDetections)
text(detections(unassignedDetections(i),1)+0.1,...
detections(unassignedDetections(i),2)+0.1,'unassigned');
end
xlim([0,4]);
ylim([0,4]);
Create 3-D Stereo Display
Display the anaglyph. Use red-blue stereo glasses to see the stereo effect.
figure; imshow(A);
Measure Distance from Stereo Camera to a Face
load('webcamsSceneReconstruction.mat');
I1 = imread('sceneReconstructionLeft.jpg');
I2 = imread('sceneReconstructionRight.jpg');
I1 = undistortImage(I1,stereoParams.CameraParameters1);
I2 = undistortImage(I2,stereoParams.CameraParameters2);
faceDetector = vision.CascadeObjectDetector;
face1 = faceDetector(I1);
face2 = faceDetector(I2);
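The excerpt stops after face detection. A sketch of the remaining distance computation, assuming the first detected bounding box in each image corresponds to the same face and that the stereo parameters use millimeters as world units:
center1 = face1(1,1:2) + face1(1,3:4)/2; % face center in the left image [x y]
center2 = face2(1,1:2) + face2(1,3:4)/2; % face center in the right image
point3d = triangulate(center1,center2,stereoParams);
distanceInMeters = norm(point3d)/1000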
Reconstruct 3-D Scene from Disparity Map
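The image loading and rectification steps that produce J1 and J2 are not shown in this excerpt. A minimal sketch, assuming I1 and I2 are the two images of a calibrated stereo pair and stereoParams holds the calibration:
[J1,J2] = rectifyStereoImages(I1,I2,stereoParams);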
disparityMap = disparitySGM(rgb2gray(J1),rgb2gray(J2));
figure
imshow(disparityMap,[0,64],'InitialMagnification',50);
Reconstruct the 3-D world coordinates of points corresponding to each pixel from the
disparity map.
xyzPoints = reconstructScene(disparityMap,stereoParams);
Segment out a person located between 3.2 and 3.7 meters away from the camera.
Z = xyzPoints(:,:,3);
mask = repmat(Z > 3200 & Z < 3700,[1,1,3]);
J1(~mask) = 0;
imshow(J1,'InitialMagnification',50);
[imagePoints,boardSize] = detectCheckerboardPoints(...
leftImages.Files,rightImages.Files);
squareSize = 108;
worldPoints = generateCheckerboardPoints(boardSize,squareSize);
Calibrate the stereo camera system. Both cameras have the same resolution.
I = readimage(leftImages,1);
imageSize = [size(I, 1), size(I, 2)];
cameraParams = estimateCameraParameters(imagePoints,worldPoints, ...
'ImageSize',imageSize);
figure;
showExtrinsics(cameraParams);
Visualize Stereo Pair of Camera Extrinsic Parameters
figure;
showExtrinsics(cameraParams,'patternCentric');
Read Point Cloud from a PLY File
pcwrite(ptCloud,'teapotOut','PLYFormat','binary');
Visualize the Difference Between Two Point Clouds
load('livingRoom');
pc1 = livingRoomData{1};
pc2 = livingRoomData{2};
figure
pcshowpair(pc1,pc2,'VerticalAxis','Y','VerticalAxisDir','Down')
title('Difference Between Two Point Clouds')
xlabel('X(m)')
ylabel('Y(m)')
zlabel('Z(m)')
View Rotating 3-D Point Cloud
ptCloud = pcread('teapot.ply');
x = pi/180;
R = [ cos(x) sin(x) 0 0
-sin(x) cos(x) 0 0
0 0 1 0
0 0 0 1];
tform = affine3d(R);
Compute the x-y limits that ensure that the rotated teapot is not clipped.
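The limit computation itself is not shown in this excerpt. One way to pad the axes so the teapot stays in view while it rotates (the padding strategy is an assumption):
extent = max(abs([ptCloud.XLimits ptCloud.YLimits]));
xlimits = [-extent extent];
ylimits = [-extent extent];
zlimits = ptCloud.ZLimits;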
player = pcplayer(xlimits,ylimits,zlimits);
xlabel(player.Axes,'X (m)');
ylabel(player.Axes,'Y (m)');
zlabel(player.Axes,'Z (m)');
for i = 1:360
ptCloud = pctransform(ptCloud,tform);
view(player,ptCloud);
end
ptCloud = pcread('teapot.ply');
player = pcplayer(ptCloud.XLimits,ptCloud.YLimits,ptCloud.ZLimits);
Hide and Show 3-D Point Cloud Figure
Hide figure.
hide(player)
Show figure.
show(player)
view(player,ptCloud);
Align Two Point Clouds Using ICP Algorithm
pcshow(ptCloud);
title('Teapot');
Create a transform object with a 30-degree rotation about the z-axis and a translation of [5,5,10].
A = [cos(pi/6) sin(pi/6) 0 0; ...
-sin(pi/6) cos(pi/6) 0 0; ...
0 0 1 0; ...
5 5 10 1];
tform1 = affine3d(A);
ptCloudTformed = pctransform(ptCloud,tform1); % apply the transform (this step is implied by the display below)
pcshow(ptCloudTformed);
title('Transformed Teapot');
disp(tform1.T);
0.8660 0.5000 0 0
-0.5000 0.8660 0 0
0 0 1.0000 0
5.0000 5.0000 10.0000 1.0000
tform2 = invert(tform1);
disp(tform2.T);
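The ICP registration step itself does not appear in this excerpt. A minimal sketch that estimates the transform and aligns the transformed cloud back onto the original:
tformICP = pcregistericp(ptCloudTformed,ptCloud); % moving cloud first, fixed cloud second
ptCloudAligned = pctransform(ptCloudTformed,tformICP);
pcshowpair(ptCloud,ptCloudAligned)
title('Original and Aligned Point Clouds')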
Create an affine transform object that defines a 45-degree rotation about the z-axis.
A = [cos(pi/4) sin(pi/4) 0 0; ...
-sin(pi/4) cos(pi/4) 0 0; ...
0 0 1 0; ...
0 0 0 1];
tform = affine3d(A);
Create an affine transform object that defines shearing along the x-axis.
A = [1 0 0 0; ...
0.75 1 0 0; ...
0.75 0 1 0; ...
0 0 0 1];
tform = affine3d(A);
Affine Transformations of 3-D Point Cloud
xlabel('X');
ylabel('Y');
zlabel('Z');
title('3-D Point Cloud','FontSize',14)
Plot the rotated and sheared 3-D point clouds.
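The transform application and display steps are not shown in this excerpt. A sketch, using the hypothetical names tformRot and tformShear for the two transforms defined above:
ptCloudRot = pctransform(ptCloud,tformRot);
ptCloudShear = pctransform(ptCloud,tformShear);
figure
pcshow(ptCloudRot)
title('Rotated Point Cloud')
figure
pcshow(ptCloudShear)
title('Sheared Point Cloud')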
Merge Two Identical Point Clouds Using Box Grid Filter
ptCloudA = pointCloud(100*rand(1000,3));
ptCloudB = copy(ptCloudA);
ptCloud = pcmerge(ptCloudA,ptCloudB,1);
pcshow(ptCloud);
Extract Cylinder from Point Cloud
maxDistance = 0.005;
roi = [0.4,0.6,-inf,0.2,0.1,inf];
sampleIndices = findPointsInROI(ptCloud,roi);
referenceVector = [0,0,1];
Detect the cylinder and extract it from the point cloud by specifying the inlier points.
[model,inlierIndices] = pcfitcylinder(ptCloud,maxDistance,...
referenceVector,'SampleIndices',sampleIndices);
pc = select(ptCloud,inlierIndices);
figure
pcshow(pc)
title('Cylinder Point Cloud')
Detect Multiple Planes from Point Cloud
maxDistance = 0.02;
referenceVector = [0,0,1];
maxAngularDistance = 5;
Detect the first plane, the table, and extract it from the point cloud.
[model1,inlierIndices,outlierIndices] = pcfitplane(ptCloud,...
maxDistance,referenceVector,maxAngularDistance);
plane1 = select(ptCloud,inlierIndices);
remainPtCloud = select(ptCloud,outlierIndices);
Set the region of interest to constrain the search for the second plane, the left wall.
roi = [-inf,inf;0.4,inf;-inf,inf];
sampleIndices = findPointsInROI(remainPtCloud,roi);
Detect the left wall and extract it from the remaining point cloud.
[model2,inlierIndices,outlierIndices] = pcfitplane(remainPtCloud,...
maxDistance,'SampleIndices',sampleIndices);
plane2 = select(remainPtCloud,inlierIndices);
remainPtCloud = select(remainPtCloud,outlierIndices);
figure
pcshow(plane1)
title('First Plane')
figure
pcshow(plane2)
title('Second Plane')
figure
pcshow(remainPtCloud)
title('Remaining Point Cloud')
Detect Sphere from Point Cloud
maxDistance = 0.01;
roi = [-inf,0.5,0.2,0.4,0.1,inf];
sampleIndices = findPointsInROI(ptCloud,roi);
Detect the sphere, a globe, and extract it from the point cloud.
[model,inlierIndices] = pcfitsphere(ptCloud,maxDistance,...
'SampleIndices',sampleIndices);
globe = select(ptCloud,inlierIndices);
hold on
plot(model)
figure
pcshow(globe)
title('Globe Point Cloud')
gv = 0:0.01:1;
[X,Y] = meshgrid(gv,gv);
ptCloud = pointCloud([X(:),Y(:),0.5*ones(numel(X),1)]);
figure
pcshow(ptCloud);
title('Original Data');
Remove Outliers from Noisy Point Cloud
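The step that creates the noisy cloud ptCloudA is not shown in this excerpt. A minimal sketch that appends uniformly distributed outlier points to the planar cloud created above (the number of outliers is an assumption):
noise = rand(500,3);
ptCloudA = pointCloud([ptCloud.Location; noise]);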
figure
pcshow(ptCloudA);
title('Noisy Data');
Remove outliers.
ptCloudB = pcdenoise(ptCloudA);
figure;
pcshow(ptCloudB);
title('Denoised Data');
Downsample Point Cloud Using Box Grid Filter
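The grid-based downsampling that creates ptCloudA is not shown in this excerpt. A minimal sketch using pcdownsample, with the input file and grid step as assumptions:
ptCloud = pcread('teapot.ply');
gridStep = 0.1;
ptCloudA = pcdownsample(ptCloud,'gridAverage',gridStep);
figure
pcshow(ptCloudA);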
Compare the point cloud to data that is downsampled using a fixed step size.
stepSize = floor(ptCloud.Count/ptCloudA.Count);
indices = 1:stepSize:ptCloud.Count;
ptCloudB = select(ptCloud, indices);
figure;
pcshow(ptCloudB);
Remove Motion Artifacts From Image
hdinterlacer = vision.Deinterlacer;
I = imread('vipinterlace.png');
clearimage = hdinterlacer(I);
imshow(I);
title('Original Image');
figure, imshow(clearimage);
title('Image after deinterlacing');
I = imread('pout.tif');
intImage = integralImage(I);
Construct Haar-like wavelet filters. Use the dot notation to find the vertical filter from the
horizontal filter.
vertH =
integralKernel with properties:
imtool(horiH.Coefficients, 'InitialMagnification','fit');
Find Vertical and Horizontal Edges in Image
horiResponse = integralFilter(intImage,horiH);
vertResponse = integralFilter(intImage,vertH);
figure;
imshow(horiResponse,[]);
title('Horizontal edge responses');
figure;
imshow(vertResponse,[]);
title('Vertical edge responses');
images = imageSet(fullfile(toolboxdir('vision'),'visiondata',...
'calibration','mono'));
imageFileNames = images.ImageLocation;
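The checkerboard detection step that produces imagePoints and boardSize is not shown in this excerpt; it would typically look like:
[imagePoints,boardSize] = detectCheckerboardPoints(imageFileNames);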
squareSizeInMM = 29;
worldPoints = generateCheckerboardPoints(boardSize,squareSizeInMM);
I = readimage(images,1);
imageSize = [size(I, 1),size(I, 2)];
params = estimateCameraParameters(imagePoints,worldPoints, ...
'ImageSize',imageSize);
showReprojectionErrors(params);
Single Camera Calibration
figure;
showExtrinsics(params);
drawnow;
figure;
imshow(imageFileNames{1});
hold on;
plot(imagePoints(:,1,1), imagePoints(:,2,1),'go');
plot(params.ReprojectedPoints(:,1,1),params.ReprojectedPoints(:,2,1),'r+');
legend('Detected Points','ReprojectedPoints');
hold off;
I = imread(fullfile(matlabroot,'toolbox','vision','visiondata','calibration','mono','im
J = undistortImage(I,cameraParams);
figure; imshowpair(imresize(I,0.5),imresize(J,0.5),'montage');
title('Original Image (left) vs. Corrected Image (right)');
Plot Spherical Point Cloud with Texture Mapping
numFaces = 600;
[x,y,z] = sphere(numFaces);
figure;
pcshow([x(:),y(:),z(:)]);
title('Sphere with Default Color Map');
xlabel('X');
ylabel('Y');
zlabel('Z');
I = im2double(imread('visionteam1.jpg'));
imshow(I);
J = flipud(imresize(I,size(x)));
pcshow([x(:),y(:),z(:)],reshape(J,[],3));
title('Sphere with Color Texture');
xlabel('X');
ylabel('Y');
zlabel('Z');
Plot Color Point Cloud from Kinect for Windows
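Creating the Kinect color and depth devices is not shown in this excerpt. A sketch using the Image Acquisition Toolbox Kinect support (an assumption; the device indices can differ on your system):
colorDevice = imaq.VideoDevice('kinect',1);
depthDevice = imaq.VideoDevice('kinect',2);
% Warm up the devices before streaming.
step(colorDevice);
step(depthDevice);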
Initialize a point cloud player to visualize 3-D point cloud data. The axis is set
appropriately to visualize the point cloud from Kinect.
player = pcplayer(ptCloud.XLimits,ptCloud.YLimits,ptCloud.ZLimits,...
'VerticalAxis','y','VerticalAxisDir','down');
xlabel(player.Axes,'X (m)');
ylabel(player.Axes,'Y (m)');
zlabel(player.Axes,'Z (m)');
Acquire and view 500 frames of live Kinect point cloud data.
for i = 1:500
colorImage = step(colorDevice);
depthImage = step(depthDevice);
ptCloud = pcfromkinect(depthDevice,depthImage,colorImage);
view(player,ptCloud);
end
release(colorDevice);
release(depthDevice);
Estimate Optical Flow Using Farneback Method
vidReader = VideoReader('visiontraffic.avi','CurrentTime',11);
Create an optical flow object for estimating the optical flow using the Farneback method. The
output is an object specifying the optical flow estimation method and its properties.
opticFlow = opticalFlowFarneback
opticFlow =
opticalFlowFarneback with properties:
NumPyramidLevels: 3
PyramidScale: 0.5000
NumIterations: 3
NeighborhoodSize: 5
FilterSize: 15
h = figure;
movegui(h);
hViewPanel = uipanel(h,'Position',[0 0 1 1],'Title','Plot of Optical Flow Vectors');
hPlot = axes(hViewPanel);
Read the image frames and convert to grayscale images. Estimate the optical flow from
consecutive image frames. Display the current image frame and plot the optical flow
vectors as a quiver plot.
while hasFrame(vidReader)
frameRGB = readFrame(vidReader);
frameGray = rgb2gray(frameRGB);
flow = estimateFlow(opticFlow,frameGray);
imshow(frameRGB)
hold on
plot(flow,'DecimationFactor',[5 5],'ScaleFactor',2,'Parent',hPlot);
hold off
pause(10^-3)
end
Compute Optical Flow Using Lucas-Kanade DoG Method
vidReader = VideoReader('visiontraffic.avi','CurrentTime',11);
Create an optical flow object for estimating the optical flow using the Lucas-Kanade DoG
method. Specify the threshold for noise reduction. The output is an optical flow object
specifying the optical flow estimation method and its properties.
opticFlow = opticalFlowLKDoG('NoiseThreshold',0.0005)
opticFlow =
opticalFlowLKDoG with properties:
NumFrames: 3
ImageFilterSigma: 1.5000
GradientFilterSigma: 1
NoiseThreshold: 5.0000e-04
h = figure;
movegui(h);
hViewPanel = uipanel(h,'Position',[0 0 1 1],'Title','Plot of Optical Flow Vectors');
hPlot = axes(hViewPanel);
Read the image frames and convert to grayscale images. Estimate the optical flow from
consecutive image frames. Display the current image frame and plot the optical flow
vectors as a quiver plot.
while hasFrame(vidReader)
frameRGB = readFrame(vidReader);
frameGray = rgb2gray(frameRGB);
flow = estimateFlow(opticFlow,frameGray);
imshow(frameRGB)
hold on
plot(flow,'DecimationFactor',[5 5],'ScaleFactor',35,'Parent',hPlot);
hold off
pause(10^-3)
end
Estimate Optical Flow Using Horn-Schunck Method
vidReader = VideoReader('visiontraffic.avi','CurrentTime',11);
Specify the optical flow estimation method as opticalFlowHS. The output is an object
specifying the optical flow estimation method and its properties.
opticFlow = opticalFlowHS
opticFlow =
opticalFlowHS with properties:
Smoothness: 1
MaxIteration: 10
VelocityDifference: 0
h = figure;
movegui(h);
hViewPanel = uipanel(h,'Position',[0 0 1 1],'Title','Plot of Optical Flow Vectors');
hPlot = axes(hViewPanel);
Read image frames from the VideoReader object and convert to grayscale images.
Estimate the optical flow from consecutive image frames. Display the current image
frame and plot the optical flow vectors as a quiver plot.
while hasFrame(vidReader)
frameRGB = readFrame(vidReader);
frameGray = rgb2gray(frameRGB);
flow = estimateFlow(opticFlow,frameGray);
imshow(frameRGB)
hold on
plot(flow,'DecimationFactor',[5 5],'ScaleFactor',60,'Parent',hPlot);
hold off
pause(10^-3)
end
Create an Optical Flow Object and Plot Its Velocity
Vx = randn(100,100);
Vy = randn(100,100);
opflow = opticalFlow(Vx,Vy);
Inspect the properties of the optical flow object. The orientation and the magnitude are
computed from the velocity matrices.
opflow
opflow =
opticalFlow with properties:
plot(opflow,'DecimationFactor',[10 10],'ScaleFactor',10);
Point Cloud Processing
See Also
pcregistericp
Related Examples
• “3-D Point Cloud Registration and Stitching”
The version 1.0 PLY format, also known as the Stanford Triangle Format, defines a flexible and systematic scheme for storing 3-D data. The ASCII header specifies what data is in the file by defining "elements," each with a set of "properties." Many PLY files contain only vertex and face data; however, the format also allows other data such as color information, vertex normals, or application-specific properties.
Note The Computer Vision Toolbox point cloud data functions only support the (x,y,z)
coordinates, normals, and color properties.
File Header
An example header (the text to the right of each line is a comment):
ply                                       file ID
format binary_big_endian 1.0              specify data format and version
element vertex 9200                       define "vertex" element
property float x
property float y
property float z
element face 18000                        define "face" element
property list uchar int vertex_indices
end_header                                data starts after this line
The file begins with "ply," identifying that it is a PLY file. The header must also include a format line with the syntax
format <ascii | binary_little_endian | binary_big_endian> <version>
The PLY Format
Supported data formats are "ascii" for data stored as text and "binary_little_endian" and "binary_big_endian" for binary data (where little/big endian refers to the byte ordering of multi-byte data). Element definitions begin with an "element" line followed by that element's property definitions.
For example, "element vertex 9200" defines an element "vertex" and specifies that 9200
vertices are stored in the file. Each element definition is followed by a list of properties of
that element. There are two kinds of properties: scalar and list. A scalar property definition has the syntax
property <data type> <property name>
The supported scalar data types are:
Name Type
char (8-bit) character
uchar (8-bit) unsigned character
short (16-bit) short integer
ushort (16-bit) unsigned short integer
int (32-bit) integer
uint (32-bit) unsigned integer
float (32-bit) single-precision float
double (64-bit) double-precision float
For compatibility between systems, note that the number of bits in each data type must be consistent. A list type is stored with a count followed by a list of scalars. The definition syntax for a list property is
property list <count type> <element type> <property name>
For example, the line
property list uchar int vertex_indices
defines that each vertex_indices property is stored as a byte count followed by integer values. This is useful for storing polygon connectivity because it allows a variable number of vertex indices in each face.
The header can also include comments. The syntax for a comment is simply a line
beginning with "comment" followed by a one-line comment:
comment<comment text>
Comments can provide information about the data like the file's author, data description,
data source, and other textual data.
Data
Following the header, the element data is stored as either ASCII or binary data, as specified by the format line in the header. The data appears in the order in which the elements and properties were defined: first, all the data for the first element type is stored. In the example header, the first element type is "vertex" with 9200 vertices in the file, each with float properties "x," "y," and "z."
float vertex[1].x
float vertex[1].y
float vertex[1].z
float vertex[2].x
float vertex[2].y
float vertex[2].z
...
float vertex[9200].x
float vertex[9200].y
float vertex[9200].z
In general, the properties data for each element is stored one element at a time.
The list type properties are stored beginning with a count and followed by a list of
scalars. For example, the "face" element type has the list property "vertex_indices" with
uchar count and int scalar type.
uchar count
int face[1].vertex_indices[1]
int face[1].vertex_indices[2]
int face[1].vertex_indices[3]
...
int face[1].vertex_indices[count]
uchar count
int face[2].vertex_indices[1]
int face[2].vertex_indices[2]
int face[2].vertex_indices[3]
...
int face[2].vertex_indices[count]
...
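In MATLAB, you rarely need to parse this layout yourself; pcread and pcwrite handle the header and data for you. For example:
ptCloud = pcread('teapot.ply'); % read vertex data into a pointCloud object
pcwrite(ptCloud,'teapotOut','PLYFormat','binary'); % write it back out as binary PLY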
See Also
pcread | pcwrite
Using the Installer for Computer Vision System Toolbox Product
Install Computer Vision Toolbox Add-on Support Files
Use one of two ways to install the Add-on support files.
• Select Get Add-ons from the Add-ons drop-down menu from the MATLAB® desktop.
The Add-on files are in the “MathWorks Features” section.
• Type visionSupportPackages in a MATLAB Command Window and follow the
prompts.
Note You must have write privileges for the installation folder.
When a new version of MATLAB software is released, repeat this process to check for
updates. You can also check for updates between releases.
Install OCR Language Data Files
OCR Language Data files contain pretrained language data from the OCR Engine,
tesseract-ocr, to use with the ocr function.
Installation
After you install third-party support files, you can use the data with the Computer Vision
Toolbox product. Use one of two ways to install the Add-on support files.
• Select Get Add-ons from the Add-ons drop-down menu from the MATLAB desktop.
The Add-on files are in the “MathWorks Features” section.
• Type visionSupportPackages in a MATLAB Command Window and follow the
prompts.
Note You must have write privileges for the installation folder.
When a new version of MATLAB software is released, repeat this process to check for
updates. You can also check for updates between releases.
For example, after installing the Finnish language data file, you can specify the language in the ocr call:
txt = ocr(img,'Language','Finnish');
The supported languages are:
• 'Afrikaans'
• 'Albanian'
• 'AncientGreek'
• 'Arabic'
• 'Azerbaijani'
• 'Basque'
• 'Belarusian'
• 'Bengali'
• 'Bulgarian'
• 'Catalan'
• 'Cherokee'
• 'ChineseSimplified'
• 'ChineseTraditional'
• 'Croatian'
• 'Czech'
• 'Danish'
• 'Dutch'
• 'English'
• 'Esperanto'
• 'EsperantoAlternative'
• 'Estonian'
• 'Finnish'
• 'Frankish'
• 'French'
• 'Galician'
• 'German'
• 'Greek'
• 'Hebrew'
• 'Hindi'
• 'Hungarian'
• 'Icelandic'
• 'Indonesian'
• 'Italian'
• 'ItalianOld'
• 'Japanese'
• 'Kannada'
• 'Korean'
• 'Latvian'
• 'Lithuanian'
• 'Macedonian'
• 'Malay'
• 'Malayalam'
• 'Maltese'
• 'MathEquation'
• 'MiddleEnglish'
• 'MiddleFrench'
• 'Norwegian'
• 'Polish'
• 'Portuguese'
• 'Romanian'
• 'Russian'
• 'SerbianLatin'
• 'Slovakian'
• 'Slovenian'
• 'Spanish'
• 'SpanishOld'
• 'Swahili'
• 'Swedish'
• 'Tagalog'
• 'Tamil'
• 'Telugu'
• 'Thai'
• 'Turkish'
• 'Ukrainian'
See Also
OCR Trainer | ocr | visionSupportPackages
Related Examples
• “Recognize Text Using Optical Character Recognition (OCR)”
Install and Use Computer Vision Toolbox OpenCV Interface
In this section...
“Installation” on page 3-7
“Support Package Contents” on page 3-7
“Create MEX-File from OpenCV C++ file” on page 3-8
“Use the OpenCV Interface C++ API” on page 3-9
“Create Your Own OpenCV MEX-files” on page 3-10
“Run OpenCV Examples” on page 3-10
Installation
After you install third-party support files, you can use the data with the Computer Vision
Toolbox product. Use one of two ways to install the Add-on support files.
• Select Get Add-ons from the Add-ons drop-down menu from the MATLAB desktop.
The Add-on files are in the “MathWorks Features” section.
• Type visionSupportPackages in a MATLAB Command Window and follow the
prompts.
Note You must have write privileges for the installation folder.
When a new version of MATLAB software is released, repeat this process to check for
updates. You can also check for updates between releases.
To locate the folder that contains the support package files, type:
fileparts(which('mexOpenCV'))
Files                 Contents
example folder        Template Matching, Foreground Detector, and Oriented FAST and Rotated BRIEF (ORB) examples, including a GPU version. Each subfolder in the example folder contains a README.txt file with step-by-step instructions.
registry folder       Registration files.
mexOpenCV.m file      Function to build MEX-files.
README.txt file       Help file.
The mex function uses prebuilt OpenCV libraries, which ship with the Computer Vision
Toolbox product. Your compiler must be compatible with the one used to build the
libraries. The following compilers are used to build the OpenCV libraries for MATLAB
host:
1 Change your current folder to the Template Matching example folder:
cd(fullfile(fileparts(which('mexOpenCV')),'example',filesep,'TemplateMatching'))
2 Create the MEX-file from the source file:
mexOpenCV matchTemplateOCV.cpp
3 Run the test script, which uses the generated MEX-file:
testMatchTemplate
Function Description
ocvCheckFeaturePointsStruct Check that MATLAB struct represents feature points
ocvStructToKeyPoints Convert MATLAB feature points struct to OpenCV
KeyPoint vector
ocvKeyPointsToStruct Convert OpenCV KeyPoint vector to MATLAB struct
ocvMxArrayToCvRect Convert a MATLAB struct representing a rectangle
to an OpenCV CvRect
ocvCvRectToMxArray Convert OpenCV CvRect to a MATLAB struct
ocvCvBox2DToMxArray Convert OpenCV CvBox2D to a MATLAB struct
ocvCvRectToBoundingBox_{DataType} Convert vector<cv::Rect> to M-by-4 mxArray of
bounding boxes
ocvMxArrayToSize_{DataType} Convert 2-element mxArray to cv::Size
ocvMxArrayToImage_{DataType} Convert column major mxArray to row major
cv::Mat for image
ocvMxArrayToMat_{DataType} Convert column major mxArray to row major
cv::Mat for generic matrix
ocvMxArrayFromImage_{DataType} Convert row major cv::Mat to column major
mxArray for image
ocvMxArrayFromMat_{DataType} Convert row major cv::Mat to column major
mxArray for generic matrix.
ocvMxArrayFromVector Convert numeric vector<T> to mxArray
Function Description
ocvMxArrayFromPoints2f Converts vector<cv::Point2f> to mxArray
mexOpenCV yourfile.cpp
For help creating MEX files, at the MATLAB command prompt, type:
help mexOpenCV
1 Change your current folder to the Template Matching example folder:
cd(fullfile(fileparts(which('mexOpenCV')),'example',filesep,'TemplateMatching'))
2 Create the MEX-file from the source file:
mexOpenCV matchTemplateOCV.cpp
3 Run the test script, which uses the generated MEX-file:
testMatchTemplate
cd(fullfile(fileparts(which('mexOpenCV')),'example',filesep,'ForegroundDetector'))
PC:
mexOpenCV detectORBFeaturesOCV_GPU.cpp -lgpu -lmwocvgpumex -largeArrayDims
Linux/Mac:
mexOpenCV detectORBFeaturesOCV_GPU.cpp -lmwgpu -lmwocvgpumex -largeArrayDims
3 Run the test script, which uses the generated MEX-file:
testORBFeaturesOCV_GPU.m
See Also
“C Matrix API” (MATLAB) | mxArray
More About
• “Install Computer Vision Toolbox Add-on Support Files” on page 3-2
• Using OpenCV with MATLAB
Input, Output, and Conversions
Learn how to import and export videos, and perform color space and video image
conversions.
You can open the example model by typing at the MATLAB command line.
ex_export_to_mmf
By increasing the red, green, and blue color values, you increase the contrast of the
video. The To Multimedia File block exports the video data from the Simulink model to a
multimedia file that it creates in your current folder.
This example manipulated the video stream and exported it from a Simulink model to a
multimedia file. For more information, see the To Multimedia File block reference page.
Export to Video Files
Block Parameter
Gain The Gain blocks are used to increase the red, green, and blue
values of the video stream. This increases the contrast of the
video:
Configuration Parameters
You can locate the Model Configuration Parameters by selecting Model
Configuration Parameters from the Simulation menu. For this example, the
parameters on the Solver pane, are set as follows:
• Stop time = 20
• Type = Fixed-step
• Solver = Discrete (no continuous states)
You can open the example model by typing at the MATLAB command line.
ex_import_mmf
You have now imported and displayed a multimedia file in the Simulink model. In the
“Export to Video Files” on page 4-2 example you can manipulate your video stream and
export it to a multimedia file.
For more information on the blocks used in this example, see the From Multimedia File
and To Video Display block reference pages.
Import from Video Files
Block Parameter
From Multimedia File Use the From Multimedia File block to import the multimedia
file into the model:
• If you do not have your own multimedia file, use the default vipmen.avi file for the File name parameter.
• If the multimedia file is on your MATLAB path, enter the
filename for the File name parameter.
• If the file is not on your MATLAB path, use the Browse
button to locate the multimedia file.
• Set the Image signal parameter to Separate color
signals.
Configuration Parameters
You can locate the Model Configuration Parameters by selecting Model
Configuration Parameters from the Simulation menu. For this example, the
parameters on the Solver pane, are set as follows:
• Stop time = 20
• Type = Fixed-step
• Solver = Discrete (no continuous states)
Note In this example, the image files are a set of 10 microscope images of rat prostate
cancer cells. These files are only the first 10 of 100 images acquired.
1 Specify the folder containing the images, and use this information to create a list of
the file names, as follows:
fileFolder = fullfile(matlabroot,'toolbox','images','imdata');
dirOutput = dir(fullfile(fileFolder,'AT3_1m4_*.tif'));
fileNames = {dirOutput.name}'
2 View one of the images, using the following command sequence:
I = imread(fileNames{1});
imshow(I);
text(size(I,2),size(I,1)+15, ...
'Image files courtesy of Alan Partin', ...
'FontSize',7,'HorizontalAlignment','right');
text(size(I,2),size(I,1)+25, ...
'Johns Hopkins University', ...
'FontSize',7,'HorizontalAlignment','right');
3 Use a for loop to create a variable that stores the entire image sequence. You can use
this variable to import the sequence into Simulink.
for i = 1:length(fileNames)
my_video(:,:,i) = imread(fileNames{i});
end
For additional information about batch processing, see the “Image Sequences and Batch
Processing” (Image Processing Toolbox) section for the Image Processing Toolbox™.
Configuration Parameters
You can locate the Model Configuration Parameters by selecting Model
Configuration Parameters from the Simulation menu. For this example, the
parameters on the Solver pane, are set as follows:
Batch Process Image Files
• Stop time = 10
• Type = Fixed-step
• Solver = Discrete (no continuous states)
You can open the example model by typing at the MATLAB command line.
ex_display_sequence_of_images
1 The Video From Workspace block reads the files from the MATLAB workspace. The
Signal parameter is set to the name of the variable for the stored images. For this
example, it is set to my_video.
2 The Video Viewer block displays the sequence of images.
3 Run your model. You can view the image sequence in the Video Viewer window.
Display a Sequence of Images
4 Because the Video From Workspace block's Sample time parameter is set to 1 and the Stop time parameter in the configuration parameters is set to 10, the Video Viewer block displays 10 images before the simulation stops.
Pre-loading Code
To find or modify the pre-loaded code, select File > Model Properties > Model Properties. Then select the Callbacks tab. For more details on how to set up callbacks, see “Callbacks for Customized Model Behavior” (Simulink).
Configuration Parameters
You can locate the Model Configuration Parameters by selecting Model
Configuration Parameters from the Simulation menu. For this example, the
parameters on the Solver pane, are set as follows:
• Stop time = 10
• Type = Fixed-step
• Solver = Discrete (no continuous states)
Partition Video Frames to Multiple Image Files
You can open the example model by typing at the MATLAB command line.
ex_vision_partition_video_frames_to_multiple_files
implay output1.avi
implay output2.avi
implay output3.avi
4 Press the Play button.
For more information on the blocks used in this example, see the From Multimedia File,
Insert Text, Enabled Subsystem, and To Multimedia File block reference pages.
Block Parameter
From Multimedia File The From Multimedia File block imports an AVI file into the
model.
Insert Text The example uses the Insert Text block to annotate the video
stream with frame numbers. The block writes the frame
number in green, in the upper-left corner of the output video
stream.
• Number of bits: 8
• Sample time: 1/30
Bias The bias block adds a bias to the input. The block parameters
are modified as follows:
• Bias: 1
Compare To Constant The Compare to Constant block sends frames 1 to 9 to the first
AVI file. The block parameters are modified as follows:
• Operator: <
• Constant value: 10
Compare To Constant1 The Compare to Constant1 and Compare to Constant2 blocks
Compare To Constant2 send frames 10 to 19 to the second AVI file. The block
parameters are modified as follows:
• Operator: >=
• Constant value: 10
• Operator: <
• Constant value: 20
Compare To Constant3 The Compare to Constant3 block sends frames 20 to 30 to the third AVI file. The block parameters are modified as follows:
• Operator: >=
• Constant value: 20
Compare To Constant4 The Compare to Constant4 block stops the simulation when
the video reaches frame 30. The block parameters are
modified as follows:
• Operator: ==
• Constant value: 30
• Output data type: boolean
Each enabled subsystem should look similar to the subsystem shown in the following
figure.
Configuration Parameters
You can locate the Model Configuration Parameters by selecting Model
Configuration Parameters from the Simulation menu. For this example, the
parameters on the Solver pane, are set as follows:
• Type = Fixed-step
• Solver = Discrete (no continuous states)
Combine Video and Audio Streams
You can open the example model by typing at the MATLAB command line.
ex_combine_video_and_audio_streams
1 Run your model. The model creates a multimedia file called output.avi in your
current folder.
2 Play the multimedia file using a media player. The original video file now has an audio
component to it.
The From Multimedia File block used for the input video file inherits its sample time from
the vipmen.avi file. For video signals, the sample time equals the frame period. The
frame period is defined as 1/(frame rate). Because the input video frame rate is 30 frames
per second (fps), the block sets the frame period to 1/30 or 0.0333 seconds per frame.
The Samples per audio channel parameter is set to 735. This output audio frame size is
calculated by dividing the frequency of the audio signal (22050 samples per second) by
the frame rate (approximately 30 frames per second).
You must adjust the audio signal frame period to match the frame period of the video
signal. The video frame period is 0.0333 seconds per frame. Because the frame period is
also defined as the frame size divided by frequency, you can calculate the frame period of
the audio signal by dividing the frame size of the audio signal (735 samples per frame) by
the frequency (22050 samples per second) to get 0.0333 seconds per frame.
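The same arithmetic, written out as a quick check at the MATLAB command line:
frameRate = 30; % video frames per second
audioSampleRate = 22050; % audio samples per second
samplesPerAudioFrame = audioSampleRate/frameRate % 735 samples per audio frame
videoFramePeriod = 1/frameRate % 0.0333 seconds per frame
audioFramePeriod = samplesPerAudioFrame/audioSampleRate % also 0.0333 seconds, so the streams stay in sync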
Configuration Parameters
You can locate the Configuration Parameters by selecting Model Configuration
Parameters from the Simulation menu. The parameters, on the Solver pane, are set as
follows:
• Stop time = 10
• Type = Fixed-step
• Solver = Discrete (no continuous states)
Import MATLAB Workspace Variables
Use the Signal parameter to specify the MATLAB workspace variable from which to read.
For more information about how to use this block, see the Video From Workspace block
reference page.
Resample Image Chroma
You can open the example model by typing at the MATLAB command line.
ex_vision_resample_image_chroma
1 Define an RGB image in the MATLAB workspace. To do so, at the MATLAB command
prompt, type:
I= imread('autumn.tif');
This command reads in an RGB image from a TIF file. The image I is a 206-by-345-
by-3 array of 8-bit unsigned integer values. Each plane of this array represents the
red, green, or blue color values of the image.
2 To view the image this array represents, at the MATLAB command prompt, type:
imshow(I)
3 Configure Simulink to display signal dimensions next to each signal line. Select
Display > Signals & Ports > Signal Dimensions.
4 Run your model. The recovered image appears in the Video Viewer window. The
Chroma Resampling block has downsampled the Cb and Cr components of an image.
5 Examine the signal dimensions in your model. The Chroma Resampling block
downsamples the Cb and Cr components of the image from 206-by-346 matrices to
206-by-173 matrices. These matrices require less bandwidth for transmission while
still communicating the information necessary to recover the image after it is
transmitted.
Block Parameter
Image from Import your image from the MATLAB workspace. Set the Value
Workspace parameter to I.
Image Pad Change dimensions of the input I array from 206-by-345-by-3 to
206-by-346-by-3. You are changing these dimensions because the
Chroma Resampling block requires that the dimensions of the
input be divisible by 2. Set the block parameters as follows:
• Method = Symmetric
• Add columns to = Right
• Number of added columns = 1
• Add row to = No padding
The Image Pad block adds one column to the right of each plane of
the array by repeating its border values. This padding minimizes
the effect of the pixels outside the image on the processing of the
image.
Selector, Selector1, Selector2 Separate the individual color planes from the main signal. Such separation simplifies the color space conversion section of the model. Set the Selector block parameters as follows:
Selector
Selector1
Selector2
Color Space Convert the input values from the Y'CbCr color space to the R'G'B'
Conversion1 color space. Set the block parameters as follows:
Configuration Parameters
Open the Configuration dialog box by selecting Model Configuration Parameters from
the Simulation menu. Set the parameters as follows:
• Stop time = 10
• Type = Fixed-step
• Solver = Discrete (no continuous states)
You can open the example model by typing at the MATLAB command line.
ex_vision_thresholding_intensity
1 You can create a new Simulink model and add the blocks shown in the table.
Convert Intensity Images to Binary Images
6 Use the Video Viewer block to view the binary image. Accept the default parameters.
7 Connect the blocks as shown in the following figure.
8 Set the configuration parameters. Open the Configuration dialog box by selecting
Simulation > Model Configuration Parameters menu. Set the parameters as
follows:
Note A single threshold value was unable to effectively threshold this image due to
its uneven lighting. For information on how to address this problem, see “Correct
Nonuniform Illumination” on page 11-2.
You have used the Relational Operator block to convert an intensity image to a binary
image. For more information about this block, see the Relational Operator block reference
page. For additional information, see “Convert Between Image Types” (Image Processing
Toolbox).
ex_vision_autothreshold
ex_vision_thresholding_intensity
2 Use the Image from File block to import your image. In this example the image file is
a 256-by-256 matrix of 8-bit unsigned integer values that range from 0 to 255. Set the
File name parameter to rice.png
3 Delete the Constant and the Relational Operator blocks in this model.
4 Add an Autothreshold block from the Conversions library of the Computer Vision
Toolbox into your model.
5 Use the Autothreshold block to perform a thresholding operation that converts your
intensity image to a binary image. Select the Output threshold check box. This
block outputs the calculated threshold value at the Th port.
6 Add a Display block from the Sinks library of the DSP System Toolbox.
Connect the Display block to the Th output port of the Autothreshold block.
7 If you have not already done so, set the configuration parameters. Open the
Configuration dialog box by selecting Model Configuration Parameters from the
Simulation menu. Set the parameters as follows:
In the model window, the Display block shows the threshold value, calculated by the
Autothreshold block, that separated the rice grains from the background.
You have used the Autothreshold block to convert an intensity image to a binary image.
For more information about this block, see the Autothreshold block reference page in the
Computer Vision Toolbox Reference. To open an example model that uses this block, type
vipstaples at the MATLAB command prompt.
Convert R'G'B' to Intensity Images
Some image processing algorithms are customized for intensity images. If you want to
use one of these algorithms, you must first convert your image to intensity. In this topic,
you learn how to use the Color Space Conversion block to accomplish this task. You can
use this procedure to convert any R'G'B' image to an intensity image:
ex_vision_convert_rgb
1 Define an R'G'B' image in the MATLAB workspace. To read in an R'G'B' image from a
JPG file, at the MATLAB command prompt, type
I= imread('greens.jpg');
I is a 300-by-500-by-3 array of 8-bit unsigned integer values. Each plane of this array
represents the red, green, or blue color values of the image.
2 To view the image this matrix represents, at the MATLAB command prompt, type
imshow(I)
3 Create a new Simulink model, and add to it the blocks shown in the following table.
4-35
4 Input, Output, and Conversions
7 Connect the blocks so that your model is similar to the following figure.
8 Set the configuration parameters. Open the Configuration dialog box by selecting
Model Configuration Parameters from the Simulation menu. Set the parameters
as follows:
The image displayed in the Video Viewer window is the intensity version of the
greens.jpg image.
In this topic, you used the Color Space Conversion block to convert color information
from the R'G'B' color space to intensity. For more information on this block, see the Color
Space Conversion block reference page.
ex_vision_process_multidimensional
Process Multidimensional Color Video Signals
You can choose to process the image as a multidimensional array by setting the Image
signal parameter to One multidimensional signal in the Image From File block
dialog box.
The blocks that support multidimensional arrays meet at least one of the following
criteria:
You can also choose to work with the individual color planes of images or video signals.
For example, the following model passes a color image from a source block to a sink block
using three separate color planes.
ex_vision_process_individual
To process the individual color planes of an image or video signal, set the Image signal
parameter to Separate color signals in both the Image From File and Video Viewer
block dialog boxes.
Note The ability to output separate color signals is a legacy option. It is recommended that you use multidimensional signals to represent color data.
If you are working with a block that only outputs multidimensional arrays, you can use
the Selector block to separate the color planes. If you are working with a block that only
accepts multidimensional arrays, you can use the Matrix Concatenation block to create a
multidimensional array.
Video Formats
The values in a binary, intensity, or RGB image can be different data types. The data type
of the image values determines which values correspond to black and white as well as the
absence or saturation of color. The following table summarizes the interpretation of the
upper and lower bound of each data type. To view the data types of the signals at each
port, from the Display menu, point to Signals & Ports, and select Port Data Types.
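For the data types used in the examples below, the bounds are:
Data Type                                   Black (lower bound)    White (upper bound)
Double- or single-precision floating point  0                      1
8-bit unsigned integer                      0                      255
16-bit signed integer                       -32768                 32767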
Note The Computer Vision Toolbox software considers any data type other than double-
precision floating point and single-precision floating point to be fixed point.
For example, for an intensity image whose image values are 8-bit unsigned integers, 0 is
black and 255 is white. For an intensity image whose image values are double-precision
floating point, 0 is black and 1 is white. For an intensity image whose image values are
16-bit signed integers, -32768 is black and 32767 is white.
For an RGB image whose image values are 8-bit unsigned integers, 0 0 0 is black,
255 255 255 is white, 255 0 0 is red, 0 255 0 is green, and 0 0 255 is blue. For an RGB
image whose image values are double-precision floating point, 0 0 0 is black, 1 1 1 is
white, 1 0 0 is red, 0 1 0 is green, and 0 0 1 is blue. For an RGB image whose image
values are 16-bit signed integers, -32768 -32768 -32768 is black, 32767 32767 32767 is
white, 32767 -32768 -32768 is red, -32768 32767 -32768 is green, and
-32768 -32768 32767 is blue.
If you have imported an image or a video stream into the MATLAB workspace using a
function from the MATLAB environment or the Image Processing Toolbox, the Computer
Vision Toolbox blocks will display this image or video stream correctly. If you have written
your own function or code to import images into the MATLAB environment, you must take
the column-major convention into account.
Image Formats
In the Computer Vision Toolbox software, images are real-valued ordered sets of color or
intensity data. The blocks interpret input matrices as images, where each element of the
matrix corresponds to a single pixel in the displayed image. Images can be binary,
intensity (grayscale), or RGB. This section explains how to represent these types of
images.
Binary Images
Binary images are represented by a Boolean matrix of 0s and 1s, which correspond to
black and white pixels, respectively.
Intensity Images
Intensity images are represented by a matrix of intensity values. While intensity images
are not stored with colormaps, you can use a gray colormap to display them.
RGB Images
RGB images are also known as true-color images. With Computer Vision Toolbox blocks,
these images are represented by an array, where the first plane represents the red pixel
intensities, the second plane represents the green pixel intensities, and the third plane
represents the blue pixel intensities. In the Computer Vision Toolbox software, you can
pass RGB images between blocks as three separate color planes or as one
multidimensional array.
Display and Graphics
Use the video player vision.VideoPlayer System object when you require a simple
video display in MATLAB for streaming video.
You can open several instances of the implay function simultaneously to view multiple
video data sources at once. You can also dock these implay players in the MATLAB
desktop. Use the figure arrangement buttons in the upper-right corner of the Sinks
window to control the placement of the docked players.
Display, Stream, and Preview Videos
Use the To Video Display block in your Simulink model as a simple display viewer
designed for optimal performance. This block supports code generation for the Windows
platform.
Use the Video Viewer block when you require a wired-in video display with simulation
controls in your Simulink model. The Video Viewer block provides simulation control
buttons directly from the player interface. The block integrates play, pause, and step
features while running the model and also provides video analysis tools such as pixel
region viewer.
The implay function enables you to view video signals in Simulink models without adding
blocks to your model. You can open several instances of the implay player
simultaneously to view multiple video data sources at once. You can also dock these
players in the MATLAB desktop. Use the figure arrangement buttons in the upper-right
corner of the Sinks window to control the placement of the docked players.
Set Simulink simulation mode to Normal to use implay. implay does not work when you
use “Accelerating Simulink Models” on page 13-9.
Note During code generation, the Simulink Coder™ does not generate code for the
implay player.
Annotate Video Files with Frame Numbers
You can open the example model by typing at the MATLAB command line.
ex_vision_annotate_video_file_with_frame_numbers
Color Formatting
For this example, the color format for the video was set to Intensity, and therefore the color value for the text was set to a scaled value. If you instead set the color format to RGB, the text color value must match that format, which requires a three-element vector.
Inserting Text
Use the Insert Text block to annotate the video stream with a running frame count. Set
the block parameters as follows:
Configuration Parameters
Set the configuration parameters. Open the Configuration dialog box by selecting Model
Configuration Parameters from the Simulation menu. Set the parameters as follows:
Draw Shapes and Lines
Rectangle
Shape              PTS input
Single rectangle   Four-element row vector [x y width height]
M rectangles       M-by-4 matrix, one rectangle per row:
                   x1 y1 width1 height1
                   x2 y2 width2 height2
                   ⋮  ⋮  ⋮      ⋮
                   xM yM widthM heightM
Polygon
You can draw one or more polygons.
Circle
You can draw one or more circles. The PTS input is an M-by-3 matrix, one circle per row:
x1 y1 radius1
x2 y2 radius2
⋮  ⋮  ⋮
xM yM radiusM
See Also
Insert Text | insertMarker | insertObjectAnnotation | insertShape
Detect Edges in Images
You can open the example model by typing the following at the MATLAB command line:
ex_vision_detect_edges_in_image
Open the Configuration dialog box by selecting Model Configuration Parameters from
the Simulation menu. The parameters are set as follows:
The Video Viewer1 window displays the edges of the rice grains in white and the
background in black.
The Video Viewer2 window displays the intensity image of the vertical gradient components of the image. You can see that the vertical edges of the rice grains are darker and better defined than the horizontal edges.
The Video Viewer3 window displays the intensity image of the horizontal gradient components of the image. In this image, the horizontal edges of the rice grains are better defined.
The Edge Detection block convolves the input matrix with the Sobel kernel. This
calculates the gradient components of the image that correspond to the horizontal and
vertical edge responses. The block outputs these components at the Gh and Gv ports,
respectively. Then the block performs a thresholding operation on the gradient
components to find the binary image. The binary image is a matrix filled with 1s and 0s.
The nonzero elements of this matrix correspond to the edge pixels and the zero elements
correspond to the background pixels. The block outputs the binary image at the Edge
port.
The matrix values at the Gv and Gh output ports of the Edge Detection block are double-
precision floating-point. These matrix values need to be scaled between 0 and 1 in order
to display them using the Video Viewer blocks. This is done with the Statistics and Math
Operation blocks.
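A minimal MATLAB sketch of the same computation, outside of Simulink (it assumes the Image Processing Toolbox sample image rice.png):
% Minimal sketch: Sobel edge detection with gradient components, mirroring the block output.
I = im2double(imread('rice.png'));      % assumed sample intensity image
[BW,~,Gv,Gh] = edge(I,'sobel');         % binary edge image plus vertical and horizontal gradients
% Rescale the gradient components to [0 1] for display, as the Statistics and Math
% Operation blocks do in the model.
figure
imshowpair(mat2gray(Gv),mat2gray(Gh),'montage')
figure
imshow(BW)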
Run the model faster by double-clicking the Edge Detection block and clearing the Edge thinning check box.
Your model runs faster because the Edge Detection block is more efficient when you clear
the Edge thinning check box. However, the edges of rice grains in the Video Viewer
window are wider.
bdclose('ex_vision_detect_edges_in_image');
Detect Lines in Images
You can open the example model by typing the following at the MATLAB command line:
ex_vision_detect_lines
The Video Viewer blocks display the original image, the image with all edges found, and
the image with the longest line annotated.
The Edge Detection block finds the edges in the intensity image. This process improves
the efficiency of the Hough Lines block by reducing the image area over which the block
searches for lines. The block also converts the image to a binary image, which is the
required input for the Hough Transform block.
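The equivalent MATLAB workflow uses the hough, houghpeaks, and houghlines functions. A minimal sketch (the sample image is an assumption):
% Minimal sketch: detect lines by combining edge detection with the Hough transform.
I  = rgb2gray(imread('gantrycrane.png'));   % assumed sample image
BW = edge(I,'canny');                       % binary edge image, the required Hough transform input
[H,theta,rho] = hough(BW);
peaks = houghpeaks(H,5);                    % keep the five strongest peaks
lines = houghlines(BW,theta,rho,peaks);
imshow(I), hold on
for k = 1:numel(lines)
    xy = [lines(k).point1; lines(k).point2];
    plot(xy(:,1),xy(:,2),'LineWidth',2,'Color','green')
end
hold off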
For additional examples of the techniques used in this section, see the following list of
examples. You can open these examples by typing the title at the MATLAB command
prompt:
Set the block parameters that draw the detected lines as follows:
• Shape = Lines
• Border color = White
Configuration Parameters
Set the configuration parameters. Open the Configuration dialog box by selecting Model
Configuration Parameters from the Simulation menu. Set the parameters as follows:
Fisheye Calibration Basics
Fisheye cameras are used in visual odometry and to solve simultaneous localization and mapping (SLAM) problems. Other applications include surveillance systems, action cameras such as the GoPro, virtual reality (VR) capture of a 360-degree field of view (FOV), and image stitching algorithms. These cameras use a complex series of lenses to enlarge the camera's field of view, enabling them to capture wide panoramic or hemispherical images. However, the lenses achieve this extremely wide-angle view by distorting the lines of perspective in the images.
Because of the extreme distortion that a fisheye lens produces, the pinhole model cannot model a fisheye camera.
Extrinsic Parameters
The extrinsic parameters consist of a rotation, R, and a translation, t. The origin of the
camera's coordinate system is at its optical center and its x- and y-axis define the image
plane.
Intrinsic Parameters
For the fisheye camera model, the intrinsic parameters include the polynomial mapping
coefficients of the projection function. The alignment coefficients are related to sensor
alignment and the transformation from the sensor plane to a pixel location in the camera
image plane.
The following equation maps an image point into its corresponding 3-D vector.
$$\begin{bmatrix} X_c \\ Y_c \\ Z_c \end{bmatrix} = \lambda \begin{bmatrix} u \\ v \\ a_0 + a_2 r^2 + a_3 r^3 + a_4 r^4 \end{bmatrix}$$

where $\lambda$ is a scale factor, $a_0, a_2, a_3, a_4$ are the polynomial mapping coefficients, and $r$ is the distance of the image point $(u, v)$ from the image center:

$$r = \sqrt{u^2 + v^2}$$
The intrinsic parameters also account for stretching and distortion. The stretch matrix
compensates for the sensor-to-lens misalignment, and the distortion vector adjusts the
(0,0) location of the image plane.
The following equation relates the real distorted coordinates (u'',v'') to the ideal distorted
coordinates (u,v).
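The calibration steps below assume that you have already created a datastore of fisheye checkerboard images and chosen the checkerboard square size. A minimal setup sketch (the sample image folder and the square size are assumptions):
% Assumed setup for the calibration steps that follow.
images = imageDatastore(fullfile(toolboxdir('vision'),'visiondata', ...
    'calibration','gopro'));    % assumed folder of fisheye checkerboard images
squareSize = 29;                % assumed checkerboard square size, in millimeters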
[imagePoints,boardSize] = detectCheckerboardPoints(images.Files);
worldPoints = generateCheckerboardPoints(boardSize,squareSize);
Estimate the fisheye camera calibration parameters based on the image and world points.
Use the first image to get the image size.
I = readimage(images,1);
imageSize = [size(I,1) size(I,2)];
params = estimateFisheyeParameters(imagePoints,worldPoints,imageSize);
Remove lens distortion from the first image I and display the results.
J1 = undistortFisheyeImage(I,params.Intrinsics);
figure
imshowpair(I,J1,'montage')
title('Original Image (left) vs. Corrected Image (right)')
J2 = undistortFisheyeImage(I,params.Intrinsics,'OutputView','full');
figure
imshow(J2)
title('Full Output View')
References
[1] Scaramuzza, D., A. Martinelli, and R. Siegwart. "A Toolbox for Easy Calibrating Omnidirectional Cameras." Proceedings of the IEEE International Conference on Intelligent Robots and Systems (IROS). Beijing, China, October 7–15, 2006.
See Also
estimateFisheyeParameters | fisheyeCalibrationErrors |
fisheyeIntrinsics | fisheyeIntrinsicsEstimationErrors |
fisheyeParameters | undistortFisheyeImage | undistortFisheyePoints
Related Examples
• “Monocular Visual Odometry”
Single Camera Calibrator App
In this section...
“Camera Calibrator Overview” on page 6-21
“Single Camera Calibration” on page 6-21
“Open the Camera Calibrator” on page 6-22
“Prepare the Pattern, Camera, and Images” on page 6-22
“Add Images and Select Camera Model” on page 6-26
“Calibrate” on page 6-30
“Evaluate Calibration Results” on page 6-32
“Improve Calibration” on page 6-37
“Export Camera Parameters” on page 6-41
The suite of calibration functions used by the Camera Calibrator app provides the workflow for camera calibration. You can use these functions directly in the MATLAB workspace. For a list of functions, see “Single and Stereo Camera Calibration”.
In some cases, the default values work well, and you do not need to make any
improvements before exporting parameters. You can also make improvements using the
camera calibration functions directly in the MATLAB workspace. For a list of functions,
see “Single and Stereo Camera Calibration”.
Note The Camera Calibrator app supports only checkerboard patterns. If you are using a
different type of calibration pattern, you can still calibrate your camera using the
estimateCameraParameters function. Using a different type of pattern requires that
you supply your own code to detect the pattern points in the image.
You can also use the camera calibration MATLAB functions directly. See “Single and Stereo Camera Calibration” for the list of functions.
You can print (from MATLAB) and use the checkerboard pattern provided. The
checkerboard pattern you use must not be square. One side must contain an even number
of squares and the other side must contain an odd number of squares. Therefore, the
pattern contains two black corners along one side and two white corners on the opposite
side. This criterion enables the app to determine the orientation of the pattern. The
calibrator assigns the longer side to be the x-direction.
1 Attach the checkerboard printout to a flat surface. Imperfections on the surface can
affect the accuracy of the calibration.
2 Measure one side of the checkerboard square. You need this measurement for
calibration. The size of the squares can vary depending on printer settings.
3 To improve the detection speed, set up the pattern with as little background clutter
as possible.
Camera Setup
Capture Images
For better results, use at least 10 to 20 images of the calibration pattern. The calibrator
requires at least three images. Use uncompressed images or images in lossless
compression formats such as PNG. For greater calibration accuracy:
• Capture the images of the pattern at a distance roughly equal to the distance from
your camera to the objects of interest. For example, if you plan to measure objects
from 2 meters, keep your pattern approximately 2 meters from the camera.
• Place the checkerboard at an angle less than 45 degrees relative to the camera plane.
• Capture a variety of images of the pattern so that you have accounted for as much of
the image frame as possible. Lens distortion increases radially from the center of the
image and sometimes is not uniform across the image frame. To capture this lens
distortion, the pattern must appear close to the edges of the captured images.
The Calibrator works with a range of checkerboard square sizes. As a general rule, your
checkerboard should fill at least 20% of the captured image. For example, the preceding
images were taken with a checkerboard square size of 108 mm, as the following montage
shows:
On the Calibration tab, in the File section, click Add images, and then select From
file. You can add images from multiple folders by clicking Add images for each folder.
To begin calibration, you must add images. You can acquire live images from a webcam
using the MATLAB Webcam support. To use this feature, you must install MATLAB
Support Package for USB Webcams. See “Install the MATLAB Support Package for USB
Webcams” (Image Acquisition Toolbox) for information on installing the support package.
To add live images, follow these steps.
1 On the Calibration tab, in the File section, click Add Images, then select From
camera.
This action opens the Camera tab. If you have only one webcam connected to
your system, it is selected by default and a live preview window opens. If you have
multiple cameras connected and want to use one different from the default, select
that specific camera in the Camera list.
2 Set properties for the camera to control the image (optional). Click Camera Properties to open a menu of the properties for the selected camera. This list varies
depending on your device.
Use the sliders or drop-down list to change any available property settings. The
Preview window updates dynamically when you change a setting. When you are done
setting properties, click anywhere outside of the menu box to dismiss the properties
list.
3 Enter a location for the acquired image files in the Save Location box by typing the
path to the folder or using the Browse button. You must have permission to write to
the folder you select.
4 Set the capture parameters.
• To set the number of seconds between image captures, use the Capture Interval
box or slider. The default is 5 seconds, the minimum is 1 second, and the
maximum is 60 seconds.
• To set the number of image captures, use the Number of images to capture box
or slider. The default is 20 images, the minimum is 2 images, and the maximum is
100 images.
In the default configuration, a total of 20 images are captured, one every 5 seconds.
5 The Preview window shows the live images streamed as RGB data. After you adjust
any device properties and capture settings, use the Preview window as a guide to line
up the camera to acquire the checkerboard pattern image you want to capture.
6 Click the Capture button. The number of images you set are captured and the
thumbnails of the snapshots appear in the Data Browser pane. They are
automatically named incrementally and are captured as .png files.
You can optionally stop the image capture before the designated number of images
are captured by clicking Stop Capture.
When you are capturing images of a checkerboard, after the designated number of
images are captured, a Checkerboard Square Size dialog box displays. Specify the
size of the checkerboard square, then click OK.
The detection results are then calculated and displayed. For example:
Analyze Images
After you add the images, the Checkerboard Square Size dialog box appears. Specify the size of the checkerboard square by entering the length of one side of a square from the checkerboard pattern.
The calibrator attempts to detect a checkerboard in each of the added images, displaying
an Analyzing Images progress bar window, indicating detection progress. If any of the
images are rejected, the Detection Results dialog box appears, which contains diagnostic
information. The results indicate how many total images were processed, and of those
processed, how many were accepted, rejected, or skipped. The calibrator skips duplicate
images.
To view the rejected images, click View images. The calibrator rejects duplicate images.
It also rejects images where the entire checkerboard could not be detected. Possible
reasons for no detection are a blurry image or an extreme angle of the pattern. Detection
takes longer with larger images and with patterns that contain a large number of squares.
The Data Browser pane displays a list of images with IDs. These images contain a
detected pattern. To view an image, select it from the Data Browser pane.
The Image window displays the selected checkerboard image with green circles to
indicate detected points. You can verify that the corners were detected correctly using the
zoom controls. The yellow square indicates the (0,0) origin. The X and Y arrows indicate
the checkerboard axes orientation.
Calibrate
Once you are satisfied with the accepted images, click the Calibrate button on the
Calibration tab. The default calibration settings assume the minimum set of camera
parameters. Start by running the calibration with the default settings. After evaluating
the results, you can try to improve calibration accuracy by adjusting the settings and
adding or removing images, and then calibrating again. If you switch between the standard and fisheye camera models, you must recalibrate.
To select either a standard or fisheye camera model, on the Calibration tab, in the Camera Model section, select Standard or Fisheye.
You can switch camera models at any point in the session. You must calibrate again after
any changes you make to the app's settings. Click Options to access settings and
optimizations for either camera model.
When the camera has severe lens distortion, the app can fail to compute the initial values
for the camera intrinsics. If you have the manufacturer’s specifications for your camera
and know the pixel size, focal length, or lens characteristics, you can manually set initial
guesses for camera intrinsics and radial distortion. To set initial guesses, click Options >
Optimization Options.
• Select the top checkbox and then enter a 3-by-3 matrix to specify initial intrinsics. If
you do not specify an initial guess, the function computes the initial intrinsic matrix
using linear least squares.
• Select the bottom checkbox and then enter a 2- or 3-element vector to specify the
initial radial distortion. If you do not provide a value, the function uses 0 as the initial
value for all the coefficients.
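The same initial guesses can be supplied programmatically to estimateCameraParameters. A minimal sketch (the numeric values and input variables are assumptions):
% Minimal sketch: provide initial guesses when calibrating programmatically.
initIntrinsics = [800 0 0; 0 800 0; 320 240 1];      % assumed 3-by-3 initial intrinsic matrix
params = estimateCameraParameters(imagePoints,worldPoints, ...
    'InitialIntrinsicMatrix',initIntrinsics, ...
    'InitialRadialDistortion',[0 0]);                % assumed 2-element initial radial distortion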
In the Camera Model section, with Fisheye selected, click Options. Select Estimate
Alignment to enable estimation of the axes alignment when the optical axis of the fisheye
lens is not perpendicular to the image plane.
Calibration Algorithm
See “Fisheye Calibration Basics” on page 6-13 for the fisheye camera model calibration
algorithm.
The standard camera model calibration algorithm assumes a pinhole camera model:
$$w\,[x \;\; y \;\; 1] = [X \;\; Y \;\; Z \;\; 1]\begin{bmatrix} R \\ t \end{bmatrix} K$$
where the camera intrinsic matrix, K, is defined as:

$$K = \begin{bmatrix} f_x & 0 & 0 \\ s & f_y & 0 \\ c_x & c_y & 1 \end{bmatrix}$$
The coordinates (cx cy) represent the optical center (the principal point), in pixels.
When the x- and y-axes are exactly perpendicular, the skew parameter, s, equals 0. The
matrix elements are defined as:
fx = F*sx
fy = F*sy
F is the focal length in world units, typically expressed in millimeters.
[sx, sy] are the number of pixels per world unit in the x and y direction respectively.
fx and fy are expressed in pixels.
• R: matrix representing the 3-D rotation of the camera.
• t: translation of the camera relative to the world coordinate system.
The camera calibration algorithm estimates the values of the intrinsic parameters, the
extrinsic parameters, and the distortion coefficients. Camera calibration involves these
steps:
1 Solve for the intrinsics and extrinsics in closed form, assuming that lens distortion is
zero. [1]
2 Estimate all parameters simultaneously, including the distortion coefficients, using
nonlinear least-squares minimization (Levenberg–Marquardt algorithm). Use the
closed-form solution from the preceding step as the initial estimate of the intrinsics
and extrinsics. Set the initial estimate of the distortion coefficients to zero. [1][2]
The reprojection errors are the distances, in pixels, between the detected and the
reprojected points. The Camera Calibrator app calculates reprojection errors by
projecting the checkerboard points from world coordinates, defined by the checkerboard,
into image coordinates. The app then compares the reprojected points to the
corresponding detected points. As a general rule, mean reprojection errors of less than
one pixel are acceptable.
The Camera Calibrator app displays the reprojection errors, in pixels, as a bar graph. The graph helps you identify which images adversely contribute to the calibration. Select the bar graph entry and remove the image from the list of images in the Data Browser pane.
The 3-D extrinsic parameters plot provides a camera-centric view of the patterns and a
pattern-centric view of the camera. The camera-centric view is helpful if the camera was
stationary when the images were captured. The pattern-centric view is helpful if the
pattern was stationary. You can click the cursor and hold down the mouse button with the
rotate icon to rotate the figure. Click a checkerboard (or camera) to select it. The
highlighted data in the visualizations correspond to the selected image in the list.
Examine the relative positions of the pattern and the camera to determine if they match
what you expect. For example, a pattern that appears behind the camera indicates a
calibration error.
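You can produce the same plots programmatically for exported parameters. A minimal sketch (cameraParams is assumed to come from the app or from estimateCameraParameters):
% Minimal sketch: reproduce the app's evaluation plots from a cameraParameters object.
figure
showReprojectionErrors(cameraParams)            % bar graph of mean reprojection error per image
figure
showExtrinsics(cameraParams,'CameraCentric')    % 3-D plot of pattern locations relative to the camera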
To view the effects of removing lens distortion, click Show Undistorted in the View
section of the Calibration tab. If the calibration was accurate, the distorted lines in the
image become straight.
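Outside the app, the undistortImage function applies the same correction. A minimal sketch (I and cameraParams are assumed to exist, for example from imread and a previous calibration):
% Minimal sketch: remove lens distortion from an image using the calibration results.
J = undistortImage(I,cameraParams);
figure
imshowpair(I,J,'montage')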
Checking the undistorted images is important even if the reprojection errors are low. For
example, if the pattern covers only a small percentage of the image, the distortion
estimation might be incorrect, even though the calibration resulted in few reprojection
errors. The following image shows an example of this type of incorrect estimation for a
single camera calibration.
While viewing the undistorted images, you can examine the fisheye images more closely
by selecting Fisheye Scale in the View section of the Calibration tab. Use the slider in
the Scale Factor window to adjust the scale of the image.
Improve Calibration
To improve the calibration, you can remove high-error images, add more images, or
modify the calibrator settings.
You can specify two or three radial distortion coefficients. On the Calibration tab, in the Camera Model section, with Standard selected, click Options. Select the Radial Distortion as either 2 Coefficients or 3 Coefficients. Radial distortion occurs when
light rays bend more near the edges of a lens than they do at its optical center. The
smaller the lens, the greater the distortion.
The radial distortion coefficients model this type of distortion. The distorted points are
denoted as (xdistorted, ydistorted):
Typically, two coefficients are sufficient for calibration. For severe distortion, such as in
wide-angle lenses, you can select 3 coefficients to include k3.
The undistorted pixel locations are in normalized image coordinates, with the origin at
the optical center. The coordinates are expressed in world units.
When you select the Compute Skew check box, the calibrator estimates the image axes
skew. Some camera sensors contain imperfections that cause the x- and y-axes of the
image to not be perpendicular. You can model this defect using a skew parameter. If you
do not select the check box, the image axes are assumed to be perpendicular, which is the
case for most modern cameras.
Tangential distortion occurs when the lens and the image plane are not parallel. The
tangential distortion coefficients model this type of distortion.
When you select the Compute Tangential Distortion check box, the calibrator
estimates the tangential distortion coefficients. Otherwise, the calibrator sets the
tangential distortion coefficients to zero.
In the Camera Model section, with Fisheye selected, click Options. Select Estimate
Alignment to enable estimation of the axes alignment when the optical axis of the fisheye
lens is not perpendicular to the image plane.
Select Export Camera Parameters > Generate MATLAB script to save your camera
parameters to a MATLAB script, enabling you to reproduce the steps from your
calibration session.
References
[1] Zhang, Z. “A Flexible New Technique for Camera Calibration.” IEEE Transactions on Pattern Analysis and Machine Intelligence. Vol. 22, No. 11, 2000, pp. 1330–1334.
[2] Heikkila, J. and O. Silven. “A Four-step Camera Calibration Procedure with Implicit
Image Correction.” IEEE International Conference on Computer Vision and
Pattern Recognition. 1997.
[3] Scaramuzza, D., A. Martinelli, and R. Siegwart. "A Toolbox for Easy Calibrating Omnidirectional Cameras." Proceedings of the IEEE International Conference on Intelligent Robots and Systems (IROS 2006). Beijing, China, October 7–15, 2006.
[4] Urban, S., J. Leitloff, and S. Hinz. "Improved Wide-Angle, Fisheye and Omnidirectional Camera Calibration." ISPRS Journal of Photogrammetry and Remote Sensing. Vol. 108, 2015, pp. 72–79.
See Also
Camera Calibrator | Stereo Camera Calibrator | cameraParameters |
detectCheckerboardPoints | estimateCameraParameters |
generateCheckerboardPoints | showExtrinsics | showReprojectionErrors |
stereoParameters | undistortImage
Related Examples
• “Evaluating the Accuracy of Single Camera Calibration”
• “Measuring Planar Objects with a Calibrated Camera”
• “Structure From Motion From Two Views”
• “Structure From Motion From Multiple Views”
• “Depth Estimation From Stereo Video”
• “3-D Point Cloud Registration and Stitching”
• “Uncalibrated Stereo Image Rectification”
• Checkerboard pattern
More About
• “Stereo Camera Calibrator App” on page 6-43
• “Coordinate Systems”
External Websites
• Camera Calibration with MATLAB
Stereo Camera Calibrator App
The Stereo Camera Calibrator app produces an object containing the stereo camera parameters. You can use this object to rectify stereo images using the rectifyStereoImages function, reconstruct the 3-D scene using the reconstructScene function, or compute 3-D locations corresponding to matching pairs of image points using the triangulate function.
The suite of calibration functions used by the Stereo Camera Calibrator app provides the
workflow for stereo system calibration. You can use these functions directly in the
MATLAB workspace. For a list of calibration functions, see “Single and Stereo Camera
Calibration”.
Note You can use the Camera Calibrator app with cameras up to a field of view (FOV) of
95 degrees.
Follow this workflow to calibrate your stereo camera using the app:
You can print (from MATLAB) and use the checkerboard pattern provided. The
checkerboard pattern you use must not be square. One side must contain an even number
of squares and the other side must contain an odd number of squares. Therefore, the
pattern contains two black corners along one side and two white corners on the opposite
side. This criterion enables the app to determine the orientation of the pattern. The
calibrator assigns the longer side to be the x-direction.
1 Attach the checkerboard printout to a flat surface. Imperfections on the surface can
affect the accuracy of the calibration.
2 Measure one side of the checkerboard square. You need this measurement for
calibration. The size of the squares can vary depending on printer settings.
3 To improve the detection speed, set up the pattern with as little background clutter
as possible.
Camera Setup
Capture Images
For best results, use at least 10 to 20 images of the calibration pattern. The calibrator
requires at least three images. Use uncompressed images or images in lossless
compression formats such as PNG. For greater calibration accuracy:
• Capture the images of the pattern at a distance roughly equal to the distance from
your camera to the objects of interest. For example, if you plan to measure objects
from 2 meters, keep your pattern approximately 2 meters from the camera.
• Place the checkerboard at an angle less than 45 degrees relative to the camera plane.
• Make sure the checkerboard pattern is fully visible in both images of each stereo pair.
• Keep the pattern stationary for each image pair. Any motion of the pattern between
taking image 1 and image 2 of the pair negatively affects the calibration.
• Create a stereo display, or anaglyph, by positioning the two cameras approximately 55
mm apart. This distance represents the average distance between human eyes.
• For greater reconstruction accuracy at longer distances, position your cameras farther
apart.
Load Images
You can add images from multiple folders by clicking Add images in the File section of
the Calibration tab. Select the location for the images corresponding to camera 1 using
the Browse button, then do the same for camera 2. Specify Size of checkerboard
square by entering the length of one side of a square from the checkerboard pattern.
Analyze Images
The calibrator attempts to detect a checkerboard in each of the added images, displaying
an Analyzing Images progress bar window, indicating detection progress. If any of the
images are rejected, the Detection Results dialog box appears, which contains diagnostic
information. The results indicate how many total images were processed, and of those
processed, how many were accepted, rejected, or skipped. The calibrator skips duplicate
images.
To view the rejected images, click View images. The calibrator rejects duplicate images.
It also rejects images where the entire checkerboard could not be detected. Possible
reasons for no detection are a blurry image or an extreme angle of the pattern. Detection
takes longer with larger images and with patterns that contain a large number of squares.
The Data Browser pane displays a list of image pairs with IDs. These image pairs contain
a detected pattern. To view an image, select it from the Data Browser pane.
The Image pane displays the selected checkerboard image pair with green circles to
indicate detected points. You can verify that the corners were detected correctly using the
zoom controls. The yellow square indicates the (0,0) origin. The X and Y arrows indicate
the checkerboard axes orientation.
Intrinsics
You can choose for the app to compute camera intrinsics or you can load pre-computed
fixed intrinsics. To load intrinsics into the app, select Use Fixed Intrinsics in the
Intrinsics section of the Calibration tab. The Radial Distortion and Compute options in
the Options section are disabled when you load intrinsics.
To load intrinsics as variables from your workspace, click Load Intrinsics. For example, suppose that the wideBaselineStereo struct contains the intrinsics for both cameras:
ld = load('wideBaselineStereo');
int1 = ld.intrinsics1
int2 = ld.intrinsics2
Then, click Load Intrinsics to specify these variables in the dialog box, as shown.
Calibrate
Once you are satisfied with the accepted image pairs, click the Calibrate button on the
Calibration tab. The default calibration settings assume the minimum set of camera
parameters. Start by running the calibration with the default settings. After evaluating
the results, you can try to improve calibration accuracy by adjusting the settings and
adding or removing images, and then calibrate again.
Optimization
When the camera has severe lens distortion, the app can fail to compute the initial values
for the camera intrinsics. If you have the manufacturer’s specifications for your camera
and know the pixel size, focal length, or lens characteristics, you can manually set initial
guesses for camera intrinsics and radial distortion. To set initial guesses, click Options >
Optimization Options.
• Select the top checkbox and then enter a 3-by-3 matrix to specify initial intrinsics. If
you do not specify an initial guess, the function computes the initial intrinsic matrix
using linear least squares.
• Select the bottom checkbox and then enter a 2- or 3-element vector to specify the
initial radial distortion. If you do not provide a value, the function uses 0 as the initial
value for all the coefficients.
The reprojection errors are the distances, in pixels, between the detected and the
reprojected points. The Stereo Camera Calibrator app calculates reprojection errors by
projecting the checkerboard points from world coordinates, defined by the checkerboard,
into image coordinates. The app then compares the reprojected points to the
corresponding detected points. As a general rule, mean reprojection errors of less than
one pixel are acceptable.
The Stereo Camera Calibrator app displays the reprojection errors, in pixels, as a bar graph. The graph helps you identify which images adversely contribute to the calibration. Select the bar graph entry and remove the image from the list of images in the Data Browser pane.
The 3-D extrinsic parameters plot provides a camera-centric view of the patterns and a
pattern-centric view of the camera. The camera-centric view is helpful if the camera was
stationary when the images were captured. The pattern-centric view is helpful if the
pattern was stationary. You can click the cursor and hold down the mouse button with the
rotate icon to rotate the figure. Click a checkerboard (or camera) to select it. The
highlighted data in the visualizations correspond to the selected image in the list.
Examine the relative positions of the pattern and the camera to determine if they match
what you expect. For example, a pattern that appears behind the camera indicates a
calibration error.
To view the effects of stereo rectification, click Show Rectified in the View section of the
Calibration tab. If the calibration was accurate, the images become undistorted and row-
aligned.
Checking the rectified images is important even if the reprojection errors are low. For
example, if the pattern covers only a small percentage of the image, the distortion
estimation might be incorrect, even though the calibration resulted in few reprojection
errors. The following image shows an example of this type of incorrect estimation for a
single camera calibration.
Improve Calibration
To improve the calibration, you can remove high-error image pairs, add more image pairs,
or modify the calibrator settings.
You can specify 2 or 3 radial distortion coefficients by selecting the corresponding radio
button from the Options section. Radial distortion occurs when light rays bend more near
the edges of a lens than they do at its optical center. The smaller the lens, the greater the
distortion.
The radial distortion coefficients model this type of distortion. The distorted points are
denoted as (xdistorted, ydistorted):
Typically, two coefficients are sufficient for calibration. For severe distortion, such as in
wide-angle lenses, you can select 3 coefficients to include k3.
Compute Skew
When you select the Compute Skew check box, the calibrator estimates the image axes
skew. Some camera sensors contain imperfections that cause the x- and y-axes of the
image to not be perpendicular. You can model this defect using a skew parameter. If you
do not select the check box, the image axes are assumed to be perpendicular, which is the
case for most modern cameras.
Tangential distortion occurs when the lens and the image plane are not parallel. The
tangential distortion coefficients model this type of distortion.
When you select the Compute Tangential Distortion check box, the calibrator
estimates the tangential distortion coefficients. Otherwise, the calibrator sets the
tangential distortion coefficients to zero.
Select Export Camera Parameters > Generate MATLAB script to save your camera
parameters to a MATLAB script, enabling you to reproduce the steps from your
calibration session.
References
[1] Zhang, Z. “A Flexible New Technique for Camera Calibration.” IEEE Transactions on Pattern Analysis and Machine Intelligence. Vol. 22, No. 11, 2000, pp. 1330–1334.
[2] Heikkila, J., and O. Silven. “A Four-step Camera Calibration Procedure with Implicit
Image Correction.” IEEE International Conference on Computer Vision and
Pattern Recognition. 1997.
See Also
Camera Calibrator | Stereo Camera Calibrator | cameraParameters |
detectCheckerboardPoints | estimateCameraParameters |
generateCheckerboardPoints | showExtrinsics | showReprojectionErrors |
stereoParameters | undistortImage
Related Examples
• “Evaluating the Accuracy of Single Camera Calibration”
• “Measuring Planar Objects with a Calibrated Camera”
• “Structure From Motion From Two Views”
• “Structure From Motion From Multiple Views”
• “Depth Estimation From Stereo Video”
• “3-D Point Cloud Registration and Stitching”
• “Uncalibrated Stereo Image Rectification”
• Checkerboard pattern
More About
• “Single Camera Calibrator App” on page 6-21
• “Coordinate Systems”
External Websites
• Camera Calibration with MATLAB
To evaluate the calibration, you can:
• Plot the relative locations of the camera and the calibration pattern.
• Calculate the reprojection errors.
• Calculate the parameter estimation errors.
What Is Camera Calibration?
Use the Camera Calibrator to perform camera calibration and evaluate the accuracy of
the estimated parameters.
Camera Model
The Computer Vision Toolbox calibration algorithm uses the camera model proposed by
Jean-Yves Bouguet [3]. The model includes:
The pinhole camera model does not account for lens distortion because an ideal pinhole
camera does not have a lens. To accurately represent a real camera, the full camera
model used by the algorithm includes the radial and tangential lens distortion.
[Figure: pinhole camera model showing the 3-D object, the focal point, the focal length, the image plane, and the virtual image plane.]
The pinhole camera parameters are represented in a 4-by-3 matrix called the camera
matrix. This matrix maps the 3-D world scene into the image plane. The calibration
algorithm calculates the camera matrix using the extrinsic and intrinsic parameters. The
extrinsic parameters represent the location of the camera in the 3-D scene. The intrinsic
parameters represent the optical center and focal length of the camera.
$$w\,[x \;\; y \;\; 1] = [X \;\; Y \;\; Z \;\; 1]\,P$$

where $w$ is a scale factor, $[x \; y \; 1]$ are the image points, and $[X \; Y \; Z \; 1]$ are the world points. The camera matrix $P$ combines the extrinsic rotation $R$ and translation $t$ with the intrinsic matrix $K$:

$$P = \begin{bmatrix} R \\ t \end{bmatrix} K$$
The world points are transformed to camera coordinates using the extrinsics parameters.
The camera coordinates are mapped into the image plane using the intrinsics parameters.
[Figure: the world coordinate system (Ow) maps to the camera coordinate system (Oc) through the extrinsic parameters R and t, and camera coordinates map to the image coordinate system (Oi) through the intrinsic parameters K.]
Extrinsic Parameters
The extrinsic parameters consist of a rotation, R, and a translation, t. The origin of the
camera’s coordinate system is at its optical center and its x- and y-axis define the image
plane.
[Figure: the extrinsic parameters, a rotation R and a translation t, relate the world coordinate axes (X, Y, Z) to the camera coordinate axes.]
Intrinsic Parameters
The intrinsic parameters include the focal length, the optical center, also known as the
principal point, and the skew coefficient. The camera intrinsic matrix, K, is defined as:
$$K = \begin{bmatrix} f_x & 0 & 0 \\ s & f_y & 0 \\ c_x & c_y & 1 \end{bmatrix}$$
[Figure: pixel dimensions Px and Py, and the skew of the image axes.]
Radial Distortion
Radial distortion occurs when light rays bend more near the edges of a lens than they do
at its optical center. The smaller the lens, the greater the distortion.
The radial distortion coefficients model this type of distortion. The distorted points are
denoted as (xdistorted, ydistorted):
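The referenced equations are the standard radial distortion model used by the toolbox, restated here for completeness:

$$x_{\text{distorted}} = x\,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6)$$
$$y_{\text{distorted}} = y\,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6)$$

where $x$ and $y$ are the undistorted pixel locations in normalized image coordinates, $k_1$, $k_2$, and $k_3$ are the radial distortion coefficients, and $r^2 = x^2 + y^2$.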
Typically, two coefficients are sufficient for calibration. For severe distortion, such as in
wide-angle lenses, you can select 3 coefficients to include k3.
Tangential Distortion
Tangential distortion occurs when the lens and the image plane are not parallel. The
tangential distortion coefficients model this type of distortion.
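The corresponding equations are the standard tangential distortion model, where $p_1$ and $p_2$ are the tangential distortion coefficients:

$$x_{\text{distorted}} = x + \left[2 p_1 x y + p_2 (r^2 + 2x^2)\right]$$
$$y_{\text{distorted}} = y + \left[p_1 (r^2 + 2y^2) + 2 p_2 x y\right]$$

with $r^2 = x^2 + y^2$ as above.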
References
[1] Zhang, Z. “A Flexible New Technique for Camera Calibration.” IEEE Transactions on
Pattern Analysis and Machine Intelligence. Vol. 22, No. 11, 2000, pp. 1330–1334.
[2] Heikkila, J., and O. Silven. “A Four-step Camera Calibration Procedure with Implicit
Image Correction.” IEEE International Conference on Computer Vision and
Pattern Recognition. 1997.
[3] Bouguet, J. Y. “Camera Calibration Toolbox for Matlab.” Computational Vision at the
California Institute of Technology. Camera Calibration Toolbox for MATLAB.
[4] Bradski, G., and A. Kaehler. Learning OpenCV: Computer Vision with the OpenCV
Library. Sebastopol, CA: O'Reilly, 2008.
See Also
“Single Camera Calibrator App” on page 6-21 | Camera Calibrator
Related Examples
• “Evaluating the Accuracy of Single Camera Calibration”
• “Measuring Planar Objects with a Calibrated Camera”
• “Structure From Motion From Two Views”
Structure from Motion
Structure from motion (SfM) is the process of estimating the 3-D structure of a scene
from a set of 2-D images. SfM is used in many applications, such as 3-D scanning and
augmented reality.
SfM can be computed in many different ways. The way in which you approach the
problem depends on different factors, such as the number and type of cameras used, and
whether the images are ordered. If the images are taken with a single calibrated camera,
then the 3-D structure and camera motion can only be recovered up to scale. Up to scale means that you can rescale the structure and the magnitude of the camera motion and still maintain the same observations. For example, if you put a camera close to an object, you can
see the same image as when you enlarge the object and move the camera far away. If you
want to compute the actual scale of the structure and motion in world units, you need
additional information, such as:
The triangulate function takes two camera matrices, which you can compute using
cameraMatrix.
Use pcshow to display the reconstruction, and use plotCamera to visualize the camera poses.
To recover the scale of the reconstruction, you need additional information. One method
to recover the scale is to detect an object of a known size in the scene. The “Structure
From Motion From Two Views” example shows how to recover scale by detecting a
sphere of a known size in the point cloud of the scene.
The approach used for SfM from two views can be extended for multiple views. The set of
multiple views used for SfM can be ordered or unordered. The approach taken here
assumes an ordered sequence of views. SfM from multiple views requires point correspondences across multiple images, called tracks.
Using the approach in SfM from two views, you can find the pose of camera 2 relative to
camera 1. To extend this approach to the multiple view case, find the pose of camera 3
relative to camera 2, and so on. The relative poses must be transformed into a common
coordinate system. Typically, all camera poses are computed relative to camera 1 so that
all poses are in the same coordinate system. You can use viewSet to manage camera
poses. The viewSet object stores the views and connections between the views.
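A minimal sketch of accumulating poses in a viewSet (the pose and match variables are assumptions):
% Minimal sketch: store camera poses and view connections in a viewSet.
vSet = viewSet;
vSet = addView(vSet,1,'Orientation',eye(3),'Location',[0 0 0]);  % camera 1 defines the coordinate system
vSet = addView(vSet,2,'Orientation',orient2,'Location',loc2);    % assumed pose of camera 2 in camera 1's frame
vSet = addConnection(vSet,1,2,'Matches',indexPairs);             % assumed point matches between views 1 and 2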
Every camera pose estimation from one view to the next contains errors. The errors arise
from imprecise point localization in images, and from noisy matches and imprecise
calibration. These errors accumulate as the number of views increases, an effect known
as drift. One way to reduce the drift is to refine camera poses and 3-D point locations.
The nonlinear optimization algorithm, called bundle adjustment, implemented by the
bundleAdjustment function, can be used for the refinement.
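A minimal sketch of the refinement step (all inputs are assumed to come from the earlier reconstruction steps):
% Minimal sketch: jointly refine the 3-D points and camera poses with bundle adjustment.
[xyzRefined,refinedPoses] = bundleAdjustment(xyzPoints,tracks, ...
    camPoses,cameraParams);   % assumed point tracks, pose table, and camera parameters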
The “Structure From Motion From Multiple Views” example shows how to reconstruct a
3-D scene from a sequence of 2-D views. The example uses the Camera Calibrator app
to calibrate the camera that takes the views. It uses a viewSet object to store and
manage the data associated with each view.
See Also
Camera Calibrator | Stereo Camera Calibrator | bundleAdjustment |
cameraMatrix | estimateFundamentalMatrix | matchFeatures | pointTrack |
relativeCameraPose | triangulateMultiview | viewSet | vision.PointTracker
Related Examples
• “Structure From Motion From Two Views”
• “Structure From Motion From Multiple Views”
Object Detection
How Labeler Apps Store Exported Pixel Labels
• A folder named PixelLabelData, which contains the PNG files of pixel label
information. These labels are encoded as indexed values.
• A MAT-file containing a groundTruth object, which stores correspondences between
image or video frames and the PNG files. The object also contains any marked
rectangles or polylines.
The PNG files within the PixelLabelData folder are stored as a categorical matrix. The
categorical matrices contain values assigned to categories. Categorical is a data type.
A categorical matrix provides efficient storage and convenient manipulation of
nonnumeric data, while also maintaining meaningful names for the values. These matrices
are natural representations for semantic segmentation ground truth, where each pixel is
one of a predefined category of labels.
To view the pixel label data, use the imread function with the categorical and labeloverlay functions. You cannot view the pixel data directly from the categorical matrix. See “View Exported Pixel Label Data” on page 7-4.
Examples
View Exported Pixel Label Data
Read image and corresponding pixel label data that was exported from a labeler app.
visiondatadir = fullfile(toolboxdir('vision'),'visiondata');
buildingImage = imread(fullfile(visiondatadir,'building','building1.JPG'));
buildingLabels = imread(fullfile(visiondatadir,'buildingPixelLabels','Label_1.png'));
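The conversion below assumes label IDs and class names matching the building data set; these definitions are assumptions consistent with the datastore example later in this section:
labelIDs = 1:4;                                        % pixel label IDs stored in Label_1.png
labelcats = ["sky" "grass" "building" "sidewalk"];     % class names corresponding to the label IDs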
buildingLabelCats = categorical(buildingLabels,labelIDs,labelcats);
figure
imshow(labeloverlay(buildingImage,buildingLabelCats))
dataDir = fullfile(toolboxdir('vision'),'visiondata');
imDir = fullfile(dataDir,'building');
pxDir = fullfile(dataDir,'buildingPixelLabels');
imds = imageDatastore(imDir);
classNames = ["sky" "grass" "building" "sidewalk"];
pixelLabelID = [1 2 3 4];
pxds = pixelLabelDatastore(pxDir,classNames,pixelLabelID);
Read the image and pixel label data. read(pxds) returns a categorical matrix, C. The element C(i,j) in the matrix is the categorical label assigned to the pixel at the location I(i,j).
I = read(imds);
C = read(pxds);
categories(C)
Overlay and display the pixel label data onto the image.
B = labeloverlay(I,C);
figure
imshow(B)
See Also
Apps
Ground Truth Labeler | Image Labeler | Video Labeler
Objects
groundTruth | pixelLabelImageDatastore
More About
• “Label Pixels for Semantic Segmentation” on page 7-88
• “Share and Store Labeled Ground Truth Data” on page 7-147
Anchor Boxes for Object Detection
The network does not directly predict bounding boxes, but rather predicts the
probabilities and refinements that correspond to the tiled anchor boxes. The network
returns a unique set of predictions for every anchor box defined. The final feature map
represents object detections for each class. The use of anchor boxes enables a network to
detect multiple objects, objects of different scales, and overlapping objects.
Each anchor box is tiled across the image. The number of network outputs equals the
number of tiled anchor boxes. The network produces predictions for all outputs.
The distance, or stride, between the tiled anchor boxes is a function of the amount of
downsampling present in the CNN. Downsampling factors between 4 and 16 are common.
These downsampling factors produce coarsely tiled anchor boxes, which can lead to
localization errors.
To fix localization errors, deep learning object detectors learn offsets to apply to each tiled anchor box, refining the anchor box position and size.
To generate the final object detections, tiled anchor boxes that belong to the background
class are removed, and the remaining ones are filtered by their confidence score. Anchor
boxes with the greatest confidence score are selected using nonmaximum suppression
(NMS). For more details about NMS, see the selectStrongestBboxMulticlass
function.
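A minimal sketch of that filtering step (the detection variables and thresholds are assumptions):
% Minimal sketch: per-class nonmaximum suppression on assumed detection outputs.
[selectedBboxes,selectedScores,selectedLabels] = selectStrongestBboxMulticlass( ...
    bboxes,scores,labels,'RatioType','Min','OverlapThreshold',0.5);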
Choose anchor box sizes based on the scale and aspect ratio of objects in your training data. For an example of estimating sizes, see “Estimate Anchor Boxes Using Clustering” on page 1-51.
See Also
Related Examples
• “Create YOLO v2 Object Detection Network” on page 1-46
• “Object Detection Using Deep Learning”
• “Object Detection Using Faster R-CNN Deep Learning”
More About
• “YOLO v2 Basics” on page 7-16
• “Deep Learning in MATLAB” (Deep Learning Toolbox)
• “Pretrained Deep Neural Networks” (Deep Learning Toolbox)
YOLO v2 Basics
The you-only-look-once (YOLO) v2 object detector uses a single stage object detection
network. YOLO v2 is faster than other two-stage deep learning object detectors, such as
regions with convolutional neural networks (Faster R-CNNs).
The YOLO v2 model runs a deep learning CNN on an input image to produce network
predictions. The object detector decodes the predictions and generates bounding boxes.
• Intersection over union (IoU) — Predicts the objectness score of each anchor box.
• Anchor box offsets — Refine the anchor box position.
• Class probability — Predicts the class label assigned to each anchor box.
The figure shows the predefined anchor box (the dotted line) and the refined location
after offsets are applied.
Transfer Learning
With transfer learning, you can use a pretrained CNN as the feature extractor in a YOLO
v2 detection network. Use the yolov2Layers function to create a YOLO v2 detection
network from any pretrained CNN, for example, MobileNet v2. For a list of pretrained CNNs, see “Pretrained Deep Neural Networks” (Deep Learning Toolbox).
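A minimal sketch of building such a network (the anchor boxes, feature extraction layer name, and class count are assumptions):
% Minimal sketch: create a YOLO v2 detection network from a pretrained backbone.
imageSize = [224 224 3];
numClasses = 1;                                 % assumed single object class
anchorBoxes = [43 59; 18 22; 23 29; 84 109];    % assumed anchor boxes, [height width] in pixels
baseNetwork = mobilenetv2;                      % requires the MobileNet-v2 support package
featureLayer = 'block_12_add';                  % assumed feature extraction layer in MobileNet v2
lgraph = yolov2Layers(imageSize,numClasses,anchorBoxes,baseNetwork,featureLayer);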
You can also design a custom model based on a pretrained image classification CNN. For
more details, see “Design a YOLO v2 Detection Network” on page 7-18.
You can also use the Deep Network Designer app to manually create a network. The
designer incorporates Computer Vision Toolbox YOLO v2 features.
The reorganization layer (created using the yolov2ReorgLayer object) and the depth
concatenation layer (created using the depthConcatenationLayer object) are used to
combine low-level and high-level features. These layers improve detection by adding low-
level image information and improving detection accuracy for smaller objects. Typically,
the reorganization layer is attached to a layer within the feature extraction network
whose output feature map is larger than the feature extraction layer output.
Tip
• Adjust the 'Stride' property of the yolov2ReorgLayer object such that its output
size matches the input size of the depthConcatenationLayer object.
• To simplify designing a network, use the interactive Deep Network Designer app and
the analyzeNetwork function.
For more details on how to create this kind of network, see “Create YOLO v2 Object
Detection Network” on page 1-46.
Code Generation
To learn how to generate CUDA® code using the YOLO v2 object detector (created using
the yolov2ObjectDetector object) see “Code Generation for Object Detection Using
YOLO v2” on page 1-3.
References
[1] Redmon, J. and A. Farhadi. "YOLO9000: Better, Faster, Stronger." IEEE Conference on
Computer Vision and Pattern Recognition (CVPR), 6517–6525. Honolulu, HI:
CVPR 2017.
[2] Redmon, J., S. Divvala, R. Girshick, and A. Farhadi. "You only look once: Unified, real-
time object detection." Proceedings of the IEEE Conference on Computer Vision
and Pattern Recognition (CVPR), 779–788. Las Vegas, NV: CVPR, 2016.
See Also
Apps
Deep Network Designer | Ground Truth Labeler | Image Labeler | Video Labeler
Objects
depthConcatenationLayer | yolov2ObjectDetector | yolov2OutputLayer |
yolov2ReorgLayer | yolov2TransformLayer
Functions
analyzeNetwork | trainYOLOv2ObjectDetector
Related Examples
• “Object Detection Using Deep Learning”
• “Object Detection Using YOLO v2 Deep Learning” on page 1-30
• “Code Generation for Object Detection Using YOLO v2” on page 1-3
More About
• “Anchor Boxes for Object Detection” on page 7-9
• “R-CNN, Fast R-CNN, and Faster R-CNN Basics” on page 7-22
• “Deep Learning in MATLAB” (Deep Learning Toolbox)
• “Pretrained Deep Neural Networks” (Deep Learning Toolbox)
R-CNN, Fast R-CNN, and Faster R-CNN Basics
• Autonomous driving
• Smart surveillance systems
• Facial recognition
Computer Vision Toolbox provides object detectors for the R-CNN, Fast R-CNN, and
Faster R-CNN algorithms.
• Find regions in the image that might contain an object. These regions are called
region proposals.
• Extract CNN features from the region proposals.
• Classify the objects using the extracted features.
There are three variants of an R-CNN. Each variant attempts to optimize, speed up, or
enhance the results of one or more of these processes.
R-CNN
The R-CNN detector [2] first generates region proposals using an algorithm such as Edge
Boxes[1]. The proposal regions are cropped out of the image and resized. Then, the CNN
classifies the cropped and resized regions. Finally, the region proposal bounding boxes
are refined by a support vector machine (SVM) that is trained using CNN features.
Fast R-CNN
As in the R-CNN detector, the Fast R-CNN [3] detector also uses an algorithm like Edge
Boxes to generate region proposals. Unlike the R-CNN detector, which crops and resizes
region proposals, the Fast R-CNN detector processes the entire image. Whereas an R-
CNN detector must classify each region, Fast R-CNN pools CNN features corresponding
to each region proposal. Fast R-CNN is more efficient than R-CNN, because in the Fast R-
CNN detector, the computations for overlapping regions are shared.
Faster R-CNN
Instead of using an external algorithm like Edge Boxes, the Faster R-CNN [4] detector adds a region proposal network (RPN) to generate region proposals
directly in the network. The RPN uses “Anchor Boxes for Object Detection” on page 7-9.
Generating region proposals in the network is faster and better tuned to your data.
Transfer Learning
You can use a pretrained convolutional neural network (CNN) as the basis for an R-CNN
detector, also referred to as transfer learning. See “Pretrained Deep Neural Networks”
(Deep Learning Toolbox). Use one of the following networks with the
trainRCNNObjectDetector, trainFasterRCNNObjectDetector, or
trainFastRCNNObjectDetector functions. To use any of these networks you must
install the corresponding Deep Learning Toolbox™ model:
• 'alexnet'
• 'vgg16'
• 'vgg19'
• 'resnet50'
• 'resnet101'
• 'inceptionv3'
• 'googlenet'
• 'inceptionresnetv2'
• 'squeezenet'
You can also design a custom model based on a pretrained image classification CNN. See
the “Design an R-CNN, Fast R-CNN, and a Faster R-CNN Model” on page 7-25 section
and the Deep Network Designer app.
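A minimal training sketch (the training table, options, and test image are assumptions):
% Minimal sketch: train a Faster R-CNN detector on an assumed ground truth table.
% trainingData is assumed: a table whose first column contains image file names and whose
% remaining columns contain bounding boxes for each object class.
options = trainingOptions('sgdm','InitialLearnRate',1e-3,'MaxEpochs',10);
detector = trainFasterRCNNObjectDetector(trainingData,'resnet50',options);
[bboxes,scores] = detect(detector,imread('testImage.jpg'));   % assumed test image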
1 The basic R-CNN model starts with a pretrained network. The last three classification
layers are replaced with new layers that are specific to the object classes you want to
detect.
For an example of how to create an R-CNN object detection network, see “Create R-
CNN Object Detection Network” on page 7-65
2 The Fast R-CNN model builds on the basic R-CNN model. A box regression layer is
added to improve on the position of the object in the image by learning a set of box
offsets. An ROI pooling layer is inserted into the network to pool CNN features for
each region proposal.
For an example of how to create a Fast R-CNN object detection network, see “Create
Fast R-CNN Object Detection Network” on page 7-69
3 The Faster R-CNN model builds on the Fast R-CNN model. A region proposal network
is added to produce the region proposals instead of getting the proposals from an
external algorithm.
For an example of how to create a Faster R-CNN object detection network, see
“Create Faster R-CNN Object Detection Network” on page 7-75
References
[1] Zitnick, C. Lawrence, and P. Dollar. "Edge Boxes: Locating Object Proposals from Edges." Computer Vision–ECCV 2014. Springer International Publishing. Pages 391–405. 2014.
[2] Girshick, R., J. Donahue, T. Darrell, and J. Malik. "Rich Feature Hierarchies for
Accurate Object Detection and Semantic Segmentation." CVPR '14 Proceedings of
the 2014 IEEE Conference on Computer Vision and Pattern Recognition. Pages
580-587. 2014
[3] Girshick, Ross. "Fast r-cnn." Proceedings of the IEEE International Conference on
Computer Vision. 2015
[4] Ren, Shaoqing, Kaiming He, Ross Girshick, and Jian Sun. "Faster R-CNN: Towards
Real-Time Object Detection with Region Proposal Networks." Advances in Neural Information Processing Systems. Vol. 28, 2015.
See Also
Apps
Deep Network Designer | Ground Truth Labeler | Image Labeler | Video Labeler
Functions
fastRCNNObjectDetector | fasterRCNNObjectDetector | rcnnObjectDetector |
trainFastRCNNObjectDetector | trainFasterRCNNObjectDetector |
trainRCNNObjectDetector
Related Examples
• “Object Detection Using Deep Learning”
• “Object Detection Using Faster R-CNN Deep Learning”
More About
• “Anchor Boxes for Object Detection” on page 7-9
• “Deep Learning in MATLAB” (Deep Learning Toolbox)
• “Pretrained Deep Neural Networks” (Deep Learning Toolbox)
Applications for semantic segmentation include:
• Autonomous driving
• Industrial inspection
• Classification of terrain visible in satellite imagery
• Medical imaging analysis
See Also
Apps
Image Labeler
Functions
evaluateSemanticSegmentation | fcnLayers | pixelLabelDatastore |
segnetLayers | semanticSegmentationMetrics | semanticseg
Objects
pixelClassificationLayer | pixelLabelImageDatastore
Related Examples
• “Semantic Segmentation Using Deep Learning”
• “Label Pixels for Semantic Segmentation” on page 7-88
• “Define Custom Pixel Classification Layer with Dice Loss” on page 1-64
• “Semantic Segmentation Using Dilated Convolutions” on page 1-58
More About
• “Deep Learning in MATLAB” (Deep Learning Toolbox)
Semantic Segmentation Examples
The following code loads a small set of images and their corresponding pixel labeled
images:
dataDir = fullfile(toolboxdir('vision'),'visiondata');
imDir = fullfile(dataDir,'building');
pxDir = fullfile(dataDir,'buildingPixelLabels');
Load the image data using an imageDatastore. An image datastore can efficiently
represent a large collection of images because images are only read into memory when
needed.
imds = imageDatastore(imDir);
I = readimage(imds,1);
figure
imshow(I)
Load the pixel label images using a pixelLabelDatastore to define the mapping
between label IDs and categorical names. In the dataset used here, the labels are "sky",
"grass", "building", and "sidewalk". The label IDs for these classes are 1, 2, 3, 4,
respectively.
Create a pixelLabelDatastore using the class names and label IDs described above.
classNames = ["sky","grass","building","sidewalk"];
pixelLabelID = [1 2 3 4];
pxds = pixelLabelDatastore(pxDir,classNames,pixelLabelID);
C = readimage(pxds,1);
The output C is a categorical matrix where C(i,j) is the categorical label of pixel
I(i,j).
C(5,5)
ans = categorical
sky
Overlay the pixel labels on the image to see how different parts of the image are labeled.
B = labeloverlay(I,C);
figure
imshow(B)
The categorical output format makes it easy to work with the labels by class name. For instance, you can create a binary mask of just the building pixels:
buildingMask = C == 'building';
figure
imshowpair(I, buildingMask,'montage')
Next, build a simple semantic segmentation network, starting with an image input layer sized for 32-by-32 RGB images.
imgLayer = imageInputLayer([32 32 3])
imgLayer =
ImageInputLayer with properties:
Name: ''
InputSize: [32 32 3]
Hyperparameters
DataAugmentation: 'none'
Normalization: 'zerocenter'
Start with the convolution and ReLU layers. The convolution layer padding is selected
such that the output size of the convolution layer is the same as the input size. This makes
it easier to construct a network because the input and output sizes between most layers
remain the same as you progress through the network.
filterSize = 3;
numFilters = 32;
conv = convolution2dLayer(filterSize,numFilters,'Padding',1);
relu = reluLayer();
The downsampling is performed using a max pooling layer. Create a max pooling layer to
downsample the input by a factor of 2 by setting the 'Stride' parameter to 2.
poolSize = 2;
maxPoolDownsample2x = maxPooling2dLayer(poolSize,'Stride',2);
Stack the convolution, ReLU, and max pooling layers to create a network that
downsamples its input by a factor of 4.
downsamplingLayers = [
conv
relu
maxPoolDownsample2x
conv
relu
maxPoolDownsample2x
]
downsamplingLayers =
6x1 Layer array with layers:
The upsampling is performed using a transposed convolution layer (also commonly referred to as a "deconv" or "deconvolution" layer). When a transposed convolution is used for upsampling, it performs the upsampling and the filtering at the same time.
Create a transposed convolution layer to upsample by a factor of 2. The 'Cropping' parameter is set to 1 to make the output size equal twice the input size.
transposedConvUpsample2x = transposedConv2dLayer(4,numFilters,'Stride',2,'Cropping',1);
Stack the transposed convolution and ReLU layers. An input to this set of layers is upsampled by a factor of 4.
upsamplingLayers = [
transposedConvUpsample2x
relu
transposedConvUpsample2x
relu
]
upsamplingLayers =
4x1 Layer array with layers:
The final set of layers is responsible for making pixel classifications. These final layers process an input that has the same spatial dimensions (height and width) as the input image. However, the number of channels (third dimension) is larger and is equal to the number of filters in the last transposed convolution layer. This third dimension needs to be squeezed down to the number of classes you want to segment. You can do this using a 1-by-1 convolution layer whose number of filters equals the number of classes, for example, 3.
Create a convolution layer to combine the third dimension of the input feature maps down to the number of classes.
numClasses = 3;
conv1x1 = convolution2dLayer(1,numClasses);
Following this 1-by-1 convolution layer are the softmax and pixel classification layers.
These two layers combine to predict the categorical label for each image pixel.
finalLayers = [
conv1x1
softmaxLayer()
pixelClassificationLayer()
]
finalLayers =
3x1 Layer array with layers:
net = [
imgLayer
downsamplingLayers
upsamplingLayers
finalLayers
]
net =
14x1 Layer array with layers:
This network is ready to be trained using trainNetwork from Deep Learning Toolbox™.
dataSetDir = fullfile(toolboxdir('vision'),'visiondata','triangleImages');
imageDir = fullfile(dataSetDir,'trainingImages');
labelDir = fullfile(dataSetDir,'trainingLabels');
imds = imageDatastore(imageDir);
classNames = ["triangle","background"];
labelIDs = [255 0];
pxds = pixelLabelDatastore(labelDir,classNames,labelIDs);
I = read(imds);
C = read(pxds);
I = imresize(I,5);
L = imresize(uint8(C),5);
imshowpair(I,L,'montage')
numFilters = 64;
filterSize = 3;
numClasses = 2;
layers = [
imageInputLayer([32 32 1])
convolution2dLayer(filterSize,numFilters,'Padding',1)
reluLayer()
maxPooling2dLayer(2,'Stride',2)
convolution2dLayer(filterSize,numFilters,'Padding',1)
reluLayer()
transposedConv2dLayer(4,numFilters,'Stride',2,'Cropping',1);
convolution2dLayer(1,numClasses);
softmaxLayer()
pixelClassificationLayer()
]
layers =
10x1 Layer array with layers:
trainingData = pixelLabelImageDatastore(imds,pxds);
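The training options variable opts is not defined above. A minimal sketch using trainingOptions (the hyperparameter values are illustrative assumptions, not the original settings):
% Illustrative training options for this small example.
opts = trainingOptions('sgdm', ...
    'InitialLearnRate',1e-3, ...
    'MaxEpochs',100, ...
    'MiniBatchSize',64);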
net = trainNetwork(trainingData,layers,opts);
testImage = imread('triangleTest.jpg');
imshow(testImage)
C = semanticseg(testImage,net);
B = labeloverlay(testImage,C);
imshow(B)
The network failed to segment the triangles and classified every pixel as "background".
The training appeared to be going well with training accuracies greater than 90%.
However, the network only learned to classify the background class. To understand why
this happened, you can count the occurrence of each pixel label across the dataset.
tbl = countEachLabel(trainingData)
tbl=2×3 table
Name PixelCount ImagePixelCount
____________ __________ _______________
The majority of pixel labels are for the background. The poor results are due to the class
imbalance. Class imbalance biases the learning process in favor of the dominant class.
As a result, every pixel is classified as "background". To fix this, use class weighting to balance the classes. There are several methods for computing class weights. One common method is inverse frequency weighting, where the class weights are the inverse of the class frequencies. This method increases the weight given to under-represented classes.
totalNumberOfPixels = sum(tbl.PixelCount);
frequency = tbl.PixelCount / totalNumberOfPixels;
classWeights = 1./frequency
classWeights = 2×1
19.8334
1.0531
Class weights can be specified using the pixelClassificationLayer. Update the last
layer to use a pixelClassificationLayer with inverse class weights.
layers(end) = pixelClassificationLayer('Classes',tbl.Name,'ClassWeights',classWeights);
net = trainNetwork(trainingData,layers,opts);
C = semanticseg(testImage,net);
B = labeloverlay(testImage,C);
imshow(B)
Using class weighting to balance the classes produced a better segmentation result.
Additional steps to improve the results include increasing the number of epochs used for
training, adding more training data, or modifying the network.
The triangleImages data set has 100 test images with ground truth labels. Define the
location of the data set.
dataSetDir = fullfile(toolboxdir('vision'),'visiondata','triangleImages');
Define the class names and their associated label IDs. The label IDs are the pixel values
used in the image files to represent each class.
classNames = ["triangle" "background"];
labelIDs = [255 0];
Create a pixelLabelDatastore object holding the ground truth pixel labels for the test
images.
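The test image datastore and the test label folder are not defined above. A minimal sketch, assuming the standard testImages and testLabels subfolders of the triangleImages data set (the folder names are assumptions):
% Datastore of test images, used by semanticseg below.
testImagesDir = fullfile(dataSetDir,'testImages');
imds = imageDatastore(testImagesDir);
% Folder containing the ground truth pixel labels for the test images.
testLabelsDir = fullfile(dataSetDir,'testLabels');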
pxdsTruth = pixelLabelDatastore(testLabelsDir,classNames,labelIDs);
Load a semantic segmentation network that has been trained on the training images of
triangleImages.
net = load('triangleSegmentationNetwork.mat');
net = net.net;
Run the network on the test images. Predicted labels are written to disk in a temporary
directory and returned as a pixelLabelDatastore object.
pxdsResults = semanticseg(imds,net,"WriteLocation",tempdir);
The predicted labels are compared to the ground truth labels. While the semantic
segmentation metrics are being computed, progress is printed to the Command Window.
metrics = evaluateSemanticSegmentation(pxdsResults,pxdsTruth);
Display the classification accuracy, the intersection over union (IoU), and the boundary
F-1 score for each class in the data set.
metrics.ClassMetrics
ans=2×3 table
Accuracy IoU MeanBFScore
________ _______ ___________
metrics.ConfusionMatrix
ans=2×2 table
triangle background
________ __________
triangle 4730 0
background 9601 88069
normConfMatData = metrics.NormalizedConfusionMatrix.Variables;
figure
h = heatmap(classNames,classNames,100*normConfMatData);
h.XLabel = 'Predicted Class';
imageIoU = metrics.ImageMetrics.MeanIoU;
figure
histogram(imageIoU)
title('Image Mean IoU')
Read the test image with the worst IoU, its ground truth labels, and its predicted labels
for comparison.
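The index of the worst-performing image is not computed above. Assuming it comes from the per-image IoU values plotted in the histogram:
% Find the image with the lowest mean IoU.
[minIoU,worstImageIndex] = min(imageIoU);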
worstTestImage = readimage(imds,worstImageIndex);
worstTrueLabels = readimage(pxdsTruth,worstImageIndex);
worstPredictedLabels = readimage(pxdsResults,worstImageIndex);
Convert the label images to images that can be displayed in a figure window.
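The conversion code is not shown here. One way to do it, assuming the first class name ("triangle") defines the foreground:
% Convert categorical labels to displayable binary images.
worstTrueLabelImage = im2uint8(worstTrueLabels == classNames(1));
worstPredictedLabelImage = im2uint8(worstPredictedLabels == classNames(1));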
Display the worst test image, the ground truth, and the prediction.
worstMontage = cat(4,worstTestImage,worstTrueLabelImage,worstPredictedLabelImage);
worstMontage = imresize(worstMontage,4,"nearest");
figure
montage(worstMontage,'Size',[1 3])
title(['Test Image vs. Truth vs. Prediction. IoU = ' num2str(minIoU)])
Repeat the previous steps to read, convert, and display the test image with the best IoU
with its ground truth and predicted labels.
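As with the worst case, the best-image index is assumed to come from the per-image IoU values:
[maxIoU,bestImageIndex] = max(imageIoU);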
bestTestImage = readimage(imds,bestImageIndex);
bestTrueLabels = readimage(pxdsTruth,bestImageIndex);
bestPredictedLabels = readimage(pxdsResults,bestImageIndex);
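Convert the best-case label images in the same way as the worst-case ones (again assuming "triangle" is the foreground class):
bestTrueLabelImage = im2uint8(bestTrueLabels == classNames(1));
bestPredictedLabelImage = im2uint8(bestPredictedLabels == classNames(1));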
bestMontage = cat(4,bestTestImage,bestTrueLabelImage,bestPredictedLabelImage);
bestMontage = imresize(bestMontage,4,"nearest");
figure
montage(bestMontage,'Size',[1 3])
title(['Test Image vs. Truth vs. Prediction. IoU = ' num2str(maxIoU)])
Optionally, list the metric(s) you would like to evaluate using the 'Metrics' parameter.
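The evaluation call that produced the output below is not shown. A sketch of its likely form, assuming only the accuracy and IoU metrics were requested:
% Evaluate a subset of metrics and display the data set level results.
metrics = evaluateSemanticSegmentation(pxdsResults,pxdsTruth, ...
    'Metrics',["accuracy","iou"]);
metrics.DataSetMetrics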
MeanAccuracy MeanIoU
____________ _______
0.95085 0.61588
metrics.ClassMetrics
ans=2×2 table
Accuracy IoU
________ _______
triangle 1 0.33005
background 0.9017 0.9017
A pixel labeled dataset is a collection of images and a corresponding set of ground truth
pixel labels used for training semantic segmentation networks. There are many public
datasets that provide annotated images with per-pixel labels. To illustrate the steps for
importing these types of datasets, the example uses the CamVid dataset from the
University of Cambridge [1].
The CamVid dataset is a collection of images containing street level views obtained while
driving. The dataset provides pixel-level labels for 32 semantic classes including car,
pedestrian, and road. The steps shown to import CamVid can be used to import other
pixel labeled datasets.
imageURL = 'http://web4.cs.ucl.ac.uk/staff/g.brostow/MotionSegRecData/files/701_StillsR
labelURL = 'http://web4.cs.ucl.ac.uk/staff/g.brostow/MotionSegRecData/data/LabeledAppro
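The download folders are not defined above. A minimal sketch, assuming the data is placed under a temporary folder (the folder names are assumptions):
% Destination folders for the CamVid images and pixel labels.
outputFolder = fullfile(tempdir,'CamVid');
imageDir = fullfile(outputFolder,'images');
labelDir = fullfile(outputFolder,'labels');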
if ~exist(outputFolder, 'dir')
disp('Downloading 557 MB CamVid data set...');
unzip(imageURL, imageDir);
unzip(labelURL, labelDir);
end
Note: Download time of the data depends on your internet connection. The commands
used above will block MATLAB® until the download is complete. Alternatively, you can
use your web browser to first download the dataset to your local disk. To use the file you
downloaded from the web, change the outputFolder variable above to the location of
the downloaded file.
The CamVid data set encodes the pixel labels as RGB images, where each class is
represented by an RGB color. Here are the classes the dataset defines along with their
RGB encodings.
classNames = [ ...
"Animal", ...
"Archway", ...
"Bicyclist", ...
"Bridge", ...
"Building", ...
"Car", ...
"CartLuggagePram", ...
"Child", ...
"Column_Pole", ...
"Fence", ...
"LaneMkgsDriv", ...
"LaneMkgsNonDriv", ...
"Misc_Text", ...
"MotorcycleScooter", ...
"OtherMoving", ...
"ParkingBlock", ...
"Pedestrian", ...
"Road", ...
"RoadShoulder", ...
"Sidewalk", ...
"SignSymbol", ...
"Sky", ...
"SUVPickupTruck", ...
"TrafficCone", ...
"TrafficLight", ...
"Train", ...
"Tree", ...
"Truck_Bus", ...
"Tunnel", ...
"VegetationMisc", ...
"Wall"];
Define the mapping between label indices and class names such that classNames(k)
corresponds to labelIDs(k,:).
labelIDs = [ ...
064 128 064; ... % "Animal"
192 000 128; ... % "Archway"
000 128 192; ... % "Bicyclist"
000 128 064; ... % "Bridge"
128 000 000; ... % "Building"
064 000 128; ... % "Car"
064 000 192; ... % "CartLuggagePram"
192 128 064; ... % "Child"
192 192 128; ... % "Column_Pole"
064 064 128; ... % "Fence"
128 000 192; ... % "LaneMkgsDriv"
192 000 064; ... % "LaneMkgsNonDriv"
128 128 064; ... % "Misc_Text"
192 000 192; ... % "MotorcycleScooter"
128 064 064; ... % "OtherMoving"
064 192 128; ... % "ParkingBlock"
064 064 000; ... % "Pedestrian"
128 064 128; ... % "Road"
128 128 192; ... % "RoadShoulder"
000 000 192; ... % "Sidewalk"
192 128 128; ... % "SignSymbol"
128 128 128; ... % "Sky"
064 128 192; ... % "SUVPickupTruck"
000 000 064; ... % "TrafficCone"
000 064 064; ... % "TrafficLight"
192 064 128; ... % "Train"
128 128 000; ... % "Tree"
192 128 192; ... % "Truck_Bus"
064 000 064; ... % "Tunnel"
192 192 000; ... % "VegetationMisc"
064 192 000]; % "Wall"
Other datasets use different formats to encode their label data. For example, the PASCAL VOC [2] dataset uses numeric label IDs between 0 and 21 to encode its class labels.
Read and display one of the CamVid pixel label images.
labels = imread(fullfile(labelDir,'0001TP_006690_L.png'));
figure
imshow(labels)
imds = imageDatastore(fullfile(imageDir,'701_StillsRaw_full'));
pxds = pixelLabelDatastore(labelDir,classNames,labelIDs);
I = readimage(imds,10);
C = readimage(pxds,10);
The pixel label image is returned as a categorical array where C(i,j) is the categorical
label assigned to pixel I(i,j). Display the pixel label image on top of the image.
B = labeloverlay(I,C,'Colormap',labelIDs./255);
figure
imshow(B)
% Add a colorbar.
N = numel(classNames);
ticks = 1/(N*2):1/N:1;
colorbar('TickLabels',cellstr(classNames),'Ticks',ticks,'TickLength',0,'TickLabelInterpreter','none');
colormap(labelIDs./255)
It is common for pixel labeled datasets to include "undefined" or "void" labels. These are
used to designate pixels that were not labeled. For example, in CamVid, the label ID [0 0
0] is used to designate the "void" class. Training algorithms and evaluation algorithms are
not expected to include these labels in any computations.
The "void" class need not be explicitly named when using pixelLabelDatastore. Any
label ID that is not mapped to a class name is automatically labeled "undefined" and is
excluded from computations. To see the undefined pixels, use isundefined to create a
mask and then display it on top of the image.
undefinedPixels = isundefined(C);
B = labeloverlay(I,undefinedPixels);
figure
imshow(B)
title('Undefined Pixel Labels')
Combine Classes
When working with public datasets, you may need to combine some of the classes to better suit your application. For example, you may want to train a semantic segmentation network that segments a scene into five classes: road, sky, vehicle, pedestrian, and background. To do this with the CamVid dataset, group the label IDs defined above to fit the new classes. First, define the new class names.
newClassNames = ["road","sky","vehicle","pedestrian","background"];
% "sky"
[
128 128 128; ... % "Sky"
]
% "vehicle"
[
064 000 128; ... % "Car"
064 128 192; ... % "SUVPickupTruck"
192 128 192; ... % "Truck_Bus"
192 064 128; ... % "Train"
000 128 192; ... % "Bicyclist"
192 000 192; ... % "MotorcycleScooter"
128 064 064; ... % "OtherMoving"
]
% "pedestrian"
[
064 064 000; ... % "Pedestrian"
192 128 064; ... % "Child"
064 000 192; ... % "CartLuggagePram"
064 128 064; ... % "Animal"
]
% "background"
[
128 128 000; ... % "Tree"
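The listing above is truncated: the "road" group, the remainder of the "background" group, and the final cell-array assembly are missing. A sketch of how the grouped label IDs can be combined and passed to pixelLabelDatastore (the group membership shown here is an assumption; adjust it to fit your application):
% Assemble the per-group RGB label IDs into one cell array. The order must
% match newClassNames = ["road","sky","vehicle","pedestrian","background"].
% Group membership below is illustrative, not the original assignment.
roadIDs = [
    128 064 128; ... % "Road"
    128 128 192; ... % "RoadShoulder"
    128 000 192; ... % "LaneMkgsDriv"
    192 000 064];    % "LaneMkgsNonDriv"
skyIDs = [128 128 128]; % "Sky"
vehicleIDs = [
    064 000 128; ... % "Car"
    064 128 192; ... % "SUVPickupTruck"
    192 128 192; ... % "Truck_Bus"
    192 064 128; ... % "Train"
    000 128 192; ... % "Bicyclist"
    192 000 192; ... % "MotorcycleScooter"
    128 064 064];    % "OtherMoving"
pedestrianIDs = [
    064 064 000; ... % "Pedestrian"
    192 128 064; ... % "Child"
    064 000 192; ... % "CartLuggagePram"
    064 128 064];    % "Animal"
backgroundIDs = [
    128 128 000; ... % "Tree"
    192 192 000; ... % "VegetationMisc"
    128 000 000];    % "Building" (add the remaining classes as needed)
groupedLabelIDs = {roadIDs; skyIDs; vehicleIDs; pedestrianIDs; backgroundIDs};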
pxds = pixelLabelDatastore(labelDir,newClassNames,groupedLabelIDs);
Read the 10th pixel label image and display it on top of the image.
C = readimage(pxds,10);
cmap = jet(numel(newClassNames));
B = labeloverlay(I,C,'Colormap',cmap);
figure
imshow(B)
% add colorbar
N = numel(newClassNames);
ticks = 1/(N*2):1/N:1;
colorbar('TickLabels',cellstr(newClassNames),'Ticks',ticks,'TickLength',0,'TickLabelInterpreter','none');
colormap(cmap)
The pixelLabelDatastore with the new class names can now be used to train a network for the five classes without having to modify the original CamVid pixel labels.
References
[1] Brostow, Gabriel J., Julien Fauqueur, and Roberto Cipolla. "Semantic object classes in
video: A high-definition ground truth database." Pattern Recognition Letters 30.2 (2009):
88-97.
[2] Everingham, M., et al. "The PASCAL Visual Object Classes Challenge 2012 Results." http://www.pascal-network.org/challenges/VOC/voc2012/workshop/index.html. 2012.
Faster R-CNN Examples
Create R-CNN Object Detection Network
The procedure to convert a network into an R-CNN network is the same as the transfer
learning workflow for image classification. You replace the last 3 classification layers with
new layers that can support the number of object classes you want to detect, plus a
background class.
In ResNet-50, the last three layers are named fc1000, fc1000_softmax, and
ClassificationLayer_fc1000. Display the network, and zoom in on the section of the
network you will modify.
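The layer graph lgraph is not created above; it is assumed to come from a pretrained ResNet-50, for example:
% Load ResNet-50 and convert it to a layer graph for editing.
net = resnet50;
lgraph = layerGraph(net);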
figure
plot(lgraph)
ylim([-5 16])
Add the new classification layers to the network. The layers are set up to classify the number of objects the network should detect plus an additional background class. During detection, the network processes cropped image regions and classifies them as belonging to one of the object classes or the background.
% Specify the number of classes the network should classify.
numClassesPlusBackground = 2 + 1;
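The layer definitions that originally followed are missing. A sketch of the replacement classification layers and how they might be attached to the ResNet-50 layer graph (the new layer names and the 'avg_pool' connection point are assumptions):
% New classification layers sized for the object classes plus background.
newLayers = [
    fullyConnectedLayer(numClassesPlusBackground,'Name','rcnnFC')
    softmaxLayer('Name','rcnnSoftmax')
    classificationLayer('Name','rcnnClassification')
    ];
% Remove the original ResNet-50 classification layers and attach the new ones.
lgraph = removeLayers(lgraph,{'fc1000','fc1000_softmax','ClassificationLayer_fc1000'});
lgraph = addLayers(lgraph,newLayers);
lgraph = connectLayers(lgraph,'avg_pool','rcnnFC');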
% Display the final R-CNN network. This can be trained using trainRCNNObjectDetector.
figure
plot(lgraph)
ylim([-5 16])
Create Fast R-CNN Object Detection Network
Start by creating an R-CNN network that forms the basis of Fast R-CNN. The “Create R-CNN Object Detection Network” on page 7-65 example explains how to build it in detail.
Add a box regression layer to learn a set of box offsets to apply to the region proposal
boxes. The learned offsets transform the region proposal boxes so that they are closer to
the original ground truth bounding box. This transformation helps improve the
localization performance of Fast R-CNN.
The box regression layers are composed of a fully connected layer followed by an R-CNN box regression layer. The fully connected layer is configured to output a set of 4 box offsets for each class. The background class is excluded because background bounding boxes are not refined.
The box regression layers are typically connected to the same layer that the classification branch is connected to, as shown in the sketch below.
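The code for the box regression branch is not shown here. A sketch, assuming two object classes and the same 'avg_pool' attachment point as the classification branch (the layer names are illustrative):
% Box regression branch: 4 box offsets per object class.
numClasses = 2;
boxRegressionLayers = [
    fullyConnectedLayer(4*numClasses,'Name','rcnnBoxFC')
    rcnnBoxRegressionLayer('Name','rcnnBoxDeltas')
    ];
lgraph = addLayers(lgraph,boxRegressionLayers);
lgraph = connectLayers(lgraph,'avg_pool','rcnnBoxFC');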
The next step is to choose which layer in the network to use as the feature extraction layer. This layer is connected to the ROI max pooling layer, which pools the CNN features for each region proposal. Selecting a feature extraction layer requires empirical evaluation. For ResNet-50, a typical feature extraction layer is the output of the fourth block of convolutions, which corresponds to the layer named activation_40_relu.
featureExtractionLayer = 'activation_40_relu';
figure
plot(lgraph)
ylim([30 42])
To insert the ROI max pooling layer, first disconnect the layers attached to the feature extraction layer: res5a_branch2a and res5a_branch1.
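The disconnection code and the ROI pooling output size are not shown here. A sketch, assuming the standard ResNet-50 connection names and the 14-by-14 output size displayed below:
% Disconnect the layers that follow the feature extraction layer.
lgraph = disconnectLayers(lgraph,'activation_40_relu','res5a_branch2a');
lgraph = disconnectLayers(lgraph,'activation_40_relu','res5a_branch1');
% Output size of the ROI max pooling layer (matches the displayed value).
outputSize = [14 14]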
outputSize = 1×2
14 14
roiPool = roiMaxPooling2dLayer(outputSize,'Name','roiPool');
% Connect the output of ROI max pool to the disconnected layers from above.
lgraph = connectLayers(lgraph, 'roiPool','res5a_branch2a');
lgraph = connectLayers(lgraph, 'roiPool','res5a_branch1');
% Show the result after adding and connecting the ROI max pooling layer.
figure
plot(lgraph)
ylim([30 42])
Finally, connect the ROI input layer to the second input of the ROI max pooling layer.
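The ROI input layer itself is not added above. A sketch, assuming the layer name 'roiInput' used in the connection call below:
% Add an ROI input layer that supplies region proposals to the network.
roiInput = roiInputLayer('Name','roiInput');
lgraph = addLayers(lgraph,roiInput);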
% Connect ROI input layer to the 'roi' input of the ROI max pooling layer.
lgraph = connectLayers(lgraph, 'roiInput','roiPool/roi');
% Show the result after adding and connecting the ROI input layer.
figure
plot(lgraph)
ylim([30 42])
Create Faster R-CNN Object Detection Network
Start by creating a Fast R-CNN network that forms the basis of Faster R-CNN. The “Create Fast R-CNN Object Detection Network” on page 7-69 example explains how to build it in detail.
% Connect the output of ROI max pool to the disconnected layers from above.
lgraph = connectLayers(lgraph, 'roiPool','res5a_branch2a');
lgraph = connectLayers(lgraph, 'roiPool','res5a_branch1');
Faster R-CNN uses a region proposal network (RPN) to generate region proposals. An
RPN produces region proposals by predicting the class, “object” or “background”, and
box offsets for a set of predefined bounding box templates known as "anchor boxes".
Anchor boxes are specified by providing their size, which is typically determined based on
a priori knowledge of the scale and aspect ratio of objects in the training dataset.
Learn more about “Anchor Boxes for Object Detection” on page 7-9.
Add the convolution layers for the RPN and connect them to the feature extraction layer selected above.
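The anchor box sizes and the number of RPN filters are not defined above. The values below are illustrative assumptions, not the original settings:
anchorBoxes = [16 16; 32 16; 16 32];   % [height width] of each anchor box
numFilters = 256;                      % number of filters in the RPN convolution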
% Number of anchor boxes.
numAnchors = size(anchorBoxes,1);
rpnLayers = [
convolution2dLayer(3, numFilters,'padding',[1 1],'Name','rpnConv3x3')
reluLayer('Name','rpnRelu')
];
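The calls that attach the RPN convolution layers to the feature extraction layer are not shown here; a sketch:
lgraph = addLayers(lgraph,rpnLayers);
lgraph = connectLayers(lgraph,featureExtractionLayer,'rpnConv3x3');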
Add the RPN classification output layers. The classification layer classifies each anchor as
"object" or "background".
% Add RPN classification layers.
rpnClsLayers = [
convolution2dLayer(1, numAnchors*2,'Name', 'rpnConv1x1ClsScores')
rpnSoftmaxLayer('Name', 'rpnSoftmax')
rpnClassificationLayer('Name','rpnClassification')
];
lgraph = addLayers(lgraph, rpnClsLayers);
Add the RPN regression output layers. The regression layer predicts 4 box offsets for
each anchor box.
% Add RPN regression layers.
rpnRegLayers = [
convolution2dLayer(1, numAnchors*4, 'Name', 'rpnConv1x1BoxDeltas')
rcnnBoxRegressionLayer('Name', 'rpnBoxDeltas');
];
Finally, connect the classification and regression feature maps to the region proposal
layer inputs, and the ROI pooling layer to the region proposal layer output.
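The code for these final connections is not shown here. A sketch of how the pieces might be wired together, using the layer names defined above (the regionProposalLayer input names and the assumption that the 'roi' input of roiPool is still unconnected are not confirmed by this excerpt):
% Connect the classification and regression branches to the RPN convolution output.
lgraph = connectLayers(lgraph,'rpnRelu','rpnConv1x1ClsScores');
lgraph = addLayers(lgraph,rpnRegLayers);
lgraph = connectLayers(lgraph,'rpnRelu','rpnConv1x1BoxDeltas');
% Add the region proposal layer and feed its proposals into the ROI max pooling layer.
proposalLayer = regionProposalLayer(anchorBoxes,'Name','regionProposal');
lgraph = addLayers(lgraph,proposalLayer);
lgraph = connectLayers(lgraph,'rpnConv1x1ClsScores','regionProposal/scores');
lgraph = connectLayers(lgraph,'rpnConv1x1BoxDeltas','regionProposal/boxDeltas');
lgraph = connectLayers(lgraph,'regionProposal','roiPool/roi');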
Train Object Detector or Semantic Segmentation Network from Ground Truth Data
• Image Labeler — “Get Started with the Image Labeler” on page 7-100
• Video Labeler — “Get Started with the Video Labeler” on page 7-109
• Ground Truth Labeler — “Get Started with the Ground Truth Labeler”
(Automated Driving Toolbox)
You can choose from one of the built-in algorithms or create your own custom
algorithm to label objects in your data. To learn how to create your own automation
algorithm, see “Create Automation Algorithm for Labeling” on page 7-84.
3 Export labels: After labeling your data, you can export the labels to the workspace
or save them to a file. The labels are exported as a groundTruth object. If your data
source consists of multiple image collections, label the entire set of image collections
to obtain an array of groundTruth objects. For details about sharing groundTruth
objects, see “Share and Store Labeled Ground Truth Data” on page 7-147.
4 Create training data: To create training data from the groundTruth object, use
one of these functions:
Sample the ground truth data by specifying a sampling factor. Sampling mitigates overtraining an object detector on similar samples. For groundTruth objects created using a video file or a custom data source, the objectDetectorTrainingData and pixelLabelTrainingData functions write the extracted images to disk (see the sketch after this list).
5 Train algorithm:
• Object detectors — Use one of several Computer Vision Toolbox object detectors.
See “Object Detection Using Features”. For object detectors specific to automated
driving, see the Automated Driving Toolbox object detectors listed in “Visual
Perception” (Automated Driving Toolbox).
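For example, a minimal sketch of creating object detector training data from an exported groundTruth object (the variable gTruth and the sampling factor are assumptions):
% Extract a table of images and bounding boxes, sampling every other frame.
trainingDataTbl = objectDetectorTrainingData(gTruth,'SamplingFactor',2);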
See Also
Apps
Ground Truth Labeler | Image Labeler | Video Labeler
Functions
groundTruth | groundTruthDataSource | objectDetectorTrainingData |
pixelLabelTrainingData | semanticseg | trainACFObjectDetector |
trainFastRCNNObjectDetector | trainFasterRCNNObjectDetector |
trainRCNNObjectDetector
More About
• “Get Started with the Image Labeler” on page 7-100
• “Get Started with the Video Labeler” on page 7-109
• “Get Started with the Ground Truth Labeler” (Automated Driving Toolbox)
• “Create Automation Algorithm for Labeling” on page 7-84
• “Semantic Segmentation Basics” on page 7-30
• “Object Detection Using Deep Learning”
Create Automation Algorithm for Labeling
To define and use a custom automation algorithm with your loaded data source:
1 Create the automation folder: Create a +vision/+labeler package folder inside a project folder that is on the MATLAB path. For example:
projectFolder = fullfile('local','MyProject');
automationFolder = fullfile('+vision','+labeler');
mkdir(projectFolder,automationFolder)
2 Define a class that inherits from the AutomationAlgorithm class: At the
MATLAB command prompt, enter the appropriate command to open the labeling app
you want: imageLabeler, videoLabeler, or groundTruthLabeler. Then click
Select Algorithm > Add Algorithm > Create new algorithm to open the
vision.labeler.AutomationAlgorithm class template. Define your algorithm by
following the instructions in the header and comments in the class.
3 Save the file: Save the file to the +vision/+labeler package folder to use your
custom algorithm from within the app. To add a folder to the path, use the addpath
function.
4 Refresh the algorithm list: To start using your custom algorithm, refresh the algorithm list so that the algorithm appears in the list of algorithms. In the app, click Select Algorithm > Refresh list.
When you click Automate, the app checks each label definition in the ROI Label
Definition and Scene Label Definition panes by using the checkLabelDefinition
method defined in your custom algorithm. Label definitions that return true are retained
for automation. Label definitions that return false are disabled and not included. Use
the checkLabelDefinition method to choose a subset of label definitions that are valid
for your custom algorithm. For example, if your custom algorithm is a semantic
segmentation algorithm, use this method to return false for label definitions that are not
of type PixelLabel.
After you select the algorithm, click Automate to start an automation session. Then, click
Settings, which enables you to modify custom app settings. To control the Settings
options, use the settingsDialog method.
When you first run the algorithm, the app calls the checkSetup method to check if it is
ready for execution. If the method returns true, the app calls the initialize method
and then the run method on every image selected for automation. Then, the app calls the
terminate method.
Use the checkSetup method to check whether all conditions needed for your custom algorithm are set up correctly. For example, check that the scene contains at least one ROI label before running the algorithm. Use the initialize
method to initialize the state for your custom algorithm by using the image. Use the run
method to implement the core of the algorithm that computes and returns labels for each
image. Use the terminate method to clean up or terminate the state after the algorithm
runs.
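The following is a minimal sketch of the method skeleton described above (the class name, the label filtering rule, and the empty run result are illustrative assumptions; the class template opened by the app provides the full scaffold):
classdef MyAutomationAlgorithm < vision.labeler.AutomationAlgorithm
    properties (Constant)
        Name = 'My Automation Algorithm'
        Description = 'Skeleton showing the automation methods'
        UserDirections = {'Click Run to automatically label the selected images.'}
    end
    methods
        function isValid = checkLabelDefinition(~,labelDef)
            % Keep only rectangle label definitions for this algorithm.
            isValid = labelDef.Type == labelType.Rectangle;
        end
        function isReady = checkSetup(~)
            % Verify any preconditions here, for example that at least one ROI label exists.
            isReady = true;
        end
        function initialize(~,~)
            % Set up any state needed by run, using the first image.
        end
        function autoLabels = run(~,~)
            % Compute and return labels for the current image. Returning an
            % empty struct array produces no automated labels.
            autoLabels = struct('Name',{},'Type',{},'Position',{});
        end
        function terminate(~)
            % Clean up state after the algorithm runs.
        end
    end
end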
See Also
Apps
Ground Truth Labeler | Image Labeler | Video Labeler
Functions
groundTruth | groundTruthDataSource |
vision.labeler.AutomationAlgorithm | vision.labeler.mixin.Temporal
Related Examples
• “Automate Ground Truth Labeling of Lane Boundaries” (Automated Driving Toolbox)
• “Automate Ground Truth Labeling for Semantic Segmentation” (Automated Driving
Toolbox)
• “Automate Attributes of Labeled Objects” (Automated Driving Toolbox)
More About
• “Get Started with the Image Labeler” on page 7-100
• “Get Started with the Video Labeler” on page 7-109
• “Get Started with the Ground Truth Labeler” (Automated Driving Toolbox)
• “Temporal Automation Algorithms” on page 7-139
Label Pixels for Semantic Segmentation
• Image Labeler — “Get Started with the Image Labeler” on page 7-100
• Video Labeler — “Get Started with the Video Labeler” on page 7-109
• Ground Truth Labeler — “Get Started with the Ground Truth Labeler” (Automated
Driving Toolbox)
This example shows pixel labeling with the Image Labeler. You use the same tools to
label videos and image sequences with the Video Labeler or Ground Truth Labeler.
Select a pixel label definition from the ROI Label Definition pane. A Label Pixels tab
opens, containing tools to label pixels manually using polygons, brushes, or flood fill. You
can use the labeling tools in any order. This tab also has controls to adjust the display of
the image by zooming and panning and to adjust the opacity of the labels.
This example uses two general strategies to label pixels in the highway image:
• First use the semi-automated tools, such as Flood Fill and Smart Polygon. Then,
refine the labels using tools that offer more direct control, such as Polygon, Assisted
Freehand and Brush.
• First label distant objects with a rough estimation of object borders. Then, label nearer
objects with more precise object borders.
1 Select the Flood Fill tool and a label. The pointer changes to a paint can.
2 Click a starting pixel in the image.
You can undo the flood fill, or any other labeling operation, by pressing Ctrl+Z.
1 Select the Smart Polygon tool and a label. The pointer changes to a crosshair.
2 Click to add polygon vertices. Completely surround the object of interest, with some
space between the object and the polygon.
3 Close the polygon by clicking the first vertex after placing the other vertices.
Alternatively, you can double-click to add the last vertex and close the polygon in one
step.
After you close the polygon, the tool draws an initial label.
4 Adjust the shape and position of the polygon. When the object of interest extends to
the edge of the image, drag vertices to the edge of the image to ensure that the smart
polygon completely encloses the object. For instance, this example shows the two
leftmost vertices placed at the left edge of the image.
To adjust the polygon:
• Move vertex: Click and drag the vertex.
• Add vertex: Right-click the polygon boundary at the position of the new vertex and select Add Point, or double-click the point on the boundary.
• Delete vertex: Right-click the vertex and select Delete Vertex.
• Move polygon: Click and drag any point on the polygon boundary (excluding vertices).
• Delete polygon: Right-click the polygon boundary and select Delete Polygon.
5 Use the Smart Polygon Editor tools to refine the label.
• Select Mark Foreground to mark areas inside the region that you want to label.
Foreground marks appear in green.
• Select Mark Background to mark areas inside the region that you do not want to
label. Background marks appear in red.
• Select Erase Marks to remove foreground or background marks that are no
longer needed.
• See Tips on page 7-98 for additional suggestions on using the Smart Polygon
tool.
6 To finalize the label, press Enter or select a new ROI Label Definition. You can no
longer edit the polygon vertices or mark foreground and background regions.
Add additional polygons over structures such as barriers and the road. Many vehicle
pixels are incorrectly labeled. The next step shows how to replace the erroneous labels
with the correct label.
This example uses the Smart Polygon tool to label pixels belonging to the truck.
Foreground marks assign the vehicle label to subregions. Background marks revert
subregions to their prior label. For instance, in the first pair of images, background marks
revert subregions to the sky and vegetation labels. Similarly, in the second pair of images,
background marks revert subregions to the road label.
The border of the truck is jagged because Smart Polygon labels entire subregions, not
individual pixels. The next step shows how to refine the labels along the border of the
truck.
1 Select the Brush tool and a label. The pointer changes to a pen, and a square appears to indicate the size of the brush.
2 Adjust the size of the brush by using the Brush Size slider.
3 Click and drag the mouse to label pixels.
The Erase tool removes pixel labels when you draw over the image with the mouse.
The Label Opacity slider adjusts the opacity of all pixel labels.
• Decrease the opacity to see the image more clearly. For instance, decrease the opacity
to make it easier to find the border between the bottom of the car and the road.
• Increase the opacity to see the segmentation more clearly. For instance, increase the opacity to see that the edge along the front bumper of the car should be smoothed. Also, observe that the barrier and some distant vehicles have unlabeled pixels.
Tips
• The Smart Polygon tool identifies an object of interest by using regional graph-based
segmentation ("GrabCut") [1]. The Smart Polygon tool divides the image into
subregions. The tool treats all subregions that are fully or partially outside the polygon
as belonging to the background. Therefore, to get an optimal segmentation, make sure
the object to be labeled is fully contained within the polygon, surrounded by a few
background pixels.
All pixels within a subregion have the same label. Marking pixels outside the polygon
has no effect on the label.
• To delete the most recently labeled ROI, press Ctrl+Z.
• Each pixel can have at most one pixel label. When you apply a label to a pixel, the new
label replaces the previous label.
• Pixel labeling is disabled when you pan and zoom the image. You must click the Label
button to resume pixel labeling.
• To ensure that all pixels in an image are labeled, begin by labeling the entire image
with a single label. Pick a label that represents a predominant ROI in the image, such
as sky, road, or background. Then, use the labeling tools to relabel objects with their
correct label.
References
[1] Rother, C., V. Kolmogorov, and A. Blake. "GrabCut - Interactive Foreground Extraction
using Iterated Graph Cuts". ACM Transactions on Graphics (SIGGRAPH). Vol. 23,
Number 3, 2004, pp. 309–314.
See Also
Ground Truth Labeler | Image Labeler | Video Labeler
More About
• “Get Started with the Image Labeler” on page 7-100
• “Get Started with the Video Labeler” on page 7-109
• “Get Started with the Ground Truth Labeler” (Automated Driving Toolbox)
• “How Labeler Apps Store Exported Pixel Labels” on page 7-3
Get Started with the Image Labeler
• Use rectangular ROI labels for objects such as vehicles, pedestrians, and road
signs.
• Use pixel labels for areas such as backgrounds, roads, and buildings.
• Use scene labels for conditions such as lighting and weather conditions, or for
events such as lane changes.
• Use built-in detection or tracking to automatically label the regions and scene labels.
• Write, import, and use your own custom automation algorithm to automatically label a
region and scene labels.
• Export the ground truth labels for object detector training, semantic segmentation, or
image classification.
• Data Source: Add images from a folder or by using the imageDatastore function.
• Label Definitions: Load a previously saved set of label definitions from a file. Label
definitions specify the names and types of items to label.
• Session: Load a previously saved session.
To import ROI and scene labels into the app, click Import Labels. You can import labels
from the MATLAB workspace or from previously exported MAT-files. The imported labels
must be groundTruth objects.
2 Specify a label name and choose either Rectangle or Pixel label for the label type
from the drop-down menu.
3 Use the optional Group field to create a group. Click New Group from the drop-
down menu and enter a group title in the field that appears. You can move a label to a
different group by left-clicking and dragging the label.
To create labels from the MATLAB command line, use the labelDefinitionCreator
object.
• To draw ROI labels manually, select an ROI label definition from the left pane and use
the mouse to draw the regions on the image frames.
• To label individual pixels, see “Label Pixels for Semantic Segmentation” on page 7-88.
• To mark scene labels manually, select a scene label definition from the left pane and
then click Add Label.
• Built-In Algorithm: Detect people using the aggregated channel features (ACF) people detector algorithm.
• Add a Custom Algorithm: To define and use a custom automation algorithm with the
Image Labeler app, see “Create Automation Algorithm for Labeling” on page 7-84.
• Import an Algorithm: To import your own algorithm, select Algorithm > Add
Algorithm > Import Algorithm.
2 Click Run.
3 Examine the results of running the algorithm. If they are not satisfactory, click Undo
Run and change algorithm settings by clicking Settings.
4 When you are satisfied with the algorithm results, click Accept. To delete the labels
generated during the automation session, click Cancel. The Cancel button cancels
only the algorithm session, not the app session.
Note Pixel label data and ground truth data are saved in separate files. The app saves
both files in the same folder. Keep these tips in mind:
• The groundTruth object contains the file paths corresponding to the data source and
the pixel label data. If you move the data source and pixel label data to a different
folder, to update the paths stored within the groundTruth object, use the
changeFilePaths function.
• If you used an image collection to create your ground truth, do not delete images from
the location you loaded them from. The path to those images is saved in the
groundTruth object.
• You can move the groundTruth MAT-file to a different folder.
For more details, see “How Labeler Apps Store Exported Pixel Labels” on page 7-3 and
“Share and Store Labeled Ground Truth Data” on page 7-147.
See Also
Apps
Image Labeler
Objects
groundTruth | groundTruthDataSource
More About
• “Create Automation Algorithm for Labeling” on page 7-84
• “Train Object Detector or Semantic Segmentation Network from Ground Truth
Data” on page 7-81
• “Keyboard Shortcuts and Mouse Actions for Image Labeler” on page 7-153
Choose a Labeling App
One key consideration is the type of data that you want to label.
• If your data is an image collection, use the Image Labeler app. An image collection is
an unordered set of images that can vary in size. For example, you can use the app to
label images of books to train a classifier.
• If your data is a video or image sequence, use the Video Labeler or Ground Truth
Labeler app. An image sequence is an ordered set of images that resemble a video.
For example, you can use these apps to label a video or image sequence of cars driving
on a highway to train an object detector.
The apps also differ in the features they support, such as sublabels and attributes.
See Also
More About
• “Get Started with the Image Labeler” on page 7-100
• “Get Started with the Video Labeler” on page 7-109
• “Get Started with the Ground Truth Labeler” (Automated Driving Toolbox)
Get Started with the Video Labeler
videoLabeler('visiontraffic.avi')
Alternatively, open the app from the Apps tab, under Image Processing and Computer
Vision. Then, from the Load menu, load a video data source.
Explore the video. Click the Play button to play the entire video, or use the slider
to navigate between frames.
The app also enables you to load image sequences, with corresponding timestamps, by
selecting Load > Image Sequence. The images must be readable by imread.
To load a custom data source that is readable by VideoReader or imread, see “Use
Custom Data Source Reader for Ground Truth Labeling” on page 7-130.
Optionally, to make adjustments to the time interval, click and drag the red interval flags.
The entire app is now set up to focus on this specific time interval. The video plays only
within this interval, and labeling and automation algorithms apply only to this interval.
You can change the interval at any time by moving the flags.
To expand the time interval to fill the entire playback section, click Zoom in Time
Interval.
An ROI label is a label that corresponds to a region of interest (ROI), such as a rectangle, line, or pixel label.
In this example, you define a vehicle group for labeling types of vehicles, and then
create a Rectangle ROI label for a Car and a Truck.
The Vehicle group name appears in the ROI Label Definition pane with the label Car created. You can move a label to a different position or group by left-clicking and dragging the label.
5 Add a second label. Click Label. Name the label Truck and make sure the Vehicle
group is selected. Click OK.
6 In the first video frame within the time interval, use the mouse to draw rectangular
Car ROIs around the two vehicles.
Create Sublabels
A sublabel is a type of ROI label that corresponds to a parent ROI label. Each sublabel
must belong to, or be a child of, a specific label defined in the ROI Label Definition
pane. For example, in a driving scene, a vehicle label might have sublabels for headlights,
license plates, or wheels.
1 In the ROI Label Definition pane on the left, click the Car label.
2 Click Sublabel.
3 Create a Rectangle sublabel named headlight and optionally write a description.
Click OK.
The headlight sublabel appears in the ROI Label Definition pane. The sublabel is
nested under the selected ROI label, Car, and has the same color as its parent label.
You can add multiple sublabels under a label. You can also drag-and-drop the
sublabels to reorder them in the list. Right-click any label for additional edits.
Sublabels can only be used with rectangular or polyline ROI labels and cannot have their
own sublabels. For more details on working with sublabels, see “Use Sublabels and
Attributes to Label Ground Truth Data” on page 7-134.
Create Attributes
An ROI attribute specifies additional information about an ROI label or sublabel. Supported attribute types include List, String, and Logical.
1 In the ROI Label Definition pane on the left, select the Car label and click
Attribute.
2 In the Attribute Name box, type carType. Set the attribute type to List.
3 In the List Items section, type different types of cars, such as Sedan, Hatchback,
and Wagon, each on its own line. Optionally give the attribute a description, and click
OK.
4 In the first frame of the video, select a Car ROI label. In the Attributes and
Sublabels pane, select the appropriate carType attribute value for that vehicle.
5 Repeat the previous step to assign a carType attribute to the other vehicle.
You can also add attributes to sublabels. Add an attribute for the headlight sublabel that
tells whether the headlight is on.
1 In the ROI Label Definition pane on the left, select the headlight sublabel and
click Attribute.
2 In the Attribute Name box, type isOn. Set the attribute type to Logical. Leave the
Default Value set to Empty, optionally write a description, and click OK.
3 Select a headlight in the video frame. Set the appropriate isOn attribute value, or
leave the attribute value set to Empty.
4 Repeat the previous step to set the isOn attribute for the other headlights.
To delete an attribute, right-click an ROI label or sublabel, and select the attribute to
delete. Deleting the attribute removes attribute information from all previously created
ROI label annotations.
A scene label defines additional information for the entire scene. Use scene labels to
describe conditions, such as lighting and weather, or events, such as lane changes.
1 Create a scene label definition named sunny. The Scene Label Definition pane shows the scene label definition. The scene labels that are applied to the current frame appear in the Scene Labels pane on the right.
The sunny scene label is empty (white), because the scene label has not yet been
applied to the frame.
2 The entire scene is sunny, so specify to apply the sunny scene label over the entire
time interval. With the sunny scene label definition still selected in the Scene Label
Definition pane, select Time Interval.
3 Click Add Label.
The sunny label now applies to all frames in the time interval.
When you click the right arrow key to advance to the next frame, the ROI labels from the
previous frame do not carry over. Only the sunny scene label applies to each frame,
because this label was applied over the entire time interval.
Advance frame by frame and draw the label and sublabel ROIs manually. Also update the
attribute information for these ROIs.
To speed up the labeling process, you can use an automation algorithm within the app.
You can either define your own automation algorithm, see “Create Automation Algorithm
for Labeling” on page 7-84 and “Temporal Automation Algorithms” on page 7-139, or use
a built-in automation algorithm. In this example, you label the ground truth using a built-
in point tracking algorithm.
In this example, you automate the labeling of only the Car ROI labels. The built-in
automation algorithms do not support sublabel and attribute automation.
1 Select the labels you want to automate. In the first frame of the video, press Ctrl and
click to select the two Car label annotations. The labels are highlighted in yellow.
2 From the app toolstrip, select Select Algorithm > Point Tracker. This algorithm
tracks one or more rectangle ROIs over short intervals using the Kanade-Lucas-
Tomasi (KLT) algorithm.
3 (Optional) Configure the automation settings. Click Configure Automation. By default, the automation algorithm applies labels from the start of the time interval to the end. To change the direction and start time of the algorithm, choose one of the options in the Configure Automation dialog box.
The Import selected ROIs option must be selected so that the Car labels you selected are imported into the automation session.
The vehicles that enter the scene later are unlabeled. The unlabeled vehicles did not
have an initial ROI label, so the algorithm did not track them. Click Undo Run. Use
the slider to find the frames where each vehicle first appears. Draw vehicle ROIs
around each vehicle, and then click Run again.
7 Advance frame by frame and manually move, resize, delete, or add ROIs to improve
the results of the automation algorithm.
When you are satisfied with the algorithm results, click Accept. Alternatively, to
discard labels generated during the session and label manually instead, click Cancel.
The Cancel button cancels only the algorithm session, not the app session.
Optionally, you can now manually label the remaining frames with sublabel and attribute
information.
To further evaluate your labels, you can view a visual summary of the labeled ground
truth. From the app toolstrip, select View Label Summary. Use this summary to
compare the frames, frequency of labels, and scene conditions. For more details, see
“View Summary of Ground Truth Labels” on page 7-141. This summary does not support
sublabels or attributes.
Note If you export pixel data, the pixel label data and ground truth data are saved in
separate files but in the same folder. For considerations when working with exported pixel
labels, see “How Labeler Apps Store Exported Pixel Labels” on page 7-3.
In this example, you export the labeled ground truth to the MATLAB workspace. From the
app toolstrip, select Export Labels > To Workspace. The exported MATLAB variable,
gTruth, is a groundTruth object.
Display the properties of the exported groundTruth object. The information in your
exported object might differ from the information shown here.
gTruth
gTruth =
Data Source
Display information about the data source.
gTruth.DataSource
ans =
Source: ...matlab\toolbox\vision\visiondata\visiontraffic.avi
TimeStamps: [531×1 duration]
Label Definitions
Display the label definitions table. Each row contains information about an ROI label
definition or a scene label definition. If you exported pixel label data, the
LabelDefinitions table also includes a PixelLabelID column containing the ID
numbers for each pixel label definition.
gTruth.LabelDefinitions
ans =
3×5 table
Display the sublabel and attribute information for the Car label.
gTruth.LabelDefinitions.Hierarchy{1}
ans =
Type: Rectangle
Description: ''
isOn: [1×1 struct]
Label Data
LabelData is a timetable containing information about the ROI labels drawn at each
timestamp, across the entire video. The timetable contains one column per label.
Display the first few rows of the timetable. The first few timestamps indicate that no
vehicles were detected and that the sunny scene label is false. These results are
because this portion of the video was not labeled. Only the time interval of 5–10 seconds
was labeled.
labelData = gTruth.LabelData;
head(labelData)
ans =
8×3 timetable
Display the first few timetable rows from the 5-10 second interval that contains labels.
gTruthInterval = labelData(timerange('00:00:05','00:00:10'),:);
head(gTruthInterval)
ans =
8×3 timetable
For each Car label, the structure includes the position of the bounding box and
information about its sublabels and attributes.
Display the bounding box positions for the vehicles at the start of the time interval. Your
bounding box positions might differ from the ones shown here.
The app session MAT-file is separate from the ground truth MAT-file that is exported when you select Export Labels > To File. To share labeled ground truth data, as a best practice, share the ground truth MAT-file containing the groundTruth object, not the app session MAT-file. For more details, see “Share and Store Labeled Ground Truth Data” on page 7-147.
See Also
Apps
Video Labeler
Objects
groundTruth | groundTruthDataSource | labelDefinitionCreator |
vision.labeler.AutomationAlgorithm | vision.labeler.mixin.Temporal
More About
• “Use Custom Data Source Reader for Ground Truth Labeling” on page 7-130
• “Keyboard Shortcuts and Mouse Actions for Video Labeler” on page 7-157
• “Use Sublabels and Attributes to Label Ground Truth Data” on page 7-134
• “Label Pixels for Semantic Segmentation” on page 7-88
• “Create Automation Algorithm for Labeling” on page 7-84
• “View Summary of Ground Truth Labels” on page 7-141
• “Share and Store Labeled Ground Truth Data” on page 7-147
• “Train Object Detector or Semantic Segmentation Network from Ground Truth
Data” on page 7-81
Use Custom Data Source Reader for Ground Truth Labeling
The Ground Truth Labeler (requires Automated Driving Toolbox) and Video Labeler
apps enable you to label ground truth data in a video or in a sequence of images.
You can use a custom reader to import any video or sequence of images that is supported
by VideoReader or imread. You can either use the custom reader dialog box in the app
or open the app and specify a custom reader source.
The Image Labeler app does not support custom data source readers.
Specify a custom reader as a function handle. The custom reader must have the syntax:
outputImage = readerFcn(sourceName,currentTimeStamp)
The custom reader function loads an image from sourceName, which corresponds to the
current timestamp specified by currentTimeStamp.
currentTimeStamp = timestamps(currIdx);
The outputImage from the custom function must be a grayscale or RGB image in any
format supported by imshow. currentTimeStamp is a scalar value that corresponds to
the current frame that the algorithm is executing.
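As an illustration, here is a minimal sketch of a custom reader for a folder of numbered PNG images (the folder layout and the timestamp-to-index mapping are assumptions):
function outputImage = readerFcn(sourceName,currentTimeStamp)
    % sourceName is assumed to be a folder of PNG frames named in order.
    imgFiles = dir(fullfile(sourceName,'*.png'));
    % Map the timestamp to a frame index, assuming duration-valued
    % timestamps spaced one second apart.
    idx = round(seconds(currentTimeStamp)) + 1;
    outputImage = imread(fullfile(sourceName,imgFiles(idx).name));
end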
Use the groundTruthDataSource function to read the custom source data with the
custom reader function handle:
gtSource = groundTruthDataSource(sourceName,readerFcn,timeStamps)
The syntax returns a groundTruthDataSource object with the custom reader function
handle, readerFcn. The app uses the handle to load the custom data source specified by
sourceName. The custom reader function loads an image from sourceName that
corresponds to the current timestamp specified by the indexed value in the timeStamps
vector.
The syntax returns a groundTruthDataSource object, which the app uses to read data
from the custom source.
You can import the returned groundTruthDataSource object into the Ground Truth
Labeler or Video Labeler app. For example:
groundTruthLabeler(gtSource)
videoLabeler(gtSource)
See Also
Apps
Ground Truth Labeler | Video Labeler
Functions
groundTruth | groundTruthDataSource
More About
• “Get Started with the Ground Truth Labeler” (Automated Driving Toolbox)
• “Get Started with the Video Labeler” on page 7-109
Use Sublabels and Attributes to Label Ground Truth Data
Consider the possible sublabel and attribute candidates for the label vehicle:
• A wheel is a good candidate for a sublabel. A wheel is part of a vehicle, and you can
draw a label around a wheel.
• Vehicle color is a good candidate for an attribute. You cannot draw a label around the
color of a vehicle.
• Vehicle type (car, truck, and so on) is a good candidate for an attribute. Although you
can draw a label around cars and trucks, they are not part of a vehicle. Instead, you
can define a list attribute with types car and truck, or define logical attributes
named isCar, isTruck, and so on.
Draw Sublabels
Within each frame, each sublabel that you draw must be associated with a parent label.
Therefore, before you can draw a sublabel on a frame, you must:
1 From the ROI Label Definition pane, select the type of sublabel that you want to
draw.
2 Within the frame, select a parent ROI label.
For example, to label the headlights of a vehicle, you must first select the headlight
sublabel definition. On the frame, however, you cannot yet create a sublabel.
After you select a vehicle label on the frame, you can draw a sublabel that is associated
with that vehicle. Once you create a sublabel, you cannot add another sublabel to the
vehicle unless you select the vehicle label again.
Notice that sublabels do not have to be completely enclosed within the parent label. You
can drag sublabels outside the bounds of the parent label and the parent-child
relationship remains unchanged.
If you copy a sublabel into another frame, the parent label is copied over as well. That
way, the parent-child relationship is maintained between frames. Any sublabels that you
did not select to copy do not appear in the new frame.
If you copy a parent label, however, the associated sublabels are not copied over.
Delete Sublabels
To delete an ROI sublabel from a frame, right-click the sublabel and select the Delete
option for the sublabel shape.
To delete an ROI sublabel definition, from the ROI Label Definition pane, right-click the
sublabel and select Delete.
Caution If you delete a sublabel, all ROI sublabel annotations currently on the frames
are deleted, and any attribute definitions for that sublabel are deleted as well.
Sublabel Limitations
• Sublabels can be used only with rectangle and polyline labels.
• Sublabels cannot have their own sublabels.
• The built-in automation algorithms do not support sublabel automation.
• When you click View Label Summary, the Label Summary window does not display
sublabel information.
See Also
Apps
Ground Truth Labeler | Video Labeler
Functions
labelDefinitionCreator
More About
• “Get Started with the Ground Truth Labeler” (Automated Driving Toolbox)
• “Get Started with the Video Labeler” on page 7-109
• “Label Pixels for Semantic Segmentation” on page 7-88
• “Automate Attributes of Labeled Objects” (Automated Driving Toolbox)
Temporal Automation Algorithms
Class Inheritance
If your algorithm is time-based, you must inherit from the
vision.labeler.AutomationAlgorithm and vision.labeler.mixin.Temporal
classes. For example:
classdef MyCustomTemporalAlg < vision.labeler.AutomationAlgorithm & vision.labeler.mixin.Temporal
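Expanded into a minimal skeleton, such a class might look like the sketch below. The property values and method bodies are illustrative placeholders only; the template that the app generates defines the authoritative structure.
classdef MyCustomTemporalAlg < vision.labeler.AutomationAlgorithm & ...
        vision.labeler.mixin.Temporal
    properties (Constant)
        Name = 'My Custom Temporal Algorithm';
        Description = 'Illustrative temporal automation algorithm.';
        UserDirections = {'Run the algorithm over the selected time interval.'};
    end
    methods
        function isValid = checkLabelDefinition(~,labelDef)
            % Accept only rectangle ROI labels (an illustrative choice)
            isValid = (labelDef.Type == labelType.Rectangle);
        end
        function autoLabels = run(~,I) %#ok<INUSD>
            % Return automated labels for the current frame (empty placeholder)
            autoLabels = [];
        end
    end
end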
To create a temporal automation algorithm to use with the Ground Truth Labeler, open
the app by typing groundTruthLabeler at the MATLAB command prompt. Click Select
Algorithm > Add Algorithm > Create new algorithm to open the template.
See Also
Apps
Ground Truth Labeler | Image Labeler | Video Labeler
Functions
groundTruth | groundTruthDataSource |
vision.labeler.AutomationAlgorithm | vision.labeler.mixin.Temporal
Related Examples
• “Get Started with the Video Labeler” on page 7-109
• “Get Started with the Ground Truth Labeler” (Automated Driving Toolbox)
• “Automate Ground Truth Labeling for Semantic Segmentation” (Automated Driving
Toolbox)
• “Automate Ground Truth Labeling of Lane Boundaries” (Automated Driving Toolbox)
View Summary of Ground Truth Labels
You can use the Image Labeler, Video Labeler, and Ground Truth Labeler (requires
Automated Driving Toolbox) apps to interactively label ground truth data in an image
collection, video, image sequence, or from a custom data source. For details about the
supported data sources, see “Choose a Labeling App” on page 7-107.
You can use the View Label Summary option in the app to view and compare the session
distribution of ROI and scene labels over either time or frames.
For ROI labels, the graph displays the number of ROIs on the y-axis, at each time stamp
on the x-axis. The visual summary does not include information about sublabels or label
attributes.
For scene labels, the graph displays the presence or absence of a scene label at each
timestamp. For video, the x-axis represents the time in seconds. For images or for a
custom sequence of images, the x-axis represents frames. Use the graphs to examine the
occurrence of labels over time in relation to each other. Drag the black vertical line in any
graph to move the video to a different timestamp.
For pixel labels, the graph displays the percentage of the frame that is labeled with each
pixel label.
To dock the Label Summary window in your workspace, select Layout > Dock Label
Summary.
See Also
Apps
Ground Truth Labeler | Image Labeler | Video Labeler
Functions
driving.connector.Connector | groundTruth | groundTruthDataSource |
objectDetectorTrainingData | pixelLabelTrainingData
More About
• “Choose a Labeling App” on page 7-107
• “Get Started with the Image Labeler” on page 7-100
Share and Store Labeled Ground Truth Data
The labeling apps export labeled ground truth as a groundTruth object, which contains:
• Data source
• Label definitions
• Marked ground truth labels
If the exported ground truth contains pixel labels, the app also generates a
PixelLabelData folder containing the pixel label data. The LabelData table stored in
the groundTruth object references the path to this folder. Share this folder along with
the groundTruth object.
The labeling apps also enable you to save a MAT-file of the entire app session. Do not
share this file. This file contains app preferences that are specific to your local machine,
and it might not work on other machines.
If you re-export a ground truth object containing pixel label data, the app generates a new
PixelLabelData folder, even if you are overwriting the original groundTruth object. The
generated folders are named PixelLabelData_1, PixelLabelData_2, and so on, depending on
how many times you re-export the groundTruth object to the same folder.
In addition to sharing the groundTruth object, you must also share the data source, and
any additional files associated with that data source.
gTruth.DataSource
ans =
Source: {
' ...\matlab\toolbox\vision\visiondata\imageSets\cups\big
' ...\matlab\toolbox\vision\visiondata\imageSets\cups\blu
' ...\matlab\toolbox\vision\visiondata\imageSets\cups\han
... and 9 more
}
If you move the groundTruth object to a new location, you might need to change the file
paths stored in the groundTruthDataSource object. Even if the data source files are on
a shared network, if other people map a different drive letter to their network folder, the
file paths can be incorrect.
To update these paths, use the changeFilePaths function. Specify the groundTruth
object as an input argument to this function. Also specify a cell array of string vectors
containing the old paths and new paths. For example:
{["C:\Shared\ImgFolder\Img1.png" "D:\Shared\ImgFolder\Img1.png"];
["C:\Shared\ImgFolder\Img2.png" "D:\Shared\ImgFolder\Img2.png"]; ...}.
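A hedged sketch of that call, using the illustrative paths from the example above:
alternativePaths = { ...
    ["C:\Shared\ImgFolder\Img1.png" "D:\Shared\ImgFolder\Img1.png"], ...
    ["C:\Shared\ImgFolder\Img2.png" "D:\Shared\ImgFolder\Img2.png"]};
changeFilePaths(gTruth,alternativePaths);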
If your groundTruth object contains pixel label data, the changeFilePaths function
also updates the path names to the pixel data stored in the PixelLabelData folder.
For a video, an image sequence, or an image collection containing images from a single
folder, consider storing the groundTruth object in the parent folder of the data source.
For image collections containing images from different folders, no specific
recommendations exist for where to store the object. You can label image collections
using the Image Labeler only.
See Also
Apps
Ground Truth Labeler | Image Labeler | Video Labeler
Objects
groundTruth | groundTruthDataSource
Functions
changeFilePaths
More About
• “How Labeler Apps Store Exported Pixel Labels” on page 7-3
Keyboard Shortcuts and Mouse Actions for Image Labeler
Note On Macintosh platforms, use the Command (⌘) key instead of Ctrl.
Label Definitions
• Navigate through ROI labels and their groups in the ROI Label Definition pane: Up arrow or Down arrow
• Navigate through scene labels and their groups in the Scene Label Definition pane: Hold Alt and press the Up arrow or Down arrow
• Reorder labels within a group or move labels between groups: Click and drag labels
• Reorder groups: Click and drag groups
Image Browsing and Selection
• Browse through images one at a time: Left arrow and Right arrow
• Browse to the next set of images that is viewable in the image browser: PC: Page Up and Page Down; Mac: Hold Fn and press the Up and Down arrows
• Go to the first image: PC: Home; Mac: Hold Fn and press the Left arrow
• Go to the last image: PC: End; Mac: Hold Fn and press the Right arrow
• Select all images from the current image to the first image: PC: Shift+Home; Mac: Hold Fn+Shift and press the Left arrow
• Select all images from the current image to the last image: PC: Shift+End; Mac: Hold Fn+Shift and press the Right arrow
• Select all images from the current image to a specific image: Hold Shift and click the final image in the range
• Select a specific set of images: Hold Ctrl and click the images you want to select
Labeling Window
Perform labeling actions, such as adding, moving, and deleting regions of interest (ROIs),
on the current image.
• Undo labeling action: Ctrl+Z
• Redo labeling action: Ctrl+Y
• Select all rectangle ROIs: Ctrl+A
• Select specific rectangle ROIs: Hold Ctrl and click the ROIs you want to select
• Cut selected rectangle ROIs: Ctrl+X
• Copy selected rectangle ROIs to clipboard: Ctrl+C
• Paste copied rectangle ROIs: Ctrl+V
• Delete selected rectangle ROIs: Delete
Polygon Drawing
Draw polygons to label pixels on a frame.
• Commit a polygon to the frame, excluding the currently active line segment: Press Enter or right-click while drawing the polygon
Zooming
• Zoom in or out of the frame: Move the scroll wheel up (zoom in) or down (zoom out)
App Sessions
• Save the current session: Ctrl+S
See Also
Image Labeler
More About
• “Get Started with the Image Labeler” on page 7-100
Keyboard Shortcuts and Mouse Actions for Video Labeler
Note On Macintosh platforms, use the Command (⌘) key instead of Ctrl.
Label Definitions
• Navigate through ROI labels and their groups in the ROI Label Definition pane: Up arrow or Down arrow
• Navigate through scene labels and their groups in the Scene Label Definition pane: Hold Alt and press the Up arrow or Down arrow
• Reorder labels within a group or move labels between groups: Click and drag labels
• Reorder groups: Click and drag groups
Frame Navigation and Time Interval Settings
• Go to the next frame: Right arrow
• Go to the previous frame: Left arrow
• Go to the last frame: PC: End; Mac: Hold Fn and press the Right arrow
• Go to the first frame: PC: Home; Mac: Hold Fn and press the Left arrow
• Navigate through time interval boxes and frame navigation buttons: Tab
• Commit time interval settings: Press Enter within the active time interval box (Start Time, Current, or End Time)
Labeling Window
Perform labeling actions, such as adding, moving, and deleting regions of interest (ROIs),
on the current image or video frame.
• Undo labeling action: Ctrl+Z
• Redo labeling action: Ctrl+Y
• Select all rectangle and line ROIs: Ctrl+A
• Select specific rectangle and line ROIs: Hold Ctrl and click the ROIs you want to select
• Cut selected rectangle and line ROIs: Ctrl+X
• Copy selected rectangle and line ROIs to clipboard: Ctrl+C
• Paste copied rectangle and line ROIs: Ctrl+V
Polyline Drawing
Draw ROI line labels on a frame. ROI line labels are polylines, meaning that they are
composed of one or more line segments.
• Commit a polyline to the frame, excluding the currently active line segment: Press Enter or right-click while drawing the polyline
• Commit a polyline to the frame, including the currently active line segment: Double-click while drawing the polyline. A new line segment is committed at the point where you double-click.
• Delete the previously created line segment in a polyline: Backspace
• Cancel drawing and delete the entire polyline: Escape
Polygon Drawing
Draw polygons to label pixels on a frame.
• Commit a polygon to the frame, excluding the currently active line segment: Press Enter or right-click while drawing the polygon
Zooming
• Zoom in or out of the frame: Move the scroll wheel up (zoom in) or down (zoom out)
App Sessions
• Save the current session: Ctrl+S
See Also
Video Labeler
More About
• “Get Started with the Video Labeler” on page 7-109
Point Feature Types
Corners, multiscale detection: point tracking, image registration, handles changes in scale and rotation, corner detection in scenes of human origin, such as streets and indoor scenes
SURFPoints (detectSURFFeatures): Speeded-Up Robust Features (SURF) algorithm [11]. Blobs, multiscale detection: object detection and image registration with scale and rotation changes
ORBPoints (detectORBFeatures): Oriented FAST and Rotated BRIEF (ORB) method [13]. Corners, multiscale detection: point tracking, image registration, handles changes in rotation, corner detection in scenes of human origin, such as streets and indoor scenes
Function Description
BRISK: The function sets the Orientation property of the validPoints output object to the orientation of the extracted features, in radians.
FREAK: The function sets the Orientation property of the validPoints output object to the orientation of the extracted features, in radians.
SURF: The function sets the Orientation property of the validPoints output object to the orientation of the extracted features, in radians.
Function Description
KAZE: Nonlinear pyramid-based features.
Function Description
Auto: The function selects the method based on the class of the input points:
• The FREAK method for a cornerPoints input object
• The SURF method for a SURFPoints or MSERRegions input object
• The FREAK method for a BRISKPoints input object
• The ORB method for an ORBPoints input object
References
[1] Rosten, E., and T. Drummond, “Machine Learning for High-Speed Corner Detection.”
9th European Conference on Computer Vision. Vol. 1, 2006, pp. 430–443.
[2] Mikolajczyk, K., and C. Schmid. “A performance evaluation of local descriptors.” IEEE
Transactions on Pattern Analysis and Machine Intelligence. Vol. 27, Issue 10,
2005, pp. 1615–1630.
[3] Harris, C., and M. J. Stephens. “A Combined Corner and Edge Detector.” Proceedings
of the 4th Alvey Vision Conference. August 1988, pp. 147–152.
[4] Shi, J., and C. Tomasi. “Good Features to Track.” Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition. June 1994, pp. 593–600.
[5] Tuytelaars, T., and K. Mikolajczyk. “Local Invariant Feature Detectors: A Survey.”
Foundations and Trends in Computer Graphics and Vision. Vol. 3, Issue 3, 2007,
pp. 177–280.
[6] Leutenegger, S., M. Chli, and R. Siegwart. “BRISK: Binary Robust Invariant Scalable
Keypoints.” Proceedings of the IEEE International Conference. ICCV, 2011.
[7] Nister, D., and H. Stewenius. "Linear Time Maximally Stable Extremal Regions."
Lecture Notes in Computer Science. 10th European Conference on Computer
Vision. Marseille, France: 2008, no. 5303, pp. 183–196.
[8] Matas, J., O. Chum, M. Urba, and T. Pajdla. "Robust wide-baseline stereo from
maximally stable extremal regions." Proceedings of British Machine Vision
Conference. 2002, pp. 384–396.
[9] Obdrzalek D., S. Basovnik, L. Mach, and A. Mikulik. "Detecting Scene Elements Using
Maximally Stable Colour Regions." Communications in Computer and Information
Science. La Ferte-Bernard, France: 2009, Vol. 82 CCIS (2010 12 01), pp 107–115.
[10] Mikolajczyk, K., T. Tuytelaars, C. Schmid, A. Zisserman, T. Kadir, and L. Van Gool. "A
Comparison of Affine Region Detectors." International Journal of Computer Vision.
Vol. 65, No. 1–2, November, 2005, pp. 43–72 .
[11] Bay, H., A. Ess, T. Tuytelaars, and L. Van Gool. “SURF: Speeded Up Robust Features.”
Computer Vision and Image Understanding (CVIU). Vol. 110, No. 3, 2008, pp. 346–
359.
[12] Alcantarilla, P.F., A. Bartoli, and A.J. Davison. "KAZE Features", ECCV 2012, Part VI,
LNCS 7577 pp. 214, 2012
[13] Rublee, E., V. Rabaud, K. Konolige and G. Bradski. "ORB: An efficient alternative to
SIFT or SURF." In Proceedings of the 2011 International Conference on Computer
Vision, 2564–2571. Barcelona, Spain, 2011.
See Also
Related Examples
• “Detect BRISK Points in an Image and Mark Their Locations”
Local Feature Detection and Extraction
I = imread('circuit.tif');
corners = detectFASTFeatures(I,'MinContrast',0.1);
J = insertMarker(I,corners,'circle');
imshow(J)
• Repeatable detections:
When given two images of the same scene, most features that the detector finds in
both images are the same. The features are robust to changes in viewing conditions
and noise.
• Distinctive:
The neighborhood around the feature center varies enough to allow for a reliable
comparison between the features.
• Localizable:
The feature has a unique location assigned to it. Changes in viewing conditions do not
affect its location.
local pixel neighborhood into a compact vector representation. This new representation
permits comparison between neighborhoods regardless of changes in scale or orientation.
Descriptors, such as SIFT or SURF, rely on local gradient computations. Binary
descriptors, such as BRISK, ORB or FREAK, rely on pairs of local intensity differences,
which are then encoded into a binary vector.
Criteria Suggestion
Type of features in your image Use a detector appropriate for your data. For example, if
your image contains an image of bacteria cells, use the
blob detector rather than the corner detector. If your
image is an aerial view of a city, you can use the corner
detector to find man-made structures.
Context in which you are using the The HOG, SURF, and KAZE descriptors are suitable for
features: classification tasks. In contrast, binary descriptors, such
as ORB, BRISK and FREAK, are typically used for finding
• Matching key points point correspondences between images, which are used
• Classification for registration.
Type of distortion present in your image Choose a detector and descriptor that addresses the
distortion in your data. For example, if there is no scale
change present, consider a corner detector that does not
handle scale. If your data contains a higher level of
distortion, such as scale and rotation, then use SURF,
ORB or KAZE feature detector and descriptor. The SURF
and the KAZE methods are computationally intensive.
Performance requirements: Binary descriptors are generally faster but less accurate
than gradient-based descriptors. For greater accuracy,
• Real-time performance required use several detectors and descriptors at the same time.
• Accuracy versus speed
Note Detection functions return objects that contain information about the features. The
extractHOGFeatures and extractFeatures functions use these objects to create
descriptors.
original = imread('cameraman.tif');
figure;
imshow(original);
scale = 1.3;
J = imresize(original,scale);
theta = 31;
distorted = imrotate(J,theta);
figure
imshow(distorted)
Detecting the matching SURF features is the first step in determining the transform
needed to correct the distorted image.
ptsOriginal = detectSURFFeatures(original);
ptsDistorted = detectSURFFeatures(distorted);
Extract features and compare the detected blobs between the two images
The detection step found several roughly corresponding blob structures in both images.
Compare the detected blob features. This process is facilitated by feature extraction,
which determines a local patch descriptor.
[featuresOriginal,validPtsOriginal] = ...
extractFeatures(original,ptsOriginal);
[featuresDistorted,validPtsDistorted] = ...
extractFeatures(distorted,ptsDistorted);
It is possible that not all of the original points were used to extract descriptors. Points
might have been rejected if they were too close to the image border. Therefore, the valid
points are returned in addition to the feature descriptors.
The patch size used to compute the descriptors is determined during the feature
extraction step. The patch size corresponds to the scale at which the feature is detected.
Regardless of the patch size, the two feature vectors, featuresOriginal and
featuresDistorted, are computed in such a way that they are of equal length. The
descriptors enable you to compare detected features, regardless of their size and
rotation.
Obtain candidate matches between the features by inputting the descriptors to the
matchFeatures function. Candidate matches imply that the results can contain some
invalid matches. Two patches that match can indicate like features but might not be a
correct match. A table corner can look like a chair corner, but the two features are
obviously not a match.
indexPairs = matchFeatures(featuresOriginal,featuresDistorted);
Each row of the returned indexPairs contains two indices of candidate feature matches
between the images. Use the indices to collect the actual point locations from both
images.
matchedOriginal = validPtsOriginal(indexPairs(:,1));
matchedDistorted = validPtsDistorted(indexPairs(:,2));
If there are a sufficient number of valid matches, remove the false matches. An effective
technique for this scenario is the RANSAC algorithm. The
estimateGeometricTransform function implements M-estimator sample consensus
(MSAC), which is a variant of the RANSAC algorithm. MSAC finds a geometric transform
and separates the inliers (correct matches) from the outliers (spurious matches).
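A sketch of the estimation step, consistent with the variable names used by the commands that follow:
[tform,inlierDistorted,inlierOriginal] = estimateGeometricTransform( ...
    matchedDistorted,matchedOriginal,'similarity');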
figure
showMatchedFeatures(original,distorted,inlierOriginal,inlierDistorted)
title('Matching points (inliers only)')
legend('ptsOriginal','ptsDistorted')
outputView = imref2d(size(original));
recovered = imwarp(distorted,tform,'OutputView',outputView);
figure
imshowpair(original,recovered,'montage')
original = imread('cameraman.tif');
figure;
imshow(original);
text(size(original,2),size(original,1)+15, ...
'Image courtesy of Massachusetts Institute of Technology', ...
'FontSize',7,'HorizontalAlignment','right');
Scale and rotate the original image to create the distorted image.
scale = 1.3;
J = imresize(original, scale);
theta = 31;
distorted = imrotate(J,theta);
figure
imshow(distorted)
Detect the features in both images. Use the BRISK detectors first, followed by the SURF
detectors.
ptsOriginalBRISK = detectBRISKFeatures(original,'MinContrast',0.01);
ptsDistortedBRISK = detectBRISKFeatures(distorted,'MinContrast',0.01);
ptsOriginalSURF = detectSURFFeatures(original);
ptsDistortedSURF = detectSURFFeatures(distorted);
Extract descriptors from the original and distorted images. The BRISK features use the
FREAK descriptor by default.
[featuresOriginalFREAK,validPtsOriginalBRISK] = ...
extractFeatures(original,ptsOriginalBRISK);
[featuresDistortedFREAK,validPtsDistortedBRISK] = ...
extractFeatures(distorted,ptsDistortedBRISK);
[featuresOriginalSURF,validPtsOriginalSURF] = ...
extractFeatures(original,ptsOriginalSURF);
[featuresDistortedSURF,validPtsDistortedSURF] = ...
extractFeatures(distorted,ptsDistortedSURF);
Determine candidate matches by matching FREAK descriptors first, and then SURF
descriptors. To obtain as many feature matches as possible, start with detector and
matching thresholds that are lower than the default values. Once you get a working
solution, you can gradually increase the thresholds to reduce the computational load
required to extract and match features.
indexPairsBRISK = matchFeatures(featuresOriginalFREAK,...
featuresDistortedFREAK,'MatchThreshold',40,'MaxRatio',0.8);
indexPairsSURF = matchFeatures(featuresOriginalSURF,featuresDistortedSURF);
matchedOriginalBRISK = validPtsOriginalBRISK(indexPairsBRISK(:,1));
matchedDistortedBRISK = validPtsDistortedBRISK(indexPairsBRISK(:,2));
matchedOriginalSURF = validPtsOriginalSURF(indexPairsSURF(:,1));
matchedDistortedSURF = validPtsDistortedSURF(indexPairsSURF(:,2));
figure
showMatchedFeatures(original,distorted,matchedOriginalBRISK,...
matchedDistortedBRISK)
title('Putative matches using BRISK & FREAK')
legend('ptsOriginalBRISK','ptsDistortedBRISK')
Combine the candidate matched BRISK and SURF local features. Use the Location
property to combine the point locations from BRISK and SURF features.
matchedOriginalXY = ...
[matchedOriginalSURF.Location; matchedOriginalBRISK.Location];
matchedDistortedXY = ...
[matchedDistortedSURF.Location; matchedDistortedBRISK.Location];
Determine the inlier points and the geometric transform of the BRISK and SURF features.
[tformTotal,inlierDistortedXY,inlierOriginalXY] = ...
estimateGeometricTransform(matchedDistortedXY,...
matchedOriginalXY,'similarity');
Display the results. The result provides several more matches than the example that used
a single feature detector.
figure
showMatchedFeatures(original,distorted,inlierOriginalXY,inlierDistortedXY)
title('Matching points using SURF and BRISK (inliers only)')
legend('ptsOriginal','ptsDistorted')
outputView = imref2d(size(original));
recovered = imwarp(distorted,tformTotal,'OutputView',outputView);
figure;
imshowpair(original,recovered,'montage')
References
[1] Rosten, E., and T. Drummond. “Machine Learning for High-Speed Corner Detection.”
9th European Conference on Computer Vision. Vol. 1, 2006, pp. 430–443.
[2] Mikolajczyk, K., and C. Schmid. “A performance evaluation of local descriptors.” IEEE
Transactions on Pattern Analysis and Machine Intelligence. Vol. 27, Issue 10,
2005, pp. 1615–1630.
[3] Harris, C., and M. J. Stephens. “A Combined Corner and Edge Detector.” Proceedings
of the 4th Alvey Vision Conference. August 1988, pp. 147–152.
[4] Shi, J., and C. Tomasi. “Good Features to Track.” Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition. June 1994, pp. 593–600.
[5] Tuytelaars, T., and K. Mikolajczyk. “Local Invariant Feature Detectors: A Survey.”
Foundations and Trends in Computer Graphics and Vision. Vol. 3, Issue 3, 2007,
pp. 177–280.
[6] Leutenegger, S., M. Chli, and R. Siegwart. “BRISK: Binary Robust Invariant Scalable
Keypoints.” Proceedings of the IEEE International Conference. ICCV, 2011.
[7] Nister, D., and H. Stewenius. "Linear Time Maximally Stable Extremal Regions." 10th
European Conference on Computer Vision. Marseille, France: 2008, No. 5303, pp.
183–196.
[8] Matas, J., O. Chum, M. Urba, and T. Pajdla. "Robust wide-baseline stereo from
maximally stable extremal regions." Proceedings of British Machine Vision
Conference. 2002, pp. 384–396.
[9] Obdrzalek D., S. Basovnik, L. Mach, and A. Mikulik. "Detecting Scene Elements Using
Maximally Stable Colour Regions." Communications in Computer and Information
Science. La Ferte-Bernard, France: 2009, Vol. 82 CCIS (2010 12 01), pp. 107–115.
[10] Mikolajczyk, K., T. Tuytelaars, C. Schmid, A. Zisserman, T. Kadir, and L. Van Gool. "A
Comparison of Affine Region Detectors." International Journal of Computer Vision.
Vol. 65, No. 1–2, November 2005, pp. 43–72.
[11] Bay, H., A. Ess, T. Tuytelaars, and L. Van Gool. “SURF: Speeded Up Robust Features.”
Computer Vision and Image Understanding (CVIU). Vol. 110, No. 3, 2008, pp.
346–359.
[12] Alcantarilla, P.F., A. Bartoli, and A.J. Davison. "KAZE Features", ECCV 2012, Part VI,
LNCS 7577 pp. 214, 2012
[13] Rublee, E., V. Rabaud, K. Konolige and G. Bradski. "ORB: An efficient alternative to
SIFT or SURF." In Proceedings of the 2011 International Conference on Computer
Vision, 2564–2571. Barcelona, Spain, 2011.
See Also
Related Examples
• “Detect BRISK Points in an Image and Mark Their Locations”
Train a Cascade Object Detector
whether the window contains the object of interest. The size of the window varies to
detect objects at different scales, but its aspect ratio remains fixed. The detector is very
sensitive to out-of-plane rotation, because the aspect ratio changes for most 3-D objects.
Thus, you need to train a detector for each orientation of the object. Training a single
detector to handle all orientations will not work.
Each stage of the classifier labels the region defined by the current location of the sliding
window as either positive or negative. Positive indicates that an object was found and
negative indicates no objects were found. If the label is negative, the classification of this
region is complete, and the detector slides the window to the next location. If the label is
positive, the classifier passes the region to the next stage. The detector reports an object
found at the current window location when the final stage classifies the region as positive.
The stages are designed to reject negative samples as fast as possible. The assumption is
that the vast majority of windows do not contain the object of interest. Conversely, true
positives are rare and worth taking the time to verify.
To work well, each stage in the cascade must have a low false negative rate. If a stage
incorrectly labels an object as negative, the classification stops, and you cannot correct
the mistake. However, each stage can have a high false positive rate. Even if the detector
incorrectly labels a nonobject as positive, you can correct the mistake in subsequent
stages.
The overall false positive rate of the cascade classifier is f^s, where f is the false positive
rate per stage in the range (0, 1), and s is the number of stages. Similarly, the overall true
positive rate is t^s, where t is the true positive rate per stage in the range (0, 1]. Thus,
adding more stages reduces the overall false positive rate, but it also reduces the overall
true positive rate.
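As a quick check of this relationship, with illustrative per-stage rates:
f = 0.5;                           % false positive rate per stage
t = 0.995;                         % true positive rate per stage
s = 5;                             % number of stages
overallFalsePositiveRate = f^s     % 0.03125
overallTruePositiveRate = t^s      % approximately 0.975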
Select the function parameters to optimize the number of stages, the false positive rate,
the true positive rate, and the type of features to use for training. When you set the
parameters, consider these tradeoffs.
• A large training set (in the thousands): Increase the number of stages and set a higher false positive rate for each stage.
• A small training set: Decrease the number of stages and set a lower false positive rate for each stage.
• To reduce the probability of missing an object: Increase the true positive rate. However, a high true positive rate can prevent you from achieving the desired false positive rate per stage, making the detector more likely to produce false detections.
• To reduce the number of false detections: Increase the number of stages or decrease the false alarm rate per stage.
Choose the feature that suits the type of object detection you need. The
trainCascadeObjectDetector supports three types of features: Haar, local binary
patterns (LBP), and histograms of oriented gradients (HOG). Haar and LBP features are
often used to detect faces because they work well for representing fine-scale textures.
The HOG features are often used to detect objects such as people and cars. They are
useful for capturing the overall shape of an object. For example, in the following
visualization of the HOG features, you can see the outline of the bicycle.
Training a detector using Haar features takes much longer, so start with LBP or HOG
features, and then try Haar features to see whether the accuracy improves.
To create positive samples easily, you can use the Image Labeler app. The Image Labeler
provides an easy way to label positive samples by interactively specifying rectangular
regions of interest (ROIs).
You can also specify positive samples manually in one of two ways. One way is to specify
rectangular regions in a larger image. The regions contain the objects of interest. The
other approach is to crop out the object of interest from the image and save it as a
separate image. Then, you can specify the region to be the entire image. You can also
generate more positive samples from existing ones by adding rotation or noise, or by
varying brightness or contrast.
As more stages are added, the detector's overall false positive rate decreases, which makes
generating negative samples more difficult. For this reason, it is helpful to supply as many
negative images as possible. To improve training accuracy, supply negative
images that contain backgrounds typically associated with the objects of interest. Also,
include negative images that contain nonobjects similar in appearance to the objects of
interest. For example, if you are training a stop-sign detector, include negative images
that contain road signs and shapes similar to a stop sign.
There is a trade-off between fewer stages with a lower false positive rate per stage or
more stages with a higher false positive rate per stage. Stages with a lower false positive
rate are more complex because they contain a greater number of weak learners. Stages
with a higher false positive rate contain fewer weak learners. Generally, it is better to
have a greater number of simple stages because at each stage the overall false positive
rate decreases exponentially. For example, if the false positive rate at each stage is 50%,
then the overall false positive rate of a cascade classifier with two stages is 25%. With
three stages, it becomes 12.5%, and so on. However, the greater the number of stages,
the greater the amount of training data the classifier requires. Also, increasing the
number of stages increases the false negative rate. This increase results in a greater
chance of rejecting a positive sample by mistake. Set the false positive rate
(FalseAlarmRate) and the number of stages (NumCascadeStages) to yield an
acceptable overall false positive rate. Then you can tune these two parameters
experimentally.
Training can sometimes terminate early. For example, suppose that training stops after
seven stages, even though you set the number of stages parameter to 20. It is possible
that the function cannot generate enough negative samples. If you run the function again
and set the number of stages to seven, you do not get the same result. The results
between stages differ because the number of positive and negative samples to use for
each stage is recalculated for the new number of stages.
Troubleshooting
What if you run out of positive samples?
The number of available positive samples used to train each stage depends on the true
positive rate. The rate specifies what percentage of positive samples the function can
classify as negative. If a sample is classified as a negative by any stage, it never reaches
subsequent stages. For example, suppose you set the TruePositiveRate to 0.9, and all
of the available samples are used to train the first stage. In this case, 10% of the positive
samples are rejected as negatives, and only 90% of the total positive samples are
available for training the second stage. If training continues, then each stage is trained
with fewer and fewer samples. Each subsequent stage must solve an increasingly more
difficult classification problem with fewer positive samples. With each stage getting fewer
samples, the later stages are likely to overfit the data.
Ideally, use the same number of samples to train each stage. To do so, the number of
positive samples used to train each stage must be less than the total number of available
positive samples. The only exception is that when the value of TruePositiveRate times
the total number of positive samples is less than 1, no positive samples are rejected as
negatives.
The function calculates the number of positive samples to use at each stage using the
following formula:
number of positive samples = floor(totalPositiveSamples / (1 +
(NumCascadeStages - 1) * (1 - TruePositiveRate)))
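For example, with illustrative values the formula works out as follows:
totalPositiveSamples = 1000;
NumCascadeStages = 10;
TruePositiveRate = 0.99;
numPositivePerStage = floor(totalPositiveSamples / ...
    (1 + (NumCascadeStages - 1)*(1 - TruePositiveRate)))   % floor(1000/1.09) = 917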
This calculation does not guarantee that the same number of positive samples are
available for each stage. The reason is that it is impossible to predict with certainty how
many positive samples will be rejected as negatives. The training continues as long as the
number of positive samples available to train a stage is greater than 10% of the number
of samples the function determined automatically using the preceding formula. If there
are not enough positive samples, the training stops and the function issues a warning. The
function also outputs a classifier consisting of the stages that it had trained up to that
point. If the training stops, you can add more positive samples. Alternatively, you can
increase TruePositiveRate. Reducing the number of stages can also work, but such
reduction can also result in a higher overall false alarm rate.
The function calculates the number of negative samples used at each stage. This
calculation is done by multiplying the number of positive samples used at each stage by
the value of NegativeSamplesFactor.
Just as with positive samples, there is no guarantee that the calculated number of
negative samples are always available for a particular stage. The
trainCascadeObjectDetector function generates negative samples from the negative
images. However, with each new stage, the overall false alarm rate of the cascade
classifier decreases, making it less likely to find the negative samples.
The training continues as long as the number of negative samples available to train a
stage is greater than 10% of the calculated number of negative samples. If there are not
enough negative samples, the training stops and the function issues a warning. It outputs
a classifier consisting of the stages that it had trained up to that point. When the training
stops, the best approach is to add more negative images. Alternatively, you can reduce
the number of stages or increase the false positive rate.
Examples
Train a Five-Stage Stop-Sign Detector
This example shows you how to set up and train a five-stage, stop-sign detector, using 86
positive samples. The default value for TruePositiveRate is 0.995.
Step 1: Load the positive samples data from a MAT-file. In this example, file names and
bounding boxes are contained in the array of structures labeled 'data'.
load('stopSigns.mat');
imDir = fullfile(matlabroot,'toolbox','vision','visiondata','stopSignImages');
addpath(imDir);
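A hedged sketch of the training call for this example (five stages, the default TruePositiveRate, and an assumed folder of negative images):
negativeFolder = fullfile(matlabroot,'toolbox','vision','visiondata','nonStopSigns');
trainCascadeObjectDetector('stopSignDetector.xml',data,negativeFolder, ...
    'NumCascadeStages',5);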
All 86 positive samples were used to train each stage. This high rate occurs because the
true positive rate is very high relative to the number of positive samples.
This example shows you how to train a stop-sign detector on the same data set as the first
example, (steps 1–3), but with the TruePositiveRate decreased to 0.98.
Only 79 of the total 86 positive samples were used to train each stage. This lower number
occurs because the true positive rate was low enough for the function to start rejecting
some of the positive samples as false negatives.
This example shows you how to train a stop-sign detector on the same data set as the first
example, (steps 1–3), but with the number of stages increased to 10.
At this point, you can add more negative images, reduce the number of stages, or
increase the false positive rate. For example, you can increase the false positive rate,
FalseAlarmRate, to 0.5. The expected overall false-positive rate in this case is 0.0039.
This time the function trains eight stages before the threshold reaches the overall false
alarm rate of 0.000587108 and training stops.
load('stopSignsAndCars.mat');
Select the bounding boxes for stop signs from the table.
positiveInstances = stopSignsAndCars(:,1:2);
imDir = fullfile(matlabroot,'toolbox','vision','visiondata',...
'stopSignImages');
addpath(imDir);
negativeFolder = fullfile(matlabroot,'toolbox','vision','visiondata',...
'nonStopSigns');
negativeImages = imageDatastore(negativeFolder);
Train a cascade object detector called 'stopSignDetector.xml' using HOG features. NOTE:
The command can take several minutes to run.
trainCascadeObjectDetector('stopSignDetector.xml',positiveInstances, ...
negativeFolder,'FalseAlarmRate',0.1,'NumCascadeStages',5);
--cascadeParams--
Training stage 1 of 5
[........................................................................]
Used 42 positive and 84 negative samples
Time to train stage 1: 1 seconds
Training stage 2 of 5
[........................................................................]
Used 42 positive and 84 negative samples
Time to train stage 2: 1 seconds
Training stage 3 of 5
[........................................................................]
Used 42 positive and 84 negative samples
Time to train stage 3: 5 seconds
Training stage 4 of 5
[........................................................................]
Used 42 positive and 84 negative samples
Time to train stage 4: 14 seconds
Training stage 5 of 5
[........................................................................]
Used 42 positive and 17 negative samples
Time to train stage 5: 23 seconds
Training complete
Use the newly trained classifier to detect a stop sign in an image.
detector = vision.CascadeObjectDetector('stopSignDetector.xml');
img = imread('stopSignTest.jpg');
bbox = step(detector,img);
detectedImg = insertObjectAnnotation(img,'rectangle',bbox,'stop sign'); % label text is illustrative
figure; imshow(detectedImg);
rmpath(imDir);
See Also
More About
• “Get Started with the Image Labeler” on page 7-100
External Websites
• Cascade Trainer
Train Optical Character Recognition for Custom Fonts
The optical character recognition (OCR) app trains the ocr function to recognize a
custom language or font. You can use this app to label character data interactively for
OCR training and to generate an OCR language data file for use with the ocr function.
Train OCR
1 In the OCR Trainer, click New Session to open the OCR Training Session Settings
dialog box.
2 Under Output Settings, enter a name for the OCR language data file and choose the
output folder location for the file. The location you specify must be writable.
3 Under Labeling Method, either label the data manually or pre-label it using optical
character recognition. If you use OCR, you can select either the pre-installed English
or Japanese language, or you can download additional language support files.
4 Add images at any time during the training session. The trainer automatically
segments the images for OCR training. Inspect the results to verify expected text
segmentation. To improve the segmentation, pre-process your images using the
Image Segmenter app. Once the images are added, you can inspect segmentation
results from the training image view.
To limit the OCR to a specific character set, select the Character set check box and
add the characters.
Note Use training images that contain text that you want OCR to recognize. Do not
use training images with only a few characters. OCR training works best if training
images contain blocks of many words. You can use the insertText function to
automatically generate training images for a known font.
I = zeros(500,500,3,'uint8');
textLines = [
    "some training text"
    "even more stuff to learn"
    ];
lineYLocation = 50;
for i = 1:numel(textLines)
    I = insertText(I,[50 lineYLocation],char(textLines(i)), ...
        'Font','LucidaSansRegular', ...
        'FontSize',16,'TextColor','white', ...
        'BoxOpacity',0);
    % Advance to the next line of text (spacing value is illustrative)
    lineYLocation = lineYLocation + 20;
end
• To correct samples, select a group of samples from the Data Browser pane and
change the labels using the Character Label field.
• To exclude a sample from training, right-click the sample and select the option to
move that sample to the Unknown category. Unknown samples are listed at the
top of the Data Browser pane and are not used for training.
• If the bounding box clipped a character, double-click the character and modify it
in the image it was extracted from.
7 After correcting the samples, click Train. When the trainer completes training, the
app creates an OCR language data file and saves it to the folder you specified.
App Controls
Sessions
Starts a new session, opens a saved session, or adds a session to the current one. You can
also save and name the session. The sessions are saved as MAT files.
Add Images
Adds images. You can add images when you start a new session or after you accept the
current collection of images.
Settings
Edit Box
Selects the image that contains the selected character, along with the bounding boxes.
You can create additional regions, merge, modify, or delete existing images. To delete an
ROI, use the delete key.
Train
Creates an OCR data file from the session. To use the .traineddata file with the ocr
function, set the 'Language' property for the ocr function, and follow the directions for
a custom language.
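For example, a hedged sketch (the output folder, file name, and test image are assumptions):
ocrLanguage = fullfile('C:','myOCRTraining','tessdata','myFont.traineddata');
results = ocr(imread('myTestImage.png'),'Language',ocrLanguage);
recognizedText = results.Text;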
Generate Function
See Also
OCR Trainer | ocr
See Also
graythresh | imbinarize | imtophat | ocr | ocrText | visionSupportPackages
More About
• “Install Computer Vision Toolbox Add-on Support Files” on page 3-2
Create a Custom Feature Extractor
Input images can require preprocessing before feature extraction. To extract SURF
features and to use the detectSURFFeatures or detectMSERFeatures functions, the
images must be grayscale. If the images are not grayscale, you can convert them using
the rgb2gray function.
[height,width,numChannels] = size(I);
if numChannels > 1
grayImage = rgb2gray(I);
else
grayImage = I;
end
Use a regular spaced grid of point locations. Using the grid over the image allows for
dense SURF feature extraction. The grid step is in pixels.
gridStep = 8;
gridX = 1:gridStep:width;
gridY = 1:gridStep:height;
[x,y] = meshgrid(gridX,gridY);
gridLocations = [x(:) y(:)];   % list of [x y] point locations on the grid
You can manually concatenate multiple SURFPoints objects at different scales to achieve
multiscale feature extraction.
multiscaleGridPoints = [SURFPoints(gridLocations,'Scale',1.6);
SURFPoints(gridLocations,'Scale',3.2);
SURFPoints(gridLocations,'Scale',4.8);
SURFPoints(gridLocations,'Scale',6.4)];
Alternatively, you can use a feature detector, such as detectSURFFeatures, to select the point locations.
multiscaleSURFPoints = detectSURFFeatures(I);
Extract features
Extract features from the selected point locations. By default, bagOfFeatures extracts
upright SURF features.
features = extractFeatures(grayImage,multiscaleGridPoints,'Upright',true);
The feature metrics indicate the strength of each feature. Larger metric values are
assigned to stronger features. Use feature metrics to identify and remove weak features
before using bagOfFeatures to learn the visual vocabulary of an image set. Use the
metric that is suitable for your feature vectors.
For example, you can use the variance of the SURF features as the feature metric.
featureMetrics = var(features,[],2);
If you used a feature detector for the point selection, then use the detection metric
instead.
featureMetrics = multiscaleSURFPoints.Metric;
You can optionally return the feature location information. The feature location can be
used for spatial or geometric verification image search applications. See the “Geometric
Verification Using estimateGeometricTransform Function” example. The
retrieveImages and indexImages functions are used for content-based image
retrieval systems.
if nargout > 2
varargout{1} = multiscaleGridPoints.Location;
end
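A hedged sketch of wrapping these steps into a custom extractor function and passing it to bagOfFeatures (the function name and image folder are assumptions):
setDir = fullfile(toolboxdir('vision'),'visiondata','imageSets');
imds = imageDatastore(setDir,'IncludeSubfolders',true);
extractorFcn = @myCustomSURFExtractor;   % implements the steps shown above
bag = bagOfFeatures(imds,'CustomExtractor',extractorFcn);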
Image Retrieval with Bag of Visual Words
The retrieval system uses a bag of visual words, a collection of image descriptors, to
represent your data set of images. Images are indexed to create a mapping of visual
words. The index maps each visual word to its occurrences in the image set. A
comparison between the query image and the index provides the images most similar to
the query image. By using the CBIR system workflow, you can evaluate the accuracy for a
known set of image search results.
Index images
imageIndex = indexImages(imds)
imageIndex = indexImages(imds,bag)
(The original figure shows the index mapping each visual word to the IDs of the images in which it occurs.)
You can use the original imgSet or a different collection of images for the training
set. To use a different collection, create the bag of visual words before creating the
image index, using the bagOfFeatures function. The advantage of using the same
set of images is that the visual vocabulary is tailored to the search set. The
disadvantage of this approach is that the retrieval system must relearn the visual
vocabulary to use on a drastically different set of images. With an independent set,
the visual vocabulary is better able to handle the additions of new images into the
search index.
3 Index the images. The indexImages function creates a search index that maps
visual words to their occurrences in the image collection. When you create the bag of
visual words using an independent or subset collection, include the bag as an input
argument to indexImages. If you do not create an independent bag of visual words,
then the function creates the bag based on the entire imgSet input collection. You
can add and remove images directly to and from the image index using the
addImages and removeImages methods.
4 Search the data set for similar images. Use the retrieveImages function to search
the image set for images that are similar to the query image. Use the NumResults
property to control the number of results. For example, to return the top 10 similar
images, set NumResults to 10. Use the ROI property to search using a smaller region
of the query image. A smaller region is useful for isolating a particular object in an
image that you want to search for.
Choose the image features best suited to the type of images within the collection. For
example, if you are searching an image collection made up of scenes, such as beaches,
cities, or highways, use a global image feature. A global image feature, such as a color
histogram, captures the key elements of the entire scene. To find specific objects within
the image collections, use local image features extracted around object keypoints instead.
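Putting the workflow together, a hedged end-to-end sketch (the image folder is an assumption):
dataDir = fullfile(toolboxdir('vision'),'visiondata','bookCovers');  % assumed sample folder
imds = imageDatastore(dataDir);
imageIndex = indexImages(imds);                   % build the search index
queryImage = readimage(imds,1);                   % use the first image as the query
imageIDs = retrieveImages(queryImage,imageIndex,'NumResults',10);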
See Also
Related Examples
• “Image Retrieval Using Customized Bag of Features”
Image Classification with Bag of Visual Words
setDir = fullfile(toolboxdir('vision'),'visiondata','imageSets');
imds = imageDatastore(setDir,'IncludeSubfolders',true,'LabelSource',...
'foldernames');
Separate the sets into training and test image subsets. In this example, 30% of the images
are partitioned for training and the remainder for testing.
[trainingSet,testSet] = splitEachLabel(imds,0.3,'randomize');
The bagOfFeatures object defines the features, or visual words, by using the k-means
clustering (Statistics and Machine Learning Toolbox) algorithm on the feature descriptors
extracted from trainingSet. The algorithm iteratively groups the descriptors into k
mutually exclusive clusters. The resulting clusters are compact and separated by similar
characteristics. Each cluster center represents a feature, or visual word.
You can extract features based on a feature detector, or you can define a grid to extract
feature descriptors. The grid method may lose fine-grained scale information. Therefore,
use the grid for images that do not contain distinct features, such as an image containing
scenery, like the beach. Using speeded up robust features (or SURF) detector provides
greater scale invariance. By default, the algorithm runs the 'grid' method.
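A hedged sketch of creating the bag of visual words from the training set (the vocabulary size and point selection method are illustrative choices):
bag = bagOfFeatures(trainingSet,'VocabularySize',500,'PointSelection','Detector');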
This algorithm workflow analyzes images in their entirety. Images must have appropriate
labels describing the class that they represent. For example, a set of car images could be
labeled cars. The workflow does not rely on spatial information nor on marking the
particular objects in an image. The bag-of-visual-words technique relies on detection
without localization.
bagOfFeatures object to encode images in the image set into the histogram of visual
words. The histogram of visual words are then used as the positive and negative samples
to train the classifier.
1 Use the bagOfFeatures encode method to encode each image from the training
set. This function detects and extracts features from the image and then uses the
approximate nearest neighbor algorithm to construct a feature histogram for each
image. The function then increments histogram bins based on the proximity of the
descriptor to a particular cluster center. The histogram length corresponds to the
number of visual words that the bagOfFeatures object constructed. The histogram
becomes a feature vector for the image.
(The original figure shows a histogram with the visual word index on the x-axis and the word count on the y-axis.)
2 Repeat step 1 for each image in the training set to create the training data.
(The original figure shows the encoded feature vectors plotted as clusters grouped into the boats, mugs, and hats training categories.)
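One way to carry out the encoding and training steps described above is with the encode method and the trainImageCategoryClassifier function; the sketch below uses default options and is illustrative only.
featureVector = encode(bag,readimage(trainingSet,1));        % histogram for one image
categoryClassifier = trainImageCategoryClassifier(trainingSet,bag);
confMatrix = evaluate(categoryClassifier,testSet);            % per-class confusion matrix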
(The original figure shows the trained classifier labeling test images of boats, mugs, and hats, and the resulting confusion matrix.)
References
[1] Csurka, G., C. R. Dance, L. Fan, J. Willamowski, and C. Bray. Visual Categorization with
Bags of Keypoints. Workshop on Statistical Learning in Computer Vision. ECCV 1
(1–22), 1–2.
See Also
Related Examples
• “Image Category Classification Using Bag of Features”
• “Image Retrieval Using Customized Bag of Features”
Motion Estimation and Tracking
Multiple Object Tracking
Detection
Selecting the right approach for detecting objects of interest depends on what you want
to track and whether the camera is stationary.
To detect objects in motion with a stationary camera, you can perform background
subtraction using the vision.ForegroundDetector System object. The background
subtraction approach works efficiently but requires the camera to be stationary.
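A hedged sketch of the background subtraction approach (the video file and parameter values are assumptions):
videoReader = VideoReader('visiontraffic.avi');
detector = vision.ForegroundDetector('NumTrainingFrames',50);
blobAnalyzer = vision.BlobAnalysis('MinimumBlobArea',400);
while hasFrame(videoReader)
    frame = readFrame(videoReader);
    foregroundMask = detector(frame);                         % segment moving pixels
    [areas,centroids,bboxes] = blobAnalyzer(foregroundMask);  % moving objects
end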
To detect objects in motion with a moving camera, you can use a sliding-window detection
approach. This approach typically works more slowly than the background subtraction
approach. To detect and track a specific category of object, use the System objects or
functions described in the table.
Prediction
To track an object over time means that you must predict its location in the next frame.
The simplest method of prediction is to assume that the object will be near its last known
location. In other words, the previous detection serves as the next prediction. This
method is especially effective for high frame rates. However, using this prediction method
can fail when objects move at varying speeds, or when the frame rate is low relative to
the speed of the object in motion.
A more sophisticated method of prediction is to use the previously observed motion of the
object. The Kalman filter (vision.KalmanFilter) predicts the next location of an
object, assuming that it moves according to a motion model, such as constant velocity or
constant acceleration. The Kalman filter also takes into account process noise and
measurement noise. Process noise is the deviation of the actual motion of the object from
the motion model. Measurement noise is the detection error.
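A hedged sketch of configuring a Kalman filter for a newly detected object (the noise values are illustrative):
initialLocation = [100 200];                 % centroid from the first detection
kalmanFilter = configureKalmanFilter('ConstantVelocity',initialLocation, ...
    [200 50],[100 25],100);                  % estimate error, motion noise, measurement noise
predictedLocation = predict(kalmanFilter);
correctedLocation = correct(kalmanFilter,[105 204]);   % update with a new detection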
Data Association
Data association is the process of associating detections corresponding to the same
physical object across frames. The temporal history of a particular object consists of
multiple detections, and is called a track. A track representation can include the entire
history of the previous locations of the object. Alternatively, it can consist only of the
object's last known location and its current velocity.
To match a detection to a track, you must establish criteria for evaluating the matches.
Typically, you establish this criteria by defining a cost function. The higher the cost of
matching a detection to a track, the less likely that the detection belongs to the track. A
simple cost function can be defined as the degree of overlap between the bounding boxes
of the predicted and detected objects. The “Tracking Pedestrians from a Moving Car”
example implements this cost function using the bboxOverlapRatio function. You can
implement a more sophisticated cost function, one that accounts for the uncertainty of the
prediction, using the distance function of the vision.KalmanFilter object. You can
also implement a custom cost function than can incorporate information about the object
size and appearance.
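A hedged sketch of building a simple overlap-based cost matrix and assigning detections to tracks (the bounding boxes and the cost of non-assignment are illustrative):
predictedBoxes = [20 30 50 80; 200 40 60 90];    % one row per existing track
detectedBoxes  = [22 33 50 80; 400 50 60 90];    % one row per new detection
overlap = bboxOverlapRatio(predictedBoxes,detectedBoxes);
costMatrix = 1 - overlap;                         % low cost means a likely match
[assignments,unassignedTracks,unassignedDetections] = ...
    assignDetectionsToTracks(costMatrix,0.8);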
Track Management
Data association must take into account the fact that new objects can appear in the field
of view, or that an object being tracked can leave the field of view. In other words, in any
given frame, some number of new tracks might need to be created, and some number of
existing tracks might need to be discarded. The assignDetectionsToTracks function
returns the indices of unassigned tracks and unassigned detections in addition to the
matched pairs.
One way of handling unmatched detections is to create a new track from each of them.
Alternatively, you can create new tracks from unmatched detections greater than a
certain size, or from detections that have certain locations or appearance. For example, if
the scene has a single entry point, such as a doorway, then you can specify that only
unmatched detections located near the entry point can begin new tracks, and that all
other detections are considered noise.
Another way of handling unmatched tracks is to delete any track that remains unmatched
for a certain number of frames. Alternatively, you can specify to delete an unmatched
track when its last known location is near an exit point.
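Continuing the cost-matrix sketch above, assignDetectionsToTracks solves the assignment problem. The cost of non-assignment shown here is an illustrative tuning value you choose for your application.
costOfNonAssignment = 0.8;     % illustrative; controls how readily matches are rejected
[assignments, unassignedTracks, unassignedDetections] = ...
    assignDetectionsToTracks(cost, costOfNonAssignment);
% assignments is an M-by-2 matrix of [trackIndex detectionIndex] pairs.
% A simple policy: start a new track for each unassigned detection, and delete
% tracks that remain unassigned for several consecutive frames.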
See Also
assignDetectionsToTracks | bboxOverlapRatio | configureKalmanFilter |
extractHOGFeatures | selectStrongestBbox | trainCascadeObjectDetector |
vision.CascadeObjectDetector | vision.ForegroundDetector |
vision.KalmanFilter | vision.PeopleDetector | vision.PointTracker
Related Examples
• “Tracking Pedestrians from a Moving Car”
• “Using Kalman Filter for Object Tracking”
• “Motion-Based Multiple Object Tracking”
More About
• “Train a Cascade Object Detector” on page 7-187
External Websites
• Detect and Track Multiple Faces
Video Mosaicking
This example shows how to create a mosaic from a video sequence. Video mosaicking is
the process of stitching video frames together to form a comprehensive view of the scene.
The resulting mosaic image is a compact representation of the video data. The Video
Mosaicking block is often used in video compression and surveillance applications.
This example illustrates how to use the Corner Detection block, the Estimate Geometric
Transformation block, the Projective Transform block, and the Compositing block to
create a mosaic image from a video sequence.
Example Model
The Input subsystem either loads a video sequence from a file or generates a synthetic
video sequence; the choice is user defined. First, the Corner Detection block finds points
that are matched between successive frames by the Corner Matching subsystem. Then
the Estimate Geometric Transformation block computes an accurate estimate of the
transformation matrix. This block uses the RANSAC algorithm to eliminate outlier input
points, reducing error along the seams of the output mosaic image. Finally, the
Mosaicking subsystem overlays the current video frame onto the output image to
generate a mosaic.
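A rough command-line analogue of one iteration of this model is sketched below; it is not the model itself. It assumes prevGray and currGray are consecutive grayscale frames and mosaic is the running output image, and it shows only the frame-to-frame transformation (the model accumulates these transformations across the sequence).
prevPts = detectFASTFeatures(prevGray);          % FAST corners (local intensity comparison)
currPts = detectFASTFeatures(currGray);
[fPrev, vPrev] = extractFeatures(prevGray, prevPts);
[fCurr, vCurr] = extractFeatures(currGray, currPts);
pairs = matchFeatures(fCurr, fPrev);
tform = estimateGeometricTransform( ...          % RANSAC rejects outlier matches
    vCurr(pairs(:,1)), vPrev(pairs(:,2)), 'projective');
warped = imwarp(currGray, tform, 'OutputView', imref2d(size(mosaic)));
mosaic = max(mosaic, warped);                    % simple compositing; the model uses a Compositing block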
Input Subsystem
The Input subsystem can be configured to load a video sequence from a file, or to
generate a synthetic video sequence.
If you choose to use a video sequence from a file, you can reduce computation time by
processing only some of the video frames. This is done by setting the downsampling rate
in the Frame Rate Downsampling subsystem.
If you choose a synthetic video sequence, you can set the speed of translation and
rotation, output image size and origin, and the level of noise. The output of the synthetic
video sequence generator mimics the images captured by a perspective camera with
arbitrary motion over a planar surface.
The subsystem finds corner features in the current video frame using one of three methods.
The example uses Local intensity comparison (Rosten & Drummond), which is the fastest
method. The other available methods are Harris corner detection (Harris & Stephens)
and Minimum eigenvalue (Shi & Tomasi).
The Corner Matching subsystem finds the number of corners, their locations, and their metric
values. The subsystem then calculates the distances between all features in the current
frame and those in the previous frame. By searching for the minimum distances, the
subsystem finds the best matching features.
Mosaicking Subsystem
The subsystem is reset when the video sequence rewinds or when the Estimate Geometric
Transformation block does not find enough inliers.
The Corners window shows the corner locations in the current video frame.
Pattern Matching
This example shows how to use 2-D normalized cross-correlation for pattern matching
and target tracking. The example uses a predefined or user-specified target and the number of
similar targets to be tracked. The normalized cross-correlation plot shows that the target is
identified whenever the value exceeds the set threshold.
Introduction
In this example you use normalized cross correlation to track a target pattern in a video.
The pattern matching algorithm involves the following steps:
• The input video frame and the template are reduced in size to minimize the amount of
computation required by the matching algorithm.
• Normalized cross correlation, in the frequency domain, is used to find a template in
the video frame.
• The location of the pattern is determined by finding the maximum cross correlation
value.
Initialize required variables such as the threshold value for the cross correlation and the
decomposition level for Gaussian Pyramid decomposition.
threshold = single(0.99);
level = 2;
Specify the target image and number of similar targets to be tracked. By default, the
example uses a predefined target and finds up to 2 similar patterns. You can set the
variable useDefaultTarget to false to specify a new target and the number of similar
targets to match.
useDefaultTarget = true;
[Img, numberOfTargets, target_image] = ...
videopattern_gettemplate(useDefaultTarget);
% Rotate the target image by 180 degrees, and perform zero padding so that
% the dimensions of both the target and the input image are the same.
target_image_rot = imrotate(target_image_gp, 180);
[rt, ct] = size(target_image_rot);
Img = single(Img);
Img = multilevelPyramid(Img, level);
[ri, ci]= size(Img);
r_mod = 2^nextpow2(rt + ri);
c_mod = 2^nextpow2(ct + ci);
target_image_p = [target_image_rot zeros(rt, c_mod-ct)];
target_image_p = [target_image_p; zeros(r_mod-rt, c_mod)];
Create a System object to calculate the local maximum value for the normalized cross
correlation.
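The object creation might look like the following sketch; the property values shown are illustrative, not the shipped example's settings. The video player created next displays the input video with the matched region of interest overlaid on the target.
hFindMax = vision.LocalMaximaFinder( ...
    'Threshold', single(-1), ...                    % keep all candidates; thresholding happens later
    'MaximumNumLocalMaxima', numberOfTargets, ...
    'NeighborhoodSize', floor([rt ct]/2)*2 - 1);    % odd-sized neighborhood around each maximum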
sz = get(0,'ScreenSize');
pos = [20 sz(4)-400 400 300];
hROIPattern = vision.VideoPlayer('Name', 'Overlay the ROI on the target', ...
'Position', pos);
Initialize the figure window for plotting the normalized cross-correlation value.
Create a processing loop to perform pattern matching on the input video. This loop uses
the System objects you instantiated above. The loop is stopped when you reach the end of
the input file, which is detected by the VideoFileReader System object.
while ~isDone(hVideoSrc)
Im = step(hVideoSrc);
% Calculate image energies and block run tiles that are size of
% target template.
IUT_energy = (Im_gp).^2;
IUT = conv2(IUT_energy, C_ones, 'valid');
IUT = sqrt(IUT);
norm_Corr_f_linear = norm_Corr_f(:);
norm_Corr_value = norm_Corr_f_linear(linear_index);
detect = (norm_Corr_value > threshold);
target_roi = zeros(length(detect), 4);
ul_corner = (gain.*(xyLocation(detect, :)-1))+1;
target_roi(detect, :) = [ul_corner, fliplr(target_size(detect, :))];
videopatternplots('update',hPlot,norm_Corr_value);
step(hROIPattern, Imf);
end
snapnow
release(hVideoSrc);
% Helper function to compute the Gaussian pyramid image at a particular level.
function outI = multilevelPyramid(inI, level)
I = inI;
outI = I;
for i=1:level
outI = impyramid(I, 'reduce');
I = outI;
end
end
Summary
This example shows how to use the Computer Vision Toolbox™ to find a user-defined pattern in a
video and track it. The algorithm is based on normalized frequency-domain cross-
correlation between the target and the image under test. The video player window
displays the input video with the identified target locations. A figure also displays the
normalized correlation between the target and the image, which is used as the metric for
matching the target. Whenever the correlation value exceeds the threshold
(indicated by the blue line), the target is identified in the input video and its location is
marked by the green bounding box.
Appendix
• videopattern_gettemplate.m
• videopatternplots.m
Pattern Matching
This example shows how to use the 2-D normalized cross-correlation for pattern matching
and target tracking.
Double-click the Edit Parameters block to select the number of similar targets to detect.
You can also change the pyramiding factor. By increasing it, you can match the target
template to each video frame more quickly. Changing the pyramiding factor might require
you to change the Threshold value.
Additionally, you can double-click the Correlation Method switch to specify the domain in
which to perform the cross-correlation. The relative size of the target to the input video
frame and the pyramiding factor determine which domain computation is faster.
Example Model
The Match metric window shows the variation of the target match metrics. The model
determines that the target template is present in a video frame when the match metric
exceeds a threshold (cyan line).
The Cross-correlation window shows the result of cross-correlating the target template
with a video frame. Large values in this window correspond to the locations of the targets
in the input image.
The Overlay window shows the locations of the targets by highlighting them with
rectangular regions of interest (ROIs). These ROIs are present only when the targets are
detected in the video frame.
9
Geometric Transformations
Rotate an Image
You can use the Rotate block to rotate your image or video stream by a specified angle. In
this example, you learn how to use the Rotate block to continuously rotate an image.
ex_vision_rotate_image
1 Define an RGB image in the MATLAB workspace. At the MATLAB command prompt,
type
I = checker_board;
imshow(I)
3 Create a new Simulink model, and add to it the blocks shown in the following table.
The Video Viewer block automatically displays the original image in the Video Viewer
window when you run the model. Because the image is represented by double-
precision floating-point values, a value of 0 corresponds to black and a value of 1
corresponds to white.
6 Use the Rotate block to rotate the image. Set the block parameters as follows:
The Angle port appears on the block. You use this port to input a steadily increasing
angle. Setting the Output size parameter to Expanded to fit rotated input
image ensures that the block does not crop the output.
7 Use the Video Viewer1 block to display the rotating image. Accept the default
parameters.
8 Use the Counter block to create a steadily increasing angle. Set the block parameters
as follows:
• Output = Count
• Clear the Reset input check box.
• Sample time = 1/30
The Counter block counts upward until it reaches the maximum value that can be
represented by 16 bits. Then, it starts again at zero. You can view its output value on
the Display block while the simulation is running. The Counter block's Count data
type parameter enables you to specify its output data type.
9 Use the Gain block to convert the output of the Counter block from degrees to
radians. Set the Gain parameter to pi/180.
10 Connect the blocks as shown in the following figure.
11 Set the configuration parameters. Open the Configuration dialog box by selecting
Model Configuration Parameters from the Simulation menu. Set the parameters
as follows:
In this example, you used the Rotate block to continuously rotate your image. For more
information about this block, see the Rotate block reference page in the Computer Vision
Toolbox Reference. For more information about other geometric transformation blocks,
see the Resize and Shear block reference pages.
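A command-line counterpart of a single rotation step uses the imrotate function; the angle shown here is illustrative.
I = checker_board;                              % same test image as in step 1
J = imrotate(I, 45, 'bilinear', 'loose');       % 'loose' expands the output, like the block's Expanded setting
imshow(J)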
Note If you are on a Windows operating system, you can replace the Video Viewer block
with the To Video Display block, which supports code generation.
Resize an Image
You can use the Resize block to change the size of your image or video stream. In this
example, you learn how to use the Resize block to reduce the size of an image:
ex_vision_resize_image
1 Create a new Simulink model, and add to it the blocks shown in the following table.
7 Set the configuration parameters. Open the Configuration dialog box by selecting
Model Configuration Parameters from the Simulation menu. Set the parameters
as follows:
In this example, you used the Resize block to shrink an image. For more information
about this block, see the Resize block reference page. For more information about other
geometric transformation blocks, see the Rotate, Warp, Estimate Geometric
Transformation, and Translate block reference pages.
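A command-line counterpart uses imresize; the image and scale factor are illustrative.
I = im2double(imread('moon.tif'));   % any intensity image
J = imresize(I, 0.5);                % shrink by a factor of two
imshowpair(I, J, 'montage')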
Crop an Image
You can use the Selector block to crop your image or video stream. In this example, you
learn how to use the Selector block to trim an image down to a particular region of
interest:
ex_vision_crop_image
1 Create a new Simulink model, and add to it the blocks shown in the following table.
The Selector block starts at row 140 and column 200 of the image and outputs the
next 70 rows and columns of the image.
5 Use the Video Viewer1 block to display the cropped image. This block automatically
displays the modified image in the Video Viewer window when you run the model.
6 Connect the blocks as shown in the following figure.
7 Set the configuration parameters. Open the Configuration dialog box by selecting
Model Configuration Parameters from the Simulation menu. Set the parameters
as follows:
The cropped image appears in the Video Viewer1 window. The following image is
shown at its true size.
In this example, you used the Selector block to crop an image. For more information
about the Selector block, see the Simulink documentation. For information about the
imcrop function, see the Image Processing Toolbox documentation.
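A command-line counterpart of the same crop, assuming I is the image fed to the model:
cropped = I(140:209, 200:269, :);        % rows 140-209, columns 200-269, all color planes
% Equivalent call to imcrop; the rectangle is [xmin ymin width height]:
cropped2 = imcrop(I, [200 140 69 69]);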
Nearest Neighbor, Bilinear, and Bicubic Interpolation Methods
Nearest Neighbor Interpolation
For nearest neighbor interpolation, the block uses the value of the nearest translated pixel for
each output pixel value. For example, suppose the matrix
1 2 3
4 5 6
7 8 9
represents your input image. You want to translate this image 1.7 pixels in the positive
horizontal direction using nearest neighbor interpolation. The Translate block's nearest
neighbor interpolation algorithm is illustrated by the following steps:
1 Zero pad the input matrix and translate it by 1.7 pixels to the right.
0 1 2 3 0 0
0 4 5 6 0 0
0 7 8 9 0 0
2 Create the output matrix by replacing each input pixel value with the translated value
nearest to it. The result is the following matrix:
0 0 1 2 3
0 0 4 5 6
0 0 7 8 9
Note You wanted to translate the image by 1.7 pixels, but this method translated the
image by 2 pixels. Nearest neighbor interpolation is computationally efficient but not as
accurate as bilinear or bicubic interpolation.
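You can check this arithmetic for the first row of the matrix with interp1. This only illustrates the numbers; it is not the Translate block's implementation.
x   = 0:4;                                        % coordinates of the zero-padded row
row = [0 1 2 3 0];                                % zero-padded first row of the input matrix
nn  = interp1(x, row, (1:5) - 1.7, 'nearest', 0)  % nearest neighbor, 1.7-pixel shift: 0 0 1 2 3
bl  = interp1(x, row, (1:4) - 0.5, 'linear', 0)   % bilinear, 0.5-pixel shift: 0.5 1.5 2.5 1.5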
Bilinear Interpolation
For bilinear interpolation, the block uses the weighted average of two translated pixel
values for each output pixel value. For example, suppose the matrix
1 2 3
4 5 6
7 8 9
represents your input image. You want to translate this image 0.5 pixel in the positive
horizontal direction using bilinear interpolation. The Translate block's bilinear
interpolation algorithm is illustrated by the following steps:
1 Zero pad the input matrix and translate it by 0.5 pixel to the right.
0 1 2 3 0
0 4 5 6 0
0 7 8 9 0
2 Create the output matrix by replacing each input pixel value with the weighted
average of the translated values on either side. The result is the following matrix,
where the output matrix has one more column than the input matrix:
0.5 1.5 2.5 1.5
2 4.5 5.5 3
3.5 7.5 8.5 4.5
Bicubic Interpolation
For bicubic interpolation, the block uses the weighted average of four translated pixel
values for each output pixel value. For example, suppose the matrix
1 2 3
4 5 6
7 8 9
represents your input image. You want to translate this image 0.5 pixel in the positive
horizontal direction using bicubic interpolation. The Translate block's bicubic
interpolation algorithm is illustrated by the following steps:
1 Zero pad the input matrix and translate it by 0.5 pixel to the right.
0 0 1 2 3 0 0
0 0 4 5 6 0 0
0 0 7 8 9 0 0
2 Create the output matrix by replacing each input pixel value with the weighted
average of the two translated values on either side. The result is the following matrix
where the output matrix has one more column than the input matrix:
10
Filters, Transforms, and Enhancements
Adjust the Contrast of Intensity Images
ex_vision_adjust_contrast_intensity
1 Create a new Simulink model, and add to it the blocks shown in the following table.
9 Set the configuration parameters. Open the Configuration dialog box by selecting
Model Configuration Parameters from the Simulation menu. Set the parameters
as follows:
In this example, you used the Contrast Adjustment block to linearly scale the pixel values
in pout.tif between new upper and lower limits. You used the Histogram Equalization
block to transform the values in tire.tif so that the histogram of the output image
approximately matches a uniform histogram. For more information, see the Contrast
Adjustment and Histogram Equalization reference pages.
Adjust the Contrast of Color Images
ex_vision_adjust_contrast_color.mdl
1 Use the following code to read in an indexed RGB image, shadow.tif, and convert it
to an RGB image. The model provided above already includes this code in file >
Model Properties > Model Properties > InitFcn, and executes it prior to
simulation.
[X map] = imread('shadow.tif');
shadow = ind2rgb(X,map);
2 Create a new Simulink model, and add to it the blocks shown in the following table.
• Value = shadow
• Image signal = Separate color signals
5 Use the Color Space Conversion block to separate the luma information from the
color information. Set the block parameters as follows:
Because the range of the L* values is between 0 and 100, you must normalize them
to be between zero and one before you pass them to the Histogram Equalization
block, which expects floating point input in this range.
6 Use the Constant block to define a normalization factor. Set the Constant value
parameter to 100.
7 Use the Divide block to normalize the L* values to be between 0 and 1. Accept the
default parameters.
8 Use the Histogram Equalization block to modify the contrast in the image. This block
enhances the contrast of images by transforming the luma values in the color image
so that the histogram of the output image approximately matches a specified
histogram. Accept the default parameters.
9 Use the Product block to scale the values back to be between the 0 to 100 range.
Accept the default parameters.
10 Use the Color Space Conversion1 block to convert the values back to the sR'G'B'
color space. Set the block parameters as follows:
13 Set the configuration parameters. Open the Configuration dialog box by selecting
Model Configuration Parameters from the Simulation menu. Set the parameters
as follows:
As shown in the following figure, the model displays the original image in the Video
Viewer1 window.
As the next figure shows, the model displays the enhanced contrast image in the
Video Viewer window.
In this example, you used the Histogram Equalization block to transform the values in a
color image so that the histogram of the output image approximately matches a uniform
histogram. For more information, see the Histogram Equalization reference page.
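A rough command-line analogue of this model, not the model itself, uses rgb2lab, histeq, and lab2rgb from Image Processing Toolbox:
[X, map] = imread('shadow.tif');
shadow   = ind2rgb(X, map);
lab      = rgb2lab(shadow);              % separate luminance (L*) from the color information
L        = lab(:,:,1) / 100;             % normalize L* from [0, 100] to [0, 1]
lab(:,:,1) = histeq(L) * 100;            % equalize the luminance, then scale back
enhanced = lab2rgb(lab);
imshowpair(shadow, enhanced, 'montage')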
Remove Salt and Pepper Noise from Images
ex_vision_remove_noise
1 Define an intensity image in the MATLAB workspace and add noise to it by typing the
following at the MATLAB command prompt:
I= double(imread('circles.png'));
I= imnoise(I,'salt & pepper',0.02);
The model provided with this example already includes this code in file>Model
Properties>Model Properties>InitFcn, and executes it prior to simulation.
2 To view the image this matrix represents, at the MATLAB command prompt, type
imshow(I)
The intensity image contains noise that you want your model to eliminate.
3 Create a Simulink model, and add the blocks shown in the following table.
The Median Filter block replaces the central value of the 3-by-3 neighborhood with
the median value of the neighborhood. This process removes the noise in the image.
6 Use the Video Viewer blocks to display the original noisy image, and the modified
image. Images are represented by 8-bit unsigned integers. Therefore, a value of 0
corresponds to black and a value of 255 corresponds to white. Accept the default
parameters.
7 Connect the blocks as shown in the following figure.
8 Set the configuration parameters. Open the Configuration dialog box by selecting
Model Configuration Parameters from the Simulation menu. Set the parameters
as follows:
You have used the Median Filter block to remove noise from your image. For more
information about this block, see the Median Filter block reference page in the Computer
Vision Toolbox Reference.
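The same result can be sketched at the command line with medfilt2:
I = double(imread('circles.png'));
I = imnoise(I, 'salt & pepper', 0.02);
J = medfilt2(I, [3 3]);              % 3-by-3 median filtering, as in the Median Filter block
imshowpair(I, J, 'montage')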
Sharpen an Image
To sharpen a color image, you need to make the luma intensity transitions more acute,
while preserving the color information of the image. To do this, you convert an R'G'B'
image into the Y'CbCr color space and apply a highpass filter to the luma portion of the
image only. Then, you transform the image back to the R'G'B' color space to view the
results. To blur an image, you apply a lowpass filter to the luma portion of the image. This
example shows how to use the 2-D FIR Filter block to sharpen an image. The prime
notation indicates that the signals are gamma corrected.
ex_vision_sharpen_image
1 Define an R'G'B' image in the MATLAB workspace. To read in an R'G'B' image from a
PNG file and cast it to the double-precision data type, at the MATLAB command
prompt, type
I= im2double(imread('peppers.png'));
The model provided with this example already includes this code in file>Model
Properties>Model Properties>InitFcn, and executes it prior to simulation.
2 To view the image this array represents, type this command at the MATLAB command
prompt:
imshow(I)
Now that you have defined your image, you can create your model.
3 Create a new Simulink model, and add to it the blocks shown in the following table.
The block outputs the R', G', and B' planes of the I array at the output ports.
5 The first Color Space Conversion block converts color information from the R'G'B'
color space to the Y'CbCr color space. Set the Image signal parameter to Separate
color signals
6 Use the 2-D FIR Filter block to filter the luma portion of the image. Set the block
parameters as follows:
• Coefficients = fspecial('unsharp')
• Output size = Same as input port I
• Padding options = Symmetric
• Filtering based on = Correlation
10 Set the configuration parameters. Open the Configuration dialog box by selecting
Model Configuration Parameters from the Simulation menu. Set the parameters
as follows:
A sharper version of the original image appears in the Video Viewer window.
To blur the image, double-click the 2-D FIR Filter block. Set Coefficients parameter
to fspecial('gaussian',[15 15],7) and then click OK. The
fspecial('gaussian',[15 15],7) command creates two-dimensional Gaussian
lowpass filter coefficients. This lowpass filter blurs the image by removing the high
frequency noise in it.
In this example, you used the Color Space Conversion and 2-D FIR Filter blocks to
sharpen an image. For more information, see the Color Space Conversion and 2-D FIR
Filter, and fspecial reference pages.
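A command-line sketch of the same idea, filtering only the luma channel (this is an analogue, not the model itself):
I   = im2double(imread('peppers.png'));
ycc = rgb2ycbcr(I);                                              % separate luma from chroma
ycc(:,:,1) = imfilter(ycc(:,:,1), fspecial('unsharp'), 'symmetric');
J   = ycbcr2rgb(ycc);
imshowpair(I, J, 'montage')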
11
Statistics and Morphological Operations
Correct Nonuniform Illumination
ex_vision_correct_uniform
1 Create a new Simulink model, and add to it the blocks shown in the following table.
The strel object creates a circular STREL object with a radius of 15 pixels. When
working with the Opening block, pick a STREL object that fits within the objects you
want to keep. It often takes experimentation to find the neighborhood or STREL
object that best suits your application.
5 Use the Video Viewer1 block to view the background estimated by the Opening block.
Accept the default parameters.
6 Use the first Sum block to subtract the estimated background from the original
image. Set the block parameters as follows:
13 Set the configuration parameters. Open the Configuration dialog box by selecting
Model Configuration Parameters from the Simulation menu. Set the parameters
as follows:
The image without the estimated background appears in the Video Viewer2 window.
The preceding image is too dark. The Constant block provides an offset value that
you used to brighten the image.
The corrected image, which has even lighting, appears in the Video Viewer3 window.
The following image is shown at its true size.
In this section, you have used the Opening block to remove irregular illumination from an
image. For more information about this block, see the Opening reference page. For
related information, see the Top-hat block reference page. For more information about
STREL objects, see the strel object in the Image Processing Toolbox documentation.
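A command-line sketch of the same correction, assuming the rice.png image that ships with Image Processing Toolbox and an illustrative brightness offset:
I = im2double(imread('rice.png'));
background = imopen(I, strel('disk', 15));   % estimate the background, as the Opening block does
J = I - background;                          % subtract the estimated background
J = J + 0.3;                                 % offset to brighten the result (value is illustrative)
imshow(J)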
Count Objects in an Image
If the input to the Relational Operator block is less than 200, its output is 1;
otherwise, its output is 0. You must threshold your intensity image because the Label
block expects binary input. Also, the objects it counts must be white.
6 Use the Opening block to separate the spokes from the rim and from each other at
the center of the wheel. Use the default parameters.
The strel object creates a circular STREL object with a radius of 5 pixels. When
working with the Opening block, pick a STREL object that fits within the objects you
want to keep. It often takes experimentation to find the neighborhood or STREL
object that best suits your application.
7 Use the Video Viewer1 block to view the opened image. Accept the default
parameters.
8 Use the Label block to count the number of spokes in the input image. Set the
Output parameter to Number of labels.
9 The Display block displays the number of spokes in the input image. Use the default
parameters.
10 Connect the block as shown in the following figure.
11 Set the configuration parameters. Open the Configuration dialog box by selecting
Model Configuration Parameters from the Simulation menu. Set the parameters
as follows:
The original image appears in the Video Viewer1 window. To view the image at its
true size, right-click the window and select Set Display To True Size.
The opened image appears in the Video Viewer window. The following image is shown
at its true size.
As you can see in the preceding figure, the spokes are now separate white objects. In
the model, the Display block correctly indicates that there are 24 distinct spokes.
You have used the Opening and Label blocks to count the number of spokes in an image.
For more information about these blocks, see the Opening and Label block reference
pages in the Computer Vision Toolbox Reference. If you want to send the number of
spokes to the MATLAB workspace, use the To Workspace block in Simulink. For more
information about STREL objects, see strel in the Image Processing Toolbox
documentation.
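A command-line sketch of the same pipeline, assuming I is the 8-bit intensity image of the spoked wheel used in this example:
bw = I < 200;                          % threshold so the spokes become white, like the Relational Operator block
bw = imopen(bw, strel('disk', 5));     % separate the spokes from the rim and from each other
[~, numSpokes] = bwlabel(bw)           % count the connected white objects (24 for this image)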
12
Fixed-Point Design
Fixed-Point Signal Processing
Note To take full advantage of fixed-point support in System Toolbox software, you must
install Fixed-Point Designer™ software.
Fixed-Point Features
Many of the blocks in this product have fixed-point support, so you can design signal
processing systems that use fixed-point arithmetic. Fixed-point support in DSP System
Toolbox software includes
The large dynamic range of floating-point hardware reduces the risk of overflow and quantization
errors and reduces the need for scaling. In contrast, the smaller dynamic range of fixed-point
hardware that allows for low-power, inexpensive units brings the possibility of these problems.
Therefore, fixed-point development must minimize the negative effects of these factors, while
exploiting the benefits of fixed-point hardware: cost- and size-effective units, less power and
memory usage, and fast real-time processing.
This software allows you to easily run multiple simulations with different word length,
scaling, overflow handling, and rounding method choices to see the consequences of
various fixed-point designs before committing to hardware. The traditional risks of fixed-
point development, such as quantization errors and overflow, can be simulated and
mitigated in software before going to hardware.
Fixed-point C code generation with System Toolbox software and Simulink Coder code
generation software produces code ready for execution on a fixed-point processor. All the
choices you make in simulation in terms of scaling, overflow handling, and rounding
methods are automatically optimized in the generated code, without necessitating time-
consuming and costly hand-optimized code.
Fixed-Point Concepts and Terminology
Note The Glossary (DSP System Toolbox) defines much of the vocabulary used in these
sections. For more information on these subjects, see “Fixed-Point Designer”.
Binary numbers are represented as either fixed-point or floating-point data types. In this
section, we discuss many terms and concepts relating to fixed-point numbers, data types,
and mathematics.
A fixed-point data type is characterized by the word length in bits, the position of the
binary point, and whether it is signed or unsigned. The position of the binary point is the
means by which fixed-point values are scaled and interpreted.
Fixed-point data types can be either signed or unsigned. Signed binary fixed-point
numbers are typically represented in one of three ways:
• Sign/magnitude
• One's complement
• Two's complement
Two's complement is the most common representation of signed fixed-point numbers and
is used by System Toolbox software. See “Two's Complement” on page 12-10 for more
information.
Scaling
Fixed-point numbers can be encoded according to the scheme
real-world value = (slope × integer) + bias
The integer is sometimes called the stored integer. This is the raw binary number, in
which the binary point is assumed to be at the far right of the word. In System Toolboxes,
the negative of the exponent is often referred to as the fraction length.
The slope and bias together represent the scaling of the fixed-point number. In a number
with zero bias, only the slope affects the scaling. A fixed-point number that is only scaled
by binary point position is equivalent to a number in the Fixed-Point Designer [Slope Bias]
representation that has a bias equal to zero and a slope adjustment equal to one. This is
referred to as binary point-only scaling or power-of-two scaling:
real-world value = 2^exponent × integer
or
real-world value = 2^(−fraction length) × integer
In System Toolbox software, you can define a fixed-point data type and scaling for the
output or the parameters of many blocks by specifying the word length and fraction
length of the quantity. The word length and fraction length define the whole of the data
type and scaling information for binary-point only signals.
All System Toolbox blocks that support fixed-point data types support signals with binary-
point only scaling. Many fixed-point blocks that do not perform arithmetic operations but
merely rearrange data, such as Delay and Matrix Transpose, also support signals with
[Slope Bias] scaling.
Range
The range is the span of numbers that a fixed-point data type and scaling can represent.
The range of representable numbers for a two's complement fixed-point number of word
length wl, scaling S, and bias B is from S × (−2^(wl−1)) + B to S × (2^(wl−1) − 1) + B.
For both signed and unsigned fixed-point numbers of any data type, the number of
different bit patterns is 2^wl.
Overflow Handling
Because a fixed-point data type represents numbers within a finite range, overflows can
occur if the result of an operation is larger or smaller than the numbers in that range.
System Toolbox software does not allow you to add guard bits to a data type on-the-fly in
order to avoid overflows. Any guard bits must be allocated upon model initialization.
However, the software does allow you to either saturate or wrap overflows. Saturation
represents positive overflows as the largest positive number in the range being used, and
negative overflows as the largest negative number in the range being used. Wrapping
uses modulo arithmetic to cast an overflow back into the representable range of the data
type. See “Modulo Arithmetic” on page 12-9 for more information.
Precision
The precision of a fixed-point number is the difference between successive values
representable by its data type and scaling, which is equal to the value of its least significant
bit. For example, a fixed-point representation with four bits to the right of the binary point
has a precision of 2^−4 or 0.0625, which is the value of its least significant bit. Any number
within the range of this data type and scaling can be represented to within (2^−4)/2 or
0.03125, which is half the precision. This is an example of representing a number with
finite precision.
Rounding Modes
When you represent numbers with finite precision, not every number in the available
range can be represented exactly. If a number cannot be represented exactly by the
specified data type and scaling, it is rounded to a representable number. Although
precision is always lost in the rounding operation, the cost of the operation and the
amount of bias that is introduced depends on the rounding mode itself. To provide you
with greater flexibility in the trade-off between cost and bias, DSP System Toolbox
software currently supports the following rounding modes:
• Ceiling rounds the result of a calculation to the closest representable number in the
direction of positive infinity.
• Convergent rounds the result of a calculation to the closest representable number. In
the case of a tie, Convergent rounds to the nearest even number. This is the least
biased rounding mode provided by the toolbox.
• Floor, which is equivalent to truncation, rounds the result of a calculation to the
closest representable number in the direction of negative infinity.
• Nearest rounds the result of a calculation to the closest representable number. In the
case of a tie, Nearest rounds to the closest representable number in the direction of
positive infinity.
• Round rounds the result of a calculation to the closest representable number. In the
case of a tie, Round rounds positive numbers to the closest representable number in
the direction of positive infinity, and rounds negative numbers to the closest
representable number in the direction of negative infinity.
• Simplest rounds the result of a calculation using the rounding mode (Floor or
Zero) that adds the least amount of extra rounding code to your generated code. For
more information, see “Rounding Mode: Simplest” (Fixed-Point Designer).
• Zero rounds the result of a calculation to the closest representable number in the
direction of zero.
To learn more about each of these rounding modes, see “Rounding” (Fixed-Point
Designer).
For a direct comparison of the rounding modes, see “Choosing a Rounding Method”
(Fixed-Point Designer).
Arithmetic Operations
In this section...
“Modulo Arithmetic” on page 12-9
“Two's Complement” on page 12-10
“Addition and Subtraction” on page 12-11
“Multiplication” on page 12-12
“Casts” on page 12-14
Note These sections will help you understand what data type and scaling choices result
in overflows or a loss of precision.
Modulo Arithmetic
Binary math is based on modulo arithmetic. Modulo arithmetic uses only a finite set of
numbers, wrapping the results of any calculations that fall outside the given set back into
the set.
For example, the common everyday clock uses modulo 12 arithmetic. Numbers in this
system can only be 1 through 12. Therefore, in the “clock” system, 9 plus 9 equals 6. This
can be more easily visualized as a number circle:
Similarly, binary math can only use the numbers 0 and 1, and any arithmetic results that
fall outside this range are wrapped “around the circle” to either 0 or 1.
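You can reproduce the clock example with the mod function:
mod(9 + 9, 12)    % returns 6, matching the clock arithmetic above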
Two's Complement
Two's complement is a way to interpret a binary number. In two's complement, positive
numbers always start with a 0 and negative numbers always start with a 1. If the leading
bit of a two's complement number is 0, the value is obtained by calculating the standard
binary value of the number. If the leading bit of a two's complement number is 1, the
value is obtained by assuming that the leftmost bit is negative, and then calculating the
binary value of the number. For example,
01 = (0 + 2^0) = 1
11 = ((−2^1) + (2^0)) = (−2 + 1) = −1
For example, consider taking the negative of 11010 (−6). First, take the one's complement
of the number, or flip the bits:
11010 → 00101
Next, add a 1:
00101 + 1 = 00110 (6)
Addition and Subtraction
For example, consider the addition of 010010.1 (18.5) with 0110.110 (6.75):
010010.1 (18.5)
+ 0110.110 (6.75)
= 011001.010 (25.25)
Fixed-point subtraction is equivalent to adding while using the two's complement value
for any negative values. In subtraction, the addends must be sign extended to match each
other's length. For example, consider subtracting 0110.110 (6.75) from 010010.1 (18.5):
010010.100 (18.5)
− 000110.110 (6.75)
= 001011.110 (11.75)
Most fixed-point DSP System Toolbox blocks that perform addition cast the adder inputs
to an accumulator data type before performing the addition. Therefore, no further shifting
is necessary during the addition to line up the binary points. See “Casts” on page 12-14
for more information.
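If you have Fixed-Point Designer software, you can reproduce the 18.5 + 6.75 sum with fi objects. This sketch assumes the default fimath, which computes full-precision sums.
a = fi(18.5, 1, 7, 1);      % stored as 010010.1
b = fi(6.75, 1, 7, 3);      % stored as 0110.110
s = a + b;                  % full-precision sum: 10-bit word, 3 fractional bits
bin(s)                      % returns '0011001010', that is, 011001.010 = 25.25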
Multiplication
The multiplication of two's complement fixed-point numbers is directly analogous to
regular decimal multiplication, with the exception that the intermediate results must be
sign extended so that their left sides align before you add them together.
For example, consider the multiplication of 10.11 (-1.25) with 011 (3):
The following diagrams show the data types used for fixed-point multiplication in the
System Toolbox software. The diagrams illustrate the differences between the data types
used for real-real, complex-real, and complex-complex multiplication. See individual
reference pages to determine whether a particular block accepts complex fixed-point
inputs.
In most cases, you can set the data types used during multiplication in the block mask.
For details, see “Casts” on page 12-14.
Note The following diagrams show the use of fixed-point data types in multiplication in
System Toolbox software. They do not represent actual subsystems used by the software
to perform multiplication.
Real-Real Multiplication
The following diagram shows the data types used in the multiplication of two real
numbers in System Toolbox software. The software returns the output of this operation in
the product output data type, as the next figure shows.
Real-Complex Multiplication
The following diagram shows the data types used in the multiplication of a real and a
complex fixed-point number in System Toolbox software. Real-complex and complex-real
multiplication are equivalent. The software returns the output of this operation in the
product output data type, as the next figure shows.
Complex-Complex Multiplication
The following diagram shows the multiplication of two complex fixed-point numbers in
System Toolbox software. Note that the software returns the output of this operation in
the accumulator output data type, as the next figure shows.
System Toolbox blocks cast to the accumulator data type before performing addition or
subtraction operations. In the preceding diagram, this is equivalent to the C code
acc=ac;    /* real part: acc = ac - bd */
acc-=bd;
acc=ad;    /* imaginary part: acc = ad + bc */
acc+=bc;
Casts
Many fixed-point System Toolbox blocks that perform arithmetic operations allow you to
specify the accumulator, intermediate product, and product output data types, as
applicable, as well as the output data type of the block. This section gives an overview of
the casts to these data types, so that you can tell if the data types you select will invoke
sign extension, padding with zeros, rounding, and/or overflow.
For most fixed-point System Toolbox blocks that perform addition or subtraction, the
operands are first cast to an accumulator data type. Most of the time, you can specify the
accumulator data type on the block mask. For details, see the description for
Accumulator data type parameter in “Specify Fixed-Point Attributes for Blocks” (DSP
System Toolbox). Since the addends are both cast to the same accumulator data type
before they are added together, no extra shift is necessary to insure that their binary
points align. The result of the addition remains in the accumulator data type, with the
possibility of overflow.
For System Toolbox blocks that perform multiplication, the output of the multiplier is
placed into a product output data type. Blocks that then feed the product output back into
the multiplier might first cast it to an intermediate product data type. Most of the time,
you can specify these data types on the block mask. For details, see the description for
Intermediate Product and Product Output data type parameters in “Specify Fixed-
Point Attributes for Blocks” (DSP System Toolbox).
Many fixed-point System Toolbox blocks allow you to specify the data type and scaling of
the block output on the mask. Remember that the software does not allow mixed types on
the input and output ports of its blocks. Therefore, if you would like to specify a fixed-
point output data type and scaling for a System Toolbox block that supports fixed-point
data types, you must feed the input port of that block with a fixed-point signal. The final
cast made by a fixed-point System Toolbox block is to the output data type of the block.
Note that although you cannot mix fixed-point and floating-point signals on the input and
output ports of blocks, you can have fixed-point signals with different word and fraction
lengths on the ports of blocks that support fixed-point signals.
Casting Examples
It is important to keep in mind the ramifications of each cast when selecting these
intermediate data types, as well as any other intermediate fixed-point data types that are
allowed by a particular block. Depending upon the data types you select, overflow and/or
rounding might occur. The following two examples demonstrate cases where overflow and
rounding can occur.
Cast from a Shorter Data Type to a Longer Data Type
Consider the cast of a nonzero number, represented by a four-bit data type with two
fractional bits, to an eight-bit data type with seven fractional bits:
As the diagram shows, the source bits are shifted up so that the binary point matches the
destination binary point position. The highest source bit does not fit, so overflow might
occur and the result can saturate or wrap. The empty bits at the low end of the
destination data type are padded with either 0's or 1's:
• If overflow does not occur, the empty bits are padded with 0's.
• If wrapping occurs, the empty bits are padded with 0's.
• If saturation occurs, the empty bits are padded with 1's when the result saturates to its
most positive value, and with 0's when it saturates to its most negative value.
You can see that even with a cast from a shorter data type to a longer data type, overflow
might still occur. This can happen when the integer length of the source data type (in this
case two) is longer than the integer length of the destination data type (in this case one).
Similarly, rounding might be necessary even when casting from a shorter data type to a
longer data type, if the destination data type and scaling has fewer fractional bits than the
source.
Cast from a Longer Data Type to a Shorter Data Type
Consider the cast of a nonzero number, represented by an eight-bit data type with seven
fractional bits, to a four-bit data type with two fractional bits:
As the diagram shows, the source bits are shifted down so that the binary point matches
the destination binary point position. There is no value for the highest bit from the source,
so the result is sign extended to fill the integer portion of the destination data type. The
bottom five bits of the source do not fit into the fraction length of the destination.
Therefore, precision can be lost as the result is rounded.
In this case, even though the cast is from a longer data type to a shorter data type, all the
integer bits are maintained. Conversely, full precision can be maintained even if you cast
to a shorter data type, as long as the fraction length of the destination data type is the
same length or longer than the fraction length of the source data type. In that case,
however, bits are lost from the high end of the result and overflow might occur.
The worst case occurs when both the integer length and the fraction length of the
destination data type are shorter than those of the source data type and scaling. In that
case, both overflow and a loss of precision can occur.
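If you have Fixed-Point Designer software, you can observe both effects with fi casts. This sketch assumes the default rounding method (nearest) and overflow action (saturate).
% Shorter to longer: 4-bit word, 2 fractional bits -> 8-bit word, 7 fractional bits.
% Only one integer bit remains, so 1.75 overflows and saturates.
a = fi(1.75, 1, 4, 2);
b = fi(a, 1, 8, 7)          % saturates to 0.9921875

% Longer to shorter: 8-bit word, 7 fractional bits -> 4-bit word, 2 fractional bits.
% The low fractional bits do not fit, so the value is rounded.
c = fi(0.7109375, 1, 8, 7);
d = fi(c, 1, 4, 2)          % rounds to 0.75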
Fixed-Point Support for MATLAB System Objects
The following Computer Vision Toolbox objects support fixed-point data processing.
You change the values of fixed-point properties in the same way as you change any
System object property value. You also use the Fixed-Point Designer numerictype object
to specify the desired data type as fixed point, the signedness, and the word- and fraction-
lengths.
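For example, a signed 16-bit type with 12 fractional bits is created as follows; the property you assign it to depends on the particular System object.
T = numerictype(true, 16, 12);   % signed, 16-bit word length, 12-bit fraction length
% Assign T to the relevant custom data-type property of a System object that
% supports fixed-point processing.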
In the same way as for blocks, the data type properties of many System objects can set
the appropriate word lengths and scalings automatically by using full precision. System
objects assume that the target specified on the Configuration Parameters Hardware
Implementation target is ASIC/FPGA.
You must set the property that activates a dependent property before attempting to
change the dependent property. If you have not set the activating property and you attempt
to change the dependent property, you will get a warning message.
Note System objects do not support fixed-point word lengths greater than 128 bits.
For any System object provided in the Toolbox, the fimath settings for any fimath attached
to a fi input or a fi property are ignored. Outputs from a System object never have an
attached fimath.
Specify Fixed-Point Attributes for Blocks
Note Floating-point inheritance takes precedence over the settings discussed in this
section. When the block has floating-point input, all block data types match the input.
You can find most fixed-point parameters on the Data Types pane of System Toolbox
blocks. The following figure shows a typical Data Types pane.
All System Toolbox blocks with fixed-point capabilities share a set of common parameters,
but each block can have a different subset of these fixed-point parameters. The following
table provides an overview of the most common fixed-point block parameters.
The Data Type Assistant is an interactive graphical tool available on the Data Types
pane of some fixed-point System Toolbox blocks.
To learn more about using the Data Type Assistant to help you specify block data type
parameters, see “Specify Data Types Using Data Type Assistant” (Simulink).
Some fixed-point System Toolbox blocks have Minimum and Maximum parameters on
the Data Types pane. When a fixed-point data type has these parameters, you can use
them to specify appropriate minimum and maximum values for range checking purposes.
To learn how to specify signal ranges and enable signal range checking, see “Signal
Ranges” (Simulink).
Logging
The Fixed-Point Tool logs overflows, saturations, and simulation minimums and
maximums for fixed-point System Toolbox blocks. The Fixed-Point Tool does not log
overflows and saturations when the Data overflow line in the Diagnostics > Data
Integrity pane of the Configuration Parameters dialog box is set to None.
Autoscaling
You can use the Fixed-Point Tool autoscaling feature to set the scaling for System Toolbox
fixed-point data types.
System Toolbox blocks obey the Use local settings, Double, Single, and Off
modes of the Data type override parameter in the Fixed-Point Tool. The
Scaled double mode is also supported for System Toolboxes source and byte-shuffling
blocks, and for some arithmetic blocks such as Difference and Normalization.
Note In the equations in the following sections, WL = word length and FL = fraction
length.
The internal rule for accumulator data types first calculates the ideal, full-precision result.
Where N is the number of addends:
WL_ideal accumulator = WL_input to accumulator + floor(log2(N − 1)) + 1
FL_ideal accumulator = FL_input to accumulator
For example, consider summing all the elements of a vector of length 6 and data type
sfix10_En8. The ideal, full-precision result has a word length of 13 and a fraction length of
8.
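A quick check of these numbers at the command line:
N  = 6;                               % number of addends
WL = 10 + floor(log2(N - 1)) + 1      % ideal accumulator word length: 13
FL = 8                                % ideal accumulator fraction length: 8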
The accumulator can be real or complex. The preceding equations are used for both the
real and imaginary parts of the accumulator. For any calculation, after the full-precision
result is calculated, the final word and fraction lengths set by the internal rule are
affected by your particular hardware. See “The Effect of the Hardware Implementation
Pane on the Internal Rule” on page 12-26 for more information.
The internal rule for product data types first calculates the ideal, full-precision result:
WL_ideal product = WL_input a + WL_input b
FL_ideal product = FL_input a + FL_input b
For example, consider multiplying together the elements of a real vector of length 2 and data
type sfix10_En8. The ideal, full-precision result has a word length of 20 and a fraction length
of 16.
For real-complex multiplication, the ideal word length and fraction length is used for both
the complex and real portion of the result. For complex-complex multiplication, the ideal
word length and fraction length is used for the partial products, and the internal rule for
accumulator data types described above is used for the final sums. For any calculation,
after the full-precision result is calculated, the final word and fraction lengths set by the
internal rule are affected by your particular hardware. See “The Effect of the Hardware
Implementation Pane on the Internal Rule” on page 12-26 for more information.
A few System Toolbox blocks have an Inherit via internal rule choice available
for the block output. The internal rule used in these cases is block-specific, and the
equations are listed in the block reference page.
As with accumulator and product data types, the final output word and fraction lengths
set by the internal rule are affected by your particular hardware, as described in “The
Effect of the Hardware Implementation Pane on the Internal Rule” on page 12-26.
The Effect of the Hardware Implementation Pane on the Internal Rule
The internal rule selects word lengths and fraction lengths that are appropriate for your
hardware. To get the best results using the internal rule, you must specify the type of
hardware you are using on the Hardware Implementation pane of the Configuration
Parameters dialog box. You can open this dialog box from the Simulation menu in your
model.
ASIC/FPGA
On an ASIC/FPGA target, the ideal, full-precision word length and fraction length
calculated by the internal rule are used. If the calculated ideal word length is larger than
the largest allowed word length, you receive an error.
Other targets
For all targets other than ASIC/FPGA, the ideal, full-precision word length calculated by
the internal rule is rounded up to the next available word length of the target. The
calculated ideal fraction length is used, keeping the least-significant bits.
If the calculated ideal word length for a product data type is larger than the largest word
length on the target, you receive an error. If the calculated ideal word length for an
accumulator or output data type is larger than the largest word length on the target, the
largest target word length is used.
The largest word length allowed for Simulink and System Toolbox software on any target
is 128 bits.
The following sections show examples of how the internal rule interacts with the
Hardware Implementation pane to calculate accumulator data types on page 12-28
and product data types on page 12-31.
Accumulator Data Types
In the Difference blocks, the Accumulator parameter is set to Inherit: Inherit via
internal rule, and the Output parameter is set to Inherit: Same as
accumulator. Therefore, you can see the accumulator data type calculated by the
internal rule on the output signal in the model.
In the preceding model, the Device type parameter in the Hardware Implementation
pane of the Configuration Parameters dialog box is set to ASIC/FPGA. Therefore, the
accumulator data type used by the internal rule is the ideal, full-precision result.
Calculate the full-precision word length for each of the Difference blocks in the model:
Calculate the full-precision fraction length, which is the same for each Difference block
in this example:
FL_ideal accumulator = FL_input to accumulator = 4
Now change the Device type parameter in the Hardware Implementation pane of the
Configuration Parameters dialog box to 32–bit Embedded Processor, by changing the
parameters as shown in the following figure.
As you can see in the dialog box, this device has 8-, 16-, and 32-bit word lengths available.
Therefore, the ideal word lengths of 10, 17, and 128 bits calculated by the internal rule
cannot be used. Instead, the internal rule uses the next largest available word length in
each case. You can see this if you rerun the model, as shown in the following figure.
Product Data Types
In the Array-Vector Multiply blocks, the Product Output parameter is set to Inherit:
Inherit via internal rule, and the Output parameter is set to Inherit: Same
as product output. Therefore, you can see the product output data type calculated by
the internal rule on the output signal in the model. The setting of the Accumulator
parameter does not matter because this example uses real values.
For the preceding model, the Device type parameter in the Hardware Implementation
pane of the Configuration Parameters dialog box is set to ASIC/FPGA. Therefore, the
product data type used by the internal rule is the ideal, full-precision result.
Calculate the full-precision word length for each of the Array-Vector Multiply blocks in the
model:
Calculate the full-precision fraction length, which is the same for each Array-Vector
Multiply block in this example:
FL_ideal accumulator = FL_input to accumulator = 4
Now change the Device type parameter in the Hardware Implementation pane of the
Configuration Parameters dialog box to 32–bit Embedded Processor, as shown in the
following figure.
As you can see in the dialog box, this device has 8-, 16-, and 32-bit word lengths available.
Therefore, the ideal word lengths of 12 and 31 bits calculated by the internal rule cannot
be used. Instead, the internal rule uses the next largest available word length in each
case. You can see this if you rerun the model, as shown in the following figure.
This model uses the Cumulative Sum block to sum the input coming from the Fixed-
Point Sources subsystem. The Fixed-Point Sources subsystem outputs two signals
with different data types:
• The Signed source has a word length of 16 bits and a fraction length of 15 bits.
• The Unsigned source has a word length of 16 bits and a fraction length of 16 bits.
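For reference, you can construct these two source data types at the command line and
inspect their representable ranges, which makes it easier to see why sums of these signals
overflow quickly. fixdt ships with Simulink; the range check with fi requires Fixed-Point
Designer.

tSigned   = fixdt(1, 16, 15);   % signed source: 16-bit word, 15-bit fraction
tUnsigned = fixdt(0, 16, 16);   % unsigned source: 16-bit word, 16-bit fraction
range(fi(0, 1, 16, 15))         % representable range [-1, 1 - 2^-15]
range(fi(0, 0, 16, 16))         % representable range [0, 1 - 2^-16]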
2 Run the model to check for overflow. MATLAB displays the following warnings at the
command line:
Warning: Overflow occurred. This originated from
'ex_fixedpoint_tut/Signed Cumulative Sum'.
Warning: Overflow occurred. This originated from
'ex_fixedpoint_tut/Unsigned Cumulative Sum'.
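The logging referred to in the next step can also be turned on from the command line by
setting the model's fixed-point instrumentation parameter. A minimal sketch, using the
model name shown in the warnings above:

mdl = 'ex_fixedpoint_tut';
set_param(mdl, 'MinMaxOverflowLogging', 'MinMaxAndOverflow');  % log simulation min/max and overflows
sim(mdl);                                                      % rerun so the Fixed-Point Tool has results to display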
4 Now that you have turned on logging, rerun the model by clicking the Simulate
button.
5 The results of the simulation appear in a table in the central Contents pane of the
Fixed-Point Tool. Review the following columns:
• Name — Provides the name of each signal in the following format: Subsystem
Name/Block Name: Signal Name.
• SimDT — The simulation data type of each logged signal.
• SpecifiedDT — The data type specified on the block dialog for each signal.
• SimMin — The smallest representable value achieved during simulation for each
logged signal.
• SimMax — The largest representable value achieved during simulation for each
logged signal.
• OverflowWraps — The number of overflows that wrap during simulation.
For more information on each of the columns in this table, see the “Contents Pane”
(Simulink) section of the Simulink fxptdlg function reference page.
You can also see that the SimMin and SimMax values for the Accumulator data
types range from 0 to 0.9997. The logged results indicate that 8,192 overflows
wrapped during simulation in the Accumulator data type of the Signed Cumulative
Sum block. Similarly, the Accumulator data type of the Unsigned Cumulative Sum
block had 16,383 overflows wrap during simulation.
To get more information about each of these data types, highlight them in the
Contents pane, and click the Show details for selected result button.
6 Assume target hardware that supports 32-bit integers, and set the Accumulator
word length in both Cumulative Sum blocks to 32. To do so, perform the following
steps:
5 Change the Word length to 32, and click the Refresh details button in the
Fixed-point details section to see the updated representable range. When you
change the value of the Word length parameter, the Data Type edit box
automatically updates.
6 Click OK on the block dialog box to save your changes and close the window.
7 Set the word length of the Accumulator data type of the Unsigned Cumulative
Sum block to 32 bits. You can do so in one of two ways:
• Type the data type fixdt([],32,0) directly into the Data Type edit box for the
Accumulator data type parameter.
• Perform the same steps you used to set the word length of the Accumulator
data type of the Signed Cumulative Sum block to 32 bits.
7 To verify your changes in word length and check for overflow, rerun your model. To
do so, click the Simulate button in the Fixed-Point Tool.
The Contents pane of the Fixed-Point Tool updates, and you can see that no
overflows occurred in the most recent simulation. However, you can also see that the
SimMin and SimMax values range from 0 to 0. This underflow happens because the
fraction length of the Accumulator data type is too small. The SpecifiedDT cannot
represent the precision of the data values. The following sections discuss how to find
a floating-point benchmark and use the Fixed-Point Tool to propose fraction lengths.
The Data type override feature of the Fixed-Point Tool allows you to override the data
types specified in your model with floating-point types. Running your model in Double
override mode gives you a reference range to help you select appropriate fraction lengths
for your fixed-point data types. To do so, perform the following steps:
1 Open the Fixed-Point Tool and set Data type override to Double.
2 Run your model by clicking the Run simulation and store active results button.
3 Examine the results in the Contents pane of the Fixed-Point Tool. Because you ran
the model in Double override mode, you get an accurate, idealized representation of
the simulation minimums and maximums. These values appear in the SimMin and
SimMax parameters.
4 Now that you have an accurate reference representation of the simulation minimum
and maximum values, you can more easily choose appropriate fraction lengths.
Before making these choices, save your active results to reference so you can use
them as your floating-point benchmark. To do so, select Results > Move Active
Results To Reference from the Fixed-Point Tool menu. The status displayed in the
Run column changes from Active to Reference for all signals in your model.
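If you prefer to script this benchmark run, the Data type override setting is also available
as a model parameter. The following is a hedged command-line equivalent of steps 1 and 2
above, again assuming the ex_fixedpoint_tut model:

mdl = 'ex_fixedpoint_tut';
set_param(mdl, 'DataTypeOverride', 'Double');            % override fixed-point types with doubles
sim(mdl);                                                % collect idealized SimMin and SimMax values
set_param(mdl, 'DataTypeOverride', 'UseLocalSettings');  % restore the fixed-point data types afterward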
Now that you have your Double override results saved as a floating-point reference, you
are ready to propose fraction lengths.
1 To propose fraction lengths for your data types, you must have a set of Active
results available in the Fixed-Point Tool. To produce an active set of results, simply
rerun your model. The tool now displays both the Active results and the Reference
results for each signal.
2 Select the Use simulation min/max if design min/max is not available check
box. You did not specify any design minimums or maximums for the data types in this
model. Thus, the tool uses the logged information to compute and propose fraction
lengths. For information on specifying design minimums and maximums, see “Signal
Ranges” (Simulink).
3 Click the Propose fraction lengths button. The tool populates the proposed
data types in the ProposedDT column of the Contents pane. The corresponding
proposed minimums and maximums are displayed in the ProposedMin and
ProposedMax columns.
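As a rough sanity check on such a proposal (a back-of-the-envelope estimate, not the
tool's exact algorithm): if the logged maximum of a signed 32-bit accumulator stays below
2^14 = 16384, its integer part needs at most 14 bits plus a sign bit, leaving 32 - 15 = 17
bits for the fraction, which is consistent with the fraction length proposed for the
Cumulative Sum accumulators in the next procedure.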
Before accepting the fraction lengths proposed by the Fixed-Point Tool, it is important to
look at the details of that data type. Doing so allows you to see how much of your data the
suggested data type can represent. To examine the suggested data types and accept the
proposed scaling, perform the following steps:
1 In the Contents pane of the Fixed-Point Tool, you can see the proposed fraction
lengths for the data types in your model.
• The proposed fraction length for the Accumulator data type of both the Signed
and Unsigned Cumulative Sum blocks is 17 bits.
• To get more details about the proposed scaling for a particular data type,
highlight the data type in the Contents pane of the Fixed-Point Tool.
• Open the Autoscale Information window for the highlighted data type.
3 Click the Apply accepted fraction lengths button. The tool updates the specified data
types on the block dialog boxes and the SpecifiedDT column in the Contents pane.
4 To verify the newly accepted scaling, set the Data type override parameter back to
Use local settings, and run the model. Looking at the Contents pane of the Fixed-Point
Tool, you can see the following details:
• The SimMin and SimMax values of the Active run match the SimMin and
SimMax values from the floating-point Reference run.
• There are no longer any overflows.
• The SimDT does not match the SpecifiedDT for the Accumulator data type of
either Cumulative Sum block. This difference occurs because the Cumulative Sum
block always inherits its Signedness from the input signal and only allows you to
specify a Signedness of Auto. Therefore, the SpecifiedDT for both Accumulator
data types is fixdt([],32,17). However, because the Signed Cumulative Sum
block has a signed input signal, the SimDT for the Accumulator parameter of that
block is also signed (fixdt(1,32,17)). Similarly, the SimDT for the
Accumulator parameter of the Unsigned Cumulative Sum block inherits its
Signedness from its input signal and thus is unsigned (fixdt(0,32,17)).
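You can see this Signedness behavior by constructing the three data types at the
command line:

tAuto     = fixdt([], 32, 17);   % Signedness left for the block to inherit from its input
tSigned   = fixdt(1, 32, 17);    % what the Signed Cumulative Sum block simulates with
tUnsigned = fixdt(0, 32, 17);    % what the Unsigned Cumulative Sum block simulates with
tAuto.Signedness                 % returns 'Auto'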
13
Code Generation
• Write your Computer Vision Toolbox function or application as you would normally,
using functions from the Computer Vision Toolbox.
• Add the %#codegen compiler directive to your MATLAB code.
• Open the MATLAB Coder app, create a project, and add your file to the project. Once
in MATLAB Coder, you can check the readiness of your code for code generation. For
example, your code may contain functions that are not enabled for code generation.
Make any modifications required for code generation.
• Generate code by clicking Generate in the Generate Code dialog box. You can choose
to build a MEX file, a C/C++ static library, a C/C++ dynamic library, or a C/C++
executable.
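The same workflow can also be driven from the command line with the codegen function
instead of the MATLAB Coder app. The following is a minimal sketch; the function name,
file name, and test image are illustrative assumptions, and it requires MATLAB Coder.

% --- detectSURFLocations.m ---
function pts = detectSURFLocations(I) %#codegen
points = detectSURFFeatures(I);   % detect SURF interest points in a grayscale image
pts = points.Location;            % return the point locations as an M-by-2 single matrix
end

% At the MATLAB command line:
I = imread('cameraman.tif');                    % example grayscale image
codegen detectSURFLocations -args {I} -report   % generate and build a MEX file
ptsMex = detectSURFLocations_mex(I);            % call the generated MEX function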
Even if you addressed all readiness issues identified by MATLAB Coder, you might still
encounter build issues. The readiness check only looks at function dependencies.
When you try to generate code, MATLAB Coder might discover coding patterns that
are not supported for code generation. View the error report and modify your MATLAB
code until you get a successful build.
For more information about code generation, see the MATLAB Coder documentation and
the “Introduction to Code Generation with Feature Matching and Registration” example.
Note To generate code from MATLAB code that contains Computer Vision Toolbox
functionality, you must have the MATLAB Coder software.
• For some Computer Vision Toolbox functions, code generation includes creation of a
shared library.
• Refer to the “Code Generation Support, Usage Notes, and Limitations” on page 13-3
for supported functionality, usages, and limitations.
Code Generation Support, Usage Notes, and Limitations
To generate code from MATLAB code that contains Computer Vision Toolbox functions,
classes, or System objects, you must have the MATLAB Coder software.
An asterisk (*) indicates that the reference page has usage notes and limitations for
C/C++ code generation.
Name
Feature Detection, Extraction, and Matching
BRISKPoints*
cornerPoints*
detectBRISKFeatures*
detectFASTFeatures*
detectHarrisFeatures*
detectMinEigenFeatures*
detectMSERFeatures*
detectORBFeatures
detectSURFFeatures*
extractFeatures
extractHOGFeatures*
extractLBPFeatures*
matchFeatures*
MSERRegions*
ORBPoints*
SURFPoints*
Point Cloud Processing
findNearestNeighbors*
findNeighborsInRadius*
findPointsInROI*
pcdenoise*
pcdownsample*
pcfitcylinder*
pcfitplane*
pcfitsphere*
pcmerge*
pcnormals*
pcregistercpd*
pcsegdist*
pctransform*
pointCloud*
removeInvalidPoints*
segmentLidarData*
select*
Image Registration and Geometric Transformations
estimateGeometricTransform*
Object Detection and Recognition
acfObjectDetector*
detect of acfObjectDetector*
ocr*
ocrText*
vision.PeopleDetector*
vision.CascadeObjectDetector*
Tracking and Motion Estimation
assignDetectionsToTracks
estimateFlow
opticalFlow
opticalFlowFarneback
opticalFlowHS
opticalFlowLKDoG
opticalFlowLK
reset
vision.ForegroundDetector*
vision.HistogramBasedTracker*
vision.KalmanFilter*
vision.PointTracker*
vision.TemplateMatcher*
Camera Calibration and Stereo Vision
bboxOverlapRatio
bbox2points
disparity*
disparityBM
disparitySGM
cameraPoseToExtrinsics
cameraMatrix*
cameraPose*
cameraParameters*
detectCheckerboardPoints*
epipolarLine
estimateEssentialMatrix*
estimateFundamentalMatrix*
estimateUncalibratedRectification
estimateWorldCameraPose*
extrinsics*
extrinsicsToCameraPose
generateCheckerboardPoints
isEpipoleInImage
lineToBorderPoints
reconstructScene*
rectifyStereoImages*
relativeCameraPose*
rotationMatrixToVector
rotationVectorToMatrix
selectStrongestBbox
stereoAnaglyph
stereoParameters*
triangulate*
undistortImage*
Statistics
vision.BlobAnalysis*
vision.LocalMaximaFinder*
vision.Maximum*
vision.Mean*
vision.Median*
vision.Minimum*
vision.StandardDeviation*
vision.Variance*
Filters, Transforms, and Enhancements
integralImage
vision.Deinterlacer*
Video Loading, Saving, and Streaming
vision.DeployableVideoPlayer
vision.VideoFileReader*
vision.VideoFileWriter*
Color Space Formatting and Conversions
vision.ChromaResampler*
vision.GammaCorrector*
Graphics
insertMarker*
insertShape*
insertObjectAnnotation*
insertText*
vision.AlphaBlender*
There are a few Computer Vision Toolbox blocks that generate code with limited
portability. These blocks use precompiled shared libraries, such as DLLs, to support I/O
for specific types of devices and file formats. To find out which blocks use precompiled
shared libraries, open the Computer Vision Toolbox Block Support Table. You can identify
blocks that use precompiled shared libraries by checking the footnotes listed in the Code
Generation Support column of the table. All blocks that use shared libraries have the
following footnote:
Simulink Coder provides functions to help you set up and manage the build information
for your models. For example, one of the Build Information functions that Simulink Coder
provides is getNonBuildFiles. This function allows you to identify the shared libraries
required by blocks in your model. If your model contains any blocks that use precompiled
shared libraries, you can install those libraries on the target system. The folder that you
install the shared libraries in must be on the system path. The target system does not
need to have MATLAB installed, but it does need to be supported by MATLAB.
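For example, the following hedged sketch loads the build information saved by an earlier
Simulink Coder build and lists the nonbuild files, which include any precompiled shared
libraries. The generated-code folder name depends on your model name and system target
file, so treat it as a placeholder.

load(fullfile('mymodel_grt_rtw', 'buildInfo.mat'));   % loads a variable named buildInfo
nonBuildFiles = getNonBuildFiles(buildInfo);          % shared libraries and other nonbuild files
disp(nonBuildFiles(:))                                % list them, one per line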
Accelerating Simulink Models
To change between Rapid Accelerator, Accelerator, and Normal mode, use the
drop-down list at the top of the model window.
For more information on the accelerator modes in Simulink, see “Choosing a Simulation
Mode” (Simulink).
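You can also select the simulation mode programmatically, which is convenient in test
scripts. A minimal sketch, assuming a model named mymodel:

set_param('mymodel', 'SimulationMode', 'rapid');        % Rapid Accelerator mode
set_param('mymodel', 'SimulationMode', 'accelerator');  % Accelerator mode
set_param('mymodel', 'SimulationMode', 'normal');       % Normal mode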
The generated binary uses prebuilt OpenCV libraries that ship with the Computer Vision
Toolbox product. Your compiler must be compatible with the one used to build the
libraries. The following compilers are used to build the OpenCV libraries for MATLAB
host:
Limitations
Computer Vision Toolbox functions that use the OpenCV library do not support target
code generation from Simulink.