Range Image Segmentation For 3-D Object Recognition
May 1988
Recommended Citation
Alok Gupta, "Range Image Segmentation for 3-D Object Recognition", May 1988.
University of Pennsylvania Department of Computer and Information Science Technical Report No. MS-CIS-88-32.
Abstract
Three dimensional scene analysis in an unconstrained and uncontrolled environment is the ultimate goal
of computer vision. Explicit depth information about the scene is of tremendous help in segmentation and
recognition of objects. Range image interpretation with a view to obtaining low-level features to guide
mid-level and high-level segmentation and recognition processes is described. No assumptions about the
scene are made, and the algorithms are applicable to any general single-viewpoint range image. Low-level
features like step edges and surface characteristics are extracted from the images, and segmentation is
performed based on individual features as well as combinations of features. A high-level recognition
process based on superquadric fitting is described to demonstrate the usefulness of initial segmentation
based on edges. A classification algorithm based on surface curvatures is used to obtain initial
segmentation of the scene. Objects segmented using edge information are then classified using surface
curvatures. Various applications of surface curvatures in mid and high level recognition processes are
discussed. These include surface reconstruction, segmentation into convex patches and detection of
smooth edges. Algorithms are run on real range images and results are discussed in detail.
GRASP LAB 141
May 1988
Acknowledgements: The work reported herein was supported in part by NSF grant
DCR-8410771, Air Force grant AFOSR F49620-85-K-0018, Army grant DAAG-29-84-K-0061, NSF-CER
grant DCR82-19196 A02, DARPA/ONR N0014-85-K-0807, and U.S. Postal Service contract
104230-87-M-0195.
UNIVERSITY OF PENNSYLVANIA
THE MOORE SCHOOL OF ELECTRICAL ENGINEERING
SCHOOL OF ENGINEERING AND APPLIED SCIENCE
Alok Gupta
Philadelphia, Pennsylvania
May 1988
A thesis presented to the Faculty of Engineering and Applied Science of the University of
Pennsylvania in partial fulfillment of the requirements for the degree of Master of Science
in Engineering for graduate work in Computer and Information Science.
Ruzena Bajcsy
(Advisor)
Richard Paul
(Graduate Group Chair)
Acknowledgements
I would like to thank my advisor Dr. Ruzena Bajcsy for the guidance and encouragement. I
am grateful to Dr. Kwangyoen Wohn for motivating me to work in the field of range image
interpretation. Prof. Wohn also guided me in the initial stages of this thesis and furnished
programs for least squares fitting. Thanks to Franc Solina for many ideas and suggestions
and for the superquadrics software. Finally, thanks to Gus Tsikos for help with the range image
scanner.
The support of the following contract and grants is gratefully acknowledged: NSF grant
DCR-8410771, Air Force grant AFOSR F49620-85-K-0018, Army grant DAAG-29-84-K-0061, NSF-CER
grant DCR82-19196 A02, DARPA/ONR N0014-85-K-0807, U.S. Postal Service contract 104230-
87-M-0195.
Contents
Abstract
Acknowledgements
1 Introduction
5 Discussion
Introduction
Figure 1: Processing stages: preprocessing (1. scaling, 2. smoothing); edge detection by Laplacian of Gaussian; computation of second-order derivatives by least squares fitting; parameters.
Range images obtained by different scanners differ in the format of the output. In order to
apply low-level techniques to the image it is necessary that the image points be quantized
in Z-depth format with an equal resolution factor in the X and Y directions. Once converted into
Z-depth format the image is smoothed. This chapter discusses some practical aspects of real
range image processing which are important if any useful results are desired.
The test images used in this work were acquired by structured-lighting triangulation-based
scanners. Figure 2 shows the ranging geometry of a typical range sensor. The trigonometry
of the sensor will not be described here.
Either the laser stripe moves and scans the scene or the workspace moves under a vertical
laser stripe. If the viewpoint of the sensing camera is not the same as that of the laser, then shadows
(regions with missing data) are obtained. In order to discriminate between shadows and the
background (the region of known depth on which the object is sitting), the background is assigned a
nonzero depth.
Figure 2: (a) Ranging using structured lighting (laser source, direction of scanning). (b) Z-depth format of range images.
The sampling interval of the scanners depends on the thickness of the laser stripe, the value of the laser
stripe increment, and the resolution of the camera. More often than not the vertical resolution (along
the Y axis) is different from the horizontal resolution (along the X axis); thus the sampled points are
not spaced uniformly in the X and Y directions. Since we apply neighborhood operators during
low-level processing of images, it is necessary to rescale the images uniformly in both
directions. We have rescaled the Z-depth image by fitting a plane on three neighborhood
points. Figure 3 illustrates the difference between unscaled and uniformly scaled images.
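The plane-fit resampling itself is not listed here; as an illustration, the sketch below resamples rows by 1-D linear interpolation so that the Y spacing matches the X spacing. This is a simplification of fitting a plane on three neighborhood points, and the function and parameter names are ours:

```python
def rescale_rows(img, y_res, x_res):
    """Resample a Z-depth image along Y by linear interpolation so the
    pixel spacing along Y equals the spacing along X."""
    h = len(img)
    # number of output rows covering the same physical extent along Y
    new_h = int(round((h - 1) * y_res / x_res)) + 1
    out = []
    for i in range(new_h):
        pos = i * x_res / y_res            # fractional source row
        lo = min(int(pos), h - 2)          # lower neighbouring row
        t = pos - lo                       # interpolation weight
        out.append([(1 - t) * a + t * b for a, b in zip(img[lo], img[lo + 1])])
    return out
```

With `y_res = 2` and `x_res = 1`, an interpolated row is produced between every input row pair, doubling the row count over the same physical extent.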
The depth resolution of a range image is an important parameter in low-level processing. Range
scanners usually have depth resolution good enough for most applications. In fact a res-
olution of 0.01 inch/pixel is too fine and noise-sensitive for surface fitting purposes. The
problem comes in the quantization of z values. If the entire scan depth is quantized within 8 bits
(the most convenient representation), the effective depth resolution is drastically reduced, thereby
Figure 3: Z-depth format images. Left: original resolution. Right: uniformly scaled.
increasing the quantization error. Since surface fitting is very sensitive to quantization
error, we have minimized it by the following two-step procedure:
1. The original depth resolution is preserved by storing the depth value unscaled in 2 bytes.
This allows 64K possible quantization levels. Scaling along the Z axis is done only when
needed.
2. The image is smoothed using a Gaussian operator and the smoothed values are stored in
floating-point buffers so as not to lose any precision.
One way to reduce noise is to perform median filtering of the image. It ensures that
isolated noise is reduced and edges are not smoothed.
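A minimal 3 × 3 median filter illustrating this property (our sketch, not the thesis's code; border pixels are left untouched):

```python
def median_filter_3x3(img):
    """3x3 median filter: removes isolated range-noise spikes while
    leaving step edges unsmoothed."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = sorted(img[y + dy][x + dx]
                            for dy in (-1, 0, 1) for dx in (-1, 0, 1))
            out[y][x] = window[4]  # median of the 9 window values
    return out
```

A single spike surrounded by flat depth is replaced by the local median, while a depth step spanning the whole window survives unchanged.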
Our approach is to study the scale-space behavior of range images. We have used the
Gaussian operator to smooth images. The Gaussian function in two dimensions is given
by:

G(x, y) = (1 / (2πσ²)) exp(−(x² + y²) / (2σ²))

Since it is separable in the X and Y directions, a one-dimensional Gaussian operator is applied
separably to the image. Smoothing is controlled by the size of the operator, which is
determined by σ. The Gaussian operator has some nice properties that make it a unique operator
for our purposes.
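The separable application described above might be sketched as follows; the kernel-radius rule of thumb (radius ≈ 2σ, giving a 5-tap kernel for σ = 1, matching the 5 × 5 window used later) and replicated borders are our assumptions:

```python
import math

def gaussian_kernel_1d(sigma, radius=None):
    """Sampled 1-D Gaussian, normalized to sum to 1."""
    if radius is None:
        radius = max(1, int(2 * sigma))   # sigma = 1 -> 5-tap kernel
    k = [math.exp(-(i * i) / (2.0 * sigma * sigma))
         for i in range(-radius, radius + 1)]
    s = sum(k)
    return [v / s for v in k]

def convolve_rows(img, kernel):
    """Convolve each row with the kernel, replicating border pixels."""
    r = len(kernel) // 2
    out = []
    for row in img:
        n = len(row)
        out.append([
            sum(kernel[j + r] * row[min(max(x + j, 0), n - 1)]
                for j in range(-r, r + 1))
            for x in range(n)
        ])
    return out

def transpose(img):
    return [list(col) for col in zip(*img)]

def gaussian_smooth(img, sigma=1.0):
    """2-D smoothing as two 1-D passes, exploiting separability."""
    k = gaussian_kernel_1d(sigma)
    return transpose(convolve_rows(transpose(convolve_rows(img, k)), k))
```

Because the kernel is normalized, a constant depth region passes through unchanged; only variations are attenuated.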
Yuille and Poggio [39] have proved that the Gaussian low-pass filter is the only filter
with a nice scaling behavior for linear derivative operators like the Laplacian operator. It
also satisfies the following conditions:

2. The filter has no preferred scale. The filter is properly normalized at all scales:

lim_{σ→0} F(x, σ) = δ(x)

4. The position of the center of the filter is independent of the scale of the filter. Otherwise
the zero crossings of a step edge would change their position with change of scale.
We have studied the behavior of increasing sigma value on edge detection, surface char-
acterization and segmentation. As the σ value increases, the window size of the Gaussian operator
increases and details are lost. Figures 4 and 5 show the perspective plots of a range
image before and after smoothing respectively.
Minor surface perturbations are smoothed easily. But the undesirable effect of uniform
application of the Gaussian operator is the smoothing of all types of edges. Step edges (chapter 3)
are smoothed to form roughly convex and concave subparts (chapter 4). This complicates
edge detection, especially detection of smooth edges (concave and convex edges) that are
Figure 4: 3-D perspective plot of original image
further smoothed. Thin objects tend to merge into the underlying objects, making seg-
mentation difficult. As will be discussed in chapter 3, as we go up the scale objects start
merging. We have found a sigma value of 1 (window size 5 × 5) to be best suited for our
experiments. In the surface-based segmentation technique, smoothing alters the local behavior
of the surface, but makes the result more reliable, especially away from the edges. Step
edges show up as adjacent convex and concave regions. Segmentation using these effects
of smoothing is discussed in chapter 4 in detail.
Edge types: step, concave roof, convex roof, convex ramp, concave ramp.
allows boundaries to be read off as zero crossings of the LOG-operated image. We'll discuss
the significance of the LOG operator in view of range images. While step edges pose no
particular problem, smooth edges are difficult to detect by local operations. In range image
segmentation it is of particular interest that a pile of objects be segmented into convex
subparts. This requires detection of the concave edges that delimit the convex subparts.
Mitiche and Aggarwal [28] have presented a probabilistic approach to detecting convex
and concave edges by using domain-specific constraints.
Though 3-D edges are quite useful for object recognition, there are some inherent limita-
tions in edge information that restrict their use to aiding the higher-level recognition
processes along with a host of invariant features. Edge classification depends on the orienta-
tion of the object in 3-D space and is therefore not an invariant feature. Thus edge information
cannot be the only feature used by the recognizer, and it has to be used in conjunction with
other features. However, as will be seen later, edge information is good enough for early
segmentation of range images because the requirement of invariant features does not apply
to the initial segmentation process.
In case intensity information is available, range data can be complemented by reflectance
data to pick up weak 3-D edges like the step edges created by overlapping thin objects. Wong
and Hayrapetian [32] used range information to segment intensity images. Gil, Mitiche
and Aggarwal [27] have described experiments in combining intensity and range edges.
While intensity images are certainly useful in detecting edges in the scene, they need to be
registered in the same way as range images to avoid the correspondence problem. This may
Figure 7: Derivatives of a cross-section of a range image (vertical section of a range image; first derivative; second derivative).
The cross-section of a range image and the profiles of first and second order derivative
are shown in figure 7.
In two dimensions the LOG operator becomes:

∇²G(x, y) = (1 / (πσ⁴)) ((x² + y²) / (2σ²) − 1) exp(−(x² + y²) / (2σ²))
Figure 8: 3-D edges detected in a synthetic range image: upper left, original image; upper right, thresholded step edges; lower left, thresholded convex edges; lower right, thresholded concave edges.
It is clear that the behavior of second-order derivatives is unique at every type of change in
the surface. There are positive spikes at concave ramp edges, negative spikes at convex roof and
ramp edges, and zero crossings at jump edges. However, there are serious practical limitations
in using this response to detect concave and convex edges. The response at these edges is
dependent on the convexity or concavity of the edge, which is roughly the measure of the angle
at which the two surfaces meet. If the angle is too small and the change in depth is gradual,
as in most situations, the response would be below or the same as that due to local surface
changes. See figure 8 for the step, convex and concave responses obtained in a synthetic
range image having planar regions. Even in the synthetic range image the responses deteriorated
as the image was smoothed, and smooth concave and convex edges virtually disappeared.
Thresholding of zero crossings is necessary in the case of range images to avoid local surface
perturbations. Responses due to weak concave and convex edges would then be filtered.
Zero-crossings generated by weak step edges may also lie below the thresholding value. In
range images, the value of a zero-crossing has a direct relationship with the magnitude of the depth
discontinuity. Thus selection of a threshold effectively restricts the minimum detectable
depth. An object with less than the acceptable height would be invisible in the edge image.
As observed in the previous chapter, it is absolutely necessary to smooth an image before
attempting any local operation on it. The LOG operator gives the following image:

O(x, y) = ∇²G(x, y) * I(x, y) = ∇²(G(x, y) * I(x, y))
The degree of smoothing depends on the value of sigma, which controls the size of the
window: the larger the sigma, the greater the smoothing. While this effect is interpreted in intensity
images as blurring and hence a reduction of details, in range images it is seen in terms of
smoothing of the surface at the boundaries in addition to the reduction of details. This results in
all types of boundaries becoming smoothed and can have undesirable effects on boundary
detection and surface-based segmentation. We have observed that with increasing scale
value, range images lose vital boundary information, presenting difficulties in edge-based
segmentation. An empirically determined window size of 5 (sigma value = 1) is chosen for
processing all the images.
The algorithm for edge detection is given below.
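In outline: smooth with the Gaussian, convolve with the LOG, and mark thresholded zero-crossings. A minimal sketch of this pipeline (our illustrative simplification, not the thesis's code: a discrete 4-neighbour Laplacian applied to an already-smoothed image stands in for the full LOG convolution, and zero-crossings are scanned along rows only):

```python
def laplacian(img):
    """Discrete 4-neighbour Laplacian of a (smoothed) depth image."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = (img[y - 1][x] + img[y + 1][x]
                         + img[y][x - 1] + img[y][x + 1]
                         - 4.0 * img[y][x])
    return out

def zero_crossings(lap, thresh):
    """Mark pixels where the response changes sign along a row and the
    swing exceeds `thresh`; the swing magnitude relates directly to the
    depth discontinuity, as noted in the text."""
    h, w = len(lap), len(lap[0])
    edges = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w - 1):
            a, b = lap[y][x], lap[y][x + 1]
            if a * b < 0 and abs(a - b) > thresh:
                edges[y][x] = True
    return edges
```

On a depth step the Laplacian produces an adjacent positive/negative spike pair, so the sign change localizes the jump edge; the threshold suppresses weak responses from local surface perturbations.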
Figures 11(b), 12(b) and other figures show the magnitude of zero-crossings detected in
range images.
It is observed that threshold selection is important in defining the acceptable depth,
which in turn is determined by the amount of detail seen in the filtered image. Also, the
thresholding value is different at different scales: the threshold value varies inversely with
the smoothing parameter (σ). As we go up the scale-space the need for thresholding decreases.
Next we discuss a segmentation technique based on the step edges detected by the LOG
operator.
1. Make assumptions about the type of the objects. For example, assume the topmost
object to be convex. This is a very strong assumption and requires convex internal
boundary information. As noted before, this information is difficult to derive using
second-order derivatives. We will see in the next chapter how surface characterization
techniques can be used to approximate the presence of smooth internal boundaries of
an object.
2. Use a priori knowledge about the scene. But this is against our approach towards a general
and robust segmentation algorithm.
3. Feedback from higher stages of object recognition to eliminate need for a priori knowl-
edge and to relax the strong assumptions made at low-level segmentation stage. Since
higher levels of object recognition are global processes and may have knowledge about
the domain, a closed loop segmentation procedure is bound to perform better than
one having no feedback.
Thus, to achieve a reliable segmentation of the initial scene, we will assume that the top-
most object is delineated by jump boundaries. This may not always be true, as two objects
can join at convex or concave edges, or one object may merge into the next one due to negligible
thickness of the object at the point of contact. This means that local information cannot
give a perfect segmentation in all cases. In such cases we need higher-level processing to
figure out the right segmentation of the scene. The segmentation based on external bound-
ary information will give only an initial estimate of segmentation. This estimate is reliable
to the point that it can distinguish between objects of predetermined depth.
In the context of invariant object recognition it is important to note that step boundaries
may vary with orientation of the object. Thus they are used only to segment the object
and not to recognize the object. We will discuss the results of recognition and classification
of the segmented object using the superquadric technique developed by Franc Solina [23].
The block diagram of the segmentation and classification process is shown in figure 9.
A practical problem with using zero-crossings as step boundaries is that they do not form
closed contours. The boundaries delineating the object may not completely enclose
the object, causing the region growing process to overflow from the object and include the
neighboring object as part of the object. This drawback renders the final segmentation
result very unreliable. Yang and Kak [33] use a priori knowledge about the width of the
Figure 9: Flow chart of the segmentation and recognition process (smoothed range image; all objects segmented by region growing; supporting-surface information; generation of a list of points; addition of points horizontally or vertically; superquadric fitting; model selection; model classification).
object and contour tracking to extract the closed contour surrounding the object. Their
method does not guarantee success in all cases. David Heeger [26] has proposed a
computational approach to gap filling. It is computationally expensive and not suited for
our purpose, since we want to avoid explicit contour tracing in the entire image and want
the region growing method to take care of it. Peter Allen [29] has used a gap filling method
based on the contour growing proposed by Nevatia and Babu [30]. They perform gap filling on
the entire scene using the predecessor-successor graph of all connected contours; see Peter
Allen's Ph.D. thesis for details. Contours are then merged based on the requirement of
merging N-pixel gaps. This approach is again computationally expensive. Peter Allen
observes that filling of at most two-pixel gaps is acceptable because of the ambiguities
resulting from a three-or-more-pixel gap-filling requirement. We have implemented two-pixel
gap filling by constraining the region growing process near the boundaries, thus avoiding
the explicit gap filling stage.
One- and two-pixel gap filling is accomplished by simply requiring that a pixel having a
boundary pixel in its 8-neighborhood not be grown recursively. Instead, the pixel and the
boundary pixel are marked as grown. Figure 10 illustrates gap filling in one instance.
Thus we are able to avoid tracing contours explicitly to fill gaps. Three or more pixel
gaps cannot be adequately handled by gap-fillers. Some sort of post-processing is necessary
to further segment the segmented object in this case. One way is to trace all the boundary
pixels of the segmented object and use concavity information to segment the object into
parts. This approach is being implemented and will not be discussed here.
The algorithm for segmentation is given below :
1. Read in the original range image I(x,y) and the ∇²G * I(x,y) image.
2. label_val = 0
3. Locate the 3x3 window with maximum height by averaging the pixel values in 3x3 windows.
Figure 10: Gap filling during region growing. Legend: pixel being grown; pixels in region; boundary pixel.
4. label_val = label_val + 1
5. Grow the seed region recursively in all 8 directions. For the gap filling procedure to work
it is necessary to grow pixels in the 8-neighborhood. Let p_ij be the pixel being grown. A
pixel in the 8-connected neighborhood of p_ij is not grown under one of the following
conditions:
6. If the number of pixels in the extracted region is less than the acceptable size then the region is invalid;
else it is valid.
8. The region extracted in the first pass is the topmost region. Subsequent regions are grown
from top to bottom, left to right. If any more pixels are left to be processed, then pick
up any unprocessed pixel and go back to step 5 to grow the region.
9. Output topmost region, all valid regions and supporting points in separate image files.
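Step 5 with the gap-filling constraint might look like this (a sketch under our assumptions: an explicit stack replaces the recursion, and the names are ours):

```python
def grow_region(edges, seed, label, labels):
    """Grow from `seed` in all 8 directions. A pixel with a boundary
    (edge) pixel in its 8-neighbourhood is marked grown but not expanded
    further, which closes one- and two-pixel gaps in the contour."""
    h, w = len(edges), len(edges[0])
    stack = [seed]
    while stack:
        y, x = stack.pop()
        if not (0 <= y < h and 0 <= x < w) or labels[y][x] != 0 or edges[y][x]:
            continue
        labels[y][x] = label
        # if any 8-neighbour is a boundary pixel, stop expanding here
        near_edge = any(edges[y + dy][x + dx]
                        for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                        if 0 <= y + dy < h and 0 <= x + dx < w)
        if near_edge:
            continue
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                if dy or dx:
                    stack.append((y + dy, x + dx))
    return labels
```

With a vertical edge contour containing a one-pixel gap, the pixels adjacent to the gap are labeled but not expanded, so the region does not leak through to the other side.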
The surfaces extracted by the previous algorithm can be classified as one of the eight basic
surface types. We will discuss this classification approach in detail in the next chapter. In this
section we will describe a high-level recognition and classification method that classifies the
segmented object into four broad categories.
We have used the superquadric model recovery method implemented by Franc Solina [23] to recog-
nize the segmented object in a range image. Details of the procedure for superquadric fitting
are discussed in Solina's Ph.D. thesis. Superquadrics are a family of parametric shapes that
can be used as primitives for shape representation in computer vision [31]. Superquadrics
are like lumps of clay that can be deformed and glued together into realistic looking models.
However, we will consider only non-deformed superquadric models for classification of the
object into one of the categories :
4. Irregular : Any object not falling in any of the above three categories.
Parameters a1, a2, and a3 define the superquadric size in the x, y and z directions respectively.
ε1 is the squareness parameter in the latitude plane and ε2 is the squareness parameter in
the longitude plane. Based on these parameter values superquadrics can model a large set
of standard building blocks, like spheres, cylinders, parallelepipeds and shapes in between.
Figure 16 illustrates the various types of shapes obtainable by changing the two shape param-
eters. If both ε1 and ε2 are 1, the surface defines an ellipsoid. Cylindrical shapes are
obtained for ε1 < 1 and ε2 = 1. Parallelepipeds are obtained when both ε1 and ε2 are < 1.
Figure 16: Superquadric models as a function of the shape parameters (ε1, ε2) for given size
parameters (a1, a2, a3).
We have restricted the model recovery procedure to fit models with 0 < ε1, ε2 < 1. We
will not discuss the details of model recovery here.
The criteria used for classification are the three size parameters, the two shape parameters and
the goodness-of-fit (GOF) measure. The superquadric procedure returns a GOF measure
using the following equation:

GOF = (1/N) Σ_{i=1}^{N} [ a1 a2 a3 ( F(x_i, y_i, z_i; a1, a2, a3, ε1, ε2, ...) − 1 ) ]²

where F is the superquadric inside-outside function and its remaining arguments are the pose parameters.
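For concreteness, the standard superquadric inside-outside function and the volume-weighted GOF can be sketched as below. Canonical (untranslated, unrotated) pose is assumed, which is our simplification; the function names are ours:

```python
def sq_inside_outside(x, y, z, a1, a2, a3, e1, e2):
    """Superquadric inside-outside function F in canonical pose:
    F = 1 on the surface, F < 1 inside, F > 1 outside."""
    return ((abs(x / a1) ** (2.0 / e2) + abs(y / a2) ** (2.0 / e2)) ** (e2 / e1)
            + abs(z / a3) ** (2.0 / e1))

def goodness_of_fit(points, a1, a2, a3, e1, e2):
    """Mean squared, volume-weighted residual of F over the data points."""
    n = len(points)
    return sum((a1 * a2 * a3 *
                (sq_inside_outside(x, y, z, a1, a2, a3, e1, e2) - 1.0)) ** 2
               for x, y, z in points) / n
```

Points lying exactly on the model surface contribute zero residual, so a perfect fit gives GOF = 0.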
1. Background is assumed to be the supporting surface of the object. Points are added on
the background by backprojecting the visible surface onto the background (figure 17(c)).
While this is desirable in the case of flat surfaces, it is not right for surfaces with volumetric
information.
2. Supporting points of the segmented object are used to determine the immediate sup-
porting surface(s) of the object. Points are added vertically (figure 17(e)) to the
object. This technique is more flexible since it can handle objects not lying on the
background. But it results in more points to be added in addition to assuming that
Figure 17: Horizontal and vertical addition of points. (a) object. (b) original points.
(c) horizontal addition of points. (d) fit with horizontal addition. (e) vertical addition of
points. (f) fit with vertical addition.
the object is actually touching the neighboring objects, which may not be true in
general.
2. Format conversion and point addition: Generate a list of points in 3-D space
representing the object. Call it points.orig. For every point on the visible surface add
a point at the same (x,y) coordinates on the background. Output the list of original
and added points in points.add.
Condition (a) assumes that model recovery is complete by the 15th iteration. Con-
dition (b) stops the procedure if an acceptable model is obtained early in the
process. Condition (c) monitors the rate of convergence of the fitting procedure: it
terminates the fitting procedure if the GOF measures of the last five iterations do
not vary much. All the values used in the above three conditions are empirically
determined.
4. Model selection:

(b) ELSE IF ((a1 > a3) AND (a2 > a3) AND
(ε1 < 0.5) AND (ε2 < 0.5))
THEN OBJECT = FLAT.

(d) ELSE IF ((a1 > THRESH_1_ROLL) AND (a2 > THRESH_1_ROLL) AND
(a3 > THRESH_1_ROLL) AND
(ε1 < 0.5) AND (ε2 > 0.5))
THEN OBJECT = ROLL.
7. Done: Output the classified model with parameters. Determine the orientation and position
of the model in the world coordinate system.
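The rule-based selection above can be sketched as a small classifier. Since the thresholds and the tests for the remaining categories are not fully legible in this copy, the rules below are representative assumptions only, and `thresh_roll` is a hypothetical parameter:

```python
def classify(a1, a2, a3, e1, e2, thresh_roll=2.0):
    """Illustrative rule-based classifier over superquadric parameters.
    The exact thresholds used in the thesis are empirical; these are
    placeholder values."""
    # FLAT: large extent in x and y relative to z, square cross-sections
    if a1 > a3 and a2 > a3 and e1 < 0.5 and e2 < 0.5:
        return "FLAT"
    # ROLL: all three sizes above a threshold, cylinder-like shape
    if (a1 > thresh_roll and a2 > thresh_roll and a3 > thresh_roll
            and e1 < 0.5 and e2 > 0.5):
        return "ROLL"
    # anything not matching the categories above
    return "IRREGULAR"
```

A thin box-like recovery thus maps to FLAT, a cylinder-like one to ROLL, and everything else falls through to IRREGULAR.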
The superquadric fitting procedure and classifier were run on the objects segmented pre-
viously. The superquadric parameters for the four types of recovered objects are shown in
figure 18.
Figure 19 shows model recovery on the topmost object segmented in figure 14. The model
selection process rejected model.orig due to large fit-error and accepted model.add. Even
if model.orig had an acceptable error measure, model.add would have been selected due
Figure 19: Superquadric fitting and model selection (a box): (a) original points. (b)
fitted model on original points. (c) original and added points. (d) fitted model on original
and added points.
to the larger volume. The acceptable error magnitude was empirically determined to be 500.
In figure 20 model.orig is selected and classified as a roll because of the tremendous error
difference between the two acceptable models. model.add is accepted and classified as flat
in figure 22 because of the volume consideration, although both models have acceptable
error measures. Finally, the film mailer in figure 23 is classified as irregular as the fit-errors
Figure 20: Superquadric fitting and model selection (a cylindrical object): (a)
original points. (b) fitted model on original points. (c) original and added points. (d) fitted
model on original and added points.
Figure 21: Segmentation: upper left: original image from GRASP lab scanner (a letter).
upper right: edges detected. lower left: topmost object. lower right: all segmented objects.
Figure 22: Superquadric fitting and model selection (a letter): (a) original points. (b)
fitted model on original points. (c) original and added points. (d) fitted model on original
and added points.
Figure 23: Superquadric fitting and model selection (a film mailer): (a) original
points. (b) fitted model on original points. (c) original and added points. (d) fitted model
on original and added points.
of the complex objects into parts. In the next chapter we will describe a surface classification
scheme that uses the output of segmentation routines described in this chapter.
Chapter 4
Surface characterization refers to the computational process of partitioning surfaces into
regions with equal characteristics. Since our ultimate goal is object recognition, classifica-
tion of the surfaces by the characteristics of the surface functions is very useful. Classical
differential geometry provides a complete surface description of analytic surfaces, from which
a complete set of surface characteristics can be obtained. Surface characterization can be successfully
used in intermediate- and high-level processing of the object recognition problem.
Important surface characteristics that are visible-invariant are the Gaussian curvature and
the mean curvature. They are invariant to changes in surface parametrization and to trans-
lations and rotations of object surfaces. Gaussian curvature is an intrinsic property of the
surface, while mean curvature is an extrinsic property of the surface.
From differential geometry it is well known that curvature, speed, and torsion uniquely
determine the shape of 3-D curves. The surface characteristics of interest to us are the
ones with a one-to-one relationship with curve shapes. The mathematics of a general surface
representation scheme and the calculation of Gaussian and mean curvatures is described in the
following section.
4.1 Differential Geometry of Surfaces
The parametric form of the equation for a regular surface S with respect to a known coordinate
system is:
The surface is a locus of points in Euclidean three-space defined by the end points of
the vector X(u,v), with x_i(u,v) the components of the vector. These real functions are
assumed to be defined over an open connected domain of a Cartesian u,v plane and to
have continuous second partial derivatives there. In our analysis of range images we are
assuming that this condition is satisfied.
The second condition for a regular surface is automatically satisfied by Z-depth format
images. It requires that the coordinate vectors X_u = X_1 = ∂X/∂u and X_v = X_2 = ∂X/∂v are linearly
independent:
The two tangent vectors X_u and X_v lie in the tangent plane T(u,v) of the surface at the
point (u,v). The [g] matrix is symmetric for an analytic surface.
Figure 24 shows the coordinate frame at the neighborhood of a point.
The first fundamental form I(u, v, du, dv) measures the small amount of movement in the
parameter space (du, dv). The first fundamental form is invariant to surface parametrization
changes and to translations and rotations of the surface. Therefore it depends on the
surface itself and not on how it is embedded in 3-D space. The metric functions E, F, G
determine all the intrinsic properties of the surface. In addition they define the area of a
surface:
The second fundamental form of the surface is given by :
The second fundamental form measures the correlation between the change in the normal
vector dn and the change in the surface position at a point (u,v) as a function of a small
movement (du, dv) in the parametric space. Besl and Jain [9] have discussed the properties
of the first and second fundamental forms in detail. We will consider some of the important
properties of Gaussian and mean curvature in the following paragraphs.
It can be shown that the [g] matrix and [b] matrix elements are continuous
functions with continuous second and first partial derivatives respectively, and that they
uniquely determine the surface type. From the [g] and [b] matrices calculated above, surface
shape and intrinsic surface geometry can be uniquely determined.
The Gaussian curvature function K of a surface can be defined in terms of the two
matrices as:

K = det[b] / det[g] = (b11 b22 − b12 b21) / (g11 g22 − g12 g21)
Figure 25: Basic surface types in range images (a) surface types (b) table of surface types.
The two types of curvature are together referred to as the surface curvature functions.
They exhibit very important properties that enable them to be used as features for higher
levels of processing. For a detailed discussion of the properties of surface curvature functions
see Besl and Jain [9]. Some of the relevant properties are summarized below:
1. Surface types can be determined by the sign of the surface curvatures. They are shown
in figure 25.
5. The mean curvature function of a graph surface, taken together with the boundary curve
of the graph surface, uniquely determines the graph surface from which it was computed.
6. Gaussian and mean curvature are invariant to arbitrary transformations of the (u, v)
parameters of a surface as long as the Jacobian of the transformation is always non-
zero.
7. Gaussian and mean curvatures are invariant to rotations and translations of a surface.
This property enables us to obtain view-independent characteristics.
10. Another important property of surface curvatures is that Gaussian curvature indicates
the surface shape at individual surface points. When the surface is shaped like an ellipsoid
in the neighborhood of (u,v), K(u,v) > 0. It is < 0 for a locally saddle-shaped surface
and is = 0 if the surface is locally flat, ridge-shaped or valley-shaped. Mean curvature
also indicates surface shape at individual points when considered together with the
Gaussian curvature.
The above observations are very important for surface classification and have been widely
studied and used in range image processing. In fact surface characteristics constitute an
important part in the realization of the ultimate goal of three-dimensional object recognition.
Given a range image, our objective is to calculate the Gaussian and mean curvature. To
compute surface curvature we need estimates of the first and second partial
derivatives of the depth map. The equations for the partial derivatives can be simplified in
the case of the Z-depth format range image. The parameterization takes a very simple form:
X(u,v) = [u v f(u,v)]^T, where the T superscript indicates the transpose. This gives the following
formulas for the surface partial derivatives and the surface normal.
and the six fundamental form coefficients. In the Z-depth case the Gaussian curvature reduces to:

K = (f_uu f_vv − f_uv²) / (1 + f_u² + f_v²)²

And the expression for the mean curvature is given by:

H = ((1 + f_v²) f_uu − 2 f_u f_v f_uv + (1 + f_u²) f_vv) / (2 (1 + f_u² + f_v²)^(3/2))
Partial derivatives of the range image can be obtained by fitting a continuously differentiable
function to the data. Various mathematical techniques have been used by computer vision
researchers to determine partial derivatives of depth maps.
Using Discrete Orthogonal Polynomials
Besl and Jain [9] used discrete quadratic orthogonal polynomial fitting at each pixel to
estimate derivatives. The neighborhood size used for the local estimates can be controlled,
which is important for actual range images.
A quadratic surface is fit at each pixel in the image, using a window convolution operator
of a size chosen by the user. Each point in the given window is associated with a position
(u, v) from the set U x U, where N is odd and

    U = { -(N-1)/2, ..., -1, 0, 1, ..., (N-1)/2 }
The discrete orthogonal polynomials provide the quadratic surface fit. The b_i(u) vectors
are computed according to the window size. First the surface estimate function f(u, v) is
calculated; the first and second partial derivatives can then be read directly from the
a_ij coefficients. After the first and second partial derivatives are determined, surface
characteristics at each pixel are calculated.
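The readout of a derivative from the fit coefficients can be sketched as follows. The example estimates f_uu from an equal-weight least-squares quadratic fit over an N x N window, using the discrete orthogonal quadratic u^2 - M(M+1)/3 with N = 2M+1; the helper name and the row-major window layout are our own, not the authors' code:

```c
/* Equal-weight least-squares quadratic fit in an N x N window (N = 2M+1)
   using the discrete orthogonal basis {1, u, u^2 - M(M+1)/3}.
   Returns the f_uu estimate at the window center; the window samples are
   passed row-major, win[(v+M)*N + (u+M)] for u, v in [-M, M]. */
double fit_fuu(const double *win, int M)
{
    int N = 2 * M + 1;
    double s2 = 0.0, num = 0.0;

    /* squared norm of the orthogonal quadratic p2(u) = u^2 - M(M+1)/3 */
    for (int u = -M; u <= M; u++) {
        double p2 = (double)u * u - M * (M + 1) / 3.0;
        s2 += p2 * p2;
    }
    /* inner product of the data with the 2-D basis function p2(u) * 1(v) */
    for (int v = -M; v <= M; v++)
        for (int u = -M; u <= M; u++) {
            double p2 = (double)u * u - M * (M + 1) / 3.0;
            num += p2 * win[(v + M) * N + (u + M)];
        }
    /* a20 = num / (s2 * N), and f_uu = 2 * a20, read off directly */
    return 2.0 * num / (s2 * N);
}
```

Because the basis is orthogonal over the window, each coefficient is an independent inner product, which is what makes the method fast enough to run at every pixel.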
Using Difference Operators
Brady et al. [4] have used 3 x 3 difference operators to locally compute the first and second
derivatives of the Gaussian-smoothed surface. The neighborhood size cannot be increased in
this method. The operators are:
Yang and Kak [33] have derived 3 x 3 operators using B-splines for computing the partial
derivatives of a range map. These can be combined with a Gaussian operator to increase the
window size and reduce sensitivity to noise. The operators give partial derivatives at the
center pixel of each operator.
[3 x 3 operator masks for x_uu, x_vv, and x_uv.]
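Operators of this kind are applied by convolving the 3 x 3 mask with the image at each pixel. As an illustrative sketch we use the standard central-difference mask for the mixed derivative f_uv; the B-spline-derived masks of [33] have the same form but different weights:

```c
/* Apply a 3 x 3 mask at pixel (r, c) of an image stored row-major.
   Border pixels (r, c on the image edge) are assumed to be skipped
   by the caller. */
double apply3x3(const double *img, int width, int r, int c,
                const double mask[3][3])
{
    double s = 0.0;
    for (int dr = -1; dr <= 1; dr++)
        for (int dc = -1; dc <= 1; dc++)
            s += mask[dr + 1][dc + 1] * img[(r + dr) * width + (c + dc)];
    return s;
}

/* Standard central-difference estimate of the mixed derivative f_uv
   (unit grid spacing); rows index v, columns index u. */
static const double FUV[3][3] = {
    {  0.25, 0.0, -0.25 },
    {  0.00, 0.0,  0.00 },
    { -0.25, 0.0,  0.25 }
};
```

On the bilinear surface f(u, v) = u v, whose true mixed derivative is 1 everywhere, this mask recovers the exact value at every interior pixel.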
We have used a fast least-squares fitting method to derive partial derivatives in the symmetric
neighborhood of a pixel. This method allows the neighborhood size to be controlled.
A surface fit of order n can be written as:

    f(u, v) = sum over i + j <= n of a_ij u^i v^j
We have used second-order (n = 2) fitting in the neighborhood of every pixel to compute the
first and second order derivatives. Clearly, since the pixel at which the derivatives are
computed is at the origin, we get:

    f_u = a_10,  f_v = a_01,  f_uu = 2 a_20,  f_vv = 2 a_02,  f_uv = a_11
Thus the derivatives are read off directly from the coefficients. We have also used the general
least-squares fitting procedure for fitting polynomials on surface patches.

[Figure 26: original and Gaussian-smoothed cross-sections of a step edge, showing the adjacent
convex and concave responses.]
The above-mentioned process is applied to actual range images and the results are shown in
figures 27, 28, 29, 30, 31, 32 and 33.
The smoothing behavior of the Gaussian operator was briefly discussed in chapter 2. It is
observed that step edges in range images actually appear as adjacent convex and concave edges.
This is further amplified after smoothing the image with a Gaussian operator of any size.
Brady et al. [4] have restricted the Gaussian application to the inside of the region. We have
applied the Gaussian uniformly across the range image with the intention of smoothing the
image uniformly to obtain reliable curvature estimates (see figure 26).
Figure 27: Curvature estimation. (a) Original image, 192 x 256, 12 bits/pixel. (b)
Smoothed image. (c) Regions. (d) Error in fitting.
Figure 28: Curvature estimation, left to right from top: original image, 150 x 150, 8
bits/pixel; error in fitting; segmented regions; flat regions; convex regions; concave
regions.
Figure 29: Initial labeling of the scene with different threshold values: (a) 0.01; (b) 0.02;
(c) 0.03; (d) 0.04.
Figure 30: Labeling of the scene: (a) all regions; (b) convex; (c) concave; (d) flat.
Figure 31: Thresholded curvatures. Left: Gaussian; right: mean. Black indicates zero, gray a
positive value, and white a negative value.
The segmentation done by labeling the individual pixels using the signs of the Gaussian and
mean curvature is local in nature and threshold dependent. In order to interpret these
labelings globally, we need to process the labeled image with global constraints. Besl and
Jain [25] have proposed a variable-order surface fitting algorithm in which surface patches
are described as linear, quadric or cubic.
Our approach depends on the actual requirements. We describe two methods (both preliminary)
to obtain a useful segmentation given the labeled image. The first method simply groups
convex patches to form connected convex subparts of the scene. The second method uses the
segmented objects obtained from the algorithm described in chapter 3.
As noted in the third chapter, smooth edges are difficult to detect using only local
information. Curvature information at these types of edges is easy to record. From figure 26
it is clear that edges in smoothed images can be recorded as thin convex and concave regions.
In particular, convex edges are of the convex cylinder type, with zero Gaussian curvature but
appreciable negative mean curvature. Similarly, concave edges are of the concave cylinder type,
with zero Gaussian curvature but appreciable positive mean curvature. Thus all types of
edges give either a convex or a concave cylindrical response. But the edge response is spread
over a wider region due to smoothing and the large window size used during derivative
computation. It is therefore not possible to obtain exact localization of the patches formed
by merging convex regions.
A simple algorithm for obtaining convex patches is given below:

3. Initialize the region data structure to record the surface type, the number of pixels,
the topmost pixel in the region, the neighbours of the region, the extremities of the
region, and the label assigned to the region.

4. For the next unprocessed topmost region of type flat, peak (convex sphere), or ridge
(convex cylinder) with an acceptable number of pixels do:

(a) Extend the original region to include all neighboring regions of type flat or
ridge. Other types of regions are considered concave or part of another convex
subpart. Peak patches are not included because they will be selected as seed
regions.

(b) Repeat the above step to extend the region until it is not possible to grow any
more.

Figure 34: Convex patches
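The growing step above can be sketched over a region-adjacency structure. The data layout below (an adjacency matrix and integer type codes) is our own simplification of the region data structure described in step 3:

```c
#include <string.h>

enum { FLAT, PEAK, RIDGE, OTHER };

/* Grow one convex patch from a seed region.  type[i] is the surface
   label of region i; adj[i*n + j] is nonzero when regions i and j
   touch; inpatch[i] is set for every region merged into the patch.
   Returns the number of regions in the patch. */
int grow_convex_patch(int seed, int n, const int *type,
                      const unsigned char *adj, unsigned char *inpatch)
{
    memset(inpatch, 0, n);
    inpatch[seed] = 1;
    int grew = 1, count = 1;
    while (grew) {                  /* repeat until no more growth */
        grew = 0;
        for (int i = 0; i < n; i++) {
            if (!inpatch[i]) continue;
            for (int j = 0; j < n; j++) {
                /* extend with neighbouring flat or ridge regions;
                   peak regions are left as seeds for other patches */
                if (!inpatch[j] && adj[i * n + j] &&
                    (type[j] == FLAT || type[j] == RIDGE)) {
                    inpatch[j] = 1;
                    grew = 1;
                    count++;
                }
            }
        }
    }
    return count;
}
```

Calling this once per unprocessed seed region, topmost first, reproduces the grouping behavior of steps 3 and 4.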
Figures 34 and 35 show the convex patches obtained from the labeled images of figure 27
and figure 28(a), respectively. The majority of the objects in figure 34 are merged into one
convex patch, while they are separated in figure 35.
The surfaces on the segmented objects can be classified into basic surface types using the
initial labeling based on the signs of the curvatures. Yang and Kak [33] have used extended
Gaussian images to identify the surface type of isolated surfaces. A histogram of the labels
in an isolated object can give some idea about the surface and guide the surface fitting
process.
The classification algorithm is as follows:

(a) Erode the object in the labeled image so as to remove points within a 5-pixel distance
of the object boundary. This reduces the effect of smoothing and window size during
curvature estimation, which is contributed mainly by pixels near the boundary and
does not reflect the nature of the region.

(d) In single-surface cases, fit the best-fitting surface to the points. Output the
description of the surface.

Figure 36: Classified surfaces
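The erosion in step (a) can be sketched as a simple distance-based pass over the labeled image; the background encoding (label 0) and the chessboard distance are our own assumptions:

```c
/* Strip labeled pixels within `dist` of the object boundary before
   histogramming curvature labels, as in step (a).  label == 0 is taken
   to mean background; out receives the eroded labeling. */
void erode_boundary(const unsigned char *label, unsigned char *out,
                    int w, int h, int dist)
{
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            out[y * w + x] = 0;
            if (!label[y * w + x]) continue;
            int keep = 1;
            /* drop the pixel if any background (or out-of-image) pixel
               lies within a (2*dist+1)^2 chessboard neighbourhood */
            for (int dy = -dist; dy <= dist && keep; dy++)
                for (int dx = -dist; dx <= dist; dx++) {
                    int yy = y + dy, xx = x + dx;
                    if (yy < 0 || yy >= h || xx < 0 || xx >= w ||
                        !label[yy * w + xx]) { keep = 0; break; }
                }
            if (keep) out[y * w + x] = label[y * w + x];
        }
}
```

The surviving interior labels are then histogrammed to suggest the surface type of each isolated surface.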
This algorithm is being implemented. The initial classification process will classify the
surfaces in figure 11 as 5 planar surfaces, 1 cylindrical surface, and 1 irregular surface
(the film mailer). See figure 36.
First- and second-order polynomials were fitted on the flat and non-flat surface patches,
respectively, in the image of figure 11. The reconstructed image is shown in figure 37. Besl
and Jain [13,25] have used initial labeling to obtain seed regions in the final region growing
process. They perform variable-order surface fitting to approximate the scene as a collection
of piecewise continuous functions.

Figure 37: Original and reconstructed images. Left: original images; right: reconstructed
images obtained by fitting first- and second-order surfaces on patches labeled by the
segmentation process.
Chapter 5
Discussion
Though the results of running the various algorithms described in the previous chapters on
images acquired from different scanners are consistent, there is scope for refinement of all
the approaches. We discuss the merits and demerits of each method and suggest improvements.
We need to study the scale-space behavior of range images in detail. This would lead to a
better understanding of the scale at which range images should be handled. We have noticed
that thresholding of zero-crossings makes the entire segmentation procedure dependent on
the threshold value. Though we have obtained consistent results with a fixed, empirically
determined value for all the images obtained from a particular scanner, threshold selection
is not automatic. Secondly, even with the right threshold value the region may not be
completely bounded by the zero-crossings (in the case of overlaps by thin objects, or sensor
noise). To make the whole process less sensitive to the threshold, the following
post-processing steps (region splitting) are suggested:
2. Trace the contours around the object as it is currently defined, and also any other
boundaries now lying inside the object. Except for the bounding contour, the other contours
may not be closed; they may simply lie within the region and actually be the boundary of a
real object. In such a case mark the beginning and end of the contour. If the contour touches
the closed contour, mark the point of contact as the end of the inner contour.
4. Now split the region by connecting two contours (gap filling), connecting two points of
concavity (gap filling or region splitting), or connecting an end point of a contour with a
concavity, based on a predetermined gap-filling distance.
The above method should be insensitive to threshold values on the higher side, as it splits a
region consisting of more than one actual region. To reduce the sensitivity to low threshold
values (which result in too many small regions) some form of merging is required. Merging is
a much more difficult task, so it is better to keep the threshold high and have the
post-segmentation process perform the splitting, rather than have the initial segmentation
perform splitting due to a low threshold value.
Another solution to splitting is to let a higher-level recognition process make globally valid
observations to split the region. The higher-level procedure may use a priori information,
make some assumptions, or apply global constraints to split the region. Franc Solina's
superquadric procedure (see [23]) can split the regions into identifiable parts by performing
model fitting on individual parts of the object.
In chapter 4 we noticed that labeling of the scene based on the curvature sign is threshold
sensitive. While thresholding around zero is necessary to obtain meaningful results, it is
not clear how that value can be determined automatically. Since curvature determination is
local, the labeling is sensitive to noise and surface texture. It is not well understood how
to generate a global interpretation of such surfaces.
Bibliography

[1] R.M. Bolle and D.B. Cooper, Bayesian recognition of local 3-D shape by approximating image intensity functions with quadric polynomials, IEEE Trans. Pattern Analysis and Machine Intelligence, PAMI-6, No. 4, 1984, pp. 418-429.

[2] R. Bajcsy, Three dimensional scene analysis, Proc. Pattern Recognition Conf., Miami, Florida, 1980, pp. 1064-1074.

[3] P.J. Besl and R.C. Jain, Three dimensional object recognition, ACM Computing Surveys, 17, No. 1, 1985, pp. 75-145.

[5] Haruo Asada and Michael Brady, The Curvature Primal Sketch, IEEE Trans. Pattern Analysis and Machine Intelligence, PAMI-8, No. 1, January 1986.

[6] Robert Haralick, Layne Watson and Thomas Laffey, The Topographic Primal Sketch, International Journal of Robotics Research, vol. 2, No. 1, Spring 1983, pp. 50-72.

[7] Jean Ponce and Michael Brady, Toward a Surface Primal Sketch, IEEE Conference on Robotics and Automation, March 1984, pp. 420-425.

[8] M. Hebert and J. Ponce, A new method for segmenting 3-D scenes into primitives, Proc. of the 6th International Conference on Pattern Recognition, 1982.

[9] P.J. Besl and R.C. Jain, Invariant surface characteristics for 3-D object recognition in range images, Computer Vision, Graphics, and Image Processing, No. 1, 1986, pp. 33-80.

[10] R.M. Haralick, Digital step edges from zero crossings of second directional derivatives, IEEE Trans. Pattern Analysis and Machine Intelligence, PAMI-6, No. 1, 1984, pp. 58-68.

[13] P.J. Besl and R.C. Jain, Segmentation through symbolic surface descriptions, Proc. Computer Vision and Pattern Recognition, 1986.

[14] T.J. Fan, G. Medioni and R. Nevatia, Description of surfaces from range data using curvature properties, Proc. Computer Vision and Pattern Recognition, 1986.

[15] T.C. Henderson, An efficient segmentation method for range data, Proc. of the Society of Photo-Optical Instrumentation Engineers Conference on Robot Vision, 1982.

[16] A. Huertas and R. Nevatia, Edge detection in aerial images using del^2 G(x, y), in Semiannual Technical Report on Image Understanding Research, University of Southern California, 1981, pp. 16-26.

[17] S. Inokuchi et al., A three dimensional edge region operator for range pictures, Proc. of the 6th International Conference on Pattern Recognition, 1982.

[18] D.L. Milgrim and C.M. Bjorklund, Range image processing: planar surface extraction, Proc. of the 5th International Conference on Pattern Recognition, 1980.

[22] B.K.P. Horn, Extended Gaussian Images, Proc. of the IEEE, vol. 72, No. 12, December 1984, pp. 1671-1686.

[23] Franc Solina, Shape Recovery and Segmentation with Deformable Part Models, Ph.D. thesis, GRASP Laboratory, University of Pennsylvania, MS-CIS-87-111.

[24] D. Marr and E. Hildreth, Theory of edge detection, Proc. of the Royal Society of London, B-207, 1980, pp. 187-217.

[25] P.J. Besl and Ramesh Jain, Segmentation through variable-order surface fitting, IEEE Trans. Pattern Analysis and Machine Intelligence, 1988.

[26] David J. Heeger, Filling in the Gaps: A Computational Theory of Contour Generation, Technical Report MS-CIS-84-64, GRASP Laboratory, University of Pennsylvania.

[27] B. Gil, A. Mitiche and J.K. Aggarwal, Experiments in combining intensity and range edge maps, Computer Vision, Graphics, and Image Processing, 21, 1983, pp. 395-411.

[28] A. Mitiche and J.K. Aggarwal, Detection of edges using range information, IEEE Trans. Pattern Analysis and Machine Intelligence, PAMI-5, No. 2, pp. 174-178.

[29] Peter Allen, Object Recognition Using Vision and Touch, Ph.D. thesis, GRASP Laboratory, University of Pennsylvania, 1985.

[30] R. Nevatia and K.R. Babu, Linear feature extraction and description, Computer Graphics and Image Processing, vol. 13, 1980, pp. 257-269.

[33] R.S. Yang and A.C. Kak, Determination of the identity, position and orientation of the topmost object in a pile, Computer Vision, Graphics, and Image Processing, 36, 1986, pp. 229-255.

[34] David Smith and Takeo Kanade, Autonomous scene description with range imagery, Computer Vision, Graphics, and Image Processing, vol. 31, No. 3, September 1985, pp. 322-334.

[35] Martin Herman, Generating detailed scene descriptions from range images, IEEE Conference on Robotics and Automation, March 1984, pp. 426-431.

[36] Richard Duda, David Nitzan and Phyllis Barrett, Use of range and reflectance data to find planar surface regions, IEEE Trans. Pattern Analysis and Machine Intelligence, PAMI-1, No. 3, July 1979.

[37] Darwin Kuan and Robert Drazovich, Model-based interpretation of range imagery, Proc. of the National Conference on Artificial Intelligence (AAAI), Austin, August 6-10, pp. 210-215.

[38] B.C. Vemuri and J.K. Aggarwal, 3-D model construction from multiple views using range and intensity data, IEEE Conference on Computer Vision and Pattern Recognition, 1986, pp. 435-437.

[39] Alan Yuille and Tomaso Poggio, Scaling theorems for zero crossings, IEEE Trans. Pattern Analysis and Machine Intelligence, PAMI-8, No. 1, January 1986.
Appendix A
we get:

In a symmetric neighborhood:

    S_pq0 = S_qp0
Appendix B
There are other supporting programs that are vital to the algorithms. The superquadric
fitting programs were developed by Franc Solina and are not listed here.
/*
 * scale.c
 *
 * Program for computing the scale-space description of the input image,
 * and lots of other things, in an interactive manner.  Later will
 * override scale.c.
 *
 * Modifications for display of histogram done on Feb 18, 1988.
 * March 24, 1988 : read/write in both PM_C and PM_S formats.
 * April 1, 1988  : all buffers made floating point; computation of the
 *                  laplacian and zero-crossings made floating point.
 */

#include <stdio.h>
#include <math.h>
#include <local/pm.h>
#include <ik.h>
/* #include <local/qkopt.h> */

#define BUFSIZE 256 /* image buffer size */
#define BUF_NUM 4   /* # of buffers available for manipulation */

FILE *infile, *out;           /* pointers to input and output image files */
FILE *out2;
FILE *infs, *out_pm_smooth, *output_pm;
char *smooth_cmt;
char ikonas_disp[10];
char *out_cmt;
int count;
unsigned char c;
static int nhb_x[8] = { 0, 1, 1, 1, 0, -1, -1, -1 };
static int nhb_y[8] = { 1, 1, 0, -1, -1, -1, 0, 1 };
float nbd[1], temp;
int i, j, k, l, m, n, b;
float nbr[10];
int gsize;
float sigma;                  /* size and sigma of the gaussian operator */
float sum;
float gauss[60];
float gsum;
int offset;
char input[20];
int offx = 0, offy = 0;       /* offset coordinates, initialized */
char outfile[30];             /* name of the output file */
float result[BUFSIZE][BUFSIZE];
float buffer[BUF_NUM][BUFSIZE][BUFSIZE]; /* 0 stores original image */
int temp_buf[BUFSIZE];        /* temporarily store a buffer */
int b1, b2, b3;
unsigned char *pm_point;
int disp_row;
short int *pms_point;         /* pointer to short integer to handle PM_S format */
float x[BUFSIZE], y[BUFSIZE]; /* to store points */
int factor;                   /* # by which a PM_S image pixel is scaled
                                 for display on IKONAS */
int threshold;                /* threshold for detecting zero-xings */
int bool1, bool2, bool3;
char *cmt;
char input_filename[50];
int sizex, sizey;             /* size of the image */
pmpic *pm1;
u_int image_format;           /* stores the format of the last image read */

float min();
float get_median();
float abszz();
float squarezz();

main(argc, argv)
int argc;
char **argv;
{
    if (argc != 2) {
        printf("usage : scale <input-image-file-pmpic>\n");
        exit(0);
    }

    printf("want to display on ikonas ? ");
    scanf("%1s", &ikonas_disp[0]);

    /* open the ikonas display; value of env. variable is taken */
    if (strcmp("y", ikonas_disp) == 0)
        if (ikopen(NULL) == -1) {
            printf("can't open ikonas. exiting\n");
            exit(0);
        }

    /* get comment line */
    cmt = pm_cmt(argc, argv);

    /* open input pm file */
    read_picture(argv[1], 0);
    read_picture(argv[1], 1);
    printf("Rows : %d Columns : %d\n", sizex, sizey);

    /* ... median filtering ... */
    for (i = 1; i < (sizex - 1); i++)
        for (j = 1; j < (sizey - 1); j++)
            result[i][j] = get_median(nbr);
    for (i = 1; i < (sizex - 1); i++)
        for (j = 1; j < (sizey - 1); j++)
            buffer[b1][i][j] = result[i][j];
    /* end of median filtering */

    /* ... the gaussian smoothing branch follows:
       else if ((strcmp(input, "gauss") == 0) || (strcmp(input, "g") == 0)) ...
     */