Information Extraction From Remotely Sensed Images
The era of 1-meter satellite imagery presents new and exciting opportunities for users of
spatial data. With Space Imaging's IKONOS satellite already in orbit and satellites from
EarthWatch Inc., Orbital Imaging Corp. and, of course, ISRO scheduled for launch in the
near future, high resolution imagery will add an entirely new level of geographic
knowledge and detail to the intelligent maps that we create from imagery.
Geographic imagery is now widely used in GIS applications worldwide. Decisions made
using these GIS systems by national, regional and local governments, as well as
commercial companies, affect millions of people, so it is critical that the information in
the GIS is up to date. In most instances, aerial or satellite imagery provides the
most up-to-date source of data available, helping to ensure accurate and reliable decisions.
However, with technological advancements come new opportunities and challenges. The
challenge now facing the geotechnology industry is twofold - how best to fully exploit
high-resolution imagery and how to get access to it in a timely manner.
Is high-resolution imagery making a difference?
There is no doubt that the GIS press has been deluged with high-resolution imagery for
the last few years. Showing an application with an imagery backdrop provides an
immediate visual cue for readers. Without the imagery backdrop, the context is lost and
the basic map, comprising polygons, lines and points becomes more difficult for the
layman to interpret. It is the context or visual clues that provide the useful information
and it is this information that is the inherent value of the imagery.
The higher the resolution of the imagery, the more man-made objects can be
identified. The human eye, the best image processor of all, can quickly detect and
identify these objects. If the application is therefore one that just requires an operator to
identify objects and manually add them into the GIS database, then the imagery is
making a positive difference. It is adding a new data source for the GIS Manager to use.
However, if the imagery requires information to be extracted from it in an automated and
semi automated fashion (for example, a land cover classification), it is a different matter.
If the same techniques that were developed for earlier lower resolution satellite imagery
are used on the high-resolution imagery, (such as maximum likelihood classification), the
results can actually create a negative impact. Whilst lower resolution imagery isn't
affected greatly by artifacts such as shadows, high-resolution data can be. Lower
resolution data also smoothes out variations across ranges of individual pixels,
allowing statistical processing to create effective land cover maps. Higher resolution data
doesn't do this: individual pixels can represent individual objects like manhole covers,
puddles and bushes - and contiguous pixels in an image can vary dramatically, creating
very mixed or confused classification results. There is also the issue of linear feature
extraction. Lines of communication on a lower resolution image (such as roads) can be
identified and extracted as a single line. However, on a high-resolution image, a road
comprises the road markings, the road itself, the kerb (and its shadow) and the pavement
(or sidewalk). A very different method of feature extraction is therefore needed.
It's not just the spatial resolution that can affect the usage of the imagery. With 11 bit
imagery becoming available, the ability of the GIS to work with high spectral content
imagery becomes key. 11 bit data means that up to 2048 levels of grey can be stored and
viewed. If the software being used to view the imagery assumes it is 8 bit (256 levels),
then it will either a) display only the information below the 255 level (creating either a
black or very poor image) or b) try to compress the 2048 levels into 256, also reducing
the quality of the displayed image considerably. Having 2048 levels allows more
information in shadowy areas to be extracted as well as enabling more precise spectral
signatures to be defined to aid in feature identification. However, without the correct
software, this added bonus can easily turn into a problem.
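To make the display issue concrete, here is a minimal sketch (in Python with NumPy; purely illustrative and not tied to any particular GIS package) that contrasts clipping 11-bit values at 255 with linearly rescaling the full 0-2047 range to 8 bits:

```python
import numpy as np

def clip_to_8bit(img11):
    """Naive display: values above 255 are clipped, so most of an
    11-bit image appears saturated or black."""
    return np.clip(img11, 0, 255).astype(np.uint8)

def rescale_to_8bit(img11):
    """Linear stretch: map the full 0..2047 range into 0..255,
    preserving relative brightness but losing fine radiometric detail."""
    return (img11.astype(np.float64) / 2047.0 * 255.0).round().astype(np.uint8)

# Hypothetical 11-bit scene: grey values between 0 and 2047.
rng = np.random.default_rng(0)
scene = rng.integers(0, 2048, size=(4, 4))

print(clip_to_8bit(scene))     # mostly 255: information above level 255 is lost
print(rescale_to_8bit(scene))  # compressed but usable 8-bit rendering
```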
Information Extraction from Remotely Sensed Images:
Geoinformation extraction using image data involves the construction of
explicit, meaningful descriptions of physical objects (Ballard & Brown, 1982). When
performing analysis of complex data one of the major problems stems from the number
of variables involved. Analysis with a large number of variables generally requires a
large amount of memory and computation power or a classification algorithm which
overfits the training sample and generalizes poorly to new samples. Feature extraction is
a general term for methods of constructing combinations of the variables to simplify
these problems while still describing the data with sufficient accuracy. Best results are
achieved when an expert constructs a set of application-dependent features. All
approaches usually include object recognition, i.e. interpretation using the eye-brain/computer
system, and object reconstruction, i.e. coding, digitizing and structuring.
Feature extraction can be used in the area of image processing, which involves using algorithms to detect
and isolate various desired portions or shapes (features) of a digitized image or video
stream. Generally, approaches for information extraction using image processing
techniques may be grouped as follows:
Low-level:
  Edge detection
  Corner detection
  Blob detection
  Ridge detection
  Scale-invariant feature transform
  Curvature
  Image motion
  Motion detection
Shape based:
  Thresholding
  Blob extraction
  Template matching
  Hough transform (lines, circles/ellipses, arbitrary shapes - generalized Hough transform)
Flexible methods
position as it requires not only the information which can be derived from the image, but
also a priori knowledge about the properties of a road and its relationships with other
features in the image and other related knowledge such as knowledge on the imaging
system. Due to the complexity of aerial images and existence of image noise and
disturbances, the information derived from the image is always incomplete and
ambiguous. This makes the recognition process more complex.
A knowledge-based method for automatic road extraction from aerial images has been
developed in this laboratory. The method includes bottom-up hypothesis generation of
road segments and top-down verification of hypothesized road segments. The generation
of hypotheses starts with low-level processing in which linear features are detected,
tracked and linked. The results of this step are numerous edge segments. They are then
grouped to form the structure of road segments based on the general knowledge of a road,
and the generated structures of road segments are represented symbolically in terms of
geometric and radiometric attributes. Finally, road segments are hypothesized by
applying the knowledge stored in the knowledge base to the generated road structures. As
hypotheses of road segments are generated in a local context, ambiguity is unavoidable.
To remove spurious hypothesized road segments, all hypotheses are checked in a global
context using the topological information of road networks, which is derived from low-resolution images. The missing road segments are predicted using topological
information of road networks. This method has been applied to a number of aerial
images with encouraging results.
EXTRACTION OF POINTS
General Principles for Point Extraction
DEFINITION
Points are image objects, whose geometric properties can be represented by
only two coordinates (x, y). One can distinguish between several types of points.
A circular symmetric point is a local heterogeneity in the interior of a
homogeneous image region. CSPs are too small to be extracted as regions
(depending on the image scale) and are characterised by properties of circular
symmetry (e.g., peaks, geodetic control point signals, man-holes). CSPs can be
interpreted as region attributes; they do not affect the image structure. Endpoints
(start point or end point of a line), corners (intersection of two lines) and junctions
(intersections of more than two lines) are used for the geometrical description of
edges and region boundaries. Missing these points can have fatal
consequences for the symbolic image description.
REPRESENTATION:
The symbolic description of points can be given as a list containing geometric
attributes (the coordinates), radiometric attributes (e.g., strength) and relational
attributes (e.g., the edges, intersecting at this point).
APPLICATIONS
Major applications for extracted image points are image-matching operations.
Assuming that extracted points refer to significant points in the real world, we can
look for the same real point in two images taken from a different view. This
technique is used for image orientation (PADERES et al. 1984) or DTM-generation
(e.g., KRZYSTEK 1991).
BASIC APPROACHES
Here we only review approaches that solely use the image data (one could also
think of point extraction methods, which determine junctions or intersections from
already extracted contours). Three prominent methods are:
Point template matching
Corner detection based on properties of differential geometry
Point detection by local optimization
Deriving the point coordinates normally follows a three-step procedure: in the
first step, point regions are selected by applying a threshold procedure. These are
image regions inside which points are supposed to lie. In a subsequent step the
best point pixels within these regions are selected; this operation could be
referred to as thinning. An even more accurate determination of the point position
can be derived by a least squares estimation (LSE), so in this step we look for
the real-valued coordinates of points.
Point Templates
One possibility to detect point regions is to define a point pattern (template),
which represents the point structure we are looking for. The main idea of
template matching is to find the places in the image where the template fits best
in the image. The similarity between the template and the image can be
evaluated by multiplication of the template values with the underlying image
intensities or by the estimation of the correlation coefficients. Disadvantages of
template matching in general are the limitation by the number and types of
templates, and sensitivity to changes in scale and to image rotation (unless
the templates are rotationally invariant).
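A minimal sketch of point template matching, assuming a small cross-shaped template (chosen here only for illustration) and scoring every window position with the correlation coefficient:

```python
import numpy as np

def match_template(image, template):
    """Slide the template over the image and return the correlation
    coefficient at every valid position (higher = better fit)."""
    th, tw = template.shape
    ih, iw = image.shape
    scores = np.full((ih - th + 1, iw - tw + 1), -1.0)
    t = template - template.mean()
    for r in range(scores.shape[0]):
        for c in range(scores.shape[1]):
            w = image[r:r + th, c:c + tw].astype(float)
            w = w - w.mean()
            denom = np.sqrt((t * t).sum() * (w * w).sum())
            if denom > 0:
                scores[r, c] = (t * w).sum() / denom
    return scores

# Illustrative cross-shaped point template (e.g., a control point signal).
template = np.array([[0, 1, 0],
                     [1, 1, 1],
                     [0, 1, 0]], dtype=float)
image = np.zeros((7, 7))
image[2:5, 2:5] = template          # embed the pattern in an empty image
scores = match_template(image, template)
print(np.unravel_index(scores.argmax(), scores.shape))  # best fit at (2, 2)
```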
Corner Detection by Curvature
Let us assume that the image data is stored in an image function g(r,c), r refers
to the row of the image, c to the column. Several approaches are based on the
curvature of g, which can be expressed by the second partial derivatives in the
coordinate axes r and c. The sign of the curvature can be used for the
classification of the pixels and for the detection of corners. An overview and
evaluation of these approaches can be found in (DERICHE AND GIRAUDON
1990).
Point Detection by Optimization:
MORAVEC (1977) was the first to propose an approach aiming at detecting
points which can be easily identified and matched in stereo pairs. He suggested
measuring the suitability or interest of an image point by the estimation of the
variances in a small window (4x4, 8x8 pixels). This method is used in many
stereo matching algorithms and initiated further investigations leading to the
interest operators proposed by PADERES et al. (1984) and FÖRSTNER AND
GÜLCH (1987). Similar to the Moravec operator, the objective of these
operators is the detection of adequate points (but with higher accuracy).
Adequate points are those which meet the two criteria of (1) local distinctness (to
increase geometric precision) and (2) global uniqueness (to decrease search
complexity). The Förstner operator is able to detect different point types
with the same algorithm and can be used either for image matching or image
analysis approaches.
Interest operator in the 1-D case: Image matching can be reduced to a one-dimensional problem, using the epipolar geometry of two images. In this case the
aim is to match two intensity profiles. The effect of the interest operator in 1-D is
identical to finding the zero crossings of the Laplacian, neglecting saddle points
of the intensity function.
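The Moravec idea can be sketched roughly as follows: the interest value of a pixel is the minimum, over a few shift directions, of the sum of squared grey value differences in a small window, so only points that are distinct in all directions score highly. This is an illustrative reconstruction, not MORAVEC's original code:

```python
import numpy as np

def moravec_interest(image, window=2):
    """Interest value = minimum over four shift directions of the sum of
    squared differences within a (2*window+1)^2 neighbourhood."""
    img = image.astype(float)
    rows, cols = img.shape
    interest = np.zeros_like(img)
    shifts = [(0, 1), (1, 0), (1, 1), (1, -1)]          # four principal directions
    for r in range(window, rows - window - 1):
        for c in range(window + 1, cols - window - 1):
            values = []
            for dr, dc in shifts:
                patch = img[r - window:r + window + 1, c - window:c + window + 1]
                shifted = img[r - window + dr:r + window + 1 + dr,
                              c - window + dc:c + window + 1 + dc]
                values.append(((patch - shifted) ** 2).sum())
            interest[r, c] = min(values)                 # distinct in *all* directions
    return interest

img = np.zeros((12, 12)); img[5:, 5:] = 100.0            # a synthetic corner
# Along a straight edge the minimum is zero; only near the corner is it high.
print(np.unravel_index(moravec_interest(img).argmax(), img.shape))
```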
EXTRACTION OF EDGES
General Principles for Edge Extraction
DEFINITION
Referring to BALLARD AND BROWN 1983, ROSENFELD AND KAK 1982 and
NALWA 1993, an edge is an image contour where a certain property like
brightness, depth, color or texture (see Fig.11a) changes abruptly perpendicular
to the edge. Moreover, we assume that on each side of the edge the adjacent
regions are homogeneous in this property. According to these characteristics,
edges can be classified into two general types: step edges (edges) and bar
edges (lines).
Edges represent boundaries between two regions. The regions have two distinct
(and approximately constant) pixel values; e.g., in an aerial image two adjacent
agricultural fields with different land use.
Lines either occur at a discontinuity in the orientation of surfaces, or they are thin,
elongated objects like streets in a small-scale image. The latter may appear dark
on bright background or vice versa. When the scale is large the street appears as
an elongated 2-D region with edges on both sides. To avoid conflicts in the
symbolic image description it might be necessary to make an explicit distinction
between edges and lines.
REPRESENTATION
Edge extraction usually leads to an incomplete description of the image, i.e.
edges do not build closed boundaries of homogeneous image regions. The types
of representation of single edges are manifold depending on the intended use.
The symbolic description of edges can be given, e.g. as a list, containing
geometric, radiometric (e.g. strength, contrast) and relational attributes (e.g.
adjacent regions, junctions, etc.). The geometric attributes depend on the choice
of the approximation function (see step 5 below). For linear edges it is sufficient
to specify the start and endpoint.
APPLICATIONS
Contrary to points as image features, one can argue that a list of all edges in an
image contains all the desired image information, but its representation is much
more compact and is easier for a computer to interpret. To support this
statement, consider again the image in Figure 3a. Just by looking at the edges it
is possible to recognize the object. If in addition each edge had stored the brightness
of its left and right adjacent region, the information would be even more
complete. Another justification could be based on information theory, COVER
AND THOMAS (1991) wrote: the less a certain structure can be found in an
image, the more unexpected it is. This means that an unexpected structure
contains much more information than a frequent one (like homogeneous
regions). Edges can be used to solve a broad range of problems because of their
importance; some of them are:
Relative orientation: Edge-based matching in stereo pairs is applied for relative
orientation, e.g. LI AND SCHENK (1991) use curved edges.
Absolute orientation: Matching edges with wire frame models of buildings can be
used for absolute orientation.
Object recognition and reconstruction: In many cases object models consist of
structural descriptions of object parts. Straight lines often bound parts of man-made
objects. The structural description based on edge extraction provides, besides its
completeness, the highest geometrical accuracy. Models about the
expected shape of object boundaries can be involved easily in the process, e.g.
searching for straight lines. Therefore, extracting edges is widely used for object
recognition.
BASIC APPROACHES
Both edge types can be detected by the discontinuity in the image domain and
in the following we will make no distinction between these types as long as it
makes no difference for the algorithm. Since the beginning of digital image
processing, edge detection has been an important and very active research area.
As a result, a lot of edge detectors have been developed, which differ in the
image or edge model they are based on, the complexity, the flexibility and the
performance. In particular, the performance depends on 1) the quality of
detection, i.e. the probability of missing edges and yielding spurious edges and
2) the accuracy of the edge location. Unfortunately both criteria are conflicting.
Even a short description of all approaches is beyond the scope here, so we only
outline the principles by looking at the main processing steps most edge detector
algorithms have in common. A typical approach consists of five steps:
Extraction of edge regions: Extraction of all pixels, which probably belong to an
edge. The result is elongated edge regions.
Extraction of edge pixels: Extraction of the most probable edge pixels within
the edge regions reducing the regions to one pixel wide edge pixel chains.
Extraction of edge elements (edgels): Estimating edge pixel attributes, e.g.
real valued position of the edge pixels, accuracy, strength, orientation, etc.
Extraction of streaks: Aggregation or grouping of the edgels that belong to the
same edge.
Extraction of edges: Approximation of the streaks by a set of analytic functions,
for example polygons.
In the following section the main objectives and the most common techniques of
each step will be mentioned.
Edge Regions
The aim of this step is to extract all pixels from an input image, which are likely to
be edge pixels. The extraction could be done by template matching, by
parametric edge models or by gradients. Starting from an image with the
intensity function g, the result is a binary image where all edge pixels are labeled.
In addition, iconic features, e.g. the edge magnitude and the edge direction, of
each edge pixel are extracted and stored as they are required in subsequent
steps.
Template Matching: Edge templates are patterns, which represent certain edge
shapes. For each edge type (different edge models, different edge directions,
edge widths and strengths) a special pattern is required. Operators can be found
e.g. in ROSENFELD AND KAK (1982).
Gradient Operators (Difference Operators): The main idea of these
approaches is that in terms of differential geometry the derivatives of an image
intensity function g can be used to detect edges, which is more general than
template matching procedures. The first step is to apply linear filters (convolution)
to obtain difference (slope) images. The slope images represent the components
of the gradient of g; from these the edge direction and edge strength (magnitude)
can be calculated for each pixel.
The convolution of the image with one of the many known difference operators is
followed by a threshold procedure for distinguishing between the heterogeneous
image areas, i.e. pixels with high gradients and the homogeneous area, i.e.
pixels with low gradients (see Sec. 2.3). All pixels above a certain threshold are
edge region pixels.
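A compact sketch of the gradient-operator approach, here using Sobel-style difference kernels (one common choice; the text does not prescribe a particular operator) to derive slope images, magnitude, direction and finally edge region pixels by thresholding:

```python
import numpy as np

SOBEL_R = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=float)  # d/dr
SOBEL_C = SOBEL_R.T                                                    # d/dc

def convolve3x3(img, kernel):
    """Plain 3x3 correlation, ignoring a one-pixel border."""
    out = np.zeros_like(img, dtype=float)
    for r in range(1, img.shape[0] - 1):
        for c in range(1, img.shape[1] - 1):
            out[r, c] = (img[r - 1:r + 2, c - 1:c + 2] * kernel).sum()
    return out

def edge_regions(img, threshold):
    gr = convolve3x3(img.astype(float), SOBEL_R)   # slope image in r
    gc = convolve3x3(img.astype(float), SOBEL_C)   # slope image in c
    magnitude = np.hypot(gr, gc)                   # edge strength
    direction = np.arctan2(gr, gc)                 # edge direction
    return magnitude > threshold, magnitude, direction

img = np.zeros((8, 8)); img[:, 4:] = 10.0          # vertical step edge
labels, mag, _ = edge_regions(img, threshold=5.0)
print(labels.astype(int))                           # band of edge region pixels
```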
Parametric Edge Models: An example for a parametric solution of edge
detection is Haralick's Facet Model (HARALICK AND WATSON 1981), which can
be used either for edge detection or for extracting regions and points.
The idea is to fit local parts of the image surface g by a first order polynomial f
(sloped planes or facets). Three parameters α, β and γ represent the facet f,
which can be evaluated by least squares estimation. The model is given by
g(r, c) = αr + βc + γ + n(r, c), where α and β are the slopes in the two coordinate
axes r and c, γ the altitude of the facet and n(r, c) the image noise. HARALICK
AND SHAPIRO 1992 showed that the result of this approach is identical to the
convolution with a difference operator. The classification of edge pixels is a
function of the estimated slopes (α, β): if the slopes are greater than a given
threshold and, in addition, the variances are small enough (to avoid noisy image
areas, which are assumed to be horizontal), the pixel belongs to an edge region.
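A hedged sketch of the facet idea: fit g(r, c) = αr + βc + γ to a small window by least squares and read off the slopes; pixels whose slopes exceed a threshold would then be classified as edge region pixels. Function names and the synthetic patch are illustrative:

```python
import numpy as np

def fit_facet(window):
    """Least squares fit of g(r,c) = alpha*r + beta*c + gamma to a square
    image patch; returns the estimated slopes and altitude."""
    n = window.shape[0]
    r, c = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    A = np.column_stack([r.ravel(), c.ravel(), np.ones(n * n)])   # design matrix
    params, *_ = np.linalg.lstsq(A, window.ravel().astype(float), rcond=None)
    alpha, beta, gamma = params
    return alpha, beta, gamma

# A patch from a synthetic ramp image: slope 2 per row, 0 per column.
patch = np.fromfunction(lambda r, c: 2.0 * r + 5.0, (5, 5))
print(fit_facet(patch))   # approximately (2.0, 0.0, 5.0)
```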
Edge Pixels
Due to low contrast, image noise, image smoothing, etc. the first step leads to
edge regions, which are possibly more than one pixel wide. The aim of this step
is to thin the edge regions to one pixel wide edge chains. These pixels should
represent the real edges with highest probability. Assuming the real edge is
located in the mid-line (skeleton) of the edge regions, thinning or skeleton
algorithms can be applied. Obviously these midlines of edge areas are not
necessarily identical to the real edges. To improve the accuracy of edge location,
the properties of the pixel like the gradient or the Laplacian may be used for
extracting the most probable location of the edges. This can be done by the
analysis of the local neighbourhood of each pixel (non-maxima suppression) or
by global techniques (relaxation, Hough transformation). Non-maxima suppression is the most widely used method.
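A minimal sketch of non-maxima suppression, with the gradient direction quantised coarsely to horizontal/vertical for brevity (the usual scheme uses four directions); gr and gc stand for the slope images from the gradient step:

```python
import numpy as np

def non_maxima_suppression(magnitude, gr, gc):
    """Keep a pixel only if its gradient magnitude is a local maximum
    along the (coarsely quantised) gradient direction."""
    keep = np.zeros_like(magnitude, dtype=bool)
    for r in range(1, magnitude.shape[0] - 1):
        for c in range(1, magnitude.shape[1] - 1):
            if abs(gr[r, c]) > abs(gc[r, c]):            # mostly vertical gradient
                neighbours = (magnitude[r - 1, c], magnitude[r + 1, c])
            else:                                        # mostly horizontal gradient
                neighbours = (magnitude[r, c - 1], magnitude[r, c + 1])
            keep[r, c] = magnitude[r, c] >= max(neighbours) and magnitude[r, c] > 0
    return keep
```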
Edge Elements
The extraction of edgels is the first transition stage from the edge pixels in the
discrete image domain to the symbolic description of the edge. This step
contains the estimation of properties of the edge pixels required for subsequent
interpretation processes (e.g. real-valued coordinates, contrast, sharpness,
strength, type), which are stored as attributes of the symbolic edge elements.
Edge Streaks
The next step is to group all edgels, which belong to the same edge. One can
say that now the real detection of the image feature edge happens, but the
real edge is still represented as a list of edgels. The aggregation of the edge
elements can be done using local (edge tracking) or global techniques (Hough
transformation, dynamic programming, heuristic search algorithm).
The grouping process should ensure that each streak 1) consists of connected
edgels, where each pixel pair is connected by a non-ambiguous pixel path and 2)
delineates at most two regions (usually edges delineate two regions, except dead
ends or open edges, which are surrounded by the same region).
To satisfy the second criterion we define a streak as an edge pixel chain between
two edge pixels, which are either end pixel(s) and/or node pixel(s). According to
the number of neighbours in a N8-Neighbourhood we classify the pixels as node,
line or end pixels as shown in Fig. 9. Given the classification, the easiest
aggregation method is an edge following or edge tracking algorithm: first one has
to look for an unlabeled edge pixel, i.e. an edge pixel that does not
yet belong to an edge. If one is found, all direct and indirect
neighbours are tracked until an end or node pixel appears. All these collected edge pixels
belong to one edge and are labeled with a unique edge number.
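The classification-and-tracking idea might be sketched as follows, assuming a binary edge image with a one-pixel background border; pixels are classified by their number of N8 neighbours, and chains are collected until an end or node pixel is reached (a simplification of the procedure described above):

```python
import numpy as np

N8 = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def neighbour_count(edge, r, c):
    return sum(edge[r + dr, c + dc] for dr, dc in N8)

def track_edges(edge):
    """Group edge pixels into streaks: one neighbour = end pixel, two = line
    pixel, more = node pixel; tracking stops at node pixels."""
    labels = np.zeros(edge.shape, dtype=int)
    next_label = 0
    rows, cols = edge.shape
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            if edge[r, c] and labels[r, c] == 0:
                next_label += 1
                stack = [(r, c)]
                while stack:
                    pr, pc = stack.pop()
                    if labels[pr, pc]:
                        continue
                    labels[pr, pc] = next_label
                    if neighbour_count(edge, pr, pc) > 2:   # node pixel: stop here
                        continue
                    for dr, dc in N8:
                        nr, nc = pr + dr, pc + dc
                        if edge[nr, nc] and labels[nr, nc] == 0:
                            stack.append((nr, nc))
    return labels

edge = np.zeros((7, 7), dtype=bool)
edge[3, 1:6] = True                  # a single horizontal edge chain
print(track_edges(edge).max())       # -> 1 streak found
```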
Edge Approximation
Up to now, the extracted streaks are still defined in the discrete image model as
they are represented by a set of connected edge elements. Thus, for deriving a
symbolic description of the edges a last processing step is required. This step is
very important since the representation domain changes from the discrete image
raster to a continuous image model, the plane.
More complex approximation functions (e.g. higher order curves) could give better
results, but it would be too much hassle if you only look for straight lines.
Furthermore a polygon as a set of straight lines can also approximate a curved
edge. As usual, the choice of the approximation depends on what you want (or
the application requires). Here we look at straight-line fitting.
Approximation by Straight Lines: For the approximation of the edges by
straight lines many different approaches are possible like merging, splitting or
split and merge algorithms. The critical point is to find the breakpoints or corners,
which lead to the best approximation.
The merging algorithm sequentially follows an edge and considers each pixel to
belong to a straight line as long as it fits the line. If the current pixel does not fit
anymore, the line ends and a new breakpoint is established. A disadvantage of
this approach is its dependency on the merging order: starting from the other end
of the edge would probably lead to different breakpoints.
Splitting algorithms divide recursively the edges in (usually) two parts, until the
parts fulfill some fitting conditions. Considering an edge consisting of a
sequence of edge pixels P1, P2, ..., Pn, the end points P1 and Pn are
joined by a straight line. For each edge pixel, the distance to this line is
calculated. If the maximum distance is larger than a given threshold, the edge
segment is divided into two new segments at the position where the maximum
distance was found.
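A short sketch of this recursive splitting scheme (essentially a Ramer/Douglas-Peucker style recursion), assuming the edge is given as an ordered list of pixel coordinates:

```python
def split_edge(points, tol):
    """Recursively split an ordered pixel chain until every segment is
    approximated by the straight line between its end points within 'tol'."""
    if len(points) <= 2:
        return [points[0], points[-1]]
    (x1, y1), (x2, y2) = points[0], points[-1]

    def dist(p):
        # Perpendicular distance of point p to the line P1-Pn.
        x, y = p
        num = abs((y2 - y1) * x - (x2 - x1) * y + x2 * y1 - y2 * x1)
        den = ((y2 - y1) ** 2 + (x2 - x1) ** 2) ** 0.5
        return num / den if den else 0.0

    distances = [dist(p) for p in points[1:-1]]
    k = max(range(len(distances)), key=distances.__getitem__) + 1
    if distances[k - 1] <= tol:
        return [points[0], points[-1]]            # the chord is good enough
    # Split at the point of maximum distance and recurse on both halves.
    left = split_edge(points[:k + 1], tol)
    right = split_edge(points[k:], tol)
    return left[:-1] + right                      # avoid duplicating the break point

chain = [(0, 0), (1, 0), (2, 0), (3, 1), (4, 2), (5, 3)]
print(split_edge(chain, tol=0.5))                 # -> [(0, 0), (2, 0), (5, 3)]
```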
It is possible to combine the advantages of the merging and the splitting methods by
developing a split and merge algorithm. First we split, and then we do a merging
step by grouping lines if the new line fits the streak well enough, see Fig. 10.
The accuracy of the symbolic description, i.e. the edge parameters, can be
improved by applying a least squares estimation taking all edgels belonging to one
edge into account. The observation values are given by the real-valued
coordinates (xi, yi) of each edgel and the weights are defined by e.g. the squared
gradient magnitude. The covariance matrix of the estimated edge parameters
contains the accuracy of the edge. Thus, the uncertainty of the discrete image
information is preserved in the accuracy of the edges, which could be important
for the image interpretation processes.
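One possible reading of this weighted least squares fit is a weighted total least squares line through the edgels, sketched below; the weights stand in for the squared gradient magnitudes:

```python
import numpy as np

def fit_line_weighted(x, y, w):
    """Fit a straight line to edgel coordinates (x, y) with weights w.
    Returns the weighted centroid and the line direction (unit vector)."""
    x, y, w = map(np.asarray, (x, y, w))
    cx, cy = np.average(x, weights=w), np.average(y, weights=w)
    dx, dy = x - cx, y - cy
    # Weighted scatter matrix of the centred coordinates.
    S = np.array([[np.sum(w * dx * dx), np.sum(w * dx * dy)],
                  [np.sum(w * dx * dy), np.sum(w * dy * dy)]])
    eigvals, eigvecs = np.linalg.eigh(S)
    direction = eigvecs[:, -1]          # eigenvector of the largest eigenvalue
    return (cx, cy), direction

# Edgels roughly along y = x, weighted by squared gradient magnitude.
x = [0.0, 1.1, 2.0, 2.9, 4.0]
y = [0.1, 1.0, 2.1, 3.0, 3.9]
w = [4.0, 9.0, 16.0, 9.0, 4.0]
print(fit_line_weighted(x, y, w))       # centroid near (2, 2), direction ~ (0.71, 0.71)
```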
EXTRACTION OF REGIONS
General Principles for Region Extraction
DEFINITION
Regions are image areas which fulfill a certain similarity criterion; we call such
regions blobs. A similarity or homogeneity criterion could be the intensity value of the
image pixel or some texture property of the surrounding area of the pixel. The
result of such a region extraction should divide or segment the image into a
number of blobs. Ideally the union of these blobs will give the image again. The
regions themselves should be connected and bounded by simple lines.
REPRESENTATION
Depending on the strategy of the region extraction, we distinguish between
different segmentation results.
Incomplete segmentation: The image is first divided into homogeneous and
heterogeneous areas. The latter (we call those areas background) do not
fulfill the homogeneity criterion and therefore do not fulfill the above definition
exactly.
Complete Segmentation: The image is completely divided into regions, fulfilling
the definition as given above for the discrete image, too. That might lead to
conflicting topology of the image regions, depending on the definition of the
neighborhood (N8 or N4) (see PAVLIDIS 1977), but also to inaccurate region
boundaries, depending on the cost of the approach.
The final symbolic representation of blobs consists of geometric, radiometric and
relational attributes. A blob itself can be represented by its boundaries (if the blob
contains holes, the blob has more than one boundary) or by a list of pixels inside
the blob. Blob boundaries define the location of the blob. Representing blob
boundaries is equivalent to representing image edges.
Geometric attributes of blobs are size, shape, center of gravity, mean direction,
etc. Algorithms for extracting these attributes can be found in the literature,
particularly in the field of binary image analysis. Radiometric attributes are e.g.
the mean intensity within the blob, variances of the intensities or texture parameters.
Lists of adjacent blobs, mutual boundaries, junctions and corners are examples
of relational attributes.
APPLICATIONS
Region information has the advantage that it covers geometrically large parts of
the image. Therefore it can be used for several applications like compression or
interpretation tasks.
Data compression: Grouping all pixels, which are connected in image space
and have similar properties to one object (i.e. the blob) and representing the
object by characteristics attributes, reduces the amount of data and the
redundancy of information.
Analysing range images: Region-based segmentation algorithms were found to
be more robust when analysing range images.
Binary image analysis: In many cases region extraction is a prerequisite for
binary image analysis, widely used in industrial applications.
High-level image interpretation: In many cases object models consist of the
structural description of object parts, where the interior of each part is assumed
to have similar surface and reflectance properties. Therefore, extracting blobs
and their attributes is quite useful for object recognition.
BASIC APPROACHES
Given a digital image with a discrete image function, region extraction is the
process of grouping pixels to regions according to connectivity and similarity
(homogeneity). The large number of region extraction methods can be classified
in several ways. One possibility is to separate the methods by the number of
pixels which are used for the grouping decision; they are therefore called local or
global techniques. Further on we distinguish the methods depending on where
the grouping is done:
In the first place the grouping process is defined in the image domain. That
means that the decision whether connected pixels can be merged or should be split
is made directly by the analysis of the properties of adjacent pixels. Thus, both
the similarity and the connectivity are considered in one processing step.
Examples of this type are region growing or region merging, region splitting and
split and merge algorithms.
The second approach applies the similarity and connectivity evaluation in two
separate steps: The goal is first to analyse the discriminating properties of the
pixels of the entire image and use the result to define several classes of objects.
Examples are thresholding and cluster techniques. This is done outside the
image raster by storing all pixel properties in a so-called measurement space
(e.g. a histogram). Then, the definition of the classes can be used to classify the
pixels: Going back to the image domain, each pixel is labeled with the identity
number of its class. In the second step, pixels of the same class which are
connected in the image are grouped to regions. Further improvements
can be obtained by investigating not only the pixel properties themselves, but a
mean property of the local neighbourhood or the properties of already extracted
regions. Local neighbourhood properties are e.g. the mean values and variances,
but also gradients or Laplacians. The latter are also used for many edge
detectors. Using gradients or Laplacians, edges and regions can be extracted by
the same operator, which directly takes the duality of regions and edges into
account. Combinations of different techniques provide further improvements by
consequently using their positive properties. Criteria are the accuracy of the
region boundaries, the ability to place boundaries in weak-contrast areas, and the
robustness to noisy data.
Region Merging: Assuming the image area is completely partitioned into
regions, the aim is to merge adjacent regions which are not significantly different.
The main problem of region extraction by region growing algorithms is the
question of the merging order. Except for methods working in a highly parallel
manner (e.g. relaxation techniques), the result depends on which region was
extracted first and which of the adjacent pixels or regions are considered first
(usually more than one neighbour fulfils the homogeneity criterion). The
determination of the best merging candidate is a time-consuming search
problem and is difficult to solve. Less complex approaches consist of well
(and locally) defined merging rules.
Region Splitting and Merging: Adjacent regions are merged if their union
is homogeneous, and single regions are split if they do not meet the homogeneity
criterion. The process continues until no more merging or splitting can be done. A
further advantage of this method is that it is faster than a single splitting or merging
process.
Drawbacks
The independent application of the techniques presented here reveals a number
of drawbacks:
Techniques aiming at complete partitioning of the image area like region-based approaches lead to uncertain or even artificial boundaries.
Region-based techniques conceptually are not able to incorporate mid-level knowledge such as the straightness of the boundaries.
Edge-based techniques normally cannot guarantee closed boundaries,
thus do not lead to a complete partitioning. Edges are likely to be broken
or do not represent the boundaries of the regions (spurious edges)
because of image noise.
Corner detectors usually don't work at junctions. All point detectors have
difficulties at smooth corners.
The used models are either wrong or at least not adaptive to the local
image content (e.g. edge detection at junctions).
The beauty of an expert system is that, because true experts such as foresters
or geologists create the rules (also called a knowledge base), non-experts can
use the system successfully.
In terms of satellite images, the knowledge base identifies features by applying
questions and hypotheses that examine pixel values, relationships with other
features and spatial conditions, such as altitude, slope, aspect and shape. Most
importantly, the knowledge base can accept inputs of multiple data types, such
as digital elevation models, digital maps, GIS layers and other pre-processed
thematic satellite images, to make the necessary assessments.
Schenk (Schenk, 1999) proposes the term autonomous for a system that
can perform without human interaction. Even those systems which are called
automatic (like automatic DEM generation) are not purely automatic, as they
solve the task only up to a certain percentage of errors. Extending this, Heuel (Heuel,
2000) proposes to classify the degree of automation of systems using the
terms quantitative and qualitative interaction: methods are defined as automatic if
only simple yes/no decisions or a selection of alternatives, i.e. qualitative interaction,
are needed; they are regarded as semi-automatic if qualitative decisions and
quantitative input parameters are needed.
We need to initialize the extraction process, we might need to interact during runtime and we certainly need to validate or correct the results. The less interaction
we need, the higher is the degree of automation. We expect from the integration
of automatic processes, that the overall efficiency of the system is increased, but
we know, that those processes can give erroneous results, which are costly for
the user and thus may decrease the efficiency of the system. We may want to
reduce the level of training by avoiding complexity and skill requirements in
decision-making, but we also want to reduce the number of manual actions in the
collection phase. Here we should not only refer to the amount of human
interaction in terms of time and number of mouse operations, but also to the type
of interaction needed. We certainly have to select parameters according to the
task we want to solve and the data, which is available. This is valid for all
systems. We need to give the image numbers of overlapping photographs, we
need to define the units (m or feet) or we need to give the type of features
searched for, like buildings and/or roads. We have to provide instructions on
how to collect buildings in an interactive system or we need to give a set of
building models and some min-max values if we want to extract them
automatically. If we need to get deeper involved in the algorithms we might need
to give thresholds and steering parameters (window sizes, minimal angle
difference, minimal line length in the image etc), which are not always
interpretable. Sometimes it is difficult to connect them to the task and image
material. This holds also for some stopping criteria for the algorithms, like
maximal number of iterations etc. Also the type of post editing can vary. We
might need to correct single vertex or corner points, or the topology of whole
structures or we need to manually check for completeness.
Summarizing the above statements we propose the following scheme, starting
from an interactive system, where we can solve all tasks required, to a semi-automatic system, where we interact during the measurement phase, to an
automated system, where the interaction is focused at the beginning and the end
of the automatic process, and to an autonomous system, which is beyond the horizon
right now.
1. Interactive system (purely manual measurement, no automation for any
measurement task).
cursor snaps to the local terrain surface. Thus, the operator is relieved from a
precise stereoscopic measurement and can therefore increase the speed of
data acquisition. The second type of point measurement algorithms is used to
make the cursor snap to a specific object corner. These algorithms can be
used for monoplotting as well as for stereoplotting. For monoplotting the
operator approximately indicates the location of an object corner to be
measured. The image patch around this approximate point will usually contain
grey value gradients caused by the edges of the object. By applying an
interest operator (see e.g. [Förstner and Gülch, 1987]) to this patch the
location of the object corner can be determined. Thus, such utilities can make
the cursor snap to the nearest point of interest. When using the same
principle for stereoplotting, the operator has to supply an approximate 3D
position of the object corner. The interest operator can then be applied to both
stereo images, whereas the estimated 3D corner position will be constrained
by the known epipolar geometry. For the measurement of house roof corners,
this procedure was reported to double the speed of data acquisition and
reduce the operator fatigue [Firestone et al., 1996].
Extraction of lines: The extraction of lines from digital images has been a
topic of research for many years in the area of computer vision [Rosenfeld,
1969, Hueckel, 1971, Davis, 1975, Canny, 1986]. First attempts to extract
linear features from digital aerial and space imagery were reported in [Bajcsy
and Tavakoli, 1976, Nagao and Matsuyama, 1980]. Semi-automatic
algorithms have been developed for the extraction of roads. These algorithms
can be classified into two categories: algorithms using deformable templates
and road trackers.
Deformable templates:
Before starting algorithms using deformable templates the operator needs to
provide the approximate outline of the road. This initial template of the road is
usually represented by a polygon with a few nodes near to the road to be
measured. The task of the algorithm is to refine the initial template to a new
polygon with many more nodes that accurately outline the road edges or the
road centre (depending on the road model used).
This is achieved by deforming the template such that a combination of two
criteria is optimised: the template should coincide with image pixels with high
grey value gradients and the shape of the template should be relatively
smooth. The latter criterion is often accomplished by constraining the (first
and) second derivatives of the template. This constraint is needed for
regularisation but also leads to more likely outline results, since road
shapes generally are quite smooth. Most algorithms of this kind are based on
so-called snakes [Kass et al., 1988]. The snakes approach uses an energy
function in which the two optimisation objectives are combined.
After computing the energy gradients due to changes in the positions of the
polygon nodes the optimal direction for the template deformation can be
found by solving a set of differential equations. In an iterative process the
polygon nodes are shifted in this optimal direction. The resulting behaviour of
the template looks like that of a moving snake, hence the name. Whereas
snakes were initially formulated for optimally outlining linear features in a
single image, they can also be used to outline a feature in 3D object space by
combining grey value gradients from multiple images together with the
exterior orientation of these images [Trinder and Li, 1995, Neuenschwander
et al., 1995].
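As a rough illustration of the snake idea (greatly simplified: a greedy, discrete update on a closed polygon rather than the differential-equation formulation described above), each node is moved to the neighbouring pixel that best trades off high image gradient against low polygon curvature:

```python
import numpy as np

def greedy_snake_step(nodes, gradient_magnitude, alpha=0.5):
    """One greedy iteration of a (much simplified) snake: each node of the
    closed polygon moves to the neighbouring pixel minimising an energy that
    combines internal smoothness and external attraction to high gradients."""
    new_nodes = []
    for i, (r, c) in enumerate(nodes):
        prev = np.array(nodes[i - 1])
        nxt = np.array(nodes[(i + 1) % len(nodes)])
        best, best_energy = (r, c), np.inf
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                p = np.array([r + dr, c + dc])
                if not (0 <= p[0] < gradient_magnitude.shape[0]
                        and 0 <= p[1] < gradient_magnitude.shape[1]):
                    continue
                curvature = np.sum((prev - 2 * p + nxt) ** 2)        # smoothness term
                external = -gradient_magnitude[p[0], p[1]]           # attract to edges
                energy = alpha * curvature + external
                if energy < best_energy:
                    best_energy, best = energy, (int(p[0]), int(p[1]))
        new_nodes.append(best)
    return new_nodes
```

Repeating this step until the nodes no longer move mimics the iterative shifting of polygon nodes described above, although a production implementation would use the energy-gradient formulation of Kass et al.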
This snakes approach has also been extended to outline both sides of a road
simultaneously. More research is conducted to further improve the efficiency
of mapping with snakes by reducing the requirements on the precision of the
initial template provided by the operator and by incorporating scene
knowledge into the template deformation process [Neuenschwander et al.,
1995, Fua, 1996].
Road trackers
In the case of snakes, the operator needs to provide a rough outline of the
complete road to be measured. In contrast, the input for road trackers only
consists of a small road segment outlined by the operator. The purpose of the
road tracker is then to find the adjacent parts of the road. Most road trackers are
based on matching grey value profiles [McKeown and Denlinger, 1988, Quam
and Strat, 1991, Vosselman and Knecht, 1995].
Based on the initial road segment outlined by the operator, a characteristic grey
value profile of the road is derived. Furthermore, the local direction and curvature
of the road is estimated. This estimation is used to predict the position of the road
at some step size after the initial road segment. At this position and
perpendicular to the predicted road direction at this position a grey value profile is
extracted from the image. By matching this profile with the characteristic road
profile a shift between the two profiles can be determined. Based on this shift, an
estimate for the road position along the extracted profile is obtained. By
incorporating previously estimated positions, other road parameters like the road
direction and the road curvature can also be updated. The updated road
parameters can then be used to make a next prediction of the road position at
some step size further along the road. This recursive process of prediction,
measurement by profile matching and updating the road parameters can be
implemented elegantly in a Kalman filter [Vosselman and Knecht, 1995].
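The profile matching step might be sketched as follows: shift the newly extracted profile against the characteristic road profile and keep the shift with the smallest mean squared grey value difference. The Kalman filter machinery of the cited work is omitted here, and the profiles are purely synthetic:

```python
import numpy as np

def match_profile(reference, observed, max_shift=5):
    """Return the integer shift of 'observed' that best matches 'reference'
    (smallest mean squared grey value difference over the overlap)."""
    best_shift, best_cost = 0, np.inf
    n = len(reference)
    for s in range(-max_shift, max_shift + 1):
        lo, hi = max(0, s), min(n, n + s)
        obs = observed[lo:hi]
        ref = reference[lo - s:hi - s]
        cost = np.mean((ref - obs) ** 2)
        if cost < best_cost:
            best_cost, best_shift = cost, s
    return best_shift

# Characteristic road profile (bright road on darker background) and an
# observed profile shifted by two pixels.
road = np.array([10, 10, 80, 90, 90, 80, 10, 10, 10, 10], dtype=float)
observed = np.roll(road, 2)
print(match_profile(road, observed))   # -> 2 (road position moved by two pixels)
```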
The road tracking continues until the profile matching fails at several consecutive
predicted positions, i.e. it stops when several consecutive extracted profiles show little
correspondence with the characteristic grey value profile. Some characteristic
results are shown in figure 3. Trees along the road or road crossings and
junctions can often explain matching failures. Due to these objects the grey value
profiles extracted at those positions deviate substantially from the characteristic
profile. By making predictions with increasing step sizes, the road tracker is often
able to jump over these kinds of obstacles and continue the outlining of the road.
Extraction of areas
Due to the lack of modeled knowledge about objects, the computer-supported
extraction of area features is more or less limited to areas that are homogeneous
with respect to some attribute. Of course, in images the most common attributes
to look at are the pixel's grey value, colour and texture attributes. Algorithms that
extract homogeneous grey value areas can facilitate the extraction of objects like
water areas and house roofs. The most common approach is to let the operator
indicate a point on the homogeneous object surface and let an algorithm find the
outlines of that surface.
An example can be seen in figure 4. It is clear that the results of such an
algorithm still require some editing by an operator. Overhanging trees at the left
side of the river and trees that cast dark shadows at the right side of the river
cause differences between the bounds of the homogeneous area and the river
borders, as they should be mapped. Similar differences will also arise when
using these techniques to extract building roofs. Most objects are not
homogeneous enough to allow a perfect delineation. Still, the majority of the lines
to be mapped may be at the correct place. Thus, editing the results of such an
area feature extraction will often be faster than a complete manual mapping
process. Firestone et al. [1996] report the use of this technique for mapping
lakeshores. Especially for small scale mapping this can be very efficient since the
water surface generally appears homogeneous and the disturbing effects of trees
along the shoreline, as in the example, may be negligible at small scale.
The algorithms used to find the boundaries of a homogeneous area are usually
based on the region-growing algorithm [Haralick and Shapiro, 1992]. Starting at
the pixel indicated by the operator, this algorithm checks whether an adjacent pixel has similar
attributes (e.g. grey value). If the difference is below some threshold, the two
pixels are merged to one area. Next, the attributes of another pixel adjacent to
this area are examined and this pixel is also merged with the area if the attribute
differences are small. In this way a homogeneous area is grown pixel by pixel.
This process is repeated until all pixels that are adjacent to the grown area have
significantly different attributes.
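A minimal sketch of such a seeded region-growing algorithm, in which a 4-connected candidate pixel joins the region when its grey value is close to the running region mean (one common variant; the threshold and the synthetic "lake" are illustrative):

```python
import numpy as np
from collections import deque

def grow_region(image, seed, threshold):
    """Grow a homogeneous region from the operator-selected seed pixel:
    a 4-connected pixel joins if its grey value differs from the current
    region mean by less than 'threshold'."""
    rows, cols = image.shape
    mask = np.zeros((rows, cols), dtype=bool)
    mask[seed] = True
    region_sum, region_count = float(image[seed]), 1
    frontier = deque([seed])
    while frontier:
        r, c = frontier.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols and not mask[nr, nc]:
                mean = region_sum / region_count
                if abs(float(image[nr, nc]) - mean) < threshold:
                    mask[nr, nc] = True
                    region_sum += float(image[nr, nc])
                    region_count += 1
                    frontier.append((nr, nc))
    return mask

# A bright "lake" of value 200 inside darker terrain of value 50.
img = np.full((8, 8), 50.0)
img[2:6, 3:7] = 200.0
print(grow_region(img, seed=(3, 4), threshold=30.0).astype(int))
```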
used to find the best correspondence between the edges of the object model and
the location of high gradients in the image (middle image). Especially in the presence
of neighboring edges with high contrast (like the windows on the house front in the example) the resulting
fit often does not correspond to the desired result and therefore requires one or
more additional corrective measurements by the operator (right image). Different
approaches are being used to find the optimal alignment of the object model to
the image. Fua [1996] extended the above described snake algorithm for fitting
object models. The energy function is defined as a function of the sum of the
grey value gradients along the model edges. Derivatives of this energy function
with respect to changes in the co-ordinates of the object corners determine the
optimal direction for changes in these co-ordinates, whereas constraints on the
co-ordinates ensure that a valid building model with parallel and rectangular
edges is maintained. Lowe [1991] and Lang and Schickler [1993] use parametric
object descriptions and determine the optimal parameter values by fitting the
object edges to edge pixels (pixels with high grey value gradients) and extracted
linear edges respectively. Veldhuis [1998] analysed the approaches of Fua
[1996] and Lowe [1991] with respect to suitability for mapping.