Data Processing
This chapter provides an overview of the main techniques used to process satellite images and to aid the interpretation of Remote Sensing satellite data, also for archaeological purposes.
The choice of specific techniques or algorithms to use depends on the goals of each
individual project and on the data-set available.
This overview starts with a discussion of pre-processing techniques, which include
radiometric correction to correct for uneven sensor response over the whole image and
geometric correction to correct for geometric distortion due to Earth's rotation and other
imaging conditions. The image may also be transformed to conform to a specific map
projection system. Furthermore, if accurate geographical location of an area on the image
needs to be known, ground control points (GCPs) are used to register the image to a precise
map (geo-referencing). Alternatively, and particularly when georeferencing is not important,
it is possible to realize an image to image registration.
In order to aid visual interpretation, the visual appearance of the objects in the image can be
improved by image enhancement, which includes radiometric and geometric enhancement. In
this context, Vegetation Indices, Principal Component Analysis, the Tasseled Cap
Transformation, edge detection algorithms and spatial statistics are discussed.
Then classification algorithms are described as a tool to discriminate different land-cover
types in an image using spectral features.
Finally, in order to obtain more information than can be derived from any single sensor,
data fusion techniques are presented. In particular, emphasis is given to the fusion of an
image acquired by a multi-spectral sensor with a lower spatial resolution and one acquired by
a panchromatic sensor with a higher spatial resolution.
Radiometric distortions can arise from the atmosphere, acting as the transmission medium
through which radiation must travel from its source to the sensor, and can also be a result of
instrumentation effects (Richards, 2006).
Geometric correction aims to reduce image geometry errors, which can arise in many ways:
i. The rotation of the earth during image acquisition;
ii. The finite scan rate of some sensors;
iii. The wide field of view of some sensors;
iv. The curvature of the earth;
v. Sensor non-idealities;
vi. Variations in platform altitude, attitude and velocity;
vii. Panoramic effects related to the imaging geometry.
There are two techniques to correct the various types of geometric distortion in digital
images: (i) to model the nature and magnitude of the source of distortion and use these
models to establish correction formulae (this approach is effective when the types of
distortion are well known); (ii) to establish a mathematical relationship between the addresses
of pixels in an image and the corresponding coordinates of those points on the ground.
Geometric correction must be applied to each band of the image. If the bands are well
registered to each other, the steps taken to correct one band can be applied to all the
remaining bands.
The models used to perform these corrections are not described in this chapter because the
satellite images used in this thesis were already radiometrically and geometrically corrected;
nevertheless, several references are available that deal in depth with these corrections.
2.2 Image registration
An image can be registered to a map coordinate system using correction techniques and
therefore have its pixels addressable in terms of map coordinates (easting and northing, or
latitudes and longitudes) rather than pixels and line numbers.
Alternatively, and particularly when georeferencing is not important, it is possible to realize
an image to image registration. In this case an image is chosen as a master to which the other
is to be registered.
This registration is very useful to compare images of the same area acquired in different
periods, because registered images facilitate a pixel-by-pixel comparison.
These processing techniques can be divided into image enhancement and image
classification.
The purpose of image enhancement is to improve the visual impact of the image and to help
pattern recognition (Richards, 2006). It includes radiometric enhancement and geometric
enhancement.
Classification is the procedure most often used in image processing for grouping all the
pixels in an image into a finite number of individual classes or categories to produce a
thematic representation.
A Vegetation Index (VI) is formed from combinations of several spectral values
that are added, divided, or multiplied in a manner designed to yield a single value that
indicates the amount or vigour of vegetation within a pixel. High values of the VI identify
pixels covered by substantial proportions of healthy vegetation. The simplest form of VI is a
ratio between two digital values from separate spectral bands.
Some band ratios have been defined by applying knowledge of the spectral behaviour of
living vegetation (Campbell, 2002).
Band ratios are quotients between measurements of reflectance in separate portions of the
spectrum. Ratios are effective in enhancing or revealing latent information where there is an
inverse relationship between two spectral responses to the same biophysical phenomenon. If
two features have the same spectral behaviour, ratios provide little additional information;
but if they have quite different spectral responses, the ratio between the two values provides
a single value that concisely expresses the contrast between the two reflectances (Campbell,
2002).
For living vegetation, the ratioing strategy can be especially effective because of the
inverse relationship between vegetation brightness in the red and infrared regions. That is,
absorption of red light (R) by chlorophyll and strong reflection of infrared (IR) radiation by
mesophyll tissue assures that the red and near-infrared values will be quite different, and that
the ratio (IR/R) of actively growing plants will be high. Non-vegetated surfaces, including
open water, man-made features, bare soil, and dead or stressed vegetation will not display
this specific spectral response, and the ratios will decrease in magnitude. Thus, the IR/R
ratio can provide a measure of photosynthetic activity and biomass within a pixel.
(Campbell, 2002)
The IR/R ratio is only one of many measures of vegetation vigour and abundance. The
green/red (G/R) ratio is based upon the same concepts used for the IR/R ratio, although it is
considered less effective.
One of the most widely used VIs, developed by Rouse et al. in 1974, is known as the
normalized difference vegetation index (NDVI):
NDVI = \frac{IR - R}{IR + R}
This index in principle conveys the same kind of information as the IR/R and G/R ratios,
but is constrained to vary within limits that provide desirable statistical properties in the
resulting values (Campbell, 2002; Jensen, 2000).
Although such ratios have been shown to be powerful tools for studying vegetation, they
must be used with care if the values are to be rigorously (rather than qualitatively)
interpreted. Values of ratios and VIs can be influenced by many factors external to the plant
leaf, including viewing angle, soil background, and differences in row direction and spacing
in the case of agricultural crops. Ratios may also be sensitive to atmospheric degradation:
because atmospheric path length varies with viewing angle, values calculated using off-nadir
satellite data vary according to position within the image (Campbell, 2002).
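As a minimal illustration of how an NDVI layer can be derived from two co-registered bands, the following Python/NumPy sketch may be considered; the band arrays, their values and the scaling are assumptions used only for the example, not data from this thesis:

import numpy as np

def ndvi(nir, red, eps=1e-6):
    """Compute the Normalized Difference Vegetation Index.

    nir, red : 2-D arrays of the near-infrared and red bands, already
               co-registered and expressed as reflectance (or DN).
    eps      : small constant to avoid division by zero on dark pixels.
    """
    nir = nir.astype(np.float64)
    red = red.astype(np.float64)
    return (nir - red) / (nir + red + eps)

# Example with synthetic 3x3 bands: vegetated pixels give values close to +1,
# bare or sparsely vegetated pixels give values near zero.
red = np.array([[0.05, 0.05, 0.30],
                [0.05, 0.10, 0.30],
                [0.05, 0.10, 0.30]])
nir = np.array([[0.60, 0.55, 0.32],
                [0.58, 0.40, 0.31],
                [0.60, 0.45, 0.28]])
print(ndvi(nir, red))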
QuickBird normalized difference vegetation index (NDVI) data were used in my Dissertation
(2005) in order to assess their capability in the field of archaeological prospection. The
investigations were performed for a test case (Jure Vetere, in the south of Italy) that is
characterized by the presence of dense vegetation mainly composed of herbaceous plants.
The results showed the high capability of QuickBird NDVI to enhance the typical surface
anomalies linked to the presence of buried archaeological remains. The detected anomalies
were confirmed by independent investigations based on geophysical prospections performed
in 2005.
2.3.1.2 Principal Component Analysis (PCA)
The covariance between two input data sets k1 and k2 is computed as (formula 8):

cov(k_1, k_2) = \frac{1}{nm} \sum_{i=1}^{n} \sum_{j=1}^{m} [SB(i,j,k_1) - \mu_{k_1}] \, [SB(i,j,k_2) - \mu_{k_2}]   (8)

where k1 and k2 are two input images, SB(i,j,k) is the digital number (DN) value of the
processed image k in row i and column j, n is the number of rows, m is the number of
columns, and \mu_k is the mean of all pixel SB values of image k.
The percent of total dataset variance explained by each component is obtained by formula 9.
\%_i = 100 \cdot \frac{\lambda_i}{\sum_{i=1}^{N} \lambda_i}   (9)

where \lambda_i is the eigenvalue associated with component i and N is the number of components.
Finally, a series of new image layers (called eigenchannels or components) are computed
(formula 10) by multiplying, for each pixel, the eigenvectors of S by the original values of
the pixel in the input images:

P_i = \sum_{k=1}^{N} P_k \, u_{k,i}   (10)

where P_i indicates the data set in component i, u_{k,i} is the eigenvector element for
component i in input data k, P_k is the DN of input data k, and N is the number of input data sets.
A loading, or correlation R, of each component i with each input data k can be calculated by
using formula 11,
R_{k,i} = \frac{u_{k,i} \, \lambda_i^{1/2}}{\mathrm{var}(k)^{1/2}}   (11)
where var(k) is the variance of input data k (obtained by reading the k-th diagonal element of
the covariance matrix).
The PCA transforms the input data set into new components that should make the
identification of distinct features and surface types easier. The major portion of the variance
is associated with homogeneous areas, whereas localized surface anomalies are enhanced in
the later components, which contain less of the total dataset variance. For this reason the
later components may represent information relating to small areas, or essentially noise,
and in the latter case they must be disregarded. Some problems can arise from the fact that
the eigenvectors cannot have a general and universal meaning, since they are extracted from
the specific data series.
When a PCA is applied to a satellite image, the parameters defined above become the
following: k1 and k2 are two input spectral channels, and SB(i,j,k) is the spectral value of
the given channel k in row i and column j. The percentage of total dataset variance explained
by each component is still obtained by formula (9), and the loading, or correlation R, of each
component i with each input channel k is calculated by formula (11). The PCA thus
transforms the input multispectral bands into new components that should make the
identification of distinct features, surface types and localized anomalies easier.
In archaeology, PCA has been usefully applied for linear pattern detection and spatial
filtering of Landsat 7 images, for the detection of Pre-Hispanic pathways in Aztec cities
within and outside the Valley of Mexico (Argote-Espino & Chávez, 2005); for the
discrimination of surface archaeological remains in Hisar (southwest Turkey) (De Laet,
2007); and for the extraction of land patterns, useful for palaeogeographic and
palaeoenvironmental investigations, in Metaponto, on the Ionian coast of Southern Italy
(Masini & Lasaponara, 2006).
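The following Python/NumPy sketch outlines the procedure of formulas (8)-(11) on a stack of spectral bands; the band array, its shape and the random test values are hypothetical, and a generic eigen-decomposition is used rather than any specific software routine:

import numpy as np

def pca_bands(bands):
    """Principal Component Analysis of a multispectral image.

    bands : array of shape (N, rows, cols) holding N spectral channels.
    Returns the components (same shape), the eigenvalues, the percentage
    of variance explained by each component and the loadings R[k, i].
    """
    n_bands, rows, cols = bands.shape
    X = bands.reshape(n_bands, -1).astype(np.float64)     # one row per band

    # Covariance matrix between bands (formula 8).
    S = np.cov(X)

    # Eigen-decomposition of S, sorted by decreasing eigenvalue.
    eigval, eigvec = np.linalg.eigh(S)
    order = np.argsort(eigval)[::-1]
    eigval, eigvec = eigval[order], eigvec[:, order]

    # Percentage of total variance explained by each component (formula 9).
    percent = 100.0 * eigval / eigval.sum()

    # Components as weighted sums of the (mean-centred) bands (formula 10).
    mean = X.mean(axis=1, keepdims=True)
    comps = eigvec.T @ (X - mean)

    # Loadings: correlation of each component with each input band (formula 11).
    loadings = eigvec * np.sqrt(eigval)[np.newaxis, :] / np.sqrt(np.diag(S))[:, np.newaxis]

    return comps.reshape(n_bands, rows, cols), eigval, percent, loadings

# Hypothetical 4-band image, 100x100 pixels.
bands = np.random.rand(4, 100, 100)
pcs, lam, pct, R = pca_bands(bands)
print(pct)   # variance explained by PC1..PC4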
2.3.1.3 Tasseled Cap Transformation (TCT)
The TCT, also known as the Kauth-Thomas technique, was devised for enhancing the spectral
information content of satellite data. The TCT is a linear affine transformation,
substantially based on the conversion of a given input channel data set into a new data set of
composite values. The transformation depends on the considered sensor.
Usually just three composite variables (tasseled cap transform bands) are used:
TCT-band 1, TCT-band 2 and TCT-band 3.
In particular, TCT-band 1 is a weighted sum of all spectral bands and can be interpreted as
the overall brightness or albedo at the surface. TCT-band 2 primarily measures the contrast
between the visible bands and near-infrared bands and is similar to a vegetation index. The
TCT-band 3 can be interpreted as a measure of soil and plant moisture.
The original TCT was derived (Kauth and Thomas, 1976) for the four bands of the Landsat
MSS sensor.
A1 A2 A3 A4
Brightness 0.32331 0.60316 0.67581 0.26278
Greenness -0.28317 -0.66006 0.57735 0.38833
Yellowness -0.89952 0.42830 0.07592 -0.04080
Later, the TCT was extended to the Landsat TM (Crist and Cicone, 1984) and ETM sensors (as
available, for example, in a routine of the PCI Geomatica software).
For TM data:
T.C. = A1*(TM1) + A2*(TM2) + A3*(TM3) + A4*(TM4) + A5*(TM5) + A7*(TM7)
9
For TM data the coefficients are as follows:

            A1       A2       A3       A4       A5       A7
Brightness  0.3037   0.2793   0.4743   0.5585   0.5082   0.1863
Greenness  -0.2848  -0.2435  -0.5436   0.7243   0.0840  -0.1800
Wetness     0.1509   0.1973   0.3279   0.3406  -0.7112  -0.4572

For ETM data the coefficients are:

            A1       A2       A3       A4       A5       A7
Brightness  0.1544   0.2552   0.3592   0.5494   0.5490   0.4228
Greenness  -0.1009  -0.1255  -0.2866   0.8226  -0.2458  -0.3936
Wetness     0.3191   0.5061   0.5534   0.0301  -0.5167  -0.2604
All the existing TCTs are performed on a pixel basis to best show the underlying
structure of the image by using weighted sums of the input channels.
Later, the TCT was also extended to the IKONOS (Horne, 2003) and QuickBird
sensors (Lasaponara and Masini, 2007).
The weighted sums developed by Horne (2003) for the IKONOS input channels were:
TCT IKONOS-band 1 = 0.326 BLUE + 0.509 GREEN + 0.560 RED + 0.567 NIR (1)
TCT IKONOS-band 2 = -0.311 BLUE - 0.356 GREEN - 0.325 RED + 0.819 NIR (2)
TCT IKONOS-band 3 = -0.612 BLUE - 0.312 GREEN + 0.722 RED - 0.081 NIR (3)
TCT IKONOS-band 4 = -0.650 BLUE + 0.719 GREEN - 0.243 RED - 0.031 NIR (4)
The application of the TCT to the QuickBird spectral channels was performed in two
different ways (Lasaponara and Masini, 2007): firstly, the weighted sums devised for ETM
imagery were adopted, solely considering the values for the BLUE, GREEN, RED and NIR
channels (see equations 5 to 7); secondly, the weights specifically developed for IKONOS
data were used (see equations 1 to 4).
TCT ETM-band 1 = 0.1544 BLUE + 0.2552 GREEN + 0.3592 RED + 0.5494 NIR (5)
TCT ETM-band 2 = -0.1009 BLUE - 0.1255 GREEN - 0.2866 RED + 0.8226 NIR (6)
TCT ETM-band 3 = 0.3191 BLUE + 0.5061 GREEN + 0.5534 RED + 0.0301 NIR (7)
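A minimal sketch of this first approach (the ETM-derived weights of equations 5 to 7 applied to the four QuickBird multispectral channels) could be written in Python/NumPy as follows; the array names, sizes and test values are illustrative assumptions:

import numpy as np

# ETM-derived Tasseled Cap coefficients restricted to BLUE, GREEN, RED, NIR
# (equations 5 to 7): brightness, greenness and wetness respectively.
TCT_ETM_COEFFS = np.array([
    [ 0.1544,  0.2552,  0.3592,  0.5494],   # TCT-band 1 (brightness)
    [-0.1009, -0.1255, -0.2866,  0.8226],   # TCT-band 2 (greenness)
    [ 0.3191,  0.5061,  0.5534,  0.0301],   # TCT-band 3 (wetness)
])

def tasseled_cap(blue, green, red, nir, coeffs=TCT_ETM_COEFFS):
    """Apply a Tasseled Cap Transformation on a per-pixel basis.

    Each input is a 2-D array (one spectral channel); the result is a stack
    of TCT bands obtained as weighted sums of the input channels.
    """
    stack = np.stack([blue, green, red, nir]).astype(np.float64)  # (4, rows, cols)
    return np.tensordot(coeffs, stack, axes=([1], [0]))           # (3, rows, cols)

# Hypothetical 4-channel QuickBird subset.
b, g, r, n = (np.random.rand(50, 50) for _ in range(4))
brightness, greenness, wetness = tasseled_cap(b, g, r, n)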
Lasaponara and Masini (2007) applied the Tasseled Cap Transformation (TCT) to QuickBird
multispectral images for extracting archaeological features linked to ancient human
transformations of the landscape. The investigation was performed on the Metaponto area.
The response of a template (filter window) centred at pixel (i,j) can be written as:

r(i,j) = \sum_{m=1}^{M} \sum_{n=1}^{N} \phi(m,n) \, t(m,n)

where \phi(m,n) is the pixel brightness value in a defined location within the window and
t(m,n) is the corresponding template entry.
Fig. 1.2 - Types of edges: (a) step edge; (b) ramp edge; (c) line edge; (d) roof edge.
Edge detection involves three main steps (Senthilkumaran and Rajesh, 2009):
1) Filtering: images are often corrupted by random variations in intensity values, called
noise. Some common types of noise are salt-and-pepper noise, impulse noise and Gaussian
noise. Salt-and-pepper noise contains random occurrences of both black and white intensity
values. Filtering is used to reduce this noise; however, there is a trade-off between edge
strength and noise reduction: more filtering to reduce noise results in a loss of edge strength
(Senthilkumaran and Rajesh, 2008).
2) Enhancement: In order to facilitate the detection of edges, it is essential to determine
changes in intensity in the neighborhood of a point. Enhancement emphasizes pixels where
there is a significant change in local intensity values and is usually performed by computing
the gradient magnitude (Xian Bin Wen et al., 2008).
3) Detection: Many points in an image have a nonzero value for the gradient, and not all of
these points are edges for a particular application. Therefore, some method should be used to
determine which points are edge points. Frequently, thresholding provides the criterion used
for detection (Paulinas M. and Usinskas A., 2007).
All the edge detection algorithms have the following common principles:
- good detection: the algorithm should mark as many real edges in the image as possible;
- good localization: edges marked should be as close as possible to the edges in the real
image;
- minimal response: a given edge in the image should only be marked once and, where
possible, image noise should not create false edges.
Edge detection algorithms were the main subject of my Dissertation (2005). They were
successfully applied to data-fused QuickBird images in order to emphasize the marks
arising from the presence of buried structures.
A 3x3 template moved across the image can be represented as:

a1  a2  a3
a4  CT  a5
a6  a7  a8

where:
- a1, ..., a8 = grey values of the pixels covered by the template
- CT = grey value of the pixel at the centre of the template

For a simple 3x3 smoothing (mean) filter applied to the window

a1  a2  a3
a4  a5  a6
a7  a8  a9

the filtered pixel value is (a1 + a2 + ... + a9) / 9.
If the image brightness is regarded as a continuous function \phi(x,y) of the coordinates x
and y, then a vector gradient can be defined in the image according to:

\nabla \phi(x,y) = \frac{\partial \phi}{\partial x}(x,y) \, \mathbf{i} + \frac{\partial \phi}{\partial y}(x,y) \, \mathbf{j}

where i and j are a pair of unit vectors. The direction of the vector gradient is the direction of
maximum upward slope and its amplitude is the value of the slope:

\theta(x,y) = \tan^{-1} \left( \frac{\nabla_2}{\nabla_1} \right)

The magnitude of the gradient defines the edge according to the following formula:

|\nabla| = \sqrt{\nabla_1^2 + \nabla_2^2} = \left[ \left( \frac{\partial \phi}{\partial x}(x,y) \right)^2 + \left( \frac{\partial \phi}{\partial y}(x,y) \right)^2 \right]^{1/2}

where \nabla_1 and \nabla_2 are the horizontal and vertical components of the gradient.
The direction of the gradient is useful for contouring applications or for determining aspect
in a DTM (Richards, 2006).
Roberts Filtering
The Roberts algorithm realizes a cross-detection in the diagonal directions, computing the
differences \phi(i+1,j+1) - \phi(i,j) and \phi(i,j+1) - \phi(i+1,j), implemented by the two
2x2 templates:

-1   0        0   1
 0   1  and  -1   0
Since this procedure computes a local gradient it is necessary to choose a threshold value
above which edge gradients are said to occur.
Sobel Operators
Sobel edge detection produces an image where higher grey-level values indicate the
presence of an edge between two objects. The Sobel Edge Detection filter computes the root
mean square of two 3X3 templates. Sobel operator computes discrete gradient in the
orizontal and vertical directions at the pixel location I,j. For that the orthogonal components
of gradient are:
1 0 1 1 0 1
2 0 2 2 0 2
1 0 1 1 0 1
and
1 0 1 1 1 1
1 0 1 0 0 0
1 0 1 1 1 1
and
The first matrix implements a spatial derivative in the horizontal direction, whilst the second
matrix implements a spatial derivative in the vertical direction.
If we have the following window, with a1-a9 grey pixel values:

a1  a2  a3
a4  a5  a6
a7  a8  a9

the two components are X = (a3 + 2a6 + a9) - (a1 + 2a4 + a7) and
Y = (a7 + 2a8 + a9) - (a1 + 2a2 + a3). Finally the Sobel gradient is:

Sobel gradient = SQRT(X*X + Y*Y)
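The computation just described can be sketched in Python/NumPy as below, using explicit convolution with the two Sobel templates; the test image is a synthetic example and SciPy is assumed to be available:

import numpy as np
from scipy.ndimage import convolve

# Sobel templates: horizontal (X) and vertical (Y) derivatives.
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
SOBEL_Y = np.array([[-1, -2, -1],
                    [ 0,  0,  0],
                    [ 1,  2,  1]], dtype=float)

def sobel_gradient(image):
    """Return the Sobel gradient magnitude of a grey-level image."""
    img = image.astype(float)
    gx = convolve(img, SOBEL_X, mode="nearest")   # horizontal derivative
    gy = convolve(img, SOBEL_Y, mode="nearest")   # vertical derivative
    return np.sqrt(gx * gx + gy * gy)             # SQRT(X*X + Y*Y)

# Synthetic test image with a vertical step edge: the gradient magnitude is
# high along the edge and zero in the flat regions.
img = np.zeros((6, 6))
img[:, 3:] = 100.0
print(sobel_gradient(img))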
Prewitt filtering
Prewitt edge detection produces an image where higher grey-level values indicate the
presence of an edge between two objects. The Prewitt Edge Detection filter computes the
root mean square of two 3X3 templates. It is one of the most popular 3X3 edge detection
filters.
The components of the gradient are obtained with the templates:

-1   0   1        -1  -1  -1
-1   0   1         0   0   0
-1   0   1   and   1   1   1
The first matrix implements a spatial derivative in the horizontal direction, whilst the second
matrix implements a spatial derivative in the vertical direction.
If we have the following window:

a1  a2  a3
a4  a5  a6
a7  a8  a9

the two components are X = (a3 + a6 + a9) - (a1 + a4 + a7) and
Y = (a7 + a8 + a9) - (a1 + a2 + a3), and the gradient is:

Prewitt gradient = SQRT(X*X + Y*Y)
Laplacian of Gaussian (LoG) Filtering
The Laplacian operator is very sensitive to noise; therefore, before using it for edge
detection, the noise needs to be suppressed. The resulting algorithm, known as the Laplacian
of Gaussian (LoG), is composed of the following steps:
- noise suppression (Gaussian smoothing) before applying the Laplacian;
- application of the Laplacian operator;
- localization of the centres of thick edges based on the zero-crossing property of the second-
order derivative.
In this approach, noise is first reduced by convolving the image with a Gaussian filter:
isolated noise points and small structures are filtered out. With smoothing, however, edges
are spread. Those pixels that have a locally maximum gradient are considered as edges by
the edge detector, in which the zero crossings of the second derivative are used. To avoid the
detection of insignificant edges, only the zero crossings whose corresponding first derivative
is above some threshold are selected as edge points. The edge direction is obtained using the
direction in which the zero crossing occurs.
2.3.2.1.4 Canny Algorithm
Canny used the calculus of variations to find the function which optimizes a given edge
detection algorithm. The optimal function in Canny's detector is described by the sum of
four exponential terms, but can be approximated by the first derivative of a Gaussian. On the
basis of this, Canny defined the following algorithm:
- smooth the image with a Gaussian filter;
- compute the gradient magnitude and orientation using finite-difference approximations
for the partial derivatives;
- apply non-maxima suppression to the gradient magnitude;
- use the double thresholding algorithm to detect and link edges.
Canny edge detector approximates the operator that optimizes the product of signal-to-noise
ratio and localization. It is generally the first derivative of a Gaussian.
In the ''non-maximal suppression'' step, after the image gradients have been estimated, a
search is carried out to determine whether the gradient magnitude assumes a local maximum
in the gradient direction. From this stage a set of edge points is obtained; these are sometimes
referred to as "thin edges".
It is in most cases impossible to specify a threshold at which a given intensity gradient
switches from corresponding to an edge into not doing so. Therefore Canny uses
thresholding with hysteresis.
This requires two thresholds: high and low. Making the assumption that important edges
should be along continuous curves in the image allows us to follow a faint section of a given
line and to discard a few noisy pixels that do not constitute a line but have produced large
gradients. Therefore the algorithm begins by applying the high threshold, which marks out
the edges that can fairly surely be considered genuine. Starting from these, and using the
directional information derived earlier, edges can be traced through the image. While tracing
an edge, the lower threshold is applied, allowing faint sections of edges to be traced as long
as a starting point has been found.
The use of two thresholds with hysteresis in Canny Algorithm allows more flexibility than in
a single-threshold approach, but general problems of thresholding approaches still apply. A
threshold set too high can miss important information. On the other hand, a threshold set too
low will falsely identify irrelevant information (such as noise) as important. It is difficult to
give a generic threshold that works well on all images. No tried and tested approach to this
problem yet exists.
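In practice the whole chain (Gaussian smoothing, gradient computation, non-maxima suppression and hysteresis thresholding) is implemented in common image-processing libraries; a minimal sketch using scikit-image is shown below, where the synthetic image and the threshold values are arbitrary choices made only for illustration:

import numpy as np
from skimage import feature

# Synthetic grey-level image: a bright square on a dark background with a
# little Gaussian noise added.
image = np.zeros((64, 64))
image[16:48, 16:48] = 1.0
image += 0.05 * np.random.randn(64, 64)

# Canny detector: sigma controls the Gaussian smoothing, while the low and
# high thresholds drive the hysteresis step described above.
edges = feature.canny(image, sigma=1.0, low_threshold=0.05, high_threshold=0.2)

print(edges.shape, edges.dtype, edges.sum())  # boolean edge map and edge-pixel count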
2.3.2.1.5 Edge Detection Algorithm based on the frequency domain: the Fourier
transform
An image can be represented in the spatial domain, as in the previous cases, but also in the
frequency domain. In the frequency domain each image channel is represented in term of
sinusoidal waves.
The Fourier transform of an image represents the composition of the original image in terms
of spatial frequency components, i.e. sine and cosine components. Spatial frequency is the
image analogue of the frequency of a signal in time.
If we have a pixel in location (i, j) in a K×K pixel image and each pixel has brightness
\phi(i,j), the Fourier transform in discrete form is described by the following formula:

F(r,s) = \sum_{i=0}^{K-1} \sum_{j=0}^{K-1} \phi(i,j) \, \exp[-j 2\pi (ir + js)/K]

From the transformed image the original image can be reconstructed according to the
following formula:

\phi(i,j) = \frac{1}{K^2} \sum_{r=0}^{K-1} \sum_{s=0}^{K-1} F(r,s) \, \exp[j 2\pi (ir + js)/K]
Thus the discrete Fourier transform of an image transforms each single row to generate an
intermediate image, and then transforms this by column to obtain the final result.
Usually, high spatial frequency content in an image is associated with frequent changes of
brightness with position, whilst gradual changes of brightness with position correspond to
the low-frequency content of the spectrum.
Interpretation of frequency-transformed images can be quite complicated. In fact, when the
Fourier Transformation is selected, the output domain is the two-dimensional frequency
spectrum of the input image. If these results are output to the display, a fairly symmetric
pattern will appear. Frequencies are along two directions (X and Y). The DC component,
which corresponds to the average brightness, (frequency = (0,0)) is at (K/2+1,K/2+1) where
K is the image size.
Points away from the DC point indicate higher frequencies. The transform at point
(K/2+1+x, K/2+1+y) corresponds to the cosine wave component which repeats every K/x
pixels along the X direction and every K/y pixels along the Y direction.
Image features which are aligned horizontally make up the vertical components in the
Fourier spectrum (and vice versa).
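The following NumPy sketch illustrates the discrete transform of a K×K image and the shift that places the DC component at the centre of the spectrum; the sinusoidal test image is purely illustrative:

import numpy as np

K = 64
x = np.arange(K)
# Synthetic K x K image: constant background plus a horizontal sinusoidal
# pattern that repeats every 8 pixels along the X direction.
image = 1.0 + np.sin(2 * np.pi * x[np.newaxis, :] / 8.0) * np.ones((K, 1))

# 2-D discrete Fourier transform and shift of the zero-frequency (DC)
# component -- the average brightness -- to the centre of the spectrum.
spectrum = np.fft.fftshift(np.fft.fft2(image))
magnitude = np.abs(spectrum)

# The three strongest components are the DC term at (K/2, K/2) and the two
# symmetric peaks at (K/2, K/2 +/- K/8) produced by the 8-pixel periodicity.
flat = np.argsort(magnitude, axis=None)[-3:]
print(sorted(zip(*np.unravel_index(flat, magnitude.shape))))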
Fig. 1.3 - (a) positive autocorrelation; (b) negative autocorrelation; (c) no autocorrelation
(or random pattern).
In the context of image processing, spatial autocorrelation statistics can be used to measure
and analyze the degree of dependency among spectral features. Spatial autocorrelation is
generally described through some index of covariance computed for a series of lag distances
(or distance classes) from each point. The plot of the given index against the distance classes
d is called a correlogram, which illustrates the autocorrelation at each lag distance. The
distance at which the value of spatial autocorrelation crosses its expected value indicates the
range of the patch size, or simply the spatial range of the pattern.
In the context of image processing, for each index and lag distance, the output is a new
image which contains a measure of autocorrelation.
Classic spatial autocorrelation statistics include a spatial weights matrix that reflects the
intensity of the geographic relationship between observations in a neighbourhood. Such a
spatial weights matrix indicates which elements are to be included in, or excluded from, the
computations. In this way it is possible to define ad hoc weights to extract and emphasize
specific patterns.
The semivariance at lag h is defined as (formula 14):

\gamma(h) = \frac{1}{2N} \sum_{i=1}^{N} [z(i) - z(i+h)]^2   (14)

where N is the number of pixel pairs separated by the lag h and z(i) is the pixel value at
location i. The semivariogram is the relationship between semivariance and lag. It has been
studied extensively, and a strong theoretical understanding exists about its behaviour (e.g.
Jupp et al., 1988).
The Geary's c statistic is based on the squared difference between spatially lagged pairs of
pixels, normalized by the overall scene variance. In a remote sensing context, the Geary's c
statistic can be defined as:
c = \frac{(n-1) \sum_{i} \sum_{j} W_{ij} (x_i - x_j)^2}{2 \sum_{i} \sum_{j} W_{ij} \sum_{i} (x_i - \bar{x})^2}   (15)

Similarly, the Moran's I statistic can be defined as:

I = \frac{n \sum_{i} \sum_{j} W_{ij} (x_i - \bar{x})(x_j - \bar{x})}{\sum_{i} \sum_{j} W_{ij} \sum_{i} (x_i - \bar{x})^2}   (16)

where n is the number of pixels, x_i is the pixel value at location i, \bar{x} is the mean of the
pixel values and W_{ij} is the spatial weight between locations i and j.
Anselin Local Moran's I index identifies pixel clustering. Positive values indicate a cluster of
similar values, while negative values imply no clustering (that is, high variability between
neighboring pixels).
I_i = \frac{(x_i - \bar{x})}{S_x^2} \sum_{j} W_{ij} (x_j - \bar{x})   (17)

where S_x^2 is the variance of x.
The Local Geary's C index identifies areas of high variability between a pixel value and its
neighboring pixels. It is useful for detecting edge areas between clusters and other areas with
dissimilar neighboring values.
c_i = \sum_{j} W_{ij} (x_i - x_j)^2   (18)
The Getis-Ord Local Gi index (1992) compares pixel values at a given location with those
pixels at a lag, d, from the original pixel at location i. So it identifies hot spots, such as areas
of very high or very low values that occur near one another. This is useful for determining
clusters of similar values, where concentrations of high values result in a high Gi value and
concentrations of low values result in a low Gi value. The results of this index differ from
the results of the Local Moran's I index because clusters of negative values give high values
for I, but low values for Gi.
G_i(d) = \frac{\sum_{j} w_{ij}(d) \, x_j - W_i \bar{x}}{S \sqrt{W_i (n - W_i)/(n-1)}} \quad \text{for } j \neq i   (19)

where W_i = \sum_{j \neq i} w_{ij}(d), \bar{x} and S are the mean and standard deviation of the
pixel values, and w_{ij}(d) is the weight between pixels i and j at lag distance d.
Getis and Ord (1992) introduced a local autocorrelation measure, the Gi statistic. Anselin
(1995) subsequently proposed local indicators of spatial autocorrelation (LISA) as a general
means for decomposing global autocorrelation measurements, so that the individual
contribution of each observation can be assessed and local hot spots identified (Anselin,
1995).
To sum up, we can say that spatial autocorrelation statistics measure and analyze the degree
of dependence among features that have clusters of similar or dissimilar values. The use of
classic spatial autocorrelation statistics such as Moran's I, Geary's C and the Getis-Ord Local
Gi index (for more information, see Anselin (1995) and Getis and Ord (1994)) enables the
characterization of the spatial autocorrelation within a user-defined distance. For each index,
the output is a new image which contains a measure of autocorrelation around the given
pixel. In particular:
(i) the Local Moran's I index identifies pixel clustering: positive values imply the presence
of a cluster of similar values, which means low variability between neighbouring pixels,
whereas negative values indicate the absence of clustering, which means high variability
between neighbouring pixels;
(ii) the Getis-Ord Gi index permits the identification of areas characterized by very high or
very low values (hot spots) compared to those of neighbouring pixels;
(iii) the Local Geary's C index allows the identification of edges and of areas characterized
by high variability between a pixel value and its neighbouring pixels;
(iv) all of these indices are available as tools in commercial software for Geographical
Information Systems (GIS) or image processing, such as ENVI.
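As an illustration of how such an index can be computed on an image, the following Python/NumPy sketch evaluates the global Moran's I of formula (16) using a simple binary 4-neighbour contiguity weights matrix; the window size and pixel values are hypothetical:

import numpy as np

def morans_i(image):
    """Global Moran's I with binary 4-neighbour (rook) contiguity weights.

    Positive values indicate clustering of similar pixel values, values near
    the expected value -1/(n-1) indicate randomness, and negative values
    indicate a checkerboard-like pattern.
    """
    x = image.astype(float)
    n = x.size
    z = x - x.mean()                       # deviations from the mean

    # Accumulate sum of w_ij * z_i * z_j over all neighbouring pairs (each
    # unordered pair counted twice, as in the double sum) and W = sum of w_ij.
    num = 0.0
    w_sum = 0.0
    for shift_axis in (0, 1):
        for shift in (1, -1):
            zn = np.roll(z, shift, axis=shift_axis)
            valid = np.ones_like(z, dtype=bool)
            # discard pairs that wrap around the image border
            if shift_axis == 0:
                valid[0 if shift == 1 else -1, :] = False
            else:
                valid[:, 0 if shift == 1 else -1] = False
            num += np.sum(z[valid] * zn[valid])
            w_sum += valid.sum()

    return (n / w_sum) * num / np.sum(z ** 2)

# Clustered pattern (left half low, right half high) -> clearly positive I.
img = np.zeros((8, 8))
img[:, 4:] = 1.0
print(morans_i(img))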
2.3.4 Classification
Classification is the procedure most often used in image processing for grouping all the
pixels in an image into a finite number of individual classes or categories to produce a
thematic representation.
It can be performed on single or multiple image channels to separate areas according to their
different scattering or spectral characteristics.
The classification procedures are differentiated as being either supervised or unsupervised
(clustering).
An example of a successful use of classification in archaeology is Malinverni and Fangi
(2008), who used K-means classification on a QuickBird image to localize archaeological
evidence. In reality, the use of classification algorithms in archaeology is limited, because
they can at best provide a classification of the modern land cover and the distribution of
each class.
- To assess the accuracy of the selected classification using the labelled testing data set and
to refine the training process on the basis of the obtained results. In fact, an accuracy
assessment must be made to determine how correct the classified image is. An accuracy
assessment involves the determination of the overall accuracy of the classification, errors
of omission, errors of commission, producer's accuracy, and consumer's accuracy. All of
these measures give an indication of how well the classification of the image was conducted.
Some common classification algorithms include:
- Minimum-Distance-to-the-Mean Classifier: it uses the mean values of each of the ground
cover classes, calculated from the training areas. Each pixel within the image is then
examined to determine which mean value it is closest to. Whichever mean value the pixel
is closest to, based on the Euclidean distance, defines the class to which that pixel will be
assigned.
- Parallelepiped Classifier: uses a mean vector as opposed to a single mean value. The
vector contains an upper and lower threshold, which dictates which class a pixel will be
assigned to. If a pixel is above the lower threshold and below the upper threshold, then it
is assigned to that class. If the pixel does not lie within the thresholds of any mean
vectors, then it is assigned to an unclassified or null category.
- The Mahalanobis Distance classification: is a direction-sensitive distance classifier that
uses statistics for each class. It is similar to the Maximum Likelihood classification but
assumes all class covariances are equal and therefore is a faster method. All pixels are
classified to the closest ROI class unless you specify a distance threshold, in which case
some pixels may be unclassified if they do not meet the threshold.
- Maximum Likelihood Classifier: evaluates the variance and covariance of the various
classes when determining in which class to place an unknown pixel. The statistical
probability of a pixel belonging to a class is calculated based on the mean vector and
covariance matrix; the pixel is assigned to the class for which it has the highest probability.
- The Spectral Angle Mapper algorithm (SAM): measures the spectral similarity by
calculating the angle between the two spectra, treating them as vectors in n-
dimensional space, where n is the number of bands (Kruse et al., 1993; Van der Meer
et al., 1997; Rowan and Mars, 2003). The reference spectra can either be taken from
spectral libraries or from field measurements, or extracted directly from the image. From a
mathematical point of view, this method starts from the dot product between a test
spectrum u and a reference spectrum v:
u \cdot v = \sum_{i=1}^{n} u_i v_i = |u| \, |v| \cos\alpha

where u_i and v_i are the components of the vectors u and v in the n-dimensional space. The
spectral angle \alpha between the two spectra is therefore:

\alpha = \arccos \left( \frac{u \cdot v}{|u| \, |v|} \right) = \arccos \left( \frac{\sum_{i=1}^{n} u_i v_i}{\sqrt{\sum_{i=1}^{n} u_i^2} \, \sqrt{\sum_{i=1}^{n} v_i^2}} \right)

The angle is in the range from zero to \pi/2. Small angles between the two spectra indicate
high similarity and large angles indicate low similarity. Pixels further away than a specified
maximum angle threshold (in radians) are not classified. This method is relatively insensitive
to illumination and albedo, because the angle between the two vectors is independent of the
vectors' lengths (Crosta et al., 1998; Kruse et al., 1993).
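A minimal sketch of the SAM decision rule, written with NumPy, is given below; the reference spectra, the test pixels and the angle threshold are hypothetical values chosen only for illustration:

import numpy as np

def spectral_angle(u, v):
    """Spectral angle (in radians) between two spectra u and v."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    cos_a = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(cos_a, -1.0, 1.0))

def sam_classify(pixels, references, max_angle=0.2):
    """Assign each pixel spectrum to the reference with the smallest angle.

    pixels     : array (n_pixels, n_bands) of test spectra.
    references : array (n_classes, n_bands) of reference spectra.
    max_angle  : pixels whose best angle exceeds this threshold (radians)
                 are left unclassified (label -1).
    """
    labels = np.full(len(pixels), -1, dtype=int)
    for i, p in enumerate(pixels):
        angles = [spectral_angle(p, r) for r in references]
        best = int(np.argmin(angles))
        if angles[best] <= max_angle:
            labels[i] = best
    return labels

# Two hypothetical 4-band reference spectra (e.g. vegetation and bare soil).
refs = np.array([[0.05, 0.08, 0.06, 0.60],
                 [0.20, 0.25, 0.30, 0.35]])
pix = np.array([[0.04, 0.07, 0.05, 0.55],    # close to class 0
                [0.22, 0.27, 0.31, 0.36]])   # close to class 1
print(sam_classify(pix, refs))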
It is common practice to make two or more iterations of a classification process, to improve
the accuracy of the result. With each iteration, the test sites are edited to better reflect the
representation of their class and to remove or reduce any class overlap.
The two unsupervised classification algorithms most commonly used in remote sensing are
ISODATA and K-means. Both of these algorithms are iterative procedures. In general, both
of them first assign an arbitrary initial cluster vector; then they classify each pixel to the
closest cluster, and finally they calculate the new cluster mean vectors on the basis of all the
pixels in each cluster. The second and third steps are repeated until the change between two
iterations is small. The change can be defined in several different ways, either by measuring
the distances by which the mean cluster vectors have changed from one iteration to another,
or by the percentage of pixels that have changed between iterations.
The objective of the k-means algorithm is to minimize the within-cluster variability. The
objective function (which is to be minimized) is the sum of squared distances (errors)
between each pixel and its assigned cluster centre.
It is widely used for classifying satellite imagery, but it is biased by the initially selected
mean values and may assign pixels to the wrong class.
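A bare-bones version of this iterative procedure, applied to the pixels of a multispectral image, might look as follows in Python/NumPy; the number of clusters, the stopping tolerance and the test data are arbitrary choices:

import numpy as np

def kmeans_pixels(bands, k=3, max_iter=50, tol=1e-4, seed=0):
    """Unsupervised K-means clustering of image pixels.

    bands : array (n_bands, rows, cols).
    Returns a (rows, cols) map of cluster labels and the cluster mean vectors.
    """
    n_bands, rows, cols = bands.shape
    X = bands.reshape(n_bands, -1).T.astype(float)           # (n_pixels, n_bands)

    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=k, replace=False)]   # arbitrary initial means

    for _ in range(max_iter):
        # assign each pixel to the closest cluster mean (Euclidean distance)
        d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # recompute the cluster mean vectors
        new_centres = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centres[j] for j in range(k)])
        if np.linalg.norm(new_centres - centres) < tol:       # small change: stop
            centres = new_centres
            break
        centres = new_centres

    return labels.reshape(rows, cols), centres

# Hypothetical 4-band image.
bands = np.random.rand(4, 60, 60)
label_map, means = kmeans_pixels(bands, k=3)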
ISODATA is an unsupervised classification method that uses an iterative approach
incorporating a number of heuristic procedures to compute classes. The ISODATA utility
repeats the clustering of the image into classes until either a specified maximum number of
iterations has been performed, or a maximum percentage of unchanged pixels has been
reached between two successive iterations. The algorithm starts by randomly selecting
cluster centers in the multidimensional input data space. Each pixel is then grouped into a
candidate cluster based on the minimization of a distance function between that pixel and
the cluster centers. After each iteration, the cluster means are updated, and clusters may be
split or merged further, depending on the size and spread of the data points in the clusters.
The ISODATA clustering method uses the minimum spectral distance formula to form
clusters. The equation for classifying by spectral distance is based on the equation for
Euclidean distance, i.e.:
SD_{xyc} = \sqrt{\sum_{i=1}^{n} (C_{ci} - X_{xyi})^2}

where n is the number of bands, C_{ci} is the mean of class c in band i, and X_{xyi} is the value
of pixel (x,y) in band i.
arbitrarily discarded (which is suggested when the separability is closer to 0), or, as a second
option, the two signatures can be merged (which is suggested when the separability is closer
to 1).
- 1.0 to 1.9: poor separability. It indicates that the two signatures are separable to some
extent; however, it is desirable to improve the separability if possible. Low signature
separability is usually caused by improper combinations of image bands and/or by training
sites which have large internal variability within each class.
- 1.9 to 2.0: good separability.
D(i,j) = 0.5*T[M(i)-M(j)]*[InvS(i)+InvS(j)]*[M(i)-M(j)] + 0.5*Trace[InvS(i)*S(j)+InvS(j)*S(i)-2*I]
where M(i) = mean vector of class i, where the vector has Nchannel elements (Nchannel is
the number of channels used)
S(i) = covariance matrix for class i, which has Nchannel by Nchannel elements
InvS(i) = inverse of matrix S(i)
Trace[ ] = trace of matrix (sum of diagonal elements)
T[ ] = transpose of matrix
I = identity matrix
The Jeffries-Matusita (JM) distance is obtained from the Bhattacharya distance, shown in
equation (12).
B = \frac{1}{8} (m_i - m_j)^t \left[ \frac{\Sigma_i + \Sigma_j}{2} \right]^{-1} (m_i - m_j) + \frac{1}{2} \ln \left( \frac{\left| (\Sigma_i + \Sigma_j)/2 \right|}{|\Sigma_i|^{1/2} \, |\Sigma_j|^{1/2}} \right)   (12)
in which m_i and m_j are the class mean vectors, and \Sigma_i and \Sigma_j are the class covariance
matrices.
The Bhattacharya distance can be seen as composed of two components: the first part of
equation (12) represents the mean difference, whereas the second part represents the
covariance difference. For the BD, a greater value indicates a greater average distance
between the classes. A drawback of the BD is that such an index does not provide any
indication of threshold values for separability.
The Jeffries-Matusita (JM) distance is shown in equation (13):
J_{ij} = 2 (1 - e^{-B})   (13)
The average divergence over all pairs of classes is then:

d_{ave} = \sum_{i=1}^{M} \sum_{j=i+1}^{M} p(\omega_i) \, p(\omega_j) \, d_{ij}

where M is the number of spectral classes and p(\omega_i) and p(\omega_j) are the class prior
probabilities.
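The following NumPy sketch computes the Bhattacharya and Jeffries-Matusita distances of equations (12) and (13) from the mean vectors and covariance matrices of two training classes; the class statistics used here are hypothetical:

import numpy as np

def bhattacharya_distance(m_i, m_j, S_i, S_j):
    """Bhattacharya distance between two classes (equation 12)."""
    m_i, m_j = np.asarray(m_i, float), np.asarray(m_j, float)
    S = 0.5 * (S_i + S_j)                      # average covariance matrix
    diff = m_i - m_j
    term_mean = 0.125 * diff @ np.linalg.inv(S) @ diff
    term_cov = 0.5 * np.log(np.linalg.det(S) /
                            np.sqrt(np.linalg.det(S_i) * np.linalg.det(S_j)))
    return term_mean + term_cov

def jeffries_matusita(m_i, m_j, S_i, S_j):
    """Jeffries-Matusita distance (equation 13), bounded between 0 and 2."""
    B = bhattacharya_distance(m_i, m_j, S_i, S_j)
    return 2.0 * (1.0 - np.exp(-B))

# Hypothetical statistics of two classes in a 3-band feature space.
m1, m2 = np.array([50., 60., 70.]), np.array([80., 95., 110.])
S1 = np.diag([25., 30., 20.])
S2 = np.diag([30., 35., 25.])
print(jeffries_matusita(m1, m2, S1, S2))   # close to 2 -> good separability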
2.3.5 Data fusion
The goal of multi-sensor data fusion is to integrate complementary and redundant
information to provide a composite image which can be used to better understand the entire
scene.
The fusion of information from sensors with different physical characteristics enhances the
understanding of our surroundings and provides the basis for planning, decision-making, and
control of autonomous and intelligent machines (Hall, 1997). In the past decades it has been
applied to different fields such as pattern recognition, visual enhancement, classification,
change detection, object detection and area surveillance (Pohl, 1998).
Multi-sensor data fusion can be performed at four different processing levels, according to
the stage at which the fusion takes place: signal level, pixel level, feature level, and decision
level (Dai, 1999).
(1) Signal level fusion. In signal-based fusion, signals from different sensors are combined
to create a new signal with a better signal-to noise ratio than the original signals.
(2) Pixel level fusion. Pixel-based fusion is performed on a pixel-by-pixel basis. It generates
a fused image in which the information associated with each pixel is determined from a set
of pixels in the source images, in order to improve the performance of image processing
tasks such as segmentation.
(3) Feature level fusion. Fusion at feature level requires the extraction of objects recognized
in the various data sources, i.e. of salient features depending on their environment, such as
pixel intensities, edges or textures. These similar features from the input images are then
fused.
(4) Decision level fusion. Decision-level fusion consists of merging information at a higher
level of abstraction; it combines the results from multiple algorithms to yield a final fused
decision. The input images are processed individually for information extraction, and the
obtained information is then combined by applying decision rules to reinforce the common
interpretation.
Among the hundreds of variations of image fusion techniques, the most popular and
effective methods include, but are not limited to, intensity-hue-saturation (IHS), high-pass
filtering, principal component analysis (PCA), different arithmetic combinations (e.g., the
Brovey transform), multi-resolution analysis-based methods (e.g., pyramid algorithms, the
wavelet transform), and Artificial Neural Networks (ANNs).
Principal component analysis (PCA), intensity-hue-saturation (IHS), the Brovey transform,
the Synthetic Variable Ratio (SVR) and high-pass filtering are standard fusion algorithms.
Three problems must be considered before their application: (1) standard fusion algorithms
generate a fused image from a set of pixels in the various sources; these pixel-level fusion
methods are very sensitive to registration accuracy, so that co-registration of the input
images at sub-pixel level is required; (2) one of the main limitations of the IHS and Brovey
transforms is that the number of input multispectral bands should be equal to or less than
three at a time; (3) standard image fusion methods are often successful at improving the
spatial resolution; however, they tend to distort the original spectral signatures to some
extent [9,10]. More recently, new techniques such as the wavelet transform seem to reduce
the colour distortion problem and to keep the statistical parameters invariable.
Some of these data fusion algorithms are described in the following section.
In the IHS method, the multispectral B, G and R channels are first converted, through a 3×3
linear transformation, into three intermediate components x, y and z, from which the hue H,
the saturation S and the intensity I are derived; the intensity component is then substituted
with the higher spatial resolution panchromatic band before the inverse transformation is
applied to obtain the fused image.
In the Brovey transform, each fused band is obtained as:

Y_k(i,j) = \frac{X_k(i,j) \times X_p(m,n)}{\sum_{k=1}^{n_b} X_k(i,j)}

where Y_k(i,j) and X_k(i,j) are the k-th fused multispectral band and the original
multispectral band respectively, i and j denote the pixel and line number, X_p(m,n) is the
original panchromatic band (with m and n again denoting pixel and line number), and n_b is
the number of multispectral bands.
In the present case, the image fusion of QuickBird data is carried out as follows:
(i) selection of the spectral bands;
(ii) resampling of these bands to the panchromatic spatial resolution;
(iii) application of the Brovey transformation to the resampled image data.
The resulting image consists of a combination of the n multispectral bands and the
panchromatic image.
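A compact Python/NumPy sketch of these steps is given below; the band selection is skipped, the multispectral bands are assumed to have already been resampled to the panchromatic grid, and all array sizes and values are illustrative:

import numpy as np

def brovey_fusion(ms_bands, pan, eps=1e-6):
    """Brovey transform pan-sharpening.

    ms_bands : array (n_bands, rows, cols) of multispectral bands already
               resampled to the panchromatic spatial resolution.
    pan      : array (rows, cols), the panchromatic band.
    Each fused band is the original band multiplied by the panchromatic
    value and divided by the sum of the multispectral bands at that pixel.
    """
    ms = ms_bands.astype(float)
    total = ms.sum(axis=0) + eps          # avoid division by zero
    return ms * pan[np.newaxis, :, :] / total[np.newaxis, :, :]

# Hypothetical resampled 4-band QuickBird subset and panchromatic band.
ms = np.random.rand(4, 80, 80)
pan = np.random.rand(80, 80)
fused = brovey_fusion(ms, pan)
print(fused.shape)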
In the Synthetic Variable Ratio (SVR) method, each band of the merged image is obtained as
XSP_i = XSL_i \times PanH / PanLSyn, where XSP_i is the grey value of the i-th band of the
merged image, PanH is the grey value of the original high spatial resolution image, XSL_i is
the grey value of the i-th band of the original MS image, and PanLSyn is the grey value of
the low-resolution synthetic panchromatic image, obtained through the following simulation
equation proposed by Suits et al. (1988):
PanLSyn = \sum_{i=1}^{4} \varphi_i \, XSL_i
The parameters \varphi_i were calculated through a regression analysis between the values
simulated through an atmospheric model and those measured for five typical land cover
types: urban, soil, water, trees and grass. After construction of PanLSyn, a linear histogram
match was used to force the original SPOT Pan image to match PanLSyn, in order to
eliminate atmospheric and illumination differences.
Zhang (1999) modified the SVR method in order to obtain a more stable \varphi_i.
In the High-Pass Filtering method, the fused image at wavelength \lambda is obtained as:

S_\lambda(x,y) = \{ W_{high} \cdot HPF[PAN(x,y)] \} + \{ W_{low} \cdot LPF[O'_\lambda(x,y)] \}   (1)
In equation (1), PAN(x,y) and O'_\lambda(x,y) correspond to a pixel at location (x,y) in the
PAN image and in the interpolated and registered multispectral image (at wavelength \lambda),
respectively.
HPF and LPF correspond to the high-pass and low-pass filter operators, respectively. The
result of applying a filter operator with an N_x × N_y kernel to an image I at location (x,y)
can be expressed as:

F[I(x,y)] = \sum_{i=-k_x}^{k_x} \sum_{j=-k_y}^{k_y} w_{i,j} \, I(x+i, y+j)

with half-sizes along the x and y axes k_x = (N_x+1)/2 and k_y = (N_y+1)/2, where w_{i,j}
are the filter weights.
(ii) Secondly, to eliminate the problem of dataset dependency, it employs a set of statistical
approaches to estimate the grey value relationship between all the input bands.
This algorithm was adopted by DigitalGlobe
[http://www.pcigeomatics.com/support_center/tech_papers/techpapers_main.php] and is also
available as a PCI Geomatica routine (PANSHARP). In the PANSHARP routine, if the
original MS and Pan images are geo-referenced, the resampling process can be accomplished
together with the fusion in one step. All the MS bands can be fused at one time, and the
fusion can also be performed solely on user-specified MS bands.
2.3.5.8 Artificial neural network
Artificial neural networks (ANNs) have proven to be a more powerful and self-adaptive
method of pattern recognition as compared to traditional linear and simple nonlinear
analyses. The ANN-based method employs a nonlinear response function that iterates many
times in a special network structure in order to learn the complex functional relationship
between input and output training data.
The general schematic diagram of the ANN-based image fusion method can be summarized
as follows: the input layer has several neurons, which represent the feature factors extracted
and normalized from image A and image B; the hidden layer has several neurons, and the
output layer has one (or more) neurons. The i-th neuron of the input layer is connected to the
j-th neuron of the hidden layer by the weight W_ij, and the weight between the j-th neuron of
the hidden layer and the t-th neuron of the output layer is V_jt (in this case t = 1). The
weighting function is used to simulate and recognize the response relationship between the
features of the fused image and the corresponding features of the original images (image A
and image B).
As the first step of ANN-based data fusion, the two registered images are decomposed into
several blocks of size M by N. Then, the features of the corresponding blocks in the two
original images are extracted, and the normalized feature vectors fed to the neural network
are constructed. The next step is to select some vector samples to train the neural network.
Many neural network models have been proposed for image fusion such as BP, SOFM, and
ARTMAP neural network.
The ANN-based fusion method exploits the pattern recognition capabilities of artificial
neural networks; meanwhile, the learning capability of neural networks makes it feasible to
customize the image fusion process. Many applications have indicated that ANN-based
fusion methods have more advantages than traditional statistical methods, especially when
the input multi-sensor data are incomplete or very noisy. ANNs often serve as an efficient
decision-level fusion tool thanks to their self-learning character, especially in land use/land
cover classification. In addition, the multiple-input multiple-output framework makes them a
possible approach for fusing high-dimensional data, such as long-term time-series data or
hyperspectral data.
The best result that can be expected from data fusion is that the multispectral set of fused
images is as identical as possible to the set of multispectral images that the corresponding
(reference) sensor would observe at the high spatial resolution of the panchromatic sensor.
As no multispectral reference images are available at the requested higher spatial resolution,
the assessment of the quality of the fused products is not obvious. Several score indices or
figure metrics have been designed over the years (see Thomas and Wald, 2007) to evaluate
the performances of the fused images. Both intra-band indices and inter-band indices have
been set up in order to measure, respectively, spatial distortions (radiometric and geometric
distortions) and spectral distortions (colour distortions).
In order to assess the performance of data fusion algorithms, three properties should be
verified, as expressed by Wald et al. (1997):
1. The data fusion products, once degraded to their original resolution, should be equal to the
original.
2. The data fusion image should be as identical as possible to the MS image that would be
acquired by the corresponding sensor with the high spatial resolution of the Pan sensor.
3. The MS set of fused images should be as identical as possible to the set of MS images that
would be acquired by the corresponding sensor with the high spatial resolution of Pan.
As no multispectral reference images are available at the requested higher spatial resolution,
the verification of the second and the third property is not obvious. In order to overcome this
drawback, three different methodological approaches can be followed: the Wald protocol,
the Zhou protocol and, finally, the QNR (Quality with No Reference) index devised by
Alparone et al. (2007).
In the Wald protocol, both the Pan and MS images are first degraded to a coarser resolution,
the fusion is performed on the degraded data, and the result is compared with the original
MS images. This means that the performances of the fusion methods are supposed to be
invariant when the fusion algorithms are applied at the full spatial resolution. Nevertheless,
in the context of remote
sensing of archaeology, the small features, which represent a large amount of the
archaeological heritage, can be lost after degrading both the Pan and MS images. In these
situations the archaeological features will be missed and, therefore, the evaluation of the data
fusion results could not be performed over the targets of interest. To avoid the degradation,
one can consider the two alternative approaches described in sections 2.7.2 and 2.7.3.
The Q index between two images x and y is defined as:

Q = \frac{(2 \mu_x \mu_y + C_1)(2 \sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}

which can also be written as the product of a luminance, a contrast and a structure term:

Q(x,y) = f(l(x,y), c(x,y), s(x,y)) = \frac{(\sigma_{xy} + C_3)(2 \mu_x \mu_y + C_1)(2 \sigma_x \sigma_y + C_2)}{(\sigma_x \sigma_y + C_3)(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}

where \mu_x and \mu_y are the means, \sigma_x^2 and \sigma_y^2 the variances, \sigma_{xy} the covariance of the two
images, and C_1, C_2, C_3 are small constants.
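A minimal NumPy sketch of the Q index between two bands, computed globally over the whole image for simplicity (in practice it is usually evaluated on local sliding windows and then averaged), is given below; the stabilising constants and the test data are arbitrary:

import numpy as np

def q_index(x, y, c1=1e-6, c2=1e-6):
    """Universal image quality index Q between two images/bands x and y."""
    x = x.astype(float).ravel()
    y = y.astype(float).ravel()
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cxy = np.mean((x - mx) * (y - my))      # covariance between x and y
    return ((2 * mx * my + c1) * (2 * cxy + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

# Identical bands give Q = 1; adding distortion lowers the value.
a = np.random.rand(64, 64)
b = a + 0.1 * np.random.randn(64, 64)
print(q_index(a, a), q_index(a, b))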
The spectral distortion is obtained by comparing the Q values computed from the fused MS
bands with those computed from the input MS bands, re-sampled at the same spatial
resolution as the Pan image. The Q index is calculated for each couple of bands of the fused
and of the re-sampled MS data, to form two matrices with main diagonal equal to 1. The
measure of spectral distortion D_\lambda is then computed as a value proportional to the p-norm of
the difference of the two matrices:
D_\lambda = \sqrt[p]{ \frac{1}{L(L-1)} \sum_{l=1}^{L} \sum_{\substack{r=1 \\ r \neq l}}^{L} \left| Q(\hat{G}_l, \hat{G}_r) - Q(\tilde{G}_l, \tilde{G}_r) \right|^p }

where L is the number of spectral bands processed, Q(\hat{G}_l, \hat{G}_r) denotes the Q index
calculated between each couple of fused MS bands, and Q(\tilde{G}_l, \tilde{G}_r) the Q index
calculated between the corresponding couple of re-sampled MS bands.
The spatial distortion is computed in two steps: 1) between each fused MS band and the Pan
image; and 2) between each input MS band and the spatially degraded Pan image. The
spatial distortion D_S is then calculated as a value proportional to the q-norm of the
differences:

D_S = \sqrt[q]{ \frac{1}{L} \sum_{l=1}^{L} \left| Q(\hat{G}_l, P) - Q(\tilde{G}_l, \tilde{P}) \right|^q }

where L is the number of spectral bands processed, Q(\hat{G}_l, P) denotes the Q index
calculated between each fused MS band and the Pan image, and Q(\tilde{G}_l, \tilde{P}) denotes the Q
index calculated between each input MS band and the spatially degraded Pan image.
Hall, L.; Llinas, J. An introduction to multisensor data fusion. Proc. IEEE 1997, 85, 6-23.
Pohl, C.; Van Genderen, J.L. Multisensor image fusion in remote sensing: concepts, methods and
applications. Int. J. Remote Sens. 1998, 19, 823-854.
Dai, X.; Khorram, S. Data fusion using artificial neural networks: a case study on multitemporal
change analysis. Comput. Environ. Urban Syst. 1999, 23, 19-31.