Lecture20 Fuzzy PDF
FUZZY CLASSIFICATION
1. Introduction
Hard classification is based on classical set theory, in which precisely defined boundaries
determine whether a pixel belongs to a particular class or not.
During hard classification, each individual pixel within a remotely sensed image is given a
single class label. This technique works efficiently when the area imaged is homogeneous in nature.
But geographical information is heterogeneous in nature. This implies that the boundaries
between different land cover classes are fuzzy and gradually blend into one another.
Fuzziness and hardness are characteristics of a landscape at a particular scale of observation. If
the aim of the end user is to label each pixel unambiguously, the existence of heterogeneous
pixels containing more than one land cover type creates a problem: such a pixel may not fall
clearly into one of the available classes because it represents mixed classes. This problem also
surfaces if the satellite-borne instrument imaging the earth has a large
field of view (1 km or more). Fuzzy set theory provides useful concepts to work with
imprecise data. Fuzzy logic can be used to discriminate among land cover types using
membership functions. These are elaborated in the sections below.
Researchers in the field of psycholinguistics have investigated the way humans evaluate
concepts and derive decisions. Analysis of this kind of uncertainty usually results in a
perceived probability rather than the mathematically defined probability, and this forms the basis
of fuzzy sets (Zadeh, 1973). The theory of fuzzy sets was first introduced when it was
realized that it may not be possible to model ill-defined systems with the precise mathematical
assumptions of classical methods such as probability theory (Chi et al., 1996). The
underlying logic of fuzzy-set theory is that it allows an event to belong to more than one
sample space where sharp boundaries between spaces are hardly found.
The operations on fuzzy sets presented in this section are based on the original work of
Zadeh (1965) and should not be considered a complete collection.
1. Fuzzy Union: The union of two fuzzy sets A and B with respective membership
functions $\mu_A(x)$ and $\mu_B(x)$ is a fuzzy set C, written as $C = A \cup B$, whose
membership function is related to those of A and B by:
$$\mu_C(x) = \max[\mu_A(x), \mu_B(x)]$$
The union is the smallest fuzzy set containing both A and B.
2. Fuzzy Intersection: The intersection of two fuzzy sets A and B with respective
membership functions $\mu_A(x)$ and $\mu_B(x)$ is a fuzzy set C, written as $C = A \cap B$, whose
membership function is related to those of A and B by:
$$\mu_C(x) = \min[\mu_A(x), \mu_B(x)]$$
3. Fuzzy Complement: The complement of a fuzzy set A, written as A', has the membership
function:
$$\mu_{A'}(x) = 1 - \mu_A(x) \tag{4}$$
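The operators above can be illustrated on a small discrete universe. The sketch below assumes Zadeh's standard definitions (pointwise maximum for union, pointwise minimum for intersection, one minus the grade for the complement); the element names and membership grades are invented for illustration and are not data from this lecture.

```python
# Fuzzy sets over a small discrete universe, stored as {element: grade}.
# The grades are made-up illustrative values (assumption, not lecture data).
A = {"x1": 0.2, "x2": 0.7, "x3": 1.0}
B = {"x1": 0.5, "x2": 0.3, "x3": 0.0}


def fuzzy_union(a, b):
    """Membership of the union: pointwise maximum of the two grades."""
    return {x: max(a[x], b[x]) for x in a}


def fuzzy_intersection(a, b):
    """Membership of the intersection: pointwise minimum of the two grades."""
    return {x: min(a[x], b[x]) for x in a}


def fuzzy_complement(a):
    """Membership of the complement: one minus the grade."""
    return {x: 1.0 - a[x] for x in a}


union = fuzzy_union(A, B)
intersection = fuzzy_intersection(A, B)
complement = fuzzy_complement(A)
```

Unlike crisp sets, an element such as x2 keeps a non-zero grade in both the union and the intersection, which is what lets a mixed pixel belong partly to two classes.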
3. Membership Function
The membership function is the underlying power of every fuzzy model, as it is capable of
modeling the gradual transition from one less distinct region to another in a subtle way (Chi et
al., 1996). Membership functions characterize the fuzziness in a fuzzy set, whether the
elements in the set are discrete or continuous, in a graphical form for eventual use in the
mathematical formalisms of fuzzy set theory. The shapes used to describe fuzziness have
very few restrictions, and there are infinitely many ways to graphically depict the
membership functions that describe this fuzziness (Ross et al., 2002). Since membership
functions essentially embody all the fuzziness of a particular fuzzy set, their description is the
essence of a fuzzy property or operation. Because of the importance of the shape of the
membership function, a great deal of attention has been focused on the development of these
functions. Several standard shapes are available for membership functions, such as
triangular, trapezoidal and Gaussian. The direct use of available shapes has been found
effective for image enhancement, where different types of membership
functions are used to reduce the number of iterations carried out by a relaxation technique and
to provide a better way to handle the uncertainty of the image histogram. The choice of
membership function is problem dependent and requires expert knowledge (Zadeh, 1996).
In situations where prior information about data variation is not available, membership
values can also be generated from the available data using clustering algorithms, which is the
normal practice.
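As a minimal sketch of the standard shapes named above, the functions below implement triangular, trapezoidal and Gaussian membership functions in their usual textbook forms. The parameter names (a, b, c, d, centre, sigma) are assumptions chosen for illustration and do not come from the lecture.

```python
import math


def triangular(x, a, b, c):
    """Triangular shape: grade 0 at a and c, peak grade 1 at b (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def trapezoidal(x, a, b, c, d):
    """Trapezoidal shape: rises on [a, b], equals 1 on [b, c], falls on [c, d]."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)


def gaussian(x, centre, sigma):
    """Gaussian shape: grade 1 at the centre, decaying smoothly with distance."""
    return math.exp(-0.5 * ((x - centre) / sigma) ** 2)


# Example grades for a hypothetical reflectance value of 2.5.
grade_tri = triangular(2.5, 0.0, 5.0, 10.0)
grade_trap = trapezoidal(2.5, 0.0, 2.0, 4.0, 6.0)
grade_gauss = gaussian(2.5, 2.5, 1.0)
```

Which shape suits a given land cover class remains a problem-dependent choice, as noted above.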
Fuzzy set theory provides useful concepts and tools to deal with imprecise information, and
partial membership allows information about more complex situations, such as cover
mixture or intermediate conditions, to be better represented and utilized (Wang, 1990). Use of
fuzzy sets for partitioning of spectral space involves determining the membership grades
attached to each pixel with respect to every class. Instead of being assigned to a single class
out of m possible classes, each pixel in fuzzy classification has m membership grade values,
where each pixel is associated with a probability of belonging to each of the m classes of
interest (Kumar, 2007). The membership grades may be chosen heuristically or subjectively.
Heuristically chosen membership functions do not reflect the actual data distribution in the
input and the output spaces. Another option is to build membership functions from the
available data: a clustering technique partitions the data, and membership functions are then
generated from the resulting clusters. A number of classification methods may be
used to classify a remote sensing image into various land cover types. These methods may be
broadly grouped as supervised and unsupervised (Swain and Davis, 1978). In fuzzy
unsupervised classification, membership functions are obtained from clustering algorithms
such as the c-means or ISODATA method. In fuzzy supervised classification, these are generated
from training data.
The classification of remotely sensed imagery relies on the assumptions that the study
area is composed of a number of unique, internally homogeneous classes, that classification
analysis is based on reflectance data, and that ancillary data can be used to identify these
unique classes with the aid of ground data (Lillesand and Kiefer, 1994). The fuzzy
approaches are adopted as they take into account the fuzziness that may be characteristic of
the ground data (Foody, 1995). Zhang and Foody (1998) investigated fuzzy approach for land
cover classification and suggested that fully fuzzy approach holds advantages over both the
conventional hard methods and partially fuzzy approaches. Clustering algorithms can be
loosely categorized by the principle (objective function, graph-theoretical, hierarchical) or by
the model type (deterministic, statistical and fuzzy). In the literature on soft classification, the
fuzzy c-means (FCM) algorithm is the most popular method (Bastin, 1997; Wu and Yang,
2002; Yang et al., 2003). One of the popular parametric classifiers based on statistical theory
is the Fuzzy Gaussian Maximum Likelihood (FGML) classifier. This is an extension of
traditional crisp maximum likelihood classification wherein, the partition of spectral space is
based on the principles of classical set theory. In this method, land cover classes can be
represented as fuzzy sets by the generation of fuzzy parameters from the training data. Fuzzy
representation of geographical information makes it possible to calculate statistical
parameters which are closer to the real ones. This can be achieved by means of the
probability measures of fuzzy events (Zadeh, 1968). Compared with the conventional
methods, this method has proved to improve remote sensing image classification in the
aspects of geographical information representation, partitioning of spectral space, and
estimation of classification parameters (Wang, 1990). Despite the limitations due to its
assumption of normal distribution of class signature (Swain and Davis, 1978), it is perhaps
one of the most widely used classifiers (Wang, 1990; Hansen et al., 1996; Kumar, 2007).
In remote sensing, pixel measurement vectors are often considered as points in a spectral
space. Pixels with similar spectral characteristics form groups which correspond to various
ground-cover classes that the analyst defines. The groups of pixels are referred to as spectral
classes, while the cover classes are information classes. To classify pixels into groups, the
spectral space should be partitioned into regions, each of which corresponds to one of the
information classes defined. Decision surfaces are defined precisely by some decision rules
(for example, the decision rule of conventional maximum likelihood classifier) to separate the
regions. Pixels inside a region are classified into the corresponding information class. Such a
partition is usually called a hard partition. Fig 4.1a illustrates a hard partition of spectral
space and decision surfaces. A serious drawback of the hard partition is that a great quantity
of spectral information is lost in determining the pixel membership.
Let X be a universe of discourse whose generic elements are denoted x: X = {x}. Membership
in a classical set A of X is often viewed as a characteristic function $\mu_A$ from X to {0,1}
such that $\mu_A(x) = 1$ if and only if $x \in A$. A fuzzy set (Zadeh, 1965) B in X is
characterized by a membership function, $f_B$, which
associates with each x a real number in [0,1]. $f_B(x)$ represents the "grade of membership" of x
in B. The closer the value of $f_B(x)$ is to 1, the more x belongs to B. A fuzzy set does not have
sharply defined boundaries and an element may have partial and multiple memberships.
Fuzzy representation of geographical information enables a new method for spectral space
partition. When information classes can be represented as fuzzy sets, so can the corresponding
spectral classes. Thus a spectral space is not partitioned by sharp surfaces. A pixel
may belong to a class to some extent and at the same time belong to another class to some
other extent. Membership grades are attached to indicate these extents. Such a partition is
referred to as a fuzzy partition of spectral space. Fig 1b illustrates membership grades of a
pixel in a fuzzy partition. A fuzzy partition of spectral space can represent a real situation
better than a hard partition and allows more spectral information to be utilized in subsequent
analysis. Membership grades can be used to describe cover class mixture and intermediate
cases.
Figure 4 Hard partition of spectral space and decision surfaces; 1b: membership grades of a pixel in fuzzy partition
of spectral space.
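$$
\begin{bmatrix}
F_1(x_1) & F_1(x_2) & \cdots & F_1(x_N) \\
F_2(x_1) & F_2(x_2) & \cdots & F_2(x_N) \\
\vdots & \vdots & \ddots & \vdots \\
F_c(x_1) & F_c(x_2) & \cdots & F_c(x_N)
\end{bmatrix}
$$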
Membership grades can be used to describe cover class mixture and intermediate classes. In
the process, the stray pixels between classes may be classified as such. In the supervised
approach, which is similar to the maximum likelihood classification approach, fuzzy mean
vectors and fuzzy covariance matrices are developed from statistically weighted training data
instead of the normal mean vectors and covariance matrices, and the training areas may be a
combination of pure and mixed pixels. By knowing mixtures of various features, the fuzzy
training class weights are defined. A classified pixel is assigned a membership grade with
respect to its membership in each information class. In this procedure the conventional mean
and covariance parameters of training data are represented as a fuzzy set. The following two
equations (5, 6) describe the fuzzy parameters of the training data:
$$M_c^* = \frac{\sum_{i=1}^{n} \mu_c(x_i)\, x_i}{\sum_{i=1}^{n} \mu_c(x_i)} \tag{5}$$

$$V_c^* = \frac{\sum_{i=1}^{n} \mu_c(x_i)\,(x_i - M_c^*)(x_i - M_c^*)^T}{\sum_{i=1}^{n} \mu_c(x_i)} \tag{6}$$
where $M_c^*$ is the fuzzy mean of training class c; $V_c^*$ is the fuzzy covariance of training
class c; $x_i$ is the vector value of pixel i; $\mu_c(x_i)$ is the membership of pixel $x_i$ to
training class c; and n is the total number of pixels in the training data. In order to find the
fuzzy mean (eqn. 5) and fuzzy covariance (eqn. 6) of every training class, the membership of
pixel $x_i$ to the training class c must first be known. The membership function to class c, based
on the conventional maximum likelihood classification algorithm with fuzzy mean and fuzzy
covariance, is:
$$\mu_c(x_i) = \frac{P_c^*(x_i)}{\sum_{j=1}^{m} P_j^*(x_i)} \tag{7}$$
where $P_c^*(x_i)$ is the maximum likelihood probability of pixel $x_i$ to class c and m is the
number of classes. The membership grades of a pixel vector x depend upon the pixel's position in the
spectral space. The a posteriori probabilities are used to determine the class proportions in a
pixel. The algorithm iterates until there is no significant change in the membership values
obtained.
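The fuzzy parameters of equations (5) to (7) can be sketched in a few lines. This is an illustrative sketch only: it assumes equations (5) and (6) are membership-weighted averages and equation (7) is a normalisation of per-class likelihoods, and the pixel values, membership grades and likelihoods are invented. The function names are hypothetical.

```python
def fuzzy_mean(pixels, mu):
    """Eq. (5): membership-weighted mean vector of one training class."""
    total = sum(mu)
    bands = len(pixels[0])
    return [sum(m * x[k] for m, x in zip(mu, pixels)) / total
            for k in range(bands)]


def fuzzy_covariance(pixels, mu, mean):
    """Eq. (6): membership-weighted covariance matrix of one training class."""
    total = sum(mu)
    bands = len(mean)
    cov = [[0.0] * bands for _ in range(bands)]
    for m, x in zip(mu, pixels):
        diff = [x[k] - mean[k] for k in range(bands)]
        for r in range(bands):
            for s in range(bands):
                cov[r][s] += m * diff[r] * diff[s]
    return [[value / total for value in row] for row in cov]


def class_memberships(likelihoods):
    """Eq. (7): normalise one pixel's class likelihoods so they sum to 1."""
    total = sum(likelihoods)
    return [p / total for p in likelihoods]


# Three two-band training pixels with assumed grades for class c:
# one pure pixel (grade 1.0) and two mixed pixels (grade 0.5 each).
pixels = [[10.0, 20.0], [12.0, 22.0], [30.0, 40.0]]
mu_c = [1.0, 0.5, 0.5]

mean_c = fuzzy_mean(pixels, mu_c)
cov_c = fuzzy_covariance(pixels, mu_c, mean_c)
grades = class_memberships([2.0, 1.0, 1.0])  # assumed likelihoods for 3 classes
```

Mixed pixels contribute to the class statistics in proportion to their grade, which is how the training areas can combine pure and mixed pixels.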
As this method is an extension of MLC, it inherits its advantages and disadvantages. The
disadvantage is the normality of data assumption which it is based upon. Compared with the
conventional methods, FGML improves remote sensing image classification in the aspects of:
1) Representation of geographical information, 2) Partitioning of spectral space, and 3)
Estimation of classification parameters (Wang, 1990).
Fuzzy c-means (FCM) clustering, also known as fuzzy ISODATA, is an iterative technique that
differs from hard c-means, which employs hard partitioning. FCM employs fuzzy
partitioning such that a data point can belong to all groups with different membership grades
between 0 and 1. The aim of FCM is to find cluster centroids that minimize the dissimilarity
function. Unlike hard clustering techniques such as c-means, which iteratively drive
the objective function to a local minimum by assigning each sample to the nearest cluster
centroid, fuzzy clustering methods assign each training sample a degree of uncertainty
described by a membership grade. A pixel's membership grade function with respect to a
specific cluster indicates to what extent its properties belong to that cluster. The larger the
membership grade (close to 1), the more likely it is that the pixel belongs to that cluster. The FCM
algorithm was first introduced by Dunn (1973), and the related formulation and algorithm
were extended by Bezdek (1974). The purpose of the FCM approach, like that of conventional
clustering techniques, is to minimize the criterion in the least squared error sense. For $c \geq 2$
and m any real number greater than 1, the algorithm chooses $\mu_j : X \to [0,1]$ so that
$\sum_{j=1}^{c} \mu_j = 1$ and $w_j \in \mathbb{R}^d$ for j = 1, 2, ..., c to minimize the objective function
$$J_{FCM} = \frac{1}{2} \sum_{j=1}^{c} \sum_{i=1}^{n} (\mu_{i,j})^{m} \, \| x_i - w_j \|^2 \tag{8}$$
where $\mu_{i,j}$ is the value of the jth membership grade on the ith sample $x_i$. The vectors
$w_1, \ldots, w_j, \ldots, w_c$, called cluster centroids, can be regarded as prototypes for clusters represented by
the membership grades. For the purpose of minimizing the objective function, the cluster
centroids and membership grades are chosen so that a high degree of membership occurs for
samples close to the corresponding centroids. The FCM algorithm, a well-known and
powerful method in clustering analysis, is further modified as follows.
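$$J_{PFCM} = \frac{1}{2} \sum_{j=1}^{c} \sum_{i=1}^{n} (\mu_{i,j})^{m} \, \| x_i - w_j \|^2 - \frac{1}{2} \nu \sum_{j=1}^{c} \sum_{i=1}^{n} \mu_{i,j}^{m} \ln \alpha_j \tag{9}$$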
where $\alpha_j$ is a proportional constant of class j and $\nu$ ($\nu \geq 0$) is a constant. When $\nu = 0$,
$J_{PFCM}$ equals $J_{FCM}$. The penalty term is added to the $J_{FCM}$ objective function, where
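$$\alpha_j = \frac{\sum_{i=1}^{n} \mu_{i,j}^{m}}{\sum_{j=1}^{c} \sum_{i=1}^{n} \mu_{i,j}^{m}}, \quad j = 1, 2, \ldots, c; \qquad w_j = \frac{\sum_{i=1}^{n} \mu_{i,j}^{m} \, x_i}{\sum_{i=1}^{n} \mu_{i,j}^{m}} \tag{10}$$

$$\mu_{i,j} = \left[ \sum_{l=1}^{c} \frac{\left( \| x_i - w_j \|^2 - \nu \ln \alpha_j \right)^{1/(m-1)}}{\left( \| x_i - w_l \|^2 - \nu \ln \alpha_l \right)^{1/(m-1)}} \right]^{-1}; \quad i = 1, 2, \ldots, n; \; j = 1, 2, \ldots, c \tag{11}$$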
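The alternating updates of equations (10) and (11) can be sketched as follows. This is a minimal illustration rather than the implementation used in the case study: it assumes the membership update uses the penalised squared distance (squared distance minus the constant times the log of the class proportion), starts from user-supplied centroids with equal class proportions, and runs a fixed number of passes instead of testing for convergence. The function name `pfcm`, the parameter name `nu` and the sample points are hypothetical.

```python
import math


def pfcm(points, init_centroids, m=1.5, nu=1.0, iterations=20):
    """Alternate the updates of Eqs. (10) and (11) for a fixed number of passes."""
    n, c = len(points), len(init_centroids)
    dims = len(points[0])
    centroids = [list(w) for w in init_centroids]
    alpha = [1.0 / c] * c              # assumed start: equal class proportions
    power = 1.0 / (m - 1.0)
    mu = [[0.0] * c for _ in range(n)]
    for _ in range(iterations):
        # Eq. (11): memberships from penalised squared distances.
        for i, x in enumerate(points):
            cost = []
            for j in range(c):
                d2 = sum((xk - wk) ** 2 for xk, wk in zip(x, centroids[j]))
                # Small floor avoids division by zero (numerical guard).
                cost.append(max(d2 - nu * math.log(alpha[j]), 1e-12))
            for j in range(c):
                mu[i][j] = 1.0 / sum((cost[j] / cost[l]) ** power
                                     for l in range(c))
        # Eq. (10): class proportions and centroids from the memberships.
        weights = [[mu[i][j] ** m for j in range(c)] for i in range(n)]
        column = [sum(weights[i][j] for i in range(n)) for j in range(c)]
        total = sum(column)
        alpha = [max(column[j] / total, 1e-12) for j in range(c)]
        centroids = [[sum(weights[i][j] * points[i][k] for i in range(n))
                      / column[j] for k in range(dims)] for j in range(c)]
    return mu, centroids, alpha


# Two well-separated groups of invented two-band samples.
points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
          [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
mu, centroids, alpha = pfcm(points, init_centroids=[points[0], points[-1]])
```

Setting `nu=0.0` removes the penalty term, so the same loop then performs the plain FCM updates.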
In the last step, a defuzzification process should be applied to the fuzzy partition data to
obtain the final segmentation. A pixel is assigned to a cluster when its membership grade in
that cluster is the highest. The disadvantage of FCM is that, due to the use of an inner-product
norm induced distance metric, its performance is good only when the data set contains
clusters of roughly the same size and shape. Also, since it is unsupervised, the order of
occurrence of class fraction images cannot be predicted. However, the independence of this
algorithm from any type of data distribution makes it popular among clustering
algorithms.
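The defuzzification step described above amounts to taking, for each pixel, the cluster with the highest membership grade. A minimal sketch follows, with invented grades for three pixels and three clusters; the function name is hypothetical.

```python
def defuzzify(memberships):
    """Return, for each pixel, the index of the cluster with the highest grade."""
    return [max(range(len(row)), key=row.__getitem__) for row in memberships]


# One row of membership grades per pixel (illustrative values only).
grades = [[0.7, 0.2, 0.1],
          [0.1, 0.3, 0.6],
          [0.4, 0.5, 0.1]]
labels = defuzzify(grades)
```

The third pixel shows what is lost in this step: grades of 0.4 and 0.5 describe a mixed pixel, but the hard label records only the larger one.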
membership of the sample elements in classes n and m . The fuzzy set operators can be used
within the matrix building procedure to provide a fuzzy error matrix M . The assignment to
the element M (m, n) involves the computation of the degree of membership in the fuzzy
[Table: fuzzy error matrix; the row for class c holds the elements M(c,1), M(c,2), ..., M(c,c), with producer's accuracy and overall accuracy as summary entries.]
The fuzzy error matrix can be used as the starting point for descriptive techniques in the same
manner as used in the conventional error matrix.
$X_{i+}$ = row marginal total in the $i$th row of the confusion matrix.
elements.
where the quantities $X_{+i}$ and $X_{i+}$ represent the column marginal and row marginal totals of
membership grades respectively. Using this value of chance agreement, the Kappa coefficient or
Khat index is defined using Equation (17) (Stein, Meer and Gorte, 2002):
$$\hat{\kappa} = \frac{p_0 - p_c}{1 - p_c} \tag{17}$$
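As a worked illustration of the Khat index, the sketch below computes it from a small error matrix. It assumes the usual definitions, since the definitions in this text are incomplete: the observed agreement p0 is the diagonal sum divided by the grand total, and the chance agreement pc is the sum of products of matching row and column marginal totals divided by the squared grand total. The matrix entries are invented, and for a fuzzy error matrix they would be membership-grade sums rather than pixel counts.

```python
def kappa(matrix):
    """Khat index (p0 - pc) / (1 - pc) from a square error matrix."""
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    # Observed agreement: share of the total lying on the diagonal.
    p0 = sum(matrix[i][i] for i in range(k)) / total
    row_totals = [sum(matrix[i]) for i in range(k)]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    # Chance agreement from the row and column marginal totals.
    pc = sum(row_totals[i] * col_totals[i] for i in range(k)) / total ** 2
    return (p0 - pc) / (1.0 - pc)


# Hypothetical two-class error matrix.
error_matrix = [[40.0, 10.0],
                [10.0, 40.0]]
k_hat = kappa(error_matrix)
```

Here p0 = 0.8 and pc = 0.5, so the index is 0.3 / 0.5, about 0.6: the classification is 60% better than chance agreement would give.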
(e) Z statistic
This test determines if the results of two error matrices are statistically similar or not
(Congalton et al., 1983). It is calculated using Equation (18):
$$Z = \frac{\hat{\kappa}_a - \hat{\kappa}_b}{\sqrt{\sigma_a^2 + \sigma_b^2}} \tag{18}$$
where Z is the test statistic for significant difference in large samples, $\hat{\kappa}_a$ and
$\hat{\kappa}_b$ are the Khat indices for two error matrices a and b, with variances
$\sigma_a^2$ and $\sigma_b^2$.
6. Case Study
Fuzzy clustering algorithms explained in the previous section are applied to identify Paddy,
Semi-dry and Sugarcane crops using IRS LISS I (Linear Imaging Self Scanner) data in the
Bhadra command area for Rabi season of 1993. Bhadra dam is located in Chickmagulur
District of Karnataka state. The dam is situated 50 km upstream of the point where Bhadra
river joins Tunga, another tributary of Krishna river, and intercepts a catchment of almost
2000 sq.km. Bhadra reservoir system consists of a storage reservoir with a capacity of 2025
M m3, a left bank canal and a right bank canal with irrigable areas of 7,031 ha and 92,360 ha
respectively. Figure 6.1 shows location map of Bhadra command area. Major crops cultivated
in the command area are Paddy, Semi-dry and Sugarcane. Paddy transplantation is staggered
over a period of more than a month and semi-dry crops are sown considerably earlier to
Paddy. The command area is divided into three administrative divisions, viz., Bhadravati,
Malebennur and Davangere.
Satellite images used for the study were acquired from IRS LISS I (with a spatial resolution
of 72.5 m) on 20th February, 14th March and 16th April 1993. Figure 6.2
shows the standard FCC (False colour composite) of Bhadra command area on 16th April
1993. For the study area, ground truth was collected for various crops by scientists from
National Remote Sensing Agency, Hyderabad, during Rabi 1993, by visiting the field.
Use of the penalized fuzzy c-means algorithm requires selection of values for the number of
clusters c, the weighting exponent m, and the constant $\nu$. The algorithm is implemented with c = 20,
15, 9, 6 and 5 clusters, the value of m between 1.4 and 1.6, and the value of $\nu$ between 1.0 and
1.5. The algorithm gave good results with c = 6, 9; m = 1.5 and $\nu$ = 1.0.
Since paddy transplantation is staggered across the command area, satellite data of any one
date does not represent the same growth stage at all locations. In view of this heterogeneity in
crop calendar, in order to obtain complete estimate of area under any crop as well as to ensure
better discriminability, satellite data of three dates as mentioned in the previous section are
used to reflect the following features.
Table 6.1. Semi-dry crop classified using single date imagery with c = 5 and 15

Date                  c    Available      Correctly    Misclassified   Accuracy
                           ground truth   classified                   (%)
20th February, 1993   5    86             84           2               98
                      15   86             63           23              73
14th March, 1993      5    86             82           4               95
                      15   86             70           16              81
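The accuracy column of Table 6.1 appears to be the share of ground truth locations that were correctly classified, rounded to the nearest percent. The sketch below recomputes the misclassified counts and accuracies from the table's own figures under that assumption; the variable names are illustrative.

```python
# (date, c, available ground truth, correctly classified), taken from Table 6.1.
rows = [
    ("20th February, 1993", 5, 86, 84),
    ("20th February, 1993", 15, 86, 63),
    ("14th March, 1993", 5, 86, 82),
    ("14th March, 1993", 15, 86, 70),
]

# Misclassified locations are those not correctly classified.
misclassified = [available - correct for _, _, available, correct in rows]

# Accuracy as a rounded percentage of the available ground truth locations.
accuracy = [round(100 * correct / available)
            for _, _, available, correct in rows]
```

On both dates the accuracy drops as c increases from 5 to 15, since the semi-dry locations are split across more clusters.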
Using 14th March data with c = 5, the majority of the Paddy locations were classified into the water
cluster and some locations into Semi-dry crop, because at that time paddy had just been transplanted in
most of the areas and water therefore dominated over the crop seedlings. With
16th April data and c = 5, 42 Paddy locations were correctly classified out of 53 available
ground truth locations. Detailed results are given in Laxmi Raju (2003).