Unit-14_accuracy_assessment
Author: Anupam Anand, World Bank
All content following this page was uploaded by Anupam Anand on 04 May 2018.
Structure
14.1 Introduction
Objectives
14.2 Concept of Accuracy Assessment
Definition
Need for Accuracy Assessment
Sources of Errors
14.3 Consideration of Sampling Size and Scheme
14.4 Calculation of Classification Accuracy
Error Matrix
Generation of Error Matrix
Interpretation of Error Matrix
Limitations
14.5 Kappa Analysis
Calculation Steps
Advantages
14.6 Summary
14.7 Unit End Questions
14.8 References
14.9 Further/Suggested Reading
14.10 Answers
14.1 INTRODUCTION
In the previous unit, you have learnt about different image classification
methods which help us to create thematic maps. We also discussed the
advantages and limitations of some of the commonly used classification
algorithms. Both supervised and unsupervised classification need direct or
indirect information about the surface characteristics: for unsupervised
classification, the user must define the classes based on prior information about
the surface, and for supervised classification, the classes are based on training
samples from the surface. The quality and quantity of training samples, therefore,
have considerable implications for the accuracy of the classified images.
Once you have an interpreted map, the obvious next step is to find out how
accurate it is, because inaccuracies in the output affect the map’s utility, and
users have greater confidence in utilising data if its accuracy is high. Hence,
accuracy assessment is a very important part of interpretation, as it not only
tells you about the quality of the maps generated or images classified but also
provides you with a benchmark to compare different interpretation and
classification methods.
In this unit, you will learn about accuracy assessment and related concepts and
methods. We will also discuss the role of sampling size for the purpose of
accuracy assessment.
Processing and Classification of Remotely Sensed Images
Objectives
After studying this unit, you should be able to:
• define accuracy assessment;
• discuss the need for accuracy assessment;
• generate an error matrix for interpreted outputs;
• explain the role of sampling size in accuracy assessment; and
• list measures for accuracy assessment.
14.2.1 Definition
Accuracy defines correctness: it measures the degree of agreement between a
standard that is assumed to be correct and a map created from an image. A visually
interpreted map or a classified image is said to be highly accurate only when it
corresponds closely with the assumed standard.
Accuracy is referred to in many different contexts. In the context of image
interpretation, accuracy assessment determines the quality of information derived
from remotely sensed data. Assessment can be either qualitative or quantitative.
In a qualitative assessment, you determine whether a map ‘looks right’ by comparing
what you see in the map or image with what you see on the ground. A quantitative
assessment, by contrast, attempts to identify and measure the errors in a map
derived from remote sensing. In such assessments, you compare map data with
ground truth data, which is assumed to be 100% correct.
Accuracy of image classification is most often reported as a percentage correct
and is represented in terms of consumer’s accuracy and producer’s accuracy.
The consumer’s accuracy (CA) is computed as the ratio of the number of correctly
classified pixels in a category to the total number of pixels assigned to that
category. It takes errors of commission into account by telling the consumer that,
of all areas identified as category X, a certain percentage are actually correct. The
producer’s accuracy (PA) informs the image analyst of the number of pixels
correctly classified in a particular category as a percentage of the total number
of pixels actually belonging to that category in the image. Producer’s accuracy
measures errors of omission.
The term consumer’s accuracy is used when a classified image is examined from
the user’s point of view. Producer’s accuracy is used when the same image is
viewed from the analyst’s perspective.

14.2.2 Need for Accuracy Assessment

The need for assessing the accuracy of a map generated from any remotely sensed
product has become a universal requirement and an integral part of any
classification project. The user community needs to know the accuracy of the
classified image data being used. Moreover, different projects have different
accuracy requirements, and only those classified images which are above a
certain level of accuracy can be used. Furthermore, accuracy becomes a
critical issue when working in a Geographical Information System (GIS)
framework where you use several layers of remotely sensed data. In such
cases, it is very important to know the overall accuracy, which depends
upon knowing the accuracy of each of the data layers.
There are a number of reasons why assessment of accuracy is so important.
Some of them are given below:
• accuracy assessment allows self-evaluation and learning from mistakes in
the classification process;
• it provides a quantitative comparison of various methods, algorithms and
analysts; and
• it ensures greater reliability of the resulting maps/spatial information
for use in decision-making.
The need for accuracy assessment is emphasised in literature as well as in
anecdotal evidence. For example, maps of wetlands from various states of
India (e.g., Jammu and Kashmir, Rajasthan, Tamil Nadu, West Bengal) have
been made by several central, state and local agencies using techniques that
included satellite images, aerial photographs and field data. Simply comparing
the various wetland maps would yield little agreement about the location, size and
extent of the wetlands. In the absence of a valid accuracy assessment, you may never
know which of these maps to use.
A map using remotely sensed or other spatial data cannot be regarded as the
final product without taking necessary steps towards assessing accuracy or
validity of that map.
A number of methods exist to investigate accuracy/error in spatial data
including visual inspection, non-site-specific analysis, generating difference
images, error budget analysis and quantitative accuracy assessment.
Fig. 14.1: (a) Non-site-specific accuracy, in which two images are compared based on
their total areas. Note that the area of image 1 (i.e. A+B+C) is equal to the area
of image 2 (i.e. A+B+C); and (b) site-specific accuracy, in which two images are
compared on a site-by-site basis (i.e. cell-by-cell or pixel-by-pixel) (source: modified
from Campbell, 1996)
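The difference between the two kinds of comparison in Fig. 14.1 can be illustrated with a small sketch. The Python below is a hypothetical example, not taken from the source: the helper names (`class_areas`, `site_specific_agreement`) and the two 2 × 3 "images" are made up for illustration. Both images contain the same area (pixel count) of each class, so a non-site-specific comparison finds perfect agreement, while a pixel-by-pixel comparison finds none.

```python
from collections import Counter

def class_areas(image):
    """Count pixels (area) per class label in an image given as a list of rows."""
    return Counter(label for row in image for label in row)

def site_specific_agreement(img1, img2):
    """Fraction of pixel positions at which the two images carry the same label."""
    pairs = [(a, b) for r1, r2 in zip(img1, img2) for a, b in zip(r1, r2)]
    return sum(a == b for a, b in pairs) / len(pairs)

# Two hypothetical 2 x 3 images with identical class areas (A=2, B=2, C=2)
image_1 = [["A", "A", "B"],
           ["B", "C", "C"]]
image_2 = [["B", "C", "A"],
           ["C", "A", "B"]]

print(class_areas(image_1) == class_areas(image_2))  # non-site-specific: True
print(site_specific_agreement(image_1, image_2))      # site-specific: 0.0
```

Equal class totals say nothing about whether the classes are in the right place, which is why the error matrix discussed later is built from site-specific comparisons.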
Check Your Progress I (Spend 5 mins)
1) List the prerequisites for accuracy assessment.
The number of samples for each category can also be weighted based on the
relative importance of that category within the objectives of the mapping, or on
the inherent variability within each of the categories. Sometimes it is better to
concentrate the sampling on the categories of interest and increase their
number of samples while reducing the number of samples taken in less
important categories. Also, it may be useful to take fewer samples in
categories that show little variability, such as water or forest plantations, and
to increase sampling in categories that are more variable, such as uneven-aged
forests or riparian areas. In summary, the goal is to balance the statistical
recommendations for obtaining an adequate sample, from which to generate an
appropriate error matrix, against the objectives, time, cost and practical
limitations of the mapping project.
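One way to put the weighting idea above into practice is sketched below. This is an illustrative assumption, not a procedure from the source: the function `allocate_samples`, the numeric weights and the guaranteed minimum per category are all hypothetical. Each category first receives a minimum number of samples; the rest are shared in proportion to a weight that stands for importance or variability, and rounding leftovers go to the largest fractional shares.

```python
def allocate_samples(total, weights, minimum):
    """Split `total` reference samples among categories.

    Every category first gets `minimum` samples; the rest are shared in
    proportion to `weights` (higher weight = more important or more
    variable category). Rounding leftovers go to the largest fractional shares.
    """
    if minimum * len(weights) > total:
        raise ValueError("total is too small for the requested minimum per category")
    remaining = total - minimum * len(weights)
    weight_sum = sum(weights.values())
    shares = {c: remaining * w / weight_sum for c, w in weights.items()}
    alloc = {c: minimum + int(s) for c, s in shares.items()}
    leftover = total - sum(alloc.values())
    by_fraction = sorted(shares, key=lambda c: shares[c] - int(shares[c]), reverse=True)
    for c in by_fraction[:leftover]:
        alloc[c] += 1
    return alloc

# Hypothetical weights: low-variability classes 1, high-variability classes 3
weights = {"water": 1, "forest plantation": 1, "uneven-aged forest": 3, "riparian": 3}
print(allocate_samples(200, weights, minimum=20))
# {'water': 35, 'forest plantation': 35, 'uneven-aged forest': 65, 'riparian': 65}
```

The weights are a judgement call for each project; the sketch only shows how such judgements can be turned into sample counts that add up to the available budget.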
Along with sample size, the sampling scheme is an important part of any accuracy
assessment. Selection of the proper scheme is absolutely critical in generating
an error matrix that is representative of the entire classified image. A poor choice
of sampling scheme can introduce significant biases into the error matrix, which
may over- or underestimate the true accuracy. In addition, the use of a proper
sampling scheme may be essential depending on the analysis techniques to be
applied to the error matrix. Many researchers have expressed opinions about the
proper sampling scheme to use, including everything from simple random
sampling to stratified, systematic and unaligned sampling.
Despite all these opinions, very little work has actually been performed in this
area. One study that carried out sampling simulations on three geographically
diverse areas (forest, agriculture and rangeland) concluded that in all cases
simple random sampling and stratified random sampling provided satisfactory
results. Despite the desirable statistical properties of simple random sampling,
this sampling scheme is not always very practical to apply. Simple random
sampling tends to under-sample small but possibly very important areas unless
the sample size is significantly increased. For this reason, stratified random
sampling is recommended, where a minimum number of samples is selected from
each stratum (i.e. category). Even stratified random sampling can be somewhat
impractical, because ground information for the accuracy assessment has to be
collected at random locations on the ground.
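Stratified random sampling, as recommended above, can be sketched on a classified image as follows. This is a minimal illustration under stated assumptions: the classified image is a small list of label rows, the function name `stratified_random_sample` is hypothetical, and each class is treated as one stratum from which up to `n_per_class` pixel locations are drawn without replacement. A small class (the hypothetical "wetland" below) still receives samples, which simple random sampling could easily miss.

```python
import random

def stratified_random_sample(classified, n_per_class, seed=None):
    """Draw up to `n_per_class` random pixel locations from every class
    (stratum) of a classified image stored as a list of rows of labels."""
    rng = random.Random(seed)
    strata = {}
    for r, row in enumerate(classified):
        for c, label in enumerate(row):
            strata.setdefault(label, []).append((r, c))
    # sample without replacement inside each stratum; small strata give all pixels
    return {label: rng.sample(locations, min(n_per_class, len(locations)))
            for label, locations in strata.items()}

# Hypothetical 3 x 4 classified map with a small "wetland" class
classified = [["forest", "forest", "crop", "crop"],
              ["forest", "wetland", "crop", "crop"],
              ["forest", "forest", "crop", "water"]]
samples = stratified_random_sample(classified, n_per_class=2, seed=1)
for label, points in samples.items():
    print(label, points)
```

Note that the strata come from the classified image, which reflects the second practical problem discussed below: the sample locations can only be chosen after classification.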
There are two problems which arise when using random locations:
• the locations can be very difficult to access; and
• they can only be selected after the classification has been performed.
The second condition means that accuracy assessment data can only be collected
late in the project, instead of in conjunction with the training data collection,
thereby increasing the costs of the project. In addition, in some projects the time
between the beginning of the project and the accuracy assessment may be so long
as to cause temporal problems in collecting reference data.
Once a classification exercise has been carried out, there is a need to determine the
degree of error in the end product, i.e. in the categories identified on the map.
Errors are the result of incorrect labeling of pixels. The most commonly used method
of representing the degree of accuracy of a classification is to build a k×k array,
where k represents the number of categories. For example, in Table 14.1, the
left-hand side of the table is marked with the categories on the standard (i.e.
reference) map/data. The top of the same table is marked with the same k categories,
but these represent the end product of a created map to be evaluated. The values in
the matrix indicate the numbers of pixels. This arrangement establishes a standard
form which helps to find site-specific error in the end product, and is known as an
error matrix. The error matrix is useful for determining the overall errors for each
category and the misclassifications by category; as a result, it is also known as a
confusion matrix. The strength of a confusion matrix is that it identifies not only
the nature of the classification errors but also their quantities.
An error matrix is a square array of rows and columns in which each row and column
represents one category/class in the interpreted map. The error matrix is also known
as a confusion matrix, an evaluation matrix or a contingency table.
An error matrix is an array of rows and columns that can be used to evaluate the
degree of correctness of a classified image. According to Campbell (1987), it is
a method of reporting site-specific error. It is derived from a comparison of
two maps: a standard (reference) map and a classified map. It has a
two-dimensional arrangement in which the rows show the reference data and
the columns show the classified data.
For such a comparison, the classifier or analyst should make a network of
appropriately sized (i.e. neither very small nor very large) uniform cells that form
the units of comparison for site-specific accuracy assessment. The two images are
then superimposed, either manually or digitally depending on the availability of
the images. The superimposed images are then analysed cell by cell in the case of
manual comparison, or pixel by pixel in the case of digital assessment, and for each
cell/pixel the dominant category shown on the standard/reference data and the
category of the corresponding cell/pixel on the classified image are tabulated. The
classifier also keeps a count of the number of cells or pixels in each reference
category as they are assigned to categories on the created image (see Table 14.1).
Finally, the summation of the tabulation forms the basis for generating the error
matrix.
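The tally described above (one reference label and one classified label per cell or pixel, summed into a k × k array) can be sketched in Python. The function name `error_matrix`, the category order and the five paired test pixels are hypothetical; the layout follows the convention stated earlier in this unit, with rows for reference data and columns for classified data.

```python
def error_matrix(reference, classified, categories):
    """Tally paired pixel labels into a k x k error matrix.

    Rows follow the reference (ground truth) category and columns the
    category on the classified image.
    """
    if len(reference) != len(classified):
        raise ValueError("reference and classified samples must be paired")
    index = {cat: i for i, cat in enumerate(categories)}
    k = len(categories)
    matrix = [[0] * k for _ in range(k)]
    for ref, cls in zip(reference, classified):
        matrix[index[ref]][index[cls]] += 1
    return matrix

# Hypothetical paired labels for five test pixels
reference = ["water", "water", "forest", "urban", "forest"]
classified = ["water", "forest", "forest", "urban", "forest"]
for row in error_matrix(reference, classified, ["forest", "water", "urban"]):
    print(row)
# [2, 0, 0]
# [1, 1, 0]
# [0, 0, 1]
```

Diagonal cells count agreements; the single off-diagonal entry records one water reference pixel that was labelled forest on the classified image.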
You can read about the various components of the confusion matrix outlined below:
• rows correspond to classes in the ground truth map (or test set);
• columns correspond to classes in the classified image.
Accuracy or Producer’s Accuracy
The accuracy of a category is calculated as given below:
$$\text{Accuracy of a category} = \frac{\text{number of correctly classified pixels of the category}}{\text{total number of pixels of the category in the ground truth (row total)}}$$
The water category of Table 14.1, for example, has an accuracy of 0.89, meaning that
approximately 89% of the water ground truth pixels also appear as water pixels in
the classified image. This statistic reflects errors of omission: 1 - 0.89 = 0.11, i.e.
about 11% of the water ground truth pixels were omitted from the water class.
Accuracy for the water category of Table 14.1 can be calculated as given below:
Total number of correct pixels for water = 240.
Total number of pixels in the water row = 0+20+0+0+0+240+10 = 270.
Hence, accuracy for water = 240/270 = 0.89.
The average accuracy is calculated as given below:
$$\text{Average accuracy} = \frac{\text{sum of all accuracy figures in the accuracy column}}{\text{total number of categories in the test set}}$$
The average accuracy of the data given in Table 14.1
= (0.83+0.71+0.58+0.56+0.88+0.89) / 6
= 4.45 / 6 ≈ 0.74
This means that the average accuracy of the classification shown in Table 14.1 is
about 74% (0.74).
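The per-category accuracy (producer’s accuracy) and its unweighted average can be computed directly from an error matrix. Since the full Table 14.1 is not reproduced here, the sketch uses a hypothetical 3-class matrix whose water row (20 + 240 + 10 = 270, with 240 correct) is chosen only to mimic the water figures quoted above. The function names are illustrative, and the omission error is taken as 1 minus the accuracy.

```python
def producers_accuracy(matrix, categories):
    """Per-category accuracy (producer's accuracy) = diagonal / row total,
    where rows hold the reference data. Omission error = 1 - accuracy."""
    result = {}
    for i, cat in enumerate(categories):
        acc = matrix[i][i] / sum(matrix[i])
        result[cat] = {"accuracy": acc, "omission_error": 1 - acc}
    return result

def average_accuracy(per_class):
    """Unweighted mean of the per-category accuracies."""
    return sum(v["accuracy"] for v in per_class.values()) / len(per_class)

# Hypothetical 3-class matrix (rows = reference, columns = classified)
categories = ["forest", "water", "urban"]
matrix = [[50, 10, 5],
          [20, 240, 10],
          [5, 30, 80]]
pa = producers_accuracy(matrix, categories)
print(round(pa["water"]["accuracy"], 2))   # 0.89
print(round(average_accuracy(pa), 2))      # 0.78
```

A category with no reference pixels would make the row total zero, so in practice every category in the test set needs at least one reference sample.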
Reliability or User’s Accuracy
User’s accuracy is defined as the probability that a pixel classified on the
image actually represents that category on the ground. The figures in the
reliability row present the reliability (user’s accuracy) of the classes in the
classified image (Table 14.1). It is calculated as given below:
$$\text{Reliability of a category} = \frac{\text{number of correctly classified pixels of the category}}{\text{total number of pixels assigned to the category in the classified image (column total)}}$$
The water category of Table 14.1, for example, has a reliability of 0.86, meaning
that approximately 86% of the water pixels in the classified image actually
represent water on the ground. This statistic reflects errors of commission:
1 - 0.86 = 0.14, i.e. about 14% of the pixels labelled water in the classified image
belong to other categories on the ground.
Reliability for the water category of Table 14.1 can be calculated as shown below:
Total number of correct pixels for water = 240.
Total number of pixels in the water column = 10+10+10+10+0+240 = 280.
Hence, reliability for water = 240/280 = 0.86.
The average reliability is calculated as given below:
$$\text{Average reliability} = \frac{\text{sum of all reliability figures in the reliability row}}{\text{total number of categories in the test set}}$$
Average reliability of the data given in Table 14.1
= (0.90+0.76+0.88+0.92+0.51+0.86) / 6
= 4.83 / 6 ≈ 0.80
This indicates that the average reliability of the classification shown in Table 14.1
is about 80% (0.80).
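Reliability (user’s accuracy) uses column totals instead of row totals. The sketch below reuses a hypothetical 3-class matrix whose water column (10 + 240 + 30 = 280, with 240 correct) only mimics the water figures quoted above; it is not Table 14.1. The function names are illustrative, and the commission error is taken as 1 minus the reliability.

```python
def users_accuracy(matrix, categories):
    """Per-category reliability (user's accuracy) = diagonal / column total,
    where columns hold the classified image. Commission error = 1 - reliability."""
    result = {}
    for j, cat in enumerate(categories):
        column_total = sum(row[j] for row in matrix)
        rel = matrix[j][j] / column_total
        result[cat] = {"reliability": rel, "commission_error": 1 - rel}
    return result

def average_reliability(per_class):
    """Unweighted mean of the per-category reliabilities."""
    return sum(v["reliability"] for v in per_class.values()) / len(per_class)

# Hypothetical 3-class matrix (rows = reference, columns = classified)
categories = ["forest", "water", "urban"]
matrix = [[50, 10, 5],
          [20, 240, 10],
          [5, 30, 80]]
ua = users_accuracy(matrix, categories)
print(round(ua["water"]["reliability"], 2))   # 0.86
print(round(average_reliability(ua), 2))      # 0.79
```

Comparing this output with the row-based accuracy of the same matrix shows the pattern described next: a class can have high accuracy but low reliability (area overestimated), or the reverse (area underestimated).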
From the accuracy and reliability values for different classes given in Table 14.1, it
can be concluded that the test set classes crop and urban were difficult to classify,
as many of their test set pixels were excluded from the crop and urban categories;
thus the areas of these classes in the classified image are probably underestimated.
On the other hand, the open land class in the image is not very reliable, as many test
set pixels of other categories were included in the open land category in the
classified image. Thus, the area of the open land category in the classified image is
probably overestimated.
Overall Accuracy
We have discussed the individual classes and their accuracies. It is also
desirable to calculate a measure of accuracy for the entire image across all
classes present in the classified image. The collective accuracy of the map for
all classes can be described using overall accuracy, which is the proportion of
pixels correctly classified: the sum of the diagonal elements of the error matrix
divided by the total number of pixels in the matrix.
For the sample data presented in Table 14.1, the overall accuracy
= (440+220+210+240+230+240) / (490+290+240+260+450+280)
= 1580 / 2010 ≈ 0.786
≈ 79%.
This indicates that the overall accuracy of the classification shown in Table 14.1
is about 79%.
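Overall accuracy needs only the diagonal and the grand total of the error matrix. The sketch below applies a hypothetical `overall_accuracy` function to an illustrative 3-class matrix (not Table 14.1), then repeats the arithmetic with the diagonal values and column totals quoted above for Table 14.1.

```python
def overall_accuracy(matrix):
    """Sum of the diagonal (correctly classified pixels) / total pixels."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# Hypothetical 3-class matrix (rows = reference, columns = classified)
matrix = [[50, 10, 5],
          [20, 240, 10],
          [5, 30, 80]]
print(round(overall_accuracy(matrix), 3))   # 0.822

# Same arithmetic with the diagonal and column totals quoted for Table 14.1
diagonal = [440, 220, 210, 240, 230, 240]
column_totals = [490, 290, 240, 260, 450, 280]
print(round(sum(diagonal) / sum(column_totals), 3))   # 0.786
```

Unlike the average accuracy, this figure weights each category by its number of test pixels, so large classes dominate it.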
14.4.4 Limitations
The use of the confusion matrix for accuracy assessment has become standard
practice in quality assessment of remote sensing products. However, it is not
free from limitations, owing to three crucial assumptions involved in
classification accuracy assessment:
• that the reference data are truly representative of the entire classification,
which is quite unlikely;
• that the reference data and classified image are perfectly co-registered, which
is practically impossible; and
• that there is no error in the reference data, which again is highly unlikely.
The actual accuracy of our classification is unknown because it is impossible
to perfectly assess the true class of every pixel. It is possible to produce a
misleading assessment of classification accuracy. Depending on how the
reference data are collected, our estimate of accuracy may be either
conservative or optimistic. If our estimate is less than the actual classification
accuracy, then we have made a conservative estimate. Some of the sources of
conservative estimates are:
• errors in reference data
• positional errors and
• minimum mapping unit of reference grid.
Table 14.4: Error matrix showing the products of row and column marginals
based on Table 14.3
Rows: ground truth (i.e. reference image); columns: classification result (i.e. image
to be evaluated)

                 Forest         Water          Urban
Forest           30×57=1710     30×57=1710     40×57=2280
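The products of row and column marginals, as in Table 14.4, are what the Kappa (Khat) statistic uses to correct for chance agreement. A commonly used form of the estimator, assumed here because the calculation steps of Table 14.3 are not reproduced in this section, is

$$\hat{K} = \frac{N\sum_{i=1}^{k} x_{ii} - \sum_{i=1}^{k} x_{i+}\,x_{+i}}{N^{2} - \sum_{i=1}^{k} x_{i+}\,x_{+i}}$$

where N is the total number of pixels, x_ii the diagonal counts, and x_i+ and x_+i the row and column totals. The Python sketch below implements that formula on a hypothetical 3-class matrix; the function name `kappa_hat` is illustrative.

```python
def kappa_hat(matrix):
    """Khat = (N * sum(x_ii) - sum(x_i+ * x_+i)) / (N**2 - sum(x_i+ * x_+i)),
    with rows = reference data and columns = classified image."""
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    diagonal = sum(matrix[i][i] for i in range(k))
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    # diagonal of the table of row x column marginal products (cf. Table 14.4)
    chance = sum(row_totals[i] * col_totals[i] for i in range(k))
    return (n * diagonal - chance) / (n * n - chance)

# Hypothetical 3-class matrix (rows = reference, columns = classified)
matrix = [[50, 10, 5],
          [20, 240, 10],
          [5, 30, 80]]
print(round(kappa_hat(matrix), 3))   # 0.676
```

For this matrix the overall accuracy is 370/450 ≈ 0.82, while Khat is lower (about 0.68) because part of the agreement would be expected by chance from the marginal totals alone.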
14.5.2 Advantages
One of the advantages of using this method is that you can statistically
compare two classification products. For example, two classification maps can
be made using different algorithms and you can use the same reference data to
verify them. Two Khat values, Khat1 and Khat2, can then be derived, and the
variance of each Khat can also be calculated. The Kappa coefficient, unlike the overall accuracy,
includes errors of omission and commission. Computation of the Kappa
kappa and average mutual information (AMI). AMI is based on the use of a
posteriori entropies for one map given the class identity from the second map, and
it allows evaluation of individual class performance. Unlike the percentage correct
or Kappa, which measure correctness, AMI measures the consistency between two
maps. It provides an alternative viewpoint because it is used to assess the similarity
of maps. For example, it can be used to compare the consistency between maps of
the same region that have entirely different themes.
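A consistency measure of this kind can be sketched as the mutual information between two maps, computed from their cross-tabulation: the entropy of one map minus its a posteriori (conditional) entropy given the class on the other map. The sketch assumes that this mutual information, in bits, corresponds to the AMI described above; the function name and the example tables are hypothetical. Because the table need not be square, the two maps can have different themes (e.g. land cover against soil classes).

```python
import math

def average_mutual_information(table):
    """Mutual information (bits) between two maps from their cross-tabulation.

    table[i][j] = number of pixels in class i of map A and class j of map B.
    The table need not be square, so the maps may have different themes.
    """
    n = sum(sum(row) for row in table)
    p_a = [sum(row) / n for row in table]
    p_b = [sum(col) / n for col in zip(*table)]
    ami = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            if count:  # empty cells contribute nothing (0 * log 0 -> 0)
                p = count / n
                ami += p * math.log2(p / (p_a[i] * p_b[j]))
    return ami

# Hypothetical cross-tabulation: 2 land-cover classes vs 3 soil classes
table = [[2, 0, 0],
         [0, 1, 1]]
print(average_mutual_information(table))             # 1.0
print(average_mutual_information([[1, 1], [1, 1]]))  # 0.0
```

A value of 0 means that knowing the class on one map tells you nothing about the other; higher values mean the maps are more consistent, regardless of whether their labels match.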
Accuracy assessment is still relatively new and is an evolving area in remote sensing.
The effectiveness of different methods and measures is still being explored and
debated.
14.6 SUMMARY
In this unit, we have studied the concepts of accuracy assessment. These can be
summarised in the following points:
• Assessing accuracy for each category as well as for the whole image is
essential for comparing the results of various classification techniques and
for judging the quality and reliability of the results obtained.
• Accuracy in image classification is affected by errors of inclusion
(commission) and errors of exclusion (omission).
• Sampling size is an important consideration for accuracy assessment, and a
sufficient number of samples should be collected.
• The error/confusion matrix can be used for accuracy and reliability
assessments.
• Overall accuracy is a measure of accuracy for the whole image across all
categories.
• The Kappa coefficient is another method for accuracy assessment, having a
number of advantages over other methods.
14.10 ANSWERS
Check Your Progress I
Standard (reference) image and classified image data are the basic
prerequisites for accuracy assessment.
GLOSSARY
Contrast: Ratio between the energy emitted or reflected by an object and its
immediate surroundings.
Contrast ratio: The ratio of reflectances between the brightest and darkest
parts of an image.
Distortions: Errors in the remotely sensed image in terms of pixel shape,
position or recorded value.
Median: The central value in a set of data such that an equal number of values are
greater than and less than the median.
Mode: The most commonly occurring value in a set of data. For an image
histogram, the peak of the curve represents the mode.
Pixel: A picture element; the smallest element of an image that has been
electronically coded in an array.
Training area: A sample of the Earth’s surface with known properties; the
statistics of the imaged data within the area are used to determine decision
boundaries in classification.
ABBREVIATIONS