Data Quality and Error Analysis in GIS

Uploaded by

Elavarasan Ela
Copyright
© © All Rights Reserved
Joshua Greenfeld, PhD, LS
Professor emeritus, NJIT
Professor, Israel Institute of Technology

Data Quality and Error Analysis in GIS (c) Dr. J. Greenfeld

ABSTRACT
One of the major challenges of GIS is dealing with uncertainty and assessing the quality of spatial information.
The challenge is to assess the quality of spatial information, not just the quality of spatial data.
Many professionals are involved in providing GIS services. Surveying is only one of them.
For surveying to make a mark on the GIS industry and become a prominent stakeholder in GIS, it has to offer some expertise that most other professionals cannot.
Unfortunately, the ability to collect spatial data is becoming a common skill, and the surveyor's positioning expertise is not as unique as it used to be.

ABSTRACT
There is one area where surveyors have an advantage over other GIS professionals: their propensity and ability to understand and quantify spatial errors and accuracies.
In surveying, uncertainty and quality assessment is mostly confined to positioning or positional accuracies. The quality of surveying results is typically assessed on the basis of measurement accuracy and the propagation of these accuracies into other computed quantities.
In GIS, uncertainty and quality issues are much broader. In addition to positional accuracy there is: attribute accuracy, completeness of the data, sources and lineage of the data, logical consistency, fuzziness of the spatial phenomenon, currency of the data and other uncertainty issues.

Objective
The objective of this seminar is to enable surveyors to understand the broader issues of accuracy assessment beyond positional accuracies.
It will outline the extended definition of uncertainty and quality as it applies to GIS.
It will include an overview of the errors and uncertainties that could impact the quality of spatial data.
This will be followed by discussing the impact of errors in spatial data on spatial information.
The ISO geospatial standards will be reviewed as well.
Finally, some practical tools and examples of numerical and statistical assessment of uncertainty and quality of spatial information will be discussed and demonstrated.
Importance of Quality
Gain confidence in geodata
Reduce users' complaints
Get customer's satisfaction
Minimize consequential costs caused by decisions or actions based on erroneous data

No unified definition of data quality
1. Data Quality refers to the degree of excellence exhibited by the data in relation to the portrayal of the actual phenomena. (GIS Glossary)
2. The state of completeness, validity, consistency, timeliness and accuracy that makes data appropriate for a specific use. (Government of British Columbia)
3. The totality of features and characteristics of data that bears on their ability to satisfy a given purpose; the sum of the degrees of excellence for factors related to data. (Glossary of Quality Assurance Terms)

No unified definition of data quality
4. Information Quality: the fitness for use of information; information that meets the requirements of its authors, users, and administrators. (Martin Eppler)
5. Data quality: The processes and technologies involved in ensuring the conformance of data values to business requirements and acceptance criteria
6. ISO/PAS 26183:2006 defines product data quality as a measure of the accuracy and appropriateness of product data, combined with the timeliness with which those data are provided to all the people who need them.
And more...

Error and Uncertainty in GIS
• One of the major problems currently existing within GIS is the aura of accuracy surrounding digital geographic data
• Often hardcopy map sources include a map reliability rating or confidence rating in the map legend
• This rating helps the user in determining the fitness for use of the map
• However, rarely is this information encoded in the digital conversion process
• Often, because GIS data is in digital form and can be represented with a high precision, it is considered to be totally accurate
Error and Uncertainty in GIS
• In reality, a buffer exists around each feature which represents the actual positional location of the feature
• For example, data captured at the 1:20,000 scale commonly has a positional accuracy of ± 20 metres
• This means the actual location of features may vary 20 metres in either direction from the identified position of the feature on the map
• Considering that the use of GIS commonly involves the integration of several data sets, usually at different scales and quality, one can easily see how errors can be propagated during processing

Error and Uncertainty in GIS
• The ease with which geographic data in a GIS can be used at any scale highlights the importance of detailed data quality information.
• Although a data set may not have a specific scale once it is loaded into the GIS database, it was produced with levels of accuracy and resolution that make it appropriate for use only at certain scales, and in combination with data of similar scales.

Error and Uncertainty in GIS
• Error - Two sources of error: Inherent and Operational
• Inherent error is the error present in source documents and data
• Operational error is the amount of error produced through the data capture and manipulation functions of a GIS
• Both contribute to the reduction in quality of the products that are generated by GIS.

Error and Uncertainty in GIS
• Possible sources of operational errors include:
  • Mislabelling of areas on thematic maps
  • Misplacement of horizontal (positional) boundaries
  • Human error in digitizing
  • Classification error
  • GIS algorithm inaccuracies
  • Human bias
• While error will always exist in any scientific process, the aim within GIS processing should be to identify existing error in data sources and minimize the amount of error added during processing
Errors in Database Creation
Errors are introduced at almost every step of database creation

Error and Uncertainty in GIS
Concerns the degree to which the data exhausts the universe of possible items
Are all possible objects included within the database?
Affected by rules of selection, generalization and scale

[Figure: Error induced by data cleaning. Longley et al., chapter 6, pages 132-133]

[Figure: Merging. Longley et al., chapter 6, pages 132-133]
[Figure: classification error -- difference in pixel class between the map and a reference]

[Figure: four panels labelled 1939, 1956, 1971 and 1995]

Error and Uncertainty in GIS
• Because of cost constraints it is often more appropriate to manage error than attempt to eliminate it!
• There is a trade-off between reducing the level of error in a database and the cost to create and maintain the database
• An awareness of the error status of different data sets will allow the user to make a subjective statement on the quality and reliability of a product derived from GIS processing
• The validity of any decisions based on a GIS product is directly related to the quality and reliability rating of the product
Error and Uncertainty in GIS
• Depending upon the level of error inherent in the source data, and the error operationally produced through data capture and manipulation, GIS products may possess significant amounts of error

Error and Uncertainty in GIS
Tools to get a handle on uncertainty
Models of uncertainty: methods for assessing and describing error
Error propagation (during analysis)
Fuzzy approaches (membership of classes)
Sensitivity analysis (effect of errors)

Error and Uncertainty in GIS
Error assessment, reporting, interpretation - more difficult
Quality of data: standards and metadata
But: No professional GIS currently in use can present the user with information about the confidence limits that should be associated with the results of an analysis.

Classification of Errors in GIS [Hun ‘92]
Source of Error: Data Collection and Compilation; Data Processing; Data Usage
Forms of Error (Primary): Positional Error; Attribute Error
Forms of Error (Secondary): Logical Error; Completeness
Resulting in: Final Product Errors
Uncertainty
Uncertainties in geographic information originate from different sources:
Uncertainty due to the inherent nature of geography: different interpretations can be equally valid;
Cartographic uncertainty resulting in positional and attribute errors;
Conceptual uncertainty as a result of differences in “what it is that is being mapped”.

Uncertainty (Definition of a Forest)
[Figure: scatter plot of forest definitions by country and organization; x-axis Canopy Coverage (%), 0 to 90; y-axis Tree Height (m), 0 to 16. Labelled points: Zimbabwe, Sudan, Turkey, Tanzania, Morocco, Ethiopia, Belgium, Malaysia, Somalia, Denmark, UN, New Zealand, U.S., Australia, UNESCO, Israel, Japan, Mexico, South Africa, Switzerland, Kenya, Portugal, Estonia]

Internal and External Data Quality
internal quality - Corresponds to the level of similarity that exists between “perfect” data to be produced (what is called “nominal ground”) and the data actually produced
external quality - Corresponds to the similarity between the data produced and user needs
[Diagram: “Data that should have been produced” is compared with “Data produced” (Internal Quality); “Data produced” is compared with User needs 1, User needs 2, ..., User needs n (External Quality)]

Characteristics to define the internal quality
– Completeness: presence and absence of features, their attributes and relationships.
– Logical consistency: degree of adherence to logical rules of data structure, attribution, and relationships (data structure can be conceptual, logical or physical).
– Positional accuracy: accuracy of the position of features.
– Temporal accuracy: accuracy of the temporal attributes and temporal relationships of features.
– Thematic accuracy: accuracy of quantitative attributes and the correctness of non-quantitative attributes and of the classifications of features and their relationships.
six characteristics to define the external quality (Beard and Vallière)
– Definition: to evaluate whether the exact nature of a data and the object that it describes, that is, the “what”, corresponds to user needs (semantic, spatial and temporal definitions).
– Coverage: to evaluate whether the territory and the period for which the data exists, that is, the “where” and the “when”, meet user needs.
– Lineage: to find out where data come from, their acquisition objectives, the methods used to obtain them, that is, the “how” and the “why”, and to see whether the data meet user needs.
– Precision: to evaluate what data is worth and whether it is acceptable for an expressed need (semantic, temporal, and spatial precision of the object and its attributes).
– Legitimacy: to evaluate the official recognition and the legal scope of data and whether they meet the needs of de facto standards, respect recognized standards, have legal or administrative recognition by an official body, or legal guarantee by a supplier, etc.;
– Accessibility: to evaluate the ease with which the user can obtain the data analyzed (cost, time frame, format, confidentiality, respect of recognized standards, copyright, etc.).

Conceptual model of uncertainty in spatial data
[Diagram: Uncertainty
  Well Defined Objects: Error (Probability Theory)
  Poorly Defined Objects:
    Vagueness (Fuzzy Set Theory)
    Ambiguity:
      Discord (Expert Opinion, Dempster Schafer)
      Non-Specificity (Endorsement Theory, Fuzzy Set Theory)]

Definitions of geographic objects
An example of a well-defined geographical object is land ownership. The boundary between land parcels is commonly marked on the ground, and shows an abrupt and total change in ownership
Poorly defined geographical objects are the rule in natural resource mapping. The conceptualization of mappable phenomena and the spaces they occupy is rarely clear-cut
There are rarely sharp transitions from one vegetation type to another
In a region there could be several types of vegetation
Five dimensions of objects A and B
[Diagram: objects A and B compared along five dimensions: Space, Scale, Time, Relation, Attribute]

Error
Ideally, if an object is conceptualized as being definable in both attribute and spatial dimensions, then it has a Boolean occurrence; any location is either part of the object, or it is not.
Within GIS, for a number of reasons, a location or the assignment of an object to a location or to a class may be expressed as a probability.

Common reasons for a database being in error
Type of Error: Cause of error
Measurement: Measurement of a property is erroneous.
Assignment: The object is assigned to the wrong class because of measurement error by the scientist in either the field or laboratory or by the surveyor.
Class Generalization: Following observation in the field, and for reasons of simplicity, the object is grouped with objects possessing somewhat dissimilar properties.
Spatial Generalization: Generalization of the cartographic representation of the object before digitizing, including displacement, simplification, etc.
Entry: Data are miscoded during (electronic or manual) entry in a GIS.
Temporal: The object changes character between the time of data collection and the time of database use.
Processing: In the course of data transformations an error arises because of rounding or algorithm error.
Vagueness
Sorites Paradox (is a bald man with 1 additional hair still bald?)
When, exactly, is a house a house; a settlement, a settlement; a city a city; an oak woodland, an oak woodland?
The questions always revolve around the threshold value of some measurable parameter or the opinion of some individual, expert or otherwise.

Vagueness
Fuzzy-set theory is an alternative to Boolean sets.
Membership of an object in a Boolean set is absolute, and defined by one of two integer values {0,1}.
Membership of a fuzzy set is defined by a real number in the range [0,1]. Membership or non-membership of the set is identified by the terminal values, while all intervening values define an intermediate degree of belonging to the set (a membership of 0.25 reflects a smaller degree of belonging to the set than a membership of 0.5).
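
The contrast between Boolean and fuzzy membership can be sketched in a few lines of Python. The class ("forest" by canopy cover), the thresholds and the linear ramp are illustrative assumptions, not values from the seminar.

```python
def boolean_membership(canopy_pct, threshold=20.0):
    """Boolean set: a location is either in the class (1) or not (0).

    The 20% threshold is a hypothetical value.
    """
    return 1 if canopy_pct >= threshold else 0


def fuzzy_membership(canopy_pct, lower=10.0, upper=30.0):
    """Fuzzy set: membership rises linearly from 0 at `lower` to 1 at `upper`.

    Both limits and the linear shape are hypothetical choices.
    """
    if canopy_pct <= lower:
        return 0.0
    if canopy_pct >= upper:
        return 1.0
    return (canopy_pct - lower) / (upper - lower)


for pct in (5, 15, 20, 25, 40):
    print(pct, boolean_membership(pct), round(fuzzy_membership(pct), 2))
```

A location with 15% canopy cover is simply "not forest" in the Boolean case, while the fuzzy function gives it a partial membership of 0.25.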

Ambiguity
Ambiguity occurs when there is doubt as to how a phenomenon should be classified because of differing perceptions of that phenomenon.
There are two types of ambiguity:
Discord – different definitions and interpretation of the same piece of land. (not a problem of a single classification but of multiple mapping of the same area)
In the defining of soil, for example, many countries have slightly different definitions of what constitutes a soil, names for soils and the spatial and attribute boundaries between soil types.

Definition of a Forest
[Figure: scatter plot of forest definitions by country and organization; x-axis Canopy Coverage (%), 0 to 90; y-axis Tree Height (m), 0 to 16. Labelled points: Zimbabwe, Sudan, Tanzania, Turkey, Morocco, Ethiopia, Belgium, Malaysia, Somalia, Denmark, UN, New Zealand, UNESCO, U.S., Australia, Israel, Japan, Mexico, South Africa, Switzerland, Kenya, Portugal, Estonia]
Discord Ambiguity
[Figure panels]
a) There is more class1 than class2
b) The “zone of transition” between classes 1 and 2 is represented by a mosaic of class1-&-class2

Discord Ambiguity
[Figure panels]
c) the whole area is allocated into a class1-&-class2 mosaic
d) the two distinct areas of class1 and class2 are separated by two mosaics of class1-&-class2 and class2-&-class1

Discord Ambiguity
Some solutions for the problem of discord include:
Use of expert look-up tables and producer-supplied metadata to compare classifications. This is an artificial intelligence based solution.
Use personal (expert) judgment to compare classification and phenomenon changes over a longer period. This solution makes extensive use of rough and fuzzy sets to accommodate the uncertainty in the correspondence of classes.

Non-specificity Ambiguity
Ambiguity through non-specificity can be illustrated by geographical relationships.
The relation “A is north of B” is itself non-specific because it can mean:
A lies on exactly the same line of longitude and towards the north pole from B;
A lies somewhere to the north of a line running east to west through B;
A lies between perhaps north-east and north-west, but is most likely to lie in the sector between north-north-east and north-north-west of B.
Non-specificity Ambiguity
The first two definitions are precise and specific; the third is the natural language concept, which is itself vague.
Any lack of definition as to which should be used means that uncertainty arises in the interpretation of “north of”.

Uncertainty
Attribute uncertainty (Forest vs. Ag)
Positional uncertainty
Definitional uncertainty
Measurement uncertainty

The Necessity of “Fuzziness”
“Not only is it easy to lie with maps, it’s essential...to present a useful and truthful picture, an accurate map must tell white lies.” -- Mark Monmonier
distort 3-D world into 2-D abstraction
characterize most important aspects of spatial reality
portray abstractions (e.g., gradients, contours) as distinct spatial objects

Fuzziness (cont.)
All GIS subject to uncertainty
What the data tell us about the real world
Range of possible “truths”
Uncertainty affects results of analysis
Confidence limits - “plus or minus”
Difficult to determine
“If it comes from a computer it must be right”
“If it has lots of decimal places, it must be accurate”
Method for determination of conformance quality levels
Assumptions
The more errors there are in a dataset, the higher the likelihood of applying erroneous data for decisions or actions
Each false decision or action leads to consequential costs
• costs of finding the right answer
• costs due to damages caused by false information, e.g. by hitting a pipeline which was documented at a different location
• hidden costs by losing the confidence of the user community
• hidden costs due to image loss by the customer
A dataset is never completely free of errors
The effort to gain a certain quality level costs time and money

Data quality
[Diagram: Data Quality branches into Lineage; Accuracy (Positional, Attribute); Completeness; Logical Consistency; Semantic Accuracy; Currency]

Positional accuracy (2D example)
We distinguish between point objects, line objects, and area objects.
For a point object with coordinates (x ± σx, y ± σy), the values of σx and σy may be known from:
– previous studies
– specifications
– derived from the collected data
The point positional accuracy is then $PA_P = \sqrt{\sigma_x^2 + \sigma_y^2}$

Positional accuracy (2D example)
A simple approximation for line and area objects with n points is: $PA_{L,A} = \sqrt{n(\sigma_x^2 + \sigma_y^2)}$
[Figure: a line through points (x1,y1), (x2,y2), (x3,y3) and a polygon through points (x1,y1), (x2,y2), (x3,y3), (x4,y4); each vertex carries its own σx and σy]
Note: The size of each error could be different
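
The two positional accuracy measures can be evaluated as follows. This is a minimal sketch that assumes a square root over each expression (the radical signs did not survive in the slide text) and uses hypothetical values for σx and σy.

```python
import math


def point_accuracy(sigma_x, sigma_y):
    """Point positional accuracy: sqrt(sigma_x^2 + sigma_y^2)."""
    return math.hypot(sigma_x, sigma_y)


def line_or_area_accuracy(n, sigma_x, sigma_y):
    """Approximation for a line or area object with n points,
    assuming every point has the same sigma_x and sigma_y."""
    return math.sqrt(n * (sigma_x ** 2 + sigma_y ** 2))


# Hypothetical standard errors of 0.03 and 0.04 units
print(point_accuracy(0.03, 0.04))
print(line_or_area_accuracy(4, 0.03, 0.04))
```

When the errors differ from point to point, as the note on the slide allows, the per-point variances would have to be summed individually instead of being multiplied by n.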
Error Propagation or Propagation of Random Errors
Definition: Given independent variables each with an uncertainty, error propagation is the method of determining an uncertainty in a function of these variables.
Computed errors | Measurement or given errors
E x, E y | Angular and distance
E area | Coordinates
E vol | Distance and elevation

Error Propagation
Error propagation is a way of combining two or more random errors together to get a third.
It can be used when you need to measure more than one quantity to get at your final result. For example, an angle and a distance to compute coordinates
Error propagation can also be used to combine several independent sources of random error on the same measurement.

Error Propagation
In general, as a matrix equation: $\Sigma_{zz} = A \Sigma_{xx} A^T$

Derivation of formulas.
Suppose that x is a measured quantity and y is computed from
$y = ax + b$
If we knew $x_t$, the true value of x, we could compute $y_t$:
$y_t = a x_t + b$
The measured value of x has an error of dx, or
$x = x_t + dx$
Thus $y = a(x_t + dx) + b = a x_t + b + a\,dx$
$y = y_t + a\,dx$
$dy = a\,dx$
[Figure: the line y = ax + b with slope a]
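
The matrix form of error propagation can be tried out without any external library. The sketch below stores matrices as nested lists; the function name `propagate` and the numeric values are assumptions made for illustration.

```python
def propagate(A, Sxx):
    """Return Szz = A * Sxx * A^T for matrices stored as nested lists.

    A is the matrix of partial derivatives (one row per computed quantity),
    Sxx the covariance matrix of the measured quantities.
    """
    rows, cols = len(A), len(Sxx)
    # First product: AS = A * Sxx
    AS = [[sum(A[i][k] * Sxx[k][j] for k in range(cols)) for j in range(cols)]
          for i in range(rows)]
    # Second product: Szz = AS * A^T
    return [[sum(AS[i][k] * A[j][k] for k in range(cols)) for j in range(rows)]
            for i in range(rows)]


# Linear case y = a*x + b with hypothetical a = 3 and sigma_x = 0.02:
# the derivative matrix is the 1x1 matrix [a], so sigma_y = |a| * sigma_x.
a, sigma_x = 3.0, 0.02
Syy = propagate([[a]], [[sigma_x ** 2]])
sigma_y = Syy[0][0] ** 0.5
print(sigma_y)
```

For the linear case this reproduces dy = a dx: the constant b drops out and the error is simply scaled by a.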
Error Propagation
A general formula (assuming independence or no correlation):
$\sigma_y = \sqrt{\left(\frac{\partial y}{\partial x_1}\sigma_{x_1}\right)^2 + \left(\frac{\partial y}{\partial x_2}\sigma_{x_2}\right)^2 + \left(\frac{\partial y}{\partial x_3}\sigma_{x_3}\right)^2 + \cdots + \left(\frac{\partial y}{\partial x_n}\sigma_{x_n}\right)^2}$

Error Propagation Examples:
Random error of a sum
If $y = x_1 + x_2 + x_3 + \cdots + x_n$
Then $\sigma_y = \sqrt{\sigma_{x_1}^2 + \sigma_{x_2}^2 + \sigma_{x_3}^2 + \cdots + \sigma_{x_n}^2}$
A leveling loop was measured with the following accuracies:
DH1 = 12.34 ±0.01
DH2 = -8.72 ±0.02
DH3 = 4.93 ±0.005
DH4 = -8.53 ±0.01
The closure is 0.02
The accuracy of the loop is:
$\sqrt{0.01^2 + 0.02^2 + 0.005^2 + 0.01^2} = 0.025$
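
The leveling loop example can be checked numerically. The sketch below uses the four elevation differences and standard errors from the slide; the variable names are illustrative.

```python
import math

# (elevation difference, standard error) for each leg of the loop
loop = [(12.34, 0.01), (-8.72, 0.02), (4.93, 0.005), (-8.53, 0.01)]

# Misclosure: the elevation differences of a closed loop should sum to zero
closure = sum(dh for dh, _ in loop)

# Random error of a sum: square root of the sum of the squared errors
sigma_loop = math.sqrt(sum(s ** 2 for _, s in loop))

print(f"closure = {closure:.3f}, sigma = {sigma_loop:.3f}")
```

The closure of 0.02 is smaller than the propagated standard error of 0.025, so the misclosure is within what the stated measurement accuracies would lead one to expect.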

Error Propagation Examples:
Random error of a series
If $y = x_1 + x_2 + x_3 + \cdots + x_n$ and
$\sigma_{x_1} = \sigma_{x_2} = \sigma_{x_3} = \cdots = \sigma_{x_n} = \sigma_x$
Then $\sigma_y = \sqrt{n}\,\sigma_x$
Example
$\sqrt{0.01^2 + 0.01^2 + 0.01^2 + 0.01^2} = \sqrt{4} \times 0.01 = 0.02$

Error Propagation Examples:
Random error of area
$A = ab$, $\sigma_A = \sqrt{b^2\sigma_a^2 + a^2\sigma_b^2}$
Example
The sides of an 80’ x 100’ rectangular lot were measured with an accuracy of ±0.02’. What is the accuracy of the area of the lot?
$\sigma_A = \sqrt{80^2 \times 0.02^2 + 100^2 \times 0.02^2} = 2.56$ sq. ft.
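
Both special cases are short enough to verify directly. The function names below are illustrative; the numbers are the ones used in the two examples.

```python
import math


def sigma_series(n, sigma_x):
    """Random error of a sum of n terms that all have the same error."""
    return math.sqrt(n) * sigma_x


def sigma_area(a, b, sigma_a, sigma_b):
    """Random error of the area A = a * b of a rectangle."""
    return math.sqrt(b ** 2 * sigma_a ** 2 + a ** 2 * sigma_b ** 2)


print(round(sigma_series(4, 0.01), 3))          # four measurements of ±0.01
print(round(sigma_area(80, 100, 0.02, 0.02), 2))  # 80 x 100 lot, sides ±0.02
```

Note that the area error is expressed in square units, and that the longer side contributes more to it because each side's error is multiplied by the length of the other side.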
Error Propagation of Azimuth and Distance to coordinates (x,y)
[Figure: line from A to B with distance D and azimuth AZ]
$X_B = X_A + D \sin AZ_{AB}$
$Y_B = Y_A + D \cos AZ_{AB}$
$\sigma_{X_B} = \sqrt{\sigma_{X_A}^2 + (\sin AZ_{AB})^2 \sigma_D^2 + (D \cos AZ_{AB})^2 \frac{\sigma_{AZ}^2}{206265^2}}$
$\sigma_{Y_B} = \sqrt{\sigma_{Y_A}^2 + (\cos AZ_{AB})^2 \sigma_D^2 + (D \sin AZ_{AB})^2 \frac{\sigma_{AZ}^2}{206265^2}}$
(σ_AZ in arc-seconds; 206265 is the number of arc-seconds in one radian)

Error Propagation of coordinates (x,y) to Azimuth and Distance
$D = \sqrt{(X_B - X_A)^2 + (Y_B - Y_A)^2} = \sqrt{DX^2 + DY^2}$
$AZ = \tan^{-1}\frac{X_B - X_A}{Y_B - Y_A} = \tan^{-1}\frac{DX}{DY}$
$\sigma_D = \frac{1}{D}\sqrt{DX^2(\sigma_{X_A}^2 + \sigma_{X_B}^2) + DY^2(\sigma_{Y_A}^2 + \sigma_{Y_B}^2)}$
$\sigma_{AZ} = \frac{1}{D^2}\sqrt{DY^2(\sigma_{X_A}^2 + \sigma_{X_B}^2) + DX^2(\sigma_{Y_A}^2 + \sigma_{Y_B}^2)}$
(σ_AZ in radians; multiply by 206265 for arc-seconds)
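
A minimal sketch of both directions of this propagation follows. The function names and the sample values (a 100-unit line due north, ±0.01 in distance, ±10 arc-seconds in azimuth) are assumptions for illustration; uncorrelated errors are assumed throughout.

```python
import math

RHO = 206265.0  # arc-seconds per radian (rounded)


def to_coordinates(d, az_deg, sigma_d, sigma_az_sec, sigma_xa=0.0, sigma_ya=0.0):
    """Propagate distance and azimuth errors into the coordinates of B."""
    az = math.radians(az_deg)
    sigma_az = sigma_az_sec / RHO  # convert arc-seconds to radians
    sigma_xb = math.sqrt(sigma_xa ** 2 + (math.sin(az) * sigma_d) ** 2
                         + (d * math.cos(az) * sigma_az) ** 2)
    sigma_yb = math.sqrt(sigma_ya ** 2 + (math.cos(az) * sigma_d) ** 2
                         + (d * math.sin(az) * sigma_az) ** 2)
    return sigma_xb, sigma_yb


def to_distance_azimuth(dx, dy, sigma_xa, sigma_ya, sigma_xb, sigma_yb):
    """Propagate coordinate errors of A and B into distance and azimuth.

    Returns (sigma_D, sigma_AZ in arc-seconds).
    """
    d = math.hypot(dx, dy)
    var_x = sigma_xa ** 2 + sigma_xb ** 2
    var_y = sigma_ya ** 2 + sigma_yb ** 2
    sigma_d = math.sqrt(dx ** 2 * var_x + dy ** 2 * var_y) / d
    sigma_az_rad = math.sqrt(dy ** 2 * var_x + dx ** 2 * var_y) / d ** 2
    return sigma_d, sigma_az_rad * RHO


sx, sy = to_coordinates(100.0, 0.0, 0.01, 10.0)
print(sx, sy)
```

For a line running due north the distance error affects only Y and the azimuth error affects only X, which makes the result easy to check by hand.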

Error Propagation of coordinates to area of a closed polygon
$A = \frac{1}{2}\sum x_i (y_{i-1} - y_{i+1})$
$\sigma_A = \frac{1}{2}\sqrt{\sum [(y_{i-1} - y_{i+1})\,\sigma_{x_i}]^2 + \sum [(x_{i+1} - x_{i-1})\,\sigma_{y_i}]^2}$

Error Propagation of coordinates to area of a closed polygon
Point | X | Y | Xi+1 - Xi-1 | Yi-1 - Yi+1 | Xi (Yi-1 - Yi+1) | [σx (Yi-1 - Yi+1)]² | [σy (Xi+1 - Xi-1)]²
A | 10000.00 | 10000.00 | | | | |
B | 9600.04 | 10599.96 | -1300.01 | -799.95 | -7679521.06 | 78.76 | 468.01
C | 8699.99 | 10799.95 | -500.06 | 649.94 | 5654467.30 | 216.58 | 116.81
D | 9099.98 | 9950.02 | 1300.01 | 799.95 | 7279498.42 | 570.06 | 1457.26
A | 10000.00 | 10000.00 | 500.06 | -649.94 | -6499391.57 | 545.43 | 344.47
B | 9600.04 | 10599.96 | | | | |
Σ | | | | | -1244946.91 | 1410.83 | 2386.55
AREA = 622473.45
σA = 30.81
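
The area and its standard error can be computed for any closed polygon with a short function. The sketch below uses the four vertex coordinates as printed in the table; the per-point standard errors are not listed on the slide, so a uniform ±0.02 is assumed purely for illustration.

```python
import math


def polygon_area_and_sigma(points, sigmas):
    """Area of a closed polygon and its propagated standard error.

    points: [(x, y), ...] vertices in order, first vertex not repeated.
    sigmas: [(sigma_x, sigma_y), ...] standard errors of the same vertices.
    Errors are assumed to be uncorrelated.
    """
    n = len(points)
    twice_area = 0.0
    variance_sum = 0.0
    for i in range(n):
        x_prev, y_prev = points[i - 1]          # wraps around to the last vertex
        x_i, _ = points[i]
        x_next, y_next = points[(i + 1) % n]    # wraps around to the first vertex
        sx, sy = sigmas[i]
        twice_area += x_i * (y_prev - y_next)
        variance_sum += ((y_prev - y_next) * sx) ** 2 + ((x_next - x_prev) * sy) ** 2
    return abs(twice_area) / 2.0, math.sqrt(variance_sum) / 2.0


pts = [(10000.00, 10000.00), (9600.04, 10599.96),
       (8699.99, 10799.95), (9099.98, 9950.02)]
sig = [(0.02, 0.02)] * len(pts)   # hypothetical uniform standard errors
area, sigma_a = polygon_area_and_sigma(pts, sig)
print(round(area, 2), round(sigma_a, 2))
```

With the coordinates rounded to two decimals the computed area differs from the tabulated 622473.45 by a few square units, presumably because the table was produced from unrounded coordinates. The tabulated σA can be reproduced from the column sums: ½·√(1410.83 + 2386.55) = 30.81.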
POSITIONAL ACCURACY
defined as the closeness of locational information (usually coordinates) to the true position
How to test positional accuracy?
use an independent source of higher accuracy (e.g. GPS or raw survey data)
use internal evidence
  unclosed polygons, lines which overshoot or undershoot junctions, are indications of inaccuracy - the sizes of gaps, overshoots and undershoots may be used as a measure of positional accuracy
compute accuracy from knowledge of the errors introduced by different sources using error propagation

The National Standard for Spatial Data Accuracy (NSSDA)
A well-defined statistic and testing methodology for positional accuracy of spatial data.
Applicable to digital and graphic forms (aerial photographs, satellite imagery, and maps)
The standard does not define “pass-fail” accuracy values. (agencies are to set criteria)
Accuracy report
http://www.fgdc.gov/standards/projects/FGDC-standards-projects/accuracy/

Spatial Accuracy (Horizontal Accuracy)
Circular error is based on the sample standard deviation of $d_i$, the difference between the data set coordinate value and the coordinate value determined by an independent check survey of higher accuracy for the same point.

The standard deviation for the horizontal coordinate r is:
$\sigma_r = \sqrt{\frac{\sum (d_i - \bar{d})^2}{n - 1}}$
Where:
$\bar{d} = \frac{\sum d_i}{n}$ (the mean discrepancy)
$d_i = r_{data_i} - r_{check_i}$, with $r_i = \sqrt{x_i^2 + y_i^2}$
n = total number of points checked
NSSDA horizontal accuracy is:
$Accuracy_r = 2.4477 \times \sigma_i$ (95% confidence level)
The standard deviation for the z coordinate direction is:
$\sigma_z = \sqrt{\frac{\sum (d_i - \bar{d})^2}{n - 1}}$
where:
$d_i = z_{data_i} - z_{check_i}$
$\bar{d} = \frac{\sum d_i}{n}$ (the mean discrepancy)
n = total number of points checked
NSSDA vertical accuracy is:
$Accuracy_z = 1.96 \times \sigma_z$ (95% confidence level)

Well-Defined Points
Features that can be identified within 1/3 of the maximum expected uncertainty for the data set.
Acceptable features
Small scale | Large scale
Road/Rail intersections | Center of utility access cover
Small isolated shrubs | Sidewalk/curb/gutter intersections
Corners of structures | Monuments
Check survey points should have accuracies within one-third of the data set's intended accuracy (95% CL)
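
The vertical accuracy statistic can be computed from a list of check points as sketched below. The code follows the formulas on the slides (sample standard deviation of the discrepancies multiplied by 1.96); the elevations are made-up values, and the function name is an assumption. The published NSSDA text expresses the same test in terms of root-mean-square error rather than the standard deviation about the mean, so results can differ when the mean discrepancy is not close to zero.

```python
import statistics


def accuracy_95(data_values, check_values, factor):
    """Scale the sample standard deviation (n - 1 denominator) of the
    discrepancies d_i = data_i - check_i to a 95% confidence value."""
    discrepancies = [d - c for d, c in zip(data_values, check_values)]
    return factor * statistics.stdev(discrepancies)


# Hypothetical elevations from the data set and from a check survey
z_data = [10.1, 11.9, 12.2, 9.8, 15.0]
z_check = [10.0, 12.0, 12.0, 10.0, 15.0]

accuracy_z = accuracy_95(z_data, z_check, 1.96)
print(round(accuracy_z, 2))
```

The same function can be reused for the horizontal case by passing the horizontal discrepancies and the factor 2.4477 from the preceding slide.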

Check Point Location (assuming rectangle area)
• Spaced at intervals of at least 10% of the diagonal.
• At least 20% of the points are located in each quadrant.
• Check points may be distributed more densely in the vicinity of important features
• When data exist for only a portion of the data set, confine test points to that area.
• When the distribution of error is likely to be nonrandom, it may be desirable to locate check points to correspond to the error distribution.

Positional Accuracy evaluation of Orthophotos in New Jersey
Point | Accuracy (ft)
1 | 4.25
2 | 4.07
3 | 2.28
4 | 3.98
5 | 4.18
ATTRIBUTE ACCURACY
Defined as the closeness of attribute values to their true value
Note that while location does not change with time, attributes often do
Attribute accuracy must be analyzed in different ways depending on the nature of the data
For continuous attributes (surfaces) such as on a DEM or TIN:
  accuracy is expressed as measurement error (e.g. ±1m)

ATTRIBUTE ACCURACY
For categorical attributes such as classified polygons:
  Are the categories appropriate, sufficiently detailed and defined?
  Is a polygon classified as A really A, or should it be B?
  How heterogeneous are the polygons (e.g. 70% A and 30% B)?
  How well are A and B defined (e.g. soils classifications)?
  The center area may be definitely A, but more like B at the edges

ATTRIBUTE ACCURACY
How to test attribute accuracy?
prepare a misclassification matrix and calculate the degree of correctness
Examples:
  The Kappa coefficient
  Map Producer’s accuracy
  Map User’s accuracy

The Kappa coefficient
[Figure: Dataset A, Dataset B, Comparing A to B]
O – Observed
 | A | B | Total
A | O_AA | O_AB | O_Ar
B | O_BA | O_BB | O_Br
Total | O_Ac | O_Bc | Σ
P – Percentage
 | A | B | Total
A | P_AA | P_AB | P_Ar
B | P_BA | P_BB | P_Br
Total | P_Ac | P_Bc | 1
$P_0 = P_{AA} + P_{BB}$
$P_e = P_{Ac} P_{Ar} + P_{Bc} P_{Br}$
$Kappa = \frac{P_0 - P_e}{1 - P_e}$
The Kappa coefficient
[Figure: Dataset A, Dataset B, Comparing A to B]
Observed
 | R | B | Total
R | 58 | 6 | 64
B | 7 | 28 | 35
Total | 65 | 34 | 99
Percentage
 | R | B | Total
R | 0.586 | 0.061 | 0.646
B | 0.071 | 0.283 | 0.354
Total | 0.657 | 0.343 | 1
$P_0 = 0.586 + 0.283 = 0.869$
$P_e = 0.657 \times 0.646 + 0.343 \times 0.354 = 0.546$
$Kappa = \frac{0.869 - 0.546}{1 - 0.546} = 0.711$

How to interpret Kappa
Kappa is always less than or equal to 1.
A value of 1 implies perfect agreement and values less than 1 imply less than perfect agreement.
In rare situations, Kappa can be negative. This is a sign that the two observers agreed less than would be expected just by chance.
A possible interpretation of Kappa. The agreement is:
Kappa 0.0 to 0.2: Poor
Kappa 0.2 to 0.4: Fair
Kappa 0.4 to 0.6: Moderate
Kappa 0.6 to 0.8: Good
Kappa 0.8 to 1.0: Very good
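
The Kappa calculation generalizes to any square matrix of observed counts. The sketch below reproduces the R/B example; the function names are illustrative, and assigning each boundary value (0.2, 0.4, ...) to the higher band is an assumption, since the scale on the slide does not say which side a boundary belongs to.

```python
def kappa(matrix):
    """Kappa coefficient from a square matrix of observed counts."""
    n = sum(sum(row) for row in matrix)
    k = len(matrix)
    p0 = sum(matrix[i][i] for i in range(k)) / n                 # observed agreement
    row_totals = [sum(row) / n for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(k)) / n for j in range(k)]
    pe = sum(r * c for r, c in zip(row_totals, col_totals))      # chance agreement
    return (p0 - pe) / (1 - pe)


def interpret(value):
    """Map a Kappa value to the verbal scale (boundary handling is assumed)."""
    if value < 0.2:
        return "Poor"
    if value < 0.4:
        return "Fair"
    if value < 0.6:
        return "Moderate"
    if value < 0.8:
        return "Good"
    return "Very good"


observed = [[58, 6], [7, 28]]
k_value = kappa(observed)
print(round(k_value, 3), interpret(k_value))
```

Working from the raw counts avoids the small rounding differences that come from using the three-decimal proportions.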

Other Accuracy Assessment
Assume we have a 9-cell land cover map, one from 1980 and one from 2000, with three categories: A, B, and C.

1980 LC     2000 LC     Cross Tabulated Grid
A B B       A A B       AA BA BB
B B C       B C C       BB BC CC
B A C       A A B       BA AA CB

The cross tabulation can be quantified into a matrix, oftentimes called a confusion matrix (rows: 2000 class; columns: 1980 class):

      A   B   C
A     2   2   0
B     0   2   1
C     0   1   1

The matrix shows the agreements between the 1980 and 2000 maps. As an example, 2 cells remained A (AA), 1 cell was C and is now B (CB), etc.

Other Accuracy Assessment (continued)
Sum up the rows and columns. But what do these numbers tell us?

        A   B   C   Total
A       2   2   0   4
B       0   2   1   3
C       0   1   1   2
Total   2   5   2   9

• The bottom row tells us that there were two cells that were A, five B, and two C.
• The rightmost column tells us that we mapped 4 cells as A, 3 as B, and 2 as C.
• Adding up the diagonal cells says that 5 cells were right.
The overall agreement between maps is:
$$\sum d_{ii} / n = 5/9 \approx 0.56 = 56\%$$

User and Producer Accuracy
The total correspondence of our example is about 56%. But that only tells us part of the story. What if we were really interested in classification B? Were there changes in classification B? Even here, there are two different ways of interpreting that question:
• If I were interested in mapping all the areas of B, how well did I get them all? This is called the Map Producer's Accuracy. That is, how well did we produce a map of classification B.
• If I were to use the map to find B, how successful would I be? This is called the Map User's Accuracy. That is, how much confidence should a user of the map have for a given classification.

User and Producer Accuracy (continued)
• Map user's accuracy = the total number correct within a row divided by the total number in the whole row.
• Map producer's accuracy = the total number correct within a column divided by the total number in the whole column.
Example of classification B:

        A   B   C   Total
A       2   2   0   4
B       0   2   1   3
C       0   1   1   2
Total   2   5   2

Map user's accuracy = 2/3 = 67%
Map producer's accuracy = 2/5 = 40%
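The three measures for the 9-cell example can be reproduced with a short Python sketch. The function name `accuracies` and the convention that rows hold the mapped class and columns the reference class are assumptions made for this illustration; they follow the row/column definitions given on the slide.

```python
def accuracies(matrix, labels):
    """Overall agreement plus per-class user's and producer's accuracy.

    Assumed layout: matrix[i][j] counts cells mapped as labels[i] (row)
    whose reference class is labels[j] (column).
    """
    k = len(labels)
    n = sum(sum(row) for row in matrix)
    overall = sum(matrix[i][i] for i in range(k)) / n
    users, producers = {}, {}
    for i, label in enumerate(labels):
        row_total = sum(matrix[i])                        # all cells mapped as this class
        col_total = sum(matrix[r][i] for r in range(k))   # all cells truly of this class
        users[label] = matrix[i][i] / row_total
        producers[label] = matrix[i][i] / col_total
    return overall, users, producers


confusion = [[2, 2, 0],
             [0, 2, 1],
             [0, 1, 1]]
overall, users, producers = accuracies(confusion, ["A", "B", "C"])
print(f"overall={overall:.2f}  user B={users['B']:.2f}  producer B={producers['B']:.2f}")
```

For class B this gives a user's accuracy of 2/3 and a producer's accuracy of 2/5, the same figures as on the slide. A class with an empty row or column would cause a division by zero and needs separate handling.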

User and Producer Accuracy
How can we use the above results?
• The user's accuracy means that if we were to use this map and look for the classification of B, we would be correct 67% of the time.
• The producer's accuracy means that the map produced only 40% of all the B's that were out there.
• This also gives us some indication of the nature of the errors. For instance, it appears that we confused classification A with classification B (we said on two occasions that B was A). By understanding the nature of the errors, perhaps we can go back, look over our process and correct for that mistake.

LOGICAL CONSISTENCY
Refers to the degree of adherence to logical rules of data structures (conceptual, logical or physical), attribution and relationships. It includes:
• Conceptual consistency: adherence to rules of the conceptual schema
• Domain consistency: adherence of values to the value domain
• Format consistency: degree to which data is stored in accordance with the physical structure of the dataset

LOGICAL CONSISTENCY
Topological consistency: correctness of the explicitly encoded topological characteristics of a dataset. For example:
• If there are polygons, do they close?
• Is there exactly one label within each polygon?
• Are there nodes wherever arcs cross, or do arcs sometimes cross without forming nodes?

COMPLETENESS
Refers to the presence and absence of features, their attributes and relationships in spatial data, compared with what is defined in the data model or what is in the real world.
• Error of commission – data present in a dataset that is not present in the data model or the real world
• Error of omission – data that is present in the data model or the real world but absent from the dataset
Affected by rules of selection, generalization and scale
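As a rough illustration of the two completeness errors, the sketch below compares feature identifiers in a dataset against a reference list. It assumes features can be matched by a shared identifier, which the slides do not state; the function name `completeness_errors` and the sample IDs are hypothetical.

```python
def completeness_errors(dataset_ids, reference_ids):
    """Return (commission, omission) as sets of feature identifiers.

    commission: in the dataset but not in the reference (data model / real world)
    omission:   in the reference but missing from the dataset
    """
    dataset, reference = set(dataset_ids), set(reference_ids)
    commission = dataset - reference
    omission = reference - dataset
    return commission, omission


# Hypothetical building identifiers, for illustration only.
commission, omission = completeness_errors(
    dataset_ids=["b1", "b2", "b3", "b9"],
    reference_ids=["b1", "b2", "b3", "b4", "b5"],
)
print("commission:", sorted(commission))  # ['b9']
print("omission:", sorted(omission))      # ['b4', 'b5']
```

In practice the matching step is the hard part (features rarely share clean identifiers), and the selection, generalization and scale rules mentioned above decide what belongs in the reference set.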

LINEAGE
A record of the data sources and of the operations which created the database
• How was it digitized, from what documents?
• When was the data collected?
• What agency collected the data?
• What steps were used to process the data?
  • precision of computational results
Is often a useful indicator of accuracy

An Example of Data Quality Elements and Sub-elements for Buildings

Quality element | Quality sub-element | Description by examples
Completeness | Commission error | Buildings with area less than 4 m² are presented in the Building Polygon layer of a 1:1000 data set.
Completeness | Omission error | Buildings with area equal to or larger than 4 m² are absent from the Building Polygon layer.

An Example of Data Quality Elements and Sub-elements for Buildings

Quality element | Quality sub-element | Description by examples
Positional accuracy | Horizontal accuracy | RMSE of a building polygon based on a comparison of the horizontal coordinates of all the nodes of the building's footprint in the GIS with the corresponding reference values.
Positional accuracy | Vertical accuracy | RMSE of a building polygon based on a comparison of the vertical coordinates of all the nodes of the building's footprint in the GIS with the corresponding reference values.
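A minimal sketch of the horizontal RMSE described above follows. It assumes that GIS nodes and reference nodes are already paired in the same order and that the horizontal error of a node is its radial (2D) distance from the reference position; the slides do not specify either detail, and the function name `horizontal_rmse` is hypothetical.

```python
import math


def horizontal_rmse(gis_nodes, ref_nodes):
    """Root mean square of the radial distances between paired nodes.

    gis_nodes, ref_nodes: equal-length sequences of (x, y) tuples,
    paired by position (an assumption of this sketch).
    """
    if not gis_nodes or len(gis_nodes) != len(ref_nodes):
        raise ValueError("need two non-empty node lists of equal length")
    squared = [
        (xg - xr) ** 2 + (yg - yr) ** 2
        for (xg, yg), (xr, yr) in zip(gis_nodes, ref_nodes)
    ]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical footprint: every node is shifted by 0.3 m in x and 0.4 m in y.
reference = [(0.0, 0.0), (10.0, 0.0), (10.0, 8.0), (0.0, 8.0)]
digitized = [(x + 0.3, y + 0.4) for x, y in reference]
print(round(horizontal_rmse(digitized, reference), 3))  # 0.5
```

The vertical case is the same computation applied to height differences only.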

An Example of Data Quality Elements and Sub-elements for Buildings

Quality element | Quality sub-element | Description by examples
Attribute accuracy | Classification correctness | Whether a building or related feature is correctly classified as one (or more) building-related features.
Attribute accuracy | Non-quantitative attribute correctness | The Name of a building polygon may be correct or wrong in a GIS.
Attribute accuracy | Quantitative attribute correctness | The value of the field "Building Top Level" of a Building Polygon may be correct or wrong.
Logical consistency | Conceptual consistency | A tower is described to be under its podium.
Logical consistency | Domain consistency | The classification of feature code for a building polygon is beyond any of the following given classes: BR BAR BUP, IBP, OSP, PWP, TSP.

An Example of Data Quality Elements and Sub-elements for Buildings

Quality element | Quality sub-element | Description by examples
Logical consistency | Format consistency | Building names in title case (e.g. Hong Kong Airport) are consistent, while a name such as "HONG KONG Airport" is not consistent in format.
Logical consistency | Topological consistency | When the outline of a building polygon is closed, the topology is consistent; when the outline is not closed, the topology is not consistent.

Uncertainties Measured Based on Various Mathematical Theories
[Figure: Uncertainty is divided into Imprecision, Ambiguity and Vagueness. Measures and theories named in the figure: probability and statistical theory; confidence region model (Shi 1994); entropy (Shannon 1948); Hartley's measure (1928); evidence theory; U-uncertainty; discord measure, confusion measure and non-specificity measure; fuzzy measure; fuzzy topology measure; fuzzy sets, probability and fuzzy topology.]
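The domain, format and topological checks in the building examples lend themselves to simple automated tests. The sketch below is illustrative only: the function names are hypothetical, the set of class codes is one assumed reading of the list on the slide, and Python's `str.istitle()` is used as a stand-in for whatever title-case rule a real specification would define.

```python
# Assumed reading of the class codes listed on the slide (not authoritative).
ALLOWED_CODES = {"BR", "BAR", "BUP", "IBP", "OSP", "PWP", "TSP"}


def domain_consistent(feature_code, allowed=ALLOWED_CODES):
    """Domain consistency: the code must be one of the allowed classes."""
    return feature_code in allowed


def format_consistent(name):
    """Format consistency: every word of the name is in title case."""
    return name.istitle()


def topologically_consistent(ring):
    """Topological consistency: the outline closes on its first vertex.

    ring is a list of (x, y) tuples; a closed polygon ring needs at least
    three distinct vertices plus the repeated first vertex.
    """
    return len(ring) >= 4 and ring[0] == ring[-1]


print(domain_consistent("IBP"), domain_consistent("XYZ"))        # True False
print(format_consistent("Hong Kong Airport"),
      format_consistent("HONG KONG Airport"))                    # True False
print(topologically_consistent([(0, 0), (4, 0), (4, 3), (0, 0)]),
      topologically_consistent([(0, 0), (4, 0), (4, 3)]))        # True False
```

Exact vertex equality is a deliberately strict closure test; real data often needs a snapping tolerance.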

A framework for modeling uncertainties in spatial data and analysis
[Figure: flowchart. Recoverable stage labels: Real World; Data type classification (Object: point, line, polygon, 3D; Field: DEM surface, raster image); Description of spatial data uncertainty (positional uncertainty of point, line, polygon and 3D spatial objects; topology uncertainty; errors in DEM; uncertainty of remote sensing data); Uncertainty in spatial analysis and query (query, multi-data source, hybrid DEM, interpolation, geometric correction and image fusion); Control of uncertainties (processing and control of spatial data uncertainty); Visualization of uncertainties (visualization and the distribution of uncertainty information).]

Error model of point – Error Ellipse
[Figure: error ellipse with axes X, Y and rotated axes U, V, showing Sx, Sy, Su, Sv and the angle t.]
• t is the rotation angle from the Y axis to the U axis of largest error.
• Su is the semi-major axis of the ellipse (largest error).
• Sv is the semi-minor axis of the ellipse (least error).
• Sx is the standard deviation of coordinate x (in X).
• Sy is the standard deviation of coordinate y (in Y).
The transformation equation between U, V and X, Y is:
$$\begin{bmatrix} U \\ V \end{bmatrix} = \begin{bmatrix} \cos t & \sin t \\ -\sin t & \cos t \end{bmatrix} \begin{bmatrix} X \\ Y \end{bmatrix}$$

Error model of point – Error Ellipse
$$\tan 2t = \frac{2 S_{xy}}{S_x^2 - S_y^2}$$
$$K = \sqrt{\frac{(S_x^2 - S_y^2)^2}{4} + S_{xy}^2}$$
$$S_u^2 = \frac{S_x^2 + S_y^2}{2} + K \qquad S_v^2 = \frac{S_x^2 + S_y^2}{2} - K$$

Error model of line - Epsilon band
Assumptions:
1. Each error effect relevant to a particular digital line in a GIS can be treated as a random variable, perturbing the true line to obtain the observed line.
2. The processes of generating a digital line in a GIS can be treated as being independent.
The bandwidth is determined from a statistical function of those positional errors on the line accumulated from the first stage to the final stage of data capture.
[Figure: the measured line and the true line.]
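The error ellipse parameters can be computed directly from the coordinate standard deviations and covariance. The following Python sketch is an illustration under assumptions: the function name `error_ellipse` is hypothetical, and `atan2` is used to solve the tan 2t relation so that the case Sx = Sy does not divide by zero. Which axis the angle is measured from depends on the convention in use, so the returned angle should be checked against the slide's definition of t before being relied on.

```python
import math


def error_ellipse(sx, sy, sxy):
    """Return (su, sv, t) for a point with standard deviations sx, sy
    and covariance sxy.

    su, sv: semi-major and semi-minor axes
    t:      rotation angle in radians satisfying tan(2t) = 2*sxy / (sx^2 - sy^2)
    """
    vx, vy = sx ** 2, sy ** 2
    k = math.sqrt((vx - vy) ** 2 / 4 + sxy ** 2)
    su = math.sqrt((vx + vy) / 2 + k)
    # Clamp at zero to guard against tiny negative values from rounding.
    sv = math.sqrt(max((vx + vy) / 2 - k, 0.0))
    t = 0.5 * math.atan2(2 * sxy, vx - vy)
    return su, sv, t


su, sv, t = error_ellipse(sx=2.0, sy=1.0, sxy=0.0)
print(su, sv, math.degrees(t))  # 2.0 1.0 0.0 (uncorrelated: axes align with X and Y)
```

With zero covariance the ellipse axes coincide with the coordinate axes, which is a quick sanity check on any implementation.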

Error model of a polygon
The area S of the polygon is computed from:
$$S = \frac{1}{2}\sum_{i=1}^{n} x_i\,(y_{i+1} - y_{i-1}) = \frac{1}{2}\sum_{i=1}^{n} x_i\,\Delta y_{i+1,i-1}$$
The differential of the area is given as:
$$dS = \frac{1}{2}\sum_{i=1}^{n} \left[\Delta y_{i+1,i-1}\,dx_i + \Delta x_{i-1,i+1}\,dy_i\right]$$
where $\Delta y_{i+1,i-1} = y_{i+1} - y_{i-1}$ and $\Delta x_{i-1,i+1} = x_{i-1} - x_{i+1}$.

Error model of a polygon (continued)
For simplicity, assume all coordinate accuracies are equal to $\sigma_o$ and the covariance is 0. We get:
$$\sigma_S^2 = \frac{1}{4}\sum_{i=1}^{n}\left[\Delta y_{i+1,i-1}^2 + \Delta x_{i-1,i+1}^2\right]\sigma_o^2 = \frac{1}{4}\sum_{i=1}^{n}\left[l_{i-1,i+1}^2\right]\sigma_o^2$$
Where: $l_{i-1,i+1}$ is the distance between points $P_{i-1}$ and $P_{i+1}$
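The area formula and its simplified variance (equal coordinate accuracy, zero covariance) translate into a short Python sketch. The function name `polygon_area_and_sigma` is hypothetical, and the vertex list is assumed not to repeat the first vertex at the end.

```python
import math


def polygon_area_and_sigma(vertices, sigma0):
    """Return (area, sigma_area) for a polygon.

    vertices: list of (x, y) tuples without a repeated closing vertex
    sigma0:   common standard deviation of every coordinate (covariance 0)
    The area is signed: positive for counter-clockwise vertex order.
    """
    n = len(vertices)
    twice_area = 0.0
    sum_l2 = 0.0
    for i in range(n):
        x_prev, y_prev = vertices[i - 1]          # P(i-1), wraps around
        x_i, _ = vertices[i]
        x_next, y_next = vertices[(i + 1) % n]    # P(i+1), wraps around
        twice_area += x_i * (y_next - y_prev)
        # Squared distance between P(i-1) and P(i+1).
        sum_l2 += (x_next - x_prev) ** 2 + (y_next - y_prev) ** 2
    area = twice_area / 2
    sigma_area = math.sqrt(sum_l2 / 4) * sigma0
    return area, sigma_area


# Unit square, every coordinate known to 0.1 units.
area, sigma_area = polygon_area_and_sigma([(0, 0), (1, 0), (1, 1), (0, 1)], 0.1)
print(area, round(sigma_area, 4))  # 1.0 0.1414
```

For the unit square each of the four diagonals has squared length 2, so the area variance is (1/4)(8)(0.1²) = 0.02, giving a standard deviation of about 0.14 square units.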

What is a standard?
• Standards are documented agreements containing technical specifications or other precise criteria to be used consistently as rules, guidelines, or definitions of characteristics, to ensure that materials, products, processes and services are fit for their purpose.
(as defined by ISO)

Examples of Everyday Standards
• Traffic Signals – Road Signs
• VISA / Mastercard: standards allow people to use a single card to obtain cash in the local currency around the world
• Commerce/Manufacturing/Industry
  World War II - Allied supplies and facilities were severely strained due to the incompatibility of tools, replacement parts, and equipment. The establishment of international standards helped to increase compatibility.

The Importance of Standards (when standards do not exist)
• Disasters (fire, flood, …)
  Great Baltimore Fire of 1904 - fire engines from different regions arrived to help put out the fire, only they had different hose coupling sizes that did not fit the Baltimore hydrants - fire burned over 30 hours, resulted in destruction of 1526 buildings covering 17 city blocks.
• Metric System vs US Customary System

The Need for Standards in Geographic Information
• To ensure common understanding through a common set of terminology
• To promote/enable interoperability
• To support the establishment of geospatial infrastructures at local, regional, and global levels
• To promote data and information sharing/exchange

Types of geospatial standards
• Data Classification
  e.g., Vegetation Classification
• Data Content
  e.g., Digital Geospatial Metadata, Spatial Schema
• Data Symbology or Presentation
  e.g., Digital Geologic Map Symbolization
• Data Transfer
• Data Usability
  e.g., Geospatial Positioning Accuracy

Evaluating and Reporting Quality Evaluation Results [ISO 19114]
[Figure: flowchart of the 5 step process on quality evaluation (ISO 19114). Inputs: the dataset as specified by the scope, and the product specification or user requirements (work item 19131).]
1. Identify an applicable data quality element, data quality subelement, and data quality scope (ISO 19113)
2. Identify a data quality measure
3. Select and apply a data quality evaluation method
4. Determine the data quality result, then report the data quality result (quantitative)
5. Determine conformance against the conformance quality level, then report the data quality result (pass / fail)

Metadata Example

Without…

With…


Metadata need Example
HUH?

WQPW-ID   DIN   Pb
PB-31     .34   .012
HK-14     .12   .023
PB12      35    034
PB-12     .35   .034
WA-3      .28   .001
PB-4      .23   .022
PB-5      .21   .013

The Standard
Metadata has four major roles:
• Availability - information needed to determine the sets of data that exist for a geographic location.
• Fitness for use - information needed to determine if a set of data meets a specific need.
• Access - information needed to acquire an identified set of data.
• Transfer - information needed to process and use a set of data.

Information that can be found in Metadata
• Title, Abstract, Publication Date
  (Section 1: Identification Information)
• Data Accuracy and Completeness
  (Section 2: Data Quality Information)
• Data Form: Vector or Raster?
  (Section 3: Spatial Data Organization Information)
• Projection or Geographic Reference System
  (Section 4: Spatial Reference Information)
• What Values Are Associated with Geodata?
  (Section 5: Entity and Attribute Information)
• How Do You Get It? Cost?
  (Section 6: Distribution Information)
• How Current Is the Documentation?
  (Section 7: Metadata Reference Information)

The Value of Metadata
• Organize and maintain an organization's investment in data
• Provide information to data catalogs and clearinghouses
• Provide information to aid data transfer
Food for thought...
• Nothing happens overnight: get used to thinking of the long term benefits of metadata. $$$
• Documentation = defense
• The Standard: don't judge a book by its cover

Metadata resources
The FGDC
Federal Geographic Data Committee: Interagency committee that coordinates federal geo-data activities.

The Content Standard for Digital Geospatial Metadata (CSDGM)
• The current US Federal Metadata standard
• Often referred to as the 'FGDC Metadata Standard'
• Has been implemented in federal, state and local governments

The International Organization for Standardization (ISO) has developed and approved an international metadata standard, ISO 19115 – Geographic Information Metadata

Metadata resources (continued)
• The objective of this International Standard is to provide a clear procedure for the description of digital geographic datasets so that users will be able to determine whether the data in a holding will be of use to them and how to access the data. By establishing a common set of metadata terminology, definitions and extension procedures, this standard will promote the proper use and effective retrieval of geographic data.
• Supplementary benefits of this standard for metadata are to facilitate the organization and management of geographic data and to provide information about an organization's database to others.
• This standard for the implementation and documentation of metadata furnishes those unfamiliar with geographic data the appropriate information to characterize their geographic data and it makes possible dataset cataloguing enabling data discovery, retrieval and reuse.

Graphical Representation of the:
US Geological Survey Biological Resources Division
DRAFT Content Standard for Biological Metadata
Based on: The Federal Geographic Data Committee's Content Standard for Digital Geospatial Metadata, June 8, 1994, version 1.0
Prepared by Susan Stitt, Center for Biological Informatics
[Figure: Metadata divided into seven sections: 1. Identification Information; 2. Data Quality Information; 3. Spatial Data Organization Information; 4. Spatial Reference Information; 5. Entity and Attribute Information; 6. Distribution Information; 7. Metadata Reference Information. Legend: Mandatory; Mandatory if Applicable; Optional; Biological Items Added.]

Best Practices for Writing Quality Metadata
Writing Principles
• Write simply but completely
• Document for a general audience
• Adopt a consistent style
• Avoid using jargon
• Define technical terms

Best Practices for Writing Quality Metadata
In Practice
• Use subtitles to define and clarify long passages
• Find, evaluate, and reuse good examples
  See examples from FGDC workbook
  Mine the Clearinghouse for other examples
• Use keywords as indicators of the contents of a dataset
• Use a thesaurus or controlled vocabulary when possible

Best Practices for Writing Quality Metadata
In Practice (continued)
• State clearly what your data are not
• Quantify assessments wherever possible
• Use "None" and "Unknown" carefully
• Format date: YYYYMMDD
• Avoid using confusing symbols & conventions:
  !@#%{}|/\<>~
  Unnecessary carriage returns, tabs, indents, etc.
