Data Quality and Error Analysis in GIS
Joshua Greenfeld, PhD, LS
Professor emeritus, NJIT
Professor, Israel Institute of Technology
Data Quality and Error Analysis in GIS (c) Dr. J. Greenfeld

For surveying to make a mark on the GIS industry and become a prominent stakeholder in GIS, it has to offer some expertise that most other professionals cannot. Unfortunately, the ability to collect spatial data is becoming a common skill, and the surveyor's positioning expertise is not as unique as it used to be.

ABSTRACT
There is one area in which surveyors have an advantage over other GIS professionals: their propensity and ability to understand and quantify spatial errors and accuracies.
In surveying, uncertainty and quality assessment is mostly confined to positioning or positional accuracies. The quality of surveying results is typically assessed on the basis of measurement accuracy and the propagation of these accuracies into other computed quantities.
In GIS, uncertainty and quality issues are much broader. In addition to positional accuracy there is: attribute accuracy, completeness of the data, sources and lineage of the data, logical consistency, fuzziness of the spatial phenomenon, currency of the data and other uncertainty issues.

Objective
The objective of this seminar is to enable surveyors to understand the broader issues of accuracy assessment beyond positional accuracies.
It will outline the extended definition of uncertainty and quality as it applies to GIS.
It will include an overview of the errors and uncertainties that could impact the quality of spatial data.
This will be followed by a discussion of the impact of errors in spatial data on spatial information.
The ISO geospatial standards will be reviewed as well.
Finally, some practical tools and examples of numerical and statistical assessment of uncertainty and quality of spatial information will be discussed and demonstrated.

Importance of Quality
• Gain confidence in geodata
• Reduce users' complaints
• Get customer's satisfaction
• Minimize consequential costs caused by decisions or actions based on erroneous data

No unified definition of data quality
1. Data quality refers to the degree of excellence exhibited by the data in relation to the portrayal of the actual phenomena. (GIS Glossary)
2. The state of completeness, validity, consistency, timeliness and accuracy that makes data appropriate for a specific use. (Government of British Columbia)
3. The totality of features and characteristics of data that bears on their ability to satisfy a given purpose; the sum of the degrees of excellence for factors related to data. (Glossary of Quality Assurance Terms)

Error and Uncertainty in GIS
• In reality, a buffer exists around each feature which represents the actual positional location of the feature
• For example, data captured at the 1:20,000 scale commonly has a positional accuracy of ± 20 metres
• This means the actual location of features may vary 20 metres in either direction from the identified position of the feature on the map
• Considering that the use of GIS commonly involves the integration of several data sets, usually at different scales and quality, one can easily see how errors can be propagated during processing

Error and Uncertainty in GIS
• The ease with which geographic data in a GIS can be used at any scale highlights the importance of detailed data quality information.
• Although a data set may not have a specific scale once it is loaded into the GIS database, it was produced with levels of accuracy and resolution that make it appropriate for use only at certain scales, and in combination with data of similar scales.

Errors in Database Creation
Errors are introduced at almost every step of database creation

Error and Uncertainty in GIS
[Figure: Error induced by data cleaning. Longley et al., chapter 6, pages 132-133]
[Figure: Merging. Longley et al., chapter 6, pages 132-133]

Classification error: difference in pixel class between the map and a reference
[Figure: classified maps of the same area, 1939 and 1971]

Error and Uncertainty in GIS
• Because of cost constraints it is often more appropriate to manage error than attempt to eliminate it!

Error and Uncertainty in GIS
• Depending upon the level of error inherent in the source data, and the error operationally produced through data capture and manipulation, GIS products may possess significant amounts of error

Error and Uncertainty in GIS
Tools to get a handle on uncertainty
Models of uncertainty: methods for assessing and describing error

Uncertainty
Uncertainties in geographic information originate from different sources:

Uncertainty (Definition of a Forest)
[Figure: definitions of a forest plotted by country (e.g. Zimbabwe)]

Six characteristics to define the external quality (Beard and Vallière)
– Definition: to evaluate whether the exact nature of a data and the object that it describes, that is, the “what”, corresponds to user needs (semantic, spatial and temporal definitions).
– Coverage: to evaluate whether the territory and the period for which the data exists, that is, the “where” and the “when”, meet user needs.
– Lineage: to find out where data come from, their acquisition objectives, the methods used to obtain them, that is, the “how” and the “why”, and to see whether the data meet user needs.
– Precision: to evaluate what data is worth and whether it is acceptable for an expressed need (semantic, temporal, and spatial precision of the object and its attributes).
– Legitimacy: to evaluate the official recognition and the legal scope of data and whether they meet the needs of de facto standards, respect recognized standards, have legal or administrative recognition by an official body, or legal guarantee by a supplier, etc.
– Accessibility: to evaluate the ease with which the user can obtain the data analyzed (cost, time frame, format, confidentiality, respect of recognized standards, copyright, etc.).

Conceptual model of uncertainty in spatial data
[Diagram: Uncertainty branches into Well Defined Objects and Poorly Defined Objects]

Definitions of geographic objects
An example of a well-defined geographical object is land ownership. The boundary between land parcels is commonly marked on the ground, and shows an abrupt and total change in ownership.

Five dimensions of objects A and B
[Diagram: objects A and B related through five dimensions: Space, Scale, Time, Relation, Attribute]

Error
Ideally, if an object is conceptualized as being definable in both attribute and spatial dimensions, then it has a Boolean occurrence; any location is either part of the object, or it is not.
Within GIS, for a number of reasons, a location or the assignment of an object to a location or to a class may be expressed as a probability.

Vagueness
Sorites Paradox (is a bald man with an additional 1 hair still bald?)
When, exactly, is a house a house; a settlement, a settlement; a city a city; an oak woodland, an oak woodland?
The questions always revolve around the threshold value of some measurable parameter or the opinion of some individual, expert or otherwise.

Vagueness
Fuzzy-set theory is an alternative to Boolean sets.
Membership of an object in a Boolean set is absolute, and defined by one of two integer values {0,1}.
Membership of a fuzzy set is defined by a real number in the range [0,1]. Membership or non-membership of the set is identified by the terminal values, while all intervening values define an intermediate degree of belonging to the set (a membership of 0.25 reflects a smaller degree of belonging to the set than a membership of 0.5).

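The contrast between Boolean and fuzzy membership can be sketched in a few lines of Python. This is only an illustration: the "forest" class, the 30% crisp threshold and the 10% to 60% linear ramp are hypothetical values chosen for the example, not figures from the seminar.

```python
def boolean_membership(canopy_pct: float, threshold: float = 30.0) -> int:
    """Crisp (Boolean) set: a location is either forest (1) or not (0).

    The 30% threshold is a hypothetical value for illustration.
    """
    return 1 if canopy_pct >= threshold else 0


def fuzzy_membership(canopy_pct: float, lower: float = 10.0, upper: float = 60.0) -> float:
    """Fuzzy set: membership rises linearly from 0 at `lower` to 1 at `upper`.

    The linear ramp and its end points are assumptions; other membership
    functions (sigmoid, trapezoid) are equally valid choices.
    """
    if canopy_pct <= lower:
        return 0.0
    if canopy_pct >= upper:
        return 1.0
    return (canopy_pct - lower) / (upper - lower)


if __name__ == "__main__":
    for pct in (5.0, 22.5, 35.0, 80.0):
        print(pct, boolean_membership(pct), round(fuzzy_membership(pct), 2))
```

With these assumed end points, a canopy cover of 22.5% has a membership of 0.25 and 35% has a membership of 0.5, while the Boolean set jumps from 0 to 1 at the threshold.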
Discord
Discord: different definitions and interpretations of the same piece of land (not a problem of a single classification but of multiple mappings of the same area).
In the defining of soil, for example, many countries have slightly different definitions of what constitutes a soil, names for soils and the spatial and attribute boundaries between soil types.
[Figure: definitions of a forest by country and organization, plotted against Canopy Coverage (%) from 0 to 90. Labels shown: Tanzania, Turkey, Morocco, Ethiopia, Belgium, Malaysia, Somalia, Denmark, UN, New Zealand, UNESCO, U.S., Australia, Israel, Japan, Mexico, South Africa, Switzerland, Kenya, Portugal, Estonia]

Discord Ambiguity
[Figure panels]
a) There is more class1 than class2
b) The “zone of transition” between classes 1 and 2 is represented by a mosaic of class1-&-class2
c) The whole area is allocated into a class1-&-class2 mosaic
d) The two distinct areas of class1 and class2 are separated by two mosaics of class1-&-class2 and class2-&-class1

Discord Ambiguity
Use personal (expert) judgment to compare classification and phenomenon changes over a longer period. This solution makes extensive use of rough and fuzzy sets to accommodate the uncertainty in the correspondence of classes.

Non-specificity Ambiguity
Possible meanings of “A is north of B”:
• A lies on exactly the same line of longitude and towards the north pole from B;
• A lies somewhere to the north of a line running east to west through B;
• A lies between perhaps north-east and north-west, but is most likely to lie in the sector between north-north-east and north-north-west of B.

Non-specificity Ambiguity
The first two definitions are precise and specific; the third is the natural language concept, which is itself vague.
Any lack of definition as to which should be used means that uncertainty arises in the interpretation of “north of”.

Uncertainty
• Attribute uncertainty (Forest vs. Ag)
• Positional uncertainty
• Definitional uncertainty
• Measurement uncertainty

[Figure: characterize most important aspects of spatial reality]
[Figure: Uncertainty affects results of analysis]

Method for determining conformance quality levels
Assumptions
• The more errors there are in a dataset, the higher the likelihood of applying erroneous data for decisions or actions
• Each false decision or action leads to consequential costs:
  • costs of finding the right answer
  • costs due to damages caused by false information, e.g. by hitting a pipeline which was documented at a different location
  • hidden costs from losing the confidence of the user community
  • hidden costs due to image loss with the customer
• A dataset is never completely free of errors
• The effort to gain a certain quality level costs time and money

Data quality
[Diagram: components of Data Quality]
• Lineage
• Accuracy
  • Positional
  • Attribute
• Completeness
• Logical Consistency
• Semantic Accuracy
• Currency

[Figure: positional error at each vertex (x, y) of a figure, shown as s_x and s_y, up to vertex (x4, y4). Note: the size of each error could be different]

Error Propagation or Propagation of Random Errors
Given independent variables each with an uncertainty, error propagation is the method of determining an uncertainty in a function of these variables.

Computed errors     Measurement or given errors
E x, E y            Angular and distance
E area              Coordinates
E vol               Distance and elevation

Error Propagation
Definition: Error propagation is a way of combining two or more random errors together to get a third.
It can be used when you need to measure more than one quantity to get at your final result. For example, an angle and a distance to compute coordinates.
Error propagation can also be used to combine several independent sources of random error on the same measurement.

Suppose the true values are related by $y_t = a x_t + b$.
The measured value of $x$ has an error of $dx$, or $x = x_t + dx$.
Thus $y = a(x_t + dx) + b = a x_t + b + a\,dx$, so $y = y_t + a\,dx$ and
$$dy = a\,dx$$
[Figure: straight line of slope a in the x-y plane, illustrating dy = a dx]

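Error Propagation
A general formula (assuming independence or no correlation):
$$s_y = \sqrt{\left(\frac{\partial y}{\partial x_1} s_{x_1}\right)^2 + \left(\frac{\partial y}{\partial x_2} s_{x_2}\right)^2 + \left(\frac{\partial y}{\partial x_3} s_{x_3}\right)^2 + \cdots + \left(\frac{\partial y}{\partial x_n} s_{x_n}\right)^2}$$
For the area of a rectangle, $A = ab$:
$$s_A = \sqrt{b^2 s_a^2 + a^2 s_b^2}$$
Example
The sides of an 80' x 100' rectangular lot were measured with an accuracy of ±0.02'. What is the accuracy of the area of the lot?

Error Propagation Examples: Random error of a sum
If $y = x_1 + x_2 + x_3 + \cdots + x_n$, then
$$s_y = \sqrt{s_{x_1}^2 + s_{x_2}^2 + s_{x_3}^2 + \cdots + s_{x_n}^2}$$
If $s_{x_1} = s_{x_2} = s_{x_3} = \cdots = s_{x_n} = s_x$, then
$$s_y = \sqrt{n}\, s_x$$
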
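The two special cases above (the error of a product A = ab and the error of a sum) can be checked numerically. The following is a minimal sketch; the function names are hypothetical, and the inputs are assumed to be independent random errors, as the general formula requires.

```python
import math


def rectangle_area_sigma(a: float, b: float, s_a: float, s_b: float) -> float:
    """Standard error of A = a*b for independent errors in a and b."""
    return math.sqrt((b * s_a) ** 2 + (a * s_b) ** 2)


def sum_sigma(sigmas: list[float]) -> float:
    """Standard error of y = x1 + x2 + ... + xn for independent errors."""
    return math.sqrt(sum(s ** 2 for s in sigmas))


if __name__ == "__main__":
    # The 80' x 100' lot with each side measured to +/-0.02'
    s_area = rectangle_area_sigma(80.0, 100.0, 0.02, 0.02)
    print(f"area = {80.0 * 100.0:.0f} sq ft, s_A = {s_area:.2f} sq ft")

    # Four equal errors of 0.02 in a sum: sqrt(4) * 0.02
    print(f"s_y = {sum_sigma([0.02] * 4):.3f}")
```

For the lot in the example this gives an area uncertainty of roughly ±2.56 square feet on 8,000 square feet, and the four-term sum reproduces the sqrt(n) rule (0.04 for n = 4).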
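Error Propagation of Azimuth and Distance to coordinates (x,y)
[Figure: line from A to B with distance D and azimuth AZ]
$$X_B = X_A + D \sin AZ_{AB}$$
$$Y_B = Y_A + D \cos AZ_{AB}$$
$$\sigma_{X_B} = \sqrt{\sigma_{X_A}^2 + (\sin AZ_{AB})^2 \sigma_D^2 + (D \cos AZ_{AB})^2 \frac{\sigma_{AZ}^2}{206265^2}}$$
$$\sigma_{Y_B} = \sqrt{\sigma_{Y_A}^2 + (\cos AZ_{AB})^2 \sigma_D^2 + (D \sin AZ_{AB})^2 \frac{\sigma_{AZ}^2}{206265^2}}$$
Here $\sigma_{AZ}$ is in arc-seconds; 206265 is the number of arc-seconds in one radian.

Error Propagation of coordinates (x,y) to Azimuth and Distance
$$D = \sqrt{(X_B - X_A)^2 + (Y_B - Y_A)^2} = \sqrt{DX^2 + DY^2}$$
$$AZ = \tan^{-1}\frac{X_B - X_A}{Y_B - Y_A} = \tan^{-1}\frac{DX}{DY}$$
$$\sigma_D = \frac{1}{D}\sqrt{DX^2(\sigma_{X_A}^2 + \sigma_{X_B}^2) + DY^2(\sigma_{Y_A}^2 + \sigma_{Y_B}^2)}$$
$$\sigma_{AZ} = \frac{206265}{D^2}\sqrt{DY^2(\sigma_{X_A}^2 + \sigma_{X_B}^2) + DX^2(\sigma_{Y_A}^2 + \sigma_{Y_B}^2)}$$
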
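Both directions of this propagation can be sketched in Python. The function names and the test values are hypothetical; azimuths are taken clockwise from north (the Y axis), azimuth errors are in arc-seconds, and all input errors are assumed independent.

```python
import math

RHO = 206265.0  # arc-seconds per radian


def forward_sigma(az_deg, dist, s_dist, s_az_sec, s_xa=0.0, s_ya=0.0):
    """Errors of (X_B, Y_B) computed from azimuth and distance out of A."""
    az = math.radians(az_deg)
    s_az = s_az_sec / RHO  # azimuth error in radians
    s_xb = math.sqrt(s_xa**2 + (math.sin(az) * s_dist) ** 2
                     + (dist * math.cos(az) * s_az) ** 2)
    s_yb = math.sqrt(s_ya**2 + (math.cos(az) * s_dist) ** 2
                     + (dist * math.sin(az) * s_az) ** 2)
    return s_xb, s_yb


def inverse_sigma(xa, ya, xb, yb, s_xa, s_ya, s_xb, s_yb):
    """Distance, azimuth (degrees) and their errors from two coordinates."""
    dx, dy = xb - xa, yb - ya
    dist = math.hypot(dx, dy)
    az_deg = math.degrees(math.atan2(dx, dy)) % 360.0
    var_x = s_xa**2 + s_xb**2  # variance of DX
    var_y = s_ya**2 + s_yb**2  # variance of DY
    s_dist = math.sqrt(dx**2 * var_x + dy**2 * var_y) / dist
    s_az_sec = RHO * math.sqrt(dy**2 * var_x + dx**2 * var_y) / dist**2
    return dist, az_deg, s_dist, s_az_sec


if __name__ == "__main__":
    print(forward_sigma(az_deg=0.0, dist=100.0, s_dist=0.01, s_az_sec=10.0))
    print(inverse_sigma(0.0, 0.0, 0.0, 100.0, 0.01, 0.01, 0.01, 0.01))
```

For a line pointing due north, the distance error maps entirely into Y and the azimuth error entirely into X, which makes the output easy to check by hand.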
Error propagation of coordinates to area
$$A = \frac{1}{2}\sum_i x_i\,(y_{i-1} - y_{i+1})$$
$$\sigma_A = \frac{1}{2}\sqrt{\sum_i \left[(y_{i-1} - y_{i+1})\,\sigma_{x_i}\right]^2 + \left[(x_{i+1} - x_{i-1})\,\sigma_{y_i}\right]^2}$$

In the table, dX = X(i+1) - X(i-1) and dY = Y(i-1) - Y(i+1).

Point   X          Y          dX         dY        Xi*dY          [sx*dY]^2   [sy*dX]^2
A       10000.00   10000.00
B        9600.04   10599.96   -1300.01   -799.95   -7679521.06      78.76      468.01
C        8699.99   10799.95    -500.06    649.94    5654467.30     216.58      116.81
D        9099.98    9950.02    1300.01    799.95    7279498.42     570.06     1457.26
A       10000.00   10000.00     500.06   -649.94   -6499391.57     545.43      344.47
B        9600.04   10599.96
Sum                                                 -1244946.91    1410.83     2386.55

AREA = 622473.45
s_A = 30.81

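The area formula and its error propagation can be sketched as a single loop over the vertices. The function name is hypothetical, and the per-vertex standard errors must be supplied by the caller: the table above lists only the squared products, not the individual s_x and s_y values. Errors are assumed independent between coordinates and between points.

```python
import math


def polygon_area_and_sigma(points, sigmas):
    """Area of a closed polygon and its standard error.

    points: list of (x, y) vertices in order, without repeating the first.
    sigmas: list of (s_x, s_y) per vertex, assumed independent.
    """
    n = len(points)
    twice_area = 0.0
    variance_sum = 0.0
    for i in range(n):
        x_prev, y_prev = points[i - 1]        # wraps to the last vertex
        x_i, _ = points[i]
        x_next, y_next = points[(i + 1) % n]  # wraps to the first vertex
        s_x, s_y = sigmas[i]
        twice_area += x_i * (y_prev - y_next)
        variance_sum += ((y_prev - y_next) * s_x) ** 2 \
                      + ((x_next - x_prev) * s_y) ** 2
    return abs(twice_area) / 2.0, math.sqrt(variance_sum) / 2.0


if __name__ == "__main__":
    # An 80 x 100 rectangle with 0.02 on every coordinate (illustrative values)
    rect = [(0.0, 0.0), (80.0, 0.0), (80.0, 100.0), (0.0, 100.0)]
    print(polygon_area_and_sigma(rect, [(0.02, 0.02)] * 4))

    # The four points tabulated above; only the area is compared here
    lot = [(10000.00, 10000.00), (9600.04, 10599.96),
           (8699.99, 10799.95), (9099.98, 9950.02)]
    area, _ = polygon_area_and_sigma(lot, [(0.0, 0.0)] * 4)
    print(f"area = {area:.2f}")
```

The area computed from the printed coordinates should land within a few square units of the tabulated 622473.45; a small difference is expected because the coordinates in the table appear to be rounded.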
POSITIONAL ACCURACY
Defined as the closeness of locational information (usually coordinates) to the true position.
How to test positional accuracy?
• use an independent source of higher accuracy (e.g. GPS or raw survey data)
• use internal evidence: unclosed polygons and lines which overshoot or undershoot junctions are indications of inaccuracy; the sizes of gaps, overshoots and undershoots may be used as a measure of positional accuracy
• compute accuracy from knowledge of the errors introduced by different sources using error propagation

The National Standard for Spatial Data Accuracy (NSSDA)
• A well-defined statistic and testing methodology for positional accuracy of spatial data.
• Applicable to digital and graphic forms (aerial photographs, satellite imagery, and maps)
• The standard does not define “pass-fail” accuracy values (agencies are to set criteria)
• Accuracy report
http://www.fgdc.gov/standards/projects/FGDC-standards-projects/accuracy/

The standard deviation for the z coordinate direction is:
$$s_z = \sqrt{\frac{\sum (d_i - \bar{d})^2}{n - 1}}$$
where:
$d_i = z_{data,i} - z_{check,i}$
$\bar{d} = \frac{\sum d_i}{n}$ is the mean discrepancy
$n$ = total number of points checked
NSSDA vertical accuracy is:
$$Accuracy_z = 1.96 \cdot s_z \quad \text{(95% confidence level)}$$

Well-Defined Points
Features that can be identified within 1/3 of the maximum expected uncertainty for the data set.
Acceptable features:

Small scale                  Large scale
Road/Rail intersections      Center of utility access cover
Small isolated shrubs        Sidewalk/curb/gutter intersections
Corners of structures        Monuments

Check survey points should have accuracies within one-third of the data set's intended accuracy (95% CL).

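The vertical accuracy computation can be sketched as follows. The function name and the sample elevations are hypothetical. The sketch follows the standard-deviation form shown above and, for comparison, also reports the RMSE form; the published NSSDA states vertical accuracy as 1.9600 × RMSE_z, and the two agree closely only when the mean discrepancy is near zero and the number of check points is large.

```python
import math


def vertical_accuracy(z_data, z_check):
    """95% vertical accuracy from dataset and check-survey elevations."""
    d = [zd - zc for zd, zc in zip(z_data, z_check)]  # discrepancies
    n = len(d)
    mean_d = sum(d) / n
    s_z = math.sqrt(sum((di - mean_d) ** 2 for di in d) / (n - 1))
    rmse_z = math.sqrt(sum(di ** 2 for di in d) / n)
    return {
        "mean_discrepancy": mean_d,
        "s_z": s_z,
        "rmse_z": rmse_z,
        "accuracy_from_s_z": 1.96 * s_z,      # form used in this section
        "accuracy_from_rmse": 1.96 * rmse_z,  # form used in the NSSDA text
    }


if __name__ == "__main__":
    # Hypothetical elevations (same units for both lists)
    z_data = [10.1, 9.9, 10.2, 9.8]
    z_check = [10.0, 10.0, 10.0, 10.0]
    for name, value in vertical_accuracy(z_data, z_check).items():
        print(f"{name}: {value:.4f}")
```

A real test would use many more check points than the four used here; the NSSDA calls for a minimum of 20.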
When data exist for only a portion of the data set, confine test points to that area.
When the distribution of error is likely to be nonrandom, it may be desirable to locate check points to correspond to the error distribution.

2   4.07
3   2.28
4   3.98
5   4.18

ATTRIBUTE ACCURACY
Defined as the closeness of attribute values to their true value.
Note that while location does not change with time, attributes often do.
Attribute accuracy must be analyzed in different ways depending on the nature of the data.
For continuous attributes (surfaces) such as on a DEM or TIN:
• accuracy is expressed as measurement error (e.g. ±1m)

ATTRIBUTE ACCURACY
For categorical attributes such as classified polygons:
• Are the categories appropriate, sufficiently detailed and defined?
• Is a polygon classified as A really A, or should it be B?
• How heterogeneous are the polygons (e.g. 70% A and 30% B)?
• How well are A and B defined (e.g. soils classifications)?
• The center area may be definitely A, but more like B at the edges.

Kappa
$$P_0 = P_{AA} + P_{BB}$$
$$P_e = P_{Ac}\,P_{Ar} + P_{Bc}\,P_{Br}$$
$$Kappa = \frac{P_0 - P_e}{1 - P_e}$$
where the subscripts c and r denote the column and row totals (as proportions) for each class.

The Kappa coefficient
[Figure: Dataset A, Dataset B, and the comparison of A to B]

Counts for classes R and B:
         R      B      Total
R        58     6      64
B        7      28     35
Total    65     34     99

Proportions:
         R      B      Total
R        0.586  0.061  0.646
B        0.071  0.283  0.354
Total    0.657  0.343  1

How to interpret Kappa
Kappa is always less than or equal to 1.
A value of 1 implies perfect agreement and values less than 1 imply less than perfect agreement.
In rare situations, Kappa can be negative. This is a sign that the two observers agreed less than would be expected just by chance.
A possible interpretation of Kappa. The agreement is:

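Kappa for the R/B comparison can be computed directly from the count table. This is a minimal sketch; the function name is hypothetical, and the matrix is assumed to hold counts with one dataset along the rows and the other along the columns.

```python
def kappa(matrix):
    """Cohen's Kappa from a square matrix of agreement counts."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    # Observed agreement: proportion on the diagonal
    p_o = sum(matrix[i][i] for i in range(n)) / total
    # Chance agreement: sum over classes of (row share * column share)
    row_share = [sum(matrix[i]) / total for i in range(n)]
    col_share = [sum(matrix[r][c] for r in range(n)) / total for c in range(n)]
    p_e = sum(r * c for r, c in zip(row_share, col_share))
    return (p_o - p_e) / (1.0 - p_e)


if __name__ == "__main__":
    counts = [[58, 6],
              [7, 28]]
    print(f"Kappa = {kappa(counts):.3f}")
```

For these counts the observed agreement is 86/99 (about 0.869), the chance agreement is about 0.546, and Kappa comes out at about 0.71.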
User and Producer Accuracy
The total correspondence of our example is 55%. But that only tells us part of the story. What if we were really interested in classification B? Were there changes in classification B? Even here, there are two different ways of interpreting that question:
• If I were interested in mapping all the areas of B, how well did I get them all? This is called the Map Producer's Accuracy. That is, how well did we produce a map of classification B?
• If I were to use the map to find B, how successful would I be? This is called the Map User's Accuracy. That is, how much confidence should a user of the map have for a given classification?

User and Producer Accuracy
Map user's accuracy = the total number correct within a row divided by the total number in the whole row.
Map producer's accuracy = the total number correct within a column divided by the total number in the whole column.

Example of classification B:
         A    B    C    Total
A        2    2    0    4
B        0    2    1    3
C        0    1    1    2
Total    2    5    2

Map user's accuracy = 2/3 = 67%
Map producer's accuracy = 2/5 = 40%

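The row and column ratios can be computed for every class at once. This is a small sketch with a hypothetical function name; it assumes, as in the example, that map classes run along the rows and reference classes along the columns.

```python
def accuracy_report(matrix, labels):
    """Overall, user's (row) and producer's (column) accuracy per class."""
    n = len(labels)
    total = sum(sum(row) for row in matrix)
    overall = sum(matrix[i][i] for i in range(n)) / total
    report = {}
    for i, label in enumerate(labels):
        row_total = sum(matrix[i])
        col_total = sum(matrix[r][i] for r in range(n))
        report[label] = {
            "user": matrix[i][i] / row_total,      # correct / whole row
            "producer": matrix[i][i] / col_total,  # correct / whole column
        }
    return overall, report


if __name__ == "__main__":
    counts = [[2, 2, 0],
              [0, 2, 1],
              [0, 1, 1]]
    overall, report = accuracy_report(counts, ["A", "B", "C"])
    print(f"overall = {overall:.1%}")
    for label, acc in report.items():
        print(f"{label}: user = {acc['user']:.0%}, producer = {acc['producer']:.0%}")
```

For class B this reproduces the 67% user's accuracy and 40% producer's accuracy of the example, and the overall correspondence is 5/9, the 55% quoted above.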
LOGICAL CONSISTENCY
Topological consistency: correctness of the explicitly encoded topological characteristics of a dataset. For example:
• If there are polygons, do they close?
• Is there exactly one label within each polygon?
• Are there nodes wherever arcs cross, or do arcs sometimes cross without forming nodes?

COMPLETENESS
Refers to the presence and absence of features, their attributes and relationships in the spatial data, compared with what is defined in the data model or what is in the real world.
Error of commission: data present in a data set that is not present in the data model or the real world.
Error of omission: data that is present in the data model or the real world but is absent from the dataset.

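Errors of commission and omission reduce to set differences once features in the dataset can be matched to a reference. The sketch below assumes that matching has already been done and that features carry comparable identifiers; the function name, the identifiers and the rate definitions are illustrative assumptions, not prescribed measures.

```python
def completeness(dataset_ids, reference_ids):
    """Commission and omission between a dataset and a reference."""
    dataset, reference = set(dataset_ids), set(reference_ids)
    commission = dataset - reference  # in the dataset, not in reality
    omission = reference - dataset    # in reality, missing from the dataset
    return {
        "commission": sorted(commission),
        "omission": sorted(omission),
        # One possible rate definition: share of each source affected
        "commission_rate": len(commission) / len(dataset),
        "omission_rate": len(omission) / len(reference),
    }


if __name__ == "__main__":
    # Hypothetical building identifiers
    in_dataset = ["b1", "b2", "b3", "b9"]
    in_reference = ["b1", "b2", "b3", "b4", "b5"]
    print(completeness(in_dataset, in_reference))
```

In practice the hard part is the matching itself (for example by spatial overlap), which this sketch leaves out.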
An Example of Data Quality Elements and Sub-elements for Buildings

Quality element: Positional accuracy
  Quality sub-element: Horizontal accuracy
    Description by example: RMSE of a building polygon based on a comparison of the horizontal coordinates of all the nodes of the building's footprint in GIS with the corresponding reference values.
  Quality sub-element: Vertical accuracy
    Description by example: RMSE of a building polygon based on a comparison of the vertical coordinates of all the nodes of the building's footprint in GIS with the corresponding reference values.

An Example of Data Quality Elements and Sub-elements for Buildings

Quality element: Logical consistency
  Quality sub-element: Format consistency
    Description by example: Building names in title case, such as "Hong Kong Airport", are consistent, while a name such as "HONG KONG Airport" is not consistent in format.
  Quality sub-element: Topological consistency
    Description by example: When the outline of a building polygon is closed, the topology is consistent; when the outline is not closed, the topology is not consistent.

Uncertainties Measured Based on Various Mathematical Theories
[Diagram: Uncertainty divides into Imprecision, Ambiguity and Vagueness. Theories and measures shown: probability and statistical theory; confidence region model (Shi 1994); entropy (Shannon 1948); Hartley's measure (1928); evidence theory; U-uncertainty; discord measure, confusion measure and non-specificity measure; fuzzy sets; fuzzy measure; fuzzy topology and fuzzy topology measure]

Error model of point: Error Ellipse
[Figure: error ellipse with semi-axes S_u and S_v along the rotated axes U and V, rotated by angle t from the X, Y axes; S_x and S_y are the standard errors along X and Y]
$$\tan 2t = \frac{2 S_{XY}}{S_X^2 - S_Y^2}$$
$$K = \sqrt{\frac{(S_X^2 - S_Y^2)^2}{4} + S_{XY}^2}$$
$$S_u^2 = \frac{S_X^2 + S_Y^2}{2} + K \qquad S_v^2 = \frac{S_X^2 + S_Y^2}{2} - K$$

Error model of line: Epsilon band
Assumptions:
1. Each error effect relevant to a particular digital line in a GIS can be treated as a random variable, perturbing the true line to obtain the observed line.
2. The processes of generating a digital line in a GIS can be treated as being independent.
The bandwidth is determined from a statistical function of those positional errors on the line accumulated from the first stage to the final stage of data capture.
[Figure: the measured line and the true line within the epsilon band]

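The error ellipse parameters follow directly from the variances and covariance of a point. The sketch below uses a hypothetical function name and treats the rotation angle t as measured from the X axis toward the major axis, which is an assumption about the axis convention; some surveying texts measure it from the Y axis instead.

```python
import math


def error_ellipse(var_x: float, var_y: float, cov_xy: float):
    """Semi-major axis, semi-minor axis and rotation angle (degrees).

    var_x, var_y: variances S_X^2 and S_Y^2; cov_xy: covariance S_XY.
    """
    k = math.sqrt((var_x - var_y) ** 2 / 4.0 + cov_xy ** 2)
    var_u = (var_x + var_y) / 2.0 + k
    var_v = (var_x + var_y) / 2.0 - k
    # atan2 resolves the quadrant ambiguity of tan(2t)
    t = 0.5 * math.atan2(2.0 * cov_xy, var_x - var_y)
    return math.sqrt(var_u), math.sqrt(max(var_v, 0.0)), math.degrees(t)


if __name__ == "__main__":
    # Uncorrelated case: the ellipse is aligned with the axes
    print(error_ellipse(4.0, 1.0, 0.0))
    # Equal variances with positive covariance: rotated by 45 degrees
    print(error_ellipse(1.0, 1.0, 0.5))
```

These are standard-error (roughly 39% probability) ellipse axes; scaling to another confidence level requires a multiplier that is outside this sketch.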
What is a standard?
Standards are documented agreements containing technical specifications or other precise criteria to be used consistently as rules, guidelines, or definitions of characteristics, to ensure that materials, products, processes and services are fit for their purpose.

Examples of Everyday Standards
• Traffic Signals, Road Signs
• VISA / Mastercard: standards allow people to use a single card to obtain cash in the local currency around the world
• Commerce/Manufacturing/Industry
• World War II: Allied supplies and facilities were severely strained due to the incompatibility of tools, replacement parts, and equipment. The establishment of international standards helped to increase compatibility.

Types of geospatial standards
• Data Classification, e.g., Vegetation Classification
• Data Content, e.g., Digital Geospatial Metadata, Spatial Schema
• Data Symbology or Presentation, e.g., Digital Geologic Map Symbolization
• Data Transfer
• Data Usability, e.g., Geospatial Positioning Accuracy

Evaluating and Reporting Quality Evaluation Results [ISO 19114]
Inputs: the dataset as specified by the scope; the product specification or user requirements (work item 19131).
5 step process on quality evaluation (ISO 19114):
1. Identify an applicable data quality element, data quality subelement, and data quality scope (ISO 19113)
2. Identify a data quality measure
3. Select and apply a data quality evaluation method
4. Determine the data quality result
5. Determine conformance against the conformance quality level
Outputs: report data quality result (quantitative) after step 4; report data quality result (pass / fail) after step 5.

Metadata Example
[Figure: Without… / With…]

Metadata need Example

The Standard
• Title, Abstract, Publication Date (Section 1: Identification Information)
• Data Accuracy and Completeness (Section 2: Data Quality Information)
• Data Form: Vector or Raster? (Section 3: Spatial Data Organization Information)
• Projection or Geographic Reference System (Section 4: Spatial Reference Information)
• What Values Are Associated with Geodata? (Section 5: Entity and Attribute Information)
• How Do You Get It? Cost? (Section 6: Distribution Information)
• How Current Is the Documentation? (Section 7: Metadata Reference Information)

Metadata has four major roles:
• Organize and maintain an organization's investment in data
• Provide information to data catalogs and clearinghouses
• Provide information to aid data transfer

Food for thought...
• Nothing happens overnight: get used to thinking of the long term benefits of metadata. $$$
• Documentation = defense
• The Standard: don't judge a book by its cover

Metadata resources
The FGDC
Federal Geographic Data Committee: interagency committee that coordinates federal geo-data activities.
The Content Standard for Digital Geospatial Metadata (CSDGM)
• The current US Federal Metadata standard
• Often referred to as the 'FGDC Metadata Standard'
• Has been implemented in federal, state and local governments
The International Organization for Standardization (ISO) has developed and approved an international metadata standard, ISO 19115 – Geographic Information Metadata.

Metadata resources
• The objective of this International Standard is to provide a clear procedure for the description of digital geographic datasets so that users will be able to determine whether the data in a holding will be of use to them and how to access the data. By establishing a common set of metadata terminology, definitions and extension procedures, this standard will promote the proper use and effective retrieval of geographic data.
• Supplementary benefits of this standard for metadata are to facilitate the organization and management of geographic data and to provide information about an organization's database to others.
• This standard for the implementation and documentation of metadata furnishes those unfamiliar with geographic data the appropriate information to characterize their geographic data, and it makes possible dataset cataloguing, enabling data discovery, retrieval and reuse.

Avoid using jargon

Best Practices for Writing Quality Metadata
In Practice
In Practice (continued)