02_spatial_data_models

The document discusses spatial data models in GIS, focusing on raster and vector data models. It explains the characteristics, advantages, and disadvantages of raster models, including encoding methods such as cell-by-cell, run-length, and quad-tree encoding. The vector data model is contrasted with raster, emphasizing its use of points, lines, and polygons, and introduces spaghetti and topological data structures.

Uploaded by

gegis.jkuat2021

GIS II

EGE 2414

Spatial Data Models


Lecture No. 02

Felix Mutua, PhD


Monday, September 23, 2024
Definition
• In order to visualize natural phenomena, one must first
determine how to best represent geographic space.
• Data models are a set of rules and/or constructs used to
describe and represent aspects of the real world in a
computer.
• Two primary data models are available to complete this
task:
– raster data models and
– vector data models.

9/23/2024
Raster Data Models
• Raster data models are widely used in applications ranging far beyond
geographic information systems (GISs).
• You are already very familiar with this data model if you
have any experience with digital photographs.
• The ubiquitous JPEG, BMP, and TIFF file formats (among
others) are based on the raster data model.

Raster Data Models
• If you zoom deeply into a digital image, you will notice that it is
composed of an array of tiny square pixels (or picture
elements). Viewed as a whole, these uniquely colored pixels
combine to form a coherent image.
• All liquid crystal display (LCD) computer monitors are based on
raster technology, as they are composed of a set number of
rows and columns of pixels.
Raster Data Models
• Because of the reliance on a uniform series of square pixels,
the raster data model is referred to as a grid-based system.
• Typically, a single data value will be assigned to each grid
locale.
• Each cell in a raster carries a single value, which represents
the characteristic of the spatial phenomenon at a location
denoted by its row and column.
• The data type for that cell value can be either integer or
floating-point
Raster Data Models
• Alternatively, a raster graphic can reference a database
management system in which open-ended attribute tables
associate multiple data values with each pixel.
• Advances in computer technology have made this
second approach increasingly feasible, as large
datasets are no longer constrained by computer storage
as they once were.

Spatial Resolution
• The raster model will average all values within a given pixel to
yield a single value. Therefore, the more area covered per pixel,
the less accurate the associated data values.
• The area covered by each pixel determines the spatial
resolution of the raster model from which it is derived.
• Specifically, resolution is determined by measuring one side of the
square pixel.
• A raster model with pixels representing 10 m by 10 m (or 100 square
meters) in the real world would be said to have a spatial resolution of
10 m; a raster model with pixels measuring 1 km by 1 km (1 square
kilometer) in the real world would be said to have a spatial resolution
of 1 km; and so forth.
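The averaging described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular GIS package resamples data: the function name `aggregate_mean`, the `factor` parameter, and the sample values are all assumptions made for this example.

```python
def aggregate_mean(grid, factor):
    """Coarsen a raster by averaging factor x factor blocks of cells."""
    rows, cols = len(grid), len(grid[0])
    coarse = []
    for r in range(0, rows, factor):
        out_row = []
        for c in range(0, cols, factor):
            # Collect every fine cell that falls inside this coarse pixel.
            block = [grid[i][j]
                     for i in range(r, min(r + factor, rows))
                     for j in range(c, min(c + factor, cols))]
            out_row.append(sum(block) / len(block))
        coarse.append(out_row)
    return coarse

# Hypothetical 4 x 4 raster of fine-resolution values.
fine = [
    [10, 12, 30, 32],
    [14, 16, 34, 36],
    [50, 52, 70, 72],
    [54, 56, 74, 76],
]
coarse = aggregate_mean(fine, 2)
print(coarse)
```

Each 2 by 2 block of fine cells collapses to its mean, so any variation inside the block is lost. This is why values become less accurate as pixels cover more area.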
Spatial Resolution
• Care must be taken when determining the resolution of a
raster because using an overly coarse pixel resolution will
cause a loss of information, whereas using overly fine
pixel resolution will result in significant increases in file
size and computer processing requirements during
display and/or analysis.
• An effective pixel resolution will take both the map scale
and the minimum mapping unit of the other GIS data into
consideration
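A rough way to see the cost of a finer resolution is to count the cells needed to cover the same area. The sketch below assumes a hypothetical 10 km by 10 km study area; `cell_count` is an illustrative helper, not a function from any GIS library.

```python
import math

def cell_count(width_m, height_m, resolution_m):
    """Number of cells needed to cover a rectangular extent at a given pixel size."""
    cols = math.ceil(width_m / resolution_m)
    rows = math.ceil(height_m / resolution_m)
    return rows * cols

# Hypothetical 10 km x 10 km study area at three pixel sizes (in meters).
for resolution in (1000, 100, 10):
    print(resolution, cell_count(10_000, 10_000, resolution))
```

Each tenfold reduction in pixel size multiplies the number of cells by one hundred, which is the source of the growth in file size and processing time.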

Raster Model Characteristics (Rules)
1. Each pixel must hold at least one
value, even if that data value is
zero.
– If no data are present for a given
pixel, a data value placeholder must
be assigned to this grid cell.
– Often, an arbitrary, readily
identifiable value (e.g., −9999) will
be assigned to pixels for which
there is no data value.
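The placeholder matters in practice because it must be excluded from any calculation. A minimal sketch, assuming −9999 as the placeholder and a small made-up elevation grid (the names `NODATA` and `mean_ignoring_nodata` are hypothetical):

```python
NODATA = -9999  # placeholder value, as in the example above

def mean_ignoring_nodata(grid, nodata=NODATA):
    """Average all cell values, skipping cells that hold the placeholder."""
    values = [v for row in grid for v in row if v != nodata]
    return sum(values) / len(values) if values else None

# Hypothetical elevation raster with two cells that have no data.
elevation = [
    [120, 125, NODATA],
    [118, NODATA, 130],
]
print(mean_ignoring_nodata(elevation))
```

Averaging the grid without the filter would treat −9999 as a real elevation and give a meaningless result.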

Raster Model Characteristics (Rules)
2. A cell can hold any alphanumeric
index that represents an
attribute.
– In the case of quantitative datasets,
attribute assignment is fairly
straightforward. For example, if a raster
image denotes elevation, the data value
for each pixel would be some indication
of elevation, usually in feet or meters.
– In the case of qualitative datasets, data
values are indices that necessarily refer
to some predetermined translational
rule. In the case of a land-use/land-cover
raster graphic, the following rule may be
applied: 1 = grassland, 2 = agricultural,
3 = disturbed, and so forth.
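The translational rule for a qualitative raster is essentially a lookup table. The sketch below uses the three codes given above; the sample grid and the `decode` helper are assumptions for illustration.

```python
# Translational rule from the example above: index -> land-cover class.
LAND_COVER = {1: "grassland", 2: "agricultural", 3: "disturbed"}

def decode(grid, legend):
    """Replace each cell index with its class name; unknown codes are flagged."""
    return [[legend.get(value, "unknown") for value in row] for row in grid]

# Hypothetical 2 x 3 land-cover raster.
codes = [
    [1, 1, 2],
    [3, 2, 2],
]
print(decode(codes, LAND_COVER))
```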
Raster Model Characteristics (Rules)
3. Points and lines “move” to the center of the cell. As one
might expect, if a 1 km resolution raster image contains a
river or stream, the location of the actual waterway within
the “river” pixel will be unclear. Therefore, there is a general
assumption that all zero-dimensional (point) and one-dimensional
(line) features will be located toward the center
of the cell.
4. The minimum width for any line feature must necessarily be
one cell regardless of the actual width of the feature. If it is
not, the feature will not be represented in the image and
will therefore be assumed to be absent.

ENCODING/STORING RASTER DATA

Cell-by-cell raster encoding
• This minimally intensive method encodes a raster by
creating records for each cell value by row and column
• This method could be thought of as a large spreadsheet
wherein each cell of the spreadsheet represents a
pixel in the raster image.
• This method is also referred to as “exhaustive
enumeration.”
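As a rough sketch of exhaustive enumeration, the code below writes one record per cell and then rebuilds the grid from those records. The function names and the sample land-use grid are hypothetical.

```python
def encode_cell_by_cell(grid):
    """Create one (row, column, value) record per cell, row by row."""
    return [(r, c, value)
            for r, row in enumerate(grid)
            for c, value in enumerate(row)]

def decode_cell_by_cell(records, n_rows, n_cols):
    """Rebuild the grid from the exhaustive list of records."""
    grid = [[None] * n_cols for _ in range(n_rows)]
    for r, c, value in records:
        grid[r][c] = value
    return grid

# Hypothetical 2 x 3 land-use raster.
land_use = [
    [1, 1, 2],
    [1, 2, 2],
]
records = encode_cell_by_cell(land_use)
print(records)
```

The number of records always equals the number of cells, no matter how repetitive the values are.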

Cell-by-cell raster encoding

Characteristics of cell-by-cell encoding
• each cell, with its own code, is stored
individually
• requires a large amount of storage space
• processing speed is
reduced
• introduces redundancy in the database

Run-length raster encoding
• This method encodes cell values in runs of similarly
valued pixels and can result in a highly compressed image
file
• The run-length encoding method is useful in situations
where large groups of neighboring pixels have similar
values (e.g., discrete datasets such as land use/land cover
or habitat suitability) and is less useful where
neighboring pixel values vary widely (e.g., continuous
datasets such as elevation or sea-surface temperatures).

Run-length raster encoding

• Adjacent cells along a row that have the same value are treated
as a group termed a “run”.
• The pixel value is stored once, together with information about
the size and location of the run.

Run-length raster encoding
Types of run-length encoding:
a) Standard run-length encoding
– the value of the attribute, the number of
cells in the run, and the row number are
recorded in a file

b) Value point encoding
– cells are assigned position numbers starting
in the upper left corner of the image, proceeding from
left to right and from top to bottom
– the position number at the end of each run is
stored in the “POINT” column, while the value of the
cells in that run is stored in the “VALUE” column in a file
– position numbering starts at 0 for the first cell
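Both variants can be sketched with `itertools.groupby`. This is an illustration under stated assumptions: the record layouts `(row, value, run_length)` and `(value, point)` follow the descriptions above, value point runs are allowed to continue across rows because cells are numbered in one sequence, and the sample grid is made up.

```python
from itertools import groupby

def standard_rle(grid):
    """Return (row, value, run_length) records, one per run within each row."""
    records = []
    for r, row in enumerate(grid):
        for value, run in groupby(row):
            records.append((r, value, len(list(run))))
    return records

def value_point(grid):
    """Return (value, point) records; point is the 0-based position of the
    last cell in each run, counting left to right and top to bottom."""
    flat = [v for row in grid for v in row]
    records = []
    position = -1
    for value, run in groupby(flat):
        position += len(list(run))
        records.append((value, position))
    return records

# Hypothetical 3 x 4 land-cover raster.
grid = [
    [1, 1, 1, 2],
    [2, 2, 3, 3],
    [3, 3, 3, 3],
]
print(standard_rle(grid))
print(value_point(grid))
```

For this grid the twelve cells reduce to five standard records or three value point records, which illustrates why the method suits datasets with large groups of similar neighboring values.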
Quad-tree raster encoding
• A quadtree is a tree data structure in which each internal
node has exactly four children. Quadtrees are most often
used to partition a two-dimensional space by recursively
subdividing it into four quadrants or regions.
• The regions may be square or rectangular, or may have
arbitrary shapes.

Quad-tree raster encoding
• This method divides a raster into a hierarchy of quadrants
that are subdivided based on similarly valued pixels.
• The division of the raster stops when a quadrant is made
entirely from cells of the same value.
• A quadrant that cannot be subdivided is called a “leaf
node.”
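The recursive subdivision can be sketched as follows. This is a simplified illustration, assuming a square raster whose side length is a power of two; the nested-list representation, the NW, NE, SW, SE quadrant order, and the sample values are choices made for this example rather than a standard file layout.

```python
def build_quadtree(grid, row=0, col=0, size=None):
    """Recursively encode a square raster (side length a power of two).

    A uniform quadrant becomes a leaf holding its value; otherwise the
    quadrant is split into four children in NW, NE, SW, SE order.
    """
    if size is None:
        size = len(grid)
    first = grid[row][col]
    if all(grid[r][c] == first
           for r in range(row, row + size)
           for c in range(col, col + size)):
        return first  # leaf node: every cell in the quadrant has one value
    half = size // 2
    return [
        build_quadtree(grid, row, col, half),                # NW
        build_quadtree(grid, row, col + half, half),         # NE
        build_quadtree(grid, row + half, col, half),         # SW
        build_quadtree(grid, row + half, col + half, half),  # SE
    ]

def count_leaves(node):
    """Count leaf nodes in the nested-list quadtree."""
    if isinstance(node, list):
        return sum(count_leaves(child) for child in node)
    return 1

# Hypothetical 4 x 4 raster: three uniform quadrants and one mixed quadrant.
grid = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [3, 3, 4, 5],
    [3, 3, 5, 5],
]
tree = build_quadtree(grid)
print(tree, count_leaves(tree))
```

Only the mixed quadrant is subdivided down to single cells, so the sixteen cells are stored as seven leaf nodes.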

Quad-tree raster encoding
• The major disadvantage is the time it takes to create and
modify the quadtree.
• Generating the quadtree, with its indexes and tables, requires
more processing time.
• If the data are fairly homogeneous, quadtrees provide
efficient storage.
• The fewer the classes and the larger the clumps, the greater the degree of
compression and the more efficient the quadtree structure.
• Quadtrees are best used when updates are infrequent.
Advantages of the Raster Model
1. The technology required to create raster graphics is inexpensive and
ubiquitous. Nearly everyone currently owns some sort of raster
image generator, namely a digital camera.
2. Similarly, a plethora of satellites are constantly beaming up-to-the-minute
raster graphics to scientific facilities across the globe.
3. The underlying data structure is relatively simple. Each grid
location represented in the raster image correlates to a single value
(or a series of values if attribute tables are included). This simple
data structure may also help explain why it is relatively easy to
perform overlay analyses on raster data.

Disadvantages of the Raster Model
1. Raster files are typically very large. Particularly in the
case of raster images built with the cell-by-cell encoding
method, the sheer number of values stored for a
given dataset results in potentially enormous files.
2. The output images are less “pretty” than their vector
counterparts. This is particularly noticeable when the
raster images are enlarged or zoomed.

3. Changing map projections will alter the size and shape of the
original input layer and frequently result in the loss or addition of
pixels. These alterations will result in the perfect square pixels of
the input layer taking on some alternate rhomboidal dimensions.
4. Reprojecting a raster dataset from one projection to
another changes pixel values, which may, in turn,
significantly alter the output information.
5. The raster data model is not suitable for some types of spatial
analyses. For example, difficulties arise when attempting to overlay
and analyze multiple raster graphics produced at differing scales
and pixel resolutions.

VECTOR DATA MODEL

• The vector data model contrasts with the raster data model: in this
model, space is not quantized into discrete grid cells as it is in the
raster model.
• Vector data models use points and their associated X, Y coordinate
pairs to represent the vertices of spatial features
• The data attributes of these features are then stored in a separate
database management system.
• The spatial information and the attribute information for these
models are linked via a simple identification number that is given to
each feature in a map.
• Three fundamental vector types exist in geographic information
systems (GISs): points, lines, and polygons

Points
• Points are zero-dimensional objects that contain only a single
coordinate pair.
• Points are typically used to model singular, discrete features such as
buildings, wells, power poles, sample locations, and so forth.
• Points have only the property of location.
• Other types of point features include the node and the vertex.
– a point is a stand-alone feature, while
– a node is a topological junction representing a common X, Y coordinate
pair between intersecting lines and/or polygons.
– Vertices are defined as each bend along a line or polygon feature that is
not the intersection of lines or polygons.

Lines
• Points can be spatially linked to
form more complex features.
• Lines are one-dimensional
features composed of multiple,
explicitly connected points.
• Lines are used to represent linear
features such as roads, streams,
faults, boundaries, and so forth.
• Lines have the property of length.
• Lines that directly connect two
nodes are sometimes referred to
as chains, edges, segments,
or arcs.
Polygons
• Polygons are two-dimensional features created by multiple
lines that loop back to create a “closed” feature.
• In the case of polygons, the first coordinate pair (point) on the
first line segment is the same as the last coordinate pair on
the last line segment.
• Polygons are used to represent features such as city
boundaries, geologic formations, lakes, soil associations,
vegetation communities, and so forth. Polygons have the
properties of area and perimeter. Polygons are also
called areas.
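Because a polygon is a closed ring of coordinate pairs, its area and perimeter follow directly from the vertices. The sketch below uses the shoelace formula for area; the function names and the rectangle coordinates are assumptions, and the coordinates are treated as planar (projected) values.

```python
import math

def polygon_area(ring):
    """Planar area of a closed ring via the shoelace formula."""
    total = sum(x1 * y2 - x2 * y1
                for (x1, y1), (x2, y2) in zip(ring, ring[1:]))
    return abs(total) / 2

def polygon_perimeter(ring):
    """Sum of the lengths of the ring's line segments."""
    return sum(math.dist(p, q) for p, q in zip(ring, ring[1:]))

# Hypothetical 4 x 3 rectangle; the first and last coordinate pairs are
# identical, which is what makes the feature closed.
ring = [(0, 0), (4, 0), (4, 3), (0, 3), (0, 0)]
print(polygon_area(ring), polygon_perimeter(ring))
```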

Vector Data Model Structures
• Vector data models can be structured in many different
ways.
• We will examine two of the more common data structures
here.
– spaghetti data model
– topological data model

Spaghetti data model
• The simplest vector data structure is called the spaghetti
data model.
• In the spaghetti model, each point, line, and/or polygon
feature is represented as a string of X, Y coordinate pairs
(or as a single X, Y coordinate pair in the case of a vector
image with a single point) with no inherent structure

Spaghetti data model
• Think of each line in this model as a single strand of spaghetti that
is formed into complex shapes by the addition of more and
more strands of spaghetti.
• Any polygons that lie adjacent to each other must be made up
of their own lines, or strands of spaghetti.
• Each polygon must be uniquely defined by its own set
of X, Y coordinate pairs, even if the adjacent polygons
share the exact same boundary information.
• This creates some redundancies within the data model and
therefore reduces efficiency.
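The redundancy can be made concrete with two adjacent unit squares stored spaghetti-style. The polygon names and coordinates below are hypothetical, and the `segments` helper exists only for this illustration.

```python
# Each polygon is stored as its own closed string of X, Y coordinate pairs.
polygons = {
    "A": [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)],
    "B": [(1, 0), (2, 0), (2, 1), (1, 1), (1, 0)],
}

def segments(ring):
    """Undirected line segments of a ring, so direction does not matter."""
    return {frozenset(pair) for pair in zip(ring, ring[1:])}

# Nothing in the data says A and B are neighbors; the shared boundary
# has to be discovered by comparing coordinates.
shared = segments(polygons["A"]) & segments(polygons["B"])

stored_pairs = sum(len(ring) for ring in polygons.values())
unique_pairs = len({pt for ring in polygons.values() for pt in ring})
print(shared, stored_pairs, unique_pairs)
```

Ten coordinate pairs are stored to describe six distinct locations, and the shared edge appears once in each polygon.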
Spaghetti data model
• Despite the location designations associated with each line, or
strand of spaghetti, spatial relationships are not explicitly
encoded within the spaghetti model; rather, they are implied
by their location.
• The computational requirements are therefore very steep if
any advanced analytical techniques are applied to vector
files structured this way.
• Nevertheless, the simple structure of the spaghetti data model
allows for efficient reproduction of maps and graphics, because
topological information is unnecessary for plotting and
printing.

Topological data model
• In contrast to the spaghetti data model, the topological
data model is characterized by the inclusion of
topological information within the dataset, as the name
implies.
• Topology is a set of rules that models the relationships
between neighboring points, lines, and polygons and
determines how they share geometry.

Topological data model
• In mathematics, topology (from the Greek words τόπος,
'place, location', and λόγος, 'study') is concerned with the
properties of a geometric object that are preserved under
continuous deformations, such as stretching, twisting,
crumpling, and bending; that is, without closing holes, opening
holes, tearing, gluing, or passing through itself.
Figure: Möbius strips, which have only one surface and one
edge, are a kind of object studied in topology.
Topological data model
• Consider a shared polygon boundary. The inclusion of
topology in the data model allows a single line to
represent this shared boundary, with an explicit reference
denoting which side of the line belongs to which
polygon.
• Topology is also concerned with preserving spatial
properties when the forms are bent, stretched, or placed
under similar geometric transformations, which allows for
more efficient projection and reprojection of map files.

Three basic topological precepts
• Connectivity describes the arc-node
topology for the feature dataset.
• In the topological data model,
nodes are the intersection points
where two or more arcs meet.
• In the case of arc-node topology,
arcs have both a from-node (i.e.,
starting node) indicating where the
arc begins and a to-node (i.e.,
ending node) indicating where the
arc ends.

Three basic topological precepts
• In addition, between each node
pair is a line segment,
sometimes called a link, which
has its own identification
number and references both its
from-node and to-node.
• For example, if arcs A, B, and C all share node 17,
they intersect there. The computer can therefore
determine that it is possible to
move along arc A and turn onto
arc C.
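Arc-node topology amounts to a small table of arcs with their from-nodes and to-nodes. In the sketch below only node 17 and arcs A, B, and C come from the example above; the other node numbers, arc D, and the helper names are hypothetical.

```python
# arc id -> (from_node, to_node). Node 17 comes from the example above;
# the other node numbers and arc D are hypothetical.
arcs = {
    "A": (15, 17),
    "B": (17, 18),
    "C": (17, 19),
    "D": (18, 20),
}

def arcs_at_node(arcs, node):
    """All arcs that begin or end at the given node."""
    return sorted(arc for arc, ends in arcs.items() if node in ends)

def share_node(arcs, first, second):
    """True if the two arcs meet at a common node."""
    return bool(set(arcs[first]) & set(arcs[second]))

print(arcs_at_node(arcs, 17))
print(share_node(arcs, "A", "C"), share_node(arcs, "A", "D"))
```

Connectivity is answered by comparing node numbers alone; no coordinates need to be examined.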
Three basic topological precepts
• Area definition states that arcs
that connect to surround
an area define a polygon; this is also
called polygon-arc topology.
• In the case of polygon-arc
topology, arcs are used to
construct polygons, and each
arc is stored only once.
• This results in a reduction in the
amount of data stored and
ensures that adjacent polygon
boundaries do not overlap.
Figure: polygon-arc topology makes it clear that polygon F
is made up of arcs 6, 8, and 11.
Three basic topological precepts
• Contiguity, the third
topological precept, is based on
the concept that polygons that
share a boundary are deemed
adjacent.
• Specifically, polygon topology
requires that all arcs in a
polygon have a direction (a
from-node and a to-node),
which allows adjacency
information to be determined.
Three basic topological precepts
• Polygons that share an arc are
deemed adjacent, or
contiguous, and therefore the
“left” and “right” side of each
arc can be defined.
• This left and right polygon
information is stored
explicitly within the attribute
information of the topological
data model.
• The “universe polygon” is an essential component of polygon topology
that represents the external area located outside of the study area.
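Storing the left and right polygon of each arc is enough to answer both "which arcs bound this polygon?" and "which polygons are its neighbors?". In the sketch below, polygon F and arcs 6, 8, and 11 echo the earlier example; every left/right assignment, the other polygons and arcs, and the helper names are hypothetical.

```python
UNIVERSE = "universe"  # stands for the area outside the study area

# arc id -> (left_polygon, right_polygon); all assignments are hypothetical.
arcs = {
    6: ("F", UNIVERSE),
    8: ("F", "G"),
    11: ("E", "F"),
    12: ("G", UNIVERSE),
    13: (UNIVERSE, "E"),
}

def arcs_of(polygon):
    """Arcs that have the polygon on their left or right side."""
    return sorted(arc for arc, sides in arcs.items() if polygon in sides)

def neighbors(polygon):
    """Polygons across a shared arc, ignoring the universe polygon."""
    found = set()
    for left, right in arcs.values():
        if left == polygon and right != UNIVERSE:
            found.add(right)
        elif right == polygon and left != UNIVERSE:
            found.add(left)
    return sorted(found)

print(arcs_of("F"), neighbors("F"))
```

Each arc appears once in the table, yet both polygons on either side can be recovered from it.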
Topological Errors
• Topology allows the computer to rapidly determine and analyze the
spatial relationships of all its included features.
• In addition, topological information is important because it allows
for efficient error detection within a vector dataset.
• In the case of polygon features, the following violate polygon-arc topology rules:
– open or unclosed polygons, which occur when an arc does not completely
loop back upon itself, and
– unlabeled polygons, which occur when an area does not contain any
attribute information.
– Another topological error found with polygon features is the sliver.
Slivers occur when the shared boundaries of two polygons do not meet
exactly.

Topological Errors
In the case of line features,
• topological errors occur when two lines
do not meet perfectly at a node.
• This error is called an “undershoot”
when the lines do not extend far
enough to meet each other and an
“overshoot” when the line extends
beyond the feature it should connect to.
• The result of overshoots and
undershoots is a “dangling node” at the
end of the line.
• Dangling nodes aren’t always an error,
however, as they occur in the case of
dead-end streets on a road map.
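A simple way to flag dangling nodes is to count how many line endpoints touch each node; an endpoint used only once is a dangle. This sketch assumes endpoints match exactly, whereas GIS software typically applies a snapping tolerance; the function name and coordinates are made up.

```python
from collections import Counter

def dangling_nodes(lines):
    """Return endpoints that are touched by exactly one line end."""
    degree = Counter()
    for line in lines:
        degree[line[0]] += 1   # start of the line
        degree[line[-1]] += 1  # end of the line
    return sorted(node for node, count in degree.items() if count == 1)

# Hypothetical road network: a closed triangle plus one dead-end street.
lines = [
    [(0, 0), (5, 0)],
    [(5, 0), (5, 5)],
    [(5, 5), (0, 0)],
    [(5, 0), (8, 0)],  # dead end: nothing else connects at (8, 0)
]
print(dangling_nodes(lines))
```

Whether the flagged node is an error or a legitimate dead end still has to be judged by the analyst.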
Advantages of the Vector Model
1. Vector data models tend to be better representations of
reality due to the accuracy and precision of points, lines,
and polygons over the regularly spaced grid cells of the
raster model.
2. This results in vector data tending to be more
aesthetically pleasing than raster data.
3. Vector data also provide an increased ability to alter the
scale of observation and analysis.

Advantages of the Vector Model
4. As each coordinate pair associated with a point, line, and polygon
represents an infinitesimally exact location (albeit limited by the
number of significant digits and/or data acquisition
methodologies), zooming deep into a vector image does not change
the view of a vector graphic in the way that it does a raster graphic
5. Vector data tend to be more compact in data structure, so file sizes
are typically much smaller than their raster counterparts
6. Topology is inherent in the vector model. This topological
information results in simplified spatial analysis (e.g., error
detection, network analysis, proximity analysis, and spatial
transformation) when using a vector model.

Disadvantages of the Vector Model
1. The data structure tends to be much more complex than the
simple raster data model. As the location of each vertex must
be stored explicitly in the model, there are no shortcuts for
storing data like there are for raster models (e.g., the run-length
and quad-tree encoding methodologies).
2. The implementation of spatial analysis can also be relatively
complicated due to minor differences in accuracy and
precision between the input datasets.
3. Similarly, the algorithms for manipulating and analyzing
vector data are complex and can lead to intensive processing
requirements, particularly when dealing with large datasets.

Assignment 02 - Part I
1. Examine a digital photo you have taken recently. Can you
estimate its spatial resolution? Attach the photo.
2. If you were to create a raster data file showing the major land-use
types in your county,
– which encoding method would you use?
– What method would you use if you were to encode a map of the
major waterways in your county? Why?
– Create a hypothetical raster showing 5-7 classes based on the actual
county boundaries (TIFF format).
Similar or copied answers will be penalized. A score of -10 (negative) will be awarded to both the copier and the original.

Assignment 02 - Part II
Each student is required to generate a 1 km by 1 km grid representing a part of their
county that is urbanized (no two students should have the same location!).
Required
1. Download and extract road/path data from OpenStreetMap in vector format (shp)
2. Implement the following topology rules
• Must not overlap
• Must not have dangles
• Must be covered by your allocated grid
• Must not self-overlap
• Must be single part
3. Plot a “before” map showing the topological errors (pdf)
4. Fix the topological errors and store the data (raw and fixed) and the topology attributes in a
geodatabase
5. Submit the Part I answers, the map, and the geodatabase, all zipped together, via this link:
https://forms.gle/CNjX9U85C73ZShwV9
Deadline: next week, 7:00 am
