GIS Module 2

This document provides an overview of spatial data models and data structure models used in Geographic Information Systems (GIS). It discusses various types of spatial data, including vector and raster data, and outlines the advantages and disadvantages of different data structure models such as hierarchical, network, relational, and object-oriented models. Additionally, it covers data compression techniques and the importance of data modeling in organizing and analyzing geographic information.

GEOGRAPHICAL INFORMATION SYSTEM
MODULE 2

Prepared by
Aryasree Madhukumar
Assistant Professor
Department of Civil Engineering

3/20/2025 1
SPATIAL DATA MODELS

• Spatial data are crucial for Geographic Information
Systems (GIS) as they form the foundation for all
functionalities that differentiate GIS from other
analytical tools. These data are often called layers,
which represent features on, above, or below the Earth's
surface.
Depending on the type of features they represent, and the
purpose to which the data will be applied, layers will be
one of two major types.
a) Vector data represent features as discrete points, lines,
and polygons.
b) Raster data represent the landscape as a rectangular
matrix of square cells.
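
As a rough illustration of the two layer types, the sketch below stores hypothetical features both ways. The coordinates, the 4 x 4 grid, the 0/1 coding and the names `well`, `road`, `parcel` and `land_cover` are all invented for this example.

```python
# Vector layer: discrete features with explicit coordinates (hypothetical values).
well = (2.0, 3.0)                                   # point: one (x, y) pair
road = [(0.0, 0.0), (1.5, 1.0), (4.0, 1.0)]         # line: ordered vertices
parcel = [(1.0, 1.0), (3.0, 1.0), (3.0, 3.0),
          (1.0, 3.0), (1.0, 1.0)]                   # polygon: closed ring

# Raster layer: a rectangular matrix of square cells, one value per cell.
# Assumed coding for this sketch: 0 = water, 1 = land.
land_cover = [
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [1, 1, 1, 1],
    [1, 1, 0, 0],
]

def cell_value(grid, row, col):
    """Look up a raster cell by its row and column index."""
    return grid[row][col]

print(cell_value(land_cover, 1, 0))  # cell in the second row, first column
```

The vector features carry their own coordinates, while a raster cell is addressed only by its row and column.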

DATA STRUCTURE MODELS
Data models are the conceptual models that describe the
structures of databases. The structure of a database is
defined by the data types, the constraints and the
relationships for the description or storage of data.
Following are the most often used data models:
1) Hierarchical Data Structure Model
2) Network Data Structure Model
3) Relational Data Structure Model
4) Object Oriented Data Structure Model
Hierarchical Data Structure Model
• It is the earliest database model. It evolved from the file system, and its
records are arranged in a hierarchy, or tree structure.
• Records are connected through pointers that store the address of the
related record.
• Each pointer establishes a parent-child relationship where a parent can
have more than one child but a child can only have one parent.
• There is no connection between the elements at the same level.
• To locate a particular record, you have to start at the top of the tree with
a parent record and trace down the tree to the child.
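
The parent-child rule and the top-down search can be sketched as follows. This is a minimal illustration, not a real hierarchical DBMS; the `Record` class, the `find` helper and the record names are assumptions made for the example.

```python
class Record:
    """A record in a hierarchical database: one parent, any number of children."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []

    def add_child(self, child):
        child.parent = self          # a child has exactly one parent
        self.children.append(child)
        return child


def find(root, name, path=()):
    """Search from the top of the tree downwards; return the path to the record."""
    path = path + (root.name,)
    if root.name == name:
        return path
    for child in root.children:
        found = find(child, name, path)
        if found:
            return found
    return None


state = Record("State")
district_a = state.add_child(Record("District A"))
state.add_child(Record("District B"))
district_a.add_child(Record("Village 1"))

print(find(state, "Village 1"))
```

Every lookup starts at the root, which is why records deep in the tree can only be reached through their parents.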

Advantages
• Easy to understand: the organization of the database parallels a family
tree, which is easy to follow.
• Accessing or updating records is very fast since the
relationships have been predefined.
Disadvantages
• Large index files have to be maintained, and certain attribute values
are repeated many times, which leads to data redundancy and
increased storage.
• The rigid structure of this model doesn’t allow alteration of tables;
to add a new relationship, the entire database has to be redefined.

Network Data Structure Model
• A network is a generalized graph that captures
relationships between objects using connectivity.
• A network database consists of a collection of records
that are connected to each other through links.
• A link is an association between two records.
• It allows each record to have many parents and many
children thus allowing a natural model of relationships
between entities.
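
A minimal sketch of the many-parents, many-children idea is given below. The `NetworkDB` class and the road and junction records are hypothetical; they only illustrate links between records, not any particular network DBMS.

```python
from collections import defaultdict


class NetworkDB:
    """Records joined by links; a record may have many owners and many members."""

    def __init__(self):
        self.members = defaultdict(set)   # owner record -> member records
        self.owners = defaultdict(set)    # member record -> owner records

    def link(self, owner, member):
        """A link is an association between two records."""
        self.members[owner].add(member)
        self.owners[member].add(owner)


db = NetworkDB()
db.link("Road 1", "Junction X")
db.link("Road 2", "Junction X")   # Junction X now has two parents
db.link("Road 1", "Junction Y")   # Road 1 has two children

print(sorted(db.owners["Junction X"]))
print(sorted(db.members["Road 1"]))
```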

Advantages
• Many-to-many relationships are easily implemented in a network data
model.
• Data access and flexibility in the network model are better than in the
hierarchical model.
• An application can access an owner record and the member records
within a set.
• It enforces data integrity, as a user must first define the owner record and
then the member records.
• The model eliminates redundancy, but at the expense of more
complicated relationships.
Relational Data Structure Model
• Introduced by Codd in 1970.
• Relates or connects data in different files through the use of a common
field.
• A flat file structure is used with a relational database model.
• In this arrangement, data is stored in different tables made up of rows
and columns.
• The columns of a table are named by attributes.
• Each row in the table is called a tuple and represents a basic fact.
• No two rows of the same table may have identical values in all
columns.
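
Relating two tables through a common field can be sketched with Python's built-in `sqlite3` module. The `parcel` and `owner` tables, their columns and their values are invented for the illustration; `owner_id` plays the role of the common field.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE owner  (owner_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE parcel (parcel_id INTEGER PRIMARY KEY, owner_id INTEGER, area_ha REAL);
    INSERT INTO owner  VALUES (1, 'Asha'), (2, 'Ravi');
    INSERT INTO parcel VALUES (101, 1, 2.5), (102, 2, 1.0), (103, 1, 4.0);
""")

# owner_id is the common field that relates the two tables.
rows = conn.execute("""
    SELECT parcel.parcel_id, owner.name, parcel.area_ha
    FROM parcel JOIN owner ON parcel.owner_id = owner.owner_id
    ORDER BY parcel.parcel_id
""").fetchall()

for row in rows:   # each row is a tuple, i.e. one basic fact
    print(row)
```

The primary keys also enforce the rule that no two rows of a table are identical in all columns.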

Advantages
• The manager or administrator does not have to be aware of any
data structure or data pointer.
• One can easily add, update, delete or create records using simple
logic.
Disadvantages
• A few search commands in a relational database require more time
to process compared with other database models.

Object Oriented Database Structure
• Uses functions to model spatial and non-spatial relationships of
geographic objects and their attributes.
• An object is an encapsulated unit characterized by attributes,
a set of operations and rules.
An object-oriented model has the following characteristics:
• Generic properties
• Abstraction
• Ad hoc queries

An object-oriented database is based on a semantic model, which is
usually managed by a spatial language, although the language has not yet
been fully developed.

ENTITY-RELATIONSHIP DIAGRAM (ERD)
• A data modeling technique that graphically illustrates an information
system’s entities and the relationships between those entities.
• It is a conceptual and representational model of data used to
represent the entity framework infrastructure.

The elements of an ERD are:


• Entities
• Relationships
• Attributes

Steps involved in creating an ERD include:
• Identifying and defining the entities
• Determining all interactions between the entities
• Analyzing the nature of interactions/determining the
cardinality of the relationships
• Creating the ERD

SPATIAL DATA MODELS
• Models are simplifications of reality.
• The process of defining and organizing data about the real world into a
consistent digital dataset that is useful and reveals information is called data
modeling.
• The logical organization of data according to a scheme is known as a data
model.
• Data can be defined as verifiable facts.
• Information is data organized to reveal patterns, and to facilitate search.
• Spatial information is difficult to extract from spatial data, unless the data are
organized primarily by spatial attributes.

• Spatial objects are characterized by attributes that are both spatial and
non-spatial, and the digital description of objects and their attributes
comprise spatial datasets.
• Spatial data can be organized in different ways, depending on the way
they are collected, how they are stored, and the purpose to which they are put.
• A database is a collection of inter-related data and everything that is
needed to maintain and use it.
• A Database Management System is a collection of software for
storing, editing and retrieving data in a database.

Traditionally spatial data has been stored and presented in the
form of a map. Three basic types of spatial data models have
evolved for storing geographic data digitally. These are referred
to as:
• Vector;
• Raster;
• Image.

VECTOR DATA FORMATS
• Vector storage implies the use of vectors (directional lines) to
represent a geographic feature.
• Vector data is characterized by the use of sequential points or
vertices to define a linear segment.
• Each vertex consists of an X coordinate and a Y coordinate.
• Vector lines are often referred to as arcs and consist of a
string of vertices terminated by a node.
• A node is defined as a vertex that starts or ends an arc
segment.
• Point features are defined by one coordinate pair, a vertex.
• Polygonal features are defined by a set of closed coordinate
pairs.
• In vector representation, the storage of the vertices for each
feature is important, as well as the connectivity between
features, e.g. the sharing of common vertices where features
connect.

• The most popular method of retaining spatial relationships among
features is to explicitly record adjacency information in what is
known as the topologic data model.
• Spatial relationships between geographic features are easily derived
when using the topologic data model.
• The topologic model is the dominant vector data structure currently
used in GIS technology.
• Many of the complex data analysis functions cannot effectively be
undertaken without a topologic vector data structure.

• The secondary vector data structure that is common among GIS
software is the computer aided drafting (CAD) data structure.
• This structure consists of listing elements, not features, defined by
strings of vertices, to define geographic features, e.g. points, lines, or
areas.
• The CAD structure emerged from the development of computer
graphics systems without specific considerations of processing
geographic features.
• Accordingly, since features, e.g. polygons, are self-contained and
independent, questions about the adjacency of features can be difficult
to answer.
• The CAD vector model lacks the definition of spatial relationships
between features that is defined by the topologic data model.
Raster Data Formats
• Raster data models incorporate the use of a grid-cell data structure where the geographic area
is divided into cells identified by row and column.
• While the term raster implies a regularly spaced grid, other tessellated data structures do exist in
grid-based GIS.
• In particular, the quadtree data structure has found some acceptance as an alternative raster
data model.

• Cell Size and Accuracy – The size of each cell affects data accuracy.
• No Need for Coordinates
• Raster vs. Vector
• Vector to Raster Conversion
• One Attribute per Cell
• Different Uses

Image Data
Image data is mainly used to store pictures, such as satellite
images, scanned maps, or photographs. In GIS, image data is
different from raster data because it is not directly used for
analysis but often serves as background display or visual
reference.

VECTOR AND RASTER – ADVANTAGES AND
DISADVANTAGES
Vector Data: Advantages:
• Data can be represented at its original resolution and form without
generalization.
• Graphic output is usually more aesthetically pleasing.
• No data conversion is required, since most source data, e.g. hard copy maps, is
already in vector form.
• Accurate geographic location of data is maintained.
• Allows for efficient encoding of topology, and as a result more efficient
operations that require topological information, e.g. proximity, network
analysis.
Disadvantages:
•Vertex Storage – Each point (vertex) in vector data must be explicitly stored.
•Complex Algorithms – Manipulating and analyzing vector data is computationally
heavy, especially for large datasets.
•Topology Requirement – Vector data needs a topological structure for analysis,
which requires intensive processing and data cleaning.
•Static Topology – Any updates or edits to vector data require rebuilding the
topology.
•Poor for Continuous Data – Vector format does not efficiently represent
continuous data like elevation; generalization or interpolation is often needed.
•Limited Polygon Analysis – Spatial analysis and filtering within polygons are not
possible in vector format.

Advantages of Raster Data:
•No Need for Coordinates – Only the origin point is stored; other
locations are implied by the grid.
•Easy and Fast Analysis – Raster data is simple to process and quick for
computations.
•Great for Math & Modeling – Works well for quantitative analysis and
mathematical modeling.
•Handles All Data Types – Supports both discrete data (e.g., forests) and
continuous data (e.g., elevation).
•Compatible with Output Devices – Works well with plotters and graphic
screens.

Disadvantages of Raster Data:
•Resolution Limited by Cell Size – The cell size fixes the resolution at
which the data is represented.
•Poor for Linear Features – Lines and network linkages are hard to
represent adequately, depending on the cell resolution.
•One Attribute per Cell – Each raster layer reflects a single attribute, so
handling many attributes means handling many layers.
•Conversion Needed – Source data in vector form must undergo
vector-to-raster conversion, which adds processing and can degrade
data through generalization or a poorly chosen cell size.
•Lower Cartographic Quality – Output maps from grid-cell systems
often look blocky compared with vector output.

RASTER DATA STRUCTURE

Figure panels: (a) Entity model; (b) Pixel values; (c) File structure.
DATA COMPACTION METHODS - RASTER
• Run length encoding
• Block encoding
• Chain encoding
• Quadtree
Single-layer raster data can be represented using:
(a) Two colors (binary)
(b) Gray-scale
DATA COMPRESSION
• The process of reducing the size of a file or database.
• Compression improves data handling, storage, and database
performance.
• Examples of compression methods include quadtrees, run-length
encoding, and wavelets.

Compression ratio:
• The compression ratio (that is, the size of the compressed file
compared to that of the uncompressed file) of lossy video codecs is
nearly always far superior to that of the audio and still-image
equivalents.
• Wavelet compression, used by raster formats such as
MrSID, JPEG 2000, and ER Mapper's ECW, takes time to decompress
before drawing.
• Compression is a series of techniques used for the reduction of space,
bandwidth, cost, transmission time, generating time, and the storage of
data.
• It’s a computer process using algorithms that reduces the size of
electronic documents so they occupy less digital storage space.
Compression data principle

“Eliminate data
redundancy and try to find
a code with less data
volume.”

• All raster/image compression and encoding techniques attempt to get rid
of the inherent redundancy, which may be spatial (similar or equal
neighboring pixels), spectral (pixels in different spectral bands in a color
image) or temporal (correlated images in a sequence).
Raster Data Compression
• Huge raster datasets have to be stored, retrieved, manipulated and analyzed.
• A large number of thematic map layers is involved.
• Many repetitive characters are involved.
• Therefore, for better storage and to preserve the highest possible degree of
accuracy, we need compact methods of storing.
• A common method is the elimination of repetitive characters.

Run length Encoding
• Values often occur in runs across several cells, i.e., cells of the same
value are often neighbors, like the same soil type or similar parameters.
• Spatial autocorrelation exists: a tendency for nearby things to be
more similar than distant things.
• In run length encoding, the cells of the same value in a row may be
compacted by stating the value and their total.
• Thematic maps storage sizes get reduced using run length encoding.
• Some raster GIS packages have the capability to handle run length
encoded files.
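
Stating each value together with its total can be sketched as below. The function names and the sample row of soil codes are assumptions made for the illustration.

```python
def rle_encode(row):
    """Compress one raster row into [value, run_length] pairs."""
    runs = []
    for value in row:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1          # same value as the previous cell: extend the run
        else:
            runs.append([value, 1])   # value changed: start a new run
    return runs


def rle_decode(runs):
    """Expand [value, run_length] pairs back into the original row."""
    row = []
    for value, count in runs:
        row.extend([value] * count)
    return row


row = ["A", "A", "A", "B", "B", "A", "A", "A", "A", "C"]
encoded = rle_encode(row)
print(encoded)
```

Ten cells are stored as four runs here; a row with no repeated neighbours would gain nothing from the encoding.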

Run-length codes allow the points in each mapping unit to be stored per row
in terms, from left to right, of a begin cell and an end cell.
(Figure: 16 x 16 grid with rows and columns numbered 1 to 16.)
RUN-LENGTH CODES (RLE)
Begin and end cells of the runs in each row:
Row 9: 2-3, 6-6, 8-10
Row 10: 1-10
Row 11: 1-9
Row 12: 1-9
Row 13: 3-9
Row 14: 5-16
Row 15: 7-14
Row 16: 9-11
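
The begin-cell and end-cell form of run-length codes can be sketched as follows, using the three runs listed for row 9. The helper names and the 0/1 cell coding are assumptions; columns are numbered from 1 as in the figure.

```python
def encode_row(cells):
    """Return (begin, end) column pairs for runs of filled cells (1-based)."""
    runs, start = [], None
    for col, filled in enumerate(cells, start=1):
        if filled and start is None:
            start = col                       # a run begins
        elif not filled and start is not None:
            runs.append((start, col - 1))     # the run ended at the previous cell
            start = None
    if start is not None:
        runs.append((start, len(cells)))      # run reaches the end of the row
    return runs


def decode_row(runs, width):
    """Rebuild a row of 0/1 cells from (begin, end) pairs."""
    cells = [0] * width
    for begin, end in runs:
        for col in range(begin, end + 1):
            cells[col - 1] = 1
    return cells


row9 = decode_row([(2, 3), (6, 6), (8, 10)], width=16)
print(row9)
print(encode_row(row9))
```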
Codes – II: Recording end cell position within a row.
Describes the interior of an area by run-lengths, instead of the boundary.
In the multiple-attribute case there are more options available.
Figure attribute classes: X, Y, R, W.
Source: http://www.gitta.info/DataCompress/en/html/rastercomp_chain.html
Run length coding
Full raster coding: 256 values
Run length coding: 129 values
BLOCK CODES
• The idea of run-length codes can be extended to two dimensions by
using square blocks to tile the area to be mapped.
• The data structure consists of just three numbers per block: its position,
its size and the contents of its pixels.
• This image can be stored by: 17 unit squares + 9 4-squares +
1 16-square.
• The larger the square that can be fitted in any given region and the
simpler the boundary, the more efficient block coding becomes.
(Figures: 16 x 16 grids with rows and columns numbered 1 to 16.)
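
One way to tile a binary raster with square blocks is sketched below. It uses a simple greedy strategy chosen for the illustration, which is not necessarily the method behind the figures, and it will not always find the smallest number of blocks. The function name, the 0-based (row, col, size) output and the sample grid are assumptions.

```python
def block_encode(grid):
    """Greedy block coding of a binary raster: returns (row, col, size) squares.

    Cells are scanned row by row; at each uncovered filled cell, the largest
    square of uncovered filled cells with that cell as its top-left corner
    is recorded.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    covered = [[False] * n_cols for _ in range(n_rows)]
    blocks = []

    def fits(r, c, size):
        if r + size > n_rows or c + size > n_cols:
            return False
        return all(grid[i][j] == 1 and not covered[i][j]
                   for i in range(r, r + size) for j in range(c, c + size))

    for r in range(n_rows):
        for c in range(n_cols):
            if grid[r][c] != 1 or covered[r][c]:
                continue
            size = 1
            while fits(r, c, size + 1):
                size += 1
            for i in range(r, r + size):
                for j in range(c, c + size):
                    covered[i][j] = True
            blocks.append((r, c, size))
    return blocks


grid = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
print(block_encode(grid))
```

Nine filled cells are stored as three blocks; because the raster is binary, the contents of every block are implicitly 1.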
Example of Block Codes having Multiple Attributes

Quadtree
• The typical raster model divides an area into equal-sized rectangular cells.
• In many cases, however, a variable grid cell size is used for a more compact raster
representation.
• Larger cells represent large homogeneous areas and smaller cells represent fine
detail.
• The process involves regularly subdividing a map into four equal-sized quadrants.
Any quadrant that has more than one class is subdivided again, and the subdivision
continues within each quadrant until a square is so homogeneous that it
no longer needs to be divided.
• A quadtree is then prepared, resembling an inverted tree: the root is the point
from which all branches expand, a leaf is a lowermost point, and all other points in the
tree are nodes.
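
The recursive subdivision can be sketched as follows. The nested-list representation, the quadrant order (0 = NW, 1 = NE, 2 = SW, 3 = SE) and the sample grid of class codes are assumptions made for the illustration; the grid side must be a power of two.

```python
def build_quadtree(grid, row=0, col=0, size=None):
    """Recursively subdivide a 2^n x 2^n grid into homogeneous quadrants.

    Returns the cell value for a homogeneous block (a leaf), otherwise a list
    of four subtrees in the order NW, NE, SW, SE.
    """
    if size is None:
        size = len(grid)
    first = grid[row][col]
    if all(grid[r][c] == first
           for r in range(row, row + size) for c in range(col, col + size)):
        return first                      # leaf: no further division needed
    half = size // 2
    return [
        build_quadtree(grid, row, col, half),                 # quadrant 0 (NW)
        build_quadtree(grid, row, col + half, half),          # quadrant 1 (NE)
        build_quadtree(grid, row + half, col, half),          # quadrant 2 (SW)
        build_quadtree(grid, row + half, col + half, half),   # quadrant 3 (SE)
    ]


def count_leaves(tree):
    """Number of leaves, i.e. homogeneous squares actually stored."""
    return 1 if not isinstance(tree, list) else sum(count_leaves(t) for t in tree)


grid = [
    ["F", "F", "F", "F"],
    ["F", "F", "F", "F"],
    ["A", "B", "R", "R"],
    ["A", "A", "R", "R"],
]
tree = build_quadtree(grid)
print(tree)
print(count_leaves(tree))
```

Only the mixed south-west quadrant is split down to single cells, so 16 cells are stored as 7 leaves.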
QUADTREE
The quadtree method is a more compact representation based on successive
division of the 2^n x 2^n array into quadrants.
(Figure: 16 x 16 grid divided into quadrants 0, 1, 2 and 3, with quadrants 2 and 3
subdivided further into cells such as 2001, 202, 203, 212, 213, 230, 231, 322 and 330.)
Example-1 of Quadtree having Multiple Attributes

Example: Position code of cell having value 10: 3,2

Example-2 of Quadtree having Multiple Attributes
Landuse Map and Quadtree Representation
Map labels: Reserve forest, Range Land, Single Cropping
Legend: a – Guava, b – Peach, c – Single cropping, d – Mango

Quadtree code   Attribute
0               Forest
1               Forest
2               Agriculture
20              Single Cropping
21              Orchard
210             Guava
211             Guava
212             Peach
213             Mango
22              Double Cropping
23              Double Cropping
3               Range Land

Schematic Representation of the Quadtree
Root: 0, 1, 2, 3
2: 20, 21, 22, 23
21: 210, 211, 212, 213
Example of how an area is represented on a map and the corresponding
quadtree representation.
VECTOR DATA STRUCTURE
Geographic entities encoded using the vector data model are often called features.
The features can be divided into two classes:
a) Simple features
• These are easy to create, store and are rendered on screen very quickly.
• They lack connectivity relationships and so are inefficient for modeling
phenomena conceptualized as fields.

b) Topological features
A topology is a mathematical procedure that describes how
features are spatially related and ensures data quality of the
spatial relationships. Topological relationships include the following
three basic elements:
1) Connectivity: Information about linkages among spatial
objects
2) Contiguity: Information about neighbouring spatial objects
3) Containment: Information about inclusion of one spatial
object within another spatial object
Connectivity
Arc node topology defines connectivity - arcs are connected to each other if they share a
common node. This is the basis for many network tracing and path finding operations.
Arcs represent linear features and the borders of area features. Every arc has a from-node which
is the first vertex in the arc and a to-node which is the last vertex. These two nodes define the
direction of the arc. Nodes indicate the endpoints and intersections of arcs. They do not exist
independently and therefore cannot be added or deleted except by adding and deleting arcs.
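
A minimal sketch of arc-node connectivity is given below. The arc and node identifiers are hypothetical; the only rule taken from the text is that two arcs are connected if they share a common node.

```python
# Arc-node table: arc id -> (from_node, to_node). Identifiers are invented.
arcs = {
    1: ("N1", "N2"),
    2: ("N2", "N3"),
    3: ("N2", "N4"),
    4: ("N5", "N6"),
}


def connected_arcs(arc_id):
    """Arcs that share at least one node with the given arc."""
    nodes = set(arcs[arc_id])
    return sorted(other for other, ends in arcs.items()
                  if other != arc_id and nodes & set(ends))


print(connected_arcs(1))
print(connected_arcs(4))
```

Network tracing repeats this lookup from arc to arc, starting at a chosen node.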

Contiguity
Polygon topology defines contiguity. The polygons are said to be
contiguous if they share a common arc. Contiguity allows the vector data
model to determine adjacency.

The from-node and to-node of an arc indicate its direction, which helps
determine the polygons on its left and right sides. Left-right topology refers to the
polygons on the left and right sides of an arc. In the illustration above, polygon B
is on the left and polygon C is on the right of the arc 4. Polygon A is outside the
boundary of the area covered by polygons B, C and D. It is called the external or
universe polygon, and represents the world outside the study area. The universe
polygon ensures that each arc always has a left and right side defined.
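
Left-right topology can be sketched as a table of arcs with their left and right polygons. Only arc 4 (B on the left, C on the right) and the universe polygon A come from the text; the other arcs and the `neighbours` helper are assumptions made for the illustration.

```python
# Left-right table: arc id -> (left_polygon, right_polygon).
# Arc 4 follows the text; the remaining arcs are invented.
left_right = {
    1: ("A", "B"),
    2: ("A", "C"),
    3: ("A", "D"),
    4: ("B", "C"),
    5: ("C", "D"),
}


def neighbours(polygon, universe="A"):
    """Polygons sharing an arc with `polygon`, ignoring the universe polygon."""
    found = set()
    for left, right in left_right.values():
        if polygon == left and right != universe:
            found.add(right)
        elif polygon == right and left != universe:
            found.add(left)
    return sorted(found)


print(neighbours("C"))
print(neighbours("B"))
```

Adjacency is read directly from the table, without comparing any coordinates.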

Containment
Geographic features cover a distinguishable area on the surface of the
earth. An area is represented by one or more boundaries defining a
polygon.
The polygons can be simple, or they can be complex with a hole or
island in the middle. For example, consider a lake with
an island in the middle.

Polygons are represented as an ordered list of arcs and not in
terms of X, Y coordinates. This is called Polygon-Arc topology.
Since arcs define the boundary of polygon, arc coordinates are
stored only once, thereby reducing the amount of data and
ensuring no overlap of boundaries of the adjacent polygons.

Triangulated Irregular Network (TIN)
• TIN stands for Triangulated Irregular Network,
which is a vector approach to handling a digital
elevation model.
• TIN represents a surface as contiguous
non-overlapping triangles created by performing
Delaunay triangulation.
• These triangles have a unique property: the
circumcircle that passes through the vertices of
a triangle contains no other point inside it.
• Used to interpolate surfaces using multiple
triangles.
• The data points consist of X, Y and Z values.
The final result gives users a TIN surface.
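
The empty-circumcircle property can be checked with the standard in-circle determinant, sketched below. The function name and the sample coordinates are assumptions; the test expects the triangle's vertices in counter-clockwise order.

```python
def in_circumcircle(a, b, c, d):
    """True if point d lies strictly inside the circumcircle of triangle abc.

    Uses the standard in-circle determinant; a, b and c must be listed
    counter-clockwise.
    """
    rows = []
    for px, py in (a, b, c):
        dx, dy = px - d[0], py - d[1]
        rows.append((dx, dy, dx * dx + dy * dy))
    (a1, a2, a3), (b1, b2, b3), (c1, c2, c3) = rows
    det = (a1 * (b2 * c3 - b3 * c2)
           - a2 * (b1 * c3 - b3 * c1)
           + a3 * (b1 * c2 - b2 * c1))
    return det > 0


triangle = ((0.0, 0.0), (4.0, 0.0), (0.0, 4.0))   # counter-clockwise
print(in_circumcircle(*triangle, (1.0, 1.0)))     # point within the triangle
print(in_circumcircle(*triangle, (10.0, 10.0)))   # point far from the triangle
```

A triangle satisfies the Delaunay criterion when this test is false for every other data point.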
Advantages of TIN models
• TINs give researchers the ability to view, in 2.5D and 3D, an area that
was interpolated from minimal data collection.
• Users can describe a surface at different levels of resolution based on
the points that were collected.
• TIN interpolation gives GIS users greater analytical capabilities.
• TIN models are easy to create and use.
• They provide users a simplified model that represents collected data
points.
• Using a TIN surface in conjunction with ArcMap extensions such as
Spatial Analyst and 3D Analyst, TIN users can also derive slope,
aspect, elevation, contour lines, hillshades, etc.
Different Types of TIN Methods and Processes
There are many different types of TIN interpolation methods.
Some of the most popular TIN methods include:
• Natural Neighbour,
• Kriging,
• Spline,
• Nearest Neighbour and
• Inverse Distance Weighting.
These TIN interpolation methods use mathematical algorithms
in order to generate interpolated surfaces. Each of these
methods will produce different types of surfaces.
Structure of TIN Data Model
• The TIN model represents a surface as a series of linked triangles,
hence the adjective triangulated. Triangles are made from three points,
which can occur at any location, giving the adjective, irregular. For each
triangle, TIN records:
• The triangle number
• The numbers of each adjacent triangle
• The three nodes defining the triangle
• The x, y coordinates of each node
• The surface z value of each node
• The edge type of each triangle edge (hard or soft)

Components of TIN
Nodes:
Nodes are the fundamental building blocks of the TIN. The nodes
originate from the points and arc vertices contained in the input data
sources. Every node is incorporated in the TIN triangulation. Every node
in the TIN surface model must have a z value.
Edges:
Every node is joined with its nearest neighbors by edges to form triangles,
which satisfy the Delaunay criterion. Each edge has two nodes, but a node
may have two or more edges. Because edges have a node with a z value at
each end, it is possible to calculate a slope along the edge from one node
to the other.
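
The slope along an edge follows from the z values at its two nodes, as sketched below. The function name and the node coordinates are invented; the two nodes are assumed to differ in x or y, so the horizontal distance is not zero.

```python
import math


def edge_slope(node_a, node_b):
    """Slope along a TIN edge as rise over horizontal run, plus the angle in degrees."""
    (xa, ya, za), (xb, yb, zb) = node_a, node_b
    run = math.hypot(xb - xa, yb - ya)     # horizontal distance between the nodes
    rise = zb - za                         # difference in z values
    gradient = rise / run
    return gradient, math.degrees(math.atan(gradient))


gradient, angle = edge_slope((0.0, 0.0, 100.0), (30.0, 40.0, 110.0))
print(gradient, round(angle, 2))
```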
GRID/LUNR/MAGI
In this model each grid cell is referenced or addressed individually and is
associated with identically positioned grid cells in all other coverages, rather
like a vertical column of grid cells, each dealing with a separate theme.
Comparisons between coverages are therefore performed on a single column
at a time. Soil attributes in one coverage can be compared with vegetation
attributes in a second coverage: each soil grid cell in one coverage can be
compared with the vegetation grid cell at the same position in the second
coverage. The advantage of this data structure is that it facilitates
multiple-coverage analysis for single cells. However, it limits the examination
of spatial relationships between entire groups or themes in different coverages.
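
The cell-by-cell comparison can be sketched as follows. The two small coverages, their category names and the helper functions are assumptions made for the illustration.

```python
# Two coverages on the same grid; the category values are invented.
soil = [
    ["clay", "clay", "sand"],
    ["loam", "sand", "sand"],
]
vegetation = [
    ["forest", "grass", "grass"],
    ["forest", "forest", "scrub"],
]


def cell_column(row, col, *coverages):
    """Return the 'vertical column' of values at one cell position."""
    return tuple(layer[row][col] for layer in coverages)


def cells_matching(soil_type, veg_type):
    """Cell positions where both coverages hold the requested values."""
    return [(r, c)
            for r in range(len(soil)) for c in range(len(soil[0]))
            if cell_column(r, c, soil, vegetation) == (soil_type, veg_type)]


print(cell_column(0, 2, soil, vegetation))
print(cells_matching("sand", "forest"))
```

Each comparison touches one column of cells at a time, which is why questions about whole groups or themes are harder to answer in this structure.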

IMGRID GIS
To represent a thematic map of land use that contains four categories (recreation,
agriculture, industry and residence), each of these features has to be separated out as
an individual layer. In the layer that represents agriculture, 1 or 0 will represent the
presence or absence of crops respectively. The rest of the layers will be represented in the
same way, with each variable referenced directly. The major advantage of IMGRID is
its two-dimensional array of numbers resembling a map-like structure. The binary
character of the information in each coverage simplifies long computations and
eliminates the need for complex map legends. Since each coverage feature is uniquely
identified, there is no limitation of assigning a single attribute value to a single grid
cell. On the other hand, the main problem related to information storage in an
IMGRID structure is the excessive volume of data stored. Each grid cell location holds
a 1 or 0 value in more than one coverage, and a large number of
coverages are needed to store different types of information.
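
Separating a thematic map into binary layers can be sketched as below. The four categories come from the text; the small `land_use` grid and the `to_binary_layers` helper are invented for the illustration.

```python
CATEGORIES = ["recreation", "agriculture", "industry", "residence"]

# Hypothetical thematic map: one land-use category per grid cell.
land_use = [
    ["agriculture", "agriculture", "residence"],
    ["recreation",  "industry",    "residence"],
]


def to_binary_layers(thematic):
    """Split a thematic grid into one 1/0 presence layer per category."""
    return {
        category: [[1 if cell == category else 0 for cell in row] for row in thematic]
        for category in CATEGORIES
    }


layers = to_binary_layers(land_use)
print(layers["agriculture"])

# Total number of values stored across all layers.
stored = sum(len(row) for layer in layers.values() for row in layer)
print(stored)
```

Six cells become 24 stored values, which illustrates the storage volume problem.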