GIS Notes PDF

1) A geographic information system (GIS) is a computer system for capturing, storing, analyzing, and displaying spatially-referenced data. 2) GIS allows users to create interactive queries (maps, tables, charts) using spatial data from different layers to support decision-making. 3) The key components of a GIS are hardware, software, data, and users. GIS data is organized into layers representing different themes like property boundaries, land use, drainage, etc. Layers must align spatially to integrate data.

Uploaded by akhil
Copyright © All Rights Reserved

8 Geographic Information System (GIS)
Introduction
Technology and the geographical world are both developing at a very fast pace.
This calls for a system capable of analysing and manipulating spatially referenced
data or remotely sensed information and of giving the desired output (information)
in a very short time. A geographic information system helps us better understand
the world around us and enables the development of spatial intelligence for logical
decision-making. Several definitions of GIS exist in the literature, some of which
are as follows.
GIS is a computer-based information system which attempts to capture, store,
manipulate, analyse and display spatially referenced and associated attribute data
for solving complex research, planning and management problems.
GIS is a system of hardware, software, data, and people organising, collecting,
storing, analysing and disseminating information about areas of the earth.
GIS is an information technology which stores, analyses and displays both
spatial and non-spatial data.
GISs are specialised databases that preserve the locational identities of the
information that they record.
The word geographic in GIS carries two meanings: earth and geographic space.
By earth, it implies that all data in the system are pertinent to earth’s features and
resources, including human activities based on or associated with these features and
resources. By geographic space, it means that the commonality of both the data and
the problems that the systems are developed to solve, is geography, i.e., location,
distribution, pattern, and relationship within a specific geographical reference
framework. The word information implies that data in a GIS are organised to yield
useful knowledge often as coloured maps and images, statistical graphics, tables
and various on-screen activities. The word system implies that a GIS is made up
from several inter-related and linked components with different functions.
The term ‘Geographic Information System’ first appeared in published literature
in 1970. Although it sounds like a relatively new term, many of its concepts have
been in existence for centuries. For example, to illustrate fundamentals that still
comprise the basis of GIS, consider the base plan of a locality of a city wherein
it is required to show public utilities such as water supply network, fire hydrants,
302 Surveying

drainage system, network of sewers, arrangement of manholes, underground
electrical cables, etc. Since incorporating all of these into the base plan would
probably make the plan unreadable and confusing, the practice in conventional
surveying is to draw the utility plans separately on tracing paper and overlay them
on the base plan as required. In addition, the data related to such utilities, for
example, the diameter, make and type of pipes, are maintained in a separate register
to be referred to for complete information. Maintaining these records and updating
them as changes are made from time to time is a cumbersome process. All this
information and more can be stored, manipulated and retrieved almost instantly
with the help of computers and software, which is the essence of GIS. The use
of GIS removes the need for paper plans and associated documents and speeds up
the production of information in the form of maps, tables, etc. (the GIS products)
by rapidly updating and editing the data in computers.
GIS is capable of acquiring spatially indexed data from a variety of sources,
converting the data into useful formats, storing the data, retrieving and
manipulating the data for analysis, and then generating the output required by the
user. The acquired spatially indexed data are organised into layers. Each layer
represents a single theme relevant to a particular purpose. For urban planning, for
example, a set of layers may represent property lines, land use, drainage and
contour information, soil type, etc. Any selected layer or combination of layers can
be depicted on a map at any desired scale. Figure 8.1 shows the layer-based concept
of data organisation in GIS.

The great strength of GIS is its ability to handle a large, multilayered,
heterogeneous database and to answer queries about the existence, location and
properties of a wide range of spatial objects interactively. All the layers need to
share a common reference system so that two or more layers can be integrated.
Maps can also be created from the merged data and overlaid, if desired. However,
the accuracy of the spatial analyses, and hence the validity of the decisions (maps)
reached as a result of these analyses, depends directly on the quality of the
spatially related information in the database. A digital computer provides the basis
for storing, manipulating and displaying large amounts of data that have been
encoded in digital form. A GIS consists of a package of computer programs with a
user interface that provides access to particular functions.
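The requirement that layers share a common reference system before they are integrated can be illustrated with a small check. The following is a minimal Python sketch, assuming raster-style layers described by a hypothetical `Layer` record (reference-system label, upper-left origin, cell size and grid dimensions). The field names and values are illustrative assumptions only; real GIS software keeps this georeferencing information in its own formats.

```python
from dataclasses import dataclass

@dataclass
class Layer:
    """Hypothetical description of one thematic layer (assumed fields)."""
    theme: str        # e.g. "land use", "drainage"
    crs: str          # label of the reference system (hypothetical values below)
    origin: tuple     # (x, y) of the upper-left corner in map units
    cell_size: float  # ground size of one square cell in map units
    n_rows: int
    n_cols: int

def alignment_problems(a: Layer, b: Layer) -> list:
    """List the properties that prevent two layers from being integrated.

    An empty list means the layers share a common reference and can be
    overlaid cell by cell.
    """
    checks = [
        ("reference system", a.crs, b.crs),
        ("origin", a.origin, b.origin),
        ("cell size", a.cell_size, b.cell_size),
        ("grid dimensions", (a.n_rows, a.n_cols), (b.n_rows, b.n_cols)),
    ]
    return [name for name, value_a, value_b in checks if value_a != value_b]

# Hypothetical layers for an urban-planning database.
land_use = Layer("land use", "local-grid-A", (5000.0, 9000.0), 10.0, 200, 300)
drainage = Layer("drainage", "local-grid-A", (5000.0, 9000.0), 10.0, 200, 300)
soil = Layer("soil type", "local-grid-A", (5000.0, 9000.0), 30.0, 67, 100)

print(alignment_problems(land_use, drainage))  # []
print(alignment_problems(land_use, soil))      # ['cell size', 'grid dimensions']
```

In practice, a mismatch such as the soil layer's coarser cells is usually resolved by resampling or reprojecting one layer onto the other's reference, not by abandoning the overlay.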
Geographic Information System (GIS) 303

The objective of collecting geographic data and converting them into useful
information (desired output, e.g., map, table, etc.) by means of a GIS transcends
the traditional boundaries of data processing and information management. The main
advantage of GIS is rapid analysis and display of data, with a flexibility not possible
with manual methods. A GIS does not hold maps or pictures; it holds a database.
That is how it differs from computer mapping, which can only produce good
graphic output. GIS, as we understand it today, is very different from its
predecessors. Earlier used for computer-based map-data processing, GIS is now
an essential component of the information technology infrastructure of modern
society. It is a multidisciplinary science: GIS practitioners may be geographers,
surveyors, planners, or computer engineers. Despite the diversity in approaches, a
special set of skills and knowledge is required by professionals to use GIS in all
its forms and implementations. Today, GIS has become an indispensable tool to
manage land and natural resources, monitor the environment, formulate economic
and community-development strategies, enforce law and order, and deliver social
services.
The main purposes of GIS are:
1. To support decision-making based on spatial data; for example, an
engineering geologist may evaluate slope stability conditions through GIS
to decide on the best new route
2. To support general research
3. To collect, manipulate and use spatial data in database management
4. To produce standardised and customised cartographic products

8.1 Subsystems of GIS


A GIS must include at least three main elements: (i) computer hardware, (ii)
computer programs, and (iii) data. A GIS may be considered to have five major
component subsystems (Fig. 8.2), which are as follows:
1. Input: Deals with creating an image-based GIS from multiple
geodatasets.
2. Management: Provides efficient data storage, retrieval and database
management.
3. Processing: Data manipulation, feature enhancement and classification,
etc.
4. Display: Display and product generation.
5. Output: Provides thematic maps, images, etc., for applications.
The basic forms of data for GIS are spatial data, which provide the locations
and shapes of features in a map; tabular data, which are collected or compiled
for a given area and which a GIS links to the features in a map; and image data,
such as aerial photographs and their products, satellite images, and scanned data
(photographic prints converted to digital format).
The captured data can be transformed from existing maps, detailed observation,
satellite imagery and aerial photography into a digital or computer-compatible
format. Remote sensing images and digital data are the primary sources of modern
GIS. Using a digitiser or scanner, keyboard entry of attribute information, etc.,
these data can be stored in a computer. The various types of geographic data are
stored in a GIS in two different formats: as Cartesian coordinates (vector format)
and in raster format (e.g., grids). The graphical features are stored with their
location (x, y) defined by latitude and longitude, and the attribute data may be
qualitative (e.g., land use) or quantitative. The location can be represented either
in raster (grid cell) format or in vector (polygon) format. In raster format, a
location is defined by the row and column position of the cell it occupies, and the
value stored for that location indicates the type of feature. In vector format, the
geographic space is treated as continuous, and the data structure is closer to the
dimensionality of a map. For GIS data input, either of the two data formats can be
used. The two data formats, their structures and conversion from one format to the
other are discussed in the sections that follow.

Once the data have been entered into the computer, they have to be stored to
create a permanent database for further analysis and manipulation. The digital
map file is stored on a magnetic or optical digital medium. The encoded spatial
data are stored systematically in the form of layers, known as GIS layers. These
layers are archived in digital format as geographically referenced planes in the
GIS database. For processing, the database files are loaded into the computer's
main memory, where they can be processed and manipulated.
The computer program employed to organise the database is commonly
known as a Database Management System (DBMS). The results of analysis and
manipulation have to be displayed or presented to the user in a user-specified
format for decision-making purposes. The data may be presented as maps, tables
and figures on the screen, recorded on magnetic media in digital format, or
produced as hardcopy output on a printer or plotter.

8.2 Hardware of GIS


The hardware of GIS is made up of a configuration of core and peripheral
equipment that is used for the acquisition, storage, analysis, and display of
geographic information. At the heart of the GIS hardware architecture is the
central processing unit (CPU) of the computer. The CPU performs all the data
processing and analysis tasks and also controls the input/output connectivity with
data acquisition, storage, and display systems. Depending on the data-processing
power of the CPU, computers are classified as supercomputers, mainframes,
minicomputers, workstations, and microcomputers or personal computers (PCs).
All these classes of computers can be used as the hardware platforms for GIS.
Conventionally, GIS were developed as stand-alone applications that ran on one
of these classes of computers. Today’s GIS are mostly implemented in a network
environment using the client/server model of computing.
Client/server computing is based on the concept of division of work among
different machines in a local or distributed computer network. A server is the
computer on which data and software are stored. A client, on the other hand,
is the computer by which the users access the server. The application programs
can be executed on either the server or the client computer. In the client/server
environment, a client can access multiple servers, and similarly, a server can provide
services to a number of clients at the same time. For GIS that are implemented on
the client/server architecture, processor-intensive operations and data management
are most commonly performed in the workstation class of servers, and PCs are
used as the clients that provide the graphical interface to the system. Such a
configuration, which combines the processing power of workstations and the
economy of using PCs, has replaced the mainframes and minicomputers as the
dominant hardware platforms for GIS.
Because of the very large amounts of data that are inherently required for a
GIS, it is necessary to have access to tape drives and disk drives that permit the
GIS to read the information transported on computer tape from other computers
and other GISs. A colour display is often an important element of a GIS as a
means of displaying several images or map patterns.
A GIS usually includes equipment for entering data into the system. Some data
can be entered using a video camera, which records an image in much the same
way that a television camera does, producing a digital image of a map or photograph
that can then be analysed like any other digital image. Although the video camera
is useful for some applications, it often cannot provide the geometric accuracy or
the fine detail required for GIS work. Therefore, other methods of data
entry are required.
A digitising table, or digitiser, (Fig. 8.3) forms a third method of data entry.
Digitiser is a traditional device for encoding the digital data from an existing map.
The table consists of a fine grid of thin wires that have been encased in a dense,
stable substance such as fibre glass. The wires are capable of sensing the x–y
positions of a cursor that can be moved over the surface of the table. The more
finely and accurately the wires are spaced, the more precise is the data generated
by the digitiser. The analyst tapes a map or aerial photograph to the surface of the
digitiser, and after establishing a coordinate system, can trace the outlines of areas,
or mark positions of points and lines with the cursor. As the cursor moves, the
digitiser creates a digital record of its successive positions usually as a sequence
of coordinate pairs.

Usually, the digitiser is supported by a small processor or computer that
can allow the analyst to record data with separate codes (to identify lines that
record streams, highways, or power lines, for example) and to perform geometric
transformations (to correct positional errors). The best digitising systems may
allow the analyst to see the data on a computer screen and to identify and correct
errors as they occur.
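The geometric transformation mentioned above converts digitiser-table coordinates into map coordinates. The text does not say which transformation is used. The following sketch assumes a six-parameter affine transformation, a common choice, computed exactly from three control points whose table and map coordinates are both known. The control-point values are hypothetical. A system with more than three control points would normally fit the parameters by least squares instead.

```python
def det3(m):
    """Determinant of a 3 x 3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve3(m, rhs):
    """Solve the 3 x 3 linear system m . x = rhs by Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("control points are collinear; choose well-spread points")
    solution = []
    for k in range(3):
        mk = [row[:] for row in m]
        for i in range(3):
            mk[i][k] = rhs[i]          # replace column k by the right-hand side
        solution.append(det3(mk) / d)
    return solution

def fit_affine(table_points, map_points):
    """Return a function mapping digitiser (x, y) to map (X, Y).

    Assumed model: X = a*x + b*y + c and Y = d*x + e*y + f,
    solved exactly from three control points.
    """
    m = [[x, y, 1.0] for x, y in table_points]
    a, b, c = solve3(m, [X for X, _ in map_points])
    d, e, f = solve3(m, [Y for _, Y in map_points])
    return lambda x, y: (a * x + b * y + c, d * x + e * y + f)

# Hypothetical control points: table units in cm, map units in metres.
to_map = fit_affine([(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)],
                    [(1000.0, 2000.0), (1100.0, 2000.0), (1000.0, 2100.0)])
print(to_map(5.0, 5.0))  # (1050.0, 2050.0)
```

With more than three control points, the residuals of a least-squares fit also give the analyst a measure of how well the map was registered.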

8.3 Data for GIS
The prime aspect in the construction of a GIS is the acquisition of data. Data can
be gathered directly in the field by an original survey carried out specifically for
the purpose of the GIS; such data are known as captured data. However, this is very
time-consuming and costly and is thus rarely resorted to. Alternatively, the data can
be obtained or derived from sources that are already available, such as topographic
maps, digitised maps and plans, aerial photographs and satellite imagery, or directly
from a GPS survey. Such data are called encoded data. Data obtained by human
intervention, for example, from sketches of landscapes, questionnaires, etc., are
called interpreted data. Data held in a table or in a GIS are called structured or
organised data. Remotely sensed images and digital data, and the information
extracted from them, are the primary sources of modern GIS.
Digital remote sensing data are in raster format and are acquired by sensors
through a scanning device. The device collects the data at successive instants of
time by dividing the field of view into a grid pattern; each grid element is called
a pixel. A sensor records a series of radiometric values for each pixel in each band
it senses, and the image is built up from consecutive lines of pixels
(scan lines). Data in GIS may be classified as spatial and non-spatial.
1. Spatial data: Also called graphical data, spatial data consist of natural and
cultural features that can be shown with lines or symbols on maps, or that can be
seen as images on photographs. Data in different forms (maps, photographs,
images, etc.) are in non-compatible formats, which creates problems when
integrating them in a GIS. In a GIS, these data must be represented and spatially
located in digital form by using a combination of fundamental elements called
simple spatial objects (SSO). These SSO include points, lines and strings, areas or
polygons, pixels, and grid cells, each of which can be represented by its own
symbol. In addition to these SSO, there is the surface element, which represents
most realistically the spatial objects we observe in the real world. A surface is an
area or polygon having a third dimension, i.e., height (elevation). Natural features
such as hills and valleys, and man-made features such as structures, are best
described by the surface element. The modelling of surfaces is known as Digital
Terrain Modelling (DTM) and is described in Appendix II.
2. Non-spatial data: Also called attribute data, it describes geographic regions or
defines characteristics of spatial features within geographic regions. These data
are usually alphanumeric and provide information such as colour, texture, quantity
and quality. The non-spatial data are often derived from documents such as plans,
files, reports, tables, etc. For example, for a highway, the data may be its number,
pavement type, number of lanes, lane width, year of last resurfacing, etc.
In general, spatial data will have related non-spatial attributes. The linkage
between spatial and non-spatial data must therefore be established; it is achieved
with a common identifier, which is stored with both. The identifier may, for
example, be a unique parcel identification number, a grid cell table, etc.
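The linkage through a common identifier can be sketched as a simple join. The following Python example is illustrative only. It assumes highway geometry keyed by a hypothetical highway number and a separate attribute table with the kinds of fields listed above for a highway. All identifiers, coordinates and attribute values are hypothetical.

```python
# Spatial part: geometry of each highway keyed by a common identifier.
# Identifiers and coordinates are hypothetical.
highway_geometry = {
    "H-101": [(0.0, 0.0), (4.0, 3.0), (9.0, 3.0)],
    "H-202": [(2.0, 8.0), (2.0, 1.0)],
}

# Non-spatial part: hypothetical attribute records of the kind listed in the text.
highway_attributes = [
    {"id": "H-101", "pavement": "bituminous", "lanes": 4,
     "lane_width_m": 3.5, "last_resurfaced": 2018},
    {"id": "H-202", "pavement": "concrete", "lanes": 2,
     "lane_width_m": 3.5, "last_resurfaced": 2015},
]

def link(geometry, attributes, key="id"):
    """Join geometry and attribute records on the shared identifier."""
    by_id = {record[key]: record for record in attributes}
    linked = {}
    for feature_id, coords in geometry.items():
        if feature_id not in by_id:
            raise KeyError(f"no attribute record for feature {feature_id!r}")
        linked[feature_id] = {"geometry": coords, **by_id[feature_id]}
    return linked

highways = link(highway_geometry, highway_attributes)
# An attribute query answered with the linked geometry:
four_lane = {fid: h["geometry"] for fid, h in highways.items() if h["lanes"] == 4}
print(four_lane)  # {'H-101': [(0.0, 0.0), (4.0, 3.0), (9.0, 3.0)]}
```

The join fails loudly when a feature has no attribute record, because a broken link between the spatial and non-spatial parts is exactly the error the common identifier is meant to prevent.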
8.4 Representation of Features
GISs are information systems which offer the functionality and tools to
collect, store, retrieve, analyse and display geographic information. Features, events
and activities with spatial components are modelled as points, lines, polygons,
nets or links to form the geographic database under georelational schemes. The
geometrical part is defined by a series of coordinates and is connected through
the feature code to the attribute tables, where non-spatial information such
as properties, symbolism, etc., is stored. A GIS represents spatial data as data
that have a physical dimension on the earth. To represent complex
three-dimensional reality in a spatial database, the following SSO are used.
1. Point data: Point data consist of observations that occur only at points or
occupy very small areas in relation to the scale of the database. These define
single geometric positions (Fig. 8.4(a)). The spatial location of a point is given by
its coordinates (x, y). Features such as wells, for example, illustrate data that
occupy a single point even at the largest levels of detail. In contrast, features
such as buildings sometimes occupy significant areas, even though they may be
represented as a point in the database. Some of the other examples of point data
are survey control points, monuments, mines, etc.
2. Line and string data: Lines and strings are obtained by connecting points. A line
connects two points (Fig. 8.4(b)), and a string is a sequence of two or more lines.
Line and string data are formed by features such as highways, railways, canals,
rivers, pipelines, power lines, etc. An arc, which is the locus of points, may be
defined by a spline curve or polynomial mathematical function.
3. Areal data: An area or polygon consists of a continuous space within three or
more connected lines. Examples of areal data (Fig. 8.4(c)) include distributions such
as soil type, parcels (pockets) of land ownership, different types of land cover,
vegetation classes, and other patterns that occupy area at the scale of the GIS.
4. Pixels: These are usually tiny squares that represent the smallest elements into
which a digital image is divided (Fig. 8.4(d)). Continuous arrays of pixels, arranged
in rows and columns, are used to enter data from aerial photos, satellite images,
orthophotos, etc. The distributions of colours or tones throughout the image are
specified by assigning a numerical value to each pixel. Pixel size can be varied
and is specified either in terms of image or object scale. At the image scale,
pixel size may be specified directly (e.g., 0.025 × 0.025 mm) or as a number of
pixels per unit distance (e.g., 10 dots per cm, where a dot corresponds to a pixel).
At the object scale, pixel size is usually expressed directly by dimension (e.g.,
10 m pixel size).
5. Grid cells: These are single elements, usually square, within a continuous
geographic variable. Similar to pixels, their sizes can be varied, with smaller cells
yielding improved resolution. Grid cells (Fig. 8.4(e)) may be used to represent
terrain slopes, soil types, land cover, water-table depths, land values, population
density, etc. The distribution of a given data type within an area is indicated by
assigning a numerical value to each cell. For example, to show soil types in an area,
numerals 1, 2, and 3 may be used to represent sand, silt and clay respectively.
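The pixel-size conversion and the grid-cell coding above can be combined in a short calculation. The following sketch assumes a hypothetical soil map at a scale of 1:10 000, scanned at 10 dots per cm, and a small hypothetical grid coded 1, 2 and 3 for sand, silt and clay, as in the text. The area of each soil type is the number of its cells multiplied by the ground area of one cell.

```python
from collections import Counter

SOIL_CODES = {1: "sand", 2: "silt", 3: "clay"}  # coding used in the text

def pixel_ground_size_m(dots_per_cm, scale_denominator):
    """Ground size (m) of one pixel scanned from a map of scale 1:scale_denominator."""
    pixel_on_map_cm = 1.0 / dots_per_cm                  # image-scale pixel size
    return pixel_on_map_cm * scale_denominator / 100.0    # object-scale size in metres

def soil_areas_m2(grid, cell_size_m):
    """Area of each soil type, given a grid of soil codes and square cells."""
    counts = Counter(code for row in grid for code in row)
    return {SOIL_CODES[code]: n * cell_size_m ** 2 for code, n in counts.items()}

cell = pixel_ground_size_m(10, 10_000)  # 0.1 cm on the map -> 10 m on the ground
soil_grid = [  # hypothetical 3 x 3 grid of soil codes
    [1, 1, 1],
    [1, 2, 3],
    [2, 3, 3],
]
print(cell)                            # 10.0
print(soil_areas_m2(soil_grid, cell))  # {'sand': 400.0, 'silt': 200.0, 'clay': 300.0}
```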

8.5 Data Structure for GIS


Data for a GIS must be represented in a form that preserves locational identities
of each unit of information, so that it is possible to retrieve data by location and
therefore to depict and analyse geographic patterns. Because data are frequently
derived from a ‘conventional’ (non-digital) map or image, it is necessary to
convert them into digital form suitable for use by a GIS. This process, known as
geocoding, records the pattern/features of a map in a form that can be accepted
and manipulated by computers.
The simple spatial objects described in Section 8.4 may be coded in two
different formats, vector and raster, for storing and manipulating spatial
data in a GIS. These data structures, also called data models or sometimes
data formats, offer contrasting advantages and disadvantages. When data are
depicted in vector format, a combination of points, lines and strings, and areas is
used, whereas the raster format uses pixels and grid cells. Figure 8.5 shows a
representation of simple spatial objects in vector and raster models.

Usually, a GIS must be designed on either a raster or a vector format. Because
of differences in the equipment, computer programs, and expertise required for the
two approaches, the choice depends on the facilities available, the kinds of data to
be examined, and the purposes of establishing the GIS. It is also possible to convert
from vector to raster format by applying relatively straightforward computer
algorithms, but raster-to-vector conversion is more difficult.
Data format conversions are described in Section 8.9.
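Vector-to-raster conversion can indeed be done with a simple algorithm. The following sketch rasterises a polyline by sampling points along each segment at roughly half-cell spacing and marking the cell that each sample falls in. This sampling approach is an approximation chosen for brevity (it can miss a cell that the line only clips at a corner) and is not the specific method of Section 8.9. The grid origin, cell size and road coordinates are hypothetical. The cell coding (3 for a road, 0 elsewhere) follows the raster example in Section 8.7.

```python
import math

def world_to_cell(x, y, x_min, y_max, cell_size):
    """Row and column of the cell containing map point (x, y); row 0 is the top row."""
    col = int((x - x_min) // cell_size)
    row = int((y_max - y) // cell_size)
    return row, col

def rasterise_polyline(points, n_rows, n_cols, x_min, y_max, cell_size, value):
    """Burn a vector polyline into a new raster grid filled with 0."""
    grid = [[0] * n_cols for _ in range(n_rows)]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        length = math.hypot(x1 - x0, y1 - y0)
        steps = max(1, math.ceil(2 * length / cell_size))  # about two samples per cell
        for k in range(steps + 1):
            t = k / steps
            row, col = world_to_cell(x0 + t * (x1 - x0), y0 + t * (y1 - y0),
                                     x_min, y_max, cell_size)
            if 0 <= row < n_rows and 0 <= col < n_cols:
                grid[row][col] = value
    return grid

# Hypothetical road on a 6 x 6 grid of 1-unit cells whose top-left corner is (0, 6).
road = [(1.5, 5.5), (1.5, 3.5), (3.5, 3.5), (3.5, 0.5)]
for row in rasterise_polyline(road, 6, 6, 0.0, 6.0, 1.0, 3):
    print(*row)
# 0 3 0 0 0 0
# 0 3 0 0 0 0
# 0 3 3 3 0 0
# 0 0 0 3 0 0
# 0 0 0 3 0 0
# 0 0 0 3 0 0
```

Going the other way (raster to vector) requires tracing cell boundaries or centre lines and rebuilding topology, which is why the text calls it the harder direction.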

8.6 Vector Data Structure


Vector data depict the real world by means of discrete points, lines and polygons
and are stored as a collection of x, y coordinates. The vector format provides an
accurate representation of spatially referenced data, incorporating the topology
(Appendix IV) and other spatial relationships between individual entities. The
three basic graphical elements are the point, represented as a single pair of
coordinates and the simplest type of vector data; the line or arc, represented as a
string of coordinates that begins and ends with a node; and the area or polygon,
represented as a closed loop of coordinates. When graphical elements
representing an individually identifiable real-world feature are logically grouped
together, a graphical entity is formed. For example, the different line segments
that represent a railway are graphical elements. When these line segments are
identified in the database and logically joined together, the graphical entity
representing the railway is formed.
In addition to the formatting of the basic graphical elements, it is also necessary
for the vector data to be properly linked to the descriptive data in geographic
databases. This is usually achieved by the use of a unique feature identifier (FID)
that is assigned to individual spatial objects. By using common FIDs, the graphical
and descriptive elements of vector data can be correctly cross-referenced during
database creation and spatial data processing. Usually, the assignment of the feature
identifier is an automated procedure during the topology building process, but the
linkage to descriptive data is normally a manual process that can be only partially
automated.
A number of instruments are available to input vector data into a GIS, but manual
digitising is commonly used. Figure 8.6 illustrates the vector format for two
adjacent parcels of land, designated Parcel I and Parcel II. It consists of points,
lines and areas. A vector representation of the data can be achieved by creating a
set of tables that list these points, lines and areas (Table 8.1). Data within the
tables are linked by identifiers and are related spatially through the coordinates of
points. As shown in column (i) of Table 8.1, every point in the area is indicated by
an identifier. Similarly, each line is described by its end-points, as shown in
column (ii) of Table 8.1, and the end-point coordinates locate the various lines
spatially. Areas in Fig. 8.6 are defined by the lines that enclose them, as shown in
column (iii) of Table 8.1. As before, the coordinates of the end-points locate the
areas and enable their locations and magnitudes to be determined.

Table 8.1
(i) Points                      (ii) Lines               (iii) Areas
Point identifier  Coordinates   Line identifier  Points  Area identifier  Lines
1                 (x1, y1)      a                1, 2    I                a, e, d
2                 (x2, y2)      b                2, 3    II               b, c, e
3                 (x3, y3)      c                3, 4
4                 (x4, y4)      d                4, 1
Monument, m       (xm, ym)      e                4, 2
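The tables above can be used directly to rebuild each parcel's boundary and compute its area. The following sketch follows the identifiers of Table 8.1. The coordinates given to points 1 to 4 are hypothetical (a 10 × 10 square split along line e). The area is computed with the shoelace formula, a standard method that the text does not name.

```python
# Column (i): point identifier -> hypothetical coordinates.
points = {1: (0.0, 0.0), 2: (10.0, 0.0), 3: (10.0, 10.0), 4: (0.0, 10.0)}
# Column (ii): line identifier -> end-points.
lines = {"a": (1, 2), "b": (2, 3), "c": (3, 4), "d": (4, 1), "e": (4, 2)}
# Column (iii): area identifier -> bounding lines.
areas = {"I": ["a", "e", "d"], "II": ["b", "c", "e"]}

def ring_from_lines(line_ids):
    """Chain the bounding lines end to end into an ordered list of point ids."""
    remaining = [lines[lid] for lid in line_ids]
    start, end = remaining.pop(0)
    ring = [start, end]
    while remaining:
        for i, (p, q) in enumerate(remaining):
            if p == ring[-1]:          # line stored in the walking direction
                ring.append(q)
                break
            if q == ring[-1]:          # line stored in the opposite direction
                ring.append(p)
                break
        else:
            raise ValueError("lines do not connect into a loop")
        remaining.pop(i)
    if ring[0] != ring[-1]:
        raise ValueError("boundary is not closed")
    return ring[:-1]

def area(area_id):
    """Area of a parcel by the shoelace formula."""
    coords = [points[p] for p in ring_from_lines(areas[area_id])]
    total = 0.0
    for (x1, y1), (x2, y2) in zip(coords, coords[1:] + coords[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

for area_id in areas:
    print(area_id, ring_from_lines(areas[area_id]), area(area_id))
# I [1, 2, 4] 50.0
# II [2, 3, 4] 50.0
```

The shared line e appears in both parcels but is stored only once. This is the topological economy that the table structure provides.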

As another example, consider a land-cover map showing an orchard, a forest,
a built-up area and a river with standard topographic symbols (Fig. 8.7(a)). A
vector representation of the region is shown in Fig. 8.7(b). Here, the line and
string elements demarcate and locate boundaries of different regions. The river
has been shown by string elements. Tables similar to Table 8.1 can be constructed
and entered into a GIS using the vector format.

The representation of vector data is governed by the scale of the input data.
For example, a building that is represented as a polygon on a large-scale map
becomes a point on a medium-scale map, and is not represented at all as an
individual entity on a small-scale map (unless it is a very important landmark). The
possibility of representing vector data differently at different scales is associated
with two important concepts: (i) cartographic generalisation, whereby line and
areal objects are represented in simplified form, with fewer coordinates, as the map
scale becomes smaller, and (ii) cartographic symbolisation, whereby vector data are
represented by different symbols that visually distinguish them from one another
when the data are displayed.
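Cartographic generalisation of lines is often carried out with a line-simplification algorithm. The following sketch implements the Douglas-Peucker algorithm. It is one widely used technique, but the text does not name it. A vertex is kept only if it lies farther than a chosen tolerance from the straight line joining the retained end-points. The tolerance and the stream coordinates are hypothetical.

```python
import math

def perpendicular_distance(p, a, b):
    """Distance from point p to the straight line through a and b."""
    (x, y), (x1, y1), (x2, y2) = p, a, b
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:                 # a and b coincide
        return math.hypot(x - x1, y - y1)
    return abs(dy * x - dx * y + x2 * y1 - y2 * x1) / math.hypot(dx, dy)

def douglas_peucker(points, tolerance):
    """Simplify a polyline, keeping vertices that deviate more than tolerance."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, max_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], first, last)
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= tolerance:
        return [first, last]                # every interior vertex is within tolerance
    left = douglas_peucker(points[: index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right                # drop the duplicated split vertex

stream = [(0, 0), (1, 0.1), (2, -0.1), (3, 0), (4, 2), (5, 4)]  # hypothetical
print(douglas_peucker(stream, 0.5))  # [(0, 0), (3, 0), (5, 4)]
```

A larger tolerance, appropriate for a smaller map scale, removes more vertices. Generalisation as a whole also covers collapsing polygons to points and omitting features, which this sketch does not attempt.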
In the computer, vector data can be stored as integers or floating-point numbers.
To avoid the round-off errors that occur during data processing, most GIS
software products store vector data as double-precision floating-point numbers.
This creates the impression that vector data are accurate and precise
representations of spatial objects in the real world. However, this is not
necessarily true, because the precision of data storage does not guarantee an
accurate description of the data; moreover, the boundaries of many spatial objects
are fuzzy rather than exact. Thus, storing vector data as double-precision
floating-point numbers does not improve the quality of the data; it simply
avoids degradation of data quality due to rounding errors during data
processing.
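The effect of storage precision can be seen by storing a large coordinate in single precision. The following sketch uses Python's standard `struct` module to round a hypothetical easting to 4-byte single precision. Near 4.5 million metres, single-precision values are spaced 0.5 m apart, so the centimetres are lost, whereas a Python float (a double) keeps them.

```python
import struct

def as_single_precision(value):
    """Round-trip a Python float (IEEE 754 double) through 4-byte single precision."""
    return struct.unpack("<f", struct.pack("<f", value))[0]

easting = 4512345.67  # hypothetical easting in metres, held as a double
print(easting)                       # 4512345.67
print(as_single_precision(easting))  # 4512345.5  (centimetres lost)
```

As the text stresses, keeping those extra digits only prevents rounding damage during processing. It says nothing about whether the surveyed coordinate was accurate to the centimetre in the first place.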

8.7 Raster Data Structure


Raster data structure, also called cellular data structure, depicts the real world by
pixels or grid cells. It is not as accurate or flexible as the vector format, since each
coordinate may be represented by a cell and each line by an array of cells. Raster
data can be positioned only to the nearest grid cell. Examples of data in raster
format are aerial photographs, satellite imagery and scanned maps or plans.
For the input of raster data, the region of interest is first subdivided into a
network of cells of uniform size and shape (regular, square or rectangular). The
linear dimensions of each cell define the spatial resolution of the data, i.e., the
precision with which the data are represented. Thus, the size of an individual pixel
or cell is determined by the size of the smallest object in the geographic space to
be represented. The size of this smallest object is known as the minimum mapping
unit (MMU). A general rule is that the grid cell size should be less than half the
size of the MMU.
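The MMU rule and the resulting grid size can be checked with a few lines of arithmetic. The following sketch assumes a hypothetical MMU of 20 m and a hypothetical study area of 1000 m × 600 m. It applies the text's rule literally (strictly less than half the MMU) and rounds the row and column counts up so that the grid covers the whole area.

```python
import math

def cell_size_ok(cell_size, mmu):
    """Rule of thumb from the text: cell size should be less than half the MMU."""
    return cell_size < mmu / 2

def grid_dimensions(width, height, cell_size):
    """Rows and columns needed to cover a width x height region (same units)."""
    return math.ceil(height / cell_size), math.ceil(width / cell_size)

mmu = 20.0                      # hypothetical: smallest object to map is 20 m across
print(cell_size_ok(10.0, mmu))  # False: exactly half is not "less than half"
print(cell_size_ok(8.0, mmu))   # True
print(grid_dimensions(1000.0, 600.0, 8.0))  # (75, 125)
```

Halving the cell size quadruples the number of cells, so the rule is a trade-off between resolution and storage rather than a case for making cells as small as possible.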
Once the grid cell size has been decided, each grid cell is assigned a value,
which can be an integer, a floating-point number, or a character (a code value). A
raster data layer along with its characteristics is shown in Fig. 8.8(a). The values
marked indicate the quantity or characteristics of the spatial object or phenomenon
found at the location of the cell. The vector counterpart of this raster data is also
shown in Fig. 8.8(b). The value 3 has been used to classify the raster cells
according to land use (the road) at the given location. The remaining cells are
filled with 0, indicating that no feature is present at that location. Four methods
for encoding vector data into raster cells are available in the literature: the
dominant method, the precedence method, the presence/absence method and the
per cent occurrence method (see Section 8.9).
In a raster database, values pertaining to different characteristics at the same
cell location are stored in separate files (map layers). For example, the road and
forest cover for the same area are stored as separate road and forest data layers.
When the data are used for processing, the appropriate layers are retrieved. This
means that raster data processing usually involves the use of multiple raster files,
in the same way that different layers are used in vector data processing.
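Combining separately stored layers is a cell-by-cell operation once the layers are aligned. The following sketch overlays a road layer coded as in the text (3 for road, 0 for no feature) with a hypothetical forest layer (1 for forest, 0 otherwise) to find where the road passes through forest. The layer values and the rule are illustrative assumptions.

```python
def overlay(layer_a, layer_b, rule):
    """Combine two co-registered raster layers cell by cell with rule(a, b)."""
    if len(layer_a) != len(layer_b) or any(
            len(row_a) != len(row_b) for row_a, row_b in zip(layer_a, layer_b)):
        raise ValueError("layers must have the same numbers of rows and columns")
    return [[rule(a, b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(layer_a, layer_b)]

road = [[0, 3, 0],      # 3 = road, 0 = no feature (coding as in the text)
        [0, 3, 0],
        [0, 3, 3]]
forest = [[1, 1, 0],    # hypothetical coding: 1 = forest, 0 = no forest
          [1, 0, 0],
          [1, 1, 1]]

road_through_forest = overlay(road, forest,
                              lambda r, f: 1 if r == 3 and f == 1 else 0)
print(road_through_forest)  # [[0, 1, 0], [0, 0, 0], [0, 1, 1]]
```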
When a specific raster layer is displayed, it is shown as a two-dimensional
matrix of grid cells. In computer storage, the raster data are stored as a linear
array of attribute values. Since the dimension of the data (the number of rows
and columns) is known, the location of each cell is implicitly defined by its row
and column numbers. There is no need to store the coordinates of the cell in the
data file. The locations of the cells can be computed when the data are used for
display and analysis.
In order to translate linear array storage to a two-dimensional display, enough
information must be stored in the header section of the data file as well. In general,
the file header contains information about the number of bits used to represent
the value in each cell, the number of rows and columns, the type of image, the
legend, the name of the colour palette (if the file uses one), and the name of the
look-up table (if the file uses one). Some file headers also contain parameters for
coordinate transformation so that raster data in the files can be georeferenced.
This is, however, a system-dependent feature. The cells in each line of the image
(Fig. 8.8(a)) are mirrored by an equivalent row of numbers in the file structure
(Fig. 8.8(c)). The first line of the file structure indicates to the computer that the
file consists of 6 rows and 6 columns and that the maximum cell value is 3.
Raster data files are stored in different file formats. The differences between
these file formats are due mainly to the different algorithms used to compress
the raster data files. In order to minimise the data-storage requirements, raster
data are often stored in compressed form. The data are decompressed ‘on-the-fly’
when they are used by an application program. The raster model (the geometrical
arrangement of cells covering the surface) is best employed to represent
geographic phenomena that are continuous over a large area.

Fig. 8.8(a)/(c): road raster (code 3) and its file structure
6, 6, 3
0 3 0 0 0 0
0 3 0 0 0 0
0 3 3 3 0 0
0 0 0 3 0 0
0 0 0 3 3 0
0 0 0 0 3 0
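The header-plus-linear-array layout described above can be read back with a few lines of code. The sketch below is a minimal illustration, assuming the header holds ‘rows, columns, maximum value’ and the cell values follow row by row; the function names `read_raster`, `cell_value` and `to_grid` are hypothetical and not part of any GIS package. It shows why no coordinates need to be stored: cell (row, column) simply sits at index row * columns + column of the linear array.

```python
# Sketch: rebuild the raster of Fig. 8.8 from its file structure.
# Assumption: line 1 is "rows, columns, maximum value"; the cell values follow
# row by row (row-major order). Rows and columns are counted from 0 here.

FILE_TEXT = """6, 6, 3
0 3 0 0 0 0
0 3 0 0 0 0
0 3 3 3 0 0
0 0 0 3 0 0
0 0 0 3 3 0
0 0 0 0 3 0"""


def read_raster(text):
    """Parse the header and return (rows, cols, max_value, linear_array)."""
    lines = text.strip().splitlines()
    rows, cols, max_value = (int(v) for v in lines[0].split(","))
    # Keep the values as one linear array, as in computer storage.
    values = [int(v) for line in lines[1:] for v in line.split()]
    if len(values) != rows * cols:
        raise ValueError("header does not match the number of cell values")
    return rows, cols, max_value, values


def cell_value(values, cols, row, col):
    """No coordinates are stored: the position follows from row and column."""
    return values[row * cols + col]


def to_grid(values, rows, cols):
    """Translate the linear array into a two-dimensional matrix for display."""
    return [values[r * cols:(r + 1) * cols] for r in range(rows)]


rows, cols, max_value, values = read_raster(FILE_TEXT)
grid = to_grid(values, rows, cols)
print(cell_value(values, cols, 2, 3))  # 3 (a road cell)
```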

8.8 Vector vs Raster Data Structures


The relative advantages/disadvantages and limitations of vector and raster data
structures are presented with suitable examples.

1. A vector database can depict point data as points which can be positioned
accurately. However, a raster database can depict point data only at the
level of detail of a single cell. This leads to a loss of accuracy: for
example, a cell can show the presence of a tower within it but cannot
show where in the cell the tower is placed. The cell size does affect the
depiction, but a raster database cannot be as accurate as a vector database.
It may further be noted that some points may represent quantitative
characteristics, e.g., amount of rainfall or elevation, but such information
cannot be included in the raster format.

2. A vector database can show line data in exact and fine detail, whereas
a raster database shows the same line as a zigzag (stepped) line, which becomes
comparatively smoother as the cell resolution becomes finer.
3. A vector database provides exact/fine boundaries between areal
patterns, e.g., land cover. In a raster database, however, this accuracy
is lost for the reason explained earlier.
4. Discrete quantitative data such as population, which are grouped/associated
with an area, are best depicted in finer detail by polygons (vector
format). However, continuous data such as topographic elevation/contours,
represented by a network of equally spaced observations, can probably be
most directly presented by a raster format.
5. A vector database is best suited to represent various natural/artificial
features, and these can be described mathematically (by coordinates).
This makes the vector format conceptually more complex than the raster
format.
6. A vector database requires less storage space on the computer than
a raster database for the same information. Also, vector formats are
more accurate and present finer detail of shapes and sizes than
the raster format.
Some other notable disadvantages of raster data are: a cell is coded with a single
value (category) even when many features are present in it; spatial objects are
recorded only to the nearest cell, which may not correspond to their position in
reality, and the recorded boundaries (e.g., of a watershed) may not exist in reality;
and a coarse resolution of spatial features leads to inaccurate representation.

The notable disadvantages of the vector format are as follows.
1. Vector formats sometimes prove costlier because of higher data-encoding
cost. Also, the programs for data manipulation are more complex than those
for the raster format.
2. In the vector format, superimposing/overlaying different layers
of data may be difficult because some polygons in different layers may
not match exactly owing to minor digitisation errors, forming small slivers
or strips (Fig. 8.9).

Some of the notable advantages of the raster format are: simpler computer
programs for data manipulation; suitability for a variety of spatial analysis
functions, for example, overlay, buffering and network analysis; direct use of
remote-sensing data, which are in this format; use of available image processing
software for refining raster images; and, for some data types that are relatively
vague (soil, boundary, wetland, built-up area, etc.), no significant degradation
of the inherent accuracy of the data.

8.9 Data Format Conversions


The usage of GIS data often requires integration of vector and raster data which
needs conversion from one form to the other. The various procedures for converting
vector data to raster format and vice versa are described below. It should be noted
that interpretation/classification of boundary/mixed data varies with the method
used and also that none of the methods is perfect in presenting the true feature.
1. Vector to raster conversion: Vector to raster conversion is also referred to as
coding. Figure 8.10(a) shows the vector representation of the land-cover map of
Fig. 8.7 overlaid/superimposed on a raster of grid cells. The size of these cells
will depend upon the accuracy desired and the time and computing facilities
available. Figure 8.10(b) illustrates the raster representation of the land-cover
map of Fig. 8.7(a) using a coarse-resolution grid, and Fig. 8.10(c) shows it using
a finer-resolution grid. The finer-resolution grid depicts and stores areas with
greater precision. A fine-resolution grid will, of course, yield better results,
but it demands more time and computing facilities and results in higher costs.
As discussed earlier, the vector counterparts of the raster data can be input
by four methods. Of these, the presence/absence and dominant types are the
best methods, and the choice between them depends upon the importance attached
by the user. In the presence/absence method, for each grid cell, a decision is
made as to whether the selected entity exists at the centre of the cell or not.
If it does, the cell is assigned the value corresponding to the characteristics
(vector location) of its centre; if it does not, the cell is ignored. Thus, for
example, the centre of the cell (3, 4) is occupied by an orchard, and hence this
cell is assigned the value O or 2, as shown in Fig. 8.11(b).
In the precedence method, the cell is allotted a value corresponding to the most
important characteristic, i.e., the one that takes precedence over the other
characteristics present in it. For example, the cell (6, 3) contains three different
characteristics, viz., orchard, river and built-up area. Although the river occupies
the least area in this cell, it is the most important of the three features, and
a river cannot be discontinuous. Thus, the river is given precedence over the
others, and this cell is coded as R or 3, as shown in Fig. 8.11(c).
In the dominant type of conversion or coding, each cell of a grid is assigned
a value corresponding to the predominant characteristic of the area within the cell.
For example, two different types of area, viz., built-up area and orchard, occupy
the cell located in the third row and the third column (3, 3). This cell is assigned
the value B or 1, as shown in Fig. 8.11(d), because the larger part (more than 50
per cent) of the cell is occupied by buildings.


It should be obvious by now that the coding of a particular cell in which
multiple features/characteristics are present depends upon the conversion method
used. For example, the cell (4, 5) would be assigned F or 4 in the dominant
method, R or 3 in the precedence method, and B or 1 in the presence/absence
method. Further, the cell size also affects the coding or conversion from
vector to raster, with a larger cell resulting in a less accurate presentation
of the vector data.
In the above coding methods, letters or numerals may be used depending
upon convenience; numbers are more common. These grid cell
values can be used directly for computation or indirectly as code numbers
referenced to an associated table.
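The three coding rules can be compared on a single mixed cell. The sketch below is a hypothetical illustration rather than the book's procedure: it assumes each cell is described by the fraction of its area covered by each class plus the class found at its centre, and it uses a precedence ranking in which only the first place (the river) comes from the text. The make-up of the example cell is invented to mimic cell (4, 5).

```python
# Sketch of the presence/absence, precedence and dominant coding rules.
# Assumptions (not from the text): a cell is described by the fraction of its
# area covered by each class and by the class at its centre; the ranking below
# is hypothetical except that the river is ranked first.

CODES = {"B": 1, "O": 2, "R": 3, "F": 4}  # built-up, orchard, river, forest
PRECEDENCE = ["R", "B", "O", "F"]         # hypothetical ranking, river first


def presence_absence(centre_class):
    """Code the cell by whatever lies at its centre; 0 if nothing is there."""
    return CODES.get(centre_class, 0)


def precedence(fractions):
    """Code the cell by the highest-ranked class present anywhere in it."""
    for cls in PRECEDENCE:
        if fractions.get(cls, 0) > 0:
            return CODES[cls]
    return 0


def dominant(fractions):
    """Code the cell by the class occupying the largest share of its area."""
    present = {cls: f for cls, f in fractions.items() if f > 0}
    if not present:
        return 0
    return CODES[max(present, key=present.get)]


# Hypothetical mixed cell: mostly forest, some built-up area, a thin river,
# with a building at the centre.
cell = {"F": 0.55, "B": 0.35, "R": 0.10}
print(dominant(cell), precedence(cell), presence_absence("B"))  # 4 3 1
```

With these assumed inputs, the three rules give F or 4, R or 3 and B or 1 respectively, the same contrast as the cell (4, 5) example in the text.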
2. Raster to vector conversion: In this conversion, the raster cells through
which the linear feature (e.g., river, boundaries of buildings, roads, etc.) passes
are identified, and a line (vector form) connecting these cells is drawn. One way
is to connect the centres of the cells with straight line segments. This, however,
produces a zigzag line, whereas in nature the line would be smooth. So,
either curve-fitting is required or the cell size must be reduced to extract/draw
a smooth line passing through the cells. Curve-fitting involves complicated
mathematical calculations and does not necessarily give a unique solution.
Either method requires large-capacity computers as well as time.
Even then, the line drawn using raster data may not exactly match
the actual feature present in nature. This conversion of data from the raster to
the vector model is illustrated in Fig. 8.12.
If vector data are converted to raster data and then back to vector data, the
resulting data set is unlikely to match the original one. This is especially true
of boundaries shared between different data classes and of linear features.
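Joining cell centres, the first option described for raster to vector conversion, is easy to sketch. The example below assumes the feature is a single unbranched chain of edge-connected cells, uses the road cells (code 3) of Fig. 8.8 with 1-unit cells, and relies on hypothetical helpers `neighbours` and `trace_line`. The vertices it returns form exactly the stepped, zigzag line noted above; smoothing it would need curve-fitting or finer cells.

```python
# Sketch: trace a one-cell-wide linear feature and join its cell centres.
# Assumptions: the feature is an unbranched chain of cells connected
# edge-to-edge, cells are 1 unit square, and row numbers increase downwards.

GRID = [
    [0, 3, 0, 0, 0, 0],
    [0, 3, 0, 0, 0, 0],
    [0, 3, 3, 3, 0, 0],
    [0, 0, 0, 3, 0, 0],
    [0, 0, 0, 3, 3, 0],
    [0, 0, 0, 0, 3, 0],
]


def neighbours(cell, cells):
    """Edge-adjacent cells that also belong to the feature."""
    r, c = cell
    return [n for n in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)) if n in cells]


def trace_line(grid, code):
    """Return the feature's cell centres, in order, as (x, y) vertices."""
    cells = {(r, c) for r, row in enumerate(grid) for c, v in enumerate(row) if v == code}
    # Start from an end cell, i.e. one with a single neighbour.
    start = next(cell for cell in sorted(cells) if len(neighbours(cell, cells)) == 1)
    path, previous, current = [start], None, start
    while True:
        ahead = [n for n in neighbours(current, cells) if n != previous]
        if not ahead:
            break
        previous, current = current, ahead[0]
        path.append(current)
    # Cell centre: x = column + 0.5, y = row + 0.5.
    return [(c + 0.5, r + 0.5) for r, c in path]


vertices = trace_line(GRID, 3)
print(vertices[:3])  # [(1.5, 0.5), (1.5, 1.5), (1.5, 2.5)]
```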

8.10 Capabilities/Functionalities of GIS


Analysis in GIS basically refers to the process of drawing inferences from data.
The analysis is carried out on data available either in tabular form or as maps. In
general, the capabilities or functionalities of GIS are innumerable and beyond the
scope of this book. However, queries, reclassification, buffering and overlay
deserve special mention and are described below. Since map overlay analysis is the
most important function, it is discussed separately in the sections that follow.
1. Organisation: The importance of data organisation is readily appreciated by anyone
who has collected a large mass of data for a particular purpose. Data can be
arranged in many ways, and unless the organisation scheme suits the
application at hand, useful information cannot easily be extracted. Schemes for
organising data are sometimes called data models (structures); the vector and
raster models have already been discussed in the previous sections. Data models
organise observations by both spatial and non-spatial attributes. Thus, data
organisation is of fundamental importance.
2. Visualisation: Visualisation is achieved in GIS with colour and by specialised
methods using perspective, shadowing and other means. The graphical capabilities
of computers are exploited by transforming a table of data, for example, into a
visual display through which the spatial associations can be visualised. Complex
relationships can often be better understood from a visual display than from
a table of data. Further, a visual display can be manipulated to give alternative
views/representation of the data, thereby enhancing the capability to analyse the
anomalies and patterns through GIS. Visual display is obtained either on the video
monitor or other output devices such as colour printers.
3. Combination: The ability to merge spatial data sets from quite different sources,
manipulate them and subsequently display them can often lead to an understanding
and interpretation of spatial phenomena that are simply not apparent when individual
spatial data types are considered in isolation. Data merging combines image data
for a certain geographic area with other reference data of the same area.
The GIS operator may overlay multiple images of this area taken at different dates,
a technique used for identifying changes over time, for example, monitoring forest
fires or the spread of disease in tree species. The process of combining layers of
spatial data is sometimes called data integration and can be carried out either by
visualising composite displays of various kinds or with integration models that
effectively create a new map from two or more existing maps.
4. Prediction: Prediction is one of the purposes of GIS. For example, a number
of data layers indicating population data in different regions of a city along with
the growth patterns and civic facilities might be combined together to predict the
future population at the desired time in different parts of the city. Such a map
may then be used as a basis for making city development decisions. Prediction
may sometimes also be a research exercise to explore the outcome of making a
particular set of assumptions, often with the purpose of examining the performance
of a model.
5. Queries: Since GIS is a decision support system, performing queries on a GIS
database to retrieve information (data) is an essential part of it. Queries offer a
method of data retrieval and can be performed on data that are part of the GIS
database or on new data produced as a result of data analysis. They are useful at
all stages of GIS analysis for checking the quality of data and the results obtained.
A GIS typically stores spatial and non-spatial (also called aspatial or attribute)
data in two separate files. A GIS can search and display spatial data based
on attribute criteria and vice versa. Accordingly, there are two general types of
query that can be performed with GIS: spatial and aspatial. Aspatial queries are
questions about the attributes of features. ‘How many nursing homes are there?’
is an aspatial query since neither the question nor the answer involves analysis
of the spatial component of data. This query could be performed by database
software alone. A question requiring information about ‘where’ is a spatial query.
This requires linking the data sets using location as the common key. A GIS can
answer the following types of query:
(a) About location: What exists at a particular location? The location of the
particular region can be described in many ways using place name, post or pin
code, or geographic reference, such as latitude and longitude.
(b) Condition: This query requires spatial analysis to give an answer. Instead of
identifying what exists at a certain location, one seeks to find a location where
certain conditions are satisfied.
(c) Pattern: This query is more sophisticated and important, as one might want to
know how many anomalies there are within an area over time.
(d) Trend: This query might involve both location and conditions and seeks to find
differences within an area over a period of time.
(e) Modelling: This query is posed to determine what happens if some additions or
changes are made to the existing network, e.g., to determine the extent and level of
contamination in an area if a toxic substance seeps into the groundwater and
thence into the local water supply. Answering such queries may require both
geographic and other information, and possibly even scientific laws. These queries
require efficient search of data items and the capability to derive their geometric
and topological attributes.
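The difference between an aspatial and a spatial query can be shown on a toy data set. The sketch below is hypothetical throughout: the attribute table and the locations are plain Python dictionaries linked by a shared ID (standing in for the two separate files a GIS keeps), coordinates are in metres on a local grid, and `count_by_type` and `features_near` are illustrative names. The second function is a simple condition query: find features of a given type within a given distance.

```python
# Sketch: an aspatial query uses only the attribute table; a spatial query
# also needs the locations, joined to the attributes through a common ID.
import math

ATTRIBUTES = {  # non-spatial (attribute) data, hypothetical
    1: {"name": "Home A", "type": "nursing home"},
    2: {"name": "Home B", "type": "nursing home"},
    3: {"name": "School C", "type": "school"},
}
LOCATIONS = {  # spatial data keyed by the same IDs, metres
    1: (120.0, 80.0),
    2: (900.0, 450.0),
    3: (150.0, 60.0),
}


def count_by_type(kind):
    """Aspatial query: answered from the attribute table alone."""
    return sum(1 for rec in ATTRIBUTES.values() if rec["type"] == kind)


def features_near(point, radius, kind):
    """Spatial query: features of a type lying within `radius` of `point`."""
    px, py = point
    return sorted(
        ATTRIBUTES[fid]["name"]
        for fid, (x, y) in LOCATIONS.items()
        if ATTRIBUTES[fid]["type"] == kind and math.hypot(x - px, y - py) <= radius
    )


print(count_by_type("nursing home"))                         # 2
print(features_near((100.0, 100.0), 200.0, "nursing home"))  # ['Home A']
```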
6. Reclassification: Although query is the most widely used function to retrieve
data from a GIS database, irrespective of the vector or raster model, reclassification
can also be used in place of query in the raster model. Consider a land-use image
from which information on the areas of schools is required. The answer
to this query could be obtained by creating a new coverage that eliminates all
unnecessary data. Reclassification would result in a new image. For example, in
a raster image, if cells representing schools in the original image had a value of
30, a set of rules for the reclassification could be
(a) Cells with values 30 (schools) should take the new value of 1.
(b) Cells with values other than 30 should take the new value of 0.
Such a reclassification will generate a new image with all schools coded with
1, and all the rest coded with 0. The resulting reclassified image is very useful
for land use/land cover and environmental studies.
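The two reclassification rules above amount to a look-up applied to every cell. The sketch below is a minimal illustration on a small hypothetical land-use raster; `reclassify` is an illustrative helper that maps each old value through a rules dictionary and gives every unlisted value a default, so the same function could handle multi-class reclassification.

```python
# Sketch of reclassification: schools (value 30) become 1, all else becomes 0.
# The land-use raster and its other codes are hypothetical.

LAND_USE = [
    [10, 10, 30, 20],
    [10, 30, 30, 20],
    [40, 40, 20, 20],
]


def reclassify(raster, rules, default=0):
    """Return a new raster; each value is looked up in `rules`, else `default`."""
    return [[rules.get(value, default) for value in row] for row in raster]


schools = reclassify(LAND_USE, {30: 1})
# schools == [[0, 0, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
```

The original image is left unchanged, so further reclassifications can be made from it.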
8.11 Neighbourhood Functions
There is a range of functions available in GIS that allow a spatial entity to influence
its neighbours, or the neighbours to influence the character of an entity. The most
common examples are buffering, proximity analysis and filtering.
1. Buffer operation: Buffering is the creation of a zone of interest around an entity.
Buffering is possible in both vector and raster GIS. In the vector case, the result
is a new set of objects, while in the raster case the result is a classification of
cells according to whether they lie inside or outside the buffer. Buffers are very
useful for analysing landscapes, highway alignments, water supply networks and
drainage studies.
In most GIS data analysis, there is more than one method of achieving an
answer to a question. The trick is to find the most efficient method, and the most
appropriate analysis. For example, the question ‘Which nursing homes are within
300 m of a main road?’ could be approached in a number of ways. One option
would be first to produce a buffer zone identifying all land up to 300 m from
the main road, and then to find out which nursing homes fall within this buffer
zone using a point-in-polygon overlay. Another query can then be made to find the
names of the nursing homes. An alternative approach would be to measure
the distance from each nursing home to a main road, and then to identify those
which are less than 300 m away. Repeated measurement of distances from nursing
homes to roads could be time consuming and prone to human error. Thus, the first
approach using buffering would be more appropriate.
Conceptually, buffering is very simple, but it involves complex computational
operations. If a point is buffered, a circular zone is created. Buffering lines and
areas creates new areas (Fig. 8.13). Creating buffer zones around point features is
the easiest operation; a circle of the required radius is simply drawn around each
point. Creating buffer zones around line and area features is a little
more complicated. Some GIS do this by placing a circle of the required radius
at one end of the line or area boundary to be buffered. This circle is then moved
along the length of the segment, and the path traced by the edge of the circle,
tangential to the segment, defines the boundary of the buffer zone. Sometimes,
there may be a need for another buffer around a buffer. This is called a doughnut
buffer.
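For the nursing-home question above, the buffer test can be sketched without building the buffer polygon explicitly. This relies on one stated assumption: a point lies inside a buffer of width d around a line exactly when its shortest distance to the line is at most d, which matches the ‘circle moved along the segment’ construction just described. The road, the homes and the helper names are hypothetical.

```python
# Sketch: which nursing homes fall inside a 300 m buffer of the main road?
# Buffer membership is tested as "shortest distance to the road <= width".
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its end points.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def within_buffer(p, polyline, width):
    """True if p lies inside a buffer of the given width around the polyline."""
    return any(
        point_segment_distance(p, a, b) <= width
        for a, b in zip(polyline, polyline[1:])
    )


MAIN_ROAD = [(0, 0), (1000, 0), (1000, 800)]  # hypothetical centre line, m
NURSING_HOMES = {"Home A": (400, 250), "Home B": (600, 500), "Home C": (1250, 700)}

selected = sorted(n for n, p in NURSING_HOMES.items() if within_buffer(p, MAIN_ROAD, 300))
print(selected)  # ['Home A', 'Home C']
```

A doughnut-shaped zone could be tested the same way with two limits (inner < distance <= outer).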
2. Proximity analysis: While buffer zones are often created with a single
command or option in vector GIS, a different approach is used in many raster
GISs. Here, proximity is calculated, resulting in a new raster data layer in which
the attribute of each cell is a measure of distance. This is known as proximity
analysis.
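A proximity layer can be sketched by giving every cell the straight-line distance from its centre to the nearest feature cell. The example below uses a brute-force search for clarity (production GIS software uses faster distance-transform algorithms); the road layer and the 30 m cell size are hypothetical. Thresholding the distance layer then gives the raster equivalent of a buffer.

```python
# Sketch of raster proximity analysis: each cell receives the distance from
# its centre to the centre of the nearest non-zero (feature) cell.
import math

ROADS = [  # hypothetical layer: 1 = road cell
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]


def proximity(raster, cell_size=1.0):
    """Return a distance layer (in ground units) to the nearest feature cell."""
    features = [(r, c) for r, row in enumerate(raster) for c, v in enumerate(row) if v]
    if not features:
        raise ValueError("the layer contains no feature cells")
    return [
        [
            min(math.hypot(r - fr, c - fc) for fr, fc in features) * cell_size
            for c in range(len(raster[0]))
        ]
        for r in range(len(raster))
    ]


distance = proximity(ROADS, cell_size=30.0)  # hypothetical 30 m cells
# Raster buffer: cells within 30 m of a road are coded 1, the rest 0.
buffer_30m = [[1 if d <= 30.0 else 0 for d in row] for row in distance]
```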
3. Filtering: Data filtering involves the recalculation/reallotment of cell values in a
raster image based on the characteristics of neighbouring cells. Filtering is a
technique used for the processing of remotely sensed imagery. Filtering changes the
value of a cell based on the attributes of neighbouring cells. The filter is defined as
a group of cells around a target cell. The size and shape of the filter are determined
by the operator. Common filter shapes are squares and circles, and the dimensions of
the filter determine the number of neighbouring cells used in the filtering process.
The filter is passed across the raster data set (Fig. 8.14) and used to recalculate
the value of the target cell that lies at its centre. The new value assigned to the
target cell is calculated using one of a number of algorithms, for example,
the maximum cell value within the filter or the most frequent value. The raster
data obtained from a classified satellite image may require filtering to ‘smooth’
the noisy (erratic/fuzzy) data caused by high spatial variability in vegetation cover
or problems with the data collection device.
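The ‘most frequent value’ filter mentioned above can be sketched as follows. Assumptions, all illustrative: a square 3 x 3 filter, edge cells use only the neighbours that exist, a cell changes only when another value is strictly more frequent (so ties keep the original class), and the classified raster is hypothetical. The isolated class-5 cell is the kind of noise such smoothing removes.

```python
# Sketch of a 3 x 3 majority (most frequent value) filter on a classified raster.
from collections import Counter

CLASSIFIED = [  # hypothetical classes; the lone 5 is noise
    [2, 2, 2, 1],
    [2, 5, 2, 1],
    [2, 2, 1, 1],
    [1, 1, 1, 1],
]


def majority_filter(raster, size=3):
    """Return a new raster where each cell takes the most frequent value
    in the size x size window centred on it."""
    half = size // 2
    rows, cols = len(raster), len(raster[0])
    out = [row[:] for row in raster]  # write to a copy so results do not feed back
    for r in range(rows):
        for c in range(cols):
            # Edge cells use only the part of the window inside the raster.
            window = [
                raster[i][j]
                for i in range(max(0, r - half), min(rows, r + half + 1))
                for j in range(max(0, c - half), min(cols, c + half + 1))
            ]
            counts = Counter(window)
            best = max(counts, key=counts.get)
            # Replace only if another value is strictly more frequent.
            if counts[best] > counts[raster[r][c]]:
                out[r][c] = best
    return out


smoothed = majority_filter(CLASSIFIED)
```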

8.12 Map Overlay Analysis


The map overlay technique for integrating data from various sources is perhaps the
key GIS analysis function. Using GIS, it is possible to take two different thematic
map layers of the same area and overlay them one on top of the other to form
a new layer through a common reference network or coordinate system. These
individual layers must be spatially registered. For example, obtaining an answer to
the question ‘Which nursing homes are within 300 m of a main road?’ requires
the use of several operations. First, a buffering operation must be applied to find
all the land within 300 m of a main road, and then an overlay function is used
to combine this buffer zone with the nursing home data layer. This allows the
identification of nursing homes within the buffer zone.
As with many other operations and analyses in GIS, there are differences in the
way map overlay is performed in the raster and vector worlds. In vector-based
systems, map overlay is time consuming, complex and computationally
expensive, whereas in raster-based systems it is quick, straightforward and
efficient.
The techniques of GIS map overlay are analogous to sieve-mapping in
conventional methods of surveying, i.e., overlaying tracings of paper maps on a
light table. The concept of map overlay is illustrated below through a case study
of soil erosion in a watershed.
Figure 8.15 illustrates georeferenced data from a number of sources that are
used to study the soil erosion potential. In this illustration, the data maps
(a) are computer coded with respect to a grid (b). The data maps are encoded by
recording the information category most dominant in each cell of the grid. That is,
each cell is assigned a single soil type in the soil data file, a single cover type in
the land cover file, and an average elevation in the topographic file. Encoding the
maps on a common grid inherently makes the data types compatible. The job of
interpreting the applicable characteristics (slope, erodibility and runoff) from the
original data is a simple one for the computer.
The slope information can be derived from the elevations in the topographic file.
The erodibility can be derived from the database management system, and the
runoff potential, an attribute associated with each land-cover type, can also be
calculated. The three sources of data can be interrelated by the analyst to identify
the sites prone to soil erosion. The overlay, also called a composite analysis,
consists of evaluating the data values within each cell of the combined grid
matrix. Complex weighting schemes may be applied to increase the importance
of the more critical variables. The resulting output grid can be displayed as a
matrix or as printed characters. The output can also be generated on a line plotter,
a colour monitor or a precision film recorder.
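The composite analysis can be sketched as a weighted, cell-by-cell evaluation of the three derived layers. Every number below is hypothetical: the 3 x 3 layers, the 100 m cell size, the erodibility and runoff look-up tables, the slope scoring (10 per cent or steeper scores 1) and the weights. Slope is simplified to the steepest drop towards an edge-adjacent cell, which is only one of several ways a GIS may derive it.

```python
# Sketch of a weighted composite (overlay) analysis for erosion potential.
# All layers, look-up values and weights are hypothetical.

CELL = 100.0  # cell size in metres
ELEVATION = [[120, 118, 110],
             [119, 112, 101],
             [117, 105, 95]]   # average elevation per cell, m
SOIL = [["S", "S", "C"],
        ["S", "C", "C"],
        ["L", "L", "C"]]       # dominant soil type per cell
COVER = [["F", "F", "A"],
         ["F", "A", "B"],
         ["A", "B", "B"]]      # dominant land cover per cell

ERODIBILITY = {"S": 0.2, "L": 0.5, "C": 0.8}  # score per soil type
RUNOFF = {"F": 0.2, "A": 0.6, "B": 0.9}        # score per cover type
WEIGHTS = {"slope": 0.5, "erodibility": 0.3, "runoff": 0.2}


def slope_percent(elev, r, c):
    """Steepest downhill gradient (%) to an edge-adjacent cell; 0 for a pit."""
    rows, cols = len(elev), len(elev[0])
    drops = [
        (elev[r][c] - elev[i][j]) * 100.0 / CELL
        for i, j in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
        if 0 <= i < rows and 0 <= j < cols
    ]
    return max([0.0] + drops)


def erosion_potential():
    """Weighted composite of slope, erodibility and runoff, cell by cell."""
    result = []
    for r, elev_row in enumerate(ELEVATION):
        row = []
        for c in range(len(elev_row)):
            slope_score = min(slope_percent(ELEVATION, r, c) / 10.0, 1.0)
            score = (WEIGHTS["slope"] * slope_score
                     + WEIGHTS["erodibility"] * ERODIBILITY[SOIL[r][c]]
                     + WEIGHTS["runoff"] * RUNOFF[COVER[r][c]])
            row.append(round(score, 2))
        result.append(row)
    return result


potential = erosion_potential()
```

Cells combining steep slopes, erodible soil and high-runoff cover receive the highest scores; changing `WEIGHTS` is the weighting scheme the text refers to.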

8.12.1 Vector Overlay Capabilities
Vector map overlay relies heavily on two associated disciplines: geometry and
topology. The overlaid data layers need to be topologically correct, so that lines
meet at nodes and all polygon boundaries are closed. To create topology for a new
data layer produced by the overlay process, the intersections of lines and
polygons from the input layers need to be calculated using geometry. The three
main types of vector overlay, namely point-in-polygon, line-in-polygon and
polygon-on-polygon, are shown in Fig. 8.16. The overlay of two or more data
layers representing simple spatial features results in a more complex output layer,
which contains more polygons, more intersections and more line segments than
either of the input layers.
The point-in-polygon overlay is used to find out the polygon in which a point
falls. For example, using the point-in-polygon overlay, it is possible to find out
in which land-use polygon each fire station is located. Figure 8.16(a)
illustrates this overlay process. On the output map, a new set of fire station points
is created with additional attributes describing land use.
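The geometric test behind this overlay can be sketched with the common ray-casting (crossing-number) rule: a horizontal ray from the point crosses the polygon boundary an odd number of times only if the point is inside. This is one standard technique, not necessarily the one any particular GIS uses; the land-use polygons and fire stations are hypothetical, and points lying exactly on a boundary are not treated specially.

```python
# Sketch of point-in-polygon overlay using the ray-casting rule.

def point_in_polygon(point, polygon):
    """True if point lies inside the polygon given as a list of (x, y) vertices."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge meets that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


LAND_USE = {  # hypothetical land-use polygons
    "residential": [(0, 0), (50, 0), (50, 40), (0, 40)],
    "industrial": [(50, 0), (100, 0), (100, 40), (50, 40)],
}
FIRE_STATIONS = {"FS1": (20, 10), "FS2": (75, 30)}

# New point layer: each fire station gains a land-use attribute.
tagged = {
    name: next((use for use, poly in LAND_USE.items() if point_in_polygon(p, poly)), None)
    for name, p in FIRE_STATIONS.items()
}
print(tagged)  # {'FS1': 'residential', 'FS2': 'industrial'}
```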
The line-in-polygon overlay is more complicated. Suppose that it is required
to know which parts of the roads pass through the new and old city areas. To do
this, the road data need to be overlaid on a data layer containing the city polygons.
The output map will contain roads split into smaller segments, representing roads
in the new city areas and those in the old city areas. Topological information must
be retained in the output map (Fig. 8.16(b)); therefore, the output is more complex
than either of the two input maps. The output map will contain a database record
for each new road segment.
The polygon-on-polygon overlay of Fig. 8.16(c) could be used to examine
the area of market in the new/old city. Two input data layers are required: a
city data layer containing the new/old city polygons, and the market boundary layer.
Three different outputs could be obtained, which are shown in Fig. 8.16(c) and
are presented below:
1. The output data layer could contain all the polygons from both the input
maps. In this case, the question posed is ‘Where are areas of market or
areas which are within the new/old city?’ This corresponds to the Boolean
OR operation, or in mathematical set terms, UNION.

2. The output data layer could contain the whole of the market area, and
the city area within it. The boundary of the market would be used as
the edge of the output map, and city areas would be cut away if they fall
outside it. This operation is referred to as ‘cookie cutting’. It is equivalent
to the mathematical IDENTITY operation. The questions being answered
are ‘Where is the market boundary, and which areas of city are within
it?’ This overlay might be used to calculate the percentage of the
area of the city covered by the market.
3. The output data layer could contain only areas that meet both criteria; that
is, areas that are both market and within the new city. An output map would
be produced showing the new city polygons that are entirely covered by the
market, with the parts of new city polygons crossing the market boundary
‘cut’ away. This is the mathematical INTERSECT operation,
and the output map shows where the two input layers intersect. ‘Where are
market areas within the new city area?’ is the question being answered.
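The set logic of the three outputs can be illustrated by approximating each area on a grid as the set of cells it covers. This is a raster-style stand-in chosen for brevity: a vector GIS computes exact polygon intersections instead. The market and new-city extents below are hypothetical.

```python
# Sketch of UNION, IDENTITY and INTERSECT using sets of (row, column) cells.
# Hypothetical areas; a vector GIS would intersect the polygons exactly.

MARKET = {(r, c) for r in range(0, 3) for c in range(2, 6)}    # rows 0-2, cols 2-5
NEW_CITY = {(r, c) for r in range(1, 5) for c in range(0, 4)}  # rows 1-4, cols 0-3

union = MARKET | NEW_CITY       # market OR new city
intersect = MARKET & NEW_CITY   # market AND new city
# IDENTITY (cookie cutting): keep the whole market extent and tag each cell
# with whether it also lies in the new city.
identity = {cell: cell in NEW_CITY for cell in MARKET}

share = len(intersect) / len(MARKET)  # fraction of the market inside the new city
print(len(union), len(intersect), round(share, 2))  # 24 4 0.33
```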


8.12.2 Raster Overlay Capabilities
In the raster data structure, everything is represented by grid cells: a point is
represented by a single cell, a line by a string of cells and an area by a group of
cells. Raster map overlay introduces the idea of map algebra or ‘mapematics’.
Using map algebra, input data layers may be added, subtracted, multiplied or
divided to produce output. Mathematical operations are performed on individual
cell values from two or more input layers to produce an output value. Thus, the
most important consideration in raster overlay is the appropriate coding of point,
line and area features in the input data layers.
Consider five data layers of a hill station that have been registered and
coded as follows.
Layer                              Code
1. Location of nursing home        1
2. Road                            2
3. Agriculture land                3
4. Land use
   (i) Habitat                     1
   (ii) Water                      2
   (iii) Agriculture land          4
   (iv) Forest                     5
5. Hill station                    10
On all data layers, ‘0’ is the value given to cells that do not contain features of
interest.
To find out how many nursing homes are contained within the hill station, an
operation equivalent to the vector point-in-polygon overlay is required. The two
data layers may be added as shown in Fig. 8.17(a). The output map would contain
cells with the following values:
1. 0 for cells outside the hill station boundary and without nursing homes
2. 1 for cells containing nursing homes, but outside the hill station
boundary
3. 10 for cells inside the hill station boundary, but without nursing homes
4. 11 for cells inside the hill station boundary and containing nursing
homes
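The addition just described can be sketched cell by cell. The 4 x 4 layers are hypothetical, but the codes follow the table above (nursing home 1, hill station 10, 0 elsewhere), so a value of 11 marks a nursing home inside the hill station; `add_layers` is an illustrative helper that assumes both layers are registered and the same size.

```python
# Sketch of raster overlay by map algebra: add two registered layers.

NURSING_HOMES = [  # code 1 = nursing home
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 0],
]
HILL_STATION = [  # code 10 = inside the hill station boundary
    [0, 0, 0, 0],
    [0, 10, 10, 10],
    [0, 10, 10, 10],
    [0, 0, 10, 10],
]


def add_layers(a, b):
    """Cell-by-cell addition of two layers of the same size."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


overlay = add_layers(NURSING_HOMES, HILL_STATION)
homes_inside = sum(row.count(11) for row in overlay)  # 11 = home inside boundary
print(homes_inside)  # 1
```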
To find out which sections/parts of roads pass through forest areas, an
operation equivalent to the vector line-in-polygon method (Fig. 8.17(b)) is required.
This requires the roads data layer and a reclassified version of the land-use
map that contains only forest areas. The two data layers are then added.
The output map would contain cells with the following values:
1. 0 for cells with neither roads nor forest present;
2. 2 for cells with roads, but outside forest areas;
3. 5 for cells with forest present, but roads absent;
4. 7 for cells with both forest and roads present.
If the value ‘2’ for a road were added to the land-use codes, the new value for a
cell could be the same as that for another land-use type (for example, a road value
of 2 plus a water value of 2 gives 4, which is the same as the value used here for
agriculture land).

Thus, the coding of raster images used in overlay is very important, and frequently users
employ Boolean images (using only codes 1 and 0) so that algebraic equations will produce
a meaningful answer.
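The coding clash and its Boolean remedy can be shown on a short strip of cells. The strip is hypothetical; the codes follow the table above (road 2; water 2, agriculture land 4, forest 5). Adding coded layers directly makes road-on-water read as agriculture, whereas converting each layer to a Boolean image first and multiplying them acts as an AND, marking only the cells where road and forest coincide.

```python
# Sketch: why Boolean images are preferred for raster overlay.

ROAD = [2, 2, 2, 0]      # road code 2, 0 = no road
LAND_USE = [2, 4, 5, 5]  # water, agriculture land, forest, forest

# Direct addition of coded layers: road on water (2 + 2) reads as agriculture (4).
naive = [r + l for r, l in zip(ROAD, LAND_USE)]


def to_boolean(layer, code):
    """1 where the layer holds the given code, 0 elsewhere."""
    return [1 if v == code else 0 for v in layer]


road_b = to_boolean(ROAD, 2)
forest_b = to_boolean(LAND_USE, 5)
# Multiplying Boolean images acts as AND: 1 only where road and forest coincide.
road_in_forest = [r * f for r, f in zip(road_b, forest_b)]
```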

The polygon-on-polygon analysis is conducted in just the same way (Fig. 8.17(c)).
For example, adding the forest layer and the hill station boundary would
produce an output layer with the following codes:
0 for cells outside the hill station boundary and with forest absent
5 for cells outside the hill station boundary and with forest present
10 for cells inside the hill station boundary and with forest absent
15 for cells inside the hill station boundary and with forest present

The output map is equivalent to a union polygon-on-polygon overlay in vector GIS.
Reclassification will produce variants of this, and other overlay operations are
available by multiplying, subtracting or dividing the data layers. The algebraic
manipulation of images in raster GIS is a powerful and flexible way of combining
data and organising analysis. Equations can be written with maps as variables
to allow the development of spatial models. Figure 8.17(d) shows the polygon
overlay performed using the Boolean alternative.

8.13 Data Quality


The success of any GIS application depends on the quality of the geographic data
used. Collecting high-quality geographic data for input to GIS is therefore an
important activity. In GIS, data quality is used to give an indication of how good the
data are. It describes the overall fitness or suitability of data for a specific purpose
or is used to indicate data free from errors and other problems. Some pointers
for gauging the overall quality of a GIS database are error, accuracy, precision and
bias. In addition, the resolution and generalisation of source data, and the data
model used, may influence the portrayal of features of interest. Data sets used for
analysis need to be complete, compatible and consistent, and applicable for the
analysis being performed. These concepts are explained below.
1. Flaws in data are usually referred to as errors. Error is the physical difference
between the real world and the GIS facsimile. Errors may be single, definable
departures from reality, or may be persistent, widespread deviations throughout
a whole database.
2. Accuracy is the extent to which an estimated data value approaches its true
value. If a GIS database is accurate, it is a true representation of reality. It is
impossible for a GIS database to be 100% accurate, though it is possible to have
data that are accurate to within specified tolerances.
3. Precision is the recorded level of detail of the data. A coordinate in metres to
the nearest ten decimal places is more precise than one specified to the nearest
three decimal places. Computers store data with a high level of precision, though
a high level of precision does not imply a high level of accuracy.
4. Bias in GIS data is the systematic variation of data from reality. Bias is a
consistent error throughout a data set. A consistent overshoot in digitised data
caused by a badly calibrated digitiser, or the consistent truncation of the decimal
points from data values by a software program, are possible examples. These
examples have a technical source. However, human sources of bias also exist. An
aerial photograph interpreter may have a consistent tendency to ignore all features
below a certain size. Although such consistent errors should be easy to rectify,
they are often very difficult to spot.
5. Resolution is the term used to describe the smallest feature in a data set that
can be displayed or mapped. In raster GIS, resolution is determined by cell size.
For example, for a raster data set with a 20 m cell size, only those features that
are 20 m × 20 m or larger can be distinguished. At this resolution it is possible
to map large features such as fields, lakes and urban areas but not individual trees
or telegraph poles. Vector data can also have resolution, although this is described
in different terms. Resolution is dependent on the scale of the original map, the
point size and line width of the features represented thereon, and the precision
of digitising.
6. Generalisation is the process of simplifying the complexities of the real world
to produce scale models and maps. Cartographic generalisation is a subject in itself
and is the cause of many errors in GIS data derived from maps. It is the subjective
process by which the cartographer selectively removes the enormous detail of the
real world in order to make it understandable and attractive in map form.
7. Completeness: A complete data set covers the study area and the time period
of interest in its entirety. The data should be complete spatially and temporally,
and should have a complete set of attribute information.
8. Compatibility: compatible data sets can be used together sensibly. With GIS it is possible
to overlay two maps, one originally mapped at a scale of 1:500 000 and the other
at 1:25 000. The result, however, is largely worthless because of incompatibility
between the scales of the source documents. Maps containing data measured
in different scales of measurement cannot be combined easily. To ensure
compatibility, ideally data sets should be developed using similar methods of data
capture, storage, manipulation and editing.
9. Consistency applies not only to separate data sets but also within individual data
sets. Inconsistencies can occur within data sets where sections have come from
different source documents or have been digitised by different people. This will
cause spatial variation in the error characteristics of the final data layer. One area
of the final data set may contain more error than another. Problems of inconsistency
also come from the manner in which the data were collected.
10. Applicability is a term used to describe the appropriateness or suitability of
data for a set of commands, operations or analyses.
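The difference between accuracy, precision and bias can be made concrete with a few check points. In the sketch below the coordinates are hypothetical: they are recorded to ten decimal places (high precision) but carry a consistent overshoot of about 2 m, like the badly calibrated digitiser mentioned above. The mean error measures the bias, and the root mean square error (RMSE), a common summary of positional accuracy, measures the overall departure from the true values.

```python
import math

# Hypothetical check points (x coordinates in metres): true and digitised values.
true_x = [100.0, 250.0, 400.0, 550.0]
digitised_x = [102.1234567891, 251.9876543210, 402.0555555555, 551.9999999999]

errors = [d - t for d, t in zip(digitised_x, true_x)]


def rmse(values):
    """Root mean square of a list of errors."""
    return math.sqrt(sum(e * e for e in values) / len(values))


bias = sum(errors) / len(errors)   # mean error: the systematic part of the error
total_rmse = rmse(errors)          # overall accuracy, including the bias

# Removing the systematic shift leaves only the small random part of the error.
corrected_rmse = rmse([e - bias for e in errors])

print(f"bias = {bias:.3f} m, RMSE = {total_rmse:.3f} m, "
      f"RMSE after removing bias = {corrected_rmse:.3f} m")
```

The values are precise to a fraction of a nanometre yet inaccurate by about 2 m; once the bias is identified it can be subtracted, which is why consistent errors are easy to rectify but only after they have been spotted.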

8.14 Sources of Errors in GIS


Spatial and attribute errors can occur at any stage in a GIS project. They may
arise during the definition of spatial entities, from the representation of these
entities in the computer, or from the use of data in analysis. In addition, they
may be present in source data, arise during conversion of data to digital format,
occur during data manipulations and processing, or even be produced during the
presentation of results.

8.14.1 Errors Arising from Our Understanding and Modelling of Reality
Errors can originate from the ways in which we perceive, study and model reality.
These errors can be termed conceptual errors, since they are associated with the
representation of the real world for study and communication.

The different ways in which people perceive reality can have effects on how
they model the world using GIS. The perception of reality influences the definition
of reality, and in turn the use of spatial data. This can create real errors and often
gives rise to inconsistencies between data collected by different surveyors, maps
drawn by different cartographers, and databases created by different GIS users.
In geography, and GIS, spatial models are used to reflect reality. The main
models in use are raster, vector, object-oriented and layer based. All of these spatial
models have limitations when it comes to portraying reality. For instance, the
raster model assumes that all real-world features can be represented as individual
cells. This is clearly not the case. The vector model assumes that all features can
be given a single coordinate or a collection of Cartesian coordinates. The world
is actually made up of physical and biological materials, which are, in turn, made
up of molecular and submolecular matter grouped into complex systems linked
by flows of energy and materials (solids, liquids and gases). Whatever GIS model
we adopt, it is a simplification of this reality, and any simplification of reality will
include errors of generalisation, completeness and consistency.

8.14.2 Errors in Source Data for GIS
The models of reality in GIS are built from a variety of data sources including
survey data, remotely sensed and map data. All sources of spatial and attribute
data for GIS are likely to include errors.
Survey data can contain errors due to mistakes made by people operating
the equipment or recording the observations, or due to technical problems with
the equipment.
Remotely sensed and aerial photography data could have spatial errors if they
were spatially referenced wrongly, and mistakes in classification and interpretation
would create attribute errors.
Maps are probably the most frequently used sources of data for GIS. Maps
contain both relatively straightforward spatial and attribute errors caused by
human or equipment failings, and more subtle errors, introduced as a result of the
cartographic techniques employed in the map-making process. Generalisation is
one cartographic technique that may introduce errors.

8.14.3 Errors in Data Encoding
Data encoding is the process by which data are transferred from some non-GIS
source, such as the paper map, satellite image or survey, into a GIS format.
The method of data encoding, and the conditions under which it is carried out,
are perhaps the greatest source of error in most GIS. Digitising, both manual
and automatic, is an important method of data entry. Despite the availability of
hardware for automatic conversion of paper maps into digital form, much of the
digitising of paper maps is still done using a manual digitising table. Manual
digitising is recognised by researchers as one of the main sources of error in GIS;
however, digitising error is often largely ignored.
Sources of error within the digitising process are many, but may be broken
down into two main types: source map error and operational error. Operational
errors are those introduced and propagated during the digitising process. Human
operators can compound errors present in an original map and add their own
distinctive error signature.
Automatic digitising, like manual digitising, requires correct registration of the
map document before digitising commences, but there the similarity ends. By far
the most common method of automatic digitising is the use of a raster scanner.
This input device suffers from the same problems regarding resolution as the
raster data model.

8.14.4 Errors in Data Editing and Conversion
After data encoding is complete, cleaning and editing are almost always required.
These procedures are the last line of defence against errors before the data are
used for analysis. Of course, it is impossible to spot and remove all the errors, but
many problems can be eliminated by careful scrutiny of the data.
A different problem occurs when automated techniques are used to clean raster
data. The main problem requiring attention is ‘noise’—the misclassification of cells.
Noise can be easy to spot where it produces a regular pattern, such as striping. At
other times, it may be more difficult to identify as it occurs as randomly scattered
cells. These noise errors can be rectified by filtering the raster data to reclassify
single cell or small groups of cells by matching them with general trends in the
data. The ‘noisy’ cells are given the same value as their neighbouring cells.
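The filtering described above is often done with a majority (modal) filter, assumed here as one common choice: each cell takes the most frequent value in its 3 × 3 neighbourhood. The sketch below is a plain-Python illustration on a small hypothetical grid, not any particular package's implementation; cells where two values tie for most frequent keep their original code.

```python
from collections import Counter


def majority_filter(grid):
    """Give each cell the most frequent value in its 3 x 3 neighbourhood.

    Cells on the edge use whichever neighbours exist. If two or more values
    tie for most frequent, the cell keeps its original value.
    """
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            window = [grid[i][j]
                      for i in range(max(0, r - 1), min(rows, r + 2))
                      for j in range(max(0, c - 1), min(cols, c + 2))]
            ranked = Counter(window).most_common()
            value, count = ranked[0]
            if len(ranked) == 1 or ranked[1][1] < count:
                out[r][c] = value
    return out


# Hypothetical classified raster with one 'noisy' cell (value 3) among class 1.
noisy = [[1, 1, 1, 1],
         [1, 3, 1, 1],
         [1, 1, 1, 1]]
print(majority_filter(noisy))  # [[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]]
```

A larger window would remove bigger clumps of noise but would also smooth away genuine small features, so the window size is a trade-off.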
After cleaning and editing data it may be necessary to convert the data from
vector to raster or vice versa. During vector-to-raster conversion both the size of
the raster and the method of rasterisation used have important implications for
positional error and, in some cases, attribute uncertainty. The smaller the cell size,
the greater is the precision of the resulting data. Finer raster sizes can trace the
path of a line more precisely and therefore help to reduce classification error—a
form of attribute error. Positional and attribute errors as a result of generalisation
are seen as classification error in cells along the vector polygon boundary. The
conversion of data from raster to vector format is largely a question of geometric
conversion; however, certain topological ambiguities can occur, such as where
differently coded raster cells join at corners.
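The effect of cell size on vector-to-raster conversion can be checked numerically. The sketch below rasterises a hypothetical circular polygon of radius 100 m using a cell-centre rule (a cell is coded as inside if its centre falls inside the polygon; this is one of several rasterisation methods and is an assumption here) and compares the raster area with the true area πr². Cells along the boundary are where the classification error accumulates.

```python
import math


def rasterised_area(radius, cell_size):
    """Area of a circle (centred on the origin) after cell-centre rasterisation."""
    n = math.ceil(radius / cell_size) + 1  # enough cells on each side to cover the circle
    inside = 0
    for i in range(-n, n):
        for j in range(-n, n):
            x = (i + 0.5) * cell_size  # cell centre coordinates
            y = (j + 0.5) * cell_size
            if x * x + y * y <= radius * radius:
                inside += 1
    return inside * cell_size * cell_size


true_area = math.pi * 100 ** 2
for cell in (50, 20, 10, 5, 1):
    approx = rasterised_area(100, cell)
    print(f"cell {cell:>2} m: area error {abs(approx - true_area) / true_area:.2%}")
```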

8.14.5 Errors in Data Processing and Analysis
Errors may be introduced during the manipulation and analysis of the GIS database.
GIS users must ask themselves questions before initiating a GIS analysis. For
example: Are the data suitable for this analysis? Are they in a suitable format? Are
the data sets compatible? Are the data relevant? Will the output mean anything?
Is the proposed technique appropriate to the desired output? These questions
may seem obvious but there are many examples of inappropriate analysis. These
include the inappropriate phrasing of spatial queries, overlaying maps which have
different coordinate systems, combining maps which have attributes measured in
incompatible units, using maps together that have been derived from source data of
widely different map scales, and using an exact and abrupt method of interpolation
to interpolate approximate and gradual point data.

GIS operations that can introduce errors include the classification of data,
aggregation or disaggregation of area data and the integration of data using overlay
techniques.
Classification errors also affect raster data. Satellite images
provide a reflectance value for each pixel within a specific wavelength range
or spectral band (for example, red, near infrared or microwave). Raster maps of
environmental variables, such as surface cover type, are derived by classifying
each pixel in the image according to typical reflectance values for the range of
individual cover types present in the image. Error can occur where different land
cover types have similar reflectance values and where shadows cast by terrain,
trees or buildings reduce the reflectance value of the surface. Careful choice of
classification method can help to reduce this type of error.
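Classification by reflectance can be illustrated with a minimum-distance-to-means classifier, one simple method chosen here for illustration (the text does not name a method). The class means below are hypothetical two-band (red, near infrared) values. Forest and grass have similar means, and a forest pixel darkened by terrain shadow ends up closer to another class, reproducing both sources of error described above.

```python
import math

# Hypothetical mean reflectance of each cover type in two bands (red, near infrared).
CLASS_MEANS = {
    "water": (0.05, 0.03),
    "forest": (0.04, 0.40),
    "grass": (0.08, 0.35),
    "bare soil": (0.25, 0.30),
}


def classify(pixel):
    """Assign a pixel to the class whose mean reflectance is nearest (Euclidean)."""
    return min(CLASS_MEANS, key=lambda name: math.dist(pixel, CLASS_MEANS[name]))


print(classify((0.05, 0.39)))  # sunlit forest pixel: forest
print(classify((0.07, 0.37)))  # lies between the forest and grass means
print(classify((0.02, 0.20)))  # shadowed forest pixel: classified as grass, not forest
```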
Where a certain level of spatial resolution or a certain set of polygon
boundaries are required, data sets that are not mapped with these may need to
be aggregated or disaggregated to the required level. This is not a problem if the
data need to be aggregated from smaller areas into larger areas, provided that
the smaller areas nest hierarchically into the larger areas. Problems with error
do occur, however, if we wish to disaggregate our data into smaller areas or
aggregate into larger non-hierarchical units. The information required to decide
how the attribute data associated with the available units aggregate into the larger
but non-nested units or disaggregate into a set of smaller units, rarely exists.
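The asymmetry between aggregation and disaggregation shows up even in a tiny example. Summing nested districts into regions is exact, but splitting a region back into districts needs extra information; the sketch below uses areal weighting, which assumes the population is spread evenly across the region. All zone names, counts and areas are hypothetical.

```python
# Hypothetical population counts for four districts nested in two regions.
district_pop = {"D1": 1200, "D2": 800, "D3": 1500, "D4": 500}
region_of = {"D1": "R1", "D2": "R1", "D3": "R2", "D4": "R2"}


def aggregate(values, parent):
    """Sum the values of nested child zones into their parent zones (exact)."""
    totals = {}
    for zone, value in values.items():
        totals[parent[zone]] = totals.get(parent[zone], 0) + value
    return totals


def disaggregate_by_area(total, areas):
    """Split a zone total among sub-areas in proportion to their areas.

    This ASSUMES an even spread, so it is an estimate, not a recovery
    of the real values.
    """
    whole = sum(areas.values())
    return {name: total * a / whole for name, a in areas.items()}


regions = aggregate(district_pop, region_of)
estimate = disaggregate_by_area(regions["R1"], {"D1": 1.0, "D2": 1.0})
print(regions)   # {'R1': 2000, 'R2': 2000}
print(estimate)  # {'D1': 1000.0, 'D2': 1000.0}, versus the real 1200 and 800
```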
Error arising from map overlay in GIS is a major concern and has
correspondingly received much attention in the GIS literature. This is primarily
because much of the analysis performed using GIS consists of the overlay of
categorical maps (where the data are presented in a series of categories). GIS
allows the quantitative treatment of these data (for example, surface interpolation
or spatial autocorrelation), which may be inappropriate. Map overlay in GIS
uses positional information to construct new zones from input map layers using
Boolean logic or ‘mapematics’. Consequently, positional and attribute errors
present in the input map layers will be transferred to the output map, together
with additional error introduced by multiplicatory effects and other internal
sources. Data output from a map overlay procedure are only as good as the
worst data input to the process.
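A rough numerical feel for overlay error can be had under two simplifying assumptions: a cell in the output is correct only if it is correct in every input layer, and, for the second figure, errors in different layers are independent. The first assumption means the output can be no more accurate than the worst layer; the second gives the product of the layer accuracies. The accuracies used are hypothetical.

```python
import math


def overlay_accuracy_bounds(layer_accuracies):
    """Rough accuracy figures for the output of a map overlay.

    upper: the proportion of correct cells cannot exceed that of the worst layer.
    independent: expected proportion if layer errors are independent (an assumption),
    i.e. the product of the individual layer accuracies.
    """
    return {
        "upper": min(layer_accuracies),
        "independent": math.prod(layer_accuracies),
    }


# Three hypothetical input layers, 90 %, 85 % and 95 % correct.
print(overlay_accuracy_bounds([0.90, 0.85, 0.95]))
```

Even with individually good layers, the independent case falls to roughly 73 % correct, which illustrates why overlay error has received so much attention.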
Perhaps the most visible effect of positional error in vector map overlay is
the generation of sliver polygons. Slivers (or ‘weird’ polygons) occur when two
maps containing common boundaries are overlaid. If the common boundaries in
the two separate maps have been digitised separately, the coordinates defining the
boundaries may be slightly different as the result of digitising error. When the maps
are overlaid, a series of small, thin polygons will be formed where the common
boundaries overlap (Fig. 8.18). Slivers may also be produced when maps from
two different scales are overlaid. Of course, sliver polygons can and do occur by
chance, but genuine sliver polygons are relatively easy to spot by their location
along common boundaries and their physical arrangement as long thin polygonal
chains.
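Because slivers are small and long, a simple screen combines area with a shape index. The sketch below uses the thinness ratio 4πA/P², which equals 1 for a circle and approaches 0 for thin shapes. The area and thinness thresholds, like the two test polygons, are hypothetical and would need tuning to the data.

```python
import math


def area_and_perimeter(ring):
    """Shoelace area and perimeter of a closed polygon given as (x, y) vertices."""
    area2 = 0.0
    perimeter = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        area2 += x1 * y2 - x2 * y1
        perimeter += math.hypot(x2 - x1, y2 - y1)
    return abs(area2) / 2, perimeter


def thinness(ring):
    """4*pi*A / P**2: 1.0 for a circle, close to 0 for long thin shapes."""
    area, perimeter = area_and_perimeter(ring)
    return 4 * math.pi * area / perimeter ** 2


def looks_like_sliver(ring, max_area=50.0, max_thinness=0.1):
    """Flag polygons that are both small and thin (thresholds are hypothetical)."""
    area, _ = area_and_perimeter(ring)
    return area < max_area and thinness(ring) < max_thinness


field = [(0, 0), (100, 0), (100, 80), (0, 80)]     # an ordinary parcel, 8000 m2
sliver = [(0, 0), (100, 0), (100, 0.4), (0, 0.3)]  # 100 m long, under 0.5 m wide
print(looks_like_sliver(field), looks_like_sliver(sliver))  # False True
```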

8.14.6 Errors in Data Output
From the preceding discussion it should be clear that all GIS databases will
contain errors. In addition, further errors will be introduced during manipulation
and analysis of the data. Therefore, it is inevitable that all GIS output, whether
in the form of a paper map or a digital database, will contain inaccuracies. The
extent of these inaccuracies will depend on the care and attention paid during the
construction, manipulation and analysis of the databases. It is also possible that
errors can be introduced when preparing GIS output.

8.15 Applications of GIS


The concept of geographic information infrastructure has brought about a dramatic
philosophical and technological revolution in the development of GIS. Instead
of being used simply as a set of software tools for processing and analysing
geographic data stored locally, GIS is now a gateway for accessing and integrating
geographic data from different sources located locally and globally, increasingly
used for interactive visualisation of scenarios resulting from different business
decisions, as well as for the communication of spatial knowledge and intelligence
among people all over the world. GIS has popularised the use of geographic
information by empowering individuals and organisations to use such information
in areas that earlier generations of GIS users could never have thought of even with
their wildest imagination. It is now commonplace for ordinary people to use GIS
to check the weather and traffic conditions before they leave home for work and
to find information about the country or city they are about to visit. Increasingly,
business people rely on GIS to identify locations at which to set up new shops
and to determine the best routes to deliver their goods and services. At the same
time, GIS has become an indispensable tool for government officials to manage
land and natural resources, monitor the environment, formulate economic and
community development strategies, enforce law and order and deliver social
services. Major application areas in GIS are listed in Table 8.2.

Table 8.2 Major application areas of GIS

Sector        Application areas
Academic      Research in engineering, science and humanities.
              Primary and secondary schools: school district delineation, facilities
              management, bus routing, spatial digital libraries.
Industry      Engineering: surveying and mapping, site and landscape development,
              pavement management.
              Transportation: route selection for goods delivery, public transit,
              vehicle tracking.
              Utilities and communications: electricity and gas distribution,
              pipelines, telecommunication networks.
              Forestry: forest resource inventory, harvest planning, wildlife
              management and conservation.
              Mining and mineral exploration.
              Systems consulting and integration.
Business      Banking and insurance.
              Real estate: development project planning and management, sales and
              renting services, building management.
              Retail and market analysis.
              Delivery of goods and services.
Government    Central government: national topographic mapping, resource and
              environmental management, weather services, public land management,
              population census, election, and voting.
              State government: surveying and mapping, land and resources
              management, highway planning and management.
              Local/municipal government: social and community development, land
              registration and property assessment, water and wastewater services.
              Public safety and law enforcement: crime analysis, deployment of human
              resources, community policing, emergency planning and management.
              Health care.
              International development and humanitarian relief.
Military      Training.
              Command and control.
              Intelligence gathering.

8.16 Selected GIS Software


A GIS requires specialised programs tailored for manipulation of geographic data.
Other kinds of databases may have very large volumes of data, but do not need
to retain locational information for data. Therefore, GIS software must satisfy the
special needs of the analyst who needs to reference data by geographic location.
Furthermore, the GIS must provide the analyst with the capability to solve the
special problems that arise whenever maps or images are examined—the problems
of changing coordinate systems, matching images, bringing different images into
registration, and so on. A GIS must be supported by the ability to perform certain
operations related to the geographic character of the data. For example, it must be
capable of identifying data by location or by specified areas in order to retrieve and
to display data in a map-like image. Thus, the GIS permits the analyst to display
data in a map-like format, so that geographic patterns and interrelationships are
visible to the analyst.
Furthermore, the software for a GIS must be able to perform operations that
relate values at one location to those at neighbouring locations. For example, to
compile slope information from elevation data, it is necessary to examine not
only specific elevation values, but also those at neighbouring locations, in order
to calculate the magnitude and directions of the topographic gradient.
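The slope calculation described above is a neighbourhood operation: each result depends on the cells around it. The sketch below uses simple central differences on the four edge neighbours (many GIS packages instead use a weighted 3 × 3 scheme such as Horn's method) and assumes rows run north to south and columns west to east. The small elevation grid is hypothetical.

```python
import math


def slope_and_aspect(dem, row, col, cell_size):
    """Slope (degrees) and downslope direction (degrees clockwise from north).

    Uses central differences at an interior cell. Assumes row numbers increase
    southward and column numbers increase eastward.
    """
    dz_dx = (dem[row][col + 1] - dem[row][col - 1]) / (2 * cell_size)  # rise to the east
    dz_dy = (dem[row - 1][col] - dem[row + 1][col]) / (2 * cell_size)  # rise to the north
    slope = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
    # Steepest descent points against the gradient: (-dz_dx east, -dz_dy north).
    # On a perfectly flat cell this direction is meaningless.
    aspect = math.degrees(math.atan2(-dz_dx, -dz_dy)) % 360
    return slope, aspect


# Hypothetical surface rising 10 m per 10 m cell towards the east.
dem = [[0, 10, 20],
       [0, 10, 20],
       [0, 10, 20]]
print(slope_and_aspect(dem, 1, 1, 10.0))  # (45.0, 270.0): 45 degree slope facing west
```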
A GIS, of course, consists not only of a single data set, but also of many that
together show several kinds of information for the same geographic area. Thus,
a GIS may include data for topographic elevation, streams and rivers, land use,
political and administrative boundaries, power lines, and other variables. This
combined data set is useful only if the several overlays register to one another
exactly. Because the separate variables are likely to be derived from quite different
reference systems and cartographic projections, the several kinds of data must be
brought into a common coordinate system. Thus, a GIS must have special programs
to bring data into registration by changing the scale and geometric qualities of
the data.
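Registration is often done with an affine transformation, which can change scale, rotation, skew and origin; it is used here as one common choice, not the only one. With exactly three non-collinear control points the six coefficients can be solved exactly, as sketched below with Cramer's rule; with more points a least-squares fit is normally used instead. The control-point coordinates (digitiser units and ground metres) are hypothetical.

```python
def solve3(m, v):
    """Solve the 3 x 3 linear system m * x = v with Cramer's rule."""
    def det(a):
        return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
                - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
                + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))
    d = det(m)
    if abs(d) < 1e-12:
        raise ValueError("control points are collinear")
    # Replace column k with the right-hand side to get each unknown.
    return [det([row[:k] + [v[i]] + row[k + 1:] for i, row in enumerate(m)]) / d
            for k in range(3)]


def fit_affine(source, target):
    """Affine transform mapping three source points exactly onto three targets.

    Returns (a, b, c, d, e, f) for X = a*x + b*y + c and Y = d*x + e*y + f.
    """
    m = [[x, y, 1.0] for x, y in source]
    a, b, c = solve3(m, [X for X, _ in target])
    d, e, f = solve3(m, [Y for _, Y in target])
    return a, b, c, d, e, f


def apply_affine(params, x, y):
    a, b, c, d, e, f = params
    return a * x + b * y + c, d * x + e * y + f


# Hypothetical control points: digitiser coordinates (cm) and ground coordinates (m).
source = [(0, 0), (10, 0), (0, 10)]
target = [(500000, 2000000), (502500, 2000000), (500000, 2002500)]
params = fit_affine(source, target)
print(apply_affine(params, 4, 6))  # a digitised point moved into ground coordinates
```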
The current trend of GIS software development is to move away from the
proprietary development environment to open industry standards. It is now possible
to build application software modules with programming languages, such as
Visual Basic, Visual C++, and PowerBuilder, and to integrate them with the GIS
functions originally supplied by the software vendor. The concepts and techniques
of using generic computer languages to build GIS applications are based on the
use of component software. This is a software engineering methodology that has
been evolving since the early 1990s. There has been considerable success in using
this approach to effectively address the integration of separate computer-based
applications such as document imaging, optical character recognition, database
query, and fax. GIS applications can clearly benefit from this new approach
to software development.

8.16.1 Integrated Land and Water Information System (ILWIS) 3.1
ILWIS integrates image, vector and thematic data in one unique and powerful
package on the desktop. ILWIS delivers a wide range of features including import/export,
digitising, editing, analysis and display of data as well as production of quality maps.

1. Integrated raster and vector design
2. Import and export of widely used data formats
3. On-screen and table digitising
4. Comprehensive set of image processing tools
5. Orthophoto, image georeferencing, transformation and mosaicing
6. Advanced modelling and spatial data analysis
7. 3D visualisation with interactive editing for optimal view findings
8. Rich projection and coordinate system library
9. Geo-statistical analyses for improved interpolation
10. Production and visualisation of stereo image pairs

Applied geomorphology and natural hazards
  1. Hazard, vulnerability and risk analysis
  2. Flood hazard analysis using multitemporal satellite
  3. Modelling cyclone hazard
  4. Modelling erosion potential of catchment
  5. Statistical landslide hazard analysis
  6. Deterministic landslide hazard zonation
  7. Seismic landslide hazard zonation
Engineering geology
  8. Creating an engineering geological database
Surface hydrology
  9. Irrigation water requirement
  10. Irrigation area characteristics
  11. Determination of peak runoff
  12. Morgan approach for erosion modelling
Hydro-geology
  13. Assessing aquifer vulnerability to pollution
Geology
  14. Remote sensing and GIS techniques applied to geological survey
  15. Geological data integration
Neighbourhood modelling
  16. Modelling with neighbourhood operators
  17. Extracting topographic and terrain variables for distributed models
Data combination
  18. Tools for map analysis applied to the selection of a waste disposal site
Urban surveys
  19. Updating a land use map with oblique air photos
  20. Analysis of urban change and spatial pattern
  21. Analysis of suitability for urban expansion
