GIS Notes PDF
8 Geographic Information System (GIS)
Go through the highlighted portion, focus on the short definitions and try to
understand the concepts with the examples given here. One more thing: try to summarise
the flow chart portion.
Introduction
The development of technology and the geographical world is taking place at a very
fast pace. This necessitates having a system capable of analysing and manipulating the
spatially referenced data or remotely sensed information, and give the desired output
(information) in a very short time. A geographic information system helps to better
understand the world around us and enables development of spatial intelligence
for logical decision-making. Several definitions of GIS exist in literature, some
of which are as follows.
Basic definition: GIS is a computer-based information system which attempts to capture,
store, manipulate, analyse and display spatially referenced and associated attribute data
for solving complex research, planning and management problems.
GIS is a system of hardware, software, data, and people organising, collecting,
storing, analysing and disseminating information about the areas of the earth.
GIS is an information technology which stores, analyses and displays both
spatial and non-spatial data.
GISs are specialised data bases that preserve locational identities of the
information that they record.
The word geographic in GIS carries two meanings: earth and geographic space.
By earth, it implies that all data in the system are pertinent to earth’s features and
resources, including human activities based on or associated with these features and
resources. By geographic space, it means that the commonality of both the data and
the problems that the systems are developed to solve, is geography, i.e., location,
distribution, pattern, and relationship within a specific geographical reference
framework. The word information implies that data in a GIS are organised to yield
useful knowledge often as coloured maps and images, statistical graphics, tables
and various on-screen activities. The word system implies that a GIS is made up
from several inter-related and linked components with different functions.
The term ‘Geographic Information System’ first appeared in published literature
in 1970. Although it sounds like a relatively new term, many of its concepts have
been in existence for centuries. For example, to illustrate fundamentals that still
comprise the basis of GIS, consider the base plan of a locality of a city wherein
it is required to show public utilities such as water supply network, fire hydrants,
302 Surveying
The objective of collecting geographic data and converting them into useful
information (desired output, e.g., map, table, etc.) by means of a GIS transcends
the traditional boundary of data processing and information management. The main
advantage of GIS is rapid analysis and display of data, with flexibility not possible
using manual methods. GIS does not hold maps or pictures; it holds a database.
That is how it is different from computer mapping, which can produce only good
graphic output. GIS, as we understand it today, is very different from its predecessor.
GIS, earlier used for computer-based applications for map-data processing, is now
an essential component of the information technology infrastructure of modern
society. It is a multidisciplinary science. GIS practitioners may be geographers,
surveyors, planners, or computer engineers. Despite the diversity in approaches, a
special set of skills and knowledge is required by professionals to use GIS in all
its forms and implementations. GIS, today, has become an indispensable tool to
manage land and natural resources, monitor the environment, formulate economic
and community-development strategies, enforce law and order, and deliver social
services.
The main purposes of GIS are:
1. To support decision-making based on spatial data; for example, an
engineering geologist may evaluate slope stability conditions through GIS
for deciding the best new route
2. To support general research
3. To collect, manipulate and use spatial data in database management
4. To produce standardised and customised cartographic production
two different formats in a GIS in the form of: the Cartesian coordinates and the
raster format (e.g., grids). The graphical features are stored with their location
(x, y) defined by latitude and longitude, and attribute data may be qualitative (e.g.,
land use), or quantitative. The location can be represented either by a raster (grid
cell format), or a vector format (polygon). In raster format, a location is defined
by the row and column position of the cell it occupies; the value corresponding to
the location indicates the type of feature. In the vector format, the geographic space
is continuous and the data structure is more representative of the dimensionality
as in a map. For GIS data input, either of the two data formats can be used. The
two data formats, their structure and conversion from one format to the other are
discussed in the sections to follow.
Once the data is transformed into the computer, this data has to be stored to
create a permanent database for further data analysis and manipulation. The digital
map file is stored on a magnetic or optical digital medium. The encoded spatial
data are stored systematically in the form of layers, known as GIS layers. These
layers are archived in the digital format as a geographically-referenced plane in the
GIS database. The database files are stored in the central processing unit (CPU)
memory and can be processed and manipulated.
The computer program that is employed to organise the database is commonly
known as Database Management System (DBMS). The analysed and manipulated
result of the data has to be displayed or presented to the user in a user-specified
format for decision-making purpose. Either the data may be presented as maps,
tables and figures on the screen, or recorded on magnetic media in digital format
or as hardcopy output drawn on printer or plotter.
Not so important; only refer to the devices used.
Geographic Information System (GIS) 305
or mark positions of points and lines with the cursor. As the cursor moves, the
digitiser creates a digital record of its successive positions usually as a sequence
of coordinate pairs.
8.3 Data for GIS
Read all the forms of data used for collecting information.
The prime aspect in the construction of GIS is the acquisition of data. Data can
be gathered directly in the field by an original survey specifically carried out
for the purpose of GIS and is known as captured data. However, it will be very
time consuming and costly and thus rarely resorted to. Alternatively, the data can
be obtained/derived from a source that is already available, such as topographic
maps, digitised maps and plans, aerial photographs, satellite imagery, or directly
from GPS survey. Such data are called encoded data. Data that are obtained by
human intervention, for example, from sketches of landscapes, a questionnaire,
etc., are called the interpreted data. Those which are in a table or in GIS are
called structured or organised data. The remotely sensed images and digital data,
and the extracted information from these are the primary source of modern GIS.
The digital remote sensing data are in raster format and are acquired by sensors
through a scanning device. The device collects the data at successive instants of
time by dividing the field of view in a grid pattern; each grid element is called
a pixel. A sensor records a series of radiometric values for each pixel of each band
it is sensing, and the image is built up by a combination of consecutive pixel lines
(scan lines). Data in GIS may be classified as spatial and non-spatial.
1. Spatial data Also called graphical data, it consists of natural and cultural
features that can be shown with lines or symbols on maps, or that can be seen
2. Line and string data: Lines and strings are obtained by connecting points. A line
connects two points (Fig. 8.4(b)), and a string is a sequence of two or more lines.
Line and string data are formed by features such as highways, railways, canals,
rivers, pipelines, power lines, etc. An arc, which is the locus of points, may be
defined by a spline curve or polynomial mathematical function.
3. Areal data: An area or polygon consists of a continuous space within three or
more connected lines. Examples of areal data (Fig. 8.4(c)) include distribution such
as soil type, parcels (pockets) of land ownership, different types of land cover,
vegetation classes, and other patterns that occupy area at the scale of the GIS.
4. Pixels: These are usually tiny squares that represent the smallest elements into
which a digital image is divided (Fig. 8.4(d)). Continuous arrays of pixels, arranged
in rows and columns, are used to enter data from aerial photos, satellite images,
orthophotos, etc. The distributions of colours or tones throughout the image are
specified by assigning a numerical value to each pixel. Pixel size can be varied
and is specified either in terms of image or object scale. At the image scale,
pixel size may be specified directly (e.g., 0.025 × 0.025 mm) or as a number of
pixels per unit distance (e.g., 10 dots per cm, where a dot corresponds to a pixel).
At the object scale, pixel size is usually expressed directly by dimension (e.g.,
10 m pixel size).
5. Grid cells: These are single elements, usually square, within a continuous
geographic variable. Similar to pixels, their sizes can be varied, with smaller cells
yielding improved resolution. Grid cells (Fig. 8.4(e)) may be used to represent
terrain slopes, soil types, land cover, water-table depths, land values, population
density, etc. The distribution of a given data type within an area is indicated by
assigning a numerical value to each cell. For example, to show soil types in an area,
numerals 1, 2, and 3 may be used to represent sand, silt and clay respectively.
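
The pixel-size conversions and the numeric coding of grid cells described above can be sketched roughly as follows. This is only an illustration: the function names, the 1:400 000 image scale and the small soil grid are assumptions made for the example, not values from the text.

```python
def pixel_size_mm_from_density(pixels_per_cm: float) -> float:
    """Image-scale pixel size in mm for a density given in pixels (dots) per cm."""
    return 10.0 / pixels_per_cm  # 1 cm = 10 mm


def ground_pixel_size_m(pixel_size_mm: float, scale_denominator: float) -> float:
    """Object-scale pixel size in metres for an image at scale 1:scale_denominator."""
    return pixel_size_mm / 1000.0 * scale_denominator


# Soil-type codes as in the text: 1 = sand, 2 = silt, 3 = clay.
SOIL_CODES = {1: "sand", 2: "silt", 3: "clay"}

# Made-up 3 x 3 grid of soil codes, one numeral per cell.
soil_grid = [
    [1, 1, 2],
    [1, 2, 3],
    [2, 3, 3],
]

print(pixel_size_mm_from_density(10))        # 10 dots per cm gives 1.0 mm pixels
print(ground_pixel_size_m(0.025, 400_000))   # hypothetical 1:400 000 image
print(SOIL_CODES[soil_grid[1][2]])           # soil type stored in row 1, column 2
```

Under the assumed scale, a 0.025 mm image pixel corresponds to roughly a 10 m pixel on the ground, which ties the two ways of quoting pixel size together.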
string elements demarcate and locate boundaries of different regions. The river
has been shown by string elements. Tables similar to 8.1 can be constructed and
entered into GIS using the vector format.
The representation of vector data is governed by the scale of the input data.
For example, a building that is represented as a polygon on a large-scale map will
become a point on a medium-scale map, and it will not be represented at all as an
individual entity on a small-scale map (unless it is a very important landmark). The
possibility of representing vector data differently at different scales is associated
with two important concepts: (i) cartographic generalisation, whereby line and
areal objects are represented by coordinates at a larger scale and (ii) cartographic
symbolisation, whereby vector data are represented by different symbols that serve
to visually distinguish them from one another when the data are displayed.
In the computer, vector data can be stored as integers or floating point numbers.
In order to avoid the problem of rounding off errors that occur during data
processing, most GIS software products store vector data by using double-precision
floating point numbers. This creates the impression that vector data are accurate
and precise representations of spatial objects in the real world. However, this is
not necessarily true because the precision of data storage does not always mean
accurate description of the data, and also, the boundaries of many spatial objects
are fuzzy rather than exact entities. Thus, storing vector data by double-precision
floating point numbers does not improve the quality of the data, but simply
serves to avoid degradation of data quality due to rounding errors during data
processing.
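
A small sketch of the rounding issue mentioned above, assuming a hypothetical projected easting of about five million metres. It rounds the value to single precision using the standard `struct` module and compares the result with the double-precision original.

```python
import struct


def to_float32(value: float) -> float:
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", value))[0]


easting = 5123456.789  # hypothetical coordinate in metres
stored_single = to_float32(easting)
error_m = abs(stored_single - easting)

print(f"double precision: {easting!r}")
print(f"single precision: {stored_single!r}")
print(f"rounding error  : {error_m:.3f} m")
```

At this magnitude single precision can only resolve steps of about half a metre, so the stored coordinate shifts by a fraction of a metre before any processing has taken place. Double precision avoids that loss, but, as the text notes, it does not make the surveyed position itself any more accurate.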
For the input of raster data, first the region of interest is subdivided into a
network of cells of uniform size and shape (regular, square or rectangular). The
linear dimensions of each cell define the spatial resolution of data or the precision
with which the data is represented. Thus, the size of an individual pixel or cell
is determined by the size of the smallest object in the geographic space to be
represented. The size is also known as the minimum mapping unit (MMU). A
general rule is that the grid size should be less than half the size of the MMU.
Once the grid cell size has been decided, each grid cell is assigned a value,
which can be an integer, a floating point number, or a character (a code value).
Raster data along with their characteristics are shown in Fig. 8.8(a). The values marked
indicate the quantity, or characteristics of the spatial object, or phenomenon that
is found at the location of the cell. The input of the vector counterparts of this
raster data is also shown in Fig. 8.8(b). The value 3 has been used to classify the
raster cells according to land use—the road—at the given location. The remaining
cells are filled with 0 indicating that no identity is present at that location. There
are four methods available in the literature for the input of the vector counterparts
of the raster data: the dominant method, the precedence method, the presence/absence
method and the per cent occurrence method (see Section 8.9).
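
The four cell-coding methods named above are detailed in Section 8.9, which is not part of these notes. The sketch below therefore relies on the commonly used meanings of the terms, which is an assumption: dominant = the category covering the largest share of the cell, precedence = the highest-priority category present, presence/absence = 1 if the category occurs at all, per cent occurrence = the share of the cell it covers. The cell contents and the priority order are made up.

```python
from typing import Dict, List, Optional


def dominant(cover: Dict[str, float]) -> str:
    """Category occupying the largest share of the cell."""
    return max(cover, key=cover.get)


def precedence(cover: Dict[str, float], priority: List[str]) -> Optional[str]:
    """First category in the priority list that occurs in the cell at all."""
    for category in priority:
        if cover.get(category, 0) > 0:
            return category
    return None


def presence_absence(cover: Dict[str, float], category: str) -> int:
    """1 if the category occurs anywhere in the cell, otherwise 0."""
    return 1 if cover.get(category, 0) > 0 else 0


def percent_occurrence(cover: Dict[str, float], category: str) -> float:
    """Percentage of the cell area covered by the category."""
    total = sum(cover.values())
    return 100.0 * cover.get(category, 0) / total if total else 0.0


# Hypothetical cell: percentage of its area under each category.
cell = {"building": 60, "road": 10, "open land": 30}

print(dominant(cell))                                        # building
print(precedence(cell, ["road", "building", "open land"]))   # road
print(presence_absence(cell, "water"))                       # 0
print(percent_occurrence(cell, "road"))                      # 10.0
```

The same cell can thus receive different codes depending on the method chosen, which is why the coding rule needs to be recorded along with the raster layer.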
In a raster database, values pertaining to different characteristics at the same
cell location are stored in separate files (map layers). For example, a road and
forest cover for the same area are stored as separate road and forest data layers.
When the data are used for processing, the appropriate layers are retrieved. This
means that raster data processing always involves the use of multiple raster files,
in the same way different layers are used in vector data processing.
When a specific raster layer is displayed, it is shown as a two-dimensional
matrix of grid cells. In computer storage, the raster data are stored as a linear
array of attribute values. Since the dimension of the data (the number of rows
and columns) is known, the location of each cell is implicitly defined by its row
and column numbers. There is no need to store the coordinates of the cell in the
data file. The locations of the cells can be computed when the data are used for
display and analysis.
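
A minimal sketch of how cell locations can be recovered from a linear array, assuming row-major storage with rows counted from the top of the image. The function names, the grid origin and the cell size are illustrative assumptions.

```python
def cell_index(row: int, col: int, n_cols: int) -> int:
    """Position of a cell in the linear array (row-major order)."""
    return row * n_cols + col


def cell_position(index: int, n_cols: int) -> tuple:
    """Row and column of the cell stored at a given array position."""
    return divmod(index, n_cols)


def cell_centre(row: int, col: int, x_min: float, y_max: float, cell_size: float) -> tuple:
    """Ground coordinates of the cell centre, given the top-left corner of the grid."""
    x = x_min + (col + 0.5) * cell_size
    y = y_max - (row + 0.5) * cell_size
    return x, y


print(cell_index(2, 3, n_cols=6))             # 15
print(cell_position(15, n_cols=6))            # (2, 3)
print(cell_centre(0, 0, 0.0, 60.0, 10.0))     # (5.0, 55.0)
```

Only the attribute values are stored; the coordinates are computed on demand from the row and column numbers.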
In order to translate linear array storage to a two-dimensional display, enough
information must be stored in the header section of the data file as well. In general,
the file header contains information about the number of bits used to represent
the value in each cell, the number of rows and columns, the type of image, the
legend, the name or the colour palette (if the file uses one), and the name of the
look-up table (if the file uses one). Some file headers also contain parameters for
coordinate transformation so that raster data in the files can be georeferenced.
This is, however, a system-dependent feature. The cells in each line of the image
(Fig. 8.8(a)) are mirrored by an equivalent row of numbers in the file structure
(Fig. 8.8(c)). The first line of the file structure indicates to the computer that the
file consists of 6 rows and 6 columns and that the maximum cell value is 3.
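
Reading such a file can be sketched as follows. The text layout (a comma-separated header line followed by space-separated rows) mirrors the example of Fig. 8.8(c); the function name `read_raster` and the validation checks are assumptions added for illustration, not a real GIS file format.

```python
RASTER_TEXT = """6, 6, 3
0 3 0 0 0 0
0 3 0 0 0 0
0 3 3 3 0 0
0 0 0 3 0 0
0 0 0 3 3 0
0 0 0 0 3 0
"""


def read_raster(text: str):
    """Parse the header (rows, columns, maximum value) and rebuild the 2-D grid."""
    lines = [line for line in text.splitlines() if line.strip()]
    n_rows, n_cols, max_value = (int(part) for part in lines[0].split(","))

    # The body is treated as one linear array of values.
    values = [int(v) for line in lines[1:] for v in line.split()]
    if len(values) != n_rows * n_cols:
        raise ValueError("number of values does not match the header")
    if max(values) > max_value:
        raise ValueError("cell value exceeds the maximum given in the header")

    # Cut the linear array into rows using the column count from the header.
    grid = [values[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]
    return n_rows, n_cols, max_value, grid


n_rows, n_cols, max_value, grid = read_raster(RASTER_TEXT)
road_cells = sum(row.count(3) for row in grid)
print(n_rows, n_cols, max_value, road_cells)
```

The header is what allows the flat list of numbers to be folded back into the correct number of rows and columns.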
Raster data files are stored in different file formats. The differences between
these file formats are due mainly to the different algorithms used to compress
the raster data files. In order to minimise the data-storage requirements, raster
data are often stored in compressed form. The data are decompressed ‘on-the-fly’
when they are used by an application program. The raster model (the geometrical
arrangement of the figures covering the surface) is best employed to represent
geographic phenomena that are continuous over a large area.
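
The text does not name a particular compression algorithm. As an assumption, the sketch below uses run-length encoding, one simple and widely used scheme for raster rows that contain long runs of identical values. The sample row is taken from the 6 × 6 road raster of Fig. 8.8.

```python
def rle_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    runs = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return [(value, count) for value, count in runs]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original list of values."""
    values = []
    for value, count in runs:
        values.extend([value] * count)
    return values


row = [0, 3, 3, 3, 0, 0]
encoded = rle_encode(row)
print(encoded)                      # [(0, 1), (3, 3), (0, 2)]
print(rle_decode(encoded) == row)   # True
```

Decoding restores the row exactly, which is what allows an application to decompress the data on the fly without losing information.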
[Fig. 8.8(a) and (c): raster cell values and the corresponding file structure]
6, 6, 3
0 3 0 0 0 0
0 3 0 0 0 0
0 3 3 3 0 0
0 0 0 3 0 0
0 0 0 3 3 0
0 0 0 0 3 0
1. A vector database can depict point data as points which can be positioned
accurately. However, a raster database can depict point data only at the
level of the detail of a single cell. This leads to loss of accuracy; for
example, a cell can show the presence of a tower within it but cannot
show the tower’s placement within the cell. Of course, the cell size would affect the
depiction but a raster database cannot be as accurate as a vector database.
It may further be noted that some points may represent quantitative
characteristics, e.g., amount of rainfall or elevation, but this information
cannot be included in the raster format.
2. A vector database can show a line data in exact and fine detail, whereas
a raster database can show the same line as a zigzag or a comparatively
smoother line depending upon the resolution of the cell.
3. A vector database provides details and exact/fine boundaries among areal
patterns, e.g., land cover. However, in case of a raster database, the accuracy
is lost for the reason explained earlier.
4. Discrete quantitative data such as population, which are grouped/associated
with an area, are best depicted in finer detail by polygons (vector
format). However, continuous data such as topographic elevation/contours,
represented by a network of equally spaced observations, can probably be
most directly presented by a raster format.
5. A vector database is best suited to represent various natural/artificial
features and also these can be presented mathematically (coordinates).
This makes the vector format conceptually more complex than the raster
format.
6. A vector database requires less storage space on the computer as compared
to a raster database for the same information. Also, the vector formats are
more accurate and present a finer detail of shapes and sizes as compared
to the raster format.
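
The loss of positional detail for point data can be illustrated with a short sketch. It assumes a hypothetical grid whose top-left corner is at (0, 60) with 10 m cells, and a tower at a made-up position; the function name is likewise an assumption.

```python
import math


def snap_to_cell(x, y, x_min, y_max, cell_size):
    """Return the (row, col) of the cell containing a point and the distance
    between the true position and the centre of that cell."""
    col = int((x - x_min) // cell_size)
    row = int((y_max - y) // cell_size)
    centre_x = x_min + (col + 0.5) * cell_size
    centre_y = y_max - (row + 0.5) * cell_size
    return row, col, math.hypot(x - centre_x, y - centre_y)


# Vector form: the tower is stored at its exact coordinates.
tower = (37.0, 12.0)

# Raster form: only the containing cell is known.
row, col, offset = snap_to_cell(*tower, x_min=0.0, y_max=60.0, cell_size=10.0)
print(row, col)            # 4 3
print(round(offset, 2))    # distance from the cell centre, in metres
```

Whatever the true position within the cell, the raster records only the row and column, so the positional uncertainty can be as large as half the cell diagonal. A smaller cell size reduces this uncertainty but does not remove it.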
Some of the other notable disadvantages of raster data are coding of a cell with
a single value (category) whereas many features may be present in it; recording of
spatial objects only to the nearest cell which may not correspond/match in reality
and also may not exist in reality, e.g., watershed; and coarser resolution of spatial
features leading to inaccurate representation.
Some of the notable advantages of raster format are use of simpler computer
programs for data manipulation; ideally suitable for a variety of spatial analysis
functions, for example, overlay, buffering and network analysis; direct use of
remote-sensing data which are in this format; use of available image processing
software for refining raster images; and in some data types (soil, boundary, wet
land, built-up area, etc.) which are relatively vague, its use does not significantly
degrade the inherent accuracy of the data.
the value B or 1, as shown in Fig. 8.11(d), because the larger part (more than 50
per cent) of the cell is occupied by buildings.
arranged in many ways, and unless the organisation scheme is suitable for the
application at hand, useful information cannot be easily extracted. Schemes for
organising data are sometimes called data models (structures); the vector and
the raster models have already been discussed in the previous sections. Data models organise
observations both by spatial and non-spatial attributes. Thus, data organisation has
a fundamental importance.
2. Visualisation: Visualisation is achieved in GIS with colour and by specialised
methods using perspective, shadowing and other means. The graphical capabilities
of computers are exploited by transforming a table of data, for example, into a
visual display through which the spatial associations can be visualised. Complex
relationship probably can be better understood by visual display rather than from
a table of data. Further a visual display can be manipulated to give alternative
views/representation of the data, thereby enhancing the capability to analyse the
anomalies and patterns through GIS. Visual display is obtained either on the video
monitor or other output devices such as colour printers.
3. Combination: The ability to merge spatial data sets from quite different sources,
together with their manipulation and subsequent display, can often lead to an understanding and
interpretation of spatial phenomena that are simply not apparent when individual
spatial data types are considered in isolation. The data measuring activity combines
image data for a certain geographic area with other reference data of the same area.
The GIS operator may overlay multiple images of this area at different dates, a
technique used for identifying changes over time, for example, monitoring of forest
fire or spreading of disease in tree species. The process of combining layers of
spatial data is sometimes called data integration and can be carried out either by
visualising composite displays of various kinds, or with integration models that
effectively create a new map from two or more existing maps.
4. Prediction: Prediction is one of the purposes of GIS. For example, a number
of data layers indicating population data in different regions of a city along with
the growth patterns and civic facilities might be combined together to predict the
future population at the desired time in different parts of the city. Such a map
may then be used as a basis for making city development decisions. Prediction
may sometimes also be a research exercise to explore the outcome of making a
particular set of assumptions, often with the purpose of examining the performance
of a model.
5. Queries: Since GIS is a decision support system, performing queries on a GIS
database to retrieve information (data) is its essential part. Queries offer a method
of data retrieval, and can be performed on data that are part of the GIS database,
or on new data produced as a result of data analysis. These are useful at all stages
of GIS analysis for checking the quality of data and the results obtained. A GIS
typically stores spatial and non-spatial (also called aspatial or attribute) data in
two separate files. The GIS has the capability to search and display spatial data based
on attribute criteria and vice versa. Accordingly, there are two general types of
query that can be performed with GIS: spatial and aspatial. Aspatial queries are the
questions about the attributes of features. ‘How many nursing homes are there?’
is an aspatial query since neither the question nor the answer involves analysis
of the spatial component of data. This query could be performed by database
software alone. A question requiring information about ‘where’ is a spatial query.
This requires linking the data sets using location as the common key. A GIS has
the capacity to satisfy the following queries:
Only read the types of queries.
(a) About location: What exists at a particular location? The location of the
particular region can be described in many ways using place name, post or pin
code, or geographic reference, such as latitude and longitude.
(b) Condition: This query requires spatial analysis to give an answer. Instead of
identifying what exists at a certain location, one seeks to find a location where
certain conditions are satisfied.
(c) Pattern: This query is more sophisticated and important as one might want to
know how many anomalies there are within an area over time.
(d) Trend: This query might involve both location and conditions and seeks to find
differences within an area over a period of time.
(e) Modelling: This query is posed to determine what happens if some addition or
changes are done in the existing network, e.g., to determine the extent and level of
contamination in an area if some toxic substance seeps into the ground water and
thence to the local water supply. For answering these queries, both geographic and
other information and possibly even scientific laws may be required. These queries
require efficient search of data items and capability for deriving their geometric
and topological attributes.
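
The difference between aspatial and spatial queries can be sketched with a small in-memory table. The records, field names and the search location below are all made up for illustration; a real GIS would run such queries against its attribute and spatial databases.

```python
import math

# Hypothetical attribute table with a location for each feature.
nursing_homes = [
    {"name": "Home A", "beds": 40, "x": 120.0, "y": 80.0},
    {"name": "Home B", "beds": 75, "x": 900.0, "y": 650.0},
    {"name": "Home C", "beds": 55, "x": 200.0, "y": 150.0},
]

# Aspatial queries: only attributes are involved.
how_many = len(nursing_homes)
large_homes = [h["name"] for h in nursing_homes if h["beds"] >= 50]


def homes_near(homes, x, y, radius):
    """Spatial query: which homes lie within `radius` of the location (x, y)?"""
    return [h["name"] for h in homes if math.hypot(h["x"] - x, h["y"] - y) <= radius]


print(how_many)                                        # 3
print(large_homes)                                     # ['Home B', 'Home C']
print(homes_near(nursing_homes, 150.0, 100.0, 100.0))  # ['Home A', 'Home C']
```

The first two answers could come from database software alone, whereas the last one needs the coordinates, that is, location acts as the key.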
6. Reclassification: Although query is the most widely used function to retrieve
data from a GIS database, irrespective of the vector or raster model, reclassification
can also be used in place of query in the raster model. Consider a land-use image
from which we require to extract information on areas of schools. The answer
to this query could be obtained by creating a new coverage that eliminates all
unnecessary data. Reclassification would result in a new image. For example, in
a raster image, if cells representing schools in the original image had a value of
30, a set of rules for the reclassification could be
(a) Cells with values 30 (schools) should take the new value of 1.
(b) Cells with values other than 30 should take the new value of 0.
Such a reclassification will generate a new image with all schools coded with
1, and all the rest coded with 0. The resulting reclassified image is very useful
for land use/land cover and environmental studies.
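
The two reclassification rules above can be sketched as follows. The code value 30 for schools comes from the text; the other land-use codes in the sample grid are made up.

```python
def reclassify(grid, target=30):
    """Return a new grid: cells equal to `target` become 1, all others become 0."""
    return [[1 if value == target else 0 for value in row] for row in grid]


# Hypothetical land-use image; 30 marks schools.
land_use = [
    [10, 10, 30],
    [20, 30, 30],
    [20, 20, 10],
]

schools = reclassify(land_use)
for row in schools:
    print(row)
```

The original image is left unchanged; the result is a new layer in which only the schools stand out.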
Important part, read it thoroughly
8.11 Neighbourhood Functions
There is a range of functions available in GIS that allow a spatial entity to influence
its neighbours, or the neighbours to influence the character of an entity. The most
common examples are buffering, proximity analysis and filtering.
1. Buffer operation: Buffering is the creation of a zone of interest around an entity.
Buffering is possible in both vector and raster GIS. In the vector case, the result
is a new set of objects, while the result in the raster case is the classification of
cells according to whether they lie inside or outside the buffer. Buffers are very
useful for analysing landscapes, highway alignments, water supply networks and
drainage studies.
In most GIS data analysis, there is more than one method of achieving an
answer to a question. The trick is to find the most efficient method, and the most
appropriate analysis. For example, the question, ‘Which nursing homes are within
300 m of a main road?’ could be approached in a number of ways. One option
would be, first, to produce a buffer zone identifying all land up to 300 m from
the main road; and then, to find out which nursing homes fall within this buffer
zone using a point-in-polygon overlay. Then another query can be made to find the
names of the nursing homes. An alternative approach would be to measure
the distance from each nursing home to a main road, and then to identify those
which are less than 300 m away. Repeated measurement of distances from nursing
homes to roads could be time consuming and prone to human error. Thus, the first
approach using buffering would be more appropriate.
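
When the distance measurement is automated rather than done by hand, both approaches reduce to the same test: a point lies inside the 300 m buffer of a road exactly when its shortest distance to the road is at most 300 m. The sketch below assumes the road is a polyline in metric coordinates; the road geometry and the nursing-home positions are made up.

```python
import math


def point_segment_distance(px, py, ax, ay, bx, by):
    """Shortest distance from point P to the line segment AB."""
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Parameter of the projection of P onto AB, clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_road(px, py, road):
    """Shortest distance from a point to a polyline given as a list of vertices."""
    return min(
        point_segment_distance(px, py, *road[i], *road[i + 1])
        for i in range(len(road) - 1)
    )


def homes_within(homes, road, limit=300.0):
    """Names of the homes lying within `limit` metres of the road."""
    return [name for name, (x, y) in homes.items()
            if distance_to_road(x, y, road) <= limit]


main_road = [(0.0, 0.0), (1000.0, 0.0), (1000.0, 1000.0)]
homes = {
    "Home A": (500.0, 200.0),
    "Home B": (400.0, 450.0),
    "Home C": (1250.0, 1100.0),
}
print(homes_within(homes, main_road))   # ['Home A', 'Home C']
```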
Conceptually, buffering is very simple but involves complex computational
operation. If a point is buffered, a circular zone is created. Buffering lines and
areas creates new areas (Fig. 8.13). Creating buffer zones around point features is
the easiest operation; a circle of the required radius is simply drawn around each
point. However, creating buffer zones around line and area features is a little
more complicated. Some GIS do this by placing a circle of the required radius
at one end of the line or area boundary to be buffered. This circle is then moved
along the length of the segment. The path that the edge of the circle tangential
to the link makes is used to define the boundary to the buffer zone. Sometimes,
there may be a need for another buffer around a buffer. This is called a doughnut
buffer.
2. Proximity analysis: While buffer zones are often created with the use of one
command or option in vector GIS, a different approach is used in many raster
GISs. Here, proximity is calculated which results in a new raster data layer where
the attribute of each cell is a measure of distance. This is known as proximity
analysis.
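
A brute-force sketch of raster proximity analysis, assuming that non-zero cells mark the feature (here a road) and that distances are measured between cell centres. The grid, the 10 m cell size and the function names are illustrative assumptions; practical GIS software uses faster distance-transform algorithms.

```python
import math


def proximity(grid, cell_size=1.0):
    """New layer in which each cell holds the distance to the nearest non-zero cell."""
    sources = [(r, c) for r, row in enumerate(grid)
               for c, value in enumerate(row) if value != 0]
    if not sources:
        raise ValueError("the input layer contains no feature cells")
    result = []
    for r, row in enumerate(grid):
        new_row = []
        for c in range(len(row)):
            nearest = min(math.hypot(r - sr, c - sc) for sr, sc in sources)
            new_row.append(nearest * cell_size)
        result.append(new_row)
    return result


def buffer_from_proximity(distance, radius):
    """Classify cells as inside (1) or outside (0) a buffer of the given radius."""
    return [[1 if d <= radius else 0 for d in row] for row in distance]


road = [
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
distance = proximity(road, cell_size=10.0)
print([row[0] for row in distance])            # [10.0, 0.0, 10.0, 20.0]
print(buffer_from_proximity(distance, 10.0))   # last row falls outside the buffer
```

Thresholding the distance layer gives the raster equivalent of a buffer, with cells classified as inside or outside.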
3. Filtering: Data filtering involves the recalculation/reallotment of cells in a raster
image based on the characteristics of neighbours. Filtering is a technique used for
the processing of remotely sensed imagery. Filtering will change the value of a
cell based on the attributes of neighbouring cells. The filter is defined as a group
of cells around a target cell. The size and shape of the filter are determined by
the operator. Common filter shapes are squares and circles, and the dimensions of
the filter determine the number of neighbouring cells used in the filtering process.
The filter is passed across the raster data set (Fig. 8.14) and used to recalculate
the value of the target cell that lies at its centre. The new value assigned to the
target cell is calculated using one of a number of algorithms. Examples include
the maximum cell value within the filter and the most frequent value. The raster
data obtained from a classified satellite image may require filtering to ‘smooth’
the noisy (erratic/fuzzy) data caused by high spatial variability in vegetation cover
or problems with the data collection device.
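
The two example algorithms mentioned above (maximum value and most frequent value) can be sketched with a square moving window. The window is simply truncated at the image edges and ties in the most-frequent filter go to the value met first; both are assumptions made to keep the example short, as is the sample image.

```python
from collections import Counter


def apply_filter(grid, reducer, radius=1):
    """Pass a square window of the given radius over the grid and recompute
    each target cell from the values inside the window."""
    n_rows, n_cols = len(grid), len(grid[0])
    result = []
    for r in range(n_rows):
        new_row = []
        for c in range(n_cols):
            window = [
                grid[rr][cc]
                for rr in range(max(0, r - radius), min(n_rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(n_cols, c + radius + 1))
            ]
            new_row.append(reducer(window))
        result.append(new_row)
    return result


def most_frequent(window):
    """Most common value in the window (ties go to the value met first)."""
    return Counter(window).most_common(1)[0][0]


# Hypothetical classified image with one noisy cell (7) in the middle.
noisy = [
    [2, 2, 2],
    [2, 7, 2],
    [2, 2, 2],
]
print(apply_filter(noisy, most_frequent))   # the isolated 7 is smoothed away
print(apply_filter(noisy, max))             # the maximum filter spreads the 7
```

The choice of algorithm decides whether an isolated value is removed or emphasised.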
8.12.1 Vector Overlay Capabilities
Vector map overlay relies heavily on the two associated disciplines of geometry and
topology. The overlaid data layers need to be topologically correct so that lines
meet at nodes and all polygon boundaries are closed. To create topology for a new
data layer produced as a result of the overlay process, the intersections of lines and
polygons from the input layers need to be calculated using geometry. The three
main types of vector overlay (point-in-polygon, line-in-polygon and
polygon-on-polygon) are shown in Fig. 8.16. The overlay of two or more data
layers representing simple spatial features results in a more complex output layer.
This will contain more polygons, more intersections and more line segments than
either of the input layers.
The point-in-polygon overlay is used to find out the polygon in which a point
falls. For example, using the point-in-polygon overlay, it is possible to find out
in which land-use polygon each of the fire stations is located. Figure 8.16(a)
illustrates this overlay process. On the output map, a new set of fire station points
is created with additional attributes describing land use.
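
A point-in-polygon overlay can be sketched with the well-known ray-casting test: a horizontal ray is drawn from the point, and an odd number of boundary crossings means the point is inside. The text does not say which algorithm a GIS uses, so this choice is an assumption, and the land-use polygons and fire-station coordinates are made up.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test; `polygon` is a list of (x, y) vertices in order."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge cross the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def overlay_points(points, polygons):
    """Attach to each point the label of the first polygon containing it (or None)."""
    return {
        name: next((label for label, poly in polygons.items()
                    if point_in_polygon(x, y, poly)), None)
        for name, (x, y) in points.items()
    }


land_use = {
    "residential": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "industrial": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
fire_stations = {"Station 1": (3, 4), "Station 2": (15, 5), "Station 3": (25, 5)}

print(overlay_points(fire_stations, land_use))
```

Each station gains a land-use attribute, matching the output described for Fig. 8.16(a); a station outside every polygon gets no value.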
The line-in-polygon overlay is more complicated. Suppose that it is required
to know the parts of the roads passing through the new and old city areas. To do
this, we need to overlay the road data on a data layer containing city polygons.
The output map will contain roads split into smaller segments representing roads
in new city areas and those in the old city areas. Topological information must
be retained in the output map (Fig. 8.16(b)); therefore this is more complex than
either of the two input maps. The output map will contain a database record of
each new road segment.
The polygon-on-polygon overlay of Fig. 8.16(c) could be used to examine
the area of market in new/old city. Two input data layers, a market data layer
contained in city polygons and the market boundary layer, are required. Three
different outputs could be obtained which are shown in Fig. 8.16(c) and are
presented below:
1. The output data layer could contain all the polygons from both the input
maps. In this case, the question posed is ‘Where are areas of market or
areas which are within the new/old city?’ This corresponds to the Boolean
OR operation, or in mathematical set terms, UNION.
2. The output data layer could contain the whole of the market area, and
the city area within this. The boundary of the market would be used as
the edge of the output map, and city areas would be cut away if they fall
outside it. This operation is referred to as ‘cookie cutting’. It is equivalent
to the mathematical IDENTITY operation. The questions being answered
are ‘Where is the market boundary, and which areas of city are within
this?’ This overlay might be used for calculation of the percentage of the
area of the city covered by the market.
3. The output data layer could contain areas that meet both the criteria; that
is, area that is both market and within the new city. An output map would
be produced showing the whole of the new city polygons that are entirely
covered by the market, with the new city polygons that cross the market
boundary ‘cut’ away. This is the mathematical INTERSECT operation,
and the output map shows where the two input layers intersect. ‘Where are
market areas within the new city area?’ is the question being answered.
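The three outputs can be sketched with plain Python sets, under the simplifying assumption that each polygon is approximated by the set of grid cells it covers. The cell coordinates and the names market and new_city are made up for illustration; a real vector overlay works on polygon geometry and rebuilds topology.

```python
# Hypothetical example: each polygon is approximated by the set of
# (row, col) grid cells it covers. This is a simplification, not a
# real vector overlay.
market = {(r, c) for r in range(0, 3) for c in range(0, 3)}    # 3 x 3 block
new_city = {(r, c) for r in range(1, 5) for c in range(1, 5)}  # 4 x 4 block

# 1. UNION (Boolean OR): everything that is market or city.
union = market | new_city

# 2. IDENTITY ('cookie cutting'): keep the whole market extent and record,
#    for each market cell, whether it also falls inside the city.
identity = {cell: (cell in new_city) for cell in market}

# 3. INTERSECT (Boolean AND): only cells that are both market and city.
intersect = market & new_city

# Share of the city covered by the market (cell count as a stand-in for area).
share_of_city = len(intersect) / len(new_city)

print(len(union), len(identity), len(intersect), share_of_city)
```

Counting cells stands in for measuring polygon area here, which is how the percentage of the city covered by the market could be estimated.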
8.12.2 Raster Overlay Capabilities
In the raster data structure, everything is represented by grid cells: a point is
represented by a single cell, a line by a string of cells and an area by a group of
cells. A raster map overlay introduces the idea of map algebra or mapematics.
Using map algebra, input data layers may be added, subtracted, multiplied or
divided to produce output. Mathematical operations are performed on individual
cell values from two or more input layers to produce an output value. Thus, the
most important consideration in raster overlay is the appropriate coding of point,
line and area features in the input data layers.
Consider five of the data layers of a hill station that have been registered and
are as follows.
Layer                              Code
1. Location of nursing home        1
2. Road                            2
3. Agriculture land                3
4. Land use
   (i) Habitat                     1
   (ii) Water                      2
   (iii) Agriculture land          4
   (iv) Forest                     5
5. Hill Station                    10
On all data layers, ‘0’ is the value given to cells that do not contain features of
interest.
To find out how many nursing homes are contained within the hill station, an
operation equivalent to the vector point-in-polygon overlay is required. The two
data layers may be added as shown in Fig. 8.17(a). The output map would contain
cells with the following values:
1. 0 for cells outside the hill station boundary and without nursing homes
2. 1 for cells containing nursing homes, but outside the hill station
boundary
3. 10 for cells inside the hill station boundary, but without nursing homes
4. 11 for cells inside the hill station boundary and containing nursing
homes
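The addition can be sketched in a few lines of Python. The two 4 x 4 grids below are invented for illustration; only the codes (1 for a nursing home, 10 for the hill station, 0 elsewhere) come from the text.

```python
# Hypothetical 4 x 4 rasters; the cell values follow the coding in the text:
# nursing home = 1, hill station = 10, 0 = no feature of interest.
nursing_home = [
    [0, 0, 0, 1],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [1, 0, 1, 0],
]
hill_station = [
    [0, 0, 0, 0],
    [10, 10, 10, 0],
    [10, 10, 10, 0],
    [0, 10, 10, 0],
]

def add_layers(a, b):
    """Cell-by-cell addition of two registered rasters of equal size."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

overlay = add_layers(nursing_home, hill_station)

# Cells coded 11 are nursing homes inside the hill station boundary.
inside = sum(row.count(11) for row in overlay)
for row in overlay:
    print(row)
print("nursing homes inside hill station:", inside)
```

Because the two codes cannot sum to the same value in two different ways, every output value (0, 1, 10 or 11) identifies a single combination of inputs.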
To know about the sections/parts of roads that pass through forest area, an
operation equivalent to the vector line-in-polygon method (Fig. 8.17(b)) is required.
This would require the roads data layer, and reclassified version of the land use
map that contain only forest area. The two data layers will be added.
The output map would contain cells with the following values:
1. 0 for cells with neither roads nor forest present;
2. 2 for cells with roads, but outside forest areas;
3. 5 for cells with forest present, but roads absent;
4. 7 for cells with both forest and roads present.
If the value ‘2’ for a road was added to land-use codes, the new value for a cell
could be the same as that for another land use type (for example, a road value
of 2 + water value of 2 = 4, which is the same as the value here for agriculture land).
Thus, the coding of raster images used in overlay is very important, and frequently users
employ Boolean images (using only codes 1 and 0) so that algebraic equations will produce
a meaningful answer.
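A short Python sketch of the coding problem and of the Boolean alternative is given here. The 3 x 3 grids are invented; the codes are those listed for the hill station layers. Multiplying two Boolean images is used as the logical AND, which is one common choice rather than the only one.

```python
# Hypothetical 3 x 3 rasters using the codes from the text:
# road = 2; land use: habitat = 1, water = 2, agriculture = 4, forest = 5.
roads = [
    [2, 0, 0],
    [2, 2, 0],
    [0, 2, 0],
]
land_use = [
    [2, 4, 4],
    [5, 5, 4],
    [1, 5, 5],
]

# Adding roads to the raw land-use codes is ambiguous: road (2) + water (2)
# gives 4, the same code as agriculture land with no road.
raw_sum = [[r + u for r, u in zip(rr, uu)] for rr, uu in zip(roads, land_use)]

# Reclassify both layers to Boolean images (1 = feature present, 0 = absent).
road_bool = [[1 if v == 2 else 0 for v in row] for row in roads]
forest_bool = [[1 if v == 5 else 0 for v in row] for row in land_use]

# Multiplying Boolean images is a logical AND: 1 only where both are present.
road_in_forest = [[a * b for a, b in zip(ra, rb)]
                  for ra, rb in zip(road_bool, forest_bool)]

print(raw_sum[0])       # first row of the ambiguous sum
print(road_in_forest)
```

In the first row of raw_sum a road over water and plain agriculture land cannot be told apart, whereas the Boolean product marks only the cells where a road crosses forest.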
8.14.1 Errors Arising from Our Understanding and Modelling of Reality
Errors can originate from the ways in which we perceive, study and model reality.
These errors can be termed conceptual errors, since they are associated with the
representation of the real world for study and communication.
The different ways in which people perceive reality can have effects on how
they model the world using GIS. The perception of reality influences the definition
of reality, and in turn the use of spatial data. This can create real errors and often
gives rise to inconsistencies between data collected by different surveyors, maps
drawn by different cartographers, and databases created by different GIS users.
In geography, and GIS, spatial models are used to reflect reality. The main
models in use are raster, vector, object-oriented and layer-based. All of these spatial
models have limitations when it comes to portraying reality. For instance, the
raster model assumes that all real-world features can be represented as individual
cells. This is clearly not the case. The vector model assumes that all features can
be given a single coordinate or a collection of Cartesian coordinates. The world
is actually made up of physical and biological materials, which are, in turn, made
up of molecular and submolecular matter grouped into complex systems linked
by flows of energy and materials (solids, liquids and gases). Whatever GIS model
we adopt, it is a simplification of this reality, and any simplification of reality will
include errors of generalisation, completeness and consistency.
8.14.2 Errors in Source Data for GIS
The models of reality in GIS are built from a variety of data sources including
survey data, remotely sensed and map data. All sources of spatial and attribute
data for GIS are likely to include errors.
Survey data can contain errors due to mistakes made by people operating
the equipment or recording the observations, or due to technical problems with
the equipment.
Remotely sensed and aerial photography data could have spatial errors if they
were spatially referenced wrongly, and mistakes in classification and interpretation
would create attribute errors.
Maps are probably the most frequently used sources of data for GIS. Maps
contain both relatively straightforward spatial and attribute errors caused by
human or equipment failings, and more subtle errors, introduced as a result of the
cartographic techniques employed in the map-making process. Generalisation is
one cartographic technique that may introduce errors.
8.14.3 Errors in Data Encoding
Data encoding is the process by which data are transferred from some non-GIS
source, such as the paper map, satellite image or survey, into a GIS format.
The method of data encoding, and the conditions under which it is carried out,
are perhaps the greatest source of error in most GIS. Digitising, both manual
and automatic, is an important method of data entry. Despite the availability of
hardware for automatic conversion of paper maps into digital form, much of the
digitising of paper maps is still done using a manual digitising table. Manual
digitising is recognised by researchers as one of the main sources of error in GIS;
however, digitising error is often largely ignored.
Sources of error within the digitising process are many, but may be broken
down into two main types: source map error and operational error. Operational
errors are those introduced and propagated during the digitising process. Human
operations can compound errors present in an original map and add their own
distinctive error signature.
Automatic digitising, like manual digitising, requires correct registration of the
map document before digitising commences, but there the similarity ends. By far
the most common method of automatic digitising is the use of a raster scanner.
This input device suffers from the same problems regarding resolution as the
raster data model.
8.14.4 Errors in Data Editing and Conversion
After data encoding is complete, cleaning and editing are almost always required.
These procedures are the last line of defence against errors before the data are
used for analysis. Of course, it is impossible to spot and remove all the errors, but
many problems can be eliminated by careful scrutiny of the data.
A different problem occurs when automated techniques are used to clean raster
data. The main problem requiring attention is ‘noise’—the misclassification of cells.
Noise can be easy to spot where it produces a regular pattern, such as striping. At
other times, it may be more difficult to identify as it occurs as randomly scattered
cells. These noise errors can be rectified by filtering the raster data to reclassify
single cells or small groups of cells by matching them with general trends in the
data. The ‘noisy’ cells are given the same value as their neighbouring cells.
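One simple filter of this kind can be sketched as follows. It is an illustrative assumption, not a method prescribed here: a cell is treated as noise only if none of its neighbours shares its value, and it is then given the most common neighbouring value. The function name and the sample grid are hypothetical.

```python
from collections import Counter

def remove_isolated_cells(grid):
    """Reclassify any cell whose value appears in none of its neighbours.

    Such a cell is given the most common value among its (up to 8)
    neighbours. The input grid is not modified.
    """
    rows, cols = len(grid), len(grid[0])
    cleaned = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            neighbours = [
                grid[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if neighbours and grid[r][c] not in neighbours:
                cleaned[r][c] = Counter(neighbours).most_common(1)[0][0]
    return cleaned

# Hypothetical classified raster: 5 = forest, 4 = agriculture;
# the single 2 (water) cell in the forest block is treated as noise.
noisy = [
    [5, 5, 5, 4],
    [5, 2, 5, 4],
    [5, 5, 4, 4],
    [5, 5, 4, 4],
]
cleaned = remove_isolated_cells(noisy)
for row in cleaned:
    print(row)
```

A filter like this removes randomly scattered single cells but would also erase genuine one-cell features, so the result still needs checking against the source data.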
After cleaning and editing data it may be necessary to convert the data from
vector to raster or vice versa. During vector-to-raster conversion both the size of
the raster and the method of rasterisation used have important implications for
positional error and, in some cases, attribute uncertainty. The smaller the cell size,
the greater is the precision of the resulting data. Finer raster sizes can trace the
path of a line more precisely and therefore help to reduce classification error—a
form of attribute error. Positional and attribute errors as a result of generalisation
are seen as classification error in cells along the vector polygon boundary. The
conversion of data from raster to vector format is largely a question of geometric
conversion; however, certain topological ambiguities can occur, such as where
differently coded raster cells join at corners.
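The effect of cell size on positional error can be illustrated with a small sketch. It assumes the simplest rasterisation rule, in which a vertex is moved to the centre of the cell that contains it; actual rasterisation methods differ. Under that assumption the shift can never exceed half the cell diagonal. The coordinates and function names are hypothetical.

```python
import math

def snap_to_cell_centre(x, y, cell_size):
    """Return the centre of the raster cell that contains point (x, y)."""
    col = math.floor(x / cell_size)
    row = math.floor(y / cell_size)
    return ((col + 0.5) * cell_size, (row + 0.5) * cell_size)

def max_positional_error(cell_size):
    """Worst-case shift when a point is moved to its cell centre:
    half the cell diagonal."""
    return cell_size * math.sqrt(2) / 2

# Hypothetical vertex of a digitised line, in metres.
point = (103.0, 47.0)
for cell_size in (50.0, 10.0, 1.0):
    cx, cy = snap_to_cell_centre(*point, cell_size)
    shift = math.hypot(cx - point[0], cy - point[1])
    print(cell_size, (cx, cy), round(shift, 2),
          round(max_positional_error(cell_size), 2))
```

The worst-case shift shrinks in direct proportion to the cell size, which is the sense in which a finer raster gives greater precision.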
8.14.5 Errors in Data Processing and Analysis
Errors may be introduced during the manipulation and analysis of the GIS database.
GIS users must ask themselves questions before initiating a GIS analysis. For
example: Are the data suitable for this analysis? Are they in a suitable format? Are
the data sets compatible? Are the data relevant? Will the output mean anything?
Is the proposed technique appropriate to the desired output? These questions
may seem obvious but there are many examples of inappropriate analysis. These
include the inappropriate phrasing of spatial queries, overlaying maps which have
different coordinate systems, combining maps which have attributes measured in
incompatible units, using maps together that have been derived from source data of
widely different map scales, and using an exact and abrupt method of interpolation
to interpolate approximate and gradual point data.
GIS operations that can introduce errors include the classification of data,
aggregation or disaggregation of area data and the integration of data using overlay
techniques.
Classification errors also affect raster data. Satellite images
provide a reflectance value for each pixel within a specific wavelength range
or spectral band (for example, red, near infrared or microwave). Raster maps of
environmental variables, such as surface cover type, are derived by classifying
each pixel in the image according to typical reflectance values for the range of
individual cover types present in the image. Error can occur where different land
cover types have similar reflectance values and where shadows cast by terrain,
trees or buildings reduce the reflectance value of the surface. Careful choice of
classification method can help to reduce this type of error.
Where a certain level of spatial resolution or a certain set of polygon
boundaries are required, data sets that are not mapped with these may need to
be aggregated or disaggregated to the required level. This is not a problem if the
data need to be aggregated from smaller areas into larger areas, provided that
the smaller areas nest hierarchically into the larger areas. Problems with error
do occur, however, if we wish to disaggregate our data into smaller areas or
aggregate into larger non-hierarchical units. The information required to decide
how the attribute data associated with the available units aggregate into the larger
but non-nested units or disaggregate into a set of smaller units, rarely exists.
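The difference between the two cases can be sketched in Python. The ward and district names, the population figures and the area fractions are all invented for illustration. Aggregation of nested units is an exact sum; the disaggregation shown uses area weighting, which is only one possible assumption.

```python
# Hypothetical population counts for small areas (wards) and a lookup
# giving the larger district each ward nests into.
ward_population = {"W1": 1200, "W2": 800, "W3": 1500, "W4": 500}
ward_to_district = {"W1": "North", "W2": "North", "W3": "South", "W4": "South"}

def aggregate(values, parent_of):
    """Sum values of small areas into the larger areas they nest into."""
    totals = {}
    for area, value in values.items():
        parent = parent_of[area]
        totals[parent] = totals.get(parent, 0) + value
    return totals

district_population = aggregate(ward_population, ward_to_district)
print(district_population)

# Disaggregation has no such exact rule. Splitting a ward between two
# non-nested zones in proportion to area assumes the population is spread
# evenly, which is an assumption, not information held in the data.
w3_area_share = {"ZoneA": 0.6, "ZoneB": 0.4}   # assumed area fractions
w3_estimate = {zone: ward_population["W3"] * share
               for zone, share in w3_area_share.items()}
print(w3_estimate)
```

The district totals are exact because every ward belongs to one district, while the zone estimates are only as good as the even-spread assumption behind them.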
Error arising from map overlay in GIS is a major concern and has
correspondingly received much attention in the GIS literature. This is primarily
because much of the analysis performed using GIS consists of the overlay of
categorical maps (where the data are presented in a series of categories). GIS
allows the quantitative treatment of these data (for example, surface interpolation
or spatial autocorrelation), which may be inappropriate. Map overlay in GIS
uses positional information to construct new zones from input map layers using
Boolean logic or ‘mapematics’. Consequently, positional and attribute errors
present in the input map layers will be transferred to the output map, together
with additional error introduced by multiplicatory effects and other internal
sources. Data output from a map overlay procedure are only as good as the
worst data input to the process.
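A rough numerical illustration of the multiplicatory effect is given here. It assumes that errors in the input layers are independent and that a cell of the output is correct only if it is correct in every input layer; both are simplifying assumptions, and the accuracy figures are invented.

```python
def combined_accuracy(layer_accuracies):
    """Probability that a cell is correct in every input layer,
    assuming the errors in the layers are independent."""
    result = 1.0
    for p in layer_accuracies:
        result *= p
    return result

# Hypothetical per-layer accuracies (fraction of correctly classified cells).
accuracies = [0.95, 0.90, 0.80]
overall = combined_accuracy(accuracies)
print(round(overall, 3), "vs worst single layer", min(accuracies))
```

Under these assumptions the overlay can be no more accurate than its worst input, and it is usually less accurate.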
Perhaps the most visible effect of positional error in vector map overlay is
the generation of sliver polygons. Slivers (or ‘weird’ polygons) occur when two
maps containing common boundaries are overlaid. If the common boundaries in
the two separate maps have been digitised separately, the coordinates defining the
boundaries may be slightly different as the result of digitising error. When the maps
are overlaid, a series of small, thin polygons will be formed where the common
boundaries overlap (Fig. 8.18). Slivers may also be produced when maps from
two different scales are overlaid. Of course, sliver polygons can and do occur by
chance, but genuine sliver polygons are relatively easy to spot by their location
along common boundaries and their physical arrangement as long thin polygonal
chains.
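Because slivers are small and thin, a simple shape test can flag candidates for inspection. The sketch below uses the shoelace formula for area and the compactness ratio 4πA/P², which is 1 for a circle and approaches 0 for long thin shapes. The two threshold values and the sample polygons are illustrative assumptions and would need tuning to the map units and scale.

```python
import math

def area_and_perimeter(polygon):
    """Shoelace area and perimeter of a simple polygon given as (x, y) vertices."""
    area2 = 0.0
    perimeter = 0.0
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        area2 += x1 * y2 - x2 * y1
        perimeter += math.hypot(x2 - x1, y2 - y1)
    return abs(area2) / 2.0, perimeter

def compactness(polygon):
    """4*pi*A / P**2: 1.0 for a circle, close to 0 for long thin shapes."""
    area, perimeter = area_and_perimeter(polygon)
    return 4.0 * math.pi * area / perimeter ** 2

def is_probable_sliver(polygon, max_area=5.0, max_compactness=0.1):
    """Flag small, thin polygons. Both thresholds are illustrative assumptions."""
    area, _ = area_and_perimeter(polygon)
    return area <= max_area and compactness(polygon) <= max_compactness

sliver = [(0, 0), (100, 0), (100, 0.02), (0, 0.03)]   # long and thin
field = [(0, 0), (40, 0), (40, 30), (0, 30)]          # ordinary parcel
print(is_probable_sliver(sliver), is_probable_sliver(field))
```

A test of this kind only suggests candidates; their location along a common boundary remains the more reliable sign of a genuine sliver.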
8.14.6 Errors in Data Output
From the preceding discussion it should be clear that every GIS database will
contain errors. In addition, further errors will be introduced during manipulation
and analysis of the data. Therefore, it is inevitable that all GIS output, whether
in the form of a paper map or a digital database, will contain inaccuracies. The
extent of these inaccuracies will depend on the care and attention paid during the
construction, manipulation and analysis of the databases. It is also possible that
errors can be introduced when preparing GIS output.
find information about the country or city they are about to visit. Increasingly,
business people rely on GIS to identify locations in which to set up new shops
and to determine the best routes to deliver their goods and services. At the same
time, GIS has become an indispensable tool for government officials to manage
land and natural resources, monitor the environment, formulate economic and
community development strategies, enforce law and order and deliver social
services. Major application areas in GIS are listed in Table 8.2.
8.16.1 Integrated Land and Water Information System (ILWIS) 3.1
ILWIS integrates image, vector and thematic data in one unique and powerful
package on the desktop. ILWIS delivers a wide range of features including import/
Urban surveys    19. Updating a land use map with oblique air photos
                 20. Analysis of urban change and spatial pattern
                 21. Analysis of suitability for urban expansion