NSM2016 Spatial Data Management
Geir-Harald Strand
Norwegian Institute of Bioeconomy Research
All content following this page was uploaded by Geir-Harald Strand on 24 August 2016.
1. Introduction
Much of the official statistics produced today is based on primary data with direct or
indirect spatial reference. The reference can be an explicit coordinate, but is more
likely to be an address, a cadastral unit or an administrative region. References of
this kind allow the observations to be linked by location, providing for new
approaches to data collection, new opportunities for data analysis and more use of
maps as a visualization and communication tool. An example from the statistical
community is the increased use of geographical grids as a framework for spatial
statistics (Strand & Bloch 2009, Fujimoto et al. 2015).
Spatially referenced data can be visualised as thematic maps, but also cross-linked
by location and analysed with respect to spatial aspects. The potential is probably
not fully utilized. This assertion can be illustrated using an example. Part of the
Norwegian economic statistics for agriculture is based on a detailed review of the
accounts from 910 farms. A possible application of these statistical data is to examine
the difference in the economic results of sheep-farmers in areas with and without the
Fortunately, the cadastral property identification code for each farm has been
recorded. This is an asset, because the National Agricultural Administration (NAA)
maintains a database where the cadastral property identification code of every farm
holding in Norway is kept, together with key geographical data (including a
representative point location). The location of each farm in the survey is thus
obtained by connecting the two data sets and retrieving the locations from the NAA
database. When this is done, the survey data can be linked (by position) to a digital
map of the management areas for large carnivores. The latter data set is produced and
maintained by the National Environmental Administration and available as part of the
National Spatial Data Infrastructure. Through this link, the survey farms can be
classified and analysed according to their location: inside or outside the management
areas for large carnivores.
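The two linkage steps described above can be illustrated with a minimal sketch. It assumes that the survey records, the NAA point locations and the management areas are available as plain Python structures; the field name cadastral_code, the sample codes and the coordinates are hypothetical, and the ray-casting test stands in for whatever point-in-polygon routine a GIS or spatial database would provide.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
Polygon = List[Point]  # ring of (x, y) vertices, first vertex not repeated


def point_in_polygon(pt: Point, poly: Polygon) -> bool:
    """Ray casting: count how many edges a ray from pt towards +x crosses."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Only edges that straddle the horizontal line through pt can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def classify_farms(survey: List[dict],
                   locations: Dict[str, Point],
                   management_areas: List[Polygon]) -> Dict[str, str]:
    """Link survey farms to point locations by cadastral code, then by position."""
    result: Dict[str, str] = {}
    for farm in survey:
        code = farm["cadastral_code"]
        pt: Optional[Point] = locations.get(code)  # step 1: join on the shared key
        if pt is None:
            result[code] = "unknown"
        elif any(point_in_polygon(pt, area) for area in management_areas):
            result[code] = "inside"                # step 2: spatial link
        else:
            result[code] = "outside"
    return result


# Hypothetical sample data, for illustration only.
survey = [{"cadastral_code": "A-1"}, {"cadastral_code": "B-2"}, {"cadastral_code": "C-3"}]
locations = {"A-1": (2.0, 2.0), "B-2": (8.0, 8.0)}
management_areas = [[(0.0, 0.0), (5.0, 0.0), (5.0, 5.0), (0.0, 5.0)]]
classes = classify_farms(survey, locations, management_areas)
```

Farms whose code is missing from the location table are reported as "unknown" rather than silently dropped, so that gaps in the linkage remain visible in the analysis.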
The example shows that there is a potential for a spatial application, but it is not yet
utilized. Such examples are probably abundant. Increased attention to management
of the spatial aspect of statistics is therefore expected to improve the efficiency of
data collection, provide new opportunities for data analysis and improve the
communication of statistics through more and better cartography.
2. Everything is somewhere
“Everything is somewhere” is the catchy title of a geography quiz book for children
(McClintock 1986). This assertion may not quite be true – but much of the primary
data used in official statistics does have a location and can be linked to a place.
People live somewhere and they work somewhere. Accidents happen somewhere.
Goods are produced somewhere and maybe sold somewhere else. They are consequently
transported between those places. Many statistics can be produced without any
knowledge about these locations, but more statistics can be created when the
locations are known – possibly also contributing new and valuable information
(Goodchild 2007).
The spatial references can be direct or indirect. A direct spatial reference is an
explicit coordinate or set of coordinates. The direct spatial reference can be used to
place the observation on a map. The indirect spatial reference is a reference to another
object that possesses the required direct spatial reference. An address, a cadastral unit
or the identification code of an administrative (e.g. NUTSx) region can act as indirect
references. This information cannot, by itself, place an observation on a map. But the
coordinates of the referenced object can be used for this purpose.
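The distinction between direct and indirect references can be sketched as a small data structure. This is an illustration only: the Observation class, the locate function, the lookup table and its keys are hypothetical names introduced here, and the coordinates are arbitrary.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Coordinate = Tuple[float, float]  # assumed (x, y) in some common reference system


@dataclass
class Observation:
    value: float
    coordinate: Optional[Coordinate] = None  # direct spatial reference
    reference: Optional[str] = None          # indirect: key of a located object


def locate(obs: Observation, lookup: Dict[str, Coordinate]) -> Optional[Coordinate]:
    """Return a coordinate for obs, resolving an indirect reference if needed."""
    if obs.coordinate is not None:
        return obs.coordinate
    if obs.reference is not None:
        # The referenced object (address, cadastral unit, region) holds the coordinate.
        return lookup.get(obs.reference)
    return None


# Hypothetical lookup table standing in for an address register or similar.
lookup = {"region-1": (10.0, 60.0), "address-17": (10.5, 59.9)}
direct = Observation(value=1.0, coordinate=(11.0, 61.0))
indirect = Observation(value=2.0, reference="address-17")
unresolved = Observation(value=3.0, reference="address-99")
```

An indirect reference is only as good as the lookup table behind it: the third observation cannot be placed on a map because its key is absent.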
Figure 1: GSBPM steps 4 – 7 with step 5 described as “Data management” instead of “Data processing”
Proper spatial references allow the process to collect data by linkage to other
databases, and use spatial analysis and cartographic communication if needed. Spatial
references allow the data in a particular business process to interact, through spatial
linkages, with spatial data residing in other business processes. The multiple use of
the statistical system for farm accounts, described above, is an example of such
interaction. The prerequisite for interaction is that proper spatial references – direct
or indirect – are obtained in the data collection phase and that they are stored for
later use. Consequently, there is a strong need for (spatial) data management. This is
currently not included in GSBPM (although it could be seen as a variant of phase 5:
Processing data).
Data management is at the core of a spatial data information chain. In principle, the
spatial reference is simply an additional characteristic of an observation. A database
containing spatial data is at first glance only an extension of any ordinary database
structure. Latitude and longitude (or any other spatial reference, e.g. NUTS code,
Data analysis (phase 6 of GSBPM) is where information is extracted from the
database. A database stocked with spatially referenced data allows the analyst to include
the spatial context and relationships in the analysis (e.g. Voss 2007). Survey data can
also be downscaled using small area estimation methodology (Strand & Aune-
Lundberg 2012, Leyk et al. 2013). Finally, the dissemination phase of the GSBPM is
where the results are conveyed to users. Spatial references enable the use of maps as
part of the reporting.
Any information that could be drawn on a map (if the spatial reference were
available) is potentially spatial data. The spatial reference can be direct (by coordinate)
or indirect (by reference to another dataset containing coordinates). Clearly, it is
important to maintain access to key data supporting indirect spatial reference
(cadastres, address registers, NUTS data, grids and postal codes are but a few examples).
A basic property unit is a juridical entity, but does not correspond to the actual
farming unit, defined as an economic entity. A farm can – and frequently will – consist
of several basic property units. This is a changeable relationship, and currently not
represented in the cadastre. Instead, a centralized registry (the farm register) has been
established, connecting basic property units to the operational farm units. The
spatial references in the cadastre are explicit and direct, providing coordinates for
individual plots of land. The spatial references in the farm register are indirect, using
the basic property numbers as references. Since the register stores only these
references and no geometry of its own, any change in the cadastre, e.g. adjusting the
geometry of a parcel boundary, is immediately reflected in the registry as well. As
a consequence of this organization, a particular application can request the registry
to return a list of all the basic property units belonging to a particular farm unit. This
list is, as a next step, used to request a longer list, containing all the parcels for each
basic property unit, from the cadastre.
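The two-step request can be sketched as follows. The dictionaries below are hypothetical stand-ins for the farm register and the cadastre (the real systems are separate services, and their interfaces are not described here); all identifiers are invented for illustration.

```python
from typing import Dict, List

# Hypothetical stand-in for the farm register: farm unit -> basic property units.
farm_register: Dict[str, List[str]] = {"farm-1": ["prop-10", "prop-11"]}

# Hypothetical stand-in for the cadastre: basic property unit -> parcels.
cadastre: Dict[str, List[str]] = {
    "prop-10": ["parcel-a", "parcel-b"],
    "prop-11": ["parcel-c"],
}


def property_units(farm_id: str, register: Dict[str, List[str]]) -> List[str]:
    """Step 1: ask the register which basic property units make up the farm."""
    return list(register.get(farm_id, []))


def parcels_for_farm(farm_id: str,
                     register: Dict[str, List[str]],
                     cad: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Step 2: ask the cadastre for the parcels of each basic property unit."""
    return {unit: list(cad.get(unit, [])) for unit in property_units(farm_id, register)}


before = parcels_for_farm("farm-1", farm_register, cadastre)

# A change made in the cadastre alone is picked up by the next request,
# because the register holds only references, never copies of parcel data.
cadastre["prop-11"].append("parcel-d")
after = parcels_for_farm("farm-1", farm_register, cadastre)
```

The register is never touched when the cadastre changes, which is the practical benefit of keeping the reference indirect.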
Figure 2: Combining cadastral units and land resource map units results in “atomic” spatial elements
(sometimes called Minimal Mapping Units), unique with respect to cadastral as well as land resource
information
The digital land resource “map” is a database representing a partition (in a
mathematical sense) of the land surface. Formally, the data structure is quite similar to
the parcel data held in the cadastre. Each unit in the database is an observation with an
explicit spatial reference (a geometry) and a set of attributes characterizing the area.
A national standardization program has ensured that the reference geometry is
compatible with the cadastre, as well as with all other spatial information held by public
institutions in Norway. Consequently, although the actual shapes of the spatial units
in the cadastre and the land resource map are different, they can be combined by
simple geometrical operations in order to create “atomic” spatial units.
The combination of cadastral units and land resource map units results in “atomic”
spatial units. An “atomic” unit is unique with respect to its ancestors – in this case
the cadastral as well as land resource information (Figure 2). Each “atomic” spatial
unit references a single cadastral unit as well as a single land resource unit. With
reference to the farm registry, as described above, it is now possible to assemble all the
atomic elements that belong to a particular farm unit and compute land resource
statistics (area by land resource class) for the farm unit. The example can of course be
extended to any combination of entities present in a common geographical space.
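The overlay and the subsequent aggregation can be sketched under a strong simplifying assumption: every spatial unit is an axis-aligned rectangle, so that the intersection is trivial to compute. Real cadastral and land resource units are arbitrary polygons and would require a proper polygon overlay; the identifiers, class names and coordinates below are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
AtomicUnit = Tuple[str, str, float]       # (property id, resource class, area)


def intersect(a: Rect, b: Rect) -> Optional[Rect]:
    """Return the overlapping rectangle, or None if the overlap has no area."""
    xmin, ymin = max(a[0], b[0]), max(a[1], b[1])
    xmax, ymax = min(a[2], b[2]), min(a[3], b[3])
    if xmin < xmax and ymin < ymax:
        return (xmin, ymin, xmax, ymax)
    return None


def atomic_units(cadastral: Dict[str, Rect],
                 land_resource: List[Tuple[Rect, str]]) -> List[AtomicUnit]:
    """Overlay the two layers; each piece refers to one unit from each layer."""
    units: List[AtomicUnit] = []
    for prop_id, prop_rect in cadastral.items():
        for res_rect, res_class in land_resource:
            piece = intersect(prop_rect, res_rect)
            if piece is not None:
                area = (piece[2] - piece[0]) * (piece[3] - piece[1])
                units.append((prop_id, res_class, area))
    return units


def area_by_class(farm_props: Set[str], units: List[AtomicUnit]) -> Dict[str, float]:
    """Sum the area of the farm's atomic units per land resource class."""
    totals: Dict[str, float] = defaultdict(float)
    for prop_id, res_class, area in units:
        if prop_id in farm_props:
            totals[res_class] += area
    return dict(totals)


# Hypothetical layers sharing one coordinate system.
cadastral = {"prop-10": (0.0, 0.0, 4.0, 2.0), "prop-11": (4.0, 0.0, 6.0, 2.0)}
land_resource = [((0.0, 0.0, 3.0, 2.0), "arable"), ((3.0, 0.0, 6.0, 2.0), "pasture")]
units = atomic_units(cadastral, land_resource)
farm_stats = area_by_class({"prop-10", "prop-11"}, units)
```

The set of property identifiers passed to area_by_class is what the farm register would supply, so the statistics follow the current composition of the farm without any change to the overlay itself.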
The Norwegian system for farmland statistics uses this approach to combine
spatially referenced data from multiple sources – all using spatial data management and
national geospatial standards to maintain compatibility (Figure 3). Information is
The user initiates the farmland statistics by choosing a farm identification code. The
identification code is used in a request to the central farm register, which will return
a list of the basic property units that constitute the farm. This list is used in a request
to the national cadastre, which returns a list of land parcels – including geometry.
The extreme north, south, east and west coordinates are used to define a bounding
box around the farm.
Figure 3: The Norwegian system for farmland statistics combines data from multiple sources in
order to compile on-the-fly statistical information.
The bounding box is used in a request to the land resources database, which returns
the land resource units falling (at least partially) within the box. This is a potentially
time-consuming operation, and spatial indexing of the database is critical in order to
ensure a rapid response. The organization of the spatial database is thus also an
important aspect of the system.
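The bounding box request and the role of the spatial index can be sketched as below. The uniform grid index is an assumption chosen for brevity, not a description of the index used in the Norwegian system (production databases typically rely on tree-based indexes); the class name GridIndex, the cell size and all sample data are hypothetical.

```python
from collections import defaultdict
from math import floor
from typing import Dict, Iterator, List, Set, Tuple

Point = Tuple[float, float]
BBox = Tuple[float, float, float, float]  # (west, south, east, north)


def bounding_box(parcels: List[List[Point]]) -> BBox:
    """Extreme west, south, east and north coordinates over all parcel vertices."""
    xs = [x for parcel in parcels for x, _ in parcel]
    ys = [y for parcel in parcels for _, y in parcel]
    return (min(xs), min(ys), max(xs), max(ys))


def overlaps(a: BBox, b: BBox) -> bool:
    """True if the boxes share at least a boundary point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class GridIndex:
    """Uniform grid: each unit is registered in every cell its box touches."""

    def __init__(self, cell_size: float) -> None:
        self.cell_size = cell_size
        self.cells: Dict[Tuple[int, int], Set[str]] = defaultdict(set)
        self.boxes: Dict[str, BBox] = {}

    def _cells_for(self, box: BBox) -> Iterator[Tuple[int, int]]:
        c = self.cell_size
        for i in range(floor(box[0] / c), floor(box[2] / c) + 1):
            for j in range(floor(box[1] / c), floor(box[3] / c) + 1):
                yield (i, j)

    def insert(self, unit_id: str, box: BBox) -> None:
        self.boxes[unit_id] = box
        for cell in self._cells_for(box):
            self.cells[cell].add(unit_id)

    def query(self, box: BBox) -> List[str]:
        """Units whose box overlaps the query box, checking nearby cells only."""
        candidates: Set[str] = set()
        for cell in self._cells_for(box):
            candidates |= self.cells.get(cell, set())
        return sorted(u for u in candidates if overlaps(self.boxes[u], box))


# Hypothetical farm parcels and land resource units.
parcels = [[(1.0, 1.0), (3.0, 1.0), (3.0, 2.0)], [(2.0, 4.0), (5.0, 4.0), (5.0, 6.0)]]
farm_box = bounding_box(parcels)

index = GridIndex(cell_size=10.0)
index.insert("u1", (0.0, 0.0, 4.0, 4.0))
index.insert("u2", (20.0, 20.0, 30.0, 30.0))
index.insert("u3", (4.0, 5.0, 12.0, 9.0))
hits = index.query(farm_box)
```

The query inspects only the grid cells touched by the farm's bounding box, so distant units such as u2 are never compared against it. The units returned fall at least partially within the box and still need the exact geometric overlay as a second step.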
The farmland statistics system is possible due to spatial data management. Basic
information is available and each topic is maintained by a particular institution
without redundant copies that create uncertainty regarding data authority. Data are
reused for several purposes, and standardization ensures compatibility between
systems. Another important factor is the organization of a national spatial data
infrastructure facilitating the sharing and exchange of data between public agencies.
6. Conclusion
Efficient and flexible use of spatial information in a statistical production process
following the GSBPM model requires systematic and well-designed data storage
between data collection and analysis. There is wide acceptance of the fact that the
efficiency of the information chain is enhanced when the data storage aspect is
professionalized. It allows better documentation and easier access to data, and also
facilitates multiple uses of the same data. This requires that data management is
included as an element of the “processing” phase in the GSBPM. We maintain that this is
equally true for spatial data. Spatial data management involves a systematic
approach to include spatial data and spatial references in the overall database
management strategy of the information chain.
The methodology as well as the technology needed to build, maintain and use
spatial data management systems are well known and thoroughly tested. The obstacle is
mainly organizational. The Survey and statistics division of NIBIO has developed its
spatial data management system over a period of 20 years. Our experience is that a
number of organizational factors represent the key to successful spatial data
management.
7. References
Egenhofer, M. J., Frank, A. U. and Jackson, J. P. (1989) A topological data model for
spatial databases, In Buchmann, A.P., Günther, O., Smith, T.R. and Wang, Y-F. (eds)
Design and Implementation of Large Spatial Databases, Lecture Notes in Computer
Science, 409: 271-286. Springer Berlin-Heidelberg
Fujimoto, S., Mizuno, T., Ohnishi, T., Shimizu, C. and Watanabe, T. (2015)
Geographic Dependency of Population Distribution. In: Proceedings of the International
Conference on Social Modeling and Simulation, plus Econophysics Colloquium 2014,
151-162, Springer International Publishing.
Leyk, S., Buttenfield, B. P., Nagle, N. N., and Stum, A. K. (2013) Establishing
relationships between parcel data and land cover for demographic small area estimation.
Cartography and Geographic Information Science, 40: 305-315.
Marceau, D. J., Guindon, L., Bruel, M., and Marois, C. (2001) Building temporal
topology in a GIS database to study the land-use changes in a rural-urban environment.
The Professional Geographer, 53: 546-558.
Strand, G-H. (2001) The role of Agriculture and Forestry in a National Geospatial
Data Infrastructure, Third International Conference on Geospatial Information in
Agriculture and Forestry, Denver, Colorado 5-7 November 2001.
Strand, G-H. (2013) The Norwegian area frame survey of land cover and outfield
land resources. Norsk Geografisk Tidsskrift - Norwegian Journal of Geography 67,
24-35.
Strand, G-H. and Bloch, V.V.H. (2009) Statistical grids for Norway. Documentation
of national grids for analysis and visualization of spatial data in Norway.
Documents 2009/9, Statistics Norway, Oslo.
Tomter, S.M., Hylen, G., Nilsen, J.E. (2010) Norway. In: Tomppo, E., Gschwantner, T.,
Lawrence, M., McRoberts, R. (Eds.), National Forest Inventories, Pathways for
Common Reporting. Springer, pp. 411-424.