
Spatial Databases (All Chapters)(Seng 3174) (1)

Uploaded by

dugasagemechu154

Chapter one

Spatial Databases Introduction


1. Introduction
An SDBMS is a software module that
 can work with an underlying DBMS
 supports spatial data models, spatial abstract data types (ADTs) and a query
language from which these ADTs are callable
 supports spatial indexing, efficient algorithms for processing spatial operations, and
domain-specific rules for query optimization
Example: the Oracle Spatial data cartridge and ESRI SDE
 can work with the Oracle 8i DBMS
 have spatial data types (e.g. polygon) and operations (e.g. overlap) callable from the
SQL3 query language
 have spatial indices, e.g. R-trees

 A spatial database is a collection of spatial data types, operators, indices, processing
strategies, etc. and can work with many post-relational DBMSs as well as programming
languages like Java, Visual Basic etc.

 Spatial DB vs. Image/pictorial DB (90s)


 Spatial DB contains objects in the space
 Image DB contains representations of a space (images, pictures...: raster
data)

 Spatial databases provide structures for storage and analysis of spatial data

 Spatial data is composed of objects in multi-dimensional space

 Storing spatial data in a standard database would require excessive amounts of
space

 Queries to retrieve and analyze spatial data from a standard database would be long.

 Spatial databases provide much more efficient storage, retrieval, and analysis of
spatial data

 A spatial database is an ORDBMS that has the ability to store, query, manipulate and
analyze spatial data as well as traditional data formats

 It offers spatial data types/data models/ query language


 Structure in space: e.g., POINT, LINE, REGION
 Relationships among them: (ex:intersects)

 It provides spatial indexing (retrieving objects in a particular area without scanning
the whole space).

 It provides efficient algorithms for spatial joins.

 SDBMS is mainly used for vector format.

 It defines data types for points, lines, polygons, multipoints, multilines, and
multipolygons

 These databases have built-in functions for manipulating spatial data, typically
anywhere from 100 to 300 functions

 The most common are functions for querying data, such as overlap, intersect, touch, etc.

 Also included are geoprocessing functions such as union, merge, buffer, etc.
1.1. Properties of Spatial Database
 A spatial database is optimized to store and query data representing objects. These
are the objects which are defined in a geometric space.
1. It is a database system
2. It offers spatial data types (SDTs) in its data model and query language.
3. It supports spatial data types in its implementation, providing at least spatial
indexing and efficient algorithms for spatial join.
1.2 . Relationship between spatial databases and GIS
What is GIS?
Defining Geographic Information Systems (GIS)
A powerful set of tools for
collecting, storing, retrieving, transforming, and displaying spatial data from the real
world. (Burrough, 1986)
A computerized database management system for
the capture, storage, retrieval, analysis and display of spatial (locationally defined)
data. (NCGIA, 1987)
A decision support system involving the integration of spatially referenced data in a
problem solving environment. (Cowen, 1988)
…intuitive description
 A map with a database behind it.
 A virtual representation of the real world and its infrastructure.
 A consistent “as-built” of the real world, natural and manmade
Which is
 queried to support on-going operations
 summarized to support strategic decision making and policy formulation
 analyzed to support scientific inquiry

What’s in a GIS?
A GIS has three main components:
 a Database Management System;
 a Spatial Analytical Toolkit;
 a Mapping Package.
 GIS technology integrates common database operations (such as query and
statistical analysis) with the unique visualisation and geographic analysis benefits
offered by maps.

Data represents the second key component of GIS technology


• The GIS stores and manages the data not as a map but as a series of layers or, as
they are sometimes called, themes.
• In GIS, attributes are stored with the geographic data.
• SDBMS is a technology used by GIS application (geographic/georeferenced data).
• The database should be viewed as a representation or model of the world developed
for a very specific application
Spatial Databases
 Spatial databases provide structures for storage and analysis of spatial data
 Spatial data is comprised of objects in multi-dimensional space
 Storing spatial data in a standard database would require excessive amounts of
space
 Queries to retrieve and analyze spatial data from a standard database would be long
and cumbersome leaving a lot of room for error
 Spatial databases provide much more efficient storage, retrieval, and analysis of
spatial data
Spatial Databases Uses and Users
 Three types of uses
– Manage spatial data
– Analyze spatial data
– High level utilization
 A few examples of users
– Transportation agency tracking projects
– Insurance risk manager considering location risk profiles
– Doctor comparing Magnetic Resonance Images (MRIs)
– Emergency response determining quickest route to victim
– Mobile phone companies tracking phone usage
How is an SDBMS different from a GIS?
A GIS is software used to visualize and analyze spatial data using spatial analysis functions
 GIS uses SDBMS to store, search, query, share large spatial data sets
SDBMS focuses on
 Efficient storage, querying, sharing of large spatial datasets
 Provides simpler set based query operations
 Example operations: search by region, overlay, nearest neighbor, distance,
adjacency, perimeter etc.
 Uses spatial indices and query optimization to speed up queries over large
spatial datasets.
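The idea of a spatial index (answering a "search by region" query without scanning every object) can be illustrated with a small sketch. The notes mention R-trees; the example below deliberately swaps in a much simpler uniform grid, and the class name GridIndex, the cell size and the sample points are all hypothetical, not part of any real SDBMS.

```python
from collections import defaultdict


class GridIndex:
    """Toy uniform-grid index for 2D points (illustrative only, not an R-tree)."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)  # (col, row) -> list of (name, x, y)

    def _cell(self, x, y):
        # Map a coordinate to the grid cell that contains it
        return (int(x // self.cell_size), int(y // self.cell_size))

    def insert(self, name, x, y):
        self.cells[self._cell(x, y)].append((name, x, y))

    def query(self, xmin, ymin, xmax, ymax):
        """Return names of points in the rectangle, visiting only overlapping cells."""
        c0, r0 = self._cell(xmin, ymin)
        c1, r1 = self._cell(xmax, ymax)
        hits = []
        for col in range(c0, c1 + 1):
            for row in range(r0, r1 + 1):
                for name, x, y in self.cells.get((col, row), []):
                    # Exact check, because a cell may extend beyond the query window
                    if xmin <= x <= xmax and ymin <= y <= ymax:
                        hits.append(name)
        return sorted(hits)


index = GridIndex(cell_size=10)
index.insert("school", 3, 4)
index.insert("clinic", 12, 18)
index.insert("market", 55, 60)
print(index.query(0, 0, 20, 20))
```

Only the grid cells that overlap the query window are inspected, which is the same filtering idea a production index applies at much larger scale.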
SDBMS may be used by applications other than GIS
 Astronomy, Genomics, Multimedia information systems, ...
 Would one use a GIS or an SDBMS to answer the following?
 How many neighboring countries does the USA have?
 Which country has the highest number of neighbors?
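Both questions become simple set-based queries once a "shares a border" relation is available, which is the kind of operation an SDBMS is good at. The sketch below assumes such a relation has already been derived (in a real system it would come from a touch/adjacency predicate on country polygons). The list is a small, deliberately incomplete sample, so the second answer holds only within this sample.

```python
from collections import defaultdict

# Partial, illustrative border relation; each pair shares a land border.
borders = [
    ("USA", "Canada"),
    ("USA", "Mexico"),
    ("Mexico", "Guatemala"),
    ("Mexico", "Belize"),
    ("Guatemala", "Belize"),
]

# Build a symmetric neighbor set for every country in the sample
neighbors = defaultdict(set)
for a, b in borders:
    neighbors[a].add(b)
    neighbors[b].add(a)

# Query 1: how many neighbors does the USA have?
usa_neighbor_count = len(neighbors["USA"])

# Query 2: which country in this sample has the most neighbors?
most_neighbors = max(neighbors, key=lambda country: len(neighbors[country]))

print(usa_neighbor_count, most_neighbors)
```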
EVOLUTION OF ACRONYM “GIS”
 Geographic Information Systems (1980s)
 Geographic Information Science (1990s)
 Geographic Information Services (2000s)

three meanings of the acronym GIS


 Geographic Information Services
 Web-sites and service centers for casual users, e.g. travelers
 Example: Service (e.g. AAA, mapquest, google) for route planning
 Geographic Information Systems
 Software for professional users, e.g. cartographers
 Example: ESRI Arc/View software
 Geographic Information Science
 Concepts, frameworks, theories to formalize use and development of
geographic information systems and services
 Example: design spatial data types and operations for querying

Data types
 The data types explained in this topic include the data types available when
creating a feature class or table with ArcGIS.
 If you store your data in a database or a geodatabase in a database management
system (DBMS), ArcGIS data types and the data types of the DBMS might not
match exactly.
 The types are matched to the closest data type available. This process is referred
to as data type mapping.
 In this process, it is possible that the values will be stored in the DBMS as a
different type, applying different criteria to the data attribute. As a result, the
data type you see in the table or feature class properties in ArcGIS Desktop may
change from what you initially defined.
 Additionally, other data storage formats, such as shapefiles or dbf tables, have
different data type limitations.
 Be sure you know the data type and size limitations of your destination storage
format when moving data between data storage types.
Numbers
You can store numbers in one of four numeric data types:
 Short integer

 Long integer

 Float (single-precision floating-point numbers)

 Double (double-precision floating-point numbers)

Data type                                        Storable range                       Size (bytes)  Applications
Short integer                                    -32,768 to 32,767                    2             Numeric values without fractional values within specific range; coded values
Long integer                                     -2,147,483,648 to 2,147,483,647      4             Numeric values without fractional values within specific range
Float (single-precision floating-point number)   approximately -3.4E38 to 1.2E38      4             Numeric values with fractional values within specific range
Double (double-precision floating-point number)  approximately -2.2E308 to 1.8E308    8             Numeric values with fractional values within specific range
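The byte sizes and integer ranges listed for the numeric types can be checked with a short sketch. It assumes that a short integer corresponds to a 2-byte signed integer, a long integer to a 4-byte signed integer, and float/double to IEEE single and double precision. That mapping is an assumption made here for illustration, and the helper name fits is hypothetical.

```python
import struct


def fits(fmt, value):
    """Return True if value can be packed into the fixed-width struct format."""
    try:
        struct.pack(fmt, value)
        return True
    except (struct.error, OverflowError):
        return False


# Standard-size, little-endian formats: 2-byte int, 4-byte int, 4-byte float, 8-byte float
SHORT, LONG, FLOAT, DOUBLE = "<h", "<i", "<f", "<d"

sizes = {
    "short": struct.calcsize(SHORT),
    "long": struct.calcsize(LONG),
    "float": struct.calcsize(FLOAT),
    "double": struct.calcsize(DOUBLE),
}

print(sizes)
print(fits(SHORT, 32767), fits(SHORT, 32768))
print(fits(LONG, 2147483647), fits(LONG, 2147483648))
```

A value just outside the stated range is rejected, which is why the size limits of the destination format matter when moving data between storage types.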

1.4 Metadata
 Data that provide information about other data.
 Metadata summarizes basic information about data, making finding & working with
particular instances of data easier.
 Metadata can be created manually to be more accurate, or automatically and
contain more basic information.

 Metadata are the most often forgotten data type.

 They are absolutely necessary if you are going to use data, or if someone is going to
use your data later (or your derivative information).

 They contain information about scale, accuracy, projection/datum, data source,
manipulations and how to acquire the data.

Attributes: attribute data are non-spatial characteristics that are connected by
tables to points, lines, “events” on lines, and polygons (and in some cases GRID cells)
 A point, vector or raster geologic map might describe a “rock unit” on a map with a
single number, letter or name, but the associated attribute table might have: age,
lithology, percent quartz, etc, for each rock type on the map.
 most GIS programs can either plot the polygon by the identifier or by one of the
attributes
Spatial Database Elements

 Entity: a phenomenon of interest in reality that is not further subdivided into
phenomena of the same kind, e.g. a city

 Object: a digital representation of all or part of an entity (e.g. a city represented
by a point or a region)

 Entity types: similar phenomena to be stored in a database are identified as entity
types (road, river…)

 Attribute: an attribute is a characteristic of an entity selected for representation.

 Layers: spatial objects can be grouped into layers, also called overlays, coverage or
themes

 Metadata

 Spatial Reference System table


Spatial Database Types
There are two types of spatial databases used
 A Spatial Warehouse
 A spatial warehouse only stores spatial data.
 It has data types defined for vector data but few or no functions to
manipulate the data.
 GIS Spatial Database
 Has data types for vector data
 Some have data types for raster data
 Have functions for manipulation and analysis of the data
 Data types and models

Spatial Data types:


 Point : object represented only by its location in space
 Line : representation of moving through or connections in space
 Region : representation of an extent in 2d-space
 Partition: set of region objects that are required to be disjoint (adjacency or region
objects with common boundaries).
 Networks: embedded graph in plane consisting of set of points (vertices) and lines
(edges) objects, e.g. highways, power supply lines, rivers
Data types and models - spatial type system
EXT={lines, regions}, GEO={points, lines, regions}

 Spatial predicates for topological relationships:

 inside: geo x regions → bool

 intersect, meets: ext1 x ext2 → bool

 adjacent, encloses: regions x regions → bool

 Operations returning atomic spatial data types:

 intersection: lines x lines → points

 intersection: regions x regions → regions

 plus, minus: geo x geo → geo

 contour: regions → lines
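As an illustration of one signature from the type system, the predicate inside (restricted here to a point and a region) can be sketched with the classic ray-casting test. This is a minimal sketch and not how any particular SDBMS implements it. The region is assumed to be a simple polygon given as an ordered list of vertices, and points lying exactly on the boundary are not handled specially.

```python
def inside(point, region):
    """Ray-casting test: does the point fall inside the polygon?

    region is an ordered list of (x, y) vertices of a simple polygon.
    Boundary points are not treated specially in this sketch.
    """
    x, y = point
    result = False
    n = len(region)
    for i in range(n):
        x1, y1 = region[i]
        x2, y2 = region[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            # Count crossings to the right of the point
            if x < x_cross:
                result = not result
    return result


square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(inside((2, 2), square), inside((5, 2), square))
```

An odd number of crossings means the point is inside; an even number means it is outside.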


Spatial Relationship

 Topological relationships: adjacent, inside, disjoint.

 Direction relationships: e.g. above, below, or north_of, southwest_of, …

 Metric relationships: e.g. distance

 valid topological relationships between two simple regions: disjoint, in, touch, equal,
cover, overlap
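The six relationships listed for two simple regions can be made concrete with a sketch restricted to axis-aligned rectangles, written as (xmin, ymin, xmax, ymax). This is a simplification chosen for illustration: "in" and "cover" are treated as plain containment in either direction, and general polygons would need a proper geometry library. The function name relate is hypothetical.

```python
def relate(a, b):
    """Classify the topological relationship of rectangle a to rectangle b."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    if a == b:
        return "equal"
    # No common point at all
    if ax1 < bx0 or bx1 < ax0 or ay1 < by0 or by1 < ay0:
        return "disjoint"
    # Boundaries meet but interiors do not intersect
    if ax1 == bx0 or bx1 == ax0 or ay1 == by0 or by1 == ay0:
        return "touch"
    # a lies within b
    if bx0 <= ax0 and by0 <= ay0 and ax1 <= bx1 and ay1 <= by1:
        return "in"
    # a contains b
    if ax0 <= bx0 and ay0 <= by0 and bx1 <= ax1 and by1 <= ay1:
        return "cover"
    # Interiors intersect, but neither contains the other
    return "overlap"


print(relate((0, 0, 2, 2), (2, 0, 4, 2)))
```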
Data for a GIS comes in three basic forms:
Spatial data:

 maps are made of spatial data, made up of points, lines, and areas.

 Spatial data forms the locations and shapes of map features such as buildings,
streets, or cities.
Spatial data are derived from existing maps or aerial photographs.
Image data—using images to build maps

 Image data includes such diverse elements as satellite images, aerial photographs,
and scanned data=> data that has been converted from paper to digital format

Tabular data:

 Tabular data is information describing a map feature. For example, a map of
customer locations may be linked to demographic information about those
customers.

 A database management system (DBMS) is used to let the user define the
specific data element types and formats and to store attributes.

Data can be classified into two types of data models:

o Vector model:

 displays graphical data as points, lines or curves, or areas with attributes.

 Cartesian coordinates, and computational algorithms applied to those coordinates,
define points in a vector system.

 Lines or arcs are a series of ordered points. Areas or polygons are also stored as
ordered lists of points.
=> Vector data requires less computer storage space, and maintaining topological
relationships is easier in this system.

o Raster data model (the raster view of the world):

 A raster-based system displays, locates, and stores graphical data by using a matrix
or grid of cells.

 These data are two-dimensional; a GIS stores various information such as forest cover,
soil type, land use, wetland habitat, or other data in different layers.

 Continuous numeric values, such as elevation, and continuous categories, such as
vegetation types, are represented using the raster model.
=> Raster data requires less processing than vector data, but it consumes more
computer storage space.
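The two models can be compared on one small, made-up example: the same forest patch stored once as a vector polygon (an ordered list of points) and once as cells of a raster layer. The sketch computes its area both ways. The shoelace formula is a standard technique brought in here for illustration, and all names and values are hypothetical.

```python
def polygon_area(vertices):
    """Shoelace formula for a simple polygon given as ordered (x, y) vertices."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def raster_area(grid, cell_size, value):
    """Count the cells carrying the value and multiply by the area of one cell."""
    cells = sum(row.count(value) for row in grid)
    return cells * cell_size * cell_size


# Vector model: four ordered points describe the whole patch
forest_vector = [(0, 0), (3, 0), (3, 2), (0, 2)]

# Raster model: every cell of the layer stores a value ("F" forest, "W" water)
land_cover = [
    ["F", "F", "F", "W"],
    ["F", "F", "F", "W"],
    ["W", "W", "W", "W"],
]

print(polygon_area(forest_vector), raster_area(land_cover, 1, "F"))
```

The polygon needs four coordinate pairs while the grid stores twelve cells, which hints at the storage trade-off described above. The raster computation, on the other hand, is a simple count.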

How does it work?

 Spatial data is stored using the coordinate system of a particular projection

 That projection is referenced by a spatial reference identifier (SRID)

 This number corresponds to an entry in another table in the database that lists all of
the spatial reference systems used.

 This allows the database to know what projection each table is in, and if need be, re-
project those tables for calculations
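The SRID lookup described above can be sketched with two small dictionaries standing in for database tables. The names spatial_ref_sys and geometry_columns follow a naming commonly seen in spatial databases but are used here only as illustrative assumptions, and the table names "cities" and "roads" are made up. The two SRIDs are the well-known EPSG codes 4326 and 3857.

```python
# Simplified stand-in for the table of spatial reference systems.
spatial_ref_sys = {
    4326: {"name": "WGS 84", "units": "degree"},
    3857: {"name": "WGS 84 / Pseudo-Mercator", "units": "metre"},
}

# Hypothetical catalogue recording which SRID each spatial table uses.
geometry_columns = {"cities": 4326, "roads": 3857}


def reference_system(table):
    """Look up the SRID of a table and resolve it in the reference-system table."""
    srid = geometry_columns[table]
    return srid, spatial_ref_sys[srid]


def needs_reprojection(table_a, table_b):
    """Two tables can be combined directly only if they share an SRID."""
    return geometry_columns[table_a] != geometry_columns[table_b]


print(reference_system("cities"))
print(needs_reprojection("cities", "roads"))
```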
Example of a table in Spatial Database

Linkage of Tabular Attributes to Map Feature
How does GIS link between spatial and non-spatial data?
How does it link a Location Symbol with Its Meaning?
• Every geographic feature has at least one unique means of identification: a name or
number usually just called its ID.
• In other words, locational information is linked to specific information in a database

Chapter two
2. Databases versus database management systems
2.1. Database

 A database is a collection of related, logically coherent data used by the application
programs in an organization.

 A database is a collection of data that can be shared by different users.

 It is a group of records and files that are organized so that there is little or no
redundancy.

 It is a collection of interrelated and persistent data (usually referred to as
the database (DB)).

 A database consists of data in many files. In order to access data from one or more
files easily, it is necessary to have some kind of structure or organization.

 Databases are usually designed to manage large bodies of information. This involves

 Definition of structures for information storage (data modeling).

 Provision of mechanisms for the manipulation of information (file and systems
structure, query processing).

 Providing for the safety of information in the database (crash recovery and security).

 Concurrency control if the system is shared by users.


Characteristics of Database
 Concurrent Use :
• A database system allows several users to access the database concurrently.
• Answering different questions from different users with the same (base) data is a
central aspect of an information system. Such concurrent use of data increases the
economy of a system.
ex. The employees of different branches can access the database concurrently and book
journeys for their clients
 Structured and Described Data
• A fundamental feature of the database approach is that the database system
contains not only the data but also the complete definition and description of these
data.
• These descriptions are basically details about the extent, the structure, the type and
the format of all data and, additionally, the relationship between the data.
 Data Integrity

 Data integrity is a byword for the quality and the reliability of the data of a database
system.
 In a broader sense data integrity includes also the protection of the database from
unauthorized access (confidentiality) and unauthorized changes.
 Data reflect facts of the real world.
 Data Persistence
 Data persistence means that in a DBMS all data is maintained as long as it is not
deleted explicitly.
 The life span of data needs to be determined directly or indirectly by the user and
must not depend on system features.
 Additionally, data once stored in a database must not be lost.
 Changes of a database which are done by a transaction are persistent.
Advantages of databases
 Compared with a flat-file system, a database system has several advantages.
 Less redundancy
 In a flat-file system there is a lot of redundancy. For example, in the flat file system
for a university, the names of professors and students are stored in more than one
file.
 Inconsistency avoidance
If the same piece of information is stored in more than one place, then any changes in the
data need to occur in all places that data is stored.
 Efficiency
 A database is usually more efficient than a flat file system, because a piece of
information is stored in fewer locations.
 Data integrity
 In a database system it is easier to maintain data integrity, because a piece of data is
stored in fewer locations.
 Confidentiality
 It is easier to maintain the confidentiality of the information if the storage of data is
centralized in one location.
Database Management Systems

 Database management systems (DBMSs) are very good at organizing and managing
large collections of persistent data.
 We use DBMSs to help cope with large amounts of data because, when problems get
big, they get hard.
 Consider the task of finding a particular book in a typical university library.
 Now, reconsider that same task if the library doesn’t keep the books arranged in any
particular order or if the library has no indexes.
o Using a big collection of unorganized things is practically
impossible. Structure turns data into information.
o Persistence means that the data exist permanently; they do not disappear
when the computer is shut off.
 DBMSs are like suitcases: they are somewhere to put stuff so that it’s all in one place
and easy to get to.
 DBMSs help protect data from unauthorized access.
 DBMSs help protect data from accidental corruption or loss due to:
o hardware failures such as power outages and computer crashes
o software failures such as operating system crashes
 DBMSs allow concurrent access, meaning that a single data set can be accessed by
more than one user at a time
o virtually all commercial database applications require the data entry staff to
have access to the database simultaneously.
 For example, an airline reservation system cannot restrict access to
the database to a single travel agent.
o concurrent data access introduces unwanted problems caused by two users
manipulating exactly the same data at exactly the same time.
 These problems can cause the database to be corrupted or for a
user’s interface program to never complete its query.
 These problems are analogous to road intersections: if there are no
traffic lights or stop signs, havoc will ensue.

 DBMSs provide mechanisms to prevent concurrent access problems; these
mechanisms are collectively called concurrency control.
 A database management system (DBMS) defines, creates and maintains a
database.
 It comprises a set of application programs used to access, update and manage the data
(which form the data management system (DMS)).
 The goal of a DBMS is to provide an environment that is
both convenient and efficient to use in
o Retrieving information from the database.
o Storing information into the database.
o Allowing controlled access to data in the database.
Advantage of DBMS
Data independence:
 Application programs should be as independent as possible from details of data
representation and storage.
 The DBMS can provide an abstract view of the data to insulate application code from
such details.
Efficient data access:
 A DBMS utilizes a variety of sophisticated techniques to store and retrieve data
efficiently. This feature is especially important if the data is stored on external
storage devices.
Data integrity and security:
 If data is always accessed through the DBMS, the DBMS can enforce integrity
constraints on the data
Data administration:
 When several users share the data, centralizing the administration of data can offer
significant improvements.
 Experienced professionals who understand the nature of the data being managed,
and how different groups of users use it, can be responsible for organizing the data
representation to minimize redundancy and fine-tuning the storage of the data to
make retrieval efficient.
Concurrent access and crash recovery:
 A DBMS schedules concurrent accesses to the data in such a manner that users can
think of the data as being accessed by only one user at a time. Further, the DBMS
protects users from the effects of system failures.
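The effect of concurrency control can be sketched with the airline-reservation scenario: several agents book seats on the same flight at once, and a lock makes each check-and-update step behave as if only one user were accessing the data. This is a minimal in-memory sketch using a thread lock, not the scheduling a real DBMS performs. The class and function names are hypothetical.

```python
import threading


class Flight:
    """Shared record that several booking agents update concurrently."""

    def __init__(self, seats):
        self.seats = seats
        self.booked = 0
        self._lock = threading.Lock()

    def book(self):
        # The lock turns "check availability, then update" into one indivisible step
        with self._lock:
            if self.booked < self.seats:
                self.booked += 1
                return True
            return False


def run_agents(flight, agents, attempts_each):
    """Start several agent threads and return the number of confirmed bookings."""
    successes = []

    def agent():
        for _ in range(attempts_each):
            if flight.book():
                successes.append(1)

    threads = [threading.Thread(target=agent) for _ in range(agents)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(successes)


flight = Flight(seats=100)
confirmed = run_agents(flight, agents=8, attempts_each=50)
print(confirmed, flight.booked)
```

Although 400 booking attempts are made in total, the number of confirmed bookings never exceeds the number of seats.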
Reduced application development time:
 the DBMS supports many important functions that are common to many
applications accessing data stored in the DBMS.
Disadvantages of DBMS
 Danger of an Overkill: For small and simple applications for single users a database
system is often not advisable.
 Complexity: A database system creates additional complexity and requirements. The
supply and operation of a database management system with several users and
databases is quite costly and demanding.
 Qualified Personnel: The professional operation of a database system requires
appropriately trained staff. Without a qualified database administrator nothing will
work for long.
 Costs: Through the use of a database system new costs are generated for the system
itself but also for additional hardware and the more complex handling of the system.
 Lower Efficiency: A database system is a multi-use software which is often less
efficient than specialized software which is produced and optimized exactly for one
problem.
 DBMS is a combination of five components: hardware, software, data, users and
procedures
 The hardware is the physical computer system that allows access to data.
 The software is the actual program that allows users to access, maintain and update
data. In addition, the software controls which user can access which parts of the
data in the database.
 In a DBMS, the term users has a broad meaning. We can divide users into two
categories: end users and application programs.
 The last component of a DBMS is a set of procedures or rules that should be clearly
defined and followed by the users of the database.
Spatial Database Management System
 Spatial Database Management System (SDBMS) provides the capabilities of a
traditional database management system (DBMS) while allowing special storage and
handling of spatial data.
 SDBMS:
– Works with an underlying DBMS
– Allows spatial data models and types
– Supports querying language specific to spatial data types
– Provides handling of spatial data and operations
How do we incorporate spatial data into a computer application system?
Usually by using a relational Data Base Management System (DBMS)

GIS systems traditionally maintain spatial and attribute data separately, then “join” them for
display or analysis.
 The spatial data can be stored in vector or raster format
 Vector format represents data as a series of (X,Y) coordinates
 Raster format represents data as a matrix of columns and rows (pixels, cells)
 Accuracy
 Vector data are accurate and take less storage, but take a long time to capture,
e.g. by digitization
 Raster data are less accurate and take more storage, but take a short time to
capture, e.g. by scanning

Hybrid vs. Integrated Approaches


Hybrid Approach:
stores spatial data and attribute data in different data models (typically relational
data model for attribute data and proprietary data structure for spatial data).

SDBMS Three-layer Structure
 SDBMS works with a spatial application at the
front end and a DBMS at the back end
 SDBMS has three layers:
 Interface to spatial application
 Core spatial functionality
 Interface to DBMS

2.2. Data independence
What is Data Independence of DBMS?
Data Independence is defined as a property of DBMS that helps you to change the Database
schema at one level of a database system without requiring to change the schema at the
next higher level. Data independence helps you to keep data separated from all programs
that make use of it.
Types of Data Independence
In DBMS there are two types of data independence
1. Physical data independence
2. Logical data independence.
1. Physical Data Independence
Physical data independence helps you to separate conceptual levels from the
internal/physical levels.
 It allows you to provide a logical description of the database without the need to
specify physical structures.
 Compared to Logical Independence, it is easy to achieve physical data
independence.
 With physical independence, you can easily change the physical storage structures
or devices without any effect on the conceptual schema.
 Any change done would be absorbed by the mapping between the conceptual and
internal levels.
 Physical data independence is achieved by the presence of the internal level of the
database and then the transformation from the conceptual level of the database to
the internal level.
Examples of changes under Physical Data Independence
Due to physical independence, none of the changes below will affect the conceptual layer.
 Using a new storage device like Hard Drive or Magnetic Tapes
 Modifying the file organization technique in the Database
 Switching to different data structures.
 Changing the access method.
 Modifying indexes.
 Changes to compression techniques or hashing algorithms.
 Change of Location of Database from say C drive to D Drive
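One of the changes listed, modifying indexes, can be tried out with Python's built-in sqlite3 module. In this minimal sketch an index is added at the physical level while the table definition and the query stay untouched, and the query result is the same before and after. The table name, column names and rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO city (name, population) VALUES (?, ?)",
    [("A", 500), ("B", 1500), ("C", 2500)],
)

# The application's query is written against the conceptual schema only
query = "SELECT name FROM city WHERE population > 1000 ORDER BY name"
before = conn.execute(query).fetchall()

# Physical-level change: add an index; the table definition is untouched
conn.execute("CREATE INDEX idx_city_population ON city (population)")
after = conn.execute(query).fetchall()

print(before == after, after)
```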
2. Logical Data Independence

Logical Data Independence is the ability to change the conceptual schema without changing
 External views
 External APIs or programs
Any change made will be absorbed by the mapping between external and conceptual levels.
When compared to Physical Data independence, it is challenging to achieve logical data
independence.
Examples of changes under Logical Data Independence
Due to logical independence, none of the changes below will affect the external layer.
1. Adding, modifying or deleting an attribute, entity or relationship is possible without a
rewrite of existing application programs
2. Merging two records into one
3. Breaking an existing record into two or more records
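The first example (adding an attribute without rewriting application programs) can be sketched with sqlite3 and a view that plays the role of the external level. The application reads only through the view, so a new column in the underlying table does not change what it sees. The table, view and sample row are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, branch TEXT)")
conn.execute("INSERT INTO employee (name, branch) VALUES ('Abebe', 'North')")

# External level: the application only ever reads through this view
conn.execute("CREATE VIEW employee_names AS SELECT id, name FROM employee")


def app_query():
    """Stand-in for an existing application program."""
    return conn.execute("SELECT id, name FROM employee_names").fetchall()


before = app_query()

# Conceptual-level change: a new attribute is added to the table
conn.execute("ALTER TABLE employee ADD COLUMN phone TEXT")
after = app_query()

print(before == after, after)
```

The view definition acts as the mapping between the external and conceptual levels that absorbs the change.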
Importance of Data Independence
 Helps you to improve the quality of the data
 Database system maintenance becomes affordable
 Enforcement of standards and improvement in database security
 You don’t need to alter data structure in application programs
 Permit developers to focus on the general structure of the Database rather than
worrying about the internal implementation
 It helps keep the database in an undamaged, undivided (consistent) state
 Database incongruity (inconsistency) is vastly reduced.
 Modifications at the physical level that are needed to improve the performance
of the system are easy to make.
Summary
 Data Independence is the property of DBMS that helps you to change the Database
schema at one level of a database system without requiring to change the schema at
the next higher level.
 Two levels of data independence are 1) Physical and 2) Logical
 Physical data independence helps you to separate conceptual levels from the
internal/physical levels
 Logical Data Independence is the ability to change the conceptual schema without
changing external views, external APIs or programs
 When compared to Physical Data independence, it is challenging to achieve logical
data independence
 Data Independence Helps you to improve the quality of the data

Chapter three
Database management system architecture
3.1. What is Database Architecture?
A Database Architecture is a representation of DBMS design. It helps to design, develop,
implement, and maintain the database management system.
• DBMS architecture allows dividing the database system into individual components that
can be independently modified, changed, replaced, and altered. It also helps to understand
the components of a database
• Data modeling (data modelling) is the process of creating a data model for the data to be
stored in a database. This data model is a conceptual representation of Data objects, the
associations between different data objects, and the rules.
•Data modeling helps in the visual representation of data and enforces business rules,
regulatory compliances, and government policies on the data.

 Data models ensure consistency in naming conventions, default values, semantics and
security, while ensuring the quality of the data.
• The Data Model is defined as an abstract model that organizes data description, data
semantics, and consistency constraints of data.
• The data model emphasizes what data is needed and how it should be organized
instead of what operations will be performed on data.
• Data Model is like an architect’s building plan, which helps to build conceptual
models and set a relationship between data items.
Why use Data Model?
The primary goals of using a data model are:
• Ensures that all data objects required by the database are accurately represented.
Omission of data will lead to creation of faulty reports and produce incorrect results.
• A data model helps design the database at the conceptual, physical and logical
levels.
• Data Model structure helps to define the relational tables, primary and foreign keys
and stored procedures.
• It provides a clear picture of the base data and can be used by database developers
to create a physical database.
• It is also helpful to identify missing and redundant data.
• Though the initial creation of data model is labor and time consuming, in the long
run, it makes your IT infrastructure upgrade and maintenance cheaper and faster.

Types of Data Models in DBMS
Types of Data Models: There are mainly three different types of data models: conceptual
data models, logical data models, and physical data models, and each one has a specific
purpose. The data models are used to represent the data and how it is stored in the
database and to set the relationship between data items.
Conceptual Data Model:

 This Data Model defines WHAT the system contains. This model is typically created
by Business stakeholders and Data Architects.
 The purpose is to organize, scope and define business concepts and rules.
Logical Data Model:

 Defines HOW the system should be implemented regardless of the DBMS.


 This model is typically created by Data Architects and Business Analysts.
 The purpose is to develop a technical map of rules and data structures.
Physical Data Model:

 This Data Model describes HOW the system will be implemented using a specific
DBMS system.
 This model is typically created by DBA and developers.
 The purpose is actual implementation of the database.

Chapter Four
Database design approaches
4. Database design approaches:
There are two approaches for developing any database: the top-down method and the
bottom-up method. While these approaches appear radically different, they share the
common goal of describing all of the interactions between the processes of the system.
4.1. Top – down design method
The top-down design method starts from the general and moves to the specific. In other
words, you start with a general idea of what is needed for the system and then work your
way down to the more specific details of how the system will interact. This process involves
the identification of different entity types and the definition of each entity’s attributes.

4.2. Bottom – up design method


The bottom-up approach begins with the specific details and moves up to the general. This
is done by first identifying the data elements (items) and then grouping them together in
data sets. In other words, this method first identifies the attributes, and then groups them
to form entities.

Chapter Five
Normalization
5. Normalization:
Normalization is the decomposition of a complex data structure into simple, flat files
(relations). It creates separate files that have common data fields, replacing
the associations represented by pointers and keys in hierarchical and network data models.
• This process allows you to winnow out redundant data within your
database.
• It involves restructuring the tables so that they successively meet higher normal
forms.
• A properly normalized database should have the following characteristics:
– Scalar values in each field
– Absence of redundancy
– Minimal use of null values
– Minimal loss of information
Levels of Normalization
• Levels of normalization are based on the amount of redundancy in the database.
• Various levels of normalization are:
– First Normal Form (1NF)
– Second Normal Form (2NF)
– Third Normal Form (3NF)
– Boyce-Codd Normal Form (BCNF)
– Fourth Normal Form (4NF)
– Fifth Normal Form (5NF)
– Domain Key Normal Form (DKNF)
First Normal Form (1NF)
• A table is in 1NF if all of its fields contain only scalar values (as opposed to
lists of values).
• Example (Not 1NF): a table whose Author and AuPhone columns hold lists of values is
not in 1NF, because those columns are not scalar.
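The conversion to 1NF can be sketched in a few lines of Python. The rows below are invented for illustration, and the sketch assumes that the i-th phone number belongs to the i-th author; only the column names Author and AuPhone come from the notes.

```python
# Hypothetical rows: Author and AuPhone hold lists, so the table is not in 1NF.
books = [
    {"ISBN": "0-11", "Title": "Spatial DB", "Author": ["Abebe", "Chala"],
     "AuPhone": ["111-1111", "222-2222"]},
    {"ISBN": "0-22", "Title": "GIS Basics", "Author": ["Abebe"],
     "AuPhone": ["111-1111"]},
]


def to_1nf(rows):
    """Emit one row per (book, author) pair so that every field is scalar."""
    flat = []
    for row in rows:
        # Assumption: authors and phone numbers are listed in matching order.
        for author, phone in zip(row["Author"], row["AuPhone"]):
            flat.append({"ISBN": row["ISBN"], "Title": row["Title"],
                         "Author": author, "AuPhone": phone})
    return flat


flat_books = to_1nf(books)
for r in flat_books:
    print(r)
```

The flattened table repeats ISBN and Title for each author, which is exactly the redundancy that the higher normal forms then remove.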
Functional Dependencies
• If one set of attributes in a table determines another set of attributes in the table,
then the second set of attributes is said to be functionally dependent on the first set
of attributes.
Example 1
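As an illustration with made-up author data, the sketch below checks whether a functional dependency holds in a set of sample rows. The helper name `holds` and the attribute values are assumptions. Sample rows can refute a dependency, but they cannot prove that it holds for every possible database state.

```python
def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # The same lhs value must always map to the same rhs value.
        if seen.setdefault(key, val) != val:
            return False
    return True


rows = [
    {"AuId": 1, "AuName": "Abebe", "Title": "Spatial DB"},
    {"AuId": 1, "AuName": "Abebe", "Title": "GIS Basics"},
    {"AuId": 2, "AuName": "Chala", "Title": "Spatial DB"},
]

print(holds(rows, ["AuId"], ["AuName"]))  # True: AuId determines AuName
print(holds(rows, ["AuId"], ["Title"]))   # False: one author, several titles
```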

Second Normal Form (2NF)


For a table to be in 2NF, there are two requirements:
– The table is in first normal form
– All non-key attributes in the table must be functionally dependent on the
entire primary key
Note: Remember that we are dealing with non-key attributes
Example 1 (Not 2NF)
Scheme = {Title, PubId, AuId, Price, AuAddress}
1. Key = {Title, PubId, AuId}
2. {Title, PubId, AuId} → {Price}
3. {AuId} → {AuAddress}
4. AuAddress does not belong to a key
5. AuAddress functionally depends on AuId, which is a proper subset of the key, so the
table is not in 2NF
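One way to repair this scheme is to move AuAddress into its own table keyed by AuId. The Python sketch below does this by projection on invented rows; the helper `project` and all values are assumptions used only for illustration.

```python
def project(rows, attrs):
    """Project rows onto attrs and drop duplicate tuples."""
    return sorted({tuple(r[a] for a in attrs) for r in rows})


# Hypothetical data for the scheme {Title, PubId, AuId, Price, AuAddress}.
rows = [
    {"Title": "Spatial DB", "PubId": 1, "AuId": 10, "Price": 50, "AuAddress": "Addis Ababa"},
    {"Title": "GIS Basics", "PubId": 1, "AuId": 10, "Price": 30, "AuAddress": "Addis Ababa"},
    {"Title": "Spatial DB", "PubId": 1, "AuId": 20, "Price": 50, "AuAddress": "Adama"},
]

# Price depends on the whole key; AuAddress depends on AuId alone.
book_price = project(rows, ["Title", "PubId", "AuId", "Price"])
author = project(rows, ["AuId", "AuAddress"])

print(book_price)
print(author)  # each author's address is now stored once
```

After the split, both tables are in 2NF: every non-key attribute depends on the entire key of its own table.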

Third Normal Form (3NF)
This form dictates that every non-key attribute of a table must depend directly on a
candidate key, i.e. there can be no interdependencies among non-key attributes.
For a table to be in 3NF, there are two requirements:
– The table should be in second normal form
– No non-key attribute is transitively dependent on the primary key
Example (Not in 3NF)
Scheme = {Title, PubId, PageCount, Price}
1. Key = {Title, PubId}
2. {Title, PubId} → {PageCount}
3. {PageCount} → {Price}
4. Both Price and PageCount depend on the whole key, hence 2NF
5. {Title, PubId} → {Price} holds only transitively (through PageCount), hence not in 3NF
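The transitive dependency can be made visible by computing an attribute closure, a standard textbook procedure that is not part of these notes. The sketch below is a minimal version; the function name `closure` is an assumption.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already know the left side, we also know the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


fds = [
    ({"Title", "PubId"}, {"PageCount"}),
    ({"PageCount"}, {"Price"}),
]

print(sorted(closure({"Title", "PubId"}, fds)))  # the key reaches Price via PageCount
print(sorted(closure({"PageCount"}, fds)))       # a non-key attribute determines Price
```

Because the non-key attribute PageCount determines Price on its own, a 3NF design would move {PageCount, Price} into a separate table.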
Boyce-Codd Normal Form (BCNF)
• BCNF does not allow dependencies between attributes that belong to candidate
keys.
• BCNF is a refinement of the third normal form that drops the restriction to
non-key attributes: every determinant must be a candidate key.
• Third normal form and BCNF can differ only if all of the following conditions hold:
– The table has two or more candidate keys
– At least two of the candidate keys are composed of more than one attribute
– The keys are not disjoint, i.e. the composite candidate keys share some
attributes
Example 1 - Address (Not in BCNF)
Scheme = {City, Street, ZipCode}
1. Key1 = {City, Street}
2. Key2 = {ZipCode, Street}
3. No non-key attribute, hence 3NF
4. {City, Street} → {ZipCode}
5. {ZipCode} → {City}
6. {ZipCode} → {City} is a dependency between attributes belonging to keys, and ZipCode
alone is not a candidate key, hence not in BCNF
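A BCNF check can be sketched with the usual superkey test: a non-trivial dependency violates BCNF when its left side does not determine the whole scheme. The helper names below are assumptions; the scheme and dependencies are those of the Address example.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(schema, fds):
    """List the non-trivial dependencies whose left side is not a superkey."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not set(rhs) <= set(lhs)
            and not closure(lhs, fds) >= set(schema)]


schema = {"City", "Street", "ZipCode"}
fds = [
    ({"City", "Street"}, {"ZipCode"}),
    ({"ZipCode"}, {"City"}),
]

violations = bcnf_violations(schema, fds)
print(violations)  # only ZipCode -> City is reported
```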

Fourth Normal Form (4NF)
• Fourth normal form eliminates independent many-to-many relationships between
columns (multivalued dependencies).
• To be in Fourth Normal Form,
– a relation must first be in Boyce-Codd Normal Form.
– a given relation may not contain more than one independent multi-valued attribute.
Example (Not in 4NF)
Scheme = {MovieName, ScreeningCity, Genre}
Primary Key: {MovieName, ScreeningCity, Genre}
1. All columns are part of the only candidate key, hence BCNF
2. A movie can have many genres, and many movies can have the same genre
3. A movie can be screened in many cities, and many cities can screen the same movie
4. ScreeningCity and Genre are independent of each other, so the table violates 4NF
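The usual fix is to split the table into one relation per independent fact. The sketch below uses invented movie data to show that joining the two smaller relations reproduces the original rows; all values are hypothetical.

```python
# Hypothetical rows for {MovieName, ScreeningCity, Genre}.
rows = {
    ("Hard Code", "Adama", "Comedy"),
    ("Hard Code", "Adama", "Action"),
    ("Hard Code", "Hawassa", "Comedy"),
    ("Hard Code", "Hawassa", "Action"),
    ("Night Bus", "Adama", "Drama"),
}

# Decompose into two relations, one per independent multi-valued fact.
movie_city = {(m, c) for m, c, _ in rows}
movie_genre = {(m, g) for m, _, g in rows}

# Natural join on MovieName.
rejoined = {(m, c, g)
            for m, c in movie_city
            for m2, g in movie_genre if m == m2}

print(len(rows), len(movie_city), len(movie_genre))
print(rejoined == rows)  # True: the decomposition loses no information here
```

Adding a new screening city for a movie now takes one row in `movie_city` instead of one row per genre.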

Fifth Normal Form (5NF)


• Fifth normal form is satisfied when all tables are broken into as many tables as
possible in order to avoid redundancy. Once it is in fifth normal form it cannot be
broken into smaller relations without changing the facts or the meaning.
Domain Key Normal Form (DKNF)

 The relation is in DKNF when there can be no insertion or deletion anomalies in the
database.

Chapter six
6. ER modelling
6.1. Entity-Relationship Model
• The Entity-Relationship (ER) model is generally attributed to Chen (1976).
 The ER model envisions the world as comprised of entities that are associated with
each other by relationships. All of the entities of a particular type are collected
together into entity sets.
 Entity sets and relationships can be depicted graphically in an ER-diagram.
Entities
 Entities are distinguishable “real-world” objects such as employees, maps, airplanes,
or bus schedules.
o “Distinguishable” means that all entities can be uniquely identified.
o Entities have common attributes that define what it means to be such an
entity.
o Any particular real-world object does not necessarily have a single or best
representation as an entity.
 For any given real-world object, different modelers can choose
different sets of attributes of the object that are of interest to their
particular situation.
 This results in the same object being modeled differently.

 Entities are collected into entity sets.


o Entity sets are depicted as rectangles in ER diagrams.
o Their attributes are depicted as ellipses attached to the rectangles by lines.
Relationships
 A relationship is a list of entity sets.
o Notation: two entity sets A and B that stand in relationship r are
written A r B. See the next bullet for examples.

 Types of relationships (see Figure 1.):


o aggregating relationships:

  • one-one: if A r B and r is one-one, then each entity of B is in
    relationship with at most one entity of A, and vice versa.
    For example, if CAPTAIN commands VESSEL and commands is one-one then, in our
    model, each vessel has at most one captain and each captain commands at most one
    vessel at a time.
  • many-one: if A r B and r is many-one, then each entity of A is in relationship with at
    most one entity of B, but not vice versa.
    For example, if CREW assigned-to VESSEL and assigned-to is many-one then, in our
    model, a vessel has many crew members but a crew member is assigned to only one
    vessel.
  • many-many: if A r B and r is many-many, then each entity of A can be in relationship
    with any number of B entities, and vice versa.
    For example, if VESSEL patrols REGION and patrols is many-many then, in our model,
    a vessel patrols many regions and a region is patrolled by many vessels.
o isa (read “is a”) relationships: if A isa B then A is a specialization of B, or,
conversely, B is a generalization of A.
  • For example, if CAPTAIN isa CREW then, in our model, captains have all the
    attributes of crew members but not vice versa.
  • The isa relationship allows hierarchies to be established among entity sets.
• A relationship is depicted by a lozenge (diamond) with lines connecting it to the
relevant entity sets.
 The Entity-Relationship model lacks an underlying formalism and is, therefore, used
more for general conceptualization than for creating physical models
o (indeed, some authors do not acknowledge the ER model as a data model at
all).
o It is not uncommon for a conceptual design to be expressed in the ER model
and then “translated” into another model for implementation.
Entity-Relationship Modelling

 How to use Entity–Relationship (ER) modelling in database design.

 Basic concepts associated with ER model.

• Diagrammatic technique for displaying ER model using Unified Modelling Language
(UML).

 How to build an ER model from a requirements specification?


Concepts of the ER Model

 Entity types (Rectangles)

 Relationship types (Links)

• Attributes (Names inside rectangles)

• Entity type (Class)

• Group of objects with the same properties, identified by the enterprise as having an
independent existence.

• Entity occurrence (Instance of a Class; Object)

• Uniquely identifiable object of an entity type.


Examples of Entity Types (Nouns; appear in middle of box if alone, at top of box if attributes
are listed)

ER diagram of Staff and Branch entity types

Relationship Types

 Relationship type
 Meaningful associations among entity types.
 i.e., Link between classes
 Name occurs as label on the link
 Arrow indicates direction of relationship type

 Relationship occurrence
 Uniquely identifiable association, which includes one occurrence from each
participating entity type.
 i.e., Link between instances
ER diagram of Branch Has Staff relationship type

6.2 Degree of relationship


 Number of participating entities in relationship.
 Relationship of degree:
 two is binary
 three is ternary
 Four is quaternary.
Binary relationship called POwns

Ternary relationship called Registers (add diamond instead of label)

Quaternary relationship called Arranges (add diamond instead of label)

Chapter seven
Relational databases
7. Relational databases
• Understand that the relational database model’s basic components are entities and
their attributes, and relationships among entities

• Identify how entities and their attributes are organized into tables

• Understand the concept of integrity rules of a relational database


Relational Model

• In the relational data model, the database is represented as a group of related
tables.

• The relational data model was introduced in 1970 by E. F. Codd of IBM, who published
a paper in CACM entitled "A Relational Model of Data for Large Shared Data
Banks".

 It is currently the most popular model. The mathematical simplicity and ease of
visualization of the relational data model have contributed to its success.
Definitions of Terminology

Characteristics of a Relation (table)

 Two-dimensional structure with rows and columns

• A relation represents a single entity type

 Each table must have an attribute to uniquely identify each row

 Column values all have same data type

 Order of the rows and columns is immaterial to the DBMS


Properties of a Relation

 Based on the set theory
1. There are no duplicate tuples (rows).

• The body of the relation is a mathematical set (i.e., a set of tuples), and sets
in mathematics by definition do not include duplicate elements.

• If a "relation" contains duplicate tuples, then it is not a relation.

2. Tuples (rows) are unordered (top to bottom).

• Sets in mathematics are not ordered. So, even if the tuples of a relation A are
listed in reverse order, it is still the same relation.

• Thus, there is no such thing as "the 5th tuple" or "the last tuple".

3. Attributes (columns) are unordered (left to right).

• The heading of a relation is also defined as a set.

• There is no such thing as "the 5th attribute (column)" or "the last attribute". In other
words, there is no concept of positional addressing.

4. All attribute values are atomic.

• At every row-and-column position within the table, there always exists
precisely one value, never a list of values. Equivalently, relations do not
contain repeating groups.

• A relation satisfying this condition is said to be in First Normal Form.


Primary Key

• A PK is an attribute, or collection of attributes, whose values uniquely identify each
tuple in a relation.

• In addition to being unique, a PK must be minimal (contain no unnecessary attributes)
and must not change in value.

• Any attribute, or collection of attributes, that can serve as a PK is called a candidate
key.

• The candidate keys that are not chosen as the PK are called alternate keys.

• Cost of PK
  • SS# vs. finger print
Candidate key and alternate key

• Could an attribute (column) serve as the PK? If so, it is a candidate key.

• Is there a candidate key that was not chosen as the PK? If so, it is an alternate key.
Entity Integrity Rule

 Guarantees that each entity will have a unique identity and ensures that foreign key
values can properly reference primary key values.

 Requirement

 No component of the primary key is allowed to accept nulls.


By "null" here, we mean that information is missing for some reason.
Foreign Key

• An attribute in one table whose values must either match the primary key in another
table or be null.

• Attribute FK of base relation R2 is a foreign key referencing the primary key PK of base
relation R1 if and only if it satisfies the following two time-independent properties:

  • Each value of FK is either wholly null or wholly non-null.

  • Each non-null value of FK is identical to the value of PK in some tuple of R1.
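Both integrity rules can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch: the two-table schema and the rows are invented for the demonstration, and the `PRAGMA` line is specific to SQLite, which enforces foreign keys only when they are switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable FK checks
conn.executescript("""
CREATE TABLE Country (Name TEXT PRIMARY KEY NOT NULL);
CREATE TABLE City (
    Name    TEXT PRIMARY KEY NOT NULL,
    Country TEXT REFERENCES Country(Name)      -- foreign key, may be NULL
);
INSERT INTO Country VALUES ('Canada');
INSERT INTO City VALUES ('Ottawa', 'Canada');  -- matches a primary key value
INSERT INTO City VALUES ('Atlantis', NULL);    -- a null FK is allowed
""")


def rejected(sql):
    """Return True if the DBMS refuses the statement on integrity grounds."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# Referential integrity: 'France' is not a primary key value in Country.
fk_rejected = rejected("INSERT INTO City VALUES ('Paris', 'France')")
# Entity integrity: the primary key may not be null.
pk_rejected = rejected("INSERT INTO Country VALUES (NULL)")
print(fk_rejected, pk_rejected)
```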

7.1 Entity-relationship diagrams


Relationship Types

 Relationships can be categorized by
 cardinality constraints
 other properties, e.g. number of participating entities
• Binary relationship: two entities participate

• Types of cardinality constraints for binary relationships
  • One-One: An instance of one entity relates to at most one instance of the other
    entity, and vice versa.
  • Many-One: Many instances of one entity relate to one instance of the other.
  • Many-Many: Many instances of one entity relate to many instances of the
    other.
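The three cases can be told apart mechanically when a relationship is given as a list of instance pairs. The Python sketch below uses invented captain, crew and vessel names. It reports only the pattern observed in the sample; the real constraint is a design decision recorded in the ER model.

```python
from collections import defaultdict


def cardinality(pairs):
    """Classify a binary relationship given as (a, b) instance pairs."""
    a_to_b, b_to_a = defaultdict(set), defaultdict(set)
    for a, b in pairs:
        a_to_b[a].add(b)
        b_to_a[b].add(a)
    a_single = all(len(bs) == 1 for bs in a_to_b.values())
    b_single = all(len(as_) == 1 for as_ in b_to_a.values())
    if a_single and b_single:
        return "one-one"
    if a_single:
        return "many-one"
    if b_single:
        return "one-many"
    return "many-many"


commands = [("Cpt. A", "V1"), ("Cpt. B", "V2")]
assigned_to = [("Crew1", "V1"), ("Crew2", "V1"), ("Crew3", "V2")]
patrols = [("V1", "North"), ("V1", "South"), ("V2", "North")]

print(cardinality(commands))     # one-one
print(cardinality(assigned_to))  # many-one
print(cardinality(patrols))      # many-many
```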
Exercise: Identify the type of cardinality constraint for the following:
• Many facilities belong to a forest. Each facility belongs to one forest.
• A manager manages 1 forest. Each forest has 1 manager.
• A river supplies water to many facilities. A facility gets water from many
rivers.
ER Diagrams Graphical Notation
• ER diagrams are graphic representations of ER models
• Several different graphic notations are used
• We use a simple notation summarized below
• Example ER Diagram for Forest
• Q? Compare and contrast “Attributes” and “Multi-valued attributes”.

ER Diagram for “State-Park”

• Exercise:
• List the entities, attributes, relationships in this ER diagram
• Identify cardinality constraint for each relationship.
• How many roads “Accesses” a “Forest_stand”? (one or many)
7.2. Data query: Query optimization
What is a query?
 A query is a “question” posed to a database
 Queries are expressed in a high-level declarative manner
• Algorithms needed to answer the query are not specified in the query

 Examples:
 Mouse click on a map symbol (e.g. road) may mean
• What is the name of road pointed to by mouse cursor ?
 Typing a keyword in a search engine (e.g. google, yahoo) means
• Which documents on web contain given keywords?
 SELECT S.name FROM Senator S WHERE S.gender = ‘F’ means
• Which senators are female?
What is a query language?

 A language to express interesting questions about data

 A query language restricts the set of possible queries


Examples:

 Natural language, e.g. English, can express almost all queries

 Computer programming languages, e.g. Java,


• can express computable queries
• however, algorithms to answer the query are needed

 Structured Query Language(SQL)


• Can express common data intensive queries
• Not suitable for recursive queries

 Graphical interfaces, e.g. web-search, mouse clicks on a map


• can express only a few different kinds of queries
An Example World Database

 Purpose: Use an example database to learn query language SQL

 Conceptual Model

 3 Entities: Country, City, River

 2 Relationships: capital-of, originates-in

 Attributes listed in Figure 3.1

An Example Database - Logical Model


• 3 Relations
Country(Name, Cont, Pop, GDP, Life-Exp, Shape)
City(Name, Country, Pop,Capital, Shape)
River(Name, Origin, Length, Shape)
• Keys
• Primary keys are Country.Name, City.Name, River.Name
• Foreign keys are River.Origin, City.Country
• Data for 3 tables
World database data tables

Query Optimization
• Why optimize?
  • A spatial operation can be processed using different strategies
  • The computation cost of each strategy depends on many parameters
• Query optimization is the process of
  • ordering operations in a query and
  • selecting an efficient strategy for each operation
  • based on the details of a given dataset
• Example Query:
SELECT S.name FROM Senator S, Business B
WHERE S.soc-sec = B.soc-sec AND S.gender = ‘Female’
• Optimization decision examples
• Process (S.gender = ‘Female’) before (S.soc-sec = B.soc-sec )
• Do not use index for processing (S.gender = ‘Female’)
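The first decision above (apply the selection before the join) can be illustrated by counting comparisons in a toy nested-loop evaluation. The Python sketch below is not how a real optimizer works internally. The table contents are generated artificially, and `soc_sec` is written with an underscore because a hyphen is not valid in a Python dictionary key used this way.

```python
# Artificial data: 100 senators (every 4th is female) and 100 businesses.
senators = [{"soc_sec": i, "name": f"S{i}",
             "gender": "Female" if i % 4 == 0 else "Male"} for i in range(100)]
businesses = [{"soc_sec": i % 100, "biz": f"B{i}"} for i in range(0, 300, 3)]


def join_then_select():
    """Compare every senator with every business, then test gender."""
    comparisons, result = 0, []
    for s in senators:
        for b in businesses:
            comparisons += 1
            if s["soc_sec"] == b["soc_sec"] and s["gender"] == "Female":
                result.append(s["name"])
    return result, comparisons


def select_then_join():
    """Filter on gender first, then join only the surviving rows."""
    comparisons, females = 0, []
    for s in senators:
        comparisons += 1
        if s["gender"] == "Female":
            females.append(s)
    result = []
    for s in females:
        for b in businesses:
            comparisons += 1
            if s["soc_sec"] == b["soc_sec"]:
                result.append(s["name"])
    return result, comparisons


r1, c1 = join_then_select()
r2, c2 = select_then_join()
print(r1 == r2, c1, c2)  # same answer, 10000 vs 2600 comparisons
```

Both plans return the same rows; only the amount of work differs, which is why the optimizer is free to reorder the operations.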
7.4 Structured Query Language
What is SQL?

 SQL - General Information


 is a standard query language for relational databases
  • It supports logical data model concepts, such as relations, keys, ...
 Supported by major brands, e.g. IBM DB2, Oracle, MS SQL Server, Sybase, ...
 3 versions: SQL1 (1986), SQL2 (1992), SQL 3 (1999)
 Can express common data intensive queries
 SQL 1 and SQL 2 are not suitable for recursive queries

 SQL and spatial data management


 ESRI Arc/Info included a custom relational DBMS named Info
 Other GIS software can interact with DBMS using SQL
• using open database connectivity (ODBC) or other protocols
 In fact, many software use SQL to manage data in back-end DBMS
 And a vast majority of SQL queries are generated by other software
 Although we will be writing SQL queries manually!
Three Components of SQL

 Data Definition Language (DDL)


 Creation and modification of relational schema
 Schema objects include relations, indexes, etc.

 Data Manipulation Language (DML)


 Insert, delete, update rows in tables
 Query data in tables

 Data Control Language (DCL)


 Concurrency control, transactions

 Administrative tasks, e.g. set up database users, security permissions

 Focus for now


 A little bit of table creation (DDL) and population (DML)
 Primarily Querying (DML)
Creating Tables in SQL
• Table definition
• “CREATE TABLE” statement
• Specifies table name, attribute names and data types
• Create a table with no rows.
• See an example at the bottom
• Related statements
• ALTER TABLE statement modifies table schema if needed
• DROP TABLE statement removes a table, including any rows it still holds
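The three DDL statements can be tried with Python's built-in `sqlite3` module. The sketch below uses the River relation from the example database. The column types are assumptions, and the Shape column is stored as plain text because SQLite has no built-in spatial type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def columns(table):
    """Return the column names of a table (SQLite-specific PRAGMA)."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


# CREATE TABLE: table name, attribute names and data types; no rows yet.
conn.execute("CREATE TABLE River (Name TEXT PRIMARY KEY, Origin TEXT, Length REAL)")
before = columns("River")

# ALTER TABLE: modify the schema after creation.
conn.execute("ALTER TABLE River ADD COLUMN Shape TEXT")
after = columns("River")

conn.execute("INSERT INTO River (Name, Origin, Length) "
             "VALUES ('Mississippi', 'USA', 6000)")

# DROP TABLE: removes the table together with the row just inserted.
conn.execute("DROP TABLE River")
remaining = conn.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE name = 'River'").fetchone()[0]

print(before, after, remaining)
```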

Populating Tables in SQL


• Adding a row to an existing table
• “INSERT INTO” statement
• Specifies table name, attribute names and values
• Example:
INSERT INTO River(Name, Origin, Length) VALUES(‘Mississippi’, ‘USA’, 6000)
• Related statements
• SELECT statement with INTO clause can insert multiple rows in a table
• Bulk load, import commands also add multiple rows
• DELETE statement removes rows

• UPDATE statement can change values within selected rows
Querying populated Tables in SQL
• SELECT statement
• The commonly used statement to query data in one or more tables
• Returns a relation (table) as result
• Has many clauses
• Can refer to many operators and functions
• Allows nested queries which can be hard to understand
• Scope of our discussion
• Learn enough SQL to appreciate spatial extensions
• Observe example queries
• Read and write simple SELECT statement
• Understand frequently used clauses, e.g. SELECT, FROM, WHERE
• Understand a few operators and functions
SELECT Statement- General Information
• Clauses
• SELECT specifies desired columns
• FROM specifies relevant tables
• WHERE specifies qualifying conditions for rows
• ORDER BY specifies sorting columns for results
• GROUP BY, HAVING specifies aggregation and statistics
• Operators and functions
• arithmetic operators, e.g. +, -, …
• comparison operators, e.g. =, <, >, BETWEEN, LIKE…
• logical operators, e.g. AND, OR, NOT, EXISTS,
• set operators, e.g. UNION, IN, ALL, ANY, …
• statistical functions, e.g. SUM, COUNT, ...
• many other operators on strings, date, currency, ...
SELECT Example 1.

• Simplest Query has SELECT and FROM clauses
• Query: List all the cities and the country they belong to.
SELECT Name, Country
FROM CITY

SELECT Example 2.
• Commonly 3 clauses (SELECT, FROM, WHERE) are used
• Query: List the names of the capital cities in the CITY table.
SELECT Name
FROM CITY
WHERE CAPITAL=‘Y’

Query Example…Where clause


Query: List the name and life expectancy of the countries in the Country relation where
the life expectancy is less than seventy years.
SELECT Co.Name,Co.Life-Exp
FROM Country Co
WHERE Co.Life-Exp <70
Note: use of alias ‘Co’ for Table ‘Country’

Multi-table Query Examples
Query: List the capital cities and populations of countries whose GDP exceeds one trillion
dollars.
Note: Tables City and Country are joined by matching “City.Country = Country.Name”. This
simulates the relational operator “join”.
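A query matching this description can be sketched and run against SQLite from Python. The country and city rows are invented, and the threshold of 1000.0 assumes that GDP is stored in billions of dollars; neither comes from the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Country (Name TEXT PRIMARY KEY, Cont TEXT, Pop REAL, GDP REAL);
CREATE TABLE City (Name TEXT PRIMARY KEY, Country TEXT, Pop REAL, Capital TEXT);
-- Hypothetical rows; GDP assumed to be in billions of dollars.
INSERT INTO Country VALUES ('Alphaland', 'NAM', 30.0, 1500.0),
                           ('Betaland',  'SAM', 10.0,  400.0);
INSERT INTO City VALUES ('Alpha City', 'Alphaland', 1.0, 'Y'),
                        ('Port Alpha', 'Alphaland', 2.5, 'N'),
                        ('Beta City',  'Betaland',  3.0, 'Y');
""")

rows = conn.execute("""
    SELECT Ci.Name, Co.Pop
    FROM City Ci, Country Co
    WHERE Ci.Country = Co.Name      -- join condition
      AND Co.GDP > 1000.0           -- more than one trillion dollars
      AND Ci.Capital = 'Y'
""").fetchall()

print(rows)  # [('Alpha City', 30.0)]
```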

Query: What is the name and population of the capital city in the country where the St.
Lawrence River originates?
SELECT Ci.Name, Ci.Pop
FROM City Ci, Country Co, River R
WHERE R.Origin = Co.Name
AND Co.Name = Ci.Country
AND R.Name = ‘St. Lawrence’
AND Ci.Capital = ‘Y’
Note: The three tables are joined a pair at a time. River.Origin is matched with
Country.Name, and City.Country is matched with Country.Name. The order of the joins is
decided by the query optimizer and does not affect the result.
Query: What is the average population of the noncapital cities listed in the City table?
SELECT AVG(Ci.Pop)
FROM City Ci
WHERE Ci.Capital = ‘N’
Query: For each continent, find the average GDP.

SELECT Co.Cont, Avg(Co.GDP) AS Continent_GDP
FROM Country Co
GROUP BY Co.Cont
Query Example..Having clause, Nested queries
Query: For each country in which at least two rivers originate, find the length of the
shortest river.
SELECT R.Origin, MIN(R.Length) AS Min_length
FROM River R
GROUP BY R.Origin
HAVING COUNT(*) > 1
Query: List the countries whose GDP is greater than that of Canada.
SELECT Co.Name
FROM Country Co
WHERE Co.GDP >ANY(SELECT Co1.GDP
FROM Country Co1
WHERE Co1.Name = ‘Canada’)
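The GROUP BY, HAVING and nested-query patterns above can be exercised on a small invented data set with Python's `sqlite3` module. SQLite does not support the ANY operator, so the last query uses a scalar subquery instead; that substitution, the table contents and the country names are assumptions of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Country (Name TEXT PRIMARY KEY, Cont TEXT, GDP REAL);
CREATE TABLE River (Name TEXT PRIMARY KEY, Origin TEXT, Length REAL);
-- Hypothetical rows for illustration only.
INSERT INTO Country VALUES ('Alphaland', 'North', 900.0),
                           ('Betaland',  'North', 300.0),
                           ('Gammaland', 'South', 600.0);
INSERT INTO River VALUES ('R1', 'Alphaland', 1200.0),
                         ('R2', 'Alphaland',  800.0),
                         ('R3', 'Betaland',   500.0);
""")

# Average GDP per continent (GROUP BY).
avg_gdp = conn.execute(
    "SELECT Cont, AVG(GDP) FROM Country GROUP BY Cont ORDER BY Cont").fetchall()

# Shortest river per country of origin with at least two rivers (HAVING).
shortest = conn.execute(
    "SELECT Origin, MIN(Length) FROM River "
    "GROUP BY Origin HAVING COUNT(*) > 1").fetchall()

# Countries with a larger GDP than Betaland (nested query).
richer = conn.execute(
    "SELECT Name FROM Country "
    "WHERE GDP > (SELECT GDP FROM Country WHERE Name = 'Betaland') "
    "ORDER BY Name").fetchall()

print(avg_gdp)
print(shortest)
print(richer)
```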
Extending SQL for Spatial Data

 Motivation

 SQL has simple atomic data-types, like integer, dates and string

 Not convenient for spatial data and queries


• Spatial data (e.g. polygons) is complex
• Spatial operation: topological, euclidean, directional, metric

 SQL 3 allows user defined data types and operations

 Spatial data types and operations can be added to SQL3

 Open Geodata Interchange Standard (OGIS)


 Half a dozen spatial data types
 Several spatial operations
 Supported by major vendors, e.g. ESRI, Intergraph, Oracle, IBM,...
OGIS Spatial Data Model

 Consists of base-class Geometry and four sub-classes:
 Point, Curve, Surface and GeometryCollection
 Operations fall into three categories:
 Apply to all geometry types
• SpatialReference, Envelope, Export, IsSimple, Boundary
 Predicates for Topological relationships
• Equal, Disjoint, Intersect, Touch, Cross, Within, Contains
 Spatial Data Analysis
• Distance, Buffer, Union, Intersection, ConvexHull, SymDiff
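To give a feel for what such operations compute, here is a minimal Python sketch of three of them on planar coordinates. It is not an OGIS implementation: the function names are hypothetical, geometries are plain lists of (x, y) tuples, and Earth curvature and spatial reference systems are ignored.

```python
import math


def envelope(points):
    """Minimum bounding rectangle as (min_x, min_y, max_x, max_y)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def area(polygon):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(polygon, polygon[1:] + polygon[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def distance(p, q):
    """Euclidean distance between two points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


rectangle = [(0, 0), (4, 0), (4, 3), (0, 3)]
print(envelope(rectangle))       # (0, 0, 4, 3)
print(area(rectangle))           # 12.0
print(distance((0, 0), (3, 4)))  # 5.0
```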
Spatial Queries with SQL/OGIS
• SQL/OGIS - General Information
• Both standards are being adopted by many vendors
• The choice of spatial data types and operations is similar
• Syntax differs from vendor to vendor
• Readers may need to alter SQL/OGIS queries given in text to make them run
on specific commercial products
• Using OGIS with SQL
• Spatial data types can be used in DDL to type columns
• Spatial operations can be used in DML
• Scope of discussion
• Illustrate use of spatial data types with SQL
• Via a set of examples
List of Spatial Query Examples
• Simple SQL SELECT_FROM_WHERE examples
• Spatial analysis operations
• Unary operator: Area
• Binary operator: Distance
• Boolean Topological spatial operations - WHERE clause
• Touch
• Cross
• Using spatial analysis and topological operations
• Buffer, overlap
• Complex SQL examples
• Aggregate SQL queries
• Nested queries
Using spatial operation in SELECT clause
Query: List the name, population, and area of each country listed in the Country table.
SELECT C.Name, C.Pop, Area(C.Shape) AS "Area"
FROM Country C
Note: This query uses the spatial operation Area(). Note the use of a spatial
operation in place of a column in the SELECT clause.
Using spatial operator Distance
Query: List the GDP and the distance of a country’s capital city to the equator for all
countries.
SELECT Co.GDP, Distance(Point(0,Ci.Shape.y),Ci.Shape) AS "Distance"
FROM Country Co, City Ci
WHERE Co.Name = Ci.Country
AND Ci.Capital = ‘Y’

Using Spatial Operation in WHERE clause


Query: Find the names of all countries which are neighbors of the United States (USA) in the
Country table.
SELECT C1.Name AS "Neighbors of USA"
FROM Country C1, Country C2
WHERE Touch(C1.Shape, C2.Shape) = 1
AND C2.Name = ‘USA’
Note: Spatial operator Touch() is used in WHERE clause to join Country table with itself. This
query is an example of spatial self join operation.
Spatial Query with multiple tables
Query: For all the rivers listed in the River table, find the countries through which they pass.
SELECT R.Name, C.Name
FROM River R, Country C
WHERE Cross(R.Shape,C.Shape)=1
Note: Spatial operation “Cross” is used to join River and Country tables. This query
represents a spatial join operation.
Example Spatial Query…Buffer and Overlap
Query: The St. Lawrence River can supply water to cities that are within 300 km. List the
cities that can use water from the St. Lawrence.
SELECT Ci.Name
FROM City Ci, River R
WHERE Overlap(Ci.Shape, Buffer(R.Shape, 300)) = 1
AND R.Name = ‘St. Lawrence’
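When a city is represented as a point and a river as a polyline, the Overlap-with-Buffer test is equivalent to asking whether the point lies within 300 km of some river segment. The Python sketch below works under that assumption, with planar coordinates in km; the river shape, the city names and the helper names are all invented.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def within_buffer(point, polyline, radius):
    """True if point lies inside the buffer of the given radius around polyline."""
    return any(point_segment_distance(point, a, b) <= radius
               for a, b in zip(polyline, polyline[1:]))


river = [(0, 0), (1000, 0), (1000, 1000)]  # hypothetical river, in km
cities = {"A": (500, 200), "B": (500, 400), "C": (1250, 900)}

served = sorted(name for name, pt in cities.items()
                if within_buffer(pt, river, 300))
print(served)  # ['A', 'C']
```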
Recall List of Spatial Query Examples
• Simple SQL SELECT_FROM_WHERE examples
• Spatial analysis operations
• Unary operator: Area
• Binary operator: Distance
• Boolean Topological spatial operations - WHERE clause
• Touch
• Cross
• Using spatial analysis and topological operations
• Buffer, overlap
Using spatial operation in an aggregate query

Query: List all countries, ordered by number of neighboring countries.
SELECT Co.Name, Count(Co1.Name)
FROM Country Co, Country Co1
WHERE Touch(Co.Shape,Co1.Shape)
GROUP BY Co.Name
ORDER BY Count(Co1.Name)
Notes: This query can be used to differentiate querying capabilities of simple GIS software
(e.g. Arc/View) and a spatial database. It is quite tedious to carry out this query in GIS.
Earlier version of OGIS did not provide spatial aggregate operation to support GIS
operations like reclassify.
Using Spatial Operation in Nested Queries
Query: For each river, identify the closest city.
SELECT C1.Name, R1.Name
FROM City C1, River R1
WHERE Distance(C1.Shape, R1.Shape) <= ALL (SELECT Distance(C2.Shape, R1.Shape)
FROM City C2
WHERE C1.Name <> C2.Name
)
Note: Spatial operation Distance used in context of a nested query.
Exercise: It is interesting to note that the SQL query expression to find the smallest
distance from each river to its nearest city is much simpler and does not require a nested
query. Readers are encouraged to write a SQL expression for this query.
Nested Spatial Query
Query: List the countries with only one neighboring country. A country is a neighbor of
another country if their land masses share a boundary. According to this definition, island
countries, like Iceland, have no neighbors.
SELECT Co.Name
FROM Country Co
WHERE Co.Name IN (SELECT Co.Name
FROM Country Co, Country Co1
WHERE Touch(Co.Shape, Co1.Shape)
GROUP BY Co.Name
HAVING Count(*) = 1)
Note: This is a complex nested query with aggregate operations. Such queries can be
written as two expressions, namely a view definition and a query on the view. The inner
query becomes a view and the outer query is run on the view. This is illustrated in the
next section.
Rewriting nested queries using Views
• Views are like tables
• Represent derived data or result of a query
• Can be used to simplify complex nested queries
• Example follows:
CREATE VIEW Neighbor AS
SELECT Co.Name, Count(Co1.Name) AS num_neighbors
FROM Country Co, Country Co1
WHERE Touch(Co.Shape, Co1.Shape)
GROUP BY Co.Name

SELECT Name, num_neighbors
FROM Neighbor
WHERE num_neighbors = (SELECT Max(num_neighbors) FROM Neighbor)

Chapter eight
Data models for spatial and non-spatial data
8. Spatial vs. Non-spatial Data
 Spatial data includes location, shape, size, and orientation.
o For example, consider a particular square:
 its center (the intersection of its diagonals) specifies its location
 its shape is a square
 the length of one of its sides specifies its size
 the angle its diagonals make with, say, the x-axis specifies its
orientation.
 Spatial data includes spatial relationships. For example, the arrangement of ten
bowling pins is spatial data.
 Non-spatial data (also called attribute or characteristic data) is that information
which is independent of all geometric considerations.
o For example, a person’s height, mass, and age are non-spatial data because
they are independent of the person’s location.
o It’s interesting to note that, while mass is non-spatial data, weight is spatial
data in the sense that something’s weight is very much dependent on its
location!
 It is possible to ignore the distinction between spatial and non-spatial
data. However, there are fundamental differences between them:
o spatial data are generally multi-dimensional and autocorrelated.
o non-spatial data are generally one-dimensional and independent.

 These distinctions put spatial and non-spatial data into different philosophical camps
with far-reaching implications for conceptual, processing, and storage issues.
o For example, sorting is perhaps the most common and important non-spatial
data processing function that is performed.
o It is not obvious how to even sort locational data such that all points end up
“nearby” their nearest neighbors.
These distinctions justify a separate consideration of spatial and non-spatial data models.
What is a Data Model?
• What is a model? (Dictionary meaning)

• A set of plans (blueprint drawing) for a building
• A miniature representation of a system to analyze properties of interest
• What is Data Model?
• Specifies the structure or schema of a data set
• Documents the description of data
• Facilitates early analysis of some properties, e.g. querying ability,
redundancy, consistency, storage space requirements, etc.
• Examples:
• A GIS organizes a spatial data set as a set of layers
• Databases organize a dataset as a collection of tables
Why Data Models?
• Data models facilitate
• Early analysis of properties, e.g. storage cost, querying ability, ...
• Reuse of shared data among multiple applications
• Exchange of data across organization

 Conversion of data to new software / environment


Types of Data Models
• Two Types of data models
• Generic data models
• Developed for business data processing
• Support simple abstract data types (ADTs), e.g. numbers, strings, date
• Not convenient for spatial ADTs, e.g. polygons
• Recall a polygon becomes dozens of rows in 3 tables (Fig. 1.4, pp. 8)
• Need to extend with spatial concepts, e.g. ADTs
• Application Domain specific, e.g. spatial models
• Set of concepts developed in Geographic Info. Science
• Common spatial ADTs across different GIS applications
• Plan of Study
• First study concepts in spatial models
• Then study generic model
• Finally put the two together
Models of Spatial Information
• Two common models
  • Field based
  • Object based
• Example: forest stands (Fig. 2.1)
  • (a) Forest stand map
  • (b) Object view has 3 polygons
  • (c) Field view has a function
Field based Model
• Three main concepts:
  • Spatial framework: a partitioning of space
    • e.g., the grid imposed by latitude and longitude
  • Field functions:
    f: Spatial Framework → Attribute Domain
  • Field operations
    • Examples: addition (+) and composition (∘)
      f + g : x → f(x) + g(x)
      f ∘ g : x → f(g(x))
Types of Field Operations
• Local: the value of the new field at a given location in the spatial framework depends
only on the value of the input field at that location (e.g., thresholding)
• Focal: the value of the resulting field at a given location depends on the values that the
input field assumes in a small neighborhood of the location (e.g., gradient)
• Zonal: zonal operations are naturally associated with aggregate operators or the
integration function. For example, an operation that calculates the average height of the
trees for each species is a zonal operation.
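The three kinds of field operation can be sketched on a tiny raster. This is a minimal illustration, assuming the field is stored as a list of rows; the grid values, the zone labels, and the function names are all hypothetical. A neighborhood maximum stands in for the gradient as the focal example because it is shorter to write.

```python
# A field maps each cell of the spatial framework (a 3 x 3 grid here) to a value.
elevation = [
    [100, 120, 130],
    [110, 300, 140],
    [105, 115, 125],
]
# Hypothetical zone label for each cell (e.g. a river basin or a tree species).
zone = [
    ["a", "a", "b"],
    ["a", "a", "b"],
    ["c", "c", "b"],
]


def local_threshold(field, limit):
    """Local: each output cell depends only on the same input cell."""
    return [[value > limit for value in row] for row in field]


def focal_max(field):
    """Focal: each output cell depends on the 3 x 3 neighbourhood around it."""
    rows, cols = len(field), len(field[0])
    result = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            window = [field[i][j]
                      for i in range(max(0, r - 1), min(rows, r + 2))
                      for j in range(max(0, c - 1), min(cols, c + 2))]
            out_row.append(max(window))
        result.append(out_row)
    return result


def zonal_mean(field, zones):
    """Zonal: aggregate the field over all cells that share a zone label."""
    totals = {}
    for field_row, zone_row in zip(field, zones):
        for value, label in zip(field_row, zone_row):
            total, count = totals.get(label, (0, 0))
            totals[label] = (total + value, count + 1)
    return {label: total / count for label, (total, count) in totals.items()}


print(local_threshold(elevation, 200))
print(focal_max(elevation))
print(zonal_mean(elevation, zone))
```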

 Exercise: Classify following operations on elevation field


• (I) Identify peaks (points higher than their neighbors)
• (II) Identify mountain ranges (elevation over 2000 feet)
• (III) Determine the average elevation of a set of river basins
Object Model
• Object model concepts
  • Objects: distinct identifiable things relevant to an application
  • Objects have attributes and operations
  • Attribute: a simple (e.g. numeric, string) property of an object
  • Operations: functions that map object attributes to other objects
• Example from a roadmap
  • Objects: roads, landmarks, ...
  • Attributes of road objects:
    • spatial: location, e.g. polygon boundary of a land parcel
    • non-spatial: name (e.g. Route 66), type (e.g. interstate, residential
      street), number of lanes, speed limit, ...
  • Operations on road objects: determine center line, determine length,
    determine intersection with other roads, ...
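The roadmap example can be sketched as a small class that bundles spatial and non-spatial attributes with an operation. This is only an illustration: the class layout, the attribute values, and the use of planar coordinates are assumptions, not something the notes specify.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Road:
    # Non-spatial attributes
    name: str
    road_type: str
    lanes: int
    speed_limit: int
    # Spatial attribute: the center line as a list of (x, y) vertices
    center_line: list

    def length(self) -> float:
        """Operation: sum the lengths of the center-line segments."""
        return sum(
            hypot(x2 - x1, y2 - y1)
            for (x1, y1), (x2, y2) in zip(self.center_line, self.center_line[1:])
        )


# Hypothetical values in arbitrary planar units.
route = Road("Route 66", "highway", 2, 55, [(0, 0), (3, 4), (3, 10)])
print(route.length())  # 11.0
```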
Classifying Spatial objects
• Spatial objects are spatial attributes of general objects
• Spatial objects are of many types
  • Simple
    • 0-dimensional (points), 1-dimensional (curves), 2-dimensional (surfaces)
    • Examples are given in the table below
  • Collections
    • Polygon collection (e.g. boundary of Japan or Hawaii), ...
  • See the more complete list in the figure below

Spatial Object Type   Example Object   Dimension
Point                 City             0
Curve                 River            1
Surface               Country          2

Spatial Object Types in OGIS Data Model


Fig : Each rectangle shows a distinct spatial object type

Classifying Operations on spatial objects in Object Model
• Classifying operations
  • Set based: 2-dimensional spatial objects (e.g. polygons) are sets of points
    • a set operation (e.g. intersection) of 2 polygons produces another polygon
  • Topological: the boundary of the USA touches the boundary of Canada
  • Directional: New York City is to the east of Chicago
  • Metric: Chicago is about 700 miles from New York City

Set theory based   Union, Intersection, Containment
Topological        Touches, Disjoint, Overlap, etc.
Directional        East, North-West, etc.
Metric             Distance
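The directional and metric examples can be checked with a few lines of code. The sketch below assumes approximate latitude/longitude values for the two cities and uses the haversine great-circle formula, which is one common choice for a metric operation on a sphere and not something these notes prescribe. The function names are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Metric operation: great-circle distance between two lat/lon points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))


def is_east_of(lon_a, lon_b):
    """Directional operation (simplified: ignores wrap-around at 180 degrees)."""
    return lon_a > lon_b


# Approximate coordinates (latitude, longitude) in degrees.
chicago = (41.88, -87.63)
new_york = (40.71, -74.01)

print(is_east_of(new_york[1], chicago[1]))
print(round(haversine_miles(*chicago, *new_york)))
```

With these coordinates the computed distance comes out a little over 700 miles, consistent with the "about 700 miles" in the text.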
Topological Relationships
• Topological relationships are invariant under elastic deformation (stretching without
  tearing or merging).
  • Two countries that touch each other on a planar paper map will continue to
    do so on a spherical globe map.
• Topology is the study of topological relationships
• Example queries with topological operations
  • What is the topological relationship between two objects A and B?
  • Find all objects which have a given topological relationship to object A.
Topological Concepts
• Interior, boundary, exterior
  • Let A be an object in a “Universe” U.
• Question: Define interior, boundary, and exterior for curves and points
Nine-Intersection Model of Topological Relationships
• Many topological relationships between A and B can be specified using the
  9-intersection model
  • Examples are shown in Fig 2.3
• Nine intersections
  • Intersections between the interior, boundary, and exterior of A and those of B
  • A and B are spatial objects in a two-dimensional plane.
  • Can be arranged as a 3 by 3 matrix
  • Each matrix element takes a value of 0 (false) or 1 (true).
• Q: Determine the number of distinct 3 by 3 boolean matrices.
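The matrix can be computed mechanically once interior, boundary, and exterior are available as point sets. The sketch below is a rough raster approximation, not the exact model: each object is a set of grid cells, and a cell counts as boundary when one of its four neighbors lies outside the object. All names and the example squares are hypothetical. Rows correspond to the interior, boundary, and exterior of A; columns to those of B.

```python
def square(x0, y0, size):
    """Set of grid cells covering a size x size square with corner (x0, y0)."""
    return {(x, y) for x in range(x0, x0 + size) for y in range(y0, y0 + size)}


def boundary(region):
    """Cells of the region that have at least one 4-neighbour outside it."""
    return {
        (x, y)
        for (x, y) in region
        if any(n not in region
               for n in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)))
    }


def nine_intersection(a, b, universe):
    """3 x 3 matrix of 0/1 flags: is each pairwise intersection non-empty?"""
    def parts(region):
        bnd = boundary(region)
        return (region - bnd, bnd, universe - region)  # interior, boundary, exterior

    return [[int(bool(pa & pb)) for pb in parts(b)] for pa in parts(a)]


universe = square(0, 0, 12)
a = square(0, 0, 6)
b = square(3, 3, 6)   # partly covers a
c = square(8, 8, 4)   # far away from a

print(nine_intersection(a, c, universe))  # disjoint: [[0, 0, 1], [0, 0, 1], [1, 1, 1]]
print(nine_intersection(a, b, universe))  # overlap:  [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
```

Because adjacent cells share no cell, this approximation cannot distinguish "touches" from "disjoint"; an exact implementation would work on vector geometry instead.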

Specifying topological operation in 9-Intersection Model


Fig 2.3: 9 intersection matrices for a few topological operations

8.1 Relational model
• The relational model is based on set theory
• Main concepts
  • Domain: a set of values for a simple attribute
  • Relation: a subset of the cross-product of a set of domains
    • Represents a table, i.e. a homogeneous collection of rows (tuples)
    • The set of columns (i.e. attributes) is the same for each row
• Comparison to concepts in the conceptual data model
  • Relations are similar to, but not identical to, entities
  • Domains are similar to attributes
Relational Schema
• Schema of a relation
  • Enumerates columns and identifies the primary key and foreign keys.
  • Primary key:
    • one or more attributes that uniquely identify each row within a table
  • Foreign keys:
    • attributes of a relation R which form the primary key of another relation S
    • the value of a foreign key in any tuple of R must match the values in some row of S
• Relational schema of a database
  • The collection of schemas of all relations in the database
  • Example: Figure 2.5 (next slide)
  • A blueprint summary drawing of the database table structures
  • Allows analysis of storage costs, data redundancy, and querying capabilities
  • Some databases were designed directly as relational schemas in the 1980s
  • Nowadays, databases are designed as ER models and the relational schema is
    generated via CASE tools
Relational Schema Example

Relational Schema for “Point”, “Line”, “Polygon” and “Elevation”


• The relational model restricts attribute domains
  • Allows simple atomic values, e.g. a number
  • Disallows complex values (e.g. polygons) for columns
  • Complex values need to be decomposed into simpler domains
  • A polygon may be decomposed into edges and vertices (Fig. below)
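A minimal sketch of such a decomposition, using Python's built-in sqlite3 module, is shown here. The table and column names are assumptions made for illustration and are not taken from the figure; edges are left implicit as consecutive vertices of the ring, which is a simplification of the edge/vertex decomposition mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE point   (point_id INTEGER PRIMARY KEY, x REAL, y REAL);
CREATE TABLE polygon (polygon_id INTEGER PRIMARY KEY, name TEXT);
-- One row per vertex of a polygon; seq gives the order around the ring.
CREATE TABLE polygon_vertex (
    polygon_id INTEGER REFERENCES polygon(polygon_id),
    seq        INTEGER,
    point_id   INTEGER REFERENCES point(point_id),
    PRIMARY KEY (polygon_id, seq)
);
""")

# A single rectangle already needs 1 + 4 + 4 rows spread over three tables.
conn.executemany("INSERT INTO point VALUES (?, ?, ?)",
                 [(1, 0.0, 0.0), (2, 4.0, 0.0), (3, 4.0, 3.0), (4, 0.0, 3.0)])
conn.execute("INSERT INTO polygon VALUES (1, 'stand A')")
conn.executemany("INSERT INTO polygon_vertex VALUES (1, ?, ?)",
                 [(1, 1), (2, 2), (3, 3), (4, 4)])

# Reassembling the polygon requires a join and an ordering.
ring = conn.execute("""
    SELECT p.x, p.y
    FROM polygon_vertex v JOIN point p ON p.point_id = v.point_id
    WHERE v.polygon_id = 1
    ORDER BY v.seq
""").fetchall()
print(ring)
```

The join needed just to read one polygon back is the kind of overhead that motivates spatial ADTs.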

 Integrity Constraints
 Key: Every relation has a primary key.
 Entity Integrity: Value of primary key in a row is never undefined
 Referential Integrity: Value of an attribute of a Foreign Key must appear as a
value in the primary key of another relation or must be null.
 Normal Forms (NF) for Relational schema
 Reduce data redundancy and facilitate querying
 1st NF: Each column in a relation contains an atomic value.
 2nd and 3rd NF: Values of non-key attributes are fully determined by the
values of the primary key, only the primary key, and nothing but the primary
key.
 Other normal forms exist but are seldom used
 Translating a well-designed ER model yields a relational schema in 3rd NF
• satisfying definition of 1st, 2nd and 3rd normal forms
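The key and referential integrity constraints above can be observed with a small experiment. The sketch below uses Python's built-in sqlite3 module; the forest and stand tables are hypothetical. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE forest (forest_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE stand (
    stand_id  INTEGER PRIMARY KEY,
    forest_id INTEGER REFERENCES forest(forest_id)
);
""")


def try_insert(sql):
    """Run one INSERT and report whether an integrity constraint rejected it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


print(try_insert("INSERT INTO forest VALUES (1, 'North Woods')"))
print(try_insert("INSERT INTO stand VALUES (10, 1)"))     # foreign key matches a forest
print(try_insert("INSERT INTO stand VALUES (11, NULL)"))  # a null foreign key is allowed
print(try_insert("INSERT INTO stand VALUES (12, 99)"))    # no forest 99
print(try_insert("INSERT INTO forest VALUES (1, 'Duplicate')"))  # primary key reused
```

The first three inserts succeed; the last two are rejected by the referential integrity and key constraints respectively.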
Mapping ER to Relational
• Highlights of translation rules
  • An entity becomes a relation
  • Attributes become columns in the relation
  • Multi-valued attributes become a new relation
    • which includes a foreign key to link to the relation for the entity
  • Relationships (1:1, 1:N) become foreign keys
  • M:N relationships become a relation
    containing foreign keys to the relations of the participating entities
8.2 Geo-relational model
A georelational data model is a geographic data model that represents geographic features as
an interrelated set of spatial and attribute data.
Spatial and Attribute Components
• Geospatial data comprise spatial and attribute components.
• Spatial data describe the locations of spatial features, whereas attribute data
describe the characteristics of spatial features.
The georelational data model stores spatial and attribute data separately in a split system.
• Spatial data are stored in graphic files. They describe the absolute and relative locations of
geographic features.
• Attribute data are stored in relational database files. They describe the characteristics of
the spatial features. Attribute data are often referred to as tabular data.
Example of Georelational Data Model
As an example of the georelational data model, the soil-id coverage uses SOIL-ID to link
the spatial and attribute data.
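The split storage and the SOIL-ID link can be mimicked with two separate lookup structures. This is only a toy sketch: the dictionaries stand in for the graphic files and the relational table, and every value and field name other than SOIL-ID is made up.

```python
# Spatial component: geometry keyed by SOIL-ID (stands in for the graphic files).
soil_polygons = {
    1: [(0, 0), (2, 0), (2, 2), (0, 2)],
    2: [(2, 0), (5, 0), (5, 2), (2, 2)],
}

# Attribute component: tabular rows keyed by the same SOIL-ID
# (stands in for the relational database files).
soil_attributes = {
    1: {"soil_code": "Sa1", "suitability": "high"},
    2: {"soil_code": "Cl2", "suitability": "low"},
}


def feature(soil_id):
    """Join the two components of one feature through their shared SOIL-ID."""
    return {
        "soil_id": soil_id,
        "geometry": soil_polygons[soil_id],
        **soil_attributes[soil_id],
    }


print(feature(2))
```

Because the two stores are maintained separately, keeping the shared identifier consistent in both is the responsibility of the GIS software.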

8.3 Object model


What is the Object Model?
 The word “object” is similar to the Entity-Relationship concept of an “entity”
although “object” is more general.
o I recommend taking “object” in the spirit of “objects in the physical world.”
o Objects are things but they are not limited to physical, tangible things. For
example, data structures (e.g., a hash table) can be objects.
o All objects are distinct and, like the network model, are made distinct by an
identifying attribute, the object ID.

 Like the other models, the object model assumes that objects can conceptually be
collected together into meaningful groups. These groups are called classes.
 An object grouping is meaningful because objects of the same class must have
common attributes, behaviors, and relationships with other objects.

 Unlike entity sets and relations, classes do not actually hold the objects of that class.
o Classes are purely conceptual.
o There is nothing in the object model that is equivalent to either an entity set or
a relation (there could be, but it is not required by the model).
• Like the network model, the relationships among objects are specified via a
“physical” link (pointer) between objects.
The DARPA Open OODB project proposes the following as the
essential features of the OO data model (Blakeley 1991) and (Rao 1994, p.72):
• Object identity: the ability of the system to distinguish between two different objects
that have the same state. The state of an object can be shared by several objects via
object identity.
• Encapsulation: a kind of abstraction that enforces a clean separation between the
external interface (behavior) of an object and its internal
implementation. Encapsulation requires that all access (or interaction) with objects
be done by invoking the services provided by their external interface.
• Complex state: the ability to define data types whose implementation has a nested
structure. The state of an object could be built from records of primitive types, other
objects, or [collections] of objects.
• Type extensibility: the ability to define new data types from previously defined types
by enhancing or changing the structure or behavior of the types. Type inheritance is
a mechanism used to define new types by enhancing already existing behavior.
• Genericity: The types of the object data model with which the object query language
collaborates must be generic.
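Several of these features can be seen in any object-oriented language. The sketch below uses Python purely as an illustration (it is not an OODB), and the class and attribute names are hypothetical.

```python
class Parcel:
    """Encapsulation: callers use the methods, not the underscore attributes."""

    def __init__(self, owner, vertices):
        self._owner = owner
        self._vertices = list(vertices)  # complex state: a nested collection

    def owner(self):
        return self._owner

    def state(self):
        return (self._owner, tuple(self._vertices))


class ZonedParcel(Parcel):
    """Type extensibility: a new type defined by extending an existing one."""

    def __init__(self, owner, vertices, zoning):
        super().__init__(owner, vertices)
        self._zoning = zoning

    def zoning(self):
        return self._zoning


p1 = Parcel("A. Owner", [(0, 0), (1, 0), (1, 1)])
p2 = Parcel("A. Owner", [(0, 0), (1, 0), (1, 1)])

# Object identity: same state, yet two distinct objects.
print(p1.state() == p2.state())  # True
print(p1 is p2)                  # False
```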
8.4 Object-relational model
Oracle Spatial and Graph supports the object-relational model for representing geometries. This
model stores an entire geometry in the Oracle native spatial data type for vector data,
SDO_GEOMETRY. An Oracle table can contain one or more SDO_GEOMETRY columns. The
object-relational model corresponds to a "SQL with Geometry Types" implementation of
spatial feature tables in the Open GIS ODBC/SQL specification for geospatial features.

The benefits provided by the object-relational model include:

 Support for many geometry types, including arcs, circles, compound polygons,
compound line strings, and optimized rectangles

 Ease of use in creating and maintaining indexes and in performing spatial queries

 Index maintenance by the Oracle database

 Geometries modeled in a single column

 Optimal performance

The geodatabase is object relational
• The geodatabase employs a multitier application architecture by implementing
advanced logic and behavior in the application tier on top of the data storage tier
(managed within various database management systems, files, or Extensible Markup
Language [XML]).
 The geodatabase application logic includes support for a series of generic geographic
information system (GIS) data objects and behaviours such as feature classes, raster
datasets, topologies, networks, and other advanced functionality.
 Responsibility for management of geographic datasets is shared between the ArcGIS
software and database management system or files.
 Certain aspects of geographic dataset management, such as disk-based storage,
definition of attribute types, associative query processing, and multiuser transaction
processing, are delegated to the database management system in the case of
enterprise geodatabases.
 The GIS application retains responsibility for defining the specific schema used to
represent various geographic datasets and for domain-specific logic, which maintains
the integrity and utility of the underlying records.
 In effect, the database management system is used as one of a series of
implementation mechanisms for persisting geographic datasets.
 However, the database management system or file structure does not fully define
the semantics of the geographic data.
 This could be considered a multitier architecture (application and storage), where
aspects related to data storage and retrieval are implemented in the data storage
tier as simple tables, while high-level data integrity and information processing
functions are retained in the application and domain software (ArcGIS).
 The separation of geodatabase logic from storage enables open support for
numerous file types, database management systems, and XML.
 For example, access to almost any feature and tabular data format is provided by
the Data Interoperability extension to ArcGIS. This extension provides a gateway to
read and work with dozens of data formats using the geodatabase logic.

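Chapter Nine
Unified Modeling Language (UML)
9. UML Diagram – What is UML?
The Unified Modeling Language (UML) is a standard language for specifying, visualizing,
constructing, and documenting the artifacts of software systems. This chapter covers: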

 UML basics
 Use case diagram
 Class diagram
 Activity diagram
 Sequence diagram
 StateMachine diagram
Conceptual Data Modeling with UML
• Motivation
• ER Model does not allow user defined operations
• Object oriented software development uses UML
• UML stands for Unified Modeling Language
• It is a standard consisting of several diagrams
• class diagrams are most relevant for data modeling
• UML class diagram concepts
• Attributes are simple or composite properties
• Methods represent operations, functions and procedures
• A class is a collection of attributes and methods
• Relationships relate classes
• Example UML class diagram: Figure below
UML Class Diagram with Pictograms: Example
• Exercise: Identify classes, attributes, methods, relationships in Fig. 2.8.
• Compare Fig. 2.8 with corresponding ER diagram in Fig. 2.7.

Comparing UML Class Diagrams to ER Diagrams
• Concepts in UML class diagrams vs. those in ER diagrams
  • A class without methods is an entity
  • Attributes are common to both models
  • UML does not have key attributes or integrity constraints
  • ERDs do not have methods
  • Relationship properties are richer in ERDs
  • Entities in an ER diagram relate to datasets, but a UML class diagram
    can contain classes which have little to do with data
Relationships in Class Diagrams
• Association -- a relationship between instances of the two classes. There is an
association between two classes if an instance of one class must know about the
other in order to perform its work. In a diagram, an association is a link connecting
two classes.
• Aggregation -- an association in which one class belongs to a collection. An
aggregation has a diamond end pointing to the part containing the whole.
• Generalization -- an inheritance link indicating one class is a superclass of the other.
A generalization has a triangle pointing to the superclass.
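The three relationship kinds map fairly directly onto code. The sketch below is one possible reading in Python; all class names are hypothetical, and the mapping from UML to code is a convention, not something the notes define.

```python
class SpatialObject:
    """Superclass: the target of the generalization triangle."""

    def __init__(self, name):
        self.name = name

    def dimension(self):
        raise NotImplementedError


class Point(SpatialObject):      # generalization: Point is a SpatialObject
    def dimension(self):
        return 0


class Curve(SpatialObject):      # generalization: Curve is a SpatialObject
    def dimension(self):
        return 1


class Layer:
    """Aggregation: a layer (the whole) groups spatial objects (the parts)."""

    def __init__(self, title):
        self.title = title
        self.features = []

    def add(self, feature):
        self.features.append(feature)


class Renderer:
    """Association: a renderer must know about a layer to do its work."""

    def __init__(self, layer):
        self.layer = layer

    def describe(self):
        return [f"{f.name} ({f.dimension()}D)" for f in self.layer.features]


layer = Layer("base map")
layer.add(Point("city"))
layer.add(Curve("river"))
print(Renderer(layer).describe())  # ['city (0D)', 'river (1D)']
```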

 Use case diagram

 Sequence Diagram

 Activities Diagram

 State Machine Diagram

CHAPTER TEN
Web databases processing
Why is ‘Databases on the Web’ Important?

 Databases are established technology for managing large amounts of data

 The Web is a good way to present information

 Separating data management from presentation improves efficiency


 updating
 finding information
Examples of Websites Using Databases

 Organizational information services


 employee directories

 Booking & scheduling


 airlines, university courses signup

 Electronic commerce

 Website automation
 www.yahoo.com
 www.webmonkey.com
How to Integrate Databases and the Web?
• Databases
  • MS Access, MySQL, mSQL, Oracle, Sybase, MS SQL Server
• Integration tools
  • PHP or CGI, Servlets, JSP, ASP, etc.
  • “Middleware”: e.g. ColdFusion
    • http://www.allaire.com/

References
Spatial Databases: With Application to GIS (Morgan Kaufmann Series in Data Management
Systems), 1st edition
Spatial Data Management, M. Tamer Ozsu, University of Waterloo
