Spatial Databases (All Chapters)(Seng 3174) (1)
Spatial databases provide structures for the storage and analysis of spatial data.
Queries to retrieve and analyze spatial data from a standard database would be long.
Spatial databases provide much more efficient storage, retrieval, and analysis of
spatial data.
A spatial database is an ORDBMS (object-relational DBMS) that can store, query, manipulate, and
analyze spatial data as well as traditional data formats.
It provides efficient algorithms for spatial joins.
It defines data types for points, lines, polygons, multipoints, multilines, and
multipolygons.
These databases have built-in functions for manipulating spatial data, anywhere from
100 to 300 functions.
The most common are functions for querying data, such as overlap, intersect, and touch.
Also included are geoprocessing functions such as union, merge, and buffer.
1.1. Properties of Spatial Database
A spatial database is optimized to store and query data that represent objects
defined in a geometric space.
1. It is a database system
2. It offers spatial data types (SDTs) in its data model and query language.
3. It supports spatial data types in its implementation, providing at least spatial
indexing and efficient algorithms for spatial join.
1.2. Relationship between spatial databases and GIS
What is GIS?
Defining Geographic Information Systems (GIS)
A powerful set of tools for
collecting, storing, retrieving, transforming, and displaying spatial data from the real
world. (Burrough, 1986)
A computerized database management system for
the capture, storage, retrieval, analysis and display of spatial (locationally defined)
data. (NCGIA, 1987)
A decision support system involving the integration of spatially referenced data in a
problem solving environment. (Cowen, 1988)
…intuitive description
A map with a database behind it.
A virtual representation of the real world and its infrastructure.
A consistent “as-built” of the real world, natural and manmade
Which is
queried to support on-going operations
summarized to support strategic decision making and policy formulation
analyzed to support scientific inquiry
What’s in a GIS?
A GIS has three main components:
a Database Management System;
a Spatial Analytical Toolkit;
a Mapping Package.
GIS technology integrates common database operations (such as query and
statistical analysis) with the unique visualisation and geographic analysis benefits
offered by maps.
GIS is software for visualizing and analyzing spatial data using spatial analysis functions.
A GIS uses an SDBMS to store, search, query, and share large spatial data sets.
SDBMS focuses on
Efficient storage, querying, sharing of large spatial datasets
Provides simpler set based query operations
Example operations: search by region, overlay, nearest neighbor, distance,
adjacency, perimeter etc.
Uses spatial indices and query optimization to speed up queries over large
spatial datasets.
SDBMS may be used by applications other than GIS
Astronomy, Genomics, Multimedia information systems, ...
Would one use a GIS or an SDBMS to answer the following?
How many neighboring countries does USA have?
Which country has highest number of neighbors?
EVOLUTION OF ACRONYM “GIS”
Geographic Information Systems (1980s)
Geographic Information Science (1990s)
Geographic Information Services (2000s)
Data types
The data types explained in this topic include the data types available when
creating a feature class or table with ArcGIS.
If you store your data in a database or a geodatabase in a database management
system (DBMS), ArcGIS data types and the data types of the DBMS might not
match exactly.
The types are matched to the closest data type available. This process is referred
to as data type mapping.
In this process, it is possible that the values will be stored in the DBMS as a
different type, applying different criteria to the data attribute. As a result, the
data type you see in the table or feature class properties in ArcGIS Desktop may
change from what you initially defined.
Additionally, other data storage formats, such as shapefiles or dbf tables, have
different data type limitations.
Be sure you know the data type and size limitations of your destination storage
format when moving data between data storage types.
Numbers
You can store numbers in one of four numeric data types:
Short integer
Long integer
Data type: Float (single-precision floating-point number)
Storable range: approximately -3.4E38 to 1.2E38
Size (bytes): 4
Applications: Numeric values with fractional values within a specific range
1.4 Metadata
Data that provide information about other data.
Metadata summarizes basic information about data, making finding & working with
particular instances of data easier.
Metadata can be created manually to be more accurate, or automatically and
contain more basic information.
Layers: spatial objects can be grouped into layers, also called overlays, coverage or
themes
Metadata
inside: geo × regions → bool
valid topological relationships between two simple regions: disjoint, in, touch, equal,
cover, overlap
Data for a GIS comes in three basic forms:
Spatial data:
maps are made of spatial data, made up of points, lines, and areas.
Spatial data forms the locations and shapes of map features such as buildings,
streets, or cities.
Spatial data are derived from existing maps or aerial photographs.
Image data—using images to build maps
Image data includes such diverse elements as satellite images, aerial photographs,
and scanned data, that is, data that have been converted from paper to digital format.
Tabular data:
A database management system (DBMS) is used to let the user define the
specific data element types and formats and to store attributes.
Lines or arcs are a series of ordered points. Areas or polygons are also stored as
ordered lists of points
Vector data require less computer storage space, and maintaining topological
relationships is easier in this system.
The raster view of the world:
A raster-based system displays, locates, and stores graphical data by using a matrix
or grid of cells.
These data are two-dimensional; a GIS stores information such as forest cover,
soil type, land use, wetland habitat, or other data in different layers.
o Raster data require less processing than vector data, but consume more
computer storage space.
This number corresponds to another table in the database with all of the spatial
reference systems used.
This allows the database to know what projection each table is in and, if need be, to
re-project those tables for calculations.
Example of a table in Spatial Database
Linkage of Tabular Attributes to Map Feature
How does GIS link between spatial and non-spatial data?
How does it link a Location Symbol with Its Meaning?
• Every geographic feature has at least one unique means of identification: a name or
number usually just called its ID.
• In other words, locational information is linked to specific information in a database
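The ID-based linkage described above can be sketched with two small tables. This is a minimal illustration using Python's built-in sqlite3 module; the table names (parcel_shapes, parcel_attrs), the columns, and the rows are hypothetical and not taken from any particular GIS.

```python
import sqlite3

# Hypothetical schema: one table for locations, one for descriptive attributes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE parcel_shapes (id INTEGER PRIMARY KEY, wkt TEXT);
CREATE TABLE parcel_attrs  (id INTEGER PRIMARY KEY, owner TEXT, land_use TEXT);
INSERT INTO parcel_shapes VALUES (1, 'POINT(10 20)'), (2, 'POINT(30 40)');
INSERT INTO parcel_attrs  VALUES (1, 'Abebe', 'residential'), (2, 'Sara', 'forest');
""")

# The shared ID is what ties a map feature to its meaning in the attribute table.
linked = conn.execute("""
    SELECT s.id, s.wkt, a.owner, a.land_use
    FROM parcel_shapes s JOIN parcel_attrs a ON s.id = a.id
    ORDER BY s.id
""").fetchall()
print(linked)
```

Clicking a symbol on the map corresponds to looking up its ID in the attribute table; here the join does that lookup for every feature at once.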
Chapter two
2. Databases versus database management systems
2.1. Database
It is a group of records and files that are organized so that there is little or no
redundancy.
A database consists of data in many files. In order to access data from one or more
files easily it is necessary to have some kind of structures or organization.
Databases are usually designed to manage large bodies of information. This involves
providing for the safety of information in the database (crash recovery and security).
Data integrity is a general term for the quality and the reliability of the data in a database
system.
In a broader sense, data integrity also includes the protection of the database from
unauthorized access (confidentiality) and unauthorized changes.
Data reflect facts of the real world.
Data Persistence
Data persistence means that in a DBMS all data is maintained as long as it is not
deleted explicitly.
The life span of data needs to be determined directly or indirectly by the user and
must not depend on system features.
Additionally, data once stored in a database must not be lost.
Changes of a database which are done by a transaction are persistent.
Advantages of databases
Compared with a flat-file system, a database system has several advantages.
Less redundancy
In a flat-file system there is a lot of redundancy. For example, in the flat file system
for a university, the names of professors and students are stored in more than one
file.
Inconsistency avoidance
If the same piece of information is stored in more than one place, then any changes in the
data need to occur in all places that data is stored.
Efficiency
A database is usually more efficient than a flat file system, because a piece of
information is stored in fewer locations.
Data integrity
In a database system it is easier to maintain data integrity, because a piece of data is
stored in fewer locations.
Confidentiality
It is easier to maintain the confidentiality of the information if the storage of data is
centralized in one location.
Database Management Systems
Database management systems (DBMSs) are very good at organizing and managing
large collections of persistent data.
We use DBMSs to help cope with large amounts of data because, when problems get
big, they get hard.
Consider the task of finding a particular book in a typical university library.
Now, reconsider that same task if the library doesn’t keep the books arranged in any
particular order or if the library has no indexes.
o Using a big collection of unorganized things is practically
impossible. Structure turns data into information.
o Persistence means that the data exist permanently; they do not disappear
when the computer is shut off.
DBMSs are like suitcases: they are somewhere to put stuff so that it’s all in one place
and easy to get to.
DBMSs help protect data from unauthorized access.
DBMSs help protect data from accidental corruption or loss due to:
o hardware failures such as power outages and computer crashes
o software failures such as operating system crashes
DBMSs allow concurrent access, meaning that a single data set can be accessed by
more than one user at a time
o virtually all commercial database applications require the data entry staff to
have access to the database simultaneously.
For example, an airline reservation system cannot restrict access to
the database to a single travel agent.
o concurrent data access introduces unwanted problems caused by two users
manipulating exactly the same data at exactly the same time.
These problems can cause the database to be corrupted or for a
user’s interface program to never complete its query.
These problems are analogous to road intersections: if there are no
traffic lights or stop signs, havoc will ensue.
The goal of a DBMS is to provide an environment that is
both convenient and efficient to use in
o retrieving information from the database,
o storing information into the database, and
o controlling access to data in the database.
Advantages of DBMS
Data independence:
Application programs should be as independent as possible from details of data
representation and storage.
The DBMS can provide an abstract view of the data to insulate application code from
such details.
Efficient data access:
A DBMS utilizes a variety of sophisticated techniques to store and retrieve data
efficiently. This feature is especially important if the data is stored on external
storage devices.
Data integrity and security:
If data is always accessed through the DBMS, the DBMS can enforce integrity
constraints on the data
Data administration:
When several users share the data, centralizing the administration of data can offer
significant improvements.
Experienced professionals who understand the nature of the data being managed,
and how different groups of users use it, can be responsible for organizing the data
representation to minimize redundancy and fine-tuning the storage of the data to
make retrieval efficient.
Concurrent access and crash recovery:
A DBMS schedules concurrent accesses to the data in such a manner that users can
think of the data as being accessed by only one user at a time. Further, the DBMS
protects users from the effects of system failures.
Reduced application development time:
The DBMS supports many important functions that are common to many
applications accessing data stored in the DBMS.
Disadvantages of DBMS
Danger of an Overkill: For small and simple applications for single users a database
system is often not advisable.
Complexity: A database system creates additional complexity and requirements. The
supply and operation of a database management system with several users and
databases is quite costly and demanding.
Qualified Personnel: The professional operation of a database system requires
appropriately trained staff. Without a qualified database administrator nothing will
work for long.
Costs: Through the use of a database system new costs are generated for the system
itself but also for additional hardware and the more complex handling of the system.
Lower Efficiency: A database system is a multi-use software which is often less
efficient than specialized software which is produced and optimized exactly for one
problem.
DBMS is a combination of five components: hardware, software, data, users and
procedures
The hardware is the physical computer system that allows access to data.
The software is the actual program that allows users to access, maintain and update
data. In addition, the software controls which user can access which parts of the
data in the database.
In a DBMS, the term "users" has a broad meaning. We can divide users into two
categories: end users and application programs.
The last component of a DBMS is a set of procedures or rules that should be clearly
defined and followed by the users of the database.
Spatial Database Management System
Spatial Database Management System (SDBMS) provides the capabilities of a
traditional database management system (DBMS) while allowing special storage and
handling of spatial data.
SDBMS:
– Works with an underlying DBMS
– Allows spatial data models and types
– Supports querying language specific to spatial data types
– Provides handling of spatial data and operations
How do we incorporate spatial data into a computer application system?
Usually by using a relational Data Base Management System (DBMS)
GIS systems traditionally maintain spatial and attribute data separately, then “join” them for
display or analysis.
The spatial data can be stored in vector or raster format
Vector format represents data as a series of (X, Y) coordinates
Raster format represents data as a matrix of rows and columns (pixels,
cells)
Accuracy
Vector data are accurate and take less storage, but take a long time to capture, e.g. by
digitization
Raster data are less accurate and take more storage, but take a short time to capture, e.g. by
scanning
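The difference between the two formats can be sketched in a few lines. This is a minimal illustration, assuming an axis-aligned square and a 5 x 5 grid of unit cells; the helper rasterize_box is hypothetical and is not a general polygon rasterizer.

```python
# Vector: a square stored as an ordered list of (x, y) vertices.
vector_square = [(1.0, 1.0), (4.0, 1.0), (4.0, 4.0), (1.0, 4.0)]

def rasterize_box(vertices, n_rows, n_cols):
    """Mark each unit cell whose center lies inside the bounding box of the vertices."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    x_min, x_max, y_min, y_max = min(xs), max(xs), min(ys), max(ys)
    grid = []
    for row in range(n_rows):
        cells = []
        for col in range(n_cols):
            cx, cy = col + 0.5, row + 0.5  # center of the cell
            inside = x_min <= cx <= x_max and y_min <= cy <= y_max
            cells.append(1 if inside else 0)
        grid.append(cells)
    return grid

# Raster: the same square stored as a matrix of cells.
raster_square = rasterize_box(vector_square, 5, 5)
for line in raster_square:
    print(line)
```

The vector form stores 4 vertices (8 numbers) and keeps the exact boundary; the raster form stores 25 cell values and only approximates the boundary to the nearest cell.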
SDBMS Three-layer Structure
SDBMS works with a spatial application at the
front end and a DBMS at the back end
SDBMS has three layers:
Interface to spatial application
Core spatial functionality
Interface to DBMS
2.1. Data independence
What is Data Independence of DBMS?
Data independence is defined as a property of a DBMS that lets you change the database
schema at one level of a database system without requiring changes to the schema at the
next higher level. Data independence helps you to keep data separated from all programs
that make use of it.
Types of Data Independence
In DBMS there are two types of data independence
1. Physical data independence
2. Logical data independence.
1. Physical Data Independence
Physical data independence helps you to separate conceptual levels from the
internal/physical levels.
It allows you to provide a logical description of the database without the need to
specify physical structures.
Compared to Logical Independence, it is easy to achieve physical data
independence.
With physical independence, you can easily change the physical storage structures
or devices without an effect on the conceptual schema.
Any change done would be absorbed by the mapping between the conceptual and
internal levels.
Physical data independence is achieved by the presence of the internal level of the
database and then the transformation from the conceptual level of the database to
the internal level.
Examples of changes under Physical Data Independence
Due to physical independence, none of the changes below affects the conceptual layer.
Using a new storage device like Hard Drive or Magnetic Tapes
Modifying the file organization technique in the Database
Switching to different data structures.
Changing the access method.
Modifying indexes.
Changes to compression techniques or hashing algorithms.
Change of Location of Database from say C drive to D Drive
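One of the changes listed above, modifying indexes, can be sketched with Python's built-in sqlite3 module. The table, index name, and rows are hypothetical; the point is only that the query text and its result stay the same when the physical access structure changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city (name TEXT, country TEXT, pop INTEGER)")
conn.executemany(
    "INSERT INTO city VALUES (?, ?, ?)",
    [("A", "X", 100), ("B", "Y", 200), ("C", "X", 300)],  # illustrative rows
)

# The application only knows this query, written against the conceptual schema.
query = "SELECT name FROM city WHERE country = 'X' ORDER BY name"
before = conn.execute(query).fetchall()

# Physical-level change: add an index. The query text is untouched.
conn.execute("CREATE INDEX idx_city_country ON city (country)")
after = conn.execute(query).fetchall()

print(before == after)
```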
2. Logical Data Independence
Logical data independence is the ability to change the conceptual schema without changing:
External views
External APIs or programs
Any change made will be absorbed by the mapping between external and conceptual levels.
When compared to Physical Data independence, it is challenging to achieve logical data
independence.
Examples of changes under Logical Data Independence
Due to logical independence, none of the changes below affects the external layer.
1. Adding, modifying, or deleting an attribute, entity, or relationship is possible without a
rewrite of existing application programs
2. Merging two records into one
3. Breaking an existing record into two or more records
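The third change, breaking a record into two, can be sketched as follows. This is a minimal illustration with Python's built-in sqlite3 module, assuming the application reads only through a view; the names staff, staff_core, staff_contact, and staff_directory are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE staff (id INTEGER PRIMARY KEY, name TEXT, phone TEXT);
INSERT INTO staff VALUES (1, 'Hana', '555-0101');
CREATE VIEW staff_directory AS SELECT id, name, phone FROM staff;
""")

# The application is written against the external view only.
app_query = "SELECT name, phone FROM staff_directory WHERE id = 1"
before = conn.execute(app_query).fetchall()

# Conceptual-level change: split the staff record into two tables,
# then redefine the view (the external/conceptual mapping) to absorb the change.
conn.executescript("""
DROP VIEW staff_directory;
CREATE TABLE staff_core    (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE staff_contact (id INTEGER PRIMARY KEY, phone TEXT);
INSERT INTO staff_core    SELECT id, name  FROM staff;
INSERT INTO staff_contact SELECT id, phone FROM staff;
DROP TABLE staff;
CREATE VIEW staff_directory AS
    SELECT c.id, c.name, t.phone
    FROM staff_core c JOIN staff_contact t ON c.id = t.id;
""")
after = conn.execute(app_query).fetchall()
print(before == after)
```

Only the view definition changed; the application query did not. Updates through such a view are harder to support, which is one reason logical independence is more difficult to achieve than physical independence.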
Importance of Data Independence
Helps you to improve the quality of the data
Database system maintenance becomes affordable
Enforcement of standards and improvement in database security
You don’t need to alter data structure in application programs
Permit developers to focus on the general structure of the Database rather than
worrying about the internal implementation
It helps keep the database in an undamaged, undivided state
Database inconsistency is vastly reduced.
Modifications at the physical level, when needed to improve the performance
of the system, are easy to make.
Summary
Data independence is the property of a DBMS that lets you change the database
schema at one level of a database system without requiring changes to the schema at
the next higher level.
The two types of data independence are 1) physical and 2) logical.
Physical data independence helps you to separate conceptual levels from the
internal/physical levels.
Logical data independence is the ability to change the conceptual schema without
changing external views or external APIs and programs.
Compared to physical data independence, logical data independence is harder to
achieve.
Data independence helps you to improve the quality of the data.
Chapter three
Database management system architecture
3.1. What is Database Architecture?
A Database Architecture is a representation of DBMS design. It helps to design, develop,
implement, and maintain the database management system.
• DBMS architecture allows dividing the database system into individual components that
can be independently modified, changed, replaced, and altered. It also helps to understand
the components of a database
• Data modeling (data modelling) is the process of creating a data model for the data to be
stored in a database. This data model is a conceptual representation of Data objects, the
associations between different data objects, and the rules.
• Data modeling helps in the visual representation of data and enforces business rules,
regulatory compliance, and government policies on the data.
Types of Data Models in DBMS
Types of Data Models: There are mainly three different types of data models: conceptual
data models, logical data models, and physical data models, and each one has a specific
purpose. The data models are used to represent the data and how it is stored in the
database and to set the relationship between data items.
Conceptual Data Model:
This Data Model defines WHAT the system contains. This model is typically created
by Business stakeholders and Data Architects.
The purpose is to organize, scope and define business concepts and rules.
Physical Data Model:
This data model describes HOW the system will be implemented using a specific
DBMS.
This model is typically created by DBAs and developers.
The purpose is the actual implementation of the database.
Chapter Four
Database design approaches
4. Database design approaches:
There are two approaches for developing any database, the top-down method and the
bottom-up method. While these approaches appear radically different, they share the
common goal of utilizing a system by describing all of the interaction between the
processes.
4.1. Top-down design method
The top-down design method starts from the general and moves to the specific. In other
words, you start with a general idea of what is needed for the system and then work your
way down to the more specific details of how the system will interact. This process involves
the identification of different entity types and the definition of each entity’s attributes.
Chapter Five
Normalization
5. Normalization:
The decomposition of a complex data structure into simple, flat files
(relations). Normalization creates separate files that have common data fields, replacing
the associations represented by pointers and keys in hierarchical and network data
• This is the process that allows you to winnow out redundant data within your
database.
• This involves restructuring the tables to successively meet higher normal
forms.
• A properly normalized database should have the following characteristics:
– Scalar values in each field
– Absence of redundancy.
– Minimal use of null values.
– Minimal loss of information
Levels of Normalization
• Levels of normalization are based on the amount of redundancy in the database.
• Various levels of normalization are:
– First Normal Form (1NF)
– Second Normal Form (2NF)
– Third Normal Form (3NF)
– Boyce-Codd Normal Form (BCNF)
– Fourth Normal Form (4NF)
– Fifth Normal Form (5NF)
– Domain Key Normal Form (DKNF)
• First Normal Form (1NF)
• A table is considered to be in 1NF if all the fields contain
only scalar values (as opposed to lists of values).
• Example (Not 1NF)
• Author and AuPhone columns are not scalar
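A non-scalar Author column can be brought into 1NF by emitting one row per author. The sketch below is only an illustration: the rows, the isbn and title fields, and the helper to_1nf are hypothetical, and the AuPhone column is left out for brevity.

```python
# Hypothetical rows that violate 1NF: "author" holds a list packed into one field.
non_1nf = [
    {"isbn": "0-00-000001", "title": "Book A", "author": "Smith, Jones"},
    {"isbn": "0-00-000002", "title": "Book B", "author": "Lee"},
]

def to_1nf(rows):
    """Return one row per (book, author) pair so every field is scalar."""
    flat = []
    for row in rows:
        for author in row["author"].split(","):
            flat.append(
                {"isbn": row["isbn"], "title": row["title"], "author": author.strip()}
            )
    return flat

flat_rows = to_1nf(non_1nf)
for row in flat_rows:
    print(row)
```

The result has three rows. The title is now repeated for the book with two authors, which is the kind of redundancy the higher normal forms then remove.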
• Functional Dependencies
1. If one set of attributes in a table determines another set of attributes in the table,
then the second set of attributes is said to be functionally dependent on the first set
of attributes.
Example 1
Third Normal Form (3NF)
This form dictates that all non-key attributes of a table must be functionally dependent on a
candidate key i.e. there can be no interdependencies among non-key attributes.
For a table to be in 3NF, there are two requirements
– The table should be second normal form
– No attribute is transitively dependent on the primary key
Example (Not in 3NF)
Scheme {Title, PubId, PageCount, Price}
1. Key {Title, PubId}
2. {Title, PubId} → {PageCount}
3. {PageCount} → {Price}
4. Both Price and PageCount depend on a key, hence 2NF
5. Transitively {Title, PubId} → {Price}, hence not in 3NF
Boyce-Codd Normal Form (BCNF)
• BCNF does not allow dependencies between attributes that belong to candidate
keys.
• BCNF is a refinement of the third normal form in which it drops the restriction of a
non-key attribute from the 3rd normal form.
• Third normal form and BCNF are not same if the following conditions are true:
– The table has two or more candidate keys
– At least two of the candidate keys are composed of more than one attribute
– The keys are not disjoint i.e. The composite candidate keys share some
attributes
Example 1 - Address (Not in BCNF)
Scheme {City, Street, ZipCode }
1. Key1 {City, Street }
2. Key2 {ZipCode, Street}
3. No non-key attribute hence 3NF
4. {City, Street} → {ZipCode}
5. {ZipCode} → {City}
6. Dependency between attributes belonging to a key
– Fourth Normal Form (4NF)
• Fourth normal form eliminates independent many-to-one relationships between
columns.
• To be in Fourth Normal Form,
– a relation must first be in Boyce-Codd Normal Form.
– a given relation may not contain more than one multi-valued attribute.
Example (Not in 4NF)
Scheme {MovieName, ScreeningCity, Genre}
Primary Key: {MovieName, ScreeningCity, Genre}
1. All columns are a part of the only candidate key, hence BCNF
2. Many Movies can have the same Genre
3. Many Cities can have the same movie
4. Violates 4NF
The relation is in DKNF when there can be no insertion or deletion anomalies in the
database.
Chapter six
6. ER modelling
6.1. Entity-Relationship Model
The Entity-Relationship (ER) model is generally attributed to (Chen 1976).
The ER model envisions the world as comprised of entities that are associated with
each other by relationships. All of the entities of a particular type are collected
together into entity sets.
Entity sets and relationships can be depicted graphically in an ER-diagram.
Entities
Entities are distinguishable “real-world” objects such as employees, maps, airplanes,
or bus schedules.
o “Distinguishable” means that all entities can be uniquely identified.
o Entities have common attributes that define what it means to be such an
entity.
o Any particular real-world object does not necessarily have a single or best
representation as an entity.
For any given real-world object, different modelers can choose
different sets of attributes of the object that are of interest to their
particular situation.
This results in the same object being modeled differently.
one-one: if A r B and r is one-one then each entity of B is in
relationship with at most one entity of A and vice-versa.
For example, if CAPTAIN commands VESSEL and commands is one-one then, in our
model, each vessel has at most one captain and each captain commands at most one
vessel at a time.
many-one: if A r B and r is many-one then each entity of A is in relationship with at
most one entity of B but not vice-versa.
For example, if CREW assigned-to VESSEL and assigned-to is many-one then, in our
model, a vessel has many crew members but a crew member is assigned to only one
vessel.
many-many: if A r B and r is many-many then each entity of A can be in relationship
with any number of B entities and vice-versa.
For example, if VESSEL patrols REGION and patrols is many-many then, in our model,
a vessel patrols many regions and a region is patrolled by many ships.
is-a (read “is a”) relationships: if A is-a B then A is a specialization of B, or,
conversely, B is a generalization of A.
For example, if CAPTAIN is-a CREW then, in our model, captains have all the
attributes of crew members but not vice versa.
The is-a relationship allows hierarchies to be established among entity sets.
A Relationship is depicted by a lozenge with lines connecting it to the relevant entity
sets.
The Entity-Relationship model lacks an underlying formalism and is, therefore, used
more for general conceptualization than for creating physical models
o (indeed, some authors do not acknowledge the ER model as a data model at
all).
o It is not uncommon for a conceptual design to be expressed in the ER model
and then “translated” into another model for implementation.
Entity-Relationship Modelling
Attributes (names inside rectangles)
Relationship Types
Relationship type
Meaningful associations among entity types.
i.e., Link between classes
Name occurs as label on the link
Arrow indicates direction of relationship type
Relationship occurrence
Uniquely identifiable association, which includes one occurrence from each
participating entity type.
i.e., Link between instances
ER diagram of Branch Has Staff relationship type
Chapter seven
Relational databases
7. Relational databases
Understand that the relational database model’s basic components are entities, their
attributes, and relationships among entities
Identify how entities and their attributes are organized into tables
The relational data model was introduced in 1970, when E. F. Codd of IBM published
a paper in CACM entitled "A Relational Model of Data for Large Shared Data
Banks".
It is currently the most popular model. The mathematical simplicity and ease of
visualization of the relational data model have contributed to its success.
Definitions of Terminology
Based on the set theory
There are no duplicate tuples (rows).
The body of the relation is a mathematical set (i.e., a set of tuples), and sets
in mathematics by definition do not include duplicate elements.
Sets in mathematics are not ordered. So, even if a relation A's tuples are
reversely ordered, it is still the same relation.
Thus, there is no such thing as "the 5th tuple" or "the last tuple". In other
words, tuples (rows) are unordered (top to bottom).
2. Attributes (columns) are unordered (left to right).
The remaining candidate keys that are not used as the PK are called alternate keys.
Cost of PK
SS# vs. finger print
Candidate key and alternate key
Could any attribute (column) serve as the PK? Such an attribute is a candidate key.
Is there any candidate key that is not used as the PK? Such a key is an alternate key.
Entity Integrity Rule
Guarantees that each entity will have a unique identity and ensures that foreign key
values can properly reference primary key values.
Requirement
An attribute in one table whose values must either match the primary key in another
table or be null.
Attribute FK of base relation R2 is a foreign key if and only if it satisfies the following
two time-independent properties:
Relationships can be categorized by
cardinality constraints
other properties, e.g. number of participating entities
• Binary relationship: two entities participate
• Exercise:
• List the entities, attributes, relationships in this ER diagram
• Identify cardinality constraint for each relationship.
• How many roads “Accesses” a “Forest_stand”? (one or many)
7.2. Data query: Query optimization
What is a query?
A query is a “question” posed to a database
Queries are expressed in a high-level declarative manner
• Algorithms needed to answer the query are not specified in the query
Examples:
Mouse click on a map symbol (e.g. road) may mean
• What is the name of the road pointed to by the mouse cursor?
Typing a keyword in a search engine (e.g. Google, Yahoo) means
• Which documents on the web contain the given keywords?
SELECT S.name FROM Senator S WHERE S.gender = 'F' means
• Which senators are female?
o What is a query language?
Natural language, e.g. English, can express almost all queries
Conceptual Model
Query Optimization
• A spatial operation can be processed using different strategies
• Computation cost of each strategy depends on many parameters
• Query optimization is the process of
• ordering operations in a query and
• selecting an efficient strategy for each operation
• based on the details of a given dataset
• Example Query:
SELECT S.name FROM Senator S, Business B
WHERE S.soc_sec = B.soc_sec AND S.gender = 'Female'
• Optimization decision examples
• Process (S.gender = 'Female') before (S.soc_sec = B.soc_sec)
• Do not use index for processing (S.gender = 'Female')
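The first optimization decision, processing the selection before the join, can be sketched in plain Python. This is a toy model with a nested-loop join and made-up rows; real optimizers use cost estimates and other join algorithms, so the comparison counts here are only illustrative.

```python
# Illustrative rows; names and values are hypothetical.
senators = [
    {"name": "Ann", "soc_sec": 1, "gender": "Female"},
    {"name": "Bob", "soc_sec": 2, "gender": "Male"},
    {"name": "Cy", "soc_sec": 3, "gender": "Female"},
]
businesses = [{"soc_sec": 1}, {"soc_sec": 2}]

def join_then_filter(S, B):
    """Nested-loop join over all senators, testing gender inside the loop."""
    out, comparisons = [], 0
    for s in S:
        for b in B:
            comparisons += 1
            if s["soc_sec"] == b["soc_sec"] and s["gender"] == "Female":
                out.append(s["name"])
    return out, comparisons

def filter_then_join(S, B):
    """Apply the gender selection first, then join the smaller set."""
    females = [s for s in S if s["gender"] == "Female"]
    out, comparisons = [], 0
    for s in females:
        for b in B:
            comparisons += 1
            if s["soc_sec"] == b["soc_sec"]:
                out.append(s["name"])
    return out, comparisons

naive_result, naive_cost = join_then_filter(senators, businesses)
pushed_result, pushed_cost = filter_then_join(senators, businesses)
print(naive_result, naive_cost, pushed_result, pushed_cost)
```

Both orderings return the same answer, but the filtered version compares fewer pairs. That is why the ordering of operations is left to the optimizer rather than written into the query.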
7.4 Structured Query Language
What is SQL?
Administrative tasks, e.g. set up database users, security permissions
UPDATE statement can change values within selected rows
Querying populated Tables in SQL
• SELECT statement
• The commonly used statement to query data in one or more tables
• Returns a relation (table) as result
• Has many clauses
• Can refer to many operators and functions
• Allows nested queries which can be hard to understand
• Scope of our discussion
• Learn enough SQL to appreciate spatial extensions
• Observe example queries
• Read and write simple SELECT statement
• Understand frequently used clauses, e.g. SELECT, FROM, WHERE
• Understand a few operators and functions
SELECT Statement- General Information
• Clauses
• SELECT specifies desired columns
• FROM specifies relevant tables
• WHERE specifies qualifying conditions for rows
• ORDER BY specifies sorting columns for results
• GROUP BY, HAVING specify aggregation and statistics
• Operators and functions
• arithmetic operators, e.g. +, -, …
• comparison operators, e.g. =, <, >, BETWEEN, LIKE, …
• logical operators, e.g. AND, OR, NOT, EXISTS, …
• set operators, e.g. UNION, IN, ALL, ANY, …
• statistical functions, e.g. SUM, COUNT, …
• many other operators on strings, date, currency, ...
SELECT Example 1.
• Simplest Query has SELECT and FROM clauses
• Query: List all the cities and the country they belong to.
SELECT Name, Country
FROM CITY
SELECT Example 2.
• Commonly 3 clauses (SELECT, FROM, WHERE) are used
• Query: List the capital cities in the CITY table (SELECT * returns every column of each
qualifying row).
SELECT *
FROM CITY
WHERE CAPITAL = 'Y'
Multi-table Query Examples
Query: List the capital cities and populations of countries whose GDP exceeds one trillion
dollars.
Note: Tables City and Country are joined by matching "City.Country = Country.Name". This
simulates the relational operator "join".
Query: What is the name and population of the capital city in the country where the St.
Lawrence River originates?
SELECT Ci.Name, Ci.Pop
FROM City Ci, Country Co, River R
WHERE R.Origin = Co.Name
AND Co.Name = Ci.Country
AND R.Name = 'St. Lawrence'
AND Ci.Capital = 'Y'
Note: Three tables are joined together, a pair at a time. River.Origin is matched with
Country.Name, and City.Country is matched with Country.Name. The order of the joins is decided
by the query optimizer and does not affect the result.
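The three-table join can be tried out with a minimal sketch using Python's built-in sqlite3 module. The table layout follows the column names used in the queries above. The rows and numeric values are placeholders made up for illustration, not real statistics.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Country (Name TEXT PRIMARY KEY, Cont TEXT, GDP REAL);
CREATE TABLE City (Name TEXT, Country TEXT, Pop REAL, Capital TEXT);
CREATE TABLE River (Name TEXT, Origin TEXT, Length REAL);
-- Placeholder rows; the numbers are illustrative only.
INSERT INTO Country VALUES ('USA', 'NAM', 100.0), ('Canada', 'NAM', 10.0);
INSERT INTO City VALUES ('Washington', 'USA', 3.0, 'Y'),
                        ('Chicago', 'USA', 9.0, 'N'),
                        ('Ottawa', 'Canada', 1.0, 'Y');
INSERT INTO River VALUES ('St. Lawrence', 'USA', 1200.0),
                         ('Mackenzie', 'Canada', 1700.0);
""")

# River is joined to Country on Origin, and Country to City on Country name.
rows = conn.execute("""
SELECT Ci.Name, Ci.Pop
FROM City Ci, Country Co, River R
WHERE R.Origin = Co.Name
  AND Co.Name = Ci.Country
  AND R.Name = 'St. Lawrence'
  AND Ci.Capital = 'Y'
""").fetchall()
print(rows)
```

Reordering the three conditions in the WHERE clause leaves the result unchanged, which is the point made in the note about join order.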
Query: What is the average population of the noncapital cities listed in the City table?
SELECT AVG(Ci.Pop)
FROM City Ci
WHERE Ci.Capital = 'N'
Query: For each continent, find the average GDP.
SELECT Co.Cont, Avg(Co.GDP) AS Continent_GDP
FROM Country Co
GROUP BY Co.Cont
Query Example: HAVING clause, Nested queries
Query: For each country in which at least two rivers originate, find the length of the
shortest river.
SELECT R.Origin, MIN(R.length) AS Min_length
FROM River R
GROUP BY R.Origin
HAVING COUNT(*) > 1
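A minimal sqlite3 sketch of the GROUP BY and HAVING behavior follows. The river names, origins and lengths are hypothetical placeholders chosen so that only one origin has at least two rivers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE River (Name TEXT, Origin TEXT, Length REAL);
-- Hypothetical rows: two rivers originate in A, one in B.
INSERT INTO River VALUES ('R1', 'A', 300.0),
                         ('R2', 'A', 120.0),
                         ('R3', 'B', 500.0);
""")

# GROUP BY forms one group per origin; HAVING keeps groups with 2+ rivers.
rows = conn.execute("""
SELECT R.Origin, MIN(R.Length) AS Min_length
FROM River R
GROUP BY R.Origin
HAVING COUNT(*) > 1
""").fetchall()
print(rows)
```

Origin B is dropped by HAVING because its group contains a single river, whereas a WHERE clause could not express this condition since it is evaluated before grouping.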
Query: List the countries whose GDP is greater than that of Canada.
SELECT Co.Name
FROM Country Co
WHERE Co.GDP > ANY (SELECT Co1.GDP
FROM Country Co1
WHERE Co1.Name = 'Canada')
Extending SQL for Spatial Data
Motivation
SQL has simple atomic data-types, like integer, dates and string
The OGIS spatial data model consists of a base class Geometry and four subclasses:
Point, Curve, Surface and GeometryCollection
Operations fall into three categories:
Apply to all geometry types
• SpatialReference, Envelope, Export, IsSimple, Boundary
Predicates for Topological relationships
• Equal, Disjoint, Intersect, Touch, Cross, Within, Contains
Spatial Data Analysis
• Distance, Buffer, Union, Intersection, ConvexHull, SymDiff
Spatial Queries with SQL/OGIS
• SQL/OGIS - General Information
• Both standards are being adopted by many vendors
• The choice of spatial data types and operations is similar
• Syntax differs from vendor to vendor
• Readers may need to alter SQL/OGIS queries given in text to make them run
on specific commercial products
• Using OGIS with SQL
• Spatial data types can be used in DML to type columns
• Spatial operations can be used in DML
• Scope of discussion
• Illustrate use of spatial data types with SQL
• Via a set of examples
List of Spatial Query Examples
• Simple SQL SELECT_FROM_WHERE examples
• Spatial analysis operations
• Unary operator: Area
• Binary operator: Distance
• Boolean Topological spatial operations - WHERE clause
• Touch
• Cross
• Using spatial analysis and topological operations
• Buffer, overlap
• Complex SQL examples
• Aggregate SQL queries
• Nested queries
Using spatial operation in SELECT clause
Query: List the name, population, and area of each country listed in the Country table.
SELECT C.Name, C.Pop, Area(C.Shape) AS "Area"
FROM Country C
Note: This query uses the spatial operation Area(). Note the use of a spatial
operation in place of a column in the SELECT clause.
Using spatial operator Distance
Query: List the GDP and the distance of a country’s capital city to the equator for all
countries.
SELECT Co.GDP, Distance(Point(0,Ci.Shape.y),Ci.Shape) AS "Distance"
FROM Country Co, City Ci
WHERE Co.Name = Ci.Country
AND Ci.Capital = 'Y'
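Query: Find the names of all countries that are neighbors of the USA in the Country table.
SELECT C1.Name
FROM Country C1, Country C2
WHERE Touch(C1.Shape,C2.Shape)=1
AND C2.Name = 'USA'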
Note: Spatial operator Touch() is used in WHERE clause to join Country table with itself. This
query is an example of spatial self join operation.
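A spatial self join of this kind can be sketched with sqlite3 by registering a user-defined Touch() function. Everything here is an assumption made for illustration: shapes are axis-aligned rectangles stored as "xmin,ymin,xmax,ymax" text, the countries A to D are hypothetical, and the rectangle test merely stands in for a real OGIS Touch implementation.

```python
import sqlite3


def touch(a, b):
    """Return 1 if two axis-aligned rectangles share boundary points only."""
    ax1, ay1, ax2, ay2 = map(float, a.split(","))
    bx1, by1, bx2, by2 = map(float, b.split(","))
    # Extent of the overlap along each axis (negative means a gap).
    dx = min(ax2, bx2) - max(ax1, bx1)
    dy = min(ay2, by2) - max(ay1, by1)
    closed_meet = dx >= 0 and dy >= 0     # the closed rectangles intersect
    interiors_meet = dx > 0 and dy > 0    # the overlap has positive area
    return int(closed_meet and not interiors_meet)


conn = sqlite3.connect(":memory:")
conn.create_function("Touch", 2, touch)
conn.executescript("""
CREATE TABLE Country (Name TEXT PRIMARY KEY, Shape TEXT);
-- Hypothetical rectangles: A and B share an edge, B and C share a corner.
INSERT INTO Country VALUES ('A', '0,0,2,2'),
                           ('B', '2,0,4,2'),
                           ('C', '4,2,5,3'),
                           ('D', '10,10,11,11');
""")

# The Country table is joined with itself on the spatial predicate.
neighbors_of_b = conn.execute("""
SELECT C1.Name
FROM Country C1, Country C2
WHERE Touch(C1.Shape, C2.Shape) = 1
  AND C2.Name = 'B'
ORDER BY C1.Name
""").fetchall()
print(neighbors_of_b)
```

B does not appear in its own result because a rectangle overlaps itself with positive area, so the Touch test returns 0 for identical shapes.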
Spatial Query with multiple tables
Query: For all the rivers listed in the River table, find the countries through which they pass.
SELECT R.Name, C.Name
FROM River R, Country C
WHERE Cross(R.Shape,C.Shape)=1
Note: Spatial operation “Cross” is used to join River and Country tables. This query
represents a spatial join operation.
Example Spatial Query…Buffer and Overlap
Query: The St. Lawrence River can supply water to cities that are within 300 km. List the
cities that can use water from the St. Lawrence.
SELECT Ci.Name
FROM City Ci, River R
WHERE Overlap(Ci.Shape, Buffer(R.Shape,300))=1
AND R.Name = 'St. Lawrence'
Recall List of Spatial Query Examples
• Simple SQL SELECT_FROM_WHERE examples
• Spatial analysis operations
• Unary operator: Area
• Binary operator: Distance
• Boolean Topological spatial operations - WHERE clause
• Touch
• Cross
• Using spatial analysis and topological operations
• Buffer, overlap
Using spatial operation in an aggregate query
Query: List all countries, ordered by number of neighboring countries.
SELECT Co.Name, Count(Co1.Name)
FROM Country Co, Country Co1
WHERE Touch(Co.Shape,Co1.Shape)=1
GROUP BY Co.Name
ORDER BY Count(Co1.Name)
Notes: This query can be used to differentiate the querying capabilities of simple GIS software
(e.g. ArcView) and a spatial database. It is quite tedious to carry out this query in a GIS.
Earlier versions of OGIS did not provide spatial aggregate operations to support GIS
operations like reclassify.
Using Spatial Operation in Nested Queries
Query: For each river, identify the closest city.
SELECT C1.Name, R1.Name
FROM City C1, River R1
WHERE Distance(C1.Shape,R1.Shape) <= ALL (SELECT Distance(C2.Shape,R1.Shape)
FROM City C2
WHERE C1.Name <> C2.Name)
Note: Spatial operation Distance used in context of a nested query.
Exercise: The SQL expression that finds only the smallest distance from each river to its
nearest city is much simpler and does not require a nested query. The audience is
encouraged to write a SQL expression for this query.
Nested Spatial Query
Query: List the countries with only one neighboring country. A country is a neighbor of
another country if their land masses share a boundary. According to this definition, island
countries, like Iceland, have no neighbors.
SELECT Co.Name
FROM Country Co
WHERE Co.Name IN (SELECT Co.Name
FROM Country Co, Country Co1
WHERE Touch(Co.Shape,Co1.Shape)=1
GROUP BY Co.Name
HAVING Count(*)=1)
Note: This is a complex nested query with aggregate operations. Such queries can be
split into two expressions, namely a view definition and a query on the view. The inner
query becomes a view and the outer query is run on the view. This is illustrated in the next slide.
Rewriting nested queries using Views
• Views are like tables
• Represent derived data or result of a query
• Can be used to simplify complex nested queries
• Example follows:
CREATE VIEW Neighbor AS
SELECT Co.Name, Count(Co1.Name) AS num_neighbors
FROM Country Co, Country Co1
WHERE Touch(Co.Shape,Co1.Shape)=1
GROUP BY Co.Name
SELECT Name, num_neighbors
FROM Neighbor
WHERE num_neighbors = (SELECT Max(num_neighbors) FROM Neighbor)
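The view-based rewriting can be tried with a small sqlite3 sketch. Since plain SQLite has no Touch() operation, a hypothetical precomputed Borders table (an assumption of this sketch, with made-up countries A, B and C) stands in for the spatial self join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical adjacency pairs standing in for Touch(Co.Shape, Co1.Shape).
CREATE TABLE Borders (Name TEXT, Other TEXT);
INSERT INTO Borders VALUES ('A', 'B'), ('B', 'A'), ('B', 'C'), ('C', 'B');

-- The former inner query becomes a view.
CREATE VIEW Neighbor AS
  SELECT Name, COUNT(Other) AS num_neighbors
  FROM Borders
  GROUP BY Name;
""")

# Outer queries now read the view like an ordinary table.
most = conn.execute("""
SELECT Name, num_neighbors
FROM Neighbor
WHERE num_neighbors = (SELECT MAX(num_neighbors) FROM Neighbor)
""").fetchall()
only_one = conn.execute("""
SELECT Name FROM Neighbor WHERE num_neighbors = 1 ORDER BY Name
""").fetchall()
print(most, only_one)
```

The same view answers both the "most neighbors" question and the earlier "only one neighbor" question, which is the main benefit of naming the derived result once.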
Chapter eight
Data models for spatial and non-spatial data
8. Spatial vs. Non-spatial Data
Spatial data includes location, shape, size, and orientation.
o For example, consider a particular square:
its center (the intersection of its diagonals) specifies its location
its shape is a square
the length of one of its sides specifies its size
the angle its diagonals make with, say, the x-axis specifies its
orientation.
Spatial data includes spatial relationships. For example, the arrangement of ten
bowling pins is spatial data.
Non-spatial data (also called attribute or characteristic data) is that information
which is independent of all geometric considerations.
o For example, a person’s height, mass, and age are non-spatial data because
they are independent of the person’s location.
o It’s interesting to note that, while mass is non-spatial data, weight is spatial
data in the sense that something’s weight is very much dependent on its
location!
It is possible to ignore the distinction between spatial and non-spatial
data. However, there are fundamental differences between them:
o spatial data are generally multi-dimensional and autocorrelated.
o non-spatial data are generally one-dimensional and independent.
These distinctions put spatial and non-spatial data into different philosophical camps
with far-reaching implications for conceptual, processing, and storage issues.
o For example, sorting is perhaps the most common and important non-spatial
data processing function that is performed.
o It is not obvious how to even sort locational data such that all points end up
“nearby” their nearest neighbors.
These distinctions justify a separate consideration of spatial and non-spatial data models.
What is a Data Model?
• What is a model? (Dictionary meaning)
• A set of plans (blueprint drawing) for a building
• A miniature representation of a system to analyze properties of interest
• What is Data Model?
• Specify structure or schema of a data set
• Document description of data
• Facilitates early analysis of some properties, e.g. querying ability,
redundancy, consistency, storage space requirements, etc.
• Examples:
• A GIS organizes a spatial dataset as a set of layers
• Databases organize a dataset as a collection of tables
Why Data Models?
• Data models facilitate
• Early analysis of properties, e.g. storage cost, querying ability, ...
• Reuse of shared data among multiple applications
• Exchange of data across organizations
$f + g : x \mapsto f(x) + g(x)$
$f \circ g : x \mapsto f(g(x))$
Types of Field Operations
Local: value of the new field at a given location in the spatial framework depends
only on the value of the input field at that location (e.g., Thresholding)
Focal: value of the resulting field at a given location depends on the values that the
input field assumes in a small neighborhood of the location (e.g., Gradient)
Zonal: Zonal operations are naturally associated with aggregate operators or the
integration function. An operation that calculates the average height of the trees for
each species is a zonal operation.
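The three kinds of field operation can be sketched on a tiny raster. The grid values and zone labels below are made up for illustration. For the focal case this sketch uses a neighborhood mean rather than the gradient named above, because the mean is easier to check by hand.

```python
# A small field on a 3x3 grid (made-up values for illustration).
field = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
# A second field assigning each cell to a zone (hypothetical labels).
zones = [["a", "a", "b"],
         ["a", "b", "b"],
         ["c", "c", "b"]]


def local_threshold(f, t):
    """Local: each output cell depends only on the same input cell."""
    return [[1 if v > t else 0 for v in row] for row in f]


def focal_mean(f):
    """Focal: each output cell is the mean of the cell and its in-grid neighbours."""
    n_rows, n_cols = len(f), len(f[0])
    out = []
    for r in range(n_rows):
        out_row = []
        for c in range(n_cols):
            window = [f[i][j]
                      for i in range(max(0, r - 1), min(n_rows, r + 2))
                      for j in range(max(0, c - 1), min(n_cols, c + 2))]
            out_row.append(sum(window) / len(window))
        out.append(out_row)
    return out


def zonal_mean(f, z):
    """Zonal: aggregate the field over all cells that share a zone label."""
    totals, counts = {}, {}
    for f_row, z_row in zip(f, z):
        for v, label in zip(f_row, z_row):
            totals[label] = totals.get(label, 0) + v
            counts[label] = counts.get(label, 0) + 1
    return {label: totals[label] / counts[label] for label in totals}


thresholded = local_threshold(field, 4)
smoothed = focal_mean(field)
per_zone = zonal_mean(field, zones)
print(thresholded, smoothed[1][1], per_zone)
```

The zonal result is the raster analogue of the tree-height example: one aggregate value per zone, like one average height per species.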
• Spatial objects are of many types
• Simple
• 0- dimensional (points), 1 dimensional (curves), 2 dimensional
(surfaces)
• Example given at the bottom of this slide
• Collections
• Polygon collection (e.g. boundary of Japan or Hawaii), …
• See more complete list in Figure
Spatial object type | Example object | Dimension
Point | City | 0
Curve | River | 1
Surface | Country | 2
• a set operation (e.g. intersection) of 2 polygons produces another polygon
• Topological operations: Boundary of USA touches boundary of Canada
• Directional: New York City is to the east of Chicago
• Metric: Chicago is about 700 miles from New York City.
Topological Relationships
invariant under elastic deformation (without tear, merge).
Two countries which touch each other in a planar paper map will continue to
do so in spherical globe maps.
Question: Define Interior, boundary, exterior on curves and points
Nine-Intersection Model of Topological Relationships
• Many topological relationships between A and B can be specified using the
9-intersection model
• Examples on next slide
• Nine intersections
• intersections between interior, boundary, exterior of A, B
• A and B are spatial objects in a two dimensional plane.
• Can be arranged as a 3 by 3 matrix
• Each matrix element takes a value of 0 (false) or 1 (true).
• Q: Determine the number of distinct 3 by 3 boolean matrices.
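The matrix can be computed mechanically once interior, boundary and exterior are available as sets. The sketch below is a rough discrete approximation and rests on assumptions of its own: regions are sets of cells on an 8 by 8 grid, a cell is interior when all four of its edge neighbours belong to the region, and the helper names are hypothetical. The continuous model on point sets in the plane can behave differently, for example for regions that merely touch.

```python
from itertools import product

UNIVERSE = set(product(range(8), range(8)))  # small discrete plane of cells


def parts(region):
    """Split a cell region into (interior, boundary, exterior) on the grid."""
    def neighbours(cell):
        x, y = cell
        return {(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)}
    interior = {c for c in region if neighbours(c) <= region}
    boundary = region - interior
    exterior = UNIVERSE - region
    return interior, boundary, exterior


def nine_intersection(a, b):
    """3x3 matrix of 0/1 flags: rows are parts of A, columns are parts of B."""
    return [[int(bool(pa & pb)) for pb in parts(b)] for pa in parts(a)]


def square(x0, y0, size):
    """Cells of a size-by-size square whose lower-left cell is (x0, y0)."""
    return {(x, y) for x in range(x0, x0 + size) for y in range(y0, y0 + size)}


a = square(0, 0, 3)
b = square(4, 4, 3)
disjoint = nine_intersection(a, b)
equal = nine_intersection(a, a)
n_matrices = 2 ** 9  # each of the nine entries is independently 0 or 1
print(disjoint, equal, n_matrices)
```

Each of the nine entries is 0 or 1 independently, so there are 2^9 = 512 possible matrices, although only a few of them correspond to relationships that can actually occur between two regions.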
8.1 Relational model
Main concepts
Domain: a set of values for a simple attribute
Relation: a subset of the cross-product of a set of domains
• Represents a table, i.e. homogeneous collection of rows (tuples)
• The set of columns (i.e. attributes) are same for each row
Schema of a Relation
Enumerates columns, identifies primary key and foreign keys.
Primary Key :
• one or more attributes that uniquely identify each row within a table
Foreign keys
• R’s attributes which form the primary key of another relation S
• The value of a foreign key in any tuple of R matches values in some row of S
Relational schema of a database
collection of schemas of all relations in the database
Example: Figure 2.5 (next slide)
A blueprint summary drawing of the database table structures
Allows analysis of storage costs, data redundancy, querying capabilities
Some databases were designed as relational schemas in the 1980s
Nowadays, databases are designed as ER models and the relational schema is
generated via CASE tools
Relational Schema Example
Integrity Constraints
Key: Every relation has a primary key.
Entity Integrity: Value of primary key in a row is never undefined
Referential Integrity: The value of a foreign key attribute must appear as a
value of the primary key of another relation or must be null.
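These constraints can be watched in action with a minimal sqlite3 sketch. The Country and City tables and their rows are hypothetical. Note that SQLite enforces foreign keys only after the PRAGMA shown below, and that the NOT NULL on the key columns is added here so that entity integrity is enforced for text keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE Country (Name TEXT PRIMARY KEY NOT NULL);
CREATE TABLE City (
    Name TEXT PRIMARY KEY NOT NULL,
    Country TEXT REFERENCES Country(Name)   -- foreign key; NULL is allowed
);
INSERT INTO Country VALUES ('Canada');
INSERT INTO City VALUES ('Ottawa', 'Canada');   -- matches a Country row
INSERT INTO City VALUES ('Atlantis', NULL);     -- null foreign key is accepted
""")


def try_insert(sql):
    """Run one INSERT and report whether a constraint check rejected it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


# Referential integrity: 'France' is not a primary key value in Country.
dangling = try_insert("INSERT INTO City VALUES ('Paris', 'France')")
# Key constraint: 'Canada' is already used as a primary key value.
duplicate = try_insert("INSERT INTO Country VALUES ('Canada')")
# Entity integrity: the primary key may not be undefined.
null_key = try_insert("INSERT INTO Country VALUES (NULL)")
city_count = conn.execute("SELECT COUNT(*) FROM City").fetchone()[0]
print(dangling, duplicate, null_key, city_count)
```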
Normal Forms (NF) for Relational schema
Reduce data redundancy and facilitate querying
1st NF: Each column in a relation contains an atomic value.
2nd and 3rd NF: Values of non-key attributes are fully determined by the
values of the primary key, only the primary key, and nothing but the primary
key.
Other normal forms exist but are seldom used
Translating a well-designed ER model yields a relational schema in 3rd NF
• satisfying definition of 1st, 2nd and 3rd normal forms
Mapping ER to Relational
• Highlights of translation rules
• Entity becomes Relation
• Attributes become columns in the relation
• Multi-valued attributes become a new relation
• includes foreign key to link to relation for the entity
• Relationships (1:1, 1:N) become foreign keys
• M:N Relationships become a relation
containing foreign keys to the relations of the participating entities
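The translation rules can be sketched as table definitions, here run through sqlite3. The entities, the multi-valued Language attribute and the FlowsThrough relationship are hypothetical examples chosen for illustration, not a schema taken from the course material.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Entity -> relation; attributes -> columns.
CREATE TABLE Country (Name TEXT PRIMARY KEY, Pop REAL);
CREATE TABLE River (Name TEXT PRIMARY KEY, Length REAL);

-- 1:N relationship (a country has many cities) -> foreign key on the N side.
CREATE TABLE City (
    Name TEXT PRIMARY KEY,
    Country TEXT REFERENCES Country(Name)
);

-- Multi-valued attribute -> new relation with a foreign key to its entity.
CREATE TABLE CountryLanguage (
    Country TEXT REFERENCES Country(Name),
    Language TEXT,
    PRIMARY KEY (Country, Language)
);

-- M:N relationship -> relation holding the keys of both participants.
CREATE TABLE FlowsThrough (
    River TEXT REFERENCES River(Name),
    Country TEXT REFERENCES Country(Name),
    PRIMARY KEY (River, Country)
);
""")

tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)
```

Two entities, one multi-valued attribute and one M:N relationship yield four relations beyond City, while the 1:N relationship adds only a column.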
8.2 Geo-relational model
A georelational data model is a geographic data model that represents geographic features as
an interrelated set of spatial and attribute data.
The Georelational Data Model stores spatial and attribute data separately in a split
system.
Spatial and Attribute Components
Geospatial data comprise the spatial and attribute components.
Spatial data describe the locations of spatial features, whereas attribute data
describe the characteristics of spatial features.
Spatial data are stored in graphic files. They describe the absolute and relative location of
geographic features.
Attribute data are stored in relational database files. They describe the characteristics of the
spatial features. Attribute data are often referred to as tabular data.
Example of Georelational Data Model
As an example of the georelational data model, a soil coverage uses the SOIL-ID field to link
the spatial data and the attribute data.
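The split storage and the linking key can be sketched in a few lines of Python. The two dictionaries stand in for the graphic files and the relational attribute table. The coordinates, soil types and field names are made up for illustration.

```python
# Spatial component: geometry keyed by SOIL-ID (stands in for the graphic files).
soil_polygons = {
    1: [(0, 0), (4, 0), (4, 3), (0, 3)],
    2: [(4, 0), (6, 0), (6, 3), (4, 3)],
}
# Attribute component: tabular rows keyed by the same SOIL-ID
# (stands in for the relational database files; values are made up).
soil_attributes = {
    1: {"soil_type": "loam", "suitability": "high"},
    2: {"soil_type": "clay", "suitability": "low"},
}


def polygon_area(vertices):
    """Shoelace formula for the area of a simple polygon."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def feature(soil_id):
    """Join the two components on SOIL-ID to rebuild one geographic feature."""
    record = dict(soil_attributes[soil_id])
    record["soil_id"] = soil_id
    record["area"] = polygon_area(soil_polygons[soil_id])
    return record


print(feature(1))
print(feature(2))
```

Every query that needs both geometry and attributes has to perform this kind of lookup across the two stores, which is the cost of the split system.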
8.3 Object-oriented model
Like the other models, the object model assumes that objects can conceptually be
collected together into meaningful groups. These groups are called classes.
An object grouping is meaningful because objects of the same class must have
common attributes, behaviors, and relationships with other objects.
Unlike entity sets and relations, classes do not actually hold the objects of that class.
o Classes are purely conceptual.
o There is nothing in the object model that is equivalent to either an entity set or
a relation (there could be but it’s not required by the model).
Like the network model, the relationships among objects are specified via a
“physical” link (pointer) between objects.
The DARPA Open OODB project proposes the following as the
essential features of the OO data model (Blakeley 1991) and (Rao 1994, p.72):
Object identity: the ability of the system to distinguish between two different objects
that have the same state. The state of an object can be shared by several objects via
object identity.
Encapsulation: a kind of abstraction that enforces a clean separation between the
external interface (behavior) of an object and its internal
implementation. Encapsulation requires that all access to (or interaction with) objects
be done by invoking the services provided by their external interface.
Complex state: the ability to define data types whose implementation has a nested
structure. The state of an object could be built from records of primitive types, other
objects, or [collections] of objects.
Type extensibility: the ability to define new data types from previously defined types
by enhancing or changing the structure or behavior of the types. Type inheritance is
a mechanism used to define new types by enhancing already existing behavior.
Genericity: The types of the object data model with which the object query language
collaborates must be generic.
8.4 Object-relational model
Oracle Spatial and Graph supports the object-relational model for representing geometries. This
model stores an entire geometry in the Oracle native spatial data type for vector data,
SDO_GEOMETRY. An Oracle table can contain one or more SDO_GEOMETRY columns. The
object-relational model corresponds to a "SQL with Geometry Types" implementation of
spatial feature tables in the Open GIS ODBC/SQL specification for geospatial features.
The benefits of the object-relational model include:
Support for many geometry types, including arcs, circles, compound polygons,
compound line strings, and optimized rectangles
Ease of use in creating and maintaining indexes and in performing spatial queries
Optimal performance
The geodatabase is object relational
Chapter Nine
Unified Modeling Language (UML)
9.UML Diagram – What is UML?
The Unified Modeling Language (UML) is a standard language for
UML basics
Use case diagram
Class diagram
Activity diagram
Sequence diagram
State machine diagram
Conceptual Data Modeling with UML
• Motivation
• ER Model does not allow user defined operations
• Object oriented software development uses UML
• UML stands for Unified Modeling Language
• It is a standard consisting of several diagrams
• class diagrams are most relevant for data modeling
• UML class diagrams concepts
• Attributes are simple or composite properties
• Methods represent operations, functions and procedures
• Class is a collection of attributes and methods
• Relationships relate classes
• Example UML class diagram: Figure below
UML Class Diagram with Pictograms: Example
• Exercise: Identify classes, attributes, methods, relationships in Fig. 2.8.
• Compare Fig. 2.8 with corresponding ER diagram in Fig. 2.7.
Generalization -- an inheritance link indicating one class is a superclass of the other.
A generalization has a triangle pointing to the superclass.
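The class diagram concepts map directly onto code. The sketch below is an illustration only: the class names echo the Geometry, Point and Curve types mentioned earlier, but the attributes and methods are hypothetical and are not read off the figures.

```python
class Geometry:
    """Superclass holding what every geometry type has in common."""

    def __init__(self, name):
        self.name = name                  # attribute

    def dimension(self):                  # method, specialised by subclasses
        raise NotImplementedError

    def describe(self):                   # method inherited unchanged
        return f"{self.name} ({self.dimension()}-dimensional)"


class Point(Geometry):                    # generalization: Point is a Geometry
    def __init__(self, name, x, y):
        super().__init__(name)
        self.x, self.y = x, y

    def dimension(self):
        return 0


class Curve(Geometry):                    # generalization: Curve is a Geometry
    def __init__(self, name, points):
        super().__init__(name)
        self.points = points              # relationship: a curve is built from points

    def dimension(self):
        return 1


city = Point("City", 1.0, 2.0)
river = Curve("River", [Point("p1", 0.0, 0.0), Point("p2", 3.0, 4.0)])
print(city.describe())
print(river.describe())
```

In a class diagram, the two subclasses would each be drawn with a generalization arrow whose triangle points at Geometry, and the points attribute would appear as a relationship between Curve and Point.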
Sequence Diagram
Activity Diagram
State Machine Diagram
CHAPTER TEN
Web databases processing
Why is ‘Databases on the Web’ Important?
Electronic commerce
Website automation
www.yahoo.com
www.webmonkey.com
How to Integrate Databases and the Web?
Databases
MS Access, MySQL, mSQL, Oracle, Sybase, MS SQL Server
Integration tools
PHP or CGI, Servlets, JSP, ASP etc.
“Middleware”: e.g. ColdFusion
http://www.allaire.com/
References
Spatial Databases: With Application to GIS (Morgan Kaufmann Series in Data Management
Systems), 1st edition
Spatial Data Management, M. Tamer Ozsu, University of Waterloo