GIS Lab
1.0 Overview
This lab will take you through a number of spatial functions and exercises which you will
need to use for the coursework. The lab is quite long and part of your coursework time
will be spent completing the lab before undertaking the assignment.
TUTORIAL 8 1
2.0 SQL Reminder
A quick reminder of the basic syntax for a SQL SELECT statement to retrieve records from
a database server:
SELECT {field1,field2,...}
FROM {table1}
JOIN {table2} ON {table2.field} {comparator e.g. =} {table1.field}
WHERE {condition}
GROUP BY {field1, field2,...}
HAVING {condition based on group}
ORDER BY {field1,field2,...} {ASC or DESC} LIMIT {n};
• The first { } can be replaced with * which means return all of the fields in the table
• JOIN is optional – used when bringing data from several tables together
• GROUP BY is optional – it aggregates data and therefore restricts the fields which can be
returned
• HAVING is optional with GROUP BY and acts like a WHERE condition on the results
after grouping
• ORDER BY is optional and can be ASC or DESC for ascending or descending order
• LIMIT is optional and can be used to return only a few records rather than the entire table
• Comments are not parsed, but are useful for keeping notes and are marked with a double
hyphen --
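As a rough illustration of the template above, here is a sketch that uses every clause once. The tables employee and project, and all of their columns, are hypothetical and are not part of the lab data.

```sql
-- Hypothetical tables: employee(id, name, start_year, project_id), project(id, title)
SELECT p.title, COUNT(*) AS staff        -- fields to return
FROM employee e
JOIN project p ON p.id = e.project_id    -- link the two tables
WHERE e.start_year >= 2020               -- filter rows before grouping
GROUP BY p.title                         -- one output row per project
HAVING COUNT(*) > 1                      -- filter the groups
ORDER BY staff DESC                      -- largest projects first
LIMIT 5;                                 -- only the first 5 rows
```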
3.0 Software Installation
3.1 MacOS
2. Drag and drop the Postgres app icon to your Applications folder to install.
5. pgAdmin will run from this image every time without installation, so don’t delete this
file “pgadmin4-x.xx.dmg”.
8. Drag and drop the QGIS application icon to your Applications folder to start the
installation.
9. Open your Terminal and enter the following command to make PostgreSQL
commands accessible from the terminal using “psql”
export PATH=/Applications/Postgres.app/Contents/Versions/13/bin:$PATH
3.2 Windows
4. After PostgreSQL installation finishes, make sure StackBuilder is selected and click
finish to exit the first installer.
5. StackBuilder installer will launch next. Choose PostgreSQL 13 on port 5432 and click
next.
6. Select the PostGIS Bundle for PostgreSQL from the Spatial Extensions stack, click next.
7. PostGIS installer will launch after. Follow the wizard and accept whenever you’re
prompted.
4.0 Setting up PostgreSQL to work with PostGIS
- The first time you use a database on the VM you will need to add the PostGIS Extension
to that database. This has to be done as “sudo” to gain the necessary privileges to add
the extension. It is only needed once per database. The example below is to enable
PostGIS on the default hw database on the VM.
3.1 MacOS/Windows
- Select Databases, right-click and choose “Create” and choose “Database” from the list
and give your database any name. For convenience and to be able to follow with the
rest of the steps, name the new database “hw”, as given in the MACS VM.
- Select your newly created database, right-click and choose “Query Tool”.
- Type the following command then hit run.
CREATE EXTENSION postgis;
4.0 An Introduction to Spatial Analysis
4.1 Overview of the data used in the lab
To begin we’ll create a new empty table in your database, which has 2 fields. Run this SQL query from either
the psql console or a pgAdmin4 query window.
If you get an error about not recognising the geometry data type, that means PostGIS has not been installed for this database.
Notes:
- the serial datatype definition creates an integer field which increments by 1 with each new record added
- geometry is a spatial datatype which can handle points, polylines, polygons
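A minimal sketch of the table creation described above. The table name dftest and the column names id and geom are taken from the queries used later in this lab; the exact statement in the original handout may have differed.

```sql
-- Two fields: an auto-incrementing id and a PostGIS geometry column
CREATE TABLE dftest (
    id   serial,
    geom geometry
);
```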
There are a number of ways to create geometries using SQL - to start we’ll use the Well-Known Text
representation as defined by the Open Geospatial Consortium (OGC). You can read more about the OGC here:
http://www.opengeospatial.org/
You can add a single record for a POINT at coordinate (x=10, y=0) as follows:
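One way to write this insert, assuming the dftest table has the id and geom columns used elsewhere in the lab. ST_GeomFromText converts the Well-Known Text string into a geometry.

```sql
-- id is filled in automatically by the serial type
INSERT INTO dftest (geom)
VALUES (ST_GeomFromText('POINT(10 0)'));
```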
To check that this was entered correctly retrieve the information from the table:
Notice that the id has been automatically given an integer value, and that the geometry column has been
converted to binary. To see this as text use ST_ASTEXT(geometry) as follows:
SELECT id,ST_ASTEXT(geom) from dftest;
1 | POINT(10 0)
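The later outputs in this lab list three more points in dftest, with ids 2 to 4. The sketch below adds them; the coordinates are inferred from those outputs, and the insertion order is an assumption.

```sql
-- Three further points; ids 2, 3 and 4 are assigned in this order
INSERT INTO dftest (geom) VALUES
    (ST_GeomFromText('POINT(3 3)')),
    (ST_GeomFromText('POINT(12 15)')),
    (ST_GeomFromText('POINT(15 20)'));
```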
Now try SELECTING the records in the table again to check all looks correct.
Note:
In pgAdmin4 you can have many query windows open, or have many SQL statements in a single query window
- you can run a single statement by highlighting it before pressing the lightning bolt button or F5. Take care if you have
many SQL commands in a window: if you don’t select a single statement then all of the queries in that window
will run sequentially. This is by design and can be very useful as it allows you to build models from complex
sequences of SQL statements. You can add comments prefixed with -- (double hyphen).
In the latest version of pgAdmin you can visualize geometry columns by clicking on the geometry viewer.
As well as POINTS we can also add LINES (or POLYLINES), and POLYGONS to our spatial database.
LINES are entered as a list of ordered points separated by a comma (x1 y1, x2 y2). A simple line consists of 2
points (start point and end point), whereas a POLYLINE is the name given to a line with many segments. They
are both defined the same way; only the number of coordinates differentiates them. Here you will insert a
polyline with 2 segments (i.e. 3 coordinates).
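A sketch of the polyline insert, using the LINESTRING coordinates that appear in the query outputs later in this lab (id=5).

```sql
-- 3 coordinates = 2 segments
INSERT INTO dftest (geom)
VALUES (ST_GeomFromText('LINESTRING(10 10, 15 10, 15 20)'));
```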
A POLYGON is a closed region (an area), therefore the first and last points must be identical. Here we will
make a simple polygon, but polygons can be more complex, for example defined with holes (e.g. a donut shape) - therefore take
care with the brackets!
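A sketch of the polygon insert. The coordinates below are an assumption: a 10 x 10 square chosen to agree with the area (100), perimeter (40) and distance results quoted later in the lab. Note the double brackets, where the inner pair encloses the outer ring.

```sql
-- First and last coordinate are identical so the ring is closed
INSERT INTO dftest (geom)
VALUES (ST_GeomFromText('POLYGON((0 0, 10 0, 10 10, 0 10, 0 0))'));
```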
There are more complex versions of geometries including MULTI-geometries, 3D and GEOMETRYCOLLECTIONS.
We will not go into those during this lab but you can read more here: https://postgis.net/docs/ST_GeomFromText.html
You can also use ST_MAKEPOINT(), ST_MAKELINE(), and ST_MAKEPOLYGON() as alternative ways to
create geometries, which we’ll try in a later section.
Now that you have created some geometries in a table, let’s do a few simple calculations based on those
geometries. Here we’ll output the geometry type as well as its length. Note that the spatial functions are
prefixed with ST_ before the function name (e.g. ST_LENGTH).
Notice that only the LINE has a length. Now let’s try that again but adding columns for the area and perimeter
(these are PostGIS spatial functions that take a geometry input).
You should see the polygon has an area of 100 square map units, and a perimeter of 40 map units. The map units are
defined by the coordinate system for each record, which in these simple examples are a non-geographic space
- so you can think of these coordinates as being on a planar surface. The values returned from the length and
area calculations are in the same units as the coordinate system, so if the coordinates are defined as metres
then the output distances are also in metres and areas in metres squared. Here we will assume metres.
SELECT id,ST_LENGTH(geom),ST_AREA(geom),ST_PERIMETER(geom),
ST_GEOMETRYTYPE(geom)
FROM dftest;
Each feature also has a centroid, which is the central point of the feature. Here we’ll nest the ST_CENTROID()
function within the ST_ASTEXT() function.
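A minimal sketch of the nested call, assuming the dftest table built above. The alias centroid is only there to label the output column.

```sql
-- ST_CENTROID returns a geometry; ST_ASTEXT makes it readable
SELECT id, ST_ASTEXT(ST_CENTROID(geom)) AS centroid
FROM dftest;
```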
Let’s now find the Euclidean (straight line) distance between 2 features. We’ll use ST_MAKEPOINT(x,y) as an
alternative way to create a POINT. You could also use this or the WKT approach we used in Section 4.3 -
whichever way you prefer.
ST_DISTANCE(geom1,geom2) returns the minimum distance between those features - here we populate
geom1 with values from the table, and specify a static point at (3,3) for geom2.
SELECT id,ST_GEOMETRYTYPE(geom),ST_DISTANCE(geom,ST_MAKEPOINT(3,3))
FROM dftest;
The output should look like this, showing the distance in map units (e.g. metres) from a point at coordinate (3,3)
to each of the records in the table.
1 | ST_Point | 7.61577310586391
2 | ST_Point | 0
3 | ST_Point | 15
4 | ST_Point | 20.8086520466848
5 | ST_LineString | 9.89949493661167
6 | ST_Polygon | 0
Notice that it returns a distance of 0 for id=2, which is a point also at (3,3). It also returns a distance of 0
for id=6, which is the polygon. This is because the point (3,3) is located within (i.e. inside) the region defined
by the polygon. Also notice that it returns the shortest distance from the point (3,3) to the line.
TASK: Try adding a new polygon with coordinates (20 20, 30 30, 30 50, 20 50, 20 20) to the dftest table, and
then calculate the distance to a point at (5,5). As this new polygon (id=7) does not contain the point (5,5) you
should get a distance other than 0 (e.g. 21.2).
Try the pgAdmin geometry visualisation tool again to see if your database table with 4 points, 1 line, and 2
polygons looks like this:
4.6 Using a Spatial Join to count the number of Points in a Polygon
You will have used database JOINS before to link records, perhaps between an EMPLOYEE table and a
PROJECT table based on primary and foreign keys. We can do something similar using the spatial column to
join tables based on geographic locations, for example linking all rivers (polylines) within a national park
(polygon). This is known as a SPATIAL JOIN and is a powerful feature as it allows us to join otherwise unrelated
data together.
Let’s try this with a simple example - first of all let’s create another table and insert a single polygon. We’ll then
create a SPATIAL JOIN between the tables to count how many features from the dftest table are within the
new polygon.
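A sketch of this step. The table name dftest2 and the polygon coordinates come from the output shown further down; giving dftest2 the same two columns as dftest is an assumption.

```sql
-- Second table with the same structure as dftest
CREATE TABLE dftest2 (
    id   serial,
    geom geometry
);

-- A single polygon to join against
INSERT INTO dftest2 (geom)
VALUES (ST_GeomFromText('POLYGON((0 0, 13 0, 15 20, 3 10, 0 0))'));
```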
To count the number of features from dftest within this polygon we will use a SPATIAL JOIN:
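A sketch of such a spatial join, assuming both tables have a geom column. The join condition is a spatial predicate rather than a key comparison: a row is returned for every dftest feature that lies within a dftest2 polygon.

```sql
SELECT ST_ASTEXT(a.geom) AS df1geom,
       ST_ASTEXT(b.geom) AS df2geom
FROM dftest a
JOIN dftest2 b ON ST_WITHIN(a.geom, b.geom);  -- a is inside b
```

Wrapping the same join in SELECT COUNT(*) would return just the number of matching features.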
Notice that this time we’ve added column name aliases (df1geom, df2geom) to make the output more readable.
You should find that two points from dftest are contained within the polygon defined in dftest2.
df1geom | df2geom
--------------+-----------------------------------
POINT(3 3) | POLYGON((0 0,13 0,15 20,3 10,0 0))
POINT(12 15) | POLYGON((0 0,13 0,15 20,3 10,0 0))
Aside:
As we only added a single record to dftest2 we could have done this on the fly, for example the following SQL
shows us which features are within the polygon defined in the SQL statement.
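One possible form of that statement, with the polygon written directly into the query instead of being read from dftest2. ST_WITHIN returns true (t) or false (f) for each row of dftest.

```sql
SELECT ST_ASTEXT(geom) AS df1geom,
       ST_WITHIN(geom,
                 ST_GeomFromText('POLYGON((0 0, 13 0, 15 20, 3 10, 0 0))'))
FROM dftest;
```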
df1geom | st_within
------------------------------------------+----------
POINT(10 0) | f
POINT(3 3) | t
POINT(12 15) | t
POINT(15 20) | f
(7 rows)
The map confirms our findings that the polygon in dftest2 (cross-hatched)
contains just 2 points completely. The other features are on the boundary, not
completely contained, or outside of the polygon. We’ll look at other types of
spatial predicate (e.g. WITHIN, TOUCHES, INTERSECTS) in the next section.
Here using the geometry has allowed you to link tables together in a way that
otherwise would not have been possible. Often geography is the only way to link
information from different sources (e.g. population density, crime rates, house
prices).
Now we’ll try this with the dftest table to find out how many points are within the first polygon (id=6). As we are
trying to find out how many geometries are within a polygon in the same table (dftest) we need to use a SELF
JOIN. This is where we reference a table twice in the SQL statement, each time with a different alias. We can
then treat the aliases as if they were different tables, and therefore find the spatial relationships between
features. This is used quite often – for example in a table which holds vehicle crash locations a self join is
necessary to find out how many vehicle crashes occurred nearby (e.g. within 100m).
Here we’ll load dftest and give it the alias of ‘a’, then load the table again and alias it as ‘b’ - these could be any
alias name you wish as long as they are unique.
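A sketch of the self join, assuming the polygon has id=6 as in the outputs above. The condition a.id <> b.id is an assumption added here so that the polygon is not compared with itself, because a geometry is within itself.

```sql
SELECT a.id, ST_ASTEXT(a.geom)
FROM dftest a                               -- candidate features
JOIN dftest b ON ST_WITHIN(a.geom, b.geom)  -- the same table again, as b
WHERE b.id = 6                              -- b is the polygon
  AND a.id <> b.id;                         -- skip the polygon itself
```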
You should have a single result as follows:
id | ST_ASTEXT
----+------------
2 | POINT(3 3)
This means that only the record with id=2 (point (3,3)) is within the polygon (id=6).
Notice from the figure above (Section 4.6) that there is also a point on the boundary, and a line which goes
along the boundary. These are not considered to be WITHIN the polygon.
Try running the SQL above again but this time substitute ST_WITHIN with ST_TOUCHES instead. Notice a
different point ID (id=1) is returned with a line (id=5) now - these are touching the boundary, but not within the
polygon. Try once more but this time use ST_INTERSECTS rather than ST_TOUCHES. This means that any
interaction (touches, within) will be returned in the results. You should see the following result:
1 | POINT(10 0)
2 | POINT(3 3)
5 | LINESTRING(10 10,15 10,15 20)
Aside:
You can determine the spatial relationship between features according to the Dimensionally Extended 9
Intersection Model (DE-9IM) using ST_RELATE(geom1,geom2) https://postgis.net/docs/ST_Relate.html
We can nest functions, for example modifying a geometry during a join by adding a spatial buffer around it.
Here we’ll do that and add a buffer of 3m around the line (id=5) before doing the spatial join. Here we’ll use a
SUBQUERY to select the line from the dftest table based on its id (5) and then buffer it, and then count the
features that intersect that buffer.
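A sketch of the buffered join, assuming the line has id=5. The subquery returns one buffered geometry, and the outer query counts the dftest features that intersect it. As written, the count includes the line itself.

```sql
SELECT COUNT(*)
FROM dftest a
WHERE ST_INTERSECTS(
          a.geom,
          -- buffer the line by 3 map units (metres here)
          (SELECT ST_BUFFER(geom, 3) FROM dftest WHERE id = 5)
      );
```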
We could also use this approach to make a new table with modified geometries. For example we’ll make
dftest3 based on buffering all the points in the table dftest by 5 metres, using the GEOMETRYTYPE to filter
the input table for just the points.
CREATE TABLE dftest3 AS
(SELECT id, ST_BUFFER(geom,5) as geom FROM dftest
WHERE ST_GEOMETRYTYPE(geom) = 'ST_Point' );
Now you have a dftest3 table which has 4 polygons - as you’ve turned the points into
polygons using ST_BUFFER(). In the map on the right, the crosshatched circles are from
dftest3, and the blue features from dftest.
Earlier on you found the distance from a point (3,3) to all other features - you could do
something similar to calculate the distance from each feature to every other feature using a
CROSS JOIN. This requires a self join of the table with itself, but here you specify the join type
to be a CROSS JOIN.
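A sketch of the cross join. The column aliases origin, destination and distance are illustrative names chosen here. A CROSS JOIN has no ON condition, so every row of a is paired with every row of b.

```sql
SELECT a.id AS origin,
       b.id AS destination,
       ST_DISTANCE(a.geom, b.geom) AS distance
FROM dftest a
CROSS JOIN dftest b
ORDER BY a.id, b.id;
```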
The results look like this - notice that it is comparing each record against all other features (check the id
columns).
TASK: Try modifying the previous query to exclude all results where the origin and destination are identical.
5.0 Working with GIS Data in PostgreSQL + PostGIS
5.1 Using the command line to load data
Now that you have learnt the basics of handling spatial data within a database, we will load some larger
datasets for a section of Edinburgh (Scotland). The features are defined in the British National Grid (BNG)
coordinate system, which uses metres. You don’t need to be concerned about the coordinate system other than
to know that for this lab (and coursework) all the data will be loaded using the same coordinate system (BNG).
Aside:
There are many hundreds of coordinate systems, with projections defined from local regions. For example:
Use the Linux terminal window you opened earlier and download some GIS data using wget.
(If you closed the terminal, then in the Edinburgh GRID Lab open VirtualBox and start a terminal prompt.)
wget www.macs.hw.ac.uk/~pb56/df_flickredin.sql
Note:
If you don’t have access to wget you can download the data required for the lab and coursework from a web
browser from: www.macs.hw.ac.uk/~pb56/f21df.html
Now load the data from the command line into the PostgreSQL server using psql as follows:
(substitute abc123 for your username + change the server host address as appropriate)
psql -h pgsql-server-1.macs.hw.ac.uk -U abc123 -d abc123 -W -q -f df_flickredin.sql
OR on the VM:
psql -h localhost -U hw -d hw -W -q -f df_flickredin.sql
where:
-h is the host server; -U is the user account to use; -d is the database;
-W prompts for a password; -f is the file to load; -q works quietly (avoids verbose messages)
Enter your PostgreSQL database password when prompted. It may take a few minutes to load the data.
Repeat this procedure for df_grid100m.sql which can also be downloaded from the same URL.
You should have just loaded two tables:
We’ll get to how you can visualise these layers later but for now just
check they loaded into your database and check the number of records in each table.
In pgAdmin refresh (right click menu – refresh option) the list of tables to see the new table names added. Then
run the COUNT() function for each table to find out how many records were loaded into each.
For example, the SQL to return the number of records in the flickr_edin table is as follows:
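A minimal form of that count, using the flickr_edin table name given above:

```sql
-- Number of records loaded into flickr_edin
SELECT COUNT(*) FROM flickr_edin;
```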
TASK
Repeat this for the other table that was just loaded (the polygon grid) - how many records are there? _______
5.2 Spatial Indexes
We want to make a new map which shows the number of photos taken in each of the hexagons - this involves a
point-in-polygon spatial operation as you did before but with a larger dataset. To speed up this operation we’ll
add a spatial index to each table so that the join is faster. This index enables the database to subset the data
based on spatial location, thereby improving performance for larger datasets.
An index is an extra data structure stored on disk that helps the database to search records based on the
specified index field - in this case the geometries stored in each table.
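A sketch of creating the spatial indexes. It assumes the tables are named flickr_edin and grid100m and that each stores its geometry in a column called geom; check the column names in pgAdmin before running it. The index names are arbitrary. PostGIS spatial indexes are built with the GiST index type.

```sql
-- One GiST index per geometry column
CREATE INDEX flickr_edin_geom_idx ON flickr_edin USING GIST (geom);
CREATE INDEX grid100m_geom_idx    ON grid100m    USING GIST (geom);
```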
We will now create a new table which shows the number of flickr photos in each of the hexagon polygons by
using a spatial join based on points inside of polygons ST_WITHIN(geomA,geomB). In this example the output
table is going to be called flickrgrid and will consist of the point count and the grid geometry.
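A sketch of one way to build this table, again assuming both tables have a geometry column called geom. Grouping by the grid geometry gives one output row per grid cell that contains at least one photo.

```sql
CREATE TABLE flickrgrid AS
SELECT COUNT(*) AS count,   -- photos falling in this cell
       g.geom               -- the cell polygon
FROM grid100m g
JOIN flickr_edin p ON ST_WITHIN(p.geom, g.geom)
GROUP BY g.geom;
```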
TASK: Check what happened by summing the count column in the new table and checking that it matches the
total number of flickr photos. Does it match? If not, why do you think this might be? The answer to this should
be clearer when you get to visualising the data in QGIS in Section 6 of the lab.
TASK: Try to fix the query so that it returns 0 counts for cells without any flickr photos.
As well as working with spatial data you have a time series of photo locations (i.e. spatio-temporal data).
PostgreSQL has a range of tools for processing dates and timestamps.
Let’s take a look at the Flickr data and extract the Day of the Week (e.g. Mon, Tue) from the date_taken
timestamp. We can use this DOW to count the number of photos taken each day to see which are the most
popular days, using GROUP BY.
SELECT EXTRACT('dow' FROM date_taken) as dow,COUNT(*)
FROM flickr_edin
GROUP BY dow
ORDER BY dow;
dow | count
0 | 14630
1 | 8318
2 | 8769
3 | 8199
4 | 10152
5 | 10919
6 | 19250
TASK: Try doing the same thing but group by the week of the year (e.g. week 1 to week 52).
So far we’ve been using all of the data in a table, but often you will need to filter the table using an attribute.
This is done in SQL with the WHERE condition. The example below uses the ILIKE match function which
ignores case and permits the use of wildcards(%). A wildcard means that any other combination of characters
can be substituted (e.g. %h% would match ‘here’ and ‘there’ while ‘h%’ would only match ‘here’).
First take a look at the usertags field for a record which has some usertags:
SELECT usertags
FROM flickr_edin
WHERE LENGTH(usertags)>1
LIMIT 1;
Let’s count how many flickr photo locations mention the Edinburgh Fringe.
SELECT COUNT(*)
FROM flickr_edin
WHERE usertags ILIKE '%edinburgh fringe%';
In Edinburgh use the GRID Windows Lab as QGIS ver 3.x is installed on the desktop - this is the preferred
option for Edinburgh students. You should find QGIS in the START menu.
Note: you can also access an older version of QGIS (ver 2.x) on the LINUX server using the VM (GRID lab) or
X2GO - it’s found under the Applications Menu > Education > QGIS Desktop. It’s recommended you use the
same version throughout to avoid versioning issues when moving map project files about.
Host: pgsql-server-1.macs.hw.ac.uk
Port: 5432
Database: abc123
Username: abc123
If the database has large tables it’s a good idea to tick the “Use
estimated table metadata” option (not needed here), as this speeds up the process of scanning for geometry columns
held in tables on the server.
Now connect to the server and highlight the layers you wish to view (e.g. flickr_edin, grid100m) and click Add
You can now re-order the layers by dragging them in the Table of Contents. Usually this would be polygons at
the base, then polylines, then points on top. You should be able to see something like this.
Use the PAN tool (hand) and ZOOM tools (magnifying glass) to move around the map. Also the IDENTIFY (i)
tool is useful for clicking on a feature to see the attributes for that item – based on which layer is highlighted in
the Table of Contents at the time you click on the feature.
6.2 Symbology
6.3 Exporting a Map as an Image
To export a map you should include the basic cartographic elements as per the lecture (e.g. legend, title).
This is done from a PRINT LAYOUT screen.
Use ADD ITEM menu to place the map, legend, north arrow, scale bar, and title on the page.
You can now export this map as an image from: Layout Menu > Export as Image.
Have a play around with QGIS and the map design.
TIPS:
• To pan a map on the PRINT LAYOUT push the C key while using the mouse pointer to slide the map around.
• You can edit the layer names in the Table of Content (TOC) and they will update in the legend on the PRINT LAYOUT
• If you wish you can disconnect the LEGEND from the TOC and then remove items (layers) in the legend.
Some things to try:
Check LinkedIn learning (videos) and YouTube for more information about what you can do with QGIS.
YouTube videos:
• Labelling: https://www.linkedin.com/learning/learning-qgis-2/label-vector-data-part-1?u=2374954
You should save your map project to your personal drive. The map project contains links to the various
datasets (but not the actual data), the Table of Contents order, symbology, and any layout details (e.g. labels,
scale bars). Do this from PROJECT drop down menu > Save.
Well done, you have now learned about PostgreSQL and PostGIS,
as well as how to display spatial data using QGIS.