Lecture 5 Slides
The above shows that a DBMS and a database are different, but in practice the term
"database" is often used casually to refer to both a database and the DBMS
used to manipulate it.
Advantages of DBMS (over storing data in plain files)
Volume: Designed for storing and processing large amounts of data
Security: Allows you to authorise and control who can access and/or update
the data
Reliability: Backup and recovery
Integrity: For example, a DBMS can enforce constraints on the data and
prevent data anomalies
Concurrency: Multiple users can access and modify the data at the same
time in a controlled way
Due to time constraints, we will only be able to demonstrate integrity in this
course.
Why are we learning about databases in this course?
Databases are an important part of data management and data analysis.
Knowing about databases allows you to:
Store data collected for future use
More secure and reliable
Safe concurrent access
Quick data retrieval and merging data from different data sources
Access large amounts of (internal) data
Large organisations often have their data stored in a database to
support daily operations
By knowing how to query a database, you can retrieve data
from it
In today's lecture you will learn how to do both on relational databases.
Relational databases
A relational database is a collection of data with pre-defined relationships
between them.
Data are organised as a set of tables with columns and rows
Each table is used for a different type of entity
Each table contains a fixed number of columns containing the
attributes of the entity
There are many other types of databases, but we will only give an overview of
them in the next lecture.
Table example: course
code name quota
ST101 Programming for Data Science 90
ST115 Managing and Visualising Data 60
MA214 Algorithms and Data Structures NULL
ST207 Databases NULL
ST310 Machine Learning 60
ST311 Artificial Intelligence 60
ST445 Managing and Visualising Data 90
Examples of relational DBMS (RDBMS)
There are many RDBMS available. Here are some examples:
Oracle Database
MySQL
PostgreSQL
SQLite
In this course, we will use SQLite as it is easy to set up. See the DB-Engines
Ranking for a ranking of RDBMS by popularity.
Structured Query Language (SQL)
SQL is a computer language used to communicate with an RDBMS.
Pronounced as S-Q-L or "sequel"
It is used to store, manipulate and query data and to control access to
databases
For students with some programming background:
SQL is a special-purpose, declarative programming language
(Python is a general-purpose language which supports multiple
programming paradigms)
Note: The SQL syntax and the set of functionality implemented by different
RDBMS can differ slightly. We will use SQLite in this course.
RDBMS terminology
Each table is called a relation
Each row of a relation is called a record or tuple
Rows do not have names
Each column of a relation is called an attribute or field
So a relational database is a set of relations, with each relation consisting of
records and attributes.
Attributes
Attributes have:
Names (e.g. name, code, quota)
Data types (e.g. INTEGER, TEXT)
Attributes may also have constraints (e.g. must be non-negative)
This helps to avoid wrong input
Attributes may be marked as primary or foreign keys to identify each record
and show how data in different tables are linked
Data types
Data types available in SQLite:
Type     Description
NULL     The value is a NULL value
INTEGER  Signed integer, stored in 0, 1, 2, 3, 4, 6, or 8 bytes depending on the magnitude of the value
REAL     The value is a floating point value, stored in 8 bytes
TEXT     The value is a text string, stored using the database encoding (UTF-8, UTF-16)
BLOB     Binary Large Object, can store a large chunk of data, document types and even media files like video
Note other RDBMS may have different sets of data types.
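As a quick check of the types listed above, the sketch below asks SQLite which type it assigns to a few literal values. It uses Python's built-in sqlite3 module rather than the %sql magic, and the sample values are arbitrary choices for illustration.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# typeof() reports the type SQLite uses to store each value.
storage_classes = conn.execute(
    "SELECT typeof(NULL), typeof(42), typeof(3.14), typeof('abc'), typeof(x'00ff')"
).fetchone()
print(storage_classes)
```

The five values correspond, in order, to the five rows of the table above.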
Database schema
A database schema defines how data is organised within a relational database.
It is the definition for the entire database, for example:
Relations
Views
Indexes
In this course we only consider schema for relations.
Relation schema example
The following SQL code defines and creates the relation course:
CREATE TABLE course(
code TEXT NOT NULL CHECK(length(code) = 5),
name TEXT NOT NULL,
quota INTEGER CHECK(quota >= 0),
PRIMARY KEY(code)
);
Schema: enforce guarantees
The schema for course enforces:
Uniqueness:
No two records can have the same code
Correct type:
e.g. quota is declared as an integer (SQLite applies this through type
affinity: it converts values where it can rather than strictly rejecting
other types)
Constraints:
e.g. quota must not be negative
e.g. Each record must have values for code and name
e.g. code must have length 5
For some RDBMS, we can also restrict the form of the code to be
CCddd, where C is a capital letter and d is a digit, using a regular
expression (regex).
We will not talk about how to do this in this course, but we will learn about
regex in week 7. Stay tuned!
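As a rough illustration of these guarantees, the sketch below recreates the course schema in an in-memory SQLite database using Python's built-in sqlite3 module (not the %sql magic used in the lecture) and tries a few inserts. The helper try_insert and the sample rows are hypothetical and only serve the demonstration.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE course(
        code  TEXT NOT NULL CHECK(length(code) = 5),
        name  TEXT NOT NULL,
        quota INTEGER CHECK(quota >= 0),
        PRIMARY KEY(code)
    )
""")
conn.execute("INSERT INTO course VALUES ('ST101', 'Programming for Data Science', 90)")


def try_insert(row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO course VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


results = {
    "duplicate code": try_insert(("ST101", "Another course", 10)),
    "negative quota": try_insert(("ST102", "Some course", -1)),
    "missing name": try_insert(("ST103", None, 10)),
    "code too long": try_insert(("ST1234", "Some course", 10)),
    "valid row": try_insert(("ST115", "Managing and Visualising Data", 60)),
}
print(results)
```

Only the last row should be accepted; each of the others violates one constraint from the schema.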
Primary key
A primary key is an attribute or a set of attributes whose values
uniquely identify each record in a table.
Primary key values must be unique
Used to cross-reference between tables
Example:
code in the course table is a primary key
name in the course table is not a primary key
SQL syntax: create a table
student table:
id name program department year
202022333 Harry BSc Data Science Statistics 2
202012345 Ron BSc Data Science Statistics 2
202054321 Hermione BSc Economics Economics 2
202101010 Ginny BSc Data Science Statistics 1
202155555 Dobby BSc Actuarial Science Statistics 1
202124680 Harry MSc Data Science Statistics 1
More examples of table
registration table:
courseCode studentId mark
ST207 202022333 72
MA214 202022333 NULL
ST207 202012345 66
MA214 202012345 NULL
EC220 202054321 NULL
ST101 202054321 93
ST115 202054321 NULL
ST101 202101010 70
ST115 202101010 NULL
More examples of database schema
Question: Why do we need two attributes as the primary key for the table
registration?
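One way to explore this question is the sketch below. It assumes a registration schema with PRIMARY KEY(courseCode, studentId); the schema is not reproduced in these notes, so the column types are guesses, and Python's built-in sqlite3 module stands in for the %sql magic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: the pair (courseCode, studentId) identifies a record.
conn.execute("""
    CREATE TABLE registration(
        courseCode TEXT NOT NULL,
        studentId  TEXT NOT NULL,
        mark       INTEGER,
        PRIMARY KEY(courseCode, studentId)
    )
""")

# The same student on two courses, and the same course for two students,
# are both accepted: neither attribute is unique on its own.
conn.executemany(
    "INSERT INTO registration VALUES (?, ?, ?)",
    [
        ("ST207", "202022333", 72),
        ("MA214", "202022333", None),
        ("ST207", "202012345", 66),
    ],
)

# Repeating an existing (courseCode, studentId) pair is rejected.
try:
    conn.execute("INSERT INTO registration VALUES ('ST207', '202022333', 80)")
    duplicate_accepted = True
except sqlite3.IntegrityError:
    duplicate_accepted = False

row_count = conn.execute("SELECT COUNT(*) FROM registration").fetchone()[0]
print(duplicate_accepted, row_count)
```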
Foreign key
A foreign key is an attribute or a set of attributes in a table that refers to the
primary key of another table.
The foreign key shows how tables are linked
Example: studentId in the registration table is a foreign key
You can also explicitly specify foreign key relationships in the schema, but we
will not cover it in this course.
SQL insert
We can add a row by:
INSERT INTO <table> VALUES (<value 1>, <value 2>, ..., <value n>);
Example:
INSERT INTO course VALUES ('ST101', 'Programming for Data Science', 90);
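The sketch below runs the insert above from Python's built-in sqlite3 module, and also shows the variant of INSERT that names the columns explicitly. The course table here is a simplified version without constraints, used only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified table (no constraints), just to have something to insert into.
conn.execute("CREATE TABLE course(code TEXT, name TEXT, quota INTEGER)")

# Positional form from the slide: one value per column, in column order.
conn.execute("INSERT INTO course VALUES ('ST101', 'Programming for Data Science', 90)")

# Naming the columns makes the order explicit; quota is omitted here,
# so it is stored as NULL (shown as None in Python).
conn.execute("INSERT INTO course (code, name) VALUES ('ST207', 'Databases')")

rows = conn.execute("SELECT code, name, quota FROM course ORDER BY code").fetchall()
print(rows)
```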
SQL in Jupyter notebook
In this lecture, we will run SQL scripts in Jupyter Notebook using the %sql (or
%%sql) magic.
%sql for one line and %%sql for multiple lines
To do so, please install ipython-sql using the following
command:
conda install -c conda-forge ipython-sql
After loading the extension with %load_ext sql, we can run SQL by putting
%sql or %%sql at the beginning of a code cell.
Connect to / create a database
The following command connects to school.db in the folder where your
Jupyter notebook is (and creates it if it does not exist):
In [2]:
%sql sqlite:///school.db
Out[2]:
'Connected: @school.db'
Alternatives:
sqlite:// : temporary connection
sqlite:////Users/yyy/xxx.db : absolute path to the database
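The same connect-or-create behaviour can be seen outside the notebook with Python's built-in sqlite3 module. The sketch below is only an analogy to the %sql connection strings above: the temporary folder and the demo table are stand-ins chosen for illustration.

```python
import os
import sqlite3
import tempfile

# A throwaway folder stands in for the notebook's folder.
folder = tempfile.mkdtemp()
path = os.path.join(folder, "school.db")

existed_before = os.path.exists(path)
conn = sqlite3.connect(path)  # creates the file if it does not exist
conn.execute("CREATE TABLE demo(x INTEGER)")
conn.commit()
conn.close()
existed_after = os.path.exists(path)

# ":memory:" gives a temporary database that disappears on close,
# comparable to the sqlite:// connection string above.
temp_conn = sqlite3.connect(":memory:")
temp_conn.close()

print(existed_before, existed_after)
```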
Create a table
In [3]:
%%sql
* sqlite:///school.db
Done.
Done.
Out[3]:
[]
Insert rows into the table
In [4]:
%%sql
INSERT INTO course VALUES ('ST101', 'Programming for Data Science', 90);
INSERT INTO course VALUES ('ST115', 'Managing and Visualising Data', 60);
INSERT INTO course VALUES ('MA214', 'Algorithms and Data Structures', NULL);
INSERT INTO course VALUES ('ST207', 'Databases', NULL);
INSERT INTO course VALUES ('ST310', 'Machine Learning', 60);
INSERT INTO course VALUES ('ST311', 'Artificial Intelligence', 30);
INSERT INTO course VALUES ('ST445', 'Managing and Visualising Data', 60);
* sqlite:///school.db
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
Out[4]:
[]
View the table
Check if we have created the table and inserted the rows properly:
In [5]:
%sql SELECT * FROM course;
* sqlite:///school.db
Done.
Out[5]:
code name quota
ST101 Programming for Data Science 90
ST115 Managing and Visualising Data 60
MA214 Algorithms and Data Structures None
ST207 Databases None
ST310 Machine Learning 60
ST311 Artificial Intelligence 30
ST445 Managing and Visualising Data 60
We will talk about the syntax in the next section.
Test the constraints
Try to insert ST101 again:
In [6]:
%sql INSERT INTO course VALUES ('ST101', 'Programming for Data Science', 90);
* sqlite:///school.db
---------------------------------------------------------------------------
IntegrityError Traceback (most recent call last)
~/opt/anaconda3/lib/python3.9/site-packages/sqlalchemy/engine/base.py in _execute_context(self, dialect, constructor, statement, parameters, execution_options, *args, **kw)
1818 if not evt_handled:
-> 1819 self.dialect.do_execute(
1820 cursor, statement, parameters, context
~/opt/anaconda3/lib/python3.9/site-packages/sqlalchemy/engine/default.py in do_execute(self, cursor, statement, parameters, context)
731 def do_execute(self, cursor, statement, parameters, context=None):
--> 732 cursor.execute(statement, parameters)
733
The above exception was the direct cause of the following exception:
[remaining IPython, ipython-sql and SQLAlchemy frames omitted]
Create the student table and insert rows
INSERT INTO student VALUES ('202022333', 'Harry', 'BSc Data Science', 'Statistics', 2);
INSERT INTO student VALUES ('202012345', 'Ron', 'BSc Data Science', 'Statistics', 2);
INSERT INTO student VALUES ('202054321', 'Hermione', 'BSc Economics', 'Economics', 2);
INSERT INTO student VALUES ('202101010', 'Ginny', 'BSc Data Science', 'Statistics', 1);
INSERT INTO student VALUES ('202155555', 'Dobby', 'BSc Actuarial Science', 'Statistics', 1);
INSERT INTO student VALUES ('202124680', 'Harry', 'MSc Data Science', 'Statistics', 1);
* sqlite:///school.db
Done.
Done.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
Out[9]:
[]
View the student table
Check if we have created the table and inserted the rows properly:
In [10]:
%sql SELECT * FROM student;
* sqlite:///school.db
Done.
Out[10]:
id name program department year
202022333 Harry BSc Data Science Statistics 2
202012345 Ron BSc Data Science Statistics 2
202054321 Hermione BSc Economics Economics 2
202101010 Ginny BSc Data Science Statistics 1
202155555 Dobby BSc Actuarial Science Statistics 1
202124680 Harry MSc Data Science Statistics 1
Create the registration table and insert rows
In [11]:
%%sql
* sqlite:///school.db
Done.
Done.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
Out[11]:
[]
View the registration table
Check if we have created the table and inserted the rows properly:
In [12]:
%sql SELECT * FROM registration;
* sqlite:///school.db
Done.
Out[12]:
courseCode studentId mark
ST207 202022333 72
MA214 202022333 None
ST207 202012345 66
MA214 202012345 None
EC220 202054321 None
ST101 202054321 93
ST115 202054321 None
ST101 202101010 70
ST115 202101010 None
SQL query
More on SQL
SQL is a computer language used to communicate with an RDBMS. It allows
us to store, update and query data and to control access to a relational
database. It consists of many types of statements:
Data query
Data manipulation (insert, update and delete)
Data definition (schema creation and modification)
Data access control
We have previously seen how we can use SQL to create tables and insert data.
Now we will learn how to query data from a database using SQL.
In this course we will not cover how to use SQL to control access or to modify
existing data.
SQL query syntax
* sqlite:///school.db
Done.
Out[13]:
code name quota
ST101 Programming for Data Science 90
ST115 Managing and Visualising Data 60
MA214 Algorithms and Data Structures None
Use SELECT * to select all attributes
Use LIMIT to specify the number of rows to get
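The cell that produced the output above is not shown in these notes. A query along the following lines would return the same three rows; this is a sketch using Python's built-in sqlite3 module and a cut-down course table without constraints, not necessarily the query used in the lecture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE course(code TEXT, name TEXT, quota INTEGER)")
conn.executemany(
    "INSERT INTO course VALUES (?, ?, ?)",
    [
        ("ST101", "Programming for Data Science", 90),
        ("ST115", "Managing and Visualising Data", 60),
        ("MA214", "Algorithms and Data Structures", None),
        ("ST207", "Databases", None),
    ],
)

# SELECT * returns every attribute; LIMIT 3 keeps at most three rows.
first_three = conn.execute("SELECT * FROM course LIMIT 3").fetchall()
codes = [row[0] for row in first_three]
print(codes)
```

Without an ORDER BY clause the row order is not guaranteed by SQL in general, although SQLite returns this small table in insertion order.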
SQL simple query demo 2: use WHERE and ORDER BY
Example (filtering): Select the names of the students who are on BSc Data
Science, ordered by name in descending order
In [14]:
%%sql
SELECT name
FROM student
WHERE program = 'BSc Data Science'
ORDER BY name DESC;
* sqlite:///school.db
Done.
Out[14]:
name
Ron
Harry
Ginny
Use WHERE to filter data based on some condition. Note that a single = is
used for equality!
Use ORDER BY to order the data based on some attribute(s)
Use DESC if you want the data to be ordered in descending order
(default is ascending order)
SQL simple query demo 3: use DISTINCT
Example (Unique values): Get the list of unique departments from
student.
In [15]:
%%sql
SELECT DISTINCT department
FROM student;
* sqlite:///school.db
Done.
Out[15]:
department
Statistics
Economics
Use DISTINCT to return unique results
SQL simple query demo 4: check NULL
Example (Find NULL values): Find the registrations with a missing mark.
In [16]:
%%sql
SELECT *
FROM registration
WHERE mark IS NULL;
* sqlite:///school.db
Done.
Out[16]:
courseCode studentId mark
MA214 202022333 None
MA214 202012345 None
EC220 202054321 None
ST115 202054321 None
ST115 202101010 None
Note that you must use IS NULL to check for NULL. The following does not
work, because a comparison with NULL using = is never true:
In [17]:
%sql SELECT * FROM registration WHERE mark = NULL;
* sqlite:///school.db
Done.
Out[17]:
courseCode studentId mark
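The behaviour can be checked directly by evaluating the two comparisons on their own. The sketch below uses Python's built-in sqlite3 module; the variable names are just for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Comparing with NULL using = gives NULL (unknown), not true,
# so a WHERE clause built on it keeps no rows.
equals_result = conn.execute("SELECT NULL = NULL").fetchone()[0]

# IS NULL is the dedicated test and gives 1 (true).
is_result = conn.execute("SELECT NULL IS NULL").fetchone()[0]

print(equals_result, is_result)
```

Python shows the SQL NULL as None, which is why missing marks appear as None in the outputs above.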
SQL simple query demo 5: use AND / OR to chain up conditions
Example: Get the rows from the registration table that are for
ST101 and have a mark ≥ 90
In [18]:
%%sql
SELECT *
FROM registration
WHERE courseCode = 'ST101' AND mark >= 90;
* sqlite:///school.db
Done.
Out[18]:
courseCode studentId mark
ST101 202054321 93
Example: Get the courses whose code is ST101 or ST115 from the
course table
In [19]:
%%sql
SELECT *
FROM course
WHERE code = 'ST101' OR code = 'ST115';
* sqlite:///school.db
Done.
Out[19]:
code name quota
ST101 Programming for Data Science 90
ST115 Managing and Visualising Data 60
SQL simple query demo 6: built-in predicates
The result of the last example can also be obtained using some built-in predicates.
Example: Use the LIKE keyword to test whether a string matches a pattern
with wildcards:
% : zero, one, or multiple characters
_ : exactly one character
In [23]:
%%sql
SELECT *
FROM course
WHERE code LIKE 'ST1__';
* sqlite:///school.db
Done.
Out[23]:
code name quota
ST101 Programming for Data Science 90
ST115 Managing and Visualising Data 60
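To see the difference between the two wildcards, the sketch below tries a few patterns against a small list of course codes. It uses Python's built-in sqlite3 module; the helper codes_like and the one-column table are hypothetical simplifications.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE course(code TEXT)")
conn.executemany(
    "INSERT INTO course VALUES (?)",
    [("ST101",), ("ST115",), ("ST207",), ("ST310",), ("MA214",)],
)


def codes_like(pattern):
    """Return the course codes matching a LIKE pattern, sorted."""
    rows = conn.execute(
        "SELECT code FROM course WHERE code LIKE ? ORDER BY code", (pattern,)
    )
    return [row[0] for row in rows]


level_one = codes_like("ST1__")  # 'ST1' followed by exactly two characters
all_st = codes_like("ST%")       # 'ST' followed by anything
too_short = codes_like("ST1_")   # only four characters long, so nothing matches
print(level_one, all_st, too_short)
```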
SQL simple query demo 6: built-in predicates (continued)
Example: Use the IN keyword to specify multiple values, which is a
shorthand for multiple OR conditions
In [19]:
%%sql
SELECT *
FROM course
WHERE code IN ('ST101', 'ST115');
* sqlite:///school.db
Done.
Out[19]:
code name quota
ST101 Programming for Data Science 90
ST115 Managing and Visualising Data 60
Note on SQL syntax case
SQL keywords and table and attribute names in SQLite are case insensitive. For
example, the following code will give you the same result as the previous slide:
In [20]:
%%sql
select *
from COURSE
where CODE in ('ST101', 'ST115');
* sqlite:///school.db
Done.
Out[20]:
code name quota
ST101 Programming for Data Science 90
ST115 Managing and Visualising Data 60
However, it is the convention to use CAPITAL letters for the SQL keywords and
functions, and camelCase for the table and attribute names.
Note on SQL syntax case (continued)
Note that text comparison with = or IN is case sensitive:
In [21]:
%%sql
select *
from COURSE
where CODE in ('st101', 'ST115');
* sqlite:///school.db
Done.
Out[21]:
code name quota
ST115 Managing and Visualising Data 60
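If a case-insensitive match is wanted, two common options in SQLite are to normalise the text with UPPER() or to compare with COLLATE NOCASE. The sketch below contrasts them with a plain = comparison, using Python's built-in sqlite3 module; the helper count_where and the one-column table are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE course(code TEXT)")
conn.executemany("INSERT INTO course VALUES (?)", [("ST101",), ("ST115",)])


def count_where(condition):
    """Count the rows of course that satisfy a WHERE condition."""
    query = f"SELECT COUNT(*) FROM course WHERE {condition}"
    return conn.execute(query).fetchone()[0]


exact = count_where("code = 'st101'")                  # case sensitive
upper = count_where("code = UPPER('st101')")           # normalise the text first
nocase = count_where("code = 'st101' COLLATE NOCASE")  # ignore ASCII case
print(exact, upper, nocase)
```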
A Grammar of data manipulation
Similar functionality provided by Pandas:
Functionality       SQL               Pandas
Select/filter data  SELECT, WHERE     loc[], iloc[]
Sort                ORDER BY          sort_values()
Join                JOIN              merge()
Aggregate           AVG(), COUNT()    mean(), count()
                    GROUP BY          groupby()
Aggregation
SQL provides aggregate functions to return a single value from a set of values
Examples:
COUNT()
SUM()
AVG()
MIN(), MAX()
See the SQLite documentation for the full list of aggregate functions it provides.
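The sketch below applies these functions to three registration rows, one of which has no mark, to show how NULL values are treated. It uses Python's built-in sqlite3 module and a simplified registration table without constraints.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE registration(courseCode TEXT, studentId TEXT, mark INTEGER)")
conn.executemany(
    "INSERT INTO registration VALUES (?, ?, ?)",
    [
        ("ST101", "202054321", 93),
        ("ST101", "202101010", 70),
        ("ST115", "202054321", None),
    ],
)

# COUNT(*) counts rows; the other aggregates skip NULL marks.
summary = conn.execute("""
    SELECT COUNT(*), COUNT(mark), SUM(mark), AVG(mark), MIN(mark), MAX(mark)
    FROM registration
""").fetchone()
print(summary)
```

COUNT(*) counts all three rows, while COUNT(mark) and the other functions only use the two rows that have a mark.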
SQL aggregation demo 2: COUNT()
Example: count the number of students registered for ST101
In [24]:
%%sql
SELECT COUNT(*)
FROM registration
WHERE courseCode = 'ST101';
* sqlite:///school.db
Done.
Out[24]:
COUNT(*)
2
Example: count the number of unique departments from student
In [25]:
%%sql
SELECT COUNT(DISTINCT department)
FROM student;
* sqlite:///school.db
Done.
Out[25]:
COUNT(DISTINCT department)
2
SQL aggregation demo: MAX()
Example: find the maximum mark for ST207 in registration :
In [26]:
%%sql
SELECT MAX(mark) AS maxMark
FROM registration
WHERE courseCode = 'ST207';
* sqlite:///school.db
Done.
Out[26]:
maxMark
72
The AS keyword is used to rename a column or table with an alias.
SQL aggregation demo: AVG() (with the use of GROUP BY )
Example: find the average mark for each course:
In [25]:
%%sql
SELECT courseCode, AVG(mark) AS avgMark
FROM registration
GROUP BY courseCode;
* sqlite:///school.db
Done.
Out[25]:
courseCode avgMark
EC220 None
MA214 None
ST101 81.5
ST115 None
ST207 69.0
SQL aggregation demo: HAVING
To filter the groups based on some specified conditions, use the HAVING
clause.
Example: find the courses such that the average mark is ≥ 70:
In [26]:
%%sql
SELECT courseCode, AVG(mark) AS avgMark
FROM registration
GROUP BY courseCode
HAVING AVG(mark) >= 70;
* sqlite:///school.db
Done.
Out[26]:
courseCode avgMark
ST101 81.5
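HAVING is easy to confuse with WHERE. The sketch below runs both on the same marks to show that WHERE filters individual rows before grouping, whereas HAVING filters the groups afterwards. It uses Python's built-in sqlite3 module and a simplified registration table containing only the rows that have a mark.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE registration(courseCode TEXT, studentId TEXT, mark INTEGER)")
conn.executemany(
    "INSERT INTO registration VALUES (?, ?, ?)",
    [
        ("ST207", "202022333", 72),
        ("ST207", "202012345", 66),
        ("ST101", "202054321", 93),
        ("ST101", "202101010", 70),
    ],
)

# HAVING is applied after grouping, to each group's average.
by_group = conn.execute("""
    SELECT courseCode, AVG(mark)
    FROM registration
    GROUP BY courseCode
    HAVING AVG(mark) >= 70
    ORDER BY courseCode
""").fetchall()

# WHERE is applied before grouping, to individual rows.
by_row = conn.execute("""
    SELECT courseCode, AVG(mark)
    FROM registration
    WHERE mark >= 70
    GROUP BY courseCode
    ORDER BY courseCode
""").fetchall()
print(by_group, by_row)
```

With WHERE, the mark of 66 is dropped before averaging, so ST207 appears with an average based on a single mark.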
SQL joins
SQL joins allow you to combine and retrieve data from two or more tables.
Inner join demo
Example: combine the student and registration tables using the inner
join.
In [27]:
%%sql
SELECT *
FROM student
INNER JOIN registration
ON student.id = registration.studentId;
* sqlite:///school.db
Done.
Out[27]:
id name program department year courseCode studentId mark
* sqlite:///school.db
Done.
Out[28]:
studentId name program
202012345 Ron BSc Data Science
202022333 Harry BSc Data Science
202054321 Hermione BSc Economics
202101010 Ginny BSc Data Science
Inner join demo (continued)
Example: For each student who has registered for some Statistics courses, get
the number of Statistics courses the student is registered for.
In [29]:
%%sql
* sqlite:///school.db
Done.
Out[29]:
studentId name COUNT(*)
202012345 Ron 1
202022333 Harry 1
202054321 Hermione 2
202101010 Ginny 2
Joining the tables allows us to aggregate the information and answer
some more complicated questions.
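The query for this example is not shown in these notes. The sketch below is one query that is consistent with the output above, assuming that "Statistics courses" means courses whose code starts with ST; it is not necessarily the query used in the lecture. It uses Python's built-in sqlite3 module and simplified tables that keep only the columns needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student(id TEXT, name TEXT)")
conn.execute("CREATE TABLE registration(courseCode TEXT, studentId TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?)",
    [
        ("202022333", "Harry"),
        ("202012345", "Ron"),
        ("202054321", "Hermione"),
        ("202101010", "Ginny"),
        ("202155555", "Dobby"),
    ],
)
conn.executemany(
    "INSERT INTO registration VALUES (?, ?)",
    [
        ("ST207", "202022333"), ("MA214", "202022333"),
        ("ST207", "202012345"), ("MA214", "202012345"),
        ("EC220", "202054321"), ("ST101", "202054321"), ("ST115", "202054321"),
        ("ST101", "202101010"), ("ST115", "202101010"),
    ],
)

# Join the tables, keep only ST courses (assumed to mean Statistics),
# then count the remaining rows per student.
counts = conn.execute("""
    SELECT studentId, name, COUNT(*)
    FROM student
    INNER JOIN registration
        ON student.id = registration.studentId
    WHERE courseCode LIKE 'ST%'
    GROUP BY studentId, name
    ORDER BY studentId
""").fetchall()
print(counts)
```

Dobby does not appear because the inner join only keeps students with at least one matching registration.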
Left join
Example: Same as the previous inner join example, but with the left join:
In [30]:
%%sql
SELECT *
FROM student
LEFT JOIN registration
ON id = studentId
ORDER BY courseCode
* sqlite:///school.db
Done.
Out[30]:
id name program department year courseCode studentId mark
202155555 Dobby BSc Actuarial Science Statistics 1 None None None
202124680 Harry MSc Data Science Statistics 1 None None None
202054321 Hermione BSc Economics Economics 2 EC220 202054321 None
SELECT *
FROM registration
LEFT JOIN student
ON id = studentId
ORDER BY studentId
* sqlite:///school.db
Done.
Out[31]:
courseCode studentId mark id name program department year
SELECT *
FROM registration, student
LIMIT 15; -- only show the first 15 rows
* sqlite:///school.db
Done.
Out[32]:
courseCode studentId mark id name program department year
* sqlite:///school.db
Done.
Out[33]:
courseCode studentId mark id name program department year
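Since the outputs of the last few queries are not reproduced here, the sketch below compares how many rows each kind of join produces on a cut-down dataset. It uses Python's built-in sqlite3 module; the helper count_rows and the three-student, three-registration sample are hypothetical simplifications of the lecture tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student(id TEXT, name TEXT)")
conn.execute("CREATE TABLE registration(courseCode TEXT, studentId TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?)",
    [("202022333", "Harry"), ("202012345", "Ron"), ("202155555", "Dobby")],
)
conn.executemany(
    "INSERT INTO registration VALUES (?, ?)",
    [("ST207", "202022333"), ("MA214", "202022333"), ("ST207", "202012345")],
)


def count_rows(from_clause):
    """Count the rows produced by a FROM clause."""
    return conn.execute(f"SELECT COUNT(*) FROM {from_clause}").fetchone()[0]


# Inner join: only matching (student, registration) pairs.
inner = count_rows("student INNER JOIN registration ON id = studentId")
# Left join from student: also keeps Dobby, who has no registration.
left = count_rows("student LEFT JOIN registration ON id = studentId")
# Left join from registration: every registration has a student here.
reverse_left = count_rows("registration LEFT JOIN student ON id = studentId")
# Listing two tables without a condition pairs every row with every row.
cross = count_rows("registration, student")
print(inner, left, reverse_left, cross)
```

The last count is the product of the two table sizes, which is why the corresponding query above uses LIMIT to show only the first few rows.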