Session 3 BIZ 2
10:15
https://unsplash.com/photos/gm3bxHin8VA
MySQL for Data Analytics
13.09.2023
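ENCLOSED BY '"' (a field may be wrapped in double quotes, so it can contain the field separator)
ESCAPED BY '\\' (the backslash is the escape character)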
http://stackoverflow.com/questions/8462615/what-do-the-following-mysql-csv-import-query-terms-mean
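As a rough analogy only (this is Python's csv module, not MySQL), the two clauses can be mirrored in a few lines: ENCLOSED BY '"' lets a field be wrapped in double quotes so it may contain the separator, and ESCAPED BY '\\' makes the backslash the escape character. The sample line RAW and the function name parse are hypothetical.

```python
import csv
import io

# Hypothetical CSV line: field 2 contains a comma, so it is enclosed in double
# quotes; field 3 escapes its inner quotes with a backslash.
RAW = 'Acme,"Oslo, Norway","say \\"hi\\""\n'

def parse(raw: str) -> list[list[str]]:
    # quotechar plays the role of ENCLOSED BY '"',
    # escapechar plays the role of ESCAPED BY '\\'.
    reader = csv.reader(io.StringIO(raw), delimiter=",", quotechar='"', escapechar="\\")
    return list(reader)

print(parse(RAW))  # [['Acme', 'Oslo, Norway', 'say "hi"']]
```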
Clean the data of a table
• If the data was not imported correctly, you can remove the problematic data. Use the following command to clean the table:
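The slide's own cleaning command is not reproduced here. As a separate, minimal sketch of the idea (Python's built-in sqlite3 module stands in for MySQL, and the table imported_data with its rows is hypothetical), problematic rows can be deleted selectively, or the whole table can be emptied while its structure is kept:

```python
import sqlite3

def clean_table() -> tuple[int, int, int]:
    con = sqlite3.connect(":memory:")
    # Hypothetical imported table with one badly imported row (age is missing).
    con.execute("CREATE TABLE imported_data (region TEXT, age INTEGER)")
    con.executemany("INSERT INTO imported_data VALUES (?, ?)",
                    [("N", 65), ("C", 29), ("bad row", None)])

    def count() -> int:
        return con.execute("SELECT COUNT(*) FROM imported_data").fetchone()[0]

    before = count()
    # Remove only the problematic rows ...
    con.execute("DELETE FROM imported_data WHERE age IS NULL")
    after_fix = count()
    # ... or empty the whole table while keeping its definition.
    # (MySQL also offers TRUNCATE TABLE imported_data; SQLite does not.)
    con.execute("DELETE FROM imported_data")
    return before, after_fix, count()

print(clean_table())  # (3, 2, 0)
```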
Section 2: Key and Index
“In database systems, an index (IDX) is
a data structure defined on columns in
a database table to significantly speed
up data retrieval operations. An index
is a small copy of a database table
sorted by key values. Without an index,
query languages like SQL may have to
scan the entire table from top to bottom
to choose relevant rows.”
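To see the effect described in the quote, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL (the orders rows are hypothetical). EXPLAIN QUERY PLAN, SQLite's counterpart of MySQL's EXPLAIN, reports a full scan before the index exists and an index search afterwards:

```python
import sqlite3

def plan(con: sqlite3.Connection, sql: str) -> str:
    # The last column of each EXPLAIN QUERY PLAN row holds the readable plan step.
    return " | ".join(row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + sql))

def index_demo() -> tuple[str, str]:
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE orders (orderNumber INTEGER, customerNumber INTEGER, status TEXT)")
    # Hypothetical rows: 1,000 orders spread over 50 customers.
    con.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(10000 + i, 100 + i % 50, "Shipped") for i in range(1000)],
    )
    query = "SELECT * FROM orders WHERE customerNumber = 124"
    before = plan(con, query)  # no index: every row has to be read
    con.execute("CREATE INDEX idx_orders_customer ON orders (customerNumber)")
    after = plan(con, query)   # with index: jump straight to matching keys
    return before, after

before, after = index_demo()
print(before)
print(after)
```

In MySQL the equivalent check is EXPLAIN SELECT ...; its output format differs from SQLite's.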
Index of MySQL
Advantages vs. Disadvantages of index
Advantages:
• Speed up relevant queries, such as select.
Disadvantages:
• Use more disk space.
• Slow down insert/update/delete operations.
• Building the index itself takes time.
Operation on big data
• Performing a command on a table with 384,243 rows of data
Command | Duration for 1 query
select * from MyTable where id = 7278409 | 0,000 sec.
select * from MyTable where title like '%This is a new product%' | 10,765 sec.
select * from MyTable where title like '%This is a new product%' and id = 7278409 | 0,000 sec.
select * from MyTable where title like '%This is a new product%' and via_mobile = 'TRUE' | 2,918 sec.
Columns 'id' and 'via_mobile' are indexed, but column 'title' is not indexed.
"via_mobile" only has two different values: TRUE or FALSE.
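A minimal sketch of why the timings differ, again with SQLite (via Python's sqlite3) standing in for MySQL and an empty copy of MyTable where only id is indexed, as on the slide. A pattern that starts with % cannot use an index, so the whole table is scanned, while adding an indexed equality condition lets the database narrow the candidates first. The via_mobile case is left out; an index on a column with only two values typically filters out far fewer rows, which is consistent with its slower timing.

```python
import sqlite3

def plan(con: sqlite3.Connection, sql: str) -> str:
    return " | ".join(row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + sql))

def like_demo() -> tuple[str, str]:
    con = sqlite3.connect(":memory:")
    # Mirrors the slide: 'id' is indexed, 'title' is not.
    con.execute("CREATE TABLE MyTable (id INTEGER, title TEXT, via_mobile TEXT)")
    con.execute("CREATE INDEX idx_id ON MyTable (id)")
    pattern = "'%This is a new product%'"
    like_only = plan(con, f"SELECT * FROM MyTable WHERE title LIKE {pattern}")
    like_and_id = plan(
        con, f"SELECT * FROM MyTable WHERE title LIKE {pattern} AND id = 7278409"
    )
    return like_only, like_and_id

like_only, like_and_id = like_demo()
print(like_only)    # a full scan of MyTable
print(like_and_id)  # a search using idx_id
```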
Building index in MySQL (Attention!)
• To operate on a big table (e.g. 10 million rows), you must build an index before you perform basic commands like 'select'.
Building index also takes time!
• “I have a table with 1.4 billion records. The table structure is as follows:
CREATE TABLE text_page (
    text VARCHAR(255),
    page_id INT UNSIGNED
) ENGINE=MYISAM DEFAULT CHARSET=ascii
Primary key
• A primary key is a column (or set of columns) whose value uniquely identifies each row. As a principle, there should be no duplicated rows co-existing in a table.
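A minimal sketch, using Python's sqlite3 module in place of MySQL and hypothetical customer names, of how a primary key enforces this principle: a second row with the same key value is rejected.

```python
import sqlite3

def insert_duplicate_key() -> str:
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE customers (customerNumber INTEGER PRIMARY KEY, customerName TEXT)")
    con.execute("INSERT INTO customers VALUES (103, 'Customer A')")
    try:
        # Same key value as the existing row: the primary key forbids this.
        con.execute("INSERT INTO customers VALUES (103, 'Customer B')")
    except sqlite3.IntegrityError as err:
        return f"rejected ({err})"
    return "accepted"

print(insert_duplicate_key())
```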
Duplicating a table
• Copy the structure and indexes, but not the data:
- create table new_table like old_table;
• Copy the data and the structure, but not the indexes:
- create table new_table as select * from old_table;
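A minimal sketch with SQLite (via Python's sqlite3) standing in for MySQL, checking the second statement: the copy receives the data but none of the indexes. CREATE TABLE ... LIKE is MySQL-specific and has no SQLite equivalent, so it is not shown; table contents are hypothetical.

```python
import sqlite3

def duplicate_table() -> tuple[int, int]:
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE old_table (id INTEGER, name TEXT)")
    con.execute("CREATE INDEX idx_old_name ON old_table (name)")
    con.executemany("INSERT INTO old_table VALUES (?, ?)", [(1, "a"), (2, "b")])
    # Copy structure and data in one statement.
    con.execute("CREATE TABLE new_table AS SELECT * FROM old_table")
    copied_rows = con.execute("SELECT COUNT(*) FROM new_table").fetchone()[0]
    copied_indexes = con.execute(
        "SELECT COUNT(*) FROM sqlite_master WHERE type = 'index' AND tbl_name = 'new_table'"
    ).fetchone()[0]
    return copied_rows, copied_indexes

print(duplicate_table())  # (2, 0)
```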
Purpose of foreign key?
• To delete the information about students who have graduated from the university, you need to, e.g.:
- Delete students’ IDs from the table of university student list
- Delete students’ IDs from the table of department student list
- Delete students’ IDs from the table of course management
- Delete students’ IDs from the table of library
- Delete students’ IDs from the table of health care
- Etc.
Foreign key
• A foreign key is a column in a table (table A) that refers to the primary key of another table (table B).
• Any value in the foreign key column of table A must have a corresponding value in table B.
Table customer
Table product_order
Using FOREIGN KEY Constraints
• Delete and update record using foreign key
http://www.sitepoint.com/mysql-foreign-keys-quicker-database-development/
http://dev.mysql.com/doc/refman/5.6/en/create-table-foreign-keys.html
FOREIGN KEY Constraints
For both update and delete, if you try to update / delete the parent row:
• Restrict: nothing is deleted or updated if a child row refers to the parent row; the delete or update operation on the parent table is rejected [equivalent to No Action].
• Cascade: the child rows are deleted / updated too.
• Set Null: the foreign key column in the child rows is set to NULL when you delete or update the parent [make sure that you have NOT declared that column in the child table as NOT NULL].
https://dev.mysql.com/doc/refman/5.7/en/create-table-foreign-keys.html
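A minimal sketch of the three actions, using Python's sqlite3 module as a stand-in for MySQL. SQLite supports the same ON DELETE clauses but only enforces foreign keys after PRAGMA foreign_keys = ON. Table names follow the customer/product_order example; the rows are hypothetical. ON UPDATE behaves analogously and is not shown.

```python
import sqlite3

def delete_parent(action: str):
    """Create customer/product_order with the given ON DELETE action, then delete the parent."""
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when on
    con.execute("CREATE TABLE customer (customerNumber INTEGER PRIMARY KEY, customerName TEXT)")
    con.execute(
        "CREATE TABLE product_order ("
        " orderNumber INTEGER PRIMARY KEY,"
        " customerNumber INTEGER,"  # nullable, so SET NULL is allowed
        " FOREIGN KEY (customerNumber) REFERENCES customer (customerNumber)"
        f" ON DELETE {action})"
    )
    con.execute("INSERT INTO customer VALUES (103, 'Customer A')")  # hypothetical data
    con.execute("INSERT INTO product_order VALUES (10100, 103)")
    try:
        con.execute("DELETE FROM customer WHERE customerNumber = 103")
    except sqlite3.IntegrityError:
        return "rejected"  # RESTRICT: the parent stays because a child row refers to it
    return con.execute("SELECT orderNumber, customerNumber FROM product_order").fetchall()

for action in ("RESTRICT", "CASCADE", "SET NULL"):
    print(action, delete_parent(action))
# RESTRICT rejected
# CASCADE []
# SET NULL [(10100, None)]
```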
Create foreign key (2)
If you want to add a foreign key after the related
tables have been built:
Function of foreign key
• Inserting an order from a customer that
does not exist in the table customer?
Referential integrity
• Foreign key values must exist in another table
- If not, those records cannot be joined
Table customer
Table product_order
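A minimal sketch of referential integrity with SQLite (via Python's sqlite3) standing in for MySQL and hypothetical rows: with enforcement on, an order for a non-existent customer is rejected; with enforcement switched off, the orphan order is stored but silently disappears from an inner join.

```python
import sqlite3

def referential_integrity():
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")  # enforcement must be switched on in SQLite
    con.execute("CREATE TABLE customer (customerNumber INTEGER PRIMARY KEY, customerName TEXT)")
    con.execute(
        "CREATE TABLE product_order (orderNumber INTEGER PRIMARY KEY, "
        "customerNumber INTEGER REFERENCES customer (customerNumber))"
    )
    con.execute("INSERT INTO customer VALUES (103, 'Customer A')")  # hypothetical data
    con.execute("INSERT INTO product_order VALUES (1, 103)")
    try:
        con.execute("INSERT INTO product_order VALUES (2, 999)")  # customer 999 does not exist
        enforced = "accepted"
    except sqlite3.IntegrityError:
        enforced = "rejected"
    con.commit()  # the PRAGMA below has no effect inside an open transaction
    con.execute("PRAGMA foreign_keys = OFF")
    con.execute("INSERT INTO product_order VALUES (2, 999)")  # now the orphan gets in
    joined = con.execute(
        "SELECT o.orderNumber FROM product_order AS o "
        "JOIN customer AS c ON o.customerNumber = c.customerNumber "
        "ORDER BY o.orderNumber"
    ).fetchall()
    return enforced, joined

print(referential_integrity())  # ('rejected', [(1,)])
```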
Drop foreign key via HeidiSQL
Drop foreign key via commands
• A foreign key constraint often has an automatically generated name (in InnoDB, e.g. product_order_ibfk_1) that is not intuitively available.
Step 1: “show create table Table_Name”
Step 2: Obtain the name of the foreign key from the output
Step 3: “alter table Table_Name drop foreign key `constraint_name`”
Section 3: Entity-relationship diagram (ERD)
Entity-relationship diagram
http://www.conceptdraw.com/solution-park/diagramming-ERD
Section 4: Select
• Select command is used to retrieve data from a
table
• Template of a “select” query:
Select attributes
from table or view
[Where conditions]
[Group by attributes [Having condition]]
[Order by attributes [asc | desc]]
[Limit]
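A minimal sketch that runs the template with every optional clause in the required order, using Python's sqlite3 module in place of MySQL and hypothetical payments rows. WHERE filters rows before grouping; HAVING filters groups afterwards.

```python
import sqlite3

def payment_summary() -> list[tuple[int, float]]:
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE payments (customerNumber INTEGER, amount REAL)")
    con.executemany("INSERT INTO payments VALUES (?, ?)", [  # hypothetical rows
        (103, 6000.0), (103, 14000.0), (112, 32000.0),
        (114, 3000.0), (114, 1500.0), (119, 500.0), (121, 9000.0),
    ])
    sql = """
        SELECT customerNumber, SUM(amount) AS total   -- Select attributes
        FROM payments                                  -- from table
        WHERE amount > 1000                            -- [Where conditions]
        GROUP BY customerNumber                        -- [Group by attributes
        HAVING SUM(amount) > 5000                      --   [Having condition]]
        ORDER BY total DESC                            -- [Order by attributes desc]
        LIMIT 2                                        -- [Limit]
    """
    return con.execute(sql).fetchall()

print(payment_summary())  # [(112, 32000.0), (103, 20000.0)]
```

Customer 119 is removed by WHERE (its only payment is 500), customer 114 by HAVING (total 4500), and LIMIT then keeps the two largest totals.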
Select for calculation
• In command window:
This does not work!
This works!
Select: Retrieving certain columns and all rows
• Suppose you have a data table with 20+ variables and half a million rows, but you only want to use a few variables. List just those columns:
select contactLastName, contactFirstName, phone
from customers;
(customers is the table name; contactLastName, contactFirstName and phone are the column names.)
Select all columns via *
• An asterisk (*) indicates an inclusion of all
the columns of a table
Select…Where…(1)
• A customer of your company did not pay the bill by the deadline. Your boss knows that the customer's ID (customerNumber) is 103 and asks you to provide the contact details of the customer.
Your task: quickly find the contact details of the customer whose customerNumber is 103.
Solution
Select contactLastName, contactFirstName
from customers
where customerNumber = 103;
Select…Where…(2)
• An important customer contacted your company asking for some information on his deal with the company, but he lost the business card of the sales representative who assisted him with the deal. He remembers that the representative's first name is Leslie. Please find Leslie's contact information in the database.
“=” is the MySQL equality operator.
Remember to enclose a string in single quotes ('), e.g. 'Leslie'.
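A minimal sketch of the Leslie lookup, using SQLite via Python's sqlite3 as a stand-in for MySQL; the employees columns and rows are hypothetical. Without quotes, Leslie is read as a column name and the query fails (MySQL reports an unknown column in the same situation).

```python
import sqlite3

def leslie_lookup():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE employees (firstName TEXT, lastName TEXT, email TEXT)")
    con.executemany("INSERT INTO employees VALUES (?, ?, ?)", [  # hypothetical rows
        ("Leslie", "Smith", "lsmith@example.com"),
        ("Diane", "Jones", "djones@example.com"),
    ])
    quoted = con.execute(
        "SELECT firstName, lastName, email FROM employees WHERE firstName = 'Leslie'"
    ).fetchall()
    try:
        con.execute("SELECT * FROM employees WHERE firstName = Leslie")  # no quotes
        unquoted = "ok"
    except sqlite3.OperationalError as err:
        unquoted = str(err)
    return quoted, unquoted

quoted, unquoted = leslie_lookup()
print(quoted)    # [('Leslie', 'Smith', 'lsmith@example.com')]
print(unquoted)  # no such column: Leslie
```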
Comparison operator
Comparison operator Description
= Equal to
< Less than
> Greater than
<= Less than or equal to
>= Greater than or equal to
<> Not equal to
!= Not equal to
Comparison operator: Date
Comparison operators can also be applied to date-type columns, e.g.:
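A minimal sketch with SQLite (via Python's sqlite3) standing in for MySQL and hypothetical orders. SQLite has no separate DATE type, so dates are stored here as 'YYYY-MM-DD' text, which compares correctly as a string; in MySQL the same literal is compared against a DATE column.

```python
import sqlite3

def orders_from(date: str) -> list[int]:
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE orders (orderNumber INTEGER, orderDate TEXT)")
    con.executemany("INSERT INTO orders VALUES (?, ?)",  # hypothetical rows
                    [(10100, "2003-01-06"), (10200, "2003-12-01"), (10300, "2004-10-04")])
    rows = con.execute(
        "SELECT orderNumber FROM orders WHERE orderDate >= ? ORDER BY orderNumber", (date,)
    )
    return [r[0] for r in rows]

print(orders_from("2003-12-01"))  # [10200, 10300]
```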
Compound conditions: and / or / not
SELECT contactLastName, contactFirstName, phone, city, addressLine1
from customers
where city = 'Paris' and creditLimit > 50000
Single quotes are needed around the string 'Paris'.
Tips: quick way of counting rows
Example for NOT
Select contactLastName, contactFirstName, phone, city
from customers
where NOT
(city = 'Paris' and CreditLimit > 5000)
or (city = 'Madrid' and CreditLimit > 10000)
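NOT binds more tightly than OR, so in the query above NOT applies only to the first parenthesised condition. A minimal sketch with SQLite (via Python's sqlite3) standing in for MySQL and hypothetical customers makes the difference visible; wrapping the whole OR in one extra pair of parentheses excludes both groups instead.

```python
import sqlite3

ROWS = [("A", "Paris", 6000), ("B", "Paris", 4000), ("C", "Madrid", 20000),
        ("D", "Madrid", 8000), ("E", "Lyon", 90000)]  # hypothetical customers

def names(where: str) -> list[str]:
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE customers (customerName TEXT, city TEXT, creditLimit INTEGER)")
    con.executemany("INSERT INTO customers VALUES (?, ?, ?)", ROWS)
    sql = f"SELECT customerName FROM customers WHERE {where} ORDER BY customerName"
    return [r[0] for r in con.execute(sql)]

as_written = names("NOT (city = 'Paris' AND creditLimit > 5000) "
                   "OR (city = 'Madrid' AND creditLimit > 10000)")
negate_both = names("NOT ((city = 'Paris' AND creditLimit > 5000) "
                    "OR (city = 'Madrid' AND creditLimit > 10000))")
print(as_written)   # ['B', 'C', 'D', 'E']
print(negate_both)  # ['B', 'D', 'E']
```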
Alternative 1:
Select * from customers
where creditLimit >= 60000 and creditLimit <= 70000
Alternative 2:
Select * from customers
where creditLimit between 60000 and 70000
Challenge!
• Please retrieve the records from table “orders” in which status is not “Shipped”, orderDate is between '2005-05-09' and '2005-05-31', and requiredDate is between '2005-06-01' and '2005-06-10', for customers whose customerNumber is 124 or 119.
Table orders
Reflect and Question
SELECT *
FROM orders
WHERE `status` != 'Shipped' AND
orderDate BETWEEN '2005-05-09' AND '2005-05-31' AND
requiredDate BETWEEN '2005-06-01' AND '2005-06-10' AND
(customerNumber = 124 or 119)
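One point worth reflecting on: customerNumber = 124 or 119 is parsed as (customerNumber = 124) or 119, and a non-zero number counts as true, so this condition matches every row. A minimal sketch with SQLite (via Python's sqlite3) standing in for MySQL, with hypothetical orders and the other conditions left out, compares it with customerNumber IN (124, 119), which is equivalent to (customerNumber = 124 or customerNumber = 119):

```python
import sqlite3

def compare_conditions() -> tuple[list[int], list[int]]:
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE orders (orderNumber INTEGER, customerNumber INTEGER)")
    con.executemany("INSERT INTO orders VALUES (?, ?)",  # hypothetical rows
                    [(1, 119), (2, 124), (3, 103), (4, 141)])

    def numbers(where: str) -> list[int]:
        sql = f"SELECT orderNumber FROM orders WHERE {where} ORDER BY orderNumber"
        return [r[0] for r in con.execute(sql)]

    buggy = numbers("(customerNumber = 124 or 119)")  # 119 alone is always true
    fixed = numbers("customerNumber IN (124, 119)")
    return buggy, fixed

buggy, fixed = compare_conditions()
print(buggy)  # [1, 2, 3, 4]
print(fixed)  # [1, 2]
```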
Select …. limit
• Limit is used to limit your MySQL query
results to those that fall within a specified
range.
• select * from products Limit 0,10;
# Retrieve the first ten rows (same as Limit 10)
• select * from products Limit 5,10;
# Skip the first five rows and retrieve the next ten (rows 6-15)
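A minimal sketch with SQLite (via Python's sqlite3) standing in for MySQL; SQLite also accepts the Limit offset, count form. The 20 product codes are hypothetical. An Order by is added because, without it, which rows count as the "first ten" is not guaranteed.

```python
import sqlite3

def limit_demo() -> tuple[list[str], list[str]]:
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE products (productCode TEXT)")
    con.executemany("INSERT INTO products VALUES (?)",
                    [(f"P{i:02d}",) for i in range(1, 21)])  # P01 ... P20
    first_ten = [r[0] for r in con.execute(
        "SELECT productCode FROM products ORDER BY productCode LIMIT 0, 10")]
    six_to_fifteen = [r[0] for r in con.execute(
        "SELECT productCode FROM products ORDER BY productCode LIMIT 5, 10")]
    return first_ten, six_to_fifteen

first_ten, six_to_fifteen = limit_demo()
print(first_ten)       # P01 ... P10
print(six_to_fifteen)  # P06 ... P15
```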
Select …. limit
Limit function:
• Enables a quick check of the validity of the result.
• Saves lots of time when a huge number of records would be returned [a big-data issue].
Using computed columns (1)
• Your boss asks you to provide information
on the values of different products in stock.
• Value = quantityInStock * buyPrice
Using computed columns (2)
Select productCode, quantityInStock*buyPrice
from products
Arithmetic operators: +, -, *, /, %
select 5/2 → 2,5000
select 5%2 → 1
Using As for Aliases
• SELECT column_name AS alias_name
FROM table_name;
Select productCode,
quantityInStock*buyPrice as productValue
from products
Make the result more readable
• Select productCode,
round(quantityInStock*buyPrice, 1) as productValue
from products
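A minimal sketch combining the alias, round() and sorting by the alias, using Python's sqlite3 as a stand-in for MySQL with three hypothetical products:

```python
import sqlite3

def product_values() -> list[tuple[str, float]]:
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE products (productCode TEXT, quantityInStock INTEGER, buyPrice REAL)")
    con.executemany("INSERT INTO products VALUES (?, ?, ?)",  # hypothetical rows
                    [("P1", 3, 10.33), ("P2", 7, 2.5), ("P3", 12, 4.1)])
    sql = """
        SELECT productCode,
               ROUND(quantityInStock * buyPrice, 1) AS productValue
        FROM products
        ORDER BY productValue DESC
    """
    return con.execute(sql).fetchall()

for code, value in product_values():
    print(code, value)
```

The alias productValue can be reused in Order by, which saves repeating the whole expression.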
Save results as a new table
Duplicate tables
• create table new_table as select * from old_table;
Order by… asc/desc
• Problem of unsorted results: without Order by, rows are returned in no guaranteed order
• Order by column asc/desc
- Asc: ascending (default option)
- Desc: descending
Example
• select productCode,
quantityInStock*buyPrice as productValue
from products
order by productValue desc
limit 3
More complex sort
• Sort several columns at the same time:
Order by column 1, column 2…
(column 1 is the primary sort key, column 2 the secondary sort key)
• If you want to change the direction of sorting among columns:
Order by column 1 desc, column 2 asc
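A minimal sketch of a two-key sort with SQLite (via Python's sqlite3) standing in for MySQL and hypothetical payments: rows are ordered by customerNumber first, and amount only decides the order among rows with the same customerNumber.

```python
import sqlite3

def sorted_payments() -> list[tuple[int, int]]:
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE payments (customerNumber INTEGER, amount INTEGER)")
    con.executemany("INSERT INTO payments VALUES (?, ?)",  # hypothetical rows
                    [(119, 12000), (103, 11000), (119, 47000), (103, 30000), (112, 15000)])
    sql = ("SELECT customerNumber, amount FROM payments "
           "ORDER BY customerNumber ASC, amount DESC")
    return con.execute(sql).fetchall()

print(sorted_payments())
# [(103, 30000), (103, 11000), (112, 15000), (119, 47000), (119, 12000)]
```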
Question
• Your boss wants to identify ‘big’ payments to the company in 2004.
• Amounts of ‘big’ payments should be over 10000.
• Please show the results in ascending order of customerNumber and, for the same customer, in descending order of amount.
Table ‘payments’
Answer
Select * from payments
where amount> 10000 and
paymentDate between '2004-01-01'
and '2004-12-31'
order by customerNumber asc,
amount desc
Instructions on Hands-on session 3
• “chile” data and “classicmodels” database will be used for the
hands-on training 3. The tables belonging to “classicmodels”
database should have already been imported at the first hands-on
session via importing the “classicmodels (using PCs in the lab or at
home).sql” file.
• The Chile data has 2400 rows and 8 columns. This data was derived
from a national survey conducted in April and May of 1988 by
FLACSO/Chile. Missing data are removed.
• Please read the description of the dataset (Description of Chile election 1988.docx), downloadable from MyCourse [Data and database files folder].