Session 3 BIZ 2


The lecture will start soon at 10:15

https://unsplash.com/photos/gm3bxHin8VA
MySQL for Data Analytics

Lecturer: Yong Liu


Contact me at: [email protected]
Objectives for class 3
- HeidiSQL: Import a CSV file into a MySQL table
- Export data to a database file (.sql file)
- Key and index: foreign key
- Understanding the basics of the Entity-Relationship Diagram (ERD)
- MySQL keyword: Select
Import CSV File Into MySQL Table
1. Download ‘Chile.csv’ from MyCourse.
2. Open the CSV file to check its structure.
What if the CSV file is very large? Could you inspect it in R instead?
3. Create a new table in the database with a structure that is consistent with the structure of the CSV file.
4. Import the CSV file into the new table:
Tools → Import CSV file

13.09.2023
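For step 3, a minimal sketch of a matching table. The table name `chile` and all column names and types are assumptions modelled on the widely distributed 8-variable Chile dataset; compare them with the header row of your own Chile.csv and adjust them before running the statement.

```sql
-- Hypothetical structure: names and types are assumptions,
-- adjust them to the header row and values of your Chile.csv
CREATE TABLE chile (
  region     VARCHAR(5),
  population INT,
  sex        CHAR(1),
  age        INT,
  education  VARCHAR(5),
  income     INT,
  statusquo  DOUBLE,
  vote       CHAR(1)
);

-- Check that the columns line up with the CSV file, one column per CSV field
SHOW COLUMNS FROM chile;
```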
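ENCLOSED BY "
A delimiter inside the enclosing quotes is part of the value, so "three; tres; trois" is read as one field:

One; two; "three; tres; trois" ; four ; five

ESCAPED BY \\
A backslash before a quote marks it as a literal character, so \"trois\" keeps its quotes inside the field:

One; two; "In France, one would say \"trois\""; four ; five

http://stackoverflow.com/questions/8462615/what-do-the-following-mysql-csv-import-query-terms-mean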
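ENCLOSED BY and ESCAPED BY are clauses of MySQL's LOAD DATA statement. The sketch below is hypothetical: the table `demo_import` and the file `demo.csv` are made up, and the file is assumed to contain the two example lines written without spaces around the semicolons (`One;two;"three; tres; trois";four;five` and `One;two;"In France, one would say \"trois\"";four;five`) with Unix line endings. The spaces are removed because MySQL only treats a field as enclosed when it starts with the quote character and the closing quote is followed directly by the field terminator. LOCAL also requires `local_infile` to be enabled on both client and server. The final SELECT should then return `three; tres; trois` and `In France, one would say "trois"`.

```sql
-- Hypothetical target table with one column per field
CREATE TABLE demo_import (
  c1 VARCHAR(100), c2 VARCHAR(100), c3 VARCHAR(100),
  c4 VARCHAR(100), c5 VARCHAR(100)
);

LOAD DATA LOCAL INFILE 'demo.csv'
INTO TABLE demo_import
FIELDS TERMINATED BY ';'
       ENCLOSED BY '"'     -- delimiters inside the enclosing quotes stay in the value
       ESCAPED BY '\\'     -- a backslash makes the next character literal
LINES TERMINATED BY '\n';

-- Inspect the third field of both rows
SELECT c3 FROM demo_import;
```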
Clean the data of a table
• If the data was not imported correctly, you can delete the problematic data. Use the following command to clean the table:

delete from table_name;

• Result: the table itself is kept, but it no longer contains any rows.
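A small demonstration of the cleanup step on a hypothetical table `demo_cleanup`. DELETE removes the rows but keeps the table definition, so the table can be filled again by a new import.

```sql
-- Hypothetical table standing in for a badly imported one
CREATE TABLE demo_cleanup (id INT, note VARCHAR(20));
INSERT INTO demo_cleanup VALUES (1, 'ok'), (2, 'garbled');

-- Remove every row, the table definition itself is kept
DELETE FROM demo_cleanup;

-- The table still exists but is now empty, so this returns 0
SELECT COUNT(*) AS remaining_rows FROM demo_cleanup;
```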
Export data to a database file
• Tools → Export database as SQL
Section 2: Key and Index
“In database systems, an index (IDX) is
a data structure defined on columns in
a database table to significantly speed
up data retrieval operations. An index
is a small copy of a database table
sorted by key values. Without an index,
query languages like SQL may have to
scan the entire table from top to bottom
to choose relevant rows.”

A MySQL index works like the index of a dictionary, or like knowing the address of a person in Finland when you want to find that person.

[Image: A database without index!]
http://1zxig52eo2js464ld31v6ig86yy.wpengine.netdna-cdn.com/wp-content/uploads/2014/05/iStock_000018288212Small.jpg
http://www.techopedia.com/definition/1210/index-idx-database-systems
http://discuss.fogcreek.com/joelonsoftware5/default.asp?cmd=show&ixPost=152398
Index
• Indexes are used to find rows with specific column values
quickly. Without an index, MySQL must begin with the first
row and then read through the entire table to find the relevant
rows. The larger the table, the more this costs.
• If the table has an index for the columns in question, MySQL
can quickly determine the position to seek to in the middle of
the data file without having to look at all the data. This is
much faster than reading every row sequentially.

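A minimal sketch of building an index and asking MySQL whether it uses one, with a hypothetical table `demo_people` and a hypothetical index name `idx_city`. EXPLAIN shows the chosen index in its `key` column (NULL means no index, i.e. a full scan). On a three-row table the optimizer may still consider a scan cheaper, so treat the second EXPLAIN output as illustrative.

```sql
-- Hypothetical table, no index on city yet
CREATE TABLE demo_people (
  id   INT NOT NULL PRIMARY KEY,
  name VARCHAR(50) NOT NULL,
  city VARCHAR(50) NOT NULL
);
INSERT INTO demo_people VALUES
  (1, 'Aino', 'Helsinki'), (2, 'Mikko', 'Espoo'), (3, 'Liisa', 'Turku');

-- No usable index: the key column of EXPLAIN is NULL (full table scan)
EXPLAIN SELECT * FROM demo_people WHERE city = 'Espoo';

-- Build an index on the column used in the search condition
CREATE INDEX idx_city ON demo_people (city);

-- Run EXPLAIN again and compare the possible_keys and key columns
EXPLAIN SELECT * FROM demo_people WHERE city = 'Espoo';

-- List all indexes of the table, PRIMARY and idx_city
SHOW INDEX FROM demo_people;
```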
Index of MySQL

Advantages vs. Disadvantages of index
Advantages:
• Speed up relevant queries, like select
Disadvantages:
• More disk space
• Degrade insert/update/delete speed
• Building the index itself takes time
Operation on big data
• Performing commands on a table with 384,243 rows of data:
- select * from MyTable where id = 7278409
  Duration for 1 query: 0,000 sec.
- select * from MyTable where title like '%This is a new product%'
  Duration for 1 query: 10,765 sec.
- select * from MyTable where title like '%This is a new product%' and id = 7278409
  Duration for 1 query: 0,000 sec.
- select * from MyTable where title like '%This is a new product%' and via_mobile = 'TRUE'
  Duration for 1 query: 2,918 sec.
Columns ‘id’ and ‘via_mobile’ are indexed, but column ‘title’ is not indexed.
“via_mobile” only has two different values: TRUE or FALSE. So the index on id narrows the search to a single row, while the index on via_mobile still leaves many rows to be scanned for the title pattern.
Building index in MySQL (Attention!)
• To operate on a big table (e.g. 10 million rows), you must build an index before you perform basic commands like ‘select’.

• Building an index also takes time.
Building index also takes time!
• “I have a table with 1.4 billion records. The table structure is as follows:
CREATE TABLE text_page (
  text VARCHAR(255),
  page_id INT UNSIGNED
) ENGINE=MYISAM DEFAULT CHARSET=ascii

The requirement is to create an index over the column text.
• The table size is about 34G.
• I have tried to create the index by the following statement:
ALTER TABLE text_page ADD KEY ix_text (text)
• After 10 hours' waiting I finally give up this approach.
• Is there any workable solution on this problem?”
http://serverfault.com/questions/140488/mysql-create-index-on-1-4-billion-records
http://www.getambition.com/wp-content/uploads/2013/05/BigData.jpg
Tips
• Index the columns referenced in WHERE clauses and the columns used in JOIN clauses.
• Indexing many columns has some disadvantages. However, these disadvantages are often negligible.
• Declare the columns you plan to index as NOT NULL, so that NULL values are never stored in them.
WHERE and JOIN clauses will be introduced in future sessions.
http://stackoverflow.com/questions/11694743/should-i-add-an-index-for-all-fields-in-the-where-clause-mysql
Key and Index
• KEY is normally a synonym for INDEX.

• Columns defined as primary keys or unique keys are automatically indexed in MySQL.
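A sketch to check this yourself, using a hypothetical table `demo_students`. No CREATE INDEX statement is issued, yet SHOW INDEX should list an index named PRIMARY and one for the unique column `email`.

```sql
-- Hypothetical table with a primary key and a unique key, no explicit index
CREATE TABLE demo_students (
  student_id INT NOT NULL PRIMARY KEY,
  email      VARCHAR(100) UNIQUE,
  name       VARCHAR(50)
);

-- Lists the PRIMARY index on student_id and the unique index on email
SHOW INDEX FROM demo_students;
```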
Primary key
• As a principle, there should be no duplicated rows co-existing in a table.

• A PRIMARY KEY is a unique index where all key columns must be defined as NOT NULL.

• A table can have only one PRIMARY KEY.
Unique key
• A unique key creates a constraint such that
all values in the index must be distinct.
• A unique key permits multiple NULL values
for columns that can contain NULL.
• A table can have multiple unique keys.
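A hypothetical example (table `demo_accounts`, made-up values) with one PRIMARY KEY and two unique keys, showing that a unique key accepts several NULL values while rejecting duplicate non-NULL values.

```sql
CREATE TABLE demo_accounts (
  account_id INT NOT NULL PRIMARY KEY,   -- the only PRIMARY KEY of the table
  email      VARCHAR(100) NULL UNIQUE,   -- unique key 1, NULL allowed
  phone      VARCHAR(20)  NULL UNIQUE    -- unique key 2, NULL allowed
);

-- Two rows with a NULL email are both accepted by the unique key
INSERT INTO demo_accounts VALUES
  (1, NULL, '040-1'),
  (2, NULL, '040-2'),
  (3, 'a@example.com', NULL);

-- Repeating a non-NULL email fails with error 1062 (duplicate entry)
-- INSERT INTO demo_accounts VALUES (4, 'a@example.com', '040-4');
```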
Duplicating a table
• Copy the structure and indexes, but not the data:
- create table new_table like old_table;

• Copy the structure, indexes and the data:
- create table new_table like old_table;
- insert into new_table select * from old_table;

• Copy the data and the structure, but not the indexes:
- create table new_table as select * from old_table;
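The three variants side by side on hypothetical tables `demo_old` and `demo_copy1` to `demo_copy3`. Afterwards, SHOW INDEX FROM demo_copy3 should return no rows, because CREATE TABLE ... AS SELECT does not copy the PRIMARY KEY.

```sql
CREATE TABLE demo_old (id INT NOT NULL PRIMARY KEY, val INT);
INSERT INTO demo_old VALUES (1, 10), (2, 20);

-- 1) Structure and indexes, no data
CREATE TABLE demo_copy1 LIKE demo_old;

-- 2) Structure, indexes and data
CREATE TABLE demo_copy2 LIKE demo_old;
INSERT INTO demo_copy2 SELECT * FROM demo_old;

-- 3) Structure and data, but no indexes
CREATE TABLE demo_copy3 AS SELECT * FROM demo_old;
```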
Purpose of foreign key?
• To delete the information of students who have graduated from the university from the database, you need to, e.g.:
- Delete students’ IDs from the table of university student list
- Delete students’ IDs from the table of department student list
- Delete students’ IDs from the table of course management
- Delete students’ IDs from the table of library
- Delete students’ IDs from the table of health care
- Etc.
Foreign key
• A foreign key is a column in a table (table A) that is the primary key in another table (table B).
• Any data in the foreign-key column of table A must have corresponding data in table B.

- Note: a foreign-key column need not be unique in table A.
Example
[Figure: Table customer (primary key) and Table product_order (foreign key)]
Create foreign key (1)

CREATE TABLE product_order
(
  Product char(50) NOT NULL,
  Order_ID int NOT NULL,
  Person_ID int NOT NULL,
  Price int NOT NULL,
  CONSTRAINT `key_1` FOREIGN KEY (Person_ID) REFERENCES customer(Person_ID)
);
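The statement above assumes that table customer already exists with Person_ID as its primary key. A self-contained sketch that creates the parent and child tables in the right order; the names `demo_customer`, `demo_order`, `fk_demo_order_customer` and the extra `Name` column are hypothetical, chosen so they do not clash with the tables on the slides.

```sql
-- Parent table: the referenced column is its primary key
CREATE TABLE demo_customer (
  Person_ID INT NOT NULL PRIMARY KEY,
  Name      VARCHAR(50) NOT NULL          -- hypothetical extra column
);

-- Child table: Person_ID refers to demo_customer(Person_ID)
CREATE TABLE demo_order (
  Product   CHAR(50) NOT NULL,
  Order_ID  INT NOT NULL,
  Person_ID INT NOT NULL,
  Price     INT NOT NULL,
  CONSTRAINT fk_demo_order_customer
    FOREIGN KEY (Person_ID) REFERENCES demo_customer (Person_ID)
);

-- The parent row has to exist before a child row can refer to it
INSERT INTO demo_customer VALUES (11, 'Hypothetical customer');
INSERT INTO demo_order VALUES ('ABC washing machine', 2, 11, 550);
```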
Create foreign key via HeidiSQL

Using FOREIGN KEY Constraints
• Deleting and updating records that are linked by a foreign key

http://www.sitepoint.com/mysql-foreign-keys-quicker-database-development/
http://dev.mysql.com/doc/refman/5.6/en/create-table-foreign-keys.html
FOREIGN KEY Constraints
For both update and delete, if you try to update/delete the parent row:
• RESTRICT: nothing is deleted or updated if a child row exists; the delete or update operation on the parent table is rejected [equivalent to NO ACTION].
• CASCADE: the child rows are deleted/updated too.
• SET NULL: the foreign-key column of the child rows is set to NULL when you delete or update the parent row [make sure that you have NOT declared the columns in the child table as NOT NULL].

https://dev.mysql.com/doc/refman/5.7/en/create-table-foreign-keys.html
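A hypothetical sketch (tables `demo_dept` and `demo_staff`) combining ON DELETE SET NULL with ON UPDATE CASCADE. Without ON DELETE/ON UPDATE clauses MySQL applies NO ACTION, which in InnoDB behaves like RESTRICT and would reject both the UPDATE and the DELETE below.

```sql
CREATE TABLE demo_dept (dept_id INT NOT NULL PRIMARY KEY);

CREATE TABLE demo_staff (
  staff_id INT NOT NULL PRIMARY KEY,
  dept_id  INT NULL,                        -- must allow NULL for SET NULL
  CONSTRAINT fk_demo_staff_dept FOREIGN KEY (dept_id)
    REFERENCES demo_dept (dept_id)
    ON DELETE SET NULL
    ON UPDATE CASCADE
);

INSERT INTO demo_dept VALUES (1), (2);
INSERT INTO demo_staff VALUES (100, 1), (101, 2);

-- CASCADE: renumbering department 1 also updates the child row
UPDATE demo_dept SET dept_id = 10 WHERE dept_id = 1;

-- SET NULL: deleting department 2 sets the child dept_id to NULL
DELETE FROM demo_dept WHERE dept_id = 2;

-- staff 100 now has dept_id 10, staff 101 has dept_id NULL
SELECT staff_id, dept_id FROM demo_staff ORDER BY staff_id;
```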
Create foreign key (2)
If you want to add a foreign key after the related tables have been built:

ALTER TABLE product_order ADD CONSTRAINT `key_1` FOREIGN KEY (Person_ID) REFERENCES customer(Person_ID);
Function of foreign key
• What happens if you insert an order from a customer that does not exist in the table customer?

insert into product_order values ('ABC washing machine', 2, 11, 550);
/* SQL Error (1452): Cannot add or update a child row: a foreign key constraint fails
(`temp`.`product_order`, CONSTRAINT `key_1` FOREIGN KEY (`Person_ID`)
REFERENCES `customer` (`Person_Id`)) */
Referential integrity
• Foreign key values must exist in another table
- If not, those records cannot be joined
• Can be enforced when data is added
- Associate a primary key with each foreign key
• Helps avoid erroneous data
- Only need to ensure data quality for primary keys
Example (again)
[Figure: Table customer (primary key) and Table product_order (foreign key)]
Reflection
How could you use foreign keys in your research project?
Drop foreign key via HeidiSQL

Drop foreign key via commands
• The name of a foreign key (its constraint name) is often not obvious, because MySQL generates one when you do not name the constraint yourself.

• Step 1: Obtain the name of the foreign key (constraint name)
- Two sub-steps

• Step 2: Drop the foreign key (constraint)
Step 1: “show create table Table_Name”
Step 2: Obtain the name of the foreign key from the CONSTRAINT line of the output
Step 3: “alter table Table_Name drop foreign key `constraint_name`”
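A sketch of these steps on hypothetical tables `demo_parent` and `demo_child`. The foreign key is deliberately left unnamed; InnoDB then generates a name of the form `<table>_ibfk_<n>`, which is why the lookup step is needed. The information_schema query is an alternative way to find the same name.

```sql
CREATE TABLE demo_parent (id INT NOT NULL PRIMARY KEY);
CREATE TABLE demo_child (
  id        INT NOT NULL PRIMARY KEY,
  parent_id INT,
  FOREIGN KEY (parent_id) REFERENCES demo_parent (id)   -- no constraint name given
);

-- Step 1: the CONSTRAINT line of this output shows the generated name
SHOW CREATE TABLE demo_child;

-- Alternative for steps 1 and 2: look the name up in information_schema
SELECT constraint_name
FROM information_schema.referential_constraints
WHERE constraint_schema = DATABASE() AND table_name = 'demo_child';

-- Step 3: drop it, InnoDB named it demo_child_ibfk_1 here
ALTER TABLE demo_child DROP FOREIGN KEY demo_child_ibfk_1;
```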
Section 3: Entity-relationship diagram (ERD)

Entity-relationship diagram

• A graphical representation of the structure of a database.
• When a relational database is to be designed, an entity-relationship diagram is drawn at an early stage and developed as the requirements of the database and its processing become better understood.
http://creately.com/diagram/example/gsv8l5hs2/Exam+Database
There are different notations; the Crow’s Foot ERD is a popular one.

http://www.conceptdraw.com/solution-park/diagramming-ERD
Section 4: Select
• The select command is used to retrieve data from a table
• Template of a “select” query:
Select attributes
from table or view
[Where conditions]
[Group by attributes [Having condition]]
[Order by attributes [asc | desc]]
[Limit]
Select for calculation
• In command window:
This does not work!

This works!

Select: Retrieving certain columns and all rows
• Suppose you have a data table with 20+ variables and half a million rows, but you only want to use a few variables:
select contactLastName, contactFirstName, phone
from customers;

(The column names follow select; the table name follows from.)
Select all columns via *
• An asterisk (*) indicates an inclusion of all
the columns of a table

select * from customers;

Select…Where…(1)
• A customer of your company did not pay the bill by the deadline. Your boss knows that the ID of the customer (customerNumber) is 103 and asks you to provide the contact details of the customer.
Your task: quickly find the contact details of the customer whose customerNumber is 103.
Solution
Select contactLastName, contactFirstName
from customers
where customerNumber = 103;

Select…Where…(2)
• An important customer contacted your company asking for some information on his deal with the company, but he lost the business card of the sales representative who assisted him with the deal. He remembers that the first name of the person is Leslie. Please find the contact information of Leslie in the database.
“=” is MySQL's equality operator.
Remember to enclose strings in single quotes (').

Select lastName, firstName, email
from employees
where firstName = 'Leslie'

Is it possible to compare strings with 'greater than' and 'less than'?
select lastName, firstName, email
from employees
where firstName > 'L'

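Yes. Strings are compared character by character according to the column's collation, and a shorter prefix sorts before the longer string. A sketch with a hypothetical `demo_employees` table and made-up names; with MySQL's default case-insensitive collations, a comparison such as 'leslie' = 'Leslie' is also true.

```sql
CREATE TABLE demo_employees (firstName VARCHAR(50));
INSERT INTO demo_employees VALUES ('Diane'), ('Leslie'), ('Mary'), ('L');

-- Returns Leslie and Mary, the single letter L itself is not greater
SELECT firstName FROM demo_employees WHERE firstName > 'L';

-- A shorter prefix sorts first, so this returns 1
SELECT 'L' < 'Leslie' AS prefix_first;
```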
Comparison operator
Comparison operator Description
= Equal to
< Less than
> Greater than
<= Less than or equal to
>= Greater than or equal to
<> Not equal to
!= Not equal to

Don’t get confused with other programming languages: in MySQL, equality is tested with = (not ==).
Comparison operator: Date
• Your company did a one-day marketing
campaign on June 21, 2004.
• Your boss wants to know the contact details of the customers who were motivated to place an order on that day.
51
Comparison operator: Date
Comparison operator can also be applied to
date type column, e.g.:

Select customerNumber from payments where


paymentDate = '2004-06-21'

13.09.2023
52
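A hypothetical sketch (table `demo_payments`, made-up rows) showing = and >= on a DATE column; the 'YYYY-MM-DD' string literal is compared as a date.

```sql
CREATE TABLE demo_payments (
  customerNumber INT,
  paymentDate    DATE,
  amount         DECIMAL(10,2)
);
INSERT INTO demo_payments VALUES
  (101, '2004-06-20', 500.00),
  (102, '2004-06-21', 1200.00),
  (103, '2004-06-22', 800.00);

-- Exactly on the campaign day: customer 102
SELECT customerNumber FROM demo_payments WHERE paymentDate = '2004-06-21';

-- On or after the campaign day: customers 102 and 103
SELECT customerNumber FROM demo_payments WHERE paymentDate >= '2004-06-21';
```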
Compound conditions: and / or / not

• AND: all the simple conditions must be true for the compound condition to be true.
• OR: the compound condition is true whenever any of the simple conditions is true.
• NOT: the compound condition is true if the simple condition is false.
Question (1)
• The marketing group of your company found that
customers who are living in Paris with a credit
limit over 50,000 are most profitable. They ask
you to provide contact details of those customers.

13.09.2023
54
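SELECT contactLastName, contactFirstName, phone, city, addressLine1
from customers where city = Paris and creditLimit > 50000
This query fails: without quotes, MySQL reads Paris as a column name (“Unknown column 'Paris' in 'where clause'”). Single quotes are needed: city = 'Paris'.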
Tips: quick way of counting rows

How many customers have a credit limit over 5000?
SELECT * from customers where creditLimit > 5000
• The number of matching rows appears as “Found rows” in the query log:
/* Affected rows: 0 Found rows: 85 Warnings: 0 Duration for 1 query: 0,000 sec. */
• This approach can also be used for big tables.
Question (2)
• The marketing group also found that customers living in Madrid with a creditLimit over 10,000 are very profitable, in addition to those living in Paris with a creditLimit over 5000.
• Can you retrieve their contact details with one query?
• Make proper use of parentheses ():

Select contactLastName, contactFirstName, phone, city
from customers
where (city = 'Paris' and CreditLimit > 5000)
or (city = 'Madrid' and CreditLimit > 10000)
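The parentheses matter because AND is evaluated before OR. A quick check with constant values (1 = true, 0 = false), no table needed:

```sql
SELECT 1 OR 1 AND 0   AS default_grouping,  -- read as 1 OR (1 AND 0), giving 1
       (1 OR 1) AND 0 AS forced_grouping;   -- giving 0
```

Because AND already binds more tightly than OR, the parentheses in the query above make the intended grouping explicit; they become essential whenever an OR has to be evaluated first.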
Example for NOT
Select contactLastName, contactFirstName, phone, city
from customers
where NOT
(city = 'Paris' and CreditLimit > 5000)
or (city = 'Madrid' and CreditLimit > 10000)

• What does the above command mean?


What does the above query mean?
Select contactLastName, contactFirstName, phone, city
from customers
where NOT
(city = 'Paris' and CreditLimit > 5000)
or (city = 'Madrid' and CreditLimit > 10000)
Question: Is the above command equal to:
Select contactLastName, contactFirstName, phone, city
from customers
where NOT (city = 'Paris' and CreditLimit > 5000)
http://presemo.aalto.fi/drm
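A hint for the question: NOT binds more tightly than AND and OR, so in the first query NOT applies only to the first parenthesized condition. The precedence can be checked with constants:

```sql
SELECT NOT 1 OR 1   AS not_on_first_only,  -- read as (NOT 1) OR 1, giving 1
       NOT (1 OR 1) AS not_on_everything;  -- giving 0
```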
Between…and…
• In the table customers, please retrieve the records of customers whose creditLimit is i) greater than or equal to 60,000 and ii) less than or equal to 70,000.

Alternative 1:
Select * from customers
where creditLimit >= 60000 and creditLimit <= 70000
Alternative 2:
Select * from customers
where creditLimit between 60000 and 70000
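BETWEEN includes both end points, which is why the two alternatives are equivalent. A hypothetical check (table `demo_credit`, made-up limits) in which both queries return customers 2, 3 and 4:

```sql
CREATE TABLE demo_credit (customerNumber INT, creditLimit DECIMAL(10,2));
INSERT INTO demo_credit VALUES
  (1, 59999.99), (2, 60000), (3, 65000), (4, 70000), (5, 70000.01);

-- Both forms include the end points 60000 and 70000
SELECT customerNumber FROM demo_credit
WHERE creditLimit >= 60000 AND creditLimit <= 70000;

SELECT customerNumber FROM demo_credit
WHERE creditLimit BETWEEN 60000 AND 70000;
```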
Challenge!
• Please retrieve the records from table “orders” in which status is not “Shipped”, orderDate is between '2005-05-09' and '2005-05-31', and requiredDate is between '2005-06-01' and '2005-06-10', for customers whose customerNumber is 124 or 119.

Table orders
Reflect and Question
SELECT *
FROM orders
WHERE `status` != 'Shipped' AND
orderDate BETWEEN '2005-05-09' AND '2005-05-31' AND
requiredDate BETWEEN '2005-06-01' AND '2005-06-10' AND
(customerNumber = 124 or 119)

What will happen if “customerNumber =” is left out in front of 119, as in the query above?
Answer
SELECT *
FROM orders
WHERE status != 'Shipped' AND
orderDate BETWEEN '2005-05-09' AND '2005-05-31' AND
requiredDate BETWEEN '2005-06-01' AND '2005-06-10' AND
(customerNumber = 124 or customerNumber = 119)

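Why the shortened condition misbehaves: = binds more tightly than OR, and a non-zero number on its own counts as true in MySQL, so `customerNumber = 124 or 119` is true for every row. Checked with constants:

```sql
SELECT 5 = 124 OR 119     AS shortened_condition,  -- (5 = 124) OR 119, giving 1
       5 = 124 OR 5 = 119 AS full_condition;       -- giving 0
```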
Select …. limit
• Limit restricts your MySQL query results to a specified range of rows.
• select * from products Limit 0,10;
# Retrieves the first ten rows (same as Limit 10)
• select * from products Limit 5,10;
# Skips five rows, then retrieves rows 6-15
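A hypothetical sketch (table `demo_items` with ids 1 to 20) of the two LIMIT forms. ORDER BY is added because without it MySQL does not guarantee which rows come first; LIMIT 10 OFFSET 5 is the equivalent long form of LIMIT 5, 10.

```sql
CREATE TABLE demo_items (id INT NOT NULL PRIMARY KEY);
INSERT INTO demo_items VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),
                              (11),(12),(13),(14),(15),(16),(17),(18),(19),(20);

SELECT id FROM demo_items ORDER BY id LIMIT 0, 10;       -- ids 1 to 10
SELECT id FROM demo_items ORDER BY id LIMIT 5, 10;       -- ids 6 to 15
SELECT id FROM demo_items ORDER BY id LIMIT 10 OFFSET 5; -- same rows as LIMIT 5, 10
```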
Select …. limit
Limit function:
• Enable a quick check of the validity of the
result.
• Save lots of time when a huge number of
records will be returned [a big-data issue].

Using computed columns (1)
• Your boss asks you to provide information
on the values of different products in stock.
• Value = quantityInStock * buyPrice

Using computed columns (2)
Select productCode, quantityInStock*buyPrice
from products

Arithmetic operators: +  -  *  /  %
select 5/2 → 2,5000
select 5%2 → 1
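The arithmetic operators with constants, no table needed. Note that / returns a decimal result in MySQL even for two integers, and % returns the remainder:

```sql
SELECT 7 + 2 AS addition,        -- 9
       7 - 2 AS subtraction,     -- 5
       7 * 2 AS multiplication,  -- 14
       5 / 2 AS division,        -- 2.5000
       5 % 2 AS remainder;       -- 1
```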
Using As for Aliases
• SELECT column_name AS alias_name
FROM table_name;
Select productCode,
quantityInStock*buyPrice as productValue
from products

Make the result more readable
• Select productCode,
round(quantityInStock*buyPrice, 1) as productValue
from products

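A hypothetical sketch (table `demo_products`, made-up stock) combining a computed column, AS aliases and ROUND. For product P1 the value is 3 * 10.25 = 30.75, which ROUND(..., 1) turns into 30.8.

```sql
CREATE TABLE demo_products (
  productCode     VARCHAR(15) NOT NULL PRIMARY KEY,
  quantityInStock INT NOT NULL,
  buyPrice        DECIMAL(10,2) NOT NULL
);
INSERT INTO demo_products VALUES ('P1', 3, 10.25), ('P2', 7, 4.99);

SELECT productCode,
       quantityInStock * buyPrice           AS productValue,     -- P1 gives 30.75
       ROUND(quantityInStock * buyPrice, 1) AS productValue_1dp  -- P1 gives 30.8
FROM demo_products;
```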
Save results as a new table
Duplicate tables
• create table new_table as select * from old_table;

• create table temp as (
select productCode,
round(quantityInStock*buyPrice, 2)
as productValue
from products);
Select: sorting rows
• Your boss is asking you to provide
information on the top 3 most valuable
products in stock.

Order by… asc/desc
• Problem of unsorted results: without order by, the rows come back in no guaranteed order
• Order by column asc/desc
- Asc: ascending (default option)
- Desc: descending
Example
• select productCode,
quantityInStock*buyPrice as productValue
from products
order by productValue desc
limit 3

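A hypothetical check of the top-3 query (table `demo_stock`, made-up values). ORDER BY may refer to the alias `productValue` defined in the select list; here the query returns B, C and A.

```sql
CREATE TABLE demo_stock (
  productCode     VARCHAR(15) NOT NULL PRIMARY KEY,
  quantityInStock INT NOT NULL,
  buyPrice        DECIMAL(10,2) NOT NULL
);
INSERT INTO demo_stock VALUES
  ('A', 10,  5.00),   -- value 50.00
  ('B',  2, 90.00),   -- value 180.00
  ('C',  4, 30.00),   -- value 120.00
  ('D',  1, 20.00);   -- value 20.00

-- Largest value first, keep only three rows
SELECT productCode, quantityInStock * buyPrice AS productValue
FROM demo_stock
ORDER BY productValue DESC
LIMIT 3;
```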
More complex sort
• Sort several columns at the same time:
Order by column 1, column 2…
(column 1 is the primary sort key, column 2 the secondary sort key)
• If you want to change the direction of sorting among columns:
Order by column 1 desc, column 2 asc
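A hypothetical sketch (table `demo_pay`, made-up amounts) of a two-key sort: rows are ordered by customerNumber first, and amount only decides the order among rows with the same customerNumber.

```sql
CREATE TABLE demo_pay (customerNumber INT, amount DECIMAL(10,2));
INSERT INTO demo_pay VALUES (112, 14000), (103, 12000), (112, 33000), (103, 11000);

-- Primary sort key ascending, secondary sort key descending
SELECT customerNumber, amount
FROM demo_pay
ORDER BY customerNumber ASC, amount DESC;
-- 103 12000.00, 103 11000.00, 112 33000.00, 112 14000.00
```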
Question
• Your boss wants to identify
‘big’ payments to the
company in 2004
• Amounts of ‘big’ payments
should be over 10000.
• Please exhibit results in
both an ascending order
for customerNumber and a
descending order for
amount
Table ‘payments’
Answer
Select * from payments
where amount> 10000 and
paymentDate between '2004-01-01'
and '2004-12-31'
order by customerNumber asc,
amount desc

Instructions on Hands-on session 3
• The “chile” data and the “classicmodels” database will be used for hands-on session 3. The tables belonging to the “classicmodels” database should already have been imported in the first hands-on session by importing the “classicmodels (using PCs in the lab or at home).sql” file.
• The Chile data has 2400 rows and 8 columns. It was derived from a national survey conducted in April and May of 1988 by FLACSO/Chile. Missing data have been removed.
• Please read the description of the dataset (Description of Chile election 1988.docx), downloadable from MyCourse [Data and database files folder].
