SP203 Week3

The document discusses SQL joins, including inner, outer, left and right joins. It provides examples of joining multiple tables and using aliases. It also covers union operators and self joins.
Copyright © All Rights Reserved
Week 3

Intermediate SQL II: Joins


SQL for Business Users
Module Description
The real power of SQL comes from working with data
from multiple tables. The term “relational database” refers
to the fact that the tables within it “relate” to one another
and contain common identifiers that allow information
from multiple tables to be combined easily.
Module Objective
At the end of this week, learners will be able to:

(1) Understand the basic concepts of joining multiple tables
(2) Understand and apply SQL joins such as INNER, OUTER, LEFT/RIGHT, etc.
(3) Get comfortable describing data from multiple tables using SQL joins

Practice Dataset

For this part of the course, we’ll be working with multiple tables collected from ESPN:
- data on College Football Players
  - includes data on players (height, weight, school, etc.)
- data on College Football Teams
  - includes data on teams (school, conference, division, etc.)

Note: To familiarize yourself, spend a few minutes performing basic SQL functions and aggregations on these new tables before proceeding.

Joins
• Say we want to know, from our two tables, which conference has the highest average player weight. For that, we use a join.

• Let’s unpack the new commands used in this join query.
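The slide's query is not reproduced in this copy. Below is a minimal sketch of the same idea, run on an in-memory SQLite database through Python's `sqlite3` module. The table names `football_players` and `football_teams`, and every row in them, are hypothetical stand-ins for the ESPN tables. Only the columns this query needs are included.

```python
import sqlite3

# In-memory database with hypothetical stand-in tables and made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE football_players (player_name TEXT, school_name TEXT, weight REAL);
    CREATE TABLE football_teams   (school_name TEXT, conference TEXT);

    INSERT INTO football_players VALUES
        ('Player A', 'Wake Forest', 190),
        ('Player B', 'Wake Forest', 250),
        ('Player C', 'Alabama',     300),
        ('Player D', 'Alabama',     260);
    INSERT INTO football_teams VALUES
        ('Wake Forest', 'ACC'),
        ('Alabama',     'SEC');
""")

# "players" and "teams" are table aliases; "conference" and "average_weight"
# are column aliases. The ON clause maps the two tables through school_name.
query = """
    SELECT teams.conference    AS conference,
           AVG(players.weight) AS average_weight
      FROM football_players players
      JOIN football_teams teams
        ON teams.school_name = players.school_name
     GROUP BY teams.conference
     ORDER BY AVG(players.weight) DESC
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('SEC', 280.0), ('ACC', 220.0)]
```

Each player row picks up its team's conference through the ON condition. GROUP BY then averages the weights within each conference.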

Aliases in Joins
• Giving tables with long names aliases (new names) makes SQL joins easier to write.

• In this example, players replaced benn.college_football_player

• As an exercise, identify the alias of the other table.

Aliases in Joins
• Giving columns with long names aliases (new names) makes SQL SELECTs easier to write and read.

• In this example, conference replaced teams.conference.

• As an exercise, identify the alias that refers to the average player weight.

Aliases in Joins

• After the FROM clause comes JOIN, followed by the name of the second table and then ON.

• ON indicates how the two tables relate to each other; in this case, through school_name. This relationship is called a “mapping”.
• The two columns teams.school_name and players.school_name map to one another. They are called “foreign keys” or “join keys”.

• Their mapping is written as a conditional statement: teams.school_name = players.school_name

Aliases in Joins

In plain English, it reads:

“Join all rows from the players table on to rows in the teams table
for which school_name field in the players table is equal to the
school_name field in the teams table.”

What does a join actually do?


• This is a row in the players table for Wake Forest wide receiver Michael Campanaro.

• During the join, SQL looks up the school_name, in this case Wake Forest, in the school_name field of the teams table. If a match exists, SQL takes all the columns from the teams table and joins them to the other columns of the players table.

• The end result is a new table with all the columns from both tables. The full result should contain 15 columns in total.

What does a join actually do?


• To see the full table returned by the join, run the join query with SELECT *.

• Note that SELECT * returns all of the columns from both tables, not just from the table after FROM. If you want to return only the columns from one specific table, you can instead write SELECT players.*
• As an exercise, write the SQL query that returns only the columns from the teams table.
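The sketch below compares `SELECT *` with `SELECT players.*` by counting the result columns each one returns. It uses Python's `sqlite3` module. The tables `football_players` and `football_teams` and their rows are hypothetical and much smaller than the real ESPN tables, so the counts are 5 and 3 rather than 15. The helper `result_columns` is likewise made up for this example.

```python
import sqlite3

# Hypothetical stand-in tables with made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE football_players (player_name TEXT, school_name TEXT, weight REAL);
    CREATE TABLE football_teams   (school_name TEXT, conference TEXT);
    INSERT INTO football_players VALUES ('Player A', 'Wake Forest', 190);
    INSERT INTO football_teams   VALUES ('Wake Forest', 'ACC');
""")

JOIN_CLAUSE = """
      FROM football_players players
      JOIN football_teams teams
        ON teams.school_name = players.school_name
"""

def result_columns(select_list):
    """Run the join with the given SELECT list and return its result column names."""
    cursor = conn.execute(f"SELECT {select_list} {JOIN_CLAUSE}")
    return [description[0] for description in cursor.description]

all_columns = result_columns("*")             # columns from both tables
player_columns = result_columns("players.*")  # columns from players only
print(len(all_columns), len(player_columns))  # 5 3
```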

Visual guide to how joins work

by Patric Spathon
INNER JOIN
• It’s often the case that one or both tables being joined contain rows that don’t have matches in the other table.
• To handle this, we can choose between an INNER JOIN and an OUTER JOIN.
• INNER JOIN eliminates the rows from both tables that do not satisfy the join condition in the ON statement.

INNER JOIN

• In mathematical terms, an inner join is the intersection of the two tables.
• In our example, if a player goes to a school that isn’t in the teams table, that player’s data is removed from the result of the inner join. Likewise, schools in the teams table that have no matching players are removed.
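Here is a minimal sketch of that behavior, run through Python's `sqlite3` module. The `players` and `teams` tables and all of their rows are hypothetical. "Player C" attends a school that is missing from `teams`, and "Empty State" is a team with no players, so neither one survives the inner join.

```python
import sqlite3

# Hypothetical data: "Player C" attends a school missing from the teams table,
# and "Empty State" is a team with no players.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE players (player_name TEXT, school_name TEXT);
    CREATE TABLE teams   (school_name TEXT, conference TEXT);
    INSERT INTO players VALUES
        ('Player A', 'Wake Forest'),
        ('Player B', 'Alabama'),
        ('Player C', 'Nowhere Tech');
    INSERT INTO teams VALUES
        ('Wake Forest', 'ACC'),
        ('Alabama',     'SEC'),
        ('Empty State', 'Other');
""")

# INNER JOIN (a plain JOIN means the same thing) keeps only rows that satisfy ON.
inner_rows = conn.execute("""
    SELECT players.player_name, teams.school_name
      FROM players
     INNER JOIN teams
        ON teams.school_name = players.school_name
     ORDER BY players.player_name
""").fetchall()
print(inner_rows)  # [('Player A', 'Wake Forest'), ('Player B', 'Alabama')]
```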

INNER JOIN
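• In joining two tables, we might encounter cases where both tables have columns with identical names. Try selecting players.school_name and teams.school_name in the same query to see what happens.

• Depending on the tool, the result may display only one column with a given name, so identically named columns from the two tables become ambiguous.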

INNER JOIN
• We can avoid errors when joining tables that share column names by aliasing (renaming) the columns individually, for example players.school_name AS players_school_name. Compare the result with the previous slide’s example.
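The sketch below compares the result column names with and without column aliases. It uses Python's `sqlite3` module, and the tables and rows are hypothetical. One caveat: SQLite does not guarantee result column names unless you use AS, so the first list is printed but not relied on. The aliased names, by contrast, are guaranteed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE players (player_name TEXT, school_name TEXT);
    CREATE TABLE teams   (school_name TEXT, conference TEXT);
    INSERT INTO players VALUES ('Player A', 'Wake Forest');
    INSERT INTO teams   VALUES ('Wake Forest', 'ACC');
""")

def column_names(sql):
    """Return the result column names reported for a query."""
    return [description[0] for description in conn.execute(sql).description]

# Without AS, SQLite leaves the result column names unspecified; in practice
# both columns are typically reported as "school_name".
unaliased = column_names("""
    SELECT players.school_name, teams.school_name
      FROM players
      JOIN teams ON teams.school_name = players.school_name
""")

# With AS, each column gets its own unambiguous name.
aliased = column_names("""
    SELECT players.school_name AS players_school_name,
           teams.school_name   AS teams_school_name
      FROM players
      JOIN teams ON teams.school_name = players.school_name
""")
print(unaliased)
print(aliased)  # ['players_school_name', 'teams_school_name']
```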
OUTER JOIN
• LEFT JOIN: returns matched rows plus the unmatched rows from the left table only
• RIGHT JOIN: returns matched rows plus the unmatched rows from the right table only
• FULL OUTER JOIN: returns matched rows plus the unmatched rows from both tables (rarely used)
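Not every database supports FULL OUTER JOIN. MySQL, for example, does not, and SQLite only added it in version 3.39.0. A common portable workaround is to take a LEFT JOIN and stack the other table's unmatched rows under it with UNION ALL. The sketch below does that through Python's `sqlite3` module. The tables and rows are hypothetical.

```python
import sqlite3

# Hypothetical data: one matched pair, one unmatched player, one unmatched team.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE players (player_name TEXT, school_name TEXT);
    CREATE TABLE teams   (school_name TEXT, conference TEXT);
    INSERT INTO players VALUES ('Player A', 'Wake Forest'), ('Player B', 'Nowhere Tech');
    INSERT INTO teams   VALUES ('Wake Forest', 'ACC'), ('Empty State', 'Other');
""")

full_outer_rows = conn.execute("""
    -- Matched rows plus unmatched players (the LEFT JOIN part)
    SELECT players.player_name, teams.school_name
      FROM players
      LEFT JOIN teams ON teams.school_name = players.school_name
    UNION ALL
    -- Unmatched teams only: teams whose LEFT JOIN found no player
    SELECT players.player_name, teams.school_name
      FROM teams
      LEFT JOIN players ON players.school_name = teams.school_name
     WHERE players.school_name IS NULL
""").fetchall()
print(full_outer_rows)  # row order is not guaranteed without ORDER BY
```

UNION ALL is used rather than UNION because the two halves never overlap. It also keeps any genuine duplicate rows.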

Practice Dataset

For this part of the course, we’ll be working with multiple tables from the Crunchbase dataset, a crowdsourced index of startups, founders, and investors.

It essentially consists of two tables, one of companies and one of acquisitions, which can describe either the acquiring companies or the companies that got acquired.

Practice Dataset
• The first table lists a large portion of the companies in the database, one row = one company. The rest of the columns are descriptive.

• The second table lists acquisitions, one row = one acquisition.
• Mapping #1: permalink from the first table maps to company_permalink in the second.

• Mapping #2: acquirer_permalink from the second table also maps to permalink in the first table.

• Which foreign key you join on depends entirely on whether you want to add information about the acquiring company or about the company that was acquired.

Note: To familiarize yourself, spend a few minutes performing basic SQL functions and aggregations on these new tables before proceeding. You may also try joining the tables using the mappings described above.
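The sketch below shows how the choice of mapping changes what a join returns. It runs through Python's `sqlite3` module. The join columns (`permalink`, `company_permalink`, `acquirer_permalink`) come from the slide. The `name` column, the table names `companies` and `acquisitions`, and every row are hypothetical.

```python
import sqlite3

# Hypothetical stand-ins for the Crunchbase companies and acquisitions tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE companies    (permalink TEXT, name TEXT);
    CREATE TABLE acquisitions (company_permalink TEXT, acquirer_permalink TEXT);
    INSERT INTO companies VALUES
        ('/company/acme', 'Acme'), ('/company/beta', 'Beta'), ('/company/gamma', 'Gamma');
    -- Made-up acquisitions: Acme acquired Beta and Gamma
    INSERT INTO acquisitions VALUES
        ('/company/beta',  '/company/acme'),
        ('/company/gamma', '/company/acme');
""")

# Mapping #1: add details about the company that was acquired.
acquired = conn.execute("""
    SELECT acquisitions.company_permalink, companies.name AS acquired_name
      FROM acquisitions
      JOIN companies ON companies.permalink = acquisitions.company_permalink
     ORDER BY acquisitions.company_permalink
""").fetchall()

# Mapping #2: add details about the acquiring company instead.
acquirers = conn.execute("""
    SELECT acquisitions.company_permalink, companies.name AS acquirer_name
      FROM acquisitions
      JOIN companies ON companies.permalink = acquisitions.acquirer_permalink
     ORDER BY acquisitions.company_permalink
""").fetchall()

print(acquired)   # [('/company/beta', 'Beta'), ('/company/gamma', 'Gamma')]
print(acquirers)  # [('/company/beta', 'Acme'), ('/company/gamma', 'Acme')]
```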

LEFT JOIN
• Tells the database to return all rows
in the table in the FROM clause,
regardless of whether or not they
have matches in the LEFT JOIN
clause.
LEFT JOIN
• Left joins are written just like inner joins, with LEFT JOIN in place of JOIN.

• As an exercise, perform the same query but use INNER instead of LEFT and compare the results.
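For comparison, here is a small sketch of that exercise, run through Python's `sqlite3` module. The tables and rows are hypothetical: three companies, of which only Beta has been acquired. The helper `join_rows` is made up for the example and only ever receives the fixed strings "LEFT" or "INNER".

```python
import sqlite3

# Hypothetical stand-ins with made-up rows: only Beta has been acquired.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE companies    (permalink TEXT);
    CREATE TABLE acquisitions (company_permalink TEXT, acquirer_permalink TEXT);
    INSERT INTO companies VALUES ('/company/acme'), ('/company/beta'), ('/company/gamma');
    INSERT INTO acquisitions VALUES ('/company/beta', '/company/acme');
""")

def join_rows(join_type):
    """Run the same join with join_type ('LEFT' or 'INNER'; fixed strings only)."""
    return conn.execute(f"""
        SELECT companies.permalink, acquisitions.acquirer_permalink
          FROM companies
          {join_type} JOIN acquisitions
            ON companies.permalink = acquisitions.company_permalink
         ORDER BY companies.permalink
    """).fetchall()

left_rows = join_rows("LEFT")    # every company; None (NULL) where nothing matched
inner_rows = join_rows("INNER")  # only companies with a matching acquisition
print(left_rows)   # [('/company/acme', None), ('/company/beta', '/company/acme'), ('/company/gamma', None)]
print(inner_rows)  # [('/company/beta', '/company/acme')]
```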

RIGHT JOIN
• Rarely used
• You can achieve the same result using a LEFT JOIN and switching the tables
• Tells the database to return all rows in the table in the RIGHT JOIN clause, regardless of whether or not they have matches in the FROM clause.

RIGHT JOIN
• As an exercise, perform both queries. See any differences?
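A minimal sketch of the two queries, run through Python's `sqlite3` module on hypothetical tables and rows. SQLite only supports RIGHT JOIN from version 3.39.0, so the RIGHT JOIN version runs only when the installed library is new enough. When it does run, it returns the same rows as the LEFT JOIN with the tables switched.

```python
import sqlite3

# Hypothetical stand-ins with made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE companies    (permalink TEXT);
    CREATE TABLE acquisitions (company_permalink TEXT, acquirer_permalink TEXT);
    INSERT INTO companies VALUES ('/company/acme'), ('/company/beta'), ('/company/gamma');
    INSERT INTO acquisitions VALUES ('/company/beta', '/company/acme');
""")

# LEFT JOIN with companies in the FROM clause: keeps every company.
left_version = conn.execute("""
    SELECT companies.permalink, acquisitions.company_permalink
      FROM companies
      LEFT JOIN acquisitions
        ON companies.permalink = acquisitions.company_permalink
     ORDER BY companies.permalink
""").fetchall()

# Same result via RIGHT JOIN with the tables switched (needs SQLite >= 3.39.0).
right_version = None
if sqlite3.sqlite_version_info >= (3, 39, 0):
    right_version = conn.execute("""
        SELECT companies.permalink, acquisitions.company_permalink
          FROM acquisitions
         RIGHT JOIN companies
            ON companies.permalink = acquisitions.company_permalink
         ORDER BY companies.permalink
    """).fetchall()

print(left_version)
print(right_version)  # None if this SQLite build predates RIGHT JOIN support
```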
SQL JOINS using WHERE or ON
• It’s possible to filter one or both tables before joining them.
• In this example, everything in the acquisitions table was joined EXCEPT for the row excluded by the condition != '/company/1000memories'.

SQL JOINS using WHERE or ON


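• In the conditional statement after the LEFT JOIN (AND acquisitions.company_permalink != '/company/1000memories'), the filter is applied to only one table (acquisitions) before the join, without removing any rows from the other. Placing the same condition in a WHERE clause instead filters the joined result after the join.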
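The sketch below contrasts the two placements using Python's `sqlite3` module. Apart from '/company/1000memories', the permalinks and rows are hypothetical. '/company/solo' has no acquisition at all, which exposes a side effect of WHERE: comparing NULL with `!=` is never true, so a WHERE filter on the right table also drops every unmatched row of a LEFT JOIN.

```python
import sqlite3

# Hypothetical rows; '/company/solo' has no acquisition at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE companies    (permalink TEXT);
    CREATE TABLE acquisitions (company_permalink TEXT, acquirer_permalink TEXT);
    INSERT INTO companies VALUES
        ('/company/1000memories'), ('/company/example'), ('/company/solo');
    INSERT INTO acquisitions VALUES
        ('/company/1000memories', '/company/buyer-x'),
        ('/company/example',      '/company/buyer-y');
""")

# Filter in ON: restricts which acquisitions can match, so every company stays.
filter_in_on = conn.execute("""
    SELECT companies.permalink, acquisitions.acquirer_permalink
      FROM companies
      LEFT JOIN acquisitions
        ON companies.permalink = acquisitions.company_permalink
       AND acquisitions.company_permalink != '/company/1000memories'
     ORDER BY companies.permalink
""").fetchall()

# Filter in WHERE: applied to the joined result. Rows whose acquisitions
# columns are NULL also fail the test, so '/company/solo' disappears too.
filter_in_where = conn.execute("""
    SELECT companies.permalink, acquisitions.acquirer_permalink
      FROM companies
      LEFT JOIN acquisitions
        ON companies.permalink = acquisitions.company_permalink
     WHERE acquisitions.company_permalink != '/company/1000memories'
     ORDER BY companies.permalink
""").fetchall()

print(filter_in_on)     # [('/company/1000memories', None), ('/company/example', '/company/buyer-y'), ('/company/solo', None)]
print(filter_in_where)  # [('/company/example', '/company/buyer-y')]
```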
UNION Operator
• JOINs combine datasets side by side
• UNION stacks datasets one on top of the other
• UNION appends only distinct rows (duplicates are removed)
• UNION ALL appends all rows from both tables, including duplicates.

UNION Operator
• Both SELECT statements must return the same number of columns, and corresponding columns must have compatible data types, for the UNION to succeed.
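A small sketch of UNION, UNION ALL, and the column-count rule, run through Python's `sqlite3` module. The tables `investments_part1` and `investments_part2` and their rows are hypothetical, with 'Beta' appearing in both. SQLite is lenient about data types but always rejects a UNION whose two sides return different numbers of columns.

```python
import sqlite3

# Hypothetical tables with identical columns; 'Beta' appears in both.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE investments_part1 (company_name TEXT, investor_name TEXT);
    CREATE TABLE investments_part2 (company_name TEXT, investor_name TEXT);
    INSERT INTO investments_part1 VALUES ('Acme', 'Fund 1'), ('Beta', 'Fund 2');
    INSERT INTO investments_part2 VALUES ('Beta', 'Fund 3'), ('Gamma', 'Fund 4');
""")

# UNION removes duplicate rows from the stacked result.
union_rows = conn.execute("""
    SELECT company_name FROM investments_part1
    UNION
    SELECT company_name FROM investments_part2
    ORDER BY company_name
""").fetchall()

# UNION ALL keeps every row, duplicates included.
union_all_rows = conn.execute("""
    SELECT company_name FROM investments_part1
    UNION ALL
    SELECT company_name FROM investments_part2
    ORDER BY company_name
""").fetchall()

# Mismatched column counts (2 vs 1) are rejected when the query is prepared.
try:
    conn.execute("""
        SELECT company_name, investor_name FROM investments_part1
        UNION
        SELECT company_name FROM investments_part2
    """)
    mismatch_rejected = False
except sqlite3.Error:
    mismatch_rejected = True

print(union_rows)         # [('Acme',), ('Beta',), ('Gamma',)]
print(union_all_rows)     # [('Acme',), ('Beta',), ('Beta',), ('Gamma',)]
print(mismatch_rejected)  # True
```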
Joins with Comparison Operators
• Instead of using = for all joins, we can also use comparison operators (>, <, >=, <=, etc.) in the join condition, where they work like any other conditional statement.
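As an illustration, the sketch below uses `>` inside a LEFT JOIN condition to count only the investments made more than five years after each company was founded. It runs through Python's `sqlite3` module. The columns `founded_year`, `funded_year` and `investor_permalink`, the five-year threshold, and all rows are hypothetical.

```python
import sqlite3

# Hypothetical rows: count investments made more than 5 years after founding.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE companies   (permalink TEXT, founded_year INTEGER);
    CREATE TABLE investments (company_permalink TEXT, investor_permalink TEXT, funded_year INTEGER);
    INSERT INTO companies VALUES ('/company/acme', 2000), ('/company/beta', 2010);
    INSERT INTO investments VALUES
        ('/company/acme', '/investor/1', 2003),
        ('/company/acme', '/investor/2', 2008),
        ('/company/beta', '/investor/3', 2012);
""")

rows = conn.execute("""
    SELECT companies.permalink,
           COUNT(investments.investor_permalink) AS late_investments
      FROM companies
      LEFT JOIN investments
        ON companies.permalink = investments.company_permalink
       AND investments.funded_year > companies.founded_year + 5  -- comparison in the join condition
     GROUP BY companies.permalink
     ORDER BY companies.permalink
""").fetchall()
print(rows)  # [('/company/acme', 1), ('/company/beta', 0)]
```

Because the comparison sits in ON rather than WHERE, Beta still appears with a count of 0 instead of being dropped.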
Joining on multiple keys
• We can also join on multiple foreign keys. Even when the extra key doesn’t change the result, it can speed up the query if the database can use an index on those columns; on small datasets the effect is usually negligible.
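A minimal sketch of a two-key join, run through Python's `sqlite3` module. The columns `name` and `company_name`, the composite index, and all rows are hypothetical. The example only checks that the second key leaves the result unchanged. It does not measure speed, because whether the query planner uses the index depends on the database and its data.

```python
import sqlite3

# Hypothetical tables where both permalink and name identify the company.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE companies   (permalink TEXT, name TEXT);
    CREATE TABLE investments (company_permalink TEXT, company_name TEXT, investor_permalink TEXT);
    INSERT INTO companies VALUES ('/company/acme', 'Acme'), ('/company/beta', 'Beta');
    INSERT INTO investments VALUES
        ('/company/acme', 'Acme', '/investor/1'),
        ('/company/beta', 'Beta', '/investor/2');
    -- Composite index covering both join keys (used or not at the planner's discretion).
    CREATE INDEX idx_investments_company
        ON investments (company_permalink, company_name);
""")

one_key = conn.execute("""
    SELECT companies.permalink, investments.investor_permalink
      FROM companies
      JOIN investments
        ON companies.permalink = investments.company_permalink
     ORDER BY 1
""").fetchall()

two_keys = conn.execute("""
    SELECT companies.permalink, investments.investor_permalink
      FROM companies
      JOIN investments
        ON companies.permalink = investments.company_permalink
       AND companies.name = investments.company_name
     ORDER BY 1
""").fetchall()

print(one_key == two_keys)  # True: the extra key does not change the result here
```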
Self Joins
• It is sometimes useful to join a table to itself. For example, in the Crunchbase dataset, say you want to identify companies that received an investment from Great Britain following an investment from Japan. To do this, we join the investments table to itself, giving each copy its own alias.
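A minimal sketch of such a self join, run through Python's `sqlite3` module. The single `investments` table, its columns (`company_name`, `investor_country_code`, `funded_at`), the country codes 'JPN' and 'GBR', and every row are hypothetical. The dates are ISO-8601 strings, so comparing them as text also compares them chronologically.

```python
import sqlite3

# Hypothetical investments table with made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE investments (company_name TEXT, investor_country_code TEXT, funded_at TEXT);
    INSERT INTO investments VALUES
        ('Acme',  'JPN', '2010-01-01'), ('Acme',  'GBR', '2011-06-01'),
        ('Beta',  'GBR', '2009-01-01'), ('Beta',  'JPN', '2010-01-01'),
        ('Gamma', 'JPN', '2012-01-01'),
        ('Delta', 'JPN', '2010-01-01'), ('Delta', 'GBR', '2012-01-01'),
        ('Delta', 'GBR', '2013-01-01');
""")

# The same table appears twice under two aliases, "japan" and "gb".
# DISTINCT collapses companies with several qualifying GB investments (Delta).
rows = conn.execute("""
    SELECT DISTINCT japan.company_name
      FROM investments japan
      JOIN investments gb
        ON japan.company_name = gb.company_name
       AND gb.investor_country_code = 'GBR'
       AND gb.funded_at > japan.funded_at
     WHERE japan.investor_country_code = 'JPN'
     ORDER BY japan.company_name
""").fetchall()
print(rows)  # [('Acme',), ('Delta',)]
```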
Quiz
• Write a query that shows a company’s name, “status” (found in the Companies table), and the number of unique investors in that company. Order by the number of investors, from most to fewest. Limit the results to companies in the state of New York.
• Write a query that shows 3 columns. The first indicates which dataset (part 1 or 2) the data comes from, the second shows company status, and the third is a count of the number of investors. Hint: Study and use tutorial.crunchbase_investments_part1 and tutorial.crunchbase_investments_part2. You will also use the table tutorial.crunchbase_companies. Lastly, you will have to group by status and dataset.
