SP203 Week3
Practice Dataset
Joins
• From our 2 tables, say we want to know which conference has the
highest average player weight. We can answer this with a join
• Let’s unpack all the new commands in the above SQL code
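As a rough sketch of the kind of query being discussed, the example below runs a join through Python's built-in sqlite3 module. The miniature players and teams tables, their columns other than school_name, and every row in them are invented for illustration; the real practice dataset is larger and has more columns.

```python
import sqlite3

# Hypothetical miniature versions of the players and teams tables.
# All rows and weights are invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE teams (school_name TEXT, conference TEXT);
    CREATE TABLE players (player_name TEXT, school_name TEXT, weight REAL);
    INSERT INTO teams VALUES ('Wake Forest', 'ACC'), ('Alabama', 'SEC');
    INSERT INTO players VALUES
        ('Player A', 'Wake Forest', 200),
        ('Player B', 'Wake Forest', 220),
        ('Player C', 'Alabama', 250);
""")

# Join players to teams on school_name, then average weight per conference.
avg_weight_by_conference = conn.execute("""
    SELECT teams.conference AS conference,
           AVG(players.weight) AS average_weight
      FROM players
      JOIN teams
        ON teams.school_name = players.school_name
     GROUP BY teams.conference
     ORDER BY average_weight DESC
""").fetchall()

print(avg_weight_by_conference)
```

The first row of the result is the conference with the highest average weight in this toy data.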
Aliases in Joins
• Giving tables with long names aliases (new names) makes SQL joins easier
to write and read
Aliases in Joins
• Giving columns with long names aliases (new names) makes SQL SELECT
statements easier to write and read
Aliases in Joins
• After the FROM clause, we have JOIN, followed by the name of the second table, and then ON
• ON indicates how the two tables relate to each other, in this case through school_name. This relationship is
called “mapping”.
• The two columns teams.school_name and players.school_name map to one another. They are called
“foreign keys” or “join keys”
Aliases in Joins
“Join all rows from the players table on to rows in the teams table
for which school_name field in the players table is equal to the
school_name field in the teams table.”
• During the join, SQL looks up each player's school_name, in this case Wake Forest, in the
school_name field of the teams table. If a match exists, SQL takes all the
columns from the teams table and joins them to the columns of the players table
• The end result is a new table with all the columns from both tables. The screenshot
below is just a snippet, but the full result should contain 15 columns in total
• Note that SELECT * returns all of the columns from both tables, not just from
the table after FROM. If you want to return columns from only one specific
table, you can instead write SELECT players.*
• As an exercise, write the SQL query that only returns the columns from the
teams table.
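The difference between selecting everything and selecting one table's columns can be checked with a small sketch. It uses Python's built-in sqlite3 module and invented three-column tables, so the joined result here has 6 columns rather than the 15 in the real dataset.

```python
import sqlite3

# Hypothetical three-column stand-ins for the real tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE teams (school_name TEXT, conference TEXT, division TEXT);
    CREATE TABLE players (player_name TEXT, school_name TEXT, weight REAL);
    INSERT INTO teams VALUES ('Wake Forest', 'ACC', 'FBS');
    INSERT INTO players VALUES ('Player A', 'Wake Forest', 200);
""")

join_sql = """
      FROM players
      JOIN teams
        ON teams.school_name = players.school_name
"""

# SELECT * returns the columns of both joined tables.
all_cursor = conn.execute("SELECT *" + join_sql)
all_names = [col[0] for col in all_cursor.description]

# SELECT players.* returns only the columns of the players table.
players_cursor = conn.execute("SELECT players.*" + join_sql)
player_names = [col[0] for col in players_cursor.description]

print(len(all_names), len(player_names))
```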
by Patric Spathon
INNER JOIN
• It's often the case that one or both
tables being joined contain rows that
don’t have matches in the other
table
• To handle this, we have a choice of
using either an INNER JOIN or an
OUTER JOIN
• INNER JOIN eliminates the rows from
both tables that do not satisfy the
join condition in the ON clause
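A minimal sketch of this behaviour, run through Python's built-in sqlite3 module. The tables and rows are invented: one player attends a school missing from teams, and one team has no players.

```python
import sqlite3

# Invented rows: 'Unlisted College' is not in teams, 'Alabama' has no players.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE teams (school_name TEXT, conference TEXT);
    CREATE TABLE players (player_name TEXT, school_name TEXT);
    INSERT INTO teams VALUES ('Wake Forest', 'ACC'), ('Alabama', 'SEC');
    INSERT INTO players VALUES
        ('Player A', 'Wake Forest'),
        ('Player B', 'Unlisted College');
""")

# Only rows that satisfy the ON condition survive an INNER JOIN.
matched_rows = conn.execute("""
    SELECT players.player_name, teams.conference
      FROM players
     INNER JOIN teams
        ON teams.school_name = players.school_name
     ORDER BY players.player_name
""").fetchall()

print(matched_rows)
```

Neither Player B nor the Alabama team appears in the result, because neither has a match in the other table.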
INNER JOIN
• In joining two tables, we might encounter cases where both tables
have columns with identical names. Run the query below to see
what happens:
• The result can only support one column with a given name.
INNER JOIN
• We can avoid errors in joining tables with the same column names
by aliasing or renaming columns individually. Run the query below
to see how the result differs from the previous slide’s example.
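The effect of aliasing can be sketched as follows, using Python's built-in sqlite3 module and invented tables. The helper fetch_as_dicts is hypothetical: it keys each row by column name, which mimics a result display that can hold only one column per name. SQLite itself returns both columns either way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE teams (school_name TEXT, conference TEXT);
    CREATE TABLE players (player_name TEXT, school_name TEXT);
    INSERT INTO teams VALUES ('Wake Forest', 'ACC');
    INSERT INTO players VALUES ('Player A', 'Wake Forest');
""")

def fetch_as_dicts(sql):
    """Hypothetical helper: return rows as dicts keyed by column name."""
    cursor = conn.execute(sql)
    names = [col[0] for col in cursor.description]
    return [dict(zip(names, row)) for row in cursor.fetchall()]

# Both output columns are named school_name, so one overwrites the other
# once the row is keyed by name.
clashing = fetch_as_dicts("""
    SELECT players.school_name, teams.school_name
      FROM players
      JOIN teams ON teams.school_name = players.school_name
""")

# Aliasing each column individually keeps both values.
aliased = fetch_as_dicts("""
    SELECT players.school_name AS players_school_name,
           teams.school_name AS teams_school_name
      FROM players
      JOIN teams ON teams.school_name = players.school_name
""")

print(clashing)
print(aliased)
```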
OUTER JOIN
• LEFT JOIN - returns matched rows from both tables, plus unmatched rows from the left table only
• RIGHT JOIN - returns matched rows from both tables, plus unmatched rows from the right table only
• FULL OUTER JOIN - returns matched rows, plus unmatched rows from both tables
(rarely used)
Practice Dataset
• The first table lists a large portion of companies in the database, one row =
one company. The rest of the columns are descriptive
• The second table lists acquisitions with one row = one acquisition
• Mapping #1: permalink from the first table maps to company_permalink in
the second
• The foreign key used to join will entirely depend on whether you want to
add information about the acquiring company, or the company that was
acquired.
LEFT JOIN
• Tells the database to return all rows
in the table in the FROM clause,
regardless of whether or not they
have matches in the table in the
LEFT JOIN clause.
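A minimal sketch of a LEFT JOIN, run through Python's built-in sqlite3 module. The companies and acquisitions tables are cut down to hypothetical two-column versions with invented rows; only permalink and company_permalink come from the slides.

```python
import sqlite3

# Invented rows: '/company/b' has never been acquired.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE companies (permalink TEXT, name TEXT);
    CREATE TABLE acquisitions (company_permalink TEXT, acquirer_permalink TEXT);
    INSERT INTO companies VALUES ('/company/a', 'A'), ('/company/b', 'B');
    INSERT INTO acquisitions VALUES ('/company/a', '/company/big');
""")

# Every row of companies (the FROM table) is kept; unmatched rows get
# NULL (None in Python) for the acquisitions columns.
left_join_rows = conn.execute("""
    SELECT companies.permalink, acquisitions.acquirer_permalink
      FROM companies
      LEFT JOIN acquisitions
        ON companies.permalink = acquisitions.company_permalink
     ORDER BY companies.permalink
""").fetchall()

print(left_join_rows)
```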
LEFT JOIN
• Left joins are executed as such
RIGHT JOIN
• Rarely used
• You can achieve the same result
using a LEFT JOIN but switching
the tables
• Tells the database to return all rows
in the table in the RIGHT JOIN
clause, regardless of whether or not
they have matches in the FROM
clause.
RIGHT JOIN
• As an exercise, perform both queries. See any differences?
SQL JOINS using WHERE or ON
• It's possible to filter one or both tables before joining them
• In this example, everything in the acquisitions table was joined
EXCEPT for the row excluded using the condition != '/company/1000memories'
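The difference between filtering in ON and filtering in WHERE can be sketched like this, using Python's built-in sqlite3 module. The tables are hypothetical two-column versions with invented rows; only the permalink '/company/1000memories' is taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE companies (permalink TEXT);
    CREATE TABLE acquisitions (company_permalink TEXT, acquirer_permalink TEXT);
    INSERT INTO companies VALUES
        ('/company/1000memories'), ('/company/a'), ('/company/b');
    INSERT INTO acquisitions VALUES
        ('/company/1000memories', '/company/x'),
        ('/company/a', '/company/big');
""")

# Filter inside ON: the acquisitions row is excluded BEFORE the join,
# so every company is still returned.
filtered_in_on = conn.execute("""
    SELECT companies.permalink, acquisitions.acquirer_permalink
      FROM companies
      LEFT JOIN acquisitions
        ON companies.permalink = acquisitions.company_permalink
       AND acquisitions.company_permalink != '/company/1000memories'
     ORDER BY companies.permalink
""").fetchall()

# Filter inside WHERE: applied AFTER the join, so unmatched (NULL) rows
# and the 1000memories row are both dropped.
filtered_in_where = conn.execute("""
    SELECT companies.permalink, acquisitions.acquirer_permalink
      FROM companies
      LEFT JOIN acquisitions
        ON companies.permalink = acquisitions.company_permalink
     WHERE acquisitions.company_permalink != '/company/1000memories'
     ORDER BY companies.permalink
""").fetchall()

print(filtered_in_on)
print(filtered_in_where)
```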
UNION Operator
• Both tables must have the same number of columns and the same
data types in order for the UNION to be successful.
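A small sketch of UNION, run through Python's built-in sqlite3 module. The two tables are hypothetical stand-ins for a dataset split into two parts, with invented rows; both have the same two text columns, as the rule above requires.

```python
import sqlite3

# Invented rows: ('/company/b', 'Investor Y') appears in both parts.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE investments_part1 (company_permalink TEXT, investor_name TEXT);
    CREATE TABLE investments_part2 (company_permalink TEXT, investor_name TEXT);
    INSERT INTO investments_part1 VALUES
        ('/company/a', 'Investor X'), ('/company/b', 'Investor Y');
    INSERT INTO investments_part2 VALUES
        ('/company/b', 'Investor Y'), ('/company/c', 'Investor Z');
""")

# UNION stacks the rows of both queries and drops exact duplicates.
combined = conn.execute("""
    SELECT company_permalink, investor_name FROM investments_part1
     UNION
    SELECT company_permalink, investor_name FROM investments_part2
     ORDER BY company_permalink, investor_name
""").fetchall()

print(combined)
```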
Joins with Comparison Operators
• Instead of using = for all joins, we can also use comparison
operators (>, <, >=, <=, etc.) in the ON clause. They act as
additional conditions that a pair of rows must satisfy to be joined.
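One way such a join could look, sketched with Python's built-in sqlite3 module. The founded_year and funded_year columns, the five-year threshold, and all rows are assumptions made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE companies (permalink TEXT, founded_year INTEGER);
    CREATE TABLE investments (company_permalink TEXT, funded_year INTEGER);
    INSERT INTO companies VALUES ('/company/a', 2000);
    INSERT INTO investments VALUES ('/company/a', 2003), ('/company/a', 2008);
""")

# The ON clause combines an equality with a comparison: keep only
# investments made more than 5 years after the company was founded.
late_investments = conn.execute("""
    SELECT companies.permalink, investments.funded_year
      FROM companies
      JOIN investments
        ON companies.permalink = investments.company_permalink
       AND investments.funded_year > companies.founded_year + 5
""").fetchall()

print(late_investments)
```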
Joining on multiple keys
• We can also join on multiple foreign keys as shown in the example
below. This can improve runtime, since SQL uses indexes
to speed up queries.
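A sketch of the syntax only, run through Python's built-in sqlite3 module. The tables, the company_name column, and the rows are invented, and a toy example like this cannot show any runtime difference.

```python
import sqlite3

# Invented rows: the second investment has a mismatched company name.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE companies (permalink TEXT, name TEXT);
    CREATE TABLE investments
        (company_permalink TEXT, company_name TEXT, investor_name TEXT);
    INSERT INTO companies VALUES ('/company/a', 'Acme'), ('/company/b', 'Beta');
    INSERT INTO investments VALUES
        ('/company/a', 'Acme', 'Investor X'),
        ('/company/b', 'Beta Renamed', 'Investor Y');
""")

# Rows are joined only when BOTH keys match.
two_key_rows = conn.execute("""
    SELECT companies.name, investments.investor_name
      FROM companies
      JOIN investments
        ON companies.permalink = investments.company_permalink
       AND companies.name = investments.company_name
""").fetchall()

print(two_key_rows)
```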
Self Joins
• It is sometimes useful to join a table to itself. For example, in the Crunchbase
dataset, say you want to identify companies that received an
investment from Great Britain following an investment from Japan.
We use the query:
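A possible shape for such a self join, sketched with Python's built-in sqlite3 module. The single investments table, its columns, the country codes 'JPN' and 'GBR', and the rows are all assumptions for illustration, not the actual Crunchbase schema.

```python
import sqlite3

# Invented rows: Acme gets a Japanese investment, then a British one;
# Beta gets them in the opposite order.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE investments
        (company_name TEXT, investor_country_code TEXT, funded_at TEXT);
    INSERT INTO investments VALUES
        ('Acme', 'JPN', '2010-01-01'),
        ('Acme', 'GBR', '2011-06-01'),
        ('Beta', 'GBR', '2009-01-01'),
        ('Beta', 'JPN', '2010-01-01');
""")

# The same table is referenced twice under two different aliases.
jpn_then_gbr = conn.execute("""
    SELECT DISTINCT japan_investments.company_name
      FROM investments japan_investments
      JOIN investments gb_investments
        ON japan_investments.company_name = gb_investments.company_name
       AND gb_investments.investor_country_code = 'GBR'
       AND gb_investments.funded_at > japan_investments.funded_at
     WHERE japan_investments.investor_country_code = 'JPN'
     ORDER BY japan_investments.company_name
""").fetchall()

print(jpn_then_gbr)
```

The aliases are what make a self join possible: without them, SQL could not tell the two copies of the table apart.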
Quiz
• Write a query that shows a company’s name, “status” (found in the
Companies table), and the number of unique investors in that company.
Order by the number of investors from most to fewest. Limit to only
companies in the state of New York.
• Write a query that shows 3 columns. The first indicates which dataset
(part 1 or 2) the data comes from, the second shows company status,
and the third is a count of the number of investors. Hint: Study and use
tutorial.crunchbase_investments_part1 and
tutorial.crunchbase_investments_part2. You will also use the table
tutorial.crunchbase_companies. Lastly, you will have to group by status
and dataset.