SQL Joins
SQL joins are used to query data from two or more tables, based on a relationship between certain columns in these tables.
SQL JOIN
The JOIN keyword is used in an SQL statement to query data from two or more tables, based on a relationship between certain columns in these tables.

Tables in a database are often related to each other with keys. A primary key is a column (or a combination of columns) with a unique value for each row. Each primary key value must be unique within the table. The purpose is to bind data together, across tables, without repeating all of the data in every table.

Look at the "Persons" table:

P_Id  LastName   FirstName  Address       City
1     Hansen     Ola        Timoteivn 10  Sandnes
2     Svendson   Tove       Borgvn 23     Sandnes
3     Pettersen  Kari       Storgt 20     Stavanger
Note that the "P_Id" column is the primary key in the "Persons" table. This means that no two rows can have the same P_Id. The P_Id distinguishes two persons even if they have the same name.

Next, we have the "Orders" table:

O_Id  OrderNo  P_Id
1     77895    3
2     44678    3
3     22456    1
4     24562    1
5     34764    15
Note that the "O_Id" column is the primary key in the "Orders" table and that the "P_Id" column refers to the persons in the "Persons" table without using their names. The two tables above are related through the "P_Id" column.
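The uniqueness rule for a primary key can be checked with a small sketch. It uses Python's built-in sqlite3 module with an in-memory database, which is an assumption made only for illustration (the text does not name a database engine), and the duplicate row's address and city are made-up values.

```python
import sqlite3

# In-memory SQLite database; the engine choice is an assumption for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Persons (
        P_Id      INTEGER PRIMARY KEY,
        LastName  TEXT,
        FirstName TEXT,
        Address   TEXT,
        City      TEXT
    )
""")
conn.executemany(
    "INSERT INTO Persons VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Hansen", "Ola", "Timoteivn 10", "Sandnes"),
        (2, "Svendson", "Tove", "Borgvn 23", "Sandnes"),
        (3, "Pettersen", "Kari", "Storgt 20", "Stavanger"),
    ],
)

# A second row with P_Id 1 violates the primary key and is rejected.
# The address and city here are hypothetical values.
try:
    conn.execute(
        "INSERT INTO Persons VALUES (1, 'Hansen', 'Ola', 'Other address', 'Oslo')"
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

person_count = conn.execute("SELECT COUNT(*) FROM Persons").fetchone()[0]
```

The rejected insert leaves the three original rows untouched, so two persons with the same name can only coexist if they have different P_Id values.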
JOIN: Return rows when there is at least one match in both tables
LEFT JOIN: Return all rows from the left table, even if there are no matches in the right table
RIGHT JOIN: Return all rows from the right table, even if there are no matches in the left table
FULL JOIN: Return all rows from both tables, whether or not there is a match in the other table
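As a rough illustration of these join types, the sketch below loads the "Persons" and "Orders" data into an in-memory SQLite database through Python's sqlite3 module (an assumed setup, not one named in the text). Older SQLite versions lack RIGHT JOIN and FULL JOIN, so the right join is approximated here by a LEFT JOIN with the tables swapped; other database systems accept RIGHT JOIN directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Persons (P_Id INTEGER PRIMARY KEY, LastName TEXT, FirstName TEXT);
    CREATE TABLE Orders  (O_Id INTEGER PRIMARY KEY, OrderNo INTEGER, P_Id INTEGER);
    INSERT INTO Persons VALUES
        (1, 'Hansen', 'Ola'), (2, 'Svendson', 'Tove'), (3, 'Pettersen', 'Kari');
    INSERT INTO Orders VALUES
        (1, 77895, 3), (2, 44678, 3), (3, 22456, 1), (4, 24562, 1), (5, 34764, 15);
""")

# JOIN (inner join): only persons that have orders, only orders that have a person.
inner_rows = conn.execute("""
    SELECT Persons.LastName, Orders.OrderNo
    FROM Persons JOIN Orders ON Persons.P_Id = Orders.P_Id
    ORDER BY Orders.O_Id
""").fetchall()

# LEFT JOIN: every person, with NULL (None) where there is no order.
left_rows = conn.execute("""
    SELECT Persons.LastName, Orders.OrderNo
    FROM Persons LEFT JOIN Orders ON Persons.P_Id = Orders.P_Id
    ORDER BY Persons.P_Id, Orders.O_Id
""").fetchall()

# Right-join equivalent: every order, with NULL where no person matches P_Id 15.
right_rows = conn.execute("""
    SELECT Orders.OrderNo, Persons.LastName
    FROM Orders LEFT JOIN Persons ON Persons.P_Id = Orders.P_Id
    ORDER BY Orders.O_Id
""").fetchall()
```

Svendson has no orders, so she appears only in the left join; order 34764 points at P_Id 15, which has no person, so it appears only in the right-join equivalent.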
By Chris Seifert

Often, especially in smaller databases, simple SQL statements, such as SELECT and UPDATE, are all you need to get the required data. This situation changes as a database grows, especially in number of tables. With such databases, you will increasingly find it necessary to extract information from multiple tables. You can do so with a join query. In this article, I will explain what a join is and how it differs from simpler queries.

This article uses an example database containing the film information shown in Tables A and B.

Table A
MovieID  Title                Year
21       A Beautiful Mind     2002
22       Forrest Gump         1994
23       The English Patient  1999

Movies

Table B

ActorID  MovieID  Name
1        22       Tom Hanks
2        21       Russell Crowe
3        23       Ralph Fiennes
Actors

Simple SQL queries

Let's begin with simple SQL queries and proceed to using joins. For example, if you want the title, year, and actor for the movie Forrest Gump, you can use the following two SELECT queries (the queries refer to the tables in Tables A and B as Movie and Actor):

    SELECT title, year FROM Movie WHERE MovieID='22'
    SELECT Name FROM Actor WHERE MovieID='22'

Afterward, a bit of programming is necessary to bring the information together. But why waste all that time coding when you can use a join?

What is a join?

Joins are possibly both the most frequently used and the most confusing aspect of SQL, so let's continue with the example to gain a better understanding. Rather than the multiple query method, you can use the following join to get the same result without additional programming:

    SELECT title, year, Name FROM Movie, Actor WHERE (Movie.MovieID=Actor.MovieID) AND Movie.MovieID='22'

Let's take a closer look at this command. We need to examine three segments in this query. The first, SELECT title, year, Name, names what you are looking for (the actor is stored in the Name column of Table B). The second, FROM Movie, Actor, names the tables from which you are querying. Finally, the third section, WHERE (Movie.MovieID=Actor.MovieID) AND Movie.MovieID='22', determines which records are chosen. Because MovieID appears in both tables, it must be qualified with a table name in the WHERE clause.
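To make the comparison concrete, here is a minimal sketch of both approaches against the data in Tables A and B. It assumes an in-memory SQLite database driven from Python's sqlite3 module, with tables named Movie and Actor as in the queries; the "bit of programming" in the first approach is the tuple built by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Movie (MovieID INTEGER PRIMARY KEY, Title TEXT, Year INTEGER);
    CREATE TABLE Actor (ActorID INTEGER PRIMARY KEY, MovieID INTEGER, Name TEXT);
    INSERT INTO Movie VALUES
        (21, 'A Beautiful Mind', 2002),
        (22, 'Forrest Gump', 1994),
        (23, 'The English Patient', 1999);
    INSERT INTO Actor VALUES
        (1, 22, 'Tom Hanks'), (2, 21, 'Russell Crowe'), (3, 23, 'Ralph Fiennes');
""")

# Approach 1: two separate queries, combined afterwards in application code.
title, year = conn.execute(
    "SELECT Title, Year FROM Movie WHERE MovieID = ?", (22,)
).fetchone()
(name,) = conn.execute(
    "SELECT Name FROM Actor WHERE MovieID = ?", (22,)
).fetchone()
two_query_result = (title, year, name)

# Approach 2: one join query; the database combines the rows.
join_result = conn.execute("""
    SELECT Title, Year, Name
    FROM Movie, Actor
    WHERE Movie.MovieID = Actor.MovieID AND Movie.MovieID = ?
""", (22,)).fetchone()
```

Both approaches yield the same row; the join simply moves the matching step from the application into the query.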
What is the difference?

You may wonder about the difference between the two approaches I've introduced. In the SELECT section, you select fields from both tables. In the FROM section, you list all the tables you are using. In the WHERE section, you name the fields that share common data. This is the foundation of joins: The tables must have reference points that connect them to one another. In the example, the common fields are the MovieID fields. The fields don't have to be named the same, but it is often a good idea to consider doing so when designing your database. As long as the two fields have compatible data types (such as a floating point and an integer), you can use them for joining purposes. Bear in mind that these fields must hold the same kind of data, regardless of their data type. Joining the two tables where Movie.MovieID=Actor.ActorID won't do you any good, because this statement doesn't compare two sets of data that describe the same things. Now, let's take a closer look at joins.

Types of join operations

There are four kinds of joins: cross, inner, outer, and self joins. The join in our example is an inner join. Changing the syntax to make this more apparent, the join can also look like this:

    SELECT title, year, Name FROM Movie INNER JOIN Actor ON (Movie.MovieID=Actor.MovieID) AND Movie.MovieID='22'

The inner join returns only the records that match the specific criteria you ask for (Movie.MovieID=Actor.MovieID) and nothing else. This result is different from that of an outer join, which can return these records as well as unmatched rows from one or both of the tables you are pulling from. A self join joins a table to itself, relating one field to another field within the same table. Self joins are rarely required. A cross join can be seen as an inner join with no join condition: because nothing restricts which rows are paired, it returns the Cartesian product of the tables you are comparing.
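The relationship between these forms can be sketched as follows, again assuming an in-memory SQLite database with tables named Movie and Actor holding the data from Tables A and B. The comma-style join and the INNER JOIN ... ON form should return the same rows, while a join with no condition pairs every actor with every movie.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Movie (MovieID INTEGER PRIMARY KEY, Title TEXT, Year INTEGER);
    CREATE TABLE Actor (ActorID INTEGER PRIMARY KEY, MovieID INTEGER, Name TEXT);
    INSERT INTO Movie VALUES
        (21, 'A Beautiful Mind', 2002),
        (22, 'Forrest Gump', 1994),
        (23, 'The English Patient', 1999);
    INSERT INTO Actor VALUES
        (1, 22, 'Tom Hanks'), (2, 21, 'Russell Crowe'), (3, 23, 'Ralph Fiennes');
""")

# Inner join written with a comma-separated FROM list and a WHERE condition.
implicit_rows = conn.execute("""
    SELECT Title, Year, Name
    FROM Movie, Actor
    WHERE Movie.MovieID = Actor.MovieID
    ORDER BY Movie.MovieID
""").fetchall()

# The same inner join written with explicit INNER JOIN ... ON syntax.
explicit_rows = conn.execute("""
    SELECT Title, Year, Name
    FROM Movie INNER JOIN Actor ON Movie.MovieID = Actor.MovieID
    ORDER BY Movie.MovieID
""").fetchall()

# No join condition at all: every actor row is paired with every movie row.
cross_count = conn.execute(
    "SELECT COUNT(*) FROM Actor CROSS JOIN Movie"
).fetchone()[0]
```

With three rows in each table, the inner join keeps the three matching pairs and the unconditioned join produces all nine combinations.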
The cross join query could look like this:

    SELECT * FROM Actor, Movie

or like this:

    SELECT * FROM Actor CROSS JOIN Movie

The result of the cross join would be a virtual table like Table C.

Table C
ActorID  Actor.MovieID  Movie.MovieID  Name           Title                Year
1        22             21             Tom Hanks      A Beautiful Mind     2002
1        22             22             Tom Hanks      Forrest Gump         1994
1        22             23             Tom Hanks      The English Patient  1999
2        21             21             Russell Crowe  A Beautiful Mind     2002
2        21             22             Russell Crowe  Forrest Gump         1994
2        21             23             Russell Crowe  The English Patient  1999
3        23             21             Ralph Fiennes  A Beautiful Mind     2002
3        23             22             Ralph Fiennes  Forrest Gump         1994
3        23             23             Ralph Fiennes  The English Patient  1999
Cross join results

Only three of the records in Table C (those where Actor.MovieID equals Movie.MovieID) are accurate for our purposes, but a cross join could be useful in some situations where every possible combination is desired. The biggest issue concerning cross joins is the fact that they can easily slow a database; the amount of required processing increases quickly as the number of records increases. In this case, you have three records in each table, making nine results in the cross join. What if you had 10,000 records in each table (100 million combinations)? Or 10 million? The processing time could cripple the system.

SQL DML and DDL
SQL can be divided into two parts: The Data Manipulation Language (DML) and the Data Definition Language (DDL). The query and update commands form the DML part of SQL:
SELECT - extracts data from a database
UPDATE - updates data in a database
DELETE - deletes data from a database
INSERT INTO - inserts new data into a database
The DDL part of SQL permits database tables to be created or deleted. It also defines indexes (keys), specifies links between tables, and imposes constraints between tables. The most important DDL statements in SQL are:
CREATE DATABASE - creates a new database
ALTER DATABASE - modifies a database
CREATE TABLE - creates a new table
ALTER TABLE - modifies a table
DROP TABLE - deletes a table
CREATE INDEX - creates an index (search key)
DROP INDEX - deletes an index
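The split between the two groups of statements can be seen in a short sketch that runs a few of each against an in-memory SQLite database from Python (an assumed setup). SQLite has no CREATE DATABASE or ALTER DATABASE statement, since opening a connection creates the database, so those two are left out; the index name and the changed city value are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the structure (table, index, extra column).
conn.execute("CREATE TABLE Persons (P_Id INTEGER PRIMARY KEY, LastName TEXT, City TEXT)")
conn.execute("CREATE INDEX idx_persons_lastname ON Persons (LastName)")  # hypothetical index name
conn.execute("ALTER TABLE Persons ADD COLUMN FirstName TEXT")

# DML: work with the data inside that structure.
conn.execute(
    "INSERT INTO Persons (P_Id, LastName, FirstName, City) "
    "VALUES (1, 'Hansen', 'Ola', 'Sandnes')"
)
conn.execute(
    "INSERT INTO Persons (P_Id, LastName, FirstName, City) "
    "VALUES (2, 'Svendson', 'Tove', 'Sandnes')"
)
conn.execute("UPDATE Persons SET City = 'Stavanger' WHERE P_Id = 1")  # illustrative change
conn.execute("DELETE FROM Persons WHERE P_Id = 2")
rows = conn.execute(
    "SELECT P_Id, LastName, FirstName, City FROM Persons"
).fetchall()

# DDL again: remove the index and the table.
conn.execute("DROP INDEX idx_persons_lastname")
conn.execute("DROP TABLE Persons")
tables_left = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
```

The DML statements change only rows, while the DDL statements change which tables and indexes exist at all.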