1 - SQL_DE_Feb25

The document provides a comprehensive overview of SQL, covering its introduction, types of commands, data types, and key statements such as SELECT, INSERT, UPDATE, and DELETE. It explains the importance of SQL in managing relational databases and details various SQL operations, including data manipulation and structure modification. Additionally, it discusses advanced topics like joins, aggregation functions, and transaction control, emphasizing SQL's role in data analysis and database management.


1. Introduction to SQL

2. Types of SQL Commands

3. SQL Data Types

4. SELECT Statement and WHERE Clause

5. INSERT, UPDATE, and DELETE Statements

6. ALTER Statement

7. ORDER BY, GROUP BY, and HAVING Clauses

8. Joins in SQL

9. Self-Join

10. Aggregation Functions

11. Subqueries

12. Common Table Expressions (CTE)

13. Window Functions

14. Analytical Functions

15. Views in SQL

16. CASE Statements

17. Creating Tables Using CTAS (CREATE TABLE AS SELECT)

18. Temporary Tables

19. Data Type Casting

20. Working with Date and Time in SQL

21. Stored Procedures


Introduction to SQL
Structured Query Language (SQL) is a standardized programming language
specifically designed for managing and manipulating relational databases.
Developed in the early 1970s by IBM, SQL emerged from a need to interact with
databases in a more coherent and structured manner, allowing users to create, read,
update, and delete data efficiently. The language was initially called SEQUEL
(Structured English Query Language) and later shortened to SQL, which has since
become the industry standard.
The importance of SQL lies in its ability to provide a consistent way to access and
manipulate relational databases regardless of the underlying database management
system (DBMS). This uniformity is crucial, as it enables developers and database
administrators to employ SQL across various platforms, such as MySQL,
PostgreSQL, Microsoft SQL Server, and Oracle. As a result, SQL has become an
essential tool for professionals working with data in numerous fields, including
software development, business intelligence, and data analysis.
In 1986, SQL was standardized by the American National Standards Institute (ANSI),
further solidifying its position as the de facto language for relational databases. This
standardization ensured that SQL would be uniformly implemented across different
systems, promoting interoperability and reducing the learning curve for new users.
The ANSI standard has undergone several revisions, incorporating new features and
functionalities that reflect the evolving needs of database users.
Basic SQL usage encompasses a variety of operations including creating and
modifying database structures (DDL - Data Definition Language), querying data
(DML - Data Manipulation Language), and managing access control (DCL - Data
Control Language). Common SQL commands such as SELECT, INSERT, UPDATE,
and DELETE allow users to perform essential tasks effortlessly, making SQL a
powerful and indispensable language for anyone working with relational databases.
Types of SQL Commands
SQL commands can be categorized into four primary types: Data Definition
Language (DDL), Data Manipulation Language (DML), Data Control Language
(DCL), and Transaction Control Language (TCL). Each of these categories serves a
distinct purpose in the management and manipulation of data within a database,
contributing to the overall functionality of SQL as a powerful tool for database
administration.

Data Definition Language (DDL)


DDL commands are used to define and modify the structure of database objects.
These commands allow users to create, alter, and drop tables and other database
elements. Common examples of DDL commands include:
• CREATE: Used to create new tables or databases. For instance, CREATE TABLE employees (id INT, name VARCHAR(100), department VARCHAR(50));
• ALTER: Modifies an existing database object. For example, ALTER TABLE employees ADD COLUMN salary DECIMAL(10, 2);
• DROP: Deletes database objects. An example would be DROP TABLE employees;
DDL commands are essential for establishing the schema and structure of a
database.

Data Manipulation Language (DML)


DML commands are utilized for managing data within those structures defined by
DDL. They allow for the retrieval, insertion, updating, and deletion of data. Common
DML commands include:
• SELECT: Retrieves data from one or more tables. For instance, SELECT * FROM employees WHERE department = 'Sales';
• INSERT: Adds new records to a table. For example, INSERT INTO employees (name, department) VALUES ('John Doe', 'Sales');
• UPDATE: Modifies existing records. An example would be UPDATE employees SET salary = 60000 WHERE id = 1;
• DELETE: Removes records from a table. For instance, DELETE FROM employees WHERE id = 1;
DML commands are crucial for day-to-day data operations.

Data Control Language (DCL)


DCL commands are involved in controlling access to data in the database. They
primarily deal with permissions and security. Common DCL commands include:
• GRANT: Provides specific privileges to users. For example, GRANT SELECT
ON employees TO user1;
• REVOKE: Removes specific privileges from users. An example would be
REVOKE INSERT ON employees FROM user1;
These commands help ensure that only authorized users can access or manipulate
data.

Transaction Control Language (TCL)


TCL commands manage transactions within the database. They ensure that all
operations within a transaction are completed successfully and maintain data
integrity. Common TCL commands include:
• COMMIT: Saves all changes made during the current transaction. For
example, COMMIT;
• ROLLBACK: Undoes changes made during the current transaction in case of
an error. An example would be ROLLBACK;
• SAVEPOINT: Sets a point within a transaction to which you can later roll
back. For instance, SAVEPOINT save1;
TCL commands are vital for maintaining data integrity and handling error
management in database operations. Each of these SQL command categories plays
a crucial role in the effective management of relational databases, allowing users to
organize, manipulate, and secure their data efficiently.
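The TCL commands above can be exercised directly. The following sketch uses Python's built-in sqlite3 module against an in-memory SQLite database (the accounts table and its values are invented for the demo); commit() and rollback() issue the underlying COMMIT and ROLLBACK:

```python
import sqlite3

# In-memory database for demonstration; nothing is persisted to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts (balance) VALUES (100)")
conn.commit()  # COMMIT: the insert is now permanent

# Start a change we will abandon.
conn.execute("UPDATE accounts SET balance = balance - 999")
conn.rollback()  # ROLLBACK: undo the uncommitted UPDATE

balance = conn.execute("SELECT balance FROM accounts").fetchone()[0]
print(balance)  # 100 — the committed value survives the rollback
```

The rolled-back UPDATE leaves no trace: only changes present at the last COMMIT remain visible.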
SQL Data Types
In SQL, data types define the nature of data that can be stored in a column of a
table. Understanding these data types is crucial for designing a database schema
that supports the requirements of applications effectively. SQL data types can be
categorized into several groups: numeric, character, string, date and time, and binary
types. Below is a detailed description of each category along with examples.

Numeric Types
Numeric data types are used to store numbers. They can be divided into two main
categories: integers and floating-point numbers.
• INT: A standard integer type that can store whole numbers without decimal
points. Example: age INT;
• FLOAT: An approximate numeric type for floating-point numbers. Example: price FLOAT(5, 2); (where 5 is the total number of digits and 2 is the number of digits after the decimal point; this (M, D) syntax is dialect-specific and deprecated in recent MySQL versions).
• DECIMAL: An exact fixed-point type, often used in financial applications to avoid rounding issues. Example: salary DECIMAL(10, 2);

Character Types
Character data types are designed to store fixed-length or variable-length strings of
characters.
• CHAR: Fixed-length character strings. Example: gender CHAR(1); (stores 'M'
or 'F').
• VARCHAR: Variable-length character strings, which can store up to a
specified maximum length. Example: first_name VARCHAR(50);

Note: CHAR is fixed-length; VARCHAR is variable-length.

String Types
String data types are similar to character types but are typically used for larger text
blocks.
• TEXT: A data type for large text strings, which can hold up to 65,535
characters. Example: description TEXT;
• CLOB: Character Large Object, used to store large amounts of character
data. Example: bio CLOB;

Date and Time Types


These data types are critical for storing date and time information.
• DATE: Stores date values (year, month, day). Example: birth_date DATE;
• TIME: Stores time values (hours, minutes, seconds). Example: event_time
TIME;
• DATETIME: Combines date and time into a single data type. Example:
created_at DATETIME;

Binary Types
Binary data types are used to store binary data, such as images or files.
• BLOB: Binary Large Object, which can store up to 65,535 bytes of binary
data. Example: image BLOB;
• VARBINARY: Variable-length binary data. Example: file VARBINARY(255);
Each of these data types plays a pivotal role in ensuring that the data stored in a
database is accurate, efficient, and suited to the needs of applications. Properly
selecting data types can significantly impact the performance and storage efficiency
of a database system.
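The FLOAT-versus-DECIMAL distinction above is easy to see outside of SQL as well: Python's decimal module implements the same exact fixed-point arithmetic that DECIMAL columns provide, while ordinary floats show the rounding error that FLOAT columns inherit (a small sketch; the values are arbitrary):

```python
from decimal import Decimal

# Binary floating point is approximate: 0.1 and 0.2 have no exact binary form.
float_total = 0.1 + 0.2

# Fixed-point decimal arithmetic is exact, which is why DECIMAL suits money.
exact_total = Decimal("0.1") + Decimal("0.2")

print(float_total == 0.3)                 # False — accumulated rounding error
print(exact_total == Decimal("0.3"))      # True — exact result
```

This is precisely the rounding issue the DECIMAL type exists to avoid in financial data.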
SELECT Statement and WHERE Clause
The SELECT statement is one of the most fundamental commands in SQL, as it
allows users to retrieve data from one or more tables within a database. The basic
syntax of a SELECT statement is as follows:
SELECT column1, column2, ...
FROM table_name
WHERE condition;

In this syntax, column1, column2, etc., represent the specific columns of data that
you wish to retrieve, while table_name indicates the table from which the data is
being selected. The WHERE clause is optional but crucial for filtering records based
on specific criteria.
For example, consider a hypothetical sales database with a table named
sales_records. This table includes columns such as sale_id, customer_name,
product, quantity, and sale_date. If we want to retrieve all sales made for a specific
product, say "Laptop," the SQL query would look like this:
SELECT *
FROM sales_records
WHERE product = 'Laptop';

This query retrieves all columns for the records where the product column matches
"Laptop." The use of the asterisk (*) indicates that all columns should be returned.
The WHERE clause can also utilize various operators to refine the search further.
For instance, to find sales records where the quantity sold is greater than 10, the
query would be:
SELECT *
FROM sales_records
WHERE quantity > 10;

Moreover, the WHERE clause supports logical operators such as AND, OR, and
NOT for combining multiple conditions. If we want to retrieve records for sales that
occurred in the year 2023 and involved a quantity greater than 5, the SQL statement
would be:
SELECT *
FROM sales_records
WHERE sale_date >= '2023-01-01' AND quantity > 5;

These examples illustrate how the SELECT statement, combined with the WHERE
clause, provides flexible and powerful options for data retrieval, enabling users to
extract precisely the information they need from their databases.
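The queries above can be run end to end with Python's sqlite3 module; the sales_records rows below are invented sample data for the demo:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE sales_records (
    sale_id INTEGER PRIMARY KEY, customer_name TEXT,
    product TEXT, quantity INTEGER, sale_date TEXT)""")
conn.executemany(
    "INSERT INTO sales_records (customer_name, product, quantity, sale_date) "
    "VALUES (?, ?, ?, ?)",
    [("Ann", "Laptop", 12, "2023-03-01"),
     ("Bob", "Mouse", 3, "2022-11-15"),
     ("Cid", "Laptop", 2, "2023-06-20")])

# WHERE with an equality filter.
laptops = conn.execute(
    "SELECT customer_name FROM sales_records "
    "WHERE product = 'Laptop' ORDER BY sale_id").fetchall()

# WHERE combining conditions with AND.
big_2023 = conn.execute(
    "SELECT customer_name FROM sales_records "
    "WHERE sale_date >= '2023-01-01' AND quantity > 5 ORDER BY sale_id").fetchall()

print(laptops)   # [('Ann',), ('Cid',)]
print(big_2023)  # [('Ann',)]
```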
INSERT, UPDATE, and DELETE Statements
In SQL, the ability to manipulate data is a core function that allows users to maintain
and update records within a database. The three primary operations for modifying
data in a SQL database are the INSERT, UPDATE, and DELETE statements. Each
of these operations serves a unique purpose, enabling users to add new records,
modify existing data, or remove records altogether. To illustrate these functions, we
will use a hypothetical sales table, which contains columns such as sale_id,
customer_name, product, quantity, and sale_date.

INSERT Statement
The INSERT statement is used to add new records to a table. For example, if we
want to add a new sale record for a customer named "Alice" who purchased 3 units
of "Smartphone", the SQL command would be:
INSERT INTO sales (customer_name, product, quantity, sale_date)
VALUES ('Alice', 'Smartphone', 3, '2023-10-01');

This command adds a new row to the sales table with the specified values.

UPDATE Statement
The UPDATE statement allows users to modify existing records in a table. For
instance, if we need to update the quantity of products sold by changing Alice's
purchase from 3 to 5 units, the SQL command would be:
UPDATE sales
SET quantity = 5
WHERE customer_name = 'Alice' AND product = 'Smartphone';

This command finds the record that matches the specified conditions and updates
the quantity field accordingly.

DELETE Statement
The DELETE statement is used to remove records from a table. If we want to delete
the sale record associated with Alice's purchase, the SQL command would be:
DELETE FROM sales
WHERE customer_name = 'Alice' AND product = 'Smartphone';

This command identifies the record that meets the specified conditions and removes
it from the sales table.
These three statements—INSERT, UPDATE, and DELETE—are essential for
effective database management, allowing users to maintain accurate and up-to-date
records in their SQL databases.
Notes:
DELETE – removes specific rows
TRUNCATE – removes all rows (the table structure remains)
DROP – removes the entire table from the database
ALTER – add, modify, or drop columns
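The Alice example can be replayed as one round trip with Python's sqlite3 module (note that SQLite, used here, has no TRUNCATE statement; DELETE without a WHERE clause plays that role):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, "
             "customer_name TEXT, product TEXT, quantity INTEGER, sale_date TEXT)")

# INSERT: add Alice's purchase of 3 Smartphones.
conn.execute("INSERT INTO sales (customer_name, product, quantity, sale_date) "
             "VALUES ('Alice', 'Smartphone', 3, '2023-10-01')")

# UPDATE: change the quantity from 3 to 5.
conn.execute("UPDATE sales SET quantity = 5 "
             "WHERE customer_name = 'Alice' AND product = 'Smartphone'")
qty = conn.execute("SELECT quantity FROM sales "
                   "WHERE customer_name = 'Alice'").fetchone()[0]

# DELETE: remove the matching row; the table itself survives.
conn.execute("DELETE FROM sales "
             "WHERE customer_name = 'Alice' AND product = 'Smartphone'")
remaining = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]

print(qty, remaining)  # 5 0
```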

ALTER Statement
The ALTER TABLE statement is a crucial SQL command used for modifying the
structure of an existing table in a database. As data requirements evolve over time,
the ability to change table definitions is essential for maintaining database integrity
and functionality. This statement enables database administrators and developers to
add, modify, or drop columns, as well as to change data types and constraints
without the need to recreate the entire table.
There are several scenarios where using the ALTER TABLE statement becomes
necessary. For instance, a business may need to expand its data storage
capabilities by adding new fields to accommodate additional information. If a
company decides to collect phone numbers for customer records, it would use the
ALTER statement to add a new column named phone_number to the customers
table. An example of this command would be:
ALTER TABLE customers
ADD COLUMN phone_number VARCHAR(15);

In another scenario, a business might realize that a column's data type is insufficient
for the data it needs to store. For example, if the salary column in an employees
table currently uses an INT type but needs to store larger values, the administrator
could modify the column to a DECIMAL type as follows:
ALTER TABLE employees
MODIFY COLUMN salary DECIMAL(10, 2);

Additionally, if a column is no longer necessary, it can be removed using the DROP COLUMN command. For instance, if the middle_name column in the employees table is deemed redundant, it can be deleted using:
ALTER TABLE employees
DROP COLUMN middle_name;

The ALTER TABLE statement is vital for ensuring that database schemas remain
aligned with changing business needs, providing flexibility to adapt to new
information requirements efficiently.
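The ADD COLUMN example can be verified with sqlite3 (SQLite accepts ADD COLUMN but not MODIFY COLUMN, so this sketch covers only the first scenario; PRAGMA table_info is SQLite's way of listing a table's columns):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")

# ALTER TABLE ... ADD COLUMN extends the schema in place.
conn.execute("ALTER TABLE customers ADD COLUMN phone_number VARCHAR(15)")

# Column name is field 1 of each PRAGMA table_info row.
cols = [row[1] for row in conn.execute("PRAGMA table_info(customers)")]
print(cols)  # ['id', 'name', 'phone_number']
```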
ORDER BY, GROUP BY, and HAVING Clauses
In SQL, the ORDER BY, GROUP BY, and HAVING clauses are essential for
organizing and analyzing data effectively. Each clause serves a distinct purpose,
allowing users to sort results, aggregate data, and filter aggregated results,
respectively.

ORDER BY Clause
The ORDER BY clause is used to sort the results of a query based on one or more
columns. By default, sorting is done in ascending order, but it can be specified as
descending using the DESC keyword. For instance, if we want to retrieve a list of
employees sorted by their hire date, the SQL query would look like this:
SELECT *
FROM employees
ORDER BY hire_date ASC;

In this example, the results will be displayed with the earliest hire dates first. If we
wanted to sort by salary in descending order, the query would be modified to:
SELECT *
FROM employees
ORDER BY salary DESC;

GROUP BY Clause
The GROUP BY clause is utilized to aggregate data based on one or more columns.
This clause is often used in conjunction with aggregate functions like COUNT(),
SUM(), AVG(), MIN(), and MAX(). For example, if we need to calculate the total
sales for each product in a sales table, we can use:
SELECT product, SUM(quantity) AS total_sales
FROM sales
GROUP BY product;

This query groups the records by the product column and calculates the total
quantity sold for each product.

HAVING Clause
The HAVING clause is used to filter records after aggregation has occurred, which is
not possible with the WHERE clause. For instance, if we want to find products that
have total sales greater than 100 units, we can extend the previous query with a
HAVING clause:
SELECT product, SUM(quantity) AS total_sales
FROM sales
GROUP BY product
HAVING SUM(quantity) > 100;

In this example, the HAVING clause filters the grouped results to only include
products with total sales exceeding 100 units. Together, these clauses enhance the
ability to analyze and interpret data effectively, enabling users to derive valuable
insights from their datasets.
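The GROUP BY / HAVING pair above can be run as-is against an in-memory SQLite database (sample rows invented for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, quantity INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("Laptop", 70), ("Laptop", 60), ("Mouse", 40)])

# GROUP BY aggregates per product; HAVING filters the aggregated rows.
rows = conn.execute("""
    SELECT product, SUM(quantity) AS total_sales
    FROM sales
    GROUP BY product
    HAVING SUM(quantity) > 100
    ORDER BY product""").fetchall()

print(rows)  # [('Laptop', 130)] — Mouse (40) is filtered out by HAVING
```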

Joins in SQL
JOIN operations in SQL are fundamental for combining rows from two or more tables
based on a related column. By leveraging JOINs, users can retrieve comprehensive
datasets that reflect the relationships among various entities. The most common
types of JOINs include INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL OUTER
JOIN, each serving distinct purposes.

INNER JOIN
An INNER JOIN returns only the rows that have matching values in both tables. For
example, consider two tables: employees and departments. The employees table
contains a department_id column that relates to the departments table’s id column.
To retrieve a list of employees along with their corresponding department names, the
SQL query would be:
SELECT employees.name, departments.name
FROM employees
INNER JOIN departments ON employees.department_id = departments.id;

This query returns only those employees who belong to a department, omitting any
employees without an associated department.
LEFT JOIN
A LEFT JOIN returns all the rows from the left table and the matched rows from the
right table. If there is no match, NULL values are returned for columns from the right
table. For instance, if we want to get a list of all employees and their departments,
including those without departments, we would use:
SELECT employees.name, departments.name
FROM employees
LEFT JOIN departments ON employees.department_id = departments.id;

This query will include all employees, displaying NULL for department names where
no match exists.

RIGHT JOIN
A RIGHT JOIN operates similarly to a LEFT JOIN, but it returns all the rows from the
right table and the matching rows from the left table. If there’s no match, NULL
values appear for columns from the left table. Using the previous example, if we
want all departments listed with their employees, even if some departments have no
employees, the query would be:
SELECT employees.name, departments.name
FROM employees
RIGHT JOIN departments ON employees.department_id = departments.id;

This query will show all departments, with NULL for employee names where no
match exists.

FULL OUTER JOIN


A FULL OUTER JOIN combines the results of both LEFT JOIN and RIGHT JOIN. It
returns all rows from both tables, with NULLs in places where there is no match. For
instance, to get a complete list of all employees and departments, regardless of
whether they are related, the query would be:
SELECT employees.name, departments.name
FROM employees
FULL OUTER JOIN departments ON employees.department_id = departments.id;

This query captures all employees and departments, filling in NULLs where
relationships do not exist.
Understanding these JOIN types is crucial for effective data retrieval and analysis in
relational databases, as they enable users to create nuanced queries that reflect
complex data relationships.
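The contrast between INNER JOIN and LEFT JOIN is easy to see on a tiny dataset (sketched here with sqlite3; note that SQLite only added RIGHT and FULL OUTER JOIN in version 3.39, so this demo sticks to the two universally supported forms):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, department_id INTEGER);
INSERT INTO departments VALUES (1, 'Sales'), (2, 'HR');
INSERT INTO employees VALUES (1, 'Ann', 1), (2, 'Bob', NULL);
""")

# INNER JOIN: only employees with a matching department.
inner = conn.execute("""
    SELECT employees.name, departments.name
    FROM employees
    INNER JOIN departments ON employees.department_id = departments.id""").fetchall()

# LEFT JOIN: all employees; NULL where no department matches.
left = conn.execute("""
    SELECT employees.name, departments.name
    FROM employees
    LEFT JOIN departments ON employees.department_id = departments.id
    ORDER BY employees.id""").fetchall()

print(inner)  # [('Ann', 'Sales')] — Bob is dropped
print(left)   # [('Ann', 'Sales'), ('Bob', None)] — Bob kept with NULL
```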

Self-Join
A self-join is a special type of join in SQL where a table is joined with itself. This
technique is useful for querying hierarchical data or comparing rows within the same
table. In a self-join, the same table is referenced multiple times in a query, allowing
for a comparison of rows based on a common attribute.
To illustrate the concept of self-joins, consider an example of an employees table
that contains information about employees and their managers. The table might have
the following columns: employee_id, name, and manager_id. Here, the manager_id
column refers to the employee_id of the employee's manager.
Suppose we want to retrieve a list of employees along with the names of their
respective managers. We can achieve this by performing a self-join on the
employees table. The SQL query would look like this:
SELECT e1.name AS Employee, e2.name AS Manager
FROM employees e1
JOIN employees e2 ON e1.manager_id = e2.employee_id;

In this query, e1 and e2 are aliases for the employees table. The JOIN condition
specifies that the manager_id from the first instance (e1) must match the
employee_id from the second instance (e2). The result will display each employee
alongside their manager’s name.
Self-joins can also be used for more complex queries, such as finding employees
who have the same manager. This can be done by grouping the results based on the
manager_id:
SELECT e1.manager_id, e1.name AS Employee1, e2.name AS Employee2
FROM employees e1
JOIN employees e2 ON e1.manager_id = e2.manager_id
WHERE e1.employee_id <> e2.employee_id;

This query retrieves pairs of employees who share the same manager, ensuring that
an employee is not compared with themselves by including the condition WHERE
e1.employee_id <> e2.employee_id.
Self-joins are powerful tools for analyzing data within a single table, enabling users
to extract valuable insights about relationships and hierarchies that exist in their
datasets.
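The employee–manager self-join above can be verified directly (sqlite3 sketch; the three employees are invented, with Dana managing Ann and Bob):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
INSERT INTO employees VALUES (1, 'Dana', NULL), (2, 'Ann', 1), (3, 'Bob', 1);
""")

# Same table under two aliases: e1 is the employee, e2 their manager.
rows = conn.execute("""
    SELECT e1.name AS Employee, e2.name AS Manager
    FROM employees e1
    JOIN employees e2 ON e1.manager_id = e2.employee_id
    ORDER BY e1.employee_id""").fetchall()

print(rows)  # [('Ann', 'Dana'), ('Bob', 'Dana')] — Dana has no manager row
```

Because an INNER JOIN is used, Dana (whose manager_id is NULL) does not appear as an employee; a LEFT JOIN would keep her with a NULL manager.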
Aggregation Functions
SQL aggregation functions are powerful tools that allow users to perform calculations
on a set of values and return a single value. These functions are essential for
summarizing data, making them invaluable in data analysis. Some of the most
commonly used aggregation functions include COUNT, SUM, AVG, MIN, and MAX.

COUNT
The COUNT function is used to determine the number of rows that match a specified
condition. For instance, if we want to count the total number of employees in a
company, the SQL query would be:
SELECT COUNT(*) AS TotalEmployees
FROM employees;

This query returns the total number of records in the employees table.

SUM
The SUM function calculates the total of a numeric column. For example, to find the
total sales from a sales table, the query would be:
SELECT SUM(quantity) AS TotalSales
FROM sales;

This query provides the total quantity sold across all transactions.

AVG
The AVG function computes the average value of a numeric column. To calculate
the average salary of employees, the following query can be used:
SELECT AVG(salary) AS AverageSalary
FROM employees;

This will return the mean salary value for all employees in the organization.

MIN and MAX


The MIN and MAX functions are used to find the minimum and maximum values in a
dataset, respectively. For example, to find the lowest and highest salaries in the
employees table, the following queries can be executed:
SELECT MIN(salary) AS LowestSalary
FROM employees;

SELECT MAX(salary) AS HighestSalary
FROM employees;

These queries return the minimum and maximum salary values, providing insights
into the compensation range within the organization.
Grouping Data with Aggregation Functions
Aggregation functions are often used in conjunction with the GROUP BY clause to
summarize data by specific categories. For example, if we want to calculate the total
sales for each product in the sales table, we can use:
SELECT product, SUM(quantity) AS TotalSales
FROM sales
GROUP BY product;

This will produce a summary of total sales for each product, allowing for effective
comparison and analysis.
In summary, SQL aggregation functions are crucial for data analysis, enabling users
to derive meaningful insights from their datasets through efficient summarization and
calculation.
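All five aggregates can be computed in a single query, as this sqlite3 sketch shows (three invented salary rows):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [("Ann", 50000), ("Bob", 60000), ("Cid", 70000)])

# One row back: COUNT, AVG, MIN, MAX over the whole table.
total, avg, lowest, highest = conn.execute(
    "SELECT COUNT(*), AVG(salary), MIN(salary), MAX(salary) "
    "FROM employees").fetchone()

print(total, avg, lowest, highest)  # 3 60000.0 50000 70000
```

Note that AVG returns a floating-point value even over integer columns.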
Subqueries
Subqueries, also known as nested queries or inner queries, are SQL queries
embedded within another SQL query. They allow users to perform operations that
depend on the results of an outer query, enabling complex data retrieval and
manipulation in a structured manner. Subqueries can be used in various SQL
statements, including SELECT, INSERT, UPDATE, and DELETE, making them a
versatile tool in SQL programming.
The role of subqueries is primarily to filter or retrieve data based on criteria that are
derived from another query. This capability is particularly useful when the information
required cannot be obtained from a single query alone. Subqueries can be
categorized into two types: correlated and non-correlated. A non-correlated subquery
is independent of the outer query and can be executed on its own, while a correlated
subquery relies on the outer query for its values.

Example of a Non-Correlated Subquery


Consider a scenario where we want to find all employees whose salaries are above
the average salary in the company. The SQL statement could be structured as
follows:
SELECT name
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

In this example, the inner query calculates the average salary of all employees, and
the outer query retrieves the names of those employees whose salaries exceed this
average.

Example of a Correlated Subquery


Correlated subqueries are often used for more complex conditions. For instance, if
we wish to find employees who earn more than the average salary of their respective
departments, we can use a correlated subquery:
SELECT e1.name, e1.department_id
FROM employees e1
WHERE e1.salary > (SELECT AVG(e2.salary)
FROM employees e2
WHERE e1.department_id = e2.department_id);

In this example, the inner query calculates the average salary for each department
while the outer query checks each employee against their department's average
salary.
Subqueries enhance the ability to perform intricate queries, enabling data analysts
and developers to derive meaningful insights from relational databases effectively.
By leveraging subqueries, users can simplify complex logic and improve the readability of their SQL commands.
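The correlated-subquery example can be checked on concrete data (sqlite3 sketch; two invented departments, each with one above-average earner):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department_id INTEGER, salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)",
                 [("Ann", 1, 80000), ("Bob", 1, 40000),
                  ("Cid", 2, 55000), ("Dee", 2, 45000)])

# The inner query is re-evaluated per outer row, using e1.department_id.
rows = conn.execute("""
    SELECT e1.name
    FROM employees e1
    WHERE e1.salary > (SELECT AVG(e2.salary)
                       FROM employees e2
                       WHERE e1.department_id = e2.department_id)
    ORDER BY e1.name""").fetchall()

print(rows)  # [('Ann',), ('Cid',)] — each beats their own department's average
```

Department 1 averages 60000 and department 2 averages 50000, so Ann and Cid qualify while Bob and Dee do not.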
Common Table Expressions (CTE)
Common Table Expressions (CTEs) are a powerful feature in SQL that provide a
way to create temporary result sets that can be referenced within a SELECT,
INSERT, UPDATE, or DELETE statement. A CTE is defined using the WITH clause,
followed by a query that generates the result set. This feature enhances the
readability and maintainability of SQL code, particularly for complex queries.

Benefits of CTEs
One of the primary benefits of using CTEs is their ability to simplify complex queries
by breaking them down into manageable parts. Instead of nesting multiple
subqueries, which can be challenging to read and debug, CTEs allow developers to
define a query once and reference it multiple times within a single SQL statement.
This not only improves clarity but also promotes code reuse.
Additionally, CTEs can be recursive, enabling users to perform hierarchical data
queries easily. This is particularly useful for applications that require reporting on
organizational structures, such as employee hierarchies.

Syntax of CTEs
The basic syntax for creating a CTE is as follows:
WITH cte_name AS (
SELECT column1, column2, ...
FROM table_name
WHERE condition
)
SELECT *
FROM cte_name;

In this syntax, cte_name represents the name of the CTE, and the query within
parentheses defines the result set.

Example of CTE
To illustrate the use of CTEs, consider a scenario where we want to retrieve the total
sales for each product from a sales table, and subsequently filter those results to
include only products with total sales greater than 100 units. The SQL query using a
CTE would look like this:
WITH ProductSales AS (
SELECT product, SUM(quantity) AS total_sales
FROM sales
GROUP BY product
)
SELECT *
FROM ProductSales
WHERE total_sales > 100;

In this example, the ProductSales CTE calculates the total quantity sold for each
product. The outer query then filters the results to show only those products with
total sales exceeding 100 units.
CTEs enhance query organization and clarity, making it easier for developers to
construct, read, and maintain complex SQL statements efficiently.
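The ProductSales CTE runs unchanged in SQLite, which has supported WITH since version 3.8.3 (sample rows invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, quantity INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("Laptop", 70), ("Laptop", 60), ("Mouse", 40)])

# The CTE names the aggregated result set; the outer SELECT filters it.
rows = conn.execute("""
    WITH ProductSales AS (
        SELECT product, SUM(quantity) AS total_sales
        FROM sales
        GROUP BY product)
    SELECT *
    FROM ProductSales
    WHERE total_sales > 100""").fetchall()

print(rows)  # [('Laptop', 130)]
```

The same filter could not be written with WHERE inside the aggregating query, which is exactly the case HAVING or a CTE handles.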
Window Functions
Window functions in SQL are a powerful feature that allows for advanced data
analysis without the need for complex subqueries or joins. Unlike traditional
aggregate functions that return a single result for a group of rows, window functions
operate on a set of rows defined by the OVER() clause, enabling users to perform
calculations across a specified range of data while still returning the individual row
results.

Utility of Window Functions


Window functions are particularly useful in analytical queries where you want to
compute running totals, moving averages, or rank values within a partition of data.
They provide a way to analyze data in a more granular manner while preserving the
detail of each row. This capability is essential for tasks such as financial reporting,
trend analysis, and statistical computations.

Examples of Window Functions


To illustrate the utility of window functions, consider a sales table with columns
sale_id, product, quantity, and sale_date. Suppose we want to calculate the
cumulative sales for each product over time. A traditional approach might involve
multiple queries or subqueries, but with window functions, this can be achieved
simply:
SELECT
sale_id,
product,
quantity,
SUM(quantity) OVER (PARTITION BY product ORDER BY sale_date) AS cumulative_sales
FROM
sales;

In this example, the SUM() function computes the cumulative sales for each product,
partitioned by the product column and ordered by sale_date. The result will show
each sale along with its cumulative total up to that point in time.

Comparison with Traditional Aggregates


Traditional aggregate functions, such as SUM() or COUNT(), would typically group
the results, returning one row per group. For instance, if you wanted to find the total
quantity sold for each product, you would use:
SELECT
product,
SUM(quantity) AS total_quantity
FROM
sales
GROUP BY
product;

This query returns one result per product, which is useful for summary reports but
does not retain the detailed transaction information. In contrast, the window function
retains all original rows while adding the cumulative sales information, allowing for
more detailed analysis without losing context.
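The cumulative-sales query above can be executed with sqlite3 (SQLite supports window functions from version 3.25; the three sales rows are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_id INTEGER, product TEXT, "
             "quantity INTEGER, sale_date TEXT)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)",
                 [(1, "Laptop", 2, "2023-01-01"),
                  (2, "Laptop", 3, "2023-01-02"),
                  (3, "Mouse", 5, "2023-01-01")])

# Every input row is preserved; the window adds a running total per product.
rows = conn.execute("""
    SELECT product, quantity,
           SUM(quantity) OVER (PARTITION BY product ORDER BY sale_date)
               AS cumulative_sales
    FROM sales
    ORDER BY product, sale_date""").fetchall()

print(rows)  # [('Laptop', 2, 2), ('Laptop', 3, 5), ('Mouse', 5, 5)]
```

A GROUP BY version would collapse these three rows to two totals; the window function keeps all three while adding the cumulative column.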

Conclusion
Window functions enhance SQL's analytical capabilities, allowing users to perform
complex calculations over specific data partitions while maintaining the integrity of
the original dataset. This makes them invaluable for data analysts and anyone
needing to perform sophisticated data analysis in relational databases.
Analytical Functions
Analytical functions in SQL are a powerful tool that enables complex calculations
across a set of rows related to the current row, providing richer insights into data
patterns and trends. Unlike standard aggregate functions, which summarize data into
a single output row, analytical functions return results for each row within the context
of its larger dataset. This capability is essential for tasks such as calculating running
totals, ranking items, and performing moving averages.
One of the most commonly used analytical functions is ROW_NUMBER(), which
assigns a unique sequential integer to rows within a partition of a result set. For
example, if you have a table of sales records and you want to rank each sale within
its respective product category by the sale date, you can use:
SELECT
    product,
    sale_date,
    ROW_NUMBER() OVER (PARTITION BY product ORDER BY sale_date) AS sale_rank
FROM
    sales;

In this query, ROW_NUMBER() generates a rank for each sale per product, allowing
analysts to see the chronological order of sales within each category.
Another popular analytical function is SUM() used with the OVER() clause, which
calculates a cumulative total across a specified window. For example, to compute a
running total of sales quantities, you could write:
SELECT
sale_date,
product,
quantity,
SUM(quantity) OVER (ORDER BY sale_date) AS running_total
FROM
sales;

This query provides a total of all quantities sold up to each sale date, giving insight
into sales trends over time.
Moreover, the LAG() and LEAD() functions allow users to access data from the
previous or subsequent row, which is particularly useful for comparing values across
rows. For instance, to find the difference in sales quantity from one day to the next,
you could use:
SELECT
    sale_date,
    product,
    quantity,
    LAG(quantity) OVER (PARTITION BY product ORDER BY sale_date) AS previous_quantity,
    quantity - LAG(quantity) OVER (PARTITION BY product ORDER BY sale_date) AS quantity_change
FROM
    sales;
In this case, LAG(quantity) retrieves the quantity from the previous sale, enabling the
calculation of changes in sales volume day over day.
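A small sqlite3 sketch (with invented rows) shows the one detail that often surprises newcomers: the first row of each partition has no predecessor, so LAG() yields NULL there, and so does the subtraction:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (sale_date TEXT, product TEXT, quantity INTEGER);
INSERT INTO sales VALUES
    ('2023-01-01', 'Laptop', 5),
    ('2023-01-02', 'Laptop', 8),
    ('2023-01-03', 'Laptop', 6);
""")

rows = conn.execute("""
SELECT sale_date, quantity,
       LAG(quantity) OVER (PARTITION BY product ORDER BY sale_date) AS previous_quantity,
       quantity - LAG(quantity) OVER (PARTITION BY product ORDER BY sale_date) AS quantity_change
FROM sales
ORDER BY sale_date
""").fetchall()

# First row: no previous sale, so previous_quantity and quantity_change
# come back as NULL (None in Python).
```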
Overall, analytical functions enhance the capabilities of SQL for data analysis,
allowing users to perform sophisticated calculations that provide deeper insights into
their datasets without sacrificing the detail of individual records.
Views in SQL
Views in SQL are virtual tables that represent the result of a query. They allow users
to encapsulate complex SQL queries, thereby simplifying data access and
enhancing security. A view does not store data itself; instead, it dynamically retrieves
data from the underlying tables whenever accessed. This abstraction layer can
significantly streamline querying processes, as users can interact with a simplified
representation of data without needing to understand the underlying complexities.

Creating a View
Creating a view involves using the CREATE VIEW statement, followed by the view
name and the SQL query that defines the view. Here’s an example of how to create
a view:
CREATE VIEW EmployeeSales AS
SELECT e.name, e.department, SUM(s.quantity) AS TotalSales
FROM employees e
JOIN sales s ON e.employee_id = s.employee_id
GROUP BY e.name, e.department;

In this example, the EmployeeSales view summarizes total sales for each employee,
providing a clear overview of sales performance by department.

Benefits of Using Views


1. Simplified Querying: Users can access complex data through a simple view,
which reduces the need to write intricate SQL queries repeatedly. For
instance, instead of writing the JOIN and aggregation logic every time, users
can simply query the EmployeeSales view with:
SELECT * FROM EmployeeSales WHERE TotalSales > 100;

2. Enhanced Security: Views can restrict access to sensitive data by exposing


only specific columns or rows. For example, a view can be created that omits
salary information, allowing HR staff to access employee data without
revealing confidential salary details.

3. Data Abstraction: Views provide a level of abstraction, allowing changes to


the underlying database schema without affecting how users access the data.
If the table structure changes, the view can be updated independently,
ensuring that users' queries remain valid.

4. Reusability: Once created, views can be reused in multiple queries,


promoting consistency and reducing redundancy in SQL code.

Using a View
To query a view, the syntax is similar to querying a regular table. For example, to
retrieve all employees with total sales greater than 100, you would execute:
SELECT * FROM EmployeeSales WHERE TotalSales > 100;
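The full round trip (base tables, view definition, filtered query against the view) can be sketched with Python's sqlite3 module; the employee and sales rows below are made up for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (employee_id INTEGER, name TEXT, department TEXT);
CREATE TABLE sales (employee_id INTEGER, quantity INTEGER);
INSERT INTO employees VALUES (1, 'John Doe', 'Sales'), (2, 'Jane Smith', 'Marketing');
INSERT INTO sales VALUES (1, 120), (1, 30), (2, 80);

-- The view stores no data; it re-runs the JOIN/GROUP BY on every access.
CREATE VIEW EmployeeSales AS
SELECT e.name, e.department, SUM(s.quantity) AS TotalSales
FROM employees e
JOIN sales s ON e.employee_id = s.employee_id
GROUP BY e.name, e.department;
""")

top_sellers = conn.execute(
    "SELECT * FROM EmployeeSales WHERE TotalSales > 100"
).fetchall()
# Only John Doe (120 + 30 = 150) clears the threshold.
```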
Views serve as an essential tool in SQL, offering a way to simplify complex queries,
enhance security, and provide a consistent interface for data access, making them
invaluable for database management and reporting.
CASE Statements
In SQL, CASE statements provide a way to implement conditional logic within
queries, allowing users to return specific values based on varying conditions. This
functionality is especially useful for categorizing data or generating computed
columns based on certain criteria. The basic syntax of a CASE statement can be
structured as follows:
SELECT column1,
CASE
WHEN condition1 THEN result1
WHEN condition2 THEN result2
...
ELSE default_result
END AS alias_name
FROM table_name;

Applications of CASE Statements


CASE statements can be applied in various scenarios, such as data transformation,
reporting, or even as part of aggregate functions. For example, consider a sales
table that records sales data including sale_id, product, quantity, and sale_date. We
might want to categorize sales based on the quantity sold, creating sales tiers for
reporting purposes.

Example 1: Categorizing Sales Data


SELECT product,
quantity,
CASE
WHEN quantity > 100 THEN 'High Sales'
WHEN quantity BETWEEN 51 AND 100 THEN 'Medium Sales'
ELSE 'Low Sales'
END AS sales_category
FROM sales;

In this example, the query categorizes products based on the quantity sold, allowing
for straightforward analysis of sales performance.
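Run against a toy table (values invented), the categorization behaves exactly as described; a minimal sqlite3 sketch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (product TEXT, quantity INTEGER);
INSERT INTO sales VALUES ('Laptop', 150), ('Phone', 75), ('Tablet', 20);
""")

rows = conn.execute("""
SELECT product, quantity,
       CASE
           WHEN quantity > 100 THEN 'High Sales'
           WHEN quantity BETWEEN 51 AND 100 THEN 'Medium Sales'
           ELSE 'Low Sales'
       END AS sales_category
FROM sales
ORDER BY quantity DESC
""").fetchall()
# 150 -> High Sales, 75 -> Medium Sales, 20 -> Low Sales
```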

Example 2: Conditional Aggregation


CASE statements can also be utilized within aggregate functions to create
conditional summaries. For instance, if we want to calculate the total sales quantity
for each category, we can write:
SELECT product,
SUM(CASE
WHEN quantity > 100 THEN quantity
ELSE 0
END) AS total_high_sales
FROM sales
GROUP BY product;

This query aggregates only the quantities that fall into the 'High Sales' category,
offering a focused insight into the highest-performing products.
Example 3: Dynamic Pricing Strategy
Another scenario could involve adjusting prices based on sales volume. For
instance, if we want to apply a discount based on quantity sold, we could implement
a CASE statement as follows:
SELECT product,
       quantity,
       price,
       CASE
           WHEN quantity > 100 THEN price * 0.9 -- 10% discount
           WHEN quantity BETWEEN 51 AND 100 THEN price * 0.95 -- 5% discount
           ELSE price
       END AS adjusted_price
FROM sales;

In this query, the adjusted_price column dynamically calculates the price based on
the quantity sold, applying discounts accordingly.
Overall, CASE statements in SQL are powerful tools for incorporating conditional
logic within queries, enhancing the ability to analyze and report data in meaningful
ways.
Creating Tables Using CTAS
The CREATE TABLE AS SELECT (CTAS) statement is a powerful SQL command
that allows users to create a new table based on the result set of a SELECT query.
This command is particularly useful for creating a snapshot of existing data or for
generating temporary tables for further analysis. The CTAS statement combines the
creation of a new table with the insertion of data into it, simplifying the process of
table creation and data population.

Syntax of CTAS
The basic syntax for using the CTAS statement is as follows:
CREATE TABLE new_table_name AS
SELECT column1, column2, ...
FROM existing_table_name
WHERE condition;

In this syntax, new_table_name is the name of the table to be created, and the
SELECT query determines which data is copied into the new table.
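SQLite supports the same CREATE TABLE ... AS SELECT form, so the archiving scenario can be sketched with sqlite3 (sample rows invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (sale_id INTEGER, sale_date TEXT);
INSERT INTO sales VALUES (1, '2022-06-01'), (2, '2023-03-15'), (3, '2022-11-30');

-- CTAS: create the archive table and populate it in one statement.
CREATE TABLE sales_archive AS
SELECT * FROM sales
WHERE sale_date < '2023-01-01';
""")

archived = conn.execute("SELECT sale_id FROM sales_archive ORDER BY sale_id").fetchall()
# Only the two 2022 sales land in the archive.
```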

Practical Usage Scenarios


1. Data Archiving: Organizations often need to archive historical data for
reporting purposes. For instance, if a company wants to create an archive of
sales data from the previous year, the CTAS statement can be employed:
CREATE TABLE sales_archive AS
SELECT *
FROM sales
WHERE sale_date < '2023-01-01';

This query creates a new table called sales_archive containing all sales
records from prior to 2023.

2. Creating Summary Tables: When performing complex analyses, users may


want to create summary tables that aggregate data. For example, if a
business wants to generate a summary of total sales by product, the following
CTAS statement could be used:
CREATE TABLE product_sales_summary AS
SELECT product, SUM(quantity) AS total_sales
FROM sales
GROUP BY product;

This query generates a new table that holds the total sales figures for each
product, making it easier to access summarized data without repeatedly
executing complex queries.

3. Temporary Tables for Data Transformation: When processing large


datasets, it may be beneficial to create temporary tables for various
transformation steps. For example, if data needs to be cleaned or transformed
before analysis, a CTAS statement can help facilitate these operations:
CREATE TABLE cleaned_data AS
SELECT *
FROM raw_data
WHERE quality_check = 'pass';

This command creates a new table with only the records that passed a quality
check, allowing users to focus on high-quality data.

Conclusion
The CTAS statement is an effective tool for creating new tables efficiently while
simultaneously populating them with data from existing tables. By leveraging this
command, users can streamline various data management tasks, such as archiving,
summarizing, and transforming data, ultimately enhancing their workflow and
productivity in database management.
Temporary Tables
Temporary tables are a powerful feature in SQL that allows users to create tables
that exist temporarily during a session. They are particularly useful for storing
intermediate results or for data that is only needed for a short duration, such as
during complex queries or data processing tasks. These tables help streamline data
manipulation and enhance performance by avoiding repeated calculations or data
retrieval operations.

Use Cases
Temporary tables are commonly used in scenarios such as:
• Complex Data Manipulation: When performing intricate data
transformations, temporary tables can be used to hold intermediate results,
making it easier to manage the data flow.
• Staging Data for Processing: When importing or exporting data, temporary
tables can serve as staging areas where data can be cleaned or transformed
before final insertion into permanent tables.
• Simplifying Queries: For complex queries that require multiple steps,
temporary tables can simplify the SQL code by breaking down the process
into manageable parts.

Benefits
The benefits of using temporary tables include:
• Performance Improvement: By storing intermediate results in a temporary
table, you can reduce the need for repeated calculations, which can enhance
query performance.
• Isolation: Temporary tables are session-specific, meaning that they do not
interfere with the data in permanent tables. This characteristic is ideal for
testing and development.
• Ease of Use: Temporary tables allow for easier coding and maintenance by
providing a structured way to handle intermediate data.

Lifecycle of Temporary Tables


Temporary tables are created using the CREATE TEMPORARY TABLE statement
and are automatically dropped at the end of the session or when they are no longer
needed. The syntax for creating a temporary table is as follows:
CREATE TEMPORARY TABLE temp_table_name (
column1 datatype,
column2 datatype,
...
);

For example, to create a temporary table to hold sales data, you could use:
CREATE TEMPORARY TABLE temp_sales (
product_id INT,
quantity_sold INT,
sale_date DATE
);

A temporary table can also be created directly from a query, combining the
CTAS pattern with the TEMPORARY keyword:
CREATE TEMPORARY TABLE temp_employee AS
SELECT * FROM employee WHERE department = 'IT';

Example of Using Temporary Tables


After creating a temporary table, you can insert data into it using standard SQL
commands. For instance:
INSERT INTO temp_sales (product_id, quantity_sold, sale_date)
VALUES (1, 100, '2023-10-01'), (2, 200, '2023-10-02');

You can then perform operations on the temporary table just like any other table,
such as querying it to retrieve data:
SELECT * FROM temp_sales
WHERE quantity_sold > 150;
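In SQLite a temporary table is scoped to the connection, which plays the role of the session, so the whole lifecycle can be sketched with sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TEMPORARY TABLE temp_sales (
    product_id INTEGER,
    quantity_sold INTEGER,
    sale_date TEXT
);
INSERT INTO temp_sales VALUES (1, 100, '2023-10-01'), (2, 200, '2023-10-02');
""")

big_sales = conn.execute(
    "SELECT product_id FROM temp_sales WHERE quantity_sold > 150"
).fetchall()

conn.close()  # closing the connection ("ending the session") drops temp_sales
```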

Once the session ends or the temporary table is no longer needed, it is automatically
dropped, ensuring that there is no residual data lingering in the database.
Temporary tables are an invaluable tool for SQL developers and database
administrators, facilitating efficient data handling and enhancing the overall
performance of database operations.

Difference Between View, CTE, CTAS, and Temporary Table in Snowflake

Definition:
• View: a stored SQL query that does not store data.
• CTE (Common Table Expression): a temporary result set used within a query.
• CTAS (Create Table As Select): creates a table based on a SELECT statement.
• Temporary Table: a table that exists only in the session.

Persistence:
• View: permanent (unless it's a transient view).
• CTE: lasts only during query execution.
• CTAS: permanent (unless explicitly dropped).
• Temporary Table: exists only for the session.

Storage:
• View: no storage; it fetches data dynamically.
• CTE: no storage; created in memory during query execution.
• CTAS: physically stores the data in a table.
• Temporary Table: stores data for the session duration.

Use Case:
• View: reusable query logic and security (data masking).
• CTE: simplifying complex queries.
• CTAS: creating a new table from an existing dataset.
• Temporary Table: intermediate processing within a session.

Performance:
• View: slightly slower, as it queries data dynamically.
• CTE: faster for temporary use, but does not persist.
• CTAS: faster, as data is materialized.
• Temporary Table: faster, as data is stored, but only for the session.

Data Type Casting


Data type casting in SQL refers to the conversion of one data type into another. This
functionality is essential for ensuring that data is accurately processed and
compared, especially when performing operations on columns of different types.
SQL provides both implicit and explicit casting options, enabling developers to
convert data types as required for specific queries or operations.
Implicit Casting
Implicit casting occurs automatically when SQL recognizes that it can convert one
data type to another without the need for explicit instructions from the user. This
usually happens when the conversion is safe and unambiguous. For example, when
performing a mathematical operation involving an integer and a floating-point
number, SQL will automatically convert the integer to a float to ensure accuracy:
SELECT 10 + 5.5 AS Total; -- Result: 15.5

In this case, the integer 10 is implicitly cast to a float for the addition operation,
allowing the operation to complete without any errors.

Explicit Casting
Explicit casting, on the other hand, requires the user to define the desired data type
for the conversion explicitly. This is done using the CAST() or CONVERT() functions.
Explicit casting is important when the conversion might lead to data loss or when the
user needs to ensure that the data conforms to a specific format. For example:
SELECT CAST(10 AS VARCHAR(10)) AS StringValue; -- Result: '10'

Here, the integer 10 is explicitly cast to a VARCHAR, converting it into a string
representation. Similarly, using the CONVERT() function:
SELECT CONVERT(DATE, '2023-10-01') AS ConvertedDate; -- Result: 2023-10-01

In this example, a string in the format of a date is converted to a DATE data type,
allowing for proper date comparisons and operations in subsequent queries.

Importance of Data Type Casting


Casting is particularly crucial in cases where data types might conflict, such as when
comparing or combining data from different sources or tables. For example, if a
numeric value is being compared to a string representation of a number, explicit
casting must be performed to avoid errors:
SELECT *
FROM products
WHERE price = CAST('100.00' AS DECIMAL(10, 2)); -- Ensures proper comparison

Proper use of data type casting enhances data integrity and accuracy in SQL
queries, allowing developers to manipulate and analyze data effectively across
various data types.
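The implicit and explicit cases above can be checked with sqlite3; note that SQLite uses type affinities rather than strict column types, so TEXT and REAL stand in here for VARCHAR and DECIMAL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Implicit: the integer is promoted to a float in mixed arithmetic.
total = conn.execute("SELECT 10 + 5.5").fetchone()[0]

# Explicit: integer to string (SQLite's TEXT plays the role of VARCHAR).
as_text = conn.execute("SELECT CAST(10 AS TEXT)").fetchone()[0]

# Explicit: numeric string to a real number, enabling a proper numeric comparison.
as_real = conn.execute("SELECT CAST('100.00' AS REAL)").fetchone()[0]
```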

Working with Date and Time in SQL


Handling date and time data types in SQL is essential for many applications,
particularly those involving scheduling, logging, and temporal analyses. SQL
provides various data types and functions to manipulate date and time values
effectively. The most commonly used data types for date and time are DATE, TIME,
and DATETIME.
Date and Time Data Types
• DATE: Stores date values (year, month, day).
CREATE TABLE events (
event_id INT,
event_date DATE
);

• TIME: Stores time values (hours, minutes, seconds).


CREATE TABLE schedules (
schedule_id INT,
start_time TIME
);

• DATETIME: Combines date and time into a single data type.


CREATE TABLE appointments (
appointment_id INT,
appointment_time DATETIME
);

Date Functions
SQL offers numerous functions for manipulating date and time data:
• GETDATE(): Returns the current date and time.
SELECT GETDATE() AS CurrentDateTime;

• DATEADD(): Adds a specified time interval to a date.


SELECT DATEADD(DAY, 10, '2023-10-01') AS NewDate; -- Results in '2023-10-11'

• DATEDIFF(): Returns the difference between two dates.


SELECT DATEDIFF(DAY, '2023-10-01', '2023-10-11') AS Difference; -- Results in 10

• FORMAT(): Formats a date value based on a specified format.

SELECT FORMAT(GETDATE(), 'yyyy-MM-dd') AS FormattedDate; -- e.g. '2023-10-11'

Practical Code Snippets


Here are some practical code snippets that demonstrate how to work with date and
time in SQL:
1. Inserting Date Values:
INSERT INTO events (event_id, event_date)
VALUES (1, '2023-10-15');

2. Querying Records by Date:


SELECT * FROM events
WHERE event_date > '2023-10-01';

3. Updating Date Values:


UPDATE events
SET event_date = DATEADD(MONTH, 1, event_date)
WHERE event_id = 1;

4. Using Current Date in Queries:


SELECT * FROM appointments
WHERE appointment_time >= GETDATE();

5. Extracting Parts of a Date:


SELECT
YEAR(event_date) AS EventYear,
MONTH(event_date) AS EventMonth,
DAY(event_date) AS EventDay
FROM events;
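The functions above are SQL Server flavored; most other engines expose the same operations under different names. For instance, SQLite covers the DATEADD/DATEDIFF/YEAR-style tasks with date(), julianday(), and strftime(), which makes the snippets easy to verify with sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DATEADD(DAY, 10, '2023-10-01') equivalent: date() with a modifier.
new_date = conn.execute("SELECT date('2023-10-01', '+10 days')").fetchone()[0]

# DATEDIFF(DAY, ...) equivalent: whole-day difference via julianday().
diff = conn.execute(
    "SELECT CAST(julianday('2023-10-11') - julianday('2023-10-01') AS INTEGER)"
).fetchone()[0]

# YEAR(...) equivalent: extract a date part with strftime().
year = conn.execute(
    "SELECT CAST(strftime('%Y', '2023-10-15') AS INTEGER)"
).fetchone()[0]
```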

By mastering date and time handling in SQL, users can perform sophisticated
temporal analyses and maintain accurate records crucial for various applications.
Stored Procedures in SQL

Stored Procedures in SQL Server


A Stored Procedure is a precompiled collection of SQL statements that are stored in a
database and can be executed as a single unit. Stored procedures help improve
performance, maintainability, and security by encapsulating business logic within the
database.
Instead of writing and sending multiple SQL queries repeatedly, stored procedures allow
developers to define, store, and execute complex SQL operations efficiently.

1. Purpose of Stored Procedures


Stored procedures simplify and optimize database interactions by enabling:
✅ Code Reusability – Define logic once and reuse it multiple times.
✅ Parameter Handling – Accept input parameters to perform dynamic operations.
✅ Security Enforcement – Restrict direct table access and enhance database security.
✅ Improved Maintainability – Encapsulate complex logic in a modular format.
✅ Performance Optimization – Reduce network traffic and leverage query caching.
Example Scenario: Without vs. With Stored Procedure
Imagine a scenario where you frequently need to fetch employee details from a database
based on Employee ID.
🔴 Without a Stored Procedure (Repetitive Queries in Application Code):
SELECT * FROM Employees WHERE EmployeeID = 101;
SELECT * FROM Employees WHERE EmployeeID = 102;
SELECT * FROM Employees WHERE EmployeeID = 103;
This results in multiple database requests and increased network traffic.
✅ With a Stored Procedure:
CREATE PROCEDURE usp_GetEmployeeDetails
@EmployeeID INT
AS
BEGIN
SELECT * FROM Employees WHERE EmployeeID = @EmployeeID;
END;
Now, whenever you need to fetch an employee’s details, you execute the stored procedure:
EXEC usp_GetEmployeeDetails @EmployeeID = 101;
This reduces network traffic and makes the query more maintainable.

2. Enhancing SQL Server Performance with Stored Procedures


Stored procedures significantly improve database performance through multiple
optimizations:
1️⃣ Reduced Network Traffic
Instead of sending multiple SQL queries, the application sends a single stored procedure
call, minimizing data exchange between the application and the database.
🔹 Example – Without Stored Procedure (Multiple Queries Sent to SQL Server):
SELECT * FROM Orders WHERE CustomerID = 1;
SELECT * FROM Orders WHERE CustomerID = 2;
SELECT * FROM Orders WHERE CustomerID = 3;
🔹 Example – With Stored Procedure (Single Execution Call):
EXEC usp_GetOrdersByCustomer @CustomerID = 1;
✅ Benefit: The application only sends one command, improving speed and efficiency.

2️⃣ Execution Plan Reuse


SQL Server caches execution plans of stored procedures, reducing the need to recompile
SQL statements every time they run.
🔹 Example – Without Stored Procedure (Dynamic SQL Query Compilation Every Time):
DECLARE @customerID INT = 5;
EXEC('SELECT * FROM Orders WHERE CustomerID = ' + CAST(@customerID AS VARCHAR(10)));
🔹 Example – With Stored Procedure (Reusing Cached Execution Plan):
CREATE PROCEDURE usp_GetOrdersByCustomer @CustomerID INT
AS
BEGIN
SELECT * FROM Orders WHERE CustomerID = @CustomerID;
END;
✅ Benefit: The stored procedure’s execution plan is cached, making future executions
faster.

3️⃣ Optimization for Performance


Stored procedures allow SQL Server to optimize query execution by selecting the most
efficient execution plan based on provided parameters.
🔹 Example – Optimized Query Execution
CREATE PROCEDURE usp_GetTopOrders
@TopN INT
AS
BEGIN
SELECT TOP (@TopN) * FROM Orders ORDER BY OrderDate DESC;
END;
Now, executing the procedure with different values:
EXEC usp_GetTopOrders @TopN = 5;
EXEC usp_GetTopOrders @TopN = 10;
✅ Benefit: SQL Server optimizes the execution plan based on the input parameters.

3. Guidelines for Writing Efficient Stored Procedures


Follow these best practices when designing stored procedures:
1️⃣ Use Descriptive Names
Stored procedure names should be clear and meaningful.
❌ Bad Name:
CREATE PROCEDURE sp1;
✅ Good Name:
CREATE PROCEDURE usp_GetEmployeeDetails;
Naming Convention:
 usp_ → User-defined Stored Procedure
 sp_ → Avoid this prefix (reserved for system procedures)

2️⃣ Validate Input Parameters


Always validate parameters before executing SQL statements to prevent SQL injection.
🔹 Example – Safe Input Validation:
CREATE PROCEDURE usp_GetEmployeeDetails
@EmployeeID INT
AS
BEGIN
IF @EmployeeID IS NULL
BEGIN
PRINT 'Invalid Employee ID';
RETURN;
END;
SELECT * FROM Employees WHERE EmployeeID = @EmployeeID;
END;
✅ Benefit: Prevents execution with NULL or invalid values.

3️⃣ Keep Stored Procedures Modular and Simple


Break complex procedures into smaller, reusable ones.
🔹 Example – Modular Design:
CREATE PROCEDURE usp_GetCustomerOrders @CustomerID INT
AS
BEGIN
SELECT * FROM Orders WHERE CustomerID = @CustomerID;
END;
Instead of writing a monolithic procedure, split logic into smaller procedures.

4️⃣ Implement Error Handling


Use TRY...CATCH blocks to handle errors gracefully.
🔹 Example – Error Handling in Stored Procedures:
CREATE PROCEDURE usp_UpdateEmployeeSalary
@EmployeeID INT,
@NewSalary DECIMAL(10,2)
AS
BEGIN
BEGIN TRY
UPDATE Employees SET Salary = @NewSalary WHERE EmployeeID = @EmployeeID;
END TRY
BEGIN CATCH
PRINT 'Error occurred: ' + ERROR_MESSAGE();
END CATCH
END;
✅ Benefit: Prevents application crashes due to unhandled errors.

5️⃣ Use Comments & Documentation


Add comments to improve code readability.
🔹 Example – Adding Comments:
CREATE PROCEDURE usp_GetEmployeeDetails
@EmployeeID INT
AS
BEGIN
-- Fetch employee details based on EmployeeID
SELECT * FROM Employees WHERE EmployeeID = @EmployeeID;
END;
✅ Benefit: Helps future developers understand the procedure.

4. Executing Stored Procedures


Stored procedures can be executed using the EXEC command.
Basic Execution
EXEC usp_GetEmployeeDetails @EmployeeID = 123;
Executing Without Parameters
If a stored procedure does not have parameters:
EXEC usp_ListAllEmployees;
Using Output Parameters
Stored procedures can return output values:
CREATE PROCEDURE usp_GetEmployeeCount @TotalCount INT OUTPUT
AS
BEGIN
SELECT @TotalCount = COUNT(*) FROM Employees;
END;
Executing the procedure:
DECLARE @Count INT;
EXEC usp_GetEmployeeCount @TotalCount = @Count OUTPUT;
PRINT @Count;
✅ Benefit: Allows stored procedures to return values directly to the calling program.

5. Summary

• Performance: faster execution via plan caching
• Reusability: write once, use multiple times
• Security: restrict direct table access
• Error Handling: prevents unexpected failures
• Maintainability: modular and structured logic

Final Thoughts
 ✅ Use stored procedures to improve SQL Server efficiency.
 ✅ Follow best practices like input validation, modular design, and error handling.
 ✅ Stored procedures help with scalability, security, and performance tuning.

SQL Practice and Exercises


Creating and Inserting Data into Tables

Creating Tables

CREATE TABLE employees (

employee_id INT PRIMARY KEY,

name VARCHAR(100),

department VARCHAR(50),

salary DECIMAL(10,2),

manager_id INT NULL

);

CREATE TABLE customers (

customer_id INT PRIMARY KEY,

name VARCHAR(100),

email VARCHAR(100)

);

CREATE TABLE orders (

order_id INT PRIMARY KEY,

customer_id INT,

order_date DATE,

FOREIGN KEY (customer_id) REFERENCES customers(customer_id)

);

CREATE TABLE products (

product_id INT PRIMARY KEY,

product_name VARCHAR(100),

price DECIMAL(10,2)

);

CREATE TABLE order_items (

order_item_id INT PRIMARY KEY,

order_id INT,
product_id INT,

quantity INT,

FOREIGN KEY (order_id) REFERENCES orders(order_id),

FOREIGN KEY (product_id) REFERENCES products(product_id)

);

Inserting Data

INSERT INTO employees VALUES (1, 'John Doe', 'Sales', 60000, NULL);

INSERT INTO employees VALUES (2, 'Jane Smith', 'Marketing', 70000, 1);

INSERT INTO employees VALUES (3, 'Bob Brown', 'Sales', 50000, 1);

INSERT INTO customers VALUES (1, 'Alice Johnson', '[email protected]');

INSERT INTO customers VALUES (2, 'Charlie Davis', '[email protected]');

INSERT INTO orders VALUES (1, 1, '2024-01-01');

INSERT INTO orders VALUES (2, 2, '2024-01-02');

INSERT INTO products VALUES (1, 'Laptop', 1200.00);

INSERT INTO products VALUES (2, 'Smartphone', 800.00);

INSERT INTO order_items VALUES (1, 1, 1, 2);

INSERT INTO order_items VALUES (2, 2, 2, 1);

Beginner Exercises

1. Basic SELECT Query: Retrieve all columns from the employees table.

SELECT * FROM employees;

2. Filtering Data with WHERE Clause: Retrieve all employees whose


department is 'Sales'.

SELECT * FROM employees WHERE department = 'Sales';

3. Sorting Results: List all products ordered by price in descending


order.

SELECT * FROM products ORDER BY price DESC;

Intermediate Exercises
4. Aggregation with GROUP BY: Find the total number of orders placed by
each customer.

SELECT customer_id, COUNT(order_id) AS total_orders

FROM orders

GROUP BY customer_id;

5. Using JOINs: List all orders along with the customer names.

SELECT orders.order_id, customers.name

FROM orders

JOIN customers ON orders.customer_id = customers.customer_id;

6. Subqueries: Find the names of employees whose salary is greater than


the average salary.

SELECT name

FROM employees

WHERE salary > (SELECT AVG(salary) FROM employees);

Advanced Exercises

7. Using CASE Statements: Categorize employees based on their salary.

SELECT name,

CASE

WHEN salary > 70000 THEN 'High'

WHEN salary BETWEEN 40000 AND 70000 THEN 'Medium'

ELSE 'Low'

END AS salary_category

FROM employees;

8. CTE for Hierarchical Data: List all employees and their managers.

WITH EmployeeHierarchy AS (

SELECT employee_id, name, manager_id

FROM employees

WHERE manager_id IS NULL

UNION ALL

SELECT e.employee_id, e.name, e.manager_id

FROM employees e

INNER JOIN EmployeeHierarchy eh ON e.manager_id = eh.employee_id


)

SELECT * FROM EmployeeHierarchy;

(Note: the anchor member selects the top-level employees with no manager; the
recursive member then joins back to the CTE via UNION ALL to pull in each
employee reporting to someone already in the hierarchy.)

9. Window Functions for Ranking: Rank products based on their sales in


descending order.

SELECT p.product_id, p.product_name,
       SUM(oi.quantity) AS total_quantity,
       RANK() OVER (ORDER BY SUM(oi.quantity) DESC) AS sales_rank
FROM products p
JOIN order_items oi ON p.product_id = oi.product_id
GROUP BY p.product_id, p.product_name
ORDER BY sales_rank;

(Note: the final ORDER BY is optional; ordering by sales_rank simply presents
the ranked rows in order.)

Additional Interview Questions

10. How do you delete duplicate records from a table?

(Two common approaches, covered in detail below:
1. CTE with ROW_NUMBER(), then DELETE
2. DELETE with GROUP BY and MIN())

11. How do you fetch the 3rd highest salary in each department?

(Hint: use DENSE_RANK() in either a subquery or a CTE. DENSE_RANK() is
recommended over RANK() because it leaves no gaps when salaries tie.)

SELECT department, salary

FROM (

SELECT department, salary,

DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC)


AS rnk

FROM employees

) temp

WHERE rnk = 3;

WITH RankedSalaries AS (
    SELECT
        department,
        name,
        salary,
        DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS rnk
    FROM employees
)
SELECT department, name, salary
FROM RankedSalaries
WHERE rnk = 3;

12. Write a query to find the Nth highest salary. (There is no
department here, hence no PARTITION BY.)

WITH SalaryRank AS (
    SELECT salary,
           DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk
    FROM employees
)
SELECT salary
FROM SalaryRank
WHERE rnk = N;

SELECT DISTINCT salary

FROM employees

ORDER BY salary DESC

LIMIT 1 OFFSET N-1;

13. Write a query to fetch all employee details where the employee
ID is even.

SELECT * FROM employees WHERE MOD(employee_id, 2) = 0;

Here are different approaches to delete duplicate records from each table.
Each approach uses a different SQL technique, so you can choose the most
efficient one based on your database system.
1. Deleting Duplicates from employees Using ROW_NUMBER() (Common for
PostgreSQL, SQL Server, MySQL 8+)

This method assigns a row number to each duplicate and deletes those with
row numbers greater than 1.

WITH CTE AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY name, department, salary
                              ORDER BY employee_id) AS rn
    FROM employees
)
DELETE FROM employees
WHERE employee_id IN (SELECT employee_id FROM CTE WHERE rn > 1);

✅ Best for: SQL Server, PostgreSQL, MySQL 8+
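The CTE + ROW_NUMBER() pattern also works in SQLite (3.25+), so the whole delete can be exercised in a few lines; the duplicate rows are fabricated for the demo:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (employee_id INTEGER, name TEXT, department TEXT, salary REAL);
INSERT INTO employees VALUES
    (1, 'John Doe', 'Sales', 50000),
    (2, 'John Doe', 'Sales', 50000),  -- exact duplicate of row 1
    (3, 'Jane Smith', 'HR', 60000);
""")

# Keep the lowest employee_id in each duplicate group, delete the rest.
conn.execute("""
WITH CTE AS (
    SELECT employee_id,
           ROW_NUMBER() OVER (PARTITION BY name, department, salary
                              ORDER BY employee_id) AS rn
    FROM employees
)
DELETE FROM employees
WHERE employee_id IN (SELECT employee_id FROM CTE WHERE rn > 1)
""")

remaining = [r[0] for r in conn.execute("SELECT employee_id FROM employees ORDER BY employee_id")]
# Row 2, the duplicate, is gone; rows 1 and 3 remain.
```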

2. Deleting Duplicates from customers Using GROUP BY and MIN() (Simple


Approach)

DELETE FROM customers

WHERE customer_id NOT IN (

SELECT MIN(customer_id)

FROM customers

GROUP BY name, email

);

✅ Best for: Any database system (MySQL, SQL Server, PostgreSQL, Oracle)

3. Deleting Duplicates from orders Using EXISTS (Performance-Optimized)

DELETE FROM orders o1

WHERE EXISTS (

SELECT 1 FROM orders o2

WHERE o1.customer_id = o2.customer_id

AND o1.order_date = o2.order_date

AND o1.order_id > o2.order_id

);

✅ Best for: Large tables where performance is a concern


4. Deleting Duplicates from products Using JOIN (MySQL & PostgreSQL
Friendly)

DELETE p1 FROM products p1

JOIN products p2

ON p1.product_name = p2.product_name

AND p1.price = p2.price

AND p1.product_id > p2.product_id;

✅ Best for: MySQL, PostgreSQL

5. Deleting Duplicates from order_items Using TEMP TABLE (For Large Tables)

CREATE TABLE temp_order_items AS

SELECT * FROM order_items

WHERE order_item_id IN (

SELECT MIN(order_item_id) FROM order_items GROUP BY order_id,


product_id, quantity

);

DROP TABLE order_items;

ALTER TABLE temp_order_items RENAME TO order_items;

✅ Best for: Large datasets where DELETE operations are slow

6. Deleting Duplicates from employees Using DELETE with a Correlated EXISTS

DELETE FROM employees e1

WHERE EXISTS (

SELECT 1 FROM employees e2

WHERE e1.name = e2.name

AND e1.department = e2.department

AND e1.salary = e2.salary

AND e2.employee_id < e1.employee_id

);

This deletes every row for which an identical row with a lower employee_id
exists, keeping only the earliest copy of each duplicate group.

✅ Best for: Performance-optimized deletion, works in SQL Server, MySQL,


PostgreSQL
7. Deleting Duplicates from customers Using DISTINCT with INSERT (Recreating the Table)

CREATE TABLE customers_temp AS
SELECT DISTINCT * FROM customers;

DROP TABLE customers;

ALTER TABLE customers_temp RENAME TO customers;

✅ Best for: Retaining only unique records and resetting the table (indexes and constraints must be recreated afterwards).

8. Deleting Duplicates from orders Using DELETE with LIMIT (Batch Deletion in MySQL)

DELETE FROM orders
WHERE order_id IN (
    SELECT order_id FROM (
        SELECT order_id,
               ROW_NUMBER() OVER (
                   PARTITION BY customer_id, order_date
                   ORDER BY order_id
               ) AS rn
        FROM orders
    ) AS temp
    WHERE rn > 1
) LIMIT 1000;

✅ Best for: MySQL (batch deletion to prevent locking)

Summary: Best Method for Each Table

Table        Best Approach
employees    ROW_NUMBER() (CTE-based deletion)
customers    GROUP BY + MIN()
orders       EXISTS (performance-optimized)
products     JOIN (MySQL/PostgreSQL friendly)
order_items  TEMP TABLE (large datasets)

JOIN
Let's explore all types of joins between T1 and T2 using Snowflake SQL.

Given Tables:

T1 (ID): 1, 2, 2, 3, NULL
T2 (ID): 2, 3, 3, 4, NULL

1. INNER JOIN (Matches only common values)


SELECT T1.ID AS T1_ID, T2.ID AS T2_ID
FROM T1
INNER JOIN T2
ON T1.ID = T2.ID;
🔹 Result (4 rows):
T1_ID T2_ID
2 2
2 2
3 3
3 3
🔹 Explanation:
 Matches 2 (twice from T1) with 2 in T2.
 Matches 3 from T1 with 3 (twice from T2).
 NULL values don’t match in an INNER JOIN.

2. LEFT JOIN (All from T1, matching from T2)


SELECT T1.ID AS T1_ID, T2.ID AS T2_ID
FROM T1
LEFT JOIN T2
ON T1.ID = T2.ID;
🔹 Result (6 rows):
T1_ID T2_ID
1 NULL
2 2
2 2
3 3
3 3
NULL NULL
🔹 Explanation:
 1 has no match, so NULL appears in T2_ID.
 2 and 3 match normally.
 The NULL row from T1 finds no match (NULL never equals NULL in SQL), so its T2_ID is also NULL.

3. RIGHT JOIN (All from T2, matching from T1)


SELECT T1.ID AS T1_ID, T2.ID AS T2_ID
FROM T1
RIGHT JOIN T2
ON T1.ID = T2.ID;
🔹 Result (6 rows):
T1_ID T2_ID
2 2
2 2
3 3
3 3
NULL 4
NULL NULL
🔹 Explanation:
 4 from T2 has no match, so NULL appears in T1_ID.
 The NULL row from T2 finds no match (NULL never equals NULL), so NULL appears in T1_ID.

4. FULL OUTER JOIN (All from both tables, matching where possible)
SELECT T1.ID AS T1_ID, T2.ID AS T2_ID
FROM T1
FULL OUTER JOIN T2
ON T1.ID = T2.ID;
🔹 Result (8 rows):
T1.ID T2.ID Reason for Match?
1 NULL No match in T2
2 2 Matches
2 2 Matches
3 3 Matches
3 3 Matches
NULL NULL T1’s NULL row does not match anything but appears in FULL JOIN
NULL NULL T2’s NULL row does not match anything but appears in FULL JOIN
NULL 4 No match in T1

🔹 Explanation:
 Includes all values from both tables.
 1 (T1) and 4 (T2) have no match, so NULL appears.
 The NULL rows from T1 and T2 never match each other; each appears once as an unmatched row.

5. CROSS JOIN (All possible combinations)


SELECT T1.ID AS T1_ID, T2.ID AS T2_ID
FROM T1
CROSS JOIN T2;
🔹 Result: (5 rows from T1 × 5 rows from T2 = 25 rows)
T1_ID T2_ID
1 2
1 3
1 3
1 4
1 NULL
2 2
2 3
2 3
2 4
2 NULL
2 2
2 3
2 3
2 4
2 NULL
3 2
3 3
3 3
3 4
3 NULL
NULL 2
NULL 3
NULL 3
NULL 4
NULL NULL
🔹 Explanation:
 Every row in T1 joins with every row in T2.
 25 rows total (5 × 5).

6. ANTI JOIN (Records in T1 but NOT in T2)


SELECT T1.ID
FROM T1
LEFT JOIN T2
ON T1.ID = T2.ID
WHERE T2.ID IS NULL;
🔹 Result (2 rows):
T1_ID
1
NULL
🔹 Explanation:
 Returns 1 from T1, since it's not in T2.
 Also returns T1's NULL row: NULL never matches any value, so it fails the join and survives the IS NULL filter.

7. SEMI JOIN (Records in T1 that exist in T2, but no duplicates)


SELECT DISTINCT T1.ID
FROM T1
INNER JOIN T2
ON T1.ID = T2.ID;
🔹 Result:
T1_ID
2
3
🔹 Explanation:
 2 and 3 exist in both tables.
 No duplicate values are returned.
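All of the join results above can be verified against the given T1/T2 data with Python's sqlite3. RIGHT and FULL OUTER JOIN require SQLite 3.39+, so this sketch counts only the widely supported variants:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (ID INTEGER);
    CREATE TABLE T2 (ID INTEGER);
    INSERT INTO T1 VALUES (1), (2), (2), (3), (NULL);
    INSERT INTO T2 VALUES (2), (3), (3), (4), (NULL);
""")

def row_count(sql):
    return conn.execute(sql).fetchone()[0]

inner_rows = row_count("SELECT COUNT(*) FROM T1 INNER JOIN T2 ON T1.ID = T2.ID")  # 4
left_rows  = row_count("SELECT COUNT(*) FROM T1 LEFT JOIN T2 ON T1.ID = T2.ID")   # 6
cross_rows = row_count("SELECT COUNT(*) FROM T1 CROSS JOIN T2")                   # 25

# Anti join: T1 rows with no partner in T2 (includes T1's NULL row,
# because NULL never equals anything).
anti_rows = row_count(
    "SELECT COUNT(*) FROM T1 LEFT JOIN T2 ON T1.ID = T2.ID WHERE T2.ID IS NULL"
)  # 2

# Semi join: distinct T1 values that exist in T2.
semi_rows = row_count(
    "SELECT COUNT(DISTINCT T1.ID) FROM T1 INNER JOIN T2 ON T1.ID = T2.ID"
)  # 2
```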

Summary of Joins
Join Type    Includes NULLs?               Matches Duplicates?
INNER JOIN   No                            Yes
LEFT JOIN    Yes (unmatched T1 rows)       Yes
RIGHT JOIN   Yes (unmatched T2 rows)       Yes
FULL JOIN    Yes (from both)               Yes
CROSS JOIN   Yes (NULL rows combine too)   Yes (all combinations)

INNER JOIN – match each value in A to each matching value in B; non-matching rows (including NULLs) are dropped from both sides.
LEFT JOIN – all matches as in the inner join, plus each non-matching row from A paired with NULL.
RIGHT JOIN – the same, mirrored: every row from B is kept.
CROSS JOIN – number of rows in A × number of rows in B.
FULL OUTER JOIN – inner join + the extra rows a left join adds + the extra rows a right join adds.

Data Engineering Interview Questions and Answers

1. What is the difference between WHERE and HAVING?

Answer:

 WHERE is used to filter records before any groupings are made.

 HAVING is used to filter records after grouping, typically on aggregates like COUNT, SUM, AVG.

Example: Query to fetch departments with more than 10 active employees:

SELECT DEPT_NO, COUNT(*)
FROM EMP
WHERE EMP_STS = 'A'
GROUP BY DEPT_NO
HAVING COUNT(*) > 10;
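A quick sqlite3 sketch of the query above, with invented EMP rows, showing WHERE trimming rows before grouping and HAVING trimming groups afterwards:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMP (EMP_ID INTEGER, DEPT_NO INTEGER, EMP_STS TEXT)")

# Dept 10: 12 active + 1 inactive employee; dept 20: 5 active employees.
rows = (
    [(i, 10, "A") for i in range(12)]
    + [(100 + i, 20, "A") for i in range(5)]
    + [(200, 10, "I")]
)
conn.executemany("INSERT INTO EMP VALUES (?, ?, ?)", rows)

# WHERE removes the inactive row first; HAVING then drops dept 20 (only 5 active).
result = conn.execute("""
    SELECT DEPT_NO, COUNT(*)
    FROM EMP
    WHERE EMP_STS = 'A'
    GROUP BY DEPT_NO
    HAVING COUNT(*) > 10
""").fetchall()
print(result)  # [(10, 12)]
```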


2. Can you use both WHERE and HAVING in a single SQL statement?

Answer: Yes, WHERE filters records before grouping, while HAVING filters
grouped records.

3. How do you delete duplicate records from a table?

Answer: Using ROWID:

DELETE FROM TABLE_NAME
WHERE ROWID NOT IN (
    SELECT MAX(ROWID) FROM TABLE_NAME GROUP BY Key1, Key2
);

If ROWID is not available, use RANK() or a temporary table approach:

1. Create a temp table with unique records.

2. Delete data from the actual table.

3. Insert unique records back.

4. Drop the temp table.

4. What is the difference between UNION and UNION ALL?

Answer:

 UNION removes duplicate records.

 UNION ALL keeps all records (better performance, since no duplicate elimination is performed).

Note: Column lists and their order must be the same in both queries.
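A minimal demonstration of the difference, using sqlite3 with throwaway tables a and b:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (x INTEGER);
    CREATE TABLE b (x INTEGER);
    INSERT INTO a VALUES (1), (2), (2);
    INSERT INTO b VALUES (2), (3);
""")

# UNION deduplicates the combined result; UNION ALL keeps every row.
union_rows = conn.execute(
    "SELECT x FROM a UNION SELECT x FROM b ORDER BY x"
).fetchall()      # [(1,), (2,), (3,)]
union_all_rows = conn.execute(
    "SELECT x FROM a UNION ALL SELECT x FROM b ORDER BY x"
).fetchall()      # [(1,), (2,), (2,), (2,), (3,)]
```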

6. What is the difference between COALESCE and DECODE?

Answer:

 COALESCE(expr1, expr2, ..., exprN): Returns the first non-null expression.

 DECODE(expr, search1, result1 [, search2, result2, ..., default]): Works like CASE, returning a value based on matches.
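COALESCE is portable and easy to try; DECODE is Oracle-only, so the sketch below substitutes the equivalent CASE expression (sqlite3, hypothetical values):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# COALESCE returns the first non-NULL argument.
first_non_null = conn.execute(
    "SELECT COALESCE(NULL, NULL, 'fallback')"
).fetchone()[0]   # 'fallback'

# DECODE('B', 'A', 'alpha', 'B', 'beta', 'other') in Oracle is equivalent
# to this portable simple CASE expression.
decoded = conn.execute(
    "SELECT CASE 'B' WHEN 'A' THEN 'alpha' WHEN 'B' THEN 'beta' ELSE 'other' END"
).fetchone()[0]   # 'beta'
```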

7. What is the difference between Primary Key, Unique Key, and Surrogate
Key?

Answer:
 Primary Key: Unique + Not Null (only one per table; can be composite).

 Unique Key: Ensures uniqueness but allows nulls.

 Surrogate Key: A system-generated key with no business meaning (e.g., an auto-increment ID).

8. How do you convert a timestamp to date in Snowflake?

Answer:

SELECT TO_DATE('2022-05-22 00:00:00');        -- 2022-05-22

SELECT YEAR(TO_DATE('2022-05-22 00:00:00'));  -- 2022

SELECT MONTH(TO_DATE('2022-05-22 00:00:00')); -- 5

SELECT DAY(TO_DATE('2022-05-22 00:00:00'));   -- 22

9. How do you extract only numeric characters from a string in Snowflake?

Answer:

SELECT TRIM(REGEXP_REPLACE(string, '[^[:digit:]]', '')) AS NumericValue

FROM (SELECT 'Area code for employee ID 12345 is 6789.' AS string) a;
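Outside the database, the same extraction is a one-liner with Python's re module (an equivalent of the Snowflake call above, not its implementation):

```python
import re

# Strip every non-digit character, mirroring REGEXP_REPLACE(string, '[^[:digit:]]', '').
text = "Area code for employee ID 12345 is 6789."
numeric_value = re.sub(r"[^0-9]", "", text)
print(numeric_value)  # 123456789
```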

10. What are some SQL performance tuning techniques?

Answer:

 Use proper indexes.

 Define appropriate partitioning keys.

 Avoid SELECT *, fetch only required columns.

 Use UNION ALL instead of UNION when duplicates are acceptable.

 Prefer Common Table Expressions (CTEs) over nested subqueries.

 Avoid cross joins and use appropriate joins.

 Collect missing table statistics.

 Use Materialized Views where necessary.

 Analyze execution plans to identify bottlenecks.

11. What is the difference between NVL and NVL2?

Answer:

 NVL(expr, replace_value): Returns replace_value if expr is NULL.


 NVL2(expr, value_if_not_null, value_if_null): Checks expr and returns
the second parameter if not null, otherwise the third parameter.

12. What are DML commands in SQL?

Answer: DML (Data Manipulation Language) includes:

 INSERT – Adds records.

 UPDATE – Modifies records.

 DELETE – Removes records.

 MERGE – Combines INSERT, UPDATE, and DELETE.

13. What is the difference between VARCHAR and NVARCHAR?

Answer:

 VARCHAR: Stores non-Unicode characters (uses 1 byte per character).

 NVARCHAR: Stores Unicode characters (uses 2 bytes per character).

14. What is an index in a database?

Answer: An index is a database object that improves query performance by providing a fast lookup mechanism.

 Clustered Index: Determines the physical order of rows.

 Non-clustered Index: A separate structure that points to data.

15. What is the difference between CASE and DECODE?

Answer:

 CASE is more flexible and supports complex conditions.

 DECODE is an Oracle-specific function that works like an inline IF-THEN-ELSE.

16. What are the types of Slowly Changing Dimensions (SCDs)?

Answer:

 SCD Type 1: Overwrites old data with new data.

 SCD Type 2: Maintains history by adding a new row.

 SCD Type 3: Maintains partial history using a separate column.


17. How is a Data Warehouse different from a Database?

Answer:

 Database: Optimized for transaction processing (OLTP).

 Data Warehouse: Optimized for analytical queries (OLAP); integrates data from multiple sources.

18. What is the difference between RANK() and DENSE_RANK()?

Answer:

 RANK(): Skips ranking numbers when duplicates exist.

 DENSE_RANK(): Does not skip ranking numbers.

RANK – 1 2 2 4

DENSE RANK – 1 2 2 3
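The two ranking functions can be compared directly with sqlite3 (3.25+); the score values below are chosen to reproduce the 1 2 2 4 / 1 2 2 3 example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (val INTEGER)")
conn.executemany("INSERT INTO scores VALUES (?)", [(10,), (20,), (20,), (30,)])

rows = conn.execute("""
    SELECT val,
           RANK()       OVER (ORDER BY val) AS rnk,
           DENSE_RANK() OVER (ORDER BY val) AS drnk
    FROM scores
    ORDER BY val
""").fetchall()

ranks       = [r[1] for r in rows]  # [1, 2, 2, 4]  RANK skips after ties
dense_ranks = [r[2] for r in rows]  # [1, 2, 2, 3]  DENSE_RANK does not
```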
Thank You
