SQL for Data Analysis Cheat Sheet
Last Updated :
24 Feb, 2025
SQL (Structured Query Language) is essential for data analysis as it enables efficient data retrieval, manipulation, and transformation. It allows analysts to filter, sort, group, and aggregate large datasets, making data-driven decision-making easier. SQL integrates seamlessly with business intelligence tools like Tableau, Power BI, and Python libraries, enhancing data visualization and reporting.
Why SQL is Important for Data Analysis
SQL is widely used in data analysis because it provides a powerful and efficient way to interact with structured data. Here are some reasons why SQL is essential for data analysts:
- Efficient Data Retrieval – SQL allows users to fetch specific data from large datasets quickly using simple queries.
- Data Manipulation – SQL supports various operations such as filtering, sorting, grouping, and aggregating data to prepare datasets for analysis.
- Scalability – SQL can handle vast amounts of data efficiently, making it suitable for enterprise-level analytics.
- Integration with BI Tools – SQL works seamlessly with business intelligence tools like Tableau, Power BI, and Python libraries (Pandas, SQLAlchemy) for advanced analysis and visualization.
- Data Cleaning and Transformation – SQL functions help clean and normalize data, removing duplicates and handling missing values.
- Decision-Making Support – Businesses use SQL queries to extract insights and drive data-driven decisions.
- Automation – SQL scripts can be scheduled and automated for regular reporting and data processing.
Data Retrieval
1. SELECT Statement: Used to fetch data from a database.
SELECT column1, column2
FROM table_name;
2. DISTINCT Keyword: Eliminates duplicate records.
SELECT DISTINCT column1
FROM table_name;
3. WHERE Clause: Filters records based on specified conditions.
SELECT column1
FROM table_name
WHERE condition;
2. Sorting and Limiting Results
ORDER BY Clause: Sorts the result set.
SELECT column1
FROM table_name
ORDER BY column1 [ASC|DESC];
LIMIT Clause: Limits the number of returned records.
SELECT column1
FROM table_name
LIMIT number;
3. Aggregate Functions
COUNT(): Returns the number of rows.
SELECT COUNT(*)
FROM table_name;
SUM(): Calculates the total sum of a numeric column.
SELECT SUM(column1)
FROM table_name;
AVG(): Calculates the average value.
SELECT AVG(column1)
FROM table_name;
MIN() and MAX(): Retrieve the minimum and maximum values.
SELECT MIN(column1), MAX(column1)
FROM table_name;
4. Grouping Data
GROUP BY: Group rows that have the same values in specified columns into summary rows.
SELECT column1, COUNT(*)
FROM table_name
GROUP BY column1;
HAVING: Filter groups based on aggregate functions.
SELECT column1, COUNT(*)
FROM table_name
GROUP BY column1
HAVING COUNT(*) > 1;
5. Joining Tables
INNER JOIN: Select records with matching values in both tables.
SELECT a.column1, b.column2
FROM table1 a INNER JOIN table2 b
ON a.common_field = b.common_field;
LEFT JOIN: Include all records from the left table and matched records from the right table; fill with NULLs if no match.
SELECT a.column1, b.column2
FROM table1 aLEFT JOIN table2 b ON a.common_field = b.common_field;
RIGHT JOIN: Include all records from the right table and matched records from the left table; fill with NULLs if no match.
SELECT a.column1, b.column2
FROM table1 aRIGHT JOIN table2 b
ON a.common_field = b.common_field;
FULL OUTER JOIN: Return all records when there is a match in either left or right table.
SELECT a.column1, b.column2FROM table1 aFULL OUTER JOIN table2 b ON a.common_field = b.common_field;
5. Subqueries
Subquery in WHERE Clause: Use a subquery to filter results.
SELECT column1
FROM table_name
WHERE column2 IN (SELECT column2 FROM another_table WHERE condition);
Subquery in FROM Clause: Use a subquery as a temporary table.
SELECT a.column1, b.column2
FROM (SELECT column1 FROM table_name WHERE condition)
a JOIN another_table b ON a.common_field = b.common_field;
6. Set Operations
UNION: Combine the result sets of two queries and remove duplicates.
SELECT column1
FROM table1
UNION
SELECT column1
FROM table2;
UNION ALL: Combine the result sets of two queries, including duplicates.
SELECT column1
FROM table1
UNION ALL
SELECT column1
FROM table2;
INTERSECT: Return the common records from two queries.
SELECT column1 FROM table1
INTERSECTSELECT column1
FROM table2;
EXCEPT: Return records from the first query that are not in the second query.
SELECT column1 FROM table1
EXCEPT
SELECT column1 FROM table2;
7. Window Functions
ROW_NUMBER(): Assign a unique sequential integer to rows within a partition.
SELECT column1,
ROW_NUMBER() OVER (PARTITION BY column2 ORDER BY column3) as row_num
FROM table_name;
RANK(): Assign a rank to rows within a partition, with gaps for ties.
SELECT column1,
RANK() OVER (PARTITION BY column2 ORDER BY column3) AS rank_num
FROM table_name;
8. Date and Time Functions
CURRENT_DATE and CURRENT_TIME: Retrieve the current date and time.
SELECT CURRENT_DATE, CURRENT_TIME;
NOW(): Returns the current timestamp (date and time)
SELECT NOW();
EXTRACT(): Extracts a specific part of a date (e.g., year, month, day).
SELECT EXTRACT(YEAR FROM date_column) AS year_value
FROM table_name;
DATE_ADD() and DATE_SUB(): Adds or subtracts an interval from a date.
SELECT DATE_ADD(date_column, INTERVAL 7 DAY) AS next_week
FROM table_name;
SELECT DATE_SUB(date_column, INTERVAL 1 MONTH) AS previous_month
FROM table_name;
DATEDIFF(): Finds the difference between two dates.
SELECT DATEDIFF(end_date, start_date) AS days_difference
FROM table_name;
DATE_FORMAT(): Formats a date in a specific pattern.
SELECT DATE_FORMAT(date_column, '%Y-%m-%d') AS formatted_date
FROM table_name;
Conclusion
SQL plays a vital role in data analysis by providing powerful tools for querying, manipulating, and transforming structured data. Its ability to retrieve specific data efficiently, perform aggregations, and join multiple tables makes it indispensable for analysts. SQL integrates seamlessly with BI tools and programming languages, enhancing visualization and reporting. Additionally, it supports automation and scalable data processing, making it suitable for large datasets. Mastering SQL empowers professionals to extract valuable insights, streamline decision-making, and improve data management, solidifying its importance in the world of data analysis.
Similar Reads
SQL Interview Questions Are you preparing for a SQL interview? SQL is a standard database language used for accessing and manipulating data in databases. It stands for Structured Query Language and was developed by IBM in the 1970's, SQL allows us to create, read, update, and delete data with simple yet effective commands.
15+ min read
SQL Tutorial Structured Query Language (SQL) is the standard language used to interact with relational databases. Whether you want to create, delete, update or read data, SQL provides the structure and commands to perform these operations. SQL is widely supported across various database systems like MySQL, Oracl
8 min read
Non-linear Components In electrical circuits, Non-linear Components are electronic devices that need an external power source to operate actively. Non-Linear Components are those that are changed with respect to the voltage and current. Elements that do not follow ohm's law are called Non-linear Components. Non-linear Co
11 min read
SQL Commands | DDL, DQL, DML, DCL and TCL Commands SQL commands are crucial for managing databases effectively. These commands are divided into categories such as Data Definition Language (DDL), Data Manipulation Language (DML), Data Control Language (DCL), Data Query Language (DQL), and Transaction Control Language (TCL). In this article, we will e
7 min read
Spring Boot Tutorial Spring Boot is a Java framework that makes it easier to create and run Java applications. It simplifies the configuration and setup process, allowing developers to focus more on writing code for their applications. This Spring Boot Tutorial is a comprehensive guide that covers both basic and advance
10 min read
Normal Forms in DBMS In the world of database management, Normal Forms are important for ensuring that data is structured logically, reducing redundancy, and maintaining data integrity. When working with databases, especially relational databases, it is critical to follow normalization techniques that help to eliminate
7 min read
ACID Properties in DBMS In the world of DBMS, transactions are fundamental operations that allow us to modify and retrieve data. However, to ensure the integrity of a database, it is important that these transactions are executed in a way that maintains consistency, correctness, and reliability. This is where the ACID prop
8 min read
Class Diagram | Unified Modeling Language (UML) A UML class diagram is a visual tool that represents the structure of a system by showing its classes, attributes, methods, and the relationships between them. It helps everyone involved in a projectâlike developers and designersâunderstand how the system is organized and how its components interact
12 min read
Backpropagation in Neural Network Back Propagation is also known as "Backward Propagation of Errors" is a method used to train neural network . Its goal is to reduce the difference between the modelâs predicted output and the actual output by adjusting the weights and biases in the network.It works iteratively to adjust weights and
9 min read
3-Phase Inverter An inverter is a fundamental electrical device designed primarily for the conversion of direct current into alternating current . This versatile device , also known as a variable frequency drive , plays a vital role in a wide range of applications , including variable frequency drives and high power
13 min read