Regular Expressions in SQL
Last Updated :
21 Jan, 2025
Regular expressions (regex) are incredibly powerful tools that simplify complex data processing tasks. They enable users to identify, replace, or extract data based on specific patterns. Regular expressions, though originating from computer science, are widely used in SQL to manage and manipulate data effectively.
In this comprehensive guide, we will introduce you to regular expressions in SQL, covering everything from basic concepts to advanced applications, ensuring it becomes a valuable addition to our SQL toolkit.
What are Regular Expressions?
Regular expressions, often abbreviated as regex or regexp, are sequences of characters that define a specific search pattern. They are widely used in programming and data processing to perform sophisticated string matching and manipulation tasks. By using these patterns, users can efficiently search, validate, or extract specific data from large text datasets.
Regex patterns consist of a combination of literal characters and special symbols that dictate matching rules. Regular expressions can be used with functions like REGEXP_LIKE
, REGEXP_REPLACE
, and REGEXP_SUBSTR
to process and analyze textual data stored in databases. They provide a more flexible and powerful alternative to basic SQL string operations, enabling developers to handle complex text-related requirements with ease.
Example:
SELECT email
FROM users
WHERE REGEXP_LIKE(email, '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$');
This query retrieves all emails from the users
table that match the regex pattern for valid email addresses.
Types of Regular Expressions in SQL
In SQL, there are three primary functions for working with regular expressions:
1. REGEXP_LIKE
The REGEXP_LIKE
function in SQL is used to determine whether a given string matches a specific regular expression pattern. It evaluates the string against the regex and returns TRUE
if the string matches the pattern and FALSE
otherwise. This function is particularly useful for filtering data based on complex string patterns.
Syntax
REGEXP_LIKE(column_name, 'pattern')
Example:
The following query selects all product names from the products
table where the names start with the letter 'A':
SELECT product_name
FROM products
WHERE REGEXP_LIKE(product_name, '^A');
Output
product_name |
---|
Apple |
Apricot |
Avocado |
2. REGEXP_REPLACE
The REGEXP_REPLACE
function in SQL is used to search for a pattern within a string and replace all occurrences of that pattern with a specified replacement string. This function is particularly useful for cleaning and transforming data by removing unwanted characters or formatting strings.
Syntax
REGEXP_REPLACE(string, 'pattern', 'replacement')
Example:
The following query removes all non-numeric characters from the phone_number
column in the contacts
table:
SELECT REGEXP_REPLACE(phone_number, '[^0-9]', '') AS cleaned_number
FROM contacts;
Output
cleaned_number |
---|
1234567890 |
18005550199 |
1234567890123 |
9876543210 |
55512345678 |
Explanation:
[^0-9]
: Matches any character that is not a digit (0–9). The ^
inside square brackets negates the range.
' '
: The replacement string is an empty string, effectively removing all non-numeric characters.
3. REGEXP_SUBSTR
The REGEXP_SUBSTR
function in SQL is used to extract a substring from a string that matches a specified regular expression pattern. It is particularly useful for isolating portions of text within a larger string, such as extracting domain names from email addresses or pulling specific elements from formatted data.
Syntax
REGEXP_SUBSTR(string, 'pattern', start_position, occurrence, match_parameter)
Example:
To extract the domain name from the email
field in the users
table:
SELECT REGEXP_SUBSTR(email, '@[^.]+') AS domain
FROM users;
Output
domain |
---|
@gmail |
@outlook |
@company |
@yahoo |
@example |
Explanation:
This pattern extracts the part of the email immediately following the @
symbol up to (but not including) the next period. The REGEXP_SUBSTR
function is a versatile tool for extracting meaningful data from complex text fields, enabling targeted data analysis and processing.
Basic Regular Expression Syntax Table
Regular expressions (regex) are constructed using a combination of characters and special symbols, each with specific meanings. This table format organizes regex elements and examples for quick reference, making it easy to understand and apply them in practical scenarios.
Pattern | Description | Example | Matches |
---|
. | Matches any single character (except newline). | h.t | hat , hit , hot |
^ | Matches the start of a string. | ^A | Apple , Apricot |
$ | Matches the end of a string. | ing$ | sing , bring |
` | ` | Acts as a logical OR between patterns. | `cat |
* | Matches zero or more of the preceding character. | ab* | a , ab , abb , abbb |
+ | Matches one or more of the preceding character. | ab+ | ab , abb , abbb |
? | Matches zero or one of the preceding character. | colou?r | color , colour |
{n} | Matches exactly n occurrences of the preceding character. | a{3} | aaa |
{n,} | Matches n or more occurrences of the preceding character. | a{2,} | aa , aaa , aaaa |
{n,m} | Matches between n and m occurrences of the preceding character. | a{2,4} | aa , aaa , aaaa |
[abc] | Matches any of the enclosed characters. | [aeiou] | a , e , i , o , u |
[^abc] | Matches any character not enclosed. | [^aeiou] | Any non-vowel character |
[a-z] | Matches any character in the specified range. | [0-9] | 0 , 1 , 2 , ..., 9 |
\ | Escapes a special character to treat it as a literal. | \. | Matches a literal . |
\b | Matches a word boundary (space, punctuation). | \bcat\b | Matches cat but not scatter |
\B | Matches a non-word boundary. | \Bcat | Matches scatter but not cat |
(abc) | Groups characters together and remembers the match. | (ha)+ | ha , haha , hahaha |
\1 , \2 , ... | Matches the content of a captured group. | (ab)\1 | abab |
Common Regex Patterns
Pattern | Description | Example | Matches |
---|
^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}$ | Validates an email address. | [email protected] | Valid email addresses |
^[0-9]+$ | Matches a numeric string only. | 123456 | 123 , 456 , 7890 |
https?://[^ ]+ | Matches a URL starting with http or https . | https://example.com | URLs |
^[A-Za-z0-9]+$ | Matches alphanumeric strings. | User123 | abc123 , xyz789 |
Real-World Examples of Regular Expressions in SQL
Here’s how regular expressions can solve common data processing tasks in SQL:
Example 1: Extracting URLs from Text
If we have a messages
table containing text with embedded URLs, we can extract the URLs as follows. This regex matches URLs starting with http://
or https://
and extracts them.
Query:
SELECT REGEXP_SUBSTR(message, 'https?://[^ ]+') AS url
FROM messages;
Explanation:
https?://
: Matches http://
or https://
.[^ ]+
: Matches all characters up to the next space.
Example 2: Validating Email Addresses
To validate email addresses in the users
table. This pattern ensures that the email follows the standard format.
SELECT email
FROM users
WHERE REGEXP_LIKE(email, '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$');
Explanation:
Ensures the email address starts with alphanumeric characters, includes an @
symbol, followed by a domain, and ends with a valid TLD (top-level domain).
Example 3: Cleaning Up Phone Numbers
To remove non-numeric characters from phone numbers in the contacts
table. This query leaves only numeric characters in the phone_number
field.
SELECT REGEXP_REPLACE(phone_number, '[^0-9]', '') AS cleaned_number
FROM contacts;
Explanation:
[^0-9]
: Matches any character that is not a digit.' '
: Replaces non-numeric characters with an empty string.
Example 4: Finding Specific Patterns
Find all product names in the products
table that contain digits:
SELECT product_name
FROM products
WHERE REGEXP_LIKE(product_name, '\d');
Explanation:
\\d
: Matches any numeric digit (0–9).
Example 5: Extracting Subdomains
Extract the subdomain from URLs in the web_logs
table:
SELECT REGEXP_SUBSTR(url, '^[^.]+') AS subdomain
FROM web_logs;
Explanation:
^[^.]+
: Matches all characters from the start of the string up to the first .
(dot).
Example 6: Validating Numeric Strings
Find records where a field contains only numbers in the data_table
:
SELECT record_id
FROM data_table
WHERE REGEXP_LIKE(field_name, '^[0-9]+$');
Explanation:
^[0-9]+$
: Matches strings that consist entirely of digits.
Common Regular Expression Use Cases in SQL
- Data Validation: Regular expressions can be used to ensure that data fields adhere to specific formats, such as validating emails, phone numbers, or numeric strings.
- Data Cleaning: Regex can help remove unwanted characters, whitespace, or other non-standard elements from strings to clean up datasets.
- Data Extraction: Extract meaningful substrings from larger text fields, such as domain names from emails or URLs from text.
Conclusion
Regular expressions in SQL offer a powerful way to manage and manipulate textual data. Whether we're validating inputs, cleaning datasets, or extracting specific patterns, mastering regex functions can significantly enhance our SQL capabilities. Start with the basics and gradually explore advanced patterns to unlock the full potential of this tool. By integrating regular expressions into our SQL workflows, we can simplify complex data operations and improve overall efficiency.
Similar Reads
MariaDB - Regular Expression
MariaDB is also a relational database language that is similar to SQL. However, the introduction of MariaDB took place as it is an extension to SQL and contains some more advanced operators rather than SQL. MariaDB contains operators similar to SQL like CRUD operations and between operators and othe
8 min read
SQL Data Extraction with Regular Expressions
Regular expressions (RegEx) are a powerful mechanism for matching patterns in text, allowing us to extract specific characters, validate input, and search for patterns. In SQL, regular expressions are used in queries to validate data, extract specific substrings, or search for matching patterns in c
4 min read
MySQL | Regular Expressions (Regexp)
In MySQL, regular expressions (REGEX) offer powerful functionality for flexible pattern matching within string data. By using the REGEXP and RLIKE operators, developers can efficiently search, validate, and manipulate string data in more dynamic ways than simple LIKE queries. In this article, we wil
6 min read
Boolean Expressions in SQL
Boolean expressions are a core concept in SQL, helping to filter and manipulate data based on conditions. These expressions evaluate to one of three Boolean values: TRUE, FALSE, or UNKNOWN. They are extensively used in WHERE clauses, HAVING clauses, and conditional statements to query and retrieve s
3 min read
Working with Regular Expressions in PostgreSQL
Regular expressions, often referred to as "regex," are patterns used to match strings. In PostgreSQL, regular expressions allow us to search, validate, and manipulate text data in a powerful way. Regular expressions are helpful whether we need to find patterns in a string, replace parts of a text, o
5 min read
Query Execution Plan in SQL
An execution plan in SQL is a detailed plan that outlines the steps that the database management system (DBMS) will take to execute a query. It is a crucial component of query optimization, as it helps the DBMS determine the most efficient way to retrieve data from the database. Here, we will learn
5 min read
SQL Server Common Table Expressions
SQL Server is a relational database management system (RDBMS) that is used to handle complex data and maintain it in of tabular manner. With the help of SQL Server, one can easily protect their data as it provides various security features. In this article, we are going to explore SQL server's CTE a
8 min read
PL/SQL Common Table Expressions
PL/SQL Common Table Expressions (CTEs) make complex queries easier to write and understand by creating temporary result sets. We can use it multiple times in the same query. They help organize and simplify your code, especially for handling hierarchical data or recursive relationships. CTEs make you
10 min read
Query Processing in SQL
Query Processing includes translations of high-level Queries into low-level expressions that can be used at the physical level of the file system, query optimization, and actual execution of the query to get the actual result. High-level queries are converted into low-level expressions during query
4 min read
SQL Server Interview Questions
Data is growing every day, and it plays a big part in making important decisions. To manage this data, we need reliable databases, and SQL Server is one of the most popular options out there. Learning SQL Server can lead to exciting careers as an SQL Specialist or SQL Developer. If youâre preparing
15+ min read