Open In App

Regular Expressions in SQL

Last Updated : 21 Jan, 2025
Comments
Improve
Suggest changes
Like Article
Like
Report

Regular expressions (regex) are incredibly powerful tools that simplify complex data processing tasks. They enable users to identify, replace, or extract data based on specific patterns. Regular expressions, though originating from computer science, are widely used in SQL to manage and manipulate data effectively.

In this comprehensive guide, we will introduce you to regular expressions in SQL, covering everything from basic concepts to advanced applications, ensuring it becomes a valuable addition to our SQL toolkit.

What are Regular Expressions?

Regular expressions, often abbreviated as regex or regexp, are sequences of characters that define a specific search pattern. They are widely used in programming and data processing to perform sophisticated string matching and manipulation tasks. By using these patterns, users can efficiently search, validate, or extract specific data from large text datasets.

Regex patterns consist of a combination of literal characters and special symbols that dictate matching rules. Regular expressions can be used with functions like REGEXP_LIKE, REGEXP_REPLACE, and REGEXP_SUBSTR to process and analyze textual data stored in databases. They provide a more flexible and powerful alternative to basic SQL string operations, enabling developers to handle complex text-related requirements with ease.

Example:

SELECT email 
FROM users
WHERE REGEXP_LIKE(email, '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$');

This query retrieves all emails from the users table that match the regex pattern for valid email addresses.

Types of Regular Expressions in SQL

In SQL, there are three primary functions for working with regular expressions:

1. REGEXP_LIKE

The REGEXP_LIKE function in SQL is used to determine whether a given string matches a specific regular expression pattern. It evaluates the string against the regex and returns TRUE if the string matches the pattern and FALSE otherwise. This function is particularly useful for filtering data based on complex string patterns.

Syntax

REGEXP_LIKE(column_name, 'pattern')

Example:

The following query selects all product names from the products table where the names start with the letter 'A':

SELECT product_name 
FROM products
WHERE REGEXP_LIKE(product_name, '^A');

Output

product_name
Apple
Apricot
Avocado

2. REGEXP_REPLACE

The REGEXP_REPLACE function in SQL is used to search for a pattern within a string and replace all occurrences of that pattern with a specified replacement string. This function is particularly useful for cleaning and transforming data by removing unwanted characters or formatting strings.

Syntax

REGEXP_REPLACE(string, 'pattern', 'replacement')

Example:

The following query removes all non-numeric characters from the phone_number column in the contacts table:

SELECT REGEXP_REPLACE(phone_number, '[^0-9]', '') AS cleaned_number 
FROM contacts;

Output

cleaned_number
1234567890
18005550199
1234567890123
9876543210
55512345678

Explanation:

  • [^0-9]: Matches any character that is not a digit (0–9). The ^ inside square brackets negates the range.
  • ' ': The replacement string is an empty string, effectively removing all non-numeric characters.

3. REGEXP_SUBSTR

The REGEXP_SUBSTR function in SQL is used to extract a substring from a string that matches a specified regular expression pattern. It is particularly useful for isolating portions of text within a larger string, such as extracting domain names from email addresses or pulling specific elements from formatted data.

Syntax

REGEXP_SUBSTR(string, 'pattern', start_position, occurrence, match_parameter)

Example:

To extract the domain name from the email field in the users table:

SELECT REGEXP_SUBSTR(email, '@[^.]+') AS domain 
FROM users;

Output

domain
@gmail
@outlook
@company
@yahoo
@example

Explanation:

This pattern extracts the part of the email immediately following the @ symbol up to (but not including) the next period. The REGEXP_SUBSTR function is a versatile tool for extracting meaningful data from complex text fields, enabling targeted data analysis and processing.

Basic Regular Expression Syntax Table

Regular expressions (regex) are constructed using a combination of characters and special symbols, each with specific meanings. This table format organizes regex elements and examples for quick reference, making it easy to understand and apply them in practical scenarios.

PatternDescriptionExampleMatches
.Matches any single character (except newline).h.that, hit, hot
^Matches the start of a string.^AApple, Apricot
$Matches the end of a string.ing$sing, bring
``Acts as a logical OR between patterns.`cat
*Matches zero or more of the preceding character.ab*a, ab, abb, abbb
+Matches one or more of the preceding character.ab+ab, abb, abbb
?Matches zero or one of the preceding character.colou?rcolor, colour
{n}Matches exactly n occurrences of the preceding character.a{3}aaa
{n,}Matches n or more occurrences of the preceding character.a{2,}aa, aaa, aaaa
{n,m}Matches between n and m occurrences of the preceding character.a{2,4}aa, aaa, aaaa
[abc]Matches any of the enclosed characters.[aeiou]a, e, i, o, u
[^abc]Matches any character not enclosed.[^aeiou]Any non-vowel character
[a-z]Matches any character in the specified range.[0-9]0, 1, 2, ..., 9
\Escapes a special character to treat it as a literal.\.Matches a literal .
\bMatches a word boundary (space, punctuation).\bcat\bMatches cat but not scatter
\BMatches a non-word boundary.\BcatMatches scatter but not cat
(abc)Groups characters together and remembers the match.(ha)+ha, haha, hahaha
\1, \2, ...Matches the content of a captured group.(ab)\1abab

Common Regex Patterns

PatternDescriptionExampleMatches
^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}$Validates an email address.[email protected]Valid email addresses
^[0-9]+$Matches a numeric string only.123456123, 456, 7890
https?://[^ ]+Matches a URL starting with http or https.https://example.comURLs
^[A-Za-z0-9]+$Matches alphanumeric strings.User123abc123, xyz789

Real-World Examples of Regular Expressions in SQL

Here’s how regular expressions can solve common data processing tasks in SQL:

Example 1: Extracting URLs from Text

If we have a messages table containing text with embedded URLs, we can extract the URLs as follows. This regex matches URLs starting with http:// or https:// and extracts them.

Query:

SELECT REGEXP_SUBSTR(message, 'https?://[^ ]+') AS url 
FROM messages;

Explanation:

  • https?://: Matches http:// or https://.
  • [^ ]+: Matches all characters up to the next space.

Example 2: Validating Email Addresses

To validate email addresses in the users table. This pattern ensures that the email follows the standard format.

SELECT email 
FROM users
WHERE REGEXP_LIKE(email, '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$');

Explanation:

Ensures the email address starts with alphanumeric characters, includes an @ symbol, followed by a domain, and ends with a valid TLD (top-level domain).

Example 3: Cleaning Up Phone Numbers

To remove non-numeric characters from phone numbers in the contacts table. This query leaves only numeric characters in the phone_number field.

SELECT REGEXP_REPLACE(phone_number, '[^0-9]', '') AS cleaned_number 
FROM contacts;

Explanation:

  • [^0-9]: Matches any character that is not a digit.
  • ' ': Replaces non-numeric characters with an empty string.

Example 4: Finding Specific Patterns

Find all product names in the products table that contain digits:

SELECT product_name 
FROM products
WHERE REGEXP_LIKE(product_name, '\d');

Explanation:

\\d: Matches any numeric digit (0–9).

Example 5: Extracting Subdomains

Extract the subdomain from URLs in the web_logs table:

SELECT REGEXP_SUBSTR(url, '^[^.]+') AS subdomain 
FROM web_logs;

Explanation:

^[^.]+: Matches all characters from the start of the string up to the first . (dot).

Example 6: Validating Numeric Strings

Find records where a field contains only numbers in the data_table:

SELECT record_id 
FROM data_table
WHERE REGEXP_LIKE(field_name, '^[0-9]+$');

Explanation:

  • ^[0-9]+$: Matches strings that consist entirely of digits.

Common Regular Expression Use Cases in SQL

  1. Data Validation: Regular expressions can be used to ensure that data fields adhere to specific formats, such as validating emails, phone numbers, or numeric strings.
  2. Data Cleaning: Regex can help remove unwanted characters, whitespace, or other non-standard elements from strings to clean up datasets.
  3. Data Extraction: Extract meaningful substrings from larger text fields, such as domain names from emails or URLs from text.

Conclusion

Regular expressions in SQL offer a powerful way to manage and manipulate textual data. Whether we're validating inputs, cleaning datasets, or extracting specific patterns, mastering regex functions can significantly enhance our SQL capabilities. Start with the basics and gradually explore advanced patterns to unlock the full potential of this tool. By integrating regular expressions into our SQL workflows, we can simplify complex data operations and improve overall efficiency.


Next Article
Article Tags :

Similar Reads