- Categories:
REGEXP_SUBSTR¶
Returns the substring that matches a regular expression within a string.
Syntax¶
Arguments¶
Required:
subjectThe string to search for matches.
patternPattern to match.
For guidelines on specifying patterns, see String functions (regular expressions).
Optional:
positionNumber of characters from the beginning of the string where the function starts searching for matches. The value must be a positive integer.
Default:
1(the search for a match starts at the first character on the left)occurrenceSpecifies the first occurrence of the pattern from which to start returning matches.
The function skips the first
occurrence - 1matches. For example, if there are 5 matches and you specify3for theoccurrenceargument, the function ignores the first two matches and returns the third, fourth, and fifth matches.Default:
1regex_parametersString of one or more characters that specifies the parameters used for searching for matches. Supported values:
Parameter
Description
cCase-sensitive matching
iCase-insensitive matching
mMulti-line mode
eExtract submatches
sSingle-line mode POSIX wildcard character
.matches\nDefault:
cFor more information, see Specifying the parameters for the regular expression.
Note
By default, REGEXP_SUBSTR returns the entire matching part of the subject. However, if the
e(for “extract”) parameter is specified, REGEXP_SUBSTR returns the part of the subject that matches the first group in the pattern. Ifeis specified but agroup_numis not also specified, then thegroup_numdefaults to 1 (the first group). If there is no sub-expression in the pattern, REGEXP_SUBSTR behaves as ifewas not set. For examples that usee, see Examples in this topic.group_numSpecifies which group to extract. Groups are specified by using parentheses in the regular expression.
If a
group_numis specified, Snowflake allows extraction even if the'e'option was not also specified. The'e'is implied.Snowflake supports up to 1024 groups.
For examples that use
group_num, see the Examples in this topic.
Returns¶
The function returns a value of type VARCHAR that is the matching substring.
The function returns NULL in the following cases:
No match is found.
Any argument is NULL.
Usage notes¶
For additional information on using regular expressions, see String functions (regular expressions).
Collation details¶
Arguments with collation specifications currently aren’t supported.
Examples¶
The documentation of the REGEXP_INSTR function contains many examples that use both REGEXP_SUBSTR and REGEXP_INSTR. You might want to look at those examples, too.
These examples use the strings created below:
The strings have the following characteristics:
The string with an
idof2has multiple occurrences of the word “the”.The string with an
idof3has multiple occurrences of the word “the” with extra blank spaces between the words.The string with an
idof4has the character sequence “the” inside multiple words (“thespian” and “theater”), but without the word “the” by itself.
The following examples call the REGEXP_SUBSTR function:
Calling the REGEXP_SUBSTR function in a SELECT list¶
Call the REGEXP_SUBSTR function in a SELECT list to extract or display values that match a pattern.
This example looks for first occurrence of the word the, followed by one or more non-word characters — for example,
the whitespace separating words — followed by one or more word characters.
“Word characters” include not only the letters a-z and A-Z, but also the underscore (“_”) and the decimal digits 0-9, but not whitespace, punctuation, and so on.
Starting from position 1 of the string, look for the second occurrence of the word the,
followed by one or more non-word characters, followed by one or more word characters.
Starting from position 1 of the string, look for the second occurrence of the word the,
followed by one or more non-word characters, followed by one or more word characters.
Rather than returning the entire match, return only the “group” (for example, the portion of the substring that matches the part of the regular expression in parentheses). In this case, the returned value should be the word after “the”.
This example shows how to retrieve the second word from the first, second, and third matches of
a two-word pattern in which the first word is A. This example also shows that trying to
go beyond the last pattern causes Snowflake to return NULL.
First, create a table and insert data:
Run the query:
This example shows how to retrieve the first, second, and third groups within the first occurrence of the pattern.
In this case, the returned values are the individual letters of the word MAN.
Here are some additional examples.
Create a table and insert data:
Return the first match that contains a lowercase o by matching a word boundary (\b),
followed by zero or more word characters (\S), the letter o, and then zero or more
word characters until the next word boundary:
Return the first match that contains a lowercase o, starting at the third character
in the subject:
Return the third match that contains a lowercase o, starting at the third character
in the subject:
Return the third match that contains a lowercase o, starting at the third character in
the subject, with case-insensitive matching:
This example shows that you can explicitly omit any regular expression parameters by specifying empty string.
The following example illustrates overlapping occurrences. First, create a table and insert data:
Run a query that finds the second occurrence of the following pattern in each row: a punctuation mark followed by digits and letters, followed by a punctuation mark.
The following example creates a JSON object from an Apache HTTP Server access log using pattern matching and concatenation. First, create a table and insert data:
Run a query:
Calling the REGEXP_SUBSTR function in a WHERE clause¶
Call the REGEXP_SUBSTR function in a WHERE clause to filter for rows that contain values that match a pattern. By using the function, you can avoid multiple OR conditions.
The following example queries the demo2 table you created previously to return rows that include either
the string best or the string thespian. Add IS NOT NULL to the condition to return rows that
match the pattern. That is, the rows where the REGEXP_SUBSTR function didn’t return NULL:
You can use AND conditions to find rows that match multiple patterns. For example, the following query returns
rows that include either the string best or the string thespian and start with the string It: