Regex Cheat Sheet
1. Character Escapes
The backslash character (\) in a regular expression indicates that the character that follows it is either a special character, as shown in the following table, or should be interpreted literally.
Escape | Description | Pattern | Matches
\nnn | Uses octal representation to specify a character (nnn consists of two or three digits). | \w\040\w | "a b", "c d" in "a bc d"
\xnn | Uses hexadecimal representation to specify a character (nn consists of exactly two digits). | \w\x20\w | "a b", "c d" in "a bc d"
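As a quick check of the two escapes above, here is a minimal sketch in Python. Using Python's standard `re` module is an assumption made for illustration; the sheet itself mostly describes .NET-style syntax. Both `\040` and `\x20` denote the space character (decimal 32).

```python
import re

text = "a bc d"

# \040 is the octal code and \x20 the hexadecimal code for a space.
octal_matches = re.findall(r"\w\040\w", text)
hex_matches = re.findall(r"\w\x20\w", text)

print(octal_matches)
print(hex_matches)
```

Both patterns find the same two substrings, because they describe the same character in different notations.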
2. Character Classes
A character class will match any one of a set of characters. Character classes include the
language elements that are listed in the following table.
Character class | Description | Pattern | Matches
[character_group] | Matches any single character in character_group. By default, the match is case-sensitive. | [ae] | "a" in "bay"; "a", "e" in "stake"
[first-last] | Character range: matches any single character in the range from first to last. | [A-Z] | "A", "B" in "AB123"
\p{name} | Matches any single character in the Unicode general category or named block specified by name. | \p{Lu}; \p{IsCyrillic} | "C", "L" in "City Lights"; "Д", "Ж" in "ДЖem"
\P{name} | Matches any single character that is not in the Unicode general category or named block specified by name. | \P{Lu}; \P{IsCyrillic} | "i", "t", "y" in "City"; "e", "m" in "ДЖem"
\D | Matches any character other than a decimal digit. | \D | " ", "=", " ", "I", "V" in "4 = IV"
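The character classes above can be tried out with a short Python sketch. This assumes Python's standard `re` module, which does not support the `\p{name}` and `\P{name}` forms, so only the bracket and `\D` classes are shown here.

```python
import re

# Positive character group: any single "a" or "e".
group_matches = re.findall(r"[ae]", "stake")

# Character range: any single uppercase ASCII letter.
range_matches = re.findall(r"[A-Z]", "AB123")

# \D: any character that is not a decimal digit (spaces included).
non_digit_matches = re.findall(r"\D", "4 = IV")

print(group_matches, range_matches, non_digit_matches)
```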
Anchors, also known as atomic zero-width assertions, cause a match to succeed or fail depending on the current position in the string. They do not cause the engine to advance through the string or consume characters. The metacharacters listed in the table below are anchors.
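To make the zero-width behaviour concrete, here is a minimal Python sketch of the `$` anchor with and without multiline mode. It assumes Python's standard `re` module, which has no `\G` anchor, so only `$` is illustrated; the two-line sample string is made up for the example.

```python
import re

sample = "-901-444\n-902-555"

# Without MULTILINE, $ matches only at the very end of the string
# (or just before a trailing newline).
end_only = re.findall(r"-\d{3}$", sample)

# With MULTILINE, $ also matches just before each newline.
each_line = re.findall(r"-\d{3}$", sample, flags=re.MULTILINE)

# An anchor consumes no characters: matching "$" alone yields an empty match.
empty_match = re.search(r"$", "abc")

print(end_only, each_line, repr(empty_match.group()))
```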
Anchor | Description | Pattern | Matches
$ | By default, the match must occur at the end of the string or just before \n at the end of the string. In multiline mode, it must occur at the end of the line or just before \n at the end of the line. | -\d{3}$ | "-444" in "-901-444"
\G | The match must occur at the point where the previous match ended. | \G\(\d\) | "(1)", "(3)", "(5)" in "(1)(3)(5)[7](9)"
5. Grouping Constructs
Grouping constructs delineate the subexpressions of a regular expression and capture substrings of the input string. Grouping constructs include the following language elements.
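A minimal Python sketch of capturing, named, and non-capturing groups follows. It assumes Python's `re` module, where named groups are written `(?P<name>...)`; .NET writes them `(?<name>...)`. The phone-style sample strings and the group name `line` are made up for illustration.

```python
import re

# Two capturing groups: one numbered, one named.
m = re.match(r"(\d{3})-(?P<line>\d{4})", "555-1234")
area = m.group(1)          # first (numbered) group
line = m.group("line")     # named group, also reachable as group 2

# (?:...) groups without capturing, so only "(c)" is captured.
nc = re.match(r"(?:ab)+(c)", "ababc")
captured = nc.groups()

print(area, line, captured)
```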
6. Lookarounds
When the regex engine reaches a lookaround expression, it takes the substring from the current position to the start (lookbehind) or the end (lookahead) of the original string and runs Regex.IsMatch on that substring using the lookaround pattern. A positive assertion succeeds when that inner match succeeds; a negative assertion succeeds when it fails.
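The four lookaround forms can be sketched in Python as follows. This assumes Python's `re` module rather than .NET's `Regex.IsMatch`; the sample strings are made up, and the last lines imitate the "match on the remaining substring" model described above by hand.

```python
import re

text = "30 dollars, 40 euros"

# Positive lookahead: digits followed by " dollars".
lookahead = re.findall(r"\d+(?= dollars)", text)

# Negative lookahead: whole numbers NOT followed by " dollars".
neg_lookahead = re.findall(r"\b\d+\b(?! dollars)", text)

# Positive lookbehind: digits preceded by "$".
lookbehind = re.findall(r"(?<=\$)\d+", "$15 and 20")

# Negative lookbehind: whole numbers NOT preceded by "$".
neg_lookbehind = re.findall(r"(?<!\$)\b\d+", "$15 and 20")

# Conceptual model of a lookahead: test the pattern against the
# substring that starts at the current position, consuming nothing.
pos = text.index("30") + len("30")
manual_lookahead = re.match(r" dollars", text[pos:]) is not None

print(lookahead, neg_lookahead, lookbehind, neg_lookbehind, manual_lookahead)
```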
7. Quantifiers
A quantifier specifies how many instances of the previous element must be present in the input string for a match to occur. Quantifiers include the following language elements.
Quantifier | Description | Pattern | Matches
+ | Matches the previous element one or more times. | se+ | "see" in "seen", "se" in "sent"
{n} | Matches the previous element exactly n times. | ,\d{3} | ",043" in "1,043.6"; ",876", ",543", and ",210" in "9,876,543,210"
{n,m} | Matches the previous element at least n times, but no more than m times. | \d{3,5} | "166", "17668"; "19302" in "193024"
+? | Matches the previous element one or more times, but as few times as possible. | se+? | "se" in "seen", "se" in "sent"
{n,}? | Matches the previous element at least n times, but as few times as possible. | \d{2,}? | "16" in "166"; "29"; "19", "30" in "1930"
{n,m}? | Matches the previous element between n and m times, but as few times as possible. | \d{3,5}? | "166"; "176" in "17668"; "193", "024" in "193024"
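The difference between greedy and lazy quantifiers is easiest to see by running both on the same input. The following is a minimal sketch using Python's `re` module (an assumption; the quantifier syntax shown is shared with .NET). The sample string "seen sent" simply combines two of the example words.

```python
import re

# Greedy + takes as many "e" characters as it can; lazy +? takes one.
greedy_plus = re.findall(r"se+", "seen sent")
lazy_plus = re.findall(r"se+?", "seen sent")

# {n}: exactly three digits after each comma.
exact = re.findall(r",\d{3}", "9,876,543,210")

# Greedy {3,5} grabs five digits; lazy {3,5}? stops at three.
greedy_range = re.findall(r"\d{3,5}", "193024")
lazy_range = re.findall(r"\d{3,5}?", "193024")

# Lazy {2,}? stops as soon as two digits have matched.
lazy_open = re.findall(r"\d{2,}?", "1930")

print(greedy_plus, lazy_plus, exact, greedy_range, lazy_range, lazy_open)
```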
8. Backreference Constructs
A backreference identifies a previously matched subexpression so that it can be matched again later in the same regular expression. The following table lists the backreference constructs:
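As an illustration, the sketch below uses a numbered and a named backreference to find doubled characters. It assumes Python's `re` module, where a named backreference is written `(?P=name)`; .NET writes it `\k<name>`. The sample text and the group name `ch` are made up.

```python
import re

text = "seek trellis"

# \1 must match the same text that group 1 just captured.
doubled = [m.group() for m in re.finditer(r"(\w)\1", text)]

# The same idea with a named group and a named backreference.
doubled_named = [m.group() for m in re.finditer(r"(?P<ch>\w)(?P=ch)", text)]

print(doubled, doubled_named)
```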
9. Alternation Constructs
Alternation constructs modify a regular expression to enable either/or matching. These constructs include the language elements listed in the following table.
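A minimal Python sketch of either/or matching with the `|` operator follows. It assumes Python's `re` module, and the sample sentences are made up for illustration.

```python
import re

# "th" followed by exactly one of the alternatives e, is, or at.
th_words = [m.group() for m in re.finditer(r"th(?:e|is|at)", "this is the day")]

# Alternation between whole words, with an optional plural "s".
pets = re.findall(r"\b(?:cat|dog)s?\b", "cats and dog and cow")

print(th_words, pets)
```

The alternatives are wrapped in a non-capturing group so that `|` applies only to the part inside the parentheses rather than to the whole pattern.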
10. Substitutions
Substitutions are regex language elements that are used in replacement patterns. The following table lists the supported substitution elements.
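The sketch below shows substitutions in a replacement pattern using Python's `re.sub`. Note the assumption: Python spells these `\1`, `\g<name>`, and `\g<0>`, whereas .NET spells the equivalents `$1`, `${name}`, and `$&`. The sample strings and group names are made up.

```python
import re

# Swap two words by referring to the numbered groups.
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "one two")

# The same swap with named groups.
swapped_named = re.sub(
    r"(?P<first>\w+) (?P<second>\w+)", r"\g<second> \g<first>", "one two"
)

# \g<0> stands for the entire match.
bracketed = re.sub(r"\d+", r"[\g<0>]", "a1b22")

print(swapped, swapped_named, bracketed)
```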
The following are the inline options supported by the .NET regex engine:
Option | Description | Pattern | Matches
x | Ignores unescaped white space in the regular expression pattern. | \b(?x) \d+ \s \w+ | "1 aardvark", "2 cats" in "1 aardvark 2 cats IV centurions"
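The x option can be tried in Python as well, with one adjustment that is an assumption of this sketch: Python's `re` module requires a global inline flag such as `(?x)` to appear at the very start of the pattern, so it is moved in front of `\b` here.

```python
import re

# (?x) makes the engine ignore the unescaped spaces inside the pattern,
# so this is equivalent to \b\d+\s\w+.
pattern = r"(?x) \b \d+ \s \w+"

matches = re.findall(pattern, "1 aardvark 2 cats IV centurions")

print(matches)
```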
A POSIX character class is a named set of characters; it matches any single character that belongs to that set. POSIX character classes can be used only within bracket expressions. The POSIX standard defines the following character classes for use in regular expressions.
Class | Description | Pattern | Matches
[:alpha:] | PCRE (C, PHP, R…): ASCII letters A-Z and a-z | [8[:alpha:]]+ | WellDone88
[:alnum:] | PCRE (C, PHP, R…): ASCII digits and letters A-Z and a-z | [[:alnum:]]{10} | ABC1235251
[:punct:] | PCRE (C, PHP, R…): ASCII punctuation mark | [[:punct:]]+ | ?!.,:;
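Python's standard `re` module does not support POSIX bracket classes, so the sketch below spells out ASCII equivalents by hand. The names `ALPHA`, `ALNUM`, and `PUNCT` are hypothetical helpers introduced for this example, and treating `string.punctuation` as the ASCII `[:punct:]` set is an assumption of the sketch.

```python
import re
import string

# Hand-written ASCII stand-ins for the POSIX classes (hypothetical names).
ALPHA = "A-Za-z"                          # [:alpha:]
ALNUM = "A-Za-z0-9"                       # [:alnum:]
PUNCT = re.escape(string.punctuation)     # [:punct:], escaped for use in [...]

# Equivalent of [8[:alpha:]]+
alpha_or_8 = re.fullmatch(f"[8{ALPHA}]+", "WellDone88")

# Equivalent of [[:alnum:]]{10}
alnum_10 = re.fullmatch(f"[{ALNUM}]{{10}}", "ABC1235251")

# Equivalent of [[:punct:]]+
punct_run = re.fullmatch(f"[{PUNCT}]+", "?!.,:;")

print(bool(alpha_or_8), bool(alnum_10), bool(punct_run))
```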
The following modifiers are not supported in JavaScript. In Ruby, use (?s) and (?m) with care, because Ruby treats them differently from most other engines: in Ruby, (?m) makes the dot match line breaks.
Modifier | Engines and meaning | Notes
(?n) | .NET, PCRE 10.30+: named capture only | Turns all (parentheses) into non-capture groups. To capture, use named groups.
(?d) | Java: Unix linebreaks only | The dot and the ^ and $ anchors are only affected by \n.
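Neither (?n) nor (?d) exists in Python's `re` module, so as a rough illustration of how inline modifiers change matching, the sketch below uses the (?s) and (?m) modifiers mentioned above with their Python meaning: (?s) lets the dot match a newline, and (?m) lets ^ and $ match at line boundaries. The sample text is made up.

```python
import re

text = "first\nsecond"

# By default the dot does not match a newline, so this finds nothing.
dot_default = re.search(r"first.second", text)

# (?s) switches on "dot matches all", including the newline.
dot_all = re.search(r"(?s)first.second", text)

# Without (?m), ^ matches only at the start of the string.
string_start = re.findall(r"^\w+", text)

# With (?m), ^ also matches at the start of every line.
line_starts = re.findall(r"(?m)^\w+", text)

print(dot_default, bool(dot_all), string_start, line_starts)
```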