DOC4

Regular expressions (regex) are patterns used to search and manipulate text strings, enhancing text processing capabilities. They come in two types: Basic Regular Expressions (BRE) and Extended Regular Expressions (ERE), each with specific meta characters and functionalities. Regex is commonly utilized in Unix/Linux commands like grep, sed, and awk for text searching and manipulation.

Regular Expression

Palani Karthikeyan
What are regular expressions?
● A regular expression is a pattern that describes a set of strings.
● Regular expressions are used to search and manipulate text based on patterns.
● The term is often shortened to “regex” or “regexp”.
● Regexes make it easier to process text content meaningfully, especially when combined with other commands.
grep, sed, awk
● Regular expressions are usually passed to grep, sed and awk in the following formats:
● In grep: grep [options] 'regexp' [inputfile]
● In sed: sed [options] '/regexp/action' [inputfile]
● In awk: awk [options] '/regexp/ {action}' [inputfile]
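The command formats above share one idea: select the lines that match a pattern, then optionally act on them. As a rough illustration only, here is a minimal Python sketch of that idea using the standard `re` module. The helper names `grep` and `awk_first_field` and the sample lines are assumptions made for this example, not part of the slides or of the real tools.

```python
import re

def grep(pattern, lines):
    """Return the lines that contain a match for pattern, like `grep 'pattern'`."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]

def awk_first_field(pattern, lines, sep=":"):
    """Return the first sep-separated field of each matching line,
    roughly like `awk -F: '/pattern/ {print $1}'`."""
    return [line.split(sep)[0] for line in grep(pattern, lines)]

# Sample input in /etc/passwd format (illustrative, not read from a real system).
sample = [
    "root:x:0:0:root:/root:/bin/bash",
    "news:x:9:13:news:/var/spool/news:",
]

print(grep(r"bash", sample))
print(awk_first_field(r"news", sample))
```

Python's `re` syntax is closer to ERE and Perl than to BRE, so the sketch shows the select-then-act flow rather than the exact grep or awk dialect.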
BRE & ERE
● Two types of regular expression syntax are available in Unix/Linux tools:
● Basic Regular Expression – BRE
● Extended Regular Expression – ERE
BRE
● BRE uses the following metacharacters:
● . (dot) = match any single character
● ^ = match the expression at the start of a line, as in ^PATTERN
● $ = match the expression at the end of a line, as in PATTERN$
● \ (backslash) = turn off the special meaning of the next character, as in \^
● [ ] (brackets) = match any one of the enclosed characters
● [^ ] = match any one character except those enclosed in [ ]
● * (asterisk) = match zero or more of the preceding character or expression
● ^PATTERN$ = match only lines that consist entirely of PATTERN
● - inside [ ] = character range, as in [A-Z], [0-9], [a-z], [A-Za-z0-9]
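The metacharacters listed above behave the same way in most regex engines, so they can be tried out quickly in Python. The following is a small sketch under that assumption. Python's `re` module is not a BRE implementation, and the word list and the `matching` helper are made up for illustration.

```python
import re

# Illustrative test strings (not taken from the slides).
lines = ["root", "r^ot", "boot", "rot", "root beer"]

def matching(pattern):
    """Return the strings from `lines` that contain a match for pattern."""
    return [s for s in lines if re.search(pattern, s)]

any_two = matching(r"r..t")        # . matches any single character
at_start = matching(r"^ro")        # ^ anchors the match at the start
at_end = matching(r"ot$")          # $ anchors the match at the end
literal_caret = matching(r"r\^ot") # \ turns off the special meaning of ^
one_of = matching(r"[br]oot")      # [ ] matches one of the enclosed characters
none_of = matching(r"[^r]oot")     # [^ ] matches one character not enclosed
whole = matching(r"^ro*t$")        # * repeats "o"; ^...$ covers the whole string

for name, result in [("r..t", any_two), ("^ro", at_start), ("ot$", at_end),
                     ("r\\^ot", literal_caret), ("[br]oot", one_of),
                     ("[^r]oot", none_of), ("^ro*t$", whole)]:
    print(name, result)
```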
ERE
● ERE adds the following metacharacters:
● ? = the preceding item is optional and is matched at most once
● + = the preceding item is matched one or more times
● {n} = the preceding item is matched exactly n times
● {n,} = the preceding item is matched n or more times
● {n,m} = the preceding item is matched at least n times, but not more than m times
● {,m} = the preceding item is matched at most m times (a GNU extension)
● | (alternation) = match the pattern on either side of the operator; if either one is found, the line containing it is a match
● ( ) (grouping) = group several patterns so that they behave as one
ERE
● In general, ERE supports the following operations:
– Alternative Match Patterns
– Grouping Alternatives
– Quantifiers
Alternative Match Patterns

● An alternative match pattern lets you specify a series of alternatives for a pattern, using | to separate them.
● | (called alternation) is equivalent to an “or” in a regular expression.
● In Perl-style (backtracking) engines, alternatives are checked from left to right, so the first alternative that matches is the one that is used.
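The left-to-right rule matters when one alternative is a prefix of another. A short sketch in Python, whose `re` module is a backtracking engine of the Perl style, shows the effect. The sample word "category" is an assumption chosen for the demonstration.

```python
import re

# "cat" is listed first and matches at the start, so "category" is never tried.
short_first = re.match(r"cat|category", "category").group()

# Listing the longer alternative first changes which one is used.
long_first = re.match(r"category|cat", "category").group()

print(short_first)
print(long_first)
```

When alternatives overlap like this, putting the longer one first is a common way to get the longer match in such engines.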
Grouping Alternatives

● Grouping with ( ) allows parts of a regular expression to be treated as a single unit.
● Parts of a regular expression are grouped by enclosing them in parentheses.
● Grouping lets you factor out the characters that similar terms have in common and specify only the differences.
● The pairs of parentheses are numbered from left to right by the positions of their left parentheses.
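Both uses of grouping can be sketched in Python: factoring out what two spellings have in common, and numbering groups by the position of each left parenthesis. The patterns and sample strings below are illustrative assumptions, not examples from the slides.

```python
import re

# Group the part that differs: "gray" and "grey" share "gr" and "y".
spelling = re.compile(r"gr(a|e)y")
differing_letter = spelling.fullmatch("grey").group(1)

# Nested groups are numbered by their opening parenthesis, left to right:
# group 1 = ((a)(b)), group 2 = (a), group 3 = (b), group 4 = (c)
numbered = re.match(r"((a)(b))(c)", "abc").groups()

print(differing_letter)
print(numbered)
```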
Quantifiers

● A quantifier says how many times something may match, instead of the default of matching just once.
● You can use a quantifier to specify that a pattern must match a specific number of times.
● Quantifiers in a regular expression are like loops in a program.
Quantifiers (Contd..)
Character: * (match 0 or more times)
Description: the item immediately to the left must be matched zero or more times for the expression to evaluate as true.
Example:
$var =~ /st*/ # matches strings like “st”, “sttr”, “sts”, “star”, “son”
The regexp “a*” matches zero or more “a” characters. Because zero occurrences are allowed, it matches every string, even one that does not contain the character “a”.
Quantifiers (Contd..)
Character: + (match 1 or more times)
Description: the item immediately to the left must be matched one or more times for the expression to evaluate as true.
Example:
$var =~ /st+/ # matches strings like “st”, “sttr”, “sts”, “star”, but not “son”
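The difference between * and + is easiest to see by running both over the same word list. This is a small Python sketch. `re.search` is used here as a stand-in for Perl's `=~`, which is an assumption of this example, and the words are the ones used on the slides.

```python
import re

words = ["st", "sttr", "sts", "star", "son"]

# st* : "s" followed by zero or more "t", so a bare "s" is enough.
star_matches = [w for w in words if re.search(r"st*", w)]

# st+ : "s" followed by at least one "t".
plus_matches = [w for w in words if re.search(r"st+", w)]

print(star_matches)
print(plus_matches)
```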
Quantifiers (Contd..)
Character: ? (match 1 or 0 times)
Description: the item immediately to the left must be matched zero or one time for the expression to evaluate as true.
Example:
$var =~ /st?r/ # matches “sr” or “str”, but not “star” or “sttr”
$var =~ /comm?a/ # matches either “coma” or “comma”
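The optional quantifier can be checked with the slide's /comm?a/ pattern. This Python sketch uses `re.fullmatch` so that the whole word has to fit the pattern. The third test word, "commma", is an assumption added to show the upper limit of one.

```python
import re

candidates = ["coma", "comma", "commma"]

# comm?a : "com", then an optional second "m", then "a".
optional_matches = [w for w in candidates if re.fullmatch(r"comm?a", w)]

print(optional_matches)
```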
Quantifiers (Contd..)
Character: {}
Description: specifies how many times the item immediately to the left must be matched.
{n} - must match exactly n times
{n,} - must match at least n times
{n,m} - must match at least n times but not more than m times
Example:
$var =~ /mn{2,4}p/ # matches “mnnp”, “mnnnp”, “mnnnnp”
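The three brace forms can be compared on the same inputs. The sketch below is in Python and builds its test strings from "m", a run of "n" characters, and "p". The helper name `accepted` is an assumption of this example.

```python
import re

# "mnp", "mnnp", ... up to five "n" characters.
candidates = ["m" + "n" * count + "p" for count in range(1, 6)]

def accepted(pattern):
    """Return how many "n" characters each fully matching candidate contains."""
    return [s.count("n") for s in candidates if re.fullmatch(pattern, s)]

exactly_two = accepted(r"mn{2}p")     # {n}   exactly n times
at_least_two = accepted(r"mn{2,}p")   # {n,}  n or more times
two_to_four = accepted(r"mn{2,4}p")   # {n,m} between n and m times

print(exactly_two, at_least_two, two_to_four)
```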
Making Quantifiers Less Greedy

● To make quantifiers less greedy (that is, to match the minimum number of times possible), follow the quantifier with a ?
● *? matches zero or more times, as few as possible.
● +? matches one or more times, as few as possible.
● ?? matches zero or one time, preferring zero.
● {n}? matches exactly n times.
● {n,}? matches at least n times, as few as possible.
● {n,m}? matches at least n times but not more than m times, as few as possible.
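Greedy and non-greedy quantifiers accept the same strings; what changes is how much text a single match consumes. A minimal Python sketch shows the difference, on the assumption that Python's `re` follows the same Perl-style `?` suffix. The sample strings are invented for the example.

```python
import re

text = "<b>bold</b>"

# Greedy: .+ grabs as much as possible, so the match runs to the last ">".
greedy = re.search(r"<.+>", text).group()

# Non-greedy: .+? stops at the first ">" that lets the match succeed.
lazy = re.search(r"<.+?>", text).group()

# The same applies to brace quantifiers.
greedy_braces = re.search(r"a{2,4}", "aaaa").group()
lazy_braces = re.search(r"a{2,4}?", "aaaa").group()

print(greedy, lazy, greedy_braces, lazy_braces)
```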
BRE vs ERE
● In basic regular expressions the
metacharacters "?", "+", "{", "|", "(", and ")" lose
their special meaning; instead use the
backslashed versions "\?", "\+", "\{", "\|", "\(",
and "\)".
● To use ERE syntax, pass the following options:
● grep -E
● sed -r
Examples using grep
● To display only the lines that start with the string "root":
● grep '^root' /etc/passwd
● root:x:0:0:root:/root:/bin/bash
● To see which accounts have no shell assigned at all, search for lines ending in ":":
● grep ':$' /etc/passwd
● news:x:9:13:news:/var/spool/news:
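The two anchored searches can be reproduced without touching a real /etc/passwd. This Python sketch runs over an in-memory sample. The first two lines are the ones shown above, the "operator" line is an assumption added to show a line that contains "root" without starting with it, and `re.MULTILINE` is what makes ^ and $ apply to each line, as they do in grep.

```python
import re

# In-memory stand-in for /etc/passwd (third line is hypothetical).
passwd = (
    "root:x:0:0:root:/root:/bin/bash\n"
    "news:x:9:13:news:/var/spool/news:\n"
    "operator:x:11:0:operator:/root:/sbin/nologin\n"
)

# Like grep '^root': only lines that start with "root".
starts_with_root = re.findall(r"^root.*", passwd, flags=re.MULTILINE)

# Like grep ':$': only lines that end with ":" (no shell field).
no_shell = re.findall(r"^.*:$", passwd, flags=re.MULTILINE)

print(starts_with_root)
print(no_shell)
```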
Character classes
● To display the lines that contain a "y" or an "f":
● grep '[yf]' /etc/group
● sys:x:3:root,bin,adm
● tty:x:5:
● mail:x:12:mail,postfix
● ftp:x:50:
● nobody:x:99:
● floppy:x:19:
● xfs:x:43:
● nfsnobody:x:65534:
● postfix:x:89:


● dog matches the string "dog"
● [dog] matches one character: a "d", an "o" or a "g"
● [dog]* matches a string of zero or more characters from the set {"d", "o", "g"}
● (dog|cat) matches the string "dog" or the string "cat"
● dog.*cat matches the string "dog" followed by the string "cat" somewhere later in the string
● x(dog|cat)x matches the string "dog" or the string "cat" between two "x"s
● xx* matches a string of one or more "x"s
● x+ matches a string of one or more "x"s
● x(dog|cat)?x matches two "x"s with, optionally, the string "dog" or the string "cat" between them
● [aeiou] matches a single lowercase vowel
● [A-Z]+ matches a string of one or more uppercase letters
● [az-]+ matches a string of one or more characters from the set of three characters "a", "z", "-"
● [^a-z]+ matches a string of one or more characters that are not lowercase letters
● "[a-z]" in flex matches exactly the five-character string "[a-z]"
● [a-zA-Z][a-zA-Z0-9]* matches a letter optionally followed by letters or digits
● [1-9][0-9]*|0 matches a non-negative integer with no leading zero, except when the number is zero
● [+-]?[0-9]+ matches an integer with an optional sign (note that leading zeroes are allowed)
● ([0-9].)* matches an even number of characters where every odd-numbered character is a digit
● [+-]?[1-9][0-9]*|0 matches an integer with no leading zero, except when the number is zero; a nonzero number may have an optional sign
● [\^\+\-\:\*\]] matches one of the 6 characters: "^", "+", "-", ":", "*", "]"
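Several of the patterns in this list can be checked mechanically. The sketch below is in Python and uses `re.fullmatch`, so each test string must fit the pattern from start to end. The helper `full` and all test strings are assumptions made for this example, and Python's `re` is used on the assumption that these particular patterns mean the same there as in ERE.

```python
import re

def full(pattern, text):
    """True when the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

checks = {
    # optional group between two "x"s
    "xx": full(r"x(dog|cat)?x", "xx"),
    "xdogx": full(r"x(dog|cat)?x", "xdogx"),
    "xbirdx": full(r"x(dog|cat)?x", "xbirdx"),
    # integer with no leading zero, except zero itself
    "0": full(r"[1-9][0-9]*|0", "0"),
    "42": full(r"[1-9][0-9]*|0", "42"),
    "042": full(r"[1-9][0-9]*|0", "042"),
    # optional sign, leading zeroes allowed
    "-007": full(r"[+-]?[0-9]+", "-007"),
    # digit followed by any character, repeated
    "1a2b": full(r"([0-9].)*", "1a2b"),
    "1a2": full(r"([0-9].)*", "1a2"),
    # a trailing "-" inside brackets is a literal hyphen
    "a-z": full(r"[az-]+", "a-z"),
    "abc": full(r"[az-]+", "abc"),
}

for text, result in checks.items():
    print(repr(text), result)
```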
Regex Snaps
Thank you
