l6 Latest
PATTERN MATCHING:
Filenames Versus Patterns, Metacharacters, Search Patterns, Replacement Patterns,
Metacharacters, Listed by Program, Examples of Searching, Examples of Searching
and Replacing
Text Book 2: Ellen Siever, Stephen Figgins, Robert Love, and Arnold
Pattern Matching
●
A number of Linux text-processing utilities let you
search for, and in some cases change, text patterns
rather than fixed strings.
●
These utilities include the editing programs ed,
ex, vi, and sed; the gawk programming
language; and the commands grep and egrep.
●
Text patterns (called regular expressions in
computer science literature) contain normal
characters mixed with special characters (called
metacharacters).
Filenames Versus Patterns
●
Metacharacters used in pattern matching are
different from metacharacters used for filename
expansion.
●
However, several metacharacters have meaning
for both regular expressions and for filename
expansion.
●
This can lead to a problem: the shell sees the
command line first, and can potentially interpret an
unquoted regular expression metacharacter as a
filename expansion.
●
For example, the command:
●
$ grep [A-Z]* chap[12]
●
could be transformed by the shell into:
●
$ grep Array.c Bug.c Comp.c chap1 chap2
●
and grep would then try to find the pattern Array.c in files Bug.c,
Comp.c, chap1, and chap2.
●
To bypass the shell and pass the special characters to grep, use
quotes as follows:
●
$ grep "[A-Z]*" chap[12]
●
Double quotes suffice in most cases, but single quotes are the safest
bet, since the shell does absolutely no expansions on single-quoted
text.
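The effect of quoting can be checked without running grep at all. The sketch below is only an illustration: it assumes a scratch directory holding the files named in the example (their contents are made up), and it uses echo to show the words grep would receive.

```shell
# Use the C locale so that [A-Z] means exactly the uppercase ASCII letters.
LC_ALL=C; export LC_ALL

# Scratch directory with the files assumed by the example above.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
trap 'cd / && rm -rf "$tmp"' EXIT
touch Array.c Bug.c Comp.c
echo 'Section One' > chap1
echo 'section two' > chap2

# Unquoted: the shell expands [A-Z]* before grep ever runs.
echo grep [A-Z]* chap[12]
# prints: grep Array.c Bug.c Comp.c chap1 chap2

# Single-quoted: the pattern is passed through untouched.
echo grep '[A-Z]*' chap[12]
# prints: grep [A-Z]* chap1 chap2
```

Note that chap[12] is deliberately left unquoted in both commands, because there the filename expansion is wanted.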
●
Metacharacters
●
Different metacharacters have different meanings, depending upon where they
are used.
●
In particular, regular expressions used for searching through text (matching)
have one set of metacharacters, while the metacharacters used when
processing replacement text (such as in a text editor) have a different set.
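The difference between the two sets can be seen with sed, one of the editors listed earlier. This is a minimal sketch using a sample line taken from the emp.dat examples; the particular substitutions are chosen only for illustration.

```shell
line='300 xyz director sales 12/10/2013'

# In the search pattern, "." is a metacharacter matching any one character;
# in the replacement, "&" stands for the whole matched text.
echo "$line" | sed 's/s.les/[&]/'
# prints: 300 xyz director [sales] 12/10/2013

# In the replacement, "." is an ordinary character.
echo "$line" | sed 's/sales/s.les/'
# prints: 300 xyz director s.les 12/10/2013

# \( \) capture parts of the match; \1 and \2 recall them in the replacement.
echo "$line" | sed 's/\(xyz\) \(director\)/\2 \1/'
# prints: 300 director xyz sales 12/10/2013
```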
Search Patterns
Searching for a Pattern in a File
Examples:
(i) $ grep "sales" emp.dat gives the following results:
200 abc manager sales 10/12/2012
300 xyz director sales 12/10/2013
(ii) $ grep president emp.dat prints nothing, because no line contains president
• No quoting is necessary here. The command fails (grep returns a nonzero exit
status) because the string president is missing from the file. Quoting is
essential if the search string has multiple words in it.
(iii) $ grep "sales" emp1.dat emp2.dat
grep can be used with multiple filenames. Here it displays the filename along
with each matching line.
(iv) $ grep "jai sharma" emp.dat
The quotes are required here because the pattern contains a space; around a
single word they are redundant.
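The two points above (the exit status when nothing matches, and quoting a multi-word pattern) can be tried on a small sample file. In this sketch the first two lines of emp.dat come from example (i); the third line is invented purely so that "jai sharma" has something to match.

```shell
tmp=$(mktemp -d); trap 'rm -rf "$tmp"' EXIT
cat > "$tmp/emp.dat" <<'EOF'
200 abc manager sales 10/12/2012
300 xyz director sales 12/10/2013
400 jai sharma clerk admin 01/01/2014
EOF

# No line matches: grep prints nothing and returns exit status 1.
grep president "$tmp/emp.dat" || echo "no match, exit status $?"
# prints: no match, exit status 1

# Quoted, the two words form a single pattern.
grep 'jai sharma' "$tmp/emp.dat"
# prints: 400 jai sharma clerk admin 01/01/2014

# Without the quotes, grep would search for "jai" and treat "sharma"
# as the name of a file to search.
```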
grep With Options
Examples:
(i) $ grep -i 'sharma' emp1.dat
locates the name sharma regardless of case
• The option -i ignores case during pattern matching
• For example, one of the output lines can be:
2000 Sudhir Sharma director marketing 10/10/2000
(ii) $ grep -v 'director' emp.dat > newfile
• The option -v selects all lines except those containing the pattern 'director'
(iii) $ grep -n 'marketing' emp1.dat
• The option -n prefixes each matching line with its line number in the file
For example, the matching lines below would each be shown with a line number
and a colon in front:
2000 Sudhir Sharma director marketing 10/10/2000
2004 Jaya Sharma manager marketing 12/10/2015
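The three options can be compared side by side on one file. This sketch assumes a three-line emp1.dat: the first and last lines are the sample records shown above, and the middle line is made up so that -v and -n have something to skip.

```shell
tmp=$(mktemp -d); trap 'rm -rf "$tmp"' EXIT
cat > "$tmp/emp1.dat" <<'EOF'
2000 Sudhir Sharma director marketing 10/10/2000
2002 Anil Kumar manager sales 05/06/2010
2004 Jaya Sharma manager marketing 12/10/2015
EOF

# -i: the lowercase pattern still matches "Sharma" (lines 1 and 3).
grep -i 'sharma' "$tmp/emp1.dat"

# -v: keep every line that does NOT contain "director" (lines 2 and 3).
grep -v 'director' "$tmp/emp1.dat" > "$tmp/newfile"

# -n: prefix each matching line with its line number and a colon.
grep -n 'marketing' "$tmp/emp1.dat"
# prints:
# 1:2000 Sudhir Sharma director marketing 10/10/2000
# 3:2004 Jaya Sharma manager marketing 12/10/2015
```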
Using Regular Expressions Having Metacharacters With grep
●
A caret (^) metacharacter indicates the beginning of the line.
The following command finds any line in the file list that starts with
the letter b.
$ grep '^b' list
●
A dollar-sign ($) metacharacter indicates the end of the line.
The following command displays any line in which b is the
last character on the line.
$ grep 'b$' list
●
The following command displays any line in the file list where b
is the only character on the line.
●
$ grep '^b$' list
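The three anchored patterns can be compared on a small file. The contents of list below are an assumption made for this sketch; any short word list would do.

```shell
tmp=$(mktemp -d); trap 'rm -rf "$tmp"' EXIT
printf '%s\n' b bob club about > "$tmp/list"

# Lines that start with b.
grep '^b' "$tmp/list"
# prints: b, bob (one per line)

# Lines that end with b.
grep 'b$' "$tmp/list"
# prints: b, bob, club (one per line)

# Lines that consist of b and nothing else.
grep '^b$' "$tmp/list"
# prints: b
```

The line about contains a b but is never printed, because none of the three patterns allows the b to sit in the middle of a line.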
●
Within a regular expression, dot (.) matches any single character. The
following command matches any three-character string with “an” as the
first two characters, including “any,” “and,” “management,” and “plan”
followed by a space (because spaces count, too).
$ grep 'an.' list
●
When an asterisk (*) follows a character, grep interprets the asterisk as
“zero or more instances of that character.” When the asterisk follows a
regular expression, grep interprets the asterisk as “zero or more
instances of characters matching the pattern.”
●
Because it includes zero occurrences, the asterisk can create a
confusing command output. If you want to find all lines with the letters
“qu” in them, the following command works, although strictly it matches
a “q” followed by zero or more “u” characters, so a line containing “q”
alone also matches.
$ grep 'qu*' list
●
However, if you want to find all lines containing the
letter “n,” the pattern 'n*' is not enough: it also matches zero
occurrences, so every line matches. Type the following command instead.
$ grep 'nn*' list
●
If you want to find all lines containing the pattern
“nn,” type the following command.
●
$ grep 'nnn*' list
●
To match zero or more occurrences of any
character in list, type the following command. The pattern is quoted
so that the shell does not treat .* as a filename pattern.
●
$ grep '.*' list
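Because "zero or more" is easy to misread, it helps to count matching lines for a few asterisk patterns on the same file. The word list here is invented for the sketch; grep -c prints the number of matching lines.

```shell
tmp=$(mktemp -d); trap 'rm -rf "$tmp"' EXIT
printf '%s\n' queen iraq sun dinner apple > "$tmp/list"

# q followed by zero or more u: matches queen and also iraq.
grep -c 'qu*' "$tmp/list"    # prints: 2

# Zero or more n: every line qualifies, even apple.
grep -c 'n*' "$tmp/list"     # prints: 5

# One n followed by zero or more n: lines with at least one n.
grep -c 'nn*' "$tmp/list"    # prints: 3

# Two n followed by zero or more n: lines containing "nn".
grep -c 'nnn*' "$tmp/list"   # prints: 1

# Any character, zero or more times: again every line.
grep -c '.*' "$tmp/list"     # prints: 5
```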
The following table lists common search pattern
elements you can use with grep.

Character    Matches
^            The beginning of a text line
$            The end of a text line
.            Any single character
[...]        Any single character in the bracketed list or range
[^...]       Any single character not in the list or range
*            Zero or more occurrences of the preceding character or regular expression
.*           Zero or more occurrences of any single character
\            Escapes the special meaning of the next character
●
Searching for Metacharacters
●
To use the grep command to search for metacharacters such as & ! . * ? and \,
precede the metacharacter with a backslash (\), which removes its special
meaning.
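A short sketch of the backslash escape, using a made-up four-line file. The single quotes keep the shell from touching the backslash, so grep sees it.

```shell
tmp=$(mktemp -d); trap 'rm -rf "$tmp"' EXIT
printf '%s\n' '3.14' '3514' 'a*b' 'aab' > "$tmp/list"

# Unescaped dot matches any character, so 3514 matches as well.
grep '3.14' "$tmp/list"
# prints: 3.14 and 3514 (one per line)

# Escaped dot matches only a literal period.
grep '3\.14' "$tmp/list"
# prints: 3.14

# Escaped asterisk matches only a literal asterisk.
grep 'a\*b' "$tmp/list"
# prints: a*b
```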
LINUX PROGRAMMING
(CSE 4303)