
Regular Expression

Regular expressions are patterns used to match character combinations in strings. They originated in the 1950s and became popular with Unix text-processing utilities. Some key advantages of regular expressions include being more concise than equivalent code and being easier for non-programmers to use than procedural code. They allow operations like matching alternatives with |, grouping with (), and quantification of elements with ?, *, +, {n}, etc. Regular expressions are now widely supported in programming languages and text editors.

Uploaded by Fawzi Gharib
Copyright © All Rights Reserved

Ministry of Higher Education & Scientific Research
Salahaddin University - College of Science
Computer Department – 2nd Stage

Regular Expression
Prepared by: Bilal Kamaran Saed
Contents
Introduction – What is a Regular Expression?
History
Advantages of Regular Expressions
Basic Concept
Writing a regular expression pattern
    Using simple patterns
    Using special characters
Escaping
Introduction – What is a Regular Expression?
A regular expression (shortened as regex or regexp; also referred to as a rational
expression) is a sequence of characters that specifies a search pattern in text.
Usually such patterns are used by string-searching algorithms for "find" or "find
and replace" operations on strings, or for input validation. It is a technique
developed in theoretical computer science and formal language theory.
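To make the three typical uses above concrete (find, find and replace, and input validation), here is a minimal JavaScript sketch; JavaScript is the syntax used in the later sections of this report. The sample sentence and the "digits only" validation rule are illustrative assumptions, not part of the original text.

```javascript
const text = "The color of the sky is grey, not gray.";

// "Find": search() returns the index of the first match, or -1 if none.
const firstIndex = text.search(/gr[ae]y/);

// "Find and replace": the g flag makes replace() act on every match.
const replaced = text.replace(/gr[ae]y/g, "blue");

// Input validation (hypothetical rule): ^ and $ anchor the pattern to the
// whole string, so the input must consist only of one or more digits.
function isAllDigits(input) {
  return /^\d+$/.test(input);
}

console.log(firstIndex);          // 24
console.log(replaced);            // The color of the sky is blue, not blue.
console.log(isAllDigits("2024")); // true
console.log(isAllDigits("20a4")); // false
```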
The concept of regular expressions began in the 1950s, when the American
mathematician Stephen Cole Kleene formalized the description of a regular
language. They came into common use with Unix text-processing utilities.
Different syntaxes for writing regular expressions have existed since the 1980s,
one being the POSIX standard and another, widely used, being the Perl syntax.
Regular expressions are used in search engines, search and replace dialogs of word
processors and text editors, in text processing utilities such as sed and AWK and
in lexical analysis. Many programming languages provide regex capabilities, either
built in or via libraries, because regexes are useful in many situations.
History
Regular expressions originated in 1951, when mathematician Stephen Cole
Kleene described regular languages using his mathematical notation
called regular events. These arose in theoretical computer science, in the
subfields of automata theory (models of computation) and the description and
classification of formal languages. Other early implementations of pattern
matching include the SNOBOL language, which did not use regular expressions,
but instead its own pattern matching constructs.
Regular expressions entered popular use from 1968 in two areas: pattern matching
in a text editor and lexical analysis in a compiler. Among the first appearances of
regular expressions in program form was when Ken Thompson built Kleene's
notation into the editor QED as a means to match patterns in text files. For speed,
Thompson implemented regular expression matching by just-in-time
compilation (JIT) to IBM 7094 code on the Compatible Time-Sharing System, an
important early example of JIT compilation. He later added this capability to the
Unix editor ed, which eventually led to the popular search tool grep's use of
regular expressions ("grep" is a word derived from the command for regular
expression searching in the ed editor: g/re/p, meaning "Global search for Regular
Expression and Print matching lines"). Around the same time that Thompson
developed QED, a group of researchers including Douglas T. Ross implemented a
tool based on regular expressions that is used for lexical analysis
in compiler design.
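The name g/re/p describes the operation itself: apply a regular expression to every line (global) and print the lines that match. The JavaScript sketch below mimics only that idea; the helper name grepLines and the sample text are hypothetical, and this is not how grep is actually implemented.

```javascript
// Hypothetical helper mimicking ed's g/re/p: keep only the lines that
// contain a match. The pattern should not carry the g flag, because
// test() on a global regex remembers lastIndex between calls.
function grepLines(pattern, text) {
  return text.split("\n").filter((line) => pattern.test(line));
}

const sample = "apple pie\nbanana split\ncherry tart\napple crumble";
for (const line of grepLines(/^apple/, sample)) {
  console.log(line); // prints "apple pie", then "apple crumble"
}
```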
Many variations of these original forms of regular expressions were used
in Unix programs at Bell Labs in the 1970s, including vi, lex, sed, AWK, and expr,
and in other programs such as Emacs. Regexes were subsequently adopted by a
wide range of programs, with these early forms standardized in
the POSIX.2 standard in 1992.
In the 1980s the more complicated regexes arose in Perl, which originally derived
from a regex library written by Henry Spencer (1986), who later wrote an
implementation of Advanced Regular Expressions for Tcl. The Tcl library is a
hybrid NFA/DFA implementation with improved performance characteristics.
Software projects that have adopted Spencer's Tcl regular expression
implementation include PostgreSQL. Perl later expanded on Spencer's original
library to add many new features. Part of the effort in the design
of Raku (formerly named Perl 6) is to improve Perl's regex integration, and to
increase their scope and capabilities to allow the definition of parsing expression
grammars. The result is a mini-language called Raku rules, which are used to
define Raku grammar as well as provide a tool to programmers in the language.
These rules maintain existing features of Perl 5.x regexes, but also allow BNF-style
definition of a recursive descent parser via sub-rules.
The use of regexes in structured information standards for document and
database modeling started in the 1960s and expanded in the 1980s when industry
standards like ISO SGML (preceded by ANSI "GCA 101-1983") consolidated. The
kernel of the structure specification language standards consists of regexes. Their
use is evident in the DTD element group syntax.
Starting in 1997, Philip Hazel developed PCRE (Perl Compatible Regular
Expressions), which attempts to closely mimic Perl's regex functionality and is
used by many modern tools including PHP and Apache HTTP Server.
Today, regexes are widely supported in programming languages, text processing
programs (particularly lexers), advanced text editors, and some other programs.
Regex support is part of the standard library of many programming languages,
including Java and Python, and is built into the syntax of others, including Perl
and ECMAScript. An implementation of regex functionality is often called a regex
engine, and a number of libraries are available for reuse. In the late 2010s, several
companies started to offer hardware, FPGA, and GPU implementations
of PCRE-compatible regex engines that are faster than
CPU implementations.

Advantages of Regular Expressions

• More concise than equivalent procedural code
• A single line of regex can often replace many lines of procedural code
• Easier to cut and paste between programs than code
• Easy to create by trial and error
• Easier for non-programmers to use than procedural code
• Often less error-prone than hand-written matching code
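To illustrate the "one line instead of many" point, the JavaScript sketch below checks the same format twice: once with a regex and once with a hand-written loop. The YYYY-MM-DD format and both function names are hypothetical choices for illustration. Both functions check only the shape of the string, not whether the date actually exists.

```javascript
// Regex version: four digits, a hyphen, two digits, a hyphen, two digits.
function looksLikeDateRegex(s) {
  return /^\d{4}-\d{2}-\d{2}$/.test(s);
}

// Procedural version of the same format check, written by hand.
function looksLikeDateManual(s) {
  if (s.length !== 10) return false;
  for (let i = 0; i < s.length; i++) {
    const ch = s[i];
    if (i === 4 || i === 7) {
      if (ch !== "-") return false; // positions 4 and 7 must be hyphens
    } else if (ch < "0" || ch > "9") {
      return false; // every other position must be a digit
    }
  }
  return true;
}

for (const s of ["2024-05-17", "2024-5-17", "20x4-05-17"]) {
  console.log(s, looksLikeDateRegex(s), looksLikeDateManual(s));
}
```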
Basic Concept
A regular expression, often called a pattern, specifies a set of strings required for a
particular purpose. A simple way to specify a finite set of strings is to list
its elements or members. However, there are often more concise ways: for
example, the set containing the three strings "Handel", "Händel", and "Haendel"
can be specified by the pattern H(ä|ae?)ndel; we say that this
pattern matches each of the three strings. In most formalisms, if there exists at
least one regular expression that matches a particular set then there exists an
infinite number of other regular expressions that also match it—the specification
is not unique. Most formalisms provide the following operations to construct
regular expressions.
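The claim that H(ä|ae?)ndel matches exactly those three spellings can be checked in JavaScript. In this sketch, the anchors ^ and $ are an addition so that the whole string must match, "Hendel" is an extra non-matching test string, and the second pattern is one hypothetical example of a different regex describing the same set, which shows that the specification is not unique.

```javascript
// H, then either "ä" or "a" optionally followed by "e", then "ndel".
const composer = /^H(ä|ae?)ndel$/;

// A different pattern for the same three strings.
const alternative = /^H(?:ä|a|ae)ndel$/;

for (const name of ["Handel", "Händel", "Haendel", "Hendel"]) {
  // logs true for the first three names and false for "Hendel"
  console.log(name, composer.test(name), alternative.test(name));
}
```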

Boolean "or"
A vertical bar separates alternatives. For example, gray|grey can match
"gray" or "grey".
Grouping
Parentheses are used to define the scope and precedence of
the operators (among other uses). For example, gray|grey and gr(a|e)y
are equivalent patterns which both describe the set of "gray" or "grey".
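The JavaScript sketch below checks that the two patterns accept the same words and shows why grouping matters for precedence: | binds more loosely than the anchors ^ and $, so without parentheses the anchors attach to only one alternative. The test words are illustrative.

```javascript
// Alternation: | separates two complete alternatives.
const alternation = /gray|grey/;
// Grouping: the parentheses limit | to the single letter a or e.
const grouped = /gr(a|e)y/;

for (const word of ["gray", "grey", "griy"]) {
  console.log(word, alternation.test(word), grouped.test(word));
}

// Without a group, this means "gray at the start" OR "grey at the end",
// so it also matches "grayish".
const looseAnchors = /^gray|grey$/;
// With a group, both anchors apply to both alternatives.
const anchoredGroup = /^(gray|grey)$/;
console.log(looseAnchors.test("grayish"));  // true
console.log(anchoredGroup.test("grayish")); // false
```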
Quantification
A quantifier after a token (such as a character) or group specifies how often
the preceding element is allowed to occur. The most common quantifiers
are the question mark ?, the asterisk * (derived from the Kleene star), and
the plus sign + (Kleene plus).
? The question mark indicates zero or one occurrences of the
preceding element. For example, colou?r matches both "color"
and "colour".
* The asterisk indicates zero or more occurrences of the
preceding element. For example, ab*c matches "ac", "abc",
"abbc", "abbbc", and so on.
+ The plus sign indicates one or more occurrences of the
preceding element. For example, ab+c matches "abc", "abbc",
"abbbc", and so on, but not "ac".
{n} The preceding item is matched exactly n times.
{min,} The preceding item is matched min or more times.
{,max} The preceding item is matched up to max times. (Not every flavor
accepts this form; JavaScript, for example, requires {0,max}.)
{min,max} The preceding item is matched at least min times, but not more
than max times.
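The quantifiers above can be tried out in JavaScript. JavaScript does not support the {,max} form, so this sketch spells it {0,max}. Each pattern is anchored with ^ and $ so that it must match the whole test string, and the test strings themselves are illustrative.

```javascript
// [pattern, strings to test]; anchors force a whole-string match.
const examples = [
  [/^colou?r$/, ["color", "colour", "colouur"]], // ?  : zero or one "u"
  [/^ab*c$/, ["ac", "abc", "abbbc"]],            // *  : zero or more "b"
  [/^ab+c$/, ["ac", "abc", "abbbc"]],            // +  : one or more "b"
  [/^a{3}$/, ["aa", "aaa", "aaaa"]],             // {n}: exactly 3
  [/^a{2,}$/, ["a", "aa", "aaaaa"]],             // {min,}: 2 or more
  [/^a{0,2}$/, ["", "aa", "aaa"]],               // {0,max}: up to 2
  [/^a{2,3}$/, ["a", "aa", "aaa", "aaaa"]],      // {min,max}: 2 to 3
];

for (const [pattern, inputs] of examples) {
  for (const s of inputs) {
    console.log(String(pattern), JSON.stringify(s), pattern.test(s));
  }
}
```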
Writing a regular expression pattern
A regular expression pattern is composed of simple characters, such as /abc/, or a
combination of simple and special characters, such as /ab*c/ or
/Chapter (\d+)\.\d*/. The last example includes parentheses, which act as a
memory device: the text matched by this part of the pattern is remembered for
later use, as described in the "Using groups" section of the MDN Regular
Expressions guide.
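A short JavaScript sketch shows this "memory" at work: exec() returns an array whose element 0 is the whole match and whose element 1 is the text captured by the first pair of parentheses. The input sentence is made up for illustration.

```javascript
// (\d+) captures the chapter number; \. matches a literal dot.
const chapter = /Chapter (\d+)\.\d*/;
const match = chapter.exec("Open Chapter 4.3, paragraph 6");

if (match !== null) {
  console.log(match[0]); // "Chapter 4.3" (the whole match)
  console.log(match[1]); // "4" (the remembered group)
}
```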

Using simple patterns


Simple patterns are constructed of characters for which you want to find a direct
match. For example, the pattern /abc/ matches character combinations in strings
only when the exact sequence "abc" occurs (all characters together and in that
order). Such a match would succeed in the strings "Hi, do you know your
abc's?" and "The latest airplane designs evolved from slabcraft.". In both cases the
match is with the substring "abc". There is no match in the string "Grab
crab" because while it contains the substring "ab c", it does not contain the exact
substring "abc".
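The three example strings can be checked directly in JavaScript. In this sketch, match() returns an array for a hit (element 0 is the matched text) or null when the exact sequence "abc" is absent.

```javascript
const simple = /abc/;
const inputs = [
  "Hi, do you know your abc's?",
  "The latest airplane designs evolved from slabcraft.",
  "Grab crab", // contains "ab c", not "abc"
];

for (const s of inputs) {
  const m = s.match(simple);
  console.log(JSON.stringify(s), m ? m[0] : "no match");
}
```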

Using special characters


When the search for a match requires something more than a direct match, such
as finding one or more b's, or finding white space, you can include special
characters in the pattern. For example, to match a single "a" followed by zero or
more "b"s followed by "c", you'd use the pattern /ab*c/: the * after "b" means "0
or more occurrences of the preceding item." In the string "cbbabbbbcdebc", this
pattern will match the substring "abbbbc".
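The same search can be reproduced in JavaScript. This sketch also shows that * accepts zero occurrences, so "ac" with no "b" matches too; the second test string is an illustrative addition.

```javascript
// First "a", then zero or more "b", then "c".
const found = "cbbabbbbcdebc".match(/ab*c/);
console.log(found[0]);    // "abbbbc"
console.log(found.index); // 3 (position where the match starts)

// Zero "b"s is allowed by *, so "ac" also matches.
console.log(/ab*c/.test("xacx")); // true
```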

The special characters fall into the following categories:

• Assertions
  o Assertions include boundaries, which indicate the beginnings and endings of
    lines and words, and other patterns indicating in some way that a match is
    possible (including look-ahead, look-behind, and conditional expressions).
• Character classes
  o Distinguish different types of characters, for example, letters and digits.
• Groups and ranges
  o Indicate groups and ranges of expression characters.
• Quantifiers
  o Indicate numbers of characters or expressions to match.
• Unicode property escapes
  o Distinguish based on Unicode character properties, for example, upper- and
    lower-case letters, math symbols, and punctuation.

All the special characters that can be used in regular expressions are summarized
below, grouped by category.

Special characters in regular expressions (category: characters / constructs)
• Character classes: \, ., \cX, \d, \D, \f, \n, \r, \s, \S, \t, \v, \w, \W, \0, \xhh,
  \uhhhh, \u{hhhhh}, [\b]
• Assertions: ^, $, x(?=y), x(?!y), (?<=y)x, (?<!y)x, \b, \B
• Groups and ranges: (x), (?:x), (?<Name>x), x|y, [xyz], [^xyz], \Number
• Quantifiers: *, +, ?, x{n}, x{n,}, x{n,m}
• Unicode property escapes: \p{UnicodeProperty}, \P{UnicodeProperty}
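Each of the five categories can be seen in a one-line JavaScript check. The strings below are invented for illustration; named groups and \p{...} escapes need a modern engine (ES2018 or later), and \p{...} only works with the u flag.

```javascript
// Assertions: ^ and $ anchor the match, \b marks a word boundary,
// and x(?=y) matches x only if y follows (without consuming y).
console.log(/^cat$/.test("cat"));                 // true
console.log(/\bcat\b/.test("concatenate"));       // false: "cat" is inside a word
console.log("100px 200em".match(/\d+(?=px)/)[0]); // "100"

// Character classes: \d matches a digit.
console.log("Room 42B".match(/\d+/)[0]);          // "42"

// Groups and ranges: (?<name>...) is a named capturing group.
const date = "2024-05-17".match(/(?<year>\d{4})-(?<month>\d{2})/);
console.log(date.groups.year, date.groups.month); // 2024 05

// Quantifiers with a range: exactly two letters between a and f.
console.log(/^[a-f]{2}$/.test("be"));             // true

// Unicode property escapes: \p{Lu} is an uppercase letter (u flag required).
console.log("élan Ünal".match(/\p{Lu}/u)[0]);     // "Ü"
```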
Escaping
If you need to use any of the special characters literally (actually searching for
a "*", for instance), you must escape it by putting a backslash in front of it. For
instance, to search for "a" followed by "*" followed by "b", you'd use /a\*b/; the
backslash "escapes" the "*", making it literal instead of special.
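A quick JavaScript comparison shows the difference the backslash makes. The test strings are illustrative.

```javascript
const literalStar = /a\*b/;   // \* matches an actual asterisk
const starQuantifier = /a*b/; // * means "zero or more a"

console.log(literalStar.test("a*b"));     // true
console.log(literalStar.test("aaab"));    // false: no asterisk in the text
console.log(starQuantifier.test("aaab")); // true
```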

Similarly, if you're writing a regular expression literal and need to match a slash
("/"), you need to escape that (otherwise, it terminates the pattern). For instance,
to search for the string "/example/" followed by one or more alphabetic
characters, you'd use /\/example\/[a-z]+/i; the backslashes before each slash
make them literal.

To match a literal backslash, you need to escape the backslash. For instance, to
match the string "C:\" where "C" can be any letter, you'd use /[A-Z]:\\/; the first
backslash escapes the one after it, so the expression searches for a single literal
backslash.
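Both escaped-delimiter examples can be checked in JavaScript. The test strings are invented. Note that inside a JavaScript string, "C:\\Windows" contains only one backslash character.

```javascript
// Escaped slashes: "/example/" followed by letters; i ignores case.
const examplePath = /\/example\/[a-z]+/i;
console.log(examplePath.test("/example/Docs")); // true
console.log(examplePath.test("example/Docs"));  // false: no leading slash

// Escaped backslash: one letter, a colon, and one literal backslash.
const drive = /[A-Z]:\\/;
console.log(drive.test("C:\\Windows")); // true
console.log(drive.test("C:/Windows"));  // false
```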

If using the RegExp constructor with a string literal, remember that the backslash
is an escape in string literals, so to use it in the regular expression, you need to
escape it at the string literal level. /a\*b/ and new RegExp("a\\*b") create the
same expression, which searches for "a" followed by a literal "*" followed by "b".
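The source property of a RegExp shows the pattern text the engine actually receives, which makes the double escaping visible. The "mistaken" case is an added illustration of what happens when the string-level escape is forgotten.

```javascript
const fromLiteral = /a\*b/;
// In the string "a\\*b", \\ becomes one backslash, so the pattern is a\*b.
const fromConstructor = new RegExp("a\\*b");

console.log(fromLiteral.source === fromConstructor.source); // true
console.log(fromConstructor.test("a*b"));                   // true

// Common mistake: in "a\*b" the string escape \* is just "*", so the
// pattern becomes a*b and the asterisk is a quantifier again.
const mistaken = new RegExp("a\*b");
console.log(mistaken.source);       // a*b
console.log(mistaken.test("aaab")); // true
```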

If the escapes are not already part of your pattern, you can add them
using String.prototype.replace():

```javascript
function escapeRegExp(string) {
  // Prefix every special character with a backslash.
  // In the replacement string, $& means the whole matched string.
  return string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}
```
The "g" after the regular expression is an option or flag that performs a global
search: it looks through the whole string and acts on all matches, not only the first.
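As a usage sketch, escapeRegExp (repeated here so the block runs on its own) can prepare user-supplied text for the RegExp constructor. The g flag then makes replace() act on every occurrence instead of only the first. The search text and the replacement word are invented for illustration.

```javascript
function escapeRegExp(string) {
  // $& in the replacement inserts the matched special character.
  return string.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const needle = "1+1=2";
const escaped = escapeRegExp(needle);     // pattern text 1\+1=2
const pattern = new RegExp(escaped, "g"); // g: replace every occurrence

const haystack = "1+1=2 and again 1+1=2";
console.log(haystack.replace(pattern, "two")); // "two and again two"

// Without g, only the first occurrence is replaced.
console.log(haystack.replace(new RegExp(escaped), "two")); // "two and again 1+1=2"
```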
References
• https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions
• https://en.wikipedia.org/wiki/Regular_expression
• http://www.troubleshooters.com/linux/presentations/leap_regex/7.html
