Open navigation menu
Close suggestions
Search
Search
en
Change Language
Upload
Sign in
Sign in
Download free for days
0 ratings
0% found this document useful (0 votes)
17 views
Regular Expression
Uploaded by
alok_b
Copyright
© © All Rights Reserved
Available Formats
Download as PDF or read online on Scribd
Download now
Download
Save Regular Expression For Later
Download
Save
Save Regular Expression For Later
0%
0% found this document useful, undefined
0%
, undefined
Embed
Share
Print
Report
0 ratings
0% found this document useful (0 votes)
17 views
Regular Expression
Uploaded by
alok_b
Copyright
© © All Rights Reserved
Available Formats
Download as PDF or read online on Scribd
Download now
Download
Save Regular Expression For Later
Carousel Previous
Carousel Next
Save
Save Regular Expression For Later
0%
0% found this document useful, undefined
0%
, undefined
Embed
Share
Print
Report
Download now
Download
You are on page 1
/ 3
Search
Fullscreen
6 Regular Expressions Let’s now take a different approach to categorizing problems. Instead of focusing on the power of « computing device, let's look at the task that we need to perform. In particular, let's consider problems in which our goal is to match finite or repeating pattems. For example, consider: © The first step of compiling a program: this step is called lexical analysis. Its job is to break the source code into meaningful units such as keywords, variables, and numbers. For example, the string void may be a keyword, while the string 232-12 should be recognized as a floating point number + Filtering email for spam. + Sorting email into appropriate mailboxes based on sender and/or content words and phrases. # Searching a complex directory structure by specifying patterss that are known to occur in the file we want. In this chapter, we will define a simple pattern language. It has limitations. But its strength, as we will soon see, is that we ean implement pattern matching for this language using finite state machines In his classic book, A Pattern Language B, Christopher Alexander described common patterns that can be found in successful buildings, towns and cities. Software engineers read Alexander’s ‘work and realized that the same is true of successful programs and systems. Pattems are ubiquitous in our world, 6.1 What is a Regular Expression? “The repular expression language that we are about to deserbe is built onan alphabet that contains two kinds of symbols: + asel ofspecial symbols to which we will atach particular meanings when they occur ina regular expression, These symbols are , U,¢,(,), *, and ~ # an alphabet 2, which contains the symbols that regular expressions will match against A regular expression © is string that can be formed according to the following rules: is a regular expression, is a regular expression, Every element in ¥ isa regular expression, Given two regular expressions a. and B, a is a regular expression, Given two regular expressions a. and ,, 0. U B is a regular expression Given a regular expression a, a is a regular expression, Given a regular expression a, ais a regular expression, Given a regular expression a, (a) is a regular expression, So, ifwe let ¥ = {a, b}, the following strings are regular expressions: Bea, Jb)", abba Ue. . ‘The language of regular expressions, as we have just defined it, is useful because every regular expression has a ‘meaning (just like every English sentence and every Java program). In the case of regular expressions, the meaning, ‘ofa string is another language. In other words, every string a (such as alba U e) in the regular expression language ‘has, as its meaning, some new language that contains exactly the strings that match the pattern specified in a, To make it possible to determine that meaning, we need to describe a semantic interpretation function for regular expressions. Fortunately, the regular expressions language is simple, So designing a compositional semantic interpretation function (as defined in Section 2.2.6) for itis straightforward, As you read the definition that we are about to present, it will become clear why we chose the particular symbol alphabet we did. In particular, you will, Chapter 6 92 Regular Expressionsnotice the similarity between the operations that are allowed in regular expressions and the operations that we defined ‘on languages in Section 2.2 Define the following semantic interpretation function L for the language of regular expressions: 1, L(@)= 2, the language that contains no strings, 2. L(@)= {e} the language that contains just the empty string, 3. Forany ¢ © E, L(c) = {c}, the language that contains the single, one-character string ¢ 4, For any regular expressions «and B, L(«B) = L(«) L(B). In other words, o form the meening of the concatenation of two regular expressions, first determine the meaning of each of the constituents, Both meanings will be Tanguages. Then concatenate the two languages together. Recall that the concatenation of two languages £ and La is {w= xy, whore x € Ly and y © Lz}. Note that, if either L(at) or LB) is equal to Z, then the concatenation will also be equal to 2. 5. Forany regular expressions a and B, (ce B) = L(a) UW L(P). Again we form the meaning ofthe larger expression by first determining the meaning of cach of the constituents, Each of them is a language. The meaning of aU B then, as suggested by our choice of the character U as an operator, isthe union of the two constituent languages. 6. For any regular expression a, L(«*) = (L(a))*, where * is the Kleene star operator defined in Section 2.2.5. So L(a*) isthe language that is formed by concatenating together zero or more stings drawn from L(q) 7. For any regular expression 0, La’) ~ L(aa*) = L(a) (L(a))*. If La) is equal to Z%, then L(a") is also equal to D. Otherwise L(a) is the language that ig formed by concatenating together one or more strings drawn from a, 8. For any regular expression a, L((a)) ~ L(@). In other words, parentheses have no effect on meaning except to ‘group the constituents in an expression, Ifthe meaning of a regular expression a is the language L, then we say that a defines or describes L. The definition that we have just given for the regular expression language contains three kinds of rules: + Rules 1, 3, 4 5, and 6 give the language its power to define sets, starting with the basic sets defined by rules 1 and 3, and then building larger sets using the operators defined by rules 4,5, and 6. # Rule 8 has as its only role grouping other operators Rules 2 and 7 appear to add functionality to the regular expression language. But in fact they don; they serve only to provide convenient shorthands for languages that can be defined using only rules 1, 3-6, and 8. Let's see why. First consider rule 2: the language of regular expressions does not need the symbol ¢ because it has an alternative mechanism for describing L(c). Observe that L(@*) ~ {w: wis formed by concatenating together zero or more strings from @}. But how many ways are there fo concatenate together zero or more strings from @? If we select zero strings to concatenate, we get c. We cannot select more than zero since there aren’t any to choose from. So L(2*) = (c} ‘Thus, whenever we would like to write ¢, we could instead write @*. Ibis much elearerto write e, and we shall. But, ‘whenever we wish to make a formal statement about regular expressions or the languages they define, we need not ‘consider rule 2 since we can rewrite any regular expression that contains © as an equivalent one that contains Z* instead, Next consider rule 7: as we showed in the statement of rule 7 itself, the regular expression a’ is equivalent to the slightly longer regular expression at*. The form a” is @ convenient shorteut, and we will use it, But we need not ‘consider rule 7 in any analysis that we may choose to do of regular expressions or the languages that they generate. ‘The compositional semantic interpretation function that we just defined lets us map between regular expressions and the languages that they define. We begin by analyzing the smallest subexpressions and then work outwatds to larger and larger expressions, Chapter 6 93 Regular ExpressionsExample 6.1 Analyzing a Simple Regular Expression (a U bb) = La vs)*) Lo) (Li(a U b)))* Lb) (2) J E(o))* Le) {a} U ()* {} ~ {a,)* (0) So the meaning of the regular expression (a U b)*b is the set of al strings over the alphabet (2, } that end in b. (One straightforward way to read a regular expression and determine its meaning is to imagine it as a procedure that ‘generates strings, Read it left to right and imagine it generating a string left to right. As you are doing tha, think of any expression that is enclosed in a Kleene star asa loop that can be executed zero or more times. Each time through the loop, choose any one of the alternatives listed in the expression. So we can read the regular expression of the last ‘example, (2 U b)*, a8, “Go through a loop zero or more times, picking a single a orb each time. Then concatenate )*), bb.” Any string that can be generated by this procedure is in L( (a Regular expressions can be used to sean text and pick out email addresses, € 793 Example 6.2 Another Simple Regular Expression UW ((aUb) (avd) )a(aUD)*) = 1((av ba Ub) La) Lau )*) =L(aubyaub)) fa} L(avry)* =a Ub) (av) {a} {a,b} = {a,b} {a,b} {a} (a,b}* So the meaning of the regular expression ( (a Ub) (a Ub) ) a (2 Ub) is (way: x and y ate strings of a's and js and [x|=2}. Alternatively, its the language that contains all strings of a’s and b's such that there exists a third character and it is Example 6.3 Given a Language, Find a Regular Expression Let L = {w < {a,1}* : he]is even}. There are two simple regular expressions both of which define L: (@ubiauby* This one ean be read as, “Go through a loop zero or more times. Each time through, choose an a or b, then choose a second character (a or b).” (aa Uab Uba Ubb)* ‘This one can be read as, “Go through a loop zero or more times. Each time through, choose one of the two-character sequences.” From this example, itis elear that the semantic interpretation function we have defined for regular expressions is not ‘one-to-one. In fact, given any language L, if there is one regular expression that defines i, there is an infinite number that do. This is trivially tue since, for any regular expression a, the regular expression a U ot defines the same language a does. Recall from our discussion in Section 2.2.6 that this is not unusual. Semantic interpretation functions for English and for Java are not one-to-one, The practical consequence of this phenomenon for regular expressions is that, if we are ‘ying to design a regular expression that describes some particular language, there will be more than one right answer, We will generally seek the simplest one that works, both for clarity and to make pattern matching fast Chapter 6 94 Regular Expressions
You might also like
TPL lect 15 - 16
PDF
No ratings yet
TPL lect 15 - 16
5 pages
2. Regular Expressions
PDF
No ratings yet
2. Regular Expressions
4 pages
Regular Expressions and Regular Languages
PDF
No ratings yet
Regular Expressions and Regular Languages
5 pages
Regular expression
PDF
No ratings yet
Regular expression
89 pages
Theory of Automata RE 3
PDF
No ratings yet
Theory of Automata RE 3
13 pages
chapter two
PDF
No ratings yet
chapter two
59 pages
Automata Theory Computability - M2
PDF
No ratings yet
Automata Theory Computability - M2
68 pages
Regular Expressions
PDF
No ratings yet
Regular Expressions
31 pages
Chapter 4
PDF
No ratings yet
Chapter 4
31 pages
chapter 3
PDF
No ratings yet
chapter 3
10 pages
unit2
PDF
No ratings yet
unit2
135 pages
Unit 3 - Regular Expression
PDF
No ratings yet
Unit 3 - Regular Expression
45 pages
Chap-2 2 (RegularExpression)
PDF
No ratings yet
Chap-2 2 (RegularExpression)
46 pages
Unit Ii
PDF
No ratings yet
Unit Ii
25 pages
Chapter 3 - Regular Expression
PDF
No ratings yet
Chapter 3 - Regular Expression
16 pages
Lecture Slides Regular Expressions
PDF
No ratings yet
Lecture Slides Regular Expressions
138 pages
lecture 3, 4 (1)
PDF
No ratings yet
lecture 3, 4 (1)
33 pages
Absent Tha
PDF
No ratings yet
Absent Tha
33 pages
03 Regular Expression
PDF
No ratings yet
03 Regular Expression
18 pages
Automata 3
PDF
No ratings yet
Automata 3
21 pages
2.0+Regular Expression Part 1 MKN
PDF
No ratings yet
2.0+Regular Expression Part 1 MKN
33 pages
Lecture 7
PDF
No ratings yet
Lecture 7
70 pages
Ch3 - Regular Expression: Subhash Sagar Email
PDF
No ratings yet
Ch3 - Regular Expression: Subhash Sagar Email
37 pages
2 Regular Expression
PDF
No ratings yet
2 Regular Expression
23 pages
FL 2
PDF
No ratings yet
FL 2
34 pages
ch3 M.PPTX - 0
PDF
No ratings yet
ch3 M.PPTX - 0
46 pages
5 - Regular Expression
PDF
No ratings yet
5 - Regular Expression
144 pages
2 Regular Expressions
PDF
No ratings yet
2 Regular Expressions
34 pages
Lecture 3a and 3b
PDF
No ratings yet
Lecture 3a and 3b
21 pages
5a Regular Expressions
PDF
No ratings yet
5a Regular Expressions
41 pages
Chapter 3 - Regular Expressions
PDF
No ratings yet
Chapter 3 - Regular Expressions
49 pages
Bcs503 Module 2
PDF
No ratings yet
Bcs503 Module 2
46 pages
Vision 2023 Toc Chapter 3 Regular Expression 59
PDF
No ratings yet
Vision 2023 Toc Chapter 3 Regular Expression 59
8 pages
Formal Methods: Finite State Machine - Regular Expressions
PDF
No ratings yet
Formal Methods: Finite State Machine - Regular Expressions
14 pages
Regular expressions
PDF
No ratings yet
Regular expressions
21 pages
Regular Expr
PDF
No ratings yet
Regular Expr
3 pages
Chapter 3 RE
PDF
No ratings yet
Chapter 3 RE
19 pages
Regular Expressions (2)
PDF
No ratings yet
Regular Expressions (2)
17 pages
Regular - Expressions For FL & A
PDF
No ratings yet
Regular - Expressions For FL & A
34 pages
Regular Expression: Dept. of Computer Science Faculty of Science and Technology
PDF
No ratings yet
Regular Expression: Dept. of Computer Science Faculty of Science and Technology
16 pages
Language About Complier Construction
PDF
No ratings yet
Language About Complier Construction
23 pages
TOA Lecture03
PDF
No ratings yet
TOA Lecture03
24 pages
1 Regular Expression
PDF
No ratings yet
1 Regular Expression
6 pages
3 RegularExpressions
PDF
No ratings yet
3 RegularExpressions
25 pages
Regular Expressions
PDF
No ratings yet
Regular Expressions
169 pages
Unit I
PDF
No ratings yet
Unit I
37 pages
SPECIFICATION OF TOKENS - Unit 1
PDF
No ratings yet
SPECIFICATION OF TOKENS - Unit 1
13 pages
Toc Unit 2
PDF
No ratings yet
Toc Unit 2
29 pages
Lecture 7 & 8 - Regular Expressions
PDF
No ratings yet
Lecture 7 & 8 - Regular Expressions
39 pages
Lect2 Lexical
PDF
No ratings yet
Lect2 Lexical
9 pages
Theory of Automata and Formal Languages
PDF
No ratings yet
Theory of Automata and Formal Languages
24 pages
Unit 2
PDF
No ratings yet
Unit 2
53 pages
Lecture02 Scanning 1
PDF
No ratings yet
Lecture02 Scanning 1
72 pages
Regular Expressions and Languages
PDF
No ratings yet
Regular Expressions and Languages
16 pages
Small17 PDF
PDF
No ratings yet
Small17 PDF
64 pages
Lecture#03,4
PDF
No ratings yet
Lecture#03,4
27 pages
Finit Representations Language
PDF
No ratings yet
Finit Representations Language
5 pages
Lec 4
PDF
No ratings yet
Lec 4
16 pages
Specification of Tokens
PDF
No ratings yet
Specification of Tokens
21 pages
Modular Arithmetic Class1
PDF
No ratings yet
Modular Arithmetic Class1
2 pages
Set Function Relation Note
PDF
No ratings yet
Set Function Relation Note
4 pages
Discrete Structures: Dr. Nirnay Ghosh
PDF
No ratings yet
Discrete Structures: Dr. Nirnay Ghosh
18 pages
Q.No. Type Section Key/Range Marks
PDF
No ratings yet
Q.No. Type Section Key/Range Marks
3 pages