Automata Theory: CS411-2012S-02 Formal Languages
02-0:
An alphabet is a finite set of symbols
  Σ_1 = {a, b, ..., z}
  Σ_2 = {0, 1}
A string is a finite sequence of symbols from an alphabet
  fire, truck are both strings over {a, ..., z}
The length of a string is the number of symbols in the string
  |fire| = 4, |truck| = 5
02-1:
∘ is the concatenation operator
  w_1 = fire, w_2 = truck
  w_1 ∘ w_2 = firetruck
  w_2 ∘ w_1 = truckfire
  w_2 ∘ w_2 = trucktruck
Often drop the ∘: w_1 w_2 = firetruck
For any string w, wε = w (ε is the empty string)
02-2:
w^1 = w
w^2 = ww
w^3 = www
By definition, w^0 = ε
Can reverse a string: w^R
  truck^R = kcurt
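The string operations on the last few slides map directly onto ordinary string handling. A minimal sketch in Python, assuming strings over an alphabet are modelled as `str` values and ε as the empty string; the helper names `power` and `reverse` are illustrative, not from the slides:

```python
EPSILON = ""  # the empty string, written ε on the slides

def power(w, k):
    """Return w^k: w concatenated with itself k times, so w^0 is ε."""
    return w * k

def reverse(w):
    """Return w^R: the symbols of w in reverse order."""
    return w[::-1]

w1 = "fire"
w2 = "truck"

print(len(w1), len(w2))    # the lengths |w1| and |w2|
print(w1 + w2, w2 + w1)    # concatenation depends on the order of its operands
print(power(w2, 2))        # w2 concatenated with itself
print(reverse(w2))         # w2 reversed
```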
02-3:
Formal Language
A formal language (or just language) is a set of strings
  L_1 = {a, aa, abba, bbba}
  L_2 = {car, truck, goose}
  L_3 = {1, 11, 111, 1111, 11111, ...}
A language can be either finite or infinite
02-4:
Language Concatenation
L_1 L_2 = {wv : w ∈ L_1 ∧ v ∈ L_2}
{a, ab}{bb, b} =
02-5:
Language Concatenation
L_1 L_2 = {wv : w ∈ L_1 ∧ v ∈ L_2}
{a, ab}{bb, b} = {abb, ab, abbb}
{a, ab}{a, ab} =
02-6:
Language Concatenation
L_1 L_2 = {wv : w ∈ L_1 ∧ v ∈ L_2}
{a, ab}{bb, b} = {abb, ab, abbb}
{a, ab}{a, ab} = {aa, aab, aba, abab}
{a, aa}{a, aa} =
02-7:
Language Concatenation
L_1 L_2 = {wv : w ∈ L_1 ∧ v ∈ L_2}
{a, ab}{bb, b} = {abb, ab, abbb}
{a, ab}{a, ab} = {aa, aab, aba, abab}
{a, aa}{a, aa} = {aa, aaa, aaaa}
What can we say about |L_1 L_2|, if we know |L_1| = m and |L_2| = n?
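The three worked examples can be reproduced with a set comprehension. A minimal sketch, assuming finite languages are modelled as Python sets of `str`; the function name `concat` is illustrative. It also hints at the size question: different pairs (w, v) can produce the same string, so the result may have fewer than m·n elements.

```python
def concat(l1, l2):
    """Return the language {wv : w in l1 and v in l2} for finite sets of strings."""
    return {w + v for w in l1 for v in l2}

examples = [
    ({"a", "ab"}, {"bb", "b"}),
    ({"a", "ab"}, {"a", "ab"}),
    ({"a", "aa"}, {"a", "aa"}),
]
for l1, l2 in examples:
    result = concat(l1, l2)
    # Compare the number of (w, v) pairs with the number of distinct strings.
    print(sorted(l1), sorted(l2), sorted(result), len(l1) * len(l2), len(result))
```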
02-8:
Language Concatenation
We can concatenate a language with itself, just like strings
  L^1 = L, L^2 = LL, L^3 = LLL, etc.
What should L^0 be, and why?
02-9:
Language Concatenation
We can concatenate a language with itself, just like strings
  L^1 = L, L^2 = LL, L^3 = LLL, etc.
L* = L^0 ∪ L^1 ∪ L^2 ∪ L^3 ∪ ...
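Since L* is usually infinite, it can only be explored up to a length bound. The sketch below assumes the standard convention L^0 = {ε} (the identity for language concatenation, mirroring w^0 = ε for strings) and finite input languages; the names `power` and `star_up_to` are illustrative.

```python
def concat(l1, l2):
    """Return {wv : w in l1 and v in l2}."""
    return {w + v for w in l1 for v in l2}

def power(lang, k):
    """Return L^k, taking L^0 = {""} (assumed convention)."""
    result = {""}
    for _ in range(k):
        result = concat(result, lang)
    return result

def star_up_to(lang, max_len):
    """Return the strings of L* whose length is at most max_len."""
    result = {""}       # L^0 contributes the empty string
    frontier = {""}     # strings found in the previous round
    while frontier:
        longer = concat(frontier, lang)
        frontier = {w for w in longer if len(w) <= max_len} - result
        result |= frontier
    return result

print(sorted(power({"a", "aa"}, 2)))
print(sorted(star_up_to({"ab"}, 6), key=len))
```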
02-10:
Regular Expressions
Regular expressions are a way to describe formal languages
Regular expressions are defined recursively
  Base case: simple regular expressions
  Recursive case: how to build more complex regular expressions from simple regular expressions
02-11:
Regular Expressions
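ε is a regular expression, representing {ε}
∅ is a regular expression, representing {}
∀a ∈ Σ, a is a regular expression representing {a}
if r_1 and r_2 are regular expressions, then (r_1 r_2) is a regular expression
  L[(r_1 r_2)] = L[r_1] L[r_2]
if r_1 and r_2 are regular expressions, then (r_1 + r_2) is a regular expression
  L[(r_1 + r_2)] = L[r_1] ∪ L[r_2]
if r is a regular expression, then (r*) is a regular expression
  L[(r*)] = (L[r])*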
02-12:
Regular Expressions
Regular expression    Language
ε                     L[ε] = {ε}
∅                     L[∅] = {}
a ∈ Σ                 L[a] = {a}
(r_1 r_2)             L[r_1 r_2] = L[r_1] L[r_2]
(r_1 + r_2)           L[(r_1 + r_2)] = L[r_1] ∪ L[r_2]
(r*)                  L[(r*)] = (L[r])*
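The recursive definition of L[r] translates almost line for line into a recursive function. This is a minimal sketch, not a real regular expression engine: the nested-tuple encoding of expressions is a hypothetical choice made for illustration, and because L[r] can be infinite the function only returns the strings of length at most `max_len`.

```python
def language(r, max_len):
    """Return the strings of L[r] with length <= max_len.

    r is a nested tuple (hypothetical encoding, not from the slides):
      ("eps",)  ("empty",)  ("sym", a)
      ("cat", r1, r2)  ("alt", r1, r2)  ("star", r1)
    """
    kind = r[0]
    if kind == "eps":                      # L[ε] = {ε}
        return {""}
    if kind == "empty":                    # L[∅] = {}
        return set()
    if kind == "sym":                      # L[a] = {a}
        return {r[1]} if len(r[1]) <= max_len else set()
    if kind == "cat":                      # L[r1 r2] = L[r1] L[r2]
        left = language(r[1], max_len)
        right = language(r[2], max_len)
        return {w + v for w in left for v in right if len(w + v) <= max_len}
    if kind == "alt":                      # L[r1 + r2] = L[r1] ∪ L[r2]
        return language(r[1], max_len) | language(r[2], max_len)
    if kind == "star":                     # L[r*] = (L[r])*
        base = language(r[1], max_len)
        result, frontier = {""}, {""}
        while frontier:
            frontier = {w + v for w in frontier for v in base
                        if len(w + v) <= max_len} - result
            result |= frontier
        return result
    raise ValueError(f"unknown node: {kind!r}")

# The expression (((a+b)(b*))a) in the tuple encoding.
a, b = ("sym", "a"), ("sym", "b")
example = ("cat", ("cat", ("alt", a, b), ("star", b)), a)
print(sorted(language(example, 3), key=len))
```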
02-14:
Regular Expressions
(((a+b)(b*))a)
  {aa, ba, aba, bba, abba, bbba, abbba, bbbba, ...}
((a((a+b)*))a)
  {aa, aaa, aba, aaaa, aaba, abaa, abba, ...}
((a*)(b*))
  {ε, a, b, aa, ab, bb, aaa, aab, abb, bbb, ...}
((ab)*)
  {ε, ab, abab, ababab, abababab, ...}
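Listings like these can be generated mechanically by trying every short string. The sketch below leans on Python's `re` module as a stand-in matcher, assuming the only translation needed for these examples is writing the slides' `+` (alternation) as `|`; the helper name `shortest_matches` is illustrative. The empty string ε shows up as `''`.

```python
import re
from itertools import product

def shortest_matches(pattern, alphabet, max_len):
    """List every string over alphabet of length <= max_len that the
    slide-style pattern matches in full, shortest first."""
    compiled = re.compile(pattern.replace("+", "|"))  # slide + becomes Python |
    found = []
    for n in range(max_len + 1):
        for symbols in product(alphabet, repeat=n):
            s = "".join(symbols)
            if compiled.fullmatch(s):
                found.append(s)
    return found

for pattern in ["(((a+b)(b*))a)", "((a((a+b)*))a)", "((a*)(b*))", "((ab)*)"]:
    print(pattern, shortest_matches(pattern, "ab", 4))
```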
02-15:
Regular Expressions
All those parentheses can be confusing
Drop them!!
  (((ab)b)a) becomes abba
What about a+bb*a? What's the problem?
02-16:
Regular Expressions
All those parentheses can be confusing
Drop them!!
  (((ab)b)a) becomes abba
What about a+bb*a? What's the problem?
Ambiguous!
  a+(b(b*))a, (a+b)(b*)a, (a+(bb))*a ?
02-17:
r.e. Precedence
From highest to lowest:
  Kleene closure *
  Concatenation
  Alternation +
ab*c+e = (a(b*)c) + e
(We will still need parentheses for some regular expressions: (a+b)(a+b))
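Python's `re` syntax uses the same precedence order (repetition, then concatenation, then alternation), so the slide's example can be checked by brute force. A small sketch, assuming `+` is written as `|`; the helper `same_language` is illustrative and only compares strings up to a length bound, so agreement is evidence rather than proof.

```python
import re
from itertools import product

def same_language(p1, p2, alphabet, max_len):
    """True if two Python patterns accept exactly the same strings
    over alphabet, for all strings of length <= max_len."""
    for n in range(max_len + 1):
        for symbols in product(alphabet, repeat=n):
            s = "".join(symbols)
            if bool(re.fullmatch(p1, s)) != bool(re.fullmatch(p2, s)):
                return False
    return True

implicit = "ab*c|e"       # ab*c+e, relying on precedence
explicit = "(a(b*)c)|e"   # the fully grouped reading from the slide
other = "a(b*)(c|e)"      # a different grouping, for contrast

print(same_language(implicit, explicit, "abce", 5))
print(same_language(implicit, other, "abce", 5))
```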
02-18:
Regular Expressions
Intuitive reading of regular expressions
  Concatenation == "is followed by"
  + == "or"
  * == "zero or more occurrences"
(a+b)(a+b)(a+b)
(a+b)*
aab(aa)*
02-20:
Regular Expressions
All strings over {a,b} that start with an a
  a(a+b)*
All strings over {a,b} that are even in length
02-21:
Regular Expressions
All strings over {a,b} that start with an a
  a(a+b)*
All strings over {a,b} that are even in length
  ((a+b)(a+b))*
All strings over {0,1} that have an even number of 1s
02-22:
Regular Expressions
All strings over {a,b} that start with an a
  a(a+b)*
All strings over {a,b} that are even in length
  ((a+b)(a+b))*
All strings over {0,1} that have an even number of 1s
  0*(10*10*)*
All strings over {a,b} that start and end with the same letter
02-23:
Regular Expressions
All strings over {a,b} that start with an a
  a(a+b)*
All strings over {a,b} that are even in length
  ((a+b)(a+b))*
All strings over {0,1} that have an even number of 1s
  0*(10*10*)*
All strings over {a,b} that start and end with the same letter
  a(a+b)*a + b(a+b)*b + a + b
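Each answer can be sanity-checked by comparing the expression against a direct statement of the property on every short string. A minimal sketch using Python's `re` as the matcher, with `+` rewritten as `|`; the helper names are illustrative, the check only covers strings up to a length bound, and the last predicate assumes the empty string does not count as starting and ending with the same letter (which is what the expression itself implies).

```python
import re
from itertools import product

def strings(alphabet, max_len):
    """Yield every string over alphabet of length 0..max_len."""
    for n in range(max_len + 1):
        for symbols in product(alphabet, repeat=n):
            yield "".join(symbols)

def agrees(pattern, predicate, alphabet, max_len=8):
    """True if the slide-style pattern accepts exactly the strings
    (up to max_len) for which predicate returns True."""
    compiled = re.compile(pattern.replace("+", "|"))
    return all(bool(compiled.fullmatch(s)) == predicate(s)
               for s in strings(alphabet, max_len))

checks = [
    ("a(a+b)*", "ab", lambda s: s.startswith("a")),
    ("((a+b)(a+b))*", "ab", lambda s: len(s) % 2 == 0),
    ("0*(10*10*)*", "01", lambda s: s.count("1") % 2 == 0),
    ("a(a+b)*a+b(a+b)*b+a+b", "ab", lambda s: s != "" and s[0] == s[-1]),
]
for pattern, alphabet, predicate in checks:
    print(pattern, agrees(pattern, predicate, alphabet))
```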
02-25:
Regular Expressions
All strings over {0, 1} with no occurrences of 00
  1*(011*)*(0+1*)
All strings over {0, 1} with exactly one occurrence of 00
02-26:
Regular Expressions
All strings over {0, 1} with no occurrences of 00
  1*(011*)*(0+1*)
All strings over {0, 1} with exactly one occurrence of 00
  1*(011*)*00(11*0)*1*
All strings over {0, 1} that contain 101
02-27:
Regular Expressions
All strings over {0, 1} with no occurrences of 00
  1*(011*)*(0+1*)
All strings over {0, 1} with exactly one occurrence of 00
  1*(011*)*00(11*0)*1*
All strings over {0, 1} that contain 101
  (0+1)*101(0+1)*
All strings over {0, 1} that do not contain 01
02-28:
Regular Expressions
All strings over {0, 1} with no occurrences of 00
  1*(011*)*(0+1*)
All strings over {0, 1} with exactly one occurrence of 00
  1*(011*)*00(11*0)*1*
All strings over {0, 1} that contain 101
  (0+1)*101(0+1)*
All strings over {0, 1} that do not contain 01
  1*0*
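The substring-based answers can be checked the same way, by comparing each expression with a direct test on every binary string up to a length bound. A sketch using Python's `re` with `+` rewritten as `|`; helper names are illustrative. One assumption worth flagging: "exactly one occurrence of 00" is read as counting overlapping occurrences, so 000 contains two and is rejected, which matches what the expression does.

```python
import re
from itertools import product

def binary_strings(max_len):
    """Yield every string over {0, 1} of length 0..max_len."""
    for n in range(max_len + 1):
        for symbols in product("01", repeat=n):
            yield "".join(symbols)

def occurrences(s, sub):
    """Count occurrences of sub in s, including overlapping ones."""
    return sum(1 for i in range(len(s) - len(sub) + 1) if s.startswith(sub, i))

def agrees(pattern, predicate, max_len=9):
    """True if the slide-style pattern and the predicate accept the
    same binary strings of length <= max_len."""
    compiled = re.compile(pattern.replace("+", "|"))
    return all(bool(compiled.fullmatch(s)) == predicate(s)
               for s in binary_strings(max_len))

checks = [
    ("1*(011*)*(0+1*)", lambda s: "00" not in s),
    ("1*(011*)*00(11*0)*1*", lambda s: occurrences(s, "00") == 1),
    ("(0+1)*101(0+1)*", lambda s: "101" in s),
    ("1*0*", lambda s: "01" not in s),
]
for pattern, predicate in checks:
    print(pattern, agrees(pattern, predicate))
```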
02-29:
Regular Expressions
All strings over {/, *, a, ..., z} that form valid C comments
  Use quotes to differentiate the * in the input from the regular expression *
  Use [a-z] to stand for (a + b + c + d + ... + z)
02-30:
Regular Expressions
All strings over {/, *, a, ..., z} that form valid C comments
  Use quotes to differentiate the * in the input from the regular expression *
  Use [a-z] to stand for (a + b + c + d + ... + z)
/'*'([a-z]+/)* ('*'('*')*[a-z]([a-z]+/)*)* '*'('*')*/
This exact problem (finding a regular expression for C comments) has actually been used in an industrial context.
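The comment expression is easier to trust after testing it against a direct description of a C comment: the string opens with /*, and the first */ after the opener is the last two characters. The sketch below is an assumed translation into Python's `re` syntax (a literal star becomes `\*`, the choice "letter or slash" becomes `[a-z/]`, and "one or more stars" becomes `\*+`); the names `C_COMMENT` and `is_c_comment` are illustrative, and the brute-force check uses the reduced alphabet {/, *, a} to keep the search small.

```python
import re
from itertools import product

# Assumed Python rendering of the slide's expression:
#   open /*, then letters or slashes, then any number of groups
#   "stars, a letter, letters or slashes", then stars and the closing slash.
C_COMMENT = re.compile(r"/\*[a-z/]*(?:\*+[a-z][a-z/]*)*\*+/")

def is_c_comment(s):
    """Reference check: starts with /*, and the first */ found after
    the opener sits at the very end of the string."""
    return s.startswith("/*") and s.find("*/", 2) == len(s) - 2

def all_strings(alphabet, max_len):
    """Yield every string over alphabet of length 0..max_len."""
    for n in range(max_len + 1):
        for symbols in product(alphabet, repeat=n):
            yield "".join(symbols)

# Compare the two definitions on every string over {/, *, a} up to length 7.
mismatches = [s for s in all_strings("/*a", 7)
              if bool(C_COMMENT.fullmatch(s)) != is_c_comment(s)]
print(len(mismatches))

for sample in ["/**/", "/*/", "/*a*/", "/*a*/a*/", "/*/*/"]:
    print(sample, bool(C_COMMENT.fullmatch(sample)))
```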
02-31:
Regular Languages
A language is regular if it can be described by a regular expression.
The Regular Languages (L_REG) is the set of all languages that can be represented by a regular expression
  A set of sets of strings
Raises the question: Are there languages that are not regular?
  Stay tuned!