Lecture Notes
The Three Hour Tour Through Automata Theory
Read Supplementary Materials: The Three Hour Tour Through Automata Theory
Read Supplementary Materials: Review of Mathematical Concepts
Read K & S Chapter 1
Do Homework 1.
(1) Lexical analysis: Scan the program and break it up into variable names, numbers, etc.
(2) Parsing: Create a tree that corresponds to the sequence of operations that should be executed, e.g., for (2 + 5) / 10:
        /
      +   10
    2   5
(3) Optimization: Realize that we can skip the first assignment since the value is never used and that we can precompute the
arithmetic expression, since it contains only constants.
(4) Termination: Decide whether the program is guaranteed to halt.
(5) Interpretation: Figure out what (if anything) it does.
Languages
A language is a (possibly infinite) set of finite length strings over a finite alphabet.
(1) Σ = {0,1,2,3,4,5,6,7,8,9}
L = {w ∈ Σ*: w represents an odd integer}
= {w ∈ Σ*: the last character of w is 1,3,5,7, or 9}
= (0∪1∪2∪3∪4∪5∪6∪7∪8∪9)* (1∪3∪5∪7∪9)
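The same definition is easy to check mechanically; a minimal sketch using Python's re module (the ∪'s become a character class):

import re

# (0∪1∪...∪9)* (1∪3∪5∪7∪9): any run of digits ending in an odd digit
odd_integer = re.compile(r"[0-9]*[13579]")

assert odd_integer.fullmatch("235") is not None   # ends in 5: accepted
assert odd_integer.fullmatch("240") is None       # ends in 0: rejected
assert odd_integer.fullmatch("") is None          # no final odd digit: rejected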
(2) Σ = {(,)}
L = {w ∈ Σ*: w has matched parentheses}
= the set of strings accepted by the grammar:
S → (S)
S → SS
S → ε
(3) L = {w: w is a sentence in English}
Examples: Mary hit the ball.
Colorless green ideas sleep furiously.
The window needs fixed.
(4) L = {w: w is a C program that halts on all inputs}
(5) L = {w = x#y : x is a comma-separated list of integers and y is the same list sorted into ascending order}
Examples:
1,5,3,9,6#1,3,5,6,9 ∈ L
1,5,3,9,6#1,2,3,4,5,6,7 ∉ L
By equivalent we mean:
If we have a machine to solve one, we can use it to build a machine to do the other using just the starting machine and other
functions that can be built using a machine of equal or lesser power.
Clearly, if we are going to work with languages, each one must have a finite description.
Grammars
S → (S)
S → SS
S → ε
(3) The Language of Simple Arithmetic Expressions
S → <exp>
<exp> → <number>
<exp> → (<exp>)
<exp> → - <exp>
<exp> → <exp> <op> <exp>
<op> → + | - | * | /
<number> → <digit>
<number> → <digit> <number>
<digit> → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Top Down Parsing
[parse tree for 4 + 3, built from the start symbol down]
Bottom Up Parsing
[parse tree for 4 + 3, built from the leaves up]
[Venn diagram: Regular Languages ⊂ Context-Free Languages ⊂ Recursive Languages ⊂ Recursively Enumerable Languages]
Regular expressions are formed from ∅ and the characters in the target alphabet, plus the operations of:
• Concatenation: αβ means α followed by β
• Or (set union): α∪β means either α or β (the union of the two languages)
• Kleene *: α* means 0 or more occurrences of α concatenated together.
• At Least 1: α+ means 1 or more occurrences of α concatenated together.
• (): used to group the other operators
Examples:
(2) Identifiers: (A-Z)+ ((A-Z) ∪ (0-9))*
Unrestricted Grammars
[Venn diagram: Regular ⊂ Context-Free ⊂ Recursive ⊂ Recursively Enumerable Languages]
[FSM diagrams: a two-state machine accepting strings of digits that end in an odd digit (transitions labeled 1,3,5,7,9 and 0,2,4,6,8), and a machine accepting identifiers (transitions labeled letter; letter or digit; blank or delimiter; anything)]
Pushdown Automata
[PDA for balanced parentheses: one state s; on ( push ( (written (//( ), and on ) pop ( (written )/(/ )]
Example: (())()
Pushdown Automaton 2
A PDA for {w#wᴿ : w ∈ {a,b}*}: in state s, push each a or b (a//a, b//b); on # move to state f; in f, pop on a match (a/a/, b/b/).
For {wwᴿ : w ∈ {a,b}*}, replace the # transition with ε//: the machine must now guess the midpoint of the string, so it is nondeterministic.
PDA 3
A PDA to accept strings of the form aⁿbⁿcⁿ
Turing Machines
[Turing machine diagram for aⁿbⁿcⁿ: in a loop, mark one a as d (a/d/R), scan right to mark one b as e (b/e/R), scan on to mark one c as f (c/f/L), then move back left (a,b,e,f//L); accept in state f when only ❑, d, e, f remain; reject in state n if any a, b, or c is left over]
Example tape configurations (▹ marks the left end of the tape, ❑ is the blank symbol):
▹ ❑ a a b b c c a ❑
▹ ❑ a b a a # a a b a ❑ ❑
▹ ❑ a b a ❑ ❑
A Universal Turing Machine: encode any machine M as a string, and build one machine that simulates M on its input.
One tape holds M's tape contents, with a second track marking the head position:
▹ a b b a b a
0 1 0 0 0 0 0
so the simulator's tape alphabet is {▹, a, b, #, ❑} × {0, 1}.
Suppose the input machine M has 5 states and 4 tape symbols. With 5 states, three binary digits suffice to name a state (e.g., q000); with 4 tape symbols, two digits name a symbol (e.g., a00). A transition from state q000 reading a00 that goes to state q010 writing a01 is encoded as:
q000,a00,q010,a01
and M itself as the list of its transitions, separated by #:
q000,a00,q010,a01#q010,a00,q000,a00
The simulator also keeps M's encoded tape contents (e.g., a00a00a01) and M's encoded current state (e.g., q000), using # as a place marker.
Church's Thesis
(Church-Turing Thesis)
The Thesis: Anything that can be computed by any algorithm can be computed by a Turing machine.
Another way to state it: All "reasonable" formal models of computation are equivalent to the Turing machine. This isn't a formal
statement, so we can't prove it. But many different computational models have been proposed and they all turn out to be
equivalent.
Example: unrestricted grammars
A Machine Hierarchy
[Diagram: FSMs accept the Regular Languages; PDAs accept the Context-Free Languages; Turing Machines accept the Recursively Enumerable Languages, with the Recursive Languages properly in between]
Closure Properties
Example:
L = {aⁿbᵐcᵖ : n ≠ m or m ≠ p} is not deterministic context-free.
Theorem 3.7.1: The class of deterministic context-free languages is closed under complement.
Theorem 3.5.2: The intersection of a context-free language with a regular language is a context-free language.
If L were a deterministic CFL, then the complement of L (L') would be a deterministic CFL.
But L' ∩ a*b*c* = {aⁿbⁿcⁿ}, which we know is not context-free, much less deterministic context-free. Thus a contradiction.
Diagonalization
          1   2   3   4   5   …
Set 1         1
Set 2     1       1
Set 3             1   1
Set 4                 1
Set 5     1   1   1   1   1
New Set: for each i, put i in exactly when Set i does not contain i (flip the diagonal).
But this new set must necessarily be different from all the other sets in the supposedly complete enumeration. Yet it should be
included. Thus a contradiction.
More on Cantor
          1   2   3   4   5   6   7
Set 1     1
Set 2         1
Set 3     1   1
Set 4             1
Set 5     1       1
Set 6         1   1
Set 7     1   1   1
Read the rows as bit vectors, but read them backwards. So Set 4 is 100. Notice that this is the binary encoding of 4.
This enumeration will generate all finite sets of integers, and in fact the set of all finite sets of integers is countable.
But when will it generate the set that contains all the integers except 1?
If HALTS says that TROUBLE halts on itself, then TROUBLE loops. If HALTS says that TROUBLE loops, then TROUBLE halts.
Viewing the Halting Problem as Diagonalization
First we need an enumeration of the set of all Turing Machines. We'll just use lexicographic order of the encodings we used as
inputs to the Universal Turing Machine. So now, what we claim is that HALTS can compute the following table, where 1 means
the machine halts on the input:
             I1   I2   I3   TROUBLE   I5   …
Machine 1    1
Machine 2    1    1
Machine 3
TROUBLE      1    1    1    ?
Machine 5    1    1    1    1         1
What goes in the cell for TROUBLE run on TROUBLE (the diagonal entry)?
Or maybe HALTS said that TROUBLE(TROUBLE) would halt. But then TROUBLE would loop.
[Venn diagram: Regular ⊂ Context-Free ⊂ Recursive ⊂ Recursively Enumerable Languages]
(1) Lexical analysis: Scan the program and break it up into variable names, numbers, etc.
(2) Parsing: Create a tree that corresponds to the sequence of operations that should be executed, e.g.,
        /
      +   10
    2   5
(3) Optimization: Realize that we can skip the first assignment since the value is never used and that we can precompute the
arithmetic expression, since it contains only constants.
(4) Termination: Decide whether the program is guaranteed to halt.
(5) Interpretation: Figure out what (if anything) useful it does.
[Diagram: a Grammar generates a Language L; a Machine accepts L]
A string over an alphabet is a finite sequence of symbols drawn from the alphabet.
We will generally omit the quotation marks around strings unless doing so would lead to confusion.
The shortest string contains no characters. It is called the empty string and is written "" or ε (epsilon).
More on Strings
Concatenation: The concatenation of two strings x and y is written x || y, x⋅y, or xy and is the string formed by appending the
string y to the string x.
|xy| = |x| + |y|
Replication: For each string w and each natural number i, the string wⁱ is defined recursively as
w⁰ = ε
wⁱ = wⁱ⁻¹w for each i ≥ 1
String Reversal
A proof by induction on |x| that (w⋅x)ᴿ = xᴿ⋅wᴿ:
Basis: |x| = 0. Then x = ε, and (w⋅x)ᴿ = (w⋅ε)ᴿ = wᴿ = ε⋅wᴿ = εᴿ⋅wᴿ = xᴿ⋅wᴿ
Induction Step: Let |x| = n + 1. Then x = u⋅a for some character a and |u| = n, so
(w⋅x)ᴿ = (w⋅(u⋅a))ᴿ = ((w⋅u)⋅a)ᴿ = a⋅(w⋅u)ᴿ = a⋅uᴿ⋅wᴿ (by the induction hypothesis) = (u⋅a)ᴿ⋅wᴿ = xᴿ⋅wᴿ
A language is a (finite or infinite) set of finite length strings over a finite alphabet Σ.
Some languages over Σ: ∅, {ε}, {a, b}, {ε, a, aa, aaa, aaaa, aaaaa}
L = {aⁿ : n ≥ 0}
L = aⁿ (If we say nothing about the range of n, we will assume that it is drawn from N, i.e., n ≥ 0.)
L = {} = ∅ (the empty language—not to be confused with {ε}, the language of the empty string)
Languages are sets. Recall that, for sets, it makes sense to talk about enumerations and decision procedures. So, if we want
to provide a computationally effective definition of a language, we could specify either a language generator or a language recognizer.
Example: The logical definition: L = {x : ∃y ∈ {a, b}* : x = ya} can be turned into either a language generator or a
language recognizer.
• So all languages are either finite or countably infinite. Alternatively, all languages are countable.
Operations on Languages 1
Normal set operations: union, intersection, difference, complement…
Examples: Σ = {a, b} L1 = strings with an even number of a's
L2 = strings with no b's
L1 ∪ L2 =
L1 ∩ L2 =
L2 - L1 =
¬( L2 - L1) =
Examples:
L1 = {cat, dog} L2 = {apple, pear} L1 L2 = {catapple, catpear, dogapple, dogpear}
L1 = {aⁿ : n ≥ 1} L2 = {aⁿ : n ≤ 3} L1L2 =
Identities:
L∅ = ∅L = ∅ ∀L (analogous to multiplication by 0)
L{ε}= {ε}L = L ∀L (analogous to multiplication by 1)
Replicated concatenation:
Lⁿ = L⋅L⋅L⋅ … ⋅L (n times)
L¹ = L
L⁰ = {ε}
Example:
L = {dog, cat, fish}
L⁰ = {ε}
L¹ = {dog, cat, fish}
L² = {dogdog, dogcat, dogfish, catdog, catcat, catfish, fishdog, fishcat, fishfish}
L1 = aⁿ = {aⁿ : n ≥ 0} L2 = bⁿ = {bⁿ : n ≥ 0}
L1L2 = {aⁿ : n ≥ 0}{bⁿ : n ≥ 0} = {aⁿbᵐ : n, m ≥ 0} (common mistake: L1L2 ≠ aⁿbⁿ = {aⁿbⁿ : n ≥ 0})
Note: The scope of any variable used in an expression that invokes replication will be taken to be the entire expression.
L = 1ⁿ2ᵐ
L = aⁿbᵐaⁿ
Operations on Languages 3
Kleene Star (or Kleene closure): L* = {w ∈ Σ* : w = w1w2 … wk for some k ≥ 0 and some w1, w2, …, wk ∈ L}
Alternative definition: L* = L⁰ ∪ L¹ ∪ L² ∪ L³ ∪ …
Note: ∀L, ε ∈ L*
Example:
L = {dog, cat, fish}
L* = {ε, dog, cat, fish, dogdog, dogcat, fishcatfish, fishdogdogfishcat, …}
Kleene +: L⁺ = {w ∈ Σ* : w = w1w2 … wk for some k ≥ 1 and some w1, w2, …, wk ∈ L}. Alternatively, L⁺ = L¹ ∪ L² ∪ L³ ∪ …
L⁺ = L* − {ε} if ε ∉ L
L⁺ = L* if ε ∈ L
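For a finite language these operations can be computed directly; a minimal sketch in Python (the function names are ours, and L* is only approximated up to a bound, since the full star is infinite whenever L contains a nonempty string):

def concat(L1, L2):
    # L1L2: every string of L1 followed by every string of L2
    return {x + y for x in L1 for y in L2}

def power(L, n):
    # Lⁿ, with L⁰ = {ε}; ε is the empty Python string
    result = {""}
    for _ in range(n):
        result = concat(result, L)
    return result

def star_up_to(L, k):
    # Approximates L* by L⁰ ∪ L¹ ∪ … ∪ Lᵏ
    result = set()
    for n in range(k + 1):
        result |= power(L, n)
    return result

L = {"dog", "cat", "fish"}
assert power(L, 0) == {""}
assert "dogcat" in power(L, 2)
assert "" in star_up_to(L, 3)        # ε ∈ L* for every L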
[Diagram: a Regular Expression or a Regular Grammar generates a Regular Language L; a Finite State Machine accepts L]
The regular expressions over an alphabet Σ are all strings over the alphabet Σ ∪ {(, ), ∅, ∪, *} that can be obtained by the formation rules given below.
So far, regular expressions are just (finite) strings over some alphabet, Σ ∪ {(, ), ∅, ∪, *}.
Regular expressions define languages via a semantic interpretation function we'll call L:
1. L(∅) = ∅ and L(a) = {a} for each a ∈ Σ
2. If α , β are regular expressions, then
L(αβ) = L(α)⋅L(β)
= all strings that can be formed by concatenating to some string from L(α) some string from L(β).
Note that if either α or β is ∅, then its language is ∅, so there is nothing to concatenate and the result is ∅.
3. If α , β are regular expressions, then L(α∪β) = L(α) ∪ L(β)
4. If α is a regular expression, then L(α*) = L(α)*
5. L( (α) ) = L(α)
A regular expression is always finite, but it may describe a (countably) infinite language.
In other words, the class of regular languages is the smallest set that includes all elements of [1] and that is closed under [2],
[3], and [4].
Examples:
The set of natural numbers N can be defined as the closure over {0} and the successor (succ(n) = n+1) function.
Regular languages can be defined as the closure of {a} ∀a∈Σ and ∅ and the functions of concatenation, union, and
Kleene star.
We say a set is closed under a function if applying the function to arbitrary elements of the set does not yield any new elements.
Examples:
The set of natural numbers N is closed under multiplication.
Regular languages are closed under intersection.
See Supplementary Materials—Review of Mathematical Concepts for more formal definitions of these terms.
But we'd also like a minimal definition of what constitutes a regular expression. Why?
Observe that
∅⁰ = {ε} (since 0 occurrences of the elements of any set generate the empty string), so
∅* = {ε}
So, without changing the set of languages that can be defined, we can add ε to our notation for regular expressions if we
specify that
L(ε) = {ε}
We're essentially treating ε the same way that we treat the characters in the alphabet.
Having done this, you'll probably find that you rarely need ∅ in any regular expression.
L( (aa*) ∪ ε ) =
L( (a ∪ ε)* ) =
L = { w ∈ {a,b}* : there is no more than one b}
L = { w ∈ {a,b}* : no two consecutive letters are the same}
• Intersection: α∩β (we’ll prove later that regular languages are closed under intersection)
Example: L = (a³)* ∩ (a⁵)*
Regular expressions are strings in the language of regular expressions. Thus to interpret them we need to:
1. Parse the string
2. Assign a meaning to the parse tree
Parsing regular expressions is a lot like parsing arithmetic expressions. To do it, we must assign precedence to the operators:
Regular Expressions           Arithmetic Expressions
Kleene star (highest)         exponentiation
concatenation                 multiplication
union (lowest)                addition
a b* ∪ c d*   parses like     x y² + i j²
Recall that grammars are language generators. A grammar is a recipe for creating strings in a language.
Regular expressions are analogous to grammars, but with two special properties:
1. They have limited power. They can be used to define only regular languages.
2. They don't look much like other kinds of grammars, which generally are composed of sets of production rules.
But we can write more "standard" grammars to define exactly the same languages that regular expressions can define.
Specifically, any such grammar must be composed of rules that:
((aa) ∪ (ab) ∪ (ba) ∪ (bb))*   Notice how these rules correspond naturally to an FSM:
S → ε     T → a
S → aT    T → b
S → bT    T → aS
          T → bS
[FSM: start (and accepting) state S and state T; S goes to T on a, b, and T goes back to S on a, b]
Language             Generator                                 Recognizer
Regular Languages    Regular Expressions, Regular Grammars     ?
Informally, M accepts a string w if M winds up in some state that is an element of F when it has finished reading w (if not, it
rejects w).
The language accepted by M, denoted L(M), is the set of all strings accepted by M.
Deterministic finite state machines (DFSMs) are also called deterministic finite state automata (DFSAs or DFAs).
An Example Computation
Thus (q0, 235) |-*M (q1, ε). (What does this mean?)
More Examples
(b ∪ ε)(ab)*(a ∪ ε)
Theorem: The output language of a deterministic finite state transducer (on final state) is regular.
Convert 1's to 0's and 0's to 1's (this isn't just a finite state task -- it's a one state task)
[Transducer: one state q0, with self-loops reading 1 and writing 0 (1/0) and reading 0 and writing 1 (0/1)]
After every three bits, output a fourth bit such that each group of four bits has odd parity.
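A sketch of that transducer's behavior in Python (written as a loop rather than an explicit state diagram; the state an FSM would carry is exactly the position within the group and the parity seen so far):

def odd_parity_groups(bits):
    # Copy each input bit through; after every 3 bits, emit a 4th bit chosen
    # so that the group of 4 contains an odd number of 1s.
    out, count, ones = [], 0, 0
    for b in bits:
        out.append(b)
        count += 1
        ones += b
        if count == 3:
            out.append(0 if ones % 2 == 1 else 1)
            count, ones = 0, 0
    return out

assert odd_parity_groups([1, 1, 0]) == [1, 1, 0, 1]   # two 1s seen, so emit 1
assert odd_parity_groups([0, 0, 0]) == [0, 0, 0, 1]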
M accepts a string w if there exists some path along which w drives M to some element of F.
The language accepted by M, denoted L(M), is the set of all strings accepted by M, where computation is defined analogously to
DFSMs.
A Nondeterministic FSA
L = {w : there is a symbol aᵢ ∈ Σ not appearing in w}
The idea is to guess (nondeterministically) which character will be the one that doesn't appear.
L1= {w : aa occurs in w}
L2= {x : bb occurs in x}
L3 = {y : y ∈ L1 or y ∈ L2}
M1 = [states 10, 11, 12; 10 goes to 11 on a, 11 goes to 12 on a, 12 (accepting) loops on a, b; a b returns toward state 10]
M2 = [the same machine with a and b interchanged, states 20, 21, 22]
M3 = [a new start state with ε-transitions to the start states of M1 and M2]
[NFA for the missing-symbol language over {a, b, c}: start state q0 with ε-transitions to q1 (¬a: loops on b, c), q2 (¬b: loops on a, c), and q3 (¬c: loops on a, b)]
[NFA with states 1-7, ε-transitions, and edges labeled a, b, c, for the regular expression analyzed below]
[FSM sketch of a video game character: states Hide, Run, Reach for Sword, Swing Sword, Pick up Laser, Die, Become King; transitions labeled See enemy, Found by enemy, Coast clear, See sword, See laser, Sword picked up, Brother kills enemy, Kill enemy]
E(q) = {p ∈ K : (q, w) |-*M (p, w)}. E(q) is the closure of {q} under the relation {(p, r) : there is a transition (p, ε, r) ∈ ∆}.
An algorithm to compute E(q):
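A minimal sketch in Python (the representation is ours: eps_moves maps a state to the set of states reachable by one ε-transition):

def E(q, eps_moves):
    # E(q): all states reachable from q via zero or more ε-transitions
    closure, agenda = {q}, [q]
    while agenda:
        p = agenda.pop()
        for r in eps_moves.get(p, set()):
            if r not in closure:
                closure.add(r)
                agenda.append(r)
    return closure

# The machine below: new start state 00 with ε-moves to 10 and 20
eps = {"00": {"10", "20"}}
assert E("00", eps) == {"00", "10", "20"}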
[M3: new start state 00 with ε-transitions to 10 and 20; states 10, 11, 12 form the aa-machine M1, and states 20, 21, 22 form the bb-machine M2]
Another Example
b* (b(a ∪ c)c ∪ b(a ∪ b) (c ∪ ε))* b
[the NFA again: states 1-8 with ε-transitions, to be eliminated by computing E(q) and δ']
E(q) =
δ' =
Number of new states after 3 characters: C(n, 3) = n(n−1)(n−2)/6
Total number of states after n characters: 2ⁿ
A FSA is deterministic if, for each input and state, there is at most one possible transition.
NFSAs can be deterministic (even with ε-transitions and implicit dead states), but the formalism allows nondeterminism,
in general.
To simulate M = (K, Σ, δ, s, F) (exercise: simulate the "no more than one b" machine on input aabaa):
ST := s;
Repeat
i := get-next-symbol;
if i ≠ end-of-string then
ST := δ(ST, i)
Until i = end-of-string;
If ST ∈ F then accept else reject
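The same loop as runnable Python; the dict encoding and state names are ours (this is the "no more than one b" machine from the exercise):

def run_dfsm(delta, s, F, w):
    # ST := s; repeat ST := δ(ST, i) for each input symbol; accept iff ST ∈ F
    st = s
    for i in w:
        st = delta[(st, i)]
    return st in F

# "No more than one b": q0 = no b seen, q1 = one b seen, q2 = dead state
delta = {("q0", "a"): "q0", ("q0", "b"): "q1",
         ("q1", "a"): "q1", ("q1", "b"): "q2",
         ("q2", "a"): "q2", ("q2", "b"): "q2"}
assert run_dfsm(delta, "q0", {"q0", "q1"}, "aabaa")        # one b: accept
assert not run_dfsm(delta, "q0", {"q0", "q1"}, "abba")     # two b's: reject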
Real computers are deterministic, so we have three choices if we want to execute a nondeterministic FSA:
1. Convert the nondeterministic FSA to an equivalent deterministic one and run that (the conversion can cost time and space exponential in |K|, but then each string w is analyzed in time O(|w|))
2. Simulate the behavior of the nondeterministic one by constructing sets of states "on the fly" during execution
• No conversion cost
• Time to analyze string w: O(|w| × |K|²)
3. Do a depth-first search of all paths through the nondeterministic machine
SET ST;
ST := E(s);
Repeat
i := get-next-symbol;
if i ≠ end-of-string then
ST1 := ∅
For all q ∈ ST do
For all r ∈ ∆(q, i) do
ST1 := ST1 ∪ E(r);
ST := ST1;
Until i = end-of-string;
If ST ∩ F ≠ ∅ then accept else reject
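A runnable version of the same simulation (dict encodings are ours: delta maps (state, symbol) to a set of states, eps holds the ε-moves):

def run_ndfsm(delta, eps, s, F, w):
    def E(q):                        # ε-closure, computed as above
        closure, agenda = {q}, [q]
        while agenda:
            p = agenda.pop()
            for r in eps.get(p, set()):
                if r not in closure:
                    closure.add(r)
                    agenda.append(r)
        return closure

    ST = E(s)                        # ST := E(s)
    for i in w:
        ST1 = set()
        for q in ST:                 # for all q ∈ ST, r ∈ ∆(q, i): ST1 := ST1 ∪ E(r)
            for r in delta.get((q, i), set()):
                ST1 |= E(r)
        ST = ST1
    return bool(ST & F)              # accept iff ST ∩ F ≠ ∅

# M1 from earlier ("aa occurs in w"), with the guess made nondeterministically:
delta = {("10", "a"): {"10", "11"}, ("10", "b"): {"10"},
         ("11", "a"): {"12"}, ("12", "a"): {"12"}, ("12", "b"): {"12"}}
assert run_ndfsm(delta, {}, "10", {"12"}, "babaab")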
To simulate a deterministic finite state transducer (δ1 = transition function, δ2 = output function):
ST := s;
Repeat:
i := get-next-symbol;
if i ≠ end-of-string then
write(δ2(ST, i));
ST := δ1(ST, i)
Until i = end-of-string;
If ST ∈ F then accept else reject
Theorem: The set of languages expressible using regular expressions (the regular languages) equals the class of languages
recognizable by finite state machines. Alternatively, a language is regular if and only if it is accepted by a finite state machine.
Proof Strategies
Possible proof strategies for showing that two sets, a and b, are equal (also for iff):
1. Construct a chain of equalities leading from one to the other, using identities that are already known.
Example:
Prove:
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
A ∩ (B ∪ C) = (B ∪ C) ∩ A commutativity
= (B ∩ A) ∪ (C ∩ A) distributivity
= (A ∩ B) ∪ (A ∩ C) commutativity
2. Do two separate proofs: (1) a ⇒ b, and (2) b ⇒ a, possibly using totally different techniques. In this case, we show first (by
construction) that for every regular expression there is a corresponding FSM. Then we show, by induction on the number of
states, that for every FSM, there is a corresponding regular expression.
Example:
a*(b ∪ ε)a*
The regular expressions over an alphabet Σ are all strings over the alphabet Σ ∪ {(, ), ∅, ∪, *} that can be obtained as follows:
1. ∅ and each member of Σ is a regular expression.
2. If α , β are regular expressions, then so is αβ.
3. If α , β are regular expressions, then so is α∪β.
4. If α is a regular expression, then so is α*.
5. If α is a regular expression, then so is (α).
6. Nothing else is a regular expression.
We also allow ε and α⁺, etc., but these are just shorthands for ∅* and αα*, etc., so they do not need to be considered for
completeness.
Formalizing the Construction: The class of regular languages is the smallest class of languages that contains ∅ and each of the
singleton strings drawn from Σ, and that is closed under
• Union
• Concatenation, and
• Kleene star
Clearly we can construct an FSM for any finite language, and thus for ∅ and all the singleton strings. If we could show that the
class of languages accepted by FSMs is also closed under the operations of union, concatenation, and Kleene star, then we could
recursively construct, for any regular expression, the corresponding FSM, starting with the singleton strings and building up the
machine as required by the operations used to express the regular expression.
To create a FSM that accepts the union of the languages accepted by machines M1 and M2:
1. Create a new start state, and, from it, add ε-transitions to the start states of M1 and M2.
To create a FSM that accepts the concatenation of the languages accepted by machines M1 and M2:
1. Start with M1.
2. From every final state of M1, create an ε-transition to the start state of M2.
3. The final states are the final states of M2.
To create an FSM that accepts the Kleene star of the language accepted by machine M1:
1. Start with M1.
2. Create a new start state S0 and make it a final state (so that we can accept ε).
3. Create an ε-transition from S0 to the start state of M1.
4. Create ε-transitions from all of M1's final states back to its start state.
5. Make all of M1's final states final.
Note: we need a new start state, S0, because the start state of the new machine must be a final state, and this may not be true of
M1's start state.
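A minimal sketch of the Kleene star construction in Python (the NFA representation — state set, ε-move dict, start state, final states — is ours):

def star(states, eps, start, finals):
    # Steps 2-5 of the construction above.
    s0 = "S0"                                    # assumes "S0" is not already a state
    new_eps = {q: set(moves) for q, moves in eps.items()}
    new_eps.setdefault(s0, set()).add(start)     # step 3: ε from S0 to the old start
    for f in finals:
        new_eps.setdefault(f, set()).add(start)  # step 4: ε from old finals back to start
    return states | {s0}, new_eps, s0, finals | {s0}   # steps 2 and 5

# A machine for the single string "a" (q0 -a-> q1), starred:
states, eps, s, F = star({"q0", "q1"}, {}, "q0", {"q1"})
assert s in F       # the new machine accepts ε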
To create an FSM that accepts the complement of the language accepted by machine M1:
1. Make M1 deterministic.
2. Reverse final and nonfinal states.
A Complementation Example
[two-state machine to complement: states q1 and q2 with transitions on a and b; make it deterministic, then swap final and nonfinal states]
L1 ∩ L2 = ¬(¬L1 ∪ ¬L2): closure under complement and union gives closure under intersection.
• Union
• Concatenation
• Kleene star
• Complementation
An Example
(b ∪ ab*a)*ab*
Proof:
(1) There is a trivial regular expression that describes the strings that can be recognized in going from one state to itself ({ε} plus
any other single characters for which there are loops) or from one state to another directly (i.e., without passing through any other
states), namely all the single characters for which there are transitions.
(2) Using (1) as the base case, we can build up a regular expression for an entire FSM by induction on the number assigned to
possible intermediate states we can pass through. By adding them in only one at a time, we always get simple regular
expressions, which can then be combined using union, concatenation, and Kleene star.
Idea 1: Number the states and, at each induction step, increase by one the number of the states that can serve as intermediate states.
[illustration: states 1, 2, 3 in the roles I, K, J, with a-transitions 1→2→3 and a b-loop on state 2]
Idea 2: To get from state I to state J without passing through any intermediate state numbered greater than K, a machine may
either:
1. Go from I to J without passing through any state numbered greater than K-1 (which we'll take as the induction hypothesis), or
2. Go from I to K, then from K to K any number of times, then from K to J, in each case without passing through any
intermediate states numbered greater than K-1 (the induction hypothesis, again).
So we'll start with no intermediate states allowed, then add them in one at a time, each time building up the regular expression
with operations under which regular languages are closed.
The Formula
Adding in state k as an intermediate state we can use to go from i to j, described using paths that don't use k:
R(i, j, k) = R(i, j, k−1) ∪ R(i, k, k−1) R(k, k, k−1)* R(k, j, k−1)
Solution: ∪ R(s, q, N) for every q ∈ F
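A sketch of the whole construction as dynamic programming over k (states are numbered 1..N; regexes are built as strings, with None standing for ∅; no simplification is attempted, so the output is correct but verbose):

def union(r1, r2):
    if r1 is None: return r2
    if r2 is None: return r1
    return f"({r1}∪{r2})"

def concat(r1, r2):
    if r1 is None or r2 is None: return None       # concatenation with ∅ is ∅
    if r1 == "ε": return r2
    if r2 == "ε": return r1
    return f"{r1}{r2}"

def star_of(r):
    if r is None or r == "ε": return "ε"           # ∅* = ε* = ε
    return f"({r})*"

def fsm_to_regex(N, edges, s, finals):
    # R[i][j]: regex for paths i -> j using no intermediate state (k = 0)
    R = [[None] * (N + 1) for _ in range(N + 1)]
    for i in range(1, N + 1):
        R[i][i] = "ε"
    for i, a, j in edges:
        R[i][j] = union(R[i][j], a)
    # Allow state k as an intermediate state, one k at a time:
    for k in range(1, N + 1):
        R2 = [[None] * (N + 1) for _ in range(N + 1)]
        for i in range(1, N + 1):
            for j in range(1, N + 1):
                through_k = concat(R[i][k], concat(star_of(R[k][k]), R[k][j]))
                R2[i][j] = union(R[i][j], through_k)
        R = R2
    result = None                                  # ∪ R(s, q, N) over q ∈ F
    for q in finals:
        result = union(result, R[s][q])
    return result

print(fsm_to_regex(2, [(1, "a", 2), (2, "b", 1)], 1, {2}))   # language (ab)*a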
[example machine: states 1, 2, 3, 4 with a-transitions 1→2→3→4 and a b-loop on state 2]
An example of eliminating states one at a time:
[machine: states 1, 2, 3 with transitions labeled a and b]
(1) Create a new initial state and a new, unique final state, neither of which is part of a loop.
[the machine with a new initial state 4 connected by an ε-transition to state 1, and a new unique final state reached by ε-transitions]
[after removing state 3: the paths that went through 3 are replaced by edges labeled ba*b and aa*b]
(Notice that the removal of state 3 resulted in two new paths because there were two incoming paths to 3 from another state and 1
outgoing path to another state, so 2×1 = 2.) The two paths from 2 to 1 should be coalesced by unioning their regular expressions
(not shown).
[after removing state 2: 4 →ε→ 1, with a self-loop on state 1 labeled ab ∪ aaa*b ∪ ba*b]
-? ([0-9]+(\.[0-9]*)? | \.[0-9]+)
Matching IP addresses:
Note that some of these constructs are more powerful than regular expressions.
Regular Grammars and Nondeterministic FSAs
Any regular language can be defined by a regular grammar, in which all rules
• have a left hand side that is a single nonterminal
• have a right hand side that is ε, a single terminal, a single nonterminal, or a single terminal followed by a single nonterminal.
S → ε     T → a
S → aT    T → b
S → bT    T → aS
          T → bS
[FSM: S ⇄ T on a, b]
S → ε     A → bA    C → aC
S → aB    A → cA    C → bC
S → aC    A → ε     C → ε
S → bA    B → aB
S → bC    B → cB
S → cA    B → ε
S → cB
Example:
[FSM: states X and Y, with transitions between them labeled a and b]
[Conversion diagram: Regular Grammar ↔ NFSM (NFA) ↔ DFSM (DFA) ↔ Regular Expression]
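The grammar-to-NFSM direction is mechanical; a sketch under our own conventions (lowercase letters are terminals, uppercase letters are nonterminals/states, and "#" is the single extra final state):

def regular_grammar_to_nfa(rules, start):
    # One state per nonterminal plus one extra final state "#":
    #   A → ε    becomes an ε-move  A -> #
    #   A → a    becomes an edge    A -a-> #
    #   A → B    becomes an ε-move  A -> B
    #   A → aB   becomes an edge    A -a-> B
    delta, eps = {}, {}
    for lhs, rhs in rules:
        if rhs == "":
            eps.setdefault(lhs, set()).add("#")
        elif len(rhs) == 1 and rhs.islower():
            delta.setdefault((lhs, rhs), set()).add("#")
        elif len(rhs) == 1:
            eps.setdefault(lhs, set()).add(rhs)
        else:
            delta.setdefault((lhs, rhs[0]), set()).add(rhs[1])
    return delta, eps, start, {"#"}

# The even-length grammar above: S → ε | aT | bT,  T → a | b | aS | bS
rules = [("S", ""), ("S", "aT"), ("S", "bT"),
         ("T", "a"), ("T", "b"), ("T", "aS"), ("T", "bS")]
delta, eps, s, F = regular_grammar_to_nfa(rules, "S")

The resulting machine can be run with the set-of-states simulator sketched earlier.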
Σ* is countably infinite, because its elements can be enumerated one at a time, shortest first.
Any language L over Σ is a subset of Σ*, e.g., L1 = {a, aa, aaa, aaaa, aaaaa, …}
L2 = {ab, abb, abbb, abbbb, abbbbb, …}
The set of all possible languages is thus the power set of Σ*.
The power set of any countably infinite set is not countable. So there are an uncountable number of languages over Σ.
Thus there are more languages than there are regular languages. So there must exist some language that is not regular.
Example
Let Σ = {0, 1, 2, … 9}
Let L ⊆ Σ* be the set of decimal representations for nonnegative integers (with no leading 0's) divisible by 2 or 3.
Recall that a number is divisible by 3 if and only if the sum of its digits is divisible by 3. We can build a FSM to determine that
and accept the language L3a, which is composed of strings of digits that sum to a multiple of 3.
Let L1 be the (regular) language of digit strings with no leading 0's, and L2 the language of representations of numbers divisible by 2 (strings in L1 ending in 0, 2, 4, 6, or 8). Then:
L3 = L1 ∩ L3a
Finally, L = L2 ∪ L3
Another Example
Σ = {0 - 9}
L = {w : w is the social security number of a living US resident}
Any finite language is regular. The size of the language doesn't matter.
FSA's are good at looking for repeating patterns. They don't bring much to the table when the language is just a set of unrelated
strings.
The argument, “I can't find a regular expression or a FSM”, won't fly. (But a proof that there cannot exist a FSM is ok.)
2. The only way to generate/accept an infinite language with a finite description is to use Kleene star (in regular expressions) or
cycles (in automata). This forces some kind of simple repetitive cycle within the strings.
Example:
ab*a generates aba, abba, abbba, abbbba, etc.
Example:
{aⁿ : n ≥ 1 and n is a prime number} is not regular.
[FSM fragment with a loop labeled b: any loop along an accepting path can be traversed repeatedly]
If a FSM of n states accepts any string of length ≥ n, how many strings does it accept? Infinitely many:
L = bab*ab
b a b b b b a b
x     y     z     (y lies within the b* loop)
xy*z must be in L.
If L is regular, then
∃ N ≥ 1, such that
∀ strings w ∈ L, where |w| ≥ N,
∃ x, y, z, such that w = xyz
and |xy| ≤ N,
and y ≠ ε,
and ∀ q ≥ 0, xy^qz is in L.
Example: L = aⁿbⁿ
aaaaaaaaaa bbbbbbbbbb
   x   y        z
∃ N ≥ 1              Call it N
∀ long strings w     We pick one
∃ x, y, z            We show no x, y, z can satisfy all the conditions
Since |xy| ≤ N, y must fall in region 1 (the a's). So y = aᵍ for some g ≥ 1. Pumping in or out (any q but 1) will violate the
constraint that the number of a's has to equal the number of b's.
Suppose L is regular. Since L is regular, we can apply the pumping lemma to L. Let N be the number from the pumping lemma
for L. Choose w = aᴺbᴺ. Note that w ∈ L and |w| ≥ N. From the pumping lemma, there exist some x, y, z where xyz = w and
|xy| ≤ N, y ≠ ε, and ∀ q ≥ 0, xy^qz ∈ L. Because |xy| ≤ N, y consists only of a's. We choose q = 2: then xy²z has N + |y| a's
followed by N b's. Because |y| > 0, xy²z ∉ L (the string has more a's than b's). Thus for all possible x, y, z with xyz = w, there
is a q such that xy^qz ∉ L. Contradiction. ∴ L is not regular.
Note: the underlined parts of the above proof are "boilerplate" that can be reused. A complete proof should have this text or
something equivalent.
You get to choose w. Make it a single string that depends only on N. Choose w so that it makes your proof easier.
You may end up with various cases with different q values that reach a contradiction. You have to show that all possible cases
lead to a contradiction.
Since L is regular it is accepted by some DFSA, M. Let N be the number of states in M. Let w be a string in L of length N or
more.
[diagram: within the first N characters of w = aᴺbᴺ, the computation of M visits some state twice; x drives M to that state, and y loops back to it]
Then, in the first N steps of the computation of M on w, M must visit N+1 states. But there are only N different states, so it must
have visited the same state more than once. Thus it must have looped at least once. We'll call the portion of w that corresponds
to the loop y. But if it can loop once, it can loop an infinite number of times. Thus:
• M can recognize xy^qz for all values of q ≥ 0.
• y ≠ ε (since there was a loop of length at least one)
• |xy| ≤ N (since we found y within the first N steps of the computation)
Choose w = aᴺbᴺ⁺¹
[the first N characters of w are all a's, so x and y lie within the a-region]
We are guaranteed to pump only a's, since |xy| ≤ N. So there exists a number of copies of y that will cause there to be more a's
than b's, thus violating the claim that the pumped string is in L.
L = {w = aᴶbᴷ : J > K}
Choose w = aᴺ⁺¹bᴺ
[the first N characters of w are all a's]
We are guaranteed that y is a string of at least one a, since |xy| ≤ N. But if we pump in a's we get even more a's than b's, resulting
in strings that are in L. So pump out instead: with q = 0, xz has at most N a's and exactly N b's, so xz ∉ L. Contradiction.
L = {w = aᴶbᴷ : J ≥ K}
Choose w = aᴺ⁺¹bᴺ
[the first N characters of w are all a's]
We are guaranteed that y is a string of at least one a, since |xy| ≤ N. But if we pump in a's we get even more a's than b's, resulting
in strings that are in L.
Choose w = abaᴺbᴺ
[x and y fall within the first N characters: the prefix ab followed by a's]
What if L is Regular?
Given a language L that is regular, pumping will work: L = (ab)*. Choose w = (ab)ᴺ. (For instance, y = ab can be pumped: xy^qz ∈ L for every q.)
Note that this does not prove that L is regular. It just fails to prove that it is not.
Once we have some languages that we can prove are not regular, such as aⁿbⁿ, we can use the closure properties of regular
languages to show that other languages are also not regular.
Example: Σ = {a, b}
L = {w : w contains an equal number of a's and b's }
a*b* is regular. So, if L is regular, then L1 = L ∩ a*b* is regular. But L1 = aⁿbⁿ, which we know is not regular. Contradiction, so L is not regular.
But what if L3 and L1 are regular? What can we say about L2?
L3 = L1 ∩ L2.
Example: ab = ab ∩ aⁿbⁿ (so nothing follows about L2: intersecting a regular language with a nonregular one can yield a regular result)
Σ = {a}
L = {w = aᴷ : K is a prime number}
[split w = aᴷ, K ≥ N, into x y z]
|x| + |z| is prime.
|x| + |y| + |z| is prime.
|x| + 2|y| + |z| is prime.
|x| + 3|y| + |z| is prime, and so forth.
Distribution of primes:
[number line: the primes thin out as the numbers grow]
But the Prime Number Theorem tells us that the primes "spread out", i.e., that the number of primes not exceeding x is
asymptotic to x/ln x.
Note that when q = |x| + |z|, |xy^qz| = (|y| + 1) × (|x| + |z|), which is composite (non-prime) if both factors are > 1. If you're careful
about how you choose N in a pumping lemma proof, you can make this true for both factors.
But to use these tools effectively, we may also need domain knowledge (e.g., the Prime Number Theorem).
More Examples
Σ = {0, 1, 2, 3, 4, 5, 6, 7}
L = {w = the octal representation of a number that is divisible by 7}
Example elements of L:
7, 16 (14), 43 (35), 61 (49), 223 (147)
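The machine only needs to track the value read so far mod 7, since appending an octal digit d sends value v to 8v + d, and 8 ≡ 1 (mod 7). A minimal sketch:

def divisible_by_7_octal(w):
    # DFA with states 0..6 = value mod 7; appending digit d: state = (8*state + d) % 7
    state = 0
    for ch in w:
        d = int(ch, 8)                 # raises ValueError on a non-octal digit
        state = (8 * state + d) % 7
    return state == 0 and w != ""

assert divisible_by_7_octal("16")      # octal 16 = decimal 14
assert divisible_by_7_octal("223")     # octal 223 = decimal 147
assert not divisible_by_7_octal("5")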
More Examples
Σ = {W, H, Q, E, S, T, B (measure bar)}
L = {w = w represents a song written in 4/4 time}
Example element of L:
WBWBHHBHQQBHHBQEEQEEB
Σ = {0-9}
L = {w : w is the decimal representation of a prime Fermat number} (the Fermat numbers are Fₙ = 2^(2ⁿ) + 1)
Example elements of L:
F1 = 5, F2 = 17, F3 = 257, F4 = 65,537
Another Example
Σ = {0 - 9, *, =}
L = {w = a*b=c: a, b, c ∈ {0-9}+ and int(a) * int(b) = int(c)}
Regular or not?
• The set of decimal representations for nonnegative integers divisible by 2 or 3
• The set of strings over {a, b} that contain an equal number of a's and b's
• The social security numbers of living US residents
• The octal representations of numbers that are divisible by 7
• Parity checking
• The songs in 4/4 time
• aⁿbⁿ
• The set of prime Fermat numbers
• aʲbᵏ where k > j
• aᵏ where k is prime
Decision Procedures
A decision procedure is an algorithm that answers a question (usually “yes” or “no”) and terminates. The whole idea of a
decision procedure itself raises a new class of questions. In particular, we can now ask,
1. Is there a decision procedure for question X?
2. What is that procedure?
3. How efficient is the best such procedure?
Clearly, if we jump immediately to an answer to question 2, we have our answer to question 1. But sometimes it makes sense to
answer question 1 first. For one thing, it tells us whether to bother looking for answers to questions 2 and 3.
Examples of Question 1:
Is there a decision procedure, given a regular expression E and a string S, for determining whether S is in L(E)?
Is there a decision procedure, given a Turing machine T and an input string S, for determining whether T halts on S?
Let M1 and M2 be two deterministic FSAs. There is a decision procedure to determine whether M1 and M2 are equivalent. Let L1
and L2 be the languages accepted by M1 and M2. Then the language
L = (L1 ∩ ¬L2) ∪ (¬L1 ∩ L2)
= (L1 - L2) ∪ (L2 - L1)
must be regular. L is empty iff L1 = L2. There is a decision procedure to determine whether L is empty and thus whether L1 = L2
and thus whether M1 and M2 are equivalent.
[diagrams: L1 and L2 as separate circles, and as a single circle L1,2 when L1 = L2]
An equivalence relation on a nonempty set A creates a partition of A. We write the elements of the partition as [a1], [a2], …
Example:
Partition:
[fragment of a six-state machine (states 5 and 6 shown, with transitions on a and b); the full machine appears below]
Is this a minimal machine?
State Minimization
Step (1): Get rid of unreachable states.
[machine: states 1 and 2, with transitions on a and b; state 3 has outgoing transitions but nothing leads to it]
State 3 is unreachable.
[machine: states 1, 2, 3 with transitions labeled a and b]
We can't easily find the unreachable states directly. But we can find the reachable ones and determine the unreachable ones from
there. An algorithm for finding the reachable states:
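A sketch of that algorithm as a worklist search (the dict encoding of δ is ours, and the example machine is hypothetical):

def reachable_states(delta, s):
    # delta: dict mapping (state, symbol) -> state; returns the states reachable from s
    reached, agenda = {s}, [s]
    while agenda:
        q = agenda.pop()
        for (p, a), r in delta.items():
            if p == q and r not in reached:
                reached.add(r)
                agenda.append(r)
    return reached

# A machine where state 3 has transitions but can never be entered:
delta = {(1, "a"): 2, (2, "b"): 1, (3, "a"): 1, (3, "b"): 2}
assert reachable_states(delta, 1) == {1, 2}    # 3 is unreachable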
[the machine again: starting from state 1, only states 1 and 2 are ever reached]
Intuitively, two states are equivalent to each other (and thus one is redundant) if all strings in Σ* have the same fate, regardless of
which of the two states the machine is in. But how can we tell this?
[two machines that accept the same language, one with states 1, 2, 3 and one with states 1, 2]
The outcomes are the same, even though the states aren't.
Capture the (weaker) notion of equivalence classes of strings with respect to a language and a particular FSA.
Prove that we can always find a deterministic FSA with a number of states equal to the number of equivalence classes of strings.
We want to capture the notion that two strings are equivalent with respect to a language L if, no matter what is tacked on to them
on the right, either they will both be in L or neither will. Why is this the right notion? Because it corresponds naturally to what
the states of a recognizing FSM have to remember.
Example:
(1) a b b a b
(2) b a b a b
Suppose L = {w ∈ {a,b}* : every a is immediately followed by b}. Are (1) and (2) equivalent?
If two strings are equivalent with respect to L, we write x ≈L y. Formally, x ≈L y if, ∀z ∈ Σ*,
xz ∈ L iff yz ∈ L.
Notice that ≈L is an equivalence relation.
Example:
Σ = {a, b}
L = {w ∈ Σ* : every a is immediately followed by b }
Consider: ε, a, b, aa, bb, aba, aab, bbb, baa
The equivalence classes of ≈L:
Another Example of ≈L
Σ = {a, b}
L = {w ∈ Σ* : |w| is even}
Consider: ε, a, b, aa, bb, aba, aab, bbb, baa, aabb, bbaa, aabaa
The equivalence classes of ≈L:
Another Example of ≈L
Σ = {a, b}
L = aⁿbⁿ
Consider: ε, a, b, aa, ab, ba, bb, aaa, aba, aab, bab, aabb, aabaa, aabbba, aabbaa
The equivalence classes of ≈L:
An Example of ≈L Where All Elements of L Are Not in the Same Equivalence Class
Σ = {a, b}
L = {w ∈ {a, b}* : no two adjacent characters are the same}
Consider: ε, a, b, aa, bb, aba, aab, baa, aabb, aabaa, aabbba, aabbaa
The equivalence classes of ≈L:
What if we now consider what happens to strings when they are being processed by a real FSM?
[a three-state machine that accepts the strings of even length; one of its states is redundant]
Define ~M to relate pairs of strings that drive M from s to the same state.
Formally, if M is a deterministic FSM, then x ~M y if there is some state q in M such that (s, x) |-*M (q, ε) and (s, y) |-*M (q, ε).
An Example of ~M
[the three-state even-length machine again]
Consider: ε, a, b, aa, bb, aba, aab, bbb, baa, aabb, bbaa, aabaa
[the minimal machine: states 1 ⇄ 2 on a, b]
Consider: ε, a, b, aa, bb, aba, aab, bbb, baa, aabb, bbaa, aabaa
The equivalence classes of ~M: |~M| =
~M is a refinement of ≈L: every equivalence class of ~M lies entirely within some class of ≈L.
The Refinement
If R is a refinement of S, then |R| ≥ |S|.
Theorem: For any deterministic finite automaton M and any strings x, y ∈ Σ*, if x ~M y, then x ≈L y.
Proof: If x ~M y, then x and y drive M to the same state q. From q, any continuation string w will drive M to some state r. Thus
xw and yw both drive M to r. Either r is a final state, in which case they both accept, or it is not, in which case they both reject.
But this is exactly the definition of ≈L.
Corollary: |~M| ≥ |≈L|.
If x ≈L(M) y then x ~M y.
What's the smallest number of states we can get away with in a machine to accept L?
This follows directly from the theorem that says that, for any machine M that accepts L, |~M| must be at least as large as |≈L |.
Theorem: Let L be a regular language. Then there is a deterministic FSA that accepts L and that has precisely |≈L | states.
Proof: (by construction)
M = (K, Σ, δ, s, F), where K contains one state for each equivalence class of ≈L.
s = [ε], the equivalence class of ε under ≈L.
F = {[x] : x ∈ L}
δ([x], a) = [xa]
For this construction to prove the theorem, we must show:
1. K is finite.
2. δ is well defined, i.e., δ([x], a) = [xa] is independent of x.
3. L = L(M)
|≈L| is finite → regular: If |≈L| is finite, then the standard DFSA ML accepts L. Since L is accepted by a FSA, it is regular.
Σ = {a, b}
L = {w ∈ {a, b}* : no two adjacent characters are the same}
[machine: start state 1; state 2 = last character was a; state 3 = last character was b; state 4 = dead. 1 →a→ 2, 1 →b→ 3, 2 →b→ 3, 3 →a→ 2, 2 →a→ 4, 3 →b→ 4, 4 loops on a, b]
Consider: ε
a
aa
aaa
aaaa
Equivalence classes:
So Where Do We Stand?
1. We know that for any regular language L there exists a minimal accepting machine ML.
2. We know that |K| of ML equals |≈L|.
3. We know how to construct ML from ≈L.
But is this good enough?
Consider:
[machine: six states with transitions on a and b; it accepts the same language as a smaller machine, so some of its states must be equivalent]
We want to take as input any DFSA M' that accepts L, and output a minimal, equivalent DFSA M.
Define q ≡ p iff for all strings w ∈ Σ*, either w drives M to an accepting state from both q and p or it drives M to a rejecting state
from both q and p.
Example:
Σ = {a, b} L = {w ∈ Σ* : |w| is even}
[the three-state even-length machine from above]
(Where n is the length of the input strings that have been considered so far)
We'll consider input strings, starting with ε, and increasing in length by 1 at each iteration. We'll start by way overgrouping
states. Then we'll split them apart as it becomes apparent (with longer and longer strings) that their behavior is not identical.
Initially, ≡₀ has only two equivalence classes: [F] and [K − F], since on input ε there are only two possible outcomes, accept or reject.
Next consider strings of length 1, i.e., each element of Σ. Split any equivalence classes of ≡₀ that don't behave identically on all
inputs. Note that in all cases, ≡ₙ is a refinement of ≡ₙ₋₁.
Constructing ≡, Continued
More precisely, for any two states p and q ∈ K and any n ≥ 1, q ≡ₙ p iff:
1. q ≡ₙ₋₁ p, AND
2. for all a ∈ Σ, δ(p, a) ≡ₙ₋₁ δ(q, a)
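A sketch of this refinement loop in Python (the encodings are ours, and δ must be total):

def minimize_classes(K, Sigma, delta, F):
    # Start from ≡₀ = {F, K−F}; refine until ≡ₙ = ≡ₙ₋₁.
    classes = [c for c in (set(F), set(K) - set(F)) if c]
    while True:
        def class_of(q):
            return next(i for i, c in enumerate(classes) if q in c)
        groups = {}
        for q in K:
            # q's signature: its current class plus the class each input symbol leads to
            sig = (class_of(q),) + tuple(class_of(delta[(q, a)]) for a in sorted(Sigma))
            groups.setdefault(sig, set()).add(q)
        new_classes = list(groups.values())
        if len(new_classes) == len(classes):     # no class split: we are done
            return new_classes
        classes = new_classes

# |w| even over {a, b}: states "e" (accepting) and "o"
K, Sigma = {"e", "o"}, {"a", "b"}
delta = {(q, a): ("o" if q == "e" else "e") for q in K for a in Sigma}
print(minimize_classes(K, Sigma, delta, {"e"}))   # [{'e'}, {'o'}]: already minimal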
An Example
Σ = {a, b}
[machine: states 1-6 with transitions on a and b]
≡₀ =
≡₁ =
≡₂ =
Another Example
(a*b*)*
[machine: states 1 and 2 with transitions on a and b]
≡₀ =
≡₁ =
Minimal machine:
S → ε     T → a
S → aT    T → b
S → bT    T → aS
          T → bS
[NFA built from the grammar: states S, T, and a new final state #; S ⇄ T on a, b, and T → # on a, b]
Convert to deterministic:
S = {s}
δ=
[deterministic machine: S(1) → T(2) on a, b; T(2) → #S(3) on a, b; #S(3) → T(2) on a, b; #S(3) is final]
≡₀ =
≡₁ =
Minimal machine:
[Diagram: a Grammar generates a Language L; a Machine accepts L]
Most interesting languages are infinite. So we can't write them down. But we can write down finite grammars and finite
machine specifications, and we can define algorithms for mapping between and among them.
[Diagram — Grammars and Machines: Regular Expressions and Regular Grammars (generators) convert to Nondeterministic FSAs, which convert to Deterministic FSAs, which minimize to Minimal DFSAs]
[Diagram: a Context-Free Grammar generates a Context-Free Language L; a Pushdown Automaton accepts L]
Derivation (Generate): working one character at a time and choosing a rule at each step, e.g. S ⇒ aT ⇒ aaS ⇒ aaaT ⇒ aaabS ⇒ … yields a string such as a a a b a a a b.
Parse (Accept): use the corresponding FSM.
Regular grammars must always produce strings one character at a time, moving left to right.
Example 1: L = ab*a
Not regular form:     Regular form:
S → aBa               S → aB
B → ε          vs.    B → a
B → bB                B → bB
Example 2: L = aⁿb*aⁿ
S → B
S → aSa
B → ε
B → bB
Context-Free Grammars
Context-free grammar rules have a single nonterminal on the left-hand side and any string of terminals and nonterminals on the right, e.g.:
S → abDeFGab
x ⇒G y is a binary relation where x, y ∈ V* such that x = αAβ and y = αχβ for some rule A → χ in R.
Example Derivations
[two example parse trees, built by applying one rule at each step from the root down]
Arithmetic Expressions
[two parse trees for id + id * id: one expands E ⇒ E + E with the * grouped below (id + (id * id)); the other expands E ⇒ E * E with the + grouped below ((id + id) * id)]
Backus-Naur Form (BNF) is used to define the syntax of programming languages using context-free grammars.
Main idea: give descriptive names to nonterminals and put them in angle brackets.
Then the string that is identical to w except that it has k additional ('s at the beginning would also be in the language. But it can't
be because the parentheses would be mismatched. So the language is not regular.
(1) Every regular language can be described by a regular grammar. We know this because we can derive a regular grammar from
any FSM (as well as vice versa). Regular grammars are special cases of context-free grammars.
[FSM: S ⇄ T on a, b, from which a regular grammar can be read off]
(2) The context-free languages are precisely the languages accepted by NDPDAs. But every FSM is a PDA that doesn't bother
with the stack. So every regular language can be accepted by a NDPDA and is thus context-free.
(3) Context-free languages are closed under union, concatenation, and Kleene *, and ε and each single character in Σ are clearly
context free.
So the regular languages are a subset of the context-free languages.
[parse tree for id + (id * id): E + E at the root, with E * E below the right operand]
[the two parse trees for id + id * id again]
[parse tree terminology: root, height, nodes, leaves, yield]
Derivations
To capture structure, we must capture the path we took through the grammar. Derivations do that.
S→ε
S → SS
S → (S)
S ⇒ SS ⇒ (S)S ⇒ ((S))S ⇒ (())S ⇒ (())(S) ⇒ (())()     (rule applications 1 2 3 4 5 6)
S ⇒ SS ⇒ (S)S ⇒ ((S))S ⇒ ((S))(S) ⇒ (())(S) ⇒ (())()  (the same applications in order 1 2 3 5 4 6)
S
S S
( S ) ( S )
( S ) ε
ε
Alternative Derivations
S→ε
S → SS
S → (S)
[two parse trees for (())(), grouping the SS expansions differently]
x1     x2      x3        x4       x5         x6         x7
S ⇒ SS ⇒ (S)S ⇒ ((S))S ⇒ (())S ⇒ (())(S) ⇒ (())()
We can write these, or any, derivations as
D1 = x1 ⇒ x2 ⇒ x3 ⇒ … ⇒ xn
D2 = x1' ⇒ x2' ⇒ x3' ⇒ … ⇒ xn'
We say that D1 precedes D2, written D1 < D2, if:
• D1 and D2 are the same length n > 1, and
• there is some integer k, 1 < k < n, such that:
  • for all i ≠ k, xi = xi'
  • xk−1 = x'k−1 = uAvBw, where u, v, w ∈ V* and A, B ∈ V − Σ
  • xk = uyvBw, where A → y ∈ R
  • xk' = uAvzw, where B → z ∈ R
  • xk+1 = x'k+1 = uyvzw
Comparing Several Derivations
Consider three derivations:
(1) S ⇒ SS ⇒ (S)S ⇒ ((S))S ⇒ (())S ⇒ (())(S) ⇒ (())()
D1 < D2
D2 < D3
But D1 does not precede D3.
All three seem similar though. We can define similarity:
D1 is similar to D2 iff the pair (D1, D2) is in the reflexive, symmetric, transitive closure of <.
Note: similar is an equivalence relation.
In other words, two derivations are similar if one can be transformed into another by a sequence of switchings in the order of rule
applications.
Parse Trees Capture Similarity
(1) S ⇒ SS ⇒ (S)S ⇒ ((S))S ⇒ (())S ⇒ (())(S) ⇒ (())()
D1 < D2
D2 < D3
All three derivations are similar to each other. This parse tree describes this equivalence class of the similarity relation:
S
S S
( S ) ( S )
( S ) ε
There's one derivation in this equivalence class that precedes all others in the class.
The leftmost (rightmost) derivation can be used to construct the parse tree and the parse tree can be used to construct the leftmost
(rightmost) derivation.
Another Example
E → id
E→E+E
E→E*E
[the two parse trees: one for id + [id * id], one for [id + id] * id]
Ambiguity
A grammar G for a language L is ambiguous if there exist strings in L for which G can generate more than one parse tree (note
that we don't care about the number of derivations).
E → id
E→E+E
E→E*E
Often, when this happens, we can find a different, unambiguous grammar to describe L.
Another Example
The following grammar for the language of matched parentheses is ambiguous:
S→ε
S → SS
S → (S)
[the two distinct parse trees for (())() from above]
An unambiguous grammar for the same language:
S → ε
S → S1
S1 → S1 S1
S1 → (S1)
S1 → ()
[parse tree for (())() under this grammar]
S → S1 | S2
S1 → S1c | A      A → aAb | ε
S2 → aS2 | B      B → bBc | ε
Now consider the strings aⁿbⁿcⁿ. They have two distinct derivations (one via S1, one via S2).
Inherent Ambiguity of CFLs
A context free language with the property that all grammars that generate it are ambiguous is inherently ambiguous.
Other languages that appear ambiguous given one grammar turn out not to be inherently ambiguous because we can find an
unambiguous grammar.
Whenever we design practical languages, it is important that they not be inherently ambiguous.
Just Recognizing
The insight: Precisely what it needs is a stack, which gives it an unlimited amount of memory with a restricted structure.
[PDA: a finite state controller reads the input ( ( ( ( ( ) ) ) ) ( ) ( ( ) ) and uses a stack, shown here holding six ('s]
(K × (Σ ∪ {ε}) × Γ* ) × ( K × Γ* )
[PDA for balanced brackets: one state s; [//[ pushes [ and ]/[/ pops [ ]
Important: an ε in the pop field of a transition means that the transition reads no stack symbol. This does not mean that the stack is empty.
An Example of Accepting
[//[
s
]/[/
∆ contains:
[1] ((s, [, ε), (s, [ ))
[2] ((s, ], [ ), (s, ε))
input = [ [ [ ] [ ] ] ]
An Example of Rejecting
[the same machine: [//[ and ]/[/ ]
∆ contains:
[1] ((s, [, ε), (s, [ ))
[2] ((s, ], [ ), (s, ε))
input = [ [ ] ] ]
First we notice:
• We'll use the stack to count the a's.
• This time, all strings in L have two regions. So we need two states so that a's can't follow b's. Note the similarity to the
regular language a*b*.
A PDA for {wcwᴿ : w ∈ {a, b}*}:
[states s (start) and f (final); in s: a//a and b//b push; c// moves to f; in f: a/a/ and b/b/ pop on a match]
∆ contains:
[1] ((s, a, ε), (s, a))
[2] ((s, b, ε), (s, b))
[3] ((s, c, ε), (f, ε))
[4] ((f, a, a), (f, ε))
[5] ((f, b, b), (f, ε))
input = b a c a b
trans   state   unread input   stack
        s       bacab          ε
2       s       acab           b
1       s       cab            ab
3       f       ab             ab
4       f       b              b
5       f       ε              ε
The nondeterministic version, for {wwᴿ : w ∈ {a, b}*}:
[the same machine, but the move from s to f is ε// — the machine must guess the middle of the string]
[1] ((s, a, ε), (s, a)) [4] ((f, a, a), (f, ε))
[2] ((s, b, ε), (s, b)) [5] ((f, b, b), (f, ε))
[3] ((s, ε, ε), (f, ε))
input: a a b b a a
Accepting Mismatches
L = {aᵐbⁿ : m ≠ n; m, n > 0}
[PDA sketch: state 1 pushes a's (a//a); on a b it pops one a (b/a/) and moves to state 2, which pops one a per b (b/a/) and can consume extra b's against an empty stack (b/ε/)]
[continuation of the machine: a state 4 consumes leftover b's (b//) to accept the m < n case]
A PDA is deterministic if, for each input and state, there is at most one possible transition. Determinism implies uniquely
defined machine behavior.
[machine fragments: to detect the bottom of the stack, a new start state S first pushes a marker Z (ε//Z) before entering S', the machine for N; a transition such as b/Z/ can then fire only when all the a's have been popped]
S→A
S→B
A→ε
A → aAb
B→ε
B → bBa
A DPDA for L:
More on PDAs
Example: Accept by final state at end of string (i.e., we don't care about the stack being empty)
We can easily convert from one of our machines to one of these:
1. Add a new state at the beginning that pushes # onto the stack.
2. Add a new final state and a transition to it that can be taken if the input string is empty and the top of the stack is #.
Converting the balanced parentheses machine:
But what we really want to do with languages like this is to extract structure.
Theorem: The class of languages accepted by PDAs is exactly the class of context-free languages.
Recall: context-free languages are languages that can be defined with context-free grammars.
Restate theorem: Can describe with context-free grammar ⇔ Can accept by PDA
E → E + T
E → T
T → T * F
T → F
F → (E)
F → id
[PDA: state 1 moves to state 2 while pushing E (ε/ε/E); all further work happens in state 2]
(1) (2, ε, E), (2, E+T) (7) (2, id, id), (2, ε)
(2) (2, ε, E), (2, T) (8) (2, (, ( ), (2, ε)
(3) (2, ε, T), (2, T*F) (9) (2, ), ) ), (2, ε)
(4) (2, ε, T), (2, F) (10) (2, +, +), (2, ε)
(5) (2, ε, F), (2, (E) ) (11) (2, *, *), (2, ε)
(6) (2, ε, F), (2, id)
The Top-down Parse Conversion Algorithm
Given G = (V, Σ, R, S), construct M such that L(M) = L(G):
M = ({p, q}, Σ, V, ∆, p, {q}), where ∆ contains:
• ((p, ε, ε), (q, S))   (push the start symbol)
• ((q, ε, A), (q, α)) for each rule A → α in R   (expand a nonterminal)
• ((q, a, a), (q, ε)) for each a ∈ Σ   (match a predicted terminal against the input)
The resulting machine can execute a leftmost derivation of an input string in a top-down fashion.
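A sketch of the construction in Python (rules as (lhs, rhs) string pairs with "" playing the role of ε; transitions in the ((state, input, pop), (state, push)) form used in these notes):

def grammar_to_pda(rules, terminals, start):
    # The two-state top-down PDA.
    delta = [(("p", "", ""), ("q", start))]            # push the start symbol
    for lhs, rhs in rules:
        delta.append((("q", "", lhs), ("q", rhs)))     # expand a nonterminal
    for a in terminals:
        delta.append((("q", a, a), ("q", "")))         # match a predicted terminal
    return delta

# The balanced-parentheses grammar S → ε | SS | (S):
for t in grammar_to_pda([("S", ""), ("S", "SS"), ("S", "(S)")], ["(", ")"], "S"):
    print(t)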
Another Example
L = {a^n b^m c^p d^q : m + n = p + q}
0 (p, ε, ε), (q, S)
(1) S → aSd 1 (q, ε, S), (q, aSd)
(2) S→T 2 (q, ε, S), (q,T)
(3) S→U 3 (q, ε, S), (q,U)
(4) T → aTc 4 (q, ε, T), (q, aTc)
(5) T→V 5 (q, ε, T), (q, V)
(6) U → bUd 6 (q, ε, U), (q, bUd)
(7) U→V 7 (q, ε, U), (q, V)
(8) V → bVc    8 (q, ε, V), (q, bVc)
(9) V→ε 9 (q, ε, V), (q, ε)
10 (q, a, a), (q, ε)
11 (q, b, b), (q, ε)
input = a a b c d d 12 (q, c, c), (q, ε)
13 (q, d, d), (q, ε)
Machines constructed with the algorithm are often nondeterministic, even when they needn't be. This happens even with trivial
languages.
Example: L = aⁿbⁿ
A nondeterministic transition group is a set of two or more transitions out of the same state that can fire on the same
configuration. A PDA is nondeterministic if it has any nondeterministic transition groups.
Lemma: If a language is accepted by a pushdown automaton, it is a context-free language (i.e., it can be described by a context-
free grammar).
Proof (by construction)
[the machine augmented for the construction: a new start state s' pushes a bottom marker Z (ε/ε/Z); the original transitions run between s and f; a new final state f' is entered by popping Z (ε/Z/), so the machine empties its stack exactly once, at the end]
Step 2:
(1) Assure that |β| ≤ 1.
If the nonterminal <s1, X, s2> ⇒* w, then the PDA starts in state s1 with (at least) X on the stack and after consuming w and
popping the X off the stack, it ends up in state s2.
Now it's time to worry about extracting structure (and doing so efficiently), e.g., from a + b * c.
There are lots of ways to transform grammars so that they are more useful for a particular purpose.
The basic idea:
1. Apply transformation 1 to G to get rid of undesirable property 1. Show that the language generated by G is unchanged.
2. Apply transformation 2 to G to get rid of undesirable property 2. Show that the language generated by G is unchanged AND
that undesirable property 1 has not been reintroduced.
3. Continue until the grammar is in the desired form.
Examples:
• Getting rid of ε rules (nullable rules)
• Getting rid of sets of rules with a common initial terminal, e.g.,
• A → aB, A → aC become A → aD, D → B | C
• Conversion to normal forms
If you want to design algorithms, it is often useful to have a limited number of input forms that you have to deal with.
Normal forms are designed to do just that. Various ones have been developed for various purposes.
Examples:
• Chomsky Normal Form, in which all rules are of one of the following two forms:
• X → a, where a ∈ Σ, or
• X → BC, where B and C are nonterminals in G
• Greibach Normal Form, in which all rules are of the following form:
• X → a β, where a ∈ Σ and β is a (possibly empty) string of nonterminals
If L is a context-free language that does not contain ε, then if G is a grammar for L, G can be rewritten into both of these normal
forms.
2. Remove from G' all unit productions (rules of the form A → B, where B is a nonterminal):
2.1. Remove from G' all unit productions of the form A → A.
2.2. For all nonterminals A, find all nonterminals B such that A ⇒* B, A ≠ B.
2.3. Create G'' and add to it all rules in G' that are not unit productions.
2.4. For all A and B satisfying 2.2, add to G''
A → y1 | y2 | … where B → y1 | y2 | … is in G''.
2.5. Set G' to G''.
Example: A→a
A→B
A → EF
B→A
B → CD
B→C
C → ab
At this point, all rules whose right hand sides have length 1 are in Chomsky Normal Form.
3. Remove from G' all productions P whose right hand sides have length greater than 1 and include a terminal (e.g., A →
aB or A → BaC):
3.1. Create a new nonterminal Ta for each terminal a in Σ.
3.2. Modify each production P by substituting Ta for each terminal a.
3.3. Add to G', for each Ta, the rule Ta → a
Example:
A → aB     becomes    A → TaB
A → BaC    becomes    A → BTaC
A → BbC    becomes    A → BTbC
with the new rules:
Ta → a
Tb → b
Conversion to Chomsky Normal Form
4. Remove from G' all productions P whose right hand sides have length greater than 2 (e.g., A → BCDE)
4.1. For each P of the form A → N1N2N3N4…Nn, n > 2, create new nonterminals M2, M3, … Mn-1.
4.2. Replace P with the rule A → N1M2.
4.3. Add the rules M2 → N2M3, M3 → N3M4, … Mn-1 → Nn-1Nn
Example:
A → BCDE (n = 4)
A → BM2
M2 → C M3
M3 → DE
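A sketch of step 4 in Python (rules as (lhs, list-of-symbols) pairs; fresh nonterminals are generated as M1, M2, … rather than being numbered per rule as in the example):

def remove_long_rules(rules):
    # Replace A -> N1 N2 ... Nn (n > 2) with A -> N1 M, M -> N2 M', ...
    out, counter = [], 0
    for lhs, rhs in rules:
        while len(rhs) > 2:
            counter += 1
            m = f"M{counter}"
            out.append((lhs, [rhs[0], m]))   # peel off the first symbol
            lhs, rhs = m, rhs[1:]
        out.append((lhs, rhs))
    return out

print(remove_long_rules([("A", ["B", "C", "D", "E"])]))
# [('A', ['B', 'M1']), ('M1', ['C', 'M2']), ('M2', ['D', 'E'])]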
Top Down
[top-down parse: E is expanded to E + T, and the left subtree is expanded further until id is reached]
Bottom Up
[bottom-up parse of id + id: each id is reduced to F, then to T, and finally the whole string to E]
[1] E→E+T
[2] E→T
[3] T→T*F
[4] T→F
[5] F → (E)
[6] F → id
[7] F → id(E)
Example: id + id * id(id)
Stack:
E ⇒ E + T ⇒ T + T ⇒ F + T ⇒ id + T ⇒
id + T * F ⇒ id + F * F ⇒ id + id * F ⇒
id + id * id(E) ⇒ id + id * id(T) ⇒
id + id * id(F) ⇒ id + id * id(id)
[parse tree for id + id * id(id)]
In the case of regular languages, we could cope with nondeterminism in either of two ways:
• Create an equivalent deterministic recognizer (FSM)
• Simulate the nondeterministic FSM in a number of steps that was still linear in the length of the input string.
We'd really like to find a deterministic parsing algorithm that could run in time proportional to the length of the input string.
In general: No
Some definitions:
• A PDA M is deterministic if it has no two transitions such that for some (state, input, stack sequence) the two transitions
could both be taken.
Theorem: The class of deterministic context-free languages is a proper subset of the class of context-free languages.
Proof: Later.
Adding a Terminator to the Language
We define the class of deterministic context-free languages with respect to a terminator ($) because we want that class to be as
large as possible.
Proof:
Without the terminator ($), many seemingly deterministic CFLs aren't. Example:
a* ∪ {aⁿbⁿ : n > 0}
What if we add the ability to look one character ahead in the input string?
Example: id + id * id(id), with ↑ marking the lookahead position:
E ⇒ E + T ⇒ T + T ⇒ F + T ⇒ id + T ⇒
id + T * F ⇒ id + F * F ⇒ id + id * F
Considering transitions:
(5) (2, ε, F), (2, (E) )
(6) (2, ε, F), (2, id)
(7) (2, ε, F), (2, id(E))
So we've solved part of the problem. But what do we do when we come to the end of the input? What will be the state indicator
then?
The solution is to modify the language. Instead of building a machine to accept L, we will build a machine to accept L$.
Using Lookahead
For now, we'll ignore the issue of when we read the lookahead character and the fact that we only care about it if the top symbol
on the stack is F.
Possible Solutions to the Nondeterminism Problem
1. Left factor: whenever
A → αβ1
A → αβ2 …
A → αβn
are rules with α ≠ ε and n ≥ 2, replace them by the rules:
A → αA'
A' → β1
A' → β2 …
A' → βn
2. Get rid of left recursion. The problem:
E ⇒ E + T ⇒ E + T + T ⇒ … (the parser can keep expanding E without ever consuming any input)
Replace E → E + T and E → T with:
E → TE'
E' → +TE'
E' → ε
LL(k) Languages
We have just offered heuristic rules for getting rid of some nondeterminism.
We know that not all context-free languages are deterministic, so there are some languages for which these rules won't work.
We define a grammar to be LL(k) if it is possible to decide what production to apply by looking ahead at most k symbols in the
input string.
If a language L has an LL(1) grammar, then we can build a deterministic LL(1) parser for it. Such a parser scans the input Left to
right and builds a Leftmost derivation.
The heart of an LL(1) parser is the parsing table, which tells it which production to apply at each step.
For example, here is the parsing table for our revised grammar of arithmetic expressions without function calls:
V\Σ   id        +          *          (         )        $
E     E→TE'                           E→TE'
E'              E'→+TE'                         E'→ε     E'→ε
T     T→FT'                           T→FT'
T'              T'→ε       T'→*FT'              T'→ε     T'→ε
F     F→id                            F→(E)
Given input id + id * id, the first few moves of this parser will be:
Rule        Stack     Remaining input
(start)     E         id + id * id$
E→TE'       TE'       id + id * id$
T→FT'       FT'E'     id + id * id$
F→id        idT'E'    id + id * id$
match id    T'E'      + id * id$
T'→ε        E'        + id * id$
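A sketch of the table-driven parser loop using exactly that table (the token and table encodings are ours; id is treated as a single token):

# Parsing table for E→TE', E'→+TE'|ε, T→FT', T'→*FT'|ε, F→id|(E)
table = {
    ("E",  "id"): ["T", "E'"],      ("E",  "("): ["T", "E'"],
    ("E'", "+"):  ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T",  "id"): ["F", "T'"],      ("T",  "("): ["F", "T'"],
    ("T'", "+"):  [], ("T'", "*"): ["*", "F", "T'"], ("T'", ")"): [], ("T'", "$"): [],
    ("F",  "id"): ["id"],           ("F",  "("): ["(", "E", ")"],
}
nonterminals = {"E", "E'", "T", "T'", "F"}

def ll1_parse(tokens):
    # Predictive parse: expand nonterminals via the table, match terminals.
    stack = ["E"]
    tokens = tokens + ["$"]
    i = 0
    while stack:
        top = stack.pop()
        if top in nonterminals:
            stack.extend(reversed(table[(top, tokens[i])]))   # KeyError = parse error
        else:
            assert top == tokens[i], "parse error"
            i += 1                                            # match and consume
    return tokens[i] == "$"

assert ll1_parse(["id", "+", "id", "*", "id"])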
Example:
ST → if C then ST else ST
ST → if C then ST
Now we've procrastinated the decision. But the language is still ambiguous. What if the input is: if C1 then if C2 then ST1 else ST2?
[parse tree built top-down by the LL(1) parser, using the E → TE' grammar]
Bottom Up Parsing
An Example:
[1] E→E+T
[2] E→T
[3] T→T*F
[4] T→F
[5] F → (E)
[6] F → id
id + id * id $
M will be:
[shift-reduce PDA: states p and q; shift moves push input symbols onto the stack, reduce moves pop a rule's right-hand side and push its left-hand side, and $/S/ (reading $ with S on the stack) moves to the final state q]
[bottom-up parse of id + id * id $: each id is reduced to F and then T; T * F is reduced to T; finally E + T is reduced to E]
We can reconstruct the derivation that we found by reading the results of the parse bottom to top, producing the rightmost derivation:
E ⇒ E + T ⇒ E + T * F ⇒ E + T * id ⇒ E + F * id ⇒ E + id * id ⇒ T + id * id ⇒ F + id * id ⇒ id + id * id
Let's return to the problem of deciding when to shift and when to reduce (as in our example).
This corresponds to knowing that “+” has low precedence, so if there are any other operations, we need to do them first.
Solution:
1. Add a one character lookahead capability.
2. Define the precedence relation
P ⊆ (V × (Σ ∪ {$}))
where the first component is the top stack symbol and the second is the next input symbol.
If (a, b) is in P, we reduce (without consuming the input). Otherwise we shift (consuming the input).
If the top of the stack is a (sitting above some γ) and the next input character is b, and we should reduce now, before the b goes
onto the stack, then we put (a, b) in P. That means we'll try to reduce if a is on top of the stack and b is the next character. We
will actually succeed if the next part of the stack is γ.
T*F on top of the stack corresponds to the right-hand side of the rule T → T*F, so we can reduce it to T. Input: id * id * id
V\Σ   (   )   id   +   *   $
(
)         •        •   •   •
id        •        •   •   •
+
*
E
T         •        •       •
F         •        •   •   •
A Different Example
E+T on top of the stack corresponds to the rule E → E + T. But if the next input character is *, we must not reduce yet: the * has
to be attached to the T first. That is why (T, *) is not in the precedence relation above, so the parser shifts instead.
ST → if C then ST else ST
ST → if C then ST
Consider: if C1 then if C2 then ST1 (position 1) else ST2 (position 2)
We don't put (ST, else) in the precedence relation, so we will not reduce at 1. At 2, we reduce:
[stack contents, top first: ST2 (position 2), else, ST1 (position 1), then, C2, if, then, C1, if]
A simple to implement heuristic rule, when faced with competing reductions, is: choose the longest possible string on top of the stack to reduce.
We call grammars that become unambiguous with the addition of a precedence relation and the longest string reduction heuristic
weak precedence grammars.
LR Parsers
LR parsers scan each input Left to right and build a Rightmost derivation. They operate bottom up and deterministically using a
parsing table derived from a grammar for the language to be recognized.
A grammar that can be parsed by an LR parser examining up to k input symbols on each move is an LR(k) grammar. Practical
LR parsers set k to 1.
An LALR ( or Look Ahead LR) parser is a specific kind of LR parser that has two desirable properties:
• The parsing table is not huge.
• Most useful languages can be parsed.
[LALR parser structure: the input string feeds a lexical analyzer, which hands tokens to the parser; the parser keeps a stack of states, consults a parsing table, and produces output]
In simple cases, think of the "states" on the stack as corresponding to either terminal or nonterminal characters.
In more complicated cases, the states contain more information: they encode both the top stack symbol and some facts about
lower objects in the stack. This information is used to determine which action to take in situations that would otherwise be
ambiguous.
The Actions the Parser Can Take
At each step of its operation, an LR parser does the following two things:
1) Based on its current state, it decides whether it needs a lookahead token. If it does, it gets one.
2) Based on its current state and the lookahead token if there is one, it chooses one of four possible actions:
• Shift the lookahead token onto the stack and clear the lookahead token.
• Reduce the top elements of the stack according to some rule of the grammar.
• Detect the end of the input and accept the input string.
• Detect an error in the input.
state 0 (empty)
   $accept : _rhyme $end     ⇐ the rule this came from (_ marks the current position of the input)
   DING shift 3              ⇐ push state 3
   . error                   ⇐ if none of the others match
   rhyme goto 1
   sound goto 2              ⇐ push state 2

state 1 (rhyme)
   $accept : rhyme_$end
   $end accept               ⇐ if we see EOF, accept
   . error

state 2 (sound)
   rhyme : sound_place
   DELL shift 5
   . error
   place goto 4

state 3 (DING)
   sound : DING_DONG
   DONG shift 6
   . error

state 4 (place)
   rhyme : sound place_ (1)
   . reduce 1                ⇐ reduce by rule 1

state 5 (DELL)
   place : DELL_ (3)
   . reduce 3

state 6 (DONG)
   sound : DING DONG_ (2)
   . reduce 2
Example: after shifting an id, the parser's next action on ( depends on whether that id was an ordinary identifier or a procname (as in procname(id)); the state pushed for the id encodes that distinction.
The parsing table can get complicated as we incorporate more stack history into the states.
A string of input tokens, corresponding to the primitive objects of which the input is composed:
-(id * id) + id / id
Output: -1414.52
[pipeline: the source string goes to a lexical analyzer built with lex, which produces the token stream consumed by a parser built with yacc]
All strings that are not matched by any rule are simply copied to the output.
Rules: each rule is a pattern followed by an action. Example:
integer     action 1
[a-z]+      action 2
yacc
(Yet Another Compiler Compiler)
The input to yacc:
declarations
%%
rules
%%
#include "lex.yy.c"
any other programs
This structure means that lex.yy.c will be compiled as part of y.tab.c, so it will have access to the same token names.
Declarations:
Rules:
V : a b c
V : a b c {action}
V : a b c {$$ = $2}     returns the value of b
The parser table that yacc creates represents some decision about what to do if there is ambiguity in the input grammar rules.
How does yacc make those decisions? By default, yacc invokes two disambiguating rules:
1. In the case of a shift/reduce conflict, shift.
2. In the case of a reduce/reduce conflict, reduce by the earlier grammar rule.
yacc tells you when it has had to invoke these rules.
ST → if C then ST else ST
ST → if C then ST
1 2
Which bracketing (rule) should we choose?
[Parse trees: the two bracketings. Either the else attaches to the inner if, giving if C1 then (if C2 then ST1 else ST2), or it attaches to the outer if, giving if C1 then (if C2 then ST1) else ST2.]
We know that we can force left associativity by writing it into our grammars.
Example:
E → E + T
E → T
T → id
[Parse tree: id + id + id grouped as (id + id) + id.]
What does the shift rather than reduce heuristic do if we instead write:
E → E + E
E → id
id + id + id
Shift/Reduce Conflicts - Operator Precedence
One solution was the precedence table, derived from an unambiguous grammar, which can be encoded into the parsing table of an
LR parser, since it tells us what to do for each top-of-stack, input character combination.
Operator Precedence
We know that we can write an unambiguous grammar for arithmetic expressions that gets the precedence right. But it turns out
that we can build a faster parser if we instead write:
E → E + E | E * E | (E) | id
And, in addition, we specify operator precedence. In yacc, we specify associativity (since we might not always want left) and
precedence using statements in the declaration section of our grammar:
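For example, the declarations for ordinary arithmetic would look like this (a sketch of the standard yacc form; '+' and '-' on the first line, '*' and '/' on the second):

    %left '+' '-'
    %left '*' '/'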
Operators on the first line have lower precedence than operators on the second line, and so forth.
This can easily be used to simulate the longest prefix heuristic, "Choose the longest possible stack string to reduce."
[1] E → E + T
[2] E → T
[3] T → T * F
[4] T → F
[5] F → (E)
[6] F → id
Step 2:
$ lex ourlex.l creates lex.yy.c
$ yacc ouryacc.y creates y.tab.c
$ cc -o ourprog y.tab.c -ly -ll actually compiles y.tab.c and lex.yy.c, which is included.
-ly links the yacc library, which includes main and yyerror.
-ll links the lex library
Step 3: Run the program
$ ourprog
[Diagram: the parser asks the lexical analyzer for a token; the lexical analyzer returns the token and sets its value in yylval.]
Summary
Efficient parsers for languages with the complexity of a typical programming language or command line interface:
• Make use of special purpose constructs, like precedence, that are very important in the target languages.
• May need complex transition functions to capture all the relevant history in the stack.
• Use heuristic rules, like shift instead of reduce, that have been shown to work most of the time.
• Would be very difficult to construct by hand (as a result of all of the above).
Proof:
(1) There are a countably infinite number of context-free languages. This is true because every description of a context-free
language is of finite length, so there are a countably infinite number of such descriptions.
(2) There are an uncountable number of languages over any nonempty alphabet.
Thus there are more languages than there are context-free languages.
Example: {a^n b^n c^n}
Showing that a Language is Context-Free
Unfortunately, the tools for showing that a language is context-free are weaker than the corresponding tools for regular languages.
The Context-Free Languages are Closed Under Kleene Star
Let L = L(G1)*. Construct a new grammar G from G1 = (V1, Σ1, R1, S1) by adding a new start symbol S and the rules S → S S1 and S → ε.
L1 ∩ L2 = ¬(¬L1 ∪ ¬L2)
We proved closure under intersection for regular languages in two different ways. Can we use either of them here?
1. Given a deterministic automaton for L, construct an automaton for its complement. Argue that, if closed under complement
and union, must be closed under intersection.
2. Given automata for L1 and L2, construct a new automaton for L1 ∩ L2 by simulating the parallel operation of the two original
machines, using states that are the Cartesian product of the sets of states of the two original machines.
We construct a new PDA, M3, that accepts L ∩ R by simulating the parallel execution of M1 and M2.
Insert into ∆: (((q1, q2), a, β), ((p1, p2), γ)) whenever ∆1 contains ((q1, a, β), (p1, γ)) and δ2 takes q2 to p2 on a.
This works because we can get away with only one stack: the finite-state machine for R contributes no stack of its own.
Example
L = a^n b^n ∩ (aa)*(bb)*
[Diagrams: a PDA M1 (states A, B) for a^n b^n and an FSM M2 (states 1 through 4) for (aa)*(bb)*.]
M1's transitions:              M2's transitions:
((A, a, ε), (A, a))            (1, a, 2)
((A, b, a), (B, ε))            (2, a, 1)
((B, b, a), (B, ε))            (1, b, 3)
                               (3, b, 4)
                               (4, b, 3)
A PDA for L (the product construction applied to M1 and M2): [diagram omitted]
Don't Try to Use Closure Backwards
If L1 and L2 are context free, then so is L3 = L1 ∪ L2.
But what if L3 and L1 are context free? What can we say about L2?
Example:
This time we use parse trees, not automata, as the basis for our argument.
[Figure: a parse tree for w, with the yield partitioned as u v x y z.]
If L is a context-free language, and if w is a string in L where |w| > K, for some value of K, then w can be rewritten as uvxyz,
where |vy| > 0 and |vxy| ≤ M, for some value of M.
uxz, uvxyz, uvvxyyz, uvvvxyyyz, etc. (i.e., uv^n x y^n z, for n ≥ 0) are all in L.
Some Tree Basics
[Figure: a tree, labeled with its root, height, nodes, leaves, and yield.]
Theorem: The length of the yield of any tree T with height H and branching factor (fanout) B is ≤ B^H.
Proof: By induction on H. If H is 1, then just a single rule applies. By definition of fanout, the longest yield is B.
Assume true for H = n.
Consider a tree with H = n + 1. It consists of a root and some number of subtrees, each of which is of height ≤ n (so the induction
hypothesis holds) and has yield of length ≤ B^n. The number of subtrees is ≤ B. So the yield must be ≤ B(B^n) = B^(n+1).
What Is K?
[Figure: a tall parse tree with root S and yield partitioned as u v x y z.]
If |w| > B^T, the parse tree for w must have height > T, so some path contains T + 1 nonterminals, and at least one nonterminal repeats on it.
So K = B^T, where T is the number of nonterminals in G and B is the branching factor (fanout).
What is M?
[Figure: the subtree rooted at the upper of the two repeated nonterminals, with yield partitioned as u v x y z.]
Assume that we are considering the bottom-most two occurrences of some nonterminal. Then the yield of the upper one has length at
most B^(T+1) (since below it only one nonterminal repeats).
So M = B^(T+1).
The Context-Free Pumping Lemma
Theorem: Let G = (V, Σ, R, S) be a context-free grammar with T nonterminal symbols and fanout B. Then any string w ∈ L(G)
where |w| > K (= B^T) can be rewritten as w = uvxyz in such a way that:
• |vy| > 0,
• |vxy| ≤ M (= B^(T+1)) (making this the "strong" form),
• for every n ≥ 0, uv^n x y^n z is in L(G).
Proof:
Let w be such a string and let t be the parse tree with root labeled S and with yield w that has the smallest number of leaves
among all parse trees with the same root and yield. t has a path of length at least T+1, with a bottommost repeated nonterminal,
which we'll call A. Clearly v and y can be repeated any number of times (including 0). If |vy| = 0, then there would be a tree with
root S and yield w with fewer leaves than t. Finally, |vxy| ≤ B^(T+1).
An Example of Pumping
L = {a^n b^n c^n : n ≥ 0}
[Figure: w = aa…abb…bcc…c, partitioned as u v x y z.]
Unfortunately, we don't know where v and y fall. But there are two possibilities:
1. If vy contains all three symbols, then at least one of v or y must contain two of them. But then uvvxyyz contains at least one
out of order symbol.
2. If vy contains only one or two of the symbols, then uvvxyyz must contain unequal numbers of the symbols.
We need to pick w, then show that there are no values for uvxyz that satisfy all the above criteria. To do that, we just need to
focus on possible values for v and y, the pumpable parts. So we show that all possible picks for v and y violate at least one of
the criteria.
For each possibility for v and y (described in terms of the regions defined above), find some value n such that uv^n x y^n z is not in L.
Almost always, the easiest values are 0 (pumping out) or 2 (pumping in). Your value for n may differ for different cases.
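The bookkeeping is mechanical enough to sketch in a few lines of Python (the names are ours, not from the notes): given a candidate split w = uvxyz, pump with each chosen n and test membership in L = {a^n b^n c^n}.

    def in_L(s):
        # membership test for {a^n b^n c^n : n >= 0}
        n = len(s) // 3
        return s == 'a' * n + 'b' * n + 'c' * n

    def pumps_out(u, v, x, y, z, ns=(0, 2)):
        # True if some n in ns pumps the string out of L,
        # i.e., this split cannot satisfy the pumping lemma
        return any(not in_L(u + v * n + x + y * n + z) for n in ns)

    # w = aabbcc, split with v = 'a' (region 1) and y = 'b' (region 2):
    assert pumps_out('a', 'a', 'b', 'b', 'cc')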
v    y    n    why the resulting string is not in L
[Rows [1] through [10], one row per case, left blank in the notes to be filled in.]
Q. E. D.
Suppose L is context free. The context-free pumping lemma applies to L. Let M be the number from the pumping lemma.
Choose w = a^M b^M c^M. Now w ∈ L and |w| > M ≥ K. From the pumping lemma, for all strings w, where |w| > K, there exist u, v, x,
y, z such that w = uvxyz and |vy| > 0, and |vxy| ≤ M, and for all n ≥ 0, uv^n x y^n z is in L. There are two main cases:
1. Either v or y contains two or more different types of symbols ("a", "b" or "c"). In this case, uv^2 x y^2 z is not of the form
a*b*c* and hence uv^2 x y^2 z ∉ L.
2. Neither v nor y contains two or more different types of symbols. In this case, vy may contain at most two types of
symbols. The string uv^0 x y^0 z will decrease the count of one or two types of symbols, but not the third, so uv^0 x y^0 z ∉ L.
Cases 1 and 2 cover all the possibilities. Therefore, regardless of how w is partitioned, there is some uv^n x y^n z that is not in L.
Contradiction. Therefore L is not context free.
Note: the underlined parts of the above proof are "boilerplate" that can be reused. A complete proof should have this text or
something equivalent.
L = {a^n b^n}
L′ = {a^n a^n}
   = {a^2n}
   = {w ∈ {a}* : |w| is even}
L = {a^n b^m : n, m ≥ 0 and n ≠ m}
L′ = {a^n a^m : n, m ≥ 0 and n ≠ m}
   = {w ∈ {a}* : |w| ≥ 1}
Another Language That Is Not Context Free
L = {a^n : n ≥ 1 is prime}
Two ways to see that L is not context free:
1. Use the pumping lemma.
2. |Σ_L| = 1. So if L were context free, it would also be regular. But we know that it is not. So it is not context free either.
Now what? Consider L = {tt : t ∈ {a, b}*}.
[Figure: w = t t, partitioned as u v x y z.]
What if u is ε, v is the first t, x is ε, y is the second t, and z is ε? Then every pumped string is still of the form ss, so pumping yields no contradiction.
What if we let |w| > M, i.e. choose to pump the string a^M b a^M b:
[Figure: a^M b a^M b = t t, partitioned as u v x y z.]
Suppose |v| = |y|. Now we have to show that repeating them makes the two copies of t different. But we can’t.
This time, we let |w| > 2M, with both the number of a's and the number of b's in w greater than M:
    1          2          3          4
aaaaaaaaaa bbbbbbbbbb aaaaaaaaaa bbbbbbbbbb
(the first half is t, the second half is t; w is partitioned as u v x y z with |vxy| ≤ M)
First, notice that if either v or y contains both a's and b's, then we immediately violate the rules for L′ when we pump.
So now we know that v and y must each fall completely within one of the four marked regions.
The possible placements of v and y among the regions are:
(1,1)  (2,2)  (3,3)  (4,4)
(1,2)  (2,3)  (3,4)
(1,3)  (2,4)
(1,4)
The Context-Free Languages Are Not Closed Under Intersection
Consider L = {a^n b^n c^n : n ≥ 0}.
L is not context-free.
But L = L1 ∩ L2, where L1 = {a^n b^n c^m : n, m ≥ 0} and L2 = {a^m b^n c^n : n, m ≥ 0} are both context-free.
So, if the context-free languages were closed under intersection, L would have to be context-free. But it isn't.
By definition:
L1 ∩ L2 = ¬(¬L1 ∪ ¬L2)
Since the context-free languages are closed under union, if they were also closed under complementation, they would necessarily
be closed under intersection. But we just showed that they are not. Thus they are not closed under complementation.
Let L be a language such that L$ is accepted by the deterministic PDA M. We construct a deterministic PDA M' to accept (the
complement of L)$, just as we did for FSMs:
An Example of the Construction
[Diagram: a deterministic PDA (states 1 through 3) accepting L$, and the machine constructed from it for (¬L)$, with an added state 4; the $-transitions are redirected so that exactly the strings the original machine rejected are accepted.]
Theorem: The class of deterministic context-free languages is a proper subset of the class of context-free languages.
Proof: Consider L = {a^n b^m c^p : m ≠ n or m ≠ p}. L is context free (we have shown a grammar for it).
But L is not deterministic. If it were, then its complement L1 = ¬L would be deterministic context free, and thus certainly context free.
But then
L2 = L1 ∩ a*b*c* (the intersection of a context-free language with a regular language)
would be context free. But
L2 = {a^n b^n c^n : n ≥ 0}, which we know is not context free.
Thus there exists at least one context-free language that is not deterministic context free.
Note that deterministic context-free languages are not closed under union, intersection, or difference.
Decision Procedures for CFLs & PDAs
Such decision procedures usually involve conversions to Chomsky Normal Form or Greibach Normal Form. Why?
Theorem: For any context free grammar G, there exists a number n such that:
1. If L(G) ≠ ∅, then there exists a w ∈ L(G) such that |w| < n.
2. If L(G) is infinite, then there exists w ∈ L(G) such that n ≤ |w| < 2n.
If we could decide these problems, we could decide the halting problem. (More later.)
Convert M to its equivalent PDA and use the corresponding CFG decision procedure. Why avoid using PDA’s directly?
If we could decide these problems, we could decide the halting problem. (More later.)
Comparing Regular and Context-Free Languages
[Diagram: Regular ⊂ Context-Free ⊂ Recursive ⊂ Recursively Enumerable. For FSMs, deterministic and nondeterministic machines are equivalent (D = ND); for PDAs they are not.]
Turing Machines
Read K & S 4.1.
Do Homework 17.
[Diagram: a recursively enumerable language L is generated by an unrestricted grammar and accepted by a Turing machine.]
Turing Machines
Can we come up with a new kind of automaton that has two properties:
• powerful enough to describe all computable things (unlike FSMs and PDAs), and
• simple enough that we can reason formally about it (like FSMs and PDAs, unlike real computers)?
Turing Machines
❑ à ❑ a b b a ❑ ❑ ❑
A Formal Definition
A Turing machine is a quintuple (K, Σ, δ, s, H):
K is a finite set of states;
Σ is an alphabet, containing at least ❑ and à, but not → or ←;
s ∈ K is the initial state;
H ⊆ K is the set of halting states;
δ is a function from:
(K - H) × Σ to K × (Σ ∪ {→, ←})
(non-halting state × input symbol) to (state × action, where an action either writes a symbol or moves the head)
such that
(a) if the input symbol is à, the action is →, and
(b) à can never be written.
1. The input tape is infinite to the right (and full of ❑), but has a wall to the left. Some definitions allow infinite tape in both
directions, but it doesn't matter.
3. δ must be defined for all state, input pairs unless the state is a halt state.
4. Turing machines do not necessarily halt (unlike FSM's). Why? To halt, they must enter a halt state. Otherwise they loop.
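To make the definition concrete, here is a minimal sketch in Python of a single step under this definition (the names step, BLANK, LEFT_END are ours; '_' and '>' stand in for ❑ and à):

    BLANK, LEFT_END = '_', '>'
    LEFT, RIGHT = 'L', 'R'        # stand-ins for the actions <- and ->

    def step(delta, state, tape, pos, halting):
        # delta maps (state, symbol) to (state, action), where the
        # action is either a symbol to write or LEFT / RIGHT
        if state in halting:
            return state, tape, pos        # halting states have no moves
        new_state, action = delta[(state, tape[pos])]
        if action == LEFT:
            pos -= 1                       # clause (a) guarantees we never move off '>'
        elif action == RIGHT:
            pos += 1
            if pos == len(tape):
                tape.append(BLANK)         # the tape is blank-filled to the right
        else:
            tape[pos] = action             # write (clause (b): '>' is never written)
        return new_state, tape, pos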
A Simple Example
❑ à ❑ 0 1 1 0 ❑ ❑ ❑
Σ = {0, 1, à, ❑}
s=
H=
δ=
à a a b b ❑ ❑ ❑ (1)
à ❑ a a b b ❑ ❑ ❑ (2)
The input after the scanned square may be empty, but it may not end with a blank. We assume the entire tape to the right of the
input is filled with blanks.
(q1, w1a1u1) |-M (q2, w2a2u2), a1 and a2 ∈ Σ, iff ∃ b ∈ Σ ∪ {←, →}, δ(q1, a1) = (q2, b) and either:
| w1 | a1 | u1 |
à ❑ a a b b ❑ ❑ ❑ à❑aabb
| w2 | a2 | u2 |
à ❑ a a a b ❑ ❑ ❑ à❑aaab
Yields, Continued
| w1 | a1 | u1 |
à ❑ a a a b ❑ ❑ ❑ à❑aaab
| w2 | a2 | u2 |
à ❑ a a a b ❑ ❑ ❑ à❑aaab
or (b) u2 = ε, if a1 = ❑ and u1 = ε
| w1 | a1 |u1|
à ❑ a a a b ❑ ❑ ❑ à❑aaab❑
| w1 | a1 |u1|
à ❑ a a a b ❑ ❑ ❑ à❑aaab
If we scan left off the first square of the blank region, then drop that square from the configuration.
Yields, Continued
| w1 | a1 | u1 |
à ❑ a a a b ❑ ❑ ❑ à❑aaab
| w2 | a2 | u2 |
à ❑ a a a b ❑ ❑ ❑ à❑aaab
or (b) u1 = u2 = ε and a2 = ❑
| w1 | a1 |u1|
à ❑ a a a b ❑ ❑ ❑ à❑aaab
| w2 | a2 |u2|
à ❑ a a a b ❑ ❑ ❑ à❑aaab❑
If we scan right onto the first square of the blank region, then a new blank appears in the configuration.
For any Turing machine M, let |-M* be the reflexive, transitive closure of |-M.
We say that the computation is of length n, or that it has n steps, and we write
C0 |-M^n Cn
A Context-Free Example
M takes a tape of a's then b's, possibly with more a's, and adds b's as required to make the number of b's equal the number of a's.
à ❑ a a a b ❑ ❑ ❑
K = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
Σ = {a, b, à, ❑, 1, 2}
s = 0, H = {9}, δ =
[State diagram (states 0 through 9): each unmarked a is rewritten as 1 and matched against a b (rewritten as 2); if the b's run out, new 2's are written; finally the 1's and 2's are restored to a's and b's and the machine halts in state 9.]
An Example Computation
à ❑ a a a b ❑ ❑ ❑
It's very common to have state pairs, in which the first writes on the tape and the second moves. Some definitions allow both
actions at once, and those machines will have fewer states.
There are common idioms, like scan left until you find a blank.
>M1 a M2
b
M3
a
M1 M2 becomes M1 a, b M2
b
M M becomes M^2
M1 a, b M2 becomes M1 x = a, b M2
M x?y M2
e.g., > x≠❑ Rx if the current square is not blank, go right and copy it.
>R ¬❑ find the first blank square to the right of the current square
R❑
>L ¬❑ find the first blank square to the left of the current square
L❑
>R ❑ find the first nonblank square to the right of the current square
R¬❑
>L ❑ find the first nonblank square to the left of the current square
L¬❑
Ra,b find the first occurrence of a or b to the right of the current square
La,b a M1 find the first occurrence of a or b to the left of the current square, then go to M1 if the detected
b character is a; go to M2 if the detected character is b
M2
Lx=a,b find the first occurrence of a or b to the left of the current square and set x to the value found
Lx=a,bRx find the first occurrence of a or b to the left of the current square, set x to the value found, move one
square to the right, and write x (a or b)
An Example
Input: à❑w, where w ∈ {1}*
Output: à❑w^3
Example: à❑111❑❑❑❑❑❑❑❑❑❑❑❑❑
>R1,❑ 1 #R❑#R#L❑
❑
L # 1
❑
H
A Shifting Machine S←
Input: ❑❑w❑
Output: ❑w❑
Example: ❑❑abba❑❑❑❑❑❑❑❑❑❑❑❑❑
x=❑
A Recognition Example
L = {a^n b^n c^n : n ≥ 0}
Example: à❑aabbcc❑❑❑❑❑❑❑❑❑
Example: à❑aaccb❑❑❑❑❑❑❑❑❑
[Machine diagram: on each pass, mark one a as a′, one b as b′, and one c as c′; reject (n) if symbols are out of order or counts disagree; accept (y) once every symbol is marked.]
Example: à❑abbcabb❑❑❑
Example: à❑acabb❑❑❑
FSMs: always halt after n steps, where n is the length of the input. At that point, they either accept or reject.
PDAs: don't always halt, but there is an algorithm to convert any PDA into one that does halt.
Turing machines: there is no algorithm to determine whether a given machine always halts.
Computing Functions
f(w) = ww
x=❑
L
Then the machine to compute f is just >C S L❑←
Example: Succ(n) = n + 1
Why Are We Working with Our Hands Tied Behind Our Backs?
Turing machines are more powerful than any of the other formalisms we have studied so far.
Turing machines are a lot harder to work with than all the real computers we have available.
Why bother?
The very simplicity that makes it hard to program Turing machines makes it possible to reason formally about what they can do.
If we can, once, show that anything a real computer can do can be done (albeit clumsily) on a Turing machine, then we have a
way to reason about what real computers can do.
Let L ⊆ Σ0*.
M semidecides L iff
for any string w ∈ Σ0*,
w ∈ L ⇒ M halts on input w
w ∉ L ⇒ M does not halt on input w; M(w) = ↑
[Machine diagrams: >R with a loop on ¬a semidecides {w : w contains an a}; on the all-b tape shown it never halts. A second machine matches each ) against a ( to its left, semideciding balanced-parenthesis strings.]
Theoretical Examples
L = {Turing machines that halt on a blank input tape}
Theorems with valid proofs.
We say that Turing machine M enumerates the language L iff, for some fixed state q of M,
L = {w : (s, à❑) |-M* (q, à❑w)}
Note that q is not a halting state. It merely signals that the current contents of the tape should be viewed as a member of L.
[Diagram: M' enumerates w1, w2, w3, …; M compares each to its input w and halts when one matches.]
ε [1]
ε [2] a [1]
ε [3] a [2] b [1]
ε [4] a [3] b [2] aa [1]
ε [5] a [4] b [3] aa [2] ab [1]
ε [6] a [5] aa [3] ab [2] ba [1]
Proof: (by construction) If L is recursive, then there is a Turing machine M that decides L.
We construct a machine M' to decide ¬L by taking M and swapping the roles of the two halting states y and n:
M halts in y exactly where M' halts in n, and vice versa.
Lemma: There exists at least one language L that is recursively enumerable but not recursive.
Proof that M' doesn't exist: Suppose that the RE languages were closed under complement. Then if L is RE, ¬L would be RE. If
that were true, then L would also be recursive, because we could construct M to decide it:
1. Let T1 be the Turing machine that semidecides L.
2. Let T2 be the Turing machine that semidecides ¬L.
3. Given a string w, fire up both T1 and T2 on w. Since any string in Σ* must be in either L or ¬L, one of the two machines will
eventually halt. If it's T1, accept; if it's T2, reject.
But we know that there is at least one RE language that is not recursive. Contradiction.
Theorem: A language is recursive iff both it and its complement are recursively enumerable.
Proof:
• L recursive implies L and ¬L are RE: Clearly L is RE. And, since the recursive languages are closed under complement,
¬L is recursive and thus also RE.
• L and ¬L are RE implies L recursive: Suppose L is semidecided by M1 and ¬L is semidecided by M2. We construct M to
decide L by using two tapes and simultaneously executing M1 and M2. One (but not both) must eventually halt. If it's M1,
we accept; if it's M2 we reject.
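The parallel simulation can be sketched in Python, under the assumption that semi_L(w, k) and semi_notL(w, k) are step-bounded simulators that report whether the corresponding machine halts on w within k steps:

    from itertools import count

    def decide(w, semi_L, semi_notL):
        # w is in L or in its complement, so one simulator eventually halts
        for k in count(1):                 # try step bounds k = 1, 2, 3, ...
            if semi_L(w, k):
                return True                # M1 halted: w in L
            if semi_notL(w, k):
                return False               # M2 halted: w not in L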
Lexicographic Enumeration
We say that M lexicographically enumerates L if M enumerates the elements of L in lexicographic order. A language L is
lexicographically Turing-enumerable iff there is a Turing machine that lexicographically enumerates it.
Example: L = {a^n b^n c^n}
Lexicographic enumeration: ε, abc, aabbcc, aaabbbccc, …
Proof
Proof that recursive implies lexicographically Turing enumerable: Let M be a Turing machine that decides L. Then M'
lexicographically generates the strings in Σ* and tests each using M. It outputs those that are accepted by M. Thus M'
lexicographically enumerates L.
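A sketch of this direction in Python (decides stands for the assumed decider M):

    from itertools import count, product

    def lex_strings(sigma):
        # all of Sigma*, shortest strings first, in order within a length
        for n in count(0):
            for tup in product(sigma, repeat=n):
                yield ''.join(tup)

    def enumerate_L(decides, sigma):
        # M': generate, filter with M, and output in lexicographic order
        for w in lex_strings(sigma):
            if decides(w):
                yield w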
Proof, Continued
Proof that lexicographically Turing enumerable implies recursive: Let M be a Turing machine that lexicographically enumerates
L. Then, on input w, M' starts up M and waits until either M generates w (so M' accepts), M generates a string that comes after w
(so M' rejects), or M halts (so M' rejects). Thus M' decides L.
[Diagram: on input w, M' runs M's enumeration L1, L2, L3, …, comparing each Li to w: Li = w ⇒ yes; Li > w ⇒ no; M halts with no more Li ⇒ no.]
                     Languages                   Functions
TM always halts      recursive                   recursive
TM halts if yes      recursively enumerable      ?
Suppose we have a function that is not defined for all elements of its domain.
One solution: redefine the domain to be exactly those elements for which f is defined.
But what if we don't know? What if the domain is not a recursive set (but it is recursively enumerable)? Then we want to define
the domain as some larger, recursive set and say that the function is partially recursive. There exists a Turing machine that halts
if given an element of the domain but does not halt otherwise.
[Summary table: the equivalent characterizations of this class — semidecidable, recursively enumerable, Turing-enumerable, generated by an unrestricted grammar.]
δ is a function from: K × Γ to K × (Γ - {❑}) × {←, →}
(state, tape symbol) to (state, symbol to write, move L or R)
❑ ❑ a b b a ❑ ❑ ❑
Both definitions are simple enough to work with, although details may make specific arguments easier or harder.
Answer: No.
In fact, there are lots of extensions we can make to our basic Turing machine model. They may make it easier to write Turing
machine programs, but none of them increase the power of the Turing machine because:
We can show that every extended machine has an equivalent basic machine.
We can also place a bound on any change in the complexity of a solution when we go from an extended machine to a basic
machine.
❑ ❑ a b b a ❑ ❑ ❑
❑ b a b b a ❑ ❑ ❑
❑ ❑ 1 2 2 1 ❑ ❑ ❑
❑ ❑ a b b a ❑ ❑ ❑
❑ ❑ ❑ ❑ ❑ ❑ ❑ ❑ ❑
❑ ❑ a b b a ❑ ❑ ❑
❑ ❑ a b b a ❑ ❑ ❑
❑ ❑ a b b a ❑ ❑ ❑
❑ ❑ a b b a ❑ ❑ ❑
❑ 1 0 1 ; 1 1 0 ❑
❑ ❑ ❑ ❑ ❑ ❑ ❑ ❑ ❑
❑ 0 0 0 0 1 1 0 ❑
❑ 1 0 1 ❑ ❑ ❑ ❑ ❑
à ❑ a b a ❑ ❑
à 0 0 1 0 0 0 0 ❑ ❑
à a b b a b a
0 1 0 0 0 0 0
à a b c d ❑ ❑
Proposed definition:
❑ ❑ g f e a b c d ❑
Simulation:
Track 1 à a b c d ❑ ❑
Track 2 à e f g ❑ ❑ ❑
Simulating a PDA
The components of a PDA:
• Finite state controller
• Input tape
• Stack
The simulation:
• Finite state controller:
• Input tape:
• Stack:
Track 1 à a a a b b ❑
(Input)
Track 2 à ❑ a a ❑ ❑ ❑
Corresponding to
a
a
à a b a a # a a b a
Ý
a #
a a
b a
a b
à a
à❑abab
à❑abab à❑abab
à❑abab à❑bbab
An Example
L = {w ∈ {a, b, c, d}* : there are two of at least one letter}
[Nondeterministic machine diagram: guess one of a, b, c, d and scan right for a second occurrence of the guessed letter.]
There is a natural number N, depending on M and w, such that there is no configuration C satisfying
(s, à❑w) |-M^N C.
An Example of Nondeterministic Deciding
1. Nondeterministically choose two binary numbers 1 < p, q, where |p| and |q| ≤ |w|, and write them on the tape, after w,
separated by ;.
à❑110011;111;1111❑❑
2. Multiply p and q and put the answer, A, on the tape, in place of p and q.
à❑110011;1011111❑❑
Theorem: If a nondeterministic Turing machine M semidecides or decides a language, or computes a function, then there is a
standard Turing machine M' semideciding or deciding the same language or computing the same function.
Note that while nondeterminism doesn’t change the computational power of a Turing Machine, it can exponentially increase its
speed!
Recall the way we did this for FSMs: simulate being in a combination of states.
At any point in the operation of a nondeterministic machine M, the maximum number of branches is
r = |K| ⋅ (|Σ| + 2)
    (number of states) ⋅ (number of possible actions)
So imagine a table:
[Table: one row per (state, symbol) pair (q1,σ1), (q1,σ2), …, (q|K|,σn); the columns 1 … r list the possible next (state, action) moves.]
Note that if, in some configuration, there are not r different legal things to do, then some of the entries on that row will repeat.
Tape 1: Input
Tape 2: 1 3 2 6 5 4 3 6
Md either:
• discovers that M would accept, or
• comes to the end of Tape 2.
Tape 1: Input
Steps of M':
write ε on Tape 3
until Md accepts do
(1) copy Input from Tape 1 to Tape 2
(2) run Md
(3) if Md accepts, exit
(4) otherwise, generate lexicographically next string on Tape 3.
Pass 1 2 3 7 8 9
Tape3 ε 1 2 ⋅⋅⋅ 6 11 12 ⋅⋅⋅ 2635
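The whole search can be sketched in Python (run_with_choices is a placeholder for Md: it runs M deterministically, consuming one choice per step, and reports whether that run accepts before the choices run out):

    from itertools import count, product

    def nd_accepts(run_with_choices, w, r):
        # enumerate the possible contents of Tape 3: all finite
        # choice sequences over {1, ..., r}, shortest first
        for n in count(0):
            for choices in product(range(1, r + 1), repeat=n):
                if run_with_choices(w, choices):
                    return True    # Md discovered an accepting run of M
        # like M itself, this loops forever if no run accepts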
Can we make Turing machines even more limited and still get all the power?
Example:
• One character?
• Two characters?
• Three characters?
Problem View: It is unsolvable whether a Turing Machine halts on a given input. This is called the Halting Problem.
Question: Does it make sense to talk about a programmable Turing machine that accepts as input
program input string
executes the program, and outputs
output string
Notice that the Universal Turing machine semidecides H = {⟨M, w⟩ : TM M halts on input string w} = L(U).
Encoding a Turing Machine M
We encode input strings to a machine M using the same character encoding we use for M.
For example, suppose that we are using the following encoding for symbols in M:
symbol representation
❑ a000
à a001
← a010
→ a011
a a100
An Encoding Example
Consider M = ({s, q, h}, {❑, à,a}, δ, s, {h}), where δ =
Example of a transforming TM T:
Input: a machine M1 that reads its input tape and performs some operation P on it.
Output: a machine M2 that performs P on an empty input tape:
>R x≠❑ ❑
Là R M1
Initialization of U:
1. Copy "M" onto tape 2
2. Insert "à❑" at the left edge of tape 1, then shift w over.
3. Look at "M", figure out what i is, and write the encoding of state s on tape 3.
The Operation of U
a 0 0 1 a 0 0
1 0 0 0 0 0 0
à "M ---------------------------- M" ❑ ❑ ❑ ❑ ❑
1 0 0 0 0 0 0
q 0 0 0 ❑ ❑ ❑
1 ❑ ❑ ❑ ❑ ❑ ❑
An Example
Tape 1: a001a000a100a100a000a100
à ❑ a a ❑ a
Tape 3: q01
Tape 1: a001a000a100a100a000a100
à ❑ a a ❑ a
Tape 3: q00
Grammars and Turing Machines
Do Homework 20.
[Diagram: a recursively enumerable language L is generated by an unrestricted grammar and accepted by a Turing machine.]
Unrestricted Grammars
• V is an alphabet,
• Σ (the set of terminals) is a subset of V,
• R (the set of rules) is a finite subset of (V* (V-Σ) V*) × V*,
  i.e., each rule has the form (context N context) → result,
• S (the start symbol) is an element of V - Σ.
We define derivations just as we did for context-free grammars.
The language generated by G is
{w ∈ Σ* : S ⇒G* w}
There is no notion of a derivation tree or rightmost/leftmost derivation for unrestricted grammars.
Unrestricted Grammars
Example: L = {a^n b^n c^n : n > 0}
S → aBSc
S → aBc
Ba → aB
Bc → bc
Bb → bb
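For example, a derivation of aabbcc in this grammar:
S ⇒ aBSc ⇒ aBaBcc ⇒ aaBBcc ⇒ aaBbcc ⇒ aabbcc
(using S → aBSc, then S → aBc, then Ba → aB, Bc → bc, and Bb → bb).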
Another Example
Unrestricted grammars have a procedural feel that is absent from restricted grammars.
Derivations often proceed in phases. We make sure that the phases work properly by using nonterminals as flags that we're in a
particular phase.
Theorem: A language is generated by an unrestricted grammar if and only if it is recursively enumerable (i.e., it is semidecided
by some Turing machine M).
Proof:
Only if (grammar → TM): by construction of a nondeterministic Turing machine.
à ❑ a b a ❑ ❑
à 0 1 0 0 0 0 0 ❑ ❑
à a S T a b ❑
0 1 0 0 0 0 0
At each step, M nondeterministically chooses a rule to try to apply and a position on tape 2 to start looking for the left hand side
of the rule. Or it chooses to check whether tape 2 equals tape 1. If any such machine succeeds, we accept. Otherwise, we keep
looking.
Suppose that M semidecides a language L (it halts when fed strings in L and loops otherwise). Then we can build M' that halts in
the configuration (h, à❑).
M'
goes from
à ❑ a b b a ❑ ❑ ❑
à ❑ ❑ ❑ ❑ ❑ ❑ ❑ ❑
The Rules of G
S → >❑h< (the halting configuration)
If δ(q, a) = (p, b) : bp → aq
L = a*
(Left: the computation of M on aa. Right: the derivation in G, which runs the computation backward. The numbers name the rules used.)
>❑saa<  1        S ⇒ >❑h<     1
>❑aqa<  2          ⇒ >❑t<     14
>❑aaq<  2          ⇒ >❑❑p<    17
>❑aa❑q< 3          ⇒ >❑at<    13
>❑aat<  4          ⇒ >❑a❑p<   17
>❑a❑p<  6          ⇒ >❑aat<   13
>❑at<   4          ⇒ >❑aa❑q<  12
>❑❑p<   6          ⇒ >❑aaq<   9
>❑t<    5          ⇒ >❑aqa<   8
>❑h<               ⇒ >❑saa<   5
                   ⇒ aa<      2
                   ⇒ aa       3
An alternative is to build a grammar G that simulates the forward operation of a Turing machine M. It uses alternating symbols
to represent two interleaved tapes. One tape remembers the starting string, the other “working” tape simulates the run of the
machine.
The second (test) part of G simulates the execution of M on a particular string w. An example of a partially derived string:
à à ❑ ❑ a 1 b 2 c c b 4 Q3 a 3
Examples of rules:
b b Q 4 → b 4 Q 4 (rewrite b as 4)
b 4 Q 3 → Q 3 b 4 (move left)
Example rule:
#ha1→a#h (sweep # h to the right erasing the working “tape”)
Example:
Input: S111S
Output:
Example: plus(n, 0) = n
plus(n, m+1) = succ(plus(n, m))
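This definition transcribes directly into code; a sketch in Python:

    def succ(n):
        return n + 1

    def plus(n, m):
        # plus(n, 0) = n; plus(n, m+1) = succ(plus(n, m))
        return n if m == 0 else succ(plus(n, m - 1))

    assert plus(3, 4) == 7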
Proof:
Lexicographically enumerate the unary primitive recursive functions, f0, f1, f2, f3, ….
Define g(n) = fn(n) + 1.
g is clearly computable, but it is not on the list. Suppose it were fm for some m. Then
fm(m) = fm(m) + 1, which is absurd.
[Table: the values f_i(j) of the enumerated functions f0, f1, f2, f3, f4 (e.g. f3(3) = 27); the diagonal entries f_n(n) are the ones g alters.]
For example, Ackermann's function, which is computable but not primitive recursive, grows like this (rows i, columns j):

i\j    0     1        2              3
0      1     2        3              4
1      2     3        4              5
2      3     5        7              9
3      5     13       29             61
4      13    65533    2^65536 − 3    2^(2^65536) − 3
A function is µ-recursive if it can be obtained from the basic functions using the operations of:
• Composition,
• Recursive definition, and
• Minimalization of minimalizable functions:
A function g is minimalizable iff for every n1,n2,…nk, there is an m such that g(n1,n2,…nk,m)=1.
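A sketch of the minimalization operator in Python; the unbounded search is exactly why unrestricted minimalization yields partial functions:

    def mu(g, *args):
        # least m with g(args..., m) = 1; loops forever if g is
        # not minimalizable at these arguments
        m = 0
        while g(*args, m) != 1:
            m += 1
        return m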
Theorem: There are uncountably many partial functions, but only countably many Turing machines, and hence only countably many of those partial functions are partial recursive.
[Diagrams: Primitive Recursive Functions ⊂ Recursive Functions ⊂ Partial Recursive Functions (Turing machines); and Regular (FSMs) ⊂ Deterministic Context-Free (DPDAs) ⊂ Context-Free (NDPDAs) ⊂ Recursive ⊂ Recursively Enumerable (Turing machines).]
Examples:
L = {a^n b^n c^n : n > 0}
L = {w ∈ {a, b, c}+ : number of a's, b's and c's is the same}
The basic idea: To decide if a string w is in L, start generating strings systematically, shortest first. If you generate w, accept. If
you get to strings that are longer than w, reject.
A linear bounded automaton is a nondeterministic Turing machine the length of whose tape is bounded by some fixed constant k
times the length of the input.
Example: L = {anbncn : n ≥ 0}
à❑aabbcc❑❑❑❑❑❑❑❑❑
[Machine diagram: the same marking machine shown earlier (a → a′, b → b′, c → c′), which never leaves the portion of the tape holding the input.]
Theorem: The set of context-sensitive languages is exactly the set of languages that can be accepted by linear bounded automata.
Proof: (sketch) We can construct a linear-bounded automaton B for any context-sensitive language L defined by some grammar
G. We build a machine B with a two track tape. On input w, B keeps w on the first tape. On the second tape, it
nondeterministically constructs all derivations of G. The key is that as soon as any derivation becomes longer than |w| we stop,
since we know it can never get any shorter and thus match w. There is also a proof that from any LBA we can construct a
context-sensitive grammar, analogous to the one we used for Turing machines and unrestricted grammars.
Theorem: There exist recursive languages that are not context sensitive.
[Diagram: Regular (FSMs) ⊂ Deterministic Context-Free (DPDAs) ⊂ Context-Free (NDPDAs) ⊂ Context-Sensitive ⊂ Recursive ⊂ Recursively Enumerable (Turing machines).]
[Diagram: the Chomsky hierarchy. Type 0 (recursively enumerable; Turing machines), Type 1 (context-sensitive), Type 2 (context-free; PDAs), Type 3 (regular; FSMs).]
The Thesis: Anything that can be computed by any algorithm can be computed by a Turing machine.
Another way to state it: All "reasonable" formal models of computation are equivalent to the Turing machine.
This isn't a formal statement, so we can't prove it. But many different computational models have been proposed and they all turn
out to be equivalent.
Examples:
§ unrestricted grammars
§ lambda calculus
§ cellular automata
§ DNA computing
§ quantum computing (?)
If HALTS says that TROUBLE halts on itself, then TROUBLE loops. If HALTS says that TROUBLE loops, then TROUBLE
halts. Either way, we reach a contradiction, so HALTS(M, x) cannot be made into a decision procedure.
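The argument fits in a few lines of pseudocode (Python syntax; halts is the hypothetical decision procedure, so none of this can actually be built):

    def trouble(m):
        # m is (an encoding of) a program
        if halts(m, m):           # hypothetical: does m halt on input m?
            while True:           # ... then TROUBLE loops
                pass
        else:
            return                # ... otherwise TROUBLE halts

    # Now ask halts(trouble, trouble): either answer contradicts
    # trouble's actual behavior, so halts cannot exist.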
Why?
But H cannot be recursive. If it were, then it would be decided by some TM MH. But MH("M" "w") would have to be:
If M is not a syntactically valid TM, then False.
else HALTS("M" "w")
If H were Recursive
Theorem: If H were also recursive, then every recursively enumerable language would be recursive.
Proof: Let L be any RE language. Since L is RE, there exists a TM M that semidecides it.
Undecidable Problems, Languages that Are Not Recursive, and Partial Functions
"M""w" pairs
Consider two lists of strings over some alphabet Σ. The lists must be finite and of equal length.
Question: Does there exist some finite sequence of integers that can be viewed as indexes of A and B such that, when elements of
A are selected as specified and concatenated together, we get the same string we get when elements of B are selected also as
specified?
For example, if we assert that 1, 3, 4 is such a sequence, we’re asserting that x1x3x4 = y1y3y4
i    A        B
1    1        111
2    10111    10
3    10       0

i    A        B
1    10       101
2    011      11
3    101      011
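For the first instance, the index sequence 2, 1, 1, 3 is a solution:
x2 x1 x1 x3 = 10111 · 1 · 1 · 10 = 101111110 = 10 · 111 · 111 · 0 = y2 y1 y1 y3.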
A pragmatically non RE language: L1={ (i, j) : i, j are integers where the low order five digits of i are a street address number
and j is the number of houses with that number on which it rained on November 13, 1946 }
An analytically non RE language: L2={x : x = "M" of a Turing machine M and M("M") does not halt}
Why isn't L2 RE? Suppose it were. Then there would be a TM M* that semidecides L2. Is "M*" in L2?
• If it is, then M*("M*") halts (by the definition of M* as a semideciding machine for L2)
• But, by the definition of L2, if "M*" ∈ L2, then M*("M*") does not halt.
Contradiction. So L2 is not RE.
Why not?
τ = Succ: ⟨a, b⟩ becomes ⟨Succ(a), b⟩
L1 = {⟨a, b⟩ : a, b ∈ N and b = Succ(a)}
L2 = {⟨a, b⟩ : a, b ∈ N and a = b}
If there is a Turing machine M2 to decide L2, then I can build a Turing machine M1 to decide L1:
1. Take the input and apply Succ to the first number.
2. Invoke M2 on the result.
3. Return whatever answer M2 returns.
[Diagram: M1 on input x computes y = τ(x), runs M2 on y, and answers "x ∈ L1?" with M2's answer to "y ∈ L2?".]
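The construction is trivial to express in code; a sketch in Python (m2 stands for the assumed decider for L2):

    def m2(a, b):           # assumed decider for L2 = {(a, b) : a = b}
        return a == b

    def m1(a, b):           # decides L1 via tau = Succ on the first component
        return m2(a + 1, b)

    assert m1(3, 4) and not m1(3, 5)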
Theorem: If there is a reduction from L1 to L2 and L1 is not recursive, then L2 is not recursive.
[Diagram: the same construction with semideciders; M1 halts on x iff M2 halts on τ(x).]
Theorem: If there is a reduction from L1 to L2 and L1 is not RE, then L2 is not RE.
This is equivalent to, "Is the language L2 = {"M" : Turing machine M halts on the empty tape} recursive?"
Reduction via τ:
Let τ be the function that, from "M" and "w", constructs "M*", which operates as follows on an empty input tape:
1. Write w on the tape.
2. Operate as M would have.
Prove that L2 = {⟨M⟩ : Turing machine M halts on the empty tape} is not recursive.
Proof that L2 is not recursive via a reduction from H = {⟨M, w⟩ : Turing machine M halts on input string w}, a non-recursive
language. Suppose that there exists a TM M2 that decides L2. Construct a machine to decide H as M1(⟨M, w⟩) = M2(τ(⟨M, w⟩)).
The τ function creates from ⟨M⟩ and ⟨w⟩ a new machine M*. M* ignores its input and runs M on w, halting exactly when M halts
on w.
• ⟨M, w⟩ ∈ H ⇒ M halts on w ⇒ M* always halts ⇒ ε ∈ L(M*) ⇒ ⟨M*⟩ ∈ L2 ⇒ M2 accepts ⇒ M1 accepts.
• ⟨M, w⟩ ∉ H ⇒ M does not halt on w ⇒ ε ∉ L(M*) ⇒ ⟨M*⟩ ∉ L2 ⇒ M2 rejects ⇒ M1 rejects.
Thus, if there is a machine M2 that decides L2, we could use it to build a machine that decides H. Contradiction. ∴ L2 is not
recursive.
• A clear declaration of the reduction “from” and “to” languages and what you’re trying to prove with the reduction.
• A description of how a machine is being constructed for the “from” language based on an assumed machine for the “to”
language and a recursive τ function.
• A description of the τ function’s inputs and outputs. If τ is doing anything nontrivial, it is a good idea to argue that it is
recursive.
• Note that machine diagrams are not necessary or even sufficient in these proofs. Use them as thought devices, where
needed.
• Run through the logic that demonstrates how the “from” language is being decided by your reduction. You must do both
accepting and rejecting cases.
• Declare that the reduction proves that your “to” language is not recursive.
Doing it wrong, by reducing L2 (the unknown one) to L1: If there exists a machine M1 that solves H, then we could build a
machine that solves L2 as follows:
1. Return (M1("M", "")).
Suppose that we have proved that the following problem L1 is unsolvable: Determine the number of days that have elapsed since
the beginning of the universe.
Now consider the following problem L2: Determine the number of days that had elapsed between the beginning of the universe
and the assassination of Abraham Lincoln.
Considering L2:
Reduce L1 to L2: L1 = L2 + (now - 4/9/1865)
Reduce L2 to L1: L2 = L1 - (now - 4/9/1865)
Considering L3:
Reduce L1 to L3: L1 = oops
Reduce L3 to L1: L3 = L1 - 365 - (now - 4/9/1866)
Reduction via τ:
Let τ be the function that, from "M" and "w", constructs "M*", which operates as follows:
1. M* examines its input tape.
2. If it is equal to w, then it simulates M.
3. If not, it loops.
Clearly the only input on which M* has a chance of halting is w, which it does iff M would halt on w.
Reduction via τ:
Let τ be the function that, from "M", constructs "M*", which operates as follows:
1. Erase the input tape.
2. Simulate M.
Clearly M* either halts on all inputs or on none, since it ignores its input.
Rice's Theorem
Alternate statement: Let P: 2Σ*→{true, false} be a nontrivial property of the recursively enumerable languages. The language
{“M”: P(L(M)) = True} is not recursive.
By "nontrivial" we mean a property that is not simply true for all languages or false for all languages.
Examples:
• L contains only even length strings.
• L contains an odd number of strings.
• L contains all strings that start with "a".
• L is infinite.
• L is regular.
Note:
Rice's theorem applies to languages, not machines. So, for example, the following properties of machines are decidable:
• M contains an even number of states
• M has an odd number of symbols in its tape alphabet
Of course, we need a way to define a language. We'll use machines to do that, but the properties we'll deal with are properties of
L(M), not of M itself.
Reduction via τ:
Either P(∅) = true or P(∅) = false. Assume it is false (a matching proof exists if it is true). Since P is nontrivial, there is some
language LP such that P(LP) is true. Let MP be some Turing machine that semidecides LP.
• "M" "w" ∉ H Þ M doesn’t halt on w Þ M* will halt on nothing Þ L(M*) = ∅ Þ P(L(M*)) = P(∅) = false Þ M2 decides
P, so M2 rejects "M*" Þ M1 rejects.
Example 1:
L = {s = "M" : M writes a 1 within three moves}.
Example 2:
L = {s = "M1" "M2": L(M1) = L(M2)}.
No, by Rice’s Theorem, since being regular (or context free or recursive) is a nontrivial property of the recursively enumerable
languages.
We can also show this directly (via the same technique we used to prove the more general claim contained in Rice’s Theorem):
Reduction via τ:
(?M2) L2 = {s = "M" : L(M) is regular}
Let τ be the function that, from "M" and "w", constructs "M*", whose own input is a string
t = "M*" "w*"
M*("M*" "w*") operates as follows:
1. Copy its input to another track for later.
2. Write w on its input tape and execute M on w.
3. If M halts, invoke U on "M*" "w*".
4. If U halts, halt and accept.
If M2 exists, then ¬M2(τ(s)) decides L1 (H).
Why?
If M does not halt on w, then M* accepts ∅ (which is regular).
If M does halt on w, then M* accepts H (which is not regular).
Reduction via τ:
Let τ be the construction that builds a grammar G for the language L that is semidecided by M. Thus
w ∈ L(G) iff M(w) halts.
Reduction via τ:
Let τ append the description of a context free grammar GΣ* that generates Σ*.
Non-RE Languages
There are an uncountable number of non-RE languages, but only a countably infinite number of TM’s (hence RE languages).
∴The class of non-RE languages is much bigger than that of RE languages!
Intuition: Non-RE languages usually involve either infinite search or knowing a TM will infinite loop to accept a string.
Diagonalization
[Diagram: M1 on input x computes y = τ(x) and runs M2 on it; M1 halts iff M2 halts on y.]
Theorem: If there is a reduction from L1 to L2 and L1 is not RE, then L2 is not RE.
Reduction via τ:
(?M2) L2 = {⟨M⟩ : there does not exist a string on which Turing machine M halts}
Let τ be the function that, from ⟨M⟩ and ⟨w⟩, constructs ⟨M*⟩, which operates as follows:
1. Erase the input tape (M* ignores its input).
2. Write w on the tape.
3. Run M on w.
[Diagram: M1(⟨M, w⟩) runs M2 on τ(⟨M, w⟩) = ⟨M*⟩ and halts iff M2 halts.]
⟨M, w⟩ ∈ L1 (M does not halt on w) ⇒ M* halts on no input ⇒ ⟨M*⟩ ∈ L2 ⇒ M2 accepts (halts).
⟨M, w⟩ ∉ L1 (M halts on w) ⇒ M* halts on every input ⇒ ⟨M*⟩ ∉ L2 ⇒ M2 loops.
If M2 exists, then M1(⟨M, w⟩) = M2(τ(⟨M, w⟩)) and M1 semidecides L1 = {⟨M, w⟩ : M does not halt on w}, which is not RE.
Contradiction. ∴ L2 is not RE.
[Summary table: the equivalent characterizations of this class — semidecidable, recursively enumerable, Turing-enumerable, generated by an unrestricted grammar.]
Most computational problems you will face in your life are solvable (decidable). We have yet to address whether a problem is
"easy" or "hard". Complexity theory tries to answer this question.
Big-O Notation
A function f(n) is O(g(n)) whenever there exists a constant c, such that |f(n)| ≤ c⋅|g(n)| for all n ≥ 0.
(We are usually most interested in the “smallest” and “simplest” function, g.)
Examples:
2n^3 + 3n^2⋅log(n) + 75n^2 + 7n + 2000 is O(n^3)
75⋅2^n + 200n^5 + 10000 is O(2^n)
If a function f(n) is not polynomial, it is considered to be exponential, whether or not it is O of some true exponential function
(e.g., n^(log n)).
In the above two examples, the first is polynomial and the second is exponential.
Speed of various time complexities for different values of n, taken to be a measure of problem size. (Assumes 1 step per
microsecond.)
f(n)\n   10           20           30           40           50            60
n        .00001 sec.  .00002 sec.  .00003 sec.  .00004 sec.  .00005 sec.   .00006 sec.
n^2      .0001 sec.   .0004 sec.   .0009 sec.   .0016 sec.   .0025 sec.    .0036 sec.
n^3      .001 sec.    .008 sec.    .027 sec.    .064 sec.    .125 sec.     .216 sec.
n^5      .1 sec.      3.2 sec.     24.3 sec.    1.7 min.     5.2 min.      13.0 min.
2^n      .001 sec.    1.0 sec.     17.9 min.    12.7 days    35.7 yr.      366 cent.
3^n      .059 sec.    58 min.      6.5 yr.      3855 cent.   2×10^8 cent.  1.3×10^13 cent.
Faster computers don’t really help. Even taking into account Moore’s Law, algorithms with exponential time complexity are
considered intractable. ∴Polynomial time complexities are strongly desired.
The class of polynomials is closed under addition, multiplication, and composition. This means that we can sequence and compose
polynomial-time algorithms, with the resulting algorithms remaining polynomial-time.
Computational Model
For formally describing the time (and space) complexities of algorithms, we will use our old friend, the deciding TM (decision
procedure).
We will classify the time complexity of a problem by a big-O bound on the number of steps an algorithm (a TM) takes to solve it, as a function of the input length n.
We are most interested in polynomial time complexity algorithms for various types of problems.
Encoding a Problem
Traveling Salesman Problem: Given a set of cities and the distances between them, what is the minimum distance tour a
salesman can make that covers all cities and returns him to his starting city?
Stated as a decision question over graphs: Given a graph G = (V, E), a positive distance function for each edge d: E→N+, and a
bound B, is there a circuit that covers all V where Σd(e) ≤ B? (Here a minimization problem was turned into a bound problem.)
Note that the sizes of most “reasonable” problem encodings are polynomially related.
Most TM extensions can be simulated by a standard TM in a time polynomially related to the time of the extended machine.
Recall that a nondeterministic TM can use a “guess and test” approach, which is computationally efficient at the expense of
many parallel instances.
Roughly speaking, P is the class of problems that can be solved by deterministic algorithms in a time that is polynomially related
to the size of the respective problem instance.
The way the problem is encoded or the computational abilities of the machine carrying out the algorithm are not very important.
The Class NP
Roughly speaking, NP is the class of problems that can be solved by nondeterministic algorithms in a time that is polynomially
related to the size of the respective problem instance.
Examples:
§ Traveling salesman problem: Given a graph G = (V, E), a positive distance function for each edge d: E→N+, and a
bound B, is there a circuit that covers all V where Σd(e) ≤ B?
§ Subgraph isomorphism problem: Given two graphs G1 and G2, does G1 contain a subgraph isomorphic to G2?
[Diagram: NP inside the recursive languages.]
Clearly P ⊆ NP.
§ To date, nearly all decidable problems with polynomial bounds on the size of the solution are in this class.
§ Nondeterminism doesn’t influence decidability, so maybe it shouldn’t have a big impact on complexity.
§ Showing that P = NP would dramatically change the computational power of our algorithms.
§ Cook (1971) showed that the Boolean Satisfiability (SAT) problem has the property that every other NP problem can be
polynomially reduced to it. Thus, SAT can be considered the hardest problem in NP.
§ He also suggested that other NP problems may be among the "hardest problems in NP".
This “hardest problems in NP” class is called the class of “NP-complete” problems.
Further, if any of these NP-complete problems can be solved in deterministic polynomial time, they all can and, by implication,
P = NP.
A language L1 is polynomial time reducible to L2 if there is a polynomial-time recursive function τ such that, for all x, x ∈ L1 iff
τ(x) ∈ L2.
If L1 is polynomial time reducible to L2, we say L1 reduces to L2 (“polynomial time” is assumed) and we write it as L1 ∝ L2.
Lemma: If L1 ∝ L2, then (L2 ∈ P) Þ (L1 ∈ P). And conversely, (L1 ∉ P) Þ (L2 ∉ P).
Mutual polynomial-time reducibility is an equivalence relation; its equivalence classes are related by the partial order ∝.
P is the "least" element in this partial order.
[Diagram: M1 on input w computes τ(w), runs M2 on it, and returns M2's y/n answer.]
Given a set of Boolean variables U = {u1, u2, …, um} and a Boolean expression in conjunctive normal form (conjunctions of
clauses—disjunctions of variables or their negatives), is there a truth assignment to U that makes the Boolean expression true
(satisfies the expression)?
SAT is NP-complete because SAT ∈ NP and for all other languages L’ ∈ NP, L’ ∝ SAT.
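The "SAT ∈ NP" half comes down to the polynomial-time test from guess-and-test: checking one truth assignment against a CNF formula. A sketch in Python (the clause encoding is ours):

    def satisfies(assignment, cnf):
        # assignment: dict u -> bool; cnf: list of clauses, each clause a
        # list of literals (variable, positive?); checking is linear time
        return all(any(assignment[u] == positive for (u, positive) in clause)
                   for clause in cnf)

    # (u1 or not u2) and (u2 or u3)
    cnf = [[('u1', True), ('u2', False)], [('u2', True), ('u3', True)]]
    assert satisfies({'u1': True, 'u2': False, 'u3': True}, cnf)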
Reduction Roadmap
[Diagram: SAT → 3SAT; 3SAT → 3DM and VC; 3DM → PARTITION; VC → HC and CLIQUE.]
The early NP-complete reductions took this structure. Each name represents a problem. An arrow represents a reduction from
one problem to another.
Consider three disjoint sets X, Y, and Z, with |X| = |Y| = |Z| = q, and a set M ⊆ X × Y × Z. Does there exist a matching, a subset
M' ⊆ M such that |M'| = q and M' partitions X, Y, and Z?
This is a generalization of the marriage problem, which has two sets men & women and a relation describing acceptable
marriages. Is there a pairing that marries everyone acceptably?
The marriage problem is in P, but this “3-sex version” of the problem is NP-complete.
PARTITION
Given a set A and a positive integer size s(a) ∈ N+ for each element a ∈ A. Is there a subset A' ⊆ A such that
Σ_{a∈A'} s(a) = Σ_{a∈A−A'} s(a) ?
VC (Vertex Cover)
Given a graph G = (V, E) and an integer K, such that 0 < K ≤ |V|, is there a vertex cover of size K or less for G, that is, a subset
V’ ⊆ V such that |V’| ≤ K and for each edge, (u, v) ∈ E, at least one of u and v belongs to V’?
CLIQUE
Given a graph G = (V, E) and an integer J, such that 0 < J ≤ |V|, does G contain a clique of size J or more, that is, a subset V' ⊆ V
with |V'| ≥ J such that every two vertices in V' are joined by an edge in E?
HC (Hamiltonian Circuit)
Given a graph G = (V, E), does there exist a Hamiltonian circuit, that is, an ordering <v1, v2, …, v|V|> of all V such that
(v|V|, v1) ∈ E and (vi, vi+1) ∈ E for all i, 1 ≤ i < |V|?
Given a graph G = (V, E), a positive distance function for each edge d: E→N+, and a bound B, is there a circuit that covers all V
where Σd(e) ≤ B?
TSP ∈ NP: Guess a set of roads. Verify that the roads form a tour that hits all cities. Answer “yes” if the guess is a tour and the
sum of the distances is ≤ B.
Reduction from HC: Answer the Hamiltonian circuit question on G = (V, E) by constructing a complete graph where “roads”
have distance 1 if the edge is in E and 2 otherwise. Pose the TSP problem, is there a tour of length ≤ |V|?
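This reduction is a few lines of Python (a sketch; the graph is given as a vertex list and an edge set):

    def hc_to_tsp(V, E):
        # complete graph: distance 1 for edges of G, 2 for non-edges;
        # G has a Hamiltonian circuit iff there is a tour of length <= |V|
        E = {frozenset(e) for e in E}
        d = {frozenset((u, v)): 1 if frozenset((u, v)) in E else 2
             for u in V for v in V if u != v}
        return d, len(V)    # the distance function and the bound B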
The more NP-complete problems are known, the easier it is to find an NP-complete problem to reduce from.
More Theory
NP has a rich structure that includes more than just P and NP-complete. This structure is studied in later courses on the theory of
computation.
The recursive problems that are at least as hard as the NP-complete problems (the NP-complete problems themselves, plus harder
problems outside of NP) are called NP-hard. There is a proof technique to show that such problems are at least as hard as
NP-complete problems.
Space complexity addresses how much tape does a TM use in deciding a language. There is a rich set of theories surrounding
space complexity.
[Diagram: within the recursive problems, NP contains the NP-complete problems; the NP-hard problems include the NP-complete problems and extend beyond NP.]
You will likely run into NP-complete problems in your career. For example, most optimization problems are NP-complete.
The field of linear optimization springs out of the latter approach (settling for approximate, rather than exact, solutions). Some
linear optimization solutions can be proven to be "near" optimal.
A branch of complexity theory deals with solving problems within some error bound or probability.
For more: Read Computers and Intractability: A Guide to the Theory of NP-Completeness by Michael R. Garey and David S.
Johnson, 1979.