Toc PDF
Reference Books:
1. An Introduction to Formal Languages and Automata, 5th Edition, by Peter Linz
2. Introduction to the Theory of Computation, 2nd Edition, by M. Sipser
This PDF contains notes from the standard books and is meant only for
GATE CSE aspirants.
Concatenation of two strings w and υ is the string obtained by appending the symbols of υ to the
right end of w; that is, if w = a1a2…an and υ = b1b2…bm, then wυ = a1a2…anb1b2…bm.
If w is a string, then w^n stands for the string obtained by repeating w n times. As a special case,
w^0 is defined as w^0 = λ for all w.
If Σ is an alphabet, then we use Σ* to denote the set of strings obtained by concatenating zero or
more symbols from Σ. The set Σ* always contains λ. To exclude the empty string, we define
Σ^+ = Σ* − {λ}.
While Σ is finite by assumption, Σ* and Σ^+ are always infinite since there is no limit on the length of
the strings in these sets.
A language is defined very generally as a subset of Σ*. A string in a language L will be called a
sentence of L.
The concatenation of two languages L1 and L2 is the set of all strings obtained by concatenating any
element of L1 with any element of L2; specifically, L1L2 = {xy : x ∈ L1, y ∈ L2}.
For every language L, we define L^n as L concatenated with itself n times, with the special cases
L^0 = {λ} and L^1 = L.
Example: If
Then
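As a small illustration of language concatenation and powers, here is a minimal Python sketch for
finite languages represented as sets of strings. The helper names concat and power and the example
language are hypothetical; λ is represented by the empty string.

```python
def concat(l1, l2):
    """Return L1L2 = {xy : x in L1, y in L2} for finite languages (sets of str)."""
    return {x + y for x in l1 for y in l2}


def power(lang, n):
    """Return L^n; by convention L^0 = {λ}, with λ written as the empty string."""
    result = {""}
    for _ in range(n):
        result = concat(result, lang)
    return result


L = {"a", "ab"}              # hypothetical example language
print(sorted(power(L, 2)))   # the strings of L^2
```

This only works for finite languages; infinite languages need a symbolic description such as a
grammar or an automaton.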
Grammars
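A grammar G is defined as a quadruple G = (V, T, S, P), where V is a finite set of variables, T is a
finite set of terminal symbols, S ∈ V is the start variable, and P is a finite set of productions.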
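If w1 ⇒ w2 ⇒ … ⇒ wn, we say that w1 derives wn and write w1 ⇒* wn.
The * indicates that an unspecified number of steps (including zero) can be taken to
derive wn from w1.
The set of all such terminal strings is the language defined or generated by the grammar.
Let G = (V, T, S, P) be a grammar. Then the set L(G) = {w ∈ T* : S ⇒* w} is the language
generated by G.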
Sentential forms
If w ∈ L(G), then the sequence S ⇒ w1 ⇒ w2 ⇒ … ⇒ wn ⇒ w
is a derivation of the sentence w.
The strings S, w1, w2… wn, which contain variables as well as terminals, are called sentential forms
of the derivation.
Two grammars G1 and G2 are equivalent if they generate the same language.
An automaton is an abstract model of a digital computer. It will be assumed that the input is a
string over a given alphabet, written on an input file, which the automaton can read but not
change.
The input file is divided into cells, each of which can hold one symbol. The input mechanism can read
the input file from left to right, one symbol at a time.
The input mechanism can also detect the end of the input string (by sensing an end-of-file condition)
The automaton has a control unit, which can be “in” any one of a finite number of internal states,
and which can change state in some defined manner.
Transition function: it gives the next state in terms of the current state, the current input symbol,
and the information currently in the temporary storage.
Deterministic Automata
A deterministic automaton is one in which each move is uniquely determined by the current
configuration. If we know the internal state, the input, and the contents of the temporary storage,
we can predict the future behavior of the automaton exactly.
Nondeterministic automata
A nondeterministic automaton may have several possible moves at each point, so
we can only predict a set of possible actions.
Note
a. An automaton whose output response is limited to a simple “yes” or “no” is called an accepter.
b. An automaton capable of producing strings of symbols as output is called a transducer.
Finite Automata
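A deterministic finite accepter or dfa is defined by the quintuple M = (Q, Σ, δ, q0, F), where
Q is a finite set of internal states,
Σ is a finite set of symbols called the input alphabet,
δ: Q × Σ → Q is a total function called the transition function,
q0 ∈ Q is the initial state,
F ⊆ Q is a set of final states.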
The extended transition function δ*: Q × Σ* → Q generalizes δ to strings.
The second argument of δ* is a string, rather than a single symbol, and its value gives the state the
automaton will be in after reading that string.
The language accepted by a DFA M = (Q, Σ, δ, q0, F) is the set of all strings on Σ accepted by M.
In formal notation, L(M) = {w ∈ Σ* : δ*(q0, w) ∈ F}.
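The acceptance condition δ*(q0, w) ∈ F can be sketched in Python as follows. This is only an
illustration: the dictionary layout, the function names, and the example DFA (strings over {0, 1}
with an even number of 1's) are assumptions made for this sketch, not taken from the reference books.

```python
def delta_star(delta, state, w):
    """Extended transition function: apply delta once per symbol of w."""
    for symbol in w:
        state = delta[(state, symbol)]
    return state


def accepts(dfa, w):
    """w is accepted iff delta*(q0, w) is a final state."""
    return delta_star(dfa["delta"], dfa["q0"], w) in dfa["F"]


# Hypothetical example DFA: even number of 1's over the alphabet {0, 1}.
EVEN_ONES = {
    "delta": {
        ("even", "0"): "even", ("even", "1"): "odd",
        ("odd", "0"): "odd", ("odd", "1"): "even",
    },
    "q0": "even",
    "F": {"even"},
}

print(accepts(EVEN_ONES, "0110"))
```

Because δ is total for a dfa, every (state, symbol) pair must appear in the dictionary.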
In a nondeterministic accepter, the range of δ is in the powerset 2^Q, so that its value is not a
single element of Q but a subset of it. This subset defines the set of all possible states that can be
reached by the transition.
A string is accepted by an nfa if there is some sequence of possible moves that will put the machine
in a final state at the end of the string. A string is rejected (that is, not accepted) only if there is no
possible sequence of moves by which a final state can be reached.
We see that δ*(q2, λ) contains q0. Also, since any state can be reached from itself by making no
move, δ*(q2, λ) also contains q2.
The language L accepted by an nfa M = (Q, Σ, δ, q0, F) is defined as the set of all strings accepted in
the above sense. Formally, L(M) = {w ∈ Σ* : δ*(q0, w) ∩ F ≠ ∅}.
The language consists of all strings w for which there is a walk labeled w from the initial vertex of the
transition graph to some final vertex.
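The set-valued δ* of an nfa, including λ-transitions, can be sketched as below. This is a minimal
sketch under assumptions: λ-moves are stored under the empty string "" as the input symbol, missing
entries of δ mean the empty set, and the example nfa (strings over {a, b} ending in ab) is hypothetical.

```python
def lambda_closure(delta, states):
    """All states reachable from `states` using only λ-transitions (key symbol "")."""
    closure = set(states)
    stack = list(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, ""), set()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return closure


def nfa_delta_star(delta, states, w):
    """Set of states reachable from `states` by some walk labeled w."""
    current = lambda_closure(delta, states)
    for a in w:
        moved = set()
        for q in current:
            moved |= delta.get((q, a), set())
        current = lambda_closure(delta, moved)
    return current


def nfa_accepts(nfa, w):
    """Accepted iff delta*(q0, w) contains at least one final state."""
    return bool(nfa_delta_star(nfa["delta"], {nfa["q0"]}, w) & nfa["F"])


# Hypothetical nfa accepting strings over {a, b} that end in "ab".
ENDS_IN_AB = {
    "delta": {("q0", "a"): {"q0", "q1"}, ("q0", "b"): {"q0"}, ("q1", "b"): {"q2"}},
    "q0": "q0",
    "F": {"q2"},
}
```

With a single λ-edge from q2 to q0, lambda_closure returns both q0 and q2 for the start set {q2},
which matches the remark that a state is always reachable from itself by making no move.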
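Two states p and q of a dfa are called indistinguishable (equivalent) if
1. δ*(p, w) ∈ F implies δ*(q, w) ∈ F, and
2. δ*(p, w) ∉ F implies δ*(q, w) ∉ F,
for all w ∈ Σ*.
OR
States p and q are called equivalent if, on every input w, both go to a final state or both go to a
non-final state.
If, on the other hand, there exists some string w ∈ Σ* such that δ*(p, w) ∈ F and δ*(q, w) ∉ F,
or vice versa, then the states p and q are said to be distinguishable by the string w.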
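One common way to find all distinguishable pairs is a marking procedure: first mark every pair
consisting of a final and a non-final state (distinguished by λ), then keep marking pairs whose
successors on some symbol are already marked. The sketch below assumes a complete dfa stored as a
dictionary; the function name and the example machine are hypothetical.

```python
from itertools import combinations


def distinguishable_pairs(states, alphabet, delta, finals):
    """Return the set of unordered state pairs that some string distinguishes."""
    states = sorted(states)
    # Base case: λ distinguishes a final state from a non-final state.
    marked = {
        frozenset((p, q))
        for p, q in combinations(states, 2)
        if (p in finals) != (q in finals)
    }
    changed = True
    while changed:
        changed = False
        for p, q in combinations(states, 2):
            pair = frozenset((p, q))
            if pair in marked:
                continue
            for a in alphabet:
                succ = frozenset((delta[(p, a)], delta[(q, a)]))
                # If the successors are distinguishable by w, then aw distinguishes p and q.
                if len(succ) == 2 and succ in marked:
                    marked.add(pair)
                    changed = True
                    break
    return marked


# Hypothetical dfa over {a}: q1 and q2 behave identically, q0 differs from both.
DELTA = {("q0", "a"): "q1", ("q1", "a"): "q2", ("q2", "a"): "q1"}
PAIRS = distinguishable_pairs({"q0", "q1", "q2"}, {"a"}, DELTA, {"q1", "q2"})
```

Pairs that remain unmarked at the end are equivalent states and can be merged when minimizing the dfa.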
b. Left-linear grammar: a grammar is said to be left-linear if all productions are of the form
A → Bx,
A → x,
where A, B ∈ V and x ∈ T*.
A grammar G = (V, T, S, P) is called unrestricted if all the productions are of the form
u → v,
where u is in (V ∪ T)^+ and v is in (V ∪ T)*.
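A context-free grammar G = (V, T, S, P) is said to be a simple grammar or s-grammar if all its
productions are of the form
A → ax,
where A ∈ V, a ∈ T, x ∈ V*, and any pair (A, a) occurs at most once in P.
Note: If G is an s-grammar, any string w in L(G) can be parsed with an effort proportional to |w|.
Each step produces one terminal symbol, and hence the whole process must be completed in no
more than |w| steps.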
1. A context-free grammar G is said to be ambiguous if there exists some w ∈ L(G) that has at
least two distinct derivation trees.
2. Alternatively, ambiguity implies the existence of two or more leftmost (or rightmost)
derivations of the same string.
Inherently ambiguous
If L is a context-free language for which there exists an unambiguous grammar, then L is said to be
unambiguous. If every grammar that generates L is ambiguous, then the language is called
inherently ambiguous.
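Let G = (V, T, S, P) be a context-free grammar. A variable A ∈ V is said to be useful if and only if
there is at least one w ∈ L(G) such that S ⇒* xAy ⇒* w, with x, y in (V ∪ T)*. In other words, a
variable is useful if and only if it occurs in at least one derivation of a sentence.
A variable that is not useful is called useless. A production is useless if it involves any useless
variable.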
Removing λ-Productions
Note: A grammar may generate a language not containing λ, yet have some λ-productions or nullable
variables. In such cases, the λ-productions can be removed.
1. Let G be any context-free grammar with λ not in L (G). Then there exists an equivalent grammar
having no λ-productions.
2. If the language contains the empty string, then the λ-productions cannot all be removed from the grammar.
We put restrictions not on the length of the right sides of a production, but on the positions in which
terminals and variables can appear.
A context-free grammar is said to be in Greibach normal form if all productions have the form
A → ax,
where a ∈ T and x ∈ V*.
Example-
The grammar
Note:-
CYK: membership and parsing algorithms for context-free grammars exist that require
approximately |w|^3 steps to parse a string w.
The CYK algorithm works only if the grammar is in CNF (Chomsky normal form), and it succeeds by
breaking one problem into a sequence of smaller ones.
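A minimal sketch of the CYK membership test follows. It assumes the grammar is already in CNF and
is passed as two sets of rules; this representation, the function name, and the example grammar for
{a^n b^n : n ≥ 1} are assumptions made for illustration. The three nested loops over length, start
position and split point are where the roughly |w|^3 behaviour comes from.

```python
def cyk(word, unit_rules, pair_rules, start="S"):
    """CYK membership test for a grammar in Chomsky normal form.

    unit_rules: set of (A, a) for productions A -> a
    pair_rules: set of (A, B, C) for productions A -> BC
    """
    n = len(word)
    if n == 0:
        return False  # λ is not handled by this sketch
    # table[i][l - 1] holds the variables that derive word[i:i + l]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(word):
        table[i][0] = {A for (A, a) in unit_rules if a == ch}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for (A, B, C) in pair_rules:
                    if B in left and C in right:
                        table[i][length - 1].add(A)
    return start in table[0][n - 1]


# Hypothetical CNF grammar for {a^n b^n : n >= 1}:
#   S -> AB | AT,  T -> SB,  A -> a,  B -> b
UNIT = {("A", "a"), ("B", "b")}
PAIR = {("S", "A", "B"), ("S", "A", "T"), ("T", "S", "B")}

print(cyk("aabb", UNIT, PAIR))
```

Each table entry is built only from entries for shorter substrings, which is the sense in which the
problem is broken into smaller ones.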
Push Down Automata
Pushdown automata are like nondeterministic finite automata but have an extra component called
a stack.
Pushdown automata are equivalent in power to context-free grammars.
[Figure: schematic of a PDA]
A nondeterministic pushdown accepter (npda) is defined by the septuple M = (Q, Σ, Γ, δ, q0, z, F),
where
Q is a finite set of internal states of the control unit,
Σ is the input alphabet,
Γ is a finite set of symbols called the stack alphabet,
δ: Q × (Σ ∪ {λ}) × Γ → set of finite subsets of Q × Γ* is the transition function,
q0 ∈ Q is the initial state of the control unit,
z ∈ Γ is the stack start symbol,
F ⊆ Q is the set of final states.
1. The arguments of δ are the current state of the control unit, the current input symbol, and the
current symbol on top of the stack.
2. The result is a set of pairs (q, x), where q is the next state of the control unit and x is a string
that is put on top of the stack in place of the single symbol there before.
3. Note that the second argument of δ may be λ, indicating that a move that does not consume an
input symbol is possible. We will call such a move a λ-transition.
4. Note also that δ is defined so that it needs a stack symbol; no move is possible if the stack is
empty.
5. Finally, the requirement that the elements of the range of δ be a finite subset is necessary
because Q × Γ* is an infinite set and therefore has infinite subsets.
6. While an npda may have several choices for its moves, this choice must be restricted to a finite
set of possibilities.
Suppose the set of transition rules of an npda contains δ(q1, a, b) = {(q2, cd), (q3, λ)}.
If at any time the control unit is in state q1, the input symbol read is a, and the symbol on top of the
stack is b, then one of two things can happen: (1) the control unit goes into state q2 and the string
cd replaces b on top of the stack, or (2) the control unit goes into state q3 with the symbol b
removed from the top of the stack. In our notation we assume that the insertion of a string into a
stack is done symbol by symbol, starting at the right end of the string.
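The stack convention can be made concrete with a short sketch. It assumes the stack is a Python
list whose last element is the top; the function name replace_top is hypothetical.

```python
def replace_top(stack, expected_top, pushed):
    """Apply one PDA move to a stack stored as a list whose last element is the top.

    The top symbol `expected_top` is removed and the string `pushed` is inserted
    symbol by symbol, starting at its right end, so pushed[0] ends up on top.
    """
    if not stack or stack[-1] != expected_top:
        raise ValueError("move not applicable: wrong or missing top symbol")
    return stack[:-1] + list(reversed(pushed))


stack = ["z", "b"]                        # b is on top of the start symbol z
option_1 = replace_top(stack, "b", "cd")  # cd replaces b, leaving c on top
option_2 = replace_top(stack, "b", "")    # b is removed (the pushed string is λ)
```

The ValueError also reflects the rule that no move is possible when the stack is empty.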
The language accepted by M is the set of all strings that can put M into a final state at the end of
the string: L(M) = {w ∈ Σ* : (q0, w, z) ⊢* (p, λ, u), p ∈ F, u ∈ Γ*}.
The final stack content u is irrelevant to this definition of acceptance.
A deterministic pushdown accepter (dpda) is a pushdown automaton that never has a choice in its
move. Formally, it is an automaton M = (Q, Σ, Γ, δ, q0, z, F),
where
Q is a finite set of internal states of the control unit,
Σ is the input alphabet,
Γ is a finite set of symbols called the stack alphabet,
and δ, q0, z and F are as for an npda.
It is subject to the restrictions that, for every q ∈ Q, a ∈ Σ ∪ {λ} and b ∈ Γ,
1. δ(q, a, b) contains at most one element.
2. If δ (q, λ, b) is not empty, then δ (q, c, b) must be empty for every c ∈ Σ.
• The first of these conditions simply requires that for any given input symbol and any stack top, at
most one move can be made.
• The second condition is that when a λ-move is possible for some configuration, no input-
consuming alternative is available.
Note:-
1. We retain λ-transitions in a DPDA also.
2. λ-transitions do not automatically imply nondeterminism. Also, some transitions of a dpda
may be to the empty set, that is, undefined, so there may be dead configurations.
3. The only criterion for determinism is that at all times at most one possible move exists.
A language L is said to be a deterministic context-free language if and only if there exists a dpda M
such that L = L (M).
If A is a context-free language, then there is a number p (the pumping length) such that, if s is any
string in A of length at least p, then s may be divided into five pieces s = uvxyz satisfying the conditions
1. for each i ≥ 0, u v^i x y^i z ∈ A,
2. |vy| > 0, and
3. |vxy| ≤ p.
When s is divided into uvxyz, condition 2 says that either v or y is not the empty string.
Condition 3 says that the pieces v, x and y together have length at most p.
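To see how the lemma is used to show that a language is not context-free, the brute-force sketch
below tries every division s = uvxyz with |vy| > 0 and |vxy| ≤ p and checks whether pumping with
i = 0 and i = 2 stays in the language. Checking only these two values of i is an assumption of the
sketch: it is enough to refute a split, not to prove that a split pumps for all i. The function
names and the choice p = 3 are hypothetical.

```python
def in_anbncn(s):
    """Membership test for {a^n b^n c^n : n >= 0}."""
    n = len(s) // 3
    return s == "a" * n + "b" * n + "c" * n


def pumpable_split_exists(s, p, member, powers=(0, 2)):
    """True if some split s = uvxyz with |vy| > 0 and |vxy| <= p survives pumping."""
    n = len(s)
    for i in range(n + 1):                  # u = s[:i]
        hi = min(i + p, n)                  # vxy must end by position i + p
        for j in range(i, hi + 1):          # v = s[i:j]
            for k in range(j, hi + 1):      # x = s[j:k]
                for m in range(k, hi + 1):  # y = s[k:m], z = s[m:]
                    u, v, x, y, z = s[:i], s[i:j], s[j:k], s[k:m], s[m:]
                    if not v and not y:
                        continue            # violates |vy| > 0
                    if all(member(u + v * t + x + y * t + z) for t in powers):
                        return True
    return False


# No split of a^3 b^3 c^3 survives pumping when p = 3.
print(pumpable_split_exists("aaabbbccc", 3, in_anbncn))
```

The reason no split works is that vxy has length at most p, so it touches at most two of the three
letter blocks, and pumping changes their counts while the third block stays fixed.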
Note: -
1. The languages accepted by DPDAs by EMPTY STACK form a proper subset of the languages accepted
by DPDAs by final state.
2. Every DCFL that satisfies the prefix property can be accepted by a DPDA by empty stack.
The following list summarizes the differences between finite automata and Turing machines:
1. A Turing machine can both write on the tape and read from it.
2. The read-write head can move both to the left and to the right.
3. The tape is infinite.
4. The special states for rejecting and accepting take effect immediately.
A Turing machine M is defined by M = (Q, Σ, Γ, δ, q0, □, F),
where
Q is the set of internal states,
Σ is the input alphabet, not containing the blank symbol,
Γ is the finite set of symbols called the tape alphabet,
δ is the transition function,
□ ∈ Γ is a special symbol called the blank,
q0 ∈ Q is the initial state,
F ⊆ Q is the set of final states.
Transition function: In general, δ is a partial function on Q × Γ; for a Turing machine, δ takes the
form δ: Q × Γ → Q × Γ × {L, R}.
When the machine is in a certain state q and the head is over a tape cell containing a symbol a, and
δ(q, a) = (r, b, L), the machine writes the symbol b replacing a and goes to state r.
The third component is either L or R and indicates whether the head moves to the left or right after
writing. In this case the L indicates a move to the left.
A Turing machine is said to halt whenever it reaches a configuration for which δ is not defined; this
is possible because δ is a partial function. We will assume that no transitions are defined for any
final state, so the Turing machine will halt whenever it enters a final state.
1. A string w is written on the tape, with blanks filling out the unused portions.
2. The machine is started in the initial state q0 with the read-write head positioned on the
leftmost symbol of w.
3. If, after a sequence of moves, the Turing machine enters a final state and halts, then w is
considered to be accepted.
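The acceptance procedure above can be sketched as a small simulator. This is an illustration under
assumptions: the blank is written as "_", the tape is a dictionary so that it is unbounded in both
directions, and a step limit (max_steps, a hypothetical parameter) is added because a Turing machine
need not halt. The example machine is hypothetical as well.

```python
def run_tm(delta, q0, finals, w, blank="_", max_steps=10_000):
    """Simulate a single-tape TM with a partial transition function.

    delta maps (state, symbol) -> (next_state, written_symbol, "L" or "R").
    Returns True/False if the machine halts in a final/non-final state,
    or None if it is still running after max_steps moves.
    """
    tape = dict(enumerate(w))   # unlisted cells hold the blank
    state, head = q0, 0         # head starts on the leftmost symbol of w
    for _ in range(max_steps):
        symbol = tape.get(head, blank)
        if (state, symbol) not in delta:
            return state in finals   # delta undefined: the machine halts
        state, written, move = delta[(state, symbol)]
        tape[head] = written
        head += 1 if move == "R" else -1
    return None


# Hypothetical machine: accepts non-empty strings over {0, 1} consisting only of 0's.
ONLY_ZEROS = {
    ("q0", "0"): ("q1", "0", "R"),
    ("q1", "0"): ("q1", "0", "R"),
    ("q1", "_"): ("qf", "_", "R"),
}

print(run_tm(ONLY_ZEROS, "q0", {"qf"}, "000"))
```

No transitions are defined for the final state qf, so the machine halts as soon as it enters it. A
result of None only means the step budget ran out; it says nothing about whether the machine would
eventually halt.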
1. This definition indicates that the input w is written on the tape with blanks on either side.
2. Exclusion of blanks from the input assures us that all the input is restricted to a well-defined
region of the tape, bracketed by blanks on the right and left.
3. Without this convention, the machine could not limit the region in which it must look for the
input; no matter how many blanks it saw, it could never be sure that there was not some
nonblank input somewhere else on the tape.
Note that the Turing machine also halts in a final state if started in state q0 on a blank. We could
interpret this as acceptance of λ, but for technical reasons the empty string is not included.
Note:-
The collection of strings that M accepts is the language of M, or the language recognized by M,
denoted L(M).
Variants of Turing Machine
1. Multitape Turing machine: it is like an ordinary TM with several tapes. Each tape has its own head
for reading and writing. Initially the input appears on tape 1, and the others start out blank. The
transition function is δ: Q × Γ^k → Q × Γ^k × {L, R, S}^k, where k is the number of tapes.
2. Nondeterministic Turing machine: the transition function for an NTM has the form
δ: Q × Γ → P(Q × Γ × {L, R}), where P denotes the power set.
3. Enumerators: some people use the term recursively enumerable language for Turing-recognizable
language. Loosely defined, an enumerator is a Turing machine with an attached
printer. The Turing machine can use that printer as an output device to print strings.
[Figure: schematic of an enumerator]
An enumerator E starts with a blank input tape. If the enumerator doesn’t halt, it may print an
infinite list of strings. The language enumerated by E is the collection of all strings that it
eventually prints out.
Decidability
Closure Properties
1. All of these language classes are closed under union, intersection and difference with a regular
language R: L ∪ R, L ∩ R, L − R.
2. CFLs are not closed under difference: L1 − L2 = L1 ∩ L2^c, so taking L1 = Σ* would give closure
under complement, and CFLs are not closed under complement.
3. None of these language classes is closed under taking subsets (⊆) or under infinite union.
4. Regular languages are not closed under infinite UNION or infinite INTERSECTION.
5. If L is a DCFL, then so are MIN(L) and MAX(L).
6. The complement of a non-regular language is always non-regular.
7. If L is a DCFL and R is a regular language, then L/R is a DCFL.
8. The union of a DCFL and a CFL is a CFL (DCFL ∪ CFL = CFL).
9. If a class is closed under UNION and COMPLEMENT, then it is surely closed under
INTERSECTION.
Let L be a language
1. HALF(L) = {x | there is some y such that |x| = |y| and xy ∈ L}
2. MIN(L) = { w | w is in L and no proper prefix of w is in L}
3. MAX(L) = {w | w is in L and for no x other than epsilon wx is in L}
4. INIT(L) = {w | for some x, wx is in L}
5. CYCLE(L) = { w | we can write w as w=xy such that yx is in L}
6. ALT(L, M) is regular provided that L and M are regular languages.
7. SHUFFLE(L, L’) is a CFL if L is CFL and L’ is regular.
8. SUFFIX(L) = { y | xy ∈ L for some string x }, CFL is closed under SUFFIX operation.
9. NOPREFIX(A) = {w ∈ A | no proper prefix of w is a member of A}
10. NOEXTEND(A) = {w ∈ A | w is not a proper prefix of any string in A}
11. DROP-OUT(A): let A be any language; DROP-OUT(A) is the language containing all
strings that can be obtained by removing one symbol from a string in A.
12. Regular languages are closed under the NOPREFIX, NOEXTEND, and DROP-OUT operations.
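For finite languages, several of the operations above can be computed directly from their
definitions. The sketch below represents a language as a Python set of strings; the function names
and the example language are hypothetical, and the code is meant only to make the definitions concrete.

```python
def is_proper_prefix(u, w):
    """True if u is a prefix of w and u != w."""
    return len(u) < len(w) and w.startswith(u)


def min_lang(lang):
    """MIN(L): strings of L with no proper prefix in L."""
    return {w for w in lang if not any(is_proper_prefix(u, w) for u in lang)}


def max_lang(lang):
    """MAX(L): strings of L that are not a proper prefix of another string of L."""
    return {w for w in lang if not any(is_proper_prefix(w, v) for v in lang)}


def init_lang(lang):
    """INIT(L): all prefixes (including λ and w itself) of strings w in L."""
    return {w[:i] for w in lang for i in range(len(w) + 1)}


def half_lang(lang):
    """HALF(L): first halves x of the even-length strings xy in L with |x| = |y|."""
    return {w[: len(w) // 2] for w in lang if len(w) % 2 == 0}


L = {"a", "ab", "abb", "b"}   # hypothetical finite language
print(sorted(min_lang(L)), sorted(max_lang(L)))
```

On finite languages, min_lang and max_lang compute the same sets as NOPREFIX and NOEXTEND, since
the definitions coincide.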
To test whether L(G) is empty, we need to test whether the start variable of G can generate a string of
terminals.
a. Determine for each variable whether that variable can generate a string of terminals.
b. When the algorithm determines that a variable can generate a string of terminals, it
marks that variable.
c. The algorithm starts by marking all terminals. Then it repeatedly marks any variable that has
some rule whose right-hand side consists only of marked symbols, until no new variable gets marked.
d. If the start symbol S is not marked, accept; otherwise, reject.
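The marking procedure can be sketched in Python as follows. The grammar representation (a list of
(head, body) pairs with the body given as a tuple of symbols) and the two example grammars are
assumptions made for this sketch.

```python
def generates_some_string(productions, start, terminals):
    """Return True iff the start variable can derive a string of terminals.

    productions: iterable of (head, body) pairs, body being a tuple of symbols.
    L(G) is empty exactly when this function returns False.
    """
    marked = set(terminals)          # step c: start by marking all terminals
    changed = True
    while changed:                   # repeat until no new variable gets marked
        changed = False
        for head, body in productions:
            if head not in marked and all(sym in marked for sym in body):
                marked.add(head)
                changed = True
    return start in marked


# Hypothetical grammar with an empty language: B can never terminate.
EMPTY_G = [("S", ("A", "B")), ("A", ("a",)), ("B", ("B", "b"))]
# Hypothetical grammar for {a^n b^n : n >= 1}.
ANBN_G = [("S", ("a", "S", "b")), ("S", ("a", "b"))]

print(generates_some_string(EMPTY_G, "S", {"a", "b"}))
```

A λ-production has an empty body, so its head is marked immediately, which is the intended behaviour.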
Problem 3: For a CFL L and a string w, does w belong to L? That is, given a CFG G with L(G) = L, is w ∈ L(G)?
Let G be a CFG for L, i.e. L(G) = L. Design a TM M that decides L by building a copy of G into M.
M = “On input w:
1. Run TM S on input <G, w>, where S is the TM that decides whether a CFG G generates a string w.
2. If this machine accepts, accept; if it rejects, reject.”
Problem 4: For two context-free languages generated by two CFGs A and B, is L(A) = L(B)?
Since the class of CFLs is not closed under INTERSECTION and COMPLEMENT, we cannot use the
symmetric difference construction for EQCFG. In fact, EQCFG is undecidable.
Problem 5: Is it decidable whether a given context free grammar generates a finite language?
GATECSE: Checking whether L(CFG) is finite is decidable, because we just need to see whether L(CFG)
contains any string with length between n and 2n−1, where n is the pumping lemma constant. If so,
L(CFG) is infinite; otherwise it is finite.
Stackoverflow:
Let G be a context free grammar, and let us assume that it is in Chomsky normal form. If it's not,
we'll convert it first. An important property of this normal form is that the only way to derive the
empty word is with the single rule S0→ϵ (where S0 is the initial variable, which cannot be derived
from other variables).
Thus, any other derivation adds some non-trivial part to a word. Now, let n be the number of
variables in the grammar, and let k be the maximal length of the right-hand side of a derivation rule.
That is, A→B1⋯Bk is a rule of maximal length.
G generates an infinite language iff it generates a word of length at least k^(n+2).
Decidable Problems concerning regular languages
Problem1:
The problem of testing whether a DFA B accepts an input w is the same as the problem of testing
whether <B, w> is a member of the language ADFA. ADFA is decidable.
Problem 2:
Problem3:
AREX is decidable.
The following TM P decides AREX.
P = “On input <R, w>, where R is a regular expression and w is a string:
1. Convert regular expression R to an equivalent NFA A.
2. Run TM N on <A, w>, where N is the TM that decides whether an NFA accepts a string.
3. If N accepts, accept; otherwise, reject.”
Problem 4:
EDFA is decidable.
A DFA accepts some string iff it is possible to reach an accept state from the start state by
travelling along the arrows of the DFA. To test this condition we can design a TM T.
T = “On input <A>, where A is a DFA:
1. Mark the start state of A.
2. Repeat until no new states get marked:
3. Mark any state that has a transition coming into it from any state that is already marked.
4. If no accept state is marked, accept; otherwise, reject.”
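The marking steps of T amount to a reachability search on the transition graph. A minimal Python
sketch, assuming the DFA's transition function is given as a dictionary from (state, symbol) to
state; the function name and the example transition table are hypothetical.

```python
def dfa_language_is_empty(delta, q0, finals):
    """Return True iff no accept state is reachable from the start state."""
    marked = {q0}                    # step 1: mark the start state
    frontier = [q0]
    while frontier:                  # step 2: repeat until nothing new is marked
        q = frontier.pop()
        for (src, _symbol), dst in delta.items():
            if src == q and dst not in marked:
                marked.add(dst)      # step 3: mark states with an incoming arrow
                frontier.append(dst)
    return not (marked & set(finals))   # step 4


# Hypothetical DFA over {0}: state "dead" is final but unreachable from "s".
DELTA = {("s", "0"): "t", ("t", "0"): "s", ("dead", "0"): "dead"}

print(dfa_language_is_empty(DELTA, "s", {"dead"}))
```

The search visits each state at most once, so it always terminates, which is why this gives a
decider rather than just a recognizer.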
Problem 5:
Solution:
Que 7: is decidable?
1. A = {<M> | M is a DFA which doesn’t accept any string containing an odd number of 1’s} is a
decidable language.
2. A = {<G> | G is a CFG over {0, 1} and 1* ∩ L(G) ≠ Φ} is decidable.
3. LCFG = {<G, k> | G is a CFG, L(G) contains exactly k strings where k>=0 or k=infinity} is decidable.
_________________ TURING MACHINE ________________________________________
From my notes:-
1. HALT-TM = {<M, w> | M is a TM and M halts on input w} – Undecidable, RE
2. E-TM = {<M> | M is a TM and L(M) = Φ} – Undecidable, NOT RE
3. EN-TM = {<M> | M is a TM and L(M) != Φ} – Undecidable, RE
4. REGULAR-TM = {<M> | M is a TM and L(M) is a regular language} – Undecidable, NOT RE
5. REC-TM = {<M> | M is a TM and L(M) is REC} – Undecidable, NOT RE
6. NOT-REC-TM = {<M> | M is a TM and L(M) is NOT REC} – Undecidable, NOT RE
7. EQ-TM = {<M1, M2> | M1 and M2 are TMs and L(M1) = L(M2)} – Undecidable, NOT RE
8. A-LBA = {<M, w> | M is an LBA that accepts string w} – Decidable
9. E-LBA = {<M> | M is an LBA where L(M) = Φ} – Undecidable, NOT RE
10. All-CFG = {<G>|G is a context free grammar and L(G) = Σ∗} Undecidable, NOT RE
11. T = {<M> | M is a TM that accepts w^R whenever it accepts w} – Undecidable, NOT RE (the property holds for L(M) = Φ)
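12. {<M> | M is a TM that ever writes a blank symbol over a non-blank symbol during the course of its
computation on some input} – Undecidable, RE (run M on all inputs in an interleaved mode and accept
as soon as such a write occurs)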
13. L3 = {< G> | G is ambiguous} RE, while L3’ is NOT RE.
Note:-
1. TM (FA + 2 stacks, or FA + 2 counters) > LBA > NPDA (NFA + 1 stack) > DPDA (DFA + 1 stack) > NFA = DFA
2. Any TM with m symbols and n states can be simulated using 4mn + n states by other TM.
3. If A <=p B (A is polynomial-time reducible to B) and A is NOT RE, then B is NOT RE too.
4. If A <=p A’ and A is Turing-recognizable, then A is decidable.
Solutions:
(a) Property of a TM to be part of our language: TM M has at least 481 states.
This is decidable: by looking at the description of TM M, we can tell how many states it has.
Method 1:
Suppose M takes only 60 steps on ε and halts. Some other input may still take more than
481 steps. In the worst case every input tried so far took fewer than 481 steps, but we cannot stop,
because some input we have not tried yet may take more than 481 steps. In the worst case this
procedure keeps running and checking forever.
Method 2: take all input strings of length at most 481 and run the given TM M only on these
inputs.
Over a 2-symbol input alphabet there are 2^481 strings of length 481; write all of them on a tape
separated by #.
S1 # S2 # S3 # S4 ……
If M takes fewer than 481 steps on S1, run M on S2, and so on, always on inputs of length at most 481.
Case 1: If some string takes more than 481 steps, then we don’t need to check the bigger set.
Case 2: No string in the small set takes more than 481 steps (M halts in fewer steps).
It may seem that we should now check the longer strings, but that is not required, for the following reason.
Take strings of length 500 and divide each into two parts: 481 symbols + 19 symbols.
On a string of length 500, M also starts from the beginning of the string, and within 481 steps its
head can read only the first 481 symbols, a part we have already checked. If M halted on this
smaller part, it halts in the same way before ever reading the remaining 19 symbols.
Hence, by looking at the small set only, we can tell whether a TM will take more than 481 steps
on some input or not. It’s not required to check the bigger set.
Problem: given a TM, does it ever move its head more than 481 tape cells away from the left end
marker on input epsilon?
Suppose we run the TM for 5 steps and the head remains in the first cell. We can check in
which cell the head currently is, but we can’t conclude anything, because the head may move to the
right after those 5 steps.
It’s a decidable language, because while the head stays within the first 481 cells the number of
distinct configurations is finite: (no. of head positions) × (no. of states) × (no. of ways the
cells can be written). If the TM runs for more steps than this without leaving the region, it is
repeating a configuration and will never leave it.
Note 1:
1. Element distinctness problem: a TM M is given a list of strings over {0, 1} separated by #, as
follows: s1 # s2 # s3 # s4 # . . . .
Its job is to accept if all the strings are different. This is a decidable problem.
2. D = {P | P is a polynomial in several variables with an integral root} is RE but NOT REC.
3. D = {P | P is a polynomial in a single variable with an integral root} is REC.
4. A = {<G> | G is a connected undirected graph } – decidable
https://www.cs.rice.edu/~nakhleh/COMP481/final_review_sp06_sol.pdf
Prob1: L1 is a decidable language.
The decider first finds the length of <M> and stores it. Then it runs M on all inputs of length at
most |<M>|, for at most |<M>| steps, and accepts if M accepts at least one of the strings within the
specified number of steps.
Prob3: (RE)
The machine M* that semidecides the language runs M on all inputs in an interleaved mode, and halts
whenever 3 inputs have been accepted. Notice that M* generates the input strings for M one by one as
they are needed (it is not allowed that M* first generates all strings and then starts running M on
them, since generating all the inputs takes infinite time!).
Prob7: (Decidable)
This is the language of all TM's, since there are no uncountable languages. (Over finite alphabets and
finite-length strings).
Prob8: (Decidable)
This is the empty set; there are no uncountable languages (over finite alphabets and finite-length
strings).
Prob9: (RE)
The machine M* that semidecides the language runs the two machines on ε (it interleaves the runs of
the two machines), and accepts if at least one of them accepts. The language is not recursive: in
the worst case both machines keep running forever.
Prob10: (RE)
Prob12: (RE)
M0 is a halting TM, which halts on all inputs. If M0 is a member of L(M) then M will halt on M0,
but if M0 is not a member of L(M) then M may keep running. Hence it’s an RE language.
Prob13: (REC)
M0 is a halting TM (HTM), which halts on every input. When M is given as input to M0, M0 will always
halt: it either accepts or rejects M.
Prob14:
(Recursive)
In particular, take M’ to be machine that rejects all inputs. Intersection of M’ and M will accept
nothing.
Prob15:
(RE) The machine that semidecides the language runs M on all strings of length at most 100 in an
interleaved mode, and halts if M accepts at least one.
Prob16: (REC)
This is the empty set, since every language has an infinite number of TMs that accept it.
Prob17: (REC)
We are talking about all the descriptions of Turing machines using a fixed alphabet (of finite size, of
course), i.e., TM's that are encoded as input to the universal TM. So, L19 is finite, and hence
recursive.
Prob20:
Prob21:
(Recursive)
Prob22: (RE)
The machine that semidecides the language runs M on all inputs in an interleaved mode, and halts and
accepts once M accepts two strings of different lengths.
(NOT RE)
(REC)
Singleton: if TM only accepts one string
http://gatecse.in/rices-theorem/
Complement a DFA:
In a given DFA,
1. Convert final states into non-final states, and
2. Convert non-final states into final states.
3. Don’t change the initial state.
This DFA will accept the complement of the language accepted by the original DFA.
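The construction can be sketched in a few lines of Python. The sketch assumes the DFA is complete
(δ is defined for every state and symbol); otherwise missing transitions must first be sent to a
non-final trap state. The dictionary layout, the function names, and the example DFA are hypothetical.

```python
def accepts(dfa, w):
    """Run the DFA on w and report whether it ends in a final state."""
    state = dfa["q0"]
    for symbol in w:
        state = dfa["delta"][(state, symbol)]
    return state in dfa["F"]


def complement(dfa):
    """Swap final and non-final states; everything else stays unchanged."""
    return {**dfa, "F": set(dfa["Q"]) - set(dfa["F"])}


# Hypothetical complete DFA: even number of 1's over {0, 1}.
EVEN_ONES = {
    "Q": {"even", "odd"},
    "delta": {
        ("even", "0"): "even", ("even", "1"): "odd",
        ("odd", "0"): "odd", ("odd", "1"): "even",
    },
    "q0": "even",
    "F": {"even"},
}
ODD_ONES = complement(EVEN_ONES)

print(accepts(EVEN_ONES, "101"), accepts(ODD_ONES, "101"))
```

The same swap does not work for an NFA in general, because an NFA accepts when at least one
computation path ends in a final state.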
Reversal of DFA