Lesson 6 3rd Release
VI
CLASSIFICATION OF GRAMMARS
Introduction
In the literary sense of the term, grammars denote syntactical rules for conversation in natural
languages. Linguists have attempted to define grammars since the inception of natural languages like
English, Sanskrit, Mandarin, etc.
The theory of formal languages finds its applicability extensively in the fields of Computer
Science. Noam Chomsky gave a mathematical model of grammar in 1956 which is effective for writing
computer languages.
DISCUSSION:
Grammar
A grammar G can be formally written as a 4-tuple (N, T, S, P) where −
N or VN is a set of variables or non-terminal symbols.
T or ∑ is a set of Terminal symbols.
S is a special variable called the Start symbol, S ∈ N
P is the set of Production rules for Terminals and Non-terminals. A production rule has the form α → β,
where α and β are strings on VN ∪ ∑ and at least one symbol of α belongs to VN.
Example
Grammar G1 −
({S, A, B}, {a, b}, S, {S → AB, A → a, B → b})
Here,
S, A, and B are Non-terminal symbols;
a and b are Terminal symbols
S is the Start symbol, S ∈ N
Productions, P : S → AB, A → a, B → b
Example
Grammar G2 −
({S, A}, {a, b}, S, {S → aAb, aA → aaAb, A → ε})
Here,
S and A are Non-terminal symbols.
a and b are Terminal symbols.
ε is an empty string.
S is the Start symbol, S ∈ N
Production P : S → aAb, aA → aaAb, A → ε
Strings may be derived from other strings using the productions in a grammar. If a grammar G has a
production α → β, we can say that x α y derives x β y in G. This derivation is written as −
x α y ⇒G x β y
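The derivation step above can be sketched in code. This is a minimal illustration, assuming every grammar symbol is a single character and a production is stored as a (left side, right side) pair of strings; the function name derive_once is a hypothetical helper, not part of the lesson.

```python
def derive_once(sentential_form, productions):
    """Return every string reachable from sentential_form in one derivation step."""
    results = set()
    for lhs, rhs in productions:
        start = sentential_form.find(lhs)
        while start != -1:
            # Replace this occurrence of lhs:  x lhs y  =>  x rhs y
            results.add(sentential_form[:start] + rhs + sentential_form[start + len(lhs):])
            start = sentential_form.find(lhs, start + 1)
    return results


# Grammar G1 from the text: S -> AB, A -> a, B -> b
G1_PRODUCTIONS = [("S", "AB"), ("A", "a"), ("B", "b")]

print(derive_once("S", G1_PRODUCTIONS))   # one step from the start symbol
print(derive_once("AB", G1_PRODUCTIONS))  # either A or B can be rewritten next
```

Applying derive_once repeatedly, starting from S, traces derivations such as S ⇒ AB ⇒ aB ⇒ ab.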
Example
We’ll consider some languages and convert each one into a grammar G which produces it.
Example
Problem − Suppose, L (G) = {a^m b^n | m ≥ 0 and n > 0}. We have to find out the grammar G which
produces L(G).
Solution
Since L(G) = {a^m b^n | m ≥ 0 and n > 0}
the set of strings accepted can be rewritten as −
L(G) = {b, ab, bb, aab, abb, ...}
Here, the start symbol has to take at least one ‘b’ preceded by any number of ‘a’ including null.
To accept the string set {b, ab, bb, aab, abb, ...}, we have taken the productions −
S → aS , S → B, B → b and B → bB
S → B → b (Accepted)
S → B → bB → bb (Accepted)
S → aS → aB → ab (Accepted)
S → aS → aaS → aaB → aab(Accepted)
S → aS → aB → abB → abb (Accepted)
Thus, each string listed above can be derived from S using these productions, and every string the
productions derive has the form a^m b^n with n > 0.
Hence the grammar −
G: ({S, B}, {a, b}, S, { S → aS | B , B → b | bB })
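As a sanity check on the productions S → aS | B and B → b | bB, the following sketch enumerates every terminal string they derive up to a small length and compares the result with the pattern a*b+ (Python's spelling of "any number of a's followed by at least one b"). The function name generate and the length bound are assumptions made for this illustration.

```python
import re
from collections import deque
from itertools import product

PRODUCTIONS = [("S", "aS"), ("S", "B"), ("B", "b"), ("B", "bB")]
NONTERMINALS = {"S", "B"}


def generate(productions, nonterminals, start="S", max_len=4):
    """Breadth-first search over sentential forms; collect terminal strings up to max_len."""
    seen = {start}
    queue = deque([start])
    language = set()
    while queue:
        form = queue.popleft()
        if not any(sym in nonterminals for sym in form):
            language.add(form)  # only terminals left: a derived string
            continue
        for lhs, rhs in productions:
            pos = form.find(lhs)
            while pos != -1:
                new_form = form[:pos] + rhs + form[pos + len(lhs):]
                # No rule of this grammar shortens a form, so longer forms can be pruned.
                if len(new_form) <= max_len and new_form not in seen:
                    seen.add(new_form)
                    queue.append(new_form)
                pos = form.find(lhs, pos + 1)
    return language


generated = generate(PRODUCTIONS, NONTERMINALS)
expected = {
    "".join(chars)
    for n in range(1, 5)
    for chars in product("ab", repeat=n)
    if re.fullmatch(r"a*b+", "".join(chars))
}
print(sorted(generated, key=lambda s: (len(s), s)))
print(generated == expected)
```

The comparison only covers strings up to length 4, so it is evidence for the grammar rather than a proof.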
Example
Problem − Suppose, L (G) = {a^m b^n | m > 0 and n ≥ 0}. We have to find out the grammar G which
produces L(G).
Solution −
Since L(G) = {a^m b^n | m > 0 and n ≥ 0}, the set of strings accepted can be rewritten as −
L(G) = {a, aa, ab, aaa, aab, abb, ...}
Here, the start symbol has to take at least one ‘a’ followed by any number of ‘b’ including null.
To accept the string set {a, aa, ab, aaa, aab, abb, ...}, we have taken the productions (λ denotes the
empty string, written ε earlier) −
S → aA, A → aA, A → B, B → bB, B → λ
S → aA → aB → aλ → a (Accepted)
S → aA → aaA → aaB → aaλ → aa (Accepted)
S → aA → aB → abB → abλ → ab (Accepted)
S → aA → aaA → aaaA → aaaB → aaaλ → aaa (Accepted)
S → aA → aaA → aaB → aabB → aabλ → aab (Accepted)
S → aA → aB → abB → abbB → abbλ → abb (Accepted)
Thus, each string listed above can be derived from S using these productions, and every string the
productions derive has the form a^m b^n with m > 0.
Hence the grammar −
G: ({S, A, B}, {a, b}, S, {S → aA, A → aA | B, B → λ | bB })
Take a look at the following illustration. It shows the scope of each type of grammar −
Type - 3 Grammar
Type-3 grammars generate regular languages. Type-3 grammars must have a single non-terminal on the
left-hand side and a right-hand side consisting of a single terminal or single terminal followed by a
single non-terminal.
The productions must be in the form X → a or X → aY
where X, Y ∈ N (Non terminal)
and a ∈ T (Terminal)
The rule S → ε is allowed if S does not appear on the right side of any rule.
Example
X → ε
X → a | aY
Y → b
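The Type-3 conditions stated above can be checked mechanically. The sketch below assumes single-character symbols, represents ε as the empty string, and stores productions as (left side, right side) pairs; the function name is_type3 is hypothetical.

```python
def is_type3(productions, nonterminals, terminals, start):
    """Check the forms X -> a and X -> aY, plus the restricted rule start -> ε."""
    rhs_symbols = {sym for _, rhs in productions for sym in rhs}
    for lhs, rhs in productions:
        if lhs not in nonterminals:
            return False  # left side must be a single non-terminal
        if rhs == "":
            # ε is allowed only for the start symbol, and only if the
            # start symbol never appears on a right-hand side.
            if lhs != start or start in rhs_symbols:
                return False
        elif len(rhs) == 1:
            if rhs not in terminals:
                return False
        elif len(rhs) == 2:
            if rhs[0] not in terminals or rhs[1] not in nonterminals:
                return False
        else:
            return False
    return True


# The example above, taking X as the start symbol (an assumption).
example = [("X", ""), ("X", "a"), ("X", "aY"), ("Y", "b")]
print(is_type3(example, {"X", "Y"}, {"a", "b"}, "X"))
```

A rule such as S → Xa fails the check because its right-hand side begins with a non-terminal.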
Type - 2 Grammar
Example
S → Xa
X → a
X → aX
X → abc
X → ε
Type - 1 Grammar
Example
AB → AbBc
A → bcA
B → b
Type - 0 Grammar
Example
S → ACaB
Bc → acB
CB → DB
aD → Db
Regular Expressions
A Regular Expression can be recursively defined as follows −
ε is a Regular Expression denoting the language containing only the empty string. (L (ε) = {ε})
φ is a Regular Expression denoting the empty language. (L (φ) = { })
x is a Regular Expression where L(x) = {x}, for any symbol x of the alphabet.
If X is a Regular Expression denoting the language L(X) and Y is a Regular Expression denoting
the language L(Y), then
o X + Y is a Regular Expression corresponding to the language L(X) ∪ L(Y), where
L(X+Y) = L(X) ∪ L(Y).
o X . Y is a Regular Expression corresponding to the language L(X) . L(Y), where
L(X.Y) = L(X) . L(Y).
o If R is a Regular Expression, R* is a Regular Expression corresponding to the language L(R*), where L(R*) = (L(R))*.
Any expression obtained by applying the rules above a finite number of times is a Regular Expression.
Some RE Examples
(a+b)*               Set of strings of a’s and b’s of any length including the null string. So L = {ε, a, b, aa, ab, bb, ba, aaa, ...}
(a+b)*abb            Set of strings of a’s and b’s ending with the string abb. So L = {abb, aabb, babb, aaabb, ababb, ...}
(11)*                Set consisting of an even number of 1’s including the empty string. So L = {ε, 11, 1111, 111111, ...}
(aa)*(bb)*b          Set of strings consisting of an even number of a’s followed by an odd number of b’s. So L = {b, aab, aabbb, aabbbbb, aaaab, aaaabbb, ...}
(aa + ab + ba + bb)* Strings of a’s and b’s of even length, obtained by concatenating any combination of the strings aa, ab, ba and bb including null. So L = {ε, aa, ab, ba, bb, aaab, aaba, ...}
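The example sets above can be reproduced with Python's re module. This is a sketch under one notational assumption: Python writes union as | where the lesson writes +, so (a+b)*abb becomes (a|b)*abb. The helper name matches_up_to is hypothetical.

```python
import re
from itertools import product


def matches_up_to(pattern, alphabet, max_len):
    """List, shortest first, every string over alphabet of length <= max_len that the pattern matches in full."""
    compiled = re.compile(pattern)
    found = []
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            if compiled.fullmatch(s):
                found.append(s)
    return found


print(matches_up_to(r"(a|b)*abb", "ab", 4))       # ['abb', 'aabb', 'babb']
print(matches_up_to(r"(11)*", "1", 6))            # ['', '11', '1111', '111111']
print(matches_up_to(r"(aa)*(bb)*b", "ab", 5))     # ['b', 'aab', 'bbb', 'aaaab', 'aabbb', 'bbbbb']
print(matches_up_to(r"(aa|ab|ba|bb)*", "ab", 2))  # ['', 'aa', 'ab', 'ba', 'bb']
```

The empty string '' in the output plays the role of ε.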
Regular Sets
Any set that represents the value of the Regular Expression is called a Regular Set.
Proof −
Hence, proved.
Property 2. The intersection of two regular sets is regular.
Proof −
Let us take two regular expressions
RE1 = a(a*) and RE2 = (aa)*
So, L1 = { a,aa, aaa, aaaa, ....} (Strings of all possible lengths excluding Null)
L2 = { ε, aa, aaaa, aaaaaa,.......} (Strings of even length including Null)
L1 ∩ L2 = { aa, aaaa, aaaaaa,.......} (Strings of even length excluding Null)
RE (L1 ∩ L2) = aa(aa)* which is a regular expression itself.
Hence, proved.
Property 3. The complement of a regular set is regular.
Proof −
Let us take a regular expression −
RE = (aa)*
So, L = {ε, aa, aaaa, aaaaaa, .......} (Strings of even length including Null)
The complement of L is the set of all strings over {a} that are not in L.
So, L’ = {a, aaa, aaaaa, .....} (Strings of odd length excluding Null)
RE (L’) = a(aa)* which is a regular expression itself.
Hence, proved.
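The two worked examples for intersection and complement can be checked on a finite sample. The sketch below takes the alphabet to be {a} and compares the sets only for strings up to a chosen length, so it illustrates the examples rather than proving the properties; MAX_LEN and the helper name language are assumptions.

```python
import re

MAX_LEN = 12
# Every string over the alphabet {a} up to MAX_LEN symbols.
universe = {"a" * k for k in range(MAX_LEN + 1)}


def language(pattern):
    """Strings of the universe that the pattern matches in full."""
    return {w for w in universe if re.fullmatch(pattern, w)}


L1 = language(r"aa*")    # RE1 = a(a*)
L2 = language(r"(aa)*")  # RE2 = (aa)*

intersection_ok = (L1 & L2) == language(r"aa(aa)*")
complement_ok = (universe - L2) == language(r"a(aa)*")
print(intersection_ok, complement_ok)
```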
Proof −
We have to prove that L^R, the reversal of L, is also regular if L is a regular set.
Let, L = {01, 10, 11}
RE (L) = 01 + 10 + 11
L^R = {10, 01, 11}
RE (L^R) = 10 + 01 + 11 which is regular
Hence, proved.
Proof −
We have to prove that L*, the closure of L, is also regular if L is a regular set.
If L = {a, aaa, aaaaa, ...} (Strings of odd length excluding Null)
i.e., RE (L) = a (aa)*
L* = {ε, a, aa, aaa, aaaa, aaaaa, ...} (Strings of all lengths including Null, since L* always contains ε)
RE (L*) = a*
Hence, proved.
Regular Expressions
• A compact notation to describe regular languages
• Omit braces around one-string sets, use + to denote union and juxtapose subexpressions to
represent concatenation (without the dot, like we have been doing).
• Useful in
– text search (editors, Unix/grep)
– compilers: lexical analysis
Regular Expressions: Examples
• (0+1)*
– All binary strings
• ((0+1)(0+1))*
– All binary strings of even length
• (0+1)*001(0+1)*
– All binary strings containing the substring 001
• 0* + (0*10*10*10*)*
– All binary strings with #1s ≡ 0 mod 3
• (01+1)*(0+ε)
– All binary strings without two consecutive 0s
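Each description above can be tested against its expression by brute force. The sketch below assumes Python's re syntax (| for +, and 0? for 0+ε) and checks all binary strings up to length 8, so agreement is evidence rather than proof; the names EXAMPLES and agrees are hypothetical.

```python
import re
from itertools import product

# (pattern in Python syntax, predicate stating the English description)
EXAMPLES = [
    (r"(0|1)*", lambda s: True),
    (r"((0|1)(0|1))*", lambda s: len(s) % 2 == 0),
    (r"(0|1)*001(0|1)*", lambda s: "001" in s),
    (r"0*|(0*10*10*10*)*", lambda s: s.count("1") % 3 == 0),
    (r"(01|1)*0?", lambda s: "00" not in s),
]


def agrees(pattern, predicate, max_len=8):
    """True if pattern and predicate accept exactly the same binary strings up to max_len."""
    compiled = re.compile(pattern)
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            s = "".join(bits)
            if bool(compiled.fullmatch(s)) != predicate(s):
                return False
    return True


for pattern, predicate in EXAMPLES:
    print(pattern, agrees(pattern, predicate))
```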
The term Pumping Lemma is made up of two words: "pumping" refers to generating many strings by
repeating (pumping) a part of a given string, and "lemma" means an intermediate theorem used in a
proof.
The pumping lemma is a method to prove that certain languages are not context free.
The set of all context free languages is identical to the set of languages accepted by pushdown automata.
Theorem:
If L is a context free language, then there is a constant n depending only on L such that, if w ∈ L and
|w| ≥ n, then w may be divided into five pieces w = uvxyz, satisfying the following conditions:
|vy| ≥ 1
|vxy| ≤ n
uv^i xy^i z ∈ L for all i ≥ 0
Let G be a grammar for L in Chomsky normal form (CNF). It does not generate the empty string. As we
know, the right-hand side of a production in CNF contains at most two variables. So, every derivation
tree of G is a binary tree, in which no node has more than two children.
Step 1: Assume that L is a context free language; we will derive a contradiction. Let n be the natural
number given by the pumping lemma.
Step 2: Now choose a string w ∈ L where |w| ≥ n. By the pumping lemma, we can write w = uvxyz
with |vy| ≥ 1 and |vxy| ≤ n.
Step 3: Find a suitable i such that uv^i xy^i z ∉ L. This contradicts our assumption, which proves that
the given language is not context free.
Example 1:
Let L = { a^n b^n c^n | n ≥ 0 }. By using the pumping lemma, show that L is not a context free language.
Solution:
Step 1: Assume that L is a context free language; we will derive a contradiction. Let n be the natural
number given by the pumping lemma.
Step 2: Let w = a^n b^n c^n, so that |w| = 3n ≥ n. By the pumping lemma we can write w = uvxyz with
|vy| ≥ 1 and |vxy| ≤ n.
Case 1: v and y each contain only one type of alphabet symbol, for example both contain only a’s.
Let i = 2
Then we have uv^2 xy^2 z, which pumps more a’s into the string while the number of b’s remains the
same. More generally, v and y together cover at most two of the three symbols, so at least one count
stays unchanged while another grows. Hence uv^2 xy^2 z ∉ L, which contradicts our assumption.
Case 2: v or y contains more than one type of alphabet symbol.
Every string in L consists of a’s, then b’s, then c’s in equal numbers, so the only substrings made of two
different adjacent symbols are ab and bc; ba, ca, ac and cb never occur.
Then uv^2 xy^2 z may contain equal numbers of the three alphabet symbols, but not in the correct order.
The resulting string is of the form
aaaa…aabb…bbbcc…bcccbc
Here not all b’s follow a’s and not all c’s follow b’s. Hence it cannot be a member of L, and a
contradiction occurs. Both cases result in contradictions, so L is not a context free language.
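The case analysis can be cross-checked by brute force for one small value of n. The sketch below tries every split w = uvxyz with |vxy| ≤ n and |vy| ≥ 1 and looks for one that stays in the language for the exponents i = 0 and i = 2; the function names and the choice n = 4 are assumptions made for this illustration, and a finite check like this does not replace the proof.

```python
def in_anbncn(s):
    """Membership test for { a^n b^n c^n | n >= 0 }."""
    n = len(s) // 3
    return s == "a" * n + "b" * n + "c" * n


def surviving_split(w, n, member, exponents=(0, 2)):
    """Return a split (u, v, x, y, z) that stays in the language for every tested exponent, or None."""
    for start in range(len(w) + 1):          # position where vxy begins
        for vxy_len in range(1, n + 1):      # |vxy| <= n
            end = start + vxy_len
            if end > len(w):
                break
            for v_len in range(vxy_len + 1):
                for y_len in range(vxy_len - v_len + 1):
                    if v_len + y_len == 0:   # |vy| >= 1
                        continue
                    u, v = w[:start], w[start:start + v_len]
                    x = w[start + v_len:end - y_len]
                    y, z = w[end - y_len:end], w[end:]
                    if all(member(u + v * i + x + y * i + z) for i in exponents):
                        return (u, v, x, y, z)
    return None


n = 4
w = "a" * n + "b" * n + "c" * n
print(surviving_split(w, n, in_anbncn))  # None: every split is broken by i = 0 or i = 2
```

For contrast, running the same search on a^n b^n with a matching membership test does find a surviving split (pump one a together with one b), which is consistent with that language being context free.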
Example 2:
Let L = { a^n b^n a^n | n > 0 }. By using the pumping lemma, show that L is not a context free language.
Solution:
Step 1: Assume that L is a context free language; we will derive a contradiction. Let n be the natural
number given by the pumping lemma.
Step 2: Let w = a^n b^n a^n, so that |w| = 3n ≥ n. By the pumping lemma we can write w = uvxyz with
|vy| ≥ 1 and |vxy| ≤ n.
Let, i = 2
Case 2: All words in a^n b^n a^n have exactly one occurrence of the substring ab and one of ba, no
matter what n is.
Let, i = 2
Then, when v or y contains both a’s and b’s, uv^2 xy^2 z will have more than one substring ab or ba, so
it cannot be of the form a^n b^n a^n.
Hence, uv^2 xy^2 z ∉ L.
2. L = {a^k | k is a prime number}
Proof by contradiction:
Let us assume L is regular. Clearly L is infinite (there are infinitely many prime numbers). From the
pumping lemma, there exists a number n such that any string of length greater than n has a “repeatable”
substring generating more strings in the language L. Let us consider the first prime number p ≥ n. For
example, if n was 50 we could use p = 53. From the pumping lemma the string of length p has a
“repeatable” substring. We will assume that this substring is of length k ≥ 1. Hence:
a^p ∈ L and
a^(p+k) ∈ L as well as
a^(p+2k) ∈ L, etc.
It should be relatively clear that p + k, p + 2k, etc., cannot all be prime but let us add k p times, then we
must have:
a^(p+pk) ∈ L, of course a^(p+pk) = a^(p(k+1))
so this would imply that (k + 1)p is prime, which it is not since it is divisible by both p
and k + 1.
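The arithmetic in this argument can be checked numerically for the example n = 50. The sketch below finds the first prime p ≥ n and confirms that p + pk = p(k + 1) is composite for every repeat length k from 1 to p; the helper is_prime is a plain trial-division test written for this illustration.

```python
from itertools import count


def is_prime(m):
    """Trial division; adequate for the small numbers used here."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True


n = 50
p = next(m for m in count(n) if is_prime(m))  # first prime >= n
print(p)  # 53

# Pumping the repeatable substring of length k exactly p times gives length p + p*k.
pumped_lengths = [p + p * k for k in range(1, p + 1)]
print(any(is_prime(length) for length in pumped_lengths))  # False
```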
3. L = {a^n b^(n+1)}
Assume L is regular. From the pumping lemma there exists a p such that every w ∈ L such that |w| ≥ p
can be represented as x y z with |y| ≠ 0 and |xy| ≤ p. Let us choose a^p b^(p+1). Its length is
2p + 1 ≥ p. Since the length of xy cannot exceed p, y must be of the form a^k for some k > 0. From the
pumping lemma a^(p-k) b^(p+1) must also be in L but it is not of the right form. Hence the language is
not regular.
Note that the repeatable string needs to appear in the first n symbols to avoid the following situation:
Assume, for the sake of argument that n = 20 and you choose the string a^10 b^11 which is of length
larger than 20, but |xy| ≤ 20 allows xy to extend past the first b, which means that y could contain some
b’s. In such case, removing y (or adding more y’s) could lead to strings which still belong to L.
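The note about choosing the string carefully can be made concrete. The sketch below lists every split w = xyz with |y| > 0 and |xy| ≤ p for which deleting y still gives a string of the form a^n b^(n+1), assuming n ≥ 0; the function names are hypothetical. For a^20 b^21 with p = 20 there is no such split, while for a^10 b^11 with p = 20 there are several, for example y = ab.

```python
def in_language(s):
    """Membership test for { a^n b^(n+1) | n >= 0 }."""
    n = (len(s) - 1) // 2
    return len(s) % 2 == 1 and s == "a" * n + "b" * (n + 1)


def splits_keeping_xz(w, p):
    """All (x, y, z) with |y| > 0 and |xy| <= p such that xz is still in the language."""
    found = []
    for xy_len in range(1, min(p, len(w)) + 1):
        for x_len in range(xy_len):  # y = w[x_len:xy_len] is never empty
            x, y, z = w[:x_len], w[x_len:xy_len], w[xy_len:]
            if in_language(x + z):
                found.append((x, y, z))
    return found


good_choice = "a" * 20 + "b" * 21  # y is forced to consist of a's only
poor_choice = "a" * 10 + "b" * 11  # xy may reach into the b's

print(splits_keeping_xz(good_choice, 20))       # []
print(len(splits_keeping_xz(poor_choice, 20)))  # 10
```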
Summary:
In this chapter you learned the four classes of formal languages introduced by Noam Chomsky.
The most famous classification of grammars and languages, introduced by Noam Chomsky, is divided
into four classes:
Recursively enumerable grammars (Type 0) – recognizable by a Turing machine.
Context-sensitive grammars (Type 1) – recognizable by the linear bounded automaton.
Context-free grammars (Type 2) – recognizable by the pushdown automaton.
Regular grammars (Type 3) – recognizable by the finite state automaton.
1 – Context-sensitive grammars
Type-1 grammars generate the context-sensitive languages.
These grammars have rules of the form α A β → α γ β where A ∈ N (Non-terminal) and α, β, γ ∈ (T ∪ N)*
(Strings of terminals and non-terminals).
The strings α and β may be empty, but γ must be non-empty.
The rule S → ε is allowed if S does not appear on the right side of any rule. The languages
generated by these grammars are recognized by a linear bounded automaton.
2 – Context-free grammars
Type-2 grammars generate the context-free languages. The productions must be in the form A → γ,
with A a nonterminal and γ a string of terminals and nonterminals. These languages are exactly
the languages that can be recognized by a non-deterministic pushdown automaton. Context-free
languages are the theoretical basis for the syntax of most programming languages.
3 - Regular grammars
Type-3 grammars generate the regular languages. Such a grammar restricts its rules to a single
nonterminal on the left-hand side and a right-hand side consisting of a single terminal, possibly followed
(or preceded, but not both in the same grammar) by a single nonterminal.
Evaluation Sheet in CMSC 124
(Automata and Language Theory)
Test I. MULTIPLE CHOICE: Read and understand the questions carefully. Circle the letter of the correct
answer.
1. The entity which generates Language is termed as:
a. Automata
b. Tokens
c. Grammar
d. Data
2. Production Rule aAb->agb belongs to which of the following category?
a. Regular languages
b. Context free languages
c. Context Sensitive Language
d. Recursively Enumerable Language
3. Which of the following statement is false?
a. Context free language is the subset of context sensitive language
b. Regular language is the subset of context sensitive language
c. Recursively enumerable language is the super set of regular language
d. Context sensitive language is a subset of context free language.
4. The grammar can be defined as: G=(V, Σ, p, S). In the given definition, what does S represent?
a. Accepting State
b. Starting Variable
c. Sensitive Grammar
d. None of these
5. Which among the following cannot be accepted by a regular grammar?
a. L is a set of numbers divisible by 2
b. L is a set of binary complement
c. L is a set of string with odd number of 0
d. L is a set of 0^n 1^n
6. Which of the expressions is appropriate? For production p: a -> b where a ∈ V and b ∈ ?
a. V
b. S
c. (V+ Σ)
d. V + Σ
7. According to Noam Chomsky, there are four types of grammars − Type 0, Type 1, Type 2, and
Type 3.
a. Yes
b. No
8. Which of the following statement is correct?
a. All regular grammar is context free but not vice versa
b. All context free grammar are regular grammar but not vice versa
c. Regular grammar and context free grammar are the same entity
d. None of the mentioned.
9. Is an ambiguous grammar context free?
a. Yes
b. No
10. Are DFA and NDFA conversion the same?
a. Yes
b. No
Test II. Provide a context-free grammar for each of the following languages: