
Lesson 6 3rd Release

The document discusses the classification of formal grammars and regular expressions. It defines four types of grammars (Type-0 to Type-3) that generate different formal languages. Type-3 grammars generate regular languages, Type-2 grammars generate context-free languages, Type-1 grammars generate context-sensitive languages, and Type-0 grammars generate recursively enumerable languages. It also discusses regular expressions and their use in defining regular languages. Properties of regular sets are described, including that the union and intersection of two regular sets is also a regular set.

LESSON VI
CLASSIFICATION OF GRAMMARS

Introduction
In the everyday sense of the term, a grammar is the set of syntactical rules for communication in a
natural language. Linguists have attempted to define grammars since the inception of natural languages
such as English, Sanskrit, and Mandarin.
The theory of formal languages is applied extensively in Computer Science. In 1956 Noam Chomsky
gave a mathematical model of grammar that is effective for describing computer languages.

Total Learning Time: Weeks 10,11 & 12 (9 hours)

DISCUSSION:
Grammar
A grammar G can be formally written as a 4-tuple (N, T, S, P) where −
• N or VN is a set of variables or non-terminal symbols.
• T or ∑ is a set of terminal symbols.
• S is a special variable called the start symbol, S ∈ N.
• P is a set of production rules for terminals and non-terminals. A production rule has the form α → β,
where α and β are strings over VN ∪ ∑ and at least one symbol of α belongs to VN.

Example

Grammar G1 −
({S, A, B}, {a, b}, S, {S → AB, A → a, B → b})
Here,
• S, A, and B are Non-terminal symbols.
• a and b are Terminal symbols.
• S is the Start symbol, S ∈ N.
• Productions, P : S → AB, A → a, B → b

Example

Grammar G2 −
({S, A}, {a, b}, S, {S → aAb, aA → aaAb, A → ε})
Here,
• S and A are Non-terminal symbols.
• a and b are Terminal symbols.
• ε is the empty string.
• S is the Start symbol, S ∈ N.
• Productions, P : S → aAb, aA → aaAb, A → ε

Derivations from a Grammar

Strings may be derived from other strings using the productions in a grammar. If a grammar G has a
production α → β, we can say that x α y derives x β y in G. This derivation is written as −
x α y ⇒G x β y

Example

Let us consider the grammar −


G2 = ({S, A}, {a, b}, S, {S → aAb, aA → aaAb, A → ε } )
Some of the strings that can be derived are −
S ⇒ aAb using production S → aAb
⇒ aaAbb using production aA → aaAb
⇒ aaaAbbb using production aA → aaAb
⇒ aaabbb using production A → ε
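
The derivation above can be replayed mechanically. The following is a minimal Python sketch, assuming each symbol is a single character and ε is written as the empty string; the helper name apply_production is hypothetical and only illustrates one derivation step x α y ⇒ x β y on the leftmost occurrence of α.

```python
def apply_production(sentential_form, lhs, rhs):
    """Perform one derivation step: rewrite the leftmost occurrence of lhs with rhs."""
    if lhs not in sentential_form:
        raise ValueError(f"{lhs!r} does not occur in {sentential_form!r}")
    return sentential_form.replace(lhs, rhs, 1)


# Productions of G2 in the order used in the derivation; "" stands for ε.
steps = [("S", "aAb"), ("aA", "aaAb"), ("aA", "aaAb"), ("A", "")]

form = "S"
trace = [form]
for lhs, rhs in steps:
    form = apply_production(form, lhs, rhs)
    trace.append(form)

print(" => ".join(trace))
```

Each entry of trace is one sentential form of the derivation, ending in the terminal string aaabbb.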

Construction of a Grammar Generating a Language

We’ll consider some languages and, for each one, construct a grammar G which produces that language.

Example

Problem − Suppose L(G) = {a^m b^n | m ≥ 0 and n > 0}. We have to find a grammar G which
produces L(G).
Solution −
Since L(G) = {a^m b^n | m ≥ 0 and n > 0},
the set of strings generated can be rewritten as −
L(G) = {b, ab, bb, aab, abb, …}
Here, every string must contain at least one ‘b’, preceded by any number of ‘a’s (possibly none).
To generate the string set {b, ab, bb, aab, abb, …}, we take the productions −
S → aS, S → B, B → b and B → bB
S ⇒ B ⇒ b (Generated)
S ⇒ B ⇒ bB ⇒ bb (Generated)
S ⇒ aS ⇒ aB ⇒ ab (Generated)
S ⇒ aS ⇒ aaS ⇒ aaB ⇒ aab (Generated)
S ⇒ aS ⇒ aB ⇒ abB ⇒ abb (Generated)
In the same way, every string in L(G) can be derived from S using this production set, and no other
terminal string can be derived.
Hence the grammar −
G: ({S, B}, {a, b}, S, {S → aS | B, B → b | bB})

Example

Problem − Suppose L(G) = {a^m b^n | m > 0 and n ≥ 0}. We have to find a grammar G which
produces L(G).
Solution −
Since L(G) = {a^m b^n | m > 0 and n ≥ 0}, the set of strings generated can be rewritten as −
L(G) = {a, aa, ab, aaa, aab, abb, …}
Here, every string must contain at least one ‘a’, followed by any number of ‘b’s (possibly none).
To generate the string set {a, aa, ab, aaa, aab, abb, …}, we take the productions −
S → aA, A → aA, A → B, B → bB, B → λ
S ⇒ aA ⇒ aB ⇒ aλ = a (Generated)
S ⇒ aA ⇒ aaA ⇒ aaB ⇒ aaλ = aa (Generated)
S ⇒ aA ⇒ aB ⇒ abB ⇒ abλ = ab (Generated)
S ⇒ aA ⇒ aaA ⇒ aaaA ⇒ aaaB ⇒ aaaλ = aaa (Generated)
S ⇒ aA ⇒ aaA ⇒ aaB ⇒ aabB ⇒ aabλ = aab (Generated)
S ⇒ aA ⇒ aB ⇒ abB ⇒ abbB ⇒ abbλ = abb (Generated)
In the same way, every string in L(G) can be derived from S using this production set, and no other
terminal string can be derived.
Hence the grammar −
G: ({S, A, B}, {a, b}, S, {S → aA, A → aA | B, B → λ | bB})
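
A grammar constructed this way can be sanity-checked by enumerating every short string it generates and comparing the result with the intended language. The sketch below is one possible way to do this in Python, not a definitive tool. It assumes single-character symbols, uppercase letters for non-terminals, and the empty string for λ; the function name generate and the pruning bound max_len + 1 are assumptions that happen to be sufficient for the two grammars of this section, whose sentential forms never hold more than one non-terminal.

```python
from collections import deque


def generate(productions, start, max_len):
    """Breadth-first search over sentential forms.

    Returns every terminal string of length <= max_len derivable from `start`.
    Forms longer than max_len + 1 are pruned (an assumption that suits
    grammars with at most one non-terminal per sentential form).
    """
    seen = {start}
    queue = deque([start])
    strings = set()
    while queue:
        form = queue.popleft()
        if not any(sym.isupper() for sym in form):   # only terminals left
            if len(form) <= max_len:
                strings.add(form)
            continue
        for lhs, rhs in productions:
            pos = form.find(lhs)
            while pos != -1:                         # try every occurrence of lhs
                new_form = form[:pos] + rhs + form[pos + len(lhs):]
                if len(new_form) <= max_len + 1 and new_form not in seen:
                    seen.add(new_form)
                    queue.append(new_form)
                pos = form.find(lhs, pos + 1)
    return strings


# S → aS | B, B → b | bB        (language a^m b^n, m >= 0, n > 0)
first_grammar = [("S", "aS"), ("S", "B"), ("B", "b"), ("B", "bB")]
# S → aA, A → aA | B, B → λ | bB  (language a^m b^n, m > 0, n >= 0)
second_grammar = [("S", "aA"), ("A", "aA"), ("A", "B"), ("B", ""), ("B", "bB")]

print(sorted(generate(first_grammar, "S", 3), key=lambda s: (len(s), s)))
print(sorted(generate(second_grammar, "S", 3), key=lambda s: (len(s), s)))
```

Comparing the output with the sets written out in the two solutions gives a quick check that no string is missing and no unwanted string is generated, at least up to the chosen length.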

Chomsky Classification of Grammars


According to Noam Chomsky, there are four types of grammars − Type 0, Type 1, Type 2, and Type 3.
The following table shows how they differ from each other −

Grammar Type   Grammar Accepted            Language Accepted                  Automaton
Type 0         Unrestricted grammar        Recursively enumerable language    Turing machine
Type 1         Context-sensitive grammar   Context-sensitive language         Linear-bounded automaton
Type 2         Context-free grammar        Context-free language              Pushdown automaton
Type 3         Regular grammar             Regular language                   Finite state automaton

[Illustration omitted: the scope of each type of grammar. The language classes are nested: regular ⊂ context-free ⊂ context-sensitive ⊂ recursively enumerable.]

Type - 3 Grammar

Type-3 grammars generate regular languages. Type-3 grammars must have a single non-terminal on the
left-hand side and a right-hand side consisting of a single terminal or single terminal followed by a
single non-terminal.
The productions must be in the form X → a or X → aY
where X, Y ∈ N (Non terminal)
and a ∈ T (Terminal)
The rule S → ε is allowed if S does not appear on the right side of any rule.

Example

X → ε
X → a | aY
Y → b

Type - 2 Grammar

Type-2 grammars generate context-free languages.


The productions must be in the form A → γ
where A ∈ N (Non terminal)
and γ ∈ (T ∪ N)* (String of terminals and non-terminals).
The languages generated by these grammars are recognized by a non-deterministic pushdown
automaton.
Example

S → Xa
X → a
X → aX
X → abc
X → ε

Type - 1 Grammar

Type-1 grammars generate context-sensitive languages. The productions must be in the form


αAβ→αγβ
where A ∈ N (Non-terminal)
and α, β, γ ∈ (T ∪ N)* (Strings of terminals and non-terminals)
The strings α and β may be empty, but γ must be non-empty.
The rule S → ε is allowed if S does not appear on the right side of any rule. The languages generated by
these grammars are recognized by a linear bounded automaton.

Example

AB → AbBc
A → bcA
B → b

Type - 0 Grammar

Type-0 grammars generate recursively enumerable languages. The productions have no restrictions.
They are phrase structure grammars in the most general sense and include all formal grammars.
They generate the languages that are recognized by a Turing machine.
The productions are of the form α → β, where α is a non-empty string of terminals and non-terminals
containing at least one non-terminal, and β is any string of terminals and non-terminals.

Example

S → ACaB
Bc → acB
CB → DB
aD → Db
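
The four definitions can be turned into a small classifier that reports the most restrictive type whose production form every rule satisfies. This is a sketch under stated assumptions rather than a reference implementation: symbols are single characters, ε is the empty string, the function names are hypothetical, and the Type-1 test checks the literal form α A β → α γ β given in this lesson.

```python
def is_type3(lhs, rhs, nonterminals, start, start_on_rhs):
    """X → a or X → aY, plus S → ε when S never appears on a right side."""
    if len(lhs) != 1 or lhs not in nonterminals:
        return False
    if rhs == "":
        return lhs == start and not start_on_rhs
    if rhs[0] in nonterminals:
        return False
    return len(rhs) == 1 or (len(rhs) == 2 and rhs[1] in nonterminals)


def is_type2(lhs, rhs, nonterminals, start, start_on_rhs):
    """A → γ with a single non-terminal on the left and any string γ."""
    return len(lhs) == 1 and lhs in nonterminals


def is_type1(lhs, rhs, nonterminals, start, start_on_rhs):
    """α A β → α γ β with γ non-empty, plus S → ε when S is not on a right side."""
    if rhs == "":
        return lhs == start and not start_on_rhs
    for i, sym in enumerate(lhs):
        if sym in nonterminals:
            alpha, beta = lhs[:i], lhs[i + 1:]
            if (len(rhs) > len(alpha) + len(beta)
                    and rhs.startswith(alpha) and rhs.endswith(beta)):
                return True
    return False


def classify(productions, nonterminals, start):
    """Return 3, 2, 1 or 0: the most restrictive type that all productions satisfy."""
    start_on_rhs = any(start in rhs for _, rhs in productions)
    for level, check in ((3, is_type3), (2, is_type2), (1, is_type1)):
        if all(check(l, r, nonterminals, start, start_on_rhs) for l, r in productions):
            return level
    return 0


# The four example grammars of this lesson ("" stands for ε).
type3_example = [("X", ""), ("X", "a"), ("X", "aY"), ("Y", "b")]
type2_example = [("S", "Xa"), ("X", "a"), ("X", "aX"), ("X", "abc"), ("X", "")]
type1_example = [("AB", "AbBc"), ("A", "bcA"), ("B", "b")]
type0_example = [("S", "ACaB"), ("Bc", "acB"), ("CB", "DB"), ("aD", "Db")]

print(classify(type3_example, {"X", "Y"}, "X"))
print(classify(type2_example, {"S", "X"}, "S"))
print(classify(type1_example, {"A", "B"}, "A"))
print(classify(type0_example, {"S", "A", "B", "C", "D"}, "S"))
```

The classifier judges the written form of the rules only. A language may still belong to a more restrictive class if some other grammar for it exists.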

Regular Expressions
A Regular Expression can be recursively defined as follows −
• ε is a Regular Expression denoting the language containing only the empty string. (L(ε) = {ε})
• φ is a Regular Expression denoting the empty language. (L(φ) = { })
• x is a Regular Expression, for any symbol x of the alphabet, where L(x) = {x}
• If X is a Regular Expression denoting the language L(X) and Y is a Regular Expression denoting
the language L(Y), then
o X + Y is a Regular Expression corresponding to the language L(X) ∪ L(Y), i.e. L(X+Y) = L(X) ∪ L(Y).
o X . Y is a Regular Expression corresponding to the language L(X) . L(Y), i.e. L(X.Y) = L(X) . L(Y).
o X* is a Regular Expression corresponding to the language L(X*), where L(X*) = (L(X))*.
• Any expression obtained by applying the rules above a finite number of times is a Regular Expression.

Some RE Examples

Regular Expression       Regular Set
(0 + 10*)                L = {0, 1, 10, 100, 1000, 10000, …}
(0*10*)                  L = {1, 01, 10, 010, 0010, …}
(0 + ε)(1 + ε)           L = {ε, 0, 1, 01}
(a+b)*                   Set of strings of a’s and b’s of any length including the null string. So L = {ε, a, b, aa, ab, bb, ba, aaa, …}
(a+b)*abb                Set of strings of a’s and b’s ending with the string abb. So L = {abb, aabb, babb, aaabb, ababb, …}
(11)*                    Set of strings consisting of an even number of 1’s, including the empty string. So L = {ε, 11, 1111, 111111, …}
(aa)*(bb)*b              Set of strings consisting of an even number of a’s followed by an odd number of b’s. So L = {b, aab, aabbb, aabbbbb, aaaab, aaaabbb, …}
(aa + ab + ba + bb)*     Strings of a’s and b’s of even length, obtained by concatenating any combination of the strings aa, ab, ba and bb, including null. So L = {ε, aa, ab, ba, bb, aaab, aaba, …}
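
The regular sets in the table can be reproduced with Python's standard re module. The sketch below assumes a simple translation from the textbook notation to Python syntax: + becomes |, ε becomes an empty alternative, and spaces are dropped. The helper names to_python_regex and regular_set are hypothetical.

```python
import re
from itertools import product


def to_python_regex(textbook_re):
    """Translate textbook notation: '+' (union) becomes '|', 'ε' becomes empty."""
    return textbook_re.replace(" ", "").replace("+", "|").replace("ε", "")


def regular_set(textbook_re, alphabet, max_len):
    """List, shortest first, all strings up to max_len that the expression denotes."""
    pattern = re.compile(to_python_regex(textbook_re))
    words = ("".join(t) for n in range(max_len + 1)
             for t in product(alphabet, repeat=n))
    return [w for w in words if pattern.fullmatch(w)]


print(regular_set("(0 + 10*)", "01", 3))
print(regular_set("(0 + ε)(1 + ε)", "01", 3))
print(regular_set("(a+b)*abb", "ab", 4))
print(regular_set("(11)*", "1", 4))
```

The empty string in the output corresponds to ε in the table. fullmatch is used so that the whole string, not just a prefix, has to match the expression.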

Formal definition of CFG


• A Context-free grammar is a 4-tuple
(V, Σ, R, S) where
1. V is a finite set called the variables (nonterminals)
2. Σ is a finite set (disjoint from V) called the
terminals,
3. R is a finite set of rules, where each rule maps
a variable to a string s ∈ (V ∪ Σ)*
4. S ∈ V is the start symbol

Regular Sets
Any set that represents the value of the Regular Expression is called a Regular Set.

Properties of Regular Sets


Property 1. The union of two regular sets is regular.

Proof −

Let us take two regular expressions


RE1 = a(aa)* and RE2 = (aa)*
So, L1 = {a, aaa, aaaaa,.....} (Strings of odd length excluding Null)
and L2 ={ ε, aa, aaaa, aaaaaa,.......} (Strings of even length including Null)
L1 ∪ L2 = { ε, a, aa, aaa, aaaa, aaaaa, aaaaaa,.......}
(Strings of all possible lengths including Null)
RE (L1 ∪ L2) = a* (which is a regular expression itself)

Hence, proved.
Property 2. The intersection of two regular sets is regular.

Proof −
Let us take two regular expressions
RE1 = a(a*) and RE2 = (aa)*
So, L1 = { a,aa, aaa, aaaa, ....} (Strings of all possible lengths excluding Null)
L2 = { ε, aa, aaaa, aaaaaa,.......} (Strings of even length including Null)
L1 ∩ L2 = { aa, aaaa, aaaaaa,.......} (Strings of even length excluding Null)
RE (L1 ∩ L2) = aa(aa)* which is a regular expression itself.

Hence, proved.
Property 3. The complement of a regular set is regular.

Proof −
Let us take a regular expression over the alphabet {a} −
RE = (aa)*
So, L = {ε, aa, aaaa, aaaaaa, .......} (Strings of even length including Null)
The complement of L is the set of all strings over {a} that are not in L.
So, L’ = {a, aaa, aaaaa, .....} (Strings of odd length excluding Null)
RE (L’) = a(aa)* which is a regular expression itself.
Hence, proved.

Property 4. The difference of two regular sets is regular.


Proof −
Let us take two regular expressions −
RE1 = a (a*) and RE2 = (aa)*
So, L1 = {a, aa, aaa, aaaa, ....} (Strings of all possible lengths excluding Null)
L2 = { ε, aa, aaaa, aaaaaa,.......} (Strings of even length including Null)
L1 – L2 = {a, aaa, aaaaa, aaaaaaa, ....}
(Strings of all odd lengths excluding Null)
RE (L1 – L2) = a (aa)* which is a regular expression.

Hence, proved.

Property 5. The reversal of a regular set is regular.

Proof −
We have to prove that L^R (the reversal of L) is also regular if L is a regular set.
Let L = {01, 10, 11}
RE (L) = 01 + 10 + 11
L^R = {10, 01, 11}
RE (L^R) = 10 + 01 + 11, which is regular

Hence, proved.

Property 6. The closure of a regular set is regular.

Proof −
If L = {a, aaa, aaaaa, .......} (Strings of odd length excluding Null)
i.e., RE (L) = a (aa)*
L* = {ε, a, aa, aaa, aaaa, aaaaa, ……} (Strings of all lengths including Null, since the closure always contains ε)
RE (L*) = a*

Hence, proved.

Property 7. The concatenation of two regular sets is regular.

Proof −

Let RE1 = (0+1)*0 and RE2 = 01(0+1)*


Here, L1 = {0, 00, 10, 000, 010, ......} (Set of strings ending in 0)
and L2 = {01, 010,011,.....} (Set of strings beginning with 01)
Then, L1 L2 = {001,0010,0011,0001,00010,00011,1001,10010,.............}
Set of strings containing 001 as a substring which can be represented by an RE − (0 + 1)*001(0 + 1)*
Hence, proved.
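
The worked examples for Properties 1 to 4 and 7 can be checked by brute force on all strings up to a fixed length. The following Python sketch assumes the textbook + is written as | in Python's re syntax; the helper name lang and the length bounds N and M are arbitrary choices for illustration. It confirms the examples only and is not a proof of the closure properties.

```python
import re
from itertools import product


def lang(pattern, alphabet, max_len):
    """All strings over `alphabet` of length <= max_len that fully match `pattern`."""
    words = ("".join(t) for n in range(max_len + 1)
             for t in product(alphabet, repeat=n))
    return {w for w in words if re.fullmatch(pattern, w)}


N = 8  # length bound for the examples over {a}
odd, even, nonempty, everything = (
    lang(p, "a", N) for p in ("a(aa)*", "(aa)*", "aa*", "a*")
)

checks = {
    "union": odd | even == everything,                            # Property 1
    "intersection": nonempty & even == lang("aa(aa)*", "a", N),   # Property 2
    "complement": everything - even == odd,                       # Property 3
    "difference": nonempty - even == odd,                         # Property 4
}

M = 6  # length bound for the concatenation example over {0, 1}
ends_in_0 = lang("(0|1)*0", "01", M)
starts_with_01 = lang("01(0|1)*", "01", M)
concatenation = {x + y for x in ends_in_0 for y in starts_with_01
                 if len(x + y) <= M}
checks["concatenation"] = concatenation == lang("(0|1)*001(0|1)*", "01", M)  # Property 7

print(checks)
```

Every entry of checks should be True, which matches the regular expressions given for each result in the examples.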

Regular Expressions
• A compact notation to describe regular languages
• Omit braces around one-string sets, use + to denote union, and juxtapose subexpressions to
represent concatenation (without the dot, like we have been doing).
• Useful in
– text search (editors, Unix/grep)
– compilers: lexical analysis
Regular Expressions: Examples
• (0+1)*
– All binary strings
• ((0+1)(0+1))*
– All binary strings of even length
• (0+1)*001(0+1)*
– All binary strings containing the substring 001
• 0* + (0*10*10*10*)*
– All binary strings with #1s ≡ 0 mod 3
• (01+1)*(0+ε)
– All binary strings without two consecutive 0s

Pumping Lemma for context free languages

The name Pumping Lemma combines two words: pumping refers to generating many strings from one
input string by repeating (pumping) part of it again and again, and a lemma is an intermediate
theorem used in a proof.

The pumping lemma is a method to prove that certain languages are not context free.

The set of all context-free languages is identical to the set of languages accepted by pushdown automata.

Theorem:

If L is a context-free language, then there is a constant n depending only on L such that, if w ∈ L and
|w| ≥ n, then w may be divided into five pieces w = uvxyz satisfying the following conditions:

For all i ≥ 0, u v^i x y^i z ∈ L
|vy| ≥ 1
|vxy| ≤ n
Proof:

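Let G be a grammar for L in Chomsky normal form (CNF), so G has no ε-productions. The right-hand side
of every production in CNF contains at most two variables, so every derivation tree of G is a binary
tree: no node has more than two children. A binary tree with many leaves must be tall, so the derivation
tree of a sufficiently long string w contains a path on which some variable A occurs twice. The part of
w derived from the upper A has the form vxy, where x is derived from the lower A. Repeating or removing
the piece of derivation between the two occurrences of A gives u v^i x y^i z ∈ L for every i ≥ 0.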

How to Apply Pumping Lemma?


We can apply the pumping lemma for context-free languages to prove that a given language is not context
free. The steps are given below:

Step 1: Assume that L is a context-free language; we will derive a contradiction. Let n be the constant
given by the pumping lemma.

Step 2: Choose a string w ∈ L with |w| ≥ n. By the pumping lemma, we can write w = uvxyz
with |vy| ≥ 1 and |vxy| ≤ n.

Step 3: For every such way of writing w, find a suitable i so that u v^i x y^i z ∉ L. This contradicts
our assumption, which proves that the given language is not context free.

Example 1:
Let L = {a^n b^n c^n | n ≥ 0}. Using the pumping lemma, show that L is not a context-free language.

Solution:

Step 1: Assume that L is a context-free language; we will derive a contradiction. Let n be the constant
given by the pumping lemma.

Step 2: Let w = a^n b^n c^n, so that |w| = 3n ≥ n. By the pumping lemma we can write w = uvxyz with
|vy| ≥ 1 and |vxy| ≤ n.

Step 3: We consider two cases.

Case 1: v and y each contain only one type of symbol, for example both contain only a’s.

Here, uvxyz = a^n b^n c^n.

Let i = 2.

Then u v^2 x y^2 z has more of at most two of the three symbols than w, while the number of the
remaining symbol stays the same. For example, more a’s are pumped into the string but the number of c’s
remains n. The numbers of a’s, b’s and c’s are no longer equal, so u v^2 x y^2 z ∉ L, which contradicts
the pumping lemma.

Case 2: v or y contains more than one type of symbol.

Every string in L has its a’s first, then its b’s, then its c’s, so the only two-symbol substrings that
mix symbols are ab and bc; ba, ca, ac and cb never occur. Because |vxy| ≤ n, the mixed piece is a
combination of a’s and b’s or of b’s and c’s.

Let i = 2.

Then u v^2 x y^2 z may even contain equal numbers of the three symbols, but not in the correct order.
For example, if v = a^p b^q with p, q ≥ 1, then v^2 = a^p b^q a^p b^q, so the pumped string contains
the substring ba.

Here not all b’s follow the a’s, or not all c’s follow the b’s. Hence the pumped string cannot be a
member of L, and a contradiction occurs. Both cases result in contradictions, so L is not a
context-free language.
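
For small values of n the case analysis can be confirmed by exhaustive search: no way of splitting a^n b^n c^n into uvxyz under the two length conditions survives pumping. The sketch below is an illustration only; the function names and the choice to test the powers i = 0, 2 and 3 are assumptions, and a finite search of this kind supports the argument but does not replace it.

```python
def in_L(s):
    """Membership test for L = {a^n b^n c^n | n >= 0}."""
    n, remainder = divmod(len(s), 3)
    return remainder == 0 and s == "a" * n + "b" * n + "c" * n


def has_pumpable_split(w, n, member, powers=(0, 2, 3)):
    """True if some split w = uvxyz with |vy| >= 1 and |vxy| <= n keeps
    u v^i x y^i z in the language for every i in `powers`."""
    for start in range(len(w) + 1):                               # v begins here
        for end in range(start + 1, min(start + n, len(w)) + 1):  # y ends here
            for v_end in range(start, end + 1):
                for y_start in range(v_end, end + 1):
                    u, v, x = w[:start], w[start:v_end], w[v_end:y_start]
                    y, z = w[y_start:end], w[end:]
                    if not (v or y):                              # |vy| >= 1 required
                        continue
                    if all(member(u + v * i + x + y * i + z) for i in powers):
                        return True
    return False


for n in range(1, 5):
    w = "a" * n + "b" * n + "c" * n
    print(n, has_pumpable_split(w, n, in_L))
```

Each line of output should report False, meaning that every admissible split of w fails for at least one of the tested powers.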

Example 2:

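Let L = {a^n b^n a^n | n > 0}. Using the pumping lemma, show that L is not a context-free language.

Solution:

Step 1: Assume that L is a context-free language; we will derive a contradiction. Let n be the constant
given by the pumping lemma.

Step 2: Let w = a^n b^n a^n, so that |w| = 3n ≥ n. By the pumping lemma we can write w = uvxyz with
|vy| ≥ 1 and |vxy| ≤ n.

Step 3: We consider two cases:

Case 1: v and y each contain only one type of symbol.

Let i = 2.

Because |vxy| ≤ n, the pieces v and y cannot reach both blocks of a’s, so at least one of the three
blocks keeps length n while another block grows. Then u v^2 x y^2 z has the form a^(n+k) b^(n+j) a^n or
a^n b^(n+k) a^(n+j) with k + j ≥ 1, which is not in L.

Case 2: v or y contains both a’s and b’s. All words of the form a^n b^n a^n have exactly one occurrence
of the substring ab and exactly one occurrence of ba, no matter what n is.

Let i = 2.

Then u v^2 x y^2 z has more than one occurrence of ab or of ba, so it cannot be of the form a^n b^n a^n.

Hence u v^2 x y^2 z ∉ L in both cases, a contradiction, so L is not a context-free language.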

Solutions to Practice Problems


Pumping Lemma
1. L = {a^k b^k | k ≥ 0}
see notes

2. L = {a^k | k is a prime number}
Proof by contradiction:

Let us assume L is regular. Clearly L is infinite (there are infinitely many prime numbers). From the
pumping lemma, there exists a number n such that any string in L of length at least n has a “repeatable”
substring generating more strings in the language L. Let us consider the first prime number p ≥ n. For
example, if n was 50 we could use p = 53. From the pumping lemma the string of length p has a
“repeatable” substring. We will assume that this substring is of length k ≥ 1. Hence:
a^p ∈ L and
a^(p+k) ∈ L as well as
a^(p+2k) ∈ L, etc.
It should be relatively clear that p + k, p + 2k, etc., cannot all be prime, but let us add k p times; then
we must have:
a^(p+pk) ∈ L, and of course a^(p+pk) = a^(p(k+1)),
so this would imply that (k + 1)p is prime, which it is not since it is divisible by both p
and k + 1, each of which is greater than 1.

Hence L is not regular.

3. L = {a^n b^(n+1)}
Assume L is regular. From the pumping lemma there exists a p such that every w ∈ L with |w| ≥ p can
be represented as xyz with |y| ≠ 0 and |xy| ≤ p. Let us choose a^p b^(p+1). Its length is 2p + 1 ≥ p.
Since the length of xy cannot exceed p, y must be of the form a^k for some k > 0. From the pumping lemma
a^(p−k) b^(p+1) must also be in L, but it is not of the right form. Hence the language is not regular.

Note that the repeatable string needs to appear in the first n symbols to avoid the following situation:
Assume, for the sake of argument, that n = 20 and you choose the string a^10 b^11, which is of length
larger than 20, but |xy| ≤ 20 allows xy to extend past the a’s into the b’s, which means that y could
contain some b’s. In such a case, removing y could lead to a string which still belongs to L (for
example, y = ab gives a^9 b^10).
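
Both observations in Problem 3 can be checked by enumerating the admissible splits. The Python sketch below is a rough illustration with hypothetical helper names; it only tests pumping down (removing y), which is the case used in the argument.

```python
def in_L(s):
    """Membership test for L = {a^n b^(n+1)}."""
    n = s.count("a")
    return s == "a" * n + "b" * (n + 1)


def splits(w, p):
    """Yield every (x, y, z) with w = xyz, |y| >= 1 and |xy| <= p."""
    limit = min(p, len(w))
    for i in range(limit):
        for j in range(i + 1, limit + 1):
            yield w[:i], w[i:j], w[j:]


def survives_deletion(w, p, member):
    """Splits for which the pumped-down string xz is still in the language."""
    return [(x, y, z) for x, y, z in splits(w, p) if member(x + z)]


# Good choice of string: w = a^p b^(p+1), so y always lies inside the a's.
p = 5
print(survives_deletion("a" * p + "b" * (p + 1), p, in_L))

# Poor choice from the note: a^10 b^11 with n = 20 lets y reach into the b's.
poor = survives_deletion("a" * 10 + "b" * 11, 20, in_L)
print(len(poor) > 0)
```

The first list is empty, so every split of a^p b^(p+1) leads to a contradiction. The second check finds splits such as y = ab whose removal leaves a string in L, which is why the poorly chosen string does not settle the question.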

Summary:
In this lesson you learned the four classes of formal languages introduced by Noam Chomsky.
The most famous classification of grammars and languages, introduced by Noam Chomsky, is divided into
four classes:
• Recursively enumerable grammars – recognizable by a Turing machine.
• Context-sensitive grammars – recognizable by the linear bounded automaton.
• Context-free grammars – recognizable by the pushdown automaton.
• Regular grammars – recognizable by the finite state automaton.

Noam Chomsky is an American linguist, philosopher, cognitive scientist and social
activist. Chomsky is well known in the academic and scientific community as one of the fathers
of modern linguistics and a major figure of analytic philosophy.

0 – Recursively enumerable grammars
Type-0 grammars (unrestricted grammars) include all formal grammars. They generate exactly
all languages that can be recognized by a Turing machine. These languages are also known as the
recursively enumerable languages. Note that this is different from the recursive languages, which
can be decided by an always-halting Turing machine.
Type-0 grammars are too general to describe the syntax of programming languages and natural
languages.

1 – Context-sensitive grammars
Type-1 grammars generate the context-sensitive languages.
These grammars have rules of the form α A β → α γ β where A ∈ N (Non-terminal) and α, β, γ ∈ (T ∪ N)*
(Strings of terminals and non-terminals).
The strings α and β may be empty, but γ must be non-empty.
The rule S → ε is allowed if S does not appear on the right side of any rule. The languages
generated by these grammars are recognized by a linear bounded automaton.

2 – Context-free grammars
Type-2 grammars generate the context-free languages. The productions must be in the form
A → γ with A a nonterminal and γ a string of terminals and nonterminals. These languages are
exactly all languages that can be recognized by a non-deterministic pushdown automaton.
Context-free languages are the theoretical basis for the syntax of most programming languages.

3 - Regular grammars
Type-3 grammars generate the regular languages. Such a grammar restricts its rules to a single
nonterminal on the left-hand side and a right-hand side consisting of a single terminal, possibly followed
(or preceded, but not both in the same grammar) by a single nonterminal.
Evaluation Sheet in CMSC 124
(Automata and Language Theory)

Name: ________________________ Course & Year: _____________ Score: _______

Test I. MULTIPLE CHOICE: Read and understand the questions carefully. Circle the letter of the correct
answer.
1. The entity which generates Language is termed as:
a. Automata
b. Tokens
c. Grammar
d. Data
2. Production rule aAb -> agb belongs to which of the following categories?
a. Regular languages
b. Context-free languages
c. Context-sensitive languages
d. Recursively enumerable languages
3. Which of the following statements is false?
a. Context-free language is a subset of context-sensitive language
b. Regular language is a subset of context-sensitive language
c. Recursively enumerable language is a superset of regular language
d. Context-sensitive language is a subset of context-free language.
4. The grammar can be defined as G = (V, Σ, p, S). In the given definition, what does S represent?
a. Accepting State
b. Starting Variable
c. Sensitive Grammar
d. None of these
5. Which among the following cannot be accepted by a regular grammar?
a. L is a set of numbers divisible by 2
b. L is a set of binary complement
c. L is a set of strings with an odd number of 0s
d. L is the set of strings 0^n 1^n
6. Which of the expressions is appropriate? For production p: a -> b where a ∈ V and b ∈ ?
a. V
b. S
c. (V + Σ)*
d. V + Σ
7. According to Noam Chomsky, there are four types of grammars − Type 0, Type 1, Type 2, and
Type 3.
a. Yes
b. No
8. Which of the following statements is correct?
a. All regular grammars are context free but not vice versa
b. All context-free grammars are regular grammars but not vice versa
c. Regular grammar and context-free grammar are the same entity
d. None of the mentioned.
9. Is an ambiguous grammar context free?
a. Yes
b. No
10. DFA and NDFA conversion is the same?
a. Yes
b. No

Test II. Provide a context-free grammar for each of the following languages:

1. Give the set of strings of base -1 numbers

2. The set of strings of base-8 numbers

3. Give a set of strings of base-16 numbers


Test III. Discussion: Briefly discuss the following in your own words. Write your answer in the
space provided.
1. What is the classification of grammar introduced by Chomsky? 10pts.
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________

2. Which class of grammars corresponds to Turing machines? 10pts.


______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________

KEEP SAFE EVERYONE……
