Unit 3

The document covers formal languages and regular grammars, focusing on their applications in natural language processing. It explains the definitions of formal languages, grammars, and various operations that can be performed on languages, as well as the Chomsky hierarchy of grammars. Additionally, it discusses closure properties of regular sets and provides examples of converting regular grammars to regular expressions and vice versa.

Uploaded by

Abhi

CSE322

Formal Languages & Regular Grammars

Unit 3
Application: Natural Language Processing
- Formal Language -
(formal) Language
(Regular) Grammar
Formal Language
A formal language L is a set of finite-length
words (or "strings") over some finite
alphabet A. ε is the empty word.
Example:
A = {a, b, c}
L1 = {ab, c}
Grammar
A grammar is a standard way of representing a language and is
a quadruple (V, T or Σ, P, S):
V: Finite set of variables
T/ Σ: Finite set of Terminals
P: Production Rules
S: Start Symbol

Purpose of Grammar:
1. A language can be derived from a grammar.
2. A grammar can be used to check whether a string belongs to that
language or not.
Formal Languages - Examples
Some examples of formal languages:
• the set of all words over {a, b},
• the set { a^n | n is a prime number },
• the set of syntactically correct programs in
some programming language, or
• the set of inputs upon which a certain
Turing machine halts.
Several operations can be used to produce new languages from
given ones. Suppose L1 and L2 are languages over some common
alphabet.
• The concatenation L1L2 consists of all strings of the form vw
where v is a string from L1 and w is a string from L2.
• The intersection of L1 and L2 consists of all strings which are
contained in L1 and also in L2.
• The union of L1 and L2 consists of all strings which are contained
in L1 or in L2.
• The complement of the language L1 consists of all strings over the
alphabet which are not contained in L1.
• The Kleene star L1* consists of all strings which can be written in
the form w1w2...wn with strings wi in L1 and n ≥ 0. Note that this
includes the empty string ε because n = 0 is allowed.
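The operations above can be tried out on small finite languages. The following is a minimal Python sketch, not part of the original slides; the function names and the sample languages are chosen only for illustration. Because the complement and the Kleene star are infinite in general, both are cut off at an assumed maximum length.

```python
from itertools import product


def concat(l1, l2):
    """All strings vw with v taken from l1 and w taken from l2."""
    return {v + w for v in l1 for w in l2}


def complement(l1, alphabet, max_len):
    """Strings over `alphabet` of length <= max_len that are not in l1."""
    universe = {"".join(p)
                for n in range(max_len + 1)
                for p in product(sorted(alphabet), repeat=n)}
    return universe - l1


def kleene_star(l1, max_len):
    """Strings of l1* of length <= max_len (the full set is infinite)."""
    result = {""}      # n = 0 contributes the empty string
    frontier = {""}
    while frontier:
        # Extend every newly found string by one more word of l1.
        frontier = {u + w for u in frontier for w in l1
                    if len(u + w) <= max_len} - result
        result |= frontier
    return result


L1 = {"ab", "c"}
L2 = {"c", "ca"}

print(sorted(concat(L1, L2)))                  # ['abc', 'abca', 'cc', 'cca']
print(sorted(L1 | L2), sorted(L1 & L2))        # union, intersection
print(sorted(complement(L1, {"a", "b", "c"}, 1)))   # ['', 'a', 'b']
print(sorted(kleene_star(L1, 3), key=lambda s: (len(s), s)))
```

Union and intersection need no helper because Python sets already provide `|` and `&`.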
A formal language can be specified in a great
variety of ways, such as:
• Strings produced by some formal grammar (see
Chomsky hierarchy)
• Strings produced by a regular expression
• Strings accepted by some automaton, such as a
Turing machine or finite state automaton
• Those instances of a set of related YES/NO
questions for which the answer is YES (see
decision problem)
• Suppose we select a string w ∈ L and write
w = xyz. Which of the following portions
cannot be an empty string?
a) x
b) y
c) z
d) all of the mentioned
• Let w = xyz, where y is the middle portion
and |y| > 0. What do we call the process of
repeating y zero or more times and checking
whether the resulting strings still belong to the language L?
a) Generating
b) Pumping
c) Producing
d) None of the mentioned
Formal Grammar - Definition

A formal grammar G = (N, Σ, P, S) consists of:


• A finite set N of nonterminal symbols.
• A finite set Σ of terminal symbols that is disjoint from
N.
• A finite set P of production rules where a rule is of the
form
• string in (Σ U N)* -> string in (Σ U N)*
– (where * is the Kleene star and U is set union)
– the left-hand side of a rule must contain at least one
nonterminal symbol.
• A symbol S in N that is indicated as the start symbol.
• A regular language over an alphabet Σ is
one that can be obtained from
a) union
b) concatenation
c) Kleene star
d) All of the mentioned
…….
• Answer in accordance with the third and last
statement of the pumping lemma:
For all _______, xy^i z ∈ L
a) i>0
b) i<0
c) i<=0
d) i>=0
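The idea of repeating the middle portion y can be made concrete with a small experiment. This is only an illustrative Python sketch: the language (aa)* and the two decompositions of w = aaaa are assumptions chosen for the example, and Python's `re` module stands in for a membership test.

```python
import re

# Assumed example language: strings of a's of even length, i.e. (aa)*.
LANG = re.compile(r"(aa)*")


def in_language(word):
    """Membership test for the assumed example language."""
    return LANG.fullmatch(word) is not None


def pump(x, y, z, i):
    """Build the string x y^i z."""
    return x + y * i + z


# A decomposition of w = aaaa whose middle part can be repeated freely.
good = ("", "aa", "aa")
# A decomposition of the same w that breaks as soon as y is dropped or doubled.
bad = ("", "a", "aaa")

for i in range(4):
    print(i, in_language(pump(*good, i)), in_language(pump(*bad, i)))
```

With the first decomposition every pumped string stays in the language; with the second, some values of i leave it, which shows that the choice of x, y and z matters.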
Closure properties of regular sets
• Property 1. The union of two regular sets is regular.
• Proof −
• Let us take two regular expressions
• RE1 = a(aa)* and RE2 = (aa)*
• So, L1 = {a, aaa, aaaaa,.....} (Strings of odd length excluding Null)
• and L2 ={ ε, aa, aaaa, aaaaaa,.......} (Strings of even length including
Null)
• L1 ∪ L2 = { ε, a, aa, aaa, aaaa, aaaaa, aaaaaa,.......}
• (Strings of all possible lengths including Null)
• RE (L1 ∪ L2) = a* (which is a regular expression itself)
• Hence, proved.
Closure properties of regular sets
• Property 2. The intersection of two regular sets is regular.
• Proof −
• Let us take two regular expressions
• RE1 = a(a*) and RE2 = (aa)*
• So, L1 = { a,aa, aaa, aaaa, ....} (Strings of all possible lengths excluding
Null)
• L2 = { ε, aa, aaaa, aaaaaa,.......} (Strings of even length including Null)
• L1 ∩ L2 = { aa, aaaa, aaaaaa,.......} (Strings of even length excluding
Null)
• RE (L1 ∩ L2) = aa(aa)* which is a regular expression itself.
• Hence, proved.
Closure properties of regular sets
• Property 3. The complement of a regular set is regular.
• Proof −
• Let us take a regular expression −
• RE = (aa)*
• So, L = {ε, aa, aaaa, aaaaaa, .......} (Strings of even length
including Null)
• The complement of L is the set of all strings over {a} that are not in L.
• So, L’ = {a, aaa, aaaaa, .....} (Strings of odd length excluding
Null)
• RE (L’) = a(aa)* which is a regular expression itself.
• Hence, proved.
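The slide argues from one example. A general argument that is commonly used (not shown in the slides, and sketched here only under that assumption) takes a complete DFA for L and swaps accepting and non-accepting states. The two-state DFA below for (aa)* and all names in the code are hypothetical, written just for this illustration.

```python
def run_dfa(delta, start, accepting, word):
    """Run a complete DFA on `word` and report whether it ends in an accepting state."""
    state = start
    for symbol in word:
        state = delta[(state, symbol)]
    return state in accepting


def complement_accepting(states, accepting):
    """Accepting set of the complement DFA: every state that was not accepting."""
    return states - accepting


# DFA for L = (aa)* over {a}: the state records the parity of the length read so far.
STATES = {"even", "odd"}
DELTA = {("even", "a"): "odd", ("odd", "a"): "even"}
ACCEPTING = {"even"}

CO_ACCEPTING = complement_accepting(STATES, ACCEPTING)

for n in range(5):
    word = "a" * n
    print(n,
          run_dfa(DELTA, "even", ACCEPTING, word),      # in L
          run_dfa(DELTA, "even", CO_ACCEPTING, word))   # in L'
```

The second column is always the negation of the first, so the flipped automaton accepts exactly the odd-length strings a(aa)*.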
Closure properties of regular sets
• Property 4. The difference of two regular sets is regular.
• Proof −
• Let us take two regular expressions −
• RE1 = a (a*) and RE2 = (aa)*
• So, L1 = {a, aa, aaa, aaaa, ....} (Strings of all possible lengths
excluding Null)
• L2 = { ε, aa, aaaa, aaaaaa,.......} (Strings of even length including Null)
• L1 – L2 = {a, aaa, aaaaa, aaaaaaa, ....}
• (Strings of all odd lengths excluding Null)
• RE (L1 – L2) = a (aa)* which is a regular expression.
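The four examples for Properties 1 to 4 can be checked mechanically up to a length bound. The sketch below is an illustration only: it uses Python's `re` module, the helper name `lang` and the bound of 10 are assumptions, and a bounded check is evidence for the examples, not a proof of the closure properties.

```python
import re


def lang(pattern, max_len=10):
    """Strings over {a} of length <= max_len that fully match `pattern`."""
    return {"a" * n for n in range(max_len + 1)
            if re.fullmatch(pattern, "a" * n)}


odd = lang(r"a(aa)*")        # a, aaa, aaaaa, ...
even = lang(r"(aa)*")        # ε, aa, aaaa, ...
nonempty = lang(r"aa*")      # a, aa, aaa, ...
everything = lang(r"a*")     # all strings over {a}

assert odd | even == everything              # Property 1: union gives a*
assert nonempty & even == lang(r"aa(aa)*")   # Property 2: intersection gives aa(aa)*
assert everything - even == odd              # Property 3: complement of (aa)* is a(aa)*
assert nonempty - even == odd                # Property 4: difference gives a(aa)*
print("all four examples agree up to length 10")
```

The complement in Property 3 is taken relative to all strings over {a}, which is why it is computed as `everything - even`.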
Closure properties of regular sets
• Property 5. The reversal of a regular set is regular.
• Proof −
• We have to prove that L^R (the reversal of L) is also regular if L is a regular set.
• Let L = {01, 10, 11}
• RE (L) = 01 + 10 + 11
• L^R = {10, 01, 11}
• RE (L^R) = 10 + 01 + 11, which is regular
• Hence, proved.
• Property 6. The closure of a regular set is regular.
• Proof −
• If L = {a, aaa, aaaaa, .......} (Strings of odd length excluding Null)
• i.e., RE (L) = a (aa)*
• L* = {ε, a, aa, aaa, aaaa, aaaaa, ...} (Strings of all lengths including Null, because zero repetitions are allowed)
• RE (L*) = a*
• Hence, proved.
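Both properties on this slide can be illustrated in a few lines of Python. This is a sketch under stated assumptions: the finite language {01, 10, 11} is used for the reversal, and Python's `re` syntax (with a non-capturing group) stands in for the regular expression (a(aa)*)*.

```python
import re


def reverse_language(language):
    """Reverse every string of a finite language."""
    return {word[::-1] for word in language}


# Property 5: reversing a finite language gives another finite (hence regular) language.
L = {"01", "10", "11"}
print(sorted(reverse_language(L)))   # ['01', '10', '11']

# Property 6: the closure of L = a(aa)* is written as (a(aa)*)*.
closure = re.compile(r"(?:a(?:aa)*)*")
lengths = [n for n in range(8) if closure.fullmatch("a" * n)]
print(lengths)   # [0, 1, 2, 3, 4, 5, 6, 7]
```

Length 0 appears in the second output because the star always admits the empty string.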
Closure properties of regular sets
• Property 7. The concatenation of two regular sets is regular.
• Proof −
• Let RE1 = (0+1)*0 and RE2 = 01(0+1)*
• Here, L1 = {0, 00, 10, 000, 010, ......} (Set of strings ending in
0)
• and L2 = {01, 010,011,.....} (Set of strings beginning with 01)
• Then, L1 L2 =
{001,0010,0011,0001,00010,00011,1001,10010,.............}
• Set of strings containing 001 as a substring which can be
represented by an RE − (0 + 1)*001(0 + 1)*
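The claim that L1 L2 is exactly the set of strings containing 001 can be checked by brute force on short strings. The following Python sketch is illustrative only; the helper names and the length bound are assumptions, and `[01]` is Python's way of writing (0+1).

```python
import re
from itertools import product

L1 = re.compile(r"[01]*0")           # strings ending in 0
L2 = re.compile(r"01[01]*")          # strings beginning with 01
TARGET = re.compile(r"[01]*001[01]*")   # strings containing 001


def in_concat(word):
    """True if word = vw for some split with v in L1 and w in L2."""
    return any(L1.fullmatch(word[:k]) and L2.fullmatch(word[k:])
               for k in range(len(word) + 1))


def first_mismatch(max_len):
    """Return a binary string on which the two descriptions disagree, or None."""
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            word = "".join(bits)
            if in_concat(word) != (TARGET.fullmatch(word) is not None):
                return word
    return None


print(first_mismatch(8))   # None
```

Trying every split point k mirrors the definition of concatenation directly.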
• If L1, L2 are regular and op(L1, L2) is also
regular, then L1 and L2 are said to be
____________ under an operation op.
a) open
b) closed
c) decidable
d) none of the mentioned
• If L1′ and L2′ are regular languages, then
L1.L2 will be
a) regular
b) non regular
c) may be regular
d) none of the mentioned
Language of a Formal Grammar

The language of a formal grammar G = (N, Σ, P, S),
denoted as L(G), is defined as all those
strings over Σ that can be generated by starting
with the start symbol S and then applying the
production rules in P until no more nonterminal
symbols are present.
Language of a Formal Grammar

Example
Consider, for example, the grammar G with N =
{S, B}, Σ = {a, b, c}, P consisting of the
following production rules
1. S -> aBSc
2. S -> abc
3. Ba -> aB
4. Bb -> bb

This grammar defines the language { a^n b^n c^n | n > 0 }
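The four production rules can be applied exhaustively to see which terminal strings they produce. The Python sketch below is an illustration, not the author's method: it does a breadth-first search over sentential forms and assumes a length bound, which is safe here because no rule makes a string shorter.

```python
from collections import deque

# The four production rules of the example grammar.
RULES = [("S", "aBSc"), ("S", "abc"), ("Ba", "aB"), ("Bb", "bb")]


def generate(rules, start="S", max_len=9):
    """Terminal strings of length <= max_len derivable from `start`."""
    seen = {start}
    queue = deque([start])
    words = set()
    while queue:
        form = queue.popleft()
        if form.islower():          # no nonterminal (uppercase letter) left
            words.add(form)
            continue
        for lhs, rhs in rules:
            pos = form.find(lhs)
            while pos != -1:        # rewrite every occurrence of lhs
                new = form[:pos] + rhs + form[pos + len(lhs):]
                if len(new) <= max_len and new not in seen:
                    seen.add(new)
                    queue.append(new)
                pos = form.find(lhs, pos + 1)
    return words


print(sorted(generate(RULES), key=len))   # ['abc', 'aabbcc', 'aaabbbccc']
```

Raising `max_len` to 12 would add aaaabbbbcccc, and so on for larger bounds.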


• Which of the following expressions is appropriate?
For a production p: a -> b, where a ∈ V and
b ∈ _______
a) V
b) S
c) (V+∑)*
d) V+ ∑
CSE322
The Chomsky Hierarchy

Lecture #16
Recursively Enumerable Language and Recursive Language
S.No. | Recursively Enumerable (RE) Language | Recursive (REC) Language
1 | L is RE if there is a TM that accepts it | L is recursive (REC) if there is a halting/total TM that accepts it
2 | 3 possible outcomes: Halt and Accept; Halt and Reject; Never Halt | 2 possible outcomes: Halt and Accept; Halt and Reject
3 | Closed under all operations except set difference and complement | Closed under all operations except homomorphism and substitution

REC is a subset of RE
If X -> Y is a production:
Type 0: at least one variable in X
Type 1: Type 0 + |X| <= |Y|
Type 2: Type 1 + |X| = 1
Type 3: Type 2 + Y ∈ VT* + T*
OR
Y ∈ T*V + T*
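The four conditions above can be turned into a small classifier for a single production X -> Y. This Python sketch is only an illustration and rests on two assumptions: uppercase letters stand for variables and lowercase letters for terminals, and only one production is examined at a time (a whole grammar is Type 3 only if all its productions use the same linear form).

```python
def production_type(x, y):
    """Most restrictive type (3, 2, 1 or 0) of the production x -> y.

    Returns None when x contains no variable, which is not a valid production.
    Uppercase letters are treated as variables, lowercase as terminals.
    """
    if not any(ch.isupper() for ch in x):
        return None
    if len(x) > len(y):
        return 0                      # violates |X| <= |Y|
    if len(x) != 1:
        return 1                      # left side is longer than one symbol
    variable_positions = [i for i, ch in enumerate(y) if ch.isupper()]
    if not variable_positions:
        return 3                      # Y in T*
    if len(variable_positions) == 1 and variable_positions[0] in (0, len(y) - 1):
        return 3                      # Y in VT* or T*V
    return 2


for lhs, rhs in [("A", "aB"), ("S", "aBSc"), ("Ba", "aB"), ("AB", "a")]:
    print(lhs, "->", rhs, ": Type", production_type(lhs, rhs))
```

Under these assumptions the four sample productions come out as Type 3, Type 2, Type 1 and Type 0 respectively.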
Chomsky's four types of grammars
• Type-0 grammars (unrestricted grammars): languages recognized by a Turing machine
• Type-1 grammars (context-sensitive grammars): Turing machine with bounded tape
• Type-2 grammars (context-free grammars): non-deterministic pushdown automaton
• Type-3 grammars (regular grammars): regular expressions, finite state automaton
Grammars, Languages, Machines

Grammar | Language | Machine | Production rules
Type-0 | Recursively enumerable | Turing machine | No restrictions
Type-1 | Context-sensitive | Linear-bounded non-deterministic Turing machine | αAβ -> αγβ
Type-2 | Context-free | Non-deterministic pushdown automaton | A -> γ
Type-3 | Regular | Finite state automaton | A -> aB, A -> a
• A grammar can be defined as: G = (V, ∑, P, S)
In the given definition, what does S
represent?
a) Accepting State
b) Starting Variable
c) Sensitive Grammar
d) None of these
• Which among the following cannot be
accepted by a regular grammar?
a) L is a set of numbers divisible by 2
b) L is a set of binary complement
c) L is a set of strings with an odd number of 0s
d) L is the set of strings of the form 0^n 1^n
Type1
Type 1
• Production Rule: aAb->agb belongs to
which of the following category?
a) Regular Language
b) Context free Language
c) Context Sensitive Language
d) Recursively Enumerable Language
Type 2
Type 3
• The entity which generates a language is
termed as:
a) Automata
b) Tokens
c) Grammar
d) Data
Questions
Solution
• Let G be a grammar: S->AB|e, A->a, B->b
Is the given grammar in CNF?

a) Yes
b) No
• With reference to the process of converting a context
free grammar to CNF, the number of variables to be
introduced for the terminals is:
S->ABa
A->aab
B->Ac
a) 3
b) 4
c) 2
d) 5
Recursive and Enumerable Sets
The Chomsky Hierarchy

Non Turing-Acceptable

Turing-Acceptable
decidable

Context-sensitive

Context-free

Regular
Which of the following statements is false?
a) Context free language is a subset of context sensitive language
b) Regular language is a subset of context sensitive language
c) Recursively enumerable language is a superset of regular
language
d) Context sensitive language is a subset of context free language
LANGUAGES AND AUTOMATON
Languages and their Relation
Regular Grammar to Regular Expression
Convert these RG into RE: [grammars 1 to 5 shown as figures]
Convert these RG into RE: [grammars 1 to 4 shown as figures]
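Regular Expression to Regular Grammar
Steps:
1. Construct the FA corresponding to the RE.
2. Derive the right-linear grammar (RLG) and the left-linear grammar (LLG).
2.1. For RLG
• Consider the out-degree of each state: each production has the form input x next state
• The start symbol is the initial state
• The epsilon production is placed at the final state
2.2. For LLG
• Consider the in-degree of each state: each production has the form previous state x input
• The start symbol is the final state
• The epsilon production is placed at the initial state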
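Step 2 can be sketched in Python once the FA is available as a list of transitions. The code below is an illustration under stated assumptions: state names double as grammar variables, the FA has no ε-moves, the LLG helper assumes exactly one final state, and the two-state FA for a(aa)* is a made-up example.

```python
def fa_to_rlg(transitions, start, finals):
    """Right-linear grammar: q --x--> p becomes q -> x p; final states get ε."""
    productions = {}
    for state, symbol, nxt in transitions:
        productions.setdefault(state, []).append(symbol + nxt)
    for state in finals:
        productions.setdefault(state, []).append("ε")
    return start, productions


def fa_to_llg(transitions, start, final):
    """Left-linear grammar: q --x--> p becomes p -> q x; the initial state gets ε.

    Assumes a single final state, which becomes the start symbol.
    """
    productions = {}
    for state, symbol, nxt in transitions:
        productions.setdefault(nxt, []).append(state + symbol)
    productions.setdefault(start, []).append("ε")
    return final, productions


# Hypothetical FA for a(aa)*: A is the initial state, B the only final state.
FA = [("A", "a", "B"), ("B", "a", "A")]

print(fa_to_rlg(FA, "A", {"B"}))   # start A:  A -> aB,  B -> aA | ε
print(fa_to_llg(FA, "A", "B"))     # start B:  B -> Aa,  A -> Ba | ε
```

In the RLG a derivation reads the string left to right along the transitions; in the LLG it builds the string from the final state backwards.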
Regular Expression to Regular
Grammar
Convert this RE into RG
Regular Expression to Regular
Grammar
Regular Expression to Regular
Grammar (Direct Method)
Example
Question: Derive RG for this RE: 0*(1(0+1))*
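One way to check a candidate answer is to compare it with the regular expression on all short binary strings. The right-linear grammar below is one possible answer proposed for this sketch, not a solution taken from the slides; the variable names A, B, C and the length bound are assumptions, and `0*(1[01])*` is the same expression in Python's `re` syntax.

```python
import re
from itertools import product

# Candidate right-linear grammar (an assumption to be checked, start symbol A):
#   A -> 0A | 1B | ε     B -> 0C | 1C     C -> 1B | ε
GRAMMAR = {"A": ["0A", "1B", ""], "B": ["0C", "1C"], "C": ["1B", ""]}
TARGET = re.compile(r"0*(1[01])*")


def derives(grammar, variable, word):
    """True if `variable` derives `word`; bodies are '' or terminal + variable."""
    for body in grammar[variable]:
        if body == "":
            if word == "":
                return True
        elif word[:1] == body[0] and derives(grammar, body[1], word[1:]):
            return True
    return False


def first_mismatch(max_len):
    """Return a binary string where grammar and RE disagree, or None."""
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            word = "".join(bits)
            if derives(GRAMMAR, "A", word) != (TARGET.fullmatch(word) is not None):
                return word
    return None


print(first_mismatch(8))   # None
```

A result of None means the candidate grammar and the expression agree on every binary string up to the chosen length, which supports the answer but does not replace a proof.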
