Unit 3
Application: Natural Language
Processing
- Formal Language -
(formal) Language
(Regular) Grammar
Formal Language
A formal language L is a set of finite-length
words (or "strings") over some finite
alphabet A. ε is the empty word.
Example:
A = {a, b, c}
L1 = {ab, c}
Grammar
A grammar is a standard way of representing a language and is a
quadruple G = (V, T/Σ, P, S):
V: Finite set of variables
T/ Σ: Finite set of Terminals
P: Production Rules
S: Start Symbol
Purpose of Grammar:
1. A language can be derived from a grammar.
2. A grammar checks whether strings belong to that language or
not.
Formal Languages - Examples
Some examples of formal languages:
• the set of all words over {a, b},
• the set { a^n | n is a prime number },
• the set of syntactically correct programs in
some programming language, or
• the set of inputs upon which a certain
Turing machine halts.
Several operations can be used to produce new languages from
given ones. Suppose L1 and L2 are languages over some common
alphabet.
• The concatenation L1L2 consists of all strings of the form vw
where v is a string from L1 and w is a string from L2.
• The intersection of L1 and L2 consists of all strings which are
contained in L1 and also in L2.
• The union of L1 and L2 consists of all strings which are contained
in L1 or in L2.
• The complement of the language L1 consists of all strings over the
alphabet which are not contained in L1.
• The Kleene star L1* consists of all strings which can be written in
the form w1w2...wn with strings wi in L1 and n ≥ 0. Note that this
includes the empty string ε because n = 0 is allowed.
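The operations above can be tried out on small finite languages. The following is a minimal Python sketch, not part of the original slides: languages are modelled as Python sets of strings, L2 is a made-up second language, and the Kleene star and complement are cut off at a chosen bound (`max_factors`, `max_len`) because the full sets are infinite.

```python
from itertools import product

def concatenation(l1, l2):
    """All strings vw with v in l1 and w in l2."""
    return {v + w for v, w in product(l1, l2)}

def kleene_star(l1, max_factors):
    """Finite slice of L1*: products of at most max_factors strings from l1."""
    result = {""}   # n = 0 contributes the empty string
    layer = {""}
    for _ in range(max_factors):
        layer = concatenation(layer, l1)
        result |= layer
    return result

def complement(l1, alphabet, max_len):
    """Strings over the alphabet, up to length max_len, that are not in l1."""
    universe = {"".join(p)
                for n in range(max_len + 1)
                for p in product(sorted(alphabet), repeat=n)}
    return universe - l1

A = {"a", "b", "c"}
L1 = {"ab", "c"}    # the example language from the earlier slide
L2 = {"c", "ca"}    # hypothetical second language, chosen for illustration

print(sorted(concatenation(L1, L2)))   # concatenation L1L2
print(sorted(L1 | L2))                 # union
print(sorted(L1 & L2))                 # intersection
print(sorted(kleene_star(L1, 2)))      # part of L1*
print(sorted(complement(L1, A, 1)))    # part of the complement of L1
```

Union and intersection map directly onto Python's set operators, so only concatenation, star and complement need helper functions.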
A formal language can be specified in a great
variety of ways, such as:
• Strings produced by some formal grammar (see
Chomsky hierarchy)
• Strings produced by a regular expression
• Strings accepted by some automaton, such as a
Turing machine or finite state automaton
• From a set of related YES/NO questions those
ones for which the answer is YES, see
decision problem
• If we select a string w such that w∈L, and
w=xyz. Which of the following portions
cannot be an empty string?
a) x
b) y
c) z
d) all of the mentioned
• Let w= xyz and y refers to the middle portion
and |y|>0.What do we call the process of
repeating y 0 or more times before checking
that they still belong to the language L or not?
a) Generating
b) Pumping
c) Producing
d) None of the mentioned
Formal Grammar - Definition
Example
Consider, for example, the grammar G with N =
{S, B}, Σ = {a, b, c}, P consisting of the
following production rules
1. S -> aBSc
2. S -> abc
3. Ba -> aB
4. Bb -> bb
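To see which strings this grammar derives, the productions can be applied mechanically. The sketch below is an illustration added here, not from the slides: it assumes uppercase letters are variables and lowercase letters are terminals, and it explores sentential forms breadth-first up to a length bound (`max_len`, an arbitrary choice). The bound is safe for this grammar because no production shortens a sentential form.

```python
from collections import deque

# Productions of G as (left-hand side, right-hand side) pairs.
RULES = [("S", "aBSc"), ("S", "abc"), ("Ba", "aB"), ("Bb", "bb")]

def generate(rules, start="S", max_len=9):
    """Breadth-first search over sentential forms no longer than max_len.

    Returns the derived strings that contain terminals only
    (lowercase letters in this sketch).
    """
    seen = {start}
    queue = deque([start])
    words = set()
    while queue:
        form = queue.popleft()
        if form.islower():          # no variables left: a word of the language
            words.add(form)
            continue
        for lhs, rhs in rules:
            pos = form.find(lhs)
            while pos != -1:        # try every occurrence of the left-hand side
                new = form[:pos] + rhs + form[pos + len(lhs):]
                if len(new) <= max_len and new not in seen:
                    seen.add(new)
                    queue.append(new)
                pos = form.find(lhs, pos + 1)
    return words

print(sorted(generate(RULES), key=len))
```

With the bound of 9 the search yields abc, aabbcc and aaabbbccc, which suggests that G generates strings of the form a^n b^n c^n with n ≥ 1.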
Lecture #16
Recursively Enumerable Language
and Recursive Language
S.No.  Recursively Enumerable (RE) Language     Recursive (REC) Language
1      L is RE if there is a TM that            L is Recursive (REC) if there is a
       accepts it                               halting/total TM that decides it
2      3 outcomes: Halt and Accept,             2 outcomes: Halt and Accept,
       Halt and Reject, Never Halt              Halt and Reject
3      Closed under all except set              Closed under all except
       difference and complement                homomorphism and substitution
REC is a subset of RE
For a production X -> Y:
Type 0: At least one variable in X
Type 1: Type 0 + |X| <= |Y|
Type 2: Type 1 + |X| = 1
Type 3: Type 2 + Y ∈ VT* + T* (left-linear)
OR
Y ∈ T*V + T* (right-linear)
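These checks can be applied to a whole set of productions to find the most restrictive type it satisfies. The following Python sketch is one possible reading of the rules, not a definitive classifier: it assumes every symbol is a single character, productions are given as (X, Y) string pairs, and ε-productions are not handled because the simplified rule |X| <= |Y| excludes them. The function name `chomsky_type` is made up for this example.

```python
def chomsky_type(productions, variables):
    """Return the highest type (3, 2, 1 or 0) satisfied by all productions."""
    def has_variable(s):
        return any(ch in variables for ch in s)

    # Type 0: every left-hand side contains at least one variable.
    if not all(has_variable(x) for x, _ in productions):
        raise ValueError("every left-hand side needs at least one variable")
    # Type 1: no production shrinks the string.
    if not all(len(x) <= len(y) for x, y in productions):
        return 0
    # Type 2: the left-hand side is a single variable.
    if not all(len(x) == 1 for x, _ in productions):
        return 1
    # Type 3: all productions right-linear (T*V + T*) or all left-linear (VT* + T*).
    right_linear = all(not has_variable(y[:-1]) for _, y in productions)
    left_linear = all(not has_variable(y[1:]) for _, y in productions)
    return 3 if right_linear or left_linear else 2

# The grammar G from the earlier example slide.
g = [("S", "aBSc"), ("S", "abc"), ("Ba", "aB"), ("Bb", "bb")]
print(chomsky_type(g, {"S", "B"}))                       # context-sensitive
print(chomsky_type([("S", "aSb"), ("S", "ab")], {"S"}))  # context-free
print(chomsky_type([("S", "aS"), ("S", "a")], {"S"}))    # regular
```

Note that the Type 3 test requires one linearity direction for the whole grammar; mixing left-linear and right-linear productions can generate non-regular languages.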
Chomsky's four types of grammars
• Type-0 grammars (unrestricted grammars)
languages recognized by a Turing machine
• Type-1 grammars (context-sensitive grammars)
Turing machine with bounded tape
• Type-2 grammars (context-free grammars)
non-deterministic pushdown automaton
• Type-3 grammars (regular grammars)
regular expressions, finite state automaton
Grammars, Languages, Machines
Grammar  Language                 Machine                          Production rules
Type-0   Recursively enumerable   Turing machine                   No restrictions
Type-1   Context-sensitive        Linear-bounded non-deterministic αAβ -> αγβ
                                  Turing machine
Type-2   Context-free             Non-deterministic pushdown       A -> γ
                                  automaton
Type-3   Regular                  Finite state automaton           A -> aB
                                                                   A -> a
• A grammar can be defined as: G = (V, Σ,
P, S)
In the given definition, what does S
represent?
a) Accepting State
b) Starting Variable
c) Sensitive Grammar
d) None of these
• Which among the following cannot be
accepted by a regular grammar?
a) L is a set of numbers divisible by 2
b) L is a set of binary complement
c) L is a set of string with odd number of 0
d) L is the set of strings 0^n 1^n
Type 1
• Production Rule: aAb->agb belongs to
which of the following category?
a) Regular Language
b) Context free Language
c) Context Sensitive Language
d) Recursively Enumerable Language
Type 2
Type 3
• The entity which generates a language is
termed:
a) Automata
b) Tokens
c) Grammar
d) Data
Questions
Solution
• Let G be a grammar: S -> AB | ε, A -> a, B -> b
Is the given grammar in CNF?
a) Yes
b) No
• With reference to the process of conversion of a context
free grammar to CNF, the number of variables to be
introduced for the terminals are:
S->ABa
A->aab
B->Ac
a) 3
b) 4
c) 2
d) 5
Recursive and Enumerable Sets
The Chomsky Hierarchy
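[Figure: nested language classes, from innermost to outermost:
Regular ⊂ Context-free ⊂ Context-sensitive ⊂ Decidable ⊂
Turing-acceptable. Languages outside all of these are
non-Turing-acceptable.]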
Which of the following statements is false?
a) Context free language is a subset of context sensitive language
b) Regular language is a subset of context sensitive language
c) Recursively enumerable language is a superset of regular
language
d) Context sensitive language is a subset of context free language
LANGUAGES AND AUTOMATON
Languages and their Relation
Regular Grammar to Regular
Expression
Convert these RG into RE:
1.
2.
3.
4.
5.
Convert these RG into RE:
1.
2.
3. 4.
Regular Expression to Regular
Grammar
Steps:
1. Construct the FA corresponding to the RE.
2. Derive the RLG (right-linear grammar) and LLG (left-linear grammar).
2.1. For RLG
• Consider outgoing transitions (outdegree): State -> input Next state
• The start symbol is the initial state
• The ε-production is added for the final state
2.2. For LLG
• Consider incoming transitions (indegree): State -> Previous state input
• The start symbol is the final state
• The ε-production is added for the initial state
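Step 2 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the method from the slides: the FA is a made-up two-state DFA over {0, 1} that accepts strings ending in 1, states double as grammar variables, and the helper names (`fa_to_rlg`, `fa_to_llg`, `rlg_accepts`, `llg_accepts`) are hypothetical. The LLG construction shown assumes a single final state.

```python
from itertools import product

# Hypothetical DFA: accepts binary strings that end in 1.
START = "A"
FINAL = "B"
DELTA = {("A", "0"): "A", ("A", "1"): "B",
         ("B", "0"): "A", ("B", "1"): "B"}

def fa_to_rlg(delta, final):
    """Outgoing transitions: State -> input NextState, plus Final -> ε."""
    rules = [(state, symbol + target) for (state, symbol), target in delta.items()]
    rules.append((final, ""))
    return rules

def fa_to_llg(delta, start):
    """Incoming transitions: State -> PreviousState input, plus Initial -> ε."""
    rules = [(target, state + symbol) for (state, symbol), target in delta.items()]
    rules.append((start, ""))
    return rules

def rlg_accepts(rules, start_symbol, word):
    """Derive left to right, tracking the variable at the end of the form."""
    current = {start_symbol}
    for ch in word:
        current = {rhs[1] for lhs, rhs in rules
                   if lhs in current and rhs[:1] == ch}
    return any(lhs in current and rhs == "" for lhs, rhs in rules)

def llg_accepts(rules, start_symbol, word):
    """Derive right to left, tracking the variable at the front of the form."""
    current = {start_symbol}
    for ch in reversed(word):
        current = {rhs[0] for lhs, rhs in rules
                   if lhs in current and rhs[1:] == ch}
    return any(lhs in current and rhs == "" for lhs, rhs in rules)

rlg = fa_to_rlg(DELTA, FINAL)   # start symbol is the initial state A
llg = fa_to_llg(DELTA, START)   # start symbol is the final state B

for lhs, rhs in rlg:
    print("RLG:", lhs, "->", rhs or "ε")
for lhs, rhs in llg:
    print("LLG:", lhs, "->", rhs or "ε")

# Compare both grammars with the intended language on all short words.
words = ["".join(p) for n in range(5) for p in product("01", repeat=n)]
print(all(rlg_accepts(rlg, START, w) == w.endswith("1") for w in words))
print(all(llg_accepts(llg, FINAL, w) == w.endswith("1") for w in words))
```

If the FA had several final states, the LLG would need an extra start symbol with one production per final state; that case is left out to keep the sketch short.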
Regular Expression to Regular
Grammar
Convert this RE into RG
Regular Expression to Regular
Grammar (Direct Method)
Example
Question: Derive RG for this RE: 0*(1(0+1))*