
ICS 2407

THEORY OF COMPUTING
Chapter 5
Context-Free Grammars and
Languages
Introduction to Grammar
• A grammar is intuitively a set of rules which are used to
construct a language contained in Σ* for some alphabet
Σ.
• These rules allow us to replace symbols or strings of
symbols with other symbols or strings of symbols until
we finally have a string of symbols contained in Σ*,
allowing us to form an element of the language.

25/06/2019 ICS 2407 Theory Of Computing: Chapter 5: Context Free Grammar. Dr Kennedy Ogada
Introduction to Grammar

• By placing restrictions on the rules, we shall
see that we can develop different types of
languages.

• In particular we can restrict our rules to
produce desirable qualities in our language.

Formal Languages
• A formal language is an abstraction of the general
characteristics of programming languages.
• A formal language consists of a set of symbols and
some rules of formation by which those symbols can
be combined into entities called sentences.
• A formal language is the set of all strings
permitted by the rules of formation.

Describing Languages

• So far we have seen two different methods of describing
languages:
– Finite automata and
– Regular expressions.
• Context-free grammars are a more powerful
method of describing languages.
Describing Languages

• Such grammars can describe certain features
that have a recursive structure, which makes
them useful in a variety of applications.

• Context-free grammars were first used in the
study of human languages.

Describing Languages

• Understanding the relationship of terms such
as noun, verb, and preposition and their
respective phrases leads to a natural
recursion, because noun phrases may appear
inside verb phrases and vice versa.

Describing Languages

• Context-free grammars can capture important
aspects of these relationships.

• An important application of context-free
grammars occurs in the specification and
compilation of programming languages.

Context Free Grammars

• How can we tell if a given string is derivable
from a given grammar?

• Explaining a sentence through its grammatical
derivation is familiar from the study of natural
languages and is called parsing.

• Parsing is a way of describing sentence structure.


Context Free Grammars

• Parsing is important whenever we need to
understand the meaning of a sentence, as we
do for instance in translating from one
language to another.
• In computing, this is relevant in interpreters,
compilers and other translating programs.

Context Free Grammars

• The topic of context-free languages is perhaps the
most important aspect of formal language theory
as it applies to programming languages.

• Actual programming languages have many
features that can be described elegantly by
means of context-free languages.

Context Free Grammars

• A grammar for a programming language often
appears as a reference for people trying to learn
the language syntax.

• Designers of compilers and interpreters for
programming languages often start by obtaining a
grammar for the language.

Context Free Grammars

• Most compilers and interpreters contain a
component called a parser that extracts the
meaning of a program prior to generating the
compiled code or performing the interpreted
execution.

Context Free Grammars

• A number of methodologies facilitate the
construction of a parser once a context-free
grammar is available.

• Some tools even automatically generate the
parser from the grammar.

Context Free Grammars

• The collection of languages associated with
context-free grammars is called the
context-free languages.

• They include all the regular languages and
many additional languages.

Context-Free Grammars
• Context-free grammars have played a central role
in compiler technology since the 1960s; they
turned the implementation of parsers (functions
that discover the structure of a program) from a
time-consuming, ad hoc implementation task
into a routine job that can be done in an
afternoon.

Context-Free Grammars

• Context-free grammars have also been used
to describe document formats, via the
so-called document type definition (DTD) that is
used in the XML (Extensible Markup
Language) community for information
exchange on the Web.
Definition of a Context-Free Grammar
• There are 4 important components in a
grammatical description of a language:
– Terminals or terminal symbols.
– Variables or nonterminals or syntactic
categories.
– Start symbol.
– Finite set of Productions or rules.

Context-free grammars

• A grammar consists of a collection of
substitution rules, also called productions.
• Each rule appears as a line in the grammar,
comprising a symbol and a string separated by
an arrow.
• The symbol is called a variable.
Context-free grammars

• The string consists of variables and other
symbols called terminals.

• The variable symbols often are represented by
capital letters.

Context-free grammars

• The terminals are analogous to the input
alphabet and often are represented by lowercase
letters, numbers, or special symbols.

• One variable is designated as the start variable.

• It usually occurs on the left-hand side of the
topmost rule.
Grammar for Palindromes
• Use a context-free grammar to formally express the recursive
definition of palindromes
1. P → ε
2. P → 0
3. P → 1
4. P → 0P0
5. P → 1P1
– 0, 1 are terminals
– P is a variable (or non-terminal, or syntactic category)
– In this grammar, P is also the start variable
– 1-5 are productions (or rules)

Grammar of Palindromes

• The first three rules form the basis.

• They indicate that the class of palindromes
includes the strings ε, 0 and 1.

Grammar of Palindromes

• None of the right sides of these rules (the
portions following the arrows) contains a
variable, which is why they form a basis for
the definition.

Grammar of Palindromes

• The last 2 rules form the inductive part of the
definition.
• For instance, rule 4 says that if we take any
string w from the class P, then 0w0 is also in
class P.
• Rule 5 likewise tells us that 1w1 is also in P.
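
The basis and inductive rules can be read directly as a recursive membership test. The following is a minimal Python sketch under that reading; the function name `in_P` is an illustrative choice and does not come from the slides.

```python
def in_P(w: str) -> bool:
    """Decide whether w belongs to the class P of palindromes over {0, 1}."""
    # Basis: rules 1-3 put ε (the empty string), 0 and 1 in P.
    if w in ("", "0", "1"):
        return True
    # Induction: rules 4-5 say 0w0 and 1w1 are in P whenever w is in P,
    # so strip a matching outer pair and test what remains.
    if len(w) >= 2 and w[0] == w[-1] and w[0] in "01":
        return in_P(w[1:-1])
    return False


print(in_P("0110"))   # True
print(in_P("011"))    # False
```

Each recursive call undoes one application of rule 4 or rule 5, so the chain of calls mirrors the productions that would be used to generate the string.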
Terminals or Terminal Symbols

• There is a finite set of symbols that form the
strings of the language being defined.
• This set was {0, 1} in the example palindrome
just discussed.
• This alphabet is called the Terminals or
Terminal Symbols.
Variables
• There is a finite set of variables.
• Also called nonterminals or syntactic categories.
• Each variable represents a language, i.e., a set of
strings.
• In the previous example, there was only one variable, P,
which was used to represent the class of palindromes
over alphabet {0, 1}.

Start Symbol
• One of the variables represents the language being
defined.
• It is called the start symbol.
• Other variables represent auxiliary classes of strings
that are used to help define the language of the start
symbol.
• In the palindrome example, P, the only variable, is the start symbol.

Productions or Rules
• There is a finite set of productions or rules that
represent the recursive definition of a language.
• Each production consists of:
– A variable that is being (partially) defined by the
production. This variable is often called the head of
the production.
– The production symbol →.

Productions or Rules (cont)
• Each production also consists of:
– A string of zero or more terminals and variables.
• The string, called the body of the production,
represents one way to form strings in the language of
the variable of the head. In so doing, we leave
terminals unchanged and substitute for each variable
of the body any string that is known to be in the
language of that variable.
Formal Definition of CFG’s
• A context-free grammar is a quadruple
G = (V, T, P, S)
where
V is a finite set of variables
T is a finite set of terminals
P is a finite set of productions of the form A → α, where A is a
variable and α ∈ (V ∪ T)*
S ∈ V is the start variable

Examples

• The grammar of palindromes is represented by
Gpal = ({P}, {0,1}, A, P), where A = {P→ε, P→0, P→1,
P→0P0, P→1P1}

• Sometimes we group productions with the same
head, e.g. P → ε | 0 | 1 | 0P0 | 1P1.

• This is called Compact Notation for productions.
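
As an illustration, the quadruple for the palindrome grammar can be written down as plain Python data, with a dictionary playing the role of compact notation. This is only a sketch: the names `V`, `T`, `PRODUCTIONS`, `S` and `is_valid_cfg` are hypothetical, and the empty string stands in for ε.

```python
# Hypothetical encoding of Gpal = (V, T, P, S); names are illustrative only.
V = {"P"}                                   # finite set of variables
T = {"0", "1"}                              # finite set of terminals
PRODUCTIONS = {                             # head -> list of bodies (compact notation)
    "P": ["", "0", "1", "0P0", "1P1"],      # "" stands for ε
}
S = "P"                                     # start variable


def is_valid_cfg(variables, terminals, productions, start):
    """Check the conditions stated in the formal definition."""
    if start not in variables:              # S must be a member of V
        return False
    symbols = variables | terminals
    for head, bodies in productions.items():
        if head not in variables:           # every head is a single variable
            return False
        for body in bodies:                 # every body is a string in (V ∪ T)*
            if any(symbol not in symbols for symbol in body):
                return False
    return True


print(is_valid_cfg(V, T, PRODUCTIONS, S))   # True
```

Using one character per symbol keeps the sketch short; a grammar with multi-character symbol names would need bodies stored as lists instead of strings.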


Context-Free Grammars
• Strings are generated by repeated replacement of
non-terminals with strings of terminals and non-terminals:
– Write down start variable (non-terminal).
– Replace a non-terminal with the right-hand-side of a
rule that has that non-terminal as its left-hand-side.
– Repeat above until no more non-terminals.
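
The three steps above can be sketched as a small enumeration routine. The sketch below is an assumption-laden illustration, not part of the slides: `derive_strings` is a hypothetical name, variables are taken to be exactly the dictionary keys, the empty string stands for ε, and only the leftmost non-terminal is replaced at each step (which is enough to reach every terminal string).

```python
PAL = {"P": ["", "0", "1", "0P0", "1P1"]}   # palindrome grammar, "" stands for ε


def derive_strings(productions, start, max_steps):
    """Return all terminal strings obtainable in at most max_steps replacements."""
    forms = {start}                  # step 1: write down the start variable
    results = set()
    for _ in range(max_steps):       # step 3: repeat
        next_forms = set()
        for form in forms:
            # Every string kept in `forms` still contains a non-terminal.
            pos = next(i for i, c in enumerate(form) if c in productions)
            for body in productions[form[pos]]:
                # step 2: replace the non-terminal with a right-hand side
                new_form = form[:pos] + body + form[pos + 1:]
                if any(c in productions for c in new_form):
                    next_forms.add(new_form)
                else:
                    results.add(new_form)   # no non-terminals left
        forms = next_forms
    return results


print(sorted(derive_strings(PAL, "P", 2), key=lambda w: (len(w), w)))
# ['', '0', '1', '00', '11', '000', '010', '101', '111']
```

Bounding the number of steps is a design choice that guarantees termination even for grammars whose language is infinite.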

Example
• A simple grammar generates strings of 0's and 1's such
that each block of 0's is followed by at least as many 1's.
– S  AS | 
– A  0A1 | A1 | 01
• S  AS  A  0A1  0A11  00111 is a derivation of
the string 00111

Derivations using a Grammar

• Productions of a CFG are used to infer that
certain strings are in the language of a certain
variable.
• There are two approaches to this inference:
– Using the rules from body to head.
– Using productions from head to body.
Using the rules from body to head
• In this case, we take strings known to be in the
language of each of the variables of the body,
concatenate them, in the proper order, with any
terminals appearing in the body, and infer that
the resulting string is in the language of the
variable in the head.
• This procedure is called recursive inference.
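
Recursive inference can be sketched as a loop that keeps applying productions from body to head until nothing new is inferred. The Python below is a minimal illustration under stated assumptions: `recursive_inference` is a hypothetical name, the empty string stands for ε, and a length bound `max_len` is added only so that the loop terminates on infinite languages.

```python
from itertools import product

PAL = {"P": ["", "0", "1", "0P0", "1P1"]}   # palindrome grammar, "" stands for ε


def recursive_inference(productions, max_len):
    """For each variable, infer the strings of length <= max_len in its language."""
    known = {head: set() for head in productions}
    changed = True
    while changed:
        changed = False
        for head, bodies in productions.items():
            for body in bodies:
                # A terminal contributes itself; a variable contributes any
                # string already known to be in that variable's language.
                options = [sorted(known[s]) if s in productions else [s]
                           for s in body]
                for parts in product(*options):
                    w = "".join(parts)      # concatenate in the proper order
                    if len(w) <= max_len and w not in known[head]:
                        known[head].add(w)  # w is in the language of the head
                        changed = True
    return known


print(sorted(recursive_inference(PAL, 3)["P"], key=lambda w: (len(w), w)))
# ['', '0', '1', '00', '11', '000', '010', '101', '111']
```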
Using the rules from head to body
• We expand the start symbol using one of its
productions (i.e. using a production whose head is
the start symbol).

• We further expand the resulting string by replacing
one of the variables by the body of one of its
productions, and so on, until we derive a string
consisting entirely of terminals.
Using the rules from head to body

• The language of the grammar is all strings of
terminals that we can obtain in this way.
• This use of grammars is called derivation.
• The process of deriving strings by applying
productions from head to body requires the
definition of a new relation symbol ⇒.
Derivations from head to body
• The symbol ⇒ is read "derives".
• The ⇒ relationship may be extended to
represent zero, one, or many derivation steps,
much as the transition function δ of a finite
automaton was extended to δ̂.
• For derivations, we use ⇒* to denote zero or
more steps.
Leftmost and Rightmost Derivations

• In order to restrict the number of choices in
deriving a string, it is often useful to require
that at each step we replace the leftmost
variable (leftmost derivation) by one of its
production bodies.

Leftmost and Rightmost Derivations

• Similarly, it is possible to require that at each
step the rightmost variable (rightmost
derivation) is replaced by one of its bodies.
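
To make the two disciplines concrete, the sketch below derives 00111 both ways using the earlier grammar S → AS | ε, A → 0A1 | A1 | 01. It is a hedged illustration: the function `derive` and its parameters are hypothetical, and the empty string stands for ε.

```python
GRAMMAR = {"S": ["AS", ""], "A": ["0A1", "A1", "01"]}   # "" stands for ε


def derive(start, bodies, productions, leftmost=True):
    """Apply the chosen bodies in order, always rewriting the leftmost
    (or rightmost) variable, and return every intermediate string."""
    form = start
    forms = [form]
    for body in bodies:
        positions = [i for i, c in enumerate(form) if c in productions]
        if not positions:
            raise ValueError("no variable left to replace")
        pos = positions[0] if leftmost else positions[-1]
        if body not in productions[form[pos]]:
            raise ValueError(f"{form[pos]} -> {body!r} is not a production")
        form = form[:pos] + body + form[pos + 1:]
        forms.append(form)
    return forms


left = derive("S", ["AS", "0A1", "A1", "01", ""], GRAMMAR, leftmost=True)
right = derive("S", ["AS", "", "0A1", "A1", "01"], GRAMMAR, leftmost=False)
print(" => ".join(left))    # S => AS => 0A1S => 0A11S => 00111S => 00111
print(" => ".join(right))   # S => AS => A => 0A1 => 0A11 => 00111
```

Both runs use the same five productions; only the order in which the variables are expanded differs.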

Language of a CFG

• The language of G = (V, T, P, S), denoted L(G), is

{w ∈ T* : S ⇒* w},

i.e. the set of terminal strings that have derivations
from the start symbol.

• If G is a CFG, we call L(G) a context-free language.


Exercise
• The following grammar generates the language of the regular
expression 0*1(0+1)*.
– S → A1B
– A → 0A | ε
– B → 0B | 1B | ε

• Give leftmost and rightmost derivations of the following strings:
a) 00101
b) 1001
c) 00011
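
The claim that this grammar generates the language of 0*1(0+1)* can be spot-checked by brute force on short strings. The sketch below is illustrative only: `derives` is a hypothetical helper, the empty string stands for ε, the regular expression is written in Python's `re` syntax as `0*1[01]*`, and the simple top-down search relies on the assumption that the grammar has no left recursion (true here).

```python
import re
from functools import lru_cache
from itertools import product

GRAMMAR = {"S": ["A1B"], "A": ["0A", ""], "B": ["0B", "1B", ""]}   # "" stands for ε


@lru_cache(maxsize=None)
def derives(form, w):
    """True if the string of symbols `form` derives the terminal string `w`."""
    if not form:
        return w == ""
    first, rest = form[0], form[1:]
    if first not in GRAMMAR:
        # Terminal: it must match the next input symbol.
        return w.startswith(first) and derives(rest, w[1:])
    # Variable: try each of its bodies in its place.
    return any(derives(body + rest, w) for body in GRAMMAR[first])


# Compare grammar membership with the regular expression on all short strings.
for n in range(7):
    for letters in product("01", repeat=n):
        w = "".join(letters)
        assert derives("S", w) == bool(re.fullmatch(r"0*1[01]*", w))
print("grammar and regular expression agree on all strings up to length 6")
```

Agreement on short strings is evidence, not a proof, but it is a quick way to catch a mistyped production.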

Meaning of Context-Free

• If you have a string of characters (terminals
and nonterminals) and you wish to replace a
nonterminal in the string, a context-free
grammar allows you to do that regardless of
the characters surrounding the nonterminal.

Parse Tree

• There is a tree representation for derivations that
has proved extremely useful.

• This tree shows clearly how the symbols of a
terminal string are grouped into substrings, each
of which belongs to the language of one of the
variables of the grammar.

Parse tree
• More importantly, the tree (called a parse tree), when
used in a compiler, is the data structure of choice to
represent the source program.
• In a compiler, the tree structure of the source program
facilitates the translation of the source program into
executable code by allowing natural, recursive
functions to perform this translation process.

Parse Tree
• Certain grammars allow a terminal string to have more
than one parse tree.
• This makes the grammar unsuitable for a programming
language since a compiler could not tell the structure
of certain source programs, and therefore could not
with certainty deduce what the proper executable code
for the program was.

Parse Tree
• An easier way to picture a derivation is a parse tree:

                <expr>
              /   |    \
        <expr>    *    <expr>
       /   |   \          |
  <expr>   +   <expr>     a
    |            |
    a            a

• The grammar encodes grouping information; this is captured in
the parse tree.

Constructing a Parse Tree

• Given a grammar G = (V, T, P, S), the parse trees
for G are trees satisfying the following conditions:

1. Each interior node is labelled by a variable in V.

2. Each leaf is labelled by either a variable, a
terminal, or ε. However, if the leaf is labelled
ε, then it must be the only child of its parent.
Constructing a Parse Tree

3. If an interior node is labelled A, and its
children are labelled X1, X2, …, Xk respectively, from
the left, then A → X1X2…Xk is a production in
P.

Constructing Parse Tree
• Nodes = variables, terminals, or ε.
– Interior nodes are variables
– Leaf nodes are variables, terminals, or ε
– A leaf can be ε only if it is the only child of its parent.
• A node and its children (from left to right) must form
the head and body of a production
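
These conditions translate into a short recursive check. The sketch below is an illustration under assumptions: a tree is a hypothetical `(label, children)` tuple, the palindrome grammar is used as the example, productions are stored with the empty string as the ε body, and any single-character label that is not a variable is treated as a terminal.

```python
EPS = "ε"
PAL = {"P": ["", "0", "1", "0P0", "1P1"]}   # "" is the body of P → ε


def is_parse_tree(tree, productions):
    """Check the parse tree conditions for a (label, children) tuple."""
    label, children = tree
    if not children:
        return True                      # leaf: variable, terminal, or ε
    if label not in productions:
        return False                     # interior nodes must be variables
    child_labels = [child[0] for child in children]
    if EPS in child_labels:
        # An ε leaf must be the only child of its parent.
        if len(children) != 1 or children[0][1]:
            return False
        body = ""
    else:
        body = "".join(child_labels)
    if body not in productions[label]:
        return False                     # node and children must form a production
    return all(is_parse_tree(child, productions) for child in children)


# Parse tree for 0110: P → 0P0, then P → 1P1, then P → ε.
tree = ("P", [("0", []),
              ("P", [("1", []), ("P", [(EPS, [])]), ("1", [])]),
              ("0", [])])
print(is_parse_tree(tree, PAL))          # True
```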

Example

• E → E + E
• E → I

      E
    / | \
   E  +  E

Exercise

• Draw a parse tree for the derivation

• E ⇒ E * E ⇒ I * E ⇒ a * E ⇒ a * (E) ⇒ a * (E + E)
⇒ a * (I + E) ⇒ a * (a + E) ⇒ a * (a + I) ⇒ a *
(a + I0) ⇒ a * (a + I00) ⇒ a * (a + b00).

The Yield of a Parse Tree

• Looking at the leaves of any parse tree and
concatenating them results in a string,
called the yield.

• The yield of a parse tree is the string of leaves
from left to right.

The Yield of a Parse Tree

• Important are those parse trees where:

– The yield is a terminal string, i.e. all leaves
are labeled either with a terminal or with ε

– The root is labeled by the start variable.
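
Computing the yield is a left-to-right walk over the leaves. The Python sketch below assumes a hypothetical `(label, children)` tuple representation for trees and treats an ε leaf as contributing the empty string; the example tree is built from the palindrome grammar.

```python
EPS = "ε"


def tree_yield(tree):
    """Concatenate the leaf labels of a (label, children) tree from left to right."""
    label, children = tree
    if not children:
        return "" if label == EPS else label   # ε contributes nothing
    return "".join(tree_yield(child) for child in children)


# Parse tree with root P (the start variable) built from P → 0P0, P → 1P1, P → ε.
tree = ("P", [("0", []),
              ("P", [("1", []), ("P", [(EPS, [])]), ("1", [])]),
              ("0", [])])
print(tree_yield(tree))   # 0110
```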

Equivalence of Parse Trees and
Derivations
• The following about a grammar G = (V, T, P, S) and a
terminal string w are all equivalent:
– S * w (i.e., w is in L(G))
– S * w
lm
– S *rmw
– There is a parse tree for G with root S and yield w.

Applications of CFGs
• Grammars are used to describe programming
languages.
• More importantly, there is a mechanical way of
turning the language description as a CFG into a
parser, the component of the compiler that
discovers the structure of the source program
and represents that structure by a parse tree.
Applications of Parse Trees

• The development of XML is widely predicted
to facilitate electronic commerce by allowing
participants to share conventions regarding
the format of orders, product descriptions,
and many other kinds of documents.

Applications of Parse Trees

• An essential part of XML is the Document Type
Definition (DTD), which is essentially a
context-free grammar that describes the
allowable tags and the ways in which these
tags may be nested.

Applications of Parse Trees
• Tags are the familiar keywords in angle brackets,
e.g. <head> and </head> from HTML.
• However, XML tags deal not with the formatting of
text, but with the meaning of text.
• E.g. one could surround a sequence of characters that
was intended to be interpreted as a phone number by
<PHONE> and </PHONE>.

Ambiguity

• A CFG is ambiguous if there is a terminal string
that has multiple leftmost derivations from
the start variable.

– Equivalently: multiple rightmost derivations,
or multiple parse trees.

Ambiguity

• Two parse trees for the string a + a * a:

                <expr>
              /   |    \
        <expr>    *    <expr>
       /   |   \          |
  <expr>   +   <expr>     a
    |            |
    a            a

            <expr>
          /   |    \
    <expr>    +    <expr>
      |           /   |   \
      a     <expr>    *    <expr>
              |              |
              a              a
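
The ambiguity can be confirmed by counting parse trees. The sketch below assumes the grammar implied by the trees, <expr> → <expr> + <expr> | <expr> * <expr> | a, and counts the trees for a string by trying every operator as the root split; `count_parse_trees` is an illustrative name, and input strings are written without spaces.

```python
from functools import lru_cache


def count_parse_trees(w):
    """Count parse trees of w under <expr> → <expr>+<expr> | <expr>*<expr> | a."""

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of parse trees deriving the substring w[i:j] from <expr>.
        total = 1 if w[i:j] == "a" else 0
        # Try each operator position k as the root of the tree; both sides
        # of the operator must be non-empty.
        for k in range(i + 1, j - 1):
            if w[k] in "+*":
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(w))


print(count_parse_trees("a+a*a"))   # 2
```

A count greater than 1 for any string is enough to show that the grammar is ambiguous.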

Inherent Ambiguity

• A CFL L is inherently ambiguous if every CFG
for L is ambiguous.

• Ambiguity of the grammar implies that at
least some strings in its language have
different structures (parse trees).

Inherent Ambiguity

• Thus, such a grammar is unlikely to be useful
for a programming language, because two
structures for the same string (program)
imply two different meanings.

Inherent Ambiguity

• Common example: the easiest grammars for
arithmetic expressions are ambiguous and
need to be replaced by more complex,
unambiguous grammars.
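
One standard textbook replacement layers the grammar by precedence: E → E + T | T, T → T * F | F, F → a. This grammar is not taken from the slides; it is given here as an assumed illustration, and the sketch below counts its parse trees to show that a + a * a now has exactly one. The function names are hypothetical, and input strings are written without spaces.

```python
from functools import lru_cache


def count_trees_layered(w):
    """Count parse trees of w under E → E+T | T, T → T*F | F, F → a."""

    @lru_cache(maxsize=None)
    def expr(i, j):                      # E → E + T | T
        total = term(i, j)
        for k in range(i + 1, j - 1):
            if w[k] == "+":
                total += expr(i, k) * term(k + 1, j)
        return total

    @lru_cache(maxsize=None)
    def term(i, j):                      # T → T * F | F
        total = factor(i, j)
        for k in range(i + 1, j - 1):
            if w[k] == "*":
                total += term(i, k) * factor(k + 1, j)
        return total

    def factor(i, j):                    # F → a
        return 1 if w[i:j] == "a" else 0

    return expr(0, len(w))


print(count_trees_layered("a+a*a"))      # 1
```

The extra variables T and F force * to group more tightly than + and force both operators to group to the left, which is what removes the second tree.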

Inherent Ambiguity

• An inherently ambiguous language would be
absolutely unsuitable as a programming
language, because we would not have any way
of fixing a unique structure for all its programs.

Assignment
• The following grammar generates the language of the regular
expression 0*1(0+1)*:
• S → A1B
• A → 0A | ε
• B → 0B | 1B | ε
• Give the leftmost and rightmost derivations of the following strings:
a) 00101
b) 1001
c) 00011

The End
