
Compiler Design (CoSc3102)

Chapter Three: Syntax Analysis and Yacc


Outline
3.1. Role of a parser
3.2. Context Free Grammar
3.3. Derivation, parse tree, ambiguity, left recursion, left factoring
3.4. Syntax error handling
3.5. Types of parsing
3.6. Parser Generator Yacc

3.1. Role of a Syntax Analyzer (parser)


What does parsing mean in compiler design?
A parser is a compiler or interpreter component that breaks data into smaller elements for easy translation into
another language. A parser takes input in the form of a sequence of tokens, interactive commands, or program
instructions and breaks them up into parts that can be used by other components of the compiler.

Syntax analysis is the second phase of a compiler. In this chapter, we will learn the basic concepts used in the
construction of a parser. In Chapter 2, we saw that a lexical analyzer can identify tokens with the help of
regular expressions and pattern rules. The syntax analyzer, or parser, takes its input from the lexical analyzer
in the form of a token stream. The parser analyzes the source code (the token stream) against the production
rules to detect any errors in the code. The output of this phase is a parse tree.
Token & token Intermediate
read char value Parse
Lexical Rest of Representation
Source Parser Tree
Analyzer Front end
program
put back char getNextToken
id

Symbol table
In this way, the parser accomplishes two tasks: parsing the code to look for errors, and generating a parse tree
as the output of the phase. Parsers are expected to parse the whole code even if some errors exist in the
program, and they use error-recovery strategies to do so. A lexical analyzer, by contrast, cannot check the
syntax of a given sentence, due to the limitations of regular expressions: regular expressions cannot check
balanced constructs, such as matched parentheses. Therefore, the syntax analysis phase uses a context-free
grammar (CFG), which is recognized by push-down automata; the syntax of a language is specified by such a
context-free grammar.

3.2. Context Free Grammar


CFG is a superset of Regular Grammar: the regular languages form a proper subset of the context-free
languages.


CFG is a helpful tool in describing the syntax of programming languages. Every regular grammar is also
context-free, but there exist constructs, such as arbitrarily nested parentheses, that are beyond the power of
regular grammars. The rules in a CFG are mostly recursive. A syntax analyzer checks whether a given program
satisfies the rules implied by a CFG or not. If it does, the syntax analyzer creates a parse tree for the given
program.

Before we proceed to the details of parsing with CFGs, let us first see the definition of a context-free grammar
and introduce the terminology used in parsing technology. A context-free grammar has four components:

 A set of non-terminals (V): syntactic variables that denote sets of strings, which help define the
language generated by the grammar. Non-terminals are conventionally written as capital letters, and
they can be expanded further.
 A set of tokens, known as terminal symbols (Σ): the basic symbols from which strings are formed.
Terminals are conventionally written as small letters, and they have no further derivation.
 A set of productions (P): the productions of a grammar specify the manner in which the terminals and
non-terminals can be combined to form strings. Each production consists of a non-terminal (the left
side of the production), an arrow, and a sequence of terminals and/or non-terminals (the right side of
the production).
 One of the non-terminals is designated as the start symbol (S), from which the derivation begins.

The strings are derived from the start symbol by repeatedly replacing a non-terminal (initially the start symbol)
by the right side of a production, for that non-terminal.

As an example, a CFG for simple arithmetic expressions can be written as follows:


 S => <expression>
 <expression> => ( <expression> )
 <expression> => <expression> + <expression>
 <expression> => <expression> - <expression>
 <expression> => <expression> * <expression>
 <expression> => <expression> / <expression>
 <expression> => number

Example: We take the palindrome language, L = {w | w = w^R}, which cannot be described by means of a
regular expression because it is not a regular language. It can, however, be described by means of a CFG,
as illustrated below:
G = (V, Σ, P, S); Where: G is the grammar,

V = {Q, Z, N}, which is the set of non-terminals,

Σ = {0, 1}, which is the set of terminals,

P = {Q → Z | Q → N | Q → 0 | Q → 1 | Q → ℇ | Z → 0Q0 | N → 1Q1}, which is the set of productions
(the productions Q → 0 and Q → 1 are needed so that odd-length palindromes can be derived as well),
S = {Q}, which is the start symbol.

This grammar describes the palindrome language and generates strings such as: 1001, 11100111, 00100, 1010101, 11111, etc.
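
For instance (an added illustration), the string 00100 is derived as Q ⇒ Z ⇒ 0Q0 ⇒ 0Z0 ⇒ 00Q00 ⇒ 00100,
using the production Q → 1 in the last step.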

 Syntax Analyzer versus Lexical Analyzer


 Is everything that is recognized by the lexical analyzer (LA) also recognized by the syntax analyzer?
 Both of them do similar things; but the lexical analyzer deals with the simple, non-recursive
constructs of the language.

 The syntax analyzer deals with recursive constructs of the language.


 The lexical analyzer simplifies the job of the syntax analyzer.
 The lexical analyzer recognizes the smallest meaningful units (tokens) in a source program.
 The SA works on the smallest meaningful units (tokens) in a source program to recognize
meaningful structures in our programming language.

3.3. Derivation, parse tree, ambiguity, left recursion, left factoring

3.3.1. Derivation
A derivation is basically a sequence of applications of production rules that yields the input string. During
parsing, we take two decisions for some sentential form of the input:
 Deciding which non-terminal is to be replaced.
 Deciding the production rule by which that non-terminal will be replaced.
To decide the order in which non-terminals are replaced, we have two options: left-most derivation and
right-most derivation.

i. Left-most Derivation

If the sentential form of an input is scanned and replaced from left to right, it is called left-most derivation.
The sentential form derived by the left-most derivation is called the left-sentential form.

ii. Right-most Derivation

If we scan and replace the input with production rules, from right to left, it is known as right-most derivation.
The sentential form derived from the right-most derivation is called the right-sentential form.

Example: Production rules:

E → E + E
E → E * E
E → id
For the input string: id + id * id
The left-most derivation is:
E → E * E
E → E + E * E
E → id + E * E
E → id + id * E
E → id + id * id

The right-most derivation is:
E → E + E
E → E + E * E
E → E + E * id
E → E + id * id
E → id + id * id

Notice that in the left-most derivation, the left-most non-terminal is always processed first, whereas in the
right-most derivation, the right-most non-terminal is always processed first.

3.3.2. Parse Tree


A parse tree is a graphical depiction of a derivation. It is convenient to see how strings are derived from the
start symbol. The start symbol of the derivation becomes the root of the parse tree. Let us see this by an example
from the above topic. We take the left-most derivation of a + b * c

The left-most derivation is:
E → E * E
E → E + E * E
E → id + E * E
E → id + id * E
E → id + id * id

The parse tree grows one step at a time, mirroring the derivation:
Step 1: E → E * E
Step 2: E → E + E * E
Step 3: E → id + E * E
Step 4: E → id + id * E
Step 5: E → id + id * id
[Figure: the parse tree after each step, with E at the root and id leaves appearing as the derivation proceeds.]


In a parse tree:

 All leaf nodes are terminals.


 All interior nodes are non-terminals.
 In-order traversal gives original input string.

A parse tree depicts the associativity and precedence of operators. The deepest sub-tree is traversed first;
therefore, the operator in that sub-tree gets precedence over the operators appearing in its parent nodes.

Exercise: for the given input string; a + b * c draw the parse tree by using right-most derivation.

o Hint: Assume the production rule is; E → E + E| E * E | id


o based on this rule, first show the right-most derivation of a + b * c is, then draw the parse tree.

3.3.3. Ambiguity
A grammar G is said to be ambiguous if it has more than one parse tree (equivalently, more than one left-most
or right-most derivation) for at least one string.

Example:
E → E + E
E → E – E
E → id
For the string id - id + id, the above grammar generates two parse trees: one grouping the expression as
(id - id) + id and the other as id - (id + id).

A context-free language for which every grammar is ambiguous is said to be inherently ambiguous. Ambiguity
in a grammar is not good for compiler construction. No method can detect and remove ambiguity automatically,
but it can be removed either by re-writing the whole grammar without ambiguity, or by setting and following
associativity and precedence constraints.

a. Associativity
If an operand has operators on both of its sides, the side on which the operator takes this operand is decided
by the associativity of those operators. If the operation is left-associative, the operand will be taken by the
left operator; if the operation is right-associative, the right operator will take the operand.

Example
Operations such as Addition, Multiplication, Subtraction, and Division are left associative. If the expression
contains:

id op id op id
it will be evaluated as: (id op id) op id
For example, (id + id) + id

Operations like Exponentiation are right associative, i.e., the order of evaluation in the same expression will
be:
id op (id op id)
For example, id ^ (id ^ id)

b. Precedence
If two different operators share a common operand, the precedence of operators decides which will take the
operand. That is, 2+3*4 can have two different parse trees, one corresponding to (2+3)*4 and another
corresponding to 2+(3*4). By setting precedence among operators, this problem can be easily removed. As in
the previous example, mathematically * (multiplication) has precedence over + (addition), so the expression
2+3*4 will always be interpreted as:

2 + (3 * 4)

These methods decrease the chances of ambiguity in a language or its grammar.
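
As an illustration (not from the original notes), the usual way to build precedence and associativity directly
into the grammar is to introduce one non-terminal per precedence level. Here * binds tighter than +, and both
are left-associative because the recursion is on the left:

E → E + T | T
T → T * F | F
F → ( E ) | id

With this grammar, id + id * id has exactly one parse tree, corresponding to id + (id * id).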

3.3.4. Left Recursion


A grammar is left-recursive if it has any non-terminal ‘A’ whose derivation contains ‘A’ itself as the
left-most symbol. A left-recursive grammar is considered a problematic situation for top-down parsers.
Top-down parsers start parsing from the start symbol, which is itself a non-terminal. So, when the parser
encounters the same non-terminal in its derivation, it becomes hard for it to judge when to stop expanding the
left non-terminal, and it can go into an infinite loop.

Example:

(1) A => Aα|β; this is an example of immediate left recursion, where A is any non-terminal
symbol and α and β represent strings of terminals and non-terminals (with β not beginning with A).
(2) S => Aα|β
A => Sd ; this is an example of indirect-left recursion

A top-down parser will first parse the A, which in-turn will yield a string consisting of A itself and the parser
may go into a loop forever.


Removal of Left Recursion

One way to remove left recursion is to use the following technique:

The production: A => Aα | β is converted into following productions

A => βA'
A'=> αA' | ε

This does not impact the strings derived from the grammar, but it removes immediate left recursion. Second
method is to use the following algorithm, which should eliminate all direct and indirect left recursions.

START

Arrange non-terminals in some order like A1, A2, A3,…, An

for each i from 1 to n


{
for each j from 1 to i-1
{
replace each production of the form Ai ⟹ Aj𝜸
with Ai ⟹ δ1𝜸 | δ2𝜸 | … | δn𝜸
where Aj ⟹ δ1 | δ2 | … | δn are the current Aj productions
}
}
eliminate immediate left-recursion

END

Example

The production set


S => Aα | β
A => Sd
after applying the above algorithm, should become
S => Aα | β
A => Aαd | βd
and then, remove immediate left recursion using the first technique.
A => βdA'
A' => αdA' | ε
Now none of the production has either direct or indirect left recursion.

3.3.5. Left Factoring

If two or more production rules of a grammar share a common prefix string, then the top-down parser cannot
make a choice as to which of the productions it should take to parse the string in hand.

Example: If a top-down parser encounters a production like:


A ⟹ αβ | α𝜸 | …
Then it cannot determine which production to follow to parse the string, as both productions start with the
same string α. To remove this confusion, we use a technique called left factoring. Left


factoring transforms the grammar to make it useful for top-down parsers. In this technique, we make one
production for the common prefix, and the rest of the derivation is added by new productions.

Example: The above productions can be written as

A => αA'
A'=> β | 𝜸 | …

Now the parser has only one production per prefix which makes it easier to take decisions.
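
As a concrete illustration (added here, not in the original notes), consider the classic conditional-statement
productions with a common prefix:

stmt → if expr then stmt else stmt
     | if expr then stmt

After left factoring, the parser can consume the common prefix first and defer the choice:

stmt  → if expr then stmt stmt'
stmt' → else stmt | ε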

First and Follow Sets


An important part of parser-table construction is to create the FIRST and FOLLOW sets. These sets give the
terminals that can appear at particular positions in a derivation, and they are used to build the parsing table:
the table entry T[A, t] records which production α should replace A when the lookahead is t.

First Set
This set is created to know what terminal symbol is derived in the first position by a non-terminal. For example,
α → t β
That is α derives t (terminal) in the very first position. So, t ∈ FIRST(α).

Algorithm for calculating First set


Look at the definition of the FIRST(α) set:
 if α is a terminal, then FIRST(α) = { α }.
 if α is a non-terminal and α → ℇ is a production, then ℇ ∈ FIRST(α).
 if α is a non-terminal with a production α → 𝜸1 𝜸2 … 𝜸n, then t ∈ FIRST(α) whenever t ∈ FIRST(𝜸i)
and ℇ ∈ FIRST(𝜸j) for every j < i.
The First set can be summarized as: FIRST(α) = { t | α ⇒* tβ } ∪ { ℇ | α ⇒* ℇ }.

Follow Set
Likewise, we calculate what terminal symbol immediately follows a non-terminal α in production rules. We
do not consider what the non-terminal can generate but instead, we see what would be the next terminal symbol
that follows the productions of a non-terminal.

Algorithm for calculating the Follow set:

 if S is the start symbol, then $ ∈ FOLLOW(S).
 if there is a production A → αBβ, then everything in FIRST(β) except ℇ is in FOLLOW(B).
 if there is a production A → αB, or a production A → αBβ where β can derive ℇ, then everything in
FOLLOW(A) is in FOLLOW(B).

The Follow set can be summarized as: FOLLOW(A) = { t | S ⇒* αAtβ for some α, β }.
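
The following is a minimal C sketch (added for illustration; the toy grammar, its one-character encoding, and
all names are our own assumptions, not part of the notes) of the standard fixpoint computation of FIRST sets:

#include <stdio.h>
#include <ctype.h>

const char *prods[] = {        /* "X=body"; '#' stands for epsilon */
    "E=TR",
    "R=+TR",
    "R=#",
    "T=(E)",
    "T=i",                     /* 'i' stands for the token id */
    NULL
};

int first[128][128];           /* first[X][t] != 0 iff t is in FIRST(X) */

int add(int X, int t) {        /* insert t into FIRST(X); returns 1 if it grew */
    if (first[X][t]) return 0;
    first[X][t] = 1;
    return 1;
}

int main(void) {
    int changed = 1;
    while (changed) {          /* iterate until no set changes (fixpoint) */
        changed = 0;
        for (int p = 0; prods[p]; p++) {
            int X = prods[p][0];
            const char *rhs = prods[p] + 2;
            int nullable = 1;  /* have all symbols so far derived epsilon? */
            for (int k = 0; rhs[k] && nullable; k++) {
                int Y = (unsigned char)rhs[k];
                if (!isupper(Y)) {         /* terminal (or '#'): add it, stop */
                    changed |= add(X, Y);
                    nullable = 0;
                } else {                   /* nonterminal: copy FIRST(Y) \ {#} */
                    for (int t = 0; t < 128; t++)
                        if (t != '#' && first[Y][t])
                            changed |= add(X, t);
                    nullable = first[Y]['#'];
                }
            }
            if (nullable)      /* the whole body can vanish: add epsilon */
                changed |= add(X, '#');
        }
    }
    for (const char *N = "ERT"; *N; N++) {
        printf("FIRST(%c) = {", *N);
        for (int t = 0; t < 128; t++)
            if (first[(int)(unsigned char)*N][t]) printf(" %c", t);
        printf(" }\n");
    }
    return 0;
}

Running it prints FIRST(E) = { ( i }, FIRST(R) = { + # }, and FIRST(T) = { ( i }, with # standing for ℇ.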

Limitations of Syntax Analyzers


Syntax analyzers receive their inputs, in the form of tokens, from lexical analyzers; the lexical analyzer is
responsible for the validity of the tokens supplied to the syntax analyzer. Syntax analyzers have the following
drawbacks –

 it cannot determine if a token is valid,


 it cannot determine if a token is declared before it is being used,
 it cannot determine if a token is initialized before it is being used,
 it cannot determine if an operation performed on a token type is valid or not.

These tasks are accomplished by the semantic analyzer, which we shall study in Semantic Analysis.

3.4. Syntax error handling


A parser should be able to detect and report any error found in the program. It is expected that when an error is
encountered, the parser should be able to handle it and carry on parsing the rest of the input. The parser is
mainly expected to check for syntax errors, but errors may be encountered at various stages of the compilation
process. A program may have the following kinds of errors at the various stages:

 Lexical : name of some identifier typed incorrectly


 Syntactical : missing semicolon or unbalanced parenthesis
 Semantical : incompatible value assignment
 Logical : code not reachable, infinite loop

There are four common error-recovery strategies that can be implemented in the parser to deal with errors
in the code.

a. Panic mode
When a parser encounters an error anywhere in a statement, it ignores the rest of the statement by discarding
input symbols from the erroneous input up to a synchronizing delimiter, such as a semicolon. This is the
easiest way of error recovery, and it also prevents the parser from developing infinite loops.

b. Statement mode
When a parser encounters an error, it tries to take corrective measures so that the rest of the inputs of the
statement allow the parser to parse ahead: for example, inserting a missing semicolon or replacing a comma
with a semicolon. Parser designers have to be careful here, because one wrong correction may lead to an
infinite loop.

c. Error productions
Some common errors that may occur in the code are known to the compiler designers. The grammar can be
augmented with productions that generate the erroneous constructs, so the parser can recognize and report
these errors when they are encountered.

d. Global correction
The parser considers the program in hand as a whole and tries to figure out what the program is intended to do
and tries to find out a closest match for it, which is error-free. When an erroneous input (statement) X is fed, it
creates a parse tree for some closest error-free statement Y. This may allow the parser to make minimal changes
in the source code, but due to the complexity (time and space) of this strategy, it has not been implemented in
practice yet.


Abstract Syntax Trees


Parse tree representations are not easy for the compiler to process, as they contain more details than are
actually needed. [Figure: a full parse tree for an expression.]

Looking closely, we find that most of the leaf nodes are single children of their parent nodes. This information
can be eliminated before feeding the tree to the next phase. By hiding the extra information, we obtain a
condensed tree known as the abstract syntax tree (AST). [Figure: the same expression as an AST.]

ASTs are important data structures in a compiler, carrying the least unnecessary information. ASTs are more
compact than parse trees and can be easily used by a compiler.
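
The following is a minimal C sketch (an illustration of ours, not from the notes) of an AST node type for an
expression such as a + b * c, together with a post-order traversal:

#include <stdio.h>
#include <stdlib.h>

typedef struct Node {
    char op;                   /* '+', '*', ... or 0 for a leaf          */
    const char *name;          /* identifier name when the node is a leaf */
    struct Node *left, *right;
} Node;

Node *leaf(const char *name) {
    Node *n = calloc(1, sizeof *n);
    n->name = name;
    return n;
}

Node *binary(char op, Node *l, Node *r) {
    Node *n = calloc(1, sizeof *n);
    n->op = op; n->left = l; n->right = r;
    return n;
}

void print_postfix(const Node *n) {   /* post-order traversal of the AST */
    if (!n) return;
    print_postfix(n->left);
    print_postfix(n->right);
    if (n->op) printf("%c ", n->op);
    else printf("%s ", n->name);
}

int main(void) {
    /* a + b * c, with * binding tighter than + */
    Node *tree = binary('+', leaf("a"), binary('*', leaf("b"), leaf("c")));
    print_postfix(tree);               /* prints: a b c * +  */
    printf("\n");
    return 0;
}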

3.5. Types of parsing


Syntax analyzers follow production rules defined by means of a context-free grammar. The way the production
rules are implemented (derivation) divides parsing into two types: top-down parsing and bottom-up parsing.


3.5.1. Top-down Parsing

When the parser starts constructing the parse tree from the start symbol and then tries to transform the start
symbol into the input, it is called top-down parsing. That is, the top-down technique parses the input while
constructing the parse tree from the root node, gradually moving down to the leaf nodes. Top-down parsing
uses left-most derivation.

 Example: for the given input string cad and the grammar

S → cAB
A → ab
A → a
B → d

the parser expands S → cAB and matches c against the input. Trying A → ab matches a, but then b fails
against the remaining input d, so the parser backtracks and tries A → a, which succeeds; B → d then matches
the final d, and cad is accepted.
[Figure: the successive parse trees tried during this process; the final tree has root S with children c, A
(deriving a), and B (deriving d).]
The further categories of top-down parsing, described below, are recursive-descent parsing (possibly with
back-tracking) and predictive (LL) parsing.

Recursive Descent Parsing

Recursive descent is a common form of top-down parsing that constructs the parse tree from the top while the
input is read from left to right. It uses one procedure for every terminal and non-terminal entity. It is called
recursive because it uses recursive procedures to process the input, which fits context-free grammars, themselves
recursive in nature. This technique recursively parses the input to build a parse tree, which may or may not
require back-tracking; the grammar associated with it (if not left-factored) cannot avoid back-tracking. A form
of recursive-descent parsing that does not require any back-tracking is known as predictive parsing. A minimal
sketch of a back-tracking-free recursive-descent parser is given below.
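
The following is a minimal C sketch (our own illustration, not from the original notes) of a predictive
recursive-descent parser for the assumed toy grammar E → T E', E' → + T E' | ε, T → id, with the single
character i standing for the token id:

#include <stdio.h>
#include <stdlib.h>

const char *input;        /* token stream, one character per token */
char lookahead;

void error(void) { printf("syntax error\n"); exit(1); }

void match(char t) {      /* consume the expected terminal */
    if (lookahead == t) lookahead = *input++;
    else error();
}

void T(void);
void Eprime(void);

void E(void) { T(); Eprime(); }        /* E  -> T E'            */

void Eprime(void) {                    /* E' -> + T E' | epsilon */
    if (lookahead == '+') { match('+'); T(); Eprime(); }
    /* otherwise take the epsilon production: do nothing */
}

void T(void) { match('i'); }           /* T  -> id ('i')         */

int main(void) {
    input = "i+i+i";
    lookahead = *input++;
    E();
    if (lookahead == '\0') printf("accepted\n");
    else error();
    return 0;
}

Note how each non-terminal becomes one procedure, and the single lookahead token decides which production
to apply without any back-tracking.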

a. Back-tracking
Top- down parsers start from the root node (start symbol) and match the input string against the production
rules to replace them (if matched). Backtracking means, if one derivation of a production fails, the syntax


analyzer restarts the process using different rules of the same production. This technique may process the input
string more than once to determine the right production.

To understand back-tracking in a top-down parser, take the following example CFG, for the input string: read,

S → rXd | rZd
X → oa | ea
Z → ai

The parser will start with S from the production rules and will match its yield to the left-most letter of the
input, i.e. ‘r’. The first production of S (S → rXd) matches it, so the top-down parser advances to the next input
letter, ‘e’. The parser tries to expand the non-terminal ‘X’ and checks its first production from the left (X → oa).
It does not match the next input symbol, so the top-down parser backtracks to obtain the next production rule
of X, (X → ea). Now the parser matches all the input letters in order. The string is accepted.

b. Predictive Parser
Predictive parser is a recursive descent parser, which has the capability to predict which production is to be
used to replace the input string. The predictive parser does not suffer from backtracking. To accomplish its
tasks, the predictive parser uses a look-ahead pointer, which points to the next input symbols. To make the
parser back-tracking free, the predictive parser puts some constraints on the grammar and accepts only a class
of grammar known as LL(k) grammar.

Predictive parsing uses a stack and a parsing table to parse the input and generate a parse tree. Both the stack
and the input contain an end symbol $ to denote that the stack is empty and the input is consumed. The parser
refers to the parsing table to take decisions on the combination of input symbol and stack element.


In recursive descent parsing, the parser may have more than one production to choose from for a single
instance of input, whereas in a predictive parser, each step has at most one production to choose. There might
be instances where no production matches the input string, causing the parsing procedure to fail.

c. LL Parser
An LL Parser accepts LL grammar. LL grammars are a subset of context-free grammars, restricted to a
simplified form in order to achieve an easy implementation. An LL grammar can be implemented by means of
two algorithms, namely recursive-descent or table-driven.

An LL parser is denoted as LL(k). The first L in LL(k) stands for parsing the input from left to right, the second
L stands for left-most derivation, and k represents the number of lookahead symbols. Generally k = 1, so LL(k)
may also be written as LL(1).

LL Parsing Algorithm
We may stick to deterministic LL(1) for the parser explanation, as the size of the table grows exponentially
with the value of k. Secondly, if a given grammar is not LL(1), then it is usually not LL(k) for any other given
k either.

Given below is an algorithm for LL(1) Parsing:

Input:
string ω
parsing table M for grammar G

Output:
If ω is in L(G) then left-most derivation of ω,
error otherwise.

Initial State: $S on stack (with S being start symbol)



ω$ in the input buffer

SET ip to point the first symbol of ω$.

repeat
 let X be the top stack symbol and a the symbol pointed to by ip.

 if X ∈ Vt or X = $
  if X = a
   POP X and advance ip.
  else
   error()
  endif

 else /* X is non-terminal */
  if M[X,a] = X → Y1 Y2 ... Yk
   POP X
   PUSH Yk Yk-1 ... Y1 /* Y1 on top */
   Output the production X → Y1 Y2 ... Yk
  else
   error()
  endif
 endif
until X = $ /* empty stack */
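
Below is a minimal C sketch mirroring the algorithm above (our own illustration; the toy grammar E → T R,
R → + T R | ε, T → i and its one-character encoding are assumptions, not part of the notes):

#include <stdio.h>
#include <string.h>
#include <ctype.h>

char stack[64];
int top = -1;

void push_rev(const char *s) {        /* push a body right-to-left */
    for (int k = (int)strlen(s) - 1; k >= 0; k--)
        if (s[k] != '#') stack[++top] = s[k];   /* '#' = epsilon */
}

/* parsing table: row = nonterminal, column = lookahead, entry = body */
const char *M(char X, char a) {
    if (X == 'E' && a == 'i') return "TR";
    if (X == 'R' && a == '+') return "+TR";
    if (X == 'R' && a == '$') return "#";
    if (X == 'T' && a == 'i') return "i";
    return NULL;                      /* empty cell = syntax error */
}

int main(void) {
    const char *ip = "i+i$";          /* input ends with $ */
    stack[++top] = '$';
    stack[++top] = 'E';               /* $S initially on the stack */
    while (top >= 0) {
        char X = stack[top];
        if (!isupper((unsigned char)X)) {   /* terminal or $ on top */
            if (X != *ip) { printf("error\n"); return 1; }
            top--; ip++;              /* pop and advance */
        } else {
            const char *body = M(X, *ip);
            if (!body) { printf("error\n"); return 1; }
            printf("%c -> %s\n", X, body);   /* output the production */
            top--;                    /* pop X, push the body reversed */
            push_rev(body);
        }
    }
    printf("accepted\n");
    return 0;
}

On the input i+i it prints the left-most derivation E -> TR, T -> i, R -> +TR, T -> i, R -> # and accepts.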

A grammar G is LL(1) if, whenever A → α | β are two distinct productions of G:

 for no terminal a do both α and β derive strings beginning with a.
 at most one of α and β can derive the empty string.
 if β ⇒* ℇ, then α does not derive any string beginning with a terminal in FOLLOW(A).

For instance, the grammar A → ab | a violates the first condition, since both alternatives begin with a; after
left factoring to A → aA', A' → b | ℇ, the conditions are satisfied.

3.5.2. Bottom-up Parsing (Shift-Reduce Parsing)

As the name suggests, bottom-up parsing starts from the leaf nodes of a tree and works in upward direction
till it reaches the root node. Here, we start from a sentence and then apply production rules in reverse manner
in order to reach the start symbol.

Example: Input string : a + b * c


Production rules:
S → E
E → E + T
E → E * T
E → T
T → id
Let us start bottom-up parsing for: a + b * c
Read the input and check if any production matches with the input:
a + b * c
T + b * c
E + b * c
E + T * c
E * c
E * T
E
S


Additional example: for the given input string abbcde and the grammar

S → aABe
A → Abc | b
B → d

the reduction sequence is: abbcde ⇒ aAbcde (reduce b to A) ⇒ aAde (reduce Abc to A) ⇒ aABe (reduce d to B)
⇒ S (reduce aABe to S).

Bottom-up parsers are further categorized below; the main variants are operator-precedence parsers and the
LR family (SLR, canonical LR, and LALR).

a. Shift-Reduce Parsing

Shift-reduce parsing uses two unique steps for bottom-up parsing. These steps are known as shift-step and
reduce-step.
 Shift step: The shift step refers to the advancement of the input pointer to the next input symbol, which
is called the shifted symbol. This symbol is pushed onto the stack. The shifted symbol is treated as a
single node of the parse tree.
 Reduce step: When the parser finds a complete grammar rule's right-hand side (RHS) on the stack and
replaces it with the corresponding left-hand side (LHS), it is known as a reduce-step. This occurs when
the top of the stack contains a handle. To reduce, a POP function is performed on the stack, which pops
off the handle and replaces it with the LHS non-terminal symbol. A full shift-reduce trace of the earlier
example is shown below.
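
As an added illustration (not in the original notes), here is the stack/input/action trace for id + id * id under
the grammar of the previous example (S → E, E → E + T | E * T | T, T → id):

Stack          Input            Action
$              id + id * id $   shift
$ id           + id * id $      reduce T → id
$ T            + id * id $      reduce E → T
$ E            + id * id $      shift
$ E +          id * id $        shift
$ E + id       * id $           reduce T → id
$ E + T        * id $           reduce E → E + T
$ E            * id $           shift
$ E *          id $             shift
$ E * id       $                reduce T → id
$ E * T        $                reduce E → E * T
$ E            $                reduce S → E
$ S            $                accept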

A more general form of shift reduce parser is LR parser.

LR Parser
The LR parser is a non-recursive, shift-reduce, bottom-up parser. It can handle a wide class of context-free
grammars, which makes it one of the most widely used syntax analysis techniques. LR parsers are also known
as LR(k) parsers, where L stands for left-to-right scanning of the input stream, R stands for the construction of
a right-most derivation in reverse, and k denotes the number of lookahead symbols used to make decisions.

There are three widely used algorithms available for constructing an LR parser:

 SLR(1) – Simple LR Parser:


o Works on smallest class of grammar
o Few number of states, hence very small table
o Simple and fast construction
 LR(1) – LR Parser:
o Works on complete set of LR(1) Grammar
o Generates large table and large number of states
o Slow construction


 LALR(1) – Look-Ahead LR Parser:


o Works on an intermediate size of grammar
o The number of states is the same as in SLR(1)

LR Parsing Algorithm

Here we describe a skeleton algorithm of an LR parser:

token = next_token()

repeat forever
s = top of stack

if action[s, token] = "shift si" then


PUSH token
PUSH si
token = next_token()

else if action[s, token] = "reduce A ::= β" then


POP 2 * |β| symbols
s = top of stack
PUSH A
PUSH goto[s,A]

else if action[s, token] = "accept" then


return

else
error()

LL vs. LR

LL: Does a leftmost derivation.
LR: Does a rightmost derivation in reverse.

LL: Starts with the root nonterminal on the stack.
LR: Ends with the root nonterminal on the stack.

LL: Ends when the stack is empty.
LR: Starts with an empty stack.

LL: Uses the stack for designating what is still to be expected.
LR: Uses the stack for designating what is already seen.

LL: Builds the parse tree top-down.
LR: Builds the parse tree bottom-up.

LL: Continuously pops a nonterminal off the stack, and pushes the corresponding right-hand side.
LR: Tries to recognize a right-hand side on the stack, pops it, and pushes the corresponding nonterminal.

LL: Expands the non-terminals.
LR: Reduces the non-terminals.

LL: Reads the terminals when it pops one off the stack.
LR: Reads the terminals while it pushes them on the stack.

LL: Pre-order traversal of the parse tree.
LR: Post-order traversal of the parse tree.


Questions for exercise: work through the following questions for more elaboration.

1. Which of the following derivations does a top-down parser use while parsing an input string? The
input is assumed to be scanned in left to right order.
(a) Leftmost derivation
(b) Leftmost derivation traced out in reverse
(c) Rightmost derivation
(d) Rightmost derivation traced out in reverse

Answer (a)

Top-down parsing (LL)


In top down parsing, we just start with the start symbol and compare the right side of the different productions
against the first piece of input to see which of the productions should be used. A top down parser is called LL
parser because it parses the input from Left to right, and constructs a Leftmost derivation of the sentence.

Algorithm (Top Down Parsing)


a) In the current string, choose leftmost nonterminal.
b) Choose a production for the chosen nonterminal.
c) In the string, replace the nonterminal by the right-hand-side of the rule.
d) Repeat until no more non-terminals.

LL grammars are often classified by numbers, such as LL(1), LL(0) and so on. The number in the parenthesis
tells the maximum number of terminals we may have to look at a time to choose the right production at any
point in the grammar. The most common (and useful) kind of LL grammar is LL(1) where you can always
choose the right production by looking at only the first terminal on the input at any given time. With LL(2)
you have to look at two symbols, and so on. There exist grammars that are not LL(k) grammars for any fixed
value of k at all, and they are sadly quite common.

Let us see an example of top-down parsing for the following grammar, with input string ax.
S -> Ax
A -> a
A -> b

An LL(1) parser starts with S and asks “which production should I attempt?” Naturally, it predicts the only
alternative of S. From there it tries to match A by calling method A (in a recursive-descent parser). Lookahead
a predicts production
A -> a

The parser matches a, returns to S and matches x. Done. The derivation tree is:
S
/ \
A x
|
a

2. Which of the following statements is false?


a) An unambiguous grammar has same leftmost and rightmost derivation
b) An LL(1) parser is a top-down parser
c) LALR is more powerful than SLR
d) An ambiguous grammar can never be LR(k) for any k


Answer: (a)

If a grammar has more than one leftmost (or rightmost) derivation for a single sentential form, the grammar is
ambiguous. The leftmost and rightmost derivations for a sentential form may differ, even in an unambiguous
grammar

3. Which of the following grammar rules violate the requirements of an operator grammar? P, Q, R
are nonterminals, and r,s,t are terminals

a) P -> QR    b) P -> QsR    c) P -> ε    d) P -> QtRr    e) a and c

Answer: (e)

Explanation:
An operator precedence parser is a bottom-up parser that interprets an operator-precedence grammar. For
example, most calculators use operator precedence parsers to convert from the human-readable infix notation
with order of operations format into an internally optimized computer-readable format like Reverse Polish
notation (RPN). An operator precedence grammar is a kind of context-free grammar that can be parsed with
an operator-precedence parser. It has the property that no production has either an empty (ε) right-hand side or
two adjacent non-terminals in its right-hand side. These properties allow the terminals of the grammar to be
described by a precedence relation, and a parser that exploits that relation is considerably simpler than more
general-purpose parsers such as LALR parsers.

4. Consider the grammar with the following translation rules and E as the start symbol.

E -> E1 # T  {E.value = E1.value * T.value}
   | T       {E.value = T.value}
T -> T1 & F  {T.value = T1.value + F.value}
   | F       {T.value = F.value}
F -> num     {F.value = num.value}

Compute E.value for the root of the parse tree for the expression: 2 # 3 & 5 # 6 & 4.
a) 200 b) 180 c) 160 d) 40

Answer: (c)
Explanation:
We can calculate the value by constructing the parse tree for the expression 2 # 3 & 5 # 6 & 4. Alternatively,
we can calculate it by considering the following precedence and associativity rules.
Precedence in a grammar is enforced by making sure that a production rule with a higher-precedence operator
will never produce an expression containing an operator of lower precedence.
In the given grammar, ‘&’ has higher precedence than ‘#’.

Left associativity for an operator * is enforced by making sure that for a production rule like S ->
S1 * S2 in the grammar, S2 never produces an expression containing *. On the other hand, to ensure right
associativity, S1 should never produce an expression containing *. In the given grammar, both ‘#’ and ‘&’ are
left-associative.
So the expression 2 # 3 & 5 # 6 & 4 will become ((2 # (3 & 5)) # (6 & 4)).
Applying the translation rules, we get ((2 * (3 + 5)) * (6 + 4)) = 160.


3.6. Parser Generator Yacc

Introduction to YACC
A parser generator is a program that takes as input a specification of a syntax and produces as output a
procedure for recognizing that language. Historically, parser generators are also called compiler-compilers.
YACC (Yet Another Compiler-Compiler) is an LALR(1) parser generator (LookAhead, Left-to-right scanning,
Rightmost derivation in reverse, with 1 lookahead token). YACC was originally designed to be complemented
by Lex.

Input File: YACC input file is divided into three parts.

/* definitions */
....

%%
/* rules */
....
%%

/* auxiliary routines */
....

Input File: Definition Part: this part includes information about the tokens used in the syntax definition:

%token NUMBER
%token ID

 Yacc automatically assigns numbers for tokens, but it can be overridden by

%token NUMBER 621

 Yacc also recognizes single characters as tokens. Therefore, assigned token numbers should not overlap
ASCII codes.
 The definition part can include C code external to the definition of the parser and variable declarations,
within %{ and %} in the first column.
 It can also include the specification of the starting symbol in the grammar:

%start nonterminal

Input File: Rule Part: this part contains the grammar definition in a modified BNF form.

 Actions are C code in { } and can be embedded inside the rules (translation schemes), as illustrated below.
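
For example (an illustrative fragment of ours, not from the original notes; it assumes NUMBER is declared
with %token and that the scanner sets yylval), a rule with an embedded action looks like:

expr : expr '+' expr   { $$ = $1 + $3; }   /* add the two sub-expression values */
     | NUMBER          { $$ = $1; }        /* a number is its own value */
     ;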

Input File: Auxiliary Routines Part: this part is only C code.

 It includes function definitions for every function needed in rules part.


 It can also contain the main() function definition if the parser is going to be run as a program.
 The main() function must call the function yyparse().

Input File:

 If yylex() is not defined in the auxiliary routines sections, then it should be included:

#include "lex.yy.c"

 YACC input file names generally end with the extension .y

Output Files: the output of YACC is a file named y.tab.c

 If it contains the main() definition, it must be compiled to be executable. Otherwise, the code can be used
as an external definition of the function int yyparse()
 If called with the –d option on the command line, Yacc produces as output a header file y.tab.h with all its
specific definitions (particularly important are the token definitions, to be included, for example, in a Lex
input file).
 If called with the –v option, Yacc produces as output a file y.output containing a textual description of the
LALR(1) parsing table used by the parser. This is useful for tracking down how the parser resolves conflicts.

Example: Yacc File (.y)

%{
#include <ctype.h>
#include <stdio.h>
#define YYSTYPE double /* double type for yacc stack */
%}
%%
Lines : Lines S '\n' { printf("OK \n"); }
      | S '\n'
      | error '\n' { yyerror("Error: reenter last line:");
                     yyerrok; };
S     : '(' S ')'
      | '[' S ']'
      | /* empty */ ;
%%
#include "lex.yy.c"
void yyerror(char * s)
/* yacc error handler */
{
 fprintf (stderr, "%s\n", s);
}

int main(void)
{
 return yyparse();
}

Lex File (.l)

%{
%}
%%
[ \t] { /* skip blanks and tabs */ }
\n|. { return yytext[0]; }

%%

For Compiling YACC Program:

1. Write the lex program in a file file.l and the yacc program in a file file.y
2. Open a terminal and navigate to the directory where you have saved the files.
3. type lex file.l
4. type yacc file.y (use yacc -d file.y if the Lex file needs the token definitions in y.tab.h)
5. type cc y.tab.c -ll (the example .y file above includes lex.yy.c directly; if yours does not, compile both:
cc lex.yy.c y.tab.c -ll)
6. type ./a.out

Compiled by Tseganesh M., CS @ WCU
