Compiler Design Slide Chapter 1-6
Uploaded by Miki Micah
© All Rights Reserved

COMPILER DESIGN

CHAPTER ONE
INTRODUCTION

Compiled By: Seble N.


Overview
 What is a compiler?
 Types of Compilers
 Language Processing Systems
 Compilers vs Interpreters
 The structure of a Compiler
 Analysis and synthesis phases of a compiler
 Phases of a compilation process

 Compiler Construction Tools


Introduction
 What is a compiler?
 is a program that can read a program written in one
language – the source language – and translate it into
an equivalent program in another language – the
target language

 Types of Compilers
 Source-to-source compilers
 Source-to-machine compilers
Language Processing Systems
 A language processing system passes the source program through a preprocessor, a compiler, an assembler, and a linker/loader:

Source Program
  |  Preprocessor
Modified Source Program
  |  Compiler
Target Assembly Program
  |  Assembler
Re-locatable Machine Code
  |  Linker/Loader  <-- library files, re-locatable object files
Target Machine Code

Compilers vs Interpreters

Compilers
 Scan the entire program and translate it as a whole into machine code.
 Take a large amount of time to analyze the source code, but the overall execution time is comparatively faster.
 Generate error messages only after scanning the whole program; hence debugging is comparatively hard.

Interpreters
 Translate the program one statement at a time.
 Take less time to analyze the source code, but the overall execution time is slower.
 Continue translating the program until the first error is met, in which case they stop; hence debugging is easy.
The structure of a Compiler
 The Analysis Part (Front End)
 Breaks up the program into constituent pieces
 Checks for syntax and semantic errors
 Produces an Intermediate Representation of the source code
 If it finds any errors, it displays them
 Collects information about the source program and stores it in a
data structure called symbol table

 The Synthesis Part (Back End)
 Constructs the target program from the IR and the symbol table
Phases in a Compilation
 Lexical Analyzer
 Syntax Analyzer
 Semantic Analyzer
 Intermediate Code Generator
 Code Generator
 Code Optimization

Character Stream
  |  Lexical Analyzer
Token Stream
  |  Syntax Analyzer
Syntax Tree
  |  Semantic Analyzer
Syntax Tree
  |  Intermediate Code Generator
Intermediate Representation
  |  Code Generator
Target-Machine Code
  |  Machine-Dependent Code Optimizer
Target-Machine Code

(The Symbol Table is consulted and updated by all phases.)
Lexical Analysis/Scanner

 Reads streams of characters
 Groups the characters into meaningful sequences called lexemes.
 For each lexeme it produces a token in the form
 <token-name, attribute-value>
 Token-name is an abstract name that will be used during syntax
analysis
 Attribute-value is a value that points to an entry in the symbol
table for this token
Lexical Analysis/Scanner
 Example
 position = initial + rate * 60
 position is a lexeme that would be mapped into a token <id, 1>
 = is a lexeme that is mapped into the token <=>
 initial is a lexeme that would be mapped into a token <id, 2>
 + is a lexeme that is mapped into the token <+>
 rate is a lexeme that would be mapped into a token <id, 3>
 * is a lexeme that is mapped into the token <*>
 60 is a lexeme that is mapped into the token <60>

 After lexical analysis, the sequence of tokens is
 <id, 1><=><id, 2><+><id, 3><*><60>
Syntax Analysis/Parsing

 The parser uses the tokens produced by the lexical analyzer to create a tree-like intermediate representation – called a syntax tree – that describes the grammatical structure of the token stream
 Each interior node represents an operator
 The children of the node represent the arguments of the operation
Syntax Analysis/Parsing
 position = initial + rate * 60
 <id, 1><=><id, 2><+><id, 3><*><60>

            =
          /   \
    <id, 1>    +
             /   \
       <id, 2>    *
                /   \
          <id, 3>    60
Semantic Analysis
 Uses the syntax tree and the information in the symbol table to
check the source program for semantic consistency with the language
definition
 It gathers type information and performs type checking
 Type checking
 Checks that each operator has matching operands

 Type Conversion
 position = initial + rate * 60
 Semantic analyzer first converts integer 60 to a floating point number
before applying *
Intermediate Code Generation

 After syntax and semantic analysis, many compilers generate an explicit low-level or machine-like intermediate representation, which we can think of as a program for an abstract machine

 This intermediate representation should have two properties:
 it should be easy to produce, and
 it should be easy to translate into the target machine language
Intermediate Code Generation
 Three-address code consists of assembly-like instructions with a maximum of three operands per instruction
 Each operand can act like a register

 The output of the intermediate code generator for
 <id, 1><=><id, 2><+><id, 3><*><60>

 t1 = inttofloat(60)
 t2 = id3 * t1
 t3 = id2 + t2
 id1 = t3

 The compiler must also generate temporary names to hold the values computed by three-address instructions
Code Optimization
 Two Types
 The machine-independent code-optimization phase
 attempts to improve the intermediate code so that better target code will result
 The machine-dependent code-optimization phase
 Example

 Before:                 After:
 t1 = inttofloat(60)     t1 = id3 * 60.0
 t2 = id3 * t1           id1 = id2 + t1
 t3 = id2 + t2
 id1 = t3
Code Generation

 The code generator takes as an input intermediate


representation of the source program and maps it into
the target language
 If the target language is machine language, registers or
memory locations are selected for each of the variables
used by the program
 A crucial aspect of code generation is the judicious
assignment of registers to hold variables
Code Generation
 For example, using registers R1 and R2, the previous intermediate code might get translated into the machine code

 position = initial + rate * 60
 <id, 1><=><id, 2><+><id, 3><*><60>

 t1 = id3 * 60.0        LDF  R2, id3
 id1 = id2 + t1         MULF R2, R2, #60.0
                        LDF  R1, id2
                        ADDF R1, R1, R2
                        STF  id1, R1
Symbol Table Management
 An essential function of a compiler is
 to record the variable names used in the source program, and
 to collect information about various attributes of each name

 These attributes may provide information about
 the storage allocated for a name,
 its type,
 its scope (where in the program its value can be used), and
 in the case of procedure names, the number and types of its arguments, the method of passing each argument (e.g., by value or by reference), and the type returned

 The symbol table
 is a data structure containing a record for each variable name, with fields for the attributes of the name
Compiler Construction Tools
 Some commonly used compiler construction tools include:
 Scanner generators
 produce lexical analyzers from a regular-expression description of the tokens of the
language
 Parser generators
 automatically produce syntax analyzers from a grammatical description of a programming
language
 Syntax-directed translation engines
 produce collections of routines for walking a parse tree and generating intermediate code
 Code generator generators
 produce a code generator from a collection of rules for translating each operation of the intermediate language into the machine language for a target machine
 Data-flow analysis engines
 facilitate the gathering of information about how values are transmitted from one part of the program to each other part
 are a key part of code optimization
 Compiler-construction toolkits
 provide an integrated set of routines for constructing various phases of a compiler
COMPILER DESIGN
CHAPTER TWO
LEXICAL ANALYSIS

Compiled By: Seble N.


Outline
 Role of lexical analyzer
 Basic Terminologies
 Class of Tokens
 Attributes of Tokens
 Lexical Errors
 Scanning Technique
 Specification of tokens
 Recognition of tokens
The role of lexical analyzer

 Source program --> [Lexical Analyzer] --token--> [Parser] --> To semantic analysis
                                       <--getNextToken--
 Both the lexical analyzer and the parser consult the Symbol table.
Tasks of Lexical Analyzer

 Identifying Lexemes
 Stripping out comments and whitespace (blank, newline, and tab)
 Correlating error messages generated by the compiler with the source program by keeping track of line numbers (using newline characters)
Basic Terminologies
 What are Major Terms for Lexical Analysis?
 TOKEN
 A classification for a common set of strings
 Examples Include Identifier, Integer, Float, Assign, LParen, RParen,
etc.
 PATTERN
 The rules which characterize the set of strings for a token –
integers [0-9]+
 Recall File and OS Wildcards ([A-Z]*.*)
 LEXEME
 Actual sequence of characters that matches pattern and is
classified by a token
 Identifiers: x, count, name, etc…
 Integers: 345, 20, -12, etc.
Class of Tokens
 One token for each keyword. The pattern for a keyword is
the same as the keyword itself
 Tokens for operators, either individually or in classes such as
comparison (for lexemes: <, >, <=, >=, ==, !=)
 One token representing all identifiers
 One or more tokens representing constants, such as numbers
and literal strings
 Tokens for each punctuation symbol, such as left and right
parentheses, comma and semicolon
Example

 Token       Informal description                    Sample lexemes
 if          characters i, f                         if
 else        characters e, l, s, e                   else
 comparison  < or > or <= or >= or == or !=          <=, !=
 id          letter followed by letters and digits   pi, score, D2
 number      any numeric constant                    3.14159, 0, 6.02e23
 literal     anything but " surrounded by "          "core dumped"

 printf("total = %d\n", score);

Attributes for tokens
 Practically, a token has one attribute: a pointer to the symbol
table entry in which the information about the token is kept

 The token name influences parsing decisions, while the


attribute value influences translation of tokens after the
parse

 The symbol table entry contains various information about


the token such as the lexeme, its type, the line number in
which it was first seen, etc…
Attributes for tokens

 E = M * C ** 2
 <id, pointer to symbol table entry for E>
 <assign-op>
 <id, pointer to symbol table entry for M>
 <mult-op>
 <id, pointer to symbol table entry for C>
 <exp-op>
 <number, integer value 2>
Lexical errors

 Some errors are beyond the power of the lexical analyzer to recognize:
 fi (a == f(x)) ...

 However, it may be able to recognize errors like:
 d = 2r

 Such errors are recognized when no pattern for tokens matches a character sequence
Error recovery

 Panic mode: delete successive characters from the remaining input until a well-formed token can be found
 Other possible repair actions:
 Delete one character from the remaining input
 Insert a missing character into the remaining input
 Replace a character by another character
 Transpose two adjacent characters

Basic Scanning technique
 Use 1 character of look-ahead
 Obtain char with getc()
 Do a case analysis
 Based on lookahead char
 Based on current lexeme

 Outcome
 If char can extend lexeme, all is well, go on.
 If char cannot extend lexeme:
 Figure out what the complete lexeme is and return its token
 Put the lookahead back into the symbol stream
Formalization
 How to formalize this pseudo-algorithm?
 Character-at-a-time I/O
 Block / Buffered I/O

 Block/Buffered I/O
 Utilize a block of memory
 Stage data from the source to the buffer one block at a time
 Maintain two blocks - Why (Recall OS)?
 Asynchronous I/O - for one block
 While Lexical Analysis proceeds on the 2nd block

 Block 1 | Block 2
 When done, issue I/O ... while still processing the token in the 2nd block
Buffer Pairs
 Two pointers to the input are maintained:
 Pointer lexemeBegin, marks the beginning of the
current lexeme, whose extent we are attempting to
determine
 Pointer forward scans ahead until a pattern match is
found
E = M * C * * 2 eof

lexemeBegin forward
Algorithm: Buffered I/O with Sentinels

 Current token:
 | E | = | M | * | eof | C | * | * | 2 | eof | ... | eof |
       lexeme beginning        forward (scans ahead to find a pattern match)

 forward := forward + 1;
 if forward↑ = eof then begin
     if forward at end of first half then begin
         reload second half ;            /* Block I/O */
         forward := forward + 1
     end
     else if forward at end of second half then begin
         reload first half ;             /* Block I/O */
         move forward to beginning of first half
     end
     else /* eof within buffer signifying end of input */
         terminate lexical analysis
 end

 The algorithm still performs Block I/O. We can still have getchar & ungetchar; now these work on real memory buffers! A 2nd eof means no more input.
Specification of Tokens
 Questions
 What makes a lexeme an Identifier?
 What makes a lexeme a keyword?

 What makes a lexeme a comparison operator?

 So, we need a method to describe/define patterns


of the different classes of tokens
 Solution:
 Regular Expressions
Language
 A language, L, is simply any set of strings over a fixed alphabet.

 Alphabet                       Languages
 {0,1}                          {0, 10, 100, 1000, 100000, ...}
 {a,b,c}                        {abc, aabbcc, aaabbbccc, ...}
 {A, ..., Z}                    {TEE, FORE, BALL, FOR, WHILE, GOTO, ...}
 {A,...,Z, a,...,z, 0,...,9,    {all legal PASCAL programs}
  +,-,...,<,>,...}              {all grammatically correct English sentences}
Language & Regular Expressions

 A Regular Expression is a set of rules / techniques for constructing sequences of symbols (strings) from an alphabet.

 Let Σ be an alphabet and r a regular expression. Then L(r) is the language that is characterized by the rules of r
Regular Definitions
 A regular definition is a sequence of definitions of the form:
 d1 → r1
 d2 → r2
 ...
 dn → rn

 where:
 Each di is a new symbol, not in ∑ and not the same as any other of the d's, and
 Each ri is a regular expression over the alphabet ∑ U {d1, d2, . . . , di-1}

 Example:
 letter_ → A | B | . . . | Z | a | b | . . . | z | _
 digit → 0 | 1 | . . . | 9
 id → letter_(letter_ | digit)*
Extensions of Regular Expressions
 One or more instances -- r+
 Zero or more instances -- r*
 Zero or one instance -- r?
 Character classes
 a1 | a2 | . . . | an
 [a1- an]
digit→0 | 1 | . . . | 9
digits→digit+
number→digits(.digits)?(E[+-]?digits)?

 All Strings that start with “ab” and end with “ba”
 All Strings in Which {1,2,3} exist in ascending order
Recognition of tokens

 Regular Expression   Token   Attribute-Value
 ws                   -       -
 if                   if      -
 then                 then    -
 else                 else    -
 id                   id      pointer to table entry
 num                  num     pointer to table entry
 <                    relop   LT
 <=                   relop   LE
 =                    relop   EQ
 <>                   relop   NE
 >                    relop   GT
 >=                   relop   GE

Note: Each token has a unique token identifier to define the category of lexemes
How is Pattern Matching done?

• Patterns are converted into stylized flowcharts, called “Transition


Diagrams”
• As characters are read, the relevant TDs are used to attempt to
match lexeme to a pattern
• Each TD has:
• States : Represented by Circles/Nodes
• Actions/Inputs : Represented by Arrows between states
• Start State : Beginning of a pattern (Arrowhead)
• Final State(s) : End of pattern (Concentric Circles)
• Each TD is Deterministic - No need to choose between 2 different
actions !
Example TDs

• A tool to specify a token

>= :
  start: 0 --'>'--> 6 --'='---> 7   RTN(GE)
                    6 --other--> 8*  RTN(G)

We've accepted ">" and have read another char that must be unread (the * marks a state that retracts one character).
Example : All RELOPs

 start: 0 --'<'--> 1 --'='---> 2    return(relop, LE)
                   1 --'>'---> 3    return(relop, NE)
                   1 --other--> 4*  return(relop, LT)
        0 --'='--> 5                return(relop, EQ)
        0 --'>'--> 6 --'='---> 7    return(relop, GE)
                   6 --other--> 8*  return(relop, GT)
Example TDs : id and delim

id :
  start: 9 --letter--> 10;  10 --letter or digit--> 10;
         10 --other--> 11*   return(id, lexeme)

delim :
  start: 28 --delim--> 29;  29 --delim--> 29;  29 --other--> 30*

 blank   ->  b
 tab     ->  ^T
 newline ->  ^N
 delim   ->  blank | tab | newline
 ws      ->  delim+
Example TDs : Unsigned #s

 digits . digits E (+|-)? digits :
 start: 12 --digit--> 13 (loop on digit);  13 --'.'--> 14 --digit--> 15 (loop on digit);
        13, 15 --'E'--> 16;  16 --'+'|'-'--> 17 --digit--> 18 (loop on digit);
        16 --digit--> 18;  18 --other--> 19*

 digits . digits :
 start: 20 --digit--> 21 (loop on digit);  21 --'.'--> 22 --digit--> 23 (loop on digit);
        23 --other--> 24*

 digits :
 start: 25 --digit--> 26 (loop on digit);  26 --other--> 27*
Recognition of Keywords
• All keywords / reserved words are matched as ids
• After the match, the symbol table or a special keyword table is consulted
• The keyword table contains string versions of all keywords and associated token values:
      if     15
      then   16
      begin  17
      ...    ...
• When a match is found, the token is returned, along with its symbolic value, i.e., "then", 16
• If a match is not found, then it is assumed that an id has been discovered
Transition Diagrams Implementation
state = 0;
token nexttoken()
{ while(1) {
    switch (state) {
    case 0: c = nextchar();
        /* c is lookahead character */
        if (c == blank || c == tab || c == newline) {
            state = 0;
            lexeme_beginning++;
            /* advance beginning of lexeme */
        }
        else if (c == '<') state = 1;
        else if (c == '=') state = 5;
        else if (c == '>') state = 6;
        else state = fail();
        break;
    ... /* cases 1-8 here */
    case 9: c = nextchar();
        if (isletter(c)) state = 10;
        else state = fail();
        break;
    case 10: c = nextchar();
        if (isletter(c)) state = 10;
        else if (isdigit(c)) state = 10;
        else state = 11;
        break;
    case 11: retract(1); install_id();
        return ( gettoken() );
    ... /* cases 12-24 here */
    case 25: c = nextchar();
        if (isdigit(c)) state = 26;
        else state = fail();
        break;
    case 26: c = nextchar();
        if (isdigit(c)) state = 26;
        else state = 27;
        break;
    case 27: retract(1); install_num();
        return ( NUM ); } } }
Lexical Analyzer Review
 LEXICAL ANALYZER
 Scan Input
 Remove WS, NL, …
 Identify Tokens
 Create Symbol Table
 Insert Tokens into ST
 Generate Errors
 Send Tokens to Parser
COMPILER DESIGN
CHAPTER THREE
SYNTAX ANALYSIS

Compiled By: Seble N.


Outline
 The Role of a Parser
 Context Free Grammar
 Introduction to Derivations
 Types of Parsing
The Role of a Parser
Context Free Grammar
 A context-free grammar has four components:
 A set of non-terminals (V) - syntactic variables that denote sets of strings.
 A set of tokens, known as terminal symbols (Σ). Terminals are the basic symbols from which strings are formed.
 A set of productions (P). The productions of a grammar specify the manner in which the terminals and non-terminals can be combined to form strings.
 The start symbol (S), from where the production begins.
CFG Examples
 S --> <expression>
 <expression> --> number
 <expression> --> ( <expression> )
 <expression> --> <expression> + <expression>
 <expression> --> <expression> - <expression>
 <expression> --> <expression> * <expression>
 <expression> --> <expression> / <expression>
How is Parsing done?
 During parsing, we take two decisions for an input:
 Deciding which non-terminal is to be replaced.
 Deciding the production rule by which the non-terminal will be replaced.
 To decide which non-terminal to replace, and with which production rule, we have two options:
 Left-most Derivation
 Right-most Derivation
Derivation
 A derivation is basically a sequence of production
rules, in order to get the input string.
 Left-most Derivation
 Ifan input is scanned and replaced from left to right, it is
called left-most derivation.
 Right-most Derivation
 If we scan and replace the input with production rules, from
right to left, it is known as right-most derivation.
Derivation Example
 Production rules:
 E → E + E
 E → E * E
 E → id
 Input string: id + id * id

 Left-most derivation
 E → E * E
 E → E + E * E
 E → id + E * E
 E → id + id * E
 E → id + id * id

 Right-most derivation
 E → E + E
 E → E + E * E
 E → E + E * id
 E → E + id * id
 E → id + id * id
Parse Tree
 A parse tree is a graphical depiction of a
derivation.
 It is convenient to see how strings are derived from
the start symbol.
 The start symbol of the derivation becomes the root
of the parse tree.
Parse Tree Example
 Left-most derivation of a + b * c (with id standing for each of a, b, c)

 E → E * E
 E → E + E * E
 E → id + E * E
 E → id + id * E
 E → id + id * id

 In a parse tree:
 All leaf nodes are terminals.
 All interior nodes are non-terminals.
 In-order traversal gives the original input string.
Ambiguity
 A grammar G is said to be ambiguous if it has more
than one parse tree (left or right derivation) for at
least one string.
E→E+E
E→E–E
E → id

INPUT
id – id + id
Ambiguity Solution
 No method can detect and remove ambiguity
automatically, but it can be removed by
 either re-writing the whole grammar without ambiguity,
or
 by setting and following associativity and precedence
constraints
 Example
 2+3*4 can have two different parse trees
 (2+3)*4 and
 2+(3*4)
Left Recursion
 A grammar becomes left-recursive if it has any non-terminal 'A' whose derivation contains 'A' itself as the left-most symbol
 (1) A => Aα | β

 (2) S => Aα | β
     A => Sd
Removal of Left Recursion
 A => Aα | β    becomes    A => βA'
                           A' => αA' | ε
Algorithm
START
Arrange non-terminals in some order like A1, A2, A3, ..., An
for each i from 1 to n
{
    for each j from 1 to i-1
    {
        replace each production of form Ai ⟹ Aj𝜸
        with Ai ⟹ δ1𝜸 | δ2𝜸 | δ3𝜸 | ... | δn𝜸
        where Aj ⟹ δ1 | δ2 | ... | δn are the current Aj productions
    }
    eliminate immediate left-recursion among the Ai productions
}
END

Exercise
 S => Aα | β
 A => Sd
Left Factoring
 If more than one grammar production rule has a common prefix string, then the parser cannot make a choice as to which of the productions it should take to parse the string in hand
 A ⟹ αβ | α𝜸 | ...

 After Left Factoring:
 A ⟹ αA'
 A' ⟹ β | 𝜸 | ...

 Exercise:
 E → E + E | E – E
 E → id
First and Follow Sets
 An important part of parser table construction is to
create first and follow sets.
 First Set
 Thisset is created to know what terminal symbol is
derived in the first position by a non-terminal.
 For example,
α→ tβ
 That is, α derives t (terminal) in the very first position. So, t ∈
FIRST(α).
Algorithm for Calculating First Set

 Look at the definition of the FIRST(α) set:
 if α is a terminal, then FIRST(α) = { α }.
 if α is a non-terminal and α → ℇ is a production, then ℇ is in FIRST(α).
 if α is a non-terminal and α → 𝜸1 𝜸2 𝜸3 ... 𝜸n and any FIRST(𝜸i) contains t, then t is in FIRST(α).
Example of First Set
 Remove left recursion of the following grammar

Exercise
E → E + T|T
T → T * F|F
F → (E)|id
Solution (after removing left recursion)
 E → TE'
 E' → +TE' | ℇ
 T → FT'
 T' → *FT' | ℇ
 F → (E) | id
Follow Set

 What terminal symbol immediately follows a non-terminal in the production rules?
 Algorithm for Calculating Follow Set:
 if S is the start symbol, then $ is in FOLLOW(S)
 if there is a production α → AB, then everything in FIRST(B) except ℇ is in FOLLOW(A)
 if there is a production α → AB, where B → ℇ, then everything in FOLLOW(α) is in FOLLOW(A)
Example of Follow Set
 E → TE'
 E' → +TE' | ℇ
 T → FT'
 T' → *FT' | ℇ
 F → (E) | id
TYPES OF PARSING

 The way the production rules are implemented


(derivation) divides parsing into two types : top-
down parsing and bottom-up parsing.
Types of Top-down Parsing
Recursive descent parsing
 It is called recursive, as it uses recursive procedures
to process the input.
 Recursive descent parsing suffers from backtracking.
 It means, if one derivation of a production fails, the
syntax analyzer restarts the process using different
rules of same production
 Example:
 Draw a parse tree for the input string cad using the
following grammar
 S→cAd
 A→a | ab
Recursive descent parsing (S→cAd, A→a|ab)

 We create one node tree consisting of S


 Two pointers, one for the tree and one for the input, will
be used to indicate where the parsing process is.
Initially, they will be on S and the first input symbol,
respectively
 Then we use the first S-production to expand the tree.
The tree pointer will be positioned on the leftmost
symbol of the newly created sub-tree

 As the symbol pointed by the tree pointer matches that


of the symbol pointed by the input pointer, both
pointers are moved to the right
Recursive descent parsing (S→cAd, A→a|ab)

 Whenever the tree pointer points on a non-terminal, we


expand it using the first production of the non-terminal

 Whenever the pointers point on different terminals, the


production that was used is not correct, thus another
production should be used. We have to go back to the
step just before we replaced the non-terminal and use
another production

 If we reach the end of the input and the tree pointer


passes the last symbol of the tree, we have finished
parsing
Implementation of Recursive Descent Parsing
Grammar: E → iE'    E' → +iE' | ε    (E' is written Eprime below)

main(){
    E();
    if(ch=='$')
        printf("Successful");
}
E(){
    if(ch=='i'){
        match('i');
        Eprime();
    }
}
Eprime(){
    if(ch=='+'){
        match('+');
        match('i');
        Eprime();
    }
    else
        return;
}
match(char t){
    if(ch==t)
        ch=getchar();
    else
        error();
}
Recursive decent parsing with
Backtracking
S → rXd | rZd
X → oa | ea
Z → ai

 For an input string: read, a top-down parser, will


behave like this:
Non-recursive Predictive Parsing

 Does not suffer from backtracking

 Uses a parse table to determine which production


rule to apply

 Implemented on a stack
Parsing Table Construction
 The parsing table is a two dimensional array M[X,
a] where X is a nonterminal of the grammar and a
is a terminal or $
Non-recursive Predictive Parsing

 The input buffer contains the string to be parsed

followed by $

 The stack contains a sequence of grammar symbols with

$ at the bottom.

 Initially, the stack contains the start symbol of the

grammar followed by $.
Non-recursive Predictive Parsing
id+id*id
Non-recursive Predictive Parsing Rules

 If X is a terminal and X = a = $: the parser halts and


announces a successful completion of parsing
 If X is a terminal and X = a ≠ $: the parser pops X off the stack and advances the input pointer to the next symbol
 If X is a nonterminal: the program consults entry M[X, a]
which can be an X-production or an error entry.
 If M[X, a] = {X→Y1Y2 . . . Yk}, X on top of the stack will be
replaced by Y1Y2 . . . Yk (Y1 at the top of the stack).
 If M[X, a] = error, the parser calls the error recovery
method
LL Grammars
 LL grammar is a subset of context-free grammar but
with some restrictions to get the simplified version
 LL parser is denoted as LL(k)
LL(1) Grammars
 A grammar G is in LL(1) if and only if whenever A→α | β are two
distinct productions of G, the following conditions hold:
 For no terminal a do both α and β derive strings beginning with a
 At most one of α and β can derive the empty string
 If β => λ in zero or more steps, then α does not derive any string
beginning with a terminal in FOLLOW(A).
 Likewise, if α => λ in zero or more steps, then β does not derive any
string beginning with a terminal in FOLLOW(A)
Facts about LL(1) Grammars
 LL(1) grammar is a grammar for which the parsing
table does not have a multiply-defined entries
 No left-recursive or ambiguous grammar can be
LL(1)
 Example: Let a grammar G be given by:
 S → aSbS
   | bSaS
   | ℇ
 Is G an LL(1) grammar?
Is G an LL(1) grammar?
S→ A/a
A →a
-------------------
S→ aABb
A →c | 
B → d| 
---------------
S→ aSA/ 
A →c/ 
Syntax Error Handling

 Goals of the error handler


 Report the presence of errors clearly and accurately

 Recover from each error quickly enough to detect


subsequent errors

 Add minimal overhead to the processing of correct programs
Syntax Error Handling

 The error handler should report:


 the place in the source program where the error is
detected

 the type of error (if possible)


Error Recovery Strategies

 There are four main strategies in error handling:


 Panic mode: discards all tokens until a synchronization token
is found.
 Phrase level recovery: the parser makes a local correction
so that it can continue to parse the rest of the input.
 When a parser encounters an error, it tries to take corrective measures so that the rest of the
inputs of the statement allow the parser to parse ahead. For example, inserting a missing
semicolon, replacing comma with a semicolon, etc.
 Error productions: augment the grammar to capture the
most common errors that programmers make

 Global correction: makes as few changes as possible in the


program so that a globally least cost correction program is
obtained.
Panic mode Recovery
 As a starting point place all symbols in FOLLOW(A) into the
synchronizing set for nonterminal A. If we skip tokens until an
element of FOLLOW(A) is seen and pop A from the stack, it is likely
that the parsing can continue
 It is not enough to use FOLLOW(A) as the synchronizing set for A. For
example, if semicolons terminate statements, then keywords that
begin statements may not appear in the FOLLOW set of the
nonterminal representing expressions.
 A missing semicolon after an assignment may therefore result in the
keyword beginning the next statement being skipped. For example, we
might add keywords that begin statements to synchronizing sets for the
non-terminals generating expressions
 If nonterminals can generate the empty string, then the production
deriving λ can be used as a default.
Limitations of Syntax Analyzers
 Syntax analyzers have the following drawbacks:
 it cannot determine if a token is valid,
 it cannot determine if a token is declared before it is
being used,
 it cannot determine if a token is initialized before it is
being used,
 it cannot determine if an operation performed on a
token type is valid or not.
Bottom-Up Parsing

 Begins parsing at the leaves (the bottom) and works


up towards the root (the top)

 The key decisions in bottom-up parsing are


 about when to reduce and

 about what production to apply, as the parse proceeds


Bottom-Up Parsing
Bottom-up parsing

 At each reduction step a particular substring matching the right side of a production (the body) is replaced by the symbol on the left of that production (the head); if the substring is chosen correctly at each step, a rightmost derivation is traced out in reverse.
Bottom-up parsing
 Will the following grammar generate the input string "id+id*id"?
 E→E+T | T
 T→T*F | F
 F→(E) | id

1. Use a top-down parse with the leftmost derivation technique to prove it.
2. Use a bottom-up parse, tracing a rightmost derivation in reverse, to prove it.
Handle Pruning
 A "handle" is a substring that matches the body of a production, and whose reduction represents one step along the reverse of a rightmost derivation

 Right Sentential Form   Handle   Reducing Production
 id1*id2                 id1      F→id
 F*id2                   F        T→F
 T*id2                   id2      F→id
 T*F                     T*F      T→T*F
 T                       T        E→T
Shift Reduce Parsing
 Uses two unique steps for bottom-up parsing
 Shift step:
 The shift step refers to the advancement of the input pointer to the
next input symbol, which is called the shifted symbol.
 This symbol is pushed onto the stack. The shifted symbol is treated
as a single node of the parse tree.
 Reduce step:
 When the parser finds a complete grammar rule (RHS) and
replaces it to (LHS), it is known as reduce-step. This occurs when
the top of the stack contains a handle.
 To reduce, a POP function is performed on the stack which pops
off the handle and replaces it with LHS non-terminal symbol.
Shift-Reduce Parser
 a stack holds grammar symbols and
 an input buffer holds the rest of the string to be
parsed
 the handle always appears at the top of the stack
just before it is identified as the handle
 $ is used to mark the bottom of the stack and also
the right end of the input
 Initially, the stack is empty, and the string w is on the
input buffer as follows
Stack Input
$ w$
Shift-Reduce Parsing
 During a left-to-right scan of the input string, the parser shifts zero or more
input symbols onto the stack, until it is ready to reduce a string β of
grammar symbols on top of the stack
 The following steps show the reductions of id*id to E
(Grammar: E→E+T | T, T→T*F | F, F→(E) | id)
Stack     Input       Action
$         id1*id2 $   shift
$ id1     *id2 $      reduce by F→id
$ F       *id2 $      reduce by T→F
$ T       *id2 $      shift
$ T*      id2 $       shift
$ T*id2   $           reduce by F→id
$ T*F     $           reduce by T→T*F
$ T       $           reduce by E→T
$ E       $           accept
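The trace above can be reproduced by a small sketch of a shift-reduce parser. One detail the slides leave open is how the parser decides between shifting and reducing; as a simplification (an SLR-style rule, not stated on the slides), this sketch reduces by A→β only when β is on top of the stack and the lookahead is in FOLLOW(A), preferring the longest matching handle. The FOLLOW sets are precomputed by hand for this grammar.

```python
# Shift-reduce sketch for the slides' expression grammar.
# Assumption: reductions are gated by hand-computed FOLLOW sets.
GRAMMAR = [
    ("E", ["E", "+", "T"]),
    ("E", ["T"]),
    ("T", ["T", "*", "F"]),
    ("T", ["F"]),
    ("F", ["(", "E", ")"]),
    ("F", ["id"]),
]
FOLLOW = {"E": {"+", ")", "$"},
          "T": {"+", "*", ")", "$"},
          "F": {"+", "*", ")", "$"}}

def parse(tokens):
    stack, buf = [], list(tokens) + ["$"]
    while True:
        # try to reduce: longest handle whose FOLLOW set admits the lookahead
        for head, body in sorted(GRAMMAR, key=lambda p: -len(p[1])):
            if stack[-len(body):] == body and buf[0] in FOLLOW[head]:
                del stack[len(stack) - len(body):]
                stack.append(head)          # replace the handle by the head
                break
        else:                               # no handle found on top of stack
            if buf[0] == "$":
                return stack == ["E"]       # accept iff everything reduced to E
            stack.append(buf.pop(0))        # otherwise shift the next symbol
```

Running `parse("id * id".split())` walks exactly through the reductions shown above.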
Shift-Reduce Parser
 There are four possible actions a shift-reduce parser can make:

 Shift. Shift the next input symbol on the top of the stack

 Reduce.
 The parser knows the right end of the handle is at the top of the
stack. It should then decide what non-terminal should replace that
substring

 Accept. Announce successful completion of parsing

 Error. Discover a syntax error and call an error recovery routine


Conflicts during Shift-Reduce Parsing

 cannot decide whether to shift or to reduce (a


shift/reduce conflict), or

 cannot decide which of several reductions to make


(a reduce/reduce conflict)
LR Parser
 LR parser is a non-recursive, shift-reduce, bottom-up parser
 The “L” is for left-to-right scanning of the input,
 The “R” for constructing a rightmost derivation in reverse,
and
 The “k” for the number of input symbols of lookahead that are
used in making parsing decisions
 When (k) is omitted, k is assumed to be 1
 It uses a wide class of context-free grammar which makes it
the most efficient syntax analysis technique
Components of LR Parser
 Stack: holds states s0 ... sm and grammar symbols X1 ... Xm
 Input Buffer: holds the remaining input a1 . . . ai ... an$
 LR Parsing Algorithm: the driver program that produces the output
 Parsing Table
 Has two parts
 Action and
 goto
Types of LR Parsers
 LR(0)
 SLR(1) – Simple LR
 LALR(1) - Look Ahead LR
 CLR(1) – Canonical LR
LR Parsing Algorithm

 The parsing program reads characters from an input


buffer one at a time

 The program uses a stack to store a string of the form


s0X1s1X2 . . . Xmsm, where sm is on top

 Each Xi is a grammar symbol and each si is a symbol


called a state
LR Parsing Algorithm Cont‟d
 The program driving the LR parser behaves as
follows:
 It determines sm, the state currently on top of the
stack, and ai, the current input symbol
 It then consults action[sm, ai], the parsing action
table entry for state sm and input ai, which can
have one of four values:
 shifts, where s is a state,
 reduce by a grammar production A→β,
 accept, and
 error
LR Parsing Algorithm Cont‟d
 The four types of actions are as follows:
 If action[sm, ai] = shift s, the parser shifts
 the current input symbol ai and
 the next state s, which is given in action[sm, ai], onto the stack
 ai+1 become the current input symbol
 If action[sm, ai] = reduce A→β, then the parser executes a reduce
move,
 Here the parser
 first pops 2r symbols off the stack (r state symbols and r grammar
symbols), where r = |β| is the length of the production body,
 Then pushes both A, the left side of the production, and s, the entry
for goto[sm-r, A], onto the stack
 The current input symbol is not changed
 If action[sm, ai] = accept, parsing is completed
 If action[sm, ai] = error, the parser has discovered an error and
call an error recovery routine
LR Parsing Table
Grammar: 1. E→E+T  2. E→T  3. T→T*F  4. T→F  5. F→(E)  6. F→id
si means shift and stack state i,
rj means reduce by the production numbered j,
acc means accept,
blank means error
Action Goto
State
Id + * ( ) $ E T F
0 s5 s4 1 2 3
1 s6 acc
2 r2 s7 r2 r2
3 r4 r4 r4 r4
4 s5 s4 8 2 3
5 r6 r6 r6 r6
6 s5 s4 9 3
7 s5 s4 10
8 s6 s11
9 r1 s7 r1 r1
10 r3 r3 r3 r3
11 r5 r5 r5 r5
LR Parsing Example
 On input id*id+id, the sequence of stack and input
contents is shown below
Moves of LR parser on id*id+id
Stack Input Action
(1) 0 id*id+id $ shift
(2) 0 id 5 *id+id $ reduce by F→id
(3) 0 F 3 *id+id $ reduce by T→F
(4) 0 T 2 *id+id $ shift
(5) 0 T 2 * 7 id+id $ shift
(6) 0 T 2 * 7 id 5 +id $ reduce by F→id
(7) 0 T 2 * 7 F 10 +id $ reduce by T→T*F
(8) 0 T 2 +id $ reduce by E→T
(9) 0 E 1 +id $ shift
(10) 0 E 1 + 6 id $ shift
(11) 0 E 1 + 6 id 5 $ reduce by F→id
(12) 0 E 1 + 6 F 3 $ reduce by T→F
(13) 0 E 1 + 6 T 9 $ reduce by E→E+T
(14) 0 E 1 $ accept
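The moves above can be replayed mechanically with a sketch of the table-driven LR driver, hard-coding the ACTION/GOTO table from the earlier slide (productions numbered 1..6 as in that table; the encoding as Python dictionaries is ours, not the slides').

```python
# Table-driven LR driver sketch; the stack holds only states,
# since grammar symbols are implied by the states pushed.
PRODS = {1: ("E", 3), 2: ("E", 1), 3: ("T", 3),    # prod number -> (head, |body|)
         4: ("T", 1), 5: ("F", 3), 6: ("F", 1)}
ACTION = {}
for st in (0, 4, 6, 7):                            # states that shift id and (
    ACTION[(st, "id")] = ("s", 5)
    ACTION[(st, "(")] = ("s", 4)
ACTION[(1, "+")] = ("s", 6); ACTION[(1, "$")] = ("acc", 0)
ACTION[(8, "+")] = ("s", 6); ACTION[(8, ")")] = ("s", 11)
ACTION[(2, "*")] = ("s", 7); ACTION[(9, "*")] = ("s", 7)
for st, prod in ((2, 2), (3, 4), (5, 6), (9, 1), (10, 3), (11, 5)):
    for a in "+*)$":
        ACTION.setdefault((st, a), ("r", prod))    # reduce rows (keep shifts)
GOTO = {(0, "E"): 1, (0, "T"): 2, (0, "F"): 3, (4, "E"): 8, (4, "T"): 2,
        (4, "F"): 3, (6, "T"): 9, (6, "F"): 3, (7, "F"): 10}

def lr_parse(tokens):
    states, buf, output = [0], list(tokens) + ["$"], []
    while True:
        act = ACTION.get((states[-1], buf[0]))
        if act is None:
            return None                     # blank cell: syntax error
        kind, arg = act
        if kind == "s":                     # shift: consume token, push state
            buf.pop(0)
            states.append(arg)
        elif kind == "r":                   # reduce: pop |body| states, then goto
            head, size = PRODS[arg]
            del states[len(states) - size:]
            states.append(GOTO[(states[-1], head)])
            output.append(arg)              # record the production used
        else:
            return output                   # accept: list of reductions made
```

On id*id+id this performs the same reductions as the trace: F→id, T→F, F→id, T→T*F, E→T, F→id, T→F, E→E+T.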
Exercise
Grammar: 1. S→AA  2. A→aA  3. A→b
 Will the input string “aabb$” be accepted by the above grammar?

       Action          Goto
State  a   b   $       A   S
0      s3  s4          2   1
1              acc
2      s3  s4          5
3      s3  s4          6
4      r3  r3  r3
5      r1  r1  r1
6      r2  r2  r2
Types of LR Parsers
 LR(0)
 SLR(1) – Simple LR
 LALR(1) - Look Ahead LR
 CLR(1) – Canonical LR
LR Parser Types
 There are three widely used algorithms available for
constructing an LR parser:
 SLR(1) – Simple LR Parser:
 Works on smallest class of grammar
 Few number of states, hence very small table
 Simple and fast construction
 LR(1) – LR Parser:
 Works on complete set of LR(1) Grammar
 Generates large table and large number of states
 Slow construction
 LALR(1) – Look-Ahead LR Parser:
 Works on intermediate size of grammar
 Number of states are same as in SLR(1)
LR(0) PARSING TABLE
CONSTRUCTION
LR(0) Parsing Table Construction
 LR(0) Item/Item
 Item sets/Canonical LR(0)
 LR(0) State
 Augmented Grammar
 Closure Operation
 Goto Operation
LR(0) Item
 An LR(0) item (item for short)
 is a production of a grammar G with a dot at some position
of the right side
 For example
 the rule E → E + B has the following four corresponding
items:
 E→•E+B
 E→E•+B
 E→E+•B
 E→E+B•
 Rules of the form A → ε have only a single item A → •
 An item indicates which part of a production we have
seen so far and what we hope to see next.
Item Set

 An LR(0) state is a set of Items

 To construct the canonical LR(0) collection for a


grammar,
 we define an augmented grammar and

 two functions: closure and goto


Augmented Grammar
 If G is a grammar with start symbol S, then G‟, the
augmented grammar for G, is G with a new start
symbol S‟ and production S‟→S
 The purpose of this new starting production is to
indicate the parser when it should stop parsing and
announce the acceptance of the input, i.e.,
acceptance occurs when and only when the parser is
about to reduce S‟→S E’→E
G E→E+T G‟ E→E+T
E→T E→T
T→T*F T→T*F
T→F T→F
F→(E) F→(E)
F→id F→id
The Closure Operation
 The closure of an item set written as closure(I)
where I is an item set
 If I is a set of items of G, then closure(I) is the set of
items constructed by two rules:
 Initially,
every item in I is added to closure(I)
 If A→α.Bβ is in closure(I) and B→γ is a production, then add
B→.γ to closure(I), if it is not already there; repeat until no
more items can be added.
 It is these closed item sets that are taken as the
states of the parser
The Closure Operation
 If I is the set of one item: {[E‟→.E]}, then closure(I)
contains the items:
E‟→.E
E→.E+T
E→.T
T→.T*F
T→.F
F→.(E)
F→.id
The Goto Operation

 The second useful function is goto(I, X) where I is a


set of items and X is a grammar symbol

 goto(I, X) is defined as the closure of all items


[A→αX.β] such that [A→α.Xβ] is in I
The Goto Operation
 If I is the set of two items {[E‟→E.], [E→E.+T]}, then
goto(I, +) consists of
E→E+.T
T→.T*F
T→.F
F→.(E)
F→.id
Sets-of-Items Construction (Canonical LR(0) Collection)
 The algorithm to construct C, the canonical collection
of sets of LR(0) items for an augmented grammar
G‟ is
procedure items(G‟);
begin
C := {closure({[S'→.S]})}
repeat
for each set of items I in C and each grammar symbol X
such that goto(I, X) is not empty and not in C do
add goto(I, X) to C
until no more sets of items can be added to C
end
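The procedure above can be sketched in code: closure, goto, and the worklist that grows C until no new item set appears. Items are represented as (head, body, dot-position) triples; this representation is our own choice for illustration.

```python
# Sets-of-items construction sketch for the augmented expression grammar.
GRAMMAR = {
    "E'": [("E",)],
    "E":  [("E", "+", "T"), ("T",)],
    "T":  [("T", "*", "F"), ("F",)],
    "F":  [("(", "E", ")"), ("id",)],
}
SYMBOLS = {"E", "T", "F", "+", "*", "(", ")", "id"}

def closure(items):
    items = set(items)
    changed = True
    while changed:
        changed = False
        for head, body, dot in list(items):
            if dot < len(body) and body[dot] in GRAMMAR:   # dot before nonterminal B
                for gamma in GRAMMAR[body[dot]]:
                    item = (body[dot], gamma, 0)           # add B -> . gamma
                    if item not in items:
                        items.add(item)
                        changed = True
    return frozenset(items)

def goto(I, X):
    # move the dot over X in every item of the form [A -> alpha . X beta]
    return closure({(h, b, d + 1) for h, b, d in I if d < len(b) and b[d] == X})

def items():
    C = [closure({("E'", ("E",), 0)})]     # I0 = closure({[E' -> . E]})
    for I in C:                            # worklist: C grows while we iterate
        for X in SYMBOLS:
            J = goto(I, X)
            if J and J not in C:
                C.append(J)
    return C
```

For this grammar the construction yields the twelve sets I0 through I11 listed on the following slides.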
LR(0) Canonical Construction
Grammar G': E'→E, E→E+T, E→T, T→T*F, T→F, F→(E), F→id
 Example: Construction of the sets of items for the augmented grammar G' above
 I0 = {[E'→.E], [E→.E+T], [E→.T], [T→.T*F], [T→.F], [F→.(E)], [F→.id]}

 I1 = goto(I0, E) = {[E‟→E.], [E→E.+T]}

 I2 = goto(I0, T) = {[E→T.], [T→T.*F]}

 I3 = goto(I0, F) = {[T→F.]}

 I4 = goto(I0, () = {[F→(.E)], [E→.E+T], [E→.T], [T→.T*F], [T→.F], [F→.(E)],


[F→.id]}

 I5 = goto(I0, id) = {[F→id.]}

 I6 = goto(I1, +) = {[E→E+.T], [T→.T*F], [T→.F], [F→.(E)], [F→.id]}

 I7 = goto(I2, *) = {[T→T*.F], [T→.F], [F→.(E)], [F→.id]}


LR(0) Canonical Construction
 I8 = goto(I4, E) = {[F→(E.)], [E→E.+T]}
goto(I4, () = I4;
goto(I4, id) = I5;
 I9 = goto(I6, T) = {[E→E+T.], [T→T.*F]}
goto(I6, F) = I3;
goto(I6, () = I4;
goto(I6, id) = I5;
 I10 = goto(I7, F) = {[T→T*F.]}
goto(I7, () = I4;
goto(I7, id) = I5;
 I11= goto(I8, )) = {[F→(E).]}
goto(I8, +) = I6;
goto(I9, *) = I7;
Exercise

 Construct the LR(0) Canonical Set for the


following grammar
 S →( L )
 S→x
 L→S
 L→L,S

You can find the answer worked out a few slides below


LR(0) Table Construction
 If state i contains S‟→ S.
 Then
 table[i,$] = accept
 If state i contains rule k: X →Y.
 Then
 table[i,T] = rk for all terminals T
 For goto(Ii, T) = Ij OR Transition from i to j marked with T
 Then
 table[i,T] = sj
 For goto(Ii, X) = Ij OR Transition from i to j marked with X
 Then
 table[i,X] = j
Exercise

 Construct an LR(0) Parsing Table for


the following Grammar
S → AA
A → aA
A →b
Answer for the earlier exercise
S' → S
S → ( L )
S → x
L → S
L → L , S
Start state (closure of [S' ::= . S $]):
S' ::= . S $
S ::= . ( L )
S ::= . x
(The collection is built incrementally from this start state: each new state
is the closure of a goto on one grammar symbol, adding transitions on (, x,
S, L, "," and ) one at a time. The completed automaton, with states
numbered, follows.)
Grammar:
S' → S
S → ( L )
S → x
L → S
L → L , S
Assigning numbers to states, the completed LR(0) automaton is:
State 1: S' ::= . S $ | S ::= . ( L ) | S ::= . x   (on ( to 3, on x to 2, on S to 4)
State 2: S ::= x .
State 3: S ::= ( . L ) | L ::= . S | L ::= . L , S | S ::= . ( L ) | S ::= . x
         (on ( to 3, on x to 2, on S to 7, on L to 5)
State 4: S' ::= S . $
State 5: S ::= ( L . ) | L ::= L . , S   (on ) to 6, on , to 8)
State 6: S ::= ( L ) .
State 7: L ::= S .
State 8: L ::= L , . S | S ::= . ( L ) | S ::= . x   (on ( to 3, on x to 2, on S to 9)
State 9: L ::= L , S .
The LR(0) Parsing Table
S‟ → S
S →( L ) states ( ) x , $ S L
S→x 1 s3 s2 g4
L→S
2 r2 r2 r2 r2 r2
L→L,S
3 s3 s2 g7 g5
4 a
5 s6 s8
6 r1 r1 r1 r1 r1
7 r3 r3 r3 r3 r3
8 s3 s2 g9
9 r4 r4 r4 r4 r4
Construct the LR(0) parsing table for G

 S→ AA
 A→aA/b

parse the input
aabb$
LR(0)
 LR(0) doesn't look ahead at all
 the next terminal is used only to pick the next state to go to,
not to decide whether to shift or reduce
 Ignoring the look-ahead, each row of the parsing table collapses to a
single action, e.g. for the parenthesis/list grammar above
(input ( x , x ) $, stack 1 ( 3 L 5):

State  no look-ahead  S   L
1      shift          g4
2      reduce 2
3      shift          g7  g5

 Example:
 S→X
 X→a
 X→ab
SLR(1) PARSING TABLE
CONSTRUCTION
SLR(1) Parse Table Generation

 SLR( k): Simple LR with k tokens of lookahead

 For reductions, consider the next token


 When does this reduction make sense?

 Only if it can lead to a correct parse!

 Can lead to a correct parse if next token is in FOLLOW


SET
SLR(1) Parse Table Generation

 Generate the collection of item sets as normal for LR(0)

 When generating table actions:


 Reduce only if next token is in FOLLOW(LHS)

 This method is the weakest, in terms of number of


grammars it succeeds for, but it is also the simplest to
implement
SLR(1) Table Construction
 If state i contains S‟→ S.
 Then
 table[i,$] = accept
 If state i contains rule k: X →Y.
 Then
 table[i,T] = rk for all terminals T in follow of X
 For goto(Ii, T) = Ij OR Transition from i to j marked with
T(terminal)
 Then
 table[i,T] = sj
 For goto(Ii, X) = Ij OR Transition from i to j marked with X(non-
terminal)
 Then
 table[i,X] = j
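Since SLR(1) gates every reduction on FOLLOW(LHS), it is worth seeing how FOLLOW sets are computed. The sketch below is simplified to grammars without ε-productions (so FIRST of a symbol string is FIRST of its first symbol), which is enough for the pointer-assignment grammar S'→S, S→L=R | R, R→L, L→*R | id used a few slides below; the function names are ours.

```python
# FOLLOW-set computation sketch (no epsilon-productions assumed).
GRAMMAR = {
    "S'": [("S",)],
    "S":  [("L", "=", "R"), ("R",)],
    "R":  [("L",)],
    "L":  [("*", "R"), ("id",)],
}

def first_sets(grammar):
    first = {n: set() for n in grammar}
    changed = True
    while changed:
        changed = False
        for head, prods in grammar.items():
            for body in prods:
                s = body[0]                          # no epsilon: only first symbol matters
                f = first[s] if s in grammar else {s}
                if not f <= first[head]:
                    first[head] |= f
                    changed = True
    return first

def follow_sets(grammar, start="S'"):
    first = first_sets(grammar)
    follow = {n: set() for n in grammar}
    follow[start].add("$")                           # end marker follows the start symbol
    changed = True
    while changed:
        changed = False
        for head, prods in grammar.items():
            for body in prods:
                for i, sym in enumerate(body):
                    if sym not in grammar:
                        continue                     # terminals need no FOLLOW
                    if i + 1 < len(body):
                        nxt = body[i + 1]            # A -> ... B beta: FIRST(beta)
                        f = first[nxt] if nxt in grammar else {nxt}
                    else:
                        f = follow[head]             # A -> ... B: FOLLOW(A) into FOLLOW(B)
                    if not f <= follow[sym]:
                        follow[sym] |= f
                        changed = True
    return follow
```

For this grammar it reproduces the sets quoted on the table slide: FOLLOW(S) = {$}, FOLLOW(L) = FOLLOW(R) = {=, $}.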
Construct the SLR(1) parsing table for G

 S→ X
 X→a
 X→ab
Grammar:
S'→S
S→L=R
S→R
R→L
L→*R
L→id
SLR(1) Set of Items collection:
I0: [S'→.S], [S→.L=R], [S→.R], [R→.L], [L→.*R], [L→.id]
I1: [S'→S.]
I2: [S→L.=R], [R→L.]
I3: [S→R.]
I4: [L→*.R], [R→.L], [L→.*R], [L→.id]
I5: [L→id.]
I6: [S→L=.R], [R→.L], [L→.*R], [L→.id]
I7: [L→*R.]
I8: [R→L.]
I9: [S→L=R.]
Transitions: goto(I0,S)=I1, goto(I0,L)=I2, goto(I0,R)=I3, goto(I0,*)=I4,
goto(I0,id)=I5, goto(I2,=)=I6, goto(I4,R)=I7, goto(I4,L)=I8, goto(I6,R)=I9,
goto(I6,L)=I8, goto(I4,*)=goto(I6,*)=I4, goto(I4,id)=goto(I6,id)=I5
SLR(1) Parsing Table
Grammar: S'→S, S→L=R, S→R, R→L, L→*R, L→id
FOLLOW(S) = {$}, FOLLOW(L) = {$, =}, FOLLOW(R) = {$, =}

State  =      *   id  $    S  L  R
0             s4  s5       1  2  3
1                     acc
2      s6/r3          r3
3                     r2
4             s4  s5          8  7
5      r5             r5
6             s4  s5          8  9
7      r4             r4
8      r3             r3
9      r1

 Since there is a shift/reduce conflict (state 2 on =), G is not an SLR(1) grammar
LR(1) PARSING TABLE
CONSTRUCTION
LR(1)
 Lets examine why the above error occurred
 Because

 the states produced using LR(0) items do not hold


enough information

 Solution

 Using an LR(1) items that has a form of


 [A→α.β, a] where a is a terminal or $
LR(1) items
 Items will keep info on
 the production,
 the right-hand-side position (the dot), and
 the look-ahead symbol
 The LR(1) item [A→α•β, T] means
 the parser has parsed an α
 if it parses a β and the next symbol is T
 then the parser should reduce by A→αβ
LR(1) items

 LR(1) automata are identical to LR(0) except for

the “items” that make up the states


 LR(0) items:

X → s1. s2

 LR(1) items look-ahead symbol added

X → s1. s2, T
LR(1) Closure Operation
 I is a set of LR(1) items, then closure(I) is found using
the following algorithm:
Closure(I)
repeat for all items [A →α•Xβ, c] in I
for any production X → γ
for any d ∈First(βc)
I = I ∪{ [X →• γ, d] }
until I does not change
LR(1) Closure Operation
 If I is the set with the single LR(1) item {[S'→.S, $]}, then
closure(I) contains the items:
(Grammar: S'→S, S→L=R, S→R, R→L, L→*R, L→id)
{
 [S'→.S, $]
 [S→.L=R, $]
 [S→.R, $]
 [L→.*R, =]
 [L→.id, =]
 [R→.L, $]
 [L→.*R, $]
 [L→.id, $]
}
FIRST($) = {$}, FIRST(=R$) = {=}
LR(1) Goto Operation

 The second useful function is goto(I, X) where I is a


set of items and X is a grammar symbol
Goto(I, X)
J={}
for any item [A →α•Xβ, c] in I
J = J ∪ closure{[A →αX•β, c]}
return J
LR(1) Goto Operation
 If I is the set of one item { [L→.*R, =/$]},
then goto(I, *) consists of
S’→S
{ S→L=R
S→R
[L→*.R, =/$], R→L
L→*R
[L→.*R, =/$], L→id
[L→.id, =/$],
[R→.L, =/$] FIRST($) = {$}
} FIRST(=R$) = {=}
LR(1) Sets-of-Items Construction
 Start with the item I0 = {S‟ →•S, $}
 Find the closure of I0
 Pick a state I
for each item [A→α•X β, c] in I
find Goto(I, X)
if Goto(I, X) is not already a state, make one
Repeat until no more additions possible
LR(1) Set of items
 Construct the LR(1) set of items for the following
grammar G
0) S' –> S
1) S –> XX
2) X –> aX
3) X –> b
LR(1) Set of items
0) S' –> S I0: S' –> •S, $ I4: X –> b•, a/b
1) S –> XX S –> •XX, $
2) X –> aX X –> •aX, a/b I5: S –> XX•, $
3) X –> b X –> •b, a/b
I6: X –> a•X, $
I1: S' –> S•, $ X –> •aX, $
X –> •b, $
I2: S –> X•X, $
X –> •aX, $ I7: X –> b•, $
X –> •b, $
I8: X –> aX•, a/b
I3: X –> a•X, a/b
X –> •aX, a/b I9: X –> aX•, $
X –> •b, a/b
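The item sets above come out of the LR(1) closure, which propagates lookaheads via FIRST(βc). A sketch, with items as (head, body, dot, lookahead) tuples and FIRST sets precomputed by hand for this grammar (both are our own framing):

```python
# LR(1) closure sketch for S' -> S, S -> XX, X -> aX | b.
GRAMMAR = {"S'": [("S",)], "S": [("X", "X")], "X": [("a", "X"), ("b",)]}
FIRST = {"S'": {"a", "b"}, "S": {"a", "b"}, "X": {"a", "b"}}  # hand-computed

def first_of(rest, la):
    # FIRST(beta c) for an item [A -> alpha . B beta, c]; no epsilon here
    if not rest:
        return {la}
    return FIRST[rest[0]] if rest[0] in GRAMMAR else {rest[0]}

def closure1(items):
    items = set(items)
    changed = True
    while changed:
        changed = False
        for head, body, dot, la in list(items):
            if dot < len(body) and body[dot] in GRAMMAR:      # dot before B
                for gamma in GRAMMAR[body[dot]]:
                    for d in first_of(body[dot + 1:], la):
                        item = (body[dot], gamma, 0, d)       # [B -> . gamma, d]
                        if item not in items:
                            items.add(item)
                            changed = True
    return items

I0 = closure1({("S'", ("S",), 0, "$")})
```

I0 matches the slide: S'→•S with lookahead $, S→•XX with $, and X→•aX, X→•b each with lookaheads a and b (six items in total).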
LR(1) Table Construction
 To fill in the entries in the action and goto tables, we use a similar
algorithm as we did
 for SLR(1), but instead of assigning reduce actions using the follow set,
we use the specific lookaheads.
 If state i contains S‟→ S.
 Then table[i,$] = accept
 If state i contains rule k: [X →Y., a]
 Then table[i,a] = rk
 For goto(Ii, T) = Ij OR Transition from i to j marked with
T(terminal)
 Then table[i,T] = sj
 For goto(Ii, X) = Ij OR Transition from i to j marked with X(non-
terminal)
 Then table[i,X] = j
LR(1) parsing table
0) S' –> S
1) S –> XX
2) X –> aX
3) X –> b
LALR(1)
 Reading Assignment!
COMPILER DESIGN
CHAPTER FOUR
SEMANTIC ANALYSIS:
SYNTAX DIRECTED TRANSLATIONS

Compiled By: Seble N.


A Step-Back

CHAPTER TWO
•Strings
•Regular expressions
•Tokens
•Transition diagrams
•Finite Automata
A Step-Back

CHAPTER THREE
•Grammars
•Derivations
•Parse-trees
•Top-down parsing (LL)
•Bottom-up paring (LR, SLR,LALR)
We Need Some Tools
 To help in semantic analysis
 To help in intermediate code generation
 Two such tools
 Semantic rules (Syntax-Directed Definitions)
 Semantic actions (Syntax Directed Translations)
What does a semantic analyzer do?
 Semantic analysis judges whether the syntax structure
constructed in the source program derives any meaning or
not.
 For Example
 int a = “value”;
 should not issue an error in lexical and syntax analysis phase, as it is
lexically and structurally correct, but it should generate a semantic
error as the type of the assignment differs
 The following tasks should be performed in semantic
analysis:
 Scope resolution
 Type checking
 Array-bound checking
Semantic Errors
 These are some of the semantics errors that the
semantic analyzer is expected to recognize:
 Type mismatch
 Undeclared variable

 Reserved identifier misuse

 Multiple declaration of variable in a scope.

 Accessing an out of scope variable.

 Actual and formal parameter mismatch.


Syntax Directed Translation
 Syntax Directed Translation
 Refers to a method of compiler implementation where the source
language translation is completely driven by the parser
 The parsing process and parse trees are used to direct semantic
analysis and the intermediate code translation of the source
program
 This can be a separate phase of a compiler or we can augment
the CFG with information to control the semantic analysis and
translation process. Such grammars are called attributed
grammars.
Syntax Directed Definitions

 are a generalization of context-free grammars in which:


 Grammar symbols have an associated set of Attributes;
 Productions are associated with Semantic Rules for computing
the values of attributes.

 Such formalism generates Annotated Parse-Trees where


each node of the tree is a record with a field for each
attribute
Attributed CFG Example
 Attributed grammar that calculates the value of an
expression,
Syntax Rules Semantic Rules
 E→E1 + T E.v = E1.v + T.v
 E→T E.v = T.v
 T→T1 * F T.v = T1.v * F.v
 T→F T.v = F.v
 F→id F.v = id.value
 F→(E) F.v = E.v
Annotated Parse Tree Example
Attributed CFG Example
 Attributed grammar that associates to an identifier
its type
Syntax Rules Semantic Rules
 D→T L L.in = T.type
 T→real T.type = real
 L→ident ident.type = L.in
 L1→L2, ident L2.in = L1.in
ident.type = L1.in
Types of Attributes

 Synthesized Attributes
 Attribute of a node is defined in terms of:
 Attribute values at children of the node

 Attribute value at node itself

 SDD involving only synthesized attributes is called


S-attributed Definition
Types of Attributes
 Inherited Attributes
 Attribute of a node is defined in terms of:
 Attribute values at parent of the node
 Attribute values at left siblings
 Attribute value at node itself

 SDD involving both synthesized and inherited attributes is


called L-attributed Definition
 NB:
 Terminals can have synthesized attributes, but not inherited
attributes
S-attributed Grammar Example

 Definition
 An S-Attributed Definition is a Syntax Directed
Definition that uses only synthesized attributes.

 Evaluation Order
 Semantic rules in an S-Attributed Definition can be
evaluated by a bottom-up, or PostOrder, traversal of
the parse-tree
S-attributed Grammar Example
 Example of an S-attributed grammar:
Syntax Rules Semantic Rules
L → En print(E.val)
 E → E1 + T E.val = E1.val + T.val
E→ T E.val = T.val
 T → T1* F T.val = T1.val * F.val
T→ F T.val = F.val
 F → (E) F.val = E.val
 F → digit F.val =digit.lexval
S-attributed Grammar Example
 The annotated parse-tree for the input 3*5+4n
from the above S-attributed grammar is:
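The root value for 3*5+4 (which is 19) can be cross-checked with a sketch of a recursive-descent evaluator that applies the same semantic rules, each function returning the synthesized .val of its non-terminal. Single-character tokens (digits 0..9, +, *, parentheses) are assumed; the slides' grammar does not fix a tokenizer.

```python
# S-attributed evaluation sketch: E.val, T.val, F.val computed bottom-up.
def evaluate(expr):
    tokens, pos = list(expr), [0]
    def peek():
        return tokens[pos[0]] if pos[0] < len(tokens) else None
    def eat():
        pos[0] += 1
        return tokens[pos[0] - 1]
    def E():                          # E -> E1 + T : E.val = E1.val + T.val
        val = T()
        while peek() == "+":
            eat()
            val += T()
        return val
    def T():                          # T -> T1 * F : T.val = T1.val * F.val
        val = F()
        while peek() == "*":
            eat()
            val *= F()
        return val
    def F():                          # F -> ( E ) | digit
        if peek() == "(":
            eat()
            val = E()
            eat()                     # consume ")" ; F.val = E.val
            return val
        return int(eat())             # F.val = digit.lexval
    return E()
```

`evaluate("3*5+4")` returns 19, the value at the root of the annotated parse tree; it also answers the exercise on the next slide, since `evaluate("(3+4)*(5+6)")` is 77.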
S-attributed Grammar Example
 Exercises
 Give the annotated parse tree of (3+4)*(5+6)n from
the following grammar
Syntax Rules Semantic Rules
 L → En print(E.val)
 E → E1 + T E.val = E1.val + T.val
 E → T E.val = T.val
 T → T1* F T.val = T1.val * F.val
 T → F T.val = F.val
 F → (E) F.val = E.val
 F → digit F.val =digit.lexval
L-attributed Grammars

 Definition
 An L-Attributed Definition is a Syntax Directed
Definition that uses both synthesized and Inherited
attributes.
 It
is always possible to rewrite a syntax directed
definition to use only synthesized attributes,
 Evaluation Order.
 Inherited
attributes cannot be evaluated by a simple
PreOrder traversal of the parse-tree:
L-attributed Grammar Example
 An L-Attributed grammar that associates to an identifier
its type
PRODUCTION SEMANTIC RULE
 D → TL L.in := T.type
 T → int T.type :=integer
 T → real T.type :=real
 L → L1, id L1.in := L.in; addtype(id.entry, L.in)
 L → id addtype(id.entry, L.in)
L-attributed Grammar Example
 The annotated parse-tree for the input real id1, id2, id3 from the
above L-attributed grammar is:

 L.in is then inherited top-down the tree by the other L-nodes.


 At each L-node the procedure addtype inserts into the symbol table
the type of the identifier.
L-attributed Grammar Example
 Example of another attributed grammar:
/* We have three attributes defined for this grammar */
Syntax Rules Semantic Rules
 N→L1 . L2 N.v = L1.v + L2.v
L1.s = 0
L2.s = -L2.x
 L1→L2 B L1.x = L2.x + 1
L2.s = L1.s + 1
B.s = L1.s
L1.v = L2.v + B.v
 L→B L.v = B.v
L.x = 1
B.s = L.s
 B→0 B.v = 0
 B→1 B.v = 1 * 2^B.s
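The value these rules assign to a binary numeral can be cross-checked directly: the rules give the i-th bit left of the point exponent i (counting from 0 at the right end, via L1.s = 0) and the j-th bit right of the point exponent -j (via L2.s = -L2.x). A sketch computing the same value arithmetically (the helper name is ours):

```python
# Value of a binary numeral with a fraction, e.g. "101.101" -> 5.625.
def binary_value(s):
    left, _, right = s.partition(".")
    # bits left of the point: exponents len-1 .. 0
    value = sum(int(b) * 2 ** i for i, b in enumerate(reversed(left)))
    # bits right of the point: exponents -1, -2, ...
    value += sum(int(b) * 2 ** -(j + 1) for j, b in enumerate(right))
    return value
```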
L-attributed Grammars
 Identify the non-L-attributed production
 Example:
 A→L M L.h = f1(A.h)
M.h = f2(L.s)
A.s = f3(M.s)
 Example:
 A→Q R R.h = f4(A.h)
Q.h = f5(R.s)
A.s = f6(Q.s)
 Here A→Q R is not L-attributed: Q.h depends on R.s, a synthesized
attribute of the sibling to its right.
Implementing SDD

 Dependency Graphs

 S-Attributed Definitions

 L-Attributed Definitions
Dependency Graph

 Implementing a SDD consists primarily in finding an


order for the evaluation of attributes

 The attributes should be evaluated in a given order


because they depend on one another

 The dependency of the attributes is represented by


a dependency graph
Inter-dependency of Attributes
 A Dependency Graph shows the interdependencies among
the attributes of the various nodes of a parse-tree

 Dependency Graphs are the most general technique used


to evaluate SDD with both synthesized and inherited
attributes.
 There is an edge from node T(j) to node E(i) if and only if there
exists a semantic rule such as E(i) := f (... T(j) ...)
Building Dependency Graph
 Algorithm for the construction of the dependency
graph
 for each node n in the parse tree do
 for each attribute a of the grammar symbol at n do
 Construct a node in the dependency graph for a
 for each node n in the parse tree do
 for each semantic rule b := f (c1, c2, . . ., ck) associated with
the production used at n do
 for i := 1 to k do
 Construct an edge from the node for ci to the node for b;
Dependency Graph
 Build the dependency graph for the parse-tree of
real id1, id2, id3
Example
The dependency graph for the parse-tree
of real id1, id2, id3
Evaluation Order
 The evaluation order of semantic rules depends from a
Topological Sort derived from the dependency graph
 A topological sort
 of a directed acyclic graph is any ordering m1, m2, . . ., mk
of the nodes of the graph such that edges go from nodes
earlier in the ordering to later nodes.
 i.e., if mi→mj is an edge from mi to mj then mi appears before mj
in the ordering
 Any topological sort of a dependency graph gives a
valid order to evaluate the semantic rules.
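A topological sort of the dependency graph can be sketched with Kahn's algorithm: repeatedly emit a node whose predecessors have all been emitted. The node and edge encoding below is ours; edges run from a used attribute to the attribute computed from it.

```python
from collections import deque

def topological_sort(nodes, edges):
    # Kahn's algorithm: edges is a list of (from, to) pairs.
    indeg = {n: 0 for n in nodes}
    succ = {n: [] for n in nodes}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    queue = deque(n for n in nodes if indeg[n] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if len(order) != len(nodes):
        raise ValueError("dependency graph has a cycle")
    return order
```

Any order it returns is a valid evaluation order; on the real id1, id2, id3 graph it yields the sequence shown on the next slide (a4 before a5, a5 before a7, and so on).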
Evaluation Order

 a4 = real;
 a5 = a4;
 addtype(id3.entry, a5);
 a7 = a5;
 addtype(id2.entry, a7);
 a9 = a7;
 addtype(id1.entry, a9);
Evaluating Semantic Rules

 Parse Tree methods


 At compile time evaluation order obtained from the topological sort of
dependency graph.
 Fails if dependency graph has a cycle
 Rule Based Methods
 Semantic rules analyzed by hand or specialized tools at compiler
construction time
 Order of evaluation of attributes associated with a production is pre-
determined at compiler construction time
 For this method, the dependency graph need not be constructed
 Oblivious Methods
 Evaluation order is chosen without considering the semantic rules.
 Restricts the class of syntax directed definitions that can be
implemented.
Implementing Attribute Evaluation
 Attributes can be evaluated by building a dependency
graph at compile-time and then finding a topological sort.
 Disavantages
 1. This method fails if the dependency graph has a cycle: We
need a test for non-circularity;
 2. This method is time consuming due to the construction of the
dependency graph.
 Alternative Approach.
 Design the SDD in such a way that attributes can be evaluated
with a fixed order avoiding to build the dependency graph
(method followed by many compilers).
Strongly Non-Circular Syntax
Directed Definitions
 Formalisms for which an attribute evaluation order
can be fixed at compiler construction time

 Two kinds of strictly non-circular definitions:


 S-Attributed and

 L-Attributed Definitions
Evaluation of S-Attributed Definitions

 Synthesized Attributes can be evaluated by a bottom-


up parser as the input is being analyzed avoiding the
construction of a dependency graph.
 The parser keeps the values of the synthesized attributes in
its stack.
 Whenever a reduction A → α is made, the attribute for A is
computed from the attributes of α which appear on the
stack.
Bottom-Up Evaluation -- Example
• At each shift of digit, we also push digit.lexval onto the val-stack.
Input    Stack    Val    Semantic rule
5+3*4n - -
+3*4n 5 5
+3*4n F 5 F → digit
+3*4n T 5 T→F
+3*4 n E 5 E→T
3*4n E+ 5-
*4 n E+3 5-3
*4n E+F 5-3 F → digit
*4n E+T 5-3 T→F
4n E+T* 5-3-
n E+T*4 5-3-4
n E+T*F 5-3-4 F → digit
n E+T 5-12 T → T1 * F
n E 17 E → E1 + T
En 17- L→En
L 17
Evaluation of L-Attributed Definitions
 The following procedure evaluates L-Attributed Definitions by
mixing PostOrder (synthesized) and PreOrder (inherited)
traversals.
 Algorithm: L-Eval(n: Node)
 Input: Node of an annotated parse-tree.
 Output: Attribute evaluation.
 Begin
 For each child m of n, from left-to-right Do
 Begin
 Evaluate inherited attributes of m;
 L-Eval(m)
 End;
 Evaluate synthesized attributes of n
 End.
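The L-Eval traversal above can be sketched on the declaration grammar D → T L, T → real, L → L1 , id | id: inherited attributes of a child are computed before descending into it, synthesized ones after all children return. The Node class and the two rule hooks are our own framing of the algorithm, not from the slides.

```python
# L-Eval sketch: evaluate real id1, id2, id3 on its parse tree.
class Node:
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)
        self.attrs = {}

def l_eval(node, inherit, synthesize):
    for m in node.children:
        inherit(node, m)              # inherited attributes of m, before descent
        l_eval(m, inherit, synthesize)
    synthesize(node)                  # synthesized attributes of n, after children

types = {}                            # stands in for the symbol table

def inherit(parent, child):
    if parent.label == "D" and child.label == "L":
        child.attrs["in"] = parent.children[0].attrs["type"]   # L.in = T.type
    elif parent.label == "L" and child.label == "L":
        child.attrs["in"] = parent.attrs["in"]                 # L1.in = L.in

def synthesize(node):
    if node.label == "T":
        node.attrs["type"] = node.children[0].label            # T -> real
    elif node.label == "L":
        # rightmost child is the id leaf in both L -> L1 , id and L -> id
        types[node.children[-1].label] = node.attrs["in"]      # addtype(id, L.in)

# parse tree for: real id1, id2, id3
tree = Node("D", [
    Node("T", [Node("real")]),
    Node("L", [Node("L", [Node("L", [Node("id1")]), Node("id2")]),
               Node("id3")]),
])
l_eval(tree, inherit, synthesize)
```

After the traversal, all three identifiers are recorded with type real, just as in the annotated parse tree.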
COMPILER DESIGN
CHAPTER FIVE
TYPE CHECKING

Compiled By: Seble N.


Introduction

 A compiler must check if the source program follows


semantic conventions of the source language. This is
called static checking

 Dynamic Checks
 are executed during execution of the program
Examples of static checks
 Type Checks
 Checks if an operator has the right type of operands
 Flow-of-Control Checks
 Statements that cause flow of control to leave a construct must
have some place to which to transfer the flow of control
 For example, a break instruction in C that is not inside an
enclosing loop or switch statement
 Uniqueness Checks
 There are situations in which an object must be defined exactly
once
 For example, in Pascal, an identifier must be declared uniquely
 Name-Related Checks
 Sometimes, the same name must appear two or more times.
 For example, in Ada, a loop or block may have a name that appears
at the beginning and end of the construct.
Examples of Dynamic Checks
 Array Out of Bound
int [] a = new int [10];
for (int i=0; i<20; ++i)
a[i] = i;
 Division by zero
float a=50;
for (int i=0; i<5; ++i)
a= a/i;
Type Checking Introduction
 A compiler must check that a program follows the Type Rules
of a language.
 The Type Checker is a module of a compiler devoted to type
checking tasks
 Examples of Tasks
 The operator mod is defined only if the operands are integers;
 Indexing is allowed only on an array and the index must be an
integer;
 A function must have a precise number of arguments and the
parameters must have a correct type;
 etc...
Type Checker
 Type checker verifies that the type of a construct
(constant, variable, array, list, object) matches what is
expected in its usage context
 E.g.
 int x=“abc”;
 Some operators (+,-,*,/) are “overloaded”; i.e, they
can apply to objects of different types
 Functions may be polymorphic; i.e, accept arguments of
different types.
 Type information produced by the type checker may be
needed when the code is generated
Types of Types
 Basic types
 are atomic types that have no internal structure as far as
the programmer is concerned
 They include types like integer, real, boolean, character , and
enumerated types

 Constructed types
 include arrays, records, sets, and structures constructed from
the basic types and/or other constructed types
 Pointers and functions are also constructed types
Type Checker

 The design of a Type Checker depends on

 the syntactic structure of language constructs (e.g. operator)

 the Type Expressions of the language (e.g. int,float, array)

 the rules for assigning types to constructs

 E.g. if both operands of + are int, result is int


Type Expressions

 The type of a language construct will be denoted by a


type expression

 A type expression is either a basic type or formed by


applying an operator called type constructor to the
type expression
1. A basic type is a type expression(e.g. int)
2. A type name is a type expression (e.g. ptr:*int, then x:ptr)
Type Expressions cont‟d
 A type constructor applied to a type expression is
a type expression. Constructors include:
 Arrays

 If I is an index set and T is a type expression, then array (I, T)


is a type expression
 Example: array[1..10] of int == array(10,int);
 Records
 The difference between products (tuples of types) and records is
that records have names for their fields.
 Pointer
 If T is a type expression, then pointer(T) is the type expression
“pointer to an object of type T”;
Type System
 Type System:
 Collection of rules for assigning type expressions to the various
part of a program

 Type Systems are specified using syntax directed definitions


 A type checker implements a type systems
 Sound type system : is one where any program that passes
the static type checker cannot contain run-time type errors.
Such languages are said to be strongly typed languages.
Specification of a Type System
 The syntax directed definition for associating a type
to an Identifier is:

 All the attributes are synthesized.


 Since P → D;E, all the identifiers will have their types saved in
the symbol table before type checking an expression E.
Specification of a Type System
 The syntax directed definition for associating a type
to an Expression is:
Specification of a Type System
 The syntax directed definition for associating a type
to a statement is:

 The type expression for a statement is either void or type error.


Equivalence of Type Expressions
 In the above SDDs we compared type expressions using the equality operator. However, such an operator is not defined except perhaps between basic types.
 A natural notion of equivalence is
 Structural equivalence
 Name equivalence
Structural Equivalence
 Two type expressions are structurally equivalent if and only if they are
 the same basic type
 E.g. integer is equivalent only to integer
 or formed by applying the same constructor to structurally equivalent types
 E.g. pointer (integer) is structurally equivalent to pointer (integer)
Name Equivalence
type
  ptr = *integer
var
  A : ptr;
  B : ptr;
  C : *integer;
  D, E : *integer;
 In some languages, types can be given names
 Do the variables A, B, C, D, E have the same type?
 The answer varies from implementation to implementation
 Name equivalence:
 We have name equivalence between two type expressions if and only if they are identical
 Structural equivalence:
 Names are replaced by the type expressions they define, so two types are structurally equivalent if they represent structurally equivalent type expressions when all names have been substituted
 For example
 ptr and *integer are not name equivalent but they are structurally equivalent
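The two notions can be made concrete with a small sketch (assumptions of this example, not from the slides: type expressions are modeled as nested Python tuples such as ('pointer', 'integer') for *integer, and TYPE_NAMES is a hypothetical table mapping each type name to the expression it defines):

```python
# Hypothetical name table: ptr abbreviates *integer
TYPE_NAMES = {'ptr': ('pointer', 'integer')}

def expand(t):
    """Replace type names by the expressions they define."""
    if isinstance(t, str) and t in TYPE_NAMES:
        return expand(TYPE_NAMES[t])
    if isinstance(t, tuple):
        return tuple(expand(part) for part in t)
    return t

def name_equivalent(s, t):
    # Name equivalence: the two expressions must be literally identical.
    return s == t

def structurally_equivalent(s, t):
    # Structural equivalence: identical after substituting all names.
    return expand(s) == expand(t)

# ptr vs *integer: not name equivalent, but structurally equivalent
print(name_equivalent('ptr', ('pointer', 'integer')))          # False
print(structurally_equivalent('ptr', ('pointer', 'integer')))  # True
```

The recursive expand implements the substitution step described above: once every name is replaced, structural equivalence reduces to plain tree equality.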
Type Conversion
 Example: What's the type of "x + y" if:
1. x is of type real;
2. y is of type int;
3. Different machine instructions are used for operations on reals and integers.
 Depending on the language, specific conversion rules must be adopted by the compiler to convert the type of one of the operands of +
 The type checker in a compiler can insert these conversion operators into the intermediate code.
 For example, an operator inttoreal can be inserted
 Such an implicit type conversion is called Coercion.
Type Coercion in Expressions
 The SDD for coercion from integer to real for a
generic arithmetic operation op is
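A minimal sketch of how a checker might insert the conversion (the helper name coerce_add, the temporary names u1, u2, and the emit/newtemp parameters are illustrative assumptions, not the slides' SDD):

```python
def coerce_add(t1, place1, t2, place2, emit, newtemp):
    """Type-check E1 + E2, coercing an int operand to real if needed."""
    if t1 == t2:                      # int+int or real+real: no conversion
        return t1, place1, place2
    if {t1, t2} == {'int', 'real'}:   # mixed: coerce the int operand
        tmp = newtemp()
        src = place1 if t1 == 'int' else place2
        emit(f'{tmp} := inttoreal {src}')
        if t1 == 'int':
            return 'real', tmp, place2
        return 'real', place1, tmp
    return 'type_error', place1, place2

code = []
temps = iter(['u1', 'u2'])
# x : real, y : int  -->  y is converted before the addition
rtype, left, right = coerce_add('real', 'x', 'int', 'y',
                                code.append, lambda: next(temps))
code.append(f't1 := {left} + {right}')
print(rtype)   # real
print(code)    # ['u1 := inttoreal y', 't1 := x + u1']
```

The inttoreal statement lands in the intermediate code exactly as the slide describes: the programmer never wrote it, the type checker inserted it.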
COMPILER DESIGN
CHAPTER SIX
INTERMEDIATE CODE GENERATION
Compiled By: Seble N.
Intermediate Code Generation
 In a compiler, the front end translates a source
program into an intermediate representation, and
the back end generates the target code from this
intermediate representation
 The advantages of using a machine-independent intermediate code (IC) are:
 retargeting to another machine is facilitated
 optimization can be done on the machine-independent code
Intermediate Languages
 Syntax trees,
 Postfix notations, and
 Three-address code
Syntax Tree
 A syntax tree (abstract tree) is a condensed form of a parse tree useful for representing language constructs

       E                    +
     / | \                 / \
    E  +  E               a   b
    |     |
    a     b

 a. Parse tree for a + b       b. Abstract tree for a + b
Postfix Notation
 The postfix notation is practical for an intermediate representation as the operands are found just before the operator
 In fact, the postfix notation is a linearized representation of a syntax tree
 e.g., 1 * 2 + 3 will be represented in the postfix notation as 1 2 * 3 +
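Since postfix is a linearized syntax tree, it can be produced by a post-order walk. A small sketch (trees modeled as nested (op, left, right) tuples, an assumption of this example):

```python
def postfix(node):
    """Post-order walk of a syntax tree: operands first, then the operator."""
    if isinstance(node, tuple):
        op, left, right = node
        return postfix(left) + postfix(right) + [op]
    return [str(node)]          # leaf: a literal or identifier

# syntax tree for 1 * 2 + 3:  (+ (* 1 2) 3)
tree = ('+', ('*', 1, 2), 3)
print(' '.join(postfix(tree)))  # 1 2 * 3 +
```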
Three-Address code
 The three address code is a sequence of statements of
the form:
X := Y op Z
 where: X, Y, and Z are names, constants or compiler-generated
temporaries
 op is an operator such as integer or floating point arithmetic
operator or logical operator on Boolean data
 Important Notes:
 No built-up arithmetic expressions are permitted
 Only one operator is allowed on the right side of the assignment, i.e., x + y + z is not possible in a single statement; it must be split using temporaries
 It has been given the name three-address code because such an instruction usually contains three addresses (the two operands and the result)
Types of Three-Address Statements
Statement                        Format                Comments
Assignment (binary operation)    X := Y op Z           Arithmetic and logical operators used
Assignment (unary operation)     X := op Y             Unary -, not, conversion operators used
Copy statement                   X := Y
Unconditional jump               goto L
Conditional jump                 if X relop Y goto L
Function call                    param X1              The parameters are specified by param
                                 param X2              The procedure p is called by indicating the
                                 …                     number of parameters n
                                 param Xn
                                 call p, n
Indexed assignments              X := Y[I]             X is assigned the value at the address Y + I
                                 Y[I] := X             The value at the address Y + I is assigned X
Address and pointer assignments  X := &Y               X is assigned the address of Y
                                 X := *Y               X is assigned the element at the address Y
                                 *X := Y               The value at the address X is assigned Y
Syntax-Directed Translation into
Three-Address Code
 Syntax directed translation can be used to generate
the three-address code
 Generally,
 Either the three-address code is generated as an
attribute of the attributed parse tree or
 The semantic actions have side effects that write the
three-address code statements in a file
Three address code generation
 To this end the following functions are given:
 newtemp - each time this function is called, it returns a distinct name that can be used for a temporary variable
 newlabel - each time this function is called, it returns a distinct name that can be used for a label
 In addition, for convenience, we use the notation gen to create a three-address code from a number of strings
 gen will produce a three-address code after concatenating all the parameters
 For example, if id1.lexeme = x, id2.lexeme = y and id3.lexeme = z:
 gen (id1.lexeme, ‘:=’, id2.lexeme, ‘+’, id3.lexeme) will produce the three-address code: x := y + z
SDT of a Three Address Code
Production Semantic Rules
S→id := E S.code := E.code || gen (id.lexeme, :=, E.place)
E→E1 + E2 E.place := newtemp;
E.code := E1.code || E2.code || gen (E.place, ‘:=’, E1.place, ‘+’, E2.place)
E→E1 * E2 E.place := newtemp;
E.code := E1.code || E2.code || gen (E.place, ‘:=’, E1.place, ‘*’, E2.place)
E→- E1 E.place := newtemp;
E.code := E1.code || gen (E.place, ‘:= uminus ’, E1.place)
E→(E1) E.place := newtemp;
E.code := E1.code
E→id E.place := id.lexeme;
E.code := ‘’ /* empty code */
the attribute place will hold the value of the grammar symbol
the attribute code will hold the sequence of three-address statements evaluating the grammar symbol
the function newtemp returns a sequence of distinct names t1, t2, . . . in response to successive calls
Declaration
 While processing declaration statements, the compiler
 reserves a memory area for the variables and
 stores the relative address of each variable in the symbol table
 The compiler maintains a global offset variable that indicates the first address not yet allocated
 Initially, offset is assigned 0. Each time an address is allocated to a variable, the offset is incremented by the width of the data object denoted by the name
 The procedure enter(name, type, address) creates a symbol table entry for name, gives it the type type and the relative address address
SDT for Declaration statements
Example: Semantic actions for the declaration part
Production Semantic Rules
P→{offset := 0} D S
D→D; D
D→id :T {enter(id.name, T.type, offset);
offset := offset + T.width}
T→integer {T.type := integer;
T.width := 4}
T→real {T.type := real;
T.width := 8}
T→array[num] of T1 {T.type := array (num.val, T1.type);
T.width := num.val * T1.width}
T→^T1 {T.type := pointer (T1.type);
T.width := 4}
Example
 How many memory cells will be allocated for the following declaration statements?
x : int;
arr : array[5] of int;
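A sketch of the offset bookkeeping these rules describe, using the widths from the SDT above (int = 4, array(n, T) = n * width(T)); the tuple encoding of types is an assumption of this sketch:

```python
symtab = {}
offset = 0   # first address not yet allocated, initially 0

def width(t):
    """Width of a data object, per the declaration SDT's rules."""
    if t == 'int':
        return 4
    if t[0] == 'array':          # t = ('array', n, elem_type)
        return t[1] * width(t[2])
    raise ValueError(t)

def enter(name, t):
    """enter(name, type, address): record the entry, then advance offset."""
    global offset
    symtab[name] = (t, offset)
    offset += width(t)

enter('x', 'int')
enter('arr', ('array', 5, 'int'))
print(symtab)   # {'x': ('int', 0), 'arr': (('array', 5, 'int'), 4)}
print(offset)   # 24
```

With these widths, x gets address 0 (4 cells) and arr gets address 4 (5 * 4 = 20 cells), so 24 cells are allocated in total.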
Assignment Statements
 lookup(lexeme)
 checks if there is an entry for this occurrence of the name in the symbol table, and
 if so, a pointer to the entry is returned; otherwise nil is returned
 newtemp
 generates temporary variables and
 reserves a memory area for the variables by modifying the offset and putting the reserved memories’ addresses in the symbol table
TAC SDD for assignment statements
 Example: generation of the three-address code for
the assignment statement and simple expressions
Production Semantic Rules
S→id := E p := lookup (id.name);
S.code := E.code || If p <> nil then gen (p.lexeme, ‘:=’, E.place) else error
E→E1 + E2 E.place := newtemp;
E.code := E1.code || E2.code || gen (E.place, ‘:=’, E1.place, ‘+’, E2.place)
E→E1 * E2 E.place := newtemp;
E.code := E1.code || E2.code || gen (E.place, ‘:=’, E1.place, ‘*’, E2.place)
E→- E1 E.place := newtemp;
E.code := E1.code || gen (E.place, ‘:= uminus ’, E1.place)
E→(E1) E.place := newtemp;
E.code := E1.code
E→id p := lookup (id.lexeme)
if p <> nil then E.place = p.lexeme else error
E.code := ’’ /* empty code */
E→num E.place := newtemp;
E.code := gen (E.place, ‘:=’, num.value)
Example
 Generate the three-address code for the input a:=
x + y * z from the previous SDT:
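One way to see the SDT in action is a small recursive sketch (expressions as nested tuples, lookup omitted — both assumptions of this sketch); newtemp hands out t1, t2, ... and gen appends one three-address statement:

```python
code, counter = [], 0

def newtemp():
    global counter
    counter += 1
    return f't{counter}'

def gen(*parts):
    code.append(' '.join(parts))

def place_of(e):
    """Returns E.place after emitting E.code (post-order, as in the SDT)."""
    if isinstance(e, str):          # E -> id: the place is the lexeme
        return e
    op, e1, e2 = e                  # E -> E1 op E2
    p1, p2 = place_of(e1), place_of(e2)
    t = newtemp()
    gen(t, ':=', p1, op, p2)
    return t

# a := x + y * z   (the * subtree is evaluated first)
rhs = place_of(('+', 'x', ('*', 'y', 'z')))
gen('a', ':=', rhs)
print('\n'.join(code))
# t1 := y * z
# t2 := x + t1
# a := t2
```

The output matches what the SDT prescribes: the multiplication's code is emitted first (E2.code before the enclosing gen), and the final copy into a comes from the S→id := E rule.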
Exercise
 Generate the three-address code for the following inputs from the previous SDT:
 num := (x + y) * (z + m)
 i = 2 * n + k
Addressing Array Elements
 If the width of each array element is w,
 the relative address of the array is base, and
 the lower bound of the index is low, then the ith element of the array is found at the address: base + (i – low) * w
 For example
 A : array [5..10] of integer;
 if it is stored at the address 100
 A[7] = 100 + (7 – 5) * 4
 When the index is a constant as above, it is possible to evaluate the address of A[i] at compile time
Addressing Array Elements
 A : array [5..10] of integer;
 A[7] = 100 + (7 – 5) * 4
 However, even when the index i is not known at compile time, part of the calculation can still be done at compile time
 base + (i – low) * w = i * w + (base – low * w)
 base – low * w can be calculated at compile time, saving time during execution
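The algebra can be checked numerically for the example above (base = 100, low = 5, w = 4):

```python
base, low, w = 100, 5, 4

def addr(i):
    """Address of A[i] by the direct formula."""
    return base + (i - low) * w

# The compile-time part (base - low * w) plus the run-time part (i * w)
const_part = base - low * w     # 80, computable at compile time

print(addr(7))                  # 108
print(7 * w + const_part)       # 108 -- same address, cheaper at run time
```

Only the multiplication i * w and one addition remain for run time; the constant part is folded in by the compiler.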
Production Semantic Rules
S→L := E if L.offset = nil then /* L is a simple id */
S.code := L.code || E.code || gen (L.place, ‘:=’, E.place);
else
S.code := L.code || E.code || gen (L.place, ‘[’, L.offset, ‘] :=’, E.place)
E→E1 + E2 E.place := newtemp;
E.code := E1.code || E2.code || gen (E.place, ‘:=’, E1.place, ‘+’, E2.place)
E→E1 * E2 E.place := newtemp;
E.code := E1.code || E2.code || gen (E.place, ‘:=’, E1.place, ‘*’, E2.place)
E→- E1 E.place := newtemp;
E.code := E1.code || gen (E.place, ‘:= uminus ’, E1.place)
E→(E1) E.place := newtemp;
E.code := E1.code
E→L if L.offset = nil then /* L is simple */
begin
E.place := L.place;
E.code := L.code
end
else
begin
E.place := newtemp;
E.code := L.code || gen (E.place, ‘ :=’, L.place, ‘[’ , L.offset, ‘]’)
end
L→id [E] L.place := newtemp;
L.offset := newtemp;
L.code := E.code || gen (L.place, ‘:=’, base (id.lexeme) – width (id.lexeme) *
low(id.lexeme)) || gen (L.offset, ‘:=’, E.place, ‘*’, width (id.lexeme));
L→id p := lookup (id.lexeme);
if p <> nil then L.place = p.lexeme else error;
L.offset := nil; /* for simple identifier */
L.code := ‘’ /* empty code */
Example
 Generate the three address code for the following
input based on the above SDT
X := A [y]
 A is stored at the address 100 and its values are
integers (width = 4) and low = 1
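Tracing the SDT above on X := A[y] with these parameters (newtemp assumed to yield t1, t2, t3 in order), the emitted statements are sketched below:

```python
base, low, w = 100, 1, 4
code = []

# L -> id [E]:  L.place := t1  with  t1 := base - low * w
code.append(f't1 := {base - low * w}')        # t1 := 96
#              L.offset := t2  with  t2 := E.place * w  (E.place = y)
code.append(f't2 := y * {w}')
# E -> L (L.offset <> nil):  E.place := t3;  t3 := t1[t2]
code.append('t3 := t1[t2]')
# S -> L := E (L simple):  X := E.place
code.append('X := t3')
print('\n'.join(code))
```

The constant part base – low * w = 96 is folded at compile time, so only the scaled index and the indexed fetch remain as run-time work.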
TAC SDD for Expressions
Production Semantic Rules
S→id := E p := lookup (id.name);
S.code := E.code || If p <> nil then gen (p.lexeme, ‘:=’, E.place) else error
S→while E do S1 See next slide;
E→E1 + E2 E.place := newtemp;
E.code := E1.code || E2.code || gen (E.place, ‘:=’, E1.place, ‘+’, E2.place)
E→E1 - E2 E.place := newtemp;
E.code := E1.code || E2.code || gen (E.place, ‘:=’, E1.place, ‘-’, E2.place)
E→E1 * E2 E.place := newtemp;
E.code := E1.code || E2.code || gen (E.place, ‘:=’, E1.place, ‘*’, E2.place)
E→- E1 E.place := newtemp;
E.code := E1.code || gen (E.place, ‘:= uminus ’, E1.place)
E→(E1) E.place := newtemp;
E.code := E1.code
E→id p := lookup (id.lexeme)
if p <> nil then E.place = p.lexeme else error
E.code := ‘’ /* empty code */
E→num E.place := newtemp;
E.code := gen (E.place, ‘:=’, num.value)
Example of TAC
 Semantic rules for generating a three address code for a while statement
Production Semantic Rules
S→while E do S1 S.begin := newlabel;
S.after := newlabel;
S.code := gen (S.begin, ‘:’) || E.code ||
gen (‘if’, E.place, ‘= 0 goto ’, S.after) || S1.code ||
gen (‘goto ’, S.begin) || gen (S.after, ‘:’)
Example
 Generate the three address code for
i=2*n+k
while i do
i=i-k
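A sketch of the generated code, assuming newlabel yields L1, L2 and newtemp yields t1, t2, t3; the standard while rule also emits a jump back to the begin label so the body repeats, and the sketch includes it:

```python
labels = 0
def newlabel():
    global labels
    labels += 1
    return f'L{labels}'

def while_code(cond_code, cond_place, body_code):
    """Assemble S.code for 'while E do S1' per the while SDD."""
    begin, after = newlabel(), newlabel()
    return ([f'{begin}:'] + cond_code +
            [f'if {cond_place} = 0 goto {after}'] +
            body_code + [f'goto {begin}', f'{after}:'])

# i = 2*n + k  (split with temporaries), then the loop body i = i - k
init = ['t1 := 2 * n', 't2 := t1 + k', 'i := t2']
body = ['t3 := i - k', 'i := t3']
# E is just 'i' (E -> id), so E.code is empty and E.place = i
tac = init + while_code([], 'i', body)
print('\n'.join(tac))
```

The loop tests i at L1, exits to L2 when i is 0, and otherwise runs the body and jumps back to L1.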
Exercise
 Generate the three address code for
i=100;
sum=2;
count=0;
while i do {
i=i-sum;
count=count+1;
}