
CS 473: COMPILER DESIGN
Adapted from slides by Steve Zdancewic, UPenn

PARSING
creating an abstract representation of program syntax
Today: Parsing
Source Code (character stream):
  if (b == 0) { a = 1; }
    ↓ Lexical Analysis
Token stream:
  if ( b == 0 ) { a = 1 ; }
    ↓ Parsing
Abstract Syntax Tree:
  If(Eq(b, 0), Assn(a, 1), None)
    ↓ Analysis & Transformation
    ↓ Backend
Assembly Code:
  l1:
    cmpq $0, %rax
    je l2
    jmp l3
  l2:
    …
Parsing: Structure
Tokens                  Structure
10 + 5                  add 10 and 5
f ( x )                 call a function f with argument x
if ( b ) { a = 0 ; }    if statement with condition b and body that assigns 0 to a

• Figure out what role each token is playing
• Catch most syntax errors (“valid pieces, bad arrangement”)
• Understand what program the user wrote
• Turn it into a representation we can traverse and analyze
Parsing: Overview
• Input: stream of tokens (generated by the lexer)
• Output: abstract syntax tree
• Strategy:
  – Parse the token stream to build a tree showing how the pieces relate
  – Forget the “concrete” syntax, remember the “abstract” syntax
• Will catch lots of malformed programs! Wrong number of operators, missing semicolons, unmatched parens: most “syntax errors” appear here
  – But not type errors, uses of uninitialized variables, etc.: we still don’t know what anything means!
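For the running example if (b == 0) { a = 1; }, the parser’s output could be represented in C roughly as follows. This is a minimal sketch: the node kinds, the constructors num, var, eq, assn and if_, and the show printer are all hypothetical names chosen to mirror the If/Eq/Assn tree in the pipeline figure; nodes are never freed.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical AST node kinds mirroring the tree for
   if (b == 0) { a = 1; }  (the missing else branch prints as None). */
typedef enum { NODE_NUM, NODE_VAR, NODE_EQ, NODE_ASSN, NODE_IF } NodeKind;

typedef struct Node {
    NodeKind kind;
    int num;               /* NODE_NUM: literal value */
    const char *name;      /* NODE_VAR / NODE_ASSN: variable name */
    struct Node *kids[3];  /* children; unused slots stay NULL */
} Node;

static Node *mk(NodeKind kind) {
    Node *n = calloc(1, sizeof *n);   /* zeroed, so kids start as NULL */
    if (!n) { perror("calloc"); exit(1); }
    n->kind = kind;
    return n;
}

Node *num(int v) { Node *n = mk(NODE_NUM); n->num = v; return n; }
Node *var(const char *x) { Node *n = mk(NODE_VAR); n->name = x; return n; }
Node *eq(Node *l, Node *r) { Node *n = mk(NODE_EQ); n->kids[0] = l; n->kids[1] = r; return n; }
Node *assn(const char *x, Node *e) { Node *n = mk(NODE_ASSN); n->name = x; n->kids[0] = e; return n; }
Node *if_(Node *c, Node *t, Node *e) {
    Node *n = mk(NODE_IF); n->kids[0] = c; n->kids[1] = t; n->kids[2] = e; return n;
}

/* Append s to buf without overflowing its capacity. */
static void append(char *buf, size_t cap, const char *s) {
    strncat(buf, s, cap - strlen(buf) - 1);
}

/* Render the tree in prefix form, e.g. If(Eq(b, 0), Assn(a, 1), None). */
void show(const Node *n, char *buf, size_t cap) {
    char tmp[32];
    if (!n) { append(buf, cap, "None"); return; }
    switch (n->kind) {
    case NODE_NUM:
        snprintf(tmp, sizeof tmp, "%d", n->num);
        append(buf, cap, tmp);
        break;
    case NODE_VAR:
        append(buf, cap, n->name);
        break;
    case NODE_EQ:
        append(buf, cap, "Eq(");
        show(n->kids[0], buf, cap);
        append(buf, cap, ", ");
        show(n->kids[1], buf, cap);
        append(buf, cap, ")");
        break;
    case NODE_ASSN:
        append(buf, cap, "Assn(");
        append(buf, cap, n->name);
        append(buf, cap, ", ");
        show(n->kids[0], buf, cap);
        append(buf, cap, ")");
        break;
    case NODE_IF:
        append(buf, cap, "If(");
        show(n->kids[0], buf, cap);
        append(buf, cap, ", ");
        show(n->kids[1], buf, cap);
        append(buf, cap, ", ");
        show(n->kids[2], buf, cap);
        append(buf, cap, ")");
        break;
    }
}
```

Building if_(eq(var("b"), num(0)), assn("a", num(1)), NULL) and printing it into an empty buffer gives If(Eq(b, 0), Assn(a, 1), None). The tokens (, ), {, } and ; no longer appear anywhere: that is the “abstract” syntax that remains.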
Describing Syntax
• How are we going to write these things down?
• Exercise: Describe the structure of an if statement, in terms of the tokens involved.

  “An if statement starts with the token IF, then an LPAREN, some tokens that make up a condition, and an RPAREN, then an LBRACE, some tokens that make up the body, and an RBRACE.”

  if_stmt ::= IF LPAREN cond RPAREN LBRACE stmts RBRACE

• Note: regexps aren’t expressive enough for this, because they can’t really do recursion (example: can’t describe matching parentheses)
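To see where recursion comes in, here is a hand-written recursive-descent recognizer for the if_stmt production (recursive descent is one way to implement a grammar by hand; it is not how yacc works). The token names follow the production above, but the simplifications cond ::= ID and stmt ::= if_stmt | ID ASSIGN NUM SEMI, the END marker, and all function names are assumptions of this sketch. Because a stmt may itself be an if_stmt, braces can nest to any depth, which is exactly the kind of matching the note says regexps cannot describe.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical token kinds; END marks the end of the token array. */
typedef enum { IF, LPAREN, RPAREN, LBRACE, RBRACE, ID, ASSIGN, NUM, SEMI, END } Tok;

typedef struct { const Tok *toks; size_t pos; } Parser;

/* Consume the next token if it is t. */
static bool expect(Parser *p, Tok t) {
    if (p->toks[p->pos] != t) return false;
    p->pos++;
    return true;
}

static bool if_stmt(Parser *p);

/* Simplified: stmt ::= if_stmt | ID ASSIGN NUM SEMI */
static bool stmt(Parser *p) {
    if (p->toks[p->pos] == IF) return if_stmt(p);   /* the recursive case */
    return expect(p, ID) && expect(p, ASSIGN) && expect(p, NUM) && expect(p, SEMI);
}

/* stmts: zero or more stmt, stopping at the closing RBRACE. */
static bool stmts(Parser *p) {
    while (p->toks[p->pos] != RBRACE) {
        if (!stmt(p)) return false;   /* also stops at END */
    }
    return true;
}

/* if_stmt ::= IF LPAREN cond RPAREN LBRACE stmts RBRACE, with cond ::= ID */
static bool if_stmt(Parser *p) {
    return expect(p, IF) && expect(p, LPAREN) && expect(p, ID) && expect(p, RPAREN)
        && expect(p, LBRACE) && stmts(p) && expect(p, RBRACE);
}

/* Accept only if the whole token array is exactly one if_stmt. */
bool accepts_if_stmt(const Tok *toks) {
    Parser p = { toks, 0 };
    return if_stmt(&p) && p.toks[p.pos] == END;
}
```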
CONTEXT-FREE GRAMMARS

Context-Free Grammars
• Here is a specification of the language of balanced parens:
  S ⟼ (S)S
  S ⟼ ε
• The definition is recursive: S mentions itself.
• Idea: “derive” a string in the language by starting with S and rewriting according to the rules:
  – Example: S ⟼ (S)S ⟼ ((S)S)S ⟼ ((ε)S)S ⟼ ((ε)S)ε ⟼ ((ε)ε)ε = (())
• You can replace the “nonterminal” S by its definition anywhere
• A context-free grammar accepts a string when there is a derivation from the start symbol that ends in exactly that string
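The balanced-parens grammar translates almost line for line into a recursive recognizer, with one C call for each use of S. A minimal sketch (parse_S and balanced are hypothetical names): when the next character is not '(', the only sensible choice is S ⟼ ε, because a valid S can only be followed by ')' or the end of the input.

```c
#include <stdbool.h>

/* S -> ( S ) S | epsilon.
   *s points at the next unread character; returns false on a missing ')'. */
static bool parse_S(const char **s) {
    if (**s != '(') return true;      /* choose S -> epsilon */
    (*s)++;                           /* match '(' */
    if (!parse_S(s)) return false;    /* inner S */
    if (**s != ')') return false;     /* match ')' */
    (*s)++;
    return parse_S(s);                /* trailing S */
}

/* Accept when S derives the entire input. */
bool balanced(const char *input) {
    const char *s = input;
    return parse_S(&s) && *s == '\0';
}
```

For example, balanced("(())") and balanced("") succeed, while "(()" fails at the missing ')' and "())" fails because input is left over.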
Context-Free Grammars: Definition
• A Context-Free Grammar (CFG) consists of
  – A set of terminals (e.g., a lexical token or ε)
  – A set of nonterminals (e.g., S and other syntactic variables)
  – A designated nonterminal called the start symbol
  – A set of productions: LHS ⟼ RHS
    • LHS is a nonterminal
    • RHS is a string of terminals and nonterminals
• Example: The balanced parentheses language:
  S ⟼ (S)S
  S ⟼ ε
• Exercise: How many terminals? How many nonterminals? Productions?
Another Example: Sum Grammar
• A grammar that accepts parenthesized sums of numbers:
  S ⟼ E + S | E
  E ⟼ number | (S)
  e.g.: (1 + 2 + (3 + 4)) + 5
• Note the vertical bar ‘|’ is shorthand for multiple productions:
  S ⟼ E + S       4 productions
  S ⟼ E           2 nonterminals: S, E
  E ⟼ number      4 terminals: (, ), +, number
  E ⟼ (S)         Start symbol: S
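The sum grammar can be turned into an evaluator by giving each nonterminal a function that returns the value of what it parsed. A minimal sketch, assuming non-negative decimal numbers and ignoring whitespace; Input, parse_S, parse_E and eval_sum are hypothetical names.

```c
#include <ctype.h>
#include <stdbool.h>

typedef struct { const char *s; bool ok; } Input;

static void skip_ws(Input *in) { while (isspace((unsigned char)*in->s)) in->s++; }

static long parse_S(Input *in);

/* E -> number | ( S ) */
static long parse_E(Input *in) {
    skip_ws(in);
    if (*in->s == '(') {
        in->s++;
        long v = parse_S(in);
        skip_ws(in);
        if (*in->s != ')') { in->ok = false; return 0; }
        in->s++;
        return v;
    }
    if (!isdigit((unsigned char)*in->s)) { in->ok = false; return 0; }
    long v = 0;
    while (isdigit((unsigned char)*in->s)) v = v * 10 + (*in->s++ - '0');
    return v;
}

/* S -> E + S | E : parse an E, then recurse on S if a '+' follows. */
static long parse_S(Input *in) {
    long v = parse_E(in);
    skip_ws(in);
    if (*in->s == '+') {
        in->s++;
        return v + parse_S(in);
    }
    return v;
}

/* Returns true and stores the sum in *out if the whole text is in the language. */
bool eval_sum(const char *text, long *out) {
    Input in = { text, true };
    long v = parse_S(&in);
    skip_ws(&in);
    if (!in.ok || *in.s != '\0') return false;
    *out = v;
    return true;
}
```

With this sketch, eval_sum("(1 + 2 + (3 + 4)) + 5", &v) succeeds with v == 15, while "(1 + 2" and "1 +" are rejected.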
Derivations in CFGs
• Example: derive (1 + 2 + (3 + 4)) + 5 using
  S ⟼ E + S | E
  E ⟼ number | (S)

  S ⟼ E + S
    ⟼ (S) + S
    ⟼ (E + S) + S
    ⟼ (1 + S) + S
    ⟼ (1 + E + S) + S
    ⟼ (1 + 2 + S) + S
    ⟼ (1 + 2 + E) + S
    ⟼ (1 + 2 + (S)) + S
    ⟼ (1 + 2 + (E + S)) + S
    ⟼ (1 + 2 + (3 + S)) + S
    ⟼ (1 + 2 + (3 + E)) + S
    ⟼ (1 + 2 + (3 + 4)) + S
    ⟼ (1 + 2 + (3 + 4)) + E
    ⟼ (1 + 2 + (3 + 4)) + 5

  Each step above expands the leftmost nonterminal.

• For arbitrary strings α, β, γ and production rule A ⟼ β, a single step of the derivation is:
  αAγ ⟼ αβγ
  (substitute β for an occurrence of A)
• In general, there are many possible derivations for a given string
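The single-step rule αAγ ⟼ αβγ is plain string substitution, which the sketch below mechanizes. It assumes single-character symbols with uppercase letters as nonterminals and no spaces; derive_step and leftmost_nonterminal are hypothetical helpers, not part of any tool.

```c
#include <stdio.h>
#include <string.h>

/* One derivation step  alpha A gamma => alpha beta gamma:
   replace the symbol at index pos (which must be the nonterminal lhs) by rhs.
   form and out must not overlap. Returns 0 on success, -1 if the step does
   not apply or out is too small. */
int derive_step(const char *form, size_t pos, char lhs, const char *rhs,
                char *out, size_t cap) {
    size_t n = strlen(form);
    if (pos >= n || form[pos] != lhs) return -1;
    /* alpha = first pos chars, then beta = rhs, then gamma = the rest */
    int w = snprintf(out, cap, "%.*s%s%s", (int)pos, form, rhs, form + pos + 1);
    return (w < 0 || (size_t)w >= cap) ? -1 : 0;
}

/* Index of the leftmost nonterminal (uppercase letter), or -1 if none. */
long leftmost_nonterminal(const char *form) {
    for (size_t i = 0; form[i]; i++)
        if (form[i] >= 'A' && form[i] <= 'Z') return (long)i;
    return -1;
}
```

Replaying the first two steps of the derivation: derive_step("S", 0, 'S', "E+S", …) produces "E+S", and expanding its leftmost nonterminal with E ⟼ (S) produces "(S)+S".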
Loops and Termination
• Some care is needed when defining grammars
• Consider:  S ⟼ E
             E ⟼ S
  – This grammar has nonterminal definitions that are “nonproductive” (i.e., they don’t mention any terminal symbols)
  – There is no finite derivation starting from S, so the language is empty.
• Consider:  S ⟼ (S)
  – This grammar is productive, but again there is no finite derivation starting from S, so the language is empty
• When writing a large grammar, it’s easy to accidentally “chain” many nonterminals without a base case
• Upshot: be aware of “vacuously empty” CFGs.
  – Every nonterminal should eventually rewrite to an alternative that contains only terminal symbols.
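The upshot can be checked mechanically: repeatedly mark a nonterminal once one of its productions has only terminals and already-marked nonterminals on the right. When nothing changes, the marked nonterminals are exactly those that can rewrite to a string of terminals, and the language is empty exactly when the start symbol is unmarked. A minimal sketch, assuming single-character symbols (uppercase = nonterminal, "" = ε); Production and generating are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

/* A production LHS -> RHS over single-character symbols (an assumption of
   this sketch): uppercase letters are nonterminals, every other character is
   a terminal, and "" stands for epsilon. */
typedef struct { char lhs; const char *rhs; } Production;

/* Sets gen[X - 'A'] to true iff nonterminal X can eventually rewrite to a
   string containing only terminals. Iterates until nothing changes. */
void generating(const Production *ps, size_t n, bool gen[26]) {
    for (int i = 0; i < 26; i++) gen[i] = false;
    bool changed = true;
    while (changed) {
        changed = false;
        for (size_t i = 0; i < n; i++) {
            int l = ps[i].lhs - 'A';
            if (gen[l]) continue;
            bool all = true;   /* every nonterminal on the RHS already marked? */
            for (const char *c = ps[i].rhs; *c; c++)
                if (*c >= 'A' && *c <= 'Z' && !gen[*c - 'A']) { all = false; break; }
            if (all) { gen[l] = true; changed = true; }
        }
    }
}
```

Both grammars on this slide leave S unmarked, so both languages are empty; the balanced-parens grammar marks S through its S ⟼ ε production.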
PARSER GENERATORS
debugging parser conflicts
disambiguating grammars
Getting Started with Yacc/Bison
• https://www.gnu.org/software/bison/
• https://sourceforge.net/projects/gnuwin32/files/bison/2.4.1/bison-2.4.1-setup.exe/download?use_mirror=cfhcable for Windows (but Cygwin or WSL works better)
• Run yacc -d or bison -yd <grammar>.y to get two files:
  – y.tab.c implements the parser
  – y.tab.h defines tokens and values for use in lexing: include it in the .lex file (replaces files like tokens.h)
• Not every grammar can be automatically parsed!
  – When run, yacc/bison reports the number of shift/reduce and reduce/reduce conflicts
  – yacc -dv or bison -ydv also produces y.output, which describes the parser states and conflicts
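Anatomy of a Yacc file
    %{
    #include <stdio.h>
    int yylex(void);
    /* the generated parser calls yyerror to report syntax errors */
    void yyerror(const char *msg) { fprintf(stderr, "%s\n", msg); }
    %}

    %union {
      int ival;
      char* sval;
    }

    %token <ival> NUM
    %token PLUS

    %%
    exp: NUM            { printf("number\n"); }
       | exp PLUS exp   { printf("addition\n"); }
       ;
    %%

    int main(void) {
      return yyparse();
    }

• Prelude (between %{ and %}): helper functions, written in C
• Token definitions (%union, %token): to be used in the lexer
• Body (between the two %% lines): grammar rules and associated actions (again, written in C)
• End: arbitrary C code, can call the parsing function yyparse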
Yacc Actions
    NUM { printf("number\n"); }         if we just want to know what’s in the program
    NUM { $$ = $1; }                    $$ is the return value for this production; $1 gets the value of the first symbol
    exp PLUS exp { $$ = $1 + $3; }      access values of tokens and nonterminals by their position in the rule

• With a %union, a nonterminal whose actions set $$ also needs a type declaration such as %type <ival> exp, so yacc knows which union member $$ refers to
• Later, we’ll build a representation of the program instead of running it right away!
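One way to picture $$, $1 and $3: the generated parser keeps a stack of semantic values, one per grammar symbol it has shifted or reduced. The sketch below is a toy model under that assumption (ValueStack, dollar, reduce_add and demo are hypothetical, and bison’s generated code is organized differently): when a rule with n right-hand-side symbols is reduced, $1..$n are the top n entries, and $$ replaces them.

```c
#include <stddef.h>

/* Toy semantic-value stack (fixed size, no overflow checks). */
typedef struct { int vals[64]; size_t top; } ValueStack;

static void push(ValueStack *st, int v) { st->vals[st->top++] = v; }

/* $k for a rule with n RHS symbols: $1 is the deepest of the top n entries. */
static int dollar(const ValueStack *st, size_t n, size_t k) {
    return st->vals[st->top - n + (k - 1)];
}

/* Reduce by  exp: exp PLUS exp { $$ = $1 + $3; }  (3 RHS symbols). */
static void reduce_add(ValueStack *st) {
    int result = dollar(st, 3, 1) + dollar(st, 3, 3);   /* $$ = $1 + $3 */
    st->top -= 3;                                       /* pop the RHS values */
    push(st, result);                                   /* push $$ for exp */
}

/* Replay the input 10 + 5 by hand. */
int demo(void) {
    ValueStack st = { {0}, 0 };
    push(&st, 10);    /* shift NUM; exp: NUM { $$ = $1; } keeps the value */
    push(&st, 0);     /* shift PLUS; its value is never used */
    push(&st, 5);     /* shift NUM; exp: NUM again */
    reduce_add(&st);  /* exp: exp PLUS exp */
    return st.vals[st.top - 1];
}
```

Here demo() returns 15. The PLUS token occupies a slot too, which is why the right operand is $3 rather than $2.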
Running the Parser
• Running yacc -d <filename>.y generates y.tab.c and y.tab.h
• y.tab.c defines a function called yyparse, which parses a stream of tokens according to the grammar (using yylex to get the tokens)
• y.tab.h defines the tokens and their values, and should be used in the lexer instead of defining them there
• If the parser has a main function, we can just compile and run y.tab.c together with the lex.yy.c from the lexer
• Otherwise, we can use the lexer and parser as a library, and call the generated yyparse function in other files (the rest of the compiler)
• Adding the -v argument (e.g. yacc -dv <filename>.y) also generates y.output, which can help with debugging (more on this later)
Ambiguity
• Consider this grammar:
  S ⟼ S – S | number
• How do we parse 1 – 2 – 3?

  S ⟼ S – S ⟼ 1 – S ⟼ 1 – S – S ⟼ 1 – 2 – S ⟼ 1 – 2 – 3
    “This is an expression that does 1 – (2 – 3)”
  vs.
  S ⟼ S – S ⟼ S – 3 ⟼ S – S – 3 ⟼ 1 – S – 3 ⟼ 1 – 2 – 3
    “This is an expression that does (1 – 2) – 3”
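The two derivations are not just different paperwork: they produce different trees, and the trees compute different answers. A minimal sketch with a hypothetical Expr type for S ⟼ S – S | number, building both trees for 1 – 2 – 3 by hand.

```c
#include <stddef.h>

/* Expression tree for S -> S - S | number (hypothetical names). */
typedef struct Expr {
    int is_num;
    int value;                  /* used when is_num */
    const struct Expr *l, *r;   /* used for subtraction */
} Expr;

int eval(const Expr *e) {
    return e->is_num ? e->value : eval(e->l) - eval(e->r);
}

/* The two parse trees for 1 - 2 - 3 that the ambiguous grammar allows. */
void both_parses(int *right_nested, int *left_nested) {
    static const Expr one = {1, 1, NULL, NULL}, two = {1, 2, NULL, NULL},
                      three = {1, 3, NULL, NULL};
    static const Expr two_minus_three = {0, 0, &two, &three};
    static const Expr one_minus_two = {0, 0, &one, &two};
    static const Expr a = {0, 0, &one, &two_minus_three};   /* 1 - (2 - 3) */
    static const Expr b = {0, 0, &one_minus_two, &three};   /* (1 - 2) - 3 */
    *right_nested = eval(&a);
    *left_nested = eval(&b);
}
```

The first tree evaluates to 2 and the second to -4, so a compiler must commit to one of them.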
Associativity and Precedence
• Consider this grammar:
  S ⟼ E – S | E
  E ⟼ number | (S)
• This grammar makes ‘–’ right-associative
• If we want to generate 1 – 2 – 3:
  S ⟼ E – S ⟼ 1 – S ⟼ 1 – E – S ⟼ 1 – 2 – S ⟼ 1 – 2 – E ⟼ 1 – 2 – 3
  but
  S ⟼ E – S ⟼ E – E ⟼ E – 3 ⟼ can’t make 1 – 2 from a single E (without parentheses)
• So 1 – (2 – 3) is the only possible parse
• Note that the grammar is right recursive!
• Exercise: How would you make ‘–’ left-associative?
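Following S ⟼ E – S | E literally in a recursive parser shows the right-associativity directly: the recursive call for S sits to the right of ‘–’, so the tree nests rightward. A minimal sketch, assuming well-formed input with non-negative numbers and ASCII '-' for the slide’s ‘–’; S, E and eval_right_assoc are hypothetical names, and error handling is omitted.

```c
#include <ctype.h>

static const char *p;   /* current input position (not reentrant) */

static void skip(void) { while (isspace((unsigned char)*p)) p++; }

static long S(void);

/* E -> number | ( S ) */
static long E(void) {
    skip();
    if (*p == '(') {
        p++;
        long v = S();
        skip();
        p++;             /* skip ')' */
        return v;
    }
    long v = 0;
    while (isdigit((unsigned char)*p)) v = v * 10 + (*p++ - '0');
    return v;
}

/* S -> E - S | E : the right operand is a whole S,
   so 1 - 2 - 3 is computed as 1 - (2 - 3). */
static long S(void) {
    long v = E();
    skip();
    if (*p == '-') { p++; return v - S(); }
    return v;
}

long eval_right_assoc(const char *text) { p = text; return S(); }
```

With this sketch, eval_right_assoc("1 - 2 - 3") is 2, whereas the conventional reading would be -4; writing "(1 - 2) - 3" explicitly gives -4.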
Eliminating Ambiguity
• We can often eliminate ambiguity by layering the grammar (precedence) and allowing recursion only on one side (associativity).
• Higher-precedence operators go farther from the start symbol.
• Example: S ⟼ S + S | S * S | ( S ) | number
• To disambiguate:
  – Decide (following math) to make ‘*’ higher precedence than ‘+’
  – Make ‘+’ left associative
  – Make ‘*’ right associative
      S0 ⟼ S0 + S1 | S1
      S1 ⟼ S2 * S1 | S2
      S2 ⟼ number | ( S0 )
• Now 1 + 2 + 3 * 4 must mean (1 + 2) + (3 * 4)
• Note that operations can only appear in the bottom nonterminal S2 if they’re inside parentheses
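The layered grammar can be evaluated with one function per level. One caveat: a recursive-descent function for S0 cannot start by calling itself (S0 ⟼ S0 + S1 would recurse forever), so this sketch swaps in the standard rewrite of left recursion as a loop that folds to the left. Assumptions: well-formed input, non-negative numbers, hypothetical names s0, s1, s2 and eval_layered.

```c
#include <ctype.h>

static const char *cur;   /* current input position (not reentrant) */

static void ws(void) { while (isspace((unsigned char)*cur)) cur++; }

static long s0(void);

/* S2 -> number | ( S0 ) */
static long s2(void) {
    ws();
    if (*cur == '(') {
        cur++;
        long v = s0();
        ws();
        cur++;           /* skip ')' */
        return v;
    }
    long v = 0;
    while (isdigit((unsigned char)*cur)) v = v * 10 + (*cur++ - '0');
    return v;
}

/* S1 -> S2 * S1 | S2 : right recursion, so '*' groups to the right. */
static long s1(void) {
    long v = s2();
    ws();
    if (*cur == '*') { cur++; return v * s1(); }
    return v;
}

/* S0 -> S0 + S1 | S1 : run as a loop, so '+' groups to the left. */
static long s0(void) {
    long v = s1();
    ws();
    while (*cur == '+') { cur++; v = v + s1(); ws(); }
    return v;
}

long eval_layered(const char *text) { cur = text; return s0(); }
```

Because ‘*’ lives one level below ‘+’, eval_layered("1 + 2 + 3 * 4") is 15, matching (1 + 2) + (3 * 4). (Since + and * are both associative, the numbers cannot reveal the grouping direction; the tree shape still does.)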
Precedence and Associativity Declarations
• Parser generators like yacc/bison support precedence and associativity declarations
  – Resolve common conflicts without changing the grammar by hand
• Example:
    %left PLUS
    %left TIMES
• Tokens can be declared %left, %right, or %nonassoc
• Tokens further down have higher precedence (bind tighter, get evaluated first)
• The precedence of a rule is based on the precedence of its last terminal:
  E ⟼ E + E has the precedence of “+”
  E ⟼ if E then E else E has the precedence of “else”
• Can’t apply precedence to nonterminals
• Exercise: add some arithmetic ops to parse1.y, and give them the right associativity and precedence! For instance: 1 – 2 – 3 * 4 should be -13
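To see what a table of precedence levels means for evaluation, here is precedence climbing, a hand-written technique named plainly as a stand-in: yacc/bison do not work this way, but use the declarations to resolve shift/reduce conflicts in their generated tables. The level table is an assumption mirroring %left PLUS MINUS above %left TIMES; climb, atom and eval_prec are hypothetical names, and input is assumed well-formed with ASCII '-'.

```c
#include <ctype.h>

/* Assumed levels: later %left lines bind tighter, so '*' gets level 2. */
static int prec(char op) {
    switch (op) {
    case '+': case '-': return 1;
    case '*': return 2;
    default: return 0;   /* not a binary operator: stop climbing */
    }
}

static const char *q;    /* current input position (not reentrant) */

static void sp(void) { while (isspace((unsigned char)*q)) q++; }

static long climb(int min_prec);

static long atom(void) {
    sp();
    if (*q == '(') {
        q++;
        long v = climb(1);
        sp();
        q++;             /* skip ')' */
        return v;
    }
    long v = 0;
    while (isdigit((unsigned char)*q)) v = v * 10 + (*q++ - '0');
    return v;
}

/* Parse operators whose level is >= min_prec. All operators here are
   left-associative, so the right operand may only contain strictly
   tighter operators: hence climb(level + 1). */
static long climb(int min_prec) {
    long lhs = atom();
    for (;;) {
        sp();
        char op = *q;
        int level = prec(op);
        if (level == 0 || level < min_prec) return lhs;
        q++;
        long rhs = climb(level + 1);
        if (op == '+') lhs += rhs;
        else if (op == '-') lhs -= rhs;
        else lhs *= rhs;
    }
}

long eval_prec(const char *text) { q = text; return climb(1); }
```

With these levels, eval_prec("1 - 2 - 3 * 4") groups as (1 - 2) - (3 * 4) and returns -13, the value the exercise asks for.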
Expressions and Statements
• Most languages have at least two kinds of nonterminals, expressions and statements
  – Expressions (arithmetic, array lookup, ...) compute values
  – Statements (assignment, loops, …) change state
• Usually expressions can appear inside statements, but not vice versa
    stmt: ID ASSIGN exp SEMICOLON
• (Note that C breaks this rule: assignment is itself an expression, as in x = (y = 3), which makes everything harder!)
• Once the grammar is a little more complicated, having the parser act as an interpreter is much harder and less efficient!
  – We’ll want to have the parser produce an internal representation of the program structure instead
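When the parser builds a representation instead of interpreting, the expression/statement split shows up directly in the types: a statement holds an expression, never the reverse. A minimal sketch under stated assumptions: single-letter variables 'a'..'z', + as the only operator, and hypothetical names Expr, Stmt, Env, eval_expr and exec_stmt. Running the program becomes a separate pass over the finished tree.

```c
#include <stddef.h>

/* Expressions compute values. */
typedef enum { E_NUM, E_VAR, E_ADD } ExprKind;
typedef struct Expr {
    ExprKind kind;
    int num;                    /* E_NUM */
    char var;                   /* E_VAR: 'a'..'z' */
    const struct Expr *l, *r;   /* E_ADD */
} Expr;

/* Statements change state; only  ID ASSIGN exp SEMICOLON  is modeled. */
typedef struct Stmt {
    char target;                /* variable being assigned, 'a'..'z' */
    const Expr *value;
} Stmt;

/* Program state: one int per variable. */
typedef struct { int vars[26]; } Env;

int eval_expr(const Expr *e, const Env *env) {   /* reads state only */
    switch (e->kind) {
    case E_NUM: return e->num;
    case E_VAR: return env->vars[e->var - 'a'];
    case E_ADD: return eval_expr(e->l, env) + eval_expr(e->r, env);
    }
    return 0;
}

void exec_stmt(const Stmt *s, Env *env) {        /* writes state */
    env->vars[s->target - 'a'] = eval_expr(s->value, env);
}
```

For example, executing the trees for a = 1; and then b = a + 1; leaves a == 1 and b == 2 in the Env.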
Program 2: Parser
• Posted on the course website (https://www.cs.uic.edu/~mansky/teaching/cs473/sp21/program2.html)
• Extend a simple parser with support for more features
• Due next Wednesday at the start of class
• Submit via Gradescope
• Extending a parser:
  1. What do we want the syntax to look like?
  2. Which parts have to be specific tokens (keywords, symbols, identifiers) and which can be more complex structures (expressions, statements)?
  3. Add the new productions to the appropriate nonterminal
  4. See if this causes any new conflicts, and add associativity/precedence directives as necessary