Comp Review: Compilers
Fall 1996
Textbook: "Compilers" by Aho, Sethi & Ullman
Compiler Vs Interpreter
When Interpreter?
- To execute command languages
- When info about data is known only at runtime (eg. arrays in APL)
- When the number of runs is one or few
Parts of compilation
o Analysis
o Synthesis
Analysis phases
1. Lexical Analysis
2. Syntax Analysis (Parsing)
3. Semantic Analysis (Type checking)
Lexical Analyzer
                 Lexical Analyzer
                        |
                 Syntax Analyzer
                        |
                Semantic Analyzer
                        |
Symbol  --  Intermediate Code Generator  --  Error
Table                   |                    Handler
                 Code Optimizer
                        |
                 Code Generator
                        |
                 Target Program
(Symbol Table & Error Handler interact with all phases)
Cousins of compiler
1. Preprocessor
2. Assembler
3. Loaders and Link-editors
Front End Vs Back End
Parsing
- process of finding a parse tree for a string of tokens
- process of determining whether a string of tokens can be
generated by a grammar.
Time: worst - O(n^3); typical - O(n)
Ambiguous grammar - more than one parse tree for a token string
eg. 1. S -> S + S | S - S | a (ambiguous: a+a-a has 2 parse trees)
2. S -> S S + | S S - | a (unambiguous: postfix form)
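Counting the parse trees of one sample string makes the ambiguity concrete. A minimal sketch (function names are mine; a count of 1 for a single string suggests, but does not prove, unambiguity):

```python
from functools import lru_cache

def count_infix(s):
    # Parse trees for S -> S + S | S - S | a over substring s[i:j].
    @lru_cache(maxsize=None)
    def count(i, j):
        if j - i == 1:
            return 1 if s[i] == 'a' else 0
        # split at each operator: left subtree op right subtree
        return sum(count(i, k) * count(k + 1, j)
                   for k in range(i + 1, j - 1) if s[k] in '+-')
    return count(0, len(s))

def count_postfix(s):
    # Parse trees for S -> S S + | S S - | a over substring s[i:j].
    @lru_cache(maxsize=None)
    def count(i, j):
        if j - i == 1:
            return 1 if s[i] == 'a' else 0
        if s[j - 1] not in '+-':
            return 0
        # split the operands in front of the trailing operator
        return sum(count(i, k) * count(k, j - 1)
                   for k in range(i + 1, j - 1))
    return count(0, len(s))

print(count_infix('a+a-a'))    # 2 parse trees: grammar 1 is ambiguous
print(count_postfix('aa+a-'))  # 1 parse tree for the postfix form
```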
Parsing types
- Top down (start from start symbol and derive string)
o Efficient hand-written parsers
eg. Recursive-Descent, Predictive
Recursive-descent
- 1 proc for each nonterminal (NT). Call proc if RHS has the NT.
problems:
1. Backtracking
2. Left recursion
Predictive
problems:
1. Left recursion
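A recursive-descent parser, one procedure per nonterminal, might look like this for a small grammar with the left recursion already removed (the grammar E -> T { + T }, T -> a | ( E ) and all names here are illustrative):

```python
# Recursive-descent sketch: one procedure (method) per nonterminal.
class Parser:
    def __init__(self, tokens):
        self.toks = tokens
        self.pos = 0

    def peek(self):
        return self.toks[self.pos] if self.pos < len(self.toks) else None

    def eat(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f'expected {tok!r}, got {self.peek()!r}')
        self.pos += 1

    def E(self):                    # E -> T { + T }
        self.T()
        while self.peek() == '+':
            self.eat('+')
            self.T()

    def T(self):                    # T -> a | ( E )
        if self.peek() == 'a':
            self.eat('a')
        elif self.peek() == '(':
            self.eat('(')
            self.E()
            self.eat(')')
        else:
            raise SyntaxError(f'unexpected {self.peek()!r}')

def accepts(tokens):
    p = Parser(tokens)
    try:
        p.E()
        return p.pos == len(tokens)
    except SyntaxError:
        return False

print(accepts(list('a+(a+a)')))  # True
print(accepts(list('a+')))       # False
```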
Regular Expressions
If r & s are reg exp's, so are r|s (union, also written r+s),
rs (concatenation), r* (zero or more); abbreviations: r? (zero
or one), r+ (one or more)
char classes [abc], [a-z]*, etc.
Lex file format
declarations
%%
translation rules
%%
auxiliary procedures
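The rules-plus-actions idea behind a Lex spec can be sketched with Python's re module (token names and patterns below are illustrative, not from the notes):

```python
import re

# Lex-style spec: (token name, regular expression), tried in order.
RULES = [
    ('NUM',  r'\d+'),
    ('ID',   r'[a-zA-Z_]\w*'),
    ('OP',   r'[+\-*/=]'),
    ('SKIP', r'\s+'),
]
# Combine the rules into one master pattern with named groups.
MASTER = re.compile('|'.join(f'(?P<{n}>{p})' for n, p in RULES))

def tokenize(text):
    tokens = []
    for m in MASTER.finditer(text):
        if m.lastgroup != 'SKIP':   # like a Lex rule with an empty action
            tokens.append((m.lastgroup, m.group()))
    return tokens

print(tokenize('x1 = 42 + y'))
```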
Types of parsers
1. Universal (CYK, Earley)
o inefficient
2. Top-down (LL)
o simple
o suitable for hand-written parsers
3. Bottom-up (LR)
o complex
o covers a larger class of grammars
o suitable for automated parsers
Why reg exp's over CFG's for lex spec's? (p. 172)
1. Lex rules are simple. For this, reg exp is sufficient; power of
CFG not needed
2. Reg exp's provide concise & easier to understand notation for
tokens.
3. More efficient lexers can be constructed from reg exp's than
from grammars
4. Separating syntax into lexical and non-lexical parts makes
compiler manageable
Reg exp uses: specifying syntax for id's, const's, keywords, etc.
Grammar uses: nested structures, if-then-else, etc.
if-then-else disambiguation
grammar:
stmt -> if expr then stmt
| if expr then stmt else stmt
- dangling else: ambiguity resolved by matching each else with the
closest unmatched then
Left factoring
- done to produce unique first symbols on RHS of production.
(so as to make it suitable for predictive parsing)
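Left factoring can be sketched as grouping alternatives by their first symbol and inventing a primed nonterminal for the common tail (helper name and grammar encoding are mine; this factors only one symbol deep):

```python
from collections import OrderedDict

def left_factor(nonterminal, alternatives):
    """Left-factor productions one symbol deep.

    alternatives: list of tuples of symbols, eg. A -> a b | a c
    is [('a','b'), ('a','c')]. Returns the factored productions.
    """
    groups = OrderedDict()
    for alt in alternatives:
        groups.setdefault(alt[:1], []).append(alt[1:])
    result = {nonterminal: []}
    for prefix, tails in groups.items():
        if len(tails) == 1:              # unique first symbol: keep as-is
            result[nonterminal].append(prefix + tails[0])
        else:                            # shared prefix: pull it out
            fresh = nonterminal + "'"    # new nonterminal A'
            result[nonterminal].append(prefix + (fresh,))
            result[fresh] = [t if t else ('ε',) for t in tails]
    return result

# A -> a b | a c  becomes  A -> a A' ;  A' -> b | c
print(left_factor('A', [('a', 'b'), ('a', 'c')]))
```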
Bottom-up parsing
shift/reduce conflict
- to shift or to reduce?
eg. if-then-else (dangling else): shift the else or reduce?
reduce/reduce conflict
- more than 1 NT to reduce to. Which one to reduce to?
LR parsers are the most general & thus the most complex of all
bottom-up parsers.
Why LR parsers?
1. Recognizes a large class of CFGs.
2. Most general non-backtracking shift-reduce parsing method
3. LR grammars are proper superset of predictive grammars
4. LR parsers detect syntactic errors as soon as it is
possible to do so.
Disadv of LR parsers:
1. Difficult to construct by hand
- need LR parser generators
Ambiguous grammars
- sometimes we use them. Why?
o shorter, natural spec for some lang constructs (eg. expr's)
o isolate common constructs for special case optimization
Yacc - Yet Another Compiler Compiler - an LALR parser generator (p. 257 onwards)
file format
declarations
%%
translation rules
%%
supporting C routines
Name equivalence
- two types are equal if the type names are same
Type Conversion
- Implicit (coercion)
- Explicit
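Both kinds can be seen in Python itself (illustrative only):

```python
# Implicit coercion: the int operand is converted to float by the language.
x = 1 + 2.5
print(type(x).__name__, x)   # float 3.5

# Explicit conversion: the programmer requests the (possibly lossy) cast.
y = int(2.5)
print(type(y).__name__, y)   # int 2
```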
Run-time memory layout
-----------
Code
-----------
Static Data
-----------
Stack
-----------
\/
/\
-----------
Heap
-----------
Activation Record (AR) fields
1. Returned value (to the caller)
2. Actual parameters (from the caller)
3. Optional control link (points to caller's AR)
4. Optional access link (points to AR of proc in enclosing scope)
5. Saved machine status (PC, regs)
6. Local data
7. Temporaries
Symbol Tables:
- used to store type, scope, storage, etc. of a symbol. If the
symbol is a proc name then,
o number & type of args
o method of arg passing
o type returned
are also stored.
Hashing (and thus Hash tables) is commonly used for symbol tables
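A minimal scoped symbol table, using Python dicts as the hash tables (attribute names are illustrative):

```python
class SymbolTable:
    """Scoped symbol table; each scope is a hash table (Python dict)."""
    def __init__(self):
        self.scopes = [{}]              # innermost scope is last

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, **attrs):   # type, storage; for a proc name:
        self.scopes[-1][name] = attrs   # arg types, passing method, return type

    def lookup(self, name):             # search inner scopes first
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None

st = SymbolTable()
st.declare('f', kind='proc', args=['int', 'int'], returns='int')
st.enter_scope()
st.declare('x', type='int', storage='local')
print(st.lookup('x'))             # found in the inner scope
print(st.lookup('f')['returns'])  # found in the outer scope
st.exit_scope()
print(st.lookup('x'))             # None: out of scope
```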
Dynamic Storage
1. Allocation
a. fixed-size blocks
b. var-size blocks
2. Deallocation
a. Reference counts -|
b. Marking Techniques |-> Garbage collection
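A sketch of the reference-count scheme (class and field names are mine): when a block's count drops to zero it is freed, which in turn releases the blocks it points to. Reference counts alone cannot reclaim cyclic structures.

```python
class Block:
    """Heap block under reference counting (illustrative sketch)."""
    def __init__(self, name, refs=()):
        self.name, self.refs = name, list(refs)
        self.count = 0
        self.freed = False
        for r in self.refs:             # this block references r
            r.count += 1

def release(block, freed_names):
    block.count -= 1
    if block.count <= 0 and not block.freed:
        block.freed = True
        freed_names.append(block.name)
        for r in block.refs:            # freeing releases what it points to
            release(r, freed_names)

b = Block('b')
a = Block('a', refs=[b])
a.count = 1        # one external reference (eg. from the stack)
freed = []
release(a, freed)  # drop the external reference
print(freed)       # ['a', 'b']
```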
Chapter 8: Intermediate Code Generation
Machine-independent
Intermediate representations
1. syntax trees (DAGs)
2. postfix notation
3. 3-address code x := y op z
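Generating 3-address code from a syntax tree is a post-order walk that hands each operator a fresh temporary (tree encoding and names are mine):

```python
import itertools

_temps = itertools.count(1)

def new_temp():
    return f't{next(_temps)}'

def gen(node, code):
    """Emit 3-address code for a tree of ('op', left, right) / leaf names."""
    if isinstance(node, str):           # leaf: variable or constant
        return node
    op, left, right = node
    l, r = gen(left, code), gen(right, code)
    t = new_temp()                      # each result goes into a temporary
    code.append(f'{t} := {l} {op} {r}')
    return t

# x := a * b + c
code = []
result = gen(('+', ('*', 'a', 'b'), 'c'), code)
code.append(f'x := {result}')
print('\n'.join(code))
# t1 := a * b
# t2 := t1 + c
# x := t2
```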
Backpatching
- used in one-pass compiler/assembler
Output choices
1. Absolute machine lang
2. Relocatable machine lang
3. Assembly lang
Peep-hole optimizations
- optimize a small window (peephole) of code (not necessarily
contiguous)
- chain effect: doing one peep-hole optimization uncovers other
chances for peep-hole optimizations. Usually run the algorithm thru
the code several times to optimize it fully.
1. Redundant-instruction elimination
eg. eliminate a load right after a store of the same value
2. Flow-of-control optimizations
eg. multiple goto's to single goto.
3. Algebraic simplifications
eg. X + 0 to X; X * 1 to X, etc.
4. Strength reduction
- convert costly (time-wise) operations to less costly operations
eg. X^2 to X * X
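The algebraic simplifications and strength reductions above can be sketched as one peephole pass over 3-address tuples (instruction encoding and rule set are illustrative; per the chain effect, the pass would be rerun until nothing changes):

```python
def peephole(code):
    """One pass of algebraic simplification and strength reduction
    over 3-address instructions (dst, op, arg1, arg2)."""
    out = []
    for dst, op, a, b in code:
        if op == '+' and b == '0':      # x + 0  ->  x
            out.append((dst, ':=', a, None))
        elif op == '*' and b == '1':    # x * 1  ->  x
            out.append((dst, ':=', a, None))
        elif op == '^' and b == '2':    # x^2    ->  x * x (strength reduction)
            out.append((dst, '*', a, a))
        elif op == '*' and b == '2':    # x * 2  ->  x + x (strength reduction)
            out.append((dst, '+', a, a))
        else:
            out.append((dst, op, a, b))
    return out

before = [('t1', '+', 'x', '0'), ('t2', '^', 'y', '2'), ('t3', '*', 'z', '2')]
for instr in peephole(before):
    print(instr)
```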