compiler construction Lecture 3-4
compiler construction Lecture 3-4
COMPILER CONSTRUCTION(CS-636)
______________________________________________________
GIMS- PMAS Arid Agriculture University, Gujrat Campus
B-
Formal Languages & Automata2
Theory (a review in one slide)
Alphabet: a finite set of symbols
String: a finite, possibly empty sequence of
symbols from an alphabet
Language: a set, often infinite, of strings
Finite specifications of (possibly infinite)
languages
Automaton – a recognizer; a machine that
accepts all strings in a language (and rejects all
other strings)
Grammar – a generator; a system for producing
all strings in the language (and no other strings)
A particular language may be specified by
many different grammars and automata
A grammar or automaton specifies only one
language
Lexical Analyzer
Symbol
Table
3
Lexical Analyzer
E.g. :
<id , a>
4
Lexical Analyzer
Id [R.E]
Letter = a-z
Digit = 0-9
5
Lexical Analyzer
6
Tokens 7
Example:
if( i == j )
z = 0;
else
z = 1;
Tokens 8
Input is just a sequence of characters:
i f ( \b i \b = = \b j \n \t ....
Tokens 9
Goal:
partition input string into
substrings
classify them according to
their role
Tokens 10
A token is a syntactic
category
Natural language:
“He wrote the program”
Words: “He”, “wrote”,
“the”, “program”
Tokens 11
Programming language:
“if(b == 0) a = b”
Words:
“if”, “(”, “b”, “==”, “0”,
“)”, “a”, “=”, “b”
Tokens 12
scanner
Define tokens in the source language.
Describe the patterns allowed for
tokens.
Write regular expressions describing
the patterns.
Construct an FA for each pattern.
Combine all FA’s which results in an
NFA.
Convert NFA into DFA
Write a program simulating the DFA.
Ad-hoc Lexer 15
Hand-write code to
generate tokens.
Partition the input string
by reading left-to-right,
recognizing one token at
a time
Ad-hoc Lexer 16
Look-ahead required to
decide where one token
ends and the next token
begins.
Ad-hoc Lexer 17
Problems:
Do not know what kind of
token we are going to
read from seeing first
character.
Ad-hoc Lexer 18
Problems:
If token begins with “i”, is
it an identifier “i” or
keyword “if”?
If token begins with “=”, is
it “=” or “==”?
Ad-hoc Lexer 19
Regular
Languages are the
most popular for specifying
tokens
Simple and useful theory
Easy to understand
Efficient implementations
Languages 21
Let S be a set of
characters. S is called
the alphabet.
A language over S is set
of strings of characters
drawn from S.
Example of Languages 22
Alphabet = ASCII
Language = C++ programs,
Java, C#
Notation 23
Forlexical analysis we
care about regular
languages.
Regular languages can
be described using
regular expressions.
Regular Languages 25
R|S = either R or S
RS = R followed by S
(concatenation)
R* = concatenation of R
zero or more times
(R*= e |R|RR|RRR...)
RE Extentions 28
R? = e | R (zero or one R)
R + = RR* (one or more R)
(R) = R (grouping)
RE Extentions 29
RE Strings in L(R)
a “a”
ab “ab”
a|b “a” “b”
(ab)* “” “ab” “abab” ...
(a|e)b “ab” “b”
Example: integers 31
integer: a non-empty
string
of digits
digit =
‘0’|’1’|’2’|’3’|’4’|
’5’|’6’|’7’|’8’|’9’
integer = digit digit*
Example: identifiers 32
identifier:
string or letters or digits
starting with a letter
C identifier:
[a-zA-Z_][a-zA-Z0-9_]*
Recap 33
Tokens:
strings of characters
representing lexical units
of programs such as
identifiers, numbers,
operators.
Recap 34
Regular Expressions:
concise description of
tokens. A regular
expression describes a
set of strings.
Recap 35
Language L(R):
set of strings represented
by a regular expression
R. L(R) is the language
denoted by regular
expression R.
How to Use REs 36
We need mechanism to
determine if an input
string w belongs to L(R),
the language denoted
by regular expression R.
Acceptor 37
Specification:
Regular Expressions
Implementation:
Finite Automata
Finite Automata
Finite Automaton consists of
An input alphabet (S)
A set of states
A start (initial) state
A set of transitions
A set of accepting (final) states
39
Finite Automaton 40
State Graphs
A state
An accepting
state
Finite Automaton 41
State Graphs
a
A transition
Finite Automata 42
A finite automaton
accepts a string if we can
follow transitions labelled
with characters in the
string from start state to
some accepting state.
FA Example 43
1
0
FA Example 45
A FA that accepts ab*a
Alphabet: {a,b}
b
a a
Table Encoding of FA
Transition b
table a a
0 1 2
a b
0 1 err
1 2 1
2 err err
46
RE → Finite Automata 47
Nondeterministic Finite
Automaton (NFA)
Can have multiple
transitions for one input
in a given state
Can have e - moves
Epsilon Moves 49
ε – moves
machine can move from state A to state B without consuming
input
e
A B
NFA 50
operation of the automaton is
not completely defined by input
1
0 1
A B C
1
0 1
A B C
0
DFA and NFA 53
Deterministic Finite
Automata (DFA)
One transition per input
per state.
No e - moves
Execution of FA 54
A DFA
can take only one path
through the state graph.
Completely determined by
input.
NFA vs DFA 55
Thompson’s construction
(CACM 1968)
Build an NFA for each RE
term.
Combine NFAs with
e-moves.
RE → NFA Construction 59
Subset construction
NFA → DFA
Build the simulation.
Minimize number of
states in DFA (Hopcroft’s
algorithm)
RE → NFA Construction 60
Key idea:
NFA pattern for each
symbol and each
operator.
Join them with e-moves in
precedence order.
RE → NFA Construction 61
a
s0 s1
NFA for a
s0 a
s1 e s3 b s4
NFA for ab
RE → NFA Construction 62
a
NFA for a s0 s1
RE → NFA Construction 63
a
NFA for a s0 s1
b
NFA for b s3 s4
RE → NFA Construction 64
a
NFA for a s0 s1
b
NFA for b s3 s4
s0 a b
s1 s3 s4
RE → NFA Construction 65
a
NFA for a s0 s1
b
NFA for b s3 s4
s0 a
s1 e s3 b s4
NFA for ab
RE → NFA Construction 66
a
e s1 s2 e
s0 s5
e b
s3 s4 e
NFA for a | b
RE → NFA Construction 67
s1 a
s2
NFA for a
RE → NFA Construction 68
s1 a
s2
s3 b s4
a
e s1 s2 e
s0 s5
e b
s3 s4 e
NFA for a | b
RE → NFA Construction 70
s0 e s1 a
s2 e s4
e
NFA for a*
RE → NFA Construction 71
s1 a
s2
NFA for a
RE → NFA Construction 72
s0 e s1 a
s2 e s4
e
NFA for a*
Example RE → NFA 73
b
e e e s4 s5 e
a e
s0 s1 s2 s3 s8 s9
e s6 c
s7 e
e
Example RE → NFA 74
a
s0 s1
Example RE → NFA 75
b
s4 s5
a
s0 s1
s6 c
s7
Example RE → NFA 76
b
e s4 s5 e
a
s0 s1 s3 s8
e s6 c
s7 e
Example RE → NFA 77
e
Example RE → NFA 78
b
e e e s4 s5 e
a e
s0 s1 s2 s3 s8 s9
e s6 c
s7 e