
Lecture Week 03


Compiler Construction
CS 322
Mr. Atif Ali
Lecture 6
How to Describe Tokens?
Regular languages are the most popular way to specify tokens because:
• They are based on a simple and useful theory.
• They are easy to understand.
• Efficient implementations exist for generating lexical analyzers
from such specifications.

Languages
 Let be a set of characters.  is called the
alphabet.
 A language over  is set of strings of characters
drawn from 
2
Example of Languages
Alphabet = English characters
Language = English sentences
Alphabet = ASCII
Language = C++, Java, or C# programs
Notation
• Languages are sets of strings (finite sequences of characters).
• We need a notation for specifying which sets we want.
• For lexical analysis we care about regular languages.
• Regular languages can be described using regular expressions.
Regular Languages
• Each regular expression is a notation for a regular language (a set of words).
• If A is a regular expression, we write L(A) for the language denoted by A.
• A regular expression (RE) is defined inductively:
a = an ordinary character from Σ
ε = the empty string
R|S = either R or S
RS = R followed by S (concatenation)
R* = concatenation of R zero or more times
(R* = ε | R | RR | RRR | ...)
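The inductive definition maps onto a small recursive data type. As a rough sketch (the class names `Char`, `Eps`, `Alt`, `Cat`, `Star` and the helper `language` are illustrative choices, not part of the lecture), the following Python lists the strings of L(R) up to a length bound:

```python
from dataclasses import dataclass

# Hypothetical AST node names, one per case of the inductive definition.
@dataclass(frozen=True)
class Char:
    c: str          # an ordinary character from the alphabet

@dataclass(frozen=True)
class Eps:
    pass            # the empty string

@dataclass(frozen=True)
class Alt:
    r: object       # R|S
    s: object

@dataclass(frozen=True)
class Cat:
    r: object       # RS
    s: object

@dataclass(frozen=True)
class Star:
    r: object       # R*

def language(re, max_len):
    """Return the strings of L(re) whose length is at most max_len."""
    if isinstance(re, Char):
        return {re.c} if max_len >= 1 else set()
    if isinstance(re, Eps):
        return {""}
    if isinstance(re, Alt):
        return language(re.r, max_len) | language(re.s, max_len)
    if isinstance(re, Cat):
        left = language(re.r, max_len)
        right = language(re.s, max_len)
        return {x + y for x in left for y in right if len(x + y) <= max_len}
    if isinstance(re, Star):
        base = language(re.r, max_len)
        result = {""}            # zero repetitions
        frontier = {""}
        while frontier:          # add one more repetition per round
            frontier = {x + y for x in frontier for y in base
                        if len(x + y) <= max_len} - result
            result |= frontier
        return result
    raise TypeError("unknown regular expression node")
```

For example, `language(Star(Cat(Char("a"), Char("b"))), 4)` gives `{"", "ab", "abab"}`, the strings of (ab)* with at most four characters.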
RE Extensions
Regular expression extensions are convenient notation for
complex REs:

R? = ε | R (zero or one R)
R+ = RR* (one or more R)
(R) = R (grouping)
[abc] = a|b|c (any of the listed characters)
[a-z] = a|b|...|z (range)
[^ab] = c|d|... (any character except ‘a’ or ‘b’)
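These equivalences can be checked mechanically. A minimal sketch, assuming Python's standard `re` module and a small test alphabet {a, b, c, d}; the helper `same_language` is a made-up name, and it only compares strings up to a fixed length, so it is a sanity check rather than a proof:

```python
import re
from itertools import product

def same_language(p, q, alphabet="abcd", max_len=3):
    """True if patterns p and q accept the same strings of length <= max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            if bool(re.fullmatch(p, s)) != bool(re.fullmatch(q, s)):
                return False
    return True

# Each pair is (extension, equivalent core form).
pairs = [
    ("a?", "(|a)"),       # R? = ε | R (empty alternative stands for ε)
    ("a+", "aa*"),        # R+ = RR*
    ("[abc]", "a|b|c"),   # any of the listed characters
    ("[a-c]", "a|b|c"),   # range
    ("[^ab]", "c|d"),     # complement, relative to the test alphabet only
]
results = [same_language(p, q) for p, q in pairs]
```

Every entry of `results` should be `True`, while a non-equivalent pair such as `a+` and `a*` is reported as different because only the second accepts the empty string.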
Regular Expression
RE          Strings in L(R)
a           “a”
ab          “ab”
a|b         “a” “b”
(ab)*       “” “ab” “abab” ...
(a|ε)b      “ab” “b”
Here are examples of common tokens found in programming languages.
• integer: a non-empty string of digits
• digit = ‘0’|‘1’|‘2’|‘3’|‘4’|‘5’|‘6’|‘7’|‘8’|‘9’
• integer = digit digit*
Example: identifiers
• identifier: a string of letters or digits, starting with a letter
• C identifier: [a-zA-Z_][a-zA-Z0-9_]*
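The integer and identifier definitions can be tried out directly. A small sketch using Python's `re` module; the function name `classify` and the returned labels are assumptions made for illustration:

```python
import re

# Patterns transcribed from the token definitions in this lecture.
INTEGER = r"[0-9][0-9]*"                  # digit digit*
IDENTIFIER = r"[a-zA-Z_][a-zA-Z0-9_]*"    # C identifier

def classify(lexeme):
    """Return the token class of a whole lexeme, or None if it matches neither."""
    if re.fullmatch(INTEGER, lexeme):
        return "integer"
    if re.fullmatch(IDENTIFIER, lexeme):
        return "identifier"
    return None
```

With these patterns, `classify("2024")` is `"integer"`, `classify("_tmp1")` is `"identifier"`, and `classify("1x")` is `None` because an identifier may not start with a digit.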

How to Use REs


• We need a mechanism to determine whether an input
string w belongs to L(R), the language
denoted by regular expression R.
Acceptor
• Such a mechanism is called an acceptor.

[Figure: an acceptor for language L takes an input string w and answers yes if w ∈ L, no if w ∉ L]
Finite Automata (FA)
• Specification: Regular Expressions
• Implementation: Finite Automata

A finite automaton consists of:

• An input alphabet (Σ)
• A set of states
• A start (initial) state
• A set of transitions
• A set of accepting (final) states
Finite Automaton
State Graphs
[Figure: state-graph notation for a state, the start state, an accepting state, and a transition labelled a]
Finite Automata
• A finite automaton accepts a string if we can
follow transitions labelled with the characters of the
string from the start state to some accepting state.

FA Example
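• An FA that accepts only “1”
[Figure: state graph with one transition labelled 1]

FA Example
• An FA that accepts any number of 1’s followed by
a single 0
[Figure: state graph with transitions labelled 1 and 0]

• An FA that accepts ab*a
• Alphabet: {a, b}
[Figure: state graph with transitions labelled a, b, a]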
Table Encoding of FA
[Figure: state graph with states 0, 1, 2 and transitions labelled a, b, a]

• Transition table:

state   a     b
0       1     err
1       2     1
2       err   err
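The table above can drive a recognizer directly. A minimal sketch, assuming state 0 is the start state and state 2 is the only accepting state (as the expression ab*a implies); the names `TABLE`, `ERR` and `accepts` are illustrative:

```python
ERR = None  # stands for the "err" entries of the table

# TABLE[state][symbol] gives the next state.
TABLE = {
    0: {"a": 1, "b": ERR},
    1: {"a": 2, "b": 1},
    2: {"a": ERR, "b": ERR},
}
START = 0          # assumed start state
ACCEPTING = {2}    # assumed accepting state

def accepts(w):
    """Run the table-driven automaton on w; True if w is in L(ab*a)."""
    state = START
    for ch in w:
        # Characters outside the alphabet are treated as errors too.
        state = TABLE[state].get(ch, ERR)
        if state is ERR:
            return False
    return state in ACCEPTING
```

For instance, `accepts("abbba")` is `True`, while `accepts("ab")` is `False` because the run ends in state 1, which is not accepting.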
RE → Finite Automata
• Can we build a finite automaton for every regular
expression?
• Yes: build the FA inductively, following the definition
of regular expressions.
NFA
Nondeterministic Finite Automaton (NFA)
• Can have multiple transitions for one input in a given state
• Can have ε-moves

Epsilon Moves
• ε-moves: the machine can move from state A
to state B without consuming input
[Figure: states A and B joined by an ε-transition]
NFA
• The operation of the automaton is not completely determined by the input.
[Figure: NFA with states A, B, C and transitions labelled 1, 0, 1]
• On input “11”, the automaton could be in either state.
Execution of FA
An NFA can choose:
• Whether to make ε-moves.
• Which of multiple transitions to take for a single
input.
Acceptance of NFA
• An NFA can get into multiple states.
• Rule: an NFA accepts if it can get into a final state.
[Figure: NFA with states A, B, C and transitions labelled 1, 0, 1, 0]
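One way to apply the acceptance rule is to track the set of all states the NFA could be in. The sketch below uses a hypothetical NFA (not the one in the figure) that accepts binary strings ending in “10”, with an added ε-move for illustration; the names `NFA`, `eps_closure` and `nfa_accepts` are assumptions of this example:

```python
EPS = ""  # label used in this sketch for ε-moves

# Hypothetical NFA over {0, 1}: accepts strings that end in "10".
NFA = {
    ("A", "0"): {"A"},
    ("A", "1"): {"A", "B"},   # two transitions on the same input
    ("B", "0"): {"C"},
    ("C", EPS): {"D"},        # ε-move: taken without consuming input
}
START, FINAL = "A", {"D"}

def eps_closure(states):
    """All states reachable from `states` using only ε-moves."""
    seen, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for t in NFA.get((s, EPS), set()):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

def nfa_accepts(w):
    """Accept if at least one possible run ends in a final state."""
    current = eps_closure({START})
    for ch in w:
        moved = set()
        for s in current:
            moved |= NFA.get((s, ch), set())
        current = eps_closure(moved)
    return bool(current & FINAL)
```

Here `nfa_accepts("110")` is `True`: after the second 1 the automaton may be in A or B, and the run through B reaches the final state.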
DFA and NFA
Deterministic Finite Automata (DFA)
• One transition per input per state.
• No ε-moves
Execution of FA
A DFA
• can take only one path through the state graph.
• That path is completely determined by the input.

NFA vs DFA
• NFAs and DFAs recognize the same set of languages (the regular languages).
• DFAs are easier to implement: they are table driven.
• For a given language, the NFA can be simpler than the DFA.
• The DFA can be exponentially larger than the NFA.
• NFAs are the key to automating the RE → DFA construction.
RE → NFA Construction
Thompson’s construction (CACM 1968)
RE → NFA
• Build an NFA for each RE term.
• Combine the NFAs with ε-moves.
Subset construction
NFA → DFA
• Build the simulation.
• Minimize the number of states in the DFA (Hopcroft’s
algorithm).
Key idea:
• NFA pattern for each symbol and each operator.
• Join them with ε-moves in precedence order.
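The NFA → DFA step can be sketched as follows. This is a minimal version of the subset construction, applied to the small NFA for ab from this lecture (states s0, s1, s3, s4); the function names and the dictionary encoding are assumptions of this example, and the minimization step is not included:

```python
from collections import deque

EPS = ""  # label used in this sketch for ε-moves

def eps_closure(nfa, states):
    """All NFA states reachable from `states` using only ε-moves."""
    seen, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for t in nfa.get((s, EPS), set()):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

def subset_construction(nfa, start, finals, alphabet):
    """Build a DFA whose states are frozensets of NFA states."""
    d0 = frozenset(eps_closure(nfa, {start}))
    table, seen, work = {}, {d0}, deque([d0])
    while work:
        S = work.popleft()
        for a in alphabet:
            moved = set()
            for s in S:
                moved |= nfa.get((s, a), set())
            T = frozenset(eps_closure(nfa, moved))
            if not T:
                continue              # no entry means the error state
            table[(S, a)] = T
            if T not in seen:
                seen.add(T)
                work.append(T)
    dfa_finals = {S for S in seen if S & finals}
    return table, d0, dfa_finals

def dfa_accepts(table, start, finals, w):
    state = start
    for ch in w:
        state = table.get((state, ch))
        if state is None:
            return False
    return state in finals

# NFA for ab: s0 --a--> s1 --ε--> s3 --b--> s4
nfa_ab = {("s0", "a"): {"s1"}, ("s1", EPS): {"s3"}, ("s3", "b"): {"s4"}}
table, d0, dfa_finals = subset_construction(nfa_ab, "s0", {"s4"}, "ab")
```

The resulting DFA has three reachable states, {s0}, {s1, s3} and {s4}; the ε-move disappears because s1 and s3 are merged into one DFA state.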
RE → NFA Construction
NFA for a:
s0 --a--> s1

NFA for b:
s3 --b--> s4

NFA for ab:
s0 --a--> s1 --ε--> s3 --b--> s4
RE → NFA Construction
NFA for a | b:
s0 --ε--> s1 --a--> s2 --ε--> s5
s0 --ε--> s3 --b--> s4 --ε--> s5
RE → NFA Construction
NFA for a*:
s0 --ε--> s1 --a--> s2 --ε--> s4
s2 --ε--> s1 (repeat)
s0 --ε--> s4 (skip)
Example RE → NFA
NFA for a ( b|c )*:
s0 --a--> s1 --ε--> s2 --ε--> s3
s3 --ε--> s4 --b--> s5 --ε--> s8
s3 --ε--> s6 --c--> s7 --ε--> s8
s8 --ε--> s3 (repeat)
s8 --ε--> s9
s2 --ε--> s9 (skip)
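The patterns for symbols, concatenation, alternation and closure can be composed in code. The sketch below is one possible encoding of Thompson-style fragments, not a reference implementation; the helper names (`sym`, `cat`, `alt`, `star`, `accepts`) are invented here, and the state numbers it generates differ from the s0 to s9 labels on the slide:

```python
import itertools

EPS = ""                      # label used in this sketch for ε-moves
_ids = itertools.count()      # fresh state numbers

# A fragment is (start, accept, edges); edges are (source, label, target).
def sym(c):
    s, f = next(_ids), next(_ids)
    return s, f, [(s, c, f)]

def cat(x, y):
    xs, xf, xe = x
    ys, yf, ye = y
    return xs, yf, xe + ye + [(xf, EPS, ys)]

def alt(x, y):
    xs, xf, xe = x
    ys, yf, ye = y
    s, f = next(_ids), next(_ids)
    joins = [(s, EPS, xs), (s, EPS, ys), (xf, EPS, f), (yf, EPS, f)]
    return s, f, xe + ye + joins

def star(x):
    xs, xf, xe = x
    s, f = next(_ids), next(_ids)
    # enter, leave, repeat, and skip edges
    joins = [(s, EPS, xs), (xf, EPS, f), (xf, EPS, xs), (s, EPS, f)]
    return s, f, xe + joins

def closure(edges, states):
    """All states reachable from `states` using only ε-moves."""
    seen, stack = set(states), list(states)
    while stack:
        u = stack.pop()
        for p, label, q in edges:
            if p == u and label == EPS and q not in seen:
                seen.add(q)
                stack.append(q)
    return seen

def accepts(frag, w):
    start, final, edges = frag
    cur = closure(edges, {start})
    for ch in w:
        nxt = {q for p, label, q in edges if p in cur and label == ch}
        cur = closure(edges, nxt)
    return final in cur

nfa = cat(sym("a"), star(alt(sym("b"), sym("c"))))   # a(b|c)*
```

The fragment built for a(b|c)* has ten states, the same count as the example NFA, and `accepts(nfa, "abc")` is `True` while `accepts(nfa, "b")` is `False`.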
Thank You!
