0% found this document useful (0 votes)
2 views

compiler construction Lecture 3-4

The document provides an overview of compiler construction, focusing on formal languages, automata theory, and lexical analysis. It explains key concepts such as tokens, patterns, and regular expressions, and describes the process of constructing a lexical analyzer. Additionally, it covers finite automata, including deterministic (DFA) and nondeterministic (NFA) types, and discusses the conversion between regular expressions and finite automata.

Uploaded by

programmerareeba
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
2 views

compiler construction Lecture 3-4

The document provides an overview of compiler construction, focusing on formal languages, automata theory, and lexical analysis. It explains key concepts such as tokens, patterns, and regular expressions, and describes the process of constructing a lexical analyzer. Additionally, it covers finite automata, including deterministic (DFA) and nondeterministic (NFA) types, and discusses the conversion between regular expressions and finite automata.

Uploaded by

programmerareeba
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 78

1

COMPILER CONSTRUCTION(CS-636)

Gul Sher Ali


MS(CS & Tech.)-China

______________________________________________________
GIMS- PMAS Arid Agriculture University, Gujrat Campus
B-
Formal Languages & Automata2
Theory (a review in one slide)
 Alphabet: a finite set of symbols
 String: a finite, possibly empty sequence of
symbols from an alphabet
 Language: a set, often infinite, of strings
 Finite specifications of (possibly infinite)
languages
 Automaton – a recognizer; a machine that
accepts all strings in a language (and rejects all
other strings)
 Grammar – a generator; a system for producing
all strings in the language (and no other strings)
 A particular language may be specified by
many different grammars and automata
 A grammar or automaton specifies only one
language
Lexical Analyzer

 Reads the input characters of the source program and


produces a sequence of tokens for each lexeme.

Source Program Lexical Tokens Parser


Analyzer (Syntax analyzer)
(Scanner)
Get next Token

Symbol
Table
3
Lexical Analyzer

 Token : A pair consisting of token name & attribute


value.

 E.g. :
 <id , a>

 <keyword, int >

 < keyword , if >

 < number , 123>

4
Lexical Analyzer

 Pattern: Describes the form that a lexeme can take.

Id [R.E]

Letter {letter + digit )*

Letter = a-z

Digit = 0-9

 Lexeme: A sequence of characters that matches the pattern of a


token.

 Literal: Any thing surrounded by double quotes “”.

5
Lexical Analyzer

 Task of Lexical analyzer:

1. Deletion of comments, white spaces, blanks , tabs, etc.

2. Produces the sequence of tokens.

3. Keeps the tracks of lines in case of errors.

6
Tokens 7
Example:

if( i == j )
z = 0;
else
z = 1;
Tokens 8
 Input is just a sequence of characters:

i f ( \b i \b = = \b j \n \t ....
Tokens 9

Goal:
 partition input string into
substrings
 classify them according to
their role
Tokens 10

A token is a syntactic
category
 Natural language:
“He wrote the program”
 Words: “He”, “wrote”,
“the”, “program”
Tokens 11

 Programming language:
“if(b == 0) a = b”
 Words:
“if”, “(”, “b”, “==”, “0”,
“)”, “a”, “=”, “b”
Tokens 12

 Identifiers:x y11 maxsize


 Keywords: if else while
for
 Integers: 2 1000 -44 5L
 Floats: 2.0 0.0034 1e5
 Symbols: ( ) + * / { } < > ==
 Strings: “enter x” “error”
Scanning 13

 When a token is found:


 It is passed to the next phase of compiler.
 Sometimes values associated with the token,
called attributes, need to be calculated.
 Some tokens, together with their attributes,
must be stored in the symbol/literal table.
 it is necessary to check if the token is already in the
table
 Examples of attributes
 Attributes of a variable are name, address,
type, etc.
 An attribute of a numeric constant is its value.
How to construct a 14

scanner
 Define tokens in the source language.
 Describe the patterns allowed for
tokens.
 Write regular expressions describing
the patterns.
 Construct an FA for each pattern.
 Combine all FA’s which results in an
NFA.
 Convert NFA into DFA
 Write a program simulating the DFA.
Ad-hoc Lexer 15

 Hand-write code to
generate tokens.
 Partition the input string
by reading left-to-right,
recognizing one token at
a time
Ad-hoc Lexer 16

 Look-ahead required to
decide where one token
ends and the next token
begins.
Ad-hoc Lexer 17

Problems:
 Do not know what kind of
token we are going to
read from seeing first
character.
Ad-hoc Lexer 18

Problems:
 If token begins with “i”, is
it an identifier “i” or
keyword “if”?
 If token begins with “=”, is
it “=” or “==”?
Ad-hoc Lexer 19

 Need a more principled


approach
 Use lexer generator that
generates efficient
tokenizer automatically.
How to Describe Tokens? 20

 Regular
Languages are the
most popular for specifying
tokens
Simple and useful theory
Easy to understand
Efficient implementations
Languages 21

 Let S be a set of
characters. S is called
the alphabet.
 A language over S is set
of strings of characters
drawn from S.
Example of Languages 22

Alphabet = English characters


Language = English sentences

Alphabet = ASCII
Language = C++ programs,
Java, C#
Notation 23

 Languages are sets of


strings (finite sequence of
characters)
 Need some notation for
specifying which sets we
want
Notation 24

 Forlexical analysis we
care about regular
languages.
 Regular languages can
be described using
regular expressions.
Regular Languages 25

 Each regular expression is


a notation for a regular
language (a set of words).
 If A is a regular expression,
we write L(A) to refer to
language denoted by A.
Regular Expression 26

A regular expression (RE)


is defined inductively
a ordinary character
from S
e the empty string
Regular Expression 27

R|S = either R or S
RS = R followed by S
(concatenation)
R* = concatenation of R
zero or more times
(R*= e |R|RR|RRR...)
RE Extentions 28

R? = e | R (zero or one R)
R + = RR* (one or more R)
(R) = R (grouping)
RE Extentions 29

[abc] = a|b|c (any of


listed)
[a-z] = a|b|....|z (range)
[^ab] = c|d|... (anything
but
‘a’‘b’)
Regular Expression 30

RE Strings in L(R)
a “a”
ab “ab”
a|b “a” “b”
(ab)* “” “ab” “abab” ...
(a|e)b “ab” “b”
Example: integers 31

 integer: a non-empty
string
of digits
 digit =
‘0’|’1’|’2’|’3’|’4’|
’5’|’6’|’7’|’8’|’9’
 integer = digit digit*
Example: identifiers 32

 identifier:
string or letters or digits
starting with a letter
 C identifier:
[a-zA-Z_][a-zA-Z0-9_]*
Recap 33

Tokens:
strings of characters
representing lexical units
of programs such as
identifiers, numbers,
operators.
Recap 34

Regular Expressions:
concise description of
tokens. A regular
expression describes a
set of strings.
Recap 35

Language L(R):
set of strings represented
by a regular expression
R. L(R) is the language
denoted by regular
expression R.
How to Use REs 36

 We need mechanism to
determine if an input
string w belongs to L(R),
the language denoted
by regular expression R.
Acceptor 37

 Sucha mechanism is called


an acceptor.
input w
string yes, if w e L
acceptor
no, if w e L
language L
Finite Automata (FA) 38

 Specification:
Regular Expressions
 Implementation:
Finite Automata
Finite Automata
Finite Automaton consists of
 An input alphabet (S)
 A set of states
 A start (initial) state
 A set of transitions
 A set of accepting (final) states

39
Finite Automaton 40

State Graphs
A state

The start state

An accepting
state
Finite Automaton 41

State Graphs
a

A transition
Finite Automata 42

A finite automaton
accepts a string if we can
follow transitions labelled
with characters in the
string from start state to
some accepting state.
FA Example 43

A FA that accepts only “1”


1
FA Example 44
 A FA that accepts any number of 1’s followed by a single 0

1
0
FA Example 45
 A FA that accepts ab*a
 Alphabet: {a,b}

b
a a
Table Encoding of FA
 Transition b
table a a
0 1 2

a b
0 1 err
1 2 1
2 err err
46
RE → Finite Automata 47

 Can we build a finite


automaton for every
regular expression?
 Yes, – build FA inductively
based on the definition of
Regular Expression
NFA 48

Nondeterministic Finite
Automaton (NFA)
 Can have multiple
transitions for one input
in a given state
 Can have e - moves
Epsilon Moves 49
 ε – moves
machine can move from state A to state B without consuming
input

e
A B
NFA 50
operation of the automaton is
not completely defined by input
1
0 1
A B C

On input “11”, automaton could be


in either state
Execution of FA 51

A NFA can choose


 Whether to make e-
moves.
 Which of multiple
transitions to take for a
single input.
Acceptance of NFA 52
 NFA can get into multiple states
 Rule: NFA accepts if it can get in a final state

1
0 1
A B C

0
DFA and NFA 53

Deterministic Finite
Automata (DFA)
 One transition per input
per state.
 No e - moves
Execution of FA 54

A DFA
 can take only one path
through the state graph.
 Completely determined by
input.
NFA vs DFA 55

 NFAs and DFAs recognize


the same set of
languages (regular
languages)
 DFAs are easier to
implement – table driven.
NFA vs DFA 56

 Fora given language,


the NFA can be simpler
than the DFA.
 DFA can be
exponentially larger than
NFA.
NFA vs DFA 57

 NFAsare the key to


automating RE → DFA
construction.
RE → NFA Construction 58

Thompson’s construction
(CACM 1968)
 Build an NFA for each RE
term.
 Combine NFAs with
e-moves.
RE → NFA Construction 59

Subset construction
NFA → DFA
 Build the simulation.
 Minimize number of
states in DFA (Hopcroft’s
algorithm)
RE → NFA Construction 60

Key idea:
 NFA pattern for each
symbol and each
operator.
 Join them with e-moves in
precedence order.
RE → NFA Construction 61

a
s0 s1
NFA for a

s0 a
s1 e s3 b s4

NFA for ab
RE → NFA Construction 62
a
NFA for a s0 s1
RE → NFA Construction 63
a
NFA for a s0 s1
b
NFA for b s3 s4
RE → NFA Construction 64
a
NFA for a s0 s1
b
NFA for b s3 s4

s0 a b
s1 s3 s4
RE → NFA Construction 65
a
NFA for a s0 s1
b
NFA for b s3 s4

s0 a
s1 e s3 b s4

NFA for ab
RE → NFA Construction 66

a
e s1 s2 e
s0 s5
e b
s3 s4 e

NFA for a | b
RE → NFA Construction 67

s1 a
s2

NFA for a
RE → NFA Construction 68

s1 a
s2

s3 b s4

NFA for a and b


RE → NFA Construction 69

a
e s1 s2 e
s0 s5
e b
s3 s4 e

NFA for a | b
RE → NFA Construction 70

s0 e s1 a
s2 e s4

e
NFA for a*
RE → NFA Construction 71

s1 a
s2

NFA for a
RE → NFA Construction 72

s0 e s1 a
s2 e s4

e
NFA for a*
Example RE → NFA 73

NFA for a ( b|c )* e

b
e e e s4 s5 e
a e
s0 s1 s2 s3 s8 s9
e s6 c
s7 e

e
Example RE → NFA 74

building NFA for a ( b|c )*

a
s0 s1
Example RE → NFA 75

NFA for a, b and c

b
s4 s5
a
s0 s1
s6 c
s7
Example RE → NFA 76

NFA for a and b|c

b
e s4 s5 e
a
s0 s1 s3 s8
e s6 c
s7 e
Example RE → NFA 77

NFA for a and ( b|c )*


e
b
e e s4 s5 e
a e
s0 s1 s2 s3 s8 s9
e s6 c
s7 e

e
Example RE → NFA 78

NFA for a ( b|c )* e

b
e e e s4 s5 e
a e
s0 s1 s2 s3 s8 s9
e s6 c
s7 e

You might also like