compiler construction Lecture 3-4

The document provides an overview of compiler construction, focusing on formal languages, automata theory, and lexical analysis. It explains key concepts such as tokens, patterns, and regular expressions, and describes the process of constructing a lexical analyzer. Additionally, it covers finite automata, including deterministic (DFA) and nondeterministic (NFA) types, and discusses the conversion between regular expressions and finite automata.

Uploaded by

programmerareeba

COMPILER CONSTRUCTION (CS-636)

Gul Sher Ali

MS(CS & Tech.)-China

______________________________________________________
GIMS- PMAS Arid Agriculture University, Gujrat Campus
Formal Languages & Automata Theory (a review in one slide) 2
• Alphabet: a finite set of symbols
• String: a finite, possibly empty sequence of symbols from an alphabet
• Language: a set, often infinite, of strings
• Finite specifications of (possibly infinite) languages
• Automaton: a recognizer; a machine that accepts all strings in a language (and rejects all other strings)
• Grammar: a generator; a system for producing all strings in the language (and no other strings)
• A particular language may be specified by many different grammars and automata
• A grammar or automaton specifies only one language
Lexical Analyzer 3
• Reads the input characters of the source program, groups them into lexemes, and produces a token for each lexeme.
[Figure: Source Program → Lexical Analyzer (Scanner) → Tokens → Parser (Syntax analyzer); the parser asks the scanner to "Get next Token"; both use the Symbol Table]
Lexical Analyzer 4
• Token: a pair consisting of a token name and an attribute value.
• E.g.:
<id, a>
<keyword, int>
<keyword, if>
<number, 123>
Lexical Analyzer 5
• Pattern: describes the form that a lexeme can take. E.g. the pattern for id as a regular expression [R.E.]:
id = letter (letter | digit)*
letter = a-z
digit = 0-9
• Lexeme: a sequence of characters that matches the pattern of a token.
• Literal: anything surrounded by double quotes “”.
Lexical Analyzer 6
• Tasks of the lexical analyzer:
1. Deletes comments and white space (blanks, tabs, etc.).
2. Produces the sequence of tokens.
3. Keeps track of line numbers so that errors can be reported.
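The three tasks above can be sketched in a few lines of Python. This is only an illustration, not the course's scanner: the token names, the `//` comment syntax, and the function `scan` are assumptions made for the example.

```python
import re

# Hypothetical token patterns; the names and the "//" comment syntax are assumptions.
TOKEN_SPEC = [
    ("COMMENT", r"//[^\n]*"),
    ("NEWLINE", r"\n"),
    ("SKIP",    r"[ \t]+"),
    ("NUMBER",  r"[0-9]+"),
    ("ID",      r"[a-z][a-z0-9]*"),
    ("SYMBOL",  r"==|[=;(){}]"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def scan(source):
    """Return (tokens, errors); each token is (name, lexeme, line)."""
    tokens, errors = [], []
    line, pos = 1, 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            # Task 3: the line counter makes the error message useful.
            errors.append(f"line {line}: unexpected character {source[pos]!r}")
            pos += 1
            continue
        kind = m.lastgroup
        if kind == "NEWLINE":
            line += 1
        elif kind not in ("COMMENT", "SKIP"):
            # Task 1 is the branch not taken: comments and blanks are dropped.
            # Task 2: everything else becomes a token.
            tokens.append((kind, m.group(), line))
        pos = m.end()
    return tokens, errors
```

Calling `scan("z = 0; // reset\nz = @1;")` yields eight tokens over two lines and one error message for the stray `@`.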
Tokens 7
Example:

if( i == j )
z = 0;
else
z = 1;
Tokens 8
• Input is just a sequence of characters (\b marks a blank):

i f ( \b i \b = = \b j \n \t ....
Tokens 9
Goal:
• partition the input string into substrings
• classify them according to their role
Tokens 10
A token is a syntactic category
• Natural language: “He wrote the program”
• Words: “He”, “wrote”, “the”, “program”
Tokens 11
• Programming language: “if(b == 0) a = b”
• Words: “if”, “(”, “b”, “==”, “0”, “)”, “a”, “=”, “b”
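As a rough illustration, the split into words shown above can be reproduced with Python's `re` module. The pattern `WORD` and the function `words` are names chosen for this sketch; the pattern covers only the symbols that occur in the example.

```python
import re

# Alternatives are tried left to right, so "==" is listed before the
# single-character class; otherwise "==" would be split into two "=".
WORD = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|[0-9]+|==|[()=]")

def words(text):
    """Partition text into its words; blanks match nothing and are skipped."""
    return WORD.findall(text)

print(words("if(b == 0) a = b"))
```

Note that this only partitions the input; classifying each word (keyword, identifier, number, symbol) is a separate step.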
Tokens 12
• Identifiers: x y11 maxsize
• Keywords: if else while for
• Integers: 2 1000 -44 5L
• Floats: 2.0 0.0034 1e5
• Symbols: ( ) + * / { } < > ==
• Strings: “enter x” “error”
Scanning 13
• When a token is found:
• It is passed to the next phase of the compiler.
• Sometimes values associated with the token, called attributes, need to be calculated.
• Some tokens, together with their attributes, must be stored in the symbol/literal table.
• It is necessary to check whether the token is already in the table.
• Examples of attributes
• Attributes of a variable are name, address, type, etc.
• An attribute of a numeric constant is its value.
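A minimal sketch of the table lookup described above, assuming a dictionary-backed table. The class `SymbolTable`, the helpers `make_id_token` and `make_number_token`, and the entry fields are hypothetical names chosen for illustration.

```python
class SymbolTable:
    """Stores one entry per identifier; entries hold the attributes."""

    def __init__(self):
        self._entries = {}

    def lookup_or_insert(self, name):
        # Check whether the name is already in the table before adding it.
        if name not in self._entries:
            self._entries[name] = {"name": name, "type": None, "address": None}
        return self._entries[name]

    def __len__(self):
        return len(self._entries)

def make_id_token(table, lexeme):
    """Token for an identifier: the attribute is its symbol-table entry."""
    return ("id", table.lookup_or_insert(lexeme))

def make_number_token(lexeme):
    """Token for a numeric constant: the attribute is its value."""
    return ("number", int(lexeme))
```

Scanning the same identifier twice returns the same entry, so later phases can fill in `type` and `address` in one place.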
How to construct a scanner 14
• Define tokens in the source language.
• Describe the patterns allowed for tokens.
• Write regular expressions describing the patterns.
• Construct an FA for each pattern.
• Combine all FAs, which results in an NFA.
• Convert the NFA into a DFA.
• Write a program simulating the DFA.
Ad-hoc Lexer 15
• Hand-write code to generate tokens.
• Partition the input string by reading left-to-right, recognizing one token at a time.
Ad-hoc Lexer 16
• Look-ahead is required to decide where one token ends and the next token begins.
Ad-hoc Lexer 17
Problems:
• We cannot tell what kind of token we are about to read from seeing only the first character.
Ad-hoc Lexer 18
Problems:
• If a token begins with “i”, is it an identifier “i” or the keyword “if”?
• If a token begins with “=”, is it “=” or “==”?
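Both problems can be handled in hand-written code by peeking at the next character. The sketch below is one possible approach, not the lecture's code; `next_token` and `KEYWORDS` are names made up for the example, and only letters, digits, and `=` are handled.

```python
KEYWORDS = {"if", "else", "while", "for"}

def next_token(text, pos):
    """Return (kind, lexeme, new_pos) for the token starting at text[pos]."""
    ch = text[pos]
    if ch.isalpha():
        end = pos
        # Keep reading while the look-ahead character can extend the word.
        while end < len(text) and text[end].isalnum():
            end += 1
        lexeme = text[pos:end]
        # Only after the whole word is read do we know: keyword or identifier.
        kind = "keyword" if lexeme in KEYWORDS else "id"
        return kind, lexeme, end
    if ch == "=":
        # One character of look-ahead separates "=" from "==".
        if pos + 1 < len(text) and text[pos + 1] == "=":
            return "symbol", "==", pos + 2
        return "symbol", "=", pos + 1
    raise ValueError(f"unexpected character {ch!r} at position {pos}")
```

For example, `next_token("if(x)", 0)` reports the keyword `if`, while `next_token("ifx", 0)` reports the identifier `ifx`, even though both start with the same two characters.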
Ad-hoc Lexer 19
• Need a more principled approach
• Use a lexer generator that generates an efficient tokenizer automatically.
How to Describe Tokens? 20
• Regular languages are the most popular for specifying tokens
• Simple and useful theory
• Easy to understand
• Efficient implementations
Languages 21
• Let Σ be a set of characters. Σ is called the alphabet.
• A language over Σ is a set of strings of characters drawn from Σ.
Example of Languages 22

Alphabet = English characters

Language = English sentences

Alphabet = ASCII
Language = C++ programs,
Java, C#
Notation 23
• Languages are sets of strings (finite sequences of characters)
• Need some notation for specifying which sets we want
Notation 24
• For lexical analysis we care about regular languages.
• Regular languages can be described using regular expressions.
Regular Languages 25
• Each regular expression is a notation for a regular language (a set of words).
• If A is a regular expression, we write L(A) to refer to the language denoted by A.
Regular Expression 26
A regular expression (RE) is defined inductively
a = ordinary character from Σ
ε = the empty string
Regular Expression 27
R|S = either R or S
RS = R followed by S (concatenation)
R* = concatenation of R zero or more times (R* = ε | R | RR | RRR ...)
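The inductive definition can be made concrete by treating each language as a Python set of strings. This is a sketch for building intuition only: the function names are chosen here, and because R* is infinite in general, `star` takes an assumed cut-off `max_len` that the formal definition does not have.

```python
def union(r, s):
    """L(R|S): strings in either language."""
    return r | s

def concat(r, s):
    """L(RS): a string from R followed by a string from S."""
    return {x + y for x in r for y in s}

def star(r, max_len):
    """L(R*) restricted to strings of at most max_len characters."""
    result = {""}            # zero repetitions: the empty string
    frontier = {""}
    while frontier:
        # Extend every newly found string by one more string from R.
        frontier = {x + y for x in frontier for y in r
                    if len(x + y) <= max_len} - result
        result |= frontier
    return result

EPSILON = {""}               # the language of the RE ε
print(sorted(star({"ab"}, 4)))                        # (ab)*
print(sorted(concat(union({"a"}, EPSILON), {"b"})))   # (a|ε)b
```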
RE Extensions 28
R? = ε | R (zero or one R)
R+ = RR* (one or more R)
(R) = R (grouping)
RE Extensions 29
[abc] = a|b|c (any of listed)
[a-z] = a|b|....|z (range)
[^ab] = c|d|... (anything but ‘a’ ‘b’)
Regular Expression 30
RE        Strings in L(R)
a         “a”
ab        “ab”
a|b       “a” “b”
(ab)*     “” “ab” “abab” ...
(a|ε)b    “ab” “b”
Example: integers 31
• integer: a non-empty string of digits
• digit = ‘0’|‘1’|‘2’|‘3’|‘4’|‘5’|‘6’|‘7’|‘8’|‘9’
• integer = digit digit*
Example: identifiers 32
• identifier: a string of letters or digits starting with a letter
• C identifier: [a-zA-Z_][a-zA-Z0-9_]*
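These two definitions translate almost directly into Python's `re` syntax. The helper names `is_integer` and `is_c_identifier` are chosen for this sketch; `fullmatch` is used so that the whole string, not just a prefix, must belong to the language.

```python
import re

DIGIT = r"[0-9]"
INTEGER = re.compile(rf"{DIGIT}{DIGIT}*")           # integer = digit digit*
C_IDENTIFIER = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")

def is_integer(s):
    return INTEGER.fullmatch(s) is not None

def is_c_identifier(s):
    return C_IDENTIFIER.fullmatch(s) is not None

print(is_integer("1000"), is_integer(""))
print(is_c_identifier("maxsize"), is_c_identifier("9lives"))
```

As written, `integer` has no sign, so a string such as `-44` is not in its language; a sign would need an extra optional term in the pattern.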
Recap 33
Tokens: strings of characters representing lexical units of programs such as identifiers, numbers, operators.
Recap 34
Regular Expressions: concise description of tokens. A regular expression describes a set of strings.
Recap 35
Language L(R): set of strings represented by a regular expression R. L(R) is the language denoted by regular expression R.
How to Use REs 36
• We need a mechanism to determine if an input string w belongs to L(R), the language denoted by regular expression R.
Acceptor 37
• Such a mechanism is called an acceptor.
[Figure: an acceptor for language L takes an input string w and answers yes if w ∈ L, no if w ∉ L]
Finite Automata (FA) 38
• Specification: Regular Expressions
• Implementation: Finite Automata
Finite Automata 39
A finite automaton consists of
• An input alphabet (Σ)
• A set of states
• A start (initial) state
• A set of transitions
• A set of accepting (final) states
Finite Automaton 40
State Graphs
[Figure: graph symbols for a state, the start state, and an accepting state]
Finite Automaton 41
State Graphs
[Figure: graph symbol for a transition, an arrow labelled a]
Finite Automata 42
A finite automaton accepts a string if we can follow transitions labelled with the characters in the string from the start state to some accepting state.
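FA Example 43
An FA that accepts only “1”
[Figure: state graph with a single transition labelled 1]
FA Example 44
• An FA that accepts any number of 1’s followed by a single 0
[Figure: state graph with transitions labelled 1 and 0]
FA Example 45
• An FA that accepts ab*a
• Alphabet: {a,b}
[Figure: state graph with two transitions labelled a and one labelled b]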
Table Encoding of FA 46
• Transition table
[Figure: state graph with states 0, 1, 2 and transitions labelled a, b, a]

state   a     b
0       1     err
1       2     1
2       err   err
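The transition table above can drive a very small simulator. In this sketch the table is a dictionary of dictionaries and `err` is represented by `None`. Taking state 0 as the start state and state 2 as the only accepting state is an assumption consistent with the expression ab*a; the slide text itself does not mark them.

```python
ERR = None   # stands for the "err" entries of the table

# One row per state, one column per input symbol, copied from the table.
TRANSITIONS = {
    0: {"a": 1,   "b": ERR},
    1: {"a": 2,   "b": 1},
    2: {"a": ERR, "b": ERR},
}
START = 0          # assumed start state
ACCEPTING = {2}    # assumed accepting state

def accepts(s):
    """Run the table-driven automaton on s; True if it ends in an accepting state."""
    state = START
    for ch in s:
        state = TRANSITIONS[state].get(ch, ERR)
        if state is ERR:
            return False     # no transition: reject immediately
    return state in ACCEPTING

for w in ["aa", "abba", "a", "abab"]:
    print(w, accepts(w))
```

The loop does one table lookup per input character, which is why table-driven automata are simple to implement.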
RE → Finite Automata 47
• Can we build a finite automaton for every regular expression?
• Yes: build the FA inductively, based on the definition of regular expressions.
NFA 48
Nondeterministic Finite Automaton (NFA)
• Can have multiple transitions for one input in a given state
• Can have ε-moves
Epsilon Moves 49
• ε-moves: the machine can move from state A to state B without consuming input
[Figure: state A with an ε-transition to state B]
NFA 50
The operation of the automaton is not completely defined by the input.
[Figure: NFA with states A, B, C and transitions labelled 0 and 1]

On input “11”, the automaton could be in either state.
Execution of FA 51
An NFA can choose
• Whether to make ε-moves.
• Which of multiple transitions to take for a single input.
Acceptance of NFA 52
• An NFA can get into multiple states
• Rule: an NFA accepts if it can get into a final state
[Figure: NFA with states A, B, C and transitions labelled 0 and 1]
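One way to make the acceptance rule concrete is to track the set of all states the NFA could be in. The automaton below is a hypothetical example with states A, B, C and is not the one drawn on the slide (its edges are not recoverable from the text); the names `DELTA`, `eps_closure`, `reachable`, and `accepts` are chosen for this sketch.

```python
EPS = ""   # label used here for ε-moves

# Hypothetical NFA: A loops on 0 and 1, A goes to B on 1, B has an ε-move to C.
DELTA = {
    ("A", "0"): {"A"},
    ("A", "1"): {"A", "B"},   # two choices for the same input
    ("B", EPS): {"C"},
}
START, FINAL = "A", {"C"}

def eps_closure(states):
    """All states reachable from `states` using ε-moves only."""
    closure, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for t in DELTA.get((s, EPS), set()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure

def reachable(s):
    """The set of states the NFA could be in after reading s."""
    current = eps_closure({START})
    for ch in s:
        moved = set()
        for q in current:
            moved |= DELTA.get((q, ch), set())
        current = eps_closure(moved)
    return current

def accepts(s):
    # Rule: accept if at least one possible state is final.
    return bool(reachable(s) & FINAL)

print(sorted(reachable("11")), accepts("11"))
```

This example NFA accepts exactly the strings over {0, 1} that end in 1.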
DFA and NFA 53
Deterministic Finite Automata (DFA)
• One transition per input per state.
• No ε-moves
Execution of FA 54
A DFA
• can take only one path through the state graph.
• is completely determined by the input.
NFA vs DFA 55
• NFAs and DFAs recognize the same set of languages (regular languages)
• DFAs are easier to implement (table driven).
NFA vs DFA 56
• For a given language, the NFA can be simpler than the DFA.
• The DFA can be exponentially larger than the NFA.
NFA vs DFA 57
• NFAs are the key to automating RE → DFA construction.
RE → NFA Construction 58
Thompson’s construction (CACM 1968)
• Build an NFA for each RE term.
• Combine NFAs with ε-moves.
RE → NFA Construction 59
Subset construction NFA → DFA
• Build the simulation.
• Minimize the number of states in the DFA (Hopcroft’s algorithm)
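A compact sketch of the subset construction follows: each DFA state is the set of NFA states reachable on the same input. The input NFA is a small hypothetical example (strings over {0, 1} ending in 1), and all names are chosen for illustration. Minimization (Hopcroft’s algorithm) is not included.

```python
from collections import deque

EPS = ""   # label used here for ε-moves

# Hypothetical NFA: A loops on 0 and 1, A goes to B on 1, B has an ε-move to C.
NFA = {
    ("A", "0"): {"A"},
    ("A", "1"): {"A", "B"},
    ("B", EPS): {"C"},
}
NFA_START, NFA_FINAL, ALPHABET = "A", {"C"}, ("0", "1")

def eps_closure(states):
    """Frozen set of NFA states reachable from `states` via ε-moves."""
    closure, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for t in NFA.get((s, EPS), set()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)

def subset_construction():
    """Return (start, transitions, finals) of the equivalent DFA."""
    start = eps_closure({NFA_START})
    dfa, seen, queue = {}, {start}, deque([start])
    while queue:
        subset = queue.popleft()
        for ch in ALPHABET:
            moved = set()
            for q in subset:
                moved |= NFA.get((q, ch), set())
            target = eps_closure(moved)
            if not target:
                continue              # no move: implicit error state
            dfa[(subset, ch)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    finals = {s for s in seen if s & NFA_FINAL}
    return start, dfa, finals

def dfa_accepts(text):
    start, dfa, finals = subset_construction()
    state = start
    for ch in text:
        state = dfa.get((state, ch))
        if state is None:
            return False
    return state in finals
```

For this NFA the construction produces two DFA states, {A} and {A, B, C}, of which only the second is accepting.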
RE → NFA Construction 60
Key idea:
• NFA pattern for each symbol and each operator.
• Join them with ε-moves in precedence order.
RE → NFA Construction 61
[Figure: NFA for a: s0 -a-> s1]
[Figure: NFA for ab: s0 -a-> s1 -ε-> s3 -b-> s4]
RE → NFA Construction 62
[Figure: NFA for a: s0 -a-> s1]
RE → NFA Construction 63
[Figure: NFA for a: s0 -a-> s1; NFA for b: s3 -b-> s4]
RE → NFA Construction 64
[Figure: NFA for a: s0 -a-> s1; NFA for b: s3 -b-> s4; the two NFAs placed side by side, not yet joined]
RE → NFA Construction 65
[Figure: NFA for a: s0 -a-> s1; NFA for b: s3 -b-> s4; NFA for ab: s0 -a-> s1 -ε-> s3 -b-> s4]
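RE → NFA Construction 66
[Figure: NFA for a | b: s0 -ε-> s1 -a-> s2 -ε-> s5; s0 -ε-> s3 -b-> s4 -ε-> s5]
RE → NFA Construction 67
[Figure: NFA for a: s1 -a-> s2]
RE → NFA Construction 68
[Figure: NFA for a and b: s1 -a-> s2; s3 -b-> s4]
RE → NFA Construction 69
[Figure: NFA for a | b: s0 -ε-> s1 -a-> s2 -ε-> s5; s0 -ε-> s3 -b-> s4 -ε-> s5]
RE → NFA Construction 70
[Figure: NFA for a*: s0 -ε-> s1 -a-> s2 -ε-> s4, with further ε-edges for repetition and for the empty string]
RE → NFA Construction 71
[Figure: NFA for a: s1 -a-> s2]
RE → NFA Construction 72
[Figure: NFA for a*: s0 -ε-> s1 -a-> s2 -ε-> s4, with further ε-edges for repetition and for the empty string]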
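Example RE → NFA 73
[Figure: NFA for a ( b|c )*: s0 -a-> s1 -ε-> s2 -ε-> s3; s3 -ε-> s4 -b-> s5 -ε-> s8; s3 -ε-> s6 -c-> s7 -ε-> s8; s8 -ε-> s9; s8 -ε-> s3; s2 -ε-> s9]
Example RE → NFA 74
building NFA for a ( b|c )*
[Figure: NFA for a: s0 -a-> s1]
Example RE → NFA 75
[Figure: NFA for a, b and c: s0 -a-> s1; s4 -b-> s5; s6 -c-> s7]
Example RE → NFA 76
[Figure: NFA for a and b|c: s0 -a-> s1; s3 -ε-> s4 -b-> s5 -ε-> s8; s3 -ε-> s6 -c-> s7 -ε-> s8]
Example RE → NFA 77
[Figure: NFA for a and ( b|c )*: s0 -a-> s1; s2 -ε-> s3; s3 -ε-> s4 -b-> s5 -ε-> s8; s3 -ε-> s6 -c-> s7 -ε-> s8; s8 -ε-> s9; s8 -ε-> s3; s2 -ε-> s9]
Example RE → NFA 78
[Figure: NFA for a ( b|c )*: s0 -a-> s1 -ε-> s2 -ε-> s3; s3 -ε-> s4 -b-> s5 -ε-> s8; s3 -ε-> s6 -c-> s7 -ε-> s8; s8 -ε-> s9; s8 -ε-> s3; s2 -ε-> s9]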
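The step-by-step build of the NFA for a ( b|c )* can be sketched in code. This is an illustrative version of Thompson-style construction, not the lecture's implementation: the tuple-based syntax tree, the class `Thompson`, and the helpers `closure` and `accepts` are assumptions, and states are numbered in creation order rather than s0 to s9 as in the figures.

```python
from itertools import count

EPS = None   # label used here for ε-moves

class Thompson:
    def __init__(self):
        self.edges = {}        # (state, label) -> set of target states
        self._ids = count()

    def _new(self):
        return next(self._ids)

    def _edge(self, src, label, dst):
        self.edges.setdefault((src, label), set()).add(dst)

    def build(self, node):
        """Return (start, accept) of the NFA fragment for a syntax-tree node."""
        kind = node[0]
        if kind == "sym":                       # single character
            s, t = self._new(), self._new()
            self._edge(s, node[1], t)
            return s, t
        if kind == "cat":                       # RS: join with one ε-move
            s1, t1 = self.build(node[1])
            s2, t2 = self.build(node[2])
            self._edge(t1, EPS, s2)
            return s1, t2
        if kind == "alt":                       # R|S: new start and accept
            s1, t1 = self.build(node[1])
            s2, t2 = self.build(node[2])
            s, t = self._new(), self._new()
            for inner in (s1, s2):
                self._edge(s, EPS, inner)
            for inner in (t1, t2):
                self._edge(inner, EPS, t)
            return s, t
        if kind == "star":                      # R*: loop back and bypass
            s1, t1 = self.build(node[1])
            s, t = self._new(), self._new()
            self._edge(s, EPS, s1)
            self._edge(t1, EPS, t)
            self._edge(t1, EPS, s1)
            self._edge(s, EPS, t)
            return s, t
        raise ValueError(f"unknown node kind: {kind!r}")

def closure(edges, states):
    result, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for t in edges.get((q, EPS), set()):
            if t not in result:
                result.add(t)
                stack.append(t)
    return result

def accepts(edges, start, accept, text):
    current = closure(edges, {start})
    for ch in text:
        moved = set()
        for q in current:
            moved |= edges.get((q, ch), set())
        current = closure(edges, moved)
    return accept in current

# a ( b|c )* written as a syntax tree
TREE = ("cat", ("sym", "a"), ("star", ("alt", ("sym", "b"), ("sym", "c"))))
builder = Thompson()
START, ACCEPT = builder.build(TREE)
print(accepts(builder.edges, START, ACCEPT, "abcb"))
```

Built this way, the automaton has ten states, the same number as the figure for a ( b|c )*.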
