Chapter 2
COMPILER DESIGN
Phases of a Compiler
Figure: a typical decomposition (implementation) of a compiler into its phases.
2.1. The Role of the Lexical Analysis
The Lexical Analyzer (LA) is the first phase of a compiler; this phase is also called linear analysis or scanning.
The LA reads a stream of characters as input and produces a sequence of tokens.
Main Functions of the Lexical Analyzer
The first task is to read the input characters (the stream of characters) and
produce a sequence of tokens that the parser uses for syntax analysis.
The second task is to remove comments and white space from the source code,
in the form of blank, tab, and newline characters.
Another task is to generate an error message if it finds an invalid token in the
source program.
2.1. The Role of the Lexical Analysis…(Cont.)
Generally, the LA reads a stream of characters as input and produces a sequence of tokens.
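To make these tasks concrete, the following C sketch (all names are illustrative; this is not the implementation of any particular compiler) reads characters from standard input, skips white space, and groups the characters into identifier, number, or single-symbol tokens. A real lexical analyzer would also strip comments, distinguish keywords from identifiers, and report invalid characters as errors.

#include <ctype.h>
#include <stdio.h>

/* Illustrative token categories; a real compiler defines many more. */
enum TokenKind { TK_IDENTIFIER, TK_NUMBER, TK_SYMBOL, TK_EOF };

struct Token {
    enum TokenKind kind;
    char lexeme[64];                 /* the matched character sequence */
};

/* Read one token: skip white space, then collect an identifier,
 * a number, or a single punctuation symbol. */
struct Token next_token(void) {
    struct Token t = { TK_EOF, "" };
    int c = getchar();
    int n = 0;

    while (c == ' ' || c == '\t' || c == '\n')   /* drop white space */
        c = getchar();
    if (c == EOF)
        return t;

    if (isalpha(c)) {                            /* identifier (or keyword) */
        while (isalnum(c) && n < 63) { t.lexeme[n++] = (char)c; c = getchar(); }
        ungetc(c, stdin);
        t.kind = TK_IDENTIFIER;
    } else if (isdigit(c)) {                     /* numeric constant */
        while (isdigit(c) && n < 63) { t.lexeme[n++] = (char)c; c = getchar(); }
        ungetc(c, stdin);
        t.kind = TK_NUMBER;
    } else {                                     /* single-character symbol */
        t.lexeme[n++] = (char)c;
        t.kind = TK_SYMBOL;
    }
    t.lexeme[n] = '\0';
    return t;
}

int main(void) {
    for (struct Token t = next_token(); t.kind != TK_EOF; t = next_token())
        printf("token %d: %s\n", (int)t.kind, t.lexeme);
    return 0;
}

Compiled with a C compiler and given the line int value = 100; as input, it prints one line per token.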
2.2. Token, Pattern, Lexeme
An implementation of a lexical analyzer can:
remove all white space and comments,
identify tokens, and
return the lexeme of each found token.
Token: describes a category of input strings.
In a programming language, keywords, constants, identifiers, strings, numbers, white space,
operators, and punctuation symbols are considered tokens.
For example, in the C language, the variable declaration line
int value = 100;
contains the tokens:
int (keyword), value (identifier), = (assignment operator), 100 (constant), and ; (semicolon symbol).
Attributes of Tokens
When more than one pattern matches a lexeme,
the lexical analyzer must provide additional information about the particular lexeme.
The lexical analyzer collects information about tokens into their associated attributes.
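A minimal sketch of how a token and its attribute can be paired (the type names and the attribute choices here are illustrative): the token names the category, while the attribute carries the extra information, such as the numeric value of a constant or a pointer to a symbol-table entry.

#include <stdio.h>

enum Kind { KEYWORD, IDENTIFIER, ASSIGN, NUMBER, SEMICOLON };

struct TokenAttr {
    enum Kind kind;        /* token category                        */
    const char *lexeme;    /* matched character sequence            */
    long value;            /* attribute: meaningful only for NUMBER */
};

int main(void) {
    /* Token stream produced for:  int value = 100;  */
    struct TokenAttr tokens[] = {
        { KEYWORD,    "int",   0   },
        { IDENTIFIER, "value", 0   },  /* in practice: symbol-table pointer */
        { ASSIGN,     "=",     0   },
        { NUMBER,     "100",   100 },  /* attribute: the numeric value      */
        { SEMICOLON,  ";",     0   },
    };
    for (int i = 0; i < 5; i++)
        printf("<%d, %s>\n", (int)tokens[i].kind, tokens[i].lexeme);
    return 0;
}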
2.2. Token, Pattern, Lexeme
A lexeme is a sequence of characters in the source program that
matches the pattern for a token.
Example: a = b + c;
Tokens: identifiers, operators, and punctuation.
Lexemes: a, b, c, +, =, ;
2.2. Token, Pattern, Lexeme
Program 1 (source code):
int max(int a, int b)
{
    if (a > b)
        return a;
    else
        return b;
}
Its lexemes include int, max, a, b, if, >, return, else, and the punctuation symbols ( ) { } , ;
2.2. Token, Pattern, Lexeme…(Cont.)
Specifications of Tokens
The following are used in specifying tokens:
a) Alphabets
b) Strings
c) Special Symbols
d) Language
e) Regular Expressions
Let us understand how language theory defines these terms:
a) Alphabets
An alphabet is any finite set of symbols.
{0, 1} is the set of binary alphabets (binary digits),
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F} is the set of hexadecimal alphabets,
{a-z, A-Z} is the set of English language alphabets (letters).
2.2. Token, Pattern, Lexeme…(Cont.)
b) Strings
Any finite sequence of alphabet symbols (characters) is called a string.
A string over some alphabet is a finite sequence of symbols drawn from that alphabet.
The length of a string S is the total number of occurrences of symbols in it, denoted by |S|.
E.g. the length of the string "compiler" is 8, denoted |compiler| = 8.
A string having no symbols, i.e. a string of zero length, is known as the empty string and is
denoted by ε (epsilon).
c) Special Symbols
A typical high-level language also contains special symbols such as arithmetic operators,
punctuation symbols, assignment and comparison operators, and logical operators.
2.2. Token, Pattern, Lexeme…(Cont.)
d) Language
A language is a set of strings over some finite set of fixed
alphabets.
Computer languages are considered as such sets, and the usual mathematical set operations
can be performed on them.
Finite languages can be described by means of regular expressions.
e) Regular Expressions
Regular expressions are an important notation for specifying the lexeme patterns of tokens.
Each pattern matches a set of strings, so a regular expression serves as a name for a set of
strings.
Regular expressions are used to represent the language for the lexical analyzer.
The lexical analyzer needs to scan and identify only the finite set of valid
strings/tokens/lexemes that belong to the language in hand.
It searches for the patterns defined by the language rules.
2.2. Token, Pattern, Lexeme…(Cont.)
e) Regular Expressions…(Cont.)
A grammar that can be defined by regular expressions is known as a regular grammar.
The language defined by a regular grammar is known as a regular language.
There are a number of algebraic laws obeyed by regular expressions; the underlying
operations on languages are union, concatenation, and closure.
In lexical analysis, by using regular expressions it is possible to represent:
valid tokens of a language,
occurrences of symbols, and
language tokens.
Representing Valid Tokens of a Language in regular expressions
If x is a regular expression, then:
x* means zero or more occurrences of x,
i.e. it can generate {ε, x, xx, xxx, xxxx, …}.
x+ means one or more occurrences of x,
i.e. it can generate {x, xx, xxx, xxxx, …}; equivalently, x+ = x.x*.
2.2. Token, Pattern, Lexeme…(Cont.)
e) Regular Expressions…(Cont.)
x? means at most one occurrence of x,
i.e. it can generate either {x} or {ε}.
[a-z] is all lower-case letters of the English language.
[A-Z] is all upper-case letters of the English language.
[0-9] is all the decimal digits.
Representation of Occurrences of Symbols using regular expressions
letter = [a-z] | [A-Z]
digit = 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9, i.e. [0-9]
sign = [+ | -]
Representation of Language Tokens using regular expressions
decimal = (sign)?(digit)+
identifier = (letter)(letter | digit)*
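As a sketch of how these patterns can be checked mechanically (using the POSIX regex.h API, which is an implementation choice not mentioned in the slides), the two token definitions above translate directly into extended regular expressions:

#include <regex.h>
#include <stdio.h>

/* Check one string against a POSIX extended regular expression. */
static int matches(const char *pattern, const char *text) {
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return 0;
    int ok = (regexec(&re, text, 0, NULL, 0) == 0);
    regfree(&re);
    return ok;
}

int main(void) {
    /* decimal    = (sign)?(digit)+          ->  ^[+-]?[0-9]+$         */
    /* identifier = (letter)(letter|digit)*  ->  ^[a-zA-Z][a-zA-Z0-9]*$ */
    const char *decimal    = "^[+-]?[0-9]+$";
    const char *identifier = "^[a-zA-Z][a-zA-Z0-9]*$";

    printf("%d\n", matches(decimal, "-42"));       /* 1: valid decimal     */
    printf("%d\n", matches(identifier, "value1")); /* 1: valid identifier  */
    printf("%d\n", matches(identifier, "1value")); /* 0: starts with digit */
    return 0;
}

The program prints 1 for strings that match the pattern and 0 otherwise.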
2.2. Token, Pattern, Lexeme…(Cont.)
d) Regular Expressions…(Cont.)
Table 1: pattern matching examples of regular expression
12
2.2. Token, Pattern, Lexeme…(Cont.)
e) Regular Expressions…(Cont.)
Therefore, in lexical analysis, by using regular expressions it is possible to
represent:
valid tokens of a language,
occurrences of symbols, and
language tokens.
The only problem left for the lexical analyzer is how to verify the validity of a
regular expression used in specifying the patterns of the keywords of a
language.
A well-accepted solution to this problem is using Finite Automata for
verification.
2.3. Lexical Errors
Lexical Errors:
Lexical errors are not very common, but they should be managed by the scanner.
Misspelling of identifiers, operators, or keywords, and the appearance of illegal
characters, are considered lexical errors.
Some errors are beyond the power of the LA to recognize, because the LA has a very localized
view of the source program, e.g.:
fi (a == b)
whlie (a < b)
Such errors are recognized when no pattern for tokens matches a character
sequence, e.g. numeric constants which are ill-formed:
int i = 4567$91;   /* lexical error */
2.3. Lexical Errors…(Cont.)
Error Recovery
Deleting an extraneous character,
e.g. coutt -> cout
Inserting a missing character,
e.g. cot -> cout
Replacing an incorrect character by a correct character,
e.g. couf -> cout
Transposing two adjacent characters,
e.g. ocut -> cout
2.4. Automata
A Finite Automaton (FA) is a state machine that takes a string of symbols as input and
changes its state accordingly.
An FA is a recognizer for regular expressions.
When a regular expression string is fed into a finite automaton, it changes its state for each
literal.
If the input string is successfully processed and the automaton reaches its final state, the string is
accepted,
i.e. the string that was fed in is said to be a valid token of the language in hand.
Regular Expressions = Specification
Finite Automata = Implementation
FA representations:
Graphical (transition diagram),
Tabular (transition table), and
Mathematical (transition function or mapping function).
2.4. Automata….(Cont.)
Formal Definition of Finite Automata
An FA is a 5-tuple:
M = (Q, Σ, δ, q0, F)
Where:
Q is a finite set of states.
Σ is a finite set of input symbols (the alphabet).
δ : Q × Σ → Q is the transition function.
q0 ∈ Q is the start state, also called the initial state.
F ⊆ Q is the set of final (accepting) states.
2.4. Automata….(Cont.)
A) Transition Diagram (Transition Graph)
It is a directed graph in which the vertices correspond to the states of the
finite automaton and the edges, labeled with input symbols, show the transitions.
In a state graph over the inputs {0, 1}:
the initial state is marked with an incoming arrow,
intermediate states are drawn as single circles, and
the final state is drawn as a double circle.
2.4. Automata…..(Cont.)
B) Transition Table
It is a tabular representation of the transition function: it takes two
arguments (a state and a symbol) and returns a value (the "next state").
Rows correspond to states,
columns correspond to input symbols,
and entries give the next states.
The start state is marked with an arrow (->).
The accepting states are marked with a star (*).
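A minimal C sketch of how a transition table can drive a DFA (the state numbering, the table entries, and the accepting set here are illustrative; this particular table happens to accept binary strings ending in "01"):

#include <stdio.h>

#define NUM_STATES  3
#define NUM_SYMBOLS 2   /* input alphabet {0, 1}, encoded as columns 0 and 1 */

/* delta[state][symbol] = next state */
static const int delta[NUM_STATES][NUM_SYMBOLS] = {
    { 1, 0 },   /* state 0 (start):  on '0' -> 1, on '1' -> 0 */
    { 1, 2 },   /* state 1:          on '0' -> 1, on '1' -> 2 */
    { 1, 0 },   /* state 2 (accept): on '0' -> 1, on '1' -> 0 */
};

static const int start_state = 0;
static const int accepting[NUM_STATES] = { 0, 0, 1 };   /* only state 2 accepts */

/* Run the DFA over a string of '0'/'1' characters. */
static int dfa_accepts(const char *input) {
    int state = start_state;
    for (int i = 0; input[i] != '\0'; i++) {
        int symbol = input[i] - '0';          /* map '0'/'1' to column 0/1 */
        if (symbol < 0 || symbol >= NUM_SYMBOLS)
            return 0;                         /* character outside the alphabet */
        state = delta[state][symbol];
    }
    return accepting[state];
}

int main(void) {
    printf("%d\n", dfa_accepts("1101"));  /* 1: ends in "01", accepted */
    printf("%d\n", dfa_accepts("1010"));  /* 0: ends in "10", rejected */
    return 0;
}

The same skeleton works for any DFA in this chapter: only the table, the start state, and the accepting set change.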
2.4. Automata
C) Transition Function
The mapping function, or transition function, is denoted by δ.
Two parameters are passed to this transition function:
the current state and
an input symbol.
δ : Q × Σ → Q
The transition function returns a state, which is called the next state.
Example: δ(q0, a) = q1 means that if the automaton is in state q0 and reads the
input symbol a, it moves to state q1.
2.4. Automata….(Cont.)
Types of Finite Automata
1) Deterministic Finite Automata (DFA)
2) Non-Deterministic Finite Automata (NDFA or NFA)
A DFA can contain multiple final states. DFAs are used in lexical analysis in compilers.
2.4. Automata….(Cont.)
Formal Definition of DFA
A DFA is a 5-tuple (Q, Σ, δ, q0, F), the same as a general FA, where:
Q is a finite set of states.
Σ is a finite set of input symbols.
q0 is the initial state.
F is the set of final states.
δ is the transition function,
defined as:
δ : Q × Σ → Q
Acceptance of a Language
A string w is accepted by the machine M
if, after reading all of w, M reaches a final state in F.
The string is not accepted if M does not reach a final state.
2.4. Automata….(Cont.)
Generally, when constructing a DFA, the number of states depends on the length of the shortest accepted string:
Number of states = (length of the minimum string) + 1
L = { } means the language contains no strings.
2.4. Automata….(Cont.)
Example 1:
Let the DFA have Q = {a, b, c}
Σ = {0, 1} (input symbols)
a = the initial state
F = {c} (final state)
Transition table:
State | 0 | 1
-> a  | a | a
b     | c | a
* c   | b | c
2.4. Automata….(Cont.)
Example 2: design a DFA with Σ = {0, 1} that accepts those strings which start with 1 and end
with 0.
Solution:
L = {10, 100, 1010, …}
Minimum length = 2
Number of states = minimum length + 1
= 2 + 1
= 3
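As a sketch (the state names and the extra trap state are assumptions, not taken from the slides), one possible transition table for this DFA uses q0 as the start state, q1 for strings that start with 1 and currently end with 1, q2 (accepting) for strings that start with 1 and currently end with 0, plus a trap state qd that absorbs any string starting with 0:
State | 0  | 1
-> q0 | qd | q1
q1    | q2 | q1
* q2  | q2 | q1
qd    | qd | qd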
2.4. Automata….(Cont.)
Example 3: construct a DFA that accepts all strings over {a, b} ending with 'ab'.
Solution:
L = {ab, bab, …}
Minimum length = 2
Number of states = minimum length + 1
= 2 + 1
= 3
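As a sketch (state names are illustrative), one possible transition table for this DFA uses q0 as the start state, q1 for strings currently ending with 'a', and q2 (accepting) for strings currently ending with 'ab':
State | a  | b
-> q0 | q1 | q0
q1    | q1 | q2
* q2  | q1 | q0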
2.4. Automata….(Cont.)
2) NFA (Non-Deterministic Finite Automata)
In an NFA there may exist many paths for a specific input from the current state to the next state.
It is easier to construct an NFA than a DFA for a given regular language.
Not every NFA is a DFA, but every NFA can be translated into an equivalent DFA.
A DFA has only one path for a specific input.
An NFA may have many paths for a specific input.
An NFA is defined in the same way as a DFA, but with two exceptions:
it can have multiple next states for a given input, and
it can contain ε-transitions, which means the machine can move from one state to another without
reading any input.
2.4. Automata….(Cont.)
Formal Definition of NFA
An NFA also has the same five components as a DFA, but a different transition function:
δ : Q × (Σ ∪ {ε}) → 2^Q (the power set of Q)
Where:
Q is a finite set of states.
Σ is a finite set of input symbols.
q0 is the initial state.
F is the set of final states.
δ is the transition function.
2.4. Automata….(Cont.)
Graphical Representation of NFA
An NFA can be represented by a graph called a state diagram, in which:
1. States are represented by vertices.
2. Arcs labeled with input characters show the transitions.
3. The initial state is marked with an arrow.
4. The final state is denoted by a double circle.
2.4. Automata….(Cont.)
Example 1:
Let the NFA have Q = {q0, q1, q2}
Σ = {0, 1} (input symbols)
q0 = the initial state
F = {q2} (final state)
with its transitions given by a transition table over the inputs 0 and 1.
2.4. Automata….(Cont.)
Example 2: construct an NFA for L = {strings over {a, b} that start with 'a'}.
Σ = {a, b}
Solution:
L = {a, ab, abb, …}
Minimum length = 1
Number of states = minimum length + 1
= 1 + 1
= 2
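As a sketch (state names are illustrative), one possible transition table for this NFA uses q0 as the start state and q1 as the accepting state reached once a leading 'a' has been read; q1 then loops on both inputs:
State | a    | b
-> q0 | {q1} | { }
* q1  | {q1} | {q1}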
2.4. Automata….(Cont.)
Example 3: design the NFA corresponding to a given transition table with Q = {q0, q1, q2}
over the input symbols 0 and 1.
2.5. Lexical Analyzer Generator: LEX
Creating a Lexical Analyzer with Lex:
First, a lexical analyzer is prepared by creating a program lex.l in the Lex language.
Then, lex.l is run through the Lex compiler to produce a C program lex.yy.c.
Finally, lex.yy.c is run through the C compiler to produce an object program a.out;
a.out is the lexical analyzer that transforms an input stream into a sequence of
tokens.
2.5. Lexical Analyzer Generator: LEX…..(Cont.)
Lex specification: a Lex program consists of three parts:
declarations
%%
translation rules
%%
auxiliary functions (user code)
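A minimal sketch of a lex.l specification showing the three parts (the token classes and the printed messages are illustrative, not from the slides):

%{
/* declarations: C includes and definitions copied into lex.yy.c */
#include <stdio.h>
%}

digit   [0-9]
letter  [a-zA-Z]

%%
{digit}+                    { printf("NUMBER: %s\n", yytext); }
{letter}({letter}|{digit})* { printf("IDENTIFIER: %s\n", yytext); }
[ \t\n]+                    { /* skip white space */ }
.                           { printf("ILLEGAL CHARACTER: %s\n", yytext); }
%%

/* auxiliary functions (user code) */
int yywrap(void) { return 1; }

int main(void) {
    yylex();        /* run the generated scanner on standard input */
    return 0;
}

Running lex (or flex) on this file and compiling the generated lex.yy.c with a C compiler produces the a.out scanner described above.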
THANK YOU!