Learn Compiler Design: From B. K. Sharma
UNIT I
Lexical Analysis: Token, Lexeme, Pattern and Role of Lexical Analyzer
Unit I: Syllabus
• Introduction to Compiler
• Structure of a compiler
• Lexical Analysis
• Role of Lexical Analyzer
• Input Buffering
• Specification of Tokens
• Recognition of Tokens
• Lex
• Finite Automata
• Regular Expressions to Automata
• Minimizing DFA
Active Learning Activity: Diagnostic Assessment
One-Minute Paper
List the 6 phases of a compiler.
List the 8 components of the structure of a compiler.
Summary of Lesson 2: Structure of a Compiler
The compiler's structure is modular, consisting of the following 8 components:
1: Lexical Analyzer takes as input a stream of characters and produces as output a stream of words, phrases or tokens along with their associated syntactic categories.
2: Syntax Analyzer takes as input a stream of tokens and recognizes their structure according to the grammar of the language, producing a parse tree or syntax tree as output.
3: Semantic Analyzer checks the static semantics of the language and annotates the syntax tree with type information.
4: Intermediate Code Generator produces machine-independent code, which is therefore portable.
5: Code Optimizer produces better but semantically equivalent code.
6: Target Code Generator produces the final low-level target code, which is in assembly language or machine language.
7: Symbol Table is a data structure used to store information about identifiers, their types, and other attributes.
8: Error Handler ensures the compiler can detect and report errors to the programmer, aiding in code correction and improvement.
Mapping of Lesson with Course Outcome (CO)
Lesson | CO
Lesson 3: Lexical Analysis and Role of Lexical Analyzer | Apply the knowledge of theory of computation in specifying and recognizing tokens
Tokens, Lexeme or Pattern
Token:
A group of characters having a collective meaning.
The smallest individual unit of a language that is recognized.
Token types or classes:
• Identifiers
• Keywords
• Operators
• Numbers
• Delimiters
• Parentheses
Lexical Analysis and Lexical Analyzer
Source code -> Scanner -> Tokens -> Parser -> IR
(Errors are reported along the way.)
Lexical Analysis:
The task concerned with breaking an input into its
smallest meaningful units, called tokens.
Lexical Analyzer (Scanner):
Program that reads input characters and produces
a sequence of tokens as output.
Tasks or Role of Lexical Analyzer: Scanner
Main Role:
Read the characters of the source language (a stream of characters) and break them up into tokens, the smallest meaningful units of the source language.
Other Roles:
• Remove white space and tabs
• Remove comments
• Interpret compiler directives
• Insert tokens into the symbol table (ST)
• Generate errors
• Send tokens to the parser
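Several of these roles can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the design of any particular compiler: the token class names (COMMENT, SKIP, NUMBER, IDENT, OP, DELIM) and the function name scan are hypothetical, only a handful of C-like lexemes are covered, and compiler directives and the symbol table are left out.

```python
import re

# Hypothetical token classes, tried in this order; names are assumptions.
TOKEN_SPEC = [
    ("COMMENT", r"/\*.*?\*/|//[^\n]*"),    # C-style comments, discarded
    ("SKIP",    r"[ \t\n]+"),               # white space and tabs, discarded
    ("NUMBER",  r"[0-9]+"),
    ("IDENT",   r"[A-Za-z][A-Za-z0-9]*"),
    ("OP",      r"==|[=+\-*/]"),
    ("DELIM",   r"[();]"),
]
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC),
    re.DOTALL,  # lets /* ... */ comments span several lines
)

def scan(source):
    """Return a list of (token, lexeme) pairs ending with ("EOF", "")."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            # Lexical error: no pattern matches at this position.
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        if match.lastgroup not in ("COMMENT", "SKIP"):
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    tokens.append(("EOF", ""))  # the token stream ends with EOF
    return tokens
```

For example, scan("z = 12; // set z") returns [("IDENT", "z"), ("OP", "="), ("NUMBER", "12"), ("DELIM", ";"), ("EOF", "")]: the white space and the comment never appear in the token stream.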
Tokens, Lexeme or Pattern
Lexeme:
A lexeme is a particular instance of a token.
e.g. Token: identifier, lexemes: a, x, y
Token: number, lexeme: 2.5
Token: operator, lexemes: +, -, *, /
Token: parenthesis, lexemes: (, )
Token: keyword, lexemes: int, if (note that main is an identifier in C, not a keyword)
Pattern:
The rule describing how a token can be formed.
e.g.: Identifier: ([a-z]|[A-Z]) ([a-z]|[A-Z]|[0-9])*
a letter followed by zero or more letters or digits
e.g.: Number: [0-9]+
one or more digits
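The two patterns above can be tried out directly as regular expressions. The sketch below uses Python's re module; the function name token_class is a hypothetical helper introduced only for this illustration.

```python
import re

# The two patterns from the slide, written as Python regular expressions.
IDENTIFIER = re.compile(r"([a-z]|[A-Z])([a-z]|[A-Z]|[0-9])*")
NUMBER = re.compile(r"[0-9]+")

def token_class(lexeme):
    """Return the token whose pattern matches the whole lexeme, or None."""
    if IDENTIFIER.fullmatch(lexeme):
        return "identifier"
    if NUMBER.fullmatch(lexeme):
        return "number"
    return None
```

With these patterns, token_class("x1") gives "identifier" and token_class("42") gives "number", while "2x" and "2.5" match neither pattern and give None.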
Lexical Analysis: Token Example
Let us consider a C-Language statement:
if (x==3)
Tokens are:
KEYWORD, LPAR, IDENT, EQ, NUMBER, RPAR
Token and lexeme pairs are:
<KEYWORD, "if"> <LPAR, "("> <IDENT, "x">
<EQ, "=="> <NUMBER, "3"> <RPAR, ")">
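One common way to tell keywords from identifiers is to match the identifier pattern first and then look the lexeme up in a table of reserved words. The sketch below is a minimal illustration of that idea for this one statement; the KEYWORDS subset, the token names and the function name pairs are assumptions made for the example.

```python
import re

KEYWORDS = {"if", "else", "int", "return"}   # small assumed subset of C keywords
PARENS = {"(": "LPAR", ")": "RPAR"}
# One lexeme at a time, skipping any white space in front of it.
LEXEME = re.compile(r"\s*([A-Za-z][A-Za-z0-9]*|[0-9]+|==|[()])")

def pairs(statement):
    """Return the (token, lexeme) pairs of a statement such as 'if (x==3)'."""
    statement = statement.rstrip()
    result = []
    pos = 0
    while pos < len(statement):
        match = LEXEME.match(statement, pos)
        if match is None:
            raise SyntaxError(f"no pattern matches at offset {pos}")
        lexeme = match.group(1)
        if lexeme in KEYWORDS:
            token = "KEYWORD"          # reserved word wins over identifier
        elif lexeme[0].isalpha():
            token = "IDENT"
        elif lexeme[0].isdigit():
            token = "NUMBER"
        elif lexeme == "==":
            token = "EQ"
        else:
            token = PARENS[lexeme]
        result.append((token, lexeme))
        pos = match.end()
    return result
```

Calling pairs("if (x==3)") returns [("KEYWORD", "if"), ("LPAR", "("), ("IDENT", "x"), ("EQ", "=="), ("NUMBER", "3"), ("RPAR", ")")]. A name such as iffy is matched whole and is not a reserved word, so it comes out as IDENT.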
Character stream: i f ( x = = 3 )
-> scanner ->
Token stream (must end with EOF): IF, LPAR, IDENT, EQ, NUMBER, RPAR, ..., EOF
Lexical Analysis: Token Attribute Example
Input to the lexical analyzer: y := 31 + 28*x
Output passed to the parser, as <token, tokenval> pairs (tokenval is the token attribute):
<id, "y"> <assign, ":="> <num, 31> <+, > <num, 28> <*, > <id, "x">
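The token and attribute pairs in this example can be reproduced with a short Python sketch. It assumes only the lexemes that occur in this one statement; the group names and the function name tokenize are hypothetical. Numbers carry their integer value as the attribute, and operators carry no attribute (None), mirroring the empty slot in <+, >.

```python
import re

PATTERNS = [
    ("id",     r"[A-Za-z][A-Za-z0-9]*"),
    ("num",    r"[0-9]+"),
    ("assign", r":="),
    ("op",     r"[+*]"),
    ("ws",     r"\s+"),
]
SCANNER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in PATTERNS))

def tokenize(text):
    """Return (token, tokenval) pairs for statements like 'y := 31 + 28*x'."""
    out = []
    pos = 0
    while pos < len(text):
        match = SCANNER.match(text, pos)
        if match is None:
            raise SyntaxError(f"no pattern matches at offset {pos}")
        kind, lexeme = match.lastgroup, match.group()
        if kind == "num":
            out.append(("num", int(lexeme)))   # attribute is the numeric value
        elif kind == "op":
            out.append((lexeme, None))         # operator is its own token, no attribute
        elif kind != "ws":
            out.append((kind, lexeme))         # attribute is the lexeme itself
        pos = match.end()
    return out
```

tokenize("y := 31 + 28*x") returns [("id", "y"), ("assign", ":="), ("num", 31), ("+", None), ("num", 28), ("*", None), ("id", "x")].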
Lexical Analysis: Token Example
Let us consider another C statement:
if (x==y)
    z = 12;
else
    z = 3;
Character stream:
i f ( x = = y ) \n \t z = 1 2 ; \n e l s e \n \t z = 3 ; \n
Token stream:
<KEY, "if"> <LPAR> <ID, "x"> <LOP, "=="> <ID, "y">
<RPAR> <ID, "z"> <OP, "="> <INT, "12"> <SEMIC>
<KEY, "else"> <ID, "z"> <OP, "="> <INT, "3"> <SEMIC>
Active Learning Activity
One-Minute Paper
Consider the following code in C Language:
printf ("i=%d, j=%f, &i=%x\n", i, j, &i);
The number of tokens found by the lexical analyzer is?
a) 10 b) 35 c) 12 d) 46
Hint: everything inside the double quotes in printf() is a string literal and is counted as a single token.
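The count can be checked mechanically. The sketch below is a rough counter, not a full C lexer: the pattern covers only the kinds of lexemes that occur in this statement (string literal, identifier, integer constant and a few punctuators), and the function name count_tokens is hypothetical. Characters that match no alternative, such as white space, are simply skipped by findall.

```python
import re

TOKEN = re.compile(r'''
    "(?:\\.|[^"\\])*"        # string literal: one token, whatever it contains
  | [A-Za-z_][A-Za-z0-9_]*   # identifier
  | [0-9]+                   # integer constant
  | [(),;&]                  # punctuators used in this statement
''', re.VERBOSE)

def count_tokens(statement):
    """Count the tokens in a simple C statement (assumed lexeme kinds only)."""
    return len(TOKEN.findall(statement))
```

Under these assumptions, count_tokens(r'printf ("i=%d, j=%f, &i=%x\n", i, j, &i);') gives 12: printf, (, the string literal, three commas, i, j, &, i, ) and ;.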
Active Learning Activity
One-Minute Paper
Consider the following code in C Language:
#include<stdio.h>
int main()
{
    printf("%d + %d =%d",3,1,4);
    return 0;
}
The number of lexemes after pre-processing is?
a) 10 b) 12 c) 20 d) 5
Hint: during pre-processing, file inclusion and macro substitution are performed, and pre-processing directives and comments are removed.
What is the need for separating parser
from scanner?
Source code -> Scanner -> Tokens -> Parser
(Errors are reported along the way.)
The lexical analyzer does not have to be an individual phase, but having a separate phase simplifies the design and improves efficiency and portability.
Reasons for separating the two analyses:
1) Simpler design.
Separation allows each of the two to be simplified.
Example: a parser that must also handle comments and white space is more complex.
2) Compiler efficiency is improved.
The lexical analyzer can be optimized on its own, which matters because a large amount of time is spent reading the source program and partitioning it into tokens.
3) Compiler portability is enhanced.
Only the scanner needs to communicate with the outside world.
Input alphabet peculiarities and other device-specific anomalies can be restricted to the lexical analyzer.
4) Specialization.
Specialized techniques can be applied to improve the lexical analysis process.
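The "simpler design" point can be seen in a small sketch. Here a hypothetical scanner generator discards white space and /* */ comments, so a hypothetical parser for sums of products only ever sees tokens. The grammar, the token names and both function names are assumptions chosen for this illustration.

```python
import re

SPEC = re.compile(r"(?P<skip>\s+|/\*.*?\*/)|(?P<num>[0-9]+)|(?P<op>[+*])", re.DOTALL)

def scanner(text):
    """Yield (token, value) pairs; white space and comments never reach the parser."""
    pos = 0
    while pos < len(text):
        match = SPEC.match(text, pos)
        if match is None:
            raise SyntaxError(f"lexical error at offset {pos}")
        if match.lastgroup == "num":
            yield ("num", int(match.group()))
        elif match.lastgroup == "op":
            yield (match.group(), None)
        pos = match.end()
    yield ("EOF", None)

def parse(tokens):
    """Evaluate expr := term ('+' term)*, term := num ('*' num)* from a token stream."""
    tokens = iter(tokens)
    current = next(tokens)

    def factor():
        nonlocal current
        token, value = current
        if token != "num":
            raise SyntaxError(f"expected a number, found {token}")
        current = next(tokens)
        return value

    def term():
        nonlocal current
        value = factor()
        while current[0] == "*":
            current = next(tokens)
            value *= factor()
        return value

    def expr():
        nonlocal current
        value = term()
        while current[0] == "+":
            current = next(tokens)
            value += term()
        return value

    result = expr()
    if current[0] != "EOF":
        raise SyntaxError(f"unexpected token {current[0]}")
    return result
```

parse(scanner("31 /* note */ +\n28*2")) evaluates to 87, and the parser code contains no logic for spaces, newlines or comments. That logic lives only in scanner, which could be replaced or optimized without touching parse.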
Summary of Lesson 3: Lexical Analysis and Role of Lexical Analyzer
1: A token is a group of characters having a collective meaning; it is the smallest individual unit of a language that is recognized.
2: A lexeme is a particular instance of a token.
3: A pattern is the rule describing how a token can be formed.
4: The primary task of the lexical analyzer is to read the characters of the source language (a stream of characters) and break them up into tokens, the smallest meaningful units of the source language.
5: The secondary tasks of the lexical analyzer are: remove white space and tabs, remove comments, interpret compiler directives, insert tokens into the symbol table, report lexical errors, and send tokens to the parser.
6: Separating scanner from parser simplifies the design and
improves the efficiency and portability.
Active Learning Activity
One-Minute Paper
When the expression sum=3+2 is tokenized, what is the token category of 3?
a) Identifier b) Assignment operator
c) Integer Literal d) Addition Operator
Lexical Analysis: Questions
Explain the terms token, lexeme and pattern with examples.
What is the need for separating the parser from the scanner?
Active Learning Activity: After-Learning Process to Check Important Takeaways and Gather Feedback for Improving the Teaching and Learning Process
One-Minute Paper
1) What was the most interesting part of the session?
2) What was the most confusing part of the session?
3) Give a score out of 10 for this session.
If it is not 10 out of 10, tell me why not.