lect03
Lexical Analysis:
• reads characters and produces sequences of tokens.
Today’s lecture:
Towards automated Lexical Analysis.
2 Dec 2024 1
The Big Picture
First step in any translation: determine whether the text to be
translated is well constructed in terms of the input language (hence
formal languages, rather than natural languages). Syntax is
specified with parts of speech; syntax checking matches parts of
speech against a grammar.
Language
• Languages in general consist of three components: an alphabet (letters),
words and sentences.
• Examples:
  English                    Computer languages
  Letters of the alphabet    ASCII characters
  Words in a dictionary      Keywords, user-defined identifiers, etc.
Tokens, Lexemes, Patterns
• Tokens
A token is a sequence of characters that is treated as a single
logical unit. Tokens may be a) identifiers b) keywords c) operators
d) special symbols e) constants, and so on.
• Lexemes
A lexeme is a sequence of characters that is matched by the
pattern (i.e., the RE) for a token.
• Patterns
A pattern is the rule that describes which lexemes form a token.
Patterns are specified using regular expressions.
Ex: letter(letter | digit)*
Tokenization
• The process of forming tokens from the input
stream is called tokenization.
Ex: div = 6/2; yields the tokens div, =, 6, /, 2 and ;
What is an attribute of a token?
• The lexical analyser provides additional
information about a particular lexeme, called the attribute of the token.
Ex: y = 4*x + 5;
The token stream should be:
<id, y><op, =><num, 4><op, *><id, x><op, +><num, 5><punct, ;>
COMP36512 Lecture 3
Building a lexical analyser by hand can be
efficient, but it requires a lot of work and may
be difficult to modify.
Hence the more suitable approach is:
Building lexical analysers “automatically”
We study REs to automate scanner construction.
Consider the problem of recognising register names that start with r and
are followed by at least one digit:
Register → r (0|1|2|…|9) (0|1|2|…|9)*   (or, Register → r Digit Digit*)
The RE corresponds to a transition diagram:
[Transition diagram: start → S0; S0 --r--> S1; S1 --digit--> S2; S2 --digit--> S2 (self-loop); S2 is the accepting state.]
An Example (recognise r0 through r31)
Register → r ((0|1|2) (Digit | ε) | (4|5|6|7|8|9) | (3|30|31))
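[Transition diagram: S0 --r--> S1; S1 --0|1|2--> S2 --digit--> S3; S1 --4|5|6|7|8|9--> S4; S1 --3--> S5 --0|1--> S6; accepting states: S2, S3, S4, S5, S6.]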
The Full Story!
Assignments
1. Write a C program that reads the following string:
“Md. Tareq Zaman, Part-3, 2011”
a) Count the number of words, letters, digits and other characters.
b) Separate the letters, digits and other characters.
2. Write a program that reads the following string:
“Munmun is the student of Computer Science & Engineering”.
a) Count how many vowels and consonants there are.
b) Find out which vowels and consonants occur in the above string.
c) Divide the given string into two separate strings, where one string
contains only the words that start with a vowel, and the other contains
the words that start with a consonant.
3. Write a program that expands the following course code:
CSE-3141 as Computer Science & Engineering, 3rd year, 1st semester,
Compiler Design, Theory.