UNIT 3
3.0. Aims and Objectives
This unit discusses tools for generating scanners and the applications of scanners.
After you study this unit, you will be able to:
• Describe tools for generating scanners
• Design and construct scanners using these tools in your lab class
• Understand the roles and applications of scanners
• Know the relationship between scanners and compilers
3.1. INTRODUCTION TO TOOLS
Lexical analysis is the first phase of a compiler and is also known as scanning. It converts the input program into a sequence of tokens. The function of the lexical analyzer is to read the source program, one character at a time, and to translate it into a sequence of primitive units called "tokens". Normally a lexical analyzer does not return a list of tokens in one shot; it returns a token whenever the parser asks for one.
The scanner performs lexical analysis of a certain program (in our case, the Simple program). It reads the source program as a sequence of characters and recognizes "larger" textual units called tokens.
In this unit we are going to see some tools that will help us generate scanners and parsers.
FLEX (Fast LEXical analyzer generator) is a tool for generating scanners. Instead of writing a scanner from scratch, you only need to identify the vocabulary of a certain language (e.g. Simple) and write a specification of patterns using regular expressions (e.g. DIGIT [0-9]), and FLEX will construct a scanner for you. FLEX is generally used as follows. First, FLEX reads a specification of a scanner either from an input file *.lex or from standard input, and it generates as output a C source file lex.yy.c. Then, lex.yy.c is compiled and linked with the "-lfl" library to produce an executable a.out. Finally, a.out analyzes its input stream and transforms it into a sequence of tokens.
• *.lex is in the form of pairs of regular expressions and C code.
• lex.yy.c defines a routine yylex() that uses the specification to recognize tokens.
• a.out is actually the scanner.
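For example, assuming a specification file named scanner.lex and an input file input.txt (both names are placeholders), the typical workflow is:
flex scanner.lex (reads the specification and writes lex.yy.c)
cc lex.yy.c -lfl (compiles the scanner and links the flex library)
./a.out < input.txt (runs the scanner on the input stream)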
The flex input file consists of three sections, separated by a line with just `%%' in it:
Definitions
%%
Rules
%%
User code
The definitions section contains declarations of simple name definitions to simplify the scanner specification, and declarations of start conditions, which are explained in a later section. Name definitions have the form "name definition", where the name is a word beginning with a letter or an underscore ('_') followed by zero or more letters, digits, '_', or '-' (dash). The definition is taken to begin at the first non-white-space character following the name and to continue to the end of the line. The definition can subsequently be referred to using "{name}", which will expand to "(definition)". For example,
DIGIT [0-9]
ID [a-z][a-z0-9]*
defines "DIGIT" to be a regular expression which matches a single digit, and "ID" to be a regular expression which matches a letter followed by zero or more letters or digits. A subsequent reference to {DIGIT}+"."{DIGIT}* is identical to ([0-9])+"."([0-9])* and matches one or more digits followed by a '.' followed by zero or more digits.
The rules section of the flex input contains a series of rules of the form "pattern action", where the pattern must be unindented and the action must begin on the same line.
Finally, the user code section is simply copied to `lex.yy.c' verbatim. It is used for companion
routines which call or are called by the scanner. The presence of this section is optional; if it
is missing, the second `%%' in the input file may be skipped, too.
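To make the three-section layout concrete, here is a minimal flex specification in this style; it is an illustrative sketch (the patterns, messages, and driver are assumptions, not part of any particular language definition):
%{
/* definitions section: C declarations copied verbatim into lex.yy.c */
#include <stdio.h>
%}
DIGIT [0-9]
ID [a-z][a-z0-9]*
%%
{DIGIT}+ { printf("NUMBER: %s\n", yytext); }
{ID} { printf("IDENTIFIER: %s\n", yytext); }
[ \t\n]+ { /* skip whitespace */ }
. { printf("UNRECOGNIZED: %s\n", yytext); }
%%
/* user code section: a simple driver that calls the generated routine */
int main(void) {
    yylex(); /* scan the standard input until end of file */
    return 0;
}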
In the definitions and rules sections, any indented text or text enclosed in `%{' and `%}' is copied verbatim to the output (with the `%{' and `%}' removed). The `%{' and `%}' must appear unindented on lines by themselves.
In the rules section, any indented or %{} text appearing before the first rule may be used to
declare variables which are local to the scanning routine and (after the declarations) code
which is to be executed whenever the scanning routine is entered. Other indented or %{} text in the rules section is still copied to the output, but its meaning is not well-defined and it may well cause compile-time errors.
When the generated scanner is run, it analyzes its input looking for strings which match any
of its patterns. If it finds more than one match, it takes the one matching the most text (for
trailing context rules, this includes the length of the trailing part, even though it will then be
returned to the input). If it finds two or more matches of the same length, the rule listed first
in the flex input file is chosen.
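As a sketch of these two disambiguation rules, consider the following fragment, which reuses the ID definition from above (the token codes KEYWORD_IF and IDENTIFIER are assumed to be defined elsewhere, e.g. by a parser generator):
if { return KEYWORD_IF; }
{ID} { return IDENTIFIER; }
On the input "if" both patterns match the same two characters, so the rule listed first wins and KEYWORD_IF is returned; on the input "ifdef" the second rule matches more text, so the longest match wins and "ifdef" becomes a single IDENTIFIER.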
Once the match is determined, the text corresponding to the match (called the token) is made
available in the global character pointer yytext, and its length in the global integer yyleng.
The action corresponding to the matched pattern is then executed (a more detailed
description of actions follows), and then the remaining input is scanned for another match.
If no match is found, then the default rule is executed: the next character in the input is considered matched and copied to the standard output. Thus, the simplest legal flex input is just `%%', which generates a scanner that simply copies its input (one character at a time) to its output.
Note that yytext can be defined in two different ways: either as a character pointer or as a character array. You can control which definition flex uses by including one of the special directives `%pointer' or `%array' in the first (definitions) section of your flex input. The default is `%pointer', unless you use the `-l' lex compatibility option, in which case yytext will be an array. The advantage of using `%pointer' is substantially faster scanning and no buffer overflow when matching very large tokens (unless you run out of dynamic memory). The disadvantage is that you are restricted in how your actions can modify yytext, and calls to the `unput()' function destroy the present contents of yytext, which can be a considerable porting headache when moving between different lex versions.
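For example, to select the array representation explicitly, the directive is placed in the first (definitions) section; a minimal sketch:
%array
ID [a-z][a-z0-9]*
%%
{ID} { /* with %array, actions may modify yytext freely */ }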
Alex (a Lex-like tool) is a scanner generator which translates the lexical description of a
programming language into a scanner for that language. The scanner description language
is easy to use, as it is intentionally small and simple. Alex as well as the generated scanners
are written in Modula-2 and implemented on several microcomputers. The scanner
generator may be used in conjunction with a compiler-compiler.
Finally, we will explain another tool called ANTLR (ANother Tool for Language Recognition). ANTLR is a parser generator, a tool that helps you create parsers. A parser takes a piece of text and transforms it into an organized structure, such as an Abstract Syntax Tree (AST). You can think of the AST as a story describing the content of the code, or as its logical representation, created by putting together the various pieces.
3.2. ROLES AND APPLICATIONS OF SCANNERS
The lexical analyzer also interacts with the symbol table in two ways, as the sketch after this list shows:
1. If it thinks a lexeme constitutes an identifier, it stores that lexeme in the symbol table.
2. It consults the symbol table to get the type of identifier for a particular lexeme, so that it can generate a more relevant token for the parser.
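A hedged sketch of this interaction as a flex action; install_id() and token_for() are hypothetical helpers supplied by the rest of the compiler, not part of flex itself:
{ID} {
    /* 1. store the lexeme in the symbol table (install_id() is hypothetical) */
    struct sym_entry *e = install_id(yytext);
    /* 2. use the stored information to pick the token for the parser */
    return token_for(e); /* token_for() is also hypothetical */
}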
Lexical analyzers also have a role in removing whitespace (newlines, blanks, and tabs), comments, etc. They also associate error messages with the corresponding lines (based on newline characters or other delimiters) in the source program.
A lexical analyzer can be thought of as a combination of:
1. Scanning - no tokenization, only scanning - removing comments, etc.
2. Lexical analysis - the scanner produces a sequence of tokens as output
Since the lexical analyzer is the part of the compiler that reads the source text, it may perform
certain other tasks besides identification of lexemes. One such task is stripping out
comments and whitespace (blank, newline, tab, and perhaps other characters that are used
to separate tokens in the input). Another task is correlating error messages generated by the
compiler with the source program. For instance, the lexical analyzer may keep track of the
number of newline characters seen, so it can associate a line number with each error
message. In some compilers, the lexical analyzer makes a copy of the source program with
the error messages inserted at the appropriate positions. If the source program uses a
macro-preprocessor, the expansion of macros may also be performed by the lexical analyzer.
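One way to realize the whitespace-stripping and line-tracking tasks in a flex specification (a sketch: line_num is an assumed global, "//" comments are an assumption about the source language, and flex also offers a built-in yylineno):
%{
int line_num = 1; /* assumed global used when reporting errors */
%}
%%
\n { line_num++; } /* count newlines to correlate errors with lines */
[ \t]+ { /* strip blanks and tabs */ }
"//".* { /* strip a line comment */ }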
3.3. RELATIONSHIP OF SCANNERS AND COMPILERS
A lexical analyzer (or scanner) is a program that recognizes tokens (also called symbols) in an input source file (or source code). Each token is a meaningful character string, such as a number, an operator, or an identifier.
The analysis of a source program during compilation is often complex. The construction of a
compiler can often be made easier if the analysis of the source program is separated into two
parts, with one part identifying the low-level language constructs (tokens) such as variable
names, keywords, labels, and operators, and the second part determining the syntactic
organization of the program.
Two aspects of scanners concern us. First, we describe what the tokens of the language are. The class of regular grammars is one vehicle which can be used to describe tokens. Another description approach, which is briefly introduced, involves the use of regular expressions. Both description methods are equivalent in the sense that both describe the set of regular languages. The second aspect of scanners deals with the recognition of tokens. Finite-state acceptors are devices that are well suited to this recognition task, primarily because they can be specified pictorially by using transition diagrams.
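As an illustrative sketch of a finite-state acceptor (hand-written C, not the code a generator would emit), the following function recognizes the token class described by [a-z][a-z0-9]*; its two states correspond directly to the nodes of a transition diagram:
#include <ctype.h>

/* Returns 1 if s is a complete identifier: one lowercase letter
   followed by zero or more lowercase letters or digits. */
int is_identifier(const char *s) {
    if (!islower((unsigned char)*s)) /* start state: expect a letter */
        return 0;
    for (s++; *s; s++) /* accepting state: letters or digits */
        if (!islower((unsigned char)*s) && !isdigit((unsigned char)*s))
            return 0;
    return 1; /* input ended while in the accepting state */
}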
The scanner represents an interface between the source program and the syntactic analyzer
or parser. The scanner, through a character-by-character examination of the input text,
separates the source program into pieces called tokens which represent the variable names,
operators, labels, and so on that comprise the source program.
Separating scanning from syntactic analysis can also have other advantages. Scanning characters is typically slow in compilers, and by separating it from the parsing component of compilation, particular emphasis can be given to making the process efficient. Furthermore, more
information can be made available to the parser when it is needed. For example, it is easier
to parse tokens such as keywords, identifiers, and operators, rather than tokens which are
the terminal character set (i.e., A, B, C, etc.). If the first token for a DO WHILE statement is DO
rather than just ‘D’, the compiler can determine that a repetition loop is being parsed rather
than other possibilities such as an assignment statement.
The scanner usually interacts with the parser in one of two ways. The scanner may process
the source program in a separate pass before parsing begins. Thus the tokens are stored in a
file or large table. The second way involves an interaction between the parser and the
scanner. The scanner is called by the parser whenever the next token in the source program
is required. The latter approach is the preferred method of operation, since an internal form
of the complete source program does not need to be constructed and stored in memory
before parsing can begin. Another advantage of this method is that multiple scanners can be
written for the same language. These scanners vary depending on the input interfaces used
in the language. Throughout this topic, the scanner will be assumed to be implemented in
this manner. In most cases, however, it makes little difference how the scanner is linked to
the parser.
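A minimal sketch of this preferred, parser-driven interaction; yylex() and yytext come from the generated lex.yy.c, while the stand-in loop and the convention that 0 marks end of input are illustrative assumptions:
#include <stdio.h>

extern int yylex(void); /* the scanning routine generated by flex */
extern char *yytext; /* text of the most recently matched token */

int main(void) {
    int token;
    /* a real parser would sit here, requesting one token at a time */
    while ((token = yylex()) != 0) { /* 0 conventionally signals end of input */
        printf("token %d: %s\n", token, yytext);
    }
    return 0;
}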
3.4. SUMMARY
In this unit, we have seen some of the tools used to generate scanners and parsers. The scanner performs lexical analysis of a certain program (in our case, the Simple program). It reads the source program as a sequence of characters and recognizes "larger" textual units called tokens.
Lexical analyzers also have a role in removing whitespace (newlines, blanks, and tabs), comments, etc. They also associate error messages with the corresponding lines (based on newline characters or other delimiters) in the source program.
A lexical analyzer can be thought of as a combination of:
1. Scanning - no tokenization, only scanning - removing comments, etc.
2. Lexical analysis - the scanner produces a sequence of tokens as output
3.5. MODEL EXAMINATION QUESTIONS
I: True/False questions
1. The rules section of the flex input contains a series of rules of the form: pattern
action.
2. Alex (a Lex-like tool) is a scanner generator which translates the lexical
description of a programming language into a scanner for that language.
3. The scanner represents an interface between the source program and the
syntactic analyzer or parser.