
Principles of Programming Languages (CS F301)
Prof. R. Gururaj
BITS Pilani, CS&IS Dept., Hyderabad Campus

Lexical and Syntax Analysis
(Ch. 4 of T1)
Introduction

This chapter introduces:

❑ Lexical analysis
❑ Parsing

There are three approaches to implementing a programming language:
1. Compilation
2. Interpretation
3. Hybrid approach



All three implementation approaches use both lexical and syntax analyzers.
Syntax analyzers are based on a formal description of the syntax of programs.
The most commonly used descriptions are context-free grammars (CFGs), written in BNF, because:
1. BNF descriptions of programming languages are clear and concise.
2. BNF descriptions can be used as the direct basis for syntax analyzers.
3. Implementations based on BNF are relatively easy to maintain because of their modularity.



Analyzing syntax has two parts:
lexical analysis and
syntax analysis.
The lexical analyzer deals with small-scale language constructs such as names and numeric literals.
The syntax analyzer deals with large-scale constructs such as expressions, statements, and program units.



Reasons for separating lexical analysis from syntax analysis:
1. Simplicity - lexical analysis techniques are less complex than parsing techniques, so it is simpler to keep the two separate.
2. Efficiency - the lexical analyzer can be optimized separately from the parser.
3. Portability - the lexical analyzer reads input files and manages buffers, so it is partly platform dependent; keeping it separate leaves the parser portable.
Lexical Analysis

Lexical analysis is essentially pattern matching.

The process of pattern matching has been a traditional part of computing.
A lexical analyzer serves as a front end for the syntax analyzer.
Technically, the lexical analyzer is part of the syntax analyzer.
The lexical analyzer collects characters into lexemes.
Each lexeme belongs to a category called a token.



The lexical analyzer (LA) extracts lexemes from the input string and produces the corresponding tokens.
The syntax analyzer (SA) calls the lexical analyzer; each call returns a lexeme and its token.
The lexical analysis process also skips comments and white space outside lexemes.
Important: the lexical analyzer detects syntactic errors within tokens, such as malformed identifiers and numeric literals.



Approaches to building a lexical analyzer:
1. Write a formal description of the token patterns of the language using a descriptive language based on regular expressions. A tool such as Lex can then generate the lexical analyzer automatically.
2. Design a state transition diagram that describes the token patterns of the language and write a program that implements the diagram (a finite automaton).
3. Design a state transition diagram that describes the token patterns of the language and hand-construct a table-driven implementation of the diagram.



State transition diagram for arithmetic expressions, with variables and integer literals as operands.



Subprograms used:
getChar() - gets the next character of input and puts it in the global variable nextChar.
The lexeme under construction is kept in a character array named lexeme.
addChar() - appends nextChar to the array lexeme.
lookup() - computes the token code for single-character lexemes.
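
A minimal sketch of how these subprograms can fit together, following the conventions above; the token codes, the file handle in, and the exact declarations are illustrative assumptions, not the textbook's code:

#include <ctype.h>
#include <stdio.h>

/* Illustrative token codes (assumed values, not the textbook's) */
enum { IDENT = 10, INT_LIT = 11, PLUS_OP = 21, LPAREN = 25, RPAREN = 26,
       UNKNOWN = 99, END_OF_INPUT = -1 };

static FILE *in;              /* input source                          */
static int   nextChar;        /* most recently read character          */
static char  lexeme[100];     /* lexeme collected so far               */
static int   lexLen;

static void getChar(void) { nextChar = fgetc(in); }
static void addChar(void) { lexeme[lexLen++] = (char)nextChar; lexeme[lexLen] = '\0'; }

static int lookup(int ch) {   /* token code for single-character lexemes */
    switch (ch) {
    case '+': return PLUS_OP;
    case '(': return LPAREN;
    case ')': return RPAREN;
    default:  return UNKNOWN;
    }
}

int lex(void) {               /* one call = one lexeme and its token    */
    lexLen = 0;
    while (isspace(nextChar)) getChar();            /* skip white space  */
    if (nextChar == EOF) return END_OF_INPUT;
    if (isalpha(nextChar)) {                        /* names             */
        do { addChar(); getChar(); } while (isalnum(nextChar));
        return IDENT;
    }
    if (isdigit(nextChar)) {                        /* integer literals  */
        do { addChar(); getChar(); } while (isdigit(nextChar));
        return INT_LIT;
    }
    addChar();                                      /* one-char lexeme   */
    {
        int code = lookup(nextChar);
        getChar();
        return code;
    }
}

A driver would open in, call getChar() once to prime nextChar, and then call lex() repeatedly until END_OF_INPUT.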



Names (identifiers) and reserved words have the same pattern.
We could write separate state transitions to recognize each reserved word, but the state diagram would become prohibitively large.
It is much simpler and faster to have the lexical analyzer recognize names and reserved words with the same pattern, and then look the lexeme up in a table of reserved words to determine whether it is a reserved word.
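
A possible way to implement that lookup, sketched here with an illustrative (and deliberately incomplete) table of reserved words:

#include <string.h>

/* Illustrative reserved-word table; a real compiler lists every keyword. */
enum { IDENT = 10, IF_CODE = 30, ELSE_CODE = 31, WHILE_CODE = 32 };

static const struct { const char *word; int code; } reserved[] = {
    { "if", IF_CODE }, { "else", ELSE_CODE }, { "while", WHILE_CODE }
};

/* Called after the name pattern has filled lexeme. */
int nameOrReservedWord(const char *lexeme) {
    for (size_t i = 0; i < sizeof reserved / sizeof reserved[0]; i++)
        if (strcmp(lexeme, reserved[i].word) == 0)
            return reserved[i].code;   /* it is a reserved word       */
    return IDENT;                      /* ordinary user-defined name  */
}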
The lexical analyzer is often responsible for the initial construction of the symbol table, which stores information about user-defined names.
The attributes of those names are filled in later by other parts of the compiler.



Parsing Problem

Syntax analysis is also called parsing.

There are two approaches: top-down and bottom-up parsing.
Parsers construct a parse tree.
The derivation and the parse tree it produces capture all the syntactic information needed by the language processor.
Goals of syntax analysis:
1. Check the input program for syntactic correctness.
2. Produce a complete parse tree for a syntactically correct program.
This parse tree is used as the basis for translation.



Notation used:
Terminals: a, b, c, …
Nonterminals: A, B, C, …
Strings of terminals: x, y, z, …
Mixed strings of terminals and nonterminals: α, β, δ, γ, …
Terminals or nonterminals: W, X, Y, Z (uppercase letters at the end of the alphabet)



Top-down Parsers

A top-down parser builds a parse tree in preorder.

A preorder traversal of the parse tree begins with the root.
Each node is visited before its branches.
Branches from a particular node are followed in left-to-right order.
This corresponds to a leftmost derivation (LMD).



Given a sentential form that is part of the LMD, the parser's task is to find the next sentential form in that LMD.
The general form of a left sentential form is

xAα, where x ∈ Σ*, A is a nonterminal, and α ∈ (Σ ∪ NT)*.

Here A must be expanded using the correct rule.

Determining the next sentential form is a matter of choosing the correct rule that has A as its LHS.
This is the decision problem for a top-down parser.



Different top-down parsers use different information to make this parsing decision.
Most parsers choose the correct rule by comparing the next input symbol with the first symbol of each candidate RHS.
Sometimes two RHSs with the same LHS nonterminal start with the same symbol; then the choice is difficult.
A recursive-descent parser is a coded version of a syntax analyzer based directly on the BNF description of the language's syntax.
An alternative approach is to use a parsing table to implement the BNF rules.
Both kinds are called LL parsers.
Bottom-up Parsing

A bottom-up parser constructs a parse tree by beginning at the leaves and proceeding towards the root.
This corresponds to the reverse of a rightmost derivation (RMD).

That is, the sentential forms are produced in order from last to first.



The bottom-up parsing process:
Given a right sentential form α, the bottom-up parser must determine which substring of α is the RHS of a rule in the grammar that must be reduced to its LHS to produce the previous sentential form.
For example, the first step for a bottom-up parser is to determine which substring of the given sentence is an RHS to be replaced with the corresponding LHS, yielding the second-to-last sentential form of the derivation.



The process of finding the correct RHS to reduce is complicated by the fact that a given right sentential form may include more than one RHS from the grammar.
The correct RHS is called the handle.
Example:

S → aAc
A → aA | b

S => aAc => aaAc => aabc



The bottom-up parser finds the handle of a given right sentential form by examining the symbols on both sides of the possible handles.
The most common bottom-up parsing algorithms belong to the LR family:
L - the input is scanned left to right
R - the parser produces the reverse of a rightmost derivation (RMD)
The complexity of general parsing algorithms is O(n³), but the less general parsers used in compilers run in O(n).



Example
Consider the following grammar:

G = { S → bAB | aBA;  A → aS | b;  B → bS | a }

Show how a bottom-up parser accepts the following valid string, using a stack and shift and reduce actions:

abbbab
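
One possible shift/reduce sequence for abbbab, worked out here for reference (it follows the reverse of a rightmost derivation, reducing the handle at each step):

Stack     Input      Action
$         abbbab$    shift
$a        bbbab$     shift
$ab       bbab$      shift
$abb      bab$       shift
$abbb     ab$        reduce by A → b
$abbA     ab$        shift
$abbAa    b$         reduce by B → a
$abbAB    b$         reduce by S → bAB
$abS      b$         reduce by B → bS
$aB       b$         shift
$aBb      $          reduce by A → b
$aBA      $          reduce by S → aBA
$S        $          accept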



Recursive descent parsing
(Top-down)
As the name suggests, a recursive-descent parser consists of a collection of subprograms, many of which are recursive.
It produces the parse tree in top-down order.
The recursion reflects the nature of programming languages, which contain several different kinds of nested structures (for example, statements may be nested).
Such syntax is naturally described by recursive grammar rules.
EBNF is ideally suited for recursive-descent parsing.



A recursive-descent parser has a subprogram for each nonterminal in its grammar.
A recursive-descent subprogram for a rule with a single RHS is relatively simple:
for each terminal symbol in the RHS, the terminal is compared with the next token; if they do not match, it is a syntax error, and if they match, the lexical analyzer is called to get the next input token.
For each nonterminal in the RHS, the parsing subprogram for that nonterminal is called.
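
A sketch of such a subprogram in C, assuming a hypothetical rule S → a A b and the lex()/nextToken conventions used earlier (the token names TOK_a and TOK_b are illustrative):

enum { TOK_a = 1, TOK_b = 2 };   /* illustrative token codes           */
extern int nextToken;            /* current input token from lex()     */
int  lex(void);
void A(void);                    /* subprogram for the nonterminal A   */
void error(void);

/* Parses one occurrence of the rule  S -> a A b  */
void S(void) {
    if (nextToken == TOK_a) {    /* terminal 'a' must come first       */
        nextToken = lex();       /* matched: get the next token        */
        A();                     /* nonterminal: call its subprogram   */
        if (nextToken == TOK_b)  /* terminal 'b' must come last        */
            nextToken = lex();
        else
            error();             /* terminal did not match             */
    } else {
        error();
    }
}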



Recursive descent parsing
Grammar:

We will have one procedure for each nonterminal (S, A, and B) and one main procedure, parse(), which starts the parsing process by calling S(), the procedure associated with the start nonterminal.



Exercise:

Give recursive-descent parser code for the following grammar:

S → aAb
A → bA | aB
B → c

and trace the subprogram calls made while accepting acacb.



LL Grammar Class
Let us understand the limitations of the recursive-descent approach.
1. Left recursion.
Direct left recursion:
A → A + B
How to eliminate it is shown on the next slides.
Indirect left recursion:
A → BaA
B → Ab
There are algorithms to remove indirect left recursion as well, but we do not discuss them here.



Left recursion
E → E + T
creates problems for top-down parsing. This is known as left recursion.

Whenever we have productions of the form

A → Aα1, A → Aα2, …, A → Aαn, and
A → β1, A → β2, …, A → βm

replace them with

A → β1A', A → β2A', …, A → βmA', and
A' → α1A', A' → α2A', …, A' → αnA', and
A' → ε



E → E + T
E → T

can be replaced by

E → TE'
E' → +TE'
E' → ε
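
The rewritten rules map directly onto recursive-descent code; a minimal sketch, where lex(), nextToken, and PLUS_CODE follow the earlier conventions and are assumed names:

extern int nextToken;          /* current token from the lexer          */
int  lex(void);
void T(void);                  /* subprogram for the nonterminal T      */
enum { PLUS_CODE = 21 };       /* illustrative token code for '+'       */

void Eprime(void);

void E(void) {                 /* E  -> T E'                            */
    T();
    Eprime();
}

void Eprime(void) {            /* E' -> + T E'  |  epsilon              */
    if (nextToken == PLUS_CODE) {
        nextToken = lex();     /* consume '+'                           */
        T();
        Eprime();
    }
    /* otherwise take the epsilon rule: consume nothing                 */
}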



Exercise:
Eliminate left recursion from the following grammar.
S → SabA | Sc
S → a
S → b
A → cA | c

This grammar can generate, for example:

S ⇒ SabA ⇒ ScabA ⇒ bcabA ⇒ bcabc



Eliminating the left recursion gives (one possible result): S → aS' | bS', S' → abAS' | cS' | ε, with A unchanged.
Now, with the modified grammar, can we still get "bcabc"?

S ⇒ bS' ⇒ bcS' ⇒ bcabAS' ⇒ bcabA ⇒ bcabc

Hence both grammars generate the same language.



Can a recursive-descent parser always choose the correct RHS on the basis of the next input token?
In the first place, the grammar must be non-left-recursive.
It must also pass the pairwise disjointness test:
compute FIRST(α) for each RHS α and check that, for each nonterminal, the FIRST sets of its alternative RHSs are disjoint.
A → aB    FIRST(aB) = {a}
A → b     FIRST(b) = {b}
B → cd | e
Here there is no problem.



A → aB
A → b
B → cd | e
FIRST(aB) = {a}
FIRST(b) = {b}
Hence FIRST(A) = {a, b}.
The two productions for A do not start with the same terminal (their FIRST sets are disjoint), so there is no problem.



A → abb | acb

FIRST(abb) = {a} overlaps with FIRST(acb) = {a}.

Hence there is a problem: it is difficult to decide which RHS to use.
We can apply left factoring:
A → aM
M → bb | cb
Left factoring may not always work; in that case we have to revise the grammar rules.



Bottom-up parsing
Consider the following grammar.

It is left recursive.

An LL parser cannot handle it, but a bottom-up parser has no issues with left recursion.


The bottom-up parsing process produces the reverse of a rightmost derivation (RMD).



Bottom-up parsing
So the bottom-up parser starts with the last sentential form (the input sentence) and produces the sequence of sentential forms from there, until all that remains is the start nonterminal.
In each step, the task of the parser is to find the specific RHS (the handle) in the current sentential form that must be rewritten (replaced with its LHS) to get the previous sentential form.
Sometimes a right sentential form includes more than one RHS.
Example: E + T * id, where the grammar has E → E + T and F → id.
If we chose E + T as the handle, it would give E * id, which is not a legal right sentential form.
The handle of a right sentential form is unique.
The task of a bottom-up parser is to find the handle of any given right sentential form.
Formally: β is the handle of the right sentential form γ = αβw if and only if S ⇒*rm αAw ⇒rm αβw, where A → β is a rule and w consists only of terminals.



Shift-reduce Algorithm

Bottom-up parsers are often called shift-reduce algorithms, after their two basic actions: shift (push the next input symbol onto the parse stack) and reduce (replace the handle on top of the stack with its LHS).
A shift-reduce parser is in effect a pushdown automaton (PDA).
The input is examined from left to right, one symbol at a time.
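
A skeleton of the table-driven loop that a shift-reduce (LR) parser runs; the ACTION/GOTO tables, sizes, and per-rule data below are placeholders that a generator such as yacc would fill in, not a real parser:

/* Placeholder sizes and tables: a parser generator produces the real ones. */
#define NSTATES   16
#define NTERMS     8
#define NNONTERMS  4

typedef enum { SHIFT, REDUCE, ACCEPT, ERROR } Kind;
typedef struct { Kind kind; int arg; } Action;  /* arg = state or rule number */

extern Action ACTION[NSTATES][NTERMS];
extern int    GOTO[NSTATES][NNONTERMS];
extern int    rhsLen[];      /* length of each rule's RHS                 */
extern int    lhsNT[];       /* LHS nonterminal of each rule              */
int  lex(void);
void error(void);

void parse(void) {
    int stack[200], top = 0;
    stack[top] = 0;                                /* initial state        */
    int token = lex();
    for (;;) {
        Action a = ACTION[stack[top]][token];
        if (a.kind == SHIFT) {                     /* push state, read on  */
            stack[++top] = a.arg;
            token = lex();
        } else if (a.kind == REDUCE) {             /* pop |RHS| states,    */
            top -= rhsLen[a.arg];                  /* then take the GOTO   */
            stack[top + 1] = GOTO[stack[top]][lhsNT[a.arg]];
            top++;
        } else if (a.kind == ACCEPT) {
            return;                                /* input accepted       */
        } else {
            error();                               /* syntax error         */
            return;
        }
    }
}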



LR parsers

Many bottom-up parsing algorithms are LR algorithms.

LR parsers are relatively small programs; the parsing table is built for a specific language.
The original LR parser (the canonical LR parser, 1965) required large computational resources to build the table.
Many variants appeared around 1975 that require fewer computational resources but handle a smaller class of grammars than canonical LR.
An LR parsing table can be constructed for a grammar using a tool such as yacc.



LR parsers can handle a large class of context-free grammars.
The LR parsing method is the most general non-backtracking shift-reduce parsing method.
An LR parser can detect a syntax error as soon as it is possible to do so on a left-to-right scan.
LR grammars can describe more languages than LL grammars.



There exist programs (parser generators) that generate the parsing tables when given a grammar as input.



Grammar



Parsing table



LR Parsing process



Depicting stack operation of LR parser



Summary (Ch. 4)

❑ Intro to lexical analysis & syntax analysis
❑ Finite automata
❑ Parsing problem (top-down & bottom-up)
❑ Top-down parsers
❑ Recursive descent parser
❑ Left recursion
❑ Left factoring
❑ Bottom-up parsing (LR), handle
❑ LR parsing table
❑ LR parsing process

