Principles of Programming Language (Anna University)

CP 4154 PRINCIPLES OF PROGRAMMING LANGUAGES

UNIT 1 SYNTAX AND SEMANTICS

Evolution of programming languages – describing syntax – context-free grammars – attribute grammars – describing semantics – lexical analysis – parsing – recursive-descent – bottom-up parsing

EVOLUTION OF PROGRAMMING LANGUAGES

The evolution of programming languages can be examined through various principles that have guided their development over time.

1. Machine Code and Assembly Language (1st and 2nd Generation):

Principle: Direct hardware interaction.

Explanation: Initially, programmers worked directly with machine code, representing instructions in binary. Assembly languages were introduced, providing mnemonics for machine-code instructions and making programming more human-readable.

Advantages: Low-level control over hardware, essential for early computer systems.

2. Abstraction and Procedural Programming (3rd Generation):

Principle: Abstraction for complexity management.

Explanation: Third-generation languages like Fortran and COBOL introduced higher-level abstractions, allowing programmers to work with variables, loops, and functions. This increased readability and ease of programming.

Advantages: Improved readability, portability, and maintenance. Enabled handling complex tasks with higher-level constructs.

3. Structured Programming (late 3rd and 4th Generation):

Principle: Emphasis on modular, structured code.

Explanation: Languages like C and Pascal promoted structured programming principles, encouraging the use of functions and control structures for clearer, more maintainable code. This led to better program organization and debugging.

Advantages: Enhanced code organization, readability, and debugging. Facilitated large-scale software development.

4. Object-Oriented Programming (OOP) (4th Generation):

Principle: Modelling software as interacting objects.

Explanation: OOP languages like Smalltalk and, later, Java and C++ introduced the concept of classes and objects, promoting encapsulation, inheritance, and polymorphism. This facilitated better code organization, reuse, and maintenance.

Advantages: Encapsulation, inheritance, and polymorphism improved code reuse, modularity, and scalability. Enhanced software design practices.

5. Declarative Programming (4th and 5th Generation):

Principle: Focus on “what” rather than “how.”

Explanation: Declarative languages like SQL and Prolog shifted the focus to expressing the desired outcome, leaving the implementation details to the language runtime. This abstraction simplified coding for specific tasks like database queries and AI.

Advantages: Simplified coding for specific domains like databases and AI. Increased abstraction and reduced complexity in certain applications.

6. Domain-Specific Languages (DSLs) (4th and 5th Generation):

Principle: Tailored languages for specific applications.

Explanation: DSLs, such as MATLAB for numerical computing or HTML for web development, emerged to address particular domains. These languages are designed with features and syntax optimized for specific tasks, improving efficiency.

Advantages: Improved efficiency and expressiveness in specialized domains. Streamlined development in fields like finance, scientific computing, and web development.

7. Functional Programming (late 20th Century – Present):

Principle: Emphasis on functions as first-class citizens.

Explanation: Functional languages like Haskell and Lisp focus on treating functions as values, supporting immutability and avoiding side effects. This paradigm facilitates concise, expressive code and helps in handling complex computations.

Advantages: Concise, expressive code. Facilitated parallel programming and eased reasoning about complex computations.

8. Concurrency and Parallelism (Present):

Principle: Efficient handling of concurrent and parallel tasks.

Explanation: With the rise of multicore processors and distributed systems, languages like Erlang, Go, and Rust incorporate features to simplify concurrent programming and enhance performance in parallel environments.

Advantages: Simplified handling of multiple tasks concurrently. Improved performance in the era of multicore processors and distributed systems.

9. Simplicity and Expressiveness (Present):

Principle: Prioritizing simplicity without sacrificing power.

Explanation: Modern languages like Python, Ruby, and Swift focus on readability and simplicity, aiming to make code more understandable and maintainable. They often incorporate features that enhance expressiveness and reduce boilerplate code.

Advantages: Reduced boilerplate code, enhanced developer productivity, and improved code maintainability. Attracted a broader audience to programming.

● The evolution of programming languages reflects an ongoing pursuit of improving productivity, managing complexity, and adapting to the changing landscape of computing.
● Each generation builds upon the principles of its predecessors while introducing new concepts to address emerging challenges and opportunities in software development.

SYNTAX

Syntax and semantics are important terms in any computer programming language.

In a programming language, syntax refers to the collection of a language’s allowable words; in contrast, semantics expresses the associated meaning of those words.

What is Syntax?

● The syntax of a computer programming language is utilized to represent the structure of programs without viewing their meaning.
● It mainly focuses on the structure and arrangement of a program with the help of its appearance.
● It consists of a set of rules and regulations that validates the sequence of symbols and statements that are utilized in a program.
● Both human languages and programming languages rely on syntax, and the pragmatic and computational model represents these syntactic elements of a computer programming language.

Use of grammar in Syntax:

● Usually, grammar is a set of rewriting rules whose aim is to recognize and generate programs.
● Grammar doesn’t rely on the computation model but rather on the description of the language’s structure.
● The grammar includes a finite number of grammatical categories (including noun phrases, articles, nouns, verb phrases, etc.), single words (alphabet elements), and well-formed rules that govern the order in which the grammatical categories may appear.

Techniques of Syntax:

There are several formal and informal techniques that may help to understand the syntax of a computer programming language.

1. Lexical Syntax

It is utilized to define the rules for basic symbols, including identifiers, punctuators, literals, and operators.

2. Concrete Syntax

It describes the real representation of programs utilizing lexical symbols, such as their alphabet.

3. Abstract Syntax

It communicates only the essential program information.

Types of Grammars:

There are several types of grammar utilized in programming syntax.

1. Context-free Grammar

It is commonly utilized to determine the overall structure of a language.

2. Regular Expressions

They explain a programming language’s lexical units (tokens).

3. Attribute Grammars

They define the language’s context-sensitive parts.

What is Syntax Highlighting?

The syntax elements of a programming language (variable names, keywords, operators, etc.) are coloured in a text editor or IDE that supports syntax highlighting, making the code easier to understand.

How do you prevent a syntax error?

● Syntax errors occur when a command is typed incorrectly on a command line or when a bug is discovered in a program or script.
● To avoid syntax errors, the command or code should be written exactly as the language’s grammar requires.

Example:

Example of a “for loop”:

    numbers = [1, 2, 3, 4, 5]
    for num in numbers:
        squared = num ** 2
        print(f"The square of {num} is: {squared}")

● numbers = [1, 2, 3, 4, 5]: This line creates a list called numbers containing five integers.
● for num in numbers: This is the beginning of a “for loop,” iterating through each element in the numbers list.
● squared = num ** 2: This line calculates the square of the current number in the loop.
● print(f"The square of {num} is: {squared}"): It prints a formatted string displaying the original number and its square.
● This example demonstrates the syntax of a for loop in Python, iterating through a list of numbers and calculating their squares.

CONTEXT FREE GRAMMAR

Grammar:

● It is a finite set of formal rules for generating syntactically correct, meaningful sentences.

Context free grammar:

● Context-free grammar (CFG) is a type of formal grammar that can be used to describe the syntax or structure of a formal language.
● It is defined as a four-tuple G = (V, T, P, S)

Where,

● G is a grammar, which consists of a set of production rules. It is used to generate the strings of a language.
● T is the finite set of terminal symbols. It is denoted by lower-case letters.
● V is the finite set of non-terminal symbols. It is denoted by capital letters.
● P is a set of production rules, which is used for replacing non-terminal symbols (on the left side of a production) in a string with other symbols (on the right side of the production).
● S is the start symbol used to derive the string.

A grammar is said to be context-free if every production is of the form:

A → (V ∪ T)*, where A ∊ V

The left-hand side of a production (A in the example) can only be a variable; it cannot be a terminal. The right-hand side can be a variable, a terminal, or any combination of variables and terminals.

For example, the grammar G = ({S}, {a, b, c}, P, S) has the following productions.

Here S is the starting symbol.

{a, b, c} are the terminals, generally represented by small characters.

P is the set of production rules.

Production rules:

● S → aSa
● S → bSb
● S→c

Now check that the string abbcbba can be derived from the given CFG.

S ⇒ aSa

⇒ abSba

⇒ abbSbba

⇒ abbcbba

By applying the productions S → aSa and S → bSb recursively, and finally applying the production S → c, we get the string abbcbba.
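Membership in the language generated by these three productions can also be checked mechanically. The sketch below is illustrative only (the function name and recursive approach are assumptions, not part of the source):

```python
# Productions: S -> aSa | bSb | c
# matches(s) returns True iff s can be derived from S, i.e. s has
# the form w c reverse(w) with w drawn from {a, b}.
def matches(s):
    if s == "c":                            # apply S -> c
        return True
    if len(s) >= 3 and s[0] == s[-1] and s[0] in "ab":
        return matches(s[1:-1])             # apply S -> aSa or S -> bSb
    return False

print(matches("abbcbba"))  # → True
```

Each recursive call peels one matching outer pair, mirroring one derivation step of S → aSa or S → bSb.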

Capabilities of CFG:

● Context-free grammar is useful for describing most programming languages.
● If the grammar is properly designed, then an efficient parser can be constructed automatically.
● Using associativity and precedence information, suitable grammars for expressions can be constructed.
● Context-free grammar is capable of describing nested structures like balanced parentheses, matching begin-end pairs, corresponding if-then-else’s, and so on.

Limitations of Context-Free Grammar (CFG):

● CFGs are limited in expressiveness: neither English nor a complete programming language can be expressed using context-free grammar alone.
● Context-free grammar can be ambiguous, meaning we can generate multiple parse trees for the same input.
● For some grammars, parsing can be less efficient because of exponential time complexity.
● Error reporting is less precise, as a CFG-based system cannot by itself give detailed error messages and information.

Applications of Context-Free Grammar (CFG):

● Context-free grammars are used in compilers (like GCC) for parsing. In this step, the compiler takes a program (a set of strings).
● Context-free grammars are used to define the high-level structure of a programming language.
● Every context-free grammar can be converted into a parser, which is the component of a compiler that identifies the structure of a program and converts the program into a tree.
● A Document Type Definition in XML is a context-free grammar that describes the HTML tags and the rules for using the tags in a nested fashion.

ATTRIBUTE GRAMMAR

Attributes:

● An attribute is a characteristic that determines the value of a grammatical symbol.
● Semantic functions, also referred to as attribute computation functions, are functions connected to grammar productions that compute the values of attributes.
● Predicate functions are functions that state a specific grammar’s static semantic rules as well as portions of its syntax.

Basis of Attributes:

Some of the bases of attributes are the following:

Type: These link data objects to the range of valid values.

Location: This represents the operating system’s memory.

Value: These are what an assignment operation produces.

Name: The name of these variables can be altered whenever a sub-program is called.

Components: Data objects built from other data items are called components. A pointer is used to represent this binding, which is then modified.

Attribute Grammar:

● An attribute grammar is a special form of context-free grammar where some additional information (attributes) is appended to one or more of its non-terminals in order to provide context-sensitive information.
● Each attribute has a well-defined domain of values, such as integer, float, character, string, and expressions.
● An attribute grammar is a medium to provide semantics to the context-free grammar, and it can help specify the syntax and semantics of a programming language.
● An attribute grammar (when viewed as a parse tree) can pass values or information among the nodes of a tree.

Example:

E → E + T { E.value = E.value + T.value }

● The right part of the rule contains the semantic rule that specifies how the grammar should be interpreted. Here, the values of the non-terminals E and T are added together and the result is copied to the non-terminal E.
● Semantic attributes may be assigned values from their domain at the time of parsing and evaluated at the time of assignment or conditions. Based on the way the attributes get their values, they can be broadly divided into two categories: synthesized attributes and inherited attributes.

Synthesized attributes:

● These attributes get values from the attribute values of their child nodes. To illustrate, assume the following production:

S → ABC

● If S is taking values from its child nodes (A, B, C), then it is said to be a synthesized attribute, as the values of A, B, and C are synthesized into S.
● As in our previous example (E → E + T), the parent node E gets its value from its child nodes. Synthesized attributes never take values from their parent nodes or any sibling nodes.
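A synthesized attribute like E.value can be computed by a bottom-up (post-order) walk over the parse tree. The sketch below is a minimal illustration (the Node class and the node kinds are assumptions, not from the source):

```python
# Each node carries a synthesized attribute `value`, computed only
# from the values of its children, never from parent or siblings.
class Node:
    def __init__(self, kind, children=(), value=None):
        self.kind = kind
        self.children = list(children)
        self.value = value

def evaluate(node):
    # Post-order: evaluate the children first, then apply the rule.
    for child in node.children:
        evaluate(child)
    if node.kind == "plus":   # E -> E + T { E.value = E.value + T.value }
        node.value = node.children[0].value + node.children[1].value
    return node.value

# Parse tree for the expression 2 + 3
tree = Node("plus", [Node("num", value=2), Node("num", value=3)])
print(evaluate(tree))  # → 5
```

Because the semantic rule reads only child values, the evaluation order is exactly a post-order traversal.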

Inherited attributes:

● In contrast to synthesized attributes, inherited attributes can take values from parent and/or siblings. As in the following production,

S → ABC

● A can get values from S, B, and C; B can take values from S, A, and C; likewise, C can take values from S, A, and B.

Expansion: When a non-terminal is expanded to terminals as per a grammatical rule.

Reduction: When a terminal is reduced to its corresponding non-terminal according to grammar rules. Syntax trees are parsed top-down and left to right. Whenever reduction occurs, we apply the corresponding semantic rules (actions).

S-attributed SDT:

● If an SDT uses only synthesized attributes, it is called an S-attributed SDT.
● These attributes are evaluated using S-attributed SDTs that have their semantic actions written after the production (at the right-hand end).

L-attributed SDT:

● This form of SDT uses both synthesized and inherited attributes, with the restriction of not taking values from right siblings.
● In L-attributed SDTs, a non-terminal can get values from its parent, child, and sibling nodes. As in the following production,

S → ABC

● S can take values from A, B, and C (synthesized). A can take values from S only. B can take values from S and A. C can get values from S, A, and B. No non-terminal can get values from the sibling to its right.
● Attributes in L-attributed SDTs are evaluated in a depth-first, left-to-right parsing manner.

SEMANTICS

What are Semantics?

● Semantics is a linguistic concept that differs from syntax.
● In a computer programming language, the term semantics is utilized to determine the link between the model of computation and the syntax.
● The concept behind semantics is that linguistic representations or symbols enable logical outcomes, because a collection of words and phrases communicates ideas to machines and humans.
● The syntax-directed semantics approach is utilized to map syntactical concepts to the computational model via a function.

Techniques of Semantics:

There are several techniques that may help to understand the semantics of a computer programming language. Some of those techniques are as follows:

1. Algebraic Semantics

It analyzes the program by specifying an algebra.

2. Operational Semantics

After comparing the languages to an abstract machine, it evaluates the program as a series of state transitions.

3. Translational Semantics

It mainly concentrates on the methods that are utilized for translating a program into another computer language.

4. Axiomatic Semantics

It specifies the meaning of a program by developing assertions about the program state that hold at every stage of the program’s execution.

5. Denotational Semantics

It represents the meaning of the program by a set of functions that operate on the program state.

Types of Semantics:

There are several types of semantics. Some of those are as follows:

1. Formal Semantics
● Words and meanings are analyzed philosophically or mathematically in formal semantics.
● It constructs models to help define the truth behind words instead of considering only real-world instances.

2. Lexical Semantics
● It is the most well-known sort of semantics.
● It searches for the meaning of single words by taking into consideration the context and text surrounding them.

3. Conceptual Semantics
● The dictionary definition of the word is analyzed before any context is applied in conceptual semantics.
● After examination of the definition, the context is explored by searching for linking terms, how meaning is assigned, and how meaning may change over time.
● It can be referred to as a sign that the word conveys context.

Example:

a = 5

b = 3

c = a + b

1. Variable Assignment Semantics:

● a = 5: This statement assigns the value 5 to the variable a.
● b = 3: This statement assigns the value 3 to the variable b.
● c = a + b: This statement assigns the result of the addition of a and b (which is 8) to the variable c.

2. Arithmetic Operation Semantics:

● a + b: This represents the addition operation, and in this context, it evaluates to 8.

LEXICAL ANALYSIS

What is Lexical Analysis?

● Lexical analysis is the starting phase of the compiler.
● It gathers modified source code that is written in the form of sentences from the language preprocessor.
● The lexical analyzer is responsible for breaking these syntaxes into a series of tokens, by removing whitespace in the source code.
● If the lexical analyzer gets any invalid token, it generates an error. It reads the stream of characters, seeks the legal tokens, and then passes the data to the syntax analyzer when it is asked for.

Terminologies:

There are three terminologies:

● Token
● Pattern
● Lexeme

Token: It is a sequence of characters that represents a unit of information in the source code.

Examples of tokens:

Type tokens (id, number, real, . . . )

Punctuation tokens (IF, void, return, . . . )

Alphabetic tokens (keywords)

Examples of non-tokens:

Comments, preprocessor directives, macros, blanks, tabs, newlines, etc.

Pattern: The description used by the token is known as a pattern.

Lexeme: A sequence of characters in the source code, as per the matching pattern of a token, is known as a lexeme. It is also called an instance of a token.
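The three terms can be seen side by side in a short sketch (the identifier pattern shown is an assumed, typical one, not taken from the source):

```python
import re

# Pattern: the rule describing a token class (here, identifiers).
pattern = r"[A-Za-z_]\w*"

# Lexemes: the actual character sequences in the source that match it.
source = "count = count + 1"
lexemes = re.findall(pattern, source)

# Tokens: (type, lexeme) pairs handed on to the syntax analyzer.
tokens = [("IDENT", lx) for lx in lexemes]
print(tokens)  # → [('IDENT', 'count'), ('IDENT', 'count')]
```

One pattern thus describes a whole token class, while each lexeme is a concrete occurrence of it in the source.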

The Architecture of the Lexical Analyzer:

● To read the input characters in the source code and produce tokens is the most important task of a lexical analyzer.
● The lexical analyzer goes through the entire source code and identifies each token one by one.
● The scanner is responsible for producing tokens when requested by the parser. The lexical analyzer skips the whitespace and comments while creating these tokens.
● If any error occurs, the analyzer correlates these errors with the source file and line number.

Roles and Responsibilities of the Lexical Analyzer:

The lexical analyzer performs the following tasks:

● The lexical analyzer is responsible for removing the white spaces and comments from the source program.
● It correlates error messages with the source program.
● It helps to identify the tokens.
● The input characters are read by the lexical analyzer from the source code.
How the Lexical Analyzer works:

Input preprocessing:

● This stage involves cleaning up the input text and preparing it for lexical analysis.
● This may include removing comments, whitespace, and other non-essential characters from the input text.

Tokenization:

● This is the process of breaking the input text into a sequence of tokens.
● This is usually done by matching the characters in the input text against a set of patterns or regular expressions that define the different types of tokens.

Token classification:

● In this stage, the lexer determines the type of each token.
● For example, in a programming language, the lexer might classify keywords, identifiers, operators, and punctuation symbols as separate token types.

Token validation:

● In this stage, the lexer checks that each token is valid according to the rules of the programming language.
● For example, it might check that a variable name is a valid identifier, or that an operator has the correct syntax.

Output generation:

● In this final stage, the lexer generates the output of the lexical analysis process, which is typically a list of tokens.
● This list of tokens can then be passed to the next stage of compilation or interpretation.
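The stages above can be sketched end to end for a tiny toy language (the token names, patterns, and comment syntax here are assumptions chosen for illustration):

```python
import re

# Token patterns for a toy language; COMMENT and SKIP are non-tokens.
TOKEN_SPEC = [
    ("COMMENT", r"#[^\n]*"),
    ("NUMBER",  r"\d+"),
    ("KEYWORD", r"\b(?:if|else|return)\b"),
    ("IDENT",   r"[A-Za-z_]\w*"),
    ("OP",      r"[+\-*/=]"),
    ("SKIP",    r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def lex(source):
    tokens = []
    for m in MASTER.finditer(source):
        kind = m.lastgroup
        if kind in ("COMMENT", "SKIP"):      # preprocessing: drop non-tokens
            continue
        tokens.append((kind, m.group()))     # tokenization + classification
    return tokens

print(lex("x = 42  # init"))
# → [('IDENT', 'x'), ('OP', '='), ('NUMBER', '42')]
```

Listing KEYWORD before IDENT makes the alternation classify reserved words first, which is how many table-driven lexers resolve that overlap.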

Advantages of Lexical Analysis:

● Lexical analysis helps browsers format and display a web page with the help of parsed data.
● It is responsible for creating a compiled binary executable code.
● It helps to create a more efficient and specialised processor for the task.

Disadvantages of Lexical Analysis:

● It requires additional runtime overhead to generate the lexer table and construct the tokens.
● It requires much effort to debug and develop the lexer and its token descriptions.
● Significant time is required to read the source code and partition it into tokens.
PARSING

What is a parser?

● The parser is the phase of the compiler that takes a token string as input and, with the help of the existing grammar, converts it into the corresponding Intermediate Representation (IR). The parser is also known as the Syntax Analyzer.
● The process of transforming the data from one format to another is called parsing. This process is accomplished by the parser.

Types of Parsing:

Top-Down Parser:

● The top-down parser is the parser that generates the parse tree for the given input string with the help of grammar productions by expanding the non-terminals.
● It starts from the start symbol and ends on the terminals. It uses leftmost derivation.
● Further, the top-down parser is classified into 2 types: the recursive descent parser and the non-recursive descent parser.

1. Recursive descent parser:

● The recursive descent parser is also known as the brute-force parser or the backtracking parser.
● It basically generates the parse tree by using brute force and backtracking.

2. Non-recursive descent parser:

● The non-recursive descent parser is also known as the LL(1) parser, predictive parser, without-backtracking parser, or dynamic parser.
● It uses a parsing table to generate the parse tree instead of backtracking.

Bottom-up Parser:

● The bottom-up parser is the parser that generates the parse tree for the given input string with the help of grammar productions by compressing the terminals.
● It starts from the terminals and ends on the start symbol. It uses the reverse of the rightmost derivation.
● Further, the bottom-up parser is classified into two types: the LR parser and the operator precedence parser.

1. LR Parser:

● The LR parser is the bottom-up parser that generates the parse tree for the given string by using unambiguous grammar.
● It follows the reverse of the rightmost derivation.
● The LR parser is of four types:

(a) LR(0)
(b) SLR(1)
(c) LALR(1)
(d) CLR(1)

2. Operator precedence parser:

● The operator precedence parser generates the parse tree from the given grammar and string, but the only condition is that two consecutive non-terminals and epsilon never appear on the right-hand side of any production.
● The operator precedence parsing techniques can be applied to operator grammars.

Operator grammar: A grammar is said to be an operator grammar if there does not exist any production rule whose right-hand side:

1. is ε (epsilon), or
2. has two non-terminals appearing consecutively, that is, without any terminal between them.

Operator precedence parsing is not a simple technique to apply to most language constructs, but it evolves into an easy technique to implement where a suitable grammar may be produced.

RECURSIVE DESCENT PARSER

Recursive descent parser:

● It is a kind of top-down parser.
● A top-down parser builds the parse tree from the top down, starting with the start non-terminal.
● A predictive parser is a special case of a recursive descent parser, where no backtracking is required.
● It is a top-down parsing technique.
● The technique implements recursive procedures to parse the input string.
● A procedure is associated with each non-terminal of the grammar.
● Input scanning is from left to right.
● Non-terminal: implement a recursive procedure.
● Terminal: compare the lookahead with the input string.
● It may need backtracking to identify the correct A-production.
● It may require repeated scans over the input.

Steps for a recursive descent parser:

● Step 1: The procedure begins with the start symbol of the grammar.
● Step 2: Scan the input left to right.
● Step 3: Non-terminal – recursively replace the non-terminal with a production by checking against the next input symbol (lookahead).
● Step 4: If more than one alternative production rule is available for a non-terminal, then the decision is based on comparison with the lookahead symbol.
● Step 5: Terminal – on matching with the input string, advance the pointer to check the next input symbol.
● Step 6: The procedure continues until it derives the entire input string.
● Step 7: At any step, if it does not match while deriving the input string, apply backtracking.
Back-tracking:

● It means that if one derivation of a production fails, the syntax analyzer restarts the process using different rules of the same production.
● This technique may process the input string more than once to determine the right production.
● The top-down parser starts from the root node (start symbol) and matches the input string against production rules to replace them (if matched).

Example:
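As a minimal sketch of the idea (the grammar below is an assumption, written right-recursively so that the lookahead alone selects each production and no backtracking is needed):

```python
# Assumed grammar:
#   E -> T '+' E | T
#   T -> 'id' | '(' E ')'
# One recursive procedure per non-terminal; match() handles terminals.
class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def lookahead(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def match(self, expected):
        # Terminal: compare the lookahead with the input symbol, then advance.
        if self.lookahead() != expected:
            raise SyntaxError(f"expected {expected!r}, got {self.lookahead()!r}")
        self.pos += 1

    def E(self):
        self.T()
        if self.lookahead() == "+":   # lookahead selects E -> T '+' E
            self.match("+")
            self.E()

    def T(self):
        if self.lookahead() == "(":   # lookahead selects T -> '(' E ')'
            self.match("(")
            self.E()
            self.match(")")
        else:                         # otherwise T -> 'id'
            self.match("id")

def accepts(tokens):
    p = Parser(tokens)
    try:
        p.E()
        return p.pos == len(p.tokens)  # all input must be consumed
    except SyntaxError:
        return False

print(accepts(["id", "+", "(", "id", "+", "id", ")"]))  # → True
```

A grammar with alternatives that the lookahead cannot distinguish would instead need the backtracking described above.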
Benefits of a recursive descent parser:

1. Ease of use: Because recursive descent parsing closely mimics the grammar rules of the language being parsed, it is simple to comprehend and use.

2. Readability: The parsing code is usually set up in a structured and modular way, which makes it easier to read and maintain.

3. Error reporting: Recursive descent parsers can produce descriptive error messages, which make it simpler to find and detect syntax mistakes in the input.

4. Predictability: The predictable behavior of recursive descent parsers makes the parsing process deterministic and clear.
Drawbacks of a recursive descent parser:

1. Limited handling of left-recursive grammars: An RDP cannot handle left-recursive grammars, which can cause infinite loops in the parsing process.

2. Inefficiency: In some cases, an RDP can suffer from efficiency issues when processing a large grammar or complex input text, especially when compared to other parsing techniques such as LR parsing.

3. Limitations in error recovery: Since an RDP generates syntax errors immediately upon encountering an unexpected token, it can be less effective than other parsing techniques at recovering from errors and continuing parsing.

4. Difficulty handling ambiguous grammars: An RDP may struggle to handle grammars that are inherently ambiguous or have multiple valid parse trees.

BOTTOM-UP PARSING

What is bottom-up parsing?

● Bottom-up parsing is also known as shift-reduce parsing.
● Bottom-up parsing is used to construct a parse tree for an input string.
● In bottom-up parsing, the parsing starts with the input symbols and constructs the parse tree up to the start symbol by tracing out the rightmost derivations of the string in reverse.

Example:

E → T
T → T * F
T → id
F → id

Parse tree representation of the input string “id * id” is as follows:


Classification of bottom-up parsing:

Shift-reduce parsing:

● Shift-reduce parsing is a process of reducing a string to the start symbol of a grammar.
● Shift-reduce parsing uses a stack to hold the grammar symbols and an input tape to hold the string.
● Shift-reduce parsing performs two actions: shift and reduce. That is why it is known as shift-reduce parsing.
● In the shift action, the current symbol in the input string is pushed onto the stack.
● In each reduction, the symbols are replaced by a non-terminal. The symbols are the right side of the production, and the non-terminal is the left side of the production.
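The two actions can be traced on a small example (the grammar E → E + id | id and the hand-coded handle checks below are assumptions for illustration; a real shift-reduce parser would consult a parsing table instead):

```python
# Hand-driven shift-reduce parse for the assumed grammar:
#   E -> E + id | id
def shift_reduce(tokens):
    stack, trace = [], []
    tokens = list(tokens)
    while tokens or stack != ["E"]:
        if stack[-3:] == ["E", "+", "id"]:
            stack[-3:] = ["E"]                 # reduce by E -> E + id
            trace.append(("reduce E -> E + id", list(stack)))
        elif stack[-1:] == ["id"]:
            stack[-1:] = ["E"]                 # reduce by E -> id
            trace.append(("reduce E -> id", list(stack)))
        elif tokens:
            stack.append(tokens.pop(0))        # shift next input symbol
            trace.append(("shift", list(stack)))
        else:
            raise SyntaxError("cannot reduce to the start symbol")
    return trace

for action, stack in shift_reduce(["id", "+", "id"]):
    print(action, stack)
```

For “id + id” the trace is shift, reduce E → id, shift, shift, reduce E → E + id, which is exactly the rightmost derivation applied in reverse.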
LR parsing:

● The LR parser is the bottom-up parser that generates the parse tree for the given string by using unambiguous grammar.
● It follows the reverse of the rightmost derivation.
● The LR parser is of four types:

(a) LR(0)
(b) SLR(1)
(c) LALR(1)
(d) CLR(1)

SLR(1) – Simple LR Parser:

● Works on the smallest class of grammars.
● Few states, hence a very small table.
● Simple and fast construction.

LR(1) – LR Parser:

● Works on the complete set of LR(1) grammars.
● Generates a large table and a large number of states.
● Slow construction.

LALR(1) – Look-Ahead LR Parser:

● Works on an intermediate size of grammars.
● The number of states is the same as in SLR(1).

Operator precedence parser:

● The operator precedence parser generates the parse tree from the given grammar and string, but the only condition is that two consecutive non-terminals and epsilon never appear on the right-hand side of any production.
● The operator precedence parsing techniques can be applied to operator grammars.

Operator grammar: A grammar is said to be an operator grammar if there does not exist any production rule whose right-hand side:

1. is ε (epsilon), or
2. has two non-terminals appearing consecutively, that is, without any terminal between them.

Operator precedence parsing is not a simple technique to apply to most language constructs, but it evolves into an easy technique to implement where a suitable grammar may be produced.
