Bottom-up parsing is a type of syntax analysis method where the parser starts from the input symbols (tokens) and attempts to reduce them to the start symbol of the grammar (usually denoted as S). The process involves applying production rules in reverse, starting from the leaves of the parse tree and working upward toward the root.
We need bottom-up parsers because:
- It constructs the parse tree efficiently in a single left-to-right scan.
- It detects syntactic errors early in the parsing process.
- It handles a wide class of context-free grammars, including left-recursive ones that top-down parsers cannot handle directly.
- It often uses less memory and can be more efficient for certain types of languages.
Steps in Bottom-Up Parsing
- Start with tokens: The parser begins with the terminal symbols (the input tokens), which are the leaves of the parse tree.
- Shift and reduce: The parser repeatedly applies two actions:
- Shift: The next token is pushed onto a stack.
- Reduce: A sequence of symbols on the stack is replaced by a non-terminal according to the production rules of the grammar.
- Repeat until root: The process of shifting and reducing continues until the entire input is reduced to the start symbol, indicating the sentence has been successfully parsed.
Example
Using the following production rules, construct a parse tree for the input "id * id":
- E → T
- T → T * F
- T → F
- F → id
Solution:
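One possible shift-reduce derivation for "id * id" can be scripted in Python. The shift/reduce decisions below are supplied by hand for illustration; a table-driven parser would make them automatically:

```python
# Grammar: E -> T, T -> T * F, T -> F, F -> id
RULES = {
    "F -> id":    ("F", ["id"]),
    "T -> F":     ("T", ["F"]),
    "T -> T * F": ("T", ["T", "*", "F"]),
    "E -> T":     ("E", ["T"]),
}

# Hand-scripted sequence of shift/reduce actions for "id * id".
trace = [
    ("shift", "id"), ("reduce", "F -> id"), ("reduce", "T -> F"),
    ("shift", "*"), ("shift", "id"), ("reduce", "F -> id"),
    ("reduce", "T -> T * F"), ("reduce", "E -> T"),
]

stack = []
for action, arg in trace:
    if action == "shift":
        stack.append(arg)                  # push the next input token
    else:
        lhs, rhs = RULES[arg]
        assert stack[-len(rhs):] == rhs    # the handle must be on top
        del stack[-len(rhs):]              # pop the RHS symbols...
        stack.append(lhs)                  # ...and push the LHS
    print(action, arg, "->", stack)

assert stack == ["E"]                      # reduced to the start symbol
```

Each printed line shows the stack after one shift or reduce step, ending with the start symbol E alone on the stack.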
Shift and Reduce Operation in Bottom-Up Parsing
In bottom-up parsing, the parser works by reducing the input string, step by step, back to the start symbol of the grammar. The two most important operations used in this process are Shift and Reduce.
Shift Operation
In the Shift operation, the next input symbol is pushed onto the stack for later processing. Shifting is the initial step in parsing and continues until the parser is ready to perform a Reduce operation.
- Input Symbol: The parser begins with the entire input string, and it processes one symbol (referred to as the "next token") at a time.
- Stack: The parser keeps a stack of symbols (terminals and non-terminals) that it has processed thus far. The stack is initially empty.
- Action: The parser moves to the next symbol in the input and adds it to the stack. This is referred to as the shift. The stack contains a segment of the input that will eventually be reduced based on the grammar rules.
Example of Shift Operation
Take a simple expression "a + b". Here's how the shift operation would proceed step-by-step:
Initial Input: "a + b" (the entire input string)
- The parser reads the first symbol, which is "a".
- It pushes "a" onto the stack.
- Stack after shift: ["a"]
- Remaining input: "+ b"
Next Symbol: The parser now examines the next symbol in the input, which is "+".
- It shifts "+" onto the stack.
- Stack after shift: ["a", "+"]
- Remaining input: "b"
Final Symbol: The parser considers the next symbol, which is "b".
- It pushes "b" onto the stack.
- Stack after shift: ["a", "+", "b"]
- Remaining input: "" (empty)
At this stage, the input becomes empty, yet the stack stores the entire input string ("a + b"). The parser can now go for a reduce action if it detects a rule by which this string can be reduced.
Reduce Operation
The process of using reduce operation is called reduction. Reduction is when a specific part of the input (called a substring or handle) is replaced by a non-terminal symbol according to the production rules of the grammar. A handle is a substring in the stack that matches a grammar rule's RHS and must be reduced next to progress toward the start symbol.
Production Rule: A production rule defines how a non-terminal symbol can be replaced by other symbols (either terminals or non-terminals). For example, you might have seen a production like this:
Expression → Term + Term
This rule says that an Expression can be made by combining two Terms with a '+' in between.
Matching the Substring: During parsing, we look at the input string and try to match parts of it to the right-hand side of a production rule. For instance, if the input is "3 + 5", we might find that this substring matches the Term + Term part of the production.
Replacement Step (Reduction): Once a match is found, we "reduce" that matched part. This means we replace it with the non-terminal on the left side of the production rule. So, from the example above:
- We recognize the substring "3 + 5" as matching Term + Term.
- We then replace this substring with the non-terminal Expression.
Continue Reducing: The parser keeps reducing parts of the input string in this way until everything is reduced to the start symbol (like S in many grammars). This indicates that the entire input has been successfully parsed according to the grammar.
Example of Reduce Operation
Consider the following simple grammar:
- S → A + B
- A → 3
- B → 5
Now, let's parse the string "3 + 5":
- First, we start with the input string: "3 + 5".
- We look for parts of the string that match a production rule.
- We see that "3" matches A → 3, so we replace it with A (now we have A + 5).
- Next, "5" matches B → 5, so we replace it with B (now we have A + B).
- Finally, A + B matches the right-hand side of S → A + B, so we replace it with S.
Now, we've reduced the entire input to the start symbol S, meaning the input has been successfully parsed.
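The reduction steps above can be sketched in Python as a loop that repeatedly replaces the first substring matching a rule's right-hand side:

```python
# Reducing "3 + 5" to S with the rules S -> A + B, A -> 3, B -> 5.
RULES = {("3",): "A", ("5",): "B", ("A", "+", "B"): "S"}

def reduce_once(form):
    """Replace the first substring matching a rule's RHS with its LHS."""
    for rhs, lhs in RULES.items():
        n = len(rhs)
        for i in range(len(form) - n + 1):
            if tuple(form[i:i + n]) == rhs:
                return form[:i] + [lhs] + form[i + n:]
    return None  # no reducible substring found

form = ["3", "+", "5"]
steps = [form]
while form != ["S"]:
    form = reduce_once(form)
    steps.append(form)

print(steps)  # the four sentential forms, ending with ['S']
```

The printed sequence mirrors the hand trace: ["3", "+", "5"] → ["A", "+", "5"] → ["A", "+", "B"] → ["S"].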
Classification of Bottom-up Parsers
A bottom-up parser is often referred to as a shift-reduce parser. A shift-reduce parser has just four canonical actions:
- shift: the next input symbol is shifted onto the top of the stack.
- reduce: pop the rule's RHS from the stack and push its LHS.
- accept: terminate parsing and signal success.
- error: call an error recovery routine.
LR Parsers
LR parsers are a type of bottom-up parser used to handle large and complex grammars. They are commonly used in compilers for programming languages. The name "LR" comes from two parts:
- The "L" stands for left-to-right scanning of the input. This means the parser reads the input string one symbol at a time, from left to right.
- The "R" stands for rightmost derivation in reverse. This refers to the way the parser constructs the parse tree.
Instead of building the tree from the top down (as in top-down parsers), LR parsers work from the leaves (the input symbols) and gradually reduce them back to the start symbol, following a rightmost derivation in reverse.
The "k" part, which you may see in names like LR(k), SLR(1), or LALR(1), refers to the lookahead symbols the parser uses. A "lookahead" is the number of input symbols the parser looks at in advance to decide what action to take.
For example, if the parser uses 1 lookahead, it looks at just the next symbol to decide what to do, while a parser using 2 lookahead looks ahead by two symbols.
Algorithm
push s₀                              # start with the initial state
token ← next_token()                 # load the first token
while True:
    s ← stack.top()                  # current state
    match action[s, token]:
        case "shift sᵢ":
            push sᵢ                  # shift to the new state
            token ← next_token()     # consume the token
        case "reduce A → β":
            pop |β| states           # remove the RHS symbols
            s' ← stack.top()         # state exposed after popping
            push goto[s', A]         # push GOTO for the LHS
        case "accept" if token == $:
            return SUCCESS           # input fully parsed
        case _:
            raise ERROR              # invalid parse
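The driver above can be made concrete with hand-built ACTION and GOTO tables. The tables below were constructed by hand for a tiny illustrative grammar (E' → E, E → E + n | n), not generated by a tool, so treat them as a sketch:

```python
# SLR(1) tables, built by hand, for the grammar:
#   0: E' -> E      1: E -> E + n      2: E -> n
# States: 0 = start, 1 = after E, 2 = after n, 3 = after "E +", 4 = after "E + n"
ACTION = {
    (0, "n"): ("shift", 2),
    (1, "+"): ("shift", 3),
    (1, "$"): ("accept",),
    (2, "+"): ("reduce", 2), (2, "$"): ("reduce", 2),
    (3, "n"): ("shift", 4),
    (4, "+"): ("reduce", 1), (4, "$"): ("reduce", 1),
}
GOTO = {(0, "E"): 1}
RULES = {1: ("E", 3), 2: ("E", 1)}   # rule number -> (LHS, length of RHS)

def lr_parse(tokens):
    tokens = tokens + ["$"]
    stack, i = [0], 0                    # state stack, input position
    while True:
        act = ACTION.get((stack[-1], tokens[i]))
        if act is None:
            return False                 # blank table entry: syntax error
        if act[0] == "shift":
            stack.append(act[1]); i += 1
        elif act[0] == "reduce":
            lhs, n = RULES[act[1]]
            del stack[-n:]               # pop |RHS| states
            stack.append(GOTO[(stack[-1], lhs)])
        else:
            return True                  # accept

print(lr_parse(["n", "+", "n"]))  # True
print(lr_parse(["n", "n"]))       # False
```

Note how a reduce step never touches the input: it only pops states and consults GOTO, exactly as in the pseudocode.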
The common algorithms for building the tables of an LR parser are:
LR(0) Parser
An LR(0) parser is a particular kind of bottom-up parser employed in compiler construction. The "LR" refers to Left-to-right scanning of the input and Rightmost derivation in reverse. The "(0)" means that the parser uses no lookahead, i.e., it makes parsing choices based solely on the states on its stack, without looking ahead at upcoming input symbols.
Working of LR(0) Parser:
The LR(0) parser analyzes the input symbol by symbol from left to right. It creates the parse tree using a shift-reduce process. This continues on until the whole input string has been processed and the stack only has the start symbol of the grammar.
SLR(1) Parser
An SLR(1) parser is an extended version of the LR(0) parser. The "SLR" refers to Simple LR, and the "(1)" indicates that it has 1 symbol of lookahead to decide what action to take. That is, the parser will have a look at the next symbol in the input to aid in deciding what action to perform, hence more powerful than an LR(0) parser.
Working of SLR(1) Parser:
Similar to the LR(0) parser, an SLR(1) parser employs a shift-reduce strategy. The main distinction here is that the SLR(1) parser also takes into account the next input symbol (the lookahead) to determine whether to shift or reduce. This additional lookahead enables it to resolve certain kinds of conflicts not resolvable by LR(0) parsers.
LR(1)
- full set of LR(1) grammars
- largest tables (number of states)
- slow, large construction
LALR(1)
- intermediate sized set of grammars
- same number of states as SLR(1)
- canonical construction is slow and large
- better construction techniques exist
Benefits of LR parsing
- Many programming languages are parsed using some variation of an LR parser; C++ and Perl are notable exceptions.
- LR parsers can be implemented very efficiently.
- Of all parsers that scan their input from left to right, LR parsers detect syntactic errors as early as possible.
LR(k) Items
When building parsing tables for LR parsers, we use LR(k) items to track what the parser is expecting next.
An LR(k) item is a pair [α, β], where:
- α is a grammar rule with a dot (•) in it. The dot shows how much of the rule has been processed.
- β is a string of up to k lookahead symbols (tokens) that help decide the next action.
The k in LR(k) means how many lookahead symbols the parser considers when making decisions.
Examples of LR(k) Items
LR(0) Items (No Lookahead): These only track progress in a grammar rule.
Example for the rule S → A B:
- [S → • A B] (nothing processed yet)
- [S → A • B] (A is processed, B is next)
- [S → A B •] (complete rule)
LR(1) Items (One Lookahead Symbol): These also consider one lookahead token (β).
Example: for the rule S → A B, with b as lookahead:
- [S → • A B, b] (predicting this rule when the next token is b)
- [S → A • B, b]
- [S → A B •, b]
Note: LR(0) items are used in SLR(1) parsers (simpler) and LR(1) items are used in LR(1) and LALR(1) parsers (more powerful).
CLOSURE
The CLOSURE function helps a parser figure out all possible rules that might be needed in a particular situation.
If we have a rule like A → α • B β, it means:
- We have processed α so far.
- B is the next thing we need to expand.
The CLOSURE function finds all rules that start with B and adds them to the set.
Example
Grammar Rules
- S → A B
- A → a
- B → b C
- C → c
Closure for [S → A • B]
- Start with [S → A • B].
- B appears right after the dot, so we look for rules with B on the left-hand side. We find B → b C.
- Add [B → • b C] to the closure.
- The dot in the new item is before the terminal b, so no further rules are added and we stop.
Closure Algorithm
function CLOSURE(I):
    repeat:
        for each [A → α • B β] in I:
            for each rule B → γ in grammar:
                add [B → • γ] to I
    until no new items can be added
    return I
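The algorithm can be sketched in Python, representing an item as a (lhs, rhs, dot) triple over the example grammar above:

```python
# Example grammar: S -> A B, A -> a, B -> b C, C -> c
# (lowercase symbols are terminals, uppercase are non-terminals)
GRAMMAR = {
    "S": [("A", "B")],
    "A": [("a",)],
    "B": [("b", "C")],
    "C": [("c",)],
}

def closure(items):
    """An item is (lhs, rhs, dot); dot is an index into rhs."""
    items = set(items)
    changed = True
    while changed:
        changed = False
        for lhs, rhs, dot in list(items):
            # If the dot is before a non-terminal B, add [B -> . gamma]
            # for every production B -> gamma.
            if dot < len(rhs) and rhs[dot] in GRAMMAR:
                for gamma in GRAMMAR[rhs[dot]]:
                    item = (rhs[dot], gamma, 0)
                    if item not in items:
                        items.add(item)
                        changed = True
    return items

result = closure({("S", ("A", "B"), 1)})   # closure of [S -> A . B]
print(sorted(result))
# contains [S -> A . B] and [B -> . b C], matching the worked example
```

The loop stops once a pass adds nothing new, mirroring the "until no new items can be added" condition in the pseudocode.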
GOTO
The GOTO function helps a parser move from one state to another after recognizing a symbol (X).
- Suppose we are in a state I and expecting X next.
- GOTO(I, X) finds all rules where X was the next expected symbol.
- It moves the dot (•) past X and then applies CLOSURE to find any new possibilities.
Example
Grammar Rules
- S → A B
- A → a
- B → b C
- C → c
GOTO for (I, B)
- Suppose I contains [S → A • B].
- The dot is before B, so we move it past B, giving [S → A B •].
- Now apply CLOSURE: the dot is at the end of the item, so no new items are added.
GOTO Algorithm
function GOTO(I, X):
    J = set of items [A → α X • β]
        where [A → α • X β] is in I
    return CLOSURE(J)
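GOTO can be sketched in Python using the same (lhs, rhs, dot) item representation; CLOSURE is included so the sketch is self-contained:

```python
# Example grammar: S -> A B, A -> a, B -> b C, C -> c
GRAMMAR = {"S": [("A", "B")], "A": [("a",)], "B": [("b", "C")], "C": [("c",)]}

def closure(items):
    """Add [B -> . gamma] for every non-terminal B right after a dot."""
    items = set(items)
    changed = True
    while changed:
        changed = False
        for lhs, rhs, dot in list(items):
            if dot < len(rhs) and rhs[dot] in GRAMMAR:
                for gamma in GRAMMAR[rhs[dot]]:
                    if (rhs[dot], gamma, 0) not in items:
                        items.add((rhs[dot], gamma, 0))
                        changed = True
    return items

def goto(items, X):
    # Advance the dot past X in every item where X follows the dot...
    moved = {(lhs, rhs, dot + 1)
             for lhs, rhs, dot in items
             if dot < len(rhs) and rhs[dot] == X}
    return closure(moved)   # ...then close the resulting set

state = closure({("S", ("A", "B"), 0)})   # items for [S -> . A B]
result = goto(state, "A")
print(sorted(result))
# the closure of [S -> A . B]: it also pulls in [B -> . b C]
```

Computing GOTO for every state and every symbol is exactly how the GOTO graph in the next section is built.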
Construction of the GOTO Graph
- State I0 is the closure of the initial item of the augmented grammar.
- Starting from I0, compute the canonical collection of sets of LR(0) items; the item sets are the states of a DFA whose transitions are given by GOTO.
- Convert the DFA into the LR(0) parsing table.
Augmented Grammar
An augmented grammar is a modified version of a grammar where we add a new start symbol and rule to help with parsing.
- It ensures that the parser knows when to accept the input.
- It helps in building LR parsing tables (used in compilers).
How Do We Create an Augmented Grammar?
- Add a new start symbol (S').
- Create a new rule (S' → S), where S is the original start symbol.
- Keep all other rules the same.
Example
Original Grammar:
- S → A B
- A → a
- B → b
Augmented Grammar:
- S' → S (new start rule)
- S → A B
- A → a
- B → b
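In code, the augmentation step is a one-line transformation; a minimal sketch (the `augment` helper name is hypothetical):

```python
def augment(grammar, start):
    """Return a copy of the grammar with a fresh start rule S' -> S."""
    new_start = start + "'"            # e.g. "S" becomes "S'"
    augmented = dict(grammar)
    augmented[new_start] = [(start,)]  # the single new rule S' -> S
    return augmented, new_start

g = {"S": [("A", "B")], "A": [("a",)], "B": [("b",)]}
g2, s2 = augment(g, "S")
print(s2, g2[s2])  # S' [('S',)]
```

All original rules are kept unchanged; only the new start rule is added.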
Now, S' → S helps the parser recognize when it has reached the end of the input.
Operator Precedence Parsing
Operator precedence parsing is a type of bottom-up parsing used to parse expressions based on operator precedence relations. It is suitable for grammars where operators have clear precedence and associativity, such as arithmetic expressions.
Operator Precedence Relations
Operator precedence parsers rely on three relations between terminal symbols (operators) to determine the parsing action:
- Less than (<·) → the operator has lower precedence than the next one.
- Greater than (·>) → the operator has higher precedence than the next one.
- Equal to (=) → the operators have the same precedence (e.g., parentheses matching).
These relations help in deciding when to shift or reduce during parsing.
Operator Precedence Table
A table defining precedence relationships among operators is required for the parser to function. Example:
Operator | + | * | ( | ) | $ |
---|---|---|---|---|---|
+ | ·> | <· | <· | ·> | ·> |
* | ·> | ·> | <· | ·> | ·> |
( | <· | <· | <· | = | error |
) | ·> | ·> | error | ·> | ·> |
$ | <· | <· | <· | error | accept |
- $ represents the end of input.
- Shift occurs when the relation is <· (lower precedence).
- Reduce occurs when the relation is ·> (higher precedence).
Parsing Algorithm
Initialize Stack: Push $ onto the stack.
Read Token: Get the next input symbol.
Compare Precedence: Check the precedence relation between the topmost terminal on the stack and the input token:
- If stack_top <· input_token → Shift (push the token onto the stack).
- If stack_top ·> input_token → Reduce (apply a reduction to the handle).
- If stack_top = input_token → Match and proceed (for parentheses).
- If no valid relation exists → Error.
Repeat Until Accept: Continue until the stack top and the input symbol are both $, the accept condition.
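The algorithm can be sketched in Python using the precedence relations from the table in this section. The handle patterns assume the usual expression grammar E → E + E | E * E | (E) | id; id is given the highest precedence implicitly:

```python
# A minimal operator-precedence parser sketch for id, +, *, and parentheses.
PREC = {
    ("+", "+"): ">", ("+", "*"): "<", ("+", "("): "<", ("+", ")"): ">", ("+", "$"): ">",
    ("*", "+"): ">", ("*", "*"): ">", ("*", "("): "<", ("*", ")"): ">", ("*", "$"): ">",
    ("(", "+"): "<", ("(", "*"): "<", ("(", "("): "<", ("(", ")"): "=",
    (")", "+"): ">", (")", "*"): ">", (")", ")"): ">", (")", "$"): ">",
    ("$", "+"): "<", ("$", "*"): "<", ("$", "("): "<",
}
HANDLES = {("id",), ("E", "+", "E"), ("E", "*", "E"), ("(", "E", ")")}

def relation(a, b):
    if b == "id":
        return "<" if a != "id" else None   # every operator yields to id
    if a == "id":
        return ">"                           # id binds tighter than anything
    return PREC.get((a, b))                  # None means error

def top_terminal(stack):
    return next(s for s in reversed(stack) if s != "E")

def parse(tokens):
    tokens = tokens + ["$"]
    stack, i = ["$"], 0
    while True:
        a, b = top_terminal(stack), tokens[i]
        if a == "$" and b == "$":
            return True                      # accept
        rel = relation(a, b)
        if rel in ("<", "="):
            stack.append(b); i += 1          # shift
        elif rel == ">":                     # reduce the handle on top
            popped = []
            while True:
                popped.append(stack.pop())
                if popped[-1] != "E" and relation(top_terminal(stack), popped[-1]) == "<":
                    break
            if stack[-1] == "E":             # a leading non-terminal belongs too
                popped.append(stack.pop())
            if tuple(reversed(popped)) not in HANDLES:
                return False                 # not a valid handle
            stack.append("E")
        else:
            return False                     # no relation: syntax error

print(parse("id + id * id".split()))  # True
print(parse("( id + id )".split()))   # True
print(parse("id + + id".split()))     # False
```

Because * ·> $ holds while + <· * does, the multiplication handle is reduced before the addition, which is exactly how precedence is enforced without any grammar-driven table.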
Advantages
- Efficient for handling operator-precedence grammars.
- Simple Implementation using precedence relations.
- No Need for Left Recursion Removal in certain cases.