FIRST and FOLLOW in Compiler Design
Last Updated: 25 Apr, 2025
In compiler design, FIRST and FOLLOW are two sets used to help parsers understand how to process a grammar.
- FIRST Set: The FIRST set of a non-terminal contains all the terminal symbols that can appear at the beginning of any string derived from that non-terminal. In other words, it tells us which terminal symbols are possible when expanding a non-terminal.
- FOLLOW Set: The FOLLOW set of a non-terminal contains all the terminal symbols that can appear immediately after that non-terminal in any derivation. It helps identify what can follow a non-terminal in the grammar and is essential for handling productions where a non-terminal is at the end of a rule.
FIRST (X)
The FIRST(X) set is a collection of terminal symbols (and possibly ε) that can appear as the leftmost symbol when we expand X in a derivation of a given grammar.
1. When X is a terminal:
If X is a terminal symbol (for example, 'a' or 'b'), the FIRST(X) set simply contains X itself, since a terminal always starts with itself.
Example: If X = a, then FIRST(a) = {a}.
2. When X has only one production rule of type X → aY:
If X has a single production whose right-hand side begins with a terminal (for example, A → a B), then FIRST(X) contains just that leading terminal.
Example:
A → a B
Here, FIRST(A) = {a} because 'a' is the first terminal symbol on the right side of the production.
3. When X has multiple production rules:
If X is a non-terminal (for example, A or B), the FIRST(X) set includes all the terminal symbols that could be the first symbol of any string derived from X. This means, for each production rule of X, you look at what can be the first symbol of that rule.
Example:
A → a B
A → b
A → ε
Here, FIRST(A) = {a, b, ε} because:
- The first symbol in the first production A → a B is 'a'.
- The first symbol in the second production A → b is 'b'.
- The third production A → ε means A can derive an empty string, so ε is included.
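For this particular grammar, the rule can be illustrated directly, since every production of A starts with a terminal or is ε. A minimal sketch (the grammar and names are taken from the example above):

```python
# FIRST(A) for the grammar A -> a B | b | ε. Because each production
# starts with a terminal or is ε, FIRST(A) is just the set of leading
# symbols; grammars whose productions start with non-terminals need the
# full iterative algorithm shown later.
EPS = "ε"
productions = [["a", "B"], ["b"], [EPS]]

first_A = {prod[0] for prod in productions}
print(sorted(first_A))  # ['a', 'b', 'ε']
```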
4. When X is a string of grammar symbols:
If X is a string made up of terminals and/or non-terminals (like A B C), we start with the leftmost symbol and use its FIRST set.
To compute FIRST(X), you look at FIRST(A). If A can derive ε (i.e., the empty string), then you also look at FIRST(B), and so on. If both A and B can derive ε, you also need to consider FIRST(C).
Example:
X → A B C
A → a | ε
B → b | ε
C → c | d
The FIRST(X) is {a, b, c, d}.
This is because:
- A contributes a and ε.
- B contributes b and ε.
- C contributes c and d. Since A and B can both derive ε, we include the first symbols of C.
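The rules above can be turned into a small fixed-point algorithm: keep applying them to every production until no FIRST set grows. Below is a sketch in Python for the example grammar (the dictionary encoding of the grammar is an assumption of this sketch, not part of the original article):

```python
# Iterative FIRST-set computation for the grammar
# X -> A B C, A -> a | ε, B -> b | ε, C -> c | d.
EPS = "ε"

# Non-terminal -> list of productions; each production is a list of symbols.
grammar = {
    "X": [["A", "B", "C"]],
    "A": [["a"], [EPS]],
    "B": [["b"], [EPS]],
    "C": [["c"], ["d"]],
}

def is_terminal(sym):
    return sym not in grammar and sym != EPS

def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:                     # repeat until a fixed point is reached
        changed = False
        for nt, prods in grammar.items():
            for prod in prods:
                before = len(first[nt])
                for sym in prod:
                    if sym == EPS:
                        first[nt].add(EPS)
                        break
                    if is_terminal(sym):
                        first[nt].add(sym)       # a terminal ends the scan
                        break
                    first[nt] |= first[sym] - {EPS}
                    if EPS not in first[sym]:    # sym cannot vanish: stop here
                        break
                else:
                    first[nt].add(EPS)           # every symbol can derive ε
                if len(first[nt]) != before:
                    changed = True
    return first

first = compute_first(grammar)
print(sorted(first["X"]))  # ['a', 'b', 'c', 'd']
```

Note that ε is not in FIRST(X): C cannot derive the empty string, so X cannot either.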
Read more about FIRST Set in Syntax Analysis.
FOLLOW (X)
The FOLLOW(X) set contains all the terminal symbols that can appear immediately after the non-terminal X in any valid string derived from the grammar. It is used in parsing to decide what symbols can follow a particular non-terminal in a derivation.
1. When X is the start symbol of the grammar:
If X is the start symbol (e.g., S), the FOLLOW(S) set always includes the special end-of-input marker $ to indicate that nothing comes after the start symbol in a complete string.
Example:
Start Symbol = S
FOLLOW(S) = { $ }
2. When X appears before a terminal:
If X is followed by a terminal symbol in a production rule, that terminal symbol is added to FOLLOW(X).
Example:
A → a B c
Here, B is followed by c, so c is in FOLLOW(B).
3. When X appears before a non-terminal:
If X is followed by a non-terminal (e.g., B), everything in FIRST(B) except ε is added to FOLLOW(X). However, if B can derive ε (the empty string), then:
- FOLLOW(X) also includes the FIRST set (excluding ε) of whatever terminal or non-terminal comes after B.
- If nothing comes after B, FOLLOW(X) also includes the FOLLOW set of the non-terminal on the left-hand side of the production rule.
Example:
A → a X B
B → b | ε
If FIRST(B) = {b, ε}, then:
- b is added to FOLLOW(X).
- Since B can derive ε, FOLLOW(A) will also be included in FOLLOW(X).
4. When X appears at the end of a production rule:
If X appears at the end of a production rule (e.g., A → B X), then FOLLOW(A) is added to FOLLOW(X) because whatever follows A in the string must also follow X.
Example:
A → B X
Here, everything in FOLLOW(A) must also be included in FOLLOW(X).
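The four rules above can also be computed iteratively, in the same fixed-point style as FIRST. Below is a sketch for a small hypothetical grammar A → a X B, X → x, B → b | ε, with A as the start symbol; the FIRST sets are supplied by hand to keep the example self-contained:

```python
# Iterative FOLLOW-set computation. Grammar (hypothetical, chosen to
# exercise the rules above): A -> a X B, X -> x, B -> b | ε; start = A.
EPS, END = "ε", "$"

grammar = {
    "A": [["a", "X", "B"]],
    "X": [["x"]],
    "B": [["b"], [EPS]],
}
first = {"A": {"a"}, "X": {"x"}, "B": {"b", EPS}}  # hand-computed FIRST sets

def first_of_seq(seq):
    """FIRST of a sequence of symbols; contains ε iff the whole
    sequence (possibly empty) can derive the empty string."""
    out = set()
    for sym in seq:
        syms = first[sym] if sym in first else {sym}  # terminal: FIRST is itself
        out |= syms - {EPS}
        if EPS not in syms:
            return out
    out.add(EPS)
    return out

def compute_follow(grammar, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)            # rule 1: $ follows the start symbol
    changed = True
    while changed:                    # repeat until no FOLLOW set grows
        changed = False
        for lhs, prods in grammar.items():
            for prod in prods:
                for i, sym in enumerate(prod):
                    if sym not in grammar:        # only non-terminals have FOLLOW
                        continue
                    before = len(follow[sym])
                    rest = first_of_seq(prod[i + 1:])
                    follow[sym] |= rest - {EPS}   # rules 2 and 3
                    if EPS in rest:               # rest can vanish (or is empty)
                        follow[sym] |= follow[lhs]  # rules 3 and 4
                    changed |= len(follow[sym]) != before
    return follow

follow = compute_follow(grammar, "A")
print(sorted(follow["X"]))  # ['$', 'b']
```

In this grammar, b enters FOLLOW(X) from FIRST(B), and because B can derive ε, FOLLOW(A) = {$} flows into FOLLOW(X) as well, exactly as the rules describe.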
Read more about FOLLOW Set in Syntax Analysis.
Importance of FIRST and FOLLOW Set
The FIRST and FOLLOW sets play a crucial role in LL(1) parsing and grammar analysis, with multiple important applications:
1. Building LL(1) Parsing Tables
- FIRST and FOLLOW sets help create parsing tables used by LL(1) parsers. These tables guide the parser in selecting the correct production based on the next input symbol.
- Without these sets, it would be impossible to systematically and deterministically decide which rule to apply.
2. Ensuring Grammar is LL(1)
- By analyzing FIRST and FOLLOW, we can check if a grammar satisfies the LL(1) condition: no overlapping entries in the parsing table for any non-terminal and input symbol.
- This ensures that the grammar is unambiguous and suitable for top-down parsing.
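To make points 1 and 2 concrete, here is a sketch of how a parsing table is filled from FIRST and FOLLOW sets and how an LL(1) conflict would surface. The grammar (A → a B | b | ε, B → c) and its precomputed sets are assumptions of this sketch:

```python
# Building an LL(1) parsing table: table[(non-terminal, lookahead)] = production.
# A duplicate entry for the same cell means the grammar is not LL(1).
EPS, END = "ε", "$"

grammar = {"A": [["a", "B"], ["b"], [EPS]], "B": [["c"]]}
first_of_prod = {            # hand-computed FIRST of each right-hand side
    ("A", 0): {"a"}, ("A", 1): {"b"}, ("A", 2): {EPS},
    ("B", 0): {"c"},
}
follow = {"A": {END}, "B": {END}}   # hand-computed, with A as start symbol

def build_table(grammar, first_of_prod, follow):
    table = {}
    for nt, prods in grammar.items():
        for i, prod in enumerate(prods):
            f = first_of_prod[(nt, i)]
            targets = f - {EPS}          # one entry per terminal in FIRST(prod)
            if EPS in f:                 # ε-production: use FOLLOW(nt) instead
                targets |= follow[nt]
            for t in targets:
                if (nt, t) in table:     # two rules claim the same cell
                    raise ValueError(f"not LL(1): conflict at ({nt}, {t})")
                table[(nt, t)] = prod
    return table

table = build_table(grammar, first_of_prod, follow)
print(table[("A", "a")])  # ['a', 'B']
print(table[("A", "$")])  # ['ε']  -- the ε-production, chosen via FOLLOW(A)
```

The `ValueError` branch is precisely the LL(1) check: the table construction succeeds without conflicts if and only if the grammar satisfies the LL(1) condition.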
3. Handling ε-Productions
- The FIRST set includes ε (empty string) when a non-terminal can derive ε.
- The FOLLOW set ensures that after ε, the parser knows which symbols can legally follow.
4. Predictive Parsing
- These sets allow a parser to predict the correct production to use without backtracking.
- This makes LL(1) parsers efficient, as they don’t need to retry multiple rules.
5. Error Detection and Recovery
- The FOLLOW set helps the parser identify errors in the input string by showing what symbols are expected at a given point.
- This aids in implementing error recovery mechanisms during parsing.
6. Compiler Design and Syntax Analysis
- Both sets are foundational in compiler design, enabling systematic analysis of context-free grammars.
- They assist in identifying left recursion and improving grammar to make it suitable for LL(1) parsing.