Program to Calculate First and Follow Sets of Given Grammar
Last Updated :
29 Nov, 2024
The First and Follow sets are important in syntax analysis, mainly in parsing. All sets help in making predictive parsers and are integral to identifying how a given grammar can be parsed effectively.
- The First Set for a non-terminal symbol represents all possible terminals that can appear at the beginning of any string derived from that non-terminal.
- The follow set contains terminals that can appear immediately after a non-terminal in the derivation of the grammar.
All meaning of First and Follow sets is it allows to define which production rules are implemented in different parsing moment.
What is the First Set?
The First set of non-terminal in grammar tells us all the possible terminals that can appear at the beginning of any string derived from that non terminal. it gives us a way to know what symbol might come first if we start with a particular non-terminal. For example, if we have a rule like A -> aB | ε, then the First set of A includes the terminal a, as well as ε, which represents the possibility of deriving an empty string. Calculating the First set helps us predict which rules to apply during parsing by knowing which symbols can appear first in a derivation.
What is a Follow Set?
The Follow set of a non-terminal contains all terminals that can immediately follow it in any derivation of the grammar. This means that if a non-terminal appears in the middle of a sentence, the Follow set tells us what symbol might come right after it. For example, in a rule like S -> AB, the Follow set of A might include anything that can follow B, depending on other rules in the grammar. Follow sets are especially useful for parsers when making decisions about which rule to apply next and ensuring that we’re interpreting the structure of a sentence correctly, especially for LL(1) parsers where we only look one symbol ahead.
Example:
Input :
E -> TR
R -> +T R| #
T -> F Y
Y -> *F Y | #
F -> (E) | i
Output :
First(E)= { (, i, }
First(R)= { +, #, }
First(T)= { (, i, }
First(Y)= { *, #, }
First(F)= { (, i, }
-----------------------------------------------
Follow(E) = { $, ), }
Follow(R) = { $, ), }
Follow(T) = { +, $, ), }
Follow(Y) = { +, $, ), }
Follow(F) = { *, +, $, ), }
The functions follow and follow first are both involved in the calculation of the Follow Set of a given Non-Terminal. The follow set of the start symbol will always contain "$". Now the calculation of Follow falls under three broad cases:
- If a Non-Terminal on the R.H.S. of any production is followed immediately by a Terminal then it can immediately be included in the Follow set of that Non-Terminal.
- If a Non-Terminal on the R.H.S. of any production is followed immediately by a Non-Terminal, then the First Set of that new Non-Terminal gets included on the follow set of our original Non-Terminal. In case encountered an epsilon i.e. " # " then, move on to the next symbol in the production.
Note: "#" is never included in the Follow set of any Non-Terminal. - If reached the end of a production while calculating follow, then the Follow set of that non-terminal will include the Follow set of the Non-Terminal on the L.H.S. of that production. This can easily be implemented by recursion.
Assumptions
- Epsilon is represented by '#'.
- Productions are of the form A=B, where 'A' is a single Non-Terminal and 'B' can be any combination of Terminals and Non- Terminals.
- L.H.S. of the first production rule is the start symbol.
- Grammar is not left recursive.
- Each production of a non-terminal is entered on a different line.
- Only Upper Case letters are Non-Terminals and everything else is a terminal.
- Do not use '!' or '$' as they are reserved for special purposes.
Below is the implementation:
C
// C program to calculate the First and
// Follow sets of a given grammar
#include <ctype.h>
#include <stdio.h>
#include <string.h>
// Functions to calculate Follow
void followfirst(char, int, int);
void follow(char c);
// Function to calculate First
void findfirst(char, int, int);
int count, n = 0;
// Stores the final result
// of the First Sets
char calc_first[10][100];
// Stores the final result
// of the Follow Sets
char calc_follow[10][100];
int m = 0;
// Stores the production rules
char production[10][10];
char f[10], first[10];
int k;
char ck;
int e;
int main(int argc, char** argv)
{
int jm = 0;
int km = 0;
int i, choice;
char c, ch;
count = 8;
// The Input grammar
strcpy(production[0], "X=TnS");
strcpy(production[1], "X=Rm");
strcpy(production[2], "T=q");
strcpy(production[3], "T=#");
strcpy(production[4], "S=p");
strcpy(production[5], "S=#");
strcpy(production[6], "R=om");
strcpy(production[7], "R=ST");
int kay;
char done[count];
int ptr = -1;
// Initializing the calc_first array
for (k = 0; k < count; k++) {
for (kay = 0; kay < 100; kay++) {
calc_first[k][kay] = '!';
}
}
int point1 = 0, point2, xxx;
for (k = 0; k < count; k++) {
c = production[k][0];
point2 = 0;
xxx = 0;
// Checking if First of c has
// already been calculated
for (kay = 0; kay <= ptr; kay++)
if (c == done[kay])
xxx = 1;
if (xxx == 1)
continue;
// Function call
findfirst(c, 0, 0);
ptr += 1;
// Adding c to the calculated list
done[ptr] = c;
printf("\n First(%c) = { ", c);
calc_first[point1][point2++] = c;
// Printing the First Sets of the grammar
for (i = 0 + jm; i < n; i++) {
int lark = 0, chk = 0;
for (lark = 0; lark < point2; lark++) {
if (first[i] == calc_first[point1][lark]) {
chk = 1;
break;
}
}
if (chk == 0) {
printf("%c, ", first[i]);
calc_first[point1][point2++] = first[i];
}
}
printf("}\n");
jm = n;
point1++;
}
printf("\n");
printf("-----------------------------------------------"
"\n\n");
char donee[count];
ptr = -1;
// Initializing the calc_follow array
for (k = 0; k < count; k++) {
for (kay = 0; kay < 100; kay++) {
calc_follow[k][kay] = '!';
}
}
point1 = 0;
int land = 0;
for (e = 0; e < count; e++) {
ck = production[e][0];
point2 = 0;
xxx = 0;
// Checking if Follow of ck
// has already been calculated
for (kay = 0; kay <= ptr; kay++)
if (ck == donee[kay])
xxx = 1;
if (xxx == 1)
continue;
land += 1;
// Function call
follow(ck);
ptr += 1;
// Adding ck to the calculated list
donee[ptr] = ck;
printf(" Follow(%c) = { ", ck);
calc_follow[point1][point2++] = ck;
// Printing the Follow Sets of the grammar
for (i = 0 + km; i < m; i++) {
int lark = 0, chk = 0;
for (lark = 0; lark < point2; lark++) {
if (f[i] == calc_follow[point1][lark]) {
chk = 1;
break;
}
}
if (chk == 0) {
printf("%c, ", f[i]);
calc_follow[point1][point2++] = f[i];
}
}
printf(" }\n\n");
km = m;
point1++;
}
}
void follow(char c)
{
int i, j;
// Adding "$" to the follow
// set of the start symbol
if (production[0][0] == c) {
f[m++] = '$';
}
for (i = 0; i < 10; i++) {
for (j = 2; j < 10; j++) {
if (production[i][j] == c) {
if (production[i][j + 1] != '\0') {
// Calculate the first of the next
// Non-Terminal in the production
followfirst(production[i][j + 1], i,
(j + 2));
}
if (production[i][j + 1] == '\0'
&& c != production[i][0]) {
// Calculate the follow of the
// Non-Terminal in the L.H.S. of the
// production
follow(production[i][0]);
}
}
}
}
}
void findfirst(char c, int q1, int q2)
{
int j;
// The case where we
// encounter a Terminal
if (!(isupper(c))) {
first[n++] = c;
}
for (j = 0; j < count; j++) {
if (production[j][0] == c) {
if (production[j][2] == '#') {
if (production[q1][q2] == '\0')
first[n++] = '#';
else if (production[q1][q2] != '\0'
&& (q1 != 0 || q2 != 0)) {
// Recursion to calculate First of New
// Non-Terminal we encounter after
// epsilon
findfirst(production[q1][q2], q1,
(q2 + 1));
}
else
first[n++] = '#';
}
else if (!isupper(production[j][2])) {
first[n++] = production[j][2];
}
else {
// Recursion to calculate First of
// New Non-Terminal we encounter
// at the beginning
findfirst(production[j][2], j, 3);
}
}
}
}
void followfirst(char c, int c1, int c2)
{
int k;
// The case where we encounter
// a Terminal
if (!(isupper(c)))
f[m++] = c;
else {
int i = 0, j = 1;
for (i = 0; i < count; i++) {
if (calc_first[i][0] == c)
break;
}
// Including the First set of the
// Non-Terminal in the Follow of
// the original query
while (calc_first[i][j] != '!') {
if (calc_first[i][j] != '#') {
f[m++] = calc_first[i][j];
}
else {
if (production[c1][c2] == '\0') {
// Case where we reach the
// end of a production
follow(production[c1][0]);
}
else {
// Recursion to the next symbol
// in case we encounter a "#"
followfirst(production[c1][c2], c1,
c2 + 1);
}
}
j++;
}
}
}
Output:
First(X) = { q, n, o, p, #,m}
First(T) = { q, #, }
First(S) = { p, #, }
First(R) = { o, p, q, #, }
-----------------------------------------------
Follow(X) = { $, }
Follow(T) = { n, m, }
Follow(S) = { $, q, m, }
Follow(R) = { m, }
Conclusion
Understanding First and Follow sets is crucial for building efficient parsers in compiler design. These sets allow us to find which production rules are during parsing to identify the potential starting and following symbols for each non terminal. By calculating First and Follow sets, we ensure accurate syntax analysis, helping to create parsers that are both predictive and unambiguous, which is essential for building reliable compilers.
Similar Reads
Static and Dynamic Scoping
The scope of a variable x in the region of the program in which the use of x refers to its declaration. One of the basic reasons for scoping is to keep variables in different parts of the program distinct from one another. Since there are only a small number of short variable names, and programmers
6 min read
Generation of Programming Languages
Programming languages have evolved significantly over time, moving from fundamental machine-specific code to complex languages that are simpler to write and understand. Each new generation of programming languages has improved, allowing developers to create more efficient, human-readable, and adapta
6 min read
Error Handling in Compiler Design
During the process of language translation, the compiler can encounter errors. While the compiler might not always know the exact cause of the error, it can detect and analyze the visible problems. The main purpose of error handling is to assist the programmer by pointing out issues in their code. E
5 min read
Error Detection and Recovery in Compiler
Error detection and recovery are essential functions of a compiler to ensure that a program is correctly processed. Error detection refers to identifying mistakes in the source code, such as syntax, semantic, or logical errors. When an error is found, the compiler generates an error message to help
6 min read
Linker
A linker is an essential tool in the process of compiling a program. It helps combine various object modules (output from the assembler) into a single executable file that can be run on a system. The linkerâs job is to manage and connect different pieces of code and data, ensuring that all reference
8 min read
Introduction of Lexical Analysis
Lexical analysis, also known as scanning is the first phase of a compiler which involves reading the source program character by character from left to right and organizing them into tokens. Tokens are meaningful sequences of characters. There are usually only a small number of tokens for a programm
6 min read
C program to detect tokens in a C program
As it is known that Lexical Analysis is the first phase of compiler also known as scanner. It converts the input program into a sequence of Tokens. A C program consists of various tokens and a token is either a keyword, an identifier, a constant, a string literal, or a symbol.For Example: 1) Keyword
5 min read
Flex (Fast Lexical Analyzer Generator)
Flex (Fast Lexical Analyzer Generator), or simply Flex, is a tool for generating lexical analyzers scanners or lexers. Written by Vern Paxson in C, circa 1987, Flex is designed to produce lexical analyzers that is faster than the original Lex program. Today it is often used along with Berkeley Yacc
7 min read
Classification of Context Free Grammars
A Context-Free Grammar (CFG) is a formal rule system used to describe the syntax of programming languages in compiler design. It provides a set of production rules that specify how symbols (terminals and non-terminals) can be combined to form valid sentences in the language. CFGs are important in th
4 min read
Ambiguous Grammar
Context-Free Grammars (CFGs) is a way to describe the structure of a language, such as the rules for building sentences in a language or programming code. These rules help define how different symbols can be combined to create valid strings (sequences of symbols).CFGs can be divided into two types b
7 min read