
Compiler Design Lab Manual 05.02.2024_Final

The document is a lab manual for a Compiler Design course at Medi-Caps University, detailing experiments on designing a lexical analyzer and computing first and follow sets of context-free grammars (CFG). It includes objectives, theoretical background, program logic, and C code examples for implementing the lexical analyzer and calculating first and follow sets. The manual serves as a practical guide for students to understand and apply compiler design concepts.

Uploaded by en21cs301878
Copyright © All Rights Reserved

MEDI-CAPS UNIVERSITY, INDORE

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

Lab Manual

Subject: Compiler Design


Subject Code: CS3CO27

Session: 2023-24
CS3CO27: COMPILER DESIGN Experiment no- 1
Experiment Title: Write a program to design a lexical analyzer to recognize keywords

Practical 1 & 2
1. Objective (s): Write a program to design a lexical analyzer to recognize keywords.

2. Theory:
Lexical analysis is the first phase of the compiler. It takes the modified source code, written in the form of sentences, from the language preprocessor and breaks it into a series of tokens, removing whitespace from the source code. If the lexical analyzer encounters an invalid token, it generates an error. It reads the stream of characters, identifies the legal tokens, and passes them to the syntax analyzer whenever the syntax analyzer asks for one.

Terminologies:
There are three terminologies: token, pattern and lexeme.
Token: a pair consisting of a token name (a category such as keyword, identifier, operator or constant) and an optional attribute value. It represents a unit of information in the source code.
Pattern: a description of the form that the lexemes of a token may take.
Lexeme: a sequence of characters in the source code that matches the pattern of a token. It is also called an instance of the token.

Architecture of Lexical Analyzer


The most important task of a lexical analyzer is to read the input characters of the source code and produce tokens. The lexical analyzer goes through the entire source code and identifies each token one by one. The scanner produces a token whenever the parser requests one. The lexical analyzer skips whitespace and comments while creating these tokens. If an error occurs, the analyzer correlates it with the source file and line number.

Fig 1. Lexical analyzer


Roles and Responsibility of Lexical Analyzer

The lexical analyzer performs the following tasks:
- It removes whitespace and comments from the source program.
- It correlates error messages with the source program.
- It identifies the tokens.
- It reads the input characters from the source code.

Fig 2. Role of Lexical analyzer

3. Program Logic:

Lexical Analyzer for Recognizing Keywords in C Code

1. Define a function isKeyword(word: string) -> int
   a. Define an array keywords[][] containing the C keywords.
   b. Iterate over the keywords array.
   c. If word matches any keyword, return 1 (true).
   d. If no match is found, return 0 (false).
2. Define a function lexicalAnalysis(inputCode: string) -> void
   a. Define delimiters[] containing whitespace, tabs, newlines, parentheses, braces, brackets, semicolon, comma, and various operators.
   b. Use strtok to tokenize inputCode based on the delimiters.
   c. Iterate over each token.
      i. If isKeyword(token) returns true, print "Keyword: token".
3. Define the main function
   a. Print "Enter your C code (press Enter to finish):".
   b. Read one line of C code from the user (input ends when Enter is pressed).
   c. Remove the newline character from the input.
   d. Call lexicalAnalysis with the input code.
   e. Exit the program.
Example in Pseudocode:

Function isKeyword(word: string) -> int
    keywords[][] = {"auto", "break", ...}
    for each keyword in keywords
        if word equals keyword
            return 1 (true)
    return 0 (false)

Function lexicalAnalysis(inputCode: string) -> void
    delimiters[] = " \t\n(){}[];,.+-*/%&|^<>=!"
    token = strtok(inputCode, delimiters)
    while token is not NULL
        if isKeyword(token) equals 1
            print "Keyword: " + token
        token = strtok(NULL, delimiters)

Function main
    print "Enter your C code (press Enter to finish):"
    inputCode = readInputUntilEnter()
    inputCode = removeNewlineCharacter(inputCode)
    lexicalAnalysis(inputCode)
    exit program
4. Program:
#include <stdio.h>
#include <string.h>

// Returns 1 if word is one of the 32 C keywords, otherwise 0
int isKeyword(const char *word) {
    static const char keywords[][10] = {"auto", "break", "case", "char", "const", "continue", "default", "do", "double",
        "else", "enum", "extern", "float", "for", "goto", "if", "int", "long", "register", "return", "short", "signed",
        "sizeof", "static", "struct", "switch", "typedef", "union", "unsigned", "void", "volatile", "while"};

    for (size_t i = 0; i < sizeof(keywords) / sizeof(keywords[0]); i++) {
        if (strcmp(word, keywords[i]) == 0) {
            return 1;
        }
    }
    return 0;
}

// Splits inputCode into tokens at the delimiters and prints every keyword
void lexicalAnalysis(char *inputCode) {
    const char delimiters[] = " \t\n(){}[];,.+-*/%&|^<>=!";
    char *token = strtok(inputCode, delimiters);   // note: strtok modifies inputCode

    while (token != NULL) {
        if (isKeyword(token)) {
            printf("Keyword: %s\n", token);
        }
        token = strtok(NULL, delimiters);
    }
}

int main(void) {
    char cCode[1000];

    printf("Enter your C code (press Enter to finish):\n");
    if (fgets(cCode, sizeof(cCode), stdin) == NULL)
        return 1;                        // no input available
    cCode[strcspn(cCode, "\n")] = '\0';  // remove the trailing newline
    lexicalAnalysis(cCode);
    return 0;
}

5. OUTPUT:
CS3CO27: COMPILER DESIGN Experiment no- 3&4
Experiment Title: Write a program to compute first and follow set of CFG.

Practical: 3 & 4
Objective (s): Write a program to compute the FIRST and FOLLOW sets of a CFG.

2. Theory:
The FIRST set is a concept used in syntax analysis, particularly in LL and LR parsing algorithms. The FIRST set of a non-terminal A is the set of terminals that can appear as the first symbol of any string derived from A. If A can derive the empty string, then the empty string (ε) is also included in FIRST(A).
The FIRST set is used to decide which production rule should be used to expand a non-terminal. For example, in an LL parser, if the next input symbol a is in FIRST(α) for a production A -> α, then A can safely be expanded using that production.
The FOLLOW set is used to identify the terminal symbols that can appear immediately after a non-terminal. Like the FIRST set, the FOLLOW set helps the parser avoid backtracking. The difference is that FOLLOW is needed when a non-terminal can vanish (derive ε) on the right-hand side: the parser then decides based on what can come after that non-terminal, which makes decision-making easier while parsing.
Formally, Follow(X) is the set of terminals that can appear immediately to the right of the non-terminal X in some sentential form.

3. Algorithm/Rules:
First set:
1. If x is a terminal, then FIRST(x) = { x }.
2. If x -> ε is a production rule, then add ε to FIRST(x).
3. If X -> Y1 Y2 Y3 … Yn is a production:
   a. FIRST(X) contains FIRST(Y1) – { ε }.
   b. If FIRST(Y1) contains ε, then FIRST(X) also contains FIRST(Y2) – { ε }; in general, FIRST(Yi) – { ε } is added whenever Y1 … Yi-1 can all derive ε.
   c. If FIRST(Yi) contains ε for all i = 1 to n, then add ε to FIRST(X).
Follow set:
1. FOLLOW(S) = { $ }, where S is the start non-terminal.
2. If A -> pBq is a production, where B is a non-terminal and p and q are any strings of grammar symbols, then everything in FIRST(q) except ε is in FOLLOW(B).
3. If A -> pB is a production, then everything in FOLLOW(A) is in FOLLOW(B).
4. If A -> pBq is a production and FIRST(q) contains ε, then FOLLOW(B) contains { FIRST(q) – ε } ∪ FOLLOW(A).
4. Program:

// C program to calculate the First and
// Follow sets of a given grammar
#include <ctype.h>
#include <stdio.h>
#include <string.h>

// Functions to calculate Follow
void followfirst(char, int, int);
void follow(char c);

// Function to calculate First
void findfirst(char, int, int);

int count, n = 0;

// Stores the final result of the First sets
char calc_first[10][100];

// Stores the final result of the Follow sets
char calc_follow[10][100];
int m = 0;

// Stores the production rules ('#' denotes epsilon)
char production[10][10];

// Work arrays for First (first[], index n) and Follow (f[], index m).
// They are shared by all non-terminals, so they are sized generously:
// this grammar already needs more than 10 entries in first[].
char f[100], first[100];
int k;
char ck;
int e;

int main(int argc, char **argv)
{
    int jm = 0;
    int km = 0;
    int i, choice;
    char c, ch;
    count = 8;

    // The input grammar
    strcpy(production[0], "X=TnS");
    strcpy(production[1], "X=Rm");
    strcpy(production[2], "T=q");
    strcpy(production[3], "T=#");
    strcpy(production[4], "S=p");
    strcpy(production[5], "S=#");
    strcpy(production[6], "R=om");
    strcpy(production[7], "R=ST");

    int kay;
    char done[count];
    int ptr = -1;

    // Initializing the calc_first array
    for (k = 0; k < count; k++) {
        for (kay = 0; kay < 100; kay++) {
            calc_first[k][kay] = '!';
        }
    }
    int point1 = 0, point2, xxx;
    for (k = 0; k < count; k++) {
        c = production[k][0];
        point2 = 0;
        xxx = 0;

        // Checking if First of c has already been calculated
        for (kay = 0; kay <= ptr; kay++)
            if (c == done[kay])
                xxx = 1;

        if (xxx == 1)
            continue;

        // Function call
        findfirst(c, 0, 0);
        ptr += 1;

        // Adding c to the calculated list
        done[ptr] = c;
        printf("\n First(%c) = { ", c);
        calc_first[point1][point2++] = c;

        // Printing the First sets of the grammar
        for (i = 0 + jm; i < n; i++) {
            int lark = 0, chk = 0;

            for (lark = 0; lark < point2; lark++) {
                if (first[i] == calc_first[point1][lark]) {
                    chk = 1;
                    break;
                }
            }
            if (chk == 0) {
                printf("%c, ", first[i]);
                calc_first[point1][point2++] = first[i];
            }
        }
        printf("}\n");
        jm = n;
        point1++;
    }
    printf("\n");
    printf(" \n\n");
    char donee[count];
    ptr = -1;

    // Initializing the calc_follow array
    for (k = 0; k < count; k++) {
        for (kay = 0; kay < 100; kay++) {
            calc_follow[k][kay] = '!';
        }
    }
    point1 = 0;
    int land = 0;
    for (e = 0; e < count; e++) {
        ck = production[e][0];
        point2 = 0;
        xxx = 0;

        // Checking if Follow of ck has already been calculated
        for (kay = 0; kay <= ptr; kay++)
            if (ck == donee[kay])
                xxx = 1;

        if (xxx == 1)
            continue;
        land += 1;

        // Function call
        follow(ck);
        ptr += 1;

        // Adding ck to the calculated list
        donee[ptr] = ck;
        printf(" Follow(%c) = { ", ck);
        calc_follow[point1][point2++] = ck;

        // Printing the Follow sets of the grammar
        for (i = 0 + km; i < m; i++) {
            int lark = 0, chk = 0;
            for (lark = 0; lark < point2; lark++) {
                if (f[i] == calc_follow[point1][lark]) {
                    chk = 1;
                    break;
                }
            }
            if (chk == 0) {
                printf("%c, ", f[i]);
                calc_follow[point1][point2++] = f[i];
            }
        }
        printf(" }\n\n");
        km = m;
        point1++;
    }
    return 0;
}

void follow(char c)
{
    int i, j;

    // Adding "$" to the follow set of the start symbol
    if (production[0][0] == c) {
        f[m++] = '$';
    }
    for (i = 0; i < count; i++) {
        for (j = 2; j < 10; j++) {
            if (production[i][j] == c) {
                if (production[i][j + 1] != '\0') {
                    // Calculate the first of the next
                    // symbol in the production
                    followfirst(production[i][j + 1], i, (j + 2));
                }

                if (production[i][j + 1] == '\0'
                    && c != production[i][0]) {
                    // Calculate the follow of the non-terminal
                    // on the L.H.S. of the production
                    follow(production[i][0]);
                }
            }
        }
    }
}

void findfirst(char c, int q1, int q2)
{
    int j;

    // The case where we encounter a terminal
    if (!(isupper(c))) {
        first[n++] = c;
    }
    for (j = 0; j < count; j++) {
        if (production[j][0] == c) {
            if (production[j][2] == '#') {
                if (production[q1][q2] == '\0')
                    first[n++] = '#';
                else if (production[q1][q2] != '\0'
                         && (q1 != 0 || q2 != 0)) {
                    // Recursion to calculate First of the new
                    // symbol we encounter after epsilon
                    findfirst(production[q1][q2], q1, (q2 + 1));
                }
                else
                    first[n++] = '#';
            }
            else if (!isupper(production[j][2])) {
                first[n++] = production[j][2];
            }
            else {
                // Recursion to calculate First of the new
                // non-terminal we encounter at the beginning
                findfirst(production[j][2], j, 3);
            }
        }
    }
}

void followfirst(char c, int c1, int c2)
{
    // The case where we encounter a terminal
    if (!(isupper(c)))
        f[m++] = c;
    else {
        int i = 0, j = 1;
        for (i = 0; i < count; i++) {
            if (calc_first[i][0] == c)
                break;
        }

        // Including the First set of the non-terminal
        // in the Follow of the original query
        while (calc_first[i][j] != '!') {
            if (calc_first[i][j] != '#') {
                f[m++] = calc_first[i][j];
            }
            else {
                if (production[c1][c2] == '\0') {
                    // Case where we reach the end of a production
                    follow(production[c1][0]);
                }
                else {
                    // Recursion to the next symbol
                    // in case we encounter a "#"
                    followfirst(production[c1][c2], c1, c2 + 1);
                }
            }
            j++;
        }
    }
}

5. Output:
First(X) = { q, n, o, p, #,m}
First(T) = { q, #, }
First(S) = { p, #, }
First(R) = { o, p, q, #, }

Follow(X) = { $, }
Follow(T) = { n, m, }
Follow(S) = { $, q, m, }
Follow(R) = { m, }
CS3CO27: COMPILER DESIGN Experiment no- 5
Experiment Title: Write a program for implementation of Predictive Parsing Table for LL (1) grammar

Practical 5:

Objective (s): Write a program for implementation of Predictive Parsing Table for LL (1) grammar
2. Theory:
Construction of LL(1) Parsing Table

LL(1) Parsing: The first L means that the input is scanned from left to right, and the second L means that the parser produces a leftmost derivation. Finally, the 1 is the number of lookahead symbols, that is, how many input symbols the parser examines when it makes a decision.

Essential conditions to check first are as follows:

- The grammar is free from left recursion.
- The grammar is not ambiguous.
- The grammar is left factored, so that it is deterministic.
These conditions are necessary but not sufficient for a grammar to be LL(1).
3. Algorithm:

Algorithm to construct LL(1) Parsing Table:

Step 1: First check all the essential conditions mentioned above, then go to Step 2.

Step 2: Calculate First() and Follow() for all non-terminals.

First(): if we derive all possible strings from a variable (non-terminal), the terminal symbols that can begin those strings form its First set.
Follow(): the terminal symbols that can immediately follow a variable at some point in a derivation form its Follow set.
Step 3: For each production A –> α (A derives alpha):

- Find First(α), and for each terminal in First(α), make the entry A –> α in the table.
- If First(α) contains ε (epsilon), then find Follow(A), and for each terminal in Follow(A), make the entry A –> α in the table (for α = ε this is A –> ε).
- If First(α) contains ε and Follow(A) contains $, then make the entry A –> α in the table under $.
The table is built from these two functions, First() and Follow(). Its rows contain the non-terminals and its columns contain the terminal symbols (including $). All the null productions of the grammar go under the elements of the Follow set, and the remaining productions go under the elements of the First set.

Now, let’s understand this with an example.
Example 1: Consider the grammar:

E --> TE'
E' --> +TE' | ε
T --> FT'
T' --> *FT' | ε
F --> id | (E)

*ε denotes epsilon
Step 1: The grammar satisfies all the essential conditions of Step 1.
Step 2: Calculate first() and follow(). Their First and Follow sets are:

Production           First        Follow
E  –> TE'            { id, ( }    { $, ) }
E' –> +TE' / ε       { +, ε }     { $, ) }
T  –> FT'            { id, ( }    { +, $, ) }
T' –> *FT' / ε       { *, ε }     { +, $, ) }
F  –> id / (E)       { id, ( }    { *, +, $, ) }

Step 3: Make a parser table.

Now, the LL(1) parsing table is:

      id         +            *            (          )         $
E     E –> TE'                             E –> TE'
E'               E' –> +TE'                           E' –> ε   E' –> ε
T     T –> FT'                             T –> FT'
T'               T' –> ε      T' –> *FT'              T' –> ε   T' –> ε
F     F –> id                              F –> (E)

As you can see, all the null productions are put under the Follow set of their left-hand side, and all the remaining productions lie under the First set of their right-hand side.

Note: Not every grammar is suitable for an LL(1) parsing table. For some grammars a cell may contain more than one production; such a grammar is not LL(1).

4. Program:
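#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Single-character encoding of the grammar:
// e = E, b = E', t = T, c = T', f = F, i = id, n = epsilon entry
char s[20], stack[100];

int main()
{
    // m[row][col] holds the right-hand side to push.
    // Rows: e, b, t, c, f.  Columns: i, +, *, (, ), $.  " " marks an error entry.
    char m[5][6][4] = {
        {"tb", " ", " ", "tb", " ", " "},
        {" ", "+tb", " ", " ", "n", "n"},
        {"fc", " ", " ", "fc", " ", " "},
        {" ", "n", "*fc", " ", "n", "n"},
        {"i", " ", " ", "(e)", " ", " "}
    };
    int i, j, k, n, str1, str2;

    printf("\n Enter the input string: ");
    if (scanf("%18s", s) != 1)
        return 1;
    strcat(s, "$");
    n = strlen(s);
    stack[0] = '$';
    stack[1] = 'e';
    i = 1;   // index of the top of the stack
    j = 0;   // index of the current input symbol
    printf("\nStack Input\n");
    printf("\n");
    // Parse until both the stack and the input are reduced to $
    while ((stack[i] != '$') || (s[j] != '$'))
    {
        // Terminal on top of the stack matches the input: pop and advance
        if (stack[i] == s[j])
        {
            i--;
            j++;
            continue;
        }
        switch (stack[i])
        {
        case 'e': str1 = 0; break;
        case 'b': str1 = 1; break;
        case 't': str1 = 2; break;
        case 'c': str1 = 3; break;
        case 'f': str1 = 4; break;
        default:  // a terminal (or $) that does not match the input
            printf("\nERROR");
            exit(0);
        }
        switch (s[j])
        {
        case 'i': str2 = 0; break;
        case '+': str2 = 1; break;
        case '*': str2 = 2; break;
        case '(': str2 = 3; break;
        case ')': str2 = 4; break;
        case '$': str2 = 5; break;
        default:  // symbol outside the grammar
            printf("\nERROR");
            exit(0);
        }
        if (m[str1][str2][0] == ' ')
        {
            // Blank table entry: syntax error
            printf("\nERROR");
            exit(0);
        }
        else if (m[str1][str2][0] == 'n')
            i--;    // epsilon production: just pop the non-terminal
        else
        {
            // Replace the non-terminal by the right-hand side, pushed in reverse
            for (k = (int)strlen(m[str1][str2]) - 1; k >= 0; k--)
            {
                stack[i] = m[str1][str2][k];
                i++;
            }
            i--;
        }
        for (k = 0; k <= i; k++)
            printf(" %c", stack[k]);
        printf("    ");
        for (k = j; k < n; k++)
            printf("%c", s[k]);
        printf("\n");
    }
    printf("\n SUCCESS");
    return 0;
}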

5. Output:
CS3CO27: COMPILER DESIGN Experiment no- 6
Experiment Title: Write a program for implementation of Predictive Parser

Practical: 6

Objective (s): Write a program for implementation of Predictive Parser

Theory:

Predictive Parser in Compiler Design

Predictive Parser:
A predictive parser is a top-down recursive descent parser that needs no backtracking or backup. At each step, the rule used to expand a non-terminal is chosen on the basis of the next input (terminal) symbol.

A -> α1 | α2 | ... | αn
If the non-terminal A is to be expanded further, the alternative is selected on the basis of the current input symbol 'a' only.

Predictive Parser Algorithm:

1. Make a transition diagram (DFA/NFA) for every rule of the grammar.
2. Optimize the DFA by reducing the number of states, yielding the final transition diagram.
3. Simulate the string on the transition diagram to parse it.
4. If the transition diagram reaches an accept state after the input is consumed, the string is parsed.
Consider the following grammar –
E->E+T|T
T->T*F|F
F->(E)|id
After removing left recursion and left factoring, the grammar becomes:

E->TT'
T'->+TT'|ε
T->FT''
T''->*FT''|ε
F->(E)|id

STEP 1:
Make a transition diagram (DFA/NFA) for every rule of the grammar:
E->TT'
T'->+TT'|ε
T->FT''
T''->*FT''|ε
F->(E)|id

STEP 2:
Optimize the DFA by decreasing the number of states, yielding the final transition diagram.
The DFA for T'->+TT'|ε can be optimized further by combining it with the DFA for E->TT'.
Accordingly, we optimize the other diagrams to produce the following DFA.

STEP 3:
Simulation on the input string.
The steps involved in the simulation procedure are:
1. Start from the starting state.
2. If a terminal arrives, consume it and move to the next state.
3. If a non-terminal arrives, go to the state of the DFA of that non-terminal and return once its final state is reached.
4. Return to the original DFA and keep parsing.
5. If the input string has been read completely and a final state is reached, the string is successfully parsed.
Program:
#include <stdio.h>
#include <string.h>

// Each production split into its left side (prol), right side (pror)
// and full text (prod); '@' denotes epsilon
char prol[7][10] = { "S", "A", "A", "B", "B", "C", "C" };
char pror[7][10] = { "A", "Bb", "Cd", "aB", "@", "Cc", "@" };
char prod[7][10] = { "S->A", "A->Bb", "A->Cd", "B->aB", "B->@", "C->Cc", "C->@" };
char first[7][10] = { "abcd", "ab", "cd", "a@", "@", "c@", "@" };
char follow[7][10] = { "$", "$", "$", "a$", "b$", "c$", "d$" };
char table[5][6][10];

// Returns the row of a non-terminal or the column of a terminal
// (both 0-based, before the +1 offset for the heading row/column)
int numr(char c)
{
    switch (c) {
    case 'S': return 0;
    case 'A': return 1;
    case 'B': return 2;
    case 'C': return 3;
    case 'a': return 0;
    case 'b': return 1;
    case 'c': return 2;
    case 'd': return 3;
    case '$': return 4;
    }
    return (2);
}

int main()
{
    int i, j, k;
    for (i = 0; i < 5; i++)
        for (j = 0; j < 6; j++)
            strcpy(table[i][j], " ");
    printf("The following grammar is used for Parsing Table:\n");
    for (i = 0; i < 7; i++)
        printf("%s\n", prod[i]);
    printf("\nPredictive parsing table:\n");

    // Enter each production under the terminals of its First set.
    // Note: j runs over all 10 bytes of first[i], so the trailing '\0'
    // bytes also reach numr(), whose default case maps them to column 'c'.
    for (i = 0; i < 7; i++) {
        for (j = 0; j < 10; j++)
            if (first[i][j] != '@')
                strcpy(table[numr(prol[i][0]) + 1][numr(first[i][j]) + 1], prod[i]);
    }
    // Enter each epsilon production under the terminals of its Follow set
    for (i = 0; i < 7; i++) {
        if (strlen(pror[i]) == 1) {
            if (pror[i][0] == '@') {
                k = strlen(follow[i]);
                for (j = 0; j < k; j++)
                    strcpy(table[numr(prol[i][0]) + 1][numr(follow[i][j]) + 1], prod[i]);
            }
        }
    }
    // Heading row (terminals) and heading column (non-terminals)
    strcpy(table[0][0], " ");
    strcpy(table[0][1], "a");
    strcpy(table[0][2], "b");
    strcpy(table[0][3], "c");
    strcpy(table[0][4], "d");
    strcpy(table[0][5], "$");
    strcpy(table[1][0], "S");
    strcpy(table[2][0], "A");
    strcpy(table[3][0], "B");
    strcpy(table[4][0], "C");

    for (i = 0; i < 5; i++)
        for (j = 0; j < 6; j++) {
            printf("%-10s", table[i][j]);
            if (j == 5)
                printf("\n");
        }
    return 0;
}

Output:
The following grammar is used for Parsing Table:
S->A
A->Bb
A->Cd
B->aB
B->@
C->Cc
C->@

Predictive parsing table:
          a         b         c         d         $
S         S->A      S->A      S->A      S->A
A         A->Bb     A->Bb     A->Cd     A->Cd
B         B->aB     B->@      B->@                B->@
C                             C->@      C->@      C->@


CS3CO27: COMPILER DESIGN Experiment no- 7
Experiment Title: Write a program to develop an operator precedence parser.

Practical 7
Objective
Write a program to develop an operator precedence parser.
Theory
Operator precedence parser – An operator precedence parser is a bottom-up parser that interprets an operator grammar, and it is used only for operator grammars. Unlike other parsers, it can also work with certain ambiguous grammars (such as E -> E+E | E*E | id), because the precedence relations resolve the ambiguity. There are two methods for determining which precedence relations should hold between a pair of terminals:

1. Use the conventional associativity and precedence of the operators.
2. First construct an unambiguous grammar for the language, one that reflects the correct associativity and precedence in its parse trees, and derive the relations from it.
This parser relies on the following three precedence relations: ⋖, ≐, ⋗.
a ⋖ b means a “yields precedence to” b.
a ⋗ b means a “takes precedence over” b.
a ≐ b means a “has the same precedence as” b.

Figure – Operator precedence relation table for grammar E->E+E/E*E/id
No relation is given between id and id, because id is never compared with id: two operands cannot come side by side. This table also has a disadvantage: if there are n operators, the table has n*n entries, so its size grows as O(n²). To decrease the size of the table, we use a precedence function table. Operator precedence parsers usually do not store the precedence table with the relations; rather, they are implemented in a special way. They use precedence functions that map terminal symbols to integers, and the precedence relations between the symbols are implemented by numerical comparison. The parsing table can be encoded by two precedence functions f and g that map terminal symbols to integers. We select f and g such that:

f(a) < g(b) whenever a yields precedence to b
f(a) = g(b) whenever a and b have the same precedence
f(a) > g(b) whenever a takes precedence over b

Example – Consider the following grammar:
E -> E + E / E * E / ( E ) / id
This is the directed graph representing the precedence function:

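Since there is no cycle in the graph, we can make the function table. The value of each function is the length of the longest path starting from its node; for example, the longest paths from fid and gid are:
fid -> g* -> f+ -> g+ -> f$
gid -> f* -> g* -> f+ -> g+ -> f$
so f(id) = 4 and g(id) = 5.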
The size of the function table is 2n. One disadvantage of function tables is that blank entries in the relation table become non-blank entries in the function table. Since blank entries denote errors, the error-detection capability of the relation table is greater than that of the function table.
Algorithm

Program
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

// Function f exits from the program
// when the grammar violates a condition
void f(void)
{
    printf("Not operator grammar");
    exit(0);
}

int main(void)
{
    char grm[20][20];
    int i, j, n;

    // Number of productions, followed by each production, e.g. A=A*A
    // (upper-case letters are non-terminals, '$' stands for epsilon)
    if (scanf("%d", &n) != 1 || n < 1 || n > 20)
        return 1;
    for (i = 0; i < n; i++)
        if (scanf("%19s", grm[i]) != 1)
            return 1;

    for (i = 0; i < n; i++) {
        // The right-hand side starts at index 2, after "A="
        for (j = 2; grm[i][j] != '\0'; j++) {
            // Condition 1: no epsilon ('$') on the right-hand side
            if (grm[i][j] == '$')
                f();
            // Condition 2: no two adjacent non-terminals
            if (isupper((unsigned char)grm[i][j])
                && isupper((unsigned char)grm[i][j + 1]))
                f();
        }
    }
    printf("Operator grammar");
    return 0;
}
Output
CS3CO27: COMPILER DESIGN Experiment no- 8
Experiment Title: Write a program to design LALR Bottom-up Parser.

Practical 8

Objective: Write a program to design LALR Bottom-up Parser.

Theory:
LALR Parser:
The LALR parser is a lookahead LR parser. It can handle a large class of grammars: more than SLR, though fewer than canonical LR (CLR). The CLR parsing table is quite large compared to other parsing tables, and LALR reduces its size. LALR works like CLR; the only difference is that it combines the states of the CLR parsing table that have the same core (they differ only in their lookaheads) into a single state.
The general syntax of an item becomes [A->.B, a]
where A->.B is a production and a is a terminal or the right end marker $.
LR(1) items = LR(0) items + lookahead

How to add lookahead to a production?
CASE 1 –

A->.BC, a

Suppose this is the 0th production. Since '.' precedes B, we have to write B's productions as well:
B->.D [1st production]
The lookahead of this production is found from the previous production, i.e. the 0th production: we take FIRST of whatever comes after B, and that is the lookahead of the 1st production. Here, in the 0th production, C comes after B. Assume FIRST(C) = d; then the 1st production becomes:
B->.D, d
CASE 2 –
Now suppose the 0th production was:
A->.B, a

Here, there is nothing after B, so the lookahead of the 0th production becomes the lookahead of the 1st production, i.e.
B->.D, a
CASE 3 –
Assume a production A->a|b:
A->a, $ [0th production]
A->b, $ [1st production]

Here, the 1st production is part of the previous production, so its lookahead is the same as that of the previous production.
Steps for constructing the LALR parsing table:
1. Write the augmented grammar.
2. Find the LR(1) collection of items.
3. Define the two functions action[list of terminals] and goto[list of non-terminals] in the LALR parsing table.
EXAMPLE
Construct the CLR parsing table for the given context-free grammar:
S-->AA
A-->aA|b
Solution:
STEP1 – Find the augmented grammar
The augmented grammar of the given grammar is:
S'-->.S ,$ [0th production]
S-->.AA ,$ [1st production]
A-->.aA ,a|b [2nd production]
A-->.b ,a|b [3rd production]
Let’s apply the lookahead rules to the above productions. The initial lookahead is always $.
Now, the 1st production came into existence because of the '.' before 'S' in the 0th production. There is nothing after 'S', so the lookahead of the 0th production becomes the lookahead of the 1st production, i.e. S–>.AA ,$
Now, the 2nd production came into existence because of the '.' before the first 'A' in the 1st production. After that 'A' comes another 'A', and FIRST(A) = {a, b}. Therefore, the lookahead of the 2nd production becomes a|b.
Now, the 3rd production is part of the 2nd production, so its lookahead is the same.

STEP2 – Find the LR(1) collection of items

Below is the figure showing the LR(1) collection of items. We will go through it one transition at a time.
The terminals of this grammar are {a, b}.
The non-terminals of this grammar are {S, A}.
RULES –
1. If any non-terminal has '.' preceding it, we have to write all its productions and add '.' at the beginning of each of them.
2. From each state to the next state, the '.' shifts one place to the right.
In the figure, I0 consists of the augmented grammar.
I0 goes to I1 when the '.' of the 0th production is shifted to the right of S (S'->S.). S is seen by the compiler. This state is the accept state. Since I1 is part of the 0th production, the lookahead is the same, i.e. $.
I0 goes to I2 when the '.' of the 1st production is shifted to the right (S->A.A). A is seen by the compiler. Since I2 is part of the 1st production, the lookahead is the same, i.e. $.
I0 goes to I3 when the '.' of the 2nd production is shifted to the right (A->a.A). a is seen by the compiler. Since I3 is part of the 2nd production, the lookahead is the same, i.e. a|b.
I0 goes to I4 when the '.' of the 3rd production is shifted to the right (A->b.). b is seen by the compiler. Since I4 is part of the 3rd production, the lookahead is the same, i.e. a|b.
I2 goes to I5 when the '.' of the 1st production is shifted to the right (S->AA.). A is seen by the compiler. Since I5 is part of the 1st production, the lookahead is the same, i.e. $.
I2 goes to I6 when the '.' of the 2nd production is shifted to the right (A->a.A). a is seen by the compiler. Since I6 is part of the 2nd production, the lookahead is the same, i.e. $.
I2 goes to I7 when the '.' of the 3rd production is shifted to the right (A->b.). b is seen by the compiler. Since I7 is part of the 3rd production, the lookahead is the same, i.e. $.
I3 goes to I3 when the '.' of the 2nd production is shifted to the right (A->a.A). a is seen by the compiler. Since I3 is part of the 2nd production, the lookahead is the same, i.e. a|b.
I3 goes to I4 when the '.' of the 3rd production is shifted to the right (A->b.). b is seen by the compiler. Since I4 is part of the 3rd production, the lookahead is the same, i.e. a|b.
I3 goes to I8 when the '.' of the 2nd production is shifted to the right (A->aA.). A is seen by the compiler. Since I8 is part of the 2nd production, the lookahead is the same, i.e. a|b.
I6 goes to I9 when the '.' of the 2nd production is shifted to the right (A->aA.). A is seen by the compiler. Since I9 is part of the 2nd production, the lookahead is the same, i.e. $.
I6 goes to I6 when the '.' of the 2nd production is shifted to the right (A->a.A). a is seen by the compiler. Since I6 is part of the 2nd production, the lookahead is the same, i.e. $.
I6 goes to I7 when the '.' of the 3rd production is shifted to the right (A->b.). b is seen by the compiler. Since I7 is part of the 3rd production, the lookahead is the same, i.e. $.
STEP 3 –
Define the two functions action[list of terminals] and goto[list of non-terminals] in the parsing table. Below is the CLR parsing table.
Once we have a CLR parsing table, we can easily make an LALR parsing table from it. In the Step 2 diagram, we can see that:
- I3 and I6 are similar except for their lookaheads.
- I4 and I7 are similar except for their lookaheads.
- I8 and I9 are similar except for their lookaheads.
In LALR parsing table construction, we merge these similar states:
- Wherever there is 3 or 6, make it 36 (combined form).
- Wherever there is 4 or 7, make it 47 (combined form).
- Wherever there is 8 or 9, make it 89 (combined form).
Below is the LALR parsing table.
Now we have to remove the unwanted rows:
- The 36 row has the same data twice, so we delete one copy.
- We combine the two 47 rows into one by combining each value into the single 47 row.
- We combine the two 89 rows into one by combining each value into the single 89 row.
The final LALR table looks like the one below.

Algorithm

token = next_token()

repeat forever
    s = top of stack

    if action[s, token] = "shift si" then
        PUSH token
        PUSH si
        token = next_token()

    else if action[s, token] = "reduce A ::= β" then
        POP 2 * |β| symbols
        s = top of stack
        PUSH A
        PUSH goto[s, A]

    else if action[s, token] = "accept" then
        return

    else
        error()
Program
<parser.l>
%{
#include <stdio.h>
#include <stdlib.h>
#include "y.tab.h"
%}
%%
[0-9]+  { yylval.dval = atof(yytext); return DIGIT; }
\n|.    return yytext[0];
%%

<parser.y>
%{
/* This YACC specification file generates the LALR parser for
   the program considered in experiment 4. */
#include <stdio.h>
%}
%union { double dval; }
%token <dval> DIGIT
%type <dval> expr
%type <dval> term
%type <dval> factor
%%
line:   expr '\n'           { printf("%g\n", $1); }
        ;
expr:   expr '+' term       { $$ = $1 + $3; }
        | term
        ;
term:   term '*' factor     { $$ = $1 * $3; }
        | factor
        ;
factor: '(' expr ')'        { $$ = $2; }
        | DIGIT
        ;
%%
int main (){ Print(b

Output
$ lex parser.l
$ yacc -d parser.y
$ cc lex.yy.c y.tab.c -ll -lm
$ ./a.out
2+3
5
CS3CO27: COMPILER DESIGN Experiment no- 9
Experiment Title: Write a program for generating various intermediate code forms-Polish notation:
a. Infix to prefix
b. Infix to postfix

Practical 9
1. Objective (s): Write a program for generating various intermediate code forms-Polish notation:
a. Infix to prefix
b. Infix to postfix

2. Theory:
In the analysis-synthesis model of a compiler, the front end translates a source program into machine-independent intermediate code, and the back end then uses this intermediate code to generate the target code (which the machine can execute). The benefits of using machine-independent intermediate code are:
1. Portability is enhanced. If a compiler translated the source language directly into target machine language, with no option of generating intermediate code, then every new machine would need a full native compiler, because the compiler itself would have to be modified according to that machine's specifications.
2. Retargeting is facilitated.
3. It is easier to improve the performance of the program by applying optimizations to the intermediate code.
4. If we generate machine code directly from source code, then for n target machines we need n optimizers and n code generators; with machine-independent intermediate code, we need only one optimizer.
Intermediate code can be either language-specific (e.g., bytecode for Java) or language-independent (e.g., three-address code). The following are commonly used intermediate code representations:
Postfix Notation:
Also known as reverse Polish notation or suffix notation.
In infix notation, the operator is placed between its operands, e.g., a + b. Postfix notation places the operator after its operands, as in ab+.
If e1 and e2 are the postfix forms of expressions E1 and E2, then the postfix form of E1 + E2 is e1e2+.
Postfix notation needs no parentheses, because the position and arity of each operator allow the expression to be decoded unambiguously.
In postfix notation, an operator always follows its operands.
Example 1: The postfix representation of the expression (a + b) * c is: ab+c*
Example 2: The postfix representation of the expression (a - b) * (c + d) + (a - b) is: ab-cd+*ab-+

Three-Address Code:
A three-address statement involves at most three references: two for operands and one for the result.
A sequence of three-address statements forms three-address code.
The typical form of a three-address statement is x = y op z, where x, y, and z represent memory addresses; each variable in the statement is associated with a specific memory location.
A statement may contain fewer than three references (for example, x = y or x = -y) and is still classified as a three-address statement.
Example: The three-address code for the expression a + b * c + d is:
T1 = b * c
T2 = a + T1
T3 = T2 + d
where T1, T2 and T3 are temporary variables.
There are 3 ways to represent three-address code in compiler design:
i) Quadruples
ii) Triples
iii) Indirect Triples


Syntax Tree:
A syntax tree is a condensed representation of a parse tree.
The operator and keyword nodes of the parse tree are moved up to become their respective parent nodes in the syntax tree, so the internal nodes are operators and the leaves are operands.
Fully parenthesizing the expression before building the tree makes the order in which operands are combined explicit, which makes the tree easier to construct.
Besides condensing the parse tree, the syntax tree gives a clearer picture of the program's syntactic structure.
Example: x = (a + b * c) / (a - b * c)
Advantages of Intermediate Code Generation:
Easier to implement: Intermediate code generation can simplify the code generation process by reducing
the complexity of the input code, making it easier to implement.

Facilitates code optimization: Intermediate code generation can enable the use of various code
optimization techniques, leading to improved performance and efficiency of the generated code.

Platform independence: Intermediate code is platform-independent, meaning that it can be translated into machine code or bytecode for any platform.

Code reuse: Intermediate code can be reused in the future to generate code for other platforms or
languages.

Easier debugging: Intermediate code can be easier to debug than machine code or bytecode, as it is closer
to the original source code.

Disadvantages of Intermediate Code Generation:


Increased compilation time: Intermediate code generation adds an extra translation pass, which increases compilation time and can make it less suitable for real-time or time-critical applications.

Additional memory usage: Intermediate code generation requires additional memory to store the
intermediate representation, which can be a concern for memory-limited systems.

Increased complexity: Intermediate code generation can increase the complexity of the compiler design,
making it harder to implement and maintain.

Reduced performance: Without adequate optimization, code generated through an intermediate representation can execute more slowly than code generated directly from the source code.

3. Algorithm:
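Infix to postfix:
Step1: Scan the infix expression from left to right.
Step2: If the scanned character is an operand, append it to the output.
Step3: If it is '(', push it onto the stack.
Step4: If it is ')', pop operators from the stack to the output until '(' is found, then discard the '('.
Step5: If it is an operator, pop to the output every operator on top of the stack that has higher precedence, or equal precedence when the scanned operator is left-associative; then push the scanned operator.
Step6: After the whole expression is scanned, pop all remaining operators to the output.
Infix to prefix:
Step1: Reverse the infix expression and swap every '(' with ')'.
Step2: Convert the result to postfix as above, with the associativity of each operator reversed (^ is treated as left-associative and the other operators as right-associative).
Step3: Reverse the postfix expression to obtain the prefix expression.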
4. Program:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
// Function to return precedence of operators
int prec(char c) {
if (c == '^')
return 3;
else if (c == '/' || c == '*')
return 2;
else if (c == '+' || c == '-')
return 1;
else
return -1;
}
// Function to return associativity of operators
char associativity(char c) {
if (c == '^')
return 'R';
return 'L'; // Default to left-associative
}

// The main function to convert infix expression to postfix expression
void infixToPostfix(char s[]) {
    char result[1000];
    int resultIndex = 0;
    int len = strlen(s);
    char stack[1000];
    int stackIndex = -1;

    for (int i = 0; i < len; i++) {
        char c = s[i];

        // If the scanned character is an operand, add it to the output string.
        if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')) {
            result[resultIndex++] = c;
        }
        // If the scanned character is an '(', push it to the stack.
        else if (c == '(') {
            stack[++stackIndex] = c;
        }
        // If the scanned character is an ')', pop and add to the output string
        // from the stack until an '(' is encountered.
        else if (c == ')') {
            while (stackIndex >= 0 && stack[stackIndex] != '(') {
                result[resultIndex++] = stack[stackIndex--];
            }
            stackIndex--; // Pop '('
        }
        // If an operator is scanned, pop operators of higher precedence (or equal
        // precedence when c is left-associative), then push c.
        else {
            while (stackIndex >= 0 &&
                   (prec(c) < prec(stack[stackIndex]) ||
                    (prec(c) == prec(stack[stackIndex]) && associativity(c) == 'L'))) {
                result[resultIndex++] = stack[stackIndex--];
            }
            stack[++stackIndex] = c;
        }
    }

    // Pop all the remaining elements from the stack.
    while (stackIndex >= 0) {
        result[resultIndex++] = stack[stackIndex--];
    }

    result[resultIndex] = '\0';
    printf("%s\n", result);
}

// Driver code
int main() {
char exp[] = "a+b*(c^d-e)^(f+g*h)-i";

// Function call
infixToPostfix(exp);

return 0;
}

Infix to prefix:
// C++ program to convert infix to prefix
#include <algorithm>
#include <cctype>
#include <iostream>
#include <stack>
#include <string>
using namespace std;

// Function to check if the character is an operator


bool isOperator(char c)
{
return (!isalpha(c) && !isdigit(c));
}

// Function to get the priority of operators


int getPriority(char C)
{
if (C == '-' || C == '+')
return 1;
else if (C == '*' || C == '/')
return 2;
else if (C == '^')
return 3;
return 0;
}

// Function to convert the infix expression to postfix
string infixToPostfix(string infix)
{
    infix = '(' + infix + ')';
    int l = infix.size();
    stack<char> char_stack;
    string output;

    for (int i = 0; i < l; i++) {

        // If the scanned character is an
        // operand, add it to output.
        if (isalpha(infix[i]) || isdigit(infix[i]))
            output += infix[i];

        // If the scanned character is an
        // '(', push it to the stack.
        else if (infix[i] == '(')
            char_stack.push('(');

        // If the scanned character is an
        // ')', pop and output from the stack
        // until an '(' is encountered.
        else if (infix[i] == ')') {
            while (char_stack.top() != '(') {
                output += char_stack.top();
                char_stack.pop();
            }

            // Remove '(' from the stack
            char_stack.pop();
        }

        // Operator found
        else {
            if (isOperator(char_stack.top())) {
                if (infix[i] == '^') {
                    while (getPriority(infix[i]) <= getPriority(char_stack.top())) {
                        output += char_stack.top();
                        char_stack.pop();
                    }
                }
                else {
                    while (getPriority(infix[i]) < getPriority(char_stack.top())) {
                        output += char_stack.top();
                        char_stack.pop();
                    }
                }
            }

            // Push current operator on stack
            char_stack.push(infix[i]);
        }
    }
    while (!char_stack.empty()) {
        output += char_stack.top();
        char_stack.pop();
    }
    return output;
}

// Function to convert infix to prefix notation


string infixToPrefix(string infix)
{
// Reverse String and replace ( with ) and vice versa
// Get Postfix
// Reverse Postfix
int l = infix.size();
// Reverse infix
reverse(infix.begin(), infix.end());

// Replace ( with ) and vice versa
for (int i = 0; i < l; i++) {

if (infix[i] == '(') {
infix[i] = ')';
}
else if (infix[i] == ')') {
infix[i] = '(';
}
}

string prefix = infixToPostfix(infix);

// Reverse postfix
reverse(prefix.begin(), prefix.end());

return prefix;
}

// Driver code
int main()
{
string s = ("x+y*z/w+u");

// Function call
cout << infixToPrefix(s) << std::endl;
return 0;
}

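5. Output:
Infix to postfix (input a+b*(c^d-e)^(f+g*h)-i):
abcd^e-fgh*+^*+i-
Infix to prefix (input x+y*z/w+u):
++x/*yzwu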
CS3CO27: COMPILER DESIGN Experiment no- 10
Experiment Title: Write a program to perform heap storage allocation strategies

Practical 10

Objective:
Write a program to perform heap storage allocation strategies

Theory
Heap Allocation

Heap allocation is used where stack allocation falls short. If we want to retain
the values of local variables after the activation record ends, stack allocation
cannot do it, because its LIFO scheme does not fit the allocation and
de-allocation of such activation records. The heap is the most flexible storage
allocation strategy: local variables can be allocated and de-allocated
dynamically, whenever the program needs them, at run time, and heap storage
can be sized according to the program's requirements. C, C++, Python, and
Java all support heap allocation.

For example, in C++:

int* ans = new int[5];


Advantages of Heap Allocation

1. Heap allocation is useful when we have data whose size is not fixed and
can change during the run time.

2. We can retain the values of variables even if the activation records end.

3. Heap allocation is the most flexible allocation scheme.

Disadvantages of Heap Allocation

1. Heap allocation is slower than stack allocation.

2. Memory leaks are possible if allocated blocks are never freed.

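Algorithm
Step1: Start with an empty list (head = NULL).

Step2: Create: repeatedly read a value, request a new node from the heap with malloc(), store the value in it and link it at the end of the list.

Step3: Display: traverse the list from the head and print the data of each node.

Step4: Insert: allocate a new node and link it at the head, at the end, or after the node holding a given key.

Step5: Delete: search for the node holding the key, unlink it from the list and release its memory with free().

Step6: Repeat the menu until the user chooses to quit.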


Program
//implementation of heap allocation storage strategies//
#include <stdio.h>
#include <stdlib.h>

#define TRUE 1
#define FALSE 0

typedef struct Heap {
    int data;
    struct Heap *next;
} node;

node *create(void);
node *get_node(void);
void display(node *);
node *search(node *, int);
node *insert(node *);
node *insert_head(node *);
node *insert_last(node *);
node *insert_after(node *);
node *get_prev(node *, int);
void dele(node **);

int main(void)
{
    int choice;
    node *head = NULL;

    do {
        printf("\nprogram to perform various operations on heap using dynamic memory management");
        printf("\n1.create");
        printf("\n2.display");
        printf("\n3.insert an element in a list");
        printf("\n4.delete an element from list");
        printf("\n5.quit");
        printf("\nenter your choice(1-5)");
        if (scanf("%d", &choice) != 1)
            return 1;                /* stop on non-numeric input or EOF */
        switch (choice) {
        case 1: head = create(); break;
        case 2: display(head); break;
        case 3: head = insert(head); break;
        case 4: dele(&head); break;
        case 5: exit(0);
        default: printf("invalid choice,try again");
        }
    } while (choice != 5);
    return 0;
}

/* Builds a new list from user input; every node is taken from the heap. */
node *create(void)
{
    node *temp = NULL, *New, *head = NULL;
    int val;
    char ans = 'y';

    do {
        printf("\n enter the element:");
        scanf("%d", &val);
        New = get_node();
        if (New == NULL) {
            printf("\nmemory is not allocated");
            return head;
        }
        New->data = val;
        if (head == NULL)
            head = New;              /* first node becomes the head */
        else
            temp->next = New;        /* link after the current last node */
        temp = New;
        printf("\ndo you want to enter more elements?(y/n)");
        scanf(" %c", &ans);          /* leading space skips the pending newline */
    } while (ans == 'y');
    printf("\nthe list is created\n");
    return head;
}

/* Requests one node from the heap with malloc(). */
node *get_node(void)
{
    node *temp = malloc(sizeof(node));
    if (temp != NULL)
        temp->next = NULL;
    return temp;
}

void display(node *head)
{
    node *temp = head;
    if (temp == NULL) {
        printf("\nthe list is empty\n");
        return;
    }
    while (temp != NULL) {
        printf("%d->", temp->data);
        temp = temp->next;
    }
    printf("NULL");
}

node *search(node *head, int key)
{
    node *temp = head;
    int found = FALSE;
    if (temp == NULL) {
        printf("the linked list is empty\n");
        return NULL;
    }
    while (temp != NULL && found == FALSE) {
        if (temp->data != key)
            temp = temp->next;
        else
            found = TRUE;
    }
    if (found == TRUE) {
        printf("\nthe element is present in the list\n");
        return temp;
    }
    printf("the element is not present in the list\n");
    return NULL;
}

node *insert(node *head)
{
    int choice;
    printf("\n1.insert a node as a head node");
    printf("\n2.insert a node as the last node");
    printf("\n3.insert a node at intermediate position in the list");
    printf("\nenter your choice for insertion of node:");
    scanf("%d", &choice);
    switch (choice) {
    case 1: head = insert_head(head); break;
    case 2: head = insert_last(head); break;
    case 3: head = insert_after(head); break;
    }
    return head;
}

node *insert_head(node *head)
{
    node *New = get_node();
    if (New == NULL)
        return head;
    printf("\nEnter the element which you want to insert");
    scanf("%d", &New->data);
    New->next = head;                /* also correct for an empty list */
    return New;
}

node *insert_last(node *head)
{
    node *New = get_node(), *temp;
    if (New == NULL)
        return head;
    printf("\nenter the element which you want to insert");
    scanf("%d", &New->data);
    if (head == NULL)
        return New;
    for (temp = head; temp->next != NULL; temp = temp->next)
        ;                            /* walk to the last node */
    temp->next = New;
    return head;
}

node *insert_after(node *head)
{
    int key;
    node *New = get_node(), *temp;
    if (New == NULL)
        return head;
    printf("\nenter the element which you want to insert");
    scanf("%d", &New->data);
    if (head == NULL)
        return New;
    printf("\nenter the element after which you want to insert the node");
    scanf("%d", &key);
    for (temp = head; temp != NULL; temp = temp->next) {
        if (temp->data == key) {
            New->next = temp->next;
            temp->next = New;
            return head;
        }
    }
    printf("\nkey not found; node not inserted\n");
    free(New);                       /* give the unused node back to the heap */
    return head;
}

/* Returns the node before the first node holding val, or NULL if val is
   in the head node or not in the list. */
node *get_prev(node *head, int val)
{
    node *temp = head, *prev = NULL;
    while (temp != NULL && temp->data != val) {
        prev = temp;
        temp = temp->next;
    }
    return temp != NULL ? prev : NULL;
}

void dele(node **head)
{
    node *temp, *prev;
    int key;
    if (*head == NULL) {
        printf("\nthe list is empty\n");
        return;
    }
    printf("\nenter the element you want to delete:");
    scanf("%d", &key);
    temp = search(*head, key);
    if (temp != NULL) {
        prev = get_prev(*head, key);
        if (prev != NULL)
            prev->next = temp->next;
        else
            *head = temp->next;      /* deleting the head node */
        free(temp);                  /* return the node's memory to the heap */
        printf("\nthe element is deleted\n");
    }
}

Output:
