Compiler Design Lab Manual 05.02.2024_Final
DEPARTMENT
OF
Lab Manual
Session: 2023-24
CS3CO27: COMPILER DESIGN Experiment no- 1
Experiment Title: Write a program to design a lexical analyzer to recognize keywords
Practical 1 & 2
1. Objective (s): Write a program to design a lexical analyzer to recognize keywords.
2. Theory:
Lexical analysis is the first phase of a compiler. Its input is the source code as modified by the language preprocessor, read as a stream of characters. The lexical analyzer breaks this stream into a series of tokens, discarding whitespace along the way. If it meets a character sequence that matches no token pattern, it reports an error. The tokens are passed to the syntax analyzer one at a time, each time the syntax analyzer asks for the next one.
Terminologies:
There are three terminologies-
Token
Pattern
Lexeme
Token: A category of lexical unit in the source code, such as a keyword, an identifier or an operator.
Pattern: The rule that describes which character sequences belong to a token.
Lexeme: A sequence of characters in the source code that matches the pattern of a token. It is also called an instance of the token.
The lexical analyzer reads the input characters from the source code and uses the patterns to identify the tokens.
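As a minimal sketch of how keyword recognition might work, the functions below group letters, digits and underscores into lexemes and compare each lexeme against a keyword list. The keyword list, the 63-character lexeme limit and the names `is_keyword` and `count_keywords` are assumptions made for illustration; they are not the manual's `lexicalAnalysis` function.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* A small subset of C keywords; the list is an assumption for illustration. */
static const char *const KEYWORDS[] = {
    "int", "char", "float", "if", "else", "while", "for", "return"
};

/* Returns 1 if the lexeme is one of the keywords above, otherwise 0. */
int is_keyword(const char *lexeme) {
    size_t count = sizeof(KEYWORDS) / sizeof(KEYWORDS[0]);
    for (size_t i = 0; i < count; i++) {
        if (strcmp(lexeme, KEYWORDS[i]) == 0)
            return 1;
    }
    return 0;
}

/* Scans the input, groups letters, digits and '_' into lexemes, and
   returns how many of those lexemes are keywords. */
int count_keywords(const char *code) {
    assert(code != NULL);
    char lexeme[64];
    size_t len = 0;
    int found = 0;
    for (size_t i = 0;; i++) {
        unsigned char ch = (unsigned char)code[i];
        if (isalnum(ch) || ch == '_') {
            /* Still inside a lexeme: keep collecting (over-long lexemes are cut). */
            if (len < sizeof(lexeme) - 1)
                lexeme[len++] = (char)ch;
        } else {
            /* A delimiter ends the current lexeme, if there is one. */
            if (len > 0) {
                lexeme[len] = '\0';
                found += is_keyword(lexeme);
                len = 0;
            }
            if (ch == '\0')
                break;
        }
    }
    return found;
}
```

For the input `int x = 5; if (x) return x;` the lexemes are `int`, `x`, `5`, `if`, `x`, `return`, `x`, of which three are keywords.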
3. Program Logic :
return 0 (false)
Function main
print "Enter your C code (press Enter to finish):"
inputCode = readInputUntilEnter( )
inputCode = removeNewlineCharacter(inputCode)
lexicalAnalysis(inputCode)
exit program
4. Program:
#include <stdio.h>
#include <string.h>
}
}
int main() {
    printf("Enter your C code (press Enter to finish):\n");
    char cCode[1000];
    fgets(cCode, sizeof(cCode), stdin);
    /* Strip the trailing newline left by fgets. */
    cCode[strcspn(cCode, "\n")] = '\0';
    lexicalAnalysis(cCode);
    return 0;
}
5. OUTPUT:
CS3CO27: COMPILER DESIGN Experiment no- 3&4
Experiment Title: Write a program to compute first and follow set of CFG.
Practical: 3 & 4
Objective (s): Write a program to compute first and follow set of CFG.
2. Theory:
The FIRST set is a concept used in syntax analysis, in the construction of both LL and LR parsers.
The FIRST set of a non-terminal A is defined as the set of terminals that can appear as the first symbol in any string derived from A. If A can derive the empty string, then the empty string (ε) is also included in the FIRST set of A.
In an LL parser, the FIRST set is used to determine which production rule should be used to expand a non-terminal. If the next symbol in the input stream is in the FIRST set of the right-hand side of a production A -> α, then A can be safely expanded using that production.
The FOLLOW set of a non-terminal X, written Follow(X), is the set of terminals that can appear immediately to the right of X in some sentential form. Like the FIRST set, the FOLLOW set is used to avoid backtracking. The difference is that the FOLLOW set is consulted for a vanishing non-terminal, that is, one that can derive ε: the parser then decides by looking at which terminals may legally come after that non-terminal.
3. Algorithm/Rules:
First set:
1. If x is a terminal, then FIRST(x) = { x }.
2. If X -> ε is a production rule, then add ε to FIRST(X).
3. If X -> Y1 Y2 Y3 … Yn is a production:
   a. Add everything in FIRST(Y1) except ε to FIRST(X).
   b. If FIRST(Y1) contains ε, also add { FIRST(Y2) – ε } to FIRST(X), and continue in the same way with Y3, …, Yn for as long as every earlier symbol can derive ε.
   c. If FIRST(Yi) contains ε for all i = 1 to n, then add ε to FIRST(X).
Follow set:
1. FOLLOW(S) contains $, where S is the starting Non-Terminal.
2. If A -> pBq is a production, where B is a non-terminal and p and q are strings of grammar symbols, then everything in FIRST(q) except ε is in FOLLOW(B).
3. If A -> pB is a production, then everything in FOLLOW(A) is in FOLLOW(B).
4. If A -> pBq is a production and FIRST(q) contains ε, then FOLLOW(B) contains { FIRST(q) – ε } U FOLLOW(A).
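The FIRST rules above can be sketched as a fixed-point iteration: the rules are applied to every production again and again until no set grows. This is a minimal sketch under stated assumptions, not the listing used in this experiment: productions are written as strings of the form "A=rhs", upper-case letters are non-terminals, '#' stands for ε, and the names `add_symbol`, `compute_first` and `MAX_SET` are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define MAX_SET 32

/* Adds symbol c to the set (a NUL-terminated string); returns 1 if it was new. */
static int add_symbol(char *set, char c) {
    if (strchr(set, c) != NULL)
        return 0;
    size_t len = strlen(set);
    assert(len + 1 < MAX_SET);
    set[len] = c;
    set[len + 1] = '\0';
    return 1;
}

/* Computes FIRST for every non-terminal 'A'..'Z'.
   Each production is written "A=rhs"; upper-case letters are non-terminals,
   '#' stands for epsilon, and every other character is a terminal. */
void compute_first(const char *const prods[], int count, char first[26][MAX_SET]) {
    for (int i = 0; i < 26; i++)
        first[i][0] = '\0';
    int changed = 1;
    while (changed) {               /* repeat until no set grows */
        changed = 0;
        for (int p = 0; p < count; p++) {
            char *target = first[prods[p][0] - 'A'];
            const char *rhs = prods[p] + 2;
            int all_nullable = 1;   /* can every symbol seen so far derive epsilon? */
            for (int k = 0; rhs[k] != '\0' && all_nullable; k++) {
                char y = rhs[k];
                if (y == '#')       /* explicit epsilon: nothing to add here */
                    continue;
                if (!isupper((unsigned char)y)) {   /* rule 1: terminal */
                    changed |= add_symbol(target, y);
                    all_nullable = 0;
                } else {                            /* rule 3: non-terminal */
                    const char *fy = first[y - 'A'];
                    for (int s = 0; fy[s] != '\0'; s++)
                        if (fy[s] != '#')
                            changed |= add_symbol(target, fy[s]);
                    if (strchr(fy, '#') == NULL)
                        all_nullable = 0;
                }
            }
            if (all_nullable)       /* rules 2 and 3c: the whole rhs can vanish */
                changed |= add_symbol(target, '#');
        }
    }
}
```

Calling `compute_first` on the productions E=TR, R=+TR, R=#, T=FY, Y=*FY, Y=#, F=(E), F=i gives FIRST(E) = { (, i } and FIRST(R) = { +, # }. The order of symbols inside a set depends on the order in which the productions are visited.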
4. Program:
int kay;
char done[count];
int ptr = -1;
if (xxx == 1)
continue;
// Function call
findfirst(c, 0, 0);
ptr += 1;
if (first[i] == calc_first[point1][lark]) {
chk = 1;
break;
}
}
if (chk == 0) {
printf("%c, ", first[i]);
calc_first[point1][point2++] = first[i];
}
}
printf("}\n");
jm = n;
point1++;
}
printf("\n");
printf(" "
"\n\n");
char donee[count];
ptr = -1;
// Checking if Follow of ck
// has already been calculated
for (kay = 0; kay <= ptr; kay++)
if (ck == donee[kay])
xxx = 1;
if (xxx == 1)
continue;
land += 1;
// Function call
follow(ck);
ptr += 1;
if (production[i][j + 1] == '\0'
&& c != production[i][0]) {
// Calculate the follow of the
// Non-Terminal in the L.H.S. of the
// production
follow(production[i][0]);
}
}
}
}
}
first[n++] = '#';
}
}
}
j++;
}
}
}
5. Output:
First(X) = { q, n, o, p, #,m}
First(T) = { q, #, }
First(S) = { p, #, }
First(R) = { o, p, q, #, }
Follow(X) = { $, }
Follow(T) = { n, m, }
Follow(S) = { $, q, m, }
Follow(R) = { m, }
CS3CO27: COMPILER DESIGN Experiment no- 5
Experiment Title: Write a program for implementation of
Predictive Parsing Table for LL (1) grammar
Practical 5:
Objective (s): Write a program for implementation of Predictive Parsing Table for LL (1) grammar
2. Theory:
Construction of LL(1) Parsing Table
LL(1) Parsing: The first L means that the input is scanned from left to right, and the second L means that the parser builds a leftmost derivation. The 1 is the number of lookahead symbols, that is, how many input symbols the parser looks at when it has to make a decision.
To construct the parsing table, we have two functions:
First(): The set of terminal symbols that can begin the strings derived from a variable.
Follow(): The set of terminal symbols that can immediately follow a variable in the process of derivation.
Step 3: For each production A –> α (A tends to alpha):
Find First(α) and, for each terminal in First(α), make the entry A –> α in the table.
If First(α) contains ε (epsilon), then find Follow(A) and, for each terminal in Follow(A), make the entry A –> α in the table.
If First(α) contains ε and Follow(A) contains $, then make the entry A –> α in the table for $ as well.
In the table, the rows contain the Non-Terminals and the columns contain the Terminal Symbols. All the null productions of the grammar go under the Follow elements, and the remaining productions lie under the elements of the First set.
E --> TE'
E' --> +TE' | ε
T --> FT'
T' --> *FT' | ε
F --> id | (E)
*ε denotes epsilon
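Step 1: The grammar satisfies all properties in step 1.
Step 2: Calculate the First() and Follow() sets:
Production          First        Follow
E  –> TE'           { id, ( }    { $, ) }
E' –> +TE'/ε        { +, ε }     { $, ) }
T  –> FT'           { id, ( }    { +, $, ) }
T' –> *FT'/ε        { *, ε }     { +, $, ) }
F  –> id/(E)        { id, ( }    { *, +, $, ) }
Step 3: Make the parsing table:
      id          +            *            (           )          $
E     E –> TE'                              E –> TE'
E'                E' –> +TE'                            E' –> ε    E' –> ε
T     T –> FT'                              T –> FT'
T'                T' –> ε      T' –> *FT'               T' –> ε    T' –> ε
F     F –> id                               F –> (E)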
As you can see, all the null productions are put under the Follow set of their symbol, and all the remaining productions lie under the First set of their symbol.
Note: Not every grammar is suitable for an LL(1) parsing table. A cell may end up containing more than one production, in which case the grammar is not LL(1).
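The table-filling step, together with the conflict check from the note above, can be sketched as follows. This is an illustration under stated assumptions: the First and Follow sets are typed in by hand from the sets of this example grammar, R stands for E', Y for T', i for id and '#' for ε, and the names `PRODUCTIONS`, `index_of` and `build_table` are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define NUM_NT 5
#define NUM_T 6

static const char NON_TERMINALS[] = "ERTYF"; /* R stands for E', Y for T' */
static const char TERMINALS[] = "i+*()$";    /* i stands for id */

struct production {
    char lhs;
    const char *rhs;        /* "#" denotes epsilon */
    const char *first_rhs;  /* First of the right-hand side */
    const char *follow_lhs; /* Follow of the left-hand side */
};

static const struct production PRODUCTIONS[] = {
    {'E', "TR",  "i(", "$)"},
    {'R', "+TR", "+",  "$)"},
    {'R', "#",   "#",  "$)"},
    {'T', "FY",  "i(", "+$)"},
    {'Y', "*FY", "*",  "+$)"},
    {'Y', "#",   "#",  "+$)"},
    {'F', "i",   "i",  "*+$)"},
    {'F', "(E)", "(",  "*+$)"},
};

/* Position of symbol c inside the given symbol list. */
static int index_of(const char *symbols, char c) {
    const char *p = strchr(symbols, c);
    assert(p != NULL && c != '\0');
    return (int)(p - symbols);
}

/* Fills table[row][col] with the index of the production to apply, or -1
   for an empty (error) cell. Returns the number of conflicting cells. */
int build_table(int table[NUM_NT][NUM_T]) {
    int conflicts = 0;
    for (int r = 0; r < NUM_NT; r++)
        for (int c = 0; c < NUM_T; c++)
            table[r][c] = -1;
    int count = (int)(sizeof(PRODUCTIONS) / sizeof(PRODUCTIONS[0]));
    for (int p = 0; p < count; p++) {
        const struct production *prod = &PRODUCTIONS[p];
        int row = index_of(NON_TERMINALS, prod->lhs);
        /* Terminals in First(rhs) select this production; if First(rhs)
           contains epsilon, the terminals in Follow(lhs) select it too. */
        const char *sets[2] = {prod->first_rhs, NULL};
        if (strchr(prod->first_rhs, '#') != NULL)
            sets[1] = prod->follow_lhs;
        for (int s = 0; s < 2; s++) {
            for (const char *t = sets[s]; t != NULL && *t != '\0'; t++) {
                if (*t == '#')
                    continue;
                int col = index_of(TERMINALS, *t);
                if (table[row][col] != -1)
                    conflicts++;    /* two productions in one cell: not LL(1) */
                table[row][col] = p;
            }
        }
    }
    return conflicts;
}
```

For this grammar `build_table` reports no conflicts. A grammar for which it reports one or more conflicts would not be LL(1).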
4. Program:
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
char s[20],stack[20];
int main()
{
char m[5][6][3]={"tb"," "," ","tb"," "," "," ","+tb"," "," ","n","n","fc"," "," ","fc"," "," "," ","n","*fc"," a ","n","n","i"," "," ","(e)"," "," "};
int size[5][6]={2,0,0,2,0,0,0,3,0,0,1,1,2,0,0,2,0,0,0,1,3,0,1,1,1,0,0,3,0,0};
int i,j,k,n,str1,str2;
printf("\n Enter the input string: ");
/* s holds 20 characters: at most 18 of input, then '$' and the terminator. */
scanf("%18s",s);
strcat(s,"$");
n=strlen(s);
stack[0]='$';
stack[1]='e';
i=1;
j=0;
printf("\nStack Input\n");
printf(" \n");
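/* Loop until both the stack and the input are reduced to '$'. */
while((stack[i]!='$')||(s[j]!='$'))
{
if(stack[i]==s[j])
{
/* Terminal on top of the stack matches the input: consume both. */
i--;
j++;
continue;
}
switch(stack[i])
{
case 'e': str1=0;
break;
case 'b': str1=1;
break;
case 't': str1=2;
break;
case 'c': str1=3;
break;
case 'f': str1=4;
break;
default: /* unmatched terminal or '$' on top of the stack */
printf("\nERROR");
exit(0);
}
switch(s[j])
{
case 'i': str2=0;
break;
case '+': str2=1;
break;
case '*': str2=2;
break;
case '(': str2=3;
break;
case ')': str2=4;
break;
case '$': str2=5;
break;
default: /* symbol outside the grammar's alphabet */
printf("\nERROR");
exit(0);
}
if(m[str1][str2][0]==' ')
{
/* Blank table entry: no production applies. */
printf("\nERROR");
exit(0);
}
else if(m[str1][str2][0]=='n')
i--; /* 'n' marks a null production: pop the non-terminal */
else if(m[str1][str2][0]=='i')
stack[i]='i';
else
{
/* Replace the non-terminal by its right-hand side, pushed in reverse. */
for(k=size[str1][str2]-1;k>=0;k--)
{
stack[i]=m[str1][str2][k];
i++;
}
i--;
}
for(k=0;k<=i;k++)
printf(" %c",stack[k]);
printf(" ");
for(k=j;k<n;k++)
printf("%c",s[k]);
printf(" \n ");
}
printf("\n SUCCESS");
return 0;
}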
5. Output:
CS3CO27: COMPILER DESIGN Experiment no- 6
Experiment Title: Write a program for implementation of
Predictive Parser
Practical: 6
Theory:
A -> A1 | A2 | ... | An
When the non-terminal A is to be expanded, the alternative to use is selected based only on the current input symbol ‘a’.
E->TT'
T'->+TT'|ε
T->FT''
T''->*FT''|ε
F->(E)|id
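One way to see what "selected based only on the current input symbol" means is a recursive-descent recognizer for the grammar above, with one function per non-terminal. This is a sketch under assumptions, not the table-driven program of this experiment: the token id is written as the single character 'i', and the names `cursor`, `parse_T1` (for T'), `parse_T2` (for T'') and `accepts` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Cursor into the input; 'i' stands for the token id. */
static const char *cursor;

static int parse_E(void);

/* F -> (E) | id */
static int parse_F(void) {
    if (*cursor == 'i') {
        cursor++;
        return 1;
    }
    if (*cursor == '(') {
        cursor++;
        if (!parse_E() || *cursor != ')')
            return 0;
        cursor++;
        return 1;
    }
    return 0;   /* no alternative of F starts with this symbol */
}

/* T'' -> *FT'' | epsilon */
static int parse_T2(void) {
    if (*cursor == '*') {
        cursor++;
        return parse_F() && parse_T2();
    }
    return 1;   /* epsilon alternative: consume nothing */
}

/* T -> FT'' */
static int parse_T(void) {
    return parse_F() && parse_T2();
}

/* T' -> +TT' | epsilon */
static int parse_T1(void) {
    if (*cursor == '+') {
        cursor++;
        return parse_T() && parse_T1();
    }
    return 1;   /* epsilon alternative: consume nothing */
}

/* E -> TT' */
static int parse_E(void) {
    return parse_T() && parse_T1();
}

/* Returns 1 if the whole input is derivable from E, otherwise 0. */
int accepts(const char *input) {
    assert(input != NULL);
    cursor = input;
    return parse_E() && *cursor == '\0';
}
```

Each function inspects a single character and never backtracks. For example, `accepts("i+i*i")` and `accepts("(i+i)*i")` return 1, while `accepts("i+")` returns 0.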
STEP 1:
Make a transition diagram (DFA/NFA) for every rule of the grammar.
E->TT’
T’->+TT’|ε
T->FT”
T”->*FT”|ε
F->(E)|id
STEP 2:
Optimize the DFA by decreasing the number of states, yielding the final transition diagram.
T’->+TT’|ε
STEP 3:
Simulation on the input string.
Steps involved in the simulation procedure are:
case 'A':
return 1;
case 'B':
return 2;
case 'C':
return 3;
case 'a':
return 0;
case 'b':
return 1;
case 'c':
return 2;
case 'd':
return 3;
case '$':
return 4;
}
return (2);
}
int main(){
int i, j, k;
for (i = 0; i < 5; i++)
for (j = 0; j < 6; j++)
strcpy(table[i][j], " ");
printf("The following grammar is used for Parsing Table:\n");
for (i = 0; i < 7; i++)
printf("%s\n", prod[i]); printf("\
fflush(stdin);
for (i = 0; i < 7; i++)
{
k = strlen(first[i]);
/* Visit only the k symbols actually stored in FIRST of production i. */
for (j = 0; j < k; j++)
if (first[i][j] != '@')
strcpy(table[numr(prol[i][0]) + 1][numr(first[i][j]) + 1], prod[i]);
}
for (i = 0; i < 7; i++){
if (strlen(pror[i]) == 1){
if (pror[i][0] == '@'){
k = strlen(follow[i]);
for (j = 0; j < k; j++)
strcpy(table[numr(prol[i][0]) + 1][numr(follow[i][j]) + 1], prod[i]);
}
}
}
strcpy(table[0][0], " ");
strcpy(table[0][1], "a");
strcpy(table[0][2], "b");
strcpy(table[0][3], "c");
strcpy(table[0][4], "d");
strcpy(table[0][5], "$");
strcpy(table[1][0], "S");
strcpy(table[2][0], "A");
strcpy(table[3][0], "B");
strcpy(table[4][0], "C");
Output:
The following grammar is used for Parsing Table:
S->A
A->Bb
A->Cd
B->aB
B->@
C->Cc
C->@
a b c d $
Practical 7
Objective
Write a program to develop an operator precedence parser.
Theory
Operator precedence parser – An operator precedence parser is a bottom-up parser that interprets an operator grammar. This parser is only used for operator grammars. Unlike the other parsers in this manual, an operator precedence parser can work with an ambiguous grammar. There are two methods for determining what precedence relations should hold between a pair of terminals:
Figure – Operator precedence relation table for grammar E->E+E/E*E/id
No relation is given between id and id, because two variables cannot come side by side and so id is never compared with id.
A disadvantage of this table is its size: if we have n operators, then the table has n*n entries and the space complexity is O(n^2). In order to decrease the size of the table, we use an operator function table. Operator precedence parsers usually do not store the precedence table with the relations; instead, they use precedence functions that map terminal symbols to integers, and the precedence relations between the symbols are implemented by numerical comparison. The parsing table can be encoded by two precedence functions f and g that map terminal symbols to integers. We select f and g such that:
f(a) < g(b) whenever a yields precedence to b
f(a) = g(b) whenever a and b have the same precedence
f(a) > g(b) whenever a takes precedence over b
Since there is no cycle in the graph, we can make this function table:
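A function table of this kind can be sketched in code as two small lookup functions whose results are compared numerically. The integer values below are an assumption: they are the values usually given in textbooks for the terminals id, +, * and $ with left-associative + and *, not values taken from this manual. The terminal id is written as 'i', and the name `relation` is hypothetical.

```c
#include <assert.h>

/* f is applied to the terminal on the stack. */
static int f(char a) {
    switch (a) {
    case 'i': return 4;
    case '*': return 4;
    case '+': return 2;
    case '$': return 0;
    default:  return -1;   /* not a terminal of this grammar */
    }
}

/* g is applied to the terminal in the input. */
static int g(char b) {
    switch (b) {
    case 'i': return 5;
    case '*': return 3;
    case '+': return 1;
    case '$': return 0;
    default:  return -1;   /* not a terminal of this grammar */
    }
}

/* Returns '<', '=' or '>' for the precedence relation between the
   stack terminal a and the input terminal b. */
char relation(char a, char b) {
    int fa = f(a), gb = g(b);
    assert(fa >= 0 && gb >= 0);
    if (fa < gb) return '<';
    if (fa > gb) return '>';
    return '=';
}
```

With these values `relation('+', '*')` is '<' and `relation('*', '+')` is '>', so * binds tighter than +. Two functions of four entries each replace a 4*4 relation table. The price is that blank (error) entries of the relation table can no longer be represented: the functions return a relation even for the pair id, id.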
Program
#include<stdlib.h>
#include<stdio.h>
#include<string.h>
// function f to exit from the loop
// if given condition is not true
void f(){
printf("Not operator grammar");
exit(0);
}
int main(){
char grm[20][20], c;
// Here using flag variable,
// considering grammar is not operator grammar
int i, n, j = 2, flag = 0;
// taking number of productions from user
scanf("%d", &n);
else{
flag = 0;
f();
}
if (c == '$') {
flag = 0;
f();
c = grm[i][++j];
}
}
if (flag == 1)
printf("Operator grammar");
}
Output
CS3CO27: COMPILER DESIGN Experiment no- 8
Experiment Title: Write a program to design LALR Bottom-up Parser.
Practical 8
Theory:
LALR Parser:
An LALR parser is a lookahead LR parser. It can handle a large class of grammars and is only slightly less powerful than a CLR parser. The CLR parsing table is quite large compared to the other parsing tables; LALR reduces the size of this table. LALR works like CLR. The only difference is that it combines the CLR states that are identical apart from their lookaheads into one single state.
The general syntax of an item becomes [A->.B, a]
where A->.B is a production and a is a terminal or the right end marker $.
LR(1) items = LR(0) items + look ahead
CASE 1 –
A->.BC, a
Suppose this is the 0th production. Since ‘ . ‘ precedes B, we have to write B’s productions as well.
B->.D [1st production]
Suppose this is B’s production. Its lookahead is found from the previous production, i.e. the 0th production: we take FIRST of whatever comes after B. Here, in the 0th production, C comes after B. Assuming FIRST(C)=d, the 1st production becomes
B->.D, d
CASE 2 –
Now if the 0th production was like this,
A->.B, a
we can see there is nothing after B. So the lookahead of the 1st production is the lookahead of the 0th production, i.e.
B->.D, a
CASE 3 –
Assume a production A->a|b
A->a,$ [0th production]
Algorithm
token = next_token()
repeat forever
s = top of stack
else
error()
Program
< parser.l >
%{
#include<stdio.h>
#include<stdlib.h> /* declares atof */
#include "y.tab.h"
%}
%%
[0-9]+ { yylval.dval = atof(yytext); return DIGIT; }
\n|. return yytext[0];
%%
< parser.y >
%{
/*This YACC specification file generates the LALR parser for the program considered in experiment 4.*/
#include<stdio.h>
%}
%union
{
double dval;
}
%token <dval> DIGIT
%type <dval> expr
%type <dval> term
%type <dval> factor
%%
line: expr '\n' { printf("%g\n", $1); }
;
expr: expr '+' term { $$ = $1 + $3; }
| term
;
term: term '*' factor { $$ = $1 * $3; }
| factor
;
factor: '(' expr ')' { $$ = $2; }
| DIGIT
;
%%
int main (){ Print(b
Output
$ lex parser.l
$ yacc -d parser.y
5.0000
CS3CO27: COMPILER DESIGN Experiment no- 9
Experiment Title: Write a program for generating various
intermediate code forms-Polish notation:
a. Infix to prefix
b. Infix to postfix
Practical 9
Objective (s): Write a program for generating various intermediate code forms-Polish notation:
a. Infix to prefix
b. Infix to postfix
2. Theory:
In the analysis-synthesis model of a compiler, the front end translates a source program into a machine-independent intermediate code, and the back end then uses this intermediate code to generate the target code (which can be understood by the machine). The benefits of using machine-independent intermediate code are:
Portability is enhanced. For example, if a compiler translates the source language directly to its target machine language, without the option of generating intermediate code, then a full native compiler is required for each new machine, because the compiler itself has to be modified according to the machine specifications.
Retargeting is facilitated.
It is easier to apply source code modification to improve the performance of source code by optimizing the intermediate code.
Three-Address Code:
A three address statement involves a maximum of three references: two for the operands and one for the result.
A sequence of three address statements collectively forms a three address code.
The typical form of a three address statement is x = y op z, where x, y, and z represent memory addresses.
Each variable (x, y, z) in a three address statement is associated with a specific memory location.
A statement may contain fewer than three references and still be categorized as a three address statement.
Example: The three address code for the expression a + b * c + d:
T1 = b * c
T2 = a + T1
T3 = T2 + d
T1, T2, T3 are temporary variables.
There are 3 ways to represent a Three-Address Code in compiler design:
i) Quadruples
ii) Triples
iii) Indirect Triples
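As an illustration of how three address statements with temporaries can be produced, the sketch below walks a postfix expression with an operand stack and emits one statement per operator. It is a minimal sketch under assumptions: operands are single letters, only the operators + - * / are handled, and the names `postfix_to_tac` and `MAX_OPERANDS` are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MAX_OPERANDS 32

/* Translates a postfix expression over single-letter operands and the
   operators + - * / into three address code written to out.
   Returns the number of statements emitted, or -1 on malformed input. */
int postfix_to_tac(const char *postfix, char *out, size_t out_size) {
    char stack[MAX_OPERANDS][8];    /* operand names such as "a" or "T12" */
    int top = 0, temp = 0;
    size_t used = 0;
    assert(out != NULL && out_size > 0);
    out[0] = '\0';
    for (const char *p = postfix; *p != '\0'; p++) {
        if (isalpha((unsigned char)*p)) {
            /* Operand: push its name. */
            if (top == MAX_OPERANDS) return -1;
            stack[top][0] = *p;
            stack[top][1] = '\0';
            top++;
        } else if (strchr("+-*/", *p) != NULL) {
            /* Operator: pop two operands and emit Tn = left op right. */
            if (top < 2) return -1;
            char right[8], left[8];
            strcpy(right, stack[--top]);
            strcpy(left, stack[--top]);
            temp++;
            int n = snprintf(out + used, out_size - used, "T%d = %s %c %s\n",
                             temp, left, *p, right);
            if (n < 0 || (size_t)n >= out_size - used) return -1;
            used += (size_t)n;
            /* The new temporary becomes an operand for later operators. */
            snprintf(stack[top++], sizeof(stack[0]), "T%d", temp);
        } else {
            return -1;
        }
    }
    return top == 1 ? temp : -1;
}
```

For the postfix form `abc*+d+` of a + b * c + d, the function emits T1 = b * c, T2 = a + T1 and T3 = T2 + d, the same three statements as the example above. Stored as quadruples, each statement would become one record holding the operator, the two arguments and the result.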
Syntax Tree:
A syntax tree serves as a condensed representation of a parse tree.
The operator and keyword nodes of the parse tree are moved up into their parent nodes in the syntax tree, so that the internal nodes are operators and the child nodes are operands.
To form a syntax tree, first put parentheses in the expression. This makes it easier to see the order in which the operands should be processed.
The syntax tree not only condenses the parse tree but also offers a clearer visual representation of the program’s syntactic structure.
Example: x = (a + b * c) / (a – b * c)
Advantages of Intermediate Code Generation:
Easier to implement: Intermediate code generation can simplify the code generation process by reducing
the complexity of the input code, making it easier to implement.
Facilitates code optimization: Intermediate code generation can enable the use of various code
optimization techniques, leading to improved performance and efficiency of the generated code.
Code reuse: Intermediate code can be reused in the future to generate code for other platforms or
languages.
Easier debugging: Intermediate code can be easier to debug than machine code or bytecode, as it is closer
to the original source code.
Disadvantages of Intermediate Code Generation:
Additional memory usage: Intermediate code generation requires additional memory to store the
intermediate representation, which can be a concern for memory-limited systems.
Increased complexity: Intermediate code generation can increase the complexity of the compiler design,
making it harder to implement and maintain.
Reduced performance: The process of generating intermediate code can result in code that executes
slower than code generated directly from the source code.
3. Algorithm:
// to be added
4. Program:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
// Function to return precedence of operators
int prec(char c) {
if (c == '^')
return 3;
else if (c == '/' || c == '*')
return 2;
else if (c == '+' || c == '-')
return 1;
else
return -1;
}
// Function to return associativity of operators
char associativity(char c) {
if (c == '^')
return 'R';
return 'L'; // Default to left-associative
}
result[resultIndex] = '\0';
printf("%s\n", result);
}
// Driver code
int main() {
char exp[] = "a+b*(c^d-e)^(f+g*h)-i";
// Function call
infixToPostfix(exp);
return 0;
}
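The listing above calls `infixToPostfix` without showing its body. A stack-based conversion of that kind might look like the following sketch. It is not the manual's function: the name `to_postfix`, the output-buffer interface and the 256-character limit are assumptions, operands are single letters or digits, and the input must contain no spaces.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Same precedence levels as prec() in the listing above. */
static int precedence(char c) {
    if (c == '^') return 3;
    if (c == '*' || c == '/') return 2;
    if (c == '+' || c == '-') return 1;
    return -1;
}

/* Converts infix (single-character operands) to postfix in out.
   out must have room for strlen(infix) + 1 characters.
   Returns 0 on success, -1 on bad input or unbalanced parentheses. */
int to_postfix(const char *infix, char *out) {
    assert(infix != NULL && out != NULL);
    size_t len = strlen(infix);
    char stack[256];
    int top = -1;
    size_t n = 0;
    if (len >= sizeof(stack)) return -1;
    for (size_t i = 0; i < len; i++) {
        char c = infix[i];
        if (isalnum((unsigned char)c)) {
            out[n++] = c;                   /* operands go straight to the output */
        } else if (c == '(') {
            stack[++top] = c;
        } else if (c == ')') {
            while (top >= 0 && stack[top] != '(')
                out[n++] = stack[top--];
            if (top < 0) return -1;         /* no matching '(' */
            top--;                          /* discard the '(' */
        } else if (precedence(c) > 0) {
            /* Pop operators of higher precedence, and of equal precedence
               unless the incoming operator is the right-associative '^'. */
            while (top >= 0 && stack[top] != '(' &&
                   (precedence(stack[top]) > precedence(c) ||
                    (precedence(stack[top]) == precedence(c) && c != '^')))
                out[n++] = stack[top--];
            stack[++top] = c;
        } else {
            return -1;                      /* unknown character */
        }
    }
    while (top >= 0) {
        if (stack[top] == '(') return -1;   /* unclosed '(' */
        out[n++] = stack[top--];
    }
    out[n] = '\0';
    return 0;
}
```

For the expression used in the driver code, `a+b*(c^d-e)^(f+g*h)-i`, this sketch produces `abcd^e-fgh*+^*+i-`.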
Infix to prefix :
// C++ program to convert infix to prefix
#include <algorithm>
#include <iostream>
#include <stack>
#include <string>
using namespace std;
// Operator found
else {
if (isOperator(char_stack.top())) {
if (infix[i] == '^') {
while (
getPriority(infix[i])
<= getPriority(char_stack.top())) {
output += char_stack.top();
char_stack.pop();
}
}
else {
while (
getPriority(infix[i])
< getPriority(char_stack.top())) {
output += char_stack.top();
char_stack.pop();
}
}
if (infix[i] == '(') {
infix[i] = ')';
}
else if (infix[i] == ')') {
infix[i] = '(';
}
}
// Reverse postfix
reverse(prefix.begin(), prefix.end());
return prefix;
}
// Driver code
int main()
{
string s = ("x+y*z/w+u");
// Function call
cout << infixToPrefix(s) << std::endl;
return 0;
}
5. Output:
CS3CO27: COMPILER DESIGN Experiment no- 10
Experiment Title: Write a program to perform heap
storage allocation strategies
Practical 10
Objective:
Write a program to perform heap storage allocation strategies
Theory
Heap Allocation
Heap allocation is used where stack allocation falls short. If we want to retain
the values of local variables after the activation record ends, stack allocation
cannot help, because the LIFO scheme does not fit the order in which such data
is allocated and de-allocated. Heap allocation is the most flexible storage
allocation strategy: storage can be allocated and de-allocated dynamically at
run-time, whenever the program needs it. The size of heap-allocated data can
also change according to the user’s requirement. C, C++, Python, and Java all
support heap allocation.
For example:
1. Heap allocation is useful when we have data whose size is not fixed and
can change during the run time.
2. We can retain the values of variables even after the activation record ends.
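Both points can be seen in a short sketch that grows a linked list one heap cell at a time. The names `struct cell`, `push_front`, `first_value`, `length` and `release` are hypothetical and separate from the program of this experiment. The only library calls are the standard `malloc` and `free`.

```c
#include <assert.h>
#include <stdlib.h>

struct cell {
    int value;
    struct cell *next;
};

/* Allocates a new cell on the heap and links it in front of head.
   The cell stays alive after this function's activation record ends. */
struct cell *push_front(struct cell *head, int value) {
    struct cell *fresh = malloc(sizeof *fresh);
    if (fresh == NULL)
        return head;    /* allocation failed: the list is left unchanged */
    fresh->value = value;
    fresh->next = head;
    return fresh;
}

/* Value stored in the first cell of a non-empty list. */
int first_value(const struct cell *head) {
    assert(head != NULL);
    return head->value;
}

/* Number of cells in the list. */
int length(const struct cell *head) {
    int n = 0;
    for (; head != NULL; head = head->next)
        n++;
    return n;
}

/* Returns every cell to the heap; the caller must not use the list afterwards. */
void release(struct cell *head) {
    while (head != NULL) {
        struct cell *next = head->next;
        free(head);
        head = next;
    }
}
```

Starting from an empty list (`NULL`), two calls to `push_front` with the values 10 and 20 give a list of length 2 whose first value is 20. The list can keep growing at run time, and every cell remains valid until `release` is called.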
Algorithm
Step1: Initially check whether the stack is empty
Step3: Insert more elements onto the stack until stack becomes full
Step4: Delete an element from the stack using pop operation
Step5: Display the elements in the stack
Program
//implementation of heap allocation storage strategies//
#include<stdio.h>
#include<stdlib.h>
#define TRUE 1
#define FALSE 0
typedef struct Heap
{
int data;
struct Heap *next;
}node;
node *create();
int main()
{
int choice,val;
char ans;
node *head;
head=NULL;
do
printf("\n1.create");
printf("\n2.display");
printf("\n5.quit");
scanf("%d",&choice);
switch(choice)
{
case 1:head=create();
break;
case 2:display(head);
break;
case 3:head=insert(head);
break;
case 4:dele(&head);
break;
case 5:exit(0);
default:
while(choice!=5);
node* create()
node *temp,*New,*head;
int val,flag;
char ans='y';
node *get_node();
temp=NULL;
flag=TRUE;
do
scanf("%d",&val);
New=get_node();
if(New==NULL)
New->data=val;
if(flag==TRUE)
head=New;
temp=head;
flag=FALSE;
else
temp->next=New;
temp=New;
}
printf("\ndo you want to enter more elements?(y/n)");
while(ans=='y');
return head;
node *get_node()
node *temp;
temp=(node*)malloc(sizeof(node));
temp->next=NULL;
return temp;
node *temp;
temp=head;
if(temp==NULL)
return;
}
while(temp!=NULL)
printf("%d->",temp->data);
temp=temp->next;
printf("NULL");
node *temp;
int found;
temp=head;
if(temp==NULL)
return NULL;
found=FALSE;
{
if(temp->data!=key)
temp=temp->next;
else
found=TRUE;
if(found==TRUE)
return temp;
else
return NULL;
int choice;
scanf("%d",&choice);
switch(choice)
case 1:head=insert_head(head);
break;
case 2:insert_last(head);
break;
case 3:insert_after(head);
break;
return head;
node *New,*temp;
New=get_node();
printf("\nEnter the element which you want to insert");
scanf("%d",&New->data);
if(head==NULL)
head=New;
else
temp=head;
New->next=temp;
head=New;
return head;
node *New,*temp;
New=get_node();
scanf("%d",&New->data);
if(head==NULL)
head=New;
else
{
temp=head;
while(temp->next!=NULL)
temp=temp->next;
temp->next=New;
New->next=NULL;
int key;
node *New,*temp;
New=get_node();
scanf("%d",&New->data);
if(head==NULL)
head=New;
else
{
printf("\nEnter the element after which you want to insert the node");
scanf("%d",&key);
temp=head;
do
if(temp->data==key)
New->next=temp->next;
temp->next=New;
return;
else
temp=temp->next;
while(temp!=NULL);
node *temp,*prev;
int flag;
temp=head;
if(temp==NULL)
return NULL;
flag=FALSE;
prev=NULL;
if(temp->data!=val)
prev=temp;
temp=temp->next;
else
flag=TRUE;
if(flag)
return prev;
else
return NULL;
node *temp,*prev;
int key;
temp=*head;
if(temp==NULL)
return;
scanf("%d",&key);
temp=search(*head,key);
if(temp!=NULL)
prev=get_prev(*head,key);
if(prev!=NULL)
prev->next=temp->next;
free(temp);
else
{
*head=temp->next;
free(temp);
Output: