Lab Manual - Compiler Lab CSL411 Manual 2022
Experiment list
1. Design and implement a lexical analyzer using C language to recognize all valid tokens in the input
program. The lexical analyzer should ignore redundant spaces, tabs and newlines. It should also ignore
comments.
2. Write a lex program to find out total number of vowels and consonants from the given input string.
3. Write a lex program to display the number of lines, words and characters in an input text.
4. Write a LEX Program to convert the substring abc to ABC from the given input string.
6. Generate a YACC specification to recognize a valid arithmetic expression that uses operators +, -, *, /
and parentheses.
7. Generate a YACC specification to recognize a valid identifier which starts with a letter followed by any
number of letters or digits.
9. Write a program to find the ε-closure of all states of any given NFA with ε transitions.
10. Write a program to find First and Follow of any given grammar.
11. Design and implement a recursive descent parser for a given grammar.
15. Implement the back end of the compiler which takes the three address code and produces the 8086
assembly language instructions that can be assembled and run using an 8086 assembler. The target
assembly instructions can be simple move, add, sub, jump etc
Exercise 1.
Aim: Design and implement a lexical analyzer using C language to recognize all valid tokens in the input
program. The lexical analyzer should ignore redundant spaces, tabs and newlines. It should also ignore
comments.
Algorithm
2 Check whether the string is identifier/ keyword /symbol by using the rules of identifier and keyword
Program
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#include<ctype.h>
char keywords[32][10] =
{"auto","break","case","char","const","continue","default",
"do","double","else","enum","extern","float","for","goto",
"if","int","long","register","return","short","signed",
"sizeof","static","struct","switch","typedef","union",
"unsigned","void","volatile","while"};
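int isKeyword(char buffer[]){
int i, flag = 0;
for(i = 0; i < 32; ++i){
if(strcmp(keywords[i], buffer) == 0){
flag = 1;
break;
}
}
return flag;
}
int main(){
char buffer[15], operators[] = "+-*/%=", specialch[] = ",;[]{}", buf[10];
FILE *fp;
int ch, j=0, k=0; /* ch is an int so that EOF can be detected */
fp = fopen("program.txt","r");
if(fp == NULL){
printf("error while opening the file\n");
exit(0);
}
do{
ch = fgetc(fp);
if(isalpha(ch)){ /* collect the letters of a keyword or identifier */
if(j < 14)
buffer[j++] = ch;
continue;
}
if(isdigit(ch)){ /* collect the digits of a constant */
if(k < 9)
buf[k++] = ch;
continue;
}
/* any other character, or end of file, ends the pending token */
if(j != 0){
buffer[j] = '\0';
j = 0;
if(isKeyword(buffer) == 1)
printf("%s is keyword\n", buffer);
else
printf("%s is identifier\n", buffer);
}
if(k != 0){
buf[k] = '\0';
k = 0;
printf("%s is constant\n", buf);
}
if(ch != EOF && ch != '\0' && strchr(operators, ch))
printf("%c is operator\n", ch);
else if(ch != EOF && ch != '\0' && strchr(specialch, ch))
printf("%c is special character\n", ch);
}while(ch != EOF);
fclose(fp);
return 0;
}
output
program.txt
int a,b,v=1
./test
int is keyword
a is identifier
, is special character
b is identifier
, is special character
v is identifier
= is operator
1 is constant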
Introduction to LEX
Lex is a program that generates lexical analyzers. It is commonly used with the YACC parser generator. A
lexical analyzer is a program that transforms an input stream into a sequence of tokens.
Lex reads a specification of token patterns and produces C source code that implements the lexical
analyzer.
● First, the programmer writes a specification programname.l in the Lex language. Then
the Lex compiler processes programname.l and produces a C program lex.yy.c.
● Finally, the C compiler compiles lex.yy.c and produces an executable program a.out.
● a.out is a lexical analyzer that transforms an input stream into a sequence of tokens.
%{
definition section
%}
%%
rules section
/*pattern action */
%%
main()
{
yylex();
}
To create C file:
lex <pgmname.l>
cc lex.yy.c -ll
To execute program:
./a.out
SAMPLE PROGRAMS
1. HELLO WORLD
%{
%}
%%
%%
void main()
{
yylex();
printf("Hello world\n");
}
%{
int alpha=0,nonalpha=0; /*Global variables*/
%}
/*Rule Section*/
%%
[a-zA-Z] alpha++;
. nonalpha++;
%%
int main()
{
// The function that starts the analysis
yylex();
%{
#include<stdio.h>
int lc=0, sc=0, tc=0, ch=0, nwords=0; /*Global variables*/
%}
/*Rule Section*/
%%
\n lc++; //line counter
([ ])+ sc++; //space counter
\t tc++; //tab counter
[^ \n\t]+ {nwords++; ch=ch+yyleng;}
%%
int main()
{
// The function that starts the analysis
yyin=fopen("input.txt","r");
yylex();
fclose(yyin);
printf("\nNo. of lines=%d", lc);
printf("\nNo. of spaces=%d", sc);
printf("\nNo. of tabs=%d", tc);
printf("\nNo. of other characters=%d", ch);
printf("\nNo. of words=%d", nwords);
return 0;
}
Exercise 2
Aim. Write a lex program to find out total number of vowels and consonants from the given input string.
Algorithm
Procedure:
1. In the definition section, declare and initialize the variables used to count the total number of
vowels and consonants.
2. In the rule section,
(i) Define the pattern that recognizes vowels. In the action part, increment the vowel count whenever a
character from the input string matches this pattern.
(ii) Define the pattern that recognizes consonants. In the action part, increment the consonant count
whenever a character from the input string matches this pattern.
3. In main(), call yylex(). Then print the total number of vowels and consonants in the input
string.
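The counting logic of the two rules above can be sketched in plain C. This is only an illustration, not the lex program itself; the helper name count_letters is hypothetical, and it assumes that characters which are not ASCII letters are skipped.

```c
#include <ctype.h>

/* Hypothetical helper: counts vowels and consonants in s.
   Non-letters are ignored, like input that matches neither pattern. */
void count_letters(const char *s, int *vowels, int *consonants)
{
    *vowels = 0;
    *consonants = 0;
    for (; *s != '\0'; s++) {
        int c = tolower((unsigned char)*s);
        if (c < 'a' || c > 'z')
            continue;            /* not a letter */
        if (c == 'a' || c == 'e' || c == 'i' || c == 'o' || c == 'u')
            (*vowels)++;         /* plays the role of the vowel pattern */
        else
            (*consonants)++;     /* any other letter is a consonant */
    }
}
```

In the lex version the same two branches become two rules, and the loop is supplied by yylex().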
Program
%{
%}
%%
main()
yylex();
Exercise 3
Aim. Write a lex program to display the number of lines, words and characters in an input text.
Algorithm
Procedure:
1. In the definition section, declare and initialize the variables used to count the total number of
characters, words, and lines in a given input file.
2. In the rule section,
(i) Define the pattern that recognizes words: one or more occurrences of any character other than the
word delimiters (space, tab and new line). In the action part, increment the word count by 1 and the
character count by yyleng (yyleng is the length of the matched text in yytext).
(ii) Define the pattern that recognizes a space. In the action part, increment the total character
count by 1.
(iii) Define the pattern that recognizes a new line. In the action part, increment the total line count as
well as the total character count by 1.
3. In main(), call yylex(). Then print the total number of words, characters, and lines in the given input file.
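As a rough cross-check of the three rules above, the same counts can be computed in plain C. This is a sketch under assumptions: the struct counts type and the function count_text are hypothetical names, and space, tab and new line are taken as the only word delimiters.

```c
/* Hypothetical result type for the sketch. */
struct counts { int lines, words, chars; };

/* Counts lines, words and characters in the NUL-terminated string s. */
struct counts count_text(const char *s)
{
    struct counts n = {0, 0, 0};
    int in_word = 0;                 /* 1 while inside a run of non-delimiters */
    for (; *s != '\0'; s++) {
        n.chars++;                   /* every character is counted */
        if (*s == '\n')
            n.lines++;
        if (*s == ' ' || *s == '\t' || *s == '\n') {
            in_word = 0;             /* a delimiter ends the current word */
        } else if (!in_word) {
            in_word = 1;             /* first character of a new word */
            n.words++;
        }
    }
    return n;
}
```

The lex version gets the same effect without the in_word flag, because the word pattern matches a whole run of non-delimiter characters at once.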
Program
%{
%}
%%
[ ] { chars++; }
%%
main()
{
yyin=fopen("input.txt","r");
yylex();
printf("\nWords=%d",words);
printf("\nCharacters = %d",chars);
printf("\nLines = %d",lines);
}
Sample Input & output:
Input.txt
This is bangalore
Words =3
Characters=17
Lines=1
Exercise 4.
Aim . Implement a Lexical Analyzer for a given program using Lex Tool.
Algorithm
1 A Lex program contains three sections: definitions, rules, and user subroutines. Each section must be
separated from the others by a line containing only the delimiter %%. The format is as follows:
definitions %% rules %% user_subroutines
2 In the definition section, the names make up the left column and their definitions make up the right
column. Any C statements should be enclosed in %{ ... %}. An identifier is defined such that its first
character is a letter and the remaining characters are alphanumeric.
3 In the rules section, the left column contains the pattern to be recognized in an input file to yylex(). The
right column contains the C program fragment executed when that pattern is recognized. The various
patterns are keywords, operators, new line character, number, string, identifier, beginning and end
of block, comment statements, preprocessor directive statements etc.
4 Each pattern may have a corresponding action, that is, a fragment of C source code to execute when
the pattern is matched.
5 When yylex() matches a string in the input stream, it copies the matched text to an external character
array, yytext, before it executes any actions in the rules section.
6 In the user subroutine section, the main routine calls yylex(). yywrap() is called at the end of the input to
decide whether more input follows.
7 The lex command uses the rules and actions contained in the file to generate a program, lex.yy.c, which
can be compiled with the gcc command. That program can then receive input, break the input into the
logical pieces defined by the rules in the file, and run the program fragments contained in the actions.
Program
Output
if(a==b)
keywords:if
separator:(
identifier: a
operator:==
identifier: b
separator:)
// Another code
%{
%}
identifier [a-zA-Z][a-zA-Z0-9]*
%%
int |
float |
char |
double |
while |
for |
struct |
typedef |
do |
if |
break |
continue |
void |
switch |
return |
else |
yytext ) ;}
\( ECHO ;
\<= |
\>= |
\< |
== |
%%
FILE * file ;
if (! file )
exit (0) ;
yyin = file ;
yylex ();
return (0) ;
int yywrap ()
return (1) ;
}
Input File:
var.c
#include<stdio.h>
#include<conio.h>
void main()
int a,b,c;
a=1;
b=2;
c=a+b;
printf("Sum:%d",c);
Output:
Exercise 5.
Aim Write a LEX Program to convert the substring abc to ABC from the given input string
Algorithm
1 A Lex program contains three sections: definitions, rules, and user subroutines. Each section must be
separated from the others by a line containing only the delimiter %%.
2 In the definition section, the names make up the left column and their definitions make up the right
column. Any C statements should be enclosed in %{ ... %}. An identifier is defined such that its first
character is a letter and the remaining characters are alphanumeric.
3 In the rules section, the left column contains the pattern to be recognized in an input file to yylex(). The
right column contains the C program fragment executed when that pattern is recognized. The various
patterns are keywords, operators, new line character, number, string, identifier, beginning and end of
block, comment statements, preprocessor directive statements etc.
4 Each pattern may have a corresponding action, that is, a fragment of C source code to execute when
the pattern is matched.
5 When yylex() matches a string in the input stream, it copies the matched text to an external character
array, yytext, before it executes any actions in the rules section.
6 In the user subroutine section, the main routine calls yylex(). yywrap() is called at the end of the input to
decide whether more input follows.
7 The lex command uses the rules and actions contained in the file to generate a program, lex.yy.c, which
can be compiled with the cc command. That program can then receive input, break the input into the
logical pieces defined by the rules in the file, and run the program fragments contained in the actions.
Program
%{
int i;
%}
%%
[a-zA-Z]* {
for(i=0;i+2<yyleng;i++) /* stop early enough that yytext[i+2] is inside the match */
{
if((yytext[i]=='a')&&(yytext[i+1]=='b')&&(yytext[i+2]=='c'))
{
yytext[i]='A';
yytext[i+1]='B';
yytext[i+2]='C';
}
}
printf("%s",yytext);
}
[\t]+ return 0;
.* {ECHO;}
\n {printf("%s",yytext);}
%%
main()
{
yylex();
}
int yywrap()
{
return 1;
}
Output
rabcdrgabchh
rABCdrgABChh
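The replacement done by the rule action can also be sketched as a stand-alone C function, which may help when checking the lex program's output. The function name upcase_abc is hypothetical, and the sketch assumes the string is writable and NUL-terminated.

```c
#include <string.h>

/* Hypothetical helper: replaces every occurrence of "abc" in s with "ABC",
   in place. The length of the string does not change. */
void upcase_abc(char *s)
{
    char *p = s;
    while ((p = strstr(p, "abc")) != NULL) {
        memcpy(p, "ABC", 3);   /* overwrite the three matched characters */
        p += 3;                /* continue searching after the replacement */
    }
}
```

Using strstr avoids indexing past the end of the text, which is the main risk in a hand-written loop over yytext.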
Introduction to YACC
A parser generator is a program that takes as input a specification of a syntax and produces as output a
procedure for recognizing that language. Such programs are also called compiler compilers.
Yacc is a compiler compiler. It is the standard parser generator for the Unix operating system. An open
source program, yacc generates code for the parser in the C programming language.
Input File:
/* definitions */
....
%%
/* rules */
....
%%
/* auxiliary routines */
....
The auxiliary routines part is only C code. It can also contain the main() function definition if the parser is
going to be run as a program.
If yylex() is not defined in the auxiliary routines sections, then it should be included:
#include "lex.yy.c"
Syntax Of Yacc Program:
%{ /* definition section */ %}
%%
%%
main()
{
yyparse();
}
%{
# include "y.tab.h"
definition section
%}
%%
rules section
/*pattern action */
%%
yacc -d <pgmname.y>
lex <pgmname.l>
./a.out
First, we need to specify all pattern matching rules for lex (bas.l) and grammar rules for yacc (bas.y).
yacc -d bas.y
If called with the -v option, Yacc produces as output a file y.output containing a textual description of
the LALR(1) parsing table used by the parser. This is useful for tracking down how the parser solves
conflicts.
lex bas.l
This will create lex.yy.c
Yacc reads the grammar descriptions in bas.y and generates a syntax analyzer (parser), that
includes function yyparse, in file y.tab.c. Included in file bas.y are token declarations. The -d
option causes yacc to generate definitions for tokens and place them in file y.tab.h. Lex reads the
pattern descriptions in bas.l, includes file y.tab.h, and generates a lexical analyzer, that includes function
yylex, in file lex.yy.c.
Finally, the lexer and parser are compiled and linked together to create executable bas.exe.
From main we call yyparse to run the compiler. Function yyparse automatically calls yylex to
obtain each token.
E.g.
The tokens for INTEGER and VARIABLE are utilized by yacc to create #defines in y.tab.h for use in lex.
Exercise 6.
Aim Generate a YACC specification to recognize a valid arithmetic expression that uses operators +, – ,
*,/ and parenthesis.
Algorithm :
1 START
2 Read the given input expression .
5 STOP
Program
LEX
%{
#include"y.tab.h"
extern int yylval;
%}
%%
[\t]+ ;
\n {return 0;}
. {return yytext[0];}
%%
YACC
%{
#include<stdio.h>
%}
%token NUMBER ID
%left '+' '-'
%%
|'-'NUMBER
|'-'ID
|'('expr')'
|NUMBER
|ID
%%
main()
{
printf("Enter the expression\n");
yyparse();
printf("\nExpression is valid\n");
exit(0);
}
int yyerror(char *s)
{
printf("\nExpression is invalid");
exit(0);
}
Output
Enter the expression
a+b
Expression is valid
Exercise 7.
Aim Generate a YACC specification to recognize a valid identifier which starts with a letter followed by
any number of letters or digits.
Algorithm
1START
5 STOP
Program
Procedure:
Yacc
2. In Rules section, Write the grammar rules to recognize the valid variable which starts with a letter,
followed by any number of letters or digits.
Grammar G:
id -> id LETTER | id NUMBER | LETTER
3. In the user subroutine section, call yyparse() to check the syntax of the input variable name. If it is valid,
print that the identifier is valid; if it is invalid, the parser calls yyerror() automatically.
The user can also redefine yyerror().
Lex
1. The lexer provides the tokens for the parser. In the definition section, include "y.tab.h", which contains the
token codes generated by the parser.
(a) If the input variable starts with a digit [0-9] rather than a letter a-z, return the
token code for a digit (which makes it an invalid variable name).
(b) If the variable name starts with a letter [a-z], return the token code LETTER.
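The rule that the lexer and grammar enforce together can be written as a small C check, which is handy for predicting what the parser should accept. This is a sketch only: the function name is_valid_identifier is hypothetical, and it accepts both upper and lower case letters.

```c
#include <ctype.h>

/* Hypothetical helper: returns 1 if s is a letter followed by any number
   of letters or digits, and 0 otherwise (including the empty string). */
int is_valid_identifier(const char *s)
{
    if (!isalpha((unsigned char)*s))
        return 0;                        /* must start with a letter */
    for (s++; *s != '\0'; s++) {
        if (!isalnum((unsigned char)*s))
            return 0;                    /* only letters and digits may follow */
    }
    return 1;
}
```

For example, a1b2 is accepted, while 9abc and x_y are rejected.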
Yacc Program
%{
%}
iden : LETTER
|iden NUMBER
|iden LETTER
%%
int main()
yyparse();
return 0;
printf("\n%s\n",s);
exit(1);
Lex Program
%{
# include "y.tab.h"
%}
%%
%%
Sample I/O:
Exercise 8.
Algorithm
1START
variables .
be printed .
6 STOP
Program
Yacc program
%{
#include<stdio.h>
int flag=0;
%}
%token NUMBER
%%
ArithmeticExpression: E{
printf("\nResult=%d\n",$$);
return 0;
};
E:E'+'E {$$=$1+$3;}
|E'-'E {$$=$1-$3;}
|E'*'E {$$=$1*$3;}
|E'/'E {$$=$1/$3;}
|E'%'E {$$=$1%$3;}
|'('E')' {$$=$2;}
| NUMBER {$$=$1;}
%%
void main()
{
yyparse();
if(flag==0)
printf("\n\n");
}
void yyerror()
{
printf("\nInvalid\n\n");
flag=1;
}
Lex program
%{
/* Definition section */
#include<stdio.h>
#include "y.tab.h"
%}
/* Rule Section */
%%
[0-9]+ {
yylval=atoi(yytext);
return NUMBER;
}
[\t] ;
[\n] return 0;
. return yytext[0];
%%
int yywrap()
{
return 1;
}
Output
yacc -d calc.y
lex calc.l
./a.out
2+3
Result=5
Exercise 9
Aim Write a program to find ε – closure of all states of any given NFA with ε transition.
Algorithm
1 START
7 While iterator has not crossed the last element of the list t
enfa
11 end function
12 STOP
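Since several steps of the algorithm are missing here, the following is a minimal sketch of one common way to compute an ε-closure, using an explicit stack instead of the recursive findclosure() call that appears in the program. The names MAX_STATES and epsilon_closure and the adjacency-matrix representation are assumptions made for the illustration.

```c
#define MAX_STATES 10

/* eps[i][j] != 0 means there is an epsilon transition from state i to state j.
   On return, closure[s] is 1 for every state s in the epsilon-closure of start. */
void epsilon_closure(int n, int eps[MAX_STATES][MAX_STATES],
                     int start, int closure[MAX_STATES])
{
    int stack[MAX_STATES];   /* states whose transitions are still to be followed */
    int top = 0, i;

    for (i = 0; i < n; i++)
        closure[i] = 0;
    closure[start] = 1;      /* every state is in its own closure */
    stack[top++] = start;

    while (top > 0) {
        int s = stack[--top];
        for (i = 0; i < n; i++) {
            if (eps[s][i] && !closure[i]) {
                closure[i] = 1;      /* newly reached through epsilon moves */
                stack[top++] = i;    /* each state is pushed at most once */
            }
        }
    }
}
```

Calling the function once per state gives the ε-closure of all states, as the aim requires.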
Program
Epsilon - closure
struct node
int st ;
};
void main ()
int i ,j ,k ,m ,t , n ;
getchar () ;
]\n") ;
alphabet [ i ]= getchar () ;
getchar () ;
]\n", notransition ) ;
printf (" NOTE : - [ States number must be greater than zero ]\n"
);
insert_trantbl (r ,c , s ) ;
printf ("\n")
c =0;
buffer [ j ]=0;
e_closure [ i ][ j ]=0;
findclosure (i , i ) ;
print_e_closure ( i ) ;
int i ;
if( buffer [ x ])
return ;
buffer [ x ]=1;
NULL )
int j ;
j = findalpha ( c ) ;
if( j ==999)
exit (0) ;
temp - > st = s ;
transition [ r ][ j ]= temp ;
int i ;
return i ;
return (999) ;
int j ;
printf ("{") ;
printf ("}") ;
Exercise 10
Aim Write a program to find First and Follow of any given grammar.
Algorithm
1 Start
3 1. If X is a terminal, FIRST(X) = {X}.
5 3. If X is a non-terminal, and X -> Y1 Y2 ... Yk is a production, and e is in all of FIRST(Y1), ..., FIRST(Yk),
then add e to FIRST(X).
6 4. If X is a non-terminal, and X -> Y1 Y2 ... Yk is a production, then add a to FIRST(X) if for some i, a
is in FIRST(Yi), and e is in all of FIRST(Y1), ..., FIRST(Yi-1).
8 1. If $ is the input end-marker, and S is the start symbol, then $ is an element of FOLLOW(S).
10 3. If there is a production A -> aB, or a production A -> aBb where e is an element of FIRST(b), then
FOLLOW(A) is a subset of FOLLOW(B).
11 Stop
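The FIRST rules above can be applied repeatedly until no set changes. The sketch below shows that fixed-point loop. It rests on several assumptions that are not part of the manual's program: productions are strings of the form "X=rhs", non-terminals are single upper case letters, '#' stands for e (epsilon), each set holds at most 31 symbols, and the names add_sym and compute_first are hypothetical.

```c
#include <string.h>
#include <ctype.h>

#define EPS '#'   /* assumed marker for the empty string e */

/* Adds c to set (a NUL-terminated string) if absent; returns 1 if it was added. */
static int add_sym(char *set, char c)
{
    size_t len;
    if (strchr(set, c) != NULL)
        return 0;
    len = strlen(set);
    set[len] = c;
    set[len + 1] = '\0';
    return 1;
}

/* Fills first[X - 'A'] with FIRST(X) for every non-terminal X. */
void compute_first(const char *prod[], int n, char first[26][32])
{
    int changed = 1, k;
    for (k = 0; k < 26; k++)
        first[k][0] = '\0';
    while (changed) {                       /* repeat until nothing is added */
        changed = 0;
        for (k = 0; k < n; k++) {
            char *fx = first[prod[k][0] - 'A'];
            const char *y = prod[k] + 2;    /* skip "X=" */
            int all_eps = 1;                /* 1 while Y1..Yi-1 all derive e */
            for (; *y != '\0' && all_eps; y++) {
                if (*y == EPS)
                    break;                  /* production X -> e */
                if (!isupper((unsigned char)*y)) {
                    changed |= add_sym(fx, *y);   /* terminal: FIRST is itself */
                    all_eps = 0;
                } else {
                    const char *fy = first[*y - 'A'];
                    const char *p;
                    for (p = fy; *p != '\0'; p++)
                        if (*p != EPS)
                            changed |= add_sym(fx, *p);
                    if (strchr(fy, EPS) == NULL)
                        all_eps = 0;        /* Yi cannot vanish, stop here */
                }
            }
            if (all_eps)
                changed |= add_sym(fx, EPS);  /* every Yi can derive e */
        }
    }
}
```

Unlike the recursive first() in the program, this iteration also terminates on left-recursive grammars, because a production that adds nothing new simply leaves the sets unchanged.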
Program
#include<stdio.h>
#include<math.h>
#include<string.h>
#include<ctype.h>
#include<stdlib.h>
int n,m=0,p,i=0,j=0;
char a[10][10],f[10];
void follow(char c);
void first(char c);
int main(){
int i,z;
char c,ch;
//clrscr();
scanf("%d",&n);
for(i=0;i<n;i++)
scanf("%s%c",a[i],&ch);
do{
m=0;
scanf("%c",&c);
first(c);
printf("First(%c)={",c);
for(i=0;i<m;i++)
printf("%c",f[i]);
printf("}\n");
strcpy(f," ");
//flushall();
m=0;
follow(c);
printf("Follow(%c)={",c);
for(i=0;i<m;i++)
printf("%c",f[i]);
printf("}\n");
printf("Continue(0/1)?");
scanf("%d%c",&z,&ch);
}while(z==1);
return(0);
}
void first(char c)
{
int k;
if(!isupper(c))
f[m++]=c;
for(k=0;k<n;k++)
{
if(a[k][0]==c)
{
if(a[k][2]=='$')
follow(a[k][0]);
else if(islower(a[k][2]))
f[m++]=a[k][2];
else first(a[k][2]);
}
}
}
void follow(char c)
{
if(a[0][0]==c)
f[m++]='$';
for(i=0;i<n;i++)
{
for(j=2;j<strlen(a[i]);j++)
{
if(a[i][j]==c)
{
if(a[i][j+1]!='\0')
first(a[i][j+1]);
if(a[i][j+1]=='\0' && c!=a[i][0])
follow(a[i][0]);
}
}
}
}
Exercise 11
Aim Design and implement a recursive descent parser for a given grammar.
Algorithm
1 START
productions .
backtrack .
9 STOP
10 Procedure parser
11 from j =1 to t , repeat
13 from i -1 to k repeat
15 call procedure xi () ;
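The grammar for this exercise is not stated, so the sketch below assumes the usual expression grammar that matches the procedure names in the program: E -> T E', E' -> + T E' | e, T -> F T', T' -> * F T' | e, F -> ( E ) | id, where id is a single letter or digit. The names parse, F, pos and failed are hypothetical; F appears to play the role of check(), and pos and failed the roles of count and flag.

```c
#include <ctype.h>

static const char *input;  /* expression being parsed */
static int pos;            /* index of the next unread character */
static int failed;         /* set to 1 when a syntax error is found */

static void E(void);

/* F -> ( E ) | id */
static void F(void)
{
    if (isalnum((unsigned char)input[pos])) {
        pos++;
    } else if (input[pos] == '(') {
        pos++;
        E();
        if (input[pos] == ')')
            pos++;
        else
            failed = 1;    /* missing closing parenthesis */
    } else {
        failed = 1;        /* an operand was expected */
    }
}

/* T' -> * F T' | e */
static void Tprime(void) { if (input[pos] == '*') { pos++; F(); Tprime(); } }
/* T -> F T' */
static void T(void) { F(); Tprime(); }
/* E' -> + T E' | e */
static void Eprime(void) { if (input[pos] == '+') { pos++; T(); Eprime(); } }
/* E -> T E' */
static void E(void) { T(); Eprime(); }

/* Returns 1 if s is a complete sentence of the assumed grammar, else 0. */
int parse(const char *s)
{
    input = s;
    pos = 0;
    failed = 0;
    E();
    return !failed && input[pos] == '\0';
}
```

Each non-terminal becomes one procedure, and the choice between alternatives is made by looking at the next input character only, so no backtracking is needed for this grammar.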
void Tprime () ;
void Eprime () ;
void E () ;
void check () ;
void T () ;
int main ()
count = 0;
flag = 0;
E () ;
else
}
}
void E ()
T () ;
Eprime () ;
void T ()
check () ;
Tprime () ;
void Tprime ()
count ++;
check () ;
Tprime () ;
void check ()
count ++;
count ++;
E () ;
count ++;
else
flag = 1;
else
flag = 1;
void Eprime ()
count ++;
T () ;
Eprime () ;
Output
Exercise 12
1 START
3 Loop forever .
5 case action o T [s , a ]
11 Pop 2+[ beta ] symbols of the stack . At this point , top of the stack
14 STOP
Program
int k =0 , z =0 , i =0 , j =0 , c =0;
void check () ;
int main ()
c = strlen(a);
stk[i] = a[j];
a[j] = ' ';
a[j+1] = ' ';
check();
else
{
stk[i] = a[j];
a[j] = ' ';
check();
void check()
stk[z] = 'E';
printf("\n$%s\t%s$\t%s", stk, a, ac);
j++;
if(stk[z] == 'E' && stk[z+1] == '+' && stk[z+2] == 'E')
stk[z] = 'E';
printf("\n$%s\t%s$\t%s", stk, a, ac);
i = i-2;
}
if(stk[z] == 'E' && stk[z+1] == '*' && stk[z+2] == 'E')
stk[z] = 'E';
printf("\n$%s\t%s$\t%s", stk, a, ac);
i = i-2;
if(stk[z] == '(' && stk[z+1] == 'E' && stk[z+2] == ')')
stk[z] = 'E';
printf("\n$%s\t%s$\t%s", stk, a, ac);
i = i-2;
}
Exercise 13
Algorithm
1 Start
7 end
Program
void input () ;
void output () ;
void change (int p , char * res ) ;
void constant () ;
struct expr
int flag ;
} arr [10];
int n ;
void main ()
input () ;
constant () ;
output () ;
void input ()
int i ;
void constant ()
int i ;
if(isdigit(arr[i].op1[0]) && isdigit(arr[i].op2[0]) || strcmp(arr[i].op,"=") == 0)
op = arr[i].op[0];
switch(op)
case '+':
break;
case '-':
break;
case '*':
break;
case '/':
res = op1 / op2;
break;
case '=':
res = op1;
break;
arr[i].flag = 1; /* eliminate expr and replace any operand below that uses it */
change(i, res1);
void output ()
int i =0;
if (! arr [ i ]. flag )
printf ("\n%s %s %s %s", arr [ i ]. op , arr [ i ]. op1 , arr [ i ]. op2 , arr [ i ]. res ) ;
{
int i ;
Output
Exercise 14
Algorithm
1START
4 Else find the '/' operator and the operands on the left and right side of that operator, compute it and
assign it to a new variable.
5 Repeat the above step for all the operators in the order '*', '+', '-'
7 STOP
Program
void findopr () ;
void explore () ;
struct exp
int pos ;
char op ;
} k [15];
void main ()
findopr () ;
explore () ;
}
void findopr ()
k[j].pos = i;
k[j++].op = ':';
k[j].pos = i;
k[j++].op = '/';
k[j].pos = i;
k[j++].op = '*';
k[j].pos = i;
k[j++].op = '+';
k[j].pos = i;
k[j++].op = '-';
void explore()
i = 1;
while(k[i].op != '\0')
fleft ( k [ i ]. pos ) ;
fright ( k [ i ]. pos ) ;
printf ("\n") ;
i ++;
fright ( -1) ;
if( no ==0)
exit (0) ;
}
x--;
while(x != -1 && str[x] != '+' && str[x] != '*' && str[x] != '=' && str[x] != '\0' && str[x] != '-' && str[x] != '/' && str[x] != ':')
left[w] = '\0';
flag = 1;
x--;
x++;
while(x != -1 && str[x] != '+' && str[x] != '*' && str[x] != '\0' && str[x] != '=' && str[x] != ':' && str[x] != '-' && str[x] != '/')
right[w] = '\0';
str[x] = '$';
flag = 1;
x++;
Output
Exercise 15
Aim Implement the back end of the compiler which takes the three address code and produces the 8086
assembly language instructions that can be assembled and run using an 8086 assembler. The target
assembly instructions can be simple move, add, sub, jump etc
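As a starting point for this exercise, the sketch below translates a single three address statement of the form x=y+z or x=y-z into 8086 instructions. It is an illustration under assumptions, not a full back end: the function name tac_to_8086 is hypothetical, operands are single-character names assumed to be word variables, and only + and - are handled.

```c
#include <stdio.h>

/* Hypothetical helper: translates one statement "x=y+z" or "x=y-z" into
   8086 assembly text written to out. Returns 0 on success, -1 if the
   statement does not have this form or uses another operator. */
int tac_to_8086(const char *tac, char *out, size_t outsz)
{
    char res, a, op, b;
    const char *mnemonic;

    /* expected shape: result = operand1 operator operand2 */
    if (sscanf(tac, " %c = %c %c %c", &res, &a, &op, &b) != 4)
        return -1;

    switch (op) {
    case '+': mnemonic = "ADD"; break;
    case '-': mnemonic = "SUB"; break;
    default:  return -1;         /* other operators are left out of the sketch */
    }

    /* load the first operand, combine with the second, store the result */
    snprintf(out, outsz, "MOV AX, %c\n%s AX, %c\nMOV %c, AX\n",
             a, mnemonic, b, res);
    return 0;
}
```

A complete solution would read the three address code line by line and would also need to handle copies, multiplication, division and jumps.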
Algorithm
Program
Lex Program Exercises
%{
definition section
%}
%%
rules section
/*pattern action */
%%
yylex();
To create C file:
lex <pgmname.l>
cc lex.yy.c -ll
To execute program:
./a.out
1. Aim: Program to count the number of vowels and consonants in a given string.
Procedure:
1. In the definition section, declare and initialize the variables used to count the total
number of vowels and consonants.
2. In the rule section,
(i) Define the pattern that recognizes vowels. In the action part, increment the vowel count whenever a
character from the input string matches this pattern.
(ii) Define the pattern that recognizes consonants. In the action part, increment the consonant count
whenever a character from the input string matches this pattern.
3. In main(), call yylex(). Then print the total number of vowels and consonants in the input
string.
Program:
%{
%}
%%
%%
main()
yylex();
}
Sample Input & output:
2. Aim: Program to count the number of characters, words, spaces and lines in a given input file.
Procedure:
1. In the definition section, declare and initialize the variables used to count the total
number of characters, words, spaces and lines in a given input file.
2. In the rule section,
(i) Define the pattern that recognizes words: one or more occurrences of any character other than the
word delimiters (space, tab and new line). In the action part, increment the word count by 1 and the
character count by yyleng (yyleng is the length of the matched text in yytext).
(ii) Define the pattern that recognizes a space. In the action part, increment the space count as well as
the total character count by 1.
(iii) Define the pattern that recognizes a new line. In the action part, increment the total line count as
well as the total character count by 1.
Program:
%{
%}
%%
[ ] { spaces++; chars++; }
%%
main()
{
yyin=fopen("input.txt","r");
yylex();
fclose(yyin);
printf("\nWords=%d",words);
printf("\nCharacters = %d",chars);
printf("\nSpaces = %d",spaces);
printf("\nLines = %d",lines);
}
Input.txt
This is bangalore
Words =3
Characters=17
Spaces=2
Lines=1
Procedure:
1. In the definition section, declare and initialize the variables used to count the total
number of positive and negative integers.
2. In the rule section,
(i) Define the pattern that recognizes positive integers, i.e. an optional '+' sign followed by one or more
digits 0-9. In the action part, increment the positive count whenever a number from the input data
matches this pattern.
(ii) Define the pattern that recognizes negative integers, i.e. a '-' sign followed by one or more
digits 0-9. In the action part, increment the negative count whenever a number from the input data
matches this pattern.
3. In main(), call yylex(). Then print the total number of positive and negative integers in the
input string.
Program:
%{
int pos_ints=0,neg_ints=0;
%}
%%
"+"?[0-9]+ { pos_ints++; }
"-"[0-9]+ { neg_ints++; }
%%
main()
yylex();
Procedure:
1. In the definition section, declare and initialize the variables used to count the total
number of positive and negative fractions.
2. In the rule section,
(i) Define the pattern that recognizes positive fractions, i.e. an optional '+' sign and zero or more
digits, followed by a decimal point and one or more digits (0-9). In the action part, increment the
positive count whenever a number from the input data matches this pattern.
(ii) Define the pattern that recognizes negative fractions, i.e. a '-' sign and zero or more
digits, followed by a decimal point and one or more digits (0-9). In the action part, increment the
negative count whenever a number from the input data matches this pattern.
3. In main(), call yylex(). Then print the total number of positive and negative fractions in the
input string.
Program:
%{
int pos_fracts=0,neg_fracts=0;
%}
%%
"+"?[0-9]*[.][0-9]+ { pos_fracts++; }
"-"[0-9]*[.][0-9]+ { neg_fracts++; }
%%
main()
4. Aim: Program to count the numbers of comment lines in a given C program. Also eliminate them and
copy that program into separate file.
Procedure:
1. In the definition section, declare and initialize the variable used to count the total
number of comment lines.
2. In the rule section,
(i) If the input contains "/*", change the state of LEX to COMMENT.
(ii) If "*/" is encountered in the COMMENT state, return to the default state of LEX (0) and increment the
comment count.
(iii) If any other character or a new line is encountered in the COMMENT state, remain in that state and do
not write it to the output file.
(iv) If any other character or a new line is encountered in LEX's default state, write it to the output file.
3. In main(), open the input file using the yyin pointer and the output file using the yyout pointer. Call
yylex(). Then print the total number of comment lines.
Program:
%{
int c=0;
%}
%s COMMENT
%%
<COMMENT>. ;
<COMMENT>\n ;
. ECHO;
\n ECHO;
%%
main()
{
yyin=fopen("input.c","r");
yyout=fopen("output.c","w");
yylex();
fclose(yyin);
fclose(yyout);
}
Sample I/O:
Input.c:
main()
int a; /* declaration */
a=10;
printf(“\n a=%d”,a);
Output.c:
#include<stdio.h>
main()
int a;
a=10;
printf(“\n a=%d”,a);
Procedure:
1. In definition section, declare and initiate the variables, which are used to count the total
number of scanf and printf statements in a C program. Define the file pointer also to access the C
program.
2. In rule section,
(i)Define the pattern which is used to recognize scanf. (In action part) , If the input file consists of scanf
then increment the count for scanf. And also printing it in the output file by replacing scanf with readf.
(ii) Define the pattern which is used to recognize printf. (In action part) , If the input file consists of printf
then increment the count for printf. And also printing it in the output file by replacing printf with writef.
(iii)Define the pattern to read any other characters except printf and scanf and write them into the
output file without changes.
3. Define main() to take the input and output file names as command line arguments. In main(), read the
input file using the yyin pointer and write the output file using yyout, then call yylex(). Then print the
total number of scanf and printf statements.
Program:
%{
int s=0,p=0;
%}
%%
.|\n { ECHO; }
%%
main(int argc, char *argv[])
{
yyin=fopen(argv[1],"r");
yyout=fopen(argv[2],"w");
yylex();
}
Sample I/O:
Input.c:
#include<stdio.h>
main()
int a,b;
scanf("%d",&a);
scanf("%d",&b);
printf("%d",a);
printf("%d",b);
}
Output.c:
#include<stdio.h>
main()
int a,b;
readf("%d",&a);
readf("%d",&b);
writef("%d",a);
writef("%d",b);
6. Aim: Program to recognize a valid arithmetic expression and identify the identifiers and operators
present. Print them separately.
Procedure:
1. In rule section,
(i)Define the pattern, which is used to recognize identifiers (which are named using alphabets and digits)
for one or more occurrences. (In action part) , If the input expression consists of identifier name then
print the yytext(which is currently having the identifier name.)
(ii) Define the pattern which is used to recognize operator (+,-,*,/,++,--,=,%). (In action part) , If the input
expression consists of operator then print the yytext.(which is currently having the operator).
(iii)Define the pattern to recognize digits (one or more occurrences) In action part print it as number or
constant.
(iv)Define the pattern to identify the invalid input such as identifier name started with a digit (78df –
invalid identifier). In action part set the valid flag as enabled.
(v)Define the pattern to recognize invalid expression which is ended with operator.(a+f+ -invalid
expression). In action part set the valid flag as enabled.
Program:
%{
%}
%%
[0-9]+[a-zA-Z][a-zA-Z0-9]+ {valid=1;}
[-+*/]\n {valid=1;}
\n return;
%%
main()
yylex();
Sample I/O:
enter an expression: a+40
id = a
operator=+
number=40
valid!
Procedure:
1. In definition section, declare and initiate the variable, which is to be used as flag to denote whether
an entered sentence is simple or compound.
2. In rule section,
(i)Define the pattern which is used to recognize any number of characters, followed by space ,followed
by “and” or “or” or “because”, followed by space ,followed by any number of characters. (In action part)
, from the input statement if there any match to be found with this pattern then set the flag.
(ii)Define the pattern which will read any character or new line which are all not matched with the
previous pattern.
3. In main(), call yylex(). If the flag is set, report the sentence given by the user as a compound
sentence; otherwise report it as a simple sentence.
Program:
%{
# include <stdio.h>
int c=0;
%}
%%
.|\n ;
%%
int main()
{
yylex();
if(c==0)
printf("\nSimple Sentence.");
else
printf("\nCompound Sentence.");
return 0;
}
Sample I/O:
Compound statement
8.Aim: Program to recognize and count the number of identifiers in a given input file.
Procedure:
1. In definition section, declare and initiate the variable, which is used to count the total
number of identifiers.
2.Define a new state IDENTIFIER
3 In rule section,
(i) Define the pattern that recognizes a declaration (in the declaration part of the program
itself). If a statement in the given input program has int, char, float, double, long or unsigned, then
change the state of LEX from the default state to the IDENTIFIER state in the action part.
(a) (The data type is already identified in the previous step, e.g. int a,b;.) Define a pattern to recognize one
or more spaces, followed by a letter (an identifier name must start with a letter), followed by zero
or more occurrences of a letter or digit, followed by a comma (,). If this pattern matches the input
given by the user, increment the identifier count.
(b) Define a pattern that recognizes the same text as the previous step but ends with a semicolon (;).
On a match, increment the identifier count and return to the default state (0) of LEX.
(c) Define a pattern to read the input from the file that is not matched by the
previous 2 patterns, i.e. a pattern to recognize any character or new line character.
4. In main(), open the input file using the yyin pointer and then call yylex(). Then print the total number of
identifiers.
Program:
%{
int c=0;
%}
%s IDEN
%%
<IDEN>[a-zA-Z][a-zA-Z0-9]+? {c++;}
<IDEN>"["[0-9]+"]" ;
<IDEN>\n ;
.|\n ;
%%
main()
{
yyin=fopen("input.c","r");
yylex();
fclose(yyin);
}
Sample I/O:
Input.c:
#include<stdio.h>
main()
{
int a[10],b,c;
c=a=10;
b=a+c;
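Structure of a Yacc program:
%{ /* definition section */ %}
%%
/* rules section */
%%
main()
{
yyparse();
}
Structure of a Lex program used with Yacc:
%{
#include "y.tab.h"
/* definition section */
%}
%%
/* rules section */
/* pattern action */
%%
Commands to compile and run:
yacc -d <pgmname.y>
lex <pgmname.l>
cc y.tab.c lex.yy.c -ll
./a.out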
1. Aim: Program to test the validity of a simple expression involving operators +, -, * and /.
Procedure:
Yacc
1. In the definition section, define the symbolic tokens NUMBER and ID.
2. Define the precedence of the operators (+, -, *, /) to be used in expressions as left associative.
3. In the rules section, write the grammar rules to recognize an expression involving addition (+),
subtraction (-), multiplication (*) and division (/).
Grammar G:
E → E + E | E - E | E * E | E / E | NUMBER | ID
4. In the user subroutine section, call yyparse() to check the syntax of the input expression. If it is valid,
print the expression as valid; if it is invalid, yyerror() is called automatically by the parser.
The user can also redefine yyerror().
Lex
1. The lexer provides the tokens for the parser. In the definition section, include "y.tab.h", which contains the
token codes generated by the parser.
2. In the rules section, write patterns for the following:
(a) If the input expression has digits [0-9], match them with the token NUMBER (terminal
defined in the grammar for the parser).
(b) If the input has an identifier name, match it with the pattern for identifiers and return the token ID
(terminal defined in the grammar).
(c) If the expression has white space, ignore it (take no action).
(d) A newline denotes the end of the input tokens and tells the parser not to read more.
(e) Write the last rule with a pattern that returns any character not otherwise handled to the parser as a
single-character token.
Program:
Yacc Program:
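%{
#include <stdio.h>
#include <stdlib.h>
int yylex(void); int yyerror(char *s);
%}
%token NUMBER ID
%left '+' '-'
%left '*' '/'
%%
exp : exp '+' exp | exp '-' exp
| exp '*' exp | exp '/' exp
| NUMBER
| ID
;
%%
int main()
{
yyparse();
printf("\nvalid exprn!\n");
return 0;
}
int yyerror(char *s)
{ printf("\n %s\n",s); exit(1); }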
Lex Program
%{
#include "y.tab.h"
%}
%%
[0-9]+ { return NUMBER; }
[a-zA-Z][a-zA-Z0-9]* { return ID; }
[ \t] ;
\n { return 0; }
. { return yytext[0]; }
%%
Sample I/O:
valid exprn!
2. Aim: Program to recognize nested IF control statements and display the number of levels of
nesting.
Procedure:
Yacc:
1. In the definition section, define an extern variable to count the total number of levels of nested if
conditions. Then define the symbolic tokens EXPR, STMT.
Grammar G:
3. In the user subroutine section, call yyparse() to check the syntax of the input statements. If it is valid,
print the statement as valid; if it is invalid, yyerror() is called. The user can also redefine
yyerror().
Lex:
1. The lexer provides the tokens for the parser. In the definition section, include "y.tab.h", which contains the
token codes generated by the parser.
(a) If the input contains "if", return the token code for "if".
(b) If the input statement has a logical condition within ( ), match it with the token
cond (terminal defined in the grammar for the parser).
(c) If the expression has white space, ignore it (take no action).
(d) If any other arithmetic expression appears within { }, match it with STMT1.
Yacc Program
%{
#include <stdio.h>
#include <stdlib.h>
%}
%token IF COND OB CB
%%
%%
int main()
{
yyparse();
return 0;
}
int yyerror(char *s)
{
printf("\n%s\n",s);
exit(1);
}
Lex Program
%{
#include "y.tab.h"
int l1=0,l2=0;
%}
%s condition
%%
[^"if""("")""{""}"]* ;
%%
3. Aim: Program to recognize a valid arithmetic expression that uses operators +, -, * and /.
Procedure:
Yacc
1. In the definition section, define the symbolic tokens NUMBER and ID.
2. Define the precedence of the operators (+, -, *, /) to be used in expressions as left or non-associative.
3. In the rules section, write the grammar rules to recognize an expression involving addition (+),
subtraction (-), multiplication (*) and division (/).
Grammar G:
E → E + E | E - E | E * E | E / E | - E | ( E ) | NUMBER | id
4. In the user subroutine section, call yyparse() to check the syntax of the input expression. If it is valid,
print the expression as valid; if it is invalid, yyerror() is called automatically by the parser.
The user can also redefine yyerror().
Lex
1. The lexer provides the tokens for the parser. In the definition section, include "y.tab.h", which contains the
token codes generated by the parser.
2. In the rules section, write patterns for the following:
(a) If the input expression has an identifier name starting with a letter, match it with the terminal
id and return the token code for id.
(b) If the input expression has digits [0-9], match them with the token NUMBER (terminal
defined in the grammar for the parser).
(c) If the expression has white space, ignore it (take no action).
(d) A newline denotes the end of the input tokens and tells the parser not to read more.
(e) Write the last rule with a pattern that matches any character not matched by the above rules.
Return the character to yacc (this returns the operators).
Yacc Program
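%{
#include <stdio.h>
#include <stdlib.h>
int yylex(void); int yyerror(char *s);
%}
%token NUMBER ID
%left '+' '-'
%left '*' '/'
%nonassoc UNARY
%%
exp : exp '+' exp | exp '-' exp
| exp '*' exp | exp '/' exp
| '-' exp %prec UNARY   /* assumed use of the UNARY precedence */
| '(' exp ')'
| NUMBER | ID ;
%%
int main()
{
yyparse();
printf("\nValid expression!\n");
return 0;
}
int yyerror(char *s)
{ printf("\n %s\n",s); exit(1); }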
Lex Program
%{
#include "y.tab.h"
%}
%%
[a-zA-Z][a-zA-Z0-9]* { return ID; }
[0-9]+ { return NUMBER; }
[ \t] ;
\n { return 0; }
. { return yytext[0]; }
%%
Sample I/O:
4. Aim: Program to recognize a valid variable, which starts with a letter, followed by any number of
letters or digits.
Procedure:
Yacc
1. In the definition section, define the symbolic tokens LETTER and NUMBER.
2. In the rules section, write the grammar rules to recognize a valid variable, which starts with a letter,
followed by any number of letters or digits.
Grammar G:
id → id LETTER | id NUMBER | LETTER
3. In the user subroutine section, call yyparse() to check the syntax of the input variable name. If it is valid,
print the variable name as valid; if it is invalid, yyerror() is called automatically by the parser.
The user can also redefine yyerror().
Lex
1. The lexer provides the tokens for the parser. In the definition section, include "y.tab.h", which contains the
token codes generated by the parser.
2. In the rules section, write patterns for the following:
(a) If the input character is a digit [0-9], return the token code NUMBER (a variable name that starts
with a digit is therefore rejected by the grammar).
(b) If the input character is a letter [a-zA-Z], return the token code LETTER.
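The rule being tested (a letter followed by any number of letters or digits) can be sketched directly in C, which is useful for checking the yacc/lex solution against known inputs. The function name is_valid_variable is hypothetical and is not part of the yacc program.

```c
#include <ctype.h>

/* Returns 1 if s is a letter followed by any number of letters or digits,
   otherwise 0. The empty string is rejected. */
int is_valid_variable(const char *s)
{
    if (!isalpha((unsigned char)*s)) return 0;      /* must start with a letter */
    for (s++; *s != '\0'; s++)
        if (!isalnum((unsigned char)*s)) return 0;  /* only letters and digits may follow */
    return 1;
}
```

With this sketch, a1b2 is accepted while 1abc and a_b are rejected.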
Yacc Program
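%{
#include <stdio.h>
#include <stdlib.h>
int yylex(void);
int yyerror(char *s);
%}
%token LETTER NUMBER
%%
iden : LETTER
| iden NUMBER
| iden LETTER
;
%%
int main()
{
yyparse();
printf("\nValid variable!\n");
return 0;
}
int yyerror(char *s)
{
printf("\n%s\n",s);
exit(1);
}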
Lex Program
%{
#include "y.tab.h"
%}
%%
[0-9] { return NUMBER; }
[a-zA-Z] { return LETTER; }
\n { return 0; }
. { return yytext[0]; }
%%
Sample I/O:
5. Aim: Program to evaluate an arithmetic expression involving operators +, -, * and /.
Procedure:
Yacc
1. In the definition section, define the symbolic token NUMBER.
2. Define the precedence of the operators (+, -, *, /) to be used in expressions as left or non-associative.
3. In the rules section, write the grammar rules along with the actions to recognize an expression involving
addition (+), subtraction (-), multiplication (*) and division (/). In the action subsection, write the code to evaluate
the value according to the values of the symbols on the right-hand side of the rule.
Grammar G:
E → E1 + E2 calculate as E.val = E1.val + E2.val
| E1 - E2 calculate as E.val = E1.val - E2.val
| E1 * E2 calculate as E.val = E1.val * E2.val
| E1 / E2 calculate as E.val = E1.val / E2.val
| - E1 calculate as E.val = -E1.val
| id
4. In the user subroutine section, call yyparse() to check the syntax of the input expression. If it is valid,
print the value of the expression; if it is invalid, yyerror() is called automatically by the parser.
The user can also redefine yyerror().
Lex
1. The lexer provides the tokens for the parser. In the definition section, include "y.tab.h", which contains the
token codes generated by the parser. Declare the variable yylval, which passes the value associated with a
token to the parser.
2. In the rules section, write patterns for the following:
(a) If the input expression has digits [0-9], match them with the token NUMBER (terminal defined in the
grammar for the parser) and set yylval to the numeric value of yytext.
(b) If the expression has white space, ignore it (take no action).
(c) A newline denotes the end of the input tokens and tells the parser not to read more.
(d) Write the last rule with a pattern that recognizes any character not matched by the above rules
and return the character (to return the operators in the input string to the yacc grammar).
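The value computation described by the grammar actions can be illustrated without yacc by a small hand-written evaluator in C. This is a sketch of the same idea using recursive descent rather than the yacc-generated parser; the names evaluate, parse_expr, parse_term and parse_factor are hypothetical, integer arithmetic is assumed, and parentheses are accepted as an added assumption.

```c
#include <ctype.h>

static const char *cur;     /* next unread character of the input */
static int eval_error;      /* set to 1 on a syntax error or division by zero */

static int parse_expr(void);

static void skip_blanks(void) { while (*cur == ' ' || *cur == '\t') cur++; }

/* factor : NUMBER | '-' factor | '(' expr ')' */
static int parse_factor(void)
{
    int v = 0;
    skip_blanks();
    if (*cur == '-') { cur++; return -parse_factor(); }   /* E.val = -E1.val */
    if (*cur == '(') {
        cur++;
        v = parse_expr();
        skip_blanks();
        if (*cur == ')') cur++; else eval_error = 1;
        return v;
    }
    if (!isdigit((unsigned char)*cur)) { eval_error = 1; return 0; }
    while (isdigit((unsigned char)*cur)) v = v * 10 + (*cur++ - '0');
    return v;
}

/* term : factor { ('*' | '/') factor }, left associative */
static int parse_term(void)
{
    int v = parse_factor();
    for (;;) {
        skip_blanks();
        if (*cur == '*') { cur++; v *= parse_factor(); }
        else if (*cur == '/') {
            int d;
            cur++;
            d = parse_factor();
            if (d == 0) { eval_error = 1; return 0; }   /* division by zero */
            v /= d;
        }
        else return v;
    }
}

/* expr : term { ('+' | '-') term }, left associative */
static int parse_expr(void)
{
    int v = parse_term();
    for (;;) {
        skip_blanks();
        if (*cur == '+') { cur++; v += parse_term(); }
        else if (*cur == '-') { cur++; v -= parse_term(); }
        else return v;
    }
}

/* Evaluates text; returns 0 and stores the value in *result, or -1 on error. */
int evaluate(const char *text, int *result)
{
    cur = text;
    eval_error = 0;
    *result = parse_expr();
    skip_blanks();
    if (*cur != '\0' && *cur != '\n') eval_error = 1;   /* unexpected trailing input */
    return eval_error ? -1 : 0;
}
```

Splitting the grammar into expr, term and factor encodes the same precedence that the %left declarations give the yacc version: * and / bind tighter than + and -, and unary minus binds tightest.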
Yacc Program
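%{
#include <stdio.h>
#include <stdlib.h>
int yylex(void);
int yyerror(char *s);
%}
%token NUMBER
%left '+' '-'
%left '*' '/'
%nonassoc UNARY
%%
statement : expression { printf("\nThe value of the expression is %d\n",$1); }
;
expression : expression '+' expression { $$=$1+$3; }
| expression '-' expression { $$=$1-$3; }
| expression '*' expression { $$=$1*$3; }
| expression '/' expression { if ($3==0) { printf("\nDivide by zero!\n");
exit(0);
}
else { $$=$1/$3; }
}
| '-' expression %prec UNARY { $$=-$2; }
| '(' expression ')' { $$=$2; }
| NUMBER
;
%%
int main()
{
yyparse();
return 0;
}
int yyerror(char *s)
{
printf("\n %s\n",s);
exit(1);
}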
Lex Program
%{
#include <stdlib.h>
#include "y.tab.h"
extern int yylval;
%}
%%
[0-9]+ { yylval=atoi(yytext); return NUMBER; }
[ \t] ;
\n { return 0; }
. { return yytext[0]; }
%%
Sample I/O:
The value of the expression is 0
6. Aim: Program to recognize strings 'aaab', 'abbb', 'ab' and 'a' using the
grammar (a^n b^n, n>=0).
Note: The problem statement is incorrect, because the listed strings do not have equal numbers of a's and
b's. It may mean (a^n b^m, n,m>=0) or ab, aabb, aaabbb. The solution below accepts a^n b^m, n,m>=0.
Procedure:
Yacc
1. In the definition section, define the symbolic tokens a, b and END.
2. In the rules section, write the grammar rules to recognize zero or more a's followed by zero or more b's.
Grammar G:
exp → expa expb
expa → expa a | ε
expb → expb b | ε
3. In the user subroutine section, call yyparse() to check the syntax of the input string. If it is valid,
print the string as valid; if it is invalid, yyerror() is called automatically by the parser.
The user can also redefine yyerror().
Lex
1. The lexer provides the tokens for the parser. In the definition section, include "y.tab.h", which contains the
token codes generated by the parser.
2. In the rules section, write patterns for the following:
(a) If the input has 'a', return the token code for a.
(b) If the input has 'b', return the token code for b.
(c) If it is a newline character, return the token code for the newline character.
Yacc Program
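%{
#include <stdio.h>
#include <stdlib.h>
int yylex(void);
int yyerror(char *s);
%}
%token a b END
%%
validexpn : exp END { printf("\n valid!"); }
;
exp : A1 B1 ;
A1 : A1 a | ;
B1 : B1 b | ;
%%
int main()
{
yyparse();
printf("\nString recognized!\n");
return 0;
}
int yyerror(char *s)
{
printf("\n%s\n",s);
exit(1);
}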
Lex Program
%{
# include "y.tab.h"
%}
%%
a { return a; }
b { return b; }
. {return 0;}
\n {return END;}
%%
Sample I/O:
7. Aim: Program to recognize the grammar (a^n b, n>=10).
Procedure:
Yacc
1. In the definition section, define the symbolic tokens a, b and END.
2. In the rules section, write the grammar rules to recognize strings consisting of ten or more a's
followed by a single b.
Grammar G:
Exp → a a a a a a a a a a Exp1 b
Exp1 → Exp1 a | ε
3. In the user subroutine section, call yyparse() to check the syntax of the input string. If it is valid,
print the string as valid; if it is invalid, yyerror() is called automatically by the parser.
The user can also redefine yyerror().
Lex
1. The lexer provides the tokens for the parser. In the definition section, include "y.tab.h", which contains the
token codes generated by the parser.
2. In the rules section, write patterns for the following:
(a) If the input has 'a', return the token code for a.
(b) If the input has 'b', return the token code for b.
(c) If it is a newline character, return the token code for the newline character.
Yacc Program
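%{
#include <stdio.h>
#include <stdlib.h>
int yylex(void);
int yyerror(char *s);
%}
%token a b END
%%
exp : a a a a a a a a a a exp1 b END ;   /* END is returned by the lexer on newline */
exp1 : exp1 a | ;
%%
int main()
{
yyparse();
printf("\nString recognized!\n");
return 0;
}
int yyerror(char *s)
{
printf("\n%s\n",s);
exit(1);
}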
Lex Program
%{
# include "y.tab.h"
%}
%%
a { return a; }
b { return b; }
. {return 0;}
\n { return END; }
%%
Sample I/O:
String recognized!
Text Editor.
The mini projects should preferably be implemented in C/C++. The programs may be
run under the MS Windows, Unix or Linux environment. Students have to follow the process
activities/steps given below.
Process Activities/Steps
Software Engineering processes are composed of many activities, notably the following. They are
considered sequential steps in the Waterfall process, but other processes may rearrange or combine
them in different ways.
Requirements Analysis
Extracting the requirements of a desired software product is the first task in creating it. While customers
probably believe they know what the software is to do, it may require skill and experience in software
engineering to recognize incomplete, ambiguous or contradictory requirements.
Specification
Specification is the task of precisely describing the software to be written, in a mathematically rigorous
way. In practice, most successful specifications are written to understand and fine-tune applications that
were already well-developed, although safety-critical software systems are often carefully specified
prior to application development. Specifications are most important for external interfaces that must
remain stable.
Software architecture
The architecture of a software system refers to an abstract representation of that system. Architecture
is concerned with making sure the software system will meet the requirements of the product, as well
as ensuring that future requirements can be addressed. The architecture step also addresses interfaces
between the software system and other software products, as well as the underlying hardware or the
host operating system.
Coding
Reducing a design to code may be the most obvious part of the software engineering job, but it is not
necessarily the largest portion.
Testing
Testing of parts of software, especially where code by two different engineers must work together, falls
to the software engineer.
Documentation
An important (and often overlooked) task is documenting the internal design of software for the
purpose of future maintenance and enhancement. Documentation is most important for external
interfaces.
Pass 1:
BEGIN
BEGIN
increment Scnt
END {while}
Breakup Sourceline[Scnt]
BEGIN
ENDIF
increment Scnt
Breakup Sourceline[Scnt]
END
ENDIF
WHILE Opcode <> 'END'
BEGIN
BEGIN
IF not found
ELSE
ENDIF
ENDIF
IF found THEN
DO CASE
BEGIN
IF error THEN
ENDIF
BEGIN
IF error THEN
3. OTHERWISE
BEGIN
IF error THEN
END
ENDCASE
ELSE
ENDIF
ENDIF
increment Scnt
Breakup Sourceline[Scnt]
END {while}
IF not found
ELSE
ENDIF
ENDIF
IF Operand not NULL
IF found
install in ENDval
ENDIF
ENDIF
Pass 2:
BEGIN
BEGIN
increment Scnt
END {while}
Breakup Sourceline[Scnt]
IF Opcode = 'START' THEN
BEGIN
increment Scnt
Breakup Sourceline[Scnt]
END
ENDIF
BEGIN
BEGIN
IF found THEN
DO CASE
BEGIN
set Skip to 1
END
IF error THEN
ENDIF
END
3. OTHERWISE
BEGIN
IF error THEN
ENDIF
END
ENDCASE
ELSE
ENDIF
END
ENDIF
BEGIN
set Errorflag to 1
ENDIF
BEGIN
END
ENDIF
IF Skip = 1 THEN
set Skip to 0
ENDIF
increment Scnt
Breakup Sourceline[Scnt]
END {while}
IF Errorflag = 0 THEN
ENDIF
Text Editor
An interactive editor is a computer program that allows a user to create and revise a target document.
The document-editing process is an interactive user-computer dialogue designed to accomplish four
tasks:
1. Select the part of the target document to be viewed and manipulated.
Selection of the part of the document to be viewed and edited involves first traveling through the
document to locate the area of interest. This search is accomplished with operations such as next
screenful, bottom, and find pattern. Traveling specifies where the area of interest is. The selection of what is
to be viewed and manipulated there is controlled by filtering. Filtering extracts the relevant subset of
the target document at the point of interest, such as the next screenful of text or the next statement.
2. Determine how to format this view online and how to display it.
Formatting then determines how the result of the filtering will be seen as a visible representation (the
view) on a display screen or other device.
3. Specify and execute operations that modify the target document.
In the actual editing phase, the target document is created or altered with a set of operations such as
insert, delete, replace, move and copy. The editing functions are often specialized to operate on
elements meaningful to the type of editor. For example, a manuscript-oriented editor might operate on
elements such as single characters, words, lines, sentences and paragraphs.
4. Update the view appropriately.
In a simple scenario, the user might travel to the end of the document. A screenful of text would be
filtered, this segment would be formatted, and the view would be displayed on an output device. The
user could then, for example, delete the first three words of this view.
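The editing operation in the scenario above (deleting the first three words of the view) can be sketched in C on a simple character buffer. This is a minimal illustration that assumes the view is held in one NUL-terminated string; the function name delete_first_words is hypothetical, and a real editor would normally use a richer buffer structure.

```c
#include <ctype.h>
#include <string.h>

/* Deletes the first n whitespace-separated words of buf in place.
   Returns the number of words actually removed. */
int delete_first_words(char *buf, int n)
{
    char *p = buf;
    int removed = 0;

    while (removed < n) {
        while (isspace((unsigned char)*p)) p++;                   /* skip blanks before the word */
        if (*p == '\0') break;                                    /* no more words */
        while (*p != '\0' && !isspace((unsigned char)*p)) p++;    /* skip the word itself */
        removed++;
    }
    if (removed > 0)
        while (isspace((unsigned char)*p)) p++;                   /* drop blanks after the last removed word */
    memmove(buf, p, strlen(p) + 1);                               /* shift the rest, including the '\0' */
    return removed;
}
```

Applied to a buffer holding "the quick brown fox jumps" with n = 3, the buffer becomes "fox jumps" and the function returns 3.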
Note: The student has to design the Editor with workspace, title bar and menu bar with all the above
mentioned characteristics.
Linux Shell
Computers understand the language of 0s and 1s, called binary language. In the early days of computing,
instructions were provided in binary language, which is difficult for people to read and write. So an OS
provides a special program called the shell. The shell accepts instructions or commands in English (mostly)
and, if a command is valid, passes it to the kernel.
The shell is a user program, or an environment provided for user interaction. It is a command language
interpreter that executes commands read from the standard input device (keyboard) or from a file. The shell
is not part of the system kernel, but uses the system kernel to execute programs, create files, etc.
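The core of such a shell (read a command, split it into words, ask the kernel to run it) can be sketched as follows. This is a minimal sketch that assumes a POSIX system providing fork(), execvp() and waitpid(); the names split_command, run_command and MAX_ARGS are hypothetical, and quoting, pipes and built-in commands are not handled.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_ARGS 16

/* Splits line in place into words; argv gets a NULL terminator.
   Returns the number of words found (at most max_args - 1). */
int split_command(char *line, char *argv[], int max_args)
{
    int argc = 0;
    char *tok = strtok(line, " \t\n");
    while (tok != NULL && argc < max_args - 1) {
        argv[argc++] = tok;
        tok = strtok(NULL, " \t\n");
    }
    argv[argc] = NULL;
    return argc;
}

/* Runs one command line through the kernel and returns its exit status,
   0 for an empty line, or -1 if the process could not be run. */
int run_command(char *line)
{
    char *argv[MAX_ARGS];
    pid_t pid;
    int status;

    if (split_command(line, argv, MAX_ARGS) == 0) return 0;
    pid = fork();                       /* create a child process */
    if (pid < 0) return -1;
    if (pid == 0) {
        execvp(argv[0], argv);          /* replace the child with the command */
        perror(argv[0]);                /* reached only if the command is invalid */
        _exit(127);
    }
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A full shell would wrap run_command in a loop that prints a prompt and reads a line with fgets() until end of input.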
Note : The student has to develop the shell with minimum 25 commands.
4.Lexical Analyzer
Lexical analysis is the processing of an input sequence of characters (such as the source code of a
computer program) to produce, as output, a sequence of symbols called "lexical tokens", or just
"tokens". For example, lexers for many programming languages convert the character sequence 123 abc
into two tokens: 123 and abc (whitespace is not a token in most languages). The purpose of producing
these tokens is usually to forward them as input to another program, such as a parser.
A lexical analyzer, or lexer for short, can be thought of as having two stages, namely a scanner and an
evaluator. (These are often integrated, for efficiency reasons, so they operate in parallel.)
The first stage, the scanner, is usually based on a finite state machine. It has encoded within it
information on the possible sequences of characters that can be contained within any of the tokens it
handles (individual instances of these character sequences are known as lexemes). For instance, an
integer token may contain any sequence of numerical digit characters. In many cases the first
non-whitespace character can be used to deduce the kind of token that follows; the input characters are
then processed one at a time until reaching a character that is not in the set of characters acceptable for
that token (this is known as the maximal munch rule). In some languages the lexeme creation rules are
more complicated and may involve backtracking over previously read characters.
A lexeme, however, is only a string of characters known to be of a certain type. In order to construct a
token, the lexical analyzer needs a second stage, the evaluator, which goes over the characters of the
lexeme to produce a value. The lexeme's type combined with its value is what properly constitutes a
token, which can be given to a parser. (Some tokens such as parentheses do not really have values, and
so the evaluator function for these can return nothing. The evaluators for integers, identifiers, and
strings can be considerably more complex. Sometimes evaluators can suppress a lexeme entirely,
concealing it from the parser, which is useful for whitespace and comments.)
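The two stages described above can be sketched in C with a scanner that applies the maximal munch rule and an evaluator that computes a value for number tokens. The type and function names (Token, TokenType, next_token) are hypothetical, and only numbers, identifiers and single-character symbols are handled.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

typedef enum { TOK_NUMBER, TOK_IDENT, TOK_SYMBOL, TOK_END } TokenType;

typedef struct {
    TokenType type;      /* kind of lexeme found by the scanner */
    char lexeme[32];     /* the matched characters (truncated if longer) */
    long value;          /* filled in by the evaluator for numbers */
} Token;

/* Reads one token starting at *src and advances *src past it. */
Token next_token(const char **src)
{
    Token t;
    const char *p = *src;
    size_t n = 0;

    memset(&t, 0, sizeof t);                         /* also NUL-terminates lexeme */
    while (isspace((unsigned char)*p)) p++;          /* whitespace is not a token */

    if (*p == '\0') {
        t.type = TOK_END;
    } else if (isdigit((unsigned char)*p)) {         /* first character decides the kind */
        t.type = TOK_NUMBER;
        while (isdigit((unsigned char)*p)) {         /* maximal munch: take every digit */
            if (n < sizeof t.lexeme - 1) t.lexeme[n++] = *p;
            p++;
        }
        t.value = strtol(t.lexeme, NULL, 10);        /* evaluator stage */
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        t.type = TOK_IDENT;
        while (isalnum((unsigned char)*p) || *p == '_') {
            if (n < sizeof t.lexeme - 1) t.lexeme[n++] = *p;
            p++;
        }
    } else {
        t.type = TOK_SYMBOL;                         /* single-character token */
        t.lexeme[0] = *p++;
    }
    *src = p;
    return t;
}
```

Calling next_token repeatedly on the input 123 abc yields a number token with value 123, then an identifier token with lexeme abc, then the end token.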
Note: The student has to develop the LEX program for scanning a specific programming language, e.g. C or C++.