Compiler Design Assignment Lexical Analysis: 30/08/2021 Neha Vijay Khairnar 191081036 IT
Lexical Analysis
Submission Date- 30/08/2021
Name- Neha Vijay Khairnar
ID-191081036
Branch- IT
Problem Statement-
Write LEX code to detect the tokens (data type, variable name, equal-to operator, comma, number, semicolon).
Write YACC code to check the syntax of variable declarations and definitions.
Theory-
LEX code-
Lex is a computer program that generates lexical analyzers and was written by
Mike Lesk and Eric Schmidt.
Lex reads an input stream specifying the lexical analyzer and outputs source
code implementing the lexer in the C programming language.
Lex source files use the extension .l (for example, sum.l). Running lex on such a file generates lex.yy.c, which is the lex specification translated into C code.
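Commands to compile and run a lex program-
1. flex filename.l
2. gcc lex.yy.c -lfl (or cc lex.yy.c -lfl)
3. ./a.out
The -lfl option links the flex library, which supplies default main() and yywrap() functions when the .l file does not define them. When the lexer is used together with a YACC parser, as in the solution below, the parser is generated first so that y.tab.h exists: yacc -d filename.y, then flex filename.l, then gcc y.tab.c lex.yy.c.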
Flex (fast lexical analyzer generator) is a widely used open-source implementation of Lex for generating lexical analyzers.
Flex automatically generates the function yylex() from the .l file. The parser calls yylex() each time it needs a token; each call scans the input and returns the next token from the token stream, and it returns 0 at the end of the input.
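The calling convention described above can be sketched without flex. The function below is a hypothetical, hand-written stand-in for the generated yylex(): the names next_token(), scan_string() and lexeme are invented for this sketch (lexeme plays the role of yytext), the token codes are assumptions, and it only recognizes the keyword int, identifiers, integers and the symbols =, comma and semicolon.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical token codes for this sketch; in the solution they come
   from the y.tab.h header generated by yacc -d. */
enum { END = 0, INT = 258, ID, NUM, AS, COMMA, SC };

static const char *src;   /* remaining input, standing in for yyin */
static char lexeme[64];   /* text of the last token, like yytext   */

/* Hypothetical helper: point the scanner at a new input string. */
void scan_string(const char *s)
{
    assert(s != NULL);
    src = s;
}

/* Hand-written stand-in for yylex(): each call skips blanks, copies the
   next lexeme into lexeme[], and returns its token code. It returns 0
   at end of input and -1 for a character it does not recognize. */
int next_token(void)
{
    while (*src == ' ' || *src == '\t')
        src++;
    if (*src == '\0')
        return END;

    const char *start = src;
    int code;
    if (isalpha((unsigned char)*src)) {            /* letter first */
        while (isalnum((unsigned char)*src) || *src == '_')
            src++;
        code = ID;
    } else if (isdigit((unsigned char)*src)) {
        while (isdigit((unsigned char)*src))
            src++;
        code = NUM;
    } else {
        char c = *src++;
        code = (c == '=') ? AS : (c == ',') ? COMMA : (c == ';') ? SC : -1;
    }

    size_t n = (size_t)(src - start);
    if (n >= sizeof lexeme)            /* truncate overlong lexemes */
        n = sizeof lexeme - 1;
    memcpy(lexeme, start, n);
    lexeme[n] = '\0';

    /* Keywords are checked after scanning the whole word, so "int" is INT
       while "integer" stays an ID. */
    if (code == ID && strcmp(lexeme, "int") == 0)
        code = INT;
    return code;
}
```

A caller drives it the same way a parser drives yylex(): call scan_string("int a=10, b;") once, then loop while ((tok = next_token()) != 0), which yields INT, ID, AS, NUM, COMMA, ID and SC.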
Steps for working of flex-
The figure below shows the steps followed by flex-
Step 1: An input file named lex.l, written in the Lex language, describes the lexical analyzer to be generated. The lex compiler transforms lex.l into a C program in a file named lex.yy.c by default.
Step 2: The C compiler compiles lex.yy.c into an executable file called a.out.
Step 3: The executable a.out takes a stream of input characters and produces a stream of tokens.
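Steps 1 to 3 can be illustrated with a small table-driven recognizer. Flex compiles the patterns into a deterministic finite automaton (DFA) stored as tables in lex.yy.c. The sketch below hand-builds such a table for just the two number patterns used in the solution, {DIGIT}+ and {DIGIT}+[.]{DIGIT}*, and keeps the longest match. The state numbering, table layout and function names are assumptions for illustration, not flex's actual generated code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token kinds for this sketch. */
enum kind { NONE = 0, NUMBER_TOK, REAL_TOK };

/* Character classes used as table columns. */
static int char_class(char c)
{
    if (c >= '0' && c <= '9') return 0;  /* {DIGIT}       */
    if (c == '.')             return 1;  /* [.]           */
    return 2;                            /* anything else */
}

#define DEAD (-1)

/* Transition table for {DIGIT}+ and {DIGIT}+[.]{DIGIT}* combined:
   state 0 = start, 1 = digits seen, 2 = digits, '.', then digits. */
static const int next_state[3][3] = {
    /* digit  dot   other */
    {  1,     DEAD, DEAD },   /* state 0 */
    {  1,     2,    DEAD },   /* state 1 */
    {  2,     DEAD, DEAD },   /* state 2 */
};

/* Token kind accepted in each state (NONE = not accepting). */
static const enum kind accepts[3] = { NONE, NUMBER_TOK, REAL_TOK };

/* Run the DFA from s and return the kind of the longest match;
   *len receives its length (0 when nothing matches). */
enum kind longest_match(const char *s, size_t *len)
{
    assert(s != NULL && len != NULL);
    int state = 0;
    enum kind best = NONE;
    *len = 0;
    for (size_t i = 0; s[i] != '\0'; i++) {
        state = next_state[state][char_class(s[i])];
        if (state == DEAD)
            break;
        if (accepts[state] != NONE) {   /* remember the last accepting point */
            best = accepts[state];
            *len = i + 1;
        }
    }
    return best;
}
```

For input 3.14, the automaton passes through states 1 and 2 and reports a REAL of length 4; for 10; it stops at the semicolon and reports a NUMBER of length 2.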
Program Structure-
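In the input file, there are 3 sections, separated by lines that contain only %%:
1) Definition Section: The definition section contains declarations of variables, regular definitions (for example, DIGIT [0-9]) and manifest constants. C code in this section is enclosed in “%{ %}” brackets, and anything written inside these brackets is copied directly into the file lex.yy.c.
2) Rules Section: Each rule is a pattern (a regular expression) followed by an action, which is C code in { } that runs when the pattern matches the input.
3) User Code Section: Auxiliary C functions, such as yywrap() and main(), which are copied into lex.yy.c unchanged.
Syntax-
%{
//definitions
%}
%%
Rules
%%
User code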
YACC code -
YACC (Yet Another Compiler-Compiler) is a parser generator. A parser generator is a program that takes as input a specification of a syntax and produces as output a procedure for recognizing that language.
A YACC input file has three parts, separated by %%:
/* definitions */
....
%%
/* rules */
....
%%
/* auxiliary routines */
....
1) Definition part-
The definition part includes information about the tokens used in the syntax
definition:
Syntax-
%token NUMBER
%token ID
It can also include the specification of the starting symbol in the grammar:
%start nonterminal
Yacc assigns numbers to tokens automatically, but a number can also be assigned explicitly:
%token NUMBER 621
2) Rule part-
The rules part contains the grammar definition in a modified BNF form.
Actions are C code in { } and can be embedded inside the rules; an action written at the end of a rule runs when that rule is recognized.
3) Auxiliary Routine Part-
The auxiliary routines part contains only C code.
It includes definitions of the functions needed by the rules part, such as yyerror().
It can also contain the main() function definition if the parser is going to be
run as a program.
The main() function must call the function yyparse().
Solution-
LEX code to detect the tokens –
%{
#include <stdio.h>
#include "y.tab.h"
%}
DIGIT [0-9]
REAL {DIGIT}+[.]{DIGIT}*
LETTER [A-Za-z]
ASSIGN =
%%
[\t ] ;
int {printf("%s\tDataType\n",yytext);return (INT);}
float {printf("%s\tDataType\n",yytext);return (FLOAT);}
char {printf("%s\tDataType\n",yytext);return (CHAR);}
boolean {printf("%s\tDataType\n",yytext);return (BOOLEAN);}
true|false { printf("%s\tBOOLEAN VALUE\n",yytext);return (BOOLEANVALUE);}
['][^\t\n]['] { printf("%s\tCHAR VALUE\n",yytext);return CHARVALUE;}
{LETTER}({LETTER}|{DIGIT}|_)* {printf("%s\tVariable Name\n",yytext);return ID;}
{REAL} { printf("%s\tNUMBER\n",yytext);return REAL;}
{DIGIT}+ { printf("%s\tNUMBER\n",yytext);return NUM;}
"," {printf("%s\tSpecial Symbol\n",yytext);return COMMA;}
";" {printf("%s\tSpecial Symbol\n",yytext);return SC;}
{ASSIGN} {printf("%s\tOperator\n",yytext);return AS;}
\n return NL;
. ;
%%
int yywrap()
{
return 1;
}
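The order of the rules above matters. When several patterns match at the same position, lex picks the longest match, and if two matches have the same length it picks the rule listed first. That is why the keyword rules (int, float, char, boolean, true|false) come before the identifier rule: int matches both with length 3 and is reported as a DataType, while integer is a longer match for the identifier rule. The C sketch below imitates only this selection policy; choose_rule() and its helpers are hypothetical names, and the identifier pattern is assumed to be a letter followed by letters, digits or underscores.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Keyword rules in the same order as the LEX solution; the identifier
   rule is treated as the last rule. */
static const char *const keywords[] = { "int", "float", "char", "boolean",
                                        "true", "false" };
#define NKEYWORDS  (sizeof keywords / sizeof keywords[0])
#define IDENT_RULE ((int)NKEYWORDS)

/* Length matched by a literal keyword rule at s, or 0 if it does not match. */
static size_t match_literal(const char *s, const char *word)
{
    size_t n = strlen(word);
    return strncmp(s, word, n) == 0 ? n : 0;
}

/* Length matched by the identifier pattern (a letter, then letters,
   digits or underscores) at s, or 0 if it does not match. */
static size_t match_identifier(const char *s)
{
    if (!isalpha((unsigned char)s[0]))
        return 0;
    size_t n = 1;
    while (isalnum((unsigned char)s[n]) || s[n] == '_')
        n++;
    return n;
}

/* Return the index of the rule lex would fire at s and store the match
   length in *len: the longest match wins, and on a tie the earlier rule
   wins. Returns -1 (with *len == 0) when no rule matches. */
int choose_rule(const char *s, size_t *len)
{
    assert(s != NULL && len != NULL);
    int best = -1;
    *len = 0;
    for (int r = 0; r <= IDENT_RULE; r++) {
        size_t n = (r < IDENT_RULE) ? match_literal(s, keywords[r])
                                    : match_identifier(s);
        if (n > *len) {          /* only a strictly longer match replaces */
            best = r;
            *len = n;
        }
    }
    return best;
}
```

For example, choose_rule("int a", &n) selects rule 0 (the int keyword) with n == 3, while choose_rule("integer=1", &n) selects the identifier rule with n == 7.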
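YACC code to check the syntax –
%{
#include <stdio.h>
int yylex(void);
void yyerror(const char *s);
%}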
%token ID SC INT CHAR FLOAT BOOLEAN BOOLEANVALUE CHARVALUE REAL AS NUM COMMA NL
%%
s: type1|type2|type3|type4
;
type1:INT varlist SC NL { printf("Syntax is Correct\n"); return 0;}
;
type2:FLOAT varlist2 SC NL { printf("Syntax is Correct\n");return 0;}
;
type3:CHAR varlist3 SC NL { printf("Syntax is Correct\n");return 0;}
;
type4:BOOLEAN varlist4 SC NL { printf("Syntax is Correct\n");return 0;}
;
varlist: ID | ID COMMA varlist | ID AS NUM | ID AS NUM COMMA varlist | /* epsilon (empty) case */
;
varlist2: ID | ID COMMA varlist2 | ID AS REAL | ID AS REAL COMMA varlist2 | /* epsilon (empty) case */
;
varlist3: ID | ID COMMA varlist3 | ID AS CHARVALUE | ID AS CHARVALUE COMMA varlist3 | /* epsilon (empty) case */
;
varlist4: ID | ID COMMA varlist4 | ID AS BOOLEANVALUE | ID AS BOOLEANVALUE COMMA varlist4 | /* epsilon (empty) case */
;
%%
void yyerror(const char *s)
{
printf("Syntax is Incorrect (%s)\n", s);
}
int main()
{
yyparse();
return 0;
}
Output-
[Figure: sample run of the program on the input int a=10]
Conclusion- Hence, we wrote LEX code to detect the tokens and YACC code to check the syntax of variable declarations and definitions, which solves the given problem statement.