CD Project Report
Submitted by:
Aronya Baksy PES1201800002
Suhas Vasisht PES1201800212
Aditeya Baral PES1201800366
Mr. Kiran P
Assistant Professor
PES University, Bengaluru
1. Introduction
The compiler is designed for the Go programming language (also sometimes referred
to as Golang). Go is a statically typed, compiled programming language that features
a C-like syntax but aims to be as readable and usable as more modern languages like
Python.
The compiler takes a valid path to a file containing Go source code. It reads the
source code and performs the following steps on it:
Lexical Analysis: Converts the source code into a stream of tokens. Tokens
are defined in the lexer file lexer.py.
Syntax Analysis: Matches the input stream of tokens against the grammar rules
defined in the file parser.py.
Semantic Analysis: Using actions attached to the grammar rules defined above,
transforms the source code into a tree representation called an Abstract
Syntax Tree and generates the associated Three Address Code (a
machine-independent intermediate representation of the code).
Three Address Code: Collects the Three Address Code generated by the
semantic rules in the above step and stores it in a data structure in
quadruple format.
Optimization: Applies constant folding, common subexpression elimination
and packing of temporary variables to reduce the length and complexity of
the generated Three Address Code.
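The lexical analysis step can be illustrated with a minimal sketch. It uses only Python's standard re module rather than PLY so that it runs on its own, and it is not the project's lexer.py. The token names are borrowed from the grammar's terminals; the regular expressions and the tokenize function are assumptions made for illustration.

```python
import re

# Hypothetical token table. Names follow the grammar's terminals; the
# patterns are illustrative and cover only a small subset of Go.
TOKEN_SPEC = [
    ("FLOAT_LIT",   r"\d+\.\d+"),      # must come before DECIMAL_LIT
    ("DECIMAL_LIT", r"\d+"),
    ("STRING_LIT",  r'"[^"\n]*"'),
    ("IDENTIFIER",  r"[A-Za-z_]\w*"),
    ("PLUS",        r"\+"),
    ("STAR",        r"\*"),
    ("EQ",          r"="),
    ("SKIP",        r"[ \t]+"),        # whitespace, discarded
    ("MISMATCH",    r"."),             # anything else is an error
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Yield (token_type, lexeme) pairs for one line of source text."""
    for match in MASTER.finditer(source):
        kind = match.lastgroup
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise SyntaxError(f"unexpected character {match.group()!r}")
        yield kind, match.group()


tokens = list(tokenize("x = y + 2 * 3.5"))
```

Here tokens holds the stream that a parser would consume next, one (type, lexeme) pair per token.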
3. Literature Survey
Sources:
PLY (Python Lex Yacc) official documentation and examples
(https://ply.readthedocs.io/en/latest/)
BNF (Backus-Naur Form) representation of the Context-Free Grammar
of the Go language (https://golang.org/ref/spec)
Operator precedence rules in Go programming language
(https://www.tutorialspoint.com/go/go_operators_precedence.htm)
4. Context-Free Grammar
SourceFile : PACKAGE IDENTIFIER SEMICOLON ImportDeclList TopLevelDeclList
TopLevelDecl : Declaration
| FunctionDecl
ScopeStart : empty
ScopeEnd : empty
Statement : Declaration
| SimpleStmt
| ReturnStmt
| Block
| IfStmt
| SwitchStmt
| ForStmt
| PrintIntStmt
| PrintStrStmt
Declaration : VarDecl
Type : StandardTypes
StandardTypes : PREDEFINED_TYPES
Signature : Parameters
| Parameters Result
Result : Parameters
| Type
ParameterList : ParameterDecl
| ParameterList COMMA ParameterDecl
SimpleStmt : Expression
| Assignment
| ShortVarDecl
| IncDecStmt
ShortVarDecl : ExpressionList ASSIGN_OP ExpressionList
| Expression ASSIGN_OP Expression
assign_op : EQ
| PLUS_EQ
| MINUS_EQ
| OR_EQ
| CARET_EQ
| STAR_EQ
| DIVIDE_EQ
| MODULO_EQ
| LS_EQ
| RS_EQ
| AMP_EQ
| AND_OR_EQ
SwitchStmt : ExprSwitchStmt
ExprCaseClauseList : empty
| ExprCaseClauseList ExprCaseClause
ReturnStmt : RETURN
| RETURN Expression
| RETURN ExpressionList
Expression : UnaryExpr
| Expression OR_OR Expression
| Expression AMP_AMP Expression
| Expression EQ_EQ Expression
| Expression NOT_EQ Expression
| Expression LT Expression
| Expression LT_EQ Expression
| Expression GT Expression
| Expression GT_EQ Expression
| Expression PLUS Expression
| Expression MINUS Expression
| Expression OR Expression
| Expression CARET Expression
| Expression STAR Expression
| Expression DIVIDE Expression
| Expression MODULO Expression
| Expression LS Expression
| Expression RS Expression
| Expression AMP Expression
| Expression AND_OR Expression
UnaryExpr : PrimaryExpr
| unary_op UnaryExpr
unary_op : PLUS
| MINUS
| NOT
| CARET
| STAR
| AMP
| LT_MINUS
PrimaryExpr : Operand
| IDENTIFIER
| PrimaryExpr Selector
| PrimaryExpr Index
| PrimaryExpr Arguments
Operand : Literal
| LROUND Expression RROUND
Literal : BasicLit
BasicLit : decimal_lit
| float_lit
| string_lit
decimal_lit : DECIMAL_LIT
float_lit : FLOAT_LIT
string_lit : STRING_LIT
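The binary Expression rules above have the form Expression op Expression, which is ambiguous on its own, so a yacc-style parser needs precedence and associativity declarations to resolve it. PLY reads these from a tuple named precedence, listed from the lowest-binding level to the highest. The sketch below shows what such a table could look like using the token names from the grammar and the five precedence levels of the Go specification. It is not taken from the project's parser.py; the mapping of AND_OR to Go's &^ operator and the level_of helper are assumptions.

```python
# Binary-operator precedence, lowest-binding level first (PLY convention).
# Levels follow the Go specification; all binary operators are left-associative.
precedence = (
    ("left", "OR_OR"),                                            # ||
    ("left", "AMP_AMP"),                                          # &&
    ("left", "EQ_EQ", "NOT_EQ", "LT", "LT_EQ", "GT", "GT_EQ"),    # comparisons
    ("left", "PLUS", "MINUS", "OR", "CARET"),                     # + - | ^
    # Assumption: AND_OR is the token for Go's bit-clear operator &^.
    ("left", "STAR", "DIVIDE", "MODULO", "LS", "RS", "AMP", "AND_OR"),
)


def level_of(token):
    """Hypothetical helper: 1-based precedence level of a token name."""
    for level, (_assoc, *names) in enumerate(precedence, start=1):
        if token in names:
            return level
    raise KeyError(token)
```

With this table, b + c * d groups as b + (c * d) because STAR sits on a higher level than PLUS.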
5. Design Strategy
5.1 Symbol Table Creation
Symbol table entries are created by the lexer. At this stage, the only field
that is filled in is the symbol name.
The parser adds further information about each identifier, such as its type
and its scope.
At the semantic stage, expressions are evaluated and the resulting values are
stored in the symbol table.
5.3 Optimization
The three-address code is optimized by a separate script once the entire
intermediate code has been generated from the input. The optimization script
takes the complete symbol table and the generated three-address code as
input, and outputs the optimized three-address code.
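As an illustration of one of the optimizations, the following is a minimal sketch of constant folding over quadruples. It is not the project's optimization script: the (op, arg1, arg2, result) tuple layout, the t<number> naming of temporaries and the function names are all assumptions, and the symbol table input is left out for brevity.

```python
import operator

# Operators this sketch knows how to evaluate at compile time.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def is_const(value):
    return isinstance(value, (int, float))


def is_temp(name):
    # Assumption: temporaries are named t1, t2, ... and assigned only once.
    return isinstance(name, str) and name[:1] == "t" and name[1:].isdigit()


def fold_constants(quads):
    """Return a new quadruple list with constant expressions evaluated."""
    known = {}      # temporaries whose value is known at compile time
    folded = []
    for op, arg1, arg2, result in quads:
        # Substitute operands that were folded by an earlier quadruple.
        arg1 = known.get(arg1, arg1)
        arg2 = known.get(arg2, arg2)
        if op in OPS and is_const(arg1) and is_const(arg2):
            value = OPS[op](arg1, arg2)
            if is_temp(result):
                known[result] = value                 # quadruple disappears
            else:
                folded.append(("=", value, None, result))
        else:
            folded.append((op, arg1, arg2, result))
    return folded


# x = 2 * 3 + y
before = [("*", 2, 3, "t1"), ("+", "t1", "y", "t2"), ("=", "t2", None, "x")]
after = fold_constants(before)
```

The multiplication is evaluated ahead of time, so after contains two quadruples instead of three and the addition uses the literal 6 directly.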
6. Implementation Details
6.1 Symbol Table Creation
The symbol table is an object of class SymbolTable. The symbol table
contains a data member which is a list of objects of type SymbolTableNode.
Both classes are defined in the file SymbolTable.py.
Lookups use linear search, which avoids the overhead of keeping the table in
sorted order as binary search would require.
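A minimal sketch of this structure is given below. Only the class names SymbolTable and SymbolTableNode come from the project; the field names, the method names and the lookup by name alone (which ignores shadowing across scopes) are assumptions made to keep the example short.

```python
class SymbolTableNode:
    """One symbol table entry; fields other than name start out empty."""

    def __init__(self, name, type_=None, scope=None, value=None):
        self.name = name
        self.type = type_
        self.scope = scope
        self.value = value


class SymbolTable:
    def __init__(self):
        self.entries = []       # unsorted list of SymbolTableNode objects

    def lookup(self, name):
        """Linear scan over the entries; returns the node or None."""
        for node in self.entries:
            if node.name == name:
                return node
        return None

    def insert(self, name):
        """Add an entry holding only the name, unless one already exists."""
        node = self.lookup(name)
        if node is None:
            node = SymbolTableNode(name)
            self.entries.append(node)
        return node


table = SymbolTable()
table.insert("x")               # lexer stage: name only
entry = table.lookup("x")
entry.type = "int"              # parser stage: type and scope
entry.scope = 1
entry.value = 6                 # semantic stage: evaluated value
table.insert("x")               # a repeated insert does not duplicate the entry
```

Insertion is a plain append, so no reordering is needed, at the cost of an O(n) scan per lookup.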
The TAC class provides methods for inserting a line into the three-address
code and for generating new temporary variables and new label names as
needed by the semantic analysis module.
The classes for the TAC and the AST nodes are defined in the file code.py.
Each TAC object is associated with the AST node for which it is generated.
The action rules attached to the grammar build the AST from these nodes.
Each AST node contains a name, a data field, an input type (for leaf nodes
only) and a Boolean variable indicating whether the node denotes an L-value
or not.
Each AST node contains an object of type TAC that contains the three-address
code corresponding to that node.
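The relationship between AST nodes and their three-address code can be sketched as follows. This is an illustration under assumptions, not the contents of code.py: the method names, the shared counters, the children field and the leaf and binary helpers are hypothetical, and the quadruples are stored as (op, arg1, arg2, result) tuples.

```python
import itertools


class TAC:
    """Quadruple list plus generators for temporaries and labels."""

    # Shared counters so that names stay unique across all nodes.
    _temps = itertools.count(1)
    _labels = itertools.count(1)

    def __init__(self):
        self.quads = []

    @classmethod
    def new_temp(cls):
        return f"t{next(cls._temps)}"

    @classmethod
    def new_label(cls):
        return f"L{next(cls._labels)}"

    def add_line(self, op, arg1=None, arg2=None, result=None):
        self.quads.append((op, arg1, arg2, result))


class ASTNode:
    def __init__(self, name, data=None, input_type=None, is_lvalue=False, children=()):
        self.name = name
        self.data = data                # identifier, literal or temporary name
        self.input_type = input_type    # set for leaf nodes only
        self.is_lvalue = is_lvalue
        self.children = list(children)
        self.code = TAC()               # three-address code for this subtree


def leaf(data, input_type):
    return ASTNode("IDENTIFIER", data, input_type, is_lvalue=True)


def binary(op, left, right):
    """Build an interior node whose code is the children's code plus one line."""
    node = ASTNode("BinaryExpr", data=TAC.new_temp(), children=[left, right])
    node.code.quads = left.code.quads + right.code.quads
    node.code.add_line(op, left.data, right.data, node.data)
    return node


# b + c * d
product = binary("*", leaf("c", "int"), leaf("d", "int"))
total = binary("+", leaf("b", "int"), product)
```

After these calls, total.code.quads holds the quadruples for the whole expression in evaluation order, with the multiplication first.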
./go-compile <path-to-source-file.go>
7. Results
7.1 Final Result
The compiler front-end is shown to generate correct and optimized
intermediate representation of the input high level program written in Golang.
The three-address code is generated in quadruple format, and the symbol table
entries are correctly filled in.
The compiler front-end, designed entirely using Python Lex and Yacc, is shown
to work on both Windows and Linux environments, with a minimal number of
package dependencies and with reasonable performance for medium-sized
input programs.
7.2 Possible Shortcomings
The final optimized code is very likely to be slower and less efficient than
the code generated by the reference implementation of the Go compiler.
Import statements and the features that depend on them (such as reading
from STDIN and writing to STDOUT) are not implemented. Features unique to
the Go programming model, such as channels and the concurrency model, which
involve advanced lower-level programming, have also not been implemented.
8. Screenshots
9. Conclusions
We can conclude that a reasonably accurate compiler can be built using
Lex and Yacc for languages of several different kinds. The various
phases of a standard compiler front-end can be implemented with these
tools, and by following the rules of the language, a standard compiler
can be built for almost any language.