CD Project Report

The document is a report submitted for a compiler design course on developing a compiler front-end for the Go programming language. It was submitted by three students and describes the context-free grammar, design strategy, and implementation details of their mini-compiler for Go. The compiler performs lexical analysis, syntax analysis, semantic analysis, generates three-address code, and applies optimizations to the intermediate code.

Uploaded by

Fadwa Abid
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
237 views

CD Project Report

The document is a report submitted for a compiler design course on developing a compiler front-end for the Go programming language. It was submitted by three students and describes the context-free grammar, design strategy, and implementation details of their mini-compiler for Go. The compiler performs lexical analysis, syntax analysis, semantic analysis, generates three-address code, and applies optimizations to the intermediate code.

Uploaded by

Fadwa Abid
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 14

Report on

Compiler Front-End for the Go programming language

Submitted in partial fulfilment of the requirements for Semester VI

Compiler Design (UE18CS351)


Bachelor of Technology
in
Computer Science & Engineering

Submitted by:
Aronya Baksy PES1201800002
Suhas Vasisht PES1201800212
Aditeya Baral PES1201800366

Under the guidance of

Mr. Kiran P
Assistant Professor
PES University, Bengaluru

January – May 2021

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING


FACULTY OF ENGINEERING
PES UNIVERSITY
(Established under Karnataka Act No. 16 of 2013)
100ft Ring Road, Bengaluru – 560 085, Karnataka, India
TABLE OF CONTENTS
Chapter No. Title
1. INTRODUCTION
2. ARCHITECTURE OF THE LANGUAGE
3. LITERATURE SURVEY
4. CONTEXT-FREE GRAMMAR
5. DESIGN STRATEGY
● SYMBOL TABLE CREATION
● INTERMEDIATE CODE GENERATION
● CODE OPTIMIZATION
● ERROR HANDLING
6. IMPLEMENTATION DETAILS (TOOLS AND DATA STRUCTURES USED)
● SYMBOL TABLE CREATION
● INTERMEDIATE CODE GENERATION
● CODE OPTIMIZATION
● ERROR HANDLING
7. RESULTS AND POSSIBLE SHORTCOMINGS OF THE MINI-COMPILER
8. SNAPSHOTS (OF DIFFERENT OUTPUTS)
9. CONCLUSIONS
10. FURTHER ENHANCEMENTS
REFERENCES/BIBLIOGRAPHY

1. Introduction
The compiler is designed for the Go programming language (also sometimes referred
to as Golang). Go is a statically typed, compiled programming language that features
a C-like syntax but aims to be as readable and usable as more modern languages like
Python.

The compiler takes in a valid path to a file containing Go source code. It reads the source code and performs the following steps on it:
● Lexical Analysis: converts the source code into a stream of tokens. Tokens are defined in the lexer file lexer.py.
● Syntax Analysis: matches the input stream of tokens against the grammar rules defined in the file parser.py.
● Semantic Analysis: using actions attached to the grammar rules above, transforms the source code into a tree representation called an Abstract Syntax Tree (AST), along with the associated Three Address Code (a machine-independent intermediate representation of the code).
● Three Address Code: collects the Three Address Code generated by the semantic rules in the previous step and puts it into a data structure that stores it in quadruple format.
● Optimization: applies constant folding, common subexpression elimination and temporary-packing optimizations to reduce the length and complexity of the generated Three Address Code.
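As an illustration of the quadruple format mentioned above, the statement x := a*b + a*b might be lowered to quadruples like the following. This is a hand-written sketch: the field order (operator, result, operand 1, operand 2) follows the TAC layout described in Section 6, and the temporary names t0, t1, t2 are hypothetical.

```python
# Hypothetical quadruples for the statement: x := a*b + a*b
# Assumed field order: (operator, result, operand1, operand2)
tac = [
    ("*", "t0", "a", "b"),
    ("*", "t1", "a", "b"),    # common subexpression; removable by optimization
    ("+", "t2", "t0", "t1"),
    (":=", "x", "t2", None),  # copy has no second operand
]
for op, res, a1, a2 in tac:
    print(op, res, a1, a2)
```

After common subexpression elimination, the second multiplication would be dropped and t1 replaced by t0 throughout.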

2. Architecture of the Language


The following subset of Go syntax is handled by the grammar (syntax and semantic rules) implemented in this mini project:

● Short variable declarations (using type inference and the shorthand operator :=) are supported. Explicit type declarations (of the form var x int = 5) are not supported.
● Assignment statements involving shorthand operators (+=, -=, *= and so on) are supported.
● Arithmetic expressions involving the +, -, *, / and % operators are supported.
● Switch-case expressions involving single variables are supported.
● For-loop statements containing a logical condition are supported.
● Import statements, type definitions, constant declarations and array expressions are not handled.

3. Literature Survey
Sources:
● PLY (Python Lex-Yacc) official documentation and examples
(https://ply.readthedocs.io/en/latest/)
● BNF (Backus-Naur Form) representation of the Context-Free Grammar
of the Go language (https://golang.org/ref/spec)
● Operator precedence rules in the Go programming language
(https://www.tutorialspoint.com/go/go_operators_precedence.htm)

4. Context-Free Grammar
SourceFile : PACKAGE IDENTIFIER SEMICOLON ImportDeclList TopLevelDeclList

ImportDeclList : ImportDecl SEMICOLON ImportDeclList
| empty

TopLevelDeclList : TopLevelDecl SEMICOLON TopLevelDeclList
| empty

TopLevelDecl : Declaration
| FunctionDecl

ImportDecl : IMPORT LROUND ImportSpecList RROUND
| IMPORT ImportSpec

ImportSpecList : ImportSpec SEMICOLON ImportSpecList
| empty

ImportSpec : DOT string_lit
| IDENTIFIER string_lit
| empty string_lit

Block : LCURLY ScopeStart StatementList ScopeEnd RCURLY

ScopeStart : empty

ScopeEnd : empty

StatementList : Statement SEMICOLON StatementList
| empty

Statement : Declaration
| SimpleStmt
| ReturnStmt
| Block
| IfStmt
| SwitchStmt
| ForStmt
| PrintIntStmt
| PrintStrStmt

Declaration : VarDecl

IdentifierList : IDENTIFIER COMMA IdentifierBotList

IdentifierBotList : IDENTIFIER COMMA IdentifierBotList
| IDENTIFIER

ExpressionList : Expression COMMA ExpressionBotList

ExpressionBotList : Expression COMMA ExpressionBotList
| Expression

Type : StandardTypes
StandardTypes : PREDEFINED_TYPES

FunctionType : FUNC Signature

Signature : Parameters
| Parameters Result

Result : Parameters
| Type

Parameters : LROUND RROUND
| LROUND ParameterList RROUND

ParameterList : ParameterDecl
| ParameterList COMMA ParameterDecl

ParameterDecl : IdentifierList Type
| IDENTIFIER Type
| Type

SimpleStmt : Expression
| Assignment
| ShortVarDecl
| IncDecStmt

ShortVarDecl : ExpressionList ASSIGN_OP ExpressionList
| Expression ASSIGN_OP Expression

Assignment : ExpressionList assign_op ExpressionList
| Expression assign_op Expression

assign_op : EQ
| PLUS_EQ
| MINUS_EQ
| OR_EQ
| CARET_EQ
| STAR_EQ
| DIVIDE_EQ
| MODULO_EQ
| LS_EQ
| RS_EQ
| AMP_EQ
| AND_OR_EQ

SwitchStmt : ExprSwitchStmt

ExprSwitchStmt : SWITCH SimpleStmt SEMICOLON LCURLY ScopeStart ExprCaseClauseList ScopeEnd RCURLY
| SWITCH SimpleStmt SEMICOLON Expression LCURLY ScopeStart ExprCaseClauseList ScopeEnd RCURLY
| SWITCH LCURLY ScopeStart ExprCaseClauseList ScopeEnd RCURLY
| SWITCH Expression LCURLY ScopeStart ExprCaseClauseList ScopeEnd RCURLY

ExprCaseClauseList : empty
| ExprCaseClauseList ExprCaseClause

ExprCaseClause : ExprSwitchCase COLON StatementList

ExprSwitchCase : CASE ExpressionList
| DEFAULT
| CASE Expression

ForStmt : FOR Expression Block
| FOR Block

ReturnStmt : RETURN
| RETURN Expression
| RETURN ExpressionList

Expression : UnaryExpr
| Expression OR_OR Expression
| Expression AMP_AMP Expression
| Expression EQ_EQ Expression
| Expression NOT_EQ Expression
| Expression LT Expression
| Expression LT_EQ Expression
| Expression GT Expression
| Expression GT_EQ Expression
| Expression PLUS Expression
| Expression MINUS Expression
| Expression OR Expression
| Expression CARET Expression
| Expression STAR Expression
| Expression DIVIDE Expression
| Expression MODULO Expression
| Expression LS Expression
| Expression RS Expression
| Expression AMP Expression
| Expression AND_OR Expression

UnaryExpr : PrimaryExpr
| unary_op UnaryExpr

unary_op : PLUS
| MINUS
| NOT
| CARET
| STAR
| AMP
| LT_MINUS

PrimaryExpr : Operand
| IDENTIFIER
| PrimaryExpr Selector
| PrimaryExpr Index
| PrimaryExpr Arguments

Operand : Literal
| LROUND Expression RROUND

Literal : BasicLit

BasicLit : decimal_lit
| float_lit
| string_lit

decimal_lit : DECIMAL_LIT

float_lit : FLOAT_LIT

Arguments : LROUND RROUND
| LROUND ExpressionList RROUND
| LROUND Expression RROUND
| LROUND Type RROUND
| LROUND Type COMMA ExpressionList RROUND
| LROUND Type COMMA Expression RROUND

string_lit : STRING_LIT

5. Design Strategy
5.1 Symbol Table Creation
The Symbol Table entries are created by the lexer. At this stage, the only field that is filled in is the symbol name.

The parser then adds further information about the identifier, such as its type and its scope.

During semantic analysis, expression results are evaluated and the values are stored in the symbol table.

5.2 Intermediate Code Generation

The generation of three-address code is handled by action rules incorporated into the grammar, following a syntax-directed translation scheme (SDTS). The rules for generating TAC are therefore written in terms of the parser stack: each action reads the attributes of the symbols on the right-hand side of a production and synthesizes an attribute (and emits code) for the left-hand side.
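The SDTS pattern above can be sketched as a PLY-style action rule. This is a minimal illustration, not the project's actual code: the function name p_expression_plus, the helper new_temp, and the quadruple field order (operator, result, operand 1, operand 2) are assumptions; PLY itself would call the action during reduction, so the parser stack is simulated by hand here.

```python
# A minimal sketch of a syntax-directed action rule in the PLY style.
tac = []            # collected three-address code (quadruples)
_counter = [0]

def new_temp():
    """Generate a fresh temporary name t0, t1, ..."""
    t = f"t{_counter[0]}"
    _counter[0] += 1
    return t

def p_expression_plus(p):
    """expression : expression PLUS expression"""
    p[0] = new_temp()                    # synthesized attribute: result temporary
    tac.append(("+", p[0], p[1], p[3]))  # emit a quadruple as a side effect

# PLY would invoke the action on each reduction; here we simulate the
# parser stack for (a + b) + c by hand:
p1 = [None, "a", "+", "b"]
p_expression_plus(p1)
p2 = [None, p1[0], "+", "c"]
p_expression_plus(p2)
print(tac)   # [('+', 't0', 'a', 'b'), ('+', 't1', 't0', 'c')]
```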

5.3 Optimization
The three-address code is optimized in a separate script once the entire
intermediate code is generated from the input. The optimization script takes
the complete symbol table, as well as the current generated three address code
as input, and outputs the appropriate optimized three address code.

5.4 Error Handling


The following error-handling strategies are in place:
● The lexer handles invalid identifiers by exiting as soon as an invalid identifier token is detected; the program exits and no further processing is done.
● The parser handles syntax errors using the p_error handler that PLY invokes on a parse error.
● Errors handled by the semantic analysis module:
o Mismatched number of identifiers on the left- and right-hand sides of an assignment statement
o Invalid number of function arguments

6. Implementation Details
6.1 Symbol Table Creation
The symbol table is an object of class SymbolTable, which contains a data member that is a list of objects of type SymbolTableNode. Both classes are defined in the file SymbolTable.py.

The SymbolTableNode class contains 4 data members: the identifier name, the type, the value, and the scope it was declared in.

The SymbolTable class provides an add method that appends a row to the symbol table, as well as a search method that looks up a row given the name of an identifier.

Searching is done using linear search, to avoid the additional overhead of keeping the table sorted for binary search.

To support common subexpression elimination, the symbol table contains an expression field, which is a reference to the AST node for an expression that has already been encountered. If a symbol table lookup for an expression returns a non-empty result, the existing AST node is reused instead of a new AST node being generated.
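The two classes described above might look like the following sketch. The field and method names (add, search, and the four data members plus the expression field) follow the report; default values and everything else are illustrative assumptions.

```python
# Minimal sketch of the SymbolTable / SymbolTableNode classes.
class SymbolTableNode:
    def __init__(self, name, type_=None, value=None, scope=0, expression=None):
        self.name = name              # identifier name (filled in by the lexer)
        self.type = type_             # filled in by the parser
        self.value = value            # filled in during semantic analysis
        self.scope = scope            # scope the identifier was declared in
        self.expression = expression  # AST-node reference used for CSE

class SymbolTable:
    def __init__(self):
        self.rows = []

    def add(self, node):
        """Append a row to the table."""
        self.rows.append(node)

    def search(self, name):
        """Linear search by identifier name; None if not found."""
        for row in self.rows:
            if row.name == name:
                return row
        return None

table = SymbolTable()
table.add(SymbolTableNode("x", "int", 5, scope=0))
print(table.search("x").value)   # 5
print(table.search("y"))         # None
```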

6.2 Intermediate Code Generation

The process of intermediate code generation is linked to the construction of the abstract syntax tree through the action rules included with the grammar productions.

The intermediate code is stored in a class called TAC that contains a single data member: a list of quadruples holding the operator, the result, and the two operands.

The TAC class contains methods for inserting a line into the three-address code, as well as methods that generate new temporary variables and new label names as needed by the semantic analysis module.

The classes for the TAC and the AST nodes are defined in the file code.py.

A TAC object is associated with each AST node that is generated; using the action rules, the AST is built from these nodes. Each AST node contains a name, a data field, an input type (for leaf nodes only), and a Boolean flag indicating whether the node denotes an L-value. Each AST node also contains an object of type TAC holding the three-address code corresponding to that node.
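A sketch of the TAC container described above, with helpers for fresh temporaries and labels, could look like this. The method names emit, new_temp and new_label, and the t0/L0 naming scheme, are assumptions for illustration.

```python
# Sketch of a TAC container: a list of quadruples plus generators
# for fresh temporary variables and label names.
class TAC:
    def __init__(self):
        self.code = []    # each entry: (operator, result, operand1, operand2)
        self._temps = 0
        self._labels = 0

    def emit(self, op, result, arg1=None, arg2=None):
        """Insert one line of three-address code."""
        self.code.append((op, result, arg1, arg2))

    def new_temp(self):
        """Return a fresh temporary variable name."""
        t = f"t{self._temps}"
        self._temps += 1
        return t

    def new_label(self):
        """Return a fresh label name."""
        l = f"L{self._labels}"
        self._labels += 1
        return l

tac = TAC()
t = tac.new_temp()
tac.emit("+", t, "a", "b")
print(tac.code)   # [('+', 't0', 'a', 'b')]
```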

6.3 Code Optimization

The intermediate code optimization step takes the symbol table and the final generated three-address code as input.

For constant folding and propagation, the symbol table is used to retrieve the evaluated values and the pointer to the already-evaluated subexpression; the resulting constant value then replaces the original expression in the three-address code.
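A hand-written sketch of constant folding and propagation over quadruples is shown below. It works directly on a constant map rather than the project's symbol table, and the function name fold and the quadruple field order (operator, result, operand 1, operand 2) are assumptions.

```python
# Sketch of constant folding + propagation over quadruples.
OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b,
       "*": lambda a, b: a * b, "%": lambda a, b: a % b}

def fold(code):
    consts, out = {}, []
    for op, res, a1, a2 in code:
        a1 = consts.get(a1, a1)     # propagate known constants into operands
        a2 = consts.get(a2, a2)
        if op in OPS and isinstance(a1, int) and isinstance(a2, int):
            consts[res] = OPS[op](a1, a2)   # fold: no quadruple is emitted
        else:
            out.append((op, res, a1, a2))
    return out

# 3*4 + 5 folds away entirely; only the final copy survives.
code = [("*", "t0", 3, 4), ("+", "t1", "t0", 5), (":=", "x", "t1", None)]
print(fold(code))   # [(':=', 'x', 17, None)]
```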

6.4 Error Handling

Syntax errors in the parsing stage are handled by special rules in the p_error handler. The syntax error is reported by printing an error message that includes the line number of the error.

In the lexical analysis module, the lexer detects invalid identifiers and stops the compilation process immediately, printing an error message to the screen with the line number on which the invalid identifier was encountered.
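A p_error handler of the kind described above could be sketched as follows. PLY calls p_error with the offending token (or None at end of input); having the function also return the message is an assumption made here purely so the sketch is easy to check, and the stand-in token built with SimpleNamespace is hypothetical.

```python
from types import SimpleNamespace

def p_error(p):
    """PLY-style syntax-error handler: report the line number and token."""
    if p is None:
        msg = "Syntax error: unexpected end of input"
    else:
        msg = f"Syntax error at line {p.lineno}: unexpected token {p.value!r}"
    print(msg)
    return msg

# Simulate PLY handing us a bad token on line 3:
tok = SimpleNamespace(lineno=3, value=":=")
print(p_error(tok))   # Syntax error at line 3: unexpected token ':='
```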

6.5 How to Run

On a Linux machine, the compiler can be run using the executable script bundled with it, via the command

./go-compile <path-to-source-file.go>

7. Results
7.1 Final Result
The compiler front-end is shown to generate a correct and optimized intermediate representation of the input high-level program written in Golang. The three-address code is generated in quadruple format, and the symbol table entries are correctly filled in.

The compiler front-end, designed entirely using Python Lex-Yacc, is shown to work in both Windows and Linux environments, with a minimal number of package dependencies and reasonable performance for medium-sized input programs.
7.2 Possible Shortcomings
The final optimized code is likely to be slower and less efficient than that generated by the reference implementation of the Go compiler. In addition, import statements and the associated features (such as reading from STDIN and writing to STDOUT) are not implemented. Unique features of the Go programming model, such as channels and the concurrency model, which involve advanced lower-level programming, have also not been implemented.

8. Screenshots

Output for a program with a switch-case statement

Output for a program with a loop statement

Output for an invalid identifier

Output for a program with common subexpressions

9. Conclusions
We can conclude that a satisfactorily accurate compiler can be built using Lex and Yacc for several different languages across multiple paradigms. The various phases of a standard compiler front-end can be implemented using these tools, and by following the standard design process, a compiler front-end can be built for almost any language.

10. Further Enhancements


Some possible further enhancements to this mini project are as follows:
● Implementation of syntax and semantic rules for conditional and unconditional branch statements (if, break, goto)
● More optimization techniques based on dead-code elimination (implementing live-variable analysis and adding next-use information to the symbol table)
