parser (1)

The document outlines the process of setting up an 8-bit CPU simulator and developing a simple programming language called SimpleLang. It details the steps for implementing a lexer, parser, and code generation, along with the architecture of the CPU and the structure of an Abstract Syntax Tree (AST). Additionally, it provides a breakdown of the source code organization and implementation strategy for building the compiler incrementally.

Uploaded by

Meghana Sangapur

Setup the 8-bit CPU Simulator

● Clone and Explore: Start by cloning the 8-bit CPU repository from GitHub. Spend time
understanding the CPU’s architecture, especially the instruction set, which will be crucial
for translating high-level code to assembly.
● Run Examples: Run the provided assembly code examples to familiarize yourself with
how the CPU executes instructions. This will help you understand what kind of assembly
code your compiler needs to generate.

Understand the 8-bit CPU Architecture

● Instruction Set: Focus on the instruction set first. Map each SimpleLang construct (like
variable declarations, assignments, and conditionals) to corresponding CPU instructions.
For example, you’ll need to understand how to implement arithmetic operations and
conditionals using the CPU’s instruction set.
● Key Files in Verilog: Review the Verilog code, particularly files like machine.v, to
understand how the CPU handles various instructions. This insight will guide your
assembly code generation.
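As one illustration of mapping a construct to instructions, the sketch below lowers a conditional of the form `if (lhs == rhs) { body }` into a compare followed by a jump over the body. The function name `lowerIfEqual`, the label format, and the mnemonics `LOAD`, `CMP`, and `JNE` are assumptions made for this example; they are not taken from the CPU's instruction set and should be replaced with whatever the target CPU actually provides.

```cpp
#include <string>
#include <vector>

// Sketch: lower `if (lhs == rhs) { body }` into compare-and-jump form.
// LOAD, CMP, and JNE are placeholder mnemonics (assumptions), not the
// target CPU's real instruction names.
std::vector<std::string> lowerIfEqual(const std::string& lhs,
                                      const std::string& rhs,
                                      const std::vector<std::string>& body,
                                      int labelId) {
    const std::string endLabel = "end_if_" + std::to_string(labelId);
    std::vector<std::string> out;
    out.push_back("LOAD " + lhs);      // bring the left operand into the accumulator
    out.push_back("CMP " + rhs);       // compare it with the right operand
    out.push_back("JNE " + endLabel);  // skip the body when the operands differ
    out.insert(out.end(), body.begin(), body.end());
    out.push_back(endLabel + ":");     // execution continues here in both cases
    return out;
}
```

Calling `lowerIfEqual("a", "b", {"LOAD c", "STORE d"}, 0)` returns six lines, with the body placed between the `JNE` and the `end_if_0:` label. Passing a fresh `labelId` for each conditional keeps labels unique.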

Design SimpleLang

● Syntax and Semantics: Clearly define the syntax and semantics of your language.
Keep it simple, as the project suggests, focusing on basic constructs like variable
declarations, assignments, arithmetic operations, and conditionals.
● Documentation: Document the language with examples. This will serve as a reference
when writing your lexer, parser, and code generator.

Create a Lexer

● Tokenization: Implement a lexer that can recognize SimpleLang’s tokens: keywords
(int, if), operators (+, -, =), identifiers, and literals (numbers). You can use regular
expressions or finite state machines (FSMs) for this purpose.
● Error Handling: Ensure your lexer can handle lexical errors, such as invalid identifiers
or incomplete tokens.
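The two points above can be sketched as a small hand-written lexer. This is a minimal example under assumptions: the punctuation set (`;`, parentheses, braces) is guessed from typical C-like syntax, the `Token` layout is illustrative, and lexical errors are reported by throwing `std::runtime_error`. Only the keywords and operators listed above are recognized, so `==` would be read as two `=` tokens.

```cpp
#include <cctype>
#include <stdexcept>
#include <string>
#include <vector>

enum class TokenType { Keyword, Identifier, Number, Operator, Punctuation };

struct Token {
    TokenType type;
    std::string text;
};

// True when c may appear inside an identifier.
static bool isWordChar(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

// Sketch of a lexer for the tokens listed above.
// Throws std::runtime_error when the input contains a lexical error.
std::vector<Token> tokenize(const std::string& src) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        const char ch = src[i];
        const unsigned char uch = static_cast<unsigned char>(ch);
        if (std::isspace(uch)) {
            ++i;  // whitespace only separates tokens
        } else if (std::isalpha(uch) || ch == '_') {
            const std::size_t start = i;
            while (i < src.size() && isWordChar(src[i])) ++i;
            const std::string word = src.substr(start, i - start);
            const bool keyword = (word == "int" || word == "if");
            tokens.push_back({keyword ? TokenType::Keyword : TokenType::Identifier, word});
        } else if (std::isdigit(uch)) {
            const std::size_t start = i;
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i]))) ++i;
            // Letters directly after digits (e.g. "12ab") form an invalid identifier.
            if (i < src.size() && isWordChar(src[i])) {
                throw std::runtime_error("invalid identifier at offset " + std::to_string(start));
            }
            tokens.push_back({TokenType::Number, src.substr(start, i - start)});
        } else if (ch == '+' || ch == '-' || ch == '=') {
            tokens.push_back({TokenType::Operator, std::string(1, ch)});
            ++i;
        } else if (ch == ';' || ch == '(' || ch == ')' || ch == '{' || ch == '}') {
            tokens.push_back({TokenType::Punctuation, std::string(1, ch)});
            ++i;
        } else {
            throw std::runtime_error("unexpected character at offset " + std::to_string(i));
        }
    }
    return tokens;
}
```

For the input `int a; a = b + 12;` this produces nine tokens, starting with the keyword `int` and ending with the punctuation `;`. An input such as `a = 1 @ 2;` raises an error at the `@`.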

Develop a Parser

● AST Generation: Implement a parser that takes the tokens from your lexer and
generates an Abstract Syntax Tree (AST). The AST represents the syntactic structure of
the SimpleLang program.
● Syntax Error Handling: Your parser should detect and report syntax errors, such as
missing semicolons or unmatched braces, gracefully.

Generate Assembly Code

● Mapping to Assembly: Traverse the AST and generate corresponding assembly code
for the 8-bit CPU. Each node in the AST should correspond to one or more assembly
instructions.
● Optimizations: Though the language is simple, consider basic optimizations, such as
minimizing the number of instructions or reusing CPU registers efficiently.
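One way to reduce the instruction count is constant folding: evaluating an operation at compile time when both operands are literals. The sketch below uses a small stand-in `Expr` tree rather than the compiler's real AST. The `Expr` layout and the function name `foldConstants` are hypothetical, and the wrap-around at 256 is an assumption about how the 8-bit CPU's arithmetic behaves.

```cpp
#include <cstdint>
#include <memory>
#include <string>

// Tiny expression tree used only for this sketch (hypothetical layout).
struct Expr {
    char op = 0;             // '+' or '-' for an operation, 0 for a leaf
    bool isConst = false;    // true when the leaf is a numeric literal
    std::uint8_t value = 0;  // literal value, stored as 8 bits
    std::string name;        // variable name when the leaf is an identifier
    std::unique_ptr<Expr> left, right;
};

// Fold constant subtrees bottom-up, e.g. (2 + 3) becomes the literal 5.
// Assumption: arithmetic wraps modulo 256, as on unsigned 8-bit registers.
void foldConstants(Expr& e) {
    if (e.op == 0) return;  // leaves have nothing to fold
    foldConstants(*e.left);
    foldConstants(*e.right);
    if (e.left->isConst && e.right->isConst) {
        const int l = e.left->value;
        const int r = e.right->value;
        e.value = static_cast<std::uint8_t>(e.op == '+' ? l + r : l - r);
        e.isConst = true;
        e.op = 0;         // the node is now a literal leaf
        e.left.reset();   // the folded operands are no longer needed
        e.right.reset();
    }
}
```

Applied to `a + (2 + 3)`, the inner operation collapses to the literal 5 while the outer addition stays, because `a` is only known at run time. The code generator then emits one operation instead of two.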

SimpleLang-Compiler/
├── src/
│   ├── lexer/
│   │   ├── lexer.h
│   │   └── lexer.c
│   ├── parser/
│   │   ├── parser.h
│   │   └── parser.c
│   ├── codegen/
│   │   ├── codegen.h
│   │   └── codegen.c
│   ├── ast/
│   │   ├── ast.h
│   │   └── ast.c
│   ├── utils/
│   │   ├── utils.h
│   │   └── utils.c
│   └── main.c
├── include/
│   ├── tokens.h
│   ├── syntax.h
│   └── errors.h
├── tests/
│   ├── test_cases.sl
│   └── test_runner.c
├── Makefile
└── README.md

Component Breakdown

2.1. src/lexer/: Lexer Implementation

● lexer.h: Header file declaring functions and data structures used by the lexer.
○ Function prototypes: tokenize(), getNextToken(), etc.
○ Data structures: Token, TokenType.
● lexer.c: Source file implementing the lexer.
○ tokenize(): Reads the SimpleLang source code and produces a list of tokens.
○ getNextToken(): Retrieves the next token from the source code.

2.2. src/parser/: Parser Implementation

● parser.h: Header file declaring the parser functions and data structures.
○ Function prototypes: parse(), parseStatement(), parseExpression(),
etc.
○ Data structures: ASTNode, ParserState.
● parser.c: Source file implementing the parser.
○ parse(): Converts tokens into an AST.
○ parseStatement(), parseExpression(): Handles specific parsing tasks
like parsing a statement or an expression.
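A recursive-descent layout is one common way to organize these functions, with one function per grammar rule. The sketch below is written in C++ to match the later examples and works on tokens that have already been split into strings. The grammar it accepts (`identifier = operand { (+|-) operand } ;`), the `Node` layout, and the error messages are assumptions chosen for illustration, not the project's required design.

```cpp
#include <cctype>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Minimal AST node: a leaf holds a name or number, an inner node an operator.
struct Node {
    std::string text;
    std::unique_ptr<Node> left, right;
};

class Parser {
public:
    explicit Parser(std::vector<std::string> tokens) : toks_(std::move(tokens)) {}

    // statement := identifier "=" expression ";"
    std::unique_ptr<Node> parseStatement() {
        if (pos_ >= toks_.size() || !std::isalpha(firstChar())) {
            fail("expected identifier");
        }
        auto target = makeNode(toks_[pos_++], nullptr, nullptr);
        expect("=");
        auto value = parseExpression();
        expect(";");
        return makeNode("=", std::move(target), std::move(value));
    }

private:
    // expression := operand { ("+" | "-") operand }   (left-associative)
    std::unique_ptr<Node> parseExpression() {
        auto lhs = parseOperand();
        while (pos_ < toks_.size() && (toks_[pos_] == "+" || toks_[pos_] == "-")) {
            const std::string op = toks_[pos_++];
            auto rhs = parseOperand();
            lhs = makeNode(op, std::move(lhs), std::move(rhs));
        }
        return lhs;
    }

    // operand := identifier | number
    std::unique_ptr<Node> parseOperand() {
        if (pos_ >= toks_.size() || !std::isalnum(firstChar())) {
            fail("expected identifier or number");
        }
        return makeNode(toks_[pos_++], nullptr, nullptr);
    }

    void expect(const std::string& want) {
        if (pos_ >= toks_.size() || toks_[pos_] != want) fail("expected '" + want + "'");
        ++pos_;
    }

    // First character of the current token (0 for an empty token).
    unsigned char firstChar() const { return static_cast<unsigned char>(toks_[pos_][0]); }

    // Report a syntax error together with the index of the offending token.
    [[noreturn]] void fail(const std::string& what) const {
        throw std::runtime_error(what + " at token " + std::to_string(pos_));
    }

    static std::unique_ptr<Node> makeNode(std::string text, std::unique_ptr<Node> l,
                                          std::unique_ptr<Node> r) {
        std::unique_ptr<Node> n(new Node());
        n->text = std::move(text);
        n->left = std::move(l);
        n->right = std::move(r);
        return n;
    }

    std::vector<std::string> toks_;
    std::size_t pos_ = 0;
};
```

Parsing the tokens `a = b + c ;` yields a root node `=` whose left child is `a` and whose right child is a `+` node over `b` and `c`. Leaving out the final `;` throws an error reading `expected ';' at token 5`, which is the kind of report the syntax error handling step calls for.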

2.3. src/codegen/: Code Generation

● codegen.h: Header file declaring code generation functions.
○ Function prototypes: generateCode(), generateExpressionCode(), etc.
● codegen.c: Source file implementing code generation.
○ generateCode(): Converts the AST into 8-bit CPU assembly code.
○ generateExpressionCode(): Generates assembly code for expressions.

2.4. src/ast/: Abstract Syntax Tree (AST) Representation

● ast.h: Header file defining the structure of AST nodes.
○ Data structures: ASTNode, NodeType, VariableNode, ExpressionNode, etc.
● ast.c: Source file for AST-related utilities.
○ Functions: createASTNode(), destroyASTNode(), etc.

2.5. src/utils/: Utility Functions

● utils.h: Header file for general utility functions.
○ Function prototypes: error(), debug(), etc.
● utils.c: Source file implementing utility functions.
○ error(): Handles error reporting.
○ debug(): Outputs debug information during compilation.
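Error reports are easier to act on when they carry a source position. The helper below is a hypothetical building block that an error() function could call; the name `formatError` and the `file:line:column: error: message` layout are assumptions modelled on common compiler output, not something the project prescribes.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: format a diagnostic as
// "file:line:column: error: message".
std::string formatError(const std::string& file, int line, int column,
                        const std::string& message) {
    std::ostringstream out;
    out << file << ':' << line << ':' << column << ": error: " << message;
    return out.str();
}
```

For example, `formatError("test.sl", 3, 7, "expected ';'")` returns `test.sl:3:7: error: expected ';'`. Returning a string instead of printing directly keeps the helper easy to check from a test runner.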

2.6. src/main.c: Compiler Entry Point

● main.c: The main entry point of the compiler.
○ This file coordinates the entire compilation process: lexing, parsing, code
generation, and writing output to a file.

Implementation Strategy

1. Start with the Lexer: Implement and test the lexer first, as it's the foundation for
parsing.
2. Move to the Parser: Once the lexer is stable, develop the parser to generate the AST.
3. AST and Codegen: Implement the AST representation and the code generation
module.
4. Integrate and Test: Combine the components and run test cases from the tests/
directory.
5. Iterative Development: Build the compiler incrementally, testing each stage before
moving on.

Abstract Syntax Tree (AST) Reference

What is an AST?

An Abstract Syntax Tree (AST) is a tree representation of the syntactic structure of source code.
Each node in the tree represents a construct occurring in the source code. The AST abstracts
away the details of syntax, focusing instead on the hierarchical and structural aspects of the
code.

Basic Components of an AST

1. Nodes: Represent syntactic constructs (e.g., expressions, statements).
○ Expression Nodes: Handle arithmetic expressions and logical comparisons.
○ Statement Nodes: Handle variable declarations, assignments, and conditionals.
2. Tree Structure: Nodes are connected to represent the hierarchical relationship between
constructs.

Example AST Structure

For the SimpleLang code:

a = b + c;

The corresponding AST could look like this:

AssignmentNode
├── IdentifierNode(a)
└── BinaryOpNode(+)
    ├── IdentifierNode(b)
    └── IdentifierNode(c)

Implementing an AST in C/C++

Node Structure:

#include <iostream>
#include <string>

// Base class for AST nodes.
// Nodes do not own their children: whoever allocates a node frees it.
class ASTNode {
public:
    virtual ~ASTNode() = default;
    virtual void print() const = 0;  // Print this node and its children
};

// Identifier Node
class IdentifierNode : public ASTNode {
public:
    std::string name;

    explicit IdentifierNode(const std::string& name) : name(name) {}

    void print() const override {
        std::cout << "Identifier: " << name << std::endl;
    }
};

// Binary Operation Node
class BinaryOpNode : public ASTNode {
public:
    std::string op;
    ASTNode* left;
    ASTNode* right;

    BinaryOpNode(const std::string& op, ASTNode* left, ASTNode* right)
        : op(op), left(left), right(right) {}

    void print() const override {
        std::cout << "Binary Operation: " << op << std::endl;
        left->print();
        right->print();
    }
};

// Assignment Node
class AssignmentNode : public ASTNode {
public:
    ASTNode* lhs;
    ASTNode* rhs;

    AssignmentNode(ASTNode* lhs, ASTNode* rhs) : lhs(lhs), rhs(rhs) {}

    void print() const override {
        std::cout << "Assignment:" << std::endl;
        lhs->print();
        rhs->print();
    }
};

Constructing the AST:

int main() {
    // Example: a = b + c;
    ASTNode* a = new IdentifierNode("a");
    ASTNode* b = new IdentifierNode("b");
    ASTNode* c = new IdentifierNode("c");
    ASTNode* sum = new BinaryOpNode("+", b, c);
    ASTNode* assignment = new AssignmentNode(a, sum);

    assignment->print();  // Output the AST structure

    // Clean up: nodes do not delete their children, so free each one.
    delete assignment;
    delete sum;
    delete c;
    delete b;
    delete a;

    return 0;
}

Assembly Code Generation Reference

Mapping AST to Assembly

Once you have an AST, you need to traverse it to generate corresponding assembly code. The
key idea is to map high-level constructs (like arithmetic expressions and assignments) to
low-level CPU instructions.

Example: Simple Addition

Given the AST for a = b + c;, the assembly generation would involve:

1. Loading the value of b into a register.
2. Adding the value of c to the register.
3. Storing the result back into the memory location for a.
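The three steps can be written out directly for the simplest case, where both operands are plain variables. This is a sketch under assumptions: `emitSimpleAssignment` is a hypothetical helper, `LOAD`, `ADD`, `SUB`, and `STORE` are placeholder mnemonics, and the arithmetic instruction is assumed to accept a variable's memory location as its operand, which may not hold for the actual CPU.

```cpp
#include <string>
#include <vector>

// Sketch: emit the three steps above for `target = lhs + rhs` (or `-`),
// where both operands are plain variables. LOAD, ADD, SUB, and STORE are
// placeholder mnemonics (assumptions), not the CPU's real instruction names.
std::vector<std::string> emitSimpleAssignment(const std::string& target,
                                              const std::string& lhs,
                                              char op,
                                              const std::string& rhs) {
    std::vector<std::string> out;
    out.push_back("LOAD " + lhs);                                   // step 1
    out.push_back(std::string(op == '+' ? "ADD " : "SUB ") + rhs);  // step 2
    out.push_back("STORE " + target);                               // step 3
    return out;
}
```

For `a = b + c;` the call `emitSimpleAssignment("a", "b", '+', "c")` returns `LOAD b`, `ADD c`, `STORE a`. Nested expressions need somewhere to hold intermediate results, which is why a general generator typically saves the left operand (for example on a stack) before evaluating the right one.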

Assembly Code Generation in C/C++

Code Generation Functions:

// Uses the ASTNode classes defined in the AST section.
// LOAD, PUSH, ADD, SUB, and STORE are illustrative mnemonics; map them to
// the instructions the target CPU actually provides.
class CodeGenerator {
public:
    void generateCode(ASTNode* node) {
        if (auto* assign = dynamic_cast<AssignmentNode*>(node)) {
            generateAssignment(assign);
        }
        // Add more cases for other node types...
    }

private:
    void generateAssignment(AssignmentNode* node) {
        // The left-hand side must be an identifier naming the destination.
        auto* target = dynamic_cast<IdentifierNode*>(node->lhs);
        if (!target) {
            std::cerr << "error: assignment target must be an identifier" << std::endl;
            return;
        }
        generateExpression(node->rhs);
        std::cout << "STORE " << target->name << std::endl;
    }

    void generateExpression(ASTNode* node) {
        if (auto* binOp = dynamic_cast<BinaryOpNode*>(node)) {
            generateExpression(binOp->left);
            std::cout << "PUSH" << std::endl;  // save the left operand
            generateExpression(binOp->right);
            std::cout << (binOp->op == "+" ? "ADD" : "SUB") << std::endl;
        } else if (auto* id = dynamic_cast<IdentifierNode*>(node)) {
            std::cout << "LOAD " << id->name << std::endl;
        }
    }
};

Generating Assembly for a Sample Program:

int main() {
    // Example: a = b + c;
    ASTNode* a = new IdentifierNode("a");
    ASTNode* b = new IdentifierNode("b");
    ASTNode* c = new IdentifierNode("c");
    ASTNode* sum = new BinaryOpNode("+", b, c);
    ASTNode* assignment = new AssignmentNode(a, sum);

    CodeGenerator codeGen;
    codeGen.generateCode(assignment);

    // Output might be:
    // LOAD b
    // PUSH
    // LOAD c
    // ADD
    // STORE a

    // Clean up: nodes do not delete their children, so free each one.
    delete assignment;
    delete sum;
    delete c;
    delete b;
    delete a;

    return 0;
}
