CSE 140L Lab 4 - Design of a Tiny Computer
Gopi Tummala, Prof. CK Cheng, FALL 2010, UCSD

Report and demo due: 12/3/2010

Objective
In this lab, we will build a tiny computer (description given separately) in Verilog. The execution results will be
displayed on the seven-segment LED displays of your board. Unlike a real computer, our tiny computer will have only a
few instructions.

Computer details
System Overview
A traditional digital computer consists of three main units: the processor or central processing unit (CPU), the
memory that stores program instructions and data, and the input/output hardware that communicates with other
devices. As seen in Figure 1, these units are connected by a collection of parallel digital signals called a bus.
Typically, signals on the bus include the memory address, memory data, and bus status. Bus status signals indicate
the current bus operation: memory read, memory write, or input/output.

Figure-1: Architecture of a Tiny Computer System.

Internally, the CPU contains a small number of registers that store data inside the processor. Registers such as
PC, IR, AC, MAR, and MDR are built using D flip-flops. One or more arithmetic logic units (ALUs) are also contained
inside the CPU. The ALU performs arithmetic and logical operations on data values; common ALU operations include add,
subtract, multiply, and logical AND/OR. Register-to-bus connections are hard-wired for simple point-to-point
connections. When one of several registers can drive the bus, the connections are constructed using multiplexers.
The control unit is a complex state machine that controls the internal operation of the processor. The primary
operation performed by the processor is the execution of sequences of instructions stored in main memory. The CPU
(processor) fetches (reads) an instruction from memory, decodes the instruction to determine what operations are
required, and then executes the instruction. The control unit controls this sequence of operations in the processor.

Computer Programs and Instructions
A computer program is a sequence of instructions that performs a desired operation. Instructions are stored in
memory. For the computer designed in this lab, an instruction consists of 16 bits. The high eight bits of the
instruction contain the operation code, or "opcode", which specifies the operation (such as add or subtract) that the
instruction will perform. Typically, an instruction sends one set of data values through the ALU to perform this
operation. The low eight bits of each instruction contain a memory address field. Depending on the opcode, this
address may point to a data location or to the location of another instruction.
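The Python sketch below illustrates this 16-bit format. It packs an 8-bit opcode and an 8-bit address into one instruction word, and it splits a word back into its two fields. It is only a behavioral model and is not part of the provided Verilog. The function names `encode` and `decode` are hypothetical.

```python
def encode(opcode: int, address: int) -> int:
    """Pack an 8-bit opcode (high byte) and an 8-bit address (low byte) into a 16-bit word."""
    if not (0 <= opcode <= 0xFF and 0 <= address <= 0xFF):
        raise ValueError("opcode and address must each fit in 8 bits")
    return (opcode << 8) | address


def decode(word: int) -> tuple[int, int]:
    """Split a 16-bit instruction word into (opcode, address)."""
    return (word >> 8) & 0xFF, word & 0xFF


print(f"{encode(0x02, 0x11):04X}")                   # 0211: LOAD from address 11
opcode, address = decode(0x0012)
print(f"opcode={opcode:02X} address={address:02X}")  # opcode=00 address=12: ADD from address 12
```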

Figure 2: Tiny Computer instruction format

Instruction Mnemonic   Operation Performed                            Opcode Value (hex)
ADD address            AC <= AC + contents of memory address          00
STORE address          contents of memory address <= AC               01
LOAD address           AC <= contents of memory address               02
JUMP address           PC <= address                                  03

Figure 3: Computer Instructions

Example Computer Program for A = B + C:

Assembly Language   Machine Language
LOAD B              0211
ADD C               0012
STORE A             0110
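The translation from assembly language to machine language can be sketched in a few lines of Python. The symbol table places A, B, and C at addresses 10, 11, and 12 (hex), which matches the machine code above. The `assemble` helper is hypothetical and handles only the four instructions of Figure 3.

```python
OPCODES = {"ADD": 0x00, "STORE": 0x01, "LOAD": 0x02, "JUMP": 0x03}

# Data addresses implied by the machine code above (hex).
SYMBOLS = {"A": 0x10, "B": 0x11, "C": 0x12}


def assemble(lines):
    """Translate 'MNEMONIC OPERAND' lines into 16-bit machine words."""
    words = []
    for line in lines:
        mnemonic, operand = line.split()
        # Operands are either symbolic names or hex addresses such as '1F'.
        address = SYMBOLS[operand] if operand in SYMBOLS else int(operand, 16)
        words.append((OPCODES[mnemonic] << 8) | address)
    return words


program = assemble(["LOAD B", "ADD C", "STORE A"])
print(" ".join(f"{word:04X}" for word in program))  # 0211 0012 0110
```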

More details on Control Path and Data Path


Control Flow and Path
A simple state machine called the control unit controls the sequence of operations in the processor (Figure 4). The
CPU contains a general-purpose data register called the accumulator (AC), as well as the program counter (PC). The
arithmetic logic unit (ALU) is used for arithmetic and logical operations.

Figure-4

The processor reads or fetches an instruction from memory, decodes the instruction to determine what operations
are required, and then executes the instruction as shown in Figure-4.

Implementing the fetch, decode, and execute cycle requires several register transfer operations and clock cycles, as
described below:
1. The program counter (PC) contains the address of the current instruction.
2. To fetch the next instruction from memory, the processor must increment the PC.
3. The processor must then send the address value in the PC to memory over the bus by loading the memory address
register (MAR), and start a memory read operation on the bus.
4. After a small delay, the instruction data will appear on the memory data bus lines, and it will be latched into
the memory data register (MDR).
5. Execution of the instruction may require an additional memory cycle, so the instruction is normally saved in the
CPU's instruction register (IR).
6. Using the value in the IR, the instruction can now be decoded.
7. Execution of the instruction will require additional operations in the CPU and perhaps additional memory
operations.
8. The accumulator (AC) is the primary register used to perform data calculations and to hold temporary program data
in the processor.
9. After completing execution of the instruction, the processor begins the cycle again by fetching the next
instruction.

More Detailed View


The fetch, decode, and execute cycle can be implemented in this computer using the sequence of register transfer
operations shown in Figure 5.
The next instruction is fetched from memory with the following register transfer operations:
MAR = PC
Read Memory
MDR = Instruction value from memory
IR = MDR
PC = PC + 1
After this sequence of operations, the current instruction is in the instruction register (IR). This instruction is one of
several possible machine instructions such as ADD, LOAD, or STORE. The opcode field is tested to decode the
specific machine instruction. The address field of the instruction register contains the address of possible data
operands. Using the address field, a memory read is started in the decode state.
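The fetch operations IR = MDR and PC = PC + 1 take effect on the same clock edge, so each right-hand side must see the register values from before that edge. The hypothetical `clock_edge` helper below models this rule in Python, and it behaves like Verilog non-blocking assignments. The starting register values assume the ADD instruction at location 01 has just been read into the MDR. The 8-bit PC width is an assumption based on the 8-bit address field.

```python
def clock_edge(regs, transfers):
    """Apply a set of register transfers on one clock edge.

    Every right-hand side is evaluated against the register values from
    before the edge, and only then are the registers updated.
    """
    new_values = {name: rhs(regs) for name, rhs in transfers.items()}
    updated = dict(regs)
    updated.update(new_values)
    return updated


# Assumed state just before FETCH of the ADD instruction at location 01:
# the MAR holds the PC and memory has returned the word 0012 into the MDR.
regs = {"pc": 0x01, "mar": 0x01, "mdr": 0x0012, "ir": 0x0211, "ac": 0x0004}

# FETCH: IR = MDR and PC = PC + 1, both on the same clock edge.
regs = clock_edge(regs, {
    "ir": lambda r: r["mdr"],
    "pc": lambda r: (r["pc"] + 1) & 0xFF,   # 8-bit PC (assumed width)
})
print(f"IR={regs['ir']:04X} PC={regs['pc']:02X}")  # IR=0012 PC=02

# Because right-hand sides use old values, two registers can even swap in one edge.
swapped = clock_edge({"x": 1, "y": 2}, {"x": lambda r: r["y"], "y": lambda r: r["x"]})
print(swapped)  # {'x': 2, 'y': 1}
```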

Figure 5 : Detailed View of Fetch, Decode, and Execute for the Tiny Computer Design

The decode state transfers control to one of several possible next states based on the opcode value. Each
instruction requires a short sequence of register transfer operations to implement, or execute, that instruction;
only a few of the instruction execute states are shown in Figure 5. When execution of the current instruction is
completed, the cycle repeats by starting a memory read operation and returning to the fetch state. A small finite
state machine (FSM), called the control unit, controls these internal processor states and control signals.
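Below is a sketch of the control unit's next-state logic, written in Python for readability rather than Verilog. The state names are hypothetical, and only the four Figure 3 opcodes are decoded. As a simplification, every instruction gets a single execute state. The provided Verilog may use additional execute states, for example for a memory write.

```python
# Opcode -> execute state for the base instruction set of Figure 3 (names are hypothetical).
EXECUTE_STATE = {
    0x00: "EXECUTE_ADD",
    0x01: "EXECUTE_STORE",
    0x02: "EXECUTE_LOAD",
    0x03: "EXECUTE_JUMP",
}


def next_state(state: str, ir: int) -> str:
    """Return the control unit's next state, given the current state and the IR."""
    if state == "RESET":
        return "FETCH"                  # reset starts the read of the first instruction
    if state == "FETCH":
        return "DECODE"
    if state == "DECODE":
        opcode = (ir >> 8) & 0xFF       # the high eight bits of the IR select the execute state
        if opcode not in EXECUTE_STATE:
            raise ValueError(f"opcode {opcode:02X} is not decoded in this sketch")
        return EXECUTE_STATE[opcode]
    if state in EXECUTE_STATE.values():
        return "FETCH"                  # simplification: one execute state per instruction
    raise ValueError(f"unknown state {state}")


state = "RESET"
for _ in range(4):
    print(state)                        # RESET, FETCH, DECODE, EXECUTE_ADD
    state = next_state(state, ir=0x0012)
```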

Datapath
Figure 6a shows the datapath used for the implementation of the Tiny Computer.
1. A computer’s datapath consists of the registers, memory interface, ALUs, and the bus structures used to
connect them.
2. The vertical lines are the three major busses used to connect the registers.
3. On the bus lines in the datapath, a “/” with a number indicates the number of bits on the bus.
4. Data values present on the active busses are shown in hexadecimal.
5. MW is the memory write control line.
6. A reset must be used to force the processor into a known state after power is applied.
7. The initial contents of registers and memory produced by a reset can also be seen in Figure 6a.
8. Since the PC and MAR are reset to 00, program execution will start at 00.

Note that memory contains the machine code for the example program presented earlier (in the section Computer
Programs and Instructions). Recall that the program consists of a LOAD, an ADD, and a STORE instruction starting at
address 00. Data values for this example program are stored in memory locations 10, 11, and 12.
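The instruction-level Python sketch below loads this example program at address 00, applies a reset, and runs three instructions. The data values B = 0004 and C = 0003 come from the ADD walk-through in the Example and explanation section. The following are assumptions for illustration: the all-zero initial memory, the zeroed AC, and the helper names `reset` and `run`.

```python
def reset():
    """Register values after reset: PC = 00, so execution starts at location 00."""
    return {"pc": 0x00, "ac": 0x0000}   # AC cleared too (assumed)


def run(mem, regs, n_instructions):
    """Execute n_instructions at instruction level (no per-state timing)."""
    for _ in range(n_instructions):
        word = mem[regs["pc"]]
        opcode, address = word >> 8, word & 0xFF
        regs["pc"] = (regs["pc"] + 1) & 0xFF
        if opcode == 0x00:          # ADD
            regs["ac"] = (regs["ac"] + mem[address]) & 0xFFFF
        elif opcode == 0x01:        # STORE
            mem[address] = regs["ac"]
        elif opcode == 0x02:        # LOAD
            regs["ac"] = mem[address]
        elif opcode == 0x03:        # JUMP
            regs["pc"] = address
        else:
            raise ValueError(f"opcode {opcode:02X} not handled in this sketch")
    return regs


mem = [0x0000] * 256                            # assumed: all other locations zero
mem[0x00:0x03] = [0x0211, 0x0012, 0x0110]       # LOAD B, ADD C, STORE A
mem[0x11] = 0x0004                              # B
mem[0x12] = 0x0003                              # C
regs = run(mem, reset(), n_instructions=3)
print(f"A = mem[10] = {mem[0x10]:04X}")         # A = mem[10] = 0007
```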

Figure 6: (a) Datapath used for the Tiny Computer design after applying reset; (b) register transfers in the ADD
instruction's Fetch state; (c) register transfers in the ADD instruction's Decode state; (d) register transfers in the
ADD instruction's Execute state.

Example and explanation


Consider the execution of the ADD machine instruction (0012) stored at program location 01 in detail. The
instruction, ADD address, adds the contents of the memory location at address 12 to the contents of AC and stores
the result in AC.
The following sequence of register transfer operations will be required to fetch, decode and execute this
instruction.
1. FETCH
   Register transfers: MAR = PC (prior to fetch), read memory, IR = MDR, PC = PC + 1
   First, the memory address register is loaded with the PC. In the example program, the ADD instruction (0012) is at
   location 01 in memory, so the PC and MAR will both contain 01. In this implementation of the computer, the
   MAR = PC operation is moved to the end of the fetch, decode, and execute loop (into the execute state) in order to
   save a clock cycle. To fetch the instruction, a memory read operation is started. After a small delay for the
   memory access time, the ADD instruction is available at the input of the instruction register. To set up for the
   next instruction fetch, one is added to the program counter. The last two operations occur in parallel during one
   clock cycle using two different data busses.

2. DECODE
   Register transfers: decode opcode to find next state, MAR = IR, and start memory read
   At the rising edge of the clock signal, the decode state is entered. Using the new value in the IR, the CPU control
   hardware decodes the instruction's opcode of 00 and determines that this is an ADD instruction. Therefore, the next
   state in the following clock cycle will be the execute state for the ADD instruction. Instructions are typically
   decoded in hardware using combinational circuits such as decoders or a small ROM. A memory read cycle is always
   started in decode, since the instruction may require a memory data operand in the execute state.
   The ADD instruction requires a data operand from memory address 12. In Figure 6c, the low 8-bit address field of
   the instruction in the IR is transferred to the MAR. At the next clock, after a small delay for the memory access
   time, the ADD instruction's data operand value from memory (0003) will be available in the MDR.

3. EXECUTE ADD
   Register transfers: AC = AC + MDR, MAR = PC (moved here from FETCH), and GOTO FETCH
   The two values can now be added. The ALU operation input is set for addition by the control unit. As shown in
   Figure 6d, the MDR's value of 0003 is fed into one input of the ALU. The contents of register AC (0004) are fed into
   the other ALU input. After a small delay for the addition circuitry, the sum of 0007 is produced by the ALU and will
   be loaded into the AC at the next clock. To provide the address for the next instruction fetch, the MAR is loaded
   with the current value of the PC (02). Note that by moving the operation MAR = PC to every instruction's final
   execute state, the fetch state can execute in one clock cycle. The ADD instruction is now complete, and the
   processor starts to fetch the next instruction at the next clock cycle. Since three states were required, an ADD
   instruction requires three clock cycles to complete.
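The three-state sequence for ADD can be reproduced with a small cycle-level Python model. It is a sketch under simplifying assumptions:

- MDR is modeled as the memory word at the current MAR, with the read completing within one cycle.
- Only the FETCH, DECODE, and EXECUTE_ADD states are implemented.
- The state and function names are hypothetical.

Python tuple assignments evaluate all right-hand sides first, which matches the parallel register transfers described above.

```python
MASK16 = 0xFFFF


def clock(state, r, mem):
    """Advance one clock edge, updating registers r in place; return the next state."""
    mdr = mem[r["mar"]]                          # memory word at the current MAR
    if state == "FETCH":
        r["ir"], r["pc"] = mdr, (r["pc"] + 1) & 0xFF
        return "DECODE"
    if state == "DECODE":
        r["mar"] = r["ir"] & 0xFF                # MAR = IR address field; starts the operand read
        if r["ir"] >> 8 != 0x00:
            raise ValueError("only ADD is modeled in this sketch")
        return "EXECUTE_ADD"
    if state == "EXECUTE_ADD":
        r["ac"], r["mar"] = (r["ac"] + mdr) & MASK16, r["pc"]   # AC = AC + MDR, MAR = PC
        return "FETCH"
    raise ValueError(f"state {state} not modeled in this sketch")


mem = [0x0000] * 256
mem[0x01], mem[0x12] = 0x0012, 0x0003                          # ADD C at 01, operand C at 12
regs = {"pc": 0x01, "mar": 0x01, "ir": 0x0211, "ac": 0x0004}   # state after LOAD B

trace = []
state = "FETCH"
while True:
    trace.append(state)
    state = clock(state, regs, mem)
    print(f"{trace[-1]:<12} PC={regs['pc']:02X} MAR={regs['mar']:02X} "
          f"IR={regs['ir']:04X} AC={regs['ac']:04X}")
    if state == "FETCH":
        break
print(len(trace), "clock cycles")   # 3 clock cycles
```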

Verilog Code of TC140L (Tiny Computer 140Lab):


To demonstrate the operation of the tiny computer, a Verilog model of it is provided (refer to the zip files).
1. The computer's RAM is implemented using the altsyncram megafunction, which uses the FPGA's internal memory blocks.
2. The remainder of the computer model is basically a Verilog based state machine that implements the fetch,
decode, and execute cycle.
3. The first few lines declare internal registers for the processor along with the states needed for the fetch,
decode and execute cycle.
4. A long CASE statement is used to implement the control unit state machine. A reset state is needed to
initialize the processor.
5. In the reset state, several of the registers are reset to zero and a memory read of the first instruction is
started.
6. This forces the processor to start executing instructions at location 00 in a predictable state after a reset.
7. A second case statement at the end of the code makes assignments to the memory address register based on
the current state.

Lab Questions:
This lab may seem tough at first, since you are essentially designing a very simple computer, and many of the
concepts here are new. However, we will provide many of the modules and try to guide you as much as possible. In this
lab, we structure things for you so that you do not have to come up with anything on your own (you will get to do
that in 141L). Instead, you will learn how a basic computer works, how it relates to what you learned in CSE 30
(remember, these instructions are essentially assembly commands), and how the data path and control path work. This
lab is essentially the culmination of everything you have learned in CSE 140 and CSE 140L up until now.

You will be designing logic in Verilog HDL. You will implement both the data path (the components that handle data
manipulation and storage) and the control path (the components that determine how to process the current
instruction and control the modules in the data path).

PART 1:
A full computer implementation for the ADD instruction is given, and you should extend this code to the rest of the
instructions. Most of the modules and the structure of the code are provided. You need to extend the given code and
implement the following:

1. Instruction Fetch Stage - instruction_fetch.v
2. Instruction Decoder Stage - instruction_decoder.v
3. Control/Execute FSM (sequential) - tc140l.v
4. ALU (combinational) for the instructions in the following table (shown in red in the original handout) - tc140l.v

Instruction Mnemonic   Operation Performed                            Opcode Value (hex)
ADD address            AC <= AC + contents of memory address          00
STORE address          contents of memory address <= AC               01
LOAD address           AC <= contents of memory address               02
JUMP address           PC <= address                                  03
JNEG address           if AC < 0 then PC <= address                   04
SUB address            AC <= AC - MDR                                 05
XOR address            AC <= AC XOR MDR                               06
OR address             AC <= AC OR MDR                                07
AND address            AC <= AC AND MDR                               08
JPOS address           if AC > 0 then PC <= address                   09
JZERO address          if AC = 0 then PC <= address                   0A
ADDI data              AC <= AC + data                                0B
OUT xxxx               7-segment LEDs display hex value of AC         0C
SHL data               AC <= AC shifted left by data bits             0D
SHR data               AC <= AC shifted right by data bits            0E
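The three conditional jumps compare AC against zero. To do that, the design has to decide how a 16-bit AC represents negative numbers. The Python sketch below assumes 16-bit two's complement, with bit 15 as the sign bit; this handout does not state that explicitly. The helper names `branch_taken` and `next_pc` are hypothetical.

```python
def to_signed16(value: int) -> int:
    """Interpret a 16-bit pattern as two's complement (assumption: bit 15 is the sign)."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def branch_taken(opcode: int, ac: int) -> bool:
    """Decide whether a jump instruction loads PC with its address field."""
    signed_ac = to_signed16(ac)
    if opcode == 0x03:      # JUMP: unconditional
        return True
    if opcode == 0x04:      # JNEG
        return signed_ac < 0
    if opcode == 0x09:      # JPOS
        return signed_ac > 0
    if opcode == 0x0A:      # JZERO
        return signed_ac == 0
    return False            # not a jump: fall through to the next instruction


def next_pc(opcode: int, address: int, ac: int, pc: int) -> int:
    """pc is the already-incremented program counter from the fetch state."""
    return address if branch_taken(opcode, ac) else pc


print(branch_taken(0x04, 0xFFFF))   # True  (FFFF is -1)
print(branch_taken(0x09, 0x8000))   # False (8000 is -32768)
```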

In the logical XOR instruction, each bit of AC is exclusive-ORed with the corresponding bit of the operand, for a
total of sixteen independent exclusive-OR operations. This is called a bitwise logical operation. OR and AND are also
bitwise logical operations.
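The ALU for the non-jump arithmetic and logical instructions can be sketched as one function that returns the new AC value. This Python model assumes that results wrap to 16 bits and that the ADDI data field is zero-extended. The handout specifies neither, and the name `alu` is hypothetical. Shifts are left out here.

```python
MASK16 = 0xFFFF


def alu(opcode: int, ac: int, operand: int) -> int:
    """Return the new AC value.

    operand is the MDR for memory-operand instructions, or the 8-bit data
    field of the instruction for ADDI.
    """
    if opcode == 0x00:      # ADD
        result = ac + operand
    elif opcode == 0x05:    # SUB
        result = ac - operand
    elif opcode == 0x06:    # XOR: sixteen independent bitwise operations
        result = ac ^ operand
    elif opcode == 0x07:    # OR
        result = ac | operand
    elif opcode == 0x08:    # AND
        result = ac & operand
    elif opcode == 0x0B:    # ADDI (data field assumed zero-extended)
        result = ac + (operand & 0xFF)
    else:
        raise ValueError(f"opcode {opcode:02X} is not an ALU operation in this sketch")
    return result & MASK16  # keep only 16 bits, like a 16-bit register


print(f"{alu(0x06, 0b1100, 0b1010):04b}")  # 0110
print(f"{alu(0x05, 0x0003, 0x0004):04X}")  # FFFF (3 - 4 wraps to -1 in 16 bits)
```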
For Shift instructions, only the low four bits of the address field contain the shift amount. The other four
bits are always zero.
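In Python, extracting the shift amount and performing the shifts might look like the sketch below. It assumes a 16-bit AC and a logical (zero-filling) right shift. The handout does not say whether SHR is logical or arithmetic, and `shift` is a hypothetical name.

```python
MASK16 = 0xFFFF


def shift(opcode: int, ac: int, address_field: int) -> int:
    """Execute SHL (0D) or SHR (0E) on a 16-bit AC."""
    amount = address_field & 0x0F       # only the low four bits hold the shift amount (0 to 15)
    if opcode == 0x0D:                  # SHL: bits shifted past bit 15 are discarded
        return (ac << amount) & MASK16
    if opcode == 0x0E:                  # SHR: assumed logical, so zeros enter from the left
        return (ac & MASK16) >> amount
    raise ValueError(f"opcode {opcode:02X} is not a shift")


print(f"{shift(0x0D, 0x0001, 0x04):04X}")  # 0010
print(f"{shift(0x0E, 0x8000, 0x0F):04X}")  # 0001
```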
For OUT: this instruction uses only the low eight bits of AC. Add a new register, register_output, to the input of
the seven-segment decoder that drives the LED display. The register is loaded with the value of AC only when an OUT
instruction is executed. (Can you find this already in the given code?)
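The behavior of register_output can be modeled as a register with a load enable that is asserted only while an OUT instruction executes. This Python sketch follows the handout's note that OUT uses only the low eight bits of AC. The class and method names are hypothetical. A real implementation would be a clocked register in the Verilog.

```python
class OutputRegister:
    """Model of register_output, which feeds the seven-segment decoder."""

    OUT_OPCODE = 0x0C

    def __init__(self) -> None:
        self.value = 0x00

    def clock(self, opcode: int, ac: int) -> None:
        """One clock edge: load from AC only when the executing opcode is OUT."""
        if opcode == self.OUT_OPCODE:
            self.value = ac & 0xFF    # only the low eight bits of AC are used


out = OutputRegister()
out.clock(0x00, 0x1234)           # ADD executing: display value is unchanged
out.clock(0x0C, 0x1234)           # OUT executing: load the low byte of AC
print(f"{out.value:02X}")         # 34
```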

- Extend the RESET implementation to restore and reset the full processor (this is not a big design change).
PART 2:
Find the maximum clock rate of the Tiny Computer. Examine the project's compiler report and find the percentage of
logic cells (LCs) utilized.
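When reading the timing report, recall what sets the maximum clock rate of a synchronous design like this one: the clock period must cover the slowest register-to-register path. A standard first-order estimate, ignoring clock skew, is:

```latex
T_{\min} = t_{\text{clk-to-q}} + t_{\text{comb,max}} + t_{\text{setup}},
\qquad
f_{\max} = \frac{1}{T_{\min}}
```

Here $t_{\text{comb,max}}$ is the delay of the slowest combinational path between two registers. One plausible candidate is the path from the AC and MDR through the ALU back into the AC, but the compiler report identifies the actual limiting path. For example, a 10 ns minimum period corresponds to $f_{\max} = 100$ MHz.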
PART 3:
Tiny is Old: The TC140L's approach of using multiple clock cycles per instruction was used in early-generation
microprocessors. These computers had limited hardware, since the VLSI technology at that time supported orders of
magnitude fewer gates on a chip than is possible in current devices.

Current-generation processors, such as those used in personal computers, have a hundred or more instructions and use
additional means to speed up program execution. Instruction formats are more complex, with up to 32 data registers
and additional instruction bits used for longer address fields and more powerful addressing modes.

Google and explain, in no more than a paragraph each, what the following are:
1. Pipelined processor
2. Superscalar processor

How could you extend the Tiny Computer design to modernize it by incorporating the above features? (Answer in two or
three points.)

DEMONSTRATION

You will need to demonstrate your project to a lab TA; check the office hours for when we are available. You can
bring your laptop with your FPGA board and program the board between demonstrations. For those of you without a
laptop, we have set up machines in the lab with Quartus II for your use, so you can bring your saved files on a flash
drive or by email and reprogram the board on the lab machines. If all looks good, we will give you a thumbs up. If
something doesn't work correctly, you can try again until the deadline.

*Please download the demo/test program which will be used in your demonstration.

Note: currently, a Fibonacci test program is provided in the zip file. More test programs will be posted on the
webpage.

INPUT: Instructions can be read from files into the ROM. The instruction ROM will then contain the program
instructions. Each individual instruction will be stored in order in the ROM using addresses. So, the first instruction
will be stored at address 0, the second at address 1, and so on.

The clock rate can be varied by hacking clock_divider.v, e.g.:

parameter DIV_CONST = 10000000; // for 1Hz clock
// parameter DIV_CONST = 4000000; // another example value; keep only one definition

Feel free to hack this constant to slow down the clock or speed it up. With a 1 Hz clock, the control state machine
advances one state every second. The program counter increments by one during each instruction fetch, so for
straight-line code it goes 0, 1, 2, 3, 4, 5, 6, 7, 8, ... The program counter (through the MAR) supplies the address
to the instruction memory (RAM), which outputs the corresponding instruction on the next rising clock edge. For
example, if the program counter is 3, the instruction RAM will output whatever instruction is stored at address 3.
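To estimate the effect of a given DIV_CONST, the Python sketch below scales from the 1 Hz reference in the clock_divider.v comment. It makes two assumptions:

- The divided clock frequency is inversely proportional to DIV_CONST. This is typical of a counter-based divider, but check clock_divider.v.
- For the instruction rate, an instruction takes three clock cycles, as the ADD walk-through shows. Other instructions may take more or fewer.

```python
# Reference point from the clock_divider.v comment: DIV_CONST = 10000000 gives about 1 Hz.
REFERENCE_DIV_CONST = 10_000_000
REFERENCE_FREQ_HZ = 1.0


def divided_clock_hz(div_const: int) -> float:
    """Estimate the processor clock, assuming f_out is inversely proportional to DIV_CONST."""
    return REFERENCE_FREQ_HZ * REFERENCE_DIV_CONST / div_const


def instructions_per_second(div_const: int, clocks_per_instruction: int = 3) -> float:
    """Instruction rate, assuming a fixed number of clocks per instruction (3 for ADD)."""
    return divided_clock_hz(div_const) / clocks_per_instruction


print(divided_clock_hz(4_000_000))                   # 2.5
print(round(instructions_per_second(4_000_000), 3))  # 0.833
```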
Buttons on Board:
Push/Toggle buttons:
KEY[0]: reset
SW[3]: Input clock ticks (used to debug)
SW[4]: Selects between the internal clock and manual ticks from KEY[3] (pushbutton 3)

OUTPUT: Four hexadecimal digits are shown on the four LED displays.


Buttons on Board:
Toggle Keys:
SW[2:0]: Selects between the following displays
3'b000: Accumulator
3'b001: PC
3'b010: MDR (Memory Data Register)
3'b011: IR
3'b100: OUT
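The choice of what appears on the four digits is a 5-input multiplexer driven by SW[2:0]. The Python sketch below models it. The handout does not say what the unused codes 101 to 111 display, so this sketch raises an error for them. The names are hypothetical.

```python
# SW[2:0] code -> register shown on the four hex digits (from the table above).
DISPLAY_SOURCES = {
    0b000: "ac",
    0b001: "pc",
    0b010: "mdr",
    0b011: "ir",
    0b100: "out",
}


def display_digits(sw: int, regs: dict) -> str:
    """Return the four hexadecimal digits shown for the given switch setting."""
    select = sw & 0b111                 # only SW[2:0] take part in the selection
    if select not in DISPLAY_SOURCES:
        raise ValueError(f"SW[2:0] = {select:03b} is not assigned in this sketch")
    return f"{regs[DISPLAY_SOURCES[select]] & 0xFFFF:04X}"


regs = {"ac": 0x0007, "pc": 0x02, "mdr": 0x0003, "ir": 0x0012, "out": 0x34}
print(display_digits(0b000, regs))  # 0007
print(display_digits(0b011, regs))  # 0012
```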
REPORT INSTRUCTIONS

Your lab report should include:

1. A title page, which contains the names and PIDs of the students, the due date, the title of the lab, and a brief
description of each person's contribution.
2. An architecture diagram of the top-level design; the ALU (datapath) and registers must be connected using a bus.
3. The Verilog HDL code of the design.
4. For the test program, a table of the instructions inside the ROM, including the index of each instruction, the
instruction mnemonic, the 8-bit opcode, and the 8-bit address/data field.
5. A description of the minimal clock cycle and the limiting path.
6. What is the control path? Why is it called that? What is the data path? Why is it called that? How does the
control path relate to the data path?
7. Describe step by step what happens when the instruction is JUMP.
8. Let's say you wanted to add a MULT operation to your CPU. What would you have to do? Start from the instruction
set (what would you have to change in the instruction set to support a MULT operation?) and continue through the
other modules.
9. Describe step by step what happens when the instruction being processed is MULT address.

GRADING

50% of your grade will be from the demonstration, so this is an easy 50%. Another 10% will be in your architecture
design, and timing diagrams. Easy points assuming you were able to demonstrate this for us. 40% will be in your
answers to the questions.
