lab4_tinycpu
Tiny Computer
Gopi Tummala, Prof. CK Cheng, FALL 2010, UCSD
Objective
In this lab, we will build a tiny computer (description given separately) in Verilog. The execution results will be
displayed on the LED digits of your board. Unlike a real computer, our tiny computer supports only a few
instructions.
Computer details
System Overview
A traditional digital computer consists of three main units, the processor or central processing unit (CPU), the
memory that stores program instructions and data, and the input/output hardware that communicates to other
devices. As seen in Figure 1, these units are connected by a collection of parallel digital signals called a bus.
Typically, signals on the bus include the memory address, memory data, and bus status. Bus status signals indicate
the current bus operation: memory read, memory write, or input/output.
Internally, the CPU contains a small number of registers that are used to store data inside the processor. Registers
such as PC, IR, AC, MAR and MDR are built using D flip-flops for data storage. One or more arithmetic logic units
(ALUs) are also contained inside the CPU. The ALU is used to perform arithmetic and logical operations on data
values. Common ALU operations include add, subtract, multiply, and logical AND/OR operations. Register-to-bus
connections are hard wired for simple point-to-point connections. When one of several registers can drive the
bus, the connections are constructed using multiplexers. The control unit is a complex state machine that controls
the internal operation of the processor. The primary operation performed by the processor is the execution of
sequences of instructions stored in main memory. The CPU (processor) fetches (reads) an instruction from
memory, decodes the instruction to determine what operations are required, and then executes the instruction.
The control unit controls this sequence of operations in the processor.
Computer Programs and Instructions
A computer program is a sequence of instructions that perform a desired operation. Instructions are stored in
memory. For the computer design in this lab, an instruction consists of 16 bits. The high eight bits of the
instruction contain the opcode. The instruction operation code or "opcode" specifies the operation, such as add or
subtract, that will be performed by the instruction. Typically, an instruction sends one set of data values through
the ALU to perform this operation. The low eight bits of each instruction contain a memory address field.
Depending on the opcode, this address may point to a data location or the location of another instruction.
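As a minimal sketch of the instruction format described above, the following Verilog module splits a 16-bit
instruction word into its two 8-bit fields. The module and port names (instr_fields, instr, opcode, address) are
illustrative assumptions and are not taken from the provided lab code.

```verilog
// Sketch only: splits a 16-bit instruction word into its two 8-bit fields.
// Module and port names are hypothetical, not from the provided lab code.
module instr_fields (
    input  wire [15:0] instr,    // instruction word as read from memory
    output wire [7:0]  opcode,   // high eight bits: operation to perform
    output wire [7:0]  address   // low eight bits: data or instruction address
);
    assign opcode  = instr[15:8];
    assign address = instr[7:0];
endmodule
```

For a made-up instruction word 16'hAB12, this yields opcode = 8'hAB and address = 8'h12.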
Figure-4
The processor reads or fetches an instruction from memory, decodes the instruction to determine what operations
are required, and then executes the instruction as shown in Figure-4.
Implementation of the fetch, decode, and execute cycle requires several register transfer operations and clock
cycles as given below:
1. The program counter contains the address of the current instruction.
2. To fetch the next instruction from memory the processor must increment the program counter (PC).
3. The processor must then send the address value in the PC to memory over the bus by loading the memory
address register (MAR) and start a memory read operation on the bus.
4. After a small delay, the instruction data will appear on the memory data bus lines, and it will be latched into
the memory data register (MDR).
5. Execution of the instruction may require an additional memory cycle so the instruction is normally saved in the
CPU's instruction register (IR).
6. Using the value in the IR, the instruction can now be decoded.
7. Execution of the instruction will require additional operations in the CPU and perhaps additional memory
operations.
8. The Accumulator (AC) is the primary register used to perform data calculations and to hold temporary
program data in the processor.
9. After completing execution of the instruction the processor begins the cycle again by fetching the next
instruction.
Figure 5 : Detailed View of Fetch, Decode, and Execute for the Tiny Computer Design
The ‘decode’ state transfers control to one of several possible next states based on the opcode value. Each
instruction requires a short sequence of register transfer operations to implement or execute that instruction.
These register transfer operations are then performed to execute the instruction. Only a few of the instruction
execute states are shown in Figure 5. When execution of the current instruction is completed, the cycle repeats by
starting a memory read operation and returning to the fetch state. A small state machine (FSM) called a control
unit is used to control these internal processor states and control signals.
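The fetch, decode, and execute sequence described above could be sketched as a small state machine like the one
below. This is not the control unit supplied with the lab. The state names, the synchronous active-high reset, the
placeholder ADD opcode value (8'h00), and the choice to return to fetch on an unrecognized opcode are all
assumptions made for illustration.

```verilog
// Sketch of a fetch/decode/execute control FSM with a single execute state.
// All names and encodings here are hypothetical.
module control_fsm #(
    parameter [7:0] OP_ADD = 8'h00      // placeholder opcode encoding (assumption)
) (
    input  wire       clk,
    input  wire       reset,            // synchronous, active high (assumption)
    input  wire [7:0] opcode,           // high byte of the instruction register
    output reg  [1:0] state
);
    localparam [1:0] S_FETCH    = 2'd0,
                     S_DECODE   = 2'd1,
                     S_EXEC_ADD = 2'd2;

    always @(posedge clk) begin
        if (reset)
            state <= S_FETCH;
        else begin
            case (state)
                S_FETCH:    state <= S_DECODE;
                // Decode branches on the opcode; unknown opcodes are
                // skipped by returning to fetch (assumption).
                S_DECODE:   state <= (opcode == OP_ADD) ? S_EXEC_ADD : S_FETCH;
                S_EXEC_ADD: state <= S_FETCH;
                default:    state <= S_FETCH;
            endcase
        end
    end
endmodule
```

Supporting another instruction in this style would mean adding one more execute state and one more branch in the
decode case.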
Datapath
Figure 6a is the datapath used for the implementation of the Tiny Computer.
1. A computer’s datapath consists of the registers, memory interface, ALUs, and the bus structures used to
connect them.
2. The vertical lines are the three major busses used to connect the registers.
3. On the bus lines in the datapath, a “/” with a number indicates the number of bits on the bus.
4. Data values present on the active busses are shown in hexadecimal.
5. MW is the memory write control line.
6. A reset must be used to force the processor into a known state after power is applied.
7. The initial contents of registers and memory produced by a reset can also be seen in Figure 6a.
8. Since the PC and MAR are reset to 00, program execution will start at 00.
Note that memory contains the machine code for the example program presented earlier (in section Computer
Programs and Instructions). Recall that the program consists of a LOAD, ADD, and STORE instruction starting at
address 00. Data values for this example program are stored in memory locations 10, 11, and 12.
Figure 6:
a: Datapath used for the Tiny Computer Design after applying Reset
b: Register transfers in the ADD instruction’s Fetch State
c: Register transfers in the ADD instruction’s Decode State
d: Register transfers in the ADD instruction’s Execute State.
The ADD instruction requires a data operand from memory address 12. In Figure 6c, the low 8-bit address field of
the instruction in the IR is transferred to the MAR. At the next clock, after a small delay for the memory access
time, the ADD instruction’s data operand value from memory (0003) will be available in the MDR.
3. EXECUTE ADD: AC = AC + MDR, MAR = PC*, and GOTO FETCH
The two values can now be added. The ALU operation input is set for addition by the control unit. As shown in
Figure 6d, the MDR’s value of 0003 is fed into one input of the ALU. The contents of register AC (0004) are fed
into the other ALU input. After a small delay for the addition circuitry, the sum of 0007 is produced by the ALU
and will be loaded into the AC at the next clock. To provide the address for the next instruction fetch, the MAR is
loaded with the current value of the PC (02). Note that by moving the operation, MAR=PC, to every instruction’s
final execute state, the fetch state can execute in one clock cycle. The ADD instruction is now complete and the
processor starts to fetch the next instruction at the next clock cycle. Since three states were required, an ADD
instruction will require three clock cycles to complete the operation.
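The three-clock ADD sequence can be traced in simulation with a sketch like the one below. It is not the provided
lab implementation. Several things are assumptions made for illustration: the ADD opcode value (8'h00), the fetch
transfers (IR = MDR, PC = PC + 1), modeling the MDR as a combinational read of memory at the MAR address, skipping
unknown opcodes, and using a first "ADD 11" (with 0004 placed at address 11) to stand in for the LOAD that puts
0004 into the AC.

```verilog
// Simulation-oriented sketch of an ADD-only tiny computer (hypothetical).
module tiny_add_sketch (
    input  wire        clk,
    input  wire        reset,          // synchronous, active high (assumption)
    output reg  [15:0] ac,
    output reg  [7:0]  pc
);
    localparam [1:0] S_FETCH = 2'd0, S_DECODE = 2'd1, S_EXEC_ADD = 2'd2;
    localparam [7:0] OP_ADD  = 8'h00;  // placeholder encoding (assumption)

    reg  [1:0]  state;
    reg  [15:0] ir;
    reg  [7:0]  mar;
    reg  [15:0] mem [0:255];
    wire [15:0] mdr = mem[mar];        // memory data at the current MAR address

    integer i;
    initial begin
        for (i = 0; i < 256; i = i + 1) mem[i] = 16'h0000;
        mem[8'h00] = {OP_ADD, 8'h11};  // stands in for the LOAD: AC = 0 + 0004
        mem[8'h01] = {OP_ADD, 8'h12};  // the ADD traced in the text
        mem[8'h11] = 16'h0004;         // assumed data value
        mem[8'h12] = 16'h0003;         // operand value given in the text
    end

    always @(posedge clk) begin
        if (reset) begin
            state <= S_FETCH; pc <= 8'h00; mar <= 8'h00;
            ac <= 16'h0000;   ir <= 16'h0000;
        end else begin
            case (state)
                S_FETCH: begin             // IR = MDR, PC = PC + 1
                    ir <= mdr; pc <= pc + 8'd1; state <= S_DECODE;
                end
                S_DECODE: begin
                    if (ir[15:8] == OP_ADD) begin
                        mar <= ir[7:0];    // operand address to MAR
                        state <= S_EXEC_ADD;
                    end else begin
                        mar <= pc;         // skip unknown opcode (assumption)
                        state <= S_FETCH;
                    end
                end
                S_EXEC_ADD: begin          // AC = AC + MDR, MAR = PC
                    ac <= ac + mdr; mar <= pc; state <= S_FETCH;
                end
                default: state <= S_FETCH;
            endcase
        end
    end
endmodule
```

Under these assumptions, three clocks after reset is released the AC holds 0004 and the PC holds 01, and three
clocks later the AC holds 0007 and the PC holds 02, matching the values in the walkthrough.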
Lab Questions:
This lab will seem tough at first, as you are essentially designing a very simple tiny computer, and many of the
concepts here are new. However, we will provide you with many of the modules and try to guide you as much as
possible. In this lab, we will structure things for you so you do not have to come up with anything on your
own (you will get to do that in 141L). Instead, you will learn how a basic computer works, how things relate to
what you learned in CSE 30 (remember, these instructions are essentially assembly commands), and how the data path
and control path work. This lab is essentially the culmination of everything you have learned in CSE 140
and CSE 140L up until now.
You will be designing logic in Verilog HDL. You will implement both the data path (the components that
handle data manipulation and storage) and the control path (the components that determine how to process the
current instruction and control the modules in the data path).
PART 1:
A full computer implementation for the ADD instruction is given, and you should extend this code for the rest of
the instructions. Most of the modules and the structure of the code are provided. You need to extend the given code
and implement the following:
- In the logical XOR instruction, each bit is exclusive OR’ed with the corresponding bit in each operand, for a
total of sixteen independent exclusive OR operations. This is called a bitwise logical operation. OR and AND are
also bitwise logical operations.
- For Shift instructions, only the low four bits of the address field contain the shift amount. The other four
bits are always zero.
- For OUT: this instruction uses only the low eight bits of AC. Add a new register,
register_output, to the input of the seven-segment decoder that drives the LED display. The register is loaded with
the value of AC only when an OUT instruction is executed. (Can you already see this in the given code?)
- Extend the RESET implementation to restore and reset the full processor (not a big design change).
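The items above could be approached along the lines of the following sketch: a combinational ALU with bitwise and
shift operations, and an output register that captures the low byte of AC. This is only one possible shape, not the
provided lab code. The alu_op encoding, the port names, the load_out control signal, the logical (zero-filling)
right shift, and the reset value of the output register are all assumptions.

```verilog
// Sketch of an extended ALU. The alu_op encoding is hypothetical and would
// be driven by the control unit according to the decoded opcode.
module alu_sketch (
    input  wire [15:0] ac,
    input  wire [15:0] mdr,
    input  wire [7:0]  addr_field,   // low byte of the instruction register
    input  wire [2:0]  alu_op,
    output reg  [15:0] result
);
    localparam [2:0] ALU_ADD = 3'd0, ALU_XOR = 3'd1, ALU_OR  = 3'd2,
                     ALU_AND = 3'd3, ALU_SHL = 3'd4, ALU_SHR = 3'd5;

    wire [3:0] shamt = addr_field[3:0];  // only the low four bits are used

    always @(*) begin
        case (alu_op)
            ALU_ADD: result = ac + mdr;
            ALU_XOR: result = ac ^ mdr;      // sixteen independent bit XORs
            ALU_OR:  result = ac | mdr;
            ALU_AND: result = ac & mdr;
            ALU_SHL: result = ac << shamt;
            ALU_SHR: result = ac >> shamt;   // logical shift (assumption)
            default: result = ac;
        endcase
    end
endmodule

// Sketch of the OUT register: loaded only when load_out is asserted.
module out_register (
    input  wire        clk,
    input  wire        reset,        // synchronous, active high (assumption)
    input  wire        load_out,     // asserted in the OUT execute state (assumption)
    input  wire [15:0] ac,
    output reg  [7:0]  register_output
);
    always @(posedge clk) begin
        if (reset)
            register_output <= 8'h00;
        else if (load_out)
            register_output <= ac[7:0];      // low eight bits of AC only
    end
endmodule
```

Keeping the ALU purely combinational leaves the decision of when to load AC with the result to the control unit.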
PART 2:
Find the maximum clock rate of the Tiny computer. Examine the project’s compiler report and find the percentage of
logic cells (LCs) utilized.
PART 3:
Tiny is Old: The TC140L’s multiple clock cycles per instruction implementation approach was used in early
generation microprocessors. These computers had limited hardware, since the VLSI technology at that time
supported orders of magnitude fewer gates on a chip than is now possible in current devices.
Current generation processors, such as those used in personal computers, have a hundred or more instructions,
and use additional means to speed up program execution. Instruction formats are more complex with up to 32 data
registers and with additional instruction bits that are used for longer address fields and more powerful addressing
modes.
Google the following and explain, in no more than a paragraph each, what they are:
1. Pipelined processor
2. Superscalar processor
How could you extend the Tiny computer design to modernize it by incorporating the above features? (Write two or
three points.)
DEMONSTRATION
You will need to demonstrate your project to a lab TA. Please check the office hours to see when we are available.
You can bring your laptop with your FPGA board to us and program the FPGA board in between demonstrations.
For those of you without a laptop, we have set up machines in the lab that have Quartus II for your use, so you can
bring your saved files on a flash drive or by email and reprogram on the lab machines. If all looks good, we will
give you a thumbs up. If something doesn't work correctly, there is always a second chance until the deadline.
*Please download the demo/test program which will be used in your demonstration.
Note: Currently a Fibonacci test program is provided with the zip file. More test programs will be added to
the webpage.
INPUT: Instructions can be read from files into the ROM. The instruction ROM will then contain the program
instructions. Each individual instruction will be stored in order in the ROM using addresses. So, the first instruction
will be stored at address 0, the second at address 1, and so on.
Feel free to change this constant to slow down or speed up the clock. The program counter then increments by 1
every second, so it counts 0, 1, 2, 3, 4, 5, 6, 7, 8, ... The program counter feeds into the instruction
memory (RAM), supplying the address of the current instruction. The instruction RAM takes that address and outputs
the corresponding instruction on the next rising clock edge. For example, if the program counter is currently 3,
then the instruction RAM will output whatever instruction is stored at address 3.
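The constant mentioned above is not reproduced in this handout. Assuming it is the count used by a clock divider, a
divider of that kind might look like the sketch below. The module name, the HALF_PERIOD parameter, and the 50 MHz
input clock implied by its default value are assumptions, so check them against the provided code and your board.

```verilog
// Sketch of a clock divider. clk_out toggles every HALF_PERIOD input clocks,
// so its period is 2 * HALF_PERIOD input clock cycles.
module clock_divider #(
    parameter integer HALF_PERIOD = 25_000_000  // 1 Hz from a 50 MHz clock (assumption)
) (
    input  wire clk_in,
    input  wire reset,       // synchronous, active high (assumption)
    output reg  clk_out
);
    reg [31:0] count;

    always @(posedge clk_in) begin
        if (reset) begin
            count   <= 32'd0;
            clk_out <= 1'b0;
        end else if (count == HALF_PERIOD - 1) begin
            count   <= 32'd0;
            clk_out <= ~clk_out;
        end else begin
            count <= count + 32'd1;
        end
    end
endmodule
```

A smaller HALF_PERIOD speeds up the divided clock and a larger one slows it down.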
Buttons on Board:
Push/Toggle buttons:
KEY[0]: reset
SW[3]: Input clock ticks (used to debug)
SW[4]: Selects between internal clock and your Inputs to tick the processor from KEY[3] i.e. Pushbutton 3
1. Title page, which contains Names and PID of students, due date, title of lab, and brief description of each
person's contribution.
2. Architecture diagram of the top-level design. The ALU (datapath) and registers must be connected using buses.
3. Verilog HDL code of the design.
4. For the test program, show the instructions inside ROM in a table including the index of the instruction, the
instruction mnemonic, the 8-bit opcode, and the 8-bit address field.
5. Describe the minimum clock period and the limiting (critical) path.
6. What is the control path? Why is it called that? What is the data path? Why is it called that? How does the
control path relate to the data path?
7. Describe step by step what happens when the instruction is JUMP.
8. Let's say you wanted to add a MULT operation to your CPU. What would you have to do? Start from the
instruction set (what would you have to do to the instructions in order to support a MULT operation?) and
continue through the other modules.
9. Describe step by step what happens when the instruction that is being processed is MULT address.
GRADING
50% of your grade will be from the demonstration, so this is an easy 50%. Another 10% will be for your architecture
design and timing diagrams; these are easy points, assuming you were able to demonstrate the design for us. The
remaining 40% will be for your answers to the questions.