Dklahn Project Final Report
1. Abstract
2. Motivation
4. CPU (Israel)
System Overview
OpCode Decoding
Arithmetic and Logic Unit
Registers and Memory
Cycle State Machine
Support Modules
Testing and Simulation
6. Summary
Insights
Improvements/Next Steps
Appendices
Block Diagrams
Diagram 1 - Overall PPU
Diagram 2 - PPU Memory
Diagram 3 - PPU Pixel Rendering
Diagram 4 - PPU Video Output
Diagram 5 - main_6502
Figures
Figure 1 - PPU Pattern Tables
Figure 2 - Attribute Bytes
Figure 3 - PPU VRAM
Verilog
CPU Modules
cpu_constants.sv
alu_6502.sv
clock_gen6502.sv
control_decode.sv
cpu_demo.sv
implied_handler.sv
mem_write.sv
memory_controller.sv
reg_sel.sv
main_6502.sv
alu_tb.sv
clk_gen_tb.sv
main_6502_tb.sv
PPU Modules
Top.sv
PPU.sv
PPUCounter.sv
PPUStateCalculator.sv
OAM.sv
SecondaryOAM.sv
VRAM.sv
VRAMSpoofer.sv
Background.sv
Sprite_pixels.sv
Pixel_mux.sv
PaletteRAM.sv
FrameBuffer.sv
PPUColorMapper.sv
VGA.sv
PPUTypes.sv
Sources
1. Abstract
In a multitude of industries, companies often use legacy hardware. Legacy hardware is any
hardware that has been deprecated by its manufacturer but continues in widespread use. While new
hardware inevitably presents more capability, it will likely remain unproven until the end of the device’s
product cycle. Therefore, it is often in the best interest of the end-user to continue using legacy hardware.
When legacy systems eventually fail, replacements can be difficult to source as a result of rarity,
price, or both. Fortunately, the synthesis of deprecated hardware is feasible with the use of an FPGA. The
Nintendo Entertainment System (NES) Hardware Emulation project presents an opportunity to employ
that functionality.
The NES, manufactured by Nintendo from the mid-1980s to mid-1990s, helped repopularize
video games in North America. Twenty years later, the hardware is inevitably becoming scarcer, and will
only become scarcer with time. Emulators do already exist, but they often fail to capture the true
behavior of an NES, as they are adaptations of modern hardware to the older software. With the FPGA,
we worked to reproduce the NES hardware as faithfully as possible as an exercise in legacy hardware
emulation.
2. Motivation
The motivation behind this project was to take on something that sounded crazy but seemed
doable given the time we had and the skills that we had learned in class. Implementing one of the most
iconic game systems of all time in just a few weeks was definitely challenging, but it was a project that
we were all excited about. In addition to yielding an interesting end product, the project offered a good
learning opportunity: implementing a very basic processor and a very basic graphics unit gave us insight
into how modern systems work, into the history of how hardware has developed, and into how the
constraints of the time (primarily memory) dictated the design decisions that were made when creating
the NES.
Building an NES emulator requires two main components: the 2A03 CPU (a modified MOS 6502)
and the 2C02 PPU (best thought of as a primitive video card). Two buses connect them. The address bus
lets the CPU interface with all I/O devices and the PPU via memory mapping, and the data bus carries the
actual data transferred between them. For this project, we built the PPU and the 6502 as separate units
that are not yet integrated, with the PPU rendering frames and the CPU able to execute the majority of
its instruction set.
4. CPU (Israel)
System Overview
At the core of the Nintendo Entertainment System (NES) is the Ricoh 2A03, a modified MOS
Technology 6502 processor. From a modern perspective, this processor is rather primitive, with a clock of
roughly 1.79 MHz in NTSC markets and no pipelining. However, the 6502 would redefine home computing
through the extensive use of a state machine. The 6502 is an 8-bit, little-endian processor with three
main registers: the accumulator, the X index register, and the Y index register. Aside from those three,
the 6502 has a few support registers for internal use: the program counter, the status flag register, and
the stack pointer. All registers are 8 bits wide except for the program counter, which holds 16 bits. In
normal operation, the 6502 receives instructions through its 8 data bus pins (in/out) in response to a
request made via its 16 address pins (out only). The value on the address pins is either the value in the
program counter or an address taken from an instruction. The 6502 instruction set consists of roughly
150 opcodes, built from 56 different operations and 13 different addressing modes. The addressing modes
will be discussed in further detail in sections (b) and (e).
In order to emulate this system, several control signals needed to be generated, resulting in three
main modules and six supporting modules for auxiliary control signals. Similar to the actual 6502, one of
the most important modules is dedicated to decoding the received instructions from a raw opcode.
Following the von Neumann architecture, the 6502 also has an Arithmetic/Logic Unit (ALU) for
computations like add with carry and bitwise OR. Lastly, the main 6502 Verilog module encodes a Mealy
state machine consisting of 7 states based on clock cycle. Integrating these modules together, the result
is a largely cycle-accurate 6502, missing only the indirect addressing mode (which largely went unused by
the NES), the relative addressing mode, and decimal mode (which was not present in the 2A03 variant used
by the NES).
OpCode Decoding
In order to achieve accurate results and timings, the opcodes needed to be decoded into an
addressing mode for the controller and an operation for the ALU. This module takes advantage of the
encoding scheme employed by most of the 6502 instruction set. Decomposing the byte-sized instruction
into individual bits and labeling them as “aaabbbcc,” there are three main divisions of the instruction set
based on the value of “cc.” Within each subset, “aaa” specifies the operation, and “bbb” specifies the
addressing mode. For example, the opcode 10101001 ($A9) specifies “LDA immediate” in 6502 assembly.
In this case, cc = 01, which means that aaa = 101 calls for a load accumulator operation and bbb = 010
calls for the “immediate” addressing mode (meaning the next byte that the processor receives over its data
bus pins is the value to be operated upon by the ALU). Referring to any 6502 instruction set chart, we can
confirm that 10101001 in fact refers to LDA in the immediate addressing mode.
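The bit-field split described above can be modelled in a few lines of software. The following Python sketch is not the project's Verilog decoder. It covers only the cc = 01 group, the names decode_cc01, OPS_CC01, and MODES_CC01 are illustrative assumptions, and the two tables follow the standard 6502 opcode matrix rather than anything stated in this report.

```python
# Software model of the "aaabbbcc" split for the cc = 01 group only.
# All names here are illustrative; they do not appear in the project's Verilog.
OPS_CC01 = ["ORA", "AND", "EOR", "ADC", "STA", "LDA", "CMP", "SBC"]  # indexed by aaa
MODES_CC01 = [
    "INDIRECT_XINDEX",  # bbb = 000
    "ZEROPAGE",         # bbb = 001
    "IMMEDIATE",        # bbb = 010
    "ABSOLUTE",         # bbb = 011
    "INDIRECT_YINDEX",  # bbb = 100
    "ZEROPAGE_XINDEX",  # bbb = 101
    "ABSOLUTE_YINDEX",  # bbb = 110
    "ABSOLUTE_XINDEX",  # bbb = 111
]


def decode_cc01(opcode):
    """Split an opcode byte into (operation, addressing mode) for the cc = 01 group."""
    aaa = (opcode >> 5) & 0b111
    bbb = (opcode >> 2) & 0b111
    cc = opcode & 0b11
    if cc != 0b01:
        raise ValueError("only the cc = 01 group is modelled here")
    return OPS_CC01[aaa], MODES_CC01[bbb]


print(decode_cc01(0xA9))  # the LDA immediate example from the text
```

A full decoder would also need the cc = 00 and cc = 10 tables and the hard-coded exceptions; the sketch leaves those out on purpose.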
In order to implement a decoder, we built a lookup table (LUT) out of case statements over the bits
of each opcode. The exceptions to the above encoding scheme are hard-coded, since there are relatively
few. In order to encode the results, the operations are sorted alphabetically by their name in 6502
assembly language and numbered in that order, starting from 1. The addressing modes are numbered in a
similar way. Refer to cpu_constants.sv for the addressing mode and operation values.
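As a cross-check of the alphabetical numbering, the operation values in cpu_constants.sv can be regenerated by sorting the 56 mnemonics. This is a minimal Python sketch, not part of the project; OP_VALUE is a hypothetical name, and the plain mnemonics AND and BIT stand in for the AND1 and BIT1 parameters used in the package.

```python
# Regenerate the operation numbering by sorting the mnemonics alphabetically.
# OP_VALUE is a hypothetical name used only for this illustration.
MNEMONICS = (
    "LDA LDX LDY STA STX STY TAX TAY TSX TXA TXS TYA "
    "ADC SBC AND ORA EOR ASL LSR ROL ROR BIT CMP CPX CPY "
    "INC INX INY DEC DEX DEY JMP JSR RTS RTI BRK NOP "
    "BCC BCS BEQ BMI BNE BPL BVC BVS "
    "CLC CLD CLI CLV SEC SED SEI PHA PHP PLA PLP"
).split()

# Values start at 1, matching "parameter ADC = 1;" in the package.
OP_VALUE = {name: value for value, name in enumerate(sorted(MNEMONICS), start=1)}

print(len(OP_VALUE), OP_VALUE["ADC"], OP_VALUE["LDA"], OP_VALUE["TYA"])
```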
Support Modules
In order to create a functioning 6502 replica, it was deemed necessary to add a few extra control
signals. The first of these is the PC flag. When reading from a specific memory address, the 6502 needs
to send that address out on the address pins. When it finishes the instruction, it needs to return to
sending the value in the program counter for the next instruction. This was accomplished with the 1-bit
PC flag. Another module created to aid in the control of the processor was a controller for the implied
addressing mode. The implied addressing mode covers all instructions for which the operand is specified
by the operation name itself, so it has a variety of possible outcomes. For example, TAX, “transfer A to
X,” implies that the A register must be the input to the operation. This module uses a case statement to
pick out the instructions that take longer than two cycles, taking the requested operation as input and
providing three signals (brk, pushing, and pulling) as output. These three signals were used in the
transition logic to decide whether to continue incrementing the cycle counter or to go back to cycle zero.
Since the design convention of our ALU module specifies that its first operand comes from a
register (except in those cases where the input must come from memory), a module was needed to select
which register to read. This module uses a case statement over the operations that use a register other
than A and defaults to the A register, since using the A register is most common; this reduces the
amount of code.
In order to mimic the NES’ system clock architecture, a simple module was drawn up to divide an
input clock signal by 12. This was accomplished with a simple counter. When the counter reaches the
specified divisor, the outgoing signal is inverted and the counter is reset.
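The counter-based divider can be modelled in software to check the resulting ratio. The Python sketch below is not the project's clock_gen6502 module; the class and function names are hypothetical. It assumes the output toggles every 6 input edges, so that one full output period spans 12 input cycles; toggling only every 12 edges would instead divide by 24.

```python
class ClockDivider:
    """Counter-based divider that toggles its output every `half_period` input edges."""

    def __init__(self, half_period=6):
        self.half_period = half_period
        self.count = 0
        self.out = 0

    def tick(self):
        """Advance by one rising edge of the input clock and return the output level."""
        self.count += 1
        if self.count == self.half_period:
            self.out ^= 1   # invert the outgoing signal
            self.count = 0  # and reset the counter
        return self.out


def count_rising_edges(levels, initial=0):
    """Count 0 -> 1 transitions in a sequence of sampled output levels."""
    previous, edges = initial, 0
    for level in levels:
        if level == 1 and previous == 0:
            edges += 1
        previous = level
    return edges


divider = ClockDivider(half_period=6)
levels = [divider.tick() for _ in range(120)]
print(count_rising_edges(levels))  # output rising edges seen over 120 input cycles
```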
The instructions for bit shifts (ASL, LSR, ROL, and ROR), when writing to memory, are
particularly slow. The module mem_write was used to keep the CPU from returning to cycle 0 until
cycle 6, which is when those instructions should terminate.
Lastly, the NES memory map incorporates several mirrored addresses. A module was devised to
“demirror” addresses aimed at internal RAM. This was accomplished by recognizing that in each of the 4
mirrored quadrants, the low byte of the address would be the same. Since the NES only has 2 KB of
internal RAM, we could keep within the first quadrant, replacing the upper 5 bits of the address high
byte with 0s when the raw address was below location $2000. It was further necessary to apply this
demirroring to the program counter in case the program counter attempted to access internal memory.
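Clearing the upper 5 bits of the high byte is the same as masking the 16-bit address with $07FF. A minimal Python sketch of that rule follows; the function and constant names are hypothetical and this is not the project's memory_controller module.

```python
INTERNAL_RAM_LIMIT = 0x2000  # addresses below this point at the 2 KB internal RAM
RAM_MASK = 0x07FF            # keeps the low byte plus the low 3 bits of the high byte


def demirror(address):
    """Fold a mirrored internal-RAM address back into the first 2 KB quadrant."""
    address &= 0xFFFF
    if address < INTERNAL_RAM_LIMIT:
        return address & RAM_MASK
    return address  # everything at or above $2000 is left untouched


print(hex(demirror(0x0801)), hex(demirror(0x1FFF)), hex(demirror(0x2000)))
```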
First, reset (rst) is deasserted and the clock initializes (note the program counter was initialized at
$FF00 to avoid reading from internal memory). The data_in line reads $A9 on the next positive edge,
meaning the processor should load the accumulator with an immediate value. The data_in value then
switches to $67, the value that ought to be stored in the accumulator. Notice that on the negative edge
of the same clock cycle, the value $67 propagates to the accumulator. The program counter reads $FF02 at
the end of this instruction, matching the cycle timing of the LDA immediate instruction. The next input
is $85, which refers to STA zeropage (store the value of A in the zero page). The next byte tells us to
store at location $0001, since it is a zero-page address. This value is held since the processor needs one
cycle to receive this location and then another cycle to actually write to it. We see the rw flag go low
(write is active low on the 6502). The next instruction is $A6, which is LDX zeropage.
The data_in line reports $01, telling the processor to access the data at location $0001. Since this
location is within the range $0000 to $1FFF, it is found in internal memory, which we generated as a
BRAM with a write width of 8 bits and a depth of 2048. We know the execution of this instruction was
successful because the X register then receives the value $67, exactly what was stored at that location
from a different register. Other “programs” can be written in this manner. Based on this testbench (and
many other iterations of similar programs), we expect that integrating this module with a working PPU
would succeed.
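The three-instruction program from this walkthrough can be replayed against a small software model to confirm the expected register values. The Python sketch below is a functional model only: it is not cycle accurate, it knows just the three opcodes used here, and the function name run is hypothetical.

```python
def run(program, start_pc=0xFF00):
    """Execute a byte stream containing only LDA #imm, STA zp, and LDX zp."""
    regs = {"A": 0, "X": 0, "PC": start_pc}
    ram = [0] * 2048  # 2 KB internal RAM, matching the BRAM depth in the report
    i = 0
    while i < len(program):
        opcode, operand = program[i], program[i + 1]
        if opcode == 0xA9:    # LDA immediate
            regs["A"] = operand
        elif opcode == 0x85:  # STA zeropage
            ram[operand] = regs["A"]
        elif opcode == 0xA6:  # LDX zeropage
            regs["X"] = ram[operand]
        else:
            raise ValueError(f"opcode ${opcode:02X} is not modelled")
        i += 2  # every instruction modelled here is two bytes long
        regs["PC"] = (regs["PC"] + 2) & 0xFFFF
    return regs, ram


regs, ram = run([0xA9, 0x67, 0x85, 0x01, 0xA6, 0x01])
print(hex(regs["A"]), hex(ram[0x01]), hex(regs["X"]), hex(regs["PC"]))
```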
Visible Scanlines
During a visible scanline, the PPU is constantly fetching background tiles just in time for them
to be drawn. Each memory access takes two clock cycles, so every eight cycles, the PPU fetches four
bytes from VRAM: a nametable byte, an attribute table byte, and two pattern table bytes.
These four bytes contain the data for the PPU to draw eight pixels on the screen. Therefore, every
eight cycles, the PPU obtains enough information to draw eight pixels and can keep up with drawing one
pixel every clock cycle. Each visible scanline contains 256 visible pixels, which translates to 32 tiles.
This means that during a visible scanline, the PPU repeats this eight-cycle background tile fetch pattern
32 times, which takes 256 cycles in total.
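The fetch arithmetic can be written out as a small Python model. This is an illustrative sketch rather than the project's PPU logic: the function name is hypothetical, and the order of the four fetches (nametable, attribute, pattern low, pattern high) is the conventional NES order, assumed here.

```python
CYCLES_PER_ACCESS = 2  # each VRAM access takes two PPU clock cycles
FETCHES = ["nametable", "attribute", "pattern_low", "pattern_high"]  # assumed order
PIXELS_PER_TILE = 8
VISIBLE_PIXELS = 256


def background_fetch(cycle):
    """Return (tile index, fetch name) for a cycle in [0, 255] of a visible scanline."""
    if not 0 <= cycle < VISIBLE_PIXELS:
        raise ValueError("cycle is outside the visible part of the scanline")
    cycles_per_tile = CYCLES_PER_ACCESS * len(FETCHES)  # 8 cycles per tile
    tile = cycle // cycles_per_tile
    fetch = FETCHES[(cycle % cycles_per_tile) // CYCLES_PER_ACCESS]
    return tile, fetch


tiles_per_line = VISIBLE_PIXELS // PIXELS_PER_TILE
total_cycles = tiles_per_line * CYCLES_PER_ACCESS * len(FETCHES)
print(tiles_per_line, total_cycles)
```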
On cycle 256, once the PPU is no longer rendering visible pixels and is in its horizontal blanking
period, it starts prefetching the image data for the sprites that appear on the next scanline. Although 64
sprites can be held in the PPU’s internal Object Attribute Memory (OAM), the NES can only display
eight sprites on any given scanline, because the PPU only has time to prefetch data for eight sprites
before it must once again start fetching background data. Although sprite tile fetches take the same
number of cycles (eight) as background tile fetches, no nametable or attribute table data is fetched,
because the sprite tile index and color information are held within OAM. As a result, in this
implementation, the PPU does not access VRAM during the first four cycles of each sprite prefetch and
only fetches two pattern table bytes.
Once all of the tile data for the sprites on the next scanline has been fetched from VRAM, the
PPU prefetches the first two tiles for the next scanline so that it can support background scrolling.
Background scrolling is not implemented in this version of the PPU, but this prefetching was included to
maintain the memory access progression found in the original NES. The background tile prefetches start
at cycle 320 and follow the same progression as the previously discussed background tile fetches.
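The cycle boundaries in this section are mutually consistent: eight sprites at eight cycles each occupy cycles 256 to 319, which is why the background prefetch starts at cycle 320. The Python sketch below lays the phases out; it is illustrative only, the function names are hypothetical, and the scanline length of 341 cycles and the "idle" label for the remaining cycles are assumptions not stated in this report.

```python
SCANLINE_CYCLES = 341  # assumption: length of one NES scanline in PPU cycles
SPRITE_START = 256
BG_PREFETCH_START = SPRITE_START + 8 * 8      # 8 sprites x 8 cycles = cycle 320
BG_PREFETCH_END = BG_PREFETCH_START + 2 * 8   # 2 tiles x 8 cycles = cycle 336


def scanline_phase(cycle):
    """Name the memory-access phase the PPU is in at a given cycle of a visible scanline."""
    if not 0 <= cycle < SCANLINE_CYCLES:
        raise ValueError("cycle is outside the scanline")
    if cycle < SPRITE_START:
        return "background fetch"
    if cycle < BG_PREFETCH_START:
        return "sprite prefetch"
    if cycle < BG_PREFETCH_END:
        return "background prefetch"
    return "idle"  # assumption: the report does not describe these cycles


def sprite_accesses_vram(cycle):
    """During sprite prefetch, only the last four cycles of each slot read VRAM."""
    if scanline_phase(cycle) != "sprite prefetch":
        raise ValueError("cycle is not in the sprite prefetch phase")
    return (cycle - SPRITE_START) % 8 >= 4


print(BG_PREFETCH_START, scanline_phase(319), scanline_phase(320))
```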
Prerender Scanline
During the prerender scanline, no pixels are drawn to the screen, so no real-time background tile
fetching is necessary. However, pixels will be rendered on the next scanline, so the sprite prefetch and
background tile prefetch phases still happen (see Figure 5.1 below).
One of the jobs of the OAM module is to determine which sprites (at most eight) will be drawn on
the scanline. Due to the timing constraints on sprite prefetching, only the first eight sprites in OAM
that fall on the next scanline will actually be drawn. In this implementation, the OAM performs this
range evaluation during cycles [0, 63]. Each cycle, the OAM evaluates one sprite and determines whether
it appears on the next scanline. If it does, and the secondary OAM is not full, the sprite is stored in
the secondary OAM (see the next section). A detailed block diagram of this part of the PPU can be found
in Diagram 2.
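The range evaluation amounts to a first-come, first-served scan over the 64 OAM entries. The Python sketch below models it under stated assumptions: sprites are 8 pixels tall, the stored Y value is the sprite's top row with no offset, and the function and constant names are hypothetical rather than taken from OAM.sv.

```python
SPRITE_HEIGHT = 8          # assumption: 8x8 sprites
MAX_SPRITES_PER_LINE = 8   # capacity of the secondary OAM


def evaluate_sprites(oam_y, next_scanline):
    """Return the OAM indices copied to secondary OAM for the next scanline.

    oam_y holds the top-row Y position of each of the 64 sprites, in OAM order.
    """
    secondary = []
    for index, y in enumerate(oam_y):  # one sprite per cycle, cycles 0..63
        in_range = y <= next_scanline < y + SPRITE_HEIGHT
        if in_range and len(secondary) < MAX_SPRITES_PER_LINE:
            secondary.append(index)
    return secondary


# Ten sprites share rows 10..17; only the first eight of them are kept.
oam_y = [10] * 10 + [200] * 54
print(evaluate_sprites(oam_y, 12))
```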
The transparent color is mirrored to all palette RAM addresses of the form 5’bXXX00. This module takes
in this 5-bit address and maps it to a six-bit PPU color ID stored within the palette RAM. A detailed block
diagram of this part of the PPU can be found in Diagram 3.
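The mirroring rule can be expressed as a lookup that redirects every address of the form 5'bXXX00 to a single entry. The Python sketch below is illustrative and not the project's PaletteRAM module; it assumes the transparent color is stored at entry 0, and the function name is hypothetical.

```python
def palette_lookup(palette_ram, addr5):
    """Map a 5-bit palette address to a 6-bit PPU color ID."""
    addr5 &= 0x1F
    if addr5 & 0b11 == 0:  # 5'bXXX00: all of these show the transparent color
        addr5 = 0          # assumption: the transparent color lives at entry 0
    return palette_ram[addr5] & 0x3F


# Example contents: entry n holds color ID n, except the transparent entry.
palette_ram = list(range(32))
palette_ram[0] = 0x0F
print(hex(palette_lookup(palette_ram, 0x04)), hex(palette_lookup(palette_ram, 0x05)))
```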
Figure 5.2
VGA (Daniel)
The VGA module used is the same as was used in lab 3. The original intention was to use this
module to generate 640x480 video and to scale our NES video output by two times in order to completely
fill the screen. However, there were complications with getting the lower resolution video to work and we
opted to use the higher resolution mode (1024x768) with scaling to fill a smaller, but still usable, portion
of the screen. A detailed block diagram of this part of the PPU can be found in Diagram 4.
6. Summary
Insights
The biggest challenge in designing the 6502 emulator was understanding the full extent of its
operation. In essence, this project was half research project. Faithful recreation of deprecated
hardware requires a thorough understanding of its intentional and unintentional behavior. We chose not
to implement unofficial opcodes, as no official NES title ever used them, and we still consider that the
right decision. Once the function and behavior of the 6502 are understood, the project becomes much
simpler, requiring some ingenuity to be cycle accurate as well as some tedious work to ensure all opcodes
are implemented correctly. The other major lesson learned was in implementing the main_6502 module.
Simply put, it really ought to be broken up into more modules. The implied addressing mode was
controllable via its own module, and it likely would have been worth the effort to build similar modules
for the other addressing modes in order to create a cleaner top-level module. In hindsight, we ought to
have considered a combinational state machine instead, which is closer to what the 6502 actually had.
The current implementation does manage cycle accuracy, but there is potential for contaminating other
address locations, since the switching with the PC flag is clocked on both the positive and negative
edges of the clock. Ultimately, a second pass at this would greatly benefit the project.
As with the CPU, one of the biggest challenges of faithfully recreating the NES PPU was fully
understanding the way it functioned. We struggled to find enough time for this project because we had
to use our first week to read through pages and pages of NES documentation to get a firm grasp on how
the PPU works. Even once we had created an overall design for the PPU and begun creating our modules,
we still had to constantly refer back to the documentation and change earlier modules based on the
requirements of the modules we were currently working on. We nearly have a fully functional PPU; given
another week or so, or a fuller understanding of how the NES parts specifically worked, we could
probably have reached full functionality.
Improvements/Next Steps
Moving forward with the CPU, a second pass at the state machine needs to be made, finding a
better way to assert results when those results are ready without risking contaminating the memory. The
chief worry with the current iteration is the possibility that the rw flag is asserted for too long, resulting in
a write to a memory location that should have been read from. The processor is still missing a few parts as
well, and, after improving the implementation of the state machine, will need the addition of the relative
addressing mode to make branching possible. Once these pieces are assembled properly, then the system
will be ready for cycle accurate NES emulation, given a working PPU and data bus connection.
For the PPU, we think we could continue where we left off and have a fully working PPU
module fairly quickly. The next step would be going back through the sprite modules (OAM, secondary
OAM, Memory Fetcher, and Sprite Renderer) to find and fix our current sprite bugs.
For the project of recreating the NES as a whole, there are many things that we could add or
improve. The biggest step would be finishing the CPU and the PPU and then connecting them together.
Next, we would like to focus on implementing our stretch goals and being able to play games from ROMs
with the system. Stretch goals that would be essential to traditional NES gameplay on this system
include building an interface for the original NES controller and cartridges, and designing and
implementing an APU (audio processing unit).
Appendices
Block Diagrams
Diagram 5 - main_6502
Figures
Figures taken from “Nintendo Entertainment System Documentation”
Each name table has an associated attribute table. Attribute tables hold the upper two bits of the colours for the tiles.
Each byte in the attribute table represents a 4x4 group of tiles, so an attribute table is an 8x8 table of these groups.
Each 4x4 group is further divided into four 2x2 squares as shown in Figure 2. The 8x8 tiles are numbered $0-$F.
The layout of the byte is 33221100 where every two bits specifies the most significant two colour bits for the
specified square.
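The 33221100 layout can be turned into a short lookup. The Python sketch below is illustrative only; the function names are hypothetical, tile coordinates are assumed to run from (0, 0) at the top left of the name table, and the squares are assumed to be numbered 0 (top left), 1 (top right), 2 (bottom left), 3 (bottom right), which is the usual NES convention.

```python
def attribute_index(tile_x, tile_y):
    """Index of the attribute byte (0..63) covering the tile at (tile_x, tile_y)."""
    return (tile_y // 4) * 8 + (tile_x // 4)  # 8x8 table of 4x4 tile groups


def attribute_bits(attribute_byte, tile_x, tile_y):
    """Upper two colour bits for the tile at (tile_x, tile_y)."""
    # Which 2x2 square of the 4x4 group the tile falls in (assumed numbering 0..3).
    square = ((tile_y % 4) // 2) * 2 + ((tile_x % 4) // 2)
    return (attribute_byte >> (2 * square)) & 0b11


# 0xE4 = 11 10 01 00: square 3 -> 3, square 2 -> 2, square 1 -> 1, square 0 -> 0.
print([attribute_bits(0xE4, x, y) for (x, y) in [(0, 0), (2, 0), (0, 2), (3, 3)]])
```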
Verilog
CPU Modules
cpu_constants.sv
`timescale 1ns / 1ps
package cpu_constants;
// operation encoding
parameter ADC = 1;
parameter AND1 = 2;
parameter ASL = 3;
parameter BCC = 4;
parameter BCS = 5;
parameter BEQ = 6;
parameter BIT1 = 7;
parameter BMI = 8;
parameter BNE = 9;
parameter BPL = 10;
parameter BRK = 11;
parameter BVC = 12;
parameter BVS = 13;
parameter CLC = 14;
parameter CLD = 15;
parameter CLI = 16;
parameter CLV = 17;
parameter CMP = 18;
parameter CPX = 19;
parameter CPY = 20;
parameter DEC = 21;
parameter DEX = 22;
parameter DEY = 23;
parameter EOR = 24;
parameter INC = 25;
parameter INX = 26;
parameter INY = 27;
parameter JMP = 28;
parameter JSR = 29;
parameter LDA = 30;
parameter LDX = 31;
parameter LDY = 32;
parameter LSR = 33;
parameter NOP = 34;
parameter ORA = 35;
parameter PHA = 36;
parameter PHP = 37;
parameter PLA = 38;
parameter PLP = 39;
parameter ROL = 40;
parameter ROR = 41;
parameter RTI = 42;
parameter RTS = 43;
parameter SBC = 44;
parameter SEC = 45;
parameter SED = 46;
parameter SEI = 47;
parameter STA = 48;
parameter STX = 49;
parameter STY = 50;
parameter TAX = 51;
parameter TAY = 52;
parameter TSX = 53;
parameter TXA = 54;
parameter TXS = 55;
parameter TYA = 56;
// addressing modes
parameter ACCUMULATOR = 1;
parameter ABSOLUTE = 2;
parameter ABSOLUTE_XINDEX = 3;
parameter ABSOLUTE_YINDEX = 4;
parameter IMMEDIATE = 5;
parameter IMPLIED = 6;
parameter INDIRECT = 7;
parameter INDIRECT_XINDEX = 8;
parameter INDERECT_YINDEX = 9;
parameter RELATIVE = 10;
parameter ZEROPAGE = 11;
parameter ZEROPAGE_XINDEX = 12;
parameter ZEROPAGE_YINDEX = 13;
endpackage
alu_6502.sv
`timescale 1ns / 1ps
import cpu_constants::*;
module alu_6502(
input[5:0] op,
input[7:0] a,
input[7:0] b,
input[7:0] curr_status_flag,
output logic[7:0] result,
output logic[7:0] status_flag_out
);
logic N;
logic V;
logic dash;
logic B;
logic D;
logic I;
logic Z;
logic C;
logic[8:0] int_result;
endmodule
clock_gen6502.sv
`timescale 1ns / 1ps
module clock_gen6502(
input clock,
input rst,
output logic phi_1,
output logic phi_2
);
assign phi_2 = !phi_1; // phi_2 exactly out of phase with phi_1 always
end
endmodule
control_decode.sv
`timescale 1ns / 1ps
import cpu_constants::*;
module control_decode(
input[7:0] opcode,
output logic[3:0] addr_mode,
output logic[5:0] op
);
always_comb begin
// some opcodes must be called out by name, they break pattern
if(opcode == 8'b0) begin op = BRK; addr_mode = IMPLIED; end
else if(opcode == 8'h20) begin op = JSR; addr_mode = ABSOLUTE; end
else if(opcode == 8'h40) begin op = RTI; addr_mode = IMPLIED; end
else if(opcode == 8'h60) begin op = RTS; addr_mode = IMPLIED; end
else if(opcode == 8'h08) begin op = PHP; addr_mode = IMPLIED; end
else if(opcode == 8'h28) begin op = PLP; addr_mode = IMPLIED; end
else if(opcode == 8'h48) begin op = PHA; addr_mode = IMPLIED; end
else if(opcode == 8'h68) begin op = PLA; addr_mode = IMPLIED; end
else if(opcode == 8'h88) begin op = DEY; addr_mode = IMPLIED; end
else if(opcode == 8'hA8) begin op = TAY; addr_mode = IMPLIED; end
else if(opcode == 8'hC8) begin op = INY; addr_mode = IMPLIED; end
else if(opcode == 8'hE8) begin op = INX; addr_mode = IMPLIED; end
else if(opcode == 8'h18) begin op = CLC; addr_mode = IMPLIED; end
else if(opcode == 8'h38) begin op = SEC; addr_mode = IMPLIED; end
else if(opcode == 8'h58) begin op = CLI; addr_mode = IMPLIED; end
else if(opcode == 8'h78) begin op = SEI; addr_mode = IMPLIED; end
else if(opcode == 8'h98) begin op = TYA; addr_mode = IMPLIED; end
else if(opcode == 8'hB8) begin op = CLV; addr_mode = IMPLIED; end
else if(opcode == 8'hD8) begin op = CLD; addr_mode = IMPLIED; end
else if(opcode == 8'hF8) begin op = SED; addr_mode = IMPLIED; end
else if(opcode == 8'h8A) begin op = TXA; addr_mode = IMPLIED; end
else if(opcode == 8'h9A) begin op = TXS; addr_mode = IMPLIED; end
else if(opcode == 8'hAA) begin op = TAX; addr_mode = IMPLIED; end
else if(opcode == 8'hBA) begin op = TSX; addr_mode = IMPLIED; end
else if(opcode == 8'hCA) begin op = DEX; addr_mode = IMPLIED; end
else if(opcode == 8'hEA) begin op = NOP; addr_mode = IMPLIED; end
else if(opcode == 8'h6C) begin op = JMP; addr_mode = INDIRECT; end // special exception to pattern
endmodule
cpu_demo.sv
module cpu_demo(
input[15:0] sw,
input btnc,
input btnr,
input clk_100mhz,
output logic [15:0] led,
output ca, cb ,cc ,cd, ce, cf, cg,
output[7:0] an
);
logic[7:0] A;
logic[7:0] M;
logic[7:0] X;
logic[7:0] Y;
logic[7:0] PCL;
logic[7:0] PCH;
logic[15:0] address;
logic[7:0] data_out;
logic phi_2;
logic clean_btnr;
logic[6:0] segments;
assign {cg, cf, ce, cd, cc, cb, ca} = segments[6:0];
logic[31:0] data_in; // 32 bits: four 8-bit fields packed for the display
debounce btnr_cleaner(.clk_in(clk_100mhz),.rst_in(btnc),.bouncey_in(btnr),.clean_out(clean_btnr));
// the instance name seg_display is assumed; a module instantiation requires one
display_7seg seg_display(.clk_in(clk_100mhz),.data_in(data_in),.seg_out(segments),.strobe_out(an));
main_6502 demo_6502(.rst(btnc),.clock(clean_btnr),.data_in(sw[7:0]),.data_out(data_out),.address(address),
.rw(led[0]),.phi_2(phi_2),.A(A),.M(M),.X(X),.Y(Y),.PCL(PCL),.PCH(PCH));
always_comb begin
case(sw[15:14])
2'b00: data_in = {Y,X,M,A}; // display register values
2'b01: data_in = {address, PCH, PCL}; // display program counter and output
2'b10: data_in = {24'b0, data_out}; // display 6502 output
default: data_in = {Y,X,M,A};
endcase
end
endmodule
implied_handler.sv
`timescale 1ns / 1ps
import cpu_constants::*; // BRK, PHA, PHP, PLA, and PLP are defined in this package
// use this module to control implied addressing mode
module implied_handler(
input[5:0] op,
output logic brk,
output logic pushing,
output logic pulling
);
always_comb begin
case(op)
BRK: begin brk = 1; pushing = 0; pulling = 0; end // detect brk command
PHA: begin brk = 0; pushing = 1; pulling = 0; end // detect push to stack
PHP: begin brk = 0; pushing = 1; pulling = 0; end
PLA: begin brk = 0; pushing = 0; pulling = 1; end
PLP: begin brk = 0; pushing = 0; pulling = 1; end
default: begin brk = 0; pushing = 0; pulling = 0; end // all others are two-cycle instructions handled by ALU
endcase
end
endmodule
mem_write.sv
`timescale 1ns / 1ps
import cpu_constants::*; // operation and addressing mode names are defined in this package
module mem_write(
input[5:0] op,
input[3:0] addr_mode,
output logic long_wea
);
always_comb begin
case(op)
ASL: long_wea = !(addr_mode == ACCUMULATOR);
DEC: long_wea = 1;
INC: long_wea = 1;
LSR: long_wea = !(addr_mode == ACCUMULATOR);
ROL: long_wea = !(addr_mode == ACCUMULATOR);
ROR: long_wea = !(addr_mode == ACCUMULATOR);
default: long_wea = 0; // writing memory is specific to a few operations
endcase
end
endmodule
memory_controller.sv
`timescale 1ns / 1ps
module memory_controller(
input[7:0] outer_data,
input[7:0] inner_data,
input[7:0] ADDR_H_RAW,
input[7:0] ADDR_L,
input[7:0] PCL,
input[7:0] PCH_RAW,
input pc_flag,
output logic[7:0] mem_dat_out,
output logic[15:0] address
);
reg_sel.sv
`timescale 1ns / 1ps
import cpu_constants::*;
module reg_sel(
input[5:0] op,
input[3:0] addr_mode,
input[7:0] x,
input[7:0] y,
input[7:0] a,
input[7:0] m,
input[7:0] s,
output logic[7:0] reg_out
);
main_6502.sv
`timescale 1ns / 1ps
import cpu_constants::*;
module main_6502(
input rdy,
input irq_low,
input nmi_low,
input rst,
input so_low,
input clock,
input[7:0] data_in,
output logic[7:0] data_out,
output logic[15:0] address,
output logic sync,
output logic rw,
output logic phi_2,
output logic[7:0] A,
output logic[7:0] M,
output logic[7:0] X,
output logic[7:0] Y,
output logic[7:0] PCL,
output logic[7:0] PCH // for demo purposes, added registers and program counter as outputs
);
logic pc_flag; // high when address bus should receive value in program counter, low when it should receive from addr_bus
logic phi_1;
logic[3:0] addr_mode;
logic[5:0] op;
logic[5:0] working_op;
logic[3:0] working_addr;
logic[2:0] cycle;
logic[7:0] result; // ALU output
logic[7:0] flag_out;
logic long_wea; // flag for enabling cpu to write to memory based on received instruction
logic shifting; // flag for when shifts take place, only important when memory is the target
logic STORE; // flag for STA, STX, and STY
3'd1: begin
// next step depends on addressing mode
case(working_addr)
ACCUMULATOR: begin ready_flag <= 1; cycle <= 0; end // two-cycle instructions terminate on this cycle
ABSOLUTE: begin ADDR_L <= mem_data; cycle <= cycle + 1; end // receive first byte of memory read/write location
ABSOLUTE_XINDEX: begin ADDR_L <= mem_data + X; addr_calc <= mem_data + X; cycle <= cycle + 1; end
ABSOLUTE_YINDEX: begin ADDR_L <= mem_data + Y; addr_calc <= mem_data + Y; cycle <= cycle + 1; end
IMMEDIATE: begin operand_b <= data_in; ready_flag <= 1; cycle <= 0; end // receive byte for operation, result ready on negedge
IMPLIED: begin
if(brk) begin
cycle <= cycle + 1;
end else if(pushing | pulling) begin
ADDR_H_RAW <= 8'h01;
ADDR_L <= S;
end else begin
cycle <= 0; // ALU will handle if not pushing, pulling, or brk
ready_flag <= 1;
end
end
ZEROPAGE: begin ADDR_H_RAW <= 0; ADDR_L <= mem_data; pc_flag <= 0; cycle <= cycle + 1; end
ZEROPAGE_XINDEX: begin ADDR_H_RAW <= 0; ADDR_L <= mem_data; pc_flag <= 0; cycle <= cycle + 1; end // don't increment address yet for sake of cycle accuracy
endcase
end
3'd2: begin
case(working_addr)
ABSOLUTE: begin
ADDR_H_RAW <= mem_data;
if(working_op == JMP) begin
PCL <= ADDR_L;
PCH <= ADDR_H_RAW;
cycle <= 0;
end else if(working_op == JSR) begin
PCL_JUMP <= ADDR_L;
PCH_JUMP <= mem_data;
PCL_HOLD <= PCL;
PCH_HOLD <= PCH;
cycle <= cycle + 1;
end else begin
cycle <= cycle + 1;
pc_flag <= 1'b0;
end
end // assign high byte
ABSOLUTE_XINDEX: begin
ADDR_H_RAW <= mem_data;
pc_flag <= addr_calc[8]; // can use this address if there's no carry
cycle <= cycle + 1;
end
ABSOLUTE_YINDEX: begin
ADDR_H_RAW <= mem_data;
pc_flag <= addr_calc[8]; // can use this address if there's no carry
cycle <= cycle + 1;
end
ZEROPAGE: begin
if(shifting) begin // need to feed data_in to the M register so ALU performs operation on right data
M <= mem_data;
end else begin
operand_b <= mem_data; // otherwise feed data_in as operand b
end
if(long_wea) begin // if shifting or decrementing memory continue
cycle <= cycle + 1;
end else begin
rw <= !STORE; // rw on STA, STX, STY
ready_flag <= 1;
cycle <= 0; // if reading, data is ready
end
end
endcase
end
3'd3: begin
case(working_addr)
ABSOLUTE: begin
if(shifting) begin // need to feed data from memory to the M register so ALU performs operation on right data
M <= mem_data;
end else begin
operand_b <= mem_data; // otherwise feed data from memory as operand b
end
if(long_wea) begin // if writing operation, continue
cycle <= cycle + 1;
end else if(working_op != JSR) begin // jsr special exception
rw <= !STORE;
ready_flag <= 1;
cycle <= 0; // if reading, data is ready
end else begin
ADDR_H_RAW <= 8'h01;
ADDR_L <= S;
M <= PCL_HOLD;
rw <= 0;
pc_flag <= 0;
ready_flag <= 1;
cycle <= cycle + 1;
S <= S - 1;
end
end
ABSOLUTE_XINDEX: begin
ADDR_H_RAW <= ADDR_H_RAW + addr_calc[8]; // add carry if there, more accurate to 6502 this way
if(!long_wea) begin
operand_b <= mem_data;
rw <= (!STORE & !addr_calc[8]);
ready_flag <= !pc_flag; // if pc_flag low ready to calculate
cycle <= (pc_flag) ? cycle + 1 : 0; // continue if pc_flag high
pc_flag <= 0;
end else begin
cycle <= cycle + 1;
end
end
ABSOLUTE_YINDEX: begin
ADDR_H_RAW <= ADDR_H_RAW + addr_calc[8]; // add carry if there, more accurate to 6502 this way
if(!long_wea) begin
operand_b <= mem_data;
rw <= (!STORE & !addr_calc[8]);
ready_flag <= !pc_flag; // if pc_flag low ready to calculate
cycle <= (pc_flag) ? cycle + 1 : 0; // continue if pc_flag high
pc_flag <= 0;
end else begin
cycle <= cycle + 1;
end
end
ZEROPAGE: cycle <= cycle + 1;
ZEROPAGE_XINDEX: begin
if(shifting) begin
M <= mem_data;
end else begin
operand_b <= mem_data;
end
if(long_wea) begin
cycle <= cycle + 1;
end else begin
rw <= !STORE;
ready_flag <= 1;
cycle <= 0;
end
end
endcase
end
3'd4: begin
case(working_addr)
ABSOLUTE: begin
cycle <= cycle + 1;
if(working_op == JSR) begin
M <= PCH_HOLD;
ADDR_H_RAW <= 8'h01;
ADDR_L <= S;
rw <= 0;
ready_flag <= 1;
pc_flag <= 0; // nonblocking, consistent with the other assignments in this clocked block
cycle <= cycle + 1;
S <= S - 1;
end
end
ABSOLUTE_XINDEX: begin // if at this point, ready to calc just like absolute mode
if(!long_wea) begin
rw <= !STORE;
operand_b <= mem_data;
ready_flag <= 1;
cycle <= 0;
end else begin
cycle <= cycle + 1; // if writing, this addressing mode takes 7 cycles to complete
end
end
ABSOLUTE_YINDEX: begin // if at this point, ready to calc just like absolute mode
if(!long_wea) begin
rw <= !STORE;
operand_b <= mem_data;
ready_flag <= 1;
cycle <= 0;
end else begin
cycle <= cycle + 1; // if writing, this addressing mode takes 7 cycles to complete
end
end
ZEROPAGE: begin ready_flag <= 1; rw <= 0; cycle <= 0; end
ZEROPAGE_XINDEX: cycle <= cycle + 1; // preserving cycle accuracy
endcase
end
3'd5: begin
case(working_addr)
ABSOLUTE: begin
if(working_op != JSR) begin
ready_flag <= 1; rw <= 0; cycle <= 0;
end else begin
PCH <= PCH_JUMP;
PCL <= PCL_JUMP;
cycle <= 0;
end
end
ABSOLUTE_XINDEX: cycle <= cycle + 1;
ABSOLUTE_YINDEX: cycle <= cycle + 1;
ZEROPAGE_XINDEX: begin ready_flag <= 1; rw <= 0; cycle <= 0; end
endcase
end
3'd6: begin
case(working_addr)
ABSOLUTE_XINDEX: begin M <= mem_data; ready_flag <= 1; rw <= 0; cycle <= 0; end
ABSOLUTE_YINDEX: begin M <= mem_data; ready_flag <= 1; rw <= 0; cycle <= 0; end
endcase
end
default: begin cycle <= 3'd0; PCH <= 8'b0; PCL <= 8'b0; end
endcase
PCH <= (PCL == 8'hFF) ? PCH + 1 : PCH; // increment program counter with carry
PCL <= PCL + 1; // always increment program counter
end
end
always_comb begin
if(rst) begin
data_out = 0;
internal_out = 0;
end
if(ready_flag & (long_wea | STORE)) begin
data_out = result;
internal_out = result;
end
end
endmodule
alu_tb.sv
`timescale 1ns / 1ps
import cpu_constants::*;
module alu_tb;
// inputs
logic[6:0] op;
logic[7:0] a;
logic[7:0] b;
logic[7:0] curr_status_flag;
//outputs
logic[7:0] result;
logic[7:0] status_flag_out;
alu_6502 uut(.op(op),.a(a),.b(b),.curr_status_flag(curr_status_flag),
.result(result),.status_flag_out(status_flag_out));
initial begin
#10;
a = 8'hFF;
b = 8'h01;
op = ADC;
curr_status_flag = 8'b00110000;
#10;
$display("ADC $FF + $01 with carry flag not set: %2h",result);
$display("Status Flag: %8b",status_flag_out);
#10;
op = SBC;
a = 8'h01;
b = 8'h01;
curr_status_flag = 8'b00110000;
#10;
$display("SBC $01 - $01 with carry flag not set: %2h",result);
$display("Status Flag: %8b",status_flag_out);
#10;
op = AND1;
a = 8'h81;
b = 8'h83;
#10;
$display("AND 81 & 83: %2h",result);
$display("Status Flag: %8b",status_flag_out);
op = ASL;
a = 8'h83;
b = 8'h83;
#10;
$display("ASL: %2h",result);
$display("Status Flag: %8b",status_flag_out);
op = ORA;
a = 8'h83;
b = 8'h84;
#10;
$display("ORA: %2h",result);
$display("Status Flag: %8b",status_flag_out);
op = ROL;
a = 8'h83;
curr_status_flag = 8'b00110000;
#10;
$display("ROL, CARRY CLEAR: %2h",result);
$display("Status Flag: %8b",status_flag_out);
op = ROL;
a = 8'h83;
curr_status_flag = 8'b00110001;
#10;
$display("ROL, CARRY SET: %2h",result);
$display("Status Flag: %8b",status_flag_out);
op = ROR;
a = 8'h83;
curr_status_flag = 8'b00110000;
#10;
$display("ROR, CARRY CLEAR: %2h",result);
$display("Status Flag: %8b",status_flag_out);
op = ROR;
a = 8'h83;
curr_status_flag = 8'b00110001;
#10;
$display("ROR, CARRY SET: %2h",result);
$display("Status Flag: %8b",status_flag_out);
op = BIT1;
a = 8'h24;
b = 8'h83;
curr_status_flag = 8'b00110000;
#10;
$display("BIT: %2h",result);
$display("Status Flag: %8b",status_flag_out);
end
endmodule
clk_gen_tb.sv
`timescale 1ns / 1ps
module clk_gen_tb;
// inputs
logic clock;
logic rst;
//outputs
logic phi_1;
logic phi_2;
clock_gen6502 uut(.clock(clock),.rst(rst),.phi_1(phi_1),.phi_2(phi_2));
always begin
#2;
clock = !clock;
end
initial begin
clock = 0;
rst = 0;
#10;
rst = 1;
#10;
rst = 0;
#100;
end
endmodule
main_6502_tb.sv
`timescale 1ns / 1ps
import cpu_constants::*;
module main_6502_tb;
// inputs
logic irq_low;
logic nmi_low;
logic rst;
logic so_low;
logic clock;
logic[7:0] data_in;
// outputs
logic[7:0] data_out;
logic[15:0] address;
logic sync;
logic rw;
logic phi_2;
main_6502 uut(.irq_low(irq_low),.nmi_low(nmi_low),.rst(rst),.so_low(so_low),.clock(clock),.data_in(data_in),
.data_out(data_out),.address(address),.sync(sync),.rw(rw),.phi_2(phi_2));
always begin
#1;
clock = !clock;
end
initial begin
clock = 0;
irq_low = 0;
nmi_low = 0;
rst = 1;
so_low = 0;
data_in = 8'hEA;
#12;
rst = 0;
#24;
data_in = 8'hA9;
#48;
data_in = 8'h67;
#48;
data_in = 8'h85;
#48;
data_in = 8'h01;
#48;
data_in = 8'h01;
#48;
data_in = 8'hA6;
#48;
data_in = 8'h01;
#200;
end
endmodule
PPU Modules
Top.sv
import PPU_Types::*;
module top(
input clk_100mhz,
input[15:0] sw,
input btnc, btnu, btnl, btnr, btnd,
output[3:0] vga_r,
output[3:0] vga_b,
output[3:0] vga_g,
output vga_hs,
output vga_vs,
output[15:0] led
);
parameter SCALE_FACTOR = 2;
parameter SCALED_HORIZONTAL_BOUND = NUM_PIXELS_HORIZONTAL * SCALE_FACTOR;
parameter SCALED_VERTICAL_BOUND = NUM_PIXELS_VERTICAL * SCALE_FACTOR;
logic reset;
assign reset = btnc;
logic vga_clock_wire;
logic ppu_clock_wire;
top_clock(
.clk_in(clk_100mhz),
.reset(reset),
.vga_clock_65mhz(vga_clock_wire),
.ppu_clock_5mhz(ppu_clock_wire)
);
/*
VGA module
*/
logic[10:0] vga_h_count; // 11 bits, matching hcount_out of vga_1024
logic[9:0] vga_v_count;
logic vga_vsync_out, vga_hsync_out;
logic vga_blank_out;
vga_1024 vga_module(
.vclock_in(vga_clock_wire),
.hcount_out(vga_h_count),
.vcount_out(vga_v_count),
.vsync_out(vga_vsync_out),
.hsync_out(vga_hsync_out),
.blank_out(vga_blank_out)
);
/*
PPU module
*/
// logic[8:0] ppu_v_count;
// logic[8:0] ppu_h_count;
// PPUCounter ppu_counter(
// .clock_in(ppu_clock_wire),
// .reset_in(reset),
// .ppu_v_count_out(ppu_v_count),
// .ppu_h_count_out(ppu_h_count)
// );
logic[8:0] pixel_h_count_fix;
assign pixel_h_count_fix = pixel_h_count < NUM_PIXELS_HORIZONTAL ? pixel_h_count: 0;
logic[8:0] pixel_v_count_fix;
assign pixel_v_count_fix = pixel_v_count < NUM_PIXELS_VERTICAL ? pixel_v_count: 0;
/*
Frame Buffer
*/
logic frame_number;
PPUColor frame_buffer_out;
FrameBuffer frames(
.reset_in(reset),
.frame_num_in(frame_number),
/*
PPU Color Mapper Module
*/
VGAColor vco;
PPUColorMapper color_mapper(
.ppu_color_in(frame_buffer_out),
.vga_color_out(vco)
);
end
end
logic vga_display;
assign vga_display = (vga_h_count <= SCALED_HORIZONTAL_BOUND && vga_v_count <=
SCALED_VERTICAL_BOUND);
endmodule
PPU.sv
import PPU_Types::*;
module PPU(
input logic clock_in,
input logic reset_in,
input logic up,down,right,left,
//SecondaryOAM I/O
logic secondary_oam_push_in;
OAMSprite secondary_oam_sprite_in;
logic secondary_oam_full_out;
.ppu_v_count_out(ppu_v_count_in),
.ppu_h_count_out(ppu_h_count_in)
);
PPUStateCalculator ppu_state_calculator(
.ppu_v_count_in(ppu_v_count_in),
.ppu_h_count_in(ppu_h_count_in),
.cycle_info_out(cycle_info_in)
);
OAM oam(
.clock_in(clock_in),
.reset_in(reset_in),
SecondaryOAM secondary_oam(
.clock_in(clock_in),
.reset_in(reset_in),
VRAMSpoofer vram_spoofer(
.clock_in(clock_in),
.reset_in(reset_in),
.addr_in(vram_address_out),
.data_out(vram_data_in)
);
PPUMemoryFetcher memory_fetcher(
.clock_in(clock_in),
.reset_in(reset_in),
sprite_pixels sprites(
.clock(clock_in),
.reset(reset_in),
.hcount_in(ppu_h_count_in),
.vcount_in(ppu_v_count_in),
.sprites_data_in(sprites_to_render_out),
.cycle_info_in(cycle_info_in),
pixel_mux pixel_renderer(
.clock(clock_in),
.reset(reset_in),
.sprite_hcount(sprite_hcount),
.sprite_vcount(sprite_vcount),
.background_pixel_data(background_pixel_data),
.sprite_pixel_data(sprite_pixel_data),
.background_drawing(background_drawing),
.sprite_drawing(sprite_drawing),
PaletteRAM palette_ram(
.clock_in(clock_in),
.reset_in(reset_in),
endmodule
PPUCounter.sv
import PPU_Types::*;
/*
Module Notes:
Inputs:
clock_in - the 5.369 MHz PPU clock
reset_in - when high, resets the module to vcount = 0, hcount = 0 on the rising edge of the next clock cycle
Outputs:
ppu_v_count_out - the current scanline being drawn (ranges from 0 to 261, inclusive)
ppu_h_count_out - the current horizontal cycle number (ranges from 0 to 340, inclusive)
*/
module PPUCounter(
input logic clock_in,
input logic reset_in,
if(reset_in) begin
ppu_v_count_out <= 0;
ppu_h_count_out <= 0;
end
end
end
end
endmodule
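The counting logic of PPUCounter is not reproduced in this listing. A minimal sketch that is consistent with the module notes (341 cycles per scanline, 262 scanlines per frame) could look like the following. The module name ppu_counter_sketch and its parameters are hypothetical, and the sketch does not model finer details of the real PPU such as the shortened odd frame.

```systemverilog
// Hypothetical sketch, not the authors' PPUCounter implementation.
module ppu_counter_sketch #(
    parameter int H_MAX = 340, // last horizontal cycle of a scanline
    parameter int V_MAX = 261  // last scanline of a frame
)(
    input  logic      clock_in,
    input  logic      reset_in,
    output logic[8:0] ppu_v_count_out,
    output logic[8:0] ppu_h_count_out
);
    always_ff @(posedge clock_in) begin
        if(reset_in) begin
            ppu_v_count_out <= 9'd0;
            ppu_h_count_out <= 9'd0;
        end else if(ppu_h_count_out == H_MAX) begin
            // End of scanline: wrap hcount and advance (or wrap) vcount
            ppu_h_count_out <= 9'd0;
            ppu_v_count_out <= (ppu_v_count_out == V_MAX) ? 9'd0 : ppu_v_count_out + 9'd1;
        end else begin
            ppu_h_count_out <= ppu_h_count_out + 9'd1;
        end
    end
endmodule
```

Keeping the two limits as parameters makes it easy to shrink the frame in simulation so that a full vertical wrap can be observed in a short run.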
PPUStateCalculator.sv
import PPU_Types::*;
/*
Module Notes:
Inputs:
ppu_v_count_in - the number (from 0 to 261, inclusive) of the scanline that the PPU is currently drawing
ppu_h_count_in - the scanline cycle number (from 0 to 340, inclusive) of the pixel that the PPU is currently drawing
Outputs:
cycle_info_out - a struct bundling the current state of the PPU:
vstate - the current vertical state
hstate - the current horizontal state
tile_fetch_state - the current tile fetch state
memory_fetch_state - the current memory fetch state
*/
module PPUStateCalculator(
input logic[8:0] ppu_v_count_in,
input logic[8:0] ppu_h_count_in,
VerticalState vstate;
HorizontalState hstate;
TileFetchState tile_fetch_state;
MemoryFetchState memory_fetch_state;
assign cycle_info_out = {
vstate: vstate,
hstate: hstate,
tile_fetch_state: tile_fetch_state,
memory_fetch_state: memory_fetch_state
};
// the tile state can be determined from the 2nd and 3rd lowest bits of (ppu_h_count_in)
logic[1:0] tile_id;
assign tile_id = (ppu_h_count_in) >> 1;
always_comb begin
// Determine hstate_out
if(ppu_h_count_in <= VISIBLE_SCANLINE_LAST_BACKGROUND_DRAW_CYCLE) begin
hstate = BACKGROUND_DRAW;
end
// Determine tile_fetch_state_out
case(tile_id)
NAMETABLE_FETCH_ID: tile_fetch_state = NAMETABLE_FETCH;
ATTRIBUTE_FETCH_ID: tile_fetch_state = ATTRIBUTE_FETCH;
PATTERN_FETCH_1_ID: tile_fetch_state = PATTERN_FETCH_1;
PATTERN_FETCH_2_ID: tile_fetch_state = PATTERN_FETCH_2;
endcase
end
// Determine hstate_out
if(ppu_h_count_in <= VISIBLE_SCANLINE_LAST_SPRITE_PREFETCH_CYCLE) begin
hstate = SPRITE_PREFETCH;
end else if(ppu_h_count_in <= VISIBLE_SCANLINE_LAST_BACKGROUND_PREFETCH_CYCLE)
begin
hstate = BACKGROUND_PREFETCH;
end
// Determine tile_fetch_state_out
case(tile_id)
NAMETABLE_FETCH_ID: tile_fetch_state = NAMETABLE_FETCH;
ATTRIBUTE_FETCH_ID: tile_fetch_state = ATTRIBUTE_FETCH;
PATTERN_FETCH_1_ID: tile_fetch_state = PATTERN_FETCH_1;
PATTERN_FETCH_2_ID: tile_fetch_state = PATTERN_FETCH_2;
endcase
end
end
end
endmodule
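The tile_id computation above relies on each of the four tile fetches lasting two PPU cycles, so that bits [2:1] of the horizontal count select the fetch and bit [0] selects the cycle within it. The sketch below isolates that decoding. The module and port names are hypothetical, and the meaning attached to bit [0] is an assumption, since the memory fetch state logic is not shown in this listing.

```systemverilog
// Hypothetical sketch of the bit slicing behind tile_id; not the authors' module.
module tile_fetch_decode_sketch(
    input  logic[8:0] ppu_h_count_in,
    output logic[1:0] tile_id,      // which of the four fetches is active
    output logic      second_cycle  // assumed: 0 on the first cycle of a fetch, 1 on the second
);
    // Equivalent to (ppu_h_count_in >> 1) truncated to two bits
    assign tile_id      = ppu_h_count_in[2:1];
    assign second_cycle = ppu_h_count_in[0];
endmodule
```

With this slicing, horizontal counts 0 to 7 step through tile_id values 0, 0, 1, 1, 2, 2, 3, 3 and the pattern repeats every eight cycles, one period per background tile.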
OAM.sv
import PPU_Types::*;
module OAM(
input logic clock_in,
input logic reset_in,
// Internal state
OAMSprite[63:0] all_sprites;
logic[5:0] sprite_to_evaluate;
logic V_COUNT_IN_RANGE;
assign V_COUNT_IN_RANGE = NEXT_SCANLINE < LAST_VISIBLE_SCANLINE;
logic H_COUNT_IN_RANGE;
assign H_COUNT_IN_RANGE = (0 <= ppu_h_count_in) && (ppu_h_count_in < 64);
logic DO_EVALUATION;
assign DO_EVALUATION = V_COUNT_IN_RANGE && H_COUNT_IN_RANGE;
logic sprite_is_on_next_scanline;
assign sprite_is_on_next_scanline = (evaluated_sprite.y_position <= NEXT_SCANLINE) &&
(NEXT_SCANLINE < evaluated_sprite.y_position + 8);
// Attribute byte
color_data: 2'd1,
background_priority: 1'd0,
horizontal_flip: 1'd0,
vertical_flip: 1'd0
};
all_sprites[1] <= {
x_position: 8'd1,
y_position: 8'd222,
tile_index: 8'd0,
// Attribute byte
color_data: 2'd1,
background_priority: 1'd0,
horizontal_flip: 1'd0,
vertical_flip: 1'd0
};
all_sprites[2] <= {
x_position: 8'd16,
y_position: 8'd230,
tile_index: 8'd1,
// Attribute byte
color_data: 2'd1,
background_priority: 1'd1,
horizontal_flip: 1'd0,
vertical_flip: 1'd0
};
all_sprites[3] <= {
x_position: 8'd24,
y_position: 8'd222,
tile_index: 8'd1,
// Attribute byte
color_data: 2'd1,
background_priority: 1'd1,
horizontal_flip: 1'd0,
vertical_flip: 1'd0
};
all_sprites[4] <= {
x_position: 8'd16,
y_position: 8'd222,
tile_index: 8'd1,
// Attribute byte
color_data: 2'd1,
background_priority: 1'd1,
horizontal_flip: 1'd0,
vertical_flip: 1'd0
};
all_sprites[5] <= {
x_position: 8'd24,
y_position: 8'd230,
tile_index: 8'd1,
// Attribute byte
color_data: 2'd1,
background_priority: 1'd1,
horizontal_flip: 1'd0,
vertical_flip: 1'd0
};
all_sprites[6] <= {
x_position: 8'd254,
y_position: 8'd230,
tile_index: 8'd0,
// Attribute byte
color_data: 2'd0,
background_priority: 1'd0,
horizontal_flip: 1'd0,
vertical_flip: 1'd0
};
all_sprites[63:7] <= 0;
sprite_to_evaluate <= 0;
//Reset outputs
queue_write_out <= 0;
queue_sprite_out <= 0;
if (left ) begin
all_sprites[4].x_position <= all_sprites[4].x_position - 1;
all_sprites[5].x_position <= all_sprites[5].x_position - 1;
all_sprites[2].x_position <= all_sprites[2].x_position - 1;
all_sprites[3].x_position <= all_sprites[3].x_position - 1;
end else if (right ) begin
all_sprites[5].x_position <= all_sprites[5].x_position + 1;
all_sprites[4].x_position <= all_sprites[4].x_position + 1;
all_sprites[2].x_position <= all_sprites[2].x_position + 1;
all_sprites[3].x_position <= all_sprites[3].x_position + 1;
end
end
if(DO_EVALUATION) begin
endmodule
SecondaryOAM.sv
import PPU_Types::*;
module SecondaryOAM(
input logic clock_in,
input logic reset_in,
/*
The size of this queue is 8 because it can store 8 elements before it is full
Data cannot be inserted into a full queue until the cycle after data is popped.
Data cannot be removed from an empty queue until the cycle after data is pushed.
Pushing and popping cannot be done at the same time. Pushing takes precedence.
*/
parameter SIZE = 8;
logic[3:0] write_pointer; // Array index where the next sprite will be written
logic[3:0] read_pointer; // Array index of the sprite being currently presented on sprite_out
OAMSprite[8:0] sprite_queue; // This array has SIZE + 1 elements so that the queue can differentiate between being full and empty
// Reset outputs
full_out <= 0;
sprite_out <= 0;
empty_out <= 1; // The queue is empty when it is reset
if(!full_out && push_in) begin // If queue is not full, we can push new data
logic[3:0] is_full_check_pointer;
is_full_check_pointer = (write_pointer + 2) % (SIZE + 1);
end else if(!empty_out && pop_in) begin // If queue is not empty, we can pop data
logic[3:0] next_read_pointer;
next_read_pointer = (read_pointer + 1) % (SIZE + 1);
end
end
end
endmodule
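The SIZE + 1 trick used by the queue above can be shown in isolation. The sketch below is a generic circular queue, not the authors' SecondaryOAM: the module name, the WIDTH parameter, and the data_in/data_out ports are hypothetical, and full_out and empty_out are computed combinationally here, whereas the original registers them (hence its look-ahead of write_pointer + 2).

```systemverilog
// Hypothetical sketch of a SIZE-entry circular queue built on SIZE + 1 slots.
module ring_queue_sketch #(
    parameter int SIZE  = 8,
    parameter int WIDTH = 8
)(
    input  logic            clock_in,
    input  logic            reset_in,
    input  logic            push_in,
    input  logic            pop_in,
    input  logic[WIDTH-1:0] data_in,
    output logic[WIDTH-1:0] data_out,
    output logic            full_out,
    output logic            empty_out
);
    localparam int SLOTS = SIZE + 1; // one slot always stays unused
    typedef logic[$clog2(SLOTS)-1:0] ptr_t;

    logic[WIDTH-1:0] slots[SLOTS];
    ptr_t write_pointer, read_pointer;

    // Advance a pointer by one slot with wraparound
    function automatic ptr_t advance(input ptr_t p);
        return (p == SLOTS - 1) ? '0 : p + 1'b1;
    endfunction

    // Equal pointers mean empty; write one slot behind read means full
    assign empty_out = (read_pointer == write_pointer);
    assign full_out  = (advance(write_pointer) == read_pointer);
    assign data_out  = slots[read_pointer];

    always_ff @(posedge clock_in) begin
        if(reset_in) begin
            write_pointer <= '0;
            read_pointer  <= '0;
        end else if(push_in && !full_out) begin // pushing takes precedence
            slots[write_pointer] <= data_in;
            write_pointer <= advance(write_pointer);
        end else if(pop_in && !empty_out) begin
            read_pointer <= advance(read_pointer);
        end
    end
endmodule
```

Because one slot is never occupied, the two pointers are equal only when the queue is empty, so no separate element counter is needed to tell full from empty.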
VRAM.sv
`timescale 1ns / 1ps
endmodule
VRAMSpoofer.sv
/*
Module notes:
This module pretends to be VRAM and returns tile data, nametable data, and attribute table data
as if it were VRAM. However, regardless of the tile address, it will always return the same
pattern table bytes.
Addresses of the form 16'b000X_XXXX_XXXX_0abc will map to pattern_table_1[abc]
Addresses of the form 16'b000X_XXXX_XXXX_1abc will map to pattern_table_2[abc]
Note: the NES has 2 pattern tables, but this mirrors them into one
Regardless of the attribute table address, it will always return the attribute byte
Addresses in the range [0x23C0, 0x2400) will return the attribute byte
*/
parameter NUM_TILES_LOADED = 7;
parameter MAX_VALID_PATTERN_TABLE_ADDRESS = NUM_TILES_LOADED * 16 - 1;
assign pattern_table[0] = {
pattern_bytes_1: {
8'b00000000,
8'b00000000,
8'b00000000,
8'b00000000,
8'b00000000,
8'b00000000,
8'b00000000,
8'b00000000
},
pattern_bytes_2: {
8'b00000000,
8'b00000000,
8'b00000000,
8'b00000000,
8'b00000000,
8'b00000000,
8'b00000000,
8'b00000000
}
};
assign pattern_table[1] = {
pattern_bytes_1: {
8'b00000011,
8'b00000000,
8'b00011110,
8'b00000100,
8'b00000010,
8'b00000000,
8'b00001111,
8'b00000001
},
pattern_bytes_2: {
8'b00000000,
8'b00001111,
8'b00000001,
8'b00111011,
8'b00011101,
8'b00000011,
8'b00001111,
8'b00000001
}
};
assign pattern_table[2] = {
pattern_bytes_1: {
8'b11110000,
8'b00000000,
8'b00011100,
8'b11001000,
8'b01001000,
8'b11110000,
8'b11110000,
8'b11100000
},
pattern_bytes_2: {
8'b00000000,
8'b11100000,
8'b11100000,
8'b00110000,
8'b10110000,
8'b00000000,
8'b11110000,
8'b11100000
}
};
assign pattern_table[3] = {
pattern_bytes_1: {
8'b00000111,
8'b00000110,
8'b00001111,
8'b00001111,
8'b00001111,
8'b00000111,
8'b00000111,
8'b00001111
},
pattern_bytes_2: {
8'b00000001,
8'b00000111,
8'b00001111,
8'b00001111,
8'b00001111,
8'b00000111,
8'b00000000,
8'b00000000
}
};
assign pattern_table[4] = {
pattern_bytes_1: {
8'b01111000,
8'b00111000,
8'b01111000,
8'b10011000,
8'b00011000,
8'b11110000,
8'b11110000,
8'b11110000
},
pattern_bytes_2: {
8'b00000000,
8'b00000000,
8'b01111000,
8'b11111000,
8'b11101000,
8'b10000000,
8'b11000000,
8'b10000000
}
};
assign pattern_table[5] = {
pattern_bytes_1: {
8'b1111_1111,
8'b1000_0001,
8'b1011_1101,
8'b1010_0101,
8'b1010_0101,
8'b1011_1101,
8'b1000_0001,
8'b1111_1111
},
pattern_bytes_2: {
8'b0000_0000,
8'b0111_1110,
8'b0111_1110,
8'b0111_1110,
8'b0111_1110,
8'b0111_1110,
8'b0111_1110,
8'b0000_0000
}
};
assign pattern_table[6] = {
pattern_bytes_1: {
8'b1111_1111,
8'b1111_1111,
8'b1111_1111,
8'b1111_1111,
8'b1111_1111,
8'b1111_1111,
8'b1111_1111,
8'b1111_1111
},
pattern_bytes_2: {
8'b0000_0000,
8'b0111_1110,
8'b0111_1110,
8'b0111_1110,
8'b0111_1110,
8'b0111_1110,
8'b0111_1110,
8'b0000_0000
}
};
end
end else if(NAMETABLE_LOWER_BOUND <= addr_in && addr_in < NAMETABLE_CHANGE) begin
data_out <= NAMETABLE_BYTE_1;
end
end
end
endmodule
PPUMemoryFetcher.sv
import PPU_Types::*;
module PPUMemoryFetcher(
input logic clock_in,
input logic reset_in,
// Internal state
OAMSprite current_sprite_to_fetch; // Internal copy of the current sprite to fetch
logic[2:0] sprite_write_pointer; // Contains the index into spritePixelsToRender in which the next loaded sprite will be stored
logic[7:0] background_tile_index; // This value is the tile index that is loaded from the nametable
logic[7:0] background_attribute_byte;
logic[1:0] background_attribute_bits; // These are the bits selected from the attribute byte that will be sent to the background module
logic[7:0] tile_byte_1; // Byte 1 from the pattern table for the tile being fetched
logic[7:0] tile_byte_2; // Byte 2 from the pattern table for the tile being fetched
//v_count has to be incremented by one if we are prefetching for the next line
logic [8:0] prefetch_v_count;
//the condition was added in to try and fix top scanline of prefetched tiles, may need to change condition to greater than
assign prefetch_v_count = ppu_v_count_in + 1 > 240 ? 0 : ppu_v_count_in + 1;
logic[15:0] nametable_addr;
// Translate vcount and hcount into a nametable address.
// Note: only supporting one nametable from addresses 0x2000-0x23BF
assign nametable_addr[15:10] = 6'b001000;
assign nametable_addr[9:5] = cycle_info_in.hstate == BACKGROUND_PREFETCH ?
prefetch_v_count[7:3] : ppu_v_count_in[7:3];
assign nametable_addr[4:0] = cycle_info_in.hstate == BACKGROUND_PREFETCH ? {3'b0,
ppu_h_count_in[3]}: (ppu_h_count_in[7:3] + 2);
ATTRIBUTE_FETCH: begin
vram_address_out <= attribute_addr; // TODO: calculate actual address here
PATTERN_FETCH_1: begin
/*
The address of tile byte 1 takes the form of:
MSB
4'b0000 - indicating pattern table 1
8 bits - the tile index
1'b0 - indicating that it is the first of two bytes defining the tile row
ppu_v_count_in[2:0] - the last three bits of ppu_v_count_in tell which row of the tile to read
LSB
*/
logic[15:0] tile_byte_1_addr;
tile_byte_1_addr[15:12] = 0;
tile_byte_1_addr[11:4] = background_tile_index;
tile_byte_1_addr[3] = 0;
tile_byte_1_addr[2:0] = cycle_info_in.hstate == BACKGROUND_PREFETCH ?
prefetch_v_count[2:0]:ppu_v_count_in[2:0];
vram_address_out <= tile_byte_1_addr;
end
end
end
PATTERN_FETCH_2: begin
/*
The address of tile byte 2 takes the form of:
MSB
4'b0000 - indicating pattern table 1
8 bits - the tile index
1'b1 - indicating that it is the second of two bytes defining the tile row
ppu_v_count_in[2:0] - the last three bits of ppu_v_count_in tell which row of the tile to read
LSB
*/
logic[15:0] tile_byte_2_addr;
tile_byte_2_addr[15:12] = 0;
tile_byte_2_addr[11:4] = background_tile_index;
tile_byte_2_addr[3] = 1;
tile_byte_2_addr[2:0] = cycle_info_in.hstate == BACKGROUND_PREFETCH ?
prefetch_v_count[2:0]:ppu_v_count_in[2:0];
vram_address_out <= tile_byte_2_addr;
// Store response from VRAM in the background tile byte 2 register
if(cycle_info_in.memory_fetch_state == VRAM_RECIEVE) begin
tile_byte_2 <= vram_data_in; // TODO don't really need this here, but useful for debugging
TILE_FETCH_IDLE: begin
// Should never be in this state
end
endcase
end else if(cycle_info_in.hstate == SPRITE_PREFETCH) begin
case(cycle_info_in.tile_fetch_state)
NAMETABLE_FETCH: begin
vram_address_out <= 0; // Nothing is fetched during this cycle for sprites
current_sprite_to_fetch <= sprite_in; // Latch the current sprite to fetch
end
ATTRIBUTE_FETCH: begin
vram_address_out <= 0; // Nothing is fetched during this cycle for sprites
end
PATTERN_FETCH_1: begin
/*
The address of tile byte 1 takes the form of:
MSB
4'b0000 - indicating pattern table 1
8 bits - the tile index
1'b0 - indicating that it is the first of two bytes defining the tile row
ppu_v_count_in[2:0] - the last three bits of ppu_v_count_in tell which row of the tile to read
LSB
*/
logic[15:0] tile_byte_1_addr;
tile_byte_1_addr[15:12] = 0;
tile_byte_1_addr[11:4] = sprite_in.tile_index;
tile_byte_1_addr[3] = 0;
tile_byte_1_addr[2:0] = prefetch_v_count[2:0]-sprite_in.y_position;
vram_address_out <= tile_byte_1_addr;
// Store response from VRAM in the background tile byte 1 register
if(cycle_info_in.memory_fetch_state == VRAM_RECIEVE) begin
tile_byte_1 <= vram_data_in;
end
end
PATTERN_FETCH_2: begin
/*
The address of tile byte 2 takes the form of:
MSB
4'b0000 - indicating pattern table 1
8 bits - the tile index
1'b1 - indicating that it is the second of two bytes defining the tile row
ppu_v_count_in[2:0] - the last three bits of ppu_v_count_in tell which row of the tile to read
LSB
*/
logic[15:0] tile_byte_2_addr;
tile_byte_2_addr = {4'b0, sprite_in.tile_index, 1'b1,
ppu_v_count_in[2:0]};
tile_byte_2_addr[15:12] = 0;
tile_byte_2_addr[11:4] = sprite_in.tile_index;
tile_byte_2_addr[3] = 1;
tile_byte_2_addr[2:0] = prefetch_v_count-sprite_in.y_position;
vram_address_out <= tile_byte_2_addr;
if(current_sprite_to_fetch == 0) begin
sprites_to_render_out[sprite_write_pointer] <= 0;
x_position: sprite_in.x_position
};
end
end
TILE_FETCH_IDLE: begin
// Should never be in this state
end
endcase
end
// Clear the secondary oam on one of the background prefetch cycles to reset it for the next scanline
secondary_oam_clear_out <= ppu_h_count_in ==
VISIBLE_SCANLINE_LAST_BACKGROUND_PREFETCH_CYCLE;
end
end
endmodule
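The pattern table address layout described in the comments of PPUMemoryFetcher (4 zero bits, the 8-bit tile index, one bit selecting the first or second byte, and a 3-bit tile row) is built field by field in four places. As a sketch only, the same layout could be factored into one function. The package and function names below are hypothetical and do not appear in the original design.

```systemverilog
// Hypothetical helper, not part of the original PPUMemoryFetcher.
package pattern_addr_sketch;
    function automatic logic[15:0] pattern_addr(
        input logic[7:0] tile_index,  // tile number from the nametable or OAM
        input logic      second_byte, // 0 = tile byte 1, 1 = tile byte 2
        input logic[2:0] row          // row of the 8x8 tile to read
    );
        // MSB: 4'b0000 (pattern table 1), tile index, byte select, row :LSB
        return {4'b0000, tile_index, second_byte, row};
    endfunction
endpackage

module pattern_addr_sketch_tb;
    import pattern_addr_sketch::*;
    initial begin
        // Tile 8'h12, second byte, row 5 -> 16'b0000_0001_0010_1101
        assert (pattern_addr(8'h12, 1'b1, 3'd5) == 16'h012D);
        $display("%4h", pattern_addr(8'h12, 1'b1, 3'd5));
    end
endmodule
```

A single concatenation also avoids the situation in the sprite PATTERN_FETCH_2 branch, where the address is first assigned as a whole and then overwritten field by field.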
Background.sv
import PPU_Types::*;
module background( input clock,
input reset,
input BackgroundDataToRender background_data_to_render_in,
input new_data,
//input[2:0] fineX,
input [8:0] hcount_in,
input [8:0] vcount_in,
input PPUCycleInfo ppu_cycle_info_in,
);
logic [15:0] PatternReg1;
logic [15:0] PatternReg2;
logic [7:0] attribute1;
logic [7:0] attribute2;
logic new_attribute1;
logic new_attribute2;
Sprite_pixels.sv
`timescale 1ns / 1ps
import PPU_Types::*;
module sprite_pixels(
input clock,
input reset,
input [8:0] hcount_in,
input [8:0] vcount_in,
input SpriteDataToRender[7:0] sprites_data_in,
/*
// 2 pattern table bytes
logic[7:0] tile_byte_1;
logic[7:0] tile_byte_2;
logic[7:0] x_position;
*/
input PPUCycleInfo cycle_info_in,
//sprite_pixel_data bit 5 background priority, bit 4 = 1 to direct to sprite palettes,
//bit 3&2 attribute palette data, bit 1&0 color data from pattern tables
output logic [5:0] sprite_pixel_data,
output logic [8:0] hcount_out,
output logic [8:0] vcount_out,
output logic drawing
);
//creating 8 latches with attribute info of sprites bit 0&1 color data bit 2 background priority
logic [1:0] attribute_sprite0;
logic [1:0] attribute_sprite1;
logic [1:0] attribute_sprite2;
logic [1:0] attribute_sprite3;
logic [1:0] attribute_sprite4;
logic [1:0] attribute_sprite5;
logic [1:0] attribute_sprite6;
logic [1:0] attribute_sprite7;
//creating 16 8-bit shift registers for pattern table bytes, 2 for each sprite
logic [7:0] first_pattern_sprite0;
logic [7:0] first_pattern_sprite1;
logic [7:0] first_pattern_sprite2;
logic [7:0] first_pattern_sprite3;
logic [7:0] first_pattern_sprite4;
logic [7:0] first_pattern_sprite5;
logic [7:0] first_pattern_sprite6;
logic [7:0] first_pattern_sprite7;
logic [7:0] second_pattern_sprite0;
logic [7:0] second_pattern_sprite1;
logic [7:0] second_pattern_sprite2;
logic [7:0] second_pattern_sprite3;
logic [7:0] second_pattern_sprite4;
logic [7:0] second_pattern_sprite5;
logic [7:0] second_pattern_sprite6;
logic [7:0] second_pattern_sprite7;
logic priority0;
logic priority1;
logic priority2;
logic priority3;
logic priority4;
logic priority5;
logic priority6;
logic priority7;
always_comb begin
if (sprite0_pixel != 6'b0 && sprite0_pixel != 6'b010000) begin
send_out = sprite0_pixel;
end else if (sprite1_pixel != 6'b0 && sprite1_pixel != 6'b010000) begin
send_out = sprite1_pixel;
end else if (sprite2_pixel != 6'b0 && sprite2_pixel != 6'b010000) begin
send_out = sprite2_pixel;
end else if (sprite3_pixel != 6'b0 && sprite3_pixel != 6'b010000) begin
send_out = sprite3_pixel;
end else if (sprite4_pixel != 6'b0 && sprite4_pixel != 6'b010000) begin
send_out = sprite4_pixel;
end else if (sprite5_pixel != 6'b0 && sprite5_pixel != 6'b010000) begin
send_out = sprite5_pixel;
end else if (sprite6_pixel != 6'b0 && sprite6_pixel != 6'b010000) begin
send_out = sprite6_pixel;
end else if (sprite7_pixel != 6'b0 && sprite7_pixel != 6'b010000) begin
send_out = sprite7_pixel;
end else begin
send_out = 6'b0;
end
if (hcount_in == 0) begin
send_out =6'b0;
end
end
attribute_sprite0 <= 0;
attribute_sprite1 <= 0;
attribute_sprite2 <= 0;
attribute_sprite3 <= 0;
attribute_sprite4 <= 0;
attribute_sprite5 <= 0;
attribute_sprite6 <= 0;
attribute_sprite7 <= 0;
first_pattern_sprite0 <= 0;
first_pattern_sprite1 <= 0;
first_pattern_sprite2 <= 0;
first_pattern_sprite3 <= 0;
first_pattern_sprite4 <= 0;
first_pattern_sprite5 <= 0;
first_pattern_sprite6 <= 0;
first_pattern_sprite7 <= 0;
second_pattern_sprite0 <= 0;
second_pattern_sprite1 <= 0;
second_pattern_sprite2 <= 0;
second_pattern_sprite3 <= 0;
second_pattern_sprite4 <= 0;
second_pattern_sprite5 <= 0;
second_pattern_sprite6 <= 0;
second_pattern_sprite7 <= 0;
sprite0_pixel <= 0;
sprite1_pixel <= 0;
sprite2_pixel <= 0;
sprite3_pixel <= 0;
sprite4_pixel <= 0;
sprite5_pixel <= 0;
sprite6_pixel <= 0;
sprite7_pixel <= 0;
priority0 <= 0;
priority1 <= 0;
priority2 <= 0;
priority3 <= 0;
priority4 <= 0;
priority5 <= 0;
priority6 <= 0;
priority7 <= 0;
end
//calculating if a sprite is there and drawing that pixel
else if (hcount_in < VISIBLE_SCANLINE_LAST_BACKGROUND_DRAW_CYCLE)
begin //2 is added because of clockcycle offset
if (hcount_in >= location_sprite7 && hcount_in < location_sprite7
+ 8'd8 ) begin
sprite7_pixel <= {priority7, 1'b1,
attribute_sprite7,second_pattern_sprite7[7],first_pattern_sprite7[7]};
first_pattern_sprite7 <= {first_pattern_sprite7[6:0],1'b0};
second_pattern_sprite7 <= {second_pattern_sprite7[6:0],1'b0};
end else begin
sprite7_pixel <= 6'b0;
end
if (hcount_in >= location_sprite6 && hcount_in < location_sprite6
+ 8'd8 ) begin
sprite6_pixel <= {priority6, 1'b1,
attribute_sprite6,second_pattern_sprite6[7],first_pattern_sprite6[7]};
first_pattern_sprite6 <= {first_pattern_sprite6[6:0],1'b0};
second_pattern_sprite6 <= {second_pattern_sprite6[6:0],1'b0};
end else begin
sprite6_pixel <= 6'b0;
end
if (hcount_in >= location_sprite5 && hcount_in < location_sprite5
+ 8'd8 ) begin
sprite5_pixel <= {priority5, 1'b1,
attribute_sprite5,second_pattern_sprite5[7],first_pattern_sprite5[7]};
first_pattern_sprite5 <= {first_pattern_sprite5[6:0],1'b0};
second_pattern_sprite5 <= {second_pattern_sprite5[6:0],1'b0};
end else begin
sprite5_pixel <= 6'b0;
end
if (hcount_in >= location_sprite4 && hcount_in < location_sprite4
+ 8'd8 ) begin
sprite4_pixel <= {priority4, 1'b1,
attribute_sprite4,second_pattern_sprite4[7],first_pattern_sprite4[7]};
first_pattern_sprite4 <= {first_pattern_sprite4[6:0],1'b0};
second_pattern_sprite4 <= {second_pattern_sprite4[6:0],1'b0};
end else begin
sprite4_pixel <= 6'b0;
end
if (hcount_in >= location_sprite3 && hcount_in < location_sprite3
+ 8'd8 ) begin
sprite3_pixel <= {priority3, 1'b1,
attribute_sprite3,second_pattern_sprite3[7],first_pattern_sprite3[7]};
first_pattern_sprite3 <= {first_pattern_sprite3[6:0],1'b0};
second_pattern_sprite3 <= {second_pattern_sprite3[6:0],1'b0};
end else begin
sprite3_pixel <= 6'b0;
end
if (hcount_in >= location_sprite2 && hcount_in < location_sprite2
+ 8'd8 ) begin
sprite2_pixel <= {priority2, 1'b1,
attribute_sprite2,second_pattern_sprite2[7],first_pattern_sprite2[7]};
first_pattern_sprite2 <= {first_pattern_sprite2[6:0],1'b0};
second_pattern_sprite2 <= {second_pattern_sprite2[6:0],1'b0};
end else begin
sprite2_pixel <= 6'b0;
end
if (hcount_in >= location_sprite1 && hcount_in < location_sprite1
+ 8'd8 ) begin
sprite1_pixel <= {priority1, 1'b1,
attribute_sprite1,second_pattern_sprite1[7],first_pattern_sprite1[7]};
Pixel_mux.sv
module pixel_mux(
input clock,
input reset,
input [8:0] sprite_hcount,
input [8:0] sprite_vcount,
input [5:0] sprite_pixel_data,
input [4:0] background_pixel_data,
//possibly need this input
input background_drawing,
input sprite_drawing,
PaletteRAM.sv
module PaletteRAM(
input logic clock_in,
input logic reset_in,
PPUColor transparent_color;
assign transparent_color = 6'h21; // light blue (color_map[33] = 12'h3BF)
PPUColor[3:0][3:0] background_palettes;
assign background_palettes = {
{6'h08, 6'h18, 6'h17, transparent_color},
{6'h18, 6'h17, 6'h08, transparent_color},
{6'h17, 6'h08, 6'h18, transparent_color},
{6'h06, 6'h07, 6'h17, transparent_color}
};
PPUColor[3:0][3:0] sprite_palettes;
assign sprite_palettes = {
{6'h06, 6'h37, 6'h02, transparent_color},
{6'h06, 6'h37, 6'h02, transparent_color},
{6'h06, 6'h37, 6'h02, transparent_color},
{6'h06, 6'h37, 6'h02, transparent_color}
};
// This provides a one-stage pipeline so that the position information follows the pixel information correctly
pixel_h_count_out <= pixel_h_count_in;
pixel_v_count_out <= pixel_v_count_in;
end
end
endmodule
FrameBuffer.sv
import PPU_Types::*;
/*
Module notes:
This module serves as a frame buffer between the output of the PPU and the input to the VGA module.
This module contains two memories that each store an entire frame of video. While the PPU is writing
the next frame to one buffer, the VGA module is reading the current frame data from the other.
Note that the output pixel data lags the VGA v and h count by one clock cycle.
Also note that vga_v_count_in and ppu_v_count_in should range from 0 to 239 inclusive and
vga_h_count_in and ppu_h_count_in should range from 0 to 255 inclusive.
*/
module FrameBuffer(
input logic reset_in,
PPUColor buffer1_pixel;
ppu_frame_buffer buffer1(
.clka(frame_num_in ? vga_clock_in : ppu_clock_in),
.wea(~frame_num_in), // Write to this frame buffer when frame_num_in is low
.addra(frame_num_in ? video_out_read_addr : ppu_write_addr),
.dina(pixel_color_in),
.douta(buffer1_pixel)
);
PPUColor buffer2_pixel;
ppu_frame_buffer buffer2(
.clka(frame_num_in ? ppu_clock_in : vga_clock_in),
.wea(frame_num_in), // Write to this frame buffer when frame_num_in is high
.addra(frame_num_in ? ppu_write_addr : video_out_read_addr),
.dina(pixel_color_in),
.douta(buffer2_pixel)
);
endmodule
PPUColorMapper.sv
import PPU_Types::*;
module PPUColorMapper(
input PPUColor ppu_color_in,
// Buckle up
// There's probably a better way to do this, but idk
logic[11:0] color_map[63:0]; // 64 x 12 bit array
assign color_map[0] = 12'h777; // 0x00
assign color_map[1] = 12'h00F;
assign color_map[2] = 12'h00B;
assign color_map[3] = 12'h42B;
assign color_map[4] = 12'h908;
assign color_map[5] = 12'hA02;
assign color_map[6] = 12'hA10;
assign color_map[7] = 12'h810;
assign color_map[8] = 12'h530;
assign color_map[9] = 12'h070;
assign color_map[10] = 12'h060;
assign color_map[11] = 12'h050;
assign color_map[12] = 12'h045;
assign color_map[13] = 12'h000;
assign color_map[14] = 12'h000;
assign color_map[15] = 12'h000; // 0x0F
assign color_map[16] = 12'hBBB; // 0x10
assign color_map[17] = 12'h07F;
assign color_map[18] = 12'h05F;
assign color_map[19] = 12'h64F;
assign color_map[20] = 12'hD0C;
assign color_map[21] = 12'hE05;
assign color_map[22] = 12'hF30;
assign color_map[23] = 12'hE51;
assign color_map[24] = 12'hA70;
assign color_map[25] = 12'h0B0;
assign color_map[26] = 12'h0A0;
assign color_map[27] = 12'h0A4;
assign color_map[28] = 12'h088;
assign color_map[29] = 12'h000;
assign color_map[30] = 12'h000;
assign color_map[31] = 12'h000; // 0x1F
assign color_map[32] = 12'hFFF; // 0x20
assign color_map[33] = 12'h3BF;
assign color_map[34] = 12'h68F;
assign color_map[35] = 12'h97F;
assign color_map[36] = 12'hF7F;
assign color_map[37] = 12'hF59;
assign color_map[38] = 12'hF75;
assign color_map[39] = 12'hFA4;
assign color_map[40] = 12'hFB0;
assign color_map[41] = 12'hBF1;
assign color_map[42] = 12'h5D5;
assign color_map[43] = 12'h5F9;
assign color_map[44] = 12'h0ED;
assign color_map[45] = 12'h777;
assign color_map[46] = 12'h000;
assign color_map[47] = 12'h000; // 0x2F
assign color_map[48] = 12'hFFF; // 0x30
assign color_map[49] = 12'hAEF;
assign color_map[50] = 12'hBBF;
assign color_map[51] = 12'hDBF;
assign color_map[52] = 12'hFBF;
assign color_map[53] = 12'hFAC;
assign color_map[54] = 12'hFDB;
assign color_map[55] = 12'hFEA;
assign color_map[56] = 12'hFD7;
assign color_map[57] = 12'hDF7;
assign color_map[58] = 12'hBFB;
assign color_map[59] = 12'hBFD;
assign color_map[60] = 12'h0FF;
assign color_map[61] = 12'hDDD;
assign color_map[62] = 12'h000;
assign color_map[63] = 12'h000; // 0x3F
endmodule
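The extracted listing above does not show the module's output port or the final table read, so how the lookup is performed is not visible here. As a minimal sketch, assuming PPUColor is a 6-bit palette index and each 12-bit entry packs 4 bits each of red, green, and blue, the same mapping could be written as a combinational case statement. The module and port names (ColorLookupSketch, color_index_in, rgb_out) are hypothetical, and only the first four table entries are reproduced.

```systemverilog
// Hypothetical sketch, not the project's PPUColorMapper.
// Assumes a 6-bit palette index and 12-bit RGB (4 bits per channel).
module ColorLookupSketch(
    input  logic [5:0]  color_index_in, // NES palette index, 0x00 to 0x3F
    output logic [11:0] rgb_out         // assumed layout: {red, green, blue}
);
    always_comb begin
        case (color_index_in)
            6'h00:   rgb_out = 12'h777; // values copied from color_map[0..3]
            6'h01:   rgb_out = 12'h00F;
            6'h02:   rgb_out = 12'h00B;
            6'h03:   rgb_out = 12'h42B;
            default: rgb_out = 12'h000; // remaining 60 entries omitted here
        endcase
    end
endmodule
```

A case statement with a default keeps every index defined, whereas the array of continuous assignments relies on all 64 entries being listed.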
VGA.sv
//////////////////////////////////////////////////////////////////////////////////
// Update: 8/8/2019 GH
// Create Date: 10/02/2015 02:05:19 AM
// Module Name: xvga
//
// xvga: Generate VGA display signals (1024 x 768 @ 60Hz)
//
//                          ---- HORIZONTAL -----    ------ VERTICAL -----
//                         Active                   Active
//                  Freq    Video   FP  Sync   BP    Video   FP  Sync   BP
// 640x480, 60Hz    25.175    640   16    96   48      480   11     2   31
// 800x600, 60Hz    40.000    800   40   128   88      600    1     4   23
// 1024x768, 60Hz   65.000   1024   24   136  160      768    3     6   29
// 1280x1024, 60Hz  108.00   1280   48   112  248     1024    1     3   38
// 1280x720p 60Hz   75.25    1280   72    80  216      720    3     5   30
// 1920x1080 60Hz   148.5    1920   88    44  148     1080    4     5   36
//
// Freq is the pixel clock in MHz.
//
// change the clock frequency, front porches, sync's, and back porches to create
// other screen resolutions
////////////////////////////////////////////////////////////////////////////////
module vga_1024(input vclock_in,
output reg [10:0] hcount_out, // pixel number on current line
output reg [9:0] vcount_out, // line number
output reg vsync_out, hsync_out,
output reg blank_out);
endmodule
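The body of vga_1024 is not shown in the listing above. As a rough sketch of how the 1024x768 row of the timing table could drive the counters and sync outputs, the module below sums active video, front porch, sync, and back porch to get 1344 clocks per line and 806 lines per frame. The module name vga_timing_sketch, the active-low sync polarity, the combinational (unregistered) sync outputs, and the use of declaration initializers in place of a reset are all assumptions, not taken from the project source.

```systemverilog
// Hypothetical sketch of 1024x768 @ 60 Hz timing; not the project's vga_1024 body.
module vga_timing_sketch(
    input  logic        vclock_in,        // 65 MHz pixel clock per the table
    output logic [10:0] hcount_out = '0,  // pixel number on current line
    output logic [9:0]  vcount_out = '0,  // line number
    output logic        vsync_out, hsync_out,
    output logic        blank_out
);
    // Values from the 1024x768 row of the timing table
    localparam int H_ACTIVE = 1024, H_FP = 24, H_SYNC = 136, H_BP = 160;
    localparam int V_ACTIVE = 768,  V_FP = 3,  V_SYNC = 6,   V_BP = 29;
    localparam int H_TOTAL = H_ACTIVE + H_FP + H_SYNC + H_BP; // 1344
    localparam int V_TOTAL = V_ACTIVE + V_FP + V_SYNC + V_BP; // 806

    // Advance the pixel counter every clock and the line counter at end of line
    always_ff @(posedge vclock_in) begin
        if (hcount_out == H_TOTAL - 1) begin
            hcount_out <= 11'd0;
            vcount_out <= (vcount_out == V_TOTAL - 1) ? 10'd0 : vcount_out + 10'd1;
        end else begin
            hcount_out <= hcount_out + 11'd1;
        end
    end

    // Sync pulses follow the front porch; assumed active low
    assign hsync_out = ~((hcount_out >= H_ACTIVE + H_FP) &&
                         (hcount_out <  H_ACTIVE + H_FP + H_SYNC));
    assign vsync_out = ~((vcount_out >= V_ACTIVE + V_FP) &&
                         (vcount_out <  V_ACTIVE + V_FP + V_SYNC));

    // Blank whenever either counter is outside the active video region
    assign blank_out = (hcount_out >= H_ACTIVE) || (vcount_out >= V_ACTIVE);
endmodule
```

Targeting another resolution would mean swapping in a different row of the table and the matching pixel clock, and widening the counters if the totals exceed their range.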
PPUTypes.sv
/*
Enums and constants used for the PPU
*/
package PPU_Types;
parameter NUM_VERTICAL_PIXELS = 240; // The number of pixels tall the drawn window is
parameter NUM_HORIZONTAL_PIXELS = 256; // The number of pixels wide the drawn window is
// The boundaries for the different cycle phases during the visible scanlines
parameter VISIBLE_SCANLINE_LAST_BACKGROUND_DRAW_CYCLE = 255; // The number of the last cycle of the background draw phase
parameter VISIBLE_SCANLINE_LAST_SPRITE_PREFETCH_CYCLE = 319; // The number of the last cycle of the sprite data prefetch phase
parameter VISIBLE_SCANLINE_LAST_BACKGROUND_PREFETCH_CYCLE = 335; // The number of the last cycle of the background data prefetch phase
parameter VISIBLE_SCANLINE_LAST_IDLE_CYCLE = 340; // The number of the last cycle of the ending idle phase
/*
VISIBLE: scanlines [0,239]
POSTRENDER: scanline 240
VBLANK: scanlines [241,260]
PRERENDER: scanline 261
*/
typedef enum {
PRERENDER, VISIBLE, POSTRENDER, VBLANK
} VerticalState;
/*
BACKGROUND_DRAW: When the PPU is drawing the background. Cycles [0,255]
SPRITE_PREFETCH: When the PPU is pre-fetching the sprite data for the next scanline. Cycles
[256,319]
BACKGROUND_PREFETCH: When the PPU is pre-fetching the background tiles for the next scanline.
Cycles [320,335]
HORIZONTAL_IDLE: When the PPU isn't actually drawing anything. Cycles [336,340]
*/
typedef enum {
HORIZONTAL_IDLE, BACKGROUND_DRAW, SPRITE_PREFETCH, BACKGROUND_PREFETCH
} HorizontalState;
/*
TILE_FETCH_IDLE: the PPU isn't fetching tile data
NAMETABLE_FETCH: the PPU is fetching data from a nametable
ATTRIBUTE_FETCH: the PPU is fetching data from an attribute table
PATTERN_FETCH_1: the PPU is fetching the first byte of tile data
PATTERN_FETCH_2: the PPU is fetching the second byte of tile data
*/
typedef enum {
NAMETABLE_FETCH, ATTRIBUTE_FETCH, PATTERN_FETCH_1, PATTERN_FETCH_2, TILE_FETCH_IDLE
} TileFetchState;
/*
MEMORY_FETCH_IDLE: the PPU isn't fetching data from VRAM
VRAM_REQUEST: the PPU is asserting an address to the VRAM
VRAM_RECIEVE: the PPU is reading the data returned by the VRAM
*/
typedef enum {
VRAM_REQUEST, VRAM_RECIEVE, MEMORY_FETCH_IDLE
} MemoryFetchState;
/*
This is the data type representing a sprite as it is stored in Object Attribute Memory
(OAM)
*/
typedef struct packed {
logic[7:0] x_position;
logic[7:0] y_position;
logic[7:0] tile_index;
// Attribute byte
logic[1:0] color_data;
logic background_priority;
logic horizontal_flip;
logic vertical_flip;
} OAMSprite;
/*
This is the data type defining a pixel from a sprite as it is passed to the rendering
module
*/
typedef struct packed {
// 2 pattern table bytes
logic[7:0] tile_byte_1;
logic[7:0] tile_byte_2;
logic[7:0] x_position;
} SpriteDataToRender;
/*
This is the data type defining a background pixel as it is passed to the rendering module
*/
typedef struct packed {
// 2 pattern table bytes
logic[7:0] tile_byte_1;
logic[7:0] tile_byte_2;
logic[1:0] attributes;
} BackgroundDataToRender;
endpackage
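To illustrate how the package's constants and enums fit together, the sketch below classifies a scanline number and a cycle number into VerticalState and HorizontalState. It follows the parameter values for the horizontal boundaries and the scanline ranges given in the VerticalState comment. The module name PPUStateSketch, its port names, the 9-bit counter widths, and the hard-coded last VBLANK scanline (260) are assumptions; this is not the project's PPUStateCalculator, and the horizontal classification is only meant for visible scanlines.

```systemverilog
import PPU_Types::*;

// Hypothetical sketch, not the project's PPUStateCalculator.
module PPUStateSketch(
    input  logic [8:0]     scanline_in, // assumed range 0 to 261
    input  logic [8:0]     cycle_in,    // assumed range 0 to 340
    output VerticalState   v_state_out,
    output HorizontalState h_state_out
);
    localparam int LAST_VBLANK_SCANLINE = 260; // from the VerticalState comment

    always_comb begin
        // Vertical phase: VISIBLE [0,239], POSTRENDER 240, VBLANK [241,260], PRERENDER 261
        if (scanline_in < NUM_VERTICAL_PIXELS)
            v_state_out = VISIBLE;
        else if (scanline_in == NUM_VERTICAL_PIXELS)
            v_state_out = POSTRENDER;
        else if (scanline_in <= LAST_VBLANK_SCANLINE)
            v_state_out = VBLANK;
        else
            v_state_out = PRERENDER;

        // Horizontal phase within a visible scanline, using the package boundaries
        if (cycle_in <= VISIBLE_SCANLINE_LAST_BACKGROUND_DRAW_CYCLE)
            h_state_out = BACKGROUND_DRAW;
        else if (cycle_in <= VISIBLE_SCANLINE_LAST_SPRITE_PREFETCH_CYCLE)
            h_state_out = SPRITE_PREFETCH;
        else if (cycle_in <= VISIBLE_SCANLINE_LAST_BACKGROUND_PREFETCH_CYCLE)
            h_state_out = BACKGROUND_PREFETCH;
        else
            h_state_out = HORIZONTAL_IDLE;
    end
endmodule
```

Comparing against the named "last cycle" parameters rather than literal numbers keeps the phase boundaries defined in one place.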
Sources
Diskin, Patrick. “Nintendo Entertainment System Documentation.” nesdev.com, 1.0, August 2004.
http://www.nesdev.com/NESDoc.pdf