FPGA NES

Behold: a complete Nintendo Entertainment System cloned in an FPGA! Brent Allen and I originally wrote it in VHDL while at Washington State University. I’ve recently revisited the project and begun both rewriting it in Verilog and adding many new features (like support for more complex games requiring memory mappers).

History

Way back in 2006, Brent and I found ourselves taking a digital logic course. We’d already succeeded in vastly over-complicating the penultimate lab (a simple single-channel frequency synthesizer, which we each extended to support multichannel MIDI playback in an FPGA – but I digress). So, what were we to do with all of our spare time? Clone a NES, of course!

We didn’t know exactly what we were getting ourselves into. We knew it was possible, though not commonly done; at the time, only one person – Kevin Horton – had posted any information about successfully doing so (I’m aware of at least one more now: Jonathon Donaldson’s VeriNES).

There was a lot going for the project: emulator developers had already done an incredible job of reverse-engineering every piece of the NES, to the point of having cycle-accurate behavioral descriptions of the majority of the NES’ internal logic. The NES’ CPU was a 6502 derivative, for which several open-source HDL implementations existed (though this proved to be somewhat of a red herring, as only one of those implementations proved sufficiently accurate to run even a basic game – and we still had to correct a few bugs and add missing instructions to it).

Brent took on the task of developing the audio-processing unit (APU) and integrating the existing 6502 CPU core, while I set to work on implementing the picture-processing unit (PPU).

Just a couple of weeks (and many pots of coffee) later, we had ourselves a working NES – or, at least, a NES that worked well enough to play Super Mario Bros. We were ecstatic – and, perhaps, somewhat behind on our other coursework.

Playing SMB was as far as we took the original project. At the time, we weren’t terribly anxious to implement a whole slew of mappers, as that could have easily been a far larger task than the whole rest of the project (having now done so, I can say that it absolutely is [more difficult]!).

New Developments

I recently made the mistake of browsing through my old VHDL from this project. I knew it wasn’t great when I originally wrote it, but – after having become a Verilog convert and spending a few years gainfully employed as an FPGA developer – seeing it again just made me cringe. A complete re-write was in order.

Thus far, I’ve made a bunch of enhancements:

  • Completely re-wrote the PPU in Verilog (and fixed all sorts of obscure bugs in the process – and, by fixed, I mean that I actually implemented many of the NES’ quirks and bugs)
  • Re-wrote the top-level and various wrappers in Verilog, leaving just the APU and CPU core in VHDL (eventually, they too will be re-written)
  • Moved all RAM and ROM storage into an external PSRAM, which made using mappers practical (previously, everything had to fit within the FPGA’s very limited 48KB of RAM)
  • Implemented an iNES parser
  • Implemented a plethora of mappers
  • Implemented an EPP interface to the Nexys’ USB microcontroller, so games could be loaded over USB (previously, they had to be stored within the FPGA’s block RAM when the bitstream file was generated)
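
For readers unfamiliar with the iNES format mentioned in the list above, here is a rough Python sketch of what decoding its 16-byte header involves. It is not the parser used in this project; the names `INesHeader` and `parse_ines_header` are made up for illustration, and only the basic iNES 1.0 fields are handled.

```python
from dataclasses import dataclass

INES_MAGIC = b"NES\x1a"   # 4-byte signature at the start of an iNES file
HEADER_SIZE = 16
PRG_BANK = 16 * 1024      # PRG ROM size is stored in 16 KB units
CHR_BANK = 8 * 1024       # CHR ROM size is stored in 8 KB units


@dataclass
class INesHeader:
    prg_rom_bytes: int
    chr_rom_bytes: int
    mapper: int
    vertical_mirroring: bool
    has_battery: bool
    has_trainer: bool


def parse_ines_header(data: bytes) -> INesHeader:
    """Decode the 16-byte iNES header at the start of `data`."""
    if len(data) < HEADER_SIZE or data[:4] != INES_MAGIC:
        raise ValueError("not an iNES image")
    flags6, flags7 = data[6], data[7]
    return INesHeader(
        prg_rom_bytes=data[4] * PRG_BANK,
        chr_rom_bytes=data[5] * CHR_BANK,
        # Mapper number: low nibble in flags 6, high nibble in flags 7.
        mapper=(flags7 & 0xF0) | (flags6 >> 4),
        vertical_mirroring=bool(flags6 & 0x01),
        has_battery=bool(flags6 & 0x02),
        has_trainer=bool(flags6 & 0x04),
    )


# Synthetic header for illustration: 2 PRG banks, 1 CHR bank, mapper 0.
example = parse_ines_header(b"NES\x1a" + bytes([2, 1, 0x01, 0x00]) + bytes(8))
```

The mapper number is the field that decides which of the many bank-switching schemes the cartridge expects, which is why a parser like this has to run before a game can be loaded.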

There is, as always, much more to do. But, it now plays Super Mario Bros. 3 (and a rather lot of other non-trivial games), so I’m pretty satisfied.

Top items on my to-do list include: fleshing out the project description here, and packaging some of the source-code for release (particularly the PPU, since it’s in the best shape, and likely of the greatest interest – being the single trickiest part in the NES and all).

FPGA NES block diagram

FPGA NES

All of the hardware required for this project is readily available from commercial sources (mostly Digilent) – no custom boards required (though Brent and I have long intended to design a cartridge interface board, so we could test our clone against the real thing):

  • Nexys FPGA board
  • VGA interface
  • Speaker
  • NES controller interface

38 Responses to FPGA NES

  1. Jack Gassett says:

    Hello Dan,

    I really like the work you’ve done with this NES project. I’ve been working on an Open Source hardware platform for classic arcade game projects just like this one. It might be a better, lower cost option for your project than the Digilent board. If you have some time please take a look at my project and send me an email. It would be great to chat and see if we can figure out a way to collaborate.
    http://gadgetforge.gadgetfactory.net/gf/project/papilio_arcade/

    Thanks,
    Jack.

    • Dan says:

      Nice platform you have there, Jack! Certainly, the NES could be ported to run on it.

      The biggest limitation is that even the XC3S500E is a bit short on block-RAM; before adding support for the Nexys board’s generous external RAM, the NES was using every single BRAM in a XC3S1000. For a typical mapper-less NES game (e.g. Super Mario Bros.), you need a minimum of 44 KB (32 KB for program ROM, 8 KB for graphics ROM, 2 KB for CPU RAM, and 2 KB for video RAM). Though, there are some very simple games that only need 16 KB of PRG ROM.

      The XC3S500E, of course, only has 40 KB of BRAM – unless you get really creative (like using the spare parity bits in the BRAM to store additional bytes of data – hmm.. I may just have to try that; it would save ~1 BRAM off of the PRG ROM). Distributed RAM could be used, too, at great expense to the available logic resources.
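
The arithmetic behind those numbers can be checked with a short Python sketch. The block-RAM organisation used here (20 blocks of 18 Kbit, of which 16 Kbit is data and 2 Kbit is parity) is an assumption taken from the usual Spartan-3E figures rather than from this thread, and all of the names are illustrative.

```python
import math

KB = 1024

# Memory needed by a mapper-less game, per the figures above.
needs = {
    "PRG ROM": 32 * KB,
    "CHR ROM": 8 * KB,
    "CPU RAM": 2 * KB,
    "video RAM": 2 * KB,
}
total_needed = sum(needs.values())             # 44 KB

# Assumption: 20 block RAMs of 18 Kbit each (16 Kbit data + 2 Kbit parity).
BRAM_COUNT = 20
BRAM_DATA_BYTES = 16 * 1024 // 8               # 2048 bytes, parity unused
BRAM_ALL_BYTES = 18 * 1024 // 8                # 2304 bytes, parity holds data

plain_capacity = BRAM_COUNT * BRAM_DATA_BYTES  # 40 KB
shortfall = total_needed - plain_capacity      # 4 KB short

# Block RAMs needed for the 32 KB PRG ROM, without and with the parity trick.
prg_brams_plain = math.ceil(needs["PRG ROM"] / BRAM_DATA_BYTES)
prg_brams_parity = math.ceil(needs["PRG ROM"] / BRAM_ALL_BYTES)
```

Under these assumptions the PRG ROM drops from 16 block RAMs to 15, which matches the "~1 BRAM" estimate. The sketch ignores the extra address logic needed to pack bytes into 9-bit words.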

      Of course, if you did wind up producing a NES cartridge adapter, then all of these RAM concerns would be nullified! ;)

      Anyway – you’ve piqued my interest. When I find some spare time, I’ll see if I can get the NES (and a game) to fit in a XC3S500E.

      • Jack Gassett says:

        I was wondering if the program ROMs could be run out of the SPI Flash memory? I include a 4Mbit SPI Flash chip on the board and only use 2.2 Mbit for the configuration bitstream. The big question is whether the SPI Flash would be fast enough. The NES is pretty old and I have received reports that the SST chip used is very fast.
        Quote from Alvaro who is working on ZPUino and did some speed tests:
        “Is that an SST25VF* flash ? If so, then looks perfect :P
        Quite fast actually. That one above is a full 48MHz clock, with dual-edge (setup on rising and sampling on falling [but clock inverted on output]).”

        So maybe it is possible to use SPI Flash as Program ROM?

        Jack.

      • Dan says:

        Now that’s an intriguing notion! Tricky, though, even with something as slow as the NES.

        I ran a few quick calculations: just looking at the CPU for now, it runs at 1.79 MHz, and expects to be able to complete a read in 1 cycle. That gives us, ideally, a bit over 550ns to complete a read (sounds like an eternity in this age of sub-nanosecond logic!).

        Since reads can be entirely random, we can’t do any sort of burst reads from the SPI flash*. Each byte will require a whole new read command/address to be issued. Conventional SPI flash takes a 40-bit transaction to read a single byte (8-bit command, 24-bit address, and 8-bits of data). Devices that support a high-speed mode can require more bits (the ones I’ve looked at so far insert an extra dummy byte, so you wind up having to transfer 48-bits.. 8 command, 24 address, 8 dummy and 8 data).

        Ignoring additional timing constraints (minimum CE high-times, etc.): 40 bits in 550ns is upwards of 73 MHz. 48 bits is 87 MHz. For the few devices I looked at, that’s a bit fast – the SST25VF040B-80, for example, is specced for a max of 33 MHz with normal reads, and 80 MHz for fast reads. 87 isn’t too far from 80.. you could probably get away with it in a non-production environment. It’s slightly more complicated, though, once you look at other timing requirements – the SST25VF040B specifies that the CE line must remain high for 50ns between accesses, so (depending on whether you want to violate that spec), you’re looking at more like 48 bits in 500ns – which is 96 MHz. That’s not looking so great.
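
Those clock-rate figures are easy to reproduce. The Python sketch below only restates the arithmetic from the last few paragraphs; the helper name `required_spi_mhz` is made up for illustration.

```python
CPU_HZ = 1.79e6                      # NES CPU clock quoted above
cycle_ns = 1e9 / CPU_HZ              # roughly 558.7 ns per CPU cycle


def required_spi_mhz(bits: int, window_ns: float) -> float:
    """Serial clock (in MHz) needed to shift `bits` within `window_ns`."""
    return bits / window_ns * 1e3    # bits per ns is GHz, so scale to MHz


normal_read = required_spi_mhz(40, 550)        # command + address + data
fast_read = required_spi_mhz(48, 550)          # adds one dummy byte
fast_read_ce = required_spi_mhz(48, 550 - 50)  # minus 50 ns of CE high time

for label, mhz in [("40-bit read", normal_read),
                   ("48-bit read", fast_read),
                   ("48-bit read + CE gap", fast_read_ce)]:
    print(f"{label}: {mhz:.1f} MHz")
```

Using the rounded 550 ns window, as the text does, is slightly pessimistic compared with the full CPU cycle time.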

        If you upgrade the SPI device to something that supports so-called “serial quad I/O” (sort of a 4-bit wide SPI), like the SST26VF016, then this becomes much more realistic (though 2 extra FPGA IO pins are required over a normal 1-bit SPI flash, and the price for such a part is certainly higher). The 40/48-bit requirement is still in place, and that device has the same 33/80 MHz limitation – but, now we can transfer 4 bits per clock! So, a 48-bit high-speed read now only takes 12 cycles, or a mere 150ns at 80 MHz. That’s plenty fast. Fast enough that it might even be practical to offload the character ROM to flash as well.
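
The effect of the wider bus can be sketched the same way in Python. This assumes, as the estimate above does, that the command and address are also sent four bits per clock; `read_time_ns` is a hypothetical helper.

```python
import math


def read_time_ns(bits: int, lanes: int, clock_mhz: float) -> float:
    """Time to transfer `bits` over `lanes` data lines at `clock_mhz`."""
    clocks = math.ceil(bits / lanes)
    return clocks * 1e3 / clock_mhz


single = read_time_ns(48, 1, 80)   # 48 clocks, too slow for a ~550 ns budget
quad = read_time_ns(48, 4, 80)     # 12 clocks

print(f"1-bit SPI: {single:.0f} ns, quad I/O: {quad:.0f} ns")
```

At 80 MHz the one-bit read takes 600 ns and misses the budget, while the quad read takes 150 ns and leaves well over half of each CPU cycle free for other accesses.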

        *..unless some sort of cache was implemented, which allowed the NES to be briefly halted on a cache miss, and then run at a higher rate to “catch up” to real time. It’s not quite as crazy as it sounds; everything in my NES is actually already running at a higher rate (21.48 MHz), and is slowed down via clock-enables (the CPU runs on a divide-by-12, while the PPU runs on a divide-by-4, and the VGA output driver runs on the undivided clock). By tweaking the clock-enable generation logic, it would be possible to pause the entire NES for brief periods of time (while data is fetched from flash), and then run at up to 4x real-time to catch up without the user noticing anything unusual (provided that some amount of buffering exists between the PPU and the display device, so jitter doesn’t get translated to the timing-sensitive output signals; I already have that buffering in my implementation, in the form of a VGA line-doubler which must buffer a whole scan-line). Certain pathological memory read patterns that have very poor cache hit rates could potentially be an issue as well (unlikely, since the 6502 won’t generally make lots of back-to-back reads, so there would be recovery time).
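
To make the pause-and-catch-up idea concrete, here is a small behavioural model in Python. It is only a sketch of the scheme described above, not the actual clock-enable logic: the generator name `clock_enables` and the `stalled` callback are invented, and the CPU enable is simply tied to every third PPU step.

```python
def clock_enables(master_ticks, stalled):
    """Yield (ppu_en, cpu_en) for each master clock tick.

    `stalled(t)` returns True while the NES must be paused at tick `t`
    (for example, while waiting on a cache miss).
    """
    owed = 0        # PPU steps that real time says should have happened
    ppu_steps = 0   # PPU steps actually issued so far
    for t in range(master_ticks):
        if t % 4 == 0:
            owed += 1                        # nominal rate: 1 step per 4 ticks
        run = owed > 0 and not stalled(t)
        ppu_en = run
        cpu_en = run and ppu_steps % 3 == 0  # CPU runs on every 3rd PPU step
        if run:
            ppu_steps += 1
            owed -= 1
        yield ppu_en, cpu_en


nominal = list(clock_enables(48, lambda t: False))
paused = list(clock_enables(48, lambda t: t < 20))   # stall for 20 ticks

for name, trace in [("nominal", nominal), ("paused", paused)]:
    ppu = sum(p for p, _ in trace)
    cpu = sum(c for _, c in trace)
    print(f"{name}: {ppu} PPU steps, {cpu} CPU steps")
```

With no stall the model produces a divide-by-4 PPU enable and a divide-by-12 CPU enable. After a stall it issues an enable on every master tick (4x real time) until the backlog is cleared, so both runs end with the same number of steps.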

        The original reason I thought about this whole pause-and-catch-up notion was to allow implementing all of the NES’ multitude of cartridge memory mappers via some sort of software trap-and-emulate function: when the game makes a write to a mapper control register, a special interrupt is triggered that forces the emulation function to be loaded by the CPU; that function saves the CPU state of the game, reacts to the mapper register write (and adjusts the CPU/PPU-visible memory map accordingly), and then restores the game’s state and returns control (at which point, the NES must make up for the time lost while running the emulation function). As it stands, I’ve implemented most of the mappers in hardware, and it isn’t pretty – they’re now consuming more FPGA resources than the entire rest of the NES combined, and causing some trouble for the synthesis tools.
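
As a rough illustration of what "reacting to a mapper register write" means, the Python sketch below models one of the simplest bank-switching schemes (UxROM-style: a switchable 16 KB bank at $8000-$BFFF, with the last bank fixed at $C000-$FFFF). It was picked because it is small, not because it reflects how any mapper is implemented in this project; the class and method names are hypothetical.

```python
BANK_SIZE = 16 * 1024


class BankedPrg:
    """PRG address translation for a UxROM-style cartridge (illustrative)."""

    def __init__(self, prg_rom: bytes):
        self.prg_rom = prg_rom
        self.bank_count = len(prg_rom) // BANK_SIZE
        self.low_bank = 0                  # switchable bank at $8000-$BFFF

    def on_mapper_write(self, addr: int, value: int) -> None:
        """The 'trap handler': react to a CPU write into cartridge space."""
        if addr >= 0x8000:
            self.low_bank = value % self.bank_count

    def read(self, addr: int) -> int:
        """CPU read from $8000-$FFFF through the current memory map."""
        if addr < 0x8000:
            raise ValueError("address is outside PRG ROM space")
        if addr < 0xC000:
            bank = self.low_bank
        else:
            bank = self.bank_count - 1     # last bank is always visible here
        return self.prg_rom[bank * BANK_SIZE + (addr & 0x3FFF)]


# Four banks, each filled with its own bank number, so reads show the mapping.
cart = BankedPrg(b"".join(bytes([n]) * BANK_SIZE for n in range(4)))
before = cart.read(0x8000)
cart.on_mapper_write(0x8000, 2)
after = cart.read(0x8000)
fixed = cart.read(0xFFFF)
```

In a trap-and-emulate design, something like `on_mapper_write` would run in software on the interrupted CPU. In the hardware approach, the same state lives in registers next to the address decoder.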

        But that’s a rather more complex application. A simple cache between the CPU and an SPI flash device would be (relatively speaking) pretty simple. I may just have to try implementing it..

  2. Pingback: Papilio Arcade kit | WISH (Alpha)

  3. Jack Gassett says:

    Dan,

    Well it sounds like you might be on an interesting but challenging path. As I was thinking more about this problem and the problem of making a cartridge Wing it occurred to me that what would be even better than having the program ROM in the SPI Flash is to actually put the program ROM(s) on a SD card. (I already have a SD Card Wing) We just might be able to squeeze the speed we need out of the SD card if we use the four bit SD mode instead of the SPI mode for the SD card. I’m pretty sure I have seen SD cores on OpenCores.com that implement the faster four bit SD mode! I’ll take a look around and see if I can find one again.

    • Dan says:

      Absolutely – an SD card would be an ideal solution from a user perspective. That’s what I ultimately wanted to use for loading ROMs onto my FPGA NES, but I had planned to load them into RAM, and run them from there (currently I’m loading directly into RAM via USB).

      I am by no means an expert on the SD spec (hard to be with how tightly the full spec is controlled..), but two concerns come to mind for this sort of latency-sensitive execute-in-place application:
      1) SD is sector-oriented. Transfers are typically in 512 byte chunks. I don’t know if it is possible to transfer less than 512 bytes at a time (maybe by prematurely stopping a transfer), or if it is possible to start a transfer in the middle of a sector.
      2) SD doesn’t, to my knowledge, have guaranteed read latencies. So, the FPGA console would conceivably have to be able to deal with very high latencies at times.

      Sector-oriented and high-latency both tend to suggest requiring some sort of cache, but even that may not be sufficient for this application. Again, this is just from memory, so it would need to be double-checked.
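
A minimal Python sketch of such a cache is shown below, assuming 512-byte sectors and a least-recently-used replacement policy. The `SectorCache` class and its `fetch_sector` callback are invented for illustration, and the model says nothing about whether a real card's latency would be tolerable.

```python
from collections import OrderedDict

SECTOR_SIZE = 512


class SectorCache:
    """Tiny LRU cache that turns byte reads into whole-sector fetches."""

    def __init__(self, fetch_sector, capacity: int = 8):
        self.fetch_sector = fetch_sector   # callable: sector number -> bytes
        self.capacity = capacity
        self.sectors = OrderedDict()
        self.misses = 0

    def read_byte(self, addr: int) -> int:
        sector, offset = divmod(addr, SECTOR_SIZE)
        if sector in self.sectors:
            self.sectors.move_to_end(sector)       # mark as most recent
        else:
            self.misses += 1                       # the NES would stall here
            self.sectors[sector] = self.fetch_sector(sector)
            if len(self.sectors) > self.capacity:
                self.sectors.popitem(last=False)   # evict least recent
        return self.sectors[sector][offset]


# Stand-in for the card: a 2 KB image split into four sectors.
rom = bytes(range(256)) * 8
cache = SectorCache(lambda n: rom[n * SECTOR_SIZE:(n + 1) * SECTOR_SIZE],
                    capacity=2)
values = [cache.read_byte(a) for a in (0, 5, 513, 6)]
```

Every miss costs a full sector transfer plus whatever latency the card adds, so the miss count is the number that matters for an execute-in-place design.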

      Even if the card can’t be used for directly running games, it’d still be a great way of loading them – in my experience, regenerating bitstreams just to change games is a real hassle! ;)

  4. G’day Dan,

    Nice work! Would be interested in the source when you release to port to some Altera platforms.

    Check out my site. I’ve done a few original designs myself, plus a heap of ports from other projects. I also had a quick go at doing the NES myself; rather than a verbatim implementation I did a ‘mock up’ using a generic tile-and-sprite engine and managed to get Tennis, Wrecking Crew and SMB running – for a ‘proof-of-concept’ really. I started on a more ‘correct’ implementation but got diverted onto other projects.

    Regards,
    tcdev

  5. Hey Dan!

    I just happened to see your site listed in the “Top Referrers” section of my VeriNES site. You should have emailed me or something and let me know that you were working on an FPGA-based NES! I like to keep up with other FPGA NES projects that people are working on (since there are so few). Anyway, sounds like you have made some awesome progress – nice work! I went ahead and added your emulator and website to the list of emulators on the NesDev wiki.

    Btw, in your block diagram I don’t see the DMC channel listed in the APU block. Have you not implemented it yet? Haha, the DMC is the only channel that I have implemented so far. Looking at all the other channels I think the DMC has got to be the hardest to get exactly right. Hey, speaking of, maybe you can answer a question for me related to the DMC channel….

    If you can believe it, I have asked numerous people online (mostly on NesDev) this question and not a single person has been able to provide me with a solid answer even with the plethora of knowledge/expertise out there about the NES. Here’s the question:

    —————————————
    If a sprite DMA transfer is already in progress (and therefore already in control of the bus and already deasserting the RDY signal on the CPU), does a DMC DMA operation override (interrupt) the sprite DMA process or does the DMC wait for the entire sprite RAM transfer to finish before taking control of the bus?
    —————————————

    It’s shocking to me that no one knows the real answer but I think that’s because 99% of the emulators out there are software-based and it turns out that they all “cheat” by performing both transactions “simultaneously”. A luxury that us hardware-folk don’t have. Haha. I’d love to finally get an answer to this if you are so inclined… ;)

    Pz!!

    Jonathon :)

    • Dan says:

      Hey Jonathon,

      Yeah, we never actually got around to implementing the DMC. And, at the time, we didn’t have any games that needed it (lacking mapper support, and all).

      I’ve barely touched the APU code since revisiting this project – it’s still the original VHDL code that Brent wrote a few years ago. When I finally get around to re-writing that block in Verilog, then I’ll add the DMC!

      So, no, I sadly don’t have any more insight into that particular corner-case than you do. Given how many of the other bits in the NES are implemented, I wouldn’t be especially surprised if the two DMAs try to happen simultaneously and just wind up corrupting each other… ;P

      It really is pretty remarkable that no one seems to know how the DMC and sprite DMAs interact, given how well understood most of the other low-level stuff is. Compared to other really obscure (seemingly) purely-internal things (e.g. PPU OAM fetch behavior), you’d think it would be relatively easy to put together a test program and watch the DMA fetches on the 2A03’s memory bus.. If I had a NES and a flash cart handy, I’d be tempted to try this myself – but I don’t even have a NES on hand at the moment (as ridiculous as that may be).

      Alternatively, I’ve actually been working on something else that might be useful for testing this behavior (it still requires a NES – or, at least, the CPU chip from one – but no flash cart or logic analyzer): I’ve developed something that lets you transparently interface a Verilog testbench with an arbitrary piece of external hardware (say, the Nintendo’s 2A03 CPU chip).

      So, one could write a testbench that feeds the real 2A03 with suitable instructions and data to attempt to trigger coincident DMA operations, and then (again) observes the DMA fetches happening. Plus, once you’ve tested the behavior on the real chip, you’d already have a testbench to check your clone with as well. ;)

      To elaborate on my real-world Verilog interface: Basically, I wrote a PLI/VPI plugin for the Icarus Verilog simulator which transfers signals between the simulation and an external interface device (currently, a USB AVR microcontroller). You instantiate a black-box of sorts in your testbench, which has a matching pinout to the AVR (the VPI plugin takes control of this black-box). Then you hook your DUT to the real microcontroller.

      When the simulation is running, signals are automatically propagated back and forth between the simulation and the real-world DUT (synchronous/combinational/input/output/inout/etc. all “just work” thanks to some VPI magic – only pull-ups/downs really have any limitations). Though, due to the need to handshake with the simulator on every state change, it’s pretty ridiculously slow – like, < 500 Hz slow (with the high-latency USB connection) – so it only works with DUTs that have no minimum operating frequency. I'm hoping to eventually improve performance by migrating to a faster interface (even an ancient parallel port would likely be over an order of magnitude faster).

      Anyway. I'm hoping to do a writeup on that project sometime in the near future. And, I plan on resolving that not-having-a-NES problem soon, too – since I actually need some real Nintendo hardware to test my NES Cartridge Adapter board with!

      • “Yeah, we never actually got around to implementing the DMC. And, at the time, we didn’t have any games that needed it (lacking mapper support, and all).”
        No??? What about Duck Hunt?! :) Quack! Quack!, Arf! Arf! Of course, you can’t use the light gun unless you hook up your NES to a CRT (won’t work on LCD).

        “…I wouldn’t be especially surprised if the two DMAs try to happen simultaneously and just wind up corrupting each other…”
        Actually, I think you’re right on this. I just happened to run into some new info on the nesdev forums that makes me think this is the case (info courtesy of Kevtris).

        “…since I actually need some real Nintendo hardware to test my NES Cartridge Adapter board with!”
        That’s really cool. I was working on making one myself at one point but I would have to learn how to make PCBs which I have no knowledge of. I also figured I would need too many CPLDs for all the pins on the cartridge and the adapter board would be too big. So what did you do – make some kind of interface “protocol” (that works on 16-bits) to the cartridge rather than pull all the pins out to your FPGA-NES? If so, pretty neat. Also, it may interest you to know that while I was doing my research for my cartridge adapter board I discovered that the high-performance (XL) versions of the Xilinx CPLDs (e.g. XC9572XL) can tolerate 5V on their input and will still output 3.3V. So they can essentially translate the voltage for you. Might wanna give it a shot! ;) But I also suspect that the ROMs in the cart will also run on 3.3V – most of those super old CMOS chips can operate on very low voltage. In fact, I have my NES controller (which has a tiny CMOS shift register as you already know) connected directly to one of the 3.3V banks on my FPGA and it works perfectly. You may want to try that as well.

        Pz!!

        Jonathon

      • Dan says:

        “What about Duck Hunt?!”

        Ha.. I completely forgot about Duck Hunt! Back when I still actually owned a CRT monitor, I did try to get the light gun to work, but didn’t have much success. At the time, I figured it had something to do with interlaced (NTSC TV) vs. progressive (VGA CRT), and didn’t spend much time debugging it. And, since Duck Hunt isn’t much fun without a functional gun, there wasn’t much incentive to get other supporting features (e.g. DMC) working.

        Have you been able to get the light gun to work with VeriNES? I haven’t owned a CRT in years, so it’s doubtful I’d ever make another attempt to get the light gun working with mine..

        “So what did you do – make some kind of interface “protocol” (that works on 16-bits) to the cartridge rather than pull all the pins out to your FPGA-NES?”

        Yep. I have a time-multiplexed notion in mind. The 16-bit interface runs at ~43 MHz (2x the NES master clock), and transfers various pieces of the cartridge bus on each time slice (PRG address low, PRG address high, PRG data, CHR address low, CHR address high, and CHR data) – all timed so that any outgoing data/address signals are sent as early as possible, and the incoming data lines are read as late as possible (so it stands a chance of still working with slow memory chips on older cartridges).

        That careful timing is currently dependent on slightly tweaking the NES clock waveforms – instead of running the PPU on a /4 clock and the CPU on a /12 clock, I effectively create a common /3 clock; the PPU is enabled for 3 rising edges, then the CPU is enabled for 1 rising edge, then back to the PPU.. (etc.). The net effect being that the CPU and PPU each run at their nominal rates, but now there is an equal amount of time between transactions (enough time to send 3 bytes: address low, address high, and data out; then, turn the bus around; and finally read 1 byte: data in). I have some timing diagrams drawn up, and the Verilog code for the CPLD written, so I’m pretty sure it’ll work. But, I haven’t yet actually tried hooking it up to the FPGA NES (in simulation or otherwise) to confirm that it does.
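
The claim that both units keep their nominal rates under the shared /3 clock can be checked with a few lines of Python. The slot pattern is taken from the description above; the 21.48 MHz master clock is the figure quoted earlier in the thread, and the names are illustrative.

```python
from itertools import cycle, islice

MASTER_HZ = 21.48e6                    # NES master clock figure quoted earlier
COMMON_HZ = MASTER_HZ / 3              # shared /3 enable
SLOTS = ("PPU", "PPU", "PPU", "CPU")   # repeating pattern on the /3 clock


def effective_hz(unit: str) -> float:
    """Average enable rate of `unit` under the 3:1 slot pattern."""
    return COMMON_HZ * SLOTS.count(unit) / len(SLOTS)


first_eight = list(islice(cycle(SLOTS), 8))
ppu_hz = effective_hz("PPU")           # compare with MASTER_HZ / 4
cpu_hz = effective_hz("CPU")           # compare with MASTER_HZ / 12
slot_ns = 1e9 / COMMON_HZ              # time available per transaction slot

print(first_eight)
print(f"PPU {ppu_hz / 1e6:.2f} MHz, CPU {cpu_hz / 1e6:.2f} MHz, "
      f"slot {slot_ns:.0f} ns")
```

Each slot is about 140 ns long regardless of which unit owns it, which is the "equal amount of time between transactions" property the multiplexed bus relies on.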

        “..the high-performance (XL) versions of the Xilinx CPLDs (e.g. XC9572XL) can tolerate 5V on their input and will still output 3.3V. So they can essentially translate the voltage for you.”

        Absolutely! I really wanted to use one of the XL parts (a single-chip solution would be ideal).. but I wasn’t confident that all cartridges would be happy with just 3.3V signaling levels. So, I’ve put a true 5V CPLD in the first design, and I have jumpers on the board that will allow selecting between 5V/3.3V IO and 5V/3.3V cartridge power, so I can easily experiment to see what ones work.

        “In fact, I have my NES controller (which has a tiny CMOS shift register as you already know) connected directly to one of the 3.3V banks on my FPGA and it works perfectly.”

        Ah, now that’s a bit different than my past experience – of the two controllers that we originally had, only 1 of them was happy with 3.3V. The other needed 5V power and 5V signaling to work reliably – hence my concerns that other NES hardware of that vintage (cartridges) may have similar requirements. The controllers that I have now are cheap modern-day clones, which, of course, work just fine on 3.3V.

      • “I have jumpers on the board that will allow selecting between 5V/3.3V IO and 5V/3.3V cartridge power, so I can easily experiment to see what ones work. …snip… of the two controllers that we originally had, only 1 of them was happy with 3.3V. The other needed 5V power and 5V signaling to work reliably”

        Ah, I never considered that maybe I had a newer model chip or a chip that just by chance happened to be happy with 3.3V. So I think it’s great that you put jumpers in to try different options.

        Also, I forgot to say in my previous post but I think your logic analyzer/tester idea with the verilog testbench and PLI/VPI plugins is really awesome.

        Pz!

        Jonathon

      • Hey Dan,

        I’ve uploaded a fairly comprehensive video demonstration of my emulator and GUI front-end to youtube. It demonstrates essentially everything that I’ve done up to this point if you’re interested. Be warned, it’s almost an hour and a half long total. ;)

        Part 1 – http://www.youtube.com/watch?v=pUhgRFLz6xg
        Part 2 – http://www.youtube.com/watch?v=7tkDA706mPY
        Part 3 – http://www.youtube.com/watch?v=BOVSQlWY7yM
        Part 4 – http://www.youtube.com/watch?v=Y_2tFN3xd9Y
        Part 5 – http://www.youtube.com/watch?v=GdKulbB7xCg
        Part 6 – http://www.youtube.com/watch?v=0MClsQF4WLE

        Pz!

        Jonathon

  6. I worked on a cartridge emulator for the NES many years ago when I was just getting into FPGAs.

    The hardware had an FPGA, some flash, some SRAM, and card-edge connectors for NES, SNES & N64. We got the SNES and N64 working, and managed Tennis on the NES but with snowy graphics.

    The problem with NES of course is the asynchronous buses. Unfortunately we only had a single bus for the flash/SRAM on the board so the only option was to time-multiplex the two NES buses. We could generate a clock up to 80MHz on the board. I started looking at the timing in more detail but moved on to emulation of complete systems before I got too far into it. IIRC the access pattern on the CHR bus was actually less random than appeared on 1st glance and I deduced it would be possible to cache some data in the FPGA in order to reduce the number of accesses to flash/SRAM.

    I really should re-visit that project. I know a lot more about FPGA design now than I did back then. We got around 6 PCBs made and two boards assembled and it cost around AUD$1,000. Unfortunately we took a short-cut and used a custom DC-DC converter from another project – and we have no more of those. Won’t make that mistake again! But the two boards should still be working.

    • Dan says:

      Very cool, Mark!

      “The hardware had an FPGA, some flash, some SRAM, and card-edge connectors for NES, SNES & N64. We got the SNES and N64 working, and managed Tennis on the NES but with snowy graphics.”

      I’ve seen NES cart emulators before – but a (mostly) functional all-in-one NES/SNES/N64 cart emulator? Most impressive!

      “The problem with NES of course is the asynchronous buses. Unfortunately we only had a single bus for the flash/SRAM on the board so the only option was to time-multiplex the two NES buses.”

      Tricky, that. Other cart emulators I’ve seen don’t even bother with being clever, and just use two sets of SRAM. Though, being able to run everything from one memory device would certainly be my preference!

      With my current FPGA NES implementation, I have the PRG/CHR memory all residing in one shared PSRAM chip (effectively a 16MB 70ns SRAM), and it works very well. Of course, I have the not-inconsiderable advantage of having the NES all inside an FPGA with synchronous logic, so I have complete control over timing and don’t have to worry about any sort of huge setup/hold times on the real NES hardware (and can even cheat a little – e.g., by not having to multiplex the PPU data/address, I can know the whole CHR address a cycle early).

      “IIRC the access pattern on the CHR bus was actually less random than appeared on 1st glance and I deduced it would be possible to cache some data in the FPGA in order to reduce the number of accesses to flash/SRAM.”

      Yeah, it’s predictable to an extent. Short of actually emulating most of the PPU in your cart emulator, though, you might have trouble predicting far enough in advance to gain a lot of benefit. One major point, that you may well already be aware of: the PPU may run at 3x the CPU speed, but it takes 2 cycles to make a fetch (so fetch rate is only 1.5x the CPU speed); it uses a multiplexed data/address bus, so the 1st cycle is just used for latching half of the address.
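
For anyone sizing a single shared memory, the fetch rates work out as in the Python sketch below. It uses the 1.79 MHz CPU clock quoted elsewhere in this thread and treats the CPU as making one access per cycle, which is a simplifying assumption; the resulting budget is an average and ignores how the two access streams line up in time.

```python
CPU_HZ = 1.79e6
PPU_HZ = 3 * CPU_HZ               # PPU runs at 3x the CPU clock
PPU_CYCLES_PER_FETCH = 2          # 1st cycle latches address, 2nd reads data

ppu_fetch_hz = PPU_HZ / PPU_CYCLES_PER_FETCH
fetch_ratio = ppu_fetch_hz / CPU_HZ                    # PPU fetches per CPU cycle
ppu_fetch_window_ns = 1e9 / ppu_fetch_hz               # time between PPU fetches
shared_budget_ns = 1e9 / (CPU_HZ + ppu_fetch_hz)       # average per access

print(f"PPU fetch rate is {fetch_ratio:.1f}x the CPU rate")
print(f"{ppu_fetch_window_ns:.0f} ns between PPU fetches, "
      f"{shared_budget_ns:.0f} ns average per access when sharing one memory")
```

Even the shared figure leaves comfortable margin over a 70 ns memory, which is consistent with one memory device being able to serve both buses.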

      Brad Taylor’s 2C02 technical reference is an excellent document for learning about the nitty-gritty details of the PPU, including exact memory fetch patterns (my PPU is largely based on this document).

      “I really should re-visit that project. I know a lot more about FPGA design now than I did back then.”

      Ha! I hear ya there – it was a lot of fun picking up my FPGA NES project after having not touched it in a few years. It’s remarkable what a couple years worth of learnings can do for a project. :)

  7. Bart Zuidgeest says:

    I suggest you check out
    http://www.fpgaarcade.com/

    I have no relation to the site, besides waiting for their board to become available. If your NES code would run on it, that would seem perfect to me. Their FPGA Arcade board should be more capable than the gadgetfactory one and has everything needed for arcade/console emulation on board (that is, looking at it with novice eyes).

  8. Aswino says:

    How about if we use a USB gamepad? Can you explain it to me? Thank you. I really need this for my final project. Thank you very much. I’m using a Spartan 3E XC3S500E.

    For additional information: if we use a USB gamepad, we must use a converter to convert USB to PS/2, because the development board does not provide a USB port.

    • Dan says:

      A USB gamepad is orders of magnitude more complex than a NES-like controller (which just houses a simple shift-register for doing parallel-to-serial conversion). Interfacing that directly to an FPGA would be a non-trivial task, since you’d effectively be building a whole USB host controller (or some subset thereof).

      (presuming, of course, that your specific gamepad doesn’t support some form of legacy emulation mode; many USB keyboards and mice have this, which allows for the use of dumb “USB” to PS/2 converters (it’s not USB at that point; the keyboard/mouse switches to a PS/2 protocol when it detects a dumb converter).. but I’m not aware of any gamepad that supports such a mode)
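
For comparison, the NES controller's shift-register protocol is small enough to model in a few lines of Python. This is a logical sketch only: 1 means "pressed" here and the electrical polarity of the real data line is ignored. The class and function names are made up, and the button order is the standard controller's report order.

```python
BUTTONS = ("A", "B", "Select", "Start", "Up", "Down", "Left", "Right")


class NesController:
    """Behavioural model of the controller's 8-bit shift register."""

    def __init__(self):
        self.pressed = set()
        self.shift = 0

    def latch(self) -> None:
        """Parallel load: capture all eight buttons at once."""
        self.shift = sum(1 << i for i, name in enumerate(BUTTONS)
                         if name in self.pressed)

    def clock(self) -> int:
        """Serial shift: return the next button bit (1 = pressed)."""
        bit = self.shift & 1
        self.shift >>= 1
        return bit


def read_buttons(pad: NesController) -> set:
    """Host side: one latch pulse, then eight clocked reads."""
    pad.latch()
    return {name for name in BUTTONS if pad.clock()}


pad = NesController()
pad.pressed = {"A", "Start"}
pad.latch()
bits = [pad.clock() for _ in range(8)]
state = read_buttons(pad)
```

The whole host side is one latch pulse and eight clocked reads, which is why this takes only a handful of FPGA pins and a few lines of HDL, whereas a USB gamepad needs a host controller and a protocol stack.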

      There are a couple of open-source USB host controllers on OpenCores which might be applicable. There’s still the whole system-integration and firmware aspects to consider, though..

      In the USB microcontroller world, there exist libraries that can make this sort of thing easier. For example: for Atmel’s AVR series of 8-bit micros, there’s Dean Camera’s LUFA USB library. It claims to support being a host for USB HID devices (like a gamepad).

      The easiest thing to do (but perhaps not the most educational/fun/exasperating), of course, would be to find a gamepad that uses a simpler protocol!

  9. Adrian Gonzalez says:

    Hi!
    We’re two students also working on emulating the NES with an FPGA, but using Handel-C (a language developed by Celoxica).
    Could you send me your VHDL code? It would be very helpful to us. Also, could we ask you if we have questions about how the NES works?

    Thanks!

    Adrian Gonzalez.

    • Dan says:

      Neato. Implementing a NES in a high-level synthesis language could be very interesting indeed.. HLS tends to be more geared for algorithmic/dataflow-oriented designs (e.g. data compression or image filtering), with little concern over the particular timing details (timing-critical interfaces tend to be inferred or instantiated RTL macros (possibly supplied by your HLS vendor, if it’s a common enough bus type (e.g. AMBA)), rather than trying to describe their particulars in C).

      Creating a cycle-accurate NES clone in a language that’s trying to shield you from such details may be an adventure! (even just porting some of the NES’s more esoteric asynchronous bits into a purely synchronous RTL design requires a bit of finagling..)

      As for code availability.. I’ve been meaning (for quite some time) to post it here, but there are a lot of little bits of maintenance work to do before then (perhaps I’m holding my old code to too high of a standard..). It’s, sadly, a low priority for me (so many projects..). And I’d really rather not share it privately before I post it publicly here.

      Everything I know about the NES’s inner workings comes from publicly available info (the vast majority of which is archived on NesDev).. But you’re certainly welcome to ask if you’ve any questions (though I am quite clearly not known for my timely responses!).

  10. Lyle Mustard says:

    Hello Dan,

    Very impressive set of work you have here! I am looking to sniff the RAM data exchanges from an NES, but thought it would be easier if I had more direct control of the hardware. I sincerely apologize if you have an answer to my request posted already and I just missed it, but is the source code available for this project? If it isn’t, would you be willing to pass it over to a fellow designer? It would be of great benefit to the 7th annual Ice Hockey tournament coming up in January 2012. (5 Nintendos, 20 intoxicated entrants, one hand-made cup for the champion to take home.) It all started with a desire to make a hardware scoreboard that would use real-time game data, and has expanded to a desire to pass all real-time data to an (existing) statistical database.

    I am looking to get this started as soon as possible, so any useful information would be greatly appreciated.

    Thank you, and best wishes with all your projects.

    • Lyle Mustard says:

      Hello again Dan,

      Sorry to try to originally shortcut all the work you did, I meant no offense. I am looking towards a similar project and hope I have the same level of success that you guys have.

      I was wondering if you had ported all the code over to Verilog because of your familiarity with it, or is there some other advantage? I have only touched on VHDL with some labs in college, but would like to know if there are any considerations in choosing a language for my current work.

      Thanks for your time.

  11. sergi says:

    Hi Dan,
    congratulations on the work, especially the PPU (coded in only two weeks!!!).
    What are the resources occupied by the project (slices, bram)?
    Thanks

    • Dan says:

      Here are the usage numbers for all the major components (when targeting a Xilinx Spartan-3 1000):

      Module          Slices  Flops   LUTs    BRAM
      --------------------------------------------
      Top             3711    1588    5088    5
      -Cart           1129    390     1797    1
      --Mappers       848     258     1404    0
      --PSRAM control 184     61      255     0
      --INES parser   93      71      132     0
      -CPU mem        0       0       0       1   
      -CPU            1229    619     1763    0
      --APU           634     453     850     0
      --CPU core      534     140     845     0
      -PPU            587     425     817     1
      -PPU VRAM       0       0       0       1
      -VGA scaler     373     55      461     1
      

      This is for the version that includes mapper support and uses external PSRAM for most data. The original version, which supported neither mappers nor external memory, was much smaller (logic-wise), but used all of the BRAMs in the FPGA.

  12. Ronald Smith says:

    I just stumbled across this. Back in the day I used to wait for news on the Kevin Horton board, though that seemed to go nowhere.

    Extremely interested in seeing where this project ends up, especially if it means I can essentially have an RGB-supporting NES (this already supports VGA, so a simple SCART converter should do the trick) and run all my games off SD!

    Do you plan on adding Famicom Disk System support? And do you plan on actually implementing a cart adapter, or is it just a nice-to-have but not critical feature?

    Keep up the good work, can’t wait to see where this all ends up.

  13. Andrew says:

    Hi Dan,
    I am a student working on a 4 player pong game that uses 4 NES controllers on my Nexys2 board. I’d like to use the Parallax adapter instead of cutting the ends off my controllers and wiring them in to the board. Can you elaborate further on “a warning, though: its pinout is not quite directly compatible with the Nexys”? My group starts work on this project next week and I’d appreciate any advice!

  14. brainpann says:

    Hi and thanks in advance. Please excuse me if what I am asking is out of the realm of possibility (or practicality). I am just recently learning about FPGAs and was curious whether one could be programmed as a replacement PPU on original NES hardware. The reason would be to get an RGB signal out of the NES rather than composite. Many collectors try to get the most out of their hardware, and getting RGB video out of an NES requires replacing the PPU with one from VS System arcade hardware. Besides the cost issue, the hardware is becoming extinct. Maybe if an FPGA could be used, people wouldn’t destroy good hardware for a single chip.

    • Mef says:

      THIS!

      Yeah, I’m very interested in the answer to the same question. Either RGB or even VGA output. Not only would it save a couple of PlayChoice-10 PPUs, but it would also provide a proper palette and a solution for the PAL version, which can’t have its PPU swapped just like that.

    • James says:

      This is my greatest interest as well. Implementing the entire NES in FPGA is cool, but I have a real NES already. What I would really love is the PPU implemented in a smaller FPGA that I could install on a daughterboard and plug in place of the original PPU to get RGB or component video output.

  15. akeshet says:

    Really cool.

    You mention having to intentionally implement several obscure NES quirks and bugs, which I find deeply amusing. Can you give an example?

  16. P_M says:

    Hi! Is the whole hardware definition also stored in PSRAM? If it is, what source (re-)loads the content into the PSRAM when/if it becomes emptied?

  17. Joshua Cadmium says:

    I second what people have posted here previously. If you could design a drop-in PPU that outputs VGA/RGB (especially in the original colors, which is impossible with chips taken from PlayChoice and Vs. boards, or with some way to switch between NES and arcade colors, which would be perfect), you would have a small gold mine on your hands. I’d happily purchase several from you, and I’m sure many other people would as well. You could probably sell several thousand of them.

    Please look into doing this, as it seems like you already have solved the main problems, if not all of them.

  18. raz says:

    hi^^
    Your project is very interesting and awesome. I’m a beginner in FPGAs and I’ll be doing my seminar on your project, so could you please give me the VHDL code?
    thanks

  19. Darksoft says:

    Hi Dan,

    There is hardware on the market with a Cyclone III that already emulates several systems. It’s called Chameleon64. Would you be willing to port your core to this system, or provide some help so others can benefit from your work?

  20. retrofan says:

    Hello Dan,

    Would you be interested in supporting the Multiple Classic Computer project with your NES core?

    This system already supports Commodore C64, Amiga 500, Atari 2600, ZX Spectrum and Apple 2e cores.
    The NES would be a great extension.

    I touched base with them; they are great guys and are open to supporting anyone who would like to add a core to the Multiple Classic Computer.

    Check Out the following links:

    http://mcc-home.com/

    Facebook:

    http://www.facebook.com/arcaderetrogaming

    It would be nice if you would consider adding your core, to help preserve the good old classic computers
    and retro gaming systems.

    Thanks,

    The Retrofan
