Eventually, when my FPGA stereo-vision project nears its terminus, I’m going to want to produce a refined sensor board that combines the image sensors and FPGA onto a single board. In preparation for that, this board is a test vehicle to investigate what it takes to design and assemble a compact PCB with multiple BGA packages using only tools and services that are within the reach of a well-equipped hobbyist.
Major features include:
- Spartan-6 FPGA in FT256 package (up to an XC6SLX25)
- 64MB 800Mbps x8 DDR2 SDRAM
- 18 high-speed LVDS pairs for FPGA expansion (across 9 “SATA” connectors)
- 16 low-speed 3.3V signals for FPGA expansion (on a Gadget Factory “Wing” style header)
- 100 MHz oscillator
- ATmega32U2 USB microcontroller (responsible for configuring the FPGA via JTAG)
- 2MB SPI flash for non-volatile bitstream and data storage
- Single 5V supply (onboard regulators for 3.3V/2.5V/1.8V/1.2V rails)
- Single JTAG port selectable between AVR and FPGA
- 85mm x 50mm 4-layer PCB (3.35″ x 1.97″)
Looking at the top two items on that list – an FPGA in a 256-ball 1.0mm BGA package, and a memory device in a 60-ball 0.8mm BGA package – one can easily imagine that assembly is going to be the trickiest part of this project. But this post isn’t about the assembly of the board, seeing as I’ve only just sent it off to be fabricated (this time by Laen’s 4-layer PCB service). I’ll make a follow-up post once the board is back and assembled.
Instead, this post is entirely about the design and layout of the board.
The star of this board is one of Xilinx’s Spartan-6 FPGAs. It’s quite the step up from their prior Spartan-3 devices (easily justifying the 100% series-number increase). Many excellent features from the higher-end Virtex devices have trickled down to the Spartans, and at least one is entirely new. The three biggest ones in my mind are:
- 800 Mbps x16 DDR memory controllers (2-4 total controllers, depending on device and package): Spartan-3s had to make do with controllers implemented in soft-logic, and were limited to 333 Mbps. Spartan-6s are capable of over twice the performance with zero logic usage.
- 1050 Mbps SerDes blocks with variable input delays: Spartan-3s, again, had to use soft-logic for SerDes – and their LVDS I/Os were limited to lower speeds. Without run-time adjustable input delays, per-bit deskew was essentially impossible for Spartan-3s. Spartan-6s fix all of that, making 1+ Gbps source-synchronous interfaces practical.
- 6-input LUTs: older devices relied on 4-input LUTs as their basic building block. There are a lot more functions that efficiently map to 6-input LUTs, thus improving logic-usage efficiency. For example, a 4:1 mux will fit perfectly into a single 6-input LUT, but it takes three 4-input LUTs to achieve the same functionality. Adder trees can be made denser, too – as each 6-input LUT can implement one bit of a 3-input adder (a 9-input adder can be built with just 4 LUTs per bit, while it would take 8 LUTs per bit in an older device)
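The adder-tree numbers in that last bullet can be checked with a quick count. This is only a rough sketch: the function name is made up for illustration, and it assumes each adder costs exactly one LUT per bit, ignoring carry chains and the extra bits a growing sum needs.

```python
def luts_per_bit(num_operands: int, adder_inputs: int) -> int:
    """Count the adders (assumed: one LUT per bit each) needed to sum
    num_operands values when each adder merges up to adder_inputs
    operands into a single result."""
    if num_operands < 1 or adder_inputs < 2:
        raise ValueError("need at least 1 operand and 2-input adders")
    adders = 0
    remaining = num_operands
    while remaining > 1:
        merged = min(adder_inputs, remaining)
        remaining -= merged - 1  # 'merged' operands collapse into one
        adders += 1
    return adders


if __name__ == "__main__":
    print("3-input adders (6-input LUTs):", luts_per_bit(9, 3))
    print("2-input adders (4-input LUTs):", luts_per_bit(9, 2))
```

Under those assumptions, nine operands need four 3-input adders but eight 2-input adders, which lines up with the 4-versus-8 LUTs per bit quoted above.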
In order to take advantage of the FPGA’s integrated memory controller, the board includes a single 8-bit DDR2 memory device. I had really wanted to use a x16 part, for greater bandwidth – but routing it on a low-spec 4-layer board would have been extremely difficult. x8 was tricky enough.
The 4-layer stack-up used for this board wasn’t conducive to 50-ohm routing, but 75-ohm traces were nearly ideal (0.006″ minimum-width traces work out almost exactly to 75 ohms). The DDR2 spec allows for 75-ohm traces, and both the Spartan-6 FPGA and the DDR2 part support integrated 75-ohm terminations. Even the differential clock was routed with a 150-ohm pair of traces; this isn’t strictly supported, but the 150-ohm termination resistor placed by the DDR2 part should allow it to work just fine.
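For a feel of how trace width and stack-up interact, here is a minimal sketch using the common IPC-2141 surface-microstrip approximation. The dielectric height, copper thickness, and dielectric constant below are placeholder assumptions, not the fab's published stack-up, and a proper field solver will give somewhat different numbers.

```python
import math


def microstrip_z0(width_mil: float, height_mil: float,
                  thickness_mil: float, er: float) -> float:
    """IPC-2141 approximation for surface microstrip impedance in ohms.
    Reasonable only for roughly 0.1 < width/height < 2.0."""
    return (87.0 / math.sqrt(er + 1.41)) * math.log(
        5.98 * height_mil / (0.8 * width_mil + thickness_mil))


if __name__ == "__main__":
    # Hypothetical stack-up values, chosen only for illustration.
    HEIGHT_MIL = 8.0      # outer layer to adjacent plane (assumed)
    THICKNESS_MIL = 1.4   # roughly 1 oz copper (assumed)
    ER = 4.3              # typical FR-4 figure (assumed)
    for width in (6.0, 12.0):
        z0 = microstrip_z0(width, HEIGHT_MIL, THICKNESS_MIL, ER)
        print(f"{width:4.1f} mil trace: about {z0:.0f} ohms")
```

With these assumed values a 6-mil trace lands in the mid-70s of ohms, and getting down toward 50 ohms takes a trace roughly twice as wide, which is why the narrow traces pair so naturally with 75-ohm signaling.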
The Spartan-6 memory controller user guide is quite specific about trace-length matching. All of the signals between the FPGA and the DDR2 part are matched to within 3mm of each other – hence all of the serpentine traces. It was a tedious process, since EAGLE has no concept of length-matching constraints (it does, at least, bundle a script for measuring/comparing lengths of multiple traces-of-interest).
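The bookkeeping behind that length matching is simple enough to sketch. The net names and lengths below are invented for illustration (they are not measurements from this board), and the helper is a stand-in for the idea, not the script that ships with EAGLE.

```python
def serpentine_needed(lengths_mm: dict, tolerance_mm: float = 3.0) -> dict:
    """Return, for each net that is too short, the minimum extra length (mm)
    required so that every net sits within tolerance_mm of the longest one."""
    longest = max(lengths_mm.values())
    shortfalls = {}
    for net, length in lengths_mm.items():
        extra = longest - length - tolerance_mm
        if extra > 0:
            shortfalls[net] = extra
    return shortfalls


if __name__ == "__main__":
    # Hypothetical routed lengths in millimetres.
    routed = {"DQ0": 41.2, "DQ1": 39.8, "DQS": 42.0, "A0": 37.5}
    for net, extra in serpentine_needed(routed).items():
        print(f"{net}: add at least {extra:.1f} mm of serpentine")
```

In this made-up example only A0 falls outside the 3mm window, so it is the only net that needs extra meandering.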
The memory controller block is really meant to be used with a much higher layer-count PCB. A non-trivial amount of PCB real-estate was used to re-arrange the signals coming from the FPGA so they matched up with the memory device’s pins. Most of the slower address and control signals are transitioned to the bottom layer, where they are re-positioned. They’re brought back up to the top layer before connecting to the DDR2 part.
The faster data signals are routed completely on the top layer (no vias). A continuous ground plane is maintained under all memory signals on both the top and bottom layers, and lots of vias are used to bridge the upper and lower ground planes in the vicinity of signal layer changes (the power plane has a ground island in it, for this purpose).
There wasn’t quite enough room around the FPGA itself to bring out the complete x16 data interface (at least, not without substantially compromising the ground and power planes). Even with just a x8 interface, compromises had to be made; I wasn’t able to connect the DM (data mask) signal to the DDR2 part, so it won’t be possible to write individual bytes to memory (the smallest writable set of data is a 4-byte word, since DDR2 typically operates with burst lengths of 4).
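Without DM, a single-byte update has to become a read-modify-write of the whole burst. The sketch below models that on a plain bytearray; it assumes the 4-byte burst described above and is only an illustration of the idea, not how the Spartan-6 memory controller actually exposes its interface.

```python
BURST_BYTES = 4  # x8 device with burst length 4, per the text


def write_byte(memory: bytearray, address: int, value: int) -> None:
    """Emulate a byte write on a memory that can only write whole bursts."""
    base = address - (address % BURST_BYTES)
    burst = bytearray(memory[base:base + BURST_BYTES])  # read the full burst
    burst[address - base] = value & 0xFF                # patch one byte
    memory[base:base + BURST_BYTES] = burst             # write the full burst back


if __name__ == "__main__":
    mem = bytearray(range(8))
    write_byte(mem, 5, 0xAA)
    print(mem.hex())
```

The cost is an extra read for every sub-word write, so designs that mostly move whole words (like streaming image data) lose very little.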
The memory controller monopolizes the vast majority of I/O Bank 3 on the FPGA.
High-speed LVDS is my preferred method of interconnecting high-bandwidth logic. With my FMC-LPC to SATA adapter board, I standardized on using SATA connectors and cables to provide a cost-effective way of transporting LVDS signals. Thus, SATA connectors are used here.
There are 9 SATA connectors in total, comprising 18 differential pairs (36 signals). 4 of the connectors are connected to Bank 2 of the FPGA, while the remaining 5 are connected to Bank 0 of the FPGA. These two banks are used almost exclusively for LVDS (with the exception of one lone configuration signal).
Since SATA cables are nominally 50-ohm (100-ohm differential) mechanisms, I didn’t have the luxury of using 6-mil 75/150-ohm signals on the PCB. Each LVDS pair is routed as a wider length-matched (more serpentining!) 100-ohm differential pair.
Being the potentially highest-speed signals in the design, all of the LVDS pairs are routed without layer changes (no vias, again), and they’re all referenced to an unbroken ground plane. Avoiding the use of vias keeps the other layers cleaner (especially the ground and power planes), but has the downside of making it impossible to escape-route more than about 2 rows of BGA pins. Thus, many FPGA I/O pins are left unconnected.
In addition to the high-speed LVDS expansion, the board also features 16 bits’ worth of [relatively] low-speed 3.3V I/O. I’m using a Gadget Factory “Wing” style expansion connector, as seen on the popular Openbench Logic Sniffer and Gadget Factory’s own Butterfly/Papilio One board.
The Wing connector exports 16 I/O lines and 3 different power rails (5V, 3.3V, and 2.5V) on an easy-to-use 0.1″ header. I’ll be using this style connector on any of my future boards that don’t have high speed requirements (as I’ve already done on my NES cartridge adapter board).
Most Wing peripherals tend to be 3.3V devices, so these low-speed lines connect to their own dedicated 3.3V I/O bank on the FPGA (Bank 1). The 45nm Spartan-6s are, however, a bit picky when it comes to 3.3V signaling (their 40nm Virtex-6 siblings don’t even support 3.3V signaling). An IDT Quickswitch is used to prevent minor voltage spikes from making it to the FPGA (and also allows for 5V peripherals to be connected directly to the board without additional translation). By supplying 4.3V to the Quickswitch, overshoots are limited to 3.3V (as outlined in IDT’s 5V/3V interfacing application note).
In addition to the FPGA, there’s also a USB-enabled Atmel AVR microcontroller (an ATmega32U2). Its primary responsibility is configuring the FPGA.
The AVR’s SPI port is connected to the FPGA’s JTAG port, giving it complete control over the configuration of the FPGA. For post-configuration communication, 5 I/Os from the FPGA are connected to the AVR’s USART (which can be used as a UART or an SPI master). The FPGA’s JTAG port could also be used for communication by instantiating a special boundary-scan primitive in the design.
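Whatever firmware ends up driving the JTAG pins has to keep track of where the FPGA's TAP controller is. As a rough illustration, here is the standard IEEE 1149.1 TAP state machine modeled in Python. This is not the AVR firmware (none exists yet); the function and table names are made up for the sketch.

```python
# Next-state table for the IEEE 1149.1 TAP controller: state -> (TMS=0, TMS=1)
TAP_NEXT = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle":    ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan":   ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR":       ("Shift-DR", "Exit1-DR"),
    "Shift-DR":         ("Shift-DR", "Exit1-DR"),
    "Exit1-DR":         ("Pause-DR", "Update-DR"),
    "Pause-DR":         ("Pause-DR", "Exit2-DR"),
    "Exit2-DR":         ("Shift-DR", "Update-DR"),
    "Update-DR":        ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan":   ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR":       ("Shift-IR", "Exit1-IR"),
    "Shift-IR":         ("Shift-IR", "Exit1-IR"),
    "Exit1-IR":         ("Pause-IR", "Update-IR"),
    "Pause-IR":         ("Pause-IR", "Exit2-IR"),
    "Exit2-IR":         ("Shift-IR", "Update-IR"),
    "Update-IR":        ("Run-Test/Idle", "Select-DR-Scan"),
}


def walk(state: str, tms_bits) -> str:
    """Clock a sequence of TMS values (one per TCK) and return the final state."""
    for tms in tms_bits:
        state = TAP_NEXT[state][1 if tms else 0]
    return state


if __name__ == "__main__":
    # Five clocks with TMS high reset the TAP from any starting state.
    print(walk("Shift-DR", [1, 1, 1, 1, 1]))
    # From reset, this TMS pattern reaches Shift-IR, ready to load an instruction.
    print(walk("Test-Logic-Reset", [0, 1, 1, 0, 0]))
```

A host-side model like this is handy for sanity-checking TMS sequences before committing them to microcontroller code.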
Also connected to the AVR’s USART is a 2MB SPI flash device. I had originally planned to use a micro-SD card for non-volatile storage, but the physical size of the holder was prohibitive. The flash device’s main purpose is to store configuration bitstreams for the FPGA. The AVR will read the configuration from the flash and write it to the FPGA at power-up.
While the FPGA is actually capable of natively loading its configuration from a variety of SPI flashes, it requires the connection of several additional signals to the FPGA. I didn’t have any room left to route them without sacrificing some of the board’s expansion capabilities. Having the AVR perform configuration is a small price to pay for improved expandability.
Most of the FPGA’s configuration signals are on a 2.5V domain, so level translation is required for interfacing with the 3.3V AVR. The speed requirements aren’t especially high, as the AVR is limited to running at 8 MHz. Thus, one of TI’s bidirectional direction-sensing translators was selected (a TXS0108E). Due to the presence of pull-ups on some of the JTAG lines, I had to be especially careful in my translator selection.
Many voltage-level translators (like the similar TXB0108) rely on internal weak-keeper circuits to hold the logic level on either side of the device; if there are any pull-ups/downs on the line, they can override the keepers and cause the translator to incorrectly believe that a device is trying to actively drive the bus (when, in fact, the bus is still being actively driven from the other side). This rarely ends well.
Both the AVR and the FPGA have programming/JTAG interfaces that need to be externally accessible for development purposes. The AVR has a proprietary serial programming interface (accessed by the same pins that comprise its SPI port), while the FPGA has a true JTAG interface. They both use very similar signals (a reset/mode line, a clock, a transmit line, and a receive line), and both can be driven by a no-frills parallel-port cable.
One interesting consequence of having the AVR’s SPI/programming port connected to the FPGA’s JTAG port is that they can pretty easily share a single JTAG header on the board, which is what I’ve done here. A jumper is provided that disables the level-translator for the FPGA’s JTAG port (and simultaneously enables a driver that hooks the JTAG header’s TMS line to the AVR’s reset line), thus allowing the AVR to be initially programmed. A smaller header is provided to give access to the AVR’s debugWire port (which can only be used after setting some of the AVR’s fuse bits via the conventional programming port).
To connect to the FPGA via JTAG, the aforementioned jumper is removed. This connects the JTAG header to the FPGA via the 3.3V to 2.5V translator chip. The AVR can optionally be held in reset, to prevent contention.
Once suitable AVR firmware is developed, it should be possible to conduct most development work via the AVR’s USB port (as the AVR can both reprogram itself, and configure the FPGA). The only need for the development headers would then be in-circuit-debug (e.g. using GDB with the AVR, or ChipScope Pro with the FPGA).
This board has no fewer than 5 different voltage levels on it:
- 5V – Wing expansion connector and Quickswitch translator
- 3.3V – AVR, flash memory, main oscillator, and Bank 1 of the FPGA (for the Wing I/O)
- 2.5V – FPGA auxiliary logic, and I/O banks 0 and 2 (for LVDS expansion)
- 1.8V – DDR2 memory and Bank 3 of the FPGA (connected to the DDR2 part)
- 1.2V – FPGA core logic
The 5V rail is easy, as it’s supplied externally. Power can either be sourced from the USB port, or from a separate 2-pin power header. A jumper makes the selection.
The 3.3V rail powers relatively slow and low-power logic, so it’s furnished by a simple SOT-223 packaged linear regulator (an MCP1826S).
The 2.5V and 1.2V rails are expected to be the most power-consumptive in typical applications; the 2.5V rail powers all of the high-speed off-board I/O, while the 1.2V rail powers all of the FPGA’s core logic. Thus, both of these rails are supplied by switching regulators. I didn’t want to spend a lot of time designing and laying out a couple of SMPS units (I’ve been there before), so I turned to National Semiconductor’s LMZ10503 – a member of their “simple switcher” series. While expensive, the device integrates all of the major components for an SMPS (including the controller, power switches – and even the inductor). I merely had to supply decoupling capacitors and a feedback network.
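The feedback network is just a resistor divider, and sizing it is a one-line calculation. The sketch below assumes a 0.8V feedback reference (check this against the LMZ10503 datasheet before trusting it) and an arbitrary 10k bottom resistor; neither value comes from this board's schematic.

```python
def top_resistor(v_out: float, r_bottom: float, v_ref: float = 0.8) -> float:
    """Top feedback resistor for a divider obeying
    v_out = v_ref * (1 + r_top / r_bottom).
    v_ref defaults to an assumed 0.8 V reference."""
    if v_out < v_ref:
        raise ValueError("output cannot be set below the reference voltage")
    return r_bottom * (v_out / v_ref - 1.0)


if __name__ == "__main__":
    R_BOTTOM = 10_000.0  # hypothetical bottom resistor, in ohms
    for rail in (1.2, 2.5):
        r_top = top_resistor(rail, R_BOTTOM)
        print(f"{rail} V rail: top resistor of about {r_top:.0f} ohms")
```

In practice the computed value gets rounded to the nearest standard resistor, so the actual rail voltage should be recomputed from the parts that are really fitted.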
The 1.8V rail is only responsible for powering the DDR2 memory device and interface. For this, I’m using another MCP1826S linear regulator to drop the 2.5V rail down a little to 1.8V. While the power consumption of a fully active 800 Mbps DDR memory interface isn’t something to be idly ignored, there shouldn’t be too much of an overall efficiency loss from using a linear here. If I were designing a cell-phone, where efficiency is critical for battery-life, I would reconsider the use of an SMPS for this rail.
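To put a number on that efficiency loss: a linear regulator burns the voltage drop times the load current as heat. The load current below is a made-up figure for illustration, and the sketch ignores the regulator's own quiescent current.

```python
def linear_regulator_loss(v_in: float, v_out: float, i_load: float):
    """Return (watts dissipated in the regulator, efficiency as a fraction),
    ignoring quiescent current."""
    dissipated = (v_in - v_out) * i_load
    efficiency = v_out / v_in
    return dissipated, efficiency


if __name__ == "__main__":
    I_LOAD = 0.2  # hypothetical DDR2 rail current, in amps
    watts, eff = linear_regulator_loss(2.5, 1.8, I_LOAD)
    print(f"dissipated: {watts:.2f} W, efficiency: {eff:.0%}")
```

Feeding the regulator from 2.5V rather than 5V is what keeps the drop small; the same load from a 5V input would waste several times as much power.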
The AVR has control over the enable lines for the 1.2V and 2.5V switchers (and, thus, also controls the 1.8V rail). The 3.3V rail can’t be turned off, since it powers the AVR; nor can the 5V rail, since it powers everything. In order to allow the AVR to completely shut off power to the FPGA and expansion connectors, two small MOSFETs are used to provide switched versions of the 3.3V and 5V rails.
Many of the power rails are routed on the dedicated power plane (layer 3; adjacent to the bottom copper layer). The 1.8V, 2.5V and 3.3V (un-switched and switched) rails are all on the power plane. The power plane also hosts a small ground island for the DDR2 memory signals.
The 1.2V rail is routed exclusively on the bottom layer. It consists of a copper pour that extends under the FPGA, where vias connect it directly to the FPGA.
Quite a lot of attention went into making sure the power and ground planes were as contiguous as possible – especially in the vicinity of the DDR signals. The large vias required by low-cost 4-layer services tend to cause planes to be substantially broken when used in tight groupings (as one does under a BGA).
Again: I’ve only just sent this board out for fabrication. I should be receiving the completed boards in a week or so, but it may be a while before I actually get around to assembling them. Rest assured, however, that I’ll eventually be making a follow-up post about the assembly of the board – no matter how gruesome those BGA packages turn out to be…