Another piece of my ongoing FPGA stereo-vision project. This board is, as the name suggests, a breakout board for Aptina’s excellent MT9V032 1/3″ VGA image sensor.
The board’s main purpose in life is to connect the LVDS output of the MT9V032 sensor to my FMC-LPC to SATA adapter board, which would then route the LVDS data into one of Xilinx’s Spartan-6 FPGA development boards. Multiple camera boards would be connected to support stereo vision.
There’s more to the board than simple signal breakout, however.
The MT9V032
There are a few big reasons for selecting the MT9V032 – but the biggest, by far, is its global shutter (“TrueSNAP” in Aptina’s marketing parlance). The vast majority of CMOS image sensors use a simpler rolling shutter, which leads to all sorts of troublesome visual artifacts. The MT9V032, on the other hand, is able to expose all of its pixels for the exact same slice of time, making it immune to many of those artifacts. That doesn’t, of course, do any good for more conventional artifacts – like motion blur; only shorter exposures can help with that.
Which brings up another point: the MT9V032 (for its size) has very large and sensitive pixels. It only has 752×480 of them (wide-VGA), but they’re each 6µm x 6µm – compared to, for example, the diminutive 1.75µm photo-sites in Aptina’s 2592×1944 (5 Mpx) MT9P013. The upshot is that the MT9V032 can take usable images in lower light conditions, and it can use higher shutter speeds to reduce motion blur.
Beyond raw sensitivity, the MT9V032 supports a form of high-dynamic-range exposures. By progressively reducing the sensitivity of its pixels over the course of an exposure, the sensor is able to approximate a non-linear response. This significantly increases the dynamic range that the sensor can capture in a single exposure – making it much easier to operate in environments with wildly varying lighting conditions (e.g. outside on a sunny day).
For a mobile robot operating in an unstructured outdoor environment, each of those features is pretty compelling (perhaps even mandatory). Finding them all in one device is even better. Those features come at a (monetary) cost, however – in small quantities, it’s easily twice the price of less-functional VGA sensors. That’s a trade-off that I’m willing to make.
The MT9V032 has one other interesting feature: it supports outputting all of its image data over a single high-speed (320 Mbps) LVDS pair. In a stereo vision setup, it even supports cascading two image sensors so that they can share the same LVDS pair.
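(As an aside, the 320 Mbps figure follows directly from the pixel rate: if I’m reading the datasheet correctly, each pixel is framed as 12 serial bits – a start bit, 10 data bits, and a stop bit – and 26.67 MHz × 12 = 320 Mbps.)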
LVDS doesn’t offer any direct benefit to the robot’s capabilities, but it does dramatically reduce the amount of wiring required to connect to each image sensor – which has the potential benefit of allowing remote sensors to be placed in more ideal locations. This breakout board capitalizes on the LVDS capability of the MT9V032 by allowing its entire interface to be handled by a single 4-signal SATA cable (with the exception of power, which is supplied separately).
Design
That 4-signal interface is the main reason for the board’s additional complexity. Two of the signals are immediately claimed by the LVDS video stream. One signal is needed to convey the 26.67 MHz reference clock to the image sensor – which can’t be locally generated, since the standard (non-MGT) deserializer blocks in the Spartan-6 FPGA are unable to perform embedded clock recovery (furthermore, precisely synchronizing multiple cameras in a stereo pair would be considerably trickier without a common reference clock). That leaves just one signal to configure and control the sensor.
The sensor uses an I2C interface for most configuration, and has a handful of additional control signals that may come in handy for certain applications (low-power standby controls, external exposure triggers, etc.). Placing a small 8-bit AVR microcontroller (an ATmega168) on the board was an easy way to provide local control over all of the MT9V032’s signals. To communicate back to the main FPGA, the one remaining signal is used as a bidirectional half-duplex serial bus (it’s hooked to both the transmit and receive lines of the AVR’s UART).
The image sensor and the AVR are nominally 3.3V devices with 3.3V I/Os (except for the MT9V032’s LVDS output – which is, well, LVDS). The Spartan-6 FPGA on the other side of the SATA cable is, however, set up for 2.5V signaling (which Xilinx FPGAs require for LVDS transmission). As a result, some level translation is also needed on this board – and it needs to be pretty fast, too, since both the fast ~27 MHz (54 Mbps) clock and the slow serial bus need to be translated. TI has a whole slew of nice voltage translators – including the SN74LVC1T45 dual-supply bidirectional translator, which I’ve used on this board. It’s rated for well over 100 Mbps.
The translators don’t have automatic direction sensing (a feature which typically imposes a significant performance penalty). The AVR is, thus, tasked with manually controlling the direction of each translator. Normally, the clock signal will always be received by the board – still, there are provisions for (temporarily) co-opting the clock line as an additional bidirectional communication line with the FPGA. The serial line’s direction must, of course, be suitably switched by the AVR depending on whether it is transmitting or receiving (a trivial burden, as the AVR must already do this with its own TX/RX pins).
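Here’s roughly what that looks like in firmware – a minimal half-duplex transmit sketch for the ATmega168, where the translator DIR pin assignment (PD2, and its polarity) is a hypothetical stand-in, and the USART and DDRD are assumed to be initialized elsewhere:

    #include <avr/io.h>

    #define XLAT_DIR PD2   /* hypothetical: serial translator's DIR pin */

    /* Send a buffer over the shared serial line. TXD and RXD are tied
     * together on the board, so the receiver is disabled while we
     * transmit (otherwise we'd hear our own bytes echoed back). */
    static void serial_send(const uint8_t *buf, uint8_t len)
    {
        PORTD |= _BV(XLAT_DIR);             /* point translator at the FPGA */
        UCSR0B = _BV(TXEN0);                /* TX on, RX off */
        for (uint8_t i = 0; i < len; i++) {
            while (!(UCSR0A & _BV(UDRE0))) ;    /* wait for buffer space */
            UDR0 = buf[i];
        }
        while (!(UCSR0A & _BV(TXC0))) ;     /* wait for the final stop bit */
        UCSR0A |= _BV(TXC0);                /* clear the completion flag */
        UCSR0B = _BV(RXEN0);                /* back to listening */
        PORTD &= ~_BV(XLAT_DIR);            /* translator drives toward us */
    }

The TXC0 wait matters: flipping the translator around before the last stop bit has physically left the UART would garble the tail of the transmission.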
2.5V power for the translators (they’re the only devices on the board that need it) comes from a small SOT-23 linear regulator (an MCP1700). 3.3V power for the rest of the logic also comes from a linear regulator (a larger, SOT-223-packaged ZLDO1117). The analog supplies to the MT9V032 are run through ferrites, to remove some of the noise introduced by the high-frequency digital logic on the board (the AVR and the MT9V032’s own digital logic). The input to the board is nominally 5V – but, since this supply only drives the 3.3V regulator, it can tolerate significant variation (subject to the ZLDO1117’s input voltage and power dissipation limits). Current draw of the board when running flat-out (60 FPS with LVDS enabled) is on the order of 100 mA, so keeping the input voltage low reduces power loss in the linear regulator.
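(To put numbers on that: dropping from 5V to 3.3V at 100 mA dissipates (5.0 − 3.3) V × 0.1 A ≈ 170 mW in the regulator – already about half of what the 3.3V logic itself consumes.)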
Having a dedicated microcontroller on the board opens up additional possibilities, as well. The AVR’s two unused ADC/GPIO pins are brought out to an expansion header for future use. One could, for example, use a lens with an integrated variable iris/aperture – which the AVR could dynamically adjust in response to extreme changes in ambient lighting (e.g. transitioning between deep shade into direct sunlight).
Regarding lenses: the board is designed to work with two different lens holders from Sunex. The CMT821 (seen in the photo) and the CMT103 are both supported. They’re both M12x0.5 (S-mount) lens holders, with the main difference between them being their height (lenses of different focal lengths and construction may require different mounting heights to achieve critical focus).
Currently, a DSL903B 6.6mm lens is attached (largely because Sunex was offering them at a hefty discount when I originally bought the lens holders). On the MT9V032, this works out very nearly to a normal focal length (~50mm in 35mm format terms). In practice, a wider angle lens may be needed to give a wide enough field of view for robot navigation (at the cost of reduced stereoscopic depth resolution).
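(The arithmetic: 752×480 pixels at 6µm gives a ~4.5mm × ~2.9mm active area, or a ~5.4mm diagonal – roughly 1/8th of the ~43.3mm diagonal of a 35mm frame. The 6.6mm lens therefore frames like a 6.6 × 8 ≈ 53mm lens would on a 35mm camera.)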
Layout
Layout was easy; space wasn’t at a premium, and there were few components to place and signals to route (owing in no small part to the MT9V032’s LVDS interface, which obviates the need to route any of the sensor’s parallel outputs). The majority of the signals are on the top layer, with just a few breaking up the otherwise continuous ground fill on the bottom layer. The sensor’s LVDS output is routed as directly as possible to the SATA connector, and the 27 MHz clock is routed so as to minimize reflection-inducing stubs.
The board has mounting holes for the camera lens holder, and also has four well-defined mounting holes with 1″ spacings – making it easy to mount to some sort of development structure, for experimenting with various multi-camera setups.
The board was fabbed through Laen’s excellent 2-layer PCB service (just under $15 for 3 copies of this 1.7″x1.7″ board!). Assembly wasn’t a problem. I used a laser-cut Kapton stencil from OHARARP to apply solder paste to the board, loaded all of the parts, and then reflowed the whole thing in my modified toaster oven (as I have done before).
Firmware
The firmware for this board doesn’t have to do a whole lot at this point. Currently, its primary task is reconfiguring the MT9V032 in order to enable the LVDS output. The hardest part was tracking down a minor problem on the PCB: I accidentally swapped the data and clock lines for the I2C bus in the schematic (one of two known issues with the board – the other being a typo on the bottom silkscreen). This prevented me from using the AVR’s built-in I2C peripheral, so I had to resort to writing a simple bit-banged I2C routine. The AVR has plenty of cycles to spare, so there isn’t much concern over the efficiency lost from not using the hardware peripheral.
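For anyone in the same boat, a bit-banged master only takes a few dozen lines. The sketch below shows the general shape of such a routine – the pin assignments (PC4/PC5) are hypothetical stand-ins, and the 0x90 device address assumes the sensor’s S_CTRL address pins are strapped low (check the datasheet for your strapping):

    #ifndef F_CPU
    #define F_CPU 8000000UL            /* assumed AVR clock; match your build */
    #endif
    #include <avr/io.h>
    #include <util/delay.h>

    #define SDA_BIT PC4                /* hypothetical pin assignments */
    #define SCL_BIT PC5

    /* Open-drain emulation: drive a line low by making the pin an output
     * (PORT bit preset to 0); release it by making the pin an input, so
     * the external pull-up takes the line high. */
    static void sda_lo(void) { PORTC &= ~_BV(SDA_BIT); DDRC |= _BV(SDA_BIT); }
    static void sda_hi(void) { DDRC &= ~_BV(SDA_BIT); }
    static void scl_lo(void) { PORTC &= ~_BV(SCL_BIT); DDRC |= _BV(SCL_BIT); }
    static void scl_hi(void) { DDRC &= ~_BV(SCL_BIT); }
    static void bit_delay(void) { _delay_us(5); }    /* ~100 kHz bus */

    static void i2c_start(void)
    {
        sda_hi(); scl_hi(); bit_delay();
        sda_lo(); bit_delay();          /* SDA falls while SCL is high */
        scl_lo(); bit_delay();
    }

    static void i2c_stop(void)
    {
        sda_lo(); bit_delay();
        scl_hi(); bit_delay();
        sda_hi(); bit_delay();          /* SDA rises while SCL is high */
    }

    /* Clocks out one byte, MSB first; returns nonzero on a NACK. */
    static uint8_t i2c_write_byte(uint8_t b)
    {
        for (uint8_t i = 0; i < 8; i++, b <<= 1) {
            if (b & 0x80) sda_hi(); else sda_lo();
            bit_delay(); scl_hi(); bit_delay(); scl_lo();
        }
        sda_hi();                       /* release SDA for the ACK bit */
        bit_delay(); scl_hi(); bit_delay();
        uint8_t nack = PINC & _BV(SDA_BIT);
        scl_lo(); bit_delay();
        return nack;
    }

    /* MT9V032 writes: device address, 8-bit register, 16-bit data MSB-first.
     * 0x90 assumes both S_CTRL address pins are tied low. */
    static uint8_t mt9v032_write(uint8_t reg, uint16_t val)
    {
        i2c_start();
        uint8_t err = i2c_write_byte(0x90);
        err |= i2c_write_byte(reg);
        err |= i2c_write_byte(val >> 8);
        err |= i2c_write_byte(val & 0xFF);
        i2c_stop();
        return err;
    }

With mt9v032_write() in place, the firmware has everything it needs to poke sensor registers (e.g. to bring the LVDS transmitter out of power-down).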
Eventually, the firmware will be enhanced to give the FPGA some amount of control over the image sensor. For stereo-vision, you (ideally) want to have all cameras perfectly synchronized and with matched exposures. Synchronization is easy; if configured identically, clocked from the same source, and removed from reset simultaneously, the image sensors will operate in lock-step. Normally, exposure is automatically adjusted by the MT9V032; by giving the FPGA control over this, however, it should be possible to achieve slightly better correlation between multiple sensors.
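As a rough sketch of how the simultaneous reset release could be coordinated: assume (hypothetically) that the sensor’s active-low reset hangs off an AVR pin (PD3 here), and that the FPGA broadcasts an arbitrary ‘go’ byte to every camera board at once over the serial lines:

    #include <avr/io.h>

    #define SENSOR_NRST PD3    /* hypothetical: MT9V032 reset, active low */

    static uint8_t serial_recv(void)
    {
        while (!(UCSR0A & _BV(RXC0))) ;   /* block until a byte arrives */
        return UDR0;
    }

    void synchronized_reset_release(void)
    {
        PORTD &= ~_BV(SENSOR_NRST);       /* hold the sensor in reset */
        DDRD  |=  _BV(SENSOR_NRST);
        while (serial_recv() != 'G') ;    /* wait for the FPGA's broadcast */
        PORTD |=  _BV(SENSOR_NRST);       /* every board releases together */
    }

A byte-framed broadcast is only simultaneous to within the AVRs’ polling jitter – but since all sensors then run from the same reference clock, whatever small offset remains is at least constant.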
Downloads
The EAGLE libraries I used to make this board are available in my eagle-lbr Mercurial repository (or directly download a ZIP file). The MT9V032 footprint is in the ‘dan-aptina.lbr’ library.
The EAGLE schematic and PCB layout for the MT9V032 LVDS camera board is in my eagle_mt9v032 repository (direct ZIP download).
For those without EAGLE, you can download PDF versions of the schematic and layout.
The current AVR firmware is available in my mt9v032 repository (direct ZIP download).
That’s really a great post. I have learned a lot. Please keep up the good work, and just let me know if anything pops up that I can help with!
Thanks Rifat! I’m glad you liked it!
Good stuff.
Nice! How do you read the image (for testing the MT9V032)? Do you use a Linux driver for this?
I don’t currently have a direct means to read/view the image produced by the sensor; ultimately, it’ll be read by the host PC over PCIe (a custom Linux driver will facilitate this). (“read by the host” meaning: host tells the FPGA where to DMA to, and then the FPGA writes directly to the host’s memory over PCIe).
I had previously considered using DVI/VGA as a simple way to output video before I had PCIe working (and had alluded to that notion in my commentary for the development photo), but later found that this would be non-trivial (the DVI codec requires additional configuration and data formatting) – or, at least, more involved than the trivial VGA driver I used for my FPGA NES.
I’ve since taken a step back and decided to prioritize development of things that will be more generally useful (e.g. PCIe, which gives me bidirectional access to any data produced/consumed by the FPGA), rather than detouring to get DVI working (which would only let me visually confirm that things appear to be working, but serves little purpose beyond that – for this project, at least).
That being said, what I’ve developed thus far lets me confirm that the PCB and the MT9V032 should be fully functional; the sensor outputs a valid LVDS stream with embedded horizontal and vertical syncs of the expected frequency (which the FPGA is able to lock to and deserialize). The thing that I can’t yet confirm is that the pixel data is actually visually meaningful (that is, you could hypothetically have a sensor with a busted analog section that still passes this simple test).
I believe Dan uses the DVI output of the SP605 and directly feeds the image to a monitor. If you check the second photo, you can see the DVI connector.
That had been an intermediate goal at one point, but I haven’t yet put together the infrastructure to actually drive the DVI codec on the SP605 board. I have, at least, loaded one of Xilinx’s self-test bitstreams that generates test patterns over DVI – which is probably the reason that I had the DVI cable hooked up in that photo.
Hello Dan,
I am wondering about C mount lenses. Do you think there may be an extra benefit to using a C mount lens instead of an S mount one? I have been reading about lenses lately, and it seems to me that for CMOS sensors that are 1/3″ or smaller, there is no benefit to C mount lenses – S mount lenses with similar specs cost less. Do you think that I am missing something here?
If I made a statement like “C mount lenses are better (they are at least bigger) than S mount lenses when you have a CMOS image sensor bigger than 1/3″”, would there be any truth in it? :D
I would be happy if you can share your knowledge about this subject.
thanks a lot
Yeah, I’d tend to agree with that. The general rule for lenses is, of course: bigger/heavier/more-expensive lenses are better than smaller/lighter/cheaper lenses.
But you’ll rapidly run into diminishing returns with small/low-resolution image sensors. I’m no expert about lenses for small sensors, so I can’t really be more specific than that.
Ideally, you’d like some form of objective way of comparing lens quality – like MTF charts – but good luck finding such things for CCTV lenses! Many manufacturers will give their lenses resolution ratings (e.g. a “1.3 megapixel” lens); I wouldn’t generally trust those figures, but they might be useful for comparing quality between lenses from a specific manufacturer.
Really, I’d expect other design considerations to play a larger role – like physical size. The C-mount itself (ignoring the optics entirely) is, after all, considerably bulkier than other small-sensor mounts (like S-mount).
The biggest thing that concerns me about S-mount is focusing; most S-mount lenses are focused via the mounting threads, rather than something on the lens. Again, it’s probably not a big deal with VGA sensors, where focus isn’t critical (you have huge amounts of depth-of-field) – but it could easily be a limiting factor for multi-megapixel sensors.
Hey Dan,
All very interesting – good-looking PCB, too. I was wondering if you tried running the MT9V032 at more than 26.67 MHz (essentially a higher FPS). If so, does it work?
I haven’t tried overclocking the MT9V032, as it were (and don’t plan on it, as 60 FPS at full resolution is more than adequate as it is). I’d speculate that the PLL used for LVDS mode would be an early limiting factor (so use parallel output mode if possible).
If you’re just looking for increased frame-rate, and can sacrifice some resolution, you should be able to achieve that by reducing the size of the sampled window, and/or by enabling row-binning (I haven’t tried this myself, so I can’t guarantee it works – a rough register sketch follows at the end of this reply). The MT9V032 documentation indicates that if you row-bin by 4 (effective output resolution of 752×120), you’ll achieve a 4x FPS gain (240 FPS!). Column binning doesn’t increase frame-rate on the MT9V032.
Binning only works (correctly) on monochrome sensors (on color sensors, sensels with different color filters are erroneously combined). Windowing should work on either.
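For the curious, here’s what that configuration might look like, reusing the hypothetical mt9v032_write() helper sketched in the firmware section above. The register addresses and bitfields reflect my reading of the datasheet (0x02/0x03 for the row window, 0x0D for read mode, with the row-bin select in its low bits) – this is untested, so verify against your datasheet revision:

    mt9v032_write(0x0D, 0x02);   /* read mode: row-bin by 4 -> 752x120 out */

    /* or, instead, window down to the middle 240 rows for a ~2x gain: */
    mt9v032_write(0x02, 124);    /* row start (the default offset is 4) */
    mt9v032_write(0x03, 240);    /* window height, in rows */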
Hi Dan,
this is a great project! I am looking for exactly what you are doing, but I think that for good stereo vision, the cameras need to be triggered. I believe the cameras have the option to acquire still images based on a trigger pin. It would also be good to add frame counters to the images, so that the data from the cameras can easily be matched by frame id on the PC side. Do you think triggering, and ideally adding frame counters, would be possible? Or would there be any bigger problems?
-Felix
Yeah, absolutely – synchronization is highly desirable for stereo vision, so it was a major requirement in designing this board.
In my system, synchronization is achieved by running all cameras off of the same reference clock, and ensuring that they are configured identically. This is basically the approach recommended in the MT9V032 datasheet.
The MT9V032 sensors do have trigger pins, but the datasheet isn’t entirely clear on whether you can reach 60 FPS when using them. With the MT9V032, the highest frame rate is achieved by overlapping the exposure of a frame with the readout of the previous frame; the datasheet seems to indicate that this overlapped operation is only possible in the self-timed “Simultaneous Master Mode”, which precludes external triggering.
That restriction doesn’t make sense to me, so it’s something I’ve been meaning to test, but it’s not a priority (since I intended for the system to not rely on triggering anyway).
As for frame counters: sure, you could certainly associate metadata with an image (frame count or time stamp). I hadn’t really planned on it, since the assumption is that everything is sufficiently in sync so as to not require shuffling frames around on the PC side. The processing pipeline that I’ll be using in the FPGA will, by design, keep everything together.
Or do you mean superimposing a visible count on the actual image? (like those goofy pre-digital cameras that didn’t have a better way of recording a timestamp) This board isn’t capable of doing that (the microcontroller can’t really modify the pixel stream, other than making exposure adjustments to large parts of the image). The FPGA on the receiving end could certainly do it, but there are much better ways of tracking such data without visibly altering the image.
Hi Dan,
We have a board up and running now with the same sensor you are using. It works consistently from frame capture to frame capture, but we see an odd image: the first few columns on the left side of the image are compressed, and about the right 1/4 of the image is compressed relative to the middle part (or the middle part is expanded). I am receiving all 752×480 pixels out of the sensor (I am also using the LVDS serial video output), which is good. I know that the chip can co-add 2×2 or 4×4 groups of pixels, but if it were doing something like that, I would think it would affect both the x and y look of the image, not just the x direction.
I have a call into Aptina, but I am not holding my breath for them to answer. If you have seen this or have any ideas, that would be great.
Here is an image of my co-worker Matt:
And here is our test setup:
It is a Xilinx SP601 main board with a custom FMC daughter card that we designed (the MT9V032 is under the lens on that board). The video is captured by the FPGA, re-formatted, and sent out over Camera Link to a National Instruments frame grabber connected to a laptop via an ExpressCard interface. There is also an OmniVision sensor on that board, to the left of the lens, that we are starting to bring up as well. That one is supposed to be more sensitive, which is important for our application, but this test will allow a true side-by-side comparison of them.
Hey Gary, I am new to sensors and am doing something similar using the Xilinx Spartan-3A instead of the Xilinx SP601. Is there any chance that you would be willing to share your VHDL code, so I could compare it to mine? I am working on fixing the grouping problem as well.
And here is a capture I just did with concentric circles, where the effect is easier to see. It almost looks like the center area is stretched and the left and right sides are shrunk.
As an FYI – we are running all defaults at power up, so it is outputting frames at 60 Hz. The only register I poke is 0xB1, to enable the LVDS output so that we can lock on and capture data.
Hi Dan
Nice writeup (as always)! What would be very interesting for me is to see how you deserialized the LVDS stream inside the FPGA. If it’s a simple matter, then I would also replace my cameras with a similar model.
By the way, I was very impressed by your stereo vision system. I’m also currently developing something similar. You could call it a vision preprocessor: two cameras are connected to the FPGA, and the image stream along with other information (depth map, optical flow, covariance matrix, spatial derivatives, …) is sent via ethernet to a host PC.
Writing the algorithms was fun. Integrating them into the whole system wasn’t. That’s the point where you leave the scope of a simulator, which leaves ChipScope as the only debugging tool. Very time-consuming work sometimes, as every single change to the system keeps the PC busy for 3 hours building the system…
Greetings,
Ben.
Hi Dan
I really liked your writeup.
When I try to use the MT9V032 in snapshot mode with an external trigger pulse on the EXPOSURE pin, I see that the sensor is level-triggered rather than edge-triggered, as the Aptina datasheet says it should be. So if the EXPOSURE pin is held high for too long, the sensor will retrigger.
Has anybody else seen this?
With a reduced image size, I have snapped at above 60 FPS.
I’m looking for an LVDS camera as a component to be used in an instrument, one that will interface to a DS92LV2422 deserializer and, ultimately, a camera port on a D3730-based SBC. If you’ve pursued your design to the point where it actually works with Aptina ARM9 drivers for this sensor, perhaps you’d be interested in contacting me. Thanks.
Dan,
I’m looking for an LVDS camera design to be used as a component in an instrument, which will need to interface to a NS DS92LV2422 and, ultimately, a D3730 SBC running Linux. If you’ve pursued this design to the point where you are comfortable with its capabilities and would like to see it in a product, please contact me. Thanks.
Pete VJ
Just wondering where I can buy the MT9V032 in single quantities? I would like to prototype up something similar.
Digikey!
http://search.digikey.com/us/en/products/MT9V032C12STC/557-1237-ND/1553334
Cool project! How did you deal with demosaicing? My understanding is that Aptina global shutter products all output the straight sensor raw values and require demosaicing. Did you do that on the FPGA?