64 — FPGA-based ORB Feature Extraction for Real-Time Visual SLAM

Fang et al. (arXiv:1710.07312)

Read on 24 October 2017
#slam  #computer-vision  #realtime  #fpga 

One of the first solo advanced coding projects on which I ever embarked was a 3D scene reconstruction tool for stereoscopic images. It turns out that this problem is really hard and really complex, and I think the half-baked code is still around somewhere on a CD or some other antiquated storage medium.

The most computationally challenging task in this simultaneous localization and mapping (SLAM) process is feature extraction: finding parts of an image that are unique or interesting, so that you can find them again in another image and correlate the two using the point cloud of shared features. This step is expensive: on an ARM v8 quad-core processor, the authors were able to achieve only 20 FPS on 640×480 images.
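For a sense of what this step looks like in software, here's a minimal sketch using OpenCV's stock ORB (the same algorithm the paper accelerates, but not the authors' code; the image paths are placeholders):

```python
# Minimal software sketch of ORB extraction + matching with stock OpenCV.
# This is not the paper's implementation; file names are placeholders.
import cv2

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=1000)  # FAST keypoints + rotation-aware BRIEF descriptors
kp_left, desc_left = orb.detectAndCompute(left, None)
kp_right, desc_right = orb.detectAndCompute(right, None)

# Correlate the two images by matching binary descriptors (Hamming distance).
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(desc_left, desc_right), key=lambda m: m.distance)
print(f"{len(matches)} putative correspondences")
```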

To work in real time with high temporal acuity (for use on self-driving cars, for instance), a faster system is needed. Fang et al. implement the ORB algorithm (a common feature extractor) on an FPGA: they colocate 2D pixel neighborhoods in memory so an entire window can be read at once, and they down-sample images into a pyramid of coarser copies (there's no good reason to say 'mipmap' here, but I know the word so I'll use it: mipmap!) so features can be detected at multiple scales. A 7×7 Gaussian filter is also implemented in hardware. All results are transmitted to a dedicated, separate ARM processor.
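To make those tricks concrete, here's a rough software analogy in numpy/OpenCV (standing in for the actual RTL; the pyramid depth and Gaussian sigma are my guesses, not values from the paper):

```python
# Rough software analogy for the hardware tricks, not the authors' RTL.
# The FPGA keeps a few full image rows buffered so a whole 7x7 pixel
# neighborhood is readable at once; numpy's sliding_window_view gives a
# similar zero-copy "colocated window" view. The pyramid levels stand in
# for the down-sampled ("mipmap") copies, and the 7x7 blur mirrors the
# Gaussian filter the paper moves into hardware.
import numpy as np
import cv2

img = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)

# Scale pyramid: repeatedly down-sample (OpenCV ORB's default factor is 1.2).
pyramid = [img]
for _ in range(3):  # pyramid depth is an assumption here
    pyramid.append(cv2.resize(pyramid[-1], None, fx=1 / 1.2, fy=1 / 1.2))

for level in pyramid:
    smoothed = cv2.GaussianBlur(level, (7, 7), 2.0)  # sigma is an assumption
    # windows[y, x] is the full 7x7 neighborhood at (y, x), already
    # "colocated" for the feature detector to consume.
    windows = np.lib.stride_tricks.sliding_window_view(smoothed, (7, 7))
    print(level.shape, "->", windows.shape)  # (H, W) -> (H-6, W-6, 7, 7)
```

On the FPGA, the same effect comes from streaming pixels through row buffers so the detector never waits on memory for a neighborhood read.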

Using this new system, the authors demonstrate a measured 103% increase in frame rate (to a very fluid 67 FPS) at roughly half the latency.

Because ORB feature extraction accounts for around half of the CPU budget in SLAM, this project can dramatically reduce the energy and processing power required to map an environment in real time. I'm not sure 67 FPS is self-driving-car-ready yet, but it's certainly getting close.
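As a back-of-the-envelope check (Amdahl's law applied to the post's own numbers; the fractions are my assumptions, not the paper's):

```python
# Amdahl's-law back-of-the-envelope, assuming ORB extraction is ~50% of
# the SLAM CPU budget and the FPGA speeds that stage up by ~3.35x
# (20 FPS -> 67 FPS). Both figures are assumptions taken from this post.
orb_fraction = 0.5
stage_speedup = 67 / 20  # ~3.35x for the extraction stage alone

overall = 1 / ((1 - orb_fraction) + orb_fraction / stage_speedup)
print(f"whole-pipeline speedup ~ {overall:.2f}x")  # ~1.54x
```

The end-to-end figure is modest, but the bigger win is that the CPU sheds half its workload entirely, which is where the energy savings come from.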