MONOCULAR VISUAL SLAM

Main contributors: Garrick Orchard, Greg Cohen, Xavier Lagorce

Approach

We split the SLAM problem up into two sub-problems. The first is the problem of reliably recognizing and localizing image features on the image plane (see the event-based features page for details). The second is the problem of determining the 3D location of features and the camera in space, which is discussed below.

Our approach to SLAM is based on Andrew Davison's MonoSLAM paper.

As a starting point, we are using the Matlab MonoSLAM implementation.

Sensor Calibration

The first task was to calibrate the sensor, which was done using a checkerboard pattern. Since our sensor only detects changes in the scene, we flashed a checkerboard pattern on a screen while it was viewed by the sensor. Static images were created by averaging each pixel's activity over time, and the Caltech camera calibration toolbox was then used to calibrate the sensor from these images.
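A minimal sketch of how such a static image can be built from the event stream is shown below. The 304x240 resolution and the (x, y, timestamp, polarity) event format are assumptions about the ATIS recording format, not something prescribed by the calibration toolbox; counting events per pixel and normalising is, up to a scale factor, the same as averaging each pixel's activity.

import numpy as np

def events_to_frame(events, width=304, height=240):
    """Accumulate event counts per pixel to form a static image.

    `events` is assumed to be an iterable of (x, y, timestamp, polarity)
    rows; the exact recording format depends on the ATIS toolchain used.
    """
    frame = np.zeros((height, width), dtype=np.float64)
    for x, y, _, _ in events:
        frame[int(y), int(x)] += 1.0
    # Normalise so the image can be fed to a standard calibration toolbox.
    if frame.max() > 0:
        frame = 255.0 * frame / frame.max()
    return frame.astype(np.uint8)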

The calibration stimulus (left) and an image generated by ATIS from the stimulus (right) are shown below.

Figure: calibration stimulus (left) and an example ATIS calibration image obtained using the flashing checkerboard (right).

Test Sequences

We created a test sequence by drawing a regular grid of black dots on a whiteboard and moving the ATIS sensor while it viewed the whiteboard. The attached stimulus recording video shows the raw recording.

Figure: the whiteboard with black dots drawn as features.

Recordings were first cleaned up with noise filtering, and an activity tracker was then applied to detect and track the black dots. The results of tracking are shown in the attached tracking video, where each event is coloured according to the tracker it has been assigned to.
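The exact noise filter used is not detailed here; the sketch below shows one common choice, a background-activity filter that keeps an event only if a nearby pixel has fired recently. The event format and the 5 ms time window are illustrative assumptions.

import numpy as np

def background_activity_filter(events, width=304, height=240, dt=5000):
    """Keep an event only if a pixel in its 3x3 neighbourhood fired
    within the last `dt` microseconds (a generic background-activity
    filter; the filter and parameters actually used are assumptions).
    """
    # Padded map of the last timestamp seen at each pixel.
    last_ts = np.full((height + 2, width + 2), -np.inf)
    kept = []
    for x, y, t, p in events:
        xi, yi = int(x) + 1, int(y) + 1
        neighbourhood = last_ts[yi - 1:yi + 2, xi - 1:xi + 2]
        if t - neighbourhood.max() <= dt:
            kept.append((x, y, t, p))
        last_ts[yi, xi] = t
    return kept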

Planar 'SLAM'

As a first approach we took advantage of the fact that our features all lie on a plane (the whiteboard). We initialize the sensor by telling it the real-world locations of the first four features it sees. At each subsequent timestep, two steps are followed; a brief code sketch of both appears after them.

Camera location and pose estimation: Using knowledge of the location of tracked points, both in 3D space and on the image plane, a homography is computed between the whiteboard and the image plane. Using the ATIS' intrinsic calibration matrix, the extrinsic parameters (location and pose) of the ATIS can be estimated from the homography.

Mapping new features: When a 'new' feature is seen (one which has not been seen before) we can use its location on the image plane, camera location and pose, and camera intrinsic parameters to define a ray in 3D space along which the feature lies. Knowing that our features lie on a plane, we determine the feature location as being the intersection of the ray and plane.
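The sketch below illustrates both steps, assuming the whiteboard defines the z = 0 plane of the world frame. The function and parameter names are illustrative, and OpenCV/NumPy are used only for brevity; the actual implementation was done in Matlab.

import numpy as np
import cv2

def pose_from_homography(board_pts, image_pts, K):
    """Estimate camera pose from tracked dots on the whiteboard plane.

    `board_pts` are the known 2D positions of dots on the whiteboard
    (z = 0 plane), `image_pts` their tracked pixel positions, and `K`
    the intrinsic matrix from the checkerboard calibration. Both point
    sets are float32 arrays of shape (N, 2).
    """
    H, _ = cv2.findHomography(board_pts, image_pts)
    A = np.linalg.inv(K) @ H
    # Columns of A are proportional to [r1, r2, t]; recover the scale.
    scale = 1.0 / np.linalg.norm(A[:, 0])
    r1, r2, t = scale * A[:, 0], scale * A[:, 1], scale * A[:, 2]
    r3 = np.cross(r1, r2)
    R = np.column_stack((r1, r2, r3))
    # In practice R should be re-orthonormalised (e.g. via SVD) and the
    # sign of the solution checked so that the board lies in front of
    # the camera.
    return R, t

def map_new_feature(pixel, K, R, t):
    """Back-project a new feature and intersect its ray with z = 0."""
    ray_cam = np.linalg.inv(K) @ np.array([pixel[0], pixel[1], 1.0])
    ray_world = R.T @ ray_cam          # ray direction in the world frame
    origin = -R.T @ t                  # camera centre in the world frame
    s = -origin[2] / ray_world[2]      # step to the whiteboard plane z = 0
    return origin + s * ray_world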

The attached planar SLAM video shows the result. The three lines are the axes of the camera, showing its location and orientation. The video shows new features being detected and localized as the ATIS moves.

3D SLAM

This is a true 3D visual monocular SLAM, built on the Matlab MonoSLAM implementation. As with the planar SLAM, the algorithm was initialized with the known 3D locations of the first four trackers, and the location and pose of the camera were initialized by computing a homography. From that point onwards, the full SLAM algorithm takes over.

An Extended Kalman Filter (EKF) was used, alternating prediction and update steps.

During the prediction step, a new camera state is estimated based on the previous camera state (which includes rotational and translational velocities).

During the update step, the difference between detected and predicted feature locations on the image plane is used to correct estimates of the camera and feature locations.
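The sketch below shows the textbook EKF prediction and update equations that these two steps follow. The motion model f, measurement model h, their Jacobians F and H, and the noise covariances Q and R are placeholders; in the actual implementation they follow Davison's MonoSLAM formulation (constant-velocity camera motion and pinhole projection of the mapped features).

import numpy as np

def ekf_predict(x, P, f, F, Q):
    """Prediction step: propagate the state with the motion model `f`
    (e.g. constant rotational and translational velocity) and grow the
    covariance using the Jacobian `F` and process noise `Q`."""
    x_pred = f(x)
    P_pred = F @ P @ F.T + Q
    return x_pred, P_pred

def ekf_update(x_pred, P_pred, z, h, H, R):
    """Update step: the innovation (measured minus predicted feature
    positions on the image plane) corrects the camera and feature states."""
    innovation = z - h(x_pred)            # measured minus predicted pixels
    S = H @ P_pred @ H.T + R              # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)   # Kalman gain
    x_new = x_pred + K @ innovation
    P_new = (np.eye(len(x_pred)) - K @ H) @ P_pred
    return x_new, P_new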

New features must be observed multiple times before being added to the map. Each time a feature is seen, we estimate a ray along which the feature lies. When we have enough rays for a feature, we find their best point of intersection (in practice, the center of the smallest possible sphere which intersects all of the rays). This point is then taken as the feature location.
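As an illustration, the sketch below computes the least-squares point closest to a set of rays, a common stand-in for the smallest-sphere criterion described above; ray origins and directions are assumed to be given in the world frame.

import numpy as np

def closest_point_to_rays(origins, directions):
    """Least-squares point closest to a set of 3D rays.

    Each ray is (origin o_i, unit direction d_i); the point p minimising
    sum_i ||(I - d_i d_i^T)(p - o_i)||^2 is found by solving a 3x3 system.
    This is a least-squares variant of the minimax smallest-sphere
    criterion described in the text, not necessarily the exact method used.
    """
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for o, d in zip(origins, directions):
        d = d / np.linalg.norm(d)
        M = np.eye(3) - np.outer(d, d)  # projector onto the plane normal to d
        A += M
        b += M @ o
    return np.linalg.solve(A, b)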

The attached 3D SLAM video shows the result. The video shows rays being estimated as new features are seen, and new features (filled circles) being added to the map as time progresses.

Attachments