Motion segmentation from ATIS data: Discarding independently moving objects (IMOs) for SLAM estimation

Participants: Yezhou Yang, Francisco Barranco, Cornelia Fermuller


The goal of this component of the MAP project is to segment the objects in the scene that move independently of the camera. The SLAM module estimates camera pose from features on the static parts of the scene; independently moving objects, such as cars, pedestrians, or bicycles, introduce noise and can make the system fail. The first part of this component describes image motion estimation at significant image edges (normal flow) and occlusion detection, using both DVS events and intensity information from the ATIS camera. The second part describes the segmentation of the independently moving objects.

Normal flow estimation and detection of occlusions

We use normal flow (instead of optic flow) because its estimation is local and does not require assumptions about the scene. Furthermore, normal flow can later be used to estimate optical flow as needed, or to estimate ego-motion.
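
For reference, the textbook definition of normal flow follows from the brightness-constancy constraint I_x u + I_y v + I_t = 0, which fixes only the flow component along the intensity gradient: u_n = -I_t / |∇I|. The Python sketch below computes that vector at one pixel from given derivatives. The function name is hypothetical, and this is the standard gradient-based definition, not the event-trace estimator described in this section.

```python
import math


def normal_flow(ix, iy, it, eps=1e-9):
    """Normal-flow vector at one pixel from spatial (ix, iy) and temporal (it) derivatives.

    Brightness constancy, ix*u + iy*v + it = 0, constrains only the flow
    component along the gradient; that component is the normal flow.
    Returns None where the gradient vanishes and the normal is undefined.
    """
    mag = math.hypot(ix, iy)
    if mag < eps:
        return None
    speed = -it / mag  # signed speed along the gradient direction
    return (speed * ix / mag, speed * iy / mag)


print(normal_flow(2.0, 0.0, -4.0))  # vertical edge moving right: (2.0, 0.0)
```

The component of the flow parallel to the edge is left undetermined; this is the aperture problem that the later optical flow step has to resolve.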

Let us first give the intuition for our approach. Imagine a very simple stimulus, for example a vertical bar moving from left to right. The number of events at every position depends on both the spatial contrast of the image and the speed. In other words, considering only local information, DVS/ATIS (TD) events intrinsically couple speed and spatial contrast.
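
The coupling can be illustrated with an idealized event model, assumed here rather than taken from the text: a pixel emits an event each time its log intensity L changes by a contrast threshold, and for a translating pattern dL/dt = -speed times the log-intensity gradient. The event rate then depends only on the product of speed and contrast, so the two cannot be separated locally. The threshold value is hypothetical, and refractory periods and noise are ignored.

```python
def event_rate(speed, log_gradient, threshold=0.15):
    """Idealized DVS/ATIS (TD) event rate (events/s) at one pixel on a moving edge.

    For a pattern translating at `speed` (px/s) with log-intensity gradient
    `log_gradient` (per px) along the motion, dL/dt = -speed * log_gradient.
    The pixel fires one event each time L changes by `threshold`
    (0.15 is a hypothetical contrast threshold).
    """
    return abs(speed * log_gradient) / threshold


# A fast, low-contrast edge and a slow, high-contrast edge look identical locally.
fast_low = event_rate(speed=100.0, log_gradient=0.5)
slow_high = event_rate(speed=50.0, log_gradient=1.0)
print(fast_low == slow_high)  # True
```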

To avoid this ambiguity, we instead estimate the image motion from the width of the traces that the events create at strong intensity edges. These traces are seen mostly at object boundaries, and we thus call them motion boundaries. Using both DVS events and gray-level information as input makes a very accurate estimation of these motion boundaries possible.
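
A minimal sketch of the trace-width idea, under stated assumptions: the events of one edge have already been grouped, the edge translates uniformly while the events are collected, and its normal direction is known from the gray levels. Projecting the events onto the normal, the width of the trace divided by the time it spans gives the normal speed. Function and parameter names are hypothetical, and noise handling and the edge's own thickness are ignored.

```python
def speed_from_trace(events, normal):
    """Signed normal speed (pixels/s) of one edge from the width of its event trace.

    events: list of (x, y, t) tuples already attributed to a single edge.
    normal: unit vector (nx, ny) across the edge, e.g. the intensity-gradient
            direction taken from the ATIS gray levels.
    """
    nx, ny = normal
    # Position of every event along the edge normal, paired with its timestamp.
    proj = [(x * nx + y * ny, t) for x, y, t in events]
    positions = [p for p, _ in proj]
    width = max(positions) - min(positions)  # pixels swept by the edge
    first = min(proj, key=lambda pt: pt[1])
    last = max(proj, key=lambda pt: pt[1])
    duration = last[1] - first[1]
    if duration <= 0:
        raise ValueError("events must span a positive time interval")
    # The trace grows in the direction in which the edge moves.
    sign = 1.0 if last[0] >= first[0] else -1.0
    return sign * width / duration


# Vertical edge moving right at 100 px/s, passing x = 10..14 in three pixel rows.
events = [(10 + k, y, k / 100) for k in range(5) for y in range(3)]
print(speed_from_trace(events, normal=(1.0, 0.0)))
```

The sign comes from which end of the trace holds the latest events; the width alone only gives the magnitude.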

Normal flow estimation based on both events and intensity gray levels is more accurate than estimation based on events alone. However, because the time the ATIS needs to measure a gray level depends on the intensity level, we get distortions in the intensity if we try to acquire synchronous image frames. The intensity of pixels on darker objects is obtained by integrating photons over longer periods of time than that of pixels on brighter objects, and thus intensities close to moving boundaries are blurred.
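
To make the blur argument concrete, the sketch below models the ATIS gray-level measurement as integrating photocurrent between two fixed thresholds, so the integration time is inversely proportional to intensity, and multiplies that time by the edge speed to get the distance an edge moves during the measurement. The constant k, the function names, and the example values are hypothetical.

```python
def integration_time(intensity, k=0.01):
    """Idealized ATIS exposure time (s) for one gray-level measurement.

    The pixel integrates photocurrent between two fixed thresholds, so the
    time taken is inversely proportional to intensity. `k` is a hypothetical
    constant (intensity x seconds) lumping capacitance and threshold gap.
    """
    if intensity <= 0:
        raise ValueError("intensity must be positive")
    return k / intensity


def blur_extent(speed_px_per_s, intensity, k=0.01):
    """Distance (pixels) an edge travels while the pixel is still integrating."""
    return speed_px_per_s * integration_time(intensity, k)


for name, intensity in (("bright", 1.0), ("dark", 0.1)):
    t = integration_time(intensity)
    print(f"{name}: {t * 1000:.0f} ms, blur {blur_extent(50.0, intensity):.1f} px")
```

With these illustrative numbers, a pixel ten times darker integrates ten times longer, so an edge moving at 50 px/s smears over about 5 px instead of about 0.5 px.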

After estimating normal flow, we also attempt to distinguish foreground from background. The ATIS produces events both at strong texture edges and at real motion boundaries (which correspond to actual object boundaries). Motion boundaries can be occluding or disoccluding. Occlusions reveal the relative depth of objects, which is useful for segmentation and can also help recognition.

Since the intensity is available, we can detect occluding and disoccluding boundaries by matching the intensity regions on either side of a motion boundary in successive image frames. Because the image motion is already known, so is the size of the region to be matched. Computing the matches over several time intervals makes the occlusion/disocclusion estimation more robust, at the cost of increased latency. However, this local approach to occlusion detection works only if there are intensity variations in the background. The next figure shows some results with one occluding and one disoccluding boundary. Because the background is plain, occlusions between the objects and the background cannot always be detected.
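
One way to make the matching concrete, sketched here under simplifying assumptions (1-D intensity profiles sampled along the boundary normal, an integer boundary displacement known from the normal flow, and sum of squared differences as the match score): the side of the boundary whose patch still matches after shifting it by the boundary motion moves with the boundary and is therefore the foreground. If the foreground advances into the background, the boundary is occluding; if it moves away, it is disoccluding. All names are hypothetical.

```python
def ssd(a, b):
    """Sum of squared differences between two equal-length intensity patches."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def classify_boundary(profile_t0, profile_t1, b0, shift, half_width, tol=1e-9):
    """Label a moving boundary as 'occluding', 'disoccluding' or 'ambiguous'.

    profile_t0, profile_t1: intensities sampled along the boundary normal at two times.
    b0: index of the first sample on the right of the boundary at t0.
    shift: boundary displacement in samples between the frames (from normal flow);
           positive means the boundary moves toward larger indices.
    half_width: number of samples compared on each side.
    """
    if shift == 0:
        raise ValueError("the boundary must move between the two frames")
    b1 = b0 + shift
    w = half_width
    n = min(len(profile_t0), len(profile_t1))
    if min(b0, b1) - w < 0 or max(b0, b1) + w > n:
        raise ValueError("patches fall outside the sampled profiles")
    # The side that moves together with the boundary is the foreground,
    # so its patch matches after shifting by the boundary motion.
    left_cost = ssd(profile_t0[b0 - w:b0], profile_t1[b1 - w:b1])
    right_cost = ssd(profile_t0[b0:b0 + w], profile_t1[b1:b1 + w])
    if abs(left_cost - right_cost) <= tol:
        return "ambiguous"  # e.g. plain background: both sides match equally well
    foreground_left = left_cost < right_cost
    moving_right = shift > 0
    # Foreground advancing into background covers it (occlusion); foreground
    # moving away from the boundary uncovers background (disocclusion).
    return "occluding" if foreground_left == moving_right else "disoccluding"


# Textured background, uniform object (intensity 50) moving right by 3 samples.
bg = [(i * 7) % 11 for i in range(30)]


def scene(start, end):
    return [50 if start <= i < end else bg[i] for i in range(30)]


print(classify_boundary(scene(0, 10), scene(3, 13), b0=10, shift=3, half_width=4))
print(classify_boundary(scene(10, 20), scene(13, 23), b0=10, shift=3, half_width=4))
```

The first call examines the object's leading edge (occluding), the second its trailing edge (disoccluding). With a plain background both sides match equally well and the sketch returns "ambiguous", which is the failure case described in the paragraph above.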

Motion-based segmentation

The segmentation combines image motion with intensity information. Using the ATIS, we obtain normal flow (as described in the previous section) in an event-based manner and, at the same time, estimate spatial derivatives from synchronous gray-scale images. The normal flow provides an estimate of the image motion on moving boundaries; from it we compute a dense optical flow field with an adapted Horn-Schunck method. At boundaries, the event-based normal flow provides very good estimates of the temporal derivative, which is normally inaccurate in classic frame-based approaches. Starting from these estimates, the optical flow algorithm propagates the flow values to the other parts of the image. In addition, when occlusion information is available, we can use it to stop the propagation. Once we have a rough estimate of the optical flow, we apply a clustering algorithm to it to separate the scene into regions of different motion. The output of this clustering is then used in a graph-cut based segmentation that uses both intensity and flow to separate the scene into multiple moving parts: the data term uses image motion, and the regularization term uses intensity.
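
The text does not detail the adaptation, so the following is only one plausible reading, written as an assumption: wherever normal flow u_n is available, the temporal derivative in the Horn-Schunck update is replaced by I_t = -u_n |∇I| (from the brightness-constancy relation), and elsewhere the data term is switched off so that the smoothness term alone propagates the flow. Stopping propagation at occlusions, clustering, and the graph cut are omitted; function names are hypothetical.

```python
import math


def neighbour_mean(field, y, x):
    """Average of the in-grid 4-neighbours of (y, x)."""
    h, w = len(field), len(field[0])
    vals = [field[j][i] for j, i in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
            if 0 <= j < h and 0 <= i < w]
    return sum(vals) / len(vals)


def horn_schunck_from_normal_flow(ix, iy, un, alpha=1.0, iters=300):
    """Dense flow (u, v) from spatial gradients plus sparse event-based normal flow.

    ix, iy: spatial derivatives of the gray-level frame (lists of lists).
    un:     signed normal speed at boundary pixels, None where unavailable.
    alpha:  Horn-Schunck smoothness weight.
    """
    h, w = len(ix), len(ix[0])
    u = [[0.0] * w for _ in range(h)]
    v = [[0.0] * w for _ in range(h)]
    for _ in range(iters):  # Jacobi iterations
        new_u = [[0.0] * w for _ in range(h)]
        new_v = [[0.0] * w for _ in range(h)]
        for y in range(h):
            for x in range(w):
                ub, vb = neighbour_mean(u, y, x), neighbour_mean(v, y, x)
                if un[y][x] is None:
                    # No measurement: smoothness only, i.e. propagate the flow.
                    new_u[y][x], new_v[y][x] = ub, vb
                    continue
                gx, gy = ix[y][x], iy[y][x]
                it = -un[y][x] * math.hypot(gx, gy)  # temporal derivative from normal flow
                r = (gx * ub + gy * vb + it) / (alpha ** 2 + gx * gx + gy * gy)
                new_u[y][x], new_v[y][x] = ub - gx * r, vb - gy * r
        u, v = new_u, new_v
    return u, v


# 5x5 toy frame: a vertical edge in column 2 moving right at 1 px/frame.
ix = [[1.0 if x == 2 else 0.0 for x in range(5)] for _ in range(5)]
iy = [[0.0] * 5 for _ in range(5)]
un = [[1.0 if x == 2 else None for x in range(5)] for _ in range(5)]
u, v = horn_schunck_from_normal_flow(ix, iy, un)
print([round(val, 3) for val in u[2]])
```

The horizontal flow measured on the edge spreads to the whole toy frame, while v stays zero: a single vertical edge does not constrain the vertical component, so propagation alone cannot resolve the aperture problem there.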

After the first segmentation, we refine it. Intuitively, pixels sharing the same label also share similar motion. Thus, in the next optical flow estimation, rather than blindly propagating flow vectors, we use a layered propagation mechanism in which only pixels that shared the same label in the previous segmentation affect each other. After each new round of optical flow estimation, the graph-cut based segmentation is updated.
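
The layered propagation can be sketched as a neighbour average restricted to pixels that share a label. Assuming the labels come from the previous segmentation and that trusted boundary estimates are held fixed, the hypothetical helpers below diffuse one flow component inside each segment without leaking across segment borders.

```python
def layered_mean(field, labels, y, x):
    """Average of the 4-neighbours that carry the same label as (y, x).

    Returns the pixel's own value if no neighbour shares its label.
    """
    h, w = len(field), len(field[0])
    lab = labels[y][x]
    vals = [field[j][i] for j, i in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
            if 0 <= j < h and 0 <= i < w and labels[j][i] == lab]
    return sum(vals) / len(vals) if vals else field[y][x]


def layered_propagate(field, labels, fixed, iters=100):
    """Diffuse values from `fixed` pixels, never mixing different labels.

    field:  initial values (e.g. one flow component); fixed pixels keep theirs.
    labels: segment label per pixel from the previous segmentation.
    fixed:  boolean grid marking pixels with trusted (boundary) estimates.
    """
    h, w = len(field), len(field[0])
    cur = [row[:] for row in field]
    for _ in range(iters):
        # The comprehension reads the previous `cur` throughout (Jacobi update).
        cur = [[cur[y][x] if fixed[y][x] else layered_mean(cur, labels, y, x)
                for x in range(w)] for y in range(h)]
    return cur


labels = [[0, 0, 0, 1, 1, 1]]
field = [[2.0, 0.0, 0.0, 0.0, 0.0, -1.0]]
fixed = [[True, False, False, False, False, True]]
out = layered_propagate(field, labels, fixed)
print([round(val, 3) for val in out[0]])
```

Each segment converges to the value of its own fixed pixel; an unrestricted average would instead blend the two motions across the segment border.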

So far we have described the process for a single flow field and set of images. As new data arrive, we update and refine the segmentation. When a new image and new normal flow estimates come in, we take the label map from the previous segmentation, warp it forward using the estimated optical flow, and use the result as a prediction of the label map for the next round of segmentation. The diagram below illustrates our system.
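
The forward warp can be sketched with nearest-neighbour splatting. The assumptions are all hypothetical: flow is given in pixels per frame interval, target positions are rounded to the nearest pixel (Python's `round` rounds halves to even), collisions are resolved in favour of the larger motion (which presumes moving objects are in front of a slower background), and pixels that receive no label are marked unknown so that the next segmentation round decides them.

```python
def warp_labels_forward(labels, u, v, unknown=-1):
    """Predict the next label map by pushing each label along its flow vector.

    labels: 2-D list of segment ids; u, v: flow in pixels per frame interval.
    Collisions: the source with the larger flow magnitude wins (hypothetical
    heuristic: assumes moving objects are in front of slower background).
    Pixels that receive no label (e.g. disoccluded background) get `unknown`.
    """
    h, w = len(labels), len(labels[0])
    out = [[unknown] * w for _ in range(h)]
    best = [[-1.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            tx, ty = round(x + u[y][x]), round(y + v[y][x])
            if 0 <= tx < w and 0 <= ty < h:
                mag = (u[y][x] ** 2 + v[y][x] ** 2) ** 0.5
                if mag > best[ty][tx]:
                    best[ty][tx] = mag
                    out[ty][tx] = labels[y][x]
    return out


labels = [[0, 1, 1, 0, 0]]
u = [[0.0, 1.0, 1.0, 0.0, 0.0]]
v = [[0.0] * 5]
print(warp_labels_forward(labels, u, v))  # [[0, -1, 1, 1, 0]]
```

In the example, the moving segment (label 1) shifts right by one pixel and covers the static pixel at index 3, while the pixel it leaves behind becomes unknown.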

Independently moving objects

An ATIS dataset for motion based segmentation

During the workshop, we also collected a variety of ATIS data to evaluate motion estimation and segmentation algorithms. We collected data from three scenarios: fixed camera with moving objects; moving camera with fixed objects; and moving camera with moving objects.

Future work

At this point, the segmentation does not run in real time; achieving real-time performance is our goal for the near future. We will also improve the interaction between optical flow estimation and segmentation with intensity information. Specifically, we want to emphasize boundary information: as certainty about boundaries increases, this information can be exploited to jointly tune flow estimation and segmentation. New approaches could then also use the normal flow to compute optical flow from the ATIS camera alone. The idea is that if the intensity gray values are obtained with lower latency, we could compute intensity derivatives and combine them with the image motion at boundaries.