Proto-Object Based Saliency Map

The aim of this project was to develop a proto-object based saliency map, as opposed to a feature-based map.

What is a proto-object?

Rensink described it as "a volatile unit of visual information that can be bound into a coherent and stable object when accessed by focused attention."

Why is it useful for a Saliency Map?

Feature-based saliency maps detect the odd feature out. Proto-object based maps provide the shape and extent of the region to which attention should be applied.

Border ownership and grouping

The proto-object based saliency map uses the idea of grouping and border ownership cells, as discussed by Craft et al. (2007), to deal with occlusion and overlap between adjacent objects. This is summarized in Figure 1 below, which shows a border ownership network for two overlapping rectangles. Briefly, two border ownership (B) cells that encode antiparallel ownership directions share the same receptive field. When an object edge falls into the receptive field, both B cells are excited; they inhibit each other and excite their respective grouping cells. Each grouping cell receives further excitatory input from the other B cells within its receptive field and in turn provides excitation to all B cells within this field. Consequently, of two B cells sharing the same receptive field, the one with stronger grouping excitation obtains ownership of that boundary. Grouping cells can be thought of as providing a handle to a given object, while the border ownership cells define the object's boundaries.

Figure 1: "Model architecture. A: network overview, showing border-ownership selection for a stimulus of 2 overlapping rectangles (bottom). Receptive fields of B cells are shown as ellipses, where attached arrows indicate their preferred side of figure. B cells with opposite arrows compete, and this competition is decided by grouping cell input (receptive fields of active cells are shown in green and red; receptive fields of suppressed cells shown in gray)" Craft et al. (2007)
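The competition between two B cells that share a receptive field can be illustrated with a minimal rate-model sketch. This is not the network of Craft et al. (2007): the function name `border_ownership`, the weights, the multiplicative grouping term and the update rule are all assumptions chosen for illustration, and the grouping activity is held fixed rather than computed from other B cells.

```python
def border_ownership(edge, group_a, group_b,
                     w_group=0.5, w_inhib=0.8, steps=200, rate=0.2):
    """Relax two antiparallel B cells that share one receptive field.

    edge     -- edge strength inside the shared receptive field
    group_a  -- fixed activity of the grouping cell supporting cell A
    group_b  -- fixed activity of the grouping cell supporting cell B
    All weights are illustrative assumptions, not values from the source.
    """
    b_a = b_b = 0.0
    for _ in range(steps):
        # Each cell is driven by the edge, boosted by its own grouping
        # cell, and inhibited by the rival cell.
        drive_a = edge * (1.0 + w_group * group_a) - w_inhib * b_b
        drive_b = edge * (1.0 + w_group * group_b) - w_inhib * b_a
        # Leaky integration towards the rectified drive.
        b_a += rate * (max(drive_a, 0.0) - b_a)
        b_b += rate * (max(drive_b, 0.0) - b_b)
    return b_a, b_b


if __name__ == "__main__":
    # Cell A has the stronger grouping support, so it should win the edge.
    print(border_ownership(edge=1.0, group_a=1.0, group_b=0.2))
```

With equal grouping input the two cells settle at the same activity; any difference in grouping support is amplified by the mutual inhibition, which is the behaviour the paragraph above describes.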

The model

The saliency map makes use of the above border ownership model to define proto-object handles and boundaries. The model accepts an input image on which to calculate saliency. The image is then processed via three different channels: intensity, red-green and blue-yellow. To ensure size invariance, each channel separates its input image into a pyramid, downsampling the image by a factor of 2 at each layer of the pyramid. A center-surround operation is then performed on each layer of the pyramid using a Difference of Gaussians filter. Next, orientation-specific filters extract 0 and 90 degree edges from the center-surround output. The edge maps and center-surround maps from each channel are then summed. The final center-surround map is used to calculate the grouping pyramid. The border ownership signals are then calculated using the grouping and edge maps.
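The first stages of this pipeline (channel separation, pyramid construction and center-surround filtering) can be sketched in plain Python as follows. The source does not give the channel formulas, filter widths or downsampling method, so the opponency definitions, the 2x2 block averaging, the sigma values and all function names here are assumptions; the orientation filters, grouping and border ownership stages are not included.

```python
import math


def opponent_channels(rgb):
    """Split an image (rows of (r, g, b) tuples in [0, 1]) into three channels.

    The opponency formulas are a common simple choice, assumed here.
    """
    return {
        "intensity": [[(r + g + b) / 3.0 for r, g, b in row] for row in rgb],
        "red_green": [[r - g for r, g, b in row] for row in rgb],
        "blue_yellow": [[b - (r + g) / 2.0 for r, g, b in row] for row in rgb],
    }


def downsample(img):
    """Halve both dimensions by averaging each 2x2 block."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(w)] for y in range(h)]


def pyramid(img, levels):
    """Return a list of images, each half the size of the previous one."""
    out = [img]
    for _ in range(levels - 1):
        out.append(downsample(out[-1]))
    return out


def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, int(3 * sigma))
    k = [math.exp(-(i * i) / (2.0 * sigma * sigma))
         for i in range(-radius, radius + 1)]
    total = sum(k)
    return [v / total for v in k]


def blur(img, sigma):
    """Separable Gaussian blur; out-of-range pixels replicate the border."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    h, w = len(img), len(img[0])
    rows = [[sum(k[j + r] * row[min(max(x + j, 0), w - 1)]
                 for j in range(-r, r + 1)) for x in range(w)] for row in img]
    return [[sum(k[j + r] * rows[min(max(y + j, 0), h - 1)][x]
                 for j in range(-r, r + 1)) for x in range(w)] for y in range(h)]


def center_surround(img, sigma_c=1.0, sigma_s=3.0):
    """Difference of Gaussians: narrow (center) blur minus wide (surround) blur."""
    c, s = blur(img, sigma_c), blur(img, sigma_s)
    return [[a - b for a, b in zip(rc, rs)] for rc, rs in zip(c, s)]
```

A per-channel center-surround pyramid would then be `[center_surround(level) for level in pyramid(channel, levels)]`. Because the same filter is applied at every level, a large object at a coarse level produces a response similar to that of a small object at a fine level, which is the size invariance the paragraph refers to.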

[[Image(overview.png, height=200)]]

Figure 2: The proto-object saliency model.

The model was inspired by that of Itti et al. (1996); however, certain changes were made to accommodate the grouping mechanism. The largest of these changes is that the model has only two distinct channels, colour (red-green and blue-yellow) and intensity; the orientation channel is now implicit in the border ownership. Secondly, in the original saliency model it was possible to collapse the final saliency map into a 2D image. In this model the final saliency is obtained using the grouping cells as object handles. Because there can be multiple handles at the same location, coding for proto-objects at different spatial scales, the final saliency map is a pyramid. The most salient proto-object is obtained using an argmax operation over the pyramid. Keeping the saliency map in pyramid form makes it simple to include top-down modulation of attention to scale by weighting the different levels of the map. Segmentation of the objects is performed by assuming that all pixels with contiguous grouping cells and whose border ownership vectors agree belong to the same proto-object. The dashed line between the border ownership pyramid and the proto-object segmentation denotes that this component of the algorithm is incomplete.
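The argmax over the pyramid, with optional weighting of the levels, might look like the sketch below. The function name `most_salient`, the nested-list representation of the pyramid and the `scale_weights` parameter are assumptions made for illustration.

```python
def most_salient(pyramid, scale_weights=None):
    """Return (level, row, col, score) of the largest weighted saliency value.

    pyramid       -- list of 2D grids (lists of rows), one per scale
    scale_weights -- optional per-level weights modelling attention to scale;
                     defaults to equal weighting (an assumption)
    """
    if scale_weights is None:
        scale_weights = [1.0] * len(pyramid)
    best = None
    for level, (grid, weight) in enumerate(zip(pyramid, scale_weights)):
        for r, row in enumerate(grid):
            for c, value in enumerate(row):
                score = weight * value
                if best is None or score > best[3]:
                    best = (level, r, c, score)
    return best


if __name__ == "__main__":
    saliency = [
        [[0.1, 0.9], [0.2, 0.3]],  # fine scale
        [[0.6]],                   # coarse scale
    ]
    print(most_salient(saliency))              # unweighted winner
    print(most_salient(saliency, [0.5, 1.0]))  # bias towards the coarse scale
```

Down-weighting the fine level in the second call moves the winner from the fine-scale handle to the coarse-scale one, which would not be possible if the map had been collapsed into a single 2D image.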

Figure 3 below shows an input image of hot air balloons with the boundary of the proto-object describing the elephant shown in green.

Figure 3: Proto-object segmentation of the elephant balloon.

The grouping map and associated boundary vectors are shown in the map below. The green cross indicates the elephant.

Figure 4: Grouping map and associated boundary vectors. Note the change in direction between vectors at the boundary between two objects.

The complete algorithm was tested on the balloon image shown above. The results are shown below in order from most to least salient. The grouping of multiple balloons into a single proto-object is a result of the border ownership signals not being incorporated into the image segmentation algorithm.

Figure 5: 9 most salient objects in the balloon image.
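The merging of neighbouring balloons can be reproduced with a toy segmentation that uses contiguity of grouping activity alone. The sketch below is an assumption about how such a step could work, not the project's code: the function `label_regions`, the threshold and the 4-connectivity are hypothetical, and border ownership vectors are deliberately ignored.

```python
from collections import deque


def label_regions(grouping, threshold=0.5):
    """Label 4-connected regions of above-threshold grouping activity.

    Returns (labels, count), where labels is a grid of region ids
    (0 = background). Border ownership is NOT consulted, so two touching
    objects receive the same label.
    """
    h, w = len(grouping), len(grouping[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if grouping[y][x] <= threshold or labels[y][x]:
                continue
            # Start a new region and flood-fill it breadth-first.
            count += 1
            labels[y][x] = count
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                               (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < h and 0 <= nx < w and not labels[ny][nx]
                            and grouping[ny][nx] > threshold):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count
```

Two blobs of grouping activity that touch are returned as one region, whereas the same blobs separated by a gap are returned as two. Splitting touching regions would additionally require checking that the border ownership vectors agree.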


Ernst Niebur, 'arussell', Ralph Etienne-Cummings


Rensink, R. A., O'Regan, J. K., & Clark, J. J. (1997). To see or not to see: The need for attention to perceive changes in scenes. Psychological Science, 8(5), 368–373.

Craft, E., Schuetze, H., Niebur, E., & von der Heydt, R. (2007). A neural model of figure-ground organization. Journal of Neurophysiology, 97, 4310–4326.

Itti, L., Niebur, E., & Koch, C. (1996). Control of selective visual attention: Modeling the 'where' pathway. Neural Information Processing Systems, 8, 802–808.