Learning coordinate transformations between visual and motor spaces with DNF


Yulia Sandamirskaya (idea and implementation)

Jörg Conradt (interface between DVS and DNF)

Mathis Richter, Matt Cook (critical comments)

The task

The goal of this project was to design a dynamical architecture that enables autonomous learning of the mapping between the visual (here, retinal) space and the motor space that controls looking behavior. The robot's task was to center the target object in its field of view by controlling the pan and tilt motors appropriately.

The DNF architecture

The main principle behind the DNF architecture for autonomous learning is that all task-relevant actions and perceptions are represented in a structure called an elementary behavior (EB). Each elementary behavior consists of a DNF representation of the intention and of the condition of satisfaction (CoS).

For the scenario sketched above, two EBs were implemented: a visual EB and a motor EB (see figure, upper part). The visual intention DNF receives input from the vision sensor (here, an eDVS) and integrates sensory events in time and space to produce an activation bump at the retinal location of an object. This bump is stabilized against fluctuations in the input but tracks the input as it moves.
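To make the bump formation concrete, here is a minimal sketch of a one-dimensional dynamic neural field driven by a noisy localized input. It is not the workshop implementation: the Amari-style update, the interaction kernel, the resting level, and all parameter values are assumptions chosen for illustration, and the function names are hypothetical.

```python
import math
import random

def sigmoid(u, beta=4.0):
    """Soft threshold: only sites with positive activation produce output."""
    return 1.0 / (1.0 + math.exp(-beta * u))

def make_kernel(size, c_exc=2.0, sigma=2.0, c_inh=0.5):
    """Interaction strength by distance: local excitation, global inhibition."""
    return [c_exc * math.exp(-d * d / (2.0 * sigma ** 2)) - c_inh
            for d in range(size)]

def field_step(u, stimulus, kernel, h=-2.0, tau=10.0):
    """One Euler step (dt = 1) of tau * du/dt = -u + h + stimulus + lateral."""
    out = [sigmoid(v) for v in u]
    n = len(u)
    new_u = []
    for i in range(n):
        lateral = sum(kernel[abs(i - j)] * out[j] for j in range(n))
        new_u.append(u[i] + (-u[i] + h + stimulus[i] + lateral) / tau)
    return new_u

def simulate(size=41, target=12, steps=200, noise=0.5, seed=0):
    """Drive the field with a noisy Gaussian input centered on `target`."""
    rng = random.Random(seed)
    kernel = make_kernel(size)
    u = [-2.0] * size  # start at the (assumed) resting level
    for _ in range(steps):
        stimulus = [4.0 * math.exp(-(i - target) ** 2 / 8.0)
                    + rng.uniform(-noise, noise) for i in range(size)]
        u = field_step(u, stimulus, kernel)
    return u

u = simulate()
peak = max(range(len(u)), key=lambda i: u[i])
print("peak site:", peak, "activation:", round(u[peak], 2))
```

The slow time constant averages out the input noise, while the lateral interactions keep the activation localized: sites near the stimulus end up above threshold and distant sites are pushed below the resting level.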

If there is a bump in the visual intention DNF, the motor intention DNF receives a homogeneous boost and may also build a bump. At the beginning of the learning process, however, the mapping between the visual location of the input and the motor command needed to center that input is unknown, so an exploratory dynamics activates a random location in the motor intention field.

The robot executes the motion encoded in the motor intention field. The motor CoS node integrates the motor commands and detects when the robot has finished executing the intended motor action. When this match between the intention and the current state is detected, the motor intention field is inhibited by its CoS node.
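The matching step performed by the motor CoS node can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual node dynamics: the proportional command, the gain, the tolerance, and the function name `run_motor_action` are all hypothetical, and the motor state is estimated purely by summing the issued commands.

```python
def run_motor_action(target, start=(0.0, 0.0), gain=0.2,
                     tolerance=0.01, max_steps=1000):
    """Issue pan/tilt commands until the integrated commands match the target.

    Returns (steps, estimate). `steps` is the step at which the condition of
    satisfaction was met, or None if it was never met within `max_steps`.
    """
    estimate = list(start)  # motor state estimated by integrating commands
    for step in range(1, max_steps + 1):
        # Assumed proportional command toward the intended pan/tilt position.
        command = [gain * (t - e) for t, e in zip(target, estimate)]
        estimate = [e + c for e, c in zip(estimate, command)]
        error = max(abs(t - e) for t, e in zip(target, estimate))
        if error < tolerance:
            # CoS detected: here the motor intention would be inhibited.
            return step, tuple(estimate)
    return None, tuple(estimate)

steps, position = run_motor_action(target=(0.5, -0.3))
print("CoS reached after", steps, "steps at", position)
```

Note that the condition of satisfaction here only says that the movement is complete. Whether the movement actually centered the object is a separate question, answered by the visual CoS.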

However, if by the end of the action the object is centered in the visual field, this event is detected in the visual CoS field, which triggers a learning process in the synaptic weights that couple the visual intention field to the motor intention field. The visual CoS field also inhibits the visual intention field, which may now build an activity bump over the location of a new visual target.
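The gating of learning by the visual CoS can be illustrated with a small sketch. The text does not specify the learning rule, so the outer-product Hebbian update, the learning rate, and the function name below are assumptions made for illustration only.

```python
def gated_hebbian_update(weights, visual_out, motor_out, cos_active, rate=0.1):
    """Return updated weights[m][v]; nothing changes unless the CoS is active.

    weights[m][v] couples visual site v to motor site m. The update is a
    plain Hebbian outer product (an assumed rule, not taken from the text).
    """
    if not cos_active:
        return [row[:] for row in weights]
    return [[w + rate * m * v for w, v in zip(row, visual_out)]
            for row, m in zip(weights, motor_out)]

# Three visual sites and three motor sites, initially uncoupled.
weights = [[0.0] * 3 for _ in range(3)]
visual_out = [0.0, 1.0, 0.0]   # visual intention peak at site 1
motor_out = [0.0, 0.0, 1.0]    # motor intention peak at site 2

after_miss = gated_hebbian_update(weights, visual_out, motor_out, cos_active=False)
after_hit = gated_hebbian_update(weights, visual_out, motor_out, cos_active=True)
print(after_miss)
print(after_hit)
```

Only the successful saccade strengthens the connection from the active visual site to the active motor site. All other weights, and all weights after an unsuccessful saccade, stay as they were.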

In this manner, the robot explores the mapping between visual targets and the motor commands that bring a visual target into the center of the visual field. As this mapping is acquired, the strength of the exploratory input is reduced, so that ultimately the motor intentions are driven by the visual intentions.
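The shift from exploration to learned behavior can be sketched as a sum of two inputs to the motor intention field. How the exploratory strength is reduced is not stated in the text, so the geometric decay per successful saccade, the single-site exploratory input, and both function names are assumptions.

```python
import random

def exploration_strength(initial, successes, factor=0.9):
    """Assumed schedule: shrink the exploratory input with every success."""
    return initial * factor ** successes

def motor_field_input(weights, visual_out, strength, rng):
    """Learned input through weights[m][v] plus a random exploratory input.

    Returns the summed input per motor site and the randomly chosen site.
    """
    learned = [sum(w * v for w, v in zip(row, visual_out)) for row in weights]
    site = rng.randrange(len(weights))
    total = [x + (strength if i == site else 0.0)
             for i, x in enumerate(learned)]
    return total, site

rng = random.Random(1)
visual_out = [0.0, 1.0, 0.0]

# Early in learning: no weights yet, so the random site wins.
untrained = [[0.0] * 3 for _ in range(3)]
early, site = motor_field_input(untrained, visual_out,
                                exploration_strength(1.0, 0), rng)
print("early input:", early, "random site:", site)

# Late in learning: visual site 1 is mapped to motor site 2.
trained = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
late, _ = motor_field_input(trained, visual_out,
                            exploration_strength(1.0, 40), rng)
print("late input:", late)
```

With untrained weights the strongest input sits at the random site, whereas after many successes the exploratory term is small and the learned projection decides where the motor peak forms.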

Experimental setup



The main result achieved at the workshop is a functional architecture that enables autonomous learning by activating and deactivating visual and motor intentions when appropriate and by triggering learning immediately after a successful saccade. The following figures illustrate the functioning of the architecture.

At the beginning of the learning process, the robot observes the visual input, looking for salient blobs in the perceptual DNF to attend to. As long as no intention has formed in the visual intention field, no motor command is executed. The motor intention field is preshaped by the randomly generated exploratory input, but this preshape is not sufficient to activate the field.

When an intention peak forms in the visual intention field, the overall activity in this field boosts the motor intention field, which now forms a motor peak over the location of the exploratory input. The robot executes the random motor command. If the resulting saccade does not bring the visual target into the center of the view, the visual CoS field remains inactive and no learning occurs. The motor intention is then inhibited by the motor CoS node (not shown in the figure), which matches the intended motor position against the calculated current position.

After a few successful saccades, each of which activated the visual CoS field, some of the transformation weights have been learned. The motor intention field then receives localized input from the visual intention field, which produces an almost correct saccade that is corrected by the dynamics of the motor intention field. Over time, the strength of the exploratory input is reduced and the system forms motor intentions in accordance with the locations of targets in the visual input, transformed by the learned transformation matrix.


The next two screenshots illustrate what happens in the architecture immediately after a successful and an unsuccessful saccade:

After an unsuccessful saccade, the motor CoS node is activated once the movement is completed and inhibits the motor intention field. The visual CoS field remains silent, because the perceptual input does not overlap with the preshaping activation in the center of this DNF.

After a successful saccade, the perceptual input falls in the center of the visual CoS field. This DNF is activated and triggers the learning dynamics in the transformation weights.

Discussion and further work

The following figure shows the time steps at which the visual CoS (red plots) and the motor CoS (blue plots) were active. As may be observed, the visual CoS is activated somewhat more often over time, but overall the learning is fairly slow, as expected in an unconstrained environment with many moving objects in the camera view at any moment. In the follow-up to the project, this framework will be tested in a controlled environment of gradually increasing complexity. Moreover, the model will be aligned with data on the development of looking behavior in young infants. In addition, scalability to more complex setups (stereo vision, a robotic arm) will be tested.