Map formation and alignment

Group Members:

  • Garrick Orchard, Singapore Institute for Neurotechnology (SINAPSE)
  • Michele Rucci, Boston University
  • Shih-Chii Liu, Institute of Neuroinformatics, UNI/ETHZ

Contact Person: Bert Shi

To learn correlations between multimodal stimuli, do we need explicit connections between maps, multisensory maps, or just convergent projections onto motor neurons? We also want to know which coordinate frame develops (head position, eye position, etc.). Does it depend on the task (auditory, visual)? Run the sensory map formation and Hebbian map linking algorithms separately and simultaneously. What are the constraints on learning rates and learning outcomes in the two modes? How independent are the maps?

Learn Visual Tracking

Participants: Bernhard Englitz, Frederic Broccard, David Barr, Qiuyan Peng, Bert Shi


The ability to follow an object with head and/or eye movements is a basic ability of the visual system that underlies a wide range of higher-level mechanisms, such as attention. In the previous project [1], such a tracking system was implemented from an engineering perspective, since the focus at that time was mainly on map alignment. One of the goals of this year's project was to replace all engineered components with neuronally inspired solutions. The tracking system was realized as a neuronal mapping between a visual representation and a motor representation of a robotic head. Starting from a fully random connection structure, the visual-to-motor mapping was learned using Hebbian learning guided by the success of tracking an object.

Initial Model Structure

Visual information was recorded by a camera in the pan-tilt head (320x240), subsampled 10-fold and slightly blurred (imitating the spread of information in a cortical column, related to the subplate development project by P. Kanold). The resulting values were taken to represent the firing rates of cortical neurons (a 32x24 grid). The possible movements of the pan-tilt head were represented as a 3x3 matrix of neurons, coding for all combinations of up/down/left/right/stop movements. The visual and motor representations were connected all-to-all with initially random weights, normalized to 1 per receiving neuron. Hence, activation in the visual population leads to activation in the motor representation. Since only one motor action can be executed at a time, we read out the motor population response by taking the action corresponding to the maximally activated neuron. To avoid running into low-order limit cycles, uniform noise was added to the motor representation before evaluating the maximum, reminiscent of the stochastic response properties of biological neurons. A schematic depiction of the system setup is shown below.
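The structure described above can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions, not the code that ran on the robot: the 3x3 box blur stands in for the unspecified blur, "normalized to 1" is read as "incoming weights sum to 1", and the names visual_rates, init_weights, select_action, the ACTIONS sign convention and the noise amplitude are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

GRID_H, GRID_W = 24, 32          # visual neurons (rows, columns) after subsampling
N_VISUAL = GRID_H * GRID_W
N_MOTOR = 9                      # 3x3 grid of pan/tilt actions

# (d_pan, d_tilt) per motor neuron, row-major over the 3x3 grid.
# The centre entry (0, 0) is "stop". The sign convention is an assumption.
ACTIONS = [(dp, dt) for dt in (1, 0, -1) for dp in (-1, 0, 1)]


def visual_rates(frame, factor=10):
    """Subsample a 240x320 grayscale frame 10-fold and blur it slightly."""
    small = frame[::factor, ::factor].astype(float)
    # 3x3 box blur with edge padding (assumed stand-in for the original blur).
    padded = np.pad(small, 1, mode="edge")
    h, w = small.shape
    blurred = sum(padded[r:r + h, c:c + w]
                  for r in range(3) for c in range(3)) / 9.0
    return blurred.ravel()


def init_weights():
    """Random all-to-all weights; the inputs to each motor neuron sum to 1."""
    w = rng.random((N_VISUAL, N_MOTOR))
    return w / w.sum(axis=0, keepdims=True)


def select_action(v, w, noise=0.05):
    """Winner-take-all readout with uniform noise added before the maximum."""
    m = v @ w + rng.uniform(0.0, noise, size=N_MOTOR)
    return int(np.argmax(m))
```

The returned index can be looked up in ACTIONS to obtain the pan/tilt step. The noise amplitude controls how often a near-winner is chosen instead of the strongest motor neuron, which is what breaks the limit cycles.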

  • system architecture (same as in Sensory Fusion part):

Learning Phase

During the learning phase, the goal of the system was to learn to track an object (a moving red LED) by adapting its internal visuo-motor connections. This adaptation was driven by reward-based Hebbian learning [4], Δw_ij = α(R) v_i m_j, where v_i is the activity of the i-th visual neuron and m_j that of the j-th motor neuron. The learning rate α was not constant but depended linearly on the reward R in a given iteration of the model. R was anticorrelated with the change in the distance of the target from the center of the visual field: when the target moved closer to the center from the last frame to the current frame, the change in distance is less than 0 and R is greater than 0. To ensure a topological organization of the motor neurons, the selected motor action was blurred (Gaussian, initial width 1) for the learning update. While this step is not necessary for learning to track, it mimics the topology found in most cortical representations (which is also induced by activity waves in early development). The size of the blur was also reduced as a function of R. This reward-based annealing schedule avoided overlearning once stable tracking was achieved.

For the convenience of the operators, the LED was moved by the robot arm to a new random position every few seconds. This interval was chosen to give the tracking system a reasonable time span for consistent learning (without unpredictable target movement). This simple update rule with the annealing schedule led to stable tracking after about 5-10 minutes of real-time learning (corresponding to 12000-24000 frames; see the sped-up video below).

An interesting observation was that the receptive fields of the motor neurons were not as clearly distinguished as previously expected (see video headmotor_rf.avi below). Rather, they were offset slightly toward their preferred direction, winning by only a small margin but quite reliably. This was illustrated by hue-coding the preferred movement direction of each visual neuron (the centroid of its projective field). A movement orientation wheel emerged (see movie recep.avi below), indicating the topological and (except for the center point) continuous progression of preferred directions. As described in the map alignment project, the mapping from the head to the arm motor position was learned using classical (non-reward-based) Hebbian learning.
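The reward-based update can be sketched as follows in Python with NumPy. Several details are not given in the text and are assumptions here: the exact dependence of the blur width on R (annealed_width uses a running-average reward), the clipping of weights at zero, the renormalization after each step, and the constants alpha0 and width0. All function names are hypothetical.

```python
import numpy as np

N_MOTOR_SIDE = 3  # motor neurons form a 3x3 grid


def reward(prev_dist, curr_dist):
    """Positive when the target moved closer to the image centre."""
    return -(curr_dist - prev_dist)


def annealed_width(r_avg, width0=1.0):
    """Blur width shrinks as the running-average reward grows (assumed form)."""
    return width0 / (1.0 + max(r_avg, 0.0))


def blurred_action(winner, width):
    """Gaussian bump over the 3x3 motor grid centred on the selected neuron."""
    idx = np.arange(N_MOTOR_SIDE ** 2)
    rows, cols = idx // N_MOTOR_SIDE, idx % N_MOTOR_SIDE
    wr, wc = divmod(winner, N_MOTOR_SIDE)
    d2 = (rows - wr) ** 2 + (cols - wc) ** 2
    return np.exp(-d2 / (2.0 * width ** 2))


def hebbian_step(w, v, winner, r, r_avg, alpha0=0.01, width0=1.0):
    """One update dw_ij = alpha(R) * v_i * m_j with alpha(R) = alpha0 * R."""
    m = blurred_action(winner, annealed_width(r_avg, width0))
    w = w + alpha0 * r * np.outer(v, m)
    # Assumptions: keep weights non-negative and renormalise the inputs
    # of each motor neuron so that they again sum to 1.
    w = np.clip(w, 0.0, None)
    return w / np.maximum(w.sum(axis=0, keepdims=True), 1e-12)
```

With this form, a negative reward weakens the connections that produced the action, and a high average reward narrows the blur so that later updates touch fewer neighbouring motor neurons.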

  • learn visual tracking (sped-up video):

  • receptive field of head motor:


  • receptive field pinwheel:
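The hue coding of preferred directions described above can be sketched as follows in Python with NumPy. This is a minimal illustration, assuming the visual-to-motor weights are stored as an (n_visual, 9) array and that the 3x3 motor grid is laid out row-major with "stop" at the centre. The OFFSETS layout and the function names are hypothetical.

```python
import colorsys

import numpy as np

# Offsets (dx, dy) of the 3x3 motor grid, row-major, centre = stop (assumed).
OFFSETS = np.array([(dx, dy) for dy in (1, 0, -1) for dx in (-1, 0, 1)],
                   dtype=float)


def preferred_direction_hues(w):
    """Hue in [0, 1] per visual neuron from its projective-field centroid."""
    w = np.asarray(w, dtype=float)
    p = w / np.maximum(w.sum(axis=1, keepdims=True), 1e-12)
    centroid = p @ OFFSETS                       # shape (n_visual, 2)
    angle = np.arctan2(centroid[:, 1], centroid[:, 0])
    return (angle / (2.0 * np.pi)) % 1.0


def hue_image(w, grid_shape=(24, 32)):
    """RGB image with one fully saturated colour per visual neuron."""
    hues = preferred_direction_hues(w)
    rgb = np.array([colorsys.hsv_to_rgb(h, 1.0, 1.0) for h in hues])
    return rgb.reshape(tuple(grid_shape) + (3,))
```

A visual neuron that projects mostly to the "right" motor neuron gets hue 0, one that projects mostly "up" gets hue 0.25, and so on around the colour wheel. Neurons near the image centre have a centroid close to (0, 0), so their hue is poorly defined, which matches the discontinuity at the center point.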


Head-Arm Tracking Phase

After the learning process had converged, the head-to-arm motor map learning was stopped. The arm movements were then controlled by the head movements, replacing the random movements of the robot arm. The target was guided by one of the operators, leading the head and the arm to track the target synchronously, reminiscent of the tracking behavior of a toddler finding a target haptically and visually.

Sensory Fusion

Participants: Qiuyan Peng, David Barr, Bernhard Englitz, Frederic Broccard, Garrick Orchard, Bert Shi

In the earlier project [1, 2], a robot learned to point the end effector of its arm at the object it is looking at simply by watching its arm move in front of itself. The learning is established by aligning the arm motor coordinates with the sensory coordinates from vision and arm proprioception via biologically plausible Hebbian learning.

In the current project, we adopt a similar framework but integrate audition as an additional sensory map. We obtain a 1x16 Interaural Time Difference (ITD) histogram from the silicon cochlea [3] as the auditory input. Each bin value is normalized by the maximum of the histogram to give the pre-synaptic response, which is fed into the Hebbian learning architecture. Since the ITD information can be used for sound localization, it is expected to facilitate learning to point. (See https://neuromorphs.net/nm/wiki/2010/sfalign10#LearnVisualTracking for the system architecture.)
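The normalization of the ITD histogram can be written as a short Python function. This is a sketch only: how the histogram is accumulated from the cochlea output is not covered here, the function name is hypothetical, and returning all zeros for an empty histogram (silence) is an assumption.

```python
import numpy as np

N_ITD_BINS = 16


def normalize_itd_histogram(counts):
    """Scale a 16-bin ITD histogram so that its largest bin equals 1."""
    counts = np.asarray(counts, dtype=float)
    if counts.shape != (N_ITD_BINS,):
        raise ValueError("expected a histogram with 16 bins")
    peak = counts.max()
    if peak <= 0.0:
        # Assumption: with no events, the auditory map stays silent.
        return np.zeros_like(counts)
    return counts / peak
```

Dividing by the maximum makes the pre-synaptic response independent of the overall number of events, so only the shape of the ITD distribution drives learning.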

To give the system robust auditory input without the noise generated by the pan-tilt motor, we split one training cycle (70 frames) into two halves by slightly modifying the visual tracking system. In the first half, the visual tracking is updated at each frame for 35 frames. In the second half, the visual tracking stops updating (so that the pan-tilt head does not move) and the system listens to the sound with the silicon cochlea. The projective mapping from vision to arm motor is updated only in the first half, and the mapping from audition to arm motor is updated only in the second half of the cycle. The mapping from arm proprioception to arm motor is updated during the whole cycle.
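The update schedule within one cycle can be summarized in a small Python function. The function name and the string labels for the mappings are hypothetical and chosen only for illustration.

```python
CYCLE_FRAMES = 70
HALF = CYCLE_FRAMES // 2   # 35 frames per half


def active_updates(frame_index):
    """Return the set of components that are updated at a given frame."""
    phase = frame_index % CYCLE_FRAMES
    # Proprioception-to-arm learning runs during the whole cycle.
    updates = {"proprioception_to_arm"}
    if phase < HALF:
        # First half: the head tracks the target and vision is learned.
        updates |= {"visual_tracking", "vision_to_arm"}
    else:
        # Second half: the head is still, so the cochlea input is clean.
        updates.add("audition_to_arm")
    return updates
```

For example, active_updates(10) includes the visual tracking and the vision-to-arm mapping, while active_updates(40) includes only the audition-to-arm and proprioception-to-arm mappings.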

We examine the effect of fusing audition with vision by combining the projective fields between the sensory maps and the action map. Each neuron in the auditory map projects to a field in the arm motor map, and each neuron in the arm motor map receives responses from a field in the visual map. Thus, we compute the projection from the auditory map to the visual map as the weighted average of the arm motor receptive fields, where the weights are the projections from the auditory map.
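The weighted average described above amounts to a normalized matrix product. The following Python sketch assumes the auditory-to-arm weights are stored as an (n_itd, n_arm) array and the vision-to-arm weights as an (n_visual, n_arm) array, so that column a of the latter is the visual receptive field of arm motor neuron a. The array layout and the names w_ua, w_va and auditory_to_visual are hypothetical.

```python
import numpy as np


def auditory_to_visual(w_ua, w_va, visual_shape):
    """Project each ITD channel into the visual map via the arm motor map.

    w_ua: (n_itd, n_arm) auditory-to-arm weights.
    w_va: (n_visual, n_arm) vision-to-arm weights.
    Returns an array of shape (n_itd,) + visual_shape.
    """
    w_ua = np.asarray(w_ua, dtype=float)
    w_va = np.asarray(w_va, dtype=float)
    # Normalise each channel's projection so that it acts as averaging weights.
    totals = np.maximum(w_ua.sum(axis=1, keepdims=True), 1e-12)
    weights = w_ua / totals
    # Weighted average of the arm neurons' visual receptive fields.
    w_uv = weights @ w_va.T
    return w_uv.reshape((w_ua.shape[0],) + tuple(visual_shape))
```

Each slice of the result with the ITD channel fixed is one matrix over the visual map, showing where in the visual field that channel effectively points.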

The weights between the auditory map and the visual map are initially random, and the tuning of average pan versus ITD starts out flat, but both evolve over time as the two maps become aligned. Each block in the figure below is a matrix for one ITD channel (Wuv(u,v0,v1) with u fixed). The curve shows the average normalized pan projected from the different ITD channels.
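One way to obtain such a tuning curve from Wuv is a weighted mean of the pan coordinate for each ITD channel. The sketch below is an assumption about how the curve could be computed, not a description of the original analysis: it takes pan to be normalized to [-1, 1] and lets the caller choose which of the two visual axes corresponds to pan (pan_axis), since the text does not say.

```python
import numpy as np


def average_pan_per_itd(w_uv, pan_axis=2):
    """Weighted mean pan coordinate in [-1, 1] for every ITD channel.

    w_uv: array indexed as (u, v0, v1); pan_axis is 1 or 2.
    """
    w_uv = np.asarray(w_uv, dtype=float)
    other_axis = 3 - pan_axis                 # the remaining visual axis
    profile = w_uv.sum(axis=other_axis)       # shape (n_itd, n_pan)
    pan = np.linspace(-1.0, 1.0, profile.shape[1])
    total = np.maximum(profile.sum(axis=1), 1e-12)
    return (profile @ pan) / total
```

With unstructured weights every channel averages to roughly the same pan value, giving a flat curve. If the maps align, the curve should become monotonic in the ITD channel index.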

Here is a video showing the evolution during training:

Another video shows the following behavior after sensory fusion and map alignment:

The placement of the silicon cochlea on the robot needs to be further tuned to get better alignment of the sensory maps…

[1] https://neuromorphs.net/ws2009/wiki/sensormotor09
[2] Y. Wang, T. Wu, G. Orchard, P. Dudek, M. Rucci and B. Shi, "Hebbian Learning of Visually Directed Reaching by a Robot Arm", IEEE Biomedical Circuits and Systems Conference, BioCAS 2009, pp. 205-208, 2009. https://neuromorphs.net/nm/attachment/wiki/2010/sf10/BioCAS_2009.pdf
[3] V. Chan, S.-C. Liu and A. van Schaik, "AER EAR: A matched silicon cochlea pair with address event representation interface", IEEE Transactions on Circuits and Systems I: Special Issue on Smart Sensors, 54(1), pp. 48-59, 2007. http://www.ini.uzh.ch/~shih/papers/liuvanschaiktcas1.pdf
[4] D. O. Hebb, The Organization of Behavior. New York: Wiley, 1949.

APRON Platform

David Barr