Audio-visual fusion


We implemented a spiking model of Bayesian inference using integrate-and-fire (IF) neurons to improve card identification by combining the outputs of the various visual and auditory sensors. Each sensor output a 52-element vector of card probabilities, which served as the common interface between the sensory modules and the fusion system. These probabilities were then mapped to input currents for an IF neuron with a refractory period. By tuning the neuron's parameters, we were able to make its firing rate saturate logarithmically (as observed by Tal and Schwartz, 1997), which converted the probability inputs into log probabilities. We could then sum the log-probability spike rates (instead of multiplying the probabilities, as in normal Bayesian inference) to identify the most likely card.
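The equivalence this relies on, that summing log probabilities selects the same card as multiplying probabilities, can be sketched without any spiking machinery. The following is a minimal illustration, not the implementation used in the project: it assumes the sensors are conditionally independent, and the names `fuse_log`, `fuse_product`, `peaked`, and the floor `EPS` are hypothetical helpers introduced here.

```python
import math

N_CARDS = 52
EPS = 1e-12  # hypothetical floor so log() stays finite for zero probabilities


def fuse_log(sensor_probs):
    """Sum per-card log probabilities across sensors; return (best card, scores)."""
    scores = [0.0] * N_CARDS
    for probs in sensor_probs:
        for card, p in enumerate(probs):
            scores[card] += math.log(max(p, EPS))
    best = max(range(N_CARDS), key=scores.__getitem__)
    return best, scores


def fuse_product(sensor_probs):
    """Reference: multiply probabilities directly and return the best card."""
    products = [1.0] * N_CARDS
    for probs in sensor_probs:
        for card, p in enumerate(probs):
            products[card] *= p
    return max(range(N_CARDS), key=products.__getitem__)


def peaked(card, peak):
    """Made-up test vector: `peak` on one card, the remainder spread evenly."""
    rest = (1.0 - peak) / (N_CARDS - 1)
    return [peak if i == card else rest for i in range(N_CARDS)]


# Two sensors that disagree: the more confident one dominates the fused estimate.
visual = peaked(10, 0.40)
auditory = peaked(20, 0.30)
best_card, _ = fuse_log([visual, auditory])
same_as_product = best_card == fuse_product([visual, auditory])
```

In the spiking version described above, the explicit `math.log` call would be replaced by the saturating firing-rate response of the IF neuron, so the sum is taken over spike rates rather than over exact logarithms.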

In addition to the sensor data, we were able to include global information about the game (such as cards played and cards in hand) as a prior to further improve our estimates. The inference worked well on test probabilities and was also successfully interfaced with both the artificial auditory and visual systems.
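One way such a prior could enter the log-domain sum is sketched below. This is an assumption for illustration only: the prior is taken to be uniform over cards not yet seen, with seen cards (played or in hand) ruled out entirely, and the function names `log_prior` and `posterior_argmax` are hypothetical.

```python
import math

N_CARDS = 52


def log_prior(seen_cards):
    """Uniform log prior over unseen cards; seen cards get -inf (ruled out)."""
    remaining = N_CARDS - len(seen_cards)
    if remaining <= 0:
        raise ValueError("no cards left to identify")
    return [
        float("-inf") if card in seen_cards else -math.log(remaining)
        for card in range(N_CARDS)
    ]


def posterior_argmax(log_likelihood, seen_cards):
    """Add the log prior to the fused log likelihood and pick the best card."""
    prior = log_prior(seen_cards)
    scores = [ll + lp for ll, lp in zip(log_likelihood, prior)]
    return max(range(N_CARDS), key=scores.__getitem__)


# Made-up fused sensor output: card 5 looks best, card 6 is the runner-up.
log_likelihood = [math.log(0.01)] * N_CARDS
log_likelihood[5] = math.log(0.30)
log_likelihood[6] = math.log(0.20)

without_prior = posterior_argmax(log_likelihood, seen_cards=set())
with_prior = posterior_argmax(log_likelihood, seen_cards={5})  # card 5 already played
```

With an empty set of seen cards the sensors alone decide; once card 5 is marked as played, the estimate moves to the runner-up.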

One weakness of this method was that it could not discriminate between very similar probability estimates (differing by less than 1%), because a continuous current signal is mapped to a discrete spiking signal. That is, the input current had to increase by a specific increment before it could evoke an additional spike.
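The quantization effect can be reproduced with a toy simulation. The sketch below is a generic non-leaky IF neuron with a refractory period, counted over a fixed window; the threshold, time step, refractory length, and input currents are arbitrary values chosen for illustration and are not the parameters used in the project.

```python
def spike_count(current, steps=10000, dt=1e-4, threshold=1.0, refractory_steps=50):
    """Count spikes of a non-leaky IF neuron driven by a constant current.

    The membrane value integrates current * dt each step, emits a spike and
    resets when it reaches the threshold, then stays silent for
    refractory_steps steps. All parameter values are illustrative assumptions.
    """
    v = 0.0
    wait = 0
    spikes = 0
    for _ in range(steps):
        if wait > 0:
            wait -= 1  # still refractory: ignore the input
            continue
        v += current * dt
        if v >= threshold:
            spikes += 1
            v = 0.0
            wait = refractory_steps
    return spikes


# Two inputs 0.5% apart versus a clearly larger input.
close_a = spike_count(20.0)
close_b = spike_count(20.1)
larger = spike_count(25.0)
indistinguishable = close_a == close_b
```

Because the output is an integer spike count, the two nearby currents yield the same count in this window, while the clearly larger current yields more spikes. A longer counting window would narrow the increment needed to evoke an extra spike, at the cost of a slower decision.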

Visual and Auditory Fusion

Jonathan Dhyr, Anirban Dutta, Tara Julia Hamilton, Jon Tapson, Christopher Comer