Sound Localization

Participants: Shih-Chii Liu, Nicolai Waniek, Evangelos Stromatias, Dimitra Emmanouilidou, Qian Liu, Xavier Lagorce, Amir Khosrowshahi, Alejandro Pasciaroni, Tobi Delbruck

We will map an environment of multiple audio sources (speakers). The robot, with mounted microphones, will locate and identify the sound sources and then move towards a sound of interest. The audio input goes to the AEREAR2, a binaural spiking cochlea system. In the first instance, we recorded data from 4 speakers. The recording setup for the experiments run on the 8th of July is shown in the figures (1, 2), and the jAER log files can be found in uns13/Experiments_Monday_8th/. These and subsequent data files were used to train a network to recognize a speaker's gender or identity.

Two different approaches were explored here.

Approach 1: The spikes from the AEREAR2 were used as inputs to a spiking network running on SpiNNaker to localize the sound source.

The most widely accepted hypothesis is the Jeffress model, which detects the interaural time differences (ITDs) of a sound wave captured by the two ears. Although it forms the basis of modern binaural localization, the first physiological evidence came more than 30 years after the model was proposed, when the nucleus laminaris of the barn owl was shown to operate much as the model predicts.

[Figure: MSO coincidence detection neurons]
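As a rough illustration of the Jeffress scheme (this is a toy sketch, not the SpiNNaker implementation; the spike times, coincidence window, and delay grid are all invented for the example), the code below scans a bank of candidate delays and picks the one that produces the most coincidences between the two ears:

```python
import numpy as np

def jeffress_itd(left_spikes, right_spikes, delays, window=10e-6):
    """Count coincidences for each delay-line value; the winning delay
    compensates the interaural time difference (Jeffress model)."""
    counts = []
    for d in delays:
        shifted = left_spikes + d   # axonal delay applied to the left ear
        # a right-ear spike is coincident if some delayed left-ear spike
        # falls within the coincidence window of a detector neuron
        dist = np.abs(right_spikes[:, None] - shifted[None, :]).min(axis=1)
        counts.append(int(np.sum(dist <= window)))
    counts = np.asarray(counts)
    return delays[np.argmax(counts)], counts

# Toy input: the sound reaches the left ear 200 microseconds after the right ear
rng = np.random.default_rng(0)
right = np.sort(rng.uniform(0.0, 1.0, 200))   # spike times in seconds
left = right + 200e-6
delays = np.linspace(-500e-6, 500e-6, 41)     # candidate ITDs, 25 us apart
best, counts = jeffress_itd(left, right, delays)
# under this sign convention the winning delay is about -200 us
```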

The cochlea output is as follows, and the ITDs are captured in a MATLAB simulation:
[Figures: cochlea output; coincidence detection neuron]

We captured the output from the cochlea for different sound directions during the workshop. The recording positions are shown in the figure as white marks on the ground, at angles of {0, ±30, ±60, +90} degrees. Laptop A, connected to a speaker, played the sound files at each angle. Laptop B, connected to the cochlea, recorded the spiking activity. [Figures: data capture]

The spiking neural network model is described below: [Figure: network model]

The weighted projections from the coincidence detection neurons to the direction neurons are trained. [Figure: weights]
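The report does not give the training rule, so the following is only a plausible sketch: a delta-rule fit of the projection weights on synthetic coincidence-neuron activity. The neuron counts, tuning widths, and noise level are all assumptions made for the example:

```python
import numpy as np

rng = np.random.default_rng(1)
n_coinc, n_dir = 41, 6   # assumed sizes: one coincidence neuron per candidate ITD

def coincidence_activity(direction):
    """Synthetic activity: a bump of coincidence-neuron firing centred on the
    ITD corresponding to the given direction index."""
    centre = direction * (n_coinc - 1) / (n_dir - 1)
    bump = np.exp(-0.5 * ((np.arange(n_coinc) - centre) / 3.0) ** 2)
    return bump + 0.05 * rng.standard_normal(n_coinc)

X = np.array([coincidence_activity(d) for d in range(n_dir) for _ in range(50)])
y = np.repeat(np.arange(n_dir), 50)

# Delta-rule training of the projection weights W (direction x coincidence)
W = np.zeros((n_dir, n_coinc))
lr = 0.1
for _ in range(100):
    for x, t in zip(X, y):
        target = np.eye(n_dir)[t]          # one-hot direction target
        W += lr * np.outer(target - W @ x, x)

accuracy = float(np.mean(np.argmax(X @ W.T, axis=1) == y))
```

After training, the direction neuron with the largest response indicates the estimated heading.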

Moving down to the hardware, the cochlea is connected to the 48-node SpiNNaker board via an FPGA link. [Figure: connection]

A speaker male-vs-female identification task was explored further: using only the cue of inter-spike intervals (ISIs) in each channel band, we trained a linear SVM. To collect data, four subjects (2 male, 2 female) were asked to talk to the cochlea while standing 1 meter away, reading local news from the Telluride website. Then two sample files (TIMIT database, 1 male, 1 female) were played through a speaker positioned 1 meter from the cochlea and further data were collected, yielding 6 data files in total. The data files contained the output of the cochlea (spikes and timestamps).

The left ear was chosen and the inter-spike intervals were calculated using the ISIfilter in jAER. Data was logged per frame into txt files.
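The per-channel ISI computation can be sketched as follows; this is a stand-in for the jAER filter, assuming each cochlea event is a (timestamp, channel) pair:

```python
import numpy as np
from collections import defaultdict

def per_channel_isis(timestamps_us, channels):
    """Group spikes by cochlea frequency channel and return the
    inter-spike intervals (in microseconds) for each channel."""
    by_chan = defaultdict(list)
    for t, c in zip(timestamps_us, channels):
        by_chan[c].append(t)
    return {c: np.diff(np.sort(ts)) for c, ts in by_chan.items()}

# Toy event stream on two channels
ts = [0, 100, 150, 400, 450, 900]
ch = [3, 5, 3, 3, 5, 5]
isis = per_channel_isis(ts, ch)
# channel 3 spikes at 0, 150, 400 -> ISIs [150, 250]
```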

For the linear SVM, two classes were considered (male/female), and cross-validation was used by randomly choosing 90% of the data for training and 10% for testing. Averaged over 50 Monte Carlo runs, the classifier achieved 89.96% correct classification. Below are the average ISIs for male (top) and female (bottom) speakers for selected frequency channels, produced with MATLAB.
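A minimal numpy-only stand-in for this pipeline is sketched below. The features are synthetic (not the recorded ISI data), the SVM is a hand-rolled hinge-loss classifier rather than a packaged solver, and only 10 Monte Carlo runs are done instead of the 50 used in the experiment:

```python
import numpy as np

def train_linear_svm(X, y, lr=0.01, lam=0.01, epochs=30):
    """Subgradient descent on the hinge loss; labels y are in {-1, +1}."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) < 1:          # margin violated
                w += lr * (yi * xi - lam * w)
                b += lr * yi
            else:
                w -= lr * lam * w
    return w, b

# Synthetic stand-in for per-frame ISI feature vectors over 64 channels:
# the first 8 channels carry the class difference.
rng = np.random.default_rng(2)
n = 200
male = rng.normal(0.0, 1.0, (n, 64))
male[:, :8] += 1.5
female = rng.normal(0.0, 1.0, (n, 64))
female[:, :8] -= 1.5
X = np.vstack([male, female])
y = np.array([1] * n + [-1] * n)

scores = []
for _ in range(10):                        # Monte Carlo cross-validation runs
    perm = rng.permutation(len(y))
    cut = int(0.9 * len(y))                # random 90% train / 10% test split
    tr, te = perm[:cut], perm[cut:]
    w, b = train_linear_svm(X[tr], y[tr])
    scores.append(float(np.mean(np.sign(X[te] @ w + b) == y[te])))
mean_acc = float(np.mean(scores))
```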

Average inter-spike intervals for males and females with respect to selected frequency channels. The difference is noticeable.

Currently we cannot recognize the gender using SpiNNaker, but we have found a simple approach to classify it: the pitch can be detected by looking at the spike rates of channel 59 (female) and channel 64 (male). [Figures: pitch detection, female and male]
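The channel-rate comparison fits in a few lines. The channel indices 59 and 64 come from the report; the event format and the simple majority rule are assumptions for the sketch:

```python
import numpy as np

# Channel indices from the report: spike rate on channel 59 tracks the
# (higher) female pitch, channel 64 the (lower) male pitch.
FEMALE_CH, MALE_CH = 59, 64

def classify_gender(event_channels):
    """Compare spike counts on the two pitch channels over a window of events."""
    ch = np.asarray(event_channels)
    female = int(np.sum(ch == FEMALE_CH))
    male = int(np.sum(ch == MALE_CH))
    return "female" if female > male else "male"

# Toy event stream dominated by channel 59
label = classify_gender([59] * 30 + [64] * 10 + [12] * 5)
```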

A short demo of sound localization can be found in the attachment.

Approach 2: As the most straightforward route to a working demonstration, jAER was used to integrate two existing filters (ITDFilter and CochleaGenderClassifier?) together with a new class OmniRobotContoller? that sends UDP commands to the OmniRobot?, yielding a new robot implementation class CochleaOmnRobotiSexChaser?.

The cochlea is mounted on top of the OmniRobot? and the ITDFilter is used to extract the heading direction of the speaker. Then the CochleaGenderClassifier? determines if the speaker is male or female. The robot is controlled to steer towards only female or male speakers as selected by the user.
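The UDP control path might look like the sketch below. The actual OmniRobot? command grammar, IP address, and port live in the jAER classes and are not given in this report, so the `turn <degrees>` message and the address here are purely illustrative:

```python
import socket

# Assumed address; the real OmniRobot? IP/port are not given in this report.
ROBOT_ADDR = ("192.168.1.50", 5000)

def steer_towards(heading_deg, addr=ROBOT_ADDR):
    """Send one UDP steering command derived from the ITDFilter heading
    estimate. The 'turn <degrees>' command string is a made-up example."""
    msg = f"turn {heading_deg:.1f}".encode("ascii")
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(msg, addr)      # fire-and-forget datagram to the robot
    finally:
        sock.close()
    return msg
```

In the demo loop, the heading from the ITDFilter would be passed to such a function whenever the gender classifier matches the user-selected target.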