Passive Auditory Processing

Participants: Tomas Figliolia, Andreas Andreou

The auditory processing pathways use both time-domain and spectral features to aid segmentation, classification, and ultimately recognition. For passive auditory processing of the microphone-array signals, we obtain a spectral representation of sound by extracting features with a model of the cochlea (Liu 1992). The model takes the audio signals and, by simulating the basilar membrane and the inner hair cells, produces a cochleogram from which features can be extracted. The idea was to threshold the cochleogram and turn it into a logical image (only 0 or 1) that marks the regions of interest. We would then find “islands” (connected regions of ones) in this image, characterize each island by features such as area, length, width, orientation, and frequency of repetition, and use these features to identify the action being performed. Because the cochleogram has a high resolution, we first had to apply an image smoothing algorithm. Figure 1 shows the smoothed cochleogram (left) for a person chopping a cucumber and the thresholded image (right) from which the features are extracted. More complex feature extraction strategies can also be incorporated to classify sounds emanating from machinery such as a kitchen blender (Goldberg 2006).
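The smoothing, thresholding, and island-characterization steps can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the implementation used in this work: the cochleogram is assumed to be already computed (the Liu 1992 model is not reproduced) and given as a 2-D list with rows as cochlear channels and columns as time frames. The box-filter smoothing, the 4-connectivity rule, the moment-based orientation, the mapping of "length" and "width" to time and channel extents, and all function names (`smooth`, `to_binary`, `find_islands`, `island_features`) are hypothetical choices.

```python
import math
from collections import deque

def smooth(img, k=1):
    """Box-filter smoothing with a (2k+1) x (2k+1) window (assumed filter choice).

    Cells near the border are averaged over the part of the window that fits.
    """
    rows, cols = len(img), len(img[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [img[i][j]
                      for i in range(max(0, r - k), min(rows, r + k + 1))
                      for j in range(max(0, c - k), min(cols, c + k + 1))]
            out[r][c] = sum(window) / len(window)
    return out

def to_binary(img, thresh):
    """Logical image: 1 where the (smoothed) cochleogram exceeds thresh, else 0."""
    return [[1 if v > thresh else 0 for v in row] for row in img]

def find_islands(mask):
    """4-connected regions of ones, each returned as a list of (row, col) pixels."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    islands = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, pixels = deque([(r, c)]), []
            while queue:  # breadth-first flood fill from the seed pixel
                i, j = queue.popleft()
                pixels.append((i, j))
                for ni, nj in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                    if 0 <= ni < rows and 0 <= nj < cols and mask[ni][nj] and not seen[ni][nj]:
                        seen[ni][nj] = True
                        queue.append((ni, nj))
            islands.append(pixels)
    return islands

def island_features(pixels):
    """Area, extents, onset, and orientation of one island.

    Orientation is the principal-axis angle (radians) relative to the time axis,
    computed from second-order central moments.
    """
    area = len(pixels)
    rs = [r for r, _ in pixels]
    cs = [c for _, c in pixels]
    mr, mc = sum(rs) / area, sum(cs) / area
    mu_rr = sum((r - mr) ** 2 for r in rs) / area
    mu_cc = sum((c - mc) ** 2 for c in cs) / area
    mu_rc = sum((r - mr) * (c - mc) for r, c in pixels) / area
    return {
        "area": area,
        "length": max(cs) - min(cs) + 1,  # extent in time frames (assumed meaning)
        "width": max(rs) - min(rs) + 1,   # extent in cochlear channels (assumed meaning)
        "onset": min(cs),                 # first frame, usable for repetition analysis
        "orientation": 0.5 * math.atan2(2 * mu_rc, mu_cc - mu_rr),
    }
```

A typical call chain would be `find_islands(to_binary(smooth(cochleogram), thresh))`; the frequency of repetition could then be estimated from the spacing of the sorted island `onset` values.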

Figure 1: Smoothed cochleogram on the left and thresholded cochleogram on the right. This corresponds to the action of chopping a cucumber.

Unfortunately, we ran out of time to complete this cochleogram-based feature extraction, so for demonstration purposes we switched to a simpler way of classifying sounds using a time-domain feature (energy). Our feature extraction was based on how often a sound repeats and on its “duty cycle”. For example, chopping a cucumber produces sounds that repeat at a higher rate than cutting a tomato, and because the two vegetables are different, the sound of each cut decays differently. This decay, that is, the duration of each sound, is what we call the “duty cycle”. The recognition performance was adequate, especially for periodic actions such as cutting a tomato, chopping a cucumber, or stirring water. A similar time-domain processing strategy can be applied to the signals from the vibration sensors.
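The energy-based features described above can be sketched as follows. This is a hedged illustration, not the code used in the demonstration: the short-time energy framing, the fixed energy threshold, and all names (`frame_energy`, `sound_events`, `repetition_features`, `classify`) are hypothetical. Besides the mean sound duration (the "duty cycle" as defined above), the sketch also reports duration divided by the repetition period, a normalized variant we assume for illustration. The nearest-template classifier is likewise an assumed, simplified stand-in and does not scale its two features.

```python
def frame_energy(x, frame_len, hop):
    """Short-time energy: sum of squared samples in each (possibly overlapping) frame."""
    return [sum(s * s for s in x[i:i + frame_len])
            for i in range(0, len(x) - frame_len + 1, hop)]

def sound_events(energy, thresh):
    """(start, end) frame indices of runs where energy exceeds thresh; end is exclusive."""
    runs, start = [], None
    for i, e in enumerate(energy):
        if e > thresh and start is None:
            start = i
        elif e <= thresh and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:  # a sound still active at the end of the recording
        runs.append((start, len(energy)))
    return runs

def repetition_features(energy, thresh, frame_rate):
    """Repetition rate (Hz), mean sound duration (s), and duration / period ratio.

    frame_rate is the number of energy frames per second.
    Returns None when fewer than two sound events are found.
    """
    runs = sound_events(energy, thresh)
    if len(runs) < 2:
        return None
    onsets = [s for s, _ in runs]
    period = (onsets[-1] - onsets[0]) / (len(onsets) - 1)  # mean onset spacing, frames
    duration = sum(e - s for s, e in runs) / len(runs)      # mean event length, frames
    return {
        "rate_hz": frame_rate / period,
        "duration_s": duration / frame_rate,
        "duty_ratio": duration / period,
    }

def classify(features, templates):
    """Nearest template in (rate_hz, duty_ratio) space; templates map label -> (rate, ratio)."""
    return min(templates, key=lambda label:
               (features["rate_hz"] - templates[label][0]) ** 2
               + (features["duty_ratio"] - templates[label][1]) ** 2)
```

With this design, a fast, crisp action (many short events) and a slow action with long-decaying sounds separate along the two axes even when their peak energies are similar.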

Time-domain algorithms were employed to estimate the position of the sound source in space <place>, as described in the paper Close Range Bearing Estimation and Tracking of Slow moving vehicles using the microphone arrays in the Hopkins Acoustic Surveillance Unit. That paper explains how to approach the problem using only two microphones. We implemented the method for different acoustic scenarios and found that it worked well. The algorithm relies on the fact that, unless the acoustic source is equidistant from the two microphones, the sound reaches one microphone before the other. Figure 2 shows the results we obtained for a person walking in circles around the two microphones while talking. The one improvement still needed is to extend the two-microphone algorithm to four microphones, because the two-microphone approach can only resolve angles from 0 to 180 degrees: two source positions, mirror images of each other across the line through the two microphones, produce the same delay between the microphones.
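A generic two-microphone time-delay bearing estimate can be sketched as below. This is not necessarily the method of the cited paper: it assumes a far-field (plane-wave) source, an integer-sample delay found by brute-force cross-correlation, and a speed of sound of 343 m/s. The function names, including `bearing_360_deg`, which shows one possible four-microphone extension using two orthogonal, equally spaced pairs, are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value for air at about 20 degrees C

def estimate_delay(left, right, max_lag):
    """Integer lag (samples) maximizing sum_n left[n] * right[n + lag].

    A positive lag means `right` is a delayed copy of `left`, i.e. the sound
    reached the left microphone first. max_lag only needs to cover
    spacing * fs / c, the largest physically possible delay.
    """
    best_lag, best_val = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        lo, hi = max(0, -lag), min(len(left), len(right) - lag)
        val = sum(left[i] * right[i + lag] for i in range(lo, hi))
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag

def bearing_deg(lag, fs, spacing, c=SPEED_OF_SOUND):
    """Far-field bearing in [0, 180] degrees.

    Measured from the direction pointing from the right microphone toward the
    left one; a source at +theta and its mirror image at -theta give the same lag.
    """
    cos_theta = c * (lag / fs) / spacing
    cos_theta = max(-1.0, min(1.0, cos_theta))  # clamp overshoot caused by noise or rounding
    return math.degrees(math.acos(cos_theta))

def bearing_360_deg(lag_x, lag_y):
    """Full-circle bearing from two orthogonal, equally spaced pairs (hypothetical layout).

    With equal spacing and sample rate, lag_x is proportional to cos(phi) and
    lag_y to sin(phi), so atan2 recovers phi over the full 0 to 360 degrees.
    """
    return math.degrees(math.atan2(lag_y, lag_x)) % 360.0
```

Because `bearing_deg` is built on `acos`, its output can never leave the 0 to 180 degree range, which is exactly the mirror ambiguity of a single pair; adding a second pair at right angles supplies the missing sine component.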

Figure 2: Example of the time delay from one of the microphones for the case of a person walking and talking around the microphone array.


(Liu 1992) W. Liu, A. Andreou, and M. Goldstein, “Voiced-speech representation by an analog silicon model of the auditory periphery,” IEEE Transactions on Neural Networks, vol. 3, no. 3, pp. 477–487, 1992.

(Goldberg 2006) D. Goldberg, A. Andreou, P. Julián, P. Pouliquen, L. Riddle, and R. Rosasco, “VLSI implementation of an energy-aware wake-up detector for an acoustic surveillance sensor network,” ACM Transactions on Sensor Networks, vol. 2, no. 4, pp. 594–611, Nov. 2006.