EEG Measurements of Saliency

Ailar Javadi, Song Hui Chon, Spencer Kellis, Mounya, Malcolm Slaney, Julien Martel, Peter Brunner

Project Report (added by Song Hui)

Motivation: We wanted to answer the following questions: What is auditory saliency, and how does it compare to visual saliency? How would the brain respond to a more 'salient' sound?

Team Goals:

  • Study the relationship between attention and saliency, especially bottom-up saliency perception in audition.
  • Study existing auditory saliency models, such as Kayser et al. (2005).
  • Design an EEG experiment using auditory stimuli. Learn how the BCI2000 interface works. Collect data from at least a couple of subjects.
  • Analyze the EEG data and the saliency maps obtained from models and establish a relationship between them.


Accomplishments:

  • Designed two EEG experiments, as well as stimuli, to measure the effect of bottom-up auditory saliency (discussed below).
  • Conducted a preliminary analysis of EEG response to auditory stimuli.
  • Generated and studied saliency maps of the stimuli using Kayser's model and Andrew Schwartz's cochlea-based model. These maps were compared with the time-series EEG responses to obtain correlation metrics between them, but the results were inconclusive. We will need to collect more EEG data to see definitive patterns.
  • Studied spectrograms of the EEG responses, which seemed to show some correlation with the saliency values predicted by the models. We did not have enough data to confirm this hypothesis, though.
  • Tested the (non)linearity of the Kayser model (Figure 1). Although the test showed that the model is not linear (which is to be expected of a saliency model in the first place), we might be able to describe the model as nonlinear operations applied to the sum of parts, which could give us a new perspective on bottom-up auditory saliency.
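
The saliency-vs-EEG comparison in the third bullet can be sketched as follows. This is a minimal illustration, not the analysis we actually ran: the short-time-power windowing, the nearest-sample resampling, and the use of Pearson correlation are all assumptions.

```python
import numpy as np

def envelope_power(eeg, fs, win_s=0.1):
    """Short-time power of an EEG channel in non-overlapping windows."""
    win = int(fs * win_s)
    n = len(eeg) // win
    return np.array([np.mean(eeg[i * win:(i + 1) * win] ** 2) for i in range(n)])

def saliency_eeg_correlation(saliency_trace, eeg, fs, win_s=0.1):
    """Pearson correlation between a model saliency time series and
    short-time EEG power, after resampling both to a common frame rate."""
    p = envelope_power(eeg, fs, win_s)
    # crude nearest-sample resampling of the saliency trace to len(p) frames
    idx = np.linspace(0, len(saliency_trace) - 1, len(p)).astype(int)
    s = np.asarray(saliency_trace)[idx]
    return np.corrcoef(s, p)[0, 1]
```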

test result of (non)linearity in Kayser's model

Figure 1: Test of the linearity of the Kayser model. Top left: Kayser saliency map of a target embedded in a background. Top right: saliency map of the background only. Bottom left: saliency map of the target only. Bottom right: the residual of the total saliency map (top left) minus the saliency maps of the individual stimuli (top right and bottom left). If the model were linear, all pixels would be blue (all zero values). The resulting figure shows that this is not the case.
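
The superposition test behind Figure 1 can be sketched generically. Here `saliency_map` is a placeholder for the Kayser model (which is not reimplemented here), and the squaring toy model is purely illustrative.

```python
import numpy as np

def linearity_residual(saliency_map, target, background):
    """Superposition test: if the model were linear, the map of the mixture
    would equal the sum of the maps of the parts, so the residual
    (bottom-right panel of Figure 1) would be all zeros."""
    mixture = target + background
    return saliency_map(mixture) - saliency_map(target) - saliency_map(background)

# toy stand-in for the Kayser model: a squaring nonlinearity
toy_model = lambda x: x ** 2
res = linearity_residual(toy_model, np.array([1.0, 2.0]), np.array([3.0, 0.0]))
# a nonzero residual means the model is not linear
```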

Experiment 1:
  • Stimuli: 105 pairs of sound segments, each 3 seconds long (20% of them were shorter, at 2.5 seconds). In each pair ('A' and 'B'), the background sound is the same while the 'target' is different. Each background is generated by combining 20 different sounds from the BBC sound effects library (all equalized in RMS level) with white noise at a matching RMS level. There are 15 different targets drawn from 3 categories (animal (mostly pitched), nature (wide-spectrum), and industry (noise-like)), chosen for their highly attention-drawing nature. In each sound mixture, the target starts at a random time after 0.75 seconds and finishes before the background ends. For each pair of stimuli, sound A plays first, followed by 1 second of silence, then sound B. After each pair, 4 seconds of silence was given for the subject to make and indicate a decision.
  • Task: Each subject was informed that their EEG data would be collected while they participated in a listening test. Subjects were asked to indicate, using two keys on a keyboard, whether the two sounds in a stimulus pair were of the same or different lengths. The purpose was to have subjects listen to the sounds with their top-down attention deliberately drawn away from the targets, so that we could collect responses to the bottom-up saliency of the stimuli.
  • Experiment setup: The experiment was conducted in a quiet staircase outside the labs. A laptop running the BCI2000 interface was connected to the electrode cap on the subject's head and recorded brain responses.
  • Note: The data collected from two subjects had many artifacts (such as eye blinks). Outside noises (such as fireworks) also affected the EEG data. We ended up analyzing one subject's data, whose results are reported in Mounya's document (see https://neuromorphs.net/nm/wiki/2010/att10/saliency).
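
The stimulus construction for Experiment 1 could be sketched roughly as follows. The sample rate, target RMS level, and function names are illustrative assumptions; only the RMS equalization, the matched-RMS white noise, and the random onset after 0.75 s come from the description above.

```python
import numpy as np

FS = 44100  # assumed sample rate

def equalize_rms(x, target_rms=0.05):
    """Scale a signal to a common RMS level."""
    return x * (target_rms / np.sqrt(np.mean(x ** 2)))

def make_stimulus(background_sounds, target, dur_s=3.0, rng=None):
    """Mix RMS-equalized background sounds plus white noise at matching RMS,
    then embed the target at a random onset after 0.75 s, ending before
    the background does."""
    if rng is None:
        rng = np.random.default_rng()
    n = int(dur_s * FS)
    mix = sum(equalize_rms(s[:n]) for s in background_sounds)
    mix = mix + equalize_rms(rng.standard_normal(n))  # white noise bed
    target = equalize_rms(target)
    onset = rng.uniform(0.75, dur_s - len(target) / FS)
    start = int(onset * FS)
    mix[start:start + len(target)] += target
    return mix
```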
Experiment 2:

With Julio joining the team during the last week, we decided to simplify the experiment since our first one was too complicated with too many variables.

  • Stimuli: 6 targets (1-to-2-second sounds: a baby crying, a baby laughing, a baby saying 'daddy', a dentist's drill, glass breaking, and an alarm) were embedded in a stream of white noise, with a 1-to-2-second gap between targets. Each target was repeated 100 times over the course of the experiment. (Listen to segment.wav in the attachment for an example.)
  • Task: Subjects were to keep a running count of the number of alarms heard so far. This was a silent, in-the-head task, requiring no key-pressing movements.
  • Experiment setup: Same as in the first experiment.
  • Note: A preliminary comparison of the spectrograms of the EEG responses with the saliency map results seems to indicate a positive correlation between them. This is promising, though more data are required to confirm it.
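
The Experiment 2 stream could be generated along these lines; the sample rate, noise level, and randomized target order are assumptions, and returning the onset times is a convenience for later EEG alignment rather than something described above.

```python
import numpy as np

FS = 16000  # assumed sample rate

def build_stream(targets, reps=100, gap_range=(1.0, 2.0), noise_rms=0.02, rng=None):
    """Concatenate the targets in random order, separated by 1-to-2-second
    gaps, over a continuous bed of white noise. Returns the stream and the
    target onset times in seconds."""
    if rng is None:
        rng = np.random.default_rng()
    order = rng.permutation(np.repeat(np.arange(len(targets)), reps))
    pieces, onsets, t = [], [], 0.0
    for i in order:
        gap = rng.uniform(*gap_range)
        pieces.append(np.zeros(int(gap * FS)))  # silent gap (noise added below)
        t += gap
        onsets.append(t)
        pieces.append(targets[i])
        t += len(targets[i]) / FS
    stream = np.concatenate(pieces)
    return stream + noise_rms * rng.standard_normal(len(stream)), onsets
```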
Example Segment (from 20 to 32 seconds)
  • Spectrogram:

spectrogram of example segment

Figure 2: Spectrogram of the given segment. Blue indicates low energy, followed by green and yellow, with red the highest. We can see four events in this example.
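
A spectrogram like Figure 2 can be computed with a plain short-time Fourier transform; the window size, hop, and dB floor below are assumptions, not the exact settings used for the figure.

```python
import numpy as np

def segment_spectrogram(x, fs, nperseg=1024):
    """Log-power spectrogram via a Hann-windowed STFT with 50% overlap.
    Returns frequency bins (Hz), frame times (s), and power in dB."""
    hop = nperseg // 2
    win = np.hanning(nperseg)
    n_frames = 1 + (len(x) - nperseg) // hop
    frames = np.stack([x[i * hop:i * hop + nperseg] * win for i in range(n_frames)])
    spec = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    freqs = np.fft.rfftfreq(nperseg, d=1.0 / fs)
    times = (np.arange(n_frames) * hop + nperseg / 2) / fs
    return freqs, times, 10 * np.log10(spec.T + 1e-12)  # floor avoids log(0)
```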

  • Saliency map from Kayser model

saliency map of the example segment from Kayser's model

Figure 3: The saliency map from Kayser's model for the example segment. It is quite similar to the spectrogram in Figure 2.

  • Saliency prediction output from Andrew Schwartz's model

saliency prediction output from Andrew Schwartz's model (based on cortical responses)

Figure 4: The output from Andrew Schwartz's cortical-response-based model. The black lines represent how salient the sound event is at each point in time.

  • EEG response during the example segment's playing

new spectrogram of EEG data corresponding to the duration of example segment playing

Figure 5: (Top) The waveform of the example segment. (Second from top) The spectrogram of the EEG response averaged across channels during the example segment (the recording delay has been accounted for). Yellow means high energy and blue low. There appears to be a high level of activity some time after the start of each target. (Bottom four) The EEG responses of individual channels while the example segment played. T7 (near the auditory cortex) seems to show the quickest and most consistent responses to the targets in the stimuli.

  • Location of EEG channels for the experiment

position of electrodes used in EEG experiments

Figure 6: The positions of the electrodes used in the experiment. Note that T7 is closest to the auditory cortex, which may explain why its response is the most reflective of the target playing.

( This is the end of Song Hui's edit)

WORD OF CAUTION: (added by Mounya)
Saliency is generally thought of as an automatic bottom-up process that can attract our attention involuntarily. However, results from the visual system show that the neural response to salient distracters that are irrelevant to the task at hand can be suppressed [1].

Our initial results from EEG experiment #1, with auditory backgrounds and embedded natural distracters, seem to suggest the same effect. Specifically, the power ratio between the EEG responses to distracters vs. backgrounds is consistently biased towards suppression of the distracters' response. Is this result replicable? Is it consistent across different types of tasks (other than duration judgment)? The data are very preliminary so far.
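
The distracter-vs-background power ratio mentioned above can be sketched as follows; the window length and the exact definition of "background" samples are assumptions, not the analysis in Mounya's document.

```python
import numpy as np

def distracter_background_power_ratio(eeg, fs, onsets, win_s=0.5):
    """Mean EEG power in windows following distracter onsets, divided by
    mean power over the remaining (background-only) samples. A ratio
    below 1 would be consistent with suppression of the distracter
    response (the window length is an assumption)."""
    win = int(win_s * fs)
    mask = np.zeros(len(eeg), dtype=bool)
    for t in onsets:
        start = int(t * fs)
        mask[start:start + win] = True
    return np.mean(eeg[mask] ** 2) / np.mean(eeg[~mask] ** 2)
```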

[1] C. Mevorach, J. Hodsoll, H. Allen, L. Shalev, and G. Humphreys, "Ignoring the Elephant in the Room: A Neural Circuit to Downregulate Salience," J. Neurosci., vol. 30, pp. 6072-6079, Apr. 2010.