Auditory Decision Variables: unpredictability and decision threshold

Coordinator: Yves

Project idea


Humans and animals can rapidly and effortlessly recognize a wide range of acoustic stimuli. To achieve this accurate classification, the auditory system must process rapid variations in the acoustics of sounds. These processing capabilities apply straightforwardly to many discrete stimuli, which are characterized by a stereotypical sequence of spectrotemporal features. However, a much wider class of stimuli can be recognized, which are not as strictly characterized. A waterfall, for example, has a spectrogram that does not contain a single sequence of spectrotemporal features but is instead composed of a mixture of such features, whose properties and occurrence times are defined only statistically.

Stimuli that are defined by their statistical composition of spectrotemporal features have been termed "acoustic textures", in analogy to visual textures such as fur, marble, and grass. Previous results using natural textures suggest that, in order to recognise such sounds, the auditory system summarizes them, or at least produces a condensed representation, by combining the statistics of lower-level activity (McDermott et al. [2013]).

Statistics and accumulation of sensory evidence

In this study, we focused on the dynamics of sound statistics integration. We investigated how subjects detect a change in the statistics of an artificial texture. Recent work established that a neural correlate of the accumulation of sensory evidence can be found in the low-pass filtered EEG signal (< 6 Hz) in the Centro-Parietal Posterior area (CPP) (O’Connell et al., 2012). We think this signal is a relevant tool for investigating how the brain gathers information about a sound.

Stimulus and previous psychophysics results

Instead of using natural sounds, we chose a more reduced approach. The stimulus was generated by constraining the marginal frequency distribution of a cloud of tones. The present paradigm is close to the profile analysis used in earlier studies, because subjects are asked to assess the spectral shape of the stimuli. However, we introduce a dynamic aspect to the task: subjects have to constantly monitor and integrate the statistics of a sound in order to report a change occurring at a random time.

We used a tone cloud, i.e. a train of short overlapping pure tones spanning a range of 2.2 octaves (400 to 1840 Hz, divided into 8 frequency bins), governed by a marginal distribution of occurrence probability (Figure 1A). This distribution differed from trial to trial. The duration of each individual tone was 30 ms, and at each chord the number of tones per frequency bin depended on the value drawn from the marginal distribution for that frequency bin. The marginal distribution was then altered at a random time during the stimulus presentation (0 to 8 seconds after the onset of the first sound), resulting in the appearance of a change. The change consisted of an increment in the probability of occurrence of tones in two of the eight frequency bins (Figure 2). Subjects were instructed that a change would occur on every trial and that their task was to press the response button placed in front of them as soon as they heard the change, and no later than 2 seconds after its onset. The outcome of each trial was either a False Alarm (button press before the change), a Hit (button press during the 2 seconds of sound after the change) or a Miss (no button press). Performance for each condition was computed as Hits / (Hits + Misses).

Figure 1: A: marginal distribution of the tone frequencies. B: Cochleogram of the tone cloud with the embedded change at 5 seconds (white dashed line).

Figure 2: Marginal distribution used after the change. Two possible increments of different size are shown, in orange and red.
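
The statistical structure of the stimulus described above can be illustrated with a minimal sketch. It only draws the tone frequencies of each chord and does not synthesize audio. Several choices are assumptions made for illustration and are not taken from the actual stimulus code: the bins are equal in octaves, one chord starts every 30 ms, each chord contains a fixed number of tones (`tones_per_chord`), and the increment is followed by a renormalization of the distribution. All function names are hypothetical.

```python
import random

N_BINS = 8
F_LOW, F_HIGH = 400.0, 1840.0  # Hz, from the text (about 2.2 octaves)
CHORD_DUR = 0.030              # s; assumption: one chord per 30 ms tone duration


def bin_edges(n_bins=N_BINS, f_low=F_LOW, f_high=F_HIGH):
    """Log-spaced bin edges (assumption: bins are equal in octaves)."""
    ratio = f_high / f_low
    return [f_low * ratio ** (i / n_bins) for i in range(n_bins + 1)]


def normalize(weights):
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def apply_change(marginal, changed_bins, increment):
    """Raise the occurrence probability of the chosen bins, then renormalize."""
    raised = [p + increment if i in changed_bins else p
              for i, p in enumerate(marginal)]
    return normalize(raised)


def tone_cloud(marginal, changed_bins, increment, change_time, total_dur,
               tones_per_chord=4, rng=None):
    """Return a list of chords; each chord is a list of tone frequencies in Hz."""
    rng = rng or random.Random()
    edges = bin_edges(len(marginal))
    post = apply_change(marginal, changed_bins, increment)
    n_chords = int(round(total_dur / CHORD_DUR))
    chords = []
    for k in range(n_chords):
        # Chords starting before the change use the initial marginal.
        probs = marginal if k * CHORD_DUR < change_time else post
        bins = rng.choices(range(len(probs)), weights=probs, k=tones_per_chord)
        # Draw each tone log-uniformly within its frequency bin.
        chords.append([edges[b] * (edges[b + 1] / edges[b]) ** rng.random()
                       for b in bins])
    return chords
```

For example, `tone_cloud(normalize([1] * 8), {2, 5}, 0.2, change_time=5.0, total_dur=7.0)` would describe a trial with a flat initial distribution and a change at 5 seconds followed by 2 seconds of post-change sound; the bins and increment size here are arbitrary.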

Although we found several parameters that influence subject performance, here we only present the contextual effect of the duration of the first part of the sound (i.e. the timing of the change). Figure 3 shows that both the size and the timing of the change strongly influence subject performance. Early changes (change times < 2 s) were more difficult for subjects to detect. This effect is mirrored by reaction times (data not shown here): early changes were associated with longer reaction times. We hypothesize that this effect could be due to uncertainty about the sound statistics in the first few seconds of the stimulus presentation.

Figure 3: Performance with respect to the timing of the change (n = 9 subjects). Performance for early changes is worse than for late changes.
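
As an illustration of how the trial outcomes and the performance measure relate to the timing of the change, here is a minimal sketch. The function names and the 2-second bin width are hypothetical, and treating a press later than 2 seconds after the change as a Miss is an assumption (the text only defines a Miss as the absence of a press).

```python
def classify_trial(change_time, response_time, window=2.0):
    """Label a trial; response_time is None when no button press occurred."""
    if response_time is None:
        return "miss"
    if response_time < change_time:
        return "false_alarm"
    if response_time <= change_time + window:
        return "hit"
    return "miss"  # assumption: a press after the response window is a miss


def hit_rate_by_timing(trials, bin_width=2.0):
    """Hits / (Hits + Misses) per change-time bin; false alarms are excluded.

    `trials` is an iterable of (change_time, response_time) pairs in seconds.
    The returned dict maps the start of each change-time bin to its hit rate.
    """
    counts = {}
    for change_time, response_time in trials:
        outcome = classify_trial(change_time, response_time)
        if outcome == "false_alarm":
            continue
        start = int(change_time // bin_width) * bin_width
        hits, total = counts.get(start, (0, 0))
        counts[start] = (hits + (outcome == "hit"), total + 1)
    return {start: hits / total
            for start, (hits, total) in sorted(counts.items())}
```

False alarms are left out of the ratio because the performance measure in the text only involves Hits and Misses.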

This task seems appropriate for studying the accumulation of sensory evidence, since subjects have to integrate local sound statistics in order to detect a potential change. We think the effect depicted in Figure 3 could be used to investigate how uncertainty about the statistics of the initial sound affects the integration of change-related sensory evidence.

Protocol, apparatus, signal analysis

EEG recordings were performed with a 64-channel system (BrainVision) at a sampling rate of 1 kHz. Changes occurred after 0-8 seconds of sound (according to a modified Poisson distribution), and subjects had 2 seconds to report the change by pressing a button. The outcome of each trial was either Hit, Miss, or False Alarm (FA). Three difficulty levels were used, each 120 times (total number of trials for each subject = 480, ~1 hour of recording). At the end of the recording, the sound was stopped and subjects were asked to press the button at random times in order to isolate a purely motor component. EEG recordings were down-sampled to 62.5 Hz and then low-pass filtered at 6 Hz. Hit and False Alarm trials were locked to the response time. We then subtracted the individual average motor component for each subject before averaging across subjects.
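
The response-locking and motor-subtraction steps can be sketched as follows. This is a minimal illustration for a single channel, not the analysis code that was actually used: it assumes the signal has already been down-sampled to 62.5 Hz and low-pass filtered at 6 Hz, and all function names are hypothetical. Plain Python lists keep the sketch dependency-free; an array library would be the usual choice in practice.

```python
FS = 62.5  # Hz, sampling rate after down-sampling (from the text)


def to_samples(seconds, fs=FS):
    """Convert a duration or time in seconds to a whole number of samples."""
    return int(round(seconds * fs))


def response_locked_epochs(signal, response_samples, pre, post):
    """Cut `pre` samples before and `post` samples after each response.

    Responses too close to the edges of the recording are skipped so that
    all epochs have the same length.
    """
    epochs = []
    for r in response_samples:
        if r - pre >= 0 and r + post <= len(signal):
            epochs.append(signal[r - pre:r + post])
    return epochs


def average(epochs):
    """Sample-by-sample mean across equally long epochs."""
    n = len(epochs)
    return [sum(samples) / n for samples in zip(*epochs)]


def subtract_motor(task_epochs, motor_epochs):
    """Subtract a subject's average motor-only response from the task average."""
    task_avg = average(task_epochs)
    motor_avg = average(motor_epochs)
    return [t - m for t, m in zip(task_avg, motor_avg)]
```

Here `motor_epochs` would come from the button presses recorded at the end of the session without sound, cut with the same `pre` and `post` values as the task epochs.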


Is there a neural correlate of sensory evidence accumulation for sound textures?

Although the tone cloud we used is quite reminiscent of the random dot cloud used in visual tasks, we wanted to show that accumulation of sensory evidence is also necessary for performing this task. Figure 4 shows the topographic display of the EEG recordings for one subject. Only Hit trials were kept, and all trials were response-locked. One can see an increase of activity in the CPP region starting up to 500 ms before the button press. This activity peaks at the time of the decision and then decays. These characteristics match the features one would expect from a decision variable signal.

Figure 4: Topographic plot of EEG recordings. Hit trials are clustered by the difficulty of each trial (~40-100 trials in each condition).

To visualize the time course of this CPP activity, we selected 6 electrodes centered on the CPP area. Figure 5 displays the average across subjects for the different difficulty levels. Unfortunately, the low number of subjects (n = 3) precludes any statistical analysis. However, the slope appears steeper for the easiest condition (red trace) than for the two more difficult conditions. This may reflect a higher build-up rate for easy-to-detect changes.

Figure 5: Average of the 6 CPP electrodes (n = 3 subjects) as a function of trial difficulty. Motor response was removed from the individual trace of each subject.
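
One possible way to quantify the build-up rate, once more subjects are available, is a least-squares slope fitted to the response-locked CPP trace in a window preceding the button press. The sketch below is only an assumption about how this could be done; the function name and the choice of window are hypothetical and no such fit was performed on the data shown here.

```python
def buildup_slope(times, values, t_start, t_end):
    """Least-squares slope of `values` against `times` within [t_start, t_end].

    `times` are in seconds relative to the button press (negative = before),
    so the result is in signal units per second.
    """
    points = [(t, v) for t, v in zip(times, values) if t_start <= t <= t_end]
    if len(points) < 2:
        raise ValueError("need at least two samples inside the window")
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_v = sum(v for _, v in points) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in points)
    den = sum((t - mean_t) ** 2 for t, _ in points)
    return num / den
```

A window such as `t_start=-0.5, t_end=0.0` would cover the 500 ms before the response; comparing the slopes obtained for each difficulty level would then test whether easier changes lead to a faster build-up.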

Are false alarms based on sensory evidence?

Subjects indicated after the experiments that their false alarms were based on a sensory feature of the stimulus. One can wonder whether sensory features arising from the stochastic nature of the stimulus could elicit an increase in CPP activity just before the response in false alarm trials. Figure 6 shows that false alarm trials, like Hit trials, are preceded by an increase in CPP activity. Moreover, we response-locked the Miss trials (which contain no actual button press) to the median reaction time across all trials. CPP activity stays low in this condition, suggesting that misses are associated with an absence of response at the level of the CPP electrodes.

Figure 6: False alarms are correlated with an increased CPP activity.

Future plans

  • Use DSS analysis to denoise the signal
  • Increase subject number for determining any influence of the timing of change on the CPP activity

Related work (and papers)

About sound textures:

  • Sound Texture Perception via Statistics of the Auditory Periphery: Evidence from Sound Synthesis, Josh H. McDermott and Eero P. Simoncelli, Neuron 71, 926–940, September 8, 2011 (attached)

About decision variable and accumulation of sensory evidence:

  • Internal and External Influences on the Rate of Sensory Evidence Accumulation in the Human Brain, Simon P. Kelly and Redmond G. O’Connell, The Journal of Neuroscience, December 11, 2013, 33(50):19434–19441 (attached)
  • A supramodal accumulation-to-bound signal that determines perceptual decisions in humans, Redmond G O’Connell, Paul M Dockree and Simon P Kelly, Nature Neuroscience 15(12), December 2012, doi:10.1038/nn.3248 (attached)

About Drift Diffusion Model (DDM), amongst many others:

  • The speed and accuracy of perceptual decisions in a random-tone pitch task, Mulder MJ, Keuken MC, van Maanen L, Boekel W, Forstmann BU and Wagenmakers EJ, Atten Percept Psychophys. 2013 Jul;75(5):1048-58. doi: 10.3758/s13414-013-0447-8