In-Ear EEG Sensors

Recording and analysis of electroencephalography (EEG) from the scalp is a well-established technology, used in many clinical and research applications including the assessment of human hearing and cognition. However, in spite of promising results, its application in everyday life is still limited because the equipment is visible and must be mounted by a professional. An alternative approach is to record the EEG from the ear canal, which would enable reliable insertion by the wearer and provide an almost invisible device. Although this approach seems favorable for everyday-life applications such as steering a hearing aid based on the user's intentions, research is needed to explore whether meaningful information can be retrieved from the ear canal.

We know that EarEEG is limited with respect to the number of sensors available to record the traces and their spatial distribution over the scalp. Hence, the research questions for this project include:
1) What kind of signal do we get, and how does it compare to scalp EEG (what information is present, and what is missing)?
2) How does the reference location influence the retrieved information in the signals (in ear, mastoid, contralateral, temporal etc.)?
3) Can decoding algorithms developed for scalp EEG be applied to the EarEEG signals?
4) Can EarEEG be used to identify the location of the sound source?


People

Carina Graversen (group leader), Sahar Akram, Emina Alickovic, Alain de Cheveigné, Francisco Cervantes Constantino, Jens Hjortkjær, Shihab Shamma, and Jie Jack Zhang.


Data

Several datasets are available for analysis, and EEG equipment (both scalp and EarEEG) is available during the workshop to record more if needed. All available data are stored on the external hard disk labeled "Malcolm". The configurations of the datasets are:


Oddball from Oticon (Carina Graversen):
Data is stored in the directory:
Raw EEG data: "\LaCie\Telluride2015\CarinaGraversen\Oddball_ScalpEarEEG\RawEEG"
Mat EEG data: "\LaCie\Telluride2015\CarinaGraversen\Oddball_ScalpEarEEG\MatEEG"

Data were recorded from 8 normal-hearing subjects. Each recording consisted of 5 blocks, each containing 8 one-minute runs. The 8 runs per block were randomized with respect to balanced attention (4 attending the left ear, 4 attending the right ear). Stimuli were applied in a dichotic paradigm through the direct audio input of the hearing aids, with randomized frequencies (250 vs. 1000 Hz) and repetition rates (1.4 vs. 1.8 Hz), while deviants (1 semitone) in both streams were presented with an occurrence of 15%. Before each run, the subject was cued on which stream to attend and instructed to press a button whenever a deviant occurred in that stream.

The stimulus configuration can be retrieved from a marker at the beginning of each run, and the triggers are defined as follows (a minimal lookup sketch follows the list):
71) Attend left. Left = 1000 Hz, 1.4 Hz repetition rate. Right = 250 Hz, 1.8 Hz repetition rate. Left std = 92; Left dev = 94; Right std = 91; Right dev = 93.
72) Attend left. Left = 1000 Hz, 1.8 Hz repetition rate. Right = 250 Hz, 1.4 Hz repetition rate. Left std = 96; Left dev = 98; Right std = 95; Right dev = 97.
73) Attend left. Left = 250 Hz, 1.8 Hz repetition rate. Right = 1000 Hz, 1.4 Hz repetition rate. Left std = 99; Left dev = 101; Right std = 100; Right dev = 102.
74) Attend left. Left = 250 Hz, 1.4 Hz repetition rate. Right = 1000 Hz, 1.8 Hz repetition rate. Left std = 103; Left dev = 104; Right std = 105; Right dev = 106.
75) Attend right. Left = 1000 Hz, 1.4 Hz repetition rate. Right = 250 Hz, 1.8 Hz repetition rate. Left std = 108; Left dev = 110; Right std = 107; Right dev = 109.
76) Attend right. Left = 1000 Hz, 1.8 Hz repetition rate. Right = 250 Hz, 1.4 Hz repetition rate. Left std = 112; Left dev = 114; Right std = 111; Right dev = 113.
77) Attend right. Left = 250 Hz, 1.8 Hz repetition rate. Right = 1000 Hz, 1.4 Hz repetition rate. Left std = 115; Left dev = 117; Right std = 116; Right dev = 118.
78) Attend right. Left = 250 Hz, 1.4 Hz repetition rate. Right = 1000 Hz, 1.8 Hz repetition rate. Left std = 119; Left dev = 121; Right std = 120; Right dev = 122.
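For convenience during analysis, the marker table above can be encoded as a small lookup structure. The sketch below (Python) assumes the markers arrive as plain integer event codes; the function and variable names are illustrative only.

```python
# Minimal sketch: mapping of run markers (71-78) and per-stream event codes
# to experimental conditions, as listed above. Assumes integer event codes.

RUN_CONFIG = {
    # run marker: (attended side, left tone Hz, left rate Hz, right tone Hz, right rate Hz)
    71: ("left",  1000, 1.4,  250, 1.8),
    72: ("left",  1000, 1.8,  250, 1.4),
    73: ("left",   250, 1.8, 1000, 1.4),
    74: ("left",   250, 1.4, 1000, 1.8),
    75: ("right", 1000, 1.4,  250, 1.8),
    76: ("right", 1000, 1.8,  250, 1.4),
    77: ("right",  250, 1.8, 1000, 1.4),
    78: ("right",  250, 1.4, 1000, 1.8),
}

EVENT_CODES = {
    # run marker: {event code: (stream side, standard/deviant)}
    71: {92: ("left", "std"),  94: ("left", "dev"),  91: ("right", "std"),  93: ("right", "dev")},
    72: {96: ("left", "std"),  98: ("left", "dev"),  95: ("right", "std"),  97: ("right", "dev")},
    73: {99: ("left", "std"), 101: ("left", "dev"), 100: ("right", "std"), 102: ("right", "dev")},
    74: {103: ("left", "std"), 104: ("left", "dev"), 105: ("right", "std"), 106: ("right", "dev")},
    75: {108: ("left", "std"), 110: ("left", "dev"), 107: ("right", "std"), 109: ("right", "dev")},
    76: {112: ("left", "std"), 114: ("left", "dev"), 111: ("right", "std"), 113: ("right", "dev")},
    77: {115: ("left", "std"), 117: ("left", "dev"), 116: ("right", "std"), 118: ("right", "dev")},
    78: {119: ("left", "std"), 121: ("left", "dev"), 120: ("right", "std"), 122: ("right", "dev")},
}

def describe_event(run_marker, event_code):
    """Return (attended side, stream side, std/dev) for one event in a run."""
    attended = RUN_CONFIG[run_marker][0]
    side, kind = EVENT_CODES[run_marker][event_code]
    return attended, side, kind
```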

While listeners performed the task described above, EEG was recorded at a 2048 Hz sampling frequency from a 64-channel scalp array (Biosemi) plus two EEG earpiece systems comprising three electrodes each, inserted into the left and right ear canals.


Audiobook with competing speakers from Oticon (Carina Graversen):
Data is stored in the directory:
Raw EEG data: "\LaCie\Telluride2015\CarinaGraversen\Audiobook_ScalpEarEEG\RawEEG"
Mat EEG data: "\LaCie\Telluride2015\CarinaGraversen\Audiobook_ScalpEarEEG\MatEEG"
Audio files: "\LaCie\Telluride2015\CarinaGraversen\Audiobook_ScalpEarEEG\Audio"

Data were recorded from 5 normal-hearing subjects. Each subject listened for 60 minutes to a speech mixture made from a 30-minute audiobook story narrated by a male speaker, overlapping in time with a completely different narrative of the same duration read by a female speaker. After the first minute of this duo/mixture talk, the exact same 1-minute fragment was repeated. After the repetition, a new duo fragment (the continuation of the two stories) was presented during the third minute. The fourth minute was, in the same way, a repetition of the fragment presented during the third minute; this rule was repeated until an hour of recordings was completed. Subjects continued the experiment via a button press. The purpose of the repetitions was to allow the listener to switch the focus of their attention and select either the female or the male speaker. Listeners were instructed which speaker to attend to in advance of each of the button presses that triggered the next mixture fragment.

While listeners performed the task described above, EEG was recorded at a 2048 Hz sampling frequency from a 64-channel scalp array (Biosemi) plus two EEG earpiece systems comprising three electrodes each, inserted into the left and right ear canals. Offline, recordings were band-pass filtered between 1 and 25 Hz with an order-2 Butterworth filter applied in the forward and reverse direction, and down-sampled to 256 samples per second (this only applies to the .mat files).
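A minimal sketch of this offline step, assuming the data are available as a (samples x channels) NumPy array; variable names are illustrative and not part of the original pipeline.

```python
# Sketch of the offline preprocessing described above: order-2 Butterworth
# band-pass (1-25 Hz) applied forward and reverse (zero phase), then
# downsampling from 2048 Hz to 256 Hz. The array here is a placeholder.
import numpy as np
from scipy.signal import butter, filtfilt, resample_poly

fs_in, fs_out = 2048, 256
eeg = np.random.randn(60 * fs_in, 70)        # placeholder: (samples, channels)

b, a = butter(2, [1.0, 25.0], btype='bandpass', fs=fs_in)
eeg_filt = filtfilt(b, a, eeg, axis=0)                                 # forward-reverse filtering
eeg_256 = resample_poly(eeg_filt, up=1, down=fs_in // fs_out, axis=0)  # 2048 -> 256 Hz
```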


Audio speech (Maarten de Vos - ask Sahar in case of questions during the workshop):
Recordings from two subjects, with 2 male speakers, 48 trials of 1 minute each, attention directed to the second speaker for all trials, 64 Hz sampling frequency, 16 channels around the ear (cGrid).


Audiobook with shift in head position from Telluride workshop (Carina Graversen):
Carina adds this...


Audiobook with four locations from Telluride workshop (Carina Graversen):
Carina adds this...


Methods

Below is a short description of the data analysis applied to answer the research questions:


1) Comparison between scalp EEG and EarEEG:
Initially, the degree of linear dependence between each of the 6 earpiece EEG channels and each individual scalp channel was quantified via Pearson correlation coefficients, computed separately for each of the five subjects (the resulting matrices are shown in the Results section).
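A minimal sketch of this correlation step, assuming the recordings for one subject are available as (samples x channels) NumPy arrays; the array names and sizes are illustrative.

```python
# Sketch: Pearson correlation between each of the 6 earpiece channels and
# each of the 64 scalp channels, for one subject. Arrays are (samples, channels).
import numpy as np

scalp = np.random.randn(100000, 64)      # placeholder scalp EEG
earpiece = np.random.randn(100000, 6)    # placeholder earpiece EEG (L1-3, R1-3)

# np.corrcoef expects variables in rows; stack both blocks and slice the cross-block.
r_full = np.corrcoef(np.hstack([earpiece, scalp]).T)
r_ear_vs_scalp = r_full[:6, 6:]          # shape (6, 64), entries are Pearson's r
```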

Denoising techniques were then applied, since EEG signals routinely contain information that is redundant across sensors and irrelevant (e.g. cardiac artifacts) to the target neural processing of interest. A popular method is independent component analysis (ICA), applied here with a FastICA implementation (Hyvärinen ref), from which twenty independent components were obtained as additional reference channels (the left and right mastoid recordings were already available). Independent components were selected as synthetic references when they contained the largest proportion of broadband (0-75 Hz) power (even though frequencies above 25 Hz had been discounted previously). This selection was done by finding the independent component with the most power in each spectral bin (of fixed linear size, determined by dataset length), and then computing the histogram of how often each independent component outnumbered all others in spectral power across the frequency range, thus extracting ICs containing relatively high broadband power. As an example, the spectra of independent components at both extremes (relatively high/low broadband power), extracted from 10 minutes of data from subject 08, are presented below in blue/red respectively (left), and their time series are also shown (right). Note that an IC to be used as a reference later on should contain artifacts unrelated to the brain activity of interest, although some brain activity may still remain in the ICs retained by this criterion.
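A minimal sketch of this component-selection idea, assuming a (samples x channels) array at 256 Hz; the number of reference ICs kept and the spectral estimation settings are illustrative choices, not necessarily those used here.

```python
# Sketch: FastICA followed by selection of ICs with relatively high broadband
# (0-75 Hz) power, by counting how often each IC has the most power per bin.
import numpy as np
from scipy.signal import welch
from sklearn.decomposition import FastICA

fs = 256
eeg = np.random.randn(10 * 60 * fs, 70)           # placeholder (samples, channels)

ica = FastICA(n_components=20, random_state=0)
sources = ica.fit_transform(eeg)                  # (samples, 20) independent components

# Power spectra on a fixed linear frequency grid.
freqs, psd = welch(sources, fs=fs, nperseg=4 * fs, axis=0)   # psd: (bins, 20)
band = (freqs >= 0) & (freqs <= 75)

# For each bin, which IC carries the most power? Then count wins per IC.
winners = np.argmax(psd[band], axis=1)
wins_per_ic = np.bincount(winners, minlength=20)

# Keep the ICs that win most often as synthetic broadband-noise references.
n_refs = 3                                        # illustrative choice
ref_ics = np.argsort(wins_per_ic)[::-1][:n_refs]
synthetic_refs = sources[:, ref_ics]
```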


Also, note that since spectral bins are linearly spaced, and given the 1/f power spectrum of typical EEG fluctuations, this approach favors unusual components that consistently show extreme power at the higher end of the frequency range. The approach has the advantage that component selection is 'blind', although verification of the spectra of the components to be used as references is recommended. Environmental sources arising from unwanted electrical signals unrelated to the brain activity of interest were reduced by time-shifted principal component analysis (TS-PCA). This technique discards environmental sources whose convolutive mixing properties at the reference sensors of the EEG system differ from the convolutive mixing properties of the sources of interest at the data sensors of the array. Provided that the reference sensors record noise and no primary sources of interest, this mismatch is exploited as a basis for rejection: projections of the brain-sensor recordings that do match the convolutive properties of the reference-sensor recordings are removed via PCA (TS-PCA reference). The synthetic reference signals obtained through FastICA, plus the two mastoid recordings, operated here as reference electrodes for TS-PCA purposes. TS-PCA parameters were set to N=12 taps (at 256 Hz sampling frequency), and regressor principal components whose variance amounted to less than 10^-6 times that of the first component were discarded as numerically negligible. Signal phase delays introduced by the procedure were corrected. An example is shown below for channel Cz, which typically conveys auditory responses in speech experiments, before (blue) and after (red) this environmental denoising, for the same data extract as in the previous figure.
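The sketch below is a simplified re-implementation of the TS-PCA idea under the stated parameters (12 taps, 10^-6 relative-variance cutoff); it is for illustration only, omits the phase-delay correction, and is not the toolbox code actually used.

```python
# Sketch of TS-PCA denoising: regress the data channels on time-shifted copies
# of the reference channels (after PCA of the shifted references) and subtract
# the fitted projection. Simplified; ignores the phase-delay correction step.
import numpy as np

def tspca_denoise(data, refs, n_taps=12, var_ratio=1e-6):
    """data: (samples, channels); refs: (samples, ref_channels)."""
    # Build time-shifted copies of the references (shifts 0 .. n_taps-1).
    shifted = np.hstack([np.roll(refs, k, axis=0) for k in range(n_taps)])
    shifted -= shifted.mean(axis=0)

    # PCA of the shifted references; drop components with negligible variance.
    u, s, _ = np.linalg.svd(shifted, full_matrices=False)
    keep = (s ** 2) >= var_ratio * (s[0] ** 2)
    basis = u[:, keep]                       # orthonormal regressors (samples, k)

    # Least-squares projection of each data channel onto the reference subspace,
    # then subtract it to obtain the denoised data.
    centered = data - data.mean(axis=0)
    projection = basis @ (basis.T @ centered)
    return centered - projection
```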


Although the data appear cleaner, a few sensors with clearly outlying readings were still present (not shown here). An exclusion criterion was then applied per subject dataset: for all K sensors in the system, a vector of per-sensor standard deviations was computed, and a threshold based on its statistical median was determined. The i-th sensor was excluded from further analysis if its standard deviation exceeded this threshold, that is, if the variance in the i-th channel lies at an extreme of this empirical distribution over the K sensor time-series of the dense array. The threshold is based on the L2 norm of the sample of standard deviations, but referenced with respect to the median, which itself discounts the effects of extreme values (Junghofer ref). A hyperparameter weights this threshold; the value used here resulted in rejection of less than 2% of sensors across all 5 subjects. The last step in this pre-processing pipeline addressed sensor-specific sources of unwanted electrical signals unrelated to the brain activity of interest. These were reduced with the sensor noise suppression algorithm (SNS, reference). Each channel recording was substituted by its projection onto the subspace spanned by all other channels: this method exploits the redundancy of a dense array - where the number of sensors exceeds the number of brain sources - rejecting sensor-specific components whose presence cannot be explained by the redundancy manifold of the rest of the dataset, potentially including sensor-specific noise from those channels themselves. This separation does not necessarily eliminate all sensor-specific noise, since at each substitution noise can be imported from other sensors, yet in many instances such redistribution adds these components incoherently so that they become attenuated (SNS reference).
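The exact form of the rejection rule is not preserved in this text; the sketch below implements one plausible median-referenced reading of it, so both the formula and the weight lambda should be treated as assumptions rather than the criterion actually used.

```python
# Sketch of a median-referenced channel-rejection criterion (assumed form):
# reject channel i if std_i > median(std) + lam * sqrt(mean((std - median(std))**2)).
import numpy as np

def reject_channels(eeg, lam=3.0):
    """eeg: (samples, K channels). lam is an assumed hyperparameter value."""
    std = eeg.std(axis=0)                          # per-sensor standard deviation
    med = np.median(std)
    spread = np.sqrt(np.mean((std - med) ** 2))    # L2 spread referenced to the median
    bad = std > med + lam * spread
    return np.where(bad)[0]                        # indices of rejected sensors
```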

Next, the data were epoched according to the triggers marking the onset of each 1-minute duo speech delivery. Epochs were then sorted according to the target speaker, female or male. Recall that in this experiment each new narrative fragment was repeated once, so half of the 'attend-male' epochs were novel presentations and half were repeated presentations, and likewise for 'attend-female'. In this part of the analysis, novelty versus repetition was not studied: even when the stimulus conditions are the same, the listener's attentional task always reflects a new episode of sound processing, which approximates a single-trial task. During repeated presentations there may also be learning effects from the prior exposure that help a listener ignore the unattended speaker, to whom attention had been paid previously. Here the interest was in the effect of attentional gain per se, so the data were pooled across first and second presentations and ordered by attentional target, which was the only effect of interest at this time. Finally, because the data were approximately 'single-trial', which precluded the benefit of averaging across repetitions, the statistical median across the pool of 5 subjects was computed in order to evaluate what signal a typical listener produces under these conditions. The median is unaffected by possible outlier recordings that may occur at any time in one subject versus the rest, and in this sense better represents the central tendency. As a caveat, this approach assumes that electrodes are positioned identically across listeners, which may vary even for the same elastic cap with different head sizes, hairstyles, etc. Lead fields also vary across subjects as a consequence of different brain folding patterns. Under some circumstances, these considerations may discourage the use of an average across subjects, and leave the median of their distribution as a better alternative.
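A minimal sketch of the epoching and pooling logic, assuming trigger sample indices and attended-speaker labels are already available per subject; all names are illustrative.

```python
# Sketch: cut 1-minute epochs at trigger onsets, group them by attended speaker,
# and take the median across subjects as a robust 'typical listener' estimate.
import numpy as np

fs = 256
epoch_len = 60 * fs

def epochs_by_target(eeg, onsets, targets):
    """eeg: (samples, channels); onsets: sample indices; targets: 'male'/'female'."""
    out = {"male": [], "female": []}
    for onset, target in zip(onsets, targets):
        out[target].append(eeg[onset:onset + epoch_len])
    return {k: np.stack(v) for k, v in out.items() if v}   # (epochs, samples, channels)

# subject_epochs: list over subjects of the dicts returned above (assumed available).
# Median across the 5 subjects of each subject's mean 'attend-female' response:
# median_female = np.median(
#     np.stack([ep["female"].mean(axis=0) for ep in subject_epochs]), axis=0)
```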

A critical question is which statistic to extract from the acoustic information. The raw sound signal delivered from the computer's sound card to be transduced at the ear contains all the instructions needed to make the loudspeaker push and pull air volumes, yet the instructions that make the physics of sound work are not necessarily the ones that make the biophysics of auditory neurons work, and may not even be compatible with the computational rules of the auditory system. Because the EEG reflects many signals arising from the central nervous system, it is important to know what aspects of the input signal matter once it is being processed at that stage. Low-frequency (e.g. 1-10 Hz) modulations of speech have been observed to match some of the scalp EEG, and have been useful in decoding approaches for determining which speaker a person is attending to (refs). A simple way to extract these modulations is via the envelope of the raw sound signal. Here the envelope was obtained by first extracting an auditory spectrogram based on a model of sound encoding at the auditory periphery up to the midbrain (Chi ref). The high resolution in time and frequency of this model yields essentially a collection of envelopes, one per frequency bin in, say, the 0.9-3 kHz range. Just as in an ordinary spectrogram, collapsing across all analysed frequencies yields an approximation to the acoustic envelope, as in the figure below, which shows the envelopes of extracts from the Danish audiobook stories used in this dataset: the female (red) and male (blue) narrators delivering a duo presentation at the same time (green).
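The envelope here was derived from the Chi et al. auditory model; as that toolbox is not reproduced here, the sketch below approximates the same idea by collapsing an ordinary spectrogram across frequency bins. The file name and band edges are illustrative.

```python
# Sketch: a simple approximation of the broadband envelope by collapsing an
# ordinary spectrogram across frequency bins (here 0.9-3 kHz, as mentioned above).
# The project itself used the Chi et al. auditory model; this is only a rough stand-in.
import numpy as np
from scipy.io import wavfile
from scipy.signal import spectrogram

fs_audio, audio = wavfile.read("speech_fragment.wav")   # hypothetical file name
audio = audio.astype(float)
if audio.ndim > 1:
    audio = audio.mean(axis=1)                          # collapse stereo to mono

f, t, sxx = spectrogram(audio, fs=fs_audio, nperseg=512, noverlap=256)
band = (f >= 900) & (f <= 3000)
envelope = np.sqrt(sxx[band].sum(axis=0))               # one value per spectrogram frame

# Optionally resample the envelope to the EEG rate (256 Hz) before correlating.
```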


Rather than linguistic content per se, the slower sound envelope modulations can be interpreted as indicators of the acoustic energy fluctuations of spoken utterances as they evolve over time. Correspondingly, the slow modulations (2-8 Hz) in all EEG channels were extracted via an order-100 FIR filter, and the resulting phase delay was then corrected. Again, a correlation analysis was performed. Previously, the interest was in how signals A and B matched each other via a Pearson's r correlation coefficient, under the assumption that all simultaneous electrical signals are transferred to all electrodes at the same time (light speed through space). This time, signal B has a postulated causal relationship with signal A: signal A (sound) hopefully elicits a perturbation of what the auditory system does, which is reflected in signal B (EEG). Obviously, the biological auditory system has physical constraints built in and certainly does not operate at light speed, so this postulated causality implies a systematic delay between the presented stimulus and the recorded response. This time one can estimate what the delay is, and whether it is physiologically plausible according to known latencies, via a correlation function over time lags. To make a comparison between 2 sound channels (attended & unattended) and 70 potential channels (64 scalp & 6 earpiece), without any dependence on the particular gain that one channel might have over another, all stimuli and responses were transformed to z-scores (signal mean subtracted, then divided by the standard deviation).
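A minimal sketch of the lagged-correlation analysis, assuming the envelope and EEG are already at a common sampling rate; z-scoring is done over the full record, so the per-lag values are approximate Pearson correlations. All names are illustrative.

```python
# Sketch: correlation between the (z-scored) speech envelope and each
# (z-scored) EEG channel as a function of the time lag of the EEG after the sound.
import numpy as np

def zscore(x, axis=0):
    return (x - x.mean(axis=axis, keepdims=True)) / x.std(axis=axis, keepdims=True)

def lagged_corr(envelope, eeg, fs=256, max_lag_s=0.5):
    """envelope: (samples,); eeg: (samples, channels). Positive lag = EEG after sound."""
    env = zscore(envelope[:, None])
    eeg = zscore(eeg)
    n = eeg.shape[0]
    lags = np.arange(0, int(max_lag_s * fs))
    r = np.zeros((len(lags), eeg.shape[1]))
    for i, lag in enumerate(lags):
        # correlate envelope[t] with eeg[t + lag]
        a, b = env[:n - lag if lag else n], eeg[lag:]
        r[i] = (a * b).mean(axis=0)
    return lags / fs, r        # lag times in seconds, r: (lags, channels)
```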
Stimulus reconstruction is the complement of neural response prediction. In the first case, one starts 'without' the stimulus and wants to obtain a representation of it, as accurate as possible, given the known EEG signal; this is achieved by finding a mapping from neural signal to stimulus representation. In the second case, one wants to know the shape of the neural response (that is, a 'version' of it) given the known stimulus; this is achieved by the inverse mapping from stimulus to neural response. Here, a linear filter model, the temporal response function (TRF), is used as a starting point. It starts from a one-dimensional representation of the stimulus (the speech envelopes seen earlier) and convolves it with an impulse response. The result is the predicted neural signal (or its predicted linear part), and the goal is to minimize the squared error between data and prediction. A priori the impulse response is not known, but several techniques exist to estimate it. Boosting, an iterative algorithm based on stage-wise modeling (David ref), is used here on the 60-minute dataset. To assess the predictive value of the TRF model, the algorithm is trained on 58 minutes of data, and the prediction this model makes on the remaining 2 minutes of held-out data is then examined. This cross-validation scheme is repeated 29 times across the 2-minute data segments. TRFs were estimated from the EEG dataset, using the Attended and Unattended speech envelopes separately as template stimulus representations. Recall that this EEG system has 70 channels. Instead of obtaining TRFs at each channel - which could allow a spatial distribution of the impulse response over time, at the expense of potentially poor SNR - a virtual channel was obtained here as a weighted sum of all channels' time series. The weights were given by the Attended-minus-Unattended correlation headmap (see Results, below).
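The TRFs in this work were estimated with boosting; as a simpler illustration of the same forward-model idea, the sketch below fits a TRF by ridge-regularized least squares over lagged copies of the envelope. This is a different estimator from boosting, and the lag range and regularization value are assumptions.

```python
# Sketch: forward TRF by ridge regression. Predict the (virtual-channel) EEG as a
# convolution of the stimulus envelope with an impulse response over 0-500 ms lags.
# Note: the project itself estimated TRFs with boosting, not ridge regression.
import numpy as np

def fit_trf_ridge(envelope, eeg, fs=256, max_lag_s=0.5, lam=1e2):
    """envelope, eeg: 1-D arrays of equal length (z-scored). Returns TRF over lags."""
    lags = np.arange(int(max_lag_s * fs))
    # Design matrix of lagged envelope copies: X[t, k] = envelope[t - k].
    X = np.column_stack([np.concatenate([np.zeros(k), envelope[:len(envelope) - k]])
                         for k in lags])
    trf = np.linalg.solve(X.T @ X + lam * np.eye(len(lags)), X.T @ eeg)
    return lags / fs, trf

def predict_eeg(envelope, trf):
    """Convolve the envelope with the estimated TRF to predict the EEG."""
    return np.convolve(envelope, trf)[:len(envelope)]
```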



2) Optimization of electrode location:
The Oticon system comprises three electrodes per earpiece, and its use requires at least one of them to serve as a ground electrode. This leaves two other electrodes, one of which is used as a reference. This short investigation examined whether positioning the reference electrode within the same earpiece ('ipsilateral reference') or in the opposite earpiece made a difference in the quality of the recorded auditory responses.

A total of 4 minutes of amplitude-modulated (AM) noise was delivered to a subject wearing two earpieces, with the stimulus divided into two blocks. In the first block, each reference channel was positioned in the opposite ear canal ('contralateral reference'), and in the second block each was positioned in the same ear canal ('ipsilateral reference'). The AM noise had a 40 Hz modulation rate.

Data were recorded at a 512 Hz sampling rate from two sensors (left and right channels), and each channel's linear trend over time was removed along with its mean. Recordings were then notch-filtered at 60, 120, 180 and 240 Hz and peak-filtered at 40, 80, 160 and 200 Hz with order-2 Butterworth filters applied in the forward and reverse direction. Data were then partitioned into 750 ms epoch segments (corresponding to 30 cycles of the 40 Hz modulation per epoch). An exclusion criterion was then applied to these epochs, analogous to the sensor-rejection criterion described earlier: for all K epochs from an earpiece, a vector of per-epoch standard deviations was computed, and a threshold based on its statistical median was determined. The i-th epoch was excluded from further analysis if its standard deviation exceeded this threshold, that is, if the variance in the i-th epoch lies at an extreme of this empirical distribution over the K epoch time-series. The threshold is based on the L2 norm of the sample of standard deviations, but referenced with respect to the median, which itself discounts the effects of extreme values (Junghofer ref). A hyperparameter weights this threshold; the value used here resulted in rejection of less than 2% of epochs.
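A minimal SciPy sketch of this preprocessing chain; the width of the notch and peak bands, and the interpretation of the peak filtering as a parallel sum of narrow band-passes, are assumptions.

```python
# Sketch: detrend, notch out mains harmonics, peak-filter the 40 Hz ASSR and its
# harmonics (as a parallel sum of narrow band-passes), then cut 750 ms epochs.
# Band widths (+/- 2 Hz) are assumed values; the data array is a placeholder.
import numpy as np
from scipy.signal import butter, filtfilt, detrend

fs = 512
data = np.random.randn(4 * 60 * fs, 2)            # placeholder (samples, 2 channels)
data = detrend(data, axis=0, type='linear')       # removes linear trend and mean

def bandfilter(x, lo, hi, btype):
    b, a = butter(2, [lo, hi], btype=btype, fs=fs)
    return filtfilt(b, a, x, axis=0)              # forward-reverse (zero phase)

for f0 in (60, 120, 180, 240):                    # notch out mains harmonics
    data = bandfilter(data, f0 - 2, f0 + 2, 'bandstop')

peaks = (40, 80, 160, 200)                        # keep ASSR-related bands
data = sum(bandfilter(data, f0 - 2, f0 + 2, 'bandpass') for f0 in peaks)

epoch_len = int(0.750 * fs)                       # 750 ms = 30 cycles at 40 Hz
n_epochs = data.shape[0] // epoch_len
epochs = data[:n_epochs * epoch_len].reshape(n_epochs, epoch_len, 2)
```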



3) Decoding algorithms for EarEEG:
Sahar, Emina or Carina will add this...


4) Spatial localization from EarEEG:
Sahar or Emina will add this...


Results

1) Comparison between scalp EEG and EarEEG:
Initially, the degree of linear dependence of each of the 6 earpiece EEG channels was contrasted against each of the individual scalp channels via Pearson correlation coefficients. The following matrices were obtained for all five subjects; the color scale indicates Pearson's r.


The picture suggested a spatial pattern of correspondence with some sensors in the first three subjects, and that the last two subjects' data might benefit from denoising later. For now, it was interesting to see the spatial pattern of which scalp sensors the in-ear sensors match best, averaged across the first three subjects:


In this case, the scale ranges from Pearson's r=0.65 (dark red) down to ~0 (light green). It is noteworthy that these raw earpiece EEG recordings (L1,2,3 and R1,2,3) tend to match most closely those scalp sensors physically closest to them. A prior motivation for this verification step was to check whether differences in the conductance model from within the ear canal might lead to an unusual spatial pattern beyond correlation based on physical proximity: for example, some level of correlation with contralateral sensors is apparent in this picture. This observation raises the question of whether some of this bilateral correlation pattern stems from common activation sources, and whether these are relevant to the speech task or not.

Given the pre-processed data, the correlations between earpiece and scalp sensors were again evaluated across the channels that were not discarded (i.e. dark blue).


And corresponding headmaps for all earpiece channels, averaged across subjects (range r = -0.19 to 0.19):


And averaged by ear canal






The figure to the right suggests that 207 ms post sound may be a time at which the sound envelope and the EEG signals have an optimal correspondence. This time frame is within a physiologically plausible window; however, at the moment this value still includes any delay between trigger delivery and sound presentation, which may occur. If the recordings have not been corrected for this delay, then it needs to be subtracted from the value suggested here.

The prediction scores (Pearson's r) between actual and modeled EEG responses were 0.041 and 0.024 in the Attended and Unattended conditions, respectively, which may be quite low even for the essentially single-trial EEG data used here. The TRFs may suggest a different temporal profile between conditions, which was fairly consistent across cross-validation folds.

Let's look at how the earpiece sensors fare on their own. A cross-correlation of 30 minutes of the attended envelope with each of the left earpiece sensors yields a different picture than the cross-correlation with the unattended envelope, as shown below by the RMS across left earpiece sensors, per condition.



2) Optimization of electrode location:
The plots above suggest that the 40 Hz auditory steady-state response may be picked up with less noise with the ipsilateral rather than the contralateral reference array. Note the difference in relative amplitudes between left and right channels, and also the phase difference between the (ipsilateral and contralateral) arrays. This may be in part due to the phase difference observed between the left and right earpieces themselves, as shown below:

3) Decoding algorithms for EarEEG:
Sahar, Emina or Carina will add this...


4) Spatial localization from EarEEG:
Sahar or Emina will add this...


Discussion

1) Comparison between scalp EEG and EarEEG:
How do in-ear recordings compare to the traditional scalp EEG in auditory experiments? Which scalp electrodes do in-ear sensors match most, during simultaneous auditory recordings?

Returning to the original question of how in-ear recordings compare to traditional scalp EEG in an auditory experiment, the answer seems to be that they compare best to nearby (temporal) scalp electrodes, with a limited degree of hemispheric symmetry between recordings retrieved from the left and right earpieces. If the patterns observed here are indeed stimulus-related, one could conjecture that the asymmetry is due to differential language processing, although it is noteworthy that a bias toward stronger correlations for the right earpiece set was already present in the raw data (e.g. cardiac muscle contraction artifacts are often seen in temporal/frontal sensors). With denoised data, one might be able to exploit the opposite hemispheric gradients shown by the earpieces in order to distinguish between sources from different hemispheres later on.

Can scalp and earpiece EEG sensor recordings match any aspect of the attended stimulus? It is well known that attended speech can be decoded from scalp EEG (ref), and that earpiece EEG sensors do pick up auditory signals such as the 40 Hz steady-state response; earpiece EEG can also be modulated by visual attention, as such recordings show different spectral peaks when one of several light displays, each flickering at a different regular rhythm (low beta), is attended, matching the target flicker rate. With this in mind, it is important to learn whether selective attention effects can also be observed with the same instruments in a different sensory modality, and with a stimulus that has richer statistics, such as speech.


From the image above it is clear that higher correlation magnitudes may be obtained when the EEG signals are contrasted with the clean target speaker than with the unattended speaker. The time-delay profile shows that this pattern exists to the left but not to the right of the zero delay, which may suggest causality. How long after the sound does the EEG become best paired with an envelope fluctuation? It is noteworthy that correlations also appear for unattended speech, which may be a consequence of the fact that listeners are always presented with that sound, whether they want to attend to it or not. Stimulus-wise, there are a number of times at which only one speaker is talking, so this could drive the EEG to track the unattended speaker whenever the target paused for a while (see the stimulus figure above). It would then be beneficial to (1) look at a difference map between these two conditions, and (2) examine the time at which this difference is maximal across the entire sensor setup; recall that, depending on their location, some channels may show anti-correlation at about the same time as other channels, so the root-mean-square may help find the optimal time delay.

Overall, it is notable that, shortly after the sound envelope is presented, a listener's sole capacity to attend biases the seemingly arbitrary brain waveforms towards those corresponding to the target speaker in the mixture, at least in the present dataset. It is postulated that this remarkable ability stems from cortical processing in auditory and language-related areas. It remains to be seen what the topography of the difference correlation map is at the time stamp suggested earlier.

The figure suggests that, for the typical listener, a relative peak of auditory activity (central electrodes) and a left lateralization (where language-processing regions are postulated to operate) work in conjunction in this task, and that we can retrieve this distribution even from approximately single-trial sound mixture presentations. The correlation levels computed remain considerably low, which suggests that stimulus reconstruction would remain the next challenge.


2) Optimization of electrode location:
The picture shows that, in principle, these sensors may pick up some of the differential processing entailed by auditory selective attention even when the sound stimuli are identical.


3) Decoding algorithms for EarEEG:
Sahar, Emina or Carina will add this...


4) Spatial localization from EarEEG:
Sahar or Emina will add this...


Conclusion

Carina will update this after the final group discussion...


Future work and collaboration

Carina will update this after the final group discussion...



Attachments