James Wright, Adam O'Donovan, Connie Cheung,  Nils Peters, Jeffrey Pompe

This is unpublished work in progress.

Project Goal

The goal of this project is to determine whether attention-modulated electroencephalography (EEG) signals can be used to control external devices in real time. Real-time control is advantageous because it allows machines and humans to work together seamlessly toward a common goal, which in this demo is to enhance an attended signal.

The basic experiment looks like this: two human speakers begin talking simultaneously. An audio-visual camera sends the (omni-directional) signals to the headphones of a subject in another room, who is equipped with an EEG cap. The subject is asked to attend to one of the two speakers (in this case, the target speaker identifies himself with a code word). In real time, our system decodes the EEG signals and visually identifies the attended speaker on a separate computer screen.

This project extends the work of Lalor, Mesgarani, and Ding (see reading assignments). To the authors' knowledge, real-time detection of auditory attention has not previously been demonstrated. We believe our project is the first to show that reliable, real-time, single-trial decisions can be made about a subject's attention to speech in a multi-speaker environment. This work may have significant implications for the design of the next generation of neural prosthetics, such as hearing aids and cochlear implants.

User Scenarios

There are numerous scenarios for which this real-time demo serves as a proof-of-concept.

Air traffic control

Consider, for instance, the job of an air traffic controller. This job requires a human to control air traffic on and within the vicinity of an airport and the movement of air traffic between altitude sectors and control centers. The controller authorizes, regulates, and controls commercial airline flights according to government or company regulations to expedite flights and ensure their safety. Because an error in this job can have dramatic consequences, a computer system that supports the controller is desirable. Our real-time demo shows how a computer system could track which airplane the controller is listening to and paying the most attention to. The computer could then monitor the other airplanes and warn the controller if a critical situation arises.

A personal meeting diarist

Coupled with automatic speech recognition (ASR) software, our brain-computer interface could transcribe the attended conversation into text while suppressing the unattended conversations.

Interface for impaired people

It has been shown that EEG signals in conjunction with a visual display can be used to type letters and words, thus engaging paralyzed people in social interaction. Similarly, attention measured via EEG may also enable communication.

Experimental design

Task Setup

Two male speakers were given reading material and instructed to begin reading simultaneously. At the beginning of each session, one speaker would identify himself as the intended speaker with a code word (e.g. Ringo) before reading his passage. The intended speaker was randomized between sessions.

In a separate room, we fitted the subject with a 32-sensor EEG cap and headphones over which the audio was streamed diotically. The subject was told the code word prior to the session and was instructed to attend to the target speaker for the duration of the session. Each session lasted approximately 2 minutes. The figure below shows a picture of our demonstration setup.

Figure 1: Visualization of the real-time demo.

Technical Setup

The subject, the EEG system, the data acquisition laptop, and the decoder laptop are in one room. In the other room (the lecture room), speakers A and B are captured by the audio-visual camera. A long cable connects the camera to the decoder laptop.

1. Both speakers begin reading newspaper articles out loud. The target speaker begins his passage with the code word “Ringo”.

2. The audio-visual camera receives the speech and generates two beams, each pointing at one of the speakers. The data is sampled at 44.1 kHz. The envelopes of the two beamformed waveforms are calculated in MATLAB on Computer 2 using the Hilbert transform (env = abs(hilbert(x))). These envelopes are used by the decoder in step 4.
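A minimal sketch of this envelope extraction is shown below; the variable names beamA and beamB for the two beamformed signals and the optional low-pass smoothing are assumptions for illustration, while the demo script itself only uses abs(hilbert(x)).

    % Envelope extraction for the two beamformed signals (step 2).
    % beamA and beamB are hypothetical column vectors sampled at 44.1 kHz.
    fsAudio = 44100;

    % Broadband envelope via the analytic signal, as in the demo script.
    envA = abs(hilbert(beamA));
    envB = abs(hilbert(beamB));

    % Optional smoothing before downsampling (the cutoff is an assumption).
    [b, a] = butter(4, 30 / (fsAudio / 2));   % 4th-order low-pass at 30 Hz
    envA = filtfilt(b, a, envA);
    envB = filtfilt(b, a, envB);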

3. EEG signals are gathered with a BrainVision ActiCHamp active-electrode data acquisition system. We used a 32-channel cap with electrode locations in the standard 10/20 system, plus two auxiliary channels to record the envelopes of the incoming acoustic stimuli and a third auxiliary channel to simultaneously measure horizontal and vertical bipolar electrooculogram data. The EEG data is amplified and sampled at 1000 Hz. A data acquisition laptop (Computer 1) sends the data via a UDP socket connection to the decoder in MATLAB (Computer 2).
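A minimal sketch of how Computer 2 might read this UDP stream is shown below, assuming the legacy Instrument Control Toolbox interface; the addresses, port, block size, and sample format are placeholders, since the actual packet layout is defined by the acquisition software.

    % Receiving the streamed EEG samples on Computer 2 (step 3).
    % Addresses, port, block size, and data type are placeholders.
    nChannels = 35;                                   % 32 EEG + auxiliary channels
    blockSize = 100;                                  % samples per channel per packet

    u = udp('127.0.0.1', 8844, 'LocalPort', 51000);   % legacy udp object
    u.InputBufferSize = 2^20;
    fopen(u);

    % Blocks until a datagram arrives; reshape to channels x samples.
    raw = fread(u, nChannels * blockSize, 'single');
    packet = reshape(raw, nChannels, []);

    fclose(u);
    delete(u);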

4. Computer 2 downsamples the incoming EEG data from Computer 1 and the extracted audio envelopes of the two speakers to 100 Hz. All incoming data are stored in individual 45-second ring buffers. The [DecoderFunction] (see more details at ProjectDecode) returns a value that indicates one of three possible outcomes:

  • 0: no attention
  • 1: attention to speaker A
  • 2: attention to speaker B

The decoder function is created by the  ProjectDecode group.
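A sketch of this processing loop is shown below; the helper functions (demoIsRunning, getNewEEGSamples, getNewEnvelopeSamples), the buffer handling, and the argument list of DecoderFunction are assumptions for illustration.

    % Sketch of the step-4 processing loop on Computer 2.
    fsEEG  = 1000;               % EEG rate from the amplifier
    fsDec  = 100;                % common rate after downsampling
    bufLen = 45 * fsDec;         % 45-second ring buffers at 100 Hz

    eegBuf  = zeros(32, bufLen); % 32 EEG channels
    envABuf = zeros(1, bufLen);  % envelope of speaker A
    envBBuf = zeros(1, bufLen);  % envelope of speaker B

    while demoIsRunning()                         % hypothetical stop condition
        % Fetch the newest blocks of samples (hypothetical helpers); the
        % EEG and audio blocks are assumed to cover the same time span.
        eegBlock = getNewEEGSamples();            % 32 x N at 1000 Hz
        [audA, audB] = getNewEnvelopeSamples();   % 1 x M at 44.1 kHz

        % Downsample everything to the common 100 Hz decoder rate.
        eegDown  = resample(eegBlock', fsDec, fsEEG)';
        envADown = resample(audA, fsDec, 44100);
        envBDown = resample(audB, fsDec, 44100);

        % Shift the ring buffers and append the new samples.
        n = size(eegDown, 2);
        eegBuf  = [eegBuf(:, n+1:end), eegDown];
        envABuf = [envABuf(n+1:end), envADown(1:n)];
        envBBuf = [envBBuf(n+1:end), envBDown(1:n)];

        % 0 = no attention, 1 = attention to speaker A, 2 = attention to speaker B
        decision = DecoderFunction(eegBuf, envABuf, envBBuf);
    end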

5. Based on the value returned by DecoderFunction, we place a box around the attended speaker in the camera display (using switch_box_mode()).
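A minimal sketch of this display update follows; the argument convention of switch_box_mode() is an assumption about its interface.

    % Map the decoder output to the camera display (step 5).
    switch decision
        case 1
            switch_box_mode(1);   % draw the box around speaker A
        case 2
            switch_box_mode(2);   % draw the box around speaker B
        otherwise
            switch_box_mode(0);   % no attention detected: no box
    end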

The runtime Matlab script of the demo running on Computer 2 can be found here.

Results

The demo ran for about 15 minutes and worked to a great extent. This is a notable success, since the decoder function was created from data of the same subject but with different stimuli. This shows that the decoder is fairly robust and generalizes to other auditory stimuli. The attention estimate was based on an observation window of 45 seconds. This window length is a compromise between accuracy and real-time feasibility: a longer observation window improves the accuracy of the estimate but slows the system's response to attention switches, while a shorter window has the opposite effect (see Figure 2 at ProjectDecode).

We observed a bias toward speaker A, which might be due to speaker A having a louder voice than speaker B. Since we did not store the data processed during the real-time demo, it is hard to draw firm conclusions. In a future demo, we need to store the raw audio and EEG data in order to better understand the decoder estimates.

The attached video shows the results of one experiment: EEG_screen_capture_480p.mov

Challenges

1. For synchronization with the measured EEG signals, we attempted to also send the audio signal presented to the subject to the Aux 1 and Aux 2 inputs of the EEG amplifier. To protect the EEG amplifier, we tried to build half-wave rectifiers. It turned out that these rectifiers compromised the audio signals to the point that their envelopes were unusable for the decoder on Computer 2.

2. Presenting the attended audio stream to the audience, along with the video showing the attended speaker (Fig. 1), while simultaneously sending audio to the subject's headphones turned out to be a problem in MATLAB because two sound cards are needed. In the course of this workshop, we could not get non-blocking audio playback working for the video presentation.
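One possible approach, which we did not get working during the workshop, would be MATLAB's audioplayer: its play() call returns immediately (non-blocking), and a device ID obtained from audiodevinfo can address a second sound card. The sketch below illustrates this idea; the variable names and the device index are placeholders, and the approach remains untested in our demo.

    % Sketch of non-blocking playback on two devices (untested in our demo).
    fs = 44100;
    info = audiodevinfo;                     % enumerate available audio devices
    audienceID = info.output(2).ID;          % placeholder: pick a second output device

    % Attended stream for the audience on the second sound card.
    playerAudience = audioplayer(attendedAudio, fs, 16, audienceID);
    play(playerAudience);                    % returns immediately (non-blocking)

    % Diotic mixture for the subject's headphones on the default device.
    playerSubject = audioplayer(subjectMix, fs);
    play(playerSubject);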

Future Directions

As a future goal, we would like to explore how well the decoder function works across different acoustic scenes.

We are also interested in how well the decoder function generalizes to different subjects.