Speech Recognition (from EEG)

Coordinator: Vikram

Project Members

Vikram Ramanarayanan, Jessica Thompson, Tom Murray, Giovanni Di Liberto

Project idea

Can we come up with a better representation that would allow us to convert EEG signals into speech?

More generally, can we obtain a robust low-dimensional representation of the “phonetic information content” in speech? Possible approaches could involve finding low-dimensional representations by leveraging measurements at different levels of the processing hierarchy, such as the following (see the sketch after this list):

- Speech

- EEG

- Phonetic labels

- Articulatory movements
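
As a concrete starting point, here is a minimal sketch of one way to look for a shared low-dimensional space across two of these views (e.g., time-aligned MFCC and EEG features) using canonical correlation analysis. The feature dimensionalities, variable names, and the use of scikit-learn's CCA are illustrative assumptions, not part of the project plan.

```python
# Minimal sketch: shared low-dimensional representation across two modalities
# (e.g., MFCC frames and time-aligned EEG features) via CCA.
# Dimensionalities and variable names are illustrative assumptions.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)

n_frames = 2000          # time-aligned analysis frames
n_mfcc = 39              # e.g., 13 MFCCs + deltas + delta-deltas
n_eeg = 64               # e.g., 64 EEG channels after preprocessing
n_shared = 5             # dimensionality of the shared "phonetic" space

# Placeholder data; in practice these would be aligned speech and EEG features.
mfcc_feats = rng.standard_normal((n_frames, n_mfcc))
eeg_feats = rng.standard_normal((n_frames, n_eeg))

cca = CCA(n_components=n_shared)
cca.fit(mfcc_feats, eeg_feats)

# Project both modalities into the shared low-dimensional space.
mfcc_shared, eeg_shared = cca.transform(mfcc_feats, eeg_feats)
print(mfcc_shared.shape, eeg_shared.shape)  # (2000, 5) (2000, 5)
```

If the relationship between the views turns out to be strongly nonlinear, kernel or deep variants (e.g., deep CCA) would be natural extensions of the same idea.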

One possible application of such an endeavour could be a better understanding of the link between speech production and perception. For example, researchers have recently presented evidence in favor of such a link (Ghosh et al., 2010; Bertrand et al., 2008), showing that processing speech signals with an auditory, cochlea-like filterbank preserves maximal mutual information between articulatory gestures and the processed speech signals. It has also been shown that gesture-like “primitives” learnt from speech articulation data perform competently, relative to conventional acoustic features, in classifying different phonetic categories derived from the acoustic signal (Ramanarayanan et al., 2013).

Approaches

One could train a deep neural network to learn mappings from MFCCs to phone labels and from EEG to phone labels, and then try to connect the two architectures at one of the hidden (bottleneck?) layers. Alternatively, one could use a shared, associative hidden layer (as is common in multimodal deep networks) to perform the required multi-view learning. A rough sketch of the bottleneck idea is given below.
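
The following is a rough PyTorch sketch of that bottleneck idea, assuming frame-level MFCC and EEG features with frame-aligned phone labels; the layer sizes, class counts, and module names are placeholders, not the project's actual architecture.

```python
# Sketch (not the project's implementation): two view-specific encoders feed a
# shared bottleneck and a common phone classifier, so both views are pushed
# into the same low-dimensional "phonetic" space. All sizes are assumptions.
import torch
import torch.nn as nn

N_MFCC, N_EEG = 39, 64      # per-frame input dimensionalities (assumed)
N_BOTTLENECK = 32           # shared low-dimensional representation
N_PHONES = 40               # number of phone classes (assumed)


class MultiViewPhoneNet(nn.Module):
    def __init__(self):
        super().__init__()
        # View-specific encoders map each modality into the shared space.
        self.mfcc_encoder = nn.Sequential(
            nn.Linear(N_MFCC, 256), nn.ReLU(),
            nn.Linear(256, N_BOTTLENECK), nn.ReLU(),
        )
        self.eeg_encoder = nn.Sequential(
            nn.Linear(N_EEG, 256), nn.ReLU(),
            nn.Linear(256, N_BOTTLENECK), nn.ReLU(),
        )
        # One classifier on top of the bottleneck ties the two views together.
        self.phone_classifier = nn.Linear(N_BOTTLENECK, N_PHONES)

    def forward(self, mfcc=None, eeg=None):
        """Return phone logits from whichever view(s) are provided."""
        logits = {}
        if mfcc is not None:
            logits["mfcc"] = self.phone_classifier(self.mfcc_encoder(mfcc))
        if eeg is not None:
            logits["eeg"] = self.phone_classifier(self.eeg_encoder(eeg))
        return logits


# Toy training step on random frames; real data would be aligned
# (MFCC frame, EEG frame, phone label) triples.
model = MultiViewPhoneNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

mfcc_batch = torch.randn(128, N_MFCC)
eeg_batch = torch.randn(128, N_EEG)
phone_labels = torch.randint(0, N_PHONES, (128,))

logits = model(mfcc=mfcc_batch, eeg=eeg_batch)
loss = criterion(logits["mfcc"], phone_labels) + criterion(logits["eeg"], phone_labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Tying both encoders to a single classifier is one simple way to encourage the views to land in the same space; an explicit alignment loss on the bottleneck activations (e.g., an L2 or correlation term) could be added on top of this.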

Related work (and papers)

Nima's ECoG paper describing the forward and backward reconstruction approaches
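
For context, a backward (reconstruction) model of the kind referred to above can be sketched as ridge regression from time-lagged neural channels onto a speech feature; the lag range, regularisation value, and variable names below are assumptions, and the data are random placeholders rather than anything from the paper.

```python
# Rough sketch (not the paper's code) of a backward/reconstruction model:
# reconstruct a 1-D speech feature (here a placeholder envelope) from
# time-lagged EEG channels using closed-form ridge regression.
import numpy as np

rng = np.random.default_rng(0)

n_samples, n_channels = 10000, 64   # EEG samples x channels (assumed)
lags = range(0, 25)                 # roughly 0-200 ms of lags at ~125 Hz (assumed)
ridge_lambda = 1e2                  # regularisation strength (assumed)

eeg = rng.standard_normal((n_samples, n_channels))
envelope = rng.standard_normal(n_samples)   # placeholder speech envelope

# Lagged design matrix: one column per (channel, lag) pair, where lag l
# uses the EEG sample l steps after the current stimulus sample.
X = np.column_stack([np.roll(eeg, -lag, axis=0) for lag in lags])

# Drop the final samples, whose lagged copies wrap around the recording.
valid = n_samples - max(lags)
X, y = X[:valid], envelope[:valid]

# Closed-form ridge regression: w = (X'X + lambda * I)^(-1) X'y
w = np.linalg.solve(X.T @ X + ridge_lambda * np.eye(X.shape[1]), X.T @ y)

reconstruction = X @ w
corr = np.corrcoef(reconstruction, y)[0, 1]
print(f"reconstruction correlation: {corr:.3f}")  # ~0 for random placeholder data
```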

See the Google Doc here:
