2010/bmi10/speech_emg

Overview

The electromyogram (EMG) is a recording of the electrical activity generated by muscles. The human face has a complex muscle framework that allows it to produce a wide range of vowels, consonants, and other vocalizations. This project aims to decode speech, or components of speech, from EMG signals recorded around the mouth and face. We found some guidance in the paper

We placed electrodes at several locations: below the lip, above and to the right (and left) of the lips, just to the right (and left) of the lips, and on the neck. With these placements we targeted several of the important muscles used in opening and closing the jaw and in shaping the mouth to form sounds.

From each of two subjects, we recorded 10 trials of each of the following vocalizations and muscle movements:

vowels: "ah", "eh", "ee", "oh", "oo"
numbers: one (1) through nine (9)
words: "yes", "no", "hot", "cold", "hungry", "thirsty", "more", "less", "hello", "goodbye"
clenching
laughing
rest

From these data, we extracted features over a 0.5 s window aligned to the activity. These features included:

mean time-series value
max time-series value
min time-series value
zero-crossing count
mean power of the time series
power of the rectified signal
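
As a rough illustration of the feature list above, here is a minimal Python sketch that computes the six features for a single window of samples. The names extract_features and window_at are hypothetical and not taken from our analysis code. Two points are assumptions: a zero crossing is counted as a strict sign change between consecutive samples, and "power in the rectified signal" is interpreted as the mean absolute value, since the mean square of a rectified signal would equal the mean power. The original definition may have differed.

```python
def window_at(signal, start_index, sample_rate_hz, duration_s=0.5):
    """Cut a fixed-length window (0.5 s by default) starting at start_index."""
    n_samples = int(round(duration_s * sample_rate_hz))
    return signal[start_index:start_index + n_samples]


def extract_features(window):
    """Compute six summary features for one EMG window (a list of numbers)."""
    n = len(window)
    if n == 0:
        raise ValueError("window must contain at least one sample")

    # Assumption: a zero crossing is a strict sign change between neighbors.
    zero_crossings = sum(1 for a, b in zip(window, window[1:]) if a * b < 0)

    return {
        "mean": sum(window) / n,
        "max": max(window),
        "min": min(window),
        "zero_crossings": zero_crossings,
        # Mean power: average of the squared samples.
        "mean_power": sum(x * x for x in window) / n,
        # Assumption: rectified "power" taken as the mean absolute value.
        "rectified_mean": sum(abs(x) for x in window) / n,
    }
```

For example, extract_features(window_at(channel, start, fs)) would give one feature set per channel and trial, where channel, start, and fs stand for a recorded channel, the sample index of the activity, and the sampling rate.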

We used the MATLAB classify function to train on half of the available trials and decode the remaining half. For the vowels (5 classes, chance level 20%), we achieved 80% classification accuracy for one subject and 50% for the other. For the numbers (9 classes, chance level 11%), we achieved approximately 50% accuracy. For the words (10 classes, chance level 10%), we also achieved approximately 50% accuracy.
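
The train-on-half, decode-the-rest evaluation can be sketched in Python as follows. This is not the analysis we ran: MATLAB's classify fits a linear discriminant by default, whereas this dependency-free sketch swaps in a simpler nearest-centroid classifier as a stand-in. All function names are hypothetical, and splitting by alternating trials within each class is an assumption about how the halves were chosen.

```python
def chance_level(n_classes):
    """Expected accuracy of random guessing with equally likely classes."""
    return 1.0 / n_classes


def split_half(trials):
    """Alternate each class's trials between a training and a test set.

    trials is a list of (feature_vector, label) pairs.
    """
    seen = {}
    train, test = [], []
    for features, label in trials:
        k = seen.get(label, 0)
        (train if k % 2 == 0 else test).append((features, label))
        seen[label] = k + 1
    return train, test


def fit_centroids(train):
    """Average the training feature vectors of each class."""
    sums, counts = {}, {}
    for features, label in train:
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {lab: [s / counts[lab] for s in sums[lab]] for lab in sums}


def predict(centroids, features):
    """Return the label whose centroid is closest (squared Euclidean)."""
    def sq_dist(label):
        return sum((f - c) ** 2 for f, c in zip(features, centroids[label]))
    return min(centroids, key=sq_dist)


def accuracy(centroids, test):
    """Fraction of held-out trials assigned to their true label."""
    hits = sum(1 for feats, lab in test if predict(centroids, feats) == lab)
    return hits / len(test)
```

With 10 trials per class, split_half leaves 5 trials per class for training and 5 for testing, and the resulting accuracy is compared against chance_level(5), chance_level(9), or chance_level(10) for the vowels, numbers, and words respectively.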

There were several limitations in our experimental paradigm that may have reduced decoding performance. For example, one of the subjects had to tense facial muscles to avoid smiling or laughing. Also, the way the EMG electrodes were connected to the amplifiers left some freedom of movement, which may have introduced movement artifacts or excessive noise coupling. Audio recordings were at times difficult to interpret because of a poor reference connection (the reference had to be held in place by the subject).

Details on EMG signal acquisition and processing