Spike-Based Hebbian Learning of Distributed Word Representations

Word vectors are distributed word representations that have been successfully employed in several difficult natural language processing (NLP) tasks. They are the current state of the art in machine learning for extracting meaningful semantic information from text, the quintessential example being that they can help solve the following problem:

king - man + woman = queen

without any a priori notion of gender or royal hierarchy. But how are word vectors able to perform this task? They are typically trained over several passes through huge corpora, for example all of Wikipedia, using methods optimized for speed and storage such as continuous bag-of-words (CBOW) and Skip-gram (1).
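The analogy task above reduces to simple vector arithmetic followed by a nearest-neighbor lookup. The sketch below illustrates the mechanism with tiny made-up 4-dimensional vectors (real word vectors are learned and typically have hundreds of dimensions; these values are invented purely for illustration):

```python
import numpy as np

# Hypothetical 4-dimensional word vectors, invented for illustration only.
vectors = {
    "king":  np.array([0.9, 0.8, 0.1, 0.2]),
    "man":   np.array([0.1, 0.8, 0.1, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9, 0.1]),
    "queen": np.array([0.9, 0.1, 0.9, 0.2]),
}

def analogy(a, b, c, vecs):
    """Return the word whose vector is closest (by cosine similarity)
    to vecs[a] - vecs[b] + vecs[c], excluding the query words themselves."""
    target = vecs[a] - vecs[b] + vecs[c]

    def cos(u, v):
        return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

    candidates = {w: v for w, v in vecs.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cos(candidates[w], target))

print(analogy("king", "man", "woman", vectors))  # -> queen
```

With these toy vectors the arithmetic works out exactly; in a trained model the result is only the *nearest* vector, which is why cosine similarity rather than equality is used.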

The goal of such methods is to build up an accurate and concise summary of the word co-occurrences in the training corpus, the idea being that knowing which words occur together often enough in these huge bodies of text can already take you a long way toward determining underlying semantic meaning. Given the proven power of word vectors in a variety of NLP tasks (for a good sample, see the other results from this work group!), we hypothesized that the brain might be capable of performing a vaguely similar computation.
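The co-occurrence bookkeeping at the heart of this idea can be sketched in a few lines. This toy version counts, for each pair of words, how often they appear in the same sentence; the real methods use more refined context windows and weighting, so this is only the core idea:

```python
from collections import Counter
from itertools import combinations

# Toy co-occurrence counting over two sentences from the corpus.
sentences = [
    "one fish two fish".split(),
    "red fish blue fish".split(),
]

cooc = Counter()
for sent in sentences:
    # Count every unordered pair of distinct words within a sentence.
    for w1, w2 in combinations(sent, 2):
        if w1 != w2:
            cooc[frozenset((w1, w2))] += 1

print(cooc[frozenset(("red", "fish"))])  # -> 2 ("red" precedes "fish" twice)
```

Repetition in the corpus directly inflates the counts: because "fish" appears twice in the second sentence, the pair ("red", "fish") is counted twice.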

Our subproject involved developing a spiking neural network that took words as inputs and learned co-occurrences of these words using a Hebbian learning rule (2). The words we used were gathered from the famous children's poem One fish, two fish, red fish, blue fish by Dr. Seuss (3). This corpus was chosen because it contains many repeated words, which we thought would lead to a richer synaptic representation of co-occurrences. Using NEST (4), we encoded each unique word as a 30-neuron population, and provided 100 ms of stimulation to that population to make it spike at 20 Hz whenever the word was encountered in the corpus. Going word-by-word, we gradually stimulated all neurons in the network in this way. Using periods ('.') as sentence delimiters, we inserted temporal gaps with no stimulation to avoid associating words across sentence boundaries.

Over the course of training, and thanks to a long Hebbian learning window, the network was able to learn the full co-occurrence matrix of words in the form of synaptic strengths.
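The essence of this learning process is that populations co-active within the same learning window strengthen the synapses between them. The actual model uses the spike-based plasticity rule of ref. (2); the minimal rate-free sketch below (with an invented learning rate) shows only the core Hebbian idea of pairwise potentiation:

```python
import numpy as np

# Minimal Hebbian sketch: words active in the same window strengthen the
# connections between their populations, so the weight matrix comes to
# mirror the word co-occurrence matrix.
words = ["one", "fish", "two", "red", "blue"]
idx = {w: i for i, w in enumerate(words)}
W = np.zeros((len(words), len(words)))  # one weight per population pair

ETA = 0.1  # learning rate (illustrative value)

def hebbian_window(active, W):
    """Potentiate all pairs of distinct populations co-active in one window."""
    for a in active:
        for b in active:
            if a != b:
                W[idx[a], idx[b]] += ETA
    return W

# Each sentence acts as one co-activity window (sentence gaps keep them apart).
for sentence in [["one", "fish", "two", "fish"], ["red", "fish", "blue", "fish"]]:
    W = hebbian_window(set(sentence), W)

print(W[idx["red"], idx["fish"]])  # strengthened once, by ETA
```

After training, reading out the synaptic strengths recovers which words occurred together, which is exactly the co-occurrence structure the word-vector methods summarize.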

Since this way of storing word co-occurrences does not scale well (the full NxN matrix grows quadratically with vocabulary size), ongoing work will take the form of collaborations after the workshop: Philip Tully, Guido Zerrella, and Jonathan Tapson will attempt to develop compressed weight-matrix representations for storing and recalling word vectors using this network, together with Jonathan's SKIM architectures (5).

1. Mikolov, Tomas, et al. "Efficient estimation of word representations in vector space." arXiv preprint arXiv:1301.3781 (2013).
2. Tully, Philip J., Matthias H. Hennig, and Anders Lansner. "Synaptic and nonsynaptic plasticity approximating probabilistic inference." Frontiers in synaptic neuroscience 6 (2014).
3. Seuss, Dr. One Fish, Two Fish, Red Fish, Blue Fish. HarperCollins UK, 2003.
4. Gewaltig, Marc-Oliver, and Markus Diesmann. "NEST (Neural Simulation Tool)." Scholarpedia 2.4 (2007): 1430.
5. Tapson, Jonathan C., et al. "Synthesis of neural networks for spatio-temporal spike pattern recognition and processing." Frontiers in neuroscience 7 (2013).