2015/word2vec

* word2vec can be downloaded from https://code.google.com/p/word2vec/
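
The package builds with a plain make; a minimal sketch, assuming the source has been checked out into a word2vec/ directory (the directory name is hypothetical):

cd word2vec/
# builds the word2vec binary plus the distance, word-analogy and
# compute-accuracy tools used below
make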

* we can train our own vectors on the lab machine at 10.130.53.56 using up to 10 threads. Ask Guido for the login.

* you can also download word vectors pre-trained on 100 billion words of Google News data from: https://drive.google.com/file/d/0B7XkCwpI5KDYNlNUTTlSS21pQmM/edit
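
Once unpacked, those vectors can be queried with the distance tool that ships with word2vec; a sketch, assuming the archive is the standard GoogleNews-vectors-negative300 file:

# unpack the downloaded archive
gunzip GoogleNews-vectors-negative300.bin.gz
# interactive nearest-neighbour queries against the pre-trained vectors
./distance GoogleNews-vectors-negative300.bin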

* here are some sample word2vec command-line parameters to get you started (this is a Makefile recipe, so $< stands for the input corpus and $@ for the output file):

# -cbow 0 selects the skip-gram model; -negative 15 draws 15 noise words
# for negative sampling; -size 256 sets the vector dimensionality;
# -window 10 uses a 10-word context window; -min-count 100 discards words
# seen fewer than 100 times; -iter 5 makes five passes over the corpus;
# -binary 1 writes the vectors in binary format.
word2vec \
	-train $< \
	-output $@ \
	-negative 15 \
	-size 256 \
	-binary 1 \
	-iter 5 \
	-cbow 0 \
	-min-count 100 \
	-window 10 \
	-threads 10
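
Once training finishes, the resulting vectors can be sanity-checked with the word-analogy and compute-accuracy tools from the same package; a sketch, with vectors.bin standing in for whatever file the -output flag wrote:

# analogy queries: entering "man king woman" should rank words like
# "queen" near the top (the tool computes vec(king) - vec(man) + vec(woman))
./word-analogy vectors.bin

# scores the vectors on the bundled questions-words.txt analogy test set,
# restricted to the 30000 most frequent words
./compute-accuracy vectors.bin 30000 < questions-words.txt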

Our vectors were trained on the most recent Wikipedia dump at the time of writing, which contains 3,169,913,651 tokens.
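
For reference, the token count of a plain-text training corpus can be checked with a simple word count (wiki.txt is a placeholder for the preprocessed dump):

# whitespace-delimited token count of the training corpus
wc -w wiki.txt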