Getting Data!

We have tried to make getting the data as simple as possible. Run the shell script (get-all-data.sh) provided in this folder to create the subfolder structure and download the data files needed for the notebooks. The files are hosted on Google Drive. Alternatively, you can follow the download links below and create the directory structure yourself!

There is one data file that we do not re-host: the GloVe word embeddings. Please download them from the Stanford website: http://nlp.stanford.edu/data/glove.6B.zip. Then unzip the archive and put the 100d version into a subfolder named glove, which results in the following file path: data/glove/glove.6B.100d.txt
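
If you would rather not unzip the whole archive by hand, the step above can be sketched in Python. This is only a minimal sketch: the function name `extract_glove_100d` is made up for illustration, and it assumes the 100d file sits at the top level of the archive under the name glove.6B.100d.txt and that you have already downloaded glove.6B.zip yourself.

```python
import zipfile
from pathlib import Path

# Assumption: the 100d vectors are stored in the archive under this exact name.
GLOVE_MEMBER = "glove.6B.100d.txt"


def extract_glove_100d(zip_path, data_dir="data"):
    """Extract only the 100d vectors into <data_dir>/glove and return the file path.

    Hypothetical helper, not part of this repository.
    """
    target_dir = Path(data_dir) / "glove"
    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        # Raises KeyError if the archive has no member with this name.
        archive.extract(GLOVE_MEMBER, path=target_dir)
    return target_dir / GLOVE_MEMBER
```

Calling `extract_glove_100d("glove.6B.zip")` from the folder that contains data should leave the file at data/glove/glove.6B.100d.txt and skip the other embedding sizes.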

Downloading and creating folders manually

Yelp

Folder: data/yelp

  1. raw_train.csv
  2. raw_test.csv
  3. reviews_with_splits_lite.csv

Surnames

Folder: data/surnames

  1. surnames.csv
  2. surnames_with_splits.csv

Frankenstein

Folder: data/books

  1. frankenstein.txt
  2. frankenstein_with_splits.csv

AG News

Folder: data/ag_news

  1. news.csv
  2. news_with_splits.csv

English-French text

Folder: data/nmt

  1. eng-fra.txt
  2. simplest_eng_fra.csv
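
If you go the manual route, a small script can create the folders listed above and tell you which files are still missing. The following is a rough sketch, not the repository's own tooling: `EXPECTED_FILES` and `prepare_data_dirs` are names invented here, and the layout is copied from the lists in this README (plus the GloVe path).

```python
from pathlib import Path

# Folder -> file names, copied from the lists in this README.
EXPECTED_FILES = {
    "yelp": ["raw_train.csv", "raw_test.csv", "reviews_with_splits_lite.csv"],
    "surnames": ["surnames.csv", "surnames_with_splits.csv"],
    "books": ["frankenstein.txt", "frankenstein_with_splits.csv"],
    "ag_news": ["news.csv", "news_with_splits.csv"],
    "nmt": ["eng-fra.txt", "simplest_eng_fra.csv"],
    "glove": ["glove.6B.100d.txt"],
}


def prepare_data_dirs(data_root="data"):
    """Create every expected subfolder and return the paths of files not yet present.

    Hypothetical helper; it never downloads anything.
    """
    root = Path(data_root)
    missing = []
    for folder, names in EXPECTED_FILES.items():
        folder_path = root / folder
        folder_path.mkdir(parents=True, exist_ok=True)
        for name in names:
            if not (folder_path / name).is_file():
                missing.append(folder_path / name)
    return missing
```

Running `prepare_data_dirs()` once before downloading creates the empty folders; running it again afterwards should return an empty list when every file is in place.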