Results on the snippets (left) and long documents (right).
The x-axis is log-scale.
Methods in the graphs
"GoogleNews word vectors":
Semi-supervised word-vector CNN.
A convolution layer takes as input the Google News word vectors
(public general-purpose vectors trained with word2vec on about 100 billion words).
Word vectors were updated/fine-tuned during CNN training.
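To make this input concrete, here is a minimal numpy sketch of a convolution region built from word vectors. The tiny 3-dimensional lookup table and the filter weights are made-up stand-ins (the real Google News vectors are 300-dimensional), so this is an illustration of the input shape, not the actual model:

```python
import numpy as np

# Toy stand-in for the pretrained word-vector lookup table
# (real Google News vectors are 300-dimensional).
word_vecs = {
    "good":  np.array([0.9, 0.1, 0.0]),
    "movie": np.array([0.2, 0.8, 0.3]),
    "plot":  np.array([0.1, 0.7, 0.5]),
}

def region_input(words, size=2):
    """Concatenate the word vectors in each size-`size` region (stride 1)."""
    vecs = [word_vecs[w] for w in words]
    return np.array([np.concatenate(vecs[i:i + size])
                     for i in range(len(vecs) - size + 1)])

regions = region_input(["good", "movie", "plot"])  # shape (2, 6)
W = np.random.randn(4, 6) * 0.1                    # 4 convolution filters
feature_maps = np.maximum(0.0, regions @ W.T)      # ReLU(conv), shape (2, 4)
```

During training, the entries of `word_vecs` would be updated by backpropagation along with `W`, which is the fine-tuning referred to above.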
Supervised one-hot CNN.
A convolution layer takes one-hot vectors as input.
No unlabeled data. No pre-training of any sort.
Semi-supervised one-hot CNN.
A convolution layer takes as input one-hot vectors
and information (unsup+unsup3) learned from domain unlabeled data of up to 300M words.
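The one-hot input can be sketched the same way: each region is the concatenation of |V|-dimensional one-hot vectors. The three-word vocabulary below is a toy assumption (a real vocabulary has tens of thousands of words):

```python
import numpy as np

vocab = {"good": 0, "movie": 1, "plot": 2}  # toy vocabulary
V = len(vocab)

def one_hot_regions(words, size=2):
    """Concatenate one-hot vectors over each size-`size` region (stride 1)."""
    def one_hot(w):
        v = np.zeros(V)
        v[vocab[w]] = 1.0
        return v
    vecs = [one_hot(w) for w in words]
    return np.array([np.concatenate(vecs[i:i + size])
                     for i in range(len(vecs) - size + 1)])

regions = one_hot_regions(["good", "movie", "plot"])  # shape (2, 2*V) = (2, 6)
```

In the semi-supervised variant, additional region representations learned from the unlabeled data would be fed to the convolution layer alongside this one-hot input.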
Summary of the results
Google News word vectors are helpful when training text is scarce (small sets of snippets).
Supervised one-hot CNN works well with plenty of training text (many snippets or long documents).
Semi-supervised one-hot CNN performs well across the board.
Snippets and long documents are Amazon product reviews (on electronics and
DVD, respectively) generated from
the new-and-improved Amazon dataset at https://snap.stanford.edu/data/web-Amazon.html.
Snippets were generated by concatenating a summary section (five words on average) and the first 15 words of
a text section.
The long documents are text sections no shorter than 200 words.
Training/test sets are balanced (#positive=#negative).
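The snippet construction described above can be sketched as follows. The field names in the `review` dict are hypothetical placeholders, not the actual field names used by the Amazon dataset:

```python
def make_snippet(review, n_text_words=15):
    """Concatenate the summary section with the first 15 words of the text section."""
    words = review["summary"].split() + review["text"].split()[:n_text_words]
    return " ".join(words)

review = {
    "summary": "Great sound for the price",  # summaries average ~5 words
    "text": ("These headphones exceeded my expectations in every way and "
             "the bass response is deep without being muddy at all."),
}
snippet = make_snippet(review)  # ~20-word snippet
```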
Data sizes (snippets / long documents):
Training sets: max 300K (6M words) / max 30K (14M words).
Unlabeled data: 2M reviews (265M words) / 2M reviews (190M words).