One-hot CNN in comparison with GoogleNews-CNN

[Figure: snippets-longdoc.png]
Figure 1: On the snippets (left) and long documents (right). The x-axis is log-scale.
Methods in the graphs  
Summary of the results  
Data   Snippets and long documents are Amazon product reviews (on electronics and DVD, respectively) generated from the new-and-improved Amazon dataset at https://snap.stanford.edu/data/web-Amazon.html. Each snippet was generated by concatenating a review's summary section (five words on average) with the first 15 words of its text section. The long documents are text sections no shorter than 200 words. Training and test sets are balanced (#positive = #negative).
                 Snippets                   Long documents
Contents         DVD reviews                electronics reviews
Average #words   20                         475
#training        max 300K (6M words)        max 30K (14M words)
#development     min(#training/10, 5K)      min(#training/10, 5K)
#test            25K                        25K
#unlabeled       2M reviews (265M words)    2M reviews (190M words)
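
The data preparation described above can be sketched roughly as follows. This is a minimal illustration, not the original preprocessing code: the function names are hypothetical, words are assumed to be separated by whitespace (the tokenizer actually used is not stated), and the development-set size is assumed to use integer division.

```python
def make_snippet(summary: str, text: str, n_text_words: int = 15) -> str:
    """Concatenate the summary with the first n_text_words words of the text.

    Assumption: words are whitespace-separated tokens.
    """
    head = text.split()[:n_text_words]
    return " ".join(summary.split() + head)


def is_long_document(text: str, min_words: int = 200) -> bool:
    """Keep only text sections that are no shorter than min_words words."""
    return len(text.split()) >= min_words


def dev_size(n_training: int) -> int:
    """Development-set size: min(#training/10, 5K).

    Assumption: the division is rounded down.
    """
    return min(n_training // 10, 5000)
```

Under these assumptions, the largest snippet training set (300K) gets a 5K development set, while the largest long-document training set (30K) gets a 3K development set.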


