Data used in the CNN experiments [1, 2]

Updated: April 25, 2016

Elec: electronic product reviews for sentiment classification

These datasets were derived from a large collection of Amazon reviews [3]. Note that the data should be used for research purposes only.

RCV1 (Reuters Corpus Version 1)

RCV1 [4] is distributed by NIST; information on how to obtain it is available from NIST.

References

[1] Rie Johnson and Tong Zhang. Effective use of word order for text categorization with convolutional neural networks. NAACL HLT 2015.
[2] Rie Johnson and Tong Zhang. Semi-supervised convolutional neural networks for text categorization via region embedding. NIPS 2015.
[3] Julian McAuley and Jure Leskovec. Hidden factors and hidden topics: understanding rating dimensions with review text. RecSys 2013.
[4] David D. Lewis, Yiming Yang, Tony G. Rose, and Fan Li. RCV1: A new benchmark collection for text categorization research. Journal of Machine Learning Research, 5:361-397, 2004.