CONTEXT v4: Neural network code for text categorization in C++ on GPU

Updated: July 26, 2017.    Latest version: CONTEXT v4.00 (July 22, 2016).    GitHub
What's new?    v4 includes the deep pyramid CNN (DPCNN) of [JZ17].
CONTEXT provides implementations of several types of neural networks for text categorization.

System requirements    This code runs only on a CUDA-capable GPU such as the Tesla K20. That is, your system must have a GPU and an appropriate version of CUDA installed. The provided makefile and example shell scripts are for Unix-like systems, and testing was done on Linux. In principle, the C++ code should also compile and run on other systems (e.g., Windows), but this is not guaranteed. See README for more details.
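
Before building, it can help to confirm that the required tools are visible on your PATH. The following is a minimal sketch and is not part of the CONTEXT distribution. The helper names `have` and `report` are hypothetical, and the tool names checked (`make`, `nvcc` for the CUDA compiler, `nvidia-smi` for the GPU driver utility) are assumptions about a typical CUDA setup; README remains the authority on what is actually required.

```shell
#!/bin/sh
# Hypothetical pre-build check; not shipped with CONTEXT.

# have TOOL: succeed if TOOL can be found on PATH.
have() {
  command -v "$1" >/dev/null 2>&1
}

# report TOOL: print one line saying whether TOOL was found.
report() {
  if have "$1"; then
    echo "$1: found"
  else
    echo "$1: missing"
  fi
}

# Assumed tool names for a typical CUDA installation.
for tool in make nvcc nvidia-smi; do
  report "$tool"
done
```

A "missing" line only means the tool is not on PATH. It does not by itself show that CUDA is absent, since CUDA may be installed in a directory that PATH does not include.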

Download & Documentation

Getting started

  1. Download the code, extract the files, and read README.
  2. Customize makefile as needed, then go to the top directory and run make to build the executables.
  3. To confirm the installation, go to examples/ and run ./sample.sh, which trains and tests a network on a small data set.
    (See README for installation troubleshooting.)
  4. Read Section 1 (Overview) of the User Guide to get a general idea of the tool.
  5. Try some of the shell scripts in examples/. Section 1.6 of the User Guide has a table of the scripts.
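
Steps 2 and 3 can be sketched as a small shell function. This is an illustration under stated assumptions, not a script from the distribution: the function name `build_and_check` and its directory argument are hypothetical, and only `make`, `examples/`, and `./sample.sh` come from the steps above.

```shell
#!/bin/sh
# Sketch of steps 2 and 3. The argument is the directory where the
# archive was extracted (a hypothetical path chosen by the user).
build_and_check() {
  top_dir=$1
  # Run in a subshell so the caller's working directory is unchanged.
  (
    cd "$top_dir" || exit 1   # step 2: go to the top directory
    make || exit 1            # step 2: build (customize makefile first)
    cd examples || exit 1     # step 3: go to examples/
    ./sample.sh               # step 3: train and test on small data
  )
}

# Usage (the path is an assumption):
#   build_and_check "$HOME/context"
```

The function returns a non-zero status as soon as any step fails, so a failed build is not followed by an attempt to run the sample script.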

Data Source

The data files in the code/data archives were derived from the Large Movie Review Dataset (IMDB) [MDPHN11] and Amazon reviews [ML13].

License

This program is free software issued under the GNU General Public License V3.

References

[JZ17] Rie Johnson and Tong Zhang. Deep pyramid convolutional neural networks for text categorization. ACL 2017.
[JZ16b] Rie Johnson and Tong Zhang. Convolutional neural networks for text categorization: shallow word-level vs. deep character-level. arXiv:1609.00718, 2016.
[JZ16a] Rie Johnson and Tong Zhang. Supervised and semi-supervised text categorization using LSTM for region embeddings. ICML 2016.
[JZ15b] Rie Johnson and Tong Zhang. Semi-supervised convolutional neural networks for text categorization via region embedding. NIPS 2015.
[JZ15a] Rie Johnson and Tong Zhang. Effective use of word order for text categorization with convolutional neural networks. NAACL-HLT 2015.
[ML13] Julian McAuley and Jure Leskovec. Hidden factors and hidden topics: understanding rating dimensions with review text. RecSys 2013.
[MDPHN11] Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts. Learning word vectors for sentiment analysis. ACL 2011.