Resource files used by example shell scripts of ConText v4


Some of the example shell scripts for semi-supervised learning included in ConText use files that are not part of the package because of their size. The shell scripts download these files automatically as needed. If it is more convenient to download them in advance, they can be downloaded manually from this page.

Instructions

  1. Download the file you need by clicking one of the links below.
  2. Set the current directory to examples/ and extract the archive with tar -xvf. For example:
         cd examples
         tar -xvf imdb-unlab.txt.tok.tar.gz

Doing so places the resource files in examples/for-semi. The example shell scripts do not attempt a download if they find the required files in examples/for-semi.
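
The skip-if-present behavior described above can be sketched in a few lines of shell. This is only an illustration, not the logic the example scripts actually use: the function name extract_if_missing and its two arguments are hypothetical, and the archive is assumed to have been downloaded into the current directory already.

```shell
# Hypothetical helper (not part of ConText): extract an archive only when
# a file that the archive is expected to provide is not already present.
extract_if_missing() {
  archive=$1    # archive already downloaded into the current directory
  expected=$2   # path of a file the archive should provide once extracted
  if [ -e "$expected" ]; then
    # The resource is already in place, so there is nothing to do.
    echo "found $expected; skipping"
    return 0
  fi
  # Same command as in the instructions above.
  tar -xvf "$archive"
}
```

Run from examples/, a call would pass the archive name (for example imdb-unlab.txt.tok.tar.gz) and the path of one file that you expect to appear after extraction. Calling it a second time prints the "skipping" message and leaves the existing files untouched.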


Unlabeled tokenized text files


Optional resource files

Some of the example shell scripts use the results of training performed by another example shell script (e.g., unsupervised embedding training). For convenience, those training results are provided here. As noted above, the example shell scripts download and decompress them as needed, so they should be downloaded from this page only if doing so is more convenient. Please follow the instructions above.

NOTE1: Use of the files below is optional. You can generate these files yourself using the example shell scripts.
NOTE2: The files are in the little-endian format (Intel convention) and cannot be used on big-endian systems (Motorola convention).
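
If you are unsure which byte order your machine uses, a quick check along the following lines may help. It is a minimal sketch that relies only on the standard printf and od utilities; the function name byte_order is made up for this example and is not part of ConText.

```shell
# Hypothetical helper (not part of ConText): report the host byte order.
byte_order() {
  # Emit the two bytes 0x01 0x00 and let od read them as one unsigned
  # 16-bit integer in host order: 1 on little-endian, 256 on big-endian.
  v=$(printf '\001\000' | od -An -tu2 | tr -d ' \n')
  if [ "$v" = "1" ]; then
    echo little
  else
    echo big
  fi
}
```

Calling byte_order prints "little" on machines that can read the provided files directly. If it prints "big", the files can instead be generated locally with the example shell scripts, as NOTE1 says.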


Optional resource files for IMDB

Optional resource files for Elec

Optional resource files for RCV1