Concurrent Inference of Topic Models and Distributed Vector Representations

Debakar Shamanta, Sheikh Motahar Naim, Parang Saraf, Naren Ramakrishnan, and M. Shahriar Hossain


The paper is available here.

 

 

This page contains supplementary materials for the paper “Concurrent Inference of Topic Models and Distributed Vector Representations”, published in the proceedings of ECML PKDD 2015. The datasets used in the paper are available at the following links.

1.      Synthetic Dataset: Download

2.      20 Newsgroups: Download
Original source:
http://qwone.com/~jason/20Newsgroups/

3.      Reuters R8: Download
Original source:
http://web.ist.utl.pt/~acardoso/datasets/

4.      Reuters R52: Download
Original source:
http://web.ist.utl.pt/~acardoso/datasets/

5.      WebKB: Download
Original source:
http://web.ist.utl.pt/~acardoso/datasets/

6.      PubMed:
Because the dataset is large, we provide a Python script to prepare it.
Click here to download the Python script.

7.      Spanish-news:
This dataset was collected as part of the EMBERS project directed by Dr. Naren Ramakrishnan. Please contact Dr. Ramakrishnan (http://people.cs.vt.edu/naren/) to discuss this dataset.

 

The source code for a prototype of the model proposed in the paper can be downloaded from this link: http://dal.cs.utep.edu/projects/tvec/codes.zip.