Distributed Representations

The Discovery Analytics Lab is investigating distributed representations of unstructured collections as well as their underlying topics. Although topic modeling techniques have been widely used to uncover dominant themes hidden inside an unstructured document collection through the use of probabilistic analysis of word distributions, many deep learning approaches have been adopted recently. Discovery Analytics Lab is focusing on neural network based architectures that produce distributed representations of topics to capture topical themes in a dataset and concurrently generates distributed representation of words and documents that directly use neighboring words for training. The networks, for topic modeling and generation of distributed representations, are trained concurrently in a cascaded style with better runtime without sacrificing the quality of the topics.

Some empirical studies reported in one of our papers show that the distributed representations of topics represent intuitive themes using smaller dimensions than conventional topic modeling approaches.