Probabilistic latent semantic analysis (PLSA) is a topic model that extracts topics from a text corpus. Historically, PLSA was a predecessor of LDA. However, recent research shows that modifications of PLSA sometimes perform better than LDA. Furthermore, the most recent paper by the same authors shows that there is a clear way to extend PLSA to LDA and beyond.
We should implement distributed versions of PLSA. In addition, it should be easy to add user-defined regularizers or combinations of them. We will implement regularizers that allow us to
- extract sparse topics
- extract human interpretable topics
- perform semi-supervised training
- sort out non-topic-specific terms.
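To make the "pluggable regularizer" requirement concrete, here is a minimal single-machine sketch in plain Python. It is not the proposed distributed implementation. It assumes the additive regularization scheme from the tutorial cited below, where each regularizer contributes a correction that is added to the expected word-topic counts before they are clipped at zero and normalized. All names (`fit_plsa`, `sparsing`, `normalize`) and the regularizer signature are hypothetical, chosen only for illustration.

```python
import random


def sparsing(beta):
    """Hypothetical sparsing regularizer: subtracts a constant beta from every
    word-topic count, so small counts are clipped to zero in the M-step."""
    def correction(phi):
        return [[-beta for _ in row] for row in phi]
    return correction


def normalize(values):
    """Clip negatives to zero and rescale to sum to one (uniform if all zero)."""
    clipped = [max(v, 0.0) for v in values]
    total = sum(clipped)
    if total == 0.0:
        return [1.0 / len(values)] * len(values)
    return [v / total for v in clipped]


def fit_plsa(docs, vocab_size, num_topics, regularizers=(), iterations=50, seed=0):
    """docs is a list of {word_id: count} dicts.
    Returns (phi, theta) with phi[t][w] = p(w | t) and theta[d][t] = p(t | d)."""
    rng = random.Random(seed)
    phi = [normalize([rng.random() + 0.1 for _ in range(vocab_size)])
           for _ in range(num_topics)]
    theta = [normalize([rng.random() + 0.1 for _ in range(num_topics)])
             for _ in docs]
    for _ in range(iterations):
        n_wt = [[0.0] * vocab_size for _ in range(num_topics)]
        n_td = [[0.0] * num_topics for _ in docs]
        # E-step: split each word count across topics in proportion to
        # p(w | t) * p(t | d) and accumulate the expected counts.
        for d, doc in enumerate(docs):
            for w, count in doc.items():
                weights = [phi[t][w] * theta[d][t] for t in range(num_topics)]
                z = sum(weights)
                if z == 0.0:
                    continue  # no topic can currently explain this word
                for t in range(num_topics):
                    share = count * weights[t] / z
                    n_wt[t][w] += share
                    n_td[d][t] += share
        # Regularizers combine additively: each one adds its own correction
        # to the expected word-topic counts.
        for regularizer in regularizers:
            corr = regularizer(phi)
            for t in range(num_topics):
                for w in range(vocab_size):
                    n_wt[t][w] += corr[t][w]
        # M-step: clip at zero and normalize.
        phi = [normalize(row) for row in n_wt]
        theta = [normalize(row) for row in n_td]
    return phi, theta


# Toy corpus: two documents use words 0-1, two use words 2-3.
docs = [{0: 5, 1: 4}, {0: 4, 1: 5}, {2: 5, 3: 4}, {2: 4, 3: 5}]
phi, theta = fit_plsa(docs, vocab_size=4, num_topics=2,
                      regularizers=[sparsing(0.5)])
```

With an empty `regularizers` list the loop reduces to plain PLSA EM. In this sketch a regularizer only sees `phi` and only corrects the word-topic counts; a real design would probably also pass `theta` and allow corrections to the document-topic counts, for example to support semi-supervised training.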
Potapenko, K. Vorontsov. 2013. Robust PLSA performs better than LDA. In Proceedings of ECIR'13.
Vorontsov, Potapenko. Tutorial on Probabilistic Topic Modeling: Additive Regularization for Stochastic Matrix Factorization. http://www.machinelearning.ru/wiki/images/1/1f/Voron14aist.pdf