Description
LDA currently supports prediction only on the training set: you can call logLikelihood and topicDistributions to get that information for the training data. It should support the same functionality for new (test) documents.
This will require inference, but it should be able to reuse the same code with a few modifications to keep the inferred topics fixed.
Note: The API for these methods is already in the code but is commented out.
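To illustrate what "inference with the topics held fixed" could look like, here is a minimal standalone sketch in plain Python. It is not the Spark implementation and does not use the Spark API. The function names (infer_topic_distribution, log_likelihood), the alpha smoothing value, and the toy topic matrix are all assumptions made for illustration. It uses a simple EM-style fold-in that updates only the per-document topic mixture, which may differ from whatever inference the optimizer in the real code uses.

```python
import math

def infer_topic_distribution(word_counts, topics, alpha=0.1, iterations=50):
    """Estimate a new document's topic mixture while keeping `topics` fixed.

    word_counts: dict mapping word index -> count in the new document.
    topics: list of K lists, where topics[k][w] = P(word w | topic k).
    alpha: smoothing added to each topic's expected count (an assumption).
    """
    k = len(topics)
    theta = [1.0 / k] * k  # start from a uniform mixture
    for _ in range(iterations):
        expected = [alpha] * k
        for w, n in word_counts.items():
            # Responsibility of each topic for word w under the current mixture.
            weights = [theta[j] * topics[j][w] for j in range(k)]
            total = sum(weights)
            if total == 0.0:
                continue  # word has zero probability under every topic
            for j in range(k):
                expected[j] += n * weights[j] / total
        # Only the document's mixture is updated; `topics` is never modified.
        norm = sum(expected)
        theta = [e / norm for e in expected]
    return theta

def log_likelihood(word_counts, topics, theta):
    """Log-likelihood of the document's words given fixed topics and a mixture."""
    ll = 0.0
    for w, n in word_counts.items():
        p = sum(theta[j] * topics[j][w] for j in range(len(topics)))
        ll += n * math.log(p) if p > 0 else float("-inf")
    return ll

# Toy example: two topics over a three-word vocabulary (hypothetical values).
topics = [[0.7, 0.2, 0.1],
          [0.1, 0.2, 0.7]]
new_doc = {0: 5, 1: 1}  # five occurrences of word 0, one of word 1
theta = infer_topic_distribution(new_doc, topics)
print(theta, log_likelihood(new_doc, topics, theta))
```

The point of the sketch is that the per-document update can share code with training: the only change is that the topic-word probabilities are read but never written.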
Issue Links
- is depended upon by: SPARK-16786 LDA topic distributions for new documents in PySpark (Closed)
- is related to: SPARK-8696 Streaming API for Online LDA (Resolved)
- is required by: SPARK-5572 LDA improvement listing (Resolved)
- relates to: SPARK-6793 Implement perplexity for LDA (Resolved)
- links to