Details
- Improvement
- Status: Resolved
- Major
- Resolution: Incomplete
- 1.3.0
- None
Description
Latent Dirichlet Allocation (LDA) currently operates only on vectors of word counts. It should also support training and prediction using text (Strings).
This plan is sketched in the original LDA design doc.
There should be:
- A runWithText() method that takes an RDD of collections of Strings (bags of words). It will also index the terms and compute a dictionary.
- A dictionary parameter for when LDA is run with word count vectors.
- Prediction/feedback methods that return Strings (such as describeTopicsAsStrings, which is currently commented out in LDA).
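To make the request concrete, here is a minimal sketch of the bookkeeping the three items imply: indexing terms, building the dictionary, converting bags of words to count vectors, and mapping topic term indices back to Strings. It is plain Python on in-memory lists, not Spark code, and the names build_dictionary, to_count_vectors and describe_topic_as_strings are hypothetical helpers chosen for illustration; they are not part of the LDA API.

```python
from collections import Counter


def build_dictionary(docs):
    """Assign each distinct term an integer index, in order of first appearance."""
    term_to_index = {}
    for doc in docs:
        for term in doc:
            if term not in term_to_index:
                term_to_index[term] = len(term_to_index)
    # dicts preserve insertion order, so position i holds the term with index i
    dictionary = list(term_to_index)
    return term_to_index, dictionary


def to_count_vectors(docs, term_to_index):
    """Convert each bag of words into a sparse {term index: count} vector."""
    return [dict(Counter(term_to_index[term] for term in doc)) for doc in docs]


def describe_topic_as_strings(topic_weights, dictionary, max_terms=3):
    """Map the highest-weighted term indices of one topic back to Strings."""
    ranked = sorted(range(len(topic_weights)),
                    key=lambda i: topic_weights[i], reverse=True)
    return [(dictionary[i], topic_weights[i]) for i in ranked[:max_terms]]


# Hypothetical toy corpus: two documents given as bags of words.
docs = [["spark", "lda", "spark"], ["lda", "topic"]]
term_to_index, dictionary = build_dictionary(docs)
vectors = to_count_vectors(docs, term_to_index)
```

The count vectors are what LDA already consumes today, and the dictionary is what a caller would pass back in (second item above) so that topic descriptions can be reported as Strings instead of term indices.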
Issue Links
- is required by
  - SPARK-5572 LDA improvement listing (Resolved)
- requires
  - SPARK-8169 Add StopWordsRemover as a transformer (Resolved)
  - SPARK-9578 Stemmer feature transformer (Resolved)