Description
The old LDA is unmaintained and unsupported. We already (since 0.6) have a newer, faster version in the o.a.m.clustering.lda.cvb package, which I'm actively working on and using in production at Twitter. We should delete the old o.a.m.clustering.lda codebase.
Normally, I'd say that we should at the same time promote o.a.m.clustering.lda.cvb up a package level, but that would cause some serious merge conflicts on my GitHub branch (with updates/improvements/new features targeted for 0.8). Instead, we can get users onto the new code by simply changing driver.classes.props so that "lda" points to CVB0Driver as its main().
One thing that goes away entirely is the LDAPrintTopics class, but it can be replaced by running VectorDumper with the -sort option on the model files, which is more standard anyway.
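A minimal sketch of what that driver.classes.props change could look like. It assumes the file's entries take the form "fully qualified class = short name : description", that the old driver class is named LDADriver, and that o.a.m expands to org.apache.mahout; the description strings are illustrative, not copied from the actual file.

```properties
# Before (assumed old entry): the short name "lda" launches the old driver.
# org.apache.mahout.clustering.lda.LDADriver = lda : Latent Dirichlet Allocation

# After: "lda" launches the CVB implementation instead.
# The description text is illustrative only.
org.apache.mahout.clustering.lda.cvb.CVB0Driver = lda : LDA via Collapsed Variational Bayes
```

With an entry like this, users who invoke the "lda" short name pick up CVB0Driver without the package itself having to move.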
Issue Links
- is related to: MAHOUT-1024 cluster_reuters.sh still relies on old (now removed) lda implementation (Closed)
- supersedes: MAHOUT-399 LDA on Mahout 0.3 does not converge to correct solution for overlapping pyramids toy problem (Closed)