MAHOUT-1009

Remove old LDA implementation from codebase

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.7
    • Fix Version/s: 0.7
    • Component/s: classic
    • Labels: None

    Description

      The old LDA is unmaintained and unsupported. We already (since 0.6) have a newer, faster version in the o.a.m.clustering.lda.cvb package, which I'm actively working on and using in production at Twitter. We should delete the old o.a.m.clustering.lda codebase.

      Normally, I'd say that we should at the same time promote o.a.m.clustering.lda.cvb up a package level, but that would cause some serious merge conflicts on my GitHub branch (with updates, improvements, and new features targeted for 0.8). Instead, we can get users onto the new code simply by changing driver.classes.props so that "lda" points to CVB0Driver as the main().
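
      As a rough sketch of that change, assuming driver.classes.props uses a "fully qualified class = short name : description" layout and that o.a.m expands to org.apache.mahout, the entry might look like the following. The description text after the colon is illustrative only, not the wording from the actual file.

      ```
      # Assumed entry format: <driver class> = <short name> : <description>
      # Point the "lda" short name at the CVB implementation's driver.
      org.apache.mahout.clustering.lda.cvb.CVB0Driver = lda : LDA via Collapsed Variational Bayes
      ```

      With an entry like this, anyone invoking the "lda" short name would be routed to CVB0Driver without the package itself having to move.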

      One thing that goes away entirely is the LDAPrintTopics class, but it is replaced by simply running VectorDumper with the -sort option on the model files, which is more standard anyway.

            People

              Assignee: ssc Sebastian Schelter
              Reporter: jake.mannix Jake Mannix
