Mahout / MAHOUT-682

The LDA output does not include the topic-probability distribution per document (p(z|d)). It outputs only the topics and corresponding words.

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.4
    • Fix Version/s: 0.5
    • Component/s: classic
    • Labels: None

    Description

      The current LDA implementation outputs only the topics and their words. Many applications need the p(z|d) values for each document so that this vector can serve as a reduced representation of the document (dimensionality reduction). We need to introduce a new key that keeps track of the gamma values for each document (as obtained from the document.infer() method) and writes them to the output stream. PrintLDATopics should then output these values per document id. Outputting the probability of each word within a topic would also make the output more meaningful.
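
      As a rough illustration of what the requested per-document output could contain, the sketch below turns one document's gamma vector into p(z|d). It assumes gamma holds the variational Dirichlet parameters for a single document and that the expected topic proportions are gamma_k divided by the sum of all gamma values. The class name TopicDistribution and the method normalizeGamma are hypothetical and not part of Mahout.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Mahout: converts a per-document gamma
// vector into a topic distribution p(z|d).
public final class TopicDistribution {
    private TopicDistribution() {}

    // Assumption: gamma[k] is the variational Dirichlet parameter for topic k,
    // so the expected proportion of topic k is gamma[k] / sum(gamma).
    static double[] normalizeGamma(double[] gamma) {
        double sum = 0.0;
        for (double g : gamma) {
            if (Double.isNaN(g) || g < 0.0) {
                throw new IllegalArgumentException("gamma values must be non-negative");
            }
            sum += g;
        }
        if (gamma.length == 0 || sum <= 0.0) {
            throw new IllegalArgumentException("gamma must have a positive sum");
        }
        double[] pTopicGivenDoc = new double[gamma.length];
        for (int k = 0; k < gamma.length; k++) {
            pTopicGivenDoc[k] = gamma[k] / sum;
        }
        return pTopicGivenDoc;
    }

    public static void main(String[] args) {
        // Made-up gamma values for a single document with three topics.
        double[] gamma = {2.0, 1.0, 1.0};
        // Prints: doc-0	[0.5, 0.25, 0.25]
        System.out.println("doc-0\t" + Arrays.toString(normalizeGamma(gamma)));
    }
}
```

      Each resulting vector has one entry per topic and sums to 1, so it can be used directly as the low-dimensional document representation described above.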

            People

              Assignee: Jake Mannix (jake.mannix)
              Reporter: Himanshu Gahlot (hgahlot)
              Votes: 0
              Watchers: 2
