[MAHOUT-682] The LDA output does not include the topic-probability distribution per document (p(z|d)). It outputs only the topics and corresponding words. - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 0.4
Fix Version/s: 0.5
Component/s: classic
Labels: None

Description

The current LDA implementation outputs only the topics and their words. Many applications need the p(z|d) values for each document so that this vector can serve as a reduced representation of the document (dimensionality reduction). We need to introduce a new key that keeps track of the gamma values for each document (as obtained from the document.infer() method) and writes them to the output stream. PrintLDATopics should then output these values per document ID. Outputting the probabilities of the words in each topic would also make the output more meaningful.
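To illustrate the requested output, here is a minimal sketch of how per-document gamma values could be turned into p(z|d). It assumes the gamma values are the variational Dirichlet parameters for one document, in which case the expected topic proportions are the gamma values divided by their sum. The class and method names (`TopicDistribution`, `normalize`) are hypothetical and are not part of Mahout's API.

```java
import java.util.Arrays;

/** Hypothetical helper, not Mahout code: converts gamma values into p(z|d). */
public class TopicDistribution {

    /**
     * Normalizes the per-document gamma vector so that it sums to 1.
     * Assumption: gamma[k] is the variational Dirichlet parameter for topic k.
     */
    public static double[] normalize(double[] gamma) {
        double sum = 0.0;
        for (double g : gamma) {
            if (Double.isNaN(g) || g < 0.0) {
                throw new IllegalArgumentException("gamma values must be non-negative");
            }
            sum += g;
        }
        if (sum <= 0.0) {
            throw new IllegalArgumentException("gamma values must not all be zero");
        }
        double[] topicProbabilities = new double[gamma.length];
        for (int k = 0; k < gamma.length; k++) {
            topicProbabilities[k] = gamma[k] / sum;
        }
        return topicProbabilities;
    }

    public static void main(String[] args) {
        // Made-up gamma values for a single document with three topics.
        double[] gamma = {2.0, 6.0, 12.0};
        System.out.println(Arrays.toString(normalize(gamma)));
    }
}
```

The resulting vector has one entry per topic, so a document can be represented by as many numbers as there are topics rather than by its full term vector.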

Attachments

ASF.LICENSE.NOT.GRANTED--MAHOUT-458.patch, 27/Apr/11 22:45, 38 kB, Jake Mannix
ASF.LICENSE.NOT.GRANTED--MAHOUT-458.patch, 27/Apr/11 22:45, 12 kB, Jake Mannix

Issue Links

is a clone of

MAHOUT-458 The LDA output does not include the topic-probability distribution per document (p(z|d)). It outputs only the topics and corresponding words.

Closed

People

Assignee: Jake Mannix

Reporter: Himanshu Gahlot

Votes: 0

Watchers: 2

Dates

Created: 27/Apr/11 22:45

Updated: 31/Jan/24 22:14

Resolved: 23/May/11 14:36