[SPARK-9245] DistributedLDAModel predict top topic per doc-term instance - ASF JIRA

Details

Type: New Feature
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.5.0
Component/s: MLlib
Labels: None

Description

For each (document, term) pair, return the top topic. Because the instances of a (doc, term) pair within a document (a.k.a. "tokens") are exchangeable, we should provide one estimate per document-term pair rather than one per token.

Synopsis for DistributedLDAModel:

/** @return RDD of (doc ID, vector of top topic index for each term) */
def topTopicAssignments: RDD[(Long, Vector)]

Note that returning a Vector lets us use a sparse encoding, and the type is Java-friendly.
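
As a rough illustration of "top topic per (doc, term) pair", here is a minimal plain-Scala sketch with no Spark dependency. The ticket does not say how the top topic is scored, so the rule used here (pick the topic maximizing document-topic weight times topic-term weight) is an assumption, and `TopTopicSketch`, `argmax`, and `topTopics` are hypothetical names that are not part of MLlib.

```scala
// Hypothetical helper, NOT part of Spark MLlib.
// Assumption: the "top topic" for term w in a document is the topic t
// that maximizes docTopic(t) * topicTerm(t)(w).
object TopTopicSketch {

  /** Index of the largest value in a non-empty array. */
  def argmax(xs: Array[Double]): Int = xs.indices.maxBy(i => xs(i))

  /**
   * @param docTopic  topic weights for one document (length = number of topics)
   * @param topicTerm topicTerm(t)(w) = weight of term w under topic t
   * @param termIds   distinct term indices occurring in the document
   * @return one (term index, top topic index) pair per distinct term,
   *         i.e. one estimate per document-term pair, not per token
   */
  def topTopics(
      docTopic: Array[Double],
      topicTerm: Array[Array[Double]],
      termIds: Seq[Int]): Seq[(Int, Int)] =
    termIds.map { w =>
      // Score every topic for this term, then keep the best one.
      val scores = docTopic.indices.map(t => docTopic(t) * topicTerm(t)(w)).toArray
      (w, argmax(scores))
    }
}

// Usage: two topics, three vocabulary terms; the document contains terms 0 and 1.
val docTopic = Array(0.7, 0.3)
val topicTerm = Array(
  Array(0.5, 0.1, 0.4), // topic 0
  Array(0.1, 0.8, 0.1)  // topic 1
)
val assignments = TopTopicSketch.topTopics(docTopic, topicTerm, Seq(0, 1))
// Term 0 scores 0.35 vs 0.03 (topic 0 wins); term 1 scores 0.07 vs 0.24 (topic 1 wins).
```

Each distinct term yields one result regardless of how many times it occurs in the document, which matches the exchangeability argument in the description.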

Issue Links

is required by: SPARK-5572 LDA improvement listing (Resolved)

links to: [Github] Pull Request #8329 (jkbradley)

People

Assignee: Joseph K. Bradley

Reporter: Joseph K. Bradley

Shepherd: Joseph K. Bradley

Votes: 0

Watchers: 5

Dates

Created: 22/Jul/15 06:32

Updated: 20/Aug/15 22:09

Resolved: 20/Aug/15 22:01

Time Tracking

Estimated: 48h

Remaining: 48h

Logged: Not Specified