[SPARK-5564] Support sparse LDA solutions - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Incomplete
Affects Version/s: 1.3.0
Fix Version/s: None
Component/s: MLlib
Labels:
- bulk-closed

Description

Latent Dirichlet Allocation (LDA) currently requires that the priors' concentration parameters be > 1.0. It should support any value > 0.0; values below 1.0 should encourage sparser topics (phi) and sparser document-topic distributions (theta).
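
To illustrate why small concentration values favor sparsity, here is a minimal standalone sketch (not Spark code) that draws from a symmetric Dirichlet prior using only the Python standard library. The helper names `sample_dirichlet` and `mean_largest_weight`, and the choices of 10 components and 2000 samples, are assumptions made for illustration.

```python
import random


def sample_dirichlet(alpha, k, rng):
    """Draw one sample from a symmetric Dirichlet(alpha) over k components."""
    while True:
        # Normalizing independent Gamma(alpha, 1) draws yields a Dirichlet sample.
        draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
        total = sum(draws)
        if total > 0.0:  # guard against every draw underflowing to zero
            return [d / total for d in draws]


def mean_largest_weight(alpha, k=10, n_samples=2000, seed=0):
    """Average size of the largest component; closer to 1.0 means sparser draws."""
    rng = random.Random(seed)
    largest = (max(sample_dirichlet(alpha, k, rng)) for _ in range(n_samples))
    return sum(largest) / n_samples


if __name__ == "__main__":
    for alpha in (0.1, 1.0, 5.0):
        print(alpha, round(mean_largest_weight(alpha), 3))
```

With a concentration below 1.0 most of the probability mass tends to land on a few components, while values well above 1.0 pull the draws toward the uniform distribution.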

For EM, this will require adding a projection to the M-step, as in: Vorontsov and Potapenko. "Tutorial on Probabilistic Topic Modeling: Additive Regularization for Stochastic Matrix Factorization." 2014.
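
The projection can be sketched roughly as follows. This is a minimal illustration in plain Python, not Spark's implementation: it assumes the MAP-style update in which each expected count is shifted by (concentration - 1), negative results are clipped to zero, and the vector is renormalized. The function name `projected_m_step` and the uniform fallback when every entry is clipped are assumptions of this sketch.

```python
def projected_m_step(expected_counts, concentration):
    """Update one probability vector (e.g. one topic's word distribution).

    expected_counts: non-negative expected counts from the E-step.
    concentration:   symmetric Dirichlet concentration, any value > 0.
    """
    if concentration <= 0.0:
        raise ValueError("concentration must be > 0")
    # With concentration < 1 the shift (concentration - 1) is negative, so
    # small counts can go below zero; the projection clips them to zero.
    shifted = [max(n + concentration - 1.0, 0.0) for n in expected_counts]
    total = sum(shifted)
    if total == 0.0:
        # Assumption of this sketch: fall back to uniform if everything was clipped.
        return [1.0 / len(expected_counts)] * len(expected_counts)
    return [s / total for s in shifted]


if __name__ == "__main__":
    # Entries with counts 0.3 and 0.0 are clipped to exactly zero.
    print(projected_m_step([5.0, 0.3, 0.0, 2.0], 0.5))  # [0.75, 0.0, 0.0, 0.25]
```

When the concentration is greater than 1.0 the shift is positive and no clipping occurs, which is consistent with the current restriction needing no projection.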

Issue Links

is required by: SPARK-5572 LDA improvement listing (Resolved)

People

Assignee: Unassigned
Reporter: Joseph K. Bradley
Votes: 2
Watchers: 9

Dates

Created: 03/Feb/15 18:51
Updated: 21/May/19 04:16
Resolved: 21/May/19 04:16