  1. Spark
  2. SPARK-9134

LDA Asymmetric topic-word prior

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: MLlib

    Description

      SPARK-8536 generalizes LDA to asymmetric document-topic priors, which Wallach et al. suggest offer greater utility than symmetric priors.

      However, Stanford NLP also permits asymmetric topic-word priors. We should not support manually specifying the entire matrix (which has numTopics * vocabSize entries); rather, we should follow Stanford NLP and take a single vector of length vocabSize as a prior over words, and assume that all topics share this prior (i.e., replicate it numTopics times to get the topic-word prior matrix).
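
      As a rough illustration of the replication described above, the following is a minimal sketch in plain Python. It is not Spark's or Stanford NLP's API: the function name `replicate_word_prior` and the example values are hypothetical, and the positivity check simply reflects that Dirichlet concentration parameters must be greater than zero.

```python
def replicate_word_prior(word_prior, num_topics):
    """Build a numTopics x vocabSize topic-word prior from one shared vector.

    Hypothetical helper for illustration only; not part of any Spark API.
    """
    if num_topics <= 0:
        raise ValueError("num_topics must be positive")
    if not word_prior:
        raise ValueError("word_prior must have vocabSize > 0 entries")
    if any(v <= 0 for v in word_prior):
        raise ValueError("Dirichlet concentration parameters must be positive")
    # Copy the vector once per topic so the rows do not alias each other.
    return [list(word_prior) for _ in range(num_topics)]


# Assumed example: a 4-word vocabulary with an asymmetric prior over words.
word_prior = [0.5, 0.1, 0.1, 0.3]
topic_word_prior = replicate_word_prior(word_prior, num_topics=3)
```

      The user supplies only vocabSize values rather than numTopics * vocabSize, and every row of the resulting matrix is the same prior over words.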

      We are leaving this as a TODO; any users who need this feature should discuss it on this JIRA.

            People

              Assignee: Unassigned
              Reporter: Feynman Liang (fliang)
              Votes: 1
              Watchers: 5

              Dates

                Created:
                Updated:
                Resolved: