Spark / SPARK-953

Latent Dirichlet Allocation (LDA model)


    Details

    • Type: Story
    • Status: Resolved
    • Priority: Critical
    • Resolution: Duplicate
    • Affects Version/s: 0.7.3
    • Fix Version/s: None
    • Component/s: Examples
    • Labels:
      None

      Description

      This code learns an LDA model. It runs quickly on LDA.tbl, the simulated data set provided for the program, but it is very slow at scale. With 2.5 M documents per machine and a dictionary of 10,000 words, running on EC2 m2.4xlarge instances with 68 GB of memory each, the first five iterations took 8145, 24725, 51688, 58674, and 56850 seconds respectively. The shuffle phase is particularly slow.
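      To put the reported timings in perspective, the small Python calculation below (variable names are illustrative only) totals the five iterations and compares each one with the first. The five iterations add up to 200,082 s, roughly 55.6 hours, and iterations 3 to 5 are each roughly 6.3x to 7.2x slower than iteration 1.

```python
# Per-iteration wall-clock times (seconds) reported for five LDA iterations.
iteration_seconds = [8145, 24725, 51688, 58674, 56850]

total_seconds = sum(iteration_seconds)
total_hours = total_seconds / 3600

# Slowdown of each iteration relative to the first one.
slowdown = [t / iteration_seconds[0] for t in iteration_seconds]

print(f"total: {total_seconds} s ({total_hours:.1f} h)")
for i, (t, s) in enumerate(zip(iteration_seconds, slowdown), start=1):
    print(f"iteration {i}: {t} s, {s:.1f}x iteration 1")
```

      The growth from iteration 1 to iteration 3, followed by a plateau, is what makes the per-iteration cost worth investigating rather than only the total.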

        Attachments

          Issue Links


              People

              • Assignee:
                Unassigned
              • Reporter:
                caizhua
              • Votes:
                1
              • Watchers:
                3

                Dates

                • Created:
                • Updated:
                • Resolved: