SPARK-14322: Use treeAggregate instead of reduce in OnlineLDAOptimizer

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.3.1, 1.4.1, 1.5.2, 1.6.1, 2.0.0
    • Fix Version/s: 1.5.3, 1.6.2, 2.0.0
    • Component/s: ML, MLlib
    • Labels: None

    Description

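      OnlineLDAOptimizer uses RDD.reduce in two places where it could use treeAggregate. With reduce, every partition's result is merged on the driver, which can become a scalability bottleneck when there are many partitions or the aggregated values are large. treeAggregate can combine partial results on the executors in a multi-level tree before the final merge. This should be an easy fix.

      This is also a correctness bug: the function passed to reduce modifies its first argument in place, and that argument can be an element of the RDD itself. aggregate and treeAggregate start from a zero value that their operators are allowed to modify, so we should use one of them instead.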

      See this line: https://github.com/apache/spark/blob/f12f11e578169b47e3f8b18b299948c0670ba585/mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala#L452
      and a few lines below it.
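
A minimal sketch of the pattern described above, assuming a local Spark context and plain `Array[Double]` values in place of the optimizer's real statistics. It is not the actual patch: the names `TreeAggregateSketch`, `addInPlace`, and `stats` are hypothetical and exist only for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object TreeAggregateSketch {
  // Adds `x` into `acc` in place and returns `acc`.
  // In-place updates like this are safe as the seqOp/combOp of
  // aggregate/treeAggregate, but not as the function passed to reduce.
  def addInPlace(acc: Array[Double], x: Array[Double]): Array[Double] = {
    var i = 0
    while (i < acc.length) {
      acc(i) += x(i)
      i += 1
    }
    acc
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("treeAggregate-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val dim = 3
      // Hypothetical stand-in for per-document sufficient statistics.
      val stats = sc.parallelize(Seq(
        Array(1.0, 2.0, 3.0),
        Array(4.0, 5.0, 6.0),
        Array(7.0, 8.0, 9.0),
        Array(10.0, 11.0, 12.0)), numSlices = 4)

      // Risky alternative: stats.reduce(addInPlace) would mutate an element
      // of the RDD, which can corrupt the data if the RDD is cached.

      // Safer: start from a fresh zero value that the operators may modify.
      val sum = stats.treeAggregate(Array.fill(dim)(0.0))(addInPlace, addInPlace, depth = 2)
      println(sum.mkString(", ")) // prints "22.0, 26.0, 30.0"
    } finally {
      sc.stop()
    }
  }
}
```

The zero value is the only object the operators mutate, so the input RDD is left untouched. The `depth` parameter (default 2) controls how many levels of partial aggregation may run before the final merge.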

            People

              Assignee: yuhao yang (yuhaoyan)
              Reporter: Joseph K. Bradley (josephkb)
              Shepherd: Joseph K. Bradley
              Votes: 0
              Watchers: 2
