Spark / SPARK-14322

Use treeAggregate instead of reduce in OnlineLDAOptimizer


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.3.1, 1.4.1, 1.5.2, 1.6.1, 2.0.0
    • Fix Version/s: 1.5.3, 1.6.2, 2.0.0
    • Component/s: ML, MLlib
    • Labels:
      None

      Description

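      OnlineLDAOptimizer uses RDD.reduce in two places where it could use treeAggregate. With reduce, every partition's partial result is sent to the driver and merged there, which can become a scalability bottleneck when those results are large. treeAggregate first combines partial results in a multi-level tree on the executors. This should be an easy fix.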

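      This is also a bug: the function passed to reduce modifies its first argument in place, and with reduce that argument can be an element of the RDD itself. We should use aggregate or treeAggregate instead, which accumulate into a separate zero value.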

      See this line: https://github.com/apache/spark/blob/f12f11e578169b47e3f8b18b299948c0670ba585/mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala#L452
      and a few lines below it.
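      The mutation hazard can be reproduced without Spark. The sketch below is a local illustration only, not the change that was made in LDAOptimizer. Plain Scala collections stand in for an RDD. Seq.reduce plays the role of RDD.reduce, which, as far as we know, applies the function with reduceLeft inside each partition. foldLeft from a freshly allocated zero vector plays the role of the seqOp in aggregate or treeAggregate. The names ReduceAliasingSketch, addInPlace, sumWithReduce and sumWithFold are hypothetical. addInPlace is an assumed stand-in for an in-place += on a mutable vector or matrix.

```scala
// Local sketch (no Spark): Seq.reduce stands in for RDD.reduce, and foldLeft
// from a fresh zero vector stands in for aggregate/treeAggregate's seqOp.
object ReduceAliasingSketch {
  // Hypothetical helper: element-wise add that mutates and returns its FIRST
  // argument, like an in-place `+=` on a mutable vector.
  def addInPlace(a: Array[Double], b: Array[Double]): Array[Double] = {
    var i = 0
    while (i < a.length) {
      a(i) += b(i)
      i += 1
    }
    a
  }

  // Buggy pattern: reduce passes an input element as `a`, so the first
  // element of the input is overwritten with the running sum.
  def sumWithReduce(data: Seq[Array[Double]]): Array[Double] =
    data.reduce[Array[Double]](addInPlace)

  // Safer pattern: only the freshly allocated accumulator is mutated,
  // so the input elements are left untouched.
  def sumWithFold(data: Seq[Array[Double]], dim: Int): Array[Double] =
    data.foldLeft(Array.fill(dim)(0.0))(addInPlace)

  def main(args: Array[String]): Unit = {
    val data1 = Seq(Array(1.0, 2.0), Array(3.0, 4.0), Array(5.0, 6.0))
    val s1 = sumWithReduce(data1)
    println(s"reduce: sum=${s1.mkString(",")} first=${data1.head.mkString(",")}")

    val data2 = Seq(Array(1.0, 2.0), Array(3.0, 4.0), Array(5.0, 6.0))
    val s2 = sumWithFold(data2, 2)
    println(s"fold:   sum=${s2.mkString(",")} first=${data2.head.mkString(",")}")
  }
}
```

      Both versions compute the sum 9.0,12.0. After the reduce, however, the first input element has been overwritten with that sum, while the fold leaves it as 1.0,2.0. In Spark the analogous call would be rdd.treeAggregate(zeroValue)(seqOp, combOp). Each task starts from its own copy of zeroValue, so seqOp and combOp can update their first argument without touching the RDD's elements.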


    People

    • Assignee: yuhao yang (yuhaoyan)
    • Reporter: Joseph K. Bradley (josephkb)
    • Shepherd: Joseph K. Bradley
    • Votes: 0
    • Watchers: 2
