SPARK-11968: ALS recommend-all methods spend most of their time in GC

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.5.2, 1.6.0
    • Fix Version/s: 2.2.0
    • Component/s: ML, MLlib
    • Labels:
      None

      Description

      After adding recommendUsersForProducts and recommendProductsForUsers to ALS in spark-perf, I noticed that they take much longer than ALS itself. The monitoring page shows each 10 min task spending about 8 min in GC. That sounds fixable. Looking at the implementation, there is clearly an opportunity to avoid extra allocations: https://github.com/apache/spark/blob/e6dd237463d2de8c506f0735dfdb3f43e8122513/mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala#L283

      CC: Xiangrui Meng
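
      To illustrate the kind of allocation pressure described above, here is a minimal Python sketch. It is not the Spark implementation and not necessarily the fix that was merged. It assumes a toy setting where a user's factor vector is scored against every product and only the top k results are needed. The names dot, top_k_naive and top_k_bounded are hypothetical. The naive version keeps one scored entry per product alive until the sort finishes; the bounded version retains at most k entries in a min-heap at any time, which is one common way to reduce the number of live objects a collector has to track. Memory behavior in Python differs from the JVM, so this shows the idea only.

```python
import heapq
from typing import List, Sequence, Tuple

Product = Tuple[int, Sequence[float]]  # (product id, factor vector)


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def top_k_naive(user: Sequence[float], products: List[Product], k: int) -> List[Tuple[float, int]]:
    """Score every product, keep all scored entries, then sort and slice."""
    scored = [(dot(user, factors), pid) for pid, factors in products]
    scored.sort(reverse=True)
    return scored[:max(k, 0)]


def top_k_bounded(user: Sequence[float], products: List[Product], k: int) -> List[Tuple[float, int]]:
    """Score every product but retain at most k entries in a min-heap."""
    if k <= 0:
        return []
    heap: List[Tuple[float, int]] = []  # smallest retained entry sits at heap[0]
    for pid, factors in products:
        entry = (dot(user, factors), pid)
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif entry > heap[0]:
            # Replace the current smallest entry; heap size stays at k.
            heapq.heapreplace(heap, entry)
    return sorted(heap, reverse=True)
```

      Both functions order ties the same way (by score, then by product id), so they return identical results; the difference is only in how many intermediate entries are held at once.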

              People

              • Assignee: Peng Meng (peng.meng@intel.com)
              • Reporter: Joseph K. Bradley (josephkb)
              • Votes: 1
              • Watchers: 13
