Mahout / MAHOUT-1707

Spark-itemsimilarity uses too much memory

      Description

      java.lang.OutOfMemoryError: Java heap space

      The code has an unnecessary .collect(), which forces all interaction data into the memory of the client/driver. Because the data lands on the driver, increasing the executor memory will not help.

      To work around the problem, remove this line and rebuild Mahout:
      https://github.com/apache/mahout/blob/mahout-0.10.x/spark/src/main/scala/org/apache/mahout/drivers/TextDelimitedReaderWriter.scala#L157

      The errant line reads:

      interactions.collect()

      This pulls all of the user action data into the driver's memory at once, which can exhaust the heap. Removing the call should let Spark keep the data distributed and manage memory itself.
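
      To illustrate why the call is costly, here is a minimal sketch of the pattern. It is not the Mahout reader code. The object name CollectSketch, the parse helper, and the "user,item" line format are all hypothetical, and it assumes spark-core is on the classpath and a local master is acceptable. Only standard RDD calls are used.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object CollectSketch {
  // Hypothetical parser for "user,item" lines; not Mahout's actual reader.
  def parse(lines: RDD[String]): RDD[(String, String)] =
    lines
      .map(_.split(","))
      .filter(_.length == 2)              // drop malformed lines
      .map(a => (a(0).trim, a(1).trim))

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("collect-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val interactions = parse(sc.parallelize(Seq("u1,i1", "u1,i2", "u2,i1")))

      // Anti-pattern: copies every record into the driver JVM heap.
      // With real interaction logs this is where the OutOfMemoryError can occur.
      // val all: Array[(String, String)] = interactions.collect()

      // Distributed alternative: the work runs on the executors and only
      // two small counts are returned to the driver.
      val userCount = interactions.map(_._1).distinct().count()
      val itemCount = interactions.map(_._2).distinct().count()
      println(s"users=$userCount items=$itemCount")
    } finally {
      sc.stop()
    }
  }
}
```

      In this sketch the driver only ever holds the two counts, so its heap use does not grow with the size of the input, whereas the commented-out collect() would scale with the full data set.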

            People

            • Assignee: Pat Ferrel (pferrel)
            • Reporter: Pat Ferrel (pferrel)