MAHOUT-1707: Spark-itemsimilarity uses too much memory

    Details

      Description

      java.lang.OutOfMemoryError: Java heap space

      The code contains an unnecessary .collect() that pulls all interaction data into the memory of the client/driver JVM. Because the heap that overflows is the driver's, increasing executor memory will not help.

      To fix it, remove this line and rebuild Mahout:
      https://github.com/apache/mahout/blob/mahout-0.10.x/spark/src/main/scala/org/apache/mahout/drivers/TextDelimitedReaderWriter.scala#L157

      The errant line reads:

      interactions.collect()

      Calling collect() copies the full user-action dataset into memory on the driver. Removing the call should let Spark manage memory for that data itself.
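
      For readers unfamiliar with why a stray collect() matters, here is a minimal sketch of the pattern, not the Mahout code itself. It assumes a local Spark master and tab-separated "user, item" lines; the object name CollectSketch, the parse helper, and the sample data are hypothetical and exist only for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CollectSketch {
  // Hypothetical parser: turns "user<TAB>item" into a (user, item) pair.
  def parse(line: String): (String, String) = {
    val fields = line.split("\t")
    (fields(0), fields(1))
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local master, so the sketch runs without a cluster.
    val conf = new SparkConf().setAppName("collect-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Made-up sample data standing in for the interaction input.
      val lines = sc.parallelize(Seq("u1\tiphone", "u1\tipad", "u2\tnexus"))
      val interactions = lines.map(parse) // lazy: nothing is computed yet

      // Anti-pattern: this would copy every element to the driver as an Array,
      // so a large input exhausts the driver heap regardless of executor memory.
      // interactions.collect()

      // Preferred: keep working on the RDD so only a small result
      // (here a single count) is returned to the driver.
      val distinctUsers = interactions.map(_._1).distinct().count()
      println(s"distinct users: $distinctUsers")
    } finally {
      sc.stop()
    }
  }
}
```

      On the three sample lines the difference is invisible; the point is that collect() returns the whole dataset to the driver, while count() returns only a number.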

        Activity

        pferrel Pat Ferrel added a comment -

        removed bad collect.


          People

          • Assignee: pferrel Pat Ferrel
          • Reporter: pferrel Pat Ferrel
          • Votes: 0
          • Watchers: 2
