MAHOUT-1707: Spark-itemsimilarity uses too much memory


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.10.0
    • Fix Version/s: 0.10.1
    • None
    • None
    • Spark

    Description

      java.lang.OutOfMemoryError: Java heap space

      The code contains an unnecessary .collect() call, which forces all interaction data into the memory of the client (the Spark driver). Increasing executor memory will not help, because the data is held on the driver, not on the executors.

      To work around the problem, remove this line and rebuild Mahout:
      https://github.com/apache/mahout/blob/mahout-0.10.x/spark/src/main/scala/org/apache/mahout/drivers/TextDelimitedReaderWriter.scala#L157

      The errant line reads:

      interactions.collect()

      As a standalone statement its result is discarded, so its only effect is to pull all user action data into driver memory. Removing it keeps the data distributed across the executors, where Spark can manage memory.
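      A minimal sketch of the difference, assuming Spark's RDD API. The object name, the comma-delimited sample data, and the helpers parseInteractions and itemCounts are hypothetical illustrations, not Mahout's TextDelimitedReaderWriter code. The commented-out collect() line shows the pattern described above; the rest keeps the interactions as an RDD and only returns small, bounded results to the driver.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical sketch: not Mahout's actual reader/writer code.
object CollectSketch {
  // Parse "userID,itemID" lines into (user, item) pairs; this runs on the executors.
  def parseInteractions(lines: RDD[String], delimiter: String = ","): RDD[(String, String)] =
    lines.map(_.split(delimiter)).filter(_.length >= 2).map(f => (f(0), f(1)))

  // Count interactions per item without materializing the full data set on the driver.
  def itemCounts(interactions: RDD[(String, String)]): RDD[(String, Int)] =
    interactions.map { case (_, item) => (item, 1) }.reduceByKey(_ + _)

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("collect-sketch").setMaster("local[2]"))
    try {
      // Hypothetical sample input; real input would be read from files.
      val lines = sc.parallelize(Seq("u1,iphone", "u1,ipad", "u2,nexus", "u2,iphone"))
      val interactions = parseInteractions(lines).cache()

      // Anti-pattern: as a standalone statement this copies every pair into the
      // driver heap and throws the resulting Array away. Driver memory, not
      // executor memory, limits how much data it can hold.
      // interactions.collect()

      // Distributed alternative: count() returns a single Long to the driver.
      println(s"interactions: ${interactions.count()}")

      // For inspection, take(n) returns at most n elements instead of everything.
      itemCounts(interactions).take(10).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

      The design choice is to let actions that return bounded results (count, take) be the only points where data crosses back to the driver; anything proportional to the input size stays in the RDD.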


          People

            Assignee: Pat Ferrel (pferrel)
            Reporter: Pat Ferrel (pferrel)
            Votes: 0
            Watchers: 2
