  Spark / SPARK-2696

Reduce default spark.serializer.objectStreamReset

Details

    Description

      The current default value of spark.serializer.objectStreamReset is 10,000, i.e. the serializer resets its object stream only once every 10,000 objects written.
      When repartitioning a large file (e.g., 500 MB of 1 MB records) into, say, 64 partitions, the serializer can cache up to 10,000 x 1 MB x 64 = 640 GB, which causes it to run out of memory.
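      To illustrate what this parameter controls: a java.io.ObjectOutputStream keeps a reference to every object written to it (so it can emit back-references for repeated objects) until reset() is called, so those objects cannot be garbage collected in the meantime. The Java sketch below wraps an ObjectOutputStream and calls reset() after every resetInterval writes. The classes ResettingObjectStream and ResetDemo are hypothetical and only illustrate the mechanism; this is not Spark's JavaSerializer code, and treating values <= 0 as "never reset" is an assumption of the sketch.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical demo class: counts how often the stream is reset for a given interval.
class ResetDemo {
    static int countResets(int records, int resetInterval) {
        try (ResettingObjectStream s =
                 new ResettingObjectStream(new ByteArrayOutputStream(), resetInterval)) {
            for (int i = 0; i < records; i++) {
                // Small stand-in for a 1 MB record; each write adds an entry to the handle table.
                s.write(new byte[16]);
            }
            return s.resets();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("interval 100:    " + countResets(1000, 100) + " resets");
        System.out.println("interval 10,000: " + countResets(1000, 10_000) + " resets");
    }
}

// Hypothetical wrapper (not Spark's JavaSerializer) that resets the underlying
// ObjectOutputStream after every `resetInterval` objects.
class ResettingObjectStream implements AutoCloseable {
    private final ObjectOutputStream out;
    private final int resetInterval; // assumption: values <= 0 disable periodic resets
    private int counter = 0;
    private int resets = 0;

    ResettingObjectStream(OutputStream sink, int resetInterval) throws IOException {
        this.out = new ObjectOutputStream(sink);
        this.resetInterval = resetInterval;
    }

    void write(Object obj) throws IOException {
        out.writeObject(obj);
        counter++;
        if (resetInterval > 0 && counter >= resetInterval) {
            // reset() clears the handle table, so previously written objects are
            // no longer referenced by the stream and can be garbage collected.
            out.reset();
            counter = 0;
            resets++;
        }
    }

    int resets() {
        return resets;
    }

    @Override
    public void close() throws IOException {
        out.close();
    }
}
```

      With 1,000 records, an interval of 100 yields 10 resets, while an interval of 10,000 never resets, so every record written stays reachable from the stream for its whole lifetime.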

      We think 100 would be a more reasonable default value for this configuration parameter.
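      The worst-case figure in the description is simply reset interval x record size x number of output streams. A small Java sketch comparing the current default with the proposed one, using decimal units as the description does; the RetentionBound class is hypothetical and only reproduces that back-of-the-envelope bound.

```java
// Hypothetical helper reproducing the back-of-the-envelope bound from the description.
class RetentionBound {
    // Decimal units (1 MB = 10^6 bytes, 1 GB = 10^9 bytes), matching the issue's arithmetic.
    static final long MB = 1_000_000L;
    static final long GB = 1_000_000_000L;

    // Worst case: objects retained per stream before a reset x record size x number of streams.
    static long worstCaseBytes(long resetInterval, long recordBytes, long streams) {
        return resetInterval * recordBytes * streams;
    }

    public static void main(String[] args) {
        for (long interval : new long[] {10_000L, 100L}) {
            long bytes = worstCaseBytes(interval, MB, 64);
            System.out.printf("objectStreamReset=%d -> up to %.1f GB%n", interval, (double) bytes / GB);
        }
    }
}
```

      Under the same assumptions (1 MB records, 64 output partitions), a default of 100 lowers the bound from 640 GB to 6.4 GB.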


          People

            Assignee: Hossein Falaki
            Reporter: Hossein Falaki
            Votes: 0
            Watchers: 2
