[SPARK-27041] Large partition data causes out-of-memory in PySpark with Python 2.x


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version: 2.4.0
    • Fix Version: 3.0.0
    • Component: PySpark
    • Labels: None

    Description

      With a large partition, PySpark may exceed the executor memory limit and trigger an out-of-memory error under Python 2.7.
      This happens because map() is used: unlike in Python 3.x, Python 2.7's map() builds a full list, so the entire partition must be read into memory.

      The proposed fix uses itertools.imap on Python 2.7, which maps lazily, and it has been verified.
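      A minimal sketch of the idea (not the exact Spark patch): pick a lazy mapping function at import time, so that on Python 2.7 records are streamed one at a time instead of materializing the whole partition. The process_partition helper below is hypothetical, for illustration only.

      import sys
      import itertools

      if sys.version_info[0] < 3:
          # Python 2's built-in map() eagerly builds a list of all results;
          # itertools.imap is the lazy equivalent of Python 3's map().
          lazy_map = itertools.imap
      else:
          lazy_map = map

      def process_partition(records):
          # Hypothetical per-partition transform: 'records' is an iterator
          # over a (possibly very large) partition. lazy_map yields one
          # transformed record at a time instead of holding the whole
          # partition's results in memory at once.
          return lazy_map(lambda r: r * 2, records)

      # Consuming the result streams records one by one on both Python 2 and 3.
      for out in process_partition(iter(range(3))):
          print(out)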


    People

        Assignee: TigerYang414 (David Yang)
        Reporter: TigerYang414 (David Yang)
