Spark / SPARK-3554

handle large dataset in closure of PySpark


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.2.0
    • Component/s: PySpark
    • Labels: None

    Description

      Sometimes a large dataset is used in a closure and the user forgets to broadcast it, so the serialized command becomes huge.

      Py4J cannot handle large objects efficiently, so we should compress the serialized command and use a broadcast variable for it if it is still huge.
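      The approach described above can be sketched roughly as follows. This is a minimal illustration, not Spark's actual implementation: the threshold value, the `prepare_command` name, and the `broadcast_fn` callback (standing in for `SparkContext.broadcast`) are all assumptions.

      ```python
      import zlib

      # Illustrative cutoff, not Spark's actual value.
      LARGE_COMMAND_THRESHOLD = 1 << 20  # 1 MiB

      def prepare_command(serialized: bytes, broadcast_fn):
          """Compress a serialized command; if it is still large, ship it
          via a broadcast variable instead of sending it through Py4J.

          broadcast_fn is a stand-in for SparkContext.broadcast.
          Returns (payload, is_broadcast).
          """
          compressed = zlib.compress(serialized, 9)
          if len(compressed) > LARGE_COMMAND_THRESHOLD:
              # Large even after compression: distribute via broadcast so
              # Py4J never has to move the big blob itself.
              return broadcast_fn(compressed), True
          # Small enough: send the compressed bytes directly.
          return compressed, False
      ```

      On the worker side the payload would be decompressed (and, in the broadcast case, fetched from the broadcast variable first) before deserializing the command.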

      Attachments

        Activity

          People

            Assignee: davies Davies Liu
            Reporter: davies Davies Liu
            Votes: 0
            Watchers: 3

            Dates

              Created:
              Updated:
              Resolved: