SPARK-2282: PySpark crashes if too many tasks complete quickly


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.9.1, 1.0.0, 1.0.1
    • Fix Version/s: 0.9.2, 1.0.0, 1.0.1, 1.1.0
    • Component/s: PySpark
    • Labels: None

      Description

      Upon every task completion, PythonAccumulatorParam constructs a new socket to the Accumulator server running inside the pyspark daemon. This can cause a buildup of ephemeral ports held by sockets in the TIME_WAIT state, which will cause the SparkContext to crash if too many tasks complete too quickly. We ran into this bug with 17k tasks completing in 15 seconds.
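
      In code, the pattern described above looks roughly like the following sketch (a simplified illustration, not the actual Spark source; the method and parameter names are made up). Each batch of accumulator updates opens a fresh TCP connection to the Python accumulator server and closes it immediately, which is what leaves the client's ephemeral port stuck in TIME_WAIT:

      import java.io.{DataInputStream, DataOutputStream}
      import java.net.Socket

      // Illustrative sketch of the per-task-completion update path.
      def sendUpdates(serverHost: String, serverPort: Int, pickledUpdates: Array[Byte]): Unit = {
        val socket = new Socket(serverHost, serverPort)   // new ephemeral port on every call
        try {
          val out = new DataOutputStream(socket.getOutputStream)
          out.writeInt(pickledUpdates.length)
          out.write(pickledUpdates)
          out.flush()
          new DataInputStream(socket.getInputStream).readInt()   // wait for the server's ack
        } finally {
          socket.close()   // the closing side keeps the port in TIME_WAIT (typically ~60s)
        }
      }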

      This bug can be fixed outside of Spark by ensuring these properties are set (on a Linux server):
      echo "1" > /proc/sys/net/ipv4/tcp_tw_reuse
      echo "1" > /proc/sys/net/ipv4/tcp_tw_recycle

      or by adding the SO_REUSEADDR option to the Socket creation within Spark.
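
      A minimal sketch of that in-Spark change (illustrative only; serverHost and serverPort are assumed names, and the actual patch may differ) would set the option before the socket is connected:

      import java.net.{InetSocketAddress, Socket}

      // Create the socket unconnected so options can be set before connecting.
      val socket = new Socket()
      socket.setReuseAddress(true)   // SO_REUSEADDR: allow reusing ports lingering in TIME_WAIT
      socket.connect(new InetSocketAddress(serverHost, serverPort))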


            People

            • Assignee: Aaron Davidson (ilikerps)
            • Reporter: Aaron Davidson (ilikerps)
            • Votes: 0
            • Watchers: 6

              Dates

              • Created:
              • Updated:
              • Resolved: