SPARK-16925

Spark tasks which cause JVM to exit with a zero exit code may cause app to hang in Standalone mode


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.6.0, 2.0.0
    • Fix Version/s: 1.6.3, 2.0.1, 2.1.0
    • Component/s: Deploy
    • Labels: None

    Description

      If a Spark standalone cluster runs a single application, and a task in that application repeatedly fails by causing the executor JVM to exit with a zero exit code, then the application may temporarily freeze / hang.

      For example, running

              sc.parallelize(1 to 1, 1).foreachPartition { _ => System.exit(0) }
      

      on a cluster will cause all executors to die, but those executors won't be replaced unless another Spark application or worker joins or leaves the cluster. This is caused by a bug in the standalone Master: schedule() is only called on executor exit when the exit code is non-zero. I think we should always call schedule(), even on a "clean" executor shutdown, since schedule() should always be safe to call.
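
      A minimal, runnable Scala sketch of this decision is below. The object and method names are hypothetical and do not come from Master.scala; it only models when schedule() would be triggered, before and after the proposed change.

        // Hypothetical model of the Master's rescheduling decision; names are
        // illustrative and do not appear in Spark's actual Master.scala.
        object ScheduleOnExecutorExit {
          // Buggy behavior: only a non-zero exit status triggers schedule(),
          // so a task calling System.exit(0) leaves the app with no executors.
          def shouldScheduleBuggy(exitStatus: Option[Int]): Boolean =
            !exitStatus.contains(0)

          // Proposed behavior: schedule() is safe to call, so call it on every
          // executor exit, regardless of exit code.
          def shouldScheduleFixed(exitStatus: Option[Int]): Boolean = true

          def main(args: Array[String]): Unit = {
            val cleanExit = Some(0)
            println(shouldScheduleBuggy(cleanExit)) // false -> no replacements, app hangs
            println(shouldScheduleFixed(cleanExit)) // true  -> replacement executors launched
          }
        }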

      Attachments

        Activity


          People

            Assignee: Josh Rosen (joshrosen)
            Reporter: Josh Rosen (joshrosen)
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated:
              Resolved:
