SPARK-17644

Failed stage is never resubmitted due to a stage abort in another thread

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.6.0, 2.0.0
    • Fix Version/s: 2.0.1, 2.1.0
    • Component/s: Scheduler, Spark Core
    • Labels: None

    Description

      There is a race condition between handling a FetchFailed event and resubmitting failed stages.
      Suppose job1 and job2 run in different threads. If job1 fails 4 times due to FetchFailed and is aborted, job2 can no longer post ResubmitFailedStages, because failedStages in DAGScheduler is no longer empty.
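
The sequence described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Spark's DAGScheduler code: the class `ToyScheduler`, its methods, and the `skip_aborted_stages` flag are hypothetical names invented for illustration. The model assumes that a resubmit event is only scheduled when the failed-stage set is empty, and that an aborted stage is still recorded in that set. The flag shows one possible remedy (not recording a stage that has just been aborted); it is not necessarily the change that was merged for this issue.

```python
class ToyScheduler:
    """Toy model of the bookkeeping in the description; NOT Spark's DAGScheduler."""

    MAX_STAGE_FAILURES = 4  # the description says job1 is aborted after 4 fetch failures

    def __init__(self, skip_aborted_stages):
        # Hypothetical switch: True models the assumed remedy, False models the bug.
        self.skip_aborted_stages = skip_aborted_stages
        self.failed_stages = set()     # stands in for failedStages
        self.failure_counts = {}
        self.aborted = set()
        self.resubmit_pending = False  # stands in for a posted ResubmitFailedStages

    def on_fetch_failed(self, stage):
        self.failure_counts[stage] = self.failure_counts.get(stage, 0) + 1
        if self.failure_counts[stage] >= self.MAX_STAGE_FAILURES:
            # Too many failures: abort, and do not schedule a resubmit.
            self.aborted.add(stage)
            if self.skip_aborted_stages:
                return  # assumed remedy: an aborted stage is not recorded as failed
        elif not self.failed_stages:
            # Only schedule a resubmit when nothing is waiting yet (assumed rule).
            self.resubmit_pending = True
        self.failed_stages.add(stage)

    def on_resubmit_failed_stages(self):
        # Handle the pending resubmit: drain the set and return what was resubmitted.
        resubmitted = sorted(self.failed_stages)
        self.failed_stages.clear()
        self.resubmit_pending = False
        return resubmitted


def job2_gets_resubmitted(skip_aborted_stages):
    """Replay the scenario: job1 fails until aborted, then job2 hits a fetch failure."""
    scheduler = ToyScheduler(skip_aborted_stages)
    for _ in range(ToyScheduler.MAX_STAGE_FAILURES):
        scheduler.on_fetch_failed("job1-stage")
        if scheduler.resubmit_pending:
            scheduler.on_resubmit_failed_stages()
    scheduler.on_fetch_failed("job2-stage")
    return scheduler.resubmit_pending


if __name__ == "__main__":
    print("buggy model, job2 resubmitted:", job2_gets_resubmitted(False))
    print("assumed remedy, job2 resubmitted:", job2_gets_resubmitted(True))
```

In the buggy variant the aborted job1 stage stays in the set with no resubmit event pending, so the later fetch failure from job2 finds the set non-empty and never schedules its own resubmit.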

          People

            Assignee: scwf Fei Wang
            Reporter: scwf Fei Wang
