Spark / SPARK-14658

When an executor is lost, DAGScheduler may submit one stage twice even if the first running TaskSet for this stage has not finished


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 1.6.1, 2.0.0, 2.1.0, 2.2.0
    • Fix Version/s: 2.2.0
    • Component/s: Scheduler, Spark Core
    • Labels: None
    • Environment: spark1.6.1 hadoop-2.6.0-cdh5.4.2

    Description

      16/04/14 15:35:22 ERROR DAGSchedulerEventProcessLoop: DAGSchedulerEventProcessLoop failed; shutting down SparkContext
      java.lang.IllegalStateException: more than one active taskSet for stage 57: 57.2,57.1
              at org.apache.spark.scheduler.TaskSchedulerImpl.submitTasks(TaskSchedulerImpl.scala:173)
              at org.apache.spark.scheduler.DAGScheduler.submitMissingTasks(DAGScheduler.scala:1052)
              at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:921)
              at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1214)
              at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1637)
              at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
              at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
              at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
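
The exception above is raised when a new TaskSet is submitted for a stage that still has another active TaskSet. The following is a simplified model of that kind of check, not Spark's actual code: the class names, the `is_zombie` flag, and the dictionary layout are assumptions made for illustration only.

```python
class TaskSetManagerStub:
    """Hypothetical stand-in for one attempt of one stage."""

    def __init__(self, stage_id, attempt):
        self.stage_id = stage_id
        self.attempt = attempt
        # Assumed flag: True once this attempt should launch no more tasks.
        self.is_zombie = False

    @property
    def name(self):
        return f"{self.stage_id}.{self.attempt}"


class SchedulerStub:
    """Hypothetical scheduler that allows one active task set per stage."""

    def __init__(self):
        # stage_id -> {attempt number -> manager}
        self.task_sets_by_stage = {}

    def submit_tasks(self, stage_id, attempt):
        manager = TaskSetManagerStub(stage_id, attempt)
        attempts = self.task_sets_by_stage.setdefault(stage_id, {})
        attempts[attempt] = manager
        # Any other attempt of the same stage that is still active is a conflict.
        conflicting = [
            m for m in attempts.values()
            if m is not manager and not m.is_zombie
        ]
        if conflicting:
            names = ",".join(m.name for m in attempts.values())
            raise RuntimeError(
                f"more than one active taskSet for stage {stage_id}: {names}"
            )
        return manager


def second_submission_fails():
    """Submit attempt 1, then attempt 2 while attempt 1 is still active."""
    scheduler = SchedulerStub()
    scheduler.submit_tasks(57, 1)
    try:
        scheduler.submit_tasks(57, 2)
    except RuntimeError as err:
        return str(err)
    return None
```

In this model the second submission only succeeds if the first attempt has already been marked as a zombie, which matches the situation the issue title describes: stage 57 is resubmitted while attempt 57.1 is still running.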
      

      First Time:

      16/04/14 15:35:20 INFO DAGScheduler: Resubmitting ShuffleMapStage 57 (run at AccessController.java:-2) because some of its tasks had failed: 5, 8, 9, 12, 13, 16, 17, 18, 19, 23, 26, 27, 28, 29, 30, 31, 40, 42, 43, 48, 49, 50, 51, 52, 53, 55, 56, 57, 59, 60, 61, 67, 70, 71, 84, 85, 86, 87, 98, 99, 100, 101, 108, 109, 110, 111, 112, 113, 114, 115, 126, 127, 134, 136, 137, 146, 147, 150, 151, 154, 155, 158, 159, 162, 163, 164, 165, 166, 167, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 188, 189, 190, 191, 198, 199, 204, 206, 207, 208, 218, 219, 222, 223, 230, 231, 236, 238, 239
      16/04/14 15:35:20 DEBUG DAGScheduler: submitStage(ShuffleMapStage 57)
      16/04/14 15:35:20 DEBUG DAGScheduler: missing: List()
      16/04/14 15:35:20 INFO DAGScheduler: Submitting ShuffleMapStage 57 (MapPartitionsRDD[7887] at run at AccessController.java:-2), which has no missing parents
      16/04/14 15:35:20 DEBUG DAGScheduler: submitMissingTasks(ShuffleMapStage 57)
      16/04/14 15:35:20 INFO DAGScheduler: Submitting 100 missing tasks from ShuffleMapStage 57 (MapPartitionsRDD[7887] at run at AccessController.java:-2)
      16/04/14 15:35:20 DEBUG DAGScheduler: New pending partitions: Set(206, 177, 127, 98, 48, 27, 23, 163, 238, 188, 159, 28, 109, 59, 9, 176, 126, 207, 174, 43, 170, 208, 158, 108, 29, 8, 204, 154, 223, 173, 219, 190, 111, 61, 40, 136, 115, 86, 57, 155, 55, 230, 222, 180, 172, 151, 101, 18, 166, 56, 137, 87, 52, 171, 71, 42, 167, 198, 67, 17, 236, 165, 13, 5, 53, 178, 99, 70, 49, 218, 147, 164, 114, 85, 60, 31, 179, 150, 19, 100, 50, 175, 146, 134, 113, 84, 51, 30, 199, 26, 16, 191, 162, 112, 12, 239, 231, 189, 181, 110)
      

      Second Time:

      16/04/14 15:35:22 INFO DAGScheduler: Resubmitting ShuffleMapStage 57 (run at AccessController.java:-2) because some of its tasks had failed: 26
      16/04/14 15:35:22 DEBUG DAGScheduler: submitStage(ShuffleMapStage 57)
      16/04/14 15:35:22 DEBUG DAGScheduler: missing: List()
      16/04/14 15:35:22 INFO DAGScheduler: Submitting ShuffleMapStage 57 (MapPartitionsRDD[7887] at run at AccessController.java:-2), which has no missing parents
      16/04/14 15:35:22 DEBUG DAGScheduler: submitMissingTasks(ShuffleMapStage 57)
      16/04/14 15:35:22 INFO DAGScheduler: Submitting 1 missing tasks from ShuffleMapStage 57 (MapPartitionsRDD[7887] at run at AccessController.java:-2)
      16/04/14 15:35:22 DEBUG DAGScheduler: New pending partitions: Set(26)
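
Comparing the two resubmissions shows that partition 26 was already among the partitions submitted at 15:35:20. A small sketch for checking this from the log lines follows; the regular expression and the helper name `parse_resubmission` are assumptions for illustration, and the first log line is shortened to its first 12 partition ids (the full line lists 100).

```python
import re

# Matches the DAGScheduler "Resubmitting" message shown in the logs above.
RESUBMIT = re.compile(
    r"Resubmitting ShuffleMapStage (\d+) .*? "
    r"because some of its tasks had failed: ([\d, ]+)"
)


def parse_resubmission(line):
    """Return (stage_id, set of partition ids) or None if the line does not match."""
    m = RESUBMIT.search(line)
    if m is None:
        return None
    stage_id = int(m.group(1))
    partitions = {int(p) for p in m.group(2).split(",") if p.strip()}
    return stage_id, partitions


# First line truncated to its first 12 partition ids for brevity.
first_line = (
    "16/04/14 15:35:20 INFO DAGScheduler: Resubmitting ShuffleMapStage 57 "
    "(run at AccessController.java:-2) because some of its tasks had failed: "
    "5, 8, 9, 12, 13, 16, 17, 18, 19, 23, 26, 27"
)
second_line = (
    "16/04/14 15:35:22 INFO DAGScheduler: Resubmitting ShuffleMapStage 57 "
    "(run at AccessController.java:-2) because some of its tasks had failed: 26"
)

first_stage, first_partitions = parse_resubmission(first_line)
second_stage, second_partitions = parse_resubmission(second_line)

# Partitions requested again while the earlier submission may still be running.
overlap = first_partitions & second_partitions
```

Here `overlap` contains partition 26 only, so the second resubmission asks for a partition that the first one had already made pending two seconds earlier.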
      

          People

            Assignee: Unassigned
            Reporter: yixiaohua
            Votes: 0
            Watchers: 7
