Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Affects Version/s: 1.4.0
- Fix Version/s: None
Description
With dynamic allocation enabled, when a task fails more than spark.task.maxFailures times, all dependent jobs, stages, and tasks are killed or aborted. In this process, the SparkListenerTaskEnd event can arrive after the corresponding SparkListenerStageCompleted and SparkListenerJobEnd events, as in the event log below:
{"Event":"SparkListenerStageCompleted","Stage Info":{"Stage ID":20,"Stage Attempt ID":0,"Stage Name":"run at AccessController.java:-2","Number of Tasks":200} {"Event":"SparkListenerJobEnd","Job ID":9,"Completion Time":1444914699829} {"Event":"SparkListenerTaskEnd","Stage ID":20,"Stage Attempt ID":0,"Task Type":"ResultTask","Task End Reason":{"Reason":"TaskKilled"},"Task Info":{"Task ID":1955,"Index":88,"Attempt":2,"Launch Time":1444914699763,"Executor ID":"5","Host":"linux-223","Locality":"PROCESS_LOCAL","Speculative":false,"Getting Result Time":0,"Finish Time":1444914699864,"Failed":true,"Accumulables":[]}}
Because of this, numRunningTasks in the ExecutorAllocationManager class can drop below 0, which skews executor allocation.
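A minimal sketch of the failure mode (this is not Spark's actual ExecutorAllocationManager; the event case classes and the SimpleAllocationListener below are simplified stand-ins for illustration): a listener that decrements a per-stage running-task counter on every TaskEnd goes negative when the stage-completion handler has already discarded that stage's bookkeeping.
{code:scala}
// Sketch only: simplified stand-ins for Spark's listener events.
sealed trait Event
case class TaskStart(stageId: Int) extends Event
case class TaskEnd(stageId: Int) extends Event
case class StageCompleted(stageId: Int) extends Event

class SimpleAllocationListener {
  // Running-task count per stage; the total drives executor requests.
  private val stageToNumRunningTasks = scala.collection.mutable.Map[Int, Int]()
  def totalRunningTasks: Int = stageToNumRunningTasks.values.sum

  def onEvent(e: Event): Unit = e match {
    case TaskStart(stage) =>
      stageToNumRunningTasks(stage) = stageToNumRunningTasks.getOrElse(stage, 0) + 1
    case StageCompleted(stage) =>
      // Per-stage bookkeeping is discarded when the stage completes...
      stageToNumRunningTasks -= stage
    case TaskEnd(stage) =>
      // ...so a TaskEnd arriving afterwards re-creates the entry at -1,
      // driving the total below zero.
      stageToNumRunningTasks(stage) = stageToNumRunningTasks.getOrElse(stage, 0) - 1
  }
}

object Demo extends App {
  val listener = new SimpleAllocationListener
  listener.onEvent(TaskStart(20))       // task 1955 starts in stage 20
  listener.onEvent(StageCompleted(20))  // stage aborted: per-stage state dropped
  listener.onEvent(TaskEnd(20))         // killed task's TaskEnd arrives late
  println(listener.totalRunningTasks)   // prints -1 instead of 0
}
{code}
One defensive mitigation in this style of listener is to ignore TaskEnd events whose stage has already been cleaned up, or to clamp the published total at zero so a late event cannot push executor demand negative.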
Issue Links
- is duplicated by:
  - SPARK-18981 The last job hung when speculation is on (Resolved)
  - SPARK-16708 ExecutorAllocationManager.numRunningTasks can be negative when stage retry (Resolved)
  - SPARK-22312 Spark job stuck with no executor due to bug in Executor Allocation Manager (Resolved)
- relates to:
  - SPARK-27630 Stage retry causes totalRunningTasks calculation to be negative (Resolved)