  1. Apache Tez
  2. TEZ-3950

Preempted task attempts intermittently marked as FAILED instead of KILLED

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.9.2, 0.10.0
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

    Description

      TestMockDAGAppMaster.testInternalPreemption intermittently fails with expected:<KILLED> but was:<FAILED>

      The crux of the matter is that TaskSchedulerManager sends two events:

      • TaskScheduler#deallocatedContainer -> TaskSchedulerManager#containerBeingReleased -> sends AMContainerStopRequest -> TA_CONTAINER_TERMINATING
      • AMContainerEventCompleted -> TA_CONTAINER_TERMINATED_BY_SYSTEM

      For the task attempt to be killed correctly, the second message path (ending in TA_CONTAINER_TERMINATED_BY_SYSTEM) must complete first. Because the first path is longer, the second one almost always completes first. When the first path (ending in TA_CONTAINER_TERMINATING) completes first instead, the task attempt is marked FAILED rather than KILLED, which is why the test fails only intermittently.
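
The ordering dependence described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: the class `PreemptionRaceSketch`, its `State` enum, and the rule "the first termination event to arrive decides the terminal state" are hypothetical simplifications drawn from the description, not Tez's actual task attempt state machine. Only the two event names come from the issue text.

```java
import java.util.Arrays;
import java.util.List;

// Toy model only: event names mirror the issue text, everything else is hypothetical.
class PreemptionRaceSketch {
    enum Event { TA_CONTAINER_TERMINATING, TA_CONTAINER_TERMINATED_BY_SYSTEM }
    enum State { RUNNING, KILLED, FAILED }

    // Assumed rule (from the description): the first termination event to
    // arrive fixes the terminal state; events arriving afterwards are ignored.
    static State apply(State current, Event event) {
        if (current != State.RUNNING) {
            return current; // already terminal, later events do not change it
        }
        switch (event) {
            case TA_CONTAINER_TERMINATED_BY_SYSTEM:
                return State.KILLED;
            case TA_CONTAINER_TERMINATING:
                return State.FAILED;
            default:
                throw new IllegalArgumentException("unknown event " + event);
        }
    }

    // Replays events in the given arrival order, starting from RUNNING.
    static State finalState(List<Event> arrivalOrder) {
        State state = State.RUNNING;
        for (Event event : arrivalOrder) {
            state = apply(state, event);
        }
        return state;
    }

    public static void main(String[] args) {
        // Usual order: the shorter second path wins the race.
        System.out.println(finalState(Arrays.asList(
                Event.TA_CONTAINER_TERMINATED_BY_SYSTEM,
                Event.TA_CONTAINER_TERMINATING)));
        // Rare order: the longer first path completes first.
        System.out.println(finalState(Arrays.asList(
                Event.TA_CONTAINER_TERMINATING,
                Event.TA_CONTAINER_TERMINATED_BY_SYSTEM)));
    }
}
```

In this model the same two events produce different terminal states depending only on arrival order, which matches the intermittent expected:<KILLED> but was:<FAILED> failure reported for the test.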

      Attachments

        1. TEZ-3950.fail.patch
          0.9 kB
          Jonathan Turner Eagles

            People

              Assignee: Unassigned
              Reporter: Jonathan Turner Eagles (jeagles)
              Votes: 0
              Watchers: 4
