Apache Tez / TEZ-3491

Tez job can hang due to container priority inversion

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.7.1
    • Fix Version/s: 0.9.0, 0.8.5
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      If the Tez AM receives containers at a lower priority than the highest-priority task being requested, it fails to assign those containers to any task. In addition, if a container is new, the AM refuses to release it while any tasks are pending. If the higher-priority requests take too long to be fulfilled (e.g., because the lower-priority containers are filling the queue), YARN eventually expires the unused lower-priority containers, since they were never launched. The Tez AM never re-requests these lower-priority containers, so the job hangs: the AM keeps waiting for containers that the RM has already allocated and then expired.
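A minimal sketch, assuming a toy model rather than the actual Tez scheduler code: the Python below replays the sequence described above. All names (`ToyAM`, `on_allocated`, `on_expired`, `is_hung`, `scenario`) are hypothetical, and it assumes the YARN convention that a smaller priority value means a higher priority. Two lower-priority containers arrive first, are neither assigned nor released, and are then expired by the RM. Because the RM already counted those requests as satisfied, the lower-priority tasks end up with nothing outstanding. The `rerequest_on_expiry` flag is an illustrative toggle only, not the change that resolved this issue.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Container:
    cid: int
    priority: int  # YARN convention: smaller value = higher priority


@dataclass
class ToyAM:
    """Hypothetical toy model of the AM behaviour in this report (not Tez code)."""
    pending: Counter                                  # priority -> tasks waiting for a container
    rerequest_on_expiry: bool = False                 # illustrative toggle, not the real fix
    asked: Counter = field(default_factory=Counter)   # requests the RM still owes the AM
    held: list = field(default_factory=list)          # allocated but never launched
    running: list = field(default_factory=list)

    def __post_init__(self):
        # Initially ask the RM for one container per pending task.
        self.asked = Counter(self.pending)

    def on_allocated(self, c: Container) -> None:
        # The RM treats the request as satisfied as soon as it allocates.
        self.asked[c.priority] -= 1
        live = [p for p, n in self.pending.items() if n > 0]
        if live and c.priority == min(live):
            # Matches the highest-priority pending request: assign it.
            self.pending[c.priority] -= 1
            self.running.append(c)
        elif live:
            # Buggy path: a lower-priority container is not assigned, and since
            # it is new and tasks are pending, it is not released either.
            self.held.append(c)
        # With no pending tasks the container would simply be released (dropped).

    def on_expired(self, c: Container) -> None:
        # The RM reclaims a container that was allocated but never launched.
        self.held.remove(c)
        if self.rerequest_on_expiry:
            self.asked[c.priority] += 1

    def is_hung(self) -> bool:
        # Some tasks still wait at a priority where nothing is held or owed.
        return any(
            n > 0
            and self.asked[p] <= 0
            and not any(h.priority == p for h in self.held)
            for p, n in self.pending.items()
        )


def scenario(rerequest_on_expiry: bool) -> ToyAM:
    # One high-priority task (priority 1) and two lower-priority tasks (priority 5).
    am = ToyAM(Counter({1: 1, 5: 2}), rerequest_on_expiry)
    low = [Container(10, 5), Container(11, 5)]
    for c in low:                       # lower-priority containers arrive first
        am.on_allocated(c)
    for c in low:                       # never launched, so the RM expires them
        am.on_expired(c)
    am.on_allocated(Container(12, 1))   # high-priority request is finally filled
    return am


print(scenario(rerequest_on_expiry=False).is_hung())  # True
print(scenario(rerequest_on_expiry=True).is_hung())   # False
```

In this model, flipping the flag shows that the missing re-request is what turns the unassigned lower-priority containers into a permanent hang.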

              People

              • Assignee:
                Jason Lowe (jlowe)
                Reporter:
                Jason Lowe (jlowe)
              • Votes:
                0
                Watchers:
                5
