Apache Tez / TEZ-3296

Tez job can hang if two vertices at the same root distance have different task requirements

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.7.1
    • Fix Version/s: 0.7.2, 0.9.0, 0.8.4
    • Component/s: None
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      When two vertices are at the same distance from the root, Tez schedules their containers with the same priority. However, those vertices can have different task requirements and therefore request different container capabilities. As documented in YARN-314, YARN currently does not support requests for multiple container sizes at the same priority. In practice this leads to one vertex's allocation requests clobbering the other's, which can leave the Tez AM waiting for containers it will never receive from the RM.
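      The failure mode can be illustrated with a toy model. This is a minimal sketch, not the YARN or Tez API: Resource, SimplifiedRM, schedule, distinct_priorities and unsatisfiable are all hypothetical names, and the RM is simplified to key outstanding requests by priority alone (ignoring host/rack locality). The distinct-priority mapping is just one possible mitigation, giving each (root distance, capability) pair its own priority; it is not necessarily what the attached patch does.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Resource:
    """Hypothetical container capability (not the YARN Resource class)."""
    memory_mb: int
    vcores: int


class SimplifiedRM:
    """Toy RM: one outstanding request per priority (simplification of YARN-314).

    A second request at an already-used priority overwrites the first,
    including its capability and container count.
    """

    def __init__(self):
        self.requests = {}  # priority -> (capability, num_containers)

    def request(self, priority, capability, num_containers):
        self.requests[priority] = (capability, num_containers)


def schedule(vertices, priority_fn):
    """Send one request per vertex; vertices are (name, distance, capability, tasks)."""
    rm = SimplifiedRM()
    for _name, distance, capability, tasks in vertices:
        rm.request(priority_fn(distance, capability), capability, tasks)
    return rm.requests


def distinct_priorities(vertices):
    """Hypothetical mitigation: one priority per (distance, capability) pair.

    Pairs are sorted by distance first, so vertices closer to the root
    still get smaller priority numbers than vertices further away.
    """
    keys = sorted({(d, cap) for _, d, cap, _ in vertices})
    table = {key: prio for prio, key in enumerate(keys, start=1)}
    return lambda distance, capability: table[(distance, capability)]


def unsatisfiable(vertices, requests):
    """Names of vertices whose request was clobbered and will never be met."""
    outstanding = set(requests.values())
    return [name for name, _, cap, n in vertices if (cap, n) not in outstanding]


if __name__ == "__main__":
    vertices = [
        ("map1", 1, Resource(1024, 1), 10),
        ("map2", 1, Resource(4096, 1), 5),
        ("reduce", 2, Resource(2048, 1), 3),
    ]
    # Priority derived from root distance only: map1 and map2 collide.
    naive = schedule(vertices, lambda distance, capability: distance)
    print(unsatisfiable(vertices, naive))  # ['map1']
    # Distinct priority per (distance, capability): nothing is lost.
    fixed = schedule(vertices, distinct_priorities(vertices))
    print(unsatisfiable(vertices, fixed))  # []
```

      In the naive run, map2's 4096 MB request replaces map1's 1024 MB request at priority 1, so an AM that is still waiting for map1's 10 containers would hang, matching the symptom described above.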

        Attachments

        1. taskschedulerlog
          7 kB
          Jason Darrell Lowe
        2. TEZ-3296.001.patch
          5 kB
          Jason Darrell Lowe

              People

              • Assignee:
                Jason Darrell Lowe (jlowe)
                Reporter:
                Jason Darrell Lowe (jlowe)
              • Votes:
                0
                Watchers:
                6

                Dates

                • Created:
                  Updated:
                  Resolved: