Hadoop YARN / YARN-10243

Rack-only localization constraint for MR AM is broken for CapacityScheduler

      Description

      Reproduction: Start an MR sleep job with strict locality configured for the AM (for instance, -Dmapreduce.job.am.strict-locality=/rack1). If CapacityScheduler is used, the job hangs, stuck in the SCHEDULED state.

      Root cause: if no other resources are requested (such as a node-local request or another constraint), the scheduling opportunities counter is never incremented, so the following check always returns false. The rack-local request is therefore skipped on every scheduling pass, resulting in an infinite loop:

          // If we are here, we do need containers on this rack for RACK_LOCAL req
          if (type == NodeType.RACK_LOCAL) {
            // 'Delay' rack-local just a little bit...
            long missedOpportunities =
                application.getSchedulingOpportunities(schedulerKey);
            return getActualNodeLocalityDelay() < missedOpportunities;
          }
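
      The effect of a counter that never advances can be illustrated with a small standalone model. This is only a sketch, not the scheduler's code: the class RackLocalDelaySketch, the method firstAllowedPass, the delay value of 40, and the assumption that the counter is incremented before the check on each pass are all hypothetical. Only the comparison itself is taken from the snippet above.

```java
// Hypothetical standalone model of the quoted check; not part of Hadoop.
class RackLocalDelaySketch {

    // Same comparison as in the quoted snippet: a RACK_LOCAL assignment is
    // allowed only once the missed opportunities exceed the delay.
    public static boolean canAssignRackLocal(long nodeLocalityDelay,
                                             long missedOpportunities) {
        return nodeLocalityDelay < missedOpportunities;
    }

    // Simulates up to `passes` scheduling passes and returns the 1-based pass
    // on which the rack-local request is first allowed, or -1 if it never is.
    // `counterIncrements` models whether the opportunities counter advances
    // on each pass (assumption: the increment happens before the check).
    public static int firstAllowedPass(long nodeLocalityDelay,
                                       boolean counterIncrements,
                                       int passes) {
        long missedOpportunities = 0;
        for (int pass = 1; pass <= passes; pass++) {
            if (counterIncrements) {
                missedOpportunities++;
            }
            if (canAssignRackLocal(nodeLocalityDelay, missedOpportunities)) {
                return pass;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Counter advances: the request is allowed once 40 < 41.
        System.out.println(firstAllowedPass(40, true, 1000));   // prints 41
        // Counter stuck at zero: the request is never allowed.
        System.out.println(firstAllowedPass(40, false, 1000));  // prints -1
    }
}
```

      The model covers only the quoted comparison with a positive delay; it does not attempt to reproduce how the scheduler handles other delay values or other request types.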
      

      Workaround: set yarn.scheduler.capacity.node-locality-delay to zero so that this rule is processed immediately.
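
      As a sketch, the workaround could be expressed as the following property, assuming the CapacityScheduler settings are kept in capacity-scheduler.xml (the file location is an assumption about the deployment; the report names only the property and its value):

```xml
<!-- Assumed location: capacity-scheduler.xml -->
<property>
  <name>yarn.scheduler.capacity.node-locality-delay</name>
  <value>0</value>
</property>
```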

            People

            • Assignee: Bilwa S T (BilwaST)
            • Reporter: Adam Antal (adam.antal)