Hadoop YARN / YARN-914 (Umbrella) Support graceful decommission of nodemanager / YARN-4677

RMNodeResourceUpdateEvent update from scheduler can lead to race condition


    Details

      Description

      When a node is in the decommissioning state, there is a time window between completedContainer() and the point at which the resulting RMNodeResourceUpdateEvent is handled in scheduler.nodeUpdate (YARN-3223).

      If a scheduling attempt happens within this window, a new container can still be allocated on this node. Worse, if a scheduling attempt happens after the RMNodeResourceUpdateEvent has been sent out but before it has been propagated to the SchedulerNode, the node's total resource ends up lower than its used resource, and its available resource becomes negative.
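      The interleaving can be replayed deterministically in a small model. This is a hypothetical sketch, not Hadoop code: SimNode, tryAllocate and the in-memory queue are illustrative stand-ins for SchedulerNode, the scheduler's allocation path and the RMNodeResourceUpdateEvent pipeline, and resources are reduced to plain megabyte values.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical model of the YARN-4677 race; SimNode is NOT Hadoop's SchedulerNode.
public class DecommissionRaceSketch {

    // Scheduler-side view of one node; resources reduced to plain MB values.
    static final class SimNode {
        long totalMb;
        long usedMb;
        boolean decommissioning;
        // Stand-in for resource update events that were sent but not yet applied.
        final Queue<Long> pendingTotalUpdates = new ArrayDeque<>();

        SimNode(long totalMb) {
            this.totalMb = totalMb;
        }

        long availableMb() {
            return totalMb - usedMb;
        }
    }

    // A container finishes: used drops at once, but on a decommissioning
    // node the matching shrink of the total is only queued.
    static void completedContainer(SimNode node, long containerMb) {
        node.usedMb -= containerMb;
        if (node.decommissioning) {
            node.pendingTotalUpdates.add(node.usedMb); // new total = remaining usage
        }
    }

    // The scheduler allocates whenever the node appears to have room;
    // this naive version does not look at the decommissioning flag.
    static boolean tryAllocate(SimNode node, long containerMb) {
        if (node.availableMb() >= containerMb) {
            node.usedMb += containerMb;
            return true;
        }
        return false;
    }

    // A later node update finally applies the queued total.
    static void nodeUpdate(SimNode node) {
        Long newTotal;
        while ((newTotal = node.pendingTotalUpdates.poll()) != null) {
            node.totalMb = newTotal;
        }
    }

    // Replays the problematic interleaving and returns the final available MB.
    static long runRace() {
        SimNode node = new SimNode(8192);
        tryAllocate(node, 4096);
        tryAllocate(node, 4096);           // node full: used = 8192
        node.decommissioning = true;
        completedContainer(node, 4096);    // used = 4096, total update pending
        tryAllocate(node, 4096);           // lands inside the window: used = 8192
        nodeUpdate(node);                  // total shrinks to 4096
        return node.availableMb();         // 4096 - 8192
    }

    public static void main(String[] args) {
        System.out.println("available MB after race: " + runRace());
    }
}
```

      Running main prints "available MB after race: -4096". The queue models only the asynchrony between completedContainer() and nodeUpdate(); locking, heartbeats and the real event dispatcher are deliberately left out.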

        Attachments

        1. YARN-4677.01.patch (18 kB, wilfreds)
        2. YARN-4677-branch-2.001.patch (19 kB, Greg Phillips)
        3. YARN-4677-branch-2.002.patch (19 kB, Greg Phillips)
        4. YARN-4677-branch-2.003.patch (20 kB, wilfreds)

            People

            • Assignee: Wilfred Spiegelenburg (wilfreds)
            • Reporter: Brook Zhou (brookz)
