Hadoop Map/Reduce: MAPREDUCE-4191

capacity scheduler: job unexpectedly exceeds queue capacity limit by one task

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.23.3
    • Fix Version/s: None
    • Component/s: mrv2, scheduler
    • Labels: None

      Description

      While testing the queue capacity limits, it appears that the job can exceed the
      queue capacity limit by one task while the user limit factor is 1. It's not
      clear to me why this is.

      Here are the steps to reproduce:

      1) set yarn.app.mapreduce.am.resource.mb to 2048 (default value)
      2) set yarn.scheduler.capacity.root.default.user-limit-factor to 1.0 (default)
      3) set yarn.scheduler.capacity.root.default.capacity to 90 (%)
      4) For a cluster with a capacity of 56G, 90% is 50.4G, which rounds up to 51G.
      5) submit a job with large number of tasks, each task using 1G memory.
      6) The web UI shows that the used resource is 52G, which is 92.9% of the
      cluster capacity (instead of the expected 90%) and 103.2% of the queue
      capacity (instead of the expected 100%). (A worked sketch of this arithmetic
      follows below.)
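
      A rough sketch of the arithmetic above, assuming 1G map tasks, rounding the
      queue limit up to the next 1G multiple, and a limit check of the form
      "used > limit" rather than "used >= limit" (an illustration only, not the
      scheduler's actual code):

      public class CapacityLimitSketch {
          public static void main(String[] args) {
              final int clusterMB = 56 * 1024;     // 56G cluster
              final double queueCapacity = 0.90;   // ...root.default.capacity = 90
              final int taskMB = 1024;             // 1G per map task

              // 90% of 56G is 50.4G, rounded up to the next 1G multiple: 51G.
              int limitMB = (int) Math.ceil(clusterMB * queueCapacity / taskMB) * taskMB;

              int usedMB = 2048;                   // AM container (yarn.app.mapreduce.am.resource.mb)
              // Keep assigning 1G tasks while the queue is not strictly over its
              // limit. Because the check is ">" and not ">=", usage of exactly 51G
              // still passes, one more task is assigned, and usage lands at 52G.
              while (!(usedMB > limitMB)) {
                  usedMB += taskMB;
              }

              System.out.printf("limit = %dG, used = %dG (%.1f%% of cluster, %.1f%% of queue)%n",
                      limitMB / 1024, usedMB / 1024,
                      100.0 * usedMB / clusterMB,
                      100.0 * usedMB / (clusterMB * queueCapacity));
              // Prints: limit = 51G, used = 52G (92.9% of cluster, 103.2% of queue),
              // matching the web UI values reported in the description.
          }
      }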

        Activity

        Thomas Graves added a comment -

        I'm still following this through to fully understand, but there is a comment in the code in LeafQueue that tries to explain this:

        // Note: We aren't considering the current request since there is a fixed
        // overhead of the AM, but it's a > check, not a >= check, so...

        I don't totally follow this. I guess that if you have one job in the queue taking the entire capacity, it allows the job to behave more like it did in mrv1 and tries not to penalize you for the AM overhead. The AM, however, is doing the setup and cleanup tasks, whereas in mrv1 a slot would need to be allocated for those. The AM may have a fixed overhead, but that overhead is configurable: I could create an AM with 24G of memory, or use the default of 1.5G. Or, on the flip side, I could have an AM that uses 1.5G but a map task that now gets scheduled and uses 24G, which puts the queue way over its capacity. That could affect the queue's current usage greatly and seems to break the capacity guarantee.

        In the case where you have, say, 2 jobs in the queue, you have 2 app masters, one of which is "counted" against your queue while the other is not.

        I do see it as beneficial to queues with very small capacities, though, as without this they could be stuck without enough resources to run a task.

        Arun, or anyone else familiar with the capacity scheduler: if you could provide an explanation, that would be great.
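
        To make the concern above concrete, here is a tiny hypothetical sketch of a
        limit check that, as the quoted LeafQueue comment describes, ignores the size
        of the current request and compares with ">" rather than ">=" (the class and
        method names are made up; this is not Hadoop's code):

        public class OverCapacitySketch {

            // Assign if the queue is not already strictly over its limit. The size
            // of the container being requested is not added before the check.
            static boolean canAssign(int usedMB, int limitMB) {
                return !(usedMB > limitMB);
            }

            public static void main(String[] args) {
                int limitMB = 51 * 1024;   // 51G queue limit from the example above
                int usedMB  = 51 * 1024;   // queue already exactly at its limit

                // A 1G task request is granted and the queue ends up 1G over
                // (the case reported in this issue).
                if (canAssign(usedMB, limitMB)) {
                    usedMB += 1024;
                }
                System.out.println("after 1G request:  used = " + usedMB / 1024 + "G");

                // A 24G request at the same point is also granted, because the
                // request size is never considered, so the queue overshoots by 24G.
                usedMB = 51 * 1024;
                if (canAssign(usedMB, limitMB)) {
                    usedMB += 24 * 1024;
                }
                System.out.println("after 24G request: used = " + usedMB / 1024 + "G");
            }
        }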


          People

          • Assignee: Thomas Graves
          • Reporter: Thomas Graves
          • Votes: 0
          • Watchers: 5
