Hadoop Map/Reduce
MAPREDUCE-3483

CapacityScheduler reserves container on same node as AM but can't ever use due to never enough avail memory

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.3.0
    • Fix Version/s: None
    • Component/s: mrv2
    • Labels: None
    • Target Version/s:

      Description

      Saw a case where a job was stuck trying to get reducers. The issue is that the capacity scheduler reserved a container on the same node as the application master, but there was never enough memory to run the reducer on that node: node total memory was 8G, the reducer needed 8G, and the AM was using 2G. This particular job had 10 reducers and was stuck waiting on that one because the AM + reserved reducer memory was already over the queue limit.
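
      To make the arithmetic concrete, here is a small illustrative sketch (plain Java, not scheduler code; all names are made up) of why the reservation on the AM's node can never be satisfied:

      // Illustrative arithmetic only; the numbers come from the description above.
      public class StuckReservationExample {
          public static void main(String[] args) {
              int nodeTotalMB = 8 * 1024;  // node capacity: 8G
              int amMB        = 2 * 1024;  // this job's AM, running on the same node: 2G
              int reducerMB   = 8 * 1024;  // the reserved reducer container: 8G

              // The AM cannot finish before its own reducers, so the node can never
              // offer more than (total - AM) to this job's reservation.
              int maxEverFreeMB = nodeTotalMB - amMB;  // 6144 MB

              System.out.printf("max ever free on node = %d MB, reducer needs = %d MB%n",
                      maxEverFreeMB, reducerMB);
              System.out.println("reservation satisfiable: " + (maxEverFreeMB >= reducerMB)); // false
          }
      }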

        Activity

        Vinod Kumar Vavilapalli added a comment -

        Moving bugs out of previously closed releases into the next minor release 2.8.0.

        Arun C Murthy added a comment -

        Thanks Thomas. The way the CS is set up, this is an extremely rare corner case which is hard to fix.

        The essential problem here is that in your setup the queue has so little capacity that it can't get more than one reduce slot...

        For now, I'll downgrade this since it's a corner case, while I think about ways to fix it.

        Thomas Graves added a comment -

        I believe the queue capacity was actually 4.8G (12% of the 40G cluster). The UI showed the queue at 208% of capacity when this occurred.

        2011-11-29 20:51:53,948 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: User limit
        computation for userfoo in queue default userLimit=100 userLimitFactor=1.0 required: memory: 8192 consumed: memory:
        10240 limit: 8192 queueCapacity: 8192 qconsumed: 10240 currentCapacity: 18432 activeUsers: 1 clusterCapacity: 40960
        2011-11-29 20:51:53,948 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: User userfoo
        in queue default will exceed limit - consumed: memory: 10240 limit: memory: 8192
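
        A rough reconstruction of the user-limit arithmetic that would produce that DEBUG line follows. The formula and names below are assumptions for illustration, not the actual LeafQueue code, but they reproduce the logged values:

        // Illustrative only: an assumed formula that matches the logged values.
        public class UserLimitSketch {
            public static void main(String[] args) {
                int clusterMB          = 40960;                // clusterCapacity: 40960 (40G)
                int configuredMB       = clusterMB * 12 / 100; // ~4915, the 4.8G / 12% queue
                int requiredMB         = 8192;                 // the 8G reducer being requested
                int consumedMB         = 10240;                // 2G AM + 8G reserved reducer (qconsumed)
                int activeUsers        = 1;
                int userLimitPercent   = 100;                  // userLimit=100
                double userLimitFactor = 1.0;                  // userLimitFactor=1.0

                // Effective queue capacity is bumped up to at least the request size,
                // which would explain "queueCapacity: 8192" rather than ~4915.
                int queueCapacityMB = Math.max(configuredMB, requiredMB);                 // 8192

                // The queue is already over that capacity, so current capacity becomes
                // consumed + required, matching "currentCapacity: 18432".
                int currentCapacityMB = consumedMB < queueCapacityMB
                        ? queueCapacityMB : consumedMB + requiredMB;                      // 18432

                // Per-user share, capped at queueCapacity * userLimitFactor.
                int shareMB = Math.max(currentCapacityMB / activeUsers,
                        currentCapacityMB * userLimitPercent / 100);                      // 18432
                int limitMB = (int) Math.min(shareMB, queueCapacityMB * userLimitFactor); // 8192

                System.out.println("limit = " + limitMB + " MB");                         // 8192
                System.out.println("will exceed limit: " + (consumedMB > limitMB));       // true: 10240 > 8192
            }
        }

        With consumed already at 10240 against a limit of 8192, the queue can never hand out the second 8G reducer, which lines up with the 208% figure above.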

        Arun C Murthy added a comment -

        Thanks Jonathan. Perchance, do you know the queue's capacity?

        Jonathan Eagles added a comment -

        This is the message in Tom's log file.

        2011-11-29 03:47:59,399 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: default
        used=memory: 10240 numContainers=2 user=hadoop resources=memory: 10240

        Arun C Murthy added a comment -

        So, Thomas - the queue only had 10G as the limit?

        Mahadev konar added a comment -

        MAPREDUCE-2917 actually tried to solve this exact issue.

        Thomas Graves added a comment -

        No, I don't think they are the same. In this case, it was just that one node that the reducer shouldn't have been scheduled/reserved on. It could have been scheduled on any of the other nodes in the cluster, since they (eventually) had enough memory. The other nodes may have been running maps from this job or tasks from other jobs when the reservation was made, but those eventually finished, and the reducer asking for 8G would have been scheduled. I think it's a special case of: don't reserve a container on a node if (node memory - AM memory) < the container's requested memory, where the requested container belongs to the same job as the AM. The AM is never going to finish before the requested container, so the container will never get scheduled on that node.
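
        A minimal sketch of that special-case check (method and parameter names here are hypothetical, not the real CapacityScheduler API):

        // Hypothetical guard: don't reserve on a node whose memory, minus the memory held
        // by this application's AM on that node, can never cover the request.
        class ReservationGuard {

            /** @return true if a reservation on this node could ever be satisfied. */
            static boolean shouldReserve(int nodeTotalMB, int sameAppAmMB, int requestedMB) {
                // The AM never finishes before the containers it is requesting, so its
                // memory on this node will never be freed for this request.
                return nodeTotalMB - sameAppAmMB >= requestedMB;
            }

            public static void main(String[] args) {
                // The case from this issue: 8G node, 2G AM from the same job, 8G reducer.
                System.out.println(shouldReserve(8192, 2048, 8192)); // false: never satisfiable
                // The same request on a node without this job's AM is fine.
                System.out.println(shouldReserve(8192, 0, 8192));    // true
            }
        }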

        Aaron T. Myers added a comment -

        In some sense this seems like a specific case of the problem which was supposed to be addressed by MAPREDUCE-2324, right?


          People

          • Assignee: Arun C Murthy
          • Reporter: Thomas Graves
          • Votes: 0
          • Watchers: 4
