Hadoop Map/Reduce - MAPREDUCE-4358

Reducers are assigned containers before all maps are assigned containers

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 2.0.0-alpha
    • Fix Version/s: None
    • Component/s: mrv2
    • Labels:
      None

      Description

      Reducers start to get containers before all maps have been assigned containers. We have seen this issue and it is problematic: if there are no available resources for the remaining maps, the job simply stalls, with reducers waiting for mappers that are unable to start because no containers are available.

        Issue Links

          duplicates MAPREDUCE-4299

          Activity

          Robert Joseph Evans added a comment -

          Isn't that what mapreduce.job.reduce.slowstart.completedmaps is for? If you don't want any reducers to run until all of the maps have finished then you want to set it to 1.0, not the 0.05 that is the default. I think part of the issue is that the default value for mapreduce.job.reduce.slowstart.completedmaps is still set for when map and reduce slots were completely separate. Perhaps this config does not make sense any more now that reduce tasks can block map tasks from running. Or perhaps we need another config so that the AM will not fill more than X% of the queue with reduces until all map tasks have completed.

          We have taken the route of setting the slowstart to 1.0 even on our 1.0.2 clusters because it improves cluster utilization, and we have not seen much of a hit to the end-to-end time of our jobs.

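          (For reference, a minimal sketch of the setting Robert describes. The property name and its 0.05 default come from the comment above; placing the override in mapred-site.xml, or passing it per job with -Dmapreduce.job.reduce.slowstart.completedmaps=1.0, is shown here only as an illustration.)

              <!-- mapred-site.xml (or a per-job override): do not request any
                   reduce containers until 100% of the maps have completed.
                   The shipped default of 0.05 lets reducers be requested once
                   5% of the maps are done. -->
              <property>
                <name>mapreduce.job.reduce.slowstart.completedmaps</name>
                <value>1.0</value>
              </property>
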
          Jason Lowe added a comment -

          The AM is supposed to be watching its headroom to see if it has enough resources for running maps and will even preempt (kill) reduce tasks to make room for map tasks when there isn't enough headroom. Which scheduler is being used for this test? If it's the FifoScheduler then I believe this is a duplicate of MAPREDUCE-4299. If the AM is being told an incorrectly computed headroom then it could think there is sufficient room to allocate containers for maps when in fact they've all been filled with reduce tasks.

          Harsh J added a comment -

          It may also be related to Jason's earlier report/fix at MAPREDUCE-4228.

          Sharad Agarwal added a comment -

          This is already handled by actively looking at the available headroom for the job and ramping down (preempting) the reduces if needed. Are you seeing this issue in your cluster?

          Ahmed Radwan added a comment -

          @Robert, I think mapreduce.job.reduce.slowstart.completedmaps is related but different. The issue here is not about waiting for a % of mappers to fully complete before starting to allocate containers to reducers; it is about preventing reducers from occupying containers while those containers are still needed by mappers.
          @Jason, watching headroom and preempting reducers should be sufficient to address this issue, but it doesn't seem to work in our case. The cluster is using the FifoScheduler.
          @Harsh, MAPREDUCE-4228 seems to address a bug in the behavior of mapreduce.job.reduce.slowstart.completedmaps, which as I mentioned above is different.
          @Sharad, yes, we are seeing this on a customer cluster.

          Jason Lowe added a comment -

          Since it occurs with the FifoScheduler, it looks like a duplicate of MAPREDUCE-4299. A workaround is to configure the CapacityScheduler with a single default queue.

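          (A minimal sketch of the workaround Jason suggests, assuming a stock 2.0.0-alpha setup; the scheduler class and the single "default" queue taking 100% of capacity are standard CapacityScheduler settings and are shown here only as an illustration.)

              <!-- yarn-site.xml: use the CapacityScheduler instead of the
                   FifoScheduler -->
              <property>
                <name>yarn.resourcemanager.scheduler.class</name>
                <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
              </property>

              <!-- capacity-scheduler.xml: a single default queue that owns
                   the whole cluster -->
              <property>
                <name>yarn.scheduler.capacity.root.queues</name>
                <value>default</value>
              </property>
              <property>
                <name>yarn.scheduler.capacity.root.default.capacity</name>
                <value>100</value>
              </property>
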
          Ahmed Radwan added a comment -

          Will try to see if the capacity scheduler solves this issue. Will update.

          Ahmed Radwan added a comment -

          Seems to be the same issue resolved in MAPREDUCE-4299.


            People

            • Assignee: Unassigned
            • Reporter: Ahmed Radwan
            • Votes: 0
            • Watchers: 10
