Hadoop Map/Reduce / MAPREDUCE-4867

Reduce tasks won't start in certain circumstances

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.0.4
    • Fix Version/s: None
    • Component/s: scheduler
    • Labels:
      None

      Description

      The start of reduce tasks is conditioned by the value of "mapred.reduce.slowstart.completed.maps". However, if the number of completed map tasks never reaches the configured threshold (for example, because "mapred.max.map.failures.percent" has been set to a high value to allow a job to have many failed tasks), the reduce tasks never start.
      The job keeps running: all map tasks are finished (whether successful or not) and all reduce tasks remain pending. The only option is to kill the job.

      Two things could be done:

      • Document the relation between "mapred.max.map.failures.percent" and "mapred.reduce.slowstart.completed.maps": to be sure that reduce tasks will start, the configuration should satisfy "mapred.reduce.slowstart.completed.maps * 100 < 100 - mapred.max.map.failures.percent".
      • Fix JobInProgress.scheduleReduces() to return true once all map tasks are finished.
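
      Both suggestions can be illustrated with a small standalone sketch. It is a simplified model and not the actual Hadoop code: the class SlowstartCheck, its method names, and the assumption that the slowstart threshold is computed as ceil(slowstart * totalMaps) over successful maps are all hypothetical and used only for illustration.

```java
/**
 * Simplified, hypothetical model of the two suggestions in this report.
 * It does not depend on Hadoop and does not reproduce JobInProgress.
 */
final class SlowstartCheck {
    private SlowstartCheck() {}

    /**
     * Configuration rule quoted in the report:
     * slowstart * 100 < 100 - maxMapFailuresPercent.
     *
     * @param slowstart value of mapred.reduce.slowstart.completed.maps (0.0 to 1.0)
     * @param maxMapFailuresPercent value of mapred.max.map.failures.percent (0 to 100)
     */
    static boolean reducesGuaranteedToStart(double slowstart, int maxMapFailuresPercent) {
        return slowstart * 100.0 < 100.0 - maxMapFailuresPercent;
    }

    /**
     * Assumed scheduling decision with the proposed fix applied:
     * reduces may start once every map task has finished (successfully or not),
     * even if the slowstart threshold was never reached.
     */
    static boolean shouldScheduleReduces(int totalMaps, int succeededMaps,
                                         int failedMaps, double slowstart) {
        int finishedMaps = succeededMaps + failedMaps;
        if (finishedMaps >= totalMaps) {
            return true; // proposed fix: no map task is left to wait for
        }
        // Assumed threshold: number of successful maps required by slowstart.
        int required = (int) Math.ceil(slowstart * totalMaps);
        return succeededMaps >= required;
    }
}
```

      Under this model, a job configured with a slowstart of 0.8 and 30 percent allowed map failures violates the rule, so reducesGuaranteedToStart(0.8, 30) returns false. With 100 maps of which 60 succeed and 40 fail, shouldScheduleReduces(100, 60, 40, 0.8) still returns true because of the "all maps finished" check, which is the behavior the second suggestion asks for.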

        Activity

        Vincent Behar created issue
        Jason Lowe added a comment -

        I believe this is a duplicate of MAPREDUCE-2129 which was fixed in 1.1.0.

        Vincent Behar added a comment -

        Yes, it is a duplicate of MAPREDUCE-2129 (sorry, I didn't find it).

        The fix has been applied to branch-1 and branch-1.1, but not to branch-1.0.
        Merging r1358233 (from branch-1) into branch-1.0 should be enough.

        Thanks

        Jason Lowe added a comment -

        Adding Matt Foley who is the release manager for Hadoop 1.x. He can comment on whether there are plans for another 1.0.x release and if MAPREDUCE-2129 would be a good candidate.


          People

          • Assignee:
            Unassigned
            Reporter:
            Vincent Behar
          • Votes:
            0
            Watchers:
            6
