Hadoop Map/Reduce / MAPREDUCE-1521

Protection against incorrectly configured reduces

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.22.1
    • Component/s: jobtracker
    • Labels: None

      Description

      We've seen a fair number of instances where naive users process huge data-sets (>10 TB) with a badly mis-configured number of reduces, e.g. a single reduce.

      This is a significant problem on large clusters: each attempt of the reduce takes a long time to shuffle its input and then runs into problems such as exhausting local disk space, and the job only fails after 4 such attempts.

      Proposal: come up with heuristics/configs to fail such jobs early.

      Thoughts?
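      One possible shape for such a heuristic, sketched below: estimate the shuffle input each reduce would receive (assuming map output is roughly proportional to job input and spread evenly across reduces) and fail the job up front if that estimate exceeds a configurable limit. This is only an illustration of the idea in the description; the class and method names (`ReduceLimitCheck`, `shouldFailEarly`, `DEFAULT_LIMIT_BYTES`) are hypothetical, not Hadoop's actual API.

```java
// Illustrative sketch of an early-fail heuristic for mis-configured
// reduce counts. All names and the 10 GB default are assumptions, not
// part of the actual MAPREDUCE-1521 patch.
public class ReduceLimitCheck {

    /** Illustrative default cap on estimated input per reduce: 10 GB. */
    static final long DEFAULT_LIMIT_BYTES = 10L * 1024 * 1024 * 1024;

    /**
     * Estimate the shuffle input each reduce would receive, assuming map
     * output is roughly the size of the job input and is divided evenly
     * across the configured number of reduces.
     */
    static long estimatedInputPerReduce(long totalInputBytes, int numReduces) {
        if (numReduces <= 0) {
            throw new IllegalArgumentException("numReduces must be > 0");
        }
        return totalInputBytes / numReduces;
    }

    /**
     * Decide at submission time whether to fail the job instead of letting
     * each reduce attempt shuffle an enormous input and fill local disk.
     */
    static boolean shouldFailEarly(long totalInputBytes, int numReduces,
                                   long limitBytes) {
        return estimatedInputPerReduce(totalInputBytes, numReduces) > limitBytes;
    }

    public static void main(String[] args) {
        long tenTb = 10L * 1024 * 1024 * 1024 * 1024;
        // 10 TB into a single reduce: fail fast rather than burn 4 attempts.
        System.out.println(shouldFailEarly(tenTb, 1, DEFAULT_LIMIT_BYTES));
        // The same input over 4096 reduces (~2.5 GB each) passes the check.
        System.out.println(shouldFailEarly(tenTb, 4096, DEFAULT_LIMIT_BYTES));
    }
}
```

      Note that comparing `totalInput / numReduces` against the limit, rather than `totalInput` against `limit * numReduces`, avoids long overflow for large reduce counts; the attachment name resourcestimator-overflow.txt suggests overflow was a concern in the size-estimation code.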

        Attachments

        1. resourcestimator-overflow.txt
          1 kB
          Todd Lipcon
        2. resourceestimator-threshold.txt
          2 kB
          Todd Lipcon
        3. MAPREDUCE-1521-trunk.patch
          13 kB
          Mahadev konar
        4. MAPREDUCE-1521-0.20-yahoo.patch
          3 kB
          Mahadev konar
        5. MAPREDUCE-1521-0.20-yahoo.patch
          9 kB
          Mahadev konar
        6. MAPREDUCE-1521-0.20-yahoo.patch
          11 kB
          Mahadev konar
        7. MAPREDUCE-1521-0.20-yahoo.patch
          11 kB
          Mahadev konar
        8. MAPREDUCE-1521-0.20-yahoo.patch
          12 kB
          Mahadev konar

              People

              • Assignee: Mahadev konar (mahadev)
              • Reporter: Arun C Murthy (acmurthy)
              • Votes: 0
              • Watchers: 13

                Dates

                • Created:
                • Updated:
                • Resolved: