Hadoop Map/Reduce / MAPREDUCE-1521

Protection against incorrectly configured reduces

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.22.1
    • Component/s: jobtracker
    • Labels: None

    Description

      We've seen a fair number of instances where naive users process huge data sets (>10 TB) with a badly misconfigured number of reduces, e.g. a single reduce.

      This is a significant problem on large clusters: each attempt of the reduce takes a long time to shuffle and then runs into problems such as running out of local disk space, and it takes 4 such attempts before the job finally fails.

      Proposal: Come up with heuristics/configs to fail such jobs early.

      Thoughts?
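
      One possible shape for such a heuristic, as a minimal sketch only: estimate the total map output from the maps that have already completed, divide it by the configured number of reduces, and fail the job if the per-reduce share exceeds a configurable ceiling. The class name, method names, and the limit parameter below are hypothetical and are not taken from any attached patch or from the Hadoop API.

```java
/**
 * Illustrative sketch of an "oversized reduce" check.
 * All names here are hypothetical; this is not Hadoop's implementation.
 */
final class ReduceInputCheck {

    private ReduceInputCheck() {}

    /**
     * Scales the output seen so far from completed maps up to the whole input.
     * Uses double arithmetic so that multiplying two large byte counts
     * cannot overflow a long.
     * Returns -1 when no map has completed yet (no estimate available).
     */
    static long estimateTotalMapOutput(long totalInputBytes,
                                       long completedMapInputBytes,
                                       long completedMapOutputBytes) {
        if (completedMapInputBytes <= 0) {
            return -1L;
        }
        double ratio = (double) completedMapOutputBytes / completedMapInputBytes;
        return (long) (ratio * totalInputBytes);
    }

    /**
     * Returns true if the estimated input of a single reduce exceeds the limit.
     * A negative limit (assumed convention) disables the check, as does an
     * unavailable estimate.
     */
    static boolean exceedsLimit(long estimatedMapOutputBytes,
                                int numReduces,
                                long limitBytesPerReduce) {
        if (limitBytesPerReduce < 0 || estimatedMapOutputBytes < 0) {
            return false;
        }
        int reduces = Math.max(1, numReduces); // guard against division by zero
        long perReduce = estimatedMapOutputBytes / reduces;
        return perReduce > limitBytesPerReduce;
    }
}
```

      Under these assumptions, a job with roughly 10 TB of estimated map output and a single reduce would be rejected by a 10 GB per-reduce limit, while the same job with 2000 reduces would pass. Evaluating the check once a few maps have completed would let the job fail before the first reduce attempt spends hours shuffling.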

      Attachments

        1. resourcestimator-overflow.txt (1 kB, Todd Lipcon)
        2. resourceestimator-threshold.txt (2 kB, Todd Lipcon)
        3. MAPREDUCE-1521-trunk.patch (13 kB, Mahadev Konar)
        4. MAPREDUCE-1521-0.20-yahoo.patch (12 kB, Mahadev Konar)
        5. MAPREDUCE-1521-0.20-yahoo.patch (11 kB, Mahadev Konar)
        6. MAPREDUCE-1521-0.20-yahoo.patch (11 kB, Mahadev Konar)
        7. MAPREDUCE-1521-0.20-yahoo.patch (9 kB, Mahadev Konar)
        8. MAPREDUCE-1521-0.20-yahoo.patch (3 kB, Mahadev Konar)

            People

              Assignee: Mahadev Konar
              Reporter: Arun Murthy
              Votes: 0
              Watchers: 13