Hadoop Map/Reduce: MAPREDUCE-5323

Min Spills For Combine Ignored


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Not A Problem
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: task
    • Labels: None

    Description

      We've observed for some time that combiners always run when specified. However, there is a config property called mapreduce.map.combine.minspills, which implies that a developer or administrator should be able to control when combiners are invoked.

      I spelunked into the code and found this gem in MapTask.java:

      if (combinerRunner == null || numSpills < minSpillsForCombine) {
        Merger.writeFile(kvIter, writer, reporter, job);
      } else {
        combineCollector.setWriter(writer);
        combinerRunner.combine(kvIter, combineCollector);
      }
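To make the control flow concrete, here is a small Java model of where a map task may apply the combiner. It is a sketch under stated assumptions, not Hadoop's implementation: the class `MergeDecision`, its methods, and the boolean `hasCombiner` (standing in for `combinerRunner != null`) are hypothetical. It also assumes, based on our reading of MapTask, that a configured combiner is applied to every spill, and that mapreduce.map.combine.minspills (assumed default 3) only gates an additional combine pass during the final merge of spill files, which is where the quoted condition lives.

```java
// Hypothetical model of the combiner decisions in a map task; not Hadoop code.
public class MergeDecision {

    // Assumed default for mapreduce.map.combine.minspills.
    static final int DEFAULT_MIN_SPILLS_FOR_COMBINE = 3;

    // Spill time (assumption): the combiner runs on every spill whenever one is configured.
    static boolean runsCombinerAtSpill(boolean hasCombiner) {
        return hasCombiner;
    }

    // Final merge: mirrors the quoted condition. The merge writes spills straight
    // through when there is no combiner OR there are too few spills; otherwise it
    // runs the combiner again.
    static boolean runsCombinerAtMerge(boolean hasCombiner, int numSpills, int minSpillsForCombine) {
        boolean writeDirectly = !hasCombiner || numSpills < minSpillsForCombine;
        return !writeDirectly;
    }

    public static void main(String[] args) {
        int min = DEFAULT_MIN_SPILLS_FOR_COMBINE;
        for (int spills = 1; spills <= 4; spills++) {
            System.out.printf("spills=%d spill-combine=%b merge-combine=%b%n",
                    spills,
                    runsCombinerAtSpill(true),
                    runsCombinerAtMerge(true, spills, min));
        }
    }
}
```

Run as written, `main` prints one line per spill count from 1 to 4; under the assumed default of 3, `merge-combine` becomes true at 3 spills while `spill-combine` stays true throughout. In this model a configured combiner always runs at least once, and minspills only decides whether it runs a second time during the merge.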

      That looks way buggy to me. If (A || B) is made false by A, then B is never executed. I spelunked around the code some more, and it looks like combinerRunner is never null except on reflection failure. So it looks like the intention is for minSpillsForCombine to be respected, but due to this logic error it's totally ignored.
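The argument above depends on how Java evaluates `||`, so it is worth checking directly. In Java, `A || B` skips `B` only when `A` is `true`; when `A` is `false`, `B` is always evaluated. The sketch below uses hypothetical helpers `a` and `b` that count how often they are called; they stand in for `combinerRunner == null` and `numSpills < minSpillsForCombine` respectively.

```java
// Demonstrates Java's short-circuit evaluation of ||.
public class ShortCircuitDemo {
    static int aCalls = 0;
    static int bCalls = 0;

    // Hypothetical stand-in for "combinerRunner == null".
    static boolean a(boolean value) {
        aCalls++;
        return value;
    }

    // Hypothetical stand-in for "numSpills < minSpillsForCombine".
    static boolean b(boolean value) {
        bCalls++;
        return value;
    }

    static void reset() {
        aCalls = 0;
        bCalls = 0;
    }

    public static void main(String[] args) {
        // A is false: the result depends on B, so B must be evaluated.
        reset();
        boolean r1 = a(false) || b(true);
        System.out.println("A=false: result=" + r1 + ", B evaluated " + bCalls + " time(s)");

        // A is true: the result is already true, so B is skipped.
        reset();
        boolean r2 = a(true) || b(false);
        System.out.println("A=true: result=" + r2 + ", B evaluated " + bCalls + " time(s)");
    }
}
```

So when a combiner is configured (the first operand is `false`), the quoted condition reduces to `numSpills < minSpillsForCombine`, and that comparison is evaluated every time the merge runs.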


          People

            Assignee: Unassigned
            Reporter: Jeff Bean (jwfbean)
            Votes: 0
            Watchers: 2
