Hadoop Common / HADOOP-591

Reducer sort should even out the pass factors across different merge passes


    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      When multi-pass merging is needed during sort, the current sort implementation in the SequenceFile class selects pass factors greedily, which results in uneven pass factors across passes. For example, suppose the merge factor is 100 (the default) and there are 101 segments to be merged. The current implementation first merges the first 100 segments into one file and then merges that large output file with the last segment, using a pass factor of 2. It would be better to use a pass factor of 11 in the first pass and a pass factor of 10 in the second pass.
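      The two schemes can be compared with a small Java sketch. It is hypothetical and is not the SequenceFile code: the class PassFactors, its methods, and the balancing rule (keep the minimum number of passes, pick the smallest per-pass fan-in k with k^passes >= segments, then split each pass into evenly sized groups) are assumptions chosen to reproduce the 11/10 example. That example is read here as merging the 101 segments into 10 intermediate files (one built from 11 segments, nine from 10 segments each) and then merging those 10 files.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch, not the actual SequenceFile.Sorter code: compares the
 * greedy pass-factor choice with an assumed balanced choice.
 */
public class PassFactors {

    /** Greedy: each pass merges as many segments as the merge factor allows. */
    public static List<Integer> greedy(int segments, int factor) {
        if (factor < 2) throw new IllegalArgumentException("factor must be >= 2");
        List<Integer> passes = new ArrayList<>();
        int current = segments;
        while (current > 1) {
            int k = Math.min(current, factor);
            passes.add(k);
            current = current - k + 1; // k inputs become one output file
        }
        return passes;
    }

    /** Returns true if base^exp >= target, stopping early to avoid overflow. */
    static boolean powAtLeast(long base, int exp, long target) {
        long result = 1;
        for (int i = 0; i < exp; i++) {
            result *= base;
            if (result >= target) return true;
        }
        return result >= target;
    }

    /**
     * Assumed balanced rule: same minimum number of passes as greedy, but the
     * smallest fan-in k that still finishes in that many passes, with each
     * pass split into groups whose sizes differ by at most one.
     */
    public static List<Integer> balanced(int segments, int factor) {
        if (factor < 2) throw new IllegalArgumentException("factor must be >= 2");
        List<Integer> passes = new ArrayList<>();
        if (segments <= 1) return passes;
        // Minimum number of passes allowed by the merge factor.
        int numPasses = 0;
        long capacity = 1;
        while (capacity < segments) {
            capacity *= factor;
            numPasses++;
        }
        // Smallest fan-in k with k^numPasses >= segments (k <= factor).
        int k = 2;
        while (!powAtLeast(k, numPasses, segments)) k++;
        int current = segments;
        while (current > 1) {
            int groups = (current + k - 1) / k;            // output files of this pass
            int largest = (current + groups - 1) / groups; // biggest group after an even split
            passes.add(largest);
            current = groups;
        }
        return passes;
    }

    public static void main(String[] args) {
        System.out.println("greedy:   " + greedy(101, 100));
        System.out.println("balanced: " + balanced(101, 100));
    }
}
```

      Running main prints "greedy:   [100, 2]" and "balanced: [11, 10]". Both schemes use two passes here; the balanced one simply keeps the fan-in of each pass close to the same value.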

              People

              • Assignee:
                ddas Devaraj Das
                Reporter:
                runping Runping Qi
              • Votes:
                0
                Watchers:
                0
