Hadoop Common / HADOOP-2120

dfs -getMerge does not do what it says it does

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 0.14.3
    • Fix Version/s: None
    • Component/s: documentation, fs
    • Environment: All

      Description

      dfs -getMerge, which calls FileUtil.copyMerge, is documented with this javadoc:

         * Get all the files in the directories that match the source file pattern
         * and merge and sort them to only one file on local fs
         * srcf is kept.
      

      However, it only concatenates the set of input files, rather than merging them in sorted order.
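      To make the distinction concrete, here is a minimal in-memory sketch in Java. It models each input file as a list of lines and assumes each file is already sorted; the class and method names (`CopyMergeSketch`, `concat`, `mergeSorted`) are hypothetical and not part of Hadoop's API. `concat` mirrors the behavior described above, while `mergeSorted` is a standard k-way merge, one plausible way a sorted copyMerge could combine presorted inputs.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch contrasting plain concatenation with a sorted k-way merge. */
public class CopyMergeSketch {

    /** What the report says copyMerge does: append each input in turn, unchanged. */
    public static List<String> concat(List<List<String>> inputs) {
        List<String> out = new ArrayList<>();
        for (List<String> file : inputs) {
            out.addAll(file);
        }
        return out;
    }

    /** A k-way merge; assumes (hypothetically) that every input list is already sorted. */
    public static List<String> mergeSorted(List<List<String>> inputs) {
        // Each heap entry is {fileIndex, lineIndex}, ordered by the line it points at.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
                Comparator.comparing((int[] c) -> inputs.get(c[0]).get(c[1])));
        for (int f = 0; f < inputs.size(); f++) {
            if (!inputs.get(f).isEmpty()) {
                heap.add(new int[] {f, 0});
            }
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] c = heap.poll();               // smallest remaining line across all files
            List<String> file = inputs.get(c[0]);
            out.add(file.get(c[1]));
            if (c[1] + 1 < file.size()) {        // advance the cursor in that file
                heap.add(new int[] {c[0], c[1] + 1});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> inputs = List.of(
                List.of("apple", "melon"),
                List.of("banana", "cherry"));
        System.out.println(concat(inputs));      // [apple, melon, banana, cherry]
        System.out.println(mergeSorted(inputs)); // [apple, banana, cherry, melon]
    }
}
```

      The heap holds one cursor per input, so each output line costs O(log k) for k files; that is why a k-way merge, rather than re-sorting the concatenated output, is the usual way to combine already-sorted files. If the inputs were not individually sorted, each would first need to be sorted.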

      Ideally, copyMerge should be equivalent to a map-reduce job with IdentityMapper, IdentityReducer, and numReducers = 1, whose single output file holds every input record sorted by key. However, doing this without a map-reduce job has an advantage: a job with only one reducer leaves most of the cluster idle during the reduce phase, so avoiding the job improves cluster utilization.
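      The identity-job equivalence can be modeled in a few lines of plain Java. This is a hypothetical in-memory sketch, not Hadoop code: `IdentityJobModel` and `run` are illustrative names, and records are simplified to string key/value pairs compared as strings. It only shows the shape of the output a single reducer would write: every input record exactly once, in key order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical model of an identity map-reduce job with one reducer (not Hadoop code). */
public class IdentityJobModel {

    public static List<Map.Entry<String, String>> run(
            List<List<Map.Entry<String, String>>> inputFiles) {
        List<Map.Entry<String, String>> all = new ArrayList<>();
        // Map phase: the identity mapper passes every record through unchanged.
        for (List<Map.Entry<String, String>> file : inputFiles) {
            all.addAll(file);
        }
        // Shuffle/sort: records reach the single reducer ordered by key.
        // List.sort is stable, so equal keys keep their input order in this model.
        all.sort(Map.Entry.comparingByKey());
        // Reduce phase: the identity reducer writes each record as-is to one output.
        return all;
    }

    public static void main(String[] args) {
        List<List<Map.Entry<String, String>>> files = List.of(
                List.of(Map.entry("b", "2"), Map.entry("a", "1")),
                List.of(Map.entry("c", "3"), Map.entry("a", "4")));
        System.out.println(run(files)); // [a=1, a=4, b=2, c=3]
    }
}
```

      Hadoop does not guarantee the order of values within a single key, so a real job could emit a=4 before a=1; the stable sort here is just one valid outcome.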

    People

    • Assignee: Unassigned
    • Reporter: Milind Bhandarkar (milindb)
    • Votes: 1
    • Watchers: 4