Hadoop Common / HADOOP-2120

dfs -getMerge does not do what it says it does

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 0.14.3
    • Fix Version/s: None
    • Component/s: documentation, fs
    • Labels:
    • Environment: All

      Description

      The implementation of dfs -getMerge, which calls FileUtil.copyMerge, carries this javadoc:

      /**
       * Get all the files in the directories that match the source file pattern
       * and merge and sort them to only one file on local fs
       * srcf is kept.
       */
      

      However, it only concatenates the set of input files, rather than merging them in sorted order.
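      To make the mismatch concrete, here is a minimal Java sketch; it is not Hadoop code. It assumes each input file can be modeled as an in-memory list of lines, and the class GetMergeSemantics and its methods are hypothetical names. concatenate() models what copyMerge is reported to do above, while mergeSorted() models what the javadoc's "merge and sort" wording promises.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration, not Hadoop code: each "file" is a list of lines.
class GetMergeSemantics {

    // Reported copyMerge behavior: append each file's contents in turn,
    // keeping both the file order and the line order within each file.
    static List<String> concatenate(List<List<String>> files) {
        List<String> out = new ArrayList<>();
        for (List<String> file : files) {
            out.addAll(file);
        }
        return out;
    }

    // What "merge and sort them to only one file" suggests: every line
    // from every file, in one globally sorted sequence.
    static List<String> mergeSorted(List<List<String>> files) {
        List<String> out = concatenate(files);
        Collections.sort(out);
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> files = Arrays.asList(
                Arrays.asList("b", "d"),
                Arrays.asList("a", "c"));
        System.out.println(concatenate(files)); // [b, d, a, c]
        System.out.println(mergeSorted(files)); // [a, b, c, d]
    }
}
```

      The two results coincide only when the inputs happen to be listed and stored in already-sorted order, which is why the behavior can look correct on small tests.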

      Ideally, copyMerge should be equivalent to a map-reduce job that uses IdentityMapper and IdentityReducer with numReducers = 1, which produces a single output file sorted by key. However, there are advantages to not running it as a map-reduce job: a single-reducer job funnels the entire reduce phase through one task, so avoiding the job improves cluster utilization during that phase.
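      The "ideal" semantics can be modeled in memory. The sketch below is a hypothetical Java illustration that does not use the Hadoop API: each input is a list of key/value records, the identity map output of each input is sorted by key (standing in for the map-side sort), and the single reducer performs a k-way merge of those sorted runs with a PriorityQueue before passing every record through unchanged. SingleReducerMerge, KV, mapAndSort and identityJob are made-up names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical in-memory model of an identity map-reduce job with one
// reducer; it does not use the Hadoop API.
class SingleReducerMerge {

    static final class KV {
        final String key;
        final String value;
        KV(String key, String value) { this.key = key; this.value = value; }
        @Override public String toString() { return key + "\t" + value; }
    }

    // Map side: the identity map emits records unchanged; each input's
    // output is then sorted by key.
    static List<KV> mapAndSort(List<KV> input) {
        List<KV> run = new ArrayList<>(input);
        run.sort(Comparator.comparing((KV kv) -> kv.key));
        return run;
    }

    // Reduce side with a single reducer: k-way merge of all sorted runs
    // into one key-ordered list; the identity reduce keeps every record.
    static List<KV> identityJob(List<List<KV>> inputs) {
        List<List<KV>> runs = new ArrayList<>();
        for (List<KV> input : inputs) {
            runs.add(mapAndSort(input));
        }
        // Heap entries are {runIndex, positionInRun}, ordered by current key.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
                Comparator.comparing((int[] c) -> runs.get(c[0]).get(c[1]).key));
        for (int r = 0; r < runs.size(); r++) {
            if (!runs.get(r).isEmpty()) {
                heap.add(new int[] {r, 0});
            }
        }
        List<KV> output = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] c = heap.poll();
            List<KV> run = runs.get(c[0]);
            output.add(run.get(c[1]));
            // Advance within the same run if it has more records.
            if (c[1] + 1 < run.size()) {
                heap.add(new int[] {c[0], c[1] + 1});
            }
        }
        return output;
    }

    public static void main(String[] args) {
        List<List<KV>> inputs = Arrays.asList(
                Arrays.asList(new KV("d", "4"), new KV("a", "1")),
                Arrays.asList(new KV("c", "3"), new KV("b", "2")));
        // Prints the records for keys a, b, c, d in that order, one per line.
        for (KV kv : identityJob(inputs)) {
            System.out.println(kv);
        }
    }
}
```

      What "sorted" means depends on the keys: with TextInputFormat, for example, the key IdentityMapper passes through is each line's byte offset (a LongWritable), so such a job would order records by offset rather than by line content.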

        Activity

        Uma Maheswara Rao G made changes:
          Status: Open → Resolved
          Resolution: set to Not A Problem
        Eli Collins made changes:
          Labels: added newbie
          Component/s: added documentation
        Nigel Daley made changes:
          Fix Version/s: removed 0.16.0
        Arun C Murthy made changes:
          Component/s: mapred → fs
        Milind Bhandarkar created issue

          People

          • Assignee: Unassigned
          • Reporter: Milind Bhandarkar
          • Votes: 1
          • Watchers: 3

            Dates

            • Created:
              Updated:
              Resolved:
