Details
Bug
Status: Resolved
Major
Resolution: Not A Problem
0.14.3
None
All
Description
dfs -getMerge, which calls FileUtil.copyMerge, contains this javadoc:
"Get all the files in the directories that match the source file pattern and merge and sort them to only one file on local fs. srcf is kept."
However, it only concatenates the set of input files, rather than merging them in sorted order.
Ideally, copyMerge would be equivalent to a map-reduce job with IdentityMapper and IdentityReducer and numReducers = 1. However, not running it as a map-reduce job has an advantage: a job with a single reducer would leave most of the cluster idle during the reduce phase, so avoiding it improves cluster utilization.
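To make the difference concrete, here is a minimal sketch contrasting plain concatenation with a merge in sorted order. It is not the Hadoop implementation: the class MergeSketch and both methods are hypothetical, inputs are in-memory lists of lines rather than files, and the sorted merge assumes each input is already sorted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only; none of these names exist in Hadoop.
public class MergeSketch {

    // Concatenation: append each input in turn, which is the behavior
    // the issue attributes to copyMerge.
    static List<String> concatenate(List<List<String>> inputs) {
        List<String> out = new ArrayList<>();
        for (List<String> input : inputs) {
            out.addAll(input);
        }
        return out;
    }

    // Sorted merge: a k-way merge, ASSUMING each input is already sorted.
    static List<String> sortedMerge(List<List<String>> inputs) {
        // Each heap entry is {input index, position within that input},
        // ordered by the line it points to.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
            (a, b) -> inputs.get(a[0]).get(a[1])
                            .compareTo(inputs.get(b[0]).get(b[1])));
        for (int i = 0; i < inputs.size(); i++) {
            if (!inputs.get(i).isEmpty()) {
                heap.add(new int[] {i, 0});
            }
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] head = heap.poll();
            List<String> src = inputs.get(head[0]);
            out.add(src.get(head[1]));
            // Advance within the input the smallest line came from.
            if (head[1] + 1 < src.size()) {
                heap.add(new int[] {head[0], head[1] + 1});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> parts = List.of(List.of("a", "d"), List.of("b", "c"));
        System.out.println(concatenate(parts)); // [a, d, b, c]
        System.out.println(sortedMerge(parts)); // [a, b, c, d]
    }
}
```

With two sorted part files, concatenation preserves each file's order but not a global order, while the k-way merge yields one globally sorted sequence, which is what the javadoc's "merge and sort" wording suggests.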