Details
- Improvement
- Status: Open
- Not a Priority
- Resolution: Unresolved
- None
- None
Description
In the open method of HadoopOutputFormatBase, the config option mapreduce.output.basename is set to "tmp", and no documentation mentions this change.
By default, Hadoop names its output files on HDFS in the format "part-x-yyyyy", where:
- x is either 'm' or 'r', depending on whether the file was written by a map task (in a map-only job) or a reduce task
- yyyyy is the mapper or reducer task number (zero-based)
The keyword "part" is used in many places in users' business logic to match HDFS file names, so changing the base name to "tmp" can silently break that logic. I suggest either reverting this config option or documenting the change.
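To illustrate the impact, here is a minimal sketch of the kind of user-side matching logic that stops working once the base name changes. It assumes the "basename-x-yyyyy" format described above with a five-digit task number. The class PartFileNames and its methods are hypothetical helpers written for this example; they are not part of the Hadoop or Flink API.

```java
import java.util.regex.Pattern;

public class PartFileNames {
    // Pattern of the sort user code might use to pick up job output files.
    // Assumption: the default "part-x-yyyyy" naming described in this issue.
    static final Pattern PART_FILE = Pattern.compile("part-[mr]-\\d{5}");

    // Hypothetical helper that mimics the naming scheme for illustration only:
    // <basename>-<m|r>-<zero-padded task number>.
    static String outputName(String basename, char taskType, int taskNumber) {
        return String.format("%s-%c-%05d", basename, taskType, taskNumber);
    }

    // Returns true if the file name matches the default "part" naming.
    static boolean looksLikePartFile(String fileName) {
        return PART_FILE.matcher(fileName).matches();
    }

    public static void main(String[] args) {
        String defaultName = outputName("part", 'r', 3);
        String changedName = outputName("tmp", 'r', 3);
        // With the default base name the file is recognized.
        System.out.println(defaultName + " -> " + looksLikePartFile(defaultName));
        // With the base name set to "tmp" the same logic no longer matches.
        System.out.println(changedName + " -> " + looksLikePartFile(changedName));
    }
}
```

Any downstream job, glob, or cleanup script that filters on the "part" prefix would skip the "tmp"-named files in the same way, which is why the change needs to be either reverted or documented.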