Status: Open
Priority: Not a Priority
Resolution: Unresolved
In the open method of HadoopOutputFormatBase, the config option mapreduce.output.basename is set to "tmp", and no documentation mentions this change.
By default, Hadoop names output files on HDFS using the format "part-x-yyyyy", where:
- x is either 'm' or 'r', depending on whether the job was a map-only job or had a reduce phase
- yyyyy is the mapper or reducer task number (zero-based)
The keyword "part" is used in many places in users' business logic to match HDFS file names. So I suggest either reverting this config option or documenting the change.
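
To illustrate the kind of user-side matching that could be affected, here is a minimal sketch of a file-name check written against the default "part-x-yyyyy" format. The helper name parse_part_file and the sample name "tmp-r-00012" are hypothetical; the latter only assumes that the "tmp" basename replaces "part" in the file name and is not taken from actual job output.

```python
import re

# Default output name described above: "part-x-yyyyy",
# where x is 'm' or 'r' and yyyyy is the zero-based task number.
PART_FILE = re.compile(r"part-(?P<kind>[mr])-(?P<task>\d{5})")


def parse_part_file(name):
    """Return (kind, task_number) for a default-named output file, else None.

    Hypothetical helper standing in for user business logic that
    relies on the "part" keyword.
    """
    match = PART_FILE.fullmatch(name)
    if match is None:
        return None
    return match.group("kind"), int(match.group("task"))


print(parse_part_file("part-m-00000"))  # matches the default format
print(parse_part_file("part-r-00012"))  # matches the default format
# Assumed name if the basename "tmp" replaces "part"; no longer matches.
print(parse_part_file("tmp-r-00012"))
```

Logic like this silently skips every output file once the basename is no longer "part", which is why the change should at least be documented.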