Flink / FLINK-12003

Revert the config option about mapreduce.output.basename in HadoopOutputFormatBase


Details

    Description

      In the open method of HadoopOutputFormatBase, the config option mapreduce.output.basename was changed to "tmp", and no documentation mentions this change.

      By default, Hadoop names its output files on HDFS in the format "part-x-yyyyy", where x and yyyyy mean:

      • x is either 'm' or 'r', depending on whether the job was a map-only job or had a reduce phase
      • yyyyy is the mapper or reducer task number (zero based)
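
      To make the naming scheme above concrete, here is a minimal sketch that builds a file name from a basename, the 'm'/'r' marker and a zero-based task number. The class and method names are hypothetical and only illustrate the format; this is not Hadoop's or Flink's actual code.

```java
public class PartFileNames {
    // "part" is the basename described in this issue as the default.
    public static final String DEFAULT_BASENAME = "part";

    // Builds "<basename>-<x>-<yyyyy>": x is 'm' for a map-only job, 'r' otherwise;
    // yyyyy is the zero-based task number, zero-padded to five digits.
    public static String fileName(String basename, boolean mapOnly, int taskNumber) {
        char x = mapOnly ? 'm' : 'r';
        return String.format("%s-%c-%05d", basename, x, taskNumber);
    }

    public static void main(String[] args) {
        // Name under the default basename.
        System.out.println(fileName(DEFAULT_BASENAME, false, 0));
        // Name under the same scheme if the basename is set to "tmp".
        System.out.println(fileName("tmp", false, 0));
    }
}
```

      Under this sketch, changing only the basename changes the prefix of every output file name, which is the part user code tends to match on.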

       

      The keyword "part" is used in many places in users' business logic to match HDFS file names. So I suggest either reverting this config option or documenting the change.
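
      As an illustration of the kind of user logic that depends on the "part" keyword, the following sketch filters a directory listing with a pattern for the default naming scheme. The class name, the pattern and the sample file names (including "tmp-r-00000") are assumptions made for this example, not taken from any real job.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class PartFileFilter {
    // Matches names of the form part-m-00000 or part-r-00000.
    static final Pattern PART_FILE = Pattern.compile("part-[mr]-\\d{5}");

    // Keeps only the names that follow the default "part-x-yyyyy" scheme.
    public static List<String> selectPartFiles(List<String> names) {
        return names.stream()
                .filter(n -> PART_FILE.matcher(n).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical listing: one default-named file, one marker file,
        // and one file written under a "tmp" basename.
        List<String> listing = Arrays.asList("part-r-00000", "_SUCCESS", "tmp-r-00000");
        System.out.println(selectPartFiles(listing));
    }
}
```

      A filter like this silently skips any file whose name no longer starts with "part", so the output appears to be missing rather than failing loudly.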

       

          People

            Assignee: Unassigned
            Reporter: vinoyang (yanghua)
