Hadoop Common / HADOOP-3258

FileOutputFormat should have a method to create custom files under the outputdir with a unique name per task to avoid name collision


Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • None
    • None
    • None
    • None
    • all

    • Release Note: FileOutputFormat provides a new static method, getPathForCustomFile, that creates a custom file name under the outputdir, namespaced with the task type and task ID (e.g. myfile-r-00001).

    Description

      Currently, if M/R code creates a file, it is the responsibility of that code to avoid file name collisions between different tasks.

      Hadoop should provide an API that creates unique file names based on the task type (map or reduce) and the task ID, similar to how the output files (part-#####) are created.

      The proposed patch adds two static methods to FileOutputFormat:

      {noformat}
      public static String getUniqueName(JobConf conf, String name);
      public static Path getPathForCustomFile(JobConf conf, String name);
      {noformat}

      The first appends the task type and task ID to the given name.

      The second returns a Path to a file in the task's working outputdir, using a file name namespaced by the first method.
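A minimal, Hadoop-free sketch of the naming scheme described above, assuming the suffix is `-m-` or `-r-` followed by a five-digit zero-padded partition number, as in the `myfile-r-00001` example and the `part-#####` convention. The class `UniqueNameSketch`, its method names, and its parameters are hypothetical: the proposed methods take a `JobConf` and derive the task type and ID from it, whereas this sketch takes them as explicit arguments, and `java.nio.file.Path` stands in for Hadoop's own `Path` class.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

/**
 * Hypothetical sketch of the per-task naming scheme proposed in HADOOP-3258.
 * Task type and partition are passed explicitly here; the proposed
 * FileOutputFormat methods would read them from the task's JobConf instead.
 */
public final class UniqueNameSketch {

    private UniqueNameSketch() {}

    /** Appends "-m-NNNNN" (map) or "-r-NNNNN" (reduce) to the given name. */
    public static String uniqueName(boolean isMap, int partition, String name) {
        if (partition < 0) {
            throw new IllegalArgumentException("partition must be non-negative: " + partition);
        }
        String taskType = isMap ? "m" : "r";
        // Locale.ROOT keeps the digits ASCII regardless of the default locale.
        return name + "-" + taskType + "-" + String.format(Locale.ROOT, "%05d", partition);
    }

    /** Resolves the unique name against the task's working output directory. */
    public static Path pathForCustomFile(Path workOutputDir, boolean isMap, int partition, String name) {
        return workOutputDir.resolve(uniqueName(isMap, partition, name));
    }

    public static void main(String[] args) {
        System.out.println(uniqueName(false, 1, "myfile")); // prints myfile-r-00001
        System.out.println(pathForCustomFile(Paths.get("work"), true, 42, "stats"));
    }
}
```

Because every task has a distinct (type, partition) pair, two tasks asking for `myfile` get different names. The scheme alone does not separate multiple attempts of the same task; placing the file under the task's working output directory rather than the final outputdir is what is assumed to handle that case.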

      Attachments

        1. patch-3258.txt
          7 kB
          Alejandro Abdelnur
        2. patch-3258.txt
          8 kB
          Alejandro Abdelnur


          People

            Assignee: tucu00 Alejandro Abdelnur
            Reporter: tucu00 Alejandro Abdelnur
            Votes: 0
            Watchers: 1

            Dates

              Created:
              Updated:
              Resolved:
