  1. Hadoop Common
  2. HADOOP-3041

Within a task, the value returned by JobConf.getOutputPath() is modified

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.16.1
    • Fix Version/s: 0.17.0
    • Component/s: None
    • Labels:
      None
    • Environment:

      all

    • Hadoop Flags:
      Reviewed
    • Release Note:
      1. Deprecates JobConf.setOutputPath and JobConf.getOutputPath. JobConf.getOutputPath() still returns the same value that it used to return.
      2. Deprecates OutputFormatBase and adds FileOutputFormat. Existing output formats that extended OutputFormatBase now extend FileOutputFormat.
      3. Adds the following APIs to FileOutputFormat:
      public static void setOutputPath(JobConf conf, Path outputDir); // sets mapred.output.dir
      public static Path getOutputPath(JobConf conf); // gets mapred.output.dir
      public static Path getWorkOutputPath(JobConf conf); // gets mapred.work.output.dir
      4. Also adds static void setWorkOutputPath(JobConf conf, Path outputDir) to FileOutputFormat. The framework uses it to set mapred.work.output.dir to the task's temporary output directory.
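The split between the two properties in the release note can be sketched as follows. This is a minimal stand-in, not Hadoop's code: the class OutputDirsSketch is hypothetical, a plain Map takes the place of JobConf, and String takes the place of Path. Only the property names mapred.output.dir and mapred.work.output.dir and the four method names come from the release note; the _temporary directory value is an assumption based on the example in this issue's description.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for illustration only; this is not Hadoop's FileOutputFormat.
class OutputDirsSketch {
    static final String OUTPUT_DIR = "mapred.output.dir";           // set by the client
    static final String WORK_OUTPUT_DIR = "mapred.work.output.dir"; // set by the framework

    static void setOutputPath(Map<String, String> conf, String outputDir) {
        conf.put(OUTPUT_DIR, outputDir);
    }

    static String getOutputPath(Map<String, String> conf) {
        return conf.get(OUTPUT_DIR);
    }

    // In the release note this setter is meant for the framework, not for clients.
    static void setWorkOutputPath(Map<String, String> conf, String outputDir) {
        conf.put(WORK_OUTPUT_DIR, outputDir);
    }

    static String getWorkOutputPath(Map<String, String> conf) {
        return conf.get(WORK_OUTPUT_DIR);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        setOutputPath(conf, "/user/foo/myoutput");                    // client-supplied value
        setWorkOutputPath(conf, "/user/foo/myoutput/_temporary");     // assumed temporary location
        System.out.println(getOutputPath(conf));      // the client's value is left untouched
        System.out.println(getWorkOutputPath(conf));  // the task's temporary directory
    }
}
```

Keeping the temporary location under its own key means the value the client injected is never overwritten, which is the behaviour the reporter asks for.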

      Description

      Up to and including 0.16.0, the value returned by getOutputPath(), when queried within a task, pointed to the part file assigned to the task.

      For example: /user/foo/myoutput/part_00000

      In 0.16.1, it now returns an internal Hadoop path for the task's temporary output location.

      For the above example: /user/foo/myoutput/_temporary/part_00000

      This change breaks applications that use getOutputPath() to compute other directories.

      IMO, this has always been broken: Hadoop should not change the values of properties injected by the client. Instead, it should use private properties or internal helper methods.
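The breakage can be illustrated with a small sketch. The helper sideDirFor and the directory name side-data are hypothetical; they stand in for any application code that derives another directory from the output path. The two input paths are the ones quoted in the description above.

```java
// Hypothetical application code, for illustration only.
class SiblingDirSketch {
    // Derives a side directory next to the file that the output path points to.
    static String sideDirFor(String outputPath) {
        int lastSlash = outputPath.lastIndexOf('/');
        String parent = outputPath.substring(0, lastSlash);
        return parent + "/side-data";
    }

    public static void main(String[] args) {
        // Value seen within a task up to 0.16.0
        System.out.println(sideDirFor("/user/foo/myoutput/part_00000"));
        // Value seen within a task in 0.16.1
        System.out.println(sideDirFor("/user/foo/myoutput/_temporary/part_00000"));
    }
}
```

With the 0.16.1 value, the derived directory lands under _temporary instead of under the job's output directory, so code written against the earlier behaviour silently targets the wrong location.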

        Attachments

        1. patch-3041.txt
          86 kB
          Amareshwari Sriramadasu
        2. patch-3041.txt
          85 kB
          Amareshwari Sriramadasu
        3. patch-3041.txt
          13 kB
          Amareshwari Sriramadasu
        4. patch-3041.txt
          13 kB
          Amareshwari Sriramadasu
        5. patch-3041.txt
          12 kB
          Amareshwari Sriramadasu
        6. patch-3041-0.16.2.txt
          17 kB
          Amareshwari Sriramadasu
        7. patch-3041.txt
          20 kB
          Amareshwari Sriramadasu

            People

            • Assignee:
              amareshwari Amareshwari Sriramadasu
              Reporter:
              tucu00 Alejandro Abdelnur