Hadoop Common / HADOOP-3041

Within a task, the value of the JobConf.getOutputPath() method is modified

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.16.1
    • Fix Version/s: 0.17.0
    • Component/s: None
    • Labels:
      None
    • Environment:

      all

    • Hadoop Flags:
      Reviewed
    • Release Note:
      1. Deprecates JobConf.setOutputPath and JobConf.getOutputPath. JobConf.getOutputPath() still returns the same value that it used to return.
      2. Deprecates OutputFormatBase. Adds FileOutputFormat. Existing output formats that extended OutputFormatBase now extend FileOutputFormat.
      3. Adds the following APIs in FileOutputFormat:
         public static void setOutputPath(JobConf conf, Path outputDir); // sets mapred.output.dir
         public static Path getOutputPath(JobConf conf); // gets mapred.output.dir
         public static Path getWorkOutputPath(JobConf conf); // gets mapred.work.output.dir
      4. static void setWorkOutputPath(JobConf conf, Path outputDir) is also added to FileOutputFormat. The framework uses it to set mapred.work.output.dir to the task's temporary output directory.
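The separation described in the release note can be illustrated with a small stand-alone model. This is a sketch, not the Hadoop source: a plain Map stands in for JobConf and a String for Path, and the class name OutputPathModel is made up for the example. Only the method names and the two property names (mapred.output.dir and mapred.work.output.dir) come from the note; the temporary directory value is illustrative.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-alone model of the two-property design; NOT the Hadoop implementation.
// A Map stands in for JobConf and String for Path (both are simplifications).
final class OutputPathModel {
    static final String OUTPUT_DIR = "mapred.output.dir";
    static final String WORK_OUTPUT_DIR = "mapred.work.output.dir";

    // Client-facing: records the job output directory chosen by the user.
    static void setOutputPath(Map<String, String> conf, String outputDir) {
        conf.put(OUTPUT_DIR, outputDir);
    }

    static String getOutputPath(Map<String, String> conf) {
        return conf.get(OUTPUT_DIR);
    }

    // Framework-facing: records the task's temporary output directory
    // under a separate key, leaving the client's value untouched.
    static void setWorkOutputPath(Map<String, String> conf, String workDir) {
        conf.put(WORK_OUTPUT_DIR, workDir);
    }

    static String getWorkOutputPath(Map<String, String> conf) {
        return conf.get(WORK_OUTPUT_DIR);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        setOutputPath(conf, "/user/foo/myoutput");
        // Hypothetical temporary location, chosen only for illustration.
        setWorkOutputPath(conf, "/user/foo/myoutput/_temporary");
        System.out.println(getOutputPath(conf));     // prints /user/foo/myoutput
        System.out.println(getWorkOutputPath(conf)); // prints /user/foo/myoutput/_temporary
    }
}
```

Because the framework writes to its own key, a task that reads the output path sees exactly what the client set, which is the behaviour the reporter asks for in the description.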

      Description

      Up to and including 0.16.0, getOutputPath(), when called within a task, returned the path of the part file assigned to that task.

      For example: /user/foo/myoutput/part_00000

      In 0.16.1, it now returns an internal Hadoop path for the task's temporary output location.

      For the above example: /user/foo/myoutput/_temporary/part_00000

      This change breaks applications that use getOutputPath() to compute other directories.

      IMO, this has always been broken: Hadoop should not change the values of properties injected by the client. Instead, it should use private properties or internal helper methods.
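To make the breakage concrete, here is a minimal sketch using plain strings rather than Hadoop classes. The helper sideDir and the directory name "side" are hypothetical; they stand for any application code that derives another directory from the value returned by getOutputPath(). The two input paths are the ones quoted above.

```java
// Illustration of the breakage using plain strings; no Hadoop classes are used.
final class OutputPathBreakage {
    // Hypothetical application helper: assumes the returned value is
    // <job output dir>/<part file> and places side files next to the part file.
    static String sideDir(String taskOutputPath) {
        int slash = taskOutputPath.lastIndexOf('/');
        return taskOutputPath.substring(0, slash) + "/side";
    }

    public static void main(String[] args) {
        String before = "/user/foo/myoutput/part_00000";            // value up to 0.16.0
        String after = "/user/foo/myoutput/_temporary/part_00000";  // value in 0.16.1

        System.out.println(sideDir(before)); // prints /user/foo/myoutput/side
        System.out.println(sideDir(after));  // prints /user/foo/myoutput/_temporary/side
    }
}
```

With the 0.16.1 value, the derived directory lands inside the framework's temporary area instead of the job output directory, so the application's files end up in the wrong place.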

      1. patch-3041.txt
        86 kB
        Amareshwari Sriramadasu
      2. patch-3041.txt
        85 kB
        Amareshwari Sriramadasu
      3. patch-3041.txt
        13 kB
        Amareshwari Sriramadasu
      4. patch-3041.txt
        13 kB
        Amareshwari Sriramadasu
      5. patch-3041.txt
        12 kB
        Amareshwari Sriramadasu
      6. patch-3041.txt
        20 kB
        Amareshwari Sriramadasu
      7. patch-3041-0.16.2.txt
        17 kB
        Amareshwari Sriramadasu


          People

          • Assignee:
            Amareshwari Sriramadasu
            Reporter:
            Alejandro Abdelnur
          • Votes:
            0
            Watchers:
            0

