Hadoop Common
  1. Hadoop Common
  2. HADOOP-3041

Within a task, the value ofJobConf.getOutputPath() method is modified

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Blocker Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.16.1
    • Fix Version/s: 0.17.0
    • Component/s: None
    • Labels:
      None
    • Environment:

      all

    • Hadoop Flags:
      Reviewed
    • Release Note:
      Hide
      1. Deprecates JobConf.setOutputPath and JobConf.getOutputPath
      JobConf.getOutputPath() still returns the same value that it used to return.
      2. Deprecates OutputFormatBase. Adds FileOutputFormat. Existing output formats extending OutputFormatBase, now extend FileOutputFormat.
      3. Adds the following APIs in FileOutputFormat :
      public static void setOutputPath(JobConf conf, Path outputDir); // sets mapred.output.dir
      public static Path getOutputPath(JobConf conf) ; // gets mapred.output.dir
      public static Path getWorkOutputPath(JobConf conf); // gets mapred.work.output.dir
      4. static void setWorkOutputPath(JobConf conf, Path outputDir) is also added to FileOutputFormat. This is used by the framework to set mapred.work.output.dir as task's temporary output dir .
      Show
      1. Deprecates JobConf.setOutputPath and JobConf.getOutputPath JobConf.getOutputPath() still returns the same value that it used to return. 2. Deprecates OutputFormatBase. Adds FileOutputFormat. Existing output formats extending OutputFormatBase, now extend FileOutputFormat. 3. Adds the following APIs in FileOutputFormat : public static void setOutputPath(JobConf conf, Path outputDir); // sets mapred.output.dir public static Path getOutputPath(JobConf conf) ; // gets mapred.output.dir public static Path getWorkOutputPath(JobConf conf); // gets mapred.work.output.dir 4. static void setWorkOutputPath(JobConf conf, Path outputDir) is also added to FileOutputFormat. This is used by the framework to set mapred.work.output.dir as task's temporary output dir .

      Description

      Until 0.16.0 the value of the getOutputPath() method, if queried within a task, pointed to the part file assigned to the task.

      For example: /user/foo/myoutput/part_00000

      In 0.16.1, now it returns an internal hadoop for the task output temporary location.

      For the above example: /user/foo/myoutput/_temporary/part_00000

      This change breaks applications that use the getOutputPath() to compute other directories.

      IMO, this has always being broken, Hadoop should not change the values of properties injected by the client, instead it should use private properties or internal helper methods.

      1. patch-3041.txt
        86 kB
        Amareshwari Sriramadasu
      2. patch-3041.txt
        85 kB
        Amareshwari Sriramadasu
      3. patch-3041.txt
        13 kB
        Amareshwari Sriramadasu
      4. patch-3041.txt
        13 kB
        Amareshwari Sriramadasu
      5. patch-3041.txt
        12 kB
        Amareshwari Sriramadasu
      6. patch-3041-0.16.2.txt
        17 kB
        Amareshwari Sriramadasu
      7. patch-3041.txt
        20 kB
        Amareshwari Sriramadasu

        Activity

        Hide
        Devaraj Das added a comment -

        Alejandro, the reason for modifying the job's output dir is to let user apps transparently deal with things like creation of side files in the task's output directory, and, speculative tasks creating the same output files. Another reason is that the getOutputPath can be used (and is usually used) in the OutputFormat implementation. All user code could use getOutputPath and create task specific stuff there and the framework automatically promotes/discards these files upon successful/failed task completion. Look at the JavaDoc in JobConf.getOutputPath() to get a clear explanation of what i am trying to say (by the way this doc needs to be fixed to include _temporary).
        You are facing the problem since you create a directory in the same level as the actual output directory of the job. One way to address your problem is to provide an additional API like JobConf.getConfiguredOutputPath that would internally do things like getOutputPath.getParent(), etc. and return you the actual configured directory. This will ensure that your apps don't break when the framework changes the directory structure of the output path, etc. Not the best solution but we have to arrive at a compromise between your requirement and what we already document and provide. Thoughts?

        Show
        Devaraj Das added a comment - Alejandro, the reason for modifying the job's output dir is to let user apps transparently deal with things like creation of side files in the task's output directory, and, speculative tasks creating the same output files. Another reason is that the getOutputPath can be used (and is usually used) in the OutputFormat implementation. All user code could use getOutputPath and create task specific stuff there and the framework automatically promotes/discards these files upon successful/failed task completion. Look at the JavaDoc in JobConf.getOutputPath() to get a clear explanation of what i am trying to say (by the way this doc needs to be fixed to include _temporary). You are facing the problem since you create a directory in the same level as the actual output directory of the job. One way to address your problem is to provide an additional API like JobConf.getConfiguredOutputPath that would internally do things like getOutputPath.getParent(), etc. and return you the actual configured directory. This will ensure that your apps don't break when the framework changes the directory structure of the output path, etc. Not the best solution but we have to arrive at a compromise between your requirement and what we already document and provide. Thoughts?
        Hide
        Alejandro Abdelnur added a comment -

        If there is a method returning the original path is OK.

        But, using the rule of least surprise, wouldn't make more sense to have a getTaskOutputPath() that returns the path to the part file for the current task and leave the getOutputPath() with the user entered value?

        Also the javadoc should not say 'Get the Path to the output directory for the map-reduce job' in its one line description then.

        Show
        Alejandro Abdelnur added a comment - If there is a method returning the original path is OK. But, using the rule of least surprise, wouldn't make more sense to have a getTaskOutputPath() that returns the path to the part file for the current task and leave the getOutputPath() with the user entered value? Also the javadoc should not say 'Get the Path to the output directory for the map-reduce job' in its one line description then.
        Hide
        Devaraj Das added a comment -

        But, using the rule of least surprise, wouldn't make more sense to have a getTaskOutputPath() that returns the path to the part file for the current task and leave the getOutputPath() with the user entered value?

        Possibly. One thing that is of concern here is that apps potentially have been written using the getOutputPath API (that creates side files within it).. Also, if the user really intends to create a side file in the output directory of the job, it is slightly unintuitive IMO to have the user invoke getTaskOutputPath. But yes I agree that getOutputPath returning the task's output path is unintuitive as well. I wish this was clearer. I am unhappy about it too..

        Also the javadoc should not say 'Get the Path to the output directory for the map-reduce job' in its one line description then.

        Hmm..

        Show
        Devaraj Das added a comment - But, using the rule of least surprise, wouldn't make more sense to have a getTaskOutputPath() that returns the path to the part file for the current task and leave the getOutputPath() with the user entered value? Possibly. One thing that is of concern here is that apps potentially have been written using the getOutputPath API (that creates side files within it).. Also, if the user really intends to create a side file in the output directory of the job, it is slightly unintuitive IMO to have the user invoke getTaskOutputPath. But yes I agree that getOutputPath returning the task's output path is unintuitive as well. I wish this was clearer. I am unhappy about it too.. Also the javadoc should not say 'Get the Path to the output directory for the map-reduce job' in its one line description then. Hmm..
        Hide
        Runping Qi added a comment -

        bq: Possibly. One thing that is of concern here is that apps potentially have been written using the getOutputPath API (that creates side files within it)..

        Indeed. Any applications that implementtheir own output format class depend on the current semantics of getOutputPath.
        I have many of such applications.

        Show
        Runping Qi added a comment - bq: Possibly. One thing that is of concern here is that apps potentially have been written using the getOutputPath API (that creates side files within it).. Indeed. Any applications that implementtheir own output format class depend on the current semantics of getOutputPath. I have many of such applications.
        Hide
        Devaraj Das added a comment -

        Indeed. Any applications that implementtheir own output format class depend on the current semantics of getOutputPath.

        I guess the solution we are driving towards is that we will have an API called JobConf.getFinalOutputPath() and define a private job config variable that will store the dir what the user originally sets during job submission. This config variable is never updated except during job submission.

        Show
        Devaraj Das added a comment - Indeed. Any applications that implementtheir own output format class depend on the current semantics of getOutputPath. I guess the solution we are driving towards is that we will have an API called JobConf.getFinalOutputPath() and define a private job config variable that will store the dir what the user originally sets during job submission. This config variable is never updated except during job submission.
        Hide
        Sameer Paranjpye added a comment -

        Silently changing public API semantics is bad, looks like we've done that here. How about we:

        • Deprecate getOutputPath() and replace it with getCurrentOutputPath() and getFinalOutputPath()
        • Have getOutputPath() return the same thing as getCurrentOutputPath() because that breaks the least amount of code
        Show
        Sameer Paranjpye added a comment - Silently changing public API semantics is bad, looks like we've done that here. How about we: Deprecate getOutputPath() and replace it with getCurrentOutputPath() and getFinalOutputPath() Have getOutputPath() return the same thing as getCurrentOutputPath() because that breaks the least amount of code
        Hide
        Devaraj Das added a comment -

        Deprecate getOutputPath() and replace it with getCurrentOutputPath() and getFinalOutputPath()

        +1

        Show
        Devaraj Das added a comment - Deprecate getOutputPath() and replace it with getCurrentOutputPath() and getFinalOutputPath() +1
        Hide
        Amareshwari Sriramadasu added a comment -

        Here is a patch that deprecates getOutputPath() and adds new apis getCurrentOutputPath() and getFinalOutputPath() (all in JobConf)

        Show
        Amareshwari Sriramadasu added a comment - Here is a patch that deprecates getOutputPath() and adds new apis getCurrentOutputPath() and getFinalOutputPath() (all in JobConf)
        Hide
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12378306/patch-3041.txt
        against trunk revision 619744.

        @author +1. The patch does not contain any @author tags.

        tests included +1. The patch appears to include 15 new or modified tests.

        javadoc +1. The javadoc tool did not generate any warning messages.

        javac +1. The applied patch does not generate any new javac compiler warnings.

        release audit +1. The applied patch does not generate any new release audit warnings.

        findbugs +1. The patch does not introduce any new Findbugs warnings.

        core tests +1. The patch passed core unit tests.

        contrib tests +1. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2016/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2016/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2016/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2016/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12378306/patch-3041.txt against trunk revision 619744. @author +1. The patch does not contain any @author tags. tests included +1. The patch appears to include 15 new or modified tests. javadoc +1. The javadoc tool did not generate any warning messages. javac +1. The applied patch does not generate any new javac compiler warnings. release audit +1. The applied patch does not generate any new release audit warnings. findbugs +1. The patch does not introduce any new Findbugs warnings. core tests +1. The patch passed core unit tests. contrib tests +1. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2016/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2016/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2016/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2016/console This message is automatically generated.
        Hide
        Devaraj Das added a comment - - edited

        I committed the patch to trunk. Could you please submit a patch for the 0.16 branch. This one doesn't apply cleanly. Thanks!

        Show
        Devaraj Das added a comment - - edited I committed the patch to trunk. Could you please submit a patch for the 0.16 branch. This one doesn't apply cleanly. Thanks!
        Hide
        Amareshwari Sriramadasu added a comment -

        Patch for 0.16 branch. It passed all the ant tests successfully on my machine on branch 0.16.

        Show
        Amareshwari Sriramadasu added a comment - Patch for 0.16 branch. It passed all the ant tests successfully on my machine on branch 0.16.
        Hide
        Devaraj Das added a comment -

        I just committed this. Thanks, Amareshwari!

        Show
        Devaraj Das added a comment - I just committed this. Thanks, Amareshwari!
        Hide
        Hudson added a comment -
        Show
        Hudson added a comment - Integrated in Hadoop-trunk #436 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/436/ )
        Hide
        Owen O'Malley added a comment -

        I think we need to revert this in 16.2. Breaking compatibility from 0.16.1 is worse that living with a recognized change from 0.16.0. Changing from getOutputPath makes the API inconsistent with setOutputPath, so for 0.17, I propose:

        • moving getOutputPath to be the final one
        • rename getCurrentOutputPath to getWorkOutputDirectory
        Show
        Owen O'Malley added a comment - I think we need to revert this in 16.2. Breaking compatibility from 0.16.1 is worse that living with a recognized change from 0.16.0. Changing from getOutputPath makes the API inconsistent with setOutputPath, so for 0.17, I propose: moving getOutputPath to be the final one rename getCurrentOutputPath to getWorkOutputDirectory
        Hide
        Alejandro Abdelnur added a comment -

        IMO Owen's proposal is the correct one.

        I'm fine with reverting behavior to the 0.16.0 if it gets addressed in 0.17 as we can workaround it.

        Show
        Alejandro Abdelnur added a comment - IMO Owen's proposal is the correct one. I'm fine with reverting behavior to the 0.16.0 if it gets addressed in 0.17 as we can workaround it.
        Hide
        Devaraj Das added a comment -

        I reverted the patch

        Show
        Devaraj Das added a comment - I reverted the patch
        Hide
        Amareshwari Sriramadasu added a comment -

        so for 0.17, I propose:

        • moving getOutputPath to be the final one
        • rename getCurrentOutputPath to getWorkOutputDirectory

        Moving the getOutputPath to be the final one makes existing applications incompatible. ex. All the OutputFormat written by users has to use getWorkOutputDirectory instead of getOutputPath.

        Show
        Amareshwari Sriramadasu added a comment - so for 0.17, I propose: moving getOutputPath to be the final one rename getCurrentOutputPath to getWorkOutputDirectory Moving the getOutputPath to be the final one makes existing applications incompatible. ex. All the OutputFormat written by users has to use getWorkOutputDirectory instead of getOutputPath.
        Hide
        Hudson added a comment -
        Show
        Hudson added a comment - Integrated in Hadoop-trunk #442 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/442/ )
        Hide
        Amareshwari Sriramadasu added a comment -

        Here is a patch adding
        1. getWorkOutputDirectory() returning task's temporary output directory which will be set by the framework.
        2. changing getOutputPath to hold the value specified by setOutputPath always.

        Show
        Amareshwari Sriramadasu added a comment - Here is a patch adding 1. getWorkOutputDirectory() returning task's temporary output directory which will be set by the framework. 2. changing getOutputPath to hold the value specified by setOutputPath always.
        Hide
        Amareshwari Sriramadasu added a comment -

        Patch has a documentation change from the earlier.

        Show
        Amareshwari Sriramadasu added a comment - Patch has a documentation change from the earlier.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12378708/patch-3041.txt
        against trunk revision 619744.

        @author +1. The patch does not contain any @author tags.

        tests included +1. The patch appears to include 6 new or modified tests.

        javadoc -1. The javadoc tool appears to have generated 1 warning messages.

        javac +1. The applied patch does not generate any new javac compiler warnings.

        release audit +1. The applied patch does not generate any new release audit warnings.

        findbugs +1. The patch does not introduce any new Findbugs warnings.

        core tests +1. The patch passed core unit tests.

        contrib tests +1. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2075/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2075/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2075/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2075/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12378708/patch-3041.txt against trunk revision 619744. @author +1. The patch does not contain any @author tags. tests included +1. The patch appears to include 6 new or modified tests. javadoc -1. The javadoc tool appears to have generated 1 warning messages. javac +1. The applied patch does not generate any new javac compiler warnings. release audit +1. The applied patch does not generate any new release audit warnings. findbugs +1. The patch does not introduce any new Findbugs warnings. core tests +1. The patch passed core unit tests. contrib tests +1. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2075/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2075/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2075/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2075/console This message is automatically generated.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12378715/patch-3041.txt
        against trunk revision 619744.

        @author +1. The patch does not contain any @author tags.

        tests included +1. The patch appears to include 6 new or modified tests.

        javadoc -1. The javadoc tool appears to have generated 1 warning messages.

        javac +1. The applied patch does not generate any new javac compiler warnings.

        release audit +1. The applied patch does not generate any new release audit warnings.

        findbugs +1. The patch does not introduce any new Findbugs warnings.

        core tests +1. The patch passed core unit tests.

        contrib tests +1. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2077/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2077/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2077/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2077/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12378715/patch-3041.txt against trunk revision 619744. @author +1. The patch does not contain any @author tags. tests included +1. The patch appears to include 6 new or modified tests. javadoc -1. The javadoc tool appears to have generated 1 warning messages. javac +1. The applied patch does not generate any new javac compiler warnings. release audit +1. The applied patch does not generate any new release audit warnings. findbugs +1. The patch does not introduce any new Findbugs warnings. core tests +1. The patch passed core unit tests. contrib tests +1. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2077/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2077/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2077/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2077/console This message is automatically generated.
        Hide
        Amareshwari Sriramadasu added a comment -

        Java doc warning is due to src/java/org/apache/hadoop/net/SocketInputStream.java, not related to this patch.

        Show
        Amareshwari Sriramadasu added a comment - Java doc warning is due to src/java/org/apache/hadoop/net/SocketInputStream.java, not related to this patch.
        Hide
        Amareshwari Sriramadasu added a comment -

        Canceling patch to make setOutputPath and setWorkOutputDirectory have same code semantics.

        Show
        Amareshwari Sriramadasu added a comment - Canceling patch to make setOutputPath and setWorkOutputDirectory have same code semantics.
        Hide
        Amareshwari Sriramadasu added a comment -

        Patch has setWorkOutputDirectory code similar to setOutputPath.

        Show
        Amareshwari Sriramadasu added a comment - Patch has setWorkOutputDirectory code similar to setOutputPath.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12378939/patch-3041.txt
        against trunk revision 619744.

        @author +1. The patch does not contain any @author tags.

        tests included +1. The patch appears to include 6 new or modified tests.

        patch -1. The patch command could not apply the patch.

        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2101/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12378939/patch-3041.txt against trunk revision 619744. @author +1. The patch does not contain any @author tags. tests included +1. The patch appears to include 6 new or modified tests. patch -1. The patch command could not apply the patch. Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2101/console This message is automatically generated.
        Hide
        Runping Qi added a comment -

        My applications assume that getOutputPath() returns the temporary working dir.
        Will my apps be broken by this patch?
        I think many people are in similar situations.

        Show
        Runping Qi added a comment - My applications assume that getOutputPath() returns the temporary working dir. Will my apps be broken by this patch? I think many people are in similar situations.
        Hide
        Runping Qi added a comment -

        I think it is better to removing the getOutputPath() from the api and replace it with something else.
        That way, the application can detect the problem at compile time.
        Otherwise, the apps will misbehave without any warnings.

        Show
        Runping Qi added a comment - I think it is better to removing the getOutputPath() from the api and replace it with something else. That way, the application can detect the problem at compile time. Otherwise, the apps will misbehave without any warnings.
        Hide
        Owen O'Malley added a comment -

        Sigh

        One option is to move both get and set output path to FileInputFormat, which is where they actually belong. I only hate to do it because moving JobConf.setOutputPath will break 99% of map/reduce applications. In that case, it would look like:

        FileInputFormat:
          getOutputPath
          setOutputPath
          getTaskOutputPath
        
        JobConf:
          deprecate getOutputPath
          deprecate setOutputPath
        
        Show
        Owen O'Malley added a comment - Sigh One option is to move both get and set output path to FileInputFormat, which is where they actually belong. I only hate to do it because moving JobConf.setOutputPath will break 99% of map/reduce applications. In that case, it would look like: FileInputFormat: getOutputPath setOutputPath getTaskOutputPath JobConf: deprecate getOutputPath deprecate setOutputPath
        Hide
        Alejandro Abdelnur added a comment -

        Our applications, which which are running on a previous hadoop version, when migrating to 0.16.0+ are failing because we assumed the returned value was the path of output part without any temporary stuff in it. So we are broken as well.

        As the Hadoop API gets refined changes like this will break things, for example FileSystem listPaths() now returns NULL instead an empty array when the dir does not exist.

        It is kind of painful but I would not deprecating methods with the right name because they were returning incorrect data.

        IMO the right thing to do is to have the getOutputPath() with the configured value and getWorkingOutputPath() with the temporary dir.

        Show
        Alejandro Abdelnur added a comment - Our applications, which which are running on a previous hadoop version, when migrating to 0.16.0+ are failing because we assumed the returned value was the path of output part without any temporary stuff in it. So we are broken as well. As the Hadoop API gets refined changes like this will break things, for example FileSystem listPaths() now returns NULL instead an empty array when the dir does not exist. It is kind of painful but I would not deprecating methods with the right name because they were returning incorrect data. IMO the right thing to do is to have the getOutputPath() with the configured value and getWorkingOutputPath() with the temporary dir.
        Hide
        Amareshwari Sriramadasu added a comment -

        After the discussions, attaching a patch that does the following:

        1. Deprecates JobConf.setOutputPath and JobConf.getOutputPath
        JobConf.getOutputPath() still returns the same value that it used to return.
        2. Deprecates OutputFormatBase. Adds FileOutputFormat. Existing output formats extending OutputFormatBase, are now extending FileOutputFormat.
        3. Adds the following APIs in FileOutputFormat :
        public static void setOutputPath(JobConf conf, Path outputDir); // sets mapred.output.dir
        public static Path getOutputPath(JobConf conf) ; // gets mapred.output.dir
        public static Path getWorkOutputPath(JobConf conf); // gets mapred.work.output.dir
        4. static void setWorkOutputPath(JobConf conf, Path outputDir) is also added to FileOutputFormat. This is used by the framework to set mapred.work.output.dir as task's temporary output dir .

        Show
        Amareshwari Sriramadasu added a comment - After the discussions, attaching a patch that does the following: 1. Deprecates JobConf.setOutputPath and JobConf.getOutputPath JobConf.getOutputPath() still returns the same value that it used to return. 2. Deprecates OutputFormatBase. Adds FileOutputFormat. Existing output formats extending OutputFormatBase, are now extending FileOutputFormat. 3. Adds the following APIs in FileOutputFormat : public static void setOutputPath(JobConf conf, Path outputDir); // sets mapred.output.dir public static Path getOutputPath(JobConf conf) ; // gets mapred.output.dir public static Path getWorkOutputPath(JobConf conf); // gets mapred.work.output.dir 4. static void setWorkOutputPath(JobConf conf, Path outputDir) is also added to FileOutputFormat. This is used by the framework to set mapred.work.output.dir as task's temporary output dir .
        Hide
        Owen O'Malley added a comment -

        This looks like the right direction.

        Show
        Owen O'Malley added a comment - This looks like the right direction.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12379258/patch-3041.txt
        against trunk revision 643282.

        @author +1. The patch does not contain any @author tags.

        tests included +1. The patch appears to include 114 new or modified tests.

        patch -1. The patch command could not apply the patch.

        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2146/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12379258/patch-3041.txt against trunk revision 643282. @author +1. The patch does not contain any @author tags. tests included +1. The patch appears to include 114 new or modified tests. patch -1. The patch command could not apply the patch. Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2146/console This message is automatically generated.
        Hide
        Amareshwari Sriramadasu added a comment -

        Patch in sync with the trunk

        Show
        Amareshwari Sriramadasu added a comment - Patch in sync with the trunk
        Hide
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12379370/patch-3041.txt
        against trunk revision 643282.

        @author +1. The patch does not contain any @author tags.

        tests included +1. The patch appears to include 114 new or modified tests.

        javadoc +1. The javadoc tool did not generate any warning messages.

        javac +1. The applied patch does not generate any new javac compiler warnings.

        release audit +1. The applied patch does not generate any new release audit warnings.

        findbugs +1. The patch does not introduce any new Findbugs warnings.

        core tests +1. The patch passed core unit tests.

        contrib tests +1. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2163/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2163/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2163/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2163/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12379370/patch-3041.txt against trunk revision 643282. @author +1. The patch does not contain any @author tags. tests included +1. The patch appears to include 114 new or modified tests. javadoc +1. The javadoc tool did not generate any warning messages. javac +1. The applied patch does not generate any new javac compiler warnings. release audit +1. The applied patch does not generate any new release audit warnings. findbugs +1. The patch does not introduce any new Findbugs warnings. core tests +1. The patch passed core unit tests. contrib tests +1. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2163/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2163/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2163/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2163/console This message is automatically generated.
        Hide
        Devaraj Das added a comment -

        +1

        Show
        Devaraj Das added a comment - +1
        Hide
        Nigel Daley added a comment -

        I just committed this. Thanks Amareshwari! Can we add a release note for this?

        Show
        Nigel Daley added a comment - I just committed this. Thanks Amareshwari! Can we add a release note for this?
        Hide
        Hudson added a comment -
        Show
        Hudson added a comment - Integrated in Hadoop-trunk #451 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/451/ )

          People

          • Assignee:
            Amareshwari Sriramadasu
            Reporter:
            Alejandro Abdelnur
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development