Hadoop Common
  1. Hadoop Common
  2. HADOOP-4746

Job output directory should be normalized

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Blocker Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.18.0
    • Fix Version/s: 0.18.3
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      JobClient should normalize the output directory by calling FileSystem#makeQualified() in order to avoid the problem described in HADOOP-4717.

      1. outDir3.patch
        1 kB
        Hairong Kuang
      2. outDir2-br18.patch
        1 kB
        Hairong Kuang
      3. outDir2.patch
        1 kB
        Hairong Kuang
      4. outDir1.patch
        0.8 kB
        Hairong Kuang
      5. outDir.patch
        0.6 kB
        Hairong Kuang

        Issue Links

          Activity

          Hide
          Hudson added a comment -
          Show
          Hudson added a comment - Integrated in Hadoop-trunk #680 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/680/ )
          Hide
          Hairong Kuang added a comment -

          A unit test will be added in HADOOP-4717 to test the new code.

          I've just committed this!

          Show
          Hairong Kuang added a comment - A unit test will be added in HADOOP-4717 to test the new code. I've just committed this!
          Hide
          Hairong Kuang added a comment -

          ant test-core passed:
          BUILD SUCCESSFUL
          Total time: 114 minutes 21 seconds

          ant test-patch result:
          [exec] -1 overall.

          [exec] +1 @author. The patch does not contain any @author tags.

          [exec] -1 tests included. The patch doesn't appear to include any new or modified tests.
          [exec] Please justify why no tests are needed for this patch.

          [exec] +1 javadoc. The javadoc tool did not generate any warning messages.

          [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.

          [exec] +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

          Show
          Hairong Kuang added a comment - ant test-core passed: BUILD SUCCESSFUL Total time: 114 minutes 21 seconds ant test-patch result: [exec] -1 overall. [exec] +1 @author. The patch does not contain any @author tags. [exec] -1 tests included. The patch doesn't appear to include any new or modified tests. [exec] Please justify why no tests are needed for this patch. [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
          Hide
          Doug Cutting added a comment -

          +1 This looks good to me.

          Show
          Doug Cutting added a comment - +1 This looks good to me.
          Hide
          Hairong Kuang added a comment -

          Here comes a new patch, which normalizes the output directory in FileOutputFormat#checkOutputSpecs().

          Show
          Hairong Kuang added a comment - Here comes a new patch, which normalizes the output directory in FileOutputFormat#checkOutputSpecs().
          Hide
          Hairong Kuang added a comment -

          The down side of doing this in FileOutputFormat#checkOutputSpecs() is that if a user writes its own customized OutputFormat, the user might forget to normalize the output directory in checkOutputSpecs(). Otherwise I am OK with this suggestion.

          Show
          Hairong Kuang added a comment - The down side of doing this in FileOutputFormat#checkOutputSpecs() is that if a user writes its own customized OutputFormat, the user might forget to normalize the output directory in checkOutputSpecs(). Otherwise I am OK with this suggestion.
          Hide
          Doug Cutting added a comment -

          Might this be done in FileOutputFormat#checkOutputSpecs()? That would better encapsulate this file-specific operation, and it already throws IOException, so it wouldn't break compatibility.

          > Ideally setInputPath & getInputPath should be methods of JobConf.

          No, we'd like to keep all file-specific stuff in FileOutputFormat, since jobs can take non-file input and output.

          Show
          Doug Cutting added a comment - Might this be done in FileOutputFormat#checkOutputSpecs()? That would better encapsulate this file-specific operation, and it already throws IOException, so it wouldn't break compatibility. > Ideally setInputPath & getInputPath should be methods of JobConf. No, we'd like to keep all file-specific stuff in FileOutputFormat, since jobs can take non-file input and output.
          Hide
          Hairong Kuang added a comment -

          A patch for 0.18.

          Show
          Hairong Kuang added a comment - A patch for 0.18.
          Hide
          Hairong Kuang added a comment -

          This patch incorporates Owen's comment. It normalize the output directory in JobClient#submitJob. Ideally setInputPath & getInputPath should be methods of JobConf.

          Show
          Hairong Kuang added a comment - This patch incorporates Owen's comment. It normalize the output directory in JobClient#submitJob. Ideally setInputPath & getInputPath should be methods of JobConf.
          Hide
          Hairong Kuang added a comment -

          Output directory is set by the map/reduce application. I am not familiar with JobClient code. But I do not think how it works if we make change in JobClient. Reading it from jobConf, normailizing it, and then seting it again?

          Show
          Hairong Kuang added a comment - Output directory is set by the map/reduce application. I am not familiar with JobClient code. But I do not think how it works if we make change in JobClient. Reading it from jobConf, normailizing it, and then seting it again?
          Hide
          Owen O'Malley added a comment -

          Rather than make the incompatible API change, I'd propose putting the qualification into JobClient where it is about to be submitted to the JobTracker. I believe the input path is already made absolute there.

          Show
          Owen O'Malley added a comment - Rather than make the incompatible API change, I'd propose putting the qualification into JobClient where it is about to be submitted to the JobTracker. I believe the input path is already made absolute there.
          Hide
          Hairong Kuang added a comment -

          This patch has two changes:
          1. make FileOutputFormat#setOutputPath to throw an IOException;
          2. The output dir could be in a file system which is different from the one defined in jobconf; So use Path.getFileSystem to get the file system handle.

          Show
          Hairong Kuang added a comment - This patch has two changes: 1. make FileOutputFormat#setOutputPath to throw an IOException; 2. The output dir could be in a file system which is different from the one defined in jobconf; So use Path.getFileSystem to get the file system handle.
          Hide
          Arun C Murthy added a comment -

          The patch is fine, though ignoring the exception bothers me. Should we atleast log that exception, if not abort?

          Show
          Arun C Murthy added a comment - The patch is fine, though ignoring the exception bothers me. Should we atleast log that exception, if not abort?

            People

            • Assignee:
              Hairong Kuang
              Reporter:
              Hairong Kuang
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development