Hadoop Map/Reduce / MAPREDUCE-5911

Terasort TeraOutputFormat does not check for output directory existence

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: examples
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      The enforcement that the directory must not yet exist is implemented in FileOutputFormat#checkOutputSpecs by throwing FileAlreadyExistsException. However, terasort uses a specialized output format, TeraOutputFormat, which is a subclass of FileOutputFormat. The subclass overrides checkOutputSpecs, but does not re-implement the existence check and throw FileAlreadyExistsException.
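      The pattern described above can be sketched in isolation. This is a minimal model, not Hadoop code: it uses the local filesystem through java.nio.file and the JDK's own FileAlreadyExistsException as stand-ins, and the class names BaseOutputFormat, OverridingOutputFormat and OutputCheckDemo are hypothetical. It only shows how an override that does not repeat (or delegate to) the parent's check silently accepts an existing directory.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-in for FileOutputFormat: rejects an output directory that already exists.
class BaseOutputFormat {
    void checkOutputSpecs(Path outDir) throws IOException {
        if (outDir == null) {
            throw new IOException("Output directory not set");
        }
        if (Files.exists(outDir)) {
            throw new FileAlreadyExistsException(outDir.toString());
        }
    }
}

// Hypothetical stand-in for the reported behaviour: the override replaces the
// parent's logic, so the existence check is lost.
class OverridingOutputFormat extends BaseOutputFormat {
    @Override
    void checkOutputSpecs(Path outDir) throws IOException {
        if (outDir == null) {
            throw new IOException("Output directory not set");
        }
        // No existence check here, so an existing directory is accepted.
    }
}

class OutputCheckDemo {
    // Returns true when the format refuses the directory because it already exists.
    static boolean rejectsExisting(BaseOutputFormat format, Path outDir) throws IOException {
        try {
            format.checkOutputSpecs(outDir);
            return false;
        } catch (FileAlreadyExistsException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        Path existing = Files.createTempDirectory("terasort-out");
        try {
            System.out.println("base rejects existing dir:     "
                    + rejectsExisting(new BaseOutputFormat(), existing));
            System.out.println("override rejects existing dir: "
                    + rejectsExisting(new OverridingOutputFormat(), existing));
        } finally {
            Files.delete(existing);
        }
    }
}
```

      In this model the base class refuses the existing directory while the overriding subclass does not, which is the symptom the description reports for TeraOutputFormat.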

      1. HADOOP-5911.patch
        1 kB
        Bruno P. Kinoshita

          Activity

          Gera Shegalov added a comment -

          Hi Ivan Mitic, could you review and potentially commit MAPREDUCE-4879?

          Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk #1932 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1932/)
          MAPREDUCE-5911. Terasort TeraOutputFormat does not check for output directory existance. Contributed by Bruno P. Kinoshita. (ivanmi: rev 7bbda6ef92e9bf4a28e67b8736067b38defab51e)

          • hadoop-mapreduce-project/hadoop-mapreduce-examples/src/main/java/org/apache/hadoop/examples/terasort/TeraOutputFormat.java
          • hadoop-mapreduce-project/CHANGES.txt

          Revert "MAPREDUCE-5911. Terasort TeraOutputFormat does not check for output directory existance. Contributed by Bruno P. Kinoshita." (ivanmi: rev da80c4da41e555929b9432da7e999e27468efcf5)

          • hadoop-mapreduce-project/hadoop-mapreduce-examples/src/main/java/org/apache/hadoop/examples/terasort/TeraOutputFormat.java
          • hadoop-mapreduce-project/CHANGES.txt

          Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk #1907 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1907/)
          MAPREDUCE-5911. Terasort TeraOutputFormat does not check for output directory existance. Contributed by Bruno P. Kinoshita. (ivanmi: rev 7bbda6ef92e9bf4a28e67b8736067b38defab51e)

          • hadoop-mapreduce-project/CHANGES.txt
          • hadoop-mapreduce-project/hadoop-mapreduce-examples/src/main/java/org/apache/hadoop/examples/terasort/TeraOutputFormat.java

          Revert "MAPREDUCE-5911. Terasort TeraOutputFormat does not check for output directory existance. Contributed by Bruno P. Kinoshita." (ivanmi: rev da80c4da41e555929b9432da7e999e27468efcf5)

          • hadoop-mapreduce-project/hadoop-mapreduce-examples/src/main/java/org/apache/hadoop/examples/terasort/TeraOutputFormat.java
          • hadoop-mapreduce-project/CHANGES.txt

          Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Yarn-trunk #718 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/718/)
          MAPREDUCE-5911. Terasort TeraOutputFormat does not check for output directory existance. Contributed by Bruno P. Kinoshita. (ivanmi: rev 7bbda6ef92e9bf4a28e67b8736067b38defab51e)

          • hadoop-mapreduce-project/hadoop-mapreduce-examples/src/main/java/org/apache/hadoop/examples/terasort/TeraOutputFormat.java
          • hadoop-mapreduce-project/CHANGES.txt

          Revert "MAPREDUCE-5911. Terasort TeraOutputFormat does not check for output directory existance. Contributed by Bruno P. Kinoshita." (ivanmi: rev da80c4da41e555929b9432da7e999e27468efcf5)

          • hadoop-mapreduce-project/CHANGES.txt
          • hadoop-mapreduce-project/hadoop-mapreduce-examples/src/main/java/org/apache/hadoop/examples/terasort/TeraOutputFormat.java

          Bruno P. Kinoshita added a comment -

          Apologies Ivan, Gera. I hadn't seen 4879 either, and thanks for reverting it Ivan

          Ivan Mitic added a comment -

          I reverted the patch from trunk, branch-2 and branch-2.6. Resolving this Jira as a dupe of MAPREDUCE-4879, let's iterate on the right fix there.

          Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #6290 (See https://builds.apache.org/job/Hadoop-trunk-Commit/6290/)
          Revert "MAPREDUCE-5911. Terasort TeraOutputFormat does not check for output directory existance. Contributed by Bruno P. Kinoshita." (ivanmi: rev da80c4da41e555929b9432da7e999e27468efcf5)

          • hadoop-mapreduce-project/CHANGES.txt
          • hadoop-mapreduce-project/hadoop-mapreduce-examples/src/main/java/org/apache/hadoop/examples/terasort/TeraOutputFormat.java

          Ivan Mitic added a comment -

          OK, I am going to revert the change given that it does not work and resolve this Jira as a duplicate of MAPREDUCE-4879. Let's iterate further on the other Jira. Thanks again Gera for catching this.

          Ivan Mitic added a comment -

          Thank you Gera Shegalov for bringing this up. You are right, this won't work with the default partitioner. Sorry I wasn't aware of MAPREDUCE-4879. Let me take another look and see whether to revert the change that went in or go with your patch as an addendum.

          Gera Shegalov added a comment -

          Ivan Mitic, this fix does not work if the default TotalOrderPartitioner is used instead of SimplePartitioner. It will always fail because the partition file will have been written into the output dir by the time checkOutputSpecs is called. You should have taken my fix from MAPREDUCE-4879.
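
          The ordering problem raised in this comment can be modelled in a few lines. This is a simplified sketch under stated assumptions, not Hadoop's job submission code: it uses the local filesystem through java.nio.file, the class PartitionOrderDemo and its methods are hypothetical, and the partition file name is illustrative only. It assumes, as the comment states, that the partition file is written into the output directory before the existence check runs.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

class PartitionOrderDemo {
    // Illustrative file name only; treat it as an assumption.
    static final String PARTITION_FILE = "_partition.lst";

    // Models setup with a total-order partitioner: writing the partition file
    // into the output directory creates that directory as a side effect.
    static void writePartitionFile(Path outDir) throws IOException {
        Files.createDirectories(outDir);
        Files.write(outDir.resolve(PARTITION_FILE), new byte[0]);
    }

    // Models a plain "output directory must not exist" check.
    static void checkOutputSpecs(Path outDir) throws IOException {
        if (Files.exists(outDir)) {
            throw new FileAlreadyExistsException(outDir.toString());
        }
    }

    // Returns true if the modelled submission passes the existence check.
    static boolean submit(Path outDir, boolean writesPartitionFile) throws IOException {
        if (writesPartitionFile) {
            writePartitionFile(outDir);
        }
        try {
            checkOutputSpecs(outDir);
            return true;
        } catch (FileAlreadyExistsException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path parent = Files.createTempDirectory("terasort-demo");
        Path plain = parent.resolve("out-a");
        Path ordered = parent.resolve("out-b");
        System.out.println("no partition file:   " + submit(plain, false));
        System.out.println("with partition file: " + submit(ordered, true));
        // Clean up what the demo created.
        Files.deleteIfExists(ordered.resolve(PARTITION_FILE));
        Files.deleteIfExists(ordered);
        Files.deleteIfExists(parent);
    }
}
```

          In this model a fresh output path passes the check only when nothing has been written into it beforehand, so a check that runs after the partition file is written rejects every submission, including valid ones.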

          Ivan Mitic added a comment -

          Committed to trunk, branch-2 and branch-2.6.

          Thank you Bruno for the contribution!

          Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #6289 (See https://builds.apache.org/job/Hadoop-trunk-Commit/6289/)
          MAPREDUCE-5911. Terasort TeraOutputFormat does not check for output directory existance. Contributed by Bruno P. Kinoshita. (ivanmi: rev 7bbda6ef92e9bf4a28e67b8736067b38defab51e)

          • hadoop-mapreduce-project/CHANGES.txt
          • hadoop-mapreduce-project/hadoop-mapreduce-examples/src/main/java/org/apache/hadoop/examples/terasort/TeraOutputFormat.java

          Bruno P. Kinoshita added a comment -

          Thanks for the quick reply Ivan! I'll continue reading the code and looking for easy issues to send patches or comment then.

          Ivan Mitic added a comment -

          Hi Bruno, it should be ok not to include a test case with this change, it's a minor fix to the examples.

          Will commit the patch shortly.

          Bruno P. Kinoshita added a comment -

          Hmm, the TestTeraSort is using JUnit 3 and is marked with the @Ignore annotation.

          Should we add another class for this test, remove the @Ignore annotation and fix the tests, or justify not having a test case for this case?

          Thanks
          Bruno

          Bruno P. Kinoshita added a comment -

          Thanks Ivan!

          Looks like Jenkins is not happy about the missing tests. I'm updating the repository and will write a new PATCH with tests.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12667588/HADOOP-5911.patch
          against trunk revision 8256766.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-examples.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4971//testReport/
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4971//console

          This message is automatically generated.

          Ivan Mitic added a comment -

          Hi Bruno, thanks for contributing the patch! Looks good, +1.

          Will commit when it comes back with +1 from Jenkins.

          Bruno P. Kinoshita added a comment -

          Hi, first time writing a patch for Hadoop. Based on the description provided by Ivan. Couldn't find any tests referencing this class, but no tests failed in maven.

          HTH, Bruno


            People

            • Assignee:
              Bruno P. Kinoshita
              Reporter:
              Ivan Mitic
            • Votes:
              0
              Watchers:
              6
