Hadoop Map/Reduce: MAPREDUCE-4815

Speed up FileOutputCommitter#commitJob for many output files

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.23.3, 2.0.1-alpha, 2.4.1
    • Fix Version/s: 2.7.0
    • Component/s: mrv2
    • Hadoop Flags:
      Reviewed

      Description

      If a job generates many files to commit, the commitJob method call at the end of the job can take minutes. This is a performance regression from 1.x: in 1.x, tasks committed directly to the final output directory as they completed, so commitJob had very little to do. The commit work was processed in parallel and overlapped the processing of outstanding tasks. In 0.23/2.x, the commit is single-threaded and does not begin until all tasks have completed.
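Per the list of changed files below, the fix is gated behind a committer algorithm version property added to mapred-default.xml. A hedged sketch of how a site or job configuration might enable it (the exact default and description text are in the patch, not reproduced here):

```xml
<!-- Sketch of the switch introduced by this patch. Version 2 has task
     commit move files directly into the final output directory, so
     commitJob has little left to do; version 1 keeps the old
     single-threaded job-commit behavior. -->
<property>
  <name>mapreduce.fileoutputcommitter.algorithm.version</name>
  <value>2</value>
</property>
```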

      1. MAPREDUCE-4815.v17.patch
        32 kB
        Siqi Li
      2. MAPREDUCE-4815.v16.patch
        32 kB
        Siqi Li
      3. MAPREDUCE-4815.v15.patch
        32 kB
        Siqi Li
      4. MAPREDUCE-4815.v14.patch
        31 kB
        Siqi Li
      5. MAPREDUCE-4815.v13.patch
        29 kB
        Siqi Li
      6. MAPREDUCE-4815.v12.patch
        27 kB
        Siqi Li
      7. MAPREDUCE-4815.v11.patch
        23 kB
        Siqi Li
      8. MAPREDUCE-4815.v10.patch
        18 kB
        Siqi Li
      9. MAPREDUCE-4815.v9.patch
        14 kB
        Siqi Li
      10. MAPREDUCE-4815.v8.patch
        33 kB
        Siqi Li
      11. MAPREDUCE-4815.v7.patch
        24 kB
        Siqi Li
      12. MAPREDUCE-4815.v6.patch
        29 kB
        Siqi Li
      13. MAPREDUCE-4815.v5.patch
        27 kB
        Siqi Li
      14. MAPREDUCE-4815.v4.patch
        27 kB
        Siqi Li
      15. MAPREDUCE-4815.v3.patch
        26 kB
        Siqi Li

        Issue Links

          Activity

          jira.shegalov Gera Shegalov added a comment -

          Ivan Bella, thanks for reporting the problem. We need to capture this problem in a unit test. I chatted with Siqi Li offline, and need to verify my understanding of the problem.

          Your reducer i outputs some unique paths under $joboutput/_temporary/$appAttemptID/_temporary/$taskAttemptID/dir

          When multiple reducers, say 2, commit simultaneously before $joboutput/dir has been created, they both try the
          move:

          $joboutput/_temporary/$appAttemptID/_temporary/$taskAttemptID/dir -> $joboutput/dir

          Assuming reducer 1 wins and creates $joboutput/dir, reducer 1's files end up under $joboutput/dir, while reducer 2's files end up under $joboutput/dir/dir
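The nesting Gera describes can be reproduced on a local filesystem, since `shutil.move` behaves like the HDFS rename here: when the destination directory already exists, the source is nested inside it rather than merged. All paths below are hypothetical stand-ins for the $joboutput layout above:

```python
# Local-filesystem analogy for the rename race described above.
import os
import shutil
import tempfile

root = tempfile.mkdtemp()
joboutput = os.path.join(root, "joboutput")
os.makedirs(joboutput)

def task_attempt_dir(task_id):
    # Each reducer writes under its own temporary attempt directory.
    d = os.path.join(root, "attempt_%d" % task_id, "dir")
    os.makedirs(d)
    with open(os.path.join(d, "part-%05d" % task_id), "w") as f:
        f.write("output of reducer %d\n" % task_id)
    return d

dst = os.path.join(joboutput, "dir")

# Reducer 1 commits first: $joboutput/dir does not exist yet, so the
# move creates it and reducer 1's files land in the right place.
shutil.move(task_attempt_dir(1), dst)

# Reducer 2 commits second: $joboutput/dir now exists, so the move
# nests reducer 2's directory, producing $joboutput/dir/dir.
shutil.move(task_attempt_dir(2), dst)

print(os.path.exists(os.path.join(dst, "part-00001")))         # reducer 1's file
print(os.path.exists(os.path.join(dst, "dir", "part-00002")))  # reducer 2's file, nested
```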

          l201514 Siqi Li added a comment -

          Thanks Ivan Bella for reporting this. I have created another JIRA, https://issues.apache.org/jira/browse/MAPREDUCE-6275, to address this potential double-rename race condition.

          ivan.bella Ivan Bella added a comment -

          Siqi Li, I work with Dave, and after reviewing the last patch it appears that the basic problem we reported still exists. If mergePaths is called from two reducers at the same time and they have output files in the same subdirectory (the subdirectory is the key here), then we get the situation described previously. Say the working directory is dir.1, and two reducers are putting files in dir.1/dir.2; this will non-deterministically result in one of the reducers creating dir.1/dir.2/dir.2. In addition, we were seeing files placed in dir.1/dir.2/dir.2 that should have been moved into dir.1/dir.2. We will deploy this patch and confirm.

          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk #2079 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2079/)
          MAPREDUCE-4815. Speed up FileOutputCommitter#commitJob for many output files. (Siqi Li via gera) (gera: rev aa92b764a7ddb888d097121c4d610089a0053d11)

          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/output/TestFileOutputCommitter.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/FileOutputCommitter.java
          • hadoop-mapreduce-project/CHANGES.txt
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapred/TestFileOutputCommitter.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #129 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/129/)
          MAPREDUCE-4815. Speed up FileOutputCommitter#commitJob for many output files. (Siqi Li via gera) (gera: rev aa92b764a7ddb888d097121c4d610089a0053d11)

          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapred/TestFileOutputCommitter.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/output/TestFileOutputCommitter.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/FileOutputCommitter.java
          • hadoop-mapreduce-project/CHANGES.txt
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #120 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/120/)
          MAPREDUCE-4815. Speed up FileOutputCommitter#commitJob for many output files. (Siqi Li via gera) (gera: rev aa92b764a7ddb888d097121c4d610089a0053d11)

          • hadoop-mapreduce-project/CHANGES.txt
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapred/TestFileOutputCommitter.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/output/TestFileOutputCommitter.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/FileOutputCommitter.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk #2061 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2061/)
          MAPREDUCE-4815. Speed up FileOutputCommitter#commitJob for many output files. (Siqi Li via gera) (gera: rev aa92b764a7ddb888d097121c4d610089a0053d11)

          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/output/TestFileOutputCommitter.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/FileOutputCommitter.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapred/TestFileOutputCommitter.java
          • hadoop-mapreduce-project/CHANGES.txt
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Yarn-trunk #863 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/863/)
          MAPREDUCE-4815. Speed up FileOutputCommitter#commitJob for many output files. (Siqi Li via gera) (gera: rev aa92b764a7ddb888d097121c4d610089a0053d11)

          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/output/TestFileOutputCommitter.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/FileOutputCommitter.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapred/TestFileOutputCommitter.java
          • hadoop-mapreduce-project/CHANGES.txt
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #129 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/129/)
          MAPREDUCE-4815. Speed up FileOutputCommitter#commitJob for many output files. (Siqi Li via gera) (gera: rev aa92b764a7ddb888d097121c4d610089a0053d11)

          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/output/TestFileOutputCommitter.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapred/TestFileOutputCommitter.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/FileOutputCommitter.java
          • hadoop-mapreduce-project/CHANGES.txt
          jira.shegalov Gera Shegalov added a comment -

          Thanks Siqi Li for working through all review cycles, and Ming Ma, Dave Marion, Shrijeet Paliwal, and last but not least Jason Lowe for additional reviews. Committed to trunk, branch-2, and branch-2.7.

          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #7299 (See https://builds.apache.org/job/Hadoop-trunk-Commit/7299/)
          MAPREDUCE-4815. Speed up FileOutputCommitter#commitJob for many output files. (Siqi Li via gera) (gera: rev aa92b764a7ddb888d097121c4d610089a0053d11)

          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/FileOutputCommitter.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapred/TestFileOutputCommitter.java
          • hadoop-mapreduce-project/CHANGES.txt
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/output/TestFileOutputCommitter.java
          jira.shegalov Gera Shegalov added a comment -

          Committing v17.

          dlmarion Dave Marion added a comment -

          I'll see if I can apply the patch to the version of code on our test cluster in the next couple of days.

          l201514 Siqi Li added a comment -

          Shrijeet Paliwal, Dave Marion: hey guys, do you want to try out the latest patch for improving FileOutputCommitter?

          jlowe Jason Lowe added a comment -

          +1 from me as well for v17. Thanks again to Siqi for sticking with the long review process.

          jira.shegalov Gera Shegalov added a comment -

          That sounds reasonable, Jason Lowe. In this sense, a self-reference for this JIRA in mapred-default.xml is good enough.

          Thanks for v17, Siqi Li. +1 from me. Jason Lowe?

          l201514 Siqi Li added a comment -

          I have updated the patch by removing the unnecessary documentation that might confuse users:

          - After upgrading the file output committer version from 1 to 2,
          -  all newly submitted jobs will use algorithm 2. However, during
          -  the upgrade, if AM attempt of old jobs fails and restarts, the
          -  new AM attempt will pick up algorithm 2 and try to recover the
          -  task output from previous attempt. Algorithm 2 is able to handle
          -  this case properly by moving all previously committed task files
          -  to the final output directory.
          
          -  Note: this doesn't support the rolling upgrade, it can only tolerate
          -  full upgrade, after which the algorithm can be set to 2.
          
          jlowe Jason Lowe added a comment -

          Yes, I see. Again I'm not against leaving the code in there, rather I'm pointing out there are a zillion ways things can go wrong if people push new configs to a cluster for a new upgrade and jobs are allowed to load those new configs as they run. IMHO config isolation is key to supporting rolling upgrades in a safe manner.

          jira.shegalov Gera Shegalov added a comment -

          Hi Jason Lowe, thanks for the feedback! I don't think the upgrade problem is dubious. The scenario is this:

          1. Cluster with old code, no version is set in site/job.xml. Jobs are running.
          2. Shut down the cluster: all running app attempts fail
          3. As part of upgrading the cluster, yay, we can use version=2: make it part of the cluster-wide mapred-site.xml

          I understand it may not be the best way to do it, but it's possible.

          My biggest concern is that if people push the config first and then start a rolling upgrade, it may lead to incomplete output, because in commitJob we rightfully decided to cut the legacy inefficiency.

          1. rolling upgrade, version=2 is pushed to cluster mapred-site
          2. AM is killed because its node is upgraded.
          3. AM restarts on some upgraded node and version=2
          4. Some outstanding task X is scheduled on an old node and commits in a v1 fashion, despite version=2
          5. Job completes, the AM executes the v2 commitJob, and X's output will be missing
          hadoopqa Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12703107/MAPREDUCE-4815.v17.patch
          against trunk revision 24db081.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5244//testReport/
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5244//console

          This message is automatically generated.

          l201514 Siqi Li added a comment -

          That makes sense. I will remove the upgrade use case from the documentation.

          jlowe Jason Lowe added a comment -

          Thanks for the clarification, Siqi. That should only happen if the client submitting the job did not have a default value in their mapred-site.xml. If it did then the setting would make it into the job's conf (job.xml) and override what mapred-site.xml on the node says.

          From personal experience, you do not want jobs picking up confs from nodes as the nodes are upgrading. Besides concerns like the one here, there are all sorts of other things that can break. If the job doesn't try to pull from the node's configs then it's not an issue, and we run our clusters that way so we can safely upgrade the nodes without affecting the confs of the jobs on those nodes.

          To be clear, I'm not against the code trying to protect itself from this (somewhat dubious) use case, but I don't think we need to spell it out in the docs. It's not a use case we want to encourage.

          l201514 Siqi Li added a comment -

          Agreed with shortening the documentation to reduce unnecessary confusion.

          For the algorithm changing on the fly: from what I understand, the use case for an algorithm version change would be a cluster that originally has algorithm 1 set in mapred-site.xml and wants to use algorithm 2 for all jobs. As a result, people bump the version to 2 in mapred-site.xml and restart the worker nodes. However, if an AM is running at that point, it will get restarted, and the new attempt will use algorithm 2 to recover the task results.

          jlowe Jason Lowe added a comment -

          Took a closer look at the patch; it looks pretty good to me. Huge thanks to Siqi for sticking with the patch over many iterations and to Gera for doing detailed reviews!

          I'm wondering about the use-case where the algorithm version changes while the job is running, which seems weird to me. The job conf would be changing between AM attempts which is dubious and would have ramifications beyond just this change. IMHO trying to describe this use case in the property description text adds unnecessary confusion, unless I'm missing how this would happen in practice and how often it would occur.

          Speaking of documentation, I do think it's important to point out in the docs the incompatibilities of algorithm 2 compared to algorithm 1 as I mentioned above (i.e.: more likely to leave partial output directly underneath the output directory if the job fails badly, resolution of output path collisions between tasks is no longer deterministic, etc). Users will need to make sure they have mechanisms in place to verify they are not using partial output (i.e.: leveraging the _SUCCESS file, checking for a successful job status from the RM/JHS, etc.)
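          A downstream check along those lines is easy to sketch. The snippet below uses java.nio rather than Hadoop's FileSystem purely so it is self-contained and runnable; the class and directory names are illustrative, not from the patch:

          ```java
          import java.io.IOException;
          import java.nio.file.Files;
          import java.nio.file.Path;

          public class SuccessMarkerCheck {
              // Illustrative sketch, not part of the patch: a downstream consumer
              // refuses to read a job output directory unless the _SUCCESS marker
              // written by FileOutputCommitter is present, guarding against partial
              // output that algorithm 2 can leave behind when a job fails badly.
              static boolean isJobOutputComplete(Path outputDir) {
                  return Files.exists(outputDir.resolve("_SUCCESS"));
              }

              public static void main(String[] args) throws IOException {
                  Path out = Files.createTempDirectory("job-output");
                  System.out.println(isJobOutputComplete(out)); // false: no marker yet
                  Files.createFile(out.resolve("_SUCCESS"));
                  System.out.println(isJobOutputComplete(out)); // true: marker present
              }
          }
          ```

          Checking the job status with the RM/JHS is the stronger verification; the marker file is simply the cheapest check available to consumers that only see the filesystem.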

          jlowe Jason Lowe added a comment -

          Jason Lowe, would you mind taking a look as well. I think we are close.

          Will do, but I won't be able to get to this until next week.

          l201514 Siqi Li added a comment -

          Hi Gera, I just updated v16, which:
          1. removed extraneous and trailing white spaces in the patch.
          2. renamed fileStatus to taskAttemptDirStatus
          3. added more precise descriptions in mapred-default.xml
          4. removed unused import org.junit.Test in both test cases

          hadoopqa Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12701172/MAPREDUCE-4815.v16.patch
          against trunk revision 2214dab.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5229//testReport/
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5229//console

          This message is automatically generated.

          jira.shegalov Gera Shegalov added a comment -

          Jason Lowe, would you mind taking a look as well? I think we are close.

          Thanks for v15 Siqi Li!

          Remaining nits:
          Please try to avoid introducing any extraneous and trailing white spaces in the patch.

          FileOutputCommitter
          The variable name fileStatus looks too generic in this context, maybe change to taskAttemptDirStatus

          mapred-default.xml
          maybe you could be more precise in both algorithm descriptions using the notation of my comment on 12/23.

          TestFileOutputCommitter
          Both Tests seem to have an unused import org.junit.Test

          l201514 Siqi Li added a comment -

          Thanks for your feedback, Gera Shegalov. I have updated patch v15, which resolves the 80-character line-width violations in mapred.TestFileOutputCommitter.

          Also, I have added some more documentation in mapred-default.xml, including the behavior of algorithm 1 and a note not to change the algorithm during a rolling upgrade.

          As for output.FileOutputCommitter, I have replaced fs.exists(taskAttemptPath) with fs.getFileStatus(taskAttemptPath).

          hadoopqa Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12700539/MAPREDUCE-4815.v15.patch
          against trunk revision 1aea440.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5219//testReport/
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5219//console

          This message is automatically generated.

          jira.shegalov Gera Shegalov added a comment -

          Thanks for v14, Siqi Li.

          Please check the code style again in the whole patch. I think you have some 80-column line-width violations in mapred.TestFileOutputCommitter.

          Other suggestions:
          mapred-default.xml
          I think it would be useful to document algorithm 1 before pointing out its performance problems (mention this JIRA as well), and then document algorithm 2.

          output.FileOutputCommitter
          We can save one RPC round trip per commitTask. We have a check if (fs.exists(taskAttemptPath)) before the algorithmVersion == 1 branch, and then in the version 2 case we need fs.getFileStatus(taskAttemptPath). If you look at FileSystem#exists, it's implemented using getFileStatus. Please do the exists check manually via getFileStatus so that you don't have to call it again for the second case.
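          The pattern being suggested generalizes to any filesystem API where exists() is just a stat that swallows the not-found error. A minimal runnable sketch, using java.nio in place of Hadoop's FileSystem (an assumption made purely so the example is self-contained):

          ```java
          import java.io.IOException;
          import java.nio.file.Files;
          import java.nio.file.Path;
          import java.nio.file.attribute.BasicFileAttributes;

          public class StatOnce {
              // Stat once and reuse the result for both the existence check and
              // any follow-up metadata use, instead of paying a second round trip.
              static BasicFileAttributes statOrNull(Path p) {
                  try {
                      return Files.readAttributes(p, BasicFileAttributes.class);
                  } catch (IOException notFound) {
                      return null; // plays the role of exists() returning false
                  }
              }

              public static void main(String[] args) throws IOException {
                  Path taskAttemptDir = Files.createTempDirectory("task-attempt");
                  BasicFileAttributes status = statOrNull(taskAttemptDir);
                  if (status != null) {
                      // the same stat result serves the exists check and isDirectory
                      System.out.println("exists, isDirectory=" + status.isDirectory());
                  }
                  Files.delete(taskAttemptDir);
                  System.out.println(statOrNull(taskAttemptDir) == null); // true after delete
              }
          }
          ```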

          hadoopqa Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12700296/MAPREDUCE-4815.v14.patch
          against trunk revision 769507b.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5215//testReport/
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5215//console

          This message is automatically generated.

          jira.shegalov Gera Shegalov added a comment -

          The logic looks good to me now, Siqi Li

          Please add an entry to mapred-default.xml as suggested by Xuan Gong to describe the difference between v1 and v2 and how you want to deal with upgrades.
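          For reference, such an entry would follow the usual mapred-default.xml shape; the property key is the one this patch introduces, but the description wording below is only an illustrative sketch, not the committed text:

          ```xml
          <!-- Illustrative sketch of the requested mapred-default.xml entry;
               the description text here is not the committed wording. -->
          <property>
            <name>mapreduce.fileoutputcommitter.algorithm.version</name>
            <value>1</value>
            <description>
              The file output committer algorithm version. Version 1 moves each
              task's output under the job attempt directory at commitTask time
              and merges everything into the output directory, single-threaded,
              at commitJob time. Version 2 moves task output directly into the
              final output directory at commitTask time, leaving commitJob
              little to do, at the cost of exposing partial output if the job
              fails.
            </description>
          </property>
          ```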

          Outstanding nits in output.TestFileOutputCommitter:
          a bunch of fully qualified references to org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.FILEOUTPUTCOMMITTER_ALGORITHM_VERSION

          In testRecoveryInternal the 80-column limit is violated:

              conf.setInt(
                  FileOutputCommitter.FILEOUTPUTCOMMITTER_ALGORITHM_VERSION, commitVersion);
          

          probably easier to have it as:

              conf.setInt(FileOutputCommitter.FILEOUTPUTCOMMITTER_ALGORITHM_VERSION,
                  commitVersion);
          
          l201514 Siqi Li added a comment -

          Thanks for your feedback, Gera Shegalov. I have updated the patch to v13 to address the issues you mentioned above.

          hadoopqa Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12700239/MAPREDUCE-4815.v13.patch
          against trunk revision fe7a302.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5213//testReport/
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5213//console

          This message is automatically generated.

          jira.shegalov Gera Shegalov added a comment -

          Thanks for v12, Siqi Li!

          output.FileOutputCommitter:
          nits:
          in recoverTask we should add an info message about upgrade in the block because it may help debugging and it's a one-off situation.

                    // essentially a no-op, but for backwards compatibility
                    // after upgrade to the new fileOutputCommitter,
                    // check if there are any output left in committedTaskPath
          

          On the other hand we should probably suppress

           LOG.warn(attemptId + " had no output to recover.");
          

          by enclosing it in if (algorithmVersion == 1) because for v2 it's a normal situation and does not deserve a warning.

          output.TestFileOutputCommitter

          testRecoveryInternal needs to take two versions, commitVersion, recoveryVersion.

          Such that we can have the following tests:
          testRecoveryV1 aka testRecoveryInternal(1, 1) and testRecoveryV2 aka testRecoveryInternal(2, 2), which you already have.
          However, we also need testRecoveryUpgradeV1V2 aka testRecoveryInternal(1, 2).

          We can have the following validation after commitTask

              committer.commitTask(tContext);
          
              Path jobTempDir1 = committer.getCommittedTaskPath(tContext);
              File jtd = new File(jobTempDir1.toUri().getPath());
              if (commitVersion == 1) {
                assertTrue("Version 1 commits to temporary dir " + jtd, jtd.exists());
                validateContent(jtd);
              } else {
                assertFalse("Version 2 commits to output dir " + jtd, jtd.exists());
              }
          

          and after recoverTask where we have

              conf2.setInt(FileOutputCommitter.FILEOUTPUTCOMMITTER_ALGORITHM_VERSION,
                recoveryVersion);
          

          we can check:

              if (recoveryVersion == 1) {
                assertTrue("Version 1 recovers to " + jtd2, jtd2.exists());
                validateContent(jtd2);
              } else {
                assertFalse("Version 2 commits to output dir " + jtd2, jtd2.exists());
                if (commitVersion == 1) {
                  assertTrue("Version 2 recovery moves to output dir from " + jtd,
                      jtd.list().length == 0);
                }
              }
          

          testFailAbortInternal does not set the version passed as a parameter.

          nit: throughout the code:

              conf.setInt(org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.FILEOUTPUTCOMMITTER_ALGORITHM_VERSION, version);
          

          should simply be

           conf.setInt(FileOutputCommitter.FILEOUTPUTCOMMITTER_ALGORITHM_VERSION, version);
          

          as the test is in the same package as FOC.

          in testInvalidVersionNumber
          do we need

          JobContext jContext = new JobContextImpl(conf, taskID.getJobID());
          

          Similarly, since the variable committer is not used, it would suffice to invoke the constructor without assigning the object to any variable.

           
          new FileOutputCommitter(outDir, tContext);
          

          mapred.TestFileOutputCommitter
          testMapOnlyNoOutputV1 and testMapOnlyNoOutputV2 are still needed for completeness

          Adjust testRecovery as above.

          hadoopqa Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12699963/MAPREDUCE-4815.v13.patch
          against trunk revision f56c65b.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5211//testReport/
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5211//console

          This message is automatically generated.

          jira.shegalov Gera Shegalov added a comment -

          Xuan Gong, moving the FOC setting to MRJobConfig would be inconsistent with the already existing mapreduce.fileoutputcommitter.marksuccessfuljobs.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12699959/MAPREDUCE-4815.v13.patch
          against trunk revision f56c65b.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5210//console

          This message is automatically generated.

          xgong Xuan Gong added a comment -

          Siqi Li Thanks for the patch.
          Could we put the configuration "FILEOUTPUTCOMMITTER_ALGORITHM_VERSION" into MRJobConfig? We could use something like MRJobConfig.fileoutputcommitter_algorithm_version instead of org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.FILEOUTPUTCOMMITTER_ALGORITHM_VERSION?

          Also, could we add more documentation on this configuration to tell users what changes we have made between version 1 and version 2?

          l201514 Siqi Li added a comment -

          Thanks Gera Shegalov for the feedback, I have updated the patch with additional test cases that cover testAbort and testFailAbort

          hadoopqa Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12699505/MAPREDUCE-4815.v12.patch
          against trunk revision 2ecea5a.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5206//testReport/
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5206//console

          This message is automatically generated.

          l201514 Siqi Li added a comment -

          I think in testRecovery/CommitInternal the output directory contents are also validated for version 2. For version == 1 there was an extra validation for the task output directory.

          jira.shegalov Gera Shegalov added a comment -

          Thanks for v11, Siqi!

          FileOutputCommitter

          Pull the info statement

          LOG.info("File Output Committer Algorithm version is " + algorithmVersion);
          

          out of if (outputPath != null) {

          Tests:
          TestFileOutputCommitter (both classes)
          There are still tests, e.g. testAbort and testFailAbort, that are not run with both committer versions.

          Currently directory contents are only validated for version == 1 in testRecovery/CommitInternal. You should define and validate content for version 2 as well.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12696045/MAPREDUCE-4815.v11.patch
          against trunk revision 8acc5e9.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 13 new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5144//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5144//artifact/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-core.html
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5144//console

          This message is automatically generated.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12696015/MAPREDUCE-4815.v11.patch
          against trunk revision 8acc5e9.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 13 new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5142//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5142//artifact/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-core.html
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5142//console

          This message is automatically generated.

          l201514 Siqi Li added a comment -

          Gera Shegalov Thanks for your feedback, I have updated the patch that addressed issues you mentioned above.

          jira.shegalov Gera Shegalov added a comment -

          Thanks for v10 Siqi and for clarification regarding the constructor that I overlooked.

          FileOutputCommitter

          • algorithmVersion should be declared final, hence initialized unconditionally and validated. Currently we are implying that we support only 1 and 2. Throw an exception for invalid values.

          nits:

          • FileOutputCommitter#recoverTask has white space issues in your changes.
          • if (algorithmVersion == 1) : code style is a space after 'if', no space after '('
          • in the following else the indentation should be 2 spaces.
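The validation suggested above could look roughly like this. This is a hedged sketch with a hypothetical class and method names, not the actual Hadoop source: the version is resolved once (so the field can be final), defaulted to 1, and anything other than the supported values 1 and 2 is rejected with an exception.

```java
// Hypothetical sketch of final, validated algorithm-version initialization.
public class AlgorithmVersionSketch {
  public static final int DEFAULT_VERSION = 1;

  // Resolve the configured version, falling back to the default when unset,
  // and throw for unsupported values so misconfiguration fails fast.
  public static int resolveVersion(Integer configured) {
    final int v = (configured == null) ? DEFAULT_VERSION : configured;
    if (v != 1 && v != 2) {
      throw new IllegalArgumentException(
          "Only 1 or 2 algorithm version is expected, got " + v);
    }
    return v;
  }

  public static void main(String[] args) {
    System.out.println(resolveVersion(null));  // prints 1 (default)
    System.out.println(resolveVersion(2));     // prints 2
  }
}
```

Resolving once into a final field keeps every later branch on the version consistent for the lifetime of the committer.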

          TestFileOutputCommitter
          lib.output.TestFileOutputCommitter has only white space changes and mapred.TestFileOutputCommitter has too much code duplication. I suggest the following approach.
          In both of these classes refactor all public test methods to be private methods that take version as a parameter. For example:

          public void testRecovery() throws Exception {
          ...
          }
          

          becomes:

          private void testRecoveryInternal(int version) throws Exception {
            JobConf conf = new JobConf();
            conf.setInt(org.apache.hadoop.mapreduce.lib.output.
                  FileOutputCommitter.FILEOUTPUTCOMMITTER_ALGORITHM_VERSION, version);
            ...
          }
          
          public void testRecoveryV1() throws Exception {
            testRecoveryInternal(1);
          }
          
          public void testRecoveryV2() throws Exception {
            testRecoveryInternal(2);
          }
          

          Also add at least one test for an invalid version.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12694812/MAPREDUCE-4815.v10.patch
          against trunk revision f56da3c.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 13 new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5123//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5123//artifact/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-core.html
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5123//console

          This message is automatically generated.

          l201514 Siqi Li added a comment -

          Thanks for your feedback, Gera Shegalov. I have uploaded a patch that addresses the issues you mentioned above.

          For the question of "Why the flag for the new behavior is not initialized when FileOutputCommitter#FileOutputCommitter(Path, TaskAttemptContext) is used."

          I think it's calling this(outputPath, (JobContext)context), so the algorithmVersion will be initialized there.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12694627/MAPREDUCE-4815.v10.patch
          against trunk revision 21d5599.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 13 new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5122//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5122//artifact/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-core.html
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5122//console

          This message is automatically generated.

          jira.shegalov Gera Shegalov added a comment -

          Thanks for the latest patch, Siqi Li! Some comments/questions:

          1. Since we are changing the behavior and not the API, we can have a property
          mapreduce.fileoutputcommitter.algorithm.version
          1: the old behavior. This should be the default unless we have solved the upgrade in an efficient, bullet-proof manner.
          2: the new proposed design.

          Why is the flag for the new behavior not initialized when FileOutputCommitter#FileOutputCommitter(Path, TaskAttemptContext) is used?

          There is only a minor difference between runOldCommitJob and runNewCommitJob in that the lengthy copy loop is skipped, so there is no need to duplicate the code. Enclose this copy loop in an if (version == 1) block. I think it's sufficient to have such checks for commitTask/recoverTask as well.
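The suggested structure can be sketched as follows. This is a hedged sketch with hypothetical names (mergeIntoOutput, commitJob's signature), not the actual Hadoop source: a single commitJob guards the per-task merge loop with the algorithm version, since with version 2 commitTask has already renamed the task output into the final directory and there is nothing left to merge.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: one commitJob for both algorithm versions, with the
// lengthy per-task merge loop executed only for version 1.
public class CommitJobSketch {
  static int mergesPerformed = 0;

  // Placeholder for the per-task rename/merge work done in version 1.
  static void mergeIntoOutput(String committedTaskPath) {
    mergesPerformed++;
  }

  static void commitJob(int version, List<String> committedTaskPaths) {
    if (version == 1) {
      for (String taskPath : committedTaskPaths) {
        mergeIntoOutput(taskPath);  // only the v1 algorithm merges here
      }
    }
    // Common tail for both versions would go here: delete the _temporary
    // directory and create the _SUCCESS marker.
  }

  public static void main(String[] args) {
    commitJob(2, Arrays.asList("task_0", "task_1"));
    System.out.println(mergesPerformed);  // prints 0: version 2 skips the loop
  }
}
```

Keeping a single code path with a version check avoids the drift that two near-identical runOldCommitJob/runNewCommitJob methods would invite.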

          Code under the comment

          //for backwards compatibility after upgrade to the new fileOutputCommitter,
          //check if there are any output left in committedTaskPath
          

          seems misplaced and should actually be under runNewRecoverTask. This scenario will need a test. Equally the existing tests should be run under both the new and the old logic.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12692010/MAPREDUCE-4815.v9.patch
          against trunk revision 10ac5ab.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 13 new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5105//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5105//artifact/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-core.html
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5105//console

          This message is automatically generated.

          l201514 Siqi Li added a comment -

          I have attached a patch v9 based on the design suggestions from Jason and Gera.

          Also, I have run a number of performance test jobs, as follows:

          1. Teragen job with 500 mappers

                       Job Execution Time   Job Commit Time
          Old APIs     43 sec               31 sec
          New APIs     31 sec               0.2 sec
          Savings      ~38.7%

          2. Teragen job with 5K mappers

                       Job Execution Time   Job Commit Time
          Old APIs     6 min 8 sec          2 min
          New APIs     4 min 10 sec         0.3 sec
          Savings      ~33.3%

          3. Teragen job with 20K mappers

                       Job Execution Time   Job Commit Time
          Old APIs     23 min 45 sec        10 min
          New APIs     15 min 36 sec        0.5 sec
          Savings      ~33.3%

          According to the tables above, the average time saving for a teragen job is ~33.3%, and the job commit time with the new API is almost instant compared to the old APIs, where it is linear in the number of tasks. Note that these numbers were measured with the entire cluster used by this job alone. In an actual scenario, the job commit time may take much longer when NNs are under heavy load.

          In addition, the new APIs are optimized for large jobs with a small average task finish time. This kind of job requires little time to finish all tasks but spends a lot of time committing with the old APIs, so a large portion of the overall job time goes to the commit. With the new APIs the commit time is largely eliminated, hence the saving is substantial.

          For long-running small jobs, the saving might be negligible, but it will not be worse than the old APIs.

          jlowe Jason Lowe added a comment -

          Thanks for the design proposal, Gera. I think that should work, and I'm a lot more comfortable with an approach that doesn't use an alternate top-level directory which will likely break existing, derived committers. However the proposed design has some slight incompatibilities with how it behaves today:

          • If a job crashes and fails to cleanup properly, partial output can appear in the output directory. This is true today as well if the AM crashes in the middle of the rename process, but the window with the new design is much, much wider since files appear in the output directory as tasks complete rather than in a burst at the end of the job.
          • The resolution of output filename collisions between tasks is deterministic today but would be dependent upon task completion order with the proposed design. commitTask would need to be tolerant of renames to existing files due to this collision possibility. This can also happen in the non-recovery case if the previous attempt crashed during a multi-file commit, since some of the output files are already there. abortTask doesn't know enough about what is being committed to cleanup after the commit to avoid this.
          • If a task attempt emits non-deterministic filenames then we could end up generating extra output after a recovery since we don't know which files in the output directory correspond to the previous task attempt if the attempt crashed mid-commit.

          IIRC Hadoop 0.20/1.x also committed output files straight from tasks into the output directory and therefore would have the same issues as the first two above. The first is mitigated by the _SUCCESS file, and consumers of the output should be sanity-checking that it exists before assuming all the data is there. The second is mitigated by a "don't do that" approach or ensuring that the file contents are identical so it doesn't matter who wins.

          The third issue is probably not a show stopper, since MapReduce doesn't work that well with non-deterministic tasks in general (e.g.: speculation and reattempts). However we may want to implement this, at least initially, with a config or protected method that allows jobs or derived committers to request the original FileOutputCommitter behavior in case they need to preserve the current behavior.

          About the task directory rename, I'm not sure it's strictly necessary. The AM is tracking whether a task successfully completed in the jhist file, and a subsequent app attempt won't try to recover the output of a task unless it has received confirmation that the task completed and recorded that in the jhist file. However we could keep the directory rename as an additional indication that the commit succeeded and the output was successfully placed in the output directory. This allows the recoverTask method to report failure if it doesn't find that directory, although one would wonder how the AM received confirmation from the previous task that everything was OK and recorded that in the jhist. I'm torn on whether to keep this sanity check, as removing it allows us to eliminate another write operation on the namenode per task. Without it recoverTask becomes a no-op in the new design, since if the previous task attempt completed then we know its output is already in the job output directory and there's nothing else to do for that task.

          Show
          jlowe Jason Lowe added a comment - Thanks for the design proposal, Gera. I think that should work, and I'm a lot more comfortable with an approach that doesn't use an alternate top-level directory which will likely break existing, derived committers. However the proposed design has some slight incompatibilities with how it behaves today: If a job crashes and fails to cleanup properly, partial output can appear in the output directory. This is true today as well if the AM crashes in the middle of the rename process, but the window with the new design is much, much wider since files appear in the output directory as tasks complete rather than in a burst at the end of the job. The resolution of output filename collisions between tasks is deterministic today but would be dependent upon task completion order with the proposed design. commitTask would need to be tolerant of renames to existing files due to this collision possibility. This can also happen in the non-recovery case if the previous attempt crashed during a multi-file commit, since some of the output files are already there. abortTask doesn't know enough about what is being committed to cleanup after the commit to avoid this. If a task attempt emits non-deterministic filenames then we could end up generating extra output after a recovery since we don't know which files in the output directory correspond to the previous task attempt if the attempt crashed mid-commit. IIRC Hadoop 0.20/1.x also committed output files straight from tasks into the output directory and therefore would have the same issues as the first two above. The first is mitigated by the _SUCCESS file, and consumers of the output should be sanity-checking that it exists before assuming all the data is there. The second is mitigated by a "don't do that" approach or ensuring that the file contents are identical so it doesn't matter who wins. 
The third issue is probably not a show stopper, since MapReduce doesn't work that well with non-deterministic tasks in general (e.g.: speculation and reattempts). However we may want to implement this, at least initially, with a config or protected method that allows jobs or derived committers to request the original FileOutputCommitter behavior in case they need to preserve the current behavior. About the task directory rename, I'm not sure it's strictly necessary. The AM is tracking whether a task successfully completed in the jhist file, and a subsequent app attempt won't try to recover the output of a task unless it has received confirmation that the task completed and recorded that in the jhist file. However we could keep the directory rename as an additional indication that the commit succeeded and the output was successfully placed in the output directory. This allows the recoverTask method to report failure if it doesn't find that directory, although one would wonder how the AM received confirmation from the previous task that everything was OK and recorded that in the jhist. I'm torn on whether to keep this sanity check, as removing it allows us to eliminate another write operation on the namenode per task. Without it recoverTask becomes a no-op in the new design, since if the previous task attempt completed then we know its output is already in the job output directory and there's nothing else to do for that task.
          jira.shegalov Gera Shegalov added a comment -

          For commit it's necessary to distinguish "empty output", and "fully succeeded move". Thus an atomic dir rename at the end, which is like a commit record of the potentially multi-rename transaction.

          For recoverTask, step 2 ensures that it can deal with the output of the previous FOC implementation. We actually need this logic in commitJob as well because the AM node could be upgraded later than the others. In this sense, I am not sure how fancy we want to be as it might easily defeat the purpose of the optimization. Maybe we should just write out FOC version under _temporary to know whether to do the commit new way or old way.

          l201514 Siqi Li added a comment -

          There was a typo in my previous question. What I meant was: are the second step of commitTask and the first step of recoverTask necessary? Since after step 1 of commitTask, the directory will be empty.

          jira.shegalov Gera Shegalov added a comment -

          Siqi Li, could you please elaborate on your question?

          l201514 Siqi Li added a comment -

          Thanks Gera for your feedback, this is a good way to do it without using a sibling directory.

          But I still have a question: I'm not sure if commitTask 1 and recoverTask 1 are necessary, since the directory is empty.

          jira.shegalov Gera Shegalov added a comment -

          I think we should strive for a solution that does not create any sibling directories, as they would surprise users and would mean that checkOutputSpec needs to be adjusted everywhere in derived classes. I think we can modify the behavior of the FOC based on Siqi Li's idea but still use the existing directory structure for backwards compatibility:

          task attempts write as usual to $joboutput/_temporary/$appAttemptID/_temporary/$taskAttemptID/

          commitTask:

          1. rename all files '$joboutput/_temporary/$appAttemptID/_temporary/$taskAttemptID/foo' to '$joboutput/foo'.
          2. rename '$joboutput/_temporary/$appAttemptID/_temporary/$taskAttemptID' to '$joboutput/_temporary/$appAttemptID/$taskID' , which is the actual commit

          recoverTask:

          1. if '$joboutput/_temporary/$(appAttemptID - 1)/$taskID' exists: rename to '$joboutput/_temporary/$appAttemptID/$taskID'
          2. for backwards compatibility after upgrade to the new logic, check if there are any '$joboutput/_temporary/$appAttemptID/$taskID/foo' and rename them to '$joboutput/foo'

          commitJob

          1. blow away $joboutput/_temporary
          2. write $joboutput/_SUCCESS
          l201514 Siqi Li added a comment -

          Hi Shrijeet Paliwal, I have also seen that the output_temporary directory has not been cleaned up properly. Do you have a patch to fix this issue?

          dlmarion Dave Marion added a comment -

          Assuming that the output directory is created in setupJob makes sense, but may be invalid for any extensions of FileOutputCommitter that don't call super.setupJob() if they override the method. The fix in mergePaths seems easy and safe.

          dlmarion Dave Marion added a comment -

          We were seeing it in the case of distcp. I have not checked the logs, but will see if I can find something.

          l201514 Siqi Li added a comment -

          Dave Marion Although doing the rename twice would result in a wrong output directory structure, it seems commitTask() will not rename the directory at all, since the destination directory has already been created during setupJob():

          if (!fs.mkdirs(jobAttemptPath)) {
                  LOG.error("Mkdirs failed to create " + jobAttemptPath);
          }

          Have you checked the log to see if your job logged an error creating this directory?

          dlmarion Dave Marion added a comment -

          This appears to be known, but not expected, behavior in the NameNode. Also, I would update my comment above to fix the code, but apparently I can't edit my own comments.

          l201514 Siqi Li added a comment -

          Dave Marion Hi Dave, you are right: if two tasks both call rename, the output directory structure would have problems.

          dlmarion Dave Marion added a comment -

          I think in mergePaths the following:

          } else if (from.isDirectory()) {
            if (fs.exists(to)) {

          should be changed to

          } else if (from.isDirectory()) {
            if (!fs.exists(to)) {
              fs.mkdirs(to);
            }

          and remove the else condition at the bottom.

          dlmarion Dave Marion added a comment -

          I think we might be seeing a side effect of patch #8. What we are seeing is an output directory being created underneath the location where it should be. For example, if we expect files in dir1/dir2, there are times when we see dir1/dir2/dir2. I think the problem stems from mergePaths now being called from commitTask, and there is a race condition when two tasks complete at the same time. Specifically, it's the last case in mergePaths, when 'to' does not exist, so it calls rename.

          I traced this, hopefully correctly, to FSNamesystem.renameToInternal(), which has a nasty comment about doing something that it shouldn't. It also appears to create dir1/dir2/dir2. I think this is a bug in FSNamesystem. For example, if

          from = /pathA/dir1/dir2
          to = /pathB/dir1/dir2

          what happens when two processes call fs.rename(from, to) at the same time?

          dlmarion Dave Marion added a comment -

          I don't think either of those applies to us, as we don't open a file to the directory during the M/R job, and the directory is deleted after the M/R job. Thanks for the information though. I'll watch this ticket for changes. For us, it's working. It would be great to get this committed. Thanks.

          shrijeet Shrijeet Paliwal added a comment -

          I worked with Siqi Li to derive patch #8 (patch #7 is broken, do not use it) and want to report there might be a small bug left to resolve in #8 too. It leaves a temp directory behind after finishing the job successfully. For example, if your job writes to /foo/bar it might leave /foo/bar_temporary behind as a by-product. The temp directory is empty. I will submit an amendment to fix this.

          Also, after this patch, any job which bypasses the context.write approach to write its output will fail to work.

          For example, if in a mapper/reducer one opens a file directly and writes to the job's output directory, that output will be lost. The Pi estimator does the following and fails after this patch.

            @Override
            public void cleanup(Context context) throws IOException {
              // write output to a file
              Configuration conf = context.getConfiguration();
              Path outDir = new Path(conf.get(FileOutputFormat.OUTDIR));
              Path outFile = new Path(outDir, "reduce-out");
              FileSystem fileSys = FileSystem.get(conf);
              SequenceFile.Writer writer = SequenceFile.createWriter(fileSys, conf,
                  outFile, LongWritable.class, LongWritable.class,
                  CompressionType.NONE);
              writer.append(new LongWritable(numInside), new LongWritable(numOutside));
              writer.close();
            }
          
          dlmarion Dave Marion added a comment -

          We were seeing slow times at the end of our M/R jobs. We applied patch #8 today to our Hadoop software (CDH 5.1.2) and have seen M/R jobs act much better. Jobs are finishing right after the last reducer.

          +1 to getting this committed asap.

          hadoopqa Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12680951/MAPREDUCE-4815.v8.patch
          against trunk revision 163bb55.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 5 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5008//testReport/
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5008//console

          This message is automatically generated.

          hadoopqa Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12663778/MAPREDUCE-4815.v7.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4817//testReport/
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4817//console

          This message is automatically generated.

          mingma Ming Ma added a comment -

          Having the first task's recoverTask recover all succeeded tasks seems to work functionality-wise. If the first task fails recoverTask due to an fs.rename exception, it will be rescheduled; the second task's recoverTask can continue to recover the succeeded tasks.

          It does change the semantics of recoverTask: it is no longer done on a per-task basis. But perhaps we can treat it as an optimization; other OutputCommitter implementations can still choose to recover on a per-task basis.

          For the upgrade scenario, how does it clean up the succeeded task attempt data in the old scheme?

          l201514 Siqi Li added a comment -

          In this patch, the first task recovery will recover all succeeded tasks. Therefore, the remaining task recoveries will not do anything.

          hadoopqa Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12662874/MAPREDUCE-4815.v4.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4813//testReport/
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4813//console

          This message is automatically generated.

          l201514 Siqi Li added a comment -

          I think there is a problem with the original code: I tried killing the AM while it was removing the old temp directory. The teragen job succeeded but was missing a bunch of files.

          Hence, I modified recoverTask() so that if the temp output dir doesn't exist it will throw an exception instead of logging a warning. Also, in the case of an upgrade, it will delete the final output dir.

          l201514 Siqi Li added a comment -

          Ming, you are right. When restarting the cluster for an upgrade, it might run into some problems with the old output structure.

          Does it have to throw an exception in this case? Originally, it just gives some warnings.

          mingma Ming Ma added a comment -

          Siqi, thanks for the patch. It looks good overall.

          Will the upgrade scenario work for task recovery? It seems the data stored in the old output structure won't be honored by the new scheme after the cluster restarts with the patch; it skips recoverTask given that TempOutputPath doesn't exist.

          If that is the case, perhaps the patch can throw an exception so that the task attempt changes to the KILLED state for retry. Alternatively, the patch could be modified to handle recovery of the old directory structure, but that seems overcomplicated.

          hadoopqa Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12661324/MAPREDUCE-4815.v3.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4801//testReport/
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4801//console

          This message is automatically generated.

          l201514 Siqi Li added a comment -

          The approach I took is merging the output of each task into a temporary directory whenever the task finishes.
          Assuming the output directory is $parentDir/$outputDir:

          setupJob() will create
          $parentDir/$outputDir_temporary/$attemptID
          and
          $parentDir/$outputDir_temporary/$attemptID_temporary
          
          setupTask() or on-demand file creation by task will create
          $parentDir/$outputDir_temporary/$attemptID_temporary/$taskAttemptID
          
          commitTask() will move everything inside
          $parentDir/$outputDir_temporary/$attemptID_temporary/$taskAttemptID
          to
          $parentDir/$outputDir_temporary/$attemptID
          
          recoverJob() also will move
          $parentDir/$outputDir_temporary/$previous_attemptID
          to
          $parentDir/$outputDir_temporary/$recovering_attemptID
          
          if output directory doesn't exist, commitJob() will simply move 
          $parentDir/$outputDir_temporary/$attemptID to $parentDir/$outputDir
          
          if output directory does exist, copy all files from 
          $parentDir/$outputDir_temporary/$attemptID to $parentDir/$outputDir
          
          l201514 Siqi Li added a comment -

          If anyone can give me some feedback, that would be great.

          hadoopqa Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12659925/MAPREDUCE-4815.v2.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4788//testReport/
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4788//console

          This message is automatically generated.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12659761/MAPREDUCE-4815.v1.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core:

          org.apache.hadoop.mapreduce.lib.output.TestPreemptableFileOutputCommitter

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4786//testReport/
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4786//console

          This message is automatically generated.

          eric14 eric baldeschwieler added a comment -

          Bikas:

          Anyone who wants to use S3 as a target is going to have trouble with this. Of course, that is the case with the MR1 implementation too, so this is not a regression. But we're going to need to put energy into providing an alternative approach that does work with cloud stores at some point. It would be great if a solution emerged here that did not involve moving files. That would reduce the burden on the HDFS metadata system too, which would be a good thing for scalability.

          acmurthy Arun C Murthy added a comment -

          Write permissions to the parent directory of the output directory are a new implicit requirement over the original FileOutputFormat. I think in the vast majority of cases it won't be a problem, but it is a potential backwards-compatibility issue.

          Currently that is already required since FileOutputFormat creates the output dir in the parent dir itself, so that isn't a new requirement.

          I think we should add this as an optimized path to FileOutputFormat, but keep the original, iterative rename scheme if the output directory isn't empty for backwards compatibility.

          Makes sense. It's unfortunately much more code to maintain, and I'm not sure it's worth it, but a good idea nevertheless.

          I have a preliminary patch which I'm testing, I'll upload it asap.

          bikassaha Bikas Saha added a comment -

          Does this code use FileSystem or specifically DistributedFileSystem (HDFS)? If the former, then how does this relate to the comment eric baldeschwieler made earlier about cloud stores?

          jlowe Jason Lowe added a comment -

          I think this will work well with a couple of caveats:

          • Write permissions to the parent directory of the output directory are a new implicit requirement over the original FileOutputFormat. I think in the vast majority of cases it won't be a problem, but it is a potential backwards-compatibility issue.
          • There are existing output formats that override checkOutputSpecs() and explicitly remove the verification step that outputDir doesn't exist (e.g.: TeraOutputFormat). If we only support this new scheme, those output formats could fail to commit since the rename in commitJob() will fail for a non-empty destination directory. I think we should add this as an optimized path to FileOutputFormat, but keep the original, iterative rename scheme if the output directory isn't empty for backwards compatibility.
          vinodkv Vinod Kumar Vavilapalli added a comment -

          The original code to do merging of directories was done with recovery in mind, but overlooked the performance implications. Here's a concrete proposal:

          Assuming output directory is $parentDir/$outputDir

          • On job submission, OutputFormat.checkOutputSpecs() will verify that $outputDir doesn't already exist.
          • OutputCommitter initialization(constructor) will create $outputDir if it doesn't exist
          • setupJob() will create
            $parentDir/_tempJobOutput_$applicationAttemptID/

            and

            $parentDir/_tempJobOutput_$applicationAttemptID_tempTaskOutputs/
          • setupTask() or on-demand file creation by task will create
            $parentDir/_tempJobOutput_$applicationAttemptID_tempTaskOutputs/$taskAttemptID
          • commitTask() will move
            $parentDir/_tempJobOutput_$applicationAttemptID_tempTaskOutputs/$taskAttemptID

            to

            $parentDir/_tempJobOutput_$applicationAttemptID/$taskAttemptID
          • commitJob() will move
            $parentDir/_tempJobOutput_$applicationAttemptID/

            to $outputDir

          • recoverJob() also will move
            $parentDir/_tempJobOutput_$applicationAttemptID

            to

            $parentDir/_tempJobOutput_$recoveringApplicationAttemptID

          So, in sum: two top-level directories per application attempt. No more per-task file moves; everything is an atomic rename.

          Thoughts?
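The lifecycle proposed above can be exercised end-to-end with a small local-filesystem simulation. Directory names follow the scheme sketched in the comment; the function names are illustrative only, not the actual OutputCommitter API, and the simulation uses plain `os.rename` in place of HDFS renames:

```python
import os

def setup_job(parent, attempt_id):
    # two top-level temp dirs per application attempt
    os.makedirs(os.path.join(parent, f"_tempJobOutput_{attempt_id}"))
    os.makedirs(os.path.join(parent, f"_tempJobOutput_{attempt_id}_tempTaskOutputs"))

def setup_task(parent, attempt_id, task_id):
    # per-task scratch dir under the task-outputs temp dir
    d = os.path.join(parent, f"_tempJobOutput_{attempt_id}_tempTaskOutputs", task_id)
    os.makedirs(d)
    return d

def commit_task(parent, attempt_id, task_id):
    # one rename per task: task scratch dir -> job temp dir
    src = os.path.join(parent, f"_tempJobOutput_{attempt_id}_tempTaskOutputs", task_id)
    dst = os.path.join(parent, f"_tempJobOutput_{attempt_id}", task_id)
    os.rename(src, dst)

def commit_job(parent, attempt_id, output_dir):
    # one rename per job: job temp dir -> final output dir
    os.rename(os.path.join(parent, f"_tempJobOutput_{attempt_id}"), output_dir)
```

As sketched, the committed output contains one subdirectory per task attempt; flattening those into the output directory (as FileOutputCommitter does) would reintroduce per-task moves at commit time, which is the trade-off under discussion.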

          acmurthy Arun C Murthy added a comment -

          Hmm... my bad, this looks like an oversight vis-a-vis hadoop-1.

          Maybe the correct solution is to create a top-level (peer of the user's expected output dir) temporary ${mapred.output.dir}_${app_attempt_id} directory and move it to ${mapred.output.dir}. This will mean just one HDFS mv.

          Thoughts?

          eric14 eric baldeschwieler added a comment -

          When we think about this issue... we should think about cloud stores too (S3, Swift); moves don't make sense there. Is there a better way forward here? DONE files perhaps?


            People

            • Assignee:
              l201514 Siqi Li
            • Reporter:
              jlowe Jason Lowe
            • Votes:
              0
            • Watchers:
              47