  1. Hadoop HDFS
  2. HDFS-11709

StandbyCheckpointer should handle a non-existing legacyOivImageDir gracefully

    Details

      Description

      In StandbyCheckpointer, if the legacy OIV directory was not properly created, or was deleted for some reason (e.g. by operator error), all checkpoint operations will fail. Not only will the active NameNode (ANN) stop receiving new fsimages, but the JournalNodes (JNs) will also fill up with edit log files, eventually causing the NN to crash.

            // Save the legacy OIV image, if the output dir is defined.
            String outputDir = checkpointConf.getLegacyOivImageDir();
            if (outputDir != null && !outputDir.isEmpty()) {
              img.saveLegacyOIVImage(namesystem, outputDir, canceler);
            }
      

      It doesn't make sense to let such an unimportant step (saving the legacy OIV image) abort all checkpoints and cause an NN crash (and possibly data loss).

          Activity

          xkrogen Erik Krogen added a comment -

          Attaching v000 patch that catches IOExceptions thrown by saveLegacyOIVImage and logs an error but continues with the rest of the checkpointing process.
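
          The approach described above (catch the IOException, log it, keep checkpointing) could look roughly like the following standalone sketch. It is not the attached patch: LegacyOivGuardSketch, LegacyOivSaver and saveLegacyOivBestEffort are hypothetical names made up for illustration, and the real code calls img.saveLegacyOIVImage(namesystem, outputDir, canceler) and would use the class's own logger rather than System.err.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class LegacyOivGuardSketch {

  /** Hypothetical stand-in for img.saveLegacyOIVImage(...); not a Hadoop type. */
  public interface LegacyOivSaver {
    void save(String outputDir) throws IOException;
  }

  /**
   * Saves the legacy OIV image on a best-effort basis.
   * Returns true only if a directory was configured and the save succeeded.
   * An IOException is logged and swallowed so the caller can continue
   * with the remaining checkpoint steps.
   */
  public static boolean saveLegacyOivBestEffort(String outputDir,
                                                LegacyOivSaver saver) {
    // Same guard as the original snippet: skip when no directory is defined.
    if (outputDir == null || outputDir.isEmpty()) {
      return false;
    }
    try {
      saver.save(outputDir);
      return true;
    } catch (IOException ioe) {
      // Assumption: logging to stderr stands in for the real logger.
      System.err.println("Exception encountered while saving legacy OIV image; "
          + "continuing with other checkpointing steps: " + ioe);
      return false;
    }
  }

  public static void main(String[] args) {
    // A saver that fails when the target directory is missing,
    // mimicking a deleted or never-created legacy OIV directory.
    LegacyOivSaver saver = dir -> {
      if (!Files.isDirectory(Paths.get(dir))) {
        throw new IOException("Legacy OIV directory does not exist: " + dir);
      }
    };
    boolean saved = saveLegacyOivBestEffort("/nonexistent/legacy-oiv-dir", saver);
    System.out.println("legacy OIV image saved: " + saved);
    System.out.println("checkpoint continues with the remaining steps");
  }
}
```

          The point of returning a boolean instead of rethrowing is that the caller can still upload the fsimage to the active NameNode even when the legacy OIV step fails.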

          hadoopqa Hadoop QA added a comment -
          -1 overall



          || Vote || Subsystem || Runtime || Comment ||
          | 0 | reexec | 0m 13s | Docker mode activated. |
          | +1 | @author | 0m 0s | The patch does not contain any @author tags. |
          | -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
          | +1 | mvninstall | 14m 2s | trunk passed |
          | +1 | compile | 0m 49s | trunk passed |
          | +1 | checkstyle | 0m 36s | trunk passed |
          | +1 | mvnsite | 0m 53s | trunk passed |
          | +1 | mvneclipse | 0m 15s | trunk passed |
          | -1 | findbugs | 1m 42s | hadoop-hdfs-project/hadoop-hdfs in trunk has 10 extant Findbugs warnings. |
          | +1 | javadoc | 0m 40s | trunk passed |
          | +1 | mvninstall | 0m 50s | the patch passed |
          | +1 | compile | 0m 53s | the patch passed |
          | +1 | javac | 0m 53s | the patch passed |
          | +1 | checkstyle | 0m 34s | the patch passed |
          | +1 | mvnsite | 0m 53s | the patch passed |
          | +1 | mvneclipse | 0m 12s | the patch passed |
          | +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
          | +1 | findbugs | 1m 49s | the patch passed |
          | +1 | javadoc | 0m 41s | the patch passed |
          | -1 | unit | 74m 1s | hadoop-hdfs in the patch failed. |
          | +1 | asflicense | 0m 21s | The patch does not generate ASF License warnings. |
          | | | 100m 54s | |



          || Reason || Tests ||
          | Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting |
          | | hadoop.hdfs.TestErasureCodeBenchmarkThroughput |
          | | hadoop.hdfs.server.namenode.ha.TestEditLogTailer |
          | | hadoop.fs.viewfs.TestViewFsAtHdfsRoot |



          || Subsystem || Report/Notes ||
          | Docker | Image:yetus/hadoop:0ac17dc |
          | JIRA Issue | HDFS-11709 |
          | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12865364/HDFS-11709.000.patch |
          | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
          | uname | Linux ec94446f3d50 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
          | Build tool | maven |
          | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
          | git revision | trunk / 61cda39e |
          | Default Java | 1.8.0_121 |
          | findbugs | v3.1.0-RC1 |
          | findbugs | https://builds.apache.org/job/PreCommit-HDFS-Build/19216/artifact/patchprocess/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html |
          | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/19216/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
          | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/19216/testReport/ |
          | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
          | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/19216/console |
          | Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org |

          This message was automatically generated.

          kihwal Kihwal Lee added a comment -

          +1 lgtm

          zhz Zhe Zhang added a comment -

          Thanks Erik Krogen for the patch and Kihwal Lee for the review! +1 from me as well. The reported UT failures are unrelated and cannot be reproduced locally. I just committed to trunk, working on backports.

          zhz Zhe Zhang added a comment -

          Committed to trunk through branch-2.7.

          xkrogen Erik Krogen added a comment -

          Thanks Kihwal Lee and Zhe Zhang!

          hudson Hudson added a comment -

          SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11640 (See https://builds.apache.org/job/Hadoop-trunk-Commit/11640/)
          HDFS-11709. StandbyCheckpointer should handle an non-existing (zhz: rev d8a33098309f17dfb0e3a000934f68394de44bf7)

          • (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/StandbyCheckpointer.java
          shv Konstantin Shvachko added a comment -

          A unit test would've been appropriate for this jira.

          xkrogen Erik Krogen added a comment -

          Konstantin Shvachko, sure. Filed HDFS-11717.

          hudson Hudson added a comment -

          SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11677 (See https://builds.apache.org/job/Hadoop-trunk-Commit/11677/)
          HDFS-11717. Add unit test for HDFS-11709 StandbyCheckpointer should (shv: rev d9014bda93760f223789d2ec9f5e35f40de157d4)

          • (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/StandbyCheckpointer.java
          • (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestStandbyCheckpoints.java
          vinodkv Vinod Kumar Vavilapalli added a comment -

          2.8.1 became a security release. Moving fix-version to 2.8.2 after the fact.


            People

            • Assignee: xkrogen Erik Krogen
            • Reporter: zhz Zhe Zhang