Hadoop HDFS / HDFS-11306

Print remaining edit logs from buffer if edit log can't be rolled.

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.9.0, 3.0.0-alpha2
    • Component/s: ha, namenode
    • Labels:
      None

      Description

      In HDFS-10943, Yongjun Zhang reported that the edit log cannot be rolled because unexpected edits linger in the buffer.

      Since I have been unable to find the root cause of the bug, I propose that we dump the edits remaining in the buffer into the NameNode log before crashing the NameNode. We can then use this new capability to find the ops that sneak into the buffer unexpectedly, and hopefully catch the bug.

      This effort is orthogonal to, but related to, HDFS-11292, which adds more informational logging to help debug this issue.
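The dump-before-crash proposal above can be sketched roughly as follows. This is a minimal, hypothetical model and not the code in the attached patches: EditBufferSketch, writeOp, flush, dumpRemainingOps, and the (txid, description) record layout are all invented for illustration, and a List<String> stands in for the NameNode log so the output can be inspected.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not Hadoop's EditsDoubleBuffer API): a buffer of
 * serialized ops that, if closed while it still holds unflushed bytes,
 * decodes and logs the leftover ops before failing.
 */
public class EditBufferSketch {
  private final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
  private final DataOutputStream out = new DataOutputStream(bytes);
  // Stand-in for the NameNode log.
  public final List<String> log = new ArrayList<>();

  /** Serializes one op as a (txid, description) record. */
  public void writeOp(long txid, String description) throws IOException {
    out.writeLong(txid);
    out.writeUTF(description);
  }

  /** Pretends to flush: buffered bytes are considered persisted and dropped. */
  public void flush() {
    bytes.reset();
  }

  /** Decodes every record still in the buffer and logs it in readable form. */
  void dumpRemainingOps() throws IOException {
    DataInputStream in =
        new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
    while (in.available() > 0) {
      long txid = in.readLong();
      String description = in.readUTF();
      log.add("Unflushed op txid=" + txid + ": " + description);
    }
  }

  /** Refuses to close with unflushed data, but dumps that data first. */
  public void close() throws IOException {
    int remaining = bytes.size();
    if (remaining != 0) {
      dumpRemainingOps();
      throw new IOException(remaining + " bytes still to be flushed; cannot close");
    }
  }

  public static void main(String[] args) throws IOException {
    EditBufferSketch buf = new EditBufferSketch();
    buf.writeOp(1, "OP_MKDIR /a");
    buf.flush();
    buf.writeOp(2, "OP_DELETE /b"); // an op that "sneaks in" after the flush
    try {
      buf.close();
    } catch (IOException e) {
      buf.log.forEach(System.out::println);
      System.out.println("close failed: " + e.getMessage());
    }
  }
}
```

The point of the design is ordering: the leftover ops are decoded and logged before the exception is thrown, so the evidence reaches the log even though the caller is about to abort.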

      1. HDFS-11306.001.patch
        4 kB
        Wei-Chiu Chuang
      2. HDFS-11306.002.patch
        6 kB
        Wei-Chiu Chuang
      3. HDFS-11306.003.patch
        6 kB
        Wei-Chiu Chuang


          Activity

          yzhangal Yongjun Zhang added a comment -

          I see that Wei-Chiu Chuang committed this to trunk and branch-2 on 1/13/2017.

          Thanks Wei-Chiu!

          jojochuang Wei-Chiu Chuang added a comment -

          Will commit the v003 patch by end of day.

          jojochuang Wei-Chiu Chuang added a comment -

          Yongjun Zhang, thanks for the review. All of the failed tests look flaky to me.

          TestDFSRSDefault10x4StripedOutputStreamWithFailure, TestDFSStripedOutputStreamWithFailure050, and TestDFSStripedOutputStreamWithFailure120 passed in my local tree.

          TestDFSStripedOutputStreamWithFailure120 failed the first time, then passed the second time.

          yzhangal Yongjun Zhang added a comment -

          Thanks for the updated patch Wei-Chiu Chuang. Would you please take a look at the failed tests? I'm +1 other than that.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 17s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          +1 mvninstall 13m 23s trunk passed
          +1 compile 0m 49s trunk passed
          +1 checkstyle 0m 27s trunk passed
          +1 mvnsite 0m 53s trunk passed
          +1 mvneclipse 0m 15s trunk passed
          +1 findbugs 1m 49s trunk passed
          +1 javadoc 0m 40s trunk passed
          +1 mvninstall 0m 48s the patch passed
          +1 compile 0m 45s the patch passed
          +1 javac 0m 45s the patch passed
          +1 checkstyle 0m 25s the patch passed
          +1 mvnsite 0m 52s the patch passed
          +1 mvneclipse 0m 10s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 55s the patch passed
          +1 javadoc 0m 39s the patch passed
          -1 unit 109m 52s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 21s The patch does not generate ASF License warnings.
          Total runtime: 135m 34s



          Reason Tests
          Failed junit tests hadoop.hdfs.TestDFSRSDefault10x4StripedOutputStreamWithFailure
            hadoop.hdfs.TestDFSStripedOutputStreamWithFailure120
            hadoop.hdfs.server.namenode.ha.TestHAAppend
            hadoop.hdfs.TestDFSStripedOutputStreamWithFailure050
            hadoop.hdfs.TestDFSStripedOutputStreamWithFailure190



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:a9ad5d6
          JIRA Issue HDFS-11306
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12847324/HDFS-11306.003.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux bf6ac3686683 3.13.0-96-generic #143-Ubuntu SMP Mon Aug 29 20:15:20 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 1f344e0
          Default Java 1.8.0_111
          findbugs v3.0.0
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/18163/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/18163/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/18163/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          jojochuang Wei-Chiu Chuang added a comment -

          JIRA is back. Here is my v003 patch, which addresses the checkstyle issue and updates the warning messages.

          Thanks again for reviewing the patch!

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 19s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          +1 mvninstall 14m 24s trunk passed
          +1 compile 1m 3s trunk passed
          +1 checkstyle 0m 32s trunk passed
          +1 mvnsite 1m 6s trunk passed
          +1 mvneclipse 0m 17s trunk passed
          +1 findbugs 2m 5s trunk passed
          +1 javadoc 0m 51s trunk passed
          +1 mvninstall 0m 58s the patch passed
          +1 compile 0m 58s the patch passed
          +1 javac 0m 58s the patch passed
          -0 checkstyle 0m 29s hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 5 unchanged - 0 fixed = 6 total (was 5)
          +1 mvnsite 0m 59s the patch passed
          +1 mvneclipse 0m 14s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 2m 25s the patch passed
          +1 javadoc 0m 42s the patch passed
          -1 unit 75m 53s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 19s The patch does not generate ASF License warnings.
          Total runtime: 105m 2s



          Reason Tests
          Timed out junit tests org.apache.hadoop.hdfs.server.blockmanagement.TestBlockStatsMXBean



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:a9ad5d6
          JIRA Issue HDFS-11306
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12846724/HDFS-11306.002.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 69289742c7af 3.13.0-105-generic #152-Ubuntu SMP Fri Dec 2 15:37:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / e692316
          Default Java 1.8.0_111
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/18139/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/18139/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/18139/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/18139/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          jojochuang Wei-Chiu Chuang added a comment -

          Uploaded the v002 patch to address Yongjun Zhang's comments.

          jojochuang Wei-Chiu Chuang added a comment -

          Thanks Yongjun Zhang for the review.

          Regarding the suggestion to add a finally block and call

          IOUtils.cleanup(LOG, dis);
          IOUtils.cleanup(LOG, bis);

          this is not needed: ByteArrayInputStream.close() is a no-op, and DataInputStream.close() only closes the wrapped ByteArrayInputStream. I'll work on the other two comments soon.
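A small demonstration of why the extra cleanup adds nothing for these two streams. The JDK documents that closing a ByteArrayInputStream has no effect and that its methods can still be called afterwards; DataInputStream inherits FilterInputStream.close(), which simply closes the wrapped stream. The class name CloseIsNoOpDemo is hypothetical, chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

/**
 * Illustrative only: shows that closing a DataInputStream that wraps a
 * ByteArrayInputStream releases nothing and does not stop further reads.
 */
public class CloseIsNoOpDemo {
  public static int readAfterClose() throws IOException {
    byte[] raw = {0, 0, 0, 42}; // big-endian encoding of the int 42
    ByteArrayInputStream bis = new ByteArrayInputStream(raw);
    DataInputStream dis = new DataInputStream(bis);
    dis.close();          // delegates to bis.close(), which has no effect
    return dis.readInt(); // still succeeds after close()
  }

  public static void main(String[] args) throws IOException {
    System.out.println(readAfterClose()); // prints 42
  }
}
```

Closing in a finally block remains the right habit for streams backed by files or sockets; for purely in-memory streams like these it is harmless but unnecessary.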

          yzhangal Yongjun Zhang added a comment -

          Hi Wei-Chiu Chuang,

          Thanks much for working on this issue!

          Some comments on the patch:

          1. I suggest printing a summary WARN message at the beginning of dumpRemainingEditLogs(), stating something like "The edits buffer should have been flushed but there are still <numTxns> unflushed. Below is the list of the unflushed transactions:".
          2. Add a finally block and call
                    IOUtils.cleanup(LOG, dis);
                    IOUtils.cleanup(LOG, bis);
          3. Can we add a couple more different edits to the test?
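Review items 1 and 3 can be sketched together: keep a transaction count as ops are buffered so the dump can open with a summary WARN line, and exercise it with several different op types. This is a hypothetical model, not the patch: UnflushedSummarySketch, write, flush, and dumpLines are invented names, the op strings are illustrative, and the message wording is adapted from item 1.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: counts buffered transactions so that a dump can
 * start with a summary line, followed by one line per unflushed op.
 */
public class UnflushedSummarySketch {
  private final List<String> pendingOps = new ArrayList<>();
  // In a real byte buffer the op count cannot be read back without decoding,
  // so it is tracked separately as each op is written.
  private int numTxns = 0;

  public void write(String op) {
    pendingOps.add(op);
    numTxns++;
  }

  public void flush() {
    pendingOps.clear();
    numTxns = 0;
  }

  /** Summary line first, then one numbered line per unflushed transaction. */
  public List<String> dumpLines() {
    List<String> lines = new ArrayList<>();
    lines.add("The edits buffer should have been flushed but there are still "
        + numTxns + " unflushed. Below is the list of the unflushed transactions:");
    for (int i = 0; i < pendingOps.size(); i++) {
      lines.add("Unflushed op [" + i + "]: " + pendingOps.get(i));
    }
    return lines;
  }

  public static void main(String[] args) {
    UnflushedSummarySketch buf = new UnflushedSummarySketch();
    // Several different kinds of edits, in the spirit of item 3.
    buf.write("OP_MKDIR /dir");
    buf.write("OP_SET_REPLICATION /dir/file 3");
    buf.write("OP_DELETE /tmp");
    buf.dumpLines().forEach(System.out::println);
  }
}
```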

          Thanks.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 22s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          +1 mvninstall 16m 21s trunk passed
          +1 compile 1m 3s trunk passed
          +1 checkstyle 0m 34s trunk passed
          +1 mvnsite 1m 8s trunk passed
          +1 mvneclipse 0m 16s trunk passed
          +1 findbugs 2m 7s trunk passed
          +1 javadoc 0m 47s trunk passed
          +1 mvninstall 1m 5s the patch passed
          +1 compile 0m 55s the patch passed
          +1 javac 0m 55s the patch passed
          -0 checkstyle 0m 27s hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 5 unchanged - 0 fixed = 6 total (was 5)
          +1 mvnsite 1m 3s the patch passed
          +1 mvneclipse 0m 13s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 2m 15s the patch passed
          +1 javadoc 0m 46s the patch passed
          -1 unit 89m 45s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 28s The patch does not generate ASF License warnings.
          Total runtime: 121m 9s



          Reason Tests
          Failed junit tests hadoop.hdfs.TestErasureCodeBenchmarkThroughput
          Timed out junit tests org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting
            org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureToleration
            org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:a9ad5d6
          JIRA Issue HDFS-11306
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12846419/HDFS-11306.001.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 908f880b1514 3.13.0-105-generic #152-Ubuntu SMP Fri Dec 2 15:37:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 91bf504
          Default Java 1.8.0_111
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/18116/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/18116/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/18116/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/18116/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          jojochuang Wei-Chiu Chuang added a comment -

          Uploaded patch v001. This patch adds a private method that dumps the remaining edit logs in human-readable format into the NameNode log. A test case is also added.

          Any suggestion is greatly appreciated. I honestly do not have much experience with edit logs and HA.


            People

            • Assignee:
              jojochuang Wei-Chiu Chuang
              Reporter:
              jojochuang Wei-Chiu Chuang
            • Votes:
              0
              Watchers:
              6
