Hadoop HDFS / HDFS-10192

Namenode safemode not coming out during failover

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.9.0, 3.0.0-alpha1
    • Component/s: namenode
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      Scenario:
      =======
      1. Write some blocks.
      2. Wait until an edit log roll happens.
      3. Stop the standby NameNode (SNN).
      4. Delete some blocks on the active NameNode (ANN), and wait until the blocks are also deleted on the DataNodes (DN).
      5. Restart the SNN and wait until the block reports from the DataNodes reach the SNN.
      6. Kill the ANN, then make the SNN active.

      1. HDFS-10192-01.patch
        4 kB
        Brahma Reddy Battula
      2. HDFS-10192-02.patch
        6 kB
        Brahma Reddy Battula
      3. HDFS-10192-03.patch
        5 kB
        Brahma Reddy Battula

          Activity

          brahmareddy Brahma Reddy Battula added a comment -

          Jing Zhao, thanks a lot for the review and commit, and thanks to Mingliang Liu for the review.

          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #9568 (See https://builds.apache.org/job/Hadoop-trunk-Commit/9568/)
          HDFS-10192. Namenode safemode not coming out during failover. (jing9: rev 221b3a8722f84f8e9ad0a98eea38a12cc4ad2f24)

          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockManagerSafeMode.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestHASafeMode.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
          jingzhao Jing Zhao added a comment -

          I've committed this to trunk and branch-2. Thanks for the contribution, Brahma Reddy Battula! And thanks for the review, Mingliang Liu!

          brahmareddy Brahma Reddy Battula added a comment -

          Test failures are unrelated.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 9s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
          +1 mvninstall 6m 32s trunk passed
          +1 compile 0m 40s trunk passed with JDK v1.8.0_77
          +1 compile 0m 42s trunk passed with JDK v1.7.0_95
          +1 checkstyle 0m 24s trunk passed
          +1 mvnsite 0m 50s trunk passed
          +1 mvneclipse 0m 14s trunk passed
          +1 findbugs 1m 55s trunk passed
          +1 javadoc 1m 3s trunk passed with JDK v1.8.0_77
          +1 javadoc 1m 44s trunk passed with JDK v1.7.0_95
          +1 mvninstall 0m 44s the patch passed
          +1 compile 0m 35s the patch passed with JDK v1.8.0_77
          +1 javac 0m 35s the patch passed
          +1 compile 0m 39s the patch passed with JDK v1.7.0_95
          +1 javac 0m 39s the patch passed
          +1 checkstyle 0m 22s the patch passed
          +1 mvnsite 0m 48s the patch passed
          +1 mvneclipse 0m 11s the patch passed
          +1 whitespace 0m 0s Patch has no whitespace issues.
          +1 findbugs 2m 9s the patch passed
          +1 javadoc 1m 2s the patch passed with JDK v1.8.0_77
          +1 javadoc 1m 44s the patch passed with JDK v1.7.0_95
          -1 unit 55m 48s hadoop-hdfs in the patch failed with JDK v1.8.0_77.
          -1 unit 54m 5s hadoop-hdfs in the patch failed with JDK v1.7.0_95.
          +1 asflicense 0m 21s Patch does not generate ASF License warnings.
          Total runtime: 134m 43s



          Reason Tests
          JDK v1.8.0_77 Failed junit tests hadoop.hdfs.server.mover.TestStorageMover
            hadoop.hdfs.TestHFlush
          JDK v1.7.0_95 Failed junit tests hadoop.hdfs.server.namenode.TestEditLog



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:fbe3e86
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12797232/HDFS-10192-03.patch
          JIRA Issue HDFS-10192
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 4ab6f742f0e6 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 21eb428
          Default Java 1.7.0_95
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_77 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95
          findbugs v3.0.0
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/15081/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_77.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/15081/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_95.txt
          unit test logs https://builds.apache.org/job/PreCommit-HDFS-Build/15081/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_77.txt https://builds.apache.org/job/PreCommit-HDFS-Build/15081/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_95.txt
          JDK v1.7.0_95 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/15081/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/15081/console
          Powered by Apache Yetus 0.2.0 http://yetus.apache.org

          This message was automatically generated.

          brahmareddy Brahma Reddy Battula added a comment -

          Jing Zhao, thanks for taking a look at this issue. I have uploaded the patch to address the above comment.

          The test failures are unrelated; those tests pass locally.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 13s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
          +1 mvninstall 7m 35s trunk passed
          +1 compile 0m 55s trunk passed with JDK v1.8.0_77
          +1 compile 0m 45s trunk passed with JDK v1.7.0_95
          +1 checkstyle 0m 24s trunk passed
          +1 mvnsite 0m 53s trunk passed
          +1 mvneclipse 0m 14s trunk passed
          +1 findbugs 2m 4s trunk passed
          +1 javadoc 1m 18s trunk passed with JDK v1.8.0_77
          +1 javadoc 2m 3s trunk passed with JDK v1.7.0_95
          +1 mvninstall 0m 51s the patch passed
          +1 compile 0m 54s the patch passed with JDK v1.8.0_77
          +1 javac 0m 54s the patch passed
          +1 compile 0m 45s the patch passed with JDK v1.7.0_95
          +1 javac 0m 45s the patch passed
          +1 checkstyle 0m 21s the patch passed
          +1 mvnsite 0m 54s the patch passed
          +1 mvneclipse 0m 12s the patch passed
          +1 whitespace 0m 0s Patch has no whitespace issues.
          +1 findbugs 2m 20s the patch passed
          +1 javadoc 1m 14s the patch passed with JDK v1.8.0_77
          +1 javadoc 1m 57s the patch passed with JDK v1.7.0_95
          -1 unit 111m 29s hadoop-hdfs in the patch failed with JDK v1.8.0_77.
          -1 unit 103m 44s hadoop-hdfs in the patch failed with JDK v1.7.0_95.
          +1 asflicense 0m 35s Patch does not generate ASF License warnings.
          Total runtime: 244m 23s



          Reason Tests
          JDK v1.8.0_77 Failed junit tests hadoop.hdfs.server.datanode.TestDataNodeUUID
            hadoop.hdfs.server.namenode.ha.TestEditLogTailer
            hadoop.hdfs.server.datanode.TestDataNodeMultipleRegistrations
            hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl
            hadoop.hdfs.server.datanode.TestDataNodeMetrics
            hadoop.hdfs.TestFileAppend
            hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure
            hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes
            hadoop.hdfs.server.blockmanagement.TestBlockManagerSafeMode
            hadoop.hdfs.qjournal.TestSecureNNWithQJM
            hadoop.hdfs.server.namenode.TestNamenodeCapacityReport
            hadoop.hdfs.server.datanode.TestDirectoryScanner
          JDK v1.8.0_77 Timed out junit tests org.apache.hadoop.hdfs.TestLeaseRecovery2
          JDK v1.7.0_95 Failed junit tests hadoop.hdfs.server.datanode.TestDataNodeUUID
            hadoop.hdfs.server.namenode.ha.TestHAMetrics
            hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure
            hadoop.hdfs.TestSafeModeWithStripedFile
            hadoop.hdfs.TestReconstructStripedFile
            hadoop.metrics2.sink.TestRollingFileSystemSinkWithSecureHdfs
            hadoop.hdfs.server.datanode.TestDirectoryScanner



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:fbe3e86
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12797122/HDFS-10192-02.patch
          JIRA Issue HDFS-10192
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux e2dc0f06e2f2 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 0005816
          Default Java 1.7.0_95
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_77 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95
          findbugs v3.0.0
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/15068/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_77.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/15068/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_95.txt
          unit test logs https://builds.apache.org/job/PreCommit-HDFS-Build/15068/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_77.txt https://builds.apache.org/job/PreCommit-HDFS-Build/15068/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_95.txt
          JDK v1.7.0_95 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/15068/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/15068/console
          Powered by Apache Yetus 0.2.0 http://yetus.apache.org

          This message was automatically generated.

          jingzhao Jing Zhao added a comment -

          Thanks for the fix, Brahma Reddy Battula. The patch looks good to me. One nit: the following code in the test can be simplified to a single statement, return cluster.getNamesystem(0).getBlockManager().getPendingDeletionBlocksCount() == 0;

                  if (cluster.getNamesystem(0).getBlockManager()
                      .getPendingDeletionBlocksCount() == 0) {
                    return true;
                  }
                  return false;
          

          Besides, there is a typo in

          168	    // should stay in PENDING_THRESHOLD during trasitionToActive
          

          +1 after addressing the comments.
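
For illustration, the nit above can be sketched in isolation. This is not the test code from the patch: the static field below is a hypothetical stand-in for cluster.getNamesystem(0).getBlockManager().getPendingDeletionBlocksCount(), and the class and method names are made up for the example.

```java
// Illustration only: the field below is a hypothetical stand-in for
// cluster.getNamesystem(0).getBlockManager().getPendingDeletionBlocksCount().
final class PendingDeletionCheck {
    static long pendingDeletionBlocksCount = 0;

    // Shape flagged in the review: branch, then return a boolean literal.
    static boolean verbose() {
        if (pendingDeletionBlocksCount == 0) {
            return true;
        }
        return false;
    }

    // Suggested shape: return the comparison directly.
    static boolean simplified() {
        return pendingDeletionBlocksCount == 0;
    }

    public static void main(String[] args) {
        // Both forms agree for a zero and a non-zero count.
        for (long count : new long[] {0, 3}) {
            pendingDeletionBlocksCount = count;
            System.out.println(count + ": " + verbose() + " / " + simplified());
        }
    }
}
```

Both methods return the same value for every input, so the change is purely a readability improvement.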

          brahmareddy Brahma Reddy Battula added a comment -

          Mingliang Liu, I have uploaded the patch and added one more unit test.

          liuml07 Mingliang Liu added a comment -

          s/avoid not/not/

          liuml07 Mingliang Liu added a comment -

          I had a look at HDFS-7046 and found that there was no easy fix to avoid the NPE caused by leaving safe mode early in the middle of an edit. For now I'm in favor of the current fix. We are deliberately avoid not leaving safe mode in the middle of an edit when failing over, and check the safe mode after starting active services.

          That said,

          1. I had a look at BlockManagerSafeMode#checkSafeMode(): if the safe mode is OFF, it is a no-op. This means we can check the safe mode without side effects (e.g. OFF -> PENDING_THRESHOLD). This is important if BlockManager#checkSafeMode() is public.
          2. I think we can add another unit test asserting that BlockManagerSafeMode#checkSafeMode() never leaves safe mode (even better, is a no-op) in the context of starting active services. This may be similar to the test case in the patch (or we can consolidate them into one single test).

          Any comment?
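
The "no-op when OFF" property in point 1 can be sketched with a toy state machine. This is a simplified model, not the Hadoop BlockManagerSafeMode source: the class name, the two-state enum, and the safeBlocks/threshold fields are assumptions made for the example.

```java
// Toy model only; not the Hadoop BlockManagerSafeMode implementation.
final class ToySafeMode {
    enum Status { PENDING_THRESHOLD, OFF }

    private Status status = Status.PENDING_THRESHOLD;
    private long safeBlocks;       // hypothetical count of reported blocks
    private final long threshold;  // hypothetical count needed to leave safe mode

    ToySafeMode(long threshold) {
        this.threshold = threshold;
    }

    void setSafeBlocks(long safeBlocks) {
        this.safeBlocks = safeBlocks;
    }

    Status status() {
        return status;
    }

    void checkSafeMode() {
        if (status == Status.OFF) {
            return; // no-op: a check never moves OFF back to PENDING_THRESHOLD
        }
        if (safeBlocks >= threshold) {
            status = Status.OFF; // PENDING_THRESHOLD -> OFF
        }
    }

    public static void main(String[] args) {
        ToySafeMode sm = new ToySafeMode(2);
        sm.setSafeBlocks(2);
        sm.checkSafeMode(); // threshold met, leaves safe mode
        sm.setSafeBlocks(0);
        sm.checkSafeMode(); // already OFF, so the lower count changes nothing
        System.out.println(sm.status());
    }
}
```

Because the OFF branch returns before touching any state, an extra call from outside the class cannot push the model back into safe mode, which is the property that makes a public check method safe in this sketch.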

          liuml07 Mingliang Liu added a comment -

          I can verify that the test fails with trunk code. Thanks for the report.

          However, I don't think startActiveServices() should check safe mode explicitly in the finally block. 1) The block manager safe mode is a state machine and once the NN leaves it, it should never enter again (unless requested by user). 2) The block manager safe mode should maintain the state machine automatically when loading edit logs. That's why we made the BlockManager.checkSafeMode() package local, instead of public. If there is any bug, we should fix it there.

          So my gut feeling is that in BlockManagerSafeMode#checkSafeMode(), we wrongly bypass the internal state change if namesystem.inTransitionToActive() is true. Obviously this is related to HDFS-7046. I'll debug the code for a fix.

          Thanks.

          brahmareddy Brahma Reddy Battula added a comment -

          Pinging Mingliang Liu.

          liuml07 Mingliang Liu added a comment -

          Thanks for reporting this, Brahma Reddy Battula. I will have a look at the root cause and review the patch this week.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 10s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          +1 mvninstall 6m 36s trunk passed
          +1 compile 0m 39s trunk passed with JDK v1.8.0_74
          +1 compile 0m 40s trunk passed with JDK v1.7.0_95
          +1 checkstyle 0m 24s trunk passed
          +1 mvnsite 0m 51s trunk passed
          +1 mvneclipse 0m 13s trunk passed
          +1 findbugs 1m 59s trunk passed
          +1 javadoc 1m 5s trunk passed with JDK v1.8.0_74
          +1 javadoc 1m 45s trunk passed with JDK v1.7.0_95
          +1 mvninstall 0m 48s the patch passed
          +1 compile 0m 37s the patch passed with JDK v1.8.0_74
          +1 javac 0m 37s the patch passed
          +1 compile 0m 38s the patch passed with JDK v1.7.0_95
          +1 javac 0m 38s the patch passed
          +1 checkstyle 0m 23s the patch passed
          +1 mvnsite 0m 49s the patch passed
          +1 mvneclipse 0m 12s the patch passed
          +1 whitespace 0m 0s Patch has no whitespace issues.
          +1 findbugs 2m 8s the patch passed
          +1 javadoc 1m 3s the patch passed with JDK v1.8.0_74
          +1 javadoc 1m 46s the patch passed with JDK v1.7.0_95
          -1 unit 57m 25s hadoop-hdfs in the patch failed with JDK v1.8.0_74.
          +1 unit 52m 24s hadoop-hdfs in the patch passed with JDK v1.7.0_95.
          +1 asflicense 0m 20s Patch does not generate ASF License warnings.
          Total runtime: 134m 56s



          Reason Tests
          JDK v1.8.0_74 Failed junit tests hadoop.hdfs.TestAclsEndToEnd
            hadoop.hdfs.server.blockmanagement.TestComputeInvalidateWork



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:fbe3e86
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12794753/HDFS-10192-01.patch
          JIRA Issue HDFS-10192
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux e726cf1a1445 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / e7ed05e
          Default Java 1.7.0_95
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_74 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95
          findbugs v3.0.0
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/14892/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_74.txt
          unit test logs https://builds.apache.org/job/PreCommit-HDFS-Build/14892/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_74.txt
          JDK v1.7.0_95 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/14892/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/14892/console
          Powered by Apache Yetus 0.2.0 http://yetus.apache.org

          This message was automatically generated.

          brahmareddy Brahma Reddy Battula added a comment -

          Attached the patch.
          Kindly review.

          blockManager.checkSafeMode() was not called after startActiveServices().
          This call was present before HDFS-9129 and was missed during the refactoring in HDFS-9129.
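
A minimal sketch of why the missing call matters, under stated assumptions. It is a toy model, not FSNamesystem or BlockManager code: the class, fields, and the recheckInFinally switch are hypothetical. It also assumes one plausible reading of the scenario, namely that the expected block total shrinks while the transition to active is in progress, and that safe-mode checks are skipped during that window.

```java
// Toy model only; names and fields are hypothetical, not Hadoop source.
final class ToyFailover {
    enum Status { PENDING_THRESHOLD, OFF }

    Status status = Status.PENDING_THRESHOLD;
    private boolean inTransitionToActive;
    private long blockTotal;        // blocks the namespace currently expects
    private final long safeBlocks;  // blocks already reported by DataNodes

    ToyFailover(long blockTotal, long safeBlocks) {
        this.blockTotal = blockTotal;
        this.safeBlocks = safeBlocks;
    }

    void checkSafeMode() {
        if (inTransitionToActive) {
            return; // checks are skipped while the transition is in progress
        }
        if (status == Status.PENDING_THRESHOLD && safeBlocks >= blockTotal) {
            status = Status.OFF;
        }
    }

    // deletedBlocks models deletions that become visible during the transition.
    void startActiveServices(long deletedBlocks, boolean recheckInFinally) {
        inTransitionToActive = true;
        try {
            blockTotal -= deletedBlocks;
            checkSafeMode(); // bypassed because the transition flag is set
        } finally {
            inTransitionToActive = false;
            if (recheckInFinally) {
                checkSafeMode(); // the re-check that this issue restores
            }
        }
    }

    public static void main(String[] args) {
        ToyFailover before = new ToyFailover(10, 7);
        before.startActiveServices(3, false);
        ToyFailover after = new ToyFailover(10, 7);
        after.startActiveServices(3, true);
        System.out.println(before.status + " vs " + after.status);
    }
}
```

Without the re-check, nothing in the model evaluates the condition again after the flag is cleared, so it stays in PENDING_THRESHOLD even though the reported blocks now cover the expected total.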

          brahmareddy Brahma Reddy Battula added a comment -

          Broken by HDFS-9129


            People

            • Assignee:
              brahmareddy Brahma Reddy Battula
              Reporter:
              brahmareddy Brahma Reddy Battula
            • Votes:
              0
              Watchers:
              9
