Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: HA branch (HDFS-1623)
    • Fix Version/s: HA branch (HDFS-1623)
    • Component/s: ha, namenode
    • Labels: None

    Description

      I'm seeing an NPE when running HBase 0.92 unit tests against the HA branch. The test failure is: org.apache.hadoop.hbase.regionserver.wal.TestHLog.testAppendClose.

      Here is the backtrace:
      java.lang.NullPointerException
      at org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.size(BlocksMap.java:179)
      at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.getActiveBlockCount(BlockManager.java:2465)
      at org.apache.hadoop.hdfs.server.namenode.FSNamesystem$SafeModeInfo.doConsistencyCheck(FSNamesystem.java:3591)
      at org.apache.hadoop.hdfs.server.namenode.FSNamesystem$SafeModeInfo.isOn(FSNamesystem.java:3285)
      at org.apache.hadoop.hdfs.server.namenode.FSNamesystem$SafeModeInfo.access$900(FSNamesystem.java:3196)
      at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.isInSafeMode(FSNamesystem.java:3670)
      at org.apache.hadoop.hdfs.server.namenode.NameNode.isInSafeMode(NameNode.java:609)
      at org.apache.hadoop.hdfs.MiniDFSCluster.isNameNodeUp(MiniDFSCluster.java:1476)
      at org.apache.hadoop.hdfs.MiniDFSCluster.isClusterUp(MiniDFSCluster.java:1487)

      Here is the relevant section of the test:

          try {
            DistributedFileSystem dfs = (DistributedFileSystem) cluster.getFileSystem();
            dfs.setSafeMode(FSConstants.SafeModeAction.SAFEMODE_ENTER);
            cluster.shutdown();
            try {
              // wal.writer.close() will throw an exception,
              // but still call this since it closes the LogSyncer thread first
              wal.close();
            } catch (IOException e) {
              LOG.info(e);
            }
            fs.close(); // closing FS last so DFSOutputStream can't call close
            LOG.info("STOPPED first instance of the cluster");
          } finally {
            // Restart the cluster
            while (cluster.isClusterUp()){
              LOG.error("Waiting for cluster to go down");
              Thread.sleep(1000);
            }
      

      Fix looks trivial, will include patch shortly.
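The failure mode in the backtrace can be illustrated with a minimal, self-contained sketch. This is not the committed patch and does not use Hadoop code: `BlockStore`, `SafeMode`, `isOnUnguarded`, and `isOnGuarded` are hypothetical stand-ins, and the assumption that the block map's backing storage is null after shutdown is inferred from the NPE in `BlocksMap.size`. It shows one possible guard: skip the consistency check when the block state is no longer available.

```java
import java.util.HashMap;
import java.util.Map;

class SafeModeNpeSketch {

    /** Hypothetical stand-in for a block map whose storage is released on close(). */
    static class BlockStore {
        private Map<Long, String> blocks = new HashMap<>();

        void add(long id, String name) { blocks.put(id, name); }

        /** Assumption: shutdown drops the backing map, as the NPE in size() suggests. */
        void close() { blocks = null; }

        boolean isOpen() { return blocks != null; }

        /** Throws NullPointerException once the store has been closed. */
        int size() { return blocks.size(); }
    }

    /** Hypothetical stand-in for safe mode state that consults the block store. */
    static class SafeMode {
        private final BlockStore store;
        private final int expectedBlocks;

        SafeMode(BlockStore store, int expectedBlocks) {
            this.store = store;
            this.expectedBlocks = expectedBlocks;
        }

        /** Unguarded: always runs the consistency check, like the failing call chain. */
        boolean isOnUnguarded() {
            checkConsistency();
            return true;
        }

        /** Guarded: runs the consistency check only while the store is still open. */
        boolean isOnGuarded() {
            if (store.isOpen()) {
                checkConsistency();
            }
            return true;
        }

        private void checkConsistency() {
            if (store.size() != expectedBlocks) {
                throw new IllegalStateException("block count mismatch");
            }
        }
    }

    public static void main(String[] args) {
        BlockStore store = new BlockStore();
        store.add(1L, "blk_1");
        SafeMode safeMode = new SafeMode(store, 1);

        store.close(); // simulates cluster.shutdown() while safe mode is still on

        try {
            safeMode.isOnUnguarded();
        } catch (NullPointerException e) {
            System.out.println("unguarded check threw NullPointerException");
        }
        System.out.println("guarded check returned " + safeMode.isOnGuarded());
    }
}
```

In this sketch the polling loop from the test (`while (cluster.isClusterUp())`) corresponds to calling the safe mode check after the store has been closed; the guarded variant returns normally instead of throwing.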

      Attachments

        1. HDFS-2838-v2.patch
          3 kB
          Gregory Chanan
        2. HDFS-2838.patch
          0.9 kB
          Gregory Chanan

        Activity

          eli Eli Collins added a comment -

          +1

          umamaheswararao Uma Maheswara Rao G added a comment -

          Makes sense. I just verified in trunk; it looks like this bug is in the branch only.
          BTW, could you please also provide a test to reproduce this issue?

          umamaheswararao Uma Maheswara Rao G added a comment -

          Sorry, I did not notice Eli's review above.

          eli Eli Collins added a comment -

          No worries. Greg is going to take a stab at moving the kernel of TestHLog.testAppendClose into an HDFS test.

          tlipcon Todd Lipcon added a comment -

          looks good to me. Getting an HDFS test for this might be tricky, since this is only the case during startup, right?

          umamaheswararao Uma Maheswara Rao G added a comment -

          +1

          I just verified his sample test code. It passes for me. Yes, it would be tricky to create the situation where the safemode object is not null and the BlockManager is not completely up. Thanks Greg for the patch.

          gchanan Gregory Chanan added a comment -

          Added version 2 of the patch, which contains a test case that fails without the change and passes with it.

          eli Eli Collins added a comment -

          +1 nice test.

          eli Eli Collins added a comment -

          I've committed this. Thanks Greg!

          umamaheswararao Uma Maheswara Rao G added a comment -

          Thanks Greg.
          Eli, is this test failing reliably for you without the fix? For me, it passes even without the fix.
          It may be OK to keep this test; at least it can reproduce the problem intermittently, which may be better than nothing.
          @Greg, a small suggestion: next time, you can use HdfsConstants instead of FSConstants.

          hudson Hudson added a comment -

          Integrated in Hadoop-Hdfs-HAbranch-build #60 (See https://builds.apache.org/job/Hadoop-Hdfs-HAbranch-build/60/)
          HDFS-2838. NPE in FSNamesystem when in safe mode. Contributed by Gregory Chanan

          eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1236450
          Files :

          • /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/CHANGES.HDFS-1623.txt
          • /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
          • /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestMiniDFSCluster.java

          People

            Assignee: gchanan Gregory Chanan
            Reporter: gchanan Gregory Chanan
            Votes: 0
            Watchers: 5
