HBASE-3933: HMaster throws NullPointerException

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.90.6
    • Fix Version/s: None
    • Component/s: master
    • Labels:
      None
    • Release Note:
      It was recommended to continue working on this on a new JIRA since this one was closed and was too old to continue with.

      Description

      NullPointerException while HMaster is starting.

            java.lang.NullPointerException
              at java.util.TreeMap.getEntry(TreeMap.java:324)
              at java.util.TreeMap.get(TreeMap.java:255)
              at org.apache.hadoop.hbase.master.AssignmentManager.addToServers(AssignmentManager.java:1512)
              at org.apache.hadoop.hbase.master.AssignmentManager.regionOnline(AssignmentManager.java:606)
              at org.apache.hadoop.hbase.master.AssignmentManager.processFailover(AssignmentManager.java:214)
              at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:402)
              at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:283)
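
The top two frames of the trace are consistent with a lookup using a null key: a `java.util.TreeMap` that uses natural ordering throws `NullPointerException` from `get(null)`. The sketch below reproduces only that JDK behavior and shows one possible guard. The class name, the method names, and the use of `String` in place of the real server key type are assumptions made for illustration; this is not the HBase code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class NullKeyLookup {
    // Hypothetical stand-in for a server -> regions map; String replaces the real key type.
    static final Map<String, List<String>> servers = new TreeMap<>();

    // Lookup-then-append. TreeMap.get(null) throws NullPointerException
    // when the map uses natural ordering, so a null server fails here.
    static void addToServersUnsafe(String server, String region) {
        List<String> regions = servers.get(server);
        if (regions == null) {
            regions = new ArrayList<>();
            servers.put(server, regions);
        }
        regions.add(region);
    }

    // Same operation, but a missing server is reported to the caller
    // instead of being passed to the map. Returns true if the region was recorded.
    static boolean addToServersGuarded(String server, String region) {
        if (server == null) {
            return false;
        }
        addToServersUnsafe(server, region);
        return true;
    }

    public static void main(String[] args) {
        try {
            addToServersUnsafe(null, "region-a");
        } catch (NullPointerException e) {
            System.out.println("null key rejected by TreeMap");
        }
        System.out.println(addToServersGuarded(null, "region-a"));
        System.out.println(addToServersGuarded("rs1", "region-a"));
    }
}
```

A guard like this only hides the symptom. The discussion in the comments is about why the master has no record of the server in the first place.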
      
      1. Hmastersetup0.90
        4 kB
        gaojinchao
      2. HBASE-3933.patch
        13 kB
        Eugene Koontz
      3. HBASE-3933.patch
        13 kB
        Eugene Koontz

        Issue Links

          Activity

          gaojinchao added a comment -

          Please review it and give some suggestions. If the solution is OK, I will test it. Thanks.

          Because the region server interval is 3s, I think hbase.master.wait.on.regionservers.timeout should be 9s.

          Ted Yu added a comment -

          The changes in HMaster copied some existing code. We can add the new check to the existing if statement so that we don't copy the code in the if block.

          For the changes in waitForRegionServers(), I guess the rationale is to prevent the region server which carried .ROOT. from checking in after that point. If so, we should still let other region servers continue checking in.

          gaojinchao added a comment -

          Hi Ted.
          The HLog will be split after waitForRegionServers. So I prevent all region servers from checking in after that point and stop them.

          stack added a comment -

          @Gao I'm with Ted that you should not duplicate code. On going to 9 seconds from 4.5, do you think your change is generally applicable? Everyone will now have to wait at least nine seconds before startup proceeds. Or is it that you have a cluster with lots of regionservers? Can you not just up the configuration for how long to wait on new regionservers to check in? Also, can you say more on why you block regionservers coming in and actually tell them to shut down? That's a pretty radical change. Thanks.

          stack added a comment -

          Any update on this one, Gao? I'm moving it out of 0.90.4 until I hear otherwise.

          gaojinchao added a comment -

          OK, Thanks.
          It happens rarely. I don't have a better change right now.

          gaojinchao added a comment -

          Hi all. I have a new idea for this issue. Why don't we get the regionserver list from zk during failover?
          That way we can avoid the case where the hlog is split but the region server is still serving.

          gaojinchao added a comment -

          I studied TRUNK. It has been fixed there, so we can close this issue.

          Trunk code:
            // Wait for region servers to report in.
            this.serverManager.waitForRegionServers(status);
            // Check zk for regionservers that are up but didn't register
            for (ServerName sn: this.regionServerTracker.getOnlineServers()) {
              if (!this.serverManager.isServerOnline(sn)) {
                // Not registered; add it.
                LOG.info("Registering server found up in zk: " + sn);
                this.serverManager.recordNewServer(sn, HServerLoad.EMPTY_HSERVERLOAD);
              }
            }
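
The reconciliation step in the trunk snippet can be illustrated in isolation: after the wait, any server that is visible in ZooKeeper but has not registered with the master is added to the registered set. The following is a minimal, self-contained sketch of that idea using plain JDK collections. The class and method names are hypothetical, and `String` stands in for `ServerName`; no HBase or ZooKeeper API is used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class RegisterMissingServers {
    // Adds every server seen in the coordination service that is not yet
    // in the registered set, and returns the ones that were newly added.
    static List<String> registerMissing(Set<String> registered, List<String> seenInZk) {
        List<String> added = new ArrayList<>();
        for (String sn : seenInZk) {
            // Set.add returns false when the element is already present.
            if (registered.add(sn)) {
                added.add(sn);
            }
        }
        return added;
    }

    public static void main(String[] args) {
        Set<String> registered = new TreeSet<>(Arrays.asList("rs1"));
        List<String> seenInZk = Arrays.asList("rs1", "rs2", "rs3");
        List<String> added = registerMissing(registered, seenInZk);
        System.out.println("newly registered: " + added);
        System.out.println("all registered: " + registered);
    }
}
```

With this step in place, a server that is slow to report in is still known to the master before failover processing looks it up, which is the situation the comment above describes as fixed in TRUNK.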

          stack added a comment -

          Resolving after Gaojinchao did research and found it fixed in TRUNK. Thanks Gaojinchao.

          Eugene Koontz added a comment -

          The change that gaojinchao refers to was done here.

          Eugene Koontz added a comment -

          I would like to reopen this because I believe that the bug still exists in the 0.90 branch (though not in trunk as Gaojinchao said).

          Eugene Koontz added a comment -

          Adds new HMaster::verifyMetaTablesAreUp() method to avoid NPE in AssignmentManager::processFailover().

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12510546/HBASE-3933.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/756//console

          This message is automatically generated.

          Ted Yu added a comment -

          @Eugene:
          Have you run all unit tests?

          Normally a new JIRA should be opened since the original issue was resolved 4 months ago.

          Ted Yu added a comment -

          verifyMetaTablesAreUp isn't in HMaster under TRUNK.
          So this is not a backport.

          gaojinchao added a comment -

          @Eugene
          Yes, the issue exists in the 0.90 branch. I avoid it by increasing "hbase.master.wait.on.regionservers.timeout".

          gaojinchao added a comment -

          @Eugene
          In your patches, you only deal with the root/meta regionserver. If a normal regionserver registers later,
          the master will process it as a dead one, and some regions on that late server will be opened twice.

          Eugene Koontz added a comment -

          @Zhihong, correct that it's not a backport - would this mean that I should not open a new JIRA but rather continue with this one?

          Ted Yu added a comment -

          I think opening a new JIRA would be better.
          Please also address Jinchao's comment.

          Thanks

          Eugene Koontz added a comment -

          Increase test timeout on new testMasterFailoverWithSlowRS() test.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12510589/HBASE-3933.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/762//console

          This message is automatically generated.

          Eugene Koontz added a comment -

          @Zhihong, ok, will open new JIRA and link here. @Gaojinchao, will address your comments in the new JIRA.

          Eugene Koontz added a comment -

          Please see HBASE-5202 for the new JIRA.


            People

            • Assignee:
              Eugene Koontz
              Reporter:
              gaojinchao
            • Votes:
              0
              Watchers:
              2
