HBASE-5733

AssignmentManager#processDeadServersAndRegionsInTransition can fail with NPE.

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.95.2
    • Fix Version/s: 0.94.1, 0.95.0
    • Component/s: master
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      Found while going through the code.
      AssignmentManager#processDeadServersAndRegionsInTransition can fail with an NPE because it iterates directly over the nodes returned by listChildrenAndWatchForNewChildren without checking for null.

      We need to add a null check here, as is done in other places.
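
      The failure mode described above can be sketched as follows. This is a minimal, self-contained illustration and not the attached patch: the class NullSafeChildren, the Abortable interface, and processNodes are hypothetical stand-ins, and the only behaviour taken from this issue is that the list of children may be null and that the master is aborted with an IOException in that case.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NullSafeChildren {

    /** Hypothetical stand-in for the master's abort hook; not the HBase interface. */
    interface Abortable {
        void abort(String why, Throwable cause);
    }

    /**
     * Iterates over child node names, treating a null list as a failure to read
     * from ZooKeeper instead of letting the for-loop throw a NullPointerException.
     * Returns the number of nodes processed, or -1 if the abort path was taken.
     */
    static int processNodes(List<String> nodes, Abortable master, List<String> processed) {
        if (nodes == null) {
            // Assumption for this sketch: a null list means the children could not be read.
            String msg = "Failed to get the children from ZK";
            master.abort(msg, new IOException(msg));
            return -1;
        }
        for (String node : nodes) {
            processed.add(node);
        }
        return nodes.size();
    }

    public static void main(String[] args) {
        List<String> seen = new ArrayList<>();
        Abortable master = (why, cause) -> System.out.println("abort: " + why);
        System.out.println(processNodes(Arrays.asList("region-a", "region-b"), master, seen));
        System.out.println(processNodes(null, master, seen));
    }
}
```

      Without the null guard, the enhanced for-loop would dereference the null list and throw, which is the NPE this issue reports.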

      1. HBASE-5733.patch
        5 kB
        Uma Maheswara Rao G
      2. HBASE-5733.patch
        5 kB
        Uma Maheswara Rao G
      3. HBASE-5733.patch
        5 kB
        Uma Maheswara Rao G

        Activity

        Hudson added a comment -

        Integrated in HBase-0.92-security #109 (See https://builds.apache.org/job/HBase-0.92-security/109/)
        HBASE-5733 AssignmentManager#processDeadServersAndRegionsInTransition can fail with NPE. (Uma) (Revision 1344354)

        Result = SUCCESS
        ramkrishna :
        Files :

        • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
        Hudson added a comment -

        Integrated in HBase-0.94-security #33 (See https://builds.apache.org/job/HBase-0.94-security/33/)
        HBASE-5733 AssignmentManager#processDeadServersAndRegionsInTransition can fail with NPE. (Uma) (Revision 1344352)

        Result = FAILURE
        ramkrishna :
        Files :

        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
        • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java
        Hudson added a comment -

        Integrated in HBase-0.92 #433 (See https://builds.apache.org/job/HBase-0.92/433/)
        HBASE-5733 AssignmentManager#processDeadServersAndRegionsInTransition can fail with NPE. (Uma) (Revision 1344354)

        Result = FAILURE
        ramkrishna :
        Files :

        • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
        Hudson added a comment -

        Integrated in HBase-0.94 #233 (See https://builds.apache.org/job/HBase-0.94/233/)
        HBASE-5733 AssignmentManager#processDeadServersAndRegionsInTransition can fail with NPE. (Uma) (Revision 1344352)

        Result = FAILURE
        ramkrishna :
        Files :

        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
        • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java
        ramkrishna.s.vasudevan added a comment -

        Committed to 0.94 and 0.92. Hence resolving it.

        ramkrishna.s.vasudevan added a comment -

        Reopening so that once committed to other versions we can close it.

        ramkrishna.s.vasudevan added a comment -

        I think it's better that we also commit it to 0.94.1 before Lars takes the RC.

        Uma Maheswara Rao G added a comment -

        Since it got committed, marking it as closed.

        Hudson added a comment -

        Integrated in HBase-TRUNK-security #174 (See https://builds.apache.org/job/HBase-TRUNK-security/174/)
        HBASE-5733 AssignmentManager#processDeadServersAndRegionsInTransition can fail with NPE (Uma Maheswara Rao G) (Revision 1327364)

        Result = FAILURE
        tedyu :
        Files :

        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
        • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java
        Hudson added a comment -

        Integrated in HBase-TRUNK #2779 (See https://builds.apache.org/job/HBase-TRUNK/2779/)
        HBASE-5733 AssignmentManager#processDeadServersAndRegionsInTransition can fail with NPE (Uma Maheswara Rao G) (Revision 1327364)

        Result = FAILURE
        tedyu :
        Files :

        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
        • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java
        Ted Yu added a comment -

        From Hadoop QA test output, I didn't find the hanging test.

        Integrated to trunk.

        Thanks for the patch Uma.

        Thanks for the review, Stack.

        Uma Maheswara Rao G added a comment -

        There are no test failures; some tests were skipped, which is unrelated to this change. The Findbugs warnings are also unrelated.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12522970/HBASE-5733.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        -1 findbugs. The patch appears to introduce 4 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests:

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1550//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1550//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1550//console

        This message is automatically generated.

        Uma Maheswara Rao G added a comment -

        Attached the same patch as before, with the FATAL log removed.

        Uma Maheswara Rao G added a comment -

        Yeah, I have just seen that in the logs on a real cluster in this situation. I will remove the explicit FATAL log here.

        2012-04-17 11:18:39,353 FATAL org.apache.hadoop.hbase.master.AssignmentManager: Problem in getting the children from ZK. Going to abort
        2012-04-17 11:18:39,354 FATAL org.apache.hadoop.hbase.master.HMaster: Master server abort: loaded coprocessors are: []
        2012-04-17 11:18:39,354 FATAL org.apache.hadoop.hbase.master.HMaster: Problem in getting the children from ZK
        java.io.IOException: Failed to get the children from ZK
        at org.apache.hadoop.hbase.master.AssignmentManager.processDeadServersAndRegionsInTransition(AssignmentManager.java:398)
        at org.apache.hadoop.hbase.master.AssignmentManager.joinCluster(AssignmentManager.java:347)
        at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:537)
        at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:343)
        at java.lang.Thread.run(Thread.java:662)
        2012-04-17 11:18:39,355 INFO org.apache.hadoop.hbase.master.HMaster: Aborting

        stack added a comment -

        Patch looks good to me. I like the test. The LOG.fatal is redundant. The master abort does a log fatal. Else patch is good.

        Uma Maheswara Rao G added a comment -

        The test failure and the Findbugs warnings are unrelated to this change.

        I ran the test several times. It failed once in 10 runs without the patch.
        I will check the test failure separately, as it is not related.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12522828/HBASE-5733.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        -1 findbugs. The patch appears to introduce 3 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests:
        org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1540//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1540//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1540//console

        This message is automatically generated.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12522805/HBASE-5733.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        -1 findbugs. The patch appears to introduce 3 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests:

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1538//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1538//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1538//console

        This message is automatically generated.

        Uma Maheswara Rao G added a comment -

        Thanks a lot Ted for the reviews!
        Updated the patch with your suggestion.

        Ted Yu added a comment -

        Minor comment:
        Similar sentence appears 3 times below:

        +      LOG.fatal("Problem in getting the children from ZK. Going to abort");
        +      master.abort("Problem in getting the children from ZK", new IOException(
        +          "Failed to get the children from ZK"));
        +      return;
        

        Can "Failed to get the children from ZK" be shared ?
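
        For illustration only (this is not the committed patch), one way to share the message is to define it once and reuse it for the exception. The class AbortMessage, the constant FAILED_TO_GET_CHILDREN, and the helper childrenUnavailable are hypothetical names introduced for this sketch.

```java
import java.io.IOException;

public class AbortMessage {

    // Hypothetical constant: the single place where the message text lives.
    static final String FAILED_TO_GET_CHILDREN = "Failed to get the children from ZK";

    /** Builds the exception that would be handed to the abort call. */
    static IOException childrenUnavailable() {
        return new IOException(FAILED_TO_GET_CHILDREN);
    }

    public static void main(String[] args) {
        IOException cause = childrenUnavailable();
        // The abort reason and the exception message now come from one constant.
        System.out.println(FAILED_TO_GET_CHILDREN.equals(cause.getMessage()));
    }
}
```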

        Ted Yu added a comment -

        testProcessDeadServersAndRegionsInTransitionShouldNotFailWithNPE failed without the patch and passes with the patch.

        Uma Maheswara Rao G added a comment -

        Thanks a lot, Ted, for taking a look!
        Yep, I accidentally uploaded a slightly older version than today's patch. I have now uploaded the latest one, which I tested on a real cluster to confirm that the master aborts in this situation.

        Ted Yu added a comment -

        @Uma:
        Can you generate a patch for trunk ?
        I got the following when I tried to apply your patch to trunk:

        [ERROR] /Users/zhihyu/trunk-hbase/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java:[495,75] unreported exception com.google.protobuf.ServiceException; must be caught or declared to be thrown
        
        stack added a comment -

        If we can't get to ZK, then all bets are off (as Ram says, on connection-loss issues RZK will retry under the covers).

        ramkrishna.s.vasudevan added a comment -

        It is already a RecoverableZooKeeper, right? So retrying again may be redundant.

        Ted Yu added a comment -

        We should retry in this scenario.

        Uma Maheswara Rao G added a comment -

        When we cannot get the children because of a ZK problem, we may not be able to mark this as a failover, since there are no nodes.
        In fact, it currently throws an NPE. Do we need to shut down the master in this case, or can we retry?


          People

          • Assignee: Uma Maheswara Rao G
          • Reporter: Uma Maheswara Rao G
          • Votes: 0
          • Watchers: 5
