HBase
  1. HBase
  2. HBASE-5883

Backup master is going down due to connection refused exception

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.90.6, 0.92.1, 0.94.0
    • Fix Version/s: 0.94.1, 0.95.0
    • Component/s: master
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      The active master node network was down for some time (This node contains Master,DN,ZK,RS). Here backup node got
      notification, and started to became active. Immedietly backup node got aborted with the below exception.

      2012-04-09 10:42:24,270 INFO org.apache.hadoop.hbase.master.SplitLogManager: finished splitting (more than or equal to) 861248320 bytes in 4 log files in [hdfs://192.168.47.205:9000/hbase/.logs/HOST-192-168-47-202,60020,1333715537172-splitting] in 26374ms
      2012-04-09 10:42:24,316 FATAL org.apache.hadoop.hbase.master.HMaster: Master server abort: loaded coprocessors are: []
      2012-04-09 10:42:24,333 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown.
      java.io.IOException: java.net.ConnectException: Connection refused
      	at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:375)
      	at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1045)
      	at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:897)
      	at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150)
      	at $Proxy13.getProtocolVersion(Unknown Source)
      	at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:183)
      	at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:303)
      	at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:280)
      	at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:332)
      	at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:236)
      	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1276)
      	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1233)
      	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1220)
      	at org.apache.hadoop.hbase.catalog.CatalogTracker.getCachedConnection(CatalogTracker.java:569)
      	at org.apache.hadoop.hbase.catalog.CatalogTracker.getRootServerConnection(CatalogTracker.java:369)
      	at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForRootServerConnection(CatalogTracker.java:353)
      	at org.apache.hadoop.hbase.catalog.CatalogTracker.verifyRootRegionLocation(CatalogTracker.java:660)
      	at org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:616)
      	at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:540)
      	at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:363)
      	at java.lang.Thread.run(Thread.java:662)
      Caused by: java.net.ConnectException: Connection refused
      	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
      	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
      	at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
      	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:488)
      	at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupConnection(HBaseClient.java:328)
      	at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:362)
      	... 20 more
      2012-04-09 10:42:24,336 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
      2012-04-09 10:42:24,336 DEBUG org.apache.hadoop.hbase.master.HMaster: Stopping service threads
      
      1. 90-addendum.patch
        1 kB
        Jieshan Bean
      2. 92-addendum.patch
        1 kB
        Jieshan Bean
      3. 94-addendum.patch
        1 kB
        Jieshan Bean
      4. HBASE-5883-90.patch
        4 kB
        Jieshan Bean
      5. HBASE-5883-92.patch
        4 kB
        Jieshan Bean
      6. HBASE-5883-94.patch
        4 kB
        Jieshan Bean
      7. HBASE-5883-trunk.patch
        4 kB
        Jieshan Bean
      8. trunk-addendum.patch
        1 kB
        Jieshan Bean

        Issue Links

          Activity

          Hide
          Jieshan Bean added a comment -

          From the below log:

          2012-04-09 10:42:24,333 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown.
          java.io.IOException: java.net.ConnectException: Connection refused
          

          We can deduce ConnectException was packaged as a IOException, likes below:

          new IOException(new ConnecException("Connection refused"));

          or something likes:
          new IOException(connectException.toString());

          If so, this exception is not handled from the code.

          Show
          Jieshan Bean added a comment - From the below log: 2012-04-09 10:42:24,333 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown. java.io.IOException: java.net.ConnectException: Connection refused We can deduce ConnectException was packaged as a IOException, likes below: new IOException(new ConnecException("Connection refused")); or something likes: new IOException(connectException.toString()); If so, this exception is not handled from the code.
          Hide
          Jieshan Bean added a comment -

          Patch for 94. All tests passed. We are still testing it in real cluster.
          Your comments before I post the results is welcome.
          Thank you.

          Show
          Jieshan Bean added a comment - Patch for 94. All tests passed. We are still testing it in real cluster. Your comments before I post the results is welcome. Thank you.
          Hide
          Ted Yu added a comment -

          Why do we need the following code ?

          +        } else if (ioex.getMessage().toLowerCase()
          +            .contains("connection refused")) {
          +          ce = new ConnectException(ioex.getMessage());
          
          Show
          Ted Yu added a comment - Why do we need the following code ? + } else if (ioex.getMessage().toLowerCase() + .contains( "connection refused" )) { + ce = new ConnectException(ioex.getMessage());
          Hide
          Jieshan Bean added a comment -

          Just in case of the exception likes:
          new IOException(connectionException.toString())

          Show
          Jieshan Bean added a comment - Just in case of the exception likes: new IOException(connectionException.toString())
          Hide
          Ted Yu added a comment -

          @Jieshan:
          Do you want to attach patch for trunk so that Hadoop QA can run tests ?

          Show
          Ted Yu added a comment - @Jieshan: Do you want to attach patch for trunk so that Hadoop QA can run tests ?
          Hide
          Jieshan Bean added a comment -

          I have asked our tester to test this patch in real cluster. It can be finished today, then I will attach the patch for trunk

          Show
          Jieshan Bean added a comment - I have asked our tester to test this patch in real cluster. It can be finished today, then I will attach the patch for trunk
          Hide
          Jieshan Bean added a comment -

          Is that ok?

          Show
          Jieshan Bean added a comment - Is that ok?
          Hide
          Lars Hofhansl added a comment -

          Just in case of the exception likes: new IOException(connectionException.toString())

          If we have code like that, I would consider that a bug.

          Show
          Lars Hofhansl added a comment - Just in case of the exception likes: new IOException(connectionException.toString()) If we have code like that, I would consider that a bug.
          Hide
          Jieshan Bean added a comment -

          I didn't find the code like that in HBase, and I can't figure out where this exception come from.
          That's just one possibility based on my analysis.

          Show
          Jieshan Bean added a comment - I didn't find the code like that in HBase, and I can't figure out where this exception come from. That's just one possibility based on my analysis.
          Hide
          Ted Yu added a comment -

          A brief search in 0.94 branch revealed the following:

                      throw new IOException(ite.toString(), ite);
                      throw new IOException(t.toString(), t);
          src/main/java/org/apache/hadoop/hbase/client/coprocessor/Batch.java
                  IOException ioe = new IOException(target.toString());
                  IOException ioe = new IOException(e.toString());
          src/main/java/org/apache/hadoop/hbase/ipc/WritableRpcEngine.java
                IOException ioe = new IOException(target.toString());
                IOException ioe = new IOException(e.toString());
          src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          

          @Jieshan:
          Testing in real cluster is needed. Take your time.

          Show
          Ted Yu added a comment - A brief search in 0.94 branch revealed the following: throw new IOException(ite.toString(), ite); throw new IOException(t.toString(), t); src/main/java/org/apache/hadoop/hbase/client/coprocessor/Batch.java IOException ioe = new IOException(target.toString()); IOException ioe = new IOException(e.toString()); src/main/java/org/apache/hadoop/hbase/ipc/WritableRpcEngine.java IOException ioe = new IOException(target.toString()); IOException ioe = new IOException(e.toString()); src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java @Jieshan: Testing in real cluster is needed. Take your time.
          Hide
          Jieshan Bean added a comment -

          Patch for trunk. We have tested it in real cluster.

          Show
          Jieshan Bean added a comment - Patch for trunk. We have tested it in real cluster.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12525272/HBASE-5883-trunk.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 hadoop23. The patch compiles against the hadoop 0.23.x profile.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1721//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1721//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1721//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12525272/HBASE-5883-trunk.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 hadoop23. The patch compiles against the hadoop 0.23.x profile. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1721//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1721//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1721//console This message is automatically generated.
          Hide
          Ted Yu added a comment -

          Will integrate the patch to trunk and 0.94 branch tomorrow @ 10 AM PST if there is no objection.

          Show
          Ted Yu added a comment - Will integrate the patch to trunk and 0.94 branch tomorrow @ 10 AM PST if there is no objection.
          Hide
          Jieshan Bean added a comment -

          Patches for all the branches. All test cases passed.

          Show
          Jieshan Bean added a comment - Patches for all the branches. All test cases passed.
          Hide
          Ted Yu added a comment -

          Integrated to 0.94 and trunk.

          Show
          Ted Yu added a comment - Integrated to 0.94 and trunk.
          Hide
          Ted Yu added a comment -

          Integrated to 0.92 and 0.90 as well.

          Thanks for the patch Jieshan.

          Thanks for the review, Lars.

          Show
          Ted Yu added a comment - Integrated to 0.92 and 0.90 as well. Thanks for the patch Jieshan. Thanks for the review, Lars.
          Hide
          Hudson added a comment -

          Integrated in HBase-TRUNK #2843 (See https://builds.apache.org/job/HBase-TRUNK/2843/)
          HBASE-5883 Backup master is going down due to connection refused exception (Jieshan) (Revision 1333530)

          Result = SUCCESS
          tedyu :
          Files :

          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseRPC.java
          Show
          Hudson added a comment - Integrated in HBase-TRUNK #2843 (See https://builds.apache.org/job/HBase-TRUNK/2843/ ) HBASE-5883 Backup master is going down due to connection refused exception (Jieshan) (Revision 1333530) Result = SUCCESS tedyu : Files : /hbase/trunk/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseRPC.java
          Hide
          Hudson added a comment -

          Integrated in HBase-0.94 #175 (See https://builds.apache.org/job/HBase-0.94/175/)
          HBASE-5883 Backup master is going down due to connection refused exception (Jieshan) (Revision 1333533)

          Result = FAILURE
          tedyu :
          Files :

          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/HBaseRPC.java
          Show
          Hudson added a comment - Integrated in HBase-0.94 #175 (See https://builds.apache.org/job/HBase-0.94/175/ ) HBASE-5883 Backup master is going down due to connection refused exception (Jieshan) (Revision 1333533) Result = FAILURE tedyu : Files : /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/HBaseRPC.java
          Hide
          Hudson added a comment -

          Integrated in HBase-0.92 #396 (See https://builds.apache.org/job/HBase-0.92/396/)
          HBASE-5883 Backup master is going down due to connection refused exception (Jieshan) (Revision 1333537)

          Result = FAILURE
          tedyu :
          Files :

          • /hbase/branches/0.92/CHANGES.txt
          • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
          • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/ipc/HBaseRPC.java
          Show
          Hudson added a comment - Integrated in HBase-0.92 #396 (See https://builds.apache.org/job/HBase-0.92/396/ ) HBASE-5883 Backup master is going down due to connection refused exception (Jieshan) (Revision 1333537) Result = FAILURE tedyu : Files : /hbase/branches/0.92/CHANGES.txt /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/ipc/HBaseRPC.java
          Hide
          stack added a comment -

          Can't we at least check the message to ensure its what we expect? (See the second catch below where we look for "connection reset"). Can we be sure what comes up here is the ConnectException we set down in HBaseRPC?

          +      if (ioe instanceof ConnectException) {
          +        // Catch. Connect refused.
          

          This redoing of an exception seems problematic. Its really necessary?

          +        } else if (ioex.getMessage().toLowerCase()
          +            .contains("connection refused")) {
          +          ce = new ConnectException(ioex.getMessage());
          +          ioe = ce;
          

          I'd feel better about this fix if we could figure where the exception came from (Its not from the rpc stringifying of exceptions to pass them from server to client?

          Show
          stack added a comment - Can't we at least check the message to ensure its what we expect? (See the second catch below where we look for "connection reset"). Can we be sure what comes up here is the ConnectException we set down in HBaseRPC? + if (ioe instanceof ConnectException) { + // Catch. Connect refused. This redoing of an exception seems problematic. Its really necessary? + } else if (ioex.getMessage().toLowerCase() + .contains( "connection refused" )) { + ce = new ConnectException(ioex.getMessage()); + ioe = ce; I'd feel better about this fix if we could figure where the exception came from (Its not from the rpc stringifying of exceptions to pass them from server to client?
          Hide
          Ted Yu added a comment -

          As I mentioned here: https://issues.apache.org/jira/browse/HBASE-5883?focusedCommentId=13266335&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13266335
          there're places where the original exception is hidden. e.g. In WritableRpcEngine.call():

                  IOException ioe = new IOException(e.toString());
                  ioe.setStackTrace(e.getStackTrace());
                  throw ioe;
          

          The above makes identifying the source of connection refused exception difficult.

          Similar technique is used in HBASE-5877 as well:

          2) hadoop.ipc serialization of exception is based on the #getMessage. So we have to parse it internally.

          Show
          Ted Yu added a comment - As I mentioned here: https://issues.apache.org/jira/browse/HBASE-5883?focusedCommentId=13266335&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13266335 there're places where the original exception is hidden. e.g. In WritableRpcEngine.call(): IOException ioe = new IOException(e.toString()); ioe.setStackTrace(e.getStackTrace()); throw ioe; The above makes identifying the source of connection refused exception difficult. Similar technique is used in HBASE-5877 as well: 2) hadoop.ipc serialization of exception is based on the #getMessage. So we have to parse it internally.
          Hide
          stack added a comment -

          Your argument is that it is ok to compound a flaw because we have the flaw elsewhere?

          hbase-5877 is something else altogether, it is message passing in an exception messsage because dumb rpc gives no other option – it is for sure not recast of thrown exceptions

          Show
          stack added a comment - Your argument is that it is ok to compound a flaw because we have the flaw elsewhere? hbase-5877 is something else altogether, it is message passing in an exception messsage because dumb rpc gives no other option – it is for sure not recast of thrown exceptions
          Hide
          Gopinathan A added a comment -

          Sorry for the confusion..

          When i was testing this scenario, HBASE-5673 patch was exist in my cluster. It caused me the above issue.

          Show
          Gopinathan A added a comment - Sorry for the confusion.. When i was testing this scenario, HBASE-5673 patch was exist in my cluster. It caused me the above issue.
          Hide
          Ted Yu added a comment -

          The scenario described in this JIRA may not be limited to the existence of HBASE-5673 patch.

          I think an addendum removing the “.contains("connection refused"))” is needed.

          Show
          Ted Yu added a comment - The scenario described in this JIRA may not be limited to the existence of HBASE-5673 patch. I think an addendum removing the “.contains("connection refused"))” is needed.
          Hide
          Jieshan Bean added a comment -

          I agree with adding an addendum patch. I think we should unify the behavior to handle the exceptions(I remember we found so many issues regarding on the exception conversion. Everyone fixes it with his own version).

          In the method of HBaseClient#setupConnection, we catches 2 types of exceptions: SocketTimeoutException & IOException.
          Before this patch, we also catches 2 types of exceptions but they were SocketTimeoutException & ConnectionException.
          I think the previous catching is better.

          I will prepare the addendum patch today if no objection.

          Thank you all.

          Show
          Jieshan Bean added a comment - I agree with adding an addendum patch. I think we should unify the behavior to handle the exceptions(I remember we found so many issues regarding on the exception conversion. Everyone fixes it with his own version). In the method of HBaseClient#setupConnection, we catches 2 types of exceptions: SocketTimeoutException & IOException. Before this patch, we also catches 2 types of exceptions but they were SocketTimeoutException & ConnectionException. I think the previous catching is better. I will prepare the addendum patch today if no objection. Thank you all.
          Hide
          Hudson added a comment -

          Integrated in HBase-TRUNK-security #191 (See https://builds.apache.org/job/HBase-TRUNK-security/191/)
          HBASE-5883 Backup master is going down due to connection refused exception (Jieshan) (Revision 1333530)

          Result = SUCCESS
          tedyu :
          Files :

          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseRPC.java
          Show
          Hudson added a comment - Integrated in HBase-TRUNK-security #191 (See https://builds.apache.org/job/HBase-TRUNK-security/191/ ) HBASE-5883 Backup master is going down due to connection refused exception (Jieshan) (Revision 1333530) Result = SUCCESS tedyu : Files : /hbase/trunk/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseRPC.java
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12525583/HBASE-5883-trunk-addendum.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 hadoop23. The patch compiles against the hadoop 0.23.x profile.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1762//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1762//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1762//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12525583/HBASE-5883-trunk-addendum.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 hadoop23. The patch compiles against the hadoop 0.23.x profile. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1762//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1762//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1762//console This message is automatically generated.
          Hide
          Jieshan Bean added a comment -

          Addendum patches for all branches.

          Show
          Jieshan Bean added a comment - Addendum patches for all branches.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12525595/HBASE-5883-94-addendum.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 hadoop23. The patch compiles against the hadoop 0.23.x profile.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.client.TestFromClientSide

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1763//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1763//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1763//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12525595/HBASE-5883-94-addendum.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 hadoop23. The patch compiles against the hadoop 0.23.x profile. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.client.TestFromClientSide Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1763//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1763//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1763//console This message is automatically generated.
          Hide
          Ted Yu added a comment -

          +1 on addendum

          Show
          Ted Yu added a comment - +1 on addendum
          Hide
          stack added a comment -

          Should we remove this too?

          +        } else if (ioex.getCause() != null
          +            && ioex.getCause() instanceof ConnectException) {
          +          ce = (ConnectException) ioex.getCause();
          +          ioe = ce;
          

          If the above happens, we'll get a stack trace that will be missing the last few stacks; i.e. the difference between here where its handled and wherever ConnectionException was originally thrown. It could be confuse debugging later?

          Also, should we pass the ioe into handleConnectionException? I'd think we'd do this for the case that ce is null (could that happen)?

          Good stuff.

          Show
          stack added a comment - Should we remove this too? + } else if (ioex.getCause() != null + && ioex.getCause() instanceof ConnectException) { + ce = (ConnectException) ioex.getCause(); + ioe = ce; If the above happens, we'll get a stack trace that will be missing the last few stacks; i.e. the difference between here where its handled and wherever ConnectionException was originally thrown. It could be confuse debugging later? Also, should we pass the ioe into handleConnectionException? I'd think we'd do this for the case that ce is null (could that happen)? Good stuff.
          Hide
          Hudson added a comment -

          Integrated in HBase-0.94-security #26 (See https://builds.apache.org/job/HBase-0.94-security/26/)
          HBASE-5883 Backup master is going down due to connection refused exception (Jieshan) (Revision 1333533)

          Result = SUCCESS
          tedyu :
          Files :

          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/HBaseRPC.java
          Show
          Hudson added a comment - Integrated in HBase-0.94-security #26 (See https://builds.apache.org/job/HBase-0.94-security/26/ ) HBASE-5883 Backup master is going down due to connection refused exception (Jieshan) (Revision 1333533) Result = SUCCESS tedyu : Files : /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/HBaseRPC.java
          Hide
          Hudson added a comment -

          Integrated in HBase-0.92-security #106 (See https://builds.apache.org/job/HBase-0.92-security/106/)
          HBASE-5883 Backup master is going down due to connection refused exception (Jieshan) (Revision 1333537)

          Result = SUCCESS
          tedyu :
          Files :

          • /hbase/branches/0.92/CHANGES.txt
          • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
          • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/ipc/HBaseRPC.java
          Show
          Hudson added a comment - Integrated in HBase-0.92-security #106 (See https://builds.apache.org/job/HBase-0.92-security/106/ ) HBASE-5883 Backup master is going down due to connection refused exception (Jieshan) (Revision 1333537) Result = SUCCESS tedyu : Files : /hbase/branches/0.92/CHANGES.txt /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/ipc/HBaseRPC.java
          Hide
          Jieshan Bean added a comment -

          Thank you, stack. It makes sense to me.
          Anyway, ConnectionException should not be wrapped into IOE. And we should pass ioe into handleConnectionException.
          I just thought we should handle all kinds of ConenctionExceptions(Either directly ConnectionException or wrapped ConnectionException).
          I will update the addendum patches today.

          Thank you for your review, Ted & Stack.

          Show
          Jieshan Bean added a comment - Thank you, stack. It makes sense to me. Anyway, ConnectionException should not be wrapped into IOE. And we should pass ioe into handleConnectionException. I just thought we should handle all kinds of ConenctionExceptions(Either directly ConnectionException or wrapped ConnectionException). I will update the addendum patches today. Thank you for your review, Ted & Stack.
          Hide
          Jieshan Bean added a comment -

          @Ted,
          What's your opinion on this? Thank you

          Show
          Jieshan Bean added a comment - @Ted, What's your opinion on this? Thank you
          Hide
          Ted Yu added a comment -

          handleConnectionException() is called under the condition of:

          +        if (ce != null) {
          +          handleConnectionException(++reconnectAttempts, maxAttempts, protocol,
          +              addr, ce);
          

          If ioe is passed, would the above condition change ?

          Show
          Ted Yu added a comment - handleConnectionException() is called under the condition of: + if (ce != null ) { + handleConnectionException(++reconnectAttempts, maxAttempts, protocol, + addr, ce); If ioe is passed, would the above condition change ?
          Hide
          Jieshan Bean added a comment -

          Yes. It should be changed.

          Show
          Jieshan Bean added a comment - Yes. It should be changed.
          Hide
          Jieshan Bean added a comment -

          After careful consideration, I think we should just handle the ConnectionException. Otherwise, just throw it(This is the similar logic before this re-writing).

             } catch (IOException ioex) {
                  // We only handle the ConnectException.
                  ConnectException ce = null;
                  if (ioex instanceof ConnectException) {
                    ce = (ConnectException) ioex;
                    ioe = ce;
                  } else {
                    // This is the exception we can't handle.
                    throw ioex;
                  }
                  if (ce != null) {
                    handleConnectionException(++reconnectAttempts, maxAttempts, protocol,
                        addr, ce);
                  }
                }
          
          Show
          Jieshan Bean added a comment - After careful consideration, I think we should just handle the ConnectionException. Otherwise, just throw it(This is the similar logic before this re-writing). } catch (IOException ioex) { // We only handle the ConnectException. ConnectException ce = null; if (ioex instanceof ConnectException) { ce = (ConnectException) ioex; ioe = ce; } else { // This is the exception we can't handle. throw ioex; } if (ce != null) { handleConnectionException(++reconnectAttempts, maxAttempts, protocol, addr, ce); } }
          Hide
          Jieshan Bean added a comment -

          Sorry, it should be:

            } catch (IOException ioex) {
                  // We only handle the ConnectException.
                  ConnectException ce = null;
                  if (ioex instanceof ConnectException) {
                    ce = (ConnectException) ioex;
                    ioe = ce;
                  } else {
                    // This is the exception we can't handle.
                    throw ioex;
                  }
                  handleConnectionException(++reconnectAttempts, maxAttempts, protocol,
                      addr, ce);
                }
          

          Please share your comment, Thank you.

          Show
          Jieshan Bean added a comment - Sorry, it should be: } catch (IOException ioex) { // We only handle the ConnectException. ConnectException ce = null; if (ioex instanceof ConnectException) { ce = (ConnectException) ioex; ioe = ce; } else { // This is the exception we can't handle. throw ioex; } handleConnectionException(++reconnectAttempts, maxAttempts, protocol, addr, ce); } Please share your comment, Thank you.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12525833/trunk-addendum.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 hadoop23. The patch compiles against the hadoop 0.23.x profile.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.TestDrainingServer

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1785//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1785//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1785//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12525833/trunk-addendum.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 hadoop23. The patch compiles against the hadoop 0.23.x profile. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.TestDrainingServer Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1785//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1785//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1785//console This message is automatically generated.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12525851/94-addendum.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 hadoop23. The patch compiles against the hadoop 0.23.x profile.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1786//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1786//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1786//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12525851/94-addendum.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 hadoop23. The patch compiles against the hadoop 0.23.x profile. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1786//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1786//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1786//console This message is automatically generated.
          Hide
          Lars Hofhansl added a comment -

          From the comment here I find it hard to determine whether this is fixed or not? Is it?

          Show
          Lars Hofhansl added a comment - From the comment here I find it hard to determine whether this is fixed or not? Is it?
          Hide
          Jieshan Bean added a comment -

          Yes. Nothing was fixed. It seems a misunderstanding to that exception. So the patch just like a code restructure.. So add an addendum patch to remove the unnecessary code..

          Show
          Jieshan Bean added a comment - Yes. Nothing was fixed. It seems a misunderstanding to that exception. So the patch just like a code restructure .. So add an addendum patch to remove the unnecessary code ..
          Hide
          Lars Hofhansl added a comment -

          Moving to 0.94.2 for now.

          Show
          Lars Hofhansl added a comment - Moving to 0.94.2 for now.
          Hide
          Gregory Chanan added a comment -

          Adding 0.92.2 and 0.90.7 to fix version, as this was originally checked in under those versions. I'm also unclear what needs to be done to get this to resolved, but it should be done to 0.90.7 and 0.92.2 as well.

          Show
          Gregory Chanan added a comment - Adding 0.92.2 and 0.90.7 to fix version, as this was originally checked in under those versions. I'm also unclear what needs to be done to get this to resolved, but it should be done to 0.90.7 and 0.92.2 as well.
          Hide
          stack added a comment -

          @Jieshan So, what do we need to do to close this issue out? What do we need to apply? Thanks.

          Show
          stack added a comment - @Jieshan So, what do we need to do to close this issue out? What do we need to apply? Thanks.
          Hide
          Jonathan Hsieh added a comment -

          @Jieshan since this was committed along time ago (5/3/12) I'd suggest creating a new issue to clean it up. I'll close this after it is done.

          Show
          Jonathan Hsieh added a comment - @Jieshan since this was committed along time ago (5/3/12) I'd suggest creating a new issue to clean it up. I'll close this after it is done.
          Hide
          Jonathan Hsieh added a comment -

          I'm resolving this. I believe Greg tested this and the addendum version and found that the pre-addendum version fixed the problem more effectively than afterwards.

          The patch has been committed already on 0.94.1 (it is in the 0.94.1rc0) an in the other branches.

          Please file a new issue if to address the addendum.

          Show
          Jonathan Hsieh added a comment - I'm resolving this. I believe Greg tested this and the addendum version and found that the pre-addendum version fixed the problem more effectively than afterwards. The patch has been committed already on 0.94.1 (it is in the 0.94.1rc0) an in the other branches. Please file a new issue if to address the addendum.

            People

            • Assignee:
              Jieshan Bean
              Reporter:
              Gopinathan A
            • Votes:
              0 Vote for this issue
              Watchers:
              9 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development