Uploaded image for project: 'HBase'
  1. HBase
  2. HBASE-5883

Backup master is going down due to connection refused exception

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 0.90.6, 0.92.1, 0.94.0
    • 0.94.1, 0.95.0
    • master
    • None
    • Reviewed

    Description

      The active master node network was down for some time (This node contains Master,DN,ZK,RS). Here backup node got
      notification, and started to became active. Immedietly backup node got aborted with the below exception.

      2012-04-09 10:42:24,270 INFO org.apache.hadoop.hbase.master.SplitLogManager: finished splitting (more than or equal to) 861248320 bytes in 4 log files in [hdfs://192.168.47.205:9000/hbase/.logs/HOST-192-168-47-202,60020,1333715537172-splitting] in 26374ms
      2012-04-09 10:42:24,316 FATAL org.apache.hadoop.hbase.master.HMaster: Master server abort: loaded coprocessors are: []
      2012-04-09 10:42:24,333 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown.
      java.io.IOException: java.net.ConnectException: Connection refused
      	at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:375)
      	at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1045)
      	at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:897)
      	at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150)
      	at $Proxy13.getProtocolVersion(Unknown Source)
      	at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:183)
      	at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:303)
      	at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:280)
      	at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:332)
      	at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:236)
      	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1276)
      	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1233)
      	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1220)
      	at org.apache.hadoop.hbase.catalog.CatalogTracker.getCachedConnection(CatalogTracker.java:569)
      	at org.apache.hadoop.hbase.catalog.CatalogTracker.getRootServerConnection(CatalogTracker.java:369)
      	at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForRootServerConnection(CatalogTracker.java:353)
      	at org.apache.hadoop.hbase.catalog.CatalogTracker.verifyRootRegionLocation(CatalogTracker.java:660)
      	at org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:616)
      	at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:540)
      	at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:363)
      	at java.lang.Thread.run(Thread.java:662)
      Caused by: java.net.ConnectException: Connection refused
      	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
      	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
      	at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
      	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:488)
      	at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupConnection(HBaseClient.java:328)
      	at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:362)
      	... 20 more
      2012-04-09 10:42:24,336 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
      2012-04-09 10:42:24,336 DEBUG org.apache.hadoop.hbase.master.HMaster: Stopping service threads
      

      Attachments

        1. 90-addendum.patch
          1 kB
          Jieshan Bean
        2. 92-addendum.patch
          1 kB
          Jieshan Bean
        3. 94-addendum.patch
          1 kB
          Jieshan Bean
        4. HBASE-5883-90.patch
          4 kB
          Jieshan Bean
        5. HBASE-5883-92.patch
          4 kB
          Jieshan Bean
        6. HBASE-5883-94.patch
          4 kB
          Jieshan Bean
        7. HBASE-5883-trunk.patch
          4 kB
          Jieshan Bean
        8. trunk-addendum.patch
          1 kB
          Jieshan Bean

        Issue Links

          Activity

            jmhsieh Jonathan Hsieh added a comment -

            I'm resolving this. I believe Greg tested this and the addendum version and found that the pre-addendum version fixed the problem more effectively than afterwards.

            The patch has been committed already on 0.94.1 (it is in the 0.94.1rc0) an in the other branches.

            Please file a new issue if to address the addendum.

            jmhsieh Jonathan Hsieh added a comment - I'm resolving this. I believe Greg tested this and the addendum version and found that the pre-addendum version fixed the problem more effectively than afterwards. The patch has been committed already on 0.94.1 (it is in the 0.94.1rc0) an in the other branches. Please file a new issue if to address the addendum.
            jmhsieh Jonathan Hsieh added a comment -

            @Jieshan since this was committed along time ago (5/3/12) I'd suggest creating a new issue to clean it up. I'll close this after it is done.

            jmhsieh Jonathan Hsieh added a comment - @Jieshan since this was committed along time ago (5/3/12) I'd suggest creating a new issue to clean it up. I'll close this after it is done.
            stack Michael Stack added a comment -

            @Jieshan So, what do we need to do to close this issue out? What do we need to apply? Thanks.

            stack Michael Stack added a comment - @Jieshan So, what do we need to do to close this issue out? What do we need to apply? Thanks.
            gchanan Gregory Chanan added a comment -

            Adding 0.92.2 and 0.90.7 to fix version, as this was originally checked in under those versions. I'm also unclear what needs to be done to get this to resolved, but it should be done to 0.90.7 and 0.92.2 as well.

            gchanan Gregory Chanan added a comment - Adding 0.92.2 and 0.90.7 to fix version, as this was originally checked in under those versions. I'm also unclear what needs to be done to get this to resolved, but it should be done to 0.90.7 and 0.92.2 as well.
            larsh Lars Hofhansl added a comment -

            Moving to 0.94.2 for now.

            larsh Lars Hofhansl added a comment - Moving to 0.94.2 for now.
            jeason Jieshan Bean added a comment -

            Yes. Nothing was fixed. It seems a misunderstanding to that exception. So the patch just like a code restructure.. So add an addendum patch to remove the unnecessary code..

            jeason Jieshan Bean added a comment - Yes. Nothing was fixed. It seems a misunderstanding to that exception. So the patch just like a code restructure .. So add an addendum patch to remove the unnecessary code ..
            larsh Lars Hofhansl added a comment -

            From the comment here I find it hard to determine whether this is fixed or not? Is it?

            larsh Lars Hofhansl added a comment - From the comment here I find it hard to determine whether this is fixed or not? Is it?
            hadoopqa Hadoop QA added a comment -

            -1 overall. Here are the results of testing the latest attachment
            http://issues.apache.org/jira/secure/attachment/12525851/94-addendum.patch
            against trunk revision .

            +1 @author. The patch does not contain any @author tags.

            -1 tests included. The patch doesn't appear to include any new or modified tests.
            Please justify why no new tests are needed for this patch.
            Also please list what manual steps were performed to verify this patch.

            +1 hadoop23. The patch compiles against the hadoop 0.23.x profile.

            +1 javadoc. The javadoc tool did not generate any warning messages.

            +1 javac. The applied patch does not increase the total number of javac compiler warnings.

            +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

            +1 release audit. The applied patch does not increase the total number of release audit warnings.

            -1 core tests. The patch failed these unit tests:

            Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1786//testReport/
            Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1786//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
            Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1786//console

            This message is automatically generated.

            hadoopqa Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12525851/94-addendum.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 hadoop23. The patch compiles against the hadoop 0.23.x profile. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1786//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1786//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1786//console This message is automatically generated.
            hadoopqa Hadoop QA added a comment -

            -1 overall. Here are the results of testing the latest attachment
            http://issues.apache.org/jira/secure/attachment/12525833/trunk-addendum.patch
            against trunk revision .

            +1 @author. The patch does not contain any @author tags.

            -1 tests included. The patch doesn't appear to include any new or modified tests.
            Please justify why no new tests are needed for this patch.
            Also please list what manual steps were performed to verify this patch.

            +1 hadoop23. The patch compiles against the hadoop 0.23.x profile.

            +1 javadoc. The javadoc tool did not generate any warning messages.

            +1 javac. The applied patch does not increase the total number of javac compiler warnings.

            +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

            +1 release audit. The applied patch does not increase the total number of release audit warnings.

            -1 core tests. The patch failed these unit tests:
            org.apache.hadoop.hbase.TestDrainingServer

            Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1785//testReport/
            Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1785//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
            Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1785//console

            This message is automatically generated.

            hadoopqa Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12525833/trunk-addendum.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 hadoop23. The patch compiles against the hadoop 0.23.x profile. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.TestDrainingServer Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1785//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1785//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1785//console This message is automatically generated.
            jeason Jieshan Bean added a comment -

            Sorry, it should be:

              } catch (IOException ioex) {
                    // We only handle the ConnectException.
                    ConnectException ce = null;
                    if (ioex instanceof ConnectException) {
                      ce = (ConnectException) ioex;
                      ioe = ce;
                    } else {
                      // This is the exception we can't handle.
                      throw ioex;
                    }
                    handleConnectionException(++reconnectAttempts, maxAttempts, protocol,
                        addr, ce);
                  }
            

            Please share your comment, Thank you.

            jeason Jieshan Bean added a comment - Sorry, it should be: } catch (IOException ioex) { // We only handle the ConnectException. ConnectException ce = null; if (ioex instanceof ConnectException) { ce = (ConnectException) ioex; ioe = ce; } else { // This is the exception we can't handle. throw ioex; } handleConnectionException(++reconnectAttempts, maxAttempts, protocol, addr, ce); } Please share your comment, Thank you.
            jeason Jieshan Bean added a comment -

            After careful consideration, I think we should just handle the ConnectionException. Otherwise, just throw it(This is the similar logic before this re-writing).

               } catch (IOException ioex) {
                    // We only handle the ConnectException.
                    ConnectException ce = null;
                    if (ioex instanceof ConnectException) {
                      ce = (ConnectException) ioex;
                      ioe = ce;
                    } else {
                      // This is the exception we can't handle.
                      throw ioex;
                    }
                    if (ce != null) {
                      handleConnectionException(++reconnectAttempts, maxAttempts, protocol,
                          addr, ce);
                    }
                  }
            
            jeason Jieshan Bean added a comment - After careful consideration, I think we should just handle the ConnectionException. Otherwise, just throw it(This is the similar logic before this re-writing). } catch (IOException ioex) { // We only handle the ConnectException. ConnectException ce = null; if (ioex instanceof ConnectException) { ce = (ConnectException) ioex; ioe = ce; } else { // This is the exception we can't handle. throw ioex; } if (ce != null) { handleConnectionException(++reconnectAttempts, maxAttempts, protocol, addr, ce); } }
            jeason Jieshan Bean added a comment -

            Yes. It should be changed.

            jeason Jieshan Bean added a comment - Yes. It should be changed.
            zhihyu@ebaysf.com Ted Yu added a comment -

            handleConnectionException() is called under the condition of:

            +        if (ce != null) {
            +          handleConnectionException(++reconnectAttempts, maxAttempts, protocol,
            +              addr, ce);
            

            If ioe is passed, would the above condition change ?

            zhihyu@ebaysf.com Ted Yu added a comment - handleConnectionException() is called under the condition of: + if (ce != null ) { + handleConnectionException(++reconnectAttempts, maxAttempts, protocol, + addr, ce); If ioe is passed, would the above condition change ?
            jeason Jieshan Bean added a comment -

            @Ted,
            What's your opinion on this? Thank you

            jeason Jieshan Bean added a comment - @Ted, What's your opinion on this? Thank you
            jeason Jieshan Bean added a comment -

            Thank you, stack. It makes sense to me.
            Anyway, ConnectionException should not be wrapped into IOE. And we should pass ioe into handleConnectionException.
            I just thought we should handle all kinds of ConenctionExceptions(Either directly ConnectionException or wrapped ConnectionException).
            I will update the addendum patches today.

            Thank you for your review, Ted & Stack.

            jeason Jieshan Bean added a comment - Thank you, stack. It makes sense to me. Anyway, ConnectionException should not be wrapped into IOE. And we should pass ioe into handleConnectionException. I just thought we should handle all kinds of ConenctionExceptions(Either directly ConnectionException or wrapped ConnectionException). I will update the addendum patches today. Thank you for your review, Ted & Stack.
            hudson Hudson added a comment -

            Integrated in HBase-0.92-security #106 (See https://builds.apache.org/job/HBase-0.92-security/106/)
            HBASE-5883 Backup master is going down due to connection refused exception (Jieshan) (Revision 1333537)

            Result = SUCCESS
            tedyu :
            Files :

            • /hbase/branches/0.92/CHANGES.txt
            • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
            • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/ipc/HBaseRPC.java
            hudson Hudson added a comment - Integrated in HBase-0.92-security #106 (See https://builds.apache.org/job/HBase-0.92-security/106/ ) HBASE-5883 Backup master is going down due to connection refused exception (Jieshan) (Revision 1333537) Result = SUCCESS tedyu : Files : /hbase/branches/0.92/CHANGES.txt /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/ipc/HBaseRPC.java
            hudson Hudson added a comment -

            Integrated in HBase-0.94-security #26 (See https://builds.apache.org/job/HBase-0.94-security/26/)
            HBASE-5883 Backup master is going down due to connection refused exception (Jieshan) (Revision 1333533)

            Result = SUCCESS
            tedyu :
            Files :

            • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
            • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/HBaseRPC.java
            hudson Hudson added a comment - Integrated in HBase-0.94-security #26 (See https://builds.apache.org/job/HBase-0.94-security/26/ ) HBASE-5883 Backup master is going down due to connection refused exception (Jieshan) (Revision 1333533) Result = SUCCESS tedyu : Files : /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/HBaseRPC.java
            stack Michael Stack added a comment -

            Should we remove this too?

            +        } else if (ioex.getCause() != null
            +            && ioex.getCause() instanceof ConnectException) {
            +          ce = (ConnectException) ioex.getCause();
            +          ioe = ce;
            

            If the above happens, we'll get a stack trace that will be missing the last few stacks; i.e. the difference between here where its handled and wherever ConnectionException was originally thrown. It could be confuse debugging later?

            Also, should we pass the ioe into handleConnectionException? I'd think we'd do this for the case that ce is null (could that happen)?

            Good stuff.

            stack Michael Stack added a comment - Should we remove this too? + } else if (ioex.getCause() != null + && ioex.getCause() instanceof ConnectException) { + ce = (ConnectException) ioex.getCause(); + ioe = ce; If the above happens, we'll get a stack trace that will be missing the last few stacks; i.e. the difference between here where its handled and wherever ConnectionException was originally thrown. It could be confuse debugging later? Also, should we pass the ioe into handleConnectionException? I'd think we'd do this for the case that ce is null (could that happen)? Good stuff.
            zhihyu@ebaysf.com Ted Yu added a comment -

            +1 on addendum

            zhihyu@ebaysf.com Ted Yu added a comment - +1 on addendum
            hadoopqa Hadoop QA added a comment -

            -1 overall. Here are the results of testing the latest attachment
            http://issues.apache.org/jira/secure/attachment/12525595/HBASE-5883-94-addendum.patch
            against trunk revision .

            +1 @author. The patch does not contain any @author tags.

            -1 tests included. The patch doesn't appear to include any new or modified tests.
            Please justify why no new tests are needed for this patch.
            Also please list what manual steps were performed to verify this patch.

            +1 hadoop23. The patch compiles against the hadoop 0.23.x profile.

            +1 javadoc. The javadoc tool did not generate any warning messages.

            +1 javac. The applied patch does not increase the total number of javac compiler warnings.

            +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

            +1 release audit. The applied patch does not increase the total number of release audit warnings.

            -1 core tests. The patch failed these unit tests:
            org.apache.hadoop.hbase.client.TestFromClientSide

            Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1763//testReport/
            Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1763//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
            Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1763//console

            This message is automatically generated.

            hadoopqa Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12525595/HBASE-5883-94-addendum.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 hadoop23. The patch compiles against the hadoop 0.23.x profile. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.client.TestFromClientSide Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1763//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1763//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1763//console This message is automatically generated.
            jeason Jieshan Bean added a comment -

            Addendum patches for all branches.

            jeason Jieshan Bean added a comment - Addendum patches for all branches.
            hadoopqa Hadoop QA added a comment -

            -1 overall. Here are the results of testing the latest attachment
            http://issues.apache.org/jira/secure/attachment/12525583/HBASE-5883-trunk-addendum.patch
            against trunk revision .

            +1 @author. The patch does not contain any @author tags.

            -1 tests included. The patch doesn't appear to include any new or modified tests.
            Please justify why no new tests are needed for this patch.
            Also please list what manual steps were performed to verify this patch.

            +1 hadoop23. The patch compiles against the hadoop 0.23.x profile.

            +1 javadoc. The javadoc tool did not generate any warning messages.

            +1 javac. The applied patch does not increase the total number of javac compiler warnings.

            +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

            +1 release audit. The applied patch does not increase the total number of release audit warnings.

            +1 core tests. The patch passed unit tests in .

            Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1762//testReport/
            Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1762//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
            Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1762//console

            This message is automatically generated.

            hadoopqa Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12525583/HBASE-5883-trunk-addendum.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 hadoop23. The patch compiles against the hadoop 0.23.x profile. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1762//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1762//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1762//console This message is automatically generated.
            hudson Hudson added a comment -

            Integrated in HBase-TRUNK-security #191 (See https://builds.apache.org/job/HBase-TRUNK-security/191/)
            HBASE-5883 Backup master is going down due to connection refused exception (Jieshan) (Revision 1333530)

            Result = SUCCESS
            tedyu :
            Files :

            • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
            • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseRPC.java
            hudson Hudson added a comment - Integrated in HBase-TRUNK-security #191 (See https://builds.apache.org/job/HBase-TRUNK-security/191/ ) HBASE-5883 Backup master is going down due to connection refused exception (Jieshan) (Revision 1333530) Result = SUCCESS tedyu : Files : /hbase/trunk/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseRPC.java
            jeason Jieshan Bean added a comment -

            I agree with adding an addendum patch. I think we should unify the behavior to handle the exceptions(I remember we found so many issues regarding on the exception conversion. Everyone fixes it with his own version).

            In the method of HBaseClient#setupConnection, we catches 2 types of exceptions: SocketTimeoutException & IOException.
            Before this patch, we also catches 2 types of exceptions but they were SocketTimeoutException & ConnectionException.
            I think the previous catching is better.

            I will prepare the addendum patch today if no objection.

            Thank you all.

            jeason Jieshan Bean added a comment - I agree with adding an addendum patch. I think we should unify the behavior to handle the exceptions(I remember we found so many issues regarding on the exception conversion. Everyone fixes it with his own version). In the method of HBaseClient#setupConnection, we catches 2 types of exceptions: SocketTimeoutException & IOException. Before this patch, we also catches 2 types of exceptions but they were SocketTimeoutException & ConnectionException. I think the previous catching is better. I will prepare the addendum patch today if no objection. Thank you all.
            zhihyu@ebaysf.com Ted Yu added a comment -

            The scenario described in this JIRA may not be limited to the existence of HBASE-5673 patch.

            I think an addendum removing the “.contains("connection refused"))” is needed.

            zhihyu@ebaysf.com Ted Yu added a comment - The scenario described in this JIRA may not be limited to the existence of HBASE-5673 patch. I think an addendum removing the “.contains("connection refused"))” is needed.
            gopinathan.av Gopinathan A added a comment -

            Sorry for the confusion..

            When i was testing this scenario, HBASE-5673 patch was exist in my cluster. It caused me the above issue.

            gopinathan.av Gopinathan A added a comment - Sorry for the confusion.. When i was testing this scenario, HBASE-5673 patch was exist in my cluster. It caused me the above issue.
            stack Michael Stack added a comment -

            Your argument is that it is ok to compound a flaw because we have the flaw elsewhere?

            hbase-5877 is something else altogether, it is message passing in an exception messsage because dumb rpc gives no other option – it is for sure not recast of thrown exceptions

            stack Michael Stack added a comment - Your argument is that it is ok to compound a flaw because we have the flaw elsewhere? hbase-5877 is something else altogether, it is message passing in an exception messsage because dumb rpc gives no other option – it is for sure not recast of thrown exceptions
            zhihyu@ebaysf.com Ted Yu added a comment -

            As I mentioned here: https://issues.apache.org/jira/browse/HBASE-5883?focusedCommentId=13266335&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13266335
            there're places where the original exception is hidden. e.g. In WritableRpcEngine.call():

                    IOException ioe = new IOException(e.toString());
                    ioe.setStackTrace(e.getStackTrace());
                    throw ioe;
            

            The above makes identifying the source of connection refused exception difficult.

            Similar technique is used in HBASE-5877 as well:

            2) hadoop.ipc serialization of exception is based on the #getMessage. So we have to parse it internally.

            zhihyu@ebaysf.com Ted Yu added a comment - As I mentioned here: https://issues.apache.org/jira/browse/HBASE-5883?focusedCommentId=13266335&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13266335 there're places where the original exception is hidden. e.g. In WritableRpcEngine.call(): IOException ioe = new IOException(e.toString()); ioe.setStackTrace(e.getStackTrace()); throw ioe; The above makes identifying the source of connection refused exception difficult. Similar technique is used in HBASE-5877 as well: 2) hadoop.ipc serialization of exception is based on the #getMessage. So we have to parse it internally.
            stack Michael Stack added a comment -

            Can't we at least check the message to ensure its what we expect? (See the second catch below where we look for "connection reset"). Can we be sure what comes up here is the ConnectException we set down in HBaseRPC?

            +      if (ioe instanceof ConnectException) {
            +        // Catch. Connect refused.
            

            This redoing of an exception seems problematic. Its really necessary?

            +        } else if (ioex.getMessage().toLowerCase()
            +            .contains("connection refused")) {
            +          ce = new ConnectException(ioex.getMessage());
            +          ioe = ce;
            

            I'd feel better about this fix if we could figure where the exception came from (Its not from the rpc stringifying of exceptions to pass them from server to client?

            stack Michael Stack added a comment - Can't we at least check the message to ensure its what we expect? (See the second catch below where we look for "connection reset"). Can we be sure what comes up here is the ConnectException we set down in HBaseRPC? + if (ioe instanceof ConnectException) { + // Catch. Connect refused. This redoing of an exception seems problematic. Its really necessary? + } else if (ioex.getMessage().toLowerCase() + .contains( "connection refused" )) { + ce = new ConnectException(ioex.getMessage()); + ioe = ce; I'd feel better about this fix if we could figure where the exception came from (Its not from the rpc stringifying of exceptions to pass them from server to client?
            hudson Hudson added a comment -

            Integrated in HBase-0.92 #396 (See https://builds.apache.org/job/HBase-0.92/396/)
            HBASE-5883 Backup master is going down due to connection refused exception (Jieshan) (Revision 1333537)

            Result = FAILURE
            tedyu :
            Files :

            • /hbase/branches/0.92/CHANGES.txt
            • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
            • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/ipc/HBaseRPC.java
            hudson Hudson added a comment - Integrated in HBase-0.92 #396 (See https://builds.apache.org/job/HBase-0.92/396/ ) HBASE-5883 Backup master is going down due to connection refused exception (Jieshan) (Revision 1333537) Result = FAILURE tedyu : Files : /hbase/branches/0.92/CHANGES.txt /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/ipc/HBaseRPC.java
            hudson Hudson added a comment -

            Integrated in HBase-0.94 #175 (See https://builds.apache.org/job/HBase-0.94/175/)
            HBASE-5883 Backup master is going down due to connection refused exception (Jieshan) (Revision 1333533)

            Result = FAILURE
            tedyu :
            Files :

            • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
            • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/HBaseRPC.java
            hudson Hudson added a comment - Integrated in HBase-0.94 #175 (See https://builds.apache.org/job/HBase-0.94/175/ ) HBASE-5883 Backup master is going down due to connection refused exception (Jieshan) (Revision 1333533) Result = FAILURE tedyu : Files : /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/HBaseRPC.java
            hudson Hudson added a comment -

            Integrated in HBase-TRUNK #2843 (See https://builds.apache.org/job/HBase-TRUNK/2843/)
            HBASE-5883 Backup master is going down due to connection refused exception (Jieshan) (Revision 1333530)

            Result = SUCCESS
            tedyu :
            Files :

            • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
            • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseRPC.java
            hudson Hudson added a comment - Integrated in HBase-TRUNK #2843 (See https://builds.apache.org/job/HBase-TRUNK/2843/ ) HBASE-5883 Backup master is going down due to connection refused exception (Jieshan) (Revision 1333530) Result = SUCCESS tedyu : Files : /hbase/trunk/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseRPC.java
            zhihyu@ebaysf.com Ted Yu added a comment -

            Integrated to 0.92 and 0.90 as well.

            Thanks for the patch Jieshan.

            Thanks for the review, Lars.

            zhihyu@ebaysf.com Ted Yu added a comment - Integrated to 0.92 and 0.90 as well. Thanks for the patch Jieshan. Thanks for the review, Lars.
            zhihyu@ebaysf.com Ted Yu added a comment -

            Integrated to 0.94 and trunk.

            zhihyu@ebaysf.com Ted Yu added a comment - Integrated to 0.94 and trunk.
            jeason Jieshan Bean added a comment -

            Patches for all the branches. All test cases passed.

            jeason Jieshan Bean added a comment - Patches for all the branches. All test cases passed.
            zhihyu@ebaysf.com Ted Yu added a comment -

            Will integrate the patch to trunk and 0.94 branch tomorrow @ 10 AM PST if there is no objection.

            zhihyu@ebaysf.com Ted Yu added a comment - Will integrate the patch to trunk and 0.94 branch tomorrow @ 10 AM PST if there is no objection.
            hadoopqa Hadoop QA added a comment -

            -1 overall. Here are the results of testing the latest attachment
            http://issues.apache.org/jira/secure/attachment/12525272/HBASE-5883-trunk.patch
            against trunk revision .

            +1 @author. The patch does not contain any @author tags.

            -1 tests included. The patch doesn't appear to include any new or modified tests.
            Please justify why no new tests are needed for this patch.
            Also please list what manual steps were performed to verify this patch.

            +1 hadoop23. The patch compiles against the hadoop 0.23.x profile.

            +1 javadoc. The javadoc tool did not generate any warning messages.

            +1 javac. The applied patch does not increase the total number of javac compiler warnings.

            -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

            +1 release audit. The applied patch does not increase the total number of release audit warnings.

            +1 core tests. The patch passed unit tests in .

            Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1721//testReport/
            Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1721//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
            Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1721//console

            This message is automatically generated.

            hadoopqa Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12525272/HBASE-5883-trunk.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 hadoop23. The patch compiles against the hadoop 0.23.x profile. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1721//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1721//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1721//console This message is automatically generated.
            jeason Jieshan Bean added a comment -

            Patch for trunk. We have tested it in real cluster.

            jeason Jieshan Bean added a comment - Patch for trunk. We have tested it in real cluster.
            zhihyu@ebaysf.com Ted Yu added a comment -

            A brief search in 0.94 branch revealed the following:

                        throw new IOException(ite.toString(), ite);
                        throw new IOException(t.toString(), t);
            src/main/java/org/apache/hadoop/hbase/client/coprocessor/Batch.java
                    IOException ioe = new IOException(target.toString());
                    IOException ioe = new IOException(e.toString());
            src/main/java/org/apache/hadoop/hbase/ipc/WritableRpcEngine.java
                  IOException ioe = new IOException(target.toString());
                  IOException ioe = new IOException(e.toString());
            src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
            

            @Jieshan:
            Testing in real cluster is needed. Take your time.

            zhihyu@ebaysf.com Ted Yu added a comment - A brief search in 0.94 branch revealed the following: throw new IOException(ite.toString(), ite); throw new IOException(t.toString(), t); src/main/java/org/apache/hadoop/hbase/client/coprocessor/Batch.java IOException ioe = new IOException(target.toString()); IOException ioe = new IOException(e.toString()); src/main/java/org/apache/hadoop/hbase/ipc/WritableRpcEngine.java IOException ioe = new IOException(target.toString()); IOException ioe = new IOException(e.toString()); src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java @Jieshan: Testing in real cluster is needed. Take your time.
            jeason Jieshan Bean added a comment -

            I didn't find the code like that in HBase, and I can't figure out where this exception come from.
            That's just one possibility based on my analysis.

            jeason Jieshan Bean added a comment - I didn't find the code like that in HBase, and I can't figure out where this exception come from. That's just one possibility based on my analysis.
            larsh Lars Hofhansl added a comment -

            Just in case of the exception likes: new IOException(connectionException.toString())

            If we have code like that, I would consider that a bug.

            larsh Lars Hofhansl added a comment - Just in case of the exception likes: new IOException(connectionException.toString()) If we have code like that, I would consider that a bug.
            jeason Jieshan Bean added a comment -

            Is that ok?

            jeason Jieshan Bean added a comment - Is that ok?
            jeason Jieshan Bean added a comment -

            I have asked our tester to test this patch in real cluster. It can be finished today, then I will attach the patch for trunk

            jeason Jieshan Bean added a comment - I have asked our tester to test this patch in real cluster. It can be finished today, then I will attach the patch for trunk
            zhihyu@ebaysf.com Ted Yu added a comment -

            @Jieshan:
            Do you want to attach patch for trunk so that Hadoop QA can run tests ?

            zhihyu@ebaysf.com Ted Yu added a comment - @Jieshan: Do you want to attach patch for trunk so that Hadoop QA can run tests ?
            jeason Jieshan Bean added a comment -

            Just in case of the exception likes:
            new IOException(connectionException.toString())

            jeason Jieshan Bean added a comment - Just in case of the exception likes: new IOException(connectionException.toString())
            zhihyu@ebaysf.com Ted Yu added a comment -

            Why do we need the following code ?

            +        } else if (ioex.getMessage().toLowerCase()
            +            .contains("connection refused")) {
            +          ce = new ConnectException(ioex.getMessage());
            
            zhihyu@ebaysf.com Ted Yu added a comment - Why do we need the following code ? + } else if (ioex.getMessage().toLowerCase() + .contains( "connection refused" )) { + ce = new ConnectException(ioex.getMessage());
            jeason Jieshan Bean added a comment -

            Patch for 94. All tests passed. We are still testing it in real cluster.
            Your comments before I post the results is welcome.
            Thank you.

            jeason Jieshan Bean added a comment - Patch for 94. All tests passed. We are still testing it in real cluster. Your comments before I post the results is welcome. Thank you.
            jeason Jieshan Bean added a comment -

            From the below log:

            2012-04-09 10:42:24,333 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown.
            java.io.IOException: java.net.ConnectException: Connection refused
            

            We can deduce ConnectException was packaged as a IOException, likes below:

            new IOException(new ConnecException("Connection refused"));

            or something likes:
            new IOException(connectException.toString());

            If so, this exception is not handled from the code.

            jeason Jieshan Bean added a comment - From the below log: 2012-04-09 10:42:24,333 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown. java.io.IOException: java.net.ConnectException: Connection refused We can deduce ConnectException was packaged as a IOException, likes below: new IOException(new ConnecException("Connection refused")); or something likes: new IOException(connectException.toString()); If so, this exception is not handled from the code.

            People

              jeason Jieshan Bean
              gopinathan.av Gopinathan A
              Votes:
              0 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: