Details
Description
The active master node's network was down for some time (this node hosts the Master, DataNode, ZooKeeper, and RegionServer). The backup node got the notification and started to become active. Immediately, the backup node aborted with the exception below.
2012-04-09 10:42:24,270 INFO org.apache.hadoop.hbase.master.SplitLogManager: finished splitting (more than or equal to) 861248320 bytes in 4 log files in [hdfs://192.168.47.205:9000/hbase/.logs/HOST-192-168-47-202,60020,1333715537172-splitting] in 26374ms
2012-04-09 10:42:24,316 FATAL org.apache.hadoop.hbase.master.HMaster: Master server abort: loaded coprocessors are: []
2012-04-09 10:42:24,333 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown.
java.io.IOException: java.net.ConnectException: Connection refused
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:375)
    at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1045)
    at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:897)
    at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150)
    at $Proxy13.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:183)
    at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:303)
    at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:280)
    at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:332)
    at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:236)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1276)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1233)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1220)
    at org.apache.hadoop.hbase.catalog.CatalogTracker.getCachedConnection(CatalogTracker.java:569)
    at org.apache.hadoop.hbase.catalog.CatalogTracker.getRootServerConnection(CatalogTracker.java:369)
    at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForRootServerConnection(CatalogTracker.java:353)
    at org.apache.hadoop.hbase.catalog.CatalogTracker.verifyRootRegionLocation(CatalogTracker.java:660)
    at org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:616)
    at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:540)
    at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:363)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:488)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupConnection(HBaseClient.java:328)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:362)
    ... 20 more
2012-04-09 10:42:24,336 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
2012-04-09 10:42:24,336 DEBUG org.apache.hadoop.hbase.master.HMaster: Stopping service threads
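The master aborts because the ConnectException (wrapped in an IOException) escapes the retry path in waitForProxy; the intended behavior during failover is to keep retrying connection-refused errors while the region server comes back. The following is a standalone sketch of that retry idea, not the HBase implementation; the names (RetryOnRefused, callWithRetry) are hypothetical:

```java
import java.io.IOException;
import java.net.ConnectException;
import java.util.concurrent.Callable;

public class RetryOnRefused {
    // Retries the call while it fails with ConnectException, up to
    // maxAttempts; any other exception propagates immediately, which
    // is what aborts the master in this issue.
    static <T> T callWithRetry(Callable<T> call, int maxAttempts)
            throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return call.call();
            } catch (ConnectException ce) {
                if (attempt >= maxAttempts) {
                    throw ce;
                }
                Thread.sleep(100L * attempt); // simple linear backoff
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Simulate a server that refuses the first two connection attempts.
        final int[] failures = {2};
        String result = callWithRetry(() -> {
            if (failures[0]-- > 0) {
                throw new ConnectException("Connection refused");
            }
            return "connected";
        }, 5);
        System.out.println(result); // connected
    }
}
```

The catch clause only matches ConnectException by type, which is exactly why the IOException wrapping discussed in the comments below defeats the retry.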
Attachments
- 90-addendum.patch (1 kB, Jieshan Bean)
- 92-addendum.patch (1 kB, Jieshan Bean)
- 94-addendum.patch (1 kB, Jieshan Bean)
- HBASE-5883-90.patch (4 kB, Jieshan Bean)
- HBASE-5883-92.patch (4 kB, Jieshan Bean)
- HBASE-5883-94.patch (4 kB, Jieshan Bean)
- HBASE-5883-trunk.patch (4 kB, Jieshan Bean)
- trunk-addendum.patch (1 kB, Jieshan Bean)
Issue Links
- is related to:
  - HBASE-4288 "Server not running" exception during meta verification causes RS abort (Closed)
  - HBASE-4470 ServerNotRunningException coming out of assignRootAndMeta kills the Master (Closed)
  - HBASE-4762 ROOT and META region never be assigned if IOE throws in verifyRootRegionLocation (Closed)
Activity
@Jieshan since this was committed a long time ago (5/3/12) I'd suggest creating a new issue to clean it up. I'll close this after that is done.
@Jieshan So, what do we need to do to close this issue out? What do we need to apply? Thanks.
Adding 0.92.2 and 0.90.7 to the fix versions, as this was originally checked in under those versions. I'm also unclear what needs to be done to get this resolved, but whatever it is should be done for 0.90.7 and 0.92.2 as well.
Yes, nothing was actually fixed. It seems there was a misunderstanding of that exception, so the patch amounts to a code restructuring. I will add an addendum patch to remove the unnecessary code.
From the comments here I find it hard to determine whether this is fixed or not. Is it?
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12525851/94-addendum.patch
against trunk revision .
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.
+1 hadoop23. The patch compiles against the hadoop 0.23.x profile.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests:
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1786//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1786//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1786//console
This message is automatically generated.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12525833/trunk-addendum.patch
against trunk revision .
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.
+1 hadoop23. The patch compiles against the hadoop 0.23.x profile.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests:
org.apache.hadoop.hbase.TestDrainingServer
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1785//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1785//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1785//console
This message is automatically generated.
Sorry, it should be:
} catch (IOException ioex) {
  // We only handle the ConnectException.
  ConnectException ce = null;
  if (ioex instanceof ConnectException) {
    ce = (ConnectException) ioex;
    ioe = ce;
  } else {
    // This is the exception we can't handle.
    throw ioex;
  }
  handleConnectionException(++reconnectAttempts, maxAttempts, protocol, addr, ce);
}
Please share your comment, Thank you.
After careful consideration, I think we should just handle the ConnectException; otherwise, just throw it (this is similar to the logic before this rewriting).
} catch (IOException ioex) {
  // We only handle the ConnectException.
  ConnectException ce = null;
  if (ioex instanceof ConnectException) {
    ce = (ConnectException) ioex;
    ioe = ce;
  } else {
    // This is the exception we can't handle.
    throw ioex;
  }
  if (ce != null) {
    handleConnectionException(++reconnectAttempts, maxAttempts, protocol, addr, ce);
  }
}
handleConnectionException() is called under the condition of:
+ if (ce != null) {
+   handleConnectionException(++reconnectAttempts, maxAttempts, protocol,
+       addr, ce);
If ioe is passed, would the above condition change ?
Thank you, stack. It makes sense to me.
Anyway, a ConnectException should not be wrapped into an IOException, and we should pass ioe into handleConnectionException.
I just thought we should handle all kinds of ConnectExceptions (either a direct ConnectException or a wrapped one).
I will update the addendum patches today.
Thank you for your review, Ted & Stack.
Integrated in HBase-0.92-security #106 (See https://builds.apache.org/job/HBase-0.92-security/106/)
HBASE-5883 Backup master is going down due to connection refused exception (Jieshan) (Revision 1333537)
Result = SUCCESS
tedyu :
Files :
- /hbase/branches/0.92/CHANGES.txt
- /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
- /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/ipc/HBaseRPC.java
Integrated in HBase-0.94-security #26 (See https://builds.apache.org/job/HBase-0.94-security/26/)
HBASE-5883 Backup master is going down due to connection refused exception (Jieshan) (Revision 1333533)
Result = SUCCESS
tedyu :
Files :
- /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
- /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/HBaseRPC.java
Should we remove this too?
+ } else if (ioex.getCause() != null
+     && ioex.getCause() instanceof ConnectException) {
+   ce = (ConnectException) ioex.getCause();
+   ioe = ce;
If the above happens, we'll get a stack trace that is missing the last few frames, i.e. the difference between here, where it's handled, and wherever the ConnectException was originally thrown. Could it confuse debugging later?
Also, should we pass the ioe into handleConnectionException? I'd think we'd do this for the case that ce is null (could that happen)?
Good stuff.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12525595/HBASE-5883-94-addendum.patch
against trunk revision .
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.
+1 hadoop23. The patch compiles against the hadoop 0.23.x profile.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests:
org.apache.hadoop.hbase.client.TestFromClientSide
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1763//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1763//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1763//console
This message is automatically generated.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12525583/HBASE-5883-trunk-addendum.patch
against trunk revision .
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.
+1 hadoop23. The patch compiles against the hadoop 0.23.x profile.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in .
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1762//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1762//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1762//console
This message is automatically generated.
Integrated in HBase-TRUNK-security #191 (See https://builds.apache.org/job/HBase-TRUNK-security/191/)
HBASE-5883 Backup master is going down due to connection refused exception (Jieshan) (Revision 1333530)
Result = SUCCESS
tedyu :
Files :
- /hbase/trunk/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
- /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseRPC.java
I agree with adding an addendum patch. I think we should unify the behavior for handling these exceptions (I remember we have found many issues regarding this kind of exception conversion; everyone fixes it with his own version).
In the method HBaseClient#setupConnection, we catch 2 types of exceptions: SocketTimeoutException and IOException.
Before this patch, we also caught 2 types of exceptions, but they were SocketTimeoutException and ConnectException.
I think the previous catching was better.
I will prepare the addendum patch today if no objection.
Thank you all.
The scenario described in this JIRA may not be limited to clusters that have the HBASE-5673 patch.
I think an addendum removing the .contains("connection refused") check is needed.
Sorry for the confusion.
When I was testing this scenario, the HBASE-5673 patch was present in my cluster. That is what caused the above issue.
Your argument is that it is ok to compound a flaw because we have the flaw elsewhere?
HBASE-5877 is something else altogether; it passes a message in an exception message because the dumb RPC gives no other option. It is for sure not a recast of thrown exceptions.
As I mentioned here: https://issues.apache.org/jira/browse/HBASE-5883?focusedCommentId=13266335&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13266335
there're places where the original exception is hidden. e.g. In WritableRpcEngine.call():
IOException ioe = new IOException(e.toString());
ioe.setStackTrace(e.getStackTrace());
throw ioe;
The above makes identifying the source of the connection refused exception difficult.
A similar technique is used in HBASE-5877 as well:
2) hadoop.ipc serialization of exception is based on the #getMessage. So we have to parse it internally.
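The type erasure described above can be demonstrated directly. This is a standalone sketch (not HBase code) showing why an instanceof check fails once the exception has been re-created from its string form, leaving message parsing as the only remaining signal:

```java
import java.io.IOException;
import java.net.ConnectException;

public class WrappingDemo {
    // Mimics the wrapping style quoted from WritableRpcEngine.call():
    // the original type survives only inside the message string.
    static IOException wrapByString(Exception e) {
        IOException ioe = new IOException(e.toString());
        ioe.setStackTrace(e.getStackTrace());
        return ioe;
    }

    public static void main(String[] args) {
        IOException wrapped =
            wrapByString(new ConnectException("Connection refused"));
        // The concrete type is gone and no cause is chained...
        System.out.println(wrapped instanceof ConnectException); // false
        System.out.println(wrapped.getCause());                  // null
        // ...so only the message text remains to detect the refusal.
        System.out.println(wrapped.getMessage().toLowerCase()
            .contains("connection refused"));                    // true
    }
}
```

This is exactly the situation that forces the fragile .contains("connection refused") check debated in this thread.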
Can't we at least check the message to ensure it's what we expect? (See the second catch below where we look for "connection reset".) Can we be sure what comes up here is the ConnectException we set down in HBaseRPC?
+ if (ioe instanceof ConnectException) {
+   // Catch. Connect refused.
This redoing of an exception seems problematic. Its really necessary?
+ } else if (ioex.getMessage().toLowerCase()
+     .contains("connection refused")) {
+   ce = new ConnectException(ioex.getMessage());
+   ioe = ce;
I'd feel better about this fix if we could figure out where the exception came from. (Is it not from the RPC stringifying of exceptions to pass them from server to client?)
Integrated in HBase-0.92 #396 (See https://builds.apache.org/job/HBase-0.92/396/)
HBASE-5883 Backup master is going down due to connection refused exception (Jieshan) (Revision 1333537)
Result = FAILURE
tedyu :
Files :
- /hbase/branches/0.92/CHANGES.txt
- /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
- /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/ipc/HBaseRPC.java
Integrated in HBase-0.94 #175 (See https://builds.apache.org/job/HBase-0.94/175/)
HBASE-5883 Backup master is going down due to connection refused exception (Jieshan) (Revision 1333533)
Result = FAILURE
tedyu :
Files :
- /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
- /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/HBaseRPC.java
Integrated in HBase-TRUNK #2843 (See https://builds.apache.org/job/HBase-TRUNK/2843/)
HBASE-5883 Backup master is going down due to connection refused exception (Jieshan) (Revision 1333530)
Result = SUCCESS
tedyu :
Files :
- /hbase/trunk/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
- /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseRPC.java
Integrated to 0.92 and 0.90 as well.
Thanks for the patch Jieshan.
Thanks for the review, Lars.
Will integrate the patch to trunk and 0.94 branch tomorrow @ 10 AM PST if there is no objection.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12525272/HBASE-5883-trunk.patch
against trunk revision .
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.
+1 hadoop23. The patch compiles against the hadoop 0.23.x profile.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in .
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1721//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1721//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1721//console
This message is automatically generated.
A brief search in 0.94 branch revealed the following:
throw new IOException(ite.toString(), ite);
throw new IOException(t.toString(), t);
  (src/main/java/org/apache/hadoop/hbase/client/coprocessor/Batch.java)
IOException ioe = new IOException(target.toString());
IOException ioe = new IOException(e.toString());
  (src/main/java/org/apache/hadoop/hbase/ipc/WritableRpcEngine.java)
IOException ioe = new IOException(target.toString());
IOException ioe = new IOException(e.toString());
  (src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java)
@Jieshan:
Testing in a real cluster is needed. Take your time.
I didn't find code like that in HBase, and I can't figure out where this exception comes from.
That's just one possibility based on my analysis.
Just in case the exception looks like: new IOException(connectionException.toString())
If we have code like that, I would consider it a bug.
I have asked our tester to test this patch in a real cluster. It can be finished today, and then I will attach the patch for trunk.
@Jieshan:
Do you want to attach patch for trunk so that Hadoop QA can run tests ?
Just in case the exception looks like:
new IOException(connectionException.toString())
Why do we need the following code ?
+ } else if (ioex.getMessage().toLowerCase()
+     .contains("connection refused")) {
+   ce = new ConnectException(ioex.getMessage());
Patch for 0.94. All tests passed. We are still testing it in a real cluster.
Your comments before I post the results are welcome.
Thank you.
From the below log:
2012-04-09 10:42:24,333 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown. java.io.IOException: java.net.ConnectException: Connection refused
We can deduce that the ConnectException was wrapped in an IOException, like below:
new IOException(new ConnectException("Connection refused"));
or something like:
new IOException(connectException.toString());
If so, this exception is not handled by the code.
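The two wrapping forms deduced above behave very differently when the caller tries to recover the original type. A small standalone illustration (not HBase code; the unwrap helper is hypothetical):

```java
import java.io.IOException;
import java.net.ConnectException;

public class UnwrapDemo {
    // Returns the ConnectException hidden inside ioe, or null if it
    // cannot be recovered as a typed cause.
    static ConnectException unwrap(IOException ioe) {
        if (ioe instanceof ConnectException) {
            return (ConnectException) ioe;
        }
        if (ioe.getCause() instanceof ConnectException) {
            return (ConnectException) ioe.getCause();
        }
        return null; // string-wrapped: the type information is lost
    }

    public static void main(String[] args) {
        ConnectException ce = new ConnectException("Connection refused");
        // Form 1: cause chain preserved -- unwrap succeeds.
        System.out.println(unwrap(new IOException(ce)) == ce);      // true
        // Form 2: only the string survives -- unwrap fails.
        System.out.println(unwrap(new IOException(ce.toString()))); // null
    }
}
```

Only the first form can be handled cleanly with a getCause() check; the second forces the message-parsing workaround discussed earlier in this thread.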
I'm resolving this. I believe Greg tested both the original and the addendum version and found that the pre-addendum version fixed the problem more effectively.
The patch has already been committed on 0.94.1 (it is in 0.94.1rc0) and in the other branches.
Please file a new issue if the addendum needs to be addressed.