|
[
Permalink
| « Hide
]
Dyre Tjeldvoll added a comment - 20/Mar/08 04:10 PM
Removing Fix-version for unassigned issues
Dyre Tjeldvoll made changes - 20/Mar/08 04:10 PM
I'm not sure this can be solved. AFAIK, the client normally gets the error message from the server using the jdbc connection. In this case, however, there is no open connection, and the client is therefore unable to get the error message from the server.
During buddy testing I noticed a similar issue when trying to start master against a slave which did not listen on the correct network interface (remote host):
ij> connect 'jdbc:derby://localhost:1527/replicDB;startMaster=true;slaveHost=nanna19'; ERROR XRE05: DERBY SQL error: SQLCODE: -1, SQLSTATE: XRE05, SQLERRMC: replicDB113076142121XRE05XRE05 derby.log included the stack trace, which (correctly) said Connection Refused. When the master is down (I did a kill -9), and I stop replication on the slave,
the shutdown succeeds (XRE42), but the message text is also garbled. I see: connect 'jdbc:derby://localhost:1540/wombat;stopSlave=true'; ERROR XRE42: DERBY SQL error: SQLCODE: -1, SQLSTATE: XRE42, SQLERRMC: wombat^TXRE42 The message should be (from messages.xml): Replicated database 'wombat' shutdown. I also see this type of error when trying to start a master when no slave is listening:
ij> connect 'jdbc:derby://localhost/wombat;startMaster=true;slaveHost=localHost'; ERROR XRE04: DERBY SQL error: SQLCODE: -1, SQLSTATE: XRE04, SQLERRMC: wombat^TlocalHost^T4851^TXRE04.U.1 Another variant (XRE11) when trying to stop replication on a db when no replication is going on:
connect 'jdbc:derby://localhost:1540/wombat;stopSlave=true'; ERROR XRE11: DERBY SQL error: SQLCODE: -1, SQLSTATE: XRE11, SQLERRMC: stopSlave^Twombat^TXRE11
Dag H. Wanvik made changes - 20/Apr/09 05:51 PM
And another; (XRE22) trying to start master twice):
connect 'jdbc:derby://localhost/wombat;startMaster=true;slaveHost=localHost'; ERROR XRE22: DERBY SQL error: SQLCODE: -1, SQLSTATE: XRE22, SQLERRMC: wombat^TXRE22 By adding connection severity ".C" suffix to the relevant State state strings, the message description string is pre-formatted on the server (with the server's locale) before transmission to the client, and the messages look ok again.
Are any of these states in any context where connection level severity would be wrong? Will dig and see what I find..
Making those exceptions session severity sounds like a good idea to me.
Since an exception thrown by getConnection() will always prevent the connection from being established, perhaps it would be more robust if we made the server pre-format all exceptions from getConnection() regardless of the stated severity? Anyways, increasing the severity would be enough for the messages found here, and it sounds like the right thing to do. Thanks for looking at this, Knut. The approach of making them connection level severity seems to work,
but I see many red herring from the replication regression test. It seems very instable to me. I see anything from 0 to 3 failures on a typical run, so it makes it hard to use to verify changes in the replication code. I think we should put some effort into making it less brittle. Interesting idea about getConnection. Uploading a trial patch that makes the seen SQL states session level severity.
ReplicationSuite still works, some of the time... :( That is, I do see intermittent errors, but I *think* none of them are new with this patch. I replaced imports of SQLState; tests should be self contained. I also replaced some assertTrue with assertSQLState for better error reporting. I am still not clear on if any of these errors could be thrown in contexts where it would be wrong to give them session level severity; Knut's proposal would avoid that issue, at the cost of a new mechanism (already enough in the error apparatus, perhaps :) I did not make all the XRE* errors session level yet, maybe they could all be given that severity?
Dag H. Wanvik made changes - 22/Apr/09 12:35 AM
I logged one of the Heisenbugs I referred to as
I think some of the brittleness of the replication tests can be
attributed to the fact that they run by default on localhost, using client/Server Derby instances, and the timeout in the message transmission layer is fixed at 5 seconds: org.apache.derby.impl.store.replication.net.ReplicationMessageTransmit: : private final int DEFAULT_MESSAGE_RESPONSE_TIMEOUT = 5000; This is not only the default actually; there is currently no way to override it. In some failing tests I see XRE04 (REPLICATION_CONNECTION_LOST) as the root cause of other exceptions, see example below. By upping this time constant this class or errors went away. Since I was working on my machine while the tests were running that could explain why I see more intermittent errors than is usually seen on the test machines. For the example test (ReplicationRun_Local_StateTest_part1_3), when I increase DEFAULT_MESSAGE_RESPONSE_TIMEOUT, this error goes away. Conversely, if I reduced it the frequency of errors increase. I think this constant should be settable with a property for the end user or maybe just increased, since not every application of replication can assume machines with light load. I am unsure if increasing the timeout will have any negative effect (error detection latency springs to mind). What do you think? Example error seen: ------------------ 1) testReplication_Local_StateTest_part1_3(org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun_Local_StateTest_part1_3)junit.framework.ComparisonFailure: connectionURL failed: -1 XRE21 DERBY SQL error: SQLCODE: -1, SQLSTATE: XRE21, SQLERRMC: Error occurred while performing failover for database '/export/home/dag/java/sb/tests/derby-3417-replicationTests.ReplicationRun_Local_StateTest_part1_3-sb.sb4.classes-1.6.0_13-14549/db_master/wombat', Failover attempt was aborted.::SQLSTATE: XRE04Connection lost for replicated database 'null'. expected:<XRE2[0]> but was:<XRE2[1]> at org.apache.derbyTesting.junit.BaseJDBCTestCase.assertSQLState(BaseJDBCTestCase.java:762) at org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun_Local_StateTest_part1_3._testPostStartedMasterAndSlave_Failover(ReplicationRun_Local_StateTest_part1_3.java:181) at org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun_Local_StateTest_part1_3.testReplication_Local_StateTest_part1_3(ReplicationRun_Local_StateTest_part1_3.java:123) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at org.apache.derbyTesting.junit.BaseTestCase.runBare(BaseTestCase.java:105) at junit.extensions.TestDecorator.basicRun(TestDecorator.java:24) at junit.extensions.TestSetup$1.protect(TestSetup.java:21) at junit.extensions.TestSetup.run(TestSetup.java:25) Caused by: java.sql.SQLException: DERBY SQL error: SQLCODE: -1, SQLSTATE: XRE21, SQLERRMC: Error occurred while performing failover for database '/export/home/dag/java/sb/tests/derby-3417-replicationTests.ReplicationRun_Local_StateTest_part1_3-sb.sb4.classes-1.6.0_13-14549/db_master/wombat', Failover attempt was aborted.::SQLSTATE: XRE04Connection lost for replicated database 'null'. at org.apache.derby.client.am.SQLExceptionFactory40.getSQLException(SQLExceptionFactory40.java:96) at org.apache.derby.client.am.SqlException.getSQLException(SqlException.java:358) at org.apache.derby.jdbc.ClientDriver.connect(ClientDriver.java:149) at java.sql.DriverManager.getConnection(DriverManager.java:582) at java.sql.DriverManager.getConnection(DriverManager.java:207) at org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun_Local_StateTest_part1_3._testPostStartedMasterAndSlave_Failover(ReplicationRun_Local_StateTest_part1_3.java:171) ... 23 more Caused by: org.apache.derby.client.am.SqlException: DERBY SQL error: SQLCODE: -1, SQLSTATE: XRE21, SQLERRMC: Error occurred while performing failover for database '/export/home/dag/java/sb/tests/derby-3417-replicationTests.ReplicationRun_Local_StateTest_part1_3-sb.sb4.classes-1.6.0_13-14549/db_master/wombat', Failover attempt was aborted.::SQLSTATE: XRE04Connection lost for replicated database 'null'. at org.apache.derby.client.am.Connection.completeSqlca(Connection.java:2082) at org.apache.derby.client.net.NetConnectionReply.parseRdbAccessFailed(NetConnectionReply.java:540) at org.apache.derby.client.net.NetConnectionReply.parseAccessRdbError(NetConnectionReply.java:433) at org.apache.derby.client.net.NetConnectionReply.parseACCRDBreply(NetConnectionReply.java:297) at org.apache.derby.client.net.NetConnectionReply.readAccessDatabase(NetConnectionReply.java:121) at org.apache.derby.client.net.NetConnection.readSecurityCheckAndAccessRdb(NetConnection.java:835) at org.apache.derby.client.net.NetConnection.flowSecurityCheckAndAccessRdb(NetConnection.java:759) at org.apache.derby.client.net.NetConnection.flowUSRIDONLconnect(NetConnection.java:592) at org.apache.derby.client.net.NetConnection.flowConnect(NetConnection.java:399) at org.apache.derby.client.net.NetConnection.<init>(NetConnection.java:219) at org.apache.derby.client.net.NetConnection40.<init>(NetConnection40.java:77) at org.apache.derby.client.net.ClientJDBCObjectFactoryImpl40.newNetConnection(ClientJDBCObjectFactoryImpl40.java:269) at org.apache.derby.jdbc.ClientDriver.connect(ClientDriver.java:140) ... 26 more
Dag H. Wanvik made changes - 23/Apr/09 03:30 AM
I had a quick look through the patch, and it looks correct to me that the SQLStates touched by the patch should have session severity. +1 from me.
Thanks for looking at this, Knut.
I filed filed Uploading #2 of this patch, the regression broke due to missing updates
needed in ErrorCodeTest (new error message now have severity session or above).
Dag H. Wanvik made changes - 28/Apr/09 04:17 PM
Committed #2 as svn 769596, resolving.
Dag H. Wanvik made changes - 29/Apr/09 12:16 AM
Please don't close yet until we decide whether to backport.
Dag, could this be back-ported to 10.5? See http://www.nabble.com/Re%3A-Derby-replication-seems-to-be-mostly-useless-p24371227.html
Thanks for reminding me; I agree, I'll do it.
Backported to 10.5 branch as svn 791826, replication suite ran ok with sane jars.
Narayanan, feel free to close this now.
Dag H. Wanvik made changes - 07/Jul/09 01:25 PM
Thank you for the backport Dag! Closing Issue!
V.Narayanan made changes - 08/Jul/09 06:00 AM
Kathey Marsden made changes - 16/Jul/09 09:24 PM
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||