
Key: DERBY-3417
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Major
Assignee: Dag H. Wanvik
Reporter: V.Narayanan
Votes: 0
Watchers: 0
Derby

slave side stop in a client server mode results in SQLState printed without proper error message

Created: 14/Feb/08 10:18 AM   Updated: 16/Jul/09 09:24 PM
Component/s: Replication
Affects Version/s: 10.4.1.3
Fix Version/s: 10.5.2.0, 10.6.0.0

Time Tracking:
Not Specified

File Attachments (all licensed for inclusion in ASF works):
derby-3417-2.diff, 2009-04-28 04:17 PM, Dag H. Wanvik, 22 kB
derby-3417-2.stat, 2009-04-28 04:17 PM, Dag H. Wanvik, 0.8 kB
derby-3417.diff, 2009-04-22 12:35 AM, Dag H. Wanvik, 19 kB
derby-3417.stat, 2009-04-22 12:35 AM, Dag H. Wanvik, 0.7 kB

Resolution Date: 29/Apr/09 12:16 AM
Labels:


Description
I tried a stopSlave on the slave side of the replication system and
got the following:

ij> connect 'jdbc:derby://localhost:1528/replicationdb;stopSlave=true';
ERROR XRE41: DERBY SQL error: SQLCODE: -1, SQLSTATE: XRE41, SQLERRMC: XRE41

https://issues.apache.org/jira/browse/DERBY-3205 says

ERROR XRE41: Replication operation 'failover' or 'stopSlave' failed because the connection with the master is working. Issue the 'failover' or 'stopMaster' operation on the master database instead.

needs to be printed.

I am not sure if this is a generic case for client server replication messages.

Dyre Tjeldvoll added a comment - 20/Mar/08 04:10 PM
Removing Fix-version for unassigned issues

Dyre Tjeldvoll made changes - 20/Mar/08 04:10 PM
Field Original Value New Value
Fix Version/s 10.4.0.0 [ 12312540 ]
Jørgen Løland added a comment - 27/Mar/08 12:45 PM
I'm not sure this can be solved. AFAIK, the client normally gets the error message from the server using the jdbc connection. In this case, however, there is no open connection, and the client is therefore unable to get the error message from the server.

John H. Embretsen added a comment - 21/Apr/08 02:04 PM
During buddy testing I noticed a similar issue when trying to start master against a slave which did not listen on the correct network interface (remote host):

ij> connect 'jdbc:derby://localhost:1527/replicDB;startMaster=true;slaveHost=nanna19';
ERROR XRE05: DERBY SQL error: SQLCODE: -1, SQLSTATE: XRE05, SQLERRMC: replicDB113076142121XRE05XRE05

derby.log included the stack trace, which (correctly) said Connection Refused.

Dag H. Wanvik added a comment - 16/Apr/09 11:10 PM
When the master is down (I did a kill -9), and I stop replication on the slave,
the shutdown succeeds (XRE42), but the message text is also garbled. I see:

connect 'jdbc:derby://localhost:1540/wombat;stopSlave=true';
ERROR XRE42: DERBY SQL error: SQLCODE: -1, SQLSTATE: XRE42, SQLERRMC: wombat^TXRE42

The message should be (from messages.xml):

Replicated database 'wombat' shutdown.

Dag H. Wanvik added a comment - 20/Apr/09 05:07 PM
I also see this type of error when trying to start a master when no slave is listening:

ij> connect 'jdbc:derby://localhost/wombat;startMaster=true;slaveHost=localHost';
ERROR XRE04: DERBY SQL error: SQLCODE: -1, SQLSTATE: XRE04, SQLERRMC: wombat^TlocalHost^T4851^TXRE04.U.1
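The raw SQLERRMC values quoted in these comments appear to carry the message arguments followed by the message identifier, separated by the character that ij prints as ^T. A minimal sketch of how such a string could be taken apart follows. It assumes that ^T is the control character Ctrl-T (0x14) and that the last token is the message identifier; the class `SqlerrmcTokens` and its methods are hypothetical and not part of Derby.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper, not part of Derby: splits a raw SQLERRMC string into tokens. */
final class SqlerrmcTokens {
    // Assumption: the "^T" shown in the ij output is the control character Ctrl-T (0x14).
    static final char DELIMITER = '\u0014';

    /** Returns all tokens; the limit -1 keeps empty tokens at the end. */
    static List<String> split(String sqlerrmc) {
        return Arrays.asList(sqlerrmc.split(String.valueOf(DELIMITER), -1));
    }

    /** Assumption: the last token is the message identifier, e.g. "XRE04.U.1". */
    static String messageId(String sqlerrmc) {
        List<String> tokens = split(sqlerrmc);
        return tokens.get(tokens.size() - 1);
    }

    /** Everything before the message identifier is treated as a message argument. */
    static List<String> arguments(String sqlerrmc) {
        List<String> tokens = split(sqlerrmc);
        return tokens.subList(0, tokens.size() - 1);
    }

    public static void main(String[] args) {
        // The XRE04 example from this issue, with ^T written as \u0014.
        String raw = "wombat\u0014localHost\u00144851\u0014XRE04.U.1";
        System.out.println(messageId(raw) + " " + arguments(raw));
    }
}
```

For the XRE04 example this yields the identifier XRE04.U.1 and the arguments wombat, localHost and 4851, which is the information the client would need to look up and fill in the message text itself.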

Dag H. Wanvik added a comment - 20/Apr/09 05:50 PM
Another variant (XRE11) when trying to stop replication on a db when no replication is going on:
connect 'jdbc:derby://localhost:1540/wombat;stopSlave=true';
ERROR XRE11: DERBY SQL error: SQLCODE: -1, SQLSTATE: XRE11, SQLERRMC: stopSlave^Twombat^TXRE11

Dag H. Wanvik made changes - 20/Apr/09 05:51 PM
Assignee Dag H. Wanvik [ dagw ]
Dag H. Wanvik added a comment - 20/Apr/09 06:06 PM
And another (XRE22), trying to start master twice:

connect 'jdbc:derby://localhost/wombat;startMaster=true;slaveHost=localHost';
ERROR XRE22: DERBY SQL error: SQLCODE: -1, SQLSTATE: XRE22, SQLERRMC: wombat^TXRE22

Dag H. Wanvik added a comment - 20/Apr/09 06:11 PM
By adding the connection severity suffix ".C" to the relevant SQLState strings, the message description string is pre-formatted on the server (with the server's locale) before transmission to the client, and the messages look ok again.
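The effect of the severity suffix can be illustrated with a simplified sketch. It is not Derby's actual code: the class `SeveritySketch`, its method names, and the exact numeric levels are assumptions made for illustration, and the rule "pre-format at session severity or above" is inferred from the comment above rather than taken from the server source.

```java
/** Simplified illustration only; names and numeric levels are assumptions. */
final class SeveritySketch {
    // Assumed levels; only their relative order matters for this sketch.
    static final int NO_APPLICABLE_SEVERITY = 0;
    static final int STATEMENT_SEVERITY = 20000;
    static final int TRANSACTION_SEVERITY = 30000;
    static final int SESSION_SEVERITY = 40000;   // the ".C" (connection/session) suffix
    static final int DATABASE_SEVERITY = 45000;
    static final int SYSTEM_SEVERITY = 50000;

    /** Maps an identifier such as "XRE41.C" to a severity via the letter after the dot. */
    static int severityOf(String messageId) {
        if (messageId.length() < 7 || messageId.charAt(5) != '.') {
            return NO_APPLICABLE_SEVERITY;       // bare state such as "XRE41"
        }
        switch (messageId.charAt(6)) {
            case 'S': return STATEMENT_SEVERITY;
            case 'T': return TRANSACTION_SEVERITY;
            case 'C': return SESSION_SEVERITY;
            case 'D': return DATABASE_SEVERITY;
            case 'M': return SYSTEM_SEVERITY;
            default:  return NO_APPLICABLE_SEVERITY; // includes 'U'
        }
    }

    /** Assumed rule: the server formats the text itself once the session is lost. */
    static boolean preformattedOnServer(String messageId) {
        return severityOf(messageId) >= SESSION_SEVERITY;
    }

    public static void main(String[] args) {
        System.out.println(preformattedOnServer("XRE04.U.1")); // identifier seen in this issue
        System.out.println(preformattedOnServer("XRE04.C.1")); // same state with a ".C" suffix
    }
}
```

Under these assumptions an identifier like XRE04.U.1 is sent as raw tokens, while the same state with a ".C" suffix is formatted on the server, which is why the client no longer needs a working connection to resolve the text.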

Dag H. Wanvik added a comment - 20/Apr/09 06:13 PM
Are any of these states used in a context where connection level severity would be wrong? Will dig and see what I find.

Knut Anders Hatlen added a comment - 21/Apr/09 07:57 AM
Making those exceptions session severity sounds like a good idea to me.

Since an exception thrown by getConnection() will always prevent the connection from being established, perhaps it would be more robust if we made the server pre-format all exceptions from getConnection() regardless of the stated severity? Anyways, increasing the severity would be enough for the messages found here, and it sounds like the right thing to do.

Dag H. Wanvik added a comment - 21/Apr/09 04:07 PM
Thanks for looking at this, Knut. The approach of making them connection level severity seems to work,
but I see many red herrings from the replication regression test. It seems very unstable to me. I see anything from 0 to 3 failures
on a typical run, which makes it hard to use for verifying changes in the replication code. I think we should put some effort into making it less
brittle. Interesting idea about getConnection.

Dag H. Wanvik added a comment - 22/Apr/09 12:35 AM
Uploading a trial patch that makes the seen SQL states session level severity.
ReplicationSuite still works, some of the time... :( That is, I do see intermittent errors,
but I *think* none of them are new with this patch.

I replaced imports of SQLState; tests should be self contained.
I also replaced some assertTrue with assertSQLState for better error reporting.

I am still not clear on whether any of these errors could be thrown in contexts where it would be wrong to give them session level severity; Knut's proposal would avoid that issue, at the cost of a new mechanism (already enough in the error apparatus, perhaps :)

I did not make all the XRE* errors session level yet, maybe they could all be given that severity?
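To illustrate why replacing assertTrue with assertSQLState gives better error reporting, here is a minimal stand-alone sketch. It is not the implementation in BaseJDBCTestCase; the class `SqlStateAssert` is hypothetical and only shows the idea of reporting both the expected and the actual state and keeping the original exception as the cause.

```java
import java.sql.SQLException;

/** Hypothetical stand-alone helper; not the BaseJDBCTestCase implementation. */
final class SqlStateAssert {
    /**
     * Fails with a message naming both states, and chains the SQLException
     * so its stack trace shows up in the test report.
     */
    static void assertSQLState(String expected, SQLException e) {
        String actual = e.getSQLState();
        if (!expected.equals(actual)) {
            throw new AssertionError(
                "Unexpected SQLState: expected <" + expected + "> but was <"
                    + actual + ">: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        // The state is written as a plain string literal, so the test does not
        // need to import an engine-side constants class.
        assertSQLState("XRE20", new SQLException("simulated failure", "XRE20"));
        System.out.println("state matched");
    }
}
```

A plain assertTrue on a string comparison would only report that a condition was false; a helper of this shape reports which state actually arrived.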

Dag H. Wanvik made changes - 22/Apr/09 12:35 AM
Attachment derby-3417.diff [ 12406078 ]
Attachment derby-3417.stat [ 12406079 ]
Dag H. Wanvik added a comment - 22/Apr/09 12:37 AM
I logged one of the Heisenbugs I referred to as DERBY-4175.

Dag H. Wanvik added a comment - 22/Apr/09 11:12 PM
I think some of the brittleness of the replication tests can be
attributed to the fact that they run by default on localhost, using
client/Server Derby instances, and the timeout in the message
transmission layer is fixed at 5 seconds:

org.apache.derby.impl.store.replication.net.ReplicationMessageTransmit:
    :
    private final int DEFAULT_MESSAGE_RESPONSE_TIMEOUT = 5000;

Despite the name, this is not just a default; there is currently no way to
override it. In some failing tests I see XRE04
(REPLICATION_CONNECTION_LOST) as the root cause of other exceptions,
see example below. By upping this time constant this class of errors
went away. Since I was working on my machine while the tests were
running, that could explain why I see more intermittent errors than are
usually seen on the test machines.

For the example test (ReplicationRun_Local_StateTest_part1_3), when I
increase DEFAULT_MESSAGE_RESPONSE_TIMEOUT, this error goes
away. Conversely, if I reduce it, the frequency of errors increases.

I think this constant should be settable with a property for the end
user or maybe just increased, since not every application of
replication can assume machines with light load. I am unsure if
increasing the timeout will have any negative effect (error detection
latency springs to mind). What do you think?
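One way the proposal above could look is sketched below. This is not code from Derby: the class `ResponseTimeoutSketch` and the property name `example.replication.messageResponseTimeout` are hypothetical, chosen only to show a timeout that falls back to the existing 5000 ms value when no valid override is given.

```java
/** Hypothetical sketch of a property-controlled timeout; not Derby code. */
final class ResponseTimeoutSketch {
    // Value taken from the constant quoted above.
    static final int DEFAULT_MESSAGE_RESPONSE_TIMEOUT = 5000;

    // Hypothetical property name, invented for this example.
    static final String TIMEOUT_PROPERTY = "example.replication.messageResponseTimeout";

    /**
     * Reads the timeout in milliseconds from a system property.
     * Missing, unparsable, or non-positive values fall back to the default.
     */
    static int responseTimeoutMillis() {
        int configured = Integer.getInteger(TIMEOUT_PROPERTY, DEFAULT_MESSAGE_RESPONSE_TIMEOUT);
        return configured > 0 ? configured : DEFAULT_MESSAGE_RESPONSE_TIMEOUT;
    }

    public static void main(String[] args) {
        System.out.println(responseTimeoutMillis());
    }
}
```

Running with, for example, `-Dexample.replication.messageResponseTimeout=20000` would raise the limit for a heavily loaded machine, at the cost of the longer error detection latency mentioned above.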



Example error seen:
------------------
1) testReplication_Local_StateTest_part1_3(org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun_Local_StateTest_part1_3)junit.framework.ComparisonFailure: connectionURL failed: -1 XRE21 DERBY SQL error: SQLCODE: -1, SQLSTATE: XRE21, SQLERRMC: Error occurred while performing failover for database '/export/home/dag/java/sb/tests/derby-3417-replicationTests.ReplicationRun_Local_StateTest_part1_3-sb.sb4.classes-1.6.0_13-14549/db_master/wombat', Failover attempt was aborted.::SQLSTATE: XRE04Connection lost for replicated database 'null'. expected:<XRE2[0]> but was:<XRE2[1]>
at org.apache.derbyTesting.junit.BaseJDBCTestCase.assertSQLState(BaseJDBCTestCase.java:762)
at org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun_Local_StateTest_part1_3._testPostStartedMasterAndSlave_Failover(ReplicationRun_Local_StateTest_part1_3.java:181)
at org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun_Local_StateTest_part1_3.testReplication_Local_StateTest_part1_3(ReplicationRun_Local_StateTest_part1_3.java:123)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at org.apache.derbyTesting.junit.BaseTestCase.runBare(BaseTestCase.java:105)
at junit.extensions.TestDecorator.basicRun(TestDecorator.java:24)
at junit.extensions.TestSetup$1.protect(TestSetup.java:21)
at junit.extensions.TestSetup.run(TestSetup.java:25)

Caused by: java.sql.SQLException: DERBY SQL error: SQLCODE: -1, SQLSTATE: XRE21, SQLERRMC: Error occurred while performing failover for database '/export/home/dag/java/sb/tests/derby-3417-replicationTests.ReplicationRun_Local_StateTest_part1_3-sb.sb4.classes-1.6.0_13-14549/db_master/wombat', Failover attempt was aborted.::SQLSTATE: XRE04Connection lost for replicated database 'null'.
at org.apache.derby.client.am.SQLExceptionFactory40.getSQLException(SQLExceptionFactory40.java:96)
at org.apache.derby.client.am.SqlException.getSQLException(SqlException.java:358)
at org.apache.derby.jdbc.ClientDriver.connect(ClientDriver.java:149)
at java.sql.DriverManager.getConnection(DriverManager.java:582)
at java.sql.DriverManager.getConnection(DriverManager.java:207)
at org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun_Local_StateTest_part1_3._testPostStartedMasterAndSlave_Failover(ReplicationRun_Local_StateTest_part1_3.java:171)
... 23 more

Caused by: org.apache.derby.client.am.SqlException: DERBY SQL error: SQLCODE: -1, SQLSTATE: XRE21, SQLERRMC: Error occurred while performing failover for database '/export/home/dag/java/sb/tests/derby-3417-replicationTests.ReplicationRun_Local_StateTest_part1_3-sb.sb4.classes-1.6.0_13-14549/db_master/wombat', Failover attempt was aborted.::SQLSTATE: XRE04Connection lost for replicated database 'null'.
at org.apache.derby.client.am.Connection.completeSqlca(Connection.java:2082)
at org.apache.derby.client.net.NetConnectionReply.parseRdbAccessFailed(NetConnectionReply.java:540)
at org.apache.derby.client.net.NetConnectionReply.parseAccessRdbError(NetConnectionReply.java:433)
at org.apache.derby.client.net.NetConnectionReply.parseACCRDBreply(NetConnectionReply.java:297)
at org.apache.derby.client.net.NetConnectionReply.readAccessDatabase(NetConnectionReply.java:121)
at org.apache.derby.client.net.NetConnection.readSecurityCheckAndAccessRdb(NetConnection.java:835)
at org.apache.derby.client.net.NetConnection.flowSecurityCheckAndAccessRdb(NetConnection.java:759)
at org.apache.derby.client.net.NetConnection.flowUSRIDONLconnect(NetConnection.java:592)
at org.apache.derby.client.net.NetConnection.flowConnect(NetConnection.java:399)
at org.apache.derby.client.net.NetConnection.<init>(NetConnection.java:219)
at org.apache.derby.client.net.NetConnection40.<init>(NetConnection40.java:77)
at org.apache.derby.client.net.ClientJDBCObjectFactoryImpl40.newNetConnection(ClientJDBCObjectFactoryImpl40.java:269)
at org.apache.derby.jdbc.ClientDriver.connect(ClientDriver.java:140)
... 26 more

Dag H. Wanvik made changes - 23/Apr/09 03:30 AM
Derby Info [Patch Available]
Knut Anders Hatlen added a comment - 24/Apr/09 02:04 PM
I had a quick look through the patch, and it looks correct to me that the SQLStates touched by the patch should have session severity. +1 from me.

Dag H. Wanvik added a comment - 24/Apr/09 04:57 PM
Thanks for looking at this, Knut.
I filed DERBY-4185 for the issue with DEFAULT_MESSAGE_RESPONSE_TIMEOUT.

Dag H. Wanvik added a comment - 28/Apr/09 04:17 PM
Uploading #2 of this patch; the regression tests broke due to missing updates
needed in ErrorCodeTest (the new error messages now have severity session or above).

Dag H. Wanvik made changes - 28/Apr/09 04:17 PM
Attachment derby-3417-2.diff [ 12406655 ]
Attachment derby-3417-2.stat [ 12406656 ]
Repository Revision Date User Message
ASF #769596 Wed Apr 29 00:14:20 UTC 2009 dag DERBY-3417 slave side stop in a client server mode results in SQLState printed without proper error message

Patch DERBY-3417-2.

A set of replication errors have been made session level (they are),
also having the effect of preformatting the error message strings on
the server, solving this issue. Also removed usage of
org.apache.derby.shared.common.reference.SQLState's strings in the
replication tests to make them self contained.
Files Changed
MODIFY /db/derby/code/trunk/java/testing/org/apache/derbyTesting/functionTests/tests/replicationTests/ReplicationRun_Local_StateTest_part1_2.java
MODIFY /db/derby/code/trunk/java/testing/org/apache/derbyTesting/functionTests/tests/replicationTests/ReplicationRun_Local_StateTest_part1_1.java
MODIFY /db/derby/code/trunk/java/shared/org/apache/derby/shared/common/reference/SQLState.java
MODIFY /db/derby/code/trunk/java/testing/org/apache/derbyTesting/functionTests/tests/replicationTests/ReplicationRun.java
MODIFY /db/derby/code/trunk/java/testing/org/apache/derbyTesting/functionTests/tests/replicationTests/ReplicationRun_Local_StateTest_part1.java
MODIFY /db/derby/code/trunk/java/testing/org/apache/derbyTesting/functionTests/tests/replicationTests/ReplicationSuite.java
MODIFY /db/derby/code/trunk/java/engine/org/apache/derby/loc/messages.xml
MODIFY /db/derby/code/trunk/java/testing/org/apache/derbyTesting/functionTests/tests/lang/ErrorCodeTest.java

Dag H. Wanvik added a comment - 29/Apr/09 12:15 AM
Committed #2 as svn 769596, resolving.

Dag H. Wanvik made changes - 29/Apr/09 12:16 AM
Status Open [ 1 ] Resolved [ 5 ]
Derby Info [Patch Available]
Resolution Fixed [ 1 ]
Repository Revision Date User Message
ASF #769602 Wed Apr 29 00:44:47 UTC 2009 dag DERBY-4175 Instability in some replication tests under load, since tests don't wait long enough for final state or anticipate intermediate states

Patch DERBY-4175-3 (+ resolved some conflicts arising from commit of DERBY-3417).

It makes three replication tests less sensitive to load by making
them accept intermediate states without failing or wait for longer
before giving up on seeing the final end state of a replication state
change.
Files Changed
MODIFY /db/derby/code/trunk/java/testing/org/apache/derbyTesting/functionTests/tests/replicationTests/ReplicationRun_Local_StateTest_part1_2.java
MODIFY /db/derby/code/trunk/java/testing/org/apache/derbyTesting/functionTests/tests/replicationTests/ReplicationRun_Local_StateTest_part1_1.java
MODIFY /db/derby/code/trunk/java/testing/org/apache/derbyTesting/functionTests/tests/replicationTests/ReplicationRun_Local_3_p3.java

Dag H. Wanvik added a comment - 29/Apr/09 12:46 AM
Please don't close this until we decide whether to backport.

Knut Anders Hatlen added a comment - 07/Jul/09 11:53 AM

Dag H. Wanvik added a comment - 07/Jul/09 12:39 PM
Thanks for reminding me; I agree, I'll do it.


Repository Revision Date User Message
ASF #791826 Tue Jul 07 13:22:26 UTC 2009 dag DERBY-3417 slave side stop in a client server mode results in SQLState printed without proper error message

Backported patch DERBY-3417-2 from trunk as:
svn merge -c 769596 https://svn.eu.apache.org/repos/asf/db/derby/code/trunk

A set of replication errors have been made session level (they are),
also having the effect of preformatting the error message strings on
the server, solving this issue. Also removed usage of
org.apache.derby.shared.common.reference.SQLState's strings in the
replication tests to make them self contained.
Files Changed
MODIFY /db/derby/code/branches/10.5/java/shared/org/apache/derby/shared/common/reference/SQLState.java
MODIFY /db/derby/code/branches/10.5/java/testing/org/apache/derbyTesting/functionTests/tests/replicationTests/ReplicationRun_Local_StateTest_part1_2.java
MODIFY /db/derby/code/branches/10.5/java/testing/org/apache/derbyTesting/functionTests/tests/replicationTests/ReplicationRun_Local_StateTest_part1_1.java
MODIFY /db/derby/code/branches/10.5/java/testing/org/apache/derbyTesting/functionTests/tests/replicationTests/ReplicationRun.java
MODIFY /db/derby/code/branches/10.5/java/testing/org/apache/derbyTesting/functionTests/tests/replicationTests/ReplicationRun_Local_StateTest_part1.java
MODIFY /db/derby/code/branches/10.5/java/testing/org/apache/derbyTesting/functionTests/tests/replicationTests/ReplicationSuite.java
MODIFY /db/derby/code/branches/10.5/java/testing/org/apache/derbyTesting/functionTests/tests/lang/ErrorCodeTest.java
MODIFY /db/derby/code/branches/10.5
MODIFY /db/derby/code/branches/10.5/java/engine/org/apache/derby/loc/messages.xml

Dag H. Wanvik added a comment - 07/Jul/09 01:24 PM
Backported to 10.5 branch as svn 791826, replication suite ran ok with sane jars.
Narayanan, feel free to close this now.

Dag H. Wanvik made changes - 07/Jul/09 01:25 PM
Fix Version/s 10.5.1.2 [ 12313870 ]
Fix Version/s 10.6.0.0 [ 12313727 ]
V.Narayanan added a comment - 08/Jul/09 06:00 AM
Thank you for the backport Dag! Closing Issue!

V.Narayanan made changes - 08/Jul/09 06:00 AM
Status Resolved [ 5 ] Closed [ 6 ]
Kathey Marsden made changes - 16/Jul/09 09:24 PM
Fix Version/s 10.5.2.0 [ 12314116 ]
Fix Version/s 10.5.1.2 [ 12313870 ]