Derby
  1. Derby
  2. DERBY-3417

slave side stop in a client server mode results in SQLState printed without proper error message

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 10.4.1.3
    • Fix Version/s: 10.5.2.0, 10.6.1.0
    • Component/s: Replication
    • Labels:
      None

      Description

      I tried a stopSlave on the slave side of the replication system and
      found the below

      ij> connect 'jdbc:derby://localhost:1528/replicationdb;stopSlave=true';
      ERROR XRE41: DERBY SQL error: SQLCODE: -1, SQLSTATE: XRE41, SQLERRMC: XRE41

      https://issues.apache.org/jira/browse/DERBY-3205 says

      ERROR XRE41: Replication operation 'failover' or 'stopSlave' failed because the connection with the master is working. Issue the 'failover' or 'stopMaster' operation on the master database instead.

      needs to be printed.

      I am not sure if this is a generic case for client server replication messages.

      1. derby-3417-2.stat
        0.8 kB
        Dag H. Wanvik
      2. derby-3417-2.diff
        22 kB
        Dag H. Wanvik
      3. derby-3417.stat
        0.7 kB
        Dag H. Wanvik
      4. derby-3417.diff
        19 kB
        Dag H. Wanvik

        Activity

        Hide
        Dyre Tjeldvoll added a comment -

        Removing Fix-version for unassigned issues

        Show
        Dyre Tjeldvoll added a comment - Removing Fix-version for unassigned issues
        Hide
        Jørgen Løland added a comment -

        I'm not sure this can be solved. AFAIK, the client normally gets the error message from the server using the jdbc connection. In this case, however, there is no open connection, and the client is therefore unable to get the error message from the server.

        Show
        Jørgen Løland added a comment - I'm not sure this can be solved. AFAIK, the client normally gets the error message from the server using the jdbc connection. In this case, however, there is no open connection, and the client is therefore unable to get the error message from the server.
        Hide
        John H. Embretsen added a comment -

        During buddy testing I noticed a similar issue when trying to start master against a slave which did not listen on the correct network interface (remote host):

        ij> connect 'jdbc:derby://localhost:1527/replicDB;startMaster=true;slaveHost=nanna19';
        ERROR XRE05: DERBY SQL error: SQLCODE: -1, SQLSTATE: XRE05, SQLERRMC: replicDB113076142121XRE05XRE05

        derby.log included the stack trace, which (correctly) said Connection Refused.

        Show
        John H. Embretsen added a comment - During buddy testing I noticed a similar issue when trying to start master against a slave which did not listen on the correct network interface (remote host): ij> connect 'jdbc:derby://localhost:1527/replicDB;startMaster=true;slaveHost=nanna19'; ERROR XRE05: DERBY SQL error: SQLCODE: -1, SQLSTATE: XRE05, SQLERRMC: replicDB113076142121XRE05XRE05 derby.log included the stack trace, which (correctly) said Connection Refused.
        Hide
        Dag H. Wanvik added a comment -

        When the master is down (I did a kill -9), and I stop replication on the slave,
        the shutdown succeeds (XRE42), but the message text is also garbled. I see:

        connect 'jdbc:derby://localhost:1540/wombat;stopSlave=true';
        ERROR XRE42: DERBY SQL error: SQLCODE: -1, SQLSTATE: XRE42, SQLERRMC: wombat^TXRE42

        The message should be (from messages.xml):

        Replicated database 'wombat' shutdown.

        Show
        Dag H. Wanvik added a comment - When the master is down (I did a kill -9), and I stop replication on the slave, the shutdown succeeds (XRE42), but the message text is also garbled. I see: connect 'jdbc:derby://localhost:1540/wombat;stopSlave=true'; ERROR XRE42: DERBY SQL error: SQLCODE: -1, SQLSTATE: XRE42, SQLERRMC: wombat^TXRE42 The message should be (from messages.xml): Replicated database 'wombat' shutdown.
        Hide
        Dag H. Wanvik added a comment -

        I also see this type of error when trying to start a master when no slave is listening:

        ij> connect 'jdbc:derby://localhost/wombat;startMaster=true;slaveHost=localHost';
        ERROR XRE04: DERBY SQL error: SQLCODE: -1, SQLSTATE: XRE04, SQLERRMC: wombat^TlocalHost^T4851^TXRE04.U.1

        Show
        Dag H. Wanvik added a comment - I also see this type of error when trying to start a master when no slave is listening: ij> connect 'jdbc:derby://localhost/wombat;startMaster=true;slaveHost=localHost'; ERROR XRE04: DERBY SQL error: SQLCODE: -1, SQLSTATE: XRE04, SQLERRMC: wombat^TlocalHost^T4851^TXRE04.U.1
        Hide
        Dag H. Wanvik added a comment -

        Another variant (XRE11) when trying to stop replication on a db when no replication is going on:
        connect 'jdbc:derby://localhost:1540/wombat;stopSlave=true';
        ERROR XRE11: DERBY SQL error: SQLCODE: -1, SQLSTATE: XRE11, SQLERRMC: stopSlave^Twombat^TXRE11

        Show
        Dag H. Wanvik added a comment - Another variant (XRE11) when trying to stop replication on a db when no replication is going on: connect 'jdbc:derby://localhost:1540/wombat;stopSlave=true'; ERROR XRE11: DERBY SQL error: SQLCODE: -1, SQLSTATE: XRE11, SQLERRMC: stopSlave^Twombat^TXRE11
        Hide
        Dag H. Wanvik added a comment -

        And another; (XRE22) trying to start master twice):

        connect 'jdbc:derby://localhost/wombat;startMaster=true;slaveHost=localHost';
        ERROR XRE22: DERBY SQL error: SQLCODE: -1, SQLSTATE: XRE22, SQLERRMC: wombat^TXRE22

        Show
        Dag H. Wanvik added a comment - And another; (XRE22) trying to start master twice): connect 'jdbc:derby://localhost/wombat;startMaster=true;slaveHost=localHost'; ERROR XRE22: DERBY SQL error: SQLCODE: -1, SQLSTATE: XRE22, SQLERRMC: wombat^TXRE22
        Hide
        Dag H. Wanvik added a comment -

        By adding connection severity ".C" suffix to the relevant State state strings, the message description string is pre-formatted on the server (with the server's locale) before transmission to the client, and the messages look ok again.

        Show
        Dag H. Wanvik added a comment - By adding connection severity ".C" suffix to the relevant State state strings, the message description string is pre-formatted on the server (with the server's locale) before transmission to the client, and the messages look ok again.
        Hide
        Dag H. Wanvik added a comment -

        Are any of these states in any context where connection level severity would be wrong? Will dig and see what I find..

        Show
        Dag H. Wanvik added a comment - Are any of these states in any context where connection level severity would be wrong? Will dig and see what I find..
        Hide
        Knut Anders Hatlen added a comment -

        Making those exceptions session severity sounds like a good idea to me.

        Since an exception thrown by getConnection() will always prevent the connection from being established, perhaps it would be more robust if we made the server pre-format all exceptions from getConnection() regardless of the stated severity? Anyways, increasing the severity would be enough for the messages found here, and it sounds like the right thing to do.

        Show
        Knut Anders Hatlen added a comment - Making those exceptions session severity sounds like a good idea to me. Since an exception thrown by getConnection() will always prevent the connection from being established, perhaps it would be more robust if we made the server pre-format all exceptions from getConnection() regardless of the stated severity? Anyways, increasing the severity would be enough for the messages found here, and it sounds like the right thing to do.
        Hide
        Dag H. Wanvik added a comment -

        Thanks for looking at this, Knut. The approach of making them connection level severity seems to work,
        but I see many red herring from the replication regression test. It seems very instable to me. I see anything from 0 to 3 failures
        on a typical run, so it makes it hard to use to verify changes in the replication code. I think we should put some effort into making it less
        brittle. Interesting idea about getConnection.

        Show
        Dag H. Wanvik added a comment - Thanks for looking at this, Knut. The approach of making them connection level severity seems to work, but I see many red herring from the replication regression test. It seems very instable to me. I see anything from 0 to 3 failures on a typical run, so it makes it hard to use to verify changes in the replication code. I think we should put some effort into making it less brittle. Interesting idea about getConnection.
        Hide
        Dag H. Wanvik added a comment -

        Uploading a trial patch that makes the seen SQL states session level severity.
        ReplicationSuite still works, some of the time... That is, I do see intermittent errors,
        but I think none of them are new with this patch.

        I replaced imports of SQLState; tests should be self contained.
        I also replaced some assertTrue with assertSQLState for better error reporting.

        I am still not clear on if any of these errors could be thrown in contexts where it would be wrong to give them session level severity; Knut's proposal would avoid that issue, at the cost of a new mechanism (already enough in the error apparatus, perhaps

        I did not make all the XRE* errors session level yet, maybe they could all be given that severity?

        Show
        Dag H. Wanvik added a comment - Uploading a trial patch that makes the seen SQL states session level severity. ReplicationSuite still works, some of the time... That is, I do see intermittent errors, but I think none of them are new with this patch. I replaced imports of SQLState; tests should be self contained. I also replaced some assertTrue with assertSQLState for better error reporting. I am still not clear on if any of these errors could be thrown in contexts where it would be wrong to give them session level severity; Knut's proposal would avoid that issue, at the cost of a new mechanism (already enough in the error apparatus, perhaps I did not make all the XRE* errors session level yet, maybe they could all be given that severity?
        Hide
        Dag H. Wanvik added a comment -

        I logged one of the Heisenbugs I referred to as DERBY-4175.

        Show
        Dag H. Wanvik added a comment - I logged one of the Heisenbugs I referred to as DERBY-4175 .
        Hide
        Dag H. Wanvik added a comment -

        I think some of the brittleness of the replication tests can be
        attributed to the fact that they run by default on localhost, using
        client/Server Derby instances, and the timeout in the message
        transmission layer is fixed at 5 seconds:

        org.apache.derby.impl.store.replication.net.ReplicationMessageTransmit:
        :
        private final int DEFAULT_MESSAGE_RESPONSE_TIMEOUT = 5000;

        This is not only the default actually; there is currently no way to
        override it. In some failing tests I see XRE04
        (REPLICATION_CONNECTION_LOST) as the root cause of other exceptions,
        see example below. By upping this time constant this class or errors
        went away. Since I was working on my machine while the tests were
        running that could explain why I see more intermittent errors than is
        usually seen on the test machines.

        For the example test (ReplicationRun_Local_StateTest_part1_3), when I
        increase DEFAULT_MESSAGE_RESPONSE_TIMEOUT, this error goes
        away. Conversely, if I reduced it the frequency of errors increase.

        I think this constant should be settable with a property for the end
        user or maybe just increased, since not every application of
        replication can assume machines with light load. I am unsure if
        increasing the timeout will have any negative effect (error detection
        latency springs to mind). What do you think?

        Example error seen:
        ------------------
        1) testReplication_Local_StateTest_part1_3(org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun_Local_StateTest_part1_3)junit.framework.ComparisonFailure: connectionURL failed: -1 XRE21 DERBY SQL error: SQLCODE: -1, SQLSTATE: XRE21, SQLERRMC: Error occurred while performing failover for database '/export/home/dag/java/sb/tests/derby-3417-replicationTests.ReplicationRun_Local_StateTest_part1_3-sb.sb4.classes-1.6.0_13-14549/db_master/wombat', Failover attempt was aborted.::SQLSTATE: XRE04Connection lost for replicated database 'null'. expected:<XRE2[0]> but was:<XRE2[1]>
        at org.apache.derbyTesting.junit.BaseJDBCTestCase.assertSQLState(BaseJDBCTestCase.java:762)
        at org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun_Local_StateTest_part1_3._testPostStartedMasterAndSlave_Failover(ReplicationRun_Local_StateTest_part1_3.java:181)
        at org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun_Local_StateTest_part1_3.testReplication_Local_StateTest_part1_3(ReplicationRun_Local_StateTest_part1_3.java:123)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at org.apache.derbyTesting.junit.BaseTestCase.runBare(BaseTestCase.java:105)
        at junit.extensions.TestDecorator.basicRun(TestDecorator.java:24)
        at junit.extensions.TestSetup$1.protect(TestSetup.java:21)
        at junit.extensions.TestSetup.run(TestSetup.java:25)

        Caused by: java.sql.SQLException: DERBY SQL error: SQLCODE: -1, SQLSTATE: XRE21, SQLERRMC: Error occurred while performing failover for database '/export/home/dag/java/sb/tests/derby-3417-replicationTests.ReplicationRun_Local_StateTest_part1_3-sb.sb4.classes-1.6.0_13-14549/db_master/wombat', Failover attempt was aborted.::SQLSTATE: XRE04Connection lost for replicated database 'null'.
        at org.apache.derby.client.am.SQLExceptionFactory40.getSQLException(SQLExceptionFactory40.java:96)
        at org.apache.derby.client.am.SqlException.getSQLException(SqlException.java:358)
        at org.apache.derby.jdbc.ClientDriver.connect(ClientDriver.java:149)
        at java.sql.DriverManager.getConnection(DriverManager.java:582)
        at java.sql.DriverManager.getConnection(DriverManager.java:207)
        at org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun_Local_StateTest_part1_3._testPostStartedMasterAndSlave_Failover(ReplicationRun_Local_StateTest_part1_3.java:171)
        ... 23 more

        Caused by: org.apache.derby.client.am.SqlException: DERBY SQL error: SQLCODE: -1, SQLSTATE: XRE21, SQLERRMC: Error occurred while performing failover for database '/export/home/dag/java/sb/tests/derby-3417-replicationTests.ReplicationRun_Local_StateTest_part1_3-sb.sb4.classes-1.6.0_13-14549/db_master/wombat', Failover attempt was aborted.::SQLSTATE: XRE04Connection lost for replicated database 'null'.
        at org.apache.derby.client.am.Connection.completeSqlca(Connection.java:2082)
        at org.apache.derby.client.net.NetConnectionReply.parseRdbAccessFailed(NetConnectionReply.java:540)
        at org.apache.derby.client.net.NetConnectionReply.parseAccessRdbError(NetConnectionReply.java:433)
        at org.apache.derby.client.net.NetConnectionReply.parseACCRDBreply(NetConnectionReply.java:297)
        at org.apache.derby.client.net.NetConnectionReply.readAccessDatabase(NetConnectionReply.java:121)
        at org.apache.derby.client.net.NetConnection.readSecurityCheckAndAccessRdb(NetConnection.java:835)
        at org.apache.derby.client.net.NetConnection.flowSecurityCheckAndAccessRdb(NetConnection.java:759)
        at org.apache.derby.client.net.NetConnection.flowUSRIDONLconnect(NetConnection.java:592)
        at org.apache.derby.client.net.NetConnection.flowConnect(NetConnection.java:399)
        at org.apache.derby.client.net.NetConnection.<init>(NetConnection.java:219)
        at org.apache.derby.client.net.NetConnection40.<init>(NetConnection40.java:77)
        at org.apache.derby.client.net.ClientJDBCObjectFactoryImpl40.newNetConnection(ClientJDBCObjectFactoryImpl40.java:269)
        at org.apache.derby.jdbc.ClientDriver.connect(ClientDriver.java:140)
        ... 26 more

        Show
        Dag H. Wanvik added a comment - I think some of the brittleness of the replication tests can be attributed to the fact that they run by default on localhost, using client/Server Derby instances, and the timeout in the message transmission layer is fixed at 5 seconds: org.apache.derby.impl.store.replication.net.ReplicationMessageTransmit: : private final int DEFAULT_MESSAGE_RESPONSE_TIMEOUT = 5000; This is not only the default actually; there is currently no way to override it. In some failing tests I see XRE04 (REPLICATION_CONNECTION_LOST) as the root cause of other exceptions, see example below. By upping this time constant this class or errors went away. Since I was working on my machine while the tests were running that could explain why I see more intermittent errors than is usually seen on the test machines. For the example test (ReplicationRun_Local_StateTest_part1_3), when I increase DEFAULT_MESSAGE_RESPONSE_TIMEOUT, this error goes away. Conversely, if I reduced it the frequency of errors increase. I think this constant should be settable with a property for the end user or maybe just increased, since not every application of replication can assume machines with light load. I am unsure if increasing the timeout will have any negative effect (error detection latency springs to mind). What do you think? Example error seen: ------------------ 1) testReplication_Local_StateTest_part1_3(org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun_Local_StateTest_part1_3)junit.framework.ComparisonFailure: connectionURL failed: -1 XRE21 DERBY SQL error: SQLCODE: -1, SQLSTATE: XRE21, SQLERRMC: Error occurred while performing failover for database '/export/home/dag/java/sb/tests/derby-3417-replicationTests.ReplicationRun_Local_StateTest_part1_3-sb.sb4.classes-1.6.0_13-14549/db_master/wombat', Failover attempt was aborted.::SQLSTATE: XRE04Connection lost for replicated database 'null'. expected:<XRE2 [0] > but was:<XRE2 [1] > at org.apache.derbyTesting.junit.BaseJDBCTestCase.assertSQLState(BaseJDBCTestCase.java:762) at org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun_Local_StateTest_part1_3._testPostStartedMasterAndSlave_Failover(ReplicationRun_Local_StateTest_part1_3.java:181) at org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun_Local_StateTest_part1_3.testReplication_Local_StateTest_part1_3(ReplicationRun_Local_StateTest_part1_3.java:123) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at org.apache.derbyTesting.junit.BaseTestCase.runBare(BaseTestCase.java:105) at junit.extensions.TestDecorator.basicRun(TestDecorator.java:24) at junit.extensions.TestSetup$1.protect(TestSetup.java:21) at junit.extensions.TestSetup.run(TestSetup.java:25) Caused by: java.sql.SQLException: DERBY SQL error: SQLCODE: -1, SQLSTATE: XRE21, SQLERRMC: Error occurred while performing failover for database '/export/home/dag/java/sb/tests/derby-3417-replicationTests.ReplicationRun_Local_StateTest_part1_3-sb.sb4.classes-1.6.0_13-14549/db_master/wombat', Failover attempt was aborted.::SQLSTATE: XRE04Connection lost for replicated database 'null'. at org.apache.derby.client.am.SQLExceptionFactory40.getSQLException(SQLExceptionFactory40.java:96) at org.apache.derby.client.am.SqlException.getSQLException(SqlException.java:358) at org.apache.derby.jdbc.ClientDriver.connect(ClientDriver.java:149) at java.sql.DriverManager.getConnection(DriverManager.java:582) at java.sql.DriverManager.getConnection(DriverManager.java:207) at org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun_Local_StateTest_part1_3._testPostStartedMasterAndSlave_Failover(ReplicationRun_Local_StateTest_part1_3.java:171) ... 23 more Caused by: org.apache.derby.client.am.SqlException: DERBY SQL error: SQLCODE: -1, SQLSTATE: XRE21, SQLERRMC: Error occurred while performing failover for database '/export/home/dag/java/sb/tests/derby-3417-replicationTests.ReplicationRun_Local_StateTest_part1_3-sb.sb4.classes-1.6.0_13-14549/db_master/wombat', Failover attempt was aborted.::SQLSTATE: XRE04Connection lost for replicated database 'null'. at org.apache.derby.client.am.Connection.completeSqlca(Connection.java:2082) at org.apache.derby.client.net.NetConnectionReply.parseRdbAccessFailed(NetConnectionReply.java:540) at org.apache.derby.client.net.NetConnectionReply.parseAccessRdbError(NetConnectionReply.java:433) at org.apache.derby.client.net.NetConnectionReply.parseACCRDBreply(NetConnectionReply.java:297) at org.apache.derby.client.net.NetConnectionReply.readAccessDatabase(NetConnectionReply.java:121) at org.apache.derby.client.net.NetConnection.readSecurityCheckAndAccessRdb(NetConnection.java:835) at org.apache.derby.client.net.NetConnection.flowSecurityCheckAndAccessRdb(NetConnection.java:759) at org.apache.derby.client.net.NetConnection.flowUSRIDONLconnect(NetConnection.java:592) at org.apache.derby.client.net.NetConnection.flowConnect(NetConnection.java:399) at org.apache.derby.client.net.NetConnection.<init>(NetConnection.java:219) at org.apache.derby.client.net.NetConnection40.<init>(NetConnection40.java:77) at org.apache.derby.client.net.ClientJDBCObjectFactoryImpl40.newNetConnection(ClientJDBCObjectFactoryImpl40.java:269) at org.apache.derby.jdbc.ClientDriver.connect(ClientDriver.java:140) ... 26 more
        Hide
        Knut Anders Hatlen added a comment -

        I had a quick look through the patch, and it looks correct to me that the SQLStates touched by the patch should have session severity. +1 from me.

        Show
        Knut Anders Hatlen added a comment - I had a quick look through the patch, and it looks correct to me that the SQLStates touched by the patch should have session severity. +1 from me.
        Hide
        Dag H. Wanvik added a comment -

        Thanks for looking at this, Knut.
        I filed filed DERBY-4185 for the issue with DEFAULT_MESSAGE_RESPONSE_TIMEOUT.

        Show
        Dag H. Wanvik added a comment - Thanks for looking at this, Knut. I filed filed DERBY-4185 for the issue with DEFAULT_MESSAGE_RESPONSE_TIMEOUT.
        Hide
        Dag H. Wanvik added a comment -

        Uploading #2 of this patch, the regression broke due to missing updates
        needed in ErrorCodeTest (new error message now have severity session or above).

        Show
        Dag H. Wanvik added a comment - Uploading #2 of this patch, the regression broke due to missing updates needed in ErrorCodeTest (new error message now have severity session or above).
        Hide
        Dag H. Wanvik added a comment -

        Committed #2 as svn 769596, resolving.

        Show
        Dag H. Wanvik added a comment - Committed #2 as svn 769596, resolving.
        Hide
        Dag H. Wanvik added a comment -

        Please don't close yet until we decide whether to backport.

        Show
        Dag H. Wanvik added a comment - Please don't close yet until we decide whether to backport.
        Hide
        Knut Anders Hatlen added a comment -
        Show
        Knut Anders Hatlen added a comment - Dag, could this be back-ported to 10.5? See http://www.nabble.com/Re%3A-Derby-replication-seems-to-be-mostly-useless-p24371227.html
        Hide
        Dag H. Wanvik added a comment -

        Thanks for reminding me; I agree, I'll do it.

        Show
        Dag H. Wanvik added a comment - Thanks for reminding me; I agree, I'll do it.
        Hide
        Dag H. Wanvik added a comment -

        Backported to 10.5 branch as svn 791826, replication suite ran ok with sane jars.
        Narayanan, feel free to close this now.

        Show
        Dag H. Wanvik added a comment - Backported to 10.5 branch as svn 791826, replication suite ran ok with sane jars. Narayanan, feel free to close this now.
        Hide
        V.Narayanan added a comment -

        Thank you for the backport Dag! Closing Issue!

        Show
        V.Narayanan added a comment - Thank you for the backport Dag! Closing Issue!

          People

          • Assignee:
            Dag H. Wanvik
            Reporter:
            V.Narayanan
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development