Derby
  1. Derby
  2. DERBY-4910

Four replication tests fail on z/OS with junit.framework.AssertionFailedError: startMaster did not succeed.

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 10.7.1.1
    • Fix Version/s: 10.7.1.4, 10.8.1.2, 10.9.1.0
    • Component/s: Replication
    • Environment:
    • Urgency:
      Normal
    • Bug behavior facts:
      Regression, Regression Test Failure

      Description

      Four replication tests failed with junit.framework.AssertionFailedError: startMaster did not succeed. running on the 64bit JVM on z/OS:
      ReplicationRun_Local_3_p2 ReplicationRun_Local_StateTest_part1
      ReplicationRun_Local_3_p3 ReplicationRun_Local_StateTest_part1_2

      Below are the stack traces. I will attach the fail directory.

      There were 4 failures:
      1) testReplication_Local_StateTest_part1(org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun_Local_StateTest_part1)junit.framework.AssertionFailedError: startMaster did not succeed.
      at org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun.startMaster_direct(ReplicationRun.java:1072)
      at org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun.startMaster(ReplicationRun.java:906)
      at org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun_Local_StateTest_part1._testPreStartedMaster(ReplicationRun_Local_StateTest_part1.java:375)
      at org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun_Local_StateTest_part1.testReplication_Local_StateTest_part1(ReplicationRun_Local_StateTest_part1.java:136)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:48)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at org.apache.derbyTesting.junit.BaseTestCase.runBare(BaseTestCase.java:109)
      at org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun.runBare(ReplicationRun.java:207)
      at junit.extensions.TestDecorator.basicRun(TestDecorator.java:22)
      at junit.extensions.TestSetup$1.protect(TestSetup.java:19)
      at junit.extensions.TestSetup.run(TestSetup.java:16)
      2) testReplication_Local_StateTest_part1_2(org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun_Local_StateTest_part1_2)junit.framework.AssertionFailedError: startMaster did not succeed.
      at org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun.startMaster_direct(ReplicationRun.java:1072)
      at org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun.startMaster(ReplicationRun.java:906)
      at org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun_Local_StateTest_part1_2.testReplication_Local_StateTest_part1_2(ReplicationRun_Local_StateTest_part1_2.java:128)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:48)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at org.apache.derbyTesting.junit.BaseTestCase.runBare(BaseTestCase.java:109)
      at org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun.runBare(ReplicationRun.java:207)
      at junit.extensions.TestDecorator.basicRun(TestDecorator.java:22)
      at junit.extensions.TestSetup$1.protect(TestSetup.java:19)
      at junit.extensions.TestSetup.run(TestSetup.java:16)
      3) testReplication_Local_3_p2_StateTests_bigInsert_sleepBeforeStopMaster(org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun_Local_3_p2)junit.framework.AssertionFailedError: startMaster did not succeed.
      at org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun.startMaster_direct(ReplicationRun.java:1072)
      at org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun.startMaster(ReplicationRun.java:906)
      at org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun_Local_3_p2.replication_Local_3_p2_StateTests(ReplicationRun_Local_3_p2.java:128)
      at org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun_Local_3_p2.testReplication_Local_3_p2_StateTests_bigInsert_sleepBeforeStopMaster(ReplicationRun_Local_3_p2.java:248)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:48)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at org.apache.derbyTesting.junit.BaseTestCase.runBare(BaseTestCase.java:109)
      at org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun.runBare(ReplicationRun.java:207)
      at junit.extensions.TestDecorator.basicRun(TestDecorator.java:22)
      at junit.extensions.TestSetup$1.protect(TestSetup.java:19)
      at junit.extensions.TestSetup.run(TestSetup.java:16)
      4) testReplication_Local_3_p3_StateNegativeTests(org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun_Local_3_p3)junit.framework.AssertionFailedError: startMaster did not succeed.
      at org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun.startMaster_direct(ReplicationRun.java:1072)
      at org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun.startMaster(ReplicationRun.java:906)
      at org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun_Local_3_p3.testReplication_Local_3_p3_StateNegativeTests(ReplicationRun_Local_3_p3.java:128)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:48)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at org.apache.derbyTesting.junit.BaseTestCase.runBare(BaseTestCase.java:109)
      at org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun.runBare(ReplicationRun.java:207)
      at junit.extensions.TestDecorator.basicRun(TestDecorator.java:22)
      at junit.extensions.TestSetup$1.protect(TestSetup.java:19)
      at junit.extensions.TestSetup.run(TestSetup.java:16)

      FAILURES!!!
      Tests run: 13032, Failures: 4, Errors: 0

      1. derby-4910-1a.stat
        0.3 kB
        Knut Anders Hatlen
      2. derby-4910-1a.diff
        5 kB
        Knut Anders Hatlen
      3. catch.diff
        1 kB
        Knut Anders Hatlen
      4. traces.diff
        6 kB
        Knut Anders Hatlen
      5. fail.zip
        3.40 MB
        Kathey Marsden

        Issue Links

          Activity

          Kathey Marsden created issue -
          Kathey Marsden made changes -
          Field Original Value New Value
          Attachment fail.zip [ 12460213 ]
          Rick Hillegas made changes -
          Affects Version/s 10.7.1.1 [ 12315564 ]
          Affects Version/s 10.7.1.0 [ 12314971 ]
          Hide
          Knut Anders Hatlen added a comment -

          I saw one of these failures again with the 10.7.1.1 release candidate on Solaris Express 11, JDK 7 (1.7.0-ea-b119):

          testReplication_Local_3_p2_StateTests_bigInsert_immediateStopMaster(org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun_Local_3_p2) FAILURE:
          junit.framework.AssertionFailedError: startMaster did not succeed.
          at junit.framework.Assert.fail(Assert.java:47)
          at junit.framework.Assert.assertTrue(Assert.java:20)
          at org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun.startMaster_direct(ReplicationRun.java:1072)
          at org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun.startMaster(ReplicationRun.java:906)
          at org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun_Local_3_p2.replication_Local_3_p2_StateTests(ReplicationRun_Local_3_p2.java:128)
          at org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun_Local_3_p2.testReplication_Local_3_p2_StateTests_bigInsert_immediateStopMaster(ReplicationRun_Local_3_p2.java:233)
          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          at java.lang.reflect.Method.invoke(Method.java:613)
          at junit.framework.TestCase.runTest(TestCase.java:164)
          at junit.framework.TestCase.runBare(TestCase.java:130)
          at org.apache.derbyTesting.junit.BaseTestCase.runBare(BaseTestCase.java:109)
          at org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun.runBare(ReplicationRun.java:207)

          Show
          Knut Anders Hatlen added a comment - I saw one of these failures again with the 10.7.1.1 release candidate on Solaris Express 11, JDK 7 (1.7.0-ea-b119): testReplication_Local_3_p2_StateTests_bigInsert_immediateStopMaster(org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun_Local_3_p2) FAILURE: junit.framework.AssertionFailedError: startMaster did not succeed. at junit.framework.Assert.fail(Assert.java:47) at junit.framework.Assert.assertTrue(Assert.java:20) at org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun.startMaster_direct(ReplicationRun.java:1072) at org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun.startMaster(ReplicationRun.java:906) at org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun_Local_3_p2.replication_Local_3_p2_StateTests(ReplicationRun_Local_3_p2.java:128) at org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun_Local_3_p2.testReplication_Local_3_p2_StateTests_bigInsert_immediateStopMaster(ReplicationRun_Local_3_p2.java:233) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:613) at junit.framework.TestCase.runTest(TestCase.java:164) at junit.framework.TestCase.runBare(TestCase.java:130) at org.apache.derbyTesting.junit.BaseTestCase.runBare(BaseTestCase.java:109) at org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun.runBare(ReplicationRun.java:207)
          Hide
          Knut Anders Hatlen added a comment -

          The attached patch makes the replication tests preserve stack traces of the underlying errors, so that we can hopefully see more of what's going on the next time they fail.

          Committed revision 1043748.

          Show
          Knut Anders Hatlen added a comment - The attached patch makes the replication tests preserve stack traces of the underlying errors, so that we can hopefully see more of what's going on the next time they fail. Committed revision 1043748.
          Knut Anders Hatlen made changes -
          Attachment traces.diff [ 12465851 ]
          Lily Wei made changes -
          Labels derby_triage10_8
          Urgency Normal
          Hide
          Knut Anders Hatlen added a comment -

          Saw this again with Derby 10.8.1.0 on FreeBSD 8.2, OpenJDK 6 (b20). After the patch that preserved the underlying exceptions, the stack trace now looks like this:

          junit.framework.AssertionFailedError: startMaster did not succeed
          at org.apache.derbyTesting.junit.BaseTestCase.fail(BaseTestCase.java:771)
          at org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun.startMaster_direct(ReplicationRun.java:1062)
          at org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun.startMaster(ReplicationRun.java:911)
          at org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun_Local_StateTest_part1_2.testReplication_Local_StateTest_part1_2(ReplicationRun_Local_StateTest_part1_2.java:128)
          at org.apache.derbyTesting.junit.BaseTestCase.runBare(BaseTestCase.java:112)
          at org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun.runBare(ReplicationRun.java:207)
          at junit.extensions.TestDecorator.basicRun(TestDecorator.java:24)
          at junit.extensions.TestSetup$1.protect(TestSetup.java:21)
          at junit.extensions.TestSetup.run(TestSetup.java:25)
          Caused by: java.sql.SQLException: DERBY SQL error: SQLCODE: -1, SQLSTATE: XRE04, SQLERRMC: Could not establish a connection to the peer of the replicated database '/var/hudson/workspace/Derby-10.8.1.0-FreeBSD-openjdk6-junit/tst/db_master/wombat' on address 'localhost:1533'.
          at org.apache.derby.client.am.SQLExceptionFactory40.getSQLException(Unknown Source)
          at org.apache.derby.client.am.SqlException.getSQLException(Unknown Source)
          at org.apache.derby.jdbc.ClientDataSource.getConnection(Unknown Source)
          at org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun.startMaster_direct(ReplicationRun.java:1042)
          Caused by: org.apache.derby.client.am.SqlException: DERBY SQL error: SQLCODE: -1, SQLSTATE: XRE04, SQLERRMC: Could not establish a connection to the peer of the replicated database '/var/hudson/workspace/Derby-10.8.1.0-FreeBSD-openjdk6-junit/tst/db_master/wombat' on address 'localhost:1533'.
          at org.apache.derby.client.am.Connection.completeSqlca(Unknown Source)
          at org.apache.derby.client.net.NetConnectionReply.parseRdbAccessFailed(Unknown Source)
          at org.apache.derby.client.net.NetConnectionReply.parseAccessRdbError(Unknown Source)
          at org.apache.derby.client.net.NetConnectionReply.parseACCRDBreply(Unknown Source)
          at org.apache.derby.client.net.NetConnectionReply.readAccessDatabase(Unknown Source)
          at org.apache.derby.client.net.NetConnection.readSecurityCheckAndAccessRdb(Unknown Source)
          at org.apache.derby.client.net.NetConnection.flowSecurityCheckAndAccessRdb(Unknown Source)
          at org.apache.derby.client.net.NetConnection.flowUSRIDONLconnect(Unknown Source)
          at org.apache.derby.client.net.NetConnection.flowConnect(Unknown Source)
          at org.apache.derby.client.net.NetConnection.initialize(Unknown Source)
          at org.apache.derby.client.net.NetConnection.<init>(Unknown Source)
          at org.apache.derby.client.net.NetConnection40.<init>(Unknown Source)
          at org.apache.derby.client.net.ClientJDBCObjectFactoryImpl40.newNetConnection(Unknown Source)
          at org.apache.derby.jdbc.ClientDataSource.getConnectionX(Unknown Source)

          Show
          Knut Anders Hatlen added a comment - Saw this again with Derby 10.8.1.0 on FreeBSD 8.2, OpenJDK 6 (b20). After the patch that preserved the underlying exceptions, the stack trace now looks like this: junit.framework.AssertionFailedError: startMaster did not succeed at org.apache.derbyTesting.junit.BaseTestCase.fail(BaseTestCase.java:771) at org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun.startMaster_direct(ReplicationRun.java:1062) at org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun.startMaster(ReplicationRun.java:911) at org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun_Local_StateTest_part1_2.testReplication_Local_StateTest_part1_2(ReplicationRun_Local_StateTest_part1_2.java:128) at org.apache.derbyTesting.junit.BaseTestCase.runBare(BaseTestCase.java:112) at org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun.runBare(ReplicationRun.java:207) at junit.extensions.TestDecorator.basicRun(TestDecorator.java:24) at junit.extensions.TestSetup$1.protect(TestSetup.java:21) at junit.extensions.TestSetup.run(TestSetup.java:25) Caused by: java.sql.SQLException: DERBY SQL error: SQLCODE: -1, SQLSTATE: XRE04, SQLERRMC: Could not establish a connection to the peer of the replicated database '/var/hudson/workspace/Derby-10.8.1.0-FreeBSD-openjdk6-junit/tst/db_master/wombat' on address 'localhost:1533'. at org.apache.derby.client.am.SQLExceptionFactory40.getSQLException(Unknown Source) at org.apache.derby.client.am.SqlException.getSQLException(Unknown Source) at org.apache.derby.jdbc.ClientDataSource.getConnection(Unknown Source) at org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun.startMaster_direct(ReplicationRun.java:1042) Caused by: org.apache.derby.client.am.SqlException: DERBY SQL error: SQLCODE: -1, SQLSTATE: XRE04, SQLERRMC: Could not establish a connection to the peer of the replicated database '/var/hudson/workspace/Derby-10.8.1.0-FreeBSD-openjdk6-junit/tst/db_master/wombat' on address 'localhost:1533'. at org.apache.derby.client.am.Connection.completeSqlca(Unknown Source) at org.apache.derby.client.net.NetConnectionReply.parseRdbAccessFailed(Unknown Source) at org.apache.derby.client.net.NetConnectionReply.parseAccessRdbError(Unknown Source) at org.apache.derby.client.net.NetConnectionReply.parseACCRDBreply(Unknown Source) at org.apache.derby.client.net.NetConnectionReply.readAccessDatabase(Unknown Source) at org.apache.derby.client.net.NetConnection.readSecurityCheckAndAccessRdb(Unknown Source) at org.apache.derby.client.net.NetConnection.flowSecurityCheckAndAccessRdb(Unknown Source) at org.apache.derby.client.net.NetConnection.flowUSRIDONLconnect(Unknown Source) at org.apache.derby.client.net.NetConnection.flowConnect(Unknown Source) at org.apache.derby.client.net.NetConnection.initialize(Unknown Source) at org.apache.derby.client.net.NetConnection.<init>(Unknown Source) at org.apache.derby.client.net.NetConnection40.<init>(Unknown Source) at org.apache.derby.client.net.ClientJDBCObjectFactoryImpl40.newNetConnection(Unknown Source) at org.apache.derby.jdbc.ClientDataSource.getConnectionX(Unknown Source)
          Hide
          Knut Anders Hatlen added a comment -

          I think I see what's going on. derby.log for the slave showed the following exception had been raised:

          java.net.SocketTimeoutException: Accept timed out
          at java.net.PlainSocketImpl.socketAccept(Native Method)
          at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:358)
          at java.net.ServerSocket.implAccept(ServerSocket.java:470)
          at java.net.ServerSocket.accept(ServerSocket.java:438)
          at org.apache.derby.impl.store.replication.net.ReplicationMessageReceive$1.run(Unknown Source)
          at java.security.AccessController.doPrivileged(Native Method)
          at org.apache.derby.impl.store.replication.net.ReplicationMessageReceive.initConnection(Unknown Source)
          at org.apache.derby.impl.store.replication.slave.SlaveController.setupConnection(Unknown Source)
          at org.apache.derby.impl.store.replication.slave.SlaveController.startSlave(Unknown Source)
          at org.apache.derby.impl.store.raw.RawStore.boot(Unknown Source)
          at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source)
          at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source)
          at org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source)
          at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown Source)
          at org.apache.derby.impl.store.access.RAMAccessManager.boot(Unknown Source)
          at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source)
          at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source)
          at org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source)
          at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown Source)
          at org.apache.derby.impl.db.BasicDatabase.bootStore(Unknown Source)
          at org.apache.derby.impl.db.BasicDatabase.boot(Unknown Source)
          at org.apache.derby.impl.db.SlaveDatabase.bootBasicDatabase(Unknown Source)
          at org.apache.derby.impl.db.SlaveDatabase.access$000(Unknown Source)
          at org.apache.derby.impl.db.SlaveDatabase$SlaveDatabaseBootThread.run(Unknown Source)
          at java.lang.Thread.run(Thread.java:636)

          The timeout for the accept call is 1 second, but the code is prepared to handle timeouts in SlaveController.setupConnection():

          } catch (Exception e) {
          // SocketTimeoutException is wrapped in
          // PrivilegedActionException.
          Throwable cause = e.getCause();
          if (cause instanceof SocketTimeoutException)

          { // Timeout! return false; }

          else

          { throw StandardException.newException (SQLState.REPLICATION_CONNECTION_EXCEPTION, e, dbname, getHostName(), String.valueOf(getPortNumber())); }

          }

          The timeout-handling code assumes that the timeout exception is wrapped in a PrivilegedActionException. However, DERBY-4812 changed this by unwrapping the PrivilegedActionException in ReplicationMessageReceive.initConnection().

          Show
          Knut Anders Hatlen added a comment - I think I see what's going on. derby.log for the slave showed the following exception had been raised: java.net.SocketTimeoutException: Accept timed out at java.net.PlainSocketImpl.socketAccept(Native Method) at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:358) at java.net.ServerSocket.implAccept(ServerSocket.java:470) at java.net.ServerSocket.accept(ServerSocket.java:438) at org.apache.derby.impl.store.replication.net.ReplicationMessageReceive$1.run(Unknown Source) at java.security.AccessController.doPrivileged(Native Method) at org.apache.derby.impl.store.replication.net.ReplicationMessageReceive.initConnection(Unknown Source) at org.apache.derby.impl.store.replication.slave.SlaveController.setupConnection(Unknown Source) at org.apache.derby.impl.store.replication.slave.SlaveController.startSlave(Unknown Source) at org.apache.derby.impl.store.raw.RawStore.boot(Unknown Source) at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source) at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source) at org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source) at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown Source) at org.apache.derby.impl.store.access.RAMAccessManager.boot(Unknown Source) at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source) at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source) at org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source) at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown Source) at org.apache.derby.impl.db.BasicDatabase.bootStore(Unknown Source) at org.apache.derby.impl.db.BasicDatabase.boot(Unknown Source) at org.apache.derby.impl.db.SlaveDatabase.bootBasicDatabase(Unknown Source) at org.apache.derby.impl.db.SlaveDatabase.access$000(Unknown Source) at org.apache.derby.impl.db.SlaveDatabase$SlaveDatabaseBootThread.run(Unknown Source) at java.lang.Thread.run(Thread.java:636) The timeout for the accept call is 1 second, but the code is prepared to handle timeouts in SlaveController.setupConnection(): } catch (Exception e) { // SocketTimeoutException is wrapped in // PrivilegedActionException. Throwable cause = e.getCause(); if (cause instanceof SocketTimeoutException) { // Timeout! return false; } else { throw StandardException.newException (SQLState.REPLICATION_CONNECTION_EXCEPTION, e, dbname, getHostName(), String.valueOf(getPortNumber())); } } The timeout-handling code assumes that the timeout exception is wrapped in a PrivilegedActionException. However, DERBY-4812 changed this by unwrapping the PrivilegedActionException in ReplicationMessageReceive.initConnection().
          Hide
          Knut Anders Hatlen added a comment -

          Marking this as a regression. To reproduce the bug, perform the following steps in ij with Derby 10.7.1.1 or later:

          ij version 10.7
          ij> connect 'jdbc:derby:db;create=true';
          ij> connect 'jdbc:derby:db;shutdown=true';
          ERROR 08006: Database 'db' shutdown.
          ij> connect 'jdbc:derby:db;startSlave=true';
          ERROR XJ040: Failed to start database 'db' with class loader sun.misc.Launcher$AppClassLoader@360be0, see the next exception for details.
          ERROR XRE04: Could not establish a connection to the peer of the replicated database 'db' on address 'localhost:4851'.
          ij>

          The startSlave command fails after one second. When doing the same in Derby 10.6.2.1, startSlave will hang until the master is started.

          This sounds like a serious problem. It essentially gives startMaster a one second window to start after the slave has been started, which may be very hard to code reliably, especially when the master and the slave run on different hosts.

          Show
          Knut Anders Hatlen added a comment - Marking this as a regression. To reproduce the bug, perform the following steps in ij with Derby 10.7.1.1 or later: ij version 10.7 ij> connect 'jdbc:derby:db;create=true'; ij> connect 'jdbc:derby:db;shutdown=true'; ERROR 08006: Database 'db' shutdown. ij> connect 'jdbc:derby:db;startSlave=true'; ERROR XJ040: Failed to start database 'db' with class loader sun.misc.Launcher$AppClassLoader@360be0, see the next exception for details. ERROR XRE04: Could not establish a connection to the peer of the replicated database 'db' on address 'localhost:4851'. ij> The startSlave command fails after one second. When doing the same in Derby 10.6.2.1, startSlave will hang until the master is started. This sounds like a serious problem. It essentially gives startMaster a one second window to start after the slave has been started, which may be very hard to code reliably, especially when the master and the slave run on different hosts.
          Knut Anders Hatlen made changes -
          Bug behavior facts [Regression Test Failure] [Regression, Regression Test Failure]
          Assignee Knut Anders Hatlen [ knutanders ]
          Knut Anders Hatlen made changes -
          Status Open [ 1 ] In Progress [ 3 ]
          Hide
          Knut Anders Hatlen added a comment -

          The attached patch changes the logic in the timeout handling to account for the changes in DERBY-4812. This makes startSlave behave as it did in 10.6.2.1 and earlier again (it waits until the master is started, even if it takes more than one second.)

          I'll see if I can write a test that exposes the problem.

          Show
          Knut Anders Hatlen added a comment - The attached patch changes the logic in the timeout handling to account for the changes in DERBY-4812 . This makes startSlave behave as it did in 10.6.2.1 and earlier again (it waits until the master is started, even if it takes more than one second.) I'll see if I can write a test that exposes the problem.
          Knut Anders Hatlen made changes -
          Attachment catch.diff [ 12475799 ]
          Knut Anders Hatlen made changes -
          Link This issue is broken by DERBY-4812 [ DERBY-4812 ]
          Hide
          Knut Anders Hatlen added a comment -

          I ran the regression tests on the catch.diff patch and saw one failure in suites.All. The failure was in one of the replication tests, which is a bit suspicious, but its stack trace looks like the one in DERBY-4269, so I believe it's unrelated to the patch. I've also rerun suites.All twice without seeing any failures.

          Attaching a new patch (derby-4910-1a.diff) which adds a test case. The test case was created by copying the existing test case ReplicationRun_Local_1 and adding a sleep() call to it. Before the patch can be applied, the following svn command must be run:

          svn copy java/testing/org/apache/derbyTesting/functionTests/tests/replicationTests/ReplicationRun_Local_1.java java/testing/org/apache/derbyTesting/functionTests/tests/replicationTests/ReplicationRun_Local_Derby4910.java

          The new test case fails without the fix in SlaveController, and passes with the fix.

          Show
          Knut Anders Hatlen added a comment - I ran the regression tests on the catch.diff patch and saw one failure in suites.All. The failure was in one of the replication tests, which is a bit suspicious, but its stack trace looks like the one in DERBY-4269 , so I believe it's unrelated to the patch. I've also rerun suites.All twice without seeing any failures. Attaching a new patch (derby-4910-1a.diff) which adds a test case. The test case was created by copying the existing test case ReplicationRun_Local_1 and adding a sleep() call to it. Before the patch can be applied, the following svn command must be run: svn copy java/testing/org/apache/derbyTesting/functionTests/tests/replicationTests/ReplicationRun_Local_1.java java/testing/org/apache/derbyTesting/functionTests/tests/replicationTests/ReplicationRun_Local_Derby4910.java The new test case fails without the fix in SlaveController, and passes with the fix.
          Knut Anders Hatlen made changes -
          Attachment derby-4910-1a.diff [ 12475931 ]
          Attachment derby-4910-1a.stat [ 12475932 ]
          Knut Anders Hatlen made changes -
          Issue & fix info [Patch Available]
          Hide
          Knut Anders Hatlen added a comment -

          Committed revision 1091000.

          Show
          Knut Anders Hatlen added a comment - Committed revision 1091000.
          Knut Anders Hatlen made changes -
          Status In Progress [ 3 ] Resolved [ 5 ]
          Issue & fix info [Patch Available]
          Fix Version/s 10.9.0.0 [ 12316344 ]
          Resolution Fixed [ 1 ]
          Hide
          Knut Anders Hatlen added a comment -

          Merged to 10.8 with revision 1091004.

          Show
          Knut Anders Hatlen added a comment - Merged to 10.8 with revision 1091004.
          Knut Anders Hatlen made changes -
          Fix Version/s 10.8.1.1 [ 12316356 ]
          Hide
          Knut Anders Hatlen added a comment -

          Merged to 10.7 with revision 1091081.

          Show
          Knut Anders Hatlen added a comment - Merged to 10.7 with revision 1091081.
          Knut Anders Hatlen made changes -
          Fix Version/s 10.7.1.4 [ 12315902 ]
          Hide
          Kathey Marsden added a comment -

          Thank you Knut for tracking this issue down and fixing it and for the test! I think I have become too accustomed to ignoring the noise in replication tests and didn't make the connection with DERBY-4812 when this showed up a couple months later on z/OS. It is good now to have a solid deterministic test for the expected behavior.

          Show
          Kathey Marsden added a comment - Thank you Knut for tracking this issue down and fixing it and for the test! I think I have become too accustomed to ignoring the noise in replication tests and didn't make the connection with DERBY-4812 when this showed up a couple months later on z/OS. It is good now to have a solid deterministic test for the expected behavior.
          Rick Hillegas made changes -
          Fix Version/s 10.8.1.2 [ 12316362 ]
          Fix Version/s 10.8.1.1 [ 12316356 ]
          Kathey Marsden made changes -
          Labels derby_triage10_8 derby_backport_reject_10_5
          Gavin made changes -
          Workflow jira [ 12527508 ] Default workflow, editable Closed status [ 12801912 ]
          Hide
          Knut Anders Hatlen added a comment -

          [bulk update: close all resolved issues that haven't had any activity the last year]

          Show
          Knut Anders Hatlen added a comment - [bulk update: close all resolved issues that haven't had any activity the last year]
          Knut Anders Hatlen made changes -
          Status Resolved [ 5 ] Closed [ 6 ]

            People

            • Assignee:
              Knut Anders Hatlen
              Reporter:
              Kathey Marsden
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development