Derby
  1. Derby
  2. DERBY-4667

BootLockTest.testBootLock() sometimes fails with connection refused.

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 10.7.1.1
    • Fix Version/s: 10.7.1.1
    • Component/s: Test
    • Labels:
      None
    • Bug behavior facts:
      Regression Test Failure

      Description

      It may just be related to firewall interference, but sometimes testBootLock fails with:

      1) testBootLock(org.apache.derbyTesting.functionTests.tests.store.BootLockTest)j
      unit.framework.AssertionFailedError: Minion did not start or boot db in 60 secon
      ds.
      ----Minion's stderr:
      java.net.ConnectException: Connection refused: connect at java.net.PlainSocketI
      mpl.socketConnect(Native Method) at java.net.PlainSocketImpl.doConnect(Pl
      ainSocketImpl.java:352) at java.net.PlainSocketImpl.connectToAddress(PlainSocket
      Impl.java:214) at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:201)at
      java.net.SocksSocketImpl.connect(SocksSocketImpl.java:378) at java.net.Sock
      et.connect(Socket.java:537) at java.net.Socket.connect(Socket.java:487)
      ----Minion's stderr ended
      at org.apache.derbyTesting.functionTests.tests.store.BootLockTest.testBo
      otLock(BootLockTest.java:173)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.
      java:48)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces
      sorImpl.java:37)
      at org.apache.derbyTesting.junit.BaseTestCase.runBare(BaseTestCase.java:
      109)
      at junit.extensions.TestDecorator.basicRun(TestDecorator.java:22)
      at junit.extensions.TestSetup$1.protect(TestSetup.java:19)
      at junit.extensions.TestSetup.run(TestSetup.java:16)
      at org.apache.derbyTesting.junit.BaseTestSetup.run(BaseTestSetup.java:57
      )
      at junit.extensions.TestDecorator.basicRun(TestDecorator.java:22)
      at junit.extensions.TestSetup$1.protect(TestSetup.java:19)
      at junit.extensions.TestSetup.run(TestSetup.java:16)
      at junit.extensions.TestDecorator.basicRun(TestDecorator.java:22)
      at junit.extensions.TestSetup$1.protect(TestSetup.java:19)
      at junit.extensions.TestSetup.run(TestSetup.java:16)

      It used to pass on my machine, but I got this today and I think the IBM nightly runs have been hitting the problem.

      1. BootLockTestFailure.zip
        68 kB
        Kathey Marsden
      2. derby-4664_diff.txt
        1 kB
        Kathey Marsden
      3. reorgDerby151TestForDerby4667_diff.txt
        2 kB
        Kathey Marsden

        Activity

        Hide
        Mamta A. Satoor added a comment -

        I saw it to today on my windows xp with ibm16

        1) testBootLock(org.apache.derbyTesting.functionTests.tests.store.BootLockTest)junit.framework.AssertionFailedError: Minion did not start or boot db in 60 seconds.
        ----Minion's stderr:
        java.net.ConnectException: Connection refused: connect at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:352) at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:214) at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:201) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:368) at java.net.Socket.connect(Socket.java:526) at java.net.Socket.connect(Socket.java:476)
        ----Minion's stderr ended
        at org.apache.derbyTesting.functionTests.tests.store.BootLockTest.testBootLock(BootLockTest.java:173)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:48)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
        at org.apache.derbyTesting.junit.BaseTestCase.runBare(BaseTestCase.java:109)
        at junit.extensions.TestDecorator.basicRun(TestDecorator.java:22)
        at junit.extensions.TestSetup$1.protect(TestSetup.java:19)
        at junit.extensions.TestSetup.run(TestSetup.java:16)
        at org.apache.derbyTesting.junit.BaseTestSetup.run(BaseTestSetup.java:57)
        at junit.extensions.TestDecorator.basicRun(TestDecorator.java:22)
        at junit.extensions.TestSetup$1.protect(TestSetup.java:19)
        at junit.extensions.TestSetup.run(TestSetup.java:16)
        at junit.extensions.TestDecorator.basicRun(TestDecorator.java:22)
        at junit.extensions.TestSetup$1.protect(TestSetup.java:19)
        at junit.extensions.TestSetup.run(TestSetup.java:16)

        FAILURES!!!
        Tests run: 12500, Failures: 1, Errors: 0

        Show
        Mamta A. Satoor added a comment - I saw it to today on my windows xp with ibm16 1) testBootLock(org.apache.derbyTesting.functionTests.tests.store.BootLockTest)junit.framework.AssertionFailedError: Minion did not start or boot db in 60 seconds. ----Minion's stderr: java.net.ConnectException: Connection refused: connect at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:352) at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:214) at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:201) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:368) at java.net.Socket.connect(Socket.java:526) at java.net.Socket.connect(Socket.java:476) ----Minion's stderr ended at org.apache.derbyTesting.functionTests.tests.store.BootLockTest.testBootLock(BootLockTest.java:173) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:48) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37) at org.apache.derbyTesting.junit.BaseTestCase.runBare(BaseTestCase.java:109) at junit.extensions.TestDecorator.basicRun(TestDecorator.java:22) at junit.extensions.TestSetup$1.protect(TestSetup.java:19) at junit.extensions.TestSetup.run(TestSetup.java:16) at org.apache.derbyTesting.junit.BaseTestSetup.run(BaseTestSetup.java:57) at junit.extensions.TestDecorator.basicRun(TestDecorator.java:22) at junit.extensions.TestSetup$1.protect(TestSetup.java:19) at junit.extensions.TestSetup.run(TestSetup.java:16) at junit.extensions.TestDecorator.basicRun(TestDecorator.java:22) at junit.extensions.TestSetup$1.protect(TestSetup.java:19) at junit.extensions.TestSetup.run(TestSetup.java:16) FAILURES!!! Tests run: 12500, Failures: 1, Errors: 0
        Hide
        Kathey Marsden added a comment -

        Below are my brief and somewhat random thoughts on this issue.

        • My gut feel is that this is some sort of firewall interference. If someone can reproduce, you might want to turn your firewall off to verify.
        • The test coordinates launch of a spawned process with a socket, but it actually uses the same port as network server and the java in $JAVA_HOME/bin so while it feels like a firewall issue, I am don't really understand why the firewall would complain.
        • Mike suggested a previous tests network server process might be interfering and the port reused too quickly but I think if the previous tests network server did a clean shutdown that should not be the problem.
        • I thought maybe the test could instead of opening a socket, verify that the spawned test had booted derby by just checking for the lock file, but if that is the solution, it would be good to understand first why the socket solution sometimes fails.
        Show
        Kathey Marsden added a comment - Below are my brief and somewhat random thoughts on this issue. My gut feel is that this is some sort of firewall interference. If someone can reproduce, you might want to turn your firewall off to verify. The test coordinates launch of a spawned process with a socket, but it actually uses the same port as network server and the java in $JAVA_HOME/bin so while it feels like a firewall issue, I am don't really understand why the firewall would complain. Mike suggested a previous tests network server process might be interfering and the port reused too quickly but I think if the previous tests network server did a clean shutdown that should not be the problem. I thought maybe the test could instead of opening a socket, verify that the spawned test had booted derby by just checking for the lock file, but if that is the solution, it would be good to understand first why the socket solution sometimes fails.
        Hide
        Mamta A. Satoor added a comment -

        I ran the test stand alone handful of times and can't reproduce the problem. Following is the screen output of one of the test runs.

        $ java -Dderby.tests.trace=true -Xmx256M -XX:MaxPermSize=128M junit.textui.TestRunner org.apache.derbyTesting.functionTests.tests.store.BootLockTest
        .
        testBootLock used 14375 ms
        Time: 16.5

        OK (1 test)

        Show
        Mamta A. Satoor added a comment - I ran the test stand alone handful of times and can't reproduce the problem. Following is the screen output of one of the test runs. $ java -Dderby.tests.trace=true -Xmx256M -XX:MaxPermSize=128M junit.textui.TestRunner org.apache.derbyTesting.functionTests.tests.store.BootLockTest . testBootLock used 14375 ms Time: 16.5 OK (1 test)
        Hide
        Mamta A. Satoor added a comment -

        I will run the store suite org.apache.derbyTesting.functionTests.tests.store._Suite (this suite includes BootLockTest) to see if the previous test(s) might be interfering with BootLockTest.

        Show
        Mamta A. Satoor added a comment - I will run the store suite org.apache.derbyTesting.functionTests.tests.store._Suite (this suite includes BootLockTest) to see if the previous test(s) might be interfering with BootLockTest.
        Hide
        Dag H. Wanvik added a comment -

        Would changing the port to the one used by the replication tests for example, make the problem go away? Or could it be that the minion somehow races past the parent (including creating a Derby database..??!) before the parent has set up the server port?
        In the latter case, a solution could be to set up the port before forking. Weird..

        Show
        Dag H. Wanvik added a comment - Would changing the port to the one used by the replication tests for example, make the problem go away? Or could it be that the minion somehow races past the parent (including creating a Derby database..??!) before the parent has set up the server port? In the latter case, a solution could be to set up the port before forking. Weird..
        Hide
        Mamta A. Satoor added a comment -

        BTW, I ran the store suite multiple times on my laptop and didn't see the failure.

        Show
        Mamta A. Satoor added a comment - BTW, I ran the store suite multiple times on my laptop and didn't see the failure.
        Hide
        Mamta A. Satoor added a comment -

        Thought I would share that the test problem didn't repro on IBM's nightly machine either. I tried it both standalone and as part of store suite.

        Show
        Mamta A. Satoor added a comment - Thought I would share that the test problem didn't repro on IBM's nightly machine either. I tried it both standalone and as part of store suite.
        Hide
        Kathey Marsden added a comment -

        Mamta, do you have the test directory of a failing run? If so, might there be some clues in there as to the reason for the failure, like something in the derby.log? Otherwise could we add more debug information in the event of a failure?

        Show
        Kathey Marsden added a comment - Mamta, do you have the test directory of a failing run? If so, might there be some clues in there as to the reason for the failure, like something in the derby.log? Otherwise could we add more debug information in the event of a failure?
        Hide
        Mamta A. Satoor added a comment -

        I do not have the test directory from my laptop. If I see the failure again, I will surely attach the derby.log and any other information that might look relevant.

        In the mean time, I will look into adding some debug info in the test. Does anyone have any ideas on what might be useful to dump in case of failures? I have not looked at the test myself yet.

        Show
        Mamta A. Satoor added a comment - I do not have the test directory from my laptop. If I see the failure again, I will surely attach the derby.log and any other information that might look relevant. In the mean time, I will look into adding some debug info in the test. Does anyone have any ideas on what might be useful to dump in case of failures? I have not looked at the test myself yet.
        Hide
        Kathey Marsden added a comment -

        I am attaching the output from this failure when it once happened.

        The interesting things I noticed are:
        1) The BootLockTestDB database got created just fine by the forked process at 2010-06-02 09:25:34.284 GMT
        2) The last thing in the parent process (derby.log) were a start and stop of network server at:
        2010-06-02 09:25:32.862 GMT : Apache Derby Network Server - 10.7.0.0 alpha - (949333) started and ready to accept connections on port 1527

        2010-06-02 09:25:33.268 GMT : Apache Derby Network Server - 10.7.0.0 alpha - (949333) shutdown

        My impression is that even though the test is not running any fixtures in network sever mode it is starting and stopping network server for some reason and doing so, so close to the time that the port is being used for the synchronization that it interferes. I'll take a look at the test.

        Show
        Kathey Marsden added a comment - I am attaching the output from this failure when it once happened. The interesting things I noticed are: 1) The BootLockTestDB database got created just fine by the forked process at 2010-06-02 09:25:34.284 GMT 2) The last thing in the parent process (derby.log) were a start and stop of network server at: 2010-06-02 09:25:32.862 GMT : Apache Derby Network Server - 10.7.0.0 alpha - (949333) started and ready to accept connections on port 1527 2010-06-02 09:25:33.268 GMT : Apache Derby Network Server - 10.7.0.0 alpha - (949333) shutdown My impression is that even though the test is not running any fixtures in network sever mode it is starting and stopping network server for some reason and doing so, so close to the time that the port is being used for the synchronization that it interferes. I'll take a look at the test.
        Hide
        Kathey Marsden added a comment -

        I think the quick start and stop of network server is from the previous test Derby151Test, which has just one fixture which doesn't actually run on the IBM jvm, because of DERBY-4463. I am moving the vm logic outside of the fixture to avoid this and will let the IBM run, run tonight as I am curious if that is causing the problem.

        Show
        Kathey Marsden added a comment - I think the quick start and stop of network server is from the previous test Derby151Test, which has just one fixture which doesn't actually run on the IBM jvm, because of DERBY-4463 . I am moving the vm logic outside of the fixture to avoid this and will let the IBM run, run tonight as I am curious if that is causing the problem.
        Hide
        Kathey Marsden added a comment -

        Attached is a new patch for this issue. I removed the socket synchronization and just have the calling test wait for the lock file to be created before attempting the dual boot. I can't say I totally understand the socket issue, but since the new approach doesn't use sockets, it should be ok.

        Show
        Kathey Marsden added a comment - Attached is a new patch for this issue. I removed the socket synchronization and just have the calling test wait for the lock file to be created before attempting the dual boot. I can't say I totally understand the socket issue, but since the new approach doesn't use sockets, it should be ok.
        Hide
        Kathey Marsden added a comment -

        I think this is actually just a trunk issue. Only trunk has the converted test.

        Show
        Kathey Marsden added a comment - I think this is actually just a trunk issue. Only trunk has the converted test.

          People

          • Assignee:
            Kathey Marsden
            Reporter:
            Kathey Marsden
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development