Derby
  1. Derby
  2. DERBY-4053

Network Server's failure to rollback local transactions on shutdown can cause hang on startup with message java.net.BindException: Address already in use: NET_Bind in derby.log

    Details

    • Urgency:
      Blocker
    • Bug behavior facts:
      Regression Test Failure

      Description

      Running suites.All with IBM 1.5 on 10.5.0.0 alpha - (743198) I got a hang in the test run. The last test to run successfully was xtestNestedSavepoints, but I am not sure exactly what test caused the hang. I took a thread dump which I will attach, which showed network server up and running but no ClientThread and a ping attempt blocked.

      This hang is very similar to the hang that was seen after the fix attempts for DERBY-1465 but that change was backed out so it is not related to that change. It could be that the change for DERBY-1465 just made this highly intermittent problem more likely.

      1. derby.log
        83 kB
        Kathey Marsden
      2. javacore.20090211.123031.4000.0001.txt
        1.25 MB
        Kathey Marsden
      3. suites.All.out
        389 kB
        Kathey Marsden
      4. derby-4053_repro_dont_commit_diff.txt
        9 kB
        Kathey Marsden
      5. javacore-20090420-1735.txt
        7 kB
        Suran Jayathilaka
      6. DERBY4053_patch1_diff_July092009.txt
        5 kB
        Mamta A. Satoor
      7. mamtaDerby4053.java
        5 kB
        Mamta A. Satoor
      8. DERBY4053_patch2_diff_July102009.txt
        9 kB
        Mamta A. Satoor
      9. DERBY4053_patch2_stat_July102009.txt
        0.6 kB
        Mamta A. Satoor
      10. DERBY4053_patch3_diff_July102009.txt
        9 kB
        Mamta A. Satoor
      11. DERBY4053_patch3_stat_July102009.txt
        0.6 kB
        Mamta A. Satoor
      12. derby-4053_diag_diff.txt
        0.8 kB
        Kathey Marsden

        Issue Links

          Activity

          Hide
          Kathey Marsden added a comment -

          Attached are the derby.log, java core dump at the time of the hang, and the System.out of the run. This was run on a single CPU machine. Windows XP

          Show
          Kathey Marsden added a comment - Attached are the derby.log, java core dump at the time of the hang, and the System.out of the run. This was run on a single CPU machine. Windows XP
          Hide
          Myrna van Lunteren added a comment -

          I've seen this also with 10.4 and ibm 1.5, on a machine with 2 cpu, with 'hyperthreading' switched on.
          I'll experiment and see if it changes things if I switch off the hyperthreading.

          Show
          Myrna van Lunteren added a comment - I've seen this also with 10.4 and ibm 1.5, on a machine with 2 cpu, with 'hyperthreading' switched on. I'll experiment and see if it changes things if I switch off the hyperthreading.
          Hide
          Myrna van Lunteren added a comment -

          The frequency of the hangs on the machine where I run 10.4 tests have diminished a lot after I had hyperthreading switched off. With it switched on, it was practically every run, and not in the same test everytime.
          However, I did see one more recently.
          In addition to the Net-Bind messages in system/derby.log, I also got this in the test output, not sure if that was an environment quirk or if it's related to the problem:
          testAllDataSources used 313 ms java.net.SocketException: Connection reset
          at java.net.SocketInputStream.read(SocketInputStream.java:197)
          at java.net.SocketInputStream.read(SocketInputStream.java:116)
          at org.apache.derby.impl.drda.NetworkServerControlImpl.fillReplyBuffer(Unknown Source)
          at org.apache.derby.impl.drda.NetworkServerControlImpl.readResult(Unknown Source)
          at org.apache.derby.impl.drda.NetworkServerControlImpl.pingWithNoOpen(Unknown Source)
          at org.apache.derby.impl.drda.NetworkServerControlImpl.ping(Unknown Source)
          at org.apache.derby.drda.NetworkServerControl.ping(Unknown Source)
          at org.apache.derbyTesting.junit.NetworkServerTestSetup.pingForServerUp(NetworkServerTestSetup.java:546)
          at org.apache.derbyTesting.junit.NetworkServerTestSetup.pingForServerStart(NetworkServerTestSetup.java:615)
          at org.apache.derbyTesting.junit.NetworkServerTestSetup.setUp(NetworkServerTestSetup.java:191)
          at junit.extensions.TestSetup$1.protect(TestSetup.java:18)
          at junit.framework.TestResult.runProtected(TestResult.java:124)
          at junit.extensions.TestSetup.run(TestSetup.java:23)
          at org.apache.derbyTesting.junit.BaseTestSetup.run(BaseTestSetup.java:57)
          at junit.extensions.TestDecorator.basicRun(TestDecorator.java:22)
          at junit.extensions.TestSetup$1.protect(TestSetup.java:19)
          at junit.framework.TestResult.runProtected(TestResult.java:124)
          at junit.extensions.TestSetup.run(TestSetup.java:23)
          at junit.framework.TestSuite.runTest(TestSuite.java:208)
          at junit.framework.TestSuite.run(TestSuite.java:203)
          at junit.extensions.TestDecorator.basicRun(TestDecorator.java:22)
          at junit.extensions.TestSetup$1.protect(TestSetup.java:19)
          at junit.framework.TestResult.runProtected(TestResult.java:124)
          at junit.extensions.TestSetup.run(TestSetup.java:23)
          at org.apache.derbyTesting.junit.BaseTestSetup.run(BaseTestSetup.java:57)
          at junit.framework.TestSuite.runTest(TestSuite.java:208)
          at junit.framework.TestSuite.run(TestSuite.java:203)
          at junit.framework.TestSuite.runTest(TestSuite.java:208)
          at junit.framework.TestSuite.run(TestSuite.java:203)
          at junit.framework.TestSuite.runTest(TestSuite.java:208)
          at junit.framework.TestSuite.run(TestSuite.java:203)
          at junit.textui.TestRunner.doRun(TestRunner.java:116)
          at junit.textui.TestRunner.start(TestRunner.java:172)
          at junit.textui.TestRunner.main(TestRunner.java:138)
          F.
          ----------------------
          After this, there were a couple more fixtures that apparently went ok:
          ----------------------
          connectionJdbc20 used 2109 ms .
          resultsetJdbc20 used 781 ms .
          maxfieldsize used 67593 ms .
          SetQueryTimeoutTest used 43750 ms .
          rsgetXXXcolumnNames used 125 ms .
          savepointJdbc30_JSR169 used 6594 ms .
          testRelative used 94 ms .
          savepointJdbc30_XA used 6516 ms
          ----------------------
          And after that, nothing.

          Show
          Myrna van Lunteren added a comment - The frequency of the hangs on the machine where I run 10.4 tests have diminished a lot after I had hyperthreading switched off. With it switched on, it was practically every run, and not in the same test everytime. However, I did see one more recently. In addition to the Net-Bind messages in system/derby.log, I also got this in the test output, not sure if that was an environment quirk or if it's related to the problem: testAllDataSources used 313 ms java.net.SocketException: Connection reset at java.net.SocketInputStream.read(SocketInputStream.java:197) at java.net.SocketInputStream.read(SocketInputStream.java:116) at org.apache.derby.impl.drda.NetworkServerControlImpl.fillReplyBuffer(Unknown Source) at org.apache.derby.impl.drda.NetworkServerControlImpl.readResult(Unknown Source) at org.apache.derby.impl.drda.NetworkServerControlImpl.pingWithNoOpen(Unknown Source) at org.apache.derby.impl.drda.NetworkServerControlImpl.ping(Unknown Source) at org.apache.derby.drda.NetworkServerControl.ping(Unknown Source) at org.apache.derbyTesting.junit.NetworkServerTestSetup.pingForServerUp(NetworkServerTestSetup.java:546) at org.apache.derbyTesting.junit.NetworkServerTestSetup.pingForServerStart(NetworkServerTestSetup.java:615) at org.apache.derbyTesting.junit.NetworkServerTestSetup.setUp(NetworkServerTestSetup.java:191) at junit.extensions.TestSetup$1.protect(TestSetup.java:18) at junit.framework.TestResult.runProtected(TestResult.java:124) at junit.extensions.TestSetup.run(TestSetup.java:23) at org.apache.derbyTesting.junit.BaseTestSetup.run(BaseTestSetup.java:57) at junit.extensions.TestDecorator.basicRun(TestDecorator.java:22) at junit.extensions.TestSetup$1.protect(TestSetup.java:19) at junit.framework.TestResult.runProtected(TestResult.java:124) at junit.extensions.TestSetup.run(TestSetup.java:23) at junit.framework.TestSuite.runTest(TestSuite.java:208) at junit.framework.TestSuite.run(TestSuite.java:203) at junit.extensions.TestDecorator.basicRun(TestDecorator.java:22) at junit.extensions.TestSetup$1.protect(TestSetup.java:19) at junit.framework.TestResult.runProtected(TestResult.java:124) at junit.extensions.TestSetup.run(TestSetup.java:23) at org.apache.derbyTesting.junit.BaseTestSetup.run(BaseTestSetup.java:57) at junit.framework.TestSuite.runTest(TestSuite.java:208) at junit.framework.TestSuite.run(TestSuite.java:203) at junit.framework.TestSuite.runTest(TestSuite.java:208) at junit.framework.TestSuite.run(TestSuite.java:203) at junit.framework.TestSuite.runTest(TestSuite.java:208) at junit.framework.TestSuite.run(TestSuite.java:203) at junit.textui.TestRunner.doRun(TestRunner.java:116) at junit.textui.TestRunner.start(TestRunner.java:172) at junit.textui.TestRunner.main(TestRunner.java:138) F. ---------------------- After this, there were a couple more fixtures that apparently went ok: ---------------------- connectionJdbc20 used 2109 ms . resultsetJdbc20 used 781 ms . maxfieldsize used 67593 ms . SetQueryTimeoutTest used 43750 ms . rsgetXXXcolumnNames used 125 ms . savepointJdbc30_JSR169 used 6594 ms . testRelative used 94 ms . savepointJdbc30_XA used 6516 ms ---------------------- And after that, nothing.
          Hide
          Kathey Marsden added a comment -

          I hit this on the head of trunk revision 758826 with some minor changes that shouldn't affect this.

          java version "1.6.0"
          Java(TM) SE Runtime Environment (build pwi3260sr3-20081106_07(SR3))
          IBM J9 VM (build 2.4, J2RE 1.6.0 IBM J9 2.4 Windows XP x86-32 jvmwi3260-20081105_25433 (JIT enabled, AOT enabled)
          J9VM - 20081105_025433_lHdSMr
          JIT - r9_20081031_1330
          GC - 20081027_AB)
          JCL - 20081106_01

          The hang happened after testDerby3799 used 0 ms

          Attempts to bring down network server to get my run going again also failed and I had to kill the run and start over.

          last entry in the log had the tell tale error:
          2009-04-09 20:34:58.156 GMT : Could not listen on port 1527 on host 127.0.0.1:
          java.net.BindException: Address already in use: JVM_Bind

          An exception was thrown during network server startup. DRDA_ListenPort.S:Could not listen on port 1527 on host 127.0.0.1:
          java.net.BindException: Address already in use: JVM_Bind

          java.lang.reflect.InvocationTargetException

          at sun.reflect.GeneratedMethodAccessor1950.invoke(Unknown Source)

          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)

          at java.lang.reflect.Method.invoke(Method.java:599)

          at org.apache.derby.iapi.jdbc.DRDAServerStarter.run(DRDAServerStarter.java:236)

          at java.lang.Thread.run(Thread.java:735)

          Caused by: java.lang.Exception: DRDA_ListenPort.S:Could not listen on port 1527 on host 127.0.0.1:
          java.net.BindException: Address already in use: JVM_Bind

          at org.apache.derby.impl.drda.NetworkServerControlImpl.consolePropertyMessageWork(NetworkServerControlImpl.java:3179)

          at org.apache.derby.impl.drda.NetworkServerControlImpl.consolePropertyMessage(NetworkServerControlImpl.java:1861)

          at org.apache.derby.impl.drda.NetworkServerControlImpl.blockingStart(NetworkServerControlImpl.java:731)

          ... 5 more

          Show
          Kathey Marsden added a comment - I hit this on the head of trunk revision 758826 with some minor changes that shouldn't affect this. java version "1.6.0" Java(TM) SE Runtime Environment (build pwi3260sr3-20081106_07(SR3)) IBM J9 VM (build 2.4, J2RE 1.6.0 IBM J9 2.4 Windows XP x86-32 jvmwi3260-20081105_25433 (JIT enabled, AOT enabled) J9VM - 20081105_025433_lHdSMr JIT - r9_20081031_1330 GC - 20081027_AB) JCL - 20081106_01 The hang happened after testDerby3799 used 0 ms Attempts to bring down network server to get my run going again also failed and I had to kill the run and start over. last entry in the log had the tell tale error: 2009-04-09 20:34:58.156 GMT : Could not listen on port 1527 on host 127.0.0.1: java.net.BindException: Address already in use: JVM_Bind An exception was thrown during network server startup. DRDA_ListenPort.S:Could not listen on port 1527 on host 127.0.0.1: java.net.BindException: Address already in use: JVM_Bind java.lang.reflect.InvocationTargetException at sun.reflect.GeneratedMethodAccessor1950.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37) at java.lang.reflect.Method.invoke(Method.java:599) at org.apache.derby.iapi.jdbc.DRDAServerStarter.run(DRDAServerStarter.java:236) at java.lang.Thread.run(Thread.java:735) Caused by: java.lang.Exception: DRDA_ListenPort.S:Could not listen on port 1527 on host 127.0.0.1: java.net.BindException: Address already in use: JVM_Bind at org.apache.derby.impl.drda.NetworkServerControlImpl.consolePropertyMessageWork(NetworkServerControlImpl.java:3179) at org.apache.derby.impl.drda.NetworkServerControlImpl.consolePropertyMessage(NetworkServerControlImpl.java:1861) at org.apache.derby.impl.drda.NetworkServerControlImpl.blockingStart(NetworkServerControlImpl.java:731) ... 5 more
          Hide
          Kathey Marsden added a comment - - edited

          I stumbled across a reliable reproduction for this issue when trying to enable XATest (DERBY-4155). To reproduce apply the patch derby-4053_repro_dont_commit_diff.txt and run jdbcapi._ShortSuite.

          _ShortSuite runs only XATest,DataSourceReferenceTest and J2EEDataSourceTest
          The run gets an invalid reply failure on testDataSourceReference during a ping and then the hang after testDerby3799.

          I ran with
          java version "1.6.0"
          Java(TM) SE Runtime Environment (build pwi3260sr3-20081106_07(SR3))
          IBM J9 VM (build 2.4, J2RE 1.6.0 IBM J9 2.4 Windows XP x86-32 jvmwi3260-20081105_25433 (JIT enabled, AOT enabled)
          J9VM - 20081105_025433_lHdSMr
          JIT - r9_20081031_1330
          GC - 20081027_AB)
          JCL - 20081106_01

          Show
          Kathey Marsden added a comment - - edited I stumbled across a reliable reproduction for this issue when trying to enable XATest ( DERBY-4155 ). To reproduce apply the patch derby-4053_repro_dont_commit_diff.txt and run jdbcapi._ShortSuite. _ShortSuite runs only XATest,DataSourceReferenceTest and J2EEDataSourceTest The run gets an invalid reply failure on testDataSourceReference during a ping and then the hang after testDerby3799. I ran with java version "1.6.0" Java(TM) SE Runtime Environment (build pwi3260sr3-20081106_07(SR3)) IBM J9 VM (build 2.4, J2RE 1.6.0 IBM J9 2.4 Windows XP x86-32 jvmwi3260-20081105_25433 (JIT enabled, AOT enabled) J9VM - 20081105_025433_lHdSMr JIT - r9_20081031_1330 GC - 20081027_AB) JCL - 20081106_01
          Hide
          Kathey Marsden added a comment -

          It doesn't seem to reproduce with the sun jvm
          java version "1.6.0_01"
          Java(TM) SE Runtime Environment (build 1.6.0_01-b06)
          Java HotSpot(TM) Client VM (build 1.6.0_01-b06, mixed mode)

          I get neither the Invalid Reply, nor the hang.

          Show
          Kathey Marsden added a comment - It doesn't seem to reproduce with the sun jvm java version "1.6.0_01" Java(TM) SE Runtime Environment (build 1.6.0_01-b06) Java HotSpot(TM) Client VM (build 1.6.0_01-b06, mixed mode) I get neither the Invalid Reply, nor the hang.
          Hide
          Myrna van Lunteren added a comment -

          I tried that repro with the same 1.6 sr3 on my windows machine, sane classes, and sane jars, but it didn't reproduce for me...
          classpath (in case it matters) was: derby.jar;derbytools.jar;derbynet.jar;derbyclient.jar;derbyrun.jar;derbyTesting.jar;junit.jar
          (with appropriate paths to the various jars).

          Show
          Myrna van Lunteren added a comment - I tried that repro with the same 1.6 sr3 on my windows machine, sane classes, and sane jars, but it didn't reproduce for me... classpath (in case it matters) was: derby.jar;derbytools.jar;derbynet.jar;derbyclient.jar;derbyrun.jar;derbyTesting.jar;junit.jar (with appropriate paths to the various jars).
          Hide
          Suran Jayathilaka added a comment -

          When testing client/server compatibility with 10.5.1.1.-rc2 and 10.4.2.0 I encountered what seems to be a reproduction of this issue.
          Environment: AMD Phenom II Quad Core CPU, 4GB Ram, Vista 64

          java version "1.6.0_10"
          Java(TM) SE Runtime Environment (build 1.6.0_10-b33)
          Java HotSpot(TM) Client VM (build 11.0-b15, mixed mode, sharing)

          Client - 10.4.2.0. Jars
          -------------------
          derbyTesting.jar
          derbyClient.jar

          Server - 10.5.1.1. Jars
          -------------------
          derby.jar
          derbynet.jar
          derbytools.jar
          derbyLocale*.jar
          derbyrun.jar

          XA40Test.testCallableStatementPoolable fails with the following exception. And then the suite hangs.
          -----------------------------------------------------------------------------------------------------
          testPreparedStatementPoolable used 1 ms .
          testCallableStatementPoolable used 2 ms java.lang.Exception: DRDA_InvalidReplyTooShort.S:Invalid reply from network server: Insufficient data.
          at org.apache.derby.impl.drda.NetworkServerControlImpl.consolePropertyMessageWork(Unknown Source)
          at org.apache.derby.impl.drda.NetworkServerControlImpl.consolePropertyMessage(Unknown Source)
          at org.apache.derby.impl.drda.NetworkServerControlImpl.fillReplyBuffer(Unknown Source)
          at org.apache.derby.impl.drda.NetworkServerControlImpl.readResult(Unknown Source)
          at org.apache.derby.impl.drda.NetworkServerControlImpl.pingWithNoOpen(Unknown Source)
          at org.apache.derby.impl.drda.NetworkServerControlImpl.ping(Unknown Source)
          at org.apache.derby.drda.NetworkServerControl.ping(Unknown Source)
          at org.apache.derbyTesting.junit.NetworkServerTestSetup.pingForServerUp(NetworkServerTestSetup.java:546)
          at org.apache.derbyTesting.junit.NetworkServerTestSetup.pingForServerStart(NetworkServerTestSetup.java:615)
          at org.apache.derbyTesting.junit.NetworkServerTestSetup.setUp(NetworkServerTestSetup.java:191)
          at junit.extensions.TestSetup$1.protect(TestSetup.java:20)
          at junit.framework.TestResult.runProtected(TestResult.java:124)
          at junit.extensions.TestSetup.run(TestSetup.java:25)
          at org.apache.derbyTesting.junit.BaseTestSetup.run(BaseTestSetup.java:57)
          at junit.extensions.TestDecorator.basicRun(TestDecorator.java:24)
          at junit.extensions.TestSetup$1.protect(TestSetup.java:21)
          at junit.framework.TestResult.runProtected(TestResult.java:124)
          at junit.extensions.TestSetup.run(TestSetup.java:25)
          at junit.extensions.TestDecorator.basicRun(TestDecorator.java:24)
          at junit.extensions.TestSetup$1.protect(TestSetup.java:21)
          at junit.framework.TestResult.runProtected(TestResult.java:124)
          at junit.extensions.TestSetup.run(TestSetup.java:25)
          at org.apache.derbyTesting.junit.BaseTestSetup.run(BaseTestSetup.java:57)
          at junit.framework.TestSuite.runTest(TestSuite.java:230)
          at junit.framework.TestSuite.run(TestSuite.java:225)
          at junit.framework.TestSuite.runTest(TestSuite.java:230)
          at junit.framework.TestSuite.run(TestSuite.java:225)
          at junit.textui.TestRunner.doRun(TestRunner.java:121)
          at junit.textui.TestRunner.start(TestRunner.java:185)
          at junit.textui.TestRunner.main(TestRunner.java:143)
          F.
          testCreateClob used 16 ms .
          testCreateBlob used 9 ms .
          testConnectionIsValid used 674 ms
          ---------<hang>---------------------

          The log shows the following
          --------------------------------------
          2009-04-20 16:41:07.264 GMT : Apache Derby Network Server - 10.5.1.1 - (764942) started and ready to accept connections on port 1527
          2009-04-20 16:41:10.618 GMT : Invalid reply from network server: Insufficient data.
          2009-04-20 16:41:10.618 GMT : Could not listen on port 1527 on host 127.0.0.1:
          java.net.BindException: Address already in use: JVM_Bind
          2009-04-20 16:41:12.416 GMT : Could not listen on port 1527 on host 127.0.0.1:
          java.net.BindException: Address already in use: JVM_Bind

          This hang occurred both times I ran suites.All and also when I ran jdbc4._Suite alone , always on new test folders.
          In the last instance, attempting to kill the hung test with
          java org.apache.derby.drda.NetworkServerControl shutdown
          also hung.

          Javacore file attached.

          Show
          Suran Jayathilaka added a comment - When testing client/server compatibility with 10.5.1.1.-rc2 and 10.4.2.0 I encountered what seems to be a reproduction of this issue. Environment: AMD Phenom II Quad Core CPU, 4GB Ram, Vista 64 java version "1.6.0_10" Java(TM) SE Runtime Environment (build 1.6.0_10-b33) Java HotSpot(TM) Client VM (build 11.0-b15, mixed mode, sharing) Client - 10.4.2.0. Jars ------------------- derbyTesting.jar derbyClient.jar Server - 10.5.1.1. Jars ------------------- derby.jar derbynet.jar derbytools.jar derbyLocale*.jar derbyrun.jar XA40Test.testCallableStatementPoolable fails with the following exception. And then the suite hangs. ----------------------------------------------------------------------------------------------------- testPreparedStatementPoolable used 1 ms . testCallableStatementPoolable used 2 ms java.lang.Exception: DRDA_InvalidReplyTooShort.S:Invalid reply from network server: Insufficient data. at org.apache.derby.impl.drda.NetworkServerControlImpl.consolePropertyMessageWork(Unknown Source) at org.apache.derby.impl.drda.NetworkServerControlImpl.consolePropertyMessage(Unknown Source) at org.apache.derby.impl.drda.NetworkServerControlImpl.fillReplyBuffer(Unknown Source) at org.apache.derby.impl.drda.NetworkServerControlImpl.readResult(Unknown Source) at org.apache.derby.impl.drda.NetworkServerControlImpl.pingWithNoOpen(Unknown Source) at org.apache.derby.impl.drda.NetworkServerControlImpl.ping(Unknown Source) at org.apache.derby.drda.NetworkServerControl.ping(Unknown Source) at org.apache.derbyTesting.junit.NetworkServerTestSetup.pingForServerUp(NetworkServerTestSetup.java:546) at org.apache.derbyTesting.junit.NetworkServerTestSetup.pingForServerStart(NetworkServerTestSetup.java:615) at org.apache.derbyTesting.junit.NetworkServerTestSetup.setUp(NetworkServerTestSetup.java:191) at junit.extensions.TestSetup$1.protect(TestSetup.java:20) at junit.framework.TestResult.runProtected(TestResult.java:124) at junit.extensions.TestSetup.run(TestSetup.java:25) at org.apache.derbyTesting.junit.BaseTestSetup.run(BaseTestSetup.java:57) at junit.extensions.TestDecorator.basicRun(TestDecorator.java:24) at junit.extensions.TestSetup$1.protect(TestSetup.java:21) at junit.framework.TestResult.runProtected(TestResult.java:124) at junit.extensions.TestSetup.run(TestSetup.java:25) at junit.extensions.TestDecorator.basicRun(TestDecorator.java:24) at junit.extensions.TestSetup$1.protect(TestSetup.java:21) at junit.framework.TestResult.runProtected(TestResult.java:124) at junit.extensions.TestSetup.run(TestSetup.java:25) at org.apache.derbyTesting.junit.BaseTestSetup.run(BaseTestSetup.java:57) at junit.framework.TestSuite.runTest(TestSuite.java:230) at junit.framework.TestSuite.run(TestSuite.java:225) at junit.framework.TestSuite.runTest(TestSuite.java:230) at junit.framework.TestSuite.run(TestSuite.java:225) at junit.textui.TestRunner.doRun(TestRunner.java:121) at junit.textui.TestRunner.start(TestRunner.java:185) at junit.textui.TestRunner.main(TestRunner.java:143) F. testCreateClob used 16 ms . testCreateBlob used 9 ms . testConnectionIsValid used 674 ms --------- <hang> --------------------- The log shows the following -------------------------------------- 2009-04-20 16:41:07.264 GMT : Apache Derby Network Server - 10.5.1.1 - (764942) started and ready to accept connections on port 1527 2009-04-20 16:41:10.618 GMT : Invalid reply from network server: Insufficient data. 2009-04-20 16:41:10.618 GMT : Could not listen on port 1527 on host 127.0.0.1: java.net.BindException: Address already in use: JVM_Bind 2009-04-20 16:41:12.416 GMT : Could not listen on port 1527 on host 127.0.0.1: java.net.BindException: Address already in use: JVM_Bind This hang occurred both times I ran suites.All and also when I ran jdbc4._Suite alone , always on new test folders. In the last instance, attempting to kill the hung test with java org.apache.derby.drda.NetworkServerControl shutdown also hung. Javacore file attached.
          Hide
          Suran Jayathilaka added a comment -

          javacore for 10.4Client/10.5server compat testing hang.

          Show
          Suran Jayathilaka added a comment - javacore for 10.4Client/10.5server compat testing hang.
          Hide
          Mike Matrigali added a comment -

          Looking at the javacore the tests are hung waiting on a ping:
          "main" prio=6 tid=0x003e9c00 nid=0x10ac runnable [0x0023f000..0x0023fe28]
          java.lang.Thread.State: RUNNABLE
          at java.net.SocketInputStream.socketRead0(Native Method)
          at java.net.SocketInputStream.read(SocketInputStream.java:129)
          at java.net.SocketInputStream.read(SocketInputStream.java:90)
          at org.apache.derby.impl.drda.NetworkServerControlImpl.fillReplyBuffer(Unknown Source)
          at org.apache.derby.impl.drda.NetworkServerControlImpl.readResult(Unknown Source)
          at org.apache.derby.impl.drda.NetworkServerControlImpl.pingWithNoOpen(Unknown Source)
          at org.apache.derby.impl.drda.NetworkServerControlImpl.ping(Unknown Source)
          at org.apache.derby.drda.NetworkServerControl.ping(Unknown Source)
          at org.apache.derbyTesting.junit.NetworkServerTestSetup.pingForServerUp(NetworkServerTestSetup.java:546)
          at org.apache.derbyTesting.junit.NetworkServerTestSetup.pingForServerStart(NetworkServerTestSetup.java:615)
          at org.apache.derbyTesting.junit.NetworkServerTestSetup.setUp(NetworkServerTestSetup.java:191)

          If anyone can reproduce this it might be interesting to instrument pingForServerUp to
          see what path it is taking through this code. Did the while loop go more than once,
          what are all the variables to the routine, did it hit any of the try/catch blocks (some
          don't do anything), ...

          It seems like this setup code should be changed to not wait forever in the ping since
          the tests seem to be able to break the network server and make it not ping. So maybe some sort of timer thread could be used to make the ping not take forever.
          It would be best to understand why the ping is taking forever and maybe if we did
          not hang then we could add more diagnostics to this routine to figure why the ping
          is not returning.

          Show
          Mike Matrigali added a comment - Looking at the javacore the tests are hung waiting on a ping: "main" prio=6 tid=0x003e9c00 nid=0x10ac runnable [0x0023f000..0x0023fe28] java.lang.Thread.State: RUNNABLE at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.read(SocketInputStream.java:129) at java.net.SocketInputStream.read(SocketInputStream.java:90) at org.apache.derby.impl.drda.NetworkServerControlImpl.fillReplyBuffer(Unknown Source) at org.apache.derby.impl.drda.NetworkServerControlImpl.readResult(Unknown Source) at org.apache.derby.impl.drda.NetworkServerControlImpl.pingWithNoOpen(Unknown Source) at org.apache.derby.impl.drda.NetworkServerControlImpl.ping(Unknown Source) at org.apache.derby.drda.NetworkServerControl.ping(Unknown Source) at org.apache.derbyTesting.junit.NetworkServerTestSetup.pingForServerUp(NetworkServerTestSetup.java:546) at org.apache.derbyTesting.junit.NetworkServerTestSetup.pingForServerStart(NetworkServerTestSetup.java:615) at org.apache.derbyTesting.junit.NetworkServerTestSetup.setUp(NetworkServerTestSetup.java:191) If anyone can reproduce this it might be interesting to instrument pingForServerUp to see what path it is taking through this code. Did the while loop go more than once, what are all the variables to the routine, did it hit any of the try/catch blocks (some don't do anything), ... It seems like this setup code should be changed to not wait forever in the ping since the tests seem to be able to break the network server and make it not ping. So maybe some sort of timer thread could be used to make the ping not take forever. It would be best to understand why the ping is taking forever and maybe if we did not hang then we could add more diagnostics to this routine to figure why the ping is not returning.
          Hide
          Kathey Marsden added a comment -

          My gut instinct is that this is some sort of shutdown timing problem with all the server side cleanup that starts about line 807 in NetworkServerControlImpl or perhaps a problem with bring it up in the first place. The key thing I notice is there is no ClientThread running which normally accepts new requests, but it does not seem the server socket was closed because there is one DRDAConnThread that is blocked on a read and one ping thread blocked on read that would terminate if the server.socket were closed.

          It would be interesting to trace when this occurs.
          1) calls to shutdown and start
          2) Entry into the for (; loop in ClientThread
          3) returns out of the ClientThread loop
          4) Remove the if(!parent.getShudown()) in this code so we always print the IOException if it occurs. Perhaps there is an exception we are not seeing.
          if (!parent.getShutdown())

          { parent.consoleExceptionPrintTrace(ioe); if (clientSocket != null) clientSocket.close(); }
          Show
          Kathey Marsden added a comment - My gut instinct is that this is some sort of shutdown timing problem with all the server side cleanup that starts about line 807 in NetworkServerControlImpl or perhaps a problem with bring it up in the first place. The key thing I notice is there is no ClientThread running which normally accepts new requests, but it does not seem the server socket was closed because there is one DRDAConnThread that is blocked on a read and one ping thread blocked on read that would terminate if the server.socket were closed. It would be interesting to trace when this occurs. 1) calls to shutdown and start 2) Entry into the for (; loop in ClientThread 3) returns out of the ClientThread loop 4) Remove the if(!parent.getShudown()) in this code so we always print the IOException if it occurs. Perhaps there is an exception we are not seeing. if (!parent.getShutdown()) { parent.consoleExceptionPrintTrace(ioe); if (clientSocket != null) clientSocket.close(); }
          Hide
          Kathey Marsden added a comment -

          I noticed this happened on the IBM 10.4 run last night and had the DRDA_InvalidReplyTooShort.S:Invalid reply from network server: Insufficient data trace on a ping shortly before the hang. I looked briefly at how this might occur, that a ping would return insufficient data.

          A ping would go through this code in DRDAConnThread:
          private void sessionInitialState()
          throws Exception
          {
          // process NetworkServerControl commands - if it is not either valid protocol let the
          // DRDA error handling handle it
          if (reader.isCmd())
          {
          try

          { server.processCommands(reader, writer, session); // reset reader and writer reader.initialize(this, null); writer.reset(null); closeSession(); }

          catch (Throwable t) {
          if (t instanceof InterruptedException)
          throw (InterruptedException)t;
          else

          { server.consoleExceptionPrintTrace(t); }

          }

          In NetworkServerControlImpl during shutdown we do the following which might interrupt the thread:
          synchronized (threadList)
          {
          //interupt any connection threads still active
          for (int i = 0; i < threadList.size(); i++)
          {
          final DRDAConnThread threadi = (DRDAConnThread)threadList.get;

          threadi.close();
          AccessController.doPrivileged(
          new PrivilegedAction() {
          public Object run()

          { threadi.interrupt(); return null; }

          });
          }
          threadList.clear();
          }

          It seems a kind of abrupt way to shutdown the threads. Would calling threadi() close() be more appropriate so it could finish what it was doing? I am also unclear about the state of the cleanup when such an interrupt occurs during the middle of writing a response.

          Tomorrow I will try putting a sleep between the reply header and the OK byte on the ping response and try to shutdown at the same time to see if I can get the InvalidReplyTooShort and then see if subsequent connection attempts hang.

          Show
          Kathey Marsden added a comment - I noticed this happened on the IBM 10.4 run last night and had the DRDA_InvalidReplyTooShort.S:Invalid reply from network server: Insufficient data trace on a ping shortly before the hang. I looked briefly at how this might occur, that a ping would return insufficient data. A ping would go through this code in DRDAConnThread: private void sessionInitialState() throws Exception { // process NetworkServerControl commands - if it is not either valid protocol let the // DRDA error handling handle it if (reader.isCmd()) { try { server.processCommands(reader, writer, session); // reset reader and writer reader.initialize(this, null); writer.reset(null); closeSession(); } catch (Throwable t) { if (t instanceof InterruptedException) throw (InterruptedException)t; else { server.consoleExceptionPrintTrace(t); } } In NetworkServerControlImpl during shutdown we do the following which might interrupt the thread: synchronized (threadList) { //interupt any connection threads still active for (int i = 0; i < threadList.size(); i++) { final DRDAConnThread threadi = (DRDAConnThread)threadList.get ; threadi.close(); AccessController.doPrivileged( new PrivilegedAction() { public Object run() { threadi.interrupt(); return null; } }); } threadList.clear(); } It seems a kind of abrupt way to shutdown the threads. Would calling threadi() close() be more appropriate so it could finish what it was doing? I am also unclear about the state of the cleanup when such an interrupt occurs during the middle of writing a response. Tomorrow I will try putting a sleep between the reply header and the OK byte on the ping response and try to shutdown at the same time to see if I can get the InvalidReplyTooShort and then see if subsequent connection attempts hang.
          Hide
          Mamta A. Satoor added a comment -

          Based on Kathey's suggestion, I tried putting sleep in the Network Server code right in the middle of ping protocol handshake.

          Following is what happens for ping on the server side (in NetworkServerControlImpl)
          private void sendOK(DDMWriter writer) throws Exception

          { writeCommandReplyHeader(writer); writer.writeByte(OK); writer.flush(); }

          I have copied the sendOK code inline where the ping is handled in NetworkServerControlImpl.processCommands(). Additionally, I changed that copied code to have the server sleep after writing the header but before sending the ok to the ping client as shown below.
          writeCommandReplyHeader(writer);
          writer.flush();
          System.out.println("before going to sleep");
          Thread.sleep(10000);
          System.out.println("after sleep");
          writer.writeByte(OK);
          System.out.println("after sending OK");
          writer.flush();
          System.out.println("after flushing OK");
          With the code changes above, I thought I would be able to reproduce the bug if I tried shutting down server while the server was still sleeping during ping handshake (ie before the ping protocol handshake is all finished). What I found was that the server shutdown properly, ping client got expected Invalid reply from network server: Insufficient data. We thought that if we tried bringing the server back up and tried ping on the new server session, it will hang because of the earlier insufficient data but that didn't happen. A hang here would have probably duplicated the intermittent hang behavior that we see when the nightly tests are running.

          Little more info on exact steps for the test case above
          Window 1 : Start the server
          java org.apache.derby.drda.NetworkServerControl -noSecurityManager start -p 1639
          Window 2 : ping the server (this put the server in the sleep mode)
          java org.apache.derby.drda.NetworkServerControl ping -p 1639
          Window 3: while server is sleeping, send shutdown request
          java org.apache.derby.drda.NetworkServerControl shutdown -p 1639

          After spending more time on the experiment above, found that the ping client was getting
          insufficient data because of the "writer.flush()" which I added right before
          Thread.sleep(...). This happened both with Sun and IBM jvms(1.6 versions). once I took the
          additional writer.flush() out, the ping client ran successfully and there was no insufficient
          data error.

          The goal here is to get a consistently (small) reproducible test case which will make debugging the problem easier but have not been to cause the ping to run into insufficient data in a small repro yet. Will brainstorm more but in the meantime, if anyone has any ideas
          on what may be causing the insufficient data error, I can pursue those.

          Show
          Mamta A. Satoor added a comment - Based on Kathey's suggestion, I tried putting sleep in the Network Server code right in the middle of ping protocol handshake. Following is what happens for ping on the server side (in NetworkServerControlImpl) private void sendOK(DDMWriter writer) throws Exception { writeCommandReplyHeader(writer); writer.writeByte(OK); writer.flush(); } I have copied the sendOK code inline where the ping is handled in NetworkServerControlImpl.processCommands(). Additionally, I changed that copied code to have the server sleep after writing the header but before sending the ok to the ping client as shown below. writeCommandReplyHeader(writer); writer.flush(); System.out.println("before going to sleep"); Thread.sleep(10000); System.out.println("after sleep"); writer.writeByte(OK); System.out.println("after sending OK"); writer.flush(); System.out.println("after flushing OK"); With the code changes above, I thought I would be able to reproduce the bug if I tried shutting down server while the server was still sleeping during ping handshake (ie before the ping protocol handshake is all finished). What I found was that the server shutdown properly, ping client got expected Invalid reply from network server: Insufficient data. We thought that if we tried bringing the server back up and tried ping on the new server session, it will hang because of the earlier insufficient data but that didn't happen. A hang here would have probably duplicated the intermittent hang behavior that we see when the nightly tests are running. Little more info on exact steps for the test case above Window 1 : Start the server java org.apache.derby.drda.NetworkServerControl -noSecurityManager start -p 1639 Window 2 : ping the server (this put the server in the sleep mode) java org.apache.derby.drda.NetworkServerControl ping -p 1639 Window 3: while server is sleeping, send shutdown request java org.apache.derby.drda.NetworkServerControl shutdown -p 1639 After spending more time on the experiment above, found that the ping client was getting insufficient data because of the "writer.flush()" which I added right before Thread.sleep(...). This happened both with Sun and IBM jvms(1.6 versions). once I took the additional writer.flush() out, the ping client ran successfully and there was no insufficient data error. The goal here is to get a consistently (small) reproducible test case which will make debugging the problem easier but have not been to cause the ping to run into insufficient data in a small repro yet. Will brainstorm more but in the meantime, if anyone has any ideas on what may be causing the insufficient data error, I can pursue those.
          Hide
          Mamta A. Satoor added a comment -

          Wanted to post some progress on this issue
          1)First of all, I don't need to run the entire suite anymore to run into hang problem. Just running the jdbc4 suite is enough to repro the problem. that is saving tons of time with reproducing the bug on my machine when running with 10.5 server jars and 10.4 client jars
          2)jdbc4 suite is consistently hanging in jdbc4.ConnectionMethodsTest
          3)The problem is not really with the test that is hung, rather it is the previous test jdbc4.UnsupportedVetter
          4)After lonts and lots of printlns in the network server code, it appears the problem might be that XA connections may not be cleaning up properly and hence the server can't shutdown successfully. But ofcourse as this jira shows, we do not handle that case properly. This leads into server not getting shutdown when the junit test is expecting it to be shutdown. The next test when it tries to bring the server up can't do that successfully.

          The next step is to come up with a XA test case to repro the improper cleanup of XA connection.

          Following is the stack trace that I have added into my codeline showing XA connection not closing properly

          java.sql.SQLException: Cannot close a connection while a transaction is still active.
          at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(SQLExceptionFactory40.java:95)
          at org.apache.derby.impl.jdbc.Util.newEmbedSQLException(Util.java:87)
          at org.apache.derby.impl.jdbc.Util.newEmbedSQLException(Util.java:103)
          at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Util.java:167)
          at org.apache.derby.impl.jdbc.EmbedConnection.newSQLExceptionEmbedConnection.java:2948)
          at org.apache.derby.impl.jdbc.EmbedConnection.checkForTransactionInProgress(EmbedConnection.java:1806)
          at org.apache.derby.jdbc.EmbedPooledConnection.checkCloseEmbedPooledConnection.java:459)
          at org.apache.derby.jdbc.EmbedXAConnection.checkCloseEmbedXAConnection.java:152)
          at org.apache.derby.iapi.jdbc.BrokeredConnection.close(BrokeredConnection.java:169)
          at org.apache.derby.impl.drda.Database.close(Database.java:362)
          at org.apache.derby.impl.drda.Session.close(Session.java:115)
          at org.apache.derby.impl.drda.NetworkServerControlImpl.blockingStart(NetworkServerControlImpl.java:838)
          at sun.reflect.GeneratedMethodAccessor28.invoke(Unknown Source)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
          at java.lang.reflect.Method.invoke(Method.java:599)
          at org.apache.derby.iapi.jdbc.DRDAServerStarter.run(DRDAServerStarter.java:236)
          at java.lang.Thread.run(Thread.java:735)
          Caused by: java.sql.SQLException: Cannot close a connection while a transaction is still active.
          at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(SQLExceptionFactory.java:45)
          at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(SQLExceptionFactory40.java:119)
          at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(SQLExceptionFactory40.java:73)
          ... 16 more

          Show
          Mamta A. Satoor added a comment - Wanted to post some progress on this issue 1)First of all, I don't need to run the entire suite anymore to run into hang problem. Just running the jdbc4 suite is enough to repro the problem. that is saving tons of time with reproducing the bug on my machine when running with 10.5 server jars and 10.4 client jars 2)jdbc4 suite is consistently hanging in jdbc4.ConnectionMethodsTest 3)The problem is not really with the test that is hung, rather it is the previous test jdbc4.UnsupportedVetter 4)After lonts and lots of printlns in the network server code, it appears the problem might be that XA connections may not be cleaning up properly and hence the server can't shutdown successfully. But ofcourse as this jira shows, we do not handle that case properly. This leads into server not getting shutdown when the junit test is expecting it to be shutdown. The next test when it tries to bring the server up can't do that successfully. The next step is to come up with a XA test case to repro the improper cleanup of XA connection. Following is the stack trace that I have added into my codeline showing XA connection not closing properly java.sql.SQLException: Cannot close a connection while a transaction is still active. at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(SQLExceptionFactory40.java:95) at org.apache.derby.impl.jdbc.Util.newEmbedSQLException(Util.java:87) at org.apache.derby.impl.jdbc.Util.newEmbedSQLException(Util.java:103) at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Util.java:167) at org.apache.derby.impl.jdbc.EmbedConnection.newSQLExceptionEmbedConnection.java:2948) at org.apache.derby.impl.jdbc.EmbedConnection.checkForTransactionInProgress(EmbedConnection.java:1806) at org.apache.derby.jdbc.EmbedPooledConnection.checkCloseEmbedPooledConnection.java:459) at org.apache.derby.jdbc.EmbedXAConnection.checkCloseEmbedXAConnection.java:152) at org.apache.derby.iapi.jdbc.BrokeredConnection.close(BrokeredConnection.java:169) at org.apache.derby.impl.drda.Database.close(Database.java:362) at org.apache.derby.impl.drda.Session.close(Session.java:115) at org.apache.derby.impl.drda.NetworkServerControlImpl.blockingStart(NetworkServerControlImpl.java:838) at sun.reflect.GeneratedMethodAccessor28.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37) at java.lang.reflect.Method.invoke(Method.java:599) at org.apache.derby.iapi.jdbc.DRDAServerStarter.run(DRDAServerStarter.java:236) at java.lang.Thread.run(Thread.java:735) Caused by: java.sql.SQLException: Cannot close a connection while a transaction is still active. at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(SQLExceptionFactory.java:45) at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(SQLExceptionFactory40.java:119) at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(SQLExceptionFactory40.java:73) ... 16 more
          Hide
          Kathey Marsden added a comment -

          With the DERBY-3319 fix I think this is much more likely to occur at a user site. Marking Urgent to try to get it into the 10.5.2 release.

          I actually would have thought this would be a regression associated with 10.5 after DERBY-3319, but Myrna has reported the problem on 10.4 as well. If it is a regression, it should probably move to BLOCKER.
          Thank you Mamta for tracking down this issue.

          I see three parts to the fix:
          1) Do a better job of cleaning up XA Connections. For local transactions we can rollback, but we need to be more careful to do the right thing if a global transaction is in progress. We cannot simply rollback a prepared transaction.

          2) Make sure an exception during shutdown processing does not prevent the remaining shutdown tasks, like closing the server socket from occurring.

          3) Make sure any exceptions that occur in shutdown processing are reported to the console.

          Show
          Kathey Marsden added a comment - With the DERBY-3319 fix I think this is much more likely to occur at a user site. Marking Urgent to try to get it into the 10.5.2 release. I actually would have thought this would be a regression associated with 10.5 after DERBY-3319 , but Myrna has reported the problem on 10.4 as well. If it is a regression, it should probably move to BLOCKER. Thank you Mamta for tracking down this issue. I see three parts to the fix: 1) Do a better job of cleaning up XA Connections. For local transactions we can rollback, but we need to be more careful to do the right thing if a global transaction is in progress. We cannot simply rollback a prepared transaction. 2) Make sure an exception during shutdown processing does not prevent the remaining shutdown tasks, like closing the server socket from occurring. 3) Make sure any exceptions that occur in shutdown processing are reported to the console.
          Hide
          Knut Anders Hatlen added a comment -

          Taking this from memory... I thought XA wasn't supposed to be affected by DERBY-3319. See my comment dated 07/Jul/08 01:57 AM on that issue. Or perhaps it was just global XA transactions that weren't affected by it. Will have to check later.

          Show
          Knut Anders Hatlen added a comment - Taking this from memory... I thought XA wasn't supposed to be affected by DERBY-3319 . See my comment dated 07/Jul/08 01:57 AM on that issue. Or perhaps it was just global XA transactions that weren't affected by it. Will have to check later.
          Hide
          Knut Anders Hatlen added a comment -

          I checked the code, and connections associated with a global XA transaction can still be closed without being committed or rolled back first. Local XA connections will fail if they are attempted closed while they have uncommitted operations.

          Show
          Knut Anders Hatlen added a comment - I checked the code, and connections associated with a global XA transaction can still be closed without being committed or rolled back first. Local XA connections will fail if they are attempted closed while they have uncommitted operations.
          Hide
          Knut Anders Hatlen added a comment -

          This code in impl.drda.Database.close() probably needs to be changed to check if the connection is a global XA transaction, and not only that it's XA:

          if ((conn != null) && !conn.isClosed())
          {
          if (! forXA)

          { conn.rollback(); }

          conn.close();
          }

          Show
          Knut Anders Hatlen added a comment - This code in impl.drda.Database.close() probably needs to be changed to check if the connection is a global XA transaction, and not only that it's XA: if ((conn != null) && !conn.isClosed()) { if (! forXA) { conn.rollback(); } conn.close(); }
          Hide
          Kathey Marsden added a comment -

          I think that sounds fine for item #1 in my list above. Perhaps we need a new method in the EngineConnection interface (inLocalTransaction()) to determine the transaction state? Then we also need to take care of the other two items I listed above so we don't have problems in the future with errors in shutdown causing this kind of difficult to track down problem.

          Show
          Kathey Marsden added a comment - I think that sounds fine for item #1 in my list above. Perhaps we need a new method in the EngineConnection interface (inLocalTransaction()) to determine the transaction state? Then we also need to take care of the other two items I listed above so we don't have problems in the future with errors in shutdown causing this kind of difficult to track down problem.
          Hide
          Kathey Marsden added a comment -

          Changing to blocker urgency for 10.5.2 at least for the XA hang. The other parts of the fix (proper handling of all exceptions that occur on shutdown and log to console when they occur) could be handled post 10.5.2 with a follow-up issue.

          Show
          Kathey Marsden added a comment - Changing to blocker urgency for 10.5.2 at least for the XA hang. The other parts of the fix (proper handling of all exceptions that occur on shutdown and log to console when they occur) could be handled post 10.5.2 with a follow-up issue.
          Hide
          Knut Anders Hatlen added a comment -

          I'll try to fix the XA hang along the lines suggested above.

          Show
          Knut Anders Hatlen added a comment - I'll try to fix the XA hang along the lines suggested above.
          Hide
          Knut Anders Hatlen added a comment -

          Oops. Just noticed that Kathey had sent a mail to derby-dev asking Mamta to fix this. Unassigning so that I don't block the issue.

          Show
          Knut Anders Hatlen added a comment - Oops. Just noticed that Kathey had sent a mail to derby-dev asking Mamta to fix this. Unassigning so that I don't block the issue.
          Hide
          Mamta A. Satoor added a comment -

          This patch DERBY4053_patch1_diff_July092009.txt is not yet intended for checkin.

          I thought this patch should fix the XA cleanup issue where we were not differentiating between global XA vs local XA transaction and hence not roling back local XA transaction before trying to close the connection.

          Unless my code changes are wrong, it appears though that all the XA tests in jdbc4 suite are non-global XA transactions. Is that right? The reason I ask is, I had printlns in all the implementations of the new method isInGlobalTransaction and they all always return false. I was expecting following method in EmbedXAConnection.java to return true sometimes because we are dealing with global XA transaction.
          Index: java/engine/org/apache/derby/jdbc/EmbedXAConnection.java
          ===================================================================
          — java/engine/org/apache/derby/jdbc/EmbedXAConnection.java (revision 792143
          )
          +++ java/engine/org/apache/derby/jdbc/EmbedXAConnection.java (working copy)
          @@ -22,6 +22,7 @@
          package org.apache.derby.jdbc;

          import org.apache.derby.impl.jdbc.Util;
          +import org.apache.derby.iapi.jdbc.BrokeredConnectionControl;
          import org.apache.derby.iapi.jdbc.EngineConnection;
          import org.apache.derby.iapi.jdbc.ResourceAdapter;

          @@ -53,6 +54,11 @@
          xaRes = new EmbedXAResource (this, ra);
          }

          + /** @see BrokeredConnectionControl#isInGlobalTransaction() */
          + public boolean isInGlobalTransaction()

          { + return isGlobal(); + }

          +
          /**

          • Check if this connection is part of a global XA transaction.
            *
            Can someone review this patch for me? There are not many comments yet. I will put those soon.

          I am right now also trying to work on a standalone repro case which will reproduce our theory of local XA transaction not getting rolled back before connection.close and hence the server shutdown does not happen correctly causing next server start to hang.

          Show
          Mamta A. Satoor added a comment - This patch DERBY4053_patch1_diff_July092009.txt is not yet intended for checkin. I thought this patch should fix the XA cleanup issue where we were not differentiating between global XA vs local XA transaction and hence not roling back local XA transaction before trying to close the connection. Unless my code changes are wrong, it appears though that all the XA tests in jdbc4 suite are non-global XA transactions. Is that right? The reason I ask is, I had printlns in all the implementations of the new method isInGlobalTransaction and they all always return false. I was expecting following method in EmbedXAConnection.java to return true sometimes because we are dealing with global XA transaction. Index: java/engine/org/apache/derby/jdbc/EmbedXAConnection.java =================================================================== — java/engine/org/apache/derby/jdbc/EmbedXAConnection.java (revision 792143 ) +++ java/engine/org/apache/derby/jdbc/EmbedXAConnection.java (working copy) @@ -22,6 +22,7 @@ package org.apache.derby.jdbc; import org.apache.derby.impl.jdbc.Util; +import org.apache.derby.iapi.jdbc.BrokeredConnectionControl; import org.apache.derby.iapi.jdbc.EngineConnection; import org.apache.derby.iapi.jdbc.ResourceAdapter; @@ -53,6 +54,11 @@ xaRes = new EmbedXAResource (this, ra); } + /** @see BrokeredConnectionControl#isInGlobalTransaction() */ + public boolean isInGlobalTransaction() { + return isGlobal(); + } + /** Check if this connection is part of a global XA transaction. * Can someone review this patch for me? There are not many comments yet. I will put those soon. I am right now also trying to work on a standalone repro case which will reproduce our theory of local XA transaction not getting rolled back before connection.close and hence the server shutdown does not happen correctly causing next server start to hang.
          Hide
          Kathey Marsden added a comment -

          Hi Mamta,

          I took a look at DERBY4053_patch1_diff_July092009.txt. On visual inspection it seems like it should do what we want.
          There is an import of BrokeredConnectionControl in EmbedConnection which should be removed.

          In your mail to derby-dev, you said "Despite the
          change, I still get the hang. " Do you still get the InvalidReply? If you put the try/catch block around the shutdown operations, do you still see an exception?

          I did some grepping on the tests in jdbc4 and I think you are right that they only use local transactions.

          Show
          Kathey Marsden added a comment - Hi Mamta, I took a look at DERBY4053_patch1_diff_July092009.txt. On visual inspection it seems like it should do what we want. There is an import of BrokeredConnectionControl in EmbedConnection which should be removed. In your mail to derby-dev, you said "Despite the change, I still get the hang. " Do you still get the InvalidReply? If you put the try/catch block around the shutdown operations, do you still see an exception? I did some grepping on the tests in jdbc4 and I think you are right that they only use local transactions.
          Hide
          Mamta A. Satoor added a comment -

          Sorry, I wasn't completely awake when i posted on derby-dev that "Despite the change, I still get the hang". With my changes, I do not see the hang in jdbc4 suite.

          Show
          Mamta A. Satoor added a comment - Sorry, I wasn't completely awake when i posted on derby-dev that "Despite the change, I still get the hang". With my changes, I do not see the hang in jdbc4 suite.
          Hide
          Kathey Marsden added a comment -

          Ahh, good news.

          Show
          Kathey Marsden added a comment - Ahh, good news.
          Hide
          Mamta A. Satoor added a comment - - edited

          I have a standalone repro of the problem which demonstrates the problem with local XA transaction. I am copying the comments from the program to descibe what it is doing. Next, I will add a test like this in our regression suite and put more comments in the Derby code changes.

          /**

          • This test does following
          • 1)Start the network server
          • 2)Start a local xa transaction
          • 3)Do not commit the local XA transaction
          • 4)Shutdown the network server
          • 5)Start the server again
          • Before the fix for DERBY-4053 went in, step 4) would not shutdown the
          • server properly because of the pending local XA transaction. During the
          • server shutdown, we try to close all the open connections and close on
          • the XA connection results into an exception because there is still a
          • pending transaction. That exception is not handled by the server and
          • because of that, all the code necessary to shutdown the server is not
          • executed. The next time around, step 5), when we try to bring up the
          • server again, it ends up hanging
          • 2009-07-09 21:21:28.828 GMT : Invalid reply from network server: Insufficient data.
          • 2009-07-09 21:21:28.843 GMT : Could not listen on port 1527 on host 127.0.0.1: java.net.BindException: Address already in use: JVM_Bind
          • The patch attached to the jira makes sure that before calling close on
          • local XA transaction, we first rollback any transaction active on the
          • connection. We should also work on server shutdown code to handle the
          • exceptions better so that it does not stop half way.
          • */

          Show
          Mamta A. Satoor added a comment - - edited I have a standalone repro of the problem which demonstrates the problem with local XA transaction. I am copying the comments from the program to descibe what it is doing. Next, I will add a test like this in our regression suite and put more comments in the Derby code changes. /** This test does following 1)Start the network server 2)Start a local xa transaction 3)Do not commit the local XA transaction 4)Shutdown the network server 5)Start the server again Before the fix for DERBY-4053 went in, step 4) would not shutdown the server properly because of the pending local XA transaction. During the server shutdown, we try to close all the open connections and close on the XA connection results into an exception because there is still a pending transaction. That exception is not handled by the server and because of that, all the code necessary to shutdown the server is not executed. The next time around, step 5), when we try to bring up the server again, it ends up hanging 2009-07-09 21:21:28.828 GMT : Invalid reply from network server: Insufficient data. 2009-07-09 21:21:28.843 GMT : Could not listen on port 1527 on host 127.0.0.1: java.net.BindException: Address already in use: JVM_Bind The patch attached to the jira makes sure that before calling close on local XA transaction, we first rollback any transaction active on the connection. We should also work on server shutdown code to handle the exceptions better so that it does not stop half way. */
          Hide
          Mamta A. Satoor added a comment -

          Recopying the standalone repro with correct name this time around.

          Show
          Mamta A. Satoor added a comment - Recopying the standalone repro with correct name this time around.
          Hide
          Mamta A. Satoor added a comment -

          Attaching the patch DERBY4053_patch2_diff_July102009.txt which is ready for commit. I have added a new test which will not hang anymore because of the pending local XA transaction. If there are no comments on the patch, I will plan on committing it early next week. I will also go ahead and enter new jira entry for doing a better job of handling the exception during server shutdown.

          Show
          Mamta A. Satoor added a comment - Attaching the patch DERBY4053_patch2_diff_July102009.txt which is ready for commit. I have added a new test which will not hang anymore because of the pending local XA transaction. If there are no comments on the patch, I will plan on committing it early next week. I will also go ahead and enter new jira entry for doing a better job of handling the exception during server shutdown.
          Hide
          Kathey Marsden added a comment -

          Thank you Mamta for your continued work on this issue. I didn't apply the patch but just looked at the diff and have the following comments. In EmbedConnection we have:
          import org.apache.derby.iapi.jdbc.BrokeredConnectionControl;
          which I don't think is needed.

          and a small typo in the test "why" instead of "while"
          bring the server down why the local xa transaction is still active

          I wonder if it would be cleaner instead of:
          ClientXADataSource ds = new ClientXADataSource();
          + ds.setPortNumber(TestConfiguration.getCurrent().getPort());
          + ds.setDatabaseName("wombat");
          + ds.setCreateDatabase("create");

          to use J2EEDataSource.getXADataSource()

          Do you think you can check this into 10.5 on Monday so we get a nightly run before I make the RC on Tuesday?

          Show
          Kathey Marsden added a comment - Thank you Mamta for your continued work on this issue. I didn't apply the patch but just looked at the diff and have the following comments. In EmbedConnection we have: import org.apache.derby.iapi.jdbc.BrokeredConnectionControl; which I don't think is needed. and a small typo in the test "why" instead of "while" bring the server down why the local xa transaction is still active I wonder if it would be cleaner instead of: ClientXADataSource ds = new ClientXADataSource(); + ds.setPortNumber(TestConfiguration.getCurrent().getPort()); + ds.setDatabaseName("wombat"); + ds.setCreateDatabase("create"); to use J2EEDataSource.getXADataSource() Do you think you can check this into 10.5 on Monday so we get a nightly run before I make the RC on Tuesday?
          Hide
          Mamta A. Satoor added a comment -

          Thanks for looking at the patch, Kathey. I will work on your comments and go ahead and commit the patch.

          Show
          Mamta A. Satoor added a comment - Thanks for looking at the patch, Kathey. I will work on your comments and go ahead and commit the patch.
          Hide
          Kathey Marsden added a comment -

          Mamta, could you also run jdbcapi.XATest with your patch. It hasn't been enabled (DERBY-4155) because it sometimes hung when run in a suite because of this issue. You will need to sync up to run it because I just checked in a fix to the test to make it run cleanly.

          Show
          Kathey Marsden added a comment - Mamta, could you also run jdbcapi.XATest with your patch. It hasn't been enabled ( DERBY-4155 ) because it sometimes hung when run in a suite because of this issue. You will need to sync up to run it because I just checked in a fix to the test to make it run cleanly.
          Hide
          Mamta A. Satoor added a comment -

          Kathey, when I ran the XATest, it failed
          $ java junit.textui.TestRunner org.apache.derbyTesting.functionTests.tests.jdbcapi.XATest
          .....EE
          Time: 10.031
          There were 2 errors:
          1) XATest:embeddedjava.sql.SQLException: The conglomerate (-1) requested does not exist.
          at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(SQLExceptionFactory40.java:95)
          at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Util.java:201)
          at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(TransactionResourceImpl.java:391)
          at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(TransactionResourceImpl.java:346)
          at org.apache.derby.impl.jdbc.EmbedConnection.handleException(EmbedConnection.java:2201)
          at org.apache.derby.impl.jdbc.ConnectionChild.handleException(ConnectionChild.java:81)
          at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(EmbedStatement.java:1323)
          at org.apache.derby.impl.jdbc.EmbedStatement.execute(EmbedStatement.java:625)
          at org.apache.derby.impl.jdbc.EmbedStatement.executeUpdate(EmbedStatement.java:175)
          at org.apache.derbyTesting.junit.JDBC.dropSchema(JDBC.java:323)
          at org.apache.derbyTesting.junit.CleanDatabaseTestSetup.removeObjects(CleanDatabaseTestSetup.java:231)
          at org.apache.derbyTesting.junit.CleanDatabaseTestSetup.cleanDatabase(CleanDatabaseTestSetup.java:164)
          at org.apache.derbyTesting.junit.CleanDatabaseTestSetup.tearDown(CleanDatabaseTestSetup.java:151)
          at junit.extensions.TestSetup$1.protect(TestSetup.java:20)
          at junit.extensions.TestSetup.run(TestSetup.java:23)
          at org.apache.derbyTesting.junit.BaseTestSetup.run(BaseTestSetup.java:57)
          Caused by: java.sql.SQLException: The conglomerate (-1) requested does not exist.
          at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(SQLExceptionFactory.java:45)
          at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(SQLExceptionFactory40.java:119)
          at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(SQLExceptionFactory40.java:70)
          ... 21 more
          Caused by: ERROR XSAI2: The conglomerate (-1) requested does not exist.
          at org.apache.derby.iapi.error.StandardException.newException(StandardException.java:286)
          at org.apache.derby.impl.store.access.RAMTransaction.findExistingConglomerate(RAMTransaction.java:399)
          at org.apache.derby.impl.store.access.RAMTransaction.openConglomerate(RAMTransaction.java:1308)
          at org.apache.derby.impl.sql.catalog.DataDictionaryImpl.isSchemaReferenced(DataDictionaryImpl.java:2310)
          at org.apache.derby.impl.sql.catalog.DataDictionaryImpl.isSchemaEmpty(DataDictionaryImpl.java:2245)
          at org.apache.derby.iapi.sql.dictionary.SchemaDescriptor.drop(SchemaDescriptor.java:419)
          at org.apache.derby.impl.sql.execute.DropSchemaConstantAction.executeConstantAction(DropSchemaConstantAction.java:99)
          at org.apache.derby.impl.sql.execute.MiscResultSet.open(MiscResultSet.java:64)
          at org.apache.derby.impl.sql.GenericPreparedStatement.executeStmt(GenericPreparedStatement.java:416)
          at org.apache.derby.impl.sql.GenericPreparedStatement.execute(GenericPreparedStatement.java:297)
          at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(EmbedStatement.java:1235)
          ... 15 more
          2) XATest:clientjava.sql.SQLException: The conglomerate (-1) requested does notexist.
          at org.apache.derby.client.am.SQLExceptionFactory40.getSQLException(SQLExceptionFactory40.java:96)
          at org.apache.derby.client.am.SqlException.getSQLException(SqlException.java:358)
          at org.apache.derby.client.am.Statement.executeUpdate(Statement.java:509)
          at org.apache.derbyTesting.junit.JDBC.dropSchema(JDBC.java:323)
          at org.apache.derbyTesting.junit.CleanDatabaseTestSetup.removeObjects(CleanDatabaseTestSetup.java:231)
          at org.apache.derbyTesting.junit.CleanDatabaseTestSetup.cleanDatabase(CleanDatabaseTestSetup.java:164)
          at org.apache.derbyTesting.junit.CleanDatabaseTestSetup.setUp(CleanDatabaseTestSetup.java:109)
          at junit.extensions.TestSetup$1.protect(TestSetup.java:18)
          at junit.extensions.TestSetup.run(TestSetup.java:23)
          at org.apache.derbyTesting.junit.BaseTestSetup.run(BaseTestSetup.java:57)
          at junit.extensions.TestDecorator.basicRun(TestDecorator.java:22)
          at junit.extensions.TestSetup$1.protect(TestSetup.java:19)
          at junit.extensions.TestSetup.run(TestSetup.java:23)
          at org.apache.derbyTesting.junit.BaseTestSetup.run(BaseTestSetup.java:57)
          at junit.extensions.TestDecorator.basicRun(TestDecorator.java:22)
          at junit.extensions.TestSetup$1.protect(TestSetup.java:19)
          at junit.extensions.TestSetup.run(TestSetup.java:23)
          Caused by: org.apache.derby.client.am.SqlException: The conglomerate (-1) requested does not exist.
          at org.apache.derby.client.am.Statement.completeSqlca(Statement.java:1833)
          at org.apache.derby.client.am.Statement.completeExecuteImmediate(Statement.java:1421)
          at org.apache.derby.client.net.NetStatementReply.parseEXCSQLIMMreply(NetStatementReply.java:208)
          at org.apache.derby.client.net.NetStatementReply.readExecuteImmediate(NetStatementReply.java:59)
          at org.apache.derby.client.net.StatementReply.readExecuteImmediate(StatementReply.java:45)
          at org.apache.derby.client.net.NetStatement.readExecuteImmediate_(NetStatement.java:128)
          at org.apache.derby.client.am.Statement.readExecuteImmediate(Statement.java:1417)
          at org.apache.derby.client.am.Statement.flowExecute(Statement.java:2120)
          at org.apache.derby.client.am.Statement.executeUpdateX(Statement.java:514)
          at org.apache.derby.client.am.Statement.executeUpdate(Statement.java:500)
          ... 22 more

          FAILURES!!!
          Tests run: 5, Failures: 0, Errors: 2

          $

          Show
          Mamta A. Satoor added a comment - Kathey, when I ran the XATest, it failed $ java junit.textui.TestRunner org.apache.derbyTesting.functionTests.tests.jdbcapi.XATest .....EE Time: 10.031 There were 2 errors: 1) XATest:embeddedjava.sql.SQLException: The conglomerate (-1) requested does not exist. at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(SQLExceptionFactory40.java:95) at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Util.java:201) at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(TransactionResourceImpl.java:391) at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(TransactionResourceImpl.java:346) at org.apache.derby.impl.jdbc.EmbedConnection.handleException(EmbedConnection.java:2201) at org.apache.derby.impl.jdbc.ConnectionChild.handleException(ConnectionChild.java:81) at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(EmbedStatement.java:1323) at org.apache.derby.impl.jdbc.EmbedStatement.execute(EmbedStatement.java:625) at org.apache.derby.impl.jdbc.EmbedStatement.executeUpdate(EmbedStatement.java:175) at org.apache.derbyTesting.junit.JDBC.dropSchema(JDBC.java:323) at org.apache.derbyTesting.junit.CleanDatabaseTestSetup.removeObjects(CleanDatabaseTestSetup.java:231) at org.apache.derbyTesting.junit.CleanDatabaseTestSetup.cleanDatabase(CleanDatabaseTestSetup.java:164) at org.apache.derbyTesting.junit.CleanDatabaseTestSetup.tearDown(CleanDatabaseTestSetup.java:151) at junit.extensions.TestSetup$1.protect(TestSetup.java:20) at junit.extensions.TestSetup.run(TestSetup.java:23) at org.apache.derbyTesting.junit.BaseTestSetup.run(BaseTestSetup.java:57) Caused by: java.sql.SQLException: The conglomerate (-1) requested does not exist. at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(SQLExceptionFactory.java:45) at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(SQLExceptionFactory40.java:119) at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(SQLExceptionFactory40.java:70) ... 21 more Caused by: ERROR XSAI2: The conglomerate (-1) requested does not exist. at org.apache.derby.iapi.error.StandardException.newException(StandardException.java:286) at org.apache.derby.impl.store.access.RAMTransaction.findExistingConglomerate(RAMTransaction.java:399) at org.apache.derby.impl.store.access.RAMTransaction.openConglomerate(RAMTransaction.java:1308) at org.apache.derby.impl.sql.catalog.DataDictionaryImpl.isSchemaReferenced(DataDictionaryImpl.java:2310) at org.apache.derby.impl.sql.catalog.DataDictionaryImpl.isSchemaEmpty(DataDictionaryImpl.java:2245) at org.apache.derby.iapi.sql.dictionary.SchemaDescriptor.drop(SchemaDescriptor.java:419) at org.apache.derby.impl.sql.execute.DropSchemaConstantAction.executeConstantAction(DropSchemaConstantAction.java:99) at org.apache.derby.impl.sql.execute.MiscResultSet.open(MiscResultSet.java:64) at org.apache.derby.impl.sql.GenericPreparedStatement.executeStmt(GenericPreparedStatement.java:416) at org.apache.derby.impl.sql.GenericPreparedStatement.execute(GenericPreparedStatement.java:297) at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(EmbedStatement.java:1235) ... 15 more 2) XATest:clientjava.sql.SQLException: The conglomerate (-1) requested does notexist. at org.apache.derby.client.am.SQLExceptionFactory40.getSQLException(SQLExceptionFactory40.java:96) at org.apache.derby.client.am.SqlException.getSQLException(SqlException.java:358) at org.apache.derby.client.am.Statement.executeUpdate(Statement.java:509) at org.apache.derbyTesting.junit.JDBC.dropSchema(JDBC.java:323) at org.apache.derbyTesting.junit.CleanDatabaseTestSetup.removeObjects(CleanDatabaseTestSetup.java:231) at org.apache.derbyTesting.junit.CleanDatabaseTestSetup.cleanDatabase(CleanDatabaseTestSetup.java:164) at org.apache.derbyTesting.junit.CleanDatabaseTestSetup.setUp(CleanDatabaseTestSetup.java:109) at junit.extensions.TestSetup$1.protect(TestSetup.java:18) at junit.extensions.TestSetup.run(TestSetup.java:23) at org.apache.derbyTesting.junit.BaseTestSetup.run(BaseTestSetup.java:57) at junit.extensions.TestDecorator.basicRun(TestDecorator.java:22) at junit.extensions.TestSetup$1.protect(TestSetup.java:19) at junit.extensions.TestSetup.run(TestSetup.java:23) at org.apache.derbyTesting.junit.BaseTestSetup.run(BaseTestSetup.java:57) at junit.extensions.TestDecorator.basicRun(TestDecorator.java:22) at junit.extensions.TestSetup$1.protect(TestSetup.java:19) at junit.extensions.TestSetup.run(TestSetup.java:23) Caused by: org.apache.derby.client.am.SqlException: The conglomerate (-1) requested does not exist. at org.apache.derby.client.am.Statement.completeSqlca(Statement.java:1833) at org.apache.derby.client.am.Statement.completeExecuteImmediate(Statement.java:1421) at org.apache.derby.client.net.NetStatementReply.parseEXCSQLIMMreply(NetStatementReply.java:208) at org.apache.derby.client.net.NetStatementReply.readExecuteImmediate(NetStatementReply.java:59) at org.apache.derby.client.net.StatementReply.readExecuteImmediate(StatementReply.java:45) at org.apache.derby.client.net.NetStatement.readExecuteImmediate_(NetStatement.java:128) at org.apache.derby.client.am.Statement.readExecuteImmediate(Statement.java:1417) at org.apache.derby.client.am.Statement.flowExecute(Statement.java:2120) at org.apache.derby.client.am.Statement.executeUpdateX(Statement.java:514) at org.apache.derby.client.am.Statement.executeUpdate(Statement.java:500) ... 22 more FAILURES!!! Tests run: 5, Failures: 0, Errors: 2 $
          Hide
          Mamta A. Satoor added a comment -

          Strange, but I am not able to reproduce the conglomerate does not exist error any more. Not sure why i got it. I cleaned up the test dir and reran the tests and XATest runs fine after that.

          Show
          Mamta A. Satoor added a comment - Strange, but I am not able to reproduce the conglomerate does not exist error any more. Not sure why i got it. I cleaned up the test dir and reran the tests and XATest runs fine after that.
          Hide
          Mamta A. Satoor added a comment -

          Posting an updated patch DERBY4053_patch3_diff_July102009.txt. It has changes suggested by Kathey on the previous patch. I will run the entire junit suite and derbyall. This is on trunk.

          Show
          Mamta A. Satoor added a comment - Posting an updated patch DERBY4053_patch3_diff_July102009.txt. It has changes suggested by Kathey on the previous patch. I will run the entire junit suite and derbyall. This is on trunk.
          Hide
          Kathey Marsden added a comment -

          Hi Mamta,

          Short story is I don't think this is because of your change and if you start with a clean directory the test should pass.

          I think maybe you got this because you did not clean up your test directory after synching up the DERBY-712 changes. If I create a schema with the 10.6.0.0.791528 build (before DERBY-712) and then try to drop it with a build after the change I get the same error.
          ij> drop schema myschema restrict;
          ERROR XSAI2: The conglomerate (-1) requested does not exist.

          I think this is to be expected. Suran could you confirm that all 10.6 databases made before the DERBY-712 patch are expected to be unusable?

          Show
          Kathey Marsden added a comment - Hi Mamta, Short story is I don't think this is because of your change and if you start with a clean directory the test should pass. I think maybe you got this because you did not clean up your test directory after synching up the DERBY-712 changes. If I create a schema with the 10.6.0.0.791528 build (before DERBY-712 ) and then try to drop it with a build after the change I get the same error. ij> drop schema myschema restrict; ERROR XSAI2: The conglomerate (-1) requested does not exist. I think this is to be expected. Suran could you confirm that all 10.6 databases made before the DERBY-712 patch are expected to be unusable?
          Hide
          Knut Anders Hatlen added a comment -

          Thanks for doing this, Mamta. Patch #3 looks fine to me, except some nits like mixing tabs and spaces on the same line and trailing blanks. It would also be good if the comment in Database.close() explained why and not only what the code does. Thanks.

          Show
          Knut Anders Hatlen added a comment - Thanks for doing this, Mamta. Patch #3 looks fine to me, except some nits like mixing tabs and spaces on the same line and trailing blanks. It would also be good if the comment in Database.close() explained why and not only what the code does. Thanks.
          Hide
          Suran Jayathilaka added a comment -
          Show
          Suran Jayathilaka added a comment - With reference to http://issues.apache.org/jira/browse/DERBY-712?focusedCommentId=12728739&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12728739 I think it's safe to say 1.6 databases from before the DERBY-712 changes will now be unusable.
          Hide
          Mamta A. Satoor added a comment -

          Knut, thanks for looking at the patch. I will add comments to Database.close() for the code changes that I have made. As for mixing tabs and spaces, maybe someone can help me with that. I am using eclipse and it must be adding the tabs automatically because when I start a new codeline, it puts me automatically on what appears to be right place on that line to right the code. Maybe I need to change some settings.

          Also, I ran derball and junit and they all ran fine. After adding the comments, I will go ahead and check in the changes into trunk and then work on merging it to 10.5 codeline.

          Show
          Mamta A. Satoor added a comment - Knut, thanks for looking at the patch. I will add comments to Database.close() for the code changes that I have made. As for mixing tabs and spaces, maybe someone can help me with that. I am using eclipse and it must be adding the tabs automatically because when I start a new codeline, it puts me automatically on what appears to be right place on that line to right the code. Maybe I need to change some settings. Also, I ran derball and junit and they all ran fine. After adding the comments, I will go ahead and check in the changes into trunk and then work on merging it to 10.5 codeline.
          Hide
          Mamta A. Satoor added a comment -

          Committed changes to trunk with revision 793588 and with commit comments
          **************
          DERBY-4053 The local XA connections were not getting rollback during Database close and
          this resulted into exception at connection close if the connection object had any pending
          transaction. Made changes so that all the connections(except global XA connections) will
          have their transactions rolled back before those connections are closed.
          **************

          Also, I think have taken care of space and tab mixing. Will work on migrating this to 10.5

          Show
          Mamta A. Satoor added a comment - Committed changes to trunk with revision 793588 and with commit comments ************** DERBY-4053 The local XA connections were not getting rollback during Database close and this resulted into exception at connection close if the connection object had any pending transaction. Made changes so that all the connections(except global XA connections) will have their transactions rolled back before those connections are closed. ************** Also, I think have taken care of space and tab mixing. Will work on migrating this to 10.5
          Hide
          Kathey Marsden added a comment -

          Thanks Mamta for the fix. I still can't quite figure why this would occur on 10.4. DERBY-3319 wasn't fixed there. Does anyone have any theories? Perhaps the 10.4 issue is some other problem during shutdown that will be exposed/resolved once we properly handle and log exceptions during shutdown.

          Show
          Kathey Marsden added a comment - Thanks Mamta for the fix. I still can't quite figure why this would occur on 10.4. DERBY-3319 wasn't fixed there. Does anyone have any theories? Perhaps the 10.4 issue is some other problem during shutdown that will be exposed/resolved once we properly handle and log exceptions during shutdown.
          Hide
          Mamta A. Satoor added a comment -

          Created DERBY-4304 for
          1) Make sure an exception during shutdown processing does not prevent the remaining shutdown tasks, like closing the server socket from occurring.
          2) Make sure any exceptions that occur in shutdown processing are reported to the console.

          Show
          Mamta A. Satoor added a comment - Created DERBY-4304 for 1) Make sure an exception during shutdown processing does not prevent the remaining shutdown tasks, like closing the server socket from occurring. 2) Make sure any exceptions that occur in shutdown processing are reported to the console.
          Hide
          Mamta A. Satoor added a comment -

          This is my second time running junit suite on 10.5 codeline after merging the changes into 10.5 and both times I am running into problems with upgrade tests. Not sure if there is a known issue with running upgrade tests on 10.5. I see quite a different errors. Wish had run the junit tests in text runner rather than gui runner. But here are couple failures
          testOldVersion:Old minor(driver): expected<1> but was <5>
          testGrantRevokeStatements(...Changes10_2)Unexpected SQL state expected <..X01> but was <..Z60>
          testDatabaseOwnerChange(..10_3)AUTHORIZATIONID not valid for SYSIBM expected:<DBA> but was <APP>
          ....

          Does anyone have the cycle to merge my trunk patch to 10.5 codeline and see if the tests run fine on their machine?

          In the meantime, I will run the derbyall on merged 10.5 and see how that goes

          Show
          Mamta A. Satoor added a comment - This is my second time running junit suite on 10.5 codeline after merging the changes into 10.5 and both times I am running into problems with upgrade tests. Not sure if there is a known issue with running upgrade tests on 10.5. I see quite a different errors. Wish had run the junit tests in text runner rather than gui runner. But here are couple failures testOldVersion:Old minor(driver): expected<1> but was <5> testGrantRevokeStatements(...Changes10_2)Unexpected SQL state expected <..X01> but was <..Z60> testDatabaseOwnerChange(..10_3)AUTHORIZATIONID not valid for SYSIBM expected:<DBA> but was <APP> .... Does anyone have the cycle to merge my trunk patch to 10.5 codeline and see if the tests run fine on their machine? In the meantime, I will run the derbyall on merged 10.5 and see how that goes
          Hide
          Kathey Marsden added a comment -

          Hi Mamta, I have a test run going now but will test your patch as soon as that is done. I have never seen these failures before but I don't think upgrade tests run for network server at all so think probably this is not related to your change. I'll let you know how my run goes.

          Show
          Kathey Marsden added a comment - Hi Mamta, I have a test run going now but will test your patch as soon as that is done. I have never seen these failures before but I don't think upgrade tests run for network server at all so think probably this is not related to your change. I'll let you know how my run goes.
          Hide
          Mamta A. Satoor added a comment -

          derbyall ran fine out my machine with intermittent failure (I had seen this failure on trunk too. forgot to mention it)

                          • Diff file derbyall/storeall/storeunit/T_RawStoreFactory.diff
              • Start: T_RawStoreFactory jdk1.6.0 storeall:storeunit 2009-07-13 14:34:14 ***
                2 del
                < – Unit Test T_RawStoreFactory finished
                2 add
                > Shutting down due to unit test failure.
                Test Failed.
              • End: T_RawStoreFactory jdk1.6.0 storeall:storeunit 2009-07-13 14:34:31 ***
          Show
          Mamta A. Satoor added a comment - derbyall ran fine out my machine with intermittent failure (I had seen this failure on trunk too. forgot to mention it) Diff file derbyall/storeall/storeunit/T_RawStoreFactory.diff Start: T_RawStoreFactory jdk1.6.0 storeall:storeunit 2009-07-13 14:34:14 *** 2 del < – Unit Test T_RawStoreFactory finished 2 add > Shutting down due to unit test failure. Test Failed. End: T_RawStoreFactory jdk1.6.0 storeall:storeunit 2009-07-13 14:34:31 ***
          Hide
          Myrna van Lunteren added a comment -

          That's like DERBY-2635 (which is closed)...But I think it's important to check on derby.log with this test.

          Show
          Myrna van Lunteren added a comment - That's like DERBY-2635 (which is closed)...But I think it's important to check on derby.log with this test.
          Hide
          Kathey Marsden added a comment -

          or DERBY-3993 I have seen the test fail with the signature you describe but it has the observers error in derby.log.

          Show
          Kathey Marsden added a comment - or DERBY-3993 I have seen the test fail with the signature you describe but it has the observers error in derby.log.
          Hide
          Mamta A. Satoor added a comment -

          I looked through derby.log and see that it has observers error in derby.log
          [main] FAIL - org.apache.derby.shared.common.sanity.AssertFailure: ASSERT FAILED There should be 0 observers, but we still have 1 observers.
          org.apache.derby.shared.common.sanity.AssertFailure: ASSERT FAILED There should be 0 observers, but we still have 1 observers.
          at org.apache.derby.shared.common.sanity.SanityManager.THROWASSERT(SanityManager.java:162)
          at org.apache.derby.shared.common.sanity.SanityManager.THROWASSERT(SanityManager.java:147)
          at org.apache.derby.impl.store.raw.xact.Xact.doComplete(Xact.java:1969)
          at org.apache.derby.impl.store.raw.xact.Xact.preComplete(Xact.java:1925)
          at org.apache.derby.impl.store.raw.xact.Xact.prepareCommit(Xact.java:769)
          at org.apache.derby.impl.store.raw.xact.Xact.commit(Xact.java:882)
          at org.apache.derby.impl.store.raw.xact.Xact.commit(Xact.java:679)
          at org.apache.derbyTesting.unitTests.store.T_Util.t_commit(T_Util.java:840)
          at org.apache.derbyTesting.unitTests.store.T_RawStoreFactory.TC001(T_RawStoreFactory.java:7310)
          at org.apache.derbyTesting.unitTests.store.T_RawStoreFactory.runTempTests(T_RawStoreFactory.java:417)
          at org.apache.derbyTesting.unitTests.store.T_RawStoreFactory.runTestSet(T_RawStoreFactory.java:248)
          at org.apache.derbyTesting.unitTests.harness.T_MultiIterations.runTests(T_MultiIterations.java:95)
          at org.apache.derbyTesting.unitTests.harness.T_MultiThreadedIterations.runTests(T_MultiThreadedIterations.java:92)
          at org.apache.derbyTesting.unitTests.harness.T_Generic.Execute(T_Generic.java:118)
          at org.apache.derbyTesting.unitTests.harness.BasicUnitTestManager.runATest(BasicUnitTestManager.java:184)
          at org.apache.derbyTesting.unitTests.harness.BasicUnitTestManager.runTests(BasicUnitTestManager.java:246)
          at org.apache.derbyTesting.unitTests.harness.BasicUnitTestManager.boot(BasicUnitTestManager.java:93)
          at org.apache.derby.impl.services.monitor.BaseMonitor.boot(BaseMonitor.java:2021)
          at org.apache.derby.impl.services.monitor.TopService.bootModule(TopService.java:333)
          at org.apache.derby.impl.services.monitor.BaseMonitor.bootService(BaseMonitor.java:1858)
          at org.apache.derby.impl.services.monitor.BaseMonitor.startServices(BaseMonitor.java:997)
          at org.apache.derby.impl.services.monitor.BaseMonitor.runWithState(BaseMonitor.java:429)
          at org.apache.derby.impl.services.monitor.FileMonitor.<init>(FileMonitor.java:60)
          at org.apache.derby.iapi.services.monitor.Monitor.startMonitor(Monitor.java:289)
          at org.apache.derbyTesting.unitTests.harness.UnitTestMain.main(UnitTestMain.java:51)

          Show
          Mamta A. Satoor added a comment - I looked through derby.log and see that it has observers error in derby.log [main] FAIL - org.apache.derby.shared.common.sanity.AssertFailure: ASSERT FAILED There should be 0 observers, but we still have 1 observers. org.apache.derby.shared.common.sanity.AssertFailure: ASSERT FAILED There should be 0 observers, but we still have 1 observers. at org.apache.derby.shared.common.sanity.SanityManager.THROWASSERT(SanityManager.java:162) at org.apache.derby.shared.common.sanity.SanityManager.THROWASSERT(SanityManager.java:147) at org.apache.derby.impl.store.raw.xact.Xact.doComplete(Xact.java:1969) at org.apache.derby.impl.store.raw.xact.Xact.preComplete(Xact.java:1925) at org.apache.derby.impl.store.raw.xact.Xact.prepareCommit(Xact.java:769) at org.apache.derby.impl.store.raw.xact.Xact.commit(Xact.java:882) at org.apache.derby.impl.store.raw.xact.Xact.commit(Xact.java:679) at org.apache.derbyTesting.unitTests.store.T_Util.t_commit(T_Util.java:840) at org.apache.derbyTesting.unitTests.store.T_RawStoreFactory.TC001(T_RawStoreFactory.java:7310) at org.apache.derbyTesting.unitTests.store.T_RawStoreFactory.runTempTests(T_RawStoreFactory.java:417) at org.apache.derbyTesting.unitTests.store.T_RawStoreFactory.runTestSet(T_RawStoreFactory.java:248) at org.apache.derbyTesting.unitTests.harness.T_MultiIterations.runTests(T_MultiIterations.java:95) at org.apache.derbyTesting.unitTests.harness.T_MultiThreadedIterations.runTests(T_MultiThreadedIterations.java:92) at org.apache.derbyTesting.unitTests.harness.T_Generic.Execute(T_Generic.java:118) at org.apache.derbyTesting.unitTests.harness.BasicUnitTestManager.runATest(BasicUnitTestManager.java:184) at org.apache.derbyTesting.unitTests.harness.BasicUnitTestManager.runTests(BasicUnitTestManager.java:246) at org.apache.derbyTesting.unitTests.harness.BasicUnitTestManager.boot(BasicUnitTestManager.java:93) at org.apache.derby.impl.services.monitor.BaseMonitor.boot(BaseMonitor.java:2021) at org.apache.derby.impl.services.monitor.TopService.bootModule(TopService.java:333) at org.apache.derby.impl.services.monitor.BaseMonitor.bootService(BaseMonitor.java:1858) at org.apache.derby.impl.services.monitor.BaseMonitor.startServices(BaseMonitor.java:997) at org.apache.derby.impl.services.monitor.BaseMonitor.runWithState(BaseMonitor.java:429) at org.apache.derby.impl.services.monitor.FileMonitor.<init>(FileMonitor.java:60) at org.apache.derby.iapi.services.monitor.Monitor.startMonitor(Monitor.java:289) at org.apache.derbyTesting.unitTests.harness.UnitTestMain.main(UnitTestMain.java:51)
          Hide
          Kathey Marsden added a comment -

          Hi Mamta,

          I got no failures on 10.5 with suites.All after merging this change to the branch.

          Show
          Kathey Marsden added a comment - Hi Mamta, I got no failures on 10.5 with suites.All after merging this change to the branch.
          Hide
          Mamta A. Satoor added a comment -

          Thanks for running the tests, Kathey. Appreciate it. Will go ahead and commit the changes into 10.5

          Show
          Mamta A. Satoor added a comment - Thanks for running the tests, Kathey. Appreciate it. Will go ahead and commit the changes into 10.5
          Hide
          Mamta A. Satoor added a comment -

          Migrated into 10.5 with revision 793900.

          Show
          Mamta A. Satoor added a comment - Migrated into 10.5 with revision 793900.
          Hide
          Kathey Marsden added a comment -

          Bad news: On trunk, I updated to get Mamta's change and ran tests with 2 other changes. First I applied Tiago's patch for DERBY-4292. Second I tried to enable XATest. I ran suites.All and hit DERBY-4053. I think it was the addition of XATest that did it, not Tiago's change because the initial InvalidReply came during XATest. I will catch and print any exception in shutdown and rerun and hopefully will hit it again and we can see what the issue is now that local transactions are rolling back.

          Show
          Kathey Marsden added a comment - Bad news: On trunk, I updated to get Mamta's change and ran tests with 2 other changes. First I applied Tiago's patch for DERBY-4292 . Second I tried to enable XATest. I ran suites.All and hit DERBY-4053 . I think it was the addition of XATest that did it, not Tiago's change because the initial InvalidReply came during XATest. I will catch and print any exception in shutdown and rerun and hopefully will hit it again and we can see what the issue is now that local transactions are rolling back.
          Hide
          Kathey Marsden added a comment -

          Tiago offered to try to reproduce as well. Attached is the patch that prints out any exceptions during shutdown. This is not for commit.

          Show
          Kathey Marsden added a comment - Tiago offered to try to reproduce as well. Attached is the patch that prints out any exceptions during shutdown. This is not for commit.
          Hide
          Kathey Marsden added a comment -

          With the diag patch, XATest, and the change to check unordered results for checkXATransactionView, and running jdbcapi._Suite I can reproduce the error on shutdown described here: https://issues.apache.org/jira/browse/DERBY-4155?focusedCommentId=12731086&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12731086

          followed later by an InvalidReply and the hang with bind exception in the derby.log

          I also see the following issue shutting down which does not cause problems restarting the server because the JMX cleanup comes after serverSocket.close()
          testSystemShutdown used 406 ms java.lang.NullPointerException
          at org.apache.derby.impl.services.jmx.JMXManagementService.unregisterMBean(JMXManagementService.java:286)
          at org.apache.derby.impl.services.jmx.JMXManagementService.unregisterMBean(JMXManagementService.java:277)
          at org.apache.derby.impl.drda.NetworkServerControlImpl.blockingStart(NetworkServerControlImpl.java:864)
          at sun.reflect.GeneratedMethodAccessor596.invoke(Unknown Source)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
          at java.lang.reflect.Method.invoke(Method.java:599)
          at org.apache.derby.iapi.jdbc.DRDAServerStarter.run(DRDAServerStarter.java:236)
          at java.lang.Thread.run(Thread.java:735)
          .
          testPoolDS used 5234 ms .

          In general ur failure to report and handle errors on shutdown has been masking other issues.

          I think we should proceed as follows:
          1) Change the title of DERBY-4053 to: Network Server does not rollback local transactions on XAConnections on shutdown causing hang on restart with message java.net.BindException: Address already in use: NET_Bind in derby.log

          2) File two new issues described above.

          3)Continue work on DERBY-4304 to expose additonal issues and eliminate the bind error on other circumstances.

          4) Add the unordered result set check as part of the DERBY-4155

          I don't think these tasks should hold up 10.5.2

          Show
          Kathey Marsden added a comment - With the diag patch, XATest, and the change to check unordered results for checkXATransactionView, and running jdbcapi._Suite I can reproduce the error on shutdown described here: https://issues.apache.org/jira/browse/DERBY-4155?focusedCommentId=12731086&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12731086 followed later by an InvalidReply and the hang with bind exception in the derby.log I also see the following issue shutting down which does not cause problems restarting the server because the JMX cleanup comes after serverSocket.close() testSystemShutdown used 406 ms java.lang.NullPointerException at org.apache.derby.impl.services.jmx.JMXManagementService.unregisterMBean(JMXManagementService.java:286) at org.apache.derby.impl.services.jmx.JMXManagementService.unregisterMBean(JMXManagementService.java:277) at org.apache.derby.impl.drda.NetworkServerControlImpl.blockingStart(NetworkServerControlImpl.java:864) at sun.reflect.GeneratedMethodAccessor596.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37) at java.lang.reflect.Method.invoke(Method.java:599) at org.apache.derby.iapi.jdbc.DRDAServerStarter.run(DRDAServerStarter.java:236) at java.lang.Thread.run(Thread.java:735) . testPoolDS used 5234 ms . In general ur failure to report and handle errors on shutdown has been masking other issues. I think we should proceed as follows: 1) Change the title of DERBY-4053 to: Network Server does not rollback local transactions on XAConnections on shutdown causing hang on restart with message java.net.BindException: Address already in use: NET_Bind in derby.log 2) File two new issues described above. 3)Continue work on DERBY-4304 to expose additonal issues and eliminate the bind error on other circumstances. 4) Add the unordered result set check as part of the DERBY-4155 I don't think these tasks should hold up 10.5.2
          Hide
          Kathey Marsden added a comment -

          Verified fix in 10.5 and trunk. There were other issues on shutdown identified that may cause a similar symptom, but those will be logged separately.

          Show
          Kathey Marsden added a comment - Verified fix in 10.5 and trunk. There were other issues on shutdown identified that may cause a similar symptom, but those will be logged separately.
          Hide
          Mamta A. Satoor added a comment -

          Backported chages to 10.4 and 10.3 codelines

          Show
          Mamta A. Satoor added a comment - Backported chages to 10.4 and 10.3 codelines

            People

            • Assignee:
              Mamta A. Satoor
              Reporter:
              Kathey Marsden
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development