Derby: DERBY-2871

XATransactionTest gets XaException: Error executing a XAResource.commit(), server returned XAER_PROTO.

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 10.3.1.4
    • Fix Version/s: 10.4.1.3
    • Component/s: JDBC
    • Labels:
      None
    • Environment:
      OS: HP-UX v1.11 i
      JDK: HP 1.5.0.03

      Description

      Method: org.apache.derbyTesting.functionTests.tests.jdbcapi.XATransactionTest
      Signature:
      %XAER_PROTO : Error executing a XAResource.commit(), server returned XAER_PROTO%

      Also see: http://dbtg.thresher.com/derby/test/10.3.1.0_RC/jvm1.5/testing/testlog/hpux/548006-suitesAll_diff.txt

      1. derby-2871.v2.diff
        23 kB
        Dyre Tjeldvoll
      2. derby-2871_NOT_FOR_COMMIT.diff
        22 kB
        Dyre Tjeldvoll
      3. DERBY-2871_020108.diff
        20 kB
        Myrna van Lunteren
      4. d2871-test.stat
        0.2 kB
        Julius Stroffek
      5. d2871-test.diff
        3 kB
        Julius Stroffek
      6. d2871.stat
        0.2 kB
        Julius Stroffek
      7. d2871.stat
        0.6 kB
        Julius Stroffek
      8. d2871.stat
        0.6 kB
        Julius Stroffek
      9. d2871.stat
        0.6 kB
        Julius Stroffek
      10. d2871.diff
        1 kB
        Julius Stroffek
      11. d2871.diff
        20 kB
        Julius Stroffek
      12. d2871.diff
        20 kB
        Julius Stroffek
      13. d2871.diff
        20 kB
        Julius Stroffek

          Activity

          Dyre Tjeldvoll added a comment -

          Committed revision 647078.

          (derbyall and suites.All both ran without failures)

          Dyre Tjeldvoll added a comment -

          Attaching a new patch (derby-2871.v2.diff) which re-enables the lock timeout setting. With this patch I could run the entire jdbcapi suite without failures.

          Dyre Tjeldvoll added a comment -

          Thanks for the feedback, Julius. I know that you have other stuff on your plate these days.

          I see now why the lock timeouts had to be set back to the defaults in the suite method. I'll put that back and add a comment explaining why it is done that way.

          Julius Stroffek added a comment -

          Dyre, thanks for working on this.

          This happens because SetTransactionIsolationTest in the jdbcapi test package sets derby.locks.waitTimeout and derby.locks.deadlockTimeout to 3 and does not restore the defaults. The xaTransactionTimeout test then runs with a lock timeout smaller than the XA transaction timeout, so the select on line 233, which checks that the locks have been released, times out before the transactions themselves do. That's why I set those properties back to their default values in the previous patches.

          I think that if you use DatabasePropertyTestSetup.setLockTimeouts to set up the timeouts in the "suite" method, the test will stop failing.

          The mistakes in my last patch are stupid ones, made mostly because I did not spend the required time on those changes. I have had a patch prepared for this for some time but wanted to spend more time checking that I had addressed all the comments. Your patch looks good to me; you also performed some more cleanups in the test (removed the unnecessary Assert class qualifier when calling assert functions).

          Dyre Tjeldvoll added a comment -

          Attaching derby-2871_NOT_FOR_COMMIT.diff, in which I have tried to address the latest review comments (and which applies cleanly to the current trunk). With the new patch I can run XATransactionTest by itself, but when it runs as part of suites.All it fails with a lock timeout on line 233 or 234. I'm not sure how best to modify the test to achieve the intended effect, so I have uploaded a preliminary patch for people to comment on.

          Knut Anders Hatlen added a comment -

          It seems like Dan's comment from 05/Feb/08 12:19 PM hasn't been addressed.

          ResourceAdapterImpl.cancelXATransaction() calls findConnection() twice. Is that intentional?

          Is the comment about synchronization in ResourceAdapter.cancelXATransaction() still valid in the latest patch?

          Could you comment on why you call DatabasePropertyTestSetup.setLockTimeouts() in XATransactionTest.suite()? As far as I can see, the timeout values specified (20 sec deadlock, 60 sec wait) are the same as the default values, so it doesn't change anything.

          Tiny nit: assertTrue(xaConn[i] != null) could be replaced with assertNotNull(xaConn[i]).

          Julius Stroffek added a comment -

          Can somebody commit this patch if there are no objections?

          Julius Stroffek added a comment -

          Dan, Knut: Thanks for catching this. I fixed all of those. I ran all the testing again without failures.

          Yes, it might be possible to run just the XATransactionTest, since no other test uses the changed code, but just to be sure I ran all of them again.

          Attaching the latest patch...

          Knut Anders Hatlen added a comment -

          Some nits:

          • typo in new error message (J135): beeing -> being
          • inconsistent use of tabs/spaces in the new methods in ResourceAdapter, ResourceAdapterImpl and XADatabase
          • unnecessary whitespace diff (adding trailing spaces) in XATransactionState.java, chunk @@ -200,8 +199,7 @@.
          Daniel John Debrunner added a comment -

          Why does ResourceAdapter.cancelXATransaction() need to hold its synchronization for the lifetime of the call? It's explicitly called out in the javadoc and comments in this issue, but I don't see any reason for it. The rollback could take some time, and during that time any other global transaction work that needs the ResourceAdapter will be blocked.
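
The concern above can be illustrated with a small sketch. This is not the Derby code: `XidRegistry`, `Tx`, and their methods are hypothetical stand-ins chosen for illustration. The lookup and removal happen under the registry lock, while the potentially slow rollback runs after the lock has been released, so other callers of the registry are not blocked in the meantime.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a global transaction that can be rolled back. */
class Tx {
    private boolean rolledBack;

    /** Simulates a rollback; a real rollback could take a long time. */
    synchronized void rollback() {
        rolledBack = true;
    }

    synchronized boolean isRolledBack() {
        return rolledBack;
    }
}

/** Hypothetical registry mapping global transaction ids to transactions. */
class XidRegistry {
    private final Map<String, Tx> active = new HashMap<>();

    synchronized void register(String xid, Tx tx) {
        active.put(xid, tx);
    }

    synchronized boolean contains(String xid) {
        return active.containsKey(xid);
    }

    /**
     * Cancels a transaction. Only the map update is done under the
     * registry lock; the rollback itself runs outside it.
     * Returns true if a transaction was found and rolled back.
     */
    boolean cancel(String xid) {
        Tx tx;
        synchronized (this) {
            tx = active.remove(xid); // short critical section
        }
        if (tx == null) {
            return false;            // unknown or already finished
        }
        tx.rollback();               // registry lock is not held here
        return true;
    }
}
```

Whether this narrowing is safe in the real ResourceAdapter depends on invariants that are not visible in this thread, so treat it only as an illustration of the question being asked.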

          Daniel John Debrunner added a comment -

          Can some comments be added to this part of the test? It's a little unclear what's going on in the catch block.
          The exception that caused the failure is being thrown away. This is a problem if the test fails in the future: the exception normally points to where the problem is, and losing that information makes debugging much harder, especially with an intermittent problem.

          + try {
          +     stm = getConnection().createStatement();
          +     rs = stm.executeQuery("select count(*) from XATT");
          +     rs.next();
          + } catch (SQLException e) {
          +     rs = stm.executeQuery("select global_xid from syscs_diag.transaction_table "
          +         + "where global_xid is not null order by global_xid");
          +     StringBuffer sb = new StringBuffer("Global transactions in progress:\n");
          +     while (rs.next()) {
          +         sb.append(rs.getString(1));
          +         sb.append("\n");
          +     }
          +     Assert.fail(sb.toString());
          + }
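
One way to keep the original exception, sketched here under assumptions: the helper `FailWithCause.failure` is hypothetical and is not part of the patch or of the Derby test harness. It puts the diagnostic text in the failure message and attaches the caught exception as the cause, so the original stack trace still appears in the test report.

```java
import java.sql.SQLException;

/** Hypothetical helper; not part of the Derby test harness. */
class FailWithCause {
    /**
     * Builds the error to throw when a check fails. The diagnostic text
     * becomes the message and the original exception becomes the cause,
     * so its stack trace is preserved.
     */
    static AssertionError failure(String diagnostics, Throwable original) {
        AssertionError err = new AssertionError(diagnostics);
        err.initCause(original);
        return err;
    }

    public static void main(String[] args) {
        // Illustrative exception only; the message and SQLState are made up
        // for the demo and do not come from a real test run.
        SQLException original = new SQLException("lock timeout", "40XL1");
        AssertionError err =
            failure("Global transactions in progress:\n(none)", original);
        err.printStackTrace(); // shows the message and a "Caused by:" section
    }
}
```

In the catch block above, the test would then end with `throw FailWithCause.failure(sb.toString(), e);` instead of `Assert.fail(sb.toString());`.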

          Myrna van Lunteren added a comment -

          I have no further comments, but again, I'm not familiar with the xa code, maybe someone else can review also.

          Julius Stroffek added a comment -

          I added/changed

          • missing javadoc with @see tag
          • renamed isFinished to performTimeoutRollback
          • changed XATransactionTest to use DatabasePropertyTestSetup.setLockTimeouts
          • changed System.err.println to Assert.fail with the message

          I think that I still have not answered a couple of questions:

          Kathey:
          > On the test in addition to Myrna's comments, do we lose anything by dropping from 1000 connections to 66?

          Exactly the same functionality will be tested. Using 1000 connections worked also as something like a stress test and utilized the database much more.

          > I got a bit confused about which changes were part of this issue and which were part of DERBY-2953.
          > Are the code changes necessary for the test to pass with its changes or were the two issues' patches
          > just combined for convenience?

          I created DERBY-2953 so that information about rollbacks would be logged to the derby log, at a time when I did not know what was or might be causing the failures. I also reorganized the code a bit: one common function is called to roll back/cancel the transaction from different places. I thought that this change could be committed sooner and that it could help to diagnose the problem after test failures. Now I have fixes for every possible issue I found, and it is also a bit difficult to divide the patch between DERBY-2871 and DERBY-2953.

          I am running the tests now. derbyall already completed without failures. suites.All are still running.

          Kathey Marsden added a comment -

          Julius said:

          >So there are two possible options how to deal with this:
          >a) rename isFinished to something less confusing like 'performTimeoutRollback', etc.
          >b) or change the logic so XATransactionState.start would be called also for TMNOFLAGS and isFinished would behave as expected from its name.

          I think either is fine, with a slight preference for a, since this is used only for timeout.

          Julius Stroffek added a comment -

          Myrna, Kathey, thanks for looking at this issue.

          • I'll add the missing javadoc with @see tags.
          • I'll change System.err.println to fail(...) and think about BaseJDBCTestCase.prepareCall or DatabasePropertyTestSetup.
          • I agree that isFinished is a bit confusing. It is there because there is a small window in which the cancellation task may already be executing but has not yet obtained the lock on the XATransactionState object. In the meantime the transaction might be committed or rolled back by the application, after which the cancellation task obtains the lock and runs to completion (see http://java.sun.com/javase/6/docs/api/java/util/TimerTask.html#cancel() ). This property is used only to prevent attempts to roll back transactions that have already been committed or rolled back.

          isFinished is set to false in scheduleTimeoutTask because, if the global transaction is started with EmbedXAResource.start(TMNOFLAGS), the method XATransactionState.start(TMNOFLAGS) is not called, but scheduleTimeoutTask is called if a timeout is required.

          So there are two possible ways to deal with this:
          a) rename isFinished to something less confusing like 'performTimeoutRollback', etc.
          b) or change the logic so XATransactionState.start would be called also for TMNOFLAGS and isFinished would behave as expected from its name.

          Thanks for your opinions.
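
The race described above can be sketched in plain Java. This is a simplified illustration following option (a), not the Derby implementation: the class `TxState` and its methods are hypothetical. A flag guarded by the object's lock decides whether a late-running timer task is still allowed to roll back, because `TimerTask.cancel()` does not stop a task that has already started.

```java
import java.util.Timer;
import java.util.TimerTask;

/** Hypothetical, simplified transaction state; not the Derby class. */
class TxState {
    private boolean performTimeoutRollback;
    private boolean rolledBackByTimeout;
    private TimerTask timeoutTask;

    /** Arms a timeout and allows the timeout rollback to happen. */
    synchronized void scheduleTimeout(Timer timer, long millis) {
        performTimeoutRollback = true;
        timeoutTask = new TimerTask() {
            @Override
            public void run() {
                onTimeout();
            }
        };
        timer.schedule(timeoutTask, millis);
    }

    /** Called when the application commits or rolls back. */
    synchronized void finish() {
        performTimeoutRollback = false; // a late task must do nothing
        if (timeoutTask != null) {
            timeoutTask.cancel();       // does not interrupt a running task
        }
    }

    /** Called by the timer thread, possibly after finish() has run. */
    synchronized void onTimeout() {
        if (performTimeoutRollback) {
            performTimeoutRollback = false;
            rolledBackByTimeout = true; // stands in for the real rollback
        }
    }

    synchronized boolean wasRolledBackByTimeout() {
        return rolledBackByTimeout;
    }
}
```

If `finish()` wins the lock first, a timer task that was already waiting on the lock finds the flag cleared and does nothing; if the task wins, the rollback happens exactly once.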

          Kathey Marsden added a comment -

          Thank you for looking at this issue. I got a bit confused about which changes were part of this issue and which were part of DERBY-2953. Are the code changes necessary for the test to pass with its changes or were the two issues' patches just combined for convenience?

          On the code changes, I got a bit confused by the changes to the XATransactionState.isFinished instance variable. isFinished is documented as:
          /** Has this transaction been finished (committed
           * or rolled back)? */
          But the patch sets isFinished = true in the XATransactionState constructor, which I didn't quite understand. The transaction hasn't been committed or rolled back at that point. It looks like isFinished is only set to false in setTransactionTimeout, which doesn't seem quite right.

          On the test in addition to Myrna's comments, do we lose anything by dropping from 1000 connections to 66?
          The javadoc still says 1000 connections.

          Myrna van Lunteren added a comment -

          too bad that we didn't get this in sooner, looks like we missed a translation effort.

          Myrna van Lunteren added a comment -

          I svn updated to the earlier revision, then could apply the patch fine, and svn update seems to have merged correctly too.
          But I'm out of my depth on this...
          Attaching an updated version so someone else can have a look too.

          questions/comments I do have:

          • as was previously observed, I think javadoc would be good for every new method. For ones that override methods from a higher level you can use the @see javadoc tag
          • the new test fixture uses System.err.println. I think that's a 'not done'; you need to change that into a fail(...)
          • in the test's setUp and tearDown methods, you may be able to take advantage of the new BaseJDBCTestCase.prepareCall. Or, better, can't you use the DatabasePropertyTestSetup?
          Kathey Marsden added a comment -

          The patch didn't apply for me. I saw conflicts in messages.xml and MessageId.java. Could you post an updated patch and I'll try to look quickly before it gets out of date?

          Julius Stroffek added a comment -

          suites.All and derbyall ran without failures.

          Julius Stroffek added a comment -

          Changes in xa transaction timeout:

          • added code to print a text message to the log file when a global transaction gets rolled back.
          • merged the code performing a rollback of the global transaction for the embedded and network drivers. The transaction is now atomically (the XATransactionState lock is obtained only once) disassociated from the resource and rolled back. The common code is placed in the XATransactionState object.
          • a reference to the ResourceAdapter instance needed to be present in DRDAXAProtocol/XADatabase; it is captured from EmbedXADataSource when the connection is created in the XADatabase instance.

          Changes in XATransactionTest:

          • The number of connections created was reduced from 1000 to 66.
          • All the references to connections are kept in an array so that they cannot be garbage collected before the transactions are rolled back by the code under test. All the connections are closed manually at the end of the test.
          • The long-running statement was rewritten to use system tables, so the test no longer needs a particular number of records in a test table (and thus fewer connections need to be created).
          • derby.locks.waitTimeout and derby.locks.deadlockTimeout are saved in the test setup, changed to high enough values, and restored in tearDown. jdbcapi/SetTransactionIsolationTest changes both of these to 3 without restoring them, and this was causing issues with the test.
          • a list of global transactions in progress is dumped when the test fails in case there are pending transactions that were supposed to be rolled back
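
The save-and-restore pattern for the lock timeout properties mentioned in the list above can be sketched as follows. This is an illustration only: `PropertyOverride` is a hypothetical helper, and a plain `Map` stands in for the database property store, so nothing here touches Derby itself.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper: applies property overrides to a store and puts the
 * previous values back on close(). A Map stands in for the property store.
 */
class PropertyOverride implements AutoCloseable {
    private final Map<String, String> store;
    private final Map<String, String> saved = new HashMap<>();

    PropertyOverride(Map<String, String> store, Map<String, String> overrides) {
        this.store = store;
        for (Map.Entry<String, String> e : overrides.entrySet()) {
            // A saved null means the property was not set before.
            saved.put(e.getKey(), store.get(e.getKey()));
            store.put(e.getKey(), e.getValue());
        }
    }

    /** Restores every overridden property to its previous state. */
    @Override
    public void close() {
        for (Map.Entry<String, String> e : saved.entrySet()) {
            if (e.getValue() == null) {
                store.remove(e.getKey());
            } else {
                store.put(e.getKey(), e.getValue());
            }
        }
    }
}
```

Used in a try-with-resources block, the overrides stay in effect for the body and the earlier values come back afterwards, even if the body throws. A test that changes the timeouts without such a restore step leaves the changed values in place for whatever runs next in the same suite.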

          Dyre (response to the comment at DERBY-2953):
          I am also not quite clear about the policy when to write a javadoc comment and when not. I think I have written javadoc comments for all new methods/classes except when the comment is present in the superclass or method of the superclass and there is nothing new to be written in the implementation. I can copy the comment from the superclass but I do not see a benefit of doing this.

          Thanks for reviewing the patch.

          Julius Stroffek added a comment -

          I have tested the patch on the machine where the tests were originally failing and it went well without failures.

          Julius Stroffek added a comment -

          This is the patch fixing all the possible issues with the test that I found. It also merges a change made for DERBY-2953.

          I ran the tests without failures. I am planning to run the tests also on the box where the test was originally failing. More detailed description of changes will follow afterwards.

          Julius Stroffek added a comment -

          I will rewrite the test so that it creates far fewer connections in parallel.

          Lesson learned: It was not a good idea to use 1000 connections in a test for a small embedded database such as Derby. There might be some restrictions on older or embedded systems.

          However, I am not sure that this will fix the issue. I'll try to test the result as much as possible.

          Julius Stroffek added a comment -

          I rewrote the test so that it does not close the connections during the test, and they will not be garbage collected either, since I call the XAConnection.close method on every connection at the end of the test.

          I ran all the tests (suites.All and derbyall) on my box with the 'd2871-test' patch, also together with the patch for DERBY-2953, without any failures. I also tried to run those tests on an HP-UX box where the test was originally failing. The rewritten test always fails there with the message


          1) testXATransactionTimeout(org.apache.derbyTesting.functionTests.tests.jdbcapi.XATransactionTest)java.sql.SQLException: A communications error has been detected: Broken pipe (errno:32).
          at org.apache.derby.client.am.SQLExceptionFactory.getSQLException(SQLExceptionFactory.java:46)
          at org.apache.derby.client.am.SqlException.getSQLException(SqlException.java:362)
          at org.apache.derby.client.ClientPooledConnection.<init>(ClientPooledConnection.java:115)
          at org.apache.derby.client.ClientXAConnection.<init>(ClientXAConnection.java:48)
          at org.apache.derby.client.net.ClientJDBCObjectFactoryImpl.newClientXAConnection(ClientJDBCObjectFactoryImpl.java:76)
          at org.apache.derby.jdbc.ClientXADataSource.getXAConnectionX(ClientXADataSource.java:88)
          at org.apache.derby.jdbc.ClientXADataSource.getXAConnection(ClientXADataSource.java:72)
          at org.apache.derby.jdbc.ClientXADataSource.getXAConnection(ClientXADataSource.java:65)
          at org.apache.derbyTesting.functionTests.tests.jdbcapi.XATransactionTest.testXATransactionTimeout(XATransactionTest.java:188)
          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
          at org.apache.derbyTesting.junit.BaseTestCase.runBare(BaseTestCase.java:95)
          at junit.extensions.TestDecorator.basicRun(TestDecorator.java:24)
          at junit.extensions.TestSetup$1.protect(TestSetup.java:21)
          at junit.extensions.TestSetup.run(TestSetup.java:25)
          at org.apache.derbyTesting.junit.BaseTestSetup.run(BaseTestSetup.java:57)
          at junit.extensions.TestDecorator.basicRun(TestDecorator.java:24)
          at junit.extensions.TestSetup$1.protect(TestSetup.java:21)
          at junit.extensions.TestSetup.run(TestSetup.java:25)
          at junit.extensions.TestDecorator.basicRun(TestDecorator.java:24)
          at junit.extensions.TestSetup$1.protect(TestSetup.java:21)
          at junit.extensions.TestSetup.run(TestSetup.java:25)
          at org.apache.derbyTesting.junit.BaseTestSetup.run(BaseTestSetup.java:57)
          Caused by: org.apache.derby.client.am.DisconnectException: A communications error has been detected: Broken pipe (errno:32).
          at org.apache.derby.client.net.NetAgent.throwCommunicationsFailure(NetAgent.java:413)
          at org.apache.derby.client.net.NetAgent.sendRequest(NetAgent.java:387)
          at org.apache.derby.client.net.NetAgent.flush_(NetAgent.java:265)
          at org.apache.derby.client.am.Agent.flowOutsideUOW(Agent.java:196)
          at org.apache.derby.client.net.NetConnection.flowServerAttributesAndKeyExchange(NetConnection.java:773)
          at org.apache.derby.client.net.NetConnection.flowUSRIDPWDconnect(NetConnection.java:617)
          at org.apache.derby.client.net.NetConnection.flowConnect(NetConnection.java:435)
          at org.apache.derby.client.net.NetConnection.initialize(NetConnection.java:296)
          at org.apache.derby.client.net.NetConnection.<init>(NetConnection.java:280)
          at org.apache.derby.client.net.ClientJDBCObjectFactoryImpl.newNetConnection(ClientJDBCObjectFactoryImpl.java:264)
          at org.apache.derby.client.net.NetXAConnection.createNetConnection(NetXAConnection.java:269)
          at org.apache.derby.client.net.NetXAConnection.<init>(NetXAConnection.java:73)
          at org.apache.derby.client.ClientPooledConnection.getNetXAConnection(ClientPooledConnection.java:331)
          at org.apache.derby.client.ClientPooledConnection.<init>(ClientPooledConnection.java:108)
          ... 44 more
          Caused by: java.net.SocketException: Broken pipe (errno:32)
          at java.net.SocketOutputStream.socketWrite0(Native Method)
          at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:97)
          at java.net.SocketOutputStream.write(SocketOutputStream.java:141)
          at org.apache.derby.client.net.Request.sendBytes(Request.java:1388)
          at org.apache.derby.client.net.Request.flush(Request.java:1382)
          at org.apache.derby.client.net.NetAgent.sendRequest(NetAgent.java:385)
          ... 56 more

          I explored the problem a bit and found that it is caused by the limit on the number of open files. I wrote a simple piece of code to verify this:

          int count = 1000;
          XAConnection[] xaConn = new XAConnection[count];

          try {
              // start up the server
              NetworkServerControl server = new NetworkServerControl();
              server.start(null);

              // open the connections one by one, reporting progress
              for (int i = 0; i < count; i++) {
                  System.out.print("Creating connection number " + i + "...");
                  xaConn[i] = createXAConnection(connString, "", "");
                  System.out.println("Ok.");
              }

              // close all connections once every one of them has been created
              for (int i = 0; i < count; i++) {
                  xaConn[i].close();
              }
          } catch (Exception ex) {
              ex.printStackTrace();
          }

          which throws an exception after creating connection number 391:

          org.apache.derby.client.am.DisconnectException: java.net.SocketException : Error connecting to server localhost on port 1527 with message File table overflow (errno:23).
          at org.apache.derby.client.net.NetAgent.<init>(NetAgent.java:129)
          at org.apache.derby.client.net.NetConnection.newAgent_(NetConnection.java:1086)
          at org.apache.derby.client.am.Connection.initConnection(Connection.java:218)
          at org.apache.derby.client.am.Connection.<init>(Connection.java:169)
          at org.apache.derby.client.net.NetConnection.<init>(NetConnection.java:278)
          at org.apache.derby.client.net.ClientJDBCObjectFactoryImpl.newNetConnection(ClientJDBCObjectFactoryImpl.java:264)
          at org.apache.derby.client.net.NetXAConnection.createNetConnection(NetXAConnection.java:269)
          at org.apache.derby.client.net.NetXAConnection.<init>(NetXAConnection.java:73)
          at org.apache.derby.client.ClientPooledConnection.getNetXAConnection(ClientPooledConnection.java:331)
          at org.apache.derby.client.ClientPooledConnection.<init>(ClientPooledConnection.java:108)
          ... 8 more
          Caused by: java.net.SocketException: File table overflow (errno:23)
          at java.net.Socket.createImpl(Socket.java:397)
          at java.net.Socket.<init>(Socket.java:359)
          at java.net.Socket.<init>(Socket.java:178)
          at javax.net.DefaultSocketFactory.createSocket(SocketFactory.java:196)
          at org.apache.derby.client.net.OpenSocketAction.run(OpenSocketAction.java:62)
          at java.security.AccessController.doPrivileged(Native Method)
          at org.apache.derby.client.net.NetAgent.<init>(NetAgent.java:127)
          ... 17 more
          ------
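The verification code above closes its connections inside the try block, so once connection creation fails, the connections that were already opened are never closed. A minimal sketch of one way to avoid that is shown below. It is not taken from any of the attached patches: the class name BoundedOpen, the method openThenCloseAll and the opener parameter are hypothetical, and the resources are modelled as plain AutoCloseable objects so that the sketch runs without a Derby server.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

// Hypothetical helper, not part of Derby or of the patches on this issue.
class BoundedOpen {

    /**
     * Tries to open up to 'count' resources using 'opener'. Stops at the
     * first failure (for example "File table overflow") and always closes
     * every resource that was opened, whether or not a failure occurred.
     *
     * @return the number of resources that were successfully opened
     */
    public static int openThenCloseAll(int count,
                                       Callable<? extends AutoCloseable> opener) {
        List<AutoCloseable> opened = new ArrayList<AutoCloseable>();
        try {
            for (int i = 0; i < count; i++) {
                opened.add(opener.call());
            }
        } catch (Exception ex) {
            // Report where we stopped; cleanup still happens in finally.
            System.out.println("Stopped after " + opened.size()
                    + " resources: " + ex);
        } finally {
            for (AutoCloseable resource : opened) {
                try {
                    resource.close();
                } catch (Exception ignored) {
                    // Keep closing the remaining resources.
                }
            }
        }
        return opened.size();
    }
}
```

With a real XADataSource, the opener could presumably wrap each XAConnection so that closing the wrapper calls XAConnection.close; that wiring is left out here because it depends on the test setup.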

          Julius Stroffek added a comment -

          I have access to the machine on which the test failed. However, I have not been able to reproduce the failure. I will keep trying to find out what is going on.

          Myrna van Lunteren added a comment -

          Unchecking "patch available". I had checked in the change, but it caused an intermittent failure in the test.
          Also unmarking the fix for 10.3.

          Vemund Østgaard added a comment -

          I ran Suites.All on HP with my checked out version of trunk (revision 552770) to verify the fix.

          The same test failed, but this time with a lock timeout instead:

          1) testXATransactionTimeout(org.apache.derbyTesting.functionTests.tests.jdbcapi.XATransactionTest)java.sql.SQLException: A lock could not be obtained within the time requested
          at org.apache.derby.client.am.SQLExceptionFactory.getSQLException(Unknown Source)
          at org.apache.derby.client.am.SqlException.getSQLException(Unknown Source)
          at org.apache.derby.client.am.Statement.executeQuery(Unknown Source)
          at org.apache.derbyTesting.functionTests.tests.jdbcapi.XATransactionTest.testXATransactionTimeout(XATransactionTest.java:247)
          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
          at org.apache.derbyTesting.junit.BaseTestCase.runBare(BaseTestCase.java:95)
          at junit.extensions.TestDecorator.basicRun(TestDecorator.java:24)
          at junit.extensions.TestSetup$1.protect(TestSetup.java:21)
          at junit.extensions.TestSetup.run(TestSetup.java:25)
          at org.apache.derbyTesting.junit.BaseTestSetup.run(BaseTestSetup.java:57)
          at junit.extensions.TestDecorator.basicRun(TestDecorator.java:24)
          at junit.extensions.TestSetup$1.protect(TestSetup.java:21)
          at junit.extensions.TestSetup.run(TestSetup.java:25)
          at junit.extensions.TestDecorator.basicRun(TestDecorator.java:24)
          at junit.extensions.TestSetup$1.protect(TestSetup.java:21)
          at junit.extensions.TestSetup.run(TestSetup.java:25)
          at org.apache.derbyTesting.junit.BaseTestSetup.run(BaseTestSetup.java:57)
          Caused by: org.apache.derby.client.am.SqlException: A lock could not be obtained within the time requested
          at org.apache.derby.client.am.Statement.completeSqlca(Unknown Source)
          at org.apache.derby.client.net.NetStatementReply.parseOpenQueryError(Unknown Source)
          at org.apache.derby.client.net.NetStatementReply.parseOPNQRYreply(Unknown Source)
          at org.apache.derby.client.net.NetStatementReply.readOpenQuery(Unknown Source)
          at org.apache.derby.client.net.StatementReply.readOpenQuery(Unknown Source)
          at org.apache.derby.client.net.NetStatement.readOpenQuery_(Unknown Source)
          at org.apache.derby.client.am.Statement.readOpenQuery(Unknown Source)
          at org.apache.derby.client.am.Statement.flowExecute(Unknown Source)
          at org.apache.derby.client.am.Statement.executeQueryX(Unknown Source)
          ... 40 more

          Myrna van Lunteren added a comment - - edited

          I committed the patch on trunk with revision 552621.
          But can someone verify this works on the platform it was reported on?

          Julius Stroffek added a comment -

          When the transaction in the test is supposed to be committed, the timeout is set to 60 seconds. If the transaction is supposed to be rolled back, the timeout is set to only 8 seconds.

          I ran suites.All without failures on my box. There is no need to run derbyall, since the change is only in a JUnit test.
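The two timeout values described above could be applied through the standard XAResource.setTransactionTimeout call, which takes a value in seconds. The following is only a sketch under that assumption: the class name XaTimeoutChoice, the constants and both helper methods are hypothetical and are not taken from the actual XATransactionTest change.

```java
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;

// Hypothetical helper illustrating the timeout choice; not Derby test code.
class XaTimeoutChoice {

    // Long timeout: the transaction is expected to commit before it expires.
    static final int COMMIT_TIMEOUT_SECONDS = 60;

    // Short timeout: the transaction is expected to time out and be rolled back.
    static final int ROLLBACK_TIMEOUT_SECONDS = 8;

    /** Returns the timeout (in seconds) for the expected outcome. */
    public static int timeoutFor(boolean expectCommit) {
        return expectCommit ? COMMIT_TIMEOUT_SECONDS : ROLLBACK_TIMEOUT_SECONDS;
    }

    /**
     * Applies the chosen timeout to the given XAResource.
     *
     * @return true if the resource manager accepted the timeout value
     */
    public static boolean applyTimeout(XAResource xaResource, boolean expectCommit)
            throws XAException {
        return xaResource.setTransactionTimeout(timeoutFor(expectCommit));
    }
}
```

Keeping the commit timeout much longer than the rollback timeout leaves room for pauses such as garbage collection on slower machines, while the rollback case still completes quickly.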

          Julius Stroffek added a comment -

          I changed how the transaction timeout is set up in the test.

          Julius Stroffek added a comment -

          The exception is probably thrown at XATransaction:203 because the transaction timeout value is too low. I set the value to just 5 seconds, which might not be enough to commit the transaction when garbage collection is invoked. I'll increase the value to 30s.


            People

            • Assignee:
              Julius Stroffek
              Reporter:
              Henri van de Scheur
            • Votes:
              0
              Watchers:
              1


                Development