|
[
Permalink
| « Hide
]
Tiago R. Espinha added a comment - 16/Jul/09 07:48 AM
Attaching the log folder for this test.
linking to
Thank you Tiago for picking up this issue.
What seems to be happening is that for logical connections when we close a prepared statement it under some circumstances tries to reprepare the statement. This is a problem if the table being referenced has been dropped. It would be a good first step to get an embedded reproduction. Take a look at DERBY-1016 repro to see how to get an XA connection. Start with just a local transaction, creating a table and commit, preparing a statement to select from the table. Creating another regular connection with driver manager and dropping the table and then trying to close the prepared statement. Hopefully that will pop it. I'm attaching a repro that I am having trouble with. Basically this repro is locking with a timeout and I can't work out how to get around it, to actually get the repro to work.
Hi Tiago,
I think the timeout is expected in the embedded case as the statement has been executed but the results have not been fetched. I think we should focus on getting a client reproduction first and then will narrow it down to embedded. I changed the program to use client but it is not showing the issue on shutdown yet. I also made a few other minor changes to get it to run. From the test checkHeldRS method it does look like the test fetches some rows/commits etc. You might want to play with that to get it to reproduce. I'm attaching the XATest as I have it currently (with all the fixtures but testDerby966 "commented, and with testDerby966 reduced to the bare essentials to make the error pop) and a repro that I am trying to create, which does not yet reproduce the bug.
Attached is an updated reproduction with the Network Server repro ReproDerby4310.java.
I changed several things from the Tests.zip version but I think the key points were. Shutting down network server, using ClientXADataSource, closing the ResultSet, not closing the XAConnection I changed the port number because I was running some other tests on my machine. ReproEmbeddedDerby4310 shows the embedded bug which is the root cause. For that one we need to use a prepared statement. I suggest you use the embedded version for debugging the issue. Tiago and I discussed this issue on IRC. I will just talk about the PreparedStatement case. There probably similar bugs in similar code for regular Statement and CallableStatement but it seemed best to address PreparedStatement first. In general as we transition from a local transaction to a global one the underlying real connection gets swapped out. I am not sure of the details of why this is needed but it happens in EmbedXAConnection.getConnection(). If we try to access a statement after the connection has changed out from underneath it we end up calling XAStatementControl.getRealPreparedStatement() to perform the operation. This method detects the connection change and if it has occurred, tries to make a new statement just like the old one with a call to ((BrokeredPreparedStatement) applicationStatement).createDuplicateStatement(xaConnection.realConnection, realPreparedStatement);
The applicationStatement has saved away the sql text so we can reprepare. In our case however the table has been dropped so it cannot be reprepared. If we were trying to reexecute this I think is a perfectly reasonable error to get, but for close() we should avoid the recreation all together and I think just close the original realPreparedStatement. I suggested to Tiago we make a closeRealPreparedStatement() method in BrokeredStatementControl/XAStatementControl, which would not attempt to create a new statement. To implement this I think you would need to. - Add closeRealPreparedStatement to the BrokeredStatementControl interface. - Add closeRealPreparedStatement to XAStatementControl which just closes the realPreparedStatement and does not attempt to make a new one. Question: Currently we ignore any errors that occur closing this statement, should we continue to do so? I suggested to Tiago as a first pass not to and see if any problems arise in the tests. - In BrokeredPreparedStatement override the close() method to call control.closeRealPreparedStatement() Clobber before building. There may be another class implementing BrokeredStatementControl that I don't see. Implementing this solution does not bring us in sync with the behavior when we don't have a global connection, because we would still get the check for validity and error for calls like isClosed(), setFetchSize() or getFetchDirection() which currently succeed after the table has been dropped if there is no global transaction. I guess a more complete solution would determine which calls don't check and have them use the original prepared statement and then have the ones that should check try to create a new statement and copy the state as is currently done. I don't know if that is worth the effort. At this point I have took on Kathey's suggestion and I have the close() method in BrokeredPreparedStatement invoking the closeRealPreparedStatement() method in the control (XAControl) property. This last method invokes the close() on the realPreparedStatement object.
This solution does make the error go away (since we jump straight away to closing without repreparing) but for the suites.All run we added another change that seems to be causing mayhem. Under the getRealPreparedStatement() method we removed the try...catch block in the following piece of code: try { realPreparedStatement.close(); } catch (SQLException sqle) { } It is possible that this piece of code was hiding exceptions similar to the ones we are facing now, but this will also hide any other SQLException in the process, which is less than ideal. However, by removing the try..catch block, the suites.All would hang on the testSystemShutdown, so to temporarily avert this behavior I have removed the exception.getMessage() on the server shutdown, which will crash when the exception is an NPE. Still, this test seems to be hanging nonetheless. Hi Tiago, If you are still not getting console output showing the exception that is occurring shortly before the hang, then we could try reintroducing the try/catch but just printing out the exception stack trace to see what exception it is getting.
try { realPreparedStatement.close(); } catch (SQLException sqle) { sqle.printStackTrace(); } First off, this patch I am attaching is NOT for commit.
This patch fixes the issue in the sense that the PreparedStatements no longer are reprepared when we want to close them. Instead, we go directly to the source and try to close the statement. In the process, some exceptions aroused which would hang the run on the shutdown test, so to override this I had to encapsulate the close() call on the realPreparedStatement in a try..catch block, printing the stack trace of the exception. Now the run does go through until the end but I'm getting some exceptions. A great deal of them seem to be like this one: ------------8<-------------- 33) class org.apache.derby.iapi.jdbc.BrokeredCallableStatement40.setQueryTimeout(org.apache.derbyTesting.functionTests.tests.jdbcapi.ClosedObjectTest)junit.framework.AssertionFailedError: No exception was thrown at org.apache.derbyTesting.functionTests.tests.jdbcapi.ClosedObjectTest.testClosedObjects(ClosedObjectTest.java:99) at sun.reflect.GeneratedMethodAccessor2076.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at org.apache.derbyTesting.junit.BaseTestCase.runBare(BaseTestCase.java:109) at junit.extensions.TestDecorator.basicRun(TestDecorator.java:24) at junit.extensions.TestSetup$1.protect(TestSetup.java:21) at junit.extensions.TestSetup.run(TestSetup.java:25) at junit.extensions.TestDecorator.basicRun(TestDecorator.java:24) at junit.extensions.TestSetup$1.protect(TestSetup.java:21) at junit.extensions.TestSetup.run(TestSetup.java:25) -----------------8<----------------- And there's a few others I have managed to copy & paste from the console that I will be attaching in a file. I will also be attaching the derby.log file from this run. (Do note that I also ran out of disk space during this run - this is because the test output folder is at around 5.5Gb due to the amount of failures). One problem seems to be that realPrepardStatement can be null on close so we can get a NullPointerException. I am not totally sure I think maybe this is happening for callable statements because we have not yet implemented closeRealCallableStatement() so for this reason we are getting errors like callbacks not gettting called, and perhaps events being called on the wrong statement but I haven't quite figured that one out.
As an aside, EmbedXAConnection.onStatementClose() is where the callbacks are processed and this is called from EmbedPreparedStatement.closeActions() which gets called from EmbedStatement.close(). So we definitely need to close a statement when doing the close and cannot have a null check. I suggest you go ahead and implement the callable statement changes. It will either make things better or expose this idea as a bad solution. The suites.All run went without a hitch after I implemented the changes for the BrokeredCallableStatement. I faced one failure but it does not seem to be related:
1) IjSecurityManagerTest:embeddedjunit.framework.AssertionFailedError: org/apache/derbyTesting/functionTests/tests/tools/IjSecurityManagerTest.sql at org.apache.derbyTesting.junit.SupportFilesSetup.copyFiles(SupportFilesSetup.java:174) at org.apache.derbyTesting.junit.SupportFilesSetup.access$000(SupportFilesSetup.java:64) at org.apache.derbyTesting.junit.SupportFilesSetup$1.run(SupportFilesSetup.java:139) at java.security.AccessController.doPrivileged(Native Method) at org.apache.derbyTesting.junit.SupportFilesSetup.privCopyFiles(SupportFilesSetup.java:135) at org.apache.derbyTesting.junit.SupportFilesSetup.setUp(SupportFilesSetup.java:121) at junit.extensions.TestSetup$1.protect(TestSetup.java:20) at junit.extensions.TestSetup.run(TestSetup.java:25) FAILURES!!! Tests run: 11362, Failures: 1, Errors: 0 Also, grepping the output file (attached) for "Exception" showed no signs of a stack trace whatsoever. My next test will be to remove the try..catch blocks on the calls to real(.*)Statement.close(); If this works, I think I can then issue a final patch for commit. I agree that looks unrelated. I wonder why it would have a problem getting the resource from the jar file:
URL url = BaseTestCase.getTestResource(name); assertNotNull(name, url); Does it pass on rerun? I neglected to mention for the final patch, you should add comments and javadoc. Do we need to make similar changes for BrokeredStatement or is that ok as is because no objects are referenced in the statement? What do you think about the other calls like isClosed()? Should we address those or just log a minor bug and move on? Interestingly enough I got that same failure on rerun. This becomes more intriguing considering the fact that it was I who introduced that file I believe. I'm thinking something might be wrong with my code tree. The plot thickens.
I'm not sure we would need this for the BrokeredStatement. After all, if this isn't a prepared statement, when we duplicate the statement to accommodate the connection change the statement does not get reprepared, so nothing evil should happen. However, perhaps for standardization's sake we should have all the statements being closed in the same manner (this would also possibly improve performance? We don't really need to duplicate a statement to close it...). As for the remaining calls, yes, we should probably get this for isClosed() and for isValid() as these are the other two methods that throw an exception when the statement is closed according to ClosedObjectTest. --------------8<------------------ if (name.equals("close") || name.equals("isClosed") || name.equals("isValid")) { return false; } -------------8<-------------------- Still, looking at BrokeredStatement's implementation of isClosed(), I am dazzled. Why exactly do we have an exception being thrown there? > throw Util.notImplemented(); I agree that probably the copyFiles problem is related to your client. Check to see if the file is in your source tree. Even if it is do svn update -force on that one file. Check to see it is in classes and lastly check the derbyTesting.jar with jar tvf.
The actual implementation of isClosed() is in BrokeredStatement40.java. This is because it is a new method with JDBC 4.0. With JDK 1.5 and lower (JDBC 3.0) you will see the not implemented exception, but for 1.6 it is implemented. isValid() is actually not a Statement method but rather a method on Connection. I think there are other methods besides isClosed() that in a regular connection won't check if the dependencies if the table is dropped. A couple I tried were getFetchSize() and setFetchSize(). If we fix this for isClosed() I think we should fix it for the other calls that are impacted as well. Doing so would require some more significant code reorganization where we first write tests to determine which calls check dependencies and then decide on a strategy to make sure the various calls get the possibly reprepared statement or the original one as appropriate. Also we will need to check that the state gets transferred properly if calls like setFetchSize() are made on the original statement. The more I think about it, I think this is a bigger project than you have time for in the next couple of weeks, so probably best to just get this issue wrapped up for close() and file a new bug for the other calls. Oh, I forgot to mention. I think it makes sense to implement the new close logic for BrokeredStatement too.
Ok Kathey, I'm not yet attaching the patch but I have "refreshed" that sql file that was causing the file and implemented the new logic for BrokeredStatement. At this point I'm running suites.All again to make sure everything still works fine.
I agree that the isClosed() and the other changes should be done in another issue. This fix will allow us to bring back/forth XATest which is already a big step forward. I'm attaching another patch that is not yet complete. This patch incorporates the fix for this issue and it adds a fixture to XATest (in addition to some house-cleaning on XATest since the fixtures are currently dependent on one another).
This patch is NOT final yet and it is not meant to be committed as it still fails when the client test is activated. If the test is ran only in embedded mode then it all passes, but if the client server mode is enabled, I get a hang which I am still trying to figure out. I tried the test and found the first failure to be in testSingleConnectionOnePhaseCommit
junit.framework.AssertionFailedError: Unexpected row count expected:<1> but was:<2> at junit.framework.Assert.fail(Assert.java:47) at junit.framework.Assert.failNotEquals(Assert.java:282) at junit.framework.Assert.assertEquals(Assert.java:64) at junit.framework.Assert.assertEquals(Assert.java:201) at org.apache.derbyTesting.junit.JDBC.assertUnorderedResultSet(JDBC.java:1227) at org.apache.derbyTesting.junit.JDBC.assertUnorderedResultSet(JDBC.java:1164) at org.apache.derbyTesting.junit.XATestUtil.checkXATransactionView(XATestUtil.java:82) at org.apache.derbyTesting.functionTests.tests.jdbcapi.XATest.testSingleConnectionOnePhaseCommit(XATest.java:111 ) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37) at java.lang.reflect.Method.invoke(Method.java:599) at junit.framework.TestCase.runTest(TestCase.java:154) at junit.framework.TestCase.runBare(TestCase.java:127) at org.apache.derbyTesting.junit.BaseTestCase.runBare(BaseTestCase.java:109) at junit.framework.TestResult$1.protect(TestResult.java:106) at junit.framework.TestResult.runProtected(TestResult.java:124) at junit.framework.TestResult.run(TestResult.java:109) at junit.framework.TestCase.run(TestCase.java:118) at junit.framework.TestSuite.runTest(TestSuite.java:208) at junit.framework.TestSuite.run(TestSuite.java:203) at junit.framework.TestSuite.runTest(TestSuite.java:208) at junit.framework.TestSuite.run(TestSuite.java:203) at junit.extensions.TestDecorator.basicRun(TestDecorator.java:22) at junit.extensions.TestSetup$1.protect(TestSetup.java:19) at junit.framework.TestResult.runProtected(TestResult.java:124) at junit.extensions.TestSetup.run(TestSetup.java:23) at org.apache.derbyTesting.junit.BaseTestSetup.run(BaseTestSetup.java:57) at junit.framework.TestSuite.runTest(TestSuite.java:208) at junit.framework.TestSuite.run(TestSuite.java:203) at junit.textui.TestRunner.doRun(TestRunner.java:116) at junit.textui.TestRunner.start(TestRunner.java:172) at junit.textui.TestRunner.main(TestRunner.java:138) I think the other issues cascade from there. Tiago said if he took out his new test this problem does not occur. It would be interesting 1) To print the transaction table at the failure point to see what our extra transaction is. 2) put an xar.rollback(xid) after xar.end(xid, XAResource.TMSUCCESS); in testDerby4310. It could be that transaction is sticking around for a while even though we have done no work. This patch fixes the issue described. The fixtures to reproduce the described behavior are added on the
I'll be running suites.All with both this and the other patch enabled. The suites.All run went fine with only my usual failure on IjSecurityManagerTest, but that's really an issue with my code tree that's been somehow corrupted. I had tried it on a different checkout folder before and I do not face this issue.
As long as the patches look good, I think we're ready for commit. Please note that this patch should be committed before the one in Resolving for 10.6.
Committed revision 800523 with this fix and the fix for See related comment related to affects version in:
https://issues.apache.org/jira/browse/DERBY-4337?focusedCommentId=12739680&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12739680 Per Kathey's request I have merged both
After both merges, the XATest ran fine with no failures whatsoever. I am now running suites.All with these changes on. The commands used for the merge were, by this order: svn merge -r 793088:793089 https://svn.apache.org/repos/asf/db/derby/code/trunk svn merge -r 800522:800523 https://svn.apache.org/repos/asf/db/derby/code/trunk Rick, should I put this into the 10.5 branch or wait until after you cut the new release candidate?
Changing title to be more descriptive.
I was running suites.All yesterday but unfortunately the power went out and my VM died overnight. Still, I was noticing a very odd symptom with the run being extremely slow and with several errors along the way. This is a symptom I am facing again today as I am repeating the run.
It is possible that it is somehow my code tree that is damaged so when this is done I'll be checking out the 10.5 trunk again and repeat at least the failing tests. If these failures persist we might want to hold on on committing this code to 10.5. Kathey pointed me to a merge that I was missing for the 10.5 backport. After I merged this last revision and rebuilt, suites.All ran fine with one failure that looks unrelated. I got a failure on testStartStopManagementFromApplication in ManagementMBeanTest and upon rerunning this one test, I got no failures.
I think it is safe to chalk this up as a random occurrence and do the merge. To wrap up, the following merge commands must be issued: svn merge -r 793088:793089 https://svn.apache.org/repos/asf/db/derby/code/trunk svn merge -r 800522:800523 https://svn.apache.org/repos/asf/db/derby/code/trunk svn merge -r 794105:794106 https://svn.apache.org/repos/asf/db/derby/code/trunk I've also ran derbyall with no failures.
FYI, I started a run before the weekend, just porting the
Glad to see that issue has been fixed! Likewise, this issue has also been merged to the 10.5 branch. Closing it.
Merged changes into 10.4 codeline using revision 809201.
Backported the changes into 10.3 codeline using revision 810210.
Backported changes into 10.2.2.1 codeline using revision 813882.
Had to hand write the tests in XATest because that test has not been converted to junit test in 10.2 codeline and hence the merge on it didn't work. |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||