|
The attached 'repro.java' can be used to reproduce this bug.
As Army notes, there is no exception or other "hard" evidence of the problem. What I did was to run the attached program against a NetworkServer instance which had server-side tracing turned on (derby.drda.traceAll=true, derby.drda.traceDirectory=serverTraces), and also have client-side tracing turned on (;traceDirectory=clientTraces in my connection URL). Then, bring up both the server side traces and the client side traces in a pair of editor sessions side-by-side, and hunt down until you see the PRPSQLST message being sent from the client to the server. In the server-side trace, you can then see that the server responds with a SQLDARD which is far larger than 32K. The attached program prepares a statement with 2001 parameter markers and receives a SQLDARD message which is 118,300+ bytes long, so it appears that each parameter marker needs about 58 bytes in the SQLDARD, and it would only take about 500 parameter markers or so to blow the 32K limit.
Bryan Pendleton made changes - 14/Dec/05 03:43 AM
Bryan Pendleton made changes - 16/Dec/05 01:18 AM
Attached is a proposed change to address bugs 125, 170, 491, and 492, together with a detailed document discussing the thinking behind the changes and explaining how they work. Please review this at your convenience and let me know what you think.
The current patch does not contain any new tests, and I believe that is mandatory prior to submitting these fixes, so the current patch is not ready to be submitted. I will be working on adding new tests, but since review of these changes will require some thought, I thought it would be valuable to put the change proposal and the accompanying description out there now, while I continue to work on more testing in the meantime. thanks, bryan
Bryan Pendleton made changes - 19/Dec/05 04:28 AM
First diff included an accidental, unrelated change. Removed, and replaced with svn_2.diff.
Bryan Pendleton made changes - 20/Dec/05 12:35 AM
Bryan Pendleton made changes - 20/Dec/05 12:35 AM
Bryan Pendleton made changes - 29/Dec/05 09:48 AM
Bryan Pendleton made changes - 29/Dec/05 09:48 AM
Replaced "changes.html" with "changes2.html", which includes more details and two additional bug fixes of significance.
Replaced "svn-2.diff" with "svn-dec_28_2005.diff", which includes all the bug fixes that I propose for this patch, together with several new regression tests and associated output files. The regression tests are in lang/procedure.java and in derbynet/prepStmt.java, and there is also a change to util/ProcedureTest.java to support the new lang/procedure.java tests. I think these changes are now fully ready for review. Please let me know your comments as I am very interested to hear what you think.
Bryan Pendleton made changes - 29/Dec/05 09:52 AM
I am withdrawing this patch. It is unnecessarily large and complex, as it bundles multiple bug-fixes into one patch. I will submit a revised patch later which just addresses the changes to DDMWriter.finalizeDSSLength(). I will submit separate patches for bugs 170, 491, and 492. The patches for 491 and 492 will be dependent on the patch for this bug (125).
Bryan Pendleton made changes - 13/Jan/06 01:41 AM
Bryan Pendleton made changes - 13/Jan/06 01:41 AM
Attached is a focused patch which addresses *only* the problems in DDMWriter.finalizeDSSLength(). The patch includes a test, and updates to the master output files. Note, unfortunately, that this test works even *without* the changes to DDMWriter.java. The reason for this is explained in the changes.html file. I would like to write a better test, but don't know how to. Reviewers, please look at this patch and see if you can suggest a better way to write the test. Even if we can't write a better test, I think we should still review and consider this patch, because it fixes an important bug (and it is a pre-requisite for the fixes to
Bryan Pendleton made changes - 13/Jan/06 09:47 AM
Hi Bryan, I finally had a chance to look through your proposal (I have not yet looked at the actual patch), as I promised to do back in December. I think you have provided an exceptionally thorough description of the problem that was enjoyable to read. With regard to your questions; I think you have very high standards for your work, maybe too high :) Personally, I don't see any problems with filling the "bytes[]" array with junk to provoke an error, at least not in debug builds. And I would also let the client rely on the continuation header, and just refuse to read more bytes if the size was zero (or perhaps even throw an exception if the continuation bit is set, and the size is zero). If such protocol validation will break existing code I think one could print a warning in debug mode to make the test fail.
Hi Bryan
Thank you so much again for your excellent documentation. I am learning quite a bit and appreciate your very high standards for your work #:) I think once the series of continuation fixes is committed we should update the DssProtocolErrors page that Army wrote to point to your documents. I don't know if they should move to the documents to Wiki or if it is ok to just keep them here and point to the bugs. My only comment on the code change are: 1) Can you submit a new patch after updating your client for the 2) Can you add some brief comments to the code containing - a reference to - just a couple brief comments in the code as to why we have to add 1 to startOfCopyData and the continuation flag hnadling changes that you made. With regard to your questions and notes On the reading of DDMWriter traces I tend to find the client tracing more helpful than the server-side tracing , and shows the JDBC calls to provide better context. That said, probably the problems you note exist on the client side too, so it might be good to file enhancements while you are thinking about it. On prefilling the buffer with a known value to try to flush out corruption in the sent data: I think this is not a bad idea if (SanityManager.DEBUG), but not a high priority. Why don't bad DSS continuation headers cause more symptoms? You mentioned that this might be because the Derby client is very liberal in assuming that the block length is 32767 if the continuation bit is set. I think that it would be worthwhile for the client to be changed to not make this assumption when talking to 10.2 or higher servers. It would help prevent regressions in this area that might be very difficult to reproduce. Should there be a limit to DDMWriter auto-expansion? One poorly formed idea is to eliminate the expansion all together by istead of expanding have a pool of buffers available to all the DDMWriters. If a DDMWriter needs to write more, it picks up a new buffer and returns it to the pool when no longer needed. It seems like there should be a way to eliminate the shifting all together as well or at least shift sooner after the overrun happens to avoid shifting large amounts of data. I spent some time looking at the existing test case for this issue and at Bryan's excellent description of the problem. I played around with both and was able to come up with a repro for the problem that actually causes a failure on JCC. This isn't by any means a polished repro--the test would need to be cleaned up before it could be added to the nightlies. But I thought I'd post what I found and see what the reactions are. The repro is called offByOne.java (it's part of offByOne.jar). The theory behind this test is based on a quote from Bryan's email in the following thread: http://article.gmane.org/gmane.comp.apache.db.derby.devel/11865 The quote was: <begin_quote> Note that the only "random" or "nondeterministic" part of this damage is that the very last byte in the large DDM object gets replaced by an unknown byte from the unused section of the 'bytes' array. If the bytes array was just expanded in order to accomodate this object, then the unused section of the array will contain zeros, but if the bytes array was previously used for a larger amount of data, then those unused bytes contain unknown values. <end_quote> So what I've done with this test is remove the "random" part. Basically, I take the repro that is already part of the existing patch and I run it twice. The first iteration, I make both the table name and the column name extra long, so that the server-side buffer has more data in it. The second iteration, I use simpler names for the table and column, which take up less space in the server buffer. Then, per Bryan's quote, "if the bytes array was previously used for a larger amount of data, then those unused bytes contain unknown values". But since we intentionally put the "larger amount of data" into the buffer during the first iteration, we know what those "unknown" values are going to be. Then, by twiddling the size of the table/column names, we can 'shift' the data until we reach a point where the off-by-one error manifests itself: namely, we end up incorrectly leaving old data in the current server buffer. I've also attached the traces for Derby Client and JCC with and without the patch for 8160 000000000401F100 0000000000000000 and the second one looks like this: 8160 000000000001 If I'm understanding correctly, the "01" in the second line is INCORRECT--it should be "00". But because of the off-by-one error, the "01" is left over from the first iteration, which is a manifestation of the problem described for First occurence: 8160 000000000401F100 0000000000000000 Second occurence: 8160 000000000001002C D052000300262205 For JCC, the second occurrence has additional data in it--it looks like an OPNQRYRM is chained to the data? This might be a result of the 'defer' behavior, I'm not sure. For whatever reason, this error goes "unseen" by the Derby Client; but for JCC, the result is an error in the repro, namely: Exception in thread "main" com.ibm.db2.jcc.c.DisconnectException: Execution failed due to a distribution protocol error that caused deallocation of the conversation. A DRDA Data Stream Syntax Error was detected. Reason: 0x3 at com.ibm.db2.jcc.a.bb.l(bb.java:1206) at com.ibm.db2.jcc.a.bb.c(bb.java:363) at com.ibm.db2.jcc.a.bb.v(bb.java:1439) at com.ibm.db2.jcc.a.eb.c(eb.java:56) at com.ibm.db2.jcc.a.r.c(r.java:42) at com.ibm.db2.jcc.a.tb.h(tb.java:169) at com.ibm.db2.jcc.c.zc.p(zc.java:1223) at com.ibm.db2.jcc.c.ad.d(ad.java:2246) at com.ibm.db2.jcc.c.ad.U(ad.java:489) at com.ibm.db2.jcc.c.ad.executeQuery(ad.java:472) So while this isn't ideal, at least we'd have a test for JCC that didn't rely on any other fixes (esp. 1) why does this error occur in JCC but not in the client? Is it related to deferred prepares, or something else? Can/should the client be recognizing this error and doing something about it? 2) Is there some way to modify this repro so that the client fails, too? All of that said, I have to admit the repro is a bit fragile and hack-ish. In order to get this to work, the second iteration has to have a table name that is exactly 99 characters long and a column name that is 97 characters. There are other lengths that will cause the problem, as well--but just randomly using two lengths probably won't work. So yes, it's ugly. Nonetheless, at least we have a test case that fails without the I don't really have time to continue looking at this, but I'm hoping someone out there can follow up on this and turn it into something useful. My apologies for not taking this further, but I have a lot going on right now... To run the repro: > java offByOne --> Runs with Derby Client > java offByOne jcc --> Runs with JCC Hope this is helpful...
A B made changes - 18/Jan/06 06:40 AM
Bryan Pendleton made changes - 23/Jan/06 01:18 AM
Attached is the latest patch proposal, updated to the latest trunk,with all conflicts resolved and derbyall passing. There are two changes in this patch proposal relative to the previous one:
- I've changed the regression test to use the new test devised by Army. I've verified that the test fails before the patch to DDMWriter, and succeeds after it. Although I agree that the test is a bit delicate, it is much better to have a test which clearly demonstrates the problem. - I've improved the comments in finalizeDssLength() a bit to explain my changes a bit better, and also included a pointer to this bug report for those future historians who desire to study the bug in more detail. Kathey, Dyre, Army: thank you very much for your feedback on the previous proposals and on the questions I raised. My preference is to treat your suggestions independently, and to open separate JIRA entries to track resolving these questions in the future: - Improving DDMWriter traces to make them easier to read in the presence of continuations and chaining - Modifying the client to process the length portion of DSS Continuation Headers, compatibly with previous releases - Under SanityManager control, pre-filling the DDMWriter buffer with special non-zero data to reveal memory corruption problems more readily - Investigating performance issues in large object handling with respect to both memory usage and number of data copies Thanks again to everybody who has helped me with the investigations into this bug.
Bryan Pendleton made changes - 23/Jan/06 02:23 AM
Checked in this patch. Thanks Bryan! Date: Mon Jan 23 09:34:23 2006 New Revision: 371603 URL: http://svn.apache.org/viewcvs?rev=371603&view=rev Fixed in the trunk with commit of revision 371603
Bryan Pendleton made changes - 25/Jan/06 11:04 AM
Reopened to merge this fix to the 10.1 branch.
Bryan Pendleton made changes - 28/Apr/06 09:17 PM
The merge from main was straightforward, and the testing was successful, once I re-read Army's notes and remembered the details of how this regression test works. Specifically, I had forgotten that this regression test only shows a concrete symptom in the JCC configuration, not in the DerbyNetClient configuration.
Committed the change to the 10.1 branch as revision 398058: svn.apache.org/viewcvs?rev=398058&view=rev
Bryan Pendleton made changes - 29/Apr/06 08:31 AM
A B made changes - 24/May/06 03:58 AM
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
http://wiki.apache.org/db-derby/DssProtocolErrors
Thanks Army!