Issue Details (XML | Word | Printable)

Key: DERBY-125
Type: Bug Bug
Status: Closed Closed
Resolution: Fixed
Priority: Major Major
Assignee: Bryan Pendleton
Reporter: A B
Votes: 0
Watchers: 0
Operations

If you were logged in you would be able to see more operations.
Derby

Network Server can send DSS greater than 32K to client, which breaks DRDA protocol.

Created: 19/Jan/05 03:51 AM   Updated: 24/May/06 03:58 AM
Return to search
Component/s: Network Server
Affects Version/s: None
Fix Version/s: 10.1.3.1, 10.2.1.6

Time Tracking:
Not Specified

File Attachments:
  Size
HTML File Licensed for inclusion in ASF works changes.html 2006-01-13 09:47 AM Bryan Pendleton 22 kB
Java Archive File Licensed for inclusion in ASF works offByOne.jar 2006-01-18 06:40 AM A B 137 kB
Java Source File Licensed for inclusion in ASF works repro.java 2005-12-14 03:43 AM Bryan Pendleton 2 kB
File Licensed for inclusion in ASF works svn_jan_12_2006.status 2006-01-13 09:47 AM Bryan Pendleton 0.4 kB
File Licensed for inclusion in ASF works svn_jan_22_2006.diff 2006-01-23 02:23 AM Bryan Pendleton 12 kB

Resolution Date: 29/Apr/06 08:31 AM


 Description  « Hide
BACKGROUND:

DRDA protocol, which is the protocol used by Derby Network Server, dictates that all DSS objects "with data greater than 32,763 bytes" should be broken down into multiple "continuation" DSSes.

PROBLEM:

When Network Server receives a "prepareStatement" call that has a very large number of parameters, it can end up sending a reply DSS that is greater than 32K long to the client; doing so breaks DRDA protocol.

REPRODUCTION:

Note that this reproduction does NOT cause a protocol exception against the JCC driver--without further investigation, it would appear JCC doesn't mind that the DSS is too long. However, other DRDA clients (such as the DB2 ODBC client) will see that the data is too long and will fail because of it.

To reproduce, one can create a simple table and then prepare a statement such as:

SELECT id FROM t1 WHERE id in ( ?, ?, [ ... lots and lots of param markers ... ], ?)

Note that JCC uses deferred prepare by default; when connecting, one must append the "deferPrepares=false" attribute to the end of the URL in order to reproduce the problem (that or just try to execute the statement, since preparation will be done at execution time). When doing the prepare, I added a line in the "flush" method of org.apache.derby.impl.drda.DDMWriter.java to see how large the reply DSS was, and for cases where the number of parameter markers was high, the number of bytes in the single DSS would surpass 32K, and thus break protocol.

NOTES:

Network Server correctly uses continuation DSSes for LOBs and for result set data (data returned as the result of a query) that is greater than 32K. The problem appears to be in "other" cases, such as for the prepareStatement call described above.

 All   Comments   Work Log   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Kathey Marsden added a comment - 04/Nov/05 09:50 AM
Army posted some really excellent information related to this issue on the following Wiki page.

http://wiki.apache.org/db-derby/DssProtocolErrors

Thanks Army!

Bryan Pendleton added a comment - 14/Dec/05 03:43 AM
The attached 'repro.java' can be used to reproduce this bug.

As Army notes, there is no exception or other "hard" evidence of the problem. What I did was to run the attached program against a NetworkServer instance which had server-side tracing turned on (derby.drda.traceAll=true, derby.drda.traceDirectory=serverTraces), and also have client-side tracing turned on (;traceDirectory=clientTraces in my connection URL).

Then, bring up both the server side traces and the client side traces in a pair of editor sessions side-by-side, and hunt down until you see the PRPSQLST message being sent from the client to the server. In the server-side trace, you can then see that the server responds with a SQLDARD which is far larger than 32K. The attached program prepares a statement with 2001 parameter markers and receives a SQLDARD message which is 118,300+ bytes long, so it appears that each parameter marker needs about 58 bytes in the SQLDARD, and it would only take about 500 parameter markers or so to blow the 32K limit.

Bryan Pendleton made changes - 14/Dec/05 03:43 AM
Field Original Value New Value
Attachment repro.java [ 12321310 ]
Bryan Pendleton made changes - 16/Dec/05 01:18 AM
Assignee Bryan Pendleton [ bryanpendleton ]
Bryan Pendleton added a comment - 19/Dec/05 04:28 AM
Attached is a proposed change to address bugs 125, 170, 491, and 492, together with a detailed document discussing the thinking behind the changes and explaining how they work. Please review this at your convenience and let me know what you think.

The current patch does not contain any new tests, and I believe that is mandatory prior to submitting these fixes, so the current patch is not ready to be submitted. I will be working on adding new tests, but since review of these changes will require some thought, I thought it would be valuable to put the change proposal and the accompanying description out there now, while I continue to work on more testing in the meantime.

thanks,

bryan

Bryan Pendleton made changes - 19/Dec/05 04:28 AM
Attachment svn.diff [ 12321405 ]
Attachment changes.html [ 12321404 ]
Bryan Pendleton added a comment - 20/Dec/05 12:35 AM
First diff included an accidental, unrelated change. Removed, and replaced with svn_2.diff.

Bryan Pendleton made changes - 20/Dec/05 12:35 AM
Attachment svn_2.diff [ 12321432 ]
Bryan Pendleton made changes - 20/Dec/05 12:35 AM
Attachment svn.diff [ 12321405 ]
Bryan Pendleton made changes - 29/Dec/05 09:48 AM
Attachment changes.html [ 12321404 ]
Bryan Pendleton made changes - 29/Dec/05 09:48 AM
Attachment svn_2.diff [ 12321432 ]
Bryan Pendleton added a comment - 29/Dec/05 09:52 AM
Replaced "changes.html" with "changes2.html", which includes more details and two additional bug fixes of significance.

Replaced "svn-2.diff" with "svn-dec_28_2005.diff", which includes all the bug fixes that I propose for this patch, together with several new regression tests and associated output files. The regression tests are in lang/procedure.java and in derbynet/prepStmt.java, and there is also a change to util/ProcedureTest.java to support the new lang/procedure.java tests.

I think these changes are now fully ready for review. Please let me know your comments as I am very interested to hear what you think.

Bryan Pendleton made changes - 29/Dec/05 09:52 AM
Attachment changes2.html [ 12321587 ]
Attachment svn-dec_28_2005.diff [ 12321588 ]
Bryan Pendleton added a comment - 13/Jan/06 01:41 AM
I am withdrawing this patch. It is unnecessarily large and complex, as it bundles multiple bug-fixes into one patch. I will submit a revised patch later which just addresses the changes to DDMWriter.finalizeDSSLength(). I will submit separate patches for bugs 170, 491, and 492. The patches for 491 and 492 will be dependent on the patch for this bug (125).

Bryan Pendleton made changes - 13/Jan/06 01:41 AM
Attachment changes2.html [ 12321587 ]
Bryan Pendleton made changes - 13/Jan/06 01:41 AM
Attachment svn-dec_28_2005.diff [ 12321588 ]
Bryan Pendleton added a comment - 13/Jan/06 09:47 AM
Attached is a focused patch which addresses *only* the problems in DDMWriter.finalizeDSSLength(). The patch includes a test, and updates to the master output files. Note, unfortunately, that this test works even *without* the changes to DDMWriter.java. The reason for this is explained in the changes.html file. I would like to write a better test, but don't know how to. Reviewers, please look at this patch and see if you can suggest a better way to write the test. Even if we can't write a better test, I think we should still review and consider this patch, because it fixes an important bug (and it is a pre-requisite for the fixes to DERBY-491 and DERBY-492, which also have tests, so it will get additional testing there.)

Bryan Pendleton made changes - 13/Jan/06 09:47 AM
Attachment svn_jan_12_2006.status [ 12321911 ]
Attachment svn_jan_12_2006.diff [ 12321910 ]
Attachment changes.html [ 12321909 ]
Dyre Tjeldvoll added a comment - 17/Jan/06 01:49 AM
Hi Bryan, I finally had a chance to look through your proposal (I have not yet looked at the actual patch), as I promised to do back in December. I think you have provided an exceptionally thorough description of the problem that was enjoyable to read. With regard to your questions; I think you have very high standards for your work, maybe too high :) Personally, I don't see any problems with filling the "bytes[]" array with junk to provoke an error, at least not in debug builds. And I would also let the client rely on the continuation header, and just refuse to read more bytes if the size was zero (or perhaps even throw an exception if the continuation bit is set, and the size is zero). If such protocol validation will break existing code I think one could print a warning in debug mode to make the test fail.

Kathey Marsden added a comment - 17/Jan/06 07:57 AM
Hi Bryan

Thank you so much again for your excellent documentation. I am learning quite a bit and appreciate your very high standards for your work #:) I think once the series of continuation fixes is committed we should update the DssProtocolErrors page that Army wrote to point to your documents. I don't know if they should move to the documents to Wiki or if it is ok to just keep them here and point to the bugs.


My only comment on the code change are:
1) Can you submit a new patch after updating your client for the DERBY-170 commit.
2) Can you add some brief comments to the code containing
    - a reference to DERBY-125 and the changes.html here
    - just a couple brief comments in the code as to why we have to add 1 to startOfCopyData and the continuation flag hnadling changes that you made.

With regard to your questions and notes

On the reading of DDMWriter traces
I tend to find the client tracing more helpful than the server-side tracing , and shows the JDBC calls to provide better context. That said, probably the problems you note exist on the client side too, so it might be good to file enhancements while you are thinking about it.

On prefilling the buffer with a known value to try to flush out corruption in the sent data:
I think this is not a bad idea if (SanityManager.DEBUG), but not a high priority.

Why don't bad DSS continuation headers cause more symptoms?
You mentioned that this might be because the Derby client is very liberal in assuming that the block length is 32767 if the continuation bit is set. I think that it would be worthwhile for the client to be changed to not make this assumption when talking to 10.2 or higher servers. It would help prevent regressions in this area that might be very difficult to reproduce.

Should there be a limit to DDMWriter auto-expansion?
One poorly formed idea is to eliminate the expansion all together by istead of expanding have a pool of buffers available to all the DDMWriters. If a DDMWriter needs to write more, it picks up a new buffer and returns it to the pool when no longer needed. It seems like there should be a way to eliminate the shifting all together as well or at least shift sooner after the overrun happens to avoid shifting large amounts of data.


A B added a comment - 18/Jan/06 06:40 AM

I spent some time looking at the existing test case for this issue and at Bryan's excellent description of the problem. I played around with both and was able to come up with a repro for the problem that actually causes a failure on JCC.

This isn't by any means a polished repro--the test would need to be cleaned up before it could be added to the nightlies. But I thought I'd post what I found and see what the reactions are.

The repro is called offByOne.java (it's part of offByOne.jar).

The theory behind this test is based on a quote from Bryan's email in the following thread:

http://article.gmane.org/gmane.comp.apache.db.derby.devel/11865

The quote was:

<begin_quote>

Note that the only "random" or "nondeterministic" part of this damage is that
the very last byte in the large DDM object gets replaced by an unknown byte
from the unused section of the 'bytes' array. If the bytes array was just
expanded in order to accomodate this object, then the unused section of the
array will contain zeros, but if the bytes array was previously used for a
larger amount of data, then those unused bytes contain unknown values.

<end_quote>

So what I've done with this test is remove the "random" part.

Basically, I take the repro that is already part of the existing patch and I run it twice. The first iteration, I make both the table name and the column name extra long, so that the server-side buffer has more data in it. The second iteration, I use simpler names for the table and column, which take up less space in the server buffer. Then, per Bryan's quote, "if the bytes array was previously used for a larger amount of data, then those unused bytes contain unknown values". But since we intentionally put the "larger amount of data" into the buffer during the first iteration, we know what those "unknown" values are going to be. Then, by twiddling the size of the table/column names, we can 'shift' the data until we reach a point where the off-by-one error manifests itself: namely, we end up incorrectly leaving old data in the current server buffer.

I've also attached the traces for Derby Client and JCC with and without the patch for DERBY-125. Without the patch, search the trace file for the lines beginning with "8160". There will be two such lines: one from the first "iteration" of the repro, and one from the second. With Derby Client, the first occurence looks like this:

8160 000000000401F100 0000000000000000

and the second one looks like this:

8160 000000000001

If I'm understanding correctly, the "01" in the second line is INCORRECT--it should be "00". But because of the off-by-one error, the "01" is left over from the first iteration, which is a manifestation of the problem described for DERBY-125. A similar phenomenon can be seen from the JCC traces:

First occurence:

8160 000000000401F100 0000000000000000

Second occurence:

8160 000000000001002C D052000300262205

For JCC, the second occurrence has additional data in it--it looks like an OPNQRYRM is chained to the data? This might be a result of the 'defer' behavior, I'm not sure.

For whatever reason, this error goes "unseen" by the Derby Client; but for JCC, the result is an error in the repro, namely:

Exception in thread "main" com.ibm.db2.jcc.c.DisconnectException: Execution failed due to a distribution protocol error that caused deallocation of the conversation. A DRDA Data Stream Syntax Error was detected. Reason: 0x3
        at com.ibm.db2.jcc.a.bb.l(bb.java:1206)
        at com.ibm.db2.jcc.a.bb.c(bb.java:363)
        at com.ibm.db2.jcc.a.bb.v(bb.java:1439)
        at com.ibm.db2.jcc.a.eb.c(eb.java:56)
        at com.ibm.db2.jcc.a.r.c(r.java:42)
        at com.ibm.db2.jcc.a.tb.h(tb.java:169)
        at com.ibm.db2.jcc.c.zc.p(zc.java:1223)
        at com.ibm.db2.jcc.c.ad.d(ad.java:2246)
        at com.ibm.db2.jcc.c.ad.U(ad.java:489)
        at com.ibm.db2.jcc.c.ad.executeQuery(ad.java:472)

So while this isn't ideal, at least we'd have a test for JCC that didn't rely on any other fixes (esp. DERBY-492) and that would indicate a regression. That said, though, things to consider would be:

1) why does this error occur in JCC but not in the client? Is it related to deferred prepares, or something else? Can/should the client be recognizing this error and doing something about it?

2) Is there some way to modify this repro so that the client fails, too?

All of that said, I have to admit the repro is a bit fragile and hack-ish. In order to get this to work, the second iteration has to have a table name that is exactly 99 characters long and a column name that is 97 characters. There are other lengths that will cause the problem, as well--but just randomly using two lengths probably won't work. So yes, it's ugly.

Nonetheless, at least we have a test case that fails without the DERBY-125 patch and passes with it, which was the big goal. If after reading this comment anyone feels like this is too hacky/unreliable, then so be it--I won't be offended :)

I don't really have time to continue looking at this, but I'm hoping someone out there can follow up on this and turn it into something useful. My apologies for not taking this further, but I have a lot going on right now...

To run the repro:

> java offByOne

  --> Runs with Derby Client

> java offByOne jcc

  --> Runs with JCC

Hope this is helpful...

A B made changes - 18/Jan/06 06:40 AM
Attachment offByOne.jar [ 12322047 ]
Bryan Pendleton made changes - 23/Jan/06 01:18 AM
Attachment svn_jan_12_2006.diff [ 12321910 ]
Bryan Pendleton added a comment - 23/Jan/06 02:23 AM
Attached is the latest patch proposal, updated to the latest trunk,with all conflicts resolved and derbyall passing. There are two changes in this patch proposal relative to the previous one:
 - I've changed the regression test to use the new test devised by Army. I've verified that the test fails before the patch to DDMWriter, and succeeds after it. Although I agree that the test is a bit delicate, it is much better to have a test which clearly demonstrates the problem.
 - I've improved the comments in finalizeDssLength() a bit to explain my changes a bit better, and also included a pointer to this bug report for those future historians who desire to study the bug in more detail.

Kathey, Dyre, Army: thank you very much for your feedback on the previous proposals and on the questions I raised. My preference is to treat your suggestions independently, and to open separate JIRA entries to track resolving these questions in the future:
 - Improving DDMWriter traces to make them easier to read in the presence of continuations and chaining
 - Modifying the client to process the length portion of DSS Continuation Headers, compatibly with previous releases
 - Under SanityManager control, pre-filling the DDMWriter buffer with special non-zero data to reveal memory corruption problems more readily
 - Investigating performance issues in large object handling with respect to both memory usage and number of data copies

Thanks again to everybody who has helped me with the investigations into this bug.

Bryan Pendleton made changes - 23/Jan/06 02:23 AM
Attachment svn_jan_22_2006.diff [ 12322220 ]
Repository Revision Date User Message
ASF #371603 Mon Jan 23 17:34:23 UTC 2006 kmarsden DERBY-125 Network Server can send DSS greater than 32K to client, which breaks DRDA protocol.

Fixes off by one error during segment shifting.
Fixes continuation flag handling to not overwrite the length.

See changes.html attached to DERBY-125 for more info

Contributed by Bryan Pendleton
Files Changed
MODIFY /db/derby/code/trunk/java/testing/org/apache/derbyTesting/functionTests/master/DerbyNetClient/jdk15/prepStmt.out
MODIFY /db/derby/code/trunk/java/testing/org/apache/derbyTesting/functionTests/master/prepStmt.out
MODIFY /db/derby/code/trunk/java/drda/org/apache/derby/impl/drda/DDMWriter.java
MODIFY /db/derby/code/trunk/java/testing/org/apache/derbyTesting/functionTests/master/DerbyNet/jdk15/prepStmt.out
MODIFY /db/derby/code/trunk/java/testing/org/apache/derbyTesting/functionTests/tests/derbynet/prepStmt.java

Kathey Marsden added a comment - 24/Jan/06 04:14 AM

Checked in this patch. Thanks Bryan!

Date: Mon Jan 23 09:34:23 2006
New Revision: 371603

URL: http://svn.apache.org/viewcvs?rev=371603&view=rev

Bryan Pendleton added a comment - 25/Jan/06 11:04 AM
Fixed in the trunk with commit of revision 371603

Bryan Pendleton made changes - 25/Jan/06 11:04 AM
Resolution Fixed [ 1 ]
Fix Version/s 10.2.0.0 [ 11187 ]
Status Open [ 1 ] Resolved [ 5 ]
Bryan Pendleton added a comment - 28/Apr/06 09:17 PM
Reopened to merge this fix to the 10.1 branch.

Bryan Pendleton made changes - 28/Apr/06 09:17 PM
Resolution Fixed [ 1 ]
Status Resolved [ 5 ] Reopened [ 4 ]
Repository Revision Date User Message
ASF #398058 Sat Apr 29 01:29:36 UTC 2006 bpendleton DERBY-125: Network Server can send DSS greater than 32K to client

This change migrates the fix to this issue from the trunk to the 10.1 branch.
Files Changed
MODIFY /db/derby/code/branches/10.1/java/drda/org/apache/derby/impl/drda/DDMWriter.java
MODIFY /db/derby/code/branches/10.1/java/testing/org/apache/derbyTesting/functionTests/master/prepStmt.out
MODIFY /db/derby/code/branches/10.1/java/testing/org/apache/derbyTesting/functionTests/master/DerbyNet/jdk15/prepStmt.out
MODIFY /db/derby/code/branches/10.1/java/testing/org/apache/derbyTesting/functionTests/tests/derbynet/prepStmt.java
MODIFY /db/derby/code/branches/10.1/java/testing/org/apache/derbyTesting/functionTests/master/DerbyNetClient/jdk15/prepStmt.out

Bryan Pendleton added a comment - 29/Apr/06 08:31 AM
The merge from main was straightforward, and the testing was successful, once I re-read Army's notes and remembered the details of how this regression test works. Specifically, I had forgotten that this regression test only shows a concrete symptom in the JCC configuration, not in the DerbyNetClient configuration.

Committed the change to the 10.1 branch as revision 398058:
svn.apache.org/viewcvs?rev=398058&view=rev

Bryan Pendleton made changes - 29/Apr/06 08:31 AM
Fix Version/s 10.1.3.0 [ 12310616 ]
Status Reopened [ 4 ] Resolved [ 5 ]
Fix Version/s 10.1.2.4 [ 12310880 ]
Resolution Fixed [ 1 ]
A B added a comment - 24/May/06 03:58 AM
Issue has been fixed, tested, and ported back to 10.1, so I'm now marking it as closed. Thanks Bryan!

A B made changes - 24/May/06 03:58 AM
Status Resolved [ 5 ] Closed [ 6 ]