Bryan Pendleton added a comment - 27/Nov/07 02:52 PM
What is the error that you get on the server? Can you look in derby.log?
Sorry, but I forgot to mention that there is no error reported in the server log file.
I tried it with the "derby.drda.debug" property set to "true" and got quite a lot of output in derby.log, but no errors or exceptions were reported. I also tried the debug libraries to get more information on the error, but there were still no error messages. All the rows of the table are reported in derby.log. This is one of the reasons why I believe that it could be a client problem. I used the debugger on the client side and got to the line in method org.apache.derby.client.net.NetCursor.checkAndThrowReceivedEndqryrm where the exception is thrown. At the time the exception is thrown, all data has already been transferred to the client side, and it seems that just one byte is missing for method readFdocaOneByte (which then calls checkAndThrowReceivedEndqryrm because position_ == lastValidBytePosition_), but the server connection is already closed. I've been working with Derby for a long time and I have never seen such behavior. I'm curious about an explanation for this.
What tool do I use to unpack a .rar file?
I know of WinRAR for Windows (http://www.rarlab.com/), and there's GNU unrar for the various Unix OSes.
Thanks Andrew, WinRAR worked fine.
I was able to reproduce the problem with the current trunk code and the provided test.rar file. Stepping through the code on the server side is interesting. When the entire dataset is fetched via IJ, it just so happens that the last row ends at position 32706 in the buffer, meaning that there are 61 bytes of space available in the buffer. But by the time DRDAConnThread.doneData() has finished writing the various bits and pieces of housekeeping data to the end of the buffer, it has gone beyond 32767 bytes and hence needs to be split. The splitQRYDTA processing then ends up splitting the QRYDTA block in the midst of the doneData SQLCAGRP bytes, rather than in the middle of some normal row data, which is how most QRYDTA splits work. The client code then gets into the processing of the doneData SQLCAGRP, but does not expect that the DRDA response block might have been split there, so instead of requesting the remainder of the block with a CNTQRY, the client code throws an exception.
I'm sorry, I know that the above is kind of dense and jargon-y. I think that the bottom line is that this problem is closely related to the bad luck of needing to split the query results at a point that was:
- after the last row of the results
- but before the end of the message
- and hence at a place where the client code didn't expect the block to have been split.
It may be an off-by-one bug; it appears that the final buffer is exactly 1 byte too long, and the splitting of the buffer splits it into a 32767 byte portion and a 1 byte portion.
Thanks Bryan for your comments on this issue so far.
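The buffer arithmetic in Bryan's analysis can be checked with a small standalone sketch. This is not Derby code: the class and method names are hypothetical, and the 62-byte trailer length is an assumption inferred from the observation that the final buffer was exactly 1 byte too long. Only the 32767-byte limit and the 32706 row-end position come from the thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of cutting an oversized block at a fixed limit.
class QrydtaSplitSketch {
    // Block limit quoted in the thread.
    static final int MAX_BLOCK = 32767;

    /** Returns the sizes of the pieces that a payload of totalLength bytes is cut into. */
    static List<Integer> split(int totalLength) {
        List<Integer> pieces = new ArrayList<>();
        int remaining = totalLength;
        while (remaining > MAX_BLOCK) {
            pieces.add(MAX_BLOCK);
            remaining -= MAX_BLOCK;
        }
        if (remaining > 0) {
            pieces.add(remaining);
        }
        return pieces;
    }

    public static void main(String[] args) {
        int rowDataEnd = 32706;  // position of the end of the last row, from the thread
        int trailerLength = 62;  // ASSUMPTION: inferred from the reported 1-byte overflow
        System.out.println("free bytes after last row: " + (MAX_BLOCK - rowDataEnd)); // 61
        System.out.println("pieces: " + split(rowDataEnd + trailerLength));           // [32767, 1]
    }
}
```

With these numbers the cut lands inside the trailer rather than inside row data, which matches the unusual split point described above.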
I am taking a look at this issue and have a question about the position logic in the client. The failure occurs because position_ == lastValidBytePosition_ in the following code in NetCursor:
    private int readFdocaOneByte() throws org.apache.derby.client.am.DisconnectException, SqlException {
        // For singleton select, the complete row always comes back, even if multiple query blocks are required,
        // so there is no need to drive a flowFetch (continue query) request for singleton select.
        if (position_ == lastValidBytePosition_) {
            // Check for ENDQRYRM, throw SqlException if already received one.
            checkAndThrowReceivedEndqryrm();
            .......
There is no javadoc for lastValidBytePosition, nor for currentRowPosition and nextRowPosition, which seem to be at play here as well. I haven't really been able to figure it all out from the code yet. I'd most appreciate an explanation of the interplay of the various positions if someone understands it, so I can understand why position == lastValidBytePosition and why that is a bad thing in this case. I will add any explanation as comments to the code.
Kathey
In order to better understand what is going on here, I thought I would try to develop a repro without the user database. I thought all that was needed was to produce data of the correct size approaching 32K. I wrote this not very intelligent program to just increment the size of a varchar by 1 to try to find the correct size and reproduce the error, but was unsuccessful in reproducing it. Anyway, I am quite out of ideas on how to approach this issue, either in creating a reproduction or in fixing it, and would most appreciate advice on next steps.
Hi Kathey, I think that you are proceeding along the right course; developing a standalone repro seems like the right plan. I agree that it should be very data dependent. Unfortunately, I am temporarily without a computer as my main computer has died, so I'm kind of handicapped until I get another computer set up. Can you (using a debugger) examine how the server behaves with your repro? In particular, I thought it was particularly fascinating that with the original repro we had a call to splitQRYDTA made from doneData, because I had never seen that case happen before while stepping through the server code. With your repro, can you tell whether there is a splitQRYDTA call made from doneData?
Thank you Bryan for the help. I see that the reason my repro does not trigger the problem is that the server proactively ends the DSS in this code because it would not be able to fit another 15K row. If I comment out
    // if ((stmt.getBlksize() - endLength ) < rowsize)
    //     getMoreData = false;
I can trigger the problem. I will work to adjust my rowsizes so I don't trigger this condition.
    // if we don't have enough room for a row of the
    // last row's size, don't try to cram it in.
    // It would get split up but it is not very efficient.
    if (getMoreData == true)
    {
        int endLength = writer.getDSSLength();
        int rowsize = endLength - startLength;
        if ((stmt.getBlksize() - endLength ) < rowsize)
            getMoreData = false;
Attached is a repro for this issue without the user database, Repro3230.java. Thanks Bryan for the help in narrowing the case down.
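To see why large rows never reach the split, the quoted server check can be modelled in isolation. This is a simplified sketch: the class, the method names, and the 15000-byte and 100-byte row sizes are hypothetical, per-block header overhead is ignored, and only the comparison itself is taken from the snippet quoted in the thread.

```java
// Hypothetical model of the "room for another row" check quoted from the server code.
class BlockFillSketch {
    /** True when the space left in the block can hold another row of the last row's size. */
    static boolean roomForAnotherRow(int blockSize, int startLength, int endLength) {
        int rowSize = endLength - startLength;
        return (blockSize - endLength) >= rowSize;
    }

    /** Counts how many fixed-size rows are written before the check stops the loop. */
    static int rowsWritten(int blockSize, int rowSize) {
        int length = 0;
        int rows = 0;
        boolean getMoreData = true;
        while (getMoreData) {
            int startLength = length;
            length += rowSize; // stand-in for writing one row into the block
            rows++;
            getMoreData = roomForAnotherRow(blockSize, startLength, length);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Two large rows leave far too little room for a third, so the block ends early.
        System.out.println(rowsWritten(32767, 15000)); // 2
        // Small rows let the block fill to within one row of the limit.
        System.out.println(rowsWritten(32767, 100));   // 327
    }
}
```

In this model the block only gets close to the 32767-byte limit when rows are small, which is consistent with the need to adjust the row sizes in the repro.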
Marking as a regression because this passes with 10.1.3.1 client, with 10.4 server. It fails with 10.2.1.6. Even with 10.1.3.1, we pass through splitQRYDTA in doneData, so I assume this is a regression in the client being able to handle this case.
I think this dates back to
    if (sqlcode == SqlCode.END_OF_DATA.getCode()) {
        setAllRowsReceivedFromServer(true);
This marked the ResultSet as closed on the server before we sent that final CNTQRY to get the null indicator. Moving the line
    daNullIndicator = readFdocaOneByte();
up, before we do the check and mark the result set as closed, seems to resolve the issue. I don't see why this can't happen sooner rather than later. I will work up a patch and run some tests.
Attached is a patch for this issue. The solution was to move the retrieval of all of the data associated with the QRYDTA before the ResultSet is marked as closed on the server.
Ran suites.All and derbynetclientmats. Please review derby-3230_diff.txt
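The effect of the reordering described in the patch can be illustrated with a toy cursor. This is only a sketch of the idea, not the NetCursor implementation: every class, method, and field name below is hypothetical, the END_OF_DATA condition is left out, and the continue-query round trip is reduced to pulling the next byte array from a queue.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model: reading the last byte before vs. after marking the query as finished.
class EndOfDataOrderingSketch {
    static class Cursor {
        private byte[] buffer;
        private int position = 0;
        private final Deque<byte[]> pendingBlocks; // blocks the "server" has not sent yet
        private boolean allRowsReceived = false;

        Cursor(byte[] firstBlock, Deque<byte[]> pendingBlocks) {
            this.buffer = firstBlock;
            this.pendingBlocks = pendingBlocks;
        }

        int readOneByte() {
            if (position == buffer.length) {
                if (allRowsReceived) {
                    // Plays the role of the exception seen on the client.
                    throw new IllegalStateException("end of query already received");
                }
                buffer = pendingBlocks.remove(); // stand-in for a continue-query request
                position = 0;
            }
            return buffer[position++] & 0xFF;
        }

        void markAllRowsReceived() {
            allRowsReceived = true;
        }
    }

    /** Old order: mark the result set as finished, then read the remaining byte. */
    static int parseOldOrder(Cursor c) {
        c.markAllRowsReceived();
        return c.readOneByte();
    }

    /** New order: read the remaining byte first, then mark the result set as finished. */
    static int parseNewOrder(Cursor c) {
        int lastByte = c.readOneByte();
        c.markAllRowsReceived();
        return lastByte;
    }

    /** Builds a cursor whose first block is consumed and whose final byte sits in a second block. */
    static Cursor cursorAtSplit() {
        Deque<byte[]> pending = new ArrayDeque<>();
        pending.add(new byte[] { (byte) 0xFF }); // arbitrary value for the trailing byte
        Cursor c = new Cursor(new byte[] { 1, 2 }, pending);
        c.readOneByte();
        c.readOneByte();
        return c;
    }

    public static void main(String[] args) {
        try {
            parseOldOrder(cursorAtSplit());
        } catch (IllegalStateException e) {
            System.out.println("old order fails: " + e.getMessage());
        }
        System.out.println("new order reads: " + parseNewOrder(cursorAtSplit())); // 255
    }
}
```

In the old order the cursor refuses to fetch the one-byte remainder because it already believes the query is finished; in the new order the remainder is fetched first and the flag is set afterwards.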
The change looks solid to me, Kathey. The new test failed as expected without the code fix, and passed as expected with the code fix applied. Thanks for the careful research against the different versions to figure out what changed and why; your explanation makes good sense to me. +1 to commit.