|
[
Permlink
| « Hide
]
Bryan Pendleton added a comment - 11/Oct/05 06:49 AM
Here is the sysinfo for the environment in which my Network Server runs
Here is the client-side exception that I receive.
Here is the query that I believe I am running when the Distributed Protocol Error occurs.
I believe I have lucked into a reproducible test case! Below are two queries. The first query does *NOT* exhibit the problem, but the second query *DOES* exhibit the problem. The only difference in the two queries, I believe, is that the first query does not have a WHERE clause, but the second query does.
So apparently it is something about the particular style in which I am including the WHERE clause which is triggering this problem? Query that works: select tf.*,i.name as item_name,tc.case_name, p.name as project_name from apt_test_failure tf join apt_test_case tc on tf.test_id = tc.id join apt_history h on tf.history_id = h.id join apt_item_desc i on h.item_id = i.id join apt_projects p on h.project_id = p.id order by tf.id desc Query that causes missingCodePoint(QRYBLKSZ): select tf.*,i.name as item_name,tc.case_name, p.name as project_name from apt_test_failure tf join apt_test_case tc on tf.test_id = tc.id join apt_history h on tf.history_id = h.id join apt_item_desc i on h.item_id = i.id join apt_projects p on h.project_id = p.id where ( p.name = ?) order by tf.id desc Furthermore, this query works, too! All I do here is add one more condition into the where clause.
select tf.*,i.name as item_name,tc.case_name, p.name as project_name from apt_test_failure tf join apt_test_case tc on tf.test_id = tc.id join apt_history h on tf.history_id = h.id join apt_projects p on h.project_id = p.id join apt_item_desc i on h.item_id = i.id where ( p.name = ?) and ( i.name = ?) order by tf.id desc So it works with *no* where clause, and it works with a compound where clause (two conditions), but it fails with a one-condition where clause. I posted a few tips on debugging protocol errors on the Wiki.
http://wiki.apache.org/db-derby/ProtocolDebuggingTips So maybe the traces will give you some information. On the server side, I'd be curious to know if the queries that work go through that splitQRYDTA path. The fact that some data sets for the same result set format work and others don't might suggest some sort of problem with dividing the data on block boundaries, but only just a wild guess of a possible place of trouble. Here are the client-side traces. It appears that I successfully read 26 rows from the result set, then I crash when I try to close the result set. I suppose this means that the problem is really on the server side, and I'll need to collect traces from there. How do I collect the equivalent server-side traces, Kathy?
The server side tracing information is on the Wiki page as weel.
Looking briefly at your trace and at the splitQRYDTA code it looks like splitQRYDTA assumes it is going to get a CNTQRY after sending each block of data, but in fact in this case the client is sending a CLSQRY. Also from the comment in that function there seems to be something funny about the fact that we are getting the CNTQRY at all. See the DDM manual (Volume 3) take a look at LMTBLKPRC - Limited Block Protocol. for a description of the protocol flow for result sets to see what the actual flow should be here and then we'll figure out how to make it so. It occurred to me that it would probably be most useful to have a matching set of client-side and server-side tracing. So I've re-run the failing test case, with tracing turned on for both the client and the server, and fetched the pair of files and will attach them next.
Client side of a pair of matching trace files surrounding the error. Corresponds to the attachment serverSideTrace.txt.
Server side tracing of the error. Corresponds with the attachment clientSideTrace.txt.
Kathey says: "it looks like splitQRYDTA assumes it is going to get a CNTQRY after sending each block of data, but in fact in this case the client is sending a CLSQRY"
This is very, very interesting. I recently modified my application so that rather than doing: while (rs.next) read-row-and-put-data-on-web-page it now does: while (rs.next) read-row-and-put-data-on-web-page if page is full, break In other words, I used to read the entire result set, but now it is very common for me to read about two dozen rows, then close the result set "early". And this problem began to occur right when I made that change. So yes, I can see where it used to be the case that my client application responded to "split query" with "ok, continue query", but now it will often respond to "split query" with "close query". Let me go read the references you sent, so that I can follow your remarks a bit better, and then I'll be ready to pursue your next suggestion. Thanks! It looks to me like the problem is that, in the "splitQRYDTA" method that Kathey mentioned, the server reads a codepoint and _assumes_ it's a CNTQRY without explicitly checking for it:
// read CNTQRY - not sure why JCC sends this correlationID = reader.readDssHeader(); int codePoint = reader.readLengthAndCodePoint(); DRDAStatement contstmt = parseCNTQRY(); The server reads the codePoint and then assumes that a CNTQRY follows--but it looks like that's not a valid assumption. In this case, the app wants to stop reading the data early, so instead of sending a CNTQRY the client ends up sending a CLSQRY. The CLSQRY codepoint is read, but then parseCNTQRY() is called, so the server is looking for CNTQRY data (which expects a BLKSZ) and instead gets CLSQRY data (which has no BLKSZ), hence the missingCodePoint exception. I looked at the DDM Manual and found the following statements: [ LMTBLKPRC ]: A query is terminated any time the CLSQRY command suspends it. [ EXCSQLSTT ]: After the EXCSQLSTT or each subsequent CNTQRY command, each result set is suspended unless some condition terminates the result set (see CNTQRY (on page 234)). The CNTQRY command can continue a suspended result set so that the next portion of the answer set data is returned. A result set is terminated anytime the CLSQRY command is received and processed. [ LMTBLKPROC ]: The following examples show some of the valid responses to the CLSQRY command: ? If the query is suspended under normal conditions, then an SQLCARD object is returned. I'm assuming that in this case "the query is suspended under normal conditions" (at least, I don't see anything in the traces to indicate otherwise), so based on the above lines from the DDM manual, it seems like the server should be allowing for a CLSQRY to be received in the middle of splitQRYDTA and, if it receives one, should perform normal CLSQRY processing (ultimately returning a SQLCARD). That said, though, I spent a good chunk of time trying to force this situation to occur, and I wasn't able to do so. That doesn't mean it's not possible, just that I didn't have any luck reproducing the problem on my own. I agree that the CLSQRY is perfectly valid, and perhaps many other commands would be perfectly valid at this point too, maybe another statement execution or anything else. The expectation of splitQRYDTA that it is going to get a CNTQRY after sending the first block of data seems problematic to me. Can the server chain the DSS's and push all the remaining data out in a continuation block, or does it have to save the remaining data until it actually does get a CNTQRY? Bryan is investigating what the flow should be in this case, so I'll wait to hear back on his research and then take a closer look at the protocol.
Bryan, do you have a reproducible case that you can attach to the Jira issue? That would really help facilitate discussion on this issue. > Bryan, do you have a reproducible case that you can attach to the Jira issue?
Unfortunately, no. After your very clear explanation of the problem, I thought it would be straightforward to create a reproducible case, but lots of tweaking of my application over the weekend left me no closer to such a case. The two variables I've been principally playing with are the "size" of each row in my result set and the number of rows that I retrieve prior to closing the result set. I thought that what I was trying to do was to force the ResultSet.close() call to occur "cleanly" between QRYDATA blocks, possibly with a partial row spanning the gap between the QRYDATA blocks, but that by itself doesn't appear to be enough to make the problem happen (which I guess means I don't understand the bug yet!). I'll keep trying. YAY! I think I've found a reproducible test case! I'll attach a zip file with the details. Please let me know if you can reproduce the problem using this data.
I think the parseCNTQRY in splitQRYDTA is wrong. The client should be able to send any command after receiving a respose to its previous CNTQRY, so the remainder of the data needs to be handled in some other way than just assuming the next request it gets is going to be a CNTQRY.
What were you able to find on this in the documentation for LMTBLKPRC? Does the server need to associate the data with the result set and wait for the next CNTQRY, or can it send another block in a continuation DSS immediately after sending the first one? > Does the server need to associate the data with the result set and wait for the next CNTQRY, or can it send another block in a continuation DSS immediately after sending the first one?
I'm sorry; I'm finding these DRDA specs rather hard going. For example, I can't find the phrase "continuation DSS" in any of the 3 DRDA specs. I did find some things that seemed relevant, though: - Sections 4.4.6.2 and 7.21.1.1 in Volume 1 seem relevant to this issue, as does the MAXBLKEXT definition in Volume 3 - On page 477 of volume 3: "Answer set data may also be returned in one or more QRYDTA reply data objects" - On page 476 of volume 3: "Possibly, additional query blocks, limited in number by the value of the MAXBLKEXT parameter of OPNQRY, each containing a QRYDTA reply data object" - On page 137 of volume 1: "If the application server cannot return all the requested rows because it has used up all the query blocks it is allowed to, according to the maxblkext parameter, then the DRDA rowset is incomplete. The application server returns one or more QRYDTAs with the rows it has fetched, and expects the CNTQRY request from the application request either to complete the DRDA rowset (or, optionally) to reset it." - Volume 3 says that the MAXBLKEXT value defaults to 0, which I think means that, by default, the server can only return a single QRYDTA block at a time. - On page 137 of volume 1: "Note: At any time after the application requester has sent the OPNQRY command and the application server has successfully processed it, and before the application server has sent an ENDQRYRM reply message, the application requester can send a Close Query (CLSQRY) command..." - On page 435 of volume 1: "If the space remaining in the query block cannot contain the complete base rows, only that part of the row data that can fit in the remainder of the exact query block is added to the query block. If this is the first row added to the exact query block, then additional exact query blocks are generated to contain the remainder of the base row data. If this is not the first row added to the exact query block, then either a partial row is returned in the last exact query block or extra query blocks can be generated to contain the remainder of the row (and possibly, additional rows) as allowed by the maxblkext parameter." So, I'm guessing that by "continuation DSS" you mean the creation of a 2nd-through-nth QRYDATA block. If so, then yes, I believe that the server is allowed to do this, but only if the client has provided a non-zero MAXBLKEXT parameter. Once the server has generated (MAXBLKEXT + 1) QRYDATA blocks, if it has not yet exhausted the rows to be returned, then it must halt the data return process in mid-row, place the query into "suspended" state, and wait for the client to send either a CNTQRY or a CLSQRY. I do not believe that the client is allowed to send any commands other than CNTQRY or CLSQRY; those are the only legal options. Please help me better understand the terminology that these DRDA specs are using. I hear you about the DRDA specs. They are pretty hard. That's why I am making you do the research #:) This research by the way is excellent.
DSS stands for Data Stream Structure. It is sort of the packaging mechanism for data going to the client. It's size is limitted to 32K but sometimes it is ok to set a flag to continue in the next DSS and send more data. The Data Stream Structure documented in the DDM manual under DSS. I have to say I don't really understand the chaining very well, but hope this will be an opportunity for me to learn more about it myself. You said ... >I do not believe that the client is allowed to send any commands other than CNTQRY or >CLSQRY; those are the only legal options. For this particular result set you are right, the client can send only either a CNTQRY or CLSQRY and it will send an identifier for the result set (PKGNAMCSN) with the request to identify which result set it is, but there may be intervening requests that have nothing to do with this result set. For instance if you change your repro to execute some totally unrelated statement before closing the result set, such as "CREATE TABLE FOO (i int)" then before it ever sends the CLSQRY for this result set you will see an EXCSQLIMM as the next command sent from the client. So, we would again fall over if we simply changied splitQRYDTA to handle CLSQRY as well as CNTQRY. Also we may get an intervening CNTQRY for an entirely different result set and then we could end up sending the data for the wrong result set for the reqest, so that is why I think the parseCNTQRY in splitQRYDTA is wrong. Could you try to change your test case as described and see if in fact my assessment is correct that we might get some other request from the client before we get the CNTQRY or CLSQRY request for this result set? > Could you try to change your test case and see if we might get some other request from the client before we get the CNTQRY or CLSQRY request for this result set?
It is exactly as you suspected. If I add a 2nd statement and call 's2.executeUpdate("create table foo (i int)");' at the critical time, we still get the SYNTAXRM exception, but the details are indeed different: - the codepoint in the exception in the server log is 0x2105, which I think is RDBCMTOK - the server-side trace shows that we got EXCSQLIMM rather than CLSQRY: RECEIVE BUFFER: EXCSQLIMM (ASCII) (EBCDIC) - the server-side stack trace is just slightly different. Rather than parseCNTQRY calling 'missingCodePoint', which is what happens when we send a CLSQRY, this time parseCNTQRY calls 'invalidCodePoint', which throws the SYNTAXRM exception. So I agree with you: the client should be able to send *any* command at this point, so the server needs to be prepared to handle any command. I guess this means that the server needs to be able to fully suspend this query, holding the partial row in abeyance, such that it can resume processing the query, starting a new QRYDTA block with the remainder of this partial row, if the client so requests, but also be able to handle any other command that might arrive. I see that there is DRDAStatement.rsSuspend, which in turn calls DRDAResultSet.suspend, but those seem to simply set a status flag; we also need to record the contents of local variable 'temp' which is holding the remainder of the partially-returned row, yes? It seems that a solution should be something along the lines of: 1) when splitQRYDTA determines that it cannot fit all the data into this block, it should attach the remainder data to the result set, suspend the statement, and pop all the way back out to the processCommands loop 2) When a CNTQRY is received, the first thing that writeQRYDTA should do is to look for any partial remainder data that might have been saved by step 1, and write that data prior to writing any new data. 3) And it seems like the MAXBLKEXT parameter should figure in here as well, I think? It's still muddy, but it's getting much clearer. Your steps 1 and 2 sound like a good approach to me. It would be good to get input from others. I know Oyvind, Rick and Francois have been dabbling in DRDA. Opinions?
MAXBLKEXT I think would only be relevant if you took the alternate approach of trying to push the remaining data out to the client when splitQRYDTA is called and not save it for later. That approach might be a can of worms as I don't know that the client is set up to handle it, since it is currently sending MAXBLKEXT as zero. I wanted to check in on this issue. I wanted to make sure that my comment that it would be good to get input wasn't stalling your effort. I think the approach you proposed is a good one.
I have done a bit of reading on the subject (which was a great learning exercise btw) and I find Brian's approach with 1) and 2) to be appropriate. We'll need to ensure the remainder data attached to the resultset is released/freed if the CLSQRY DDM command is sent by the requester or due to some unexpected error condition (as part of ENDQRYRM) returned from the server - but that should happen as part of resultset deallocation logic..
As far as handling MAXBLKEXT in 3) - I'd say that we obviously need to take it into account since a client (requester) could set a value different than zero (i.e. other DRDA clients out there). Interestingly enough, the DDM specifications mention the following for MAXBLKEXT: "A value of zero indicates that the requester is not capable of receiving extra query blocks of answer set data. " which is the default we have right now, hence the server should not be returning extra query blocks of QRYDTA...would that also be why the client sent CLSQRY in the first place...I mean, if the client set a MAXBLKEXT of zero, then it will not expect reading further data from the server after the initial reply (just some thoughts). Now, the specs also say the following about MAXBLKEXT: "A value of minus one indicates that the requester is capable of receiving the entire result set." (am reading this as the requester will be able to receive the entire resultset without any particular limit on the number of blocks of answer set data returned by the server) - this is not very clear from the specs so not completely sure about this. Hence, should not we set '-1' as the default value of MAXBLKEXT for the Derby DRDA networkclient driver instead of zero? Since and IMHO, the (derby) server should by default try to return the entire resultset (in chunks/clocks as appropriate) to the derby networkclient which has control as far as reading more data or not (via CLSQRY) anyway. Now, in case of MAXBLKEXT limit being reached (or in MAXBLKEXT value being zero) then the server should probably terminate the query via a ENDQRYRM reply and some error condition (Severity Code). Hi Kathey & Francois (& others):
I'm trying to write up a mini-design spec for changes to the DRDA implementation to resolve this issue. But during my research, I've gotten myself pretty confused, and could use some help. Can you consider the following questions and tell me what you think? thanks in advance, bryan 1) Does the LMTBLKPRC protocol apply to invocations of callable statements? That is, if we call a procedure, and it has output parameters, and the data in those parameters exceeds the blocksize, should we split the query data and await a CNTQRY? 2) When we are returning externalized data as part of the response to a query, are there any rules for the intermingling of the EXTDTA objects and the QRYDTA objects? Suppose, for example, that a particular QRYDTA object contains the end of row 4, the entire contents of rows 5 and 6, and the start of row 7. I believe that the EXTDTA for rows 5 and 6 should be sent after we send this QRYDTA, but I am not sure about the EXTDTA for rows 4 and 7. I guess what I am asking is: is the rule "we send the EXTDTA for a particular row after the QRYDTA which contains the *start* of this row", or is the rule: "we send the EXTDTA for a particular row after the QRYDTA which contains the *end* of this row"? 3) At the JDBC user level, what JDBC API calls cause the qryrtndta parameter to be set to FALSE. In the DRDA spec, I see: "The qryrtndta parameter specifies whether data is to be returned with the command. If FALSE is sent on the command, the result is that only a positioning fetch operation is performed." How does this situation actually arise in ordinary use? What sort of client program would I write that would cause qryrtndta to be set to false? 4) Do you think that we have any tests in the test suite which validate the behavior of the server when using both LMTBLKPRC protocol and QRYROWSET specification? I've been reading through the DRDA spec's discussion of QRYROWSET handling, and although it does provide some rules for how to handle the various combinations which arise, the rules are *extremely* complicated and I'm nervous about accidentally breaking this processing. Well, this may be premature, or even wildly wrong, but I took a stab at designing a new implementation of the writeQRYDTA/writeFDODTA/splitQRYDTA methods in DRDAConnThread.java in order to fix this bug. Please have a look at the attached and tell me what you think. Depending on how much of a turkey coma I suffer from, I may take a stab at coding this up over the long weekend, so it would be great to have some early feedback prior to that.
thanks, bryan I've coded up a new implementation, which has some new behaviors. The client likes some of those new behaviors, but doesn't like others. Hmmm... this is going to be an "interesting" bug fix. :)
After stepping through the current code carefully in the debugger, and trying several failed attempts at changes, I learned several more things:
First, as Kathey observed on the mailing list, qryrowset is commonly 0 when clients perform a fetch of result set data. This means that the "while" loop in writeFDODTA usually runs exactly one time, and it is the responsibility of the "while" loop in writeQRYDTA to compose the implicit row set for return. I do think that it would be possible to push this behavior down into writeFDODTA and have a single "while" loop handle all these cases, but I also think that writeFDODTA is already complicated enough, so I chose instead to keep the current behavior, to be more surgical in my changes, and to document as best I could what is going on. Second, test "big.sql" in the lang suite does a pretty good job of exposing a lot of the interesting behaviors of this part of the code. Third, as Kathey observed on the mailing list, the new member variable to hold the split data between client requests belongs on the DRDAResultSet, not on the DRDAStatement. I'm still not really sure I understand this business about a statement having multiple current result sets, but the implementation is much more natural with that change. Fourth, it appears that the current code has the behavior that, when sending the remainder data for a split row in a subsequent QRYDTA block, no additional rows are added to an implicit result set for that QRYDTA block. Instead, the remainder data is placed into the QRYDTA block and immediately returned, and a subsequent CNTQRY/QRYDTA pair is needed to start fetching additional rows. This had completely escaped me during "code reading", but it is the whole point of the "moreData=false" after calling splitQRYDTA from writeFDODTA. Fifth, with respect to EXTDTA and split query data, the behavior of the current code appears to be "we send the EXTDTA for a particular row after the QRYDTA which contains the *end* of this row." I have tried to retain this behavior in the new code. One interesting case is when a row must be split across not just 2, but 3 QRYDTA blocks; in this case the EXTDTA needs to be sent after the 3rd such block; that is the reason for the extra "if" statement in writeQRYDTA. Lastly, I'm still not sure I understand how LMTBLKPRC is supposed to interact with callable procedure statements, nor do I understand what would cause the qryrtndta field to be set to FALSE, but I think that my changes don't materially affect those cases, and the new code should have generally the same behaviors in those cases as the old code. I'm doing some final testing, and should post my patch shortly. Here's my proposed patch. It includes substantive changes to DRDAResultSet, DRDAStatement, and DRDAConnThread, changes to the big.sql tests, and a small new test called jira614.java which specifically reproduces this bug.
For reviewers: the big.sql tests pass with both the old code and with the new code. I added new tests here in order to exercise a few more cases of the new code, specifically the handling of rows which were split across 3 QRYDTA blocks, but those cases were not broken in the old code, so the tests pass in both cases. The new jira614 test fails with the old code, but passes with the new code. Suggestions about additional testing that may be needed are welcome. I've run the reproduction case, as well as all the cases in both derbynetmats and derbynetclientmats, which appear to be the primary test suites for testing the network server code. Also, of course, suggestions and feedback about the code changes themselves would be wonderful! Please have a look at these changes at your convenience. thanks, bryan Ugh. There's always 1 last detail. My new test case in lang/big.sql fails, but only:
- if run with the db2jcc.jar driver - AND if the client is run by the test harness For some reason, if I just run IJ using the DB2Driver and run big.sql interactively (using "run 'big.sql';"), the test runs fine. But if I run it with all the flags that the harness uses, then I get a problem. So there's something in: C:\bryan\src\derby\main\tests>java -Dderby.system.home=C:\bryan\src\derby\main\t ests\DerbyNet\big -Dderby.infolog.append=true -Duser.dir=C:\bryan\src\derby\main \tests -Dij.defaultResourcePackage=/org/apache/derbyTesting/functionTests/tests/ lang/ -Dframework=DerbyNet -Dconsole.encoding=Cp1252 org.apache.derby.tools.ij - fr big.sql -p C:\bryan\src\derby\main\tests\DerbyNet\big\big_app.properties that is altering the behavior. Sigh. More fun things to investigate :) The change looks good to me. A couple small points and a request....
For the assertion "There was no data to split", it would be good to have something a bit more descriptive. Can the Jira614.java test be added to an existing test like prepStmt.java to avoid additional database creation in the test suite? Would you be willing to try your server with the original 10.1 client release?Probably the easiest thing to do is run derbynetclientmats with the derbyTesting.jar and derbyclient.jar from the 10.1.1.0 release and your 10.2 derby.jar and derbynet.jar. There may be a few diffs because of bug fixes in 10.2. If you post the diffs I'd be happy to spot check if they just look like bugfixes. Other thoughts that came to mind upon reviewing this change, but not directly relevant to the fix: Even though the agent errors are essentially assertions it might be good to file a Jira entry to localize them, since we do not put them in SanityManager.DEBUG blocks and they could be returned to the user. Since splitting is expensive, it might it be good for network server to be a bit smarter about the split by either improving the coarse estimation in writeQRYDTA or doing something more efficient to handle the overrun than copying it into a new array. Of course that may make bugs like this even harder to track down. Here are the results from running derbynetclientmats with the 10.1.1.0 versions of derbyTesting.jar and derbyclient.jar in my classpath.
I didn't see anything that directly pointed to a problem with the patch, but I didn't know how to interpret/explain all the diffs. Please have a look, Kathey, and tell me if any of these results concern you. Here is an updated patch and updated SVN status incorporating Kathey's feedback:
- big.sql includes some simple SQL tests of the split code - new method jira614 within prepStmt.java includes a focused test of precisely the situation that raised the original bug. - I tried to make the assertion in splitQRYDTA more descriptive of what it was trying to catch - I added some comments about the issues involved in splitting, and about possible alternate algorithms both DerbyNet and DerbyNetClient pass with this patch, and the rest of derbyall was uneventful, too. I committed this patch
Date: Fri Dec 9 17:51:26 2005 New Revision: 355689 URL: http://svn.apache.org/viewcvs?rev=355689&view=rev Thanks so much for the wonderful fix and the great javadoc you added. One thing I did notice that deserves followup and may be related to the intermittent hangs you have seen, is that there was a finalizeChain() call in the old file on line 6139 that seems to be missing from the new version, line 6278. This might be the source of the hang you reported for one of the dss chaining bugs (I can't seem to find that mail now). Can you take a look at that and also clean up the out of date patch attachments to 614. After re-reading the changes and stepping through the code again, I believe that the patch is correct, and the removal of the finalizeChain() call from splitQRYDTA is proper. Here is my reasoning:
The previous flow-of-control for a split QRYDTA block was that writeFDODTA assembled a large row, which splitQRYDTA then split across multiple QRYDTA blocks, sending each of those blocks and waiting for the next CNTQRY message before continuing. By the time that splitQRYDTA returned, any QRYDTA blocks containing partial portions of that large row had been sent to the client, and the current QRYDTA block was filled with the final bytes of the split row. The finalizeChain call in splitQRYDTA was necessary because that was how the partial QRYDTA blocks were sent to the client. In the new flow-of-control, splitQRYDTA does not actually send the QRYDTA block to the client, nor does it wait for the CNTQRY from the client. Instead, all sending and receiving happens in the "normal" location in processCommands. splitQRYDTA arranges to split the data, records the partial data in the result set, and control "pops" all the way back out to processCommands. The QRYDTA block which contains the partial row data is sent by the finalizeChain() call in processCommands, at line 897. In all cases, in both old and new code, externalized data was sent after the QRYDTA containing the last remainder bytes of the split row, by calling writeEXTDTA from writeQRYDTA. Thus the old code had two finalizeChain calls, because a QRYDTA block might be sent either by processCommands, or by splitQRYDTA, depending on whether it contained a split row or not. The new code has only one finalizeChain call, because all QRYDTA blocks are sent by processCommands. One case that I had trouble exercising in my testing is that there is *also* a call to splitQRYDTA from doneData, and perhaps in that case there is a problem? Fundamentally, I think the flow of control will be the same: doneData calls splitQRYDTA to prepare the partial block, control pops back out to processCommands to send the response and wait for the CNTQRY, then processLeftoverQRYDTA will drop the remainder bytes into a subsequent QRYDTA to be sent. Still, I have not been able to provoke such a case in my simple testing. Are there flaws in my reasoning here? Additional test cases I should try? Do you know of a test case which will cause splitQRYDTA to be called from doneData? Marking as closed since Kathey has committed the patch to the trunk and I've verified that it addresses my problem.
Reopening this bug to submit a patch to reset splitQRYDTA variable in DRDAResultSet.close(). This was found when working on DERBY-210 and discussion in derby-dev can be found in this thread: http://www.nabble.com/-jira-Commented%3A-%28DERBY-210%29-Network-Server-will-leak-prepared-statements-if-not-explicitly-closed-by-the-user-until-the-connection-is-closed-t1116045.html#a2952317
Attaching a patch 'reset_drda_rs.diff' which resets splitQRYDTA field in DRDAResultSet.close() method. This patch changes only one file:
M java\drda\org\apache\derby\impl\drda\DRDAResultSet.java With this patch, I ran derbynetmats and derbynetclientmats with Sun JDK 1.4.2 on Win XP. No failures. Bryan mentioned he'll try to make a repro/test specific to this issue: http://www.nabble.com/Re%3A-jira-Commented%3A-%28DERBY-210%29-Network-Server-will-leak-prepared-statements-if-not-explicitly-closed-by-the-user-until-the-connection-is-closed-p2961106.html I would appreciate if someone can take a look and commit this patch. Thanks. Welcome back, JIRA! Attached is d614_a.diff, which contains the patch that Deepa found as well as a new regression test which demonstrates the problem. To verify this fix, run the regression test without the patch and you will get a StringOutOfBoundsException in the client-side parsing, then apply the one-line change to DRDAResultSet.java and the test will pass. To run the test by itself:
java -Dframework=DerbyNetClient org.apache.derbyTesting.functionTests.harness.RunTest derbynet/prepStmt.java Thank you very much to Deepa for finding the problem and describing it so clearly. For everybody else, here's the quick summary: DRDAStatement objects on the server side can get re-used after they have been closed. After a statement has been closed, the client may re-use the Section Number when sending a new request for a new statement. The server then discovers that there was a statement object in memory, and it automatically closes that statement and re-uses it. With my first patch to this bug, I introduced a problem in this processing; the splitQRYDTA field was not cleared when the statement was closed. This meant that if that splitQRYDTA field just happened to have some pending query data from the previous statement at the point where it was re-used, the pending query data would erroneously stick around and be returned as the first result for the new statement. Thus, the conditions that lead up to this bug are: - a statement is created, and results are fetched, and a QRYDTA block has to be split. - instead of sending a CNTQRY to fetch the split data, the client instead closes the statement. - then, the client creates a new statement with the same Section Number (it is free to do so, since the previous statement has been closed), and sends a new query request over. - at that point, the server responds to the new query with the leftover split QRYDATA. The regression test simply forces the server down this code path. If somebody could please take a look at this patch, I would be very grateful. Updated patch d614_a_with_close_stmts.diff addresses feedback from Dan and Deepa: the test should close the statement objects at the end in order to avoid wasting client driver resources and to match standard JDBC style. I re-ran the test after adding the close calls, both before and after applying the patch to DRDAResultSet.java, and confirmed that the test still demonstrates the bug by failing prior to the fix and succeeding afterwards. Please have another look.
I committed this change. I noticed there is other state that is not reset in closed. Perhaps it is guaranteed to be initialized, I don't know. Maybe another Jira entry to make sure the other state is reset would be good.
Date: Fri Feb 17 07:59:55 2006 New Revision: 378552 URL: http://svn.apache.org/viewcvs?rev=378552&view=rev Deepa said
> > > >I had checked all the DRDAResultSet states get initialized when >re-used. If they are not in close, they get initialized in some other >DRDAStatement methods - like addResultSet, setRsDefaultOptions, >setOPNQRYOptions, setQueryOptions etc. I will open a JIRA and post my >observations there so that this can be rechecked. I think it will be >good to add reset for all states in close() method so that we don't >miss anything by any other code changes. > > Or maybe move them out to a reset method that is called when the Resultset is reused. Would that make things clearer for the future? At least add a comment about how important it is to add new fields to this reset process. Thank you Kathey for submitting the second patch.
Re-opening this bug to port the fix to 10.1
Attached patch 'derby614_10_1.diff' which merges the following changes to 10.1
355689 - Fixes the split logic when data passes the blksize boundary. Details of the fix are in spec.html attached to 378552 - clear splitQRYTDA field for reuse of the resultset Running derbyall now. Will commit this tomorrow. Checked this change into 10.1 branch:
Date: Thu Apr 13 06:14:26 2006 New Revision: 393798 URL: http://svn.apache.org/viewcvs?rev=393798&view=rev Thanks again to Bryan for this great fix. It will make an important addition to 10.1.3 and surely save lots of folks, including me, countless hours of trying to track this down and get it resolved. Kathey reclosing this issue after port to 10.1
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||