Issue Details (XML | Word | Printable)

Key: DERBY-3230
Type: Bug Bug
Status: Resolved Resolved
Resolution: Fixed
Priority: Critical Critical
Assignee: Kathey Marsden
Reporter: Stefan Huber
Votes: 0
Watchers: 1
Operations

If you were logged in you would be able to see more operations.
Derby

Selecting data from a Table raises Error XN008: Query processing has been terminated due to an error on the server

Created: 27/Nov/07 09:50 AM   Updated: 30/Jun/09 03:55 PM
Return to search
Component/s: Network Client, Network Server
Affects Version/s: 10.3.1.4
Fix Version/s: 10.3.3.0, 10.4.1.3

Time Tracking:
Not Specified

File Attachments:
  Size
Text File Licensed for inclusion in ASF works derby-3230_diff.txt 2008-01-17 12:06 AM Kathey Marsden 3 kB
Text File Licensed for inclusion in ASF works derby-3230_stat.txt 2008-01-17 12:00 AM Kathey Marsden 0.2 kB
Java Source File Licensed for inclusion in ASF works Repro3230.java 2008-01-16 12:48 AM Kathey Marsden 3 kB
File Licensed for inclusion in ASF works test.rar 2007-11-27 09:53 AM Stefan Huber 358 kB
Java Source File Licensed for inclusion in ASF works TestBoundaries.java 2008-01-15 05:11 PM Kathey Marsden 2 kB
Environment: Windows XP

Urgency: Urgent
Issue & fix info: Patch Available
Bug behavior facts: Regression
Resolution Date: 17/Jan/08 06:11 PM


 Description  « Hide
I got a table (PCLASS) in my database where I cannot select the data via a simple select statement:
- If I execute 'select * from PCLASS': XN008: Query processing has been terminated due to an error on the server
- If I execute 'select * from PCLASS order by CLASSNAME': it works
- If I add or remove only one row to/from the table: it works

If I use the embedded driver everything works fine. But as soon as I use the derby network client to access the database, this problem occurs.

This is very odd. I spent a lot of time to do some research on this behavior because I could not believe that the network client produces errors depending on the data contained in a table. But after all it looks like I managed to fill the table with an exact amount of data that causes this error to arise.

To reproduce the problem download the attached file and follow these instructions:
* extract test.rar to $derby.system.home$
* start derby network server
* start ji utility and execute the following commands:
** connect to database test using client driver:
connect 'jdbc:derby://localhost/test;user=ZOL;password=zol';
** select content of table PCLASS
select * from PCLASS;
--> Error XN008: Query processing has been terminated due to an error on the server

** disconnect from database
disconnect;
** shutdown database 'test'
connect 'jdbc:derby://localhost/test;user=ZOL;password=zol;shutdown=true';
** connect to database 'test' using embedded driver:
connect 'jdbc:derby:test;user=ZOL;password=zol';
** select content of table PCLASS
select * from PCLASS;
--> everything o.k.


 All   Comments   Work Log   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Bryan Pendleton added a comment - 27/Nov/07 02:52 PM
What is the error that you get on the server? Can you look in derby.log?

Stefan Huber added a comment - 27/Nov/07 03:46 PM
Sorry, but I forgot to mention that there is no error reported in the server log file.
I tried it with the "derby.drda.debug" property set to "true" and i got pretty much output in derby.log. But no errors or exceptions were reported.
I also tried to use the debug libraries to get more information on the error, but there were still no error messages.
All the rows of the table are reported in derby.log. This is one of the reasons why I believe that it could be a client problem. I used the debugger on the client side and got to the line in method
        org.apache.derby.client.net.NetCursor.checkAndThrowReceivedEndqryrm
where the exception is thrown.
At the time when the exception is thrown all data are already transferred to the client side and it seems like there is just one byte missing for method readFdocaOneByte (which is than calling checkAndThrowReceivedEndqryrm because position_ == lastValidBytePosition_) but the server connection is already closed.

I've been working with derby for a long time and i never got such a behavior. I'm curious about an explanation for this.

Bryan Pendleton added a comment - 28/Nov/07 02:01 AM
What tool do I use to unpack a .rar file?

Andrew McIntyre added a comment - 28/Nov/07 02:22 AM
I know of WinRAR for Windows (http://www.rarlab.com/), and there's GNU unrar for the various Unix OSes.

Bryan Pendleton added a comment - 28/Nov/07 03:21 AM
Thanks Andrew, WinRAR worked fine.

I was able to reproduce the problem with the current trunk code and the provided test.rar file.

Bryan Pendleton added a comment - 28/Nov/07 04:12 AM
Stepping through the code on the server side is interesting. When the entire dataset is fetched
via IJ, it just so happens that the last row ends up at position 32706 in the buffer, meaning that
there is 61 bytes of space available in the buffer. But by the time DRDAConnThread.doneData()
has finished writing the various bits and pieces of housekeeping data to the end of the buffer,
it has gone beyond 32767 bytes and hence needs to be split. The splitQRYDTA processing
then ends up splitting the QRYDTA block in the midst of the doneData SQLCAGRP bytes,
rather than in the middle of some normal row data, which is how most QRYDTA splits work.
The client code then gets into the processing of the doneData SQLCAGRP, but does not expect
that the DRDA response block might have been split there, so instead of requesting the
remainder of the block with a CNTQRY, the client code throws an exception.

I'm sorry, I know that the above is kind of dense and jargon-y. I think that the bottom line is that
this problem is closely related to DERBY-614 and has to do with the fact that this query had
the bad luck to need to split the query results at a point that was:
 - after the last row of the results
 - but before the end of the message
 - and hence at a place where the client code didn't expect the block to have been split.

Bryan Pendleton added a comment - 28/Nov/07 04:17 AM
It may be an off-by-one bug; it appears that the final buffer is exactly 1 byte too long,
and the splitting of the buffer splits it into a 32767 byte portion and a 1 byte portion.

Kathey Marsden added a comment - 09/Jan/08 11:44 PM
Thanks Bryan for your comments on this issue so far.

I am taking a look at this issue and have a question about the position logic in the client. The failure occurs in this code because position_ = lastValidBytePosition_ in the following code in NetCursor.

  private int readFdocaOneByte() throws org.apache.derby.client.am.DisconnectException, SqlException {
        // For singleton select, the complete row always comes back, even if multiple query blocks are required,
        // so there is no need to drive a flowFetch (continue query) request for singleton select.
        if (position_ == lastValidBytePosition_) {
            // Check for ENDQRYRM, throw SqlException if already received one.
            checkAndThrowReceivedEndqryrm();
     .......

There is no javadoc for lastValidBytePosition, nor currentRowPosition and nextRowPosition which seem to be at play here as well. I haven't really been able to figure it all out from the code yet.

  I'd most appreciate an explanation of the interplay of the various positions if someone understands it, so I can understand why position == lastValidBytePosition and why that is a bad thing in this case. I will add any explanation as comments to the code.


Kathey



Kathey Marsden added a comment - 15/Jan/08 05:11 PM
In order to better understand what is going on here I thought I would try to develop a repro without the user database. I thought all that was needed was to produce data of the correct size approaching 32K. I wrote this not very intelligent program to just increment the size of a varchar by 1 to try to find the correct size and reproduce the error, but was unsuccessful in reproducing it. Anyway, I am quite out of ideas on how to approach this issue, either in creating a reproduction or fixing and would most appreciate advice on next steps.

Bryan Pendleton added a comment - 15/Jan/08 06:26 PM
Hi Kathey, I think that you are proceeding along the right course; developing a
standalone repro seems like the right plan. I agree that it should be very data
dependent. Unfortunately, I am temporarily without a computer as my main
computer has died, so I'm kind of handicapped until I get another computer set up.

Can you (using a debugger) examine how the server behaves with your repro?
In particular, I thought it was particularly fascinating that with the original repro
we had a call to splitQRYDTA made from doneData, because I had never seen
that case happen before while stepping through the server code. With your
repro, can you tell whether there is a splitQRYDTA call made from doneData?

Kathey Marsden added a comment - 15/Jan/08 08:53 PM
Thank you Bryan for the help. I see that the reason my repro does not trigger the problem is because the server proactively ends the DSS in this code because it would not be able to fit another 15K row. If I comment out
// if ((stmt.getBlksize() - endLength ) < rowsize)
// getMoreData = false;

I can trigger the problem. I will work to adjust my rowsizes so I don't trigger this condition.

// if we don't have enough room for a row of the
// last row's size, don't try to cram it in.
// It would get split up but it is not very efficient.
if (getMoreData == true)
{
int endLength = writer.getDSSLength();
int rowsize = endLength - startLength;
if ((stmt.getBlksize() - endLength ) < rowsize)
getMoreData = false;


Kathey Marsden added a comment - 16/Jan/08 12:48 AM
Attached is a repro for this issue without the user database, Repro3230.java. Thanks Bryan for the help in narrowing the case down.

Kathey Marsden added a comment - 16/Jan/08 07:24 PM
Marking as a regression because this passes with 10.1.3.1 client, with 10.4 server. It fails with 10.2.1.6. Even with 10.1.3.1, we pass through splitQRYDTA in doneData, so I assume this is a regression in the client being able to handle this case.

Kathey Marsden added a comment - 16/Jan/08 09:10 PM
I think this dates back to DERBY-821 when we added the following code to NetCursor.calculateColumnOffsetsForRow()
                    if (sqlcode == SqlCode.END_OF_DATA.getCode()) {
                        setAllRowsReceivedFromServer(true);

This marked the ResultSet as closed on the server before we sent that final CNTQRY to get the null indicator. Moving the line
daNullIndicator = readFdocaOneByte(); up before we do the check and mark the result set as closed, seems to resolve the issue. I don't see why this can't happen sooner rather than later. I will work up a patch and run some tests.


Kathey Marsden added a comment - 17/Jan/08 12:01 AM
Attached is a patch for this issue. The solution was to move the retrieval of all of the data associated with the QRYDTA before the ResultSet is marked as closed on the server.
Ran suites.All and derbynetclientmats.

Kathey Marsden added a comment - 17/Jan/08 12:08 AM
Please review derby-3230_diff.txt

Bryan Pendleton added a comment - 17/Jan/08 03:21 AM
The change looks solid to me, Kathey. The new test failed as expected without the
code fix, and passed as expected with the code fix applied. Thanks for the careful
research against the different versions to figure out what changed and why; your
explanation makes good sense to me. +1 to commit.