Here's a patch (derby-4348-1a.diff) that adds a regression test case
and fixes the problem.
It turns out that there is in fact a problem with the special case for
LONG VARCHAR and LONG VARBINARY when normalizing the
values. Normally, DataTypeDescriptor.normalize() normalizes a
DataValueDescriptor by copying it into another DataValueDescriptor and
returning the copy. This destination DVD is cached and reused so that
one doesn't need to reallocate it for every value to normalize.
The special case for LONG VARCHAR and LONG VARBINARY changes this
slightly by returning the source DVD instead of the destination DVD,
apparently to avoid problems with shared streams.
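The pattern described above can be sketched as follows. This is a minimal, hypothetical model, not Derby's actual API: DVD stands in for DataValueDescriptor, and the method shape is illustrative only.

```java
// Simplified sketch of the normalization pattern described above.
public class NormalizeSketch {
    // Stand-in for DataValueDescriptor: just a mutable holder.
    static class DVD {
        String value;
        DVD(String v) { value = v; }
    }

    // Normal case: copy the source value into the cached destination and
    // return the destination, so the destination object can be reused.
    // Special case (LONG VARCHAR / LONG VARBINARY): return the source DVD
    // itself, as described above, to avoid problems with shared streams.
    static DVD normalize(DVD source, DVD cachedDest, boolean isLongType) {
        if (isLongType) {
            return source;
        }
        cachedDest.value = source.value;
        return cachedDest;
    }

    public static void main(String[] args) {
        DVD dest = new DVD(null);
        DVD src = new DVD("abc");
        System.out.println(normalize(src, dest, false) == dest); // true
        System.out.println(normalize(src, dest, true) == src);   // true
    }
}
```

Note that in the special case the caller ends up holding a reference to the source object, which is what sets up the bug below.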
Now, NormalizeResultSet has an ExecRow field, called normalizedRow, in
which the cached destination DVDs are stored. It is reused so that
NormalizeResultSet.getNextRowCore() returns the exact same instance
for every row. But since DataTypeDescriptor.normalize() returns the
source DVD instead of the copy for LONG VARCHAR, the cached ExecRow
will contain the original DVD and not the copy. When the next row is
requested from the NormalizeResultSet, it will therefore use the
source DVD for the previous row as the destination DVD for the call to
normalize(). Copying a column from the current row to the previous row is not a
problem for most of the rows, as the previous row has already been
processed. However, when processing the first row in a new chunk
returned from BulkTableScanResultSet, the DVDs in the previous row
have also been reused in the fetch buffer to hold the last row in the
chunk. Since that row has not yet been processed, copying into it from
the current row will affect what we see when we get to it later.
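The aliasing can be illustrated with a small, self-contained sketch. The names and setup here are hypothetical; the point is only that the stale reference left in the cached row is the very object the bulk scan reuses for the chunk's last row.

```java
public class AliasingSketch {
    static class DVD {
        String value;
        DVD(String v) { value = v; }
    }

    public static void main(String[] args) {
        // A new chunk from the bulk scan. The DVD holding its last row is
        // the same object the previous normalized row still references,
        // because normalize() handed the source DVD back for LONG VARCHAR.
        DVD firstRow = new DVD("first");
        DVD lastRow = new DVD("last");
        DVD staleDestination = lastRow; // alias left in the cached row

        // Normalizing the first row copies into the stale destination,
        // silently overwriting the chunk's still-unprocessed last row.
        staleDestination.value = firstRow.value;

        System.out.println(lastRow.value); // prints "first", not "last"
    }
}
```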
The problem here is that NormalizeResultSet.normalizedRow serves two
purposes: (1) Hold an ExecRow object that can be reused, and (2) hold
one DataValueDescriptor per column that can be reused. This works fine
as long as the actual DVD references in the ExecRow are not changed,
but when one of the values is a LONG VARCHAR/LONG VARBINARY the
references are changed.
The patch addresses the problem by having a separate data structure
for each of the two purposes. NormalizeResultSet.normalizedRow
continues to cache the ExecRow object for reuse. A new field
(cachedDestinations) is added to hold each individual
DataValueDescriptor that should be reused. This way, changing the DVD
references in normalizedRow does not change which destination DVD is
used when processing the next row, and we don't end up modifying a DVD
which is also present later in the fetch buffer of the bulk scan.
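The shape of the fix can be sketched like this. The helper name getCachedDestination() mirrors the one described below, but the class and method bodies are illustrative assumptions, not the patch's actual code.

```java
public class FixSketch {
    static class DVD {
        String value;
        DVD(String v) { value = v; }
    }

    // Destination DVDs are cached separately from the result row, so
    // swapping a reference in the row (the LONG VARCHAR special case)
    // no longer changes which destination is reused for the next row.
    private final DVD[] cachedDestinations;

    FixSketch(int columns) {
        cachedDestinations = new DVD[columns];
    }

    // Lazily allocate the reusable destination for a column; this plays
    // the role of the getCachedDestination() helper described below.
    DVD getCachedDestination(int col) {
        if (cachedDestinations[col] == null) {
            cachedDestinations[col] = new DVD(null);
        }
        return cachedDestinations[col];
    }

    DVD normalizeColumn(DVD source, int col, boolean isLongType) {
        if (isLongType) {
            return source; // special case unchanged
        }
        DVD dest = getCachedDestination(col);
        dest.value = source.value;
        return dest;
    }
}
```

Whatever ends up stored in the result row, the next call for the same column always fetches its destination from cachedDestinations, never from the row.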
Description of changes:
- new field cachedDestinations which takes over some of the
responsibility from normalizedRow
- new helper methods getCachedDestination() and getDesiredType() to
reduce the complexity of normalizeRow()
- removed unneeded throws clause from fetchResultTypes() to prevent
getDesiredType() from having to inherit the unneeded clause
- removed code in normalize() that initializes the cached destination
if it is null, since this is now handled by getCachedDestination()
- new JUnit test which exposes the bug
The regression tests ran cleanly with this patch.