I got stuck trying to implement the original solution, so I tried an alternative approach.
It is a lot simpler, but people might not like it. Note, however, that it follows roughly the same pattern as Blob.
The patch is a quick mash-up, and I would like some feedback from the community.
The alternative approach is to make every class that reads or writes data from store able to peek at the stored bytes and determine which format to use when reading/writing the data.
Including my second format, we have these two byte formats:
- current: D1_D2_DATA
- new: D4_D3_M_D2_D1_DATA
M is a magic byte used to detect the new format. It is an illegal byte in the (modified) UTF-8 encoding used for the data, so it should not be possible to misinterpret it as the start of the first format's data.
I have set M to F0 (11110000), but I mask out the low four bits when looking for the magic byte. This makes it possible to have arbitrarily many formats should that become necessary; the main point is that the four highest bits are always set.
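The detection can be sketched as below. The offsets follow the byte layouts above: in the old format the third byte is the first byte of the UTF-8 data, whose four highest bits can never all be set, while in the new format it is the magic byte. The class and constant names are mine, not from the patch.

```java
// Detect which header format a stored value uses by peeking at the third byte.
public class HeaderFormat {

    // Magic byte 0xF0 (1111 0000); only the high nibble is significant,
    // leaving the low nibble free to number future formats.
    static final int MAGIC_MASK = 0xF0;

    /** Returns true if the header uses the new (magic-byte) format. */
    static boolean isNewFormat(byte[] header) {
        return (header[2] & MAGIC_MASK) == MAGIC_MASK;
    }

    public static void main(String[] args) {
        // Old format: D1_D2 length bytes followed by UTF-8 data ('a' = 0x61).
        byte[] oldHeader = { 0x00, 0x01, 0x61 };
        // New format: D4_D3_M_D2_D1.
        byte[] newHeader = { 0x00, 0x00, (byte) 0xF0, 0x00, 0x01 };
        System.out.println(isNewFormat(oldHeader));  // false
        System.out.println(isNewFormat(newHeader));  // true
    }
}
```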
With respect to data corruption (i.e. one bit getting flipped), is this approach safe enough?
So if we need to be able to store huge Clobs in the future, we could change M and use another format:
- future: D6_D5_M_D4_D3_D2_D1_DATA
The same approach could be used to store other meta information.
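Assuming D1..D4 are the four bytes of a 32-bit length with D1 the least significant byte (my reading of the layouts above, not confirmed against the patch), writing and reading the new five-byte header could look like:

```java
// Build and parse the new-format header D4_D3_M_D2_D1, assuming D1..D4
// are the bytes of a 32-bit length, D1 least significant (an assumption).
public class NewHeader {

    static final int MAGIC = 0xF0;

    /** Encode a length into the five-byte new-format header. */
    static byte[] write(int length) {
        return new byte[] {
            (byte) (length >>> 24),  // D4
            (byte) (length >>> 16),  // D3
            (byte) MAGIC,            // M
            (byte) (length >>> 8),   // D2
            (byte) length            // D1
        };
    }

    /** Decode the length from a new-format header. */
    static int read(byte[] h) {
        return ((h[0] & 0xFF) << 24) | ((h[1] & 0xFF) << 16)
             | ((h[3] & 0xFF) << 8)  |  (h[4] & 0xFF);
    }

    public static void main(String[] args) {
        System.out.println(read(write(123456789)));  // 123456789
    }
}
```

The future seven-byte layout would extend this with two more length bytes (D5, D6) in front.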
The patch 'derby-3907-alternative_approach.diff' only changes behavior for small Clobs. To enable a new format for a larger Clob, the streaming classes have to be changed (ReaderToUTF8Stream, UTF8Reader).
It should be noted that these classes are used to write other character types (CHAR, VARCHAR) as well, and I do not intend to change how those are represented. This means I have to pass down enough context information for them to do the correct thing.
While the format can be detected on read, an informed decision must be made on write. Currently I consult the data dictionary to check the database version, and if it is less than 10.5 I use the old format. Is there a better way?
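The write-time decision could be factored out roughly as follows. DatabaseVersion is a hypothetical stand-in for the information obtained from the data dictionary, not an actual Derby class:

```java
// Choose a header format at write time based on the database version.
// DatabaseVersion is a hypothetical stand-in for the data dictionary lookup.
public class FormatChooser {

    static final class DatabaseVersion {
        final int major, minor;
        DatabaseVersion(int major, int minor) {
            this.major = major;
            this.minor = minor;
        }
        boolean atLeast(int maj, int min) {
            return major > maj || (major == maj && minor >= min);
        }
    }

    /** Use the new header format only for databases at version 10.5 or later. */
    static boolean useNewFormat(DatabaseVersion v) {
        return v.atLeast(10, 5);
    }

    public static void main(String[] args) {
        System.out.println(useNewFormat(new DatabaseVersion(10, 4)));  // false
        System.out.println(useNewFormat(new DatabaseVersion(10, 5)));  // true
    }
}
```

This keeps the old format for soft-upgraded databases, so they stay readable by the older version.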
Regarding the original approach, I got stuck because the upper layers of Derby send NULL values of the data types down into store. The upper layers don't have any context information and are unable to choose the correct implementation. The system doesn't seem to be set up for having multiple implementations of a single data type at this level.
I ended up with a series of hacks, for instance having store override the Clob implementation type, but it just didn't work very well. At one point I had normal, soft-upgraded, and hard-upgraded databases working, but compress table failed. I'm sure this isn't the only code path that would fail.
I might pick up the work again later, but right now I want to wait for a while and work on other issues.