DERBY-4477

Selecting / projecting a column whose value is represented by a stream more than once fails

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 10.3.3.0, 10.4.2.0, 10.5.3.0
    • Fix Version/s: 10.5.3.1, 10.6.1.0
    • Component/s: SQL, Store
    • Labels:
      None
    • Bug behavior facts:
      Crash, Data corruption, Wrong query result

      Description

      Selecting / projecting a column whose value is represented as a stream more than once crashes Derby, for example:
      ResultSet rs = stmt.executeQuery("SELECT clobValue AS clobOne, clobValue AS clobTwo FROM mytable");
      rs.getString(1);
      rs.getString(2);
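The underlying hazard can be illustrated with plain JDK streams, without Derby at all (this is a sketch of the failure mode, not Derby code): when two projected columns alias one stream, whoever reads first drains it, and the second read sees nothing.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Plain-JDK illustration of the hazard: two "columns" sharing one
// stream. The first consumer drains it; the second gets nothing.
public class SharedStreamHazard {
    public static void main(String[] args) throws IOException {
        InputStream stored = new ByteArrayInputStream("clob data".getBytes());
        InputStream clobOne = stored;  // both columns alias the same stream
        InputStream clobTwo = stored;
        int first = clobOne.readAllBytes().length;   // drains the stream
        int second = clobTwo.readAllBytes().length;  // nothing left
        System.out.println(first + " " + second);
    }
}
```

In Derby the second read does not merely come up empty; depending on the execution path it causes crashes or wrong results, which is what this issue fixes.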

      After looking at the class of bugs having to do with reuse of stream data types, I now have a possible fix. It fixes DERBY-3645, DERBY-3646 and DERBY-2349 (there may be more Jira issues).
      The core of the fix is cloning certain DVDs being selected/projected in multiple columns. There are two types of cloning:
      A) materializing clone
      B) stream clone

      (A) can be implemented already, (B) requires code to clone a stream without materializing it. Note that the streams I'm talking about are streams originating from the store.
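The two strategies can be sketched in plain Java (the names cloneBySize, MATERIALIZE_LIMIT and the byte-array "store" below are hypothetical illustrations; Derby's real implementation lives in the DataValueDescriptor hierarchy, e.g. cloneValue(boolean) and StreamStorable.loadStream()):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class CloneSketch {
    // Hypothetical threshold: values below it get a materializing clone.
    static final int MATERIALIZE_LIMIT = 32 * 1024;

    // (A) Materializing clone: decode the raw bytes once, share the result.
    static byte[] materializingClone(InputStream src) throws IOException {
        return src.readAllBytes();
    }

    // (B) Stream clone: hand out a fresh, independent stream over the same
    // source data, without pulling the whole value into memory at once.
    static InputStream streamClone(byte[] storeData, int offset, int length) {
        return new ByteArrayInputStream(storeData, offset, length);
    }

    public static void main(String[] args) throws IOException {
        byte[] store = "clob value held in the store".getBytes();
        byte[] a = materializingClone(new ByteArrayInputStream(store));
        InputStream b1 = streamClone(store, 0, store.length);
        InputStream b2 = streamClone(store, 0, store.length);
        // Each clone can be consumed fully and independently.
        System.out.println(a.length == store.length
                && b1.readAllBytes().length == b2.readAllBytes().length);
    }
}
```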

      Testing revealed the following:

      • the cost of the checks performed to figure out if cloning is required seems acceptable (negligible?)
      • in some cases (A) has better performance than (B) because the raw data only has to be decoded once
      • stream clones are preferred when the data value is above a certain size, for several reasons:
        • avoids potential out-of-memory errors (and in a server environment, lowers the memory pressure)
        • avoids decoding the whole value if the JDBC streaming APIs are used to access only parts of the value
        • avoids decoding altogether in cases where the value isn't accessed by the client / user
          (this statement conflicts with the performance observation above)

      We don't always know the size of a value, and since the fix code deals with all kinds of data types, it is slightly more costly to try to obtain the size.

      What do people think about the following goal statement?
      Goals:
      ----- Phase 1
      1) No crashes or wrong results due to stream reuse when executing duplicate column selections (minus goal 4)
      2) Minimal performance degradation for non-duplicate column selections
      3) Only a minor performance degradation for duplicate [[LONG] VAR]CHAR [FOR BIT DATA] column selections
      ----- Phase 2
      4) No out-of-memory exceptions during execution of duplicate column selections of BLOB/CLOB
      5) Optimize BLOB/CLOB cloning

      I think phase 1 can proceed by reviewing and discussing the prototype patch. Phase 2 requires more discussion and work (see DERBY-3650).

      A note about the bug behavior facts:
      Since this issue is the underlying cause for several other reported issues, I have decided to be liberal when setting the bug behavior facts. Depending on where the duplicate column selection is used, it can cause both crashes, wrong results and data corruption.

      1. derby-4477-0a-prototype.diff
        26 kB
        Kristian Waagan
      2. derby-4477-lowmem.diff
        7 kB
        Dag H. Wanvik
      3. derby-4477-lowmem.stat
        0.2 kB
        Dag H. Wanvik
      4. derby-4477-lowmem-2.diff
        8 kB
        Dag H. Wanvik
      5. derby-4477-lowmem-2.stat
        0.4 kB
        Dag H. Wanvik
      6. derby-4477-lowmem-followup.diff
        2 kB
        Dag H. Wanvik
      7. derby-4477-lowmem-followup.stat
        0.2 kB
        Dag H. Wanvik
      8. derby-4477-partial.diff
        14 kB
        Dag H. Wanvik
      9. derby-4477-partial.stat
        0.5 kB
        Dag H. Wanvik
      10. derby-4477-partial-2.diff
        23 kB
        Dag H. Wanvik
      11. derby-4477-partial-2.stat
        0.5 kB
        Dag H. Wanvik
      12. derby-4477-useCloning.diff
        3 kB
        Dag H. Wanvik
      13. derby-4477-useCloning.stat
        0.3 kB
        Dag H. Wanvik

          Activity

          Kristian Waagan added a comment -

          Sounds good to me. Thanks, Kathey.

          Kathey Marsden added a comment -

           I looked at the changes that were backported for DERBY-3645 and DERBY-3646, along with the partial fix for this issue, and I think they are OK to leave in 10.5: the behavior has improved and they seem to accomplish the phase 1 goals of this fix:

          ----- Phase 1
          1) No crashes or wrong results due to stream reuse when executing duplicate column selections (minus goal 4)
          2) Minimal performance degradation for non-duplicate column selections
          3) Only a minor performance degradation for duplicate [[LONG] VAR]CHAR [FOR BIT DATA] column selections

          But not the phase 2 goals of:
          ----- Phase 2
          4) No out-of-memory exceptions during execution of duplicate column selections of BLOB/CLOB
          5) Optimize BLOB/CLOB cloning

           and thus are good fixes for 10.5. I fixed up the javadoc warnings for the references to the 10.5 lowmem tests with revision 966440 in 10.5.

          Kathey Marsden added a comment -

          Thank you Kristian for the clarification. I am not sure where I got off track and ended up linking these issues. I must have pulled up the wrong email the day Lily and I made the links.

          I will remove the tags for now and take a closer look at whether DERBY-3645 and DERBY-3646 should be backed out of 10.5. Although they merged cleanly and passed regression tests, the severed fix may be problematic.

          Kristian Waagan added a comment -

          You can't backport 908563 without also backporting DERBY-4520, which includes rather significant changes.
          I was a bit surprised to see these issues being backported, since the comment on the '10.5 backporting' thread on derby-dev suggested they wouldn't.
          DERBY-3645 and DERBY-3646 will also work in 10.5 as it stands now, as long as there is enough memory for the materialized LOB.

          I think the easiest thing to do here is to simply remove the @see tags, but I haven't looked in detail at the state of the code in 10.5.

          Kathey Marsden added a comment -

           Myrna pointed out that backporting the change introduced the following javadoc warnings.

          [javadoc] D:\svnnightlies\v10_5\src\opensource\java\testing\org\apache\derbyTesting\functionTests\tests\jdbcapi\BLOBTest.java:398: warning - Tag @see: can't find testDerby4477_3645_3646_Repro_lowmem in org.apache.derbyTesting.functionTests.tests.memory.BlobMemTest
          [javadoc] D:\svnnightlies\v10_5\src\opensource\java\testing\org\apache\derbyTesting\functionTests\tests\jdbcapi\BLOBTest.java:398: warning - Tag @see: can't find testDerby4477_3645_3646_Repro_lowmem_clob in org.apache.derbyTesting.functionTests.tests.memory.ClobMemTest

           Based on Kristian's earlier comment, I am unclear on whether this means I should also backport revision 908563 along with the lowmem tests (and hopefully, in doing so, resolve the difference between trunk and 10.5), or just fix up the javadoc not to refer to these methods, which are not currently in 10.5.

          Kristian Waagan added a comment -

          Backport went in with revision 963326.

           I just want to point out that the way the problematic scenario is handled in 10.5 and in 10.6/trunk is different.
           10.5 will materialize the values ('((StreamStorable)dvd).loadStream()'), whereas 10.6/trunk will clone the source streams if possible ('dvd.cloneValue(false)'). This will probably not cause much trouble, as LOBs would have to be projected more than once to trigger an OOME (i.e. 'select myclob as clob1, myclob as clob2 ...').

          Kathey Marsden added a comment -

          Re-closing after back port to 10.5

          Kathey Marsden added a comment -

           Reopening for 10.5 backport.

          Kristian Waagan added a comment -

          Closing this issue, no problems with the fix have been reported.

          Dag H. Wanvik added a comment -

          resolving as fixed in 10.6.

          Dag H. Wanvik added a comment -

           Reopening due to wrong resolution; thanks for noticing, Kristian.

          Kristian Waagan added a comment -

          Dag, this issue is resolved as "Not A Problem". Is this correct?
          Also, no fix version has been set.

          Dag H. Wanvik added a comment -

          Thanks, Kristian!
          Committed as svn 908563, resolving.

          Kristian Waagan added a comment -

          +1 to commit patch 'derby-4477-useCloning'

          Note that the "length-materialization optimization" (code that was commented out earlier, and removed by the latest patch) hasn't been implemented yet. However, I think it is better placed inside the various cloneValue-methods and hope to get to it at a later time.

           Dag H. Wanvik added a comment - edited

          Uploading the patch which uses the new stream cloning referred to earlier as a coming follow-up for this issue: derby-4477-useCloning.

           It also bumps the LOB size in the lowmem suite's test cases added for this issue in ClobMemTest and BlobMemTest, which would run out of memory without the stream cloning used by this patch.

          Regressions ran ok, as well as the lowmem suite.

          Dag H. Wanvik added a comment -

          Committed *-lowmem-followup as svn 905621, regressions ran ok.

          Dag H. Wanvik added a comment -

           Uploading a patch, *-lowmem-followup, which fixes the problem with ClobMemTest; rerunning regressions.

          Dag H. Wanvik added a comment -

           The reason is that the client driver has a finalizer for the class EncodedInputStream which, when run, also closes the Reader passed in. This was introduced in 10.2; the embedded driver does not close the Reader when done with it. I filed DERBY-4531 for this difference in behavior. I'll change ClobMemTest to side-step this.

          Dag H. Wanvik added a comment -

          Hmm, could it be a finalizer for the prepared statement that closes the stream? I'll look into it.

          Knut Anders Hatlen added a comment -

          ClobMemTest.testDerby4477_3645_3646_Repro_lowmem_clob is failing intermittently in the nightly regression tests. See here:

          http://dbtg.foundry.sun.com/derby/test/Daily/jvm1.6/testing/testlog/vista/904556-suitesAll_diff.txt
          http://dbtg.foundry.sun.com/derby/test/Daily/jvm1.6/testing/testlog/vista-64/904812-suitesAll_diff.txt
          http://dbtg.foundry.sun.com/derby/test/Daily/jvm1.6/testing/testlog/solN+1/904812-suitesAll_diff.txt

          There are three different stack traces, but they all report that the LoopingAlphabetReader is closed, which is kind of odd since LoopingAlphabetReader.reopen() is called right before the failing line in all three cases.

          Dag H. Wanvik added a comment -

           Committed patch derby-4477-lowmem-2 as svn 904538. There are now two committed patches on this issue, both of which should be revisited when cloning of streams is properly handled (not materialized when large).

           Dag H. Wanvik added a comment - edited

           Uploading patch *-lowmem-2, which now uses the existing methods to verify equality between two InputStreams and between two Readers. I introduced a reopen method to LoopingAlphabetReader, since the BaseTestCase comparison method closes the reader after use and I needed to use it several times. I also moved the CLOB version of the test case to ClobMemTest to keep the pattern.
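A reopen() of this kind can be sketched as follows. This is a hypothetical looping reader, not Derby's actual LoopingAlphabetReader; the point is only how reopen() lets a consumer that closes the reader run more than once.

```java
import java.io.IOException;
import java.io.Reader;

// Hypothetical sketch of a reader with a reopen() method, so that a
// comparison helper which closes the reader after use can run twice.
public class ReopenableAlphabetReader extends Reader {
    private final int length;
    private int pos;
    private boolean closed;

    public ReopenableAlphabetReader(int length) { this.length = length; }

    public int read(char[] buf, int off, int len) throws IOException {
        if (closed) throw new IOException("reader closed");
        if (pos >= length) return -1;
        int n = Math.min(len, length - pos);
        for (int i = 0; i < n; i++, pos++) {
            buf[off + i] = (char) ('a' + pos % 26);
        }
        return n;
    }

    public void close() { closed = true; }

    // Reset the state so the reader can be consumed again.
    public void reopen() { closed = false; pos = 0; }

    public static void main(String[] args) throws IOException {
        ReopenableAlphabetReader r = new ReopenableAlphabetReader(5);
        char[] first = new char[5];
        r.read(first, 0, 5);
        r.close();
        r.reopen();            // second consumer gets the same content
        char[] second = new char[5];
        r.read(second, 0, 5);
        System.out.println(new String(first).equals(new String(second)));
    }
}
```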

          Dag H. Wanvik added a comment -

          Thanks for looking at the patch, Knut. I'll substitute the existing assert methods in a new rev.

          Knut Anders Hatlen added a comment -

          BaseTestCase already implements assertEquals() methods specialized for InputStreams and Readers, so I think the new methods assertEqualStreams() and assertEqualReaders() could be removed.

          Dag H. Wanvik added a comment -

           Uploading a new patch, derby-4477-lowmem, which adds test cases to BlobMemTest for the repros from DERBY-3645 and DERBY-3646, but with large blobs, to test that materialization is not happening. These tests should be enabled when we update the "partial" patch, once support for cloning streams without materialization becomes available. For now, the test cases use a small blob, so the test does pass. Note that BlobMemTest is not part of the regression suite, but is run separately in the lowmem suite, e.g. as "ant junit-lowmem".

          Dag H. Wanvik added a comment -

          Trying again to make HashTableResultSet fail in a similar way....
           I was able to make a stream column appear duplicated in HashJoin/HashTableResultSet, e.g. with the following query:

          select t1.b, t1.c, t2.b, t2.c from
          (select id, v, v, (select count from sys.systables) w from mytab) t1(a,b,c,d),
          (select id, v, v, (select count from sys.systables) w from mytab) t2(a,b,c,d)
          where t1.a=t2.a

           where v is a blob similar to the repro in DERBY-3646, but there is a ProjectRestrictResultSet under the HashTableResultSet which takes care of the cloning after svn 902857 (before the patch, the query fails).
           If I remove the subquery in the select list, flattening is allowed, and I see a HashExistsJoin/HashScanResultSet instead.

           I am not yet entirely convinced that a HashTableResultSet can't appear without an underlying ProjectRestrictResultSet, though it may be the case.

          Dag H. Wanvik added a comment -

          Committed patch derby-4477-partial-2 as svn 902857.

          Knut Anders Hatlen added a comment -

          I don't think the scenario I suggested will cause any problems. If the data type is VARCHAR, the value appears to be materialized in memory even in the case of overflow. If one of the long data types is used, DISTINCT queries are not allowed.

          Dag H. Wanvik added a comment -

           Uploading revision 2, derby-4477-partial-2, which changes the cloning to only happen for occurrences 2..n in the RCL and incorporates Kristian's comments (except that the map is always present), plus the repros for DERBY-3646 and DERBY-2349. Regressions ran ok.

           Dag H. Wanvik added a comment - edited

           Thanks, Knut! Yes, I was concerned with the similar code in HashTableResultSet; LOBs can't be involved in a DISTINCT, so the remaining suspects would be the long character types. I am going to do some experiments to establish whether this can be an issue. Thanks for the suggested scenario! Probably the safest thing would be to just include the cloning logic in HashTableResultSet as well.

          Knut Anders Hatlen added a comment -

          I would expect the performance impact of these checks to be negligible, so +1 from me to keeping the code simple.

          The approach looks good to me. It's not obvious after the first quick look that it's a complete solution (that is, whether it's enough to do this duplication check in PRN), but I don't have any evidence suggesting it's not. The first thing that comes to mind is what if the duplication happens in HashTableNode (the other caller of mapSourceColumns()). Could it be that that code path will be triggered by this (untested) case

          • set page size to 4k
          • create table t (x varchar(32000))
          • insert into t values (..string longer than 4k..)
          • select distinct x,x from t

          ?

          Dag H. Wanvik added a comment -

           Thanks for the comments, Kristian; I'll include those in my next rev of the patch. I don't think always creating a column map matters for performance here, since the object is allocated at compilation time. The upside is slightly simpler code at execution time (one test instead of two), although testing for membership in a boolean array is perhaps slightly more expensive than checking for a null pointer. If you like, I can reintroduce that logic.

          Kristian Waagan added a comment -

          Hi Dag,

          The current patch looks good to me. A few nits and one question:

          • typo in BLOBTest.testDerby4477Repro JavaDoc
          • typo in PRRS: "columsn"
          • typo in RCL.mapSourceColumns JavaDoc: "column" -> "columns"
          • if you make PRRS.cloneMap final and return null if there are no columns to clone (and add the check for cloneMap != null), do you think that will matter performance-wise?

           I think it's safe to not clone the first occurrence, but I don't know how much it matters. My assumption is that the clone code will only be activated for a very small percentage of queries. I would be more worried about not incurring extra cost in the normal case, where each column is referenced only once.

          Dag H. Wanvik added a comment -

           Regressions passed. I will file a new JIRA for the failure to heed close in our internal streams, add the corrected repro from DERBY-3646 to the test as well, and spin a new rev of my patch.

           Dag H. Wanvik added a comment - edited

           BinaryToRawStream, qua FilterInputStream.close, merely closes the passed-in InputStream; in our case this is a FormatIdInputStream, which in turn wraps an OverflowInputStream that takes no action on close. This explains why no error was seen initially for the DERBY-3646 repro. This is unfortunate, and I think we should override the close method somewhere, maybe in BinaryToRawStream, to disallow reading after close.
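The JDK behavior described here is easy to demonstrate without Derby: FilterInputStream.close() only delegates to the wrapped stream, and if that wrapped stream treats close() as a no-op (ByteArrayInputStream stands in for Derby's store stream below), reads keep succeeding after close:

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;

public class ReadAfterClose {
    public static void main(String[] args) throws IOException {
        ByteArrayInputStream base =
                new ByteArrayInputStream(new byte[] {1, 2, 3});
        // FilterInputStream.close() only forwards to base.close(), and
        // ByteArrayInputStream.close() is documented as a no-op.
        FilterInputStream wrapped = new FilterInputStream(base) {};
        wrapped.close();
        // No exception: the read still succeeds after close().
        System.out.println(wrapped.read());
    }
}
```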

          Dag H. Wanvik added a comment -

           The repro for DERBY-3646 fails because it's coded wrong: per JDBC, the stream should be fully consumed before a new get* method is called, cf. http://java.sun.com/j2se/1.5.0/docs/api/java/sql/ResultSet.html#getBinaryStream(int).

           When I fix that error in the repro, both derby-4477-partial and derby-4477-0a-prototype (with the 64K limit) pass. When above the limit with derby-4477-0a-prototype, materialization is not done; instead the new copyForRead method is used. This will eventually return a wrapped stream, BinaryToRawStream, which extends java.io.FilterInputStream. Strangely, FilterInputStream does not give an error on read even after it has been closed (by EmbedResultSet#closeCurrentStream), which is why the repro passed with the original derby-4477-0a-prototype: the wrong usage in the repro is not caught.

          Dag H. Wanvik added a comment -

          I notice that the test repro for DERBY-3646 still fails with my version of the patch. The derby-4477-0a-prototype patch makes that repro pass, but only because the limit for materialization is set low: increase it from 32K to 64K, and the repro fails with that patch as well, so there seems to be more to be done here. Or maybe I misunderstood something about the prototype patch... In my version I marked for cloning all occurrences which had duplicates in the RCL (not just occurrences 2..n, as the prototype patch does), but that does not seem to make a difference.

          Dag H. Wanvik added a comment -

          Uploading a patch, derby-4477-partial, a variant of the phase 1 patch which moves the determination of which columns potentially need cloning from execution time to compile time, and adds the repro test from DERBY-3645 to BLOBTest, which passes with the patch.

          This partial patch does not address cloning at the store level (it always materializes), so the code needs to be updated (I have put in a FIXME comment) when that work is finished. Running regressions.

          Kristian Waagan added a comment -

          Attached a prototype patch 0a.

          The logic is isolated in ProjectRestrictResultSet; the rest of the patch is code from the patch attached to DERBY-3650.
          Currently, the prototype tries to implement something along the lines of phase 2.
          I'm running the regression tests, and tomorrow I will post a performance test and some results.

          I'd like some feedback on how we want Derby to behave:

          • what should the clone stream threshold be?
          • is it okay to always materialize [[LONG] VAR]CHAR [FOR BIT DATA]?
          • the DataValueDescriptor.getLengthIfKnow was something I added just before posting the patch, to optimize where possible. Keep it or ditch it? Useful in other scenarios?
            (as a side note, getCharLengthIfKnown was added to InternalClob)
          • the second check could be done in the constructor if there was a way to reliably find the types of the relevant columns (I was only able to find the descriptors for the top-level result set, but then I don't know the available structures very well)

          Finally, with the exception of the code in the constructor, the added code should only be activated if the user selects a "store streamable" [1] column more than once. I don't know if that is very common, and I guess the most important issue is that Derby is able to handle it without crashing.

          [1] The store streamable types are the various CHAR and CHAR FOR BIT DATA types, BLOB, and CLOB.
          (Hmm, what about XML?)
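          The threshold question above (materializing clone below some size, stream clone above it) can be sketched roughly as below. The class, method, and the 64K figure are illustrative, taken from the discussion in this thread; none of this is Derby's actual API:

```java
// Hypothetical sketch of the clone-selection policy discussed above:
// small values are materialized (decoded once, cloned in memory), while
// large or unknown-length values get a stream clone to bound memory use.
public class ClonePolicy {
    static final long STREAM_CLONE_THRESHOLD = 64 * 1024; // e.g. a 64K limit

    enum CloneKind { MATERIALIZE, STREAM }

    // lengthIfKnown < 0 means the length is not known up front
    // (cf. the getLengthIfKnow idea mentioned above).
    static CloneKind chooseClone(long lengthIfKnown) {
        if (lengthIfKnown >= 0 && lengthIfKnown <= STREAM_CLONE_THRESHOLD) {
            // Small value: materializing decodes the raw data only once,
            // which the performance observations above favor.
            return CloneKind.MATERIALIZE;
        }
        // Large or unknown: a stream clone avoids out-of-memory risk and
        // avoids decoding values the client never fully accesses.
        return CloneKind.STREAM;
    }
}
```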

          Kristian Waagan added a comment -

          Removed unused imports with revision 890789.


            People

            • Assignee:
              Dag H. Wanvik
              Reporter:
              Kristian Waagan