Derby
  1. Derby
  2. DERBY-5489

getBinary() returns incorrect data after getObject() call on BLOB column

    Details

    • Urgency:
      Normal
    • Issue & fix info:
      Release Note Needed, Repro attached
    • Bug behavior facts:
      Embedded/Client difference, Wrong query result

      Description

      When ResultSet.getObject(int) is called on a BLOB column, the correct EmbedBlob object is returned. But if afterwards the ResultSet.getBytes(int) is called on the same row, the returned array contains invalid data - it is offset by 3 bytes and its size is incorrect.
      The problem only occurs when the stored BLOB is large enough to be internally represented by stream and not by array of bytes (at least ~32KiB).
      It seems that the getObject method shifts the stream position and therefore the getBytes method starts to read the data after the third byte, thus incorrectly calculating its length.

      1. releaseNote.html
        3 kB
        Kristian Waagan
      2. derby-5489-2b-fixes.diff
        20 kB
        Kristian Waagan
      3. derby-5489-2a-fixes.diff
        8 kB
        Kristian Waagan
      4. derby-5489-1b-test.diff
        17 kB
        Kristian Waagan
      5. derby-5489-1a-test.diff
        16 kB
        Kristian Waagan
      6. repro.diff
        7 kB
        Dag H. Wanvik
      7. SelectBlobBug.java
        0.9 kB
        Pawel Fronczak

        Issue Links

          Activity

          Hide
          Knut Anders Hatlen added a comment -

          [bulk update: close all resolved issues that haven't had any activity the last year]

          Show
          Knut Anders Hatlen added a comment - [bulk update: close all resolved issues that haven't had any activity the last year]
          Hide
          Dag H. Wanvik added a comment -

          I suggest you omit saying anything about that (for simplicity also).

          Show
          Dag H. Wanvik added a comment - I suggest you omit saying anything about that (for simplicity also).
          Hide
          Kristian Waagan added a comment -

          Attaching a first attempt at a release note.
          It doesn't say anything about the fact that you can invoke getBytes/getString multiple times. Should it, given that one of the purposes of a release note is to determine if an upgrade will cause your application to break?

          Show
          Kristian Waagan added a comment - Attaching a first attempt at a release note. It doesn't say anything about the fact that you can invoke getBytes/getString multiple times. Should it, given that one of the purposes of a release note is to determine if an upgrade will cause your application to break?
          Hide
          Kristian Waagan added a comment -

          Committed patch 2b to trunk with revision 1330681.

          Unless something new comes up, I don't plan more work on this issue.

          Show
          Kristian Waagan added a comment - Committed patch 2b to trunk with revision 1330681. Unless something new comes up, I don't plan more work on this issue.
          Hide
          Kristian Waagan added a comment -

          Attaching patch 2b:
          o ResultSet and EmbedResultSet:
          Adds the required checks to behave as specified (see above). Factored out method checkLOBMultiCall(int). In the client driver I added a missing check for getObject.
          o UpdatableResultSetTest:
          Made access pattern compatible with the new behavior. The test worked because it always called a getter that materialized the value first, or the value was so small that it was kept in a materialized representation.
          o jdbcapi/LobRsGetterTest:
          Renamed some methods (only actual test methods now start with "test", the private ones I renamed to "_test").
          Added some more comments, and two more tests. The tests verifies the content of LOB in various scenarios when multiple getters are invoked.
          o jdbcapi/_Suite:
          Enabled the test as part of the test suite.

          Patch ready for final review.
          Tests passed on Linux and Solaris with JDK 1.6.
          I intend to commit this patch tomorrow at the latest.

          Show
          Kristian Waagan added a comment - Attaching patch 2b: o ResultSet and EmbedResultSet: Adds the required checks to behave as specified (see above). Factored out method checkLOBMultiCall(int). In the client driver I added a missing check for getObject. o UpdatableResultSetTest: Made access pattern compatible with the new behavior. The test worked because it always called a getter that materialized the value first, or the value was so small that it was kept in a materialized representation. o jdbcapi/LobRsGetterTest: Renamed some methods (only actual test methods now start with "test", the private ones I renamed to "_test"). Added some more comments, and two more tests. The tests verifies the content of LOB in various scenarios when multiple getters are invoked. o jdbcapi/_Suite: Enabled the test as part of the test suite. Patch ready for final review. Tests passed on Linux and Solaris with JDK 1.6. I intend to commit this patch tomorrow at the latest.
          Hide
          Kristian Waagan added a comment -

          I committed the test (patch 1b) to trunk with revision 1329186.

          Thank you for your opinion, Knut Anders.
          Based on your feedback and my own opinion, I will keep the special treatment of getBytes and getString.

          Regarding the suggested refactoring, I will do so to more clearly document the special treatment. Note that the getters belong to two different result set implementations, so I still need two helper methods.

          Show
          Kristian Waagan added a comment - I committed the test (patch 1b) to trunk with revision 1329186. Thank you for your opinion, Knut Anders. Based on your feedback and my own opinion, I will keep the special treatment of getBytes and getString. Regarding the suggested refactoring, I will do so to more clearly document the special treatment. Note that the getters belong to two different result set implementations, so I still need two helper methods.
          Hide
          Knut Anders Hatlen added a comment -

          The new checks that are added to the getters look almost identical. Would it be possible to factor them out in a shared helper method?

          > Assuming everything else is ok, it must be decided if we want to keep the special behavior for getBytes and getString

          I'm fine with it either way. The advantage of not having special treatment of getBytes and getString is that the behaviour is easier to explain: LOB columns can only be accessed once. But keeping the special treatment reduces the chances of breaking existing applications, and I don't see any significant downside, so I think I'm leaning towards the approach you chose in the 2a patch.

          Show
          Knut Anders Hatlen added a comment - The new checks that are added to the getters look almost identical. Would it be possible to factor them out in a shared helper method? > Assuming everything else is ok, it must be decided if we want to keep the special behavior for getBytes and getString I'm fine with it either way. The advantage of not having special treatment of getBytes and getString is that the behaviour is easier to explain: LOB columns can only be accessed once. But keeping the special treatment reduces the chances of breaking existing applications, and I don't see any significant downside, so I think I'm leaning towards the approach you chose in the 2a patch.
          Hide
          Kristian Waagan added a comment -

          Attaching patch 2a, which implements the checks to make getBytes, getString, and getObject throw an exception when invoked on a LOB column after another getter has already has been invoked, and harmoinzes the behavior for the embedded and the client driver.

          As already noted, getBytes and getString currently behave differently than the rest. It is allowed to call those two getters repeatedly because they materialize the value.
          Examples of behavior with the patch:
          (I'm using getObject, but it could be any of the other valid getters for LOBs)
          a) OK: getBytes - getBytes
          b) FAILS: getObject - getBytes
          c) OK: getBytes - getObject
          d) FAILS: getBytes - getObject - getBytes
          e) FAILS: getBlob - getBinaryStream

          The changes to lang.UpdatableResultSetTest adjust the access pattern to be compatible with the new rules/restrictions.

          Assuming everything else is ok, it must be decided if we want to keep the special behavior for getBytes and getString (i.e. either remove the TODOs, or modify the implementation).

          I've successfully run suites.All with the patches:
          15488 tests executed
          0 errors
          0 failures

          Patch ready for review.

          Show
          Kristian Waagan added a comment - Attaching patch 2a, which implements the checks to make getBytes, getString, and getObject throw an exception when invoked on a LOB column after another getter has already has been invoked, and harmoinzes the behavior for the embedded and the client driver. As already noted, getBytes and getString currently behave differently than the rest. It is allowed to call those two getters repeatedly because they materialize the value. Examples of behavior with the patch: (I'm using getObject, but it could be any of the other valid getters for LOBs) a) OK: getBytes - getBytes b) FAILS: getObject - getBytes c) OK: getBytes - getObject d) FAILS: getBytes - getObject - getBytes e) FAILS: getBlob - getBinaryStream The changes to lang.UpdatableResultSetTest adjust the access pattern to be compatible with the new rules/restrictions. Assuming everything else is ok, it must be decided if we want to keep the special behavior for getBytes and getString (i.e. either remove the TODOs, or modify the implementation). I've successfully run suites.All with the patches: 15488 tests executed 0 errors 0 failures Patch ready for review.
          Hide
          Kristian Waagan added a comment -

          Attaching patch 1b, a slightly updated version of the test. Only naming, typos, and comments have been changed.

          Show
          Kristian Waagan added a comment - Attaching patch 1b, a slightly updated version of the test. Only naming, typos, and comments have been changed.
          Hide
          Kristian Waagan added a comment -

          Attaching a first version of a test as patch 1a.

          A few comments:
          o expects getBytes on BLOB and getString on CLOB to be supported
          o additionally, we support getString, getAsciiStream, and getCharacterStream on BLOB (note differing behavior regarding how the raw bytes are interpreted, see JDBC implementation notes on the Derby web site)
          o as an exception to the rule, multiple invocations of getString/getBytes is supported as long as no other getter has been invoked previously. This works because those getters always materialize. Do we want to keep this exception to the rule as a convenience, or should we go for consistency?
          o the behavior between embedded and client differs on several accounts

          Show
          Kristian Waagan added a comment - Attaching a first version of a test as patch 1a. A few comments: o expects getBytes on BLOB and getString on CLOB to be supported o additionally, we support getString, getAsciiStream, and getCharacterStream on BLOB (note differing behavior regarding how the raw bytes are interpreted, see JDBC implementation notes on the Derby web site) o as an exception to the rule, multiple invocations of getString/getBytes is supported as long as no other getter has been invoked previously. This works because those getters always materialize. Do we want to keep this exception to the rule as a convenience, or should we go for consistency? o the behavior between embedded and client differs on several accounts
          Hide
          Kristian Waagan added a comment -

          I've been talking to Lance Anderson, the JDBC spec lead, and the intention is to specify that getBytes is supported on BLOB columns in a future release/update of the JDBC specification. The same goes for getString on CLOB columns. The recommended way to access a BLOB is still through getBlob.

          This makes it clear that we must allow getBytes on BLOB columns, but the existing restriction we have on calling a getter only once on a LOB column will still be there.

          Knut, the answer to your question is definitely yes

          Show
          Kristian Waagan added a comment - I've been talking to Lance Anderson, the JDBC spec lead, and the intention is to specify that getBytes is supported on BLOB columns in a future release/update of the JDBC specification. The same goes for getString on CLOB columns. The recommended way to access a BLOB is still through getBlob. This makes it clear that we must allow getBytes on BLOB columns, but the existing restriction we have on calling a getter only once on a LOB column will still be there. Knut, the answer to your question is definitely yes
          Hide
          Knut Anders Hatlen added a comment -

          > The work mentioned above is probably not suitable for backporting,
          > but as an an alternative we could use the mechanism that disallows
          > multiple accesses of the same LOB/stream column on a single row to
          > improve the situation on 10.8 and earlier.

          If we take the trouble to implement this solution for 10.8 and
          earlier, couldn't we do the same for trunk? It sounds like it fixes
          the same problem, but with a much smaller compatibility impact.

          Show
          Knut Anders Hatlen added a comment - > The work mentioned above is probably not suitable for backporting, > but as an an alternative we could use the mechanism that disallows > multiple accesses of the same LOB/stream column on a single row to > improve the situation on 10.8 and earlier. If we take the trouble to implement this solution for 10.8 and earlier, couldn't we do the same for trunk? It sounds like it fixes the same problem, but with a much smaller compatibility impact.
          Hide
          Kristian Waagan added a comment -

          Thanks, Kim!
          I'll keep that in mind, but will hold off until we're sure we want to forbid those methods on LOB columns.

          Show
          Kristian Waagan added a comment - Thanks, Kim! I'll keep that in mind, but will hold off until we're sure we want to forbid those methods on LOB columns.
          Hide
          Kim Haase added a comment -

          You may also want to file a documentation issue, Kristian – the following topic mentions using getBytes with a BLOB, and getString with a CLOB:

          http://db.apache.org/derby/docs/dev/ref/rrefjdbc96386.html

          Topics elsewhere in the docs mention only the use of getBlob and getClob, fortunately.

          Show
          Kim Haase added a comment - You may also want to file a documentation issue, Kristian – the following topic mentions using getBytes with a BLOB, and getString with a CLOB: http://db.apache.org/derby/docs/dev/ref/rrefjdbc96386.html Topics elsewhere in the docs mention only the use of getBlob and getClob, fortunately.
          Hide
          Kristian Waagan added a comment -

          According to the JDBC specification, using getBytes on a BLOB column isn't allowed. This is stated in table B-6 and the following text:
          """
          TABLE B-6 Type Conversions Supported by ResultSet getter Methods
          This table shows which JDBC types may be returned by ResultSet getter methods. A
          bold X indicates the method recommended for retrieving a JDBC type. A plain x
          indicates for which JDBC types it is possible to use a getter method.
          """

          There is no x/X in the cell where getBytes and BLOB intersects.

          If this is correct, I will create a new JIRA to track the work on forbidding this in Derby since the title of this issue doesn't express what is changing very well. This change will most likely have consequences for some existing applications and will require a release note.

          The work mentioned above is probably not suitable for backporting, but as an an alternative we could use the mechanism that disallows multiple accesses of the same LOB/stream column on a single row to improve the situation on 10.8 and earlier. That means applications using only ResultSet.getBytes() on a BLOB will continue to work, whereas applications using ResultSet.get[Blob|BinaryStream|Object] plus getBytes will fail. Failing with an exception is preferable to data corruption/wrong results. Some applications may not see the data corruption/wrong results - I believe this will be the case if the BLOBs are below a given threshold (represented in-memory vs store stream). Based on Dag's comment this may also be the case for applications using the client driver (independent of the BLOB size).

          I haven't verified it, but I suspect there will be similar issues to fix for CLOB.

          Any opinions on my interpretation of the JDBC specification, and on the suggested improvements?

          Show
          Kristian Waagan added a comment - According to the JDBC specification, using getBytes on a BLOB column isn't allowed. This is stated in table B-6 and the following text: """ TABLE B-6 Type Conversions Supported by ResultSet getter Methods This table shows which JDBC types may be returned by ResultSet getter methods. A bold X indicates the method recommended for retrieving a JDBC type. A plain x indicates for which JDBC types it is possible to use a getter method. """ There is no x/X in the cell where getBytes and BLOB intersects. If this is correct, I will create a new JIRA to track the work on forbidding this in Derby since the title of this issue doesn't express what is changing very well. This change will most likely have consequences for some existing applications and will require a release note. The work mentioned above is probably not suitable for backporting, but as an an alternative we could use the mechanism that disallows multiple accesses of the same LOB/stream column on a single row to improve the situation on 10.8 and earlier. That means applications using only ResultSet.getBytes() on a BLOB will continue to work, whereas applications using ResultSet.get [Blob|BinaryStream|Object] plus getBytes will fail. Failing with an exception is preferable to data corruption/wrong results. Some applications may not see the data corruption/wrong results - I believe this will be the case if the BLOBs are below a given threshold (represented in-memory vs store stream). Based on Dag's comment this may also be the case for applications using the client driver (independent of the BLOB size). I haven't verified it, but I suspect there will be similar issues to fix for CLOB. Any opinions on my interpretation of the JDBC specification, and on the suggested improvements?
          Hide
          Dag H. Wanvik added a comment -

          Unassigning myself, not actively working on this.

          Show
          Dag H. Wanvik added a comment - Unassigning myself, not actively working on this.
          Hide
          Kristian Waagan added a comment -

          The Spring Framework issue has been logged as SPR-8810 [1].

          [1] https://jira.springsource.org/browse/SPR-8810

          Show
          Kristian Waagan added a comment - The Spring Framework issue has been logged as SPR-8810 [1] . [1] https://jira.springsource.org/browse/SPR-8810
          Hide
          Dag H. Wanvik added a comment -

          Data point: I don't see this issue with the client driver.

          Show
          Dag H. Wanvik added a comment - Data point: I don't see this issue with the client driver.
          Hide
          Dag H. Wanvik added a comment -

          I found I could reproduce the error in a simplified repro (without Spring dependencies) using the uploaded Repro.diff (diff against BLOBTest.java in the regressions suite).

          Show
          Dag H. Wanvik added a comment - I found I could reproduce the error in a simplified repro (without Spring dependencies) using the uploaded Repro.diff (diff against BLOBTest.java in the regressions suite).
          Hide
          Pawel Fronczak added a comment -

          The problem with changing the code responsible for the data retrieval is that it is part of the SpringFramework library. Since it seems that it is opposing the JDBC API, I shall file a bug report for that.

          Fortunately, there is a simple workaround - instead of using the queryForList() method, one could use the query() method and supply custom RowMapper (or RowCallbackHandler) which will read the BLOB column only once.

          Nevertheless, an exception from getBytes explaining what has happened would be very helpful. Right now it is not very easy to track down the cause of the problem.

          Show
          Pawel Fronczak added a comment - The problem with changing the code responsible for the data retrieval is that it is part of the SpringFramework library. Since it seems that it is opposing the JDBC API, I shall file a bug report for that. Fortunately, there is a simple workaround - instead of using the queryForList() method, one could use the query() method and supply custom RowMapper (or RowCallbackHandler) which will read the BLOB column only once. Nevertheless, an exception from getBytes explaining what has happened would be very helpful. Right now it is not very easy to track down the cause of the problem.
          Hide
          Kristian Waagan added a comment -

          I think we decided to forbid accessing the same column (on the same row) multiple times for streams and LOBs (I can't find the JIRA issue right now). It may be an oversight that getBytes doesn't throw an exception in this case. The decision was based partly on the following text for the Java API documentation for java.sql.ResultSet:
          "For maximum portability, result set columns within each row should be read in left-to-right order, and each column should be read only once."
          getString appears to have the same issue.

          A possible change/fix for the code above would be to cast obj to Blob and use one of the java.sql.Blob methods to obtain the bytes. One could also rewrite the code to chose the correct action based on metadata instead of the type of the object returned by getObject, I suppose.

          I have not looked at actually fixing the Derby positioning bug itself.

          Show
          Kristian Waagan added a comment - I think we decided to forbid accessing the same column (on the same row) multiple times for streams and LOBs (I can't find the JIRA issue right now). It may be an oversight that getBytes doesn't throw an exception in this case. The decision was based partly on the following text for the Java API documentation for java.sql.ResultSet: "For maximum portability, result set columns within each row should be read in left-to-right order, and each column should be read only once." getString appears to have the same issue. A possible change/fix for the code above would be to cast obj to Blob and use one of the java.sql.Blob methods to obtain the bytes. One could also rewrite the code to chose the correct action based on metadata instead of the type of the object returned by getObject, I suppose. I have not looked at actually fixing the Derby positioning bug itself.
          Hide
          Pawel Fronczak added a comment -

          I have attached a code snippet reproducing this bug.
          It requires the springframework-jdbc library to operate.
          The mentioned sequence of getObject / getBytes methods is buried inside the JdbcTemplate.queryForList method:
          Object obj = rs.getObject(index);
          (...)
          if (obj instanceof Blob)

          { obj = rs.getBytes(index); }
          Show
          Pawel Fronczak added a comment - I have attached a code snippet reproducing this bug. It requires the springframework-jdbc library to operate. The mentioned sequence of getObject / getBytes methods is buried inside the JdbcTemplate.queryForList method: Object obj = rs.getObject(index); (...) if (obj instanceof Blob) { obj = rs.getBytes(index); }

            People

            • Assignee:
              Kristian Waagan
              Reporter:
              Pawel Fronczak
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development