DERBY-4544

Referencing streaming CLOBs in (some) generated column clauses fails

    Details

    • Urgency:
      Low
    • Issue & fix info:
      Patch Available, Repro attached
    • Bug behavior facts:
      Data corruption

      Description

      Referencing a CLOB represented as a stream in a generated column can lead to data corruption or cause the query to fail.

      For instance, with 10.5:
      create table t (id int, myclob clob, clen generated always as (length(myclob)));

      1. Insert CLOB using the streaming APIs (setCharacterStream).
        This fails with the exception 'java.lang.ClassCastException: org.apache.derby.iapi.types.ReaderToUTF8Stream cannot be cast to org.apache.derby.iapi.types.Resetable'.

      On trunk the same query results in data corruption, and this isn't detected before the value is read back from store.
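      The steps above can be sketched in JDBC roughly as follows. This is a minimal sketch, not the attached Test_4544.java: the class name StreamingClobInsert, the helper filler, and the choice of the length-taking setCharacterStream overload are assumptions made for illustration. Only the table definition and the use of setCharacterStream come from the description, and the caller is assumed to supply an open Derby Connection.

```java
import java.io.Reader;
import java.io.StringReader;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Arrays;

/** Hypothetical helper for illustration; not the repro attached to this issue. */
final class StreamingClobInsert {

    /** Builds a string of n copies of c, used as CLOB content. */
    static String filler(char c, int n) {
        char[] chars = new char[n];
        Arrays.fill(chars, c);
        return new String(chars);
    }

    /** Creates the table from the description, including the generated column. */
    static void createTable(Connection conn) throws SQLException {
        try (Statement s = conn.createStatement()) {
            s.execute("create table t (id int, myclob clob, "
                    + "clen generated always as (length(myclob)))");
        }
    }

    /** Inserts one row, handing the CLOB to the driver as a character stream. */
    static void insertStreaming(Connection conn, int id, String content)
            throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                "insert into t (id, myclob) values (?, ?)")) {
            ps.setInt(1, id);
            Reader reader = new StringReader(content);
            // Streaming API: the value reaches the engine as a stream, not a String.
            ps.setCharacterStream(2, reader, content.length());
            ps.executeUpdate();
        }
    }
}
```

      A caller might run createTable(conn) once and then insertStreaming(conn, 4, StreamingClobInsert.filler('a', 65536)). Which symptom appears, if any, depends on the Derby version in use.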

      Workaround:
      Don't use the streaming APIs when using CLOBs in generated columns. This increases the memory footprint, and may not be feasible for large CLOBs.

      FYI, BLOB deals with this by materializing the value, which is effectively equivalent to using the workaround mentioned above.
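      One way to apply the workaround is to read the character data into a String on the client and bind it with setString. The sketch below assumes the table t from the description; the class name MaterializedClobInsert and its methods are hypothetical. As noted above, the whole value is held in memory, so this is only reasonable for CLOBs of moderate size.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/** Hypothetical helper illustrating the workaround; not part of Derby. */
final class MaterializedClobInsert {

    /** Reads the whole Reader into memory and returns it as a String. */
    static String readFully(Reader source) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[8192];
        try {
            int n;
            while ((n = source.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    /** Inserts one row, binding the CLOB as a String instead of a stream. */
    static void insertMaterialized(Connection conn, int id, Reader source)
            throws SQLException {
        String content = readFully(source);
        try (PreparedStatement ps = conn.prepareStatement(
                "insert into t (id, myclob) values (?, ?)")) {
            ps.setInt(1, id);
            ps.setString(2, content); // non-streaming API
            ps.executeUpdate();
        }
    }
}
```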

      1. derby-4544-01-ab-shortCircuitLengthOptimization.diff
        5 kB
        Rick Hillegas
      2. derby-4544-01-ac-shortCircuitLengthOptimization.diff
        9 kB
        Rick Hillegas
      3. Test_4544.java
        8 kB
        Rick Hillegas
      4. Test_4544.java
        4 kB
        Rick Hillegas
      5. Test_4544.java
        3 kB
        Rick Hillegas

          Activity

          Kristian Waagan added a comment -

          One of the messages one might see due to this error:
          Caused by: java.sql.SQLException: Java exception: 'ASSERT FAILED Less than one byte per char, CharacterStreamBuiler@31380681:bufferable=false, isPositionAware=false, curBytePos=0, curCharPos=0, dataOffset=2, byteLength=24932, charLength=65534, maxCharLength=9223372036854775807, stream=class org.apache.derby.iapi.services.io.FormatIdInputStream: org.apache.derby.shared.common.sanity.AssertFailure'.

          In insane (non-debug) builds the symptoms vary, depending on the contents of the value you insert. You may get an IOException when you read back the value, or you can get wrong results.
          Here's an example where 65536 chars have been inserted as a stream:
          id=4, clen=65536, length(myclob)=65534, String(myclob).length()=24930

          clen is the generated column, length(myclob) is obtained after the insert as select length(myclob), and the last number is obtained as rs.getString(1).length().
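          The three numbers can be collected with a check along these lines. This is a sketch under assumptions: the class ClobLengthCheck and its methods are hypothetical, and it reads all values in a single query rather than the separate queries the comment implies. On an affected build, rs.next() or rs.getString() may throw instead of returning a wrong value.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

/** Hypothetical helper that compares the three length values per row. */
final class ClobLengthCheck {

    /** Formats one row in the same layout as the example above. */
    static String format(int id, int clen, int sqlLength, int stringLength) {
        return "id=" + id + ", clen=" + clen
                + ", length(myclob)=" + sqlLength
                + ", String(myclob).length()=" + stringLength;
    }

    /** True when all three lengths agree, as they should for an intact CLOB. */
    static boolean consistent(int clen, int sqlLength, int stringLength) {
        return clen == sqlLength && sqlLength == stringLength;
    }

    /** Returns one formatted line per row of table t. */
    static String check(Connection conn) throws SQLException {
        StringBuilder report = new StringBuilder();
        try (Statement s = conn.createStatement();
             ResultSet rs = s.executeQuery(
                     "select id, clen, length(myclob), myclob from t")) {
            while (rs.next()) {
                int id = rs.getInt(1);
                int clen = rs.getInt(2);            // generated column
                int sqlLength = rs.getInt(3);       // length() evaluated at read time
                String value = rs.getString(4);     // materialized on the client
                int stringLength = (value == null) ? 0 : value.length();
                report.append(format(id, clen, sqlLength, stringLength)).append('\n');
            }
        }
        return report.toString();
    }
}
```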

          Rick Hillegas added a comment -

          Attaching Test_4544.java. This class demonstrates a problem when you follow the steps described in this issue. Since Kristian logged this issue, the symptoms of the problem seem to have changed. The repro shows that Derby raises the following exception when you try to initially position a ResultSet for reading the corrupt Clob:

          Exception in thread "main" java.sql.SQLException: Restore of a serializable or SQLData object of class , attempted to read more data than was originally stored
          at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(SQLExceptionFactory40.java:98)
          at org.apache.derby.impl.jdbc.Util.newEmbedSQLException(Util.java:142)
          at org.apache.derby.impl.jdbc.Util.seeNextException(Util.java:278)
          at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(TransactionResourceImpl.java:403)
          at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(TransactionResourceImpl.java:348)
          at org.apache.derby.impl.jdbc.EmbedConnection.handleException(EmbedConnection.java:2290)
          at org.apache.derby.impl.jdbc.ConnectionChild.handleException(ConnectionChild.java:82)
          at org.apache.derby.impl.jdbc.EmbedResultSet.closeOnTransactionError(EmbedResultSet.java:4405)
          at org.apache.derby.impl.jdbc.EmbedResultSet.movePosition(EmbedResultSet.java:470)
          at org.apache.derby.impl.jdbc.EmbedResultSet.next(EmbedResultSet.java:374)
          at Test_4544.read(Test_4544.java:41)
          at Test_4544.main(Test_4544.java:18)

          Rick Hillegas added a comment -

          Attaching a revised version of the repro. This fixes a bug in the repro, allowing it to run under the network driver.

          Rick Hillegas added a comment -

          Attaching derby-4544-01-ab-shortCircuitLengthOptimization.diff. This patch makes the repro behave correctly. I will run regression tests. More tests need to be written to verify that other generation expressions work on streaming Clobs.

          The SQLClob.getLength() method is a little tricky. It ends up calling getStreamWithDescriptor(). That method has a prominent comment saying that it doesn't expect to be called more than once--which happens if your generation clause invokes the length() function on a streaming Clob.

          There is no SQLBlob.getLength() method. Instead, if you call the length() function on a streaming Blob, you will get the getLength() behavior of the superclass, which materializes the Blob.

          The fix is to make SQLClob.getLength() first check whether it is operating on a non-resetable stream. If the stream is not resetable, then the assumptions of getStreamWithDescriptor() will be violated. For non-resetable streams, SQLClob.getLength() will just do what Blobs do, i.e., defer to the getLength() method in the superclass which materializes the Clob.

          This will be inefficient--but that is better than causing a data corruption.
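          The idea behind the fix can be illustrated with a simplified model. This is not Derby's SQLClob code: the class ClobValueModel, its fields, and the boolean resetable flag are assumptions made for illustration, and a plain java.io.Reader stands in for Derby's internal stream types. The point is only that a length request on a stream that cannot be reset must materialize the value, so that the same data is still available when the value is stored.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

/** Simplified stand-in for a CLOB value; not Derby's SQLClob. */
final class ClobValueModel {
    private Reader stream;            // non-null while the value is still a stream
    private String materialized;      // non-null once the value is held in memory
    private final boolean resetable;  // whether the stream can be rewound

    ClobValueModel(Reader stream, boolean resetable) {
        this.stream = stream;
        this.resetable = resetable;
    }

    /** Reads the stream into memory once; later calls are no-ops. */
    private void materialize() {
        if (materialized != null) {
            return;
        }
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[8192];
        try {
            int n;
            while ((n = stream.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        materialized = sb.toString();
        stream = null;
    }

    /** Returns the length in characters without losing the content. */
    long getLength() {
        if (materialized != null) {
            return materialized.length();
        }
        if (!resetable) {
            // Draining a non-resetable stream would leave nothing to store,
            // so fall back to materializing the value.
            materialize();
            return materialized.length();
        }
        try {
            long count = 0;
            char[] buf = new char[8192];
            int n;
            while ((n = stream.read(buf)) != -1) {
                count += n;
            }
            stream.reset(); // rewind so the content can be read again
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the full content, materializing it if necessary. */
    String getString() {
        materialize();
        return materialized;
    }
}
```

          In this model, calling getLength() followed by getString() returns the complete content in both cases; the non-resetable case simply pays for it in memory.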

          Touches the following files:

          ----------------

          M java/engine/org/apache/derby/iapi/types/SQLClob.java

          1 line fix to force materialization if getLength() is called on a non-resetable stream.

          ----------------

          A java/testing/org/apache/derbyTesting/functionTests/tests/jdbcapi/DummyReader.java
          M java/testing/org/apache/derbyTesting/functionTests/tests/jdbcapi/ClobTest.java

          Test case demonstrating the fix.

          Rick Hillegas added a comment -

          Attaching a new version of the repro. This experiments with additional generation expressions. The additional expressions test the builtin substr, locate, upper, and trim functions as well as a user-written function. In all of these cases, Derby behaves correctly. It appears to me that length() was the only builtin function which caused this bug. For other Derby builtin functions (and user-written functions), Derby materializes the Clob. This gives me some assurance that the solution, although inefficient, makes Derby behave in a regular way. I will incorporate these additional tests into the new test case.

          Rick Hillegas added a comment -

          Tests passed cleanly for me on derby-4544-01-ab-shortCircuitLengthOptimization.diff.

          Rick Hillegas added a comment -

          Attaching a new version of the patch, derby-4544-01-ac-shortCircuitLengthOptimization.diff. This includes the extra test cases from the revised repro. Committed at subversion revision 1091169.

          Rick Hillegas added a comment -

          Ported 1091169 from trunk to 10.8 branch at subversion revision 1091172.

          Rick Hillegas added a comment -

          Resolving this issue. We may want to figure out how to make the builtin operators (including length()) more efficient for this use case. However, I see that as a separate issue.

          Kristian Waagan added a comment -

          Reopening to assign to Rick.

          Kristian Waagan added a comment -

          Closing issue.

          Kathey Marsden added a comment -

          Reopen for 10.5 backport consideration. If working on the backport for this issue, temporarily assign yourself and add a comment that you are working on it. When finished, reresolve with the new fix versions or label backport_reject_10_x as appropriate.

          Mike Matrigali added a comment -

          Temporarily assigning to myself to do backports.

          Mike Matrigali added a comment -

          Backported to 10.7, 10.6, and 10.5. I don't plan on any more backports of this issue at this time. Resetting original owner.

          Knut Anders Hatlen added a comment -

          [bulk update] Close all resolved issues that haven't been updated for more than one year.


            People

            • Assignee:
              Rick Hillegas
              Reporter:
              Kristian Waagan