Derby
  1. Derby
  2. DERBY-1599

Clob.getSubString() throws NullPointerException when created by updatable result set

    Details

    • Type: Bug Bug
    • Status: Open
    • Priority: Minor Minor
    • Resolution: Unresolved
    • Affects Version/s: 10.1.3.1, 10.2.1.6
    • Fix Version/s: None
    • Component/s: JDBC, Network Client
    • Urgency:
      Normal
    • Issue & fix info:
      High Value Fix, Repro attached

      Description

      If you create a clob value with one of the ResultSet.updateXXX methods that take a stream or a reader, and retrieve that value with ResultSet.getClob(), a NullPointerException will be thrown when getSubString() is called on the returned Clob object. This happens with the network client driver, and it has been observed on Derby 10.1.3.1 and trunk.

      Exception in thread "main" java.lang.NullPointerException
      at org.apache.derby.client.am.Clob.getSubStringX(Clob.java:229)
      at org.apache.derby.client.am.Clob.getSubString(Clob.java:210)
      at Repro.main(Repro.java:24)

      1. Repro.java
        1.0 kB
        Knut Anders Hatlen

        Issue Links

          Activity

          Hide
          Knut Anders Hatlen added a comment -

          Triaged for 10.5.2.

          Show
          Knut Anders Hatlen added a comment - Triaged for 10.5.2.
          Hide
          Kristian Waagan added a comment -

          Thinking about how we can solve this, I think the method is straight forward with the embedded driver; read the user stream and save the data to a temporary location, then create a stream from the temporary copy and give it to the user. Doing this will require some changes.
          With the client driver there are more options;
          a) Transfer the content of the stream to the server and use the same mechanism as for the embedded driver.
          b) Store the content on disk on the client. Works, but where should the client store the data? The data volume would be up to ~6 GB (2G characters, 3 bytes each).
          c) Store the content in memory on the client. This will fail for large enough streams.
          d) Hybrid solution (optimizations; c+a, c+b or maybe even c+b+a).

          Of course, before starting to implement a solution we should agree on what the behavior shall be.

          Show
          Kristian Waagan added a comment - Thinking about how we can solve this, I think the method is straight forward with the embedded driver; read the user stream and save the data to a temporary location, then create a stream from the temporary copy and give it to the user. Doing this will require some changes. With the client driver there are more options; a) Transfer the content of the stream to the server and use the same mechanism as for the embedded driver. b) Store the content on disk on the client. Works, but where should the client store the data? The data volume would be up to ~6 GB (2G characters, 3 bytes each). c) Store the content in memory on the client. This will fail for large enough streams. d) Hybrid solution (optimizations; c+a, c+b or maybe even c+b+a). Of course, before starting to implement a solution we should agree on what the behavior shall be.
          Hide
          Daniel John Debrunner added a comment -

          Knut Anders> By the way, what's the correct return value from a getter right after a column has been modified with one of the update methods? The old value or the updated value? Currently, we return the new value, but perhaps we should return the old value until we have called updateRow()?

          From a literal reading of the javadoc I would say the new value, this is because the updateXXX methods say update the current row of the ResultSet (not the database) and the getXXX methods retrieve column values from the current row of the ResultSet (not the database).

          I do think that updateCharacterStream() followed by getCharacterStream() (or any getXXX) should work. The logical behaviour of this is:

          updateCharacterStream() - set column value in current row in ResultSet to SQL value represented by the contents of the character stream.
          getCharacterStream() - convert SQL column value to Java character stream and return it to the user.

          updateRow() - send updated SQL values to database and update row in the database

          Now taking the character stream from the updateCharacterStream and sending in directly to the database in updateRow is an optimization that Derby can & should take, but when there is a getCharacterStream() that optimization can no longer be taken (though in somes cases it could be I think).

          Show
          Daniel John Debrunner added a comment - Knut Anders> By the way, what's the correct return value from a getter right after a column has been modified with one of the update methods? The old value or the updated value? Currently, we return the new value, but perhaps we should return the old value until we have called updateRow()? From a literal reading of the javadoc I would say the new value, this is because the updateXXX methods say update the current row of the ResultSet (not the database) and the getXXX methods retrieve column values from the current row of the ResultSet (not the database). I do think that updateCharacterStream() followed by getCharacterStream() (or any getXXX) should work. The logical behaviour of this is: updateCharacterStream() - set column value in current row in ResultSet to SQL value represented by the contents of the character stream. getCharacterStream() - convert SQL column value to Java character stream and return it to the user. updateRow() - send updated SQL values to database and update row in the database Now taking the character stream from the updateCharacterStream and sending in directly to the database in updateRow is an optimization that Derby can & should take, but when there is a getCharacterStream() that optimization can no longer be taken (though in somes cases it could be I think).
          Hide
          Knut Anders Hatlen added a comment -

          What I meant was that I'm fine with not fixing the combinations that don't work (that is, update

          {Clob,Blob,*Stream} + get{Clob,Blob,*Stream}

          with no call to updateRow() in between) if we instead make them fail gracefully ("reading updated blob/clob column is not supported until updateRow() has been called"). I don't see any compelling reasons for restricting the combinations that currently work (get + update). Of course, if we could make all combinations work, that would be the best solution.

          By the way, what's the correct return value from a getter right after a column has been modified with one of the update methods? The old value or the updated value? Currently, we return the new value, but perhaps we should return the old value until we have called updateRow()?

          Show
          Knut Anders Hatlen added a comment - What I meant was that I'm fine with not fixing the combinations that don't work (that is, update {Clob,Blob,*Stream} + get{Clob,Blob,*Stream} with no call to updateRow() in between) if we instead make them fail gracefully ("reading updated blob/clob column is not supported until updateRow() has been called"). I don't see any compelling reasons for restricting the combinations that currently work (get + update). Of course, if we could make all combinations work, that would be the best solution. By the way, what's the correct return value from a getter right after a column has been modified with one of the update methods? The old value or the updated value? Currently, we return the new value, but perhaps we should return the old value until we have called updateRow()?
          Hide
          Kristian Waagan added a comment -

          It is not clear to me why getCharacterStream and updateCharacterStream count as two read operations on a column.
          From a users point of view, I'm updating the column, reading it back (for some reason...) and then finally performing the update (updateRow, not commit). This fails in trunk for stream, but I assume it works for primitives?

          The other way around (equals what Dan asked in the previous comment), I read the current column value, supply new data and tell Derby to update the row. I believe this works in trunk?

          To be able to implement something that is consistent and robust, I think we have to investigate and specify what exactly is allowed.

          Show
          Kristian Waagan added a comment - It is not clear to me why getCharacterStream and updateCharacterStream count as two read operations on a column. From a users point of view, I'm updating the column, reading it back (for some reason...) and then finally performing the update (updateRow, not commit). This fails in trunk for stream, but I assume it works for primitives? The other way around (equals what Dan asked in the previous comment), I read the current column value, supply new data and tell Derby to update the row. I believe this works in trunk? To be able to implement something that is consistent and robust, I think we have to investigate and specify what exactly is allowed.
          Hide
          Daniel John Debrunner added a comment -

          What exactly would this restriction be? Can I call getCharacterStream() to view the contents and then updateCharacterStream() to update them (based upon what I read)?

          Show
          Daniel John Debrunner added a comment - What exactly would this restriction be? Can I call getCharacterStream() to view the contents and then updateCharacterStream() to update them (based upon what I read)?
          Hide
          Knut Anders Hatlen added a comment -

          I'm fine with that restriction as long as we get an SQLException with a clear message instead of NullPointerException/EOFException/AssertFailure.

          Show
          Knut Anders Hatlen added a comment - I'm fine with that restriction as long as we get an SQLException with a clear message instead of NullPointerException/EOFException/AssertFailure.
          Hide
          Kathey Marsden added a comment -

          Here is the snippet from the ResultSet javadoc that says each column should be read only once.

          "The ResultSet interface provides getter methods (getBoolean, getLong, and so on) for retrieving column values from the current row. Values can be retrieved using either the index number of the column or the name of the column. In general, using the column index will be more efficient. Columns are numbered from 1. For maximum portability, result set columns within each row should be read in left-to-right order, and each column should be read only once."

          Show
          Kathey Marsden added a comment - Here is the snippet from the ResultSet javadoc that says each column should be read only once. "The ResultSet interface provides getter methods (getBoolean, getLong, and so on) for retrieving column values from the current row. Values can be retrieved using either the index number of the column or the name of the column. In general, using the column index will be more efficient. Columns are numbered from 1. For maximum portability, result set columns within each row should be read in left-to-right order, and each column should be read only once."
          Hide
          Kathey Marsden added a comment -

          For DERBY-721 we decided that users are not allowed to call getCharacterStream twice on the same column for the same row. Should it not then follow that you can't call both getCharacterStream and updateCharacterStream?

          Show
          Kathey Marsden added a comment - For DERBY-721 we decided that users are not allowed to call getCharacterStream twice on the same column for the same row. Should it not then follow that you can't call both getCharacterStream and updateCharacterStream?
          Hide
          Kristian Waagan added a comment -

          This looks like a programming error somewhere in the JDBC layer. A stream of an unexpected type is seen in EmbedClob init, which indicates an odd path through the system. My comments are regarding the embedded driver.

          I think there are numerous scenarios to consider here. As far as I can tell, the stream data will be materialized if you call for instance rs.getString(). Fair enough, if you ask for lots of data in memory, Derby just have to do as told. However, the following scenario fails and looks very much like a bug:
          a) Call rs.updateCharacterStream() with a stream.
          b) Call rs.getCharacterStream() and drain the stream.
          c) Call rs.updateRow().

          Step c will fail (only excerpt below):
          Caused by: java.io.EOFException: Stream has already been read and end-of-file reached and cannot be re-used.
          at org.apache.derby.iapi.types.ReaderToUTF8Stream.read(ReaderToUTF8Stream.java:151)
          at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:320)
          at org.apache.derby.iapi.types.SQLChar.readExternal(SQLChar.java:649)
          at org.apache.derby.iapi.types.SQLChar.getString(SQLChar.java:363)
          at org.apache.derby.iapi.types.SQLChar.setFrom(SQLChar.java:1229)
          at org.apache.derby.iapi.types.DataType.setValue(DataType.java:490)
          at org.apache.derby.impl.jdbc.EmbedResultSet.updateRow(EmbedResultSet.java:3714)

          The problem here seems to be that Derby just passes the source stream directly out to the user.
          If we want this scenario to work, we must somehow store the data temporarily. Since we don't know the size, we must also be prepared to store it on disk. I believe we already have a solution for this, where it is kept in memory until it reaches a threshold and then it is written to disk. I have not checked if the code can be used as-is, or if it must be customized.

          The case with EmbedClob.init() is basically the same, but is has an assert that triggers.

          Has anyone consulted the JDBC spec on this issue?
          What is the defined behavior, if any?

          Show
          Kristian Waagan added a comment - This looks like a programming error somewhere in the JDBC layer. A stream of an unexpected type is seen in EmbedClob init, which indicates an odd path through the system. My comments are regarding the embedded driver. I think there are numerous scenarios to consider here. As far as I can tell, the stream data will be materialized if you call for instance rs.getString(). Fair enough, if you ask for lots of data in memory, Derby just have to do as told. However, the following scenario fails and looks very much like a bug: a) Call rs.updateCharacterStream() with a stream. b) Call rs.getCharacterStream() and drain the stream. c) Call rs.updateRow(). Step c will fail (only excerpt below): Caused by: java.io.EOFException: Stream has already been read and end-of-file reached and cannot be re-used. at org.apache.derby.iapi.types.ReaderToUTF8Stream.read(ReaderToUTF8Stream.java:151) at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:320) at org.apache.derby.iapi.types.SQLChar.readExternal(SQLChar.java:649) at org.apache.derby.iapi.types.SQLChar.getString(SQLChar.java:363) at org.apache.derby.iapi.types.SQLChar.setFrom(SQLChar.java:1229) at org.apache.derby.iapi.types.DataType.setValue(DataType.java:490) at org.apache.derby.impl.jdbc.EmbedResultSet.updateRow(EmbedResultSet.java:3714) The problem here seems to be that Derby just passes the source stream directly out to the user. If we want this scenario to work, we must somehow store the data temporarily. Since we don't know the size, we must also be prepared to store it on disk. I believe we already have a solution for this, where it is kept in memory until it reaches a threshold and then it is written to disk. I have not checked if the code can be used as-is, or if it must be customized. The case with EmbedClob.init() is basically the same, but is has an assert that triggers. Has anyone consulted the JDBC spec on this issue? What is the defined behavior, if any?
          Hide
          Mamta A. Satoor added a comment -

          I spent some time on this Jira entry and here is what I found.

          When the sample program does rs.updateCharacterStream() on a Clob column, we generate a derby.client.am.Clob object with it's dataType_ as CHARACTER_STREAM
          Thread [main] (Suspended)
          Clob.<init>(Agent, Reader, int) line: 170
          CrossConverters.setObject(int, Reader, int) line: 651
          NetResultSet(ResultSet).updateCharacterStream(int, Reader, int) line: 3066
          DERBY_1599_Repro.main(String[]) line: 56
          In other words, the resultant derby.client.am.Clob object has a character stream associated with it but it's string_ variable is NULL.

          Later, when the sample program retrieve the Clob object using
          Clob clob = rs.getClob(1);
          we get the Clob object with character stream associated with it but no direct string value. The return of Clob object for the call above is shown in the following call stack
          Thread [main] (Suspended)
          NetResultSet(ResultSet).getClob(int) line: 1256
          DERBY_1599_Repro.main(String[]) line: 57

          The problem happens for the next call in the user program when we try to use substring, ie
          String string = clob.getSubString(1, (int) clob.length());
          This is implemented in
          derby.client.am.Clob.getSubStringX(long, int) line: 326
          and the code for Clob.getSubStringX in derby network client looks as follows
          private String getSubStringX(long pos, int length) throws SqlException
          {
          checkForClosedConnection();
          // actual length is the lesser of the length requested
          // and the number of characters available from pos to the end
          long actualLength = Math.min(this.sqlLength() - pos + 1, (long) length);
          //Check to see if the Clob object is locator enabled.
          if (isLocator())

          { //The Clob object is locator enabled. Hence call the stored //procedure CLOBGETLENGTH to determine the length of the Clob. return agent_.connection_.locatorProcedureCall() .clobGetSubString(locator_, pos, (int)actualLength); }

          else

          { //The Clob object is not locator enabled. return string_.substring ((int) pos - 1, (int) (pos - 1 + actualLength)); }

          }

          Since this Clob object is not locator enabled, the code control goes to else in the code above and since string_ is NULL, we end up getting null pointer exception.

          To fix the problem, should we be checking if string_ is null, and if yes, then check what stream is associated with the Clob object and then materialize the string_ from the stream? Would like to know if anyone has any thoughts on this.

          BTW, I tried modifying the repro program so that we try to do
          rs.updateCharacterStream(1, r, 3);
          rs.getCharacterStream(1);
          but that didn't go very well either. It resulted in following exception
          $ java org.apache.derbyTesting.functionTests.tests.lang.DERBY_1599_Repro
          Exception in thread "main" java.sql.SQLException: An attempt was made to put a data value of type 'java.sql.Clob' into a data value of type 'CHAR'.
          at org.apache.derby.client.am.SQLExceptionFactory.getSQLException(SQLExceptionFactory.java:46)
          at org.apache.derby.client.am.SqlException.getSQLException(SqlException.java:362)
          at org.apache.derby.client.am.ResultSet.getCharacterStream(ResultSet.java:1211)
          at org.apache.derbyTesting.functionTests.tests.lang.DERBY_1599_Repro.main(DERBY_1599_Repro.java:57)
          Caused by: org.apache.derby.client.am.SqlException: An attempt was made to put a data value of type 'java.sql.Clob' into a data value of type 'CHAR'.
          at org.apache.derby.client.am.CrossConverters.setObject(CrossConverters.java:810)
          at org.apache.derby.client.am.CrossConverters.setObject(CrossConverters.java:846)
          at org.apache.derby.client.am.ResultSet.getCharacterStream(ResultSet.java:1198)
          ... 1 more

          Show
          Mamta A. Satoor added a comment - I spent some time on this Jira entry and here is what I found. When the sample program does rs.updateCharacterStream() on a Clob column, we generate a derby.client.am.Clob object with it's dataType_ as CHARACTER_STREAM Thread [main] (Suspended) Clob.<init>(Agent, Reader, int) line: 170 CrossConverters.setObject(int, Reader, int) line: 651 NetResultSet(ResultSet).updateCharacterStream(int, Reader, int) line: 3066 DERBY_1599_Repro.main(String[]) line: 56 In other words, the resultant derby.client.am.Clob object has a character stream associated with it but it's string_ variable is NULL. Later, when the sample program retrieve the Clob object using Clob clob = rs.getClob(1); we get the Clob object with character stream associated with it but no direct string value. The return of Clob object for the call above is shown in the following call stack Thread [main] (Suspended) NetResultSet(ResultSet).getClob(int) line: 1256 DERBY_1599_Repro.main(String[]) line: 57 The problem happens for the next call in the user program when we try to use substring, ie String string = clob.getSubString(1, (int) clob.length()); This is implemented in derby.client.am.Clob.getSubStringX(long, int) line: 326 and the code for Clob.getSubStringX in derby network client looks as follows private String getSubStringX(long pos, int length) throws SqlException { checkForClosedConnection(); // actual length is the lesser of the length requested // and the number of characters available from pos to the end long actualLength = Math.min(this.sqlLength() - pos + 1, (long) length); //Check to see if the Clob object is locator enabled. if (isLocator()) { //The Clob object is locator enabled. Hence call the stored //procedure CLOBGETLENGTH to determine the length of the Clob. return agent_.connection_.locatorProcedureCall() .clobGetSubString(locator_, pos, (int)actualLength); } else { //The Clob object is not locator enabled. return string_.substring ((int) pos - 1, (int) (pos - 1 + actualLength)); } } Since this Clob object is not locator enabled, the code control goes to else in the code above and since string_ is NULL, we end up getting null pointer exception. To fix the problem, should we be checking if string_ is null, and if yes, then check what stream is associated with the Clob object and then materialize the string_ from the stream? Would like to know if anyone has any thoughts on this. BTW, I tried modifying the repro program so that we try to do rs.updateCharacterStream(1, r, 3); rs.getCharacterStream(1); but that didn't go very well either. It resulted in following exception $ java org.apache.derbyTesting.functionTests.tests.lang.DERBY_1599_Repro Exception in thread "main" java.sql.SQLException: An attempt was made to put a data value of type 'java.sql.Clob' into a data value of type 'CHAR'. at org.apache.derby.client.am.SQLExceptionFactory.getSQLException(SQLExceptionFactory.java:46) at org.apache.derby.client.am.SqlException.getSQLException(SqlException.java:362) at org.apache.derby.client.am.ResultSet.getCharacterStream(ResultSet.java:1211) at org.apache.derbyTesting.functionTests.tests.lang.DERBY_1599_Repro.main(DERBY_1599_Repro.java:57) Caused by: org.apache.derby.client.am.SqlException: An attempt was made to put a data value of type 'java.sql.Clob' into a data value of type 'CHAR'. at org.apache.derby.client.am.CrossConverters.setObject(CrossConverters.java:810) at org.apache.derby.client.am.CrossConverters.setObject(CrossConverters.java:846) at org.apache.derby.client.am.ResultSet.getCharacterStream(ResultSet.java:1198) ... 1 more
          Hide
          Mamta A. Satoor added a comment -

          I tried the attached repro in trunk in embedded mode and got error there too but not a null pointer exception

          $ java org.apache.derbyTesting.functionTests.tests.lang.DERBY_1599_Repro
          Exception in thread "main" java.sql.SQLException: Java exception: 'ASSERT FAILED: org.apache.derby.shared.common.sanity.AssertFailure'.
          at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(SQLExceptionFactory.java:45)
          at org.apache.derby.impl.jdbc.Util.newEmbedSQLException(Util.java:88)
          at org.apache.derby.impl.jdbc.Util.javaException(Util.java:245)
          at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(TransactionResourceImpl.java:403)
          at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(TransactionResourceImpl.java:346)
          at org.apache.derby.impl.jdbc.EmbedConnection.handleException(EmbedConnection.java:1574)
          at org.apache.derby.impl.jdbc.ConnectionChild.handleException(ConnectionChild.java:81)
          at org.apache.derby.impl.jdbc.EmbedResultSet.getClob(EmbedResultSet.java:4048)
          at org.apache.derbyTesting.functionTests.tests.lang.DERBY_1599_Repro.main(DERBY_1599_Repro.java:57)
          Caused by: org.apache.derby.shared.common.sanity.AssertFailure: ASSERT FAILED
          at org.apache.derby.shared.common.sanity.SanityManager.ASSERT(SanityManager.java:98)
          at org.apache.derby.impl.jdbc.EmbedClob.<init>(EmbedClob.java:144)
          at org.apache.derby.impl.jdbc.EmbedResultSet.getClob(EmbedResultSet.java:4046)
          ... 1 more

          Show
          Mamta A. Satoor added a comment - I tried the attached repro in trunk in embedded mode and got error there too but not a null pointer exception $ java org.apache.derbyTesting.functionTests.tests.lang.DERBY_1599_Repro Exception in thread "main" java.sql.SQLException: Java exception: 'ASSERT FAILED: org.apache.derby.shared.common.sanity.AssertFailure'. at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(SQLExceptionFactory.java:45) at org.apache.derby.impl.jdbc.Util.newEmbedSQLException(Util.java:88) at org.apache.derby.impl.jdbc.Util.javaException(Util.java:245) at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(TransactionResourceImpl.java:403) at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(TransactionResourceImpl.java:346) at org.apache.derby.impl.jdbc.EmbedConnection.handleException(EmbedConnection.java:1574) at org.apache.derby.impl.jdbc.ConnectionChild.handleException(ConnectionChild.java:81) at org.apache.derby.impl.jdbc.EmbedResultSet.getClob(EmbedResultSet.java:4048) at org.apache.derbyTesting.functionTests.tests.lang.DERBY_1599_Repro.main(DERBY_1599_Repro.java:57) Caused by: org.apache.derby.shared.common.sanity.AssertFailure: ASSERT FAILED at org.apache.derby.shared.common.sanity.SanityManager.ASSERT(SanityManager.java:98) at org.apache.derby.impl.jdbc.EmbedClob.<init>(EmbedClob.java:144) at org.apache.derby.impl.jdbc.EmbedResultSet.getClob(EmbedResultSet.java:4046) ... 1 more
          Hide
          Knut Anders Hatlen added a comment -

          Attaching repro.

          Show
          Knut Anders Hatlen added a comment - Attaching repro.

            People

            • Assignee:
              Unassigned
              Reporter:
              Knut Anders Hatlen
            • Votes:
              1 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:

                Development