DERBY-3732

SQL Length function materializes BLOB into memory

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 10.3.3.0, 10.4.1.3, 10.5.1.1
    • Fix Version/s: 10.3.3.1, 10.4.2.0, 10.5.1.1
    • Component/s: SQL
    • Labels:
      None
    • Issue & fix info:
      High Value Fix

      Description

      Currently the SQL LENGTH function materializes the entire LOB into memory. In SQLBinary.getLength() we have:

          public final int getLength() throws StandardException
          {
              if (stream != null)
              {
                  if (streamValueLength != -1)
                      return streamValueLength;
              }

              return (getBytes() == null) ? 0 : getBytes().length;
          }

      which is actually doubly bad because, when there is a stream with no known length, we call getBytes() twice and materialize the value twice.
      It would be good to read the length from the stream if available and otherwise stream the value to get the length, rather than materializing it into memory.
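The direction suggested above - use a stored length when available, otherwise stream through the value - can be sketched with a hypothetical helper (this is not Derby's actual SQLBinary code, just an illustration of the bounded-memory idea):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch (not Derby's actual SQLBinary code): compute a value's
// length by draining its stream in fixed-size chunks, so memory use is bounded
// by the buffer size rather than by the size of the LOB.
public class StreamLengthSketch {

    // Returns the number of bytes remaining in the stream, consuming it.
    static long drainLength(InputStream in) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[123456]);
        System.out.println(drainLength(in)); // prints 123456
    }
}
```

Unlike getBytes().length, this never holds more than one 8 KB buffer of the value in memory at a time.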

      To reproduce, run the attached repro.
      java -Xmx16M LengthLargeLob

      It gives an out-of-memory exception:
      Caused by: java.lang.OutOfMemoryError: Java heap space
      at org.apache.derby.iapi.types.SQLBinary.readFromStream(SQLBinary.java:415)
      at org.apache.derby.iapi.types.SQLBinary.readExternal(SQLBinary.java:318)
      at org.apache.derby.iapi.types.SQLBinary.getValue(SQLBinary.java:220)
      at org.apache.derby.iapi.types.SQLBinary.getBytes(SQLBinary.java:210)
      at org.apache.derby.iapi.types.SQLBinary.getLength(SQLBinary.java:250)
      at org.apache.derby.impl.sql.execute.BaseActivation.getDB2Length(BaseActivation.java:1684)
      at org.apache.derby.exe.acf81e0010x011axa317x5db8x0000003d9dc81.e1(Unknown Source)
      at org.apache.derby.impl.services.reflect.DirectCall.invoke(ReflectGeneratedClass.java:141)
      at org.apache.derby.impl.sql.execute.ProjectRestrictResultSet.doProjection(ProjectRestrictResultSet.java:497)
      at org.apache.derby.impl.sql.execute.ProjectRestrictResultSet.getNextRowCore(ProjectRestrictResultSet.java:291)
      at org.apache.derby.impl.sql.execute.BasicNoPutResultSetImpl.getNextRow(BasicNoPutResultSetImpl.java:460)
      at org.apache.derby.impl.jdbc.EmbedResultSet.movePosition(EmbedResultSet.java:423)
      ... 2 more

      Attachments

      1. derby-3732_skip3_diff.txt
        14 kB
        Kathey Marsden
      2. derby-3732_skip2_diff.txt
        19 kB
        Kathey Marsden
      3. derby-3732_skip_diff.txt
        17 kB
        Kathey Marsden
      4. derby-3732_diff.txt
        17 kB
        Kathey Marsden
      5. derby-3732_proto_diff.txt
        2 kB
        Kathey Marsden
      6. LengthThruBlob.java
        0.9 kB
        Kathey Marsden
      7. LengthLargeLob.zip
        1 kB
        Kathey Marsden

          Activity

          Kristian Waagan added a comment -

          Of course this bug should be fixed, but as Mike suggested in another Jira issue, I think it's time to look a bit closer at the code and see if we need a more thorough rewrite. The SQLXX classes themselves come to mind, but we should also verify that there isn't a mismatch between what the SQLXX classes expect/assume and what the store is capable of. Lastly, we also have a number of LOB-specific classes on top.

          For instance, would it be wiser to more clearly separate code for LOBs from the smaller data types in the data type hierarchy?
          I don't know the history of the code, but it might be problematic if the existing code was written with small data types in mind.

          There seem to be at least two separate issues:
          a) Materialization
          b) Stream handling

          As a sub-point under (b), I'm also wondering why we don't encode the stream length in the store stream, and how hard it would be to fix this. One problem is that the space reserved for this purpose seems too limited.

          Kathey Marsden added a comment -

          One possible workaround is just to select the BLOB and call Blob.length(). Worst case, this call will go through the entire stream without materializing it. When inserting the Blob, it is better to use the setBinaryStream call that takes a length, so that the length is stored in the stream and the call to Blob.length() will be a lot faster. See the attached program LengthThruBlob. It needs Astream.java from the original repro zip file.

          Kathey Marsden added a comment -

          Looking at a possible quick fix approach as opposed to reworking the type code.

          If I insert the lob using a length, I should have a streamValueLength available. When we call getLength(), stream is set but streamValueLength is not. Below is the trace from when the stream is set. It seems to me readRecordFromArray should be able to tell the length and pass it to setStream(). Does that sound reasonable?

          SQLBlob(SQLBinary).setStream(InputStream) line: 563
          StoredPage.readRecordFromArray(Object[], int, int[], int[], ArrayInputStream, StoredRecordHeader, RecordHandle) line: 5592
          StoredPage.restoreRecordFromSlot(int, Object[], FetchDescriptor, RecordHandle, StoredRecordHeader, boolean) line: 1497
          StoredPage(BasePage).fetchFromSlot(RecordHandle, int, Object[], FetchDescriptor, boolean) line: 459
          HeapScan(GenericScanController).fetchRows(DataValueDescriptor[][], RowLocation[], BackingStoreHashtable, long, int[]) line: 759
          HeapScan.fetchNextGroup(DataValueDescriptor[][], RowLocation[]) line: 324
          BulkTableScanResultSet.reloadArray() line: 327
          BulkTableScanResultSet.getNextRowCore() line: 282
          ProjectRestrictResultSet.getNextRowCore() line: 255
          ProjectRestrictResultSet(BasicNoPutResultSetImpl).getNextRow() line: 460
          EmbedResultSet40(EmbedResultSet).movePosition(int, int, String) line: 423
          EmbedResultSet40(EmbedResultSet).next() line: 367
          LengthLargeLob.main(String[]) line: 21

          Mike Matrigali added a comment -

          > Looking at a possible quick fix approach as opposed to reworking the type code.
          >
          > If I insert the lob using a length, I should have a streamValueLength available. When we call getLength(), stream is set but streamValueLength is not. Below is the trace from when the stream is set. It seems to me readRecordFromArray should be able to tell the length and pass it to setStream(). Does that sound reasonable?

          No, I don't think this will work. Store does not really know anything about the stream; it basically just knows that it is a "long" column, which means that it was asked to store a column longer than a page. The current format for a long column does not include any sort of length encoding for the entire column: it splits the long column into multiple pieces, links them, and only maintains the length of each link. In the case of blob, the type has the length encoding in the column itself. The description of the encoding of this length is located in SQLBinary.java.

          What I think would be easiest is that the getLength() call in SQLBinary could read the first few bytes to determine the length, and then reset() the stream so that other calls would be unaffected. Also, depending on how the stream was stored, there may not be a length in the stream, so getLength() would have to read the whole stream and then again do the reset. The question is whether this reset is going to mess anything else up. Doing it this way would only cause overhead when getLength() is called, rather than always reading the length. SQLBinary.readBinaryLength() is the call that reads the length from inside the column at the front of the stream.
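A toy illustration of this read-then-reset approach (the 4-byte big-endian length prefix here is invented for the example - Derby's real encoding is documented in SQLBinary.java - and the sketch assumes the stream supports mark()/reset()):

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Toy illustration only: the 4-byte big-endian length prefix is invented for
// this example (Derby's real encoding is described in SQLBinary.java), and
// the stream is assumed to support mark()/reset(). A prefix of 0 means the
// length was not stored and must be computed by draining the stream.
public class HeaderLengthSketch {

    static long lengthOf(InputStream in) throws IOException {
        in.mark(Integer.MAX_VALUE);               // remember the start position
        int encoded = new DataInputStream(in).readInt();
        long length;
        if (encoded != 0) {
            length = encoded;                     // length was stored up front
        } else {
            // No stored length: count the remaining bytes without keeping them.
            byte[] buf = new byte[8192];
            long total = 0;
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
            length = total;
        }
        in.reset();                               // leave the stream as we found it
        return length;
    }

    public static void main(String[] args) throws IOException {
        // 4-byte zero header followed by 100 payload bytes: length is computed.
        System.out.println(lengthOf(new ByteArrayInputStream(new byte[104]))); // prints 100
    }
}
```

Either way, the stream is positioned back at the start afterwards, so subsequent reads of the value are unaffected - which is exactly the property the reset discussion above is about.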

          Kathey Marsden added a comment -

          Here is a first effort at fixing this in SQLBinary.getLength().
          If there is a stream but it doesn't have the length, it will read the length and return it if non-zero; otherwise it will loop through the entire stream to get the length. Finally, it will reset the stream. This seems to work for data inserted both with and without a length.

          I still need to run tests and add tests, but wanted to make sure I am on the right track.

          Kathey

          Mike Matrigali added a comment -

          I think you are on the right track.

          What I described should work when getLength() is called on a datatype that has been constructed by reading it out of the store. But I think the generic type code also has to worry about a datatype that is coming from a user and may be a stream. I think that is the "magic" of testing the type in the getValue() call. I don't know if we can always reset() a stream from a user, and I don't know offhand if it is possible to force the code through that path. Is it better to get an I/O error on the call, or to sometimes be able to instantiate the blob in memory and only get an error if you run out of memory?

          Longer term, the reading of the whole stream to determine the length could be optimized if the work could be pushed into the stream. In the case of reading this data from store, I think the data is read into the page cache's page array, and then into an intermediate array to implement the stream back to the user - and now this adds another level. I wonder if skip() would work rather than reading the actual bytes?

          One nit: the indentation doesn't look right due to a mixture of tabs and spaces.

          Kathey Marsden added a comment -

          Thanks Mike for looking at the preview patch.

          I don't know that we can use skip. The javadoc seems to indicate that skip might return -1 even if it hasn't reached the end of the stream.

          I am trying to get my head around your statement. "But I think the generic type code also has to worry about a datatype that is coming from a user and may be a stream. "
          What user scenario is this?

          I think that probably the best thing to do is to only go through the new code if (stream instanceof Resetable); then a non-resetable stream will be materialized as before. But again, I'd like to understand in what case that would occur. We do seem to cover the getValue() case where stream is not an instanceof FormatIdInputStream, so we must have a test case for it.

          Mike Matrigali added a comment - - edited

          The 2 cases I try to think about with streams/blobs/clobs are always the following:
          1) The data is coming off the disk, through store as a stream, and back to the user somehow - ie. select.
          2) The data is coming from the user to be put into store as a stream - ie. insert.

          Mostly the datatype code should not really care which of these 2 cases is in effect, but
          I think Resetable is a case that matters.

          The case you are testing is #1, with getLength() being called by the query itself on a stream created by the store. That stream is always going to be an OverflowInputStream when it comes out of store - it does not matter what type. But the type code should not rely on that; I think using instanceof Resetable is the right thing.

          In case 2, embedded, I think the stream may come directly from the user - but I am not sure what the jdbc layer may do to it before it gets made into a SQLBinary. A different case might be what comes to the embedded engine after a stream gets sent in from a network client across drda. In both of these cases I don't know if you can actually get getLength() called, but I also don't think you can count on resetting the stream, so it seems cleaner to leave the code as is for this case rather than have the code pretend it could handle it. I don't really know a lot about case 2, so I may be off track. The fact that the stream has the length encoding in it means that some derby code has already got involved and modified the stream the user provided. Maybe something like a query that casts a parameter to a blob, calls the sql length function on that parameter, and inserts the result into a table?

          I don't know if there are any other situations with streams available in the datatype -
          maybe triggers?

          You probably should ignore the skip comment; better to get it right than to worry about optimizing it. Your fix should be way faster in the case where the length exists at the front, and way less memory-intensive in the other case, so it's a good incremental improvement.

          Kathey Marsden added a comment -

          I'm not really sure of the best way to test this. Should I create a new LowMemTest and run it with a 16M heap via the top-level junit-all ant target? That would mean it wouldn't get run in suites.All. Alternatively, I can just add tests to existing tests in suites.All and make sure we have coverage of the new code, but then it won't test the memory usage explicitly.

          We have an existing test, derbyStress.java, which runs in the old suite with 64MB, but I want to run with less memory than that.

          Knut Anders Hatlen added a comment -

          > I don't know that we can use skip. The javadoc seems to indicate that skip might return -1 even if it hasn't reached the end of the stream.

          Could you point me to the javadoc that says skip() can return -1? I see that it can return 0, but I didn't find anything about returning negative values. http://java.sun.com/javase/6/docs/api/java/io/InputStream.html#skip(long)

          Knut Anders Hatlen added a comment -

          > I'm not really sure the best way to test this. Should I create a new
          > LowMemTest and run it with 16M heap via the top level junit-all ant
          > target? This would mean it wouldn't get run in
          > suites.All. Alternately I can just add tests to existing tests in
          > suites.All and make sure we have coverage of the new code, but then
          > it won't test for the memory usage explicitly.

          What about a combination: Create a JUnit test in a separate suite and
          add the suite to suites.All. Additionally, create an ant target that
          runs the suite with a 16 MB heap. Then we get both the coverage and
          the possibility to test the memory usage.
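Such a target might look roughly like the following build.xml fragment (a sketch only - the fork/maxmemory attributes are standard Ant <junit> options, but the classpath refid and depends value here are assumptions, not taken from the actual patch; the memory._Suite name is the one mentioned later in this issue):

```xml
<!-- Hypothetical sketch: run the low-memory JUnit suite with a 16 MB heap,
     in addition to including the suite in suites.All for coverage. -->
<target name="junit-lowmem" depends="compile">
  <junit fork="yes" maxmemory="16m" haltonfailure="no">
    <classpath refid="test.classpath"/>
    <test name="org.apache.derbyTesting.functionTests.tests.memory._Suite"/>
  </junit>
</target>
```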

          Kathey Marsden added a comment -

          Sorry I misspoke. I should have said it may return 0 even if the end of file has not been reached.

          Kristian Waagan added a comment -

          Most implementations are "well behaved" when it comes to skip. However, we have no guarantee of this.

          At least in some places in Derby, where skipping an exact number of bytes (or skipping until EOF) is required, a skip-loop is used: if skip returns 0 and more bytes are still needed, the result is confirmed by calling read. If EOF has been reached, read returns -1; if skip returned 0 because of internal buffering, read will return the next byte and probably refill the internal buffer.
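That skip-loop pattern might be sketched like this (a hypothetical helper, not Derby's actual code):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Sketch of the skip-loop pattern described above: InputStream.skip() may
// legally return 0 before end-of-stream, so a 0 return is confirmed with
// read(), which returns -1 only at true EOF. Counts the bytes remaining.
public class SkipLoopSketch {

    static long skipToEnd(InputStream in) throws IOException {
        long total = 0;
        while (true) {
            long skipped = in.skip(8192);
            if (skipped > 0) {
                total += skipped;
            } else if (in.read() == -1) {
                break;                 // genuine end of stream
            } else {
                total += 1;            // skip() stalled; read() consumed one byte
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(skipToEnd(new ByteArrayInputStream(new byte[5000]))); // prints 5000
    }
}
```

The read() probe is what makes the loop safe against stream implementations whose skip() is lazy, at the cost of one extra byte read per stall.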

          Kathey Marsden added a comment -

          The javadoc for skip says:

          "The skip method of this class creates a byte array and then repeatedly reads into it until n bytes have been read or the end of the stream has been reached. "

          Is there any reason that we should expect it to be faster than reading into our own buffer?

          Kathey Marsden added a comment -

          Attached is a patch for this issue, derby-3732_diff.txt, which includes the tests. As suggested, it both adds a junit-lowmem target as part of junit-all and runs the new memory._Suite as part of suites.All, even though suites.All doesn't run with restricted memory.

          This solution does not use skip; discussion about that is still ongoing. I am running tests now, but am having what I think are unrelated problems running junit-all. I'll post if I can't figure that out.

          Kristian Waagan added a comment -

          Regarding skip performance, I would say it's hard to tell.
          The code in the JVM might be optimized and perform slightly better, even if it is using the same approach.
          However, in some cases the stream implementation might be capable of skipping an amount of data a lot faster than reading it. One example might be streams over files, where a skip could in theory consist of just changing a pointer. ByteArrayInputStream also has its own implementation of skip, which doesn't use read.

          Only using read directly can mean losing such optimizations for some stream implementations.
          Also note that the contracts are different for InputStream.skip and Reader.skip.

          For reference, here's an excerpt from Java SE 6 InputStream.skip JavaDoc:
          "Subclasses are encouraged to provide a more efficient implementation of this method. For instance, the implementation may depend on the ability to seek."

          Kathey Marsden added a comment -

          Attached is a patch, derby-3732_skip_diff.txt, that uses skip. Please take a look. I also upped the blob size to 18000000 in the test to make sure we have a bigger lob than the memory allocated (16MB) for the junit-lowmem suite. The new test takes 209 seconds to run on my machine, which I hope is not too long an addition to suites.All. It only runs with jdk16, since we need to test length with a lengthless insert.

          Mike Matrigali added a comment -

          from review of derby-3732_diff.txt

          Code changes look fine; this seems like a safe incremental bug fix. I would say it is ok for backport when ready.
          nit code comment - I would have broken the 2 exceptions onto multiple lines to fit in 80 characters.

          test code comments (nothing to stop a checkin):
          o While it probably works in the case of this test, it is a bad paradigm to assume any ordering of rows in a
          table without an order by. Even insert order is not, in general, the guaranteed return order. This is simply fixed by adding
          a key column to the base table and an order by. I don't remember what we do with blobs in the sorter, so
          adding an index may also avoid unwanted memory usage - not sure, as the row count is so small it probably
          fits in memory.
          o Because of other bugs in the system I was a little worried about selecting length(b), b in same statement - I know
          there are bugs where selecting b, b results in the stream of one column affecting the other. It might be nice to
          add a check that it also works if you select them separately.
          o I always like to see one other case in blob tests. You covered null, 0 length, and long. It would be nice to cover
          short also (i.e. some blob less than the size of a page - 1k is a safe choice). Again, from looking at the code I don't
          think this is an issue, but it does cover the other case through the code (maybe 0 is good enough, but it seems worth
          checking as a separate case).
          o can you explain the need for reflection code added to BaseTestCase?

          Kathey Marsden added a comment -

          Mike asked:
          o can you explain the need for reflection code added to BaseTestCase?

          I pulled that method up from AllPackages.java so I could use it in memory._Suite().
          We need to add the test by reflection because it requires JDK 1.6, since we test the lengthless setBinaryStream.
          The test will be ignored on lower JDK versions.

          Mike Matrigali added a comment -

          With respect to the skip discussion: at least in the datatype case, where the stream is coming out of store as part of a select, Derby "owns" the skip code, and in that case there are array buffers sitting around which allow for
          a very efficient skip. The code has multiple classes built on top of each other, but the interesting ones are:
          OverflowInputStream.java (this is the key one - it is the type of stream store always returns for any long type)
          BufferedByteHolderInputStream.java
          ByteHolder.java
          ByteHolderInputStream.java

          With more work we could do an even better job of this scan for length, but I think it would take some interface
          changes. The current code, I think, will read the disk bytes into the buffer manager's bytes, then copy the
          buffer manager's bytes to a private array, and then skip those bytes. Logically there is no need to copy the bytes if we know
          that we are just going to skip them - but that would take some code and maybe an interface change. Probably
          a better thing to work on would be to just always ensure that the length is at the front of the blob. I am going to
          file a JIRA with some ideas on this.
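          As an illustration of why a buffer-backed skip is cheap, here is a hypothetical stream (loosely in the spirit of the ByteHolder classes above, not actual Derby code) whose skip() is pure pointer arithmetic:

          ```java
          import java.io.InputStream;

          public class PointerSkipStream extends InputStream {
              private final byte[] buf; // bytes already held in memory
              private int pos;

              public PointerSkipStream(byte[] buf) {
                  this.buf = buf;
              }

              public int read() {
                  return (pos < buf.length) ? (buf[pos++] & 0xff) : -1;
              }

              // Skipping just advances the position pointer; no bytes are
              // copied to a private array first, which is the inefficiency
              // described above for the current code path.
              public long skip(long n) {
                  long k = Math.min(n, (long) (buf.length - pos));
                  if (k < 0) {
                      k = 0;
                  }
                  pos += (int) k;
                  return k;
              }
          }
          ```

          For example, skipping 60 bytes of a 100-byte buffer returns 60 immediately and leaves 40 bytes readable, with no intermediate copy.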

          Kathey Marsden added a comment -

          Attached is a patch derby-3732_skip2_diff.txt that incorporates changes for Mike's comments. I'll run tests again and check this in tomorrow if I don't hear anything back. One thing to note is that adding a primary key column to the table did not cause sort avoidance, and I got an out of memory error in the query with the order by. I had to use an optimizer override to get it to use the index. I think that's a bug. I will file one after I check this in and reference this test as the repro.

          Mike Matrigali added a comment - - edited

          derby-3732_skip2_diff.txt patch looks good to me, no more comments.

          I was surprised the sort avoidance did not work for such a simple query, probably bad costing on the sort side.
          There are some problems with costing row size with rows that contain blobs.

          Kristian Waagan added a comment -

          I had a look at 'derby-3732_skip2_diff.txt' and have the following comments:
          a) Instead of using your own stream (Astream), can you use one of the existing streams for testing?
          Using a stream that only returns the same value over and over doesn't detect off-by-one errors or similar (which I introduced myself and had to debug as part of DERBY-3735). The simplest one we have is LoopingAlphabetStream for lowercase modern Latin.

          b) When reading the data back, can we use a bigger buffer than 1000 bytes? This would speed up the test when run with the client driver. The max length for binary varchar is the maximum that can be transferred between the server and the client in one go with locators.

          Some tiny formatting nits for SQLBinary:
          1) Mix of tabs and spaces on line "throwStreamingIOException(ioe);"
          2) Trailing tab on "byte[] bytes = getBytes();"
          3) Last line of method "throwStreamingIOException" doesn't use tab for indent.

          thanks,

          Knut Anders Hatlen added a comment -

          Would it be possible to compile the test against 1.4, call the lengthless method with reflection and exclude the testBlobLengthWithLengthlessInsert test case if the method isn't available? Then the test cases that don't use it could run on lower Java versions.
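          That idea can be sketched as follows (a hypothetical helper, not the actual test code): compile against the PreparedStatement interface and look up the JDBC 4.0 lengthless overload reflectively, excluding the dependent test case when it is absent:

          ```java
          import java.io.InputStream;
          import java.lang.reflect.Method;
          import java.sql.PreparedStatement;

          public class LengthlessCheck {

              // Returns true when the JDBC 4.0 lengthless overload
              // setBinaryStream(int, InputStream) exists at runtime.
              static boolean lengthlessStreamAvailable() {
                  try {
                      Method m = PreparedStatement.class.getMethod(
                              "setBinaryStream",
                              new Class[] { int.class, InputStream.class });
                      return m != null;
                  } catch (NoSuchMethodException nsme) {
                      // Java < 6: exclude testBlobLengthWithLengthlessInsert,
                      // but let the other test cases run.
                      return false;
                  }
              }

              public static void main(String[] args) {
                  System.out.println(lengthlessStreamAvailable());
              }
          }
          ```

          The lengthless method itself would then also be invoked via Method.invoke, so the test class still compiles against the 1.4 class libraries.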

          Knut Anders Hatlen added a comment -

          The setting of derby.storage.pageCacheSize will only be respected if the database is booted by that test. I see it is set to 100 when it's run as a separate test, but not when it's run as part of a larger suite.

          It would be good to document in a comment why the optimizer override is needed.

          I was going to suggest that we should call skip(Long.MAX_VALUE) instead of skip(Integer.MAX_VALUE), but then I saw that it would probably trigger a bug in ArrayInputStream.skip() (will log a separate bug for that).

          Tiny nit: testBlobLength(boolean) should be private so that it's clearer that it's not a top-level test case.

          Kathey Marsden added a comment -

          I tried switching over to reflection and running with IBM and Sun JDK 1.4.2, and get an OOM in the client running with a 16MB heap. JDK 1.5 runs fine. Embedded is also ok with JDK 1.4.2. I'm investigating, but if anyone knows of some different behavior in the client for JDK 1.4.2 that might cause this, let me know.

          1) testBlobLength(org.apache.derbyTesting.functionTests.tests.memory.BlobMemTest)java.lang.OutOfMemoryError:
          at org.apache.derby.client.am.Cursor.get_VARCHAR_FOR_BIT_DATA(Cursor.java:647)
          at org.apache.derby.client.am.Cursor.getBytes(Cursor.java:1058)
          at org.apache.derby.client.am.CallableStatement.getBytesX(CallableStatement.java:676)
          at org.apache.derby.client.am.CallableLocatorProcedures.blobGetBytes(CallableLocatorProcedures.java:447)
          at org.apache.derby.client.am.BlobLocatorInputStream.readBytes(BlobLocatorInputStream.java:176)
          at org.apache.derby.client.am.BlobLocatorInputStream.read(BlobLocatorInputStream.java:135)
          at java.io.BufferedInputStream.read1(BufferedInputStream.java:237)
          at java.io.BufferedInputStream.read(BufferedInputStream.java:294)
          at java.io.FilterInputStream.read(FilterInputStream.java(Compiled Code))
          at org.apache.derby.client.am.CloseFilterInputStream.read(CloseFilterInputStream.java:79)
          at java.io.FilterInputStream.read(FilterInputStream.java:110)
          at org.apache.derby.client.am.CloseFilterInputStream.read(CloseFilterInputStream.java:65)
          at org.apache.derbyTesting.functionTests.tests.memory.BlobMemTest.testBlobLength(BlobMemTest.java:124)
          at org.apache.derbyTesting.functionTests.tests.memory.BlobMemTest.testBlobLength(BlobMemTest.java:170)
          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:85)
          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:58)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java(Compiled Code))
          at org.apache.derbyTesting.junit.BaseTestCase.runBare(BaseTestCase.java:104)
          at junit.extensions.TestDecorator.basicRun(TestDecorator.java:22)
          at junit.extensions.TestSetup$1.protect(TestSetup.java:19)
          at junit.extensions.TestSetup.run(TestSetup.java:23)
          at org.apache.derbyTesting.junit.BaseTestSetup.run(BaseTestSetup.java:57)
          at junit.extensions.TestDecorator.basicRun(TestDecorator.java:22)
          at junit.extensions.TestSetup$1.protect(TestSetup.java:19)
          at junit.extensions.TestSetup.run(TestSetup.java:23)
          at junit.extensions.TestDecorator.basicRun(TestDecorator.java:22)
          at junit.extensions.TestSetup$1.protect(TestSetup.java:19)
          at junit.extensions.TestSetup.run(TestSetup.java:23)
          at org.apache.derbyTesting.junit.BaseTestSetup.run(BaseTestSetup.java:57)
          at junit.extensions.TestDecorator.basicRun(TestDecorator.java:22)
          at junit.extensions.TestSetup$1.protect(TestSetup.java:19)
          at junit.extensions.TestSetup.run(TestSetup.java:23)

          Kristian Waagan added a comment -

          I don't have any details, but before you changed the Blob size in the test, I tried working with a 48 M Blob. It worked with Java SE 6, but garbage collection "went crazy" - i.e. a lot of time was being spent on gc and I think I saw like 12 000 collections or something. This happened during the client test.

          I tried to debug it quickly, but besides noticing lots of rather small byte arrays I couldn't find anything. I was unable to trace the source of these arrays during the little time I spent on the investigation.

          A few "random ramblings":
          How big is your page cache?
          Also, unless you're already doing this, maybe it would make sense to run the client and the server in different JVMs to better monitor the heap usage?
          And what about the ant / junit things? Do they use much memory?

          Kathey Marsden added a comment -

          Thanks Knut, Kristian, and Mike for the reviews.

          Attached is a new patch derby-3732_skip3_diff.txt which incorporates all the review comments thus far except I am not quite sure what to do about the pageCacheSize not getting set if this test is not run as the first test. It's not an issue with suites.All since the heap is large enough and for the junit-lowmem suite it is the first and only test, so works ok there too.

          The one outstanding issue is that the new test with a 16MB heap runs out of memory in the client for JDK 1.4.2. It runs fine with JDK 1.5, so right now the lowmem suite does not run with JDK 1.4.2. The problem is in the client and is related to retrieving the entire lob (not to the length function, which is the focus of this issue). I'd like to go ahead and check in this change and deal with the JDK 1.4.2 out of memory issue separately.

          I'm going to go ahead and run tests and checkin unless there are more comments.

          Kathey Marsden added a comment -

          Committed revision 672818. I will backport to 10.4 and 10.3 after the nightlies run cleanly.


            People

            • Assignee: Kathey Marsden
            • Reporter: Kathey Marsden
            • Votes: 1
            • Watchers: 2
