
DERBY-5752: LOBStreamControl should materialize less aggressively

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 10.9.1.0
    • Fix Version/s: 10.10.1.1
    • Component/s: JDBC
    • Labels: None

      Description

      The constructor LOBStreamControl(EmbedConnection, byte[]) always makes the buffer size equal to the LOB size, effectively creating an extra, fully materialized copy of the LOB in memory.

      I think the assumption here is that a LOB that's already materialized is a small one. That is, LOBs that are smaller than 32 KB and fit in a single page are typically materialized when read from store. However, we sometimes materialize LOBs that are a lot bigger than 32 KB. For example, triggers that access LOBs may materialize them regardless of size (see comment in DMLWriteResultSet's constructor for details). For these large LOBs, it sounds unreasonable to allocate a buffer of the same size as the LOB itself.

      I'd suggest that we change the constructor so that it never allocates a buffer larger than 32 KB, roughly the sizing rule sketched below. That would mean that the behaviour is preserved for all LOBs fetched directly from store (only LOBs that don't fit in a single page will cause temporary files to be created), whereas large LOBs accessed by triggers will no longer be duplicated in memory, since they will overflow to temporary files instead.
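
      For illustration, a minimal sketch of the suggested sizing rule (the class and method names are made up for this sketch; they are not Derby's actual LOBStreamControl identifiers):

          // A hedged sketch, not Derby source: cap the initial buffer for an
          // already-materialized LOB at 32 KB, the single-page threshold
          // discussed above.
          final class BufferSizingSketch {
              static final int MAX_BUF_SIZE = 32 * 1024;

              // Small LOBs keep today's behaviour (buffer the whole value);
              // larger ones are capped, so the in-memory copy overflows to a
              // temporary file instead of duplicating the LOB on the heap.
              static int initialBufferSize(byte[] lob) {
                  return Math.min(lob.length, MAX_BUF_SIZE);
              }
          }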

      Attachments

      1. d5752-1a.diff (6 kB) by Knut Anders Hatlen
      2. buffsize.diff (0.6 kB) by Knut Anders Hatlen


          Activity

          Knut Anders Hatlen added a comment -

          I experimented with the attached buffsize.diff patch (not for commit), which simply sets the buffer size to LOBStreamControl.DEFAULT_MAX_BUF_SIZE (4K) regardless of the size of the LOB. With that patch, the heap requirement for TriggerTest, which works on some large LOBs that get materialized, went down from 110MB to 85MB. (I ran this experiment on sources that had been patched for DERBY-5751, which already had reduced the heap requirements from 140MB to 110MB.) suites.All also passed with these changes.

          This is not the approach I'm planning to take in the final patch. I intend to make the buffer size dynamic based on the LOB size, as it is today, but have a maximum size of 32KB.

          Since the constructor in question is only used when the underlying LOB value is materialized before the EmbedBlob is instantiated, and LOBs larger than 32KB are typically not materialized at that point, I think the suggested approach would only affect the cases where

          • the LOB is used in a trigger
          • getBytes() has been called on the column before getBlob() (this case was in fact something we considered disallowing in DERBY-5489)

          In those two cases, if the suggested approach is implemented, a LOBStreamControl instance holding a LOB larger than 32 KB will overflow to temporary files instead of buffering the entire LOB in memory.
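
          For illustration, a simplified sketch of that overflow pattern (this is the general technique, not Derby's actual LOBStreamControl code):

              import java.io.File;
              import java.io.IOException;
              import java.io.RandomAccessFile;

              // Hedged sketch: keep small values in a bounded in-memory buffer
              // and spill to a temporary file once the 32 KB bound is exceeded.
              class OverflowingLobBuffer {
                  static final int MAX_BUF_SIZE = 32 * 1024;
                  private byte[] buf = new byte[0];
                  private int length;
                  private RandomAccessFile tmp; // non-null once we have spilled

                  void write(byte[] data) throws IOException {
                      if (tmp == null && length + data.length <= MAX_BUF_SIZE) {
                          // Still within the bound: grow the in-memory buffer.
                          byte[] grown = new byte[length + data.length];
                          System.arraycopy(buf, 0, grown, 0, length);
                          System.arraycopy(data, 0, grown, length, data.length);
                          buf = grown;
                      } else {
                          if (tmp == null) {
                              // First write past the bound: move the bytes
                              // buffered so far to a temporary file on disk.
                              File f = File.createTempFile("lob", ".tmp");
                              f.deleteOnExit();
                              tmp = new RandomAccessFile(f, "rw");
                              tmp.write(buf, 0, length);
                              buf = null;
                          }
                          tmp.write(data);
                      }
                      length += data.length;
                  }
              }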

          Knut Anders Hatlen added a comment -

          Resolved the wrong issue. Reopening.

          Knut Anders Hatlen added a comment -

          I added an assert to the constructor and made it fail on attempts to set a buffer size greater than 32K (a sketch of such a check follows the list). This exposed a third case in which larger LOBs are materialized:

          • LOBs will be materialized if the results need to go through sorting (even when not sorting on the LOB column)
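
          For illustration, roughly how such a debug-only check might look (the message text and the SanityManager package path, which differs between Derby versions, are assumptions; the actual code is not shown in this issue):

              import org.apache.derby.iapi.services.sanity.SanityManager;

              final class BufferSizeCheck {
                  // Fails in sane (debug) builds if a caller tries to set up a
                  // buffer larger than the 32K limit; a no-op otherwise.
                  static void checkBufferSize(int bufferSize) {
                      if (SanityManager.DEBUG) {
                          SanityManager.ASSERT(bufferSize <= 32 * 1024,
                                  "buffer size " + bufferSize + " exceeds the 32K limit");
                      }
                  }
              }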
          Knut Anders Hatlen added a comment -

          The attached patch, d5752-1a.diff, changes LOBStreamControl's constructor so that it doesn't materialize LOBs larger than 32 KB in memory. It also adds a test case to BlobMemTest, which fails with an OOME without the fix.

          All the regression tests ran cleanly. However, I noticed that BlobClob4BlobTest took significantly longer with the patch. Specifically, the time it took to run test case testPositionAgressive() in an encrypted database increased from 20 seconds to more than 4 minutes. I'll look into it and try to find out why this particular test case gets so much slower.
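
          For reference, this is not the actual BlobMemTest case, but a standalone sketch of the scenario it covers (the class and table names are made up, and it assumes the Derby embedded driver is on the classpath). A LOB well above the 32 KB limit is inserted and read back as a stream; with the fix, the materialized copy overflows to a temporary file instead of being duplicated on the heap:

              import java.io.InputStream;
              import java.sql.Connection;
              import java.sql.DriverManager;
              import java.sql.PreparedStatement;
              import java.sql.ResultSet;

              public class LargeBlobSketch {
                  public static void main(String[] args) throws Exception {
                      Connection c = DriverManager.getConnection(
                              "jdbc:derby:memory:d5752;create=true");
                      c.createStatement().execute("CREATE TABLE T(ID INT, B BLOB)");

                      PreparedStatement ins =
                              c.prepareStatement("INSERT INTO T VALUES (1, ?)");
                      ins.setBytes(1, new byte[1024 * 1024]); // 1 MB, far above 32 KB
                      ins.executeUpdate();

                      // Ordering on a non-LOB column forces the sort to
                      // materialize the BLOB column as well.
                      ResultSet rs = c.createStatement().executeQuery(
                              "SELECT B FROM T ORDER BY ID");
                      rs.next();
                      InputStream in = rs.getBlob(1).getBinaryStream();
                      long n = 0;
                      while (in.read() != -1) n++; // stream, don't buffer
                      System.out.println("read " + n + " bytes");
                  }
              }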

          Knut Anders Hatlen added a comment -

          I had forgotten about this...

          When I rerun the tests now, I am not able to reproduce the big difference I saw in BlobClob4BlobTest in the first test run. I do still see a difference, but it's more like 165 seconds vs 180 seconds for the full BlobClob4BlobTest. As before, the entire difference appears to be caused by testPositionAgressive() in an encrypted database, which slowed down from 7 seconds to 23 seconds in my environment. There is no difference in that test case on unencrypted databases.

          The test case in question inserts into a table a number of CLOBs, some of which are larger than the 32 KB limit for materialization. However, the query that reads the CLOBs is ordered on one of the non-CLOB columns, and the sorting materializes all the columns in the result. The test then scans through the fetched CLOBs using Clob.position().

          The performance difference is seen because the java.sql.Clob objects fetched from the result set are no longer fully materialized in memory with the patch, unless they are smaller than 32k. For the big objects, this means that each call to Clob.position() will have to read temporary files and decrypt the contents in order to search for the substring. Without the patch, the entire value would live unencrypted in memory, which makes position() a much cheaper operation.
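
          For illustration, the JDBC call involved (a hedged sketch, not the test's actual code): with the patch, a Clob above 32 KB is backed by a temporary file, so each position() call re-reads, and on an encrypted database decrypts, the file while scanning for the pattern.

              import java.sql.Clob;
              import java.sql.ResultSet;
              import java.sql.SQLException;

              final class ClobSearch {
                  // Returns the 1-based position of pattern in the first
                  // column's CLOB, or -1 if it is absent. With the patch, a
                  // large Clob is no longer fully materialized, so this search
                  // hits the temporary file on every call.
                  static long findInClob(ResultSet rs, String pattern)
                          throws SQLException {
                      Clob clob = rs.getClob(1);
                      return clob.position(pattern, 1);
                  }
              }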

          I think this is an expected difference, and that it is acceptable since the CLOB wasn't supposed to be materialized in this scenario in the first place. Of course, the current limit for materialization might not be optimal for all applications, as materialization could indeed improve the performance of some operations if the system has enough memory. Increasing the limit or making it tunable might be a useful improvement, but it's outside the scope of this issue.

          Knut Anders Hatlen added a comment -

          Setting the Patch Available flag since reruns of the regression tests did not show the big performance degradation indicated by the initial test run.

          Knut Anders Hatlen added a comment -

          Committed revision 1447722.


            People

            • Assignee: Knut Anders Hatlen
            • Reporter: Knut Anders Hatlen