Derby / DERBY-4241

Improve transition from read-only to writable Clob representation

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 10.5.1.1, 10.6.1.0
    • Fix Version/s: 10.7.1.1
    • Component/s: JDBC
    • Labels: None
    • Bug behavior facts: Performance

      Description

      When a store stream Clob is going to be modified, it is first written out to the temporary area of Derby and represented as a TemporaryClob.
      The transfer of the data is done in a sub-optimal manner for two reasons:
      o for transfer of the complete Clob, the copy method operates on the byte level, so we're not able to save the character length.
      o for transfer of parts of the Clob (i.e. truncation), we first have to decode the UTF-8 encoding to find the byte count, and then transfer the same bytes.

      I intend to make the following two changes:
      1) Add a getLengthIfKnown method to InternalClob.
      2) Add a UTF-8 aware copy method to LOBStreamControl.

      When a complete Clob is to be copied, code like this will be executed:
          cachedCharLength = internalClob.getLengthIfKnown();
          if (cachedCharLength > 0) {
              // use existing byte-oriented copy method for best performance (copy until EOF)
          } else {
              cachedCharLength = control.copyUTF8Data();
          }
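
      As an illustration only (this is not the code from the patches), the decision above could be sketched in self-contained Java roughly as follows. The names FullClobCopySketch, LengthSource, copyAll and copyUtf8Data are hypothetical stand-ins for InternalClob and LOBStreamControl. The sketch assumes well-formed standard UTF-8 input and counts characters as UTF-16 chars; Derby's actual on-disk format and method signatures are not reproduced here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

final class FullClobCopySketch {

    /** Hypothetical stand-in for the length lookup on InternalClob. */
    interface LengthSource {
        long getLengthIfKnown(); // a non-positive value means "unknown" in this sketch
    }

    /** Copies every byte from in to out and returns the character length. */
    static long copyAll(LengthSource clob, InputStream in, OutputStream out)
            throws IOException {
        long cachedCharLength = clob.getLengthIfKnown();
        if (cachedCharLength > 0) {
            // Length already known: plain byte copy until EOF (Java 9+).
            in.transferTo(out);
        } else {
            // Length unknown: count characters while copying.
            cachedCharLength = copyUtf8Data(in, out);
        }
        return cachedCharLength;
    }

    /** Byte copy that also counts UTF-16 chars; assumes well-formed UTF-8. */
    static long copyUtf8Data(InputStream in, OutputStream out)
            throws IOException {
        byte[] buf = new byte[8192];
        long chars = 0;
        int read;
        while ((read = in.read(buf)) != -1) {
            for (int i = 0; i < read; i++) {
                int b = buf[i] & 0xFF;
                // Every byte that is not a continuation byte (10xxxxxx)
                // starts a new sequence.
                if ((b & 0xC0) != 0x80) {
                    // A 4-byte sequence decodes to a surrogate pair (2 chars).
                    chars += (b >= 0xF0) ? 2 : 1;
                }
            }
            out.write(buf, 0, read);
        }
        return chars;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "bl\u00e5b\u00e6r \u20ac".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long chars = copyAll(() -> -1L, new ByteArrayInputStream(data), out);
        System.out.println(chars + " chars, " + out.size() + " bytes");
    }
}
```

      The point of the split is that the character count comes for free during the copy when it is not already cached, so no separate decoding pass is needed afterwards.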

      When part of a Clob is to be copied, we always use the UTF-8 aware copy method, but we also do a cheap range check:
          cachedCharLength = internalClob.getLengthIfKnown();
          if (cachedCharLength > 0 && requestedLength > cachedCharLength) {
              throw new EOFException();
          }
          if (cachedCharLength == requestedLength) {
              // use existing byte-oriented copy method for best performance (copy until EOF)
          } else {
              cachedCharLength = control.copyUTF8Data(requestedLength);
          }
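
      A minimal sketch of the partial copy, again purely illustrative: PartialClobCopySketch, copyChars and this copyUtf8Data signature are hypothetical, the input is assumed to be well-formed standard UTF-8, and the extra "knownCharLength > 0" guard on the fast path is an addition of the sketch. It shows how the byte count can be discovered while the bytes are being transferred, instead of in a separate decoding pass.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

final class PartialClobCopySketch {

    /** Copies the first requestedLength chars; returns the chars copied. */
    static long copyChars(long knownCharLength, long requestedLength,
                          InputStream in, OutputStream out) throws IOException {
        // Cheap range check, only possible when the length is known.
        if (knownCharLength > 0 && requestedLength > knownCharLength) {
            throw new EOFException("requested " + requestedLength
                    + " chars, Clob has " + knownCharLength);
        }
        if (knownCharLength > 0 && knownCharLength == requestedLength) {
            // Whole value requested: plain byte copy until EOF (Java 9+).
            in.transferTo(out);
            return requestedLength;
        }
        return copyUtf8Data(in, out, requestedLength);
    }

    /**
     * Copies whole UTF-8 sequences until charsToCopy UTF-16 chars are
     * transferred. Reads byte by byte for clarity; wrap the input in a
     * BufferedInputStream in real use. A trailing surrogate pair may make
     * the result exceed charsToCopy by one.
     */
    static long copyUtf8Data(InputStream in, OutputStream out, long charsToCopy)
            throws IOException {
        long copied = 0;
        while (copied < charsToCopy) {
            int lead = in.read();
            if (lead == -1) {
                throw new EOFException("stream ended after " + copied + " chars");
            }
            int trailing;        // continuation bytes that follow the lead byte
            int charsInSeq = 1;  // UTF-16 chars the sequence decodes to
            if (lead < 0x80) {
                trailing = 0;    // 0xxxxxxx
            } else if (lead < 0xE0) {
                trailing = 1;    // 110xxxxx
            } else if (lead < 0xF0) {
                trailing = 2;    // 1110xxxx
            } else {
                trailing = 3;    // 11110xxx, decodes to a surrogate pair
                charsInSeq = 2;
            }
            out.write(lead);
            for (int i = 0; i < trailing; i++) {
                int b = in.read();
                if (b == -1) {
                    throw new EOFException("truncated UTF-8 sequence");
                }
                out.write(b);
            }
            copied += charsInSeq;
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "a\u00e5\u20acbc".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long chars = copyChars(-1, 3, new ByteArrayInputStream(data), out);
        System.out.println(chars + " chars, " + out.size() + " bytes");
    }
}
```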

      Work on the UTF-8 aware copy method was started under DERBY-4023, which also holds the comments on the first revision of the patch.

      1. better.txt
        4 kB
        Kristian Waagan
      2. derby-4241-1a-InternalClob.getLengthIfKnown.diff
        3 kB
        Kristian Waagan
      3. derby-4241-2a-utf8AwareCopy.diff
        11 kB
        Kristian Waagan
      4. derby-4241-2b-utf8AwareCopy.diff
        12 kB
        Kristian Waagan
      5. derby-4241-32core-cmt.txt
        5 kB
        Kristian Waagan

          Activity

          Kristian Waagan added a comment -

          Patch 1a adds the method getLengthIfKnown to the InternalClob interface, and implements it in TemporaryClob and StoreStreamClob.

          I expect to commit this shortly.
          Patch ready for review.

          Kristian Waagan added a comment -

          Patch 4241-2a is a second revision of patch 4023-2a. The patch is dependent on patch 4241-1a.
          I have (hopefully) addressed Knut's comments from DERBY-4023.

          All feedback welcome, especially on the wording of the message I added.

          Regression tests passed.
          Patch ready for review.

          Kristian Waagan added a comment -

          Committed patch 1a to trunk with revision 782600.
          I'll wait a few more days with patch 2a, hoping that someone will have a look at it.

          Knut Anders Hatlen added a comment -

          Sorry, I thought I had commented on 2a already, but seems I had only looked at it. It looks fine to me. Or perhaps "correct" is more accurate than "fine", since the LOB code is getting more and more complex every day, it seems...

          Did you run any tests to verify that the performance was improved by the changes?

          Kristian Waagan added a comment -

          Hi Knut,

          I ran a series of tests, but it's a long time ago... I was also working on some statistical analysis at that time, which hasn't made it into the Derby repos (I'm not sure it can).

          You can see the results from one of the runs in 'derby-4241-32core-cmt.txt', and the file 'better.txt' is just a grep on a series of such results.

          I saw up to ~65% improvement with the patch on the machines with the slowest CPUs. I believe that the benefit will be greater the larger the CLOB is (the tests used 15 MB CLOBs, I think).
          The conclusions are based on time measurements and confidence intervals (obtained using a technique called bootstrapping) for both the mean and the standard deviation. Therefore, in some cases the conclusion was "indecisive", even though looking at only the means (from a series of runs) indicated an improvement.
          Now, since this is so long ago, please don't ask too many detailed questions. Also, since I'm no statistician, I cannot guarantee anything about the results I present...

          A small glossary:
          meanP = mean point
          meanP 2sd = the difference between the mean points is at least two times the standard deviation
          meanP 3sd = the difference between the mean points is at least three times the standard deviation
          meanHL 3sd = the high estimate of PATCHED lies at least three times the standard deviation away from the low estimate of BASE

          I cannot remember which value I used for the standard deviation, but I guess it was the point value.
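
          For readers unfamiliar with the technique, the following is a generic sketch of a percentile bootstrap confidence interval for the mean, plus a check in the spirit of the "meanP k sd" entries in the glossary. It is not the analysis code used for these measurements, which is not part of the issue. All names (BootstrapSketch, bootstrapMeanInterval, meanPointsDifferBy) are hypothetical, the timings in main are made-up placeholders, and which standard deviation to pass in is left to the caller.

```java
import java.util.Arrays;
import java.util.Random;

final class BootstrapSketch {

    static double mean(double[] xs) {
        return Arrays.stream(xs).average().orElse(Double.NaN);
    }

    /**
     * Percentile bootstrap: resample with replacement, compute the mean of
     * each resample, and read the interval off the sorted means.
     * Returns {low, high}.
     */
    static double[] bootstrapMeanInterval(double[] sample, int resamples,
                                          double confidence, Random rnd) {
        double[] means = new double[resamples];
        double[] resample = new double[sample.length];
        for (int r = 0; r < resamples; r++) {
            for (int i = 0; i < sample.length; i++) {
                resample[i] = sample[rnd.nextInt(sample.length)];
            }
            means[r] = mean(resample);
        }
        Arrays.sort(means);
        double alpha = (1.0 - confidence) / 2.0;
        int lo = (int) Math.floor(alpha * (resamples - 1));
        int hi = (int) Math.ceil((1.0 - alpha) * (resamples - 1));
        return new double[] { means[lo], means[hi] };
    }

    /** "meanP k sd": the mean points differ by at least k standard deviations. */
    static boolean meanPointsDifferBy(double baseMean, double patchedMean,
                                      double sd, int k) {
        return Math.abs(baseMean - patchedMean) >= k * sd;
    }

    public static void main(String[] args) {
        // Made-up timings in milliseconds, for illustration only.
        double[] base = { 1020, 1005, 998, 1012, 1030, 1001 };
        double[] patched = { 640, 655, 632, 648, 661, 639 };
        Random rnd = new Random(42);
        double[] baseCi = bootstrapMeanInterval(base, 1000, 0.95, rnd);
        double[] patchedCi = bootstrapMeanInterval(patched, 1000, 0.95, rnd);
        System.out.println("BASE    mean CI: " + Arrays.toString(baseCi));
        System.out.println("PATCHED mean CI: " + Arrays.toString(patchedCi));
    }
}
```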

          Knut Anders Hatlen added a comment -

          Thanks, Kristian. It sounds like this is a good improvement. +1 to commit.

          Kristian Waagan added a comment -

          Thanks, Knut.

          Attaching revision 2b of the patch. I had to adjust the patch slightly due to other changes made since 2a was created (i.e. change the error message from I028 to I029).
          Regression tests passed.

          Kristian Waagan added a comment -

          Committed patch 2b to trunk with revision 958522.

          Kristian Waagan added a comment -

          Backported to 10.6 with revision 963652.
          Closing issue.


            People

            • Assignee: Kristian Waagan
            • Reporter: Kristian Waagan