Derby
  1. Derby
  2. DERBY-3934

Improve performance of reading modified Clobs

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 10.5.1.1
    • Fix Version/s: 10.5.1.1
    • Component/s: JDBC
    • Labels:
      None
    • Bug behavior facts:
      Performance

      Description

      The performance of reading modified Clobs is poor, which is demonstrated by running a test program selecting a 10 MB Clob and then getting the contents using getSubString:

      • unmodified Clob (StoreStreamClob) : ~1 300 ms
      • modified Clob (TemporaryClob): ~156 000 ms

      In this case, the Clob was modified by changing the first character.

      A number of subtasks will be created to handle the various issues, which will be related to both performance and code cleanup.
      For a brief overview, see http://www.nabble.com/Suggestion-for-improving-ClobUpdatableReader-and-related-code-to20308303.html

      1. derby-3934-6a-UTF8Reader_remove_method.diff
        1 kB
        Kristian Waagan
      2. derby-3934-5a-UTF8Reader_cleanup.diff
        8 kB
        Kristian Waagan
      3. derby-3934-4a-getinternalreader_cachedlength.diff
        12 kB
        Kristian Waagan
      4. derby-3934-3a-clobupdreader_utf8reader.stat
        0.3 kB
        Kristian Waagan
      5. derby-3934-3a-clobupdreader_utf8reader.diff
        30 kB
        Kristian Waagan
      6. derby-3934-2a-intclob_new_methods.diff
        4 kB
        Kristian Waagan
      7. derby-3934-1a-clob_replace_test.diff
        3 kB
        Kristian Waagan

        Activity

        Kristian Waagan created issue -
        Kristian Waagan made changes -
        Field Original Value New Value
        Status Open [ 1 ] In Progress [ 3 ]
        Hide
        Kristian Waagan added a comment -

        'derby-3934-1a-clob_replace_test.diff' adds two simple tests where the characters of a Clob containing multi-byte chars are replaced one by one - forwards and backwards.
        This test can detect errors in the positioning logic and incorrect offsets/positions in the internal stream descriptors.

        Patch ready for review.
        I will commit this patch tomorrow.

        Show
        Kristian Waagan added a comment - 'derby-3934-1a-clob_replace_test.diff' adds two simple tests where the characters of a Clob containing multi-byte chars are replaced one by one - forwards and backwards. This test can detect errors in the positioning logic and incorrect offsets/positions in the internal stream descriptors. Patch ready for review. I will commit this patch tomorrow.
        Kristian Waagan made changes -
        Attachment derby-3934-1a-clob_replace_test.diff [ 12394660 ]
        Kristian Waagan made changes -
        Derby Info [Patch Available]
        Hide
        Kristian Waagan added a comment -

        'derby-3934-2a-intclob_new_methods.diff' adds two new methods to the InternalClob interface; getUpdateCount and isReleased.
        They can be used to determine if the content of a Clob has been changed, and if the internal Clob representation has been changed or the Clob has been closed.
        The isReleased() method will be used to detect when a read-only Clob representation is changed to a writable representation due to a user's request to modify the content (i.e. setString).

        The code is currently not being used, but it will be by the new implementation of ClobUpdatableReader (not yet posted - I do have a prototype if anyone is interested).

        Patch ready for review.

        Show
        Kristian Waagan added a comment - 'derby-3934-2a-intclob_new_methods.diff' adds two new methods to the InternalClob interface; getUpdateCount and isReleased. They can be used to determine if the content of a Clob has been changed, and if the internal Clob representation has been changed or the Clob has been closed. The isReleased() method will be used to detect when a read-only Clob representation is changed to a writable representation due to a user's request to modify the content (i.e. setString). The code is currently not being used, but it will be by the new implementation of ClobUpdatableReader (not yet posted - I do have a prototype if anyone is interested). Patch ready for review.
        Kristian Waagan made changes -
        Attachment derby-3934-2a-intclob_new_methods.diff [ 12394677 ]
        Hide
        Kristian Waagan added a comment -

        Committed patch 1a to trunk with revision 720767.

        Show
        Kristian Waagan added a comment - Committed patch 1a to trunk with revision 720767.
        Hide
        Kristian Waagan added a comment -

        Committed patch 2a to trunk with revision 721162.

        Show
        Kristian Waagan added a comment - Committed patch 2a to trunk with revision 721162.
        Kristian Waagan made changes -
        Derby Info [Patch Available]
        Hide
        Kristian Waagan added a comment -

        'derby-3934-3a-clobupdreader_utf8reader.diff' makes the handling of
        StoreStreamClob and TemporaryClob consistent.

        The following files are touched (all in derby.impl.jdbc):

            • EmbedClob.
              Updated call to ClobUpdatableReader. The change of the position argument is
              intentional.
            • TemporaryClob
              Replaced the ClobUpdatableReader returned by getReader with a UTF8Reader.
              Internal handling of TemporaryClob should deal with changing contents
              specifically, or create a ClobUpdatableReader where required.
              Note also the use of the new CharacterStreamDescriptor class. This piece of
              code will probably be changed later on, when there is more information about
              the stream available. For instance, caching byte/char positions allows to skip
              directly to the byte position through the underlying file API. This way, we
              don't have to decode all the raw bytes to skip the correct number of chars.
            • ClobUpdatableReader
              More or less rewritten. It now uses the new methods exposed by InternalClob to
              detect changes in the underlying Clob content. Note that this class doesn't
              handle repositioning, only detection of changes and forwarding of read/skip
              calls.
              Note the lazy initialization of the underlying reader.

        WARNING: There is one thing missing, which is proper synchronization. Access to
        store will be synchronized in other locations, but this class is not thread
        safe. I haven't decided yet whether to synchronize on the reader object or the
        root connection. I think the latter is the best choice. Does anyone know
        anything about the cost of taking locks on the same object multiple times?

            • StoreStreamClob
              Replaced old UTF8Reader constructor with the new one. Again, this code needs
              to be updated when more information about the stream is available. This is to
              allow UTF8Reader to perform better.
            • UTF8Reader
              Added a new constructor, using the new CharacterStreamDescriptor class.
              Removed one constructor.
              Retrofitted the second old constructor to use CharacterStreamDescriptor. This
              will be removed when the calling code has been updated.
              The old method calculating the buffer size will also be removed.
              Stopped referencing PositionedStoreStream, using PositionedStream interface
              instead. This allows the positioning logic to be used for both store streams
              and LOBInputStream streams.
              The reader has been prepared to be able to deal with multiple data offsets,
              i.e. handling several store stream formats. For instance, the current
              implementations has an offset of two bytes, where as the planned new one will
              have an offset of at least five bytes. LOBInputStream has an offset of zero
              bytes (no header information).
              From now on, position aware streams are not closed as early as before, because
              we might have go backwards in the stream. Streams that can only move forwards
              are closed as soon as possible (as before).

        Tests are running, and about 3/4 finished. No errors so far. I will post final
        results later.
        Patch ready for review.

        The plan forwards
        -----------------
        After patch 3a is in, I plan to do the following;
        1) Implement TemporaryClob.getInternalReader().
        This will dramatically improve the Clob.getSubString performance for
        modified Clobs.
        2) I will consider adding a simple byte/char position cache.
        The point of this is to be able to skip to a given byte position without
        having to decode byte into chars. This is a mechanism that will only help
        certain access patterns, but it should come with a very low overhead.
        3) Continue working with the new Clob format.
        When it is in place, care must be taken to utilize the new steam
        information where possible. The primary one is returning the length through
        Clob.length(). A second opportunity is using the length information to take
        decisions in the byte/char position cache.
        This work is mostly related to DERBY-3907.

        I'm using the simple Clob regression tests in my work, and it has already
        revealed a bug I had forgotten to include the know byte length in the
        CharacterStreamDescription, which caused UTF8Reader to allocate a buffer that
        was way too big (8K instead of 100 bytes).
        The last step in my LOB work will be to write a simple report documenting the
        improvements.

        Show
        Kristian Waagan added a comment - 'derby-3934-3a-clobupdreader_utf8reader.diff' makes the handling of StoreStreamClob and TemporaryClob consistent. The following files are touched (all in derby.impl.jdbc): EmbedClob. Updated call to ClobUpdatableReader. The change of the position argument is intentional. TemporaryClob Replaced the ClobUpdatableReader returned by getReader with a UTF8Reader. Internal handling of TemporaryClob should deal with changing contents specifically, or create a ClobUpdatableReader where required. Note also the use of the new CharacterStreamDescriptor class. This piece of code will probably be changed later on, when there is more information about the stream available. For instance, caching byte/char positions allows to skip directly to the byte position through the underlying file API. This way, we don't have to decode all the raw bytes to skip the correct number of chars. ClobUpdatableReader More or less rewritten. It now uses the new methods exposed by InternalClob to detect changes in the underlying Clob content. Note that this class doesn't handle repositioning, only detection of changes and forwarding of read/skip calls. Note the lazy initialization of the underlying reader. WARNING: There is one thing missing, which is proper synchronization. Access to store will be synchronized in other locations, but this class is not thread safe. I haven't decided yet whether to synchronize on the reader object or the root connection. I think the latter is the best choice. Does anyone know anything about the cost of taking locks on the same object multiple times? StoreStreamClob Replaced old UTF8Reader constructor with the new one. Again, this code needs to be updated when more information about the stream is available. This is to allow UTF8Reader to perform better. UTF8Reader Added a new constructor, using the new CharacterStreamDescriptor class. Removed one constructor. Retrofitted the second old constructor to use CharacterStreamDescriptor. This will be removed when the calling code has been updated. The old method calculating the buffer size will also be removed. Stopped referencing PositionedStoreStream, using PositionedStream interface instead. This allows the positioning logic to be used for both store streams and LOBInputStream streams. The reader has been prepared to be able to deal with multiple data offsets, i.e. handling several store stream formats. For instance, the current implementations has an offset of two bytes, where as the planned new one will have an offset of at least five bytes. LOBInputStream has an offset of zero bytes (no header information). From now on, position aware streams are not closed as early as before, because we might have go backwards in the stream. Streams that can only move forwards are closed as soon as possible (as before). Tests are running, and about 3/4 finished. No errors so far. I will post final results later. Patch ready for review. The plan forwards ----------------- After patch 3a is in, I plan to do the following; 1) Implement TemporaryClob.getInternalReader(). This will dramatically improve the Clob.getSubString performance for modified Clobs. 2) I will consider adding a simple byte/char position cache. The point of this is to be able to skip to a given byte position without having to decode byte into chars. This is a mechanism that will only help certain access patterns, but it should come with a very low overhead. 3) Continue working with the new Clob format. When it is in place, care must be taken to utilize the new steam information where possible. The primary one is returning the length through Clob.length(). A second opportunity is using the length information to take decisions in the byte/char position cache. This work is mostly related to DERBY-3907 . I'm using the simple Clob regression tests in my work, and it has already revealed a bug I had forgotten to include the know byte length in the CharacterStreamDescription, which caused UTF8Reader to allocate a buffer that was way too big (8K instead of 100 bytes). The last step in my LOB work will be to write a simple report documenting the improvements.
        Kristian Waagan made changes -
        Hide
        Kristian Waagan added a comment -

        Tests ran without errors.

        Committed patch 3a to trunk with revision 724294.

        Show
        Kristian Waagan added a comment - Tests ran without errors. Committed patch 3a to trunk with revision 724294.
        Hide
        Kristian Waagan added a comment -

        Patch 'derby-3934-4a-getinternalreader_cachedlength.diff' improves the performance of Clob.getSubString when the Clob is represented by a TemporaryClob.

        The main change in the patch is the implementation of TemporaryClob.getInternalReader(long).
        The second important change is that TemporaryClob now caches the character length. When the contents are changed, the length is either reset or updated accordingly if possible.
        I also added some length to verify that the length is correctly updated on Clob modifications.

        On my machine I see the following durations (one iteration) for ClobAccessTest.testFetchLargeClobPieceByPieceModified:
        trunk ~76'000 ms
        with patch 4a ~600 ms
        So with the patch the test completes in less than 1% of the time used by trunk. This is saved CPU time.

        Regressing tests are running.
        Patch ready for review.

        Show
        Kristian Waagan added a comment - Patch 'derby-3934-4a-getinternalreader_cachedlength.diff' improves the performance of Clob.getSubString when the Clob is represented by a TemporaryClob. The main change in the patch is the implementation of TemporaryClob.getInternalReader(long). The second important change is that TemporaryClob now caches the character length. When the contents are changed, the length is either reset or updated accordingly if possible. I also added some length to verify that the length is correctly updated on Clob modifications. On my machine I see the following durations (one iteration) for ClobAccessTest.testFetchLargeClobPieceByPieceModified: trunk ~76'000 ms with patch 4a ~600 ms So with the patch the test completes in less than 1% of the time used by trunk. This is saved CPU time. Regressing tests are running. Patch ready for review.
        Kristian Waagan made changes -
        Hide
        Kristian Waagan added a comment -

        Committed patch 4a to trunk with revision 726683.

        Work remaining:
        o See if a byte/char position cache can be used [more efficiently] to improve repositioning performance.
        o Cleaning up after the changes in UTF8Reader.

        Show
        Kristian Waagan added a comment - Committed patch 4a to trunk with revision 726683. Work remaining: o See if a byte/char position cache can be used [more efficiently] to improve repositioning performance. o Cleaning up after the changes in UTF8Reader.
        Kristian Waagan made changes -
        Fix Version/s 10.5.0.0 [ 12313010 ]
        Hide
        Kristian Waagan added a comment -

        Patch 5a removes a constructor and a method called only from that constructor.
        Updated the code in EmbedCallableStatement20 to use the new constructor.

        Running regression tests, will commit tomorrow if the tests pass.

        Show
        Kristian Waagan added a comment - Patch 5a removes a constructor and a method called only from that constructor. Updated the code in EmbedCallableStatement20 to use the new constructor. Running regression tests, will commit tomorrow if the tests pass.
        Kristian Waagan made changes -
        Attachment derby-3934-5a-UTF8Reader_cleanup.diff [ 12398233 ]
        Kristian Waagan made changes -
        Derby Info [Patch Available]
        Hide
        Kristian Waagan added a comment -

        Regression tests for patch 5a ran without failures.
        Committed to trunk with revision 736000.

        Show
        Kristian Waagan added a comment - Regression tests for patch 5a ran without failures. Committed to trunk with revision 736000.
        Kristian Waagan made changes -
        Derby Info [Patch Available]
        Hide
        Kristian Waagan added a comment -

        Patch 6a removes the now unused method 'readUnsignedShort'. It is no longer used because the reading of the header bytes have been moved outside the reader class.

        Committed to trunk with revision 742380.

        Show
        Kristian Waagan added a comment - Patch 6a removes the now unused method 'readUnsignedShort'. It is no longer used because the reading of the header bytes have been moved outside the reader class. Committed to trunk with revision 742380.
        Kristian Waagan made changes -
        Hide
        Kristian Waagan added a comment -

        Closing this issue.
        Some performance results (embedded only) can be seen here: http://www.nabble.com/CLOB-performance-td21831259.html#a21831259
        The most relevant result is testFetchLargeClobPieceByPieceModified, followed by testFetchLargeClobOneByOneCharModified and testFetchLargeClobsModified.

        There's a small performance hit for the testFetchLargeClobsModified, this may be because we create update sensitive streams unconditionally and that UTF8Reader has been equipped with more functionality. However, the changes to UTF8Reader are required to obtain acceptable performance with the client driver, which always fetches CLOBs piece by piece (through a callable statement using Clob.getSubString).

        Show
        Kristian Waagan added a comment - Closing this issue. Some performance results (embedded only) can be seen here: http://www.nabble.com/CLOB-performance-td21831259.html#a21831259 The most relevant result is testFetchLargeClobPieceByPieceModified, followed by testFetchLargeClobOneByOneCharModified and testFetchLargeClobsModified. There's a small performance hit for the testFetchLargeClobsModified, this may be because we create update sensitive streams unconditionally and that UTF8Reader has been equipped with more functionality. However, the changes to UTF8Reader are required to obtain acceptable performance with the client driver, which always fetches CLOBs piece by piece (through a callable statement using Clob.getSubString).
        Kristian Waagan made changes -
        Status In Progress [ 3 ] Closed [ 6 ]
        Resolution Fixed [ 1 ]
        Myrna van Lunteren made changes -
        Affects Version/s 10.5.1.1 [ 12313771 ]
        Affects Version/s 10.5.0.0 [ 12313010 ]
        Fix Version/s 10.5.1.1 [ 12313771 ]
        Fix Version/s 10.5.0.0 [ 12313010 ]
        Gavin made changes -
        Workflow jira [ 12445637 ] Default workflow, editable Closed status [ 12799049 ]
        Transition Time In Source Status Execution Times Last Executer Last Execution Date
        Open Open In Progress In Progress
        20d 5h 51m 1 Kristian Waagan 25/Nov/08 16:33
        In Progress In Progress Closed Closed
        96d 19h 50m 1 Kristian Waagan 02/Mar/09 12:23

          People

          • Assignee:
            Kristian Waagan
            Reporter:
            Kristian Waagan
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development