DERBY-5068 (Apache Derby)

Investigate increased CPU usage on client after introduction of UTF-8 CcsidManager

    Details

    • Bug behavior facts:
      Performance, Regression

      Description

      While looking at the performance graphs for the single-record select test during the last year - http://home.online.no/~olmsan/derby/perf/select_1y.html - I noticed that there was a significant increase (10-20%) in CPU usage per transaction on the client early in October 2010. To be precise, the increase seems to have happened between revision 1004381 and revision 1004794. In that period, there were three commits: two related to DERBY-4757, and one related to DERBY-4825 (tests only).

      We should try to find out what's causing the increased CPU usage and see if there's some way to reduce it.

      1. d5068-2b.diff
        33 kB
        Knut Anders Hatlen
      2. d5068-2a.stat
        0.6 kB
        Knut Anders Hatlen
      3. d5068-2a.diff
        31 kB
        Knut Anders Hatlen
      4. d5068-1a.diff
        4 kB
        Knut Anders Hatlen

        Issue Links

          Activity

          Knut Anders Hatlen created issue -
          Knut Anders Hatlen added a comment -

          One transaction in the single-record select client results in the following calls to Utf8CcsidManager methods:

          • 8 calls to getByteLength()
          • 3 calls to convertFromJavaString(String,byte[],int,Agent)

          Both of these methods create an intermediate object (getByteLength() a byte array, convertFromJavaString() a String) that's thrown away, whereas EbcdicCcsidManager doesn't do that. Perhaps it's worthwhile to hand-code the UTF-8 encoding in these two methods to avoid having to create the intermediate objects. Or, if we don't want to maintain code that does UTF-8 encoding ourselves: assume that all characters are ASCII and only fall back to using intermediate objects if a non-ASCII character is found.
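          As an illustration of the hand-coded idea, the UTF-8 byte length of a string can be computed in a single pass over its chars, with no intermediate byte array. This is a hypothetical sketch, not Derby's actual patch; the class and method names (Utf8LengthSketch, byteLength) are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

public class Utf8LengthSketch {

    // Count UTF-8 bytes in one pass, without materializing the encoded
    // array. A properly paired surrogate counts as the 4-byte sequence
    // for the supplementary character it encodes.
    static int byteLength(String s) {
        int len = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                len += 1; // ASCII: one byte
            } else if (c < 0x800) {
                len += 2; // two-byte sequence
            } else if (Character.isHighSurrogate(c)
                    && i + 1 < s.length()
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                len += 4; // supplementary character: four bytes
                i++;      // the low surrogate is consumed as well
            } else {
                len += 3; // three-byte sequence
            }
        }
        return len;
    }

    public static void main(String[] args) {
        // Cross-check against the class library's encoder.
        String s = "caf\u00e9 \u20ac";
        System.out.println(byteLength(s));                              // 9
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length);  // 9
    }
}
```

          The ASCII branch is the one the fallback idea above would rely on: if every char tests `< 0x80`, the byte length simply equals the string length.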

          I'll run a quick experiment on the machines where I see this issue.

          Knut Anders Hatlen added a comment -

          The attached patch makes the two methods encode the strings without going via an intermediate object. I haven't double checked that the encoding is correct, but it should be sufficient for testing purposes.

          I ran the sr_select test with 10 threads using derbyclient.jar from 10.6.2.1, 10.8.1.2 and patched trunk. Four runs with each configuration gave the following average CPU time spent per transaction:

          10.6.2.1: 76.4 µs
          10.8.1.2: 81.3 µs
          trunk+patch: 76.8 µs

          So it looks like that approach will bring the CPU usage back down to the 10.6 level.

          Knut Anders Hatlen made changes -
          Field Original Value New Value
          Attachment d5068-1a.diff [ 12477653 ]
          Dag H. Wanvik added a comment -

          Good to understand! The patch adds some complexity by converting to UTF-8 manually instead of using the Java libraries, though. If you think the performance improvement is worth the added complexity, it might be nice to add some links that explain the logic. Does the logic also work if the string contains surrogate pairs?

          Knut Anders Hatlen added a comment -

          Thanks for taking a look at the patch, Dag! I agree that there should be more comments in the final patch, if we go for this approach. I don't know the answer to your question about surrogate pairs, that's one of the things that will need to be double checked and tested. My understanding, though, is that Java's UTF-8 encoding algorithm treats surrogate pairs as two separate characters, which would mean that there's no need for special handling of surrogate pairs in the patch.

          One alternative way to get rid of the intermediate objects would be to use a CharsetEncoder (like we did on the server side in DERBY-2936) which performs the encoding directly into a buffer. This would hide the complexity of doing the actual encoding. However, a CharsetEncoder works on ByteBuffer and CharBuffer objects, whereas the CcsidManager interface works on String and byte[] objects, so this would mean changing the interface and the code that uses it. (We could of course wrap the strings and byte arrays using ByteBuffer.wrap() and CharBuffer.wrap() within the methods to keep the interface the same, but then we're back into the business of generating intermediate objects, and my quick and dirty experiments suggest that this wouldn't give any improvements compared to the current situation.)
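          For illustration, the wrapping variant dismissed in the parenthesis might look like the sketch below. The names (WrappingSketch, convertFromJavaString) are illustrative, not the actual CcsidManager signature; the point is that both wrap() calls allocate short-lived objects, which is exactly the garbage the patch is trying to avoid.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

public class WrappingSketch {

    // Keeps the String/byte[] interface but delegates to a CharsetEncoder.
    // Returns the offset just past the last encoded byte.
    static int convertFromJavaString(String src, byte[] dst, int offset) {
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
        CharBuffer in = CharBuffer.wrap(src);                  // intermediate object
        ByteBuffer out =
                ByteBuffer.wrap(dst, offset, dst.length - offset); // another one
        CoderResult result = encoder.encode(in, out, true);
        if (result.isError()) {
            throw new IllegalArgumentException("cannot encode: " + result);
        }
        return out.position();
    }

    public static void main(String[] args) {
        byte[] buf = new byte[16];
        int end = convertFromJavaString("abc", buf, 0);
        System.out.println(end); // 3
    }
}
```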

          Knut Anders Hatlen added a comment -

          I've started looking into moving from byte[] to ByteBuffer on the client. That allowed us to simplify some bit fiddling code on the server, so it would be an improvement regardless of this issue. It's a bit of work, but most of it is purely mechanical. I'll file a separate issue for that task, and then we can revisit this issue once we have that in place, and see if using a java.nio.charset.CharsetEncoder to encode directly into the buffer would be cheaper than the current approach.

          Knut Anders Hatlen made changes -
          Link This issue is related to DERBY-5210 [ DERBY-5210 ]
          Dag H. Wanvik added a comment -

          "then we can revisit this issue once we have that in place": seems a good idea to me! +1

          Knut Anders Hatlen added a comment -

          Attaching an alternative patch (d5068-2a.diff) that must be applied on
          top of the patch attached to DERBY-5210.

          The patch makes the following changes:

          1) Adds two new methods to CcsidManager: startEncoding() and encode().
          These are roughly equivalent to the reset() and encode() methods in
          java.nio.charset.CharsetEncoder (and Utf8CcsidManager indeed
          implements them as wrappers around the CharsetEncoder methods). The
          methods allow encoding a string directly into a ByteBuffer without
          going via an intermediate throw-away array.

          2) Removes these methods from CcsidManager:

          • convertFromJavaString(String, byte[], int, Agent)
          • convertToJavaString(byte[])
          • maxBytesPerChar()
          • getByteLength(String)

          3) Changes Request, NetPackageRequest and NetConnection to use the new
          methods instead of the removed ones.

          In addition to performing the string encoding without creating an
          intermediate byte array, the patch eliminates the use of
          getByteLength() completely (that method also created an intermediate
          byte array). The original code needed to know the exact byte length of
          the string up front so that it could make sure the destination buffer
          was large enough. The new interface for encoding the strings lets the
          caller know if it runs out of buffer space, so that the caller can
          allocate a larger buffer and continue the operation. This way, we
          don't need to encode each string twice.
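          The caller's grow-on-overflow loop described above might look like this sketch. It uses a raw java.nio.charset.CharsetEncoder directly, whereas the real patch goes through the CcsidManager startEncoding()/encode() wrappers; the names here are illustrative.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

public class GrowingEncodeSketch {

    // Encode a string into a ByteBuffer, doubling the buffer whenever the
    // encoder reports OVERFLOW, so the string is only encoded once.
    static ByteBuffer encodeWithGrowth(String s) {
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
        CharBuffer src = CharBuffer.wrap(s);
        ByteBuffer dst = ByteBuffer.allocate(8); // deliberately small start
        while (true) {
            CoderResult result = encoder.encode(src, dst, true);
            if (result.isUnderflow()) {
                // All input consumed. UTF-8 keeps no trailing state, but
                // flush() is part of the encoder protocol.
                encoder.flush(dst);
                dst.flip();
                return dst;
            }
            if (result.isOverflow()) {
                // Out of space: allocate a larger buffer, copy what was
                // already encoded, and continue where we left off.
                ByteBuffer bigger = ByteBuffer.allocate(dst.capacity() * 2);
                dst.flip();
                bigger.put(dst);
                dst = bigger;
            } else {
                throw new IllegalArgumentException("cannot encode: " + result);
            }
        }
    }

    public static void main(String[] args) {
        ByteBuffer b = encodeWithGrowth("hello, world");
        System.out.println(b.remaining()); // 12
    }
}
```

          Because the encoder remembers its position in the source CharBuffer across calls, the continuation after a grow step re-encodes nothing.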

          The one place where we still need to know the byte length up front is
          in NetPackageRequest.buildCommonPKGNAMinfo(). That's because the
          format of the message depends on whether or not the string length
          exceeds a certain threshold. The method now creates a byte array
          representation of the string once, and uses that array both to find
          the byte length and to copy the encoded version of the string into the
          buffer.

          I've rerun the sr_select load client, with 10 threads, to see how this
          new patch performs. I used JDK 6u24 on Solaris 10, and collected the
          CPU usage in the client driver by using the /bin/time command. I ran
          each configuration twice, 10 minutes each. Here's the CPU time per
          transaction seen with various versions/patches:

          10.6.2.1 (plain): 62.4 µs/tx
          10.8.1.2 (plain): 67.0 µs/tx
          trunk + d5068-1a.diff: 63.4 µs/tx
          trunk + d5210-1a.diff: 67.9 µs/tx
          trunk + d5210-1a.diff + d5068-2a.diff: 65.2 µs/tx

          So, in short: None of the patches bring the CPU usage all the way down
          to the 10.6.2.1 level. The 1a patch attached to this issue (the one
          that does the UTF-8 encoding manually) is close, though.

          The 2a patch doesn't perform quite as well as the 1a patch, but still
          better than 10.8.1.2. The advantage is that it hides the details on
          how the encoding is done. Also, by using the standard class library
          interface, we may benefit from improvements that are made to the class
          library implementation in the future.

          I guess I'm leaning towards the approach in the 2a patch. The
          performance difference isn't that big anyway (I've only been able to
          see impact on CPU usage, never on the transaction rate), so it doesn't
          seem worthwhile to duplicate functionality provided by the standard
          libraries.

          Knut Anders Hatlen made changes -
          Attachment d5068-2a.diff [ 12478695 ]
          Attachment d5068-2a.stat [ 12478696 ]
          Knut Anders Hatlen made changes -
          Issue & fix info [Patch Available]
          Dag H. Wanvik added a comment - - edited

          I think 2a probably strikes the right compromise here: we get a slight performance improvement in addition to a good code cleanup. +1

          One question about Utf8CcsidManager#encode: Is there no need to check
          for malformed-input and unmappable-character errors in the returned
          CoderResult? The code currently presumes the result is always
          OVERFLOW. (I'm just reading up on the API details here.) Perhaps we
          know those errors can't happen here; if so, a comment might be good
          for the naive reader.

          Knut Anders Hatlen added a comment -

          Thanks for looking at the patch, Dag. I'm still learning the API myself.

          You're probably right that we should handle those conditions. I'm not sure how unmappable-character errors can happen with UTF-8, but malformed-input errors seem to be raised for characters in the range \uD800 to \uDFFF.

          We have two alternatives:

          1) Make the CharsetEncoder replace problematic characters with '?' instead of reporting an error. (By calling onMalformedInput() and onUnmappableCharacter() with CodingErrorAction.REPLACE.)

          2) Detect and report the conditions. (By checking the CoderResult and raising an exception.)

          Option 2 sounds like the right thing to do. However, the original code used String.getBytes(String) to do the encoding, which implements option 1 (the API javadoc says that it's unspecified what it does when it cannot encode the string, but its actual behaviour matches option 1). Also, we still have the convertFromJavaString(String,Agent) method which matches option 1.

          On the other hand, all the encoding methods in EbcdicCcsidManager do raise an exception if the string contains characters outside the EBCDIC range, so there's no clear precedent. I guess no matter what we choose to do, we should make all these methods consistent. I think my preference would be option 2.
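          The two alternatives map directly onto java.nio.charset.CodingErrorAction. A small sketch (class and method names are illustrative) showing the difference on a string with an unpaired surrogate, which is exactly the malformed-input case mentioned above:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class ErrorActionSketch {

    // Encode with the given error action. REPLACE (option 1) silently
    // substitutes '?'; REPORT (option 2) surfaces the condition so the
    // caller can raise an exception.
    static String encodeWith(CodingErrorAction action, String s) {
        CharsetEncoder enc = StandardCharsets.UTF_8.newEncoder()
                .onMalformedInput(action)
                .onUnmappableCharacter(action);
        ByteBuffer out = ByteBuffer.allocate(s.length() * 4 + 4);
        CoderResult result = enc.encode(CharBuffer.wrap(s), out, true);
        if (result.isError()) {
            return "error: " + result;   // option 2: report to the caller
        }
        out.flip();
        return "encoded " + out.remaining() + " bytes";
    }

    public static void main(String[] args) {
        String lone = "a\uD800b"; // unpaired high surrogate
        System.out.println(encodeWith(CodingErrorAction.REPLACE, lone)); // encoded 3 bytes
        System.out.println(encodeWith(CodingErrorAction.REPORT, lone));  // error: ...
    }
}
```

          With REPLACE the unpaired surrogate simply becomes '?' in the output; with REPORT the three-argument encode() returns an error CoderResult instead of throwing, so the caller decides what exception to raise.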

          Dag H. Wanvik added a comment -

          Option 2 sounds right to me.

          Knut Anders Hatlen added a comment -

          Attaching a new patch (2b) with the following changes from the 2a patch:

          • Added test case to Utf8CcsidManagerClientTest to verify that we don't silently ignore unmappable characters.
          • Changed Utf8CcsidManager.encode() to raise exception when unmappable character or malformed input is detected.
          • Made Utf8CcsidManager.convertFromJavaString() use CharsetEncoder.encode(CharBuffer) instead of String.getBytes(String) so that unmappable characters/malformed input aren't silently ignored.

          All the regression tests ran cleanly with the patch.

          Knut Anders Hatlen made changes -
          Attachment d5068-2b.diff [ 12479313 ]
          Knut Anders Hatlen added a comment -

          Committed revision 1125299.

          Knut Anders Hatlen made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Issue & fix info [Patch Available]
          Assignee Knut Anders Hatlen [ knutanders ]
          Fix Version/s 10.9.0.0 [ 12316344 ]
          Resolution Fixed [ 1 ]
          Knut Anders Hatlen made changes -
          Labels derby_backport_reject_10_8
          Knut Anders Hatlen made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Transitions:
          • Open → Resolved: 84d 14h 24m (Knut Anders Hatlen, 20/May/11 10:50)
          • Resolved → Closed: 40d 23h 31m (Knut Anders Hatlen, 30/Jun/11 10:21)

            People

            • Assignee:
              Knut Anders Hatlen
              Reporter:
              Knut Anders Hatlen
            • Votes: 0
              Watchers: 1
