Details

    • Type: Sub-task Sub-task
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 10.7.1.1
    • Fix Version/s: 10.7.1.1
    • Component/s: Network Client
    • Labels:
      None

      Description

      This issue is DERBY-4746's counterpart for the client changes required to implement UTF8 support in DRDA.

      1. DERBY-4757_cleanup.diff
        3 kB
        Tiago R. Espinha
      2. DERBY-4757_p3.diff
        24 kB
        Tiago R. Espinha
      3. DERBY-4757_p3.diff
        26 kB
        Tiago R. Espinha
      4. DERBY-4757_p3.diff
        21 kB
        Tiago R. Espinha
      5. DERBY-4757_p3.diff
        17 kB
        Tiago R. Espinha
      6. DERBY-4757_p2.diff
        17 kB
        Tiago R. Espinha
      7. DERBY-4757_donotcommit.diff
        24 kB
        Tiago R. Espinha
      8. DERBY-4757_p1.diff
        32 kB
        Tiago R. Espinha

        Issue Links

          Activity

          Hide
          Tiago R. Espinha added a comment -

          I'm attaching a patch that lays the foundation for the UTF-8 support in the client. The patch may seem massive but all the changes are actually related. Here's an overview of what it does:

          • Changes the method names for the CcsidManager abstract class so that they are clearer in terms of what they do (convertToJavaString, convertFromJavaString) similar to what's been done for the server.

          Still on the CcsidManager changes, the patch refactors the code so that the new method names are used. It also introduces a new name to the CcsidManager that returns, for the given CcsidManager, the length of a String in bytes (which differs for the different managers).

          • Creates the Utf8CcsidManager.java for the client and the respective Utf8CcsidManagerClientTest.java. I wasn't sure what to name this test since there is already a test for the server, so I decided to include "Client" in its name. If there is a preferred alternative, I can change it. I have also hooked this test into the suite.
          • Changes NetAgent.java so that it no longer has a source and target CCSID managers. Also, similarly to the server, it now has an instance of each type of CCSID (EBCDIC and UTF-8) and a current CCSID which is the one currently in use.

          In NetAgent.java, I've also set the CCSID managers to private and created a getter for the current manager to enforce encapsulation.

          In sum, this patch does not yet enable UTF-8 support but puts everything in place so that the switch mechanism can be implemented.

          I will now be running regressions.

          Show
          Tiago R. Espinha added a comment - I'm attaching a patch that lays the foundation for the UTF-8 support in the client. The patch may seem massive but all the changes are actually related. Here's an overview of what it does: Changes the method names for the CcsidManager abstract class so that they are clearer in terms of what they do (convertToJavaString, convertFromJavaString) similar to what's been done for the server. Still on the CcsidManager changes, the patch refactors the code so that the new method names are used. It also introduces a new name to the CcsidManager that returns, for the given CcsidManager, the length of a String in bytes (which differs for the different managers). Creates the Utf8CcsidManager.java for the client and the respective Utf8CcsidManagerClientTest.java. I wasn't sure what to name this test since there is already a test for the server, so I decided to include "Client" in its name. If there is a preferred alternative, I can change it. I have also hooked this test into the suite. Changes NetAgent.java so that it no longer has a source and target CCSID managers. Also, similarly to the server, it now has an instance of each type of CCSID (EBCDIC and UTF-8) and a current CCSID which is the one currently in use. In NetAgent.java, I've also set the CCSID managers to private and created a getter for the current manager to enforce encapsulation. In sum, this patch does not yet enable UTF-8 support but puts everything in place so that the switch mechanism can be implemented. I will now be running regressions.
          Hide
          Tiago R. Espinha added a comment -

          The regressions have passed without any failures. However, please don't commit yet until we work out whether yesterday's regression failures have anything to do with DERBY-4746.

          Show
          Tiago R. Espinha added a comment - The regressions have passed without any failures. However, please don't commit yet until we work out whether yesterday's regression failures have anything to do with DERBY-4746 .
          Hide
          Tiago R. Espinha added a comment -

          The regressions failure was a false alarm. The latest regressions have ran without any failures. Marking as patch available for review.

          Show
          Tiago R. Espinha added a comment - The regressions failure was a false alarm. The latest regressions have ran without any failures. Marking as patch available for review.
          Hide
          Bryan Pendleton added a comment -

          I read through the patch and don't have any improvements to suggest.

          The patch builds cleanly in my environment and the tests run successfully.

          Tiago, do you have any additional work planned for this patch?

          I am intending to commit this patch, please let me know if there's any reason I should hold off.

          Show
          Bryan Pendleton added a comment - I read through the patch and don't have any improvements to suggest. The patch builds cleanly in my environment and the tests run successfully. Tiago, do you have any additional work planned for this patch? I am intending to commit this patch, please let me know if there's any reason I should hold off.
          Hide
          Bryan Pendleton added a comment -

          Committed to the trunk as revision 982852. Thanks for the contribution!

          Show
          Bryan Pendleton added a comment - Committed to the trunk as revision 982852. Thanks for the contribution!
          Hide
          Tiago R. Espinha added a comment -

          Thanks for committing Bryan!

          This patch was indeed meant to be commited. I will be doing more work on this issue but I will be submitting additional patches for this.

          Show
          Tiago R. Espinha added a comment - Thanks for committing Bryan! This patch was indeed meant to be commited. I will be doing more work on this issue but I will be submitting additional patches for this.
          Hide
          Tiago R. Espinha added a comment -

          I'm submitting this patch with my latest changes for the UTF8 support on the client. This patch is marked as DO NOT COMMIT as it has debug code and it has serious bugs that still need to be fixed. I'm submitting it so that I can wrap up my participation in GSoC 2010 which has its end today.

          I will still continue to contribute to Derby and I expect to iron out these issues during the course of this week.

          Show
          Tiago R. Espinha added a comment - I'm submitting this patch with my latest changes for the UTF8 support on the client. This patch is marked as DO NOT COMMIT as it has debug code and it has serious bugs that still need to be fixed. I'm submitting it so that I can wrap up my participation in GSoC 2010 which has its end today. I will still continue to contribute to Derby and I expect to iron out these issues during the course of this week.
          Hide
          Tiago R. Espinha added a comment -

          This is another auxiliary patch to lay the foundation for the UTF-8 support. It removes all the CcsidManager instances scattered all over the code and ensures that all the classes use the NetAgent.getCurrentCcsidManager() instead. Having several instances of it in different classes meant we would have to update each and every one of them each time we'd switch between the two managers and that wouldn't be a very good solution in the end.

          This way there's only one instance for each type of CCSID manager and there's a reference for the current CCSID manager being used, which essentially points to one of the two instances.

          This is just half of the changes I already have in place and as soon as this is committed I will submit the next patch. For now I will be running regressions on this patch alone and I will post the results here.

          Show
          Tiago R. Espinha added a comment - This is another auxiliary patch to lay the foundation for the UTF-8 support. It removes all the CcsidManager instances scattered all over the code and ensures that all the classes use the NetAgent.getCurrentCcsidManager() instead. Having several instances of it in different classes meant we would have to update each and every one of them each time we'd switch between the two managers and that wouldn't be a very good solution in the end. This way there's only one instance for each type of CCSID manager and there's a reference for the current CCSID manager being used, which essentially points to one of the two instances. This is just half of the changes I already have in place and as soon as this is committed I will submit the next patch. For now I will be running regressions on this patch alone and I will post the results here.
          Hide
          Tiago R. Espinha added a comment -

          I can confirm the regressions ran without failures for the latest patch. I'm marking as patch available, for review and commit.

          Show
          Tiago R. Espinha added a comment - I can confirm the regressions ran without failures for the latest patch. I'm marking as patch available, for review and commit.
          Hide
          Tiago R. Espinha added a comment - - edited

          We've hit a few regressions on Daily 987777. I'm not sure whether these are related to my changes but I will keep an eye out to see if we hit them again on tomorrow's run.

          They seem to be VM/OS specific for the most part so it could be just intermittent occurrences.

          Show
          Tiago R. Espinha added a comment - - edited We've hit a few regressions on Daily 987777. I'm not sure whether these are related to my changes but I will keep an eye out to see if we hit them again on tomorrow's run. They seem to be VM/OS specific for the most part so it could be just intermittent occurrences.
          Hide
          Tiago R. Espinha added a comment -

          It seems like the regressions weren't related, unmarking as patch available. Thanks for committing Kathey!

          Show
          Tiago R. Espinha added a comment - It seems like the regressions weren't related, unmarking as patch available. Thanks for committing Kathey!
          Hide
          Tiago R. Espinha added a comment -

          I'm uploading what would be the final patch for this issue. Here's a quick overview of what it does:

          • Adds methods to NetAgent that allow for a switch between the current CCSID manager being used.
          • Defines the codepoint for UNICODEMGR as per the ACR.
          • Defines a target level for the UnicodeMgr. This is the target that ideally we want to achieve (1208) but might have to fall back to 0 in case the server does not support it (old servers).
          • Adds a method to NetConnection that returns true or false depending on whether the server supports the UnicodeMgr at level 1208.
          • Writes the UNICODEMGR codepoint and reads it back from the server for the MGRLVLS negotiation.

          This patch also enables the InternationalConnectTest to run on client/server mode as opposed to before, where it ran only on embedded mode and expected an exception in case it was running on client/server.

          It needs to be mentioned that at this point, if the client falls back to level 0 on the UNICODEMGR (this happens if the server does not support UTF-8, i.e. any Derby versions lower than 10.7) then we will not send the RDBNAM at the ACCSEC. This will cause the server to throw a SYNTAXRM since it expected the RDBNAM and the client will, by consequence, also throw a SYNTAXRM. This crash is supposed to happen as the older servers just don't support UTF-8 characters. What is left for improvement is the exception thrown by the client, which can be a little more explicit and explain what is happening. After discussing this issue with Kathey on IRC however, we agreed that we could go ahead as it is to ensure this makes it to the 10.7 release and then at a later stage we can improve the exception.

          I'll be running regressions tonight. Please review my patch and if there are no objections I'd like to go ahead and commit this soon.

          Show
          Tiago R. Espinha added a comment - I'm uploading what would be the final patch for this issue. Here's a quick overview of what it does: Adds methods to NetAgent that allow for a switch between the current CCSID manager being used. Defines the codepoint for UNICODEMGR as per the ACR. Defines a target level for the UnicodeMgr. This is the target that ideally we want to achieve (1208) but might have to fall back to 0 in case the server does not support it (old servers). Adds a method to NetConnection that returns true or false depending on whether the server supports the UnicodeMgr at level 1208. Writes the UNICODEMGR codepoint and reads it back from the server for the MGRLVLS negotiation. This patch also enables the InternationalConnectTest to run on client/server mode as opposed to before, where it ran only on embedded mode and expected an exception in case it was running on client/server. It needs to be mentioned that at this point, if the client falls back to level 0 on the UNICODEMGR (this happens if the server does not support UTF-8, i.e. any Derby versions lower than 10.7) then we will not send the RDBNAM at the ACCSEC. This will cause the server to throw a SYNTAXRM since it expected the RDBNAM and the client will, by consequence, also throw a SYNTAXRM. This crash is supposed to happen as the older servers just don't support UTF-8 characters. What is left for improvement is the exception thrown by the client, which can be a little more explicit and explain what is happening. After discussing this issue with Kathey on IRC however, we agreed that we could go ahead as it is to ensure this makes it to the 10.7 release and then at a later stage we can improve the exception. I'll be running regressions tonight. Please review my patch and if there are no objections I'd like to go ahead and commit this soon.
          Hide
          Bryan Pendleton added a comment -

          The change looks clean to me, thanks very much for the detailed description!

          The only part of the patch that I didn't understand was this:

          @@ -355,7 +359,7 @@
          // support. the size will have ben previously checked so at this point just
          // write the data and pad with the correct number of bytes as needed.
          // this instance variable is always required.

          • buildRDBNAM(rdbnam,false);
            + buildRDBNAM(rdbnam,true);

          // the rdb access manager class specifies an instance of the SQLAM
          // that accesses the RDB. the sqlam manager class codepoint

          Can you clarify the purpose of this change?

          Thanks!

          Show
          Bryan Pendleton added a comment - The change looks clean to me, thanks very much for the detailed description! The only part of the patch that I didn't understand was this: @@ -355,7 +359,7 @@ // support. the size will have ben previously checked so at this point just // write the data and pad with the correct number of bytes as needed. // this instance variable is always required. buildRDBNAM(rdbnam,false); + buildRDBNAM(rdbnam,true); // the rdb access manager class specifies an instance of the SQLAM // that accesses the RDB. the sqlam manager class codepoint Can you clarify the purpose of this change? Thanks!
          Hide
          Tiago R. Espinha added a comment -

          I can. Maybe I should have put another comment there.

          That change isn't actually necessary per se, the only thing is, by setting that argument to true, in case the client can't convert the string using the current CCSID manager, then it won't try to send it to the server. What the code inside that function does is, it tries to convert the string and if we get an exception, then we catch it and add it to the NetAgent. At a later point we can choose to do something wise with that exception.

          If the argument is set to false, it will effectively try to send the RDBNAM to the server and it will still throw an exception but we don't catch it. We just let it break as the client will still not be able to encode the string.

          So this is more of just making sure that we don't even try to send an RDBNAM at ACCSEC if we can't convert it. This is fine for newer clients, as they don't expect RDBNAM at ACCSEC to be required and older clients will break, but then again, if we can't send RDBNAM at ACCSEC to an older client, it means it has UTF-8 characters and then it is supposed to break. If the RDBNAM does not have UTF-8 characters, then the encoding process will go fine and old and new clients will work perfectly.

          Show
          Tiago R. Espinha added a comment - I can. Maybe I should have put another comment there. That change isn't actually necessary per se, the only thing is, by setting that argument to true, in case the client can't convert the string using the current CCSID manager, then it won't try to send it to the server. What the code inside that function does is, it tries to convert the string and if we get an exception, then we catch it and add it to the NetAgent. At a later point we can choose to do something wise with that exception. If the argument is set to false, it will effectively try to send the RDBNAM to the server and it will still throw an exception but we don't catch it. We just let it break as the client will still not be able to encode the string. So this is more of just making sure that we don't even try to send an RDBNAM at ACCSEC if we can't convert it. This is fine for newer clients, as they don't expect RDBNAM at ACCSEC to be required and older clients will break, but then again, if we can't send RDBNAM at ACCSEC to an older client, it means it has UTF-8 characters and then it is supposed to break. If the RDBNAM does not have UTF-8 characters, then the encoding process will go fine and old and new clients will work perfectly.
          Hide
          Tiago R. Espinha added a comment -

          I've hit a few regressions so this patch isn't ready to commit. One of the failing fixtures is actually testGreekCharacters which expected a failure to connect and now it accepts these characters.

          Show
          Tiago R. Espinha added a comment - I've hit a few regressions so this patch isn't ready to commit. One of the failing fixtures is actually testGreekCharacters which expected a failure to connect and now it accepts these characters.
          Hide
          Bryan Pendleton added a comment -

          > making sure that we don't even try to send an RDBNAM at ACCSEC if we can't convert it.

          Thanks for the clear and precise explanation. Your technique seems appropriate to me.

          Regarding whether extra comments are needed in the code: I have found it sufficient in
          the past, when studying a change to the code, to be able to trace the change back to a
          Jira entry and then read the comments in the Jira in detail. So, in this case, i would say
          that your explanation in the Jira entry is very helpful and clear, and I don't think that any
          additional change to the patch is necessary.

          Show
          Bryan Pendleton added a comment - > making sure that we don't even try to send an RDBNAM at ACCSEC if we can't convert it. Thanks for the clear and precise explanation. Your technique seems appropriate to me. Regarding whether extra comments are needed in the code: I have found it sufficient in the past, when studying a change to the code, to be able to trace the change back to a Jira entry and then read the comments in the Jira in detail. So, in this case, i would say that your explanation in the Jira entry is very helpful and clear, and I don't think that any additional change to the patch is necessary.
          Hide
          Tiago R. Espinha added a comment -

          Thank you for your comments Bryan.

          With the latest patch I've fixed both the regressions I was hitting. One of them had to do with the jdbcapi/AuthenticationTest.java where we expected the connection to fail whenever Greek characters were used. Obviously, we do now support Greek characters and as such the test should pass.

          The second regression was with something I overlooked. I forgot to reset the CCSID manager back to EBCDIC for new sessions, which meant that if a client was to negotiate UTF-8, all the subsequent connections would be using UTF-8. This would fail as all connections must start as EBCDIC for the initial negotiation.

          Logically, this two-line fix should be part of DERBY-4746 that deals with the server but seeing as these two issues go hand in hand, I think sneaking in this fix here should be fine. The server implementation in DERBY-4746 stands for itself, but when the client kicks in, we need this additional fix.

          I'll be running regressions and if there are no failures and no one objects, I'd like to go ahead and check this in so that we can have it on time for a 10.7 release.

          Show
          Tiago R. Espinha added a comment - Thank you for your comments Bryan. With the latest patch I've fixed both the regressions I was hitting. One of them had to do with the jdbcapi/AuthenticationTest.java where we expected the connection to fail whenever Greek characters were used. Obviously, we do now support Greek characters and as such the test should pass. The second regression was with something I overlooked. I forgot to reset the CCSID manager back to EBCDIC for new sessions, which meant that if a client was to negotiate UTF-8, all the subsequent connections would be using UTF-8. This would fail as all connections must start as EBCDIC for the initial negotiation. Logically, this two-line fix should be part of DERBY-4746 that deals with the server but seeing as these two issues go hand in hand, I think sneaking in this fix here should be fine. The server implementation in DERBY-4746 stands for itself, but when the client kicks in, we need this additional fix. I'll be running regressions and if there are no failures and no one objects, I'd like to go ahead and check this in so that we can have it on time for a 10.7 release.
          Hide
          Bryan Pendleton added a comment -

          I read through your changes and they look good to me.

          +1 to commit the _v3 patch, once you're comfortable with your test results.

          If you'd prefer to submit the DRDAConnThread.java change separately,
          and indicate in the subversion log description that it is part of DERBY-4746,
          that seems fine to me.

          I think it's also fine to submit that change as part of this work. As you note,
          the various changes are intertwined, and I don't foresee anyone wishing
          to use only the server changes, or only the client changes, so I think it's
          fine to include that change here.

          Thanks for all the effort and hard work getting these changes in shape for
          the upcoming release!

          Show
          Bryan Pendleton added a comment - I read through your changes and they look good to me. +1 to commit the _v3 patch, once you're comfortable with your test results. If you'd prefer to submit the DRDAConnThread.java change separately, and indicate in the subversion log description that it is part of DERBY-4746 , that seems fine to me. I think it's also fine to submit that change as part of this work. As you note, the various changes are intertwined, and I don't foresee anyone wishing to use only the server changes, or only the client changes, so I think it's fine to include that change here. Thanks for all the effort and hard work getting these changes in shape for the upcoming release!
          Hide
          Tiago R. Espinha added a comment -

          I'm delaying this patch a little bit more. This is because I just realized that the server is actually lacking some changes that are crucial for the implementation. Namely, the write*String methods in DDMWriter are still using .length() calls to determine the length of strings and these invocations need to be piped through the current CCSID manager. I was certain I had done this but I either forgot about it or I lost the changes in one of the patches...

          In light of this I'll just bundle these changes with the other change I had made to the server and I'll submit an individual patch for it.

          After this I still want to add a couple of fixtures to the InternationalConnectTest to do some boundary testing with international characters.

          Kathey also suggested that we'd throw in a switch (via a property) to disable the UTF-8 CCSID manager so that if we hit issues in production environments, we can disable it and minimize the damage. In the end we agreed this would probably not be necessary but I thought I'd throw it out there anyway to get some thoughts.

          Show
          Tiago R. Espinha added a comment - I'm delaying this patch a little bit more. This is because I just realized that the server is actually lacking some changes that are crucial for the implementation. Namely, the write*String methods in DDMWriter are still using .length() calls to determine the length of strings and these invocations need to be piped through the current CCSID manager. I was certain I had done this but I either forgot about it or I lost the changes in one of the patches... In light of this I'll just bundle these changes with the other change I had made to the server and I'll submit an individual patch for it. After this I still want to add a couple of fixtures to the InternationalConnectTest to do some boundary testing with international characters. Kathey also suggested that we'd throw in a switch (via a property) to disable the UTF-8 CCSID manager so that if we hit issues in production environments, we can disable it and minimize the damage. In the end we agreed this would probably not be necessary but I thought I'd throw it out there anyway to get some thoughts.
          Hide
          Knut Anders Hatlen added a comment -

          I came across this method in Utf8CcidManager (client) added in one of the earlier patches:

          public char convertToJavaChar(byte sourceByte)

          { /* 1 byte = 0 to 255 which is the same in UTF-8 and ASCII */ return (char)sourceByte; }

          I'm not sure how it's supposed to be used, since it is currently only called from one of the tests, but it doesn't look quite right. The problem is that byte is a signed type, so for non-ascii characters in the range 0x80-0xff the corresponding byte values are negative. When casting that negative value to a char, we won't end up with a char in the 0x80-0xff range, but rather with a char in the 0xff80-0xffff range.

          To expose this problem, add these lines to Utf8CcsidManagerClientTest.testConvertToJavaChar()

          b = (byte) 0xe5;
          assertEquals('\u00e5', ccsidManager.convertToJavaChar(b));

          and see the test fail with

          AssertionFailedError: expected:<å> but was:<¥>

          Show
          Knut Anders Hatlen added a comment - I came across this method in Utf8CcidManager (client) added in one of the earlier patches: public char convertToJavaChar(byte sourceByte) { /* 1 byte = 0 to 255 which is the same in UTF-8 and ASCII */ return (char)sourceByte; } I'm not sure how it's supposed to be used, since it is currently only called from one of the tests, but it doesn't look quite right. The problem is that byte is a signed type, so for non-ascii characters in the range 0x80-0xff the corresponding byte values are negative. When casting that negative value to a char, we won't end up with a char in the 0x80-0xff range, but rather with a char in the 0xff80-0xffff range. To expose this problem, add these lines to Utf8CcsidManagerClientTest.testConvertToJavaChar() b = (byte) 0xe5; assertEquals('\u00e5', ccsidManager.convertToJavaChar(b)); and see the test fail with AssertionFailedError: expected:<å> but was:<¥>
          Hide
          Tiago R. Espinha added a comment -

          I'm attaching the final version of the _3 patch. It contains mostly cosmetic changes from the previous version and the additional fixture to the InternationalConnectTest to test the boundaries of the new CCSID manager.

          I've already ran suites.All and derbyall without any failures but I still want to do some mixed client/server testing before committing. In the meanwhile maybe someone could review the patch?

          Knut, you're right. I just included this method definition as this was part of its superclass, CcsidManager. I'm going to have a look at it again and if it really isn't invoked anywhere, I'll submit a separate patch that removes this method and the test fixture that uses it. If it's not used anywhere, it seems pointless to have dead code laying around.

          Apologies for the delays in the patches but I've been travelling lately and as such, I've been away from regular computer access. Still, I wanted to commit this patch before the RC for 10.7 is issued as I think it would be a nice feature to have as part of the new version.

          This would be the last patch that effectively fits everything in place for the new CCSID manager to be used.

          Show
          Tiago R. Espinha added a comment - I'm attaching the final version of the _3 patch. It contains mostly cosmetic changes from the previous version and the additional fixture to the InternationalConnectTest to test the boundaries of the new CCSID manager. I've already ran suites.All and derbyall without any failures but I still want to do some mixed client/server testing before committing. In the meanwhile maybe someone could review the patch? Knut, you're right. I just included this method definition as this was part of its superclass, CcsidManager. I'm going to have a look at it again and if it really isn't invoked anywhere, I'll submit a separate patch that removes this method and the test fixture that uses it. If it's not used anywhere, it seems pointless to have dead code laying around. Apologies for the delays in the patches but I've been travelling lately and as such, I've been away from regular computer access. Still, I wanted to commit this patch before the RC for 10.7 is issued as I think it would be a nice feature to have as part of the new version. This would be the last patch that effectively fits everything in place for the new CCSID manager to be used.
          Hide
          Tiago R. Espinha added a comment -

          Submitting the patch again as I made a mistake with the patch name.

          Show
          Tiago R. Espinha added a comment - Submitting the patch again as I made a mistake with the patch name.
          Hide
          Knut Anders Hatlen added a comment -

          Hi Tiago,

          I don't think the server-side changes (DDMWriter) in the latest patch are correct. writeString() and writeLDString() are not used for writing DRDA commands, so they don't go through the ccsid manager. These strings are encoded in UTF-8 also in those versions of Derby that encode DRDA commands in EBCDIC. Since these methods now use the ccsid manager for getting the byte length of the string, although the string is always encoded in UTF-8, the length may be wrong when talking to older clients.

          For example, I see this when I connect with a 10.6.2.1 client to a server running trunk with the patch:

          ij version 10.6
          ij> connect 'jdbc:derby://localhost/db;create=true';
          ij> values 'æøå';
          1

          æ

          1 row selected
          ij> values 'vêrhane';
          1
          -------
          vêrhan

          1 row selected

          Notice how the returned strings are truncated because the EBCDIC ccsid manager returns a shorter byte length than what the actual UTF-8 encoding results in.

          Show
          Knut Anders Hatlen added a comment - Hi Tiago, I don't think the server-side changes (DDMWriter) in the latest patch are correct. writeString() and writeLDString() are not used for writing DRDA commands, so they don't go through the ccsid manager. These strings are encoded in UTF-8 also in those versions of Derby that encode DRDA commands in EBCDIC. Since these methods now use the ccsid manager for getting the byte length of the string, although the string is always encoded in UTF-8, the length may be wrong when talking to older clients. For example, I see this when I connect with a 10.6.2.1 client to a server running trunk with the patch: ij version 10.6 ij> connect 'jdbc:derby://localhost/db;create=true'; ij> values 'æøå'; 1 — æ 1 row selected ij> values 'vêrhane'; 1 ------- vêrhan 1 row selected Notice how the returned strings are truncated because the EBCDIC ccsid manager returns a shorter byte length than what the actual UTF-8 encoding results in.
          Hide
          Tiago R. Espinha added a comment -

          You're right Knut, those methods are just for DRDA arguments. I think I've hit some issues before by leaving those unchanged though, but they definitely shouldn't be handled by the CCSID manager. I'll have a look and submit another patch soon.

          Show
          Tiago R. Espinha added a comment - You're right Knut, those methods are just for DRDA arguments. I think I've hit some issues before by leaving those unchanged though, but they definitely shouldn't be handled by the CCSID manager. I'll have a look and submit another patch soon.
          Hide
          Tiago R. Espinha added a comment -

          This patch removes the methods Knut mentioned in the previous comment. As mentioned before, these methods aren't used for DRDA commands but for the arguments of the DRDA instructions, for this reason they shouldn't be handled by the CCSID manager.

          I've already ran suites.all and derbyall on this patch with no failures.

          Show
          Tiago R. Espinha added a comment - This patch removes the methods Knut mentioned in the previous comment. As mentioned before, these methods aren't used for DRDA commands but for the arguments of the DRDA instructions, for this reason they shouldn't be handled by the CCSID manager. I've already ran suites.all and derbyall on this patch with no failures.
          Hide
          Knut Anders Hatlen added a comment -

          Thanks, Tiago. The latest patch looks fine to me.

          Show
          Knut Anders Hatlen added a comment - Thanks, Tiago. The latest patch looks fine to me.
          Hide
          Tiago R. Espinha added a comment -

          Great! I've done some additional mixed client/server testing and here are the results. From what I can see, no regressions are introduced with older clients or servers as the support for Latin characters was actually broken. We can now use older clients on the 10.7 server and while this connection will use EBCDIC (older clients won't have UTF-8 support), it will now support the extra-ASCII characters featured in EBCDIC.

          I will leave it until 22:00 BST and unless someone objects until then, I will commit this patch.

          Results of testing:
          10.3.3 client - 10.7 server

          <= 255 chars (incl. Latin) - pass

          • would fail before with >3 Latin chars

          >= 256 chars (incl. Latin) - fail

          ###

          10.5.3 client - 10.7 server

          <= 255 chars (incl. Latin) - pass

          • would fail before with >3 Latin chars

          >= 256 chars (incl. Latin) - fail

          ###

          10.7 client - 10.7 server
          >= 122 chars (Latin only) - fail
          <= 121 chars (Latin only) - pass

          Reason: -> each char = 2 bytes, 122 * 2 = 244 + 12 single-byted chars (;create=true)
          == 256 bytes == 1 byte too many
          -> older clients don't experience this as everything is EBCDIC (1 byte/char)

          ###

          10.7 client - 10.3.3 server
          > 3 Latin only chars (mixed or not with ASCII) - fail

          • Used to fail before

          > 255 ASCII only chars (incl. ;create=true) - fail
          <= 255 ASCII only chars (incl. ;create=true) - pass

          10.7 client - 10.5.3 server

          • Same behaviour as 10.3.3 server
          Show
          Tiago R. Espinha added a comment - Great! I've done some additional mixed client/server testing and here are the results. From what I can see, no regressions are introduced with older clients or servers as the support for Latin characters was actually broken. We can now use older clients on the 10.7 server and while this connection will use EBCDIC (older clients won't have UTF-8 support), it will now support the extra-ASCII characters featured in EBCDIC. I will leave it until 22:00 BST and unless someone objects until then, I will commit this patch. Results of testing: 10.3.3 client - 10.7 server <= 255 chars (incl. Latin) - pass would fail before with >3 Latin chars >= 256 chars (incl. Latin) - fail ### 10.5.3 client - 10.7 server <= 255 chars (incl. Latin) - pass would fail before with >3 Latin chars >= 256 chars (incl. Latin) - fail ### 10.7 client - 10.7 server >= 122 chars (Latin only) - fail <= 121 chars (Latin only) - pass Reason: -> each char = 2 bytes, 122 * 2 = 244 + 12 single-byted chars (;create=true) == 256 bytes == 1 byte too many -> older clients don't experience this as everything is EBCDIC (1 byte/char) ### 10.7 client - 10.3.3 server > 3 Latin only chars (mixed or not with ASCII) - fail Used to fail before > 255 ASCII only chars (incl. ;create=true) - fail <= 255 ASCII only chars (incl. ;create=true) - pass 10.7 client - 10.5.3 server Same behaviour as 10.3.3 server
          Hide
          Tiago R. Espinha added a comment -

          Attaching a cleanup patch to then proceed with resolving the issue. This patch removes the convertToJavaChar() method as mentioned by Knut a few comments above. This is dead code that isn't used anywhere in the code.

          Show
          Tiago R. Espinha added a comment - Attaching a cleanup patch to then proceed with resolving the issue. This patch removes the convertToJavaChar() method as mentioned by Knut a few comments above. This is dead code that isn't used anywhere in the code.
          Hide
          Knut Anders Hatlen added a comment -

          The cleanup patch looks good. +1

          Show
          Knut Anders Hatlen added a comment - The cleanup patch looks good. +1
          Hide
          Tiago R. Espinha added a comment -

          Thank you Knut. I also ran suites.All just in case and got not failures.

          I've committed the patch and resolved the issue. Will wait for the regressions run to close it.

          Show
          Tiago R. Espinha added a comment - Thank you Knut. I also ran suites.All just in case and got not failures. I've committed the patch and resolved the issue. Will wait for the regressions run to close it.
          Hide
          Knut Anders Hatlen added a comment -

          [bulk update] Close all resolved issues that haven't been updated for more than one year.

          Show
          Knut Anders Hatlen added a comment - [bulk update] Close all resolved issues that haven't been updated for more than one year.

            People

            • Assignee:
              Tiago R. Espinha
              Reporter:
              Tiago R. Espinha
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development