Derby / DERBY-4827

Modify the documentation for the 10.7 release regarding the UTF-8 CCSID manager

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 10.7.1.1
    • Fix Version/s: 10.7.1.1
    • Component/s: Documentation
    • Labels:
      None

      Description

      With the introduction of UTF-8 support in the client driver (DERBY-728), the documentation regarding the length of the arguments (RDBNAM, USRID, etc) will become misleading.

      On the list, Kathey has identified one such spot [1]. Before releasing, we should try to find any other occurrences and fix them accordingly. Please note that UTF-8 is a variable-length encoding; since we are keeping the 255-byte length cap, the maximum length in characters will now vary.

      Regular ASCII characters still take 1 byte, accented Latin and other extended characters take 2 bytes, most Chinese characters take 3 bytes, and some special characters take 4 bytes [2].

      [1] - http://old.nabble.com/Database-name-length-tt29691419.html

      [2] - http://www.utf8-chartable.de/
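
As a rough illustration of the byte counts described above, the following sketch encodes one sample character from each group and reports its UTF-8 length. The class and method names are hypothetical and are not part of Derby; only the standard `String.getBytes(Charset)` call is used.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of the Derby code base.
final class Utf8Lengths {

    /** Returns the number of bytes the string occupies when encoded as UTF-8. */
    public static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("a"));            // ASCII letter
        System.out.println(utf8Length("\u00e9"));       // e with acute accent (U+00E9)
        System.out.println(utf8Length("\u4e2d"));       // CJK ideograph (U+4E2D)
        System.out.println(utf8Length("\uD83D\uDE00")); // emoji U+1F600, a surrogate pair in Java
    }
}
```

The sample characters are arbitrary picks from each range; any other character in the same range should give the same byte count.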

      1. DERBY-4827.diff
        1 kB
        Kim Haase
      2. cadminappsclient.html
        20 kB
        Kim Haase

          Activity

          Tiago R. Espinha added a comment -

          It seems we're done for now so closing the issue.

          Kim Haase added a comment -

          Resolving issue, since the immediately relevant documentation topic has been fixed.

          When DERBY-4805 is fixed, another doc issue can be filed to document the new size limit.

          Kim Haase added a comment -

          Thanks, Tiago, for the okay, and Knut, for the additional information.

          Committed patch DERBY-4827.diff to documentation trunk at revision 1004232.

          I'm not resolving the issue, since more changes are likely.

          Knut Anders Hatlen added a comment -

          I've linked this issue to DERBY-4805 since I believe the plan is to fix that in 10.7, and then the same text would need to be updated with the new maximum length (65531).

          Tiago R. Espinha added a comment -

          Thanks Kim, the patch looks good!

          Kim Haase added a comment -

          Attaching DERBY-4827.diff and cadminappsclient.html, the only topic so far identified as needing modification. Thanks in advance for looking it over.

          Kim Haase added a comment -

          Thanks so much, Tiago, that's very helpful.

          I can fix http://db.apache.org/derby/docs/dev/adminguide/cadminappsclient.html. However, that is the only topic in the doc set that mentions EBCDIC explicitly. So if, for example, any of the database attribute topics in the reference manual need fixing, it would be helpful to know that. Currently there is no mention of any character length limit for these.

          Tiago R. Espinha added a comment -

          Ah, and indeed, like you said, this only affects the client driver. We already support other encodings in the embedded one.

          Tiago R. Espinha added a comment -

          Hi Kim,

          Apologies for not having provided more info to begin with.

          This will probably have to be a continuous effort even after the 10.7 release as the references to RDBNAM and other fields (pretty much any DRDA command will be affected by this) aren't always explicit and it might take a while to find them all. However, now that I think about it, from a user's point of view it might indeed just be these three fields: database name, username and password.

          The URL you mentioned doesn't seem to have anything in need of change. We only need to change references to EBCDIC (to date it was the only encoding available for the database name, username and password - now we support UTF-8) and to the 255-byte length limitation which now doesn't always translate to 255 characters.

          Kathey found this reference that requires changing: http://db.apache.org/derby/docs/dev/adminguide/cadminappsclient.html

          Here, it reads:
          "For both driver and DataSource access, the database name (including path), user, password and other attribute values must consist of single-byte characters that can be converted to EBCDIC. The total byte length of the database name plus attributes when converted to EBCDIC must not exceed 255 bytes. You may be able to work around this restriction for long paths or paths that include multibyte characters by setting the derby.system.home system property when starting Network Server and accessing the database with a relative path that is shorter and does not include multibyte characters."

          This is wrong for the most part now. Those three attribute values can consist of any character that can be converted to UTF-8 and while the 255-byte limit still exists, perhaps it would be nice to mention that in UTF-8 this might not always translate to 255 characters (might be shorter).

          I'll try to find more references to EBCDIC in the documentation - anything mentioning EBCDIC will probably require some slight changes. If I find anything, I'll post it here.

          Thanks.
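
To make the point about bytes versus characters concrete, here is a minimal sketch of a length check against the 255-byte cap mentioned in this comment. The class, constant, and method names are hypothetical; this is not how the client driver itself validates values, and it checks a single string rather than the combined database name plus attributes.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of the 255-byte cap; not Derby's actual validation code.
final class DrdaLengthCheck {

    // Cap taken from the discussion in this issue (an assumption for this sketch).
    static final int MAX_BYTES = 255;

    /** True if the value fits within MAX_BYTES once encoded as UTF-8. */
    public static boolean fitsLimit(String value) {
        return value.getBytes(StandardCharsets.UTF_8).length <= MAX_BYTES;
    }

    /** Builds a string made of {@code count} copies of {@code unit}. */
    public static String repeat(String unit, int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            sb.append(unit);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 255 ASCII characters encode to 255 bytes.
        System.out.println(fitsLimit(repeat("a", 255)));
        // 86 three-byte characters encode to 258 bytes, although the string is only 86 characters long.
        System.out.println(fitsLimit(repeat("\u4e2d", 86)));
    }
}
```

With three-byte characters the same cap is reached after 85 characters, which is the "might be shorter" case described above.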

          Kim Haase added a comment -

          It's not clear to me from the information here or in DERBY-728 exactly what kinds of strings will have variable lengths. Kathey mentioned database names, so would http://db.apache.org/derby/docs/dev/ref/rrefattrib17246.html need to be changed? What about values of other properties? Paths? Apparently this applies only to the client driver, not the embedded driver?

          Currently, RDBNAM itself is mentioned only in an error message, and USRID is not mentioned anywhere. So there is no explicit "documentation regarding the length of the arguments (RDBNAM, USRID, etc)".

          It would be helpful to know what information in what manuals now needs fixing.

          Tiago R. Espinha added a comment -

          Linking to DERBY-728 as "is blocked by". This issue is dependent on DERBY-728 making it to the 10.7 release.


            People

            • Assignee:
              Kim Haase
              Reporter:
              Tiago R. Espinha