Commons Codec / CODEC-73

Make string2byte conversions independent of platform default encoding

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.3
    • Fix Version/s: 1.4
    • Labels:
      None
    • Environment:

      any platform whose default encoding is not a superset of ASCII, e.g. UTF-16 or EBCDIC

      Description

      Both the library itself and many of its tests are utterly dependent on the JVM's default charset. For example, DigestUtils calls String.getBytes() to convert an input string to a byte array, happily delivering different digests for the same input string if run on different platforms.
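
      The effect described above can be reproduced without the library. The following is a minimal sketch, not code from DigestUtils or from the attached patch: the class name CharsetDigestDemo and the helper md5 are hypothetical, and it assumes a modern JDK (Java 7 or later, for StandardCharsets). It digests the same string under two explicit charsets, which is what a platform-dependent String.getBytes() call does implicitly.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class CharsetDigestDemo {

    // Hypothetical helper: MD5 of a string, with the charset chosen explicitly
    // instead of being taken from the JVM default.
    public static byte[] md5(String data, Charset charset) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            return md.digest(data.getBytes(charset));
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String input = "abc";
        byte[] utf8 = md5(input, StandardCharsets.UTF_8);
        byte[] utf16 = md5(input, StandardCharsets.UTF_16);
        // The byte sequences fed to the digest differ, so the digests differ too.
        System.out.println(Arrays.equals(utf8, utf16)); // prints "false"
    }
}
```

      Passing the charset as a parameter (or fixing it to UTF-8) gives the same digest on every platform, whereas the no-argument getBytes() silently picks whatever file.encoding the JVM was started with.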

      If you want to try out the havoc yourself, just run the unit tests in a JVM with UTF-16, e.g. by adding the line

      <argLine>-Dfile.encoding=UTF-16</argLine>
      

      to the configuration of the Surefire Plugin in the POM.

      1. CODEC-73.patch
        140 kB
        Benjamin Bentmann
      2. Hex.patch
        5 kB
        Sebb

        Issue Links

          Activity

          Benjamin Bentmann created issue -
          Benjamin Bentmann made changes -
          Field Original Value New Value
          Attachment CODEC-73.patch [ 12385957 ]
          Sebb added a comment -

          Agreed that the default charset dependency needs to be removed.

          However, I have an alternative suggestion:

          In the case of the Hex encode() method, one could completely avoid the need to use getBytes() by using a byte[] array for the conversion.

          Likewise, I think the Hex decode() could just be performed on bytes rather than converting to char first.

          Patch to follow.
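
          One way to read the suggestion above is a hex codec that works on bytes end to end, so no String or char[] (and therefore no charset) is ever involved. The sketch below is an illustration under that assumption, not the contents of Hex.patch: the class name ByteHexSketch and the method names are hypothetical, and the hex digits are emitted as their ASCII byte values.

```java
import java.util.Arrays;

public class ByteHexSketch {

    // Lower-case hex digits stored as ASCII byte values; no charset lookup is needed.
    private static final byte[] DIGITS = {
        '0', '1', '2', '3', '4', '5', '6', '7',
        '8', '9', 'a', 'b', 'c', 'd', 'e', 'f'
    };

    // Encodes each input byte as two ASCII hex digits.
    public static byte[] encodeHex(byte[] data) {
        byte[] out = new byte[data.length * 2];
        for (int i = 0, j = 0; i < data.length; i++) {
            out[j++] = DIGITS[(data[i] & 0xF0) >>> 4]; // high nibble
            out[j++] = DIGITS[data[i] & 0x0F];         // low nibble
        }
        return out;
    }

    // Decodes pairs of ASCII hex digits (either case) back into bytes.
    public static byte[] decodeHex(byte[] hex) {
        if ((hex.length & 1) != 0) {
            throw new IllegalArgumentException("Odd number of hex digits");
        }
        byte[] out = new byte[hex.length / 2];
        for (int i = 0, j = 0; i < out.length; i++) {
            int hi = toDigit(hex[j++]);
            int lo = toDigit(hex[j++]);
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    // Maps one ASCII hex digit to its value 0..15, rejecting anything else.
    private static int toDigit(byte b) {
        if (b >= '0' && b <= '9') return b - '0';
        if (b >= 'a' && b <= 'f') return b - 'a' + 10;
        if (b >= 'A' && b <= 'F') return b - 'A' + 10;
        throw new IllegalArgumentException("Illegal hex digit: " + b);
    }

    public static void main(String[] args) {
        byte[] original = {0x00, 0x7f, (byte) 0xff};
        byte[] roundTrip = decodeHex(encodeHex(original));
        System.out.println(Arrays.equals(original, roundTrip)); // prints "true"
    }
}
```

          Because the output is defined directly in terms of byte values, the result is identical under any file.encoding setting. A caller that needs a String would still have to name a charset (US-ASCII suffices) when converting.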

          Sebb made changes -
          Attachment Hex.patch [ 12385974 ]
          Sebb made changes -
          Attachment Hex.patch [ 12385974 ]
          Sebb added a comment -

          Add byte[] conversion methods (with private toDigit(byte) method)

          Sebb made changes -
          Attachment Hex.patch [ 12385975 ]
          Sebb made changes -
          Comment [ Add byte[] conversion methods ]
          Henri Yandell made changes -
          Fix Version/s 1.4 [ 12311779 ]
          Gary Gregory committed 800270 (2 files)
          ggregory@seagullsw.com added a comment -

          Fixed DigestUtils and test case.

          Sebb made changes -
          Link This issue incorporates CODEC-85 [ CODEC-85 ]
          Gary Gregory committed 801396 (1 file)
          Reviews: none

          [CODEC-73] In-line comments on odd test results with some charsets on different JREs.

          Niall Pemberton committed 801709 (3 files)
          Reviews: none

          Update release notes for CODEC-73 and site changes for 1.4 release

          Niall Pemberton made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Niall Pemberton made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Mark Thomas made changes -
          Workflow jira [ 12435240 ] Default workflow, editable Closed status [ 12601635 ]

            People

            • Assignee:
              Unassigned
              Reporter:
              Benjamin Bentmann
            • Votes:
              0
              Watchers:
              0

              Dates

              • Created:
                Updated:
                Resolved:
