Commons Codec / CODEC-73

Make string2byte conversions independent of platform default encoding

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.3
    • Fix Version/s: 1.4
    • Labels:
      None
    • Environment:

      any platform whose default encoding is not a superset of ASCII, e.g. UTF-16 or EBCDIC

      Description

      Both the library itself and many of its tests are utterly dependent on the JVM's default charset. For example, DigestUtils calls String.getBytes() to convert an input string to a byte array, happily delivering different digests for the same input string if run on different platforms.

      If you want to try out the havoc yourself, just run the unit tests in a JVM with UTF-16 as the default encoding, e.g. by adding the line

      <argLine>-Dfile.encoding=UTF-16</argLine>
      

      to the configuration of the Surefire Plugin in the POM.
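To make the DigestUtils problem described above concrete, here is a minimal sketch. It is not code from Commons Codec or from the attached patch: it uses only the JDK's MessageDigest and StandardCharsets (Java 7+, so newer than the Codec 1.3 era), and the class and method names are hypothetical. It hashes the same string after encoding it with two different charsets, which is effectively what happens when String.getBytes() runs on platforms with different default encodings.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical demo class, not part of Commons Codec.
final class CharsetDigestDemo {

    // Encodes the text with an explicitly named charset, then hashes the bytes.
    static byte[] md5(String text, Charset charset) {
        try {
            return MessageDigest.getInstance("MD5").digest(text.getBytes(charset));
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required to be present on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = md5("abc", StandardCharsets.UTF_8);
        byte[] utf16 = md5("abc", StandardCharsets.UTF_16);
        // Same string, different byte encodings, therefore different digests.
        System.out.println(Arrays.equals(utf8, utf16)); // prints "false"
    }
}
```

Passing the charset explicitly, as in md5(text, charset), is one way to get the same digest on every platform; which charset the library should standardize on is a separate decision that this sketch does not settle.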

      1. CODEC-73.patch
        140 kB
        Benjamin Bentmann
      2. Hex.patch
        5 kB
        Sebb

          Activity

          Mark Thomas made changes -
          Workflow jira [ 12435240 ] Default workflow, editable Closed status [ 12601635 ]
          Niall Pemberton made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Niall Pemberton made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Sebb made changes -
          Link This issue incorporates CODEC-85 [ CODEC-85 ]
          ggregory@seagullsw.com added a comment -

          Fixed DigestUtils and test case.

          Henri Yandell made changes -
          Fix Version/s 1.4 [ 12311779 ]
          Sebb made changes -
          Comment [ Add byte[] conversion methods ]
          Sebb made changes -
          Attachment Hex.patch [ 12385975 ]
          Sebb added a comment -

          Add byte[] conversion methods (with private toDigit(byte) method)

          Sebb made changes -
          Attachment Hex.patch [ 12385974 ]
          Sebb made changes -
          Attachment Hex.patch [ 12385974 ]
          Sebb added a comment -

          Agreed that the default charset dependency needs to be removed.

          However, I have an alternative suggestion:

          In the case of the Hex encode() method, one could completely avoid the need to use getBytes() by using a byte[] array for the conversion.

          Likewise, I think the Hex decode() could just be performed on bytes rather than converting to char first.

          Patch to follow.
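The byte-only approach suggested in this comment could look roughly like the sketch below. This is an illustration under stated assumptions, not the contents of Hex.patch: the class name HexBytesSketch, the lowercase digit table, and the exception choices are all hypothetical. The point it shows is that hex encoding and decoding can work on ASCII byte values directly, so neither direction needs String.getBytes() or the platform default charset.

```java
// Hypothetical sketch; not the actual Commons Codec Hex class or the attached patch.
final class HexBytesSketch {

    // ASCII codes of the hex digits, stored as bytes so no charset conversion is needed.
    private static final byte[] DIGITS = {
        '0', '1', '2', '3', '4', '5', '6', '7',
        '8', '9', 'a', 'b', 'c', 'd', 'e', 'f'
    };

    // Encodes each input byte as two ASCII hex digit bytes.
    static byte[] encode(byte[] data) {
        byte[] out = new byte[data.length * 2];
        for (int i = 0, j = 0; i < data.length; i++) {
            out[j++] = DIGITS[(data[i] & 0xF0) >>> 4];
            out[j++] = DIGITS[data[i] & 0x0F];
        }
        return out;
    }

    // Decodes pairs of ASCII hex digit bytes back into raw bytes.
    static byte[] decode(byte[] hex) {
        if ((hex.length & 1) != 0) {
            throw new IllegalArgumentException("Odd number of hex digits");
        }
        byte[] out = new byte[hex.length / 2];
        for (int i = 0, j = 0; i < out.length; i++) {
            int high = toDigit(hex[j++]);
            int low = toDigit(hex[j++]);
            out[i] = (byte) ((high << 4) | low);
        }
        return out;
    }

    // Maps one ASCII hex digit byte to its value 0..15.
    private static int toDigit(byte b) {
        if (b >= '0' && b <= '9') return b - '0';
        if (b >= 'a' && b <= 'f') return b - 'a' + 10;
        if (b >= 'A' && b <= 'F') return b - 'A' + 10;
        throw new IllegalArgumentException("Illegal hex digit byte: " + b);
    }
}
```

For example, encode(new byte[] {0x00, (byte) 0xFF, 0x1A}) yields the six bytes whose ASCII characters spell 00ff1a, and decode applied to that result returns the original three bytes.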

          Benjamin Bentmann made changes -
          Attachment CODEC-73.patch [ 12385957 ]
          Benjamin Bentmann created issue -

            People

            • Assignee:
              Unassigned
              Reporter:
              Benjamin Bentmann