Commons Codec / CODEC-73

Make string2byte conversions independent of platform default encoding

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.3
    • Fix Version/s: 1.4
    • Labels:
      None
    • Environment:

      any platform whose default encoding is not a superset of ASCII, e.g. UTF-16 or EBCDIC

      Description

      Both the library itself and many of its tests are utterly dependent on the JVM's default charset. For example, DigestUtils calls String.getBytes() to convert an input string to a byte array, happily delivering different digests for the same input string if run on different platforms.

      If you want to try out the havoc yourself, just run the unit tests in a JVM whose default encoding is UTF-16, e.g. by adding the line

      <argLine>-Dfile.encoding=UTF-16</argLine>
      

      to the configuration of the Surefire Plugin in the POM.
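
The DigestUtils problem described above can be reproduced without changing the JVM default at all. The following is a minimal sketch, not code from Commons Codec: the class `CharsetDigestDemo` and its helpers `md5` and `toHex` are hypothetical names, and it assumes Java 7 or later for `StandardCharsets` (on older JVMs, `getBytes("UTF-8")` would play the same role). It hashes the same string after encoding it with two different charsets, which is effectively what a bare `String.getBytes()` does on two differently configured platforms.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical demo class; none of these names come from Commons Codec.
final class CharsetDigestDemo {

    // MD5 over raw bytes. MD5 is a required algorithm on every Java platform,
    // so the checked exception is rethrown as an unchecked one.
    static byte[] md5(byte[] data) {
        try {
            return MessageDigest.getInstance("MD5").digest(data);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Lower-case hex rendering, for printing only.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String input = "abc";

        // Explicit charsets stand in for two different platform defaults.
        byte[] utf8 = input.getBytes(StandardCharsets.UTF_8);   // 3 bytes
        byte[] utf16 = input.getBytes(StandardCharsets.UTF_16); // BOM + 2 bytes per char

        System.out.println("UTF-8  digest: " + toHex(md5(utf8)));
        System.out.println("UTF-16 digest: " + toHex(md5(utf16)));
    }
}
```

The two printed digests differ even though the input string is identical. Passing an explicit charset to `getBytes` makes the result the same on every platform.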

      1. CODEC-73.patch
        140 kB
        Benjamin Bentmann
      2. Hex.patch
        5 kB
        Sebb

        Issue Links

          Activity

          Benjamin Bentmann created issue -
          Benjamin Bentmann made changes -
          Field Original Value New Value
          Attachment CODEC-73.patch [ 12385957 ]
          Sebb added a comment -

          Agreed that the default charset dependency needs to be removed.

          However, I have an alternative suggestion:

          In the case of the Hex encode() method, one could completely avoid the need to use getBytes() by using a byte[] array for the conversion.

          Likewise, I think the Hex decode() could just be performed on bytes rather than converting to char first.

          Patch to follow.
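
A minimal sketch of the byte-oriented approach suggested in this comment, under stated assumptions: it is not the attached Hex.patch, and the class name `ByteHexSketch` and the method names `encodeHex` and `decodeHex` are hypothetical. Only the idea of a private `toDigit(byte)` helper is taken from the thread. Because hex digits are written and read directly as ASCII byte values, no `String.getBytes()` call and therefore no default charset is involved.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; not the code from Hex.patch.
final class ByteHexSketch {

    // ASCII codes of the 16 lower-case hex digits.
    private static final byte[] DIGITS = {
        '0', '1', '2', '3', '4', '5', '6', '7',
        '8', '9', 'a', 'b', 'c', 'd', 'e', 'f'
    };

    // Encodes each input byte as two ASCII hex digits, staying in byte[] throughout.
    static byte[] encodeHex(byte[] data) {
        byte[] out = new byte[data.length * 2];
        for (int i = 0, j = 0; i < data.length; i++) {
            out[j++] = DIGITS[(data[i] & 0xF0) >>> 4];
            out[j++] = DIGITS[data[i] & 0x0F];
        }
        return out;
    }

    // Decodes ASCII hex digits back to raw bytes without converting to char[] first.
    static byte[] decodeHex(byte[] hex) {
        if ((hex.length & 1) != 0) {
            throw new IllegalArgumentException("Odd number of hex digits");
        }
        byte[] out = new byte[hex.length / 2];
        for (int i = 0, j = 0; i < out.length; i++) {
            int high = toDigit(hex[j++]);
            int low = toDigit(hex[j++]);
            out[i] = (byte) ((high << 4) | low);
        }
        return out;
    }

    // Maps one ASCII byte to its value 0..15, rejecting anything that is not a hex digit.
    private static int toDigit(byte b) {
        int digit = Character.digit((char) (b & 0xFF), 16);
        if (digit == -1) {
            throw new IllegalArgumentException("Illegal hex digit byte: " + b);
        }
        return digit;
    }

    public static void main(String[] args) {
        byte[] raw = "Hi".getBytes(StandardCharsets.US_ASCII);
        byte[] hex = encodeHex(raw);
        System.out.println(new String(hex, StandardCharsets.US_ASCII)); // prints 4869
        System.out.println(new String(decodeHex(hex), StandardCharsets.US_ASCII)); // prints Hi
    }
}
```

The explicit `US_ASCII` charset in `main` is only there to turn the demo input and output into text; the conversion methods themselves never touch a charset.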

          Sebb made changes -
          Attachment Hex.patch [ 12385974 ]
          Sebb made changes -
          Attachment Hex.patch [ 12385974 ]
          Sebb added a comment -

          Add byte[] conversion methods (with a private toDigit(byte) method)

          Sebb made changes -
          Attachment Hex.patch [ 12385975 ]
          Sebb made changes -
          Comment [ Add byte[] conversion methods ]
          Henri Yandell made changes -
          Fix Version/s 1.4 [ 12311779 ]
          ggregory@seagullsw.com added a comment -

          Fixed DigestUtils and test case.

          Sebb made changes -
          Link This issue incorporates CODEC-85 [ CODEC-85 ]
          Niall Pemberton made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Niall Pemberton made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Mark Thomas made changes -
          Workflow jira [ 12435240 ] Default workflow, editable Closed status [ 12601635 ]
          Transition         | Time In Source Status | Execution Times | Last Executer   | Last Execution Date
          Open -> Resolved   | 388d 9h 32m           | 1               | Niall Pemberton | 06/Aug/09 19:25
          Resolved -> Closed | 13h 46m               | 1               | Niall Pemberton | 07/Aug/09 09:12

            People

            • Assignee:
              Unassigned
              Reporter:
              Benjamin Bentmann
            • Votes:
              0
              Watchers:
              0

              Dates

              • Created:
                Updated:
                Resolved:
