Commons IO / IO-315

Replace all "String encoding" parameters with a value type

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.1
    • Fix Version/s: 2.3
    • Component/s: Streams/Writers
    • Labels: None

      Description

      Please create an interface "Encoding" plus a set of useful defaults (UTF_8, ISO_LATIN_1, CP_1250 and CP_1252).

      Use this interface in all places where "String encoding" is used now. This would make the API more reliable, improve code reuse and reduce futile catch blocks for UnsupportedEncodingException.

        Activity

        Sebb added a comment -

        I think this was fixed by the addition of the Charsets class.
        If not, please re-open with details.

        Gary Gregory added a comment -

        Such a class exists in IO trunk now. The vote to release 2.3 is underway.

        Gary

        Aaron Digulla added a comment -

        @Sebb: Which is why I want a class that provides standard constants.

        Just like Gary did in http://svn.apache.org/viewvc/commons/proper/codec/trunk/src/main/java/org/apache/commons/codec/Charsets.java?revision=1308315&view=markup

        Gary Gregory added a comment -

        We now have a Charsets class in [codec].

        Gary Gregory added a comment -

        FYI: I'm experimenting with a "Charsets" constant class in [codec] now (not committed).

        Sebb added a comment (edited) -

        That makes more sense now, but I think it would be overkill to introduce a new interface here.

        Using Charset would be better IMO.

        Using Charset would convert the checked UnsupportedEncodingException into the unchecked UnsupportedCharsetException.
        This should simplify application code that does not already catch IOException, though of course in Commons IO many methods throw IOE already.

        AFAICT, parameters would need to be changed to use (e.g.) Charset.forName("UTF-8") instead of "UTF-8" so user code would be slightly longer.
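
The difference between the two parameter styles can be sketched as follows. This is only an illustration using JDK calls (String.getBytes(String) and String.getBytes(Charset)); the class and method names are hypothetical and are not Commons IO API.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.UnsupportedCharsetException;
import java.util.Arrays;

// Hypothetical demo class, not part of Commons IO.
final class CharsetVsString {

    // String-based variant: the compiler forces a catch block for the
    // checked UnsupportedEncodingException, even for "UTF-8".
    static byte[] encodeWithName(String text, String encodingName) {
        try {
            return text.getBytes(encodingName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // Charset-based variant: no checked exception to handle.
    static byte[] encodeWithCharset(String text, Charset charset) {
        return text.getBytes(charset);
    }

    public static void main(String[] args) {
        byte[] a = encodeWithName("Hello, world", "UTF-8");
        byte[] b = encodeWithCharset("Hello, world", Charset.forName("UTF-8"));
        System.out.println(Arrays.equals(a, b));

        // An unknown name fails at lookup time with an unchecked exception.
        try {
            Charset.forName("no-such-charset");
        } catch (UnsupportedCharsetException e) {
            System.out.println("unchecked: " + e.getCharsetName());
        }
    }
}
```

With a Charset parameter the lookup failure moves to the caller's Charset.forName call, so the method that does the encoding has nothing left to catch.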

        Aaron Digulla added a comment -

        If you don't want to add a new interface, how about supporting Charset? For example, it doesn't throw a checked exception, and eventually all the methods that accept a String will have to look up a Charset anyway.

        I'll try to convince commons-lang to convert the String constants to Charset constants (https://issues.apache.org/jira/browse/LANG-795)

        Aaron Digulla added a comment -

        My point is that everyone litters their code with string constants. String constants are bad for various reasons, and APIs should not endorse them. In my own code, I use an interface so that anyone can add more encodings if they need them, and I always know what is an encoding and what is text data (so no mix-ups like FileUtils.write("UTF-8", "Hello, world")).

        But I agree that commons IO is probably the wrong place to add them. Moving to commons-lang (which also contains code that handles the exception).
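
A minimal sketch of the kind of value type described above, assuming a hypothetical Encoding interface with an enum of defaults. None of these names exist in Commons IO. The CP_1250 and CP_1252 constants from the description are left out because those charsets are not guaranteed to be available on every JVM.

```java
import java.nio.charset.Charset;

// Hypothetical demo class, not part of Commons IO.
final class TypedWrite {

    // Hypothetical value type that wraps a charset.
    interface Encoding {
        Charset charset();
    }

    // Hypothetical defaults; anyone can add more by implementing Encoding.
    enum StandardEncoding implements Encoding {
        UTF_8("UTF-8"),
        ISO_LATIN_1("ISO-8859-1");

        private final Charset charset;

        StandardEncoding(String name) {
            this.charset = Charset.forName(name);
        }

        @Override
        public Charset charset() {
            return charset;
        }
    }

    // Text and encoding have different types, so swapping the
    // arguments is a compile-time error instead of a silent bug.
    static byte[] encode(String data, Encoding encoding) {
        return data.getBytes(encoding.charset());
    }

    public static void main(String[] args) {
        byte[] bytes = encode("Hello, world", StandardEncoding.UTF_8);
        System.out.println(bytes.length);
        // encode(StandardEncoding.UTF_8, "Hello, world"); // does not compile
    }
}
```

A plain java.nio.charset.Charset parameter gives the same compile-time separation of text and encoding without a new interface, which is the alternative raised elsewhere in this thread.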

        Sebb added a comment -

        Thanks for the list.
        However, I don't think that makes a significant difference.

        Gary Gregory added a comment -

        There are six required encodings: http://docs.oracle.com/javase/6/docs/api/java/nio/charset/Charset.html

        These are defined in a couple of Commons places:

        • [lang]: org.apache.commons.lang3.CharEncoding
        • [codec]: org.apache.commons.codec.CharEncoding

        Gary
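
As a rough illustration, the six names listed as required in the linked Charset Javadoc can be checked and turned into Charset constants as below. The class and constant names are hypothetical and only loosely modelled on the idea of a "Charsets" constants class; they are not the actual [codec] or [io] source. On Java 7 and later, java.nio.charset.StandardCharsets already provides such constants.

```java
import java.nio.charset.Charset;

// Hypothetical constants class, not the Commons implementation.
final class RequiredCharsets {

    // The six charset names the Charset Javadoc lists as required
    // on every Java platform implementation.
    static final String[] NAMES = {
        "US-ASCII", "ISO-8859-1", "UTF-8", "UTF-16BE", "UTF-16LE", "UTF-16"
    };

    // Because these charsets are required, the lookups below are
    // not expected to fail at class initialization.
    static final Charset UTF_8 = Charset.forName("UTF-8");
    static final Charset ISO_8859_1 = Charset.forName("ISO-8859-1");

    public static void main(String[] args) {
        for (String name : NAMES) {
            System.out.println(name + " supported: " + Charset.isSupported(name));
        }
    }
}
```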

        Sebb added a comment -

        I don't think this is a good idea.
        There are a lot of different encodings, and who is to say which ones are "useful"?
        There would still need to be a way to use the String encoding to allow for encodings that are not provided by the interface.

        Also, the code would still need to catch UnsupportedEncodingException.
        As far as I know there is no requirement for a Java class-library to support any specific encodings, though it would be a fairly useless implementation that did not support UTF-8.


          People

          • Assignee: Unassigned
          • Reporter: Aaron Digulla
          • Votes: 0
          • Watchers: 0
