Tapestry / TAPESTRY-2525

Properties files in a message catalog should be read using UTF-8 encoding, rather than default encoding

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 5.0.13
    • Fix Version/s: 5.0.14
    • Component/s: tapestry-core
    • Labels:
      None

      Description

      Allow different encodings to be used for properties files so that native2ascii is not necessary. Possibly use the new load(Reader) method on java.util.Properties (added in Java 1.6), which takes a Reader instead of an InputStream.
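      As a rough illustration of what the description asks for, the sketch below loads a properties file through a Reader that decodes UTF-8, so no native2ascii step is needed. The class name, method name and sample key are hypothetical and are not part of Tapestry. It uses StandardCharsets, try-with-resources and UncheckedIOException, so it assumes a newer JDK than the 1.6 mentioned above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical helper, not Tapestry code.
class Utf8PropertiesExample
{
    /** Loads properties from raw file contents, decoding the bytes as UTF-8. */
    static Properties loadUtf8(byte[] fileContents)
    {
        Properties p = new Properties();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(fileContents), StandardCharsets.UTF_8))
        {
            // Properties.load(Reader) has existed since Java 1.6.
            p.load(reader);
        }
        catch (IOException ex)
        {
            throw new UncheckedIOException(ex);
        }
        return p;
    }

    public static void main(String[] args)
    {
        // "\u00e9" is e-acute; it is written as an escape only to keep this source ASCII.
        // The bytes handed to loadUtf8 contain the real two-byte UTF-8 sequence.
        byte[] contents = "greeting=Caf\u00e9\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(loadUtf8(contents).getProperty("greeting"));
    }
}
```

      Loading the same bytes with Properties.load(InputStream) would decode them as ISO-8859-1 and turn the accented character into two wrong characters, which is why native2ascii is otherwise required.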

        Activity

        Andy Blower added a comment (edited)

        Unfortunately, the fix doesn't work as implemented, Howard. Basically, the readStreamAsUTF8() method does nothing, because it reads the file as UTF-8 and then encodes it again as UTF-8. Even if the StringBuffer were encoded as ISO-8859-1 instead, that charset cannot express the characters, so this approach is not going to work as far as I can see.
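        The two points above can be checked with a small standalone sketch. It does not reproduce readStreamAsUTF8() itself; it only assumes the behaviour described in this comment (decode as UTF-8, then encode again). The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demonstration, not Tapestry code.
class EncodingRoundTrip
{
    /** Decodes bytes as UTF-8, then encodes the resulting text in the target charset. */
    static byte[] transcode(byte[] utf8Bytes, Charset target)
    {
        String text = new String(utf8Bytes, StandardCharsets.UTF_8);
        return text.getBytes(target);
    }

    public static void main(String[] args)
    {
        // Two CJK characters, which ISO-8859-1 cannot represent.
        byte[] original = "\u65e5\u672c".getBytes(StandardCharsets.UTF_8);

        // UTF-8 in, UTF-8 out: the bytes are unchanged, so the step is a no-op.
        byte[] backToUtf8 = transcode(original, StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(original, backToUtf8)); // prints "true"

        // UTF-8 in, ISO-8859-1 out: each unmappable character becomes '?'.
        byte[] latin1 = transcode(original, StandardCharsets.ISO_8859_1);
        System.out.println(new String(latin1, StandardCharsets.ISO_8859_1)); // prints "??"
    }
}
```

        Either way, the stream later handed to Properties.load(InputStream) no longer carries usable non-Latin-1 text unless the characters are written as \uXXXX escapes.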

        I've implemented a fix which reads UTF-8 encoded properties files if the new Properties.load(Reader) method is available (JDK 1.6 and above), and it works perfectly. Here's the fixed version of MessagesSourceImpl.readProperties(). I think that the CHARSET constant ("UTF-8") might be a candidate for a symbol so it can be changed, but since we're using UTF-8 I'm not bothered.

        /**
         * Creates and returns a new map that contains properties read from the properties file.
         */
        private Map<String, String> readProperties(Resource resource)
        {
            if (!resource.exists()) return emptyMap;

            tracker.add(resource.toURL());

            Map<String, String> result = CollectionFactory.newCaseInsensitiveMap();

            Properties p = new Properties();
            InputStream is = null;

            try
            {
                is = resource.openStream();

                try
                {
                    // Use new reader loader for >= JDK1.6 via reflection.
                    Method newLoader = Properties.class.getMethod("load", Reader.class);
                    Reader propReader = new BufferedReader(new InputStreamReader(is, CHARSET));
                    newLoader.invoke(p, propReader);
                }
                catch (NoSuchMethodException e)
                {
                    // Use old stream loader for < JDK1.6 (properties files must be ISO-8859-1 encoded)
                    p.load(is);
                }

                is.close();

                is = null;
            }
            catch (Exception ex)
            {
                throw new RuntimeException(ServicesMessages.failureReadingMessages(resource, ex), ex);
            }
            finally
            {
                InternalUtils.close(is);
            }

            for (Map.Entry e : p.entrySet())
            {
                String key = e.getKey().toString();
                String value = p.getProperty(key);
                result.put(key, value);
            }

            return result;
        }

        Howard M. Lewis Ship added a comment

        Tapestry 4 had a more elaborate system of meta-data used to determine the encoding when reading an individual properties file. Let's see how well it works when we just assume a UTF-8 encoding (which seems to read normal ASCII files quite well).
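        The remark about ASCII files holds because every ASCII byte decodes to the same character under both UTF-8 and ISO-8859-1. A minimal sketch of that check follows; the class and method names are hypothetical and the sample keys are made up.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical demonstration, not Tapestry code.
class AsciiCompatibility
{
    /** Loads via a Reader that decodes the bytes as UTF-8. */
    static Properties loadWithReader(byte[] bytes)
    {
        Properties p = new Properties();
        try
        {
            p.load(new InputStreamReader(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8));
        }
        catch (IOException ex)
        {
            throw new UncheckedIOException(ex);
        }
        return p;
    }

    /** Loads via the classic stream API, which decodes the bytes as ISO-8859-1. */
    static Properties loadWithStream(byte[] bytes)
    {
        Properties p = new Properties();
        try
        {
            p.load(new ByteArrayInputStream(bytes));
        }
        catch (IOException ex)
        {
            throw new UncheckedIOException(ex);
        }
        return p;
    }

    public static void main(String[] args)
    {
        byte[] ascii = "greeting=Hello\nfarewell=Goodbye\n".getBytes(StandardCharsets.US_ASCII);
        // Both loaders see identical characters, so the resulting maps are equal.
        System.out.println(loadWithReader(ascii).equals(loadWithStream(ascii))); // prints "true"
    }
}
```

        Existing catalogs that contain only ASCII, including ones already processed by native2ascii, should therefore load the same way after the switch. Files that contain raw ISO-8859-1 bytes above 0x7F would not.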


          People

          • Assignee: Howard M. Lewis Ship
          • Reporter: Andy Blower
          • Votes: 1
          • Watchers: 0
