Cocoon / COCOON-2063

NekoHTMLTransformer needs to set the default-encoding of the current system to work properly with UTF-8

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.1.11, 2.2
    • Fix Version/s: 2.1.12, 2.2.1
    • Component/s: Blocks: HTML
    • Labels:
      None
    • Other Info:
      Patch available
    • Affects version (Component):
      Blocks: HTML - 1.0.0-M1
    • Fix version (Component):
      Blocks: HTML

      Description

      The NekoHTMLTransformer uses the CyberNeko HTMLConfiguration to tidy HTML. Unfortunately, it does not use the system's current encoding as its default; you have to set a property to specify the encoding. Because the right value varies from one OS to another, the best solution is to set this property automatically in the NekoHTMLTransformer, based on what Java reports as its defaultCharset:

                  config.setProperty("http://cyberneko.org/html/properties/default-encoding", Charset.defaultCharset().name());
      1. NekoHTMLGenerator_BRANCH2_1_X.patch
        1.0 kB
        Ellis Pritchard
      2. nekohtmltransformer-encoding.patch
        3 kB
        Alexander Klimetschek

        Activity

        Alexander Klimetschek added a comment -
        Affects cocoon-html-impl.
        Alexander Klimetschek added a comment -
        I forgot to mention that anyone who wants to override this property via the configuration of the NekoHTMLTransformer can still do so. The manual configuration is applied after the dynamic setting of the encoding property, so the manual value overrides the dynamic one.
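
The ordering described in this comment can be sketched as follows. This is only an illustration: `java.util.Properties` stands in for the parser configuration, and the class and method names (`EncodingPropertyOrder`, `effectiveProperties`) are hypothetical, not taken from the patch.

```java
import java.nio.charset.Charset;
import java.util.Properties;

public class EncodingPropertyOrder {
    // Property name quoted in the issue description.
    static final String DEFAULT_ENCODING =
            "http://cyberneko.org/html/properties/default-encoding";

    /**
     * Hypothetical helper: builds the effective settings by applying the
     * dynamic default first and the manual configuration second.
     */
    static Properties effectiveProperties(Properties manual) {
        Properties effective = new Properties();
        // Step 1: dynamic default taken from the JVM (requires Java 5+).
        effective.setProperty(DEFAULT_ENCODING, Charset.defaultCharset().name());
        // Step 2: manual settings are copied last, so they win on conflict.
        effective.putAll(manual);
        return effective;
    }

    public static void main(String[] args) {
        Properties manual = new Properties();
        manual.setProperty(DEFAULT_ENCODING, "ISO-8859-1");
        // The manual value replaces the dynamic one.
        System.out.println(effectiveProperties(manual).getProperty(DEFAULT_ENCODING));
    }
}
```

With an empty manual configuration, the dynamic value from `Charset.defaultCharset()` is what remains in effect.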
        Ellis Pritchard added a comment -
        This has bitten us too.

        Here's a patch for Cocoon 2.1.X, rev 597695
        Ellis Pritchard added a comment -
        Added Affects Version 2.1.11-dev
        Ellis Pritchard added a comment -
        Anyone fancy applying the patches?
        Jörg Heinicke added a comment -
        I fixed this issue in 2.2. The fix in 2.1 does not work because it uses java.nio, which was only added in Java 1.4, and Cocoon 2.1 has to be Java 1.3 compatible. Is there a way to find out the default encoding in Java 1.3? All the classes and methods where it would matter, such as new String(byte[]) or InputStreamReader(), only point to http://java.sun.com/j2se/1.3/docs/api/java/lang/package-summary.html#charenc which just names some constants. On Mac OS X I also have no access to the source code of the JDK. The bytecode implies that the mentioned classes and methods use some Sun-internal class to retrieve the default encoding.
        Jörg Heinicke added a comment -
        I also had to revert the fix for Cocoon 2.2, since Charset.defaultCharset() is only available as of Java 5.
        Ellis Pritchard added a comment -
        Oh, how annoying.

        Is there a possibility of starting to use the src/jdk1.x directories for these kinds of patches?

        Just because some people are stuck in the dark ages doesn't mean we can't shine a light...
        Jörg Heinicke added a comment -
        With Ant this would be rather easy as we already have some 1.4-specific code in Cocoon 2.1. No idea about Maven.

        Is there really no other way to figure out the default encoding on pre-Java 5 JVMs?
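
One possible approach, offered as a sketch and not as what was committed: `InputStreamReader.getEncoding()` has been part of `java.io` since JDK 1.1, and a reader constructed without a charset argument uses the platform default. The class name `DefaultEncodingProbe` is hypothetical. Note that `getEncoding()` may return a historical name such as `UTF8` or `Cp1252` instead of the canonical one; whether the parser accepts such names is an assumption that would need checking.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;

public class DefaultEncodingProbe {
    /** Looks up the platform default encoding without using java.nio. */
    public static String defaultEncoding() {
        // A reader created without an explicit charset uses the platform default.
        InputStreamReader reader =
                new InputStreamReader(new ByteArrayInputStream(new byte[0]));
        String name = reader.getEncoding();
        // getEncoding() returns null only for a closed reader; fall back to
        // the file.encoding system property in that case.
        return name != null ? name : System.getProperty("file.encoding");
    }

    public static void main(String[] args) {
        System.out.println(defaultEncoding());
    }
}
```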
        Vadim Gritsenko added a comment -
        I'm not sure why the generator needs to know the encoding... Can it simply always be set to UTF-8?
        Jörg Heinicke added a comment -
        As Vadim mentioned at http://marc.info/?l=xml-cocoon-dev&m=120905050708311&w=4 the NekoHTMLTransformer had an issue with converting a String to byte[] using the OS's default encoding rather than keeping the string, and the NekoHTMLGenerator had the same issue when reading a request parameter value. Both issues are fixed in SVN and may have caused the symptoms you saw. Closing the issue for now. Feel free to reopen it if the problem persists.
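
The class of bug described here can be illustrated with a small, self-contained sketch. This is not the Cocoon code; the class `EncodingRoundTrip` and its `roundTrip` method are hypothetical, and US-ASCII merely stands in for a platform default that cannot represent every character in the string.

```java
import java.io.UnsupportedEncodingException;

public class EncodingRoundTrip {
    /** Encodes text to bytes and decodes it again with the same encoding. */
    static String roundTrip(String text, String encoding)
            throws UnsupportedEncodingException {
        byte[] bytes = text.getBytes(encoding);
        return new String(bytes, encoding);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String text = "J\u00f6rg \u20ac"; // contains o-umlaut and the euro sign

        // UTF-8 can represent every character, so the text survives.
        System.out.println(roundTrip(text, "UTF-8").equals(text));    // true

        // US-ASCII cannot, so unmappable characters are replaced by '?'.
        System.out.println(roundTrip(text, "US-ASCII").equals(text)); // false
    }
}
```

Keeping the value as a String, as the fix does according to the comment above, avoids the lossy byte conversion entirely.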

          People

          • Assignee:
            Jörg Heinicke
            Reporter:
            Alexander Klimetschek
          • Votes:
            1
            Watchers:
            0