
Key: COCOON-2063
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Minor
Assignee: Jörg Heinicke
Reporter: Alexander Klimetschek
Votes: 1
Watchers: 0
Cocoon

NekoHTMLTransformer needs to set the default-encoding of the current system to work properly with UTF-8

Created: 10/May/07 09:15 AM   Updated: 08/May/08 05:38 AM
Component/s: Blocks: HTML
Affects Version/s: 2.1.11, 2.2
Fix Version/s: 2.1.12-dev (Current SVN), 2.2-dev (Current SVN)

Time Tracking:
Not Specified

File Attachments:
  NekoHTMLGenerator_BRANCH2_1_X.patch (1.0 kB, licensed for inclusion in ASF works), 2007-11-26 04:17 PM, Ellis Pritchard
  nekohtmltransformer-encoding.patch (3 kB, licensed for inclusion in ASF works), 2007-05-10 09:16 AM, Alexander Klimetschek

Other Info: Patch available
Affects version (Component): Blocks: HTML - 1.0.0-M1
Fix version (Component): Blocks: HTML


Description
The NekoHTMLTransformer uses the CyberNeko HTMLConfiguration to tidy HTML. Unfortunately, HTMLConfiguration does not default to the system's current encoding; instead, you have to set the encoding explicitly through a property. Because the default encoding varies from one OS to another, the best solution is to set this property automatically in the NekoHTMLTransformer, based on what Java reports as its default charset:

            config.setProperty("http://cyberneko.org/html/properties/default-encoding", Charset.defaultCharset().name());


Comments
Alexander Klimetschek added a comment - 10/May/07 09:16 AM
Affects cocoon-html-impl.

Alexander Klimetschek added a comment - 10/May/07 09:18 AM
I forgot to mention that anyone who wants to override this property through the NekoHTMLTransformer's configuration can still do so. The manual configuration is applied after the encoding property has been set dynamically, so the manual value overrides the dynamic one.
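The description and the comment above boil down to a simple ordering rule. First, set the `default-encoding` property from the JVM's default charset. Then apply whatever the user configured, so that a manual value wins. The Java sketch below models that order with a plain `Map` standing in for the CyberNeko `HTMLConfiguration` property store. The class `NekoPropertyOrder` and the method `effectiveProperties` are hypothetical and are not part of Cocoon or NekoHTML. Like the original one-liner, the sketch needs Java 5 for `Charset.defaultCharset()`.

```java
import java.nio.charset.Charset;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

public class NekoPropertyOrder {

    // Property name taken from the issue description.
    static final String DEFAULT_ENCODING =
        "http://cyberneko.org/html/properties/default-encoding";

    // Hypothetical helper: builds the effective properties in the order the
    // comment describes. A Map stands in for HTMLConfiguration here.
    static Map<String, Object> effectiveProperties(Map<String, Object> userConfigured) {
        Map<String, Object> props = new LinkedHashMap<String, Object>();
        // 1. Dynamic default derived from the JVM's default charset (Java 5+).
        props.put(DEFAULT_ENCODING, Charset.defaultCharset().name());
        // 2. Manually configured properties are applied afterwards, so they win.
        props.putAll(userConfigured);
        return props;
    }

    public static void main(String[] args) {
        Map<String, Object> none = Collections.emptyMap();
        System.out.println(effectiveProperties(none).get(DEFAULT_ENCODING));

        Map<String, Object> manual =
            Collections.<String, Object>singletonMap(DEFAULT_ENCODING, "ISO-8859-1");
        System.out.println(effectiveProperties(manual).get(DEFAULT_ENCODING)); // ISO-8859-1
    }
}
```

Applying user settings last keeps the automatic default from silently overriding a deliberate per-site choice.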

Ellis Pritchard added a comment - 26/Nov/07 04:17 PM
This has bitten us too.

Here's a patch for Cocoon 2.1.X, rev 597695

Ellis Pritchard added a comment - 26/Nov/07 04:20 PM
Added Affects Version 2.1.11-dev

Ellis Pritchard added a comment - 03/Apr/08 04:11 PM
Anyone fancy applying the patches?

Jörg Heinicke added a comment - 04/Apr/08 03:31 AM
I fixed this issue in 2.2. The fix for 2.1 does not work because it uses java.nio, which was only added in Java 1.4, and Cocoon 2.1 has to remain Java 1.3 compatible. Is there a way to find out the default encoding in Java 1.3? The Javadoc for every class and method where it would be needed, such as new String(byte[]) or InputStreamReader(), only points to http://java.sun.com/j2se/1.3/docs/api/java/lang/package-summary.html#charenc, which just lists some constants. On Mac OS X I also have no access to the JDK source code. The bytecode suggests that these classes and methods use some Sun-internal class to retrieve the default encoding.
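One commonly used answer to the question above can be written in Java 1.3-compatible code, with no `java.nio` and no generics. Open an `InputStreamReader` without naming a charset, then ask it which encoding it picked by calling `getEncoding()`, a method available since JDK 1.1. The `file.encoding` system property is another frequently used source, but it is not a documented guarantee. Note that `getEncoding()` returns the historical name (for example `UTF8` or `Cp1252`) rather than the IANA name. Whether NekoHTML accepts such a value for its `default-encoding` property is an assumption we have not verified. The class `LegacyDefaultEncoding` is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

public class LegacyDefaultEncoding {

    // Opens a reader over an empty stream without specifying a charset, so the
    // JVM picks its default; getEncoding() (JDK 1.1+) reports which one it chose.
    static String readerDefaultEncoding() {
        InputStreamReader reader =
            new InputStreamReader(new ByteArrayInputStream(new byte[0]));
        // Must be called before close(): getEncoding() returns null on a closed reader.
        String encoding = reader.getEncoding();
        try {
            reader.close();
        } catch (IOException ignored) {
            // Closing an in-memory stream is not expected to fail.
        }
        return encoding;
    }

    // Alternative source: a system property that is commonly, but not
    // officially, set to the platform default encoding. May be null.
    static String fileEncodingProperty() {
        return System.getProperty("file.encoding");
    }

    public static void main(String[] args) {
        System.out.println("InputStreamReader: " + readerDefaultEncoding());
        System.out.println("file.encoding:     " + fileEncodingProperty());
    }
}
```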

Jörg Heinicke added a comment - 04/Apr/08 12:00 PM
I also had to revert the fix for Cocoon 2.2, since Charset.defaultCharset() is only available from Java 5 onwards.
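One possible way around both version constraints is to call `Charset.defaultCharset()` reflectively. The code then still compiles and loads on Java 1.3 and 1.4, and it falls back to the `file.encoding` system property when the class or method is missing. This is a sketch under our own assumptions, not the approach Cocoon adopted. The class `DefaultEncodingLookup` is hypothetical. Raw `Class` types are used deliberately so the code stays source-compatible with pre-Java 5 compilers, and a modern `javac` may therefore print an unchecked-operations note.

```java
import java.lang.reflect.Method;

public class DefaultEncodingLookup {

    // Returns the JVM default encoding name, preferring Charset.defaultCharset()
    // (Java 5+) and falling back to the "file.encoding" system property.
    static String defaultEncoding() {
        try {
            // Looked up by name so this class still loads where java.nio is absent.
            Class charsetClass = Class.forName("java.nio.charset.Charset");
            Method defaultCharset = charsetClass.getMethod("defaultCharset", new Class[0]);
            Object charset = defaultCharset.invoke(null, new Object[0]);
            Method name = charsetClass.getMethod("name", new Class[0]);
            return (String) name.invoke(charset, new Object[0]);
        } catch (Exception e) {
            // ClassNotFoundException on Java 1.3, NoSuchMethodException on Java 1.4.
            return System.getProperty("file.encoding");
        }
    }

    public static void main(String[] args) {
        System.out.println(defaultEncoding());
    }
}
```

On a Java 5+ JVM the reflective path returns the same name as calling `Charset.defaultCharset().name()` directly. Only older JVMs take the fallback branch.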

Ellis Pritchard added a comment - 22/Apr/08 09:54 AM
Oh, how annoying.

Is there a possibility of starting to use the src/jdk1.x directories for this kind of patch?

Just because some people are stuck in the dark ages doesn't mean we can't shine a light...

Jörg Heinicke added a comment - 24/Apr/08 05:32 AM
With Ant this would be rather easy as we already have some 1.4-specific code in Cocoon 2.1. No idea about Maven.

Is there really no other way to figure out the default encoding on pre-Java 5 JVMs?

Vadim Gritsenko added a comment - 24/Apr/08 02:07 PM
I'm not sure why the generator needs to know the encoding... Can't it simply always be set to UTF-8?

Jörg Heinicke added a comment - 08/May/08 05:38 AM
As Vadim mentioned at http://marc.info/?l=xml-cocoon-dev&m=120905050708311&w=4, the NekoHTMLTransformer had an issue with converting a String to byte[] using the OS default encoding rather than keeping the string. The NekoHTMLGenerator had the same issue when reading a request parameter value. Both issues are fixed in SVN and may have caused the symptoms you saw. Closing the issue for now; feel free to reopen it if the problem persists.
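A few lines of Java can illustrate the class of bug described in the last comment. The sketch below uses the hypothetical class `StringRoundTrip` and needs Java 6 for the `Charset` overloads of `getBytes` and the `String` constructor. It simulates a mismatch by encoding a string with one charset and decoding it with another. The same thing happens when text is turned into bytes with the OS default encoding and then read back under a different one. Keeping the content as characters avoids the byte conversion entirely, for example by handing the parser a `Reader` such as `StringReader` (SAX's `InputSource` has a `Reader` constructor). We have not checked which approach the actual SVN fix uses.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.Charset;

public class StringRoundTrip {

    // Encodes with one charset and decodes with another, simulating bytes that
    // were produced with one default encoding and read back with a different one.
    static String mismatchedRoundTrip(String s, Charset encodeWith, Charset decodeWith) {
        return new String(s.getBytes(encodeWith), decodeWith);
    }

    // Reads everything from a character stream; no bytes or charsets are involved.
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[256];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String html = "<p>Gr\u00fc\u00dfe</p>"; // "Grüße" written with escapes
        // UTF-8 bytes decoded as ISO-8859-1: the umlaut and sharp s come out garbled.
        System.out.println(mismatchedRoundTrip(html,
            Charset.forName("UTF-8"), Charset.forName("ISO-8859-1")));
        // Passing characters through a Reader preserves the text unchanged.
        System.out.println(readAll(new StringReader(html)));
    }
}
```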