Issue Details

Key: COCOON-2063
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Minor
Assignee: Jörg Heinicke
Reporter: Alexander Klimetschek
Votes: 1
Watchers: 0
Cocoon

NekoHTMLTransformer needs to set the default-encoding of the current system to work properly with UTF-8

Created: 10/May/07 09:15 AM   Updated: 08/May/08 05:38 AM
Component/s: Blocks: HTML
Affects Version/s: 2.1.11, 2.2
Fix Version/s: 2.1.12-dev (Current SVN), 2.2-dev (Current SVN)

Time Tracking:
Not Specified

File Attachments:
  NekoHTMLGenerator_BRANCH2_1_X.patch (1.0 kB), 2007-11-26 04:17 PM, Ellis Pritchard, licensed for inclusion in ASF works
  nekohtmltransformer-encoding.patch (3 kB), 2007-05-10 09:16 AM, Alexander Klimetschek, licensed for inclusion in ASF works

Other Info: Patch available
Affects version (Component): Blocks: HTML - 1.0.0-M1
Fix version (Component): Blocks: HTML


Description
The NekoHTMLTransformer uses the CyberNeko HTMLConfiguration to tidy HTML. Unfortunately, it does not use the system's current encoding as the default; instead, you have to set a property to specify your encoding. Since the default encoding varies from one OS to another, the best solution is to set this property automatically in the NekoHTMLTransformer based on what Java reports as the default charset:

            config.setProperty("http://cyberneko.org/html/properties/default-encoding", Charset.defaultCharset().name());
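To illustrate why the default encoding matters, the following standalone sketch (not Cocoon or NekoHTML code; the class name is hypothetical) decodes the same UTF-8 bytes once with the matching charset and once with ISO-8859-1, standing in for a Latin-1 platform default. It uses modern java.nio APIs purely for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {

    /** Decodes bytes with the given charset, as a parser does once it has picked an encoding. */
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // "Jorg" with an o-umlaut (U+00F6); in UTF-8 that character is the two bytes 0xC3 0xB6.
        String original = "J\u00f6rg";
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);

        // Matching encoding: the text survives the round trip.
        String ok = decode(utf8Bytes, StandardCharsets.UTF_8);
        // Mismatched (Latin-1) encoding: each UTF-8 byte becomes its own character.
        String garbled = decode(utf8Bytes, StandardCharsets.ISO_8859_1);

        System.out.println(ok.equals(original));      // true
        System.out.println(garbled.equals(original)); // false
        System.out.println(garbled.length());         // 5 (two characters for the one o-umlaut)

        // The value the proposed fix passes as the default-encoding property (Java 5+).
        System.out.println(Charset.defaultCharset().name());
    }
}
```

The last line prints whatever the running JVM uses, which is exactly why a hard-coded value in the sitemap configuration does not travel well between operating systems.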


Alexander Klimetschek added a comment - 10/May/07 09:16 AM
Affects cocoon-html-impl.

Alexander Klimetschek made changes - 10/May/07 09:16 AM
Field Original Value New Value
Attachment nekohtmltransformer-encoding.patch [ 12357020 ]
Alexander Klimetschek added a comment - 10/May/07 09:18 AM
I forgot to mention that anyone who wants to override this property via the configuration of the NekoHTMLTransformer can still do so. The manual configuration is applied after the encoding property has been set dynamically, so the manual value overrides the dynamic one.
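The ordering described above can be sketched with a plain map standing in for the parser configuration. This is a hypothetical helper, not the attached patch: `effectiveProperties` collects what would then be passed to `config.setProperty(...)` one entry at a time. Only the property URI is taken from the issue.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

public class NekoPropertyOrder {

    static final String DEFAULT_ENCODING =
            "http://cyberneko.org/html/properties/default-encoding";

    /**
     * Hypothetical helper: applies the dynamic default first and the
     * user-supplied settings second, so that the user settings win.
     */
    static Map<String, String> effectiveProperties(Map<String, String> userConfig) {
        Map<String, String> props = new LinkedHashMap<>();
        // 1. Dynamic default taken from the JVM (Charset.defaultCharset() requires Java 5+).
        props.put(DEFAULT_ENCODING, Charset.defaultCharset().name());
        // 2. Manual configuration applied afterwards replaces the default for the same key.
        props.putAll(userConfig);
        return props;
    }

    public static void main(String[] args) {
        Map<String, String> user = new LinkedHashMap<>();
        user.put(DEFAULT_ENCODING, "UTF-8");
        System.out.println(effectiveProperties(user).get(DEFAULT_ENCODING)); // UTF-8

        // Without a manual setting, the JVM default is used.
        System.out.println(effectiveProperties(new LinkedHashMap<>()).get(DEFAULT_ENCODING));
    }
}
```

Applying the user map last, rather than checking for its presence first, keeps the rule simple: whatever is configured explicitly always wins.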

Ellis Pritchard added a comment - 26/Nov/07 04:17 PM
This has bitten us too.

Here's a patch for Cocoon 2.1.X, rev 597695

Ellis Pritchard made changes - 26/Nov/07 04:17 PM
Attachment NekoHTMLGenerator_BRANCH2_1_X.patch [ 12370212 ]
Ellis Pritchard added a comment - 26/Nov/07 04:20 PM
Added Affects Version 2.1.11-dev

Ellis Pritchard made changes - 26/Nov/07 04:20 PM
Affects Version/s 2.1.11-dev (Current SVN) [ 12312231 ]
Ellis Pritchard added a comment - 03/Apr/08 04:11 PM
Anyone fancy applying the patches?

Repository Revision Date User Message
ASF #644595 Fri Apr 04 03:11:52 UTC 2008 joerg COCOON-2063: Set system's default encoding on NekoHTMLGenerator and NekoHTMLTransformer configuration to make them work with UTF-8.
Files Changed
MODIFY /cocoon/trunk/blocks/cocoon-html/cocoon-html-impl/src/main/java/org/apache/cocoon/components/NekoHtmlSaxParser.java
MODIFY /cocoon/trunk/blocks/cocoon-html/cocoon-html-impl/src/changes/changes.xml

Jörg Heinicke added a comment - 04/Apr/08 03:31 AM
I fixed this issue in 2.2. The fix for 2.1 does not work because it uses java.nio, which was only added in Java 1.4, and Cocoon 2.1 has to stay Java 1.3 compatible. Is there a way to find out the default encoding in Java 1.3? All the classes and methods where it would be needed, such as new String(byte[]) or InputStreamReader(), only point to http://java.sun.com/j2se/1.3/docs/api/java/lang/package-summary.html#charenc, which just lists some constants. On Mac OS X I also have no access to the JDK source code. The bytecode suggests that these classes and methods use some Sun-internal class to retrieve the default encoding.
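One possible answer, sketched here as an assumption rather than a tested Cocoon change: an `OutputStreamWriter` created without an explicit encoding uses the platform default, and its `getEncoding()` method (available since JDK 1.1) reports that encoding under its historical Java name. The sketch below prefers `Charset.defaultCharset().name()` when it exists, looked up via reflection so there is no compile-time dependency on Java 5, and otherwise falls back to the writer trick. The class name is hypothetical; it is intended to use only Java 1.3 language features but has not been tested on a 1.3 JVM, and whether NekoHTML accepts historical names such as "UTF8" or "Cp1252" for its default-encoding property is not verified here.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.lang.reflect.Method;

public class DefaultEncoding {

    /** Java 5+: Charset.defaultCharset().name(), looked up reflectively; null if unavailable. */
    static String fromCharset() {
        try {
            Class charsetClass = Class.forName("java.nio.charset.Charset");
            Method defaultCharset = charsetClass.getMethod("defaultCharset", new Class[0]);
            Object charset = defaultCharset.invoke(null, new Object[0]);
            Method name = charsetClass.getMethod("name", new Class[0]);
            return (String) name.invoke(charset, new Object[0]);
        } catch (Exception e) {
            // Java 1.3 (no java.nio) or 1.4 (no defaultCharset()): not available.
            return null;
        }
    }

    /**
     * Java 1.1+: a writer without an explicit encoding uses the platform default,
     * and getEncoding() reports it under its historical name (for example "UTF8"
     * or "Cp1252") rather than the canonical one ("UTF-8", "windows-1252").
     */
    static String fromWriter() {
        // The writer is deliberately not closed: a ByteArrayOutputStream holds no
        // OS resources, and getEncoding() returns null after close().
        OutputStreamWriter writer = new OutputStreamWriter(new ByteArrayOutputStream());
        return writer.getEncoding();
    }

    /** Prefers the canonical Java 5 name and falls back to the historical one. */
    public static String get() {
        String name = fromCharset();
        return name != null ? name : fromWriter();
    }

    public static void main(String[] args) {
        System.out.println("canonical:  " + fromCharset());
        System.out.println("historical: " + fromWriter());
        System.out.println("chosen:     " + get());
    }
}
```

Reading `System.getProperty("file.encoding")` is another commonly used shortcut, but it is not specified as the encoding the JDK actually uses, so the writer-based lookup is the more defensible fallback.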

Jörg Heinicke made changes - 04/Apr/08 03:31 AM
Affects version (Component) Parent values: Blocks: HTML(10168). Level 1 values: 1.0.0-M1(10198).
Fix version (Component) Parent values: Blocks: HTML(10240). Level 1 values: 1.0.0-RC1(10271).
Priority Major [ 3 ] Minor [ 4 ]
Affects Version/s 2.2 [ 12310611 ]
Fix Version/s 2.2-dev (Current SVN) [ 12313093 ]
Assignee Jörg Heinicke [ joerg.heinicke@gmx.de ]
Jörg Heinicke made changes - 04/Apr/08 03:35 AM
Affects Version/s 2.2 [ 12310611 ]
Jörg Heinicke added a comment - 04/Apr/08 12:00 PM
Also had to revert the fix for Cocoon 2.2 since Charset.defaultCharset() is only available on Java 5.

Jörg Heinicke made changes - 04/Apr/08 12:00 PM
Fix version (Component) Parent values: Blocks: HTML(10240). Level 1 values: 1.0.0-RC1(10271). Parent values: Blocks: HTML(10240).
Fix Version/s 2.2-dev (Current SVN) [ 12313093 ]
Repository Revision Date User Message
ASF #644687 Fri Apr 04 12:01:03 UTC 2008 joerg COCOON-2063: revert for the time being since Charset.defaultCharset() is only Java 5. Any replacement?
Files Changed
MODIFY /cocoon/trunk/blocks/cocoon-html/cocoon-html-impl/src/main/java/org/apache/cocoon/components/NekoHtmlSaxParser.java
MODIFY /cocoon/trunk/blocks/cocoon-html/cocoon-html-impl/src/changes/changes.xml

Ellis Pritchard added a comment - 22/Apr/08 09:54 AM
Oh, how annoying.

Is there a possibility of starting to use the src/jdk1.x directories for these kinds of patches?

Just because some people are stuck in the dark ages doesn't mean we can't shine a light...

Jörg Heinicke added a comment - 24/Apr/08 05:32 AM
With Ant this would be rather easy as we already have some 1.4-specific code in Cocoon 2.1. No idea about Maven.

Is there really no other way to figure out the default encoding on pre-Java 5 JVMs?

Vadim Gritsenko added a comment - 24/Apr/08 02:07 PM
I'm not sure why the generator needs to know the encoding... Can't it simply always be set to UTF-8?

Repository Revision Date User Message
ASF #654401 Thu May 08 03:26:25 UTC 2008 joerg COCOON-2063: Fix encoding issue in NekoHTMLTransformer (http://marc.info/?l=xml-cocoon-dev&m=120905050708311&w=4)
Files Changed
MODIFY /cocoon/branches/BRANCH_2_1_X/src/blocks/html/java/org/apache/cocoon/transformation/NekoHTMLTransformer.java
MODIFY /cocoon/branches/BRANCH_2_1_X/status.xml

Repository Revision Date User Message
ASF #654403 Thu May 08 03:44:59 UTC 2008 joerg COCOON-2063: Fix encoding issue in NekoHTMLTransformer (http://marc.info/?l=xml-cocoon-dev&m=120905050708311&w=4)
Files Changed
MODIFY /cocoon/trunk/blocks/cocoon-html/cocoon-html-impl/src/changes/changes.xml
MODIFY /cocoon/trunk/blocks/cocoon-html/cocoon-html-impl/src/main/java/org/apache/cocoon/transformation/NekoHTMLTransformer.java

Repository Revision Date User Message
ASF #654418 Thu May 08 05:24:42 UTC 2008 joerg COCOON-2063: Fix encoding issue in NekoHTMLGenerator when reading a request parameter value.
Files Changed
MODIFY /cocoon/branches/BRANCH_2_1_X/status.xml
MODIFY /cocoon/branches/BRANCH_2_1_X/src/blocks/html/java/org/apache/cocoon/generation/NekoHTMLGenerator.java

Repository Revision Date User Message
ASF #654419 Thu May 08 05:25:15 UTC 2008 joerg COCOON-2063: Fix encoding issue in NekoHTMLGenerator when reading a request parameter value.
Files Changed
MODIFY /cocoon/trunk/blocks/cocoon-html/cocoon-html-impl/src/changes/changes.xml
MODIFY /cocoon/trunk/blocks/cocoon-html/cocoon-html-impl/src/main/java/org/apache/cocoon/generation/NekoHTMLGenerator.java

Jörg Heinicke added a comment - 08/May/08 05:38 AM
As Vadim mentioned at http://marc.info/?l=xml-cocoon-dev&m=120905050708311&w=4, the NekoHTMLTransformer converted the String to a byte[] using the OS default encoding instead of keeping the String, and the NekoHTMLGenerator had the same problem when reading a request parameter value. Both issues are fixed in SVN and may have caused the symptoms you saw. Closing the issue for now; feel free to reopen it if the problem persists.
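The difference between converting to bytes and keeping the String can be sketched with JDK classes alone. This is not the committed change (r654401, r654403); the class and method names are hypothetical, and it assumes the parser can be handed a SAX `InputSource` built on a character stream. Both charsets are passed explicitly so the demo is deterministic; in the bug, the first one was the OS default picked up by `getBytes()`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import org.xml.sax.InputSource;

public class KeepTheString {

    /** Reads a character stream to the end. */
    static String drain(Reader reader) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[256];
        try {
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    /** Problematic pattern: encode with one charset, let the parser decode with another. */
    static String viaBytes(String html, Charset writeAs, Charset readAs) {
        byte[] bytes = html.getBytes(writeAs);
        return drain(new InputStreamReader(new ByteArrayInputStream(bytes), readAs));
    }

    /** Keeping the String: hand the parser characters, so no encoding step is involved. */
    static InputSource asCharacterSource(String html) {
        return new InputSource(new StringReader(html));
    }

    public static void main(String[] args) {
        String html = "<p>J\u00f6rg</p>";
        // Latin-1 bytes read back as UTF-8: byte 0xF6 is malformed and becomes U+FFFD.
        System.out.println(viaBytes(html, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8).equals(html)); // false
        // The character stream round-trips unchanged.
        System.out.println(drain(asCharacterSource(html).getCharacterStream()).equals(html)); // true
    }
}
```

Passing characters sidesteps the question of which default encoding to configure at all, which is consistent with the issue being closed without the Java 5-only `Charset.defaultCharset()` call.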

Jörg Heinicke made changes - 08/May/08 05:38 AM
Resolution Fixed [ 1 ]
Fix Version/s 2.2-dev (Current SVN) [ 12313093 ]
Fix Version/s 2.1.12-dev (Current SVN) [ 12312903 ]
Status Open [ 1 ] Closed [ 6 ]