[WICKET-5416] BOM in UTF markup file breaks encoding detection - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 1.5.10, 6.12.0
Fix Version/s: 1.5.11, 6.13.0, 7.0.0-M1
Component/s: wicket
Labels: None
Environment: Windows 7, jdk1.7.0_45

Description

I have a project with internationalization and ran into this problem on one of the pages with non-English content. The page was encoded in UTF-8, but my JVM's default encoding is different. I always use "<?xml encoding ... ?>" to specify the encoding of markup pages (and "MarkupSettings.defaultMarkupEncoding" is not set).

Unexpectedly, the page was rendered with the wrong encoding. After several hours of debugging I found that the source of the issue was the UTF BOM (byte order mark) at the beginning of the file and the inability of "XmlReader" to process it. "XmlReader.getXmlDeclaration" tries to match the XML declaration with a regular expression, but the match fails because of the BOM. After that, the encoding falls back to the JVM default encoding.
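The failure mode can be reproduced in isolation. The sketch below is not Wicket's code: the class name and the regular expression are assumptions, written only to mimic a declaration pattern that must match at the very start of the input. Decoding the BOM bytes EF BB BF with ISO-8859-1 stands in for a non-UTF-8 JVM default; note that Java's UTF-8 decoder does not strip a BOM either (it keeps it as the character U+FEFF), so an anchored match fails in both cases.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BomDeclarationDemo {
    // Hypothetical pattern (NOT copied from Wicket's XmlReader): it only matches
    // an XML declaration that starts at the very first character of the input.
    private static final Pattern XML_DECL = Pattern.compile(
            "^<\\?xml\\s[^>]*encoding\\s*=\\s*['\"]([^'\"]+)['\"][^>]*\\?>");

    /** Returns the declared encoding, or null if no declaration is found at the start. */
    static String declaredEncoding(String head) {
        Matcher m = XML_DECL.matcher(head);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String decl = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>";

        // Simulate a markup file on disk: UTF-8 BOM followed by the declaration.
        byte[] bom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};
        byte[] declBytes = decl.getBytes(StandardCharsets.US_ASCII);
        byte[] file = new byte[bom.length + declBytes.length];
        System.arraycopy(bom, 0, file, 0, bom.length);
        System.arraycopy(declBytes, 0, file, bom.length, declBytes.length);

        // Assumption: ISO-8859-1 stands in for a non-UTF-8 JVM default encoding.
        String head = new String(file, StandardCharsets.ISO_8859_1);

        System.out.println(declaredEncoding(decl)); // prints UTF-8
        System.out.println(declaredEncoding(head)); // prints null: the BOM characters come before "<?xml"
    }
}
```

When the match returns nothing, the caller has no declared encoding to use, which is the point at which the reported fallback to the JVM encoding happens.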

It's possible to use "org.apache.commons.io.input.BOMInputStream" to handle the BOM, or it could be handled manually inside "XmlReader".
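For the manual option, here is a minimal, stdlib-only sketch. The class and method names are hypothetical, and this is not the fix that was committed to Wicket. It peeks at the first three bytes through a PushbackInputStream and discards them only if they are the UTF-8 BOM; otherwise the bytes are pushed back untouched. UTF-16 and UTF-32 BOMs are deliberately out of scope here, whereas commons-io's BOMInputStream can be configured to detect those as well. The code sticks to Java 7 APIs to match the reported environment.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.Charset;

class BomSkipper {
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    /**
     * Returns a stream positioned after a leading UTF-8 BOM if there is one,
     * or at the original start otherwise. Only the UTF-8 BOM is handled.
     */
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, UTF8_BOM.length);
        byte[] buf = new byte[UTF8_BOM.length];
        int n = 0;
        // read() may return fewer bytes than requested, so loop until 3 bytes or EOF.
        while (n < buf.length) {
            int r = pin.read(buf, n, buf.length - n);
            if (r == -1) {
                break;
            }
            n += r;
        }
        boolean isBom = n == UTF8_BOM.length
                && buf[0] == UTF8_BOM[0] && buf[1] == UTF8_BOM[1] && buf[2] == UTF8_BOM[2];
        if (!isBom && n > 0) {
            pin.unread(buf, 0, n); // not a BOM: put the peeked bytes back
        }
        return pin;
    }

    /** Reads the remaining bytes of a stream (Java 7 compatible, no readAllBytes). */
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int r;
        while ((r = in.read(chunk)) != -1) {
            out.write(chunk, 0, r);
        }
        return out.toByteArray();
    }

    /** Convenience wrapper: strips a UTF-8 BOM from in-memory bytes and decodes the rest. */
    static String decodeSkippingBom(byte[] data, Charset charset) {
        try (InputStream in = skipUtf8Bom(new ByteArrayInputStream(data))) {
            return new String(readAll(in), charset);
        } catch (IOException e) {
            // Cannot happen for an in-memory stream; rethrow unchecked just in case.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', '?', 'x', 'm', 'l'};
        System.out.println(decodeSkippingBom(withBom, Charset.forName("UTF-8"))); // prints <?xml
    }
}
```

Wrapping the markup stream this way before the declaration is read means the regular-expression match sees "<?xml" as its first characters, so the declared encoding is picked up instead of the JVM default.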

PS: I found the issue with Wicket 1.5.10. I see the same code without BOM handling in 6.12.0, so I added that version to "Affects Version/s", but I have no proof-in-code for it at the moment.

People

Assignee: Martin Tzvetanov Grigorov
Reporter: Vadim Ponomarev
Votes: 0
Watchers: 3

Dates

Created: 13/Nov/13 20:11
Updated: 15/Nov/13 14:11
Resolved: 15/Nov/13 14:11