[CAMEL-11846] xtokenize and apply xslt to a string does not work with UTF-16BE - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: 2.17.5
Fix Version/s: 3.10.0
Component/s: camel-core
Labels: None
Estimated Complexity: Unknown

Description

In XML, the encoding is usually declared inside the <?xml ...?> declaration. In general, you cannot read the declaration if you don't know the encoding, but XML parsers can detect several encodings from the first bytes of the document, which allows them to read the declaration. With that information they can read the whole file without being told the "charset" up front.

xtokenize and xslt use XMLInputFactory#createXMLStreamReader(Reader). By providing a Reader, Camel tells the parser that the encoding is already known, so the XML parser does not detect it.
Camel also sets the charset to UTF-8 if none is provided in a header. This makes the underlying reader fail when reading UTF-16.
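
The difference between the two overloads can be sketched with plain StAX, outside Camel. This is only a minimal illustration, assuming the StAX implementation bundled with the JDK; the class and method names (EncodingDetectionSketch, readViaStream, readViaUtf8Reader) are hypothetical and are not Camel code.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class, not part of Camel.
class EncodingDetectionSketch {

    // Sample document encoded as big-endian UTF-16 with a byte order mark.
    static byte[] sampleUtf16() {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-16\"?><greeting>hello</greeting>";
        return xml.getBytes(StandardCharsets.UTF_16);
    }

    // Returns the text of the first element, or null if there is none.
    static String firstElementText(XMLStreamReader reader) throws Exception {
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    return reader.getElementText();
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    // InputStream overload: the parser sees raw bytes and detects the encoding itself.
    static String readViaStream(byte[] xml) throws Exception {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        return firstElementText(factory.createXMLStreamReader(new ByteArrayInputStream(xml)));
    }

    // Reader overload with a forced UTF-8 charset, mimicking the default described above.
    // The bytes are decoded before the parser sees them; any failure is reported as null.
    static String readViaUtf8Reader(byte[] xml) {
        try {
            Reader utf8 = new InputStreamReader(new ByteArrayInputStream(xml), StandardCharsets.UTF_8);
            return firstElementText(XMLInputFactory.newInstance().createXMLStreamReader(utf8));
        } catch (Exception e) {
            return null;
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = sampleUtf16();
        System.out.println("InputStream:  " + readViaStream(xml));
        System.out.println("UTF-8 Reader: " + readViaUtf8Reader(xml));
    }
}
```

With the InputStream overload the element text should be read correctly, while the UTF-8 Reader hands the parser wrongly decoded characters, so it cannot recover the content.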

Using XMLInputFactory#createXMLStreamReader(InputStream) inside XMLTokenExpressionIterator works (tried in a patch). But the subsequent xslt step fails again, because it also uses a Reader.
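
The same idea applies to the transformation step. The following is a sketch using plain JAXP, assuming the JDK's built-in XSLT processor; it is not Camel's xslt component, and the class name XsltStreamSourceSketch and the stylesheet are made up for illustration. The XML input is passed as a StreamSource over an InputStream so that the parser can detect the encoding from the bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class, not part of Camel.
class XsltStreamSourceSketch {

    // Illustrative stylesheet: emits the text content of the root <greeting> element.
    static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"text\"/>"
        + "<xsl:template match=\"/\"><xsl:value-of select=\"/greeting\"/></xsl:template>"
        + "</xsl:stylesheet>";

    // Transforms raw XML bytes; the byte stream is handed to the parser undecoded.
    static String transform(byte[] xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new ByteArrayInputStream(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Big-endian UTF-16 with a byte order mark.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-16\"?><greeting>hello</greeting>";
        System.out.println(transform(xml.getBytes(StandardCharsets.UTF_16)));
    }
}
```

Here the stylesheet itself is read from a Reader, which is harmless because it is already a Java string; only the XML payload needs to stay a byte stream.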

See this Stack Overflow question for reference:
https://stackoverflow.com/questions/46322376/apache-camel-to-handle-encoding-declared-in-xml-file

Attachments

- my example looks like this (and it's really UTF-16BE).png | 09/Nov/17 12:37 | 10 kB | Robert Half
- UTF-16BE (with BOM).png | 09/Nov/17 12:27 | 18 kB | Robert Half

Issue Links

relates to: CAMEL-13374 XMLTokenExpressionIterator Default Exchange charset overrides original xml encoding from InputStream (Resolved)

People

Assignee: Claus Ibsen

Reporter: Robert Half

Votes: 0

Watchers: 3

Dates

Created: 26/Sep/17 14:49

Updated: 23/Mar/21 09:01

Resolved: 23/Mar/21 09:01