[CAMEL-12769] Combination of File consumer with charset and Split DSL with XPath doesn't parse XML correctly - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.22.0
Fix Version/s: 2.21.3, 2.22.2, 2.23.0
Component/s: camel-core
Labels: None
Estimated Complexity: Unknown

Description

This route:

from("file:/...?charset=iso-8859-1&include=.*\\.xml")
    .split(xpath("/foo/bar"))
        ...

does not read and split XML like the following with the correct encoding:

<?xml version="1.0" encoding="ISO-8859-1"?>
<foo>
	<bar>abc</bar>
	<bar>xyz</bar>
	<bar>åäö</bar>
</foo>

The root cause lies in the contract of IOConverter.toInputStream(File, String):
https://github.com/apache/camel/blob/camel-2.22.1/camel-core/src/main/java/org/apache/camel/converter/IOConverter.java#L84-L119
which was clarified in CAMEL-8346 and CAMEL-8356.

This method converts a File with the given charset into an InputStream encoded in the JVM default charset, whatever the format of the file is. XmlConverter.toDOMDocument(...), however, uses DocumentBuilder to convert the input stream to a DOM Document, and DocumentBuilder relies on the XML declaration:

<?xml version="1.0" encoding="ISO-8859-1"?>

to detect the file encoding. As a result, the actual encoding of the input stream (JVM default) does not match the encoding declared in the XML.
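
The mismatch can be reproduced with the JDK alone, without Camel. The following is a minimal sketch, not the Camel code path itself: the class EncodingMismatchDemo and its helpers transcode and firstBar are hypothetical names introduced for illustration, and UTF-8 is assumed as a stand-in for the JVM default charset. It re-encodes ISO-8859-1 bytes while leaving the XML declaration untouched, then lets DocumentBuilder parse both variants.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class EncodingMismatchDemo {

    // Sample document declaring ISO-8859-1; the non-ASCII text of <bar> is
    // written with Unicode escapes so the source file encoding does not matter.
    static final String XML =
        "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
            + "<foo><bar>\u00e5\u00e4\u00f6</bar></foo>";

    // Hypothetical helper: decodes the bytes with one charset and re-encodes
    // them with another, leaving the XML declaration inside the content as-is.
    static byte[] transcode(byte[] source, Charset from, Charset to) {
        return new String(source, from).getBytes(to);
    }

    // Hypothetical helper: parses the bytes with DocumentBuilder, which picks
    // the encoding from the XML declaration, and returns the first <bar> text.
    static String firstBar(byte[] xmlBytes) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xmlBytes));
            return doc.getElementsByTagName("bar").item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String expected = "\u00e5\u00e4\u00f6";

        // Bytes as stored on disk: the encoding matches the declaration.
        byte[] onDisk = XML.getBytes(StandardCharsets.ISO_8859_1);

        // Bytes after re-encoding (UTF-8 assumed here as the default charset):
        // the declaration still says ISO-8859-1.
        byte[] transcoded =
            transcode(onDisk, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);

        System.out.println(expected.equals(firstBar(onDisk)));      // prints true
        System.out.println(expected.equals(firstBar(transcoded)));  // prints false
    }
}
```

In the second case each two-byte UTF-8 sequence is decoded as two ISO-8859-1 characters, so the element text comes back as six garbled characters instead of three.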

Issue Links

is related to

CAMEL-13136 File consumer with charset doesn't parse XML

Resolved

links to

GitHub Pull Request #2505

People

Assignee: Tadayoshi Sato
Reporter: Tadayoshi Sato
Votes: 0
Watchers: 4

Dates

Created: 03/Sep/18 08:22
Updated: 14/May/19 10:18
Resolved: 04/Sep/18 14:37