Details
- Bug
- Status: Resolved
- Major
- Resolution: Fixed
- 2.22.0
- None
- Unknown
Description
This route:
from("file:/...?charset=iso-8859-1&include=.*\\.xml")
    .split(xpath("/foo/bar"))
    ...
does not read and split XML like the following with the correct encoding:
<?xml version="1.0" encoding="ISO-8859-1"?>
<foo>
  <bar>abc</bar>
  <bar>xyz</bar>
  <bar>åäö</bar>
</foo>
The root cause is the contract of IOConverter.toInputStream(File, String):
https://github.com/apache/camel/blob/camel-2.22.1/camel-core/src/main/java/org/apache/camel/converter/IOConverter.java#L84-L119
which was clarified in CAMEL-8346 and CAMEL-8356.
This method reads the File using the given charset and returns an InputStream re-encoded in the JVM default charset, regardless of the file's format. XmlConverter.toDOMDocument(...) then uses DocumentBuilder to convert that input stream to a DOM Document, and DocumentBuilder relies on the XML declaration:
<?xml version="1.0" encoding="ISO-8859-1"?>
to detect the file encoding. The result is a mismatch between the actual encoding of the input stream (JVM default) and the encoding declared in the XML.
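The mismatch can be reproduced with plain JDK classes, without Camel. The following is a minimal sketch under stated assumptions: the class and method names are hypothetical, String.getBytes(Charset) stands in for the re-encoding done by IOConverter, and UTF-8 is assumed as the JVM default charset. It is not the Camel implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo class; not part of Camel.
class EncodingMismatchDemo {

    // The XML from the report, reduced to one element.
    // \u00e5\u00e4\u00f6 are the characters å, ä, ö.
    static final String XML =
        "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
        + "<foo><bar>\u00e5\u00e4\u00f6</bar></foo>";

    // Encodes the XML text with streamCharset, then lets DocumentBuilder
    // parse the bytes. The parser decodes them according to the XML
    // declaration (ISO-8859-1), not according to streamCharset.
    static String parseBarText(Charset streamCharset) throws Exception {
        InputStream in = new ByteArrayInputStream(XML.getBytes(streamCharset));
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(in);
        return doc.getElementsByTagName("bar").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Stream bytes match the declaration: the text survives intact.
        String ok = parseBarText(StandardCharsets.ISO_8859_1);
        System.out.println("matching encoding, length " + ok.length());

        // Stream bytes are UTF-8 (assumed JVM default) while the declaration
        // says ISO-8859-1: each two-byte sequence is read as two characters.
        String garbled = parseBarText(StandardCharsets.UTF_8);
        System.out.println("mismatched encoding, length " + garbled.length());
    }
}
```

In the mismatched case the parser does not fail. It silently yields six characters instead of three, which is why the problem only shows up for non-ASCII content such as the third bar element.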
Issue Links
- is related to CAMEL-13136 File consumer with charset doesn't parse XML (Resolved)
- links to