Camel / CAMEL-12769

Combination of File consumer with charset and Split DSL with XPath doesn't parse XML correctly


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.22.0
    • Fix Version/s: 2.21.3, 2.22.2, 2.23.0
    • Component/s: camel-core
    • Labels:
      None
    • Estimated Complexity:
      Unknown

      Description

      This route:

      from("file:/...?charset=iso-8859-1&include=.*\\.xml")
          .split(xpath("/foo/bar"))
              ...
      

      does not read and split XML like the following with the correct encoding:

      <?xml version="1.0" encoding="ISO-8859-1"?>
      <foo>
      	<bar>abc</bar>
      	<bar>xyz</bar>
      	<bar>åäö</bar>
      </foo>
      

      The root cause lies in the contract of IOConverter.toInputStream(File, String):
      https://github.com/apache/camel/blob/camel-2.22.1/camel-core/src/main/java/org/apache/camel/converter/IOConverter.java#L84-L119
      which was clarified in CAMEL-8346 and CAMEL-8356.

      This method reads a File using the given charset and returns an InputStream encoded in the JVM default charset, regardless of the file's format. XmlConverter.toDOMDocument(...), in turn, uses DocumentBuilder to parse that input stream into a DOM Document, and DocumentBuilder relies on the XML declaration:

      <?xml version="1.0" encoding="ISO-8859-1"?>
      

      to detect the file encoding. As a result, the actual encoding of the input stream (the JVM default) no longer matches the encoding declared in the XML, and non-ASCII characters such as åäö are decoded incorrectly.
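
      The mismatch can be reproduced outside Camel with plain JAXP. The following is a minimal sketch, not Camel's code and not the fix that was applied: the class EncodingMismatchDemo and its helpers are hypothetical, and UTF-8 stands in for the JVM default charset so that the result does not depend on the machine. It parses the sample document three ways: from the original ISO-8859-1 bytes, from bytes re-encoded the way the description above says toInputStream does, and from a character stream, for which the parser disregards the declared encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; none of these names exist in Camel.
public class EncodingMismatchDemo {

    // The text of the last <bar> element, written with Unicode escapes
    // so the encoding of this source file does not matter.
    static final String TEXT = "\u00e5\u00e4\u00f6";

    static final String XML =
        "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>\n"
        + "<foo><bar>abc</bar><bar>xyz</bar><bar>" + TEXT + "</bar></foo>";

    // Mimics the behavior described in the issue: decode with the file
    // charset, then re-encode with a different (default) charset.
    static byte[] reencode(byte[] bytes, Charset from, Charset to) {
        return new String(bytes, from).getBytes(to);
    }

    static DocumentBuilder newBuilder() throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder();
    }

    // Returns the text content of the last <bar> element.
    static String lastBar(Document doc) {
        NodeList bars = doc.getElementsByTagName("bar");
        return bars.item(bars.getLength() - 1).getTextContent();
    }

    // Byte stream: the parser trusts the encoding in the XML declaration.
    static String parseFromBytes(byte[] bytes) throws Exception {
        return lastBar(newBuilder().parse(new ByteArrayInputStream(bytes)));
    }

    // Character stream: the bytes are decoded up front, so the parser
    // does not apply the declared encoding again.
    static String parseFromChars(byte[] bytes, Charset charset) throws Exception {
        InputSource source = new InputSource(new StringReader(new String(bytes, charset)));
        return lastBar(newBuilder().parse(source));
    }

    public static void main(String[] args) throws Exception {
        byte[] onDisk = XML.getBytes(StandardCharsets.ISO_8859_1);
        // Assumption: UTF-8 plays the role of the JVM default charset.
        byte[] reencoded = reencode(onDisk, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);

        System.out.println("original bytes parsed correctly:   "
            + TEXT.equals(parseFromBytes(onDisk)));
        System.out.println("re-encoded bytes parsed correctly: "
            + TEXT.equals(parseFromBytes(reencoded)));
        System.out.println("character stream parsed correctly: "
            + TEXT.equals(parseFromChars(onDisk, StandardCharsets.ISO_8859_1)));
    }
}
```

      Only the re-encoded byte stream is misread, because each UTF-8 byte of åäö is interpreted as a separate ISO-8859-1 character. If the default charset happened to equal the declared one, the mismatch would not show up, which is why the sketch fixes UTF-8 rather than calling Charset.defaultCharset().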

              People

              • Assignee:
                tadayosi Tadayoshi Sato
                Reporter:
                tadayosi Tadayoshi Sato
              • Votes:
                0
                Watchers:
                4
