Camel / CAMEL-12769

Combination of File consumer with charset and Split DSL with XPath doesn't parse XML correctly

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.22.0
    • Fix Version/s: 2.21.3, 2.22.2, 2.23.0
    • Component/s: camel-core
    • Labels: None
    • Unknown

    Description

      This route:

      from("file:/...?charset=iso-8859-1&include=.*\\.xml")
          .split(xpath("/foo/bar"))
              ...
      

      does not read and split XML like the following with the correct encoding:

      <?xml version="1.0" encoding="ISO-8859-1"?>
      <foo>
      	<bar>abc</bar>
      	<bar>xyz</bar>
      	<bar>åäö</bar>
      </foo>
      

      The root cause lies in the contract of IOConverter.toInputStream(File, String):
      https://github.com/apache/camel/blob/camel-2.22.1/camel-core/src/main/java/org/apache/camel/converter/IOConverter.java#L84-L119
      which was clarified in CAMEL-8346 and CAMEL-8356.

      This method decodes the File using the given charset and returns an InputStream whose bytes are re-encoded in the JVM default charset, regardless of the file's format. XmlConverter.toDOMDocument(...) then uses DocumentBuilder to convert that input stream to a DOM Document, and DocumentBuilder relies on the XML declaration:

      <?xml version="1.0" encoding="ISO-8859-1"?>
      

      to detect the encoding. The result is a mismatch between the actual encoding of the input stream (JVM default) and the encoding declared in the XML, so whenever the two differ, non-ASCII characters such as åäö are decoded incorrectly.
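
The mismatch can be reproduced without Camel. The following is a minimal sketch, not Camel's own code: the class EncodingMismatchDemo and its helpers lastBar and reencode are hypothetical names, and UTF-8 is assumed to be the JVM default charset. It simulates the decode-then-re-encode step described above and hands the resulting bytes to DocumentBuilder, which still trusts the ISO-8859-1 declaration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class EncodingMismatchDemo {

    // Same shape as the sample document; \u escapes keep this source file ASCII-only.
    public static final String XML =
        "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>\n"
        + "<foo><bar>abc</bar><bar>xyz</bar><bar>\u00e5\u00e4\u00f6</bar></foo>";

    /** Parses the bytes with DocumentBuilder and returns the text of the last bar element. */
    public static String lastBar(byte[] bytes) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(bytes));
            NodeList bars = doc.getElementsByTagName("bar");
            return bars.item(bars.getLength() - 1).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
    }

    /** Hypothetical stand-in for the conversion: decode with one charset, re-encode with another. */
    public static byte[] reencode(byte[] fileBytes, Charset fileCharset, Charset streamCharset) {
        return new String(fileBytes, fileCharset).getBytes(streamCharset);
    }

    public static void main(String[] args) {
        String expected = "\u00e5\u00e4\u00f6";

        // Bytes as stored on disk: encoding matches the XML declaration.
        byte[] onDisk = XML.getBytes(StandardCharsets.ISO_8859_1);

        // Bytes after re-encoding to the assumed JVM default (UTF-8); the declaration is now wrong.
        byte[] reencoded = reencode(onDisk, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);

        System.out.println("raw file bytes parsed correctly: " + lastBar(onDisk).equals(expected));
        System.out.println("re-encoded bytes parsed correctly: " + lastBar(reencoded).equals(expected));
    }
}
```

In this sketch the raw file bytes parse to the expected text, while the re-encoded bytes do not: each UTF-8 two-byte sequence is read as two ISO-8859-1 characters. If the stream charset happens to equal the declared encoding, the problem does not show up, which is why the behavior depends on the JVM default.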

            People

              Assignee: tadayosi Tadayoshi Sato
              Reporter: tadayosi Tadayoshi Sato
              Votes: 0
              Watchers: 4
