  1. Camel
  2. CAMEL-11846

xtokenize and apply xslt to a string does not work with UTF-16BE

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.17.5
    • Fix Version/s: 3.10.0
    • Component/s: camel-core
    • Labels: None
    • Estimated Complexity: Unknown

      Description

      In XML, the encoding is often declared inside the <?xml ..?> declaration. In general, you cannot read the declaration if you don't know the encoding, but XML parsers can detect several encodings from the first bytes, which allows them to read it. With that information they can read the whole file without knowing the charset in advance.

      xtokenize and xslt use XMLInputFactory#createXMLStreamReader(Reader). By passing a Reader, Camel tells the XML parser that the encoding is already known, so the parser does not detect it.
      Camel also sets the charset to UTF-8 if none is provided in a header, so the underlying reader fails when reading UTF-16.

      Using XMLInputFactory#createXMLStreamReader(InputStream) inside XMLTokenExpressionIterator works (tried in a patch). But the following xslt step fails because it, too, uses a Reader.
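
      The difference between the two overloads can be reproduced outside Camel with plain StAX. The following is only a minimal sketch, not the Camel code or the patch: the class name, the helper methods, and the sample payload are hypothetical, and it assumes the StAX implementation on the classpath auto-detects UTF-16BE without a BOM (the JDK's built-in one is expected to).

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class EncodingDetectionSketch {

    // Hypothetical payload: declares UTF-16BE and is encoded that way, without a BOM.
    static byte[] sampleBytes() {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-16BE\"?>"
                + "<root><item>value</item></root>";
        return xml.getBytes(StandardCharsets.UTF_16BE);
    }

    // Returns the text of the first <item> element, or null if none is found.
    static String firstItemText(XMLStreamReader reader) throws XMLStreamException {
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT
                    && "item".equals(reader.getLocalName())) {
                return reader.getElementText();
            }
        }
        return null;
    }

    // InputStream overload: the parser sees raw bytes and detects the encoding itself.
    static String parseFromStream(byte[] bytes) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new ByteArrayInputStream(bytes));
            try {
                return firstItemText(reader);
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("stream-based parse failed", e);
        }
    }

    // Reader overload with a forced UTF-8 charset, mimicking the default described above.
    // Returns null if the item text cannot be read, for example because the
    // parser rejects the mis-decoded characters.
    static String parseFromUtf8Reader(byte[] bytes) {
        try {
            Reader utf8 = new InputStreamReader(
                    new ByteArrayInputStream(bytes), StandardCharsets.UTF_8);
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(utf8);
            try {
                return firstItemText(reader);
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        byte[] bytes = sampleBytes();
        System.out.println("InputStream overload: " + parseFromStream(bytes));
        System.out.println("UTF-8 Reader overload: " + parseFromUtf8Reader(bytes));
    }
}
```

      The same distinction presumably applies to the xslt step: a source built from an InputStream leaves encoding detection to the parser, while one built from a Reader does not.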

      See this Stack Overflow question for reference:
      https://stackoverflow.com/questions/46322376/apache-camel-to-handle-encoding-declared-in-xml-file

              People

              • Assignee: Claus Ibsen (davsclaus)
              • Reporter: Robert Half (antidote2)
