CAMEL-7584

XML-Aware Tokenizer failing with utf-8 multibyte characters

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.14.0
    • Component/s: camel-core
    • Labels: None
    • Unknown

    Description

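      There is an issue with the underlying StAX reader's getLocation().getCharacterOffset() when the input is handed to the StAX reader as an InputStream: the reported offset refers to decoded characters rather than to positions in the original byte stream, so the two no longer agree once the input contains multibyte UTF-8 characters.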

      This issue was raised in the Woodstox community, but I believe fixing it there is non-trivial, because Woodstox internally works on a char/Reader and keeps the offset relative to the character sequence, not to the original input stream.

      We change the tokenizer to pass a java.io.Reader to the Woodstox parser instead of passing the java.io.InputStream directly.
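
      The idea can be illustrated with a minimal sketch that uses only the JDK's StAX API. It is not the actual camel-core change: the class name ReaderBackedStax and its helpers are hypothetical, and the charset is hard-coded to UTF-8 as an assumption (a real tokenizer would have to take the encoding from the exchange or from the XML declaration). The point is only that the bytes are decoded into a java.io.Reader before the parser sees them, so the parser works purely in characters.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration; not the code committed for this issue.
public class ReaderBackedStax {

    // Decode the bytes up front and give the parser a Reader, so that
    // everything the parser tracks is expressed in characters.
    // Assumption: the input is UTF-8.
    static XMLStreamReader open(InputStream in) throws XMLStreamException {
        Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8);
        return XMLInputFactory.newInstance().createXMLStreamReader(reader);
    }

    // Returns the text content of the root element of a small document.
    static String rootText(byte[] utf8Xml) {
        try {
            XMLStreamReader xml = open(new ByteArrayInputStream(utf8Xml));
            try {
                xml.nextTag();               // advance to the root start tag
                return xml.getElementText(); // read text up to the end tag
            } finally {
                xml.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        // Two-byte characters make the char count and byte count diverge.
        String doc = "<a>\u00e9\u00e9</a>";
        byte[] utf8 = doc.getBytes(StandardCharsets.UTF_8);
        System.out.println("chars=" + doc.length() + " bytes=" + utf8.length);
        System.out.println("text=" + rootText(utf8));
    }
}
```

      For this sample document the character count is 9 while the UTF-8 byte count is 11, which is the kind of gap that makes a character offset unusable as an index into the raw stream.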

            People

              Assignee: Akitoshi Yoshida
              Reporter: Akitoshi Yoshida