  1. Abdera
  2. ABDERA-222

Parse failures reading UTF-8 XML files whose attribute values contain valid non-US-ASCII UTF-8 characters

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • 0.4.0
    • None
    • None
    • Solaris x86_64, Mac OS X Leopard x86_64, Linux x86_64

    Description

      Parsing fails when the XML items are fetched by http-client 3.1 and the response stream is passed directly to the parser:
      parser.parse(response.getResponseBodyAsStream());

      The same items parse correctly if they are first written to a byte array and a ByteArrayInputStream over that byte array is passed to parse.

      Caused by: com.ctc.wstx.exc.WstxUnexpectedCharException: Illegal character (NULL, unicode 0) encountered: not valid in any content
      at [row,col {unknown-source}]: [3,56]
      at com.ctc.wstx.sr.StreamScanner.constructNullCharException(StreamScanner.java:615)
      at com.ctc.wstx.sr.StreamScanner.throwInvalidSpace(StreamScanner.java:644)
      at com.ctc.wstx.sr.BasicStreamReader.readTextPrimary(BasicStreamReader.java:4554)
      at com.ctc.wstx.sr.BasicStreamReader.nextFromTree(BasicStreamReader.java:2886)
      at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1019)
      at org.apache.abdera.parser.stax.FOMBuilder.getNextElementToParse(FOMBuilder.java:163)
      at org.apache.abdera.parser.stax.FOMBuilder.next(FOMBuilder.java:187)
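
The workaround mentioned in the description (write the response to a byte array, then parse from a ByteArrayInputStream over it) could be sketched roughly as follows. This is only an illustration using JDK classes; the BufferedBody class and its method names are hypothetical and are not part of Abdera or http-client, and it says nothing about the underlying cause of the failure.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper for illustration; not part of Abdera or http-client.
final class BufferedBody {
    private BufferedBody() {}

    // Reads the stream until end-of-stream and returns all bytes read.
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }

    // Returns an in-memory stream over the fully buffered content.
    static InputStream buffer(InputStream in) throws IOException {
        return new ByteArrayInputStream(readFully(in));
    }
}
```

With such a helper, the failing call from the description would become parser.parse(BufferedBody.buffer(response.getResponseBodyAsStream())). The trade-off is that the whole response body is held in memory before parsing starts.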

      Attachments

        1. ChunkedTransferFailure.java
          9 kB
          Jason Venner (www.prohadoop.com)

          People

            jasnell James M Snell
            jv_ning Jason Venner (www.prohadoop.com)
            Votes: 0
            Watchers: 1
