UIMA-3818

Unsupported XML entity by XmiCas(De)serializer

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Implemented
    • Affects Version/s: 2.4.2SDK
    • Fix Version/s: 2.6.0SDK
    • Component/s: Collection Processing
    • Labels: None

    Description

      The UTF-8 character '𝒪' (U+1D4AA, a supplementary character outside the Basic Multilingual Plane) cannot be deserialized by `XmiCasDeserializer.deserialize`.

      Here is a way to reproduce this:

      import java.io.File;
      import java.io.FileInputStream;
      import java.io.FileOutputStream;
      import java.io.InputStream;
      import java.io.OutputStream;
      
      import org.apache.uima.cas.impl.XmiCasDeserializer;
      import org.apache.uima.cas.impl.XmiCasSerializer;
      import org.apache.uima.fit.factory.JCasFactory;
      import org.apache.uima.jcas.JCas;
      
      public class Test {
          public static void main(String[] args) throws Exception {
              // The document text is a single supplementary character (U+1D4AA).
              JCas jCas = JCasFactory.createJCas();
              jCas.setDocumentText("𝒪");
              File file = new File("/tmp/test.xmi");
      
              // Serialize the CAS to XMI and close the stream before reading the file back.
              try (OutputStream outputStream = new FileOutputStream(file)) {
                  XmiCasSerializer.serialize(jCas.getCas(), outputStream);
              }
      
              // Reading the file that was just written fails with a SAXParseException.
              try (InputStream inputStream = new FileInputStream(file)) {
                  XmiCasDeserializer.deserialize(inputStream, jCas.getCas());
              }
          }
      }
      

      And here is the stack trace:

      [Fatal Error] :1:350: Character reference "&#56490" is an invalid XML character.
      Exception in thread "main" org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 350; Character reference "&#56490" is an invalid XML character.
      	at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
      	at org.apache.uima.cas.impl.XmiCasDeserializer.deserialize(XmiCasDeserializer.java:1955)
      	at org.apache.uima.cas.impl.XmiCasDeserializer.deserialize(XmiCasDeserializer.java:1872)
      	at Test.main(Test.java:24)
           [java] Java Result: 1
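The number in the error message hints at the cause. 56490 is 0xDCAA, which is the low surrogate of the UTF-16 pair (0xD835, 0xDCAA) that Java uses to store U+1D4AA. Surrogate values are not legal XML 1.0 characters on their own, so a character reference to one is rejected by the parser. The following is a small diagnostic sketch that uses only the JDK; the class name `SurrogateCheck` and the helper `isValidXmlChar` are made up for illustration and are not part of UIMA.

```java
// Diagnostic sketch: shows how '𝒪' is stored in a Java String and which
// of the resulting numbers are legal XML 1.0 characters.
class SurrogateCheck {

    // Hypothetical helper mirroring the XML 1.0 "Char" production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isValidXmlChar(int codePoint) {
        return codePoint == 0x9 || codePoint == 0xA || codePoint == 0xD
                || (codePoint >= 0x20 && codePoint <= 0xD7FF)
                || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
                || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    }

    public static void main(String[] args) {
        // Build the character from its code point to avoid source-encoding issues.
        String text = new String(Character.toChars(0x1D4AA));

        System.out.println("UTF-16 code units: " + text.length());
        System.out.println("Code points:       " + text.codePointCount(0, text.length()));

        // Each UTF-16 code unit on its own is a surrogate and not a valid XML character.
        for (char unit : text.toCharArray()) {
            System.out.println((int) unit + " valid in XML 1.0: " + isValidXmlChar(unit));
        }

        // The full code point is a valid XML character.
        int codePoint = text.codePointAt(0);
        System.out.println(codePoint + " valid in XML 1.0: " + isValidXmlChar(codePoint));
    }
}
```

If the serializer escapes the text one `char` at a time, that would explain why the file contains `&#56490`; this is a guess based on the error message, not something confirmed from the UIMA source.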
      

      Please tell me if you need more information.
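For comparison, here is a minimal sketch of escaping by code point instead of by `char`, so that a supplementary character becomes one valid character reference. It is only an illustration of the idea and not the fix that was applied in UIMA; `CodePointEscaper` and `escapeNonAscii` are hypothetical names, and the helper does not escape markup characters such as `&` or `<`.

```java
// Sketch of code-point-aware escaping, assuming the goal is to write every
// non-ASCII character as a numeric character reference.
class CodePointEscaper {

    // Hypothetical helper, not part of UIMA. Markup characters (&, <, >) are
    // passed through unchanged and would need separate handling in real use.
    // An unpaired surrogate in the input would still produce an invalid reference.
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            // codePointAt combines a surrogate pair into one code point.
            int codePoint = text.codePointAt(i);
            if (codePoint < 0x80) {
                out.appendCodePoint(codePoint);
            } else {
                out.append("&#").append(codePoint).append(';');
            }
            // Advance by one or two chars depending on the code point.
            i += Character.charCount(codePoint);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = new String(Character.toChars(0x1D4AA)); // the character '𝒪'
        System.out.println(escapeNonAscii(text)); // prints &#119978;
    }
}
```

The reference `&#119978;` denotes U+1D4AA directly, which falls in the range that XML 1.0 accepts.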

          People

            Assignee: Unassigned
            Reporter: Gregoire Jadi (jgreg)
            Votes: 0
            Watchers: 4
