Details
Bug
Status: Closed
Major
Resolution: Implemented
2.4.2SDK
None
Description
The UTF-8 character '𝒪' (U+1D4AA, a supplementary character outside the Basic Multilingual Plane) cannot be deserialized by `XmiCasDeserializer.deserialize`.
Here is a way to reproduce this:
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.uima.cas.impl.XmiCasDeserializer;
import org.apache.uima.cas.impl.XmiCasSerializer;
import org.apache.uima.fit.factory.JCasFactory;
import org.apache.uima.jcas.JCas;

public class Test {
    public static void main(String[] args) throws Exception {
        JCas jCas = JCasFactory.createJCas();
        jCas.setDocumentText("𝒪");

        File file = new File("/tmp/test.xmi");
        OutputStream outputStream = new FileOutputStream(file);
        XmiCasSerializer.serialize(jCas.getCas(), outputStream);

        InputStream inputStream = new FileInputStream(file);
        XmiCasDeserializer.deserialize(inputStream, jCas.getCas());
    }
}
And here is the stacktrace:
[Fatal Error] :1:350: Character reference "�" is an invalid XML character.
Exception in thread "main" org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 350; Character reference "�" is an invalid XML character.
    at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
    at org.apache.uima.cas.impl.XmiCasDeserializer.deserialize(XmiCasDeserializer.java:1955)
    at org.apache.uima.cas.impl.XmiCasDeserializer.deserialize(XmiCasDeserializer.java:1872)
    at Test.main(Test.java:24)
[java] Java Result: 1
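A JDK-only sketch that may help narrow this down. It rests on an assumption I have not verified against the UIMA source: that the serializer writes each UTF-16 code unit of '𝒪' as its own numeric character reference. In Java, '𝒪' is the surrogate pair `\uD835\uDCAA`, and a lone surrogate is not a legal XML character, so a reference such as `&#55349;` would be rejected by any conforming parser. The class name `SurrogateCheck` and the helper `parses` are hypothetical names used only for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SurrogateCheck {

    // Hypothetical helper: returns true if the JDK's default SAX parser
    // accepts the given XML text, false if it raises a SAXException.
    public static boolean parses(String xml) throws Exception {
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                    new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // U+1D4AA written as its UTF-16 surrogate pair
        String s = "\uD835\uDCAA";

        // Number of UTF-16 code units, and the single code point they encode
        System.out.println("code units: " + s.length());
        System.out.println("code point: U+"
                + Integer.toHexString(s.codePointAt(0)).toUpperCase());

        // Decimal values of the high and low surrogates
        System.out.println("surrogates: " + (int) s.charAt(0) + ", " + (int) s.charAt(1));

        // Assumed faulty form: one character reference per surrogate
        System.out.println("per-surrogate refs parse: "
                + parses("<a>&#55349;&#56490;</a>"));

        // Well-formed form: one character reference for the whole code point
        System.out.println("code-point ref parses:    "
                + parses("<a>&#x1D4AA;</a>"));
    }
}
```

If the first parse fails and the second succeeds, that would be consistent with the serializer escaping surrogates individually rather than escaping the full code point or writing the character directly as UTF-8. Inspecting `/tmp/test.xmi` around column 350 should confirm or rule this out.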
Please tell me if you need more information.