Details
Bug
Status: Resolved
Major
Resolution: Fixed
2.3.1SDK
None
None
Description
CasToInlineXml inserts formatting whitespace (indentation) between adjacent XML elements. For example, for a single-character document with a single annotation covering that one character, it writes:
<?xml version="1.0" encoding="UTF-8"?> <Document> <uima.tcas.DocumentAnnotation sofa="Sofa" begin="0" end="1" language="x-unspecified"> <uima.tcas.Annotation sofa="Sofa" begin="0" end="1"> </uima.tcas.Annotation> </uima.tcas.DocumentAnnotation> </Document>
I think it should instead write everything on a single line, with no whitespace added between elements, that is:
<?xml version="1.0" encoding="UTF-8"?> <Document><uima.tcas.DocumentAnnotation sofa="Sofa" begin="0" end="1" language="x-unspecified"><uima.tcas.Annotation sofa="Sofa" begin="0" end="1"> </uima.tcas.Annotation></uima.tcas.DocumentAnnotation></Document>
I believe this could be fixed by replacing the line:
XMLSerializer sax2xml = new XMLSerializer(byteArrayOutputStream);
with the line:
XMLSerializer sax2xml = new XMLSerializer(byteArrayOutputStream, false);
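To illustrate what such a formatting flag changes, here is a minimal JDK-only sketch. It does not use the UIMA XMLSerializer; it uses the standard javax.xml.transform API as an analogy, and the class name FormattingFlagSketch, the method serialize, and the element names are hypothetical. With the flag off, nothing is inserted between elements; with it on, the serializer adds line breaks (and, depending on the JDK, indentation).

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class FormattingFlagSketch {

    /** Serializes a tiny two-element document, with or without formatting. */
    static String serialize(boolean formatted) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = doc.createElement("Document");
            Element annotation = doc.createElement("Annotation");
            annotation.setAttribute("begin", "0");
            annotation.setAttribute("end", "1");
            annotation.setTextContent("x"); // the one-character document text
            root.appendChild(annotation);
            doc.appendChild(root);

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            // This is the analogue of the boolean passed to the serializer constructor.
            transformer.setOutputProperty(OutputKeys.INDENT, formatted ? "yes" : "no");

            ByteArrayOutputStream out = new ByteArrayOutputStream();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (Exception e) {
            throw new IllegalStateException("serialization failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize(false));
        System.out.println(serialize(true));
    }
}
```

The exact whitespace produced in the formatted case differs between JDK versions, so only the unformatted output should be relied on to be stable.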
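I think it's a bug that CasToInlineXml changes the character offsets: the inserted whitespace becomes part of the text between the tags, so the begin and end attributes no longer line up with it. That said, I would also be happy if there were an alternate constructor or a method on CasToInlineXml that allowed disabling the formatting.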
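A small sketch of why the offsets shift, assuming a consumer recovers the document text by stripping the tags from the inline XML. The class InlineXmlOffsets, the method textContent, the shortened element name A, and the four-space indent are all hypothetical and chosen only for illustration.

```java
public class InlineXmlOffsets {

    /**
     * Removes every tag and returns the remaining character data.
     * Simplified: no entity decoding, and attribute values must not contain '>'.
     */
    static String textContent(String inlineXml) {
        return inlineXml.replaceAll("<[^>]*>", "");
    }

    public static void main(String[] args) {
        // The document text is a single space, annotated with begin=0, end=1.
        String compact = "<Document><A begin=\"0\" end=\"1\"> </A></Document>";
        String indented = "<Document>\n    <A begin=\"0\" end=\"1\"> </A>\n</Document>";

        // Compact form: the recovered text is exactly the original one character.
        System.out.println(textContent(compact).length());   // prints 1

        // Indented form: the newline and indent precede the annotated character,
        // so it now sits at offset 5 instead of offset 0.
        System.out.println(textContent(indented).length());  // prints 7
    }
}
```

In the indented case the attributes still say begin=0 and end=1, but offset 0 of the recovered text is now a newline that was never in the original document.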