Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.1SDK
    • Fix Version/s: 2.6.0SDK
    • Component/s: None
    • Labels:
      None

      Description

      CasToInlineXml inserts indentation whitespace between adjacent XML elements. For example, for a single-character document with a single annotation covering that one character, it writes:

      <?xml version="1.0" encoding="UTF-8"?>
      <Document>
          <uima.tcas.DocumentAnnotation sofa="Sofa" begin="0" end="1" language="x-unspecified">
              <uima.tcas.Annotation sofa="Sofa" begin="0" end="1"> </uima.tcas.Annotation>
          </uima.tcas.DocumentAnnotation>
      </Document>
      

      I think it should instead write everything in a single line, that is:

      <?xml version="1.0" encoding="UTF-8"?>
      <Document><uima.tcas.DocumentAnnotation sofa="Sofa" begin="0" end="1" language="x-unspecified"><uima.tcas.Annotation sofa="Sofa" begin="0" end="1"> </uima.tcas.Annotation></uima.tcas.DocumentAnnotation></Document>
      

      I believe this could be fixed by replacing the line:

      XMLSerializer sax2xml = new XMLSerializer(byteArrayOutputStream);
      

      with the line:

      XMLSerializer sax2xml = new XMLSerializer(byteArrayOutputStream, false);
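
For illustration, the effect of such a formatting flag can be reproduced with the JDK's own SAX-to-XML serialization, without UIMA on the classpath. This is only a sketch: `IndentToggle` and `serialize` are hypothetical names, the element names are shortened, and it is an assumption that the boolean passed to XMLSerializer behaves like `OutputKeys.INDENT` does here.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical JDK-only analogue; this is NOT the UIMA XMLSerializer.
final class IndentToggle {

    /** Serializes a tiny nested document from SAX events, with or without indentation. */
    static String serialize(boolean formatted) {
        try {
            SAXTransformerFactory factory =
                    (SAXTransformerFactory) TransformerFactory.newInstance();
            TransformerHandler handler = factory.newTransformerHandler();

            // The only difference between the two modes is the INDENT property.
            Transformer transformer = handler.getTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, formatted ? "yes" : "no");
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

            StringWriter out = new StringWriter();
            handler.setResult(new StreamResult(out));

            // Emit <Document><Annotation> </Annotation></Document> as SAX events.
            AttributesImpl noAttributes = new AttributesImpl();
            handler.startDocument();
            handler.startElement("", "Document", "Document", noAttributes);
            handler.startElement("", "Annotation", "Annotation", noAttributes);
            handler.characters(new char[] {' '}, 0, 1); // the one-character document text
            handler.endElement("", "Annotation", "Annotation");
            handler.endElement("", "Document", "Document");
            handler.endDocument();
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Serialization failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("formatted:   [" + serialize(true) + "]");
        System.out.println("unformatted: [" + serialize(false) + "]");
    }
}
```

With indentation switched off, the elements come out adjacent to each other, so the only text between the tags is the document text itself.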
      

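      I think it's a bug that CasToInlineXml changes the character offsets: the inserted whitespace becomes part of the text between the tags, so the annotation offsets no longer line up with the output text. I would also be happy with an alternate constructor or a method on CasToInlineXml that allows disabling the formatting.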
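
The offset problem can be checked with a small JDK-only sketch that strips the tags from both outputs above and compares what is left with the original one-character document. `InlineXmlOffsets` and `recoveredText` are hypothetical names, not UIMA API, and the XML declaration is omitted because the input is parsed from a string.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of UIMA: recovers the document text from inline XML.
final class InlineXmlOffsets {

    // The single-line output proposed in this issue.
    static final String COMPACT =
            "<Document><uima.tcas.DocumentAnnotation sofa=\"Sofa\" begin=\"0\" end=\"1\" language=\"x-unspecified\">"
            + "<uima.tcas.Annotation sofa=\"Sofa\" begin=\"0\" end=\"1\"> </uima.tcas.Annotation>"
            + "</uima.tcas.DocumentAnnotation></Document>";

    // The indented output currently produced.
    static final String INDENTED =
            "<Document>\n"
            + "    <uima.tcas.DocumentAnnotation sofa=\"Sofa\" begin=\"0\" end=\"1\" language=\"x-unspecified\">\n"
            + "        <uima.tcas.Annotation sofa=\"Sofa\" begin=\"0\" end=\"1\"> </uima.tcas.Annotation>\n"
            + "    </uima.tcas.DocumentAnnotation>\n"
            + "</Document>";

    /** Returns the text that remains once all element tags are removed. */
    static String recoveredText(String inlineXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(inlineXml)));
            // getTextContent concatenates every text node, including indentation whitespace.
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse inline XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("compact text length:  " + recoveredText(COMPACT).length());
        System.out.println("indented text length: " + recoveredText(INDENTED).length());
    }
}
```

Only the single-line form gives back text of length 1, which is what the annotation's begin="0" end="1" offsets assume.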

            People

            • Assignee:
              schor Marshall Schor
              Reporter:
              steven.bethard Steven Bethard
            • Votes:
              0
              Watchers:
              3