UIMA-2101

CasToInlineXml adds whitespace

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.1SDK
    • Fix Version/s: 2.6.0SDK
    • Component/s: None
    • Labels: None

    Description

      CasToInlineXml adds indentation between adjacent XML elements. For example, for a single-character document with a single annotation covering that one character, it writes:

      <?xml version="1.0" encoding="UTF-8"?>
      <Document>
          <uima.tcas.DocumentAnnotation sofa="Sofa" begin="0" end="1" language="x-unspecified">
              <uima.tcas.Annotation sofa="Sofa" begin="0" end="1"> </uima.tcas.Annotation>
          </uima.tcas.DocumentAnnotation>
      </Document>
      

      I think it should instead write everything on a single line, that is:

      <?xml version="1.0" encoding="UTF-8"?>
      <Document><uima.tcas.DocumentAnnotation sofa="Sofa" begin="0" end="1" language="x-unspecified"><uima.tcas.Annotation sofa="Sofa" begin="0" end="1"> </uima.tcas.Annotation></uima.tcas.DocumentAnnotation></Document>
      

      I believe this could be fixed by replacing the line:

      XMLSerializer sax2xml = new XMLSerializer(byteArrayOutputStream);
      

      with the line:

      XMLSerializer sax2xml = new XMLSerializer(byteArrayOutputStream, false);
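
The effect of such a formatting flag can be illustrated without UIMA on the classpath. The following is only a sketch using the JDK's own `javax.xml.transform` API as a stand-in for UIMA's `XMLSerializer`; the class name `IndentToggleSketch`, the method `serialize`, and the shortened element name `Annotation` are made up for this illustration and are not part of UIMA.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical stand-in: JDK serializer only, not UIMA's XMLSerializer.
public class IndentToggleSketch {

    // Serializes a tiny inline-XML-like document, with or without indentation.
    public static String serialize(boolean formatted) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = doc.createElement("Document");
            Element ann = doc.createElement("Annotation");
            ann.setAttribute("begin", "0");
            ann.setAttribute("end", "1");
            ann.setTextContent(" "); // the one-character document text
            root.appendChild(ann);
            doc.appendChild(root);

            Transformer t = TransformerFactory.newInstance().newTransformer();
            // Leave out the XML declaration so only the element markup is compared.
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            // This property plays the role of the boolean formatting flag.
            t.setOutputProperty(OutputKeys.INDENT, formatted ? "yes" : "no");

            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("serialization failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("formatted:");
        System.out.println(serialize(true));
        System.out.println("unformatted:");
        System.out.println(serialize(false));
    }
}
```

With indentation switched off, the elements are written back to back, which is the behavior requested here. The exact whitespace produced with indentation switched on may vary between JDK versions.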
      

      I think it's a bug that CasToInlineXml changes the character offsets, but I would also be happy if there were an alternate constructor or a method on CasToInlineXml that allowed disabling the formatting.
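
To make the offset concern concrete, here is a small sketch that strips the tags from both forms of the output and compares the remaining character data. It assumes a consumer that recovers the document text by removing tags; the class `OffsetDriftSketch`, its constants, and the shortened element name `Annotation` are hypothetical and used only for illustration.

```java
// Hypothetical illustration of how indentation shifts character offsets.
public class OffsetDriftSketch {

    // Indented form: line breaks and spaces sit between the elements.
    public static final String INDENTED =
            "<Document>\n"
          + "    <Annotation begin=\"0\" end=\"1\"> </Annotation>\n"
          + "</Document>";

    // Single-line form: the only character data is the document text itself.
    public static final String COMPACT =
            "<Document><Annotation begin=\"0\" end=\"1\"> </Annotation></Document>";

    // Removes everything that looks like a tag, leaving only character data.
    // A regex is enough here because the attribute values contain no '>'.
    public static String stripTags(String inlineXml) {
        return inlineXml.replaceAll("<[^>]*>", "");
    }

    public static void main(String[] args) {
        String fromCompact = stripTags(COMPACT);
        String fromIndented = stripTags(INDENTED);
        // The original document is one character long, so begin=0/end=1
        // only lines up with the text recovered from the compact form.
        System.out.println("compact text length:  " + fromCompact.length());
        System.out.println("indented text length: " + fromIndented.length());
    }
}
```

In the indented form, the annotation's `begin="0"` and `end="1"` no longer point at the annotated character, because the recovered text now starts with a line break and indentation.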

      Attachments

        1. UIMA-2101-eckart-20110329.patch
          15 kB

          People

            Assignee: Marshall Schor
            Reporter: Steven Bethard
            Votes: 0
            Watchers: 3
