UIMA / UIMA-1771

CasToInlineXml truncates attributes

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.3.1SDK
    • Component/s: None
    • Labels:
      None

      Description

      org.apache.uima.util.CasToInlineXml has a hard-coded limit of 64 characters for attribute values. This makes inline XML suitable only for debugging, and the documentation contains no warning to this effect.

      Given that this component is part of uima-core (and not just example code), I would expect such pitfalls to be at least documented. A better solution would be to make the truncation optional or configurable, so that inline XML could be used as a basis for further processing outside UIMA.
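
      The configurable truncation proposed above could look roughly like the following. This is a minimal sketch and not the UIMA implementation: the class AttributeValueFormatter, its NO_LIMIT constant, and the format method are hypothetical names introduced only for illustration. Only the default of 64 characters comes from this report.

```java
/**
 * Hypothetical helper (not part of UIMA) showing how an attribute-value
 * length limit could be made configurable instead of hard-coded.
 */
final class AttributeValueFormatter {

    /** Limit described in this issue report. */
    static final int DEFAULT_LIMIT = 64;

    /** Sentinel meaning "never truncate" (an assumption of this sketch). */
    static final int NO_LIMIT = -1;

    private final int maxLength;

    AttributeValueFormatter(int maxLength) {
        if (maxLength < NO_LIMIT) {
            throw new IllegalArgumentException("maxLength must be >= -1, got " + maxLength);
        }
        this.maxLength = maxLength;
    }

    /** Returns the value unchanged, or cut to maxLength characters if a limit is set. */
    String format(String value) {
        if (value == null) {
            return "";
        }
        if (maxLength == NO_LIMIT || value.length() <= maxLength) {
            return value;
        }
        return value.substring(0, maxLength);
    }
}
```

      A caller that needs lossless output for downstream processing would construct the formatter with NO_LIMIT, while debugging output could keep DEFAULT_LIMIT.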

        Activity

        Marshall Schor added a comment -

        This routine has other limitations as well. For instance, a feature structure can have slots whose values are references to other feature structures. These are not handled by this routine either.

        It also replaces certain characters in the document's subject-of-analysis text fields with blanks (see method replaceInvalidXmlChars). It also presumes that the CAS's subject of analysis is a text string, which is not always the case. If the CAS doesn't have a document text, then this method will throw a NullPointerException.
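
        To illustrate the two behaviors described above, here is a minimal sketch of blanking out characters that are not legal in XML 1.0, with an explicit check for missing document text. It is not the code of replaceInvalidXmlChars, which may use a different character test; the class XmlTextSketch and its methods are hypothetical. Replacing each invalid character with a single blank keeps the text length unchanged, so annotation offsets would still line up.

```java
/**
 * Hypothetical sketch (not UIMA code): replace characters that XML 1.0
 * does not allow with blanks, and fail clearly when there is no text.
 */
final class XmlTextSketch {

    static String blankOutInvalidXmlChars(String text) {
        if (text == null) {
            // Explicit failure instead of an unexplained NullPointerException.
            throw new IllegalArgumentException("no document text to serialize");
        }
        char[] chars = text.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            char c = chars[i];
            // A well-formed surrogate pair encodes a supplementary character,
            // which XML 1.0 allows; keep both halves.
            if (Character.isHighSurrogate(c)
                    && i + 1 < chars.length
                    && Character.isLowSurrogate(chars[i + 1])) {
                i++;
                continue;
            }
            if (!isValidXmlChar(c)) {
                chars[i] = ' ';
            }
        }
        return new String(chars);
    }

    /** XML 1.0 Char production, restricted to the Basic Multilingual Plane. */
    static boolean isValidXmlChar(char c) {
        return c == 0x9 || c == 0xA || c == 0xD
                || (c >= 0x20 && c <= 0xD7FF)
                || (c >= 0xE000 && c <= 0xFFFD);
    }
}
```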

        It will also fail to faithfully write out overlapping annotations, because those cannot be represented as inline XML.
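
        The overlap problem can be made concrete with a small check. Nested spans map to nested XML elements, but two spans that cross (each starts or ends inside the other without containing it) cannot both be written as well-formed inline tags. The sketch below detects such crossings; the Span class and the method names are hypothetical and are not taken from UIMA.

```java
import java.util.List;

/** Hypothetical sketch (not UIMA code): find spans that cannot nest as XML elements. */
final class OverlapSketch {

    /** A half-open character range [begin, end). */
    static final class Span {
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    /** True if the spans partially overlap, so that neither contains the other. */
    static boolean cross(Span a, Span b) {
        return (a.begin < b.begin && b.begin < a.end && a.end < b.end)
                || (b.begin < a.begin && a.begin < b.end && b.end < a.end);
    }

    /** Simple pairwise check over all spans; quadratic, which is fine for a sketch. */
    static boolean hasCrossingSpans(List<Span> spans) {
        for (int i = 0; i < spans.size(); i++) {
            for (int j = i + 1; j < spans.size(); j++) {
                if (cross(spans.get(i), spans.get(j))) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

        A caller could run such a check before serializing and fall back to a standoff format such as XMI when it reports a crossing.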

        I guess I'm in favor of documenting these issues in the Javadocs.

        Marshall Schor added a comment -

        Updated the Javadocs to say that this only creates an approximate representation.


          People

          • Assignee:
            Unassigned
            Reporter:
            Jens Grivolla
          • Votes:
            0
            Watchers:
            0

            Dates

            • Created:
              Updated:
              Resolved:
