  1. UIMA
  2. UIMA-6128

Allow XMI to be optionally serialized with XML 1.1 instead of only 1.0

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.11.0SDK, 3.2.0SDK
    • Component/s: UIMA
    • Labels: None

    Description

      Some Unicode characters cannot be represented in XML 1.0, so serializing a CAS to XMI can require normalizing or cleaning up the text first. However, requirements do not always allow such characters to be removed from the CAS. Normalization or cleanup can also be impossible without fully reprocessing the data, for example when converting CASes already stored as compressed binaries to XMI. Being able to optionally select XML 1.1 instead of the default XML 1.0 would be an easier way for some users to avoid many of these Unicode issues.
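
      To illustrate which characters are affected, here is a minimal sketch based on the Char productions of the XML 1.0 and XML 1.1 specifications. The class and method names are hypothetical and are not part of UIMA. Note that XML 1.1 admits most control characters only as character references, and U+0000 remains forbidden in both versions.

```java
/** Hypothetical helper, not part of UIMA: classifies code points per the XML specs. */
final class XmlCharCheck {

    /** True if the code point matches the Char production of XML 1.0. */
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** True if the code point matches the Char production of XML 1.1. */
    static boolean isXml11Char(int cp) {
        return (cp >= 0x1 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns the index of the first character not allowed in XML 1.0, or -1 if none. */
    static int firstInvalidForXml10(String text) {
        for (int i = 0; i < text.length(); ) {
            // An unpaired surrogate is returned as-is here and fails the range check.
            int cp = text.codePointAt(i);
            if (!isXml10Char(cp)) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }

    public static void main(String[] args) {
        // A form feed (U+000C) is rejected by XML 1.0 but accepted by XML 1.1.
        String text = "form\ffeed";
        System.out.println("First XML 1.0 violation at index: " + firstInvalidForXml10(text));
        System.out.println("U+000C valid in XML 1.1: " + isXml11Char(0x0C));
    }
}
```

      A check like this could be run over the document text and string feature values before serialization to decide whether the default XML 1.0 output is sufficient.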

      See also discussion on the UIMA mailing list:

      https://lists.apache.org/thread.html/7f8124b7be9ea20ab21dc616243e5661a0b7668a856532031fda71e3@%3Cuser.uima.apache.org%3E

      This feature request suggests introducing an additional SerialFormat, e.g. XMI_1_1, which can be selected as the format parameter in the CasIOUtils.save methods.

       

       

      Attachments

        1. OddFeatureText.java
          2 kB
          Rune Stilling
        2. SimpleTypeSystem_TS.xml
          3 kB
          Rune Stilling

          People

            Assignee: Marshall Schor (schor)
            Reporter: Mario Juric (mjuric)
            Votes: 0
            Watchers: 4

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 1h