  1. UIMA
  2. UIMA-4111

Change how default bag indices are created

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.7.0SDK
    • Component/s: Core Java Framework
    • Labels:
      None

      Description

      UIMA-173 added the concept of a universal default bag index: an index created for a type if no other index is defined for that type. That Jira links to the motivation, which makes clear that the intent was to simplify how UIMA works and to allow every feature structure that was added to the indexes (via addToIndexes()) to be retrieved.

      UIMA-297 corrected some anomalies in the original implementation.

      This Jira corrects the edge cases that arise when only Set indices are defined for a type. A Set index does not add the second or subsequent FSs whose key values match an existing entry under the Set's comparator, so the original motivation for the default bag index is thwarted in this case. This has caused several edge-case issues; for example, a special note about this surprising behavior had to be included in the UIMA documentation.

      More recently, another edge case was discovered. Suppose an annotator is contained in an aggregate whose index definitions are sufficient to ensure a non-Set index for type T, and that annotator is then deployed as a remote service that has only a Set index for type T. Assume the client has added 100 instances of type T to the indices. The CAS is serialized to the remote, which deserializes it and performs 100 add-to-indices operations, of which perhaps 50 succeed and the other 50 are no-ops (due to the Set equivalence). When the remote CAS is returned, only 50 appear in the index back at the client. This goes against the UIMA principle that remoting a component should not affect semantics, where possible. It is also a surprising effect that most users will not expect, and an "unstable" one: if a pipeline "assembler" (knowing little about the "internals" of the components) were to add a component to the remote that included a non-Set index for type T, the pipeline would start behaving differently and no longer lose any indexed items.
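
      The round trip described above can be modeled in a few lines of plain Java. This is only a sketch of the semantics, not UIMA code: the class SetIndexRoundTrip, the Fs stand-in, and the use of an ArrayList for a bag and a TreeSet for a Set index are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

public class SetIndexRoundTrip {
    /** Hypothetical stand-in for a feature structure: a unique id plus one key feature. */
    static final class Fs {
        final int id;
        final int key;
        Fs(int id, int key) { this.id = id; this.key = key; }
    }

    /** Models a Set index comparator that looks only at the key feature. */
    static final Comparator<Fs> BY_KEY = Comparator.comparingInt((Fs fs) -> fs.key);

    /** Client side: a bag keeps every FS that is added to it. */
    static List<Fs> clientBag(int total, int distinctKeys) {
        List<Fs> bag = new ArrayList<>();
        for (int i = 0; i < total; i++) {
            bag.add(new Fs(i, i % distinctKeys));
        }
        return bag;
    }

    /** Remote side: re-adding the same FSs to a Set-only index drops key duplicates. */
    static int sizeAfterRemoteSetIndex(List<Fs> sent) {
        TreeSet<Fs> setIndex = new TreeSet<>(BY_KEY);
        for (Fs fs : sent) {
            setIndex.add(fs); // a no-op when an FS with an equal key is already present
        }
        return setIndex.size();
    }

    public static void main(String[] args) {
        List<Fs> sent = clientBag(100, 50);
        System.out.println("indexed at client: " + sent.size());
        System.out.println("indexed after remote: " + sizeAfterRemoteSetIndex(sent));
    }
}
```

      With 100 FSs spread over 50 distinct key values, the modeled remote keeps only 50, which is the loss the client would observe when the CAS comes back.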

      The converse would also be true: if the remote had no indices defined for type T, then add-to-indices for type T would be recorded in lazily created default bag indices, and those events would be sent back to the client. If an assembler were to now add a component that contained only a Set index definition for type T, the pipeline would suddenly start dropping FSs that were excluded by the Set comparator.

      For all these reasons (which emerged in discussions with Edward Epstein and Adam Lally), and because of the original intent of the default bag index (described in some detail in the mail archives linked from the two Jiras above), this Jira changes when the default bag index is created: it is now created whenever some add-to-indices event would otherwise not record an addition (e.g., when there are no indices, or only Set indices and the FS matches elements already in the Sets).
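
      One way to read the new rule is sketched below in plain Java. It is a model under stated assumptions, not the actual UIMA implementation: DefaultBagSketch, TypeIndexes, Fs, and every method name here are hypothetical, and the sketch covers only a type that has zero or more Set indices and no user-defined bag or sorted index.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class DefaultBagSketch {
    /** Hypothetical stand-in for a feature structure: a unique id plus one key feature. */
    static final class Fs {
        final int id;
        final int key;
        Fs(int id, int key) { this.id = id; this.key = key; }
    }

    static final Comparator<Fs> BY_KEY = Comparator.comparingInt((Fs fs) -> fs.key);

    /** Hypothetical per-type index holder (assumption: only Set indices can be defined). */
    static final class TypeIndexes {
        private final List<TreeSet<Fs>> setIndexes = new ArrayList<>();
        private List<Fs> defaultBag; // created lazily, only when an add would otherwise be lost

        void defineSetIndex(Comparator<Fs> comparator) {
            setIndexes.add(new TreeSet<>(comparator));
        }

        void addToIndexes(Fs fs) {
            boolean recorded = false;
            for (TreeSet<Fs> set : setIndexes) {
                recorded |= set.add(fs); // false when an equal key is already in this Set
            }
            if (!recorded) {
                // No index recorded the addition: fall back to the default bag.
                if (defaultBag == null) {
                    defaultBag = new ArrayList<>();
                }
                defaultBag.add(fs);
            }
        }

        int defaultBagSize() {
            return defaultBag == null ? 0 : defaultBag.size();
        }

        /** Everything that was added: the Set contents (deduplicated by identity) plus the default bag. */
        List<Fs> allIndexed() {
            Set<Fs> seen = Collections.newSetFromMap(new IdentityHashMap<Fs, Boolean>());
            List<Fs> out = new ArrayList<>();
            for (TreeSet<Fs> set : setIndexes) {
                for (Fs fs : set) {
                    if (seen.add(fs)) {
                        out.add(fs);
                    }
                }
            }
            if (defaultBag != null) {
                out.addAll(defaultBag);
            }
            return out;
        }
    }

    public static void main(String[] args) {
        TypeIndexes setOnly = new TypeIndexes();
        setOnly.defineSetIndex(BY_KEY);
        for (int i = 0; i < 100; i++) {
            setOnly.addToIndexes(new Fs(i, i % 50));
        }
        System.out.println("retrievable: " + setOnly.allIndexed().size());
        System.out.println("held in default bag: " + setOnly.defaultBagSize());
    }
}
```

      In this model, the 50 FSs that the Set index rejects land in the lazily created default bag, so all 100 remain retrievable. Whether the real implementation routes only the rejected FSs or every FS into the default bag is not stated in this issue.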

      This change will affect documentation, so update that too. In particular, the NOTE in this section http://uima.apache.org/d/uimaj-2.6.0/tutorials_and_users_guides.html#ugr.tug.aae.reading_results_previous_annotators will no longer apply.

      The behavior of getAllIndexedFS(type) will change - it will no longer have an exception for the special case where only Set indices were defined for the type.

      Because it seems extremely unlikely that the previous behavior was being depended upon, there is no global flag to restore it.

          Activity

          schor Marshall Schor added a comment -

          Also changed the type of the internal index used for Sofa FSs in the base view to be a Bag, rather than a Set. This should be marginally smaller and faster, and will avoid the new code which would be creating a Default Bag index for this in the base view.


            People

            • Assignee:
              schor Marshall Schor
              Reporter:
              schor Marshall Schor