  1. UIMA
  2. UIMA-4059

Checking for incorrect key modifications

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.7.0SDK
    • Component/s: Core Java Framework
    • Labels:
      None

      Description

      Address the issue raised in Jira UIMA-4049 as follows. (Note: this implementation has been superseded by UIMA-4135.) Add an optional check that runs on every set of a Feature value: if that Feature is used as a key in any Sort or Set index, and the Feature Structure is currently in any index in any View, an exception is thrown.

      This additional check is normally disabled, but can be enabled by specifying the JVM property -Duima.check_fs_update_corrupts_index.
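
      The check described above can be sketched as follows. This is a minimal, self-contained model and not the UIMA implementation: ToyFs, ToyCas, and all of their methods are hypothetical names introduced for illustration, and reading the flag as "property present means enabled" is an assumption about how a value-less -D switch would be interpreted.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for a Feature Structure; not the UIMA class. */
final class ToyFs {
    final Map<String, Object> features = new HashMap<>();
}

/** Hypothetical model of a CAS whose views track their indexed FSs by identity. */
final class ToyCas {
    // Features used as keys in at least one Sort or Set index (Bag indexes have no keys).
    private final Set<String> keyFeatures = new HashSet<>();
    // View name -> FSs currently indexed in that view.
    private final Map<String, Set<ToyFs>> indexedByView = new HashMap<>();
    private final boolean checkEnabled;

    ToyCas(boolean checkEnabled, String... keyFeatureNames) {
        this.checkEnabled = checkEnabled;
        Collections.addAll(keyFeatures, keyFeatureNames);
    }

    /** Assumption: the switch counts as enabled whenever the property is defined at all. */
    static boolean checkRequestedOnCommandLine() {
        return System.getProperty("uima.check_fs_update_corrupts_index") != null;
    }

    void addToIndexes(String view, ToyFs fs) {
        indexedByView
            .computeIfAbsent(view, v -> Collections.newSetFromMap(new IdentityHashMap<ToyFs, Boolean>()))
            .add(fs);
    }

    void removeFromIndexes(String view, ToyFs fs) {
        Set<ToyFs> indexed = indexedByView.get(view);
        if (indexed != null) {
            indexed.remove(fs);
        }
    }

    /** The potentially expensive part: every view has to be consulted. */
    boolean isIndexedInAnyView(ToyFs fs) {
        for (Set<ToyFs> indexed : indexedByView.values()) {
            if (indexed.contains(fs)) {
                return true;
            }
        }
        return false;
    }

    /** Sets a feature; with the check enabled, refuses to change a key of an indexed FS. */
    void setFeature(ToyFs fs, String feature, Object value) {
        if (checkEnabled && keyFeatures.contains(feature) && isIndexedInAnyView(fs)) {
            throw new IllegalStateException(
                "Feature '" + feature + "' is an index key and the FS is currently indexed");
        }
        fs.features.put(feature, value);
    }
}
```

      In this sketch, setting a key feature on an FS that has not yet been added to any view succeeds, while the same call after addToIndexes throws and leaves the old value in place.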

          Activity

          schor Marshall Schor added a comment -

          The tracking of which FSs are added to or removed from indices needs to be "per view", since it's perfectly OK to add the same FS to multiple views (if it is not a subtype of AnnotationBase). The testing of feature modifications has to check whether the FS is indexed in any view, a potentially more expensive operation. The overhead added to add/remove to/from index is probably too small to measure, but the overhead of testing whether an FS is in any index on every setting of a Feature value used in a key might be excessive. Since the most common pattern is to set several features of a new FS at once, it may pay to have a one-element cache of the last FS that was found not to be in any index, or to make this check separately omittable (like Java asserts: you could turn it on if you have some issue such as unexpected index behavior).

          schor Marshall Schor added a comment (edited) -

          If this were enabled, it would theoretically be possible to not throw an exception, but rather "make it work" by automatically removing the FS from the indices, making the modification, and then adding it back. (This won't work if the modification is made inside an iterator: the iterator would throw a ConcurrentModificationException, which is why this is only theoretically possible.)

          schor Marshall Schor added a comment -

          Bhavani pointed out that the test - remove - modify - add operation would need to be done per View, since any given FS could be indexed in some but not all views. A speedup would be possible for FSs that are subtypes of AnnotationBase, because those can only be indexed in one View: the view corresponding to the Sofa for that FS.

          schor Marshall Schor added a comment -

          The remove - modify - add sequence is only needed for Set and Sorted (not Bag) indices - Bag indices have no keys.

          schor Marshall Schor added a comment -

          Change this Jira to address only the additional optional check for modification of a key used in any index in any View. UIMA-3399 is the issue for the change in behavior for multiple addToIndexes calls for the same identical FS.

          schor Marshall Schor added a comment -

          This has evolved with the realization that the Framework ought to support protecting indices, since the remove / addback operation is potentially complex, and optimizable. Complex: the remove must cover all views where the item may be indexed; the remove can exclude bag indices (because they have no keys); and if allow-multiple-add-to-indices is enabled, a count needs to be maintained when removing and used when adding back (and multiple removes need to happen, until all identical instances are removed). Optimizable: the remove only needs to be done if the FS is in the index, and we can cache the fact that it's not in the index, which covers the most common cases; and the remove and add can skip bag indices.
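
          A rough sketch of the remove / addback sequence described above, including the per-view counts and the one-element "not indexed" cache. It is a toy model under stated assumptions, not the UIMA-4135 code: KeyedFs, SortedViewIndex, IndexProtector, and their methods are hypothetical names, each view is modeled as a single sorted index, and bag indices are left out entirely because they have no keys.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical FS with a single integer sort key; not the UIMA class. */
final class KeyedFs {
    int key;
    KeyedFs(int key) { this.key = key; }
}

/** Hypothetical sorted index for one view; the same FS may be added more than once. */
final class SortedViewIndex {
    private final TreeMap<Integer, List<KeyedFs>> byKey = new TreeMap<>();

    void add(KeyedFs fs) {
        byKey.computeIfAbsent(fs.key, k -> new ArrayList<>()).add(fs);
    }

    /** Removes every instance of fs, located via its CURRENT key; returns how many were removed. */
    int removeAll(KeyedFs fs) {
        List<KeyedFs> bucket = byKey.get(fs.key);
        if (bucket == null) {
            return 0;
        }
        int removed = 0;
        for (Iterator<KeyedFs> it = bucket.iterator(); it.hasNext();) {
            if (it.next() == fs) {
                it.remove();
                removed++;
            }
        }
        if (bucket.isEmpty()) {
            byKey.remove(fs.key);
        }
        return removed;
    }

    /** Index contents in key order. */
    List<KeyedFs> inOrder() {
        List<KeyedFs> out = new ArrayList<>();
        for (List<KeyedFs> bucket : byKey.values()) {
            out.addAll(bucket);
        }
        return out;
    }
}

/** Hypothetical helper performing remove - modify - addback across all views. */
final class IndexProtector {
    private final Map<String, SortedViewIndex> views = new LinkedHashMap<>();
    // One-element cache: the last FS that was found to be in no index.
    private KeyedFs lastNotIndexed;

    SortedViewIndex view(String name) {
        return views.computeIfAbsent(name, n -> new SortedViewIndex());
    }

    void add(String viewName, KeyedFs fs) {
        view(viewName).add(fs);
        if (lastNotIndexed == fs) {
            lastNotIndexed = null; // the cached fact is no longer true
        }
    }

    void setKey(KeyedFs fs, int newKey) {
        if (fs == lastNotIndexed) { // common case: a new FS still being filled in
            fs.key = newKey;
            return;
        }
        // Remove from every view while the old key is still valid, remembering the counts.
        Map<SortedViewIndex, Integer> removedCounts = new LinkedHashMap<>();
        for (SortedViewIndex index : views.values()) {
            int n = index.removeAll(fs);
            if (n > 0) {
                removedCounts.put(index, n);
            }
        }
        fs.key = newKey;
        if (removedCounts.isEmpty()) {
            lastNotIndexed = fs;
            return;
        }
        // Add back the same number of instances to each view it came from.
        for (Map.Entry<SortedViewIndex, Integer> e : removedCounts.entrySet()) {
            for (int i = 0; i < e.getValue(); i++) {
                e.getKey().add(fs);
            }
        }
    }
}
```

          The removal has to happen before the key changes, because the toy index (like a real sorted index) locates an entry through its current key value.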

          Support for this is in UIMA-4135. Instead of throwing exceptions, the check would be better if it operated in "automatic" mode - where it automatically does the right remove/addback operation, and (optionally) gives a report.

          This design would have the following modes:

          1) high performance - no runtime-checks for corruption, as was the case prior to 2.7.0. But the user could implement protectIndices blocks or try-finally blocks to do the optimized remove/addbacks.

          2) automatic, with or without reporting. This would replace the earlier design of this Jira, so instead of throwing an exception, it would report a Warning, and then proceed to automatically "handle" it. Users not concerned with high performance could run with this mode without reporting. Users concerned with high performance would get the report, implement their own "fixes" to the reported issues (including implementing protectIndices blocks / try - finally blocks around the reported spots), and then, run in #1 high performance mode.

          Users running with automatic and with their own protectIndices blocks or try-finally blocks might be "rechecking" their pipeline to see if other things have crept in. To make this work, automatic would be "turned off" within a protected area.
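
          The two modes and the "turned off within a protected area" rule might look roughly like the following. This is a hypothetical model only: ProtectMode, Item, GuardedIndex, and the protect method are illustrative names, not the framework API. The model uses a single sorted index, holds at most one item per key, and does not support nested protected areas.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical names for the modes discussed in this comment. */
enum ProtectMode { HIGH_PERFORMANCE, AUTOMATIC, AUTOMATIC_WITH_REPORT }

/** Hypothetical indexed item with one integer key. */
final class Item {
    int key;
    Item(int key) { this.key = key; }
}

/** Hypothetical single sorted index with mode-dependent handling of key updates. */
final class GuardedIndex {
    private final ProtectMode mode;
    private final TreeMap<Integer, Item> sorted = new TreeMap<>(); // simplification: one item per key
    private final List<String> report = new ArrayList<>();
    private List<Item> deferred; // non-null only while inside a protected area

    GuardedIndex(ProtectMode mode) { this.mode = mode; }

    void add(Item item) { sorted.put(item.key, item); }

    /** Removes the item if it is found under its current key. */
    boolean remove(Item item) { return sorted.remove(item.key, item); }

    /** True if the index can still find the item under its current key. */
    boolean contains(Item item) { return sorted.get(item.key) == item; }

    Item first() { return sorted.isEmpty() ? null : sorted.firstEntry().getValue(); }

    List<String> report() { return report; }

    void setKey(Item item, int newKey) {
        if (deferred != null) {
            // Protected area: automatic handling is off. Remove at most once, add back at the end.
            if (remove(item)) {
                deferred.add(item);
            }
            item.key = newKey;
            return;
        }
        if (mode == ProtectMode.HIGH_PERFORMANCE) {
            item.key = newKey; // no runtime check; the caller is responsible
            return;
        }
        boolean wasIndexed = remove(item);
        if (wasIndexed && mode == ProtectMode.AUTOMATIC_WITH_REPORT) {
            report.add("Warning: key updated on indexed item, old key " + item.key);
        }
        item.key = newKey;
        if (wasIndexed) {
            add(item);
        }
    }

    /** Runs the body as a protected area; removed items are added back in the finally clause. */
    void protect(Runnable body) {
        deferred = new ArrayList<>();
        try {
            body.run();
        } finally {
            List<Item> toAddBack = deferred;
            deferred = null;
            for (Item item : toAddBack) {
                add(item);
            }
        }
    }
}
```

          In this model, an unprotected key update in high performance mode leaves the item unreachable under its new key, which is the corruption the other mode and the protected area are meant to prevent. Several key updates inside one protected area cost a single remove and a single addback, with no report entries.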

          schor Marshall Schor added a comment -

          This implementation has been superseded.


            People

            • Assignee:
              schor Marshall Schor
              Reporter:
              schor Marshall Schor