UIMA / UIMA-4146

Support Snapshot iterators for FSIndexes

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.7.0SDK
    • Component/s: Core Java Framework
    • Labels:
      None

      Description

      Implementing "protectIndices" could have the consequence that the framework removes some updated FSs from the indices and adds them back later. If user code is iterating at that time, it might get unexpected ConcurrentModificationExceptions because of this.

      Extend the iterators to include "snapshot" iterators, which take a snapshot of the index contents at iterator creation time and then iterate over that snapshot; this allows the iterator to avoid ConcurrentModificationExceptions.

      Do this in a way that continues to support the "extended for" style of iterating, where you can write

      for (MyAnnotation a : fsIndexProducingSnapshotIterators) { ... }
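
      The idea can be sketched in plain Java, without any UIMA classes. In this sketch, SketchIndex and SnapshotDemo are hypothetical names, a TreeSet stands in for the real index structure, and the remove-then-add inside the loop is only an assumed imitation of what "protectIndices" might do to an updated FS. It is an illustration of the snapshot technique, not the framework's implementation.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical demo class; not part of UIMA. */
final class SnapshotDemo {
    /** Removes and re-adds every element while iterating over a snapshot. */
    static List<Integer> visitWithSnapshot(SketchIndex<Integer> index) {
        List<Integer> visited = new ArrayList<>();
        for (Integer item : index.withSnapshotIterators()) {
            index.remove(item); // assumed stand-in for the framework removing an updated FS
            index.add(item);    // ... and adding it back afterwards
            visited.add(item);
        }
        return visited;
    }

    /** Applies the same modifications under a live iterator; true if it failed. */
    static boolean liveIterationFails(SketchIndex<Integer> index) {
        try {
            for (Integer item : index) {
                index.remove(item);
                index.add(item);
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        SketchIndex<Integer> index = new SketchIndex<>();
        index.add(3);
        index.add(1);
        index.add(2);
        System.out.println("snapshot visit: " + visitWithSnapshot(index));
        System.out.println("live iteration failed: " + liveIterationFails(index));
    }
}

/** Hypothetical stand-in for an index; this is not the UIMA FSIndex API. */
final class SketchIndex<T extends Comparable<T>> implements Iterable<T> {
    private final TreeSet<T> entries = new TreeSet<>();

    void add(T item) { entries.add(item); }

    void remove(T item) { entries.remove(item); }

    int size() { return entries.size(); }

    /** Live iterator: fail-fast if the index changes during iteration. */
    @Override
    public Iterator<T> iterator() { return entries.iterator(); }

    /** Returns a view whose iterators each copy the contents at creation time. */
    Iterable<T> withSnapshotIterators() {
        return () -> new ArrayList<T>(entries).iterator();
    }
}
```

      Because the snapshot view is itself an Iterable, it drops into the "extended for" loop unchanged; the cost is one copy of the index contents per iterator.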
      

        Activity

        schor Marshall Schor added a comment -

        Thanks for expanding on this. I've been too bogged down in getting the new stuff to work (and along the way learning the details of the indexing / iterating / deserializing code) to focus more broadly...

        I want the snapshot stuff to work for both non-JCas and JCas approaches. Because UIMA Core already supports the extended-for for the FSIndex impl, I'm just piggy-backing on top of that. If you want "snapshot" style, you would:

        // instead of
        for (AnnotationFS a : cas.getAnnotationIndex(type)) { ... }
        // you would write
        for (AnnotationFS a : cas.getAnnotationIndex(type).withSnapshotIterators()) { ... }
        

        I think the difference vs the "select" approach might be that select materializes (as actual Java Objects) all the FSs before the iteration starts, while the snapshot stuff doesn't - it is just "copying" the index (as a set of IntVectors), although it is converting the "Set" form to a "Sorted" form internally (which is more efficient and ok because this special index is read-only).
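
        The distinction drawn above, copying only integer references versus materializing every Java object up front, can be sketched as follows. Everything here is an assumption for illustration: IntSnapshotSketch is a hypothetical class, a Set<Integer> stands in for the "Set" form of the index, a plain int[] stands in for an IntVector, sorting by id stands in for the real index ordering, and the materialize function stands in for whatever creates the Java object for an FS.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Set;
import java.util.function.IntFunction;

/** Hypothetical sketch; not the UIMA implementation. */
final class IntSnapshotSketch {
    /** Copies the ids of a set-style index into a sorted, read-only int array. */
    static int[] snapshot(Set<Integer> setIndex) {
        int[] ids = new int[setIndex.size()];
        int i = 0;
        for (int id : setIndex) {
            ids[i++] = id;
        }
        Arrays.sort(ids); // assumed ordering: by id, standing in for the index comparator
        return ids;
    }

    /** Iterates the snapshot, creating each Java object only when next() is called. */
    static <T> Iterator<T> lazyIterator(int[] ids, IntFunction<T> materialize) {
        return new Iterator<T>() {
            private int pos = 0;

            @Override
            public boolean hasNext() { return pos < ids.length; }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return materialize.apply(ids[pos++]);
            }
        };
    }

    public static void main(String[] args) {
        Set<Integer> live = new HashSet<>(Arrays.asList(42, 7, 19));
        int[] ids = snapshot(live);
        live.clear(); // later changes to the live index do not affect the snapshot
        Iterator<String> it = lazyIterator(ids, id -> "FS#" + id);
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}
```

        In this sketch the up-front cost is one int per indexed FS, and objects are created one at a time as the loop advances, whereas a fully materializing approach would build all of them before the first iteration step.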

        I haven't considered approaches to make things more capable for Generics, although I've seen some examples. That sounds like a good extension, maybe for the next release after this one... I think that (at least for me) this might take some "quality thought time", which I'm short of right now.

        rec Richard Eckart de Castilho added a comment -

        Well...

        UIMA Core currently has a quite limited approach to using generics, e.g. the indexes use "AnnotationFS" instead of "MyAnnotation".
        Your post suggests that you might want to introduce a new index API that works on JCas rather than on CAS level.

        UIMA Core already has a CAS-based API supporting extended for-loops (FSIndex extends Iterable and FSIterator extends Iterator):

        for (AnnotationFS a : cas.getAnnotationIndex(type)) { ... }
        

        uimaFIT offers a JCas-based type-safe API that looks like this:

        for (MyAnnotation a : select(jcas, MyAnnotation.class)) { ... }
        

        However, this doesn't operate on specific indexes, rather on the generic annotation index or potentially even on all indexed FSes.

        You might get some inspiration from these classes:

        org.apache.uima.fit.util.JCasUtil
        org.apache.uima.fit.util.FSCollectionFactory<T>
        

        ... and maybe also to some degree from this one, although it's not type-safe, because it doesn't use JCas:

        org.apache.uima.fit.util.CasUtil
        

        But actually, introducing a JCas API for indexes would probably be another issue unrelated to the support for snapshots, right?

        schor Marshall Schor added a comment -

        I wasn't, but perhaps due to ignorance. Should I take a look at these?

        rec Richard Eckart de Castilho added a comment -

        You may be aware of the "select*(CAS, ...)" methods from the uimaFIT CasUtil and JCasUtil classes. Are you planning to copy their approach over to the UIMA core?


          People

          • Assignee: schor Marshall Schor
          • Reporter: schor Marshall Schor
          • Votes: 0
          • Watchers: 2
