Details
- Type: Improvement
- Status: Closed
- Priority: Major
- Resolution: Fixed
Description
Iterating over the annotation index when even a moderate number of types is defined is dominated by the time spent checking the individual indexes for concurrent modification. The checks run on the iterators of all types at every step, even when the step only needs to advance a couple of them. Checking every iterator is linear in the number of subiterators, whereas the iteration itself can be implemented with logarithmic cost per step, e.g. with a binary heap.
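To illustrate the logarithmic bound, here is a minimal sketch of a heap-based merge over sorted subiterators. It is not the UIMA implementation: the class `HeapMergeSketch`, the `Cursor` helper, and the use of plain integers in place of annotations are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical illustration only; names are not taken from the UIMA code base. */
public class HeapMergeSketch {

    /** One sorted subiterator together with its current head element. */
    private static final class Cursor {
        final Iterator<Integer> source;
        int head;

        Cursor(Iterator<Integer> source) {
            this.source = source;
            this.head = source.next();
        }
    }

    /**
     * Merges k sorted lists into one sorted list. Each step removes the heap
     * root and possibly re-inserts it, which costs O(log k) and touches only
     * one subiterator.
     */
    public static List<Integer> merge(List<List<Integer>> sortedLists) {
        PriorityQueue<Cursor> heap =
            new PriorityQueue<>((a, b) -> Integer.compare(a.head, b.head));
        for (List<Integer> list : sortedLists) {
            if (!list.isEmpty()) {
                heap.add(new Cursor(list.iterator()));
            }
        }
        List<Integer> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Cursor smallest = heap.poll();
            out.add(smallest.head);
            if (smallest.source.hasNext()) {
                // Advance only the subiterator that produced this element.
                smallest.head = smallest.source.next();
                heap.add(smallest);
            }
        }
        return out;
    }
}
```

In this sketch a step never looks at the other k - 1 subiterators, so a per-step check that visits all of them would cost more than the step itself.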
The UIMA documentation and JavaDoc do not state that the iterators should always recognize concurrent modification (FSIterator JavaDoc states "Implementations of this interface are not required to be fail-fast. That is, if the iterator's collection is modified, the effects on the iterator are in general undefined."). It thus makes sense to reduce the number of iterators being tested for concurrent modification at each moveToNext() step.
The attached patch replaces the checkConcurrentModificationAll() call in FSIndexRepositoryImpl.PointerIterator.moveToNext() with concurrent modification checks on only the iterators used by that step. When the iterator becomes invalid, it additionally checks all involved iterators for modification. This should catch almost all concurrent modifications without the excessive overhead.
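The idea can be modeled roughly as follows. This is a simplified sketch, not the patch itself: `SelectiveCheckSketch`, `SubIndex`, and `Walker` are hypothetical names, the sub-indexes are walked one after another rather than merged, and modification is tracked with a plain counter.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

/** Hypothetical model of the idea; names are not taken from the UIMA code base. */
public class SelectiveCheckSketch {

    /** A sub-index that counts its own modifications. */
    public static final class SubIndex {
        public final List<Integer> items = new ArrayList<>();
        public int modCount = 0;

        public void add(int value) {
            items.add(value);
            modCount++;
        }
    }

    /** Walks the sub-indexes in order, remembering their counters at creation. */
    public static final class Walker {
        private final List<SubIndex> subs;
        private final int[] snapshot;
        private int sub = 0;            // sub-index currently in use
        private int pos = 0;            // position inside that sub-index
        public int checksPerformed = 0; // for demonstration only

        public Walker(List<SubIndex> subs) {
            this.subs = subs;
            this.snapshot = new int[subs.size()];
            for (int i = 0; i < subs.size(); i++) {
                snapshot[i] = subs.get(i).modCount;
            }
            skipExhausted();
        }

        public boolean isValid() {
            return sub < subs.size();
        }

        public int get() {
            return subs.get(sub).items.get(pos);
        }

        public void moveToNext() {
            if (!isValid()) {
                return;
            }
            check(sub); // check only the sub-index this step touches
            pos++;
            skipExhausted();
            if (!isValid()) {
                // Iteration just ended: check every involved sub-index once.
                for (int i = 0; i < subs.size(); i++) {
                    check(i);
                }
            }
        }

        private void skipExhausted() {
            while (sub < subs.size() && pos >= subs.get(sub).items.size()) {
                sub++;
                pos = 0;
            }
        }

        private void check(int i) {
            checksPerformed++;
            if (subs.get(i).modCount != snapshot[i]) {
                throw new ConcurrentModificationException();
            }
        }
    }
}
```

In this model, a modification to a sub-index that is not currently in use goes unnoticed until a later step touches it or the walk ends, which matches the "almost all" caveat above.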
In one of our performance tests, iterating over the annotation index with 140 types defined is more than twice as fast after the attached patch is applied.
Attachments
Issue Links
- is part of: UIMA-1366 Binary heap annotation iterator implementation (Closed)