Uploaded image for project: 'UIMA'
  1. UIMA
  2. UIMA-4003

add alternative int - int maps, and int sets for better space/time performance

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.6.0SDK
    • Fix Version/s: 2.7.0SDK
    • Component/s: Core Java Framework
    • Labels:
      None

      Description

      int - int maps and int sets are implemented in UIMA as either red-black trees (size = 5 words (4 words for sets) + 1 bit per item, search time = log 2 size (binary search), insert /removal can cause rebalancing tree), or as intVectors (like ArrayList<Integers> but doesn't wrap ints as Integers).

      For int - int maps, add a hash version (loses key "ordering"), which takes 3 - 6 words per item (avg 4.5 words - slightly smaller), and has O(1) performance (based on existing JCasHashMap impl, but without concurrency support).

      For int sets, add 2 styles: one based on the same hash map (space = 1.5 to 3 words per item vs 4 words), the other based on BitSets. The BitSet one would be faster, would have a conventional key ordering, and would be smaller, if the max int stored < 128 * number of items stored. (This is often the case in uses where it is used to track Feature Structure addresses).

      Finally, add one adaptable int set that is usually is implemented as a BitSet, but if the items being stored are such that the size would be significantly larger than the IntHashSet implementation, switch to that alternative representation.

        Activity

        Hide
        schor Marshall Schor added a comment -

        Switched ListUtils to use PositiveIntSet

        Show
        schor Marshall Schor added a comment - Switched ListUtils to use PositiveIntSet
        Hide
        schor Marshall Schor added a comment -

        make this behave better for small sizes, simplify logic, improve testing for switching between styles

        Show
        schor Marshall Schor added a comment - make this behave better for small sizes, simplify logic, improve testing for switching between styles
        Hide
        schor Marshall Schor added a comment -

        Check use of PositiveIntSet - switch to IntHashSet if use is for small number of ints not starting and near 0. Example: ListUtils, where they're used to detect collisions in List elements.

        Show
        schor Marshall Schor added a comment - Check use of PositiveIntSet - switch to IntHashSet if use is for small number of ints not starting and near 0. Example: ListUtils, where they're used to detect collisions in List elements.
        Hide
        schor Marshall Schor added a comment -

        assert error while expanding

        Show
        schor Marshall Schor added a comment - assert error while expanding

          People

          • Assignee:
            schor Marshall Schor
            Reporter:
            schor Marshall Schor
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development