Uploaded image for project: 'UIMA'
  1. UIMA
  2. UIMA-4003

add alternative int - int maps, and int sets for better space/time performance


    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.6.0SDK
    • Fix Version/s: 2.7.0SDK
    • Component/s: Core Java Framework
    • Labels:


      int - int maps and int sets are implemented in UIMA as either red-black trees (size = 5 words (4 words for sets) + 1 bit per item, search time = log 2 size (binary search), insert /removal can cause rebalancing tree), or as intVectors (like ArrayList<Integers> but doesn't wrap ints as Integers).

      For int - int maps, add a hash version (loses key "ordering"), which takes 3 - 6 words per item (avg 4.5 words - slightly smaller), and has O(1) performance (based on existing JCasHashMap impl, but without concurrency support).

      For int sets, add 2 styles: one based on the same hash map (space = 1.5 to 3 words per item vs 4 words), the other based on BitSets. The BitSet one would be faster, would have a conventional key ordering, and would be smaller, if the max int stored < 128 * number of items stored. (This is often the case in uses where it is used to track Feature Structure addresses).

      Finally, add one adaptable int set that is usually is implemented as a BitSet, but if the items being stored are such that the size would be significantly larger than the IntHashSet implementation, switch to that alternative representation.




            • Assignee:
              schor Marshall Schor
              schor Marshall Schor
            • Votes:
              0 Vote for this issue
              1 Start watching this issue


              • Created: