Uploaded image for project: 'UIMA'
  1. UIMA
  2. UIMA-4003

add alternative int - int maps, and int sets for better space/time performance

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Resolved
    • Minor
    • Resolution: Fixed
    • 2.6.0SDK
    • 2.7.0SDK
    • Core Java Framework
    • None

    Description

      int - int maps and int sets are implemented in UIMA as either red-black trees (size = 5 words (4 words for sets) + 1 bit per item, search time = log 2 size (binary search), insert /removal can cause rebalancing tree), or as intVectors (like ArrayList<Integers> but doesn't wrap ints as Integers).

      For int - int maps, add a hash version (loses key "ordering"), which takes 3 - 6 words per item (avg 4.5 words - slightly smaller), and has O(1) performance (based on existing JCasHashMap impl, but without concurrency support).

      For int sets, add 2 styles: one based on the same hash map (space = 1.5 to 3 words per item vs 4 words), the other based on BitSets. The BitSet one would be faster, would have a conventional key ordering, and would be smaller, if the max int stored < 128 * number of items stored. (This is often the case in uses where it is used to track Feature Structure addresses).

      Finally, add one adaptable int set that is usually is implemented as a BitSet, but if the items being stored are such that the size would be significantly larger than the IntHashSet implementation, switch to that alternative representation.

      Attachments

        Activity

          People

            schor Marshall Schor
            schor Marshall Schor
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: