Uploaded image for project: 'Lucene - Core'
  1. Lucene - Core
  2. LUCENE-10233

Store docIds as bitset when doc IDs are strictly sorted and dense

Details

    • Improvement
    • Status: Closed
    • Major
    • Resolution: Fixed
    • None
    • 9.1
    • core/codecs
    • None
    • New

    Description

      In low cardinality points cases, id blocks will usually store doc ids that have the same point value, and intersect will get into addAll logic. If we store ids as bitset, and give the IntersectVisitor bulk visiting ability, we can speed up addAll because we can just execute the 'or' logic between the result and the block ids.

      Optimization will be triggered when the following conditions are met at the same time:

      1. doc IDs are sorted strictly
      2. max(docId) - min(docId) <= 16 * pointCount (in order to avoid expanding too much storage)

      I mocked a field that has 10,000,000 docs per value and search it with a 1 term PointInSetQuery, the build scorer time decreased from 151ms to 5ms.

      Attachments

        1. SparseFixedBitSet.png
          481 kB
          Feng Guo

        Activity

          People

            Unassigned Unassigned
            gf2121 Feng Guo
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

                Estimated:
                Original Estimate - Not Specified
                Not Specified
                Remaining:
                Remaining Estimate - 0h
                0h
                Logged:
                Time Spent - 3h 50m
                3h 50m