  1. Lucene - Core
  2. LUCENE-5052

bitset codec for off heap filters

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: core/codecs
    • Lucene Fields:
      New

      Description

      Colleagues,

      When we filter, we don't care about any of the scoring factors (norms, positions, tf), but filtering should be fast. The obvious way to handle this is to decode the postings list and cache it on the heap (CachingWrapperFilter, Solr's DocSet). Both consuming heap and decoding are expensive.
      Let's write a postings list as a bitset if df is greater than the segment's maxDoc/8 (what about skip lists, and overall performance?).
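      The threshold above can be sketched as follows. This is a minimal illustration, not code from the attached patches: the class and method names are hypothetical, and it uses java.util.BitSet rather than any Lucene codec API. The rationale assumed here is that a bitset covering a segment takes about maxDoc/8 bytes, while a delta-encoded postings list needs roughly one byte or more per document, so the bitset should be no larger once df exceeds maxDoc/8.

```java
import java.util.BitSet;

/** Hypothetical sketch; names are illustrative and not taken from the patches. */
public class BitsetPostingsSketch {

    /** True when df exceeds maxDoc/8, the threshold proposed in the description. */
    public static boolean useBitset(int docFreq, int maxDoc) {
        return docFreq > maxDoc / 8;
    }

    /** Bytes needed for a fixed-size bitset covering every document in the segment. */
    public static int bitsetBytes(int maxDoc) {
        return (maxDoc + 7) / 8;
    }

    /** Encodes doc IDs as a bitset: bit i is set if document i contains the term. */
    public static BitSet encode(int[] docIds, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (int doc : docIds) {
            bits.set(doc);
        }
        return bits;
    }

    public static void main(String[] args) {
        int maxDoc = 1000;
        int[] docIds = new int[200];
        for (int i = 0; i < docIds.length; i++) {
            docIds[i] = i * 5; // every fifth document matches the term
        }
        System.out.println("use bitset: " + useBitset(docIds.length, maxDoc));
        System.out.println("bitset bytes: " + bitsetBytes(maxDoc));
        BitSet bits = encode(docIds, maxDoc);
        System.out.println("doc 10 matches: " + bits.get(10));
        System.out.println("doc 11 matches: " + bits.get(11));
    }
}
```

      The sketch leaves the skip-list question open: a bitset supports random access by doc ID, so advancing to a target document would not need skip data, but that is an observation about bitsets in general, not a measured result.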
      Besides the codec implementation, the trickiest part for me is designing the API. How can we let the app know that a term query doesn't need to be cached on the heap, but can be held as an mmapped bitset?
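      To make "held as an mmapped bitset" concrete, here is a minimal sketch of reading a bitset through a memory-mapped file using only the JDK. It does not answer the API question above and is not how the attached patches work: MappedBitsetSketch, writeTemp, and the one-bit-per-document file layout are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.BitSet;

/** Hypothetical sketch: a read-only bitset backed by a memory-mapped file. */
public class MappedBitsetSketch {
    private final MappedByteBuffer buffer;

    private MappedBitsetSketch(MappedByteBuffer buffer) {
        this.buffer = buffer;
    }

    /** Writes the bitset to a temp file; bit n lands in byte n/8, bit position n%8. */
    public static Path writeTemp(BitSet bits) {
        try {
            Path file = Files.createTempFile("bitset", ".bin");
            file.toFile().deleteOnExit();
            Files.write(file, bits.toByteArray());
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Maps the file read-only; the mapping stays valid after the channel is closed. */
    public static MappedBitsetSketch open(Path file) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            return new MappedBitsetSketch(ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Tests one document's bit without copying the bitset onto the Java heap. */
    public boolean get(int docId) {
        int byteIndex = docId >>> 3;
        if (byteIndex >= buffer.limit()) {
            return false; // BitSet.toByteArray() omits trailing zero bytes
        }
        return (buffer.get(byteIndex) & (1 << (docId & 7))) != 0;
    }

    public static void main(String[] args) {
        BitSet bits = new BitSet();
        bits.set(3);
        bits.set(64);
        MappedBitsetSketch mapped = open(writeTemp(bits));
        System.out.println("doc 3: " + mapped.get(3));
        System.out.println("doc 4: " + mapped.get(4));
        System.out.println("doc 64: " + mapped.get(64));
    }
}
```

      In this sketch the matching documents live in the OS page cache rather than in a heap-allocated structure, which is the property the description asks for; how a filter would discover that a term is stored this way is the open design question.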

      WDYT?

        Attachments

        1. LUCENE-5052-1.patch
          19 kB
          Dr Oleg Savrasov
        2. LUCENE-5052.patch
          20 kB
          Nina Gracheva
        3. bitsetcodec.zip
          15 kB
          Yury Pakhomov
        4. bitsetcodec.zip
          25 kB
          Dr Oleg Savrasov

              People

              • Assignee:
                Unassigned
                Reporter:
                mkhl Mikhail Khludnev
              • Votes:
                3
                Watchers:
                22
