Lucene - Core / LUCENE-2724

BooleanFilter and ChainedFilter do not fully optimize for OpenBitSets


    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.0.2
    • Fix Version/s: None
    • Component/s: modules/other
    • Labels:
    • Lucene Fields:


      In line 65 of the BooleanFilter class there is an optimization for OpenBitSets, but the same optimization is missing in line 62.

      I would replace the existing line:

      res = new OpenBitSetDISI(getDISI(shouldFilters, i, reader), reader.maxDoc());

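      with the following code:

      DocIdSet docIdSet = shouldFilters.get(i).getDocIdSet(reader);
      if (docIdSet instanceof OpenBitSet) {
      	// Fast path: OR the bit set directly instead of iterating over it.
      	res = new OpenBitSetDISI(reader.maxDoc());
      	res.or((OpenBitSet) docIdSet);
      } else {
      	// Fallback: build the result from the iterator, as before.
      	res = new OpenBitSetDISI(docIdSet.iterator(), reader.maxDoc());
      }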

      The same applies to lines 78 and 95, adjusted for the not and must filters.
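
      The three cases (should, must, not) could be handled along the lines of the sketch below. It is only an illustration of the idea, not a patch against Lucene: java.util.BitSet stands in for OpenBitSet so that the example runs on its own, and the names DocIdSet, BitSetDocIdSet, ArrayDocIdSet, Occur and combine are hypothetical stand-ins introduced for this example.

```java
import java.util.BitSet;
import java.util.PrimitiveIterator;
import java.util.stream.IntStream;

// Sketch only: java.util.BitSet stands in for Lucene's OpenBitSet.
// All type and method names below are hypothetical, not Lucene API.
class BitSetFastPathSketch {

    enum Occur { SHOULD, MUST, MUST_NOT }

    /** Stand-in for a doc id set: only iteration is guaranteed. */
    interface DocIdSet {
        PrimitiveIterator.OfInt iterator();
    }

    /** Stand-in for a doc id set that is itself a bit set (the fast case). */
    static final class BitSetDocIdSet implements DocIdSet {
        final BitSet bits;
        BitSetDocIdSet(BitSet bits) { this.bits = bits; }
        @Override public PrimitiveIterator.OfInt iterator() {
            return bits.stream().iterator();
        }
    }

    /** Stand-in for any other doc id set, backed by a sorted int array. */
    static final class ArrayDocIdSet implements DocIdSet {
        final int[] docs;
        ArrayDocIdSet(int... docs) { this.docs = docs; }
        @Override public PrimitiveIterator.OfInt iterator() {
            return IntStream.of(docs).iterator();
        }
    }

    /** Applies one clause to the running result, word-wise when possible. */
    static void combine(BitSet res, DocIdSet set, Occur occur, int maxDoc) {
        BitSet bits;
        if (set instanceof BitSetDocIdSet) {
            // Fast path: reuse the existing bit set, no per-document iteration.
            bits = ((BitSetDocIdSet) set).bits;
        } else {
            // Slow path: walk the iterator one document at a time.
            bits = new BitSet(maxDoc);
            for (PrimitiveIterator.OfInt it = set.iterator(); it.hasNext(); ) {
                bits.set(it.nextInt());
            }
        }
        switch (occur) {
            case SHOULD:   res.or(bits);     break;
            case MUST:     res.and(bits);    break;
            case MUST_NOT: res.andNot(bits); break;
        }
    }

    /** Small demo: one should, one must and one not clause. */
    static BitSet demo() {
        int maxDoc = 16;
        BitSet res = new BitSet(maxDoc);

        BitSet a = new BitSet(maxDoc);
        a.set(1); a.set(3); a.set(5); a.set(7);
        combine(res, new BitSetDocIdSet(a), Occur.SHOULD, maxDoc);       // {1, 3, 5, 7}

        combine(res, new ArrayDocIdSet(3, 5, 7, 9), Occur.MUST, maxDoc); // {3, 5, 7}

        BitSet c = new BitSet(maxDoc);
        c.set(5);
        combine(res, new BitSetDocIdSet(c), Occur.MUST_NOT, maxDoc);     // {3, 7}
        return res;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints {3, 7}
    }
}
```

      In this sketch the slow path first materializes the iterator into a temporary bit set to keep the code short; an actual change would presumably keep using the existing iterator-based operations for that branch.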

      In my test, the missing optimization made an AND-combination up to 5 times slower. Two filters were AND-combined, each returning a cached OpenBitSet, one with a cardinality of 15000 and the other with a cardinality of 13000. The result had a cardinality of 8300. That matters if you do this 1000 times with many more documents.

      The same should also be done for ChainedFilter in the method initialResult(..).

      Also, the getDISI method in BooleanFilter should be replaced by a getDocIdSet(..) method. This is useful because the docIdSet is retrieved in line 87 and then again in line 92 when it is not of type OpenBitSet. This may also cause a performance problem if the getDocIdSet method of a sub-filter is not very fast.
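
      To illustrate the double lookup, here is a small sketch that fetches the doc id set once and only then branches on its type. It is an assumption-laden stand-in, not Lucene code: java.util.BitSet replaces OpenBitSet, and Filter, DocIdSet, BitDocIdSet, ArrayDocIdSet, CountingFilter and andInto are hypothetical names used only here. The counter shows that the sub-filter is asked exactly once, even on the non-bit-set path.

```java
import java.util.BitSet;

// Sketch only: BitSet stands in for OpenBitSet; every type below is a
// hypothetical stand-in, not a Lucene class.
class FetchOnceSketch {

    /** Stand-in for a doc id set that can list its documents. */
    interface DocIdSet {
        int[] docs();
    }

    /** Stand-in for a doc id set that is already a bit set. */
    static final class BitDocIdSet implements DocIdSet {
        final BitSet bits;
        BitDocIdSet(BitSet bits) { this.bits = bits; }
        @Override public int[] docs() { return bits.stream().toArray(); }
    }

    /** Stand-in for any other kind of doc id set. */
    static final class ArrayDocIdSet implements DocIdSet {
        private final int[] docs;
        ArrayDocIdSet(int... docs) { this.docs = docs; }
        @Override public int[] docs() { return docs; }
    }

    /** Stand-in for a sub-filter whose lookup may be expensive. */
    interface Filter {
        DocIdSet getDocIdSet();
    }

    /** Counts how often the doc id set is requested. */
    static final class CountingFilter implements Filter {
        int calls = 0;
        private final DocIdSet set;
        CountingFilter(DocIdSet set) { this.set = set; }
        @Override public DocIdSet getDocIdSet() { calls++; return set; }
    }

    /** ANDs one filter into the result, calling getDocIdSet() only once. */
    static void andInto(BitSet res, Filter filter) {
        DocIdSet set = filter.getDocIdSet(); // the only lookup
        if (set instanceof BitDocIdSet) {
            // Fast path: bit set against bit set.
            res.and(((BitDocIdSet) set).bits);
        } else {
            // Fallback: reuse the set fetched above instead of asking again.
            BitSet tmp = new BitSet();
            for (int doc : set.docs()) {
                tmp.set(doc);
            }
            res.and(tmp);
        }
    }

    public static void main(String[] args) {
        BitSet res = new BitSet();
        res.set(1, 5); // documents 1..4
        CountingFilter filter = new CountingFilter(new ArrayDocIdSet(2, 4, 6));
        andInto(res, filter);
        System.out.println(res + " after " + filter.calls + " lookup(s)");
    }
}
```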




          • Assignee:
            Fatih Uzdilli


            • Created: