Details
Improvement
Status: Open
Major
Resolution: Unresolved
3.0.2
None
None
New
Description
In line 65 of the BooleanFilter class there is an optimization for OpenBitSets, but the same optimization is missing in line 62.
I would replace the existing line:
res = new OpenBitSetDISI(getDISI(shouldFilters, i, reader), reader.maxDoc());
with the following code:
DocIdSet docIdSet = shouldFilters.get(i).getDocIdSet(reader);
if (docIdSet instanceof OpenBitSet) {
  res = new OpenBitSetDISI(reader.maxDoc());
  res.or((OpenBitSet) docIdSet);
} else {
  res = new OpenBitSetDISI(docIdSet.iterator(), reader.maxDoc());
}
The same applies to lines 78 and 95, adjusted for the NOT and MUST filters.
Without this optimization, the AND combination was up to 5 times slower in my test, in which two filters were AND-combined, each returning a cached OpenBitSet, one with a cardinality of 15000 and the other with a cardinality of 13000. The result had a cardinality of 8300. That matters if you do this 1000 times with many more documents.
The same must also be done for ChainedFilter in the method initialResult(..).
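To illustrate the idea without depending on Lucene, here is a minimal sketch in which java.util.BitSet stands in for OpenBitSet. The DocIds interface, the BitSetDocIds class and the andInto method are hypothetical names made up for this example; they are not part of the Lucene API. The sketch only shows the shape of the optimization: when the incoming set is already a bit set, combine it word by word instead of walking an iterator.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.Iterator;

/** Sketch only: java.util.BitSet plays the role of OpenBitSet. */
final class BitSetFastPathSketch {

    /** Hypothetical stand-in for a doc id set (not the Lucene DocIdSet). */
    interface DocIds {
        Iterator<Integer> iterator();
    }

    /** Bit-set backed doc ids, analogous to a cached OpenBitSet. */
    static final class BitSetDocIds implements DocIds {
        final BitSet bits;

        BitSetDocIds(BitSet bits) {
            this.bits = bits;
        }

        @Override
        public Iterator<Integer> iterator() {
            return bits.stream().iterator();
        }
    }

    /** ANDs the given doc ids into acc, using the bit-set path when possible. */
    static void andInto(BitSet acc, DocIds set) {
        if (set instanceof BitSetDocIds) {
            // Fast path: word-wise AND, no per-document iteration.
            acc.and(((BitSetDocIds) set).bits);
        } else {
            // Fallback: materialize the iterator into a bit set first.
            BitSet tmp = new BitSet();
            for (Iterator<Integer> it = set.iterator(); it.hasNext();) {
                tmp.set(it.next());
            }
            acc.and(tmp);
        }
    }

    public static void main(String[] args) {
        BitSet a = new BitSet();
        for (int doc : new int[] {1, 2, 3, 5}) {
            a.set(doc);
        }
        BitSet b = new BitSet();
        for (int doc : new int[] {2, 3, 4, 5}) {
            b.set(doc);
        }

        BitSet fast = (BitSet) a.clone();
        andInto(fast, new BitSetDocIds(b));

        BitSet slow = (BitSet) a.clone();
        andInto(slow, () -> Arrays.asList(2, 3, 4, 5).iterator());

        // Both paths must produce the same documents.
        System.out.println(fast.equals(slow) + " " + fast);
    }
}
```

Both branches yield the same result; only the cost differs, which is the effect measured in the test described above.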
Also, the getDISI method in the BooleanFilter must be replaced by a getDocIdSet(..) method. This is useful because the DocIdSet is retrieved in line 87 and then again in line 92 when it is not of type OpenBitSet. The double retrieval may also cause a performance issue if the getDocIdSet method of a sub-filter is not very fast.
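The single-retrieval point can be sketched the same way. In the following example, SubFilter, DocIds, BitSetDocIds and andFilter are hypothetical names, and java.util.BitSet again stands in for OpenBitSet; none of this is the actual BooleanFilter code. The helper asks the sub-filter for its doc id set exactly once and then decides which path to take, so a slow sub-filter is never evaluated twice.

```java
import java.util.BitSet;
import java.util.Iterator;

/** Sketch only: fetch a sub-filter's doc ids once, then branch on the type. */
final class SingleFetchSketch {

    /** Hypothetical stand-in for a doc id set (not the Lucene DocIdSet). */
    interface DocIds {
        Iterator<Integer> iterator();
    }

    /** Bit-set backed doc ids, analogous to a cached OpenBitSet. */
    static final class BitSetDocIds implements DocIds {
        final BitSet bits;

        BitSetDocIds(BitSet bits) {
            this.bits = bits;
        }

        @Override
        public Iterator<Integer> iterator() {
            return bits.stream().iterator();
        }
    }

    /** Hypothetical stand-in for a sub-filter whose evaluation may be costly. */
    interface SubFilter {
        DocIds getDocIdSet();
    }

    /** ANDs the filter's doc ids into acc, calling getDocIdSet() only once. */
    static void andFilter(BitSet acc, SubFilter filter) {
        DocIds set = filter.getDocIdSet(); // the only retrieval
        if (set instanceof BitSetDocIds) {
            acc.and(((BitSetDocIds) set).bits);
        } else {
            // Reuse the set already fetched instead of asking the filter again.
            BitSet tmp = new BitSet();
            for (Iterator<Integer> it = set.iterator(); it.hasNext();) {
                tmp.set(it.next());
            }
            acc.and(tmp);
        }
    }
}
```

Wrapping a sub-filter in a call counter is one way to check that the fallback branch does not trigger a second retrieval.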