[LUCENE-9107] CommonTermsQuery with huge no. of terms slower with top-k scoring - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 8.3
Fix Version/s: None
Component/s: core/search
Labels: None
Lucene Fields: New

Description

In [1], a CommonTermsQuery is used to run a query with a large number of (duplicate) terms. With a max term frequency cutoff of 0.999 for low-frequency terms, the query, although big, finishes in around 200-300 ms on Lucene 7.6.0.
However, after upgrading the code to Lucene 8.x, the same query takes 2-3 s instead [2].
After some digging, the slowdown appears to come from top-k scoring, which became the default in version 8, though I have not pinned down where exactly in the code it occurs.
When switching back to complete hit scoring [3], query time returns to the initial 200-300 ms on Lucene 8.3.x as well.
It would be good to understand why this happens, and whether it only concerns CommonTermsQuery or affects BooleanQuery as well.
If it depends on the data and application involved (Anserini in this case), the application should handle it; otherwise, if it is a regression/bug in Lucene, it would be good to fix it.

[1] : https://github.com/tteofili/Anserini-embeddings/blob/nnsearch/src/main/java/io/anserini/embeddings/nn/fw/FakeWordsRunner.java
[2] : https://github.com/castorini/anserini/blob/master/src/main/java/io/anserini/analysis/vectors/ApproximateNearestNeighborEval.java
[3] : https://github.com/tteofili/anserini/blob/ann-paper-reproduce/src/main/java/io/anserini/analysis/vectors/ApproximateNearestNeighborEval.java#L174
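For readers unfamiliar with the cutoff mentioned in the description, the following is a minimal sketch of how a max term frequency value such as 0.999 is commonly understood to split terms into low-frequency and high-frequency groups. It is an approximation written for illustration, not Lucene's code: the class and method names are hypothetical, and the rule (values below 1 are read as a fraction of the documents in the index, values of 1 or more as an absolute document count) is an assumption based on the documented behavior of CommonTermsQuery.

```java
// Hypothetical illustration only; names are not part of Lucene.
final class CommonTermsCutoffSketch {

    /**
     * Returns true if a term would be treated as high-frequency.
     *
     * Assumption: a cutoff below 1 is a fraction of maxDoc, while a cutoff
     * of 1 or more is an absolute document-frequency threshold.
     */
    static boolean isHighFrequency(int docFreq, int maxDoc, float maxTermFrequency) {
        if (maxTermFrequency >= 1f) {
            // Absolute threshold: compare the document frequency directly.
            return docFreq > maxTermFrequency;
        }
        // Relative threshold: scale by the number of documents in the index.
        int threshold = (int) Math.ceil(maxTermFrequency * (float) maxDoc);
        return docFreq > threshold;
    }

    public static void main(String[] args) {
        int maxDoc = 1_000_000;
        float cutoff = 0.999f;
        int[] docFreqs = {500, 250_000, 999_500};
        for (int docFreq : docFreqs) {
            String group = isHighFrequency(docFreq, maxDoc, cutoff) ? "high" : "low";
            System.out.println("docFreq=" + docFreq + " -> " + group + "-frequency");
        }
    }
}
```

Under these assumptions, a cutoff of 0.999 leaves almost every term in the low-frequency group, so a query with a huge number of terms ends up with a correspondingly large number of scoring clauses.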

Attachments

image-2020-08-07-16-54-27-905.png, 07/Aug/20 14:54, 1.40 MB, Vincenzo D'Amore
Screenshot 2020-08-07 at 16.20.01.png, 07/Aug/20 14:52, 1.40 MB, Vincenzo D'Amore
Screenshot 2020-08-07 at 16.20.05.png, 07/Aug/20 14:57, 1.39 MB, Vincenzo D'Amore

People

Assignee: Unassigned
Reporter: Tommaso Teofili
Votes: 0
Watchers: 5

Dates

Created: 23/Dec/19 09:15
Updated: 28/Aug/22 15:54