Type: Improvement
Status: Closed
Priority: Major
Resolution: Won't Fix
Affects Version/s: 2.2, 2.3, 2.3.1, 2.4
Fix Version/s: 2.4
Component/s: core/search
Labels: None
Environment: any
Lucene Fields: New, Patch Available

The current implementation of the "Hits" class sometimes performs score normalization.
In particular, whenever the top-ranked score is greater than 1.0, the scores are scaled so that the maximum is 1.0.
In this case, Hits may return different score results than TopDocs-based methods.
In my scenario (a federated search system), Hits delivered plainly wrong results.
I was merging results from several sources, all having homogeneous statistics (similar to MultiSearcher, but over the Internet using HTTP/XML-based protocols).
Sometimes, some of the sources had a top score greater than 1.0, so their scores were rescaled while the others were not, and I ended up with garbled merged results.
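To illustrate the merging problem, here is a minimal, self-contained sketch. It does not use Lucene; `ScoreNormalizationSketch` and `normalize` are hypothetical names, and the scaling rule (multiply by 1/max only when the maximum exceeds 1.0) is my reading of the behavior described above, not a copy of the Hits source.

```java
import java.util.Arrays;

// Hypothetical stand-in for the normalization described above; not Lucene's actual class.
final class ScoreNormalizationSketch {

    // Scale scores so the top score is at most 1.0; leave them untouched otherwise.
    static float[] normalize(float[] rawScores) {
        float max = 0.0f;
        for (float s : rawScores) {
            max = Math.max(max, s);
        }
        // Assumed rule: only rescale when the maximum exceeds 1.0.
        float norm = max > 1.0f ? 1.0f / max : 1.0f;
        float[] out = new float[rawScores.length];
        for (int i = 0; i < rawScores.length; i++) {
            out[i] = rawScores[i] * norm;
        }
        return out;
    }

    public static void main(String[] args) {
        float[] sourceA = {4.0f, 2.0f}; // top score > 1.0, so this list is rescaled
        float[] sourceB = {0.9f, 0.5f}; // top score <= 1.0, so this list is left as-is
        System.out.println(Arrays.toString(normalize(sourceA)));
        System.out.println(Arrays.toString(normalize(sourceB)));
    }
}
```

With raw scores, the second hit of source A (2.0) ranks above the top hit of source B (0.9). After per-source normalization it drops to 0.5 and ranks below it, so a merged list built from the normalized scores is ordered differently from one built from the raw scores.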
I suggest adding a switch to enable or disable this score normalization at runtime.
My patch (attached) has an additional performance benefit: score normalization now occurs only when Hits#score() is called, not when the Hits result list is created. Whenever scores are not required, this saves one multiplication per retrieved hit (i.e., at least 100 multiplications with the current implementation of Hits).
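The idea of a runtime switch combined with lazy normalization can be sketched roughly as follows. This is not the attached patch: `LazyHitsSketch`, its constructor flag, and `score(int)` are hypothetical names chosen for illustration, and the class deliberately avoids any Lucene dependency.

```java
// Hypothetical sketch of lazy, switchable normalization; not the attached patch.
final class LazyHitsSketch {
    private final float[] rawScores; // raw scores as returned by the search
    private final float scoreNorm;   // scaling factor, computed once up front
    private final boolean normalize; // the proposed on/off switch

    LazyHitsSketch(float[] rawScores, boolean normalize) {
        this.rawScores = rawScores.clone();
        this.normalize = normalize;
        float max = 0.0f;
        for (float s : rawScores) {
            max = Math.max(max, s);
        }
        // Only the factor is computed here; no per-hit multiplication yet.
        this.scoreNorm = max > 1.0f ? 1.0f / max : 1.0f;
    }

    // The multiplication happens only when a score is actually requested.
    float score(int n) {
        return normalize ? rawScores[n] * scoreNorm : rawScores[n];
    }

    public static void main(String[] args) {
        float[] raw = {4.0f, 2.0f, 1.0f};
        LazyHitsSketch normalized = new LazyHitsSketch(raw, true);
        LazyHitsSketch plain = new LazyHitsSketch(raw, false);
        System.out.println(normalized.score(1)); // scaled score
        System.out.println(plain.score(1));      // raw score, comparable across sources
    }
}
```

A caller that merges results from several sources would construct the object with the switch off and read raw scores, while existing callers that expect scores capped at 1.0 would leave it on.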