Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Fixed
-
None
-
None
-
None
-
New
Description
Term frequencies are discarded in the DOCS_ONLY case from the postings list but they still count against the length normalization, which looks like it may screw stuff up.
I ran some quick experiments on LUCENE-8025, by encoding fieldInvertState.getUniqueTermCount() and it seemed worth fixing (e.g. 20% or 30% improvement potentially). Happy to do testing for real, if we want to fix.
But this seems tricky, today you can downgrade to DOCS_ONLY on the fly, and its hard for me to think about that case (i think its generally screwed up besides this, but still).