Lucene - Core / LUCENE-1261

Impossible to use custom norm encoding/decoding

Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Duplicate
    • Affects Version/s: 2.3.1
    • Fix Version/s: None
    • Component/s: core/query/scoring
    • Labels: None
    • Environment: All
    • Lucene Fields: New

    Description

      Although it is possible to override the methods encodeNorm and decodeNorm in a custom Similarity class, these methods are not actually used by the query processing and scoring functions, nor by the indexing functions. The relevant Lucene classes all call "Similarity.decodeNorm" rather than "similarity.decodeNorm", i.e. the norm encoding/decoding is fixed to that of the base Similarity class. Likewise, index writing classes such as DocumentWriter use "Similarity.encodeNorm" rather than "similarity.encodeNorm", so we are stuck with the 3-bit mantissa encoding implemented by SmallFloat.floatToByte315 and SmallFloat.byte315ToFloat.
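      The dispatch problem can be sketched with simplified stand-ins (these classes and the `b / 4f` / `b / 16f` decodings are purely illustrative, not the real Lucene API):

```java
// Stand-in for Lucene's base Similarity with its fixed norm decoding.
class Similarity {
    public float decodeNorm(byte b) { return b / 4f; }   // coarse base encoding (illustrative)
}

// A user-supplied Similarity with a finer custom decoding.
class FineSimilarity extends Similarity {
    @Override public float decodeNorm(byte b) { return b / 16f; }
}

public class NormDispatchDemo {
    // How the shipped code effectively behaves: the norm is decoded with the
    // base class's scheme, ignoring the Similarity instance the user configured.
    static float scoreAsShipped(Similarity similarity, byte norm) {
        return new Similarity().decodeNorm(norm); // base encoding, hard-wired
    }

    // What the issue asks for: dispatch through the configured instance.
    static float scoreFixed(Similarity similarity, byte norm) {
        return similarity.decodeNorm(norm);
    }

    public static void main(String[] args) {
        Similarity sim = new FineSimilarity();
        System.out.println(scoreAsShipped(sim, (byte) 8)); // 2.0 -- custom decode ignored
        System.out.println(scoreFixed(sim, (byte) 8));     // 0.5 -- custom decode used
    }
}
```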

      This is very restrictive and annoying, since in practice many users would prefer an encoding that allows finer distinctions for boost and normalisation factors close to 1.0. For example, SmallFloat.floatToByte52 uses 5 bits of mantissa, and this would be of great help in distinguishing much better between subtly different lengthNorms and FieldBoost/DocumentBoost values.
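      To make the precision difference concrete, here is a sketch of the parameterised byte encoding, modeled on Lucene's SmallFloat (floatToByte315 corresponds to 3 mantissa bits with zero-exponent 15, floatToByte52 to 5 mantissa bits with zero-exponent 2); the bit manipulation is an approximation of the real class, so treat the details as illustrative:

```java
public class SmallFloatSketch {
    // Encode a float into a byte with the given mantissa width; values below
    // the representable range clamp to 0/1, values above clamp to the maximum.
    public static byte floatToByte(float f, int numMantissaBits, int zeroExp) {
        int fzero = (63 - zeroExp) << numMantissaBits;
        int bits = Float.floatToRawIntBits(f);
        int smallfloat = bits >> (24 - numMantissaBits);
        if (smallfloat <= fzero) {
            return (bits <= 0) ? (byte) 0 : (byte) 1; // underflow
        } else if (smallfloat >= fzero + 0x100) {
            return -1;                                // overflow
        } else {
            return (byte) (smallfloat - fzero);
        }
    }

    // Decode a byte produced by floatToByte back to a float.
    public static float byteToFloat(byte b, int numMantissaBits, int zeroExp) {
        if (b == 0) return 0.0f;
        int bits = (b & 0xff) << (24 - numMantissaBits);
        bits += (63 - zeroExp) << 24;
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        // 3-bit mantissa: 1.0 and 1.1 collapse to the same byte, and the next
        // representable value above 1.0 is a coarse 1.25.
        System.out.println(floatToByte(1.0f, 3, 15) == floatToByte(1.1f, 3, 15)); // true
        System.out.println(byteToFloat((byte) (floatToByte(1.0f, 3, 15) + 1), 3, 15)); // 1.25
        // 5-bit mantissa: 1.0 and 1.1 get distinct bytes, and the step above
        // 1.0 shrinks to 1.0625.
        System.out.println(floatToByte(1.0f, 5, 2) == floatToByte(1.1f, 5, 2)); // false
        System.out.println(byteToFloat((byte) (floatToByte(1.0f, 5, 2) + 1), 5, 2)); // 1.0625
    }
}
```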

      It should be easy to fix this by changing all instances of "Similarity.decodeNorm" and "Similarity.encodeNorm" to "similarity.decodeNorm" and "similarity.encodeNorm" in the Lucene code (there are only a few of each).

Attachments

Activity

People

    Assignee: otis Otis Gospodnetic
    Reporter: joadams27 John Adams
    Votes: 1
    Watchers: 1

Dates

    Created:
    Updated:
    Resolved: