Lucene - Core

LUCENE-1420: Similarity.lengthNorm and positionIncrement=0

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.9
    • Fix Version/s: 2.9
    • Component/s: core/index
    • Labels: None
    • Lucene Fields: New

    Description

      In some cases, the calculation of the lengthNorm factor should take into account the number of tokens with positionIncrement=0. This behavior should be optional, to support two different scenarios:

      • when analyzers insert artificially constructed tokens into TokenStream (e.g. ASCII-fied versions of accented terms, stemmed terms), and it's unlikely that users submit queries containing both versions of tokens: in this case lengthNorm calculation should ignore the tokens with positionIncrement=0.
      • when analyzers insert synonyms, and it's likely that users may submit queries containing multiple synonymous terms: in this case lengthNorm should be calculated as it is now, i.e. it should take into account all terms regardless of their positionIncrement.

      The default should be backward-compatible, i.e. it should count all tokens.

      (See also the discussion here: http://markmail.org/message/vfvmzrzhr6pya22h )
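      The two counting policies above can be sketched in plain Java. This is an illustrative model only, not Lucene's actual API: the Token record, method names, and the 1/sqrt(numTerms) formula (borrowed from DefaultSimilarity's lengthNorm) are assumptions made for the example.

      ```java
      public class LengthNormSketch {

          /** A token's position increment: 0 means it is stacked at the same position as the previous token. */
          record Token(String text, int positionIncrement) {}

          /** Current (backward-compatible) behavior: every token counts toward the length. */
          static float normCountingAll(Token[] tokens) {
              return (float) (1.0 / Math.sqrt(tokens.length));
          }

          /** Proposed optional behavior: ignore tokens with positionIncrement == 0. */
          static float normDiscountingOverlaps(Token[] tokens) {
              int numTerms = 0;
              for (Token t : tokens) {
                  if (t.positionIncrement() > 0) numTerms++;
              }
              return (float) (1.0 / Math.sqrt(numTerms));
          }

          public static void main(String[] args) {
              // "café" analyzed into an ASCII-folded term plus the original
              // term stacked at the same position (positionIncrement = 0).
              Token[] tokens = {
                  new Token("cafe", 1),
                  new Token("café", 0),
                  new Token("menu", 1),
              };
              System.out.println(normCountingAll(tokens));         // counts 3 tokens
              System.out.println(normDiscountingOverlaps(tokens)); // counts 2 tokens
          }
      }
      ```

      With the accent-folding scenario, discounting overlaps yields the same norm as a two-token field, so the stacked variant no longer penalizes the document's score; with the synonym scenario, the first method is the appropriate choice.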

      Attachments

        1. LUCENE-1420.patch
          24 kB
          Michael McCandless
        2. similarity-v2.patch
          15 kB
          Andrzej Bialecki
        3. similarity.patch
          15 kB
          Andrzej Bialecki

        Activity

          People

            Assignee: mikemccand (Michael McCandless)
            Reporter: ab (Andrzej Bialecki)
            Votes: 0
            Watchers: 0

          Dates

            Created:
            Updated:
            Resolved: