Details
Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Description
Ideally, the first token emitted by WordDelimiterGraphFilter (WDGF) is the preserveOriginal token (if configured to emit it), and the second is the catenateAll token (if configured to emit it). The deprecated WordDelimiterFilter (WDF) does this, but WDGF can sometimes emit some other token ahead of the catenateAll token when there is a non-emitted candidate sub-token.
Example: input "8-other" with only generateWordParts and catenateAll enabled, not generateNumberParts. WDGF internally sees the '8' but moves on without emitting it. Ultimately, the "other" token and the catenated "8other" token end up at the same internal position, which by chance leads the sorter to emit "other" first.
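To make the tie concrete, here is a minimal sketch in plain Java. It does not use Lucene and is not the filter's actual code: the `Candidate` class, the `Kind` enum, and the position values are hypothetical stand-ins for WDGF's internal buffered state. It only illustrates that when two candidates share the same start and end position, a position-only sort leaves their order to chance, while an explicit tie-break on token kind makes the catenateAll token come first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class EmitOrderSketch {
    // Hypothetical token kinds, declared in the desired emit priority.
    enum Kind { PRESERVE_ORIGINAL, CATENATE_ALL, PART }

    // Hypothetical candidate token: its text plus the internal positions used for sorting.
    static final class Candidate {
        final String text;
        final int startPos;
        final int endPos;
        final Kind kind;

        Candidate(String text, int startPos, int endPos, Kind kind) {
            this.text = text;
            this.startPos = startPos;
            this.endPos = endPos;
            this.kind = kind;
        }
    }

    // Sort by start position, then longer spans first, then by kind priority.
    // The last step is the explicit tie-break; without it, equal positions
    // would fall back to whatever order the candidates were buffered in.
    static final Comparator<Candidate> EMIT_ORDER = (a, b) -> {
        if (a.startPos != b.startPos) return Integer.compare(a.startPos, b.startPos);
        if (a.endPos != b.endPos) return Integer.compare(b.endPos, a.endPos);
        return a.kind.compareTo(b.kind);
    };

    // Returns the candidate texts in emit order; the input list is not modified.
    static List<String> order(List<Candidate> candidates) {
        List<Candidate> sorted = new ArrayList<>(candidates);
        sorted.sort(EMIT_ORDER);
        List<String> texts = new ArrayList<>();
        for (Candidate c : sorted) {
            texts.add(c.text);
        }
        return texts;
    }

    public static void main(String[] args) {
        // "8-other" with the '8' not emitted: both candidates share one position (assumed 0..1).
        List<Candidate> buffered = Arrays.asList(
            new Candidate("other", 0, 1, Kind.PART),
            new Candidate("8other", 0, 1, Kind.CATENATE_ALL));
        System.out.println(order(buffered)); // prints [8other, other]
    }
}
```

In this sketch the candidates are buffered with "other" first, mirroring the problematic order described above; the kind-based tie-break is what moves "8other" ahead of it.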
Issue Links
- is related to: LUCENE-8730 Ensure WordDelimiterGraphFilter always emits its original token first (Resolved)
- relates to: LUCENE-9458 WordDelimiterGraphFilter (and non-graph) should tie-break order using end offset (Closed)
- links to