[LUCENE-7536] ASCIIFoldingFilterFactory.getMultiTermComponent can emit two tokens - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 6.4, 7.0
Component/s: None
Labels: None

Lucene Fields: New

Description

My understanding is that multi-term analysis is required to only normalize tokens: it must not remove tokens (e.g. with a stop filter) or add tokens (e.g. by tokenizing or adding synonyms). Yet ASCIIFoldingFilterFactory.getMultiTermComponent returns a factory that emits synonyms (the original token alongside the folded one) if preserveOriginal is set to true on the original filter.

This looks like a bug to me, but I'm not entirely sure how to fix it. Should the multi-term analysis component apply ASCII folding or not when the original factory has preserveOriginal set to true?
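
To make the concern concrete, here is a minimal, self-contained sketch of the behavior described above. It does not use the Lucene API: `FoldingSketch`, `foldToAscii`, `analyze`, and `normalizeMultiTerm` are hypothetical names, and the folding is approximated with `java.text.Normalizer` rather than the real ASCIIFoldingFilter tables. The token order in the returned list is also an arbitrary choice of the sketch.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;

public class FoldingSketch {

    // Rough stand-in for ASCII folding (an assumption, not Lucene's mapping):
    // decompose to NFD, then strip combining marks.
    static String foldToAscii(String term) {
        String decomposed = Normalizer.normalize(term, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    // Models a folding filter: always emits the folded token, and also the
    // original when preserveOriginal is true and folding changed the term.
    static List<String> analyze(String term, boolean preserveOriginal) {
        List<String> tokens = new ArrayList<>();
        String folded = foldToAscii(term);
        tokens.add(folded);
        if (preserveOriginal && !folded.equals(term)) {
            tokens.add(term);
        }
        return tokens;
    }

    // Models the multi-term requirement: normalization must yield exactly
    // one token per input term, otherwise the input is rejected.
    static String normalizeMultiTerm(String term, boolean preserveOriginal) {
        List<String> tokens = analyze(term, preserveOriginal);
        if (tokens.size() != 1) {
            throw new IllegalStateException(
                "multi-term analysis produced " + tokens.size() + " tokens");
        }
        return tokens.get(0);
    }

    public static void main(String[] args) {
        String term = "caf\u00e9";
        System.out.println(analyze(term, false).size()); // 1
        System.out.println(analyze(term, true).size());  // 2
        // One candidate resolution (an option, not necessarily the adopted
        // fix): ignore preserveOriginal when normalizing multi-term queries.
        System.out.println(normalizeMultiTerm(term, false)); // cafe
    }
}
```

In this model, `normalizeMultiTerm(term, true)` throws for any term that folding changes, which is the two-token situation the issue title describes. Forcing `preserveOriginal` to false is only one of the two options raised in the question; the other would be to skip folding entirely for multi-term analysis.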

Attachments

LUCENE-7536.patch (5 kB), attached by Adrien Grand, 15/Nov/16 14:47

People

Assignee: Unassigned

Reporter: Adrien Grand

Votes: 0

Watchers: 2

Dates

Created: 03/Nov/16 17:50

Updated: 28/Aug/22 15:05

Resolved: 18/Nov/16 09:28