Details
Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
None
None
New
Description
When the NGram or EdgeNGram filters are used, any term shorter than minGramSize is removed from the token stream entirely.
This is probably exactly what was intended, but I've seen it cause a lot of problems for users. I am not suggesting that the default behavior be changed; that would be far too disruptive to the existing user base.
I do think there should be a new boolean option, named something like keepShortTerms and defaulting to false, that allows the short terms to be preserved.
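To make the current behavior and the proposal concrete, here is a minimal standalone sketch. It is not Lucene's token-filter API: the class `NGramSketch` and the methods `ngrams` and `edgeNgrams` are hypothetical names, and `keepShortTerms` is only the option name suggested above, not an existing parameter. It shows a term shorter than the minimum gram size producing no output by default and surviving unchanged when the flag is set. For simplicity it measures length in Java `char`s, and its gram ordering is not meant to match any particular Lucene version.

```java
import java.util.ArrayList;
import java.util.List;

public class NGramSketch {
    // Hypothetical helper (not the Lucene API): emits every n-gram of `term`
    // whose length is in [minGram, maxGram]. A term shorter than minGram is
    // dropped, unless keepShortTerms is true, in which case it is kept as-is.
    static List<String> ngrams(String term, int minGram, int maxGram, boolean keepShortTerms) {
        List<String> out = new ArrayList<>();
        if (term.length() < minGram) {
            if (keepShortTerms) {
                out.add(term); // proposed behavior: preserve the short term
            }
            return out; // default behavior: the short term disappears
        }
        for (int start = 0; start < term.length(); start++) {
            for (int len = minGram; len <= maxGram && start + len <= term.length(); len++) {
                out.add(term.substring(start, start + len));
            }
        }
        return out;
    }

    // Hypothetical edge n-gram variant: only grams anchored at the start of the term.
    static List<String> edgeNgrams(String term, int minGram, int maxGram, boolean keepShortTerms) {
        List<String> out = new ArrayList<>();
        if (term.length() < minGram) {
            if (keepShortTerms) {
                out.add(term);
            }
            return out;
        }
        for (int len = minGram; len <= maxGram && len <= term.length(); len++) {
            out.add(term.substring(0, len));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(ngrams("ab", 3, 5, false));         // prints "[]"
        System.out.println(ngrams("ab", 3, 5, true));          // prints "[ab]"
        System.out.println(edgeNgrams("search", 3, 5, false)); // prints "[sea, sear, searc]"
    }
}
```

Because the flag only affects terms below the minimum length, defaulting it to false leaves the output for every other term unchanged. That is the backward-compatibility property the proposal relies on.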
Attachments
Issue Links
- is duplicated by
  - SOLR-5152 EdgeNGramFilterFactory deletes token (Closed)
- links to