Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Duplicate
Affects Version/s: 3.1, 4.0-ALPHA
None
Lucene Fields: New, Patch Available
Description
DictionaryCompoundWordTokenFilter: Due to an off-by-one error, a word component placed last in a compound word will not get a token if its length equals the minimum sub-word length.
Example:
min sub-word length: 4
Dictionary:
word: "alfabeta"
Created tokens:
Expected tokens: {"alfabeta", "alfa", "beta"}
I have a patch with a test case that fails on versions 3.1 and 4.0 (and probably on everything in between, as well as on earlier versions), along with a bugfix.
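
To illustrate the kind of off-by-one described here, the following is a minimal standalone sketch. It is not the Lucene source and not the attached patch: the class `CompoundSketch`, the `decompose` helper, the `lastStartInclusive` flag, and the dictionary {"alfa", "beta"} are all assumptions made for illustration. It assumes the error sits in the upper bound of the loop over start offsets, which is one plausible form of the bug.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Illustrative sketch only; this is NOT the Lucene implementation.
final class CompoundSketch {

    // Returns the original word followed by every dictionary sub-word found in it.
    // lastStartInclusive = false reproduces the assumed off-by-one loop bound;
    // lastStartInclusive = true also tries the last valid start offset.
    static List<String> decompose(String word, Set<String> dictionary,
                                  int minSubwordSize, int maxSubwordSize,
                                  boolean lastStartInclusive) {
        List<String> tokens = new ArrayList<>();
        tokens.add(word);
        int len = word.length();
        // Last start offset at which a sub-word of minimum length still fits.
        int lastStart = len - minSubwordSize;
        int limit = lastStartInclusive ? lastStart : lastStart - 1;
        for (int i = 0; i <= limit; i++) {
            // Try every allowed sub-word length that stays inside the word.
            for (int j = minSubwordSize; j <= maxSubwordSize && i + j <= len; j++) {
                String candidate = word.substring(i, i + j);
                if (dictionary.contains(candidate)) {
                    tokens.add(candidate);
                }
            }
        }
        return tokens;
    }
}
```

With a minimum sub-word length of 4 and the assumed dictionary {"alfa", "beta"}, `CompoundSketch.decompose("alfabeta", Set.of("alfa", "beta"), 4, 15, false)` yields `[alfabeta, alfa]`, because start offset 4 is never visited. Passing `true` for the last argument yields `[alfabeta, alfa, beta]`, which matches the expected tokens above.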