Details
-
Bug
-
Status: Open
-
Minor
-
Resolution: Unresolved
-
8.7
-
*Elasticsearch version*: 7.11.2
*Plugins installed*: [analysis-stempel]
*OS version*: CentOS
-
New
Description
Actual:
The Polish stemmer (polish_stem) modifies some numeric tokens, which leads to wrong search results.
For example, "2021" is stemmed to "20ć".
Expected:
Numeric strings should pass through the stemmer unchanged.
Reproduce:
The issue can be reproduced in Elasticsearch with the analyze API.
Request:
POST _analyze
{
  "tokenizer": "standard",
  "filter": ["polish_stem"],
  "text": "2021"
}
Response:
{
  "tokens": [
    { "token": "20ć", "start_offset": 0, "end_offset": 4, "type": "<NUM>", "position": 0 }
  ]
}
I suspect that newer versions are also affected, but I have not been able to verify this.
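To make it easier to check other versions, here is a minimal Python sketch that builds the same _analyze request body and scans a response for <NUM> tokens whose text no longer matches the input. The helper names (build_analyze_request, altered_numeric_tokens) are made up for this illustration and are not part of any Elasticsearch client. The sample response is the one copied from this report, so nothing is sent to a cluster here.

```python
import json


def build_analyze_request(text):
    """Body for POST _analyze, mirroring the request in this report."""
    return {"tokenizer": "standard", "filter": ["polish_stem"], "text": text}


def altered_numeric_tokens(text, response):
    """Return (original, stemmed) pairs for <NUM> tokens the filter changed.

    Assumes the offsets can be used directly as Python string indices,
    which holds for plain ASCII digits such as "2021".
    """
    altered = []
    for tok in response["tokens"]:
        original = text[tok["start_offset"]:tok["end_offset"]]
        if tok["type"] == "<NUM>" and tok["token"] != original:
            altered.append((original, tok["token"]))
    return altered


# Response copied verbatim from this report (Elasticsearch 7.11.2).
reported = {
    "tokens": [
        {"token": "20ć", "start_offset": 0, "end_offset": 4,
         "type": "<NUM>", "position": 0}
    ]
}

print(json.dumps(build_analyze_request("2021")))
print(altered_numeric_tokens("2021", reported))  # [('2021', '20ć')]
```

On another version, the body printed by the first call can be sent to POST _analyze and the returned JSON passed to altered_numeric_tokens; an empty list would mean the number came back unchanged.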