Description
A TokenFilter that folds all Unicode decimal digits (http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[:General_Category=Decimal_Number:]) to the ASCII digits 0-9.
Historically, a lot of the impacted analyzers couldn't tokenize numbers at all, but now they use StandardTokenizer for numeric and alphanumeric tokens. In practice, though, text usually contains a mix of ASCII digits and "native" digits, and today that makes searching difficult.
Note that this only impacts decimal digits (General_Category=Decimal_Number), hence the name DecimalDigitFilter. It does no processing of Chinese numerals or other non-decimal number forms.
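To illustrate the folding described above, here is a minimal sketch in plain Java. It is not the filter's actual implementation: the class name `DigitFoldSketch` and the method `foldDecimalDigits` are hypothetical, and it works on a whole string rather than on a token stream's term attribute. It relies only on the JDK's `Character.getType` and `Character.digit`, and iterates by code point so that digits outside the Basic Multilingual Plane are not split into surrogate halves.

```java
// Hypothetical illustration only; not the Lucene DecimalDigitFilter source.
final class DigitFoldSketch {

    // Returns a copy of the input in which every non-ASCII code point with
    // General_Category=Decimal_Number is replaced by its ASCII digit 0-9.
    static String foldDecimalDigits(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); ) {
            // Read a full code point so supplementary characters stay intact.
            int cp = input.codePointAt(i);
            i += Character.charCount(cp);

            if (cp > 0x7F && Character.getType(cp) == Character.DECIMAL_DIGIT_NUMBER) {
                int value = Character.digit(cp, 10);
                if (value >= 0) {
                    out.append((char) ('0' + value));
                    continue;
                }
            }
            // Everything else (ASCII, letters, Chinese numerals, ...) is unchanged.
            out.appendCodePoint(cp);
        }
        return out.toString();
    }
}
```

For example, calling `DigitFoldSketch.foldDecimalDigits` on a string that mixes Arabic-Indic and ASCII digits yields a string containing only ASCII digits, so both spellings of a number match the same indexed term.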
Attachments
Issue Links
- relates to: LUCENE-6914 DecimalDigitFilter skips characters in some cases (supplemental?) (Resolved)