[LUCENE-6737] Add DecimalDigitFilter - ASF JIRA

Details

Type: New Feature
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 5.4, 6.0
Component/s: None
Labels: None

Lucene Fields: New

Description

TokenFilter that folds all Unicode decimal digits (http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[:General_Category=Decimal_Number:]) to ASCII 0-9.

Historically, many of the affected analyzers couldn't tokenize numbers at all, but they now use StandardTokenizer for numeric and alphanumeric tokens. In practice, text usually contains a mix of ASCII digits and "native" digits, and today that makes searching difficult.

Note that this only affects decimal digits, hence the name DecimalDigitFilter. There is no processing of Chinese numerals or anything crazy like that.
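
The folding rule described above can be illustrated with a minimal standalone sketch. This is not the code from the attached patch: the class name DigitFoldingSketch and the method foldDecimalDigits are hypothetical, and the sketch operates on a plain String using only java.lang.Character, whereas the actual filter works on token streams. It iterates by code point, which is assumed here to be the safe way to handle digits outside the Basic Multilingual Plane.

```java
// Hypothetical illustration only; not the implementation from LUCENE-6737.patch.
final class DigitFoldingSketch {

    // Replaces every non-ASCII code point whose general category is
    // Decimal_Number (Nd) with the matching ASCII digit; everything else
    // is copied through unchanged.
    static String foldDecimalDigits(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); ) {
            int cp = input.codePointAt(i);
            i += Character.charCount(cp); // advance by 1 or 2 chars (surrogate pairs)
            if (cp > 0x7F && Character.getType(cp) == Character.DECIMAL_DIGIT_NUMBER) {
                // Character.digit returns 0..9 for any Nd code point in radix 10.
                out.append((char) ('0' + Character.digit(cp, 10)));
            } else {
                out.appendCodePoint(cp);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Arabic-Indic digits one, two, three mixed with ASCII text and digits.
        System.out.println(foldDecimalDigits("\u0661\u0662\u0663 abc 456"));
    }
}
```

Because the check is on the Decimal_Number category, a Chinese numeral such as U+4E09 (category Lo) passes through untouched, which matches the scope stated in the description.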

Attachments

LUCENE-6737.patch
14/Aug/15 00:36
31 kB
Robert Muir

Issue Links

relates to: LUCENE-6914 DecimalDigitFilter skips characters in some cases (supplemental?) (Resolved)

People

Assignee: Unassigned

Reporter: Robert Muir

Votes: 0

Watchers: 3

Dates

Created: 14/Aug/15 00:35

Updated: 28/Aug/22 14:40

Resolved: 14/Aug/15 13:42