Uploaded image for project: 'Lucene - Core'
  1. Lucene - Core
  2. LUCENE-6737

Add DecimalDigitFilter

    XMLWordPrintableJSON

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.4, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      TokenFilter that folds all unicode digits (http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[:General_Category=Decimal_Number:]) to 0-9.

      Historically a lot of the impacted analyzers couldn't even tokenize numbers at all, but now they use standardtokenizer for numbers/alphanum tokens. But its usually the case you will find e.g. a mix of both ascii digits and "native" digits, and today that makes searching difficult.

      Note this only impacts decimal digits, hence the name DecimalDigitFilter. So no processing of chinese numerals or anything crazy like that.

        Attachments

        1. LUCENE-6737.patch
          31 kB
          Robert Muir

          Issue Links

            Activity

              People

              • Assignee:
                Unassigned
                Reporter:
                rcmuir Robert Muir
              • Votes:
                0 Vote for this issue
                Watchers:
                3 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: