Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.4, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      TokenFilter that folds all unicode digits (http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[:General_Category=Decimal_Number:]) to 0-9.

      Historically a lot of the impacted analyzers couldn't even tokenize numbers at all, but now they use standardtokenizer for numbers/alphanum tokens. But its usually the case you will find e.g. a mix of both ascii digits and "native" digits, and today that makes searching difficult.

      Note this only impacts decimal digits, hence the name DecimalDigitFilter. So no processing of chinese numerals or anything crazy like that.

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                Unassigned
                Reporter:
                rcmuir Robert Muir
              • Votes:
                0 Vote for this issue
                Watchers:
                3 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: