  1. Solr
  2. SOLR-1571

Unicode collation support

Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.5, 3.1, 4.0-ALPHA
    • Component/s: Schema and Analysis
    • Labels: None

    Description

      This patch adds support for Unicode collation (searching and sorting).
      Unicode collation is helpful in a search engine because, for many languages, you want things to match or sort differently.
      If you need to support multiple languages, you might even want to use copyField to support different sort orders or matching schemes.

      This is simply a factory for Lucene's CollationKeyFilter, which indexes binary collation keys in a special format that preserves binary sort order.
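
      To illustrate why indexing collation keys works for sorting, here is a minimal sketch using only the JDK's java.text classes. It is not the patch's code: the class and method names (CollationKeyDemo, sortByKeyBytes, compareUnsigned) are made up for illustration, and it skips the extra step in which the Lucene filter encodes the key bytes into an index-safe term format. It only shows that comparing the raw key bytes gives the same order as asking the Collator directly.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical demo class; not part of the patch or of Lucene.
class CollationKeyDemo {
    // Compare two byte arrays as unsigned values, most significant byte first.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // Sort terms using only the raw bytes of their collation keys.
    static String[] sortByKeyBytes(Collator collator, String... terms) {
        String[] sorted = terms.clone();
        Arrays.sort(sorted, (x, y) -> compareUnsigned(
                collator.getCollationKey(x).toByteArray(),
                collator.getCollationKey(y).toByteArray()));
        return sorted;
    }

    public static void main(String[] args) {
        // The locale is named explicitly; the JVM default is not consulted.
        Collator collator = Collator.getInstance(Locale.ENGLISH);

        String[] byKeyBytes = sortByKeyBytes(collator, "cherry", "Banana", "apple");

        String[] byCollator = {"cherry", "Banana", "apple"};
        Arrays.sort(byCollator, collator);

        System.out.println(Arrays.toString(byKeyBytes));
        System.out.println(Arrays.equals(byKeyBytes, byCollator));
    }
}
```

      Because the comparison needs nothing but the bytes, an index that stores the keys can sort by plain binary order without loading a Collator at query time.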

      I've added support for creating a Collator in two ways:

      • system collator from a Locale spec (language + country + variant)
      • tailored collator from custom rules in a text file

      There is deliberately no option to use the JVM's "default" locale (I consider this a bit dangerous).
      In this patch, it is mandatory to define the locale explicitly for a system collator.
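
      The two ways of building a Collator, and the rule that a locale must always be named, can be sketched with the JDK's java.text API roughly as follows. This is an illustration under assumptions, not the factory's actual code: CollatorSketch, system and tailored are hypothetical names, and the rules are passed as a string rather than read from a text file.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.Locale;

// Hypothetical helper; the names here are illustrative, not the patch's API.
class CollatorSketch {
    // System collator from an explicit locale spec (language + country + variant).
    // A missing language is rejected instead of falling back to the JVM default.
    static Collator system(String language, String country, String variant) {
        if (language == null || language.isEmpty()) {
            throw new IllegalArgumentException(
                    "language is required; the JVM default locale is never used");
        }
        Locale.Builder builder = new Locale.Builder().setLanguage(language);
        if (country != null && !country.isEmpty()) {
            builder.setRegion(country);
        }
        if (variant != null && !variant.isEmpty()) {
            builder.setVariant(variant);
        }
        return Collator.getInstance(builder.build());
    }

    // Tailored collator from custom rules (for example, the contents of a rules file).
    static Collator tailored(String rules) {
        try {
            return new RuleBasedCollator(rules);
        } catch (ParseException e) {
            throw new IllegalArgumentException("invalid collation rules", e);
        }
    }

    public static void main(String[] args) {
        Collator german = system("de", "DE", null);
        System.out.println(german.compare("a", "b") < 0);

        // Toy rules that reverse the usual order of a, b and c.
        Collator reversed = tailored("< c < b < a");
        System.out.println(reversed.compare("c", "a") < 0);
    }
}
```

      Failing fast on a missing language keeps index contents from silently depending on whichever machine happened to build them.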

      The required lucene-collation-2.9.1.jar is only 12KB.

      Attachments

        1. SOLR-1571.patch
          12 kB
          Robert Muir

            People

              Assignee: Shalin Shekhar Mangar (shalin)
              Reporter: Robert Muir (rcmuir)
              Votes: 4
              Watchers: 1