  1. Lucene - Core
  2. LUCENE-8129

Support for defining a Unicode set filter when using ICUFoldingFilter

Details

    • New, Patch Available

    Description

      While ICUNormalizer2FilterFactory supports a filter attribute for defining a Unicode set filter, ICUFoldingFilterFactory does not. Such a filter makes it possible, for example, to exclude a set of characters from being folded. For Finnish and Swedish, the filter could be defined like this:

      <filter class="solr.ICUFoldingFilterFactory" filter="[^åäöÅÄÖ]"/>

      Note: An additional MappingCharFilterFactory or solr.LowerCaseFilterFactory would be needed to lowercase the characters excluded from folding. This is similar to what Elasticsearch provides (see https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-icu-folding.html).
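
      As a rough sketch of how the pieces in the note above could fit together, the field type below chains the proposed folding filter with a lowercase filter. The filter attribute on ICUFoldingFilterFactory is the feature this issue proposes, so it only works with the patch applied. The field type name text_fi_folded and the choice of solr.StandardTokenizerFactory are assumptions made for illustration, not part of the patch.

```xml
<!-- Hypothetical field type; the name and tokenizer are illustrative assumptions. -->
<fieldType name="text_fi_folded" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Proposed in this issue: fold everything except å, ä, ö and their uppercase forms. -->
    <filter class="solr.ICUFoldingFilterFactory" filter="[^åäöÅÄÖ]"/>
    <!-- The excluded characters bypass folding, so they are lowercased separately here. -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

      With a chain like this, Å, Ä and Ö would be expected to end up as å, ä and ö rather than being folded to a and o, while other accented characters are still folded.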

      I'll add a patch that does this in the same way as ICUNormalizer2FilterFactory. It applies at least to master and branch_7x.

      Attachments

        1. LUCENE-8129.patch
          6 kB
          Ere Maijala
        2. LUCENE-8129.patch
          6 kB
          Ere Maijala

          People

            Assignee: Unassigned
            Reporter: Ere Maijala (emaijala)
            Votes: 0
            Watchers: 2
