Lucene - Core / LUCENE-8129

Support for defining a Unicode set filter when using ICUFoldingFilter

    Details

    • Lucene Fields:
      New, Patch Available

      Description

      While ICUNormalizer2FilterFactory supports a filter attribute for defining a Unicode set filter, ICUFoldingFilterFactory does not. Such a filter makes it possible, for example, to exclude a set of characters from being folded. For Finnish and Swedish, the filter could be defined like this:

      <filter class="solr.ICUFoldingFilterFactory" filter="[^åäöÅÄÖ]"/>

      Note: An additional MappingCharFilterFactory or solr.LowerCaseFilterFactory would be needed to lowercase the characters excluded from folding. Elasticsearch provides a similar option (see https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-icu-folding.html).
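
To make the effect of the exclusion concrete, here is a minimal sketch that uses only the JDK. It is not the ICU implementation and not the patch: the class name FilteredFoldingSketch and the method fold are hypothetical, and the "folding" is a rough stand-in (strip combining marks, then lowercase) for what ICUFoldingFilter does. It only illustrates that characters outside the filter set pass through untouched, including their case, which is why a separate lowercasing step is still needed.

```java
import java.text.Normalizer;
import java.util.Set;

public class FilteredFoldingSketch {
    // Code points excluded from folding: a-ring, a-umlaut, o-umlaut in both cases
    // (U+00E5, U+00E4, U+00F6, U+00C5, U+00C4, U+00D6), i.e. the complement
    // of the set [^åäöÅÄÖ] used in the filter attribute.
    private static final Set<Integer> EXCLUDED =
        Set.of(0x00E5, 0x00E4, 0x00F6, 0x00C5, 0x00C4, 0x00D6);

    // Simplified stand-in for folding (an assumption, not ICU's algorithm):
    // decompose, drop non-spacing marks, lowercase. Excluded characters are copied as-is.
    static String fold(String input) {
        // Compose first so that each excluded letter is a single code point.
        String composed = Normalizer.normalize(input, Normalizer.Form.NFC);
        StringBuilder out = new StringBuilder();
        composed.codePoints().forEach(cp -> {
            if (EXCLUDED.contains(cp)) {
                out.appendCodePoint(cp); // untouched: keeps diacritic and case
            } else {
                String single = new String(Character.toChars(cp));
                Normalizer.normalize(single, Normalizer.Form.NFD)
                    .codePoints()
                    .filter(c -> Character.getType(c) != Character.NON_SPACING_MARK)
                    .map(Character::toLowerCase)
                    .forEach(out::appendCodePoint);
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // E-acute is folded, a-umlaut is preserved.
        System.out.println(fold("\u00C9\u00E4"));
        // Capital A-ring stays uppercase, so a lowercase filter is still needed.
        System.out.println(fold("\u00C5bo"));
    }
}
```

In this sketch the excluded capital letters keep their case, which mirrors the note above: with the real filter, a LowerCaseFilterFactory (or a mapping char filter) would handle them.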

      I'll add a patch that implements this in the same way as ICUNormalizer2FilterFactory. The patch applies at least to master and branch_7x.

        Attachments

        1. LUCENE-8129.patch
          6 kB
          Ere Maijala
        2. LUCENE-8129.patch
          6 kB
          Ere Maijala

            People

            • Assignee:
              Unassigned
              Reporter:
              emaijala Ere Maijala
