- Type: Improvement
- Status: Closed
- Priority: Minor
- Resolution: Fixed
- Affects Version/s: None
- Component/s: modules/analysis
- Lucene Fields: New, Patch Available
ICUNormalizer2FilterFactory supports a filter attribute that defines a Unicode set filter, but ICUFoldingFilterFactory does not. Such a filter makes it possible, for example, to exclude a set of characters from being folded. For Finnish and Swedish the filter could be defined like this:
<filter class="solr.ICUFoldingFilterFactory" filter="[^åäöÅÄÖ]"/>
Note: An additional MappingCharFilterFactory or solr.LowerCaseFilterFactory would be needed to lowercase the characters excluded from folding. This is similar to what Elasticsearch provides (see https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-icu-folding.html).
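To illustrate the note above, here is a rough sketch of how a complete analyzer chain might look with the proposed attribute. The field type name text_fi_folded and the choice of solr.StandardTokenizerFactory are assumptions made for illustration only, and the filter attribute on ICUFoldingFilterFactory is the one proposed in this issue, so it only works once the patch is applied.

```xml
<!-- Hypothetical field type; the name and tokenizer are illustrative assumptions. -->
<fieldType name="text_fi_folded" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Fold everything except å, ä, ö and their uppercase forms
         (uses the filter attribute proposed in this issue). -->
    <filter class="solr.ICUFoldingFilterFactory" filter="[^åäöÅÄÖ]"/>
    <!-- Lowercase the characters that were excluded from folding. -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With a chain like this, the excluded characters skip folding and are only lowercased by the second filter, so Å would be expected to end up as å rather than a.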
I'll add a patch that implements this in the same way as ICUNormalizer2FilterFactory. It applies at least to master and branch_7x.