Affects Version/s: None
Fix Version/s: None
Component/s: contrib - Solr Cell (Tika extraction)
This follows from
1. Tika generates the field Category=Foo for a doc (e.g., it comes from user-defined document properties).
4. The user supplies literal.category=bar.
According to the rules, literalsOverride is applied before lowernames, so it has no effect here: at that stage the field Category from Tika and literal.category are still considered different fields. When lowernames=true then kicks in, it merges Category into category, giving it both values, Foo and bar.
Adding fmap.Category=tika_category does not help because fmap is applied even later; by that time category already contains both Foo and bar.
Adding fmap.Category=tika_category together with lowernames=false would work (regardless of literalsOverride), but what if we need lowernames=true, or the capitalization of Category can vary (e.g., CATEGORY)?
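The interaction described above can be reproduced with a small toy model. This is only a sketch of the static rule order as described in this issue (literalsOverride, then lowernames, then fmap). It is not Solr's actual code: the function extract and its arguments are hypothetical names, and lowernames is simplified to plain lowercasing.

```python
def extract(tika_fields, literals, literals_override=True,
            lowernames=False, fmap=None):
    """Toy model of the static rule order described in this issue.

    Hypothetical helper, not Solr's implementation.
    """
    fmap = fmap or {}

    # Start from the fields Tika generated.
    doc = {name: list(values) for name, values in tika_fields.items()}

    # literalsOverride: a literal replaces a Tika field only when the
    # names match exactly (case-sensitive); otherwise it is appended.
    for name, values in literals.items():
        if literals_override or name not in doc:
            doc[name] = list(values)
        else:
            doc[name].extend(values)

    # lowernames (simplified): fields that differ only by case are merged.
    if lowernames:
        lowered = {}
        for name, values in doc.items():
            lowered.setdefault(name.lower(), []).extend(values)
        doc = lowered

    # fmap is applied last, on the names that exist at this point.
    mapped = {}
    for name, values in doc.items():
        mapped.setdefault(fmap.get(name, name), []).extend(values)
    return mapped


tika = {"Category": ["Foo"]}
literals = {"category": ["bar"]}

# lowernames=true merges Category into category despite literalsOverride.
merged = extract(tika, literals, lowernames=True)

# fmap.Category comes too late: the field is already named category.
late_fmap = extract(tika, literals, lowernames=True,
                    fmap={"Category": "tika_category"})

# Workaround: lowernames=false keeps the two fields apart.
workaround = extract(tika, literals, lowernames=False,
                     fmap={"Category": "tika_category"})
```

In this model, merged and late_fmap both end up with category holding Foo and bar, while workaround keeps Foo in tika_category and bar in category.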
Would it make sense to have an option to apply the rules in the order that they are specified in the config file or URL params rather than always in a static order?
PS. Marking this as Major because there seems to be no easy workaround (condition for Minor).
Response from Jan Høydahl (link):
To me it sounds like a potential, very simple solution would be to apply lowercasing at several places if lowernames=true
Agreed. In particular, lowernames=true should be applied as soon as Tika has extracted a field, before literalsOverride is even considered.
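A minimal sketch of that suggestion, under the same caveats: extract_lowercase_first is a hypothetical helper, not Solr code, lowernames is reduced to plain lowercasing, and lowercasing the literal names as well is an assumption made here for symmetry.

```python
def extract_lowercase_first(tika_fields, literals,
                            literals_override=True, lowernames=True):
    """Toy model: lowercase field names before literalsOverride is checked.

    Hypothetical helper illustrating the proposal, not Solr's implementation.
    """
    doc = {}

    # Lowercase Tika field names as soon as they are extracted.
    for name, values in tika_fields.items():
        key = name.lower() if lowernames else name
        doc.setdefault(key, []).extend(values)

    # literalsOverride now compares against the already lowercased names.
    for name, values in literals.items():
        key = name.lower() if lowernames else name  # assumption: literals too
        if literals_override or key not in doc:
            doc[key] = list(values)
        else:
            doc[key].extend(values)
    return doc


# The literal now replaces the Tika value, whatever its capitalization.
result = extract_lowercase_first({"Category": ["Foo"]}, {"category": ["bar"]})
upper = extract_lowercase_first({"CATEGORY": ["Foo"]}, {"category": ["bar"]})

# With literalsOverride=false the values are still combined, as before.
appended = extract_lowercase_first({"Category": ["Foo"]},
                                   {"category": ["bar"]},
                                   literals_override=False)
```

With this ordering the literal wins for both Category and CATEGORY, which covers the varying-capitalization case raised in the description.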