Solr / SOLR-5362

SolrCell's order of field operation with lowernames=true

    Details

      Description

      This follows from SOLR-1634.

      I am not sure whether SOLR-1856 completely fixes SOLR-1634, particularly when lowernames=true comes into the picture. Consider a case where:

      1. Tika generates the field Category=Foo for a doc (e.g., this comes from user-defined document properties).

      2. literalsOverride=true.

      3. lowernames=true.

      4. The user supplies literal.category=bar.

      According to the rules, literalsOverride is applied before lowernames, so it has no effect here: at that stage, the Tika field Category and literal.category are still considered different fields. When lowernames=true then kicks in, it merges Category into category, giving it both values, Foo and bar.
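
      The ordering described above can be sketched as a toy model. This is only an illustration of the sequence reported in this issue, not SolrCell's actual code: the function name extract and its arguments are hypothetical, and lowernames is simplified to plain lowercasing.

```python
def extract(tika_fields, literals, literals_override=True, lowernames=True):
    """Toy model of the reported ordering; not Solr's implementation."""
    doc = {}
    # Step 1: literalsOverride compares field names exactly,
    # so "Category" (from Tika) does not match "category" (the literal).
    for name, value in tika_fields.items():
        if literals_override and name in literals:
            continue  # Tika value dropped in favour of the literal
        doc.setdefault(name, []).append(value)
    for name, value in literals.items():
        doc.setdefault(name, []).append(value)
    # Step 2: lowernames merges field names that differ only by case.
    if lowernames:
        merged = {}
        for name, values in doc.items():
            merged.setdefault(name.lower(), []).extend(values)
        doc = merged
    return doc


result = extract({"Category": "Foo"}, {"category": "bar"})
print(result)  # {'category': ['Foo', 'bar']}
```

      In this model the literal does not override the Tika value because the name comparison happens before the names are normalized.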

      Adding fmap.Category=tika_category does not help because fmap is applied even later; by that time, category already contains both Foo and bar.

      Adding fmap.Category=tika_category with lowernames=false would work (regardless of literalsOverride), but what if we need lowernames=true, or the capitalization of Category can vary (e.g., CATEGORY)?
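
      For reference, the parameters of that workaround could be assembled as follows. This is a minimal sketch: the host, port, and core name (localhost:8983, mycore) are assumptions for illustration, and only the query string is built; no request is sent.

```python
from urllib.parse import urlencode

# Parameters for the workaround: map the Tika field away, keep names as-is.
params = {
    "literal.category": "bar",
    "fmap.Category": "tika_category",
    "lowernames": "false",
}
query = urlencode(params)

# Hypothetical base URL; adjust to the actual Solr host and core.
url = "http://localhost:8983/solr/mycore/update/extract?" + query
print(url)
```

      Note that fmap.Category matches only that exact capitalization, which is why the workaround breaks down when the source field may arrive as CATEGORY.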

      Would it make sense to have an option to apply the rules in the order that they are specified in the config file or URL params rather than always in a static order?

      Thanks.

      PS. Marking this as Major because there seems to be no easy workaround (condition for Minor).

      ------------------------

      Response from Jan Høydahl (link):

      "To me it sounds like a potential, very simple solution would be to apply lowercasing at several places if lowernames=true."

      Agreed. In particular, lowernames=true should be applied as soon as Tika has extracted a field, before literalsOverride is even considered.
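
      A sketch of that proposed ordering, again as a toy model rather than a patch: the function extract_lowercase_first is hypothetical, and lowernames is simplified to plain lowercasing of both Tika and literal field names.

```python
def extract_lowercase_first(tika_fields, literals, literals_override=True):
    """Toy model: normalize names first, then apply literalsOverride."""
    # Lowercase literal names up front so the comparison is case-insensitive.
    lowered_literals = {name.lower(): value for name, value in literals.items()}
    doc = {}
    for name, value in tika_fields.items():
        key = name.lower()  # lowernames applied right after extraction
        if literals_override and key in lowered_literals:
            continue  # the literal wins; the Tika value is dropped
        doc.setdefault(key, []).append(value)
    for key, value in lowered_literals.items():
        doc.setdefault(key, []).append(value)
    return doc


fixed = extract_lowercase_first({"Category": "Foo"}, {"category": "bar"})
print(fixed)  # {'category': ['bar']}
```

      With this ordering, the same result holds in the model regardless of how the Tika field is capitalized (Category, CATEGORY, and so on).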

            People

            • Assignee:
              Unassigned
              Reporter:
              Chaiyasit (Sit) Manovit
            • Votes:
              0
              Watchers:
              2
