Details

    • Type: New Feature
    • Status: Reopened
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 6.0
    • Component/s: update
    • Labels:
      None

      Description

      When dealing with highly heterogeneous documents with different fields per document, it can be very useful to know what fields are present on the result documents from a search. For example, this could be used to determine which fields make the best facets for a given query.
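      To make the facet use case concrete, here is a minimal, Solr-independent Java sketch. It assumes the result documents are available client-side as plain Map<String, Object> objects; FieldCoverage and countFields are hypothetical names, not Solr API. It counts how many result documents contain each field and ranks fields by that count, so fields present on many (but not all) results stand out as possible facet candidates.

```java
import java.util.*;

/** Hypothetical helper (not Solr API): per-field document counts over a result set. */
public class FieldCoverage {

    // Returns field name -> number of documents containing that field,
    // ordered by descending count, with ties broken alphabetically.
    public static LinkedHashMap<String, Integer> countFields(List<Map<String, Object>> docs) {
        Map<String, Integer> counts = new HashMap<>();
        for (Map<String, Object> doc : docs) {
            for (String field : doc.keySet()) {
                counts.merge(field, 1, Integer::sum); // each doc contributes at most 1 per field
            }
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        LinkedHashMap<String, Integer> ranked = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : entries) {
            ranked.put(e.getKey(), e.getValue());
        }
        return ranked;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> results = List.of(
                Map.of("id", "1", "color_attribute", "red", "size_attribute", "L"),
                Map.of("id", "2", "color_attribute", "blue"),
                Map.of("id", "3", "weight_attribute", "2kg"));
        // prints {id=3, color_attribute=2, size_attribute=1, weight_attribute=1}
        System.out.println(countFields(results));
    }
}
```

      If each document instead carries a multivalued field listing its own field names, faceting on that field yields the same per-field document counts server-side, without fetching the documents.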

      1. FieldsUsedUpdateProcessorFactory.java (4 kB, Erik Hatcher)
      2. FieldsUsedUpdateProcessorFactory.java (1 kB, Erik Hatcher)
      3. SOLR-1280.patch (2 kB, Erik Hatcher)

        Activity

        Erik Hatcher added a comment (edited)

        This is just a quick implementation demonstrating a dynamically added field that lists all the fields on a given document.

        TODO: Make this configurable to specify the field name, and perhaps to list fields (possibly by pattern) to include or exclude.

        Erik Hatcher added a comment

        Updated version that allows configuring the name of the "fields used" field and a regex for matching the field names to include.
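        The core of this configurable behavior can be sketched in plain Java. This is not the attached FieldsUsedUpdateProcessorFactory and does not use Solr's update processor API; it assumes a document is a mutable Map<String, Object>, and FieldsUsedSketch and addFieldsUsed are hypothetical names. It collects every field name matching the configured regex into a multivalued "fields used" field, using the example values attribute_fields and .*_attribute. Requiring a full-name match (Matcher.matches()) rather than a substring match is an assumption of this sketch.

```java
import java.util.*;
import java.util.regex.Pattern;

/** Hypothetical, Solr-independent sketch of the "fields used" idea. */
public class FieldsUsedSketch {

    // Adds a list of the document's field names that fully match fieldNameRegex,
    // stored under fieldsUsedFieldName. Does nothing if no field matches.
    public static void addFieldsUsed(Map<String, Object> doc,
                                     String fieldsUsedFieldName,
                                     Pattern fieldNameRegex) {
        List<String> used = new ArrayList<>();
        for (String name : doc.keySet()) {
            // Skip the target field itself in case the pattern also matches its name.
            if (!name.equals(fieldsUsedFieldName) && fieldNameRegex.matcher(name).matches()) {
                used.add(name);
            }
        }
        Collections.sort(used); // deterministic order, independent of map iteration order
        if (!used.isEmpty()) {
            doc.put(fieldsUsedFieldName, used); // put happens after iteration, so no concurrent modification
        }
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", "SKU-1");
        doc.put("color_attribute", "red");
        doc.put("size_attribute", "L");
        addFieldsUsed(doc, "attribute_fields", Pattern.compile(".*_attribute"));
        System.out.println(doc.get("attribute_fields")); // [color_attribute, size_attribute]
    }
}
```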

        Erik Hatcher added a comment

        With this update, the config can look something like this:

            <updateRequestProcessorChain name="fields_used" default="true">
              <processor class="solr.processor.FieldsUsedUpdateProcessorFactory">
                <str name="fieldsUsedFieldName">attribute_fields</str>
                <str name="fieldNameRegex">.*_attribute</str>
              </processor>
              <processor class="solr.LogUpdateProcessorFactory" />
              <processor class="solr.RunUpdateProcessorFactory" />
            </updateRequestProcessorChain>
        

        Regex was chosen to allow flexibility in matching the field names to include, but a better (more easily understood and configured) approach might be a comma-separated list of field names, each of which could contain "*" as a glob wildcard. That should provide about all the flexibility this needs.
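        One possible reading of the comma-separated glob alternative, sketched in Java: each entry is split on "*", the literal parts are escaped with Pattern.quote, and the entries are joined into a single pattern that must match the whole field name. FieldGlobs and compileGlobList are hypothetical names, and treating "*" as the only wildcard is an assumption taken from the proposal.

```java
import java.util.*;
import java.util.regex.Pattern;

/** Hypothetical helper: turns "*_attribute, attr_*, title" into one regex Pattern. */
public class FieldGlobs {

    // Compiles a comma-separated glob list (only '*' is special) into one Pattern.
    // Use Matcher.matches() so each glob must cover the entire field name.
    public static Pattern compileGlobList(String globList) {
        StringJoiner alternatives = new StringJoiner("|");
        for (String raw : globList.split(",")) {
            String glob = raw.trim();
            if (glob.isEmpty()) {
                continue;
            }
            StringBuilder regex = new StringBuilder();
            // The -1 limit keeps trailing empty strings, so "attr_*" yields ["attr_", ""].
            String[] parts = glob.split("\\*", -1);
            for (int i = 0; i < parts.length; i++) {
                if (i > 0) {
                    regex.append(".*"); // each '*' becomes "any run of characters"
                }
                if (!parts[i].isEmpty()) {
                    // Quote literal text so characters like '.' are not regex syntax.
                    regex.append(Pattern.quote(parts[i]));
                }
            }
            alternatives.add("(?:" + regex + ")");
        }
        return Pattern.compile(alternatives.toString());
    }

    public static void main(String[] args) {
        Pattern p = compileGlobList("*_attribute, attr_*, title");
        for (String field : List.of("color_attribute", "attr_size", "title", "subtitle")) {
            System.out.println(field + " -> " + p.matcher(field).matches());
        }
    }
}
```

        Escaping the literal parts means an entry such as "price.usd" matches only that exact name, whereas the same text used as a raw regex would also match "priceXusd".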

        Mohammad added a comment

        This is functionality that would be useful in my project. I have no experience in Java, but I have studied the code; it looks straightforward, and I could probably use it out of the box in my own Solr build. Has anyone used this in production? Have there been any issues?

        Erik Hatcher added a comment

        In a few days, I'm going to address this one by using the new script update processor and providing an example script that achieves the same thing without the need to add these classes to Solr.

        Mohammad, I'm unaware of this being used in production as-is, though it was distilled from the same idea being used at the Smithsonian. Feel free to borrow it for your own application, but I won't be committing it to Solr; instead, I'll refactor it into a script update processor example.

        Mohammad added a comment (edited)

        That sounds awesome. I found this JIRA issue about the script update processor (https://issues.apache.org/jira/browse/SOLR-1725), which relates to your comment. I plan to read it today and do some further research on its uses. I will attempt to write the script myself (I am new to Solr), but I look forward to your implementation.

        Is there anything you can suggest I read?

        Erik Hatcher added a comment

        Patch implementing the "fields used" technique using a JavaScript update processor.

        Erik Hatcher added a comment

        I've added a patch that I'll commit to trunk and 4_x (with the script update processor chain commented out, just like dedupe and langid are now). It implements this "fields used" trick in a few lines of (not so elegant, but straightforward, standard) JavaScript using a regex pattern (/attr_.*/).

        Erik Hatcher added a comment

        Committed a slightly updated version (with a CHANGES entry and more comments) to trunk (r1366588) and 4_x (r1366589).

        Erik Hatcher added a comment

        Added this as commented-out pieces of a JavaScript update processor.

        Erik Hatcher added a comment

        Re-opening this issue to add this as a general-purpose (Java-based) update processor. As it stands, what is in Solr proper now is a commented-out snippet in the techproducts-specific update-script.js that achieves the same goal, but more crudely.


          People

          • Assignee: Erik Hatcher
          • Reporter: Erik Hatcher
          • Votes: 3
          • Watchers: 8
