Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: update
    • Labels:
      None

      Description

      UpdateProcessorChains must currently be defined with XML in solrconfig.xml. We should explore a scriptable chain implementation with a DSL that allows for full flexibility. The first step would be to make UpdateChain implementations pluggable in solrconfig.xml, so that the existing XML-defined chains keep working for backward compatibility.
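
      As a rough illustration of the "pluggable" first step, the sketch below selects a chain implementation by name and falls back to a default when none is named, which is one way existing configurations could keep working. Everything in it is an assumption made for illustration: `PluggableChainSketch`, `ChainFactory`, the `"xml"` and `"scripted"` keys, and the comma-split "parser" are hypothetical and are not Solr's plugin API.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these types are real Solr classes.
class PluggableChainSketch {

    /** Turns a config source (XML snippet, script name, ...) into an ordered list of processor names. */
    interface ChainFactory {
        List<String> build(String source);
    }

    private static final String DEFAULT_IMPL = "xml";
    private final Map<String, ChainFactory> impls = new LinkedHashMap<>();

    PluggableChainSketch() {
        // Stand-in for today's XML-defined chain; the "parsing" is just a comma split.
        register(DEFAULT_IMPL, source -> List.of(source.split(",")));
    }

    /** Registers an alternative chain implementation under a name. */
    void register(String name, ChainFactory factory) {
        impls.put(name, factory);
    }

    /** A null implName selects the default, so configs that name no implementation behave as before. */
    List<String> load(String implName, String source) {
        String key = (implName == null) ? DEFAULT_IMPL : implName;
        ChainFactory factory = impls.get(key);
        if (factory == null) {
            throw new IllegalArgumentException("Unknown chain implementation: " + key);
        }
        return factory.build(source);
    }

    public static void main(String[] args) {
        PluggableChainSketch registry = new PluggableChainSketch();
        // Placeholder for a scripted implementation: it only records the script name.
        registry.register("scripted", source -> List.of("script:" + source));

        System.out.println(registry.load(null, "langid,log,run"));
        System.out.println(registry.load("scripted", "updateprocessing.groovy"));
    }
}
```

      The point of the default key is that a configuration which does not mention an implementation still resolves to the XML-style behaviour, while a scripted implementation can be added without touching that path.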

      Benefits and possibilities with a Scriptable UpdateChain:

      • A compact DSL for defining Processors and Chains (Workflows would be a better, less limited term here)
      • Keeping update processor config separate from solrconfig.xml gives a better separation of roles
      • Use this as an opportunity to natively support scripting language Processors (ideas from SOLR-1725)

      This issue is spun off from SOLR-2823.

        Activity

        Jan Høydahl created issue -
        Jan Høydahl added a comment -

        The DSL could be based on Groovy, JRuby, Jython or JS. Here's my quasi sketch of a Groovy example from 2823:

        ...This approach also solves another wish of mine, namely being able to define chains outside of solrconfig.xml. Logically, configuring schema and document processing is done by a "content" guy, but configuring solrconfig.xml is done by the "hardware/operations" guys. Imagine a solr/conf/updateprocessing.groovy referenced from solrconfig.xml:

        <updateProcessorChain class="solr.ScriptedUpdateProcessorChainFactory" file="updateprocessing.groovy" />
        

        updateprocessing.groovy:

        chain simple {
          process(langid)
          process(copyfield)
          chain(logAndRun)
        }
        
        chain moreComplex {
          process(langid)
          if(doc.getFieldValue("employees") > 10)
            process(copyfield)
          else
            chain(myOtherProcesses)
          doc.deleteField("title")
          chain(logAndRun)
        }
        
        chain logAndRun {
          process(log)
          process(run)
        }
        
        processor langid {
          class = "solr.LanguageIdentifierUpdateProcessorFactory"
          config("langid.fl", "title,body")
          config("langid.langField", "language")
          config("map", true)
        }
        
        processor copyfield {
          script = "copyfield.groovy"
          config("from", "title")
          config("to", "title_en")
        }
        

        I don't know what it takes to code such a thing, but if we had it, I'd never go back to defining pipelines in XML.
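
        To make the semantics of the sketch above a little more concrete, here is a minimal Java model of what such a DSL might evaluate to: a chain is an ordered list of steps, a chain is itself a step (so chains nest), and a branch picks one of two steps per document. This is only an assumption about one possible design. `ChainSketch`, `Chain`, `copyField` and `mark` are hypothetical names, a document is modelled as a plain field map rather than a Solr document, and the `langid` step is left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical model: a document is a mutable field map and a step is anything
// that can mutate it. None of these types are Solr classes.
class ChainSketch {

    /** A named sequence of steps; a chain is itself a step, so chains can nest. */
    static final class Chain implements Consumer<Map<String, Object>> {
        final String name;
        private final List<Consumer<Map<String, Object>>> steps = new ArrayList<>();

        Chain(String name) {
            this.name = name;
        }

        /** Appends a processor or a nested chain. */
        Chain then(Consumer<Map<String, Object>> step) {
            steps.add(step);
            return this;
        }

        /** Appends a branch: runs onTrue when the test passes, otherwise onFalse. */
        Chain branch(Predicate<Map<String, Object>> test,
                     Consumer<Map<String, Object>> onTrue,
                     Consumer<Map<String, Object>> onFalse) {
            steps.add(doc -> (test.test(doc) ? onTrue : onFalse).accept(doc));
            return this;
        }

        @Override
        public void accept(Map<String, Object> doc) {
            for (Consumer<Map<String, Object>> step : steps) {
                step.accept(doc);
            }
        }
    }

    /** Stand-in for the "copyfield" processor: copies one field to another if present. */
    static Consumer<Map<String, Object>> copyField(String from, String to) {
        return doc -> {
            if (doc.containsKey(from)) {
                doc.put(to, doc.get(from));
            }
        };
    }

    /** Stand-in for a "log"-style processor: records that it ran in a marker field. */
    static Consumer<Map<String, Object>> mark(String field) {
        return doc -> doc.put(field, Boolean.TRUE);
    }

    /** Builds a chain shaped like "moreComplex" from the sketch (without langid). */
    static Chain moreComplex() {
        Chain logAndRun = new Chain("logAndRun").then(mark("logged"));
        Chain other = new Chain("myOtherProcesses").then(mark("other"));
        return new Chain("moreComplex")
            .branch(doc -> ((Number) doc.getOrDefault("employees", 0)).intValue() > 10,
                    copyField("title", "title_en"),
                    other)
            .then(doc -> doc.remove("title"))
            .then(logAndRun);
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("title", "Hello");
        doc.put("employees", 25);
        moreComplex().accept(doc);
        System.out.println(doc); // prints {employees=25, title_en=Hello, logged=true}
    }
}
```

        Treating a chain as just another step is what lets `chain(logAndRun)` appear inside other chains without special handling; a real implementation would also need processor configuration and script loading, which this model ignores.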

        Lance Norskog added a comment -

        +1

        Another use case for scripting at the top level is "multi-query" queries: where the app creates the second based on the first. Would your proposal handle this problem?

        Many use cases for grouping/collapsing can be implemented with 2 queries. Perhaps the guts of collapsing could be simplified if the more outré use cases could be pushed out into multiple queries.

        Jan Høydahl added a comment -

        Interesting idea, but only the update side has been considered thus far.

        Chris Male added a comment -

        Another use case for scripting at the top level is "multi-query" queries: where the app creates the second based on the first. Would your proposal handle this problem?

        So you mean having scriptable query time logic? That seems a pretty scary thing to get into.


          People

          • Assignee:
            Unassigned
            Reporter:
            Jan Høydahl
          • Votes:
            1
            Watchers:
            4
