Solr / SOLR-2370

Let some UpdateProcessors be default without explicitly configuring them

    Details

      Description

      Problem:
      Today the user needs to make sure that crucial UpdateProcessors like the Log and Run UpdateProcessors are present when creating a new UpdateRequestProcessorChain. This is error-prone, and when a new core UpdateProcessor is introduced, as in SOLR-2358, all existing users need to insert the change into all their pipelines.

      A customer-made pipeline should not need to care about distributed indexing, logging or anything else, and should be as slim as possible.

          Activity

          Yonik Seeley added a comment -

          I'd like to hear from others, but at first blush it seems like a good idea.

          aside: The description field of a JIRA issue is repeated in every update to the mailing list. It's probably best to use a few sentences to summarize and put more meat in a comment.

          Jan Høydahl added a comment -

          (Moving proposal to a comment as per best-practice)

          Proposal:
          The proposal is to borrow the <first-components> and <last-components> pattern used in RequestHandler configs. That way, all core processors could be included either first or last by default in all UpdateChains.

          To do this, we need a place to configure the defaults, e.g. by a default="true" param:

          <updateRequestProcessorChain name="default" default="true">
            <first-processors>
              <processor class="solr.DistributedUpdateRequestProcessor"/>
            </first-processors>
            <last-processors>
              <processor class="solr.LogUpdateProcessorFactory" />
              <processor class="solr.RunUpdateProcessorFactory" />
            </last-processors>
          </updateRequestProcessorChain>
          

          Next, the customer made chain will be only the "center" part:

          <updateRequestProcessorChain name="mychain">
            <processor class="my.nice.DoSomethingProcessor"/>
            <processor class="my.nice.DoAnotherThingProcessor"/>
          </updateRequestProcessorChain>
          

          To override the core processors config for a particular chain, you would start a clean chain with the parameter reset="true":

          <updateRequestProcessorChain name="mychain" reset="true">
            <processor class="my.nice.DoSomethingProcessor"/>
            <processor class="my.nice.DoAnotherThingProcessor"/>
            <processor class="solr.RunUpdateProcessorFactory" />
          </updateRequestProcessorChain>
          

          If you only need to make sure that one of your custom processors runs at the very beginning or the very end, you could use:

          <updateRequestProcessorChain name="mychain">
            <processor class="my.nice.DoSomethingProcessor"/>
            <processor class="my.nice.DoAnotherThingProcessor"/>
            <last-processors>
              <processor class="solr.MySpecialDebugProcessor" />
            </last-processors>
          </updateRequestProcessorChain>
          

          The default should be reset="false", but the example schema could keep the default chain commented out to provide backward compatibility for upgraders.
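
          The two clearest cases above (default wrapping and reset="true") can be sketched roughly as follows. This is only an illustration under stated assumptions, not Solr code: `resolve_wrapped_chain` is a hypothetical name, processors are plain strings, and the chain-level <last-processors> case is left out because the proposal does not say how it combines with the defaults.

```python
# Hypothetical sketch of the proposed resolution rules; not Solr code.
# Processors are represented by plain strings for illustration only.

# Defaults taken from the default="true" chain in the proposal.
DEFAULT_FIRST = ["solr.DistributedUpdateRequestProcessor"]
DEFAULT_LAST = ["solr.LogUpdateProcessorFactory", "solr.RunUpdateProcessorFactory"]


def resolve_wrapped_chain(processors, reset=False):
    """Return the effective processor list for a custom chain.

    With reset=False (the proposed default) the custom processors form the
    "center" part, wrapped by the default first/last processors.
    With reset=True the chain is used exactly as written.
    """
    if reset:
        return list(processors)
    return DEFAULT_FIRST + list(processors) + DEFAULT_LAST


if __name__ == "__main__":
    custom = ["my.nice.DoSomethingProcessor", "my.nice.DoAnotherThingProcessor"]
    # Default behaviour: custom processors are wrapped by the defaults.
    print(resolve_wrapped_chain(custom))
    # reset="true": the chain must list solr.RunUpdateProcessorFactory itself.
    print(resolve_wrapped_chain(custom + ["solr.RunUpdateProcessorFactory"], reset=True))
```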

          Hoss Man added a comment -

          I'm sort of on board with this ... personally i think UpdateProcessors are a complicated enough beast that if you are configuring them you really need to configure the whole chain – but assuming i'm in the minority there, the part i don't understand is the value add in the 'customer made chain will be only the "center" part' and reset="true" pieces ... that seems overly complicated.

          The pattern (ultimately) implemented with search components was that:

          • If first-components was specified, it came before the default list.
          • If last-components was specified, it came after the default list, but before debug.
          • If a full components list was specified (instead of first/last lists) it overrode the entire default list.

          the same pattern seems like it would make perfect sense here - substituting RunUpdateProcessorFactory for DebugComponent. If you specify your own complete chain, then you override the complete chain. If you specify a list of first-components they come before all the default stuff. if you specify last-components they come after default stuff, but RunUpdateProcessorFactory is still used at the end.
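
          A minimal sketch of that search-component-style pattern, for comparison. It is not Solr code: `resolve_component_style` is a hypothetical name, processors are plain strings, and the sketch assumes the default list is kept without the final RunUpdateProcessorFactory, which is always appended unless a full chain is given.

```python
# Hypothetical sketch of the search-component-style pattern; not Solr code.
RUN = "solr.RunUpdateProcessorFactory"


def resolve_component_style(default, first=None, last=None, full=None):
    """Resolve a chain the way first-/last-components work for search components.

    `default` is the default list WITHOUT the final RUN processor (an
    assumption of this sketch). A `full` list overrides everything and is
    used exactly as given, so it must name RUN itself.
    """
    if full is not None:
        return list(full)
    # first comes before the defaults, last comes after them, RUN stays at the end.
    return list(first or []) + list(default) + list(last or []) + [RUN]


if __name__ == "__main__":
    defaults = ["solr.LogUpdateProcessorFactory"]
    print(resolve_component_style(defaults, first=["my.nice.DoSomethingProcessor"]))
    print(resolve_component_style(defaults, full=["my.nice.DoSomethingProcessor", RUN]))
```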

          what am i missing that necessitates this idea of replacing the "center" part of the chain being the default?

          (maybe i'm just missing the point of the examples ... it would help to know what the hypothetical default chain is in these scenarios, and then what the final resulting chain would be in each instance)

          Jan Høydahl added a comment -

          I was assuming that the DistributedUpdateProcessor would always want to run first, and then the rest of the processing could happen per shard. That way you get somewhat load-balanced processing, compared to running the whole chain before distributing. So you have:

          DistributedUpdateProcessor
          CustomUpdateProcessor
          CustomUpdateProcessor
          LogUpdateProcessor
          RunUpdateProcessor

          So in my head it makes most sense to insert user chains in the middle. A more explicit way to do that could be:

          <updateRequestProcessorChain name="mychain">
            <middle-processors>
              <processor class="my.nice.DoSomethingProcessor"/>
              <processor class="my.nice.DoAnotherThingProcessor"/>
            </middle-processors>
          </updateRequestProcessorChain>
          

          and let the existing syntax define the whole chain as today. We then only need to find a way to mark the "middle" of the default chain.
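
          One conceivable way to mark the "middle" is a placeholder entry in the default chain. The sketch below only illustrates that idea: the `MIDDLE` marker and `insert_middle` are made-up names, not existing Solr elements, and processors are plain strings.

```python
# Hypothetical sketch: mark the "middle" of the default chain with a placeholder.
MIDDLE = "<middle>"  # made-up marker, not an existing Solr element

DEFAULT_CHAIN = [
    "DistributedUpdateProcessor",
    MIDDLE,
    "LogUpdateProcessor",
    "RunUpdateProcessor",
]


def insert_middle(middle_processors, default_chain=DEFAULT_CHAIN):
    """Replace the marker in the default chain with the user's processors."""
    resolved = []
    for name in default_chain:
        if name == MIDDLE:
            resolved.extend(middle_processors)
        else:
            resolved.append(name)
    return resolved


if __name__ == "__main__":
    # Reproduces the ordering listed above: distributed first, custom
    # processors in the middle, then log and run.
    print(insert_middle(["CustomUpdateProcessor", "CustomUpdateProcessor"]))
```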


            People

            • Assignee:
              Unassigned
              Reporter:
              Jan Høydahl