Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0-ALPHA
    • Component/s: SolrCloud, update
    • Labels: None

      Description

      An update will first go through processors until it gets to the point where it is forwarded to the leader (or forwarded to replicas if already on the leader).
      We need a way to skip over the processors that were already run (perhaps by using a processor chain dedicated to sub-updates?)

      Attachments

      1. SOLR-2822.patch (59 kB) by Hoss Man
      2. SOLR-2822.patch (57 kB) by Hoss Man
      3. SOLR-2822.patch (45 kB) by Hoss Man

          Activity

          Jan Høydahl added a comment -

          Nice thought to allow modeling the flow in the update-chain itself as an intuitive way of choosing which processing to do on the receiving node, on the leader node, or on each replica:

          A-UP -> B-UP -> Distrib-UP(leader) -> C-UP -> Distrib-UP(replica) -> D-UP -> Run-UP
          

          The Distrib processor could set some state info on the request to the next node so that chain processing could continue where it left off. E.g. &update.chain.nextproc=<name/id-of-next-proc>. This would require introduction of named processor instances.
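
          A purely illustrative sketch of that hint-passing idea (only ModifiableSolrParams is an existing Solr class here; the param name is just the one proposed above, and the helper class is hypothetical):

            // Hypothetical: before forwarding, the Distrib processor records where the
            // receiving node should resume the chain. "update.chain.nextproc" is only
            // the param name proposed in this comment, not an actual Solr parameter.
            import org.apache.solr.common.params.ModifiableSolrParams;

            public class NextProcSketch {
              static ModifiableSolrParams forwardParams(String nextProcessorName) {
                ModifiableSolrParams params = new ModifiableSolrParams();
                params.set("update.chain.nextproc", nextProcessorName);
                return params;
              }

              public static void main(String[] args) {
                // e.g. tell the leader to resume the chain at the processor named "C-UP"
                System.out.println(forwardParams("C-UP"));
              }
            }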

          The default if no distrib explicitly specified could be to run Distrib-UP(leader) prior to the chain and Distrib-UP(replica) right before Run-UP? See SOLR-2370. In a non-cloud setting, the Distrib-UPs would do nothing.

          Hoss Man added a comment -

          Yonik suggested that this issue might be a good place for me to start learning more about the internals of Solr Cloud. First, some background on what I started thinking about (archived for posterity) but have since dismissed as a bad idea...

          My Initial Bad Idea That I Think Sucks

          My first impression when Yonik mentioned the problem was along the lines of the initial bug summary...

          perhaps by using a processor chain dedicated to sub-updates

          ...at first glance, this seemed like an easy way to deal with the problem that would be easy to understand when looking at your chains in solrconfig.xml, and wouldn't involve any sort of "magic" in how distributed updates work that might be hard for users to follow...

          • some chain(s) (probably the default) would end with DistributedUpdateProcessorFactory, which would be configured with a "distrib.chain" init param value having some value $foo
          • some other chain would be named $foo, and would end with RunUpdateProcessorFactory
          • when DistributedUpdateProcessorFactory was triggered, it would forward the request to all of the nodes (including itself) using the configured distrib.chain

          But as I started looking at the code a bit more, I realized that was a trivially naive understanding of what's needed here – mainly because I completely forgot about the fact that there are two types of forwarding:

          • some arbitrary node gets the initial request from a client, and forwards to the leader
          • the leader does version checking/adding on the command/docs, and forwards to all the nodes

          This is when I started thinking that maybe there should be three named chains, and started understanding Jan's comment above about choosing which logic should happen "on the leader node".

          But the more I thought about it, the less sense the idea of requiring certain logic to run on the leader node made. I can think of lots of reasons why you would care whether an UpdateProcessor runs on one node or on N nodes (mainly the trade-off between an expensive computation you might only want to do once, vs. processors that add a lot of fields to the request, which you might want to defer until after distribution to minimize data over the wire), but I couldn't come up with any compelling reason why, if you have a processor you only want to execute once, you would care whether it happens on the first node randomly selected by the client or specifically on the leader – particularly since leader election/promotion can happen to any node at any time, so you couldn't even use the leader logic to take advantage of some local resources on a particular machine.

          But even if we go back to the "two chain" model I suggested above, because of the leader logic the second chain would need to start with some UpdateProcessor that handled the "version" & "forward to all nodes" work currently done by DistributedUpdateProcessor when "isLeader" is true. That means people editing their configs would need to remember not only to put some sort of "SendDistributedUpdateProcessorFactory" at the end of the chains they expect clients to use, but also some sort of "ReceiveDistributedUpdateProcessorFactory" at the beginning of the chain that "SendDistributedUpdateProcessorFactory" points at (or maybe both are just instances of DistributedUpdateProcessorFactory made smart enough to know whether they sit at the start or the end of the chain) ... which starts sounding really tedious and easy to screw up. It also doesn't play very nicely with the idea of a hardcoded default chain users get for free if they have none in their config, and it thoroughly complicates the story of someone who starts with the example (no declared chains) in cloud mode and then wants to customize a little, say by adding SignatureUpdateProcessor – they wouldn't be able to just configure a chain starting with SignatureUpdateProcessor, they'd need to add two chains and be careful about how each starts and ends.

          I'm all for transparency and eliminating "magic" by having things spelled out in the configs, but even to me this idea was starting to seem like a serious pain in the ass.

          Which leads me to the idea I do think has merit...

          My Current Idea That I Don't Think Sucks

          I think in an ideal world...

          • Solr Cloud users should be able to just declare a straightforward chain of processors and not need to worry about which nodes they run on (ie: any chain they used pre-cloud should still work)
          • if a user does care which parts of the chain run once vs. on every node, it should be simple to indicate that decision by putting some "marker" processor (ie: DistributedUpdateProcessorFactory) in the chain denoting where distribution should happen.
          • If you are using cloud mode and don't specify where the distributed update logic happens, then Solr should pick for you – either "first" before any other processors, or "last" just before RunUpdateProcessor ... I don't have an opinion on which is better; I'm sure other people who have been experimenting with Solr Cloud for a while can tell me.

          (I'm usually the person opposed to doing things magically behind the scenes like this, but "Solr Cloud" is becoming a central enough concept in Solr, with many components doing special things if "zkEnabled", that I think moving forward it's "ok" to treat distributed updating as the norm in cloud mode and optimize for it.)

          The most straightforward way I can think of to do this would be:

          • if Solr is in "cloud mode" then when SolrCore is initializing the chains from solrconfig.xml, any chain that doesn't include DistributedUpdateProcessorFactory should have it added automatically (alternatively: maybe we only add it if RunUpdateProcessor is in the chain, so anyone using chains w/o RunUpdateProcessor – for some weird bizare purpose we can't imagine – won't be surprised by DistributedUpdateProcessorFactory getting added magically)
          • DistributedUpdateProcessor should add some param when forwarding to any other node indicating that the request is being distributed
          • when an update request is received, if it has this special param in it, then any processor in the chain prior to DistributedUpdateProcessorFactory is skipped

          This idea is very similar to part of what Jan suggested in his comment above...

          The Distrib processor could set some state info on the request to the next node so that chain processing could continue where it left off. E.g. &update.chain.nextproc=<name/id-of-next-proc>. This would require introduction of named processor instances.

          ...but what I'm thinking of wouldn't require named processors and would be specific to distributed updates (though it wouldn't preclude named processors and more advanced logic down the road if someone wanted them).

          I think this would be fairly feasible just by making some small modifications to DistributedUpdateProcessor (to add the new special param when forwarding) and UpdateRequestProcessorChain (to inject the DistributedUpdateProcessorFactory in cloud mode, and to skip up to the DistributedUpdateProcessorFactory if the param is set). I do, however, still think we should generalize somewhat:

          • DistributedUpdateProcessorFactory should be made to implement some marker interface with no methods (ie: DistributedUpdateMarker)
          • UpdateRequestProcessorChain.init should scan for instances of DistributedUpdateMarker in the chain (instead of looking explicitly for DistributedUpdateProcessorFactory) when deciding whether to inject a new DistributedUpdateProcessorFactory into the chain
          • UpdateRequestProcessorChain.createProcessor should scan for instances of DistributedUpdateMarker in the chain (instead of looking explicitly for DistributedUpdateProcessorFactory) when "skipping ahead" if the special param is found in the request

          ...that way advanced users can write their own distributed update processor that implements the interface and register it explicitly in their chain if they are so inclined, or write a no-op update processor implementing the interface if they want to bypass the magic completely.
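
          To make that concrete, here is a minimal, self-contained sketch of the injection and skip-ahead mechanics using stand-in types (only the marker-interface idea comes from the proposal above; the class names and logic are invented for illustration and are NOT the real Solr API):

            import java.util.ArrayList;
            import java.util.List;

            public class ChainSkipSketch {

              /** Marker interface with no methods (DistributedUpdateMarker in the proposal). */
              interface DistributedUpdateMarker {}

              interface ProcessorFactory { String name(); }

              /** Stand-in for DistributedUpdateProcessorFactory. */
              static class DistribFactory implements ProcessorFactory, DistributedUpdateMarker {
                public String name() { return "distrib"; }
              }

              static class NamedFactory implements ProcessorFactory {
                final String n;
                NamedFactory(String n) { this.n = n; }
                public String name() { return n; }
              }

              /** Init time: if no marker is declared, inject a distrib step just before the last factory. */
              static List<ProcessorFactory> init(List<ProcessorFactory> declared) {
                for (ProcessorFactory f : declared) {
                  if (f instanceof DistributedUpdateMarker) return declared; // user placed it explicitly
                }
                List<ProcessorFactory> chain = new ArrayList<>(declared);
                chain.add(Math.max(0, chain.size() - 1), new DistribFactory()); // before the run processor
                return chain;
              }

              /** Request time: if the update was already distributed, skip ahead to the marker. */
              static List<ProcessorFactory> createProcessor(List<ProcessorFactory> chain, String updateDistrib) {
                if (updateDistrib == null || updateDistrib.equals("none")) {
                  return chain; // original client request: run the whole chain
                }
                for (int i = 0; i < chain.size(); i++) {
                  if (chain.get(i) instanceof DistributedUpdateMarker) {
                    return chain.subList(i, chain.size());
                  }
                }
                return chain; // no marker found: nothing to skip
              }

              public static void main(String[] args) {
                List<ProcessorFactory> declared = new ArrayList<>();
                declared.add(new NamedFactory("signature"));
                declared.add(new NamedFactory("log"));
                declared.add(new NamedFactory("run"));
                for (ProcessorFactory f : createProcessor(init(declared), "fromleader")) {
                  System.out.print(f.name() + " "); // prints: distrib run
                }
              }
            }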

          As a possible optimization/simplification of what gets sent over the wire, the new param we add (which UpdateRequestProcessorChain would start looking for to "skip ahead" in the chain) could replace the existing "leader" boolean param DistributedUpdateProcessor currently uses (aka: SEEN_LEADER) by becoming an enum-style param (perhaps called "update.distrib"?)...

          • none - default if unset, means no distribution has happened
          • toleader - means the request is being sent to the leader
          • fromleader - means the leader is sending the request to all nodes

          UpdateRequestProcessorChain would only care whether the value is something other than "none", in which case it would skip ahead to the DistributedUpdateMarker in the chain. DistributedUpdateProcessorFactory would care whether the value is "toleader" or "fromleader", in which case its logic would toggle the same way it currently does for SEEN_LEADER.
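
          A minimal sketch of what such an enum-style param could look like (the value names are the ones above; the parse helper is only an assumption, not an existing API):

            import java.util.Locale;

            /** Sketch of the proposed enum-style "update.distrib" values. */
            enum DistribPhase {
              NONE,        // default if unset: no distribution has happened yet
              TOLEADER,    // a non-leader node is forwarding the request to the leader
              FROMLEADER;  // the leader is forwarding the (versioned) request to replicas

              /** Hypothetical helper: map the request param value onto the enum. */
              static DistribPhase parse(String param) {
                return param == null ? NONE : valueOf(param.toUpperCase(Locale.ROOT));
              }
            }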

          So that's my idea. As I mentioned, this is really the first time I've looked into any of the "Solr Cloud" internals, and it's very possible I may still be missing something major. So if you spot any holes in this idea, please let me know – otherwise I'll try to take a stab at a patch in the next few days.

          Jan Høydahl added a comment -

          This idea is very similar to part of what Jan suggested in his comment above... SNIP ...but what I'm thinking of wouldn't require named processors and would be specific to distributed updates (though it wouldn't preclude named processors and more advanced logic down the road if someone wanted them).

          Yup, that's one way. However, I think we can achieve even more transparency and flexibility by introducing the concept of GOTO in our pipeline! Remember in the old days of programming, we could jump to a specified place in the code (well, HTML's anchor does the same, but I thought GOTO was a cooler analogy). Let's say we create an interface ChainLabel with two methods, getLabel/setLabel, the same as your marker but with a name tag. Then DistribProcessor would set "distrib" as its label, and we could imagine a future processor that delegates processing to an external pipeline cluster and sets another label, "externalPipeline". We could even have a dummy no-op UpdateProcessor which sets its label from a config param. Then you could call update.chain.goto=myLabel to continue processing at the label. The URPChain class would not know about DistributedUpdateProcessor, only about labels and goto in general.
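
          A rough, purely hypothetical sketch of that idea (getLabel/setLabel and the goto param come from this comment; the resume helper and its names are invented for illustration):

            import java.util.List;

            /** Label interface from this comment: like the marker, but with a name tag. */
            interface ChainLabel {
              String getLabel();
              void setLabel(String label);
            }

            /** Hypothetical helper: resume a chain at the factory carrying the requested label. */
            class GotoSketch {
              static <T> List<T> resumeAt(List<T> chain, String gotoLabel) {
                if (gotoLabel == null) return chain;            // no update.chain.goto param: run everything
                for (int i = 0; i < chain.size(); i++) {
                  T f = chain.get(i);
                  if (f instanceof ChainLabel && gotoLabel.equals(((ChainLabel) f).getLabel())) {
                    return chain.subList(i, chain.size());      // continue processing at the label
                  }
                }
                return chain;                                   // unknown label: fall back to the full chain
              }
            }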

          I like your update.distrib=none|toleader|fromleader optimization.

          Hoss Man added a comment -

          Patch I worked up during the last couple of weeks (but evidently forgot to upload):

          • implements the general idea I outlined before (using the interface name "DistributingUpdateProcessorFactory" instead of DistributedUpdateMarker)
          • includes a randomized test demonstrating the existing problem with doc adds and update processors both before and after the distrib handler – but this was just shoehorned into an existing test and should be refactored
          • needs more unit tests - particularly something non-randomized that does a straightforward verification of the processor skipping done by the chain
          • I want to add a NoOpDistributingUpdateProcessorFactory - both for testing and for easy use by any existing SolrCloud users who want absolute total control of their chains (ie: anyone who isn't using DistributedUpdateProcessorFactory and is building shards their own custom way)
          • has a few nocommits - other than the things I've already mentioned, they are primarily related to docs and logging

          One interesting thing I didn't realize until I started implementing this is that the deleteByQuery code path already had something nearly identical to my update.distrib=NONE|TOLEADER|FROMLEADER param value (via a "dqb_level=1|2|3") ... I take that as validation that generalizing it to replace the existing SEEN_LEADER logic in the other code paths isn't a bad idea.

          Hoss Man added a comment -

          Let's say we create an interface ChainLabel with two methods, getLabel/setLabel, the same as your marker but with a name tag. Then DistribProcessor would set "distrib" as its label, and we could imagine a future processor that delegates processing to an external pipeline cluster and sets another label, "externalPipeline". We could even have a dummy no-op UpdateProcessor which sets its label from a config param. Then you could call update.chain.goto=myLabel to continue processing at the label

          This seems kind of dangerous to me and should probably be fleshed out in its own issue with a lot more discussion. As it relates to this particular problem, it really seems like "skipping to" the distributed update processor needs to be independent of any sort of generalized "goto" support that we might add to the chain, because shouldn't this hypothetical "goto" logic work in combination with distributed updates, not in place of them?

          ie: if a chain is A->B->DISTRIB->E->F->..., and you send a request to a random node with "goto=F", should that randomly selected node really execute F locally, ignoring the rest of the cloud? It seems like it should forward to the leader (which may decide to broadcast to all shards, or to all leaders if it's a deleteByQuery), and then those nodes should skip E and go to F ... maybe.

          Like I said, it seems complicated. I'd rather keep the issue focused for now.

          Andrzej Bialecki added a comment -

          This looks good. Some comments:

          • javadocs should mention somewhere (UpdateRequestProcessorChain) the skipping of parts of the chain when FROMLEADER is seen.
          • tests don't seem to cover any case that uses requests with TOLEADER set.
          • re. comment in UpdateRequestProcessorChain, where DUPFactory() is inserted if missing: I'd say if someone forgot or is unaware of the need for a distrib processor then it should be inserted as it is done in the patch - right before the run processor, so that all modifications to documents are completed before distributing (modifications could affect the shard #).
          • minor nits:
            • BasicDistributedZkTest.testUpdateProcessorsRunOnlyOnce() has a hanging :TODO:, not sure what it refers to?
            • DistributingUpdateProcessorFactory: typos "not inteded", "skiped".
            • DistributedUpdateProcessor: typo "exposted"
            • UpdateRequestProcessorChain: mentions DistributedUpdateProcessorFactoryMarker, "then and".
          Hoss Man added a comment -

          Updated patch dealing with all the nocommits and (I think) fixing all the TODO items that both I and Andrzej mentioned, except for...

          * includes a randomized test demonstrating the existing problem with doc adds and update processors both before and after the distrib handler – but this was just shoehorned into an existing test and should be refactored

          I started looking at this and realized that the overhead of creating the cloud instance in this test was somewhat significant (in terms of wall clock time), so it seemed better to leave the test method where it was (cleaned up a bit) than to refactor it into another test with the same amount of setup.

          * tests don't seem to cover any case that uses requests with TOLEADER set.

          the processor skipping code in the chain doesn't care what the value is, or even what type of requests are processed, so the test I added in this version of the patch covers both cases: update.distrib set and not set.

          The only code path that cares whether the value is TOLEADER is in deleteByQuery – all I did was ensure that the existing tests were changed to use the new update.distrib param instead of the old "dbg_level".

          If we need more deleteByQuery tests, that seems like it should be a distinct issue (or at the very least, an additional patch from someone who understands the distrib tests better than I do; writing new (whitebox) tests for distributed deleteByQuery is above my SolrCloud foo at this point).

          Mark Miller added a comment -

          I was under the impression this might solve SOLR-3215, but now I'm not sure?

          Hoss Man added a comment -

          Updated patch to trunk.

          Mark Miller added a comment -

          +1 - approach seems as elegant as we could shoot for right now. I much prefer it to juggling multiple chains.

          I still worry about the 'clone doc' issue and update procs between distrib and run - if we do decide to not let procs live there, we should probably hard fail on it.

          Latest patch looks good to me - let's commit and iterate on trunk.

          Hoss Man added a comment -

          Committed revision 1342743.


            People

            • Assignee: Hoss Man
            • Reporter: Yonik Seeley
            • Votes: 1
            • Watchers: 4
