SOLR-8500

Make the number of threads for the ConcurrentUpdateSolrClient in StreamingSolrClients configurable by a system property

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.5, 6.0
    • Component/s: None
    • Labels: None

      Description

      Despite the warning in that code, in extremely high throughput situations where there are guaranteed to be no updates to existing documents, it can be useful to have more than one runner.

      I envision this as an "expert" kind of thing, used only in situations where the a-priori knowledge is that there are no updates to existing documents.
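A minimal sketch of how a runner count could be read from a system property, defaulting to a single runner when nothing is set. The property name `example.update.runners` and the class `RunnerCountConfig` are hypothetical and chosen only for illustration; the description does not name the actual property, and this is not the attached patch.

```java
final class RunnerCountConfig {
    // Hypothetical property name; the issue text does not state the real one.
    static final String RUNNERS_PROPERTY = "example.update.runners";

    /**
     * Returns the configured number of runner threads.
     * Falls back to 1 when the property is unset or not a valid integer,
     * and never returns less than 1.
     */
    static int runnerCount() {
        int configured = Integer.getInteger(RUNNERS_PROPERTY, 1);
        return Math.max(1, configured);
    }

    public static void main(String[] args) {
        System.out.println("runners = " + runnerCount());
    }
}
```

Under these assumptions the value would be supplied at JVM startup with `-Dexample.update.runners=4`; leaving it unset keeps the single-runner behavior.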

      1. SOLR-8500.patch
        2 kB
        Erick Erickson
      2. SOLR-8500.patch
        1 kB
        Erick Erickson

          Activity

          Erick Erickson added a comment -

          Here's an extremely simple patch for this. Mostly I'm looking for opinions about whether this is a Good Idea or not.

          Mark Miller added a comment -

          I think using more than 1 thread may actually introduce more reordering problems right now.

          Erick Erickson added a comment -

          bq: I think using more than 1 thread may actually introduce more reordering problems right now.

          Does it matter in the case I outlined? There are no updates to existing documents to contend with, so even if docs get reordered it shouldn't have any effect noticeable to the end user.

          Or am I missing the boat?

          Erick Erickson added a comment -

          Mark Miller: So do you oppose this as a "use at your own risk under very special circumstances" kind of thing? This isn't theoretical; there are clients who have used the patch here and are seeing significant benefits.

          Mark Miller added a comment - edited

          No, I don't think we should allow config that we know will break the system, whether it's fast or not. If correctness does not matter, we can make things really fast.

          Once Yonik finishes the PeerSync fingerprint, it should no longer be a correctness issue to have these reorders, though.

          Mark Miller added a comment -

          In the short term, you can spin up more threads from the client rather than spinning up more threads here.
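One way to read this suggestion is to split the documents into batches in the client application and submit each batch from its own thread. The sketch below assumes a hypothetical `sender` callback that stands in for whatever actually ships a batch to Solr; `ParallelIndexer` and its parameters are illustrative names, not part of SolrJ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

final class ParallelIndexer {
    /**
     * Splits docs into batches of at most batchSize and hands each batch to
     * sender on a fixed pool of client-side threads. Returns the number of
     * batches sent. The sender callback is a placeholder (an assumption) for
     * the real call that posts a batch to the server.
     */
    static <T> int sendInParallel(List<T> docs, int batchSize, int threads,
                                  Consumer<List<T>> sender) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int start = 0; start < docs.size(); start += batchSize) {
                int end = Math.min(start + batchSize, docs.size());
                List<T> batch = docs.subList(start, end);
                pending.add(pool.submit(() -> sender.accept(batch)));
            }
            // Wait for every batch so failures surface to the caller.
            for (Future<?> f : pending) {
                f.get();
            }
            return pending.size();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while sending", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a batch failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Batches may complete in any order here, so this sketch is only appropriate when the application does not depend on arrival order.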

          Erick Erickson added a comment -

          First, let me say I have only the most cursory understanding of "the reordering problem". My assumption is that CUSC (ConcurrentUpdateSolrClient) batches up sub-lists of the update set and sends them in parallel, so if doc1 is followed by doc2 in the original list, doc2 might get to the indexing node before doc1, be it an update, delete, add, whatever.

          That said, I don't really understand how reordering matters if (as per the original problem statement) it's guaranteed that each document is new and is submitted exactly once, ever. I guess another important restriction is that the client doesn't care if docs get into the index in a different order than they were sent. How would correctness be threatened in that situation?

          If the concern is that this is a too-specialized use-case that allows people to set it and shoot themselves in the foot too easily, that's a point. I just don't get why, in this specific use-case, this is a correctness question.

          All that said, if Yonik's fingerprint stuff is going in relatively soon, it's probably all moot and we can just wait on this...
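The assumption in this comment can be sketched with a toy model in which the index is a map from document id to value and a later arrival overwrites an earlier one. `ReorderDemo` and `Update` are hypothetical names, and the model covers only the final index contents; it says nothing about replica recovery, which is the concern raised in the replies that follow.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ReorderDemo {
    /** A toy update: store a value under a document id. */
    static final class Update {
        final String id;
        final String value;

        Update(String id, String value) {
            this.id = id;
            this.value = value;
        }
    }

    /** Applies updates in arrival order; a later arrival for an id overwrites an earlier one. */
    static Map<String, String> apply(List<Update> arrivals) {
        Map<String, String> index = new HashMap<>();
        for (Update u : arrivals) {
            index.put(u.id, u.value);
        }
        return index;
    }

    /** True when reversing the arrival order changes the final toy index. */
    static boolean orderMatters(List<Update> sent) {
        List<Update> reversed = new ArrayList<>(sent);
        Collections.reverse(reversed);
        return !apply(sent).equals(apply(reversed));
    }
}
```

In this model, a list in which every id appears exactly once gives the same result in any order, while two different values sent for the same id do not.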

          Mark Miller added a comment -

          The system as is simply can't correctly deal with these kinds of reorders. I wish it weren't true, but I wish I had a pony too.

          Mark Miller added a comment -

          As far as a special case, the only special case that gets around this is if they never have to recover.

          Erick Erickson added a comment -

          Oh, that kind of reorder....

          As it happens, in this case the volume is so high that they...er...well, can't recover a replica if it gets out of sync, they have to wait for the indexing for that time slice to stop and do a full sync.

          None of which matters; I see Yonik is making progress on SOLR-8586, so I'll just wait. My reminder entry is tireless in bringing up that I should look at this JIRA....

          ASF subversion and git services added a comment -

          Commit 3e7fe7867f64b254680d462092d01f07858aa7c3 in lucene-solr's branch refs/heads/master from Erick Erickson
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=3e7fe78 ]

          SOLR-8500: Allow the number of threads ConcurrentUpdateSolrClient StreamingSolrClients configurable by a system property

          ASF subversion and git services added a comment -

          Commit 112a2311df50142ec19ec0033133fbc10df223c9 in lucene-solr's branch refs/heads/master from Erick Erickson
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=112a231 ]

          Put CHANGES entry for SOLR-8500 in the wrong section.

          ASF subversion and git services added a comment -

          Commit 129f7153087b279908d6340e7f8a5b024f0f7cad in lucene-solr's branch refs/heads/branch_5x from Erick Erickson
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=129f715 ]

          SOLR-8500: Allow the number of threads ConcurrentUpdateSolrClient StreamingSolrClients configurable by a system property

          (cherry picked from commit 3e7fe7867f64b254680d462092d01f07858aa7c3)

          Conflicts:
          solr/CHANGES.txt

          Erick Erickson added a comment -

          NOTE: This is an "expert" level operation, see the CHANGES.txt entry, reproduced here:

          this is an expert option and can result in more often needing to do full index replication for recovery, the sweet spot for using this is very high volume, leader-only indexing.

          Erick Erickson added a comment -

          Forgot to attach patch with CHANGES.


            People

            • Assignee: Erick Erickson
            • Reporter: Erick Erickson
            • Votes: 0
            • Watchers: 3
