Solr / SOLR-6264

Distributed commit and optimize are executed serially across all replicas.

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.10, 6.0
    • Component/s: SolrCloud
    • Labels:
      None

      Description

      Regardless of whether one agrees with optimizing, when you execute an optimize request using waitSearcher=true, the requests from the controller node are sent to each replica in the collection serially.

      You can send the optimize command for a collection to the update handler on any node in the cluster. For instance, if I had a collection named "foo":

      curl -i -v http://localhost:8984/solr/foo/update --data-binary '<optimize maxSegments="1" waitSearcher="true"/>' -H 'Content-type:application/xml'
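
The same request can be sketched in Java. This is only an illustration using the JDK's java.net.http API (Java 11+), not anything from Solr or SolrJ; the class name, method name, and the baseUrl/collection parameters are made up for the example, and the request is built but not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class OptimizeRequestSketch {
    // Builds (but does not send) a POST equivalent to the curl command above.
    // baseUrl and collection are illustrative parameters, not Solr API names.
    static HttpRequest buildOptimizeRequest(String baseUrl, String collection) {
        String body = "<optimize maxSegments=\"1\" waitSearcher=\"true\"/>";
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + collection + "/update"))
                .header("Content-type", "application/xml")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest req = buildOptimizeRequest("http://localhost:8984/solr", "foo");
        System.out.println(req.method() + " " + req.uri());
    }
}
```

Sending it would need an HttpClient and a running node; any node in the cluster should do, as noted above.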

      The node that receives this request collects the URLs of all "live" replicas in the collection, not just leaders (see DistributedUpdateProcessor#getCollectionUrls), and then forwards the commit request to each of them. On the surface, the code looks like it forwards the request asynchronously to all replicas. However, that is not what actually happens: with waitSearcher=true, the commit requests to the replicas are processed serially, because ConcurrentUpdateSolrServer's background queue processing is bypassed for commits.

      Bottom line: if you request that a collection be optimized, the request is forwarded to every replica as you'd expect, but the forwarding is done synchronously, one replica at a time, so it can take a long time.

      1. SOLR-6264.patch
        7 kB
        Mark Miller
      2. SOLR-6264.patch
        6 kB
        Mark Miller

        Activity

        Yonik Seeley added a comment -

        Good catch! Is this true for commit also?

        Mark Miller added a comment -

        "waitSearcher=true (because ConcurrentUpdateSolrServer's background queue processing is by-passed for commits)."

        But don't we use a different ConcurrentUpdateSolrServer for each Solr URL?

        Mark Miller added a comment -

        I see - it waits on each ConcurrentUpdateSolrServer#request call as it loops through them. Interesting. Good find, fairly ugly, let's fix it.

        Timothy Potter added a comment -

        Yes, we do, which is why this is tricky to see. SolrCmdDistributor.distribCommit has a for loop that calls submit:

        for (Node node : nodes) {
          submit(new Req(cmd.toString(), node, uReq, false));
        }

        Each submit uses a different CUSS, of course, but the for loop is blocked because the "async" submit is actually synchronous: ConcurrentUpdateSolrServer skips the runners part if the request is a commit. I only stumbled upon this by looking at the timestamps of the requests and realizing they were running serially, and then scratched my head a bit because I know StreamingSolrServers and CUSS pretty well at this point.

        I think it is true for commits too.
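
A minimal sketch of the behavior described above, under loud assumptions: FakeClient is a hypothetical stand-in, not the real ConcurrentUpdateSolrServer, and distrib only mirrors the shape of the for loop. It records which thread handles each request, to show that a "commit" is done on the caller's thread (so the loop waits for it) while other updates go to a background runner.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class CommitBypassSketch {
    /** Hypothetical per-URL client; a stand-in, not the real ConcurrentUpdateSolrServer. */
    static class FakeClient {
        final ExecutorService runner = Executors.newSingleThreadExecutor();
        final List<String> handledBy;

        FakeClient(List<String> handledBy) { this.handledBy = handledBy; }

        void request(boolean isCommit) {
            Runnable work = () -> handledBy.add(Thread.currentThread().getName());
            if (isCommit) {
                work.run();           // commit skips the runner: the caller's thread does the work
            } else {
                runner.execute(work); // other updates are handed to the background runner
            }
        }
    }

    /** Mirrors the shape of the for loop; returns the name of the thread that handled each request. */
    static List<String> distrib(int nodeCount, boolean isCommit) {
        List<String> handledBy = new CopyOnWriteArrayList<>();
        List<FakeClient> clients = new ArrayList<>();
        for (int i = 0; i < nodeCount; i++) {
            FakeClient client = new FakeClient(handledBy); // a different client per node
            clients.add(client);
            client.request(isCommit);
        }
        for (FakeClient client : clients) {
            client.runner.shutdown();
            try {
                client.runner.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return handledBy;
    }

    public static void main(String[] args) {
        System.out.println("commit handled by: " + distrib(3, true));
        System.out.println("add handled by:    " + distrib(3, false));
    }
}
```

Because every commit runs on the looping thread, the second node is not contacted until the first has finished, no matter how many clients exist.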

        Mark Miller added a comment -

        Perhaps we have to put in a thread pool and ensure the async path of SolrCmdDistributor#submit is really async by putting it on another thread and making the error handling thread safe. I'm not sure - it will take a bit of thought to trace it all out.
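
One way the idea could look, as a rough sketch only and not the attached patch: ParallelCommitSketch, ReplicaCall, and commitAll are hypothetical names. Each blocking call is put on a pool thread, and errors are collected in a thread-safe list so pool threads can append to it concurrently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelCommitSketch {
    /** Hypothetical per-replica action; throws to signal a failed commit. */
    interface ReplicaCall {
        void commit(String url) throws Exception;
    }

    /** Runs the blocking call for every URL in parallel and returns the collected error messages. */
    static List<String> commitAll(List<String> urls, ReplicaCall call) {
        List<String> errors = new CopyOnWriteArrayList<>(); // safe to append to from pool threads
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, urls.size()));
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (String url : urls) {
                pending.add(pool.submit(() -> {
                    try {
                        call.commit(url); // blocks this pool thread, not the submitting loop
                    } catch (Exception e) {
                        errors.add(url + ": " + e.getMessage());
                    }
                }));
            }
            for (Future<?> f : pending) { // wait for all replicas only after all were started
                try {
                    f.get();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    errors.add("interrupted while waiting");
                    break;
                } catch (ExecutionException e) {
                    errors.add(String.valueOf(e.getCause()));
                }
            }
        } finally {
            pool.shutdown();
        }
        return errors;
    }

    public static void main(String[] args) {
        List<String> errors = commitAll(List.of("replica1", "replica2", "replica3"), url -> {
            if (url.equals("replica2")) throw new IllegalStateException("simulated failure");
        });
        System.out.println("errors: " + errors);
    }
}
```

The pool here is sized to the number of URLs purely for simplicity; a real implementation would more likely share a bounded pool.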

        Mark Miller added a comment -

        "I think it is true for commits too."

        It's certainly true for commits - it only happens for optimize because optimize rides on commits.

        Timothy Potter added a comment -

        We probably only want to use the thread pool for commits (and optimizes) ... for other update requests, we probably don't want to spawn a thread that spawns runners, right?
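
A sketch of that routing, again with hypothetical names (SubmitRoutingSketch, Kind, threadUsedFor) and not taken from the patch: only commits are handed to the pool, while other requests are sent on the caller's thread, on the assumption that the client's own runners already make them asynchronous.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicReference;

class SubmitRoutingSketch {
    /** Hypothetical request kinds; an optimize is assumed to travel as a COMMIT. */
    enum Kind { ADD, DELETE, COMMIT }

    private final ExecutorService commitPool = Executors.newCachedThreadPool();

    /** Commits go to the pool; everything else is sent directly by the caller. */
    Future<?> submit(Kind kind, Runnable send) {
        if (kind == Kind.COMMIT) {
            return commitPool.submit(send); // blocking call moves off the caller's thread
        }
        send.run(); // assumed non-blocking: the client queues it for its own runners
        return CompletableFuture.completedFuture(null);
    }

    void shutdown() {
        commitPool.shutdown();
    }

    /** Demo helper: returns the name of the thread that executed the send for this kind. */
    static String threadUsedFor(Kind kind) {
        SubmitRoutingSketch distributor = new SubmitRoutingSketch();
        AtomicReference<String> seen = new AtomicReference<>();
        try {
            distributor.submit(kind, () -> seen.set(Thread.currentThread().getName())).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            distributor.shutdown();
        }
        return seen.get();
    }

    public static void main(String[] args) {
        System.out.println("ADD sent on:    " + threadUsedFor(Kind.ADD));
        System.out.println("COMMIT sent on: " + threadUsedFor(Kind.COMMIT));
    }
}
```

Keeping non-commit updates off the pool avoids the thread-that-spawns-runners layering raised in the question.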

        Mark Miller added a comment -

        Here is a rough patch with what I'm thinking.

        Mark Miller added a comment -

        I still have to finish it up and run some tests - just a quick jam out for comment.

        Mark Miller added a comment -

        "with waitSearcher=true"

        I think you will get the same thing with a pure commit and no docs or deletes.

        Timothy Potter added a comment -

        The patch looks good. I ran it through my scenario (described above), and the optimize was definitely sent to all replicas in parallel and finished in less than half the previous runtime.

        Mark Miller added a comment -

        Here is a cleaned up patch. If there are no further comments, I'll commit this soon.

        Shalin Shekhar Mangar added a comment -

        +1 LGTM

        ASF subversion and git services added a comment -

        Commit 1614118 from Mark Miller in branch 'dev/trunk'
        [ https://svn.apache.org/r1614118 ]

        SOLR-6264: Distributed commit and optimize are executed serially across all replicas.

        ASF subversion and git services added a comment -

        Commit 1614120 from Mark Miller in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1614120 ]

        SOLR-6264: Distributed commit and optimize are executed serially across all replicas.

        Mark Miller added a comment -

        Thanks!


          People

          • Assignee: Mark Miller
          • Reporter: Timothy Potter
          • Votes: 1
          • Watchers: 6
