Solr / SOLR-6816 Review SolrCloud Indexing Performance / SOLR-7333

Make the poll queue time configurable and use knowledge that a batch is being processed to poll efficiently

    Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.2, 6.0
    • Component/s: SolrCloud
    • Labels:
      None

      Description

      StreamingSolrClients uses ConcurrentUpdateSolrServer (CUSS) to stream documents from leader to replica. By default it sets the pollQueueTime for CUSS to 0 so that we don't impose an unnecessary wait when processing single-document updates or the last doc in a batch. The downside is that replicas receive many more update requests than leaders; I've seen replicas receive up to 40x as many update requests as the leader.

      If we're processing a batch of docs, then ideally the poll queue time should be greater than 0 up until the last doc is pulled off the queue. If we're processing a single doc, then the poll queue time should always be 0 as we don't want the thread to wait unnecessarily for another doc that won't come.
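The polling rule described above can be sketched as follows. This is a minimal illustration built on java.util.concurrent, not the actual CUSS code: the PollSketch and QueuedDoc names, and the idea of carrying the last-in-batch hint on each queue entry, are assumptions made for the example.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

final class PollSketch {
    /** Hypothetical queue entry: a document id plus the last-in-batch hint. */
    static final class QueuedDoc {
        final String id;
        final boolean lastInBatch;

        QueuedDoc(String id, boolean lastInBatch) {
            this.id = id;
            this.lastInBatch = lastInBatch;
        }
    }

    /** No wait after the last doc of a batch, a short wait otherwise. */
    static long nextPollMillis(QueuedDoc previous, long batchPollMillis) {
        return previous.lastInBatch ? 0L : batchPollMillis;
    }

    /** Drains docs into one simulated request and returns how many were sent. */
    static int drainOneRequest(BlockingQueue<QueuedDoc> queue, long batchPollMillis) {
        int sent = 0;
        QueuedDoc doc = queue.poll(); // the first poll never waits
        while (doc != null) {
            sent++;
            try {
                // Wait briefly only while more docs of the same batch are expected.
                doc = queue.poll(nextPollMillis(doc, batchPollMillis), TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore the flag and stop draining
                break;
            }
        }
        return sent;
    }

    public static void main(String[] args) {
        BlockingQueue<QueuedDoc> queue = new LinkedBlockingQueue<>();
        queue.add(new QueuedDoc("a", false));
        queue.add(new QueuedDoc("b", false));
        queue.add(new QueuedDoc("c", true));
        System.out.println("docs sent in one request: " + drainOneRequest(queue, 25));
    }
}
```

A single-document update is simply a batch whose only entry is flagged as last, so the thread returns immediately instead of waiting for a doc that won't come.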

      Rather than forcing indexing applications to provide this optional parameter in an update request, it would be better for server-side code, which can detect whether an update request is a single document or a batch of documents, to override this value internally. That is, the pollQueueTime stays 0 by default, but since JavaBinUpdateRequestCodec can determine when it has seen the last doc in a batch, the pollQueueTime can be overridden to something greater than 0 while a batch is in progress.

      This means that current indexing clients will see a boost when doing batch updates without making any changes on their side.

      1. SOLR-7333.patch
        14 kB
        Timothy Potter
      2. SOLR-7333.patch
        9 kB
        Timothy Potter

          Activity

          Timothy Potter added a comment - Here's the original comment I made about this issue: https://issues.apache.org/jira/browse/SOLR-6816?focusedCommentId=14233700&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14233700
          Timothy Potter added a comment -

          Here's a first pass at a patch that uses a 25 ms pollQueueTime when processing a batch of documents. The main idea is that the javabin unmarshalling code can detect when it sees the last doc in a batch and can pass that hint down the line, eventually to the ConcurrentUpdateSolrServer (CUSS) the leader uses to stream docs to replicas. When the hint is set, CUSS polls the queue with a wait of 0 (vs. waiting the 25 ms). This helps stream more docs from the leader to the replica per request, which keeps the number of requests processed by leaders and replicas nearly the same and reduces round-trips per batch between leader and replica. The hint about being the last doc in a batch is necessary to avoid the wait when processing docs one-by-one or when the last doc in a batch has been processed, i.e. poll the queue with a brief wait if more docs are available but don't wait if not.
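As a rough illustration of how unmarshalling code could produce that hint, the sketch below tags each doc while iterating over a batch, using Iterator.hasNext() to recognize the final one. The BatchHintSketch and DocWithHint names are hypothetical and are not taken from the patch or from JavaBinUpdateRequestCodec.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class BatchHintSketch {
    /** Hypothetical pairing of a document id with the last-in-batch hint. */
    static final class DocWithHint {
        final String id;
        final boolean lastInBatch;

        DocWithHint(String id, boolean lastInBatch) {
            this.id = id;
            this.lastInBatch = lastInBatch;
        }
    }

    /** Tags every doc; only the final element of the iterator is marked as last. */
    static List<DocWithHint> tagBatch(Iterator<String> docIds) {
        List<DocWithHint> tagged = new ArrayList<>();
        while (docIds.hasNext()) {
            String id = docIds.next();
            // After consuming this doc, an exhausted iterator means it was the last one.
            tagged.add(new DocWithHint(id, !docIds.hasNext()));
        }
        return tagged;
    }

    public static void main(String[] args) {
        for (DocWithHint d : tagBatch(Arrays.asList("a", "b", "c").iterator())) {
            System.out.println(d.id + " lastInBatch=" + d.lastInBatch);
        }
    }
}
```

A one-document request comes out flagged as last, which matches the requirement that single-doc updates never wait.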

          Currently, the pollQueueTime is hardcoded to 25 ms, but I suppose we could make that configurable. The key is to use a short wait so I felt 25 ms should be sufficient for most indexing applications.

          Lastly, I added the isLastDocInBatch flag as a member of UpdateRequest instead of including it in the params, because CUSS checks for params changing while processing UpdateRequests in a batch and treats a change in parameters as a separate request, which is what we're trying to avoid here.
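To see why the flag's location matters, here is a simplified model of that behavior, assuming a sender that starts a new request whenever the params of consecutive updates differ. The ParamSplitSketch class, its Update type, and the grouping logic are illustrative stand-ins, not the CUSS implementation; commitWithin is used only as a sample parameter.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ParamSplitSketch {
    /** Hypothetical update: request params plus the hint carried as a plain field. */
    static final class Update {
        final Map<String, String> params;
        final boolean lastDocInBatch;

        Update(Map<String, String> params, boolean lastDocInBatch) {
            this.params = params;
            this.lastDocInBatch = lastDocInBatch;
        }
    }

    /** Counts outgoing requests when any change in params starts a new request. */
    static int countRequests(List<Update> updates) {
        int requests = 0;
        Map<String, String> current = null;
        for (Update u : updates) {
            if (current == null || !current.equals(u.params)) {
                requests++;
                current = u.params;
            }
        }
        return requests;
    }

    /** Builds a three-doc batch, optionally leaking the hint into the params. */
    static List<Update> batch(boolean flagInParams) {
        List<Update> updates = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            boolean last = (i == 2);
            Map<String, String> params = new HashMap<>();
            params.put("commitWithin", "1000");
            if (flagInParams && last) {
                params.put("isLastDocInBatch", "true"); // changes the params of the last doc
            }
            updates.add(new Update(params, last));
        }
        return updates;
    }

    public static void main(String[] args) {
        System.out.println("flag as field:  " + countRequests(batch(false)) + " request(s)");
        System.out.println("flag in params: " + countRequests(batch(true)) + " request(s)");
    }
}
```

In this model, carrying the hint as a field keeps the whole batch in one request, while putting it in the params splits off the last doc into a request of its own.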

          Mark Miller added a comment -

          Cool. Great stuff.

          "Currently, the pollQueueTime is hardcoded to 25 ms, but I suppose we could make that configurable. The key is to use a short wait so I felt 25 ms should be sufficient for most indexing applications."

          With Solr no longer being a webapp, I like to make things like this configurable by sys prop as a quasi-supported option (e.g. I don't document it). It shouldn't be needed, but if someone hits a case where it is, they can either easily experiment or work around something without an update.

          Mark Miller added a comment -

          For example, RecoveryStrategy has this 'safety valve' property you shouldn't have to touch:

          private static final int WAIT_FOR_UPDATES_WITH_STALE_STATE_PAUSE = Integer.getInteger("solr.cloud.wait-for-updates-with-stale-state-pause", 7000);
          
          Timothy Potter added a comment -

          Thanks for the suggestion, Mark! I've updated the patch: it adds a unit test and the ability to set the poll time using a Java system property (the default is 25 ms). I think this one is ready to go.
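A system-property override of this kind might look like the sketch below, which follows the Integer.getInteger pattern Mark quoted from RecoveryStrategy. The property name example.pollQueueTimeMs and the PollTimeConfigSketch class are placeholders invented for this illustration; the name used by the committed patch is not stated in this thread. Clamping negative values to 0 is also an assumption of the sketch.

```java
final class PollTimeConfigSketch {
    /** Hypothetical property name, used only for this example. */
    static final String PROP = "example.pollQueueTimeMs";

    /** Default taken from the discussion above: a short 25 ms wait. */
    static final int DEFAULT_MS = 25;

    /**
     * Reads the poll time from a JVM system property. Integer.getInteger falls
     * back to the default when the property is missing or not a valid integer.
     */
    static int pollQueueTimeMs() {
        return Math.max(0, Integer.getInteger(PROP, DEFAULT_MS));
    }

    public static void main(String[] args) {
        System.out.println("poll queue time (ms): " + pollQueueTimeMs());
    }
}
```

With this sketch, starting the JVM with -Dexample.pollQueueTimeMs=50 would change the wait without a code change, which is the kind of safety valve described above.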

          ASF subversion and git services added a comment -

          Commit 1680436 from Timothy Potter in branch 'dev/trunk'
          [ https://svn.apache.org/r1680436 ]

          SOLR-7333: Make the poll queue time configurable and use knowledge that a batch is being processed to poll efficiently

          ASF subversion and git services added a comment -

          Commit 1680441 from Timothy Potter in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1680441 ]

          SOLR-7333: Make the poll queue time configurable and use knowledge that a batch is being processed to poll efficiently

          Anshum Gupta added a comment -

          Bulk close for 5.2.0.


            People

            • Assignee:
              Timothy Potter
              Reporter:
              Timothy Potter
