  1. Solr
  2. SOLR-17160

Bulk admin operations may fail because of max tracked requests


Details

    Description

      In CoreAdminHandler, we maintain an in-memory list of in-flight requests and of completed/failed requests.

      Note that these are core/replica-level async requests, not top-level requests, which are mostly at the collection level. Top-level requests are tracked by storing the async ID in a ZooKeeper node, which is unrelated to this ticket.

       

      For completed/failed requests, we track at most 100 requests and drop the oldest ones beyond that. The typical client, in CollectionHandlingUtils.waitForCoreAdminAsyncCallToComplete(), polls the status of the submitted requests in a retry loop until they complete. If, for some reason, more than 100 requests complete or fail on a node before the client has polled all statuses, the oldest statuses are lost and the client fails with an unexpected error similar to:

      Invalid status request for requestId: '<id>' - 'notfound'. Retried <n> times
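The failure mode described above can be reproduced with a small sketch. This is not the CoreAdminHandler code; the class and method names (BoundedStatusTracker, markCompleted, getStatus) are hypothetical, and the only behavior taken from the ticket is the cap on completed entries with oldest-first eviction.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration, not Solr code: completed-request statuses capped at a fixed count. */
class BoundedStatusTracker {
  private final Map<String, String> completed;

  BoundedStatusTracker(int maxTracked) {
    // Insertion-ordered map that drops its oldest entry once the cap is exceeded.
    this.completed = new LinkedHashMap<String, String>() {
      @Override
      protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
        return size() > maxTracked;
      }
    };
  }

  synchronized void markCompleted(String requestId) {
    completed.put(requestId, "completed");
  }

  /** Returns the tracked status, or "notfound" if the entry was evicted or never existed. */
  synchronized String getStatus(String requestId) {
    return completed.getOrDefault(requestId, "notfound");
  }
}
```

With a cap of 100, completing a 101st request before the client polls the first one makes the first status unrecoverable, which is the situation a bulk operation can run into.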

       

      Instead of a hard limit on the number of tracked requests, we could use time-based eviction. I think it makes sense to keep the status of a request until a given timeout and then drop it, regardless of how many requests we currently track.
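A minimal sketch of what time-based eviction could look like, under stated assumptions: the names (ExpiringStatusTracker, ttlNanos, nanoClock) are hypothetical, the clock is injected only to keep the example deterministic, and expired entries are swept lazily on each call. It is one possible shape for the proposal, not the change made for this ticket.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Hypothetical illustration, not Solr code: statuses are kept for a fixed time, not a fixed count. */
class ExpiringStatusTracker {
  private static final class Entry {
    final String status;
    final long completedAtNanos;

    Entry(String status, long completedAtNanos) {
      this.status = status;
      this.completedAtNanos = completedAtNanos;
    }
  }

  private final Map<String, Entry> completed = new ConcurrentHashMap<>();
  private final long ttlNanos;
  private final LongSupplier nanoClock; // assumption: e.g. System::nanoTime outside of tests

  ExpiringStatusTracker(long ttlNanos, LongSupplier nanoClock) {
    this.ttlNanos = ttlNanos;
    this.nanoClock = nanoClock;
  }

  void markCompleted(String requestId, String status) {
    evictExpired();
    completed.put(requestId, new Entry(status, nanoClock.getAsLong()));
  }

  /** Returns the tracked status, or "notfound" if the entry expired or never existed. */
  String getStatus(String requestId) {
    evictExpired();
    Entry e = completed.get(requestId);
    return e == null ? "notfound" : e.status;
  }

  int size() {
    return completed.size();
  }

  // Lazy sweep: drop every entry older than the TTL, however many entries are tracked.
  private void evictExpired() {
    long now = nanoClock.getAsLong();
    completed.values().removeIf(e -> now - e.completedAtNanos > ttlNanos);
  }
}
```

The trade-off is that memory use is no longer bounded by a count, so the timeout has to be chosen with the expected request rate in mind.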

          People

            Assignee: Unassigned
            Reporter: Pierre Salagnac (pierre.salagnac)

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 3h