Solr / SOLR-15106

Thread in OverseerTaskProcessor should not "return"

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 8.6, main (9.0)
    • Fix Version/s: None
    • Component/s: SolrCloud
    • Labels: None

      Description

      I encountered a scenario where ZK was not accessible for a long time (due to a jute.maxbuffer issue, which is unrelated to the rest of this report).
      During that time, the ClusterStateUpdater and OC queues from the Overseer got filled with 1200+ messages.

      Once we restored ZK availability, the ClusterStateUpdater queue got emptied, but not the OC one.

      The Overseer stopped dequeuing from the OC queue.

      After some digging in the code, it seems that a return from the Overseer thread that starts the runners could be the cause.

      See the code in OverseerTaskProcessor.java (https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/cloud/OverseerTaskProcessor.java#L357).
      The lines of code that immediately follow should also be reviewed carefully, as they also return from or interrupt the thread that is responsible for executing the runners.
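
To illustrate the suspected failure mode, here is a minimal, self-contained sketch. It is not the Solr code: the class DispatcherLoopSketch, the TransientException type, and the "zk-down" task name are all hypothetical stand-ins. It only shows why a dispatcher loop that returns on an error leaves the rest of its queue unprocessed, whereas a loop that skips the failed item keeps draining.

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;

public class DispatcherLoopSketch {

    /** Hypothetical stand-in for a transient failure such as a lost ZooKeeper connection. */
    static class TransientException extends Exception {
        TransientException(String message) {
            super(message);
        }
    }

    /** Hypothetical task handler: fails on the task named "zk-down", succeeds otherwise. */
    static void process(String task) throws TransientException {
        if (task.equals("zk-down")) {
            throw new TransientException("backend unavailable while handling " + task);
        }
    }

    /**
     * Drains the queue and returns the number of tasks processed successfully.
     * With exitOnError = true the loop returns on the first failure and never
     * looks at the remaining tasks; with false it skips the failed task and keeps going.
     */
    static int drain(Queue<String> queue, boolean exitOnError) {
        int processed = 0;
        while (!queue.isEmpty()) {
            String task = queue.poll();
            try {
                process(task);
                processed++;
            } catch (TransientException e) {
                if (exitOnError) {
                    // The dispatcher is gone; whatever is still queued is stranded.
                    return processed;
                }
                // Otherwise fall through and pick up the next task.
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        List<String> tasks = List.of("t1", "t2", "zk-down", "t3", "t4");

        Queue<String> q1 = new ArrayDeque<>(tasks);
        int n1 = drain(q1, true);
        System.out.println("return on error:   processed=" + n1 + " stranded=" + q1.size());

        Queue<String> q2 = new ArrayDeque<>(tasks);
        int n2 = drain(q2, false);
        System.out.println("continue on error: processed=" + n2 + " stranded=" + q2.size());
    }
}
```

In this toy run the "return" variant processes two tasks and strands two, while the "continue" variant processes all four healthy tasks. Whether continuing, retrying, or giving up leadership is the right reaction in the real Overseer thread is a separate design question that this sketch does not answer.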

      Anyhow, if anybody hits the same issue, the quick workaround is to restart the Overseer instance so that a new Overseer is elected on another node.

            People

            • Assignee: Unassigned
            • Reporter: Mathieu Marie (matmarie)
