  1. Solr
  2. SOLR-6402

OverseerCollectionProcessor should not exit for ZK ConnectionLoss


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.8, 6.0
    • Fix Version/s: 4.10, 6.0
    • Component/s: SolrCloud
    • Labels: None

    Description

      We saw an occurrence where a brief ZK connection blip caused the OverseerCollectionProcessor thread to stop, while the ClusterStateUpdater logged an error but kept running and the node did not lose its leadership. As a result, nothing was consuming our collection work queue, and it backed up.

      On trunk, OverseerCollectionProcessor's run method currently contains:

      if (e.code() == KeeperException.Code.SESSIONEXPIRED
          || e.code() == KeeperException.Code.CONNECTIONLOSS) {
        log.warn("Overseer cannot talk to ZK");
        return;
      }

      I think this if statement should apply only to SESSIONEXPIRED. If the node merely experiences a connection loss but then reconnects before the session expires, it remains the leader, so the processor thread should keep running as well.
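
      The proposed behaviour can be sketched roughly as follows. This is a self-contained simulation, not the actual Solr patch: ZkCode, ZkException, poll and run are hypothetical stand-ins for KeeperException.Code, KeeperException, and the processor's work loop, so the example runs without ZooKeeper on the classpath. It only illustrates the idea of exiting on SESSIONEXPIRED while retrying after CONNECTIONLOSS.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

public class OverseerLoopSketch {
    /** Hypothetical stand-in for the KeeperException.Code values discussed above. */
    public enum ZkCode { OK, CONNECTIONLOSS, SESSIONEXPIRED }

    /** Hypothetical exception exposing code(), mimicking KeeperException. */
    public static class ZkException extends Exception {
        private final ZkCode code;
        public ZkException(ZkCode code) { super(code.name()); this.code = code; }
        public ZkCode code() { return code; }
    }

    /** Simulates one attempt to read the work queue; fails unless the outcome is OK. */
    static void poll(ZkCode outcome) throws ZkException {
        if (outcome != ZkCode.OK) {
            throw new ZkException(outcome);
        }
    }

    /**
     * Runs the simulated work loop over a scripted sequence of ZK outcomes
     * and returns a log of what the loop did on each attempt.
     */
    public static List<String> run(Queue<ZkCode> outcomes) {
        List<String> log = new ArrayList<>();
        while (!outcomes.isEmpty()) {
            try {
                poll(outcomes.remove());
                log.add("processed");
            } catch (ZkException e) {
                if (e.code() == ZkCode.SESSIONEXPIRED) {
                    // Session is gone, so leadership is lost: stop the thread.
                    log.add("exit");
                    return log;
                }
                // CONNECTIONLOSS: the session may still be valid, so keep looping.
                log.add("retry");
            }
        }
        return log;
    }

    public static void main(String[] args) {
        Queue<ZkCode> outcomes = new ArrayDeque<>(Arrays.asList(
            ZkCode.OK, ZkCode.CONNECTIONLOSS, ZkCode.OK, ZkCode.SESSIONEXPIRED, ZkCode.OK));
        // prints [processed, retry, processed, exit]
        System.out.println(run(outcomes));
    }
}
```

      In this sketch the loop survives the connection loss and processes the next item, and only the session expiry ends it. A real implementation would presumably also back off between retries, which is omitted here.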

          People

            Assignee: Mark Miller (markrmiller@gmail.com)
            Reporter: Jessica Cheng Mallet (mewmewball)
            Votes: 0
            Watchers: 6