[SOLR-6402] OverseerCollectionProcessor should not exit for ZK ConnectionLoss - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 4.8, 6.0
Fix Version/s: 4.10, 6.0
Component/s: SolrCloud
Labels: None

Description

We saw a ZooKeeper connection blip after which the OverseerCollectionProcessor thread stopped, while the ClusterStateUpdater logged an error but kept running, and the node didn't lose its leadership. Because the node was still the leader but no longer had a running collection processor, our collection work queue backed up.

Right now OverseerCollectionProcessor's run method has on trunk:

    if (e.code() == KeeperException.Code.SESSIONEXPIRED
        || e.code() == KeeperException.Code.CONNECTIONLOSS) {
      log.warn("Overseer cannot talk to ZK");
      return;
    }

I think this if statement should apply only to SESSIONEXPIRED. If the node just experiences a connection loss but then reconnects before the session expires, it keeps being the leader, so the OverseerCollectionProcessor thread should not exit in that case.
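The proposal above can be illustrated with a minimal, self-contained sketch. It is not the actual Solr patch: the class `OverseerLoopSketch`, the enum `Code` (a hypothetical stand-in for `KeeperException.Code`, so that the example has no ZooKeeper dependency), and the scripted list of outcomes are all assumptions made for illustration. It only shows the intended control flow: exit on session expiry, keep looping on connection loss.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

final class OverseerLoopSketch {
    // Hypothetical stand-in for KeeperException.Code (assumption, not the ZooKeeper type).
    enum Code { OK, CONNECTIONLOSS, SESSIONEXPIRED }

    // Only an expired session implies that leadership is gone.
    static boolean shouldExit(Code code) {
        return code == Code.SESSIONEXPIRED;
    }

    // Walks a scripted sequence of ZK outcomes, one per loop iteration,
    // and returns how many work items were processed before the loop ended.
    static int run(Deque<Code> outcomes) {
        int processed = 0;
        while (!outcomes.isEmpty()) {
            Code code = outcomes.poll();
            if (shouldExit(code)) {
                return processed; // session expired: stop the processor thread
            }
            if (code == Code.CONNECTIONLOSS) {
                continue; // transient loss: skip this round and try again
            }
            processed++; // successful round: one queue item handled
        }
        return processed;
    }

    public static void main(String[] args) {
        Deque<Code> outcomes = new ArrayDeque<>(Arrays.asList(
                Code.OK, Code.CONNECTIONLOSS, Code.OK, Code.SESSIONEXPIRED, Code.OK));
        System.out.println(run(outcomes)); // prints 2
    }
}
```

In this sketch the connection loss in the second round does not end the loop, so the third round is still processed. With the current condition the loop would have returned after the first item.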

Issue Links

is related to

SOLR-6405 ZooKeeper calls can easily not be retried enough on ConnectionLoss.

Resolved

People

Assignee: Mark Miller

Reporter: Jessica Cheng Mallet

Votes: 0

Watchers: 6

Dates

Created: 22/Aug/14 00:42

Updated: 09/May/16 18:48

Resolved: 24/Aug/14 13:46