Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Affects Version/s: 8.8, main (10.0)
- Fix Version/s: None
Description
We have observed some odd behavior in our Solr deployment: the cluster state for certain collections appears to be "stuck" in a stale state. While inspecting the current code logic, we found that a race condition can arise and leave `collectionWatches` and `watchedCollectionStates` in inconsistent states.
Looking at the current design, the 2 fields are (a simplified sketch follows the list):
- collectionWatches - for "watching" ZK updates on collections (via notification). The map's key is the collection name (e.g. org) and its value is a CollectionWatch, which in turn holds a set of DocCollectionWatchers (i.e. ConcurrentHashMap<String, CollectionWatch<DocCollectionWatcher>> collectionWatches)
- watchedCollectionStates - a ConcurrentHashMap<String, DocCollection>, whose key is the collection name and whose value is the DocCollection produced by the previous watch event handled by ZkStateReader$StateWatcher (either the fetch on first watcher registration or a notification from ZK on state changes)
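For reference, here is a simplified sketch of how these two fields relate. The field names follow the description above; the CollectionWatch internals shown are abbreviated stand-ins and may differ from the actual Solr source:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Simplified sketch, not the actual Solr source: the two maps that are expected to stay in sync.
public class ZkStateReaderSketch {

  // Stand-ins for the real org.apache.solr.common.cloud types.
  interface DocCollection {}
  interface DocCollectionWatcher {}

  // Holds the watchers registered for one collection (abbreviated).
  static class CollectionWatch<T> {
    final Set<T> stateWatchers = ConcurrentHashMap.newKeySet();

    boolean canBeRemoved() {
      return stateWatchers.isEmpty();
    }
  }

  // Key: collection name. Value: who is watching it.
  private final ConcurrentHashMap<String, CollectionWatch<DocCollectionWatcher>> collectionWatches =
      new ConcurrentHashMap<>();

  // Key: collection name. Value: the DocCollection produced by the last watch event for it.
  private final ConcurrentHashMap<String, DocCollection> watchedCollectionStates =
      new ConcurrentHashMap<>();
}
```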
At the design level, these 2 fields should stay "in sync": if a collection is no longer tracked in collectionWatches, then it should not have an entry in watchedCollectionStates either.
However, this guarantee is not a strong one: there are various pieces of code that try to remove entries from watchedCollectionStates when the collection is somehow no longer in collectionWatches, for example here; in particular, this appears to address a race condition with unregisterCore. The code that removes a registered DocCollectionWatcher also tries to ensure both maps stay in sync, as seen here.
The core of the issue appears to be the assumption that when the last DocCollectionWatcher is removed from the CollectionWatch, both watchedCollectionStates and collectionWatches are purged of the watched collection. The clusterState should then hold a LazyCollectionRef instead, whose DocCollection get(boolean allowCached) invocation returns the correct DocCollection state when allowCached=false. Unfortunately, race conditions remain possible as long as there are 2 separate maps. One possible race condition is demonstrated in a later section.
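To make that kind of interleaving concrete, here is a minimal, self-contained model of the race. It is hypothetical and heavily simplified: the map values and the comments only mimic updateWatchedCollection and removeDocCollectionWatcher, they are not the real code, and the two "threads" are replayed sequentially in the order described later in the reproduction steps:

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical replay of the interleaving: thread A is a zkCallback notification,
// thread B removes the last DocCollectionWatcher for the same collection.
public class TwoMapRaceSketch {
  static final ConcurrentHashMap<String, Object> collectionWatches = new ConcurrentHashMap<>();
  static final ConcurrentHashMap<String, String> watchedCollectionStates = new ConcurrentHashMap<>();

  public static void main(String[] args) {
    String coll = "test";
    collectionWatches.put(coll, new Object());
    watchedCollectionStates.put(coll, "state-v1");

    // B: last DocCollectionWatcher removed -> drop the cached state first...
    watchedCollectionStates.remove(coll);

    // A: the zkCallback thread resumes, sees no cached state, so it inserts one.
    if (watchedCollectionStates.get(coll) == null) {
      watchedCollectionStates.putIfAbsent(coll, "state-v2");
    }

    // B: ...then purge the watch entry.
    collectionWatches.remove(coll);

    // Invariant broken: the collection is no longer watched, but a state stays cached
    // (and clusterState keeps a non-lazy CollectionRef that will never be refreshed).
    System.out.println("collectionWatches contains coll?       " + collectionWatches.containsKey(coll));
    System.out.println("watchedCollectionStates contains coll? " + watchedCollectionStates.containsKey(coll));
  }
}
```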
PR proposal
One idea is to merge the watched DocCollection state (watchedCollectionStates) into CollectionWatch, so that we cannot run into a race condition where a collection key appears in one map but not the other. Please see the PR here: https://github.com/apache/solr/pull/909 - we would love to gather some feedback and thoughts!
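As a rough sketch of that direction (simplified and hypothetical; the PR above is the authoritative version), the cached DocCollection would live inside the watch entry itself, so membership and state are kept in a single ConcurrentHashMap and can be updated atomically via compute/computeIfPresent:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the merged structure; DocCollection and DocCollectionWatcher
// stand in for the real Solr types, and the actual change is in apache/solr PR #909.
final class MergedWatchSketch {

  interface DocCollection {}
  interface DocCollectionWatcher {}

  // A watch entry that also carries the last known state for the collection,
  // replacing the separate watchedCollectionStates map.
  static final class StatefulCollectionWatch {
    final Set<DocCollectionWatcher> stateWatchers = ConcurrentHashMap.newKeySet();
    volatile DocCollection currentState;
  }

  private final ConcurrentHashMap<String, StatefulCollectionWatch> collectionWatches =
      new ConcurrentHashMap<>();

  // "Is this collection watched?" and "what is its cached state?" are now answered by a
  // single map entry, so the two answers cannot disagree.
  void updateWatchedCollection(String coll, DocCollection newState) {
    collectionWatches.computeIfPresent(coll, (k, watch) -> {
      watch.currentState = newState; // state only updated while the watch entry still exists
      return watch;
    });
  }

  void removeDocCollectionWatcher(String coll, DocCollectionWatcher watcher) {
    collectionWatches.compute(coll, (k, watch) -> {
      if (watch == null) return null;
      watch.stateWatchers.remove(watcher);
      // Removing the entry drops the cached state at the same time: no second map to forget.
      return watch.stateWatchers.isEmpty() ? null : watch;
    });
  }
}
```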
Steps to reproduce a race condition
Spin up 2 nodes; only one node should serve the test collection, and the other node should be made the overseer/leader of the cluster. Debug on the overseer node:
- From the IDE, ensure all breakpoints suspend the "thread" only, not "all" (IntelliJ terminology)
- All breakpoints are in ZkStateReader. Set a breakpoint at the line here and add the condition Thread.currentThread().getName().startsWith("zkCallback"). This pauses when a ZK notification comes in, right before updateWatchedCollection checks whether the collection is already in watchedCollectionStates
- Set a breakpoint here, at the line that throws TimeoutException after the latch times out, for example from a call inside ZkStateReader#waitForState
- Now start debugging the overseer node
- Issue a split shard request to the overseer. For example http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=test&shard=shard1
- Eventually a thread will stop at the first breakpoint within updateWatchedCollection; the thread name should be something like zkCallback-127-thread-2
- You might have to wait up to 320 secs (the timeout in CollectionHandlingUtils.waitForNewShard); then another thread should stop at the 2nd breakpoint that throws TimeoutException, with a thread name similar to OverseerThreadFactory...
- Add a breakpoint here in removeDocCollectionWatcher, at the line that removes the entry from watchedCollectionStates but has not yet purged collectionWatches
- Resume the OverseerThreadFactory thread that is about to throw TimeoutException; it should stop at the new breakpoint in removeDocCollectionWatcher
- Switch back to the zkCallback... thread and resume it; it should find that watchedCollectionStates is empty and therefore insert a CollectionRef into it
- Switch back to the OverseerThreadFactory thread and step a few times; it will purge collectionWatches, but watchedCollectionStates will still have one entry in it
- If we inspect zkStateReader.clusterState at this point, we will notice that it holds a non-lazy CollectionRef for the test collection, while collectionWatches is empty but watchedCollectionStates still contains the collection state
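At that last step, the broken invariant can be checked with something like the following snippet. This is hypothetical and intended for the debugger's "evaluate expression" window: both maps are private fields of ZkStateReader, so it cannot be called from user code as-is:

```java
// Hypothetical debugger "evaluate expression" snippet: list collections whose state is still
// cached even though the collection is no longer watched.
for (String coll : watchedCollectionStates.keySet()) {
  if (!collectionWatches.containsKey(coll)) {
    System.out.println(coll + " is still in watchedCollectionStates but not in collectionWatches");
  }
}
```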