Affects Version/s: None
Fix Version/s: 8.8
This work has the same goal as SOLR-13951: to reduce overseer bottlenecks by keeping replica state updates from going to state.json via the overseer. However, the approach taken here is different from SOLR-13951, and hence this work supersedes it.
The design proposed is here: https://docs.google.com/document/d/1xdxpzUNmTZbk0vTMZqfen9R3ArdHokLITdiISBxCFUg/edit
- Every replica's state will be stored in a separate znode nested under state.json. The znode's name encodes the replica name, state, and leadership status.
- An additional children watcher will be set on state.json to detect state changes.
- Upon a state change, a ZK multi-op deletes the previous znode and adds a new znode with the new state.
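
The three points above can be sketched roughly as follows. This is only an illustration, not the code in the PR: the class name ReplicaStateNode, the ":" separator, the ":L" leader suffix and the transition helper are all hypothetical, and the actual name format is defined in the design document. The sketch avoids a ZooKeeper dependency and only computes the two paths; with the ZooKeeper Java client they would become an Op.delete and an Op.create submitted together through ZooKeeper.multi, so that watchers on state.json's children never see a replica with zero or two state znodes.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: one child znode per replica, with the state carried in the znode name.
final class ReplicaStateNode {
    final String replica;   // e.g. "core_node3"
    final String state;     // e.g. "active", "down", "recovering"
    final boolean leader;

    ReplicaStateNode(String replica, String state, boolean leader) {
        this.replica = replica;
        this.state = state;
        this.leader = leader;
    }

    // Assumed format: <replica>:<state>[:L]. The real encoding may differ.
    String znodeName() {
        return replica + ":" + state + (leader ? ":L" : "");
    }

    // Decode a child name, as a children watcher on state.json would after a change event.
    static ReplicaStateNode parse(String znodeName) {
        String[] parts = znodeName.split(":");
        boolean hasLeaderFlag = parts.length == 3 && "L".equals(parts[2]);
        if (parts.length != 2 && !hasLeaderFlag) {
            throw new IllegalArgumentException("Unrecognized state znode name: " + znodeName);
        }
        return new ReplicaStateNode(parts[0], parts[1], hasLeaderFlag);
    }

    // Describe the two operations of the multi-op: delete the old znode, create the new one.
    // With the ZooKeeper client these map to Op.delete(path, -1) and
    // Op.create(path, data, acl, CreateMode.PERSISTENT), passed to ZooKeeper.multi(...).
    static List<String> transition(String stateJsonPath, ReplicaStateNode from, ReplicaStateNode to) {
        return Arrays.asList(
            "delete " + stateJsonPath + "/" + from.znodeName(),
            "create " + stateJsonPath + "/" + to.znodeName());
    }
}
```

For example, moving core_node3 from down to active leader under /collections/c1/state.json would delete the child core_node3:down and create core_node3:active:L in a single multi-op. The overseer is not involved, and the contents of state.json itself are not rewritten.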
Differences between this and SOLR-13951:
- In SOLR-13951, we planned to leverage shard terms for per shard states.
- As a consequence, the code changes required for SOLR-13951 were massive (we needed a shard state provider abstraction and had to introduce it everywhere in the codebase).
- This approach is a drastically simpler change and design.
Credit for this design and the PR is due to Noble Paul. Mark Miller, Noble Paul and I have collaborated on this effort. The reference branch takes a conceptually similar (but not identical) approach.
I shall attach a PR and performance benchmarks shortly.