I don't fully understand this functionality, but this commit looks scary as shit...
SOLR-6801 test always writes to leader so that replication lag does not impact next insertion
...why is this change considered a "safe" solution to the existing test failures? it seems to just be making the test absurdly weak. isn't the root problem here (replication lag) something that can and will come up when end users try to use this same functionality?
if the only way the test can reliably pass is if we put hacks into the test to ensure that the updates only go to the "blob" leader, that suggests to me that the functionality itself isn't going to work reliably for end users unless they also only ever hit the leader ... what stops a user from encountering the same replication lag?
it seems like either:
1) we need to protect users by locking the feature down:
- document that the blob store only works when talking to the "blob" leader
- lock down the blob handler to reject requests to nodes that aren't the leader
2) the solr code itself needs to be hardened to do some sort of forward to leader (a la atomic updates and/or real time get), or push the responsibility down to the client via something like optimistic locking (i'm hand wavy here because i don't fully understand the use cases/goals)
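
to make the options above concrete, here is a toy model in plain Java. it is not Solr code: `Node`, `writeLocally`, `writeOrReject`, `writeForwarding` and `writeIfVersion` are all hypothetical names, and it assumes (i have not verified this against the blob handler) that the next blob version is derived from the latest version visible on whichever node receives the request.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the two options; every name here is hypothetical, none of it is Solr API. */
public class BlobVersionSketch {

    /** A node that tracks the latest version it has seen for each blob name. */
    static class Node {
        final Node leader; // null when this node is itself the leader
        final Map<String, Integer> latest = new HashMap<>();
        Node(Node leader) { this.leader = leader; }
        boolean isLeader() { return leader == null; }
    }

    /** Assumed current behavior: the next version comes from this node's local view. */
    static int writeLocally(Node node, String name) {
        int next = node.latest.getOrDefault(name, 0) + 1;
        node.latest.put(name, next);
        return next;
    }

    /** Option 1: lock the feature down by rejecting writes on non-leaders. */
    static int writeOrReject(Node node, String name) {
        if (!node.isLeader()) {
            throw new IllegalStateException("blob writes must go to the leader");
        }
        return writeLocally(node, name);
    }

    /** Option 2a: a non-leader forwards the write to the leader. */
    static int writeForwarding(Node node, String name) {
        Node target = node.isLeader() ? node : node.leader;
        return writeLocally(target, name);
    }

    /** Option 2b: the client states the version it expects; a mismatch is refused. */
    static boolean writeIfVersion(Node node, String name, int expectedCurrent) {
        int current = node.latest.getOrDefault(name, 0);
        if (current != expectedCurrent) {
            return false; // caller must re-read and retry
        }
        node.latest.put(name, current + 1);
        return true;
    }

    public static void main(String[] args) {
        Node leader = new Node(null);
        Node replica = new Node(leader); // has not replicated anything yet

        System.out.println(writeLocally(leader, "jar"));     // 1
        System.out.println(writeLocally(replica, "jar"));    // 1 again: the lagging replica reuses a version
        System.out.println(writeForwarding(replica, "jar")); // 2: the leader's view decides
    }
}
```

in this model the second `writeLocally` call is the failure mode the test used to hit, and either rejecting or forwarding makes it impossible. the version check in `writeIfVersion` only helps if it is evaluated against the leader's state, since a lagging replica would happily accept a stale expected version.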