Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Description
The basic functionality of the authorization/authentication REST APIs works by persisting changes to a security.json file in ZooKeeper, which is monitored by every node via a Watcher. When the watchers fire, the affected plugin types are (re)initialized with the new settings.
Since this information is "pulled" from ZK by the nodes, there is a (small) inherent delay between when the REST API is hit by external clients, and when each node learns of the changes. An additional delay exists as the config is "reloaded" to (re)initialize the plugins.
Practically speaking, these delays have very little impact on a "real" SolrCloud cluster, but they can be problematic in test cases. While the SecurityConfHandler on each node could be used to query the "current" security.json file, it doesn't indicate if/when the plugins identified in the "current" configuration are fully in use.
For now, we have a "white box" workaround available for MiniSolrCloudCluster based tests: compare the Plugins of each CoreContainer in use before and after making known changes via the API (see commits identified below).
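As a rough illustration of the before/after comparison idea (not the code from the referenced commits), the sketch below snapshots one plugin instance per node and then polls until every node reports a different instance. Everything here is an assumption made for illustration: the class name PluginSwapWaiter, the use of plain Supplier<Object> stand-ins in place of real CoreContainer accessors, and the 50 ms poll interval. It also only makes sense when the change is known to force re-initialization of the plugin being watched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical helper: each Supplier stands in for "ask one node which
// plugin instance it is currently using". No real Solr API is used here.
public class PluginSwapWaiter {

    /** Records the plugin instance each node reports right now. */
    public static List<Object> snapshot(List<Supplier<Object>> nodes) {
        List<Object> before = new ArrayList<>();
        for (Supplier<Object> node : nodes) {
            before.add(node.get());
        }
        return before;
    }

    /**
     * Polls until every node reports an instance that is not the same
     * object (reference comparison) as the one in the snapshot.
     * Returns false if the timeout elapses or the thread is interrupted.
     */
    public static boolean awaitSwap(List<Supplier<Object>> nodes,
                                    List<Object> before,
                                    long timeoutMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            boolean allSwapped = true;
            for (int i = 0; i < nodes.size(); i++) {
                if (nodes.get(i).get() == before.get(i)) {
                    allSwapped = false; // this node still uses the old plugin
                    break;
                }
            }
            if (allSwapped) {
                return true;
            }
            if (System.nanoTime() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(50); // assumed poll interval
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

A test would call snapshot(...) before hitting the API, make the change, then call awaitSwap(...) before asserting on the new behavior.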
This issue exists as a placeholder for future consideration of UX/API improvements making it easier for external clients (w/o "white box" access to solr internals) to know definitively if/when modified security settings take effect.
I've been investigating some sporadic and hard to reproduce test failures related to authentication in cloud mode, and I think (but have not directly verified) that the common cause is that after a client uses one of the /admin/auth... handlers to update some setting, there is an inherent and unpredictable delay (due to ZK watches) until every node in the cluster has had a chance to (re)load the new configuration and initialize the various security plugins with the new settings.
Which means, if a test client does a POST to some node to add/change/remove some authn/authz settings, and then immediately hits the exact same node (or any other node) to test that the effects of those settings exist, there is no guarantee that they will have taken effect yet.
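Until something better exists, a client without access to Solr internals can only retry its check until it passes or a deadline expires. The helper below is a minimal sketch of that pattern; the class name EventuallyCheck, the method name, and the timeout/interval parameters are all hypothetical, and the probe is whatever request-and-verify step the test would otherwise run once.

```java
import java.util.function.BooleanSupplier;

// Hypothetical polling helper: re-runs a probe until it succeeds or the
// timeout elapses. The probe itself (e.g. "send a request to a node and
// check the response") is supplied by the caller.
public class EventuallyCheck {

    public static boolean eventually(BooleanSupplier probe,
                                     long timeoutMs,
                                     long intervalMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (probe.getAsBoolean()) {
                return true; // the new settings are observably in effect
            }
            if (System.nanoTime() >= deadline) {
                return false; // gave up; caller should fail the test
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

Because each node reloads independently, a probe that succeeds against one node says nothing about the others; the probe would need to check every node the test later depends on.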