ZooKeeper
  1. ZooKeeper
  2. ZOOKEEPER-2014

Only admin should be allowed to reconfig a cluster

    Details

    • Type: Bug Bug
    • Status: Open
    • Priority: Blocker Blocker
    • Resolution: Unresolved
    • Affects Version/s: 3.5.0
    • Fix Version/s: 3.5.3
    • Component/s: server
    • Labels:
      None

      Description

      ZOOKEEPER-107 introduces reconfiguration support via the reconfig() call. We should, at the very least, ensure that only the Admin can reconfigure a cluster. Perhaps restricting access to /zookeeper/config as well, though this is debatable. Surely one could ensure Admin only access via an ACL, but that would leave everyone who doesn't use ACLs unprotected. We could also force a default ACL to make it a bit more consistent (maybe).

      Finally, making reconfig() only available to Admins means they have to run with zookeeper.DigestAuthenticationProvider.superDigest (which I am not sure if everyone does, or how would it work with other authentication providers).

      1. ZOOKEEPER-2014.patch
        2 kB
        Raul Gutierrez Segales

        Issue Links

          Activity

          Hide
          Raul Gutierrez Segales added a comment -

          this is RFC and very basic.

          cc: Alexander Shraer, Patrick Hunt, Hongchao Deng

          Show
          Raul Gutierrez Segales added a comment - this is RFC and very basic. cc: Alexander Shraer , Patrick Hunt , Hongchao Deng
          Hide
          Hongchao Deng added a comment -

          ensure that only the Admin can reconfigure a cluster.

          This change will reduce much flexibility in reconfig.
          A counter scenario would be I have a process that detects "permanent" failed ZK servers and removes them to make a smaller quorum (better fault tolerance). Does the process have to be Admin? Or would a default ACL be a better option here?

          Show
          Hongchao Deng added a comment - ensure that only the Admin can reconfigure a cluster. This change will reduce much flexibility in reconfig. A counter scenario would be I have a process that detects "permanent" failed ZK servers and removes them to make a smaller quorum (better fault tolerance). Does the process have to be Admin? Or would a default ACL be a better option here?
          Hide
          Alexander Shraer added a comment -

          Thanks for starting this thread, Raul. I'm not sure what's the right way to handle the issue, but just wanted to mention a couple of things.

          1. Regarding access to /zookeeper/config - clients may need read access for example in order to run the new client-side load balancing functionality, they need to detect a config change and get a new connection string. They can use getConfig for this. It is probably a good idea to restrict access for writes. Writing it probably won't break the system but may cause all clients to migrate to any server I choose to mention there, if clients are using the client-side load balancing algorithm.

          2. Notice that when the leader processes reconfig in PrepRequestProcessor it already checks ACL:

          nodeRecord = getRecordForPath(ZooDefs.CONFIG_NODE);
          checkACL(zks, nodeRecord.acl, ZooDefs.Perms.WRITE, request.authInfo);

          Show
          Alexander Shraer added a comment - Thanks for starting this thread, Raul. I'm not sure what's the right way to handle the issue, but just wanted to mention a couple of things. 1. Regarding access to /zookeeper/config - clients may need read access for example in order to run the new client-side load balancing functionality, they need to detect a config change and get a new connection string. They can use getConfig for this. It is probably a good idea to restrict access for writes. Writing it probably won't break the system but may cause all clients to migrate to any server I choose to mention there, if clients are using the client-side load balancing algorithm. 2. Notice that when the leader processes reconfig in PrepRequestProcessor it already checks ACL: nodeRecord = getRecordForPath(ZooDefs.CONFIG_NODE); checkACL(zks, nodeRecord.acl, ZooDefs.Perms.WRITE, request.authInfo);
          Hide
          Alexander Shraer added a comment -

          Actually perhaps we should open a JIRA to hide the client-side rebalancing from clients (not for 3.5.0). We may want to implement it inside the client-side library somehow and just let the client specify whether this is enabled or disabled when creating a ZK handle. What do you think ?

          Show
          Alexander Shraer added a comment - Actually perhaps we should open a JIRA to hide the client-side rebalancing from clients (not for 3.5.0). We may want to implement it inside the client-side library somehow and just let the client specify whether this is enabled or disabled when creating a ZK handle. What do you think ?
          Hide
          Alexander Shraer added a comment -
          Show
          Alexander Shraer added a comment - + Flavio Junqueira , Michi Mutsuzaki
          Hide
          Alexander Shraer added a comment -

          Raul Gutierrez Segales, should this be similar to how updating the quota znode works ? or do you think changing configuration is different ?

          Show
          Alexander Shraer added a comment - Raul Gutierrez Segales , should this be similar to how updating the quota znode works ? or do you think changing configuration is different ?
          Hide
          Raul Gutierrez Segales added a comment -

          Alexander Shraer: well, quotas rely on ACLs. But, quotas are (currently) advisory only (i.e.: we don't reject traffic) so no damage can be done by non-admins. So I think we need a stricter enforcement here, no?

          Show
          Raul Gutierrez Segales added a comment - Alexander Shraer : well, quotas rely on ACLs. But, quotas are (currently) advisory only (i.e.: we don't reject traffic) so no damage can be done by non-admins. So I think we need a stricter enforcement here, no?
          Hide
          Raul Gutierrez Segales added a comment -

          (by which I mean, you can really bring a cluster down with this).

          Show
          Raul Gutierrez Segales added a comment - (by which I mean, you can really bring a cluster down with this).
          Hide
          Patrick Hunt added a comment -

          Yes, this is a concern for me as well.

          1) mixing the client api and the admin api is not great. It would be better to have them separate. We should fix this asap.

          2) this (controlling access to reconfig) is a big issue from a security perspective IMO.

          A few comments on the comments so far:

          ensure that only the Admin can reconfigure a cluster

          sounds sensible to me

          Perhaps restricting access to /zookeeper/config as well

          in the past we've (ben in particular) tried to limit the amount of information we provide to the client/session. For example we don't tell them which server they are connected to. I see this in the same vein.

          one could ensure Admin only access via an ACL, but that would leave everyone who doesn't use ACLs unprotected.

          well, you're already unprotected in this situation so I don't really see it as a sticking point.

          clients may need read access for example in order to run the new client-side load balancing functionality

          perhaps this argues for pushing this to the the server? encapsulate the information on the service I mean and expose as a specific api.

          Actually perhaps we should open a JIRA to hide the client-side rebalancing from clients (not for 3.5.0).

          yes, that's what I was trying to get at in the previous item in this comment. Although I think we'd need to fix this now - while we can still change the apis w/o worrying about b/w compat (we can change new apis during the alpha period w/o worrying about back compat)

          should this be similar to how updating the quota znode works ? or do you think changing configuration is different ?

          if quotas are "admin" functions we probably need to fix those as well - lock them down to just admin level authz I mean.

          Show
          Patrick Hunt added a comment - Yes, this is a concern for me as well. 1) mixing the client api and the admin api is not great. It would be better to have them separate. We should fix this asap. 2) this (controlling access to reconfig) is a big issue from a security perspective IMO. A few comments on the comments so far: ensure that only the Admin can reconfigure a cluster sounds sensible to me Perhaps restricting access to /zookeeper/config as well in the past we've (ben in particular) tried to limit the amount of information we provide to the client/session. For example we don't tell them which server they are connected to. I see this in the same vein. one could ensure Admin only access via an ACL, but that would leave everyone who doesn't use ACLs unprotected. well, you're already unprotected in this situation so I don't really see it as a sticking point. clients may need read access for example in order to run the new client-side load balancing functionality perhaps this argues for pushing this to the the server? encapsulate the information on the service I mean and expose as a specific api. Actually perhaps we should open a JIRA to hide the client-side rebalancing from clients (not for 3.5.0). yes, that's what I was trying to get at in the previous item in this comment. Although I think we'd need to fix this now - while we can still change the apis w/o worrying about b/w compat (we can change new apis during the alpha period w/o worrying about back compat) should this be similar to how updating the quota znode works ? or do you think changing configuration is different ? if quotas are "admin" functions we probably need to fix those as well - lock them down to just admin level authz I mean.
          Hide
          Alexander Shraer added a comment -

          Everything you guys are saying about admin-only controls sounds very reasonable. I just want to clarify about the special reconfig znode. IMHO we should not allow write permissions to this node. I don't even see why an admin should have it its set only through the reconfig API.

          I do think that all clients should have read permissions, and here's why - the information they get from this znode is the up-to-date connection string. When the configuration changes this is the bare minimum they need in order not to loose track of the system. When their server crashes they need to know whom to connect to next. The new connection string is exactly the information we exploit for rebalancing. It is even implemented inside the updateServerList method, which is also needed in any case.

          Regarding my suggestion - Hongchao opened a JIRA for it and you can read the discussion there https://issues.apache.org/jira/browse/ZOOKEEPER-2016
          Please especially see Marshall's comment.
          All I'm proposing there is to implement some default behaviour that will save most clients from setting a watch and invoking updateServerList - I suggest that the client-side-library does it for them if they opt-in for the default behaviour. It doesn't change APIs, just adds one more feature, so it doesn't delay 3.5.0.

          Show
          Alexander Shraer added a comment - Everything you guys are saying about admin-only controls sounds very reasonable. I just want to clarify about the special reconfig znode. IMHO we should not allow write permissions to this node. I don't even see why an admin should have it its set only through the reconfig API. I do think that all clients should have read permissions, and here's why - the information they get from this znode is the up-to-date connection string. When the configuration changes this is the bare minimum they need in order not to loose track of the system. When their server crashes they need to know whom to connect to next. The new connection string is exactly the information we exploit for rebalancing. It is even implemented inside the updateServerList method, which is also needed in any case. Regarding my suggestion - Hongchao opened a JIRA for it and you can read the discussion there https://issues.apache.org/jira/browse/ZOOKEEPER-2016 Please especially see Marshall's comment. All I'm proposing there is to implement some default behaviour that will save most clients from setting a watch and invoking updateServerList - I suggest that the client-side-library does it for them if they opt-in for the default behaviour. It doesn't change APIs, just adds one more feature, so it doesn't delay 3.5.0.
          Hide
          Alexander Shraer added a comment -

          Raul Gutierrez Segales, does your patch solve the issue ? if not, what is still missing ? If I remember correctly my concern was that I'd like getConfig to be available to regular clients, not only admin, so they can react to configuration changes.

          If this JIRA is what's blocking 3.5 perhaps we could reconsider the approach and go with something
          simpler to start with, such as relying on ACLs. Or setting default ACLs for the config znode and requiring
          client admins to have these permissions.

          Show
          Alexander Shraer added a comment - Raul Gutierrez Segales , does your patch solve the issue ? if not, what is still missing ? If I remember correctly my concern was that I'd like getConfig to be available to regular clients, not only admin, so they can react to configuration changes. If this JIRA is what's blocking 3.5 perhaps we could reconsider the approach and go with something simpler to start with, such as relying on ACLs. Or setting default ACLs for the config znode and requiring client admins to have these permissions.
          Hide
          Patrick Hunt added a comment -

          We can't (or at least promise we won't) change the api in a non b/w compatible way once it's post-alpha. As such I believe at a minimum we need to clean up the API (should be simple - move it out of ZooKeeper class) and ensure there are no security issues.

          Show
          Patrick Hunt added a comment - We can't (or at least promise we won't) change the api in a non b/w compatible way once it's post-alpha. As such I believe at a minimum we need to clean up the API (should be simple - move it out of ZooKeeper class) and ensure there are no security issues.
          Hide
          Michael Han added a comment - - edited

          Had an offline discussion with Patrick Hunt, Flavio Junqueira yesterday regarding this issue, the conversation is captured as following points (with some of my thoughts as well):

          • We can't fix security issue unless we enforce authentication and authorization. Just by moving client APIs around is not enough because at protocol level the server still open to reconfiguration and someone can exploit this easily (by writing their own ZK client instead of using ZooKeeper client for example.).
          • That said though, the ZooKeeper::reconfig API should be moved out of ZooKeeper class (for Java client) anyway, because it's more about an admin feature rather than a client API. Moving reconfig out of ZooKeeper will also remove constraints we possibly put on normal ZooKeeper clients (such as having to use zookeeper.DigestAuthenticationProvider.superDigest), which is a bonus. Due to API backward compatibility concerns, this refactoring should happen before we move to beta.
          • We'd like to introduce a new configuration option in zoo.cfg to turn off reconfig feature by default. ZK users who needs use this feature need turn it on explicitly. Because dynamic reconfig feature brings something new (e.g. cfg.dynamic file), have this feature off is good for 'the principal of least surprise'. Having the feature off by default will also buy us sometime to fortify and stabilize the feature without having it being a blocker issue.

          The action items / plans I am thinking in the time frame of 3.5.3:

          • Fix Security:
            • Enforce an ACL on /zookeeper/config, such that only users that have write permission to it can reconfig the cluster. /zookeeper/config is by default readable to anyone so zookeeper (none-admin) clients can load balancing on client side either manually (current behavior) or automatically (ZOOKEEPER-2016).
            • ZooKeeper users are responsible for properly configure the ACL such that only a limited set of admin users are part of the ACL with write access. The authentication of these users will be delegated to existing mechanisms ZooKeeper already supports, such as SASL client login, so not much work here except documentation on such a requirement to use reconfig feature.
            • The default behavior is such that if /zookeeper/config node has no associated ACLs, then no one is allowed write access except super user.
          • Fix API:
            • Create a new class ZooKeeperAdmin that inherits ZooKeeper class.
            • Move reconfig API into ZooKeeperAdmin class.
          • Introduce a new zoo.cfg option to switch reconfiguration feature on or off. The new configuration option will be set as 'disable reconfig' by default when 3.5.3 shipped, and user who wants reconfig has to enable reconfig option explicitly.

          Are these good enough for now to address all security concerns or we still miss something?
          Comments?

          CC Alexander Shraer, Raul Gutierrez Segales

          Show
          Michael Han added a comment - - edited Had an offline discussion with Patrick Hunt , Flavio Junqueira yesterday regarding this issue, the conversation is captured as following points (with some of my thoughts as well): We can't fix security issue unless we enforce authentication and authorization. Just by moving client APIs around is not enough because at protocol level the server still open to reconfiguration and someone can exploit this easily (by writing their own ZK client instead of using ZooKeeper client for example.). That said though, the ZooKeeper::reconfig API should be moved out of ZooKeeper class (for Java client) anyway, because it's more about an admin feature rather than a client API. Moving reconfig out of ZooKeeper will also remove constraints we possibly put on normal ZooKeeper clients (such as having to use zookeeper.DigestAuthenticationProvider.superDigest), which is a bonus. Due to API backward compatibility concerns, this refactoring should happen before we move to beta. We'd like to introduce a new configuration option in zoo.cfg to turn off reconfig feature by default. ZK users who needs use this feature need turn it on explicitly. Because dynamic reconfig feature brings something new (e.g. cfg.dynamic file), have this feature off is good for 'the principal of least surprise'. Having the feature off by default will also buy us sometime to fortify and stabilize the feature without having it being a blocker issue. The action items / plans I am thinking in the time frame of 3.5.3: Fix Security: Enforce an ACL on /zookeeper/config, such that only users that have write permission to it can reconfig the cluster. /zookeeper/config is by default readable to anyone so zookeeper (none-admin) clients can load balancing on client side either manually (current behavior) or automatically ( ZOOKEEPER-2016 ). ZooKeeper users are responsible for properly configure the ACL such that only a limited set of admin users are part of the ACL with write access. The authentication of these users will be delegated to existing mechanisms ZooKeeper already supports, such as SASL client login, so not much work here except documentation on such a requirement to use reconfig feature. The default behavior is such that if /zookeeper/config node has no associated ACLs, then no one is allowed write access except super user. Fix API: Create a new class ZooKeeperAdmin that inherits ZooKeeper class. Move reconfig API into ZooKeeperAdmin class. Introduce a new zoo.cfg option to switch reconfiguration feature on or off. The new configuration option will be set as 'disable reconfig' by default when 3.5.3 shipped, and user who wants reconfig has to enable reconfig option explicitly. Are these good enough for now to address all security concerns or we still miss something? Comments? CC Alexander Shraer , Raul Gutierrez Segales
          Hide
          Alexander Shraer added a comment -

          Hi Michael, all of this sounds good to me except for moving getConfig to ZooKeeperAdmin. I've explained why several times in the thread above (search for "getconfig"), please let me know if you disagree.

          Show
          Alexander Shraer added a comment - Hi Michael, all of this sounds good to me except for moving getConfig to ZooKeeperAdmin. I've explained why several times in the thread above (search for "getconfig"), please let me know if you disagree.
          Hide
          Michael Han added a comment -

          Thanks for your feedback Alex. I agree about getConfig API remaining as part of ZooKeeper because we don't transparently handle load balancing for clients. I had that in my first version of proposal but I forgot to keep it when update the proposal.

          A side question, I think this (handle load balancing when server changes) is the only use case for getConfig API right? Basically:

          • Client sets a watcher when calling getConfig.
          • The callback of the watcher invokes updateServerList.
          • When servers are reconfigured, watcher gets triggered and updateServerList gets invoked, which would do load balancing for clients.
          Show
          Michael Han added a comment - Thanks for your feedback Alex. I agree about getConfig API remaining as part of ZooKeeper because we don't transparently handle load balancing for clients. I had that in my first version of proposal but I forgot to keep it when update the proposal. A side question, I think this (handle load balancing when server changes) is the only use case for getConfig API right? Basically: Client sets a watcher when calling getConfig. The callback of the watcher invokes updateServerList. When servers are reconfigured, watcher gets triggered and updateServerList gets invoked, which would do load balancing for clients.
          Hide
          Alexander Shraer added a comment -

          I think that the general idea is that clients should be able to track the latest configuration and do something when it changes. For example if the server to which the client is connected goes away, it should have a way to query the system for the new list of servers and connect to one of them. The method you described is one option, outlined in the reconfig manual (right at the end there's some code for this).

          It would be nice to automate these steps for the user and to support a user defined policy instead of these steps (see some ideas in ZOOKEEPER-2016). For example, if you want the user to only connect to near-by servers, you'd need to filter the new list of servers before calling updateServerList. This can be done as part of a user-supplied policy.

          Show
          Alexander Shraer added a comment - I think that the general idea is that clients should be able to track the latest configuration and do something when it changes. For example if the server to which the client is connected goes away, it should have a way to query the system for the new list of servers and connect to one of them. The method you described is one option, outlined in the reconfig manual (right at the end there's some code for this). It would be nice to automate these steps for the user and to support a user defined policy instead of these steps (see some ideas in ZOOKEEPER-2016 ). For example, if you want the user to only connect to near-by servers, you'd need to filter the new list of servers before calling updateServerList. This can be done as part of a user-supplied policy.

            People

            • Assignee:
              Michael Han
              Reporter:
              Raul Gutierrez Segales
            • Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

              • Created:
                Updated:

                Development