Hadoop Common
HADOOP-8215

Security support for ZK Failover controller

    Details

      Description

      To keep the initial patches manageable, Kerberos security is not currently supported in the ZKFC implementation. This JIRA is to support the following important pieces for security:

      • integrate with ZK authentication (Kerberos or password-based)
      • allow the user to configure ACLs for the relevant znodes
      • add keytab configuration and login to the ZKFC daemons
      • ensure that the RPCs made by the health monitor and failover controller properly authenticate to the target daemons
      Attachments
      1. hadoop-8215.txt (31 kB, Todd Lipcon)
      2. hadoop-8215.txt (34 kB, Todd Lipcon)


          Activity

          Todd Lipcon added a comment -

          I'll commit this momentarily to the branch based on ATM's above +1, since the review feedback changes were mostly cosmetic. I ran the ZKFC and HAAdmin tests locally for both common and HDFS and they passed.

          Todd Lipcon added a comment -

          Fixed the above. I left one case of not asserting anything about the exception: the test where we expect NoAuth. Since there's only one call inside the try clause, and we are explicitly catching NoAuth and no other exception types, there wasn't anything more specific to assert beyond what we already checked.

          This patch is also rebased on the tip of the branch.

          Aaron T. Myers added a comment -

          Patch looks pretty good, Todd. Just a few little nits. +1 once these are addressed.

          1. Typo: "partiall borrowed from"
          2. Please add a comment to the public method HAZKUtil#parseAuth.
          3. In a few test methods you expect exceptions to be thrown but then just silently ignore them. Should probably put some GenericTestUtil#verifyExceptionContains in there.
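The review point about silently swallowed exceptions can be illustrated outside Hadoop. In the sketch below, assertExceptionContains and parseAuth are hypothetical stand-ins for Hadoop's test helper and the patch's parser, not the real APIs:

```java
// Sketch of the review suggestion: instead of an empty catch block,
// assert on the exception text so a wrong failure mode breaks the test.
public class ExceptionAssertExample {
    // Hypothetical stand-in for Hadoop's GenericTestUtils helper.
    static void assertExceptionContains(String expected, Throwable t) {
        if (t.getMessage() == null || !t.getMessage().contains(expected)) {
            throw new AssertionError(
                "Expected message containing '" + expected + "' but was: " + t.getMessage());
        }
    }

    // Hypothetical stand-in for a parser that rejects malformed "scheme:auth" specs.
    static void parseAuth(String spec) {
        if (!spec.contains(":")) {
            throw new IllegalArgumentException(
                "Auth '" + spec + "' not of expected form scheme:auth");
        }
    }

    public static void main(String[] args) {
        try {
            parseAuth("malformed");
            throw new AssertionError("Expected IllegalArgumentException");
        } catch (IllegalArgumentException e) {
            // Rather than ignoring the exception, verify it is the one we expected.
            assertExceptionContains("not of expected form", e);
        }
    }
}
```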
          Todd Lipcon added a comment -

          Because coverage of security is hard to automate, I performed the following manual test steps to verify this patch on a secure cluster:

          • Set up two NNs with kerberos security enabled
          • Use ZK command line to generate digest credentials:
            todd@todd-w510:~/releases/zookeeper-3.4.1-cdh4b1$ java -cp lib/*:zookeeper-3.4.1-cdh4b1.jar org.apache.zookeeper.server.auth.DigestAuthenticationProvider foo:testing
            foo:testing->foo:vlUvLnd8MlacsE80rDuu6ONESbM=
            

          • Add these two to the HDFS configuration:

           <property>
             <name>ha.zookeeper.acl</name>
             <value>digest:foo:vlUvLnd8MlacsE80rDuu6ONESbM=:rwcda</value>
           </property>
           <property>
             <name>ha.zookeeper.auth</name>
             <value>digest:foo:testing</value>
           </property>
          
          • Run bin/hdfs zkfc -formatZK
          • Run bin/hdfs zkfc for each NN
          • Run bin/hdfs namenode for each NN
          • Verify that one of the NNs becomes active. Kill that NN. Verify that the other NN becomes active within a few seconds.
          • Verify authentication results in the NN logs:
            12/04/02 17:25:22 INFO authorize.ServiceAuthorizationManager: Authorization successfull for hdfs-todd/todd-w510@HADOOP.COM (auth:KERBEROS) for protocol=interface org.apache.hadoop.ha.HAServiceProtocol
            
          • Use ZK CLI to verify the acls:
            [zk: localhost:2181(CONNECTED) 1] addauth digest foo:testing
            [zk: localhost:2181(CONNECTED) 2] ls /hadoop-ha
            [ActiveBreadCrumb, ActiveStandbyElectorLock]
            [zk: localhost:2181(CONNECTED) 3] getAcl /hadoop-ha
            'digest,'foo:vlUvLnd8MlacsE80rDuu6ONESbM=
            : cdrwa
            [zk: localhost:2181(CONNECTED) 4] getAcl /hadoop-ha/ActiveBreadCrumb
            'digest,'foo:vlUvLnd8MlacsE80rDuu6ONESbM=
            : cdrwa
            
          • Shut down nodes, replace configuration with indirect version:
             <property>
               <name>ha.zookeeper.acl</name>
               <value>@/home/todd/confs/devconf.ha.common/zk-acl.txt</value>
             </property>
             <property>
               <name>ha.zookeeper.auth</name>
               <value>@/home/todd/confs/devconf.ha.common/zk-auth.txt</value>
             </property>
            

            and move the actual values to the files as specified above

          • Restart ZKFCs, verify that the ACLs are still being correctly used
          • chmod 000 the ACL data so it's no longer readable, try to restart one of the ZKFCs, verify error:
            Exception in thread "main" java.io.FileNotFoundException: /home/todd/confs/devconf.ha.common/zk-acl.txt (Permission denied)
            
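For reference, the digest string the ZK command-line step above prints follows ZooKeeper's documented scheme: the stored id is the user plus base64(SHA-1("user:password")). Assuming that scheme, it can be reproduced with the standard library alone, without ZooKeeper on the classpath:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Base64;

// Reproduces ZooKeeper's DigestAuthenticationProvider.generateDigest:
// the ACL id is user + ":" + base64(SHA-1 of the full "user:password" string).
public class ZkDigest {
    public static String generateDigest(String idPassword) throws Exception {
        String user = idPassword.split(":", 2)[0];
        byte[] sha = MessageDigest.getInstance("SHA-1")
                .digest(idPassword.getBytes(StandardCharsets.UTF_8));
        return user + ":" + Base64.getEncoder().encodeToString(sha);
    }

    public static void main(String[] args) throws Exception {
        // Should match the output of the command-line step above.
        System.out.println(generateDigest("foo:testing"));
    }
}
```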
          Todd Lipcon added a comment -

          Attached patch implements the above. Here's a summary of changes:

          • The ZKFC provides a new hook, loginAsFCUser, which implementations should override to perform keytab login. The DFS implementation logs in using the NameNode keytab and credentials.
          • Refactored some of the code in DFSHAAdmin into a static method to set up the protocol principal information. This code is now called by DFSZKFailoverController.setConf as well.
          • Adds ha.zookeeper.acl and ha.zookeeper.auth configurations. These configs specify the ACL used for the znodes, and the authentications added when connecting to ZooKeeper. The format is the same as is used in the ZK shell. Additionally, the config values may be specified as "@/path/to/file" which allows an indirection. This is important when using digest-based authentication so as to avoid leaking the secret password via the /conf servlet, etc.
          • The ZK auth and acl parsing is in a new file called HAZKUtil. If we start using ZK for other purposes in Hadoop, we could rename it to HadoopZKUtil or something – nothing HA-specific in here.

          Note that a few of the RPC-related changes here duplicate HADOOP-8243. I'll resolve that during the merge when necessary.

          I also ran through some manual tests with a secure HDFS cluster and the ZKFC and it seemed to work. That was on an earlier version of the patch. I'll re-test with the latest patch before committing.
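The "@/path/to/file" indirection described above can be sketched with nothing but the standard library. resolveIndirect is an illustrative name, not the patch's actual method:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Sketch of the "@file" indirection: a config value starting with "@"
// names a file whose contents are the real value. This keeps digest
// passwords out of the /conf servlet and similar config dumps.
public class IndirectConfig {
    public static String resolveIndirect(String value) throws IOException {
        if (value != null && value.startsWith("@")) {
            // Read the real value from the named file, trimming the
            // trailing newline most editors leave behind.
            byte[] bytes = Files.readAllBytes(Paths.get(value.substring(1)));
            return new String(bytes).trim();
        }
        return value; // plain values pass through unchanged
    }
}
```

A plain value such as digest:foo:testing is returned as-is, while an @-prefixed path would be replaced by that file's contents, so the secret never appears in the XML configuration itself.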

          Todd Lipcon added a comment -

          I'm starting to work on this. Here's the plan:

          integrate with ZK authentication (kerberos or password-based)

          Based on https://github.com/ekoontz/zookeeper/wiki and http://hbase.apache.org/configuration.html#zk.sasl.auth it looks like the SASL setup is a bit complicated, though entirely configuration-based. I think for a first pass we should be OK to just use password-based authentication for ZK. I think this is sufficient because we have a well-defined set of clients that need to access these znodes, and they don't contain any content that needs to be encrypted over the wire. We can add SASL support later.

          allow the user to configure ACLs for the relevant znodes

          This is reasonably straightforward - just needs some additional configuration keys to specify the ACL, and then tying it in to where we create the znodes.
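Parsing the ZK-shell ACL format ("scheme:id:perms", where the id itself may contain colons, as a digest hash does) can be sketched as below. The permission bits mirror ZooKeeper's ZooDefs.Perms; parseAcl is illustrative rather than the patch's actual code:

```java
// Sketch of parsing a ZooKeeper-shell-style ACL string such as
// "digest:foo:vlUvLnd8MlacsE80rDuu6ONESbM=:rwcda". The scheme is
// everything before the first ':', the perms everything after the
// last ':', and the id (which may itself contain ':') sits between.
public class AclParser {
    // Same bit values as ZooKeeper's ZooDefs.Perms.
    static final int READ = 1, WRITE = 2, CREATE = 4, DELETE = 8, ADMIN = 16;

    public static int parsePerms(String perms) {
        int bits = 0;
        for (char c : perms.toCharArray()) {
            switch (c) {
                case 'r': bits |= READ; break;
                case 'w': bits |= WRITE; break;
                case 'c': bits |= CREATE; break;
                case 'd': bits |= DELETE; break;
                case 'a': bits |= ADMIN; break;
                default: throw new IllegalArgumentException("Unknown perm: " + c);
            }
        }
        return bits;
    }

    /** Returns {scheme, id, perms-string} for an ACL spec. */
    public static String[] parseAcl(String spec) {
        int first = spec.indexOf(':');
        int last = spec.lastIndexOf(':');
        if (first < 0 || first == last) {
            throw new IllegalArgumentException(
                "ACL '" + spec + "' not of expected form scheme:id:perm");
        }
        return new String[] {
            spec.substring(0, first), spec.substring(first + 1, last), spec.substring(last + 1)
        };
    }
}
```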

          add keytab configuration and login to the ZKFC daemons

          I think it should be OK to re-use the namenode principals here. That simplifies deployment since it avoids having to add new principals to the KDC, and given that the ZKFCs are intended to run on the same machines as the NNs, they will have access to the keytab files by default. Please speak up if you think we need separate keytabs/principals for the ZKFC daemons.

          ensure that the RPCs made by the health monitor and failover controller properly authenticate to the target daemons

          This is just a matter of making sure we set up the target principal in the Configuration, and do the proper login/doAs before we start the main ZKFC code.
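In the real patch the login/doAs wiring goes through Hadoop's UserGroupInformation; the JAAS pattern underneath it can be sketched with the standard library alone:

```java
import java.security.PrivilegedAction;
import javax.security.auth.Subject;

// Minimal sketch of the doAs pattern the ZKFC needs: log in first (an
// empty Subject stands in here for a real keytab login), then run the
// main body under that identity. The actual patch uses Hadoop's
// UserGroupInformation rather than raw JAAS.
public class DoAsSketch {
    public static String runAs(Subject subject) {
        return Subject.doAs(subject, (PrivilegedAction<String>) () -> {
            // In the ZKFC this would be the failover-controller main loop,
            // whose RPCs then authenticate as the logged-in principal.
            return "ran as " + subject.getPrincipals();
        });
    }

    public static void main(String[] args) {
        System.out.println(runAs(new Subject()));
    }
}
```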


            People

            • Assignee: Todd Lipcon
            • Reporter: Todd Lipcon
            • Votes: 0
            • Watchers: 7
