Uploaded image for project: 'ZooKeeper'
  1. ZooKeeper
  2. ZOOKEEPER-236

SSL Support for Atomic Broadcast protocol

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: quorum, server
    • Labels:
      None

      Description

      We should have the ability to use SSL to authenticate and encrypt the traffic between ZooKeeper servers. For the most part this is a very easy change. We would probably only want to support this for TCP based leader elections.

        Issue Links

          Activity

          Hide
          githubbot ASF GitHub Bot added a comment -

          GitHub user afine opened a pull request:

          https://github.com/apache/zookeeper/pull/184

          ZOOKEEPER-236: SSL Support for Atomic Broadcast protocol [DO NOT MERGE]

          This is a work in progress, I wanted to get some feedback from the community while I worked on this. Please do not merge yet. Tests, documentation, and some cleanup still coming.

          This is a first pass at ssl support for the zookeeper quorum. It supports encrypting both leader election and normal operation.

          Rolling upgrades are supported via port unification (`portUnification=true`). This should only be used while performing a rolling upgrade.

          Some open questions:

          Thanks,
          Abe

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/afine/zookeeper ZOOKEEPER-236

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/zookeeper/pull/184.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #184


          commit db33552046dea7e8e850945da4f18d18644d8ee5
          Author: Abraham Fine <afine@apache.org>
          Date: 2017-03-06T23:12:59Z

          ZOOKEEPER-236: SSL Support for Atomic Broadcast protocol


          Show
          githubbot ASF GitHub Bot added a comment - GitHub user afine opened a pull request: https://github.com/apache/zookeeper/pull/184 ZOOKEEPER-236 : SSL Support for Atomic Broadcast protocol [DO NOT MERGE] This is a work in progress, I wanted to get some feedback from the community while I worked on this. Please do not merge yet. Tests, documentation, and some cleanup still coming. This is a first pass at ssl support for the zookeeper quorum. It supports encrypting both leader election and normal operation. Rolling upgrades are supported via port unification (`portUnification=true`). This should only be used while performing a rolling upgrade. Some open questions: Anyone have any ideas for better names for the configuration options (`sslQuorum` and `portUnification` currently). I am using the same configuration that points to the truststore/keystore used for server <-> client ssl. Do they need to be separate? Is port unification the correct approach for rolling upgrades? Is the impact from the use of `BufferedSocket`s during the upgrade acceptable? See: http://stackoverflow.com/questions/25637039/detecting-ssl-connection-and-converting-socket-to-sslsocket http://stackoverflow.com/questions/6559859/is-it-possible-to-change-plain-socket-to-sslsocket server < > client ssl is implemented with netty. I did not feel that rewriting our server < > server logic with netty was necessary given how easy ssl was to implement with standard java `SSLSocket`s. Any arguments to the contrary? Thanks, Abe You can merge this pull request into a Git repository by running: $ git pull https://github.com/afine/zookeeper ZOOKEEPER-236 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/zookeeper/pull/184.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #184 commit db33552046dea7e8e850945da4f18d18644d8ee5 Author: Abraham Fine <afine@apache.org> Date: 2017-03-06T23:12:59Z ZOOKEEPER-236 : SSL Support for Atomic Broadcast protocol
          Hide
          geek101 Powell Molleti added a comment -

          Hi Abe,

          I have attempted to work on providing the same support last year for branch-3.5 and I have created the pull request for what I have so far for you to look at: https://github.com/apache/zookeeper/pull/185
          Readme: https://github.com/geek101/zookeeper/blob/branch-3.4/README_SSL.md

          I definitely did not consider upgrading non SSL cluster to an SSL one for ZAB and FLE, glad that you brought that up.

          I am attempting to combine these changes and Netty changes I have for Trunk. You can find them here:
          https://github.com/geek101/zookeeper/pull/6

          Let me know what you think.
          thanks
          Powell.

          Show
          geek101 Powell Molleti added a comment - Hi Abe, I have attempted to work on providing the same support last year for branch-3.5 and I have created the pull request for what I have so far for you to look at: https://github.com/apache/zookeeper/pull/185 Readme: https://github.com/geek101/zookeeper/blob/branch-3.4/README_SSL.md I definitely did not consider upgrading non SSL cluster to an SSL one for ZAB and FLE, glad that you brought that up. I am attempting to combine these changes and Netty changes I have for Trunk. You can find them here: https://github.com/geek101/zookeeper/pull/6 Let me know what you think. thanks Powell.
          Hide
          abrahamfine Abraham Fine added a comment -

          Hi Powell Molleti-

          Thanks for letting me know about the work that you have done. I apologize if I have missed it somewhere else in JIRA. Hopefully we can combine efforts here.

          I wanted to discuss about the way that certificates are being handled in your patch, which I think is a fundamental difference in our two approaches (which I think is based on different operational assumptions).

          Your patch, and please correct me if I am wrong, appears to use self signed certs on each node and a fingerprint (passed through the configuration system) as a mechanism of verification. This makes zookeeper self contained and easy to manage.

          My patch assumes certificates are likely not self signed and some public key (or possibly keys) are available in the trust store that would be able to authenticate all zk servers. I think this has the advantage of making it much more difficult for unauthorized servers to join a quorum as they would need to have access to the CA that was used to generate the keys in the truststore. In addition, I needed to make minimal changes to the config system.

          What do you think is the best path forward?

          Abe

          Show
          abrahamfine Abraham Fine added a comment - Hi Powell Molleti - Thanks for letting me know about the work that you have done. I apologize if I have missed it somewhere else in JIRA. Hopefully we can combine efforts here. I wanted to discuss about the way that certificates are being handled in your patch, which I think is a fundamental difference in our two approaches (which I think is based on different operational assumptions). Your patch, and please correct me if I am wrong, appears to use self signed certs on each node and a fingerprint (passed through the configuration system) as a mechanism of verification. This makes zookeeper self contained and easy to manage. My patch assumes certificates are likely not self signed and some public key (or possibly keys) are available in the trust store that would be able to authenticate all zk servers. I think this has the advantage of making it much more difficult for unauthorized servers to join a quorum as they would need to have access to the CA that was used to generate the keys in the truststore. In addition, I needed to make minimal changes to the config system. What do you think is the best path forward? Abe
          Hide
          geek101 Powell Molleti added a comment -

          Hi Abe,

          I am more then happy to combine efforts and collaborate. I think the significant amount of work is testing which is what I am attempting to get it right in the Trunk patch which also has Netty support when compared to work on branch-3.5 (this patch).

          This patch has support for configuration of for Truststore. Hence if the admin provides a cert chain that contains the CA(s) or for that matter if the Truststore contains all the self signed certs it should also work. I did not do extensive testing for this in this branch (I did this for branch-3.4, hence I think it should work).

          Let me point you to the code and let me know if this can work or is what you are expecting:

          createSSLContext() calls always chain Trustmanagers via X509ChainedTrustManager class with Truststore manager first i.e ZKX509TrustManager class and dynamic trust verification via ZKPeerX509TrustManager class.

          Client side SSL socket creation:
          https://github.com/apache/zookeeper/pull/185/files#diff-74d86ae5e83698ad62aada32dcbe615eR53

          Quorum server side SSL socket creation:
          https://github.com/apache/zookeeper/pull/185/files#diff-eb0052bc7ed160b8dee226f2ed1bdad2R51

          I do think we need consensus here if there is need or want to support dynamic reconfiguration with SSL in a fault-tolerant way. By which I mean, if I understand correctly, both the operations of managing the certs (add/remove of certs) and reconfig() API to change members of a quorum have to be fault-tolerant. reconfig() is atomic hence no issue there. But then that leaves managing the certs for the Quorum which is a non trivial problem and there exists a solution I think that does need support in the guts of ZK. But I think we can simplify the orchestration of Zookeeper with SSL if it is SSL aware.

          CA seems like the best way and simple way to go till such a time arrises when the CA has to be changed. Also prevalent security considerations recommend revoking the cert of the Quorum member that is removed. Again how do we accomplish this in a fault-tolerant way. Hence there is some work still left to do in CA case.

          If we do not consider fault-tolerance for SSL config management or take a stance that ZK does not have to solve this for the admin then just providing a Truststore would suffice.

          Also regarding the path forward do you think we should aim for a Trunk patch or patch to 3.5?. Also should we split the effort two phases, first without Netty and then perhaps bring in Netty etc.

          Let me know if I got something wrong and what do you think about bringing in Netty support for Quorum communication.
          Cheers
          Powell.

          Show
          geek101 Powell Molleti added a comment - Hi Abe, I am more then happy to combine efforts and collaborate. I think the significant amount of work is testing which is what I am attempting to get it right in the Trunk patch which also has Netty support when compared to work on branch-3.5 (this patch). This patch has support for configuration of for Truststore. Hence if the admin provides a cert chain that contains the CA(s) or for that matter if the Truststore contains all the self signed certs it should also work. I did not do extensive testing for this in this branch (I did this for branch-3.4, hence I think it should work). Let me point you to the code and let me know if this can work or is what you are expecting: createSSLContext() calls always chain Trustmanagers via X509ChainedTrustManager class with Truststore manager first i.e ZKX509TrustManager class and dynamic trust verification via ZKPeerX509TrustManager class. Client side SSL socket creation: https://github.com/apache/zookeeper/pull/185/files#diff-74d86ae5e83698ad62aada32dcbe615eR53 Quorum server side SSL socket creation: https://github.com/apache/zookeeper/pull/185/files#diff-eb0052bc7ed160b8dee226f2ed1bdad2R51 I do think we need consensus here if there is need or want to support dynamic reconfiguration with SSL in a fault-tolerant way. By which I mean, if I understand correctly, both the operations of managing the certs (add/remove of certs) and reconfig() API to change members of a quorum have to be fault-tolerant. reconfig() is atomic hence no issue there. But then that leaves managing the certs for the Quorum which is a non trivial problem and there exists a solution I think that does need support in the guts of ZK. But I think we can simplify the orchestration of Zookeeper with SSL if it is SSL aware. CA seems like the best way and simple way to go till such a time arrises when the CA has to be changed. Also prevalent security considerations recommend revoking the cert of the Quorum member that is removed. Again how do we accomplish this in a fault-tolerant way. Hence there is some work still left to do in CA case. If we do not consider fault-tolerance for SSL config management or take a stance that ZK does not have to solve this for the admin then just providing a Truststore would suffice. Also regarding the path forward do you think we should aim for a Trunk patch or patch to 3.5?. Also should we split the effort two phases, first without Netty and then perhaps bring in Netty etc. Let me know if I got something wrong and what do you think about bringing in Netty support for Quorum communication. Cheers Powell.
          Hide
          abrahamfine Abraham Fine added a comment -

          Hi Powell Molleti-

          if I understand correctly, both the operations of managing the certs (add/remove of certs) and reconfig() API to change members of a quorum have to be fault-tolerant.

          Would you mind clarifying what you mean by "fault-tolerant" here? Can you give an example of how a fault would break my patch?

          CA seems like the best way and simple way to go till such a time arrises when the CA has to be changed.

          You are correct in that there is certainly additional complexity involved here. But these are "solved" problems that I don't feel we need to write a custom solution for.

          Also prevalent security considerations recommend revoking the cert of the Quorum member that is removed. Again how do we accomplish this in a fault-tolerant way. Hence there is some work still left to do in CA case.

          Good point. I forgot to include a note on this with my patch. I was thinking supporting OCSP (or even CRL's) would be a fine solution to that. What do you think?

          Also regarding the path forward do you think we should aim for a Trunk patch or patch to 3.5?

          I think we should aim for both.

          Let me know if I got something wrong and what do you think about bringing in Netty support for Quorum communication.

          What are your motivations for bringing in Netty?

          Thanks,
          Abe

          Show
          abrahamfine Abraham Fine added a comment - Hi Powell Molleti - if I understand correctly, both the operations of managing the certs (add/remove of certs) and reconfig() API to change members of a quorum have to be fault-tolerant. Would you mind clarifying what you mean by "fault-tolerant" here? Can you give an example of how a fault would break my patch? CA seems like the best way and simple way to go till such a time arrises when the CA has to be changed. You are correct in that there is certainly additional complexity involved here. But these are "solved" problems that I don't feel we need to write a custom solution for. Also prevalent security considerations recommend revoking the cert of the Quorum member that is removed. Again how do we accomplish this in a fault-tolerant way. Hence there is some work still left to do in CA case. Good point. I forgot to include a note on this with my patch. I was thinking supporting OCSP (or even CRL's) would be a fine solution to that. What do you think? Also regarding the path forward do you think we should aim for a Trunk patch or patch to 3.5? I think we should aim for both. Let me know if I got something wrong and what do you think about bringing in Netty support for Quorum communication. What are your motivations for bringing in Netty? Thanks, Abe
          Hide
          hanm Michael Han added a comment -

          Thanks Abe for driving this. I have some comments for your questions in pull request.

          I am using the same configuration that points to the truststore/keystore used for server <-> client ssl. Do they need to be separate?

          Separate configuration option provides better flexibility and is also consistent with SASL / Kerberos configurations for client-server and server-server. For example we have separate server login context for server credential in client-server and server-server cases (ZOOKEEPER-1045), the keytab might be the same but the configuration options are separate.

          Is port unification the correct approach for rolling upgrades?

          I think it might be helpful to do some scoping and decide if we want to support rolling upgrade - or supporting rolling upgrade in first release of this patch. When it comes to security usually customers are OK to do a cold restart. Not saying that we should not support rolling upgrade someday but you might also want consider the amount of work involved to support it (implementation, how to properly testing, etc.) and it might be easier if we do this in phases - unless it is trivial to implement and test rolling upgrade (I haven't looked this in much details.) or folks feel it is absolutely required to get rolling upgrade capability before shipping this feature.

          server logic with netty was necessary given how easy ssl was to implement with standard java `SSLSocket`s. Any arguments to the contrary?

          Today we don't use Netty for server-server chat so it seems no immediate needs to rely on Netty for this work. Though, issues like ZOOKEEPER-900 and ZOOKEEPER-901 might push towards the direction of using Netty so the FLE and server chats are none blocking, plus we have TLS on Netty for client-server secured communication, so for consistency we could choose to implement TLS on Netty for server-server as well. I am interesting to hear what others think about this.

          Show
          hanm Michael Han added a comment - Thanks Abe for driving this. I have some comments for your questions in pull request. I am using the same configuration that points to the truststore/keystore used for server <-> client ssl. Do they need to be separate? Separate configuration option provides better flexibility and is also consistent with SASL / Kerberos configurations for client-server and server-server. For example we have separate server login context for server credential in client-server and server-server cases ( ZOOKEEPER-1045 ), the keytab might be the same but the configuration options are separate. Is port unification the correct approach for rolling upgrades? I think it might be helpful to do some scoping and decide if we want to support rolling upgrade - or supporting rolling upgrade in first release of this patch. When it comes to security usually customers are OK to do a cold restart. Not saying that we should not support rolling upgrade someday but you might also want consider the amount of work involved to support it (implementation, how to properly testing, etc.) and it might be easier if we do this in phases - unless it is trivial to implement and test rolling upgrade (I haven't looked this in much details.) or folks feel it is absolutely required to get rolling upgrade capability before shipping this feature. server logic with netty was necessary given how easy ssl was to implement with standard java `SSLSocket`s. Any arguments to the contrary? Today we don't use Netty for server-server chat so it seems no immediate needs to rely on Netty for this work. Though, issues like ZOOKEEPER-900 and ZOOKEEPER-901 might push towards the direction of using Netty so the FLE and server chats are none blocking, plus we have TLS on Netty for client-server secured communication, so for consistency we could choose to implement TLS on Netty for server-server as well. I am interesting to hear what others think about this.
          Hide
          abrahamfine Abraham Fine added a comment -

          Michael Han-

          Separate configuration option provides better flexibility and is also consistent with SASL / Kerberos configurations for client-server and server-server.

          That makes sense to me. I'll update the patch. I'm also taking suggestions for what such an option should be named.

          unless it is trivial to implement and test rolling upgrade

          It was pretty trivial to implement and I imagine testing should not be too difficult either.

          Today we don't use Netty for server-server chat so it seems no immediate needs to rely on Netty for this work.

          This is how I feel as well. I'm sure we can pretty quickly come up with a list of deficiencies in the current design but I don't think there is anything severe enough at this moment to give us cause to rewrite right now.

          Show
          abrahamfine Abraham Fine added a comment - Michael Han - Separate configuration option provides better flexibility and is also consistent with SASL / Kerberos configurations for client-server and server-server. That makes sense to me. I'll update the patch. I'm also taking suggestions for what such an option should be named. unless it is trivial to implement and test rolling upgrade It was pretty trivial to implement and I imagine testing should not be too difficult either. Today we don't use Netty for server-server chat so it seems no immediate needs to rely on Netty for this work. This is how I feel as well. I'm sure we can pretty quickly come up with a list of deficiencies in the current design but I don't think there is anything severe enough at this moment to give us cause to rewrite right now.
          Hide
          geek101 Powell Molleti added a comment -

          Hi Abe,

          if I understand correctly, both the operations of managing the certs (add/remove of certs) and reconfig() API to change members of quorum have to be fault-tolerant.

          Would you mind clarifying what you mean by "fault-tolerant" here? Can you give an example of how a fault would break my patch?

          Either it be CA(s) with CRL's or self signed list of certs what I am pointing to is that the way an admin manages this information should also support fault-tolerance. Not only it should be fault-tolerant but also should work nicely/easily with most probable next thing an admin would do i.e issue a reconfig() command, it could be an add/removing/modify quorum peer(s) configuration.

          It will be nice to provide a way to manage reconfiguration of quorum peers when SSL is enabled with the same weak assumptions that are necessary for reconfig() to work when SSL is not enabled.

          Providing a Truststore and asking admins to manage them on their own for the entire quorum will mean that this operation is not fault-tolerant i.e we are expecting them to first set all members of the quorum to a consistent SSL config state and then issue reconfig() command.

          It would seem that a set of quorum IP addresses dictate what the current configuration of connectivity is allowed and this has to be managed properly to ensure safety and extending this idea the set of SSL certs(be self signed or CA signed) also dictate the current configuration of connectivity. Hence if one considers the Pair<IP set, SSL set> as config and provide that to reconfig() API it should work. That is what is done for self signed certs in my patch and we should/could provide similar functionality for CA cert case.

          Hence there is no new problem to solve here, we piggy back on reconfig() API and provide a single API to manage this, we get fault-tolerance for this configuration and safety that reconfig() provides for free.

          Please consider the above comments and let me know what you think, I was not saying that your patch is breaking fault-tolerance instead what my comments pointed to is that we should provide fault-tolerance and safety for reconfiguration of SSL configuration be it self signed or CA based. There are use cases where CA cert based cluster deployment might not be possible hence it would be nice to see Zookeeper provide both possibilities but also maintain the ease of use and provide same guarantees that reconfig() does.

          This is how I feel as well. I'm sure we can pretty quickly come up with a list of deficiencies in the current design but I don't think there is anything severe enough at this moment to give us cause to rewrite right now.

          There are bugs like ZOOKEEPER-2164, ZOOKEEPER-1678 to consider along with ZOOKEEPER-901. Netty or NIO will work but considering SSL will mean Netty will make it easier to implement.

          Doing this in phases is better, getting SSL socket to work with reconfig() support is great first step. The Netty patch I have also gets this support only for FLE and not ZAB. I found it not so easier to abstract away the calls to socket(s) from ZAB code.

          Cheers
          Powell.

          Show
          geek101 Powell Molleti added a comment - Hi Abe, if I understand correctly, both the operations of managing the certs (add/remove of certs) and reconfig() API to change members of quorum have to be fault-tolerant. Would you mind clarifying what you mean by "fault-tolerant" here? Can you give an example of how a fault would break my patch? Either it be CA(s) with CRL's or self signed list of certs what I am pointing to is that the way an admin manages this information should also support fault-tolerance. Not only it should be fault-tolerant but also should work nicely/easily with most probable next thing an admin would do i.e issue a reconfig() command, it could be an add/removing/modify quorum peer(s) configuration. It will be nice to provide a way to manage reconfiguration of quorum peers when SSL is enabled with the same weak assumptions that are necessary for reconfig() to work when SSL is not enabled. Providing a Truststore and asking admins to manage them on their own for the entire quorum will mean that this operation is not fault-tolerant i.e we are expecting them to first set all members of the quorum to a consistent SSL config state and then issue reconfig() command. It would seem that a set of quorum IP addresses dictate what the current configuration of connectivity is allowed and this has to be managed properly to ensure safety and extending this idea the set of SSL certs(be self signed or CA signed) also dictate the current configuration of connectivity. Hence if one considers the Pair<IP set, SSL set> as config and provide that to reconfig() API it should work. That is what is done for self signed certs in my patch and we should/could provide similar functionality for CA cert case. Hence there is no new problem to solve here, we piggy back on reconfig() API and provide a single API to manage this, we get fault-tolerance for this configuration and safety that reconfig() provides for free. Please consider the above comments and let me know what you think, I was not saying that your patch is breaking fault-tolerance instead what my comments pointed to is that we should provide fault-tolerance and safety for reconfiguration of SSL configuration be it self signed or CA based. There are use cases where CA cert based cluster deployment might not be possible hence it would be nice to see Zookeeper provide both possibilities but also maintain the ease of use and provide same guarantees that reconfig() does. This is how I feel as well. I'm sure we can pretty quickly come up with a list of deficiencies in the current design but I don't think there is anything severe enough at this moment to give us cause to rewrite right now. There are bugs like ZOOKEEPER-2164 , ZOOKEEPER-1678 to consider along with ZOOKEEPER-901 . Netty or NIO will work but considering SSL will mean Netty will make it easier to implement. Doing this in phases is better, getting SSL socket to work with reconfig() support is great first step. The Netty patch I have also gets this support only for FLE and not ZAB. I found it not so easier to abstract away the calls to socket(s) from ZAB code. Cheers Powell.
          Hide
          phunt Patrick Hunt added a comment -

          In general I don't think we want to tie/require security through reconfig. Seems like a potentially useful feature, but most folks are pretty familiar with typically java/ssl configuration. Additionally what was originally proposed here is very consistent across the Hadoop 'ecosystem'. That said perhaps a separate jira for the reconfig feature, for those interested in that approach (although I don't know if we need two)

          Keep in mind also that some folks might want security w/o enabling reconfig - reconfig is off by default iirc in 3.5. Those that are not using reconfig might not want to enable it just to enable security.

          Show
          phunt Patrick Hunt added a comment - In general I don't think we want to tie/require security through reconfig. Seems like a potentially useful feature, but most folks are pretty familiar with typically java/ssl configuration. Additionally what was originally proposed here is very consistent across the Hadoop 'ecosystem'. That said perhaps a separate jira for the reconfig feature, for those interested in that approach (although I don't know if we need two) Keep in mind also that some folks might want security w/o enabling reconfig - reconfig is off by default iirc in 3.5. Those that are not using reconfig might not want to enable it just to enable security.
          Hide
          abrahamfine Abraham Fine added a comment -

          Powell Molleti-

          but also should work nicely/easily with most probable next thing an admin would do i.e issue a reconfig() command

          I agree that doing it through reconfig() does provide a more integrated user experience. But I am not sure that it is what an "admin" would expect as the rest of the hadoop ecosystem handles it the other way.

          Providing a Truststore and asking admins to manage them on their own for the entire quorum will mean that this operation is not fault-tolerant i.e we are expecting them to first set all members of the quorum to a consistent SSL config state and then issue reconfig() command.

          I'm not sure that requiring proper ssl configuration for nodes before they join a cluster is unreasonable to expect of an admin. I think this is a decision better left to the community.

          There are bugs like ZOOKEEPER-2164, ZOOKEEPER-1678 to consider along with ZOOKEEPER-901. Netty or NIO will work but considering SSL will mean Netty will make it easier to implement.

          I agree that there are some reasons to discuss using netty for server<->server but I think it is outside the scope of this JIRA.

          Doing this in phases is better,

          I agree. What do you think about Patrick Hunt's recommendation? Implement SSL in this JIRA in the old fashioned way (we could even backport to 3.4) here and open another JIRA for reconfig() support.

          Show
          abrahamfine Abraham Fine added a comment - Powell Molleti - but also should work nicely/easily with most probable next thing an admin would do i.e issue a reconfig() command I agree that doing it through reconfig() does provide a more integrated user experience. But I am not sure that it is what an "admin" would expect as the rest of the hadoop ecosystem handles it the other way. Providing a Truststore and asking admins to manage them on their own for the entire quorum will mean that this operation is not fault-tolerant i.e we are expecting them to first set all members of the quorum to a consistent SSL config state and then issue reconfig() command. I'm not sure that requiring proper ssl configuration for nodes before they join a cluster is unreasonable to expect of an admin. I think this is a decision better left to the community. There are bugs like ZOOKEEPER-2164 , ZOOKEEPER-1678 to consider along with ZOOKEEPER-901 . Netty or NIO will work but considering SSL will mean Netty will make it easier to implement. I agree that there are some reasons to discuss using netty for server<->server but I think it is outside the scope of this JIRA. Doing this in phases is better, I agree. What do you think about Patrick Hunt 's recommendation? Implement SSL in this JIRA in the old fashioned way (we could even backport to 3.4) here and open another JIRA for reconfig() support.
          Hide
          githubbot ASF GitHub Bot added a comment -

          GitHub user geek101 opened a pull request:

          https://github.com/apache/zookeeper/pull/188

          ZOOKEEPER-236: SSL support for ZAB and FLE [DO NOT MERGE]

          1. Zookeeper FLE & ZAB SSL

          Provides SSL for Leader Election and ZAB i.e ports 3888 and 2888.

          Goal of this patch is to build on top of SSL changes for [branch-3.4](https://github.com/geek101/zookeeper/blob/branch-3.4/README_SSL.md)

              1. Some details
                  1. Building

          ```
          git checkout branch-3.5-ssl-review5
          ant jar
          ```

          Args to enable SSL:
          ```
          -Dquorum.ssl.enabled="true"
          -Dquorum.ssl.keyStore.location="<Private key and signed cert, key store file>"
          -Dquorum.ssl.keyStore.password="<Password for the above>"
          -Dquorum.ssl.trustStore.location="<Root CA cert, key store file>"
          -Dquorum.ssl.trustStore.password="<Password for the above>"
          ```

          Example run command:
          ```
          java -Dquorum.ssl.enabled="true" -Dquorum.ssl.keyStore.location="node1.ks"
          -Dquorum.ssl.keyStore.password="CertPassword1" -Dquorum.ssl.trustStore.location="truststore.ks" -Dquorum.ssl.trustStore.password="StorePass" -cp zookeeper.jar:lib/* org.apache.zookeeper.server.quorum.QuorumPeerMain zoo1.cfg
          ```

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/geek101/zookeeper branch-3.5-ssl-review6

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/zookeeper/pull/188.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #188


          commit 2fe3e971839dfa9d3a5040990e50b5cb657502e3
          Author: Powell Molleti <powellm79@yahoo.com>
          Date: 2016-08-14T02:27:03Z

          SSL support for ZAB and FLE.

          Pass ZKConfig() around to make things work.
          Seperate SSL config for client and quorum
          Quorum servers will have different properties for
          SSL config, example:
          -Dquorum.ssl.enabled=true
          -Dquorum.ssl.keyStore.location=/root/zookeeper/ssl/testKeyStore.jks
          -Dquorum.ssl.keyStore.password=testpass
          -Dquorum.ssl.trustStore.location=/root/zookeeper/ssl/testTrustStore.jks
          -Dquorum.ssl.trustStore.password=testpass

          Enable each zookeeper node will be able to also allowed to be authenticated
          as a client using dynamic reconfig.

          Basic algorithm for various SSL connections.

          Client:
          1. Use the given truststore if available

          Server:
          1. Use the given truststore if available

          Quorum:
          1. Use the given truststore if available

          Add README to help explain what this code tries to accomplish.


          Show
          githubbot ASF GitHub Bot added a comment - GitHub user geek101 opened a pull request: https://github.com/apache/zookeeper/pull/188 ZOOKEEPER-236 : SSL support for ZAB and FLE [DO NOT MERGE] Zookeeper FLE & ZAB SSL Provides SSL for Leader Election and ZAB i.e ports 3888 and 2888. Goal of this patch is to build on top of SSL changes for [branch-3.4] ( https://github.com/geek101/zookeeper/blob/branch-3.4/README_SSL.md ) Some details [X509Util] ( https://github.com/geek101/zookeeper/blob/branch-3.5-ssl-review5/src/java/main/org/apache/zookeeper/common/X509Util.java ) becomes first class citizen and [QuorumX509Util] ( https://github.com/geek101/zookeeper/blob/branch-3.5-ssl-review5/src/java/main/org/apache/zookeeper/server/quorum/util/QuorumX509Util.java ) and [ServerX509Util] ( https://github.com/geek101/zookeeper/blob/branch-3.5-ssl-review5/src/java/main/org/apache/zookeeper/server/util/ServerX509Util.java ) extend it. [ZKConfig] ( https://github.com/geek101/zookeeper/blob/branch-3.5-ssl-review5/src/java/main/org/apache/zookeeper/common/ZKConfig.java ) becomes an abstract class and [QuorumSslConfig] ( https://github.com/geek101/zookeeper/blob/branch-3.5-ssl-review5/src/java/main/org/apache/zookeeper/server/quorum/QuorumPeerConfig.java ) and [ZookeeperServerConfig] ( https://github.com/geek101/zookeeper/blob/branch-3.5-ssl-review5/src/java/main/org/apache/zookeeper/server/ZookeeperServerConfig.java ) implement it. Support for a Quorum peer to also be authenticated as a [ZK client] ( https://github.com/geek101/zookeeper/blob/branch-3.5-ssl-review5/src/java/main/org/apache/zookeeper/server/util/ServerX509Util.java#L62 ) (this will be removed if it breaks security and or is not needed) Building ``` git checkout branch-3.5-ssl-review5 ant jar ``` Args to enable SSL: ``` -Dquorum.ssl.enabled="true" -Dquorum.ssl.keyStore.location="<Private key and signed cert, key store file>" -Dquorum.ssl.keyStore.password="<Password for the above>" -Dquorum.ssl.trustStore.location="<Root CA cert, key store file>" -Dquorum.ssl.trustStore.password="<Password for the above>" ``` Example run command: ``` java -Dquorum.ssl.enabled="true" -Dquorum.ssl.keyStore.location="node1.ks" -Dquorum.ssl.keyStore.password="CertPassword1" -Dquorum.ssl.trustStore.location="truststore.ks" -Dquorum.ssl.trustStore.password="StorePass" -cp zookeeper.jar:lib/* org.apache.zookeeper.server.quorum.QuorumPeerMain zoo1.cfg ``` You can merge this pull request into a Git repository by running: $ git pull https://github.com/geek101/zookeeper branch-3.5-ssl-review6 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/zookeeper/pull/188.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #188 commit 2fe3e971839dfa9d3a5040990e50b5cb657502e3 Author: Powell Molleti <powellm79@yahoo.com> Date: 2016-08-14T02:27:03Z SSL support for ZAB and FLE. Pass ZKConfig() around to make things work. Seperate SSL config for client and quorum Quorum servers will have different properties for SSL config, example: -Dquorum.ssl.enabled=true -Dquorum.ssl.keyStore.location=/root/zookeeper/ssl/testKeyStore.jks -Dquorum.ssl.keyStore.password=testpass -Dquorum.ssl.trustStore.location=/root/zookeeper/ssl/testTrustStore.jks -Dquorum.ssl.trustStore.password=testpass Enable each zookeeper node will be able to also allowed to be authenticated as a client using dynamic reconfig. Basic algorithm for various SSL connections. Client: 1. Use the given truststore if available Server: 1. Use the given truststore if available Quorum: 1. Use the given truststore if available Add README to help explain what this code tries to accomplish.
          Hide
          geek101 Powell Molleti added a comment -

          Sounds good, I have created another pull request that does not have the changes for dynamic propagation of certificate fingerprints of the new quorum / new peer etc.

          https://github.com/apache/zookeeper/pull/188

          Porting this backward to branch-3.4 can be more work due to ZKConfig work done in branch-3.4. I have worked on branch-3.4 first for SSL patch and branch-3.5 work is based on this. Will try to rebase that and see how that goes.
          https://github.com/geek101/zookeeper/blob/branch-3.4/README_SSL.md

          thanks
          Powell.

          Show
          geek101 Powell Molleti added a comment - Sounds good, I have created another pull request that does not have the changes for dynamic propagation of certificate fingerprints of the new quorum / new peer etc. https://github.com/apache/zookeeper/pull/188 Porting this backward to branch-3.4 can be more work due to ZKConfig work done in branch-3.4. I have worked on branch-3.4 first for SSL patch and branch-3.5 work is based on this. Will try to rebase that and see how that goes. https://github.com/geek101/zookeeper/blob/branch-3.4/README_SSL.md thanks Powell.
          Hide
          abrahamfine Abraham Fine added a comment - - edited

          Hi Powell Molleti-

          I have been looking at your latest patch and it looks great. Just in case I am missing something, are there any major functionality differences between my original patch and pull request 188 (besides the documentation)?

          I'm also a little concerned with the use of the `ServerCfg` class. I understand how it wraps the hostString and InetSocketAddress but it creates a lot of plumbing changes that I do not think are necessary for the funcitonality we are trying to implement and complicate the diff.

          Do you think it would be easier to move forward with the patch I uploaded originally and put your reconfig changes on top of that (as I think the configuration refactoring you did is more appropriate there anyway)?

          Thanks,
          Abe

          Show
          abrahamfine Abraham Fine added a comment - - edited Hi Powell Molleti - I have been looking at your latest patch and it looks great. Just in case I am missing something, are there any major functionality differences between my original patch and pull request 188 (besides the documentation)? I'm also a little concerned with the use of the `ServerCfg` class. I understand how it wraps the hostString and InetSocketAddress but it creates a lot of plumbing changes that I do not think are necessary for the funcitonality we are trying to implement and complicate the diff. Do you think it would be easier to move forward with the patch I uploaded originally and put your reconfig changes on top of that (as I think the configuration refactoring you did is more appropriate there anyway)? Thanks, Abe
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. GitHub Pull Request Build

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 2 new Findbugs (version 3.0.1) warnings.

          -1 release audit. The applied patch generated 2 release audit warnings (more than the trunk's current 0 warnings).

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/399//testReport/
          Release audit warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/399//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/399//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/399//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall. GitHub Pull Request Build +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 2 new Findbugs (version 3.0.1) warnings. -1 release audit. The applied patch generated 2 release audit warnings (more than the trunk's current 0 warnings). -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/399//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/399//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/399//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/399//console This message is automatically generated.
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. GitHub Pull Request Build

          -1 @author. The patch appears to contain 2 @author tags which the Zookeeper community has agreed to not allow in code contributions.

          +1 tests included. The patch appears to include 27 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 3.0.1) warnings.

          -1 release audit. The applied patch generated 2 release audit warnings (more than the trunk's current 0 warnings).

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/401//testReport/
          Release audit warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/401//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/401//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/401//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall. GitHub Pull Request Build -1 @author. The patch appears to contain 2 @author tags which the Zookeeper community has agreed to not allow in code contributions. +1 tests included. The patch appears to include 27 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 3.0.1) warnings. -1 release audit. The applied patch generated 2 release audit warnings (more than the trunk's current 0 warnings). -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/401//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/401//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/401//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/401//console This message is automatically generated.
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. GitHub Pull Request Build

          -1 @author. The patch appears to contain 2 @author tags which the Zookeeper community has agreed to not allow in code contributions.

          +1 tests included. The patch appears to include 27 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 3.0.1) warnings.

          -1 release audit. The applied patch generated 2 release audit warnings (more than the trunk's current 0 warnings).

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/402//testReport/
          Release audit warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/402//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/402//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/402//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall. GitHub Pull Request Build -1 @author. The patch appears to contain 2 @author tags which the Zookeeper community has agreed to not allow in code contributions. +1 tests included. The patch appears to include 27 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 3.0.1) warnings. -1 release audit. The applied patch generated 2 release audit warnings (more than the trunk's current 0 warnings). -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/402//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/402//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/402//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/402//console This message is automatically generated.
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. GitHub Pull Request Build

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 30 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 3.0.1) warnings.

          -1 release audit. The applied patch generated 2 release audit warnings (more than the trunk's current 0 warnings).

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/403//testReport/
          Release audit warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/403//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/403//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/403//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall. GitHub Pull Request Build +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 30 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 3.0.1) warnings. -1 release audit. The applied patch generated 2 release audit warnings (more than the trunk's current 0 warnings). -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/403//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/403//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/403//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/403//console This message is automatically generated.
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. GitHub Pull Request Build

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 49 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 3.0.1) warnings.

          -1 release audit. The applied patch generated 2 release audit warnings (more than the trunk's current 0 warnings).

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/404//testReport/
          Release audit warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/404//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/404//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/404//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall. GitHub Pull Request Build +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 49 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 3.0.1) warnings. -1 release audit. The applied patch generated 2 release audit warnings (more than the trunk's current 0 warnings). +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/404//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/404//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/404//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/404//console This message is automatically generated.
          Hide
          geek101 Powell Molleti added a comment -

          Hi Abe,

          Thanks for looking at the patch and feedback.

          I'm also a little concerned with the use of the `ServerCfg` class. I understand how it wraps the hostString and InetSocketAddress but it creates a lot of plumbing changes that I do not think are necessary for the functionality we are trying to implement and complicate the diff.

          Noted, I was using that class to band together hostString, InetSocketAddress and Certificate fingerprint. I have removed it and eliminated the impact of it files. See my comments later in the message on the need for hostString.

          Do you think it would be easier to move forward with the patch I uploaded originally and put your reconfig changes on top of that (as I think the configuration refactoring you did is more appropriate there anyway)?

          I have no issues if you want to commit your changes. Here are few of the things I think are needed.

          • Need for separate SSL config for client to server and quorum peer to quorum peer. Changes to X509Util and ZKConfig are for this.
          • Need for Hostname verification and CRL lists at-least for quorum peer to quorum peer SSL would mean that we will need X509ExtendedTrustManager hence the reason for ZKX509TrustManager class and its helpers.
          • Hostname verification will need hostname to be supplied at SSLEngine creation time if reverse DNS lookup is not desired. I do not have this either.

          Some more risks from my patch:

          • Modifying the code paths for client to server ssl. I have ensured that these tests work. More testing might be needed.
          • Risk of using X509ExtendedTrustManager, did we miss any checks that are done by the default Trustmanager that was used before.
          • Lack of upgrade path, I like the idea of new nodes optionally supporting sockets that support both plain and ssl connections.

          To make my patch more complete I need to continue to write more unit tests specially testing quorum bootstrap with and without SSL and tests for ZKX509TrustManager, have already build tools for that.I have ported some from Netty Trunk work and I could port more to get those going.

          thanks
          Powell.

          Show
          geek101 Powell Molleti added a comment - Hi Abe, Thanks for looking at the patch and feedback. I'm also a little concerned with the use of the `ServerCfg` class. I understand how it wraps the hostString and InetSocketAddress but it creates a lot of plumbing changes that I do not think are necessary for the functionality we are trying to implement and complicate the diff. Noted, I was using that class to band together hostString, InetSocketAddress and Certificate fingerprint. I have removed it and eliminated the impact of it files. See my comments later in the message on the need for hostString. Do you think it would be easier to move forward with the patch I uploaded originally and put your reconfig changes on top of that (as I think the configuration refactoring you did is more appropriate there anyway)? I have no issues if you want to commit your changes. Here are few of the things I think are needed. Need for separate SSL config for client to server and quorum peer to quorum peer. Changes to X509Util and ZKConfig are for this. Need for Hostname verification and CRL lists at-least for quorum peer to quorum peer SSL would mean that we will need X509ExtendedTrustManager hence the reason for ZKX509TrustManager class and its helpers. Hostname verification will need hostname to be supplied at SSLEngine creation time if reverse DNS lookup is not desired. I do not have this either. Some more risks from my patch: Modifying the code paths for client to server ssl. I have ensured that these tests work. More testing might be needed. Risk of using X509ExtendedTrustManager, did we miss any checks that are done by the default Trustmanager that was used before. Lack of upgrade path, I like the idea of new nodes optionally supporting sockets that support both plain and ssl connections. To make my patch more complete I need to continue to write more unit tests specially testing quorum bootstrap with and without SSL and tests for ZKX509TrustManager, have already build tools for that.I have ported some from Netty Trunk work and I could port more to get those going. thanks Powell.
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. GitHub Pull Request Build

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 54 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 3.0.1) warnings.

          -1 release audit. The applied patch generated 2 release audit warnings (more than the trunk's current 0 warnings).

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/406//testReport/
          Release audit warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/406//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/406//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/406//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall. GitHub Pull Request Build +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 54 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 3.0.1) warnings. -1 release audit. The applied patch generated 2 release audit warnings (more than the trunk's current 0 warnings). -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/406//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/406//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/406//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/406//console This message is automatically generated.
          Hide
          abrahamfine Abraham Fine added a comment -

          Hi Powell Molleti-

          Need for separate SSL config for client to server and quorum peer to quorum peer. Changes to X509Util and ZKConfig are for this.

          Totally agree.

          Need for Hostname verification and CRL lists at-least for quorum peer to quorum peer SSL would mean that we will need X509ExtendedTrustManager hence the reason for ZKX509TrustManager class and its helpers.

          I'm not sure I agree with this one. I uploaded a new commit on my pr showing that hostname verification can be implemented outside of the trust manager (since hostname verification is not part of ssl). I think that is easier this way because we often do not know which zookeeper sid is connecting until some information is transferred. In addition, I thought CRL is implemented completely outside of application logic (see: http://stackoverflow.com/questions/8506661/check-x509-certificate-revocation-status-in-spring-security-before-authenticatin/8507905#8507905), I could be very wrong though. Still need to test this.

          Hostname verification will need hostname to be supplied at SSLEngine creation time if reverse DNS lookup is not desired. I do not have this either.

          For client <-> server I think this is true. We could move this into another patch as this is outside that would be outside the scope of the JIRA.

          Please take a look at my latest changes and let me know what you think. I still have not implemented separating the client and server configurations. That should be coming soon.

          Thanks,
          Abe

          Show
          abrahamfine Abraham Fine added a comment - Hi Powell Molleti - Need for separate SSL config for client to server and quorum peer to quorum peer. Changes to X509Util and ZKConfig are for this. Totally agree. Need for Hostname verification and CRL lists at-least for quorum peer to quorum peer SSL would mean that we will need X509ExtendedTrustManager hence the reason for ZKX509TrustManager class and its helpers. I'm not sure I agree with this one. I uploaded a new commit on my pr showing that hostname verification can be implemented outside of the trust manager (since hostname verification is not part of ssl). I think that is easier this way because we often do not know which zookeeper sid is connecting until some information is transferred. In addition, I thought CRL is implemented completely outside of application logic (see: http://stackoverflow.com/questions/8506661/check-x509-certificate-revocation-status-in-spring-security-before-authenticatin/8507905#8507905 ), I could be very wrong though. Still need to test this. Hostname verification will need hostname to be supplied at SSLEngine creation time if reverse DNS lookup is not desired. I do not have this either. For client <-> server I think this is true. We could move this into another patch as this is outside that would be outside the scope of the JIRA. Please take a look at my latest changes and let me know what you think. I still have not implemented separating the client and server configurations. That should be coming soon. Thanks, Abe
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. GitHub Pull Request Build

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 54 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 3.0.1) warnings.

          -1 release audit. The applied patch generated 2 release audit warnings (more than the trunk's current 0 warnings).

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/411//testReport/
          Release audit warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/411//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/411//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/411//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall. GitHub Pull Request Build +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 54 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 3.0.1) warnings. -1 release audit. The applied patch generated 2 release audit warnings (more than the trunk's current 0 warnings). -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/411//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/411//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/411//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/411//console This message is automatically generated.
          Hide
          abrahamfine Abraham Fine added a comment -

          Powell Molleti my latest commit includes certificate revocation support. let me know if it behaves the way you would like it to.

          Abe

          Show
          abrahamfine Abraham Fine added a comment - Powell Molleti my latest commit includes certificate revocation support. let me know if it behaves the way you would like it to. Abe
          Hide
          geek101 Powell Molleti added a comment -

          Hi Abe,

          Few comments about the changes:

          One way to go is to minimize changes to Leader.java, Follower.java, Learner.java etc. This will help anyone who wants to abstract away sockets from that code. Let me know what you think.

          Regarding host verification one other way to go is to follow this:
          X509ExtendedTrustManager mentions about where to plugin host verification , it specifically quotes:

           Besides TLS 1.2 support, the X509ExtendedTrustManager class also support algorithm constraints and SSL layer hostname verification. For JSSE providers and trust manager implementations, the X509ExtendedTrustManager class is highly recommended rather than the legacy X509TrustManager interface. 

          It is possible to argue that thought TLS does not perform host verification I do not think it mentions this should be not be done at cert verification time nor should we allow exchange of application bits when the certificate is not what we expected it to be.

          It is possible to minimize risk by extending X509ExtendedTrustManager and calling default TrustManager's API first and then performing extra checks. Something like a composite X509TrustManager. This will also help others add to the list of checks in one place and all of this could be conditional based on options passed.

          I am fine with either OSCP or CRL verification as long as the admin is aware of how this affects the latency of session setup and reliability of the Quorum since they are all perhaps talking to one entity(hopefully not) for this to work.

          Also I noticed the ZOOKEEPER-2184 could be addressed by some of my plumbing changes that pass the configured hostname around along with the resolved ip address. Hence avoiding reverse DNS lookups when hostname is provided at config time etc.

          It is best to combine the PRs into one so we can collaborate and increase the velocity. We should agree on few things, I would like to suggest that we use X509ExtendedTrustManager and also minimize the amount of changes and maintain compatibility for client - server, I would like to keep the BC helper code in unit tests and the unit tests. I lean towards just changing Socket() calls to something else but not adding any more code to Leader.java, Follower.java, Learner.java etc.

          Let me know what you prefer.
          thanks
          Powell.

          Show
          geek101 Powell Molleti added a comment - Hi Abe, Few comments about the changes: One way to go is to minimize changes to Leader.java, Follower.java, Learner.java etc. This will help anyone who wants to abstract away sockets from that code. Let me know what you think. Regarding host verification one other way to go is to follow this: X509ExtendedTrustManager mentions about where to plugin host verification , it specifically quotes: Besides TLS 1.2 support, the X509ExtendedTrustManager class also support algorithm constraints and SSL layer hostname verification. For JSSE providers and trust manager implementations, the X509ExtendedTrustManager class is highly recommended rather than the legacy X509TrustManager interface. It is possible to argue that thought TLS does not perform host verification I do not think it mentions this should be not be done at cert verification time nor should we allow exchange of application bits when the certificate is not what we expected it to be. It is possible to minimize risk by extending X509ExtendedTrustManager and calling default TrustManager's API first and then performing extra checks. Something like a composite X509TrustManager. This will also help others add to the list of checks in one place and all of this could be conditional based on options passed. I am fine with either OSCP or CRL verification as long as the admin is aware of how this affects the latency of session setup and reliability of the Quorum since they are all perhaps talking to one entity(hopefully not) for this to work. Also I noticed the ZOOKEEPER-2184 could be addressed by some of my plumbing changes that pass the configured hostname around along with the resolved ip address. Hence avoiding reverse DNS lookups when hostname is provided at config time etc. It is best to combine the PRs into one so we can collaborate and increase the velocity. We should agree on few things, I would like to suggest that we use X509ExtendedTrustManager and also minimize the amount of changes and maintain compatibility for client - server, I would like to keep the BC helper code in unit tests and the unit tests. I lean towards just changing Socket() calls to something else but not adding any more code to Leader.java, Follower.java, Learner.java etc. Let me know what you prefer. thanks Powell.
          Hide
          abrahamfine Abraham Fine added a comment - - edited

          Powell Molleti-

          One way to go is to minimize changes to Leader.java, Follower.java, Learner.java etc

          I agree. That will make understanding those classes much easier

          I do not think it mentions this should be not be done at cert verification time nor should we allow exchange of application bits when the certificate is not what we expected it to be.

          So the issue is more zookeeper specific I think. Imagine the case where, and I know this is very contrived but I think the principal is valid, we have 3 zk servers all running on the same host with different ports. We have 3 dns records pointing to this machine with different names, say zk1, zk2, and zk3. Each zkX has a certificate with the zkX common name. Our zookeeper configuration identifies these servers by the correct name server.1=zk1... When one of these servers connects to the server socket on the other I do not think it is possible for the "server" to tell which zkX connected until the sid is read from the socket unless we want to start doing reverse dns lookups. I would rather just use the hostname we already know about. That is why I think we cannot do hostname verification in the trust manager. Or you could argue that we only need hostname verification for the "client", but I would rather have it both ways. Please let me know if I am missing something.

          I am fine with either OSCP or CRL verification as long as the admin is aware of how this affects the latency of session setup and reliability of the Quorum since they are all perhaps talking to one entity(hopefully not) for this to work.

          I agree. We need to document this drawbacks clearly.

          Also I noticed the ZOOKEEPER-2184 could be addressed by some of my plumbing changes that pass the configured hostname around along with the resolved ip address.

          There is an active PR for this so I don't think we need to address it in this patch.

          It is best to combine the PRs into one so we can collaborate and increase the velocity.

          I agree.

          I would like to suggest that we use X509ExtendedTrustManager

          If the concerns I listed above with hostname verification can be addressed I would be open to this.

          I would like to keep the BC helper code

          What do you mean by bc helper code?

          I lean towards just changing Socket() calls to something else but not adding any more code to Leader.java, Follower.java, Learner.java etc.

          Sounds like a good idea. Once we get the implementation details flushed out there will be plenty of refactoring to clean this stuff up.

          Thanks,
          Abe

          Show
          abrahamfine Abraham Fine added a comment - - edited Powell Molleti - One way to go is to minimize changes to Leader.java, Follower.java, Learner.java etc I agree. That will make understanding those classes much easier I do not think it mentions this should be not be done at cert verification time nor should we allow exchange of application bits when the certificate is not what we expected it to be. So the issue is more zookeeper specific I think. Imagine the case where, and I know this is very contrived but I think the principal is valid, we have 3 zk servers all running on the same host with different ports. We have 3 dns records pointing to this machine with different names, say zk1, zk2, and zk3. Each zkX has a certificate with the zkX common name. Our zookeeper configuration identifies these servers by the correct name server.1=zk1... When one of these servers connects to the server socket on the other I do not think it is possible for the "server" to tell which zkX connected until the sid is read from the socket unless we want to start doing reverse dns lookups. I would rather just use the hostname we already know about. That is why I think we cannot do hostname verification in the trust manager. Or you could argue that we only need hostname verification for the "client", but I would rather have it both ways. Please let me know if I am missing something. I am fine with either OSCP or CRL verification as long as the admin is aware of how this affects the latency of session setup and reliability of the Quorum since they are all perhaps talking to one entity(hopefully not) for this to work. I agree. We need to document this drawbacks clearly. Also I noticed the ZOOKEEPER-2184 could be addressed by some of my plumbing changes that pass the configured hostname around along with the resolved ip address. There is an active PR for this so I don't think we need to address it in this patch. It is best to combine the PRs into one so we can collaborate and increase the velocity. I agree. I would like to suggest that we use X509ExtendedTrustManager If the concerns I listed above with hostname verification can be addressed I would be open to this. I would like to keep the BC helper code What do you mean by bc helper code? I lean towards just changing Socket() calls to something else but not adding any more code to Leader.java, Follower.java, Learner.java etc. Sounds like a good idea. Once we get the implementation details flushed out there will be plenty of refactoring to clean this stuff up. Thanks, Abe
          Hide
          geek101 Powell Molleti added a comment -

          Hi Abe,

          I do not think it mentions this should be not be done at cert verification time nor should we allow exchange of application bits when the certificate is not what we expected it to be.

          So the issue is more zookeeper specific I think. Imagine the case where, and I know this is very contrived but I think the principal is valid, we have 3 zk servers all running on the same host with different ports. We have 3 dns records pointing to this machine with different names, say zk1, zk2, and zk3. Each zkX has a certificate with the zkX common name. Our zookeeper configuration identifies these servers by the correct name server.1=zk1... When one of these servers connects to the server socket on the other I do not think it is possible for the "server" to tell which zkX connected until the sid is read from the socket unless we want to start doing reverse dns lookups. I would rather just use the hostname we already know about. That is why I think we cannot do hostname verification in the trust manager. Or you could argue that we only need hostname verification for the "client", but I would rather have it both ways. Please let me know if I am missing something.

          If multiple servers have certs with the same subjectAltName entry of type dNSName and that is indeed is how the CA signed them then it should be ok from TLS perspective, since we trust the CA, we trust any server signed by the CA and we match the hostname(i.e we trust DNS lookup or we trust the config provided). We do not have to verify the specific host since it is sufficient to verify that it is one of the valid hosts, this is secure.

          Take a case where if someone can subvert the CA get signed by it for the same domain and subvert DNS then they might as well try few sids starting from zero before ZK lets the server connect.

          Hostname verification does not apply to self-signed certs. I am quite skeptical about the need for hostname verification in private CA (enterprise setting) too. We should probably have it off by default and let the admin turn it on.

          I would like to keep the BC helper code

          What do you mean by bc helper code?

          BC is bouncy castle.

          Also wanted to ask you if we could make the all sockets BufferedSocket by default rather then making that conditional on port unification configuration.

          I need to restore backward compatibility to X509Util, I removed the public statics that were marked deprecated, probably cannot do that just yet?.

          thanks
          Powell.

          Show
          geek101 Powell Molleti added a comment - Hi Abe, I do not think it mentions this should be not be done at cert verification time nor should we allow exchange of application bits when the certificate is not what we expected it to be. So the issue is more zookeeper specific I think. Imagine the case where, and I know this is very contrived but I think the principal is valid, we have 3 zk servers all running on the same host with different ports. We have 3 dns records pointing to this machine with different names, say zk1, zk2, and zk3. Each zkX has a certificate with the zkX common name. Our zookeeper configuration identifies these servers by the correct name server.1=zk1... When one of these servers connects to the server socket on the other I do not think it is possible for the "server" to tell which zkX connected until the sid is read from the socket unless we want to start doing reverse dns lookups. I would rather just use the hostname we already know about. That is why I think we cannot do hostname verification in the trust manager. Or you could argue that we only need hostname verification for the "client", but I would rather have it both ways. Please let me know if I am missing something. If multiple servers have certs with the same subjectAltName entry of type dNSName and that is indeed is how the CA signed them then it should be ok from TLS perspective, since we trust the CA, we trust any server signed by the CA and we match the hostname(i.e we trust DNS lookup or we trust the config provided). We do not have to verify the specific host since it is sufficient to verify that it is one of the valid hosts, this is secure. Take a case where if someone can subvert the CA get signed by it for the same domain and subvert DNS then they might as well try few sids starting from zero before ZK lets the server connect. Hostname verification does not apply to self-signed certs. I am quite skeptical about the need for hostname verification in private CA (enterprise setting) too. We should probably have it off by default and let the admin turn it on. I would like to keep the BC helper code What do you mean by bc helper code? BC is bouncy castle. Also wanted to ask you if we could make the all sockets BufferedSocket by default rather then making that conditional on port unification configuration. I need to restore backward compatibility to X509Util, I removed the public statics that were marked deprecated, probably cannot do that just yet?. thanks Powell.
          Hide
          abrahamfine Abraham Fine added a comment -

          Hi Powell Molleti-

          If multiple servers have certs with the same subjectAltName entry of type dNSName and that is indeed is how the CA signed them then it should be ok from TLS perspective

          I agree that at least one of the alt_names needs to match the host from configuration (not all of them).

          Take a case where if someone can subvert the CA get signed by it for the same domain and subvert DNS then they might as well try few sids starting from zero before ZK lets the server connect.

          If the CA is subverted (certificates are being issued for servers for domains not under their control) and the name service is subverted, hostname verification of any kind can't work. Either the CA or the NS (dns or the zk config) needs to have integrity. But I think you raise an interesting issue where a client can just try multiple id's when connecting to another server until it matches the one on its (lets assume stolen) certificate. I did not consider that. Let me dig into this a little deeper and see if I can get a better idea of what is going on.

          We should probably have it off by default and let the admin turn it on.

          The latest patch has this behavior.

          Also wanted to ask you if we could make the all sockets BufferedSocket by default rather then making that conditional on port unification configuration.

          Why would we want to do this? I think there is a small performance hit involved.

          Show
          abrahamfine Abraham Fine added a comment - Hi Powell Molleti - If multiple servers have certs with the same subjectAltName entry of type dNSName and that is indeed is how the CA signed them then it should be ok from TLS perspective I agree that at least one of the alt_names needs to match the host from configuration (not all of them). Take a case where if someone can subvert the CA get signed by it for the same domain and subvert DNS then they might as well try few sids starting from zero before ZK lets the server connect. If the CA is subverted (certificates are being issued for servers for domains not under their control) and the name service is subverted, hostname verification of any kind can't work. Either the CA or the NS (dns or the zk config) needs to have integrity. But I think you raise an interesting issue where a client can just try multiple id's when connecting to another server until it matches the one on its (lets assume stolen) certificate. I did not consider that. Let me dig into this a little deeper and see if I can get a better idea of what is going on. We should probably have it off by default and let the admin turn it on. The latest patch has this behavior. Also wanted to ask you if we could make the all sockets BufferedSocket by default rather then making that conditional on port unification configuration. Why would we want to do this? I think there is a small performance hit involved.
          Hide
          abrahamfine Abraham Fine added a comment - - edited

          Hi Powell Molleti-

          Regarding host verification one other way to go is to follow this: X509ExtendedTrustManager mentions about where to plugin host verification , it specifically quotes:

          Let me know if this is what you had in mind. We do not need to subclass `X509ExtendedTrustManager` ourselves to get this to work, since the `X509TrustManagerImpl` object generated by the PKIX trustmanager factory (and x509 as well I think) extends `X509ExtendedTrustManager` already. If endpoint verification is set on the sslParameters of the sslSocket we get endpoint verification for free in `X509ExtendedTrustManager`. The issue is that we are limited to the built in implementations of endpoint verification but I think the "https" algorithm is sufficient for our use case.

          I implemented this here: https://github.com/apache/zookeeper/pull/184/commits/bebe09660f652f243e746905460ecbdffe2d155e

          Show
          abrahamfine Abraham Fine added a comment - - edited Hi Powell Molleti - Regarding host verification one other way to go is to follow this: X509ExtendedTrustManager mentions about where to plugin host verification , it specifically quotes: Let me know if this is what you had in mind. We do not need to subclass `X509ExtendedTrustManager` ourselves to get this to work, since the `X509TrustManagerImpl` object generated by the PKIX trustmanager factory (and x509 as well I think) extends `X509ExtendedTrustManager` already. If endpoint verification is set on the sslParameters of the sslSocket we get endpoint verification for free in `X509ExtendedTrustManager`. The issue is that we are limited to the built in implementations of endpoint verification but I think the "https" algorithm is sufficient for our use case. I implemented this here: https://github.com/apache/zookeeper/pull/184/commits/bebe09660f652f243e746905460ecbdffe2d155e
          Hide
          geek101 Powell Molleti added a comment -

          Hi Abe,

          Not sure that will work, I think SSL engine will need the the hostname set for this to work. Since server cannot know which client will connect it cannot use this mechanism I believe. This was meant for client side only.

          thanks
          Powell.

          Show
          geek101 Powell Molleti added a comment - Hi Abe, Not sure that will work, I think SSL engine will need the the hostname set for this to work. Since server cannot know which client will connect it cannot use this mechanism I believe. This was meant for client side only. thanks Powell.
          Hide
          abrahamfine Abraham Fine added a comment -

          Powell Molleti-

          You are correct. Apparently java doesn't do a reverse dns lookup by default. I had the IP address to be included as a subjectAltName of the client's certificate, so I didn't notice this failure. So I pushed an update with a subclass to X509ExtendedTrustManager:

                      for (final TrustManager tm : tmf.getTrustManagers()) {
                          if (tm instanceof X509TrustManager) {
                              return new X509ExtendedTrustManager() {
                                  HostnameChecker hostnameChecker = HostnameChecker.getInstance(HostnameChecker.TYPE_TLS);
          
                                  @Override
                                  public X509Certificate[] getAcceptedIssuers() {
                                      return ((X509ExtendedTrustManager) tm).getAcceptedIssuers();
                                  }
          
                                  @Override
                                  public void checkClientTrusted(X509Certificate[] x509Certificates, String s, Socket socket) throws CertificateException {
                                      hostnameChecker.match(socket.getInetAddress().getHostName(), x509Certificates[0]);
                                      ((X509ExtendedTrustManager) tm).checkClientTrusted(x509Certificates, s, socket);
                                  }
          
                                  @Override
                                  public void checkServerTrusted(X509Certificate[] x509Certificates, String s, Socket socket) throws CertificateException {
                                      hostnameChecker.match(((SSLSocket) socket).getHandshakeSession().getPeerHost(), x509Certificates[0]);
                                      ((X509ExtendedTrustManager) tm).checkServerTrusted(x509Certificates, s, socket);
                                  }
          

          We do a reverse dns lookup when a client is connecting to the server. This works. The primary issue I see is that import sun.security.util.HostnameChecker is proprietary and throws a compile warning.

              [javac] /Users/abefine/cloudera_code/zookeeper/src/java/main/org/apache/zookeeper/common/X509Util.java:26: warning: HostnameChecker is internal proprietary API and may be removed in a future release
              [javac] import sun.security.util.HostnameChecker;
          

          I'm not sure if it is the preference of the community to copy the code contained in HostnameChecker, add a dependency with similar functionality, or leave it as is. The issue is described clearly here https://kevinlocke.name/bits/2012/10/03/ssl-certificate-verification-in-dispatch-and-asynchttpclient/:

          To make matters worse, the check is not trivial (consider SAN and wildcard matching) and is implemented in sun.security.util.HostnameChecker (a Sun internal proprietary API). This leaves the developer in the position of either depending on an internal API or finding/copying/creating another implementation of this functionality. For the examples in this article, I have opted for the first option.

          Thanks,
          Abe

          Show
          abrahamfine Abraham Fine added a comment - Powell Molleti - You are correct. Apparently java doesn't do a reverse dns lookup by default. I had the IP address to be included as a subjectAltName of the client's certificate, so I didn't notice this failure. So I pushed an update with a subclass to X509ExtendedTrustManager : for ( final TrustManager tm : tmf.getTrustManagers()) { if (tm instanceof X509TrustManager) { return new X509ExtendedTrustManager() { HostnameChecker hostnameChecker = HostnameChecker.getInstance(HostnameChecker.TYPE_TLS); @Override public X509Certificate[] getAcceptedIssuers() { return ((X509ExtendedTrustManager) tm).getAcceptedIssuers(); } @Override public void checkClientTrusted(X509Certificate[] x509Certificates, String s, Socket socket) throws CertificateException { hostnameChecker.match(socket.getInetAddress().getHostName(), x509Certificates[0]); ((X509ExtendedTrustManager) tm).checkClientTrusted(x509Certificates, s, socket); } @Override public void checkServerTrusted(X509Certificate[] x509Certificates, String s, Socket socket) throws CertificateException { hostnameChecker.match(((SSLSocket) socket).getHandshakeSession().getPeerHost(), x509Certificates[0]); ((X509ExtendedTrustManager) tm).checkServerTrusted(x509Certificates, s, socket); } We do a reverse dns lookup when a client is connecting to the server. This works. The primary issue I see is that import sun.security.util.HostnameChecker is proprietary and throws a compile warning. [javac] /Users/abefine/cloudera_code/zookeeper/src/java/main/org/apache/zookeeper/common/X509Util.java:26: warning: HostnameChecker is internal proprietary API and may be removed in a future release [javac] import sun.security.util.HostnameChecker; I'm not sure if it is the preference of the community to copy the code contained in HostnameChecker, add a dependency with similar functionality, or leave it as is. The issue is described clearly here https://kevinlocke.name/bits/2012/10/03/ssl-certificate-verification-in-dispatch-and-asynchttpclient/: To make matters worse, the check is not trivial (consider SAN and wildcard matching) and is implemented in sun.security.util.HostnameChecker (a Sun internal proprietary API). This leaves the developer in the position of either depending on an internal API or finding/copying/creating another implementation of this functionality. For the examples in this article, I have opted for the first option. Thanks, Abe
          Hide
          abrahamfine Abraham Fine added a comment -

          I found that org.apache.httpcomponents.httpclient has org.apache.http.conn.ssl.DefaultHostnameVerifier which should serve our purposes well.

          Show
          abrahamfine Abraham Fine added a comment - I found that org.apache.httpcomponents.httpclient has org.apache.http.conn.ssl.DefaultHostnameVerifier which should serve our purposes well.

            People

            • Assignee:
              abrahamfine Abraham Fine
              Reporter:
              breed Benjamin Reed
            • Votes:
              4 Vote for this issue
              Watchers:
              16 Start watching this issue

              Dates

              • Created:
                Updated:

                Development