Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: server
    • Labels: None

      Description

      ZOOKEEPER-938 addresses mutual authentication between clients and servers. This issue, on the other hand, covers authentication among quorum peers. Hopefully much of the SASL integration work done in ZooKeeper for ZOOKEEPER-938 can serve as a foundation for this enhancement.

          Activity

          Rakesh R added a comment -

          Thanks a lot, Chris Nauroth, for the thoughts; they are really useful. Currently my patch doesn't consider QOP for quorum peer connections. How about pushing the basic SASL solution first and adding QOP support later, based on community interest?

          Ivan Kelly added a comment -

          "Two concerns that I have are: is it architecturally OK to force ZK to talk to an external server (perhaps at regular intervals) to form a quorum, and, if that is OK, is this the feature most widely used or requested by users?"

          The patch already supports DIGEST-MD5, so Kerberos isn't required for authentication.

          Encryption of the links is a separate concern, and shouldn't be done as part of this change. There's already a JIRA for it (ZOOKEEPER-1000).

          Jason Rosenberg added a comment -

          I think simple TLS/SSL is more important to have, and I expect it to be significantly easier to implement, test, and use operationally. Kerberos might be a nice-to-have feature (but should not be prioritized ahead of basic SSL, IMHO).

          Chris Nauroth added a comment -

          "Doesn't Kerberos have a timeout requirement for session tokens etc.? Is Kerberos widely used for data transfer protocol channels?"

          I can speak to how this is done in Hadoop. The Hadoop daemons do authenticate via Kerberos, using a keytab file. The login is done once during initial startup of the daemon. After that, the daemon can authenticate to other remote daemons using the Kerberos ticket in SASL authentication.

          There are some edge cases that need to be handled. Kerberos tickets have a maximum lifetime, after which it is no longer possible to renew. To handle this, Hadoop's RPC layer is capable of detecting an authentication failure during a connection attempt, and it will handle it by doing an automatic relogin of the same principal from the same keytab that was used during process startup.

          Another issue is that Kerberos infrastructure typically attempts to detect replay attacks by checking for multiple login attempts for the same principal within a short window. To handle that, we apply some backoff logic before trying again.

          It's tricky stuff, but it's solvable, and it has worked well for Hadoop.
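
          The relogin-with-backoff handling described above could be sketched roughly as follows. This is only an illustration, not Hadoop's actual RPC code: the class ReloginRetry, the Relogin hook, and the exponential backoff schedule are all assumptions made for the example.

```java
import java.util.concurrent.Callable;
import javax.security.sasl.SaslException;

public class ReloginRetry {
    /** Hypothetical hook: re-runs the keytab login for the daemon's principal. */
    public interface Relogin {
        void relogin() throws Exception;
    }

    /**
     * Runs the connection attempt; on a SASL failure it backs off,
     * logs in again, and retries, up to maxAttempts attempts in total.
     */
    public static <T> T callWithRelogin(Callable<T> connect, Relogin relogin,
                                        int maxAttempts, long baseBackoffMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        SaslException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return connect.call();
            } catch (SaslException e) {
                last = e;
                if (attempt < maxAttempts - 1) {
                    // Space out login attempts so the KDC does not
                    // mistake rapid retries for a replay attack.
                    Thread.sleep(baseBackoffMillis << attempt);
                    relogin.relogin();
                }
            }
        }
        throw last;
    }
}
```

          Only SaslException triggers a relogin here; other failures propagate immediately, which keeps unrelated network errors from hammering the KDC.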

          "Two concerns that I have are: is it architecturally OK to force ZK to talk to an external server (perhaps at regular intervals) to form a quorum, and, if that is OK, is this the feature most widely used or requested by users?"

          You're right that overall availability then becomes tied to availability of the KDC. I don't have any perspective to offer on which approach is more widely requested by ZooKeeper users. I haven't personally received any requests for quorum peer authentication.

          Powell Molleti added a comment -

          Is it considered OK to force a strong CA system like ZooKeeper to connect to an external entity for authentication in order to form a quorum? Will the external entity be considered as reliable as ZooKeeper, or more so? There is a difference between reading a key from the local filesystem and reading a key over a socket from a remote machine. I see two layers of issues here: the single network path to the authentication server, and the HA capabilities of the authentication server itself.

          Doesn't Kerberos have a timeout requirement for session tokens etc.? Is Kerberos widely used for data transfer protocol channels? From my understanding it is pretty common to control user access to systems via Kerberos, but I am unsure about inter-cluster / inter-server channels.

          A quick survey of comparable/semi-comparable projects here:

          • Etcd: TLS/SSL for inter-node encryption
          • Consul: TLS/SSL for inter-node encryption
          • Cassandra: TLS/SSL for inter-node encryption
          • MongoDB: TLS/SSL for inter-node encryption?

          Two concerns that I have are: is it architecturally OK to force ZK to talk to an external server (perhaps at regular intervals) to form a quorum, and, if that is OK, is this the feature most widely used or requested by users?

          Chris Nauroth added a comment -

          Regarding the QOP settings, use of auth-int (integrity checking to guard against man-in-the-middle tampering) or auth-conf (encryption to prevent a man-in-the-middle from reading data) requires wrapping and unwrapping the data exchanged between client and server, so that the SASL code is given an opportunity to inspect the data, either to validate that it hasn't been tampered with or to encrypt/decrypt it. This is accomplished by passing the stream data through a couple of special methods in the SASL API.

          http://docs.oracle.com/javase/7/docs/api/javax/security/sasl/SaslClient.html#unwrap(byte[],%20int,%20int)

          http://docs.oracle.com/javase/7/docs/api/javax/security/sasl/SaslClient.html#wrap(byte[],%20int,%20int)

          http://docs.oracle.com/javase/7/docs/api/javax/security/sasl/SaslServer.html#unwrap(byte[],%20int,%20int)

          http://docs.oracle.com/javase/7/docs/api/javax/security/sasl/SaslServer.html#wrap(byte[],%20int,%20int)

          This means that supporting auth-int or auth-conf would require more coding work for us compared to just plain auth. I haven't looked at this specific patch to see if it tried to do this. The last time I considered supporting the full range of QOP settings, it looked like it was going to be a very intrusive change to the existing ZooKeeper codebase. I was looking at the client-server connection though, not the quorum peer connections.

          In Hadoop, we implement this with special subclasses of InputStream and OutputStream that do the SASL wrap/unwrap calls internally and then delegate to another underlying stream. This has proven to be a pretty elegant design, because it encapsulates the SASL wrapping and unwrapping from the rest of the Hadoop codebase. The rest of the code doesn't need to worry about whether auth or auth-int or auth-conf is in effect. It just reads from/writes to streams.
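
          A minimal sketch of that stream-wrapping idea, assuming a 4-byte length prefix in front of every wrapped token. The framing and the class names (SaslStreams, WrappingOutputStream, UnwrappingInputStream) are assumptions for illustration, not Hadoop's actual classes; only SaslClient.wrap and SaslClient.unwrap come from the Java SASL API linked above. A server-side variant would call the same methods on SaslServer.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import javax.security.sasl.SaslClient;

public class SaslStreams {
    /** Wraps each write with SaslClient.wrap and frames it with its length. */
    public static class WrappingOutputStream extends OutputStream {
        private final DataOutputStream out;
        private final SaslClient sasl;

        public WrappingOutputStream(OutputStream out, SaslClient sasl) {
            this.out = new DataOutputStream(out);
            this.sasl = sasl;
        }

        @Override
        public void write(int b) throws IOException {
            write(new byte[] {(byte) b}, 0, 1);
        }

        @Override
        public void write(byte[] buf, int off, int len) throws IOException {
            byte[] token = sasl.wrap(buf, off, len);
            out.writeInt(token.length); // assumed framing: 4-byte length prefix
            out.write(token);
        }

        @Override
        public void flush() throws IOException {
            out.flush();
        }

        @Override
        public void close() throws IOException {
            out.close();
        }
    }

    /** Reads length-prefixed tokens and hands out the unwrapped bytes. */
    public static class UnwrappingInputStream extends InputStream {
        private final DataInputStream in;
        private final SaslClient sasl;
        private byte[] plain = new byte[0];
        private int pos = 0;

        public UnwrappingInputStream(InputStream in, SaslClient sasl) {
            this.in = new DataInputStream(in);
            this.sasl = sasl;
        }

        @Override
        public int read() throws IOException {
            while (pos >= plain.length) {
                int len;
                try {
                    len = in.readInt();
                } catch (EOFException e) {
                    return -1; // clean end of stream between frames
                }
                byte[] token = new byte[len];
                in.readFully(token);
                plain = sasl.unwrap(token, 0, len);
                pos = 0;
            }
            return plain[pos++] & 0xff;
        }
    }
}
```

          Callers read from and write to these streams like any other stream, so the rest of the code does not need to know which QOP was negotiated.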

          Rakesh R added a comment -

          "Is there much user demand for Kerberos support for inter-zk channels?"

          The SASL approach is a good choice considering that SASL (Kerberos) has already been supported for zkclient-zkserver communication for quite some time. I think users/admins will be much more comfortable with a SASL deployment. It will also be easy for them to upgrade existing clusters that are already secured with SASL (Kerberos).

          "Will ZK always have to get a token from the KDC first before authenticating a peer? I am not quite familiar with the SASL Java API; can you shed some light on the system-level process?"

          Yes, initial authentication takes place between the Kerberos client and the KDC server. In ZooKeeper, every QuorumPeer will have a Kerberos client that gets a token from the KDC server at startup. For the initial TGT, one can either run kinit USERNAME or use a local keytab file. I've tried to capture the few configurations needed for this feature in README.md.

          For example, the first entry below relies on the ticket cache (populated via kinit) and the second on a keytab:

          QuorumServer {
                 com.sun.security.auth.module.Krb5LoginModule required
                 useKeyTab=false
                 useTicketCache=true
                 principal="zkquorum/localhost@EXAMPLE.COM";
          };
          
          QuorumServer {
                 com.sun.security.auth.module.Krb5LoginModule required
                 useKeyTab=true
                 keyTab="/path/to/keytab"
                 storeKey=true
                 useTicketCache=false
                 debug=false
                 principal="zkquorum/localhost@EXAMPLE.COM";
          };
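
          As a quick sanity check of such a file, the entry can be parsed with the standard JAAS Configuration API before any Kerberos login is attempted. The sketch below is only an illustration: the class JaasEntryCheck and its method are made up for the example, and it assumes the file contains a single QuorumServer entry (the two entries above are alternatives, not meant to coexist in one file).

```java
import java.nio.file.Path;
import java.security.URIParameter;
import java.util.Map;
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.Configuration;

public class JaasEntryCheck {
    /**
     * Parses the JAAS file and returns the options of the first login
     * module configured under entryName (for example "QuorumServer").
     * No login is performed and the KDC is not contacted.
     */
    public static Map<String, ?> optionsFor(Path jaasFile, String entryName)
            throws Exception {
        Configuration conf = Configuration.getInstance(
                "JavaLoginConfig", new URIParameter(jaasFile.toUri()));
        AppConfigurationEntry[] entries = conf.getAppConfigurationEntry(entryName);
        if (entries == null || entries.length == 0) {
            throw new IllegalArgumentException("no JAAS entry named " + entryName);
        }
        return entries[0].getOptions();
    }
}
```

          Reading the options back this way catches typos in the section name or option keys early, without needing a reachable KDC.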
          

          I hope the following links will help in understanding Kerberos and the Java SASL API:

          http://www.roguelynn.com/words/explain-like-im-5-kerberos/
          https://docs.oracle.com/javase/7/docs/technotes/guides/security/sasl/sasl-refguide.html
          https://software.intel.com/sites/manageability/AMT_Implementation_and_Reference_Guide/default.htm?turl=WordDocuments%2Fintroductiontokerberosauthentication.htm

          "Does this provide encryption of the data traffic using the shared secret key?"

          As per my understanding, Kerberos provides facilities to make sure that messages are not changed as they travel across the network. The messages can optionally be encrypted so that only the parties that know the session key can examine their contents. Java SASL provides QOP settings. "auth-conf" stands for authentication, integrity and confidentiality: this setting guarantees that data exchanged between client and server is encrypted and is not readable by a "man in the middle". Honestly, I haven't explored implementing this part much. I hope experienced folks can shed some light on this.
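
          For reference, a QOP preference is requested in Java SASL through the Sasl.QOP property, given as a comma-separated list in preference order. A small sketch follows; the helper class QopProps is hypothetical and not part of the patch.

```java
import java.util.HashMap;
import java.util.Map;
import javax.security.sasl.Sasl;

public class QopProps {
    /** Builds the property map that carries the requested QOP list. */
    public static Map<String, String> saslProps(String qopPreference) {
        Map<String, String> props = new HashMap<>();
        // Sasl.QOP is the standard key "javax.security.sasl.qop";
        // valid values are "auth", "auth-int" and "auth-conf".
        props.put(Sasl.QOP, qopPreference);
        return props;
    }

    /** True when the negotiated QOP requires wrap()/unwrap() on the data. */
    public static boolean needsWrapping(String negotiatedQop) {
        return "auth-int".equals(negotiatedQop) || "auth-conf".equals(negotiatedQop);
    }
}
```

          The map would be passed as the props argument of Sasl.createSaslClient or Sasl.createSaslServer; once negotiation completes, getNegotiatedProperty(Sasl.QOP) reports which level was actually agreed on.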

          Powell Molleti added a comment -

          Is there much user demand for Kerberos support for inter-zk channels? Will ZK always have to get a token from the KDC first before authenticating a peer? I am not quite familiar with the SASL Java API; can you shed some light on the system-level process? Does this provide encryption of the data traffic using the shared secret key?

          Rakesh R added a comment -

          Pending Work:
          1. Need to support upgrade execution path. I'll update the proposal to support this soon.

          I've captured the design thoughts that came to mind and attached a draft document that describes the proposal. It would be really great to get feedback from the community on it. Thanks!

          Powell Molleti added a comment -

          All API calls are async and return immediately. The module tries to hold the invariant that the current vote is sent to peers in all cases (adding peers, reconnecting peers, message transmit errors, etc.). It will come with unit tests to verify correctness. I will certainly write a design doc and publish it soon. The goal is to address ZOOKEEPER-901 and its related issues. Perhaps we should move this discussion there.

          Two points of divergence from the current implementation w.r.t. FLE:
          1. FLE no longer puts the Vote into a per-peer queue via manager.toSend(); it only has to call broadcast(vote) whenever it likes. QCM takes care of sending the new Vote that FLE asked it to send, if it knows this is a new Vote for the peer(s) it is managing. There is no outgoing queue to each peer in QCM either; when it connects to a peer it just sends the current Vote it has.

          2. Unlike now, FLE will call getVotesBlockingQueue() (instead of manager.pollRecvQueue()), which yields the current votes of all peers that QCM knows of now and, for every peer, any future vote received that is different from the last one sent since the first call.

          From my understanding this will eliminate unnecessary transitions of FLE, since currently it has to digest all the messages received since the last round when it calls pollRecvQueue(). This is because QCM's Rx/Tx is always alive until QuorumPeer shuts it down, which happens only when QuorumPeer itself is shut down.

          There is also an API call named getVotes() which simply returns the current vote view that QCM knows, but to keep FLE simple and the same as the current implementation I will add getVotesBlockingQueue() to mimic pollRecvQueue().
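
          The "send only the current vote, with no per-peer outgoing queue" behaviour described above can be illustrated with a small in-memory model. This is a sketch under assumptions and not the proposed Netty-based code: the class LatestVoteBroadcast, the use of String for votes, the synchronous delivery, and the delivered() accessor are invented for the example; only the method names broadcast(), addServer() and removeServer() come from the discussion.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LatestVoteBroadcast {
    private String currentVote; // latest vote handed over by FLE
    private final Map<Long, String> lastSent = new HashMap<>();
    // Stand-in for the network: what each connected peer has received.
    private final Map<Long, List<String>> wire = new HashMap<>();

    /** A peer connects (or reconnects): it immediately gets the current vote. */
    public synchronized void addServer(long sid) {
        wire.putIfAbsent(sid, new ArrayList<>());
        sendIfNew(sid);
    }

    public synchronized void removeServer(long sid) {
        wire.remove(sid);
        lastSent.remove(sid);
    }

    /** Records the vote and sends it to every peer that has not seen it yet. */
    public synchronized void broadcast(String vote) {
        currentVote = vote;
        for (long sid : wire.keySet()) {
            sendIfNew(sid);
        }
    }

    /** Copy of everything delivered to the given peer so far. */
    public synchronized List<String> delivered(long sid) {
        return new ArrayList<>(wire.getOrDefault(sid, Collections.<String>emptyList()));
    }

    private void sendIfNew(long sid) {
        if (currentVote == null || currentVote.equals(lastSent.get(sid))) {
            return; // nothing to send, or the peer already has this vote
        }
        wire.get(sid).add(currentVote);
        lastSent.put(sid, currentVote);
    }
}
```

          In this model a late joiner receives only the latest vote rather than a backlog, and broadcast() never blocks waiting for peers.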

          A few points regarding implementation:
          1. QCM will use a single-thread executor. Hopefully this will address the concern of starting and stopping multiple threads for each peer.

          2. This thread is shared by Netty to handle all TCP channels and also by tasks in QCM that perform operations (like new channel/write and queue management).

          3. Netty handlers are written to be thread-safe, so one could pass more threads, but I think one thread should be enough to handle a handful of QuorumPeers talking part time.

          4. I will try to keep FLE changes to a minimum, only touching the interfaces to the manager.

          Please let me know if I assumed something wrong; all feedback/comments are welcome.

          Powell Molleti added a comment -

          Hi Shasha,

          If you want to try out the SSL-based implementation, please refer to ZOOKEEPER-1000, where I have posted links to the source. It is implemented for the 3.4.x branch. Feel free to let us know what your requirements are; this will help us refine the solution.

          thanks
          Powell.

          Rakesh R added a comment -

          I'm attaching a patch that supports QuorumPeer authentication using the SASL (Kerberos/Digest) mechanism. This patch is based on branch-3.4. Also, please refer to the PR: https://github.com/apache/zookeeper/pull/49. Any questions and comments are very welcome.

          Following are the changes:

          1. Please refer to src/java/main/org/apache/zookeeper/server/quorum/auth/README.md for the configurations.
          2. Introduced QuorumConnectionThread, through which connections between the quorum peers are established asynchronously. This will not block other connection requests.
          3. Added org.apache.zookeeper.util.SecurityUtils to reduce code duplication.
          4. Added org.apache.zookeeper.server.quorum.QuorumAuthPacket, a jute buffer for messaging.
          5. Refer to QuorumAuthClient and QuorumAuthServer for the major auth logic.
          6. Included tests to verify the Digest mechanism.
          7. Included tests to verify Kerberos. I've used the MiniKdc way of testing from HDFS and taken a few test classes from that project. This code is quite large and adds a few test jar dependencies (apache.directory.server).

          Thanks a lot Ivan Kelly, Hongchao Deng, Flavio Junqueira, Patrick Hunt, Raul Gutierrez Segales for the offline discussions and advice.

          Pending Work:

          1. Need to support upgrade execution path. I'll update the proposal to support this soon.
          Flavio Junqueira added a comment -

          Powell Molleti this looks good, but I don't fully understand the semantics of VoteBroadcast.broadcast(msg). The problem I see is that you don't want to block the call until everyone receives the message, but at the same time you need to deliver votes to late joiners.

          One suggestion is to write a short design doc explaining the reasoning for this proposal.

          Rakesh R added a comment -

          Thanks, everyone, for the interest and the useful discussion. I agree with Flavio Junqueira that we should make a clear distinction between the two JIRA issues. As I see it, the idea of this JIRA is to provide an authentication mechanism among the quorum peers using SASL. I'm currently working on a SASL + Kerberos based solution in branch-3.4. I'm assigning the issue to myself and will upload a patch soon.

          Thank you, Powell Molleti, for the effort in building the SSL approach; we could use ZOOKEEPER-1000 for that solution. My personal opinion is to implement the SSL solution in 3.5.* or trunk, as the Netty + SSL feature for client-server communication is available from branch-3.5 onwards.

          Jason Rosenberg added a comment -

          Mutual SSL support seems simpler (if I'm not mistaken) and is adequate for the basic blocker we have: wanting a cluster with nodes spanning multiple datacenters (e.g. with remote observer nodes). SASL is probably overkill for that.

          Flavio Junqueira added a comment -

          I agree, but it'd be nice to make the distinction between the two jiras clear. We can also use SSL authentication, so I assume we can narrow down the scope of this jira to just SASL or perhaps leave the decision of whether to use SSL authentication at all to this jira and have a note in the other.

          Jason Heiss added a comment -

          I would vote for keeping this ticket open for SASL server-server authentication. SSL is better than nothing, but SASL support would be nice.

          Shasha Song added a comment -

          3.4.6

          Powell Molleti added a comment -

          Both are the same unless the author of the bug wants more complex auth than what I have proposed; perhaps marking this as a duplicate of ZOOKEEPER-1000 and moving the discussion there is best.

          Powell Molleti added a comment -

          Hi Raul,

          For now I have written something that tries to replace the QuorumCnxManager class using Netty 4.1 for ZOOKEEPER-901, which tries to address both issues: SSL and serialized connect.

          It would work something like this:
          1. Initialize it as VoteBroadcast(Set<QuorumServer>) (QuorumPeer will do that).
          2. Then use it as follows: FLE.sendNotifications(msg) -> VoteBroadcast.broadcast(msg) and FLE.WorkerReceiver.run() -> VoteBroadcast.getVotes().
          I am providing addServer() and removeServer() methods, which could address 3.5.x, I think (not sure yet!).
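
The shape described above can be sketched roughly as follows. This is only an in-memory illustration, not the patch itself: it uses no Netty and does no real I/O, peers are identified by a plain server id instead of QuorumServer, and the class name VoteBroadcastSketch plus the onReceive() and pendingFor() hooks are hypothetical names added so the flow can be followed end to end.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

/**
 * Illustrative stand-in for the VoteBroadcast idea: one outbound queue per
 * known peer and a single inbound queue drained by the election code.
 * A real implementation would attach a network transport to these queues.
 */
final class VoteBroadcastSketch<M> {
    // Outbound messages waiting to be sent, keyed by server id (assumption:
    // a long id is enough to identify a peer in this sketch).
    private final Map<Long, Queue<M>> outbound = new ConcurrentHashMap<Long, Queue<M>>();
    // Messages received from peers, waiting for getVotes().
    private final Queue<M> inbound = new ConcurrentLinkedQueue<M>();

    VoteBroadcastSketch(Set<Long> serverIds) {
        for (Long sid : serverIds) {
            addServer(sid);
        }
    }

    /** Starts tracking a peer; a no-op if the peer is already known. */
    void addServer(long sid) {
        outbound.putIfAbsent(sid, new ConcurrentLinkedQueue<M>());
    }

    /** Stops tracking a peer and drops anything still queued for it. */
    void removeServer(long sid) {
        outbound.remove(sid);
    }

    /** Queues the message for every currently known peer. */
    void broadcast(M msg) {
        for (Queue<M> queue : outbound.values()) {
            queue.add(msg);
        }
    }

    /** Hypothetical transport hook: called when a message arrives from a peer. */
    void onReceive(M msg) {
        inbound.add(msg);
    }

    /** Hypothetical transport hook: snapshot of what is queued for one peer. */
    List<M> pendingFor(long sid) {
        Queue<M> queue = outbound.get(sid);
        return queue == null ? new ArrayList<M>() : new ArrayList<M>(queue);
    }

    /** Drains and returns everything received since the previous call. */
    List<M> getVotes() {
        List<M> votes = new ArrayList<M>();
        M msg;
        while ((msg = inbound.poll()) != null) {
            votes.add(msg);
        }
        return votes;
    }
}
```

In this sketch the election code would call broadcast() where it sends notifications and getVotes() where it receives them, while addServer() and removeServer() let the peer set change at runtime.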

          I was hoping to use this for Learner as well, but at this point SSL sockets seem like a better way to get SSL working there.
          The transport/encode/decode code is pretty entrenched, and making all of it async seems risky just to get SSL; implementing a streaming interface on top of Netty channels likewise adds complexity just to get SSL. Hence I am leaning towards SSL sockets for the Learner side of things. Let me know what you think or if I have gotten that wrong.

          I will post patches for 3.4 first, since I am most familiar with it, then work my way upstream. I will post two patches: one for QCM and the other for Learner. I have yet to start on the Learner side of things.

          Thanks
          Powell.

          Jason Rosenberg added a comment -

          How does this ticket compare to ZOOKEEPER-1000? Do they solve different problems? Or are they alternate solutions to the same problem?

          Raul Gutierrez Segales added a comment -

          Thanks for working on this Powell Molleti! Happy to help with reviewing & merging those patches. What branch are your patches based on (3.4 or 3.5)?

          Powell Molleti added a comment -

          Hi Shasha,

          I am working on a patch; at this time it's still a work in progress. What version of ZK are you looking to use: 3.4.x or 3.5.x?

          thanks
          Powell.

          Shasha Song added a comment -

          Hi Powell,

          That should work. Is that already implemented, and where can I find the instructions? Or is it going to be implemented as part of this JIRA?

          Thanks
          Shasha

          Powell Molleti added a comment -

          Hi Shasha,

          Will SSL-based cert authentication solve your problem? Mind that there are two channels (TCP connections) between quorum peers: one for election and the other for ZAB. One can create a CA cert for a ZK cluster and use it to sign the cert of each ZK node. This ensures that only nodes whose certs are signed by this CA, i.e. nodes that are part of this cluster, can connect to each other.
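
As a rough illustration of the cluster-CA idea using the standard JSSE API, the sketch below builds an SSLContext whose trust store is assumed to contain only the cluster CA cert and whose key store holds this node's CA-signed cert, then requires client certificates so both ends of a connection are authenticated. The class name QuorumTlsSketch and its method names are hypothetical; this is not ZooKeeper's implementation, and how the key stores get loaded is left to the caller.

```java
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import javax.net.ssl.KeyManagerFactory;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLParameters;
import javax.net.ssl.TrustManagerFactory;

/** Hypothetical helper showing mutual TLS restricted to one cluster CA. */
final class QuorumTlsSketch {

    /**
     * @param nodeIdentity  key store with this node's private key and its
     *                      cert signed by the cluster CA (assumption)
     * @param keyPassword   password protecting the private key
     * @param clusterCaOnly trust store holding only the cluster CA cert, so
     *                      that only certs signed by that CA are accepted
     */
    static SSLContext buildContext(KeyStore nodeIdentity, char[] keyPassword,
                                   KeyStore clusterCaOnly) throws GeneralSecurityException {
        KeyManagerFactory kmf =
                KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm());
        kmf.init(nodeIdentity, keyPassword);

        TrustManagerFactory tmf =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(clusterCaOnly);

        SSLContext context = SSLContext.getInstance("TLS");
        // Passing null selects the default SecureRandom implementation.
        context.init(kmf.getKeyManagers(), tmf.getTrustManagers(), null);
        return context;
    }

    /**
     * Parameters for the accepting side: the connecting peer must present a
     * cert, which makes the authentication mutual.
     */
    static SSLParameters mutualAuthParameters(SSLContext context) {
        SSLParameters params = context.getDefaultSSLParameters();
        params.setNeedClientAuth(true);
        return params;
    }
}
```

Under these assumptions the same context could back both the election channel and the ZAB channel; the accepting side would apply mutualAuthParameters() to its server socket or engine.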

          Let me know if this works for your use case.
          Thanks
          Powell.

          Shasha Song added a comment -

          Hi Mahadev,

          Our team started to use ZooKeeper recently, and we think we need server-to-server authentication to prevent other servers from joining the cluster. Is there any plan for this JIRA?

          Thanks

          Mahadev konar added a comment -

          Devaraj,
          The server-to-server protocol is very different from the client-to-server one, which makes it harder to implement Kerberos in the quorum peer protocols. I don't think we'll have this in the 3.4.0 release. Maybe 3.5?

          Devaraj Das added a comment -

          Thanks for the security work on ZK, folks.

          I have a question - how is the quorum peer protocol different from the client-server protocol? Any rough estimate on the ETA for a patch on this issue?


            People

            • Assignee:
              Rakesh R
              Reporter:
              Eugene Koontz
            • Votes:
              2
              Watchers:
              23
