Hadoop HDFS

HDFS-2856: Fix block protocol so that Datanodes don't require root or jsvc

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: datanode, security
    • Labels: None

      Description

      Since we send the block tokens unencrypted to the datanode, we currently start the datanode as root using jsvc and get a secure (< 1024) port.

      If we have the datanode generate a nonce and send it on the connection, and the client sends back an HMAC of the nonce instead of the block token, it won't reveal any secrets. Thus, we wouldn't require a secure port, and would not require root or jsvc.
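      As a rough illustration of this challenge-response idea (not the actual HDFS implementation; the class and method names below are made up), the exchange could look like the following, keyed by the block token's shared secret:

      import java.security.MessageDigest;
      import java.security.SecureRandom;
      import javax.crypto.Mac;
      import javax.crypto.spec.SecretKeySpec;

      // Illustrative sketch only: the datanode issues a random nonce, and the
      // client answers with HMAC(blockTokenSecret, nonce) instead of sending
      // the block token itself, so no secret crosses the wire in the clear.
      public class NonceHandshakeSketch {

        private static final SecureRandom RANDOM = new SecureRandom();

        // Datanode side: generate a fresh nonce per connection.
        static byte[] newNonce() {
          byte[] nonce = new byte[16];
          RANDOM.nextBytes(nonce);
          return nonce;
        }

        // Client side: prove knowledge of the block token secret.
        static byte[] computeResponse(byte[] blockTokenSecret, byte[] nonce) throws Exception {
          Mac mac = Mac.getInstance("HmacSHA1");
          mac.init(new SecretKeySpec(blockTokenSecret, "HmacSHA1"));
          return mac.doFinal(nonce);
        }

        // Datanode side: recompute the HMAC and compare in constant time.
        static boolean verify(byte[] blockTokenSecret, byte[] nonce, byte[] response) throws Exception {
          byte[] expected = computeResponse(blockTokenSecret, nonce);
          return MessageDigest.isEqual(expected, response);
        }
      }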

      Attachments

      1. Datanode-Security-Design.pdf
        93 kB
        Chris Nauroth
      2. Datanode-Security-Design.pdf
        95 kB
        Chris Nauroth
      3. Datanode-Security-Design.pdf
        95 kB
        Chris Nauroth

          Activity

          Ram Marti added a comment -

          I am not sure this is quite correct.
          Let us recall the original issue:
          • The authenticated datanode process that binds to the port crashes.
          • Tasks launched by a malicious user and running on that datanode monitor for the crash, bind to the
            now-free port, and receive the data and the block access token.
          • Until the block token expires (configurable, but defaults to 10 hours), they can use that token to
            access data on other datanodes.

          This may be fixed by what you propose above. But consider the write case: the client sends the data unencrypted, and that data is available to whatever process is listening on the port.

          I think the only way you can remove this restriction is if you enable integrity and encryption on the channel.

          Todd Lipcon added a comment -

          Or handshake at the beginning of a write – we already have the DN send back a BlockOpResponseProto. We can authenticate the DN there before the client sends any private data.

          Devaraj Das added a comment -

          We considered this option back when we were trying to secure the datanode protocols. The problem with this approach is the increased number of round trips that the handshake would introduce at every hop in the write pipeline. We hadn't benchmarked this, though.

          Todd Lipcon added a comment -

          We do already wait for a BlockOpResponseProto before beginning to stream data as it is... I suppose it would preclude a future optimization where we start streaming data before getting a response, but that's not an optimization in effect today.

          Owen O'Malley added a comment -

          We should move the protocol to be a handshake-based one. Devaraj, the place where we rejected the handshake was the shuffle where the number of small connections is very high.

          Suresh Srinivas added a comment -

          As this requires protocol changes between client and datanode, it is good to get this into 2.0.5 beta, to ensure wire compatibility.

          Suresh Srinivas added a comment -

          Marking this as blocker for 2.0.5 beta.

          Chris Nauroth added a comment -

          I'm attaching a design document for establishing authentication of the datanode to the client. Feedback is welcome. I'm also reassigning the issue to myself.

          Owen O'Malley added a comment -

          Chris, can you update the document with a read path?

          We should also include the current timestamp from the client to the datanode (both directly and in the hmac) to make replay attacks harder.

          Chris Nauroth added a comment -

          Thanks, Owen. Here is a new version of the design doc.

          Chris, can you update the document with a read path?

          The preliminary handshake is the same, so I didn't clone all of that information to discuss readBlock. Instead, I added a statement that other operations like readBlock are similar in steps 1-6. If you still prefer to see a detailed section dedicated to readBlock, let me know, and I'll add it.

          We should also include the current timestamp from the client to the datanode (both directly and in the hmac) to make replay attacks harder.

          Good idea. I've changed step 3 to include client timestamp in the arguments and the calculation of client digest. I've changed step 4 so that the datanode checks client timestamp is within a threshold. I've changed step 5 to include client timestamp in calculation of server digest.
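          For concreteness, here is a minimal sketch of what the revised steps might look like, assuming HMAC-SHA1; the class and method names are illustrative, not taken from the design doc:

          import java.nio.charset.StandardCharsets;
          import javax.crypto.Mac;
          import javax.crypto.spec.SecretKeySpec;

          // Sketch of steps 3-5 with the client timestamp mixed into the digest,
          // plus the datanode-side skew check from step 4. Names are illustrative;
          // the skew threshold would be configurable on the datanode.
          public class TimestampedDigestSketch {

            // Step 3 (client): digest covers the nonce and the client timestamp.
            static byte[] clientDigest(byte[] secret, byte[] nonce, long clientTimestampMs)
                throws Exception {
              Mac mac = Mac.getInstance("HmacSHA1");
              mac.init(new SecretKeySpec(secret, "HmacSHA1"));
              mac.update(nonce);
              mac.update(Long.toString(clientTimestampMs).getBytes(StandardCharsets.UTF_8));
              return mac.doFinal();
            }

            // Step 4 (datanode): reject timestamps outside the allowed window.
            static boolean timestampWithinSkew(long clientTimestampMs, long maxSkewMs) {
              return Math.abs(System.currentTimeMillis() - clientTimestampMs) <= maxSkewMs;
            }
          }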

          Dilli Arumugam added a comment -

          Would suggest adding a config parameter on DataNode to define acceptable time skew on the client timestamp.

          Chris Nauroth added a comment -

          Uploading a new version of the design document with 2 changes:

          1. Mentioned that timestamp threshold is configurable. (Thank you, Dilli.)
          2. Stated more clearly on page 1 that the existing connection between datanode and namenode is already authenticated via Kerberos before giving the block key to the datanode. Therefore, if the datanode proves to the client that it has the block key, then the client knows that the datanode has authenticated. (Thank you, Sanjay.)

          Todd Lipcon added a comment -

          One question about this new protocol – it relies on the client and server addresses to prevent MITM type attacks. But many nodes are multi-homed, and in the case of cross-cluster communication there may even be NAT or SOCKS proxies in the way. Given that, a client may not know its own address (as seen by the datanode), and the address that the client is using to speak to the DN may not be the same one the DN has bound to.

          Instead, can we just use the DatanodeID and port of the target DN? This would still prevent a man-in-the-middle where the request is forwarded to a different DN. I'm not sure what value is provided by including the client's address in the digest.
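          As a sketch of that alternative (field and method names here are purely illustrative), the digest would be bound only to the target datanode's identity rather than to socket addresses:

          import java.nio.charset.StandardCharsets;
          import javax.crypto.Mac;
          import javax.crypto.spec.SecretKeySpec;

          // Illustrative: bind the HMAC to the target datanode's identity (its
          // DatanodeID/storage ID plus transfer port) instead of IP addresses,
          // so multi-homed or NAT'ed clients don't need to guess how the
          // datanode sees their source address.
          public class TargetBoundDigestSketch {
            static byte[] digest(byte[] secret, String datanodeId, int xferPort, byte[] nonce)
                throws Exception {
              Mac mac = Mac.getInstance("HmacSHA1");
              mac.init(new SecretKeySpec(secret, "HmacSHA1"));
              mac.update(nonce);
              mac.update((datanodeId + ":" + xferPort).getBytes(StandardCharsets.UTF_8));
              return mac.doFinal();
            }
          }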

          Dilli Arumugam added a comment -

          Not sure whether the following problem should be addressed outside the scope of this bug.
          But seems related and looks like a more serious security problem.

          In WebHDFS world, client submits DelegationToken to DataNode.

          If we fix the current problem but have WebHDFS on, we still have a bigger security problem.

          Aaron T. Myers added a comment -

          Thanks a lot for working on this issue, Chris. Two questions for you:

          1. In steps 5 and 6 of the proposed protocol, I think you may need to do an 's/block key/block access token/g'. As you have it currently, if the server digest returned by the DN is based on the block key directly, the client will not be able to recompute/verify the returned server digest, since the client does not know the block key. However, the client does know the block access token, and a properly authenticated DN will be able to recompute the block access token based on the block key it shares with the NN.
          2. Did you consider at all scrapping our custom authentication protocol and instead switching to using straight SASL MD5-DIGEST for the DataTransferProtocol? This is roughly what I did to add support for encrypting the DataTransferProtocol in HDFS-3637.
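          For reference on the second question, a rough sketch of what a DIGEST-MD5 negotiation on the data transfer connection could look like, loosely following the HDFS-3637 approach of deriving SASL credentials from the block access token. The username/password derivation, protocol, and server name below are assumptions, not the actual implementation:

          import java.io.IOException;
          import java.util.HashMap;
          import java.util.Map;
          import javax.security.auth.callback.Callback;
          import javax.security.auth.callback.CallbackHandler;
          import javax.security.auth.callback.NameCallback;
          import javax.security.auth.callback.PasswordCallback;
          import javax.security.auth.callback.UnsupportedCallbackException;
          import javax.security.sasl.RealmCallback;
          import javax.security.sasl.Sasl;
          import javax.security.sasl.SaslClient;

          // Sketch only: create a DIGEST-MD5 SaslClient whose credentials are
          // derived from the block access token (identifier as user name, token
          // password as SASL password), similar in spirit to HDFS-3637.
          public class SaslDataTransferSketch {
            static SaslClient newSaslClient(final String tokenIdentifier, final char[] tokenPassword)
                throws IOException {
              Map<String, String> props = new HashMap<String, String>();
              props.put(Sasl.QOP, "auth"); // "auth-conf" would additionally encrypt the stream

              CallbackHandler handler = new CallbackHandler() {
                @Override
                public void handle(Callback[] callbacks) throws UnsupportedCallbackException {
                  for (Callback cb : callbacks) {
                    if (cb instanceof NameCallback) {
                      ((NameCallback) cb).setName(tokenIdentifier);
                    } else if (cb instanceof PasswordCallback) {
                      ((PasswordCallback) cb).setPassword(tokenPassword);
                    } else if (cb instanceof RealmCallback) {
                      RealmCallback rc = (RealmCallback) cb;
                      rc.setText(rc.getDefaultText());
                    } else {
                      throw new UnsupportedCallbackException(cb);
                    }
                  }
                }
              };

              // Protocol and server name here are placeholders.
              return Sasl.createSaslClient(new String[] { "DIGEST-MD5" }, null,
                  "hdfs", "default", props, handler);
            }
          }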

          Suresh Srinivas added a comment -

          The changes for this jira need to be backward compatible. Given that, marking the priority as Major instead of Blocker.

          Chris Nauroth added a comment -

          Thanks for the comments, everyone. Let's discuss the SASL point first, because it could shift the design and make the specific questions about the proposed protocol change irrelevant.

          Did you consider at all scrapping our custom authentication protocol and instead switching to using straight SASL MD5-DIGEST for the DataTransferProtocol?

          Thanks for pointing out HDFS-3637. After further review of that patch, I see how we can iterate on that. I think it also has some benefits over the proposal that I posted: 1) consistency with authentication in the rest of the codebase, and 2) enabling encryption would defeat a man-in-the-middle attack without causing harm to intermediate proxy deployments like source address validation might cause. I'd like to explore the SASL solution further.

          The only potential downside I see is that if we ever pipeline multiple operations over a single connection, then we'd need to renegotiate SASL per operation, because the authorization decision may be different per block. This doesn't seem like an insurmountable problem though.

          I have a question about the compatibility impact of HDFS-3637. I see that an upgraded client can talk to an old cluster, and an old client can talk to an upgraded cluster if encryption is off. It looks like if it's an upgraded cluster and encryption is on, then DataXceiver will not run operations sent from unencrypted client connections, including connections initiated from an old client. This implies that all clients must be upgraded before it's safe to turn on encryption in the cluster. Do I understand correctly? If so, can we relax this logic a bit to allow for compatibility of an old client connected to an upgraded cluster with SASL on? The design doc proposed checking whether or not the datanode port is < 1024, and if so, then allow the old connection. The thinking here is that anyone continuing to run on a port < 1024 must still have a component that hasn't upgraded, so therefore it needs to support the old connection. Once datanode has been reconfigured to run on a port >= 1024, then all non-encrypted connections can be rejected.
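          In other words, roughly (illustrative names, not actual HDFS code):

          // Sketch of the proposed compatibility rule: a datanode still listening
          // on a privileged port (< 1024) tolerates legacy, non-SASL clients; once
          // it is reconfigured onto a port >= 1024, unauthenticated connections
          // are refused.
          public class LegacyConnectionPolicySketch {
            static boolean acceptConnection(int datanodeXferPort, boolean negotiatedSasl) {
              if (negotiatedSasl) {
                return true;                    // upgraded client: always accepted
              }
              return datanodeXferPort < 1024;   // legacy client: only while on a privileged port
            }
          }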

          Also, I wasn't sure about how the HDFS-3637 patch impacts compatibility for inter-datanode connections. Is it possible to have a mix of old and upgraded datanodes running, some with encryption on and some with encryption off, or does it require a coordinated push to turn on encryption across the whole cluster?

          We wanted to be conscious of backwards compatibility with this change, particularly for a rolling upgrade scenario.

          Daryn Sharp added a comment -

          I haven't digested the whole jira, but want to request more info about:

          The only potential downside I see is that if we ever pipeline multiple operations over a single connection, then we'd need to renegotiate SASL per operation, because the authorization decision may be different per block

          I've made some RPCv9 changes to allow the future possibility to multiplex connections. Will multiplexing help with this jira's use case? If so, SASL negotiation per operation should not be necessary as negotiation will occur per virtual stream.

          Chris Nauroth added a comment -

          Will multiplexing help with this jira's use case?

          My comment referred to the fact that block-level operations, like readBlock and writeBlock, require a unique authorization decision per block, using a different block access token for each one. If multiple readBlock/writeBlock calls were pipelined over a single connection, then we'd need to check authorization on each one. If authorization for DataTransferProtocol is moving fully to SASL, then this implies to me that we would need to renegotiate SASL at the start of each block-level operation.

          I don't see a way for multiplexing to help with this problem, because there would still be the problem that we don't know what block the client requested until we start inspecting the front of the message. I haven't followed the RPCv9 changes closely though, so if I'm misunderstanding, please let me know. Thanks, Daryn.


            People

            • Assignee: Chris Nauroth
            • Reporter: Owen O'Malley
            • Votes: 0
            • Watchers: 26
