Hadoop Common / HADOOP-10768

Optimize Hadoop RPC encryption performance

    Details

    • Type: Improvement
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.0.0-alpha1
    • Fix Version/s: None
    • Component/s: performance, security
    • Labels: None
    • Target Version/s:

      Description

      Hadoop RPC encryption is enabled by setting hadoop.rpc.protection to "privacy". It uses the SASL GSSAPI and DIGEST-MD5 mechanisms for secure authentication and data protection. Although GSSAPI supports AES, it does not use AES-NI by default, so encryption is slow and becomes a bottleneck.

      After discussing with Aaron T. Myers, Alejandro Abdelnur and Uma Maheswara Rao G, we can apply the same optimization as in HDFS-6606: use AES-NI for a more than 20x speedup.

      On the other hand, RPC messages are small, but RPC is frequent and there may be lots of RPC calls in one connection, so we need to set up a benchmark to see the real improvement and then make a trade-off.
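      For reference, a minimal sketch of enabling this programmatically (the property name and value are real; setting it in core-site.xml is the usual route):

      import org.apache.hadoop.conf.Configuration;

      public class EnableRpcPrivacy {
          public static void main(String[] args) {
              // "privacy" maps to SASL QOP "auth-conf": authentication,
              // integrity checking and encryption of RPC payloads.
              Configuration conf = new Configuration();
              conf.set("hadoop.rpc.protection", "privacy");
              System.out.println(conf.get("hadoop.rpc.protection"));
          }
      }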

        Attachments

      1. HADOOP-10768.001.patch
        71 kB
        Dian Fu
      2. HADOOP-10768.002.patch
        76 kB
        Dian Fu
      3. Optimize Hadoop RPC encryption performance.pdf
        280 kB
        Dian Fu

        Issue Links

          Activity

          apurtell Andrew Purtell added a comment -

          Although GSSAPI supports AES, it does not use AES-NI by default, so encryption is slow and becomes a bottleneck.

          Java's GSSAPI uses JCE ciphers for crypto support. Would it be possible to simply swap in an accelerated provider like Diceros?
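          As an illustration of the provider-swap idea, a sketch that registers an accelerated JCE provider at the highest priority so that subsequent Cipher.getInstance lookups resolve to it first. The Diceros provider class name is an assumption and is left commented out:

          import javax.crypto.Cipher;

          public class ProviderSwap {
              public static void main(String[] args) throws Exception {
                  // Hypothetical: insert an accelerated provider (e.g. Diceros)
                  // at position 1 so it wins JCE lookups; the exact provider
                  // class name below is assumed, not verified.
                  // java.security.Security.insertProviderAt(
                  //     new com.intel.diceros.provider.DicerosProvider(), 1);
                  Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
                  System.out.println("AES handled by: " + c.getProvider().getName());
              }
          }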

          On the other hand, whether or not to wrap payloads using the SASL client or server is an application decision. One could wrap the initial payloads with whatever encryption was negotiated during connection initiation until completing additional key exchange and negotiation steps, then switch to an alternate means of applying a symmetric cipher to RPC payloads.
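          A rough sketch of that wrap-then-switch flow, assuming an already-negotiated javax.security.sasl.SaslClient with QOP "auth-conf" (the class and flag here are illustrative, not from any patch):

          import javax.security.sasl.SaslClient;
          import javax.security.sasl.SaslException;

          public class WrapThenSwitch {
              private final SaslClient sasl;
              private volatile boolean rekeyed = false; // set once the in-band key exchange completes

              WrapThenSwitch(SaslClient sasl) { this.sasl = sasl; }

              byte[] protect(byte[] payload) throws SaslException {
                  if (!rekeyed) {
                      // Early payloads: protected by whatever the SASL
                      // mechanism negotiated during connection initiation.
                      return sasl.wrap(payload, 0, payload.length);
                  }
                  // Afterwards: apply the alternate symmetric cipher directly.
                  return encryptWithNegotiatedCipher(payload);
              }

              private byte[] encryptWithNegotiatedCipher(byte[] p) {
                  throw new UnsupportedOperationException("sketch only");
              }
          }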

          On the other hand, RPC messages are small

          This is similar to an issue we had/have with HBase write-ahead log encryption: we need to encrypt on a per-entry boundary to avoid data loss during recovery, and each entry is small. You might think that small payloads mean we won't be able to increase throughput with accelerated crypto, and you would be right, but accelerated crypto still reduces CPU time substantially, with a proportional reduction in the latency introduced by cryptographic operations. I think for both the HBase WAL and Hadoop RPC, latency is a critical consideration.

          hitliuyi Yi Liu added a comment -

          Andrew Purtell, thanks for your comments; these are good ideas.

          Java's GSSAPI uses JCE ciphers for crypto support. Would it be possible to simply swap in an accelerated provider like Diceros?

          Right. We are also trying to improve RPCs that don't use Kerberos for authentication and data protection, e.g. those using delegation tokens. As for simply swapping in an accelerated provider, I'm still considering the details; I intend to resolve them together.

          On the other hand, whether or not to wrap payloads using the SASL client or server is an application decision. One could wrap the initial payloads with whatever encryption was negotiated during connection initiation until completing additional key exchange and negotiation steps, then switch to an alternate means of applying a symmetric cipher to RPC payloads.

          Right, I agree, it's a good idea.

          This is similar to an issue we had/have with HBase write-ahead log encryption: we need to encrypt on a per-entry boundary to avoid data loss during recovery, and each entry is small. You might think that small payloads mean we won't be able to increase throughput with accelerated crypto, and you would be right, but accelerated crypto still reduces CPU time substantially, with a proportional reduction in the latency introduced by cryptographic operations. I think for both the HBase WAL and Hadoop RPC, latency is a critical consideration.

          You have a point. I will also set up a benchmark for this.

          hitliuyi Yi Liu added a comment -

          I'm working on this. It seems that for services with heavy RPC traffic, like the NameNode, performance degrades significantly when encryption is enabled.
          I will show the performance benefits once the patch is ready.

          dian.fu Dian Fu added a comment -

          As discussed with Yi Liu offline, I'd like to pick up this JIRA. Attaching an initial patch for review.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 11s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          0 mvndep 0m 15s Maven dependency ordering for branch
          +1 mvninstall 6m 37s trunk passed
          +1 compile 5m 50s trunk passed with JDK v1.8.0_77
          +1 compile 6m 42s trunk passed with JDK v1.7.0_95
          +1 checkstyle 1m 13s trunk passed
          +1 mvnsite 2m 22s trunk passed
          +1 mvneclipse 0m 40s trunk passed
          +1 findbugs 5m 12s trunk passed
          +1 javadoc 2m 18s trunk passed with JDK v1.8.0_77
          +1 javadoc 3m 15s trunk passed with JDK v1.7.0_95
          0 mvndep 0m 14s Maven dependency ordering for patch
          +1 mvninstall 1m 58s the patch passed
          +1 compile 5m 42s the patch passed with JDK v1.8.0_77
          +1 cc 5m 42s the patch passed
          +1 javac 5m 42s the patch passed
          +1 compile 6m 36s the patch passed with JDK v1.7.0_95
          +1 cc 6m 36s the patch passed
          +1 javac 6m 36s the patch passed
          -1 checkstyle 1m 15s root: patch generated 52 new + 662 unchanged - 5 fixed = 714 total (was 667)
          +1 mvnsite 2m 21s the patch passed
          +1 mvneclipse 0m 41s the patch passed
          +1 whitespace 0m 0s Patch has no whitespace issues.
          +1 xml 0m 1s The patch has no ill-formed XML file.
          +1 findbugs 5m 47s the patch passed
          +1 javadoc 2m 16s the patch passed with JDK v1.8.0_77
          +1 javadoc 3m 12s the patch passed with JDK v1.7.0_95
          -1 unit 6m 38s hadoop-common in the patch failed with JDK v1.8.0_77.
          +1 unit 0m 50s hadoop-hdfs-client in the patch passed with JDK v1.8.0_77.
          -1 unit 67m 25s hadoop-hdfs in the patch failed with JDK v1.8.0_77.
          +1 unit 8m 2s hadoop-common in the patch passed with JDK v1.7.0_95.
          +1 unit 1m 2s hadoop-hdfs-client in the patch passed with JDK v1.7.0_95.
          -1 unit 64m 18s hadoop-hdfs in the patch failed with JDK v1.7.0_95.
          +1 asflicense 0m 27s Patch does not generate ASF License warnings.
          214m 59s



          Reason Tests
          JDK v1.8.0_77 Failed junit tests hadoop.metrics2.impl.TestGangliaMetrics
            hadoop.hdfs.TestHFlush
            hadoop.hdfs.server.datanode.TestDataNodeMetrics
            hadoop.hdfs.server.namenode.ha.TestFailureToReadEdits
            hadoop.hdfs.TestDFSStripedOutputStreamWithFailure
            hadoop.hdfs.shortcircuit.TestShortCircuitLocalRead
            hadoop.hdfs.server.datanode.TestFsDatasetCache
            hadoop.hdfs.TestReadStripedFileWithMissingBlocks
            hadoop.hdfs.server.balancer.TestBalancer
          JDK v1.8.0_77 Timed out junit tests org.apache.hadoop.hdfs.TestWriteReadStripedFile
            org.apache.hadoop.hdfs.TestReadStripedFileWithDecoding
          JDK v1.7.0_95 Failed junit tests hadoop.hdfs.TestHFlush
            hadoop.hdfs.server.blockmanagement.TestNodeCount
            hadoop.hdfs.server.datanode.TestBlockScanner
            hadoop.hdfs.TestDFSStripedOutputStreamWithFailure
            hadoop.hdfs.server.datanode.TestDataNodeMultipleRegistrations
            hadoop.hdfs.shortcircuit.TestShortCircuitLocalRead
            hadoop.hdfs.server.datanode.TestFsDatasetCache
            hadoop.hdfs.TestReadStripedFileWithMissingBlocks
            hadoop.hdfs.TestEncryptionZones
          JDK v1.7.0_95 Timed out junit tests org.apache.hadoop.hdfs.TestWriteReadStripedFile
            org.apache.hadoop.hdfs.TestReadStripedFileWithDecoding



          Subsystem Report/Notes
          Docker Image: yetus/hadoop:fbe3e86
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12798454/HADOOP-10768.001.patch
          JIRA Issue HADOOP-10768
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle cc xml
          uname Linux cf1e85acd556 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 35f0770
          Default Java 1.7.0_95
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_77 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HADOOP-Build/9079/artifact/patchprocess/diff-checkstyle-root.txt
          unit https://builds.apache.org/job/PreCommit-HADOOP-Build/9079/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common-jdk1.8.0_77.txt
          unit https://builds.apache.org/job/PreCommit-HADOOP-Build/9079/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_77.txt
          unit https://builds.apache.org/job/PreCommit-HADOOP-Build/9079/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_95.txt
          unit test logs https://builds.apache.org/job/PreCommit-HADOOP-Build/9079/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common-jdk1.8.0_77.txt https://builds.apache.org/job/PreCommit-HADOOP-Build/9079/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_77.txt https://builds.apache.org/job/PreCommit-HADOOP-Build/9079/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_95.txt
          JDK v1.7.0_95 Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/9079/testReport/
          modules C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs U: .
          Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/9079/console
          Powered by Apache Yetus 0.2.0 http://yetus.apache.org

          This message was automatically generated.

          dian.fu Dian Fu added a comment -

          Updated the patch to fix the checkstyle issues. The test failures aren't related to this patch: the failure of TestShortCircuitLocalRead is caused by HADOOP-12994, and the other failing tests pass in my local environment.

          drankye Kai Zheng added a comment -

          Thanks Dian Fu for working on and tackling this!
          I only took a quick look at the work. Some high-level questions so far:

          • Do you have a design doc that describes the requirements and the approach? I understand this was well discussed in the past, but such a doc would be good to summarize the discussion and bring fresh eyes to it.
          • I guess it's all about performance. Do you have any numbers to share?
          • What's the impact? Does it mean upgrading the RPC version? Will external clients still be able to talk to the server via SASL? How does this affect downstream components?
          • It looks like the work is mainly in the SASL layer. When Kerberos is enabled, will it still favor the GSSAPI mechanism? If not, or if it's bypassed, what encryption key is used and how is it obtained?
          • The patch looks rather large; the change covers crypto, the protocol, the SASL RPC client and server, data transfer and some misc. Would you break it down? This one can be the umbrella.

          Thanks again!

          dian.fu Dian Fu added a comment -

          Hi Kai Zheng,
          Thanks a lot for your comments. I have attached the design doc.

          I guess it's all about performance. Do you have any numbers to share?

          Correct. It's all about performance. I will post the performance data later today.

          What's the impact? Does it mean upgrading the RPC version? Will external clients still be able to talk to the server via SASL? How does this affect downstream components?

          The patch is backward compatible, so downstream components won't be affected. I also think there is no need to upgrade the RPC version.

          It looks like the work is mainly in the SASL layer. When Kerberos is enabled, will it still favor the GSSAPI mechanism? If not, or if it's bypassed, what encryption key is used and how is it obtained?

          It still relies on the GSSAPI or DIGEST-MD5 mechanism for authentication. At the end of the original SASL handshake, a pair of encryption keys is generated randomly by the RPC server and sent to the RPC client over the secure SASL channel.
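          A minimal sketch of that key-generation step (names are illustrative, not the actual patch API): the server creates two random AES keys, one per direction, which would then be serialized and delivered through saslServer.wrap(...) so they travel encrypted:

          import java.security.SecureRandom;
          import javax.crypto.KeyGenerator;
          import javax.crypto.SecretKey;

          public class RpcKeyExchange {
              public static SecretKey[] generateConnectionKeys() throws Exception {
                  KeyGenerator kg = KeyGenerator.getInstance("AES");
                  kg.init(128, new SecureRandom()); // 128-bit keys assumed
                  SecretKey clientToServer = kg.generateKey();
                  SecretKey serverToClient = kg.generateKey();
                  return new SecretKey[] { clientToServer, serverToClient };
              }
          }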

          Would you break it down? This one can be the umbrella.

          Good advice. I will split the patch to ease the review.

          drankye Kai Zheng added a comment -

          Thanks for the design doc and the clarifications. It looks like good work, Dian Fu!

          Comments on the doc:

          • It would be good to state clearly that this builds application-layer data encryption ABOVE SASL (not mixed into, or in the same layer as, SASL). Accordingly, you can simplify your flow diagram considerably by reducing it to only two steps: 1) SASL handshake; 2) Hadoop data encryption cipher negotiation. The illustrated 7 steps for SASL may be specific to GSSAPI; for other mechanisms it may be much simpler, and in any case we don't need to show it here.
          • Why do we need SaslCryptoCodec? What does it do? Maybe after the separate encryption negotiation is complete, we can create a CryptoOutputStream directly? (A sketch of that idea follows this list.)
          • Since we're taking the same approach as data transfer encryption (both do separate cipher negotiation and data encryption after and above SASL, one for file data, the other for RPC data), maybe we can mostly reuse the existing work? Did we go this way in the implementation, or is there any difference?
          • How are the encryption key(s) negotiated or determined? Does it consider the established session key from SASL if available? It seems a key pair is produced; how are the two keys used?
          • Do we hard-code the AES cipher to AES/CTR mode? Other modes like AES/GCM could presumably also be used.
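          On the CryptoOutputStream point above, a hedged sketch of what creating it directly could look like, reusing the existing org.apache.hadoop.crypto classes from data transfer encryption (the key and IV are assumed to come from the negotiation step; the 8192 buffer size is arbitrary):

          import java.io.OutputStream;
          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.crypto.CipherSuite;
          import org.apache.hadoop.crypto.CryptoCodec;
          import org.apache.hadoop.crypto.CryptoOutputStream;

          public class RpcStreamWrapper {
              public static OutputStream wrapForEncryption(
                      OutputStream out, byte[] key, byte[] iv) throws Exception {
                  // AES/CTR/NoPadding, the suite already used for HDFS
                  // data transfer encryption.
                  CryptoCodec codec = CryptoCodec.getInstance(
                          new Configuration(), CipherSuite.AES_CTR_NOPADDING);
                  return new CryptoOutputStream(out, codec, 8192, key, iv);
              }
          }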
          drankye Kai Zheng added a comment -

          This actually bypasses the inefficient SASL wrap/unwrap operations by providing an extra Hadoop layer above them, so it should be quite flexible for Hadoop. A further consideration is how to make that layer clean and also available to the rest of the ecosystem, since other projects like HBase don't use Hadoop IPC.

          Any thoughts?


            People

            • Assignee: dian.fu Dian Fu
            • Reporter: hitliuyi Yi Liu
            • Votes: 0
            • Watchers: 25
