  1. Hadoop HDFS
  2. HDFS-9644

Update encryption documentation to reflect nested EZs

    Details

    • Hadoop Flags: Reviewed
    1. HDFS-9644.01.patch
      15 kB
      Zhe Zhang
    2. HDFS-9644.00.patch
      14 kB
      Zhe Zhang

        Activity

        zhz Zhe Zhang added a comment -
         For transparent encryption, we introduce a new abstraction to HDFS: the *encryption zone*. An encryption zone is a special directory whose contents will be transparently encrypted upon write and transparently decrypted upon read. Each encryption zone is associated with a single *encryption zone key* which is specified when the zone is created. Each file within an encryption zone has its own unique *data encryption key (DEK)*. DEKs are never handled directly by HDFS. Instead, HDFS only ever handles an *encrypted data encryption key (EDEK)*. Clients decrypt an EDEK, and then use the subsequent DEK to read and write data. HDFS datanodes simply see a stream of encrypted bytes.
         
        +A very important use case of encryption is to "switch it on" and ensure all files across the entire filesystem are encrypted. To support this strong guarantee without losing the flexibility of using different encryption zone keys in different parts of the filesystem, HDFS allows *nested encryption zones*. After an encryption zone is created (e.g. on the root directory `/`), a user can create more encryption zones on its descendant directories (e.g. `/home/alice`) with different keys. The EDEK of a file will be generated using the encryption zone key from the lowest ancestor encryption zone.
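
The zone lookup described in the proposed paragraph can be illustrated with a minimal sketch. This is a toy model, not HDFS code: the function name `closest_encryption_zone`, the `zones` dictionary, and the key names are all hypothetical and chosen only to mirror the `/` and `/home/alice` example above.

```python
def closest_encryption_zone(path, zones):
    """Return (zone_root, key_name) for the nearest enclosing zone, or None.

    `zones` is a hypothetical mapping from a zone root directory to the
    name of its encryption zone key. This is an illustration only and
    does not reflect how the NameNode stores zones.
    """
    parts = [p for p in path.split("/") if p]
    # Walk from the path itself up towards the root, one component at a time.
    for depth in range(len(parts), -1, -1):
        candidate = "/" + "/".join(parts[:depth])
        if candidate in zones:
            return candidate, zones[candidate]
    return None


# Hypothetical layout: a zone on the root plus a nested zone with its own key.
zones = {"/": "cluster-key", "/home/alice": "alice-key"}

alice_zone = closest_encryption_zone("/home/alice/report.txt", zones)
bob_zone = closest_encryption_zone("/home/bob/notes.txt", zones)
```

In this model a file under `/home/alice` resolves to the nested zone and its key, while any other file falls back to the zone on `/`.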
        

        I plan to add the above (second paragraph in the snippet) to TransparentEncryption.md. Andrew Wang, could you take a look? I don't think anything else in the doc needs an update, since move validity is not discussed there anyway. Should we add a section for Trash support?

        andrew.wang Andrew Wang added a comment -

        Text looks good! A section on rename and trash also sounds great, thanks Zhe.

        zhz Zhe Zhang added a comment -

        Thanks Andrew! Attaching an updated patch with rename and trash support discussion. Also fixing the anchors in the markdown file.

        hadoopqa Hadoop QA added a comment -
        +1 overall

        | Vote | Subsystem | Runtime | Comment |
        | 0 | reexec | 0m 12s | Docker mode activated. |
        | +1 | @author | 0m 0s | The patch does not contain any @author tags. |
        | +1 | mvnsite | 0m 50s | trunk passed |
        | +1 | mvnsite | 0m 47s | the patch passed |
        | +1 | whitespace | 0m 0s | Patch has no whitespace issues. |
        | +1 | asflicense | 0m 21s | Patch does not generate ASF License warnings. |
        | | | 2m 22s | |

        | Subsystem | Report/Notes |
        | Docker | Image:yetus/hadoop:0ca8df7 |
        | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12787388/HDFS-9644.00.patch |
        | JIRA Issue | HDFS-9644 |
        | Optional Tests | asflicense mvnsite |
        | uname | Linux 929c55bb7e62 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
        | Build tool | maven |
        | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
        | git revision | trunk / aeb13ef |
        | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
        | Max memory used | 30MB |
        | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/14449/console |
        | Powered by | Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org |

        This message was automatically generated.

        andrew.wang Andrew Wang added a comment -

        Overall looks good, thanks for fixing the anchors. A few comments:

        • Rather than "lowest ancestor" I would say "closest ancestor" since trees can be drawn splaying upwards.
        • Recommend introducing the section with the rename restriction before explaining why, e.g. "HDFS restricts renames into and out of an encryption zone. This includes renames of unencrypted contents into...<give some examples>".
        • In "All file EDEKs under an encryption zone are generated with its encryption zone key.", change "generated" to "encrypted" and "its" to "the".
        • The rename restriction exists for security and ease of management. Imagine a situation where an EZ key is compromised. We want a way of identifying all potentially vulnerable files and re-encrypting them. This is easy if all files must remain within the EZ. It's hard if they can be scattered anywhere around the filesystem. We also store the EZ key version in the xattr, so there are no memory overhead savings.
        • "encryption zone status" is a new phrase and not used again, so I don't think we need to introduce it.
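
The reasoning in the review comments above (restrict renames so that every file stays inside its zone, which makes it easy to find the files affected by a compromised key) can be sketched with a simplified model. This is not the NameNode's actual check, which handles more cases; `zone_of`, `rename_allowed`, `files_using_key`, and the sample paths and key names are hypothetical.

```python
def zone_of(path, zones):
    """Return the root of the closest enclosing zone for `path`, or None."""
    parts = [p for p in path.split("/") if p]
    for depth in range(len(parts), -1, -1):
        candidate = "/" + "/".join(parts[:depth])
        if candidate in zones:
            return candidate
    return None


def rename_allowed(src, dst, zones):
    """Simplified rule: a rename must not cross an encryption zone boundary."""
    return zone_of(src, zones) == zone_of(dst, zones)


def files_using_key(files, zones, key_name):
    """List files whose closest zone uses `key_name`.

    Because renames cannot move files out of their zone in this model,
    this list covers every file that a compromised key could expose.
    """
    result = []
    for f in files:
        root = zone_of(f, zones)
        if root is not None and zones[root] == key_name:
            result.append(f)
    return result


# Hypothetical layout: one zone with a nested zone that uses a different key.
zones = {"/secure": "k1", "/secure/hr": "k2"}
files = ["/secure/a.txt", "/secure/hr/payroll.csv", "/tmp/scratch.log"]

exposed = files_using_key(files, zones, "k1")
```

Under this model, a rename within `/secure` is allowed, while a rename from `/tmp` into `/secure`, out of `/secure`, or from `/secure` into the nested `/secure/hr` zone is rejected.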
        zhz Zhe Zhang added a comment -

        Thanks Andrew! Great suggestions. I missed the logic of storing EZ key info in FileEncryptionInfo.

        Updating the patch to address these.

        hadoopqa Hadoop QA added a comment -
        +1 overall

        | Vote | Subsystem | Runtime | Comment |
        | 0 | reexec | 0m 16s | Docker mode activated. |
        | +1 | @author | 0m 0s | The patch does not contain any @author tags. |
        | +1 | mvnsite | 1m 10s | trunk passed |
        | +1 | mvnsite | 1m 7s | the patch passed |
        | +1 | whitespace | 0m 1s | Patch has no whitespace issues. |
        | +1 | asflicense | 0m 19s | Patch does not generate ASF License warnings. |
        | | | 3m 7s | |

        | Subsystem | Report/Notes |
        | Docker | Image:yetus/hadoop:0ca8df7 |
        | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12787532/HDFS-9644.01.patch |
        | JIRA Issue | HDFS-9644 |
        | Optional Tests | asflicense mvnsite |
        | uname | Linux df699c3d7270 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
        | Build tool | maven |
        | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
        | git revision | trunk / 23f937e |
        | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
        | Max memory used | 30MB |
        | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/14459/console |
        | Powered by | Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org |

        This message was automatically generated.

        andrew.wang Andrew Wang added a comment -

        LGTM, nice work here. +1.

        zhz Zhe Zhang added a comment -

        Thanks Andrew! Committed to trunk, branch-2, and branch-2.8.

        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-trunk-Commit #9295 (See https://builds.apache.org/job/Hadoop-trunk-Commit/9295/)
        HDFS-9644. Update encryption documentation to reflect nested EZs. (zhz) (zhz: rev b21bbe9ed1baae1a3b8b8dcb984f1d08930109a0)

        • hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/TransparentEncryption.md
        • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt

          People

          • Assignee:
            zhz Zhe Zhang
            Reporter:
            zhz Zhe Zhang