Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.0.0-beta1
    • Component/s: encryption, kms
    • Labels:
      None

      Description

      Currently, when an encryption zone (EZ) key is rotated, it only takes effect on new EDEKs (encrypted data encryption keys). We should provide a way to re-encrypt existing EDEKs after the EZ key rotation, for improved security.
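For orientation, the admin workflow this feature enables looks roughly like the following. This is a hedged sketch — the key name and zone path are examples, and the commands reflect the CLI as eventually documented for Hadoop 3.0; they require a running cluster, so they are illustrative only:

```shell
# 1. Roll the encryption zone key to a new version:
hadoop key roll mykey

# 2. Re-encrypt all existing EDEKs in the zone with the new key version:
hdfs crypto -reencryptZone -start -path /zone

# 3. Monitor progress, or cancel if needed:
hdfs crypto -listReencryptionStatus
hdfs crypto -reencryptZone -cancel -path /zone
```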

      Attachments

      1. editsStored
        6 kB
        Xiao Chen
      2. HDFS-10899.01.patch
        163 kB
        Xiao Chen
      3. HDFS-10899.02.patch
        119 kB
        Xiao Chen
      4. HDFS-10899.03.patch
        121 kB
        Xiao Chen
      5. HDFS-10899.04.patch
        126 kB
        Xiao Chen
      6. HDFS-10899.05.patch
        133 kB
        Xiao Chen
      7. HDFS-10899.06.patch
        143 kB
        Xiao Chen
      8. HDFS-10899.07.patch
        144 kB
        Xiao Chen
      9. HDFS-10899.08.patch
        147 kB
        Xiao Chen
      10. HDFS-10899.09.patch
        154 kB
        Xiao Chen
      11. HDFS-10899.10.patch
        209 kB
        Xiao Chen
      12. HDFS-10899.10.wip.patch
        160 kB
        Xiao Chen
      13. HDFS-10899.11.patch
        209 kB
        Xiao Chen
      14. HDFS-10899.12.patch
        215 kB
        Xiao Chen
      15. HDFS-10899.13.patch
        227 kB
        Xiao Chen
      16. HDFS-10899.14.patch
        228 kB
        Xiao Chen
      17. HDFS-10899.15.patch
        241 kB
        Xiao Chen
      18. HDFS-10899.16.patch
        242 kB
        Xiao Chen
      19. HDFS-10899.17.patch
        244 kB
        Xiao Chen
      20. HDFS-10899.wip.2.patch
        88 kB
        Xiao Chen
      21. HDFS-10899.wip.patch
        106 kB
        Xiao Chen
      22. Re-encrypt edek design doc.pdf
        117 kB
        Xiao Chen
      23. Re-encrypt edek design doc V2.pdf
        156 kB
        Xiao Chen

        Issue Links

          Activity

          xiaochen Xiao Chen added a comment -

          Will post a design doc and more context soon.

          xiaochen Xiao Chen added a comment -

          Attaching a design doc. Many thanks to Andrew Wang for the offline reviews!

          sacharya Suraj Acharya added a comment -

          Hi Xiao,
          This is a good feature, and +1 for the feature itself.
          However, I have a couple of suggestions:

          • It would be great if this were made a REST API endpoint along with the other key creation/management REST APIs.
          • Maybe we can add a new parameter to the key roll REST API, saying something along the lines of "roll all the zones with the given key", or change it to:
            {
              "material"  : "<material>",
              "reencrypt" : "True",  // setting this to true will cause re-encryption to begin
              "zones"     : "a,c,b"  // this will start re-encryption in the order a -> c -> b
            }
            
          • Create an endpoint to check the status of reencryption.
          xiaochen Xiao Chen added a comment -

          Thanks for the comments Suraj Acharya.

          It would be great if this were made a REST API endpoint along with the other key creation/management REST APIs.

          Sure, this will be part of http://hadoop.apache.org/docs/r3.0.0-alpha1/hadoop-kms/index.html#KMS_HTTP_REST_API
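For reference, the existing key-rollover call in that REST API (per the KMS documentation linked above) looks like the following; the host, port, key name, and material are placeholders, and this requires a running KMS, so it is illustrative only:

```shell
# Roll key "mykey" to a new version via the KMS REST API:
curl -X POST -H "Content-Type: application/json" \
     -d '{"material": "<base64-encoded-material>"}' \
     "http://kms-host:16000/kms/v1/key/mykey"
```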

          Maybe we can add a new parameter to the key roll REST API, saying something along the lines of "roll all the zones with the given key"

          I have mixed feelings about this. As listed in the Alternatives of 'HDFS client-side', it is more admin-friendly. But it will also make the already not-so-friendly ACLs more complex.
          Also, in this example, to support the scenario where the admin accidentally missed a zone 'd', we still need the reencrypt command to re-encrypt it. So I'd like to leave out the rollover option for now, and focus on re-encrypt in this jira. We can choose to add the option, or maybe a reencrypt -key, as an improvement later.

          Create an endpoint to check the status of reencryption.

          Good idea; I was thinking of maybe adding this to the NN web UI. Will work on CLI checks first and hook up an endpoint afterwards.

          sacharya Suraj Acharya added a comment -

          I have mixed feelings about this. As listed in the Alternatives of 'HDFS client-side', it is more admin-friendly. But it will also make the already not-so-friendly ACLs more complex.

          Doesn't roll key already have an underlying generate EEK for warming up EDEKs? Since you mentioned re-encrypt will have the same ACLs as generate EEK, I think we should be okay. That was my thinking.

          Also, in this example, to support the scenario where the admin accidentally missed a zone 'd', we still need the reencrypt command to re-encrypt it.

          Regarding this, my thought was that they can use the reencrypt zone command/endpoint to re-encrypt zone 'd' (point 1).

          xiaochen Xiao Chen added a comment -

          Thanks Suraj for the discussions offline. The high-level conclusion was that re-encrypt, being an EEK op, is better controlled under the EEK ops, so we're sticking with generate_eek for now. Will post a WIP patch momentarily.

          xiaochen Xiao Chen added a comment - - edited

          WIP patch attached. The re-encrypt is functional, and all test cases from the doc are added. Appreciate any early feedback; thanks in advance!

          Work ongoing:

          • Cancel re-encrypt
          • Check re-encrypt progress
          • Support client to verify that files are re-encrypted
          • Perf testing and related tuning (including updating EZM to unlock when calling reencryptEncryptedKey, and maybe batching that)
          andrew.wang Andrew Wang added a comment -

          Thanks for working on this Xiao! I didn't give the whole patch a deep look, but here are some review comments to start it off. I didn't get to the KMS changes; it'd help to split those into a separate patch to make review easier.

          Comments as follows:

          • getShortUsage, are -path / -cancel / -verify exclusive operations? We should indicate this in the usage text if so. man git for instance looks like:
          usage: git [--version] [--help] [-C <path>] [-c name=value]
                     [--exec-path[=<path>]] [--html-path] [--man-path] [--info-path]
                     [-p | --paginate | --no-pager] [--no-replace-objects] [--bare]
                     [--git-dir=<path>] [--work-tree=<path>] [--namespace=<name>]
                     <command> [<args>]
          

          So <command> <path> would work here, with the long usage explaining the commands.

          • We have new functionality to specify times with units in configuration; want to use it here?
          • New keys should be documented in hdfs-default.xml also

          EZManager:

          • Some unused imports that I'm sure checkstyle will complain about
          • Typo: "filesReencryptedInCurentZone" -> "Current"
          • Can the new methods in EZManager be encapsulated as a class? Could use some javadoc about expected behavior also.
          • Empty finally case in the run method, can be deleted?
          • Thread should check if the NN is active or standby, typically we do this in FSNamesystem#startActiveServices and FSNamesystem#stopActiveServices
          • reencryptDir looks like it takes the writeLock the entire time while it's processing the files in a directory, including when talking to the KMS. We should not hold the lock while talking to the KMS.
          • reencryptDir is also a big method, refactor the innards of the for loop?
          • We also should do a logSync before we log the "Completed" message in reencryptEncryptionZoneInt, so we block first. Maybe elsewhere too.
          • Calling getListing will generate audit logs, we should be calling getListingInt instead. Also, we don't need a full HdfsFileStatus to reencrypt each file, so let's consider an INode-based traversal.
          • Structuring this as an iterator (and maybe also a visitor) would make the code more clear.
          • Recommend we rename updateReencryptStatus so it's clear that this is used during edit log / fsimage loading, or at least add some javadoc.
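To illustrate the lock-scoping point above (a minimal sketch with hypothetical names, not code from the patch): gather a batch of EDEKs under the write lock, release it for the slow remote KMS round trip, then re-take it to apply the results.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class BatchedReencryptSketch {
    private static final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Stand-in for the remote KMS re-encrypt call; must NOT run under the lock.
    static String reencryptEdek(String edek) {
        return edek + "'";
    }

    public static void main(String[] args) {
        List<String> batch = new ArrayList<>();
        lock.writeLock().lock();
        try {
            // 1. Under the lock: snapshot a batch of EDEKs from the namespace.
            batch.add("edek-1");
            batch.add("edek-2");
        } finally {
            lock.writeLock().unlock();
        }

        // 2. Outside the lock: the slow remote call (possibly batched).
        List<String> reencrypted = new ArrayList<>();
        for (String edek : batch) {
            reencrypted.add(reencryptEdek(edek));
        }

        lock.writeLock().lock();
        try {
            // 3. Re-take the lock to write the new EDEKs back.
            System.out.println(reencrypted);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```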

          ReencryptInfo:

          • Recommend we name ReencryptInfo something like "PendingReencryptionZones" or something to be more self documenting. The javadoc also could mention that this is basically a queue.
          • A small warning that hasZone will be O(n) if the JDK implementation is truly a queue; maybe we should use a LinkedHashMap instead? I don't think we use the deque nature.
          • How is the reencryptInfo in EZManager synchronized? I ask because it's a ConcurrentLinkedDeque, is there actually concurrency happening?
          xiaochen Xiao Chen added a comment -

          Thanks Andrew for taking the time to look at this, and great comments! Really appreciate it. I will accommodate them into the next patch.

          A few clarifications first:

          LinkedHashMap v.s. ConcurrentLinkedDeque ... is there actually concurrency happening?

          I thought of this too. Currently there's just the single background thread in the executor to process it, but later we likely want to multi-thread that for performance. Also, we don't use the value (just the key) of the LinkedHashMap. OTOH, I agree not using the deque nature is also bad...
          Also thought about hasZone being O(n), but thought it's okay since it's not a usual operation (only on new command submission / NN startup). Really wish there were a concurrent linked hash set... or at least a linked hash set.

          Can the new methods in EZManager be encapsulated as a class?

          You mean to have a separate class for the background thread, right? Sure, can do.

          andrew.wang Andrew Wang added a comment -

          <LinkedHashMap discussion>

          Makes sense, though at this point I'd prefer to choose a data structure for clarity rather than performance. We're likely going to be bottlenecked on KMS RPCs rather than this map, so even simple synchronization will probably be okay.

          It's also not so bad to just use a Map for a Set; the JDK HashSet, for instance, is actually backed by a map:

          http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/HashSet.java
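As an illustrative sketch (not code from the patch), the JDK's Collections.newSetFromMap turns any Map into a Set: backing it with a LinkedHashMap gives insertion order plus O(1) contains (versus an O(n) scan of a deque), and backing it with a ConcurrentHashMap gives a thread-safe set if multi-threaded processing is added later. The zone paths here are made-up examples.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class MapBackedSets {
    public static void main(String[] args) {
        // Insertion-ordered set with O(1) contains, backed by a LinkedHashMap.
        Set<String> pendingZones =
                Collections.newSetFromMap(new LinkedHashMap<String, Boolean>());
        pendingZones.add("/zoneA");
        pendingZones.add("/zoneC");
        pendingZones.add("/zoneB");

        // contains() is a hash lookup, unlike a linear scan of a deque.
        if (!pendingZones.contains("/zoneC")) {
            throw new AssertionError("expected /zoneC to be present");
        }

        // Iteration preserves submission order: A, C, B.
        System.out.println(String.join(",", pendingZones));

        // A concurrent set, should multi-threaded processing be added later.
        Set<String> concurrentZones =
                Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());
        concurrentZones.add("/zoneA");
        System.out.println(concurrentZones.contains("/zoneA"));
    }
}
```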

          xiaochen Xiao Chen added a comment -

          Thanks a lot Andrew Wang for the great reviews and also Dilaver for offline suggestions! Still wip, but attaching for review.

          What's changed in wip2:

          • Split kms related change to a subtask.
          • Addressed most comments, except some items left as TODO: command usage, write lock when talking to the KMS. Also, the iterator/visitor structuring isn't done, but the INode-based traversal looks better now.

          TODO items:

          • cancel / verify commands
          • better metrics
          • lock tuning and potential batching communication with kms, based on test results.
          • some additional tests for correctness (raised by Dilaver):
            • rm file while its EZ is being reencrypted
            • mv file while its EZ is being reencrypted
            • rm & recreate file while its EZ is being reencrypted
            • append to file while its EZ is being reencrypted
          xiaochen Xiao Chen added a comment -

          Uploading patch 1, ready for review. Note I included HADOOP-13827's patch 3 to kick off a pre-commit; please only review the HDFS code here.

          After a quick offline discussion with Andrew Wang, we decided to handle renames in another jira, to ease review.

          Some remaining items, mostly marked as TODO in the code too:

          • rename: HDFS-11203.
          • More testing against snapshots. (and add the append case above)
          • status command
          • Think more about the metrics (expose to jmx etc.)
          • Better throttling logic. Also, batched contact with the KMS may or may not be needed, pending some benchmark runs. I'll do it this week.

          Appreciate your feedback!

          andrew.wang Andrew Wang added a comment -

          I was just looking at HADOOP-13827, and noticed that it also tries to drain the key caches when a key is rolled.

          Question, how will admins know when they can start reencryption after rolling a key? This is also complicated because there can be multiple KMSs serving requests, and only one of them sees the rollNewVersion call, and we have layers of caches.

          xiaochen Xiao Chen added a comment -

          Good question. My intention was that after a hadoop key roll returns, an admin can safely hdfs crypto reencrypt.

          KMS historically didn't care about this, and hence not all caches may be invalidated. HADOOP-13827 fixes the client side to drain all KMSCPs, but only 1 server would be drained. To drain all the servers, I think we need to add an interface to the server to explicitly do that. Given that KMS servers aren't aware of each other, this seems to be the only reasonable way. (And only drain the client after the servers are all drained. The new drain interface can be controlled under the MANAGEMENT ACL, which currently controls rollNewVersion.)
          Thoughts?

          andrew.wang Andrew Wang added a comment -

          I think that will work, but do we also need a drain command for the cache in the NN's client?

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 15s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 5 new or modified test files.
          0 mvndep 1m 37s Maven dependency ordering for branch
          +1 mvninstall 6m 46s trunk passed
          +1 compile 9m 35s trunk passed
          +1 checkstyle 1m 53s trunk passed
          +1 mvnsite 3m 3s trunk passed
          +1 mvneclipse 1m 11s trunk passed
          +1 findbugs 5m 8s trunk passed
          +1 javadoc 2m 15s trunk passed
          0 mvndep 0m 15s Maven dependency ordering for patch
          -1 mvninstall 0m 24s hadoop-hdfs in the patch failed.
          -1 compile 1m 36s root in the patch failed.
          -1 cc 1m 36s root in the patch failed.
          -1 javac 1m 36s root in the patch failed.
          -0 checkstyle 1m 47s root: The patch generated 29 new + 1694 unchanged - 25 fixed = 1723 total (was 1719)
          -1 mvnsite 0m 26s hadoop-hdfs in the patch failed.
          +1 mvneclipse 0m 43s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 xml 0m 2s The patch has no ill-formed XML file.
          -1 findbugs 0m 25s hadoop-hdfs in the patch failed.
          +1 javadoc 1m 45s the patch passed
          +1 unit 8m 16s hadoop-common in the patch passed.
          +1 unit 2m 5s hadoop-kms in the patch passed.
          +1 unit 0m 55s hadoop-hdfs-client in the patch passed.
          -1 unit 0m 25s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 17s The patch does not generate ASF License warnings.
          58m 39s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:a9ad5d6
          JIRA Issue HDFS-10899
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12841699/HDFS-10899.01.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle cc xml
          uname Linux f7097d7577cb 3.13.0-93-generic #140-Ubuntu SMP Mon Jul 18 21:21:05 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 8e63fa9
          Default Java 1.8.0_111
          findbugs v3.0.0
          mvninstall https://builds.apache.org/job/PreCommit-HDFS-Build/17770/artifact/patchprocess/patch-mvninstall-hadoop-hdfs-project_hadoop-hdfs.txt
          compile https://builds.apache.org/job/PreCommit-HDFS-Build/17770/artifact/patchprocess/patch-compile-root.txt
          cc https://builds.apache.org/job/PreCommit-HDFS-Build/17770/artifact/patchprocess/patch-compile-root.txt
          javac https://builds.apache.org/job/PreCommit-HDFS-Build/17770/artifact/patchprocess/patch-compile-root.txt
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/17770/artifact/patchprocess/diff-checkstyle-root.txt
          mvnsite https://builds.apache.org/job/PreCommit-HDFS-Build/17770/artifact/patchprocess/patch-mvnsite-hadoop-hdfs-project_hadoop-hdfs.txt
          findbugs https://builds.apache.org/job/PreCommit-HDFS-Build/17770/artifact/patchprocess/patch-findbugs-hadoop-hdfs-project_hadoop-hdfs.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/17770/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/17770/testReport/
          modules C: hadoop-common-project/hadoop-common hadoop-common-project/hadoop-kms hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs U: .
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/17770/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          xiaochen Xiao Chen added a comment - - edited

          Oops, I was fixing some last-minute checkstyle issues, which turned out to break compilation. Re-uploaded patch 1.

          After some offline sync with Andrew, created HDFS-11210 to handle key rolling end-to-end.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 17s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 5 new or modified test files.
          0 mvndep 0m 16s Maven dependency ordering for branch
          +1 mvninstall 8m 37s trunk passed
          +1 compile 11m 6s trunk passed
          +1 checkstyle 1m 53s trunk passed
          +1 mvnsite 3m 19s trunk passed
          +1 mvneclipse 1m 10s trunk passed
          +1 findbugs 5m 20s trunk passed
          +1 javadoc 2m 29s trunk passed
          0 mvndep 0m 17s Maven dependency ordering for patch
          +1 mvninstall 2m 27s the patch passed
          +1 compile 9m 42s the patch passed
          +1 cc 9m 42s the patch passed
          +1 javac 9m 42s the patch passed
          -0 checkstyle 2m 10s root: The patch generated 31 new + 1693 unchanged - 25 fixed = 1724 total (was 1718)
          +1 mvnsite 3m 24s the patch passed
          +1 mvneclipse 1m 21s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 xml 0m 3s The patch has no ill-formed XML file.
          -1 findbugs 2m 2s hadoop-hdfs-project/hadoop-hdfs generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0)
          +1 javadoc 2m 21s the patch passed
          +1 unit 9m 34s hadoop-common in the patch passed.
          +1 unit 2m 22s hadoop-kms in the patch passed.
          +1 unit 1m 7s hadoop-hdfs-client in the patch passed.
          -1 unit 94m 35s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 45s The patch does not generate ASF License warnings.
          172m 19s



          Reason Tests
          FindBugs module:hadoop-hdfs-project/hadoop-hdfs
            new org.apache.hadoop.hdfs.protocol.ReencryptionZonesStatus(ReencryptionZonesStatus) invokes inefficient new String(String) constructor At ReencryptionZonesStatus.java:String(String) constructor At ReencryptionZonesStatus.java:[line 53]
            Suspicious comparison of Long references in org.apache.hadoop.hdfs.server.namenode.EncryptionZoneManager.loadReencryptComplete(Long) At EncryptionZoneManager.java:in org.apache.hadoop.hdfs.server.namenode.EncryptionZoneManager.loadReencryptComplete(Long) At EncryptionZoneManager.java:[line 536]
          Failed junit tests hadoop.hdfs.TestDFSInotifyEventInputStream
            hadoop.hdfs.server.namenode.TestCheckpoint
            hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer
            hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure
            hadoop.cli.TestCryptoAdminCLI
            hadoop.hdfs.TestEncryptionZones
            hadoop.hdfs.server.blockmanagement.TestReplicationPolicy
            hadoop.hdfs.TestEncryptionZonesWithKMS



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:a9ad5d6
          JIRA Issue HDFS-10899
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12841894/HDFS-10899.01.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle cc xml
          uname Linux b8ebe3ca9c25 3.13.0-96-generic #143-Ubuntu SMP Mon Aug 29 20:15:20 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 43cb167
          Default Java 1.8.0_111
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/17778/artifact/patchprocess/diff-checkstyle-root.txt
          findbugs https://builds.apache.org/job/PreCommit-HDFS-Build/17778/artifact/patchprocess/new-findbugs-hadoop-hdfs-project_hadoop-hdfs.html
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/17778/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/17778/testReport/
          modules C: hadoop-common-project/hadoop-common hadoop-common-project/hadoop-kms hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs U: .
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/17778/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.
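The two FindBugs warnings in the report above both stem from well-known Java pitfalls. A minimal standalone illustration (class name is illustrative, not part of the patch): `==` on boxed `Long` values compares object identity rather than value, and `new String(String)` makes a redundant copy of an immutable object.

```java
public class FindbugsIllustration {
    public static void main(String[] args) {
        // "Suspicious comparison of Long references": outside the
        // [-128, 127] boxing cache, autoboxing yields distinct objects,
        // so == compares identity, not value.
        Long a = 1000L, b = 1000L;
        System.out.println(a == b);      // false (identity comparison; 1000 is outside the boxing cache)
        System.out.println(a.equals(b)); // true  (value comparison)

        // "invokes inefficient new String(String) constructor":
        // Strings are immutable, so the copy buys nothing.
        String s = "zone";
        String copy = new String(s); // needless allocation
        String same = s;             // sufficient
        System.out.println(copy.equals(same)); // true
    }
}
```

This is why the fix for the `loadReencryptComplete(Long)` warning is typically `equals()` (or unboxing to `long`) rather than `==`.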

          xiaochen Xiao Chen added a comment -

          Rebased on latest trunk, and fixed precommits.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 20s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
          0 mvndep 0m 32s Maven dependency ordering for branch
          +1 mvninstall 13m 6s trunk passed
          +1 compile 1m 22s trunk passed
          +1 checkstyle 0m 49s trunk passed
          +1 mvnsite 1m 22s trunk passed
          +1 mvneclipse 0m 25s trunk passed
          +1 findbugs 3m 7s trunk passed
          +1 javadoc 0m 58s trunk passed
          0 mvndep 0m 6s Maven dependency ordering for patch
          +1 mvninstall 1m 15s the patch passed
          +1 compile 1m 19s the patch passed
          +1 cc 1m 19s the patch passed
          +1 javac 1m 19s the patch passed
          -0 checkstyle 0m 47s hadoop-hdfs-project: The patch generated 29 new + 1529 unchanged - 2 fixed = 1558 total (was 1531)
          +1 mvnsite 1m 17s the patch passed
          +1 mvneclipse 0m 21s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 xml 0m 2s The patch has no ill-formed XML file.
          +1 findbugs 3m 20s the patch passed
          +1 javadoc 0m 55s the patch passed
          +1 unit 0m 54s hadoop-hdfs-client in the patch passed.
          -1 unit 67m 15s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 21s The patch does not generate ASF License warnings.
          101m 28s



          Reason Tests
          Failed junit tests hadoop.hdfs.server.blockmanagement.TestReplicationPolicy
            hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer
            hadoop.hdfs.TestDFSInotifyEventInputStream
            hadoop.hdfs.server.namenode.TestCheckpoint



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:a9ad5d6
          JIRA Issue HDFS-10899
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12844832/HDFS-10899.02.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle cc xml
          uname Linux 799095a23714 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 9262797
          Default Java 1.8.0_111
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/17960/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/17960/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/17960/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/17960/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 16s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 3 new or modified test files.
          0 mvndep 0m 23s Maven dependency ordering for branch
          +1 mvninstall 14m 11s trunk passed
          +1 compile 1m 30s trunk passed
          +1 checkstyle 0m 48s trunk passed
          +1 mvnsite 1m 30s trunk passed
          +1 mvneclipse 0m 28s trunk passed
          +1 findbugs 3m 40s trunk passed
          +1 javadoc 1m 8s trunk passed
          0 mvndep 0m 8s Maven dependency ordering for patch
          +1 mvninstall 1m 37s the patch passed
          +1 compile 1m 42s the patch passed
          +1 cc 1m 42s the patch passed
          +1 javac 1m 42s the patch passed
          -0 checkstyle 0m 52s hadoop-hdfs-project: The patch generated 14 new + 1571 unchanged - 2 fixed = 1585 total (was 1573)
          +1 mvnsite 1m 44s the patch passed
          +1 mvneclipse 0m 26s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 xml 0m 3s The patch has no ill-formed XML file.
          +1 findbugs 4m 22s the patch passed
          +1 javadoc 1m 9s the patch passed
          +1 unit 0m 55s hadoop-hdfs-client in the patch passed.
          -1 unit 66m 38s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 25s The patch does not generate ASF License warnings.
          105m 33s



          Reason Tests
          Failed junit tests hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer
            hadoop.hdfs.TestDataTransferKeepalive



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:a9ad5d6
          JIRA Issue HDFS-10899
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12844842/HDFS-10899.02.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle cc xml
          uname Linux a61e906a21d4 3.13.0-105-generic #152-Ubuntu SMP Fri Dec 2 15:37:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 9262797
          Default Java 1.8.0_111
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/17963/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/17963/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/17963/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/17963/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          xiaochen Xiao Chen added a comment -

          Dilaver had a pretty good offline review of patch 2. Attaching #3 to address the following comments:

          • ReencryptionZonesStatus.reencryptRequests declared type should probably be List instead of Collection considering "it should preserve the order".
          • Should ReencryptionZonesStatus set lastFileProcessed to null upon removeZone when the removed zone is the current zone, instead of the callers invoking both removeZone() and setLastFileProcessed() in tandem?
          • Rename flipPauseForTesting to separate methods (pauseForTesting and resumeFromTestPause)?
          • Make ReencryptionHandler#pauseForTesting() synchronized instead of synchronized block in the method? That way the second log statement will be guaranteed to be executed together with the preceding statements.
          • Add a max retry and terminate re-encrypt thread if keyprovider is still null.
          • misplaced log statement "{}({}) is a nested EZ, skipping for re-encrypt"
          • INodeDirectory.nextChild() will return 0 if startAfter has length 0 so there doesn't seem to be a need for the if.
          • While re-encryption is single threaded (at least for now), would it be more appropriate to create a ThreadFactory for a given instance of EncryptionZoneManager instead of creating a new one for every invocation of EncryptionZoneManager#startReencryptThread(), especially considering the logged thread names will overlap (if/when there are multiple threads)?
          • EncryptionZoneManager#removeEncryptionZone(): make the logging clear and unconditional.
          • Missing documentation for EncryptionZoneManager#reencryptEncryptionZone, #cancelReencryptEncryptionZone, #isEncryptionZoneRoot, #getIdRootEncryptionZone.
          • In EncryptionZoneManager#reencryptEncryptionZone(), unnecessary break in String constant (in throw).
          • EncryptionZoneManager#loadReencryptStatus(): why no null check for zoneId in the else? Note that LinkedHashSet doc says it allows null elements. Move the zoneId null check before the if?
          • In FSNamesystem, use this.dir.ezManager instead of dir.ezManager to match the surrounding style for setProvider?

          More coming...
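On the loadReencryptStatus point above: LinkedHashSet does indeed accept null elements silently, so a null zoneId would slip into the set without any exception, which is why the null check needs to happen before insertion. A minimal standalone demonstration (class and variable names are illustrative, not the actual EncryptionZoneManager code):

```java
import java.util.LinkedHashSet;

public class NullInSet {
    public static void main(String[] args) {
        LinkedHashSet<Long> zoneIds = new LinkedHashSet<>();
        // add(null) succeeds silently; no exception is thrown.
        System.out.println(zoneIds.add(null));      // true
        System.out.println(zoneIds.contains(null)); // true
    }
}
```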

          xiaochen Xiao Chen added a comment -

          During the offline discussion, Dilaver also asked about the progress-restoring logic (the startAfter usage). It turns out that in my conversion from getListing to inode-based traversal, this wasn't handled correctly.

          Fixed it in patch 4, and added a unit test TestEncryptionZones#testRestartDuringReencrypt for that. Hopefully review will be a tad easier by diffing patch 3 vs. patch 4.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 17s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 3 new or modified test files.
          0 mvndep 0m 34s Maven dependency ordering for branch
          +1 mvninstall 14m 9s trunk passed
          +1 compile 1m 22s trunk passed
          +1 checkstyle 0m 51s trunk passed
          +1 mvnsite 1m 23s trunk passed
          +1 mvneclipse 0m 27s trunk passed
          +1 findbugs 3m 9s trunk passed
          +1 javadoc 1m 1s trunk passed
          0 mvndep 0m 7s Maven dependency ordering for patch
          +1 mvninstall 1m 15s the patch passed
          +1 compile 1m 19s the patch passed
          +1 cc 1m 19s the patch passed
          +1 javac 1m 19s the patch passed
          -0 checkstyle 0m 48s hadoop-hdfs-project: The patch generated 17 new + 1572 unchanged - 2 fixed = 1589 total (was 1574)
          +1 mvnsite 1m 16s the patch passed
          +1 mvneclipse 0m 20s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 xml 0m 3s The patch has no ill-formed XML file.
          -1 findbugs 1m 51s hadoop-hdfs-project/hadoop-hdfs generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
          +1 javadoc 0m 56s the patch passed
          +1 unit 0m 59s hadoop-hdfs-client in the patch passed.
          -1 unit 92m 7s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 21s The patch does not generate ASF License warnings.
          127m 29s



          Reason Tests
          FindBugs module:hadoop-hdfs-project/hadoop-hdfs
            Found reliance on default encoding in org.apache.hadoop.hdfs.server.namenode.ReencryptionHandler.restoreReencryptFromProgress(Long, List, NavigableMap):in org.apache.hadoop.hdfs.server.namenode.ReencryptionHandler.restoreReencryptFromProgress(Long, List, NavigableMap): new String(byte[]) At ReencryptionHandler.java:[line 239]
          Failed junit tests hadoop.hdfs.tools.TestDFSZKFailoverController
            hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
            hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer
            hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication
            hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA
            hadoop.hdfs.server.namenode.TestReconstructStripedBlocks
            hadoop.hdfs.server.namenode.ha.TestStandbyCheckpoints
            hadoop.hdfs.server.namenode.ha.TestSeveralNameNodes
            hadoop.cli.TestCryptoAdminCLI
            hadoop.hdfs.tools.TestDFSHAAdminMiniCluster
            hadoop.hdfs.server.namenode.ha.TestHAStateTransitions



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:a9ad5d6
          JIRA Issue HDFS-10899
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12845960/HDFS-10899.04.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle cc xml
          uname Linux b89a713f3302 3.13.0-96-generic #143-Ubuntu SMP Mon Aug 29 20:15:20 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 4a659ff
          Default Java 1.8.0_111
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/18047/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project.txt
          findbugs https://builds.apache.org/job/PreCommit-HDFS-Build/18047/artifact/patchprocess/new-findbugs-hadoop-hdfs-project_hadoop-hdfs.html
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/18047/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/18047/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/18047/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          dilaver Dilaver added a comment -

          Thanks for the changes Xiao. Some comments about patch 04:

          LOG.info("{} removed from re-encrypt queue {} when removing the zone.",
          result ? "Also " : "Did not", inode.getId(), reencryptionStatus);

          In EncryptionZoneManager#removeEncryptionZone:

          • the format string has 2 placeholders, but 3 args
          • I'd suggest moving "removed" into the arg as ? "Also removed" : "Did not remove"
          • it may be useful to log the inode name or path in addition to or instead of the Id
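To make the placeholder mismatch concrete: SLF4J silently drops extra arguments when a format string has fewer `{}` placeholders than arguments. A hedged, self-contained sketch of the corrected call is below; the `format` helper is only a stand-in for SLF4J's `{}` substitution, and the argument values are made up for illustration.

```java
public class LogPlaceholderDemo {

    // Minimal stand-in for SLF4J's "{}" substitution, for illustration only.
    // Each "{}" in the message consumes one argument, left to right.
    public static String format(String msg, Object... args) {
        StringBuilder sb = new StringBuilder();
        int argIdx = 0, from = 0, at;
        while ((at = msg.indexOf("{}", from)) >= 0 && argIdx < args.length) {
            sb.append(msg, from, at).append(args[argIdx++]);
            from = at + 2;
        }
        return sb.append(msg.substring(from)).toString();
    }

    public static void main(String[] args) {
        boolean removed = true;                       // stand-in for "result"
        long inodeId = 16386L;                        // made-up inode id
        String status = "ReencryptionStatus[zones=1]"; // made-up toString()
        // Three placeholders for three arguments, with the verb folded
        // into the conditional argument as suggested above.
        System.out.println(format(
            "{} inode {} from re-encrypt queue {} when removing the zone.",
            removed ? "Also removed" : "Did not remove", inodeId, status));
    }
}
```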

          // Verify the same is true after NN restart

          Not sure if this comment is needed (TestEncryptionZones#restartClusterDisableReencrypt).

          Typos in ReencryptionHandler#run: "reties", "maxRetires".

          xiaochen Xiao Chen added a comment -

          Thanks a lot for the prompt review, Dilaver.

          Patch addressed the comments. Also moved the reencryptionStatus variable from EZM into ReencryptionHandler, to keep the information more contained.

          Fixed the unit test failures, and cleaned up TestEncryptionZones a little bit.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 16s Docker mode activated.
          0 shelldocs 0m 1s Shelldocs was not available.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 8 new or modified test files.
          0 mvndep 0m 8s Maven dependency ordering for branch
          +1 mvninstall 25m 51s trunk passed
          +1 compile 1m 58s trunk passed
          +1 checkstyle 1m 4s trunk passed
          +1 mvnsite 1m 45s trunk passed
          +1 mvneclipse 0m 34s trunk passed
          +1 findbugs 3m 35s trunk passed
          +1 javadoc 1m 12s trunk passed
          0 mvndep 0m 9s Maven dependency ordering for patch
          +1 mvninstall 1m 33s the patch passed
          +1 compile 1m 40s the patch passed
          +1 cc 1m 40s the patch passed
          +1 javac 1m 40s the patch passed
          -0 checkstyle 0m 59s hadoop-hdfs-project: The patch generated 15 new + 1699 unchanged - 3 fixed = 1714 total (was 1702)
          +1 mvnsite 1m 46s the patch passed
          +1 mvneclipse 0m 26s the patch passed
          +1 shellcheck 0m 12s The patch generated 0 new + 106 unchanged - 1 fixed = 106 total (was 107)
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 xml 0m 4s The patch has no ill-formed XML file.
          +1 findbugs 3m 52s the patch passed
          +1 javadoc 1m 1s the patch passed
          +1 unit 1m 0s hadoop-hdfs-client in the patch passed.
          -1 unit 70m 26s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 24s The patch does not generate ASF License warnings.
          121m 38s



          Reason Tests
          Failed junit tests hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA
            hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer
            hadoop.hdfs.TestClientProtocolForPipelineRecovery
            hadoop.hdfs.server.namenode.TestNamenodeRetryCache



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:a9ad5d6
          JIRA Issue HDFS-10899
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12846150/HDFS-10899.05.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle cc xml shellcheck shelldocs
          uname Linux 17ae7fde37bf 3.13.0-105-generic #152-Ubuntu SMP Fri Dec 2 15:37:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / a59df15
          Default Java 1.8.0_111
          shellcheck v0.4.5
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/18102/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/18102/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/18102/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/18102/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          xiaochen Xiao Chen added a comment -

          Patch 6 fixes the failed tests. There was a bug where the retry cache RPC ID wasn't handled correctly in the edit log.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 34s Docker mode activated.
          0 shelldocs 0m 0s Shelldocs was not available.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 11 new or modified test files.
          0 mvndep 0m 9s Maven dependency ordering for branch
          +1 mvninstall 16m 8s trunk passed
          +1 compile 2m 8s trunk passed
          +1 checkstyle 1m 26s trunk passed
          +1 mvnsite 2m 2s trunk passed
          +1 mvneclipse 0m 41s trunk passed
          +1 findbugs 4m 13s trunk passed
          +1 javadoc 1m 27s trunk passed
          0 mvndep 0m 9s Maven dependency ordering for patch
          +1 mvninstall 1m 49s the patch passed
          +1 compile 1m 52s the patch passed
          +1 cc 1m 52s the patch passed
          +1 javac 1m 52s the patch passed
          -0 checkstyle 1m 12s hadoop-hdfs-project: The patch generated 17 new + 1778 unchanged - 5 fixed = 1795 total (was 1783)
          +1 mvnsite 1m 53s the patch passed
          +1 mvneclipse 0m 33s the patch passed
          +1 shellcheck 0m 14s The patch generated 0 new + 106 unchanged - 1 fixed = 106 total (was 107)
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 xml 0m 5s The patch has no ill-formed XML file.
          +1 findbugs 4m 22s the patch passed
          +1 javadoc 1m 13s the patch passed
          +1 unit 1m 12s hadoop-hdfs-client in the patch passed.
          -1 unit 115m 45s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 24s The patch does not generate ASF License warnings.
          161m 38s



          Reason Tests
          Failed junit tests hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer
            hadoop.hdfs.TestAclsEndToEnd
            hadoop.hdfs.TestDFSStripedOutputStreamWithFailure030
          Timed out junit tests org.apache.hadoop.hdfs.TestLeaseRecovery2
            org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:a9ad5d6
          JIRA Issue HDFS-10899
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12849374/HDFS-10899.06.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle cc xml shellcheck shelldocs
          uname Linux 3ca2b9c9b712 3.13.0-96-generic #143-Ubuntu SMP Mon Aug 29 20:15:20 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / a7463b6
          Default Java 1.8.0_121
          shellcheck v0.4.5
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/18259/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/18259/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/18259/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/18259/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          xiaochen Xiao Chen added a comment -

          The failed tests are unrelated and pass locally, except for the OEV test, where I had added a wrong editsStored binary locally... Patch 6 again.

          xiaochen Xiao Chen added a comment -

          Patch 7 rebases to latest trunk, and adds a test about snapshots.
          The editsStored binary can't be picked up by Jenkins, so precommit always fails on it. The file is attached and passes locally.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 13s Docker mode activated.
          0 shelldocs 0m 0s Shelldocs was not available.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 11 new or modified test files.
          0 mvndep 0m 29s Maven dependency ordering for branch
          +1 mvninstall 13m 54s trunk passed
          +1 compile 1m 30s trunk passed
          +1 checkstyle 1m 2s trunk passed
          +1 mvnsite 1m 44s trunk passed
          +1 mvneclipse 0m 33s trunk passed
          +1 findbugs 3m 34s trunk passed
          +1 javadoc 1m 9s trunk passed
          0 mvndep 0m 8s Maven dependency ordering for patch
          +1 mvninstall 1m 28s the patch passed
          +1 compile 1m 36s the patch passed
          +1 cc 1m 36s the patch passed
          +1 javac 1m 36s the patch passed
          -0 checkstyle 0m 56s hadoop-hdfs-project: The patch generated 18 new + 1779 unchanged - 5 fixed = 1797 total (was 1784)
          +1 mvnsite 1m 33s the patch passed
          +1 mvneclipse 0m 25s the patch passed
          +1 shellcheck 0m 13s The patch generated 0 new + 106 unchanged - 1 fixed = 106 total (was 107)
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 xml 0m 5s The patch has no ill-formed XML file.
          +1 findbugs 3m 49s the patch passed
          +1 javadoc 1m 2s the patch passed
          +1 unit 1m 6s hadoop-hdfs-client in the patch passed.
          -1 unit 80m 0s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 24s The patch does not generate ASF License warnings.
          118m 45s



          Reason Tests
          Failed junit tests hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer
          Timed out junit tests org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:a9ad5d6
          JIRA Issue HDFS-10899
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12850517/HDFS-10899.07.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle cc xml shellcheck shelldocs
          uname Linux 2f1b3510d65d 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / b6f290d
          Default Java 1.8.0_121
          shellcheck v0.4.5
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/18312/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/18312/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/18312/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/18312/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          andrew.wang Andrew Wang added a comment -

          Taking a look this week. One small note, I'm unable to apply the patch with "git apply" due to the below error message. I think using some combination of "git diff --full-index --binary" will make this work.

          error: cannot apply binary patch to 'hadoop-hdfs-project/hadoop-hdfs/src/test/resources/editsStored' without full index line
          error: hadoop-hdfs-project/hadoop-hdfs/src/test/resources/editsStored: patch does not apply
          
          xiaochen Xiao Chen added a comment -

          Thanks Andrew for the heads up! Yeah, it seems that file has to be downloaded and replaced manually. All historical JIRAs that touched this file had the same binary git apply problem.

          andrew.wang Andrew Wang added a comment -

          Hi Xiao, I soaked in this patch for a while, had a bunch of review comments. I didn't get through everything, but I think this is worth posting now, and I'll give a more detailed review of the next rev:

          ReencryptionHandler:

          • Rather than having get/setProvider, can we set it in the EZManager constructor by getting it from dir.getFSNamesystem().getProvider() ? Then we don't need the retry loop in run either.
          • In FSEditLog, normally we log one edit per call to a log function. This is broken by logReencryptZoneStatus, could you change it so ReencryptionHandler does the for loop over the batch calling logSetXAttrs then a final call to logReencryptZoneStatus ?
          • We shouldn't call logSync inside the write lock in reencryptDir, since it's a time consuming operation. In the RPC context, we call this outside the lock, just before responding to the caller. This means there's a little gap where readers could see uncommitted data. For reencrypt, this is actually very safe since the old and new EDEK are functionally equivalent, and we aren't providing any guarantees while the zone is still reencrypting.
          • We need the write lock held when logging to the FSEditLog.
          • I'd prefer we gather all the file EDEK information in one shot with just the read lock held, do all the KMS communication, then take the writelock at the end to log the batch. This also makes it easier to multi-thread the KMS communication stage later, and better performance since we're doing more work under the lock.
          • There are lock/unlocks across functions which makes this harder to understand. For instance, unlock then relock in reencryptINode because it was locked in the caller. Consider refactoring? It'd be great if all locking was handled with simple lock/unlock in try/finally.
          • restoreReencryptFromProgress, javadoc mentions lastFileProcessed but abbreviations here and below are lpf.
          • Is it possible to rework reencryptEncryptionZoneInt so we only call reencryptDir once after setting up the iteration state? I think this would also make restoreReencryptFromProgress clearer, since it could have fewer arguments.
          • Would also consider renaming restoreReencryptFromProgress, like restoreFromLastFileProcessed
          • Is it possible to encapsulate the file iteration as an iterator? We pass subdirs around a lot right now.
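The "gather under the read lock, talk to the KMS with no lock held, apply the batch under the write lock" flow suggested above could be sketched roughly as follows. This is a hedged, simplified illustration, not the patch's code: the class, the `String` EDEK stand-ins, and the `"-v2"` re-encryption are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ReencryptBatchDemo {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<String> edeks = new ArrayList<>();

    public ReencryptBatchDemo(List<String> initial) {
        edeks.addAll(initial);
    }

    // Phase 1: snapshot the EDEKs while holding only the read lock.
    private List<String> gather() {
        lock.readLock().lock();
        try {
            return new ArrayList<>(edeks);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Phase 2: "KMS" work with no lock held (simulated; a real handler
    // would issue reencryptEncryptedKey RPCs here).
    private List<String> reencrypt(List<String> batch) {
        List<String> out = new ArrayList<>();
        for (String e : batch) {
            out.add(e + "-v2");
        }
        return out;
    }

    // Phase 3: apply the batch under the write lock (where the real
    // handler would also log the edits).
    private void apply(List<String> batch) {
        lock.writeLock().lock();
        try {
            edeks.clear();
            edeks.addAll(batch);
        } finally {
            lock.writeLock().unlock();
        }
    }

    public List<String> run() {
        List<String> batch = gather();
        List<String> updated = reencrypt(batch); // no lock held here
        apply(updated);
        return new ArrayList<>(edeks);
    }

    public static void main(String[] args) {
        System.out.println(
            new ReencryptBatchDemo(Arrays.asList("edek1", "edek2")).run());
    }
}
```

Keeping every lock/unlock in a local try/finally, as here, also addresses the cross-function locking concern: no method returns while holding a lock it acquired.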

          ReencryptionStatus:

          • Add Precondition check to add, to make sure that it's not added if already present

          EZManager:

          • typo: zondId
          • What does the "Id" in getIdRootEncryptionZone mean?
          • getIdRootEncryptionZone checks for read in resolvePath, but aren't reencrypt and cancel write operations? We don't want to be running this on snapshot paths. I think we should push the resolvePath up to the caller of this function, since it also lets us reuse the IIP if necessary (resolvePath is expensive).
          • In FSNamesystem, looks like we call resolvePath in reencryptEncryptionZoneInt but not the IIP version. The normal flow is to resolvePath to an IIP in the corresponding FSDir file, then pass it down to avoid resolving again for performance reasons.

          ReencryptionHandler#checkNNReadyForWrite and catching exceptions

          • checkNNReadyForWrite checks that the NN is active. The handler should only be live when the NN is active though, stopActiveServices will interrupt the thread. If we're worried about this, it'd be better to periodically check for the interrupt and throw InterruptedException.
          • With this, we can change checkNNReadyForWrite into a simple safemode check. Since exceptions are expensive, I'd prefer a boolean return value and if statement instead.
          • Note also that safemode checks need to be done with the lock held, so we should do it where we're grabbing the lock anyway. For correctness, I think we only need to check this just before we do a write (i.e. apply the new EDEKs to our in-memory state). We can still check elsewhere to save doing extra work.
          • Related to the above, in two places in run we catch InterruptedException and just log and return. This is a code smell; although it's the top-level function, it's still better to re-interrupt the thread before returning.
          • I like this article I found on InterruptedException: https://www.ibm.com/developerworks/library/j-jtp05236/
          • Catching Exception in run is a code smell. What is the intent? It looks like we already catch the checked exceptions, so this will catch RuntimeExceptions (which are normally unrecoverable).
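The re-interrupt pattern recommended above can be sketched as below. This is a hedged illustration of the general idiom (the class and method names are hypothetical, not the patch's code): when a worker thread catches InterruptedException, it restores the interrupt status before returning so the thread's owner can still observe the interruption.

```java
public class InterruptDemo {

    // Returns true if the sleep completed, false if it was interrupted.
    // On interruption, the thread's interrupt status is re-asserted
    // rather than swallowed.
    public static boolean sleepRestoringInterrupt(long millis) {
        try {
            Thread.sleep(millis);
            return true;
        } catch (InterruptedException ie) {
            // Re-interrupt: code that interrupted this thread (e.g. an
            // NN shutdown path) relies on the flag remaining visible.
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        Thread.currentThread().interrupt(); // simulate a shutdown request
        boolean completed = sleepRestoringInterrupt(5);
        // prints completed=false interruptStatus=true
        System.out.println("completed=" + completed
            + " interruptStatus=" + Thread.currentThread().isInterrupted());
    }
}
```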

          Metrics:

          • I'd prefer that we track the per zone metrics in ReencryptionStatus, since it's what also tracks the currentZone. This avoids having to reset the metrics on removeZone.
          • Follow-on idea: it'd be nice for admins to be able to query the status of queued and running reencrypt commands. Progress indicators like submission time, start time, # skipped, # reencrypted, total # (if this is cheap to get) would be helpful.

          Others:

          • Looks like we aren't using the op cache in FSEditLog SetXAttrOp / RemoveXAttrOp. I think this is accidental, could you do some research? Particularly since we'll be doing a lot of SetXAttrOps, avoiding all that object allocation would be nice. This could be a separate JIRA.
          • DFSTestUtil, the new comment says "Pause re-encrypt to allow the cancel command can go through", but IIUC this isn't about authorization but instead pausing reencryption so there's an active zone reencryption to cancel. Could you reword?
          • fsimage.proto and loader code, since the lastFile belongs to a specific zone, I'd prefer we define the on-disk format like this rather than needing to know about zone ordering:
          message EncryptionManagerSection {
            message ReencryptCommand {
              required uint64 zone     = 1;
              optional string lastFile = 2;
            }
            repeated ReencryptCommand reencryptCommands = 1;
          }
          • Rather than making FSNamesystem#getFileInfo public for tests, can we dig it out from the RPC interface?

          With the current on-disk format, we can only reencrypt one zone at a time. Later, we might want to reencrypt in parallel.

          Follow-on:

          • This could be difficult with all the lock/unlocks and stages, but I'd prefer a goal-pause-time configuration for the run loop. This is easier for admins to reason about. We would still use the batch size for determining when to log a batch.
Later, we might want to reencrypt in parallel. Follow-on: This could be difficult with all the lock/unlocks and stages, but I'd prefer a goal-pause-time configuration for the run loop. This is easier for admins to reason about. We would still use the batch size for determining when to log a batch.
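The staged-locking suggestion above (gather under the read lock, do the KMS communication with no lock held, then take the write lock only to apply and log the batch) could be sketched roughly as follows. This is an illustrative outline only, not NameNode code: the class name, the ReentrantReadWriteLock standing in for the namesystem lock, and the gatherBatch/callKms/applyBatch methods are all hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * Illustrative three-stage re-encryption loop: gather file EDEKs under
 * the read lock, talk to the KMS with no lock held, then take the write
 * lock only to apply the batch. All names are hypothetical stand-ins;
 * the ReentrantReadWriteLock plays the role of the namesystem lock.
 */
public class StagedReencryptSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  final List<String> applied = new ArrayList<>();

  /** Stage 1: collect a batch of EDEKs holding only the read lock. */
  List<String> gatherBatch(List<String> edeks, int batchSize) {
    lock.readLock().lock();
    try {
      return new ArrayList<>(edeks.subList(0, Math.min(batchSize, edeks.size())));
    } finally {
      lock.readLock().unlock();
    }
  }

  /** Stage 2: KMS communication, done with no lock held. */
  List<String> callKms(List<String> batch) {
    List<String> out = new ArrayList<>(batch.size());
    for (String edek : batch) {
      out.add(edek + "-reencrypted"); // placeholder for a real KMS call
    }
    return out;
  }

  /**
   * Stage 3: apply the new EDEKs under the write lock; edit logging
   * would happen here too, with logSync deferred until after unlock.
   */
  void applyBatch(List<String> newEdeks) {
    lock.writeLock().lock();
    try {
      applied.addAll(newEdeks);
    } finally {
      lock.writeLock().unlock();
    }
  }

  public static void main(String[] args) {
    StagedReencryptSketch s = new StagedReencryptSketch();
    List<String> batch = s.gatherBatch(List.of("edek1", "edek2", "edek3"), 2);
    s.applyBatch(s.callKms(batch));
    System.out.println(s.applied);
  }
}
```

Keeping each stage's lock/unlock in its own try/finally, as above, is what makes the locking auditable; it also isolates stage 2 so it can later be fanned out across threads without touching the locked stages.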
          xiaochen Xiao Chen added a comment -

          Thanks much for the soak and all these good comments, Andrew. Sorry this took a while to update.

          Attaching patch 8:

          • Refactored the ReencryptionHandler so the innards are easier to reason about. This makes the locking easier, gets rid of the subdirs, and contacts the KMS only after a full batch is ready, in a new method processCurrentBatch.
          • From some local benchmarking, it seems the communication overhead to the KMS dominates (60%+), so we could potentially add a batch re-encrypt API to the KMS and use it there. Multithreading processCurrentBatch could push performance further, I think.
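To illustrate why batching would help with the KMS overhead mentioned above: each per-EDEK call is a separate round trip, while a batched call amortizes that cost across the whole batch. A minimal sketch, assuming a hypothetical BatchKmsClient interface and reencryptBatch method (the real KMS API at this point only re-encrypted one EDEK per request):

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of amortizing KMS round trips by re-encrypting EDEKs in one
 * batched request instead of one request per file. BatchKmsClient and
 * reencryptBatch are hypothetical stand-ins, not a real KMS API.
 */
public class BatchedKmsSketch {
  interface BatchKmsClient {
    /** One round trip re-encrypts the entire batch. */
    List<String> reencryptBatch(String keyName, List<String> edeks);
  }

  static List<String> reencryptAll(BatchKmsClient kms, String keyName,
                                   List<String> edeks, int batchSize) {
    List<String> result = new ArrayList<>(edeks.size());
    for (int i = 0; i < edeks.size(); i += batchSize) {
      List<String> batch = edeks.subList(i, Math.min(i + batchSize, edeks.size()));
      // one RPC per batch rather than one per EDEK
      result.addAll(kms.reencryptBatch(keyName, batch));
    }
    return result;
  }

  public static void main(String[] args) {
    // fake client: tags each EDEK with the key name instead of calling a KMS
    BatchKmsClient fake = (key, batch) -> {
      List<String> out = new ArrayList<>(batch.size());
      for (String e : batch) {
        out.add(e + "@" + key);
      }
      return out;
    };
    System.out.println(reencryptAll(fake, "v2", List.of("e1", "e2", "e3"), 2));
  }
}
```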

          It also addresses all the comments above, with the following exceptions:

          This could be difficult with all the lock/unlocks and stages, but I'd prefer a goal-pause-time configuration for the run loop. This is easier for admins to reason about. We would still use the batch size for determining when to log a batch.

          Good idea. Will be working on that.

          Looks like we aren't using the op cache in FSEditLog SetXAttrOp / RemoveXAttrOp. I think this is accidental, could you do some research? Particularly since we'll be doing a lot of SetXAttrOps, avoiding all that object allocation would be nice. This could be a separate JIRA.

          Tracked this back to the initial HDFS-6301, pinged there but no response. Agree this is likely a bug, created HDFS-11410. Good find!

          Follow-on idea: it'd be nice for admins to be able to query the status of queued and running reencrypt commands. Progress indicators like submission time, start time, # skipped, # reencrypted, total # (if this is cheap to get) would be helpful.

          Planned to add -status to the crypto command; still marked as a TODO for now ...

          Catching Exception in run is a code smell. What is the intent? It looks like we already catch the checked exceptions, so this will catch RuntimeExceptions (which are normally unrecoverable).

          True, it's not ideal. The reason I added it is that re-encryption happens in a separate thread, and exceptions go to stderr, which may or may not be collected. Logging the exception in the NN log adds some supportability.
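For reference, a hedged sketch of the run-loop behavior under discussion: restore the interrupt status when InterruptedException is caught, and log unexpected RuntimeExceptions so they reach the NN log rather than being lost on stderr. The class and field names here are illustrative, not NameNode code.

```java
/**
 * Sketch of run-loop exception handling for a background handler thread:
 * re-assert the interrupt status on InterruptedException, and record
 * unexpected RuntimeExceptions instead of swallowing them. Names are
 * illustrative stand-ins.
 */
public class ReencryptLoopSketch implements Runnable {
  volatile String lastError;     // stands in for logging to the NN log
  volatile boolean sawInterrupt;

  /** Placeholder for re-encrypting one batch; may block. */
  void processOneBatch() throws InterruptedException {
    Thread.sleep(1);
  }

  @Override
  public void run() {
    try {
      while (!Thread.currentThread().isInterrupted()) {
        processOneBatch();
      }
    } catch (InterruptedException ie) {
      Thread.currentThread().interrupt(); // restore the interrupt status
      sawInterrupt = true;
    } catch (RuntimeException re) {
      lastError = re.toString();          // surface it rather than losing it
    }
  }

  public static void main(String[] args) throws InterruptedException {
    ReencryptLoopSketch loop = new ReencryptLoopSketch();
    Thread t = new Thread(loop, "reencrypt-handler");
    t.start();
    Thread.sleep(20);
    t.interrupt();  // e.g. stopActiveServices shutting the handler down
    t.join();
  }
}
```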

          It also almost feels like we should do the KMS batching already. Let me play with it and update no later than Wednesday.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 0s Docker mode activated.
          -1 patch 0m 7s HDFS-10899 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help.



          Subsystem Report/Notes
          JIRA Issue HDFS-10899
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12852456/HDFS-10899.08.patch
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/18362/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 17m 43s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 10 new or modified test files.
          0 mvndep 0m 6s Maven dependency ordering for branch
          +1 mvninstall 13m 52s trunk passed
          +1 compile 1m 34s trunk passed
          +1 checkstyle 0m 48s trunk passed
          +1 mvnsite 1m 34s trunk passed
          +1 mvneclipse 0m 25s trunk passed
          +1 findbugs 3m 30s trunk passed
          +1 javadoc 1m 3s trunk passed
          0 mvndep 0m 7s Maven dependency ordering for patch
          +1 mvninstall 1m 28s the patch passed
          +1 compile 1m 33s the patch passed
          +1 cc 1m 33s the patch passed
          +1 javac 1m 33s the patch passed
          -0 checkstyle 0m 47s hadoop-hdfs-project: The patch generated 19 new + 1779 unchanged - 4 fixed = 1798 total (was 1783)
          +1 mvnsite 1m 30s the patch passed
          +1 mvneclipse 0m 22s the patch passed
          -1 whitespace 0m 0s The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
          +1 xml 0m 4s The patch has no ill-formed XML file.
          -1 findbugs 2m 2s hadoop-hdfs-project/hadoop-hdfs generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
          +1 javadoc 0m 54s the patch passed
          +1 unit 0m 54s hadoop-hdfs-client in the patch passed.
          -1 unit 65m 19s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 19s The patch does not generate ASF License warnings.
          120m 3s



          Reason Tests
          FindBugs module:hadoop-hdfs-project/hadoop-hdfs
            Possible null pointer dereference of fei in new org.apache.hadoop.hdfs.server.namenode.ReencryptionHandler$ReencryptTracker(ReencryptionHandler, INodeFile) Dereferenced at ReencryptionHandler.java:fei in new org.apache.hadoop.hdfs.server.namenode.ReencryptionHandler$ReencryptTracker(ReencryptionHandler, INodeFile) Dereferenced at ReencryptionHandler.java:[line 128]
          Failed junit tests hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting
            hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:a9ad5d6
          JIRA Issue HDFS-10899
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12852457/HDFS-10899.08.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle cc xml
          uname Linux eba84980382a 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 71c23c9
          Default Java 1.8.0_121
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/18363/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project.txt
          whitespace https://builds.apache.org/job/PreCommit-HDFS-Build/18363/artifact/patchprocess/whitespace-eol.txt
          findbugs https://builds.apache.org/job/PreCommit-HDFS-Build/18363/artifact/patchprocess/new-findbugs-hadoop-hdfs-project_hadoop-hdfs.html
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/18363/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/18363/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/18363/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          xiaochen Xiao Chen added a comment -

          patch 9:

          • Added a throttler. Note this won't be very accurate, as we do not want frequent lock/unlock, and the throttling has to happen outside the locks, after a batch is created. Better approaches welcome. (I wasn't sure whether "follow-on" meant a separate JIRA is preferable; I can split that out if so, to keep the central patch size sane.)
          • I should have mentioned in patch 8: I think the code is more readable after the refactor, but it seems we can't lock/unlock within each method. This is similar to FSDirEncryptionZoneOp#getEncryptionKeyInfo.
          • Fixed precommit complaints.
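A minimal sketch of the batch-level throttling idea: after the locks are released, pause long enough that locked work stays under a target fraction of wall-clock time. The class name and the exact formula are assumptions for illustration, not the patch's actual implementation.

```java
/**
 * Sketch of batch-level throttling: given the time spent on locked work
 * for a batch, compute a pause (taken with no locks held) so that locked
 * work is at most `lockedRatio` of total wall time. Illustrative only;
 * assumes lockedRatio is in (0, 1].
 */
public class ThrottleSketch {
  /** Pause in ms so that workedMs / (workedMs + pause) == lockedRatio. */
  static long pauseMillis(long workedMs, double lockedRatio) {
    if (lockedRatio >= 1.0) {
      return 0; // ratio of 1 or more means no throttling
    }
    return (long) (workedMs / lockedRatio) - workedMs;
  }

  public static void main(String[] args) throws InterruptedException {
    long worked = 50;
    long pause = pauseMillis(worked, 0.5); // 50ms work at ratio 0.5 => 50ms pause
    System.out.println(pause);
    Thread.sleep(pause); // throttle outside any lock, between batches
  }
}
```

Throttling per batch rather than per file is what keeps the lock/unlock frequency low, at the cost of the coarser accuracy noted above.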
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 16s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 11 new or modified test files.
          0 mvndep 0m 29s Maven dependency ordering for branch
          +1 mvninstall 12m 37s trunk passed
          +1 compile 1m 22s trunk passed
          +1 checkstyle 0m 40s trunk passed
          +1 mvnsite 1m 23s trunk passed
          +1 mvneclipse 0m 25s trunk passed
          +1 findbugs 3m 8s trunk passed
          +1 javadoc 1m 1s trunk passed
          0 mvndep 0m 7s Maven dependency ordering for patch
          -1 mvninstall 0m 24s hadoop-hdfs in the patch failed.
          -1 compile 0m 43s hadoop-hdfs-project in the patch failed.
          -1 cc 0m 43s hadoop-hdfs-project in the patch failed.
          -1 javac 0m 43s hadoop-hdfs-project in the patch failed.
          +1 checkstyle 0m 37s the patch passed
          -1 mvnsite 0m 25s hadoop-hdfs in the patch failed.
          +1 mvneclipse 0m 20s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 xml 0m 4s The patch has no ill-formed XML file.
          -1 findbugs 0m 25s hadoop-hdfs in the patch failed.
          -1 javadoc 0m 37s hadoop-hdfs-project_hadoop-hdfs generated 2 new + 7 unchanged - 0 fixed = 9 total (was 7)
          +1 unit 0m 54s hadoop-hdfs-client in the patch passed.
          -1 unit 0m 25s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 17s The patch does not generate ASF License warnings.
          30m 52s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:a9ad5d6
          JIRA Issue HDFS-10899
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12852766/HDFS-10899.09.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle cc xml
          uname Linux 3135be15d837 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 09:57:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / cd3e59a
          Default Java 1.8.0_121
          findbugs v3.0.0
          mvninstall https://builds.apache.org/job/PreCommit-HDFS-Build/18379/artifact/patchprocess/patch-mvninstall-hadoop-hdfs-project_hadoop-hdfs.txt
          compile https://builds.apache.org/job/PreCommit-HDFS-Build/18379/artifact/patchprocess/patch-compile-hadoop-hdfs-project.txt
          cc https://builds.apache.org/job/PreCommit-HDFS-Build/18379/artifact/patchprocess/patch-compile-hadoop-hdfs-project.txt
          javac https://builds.apache.org/job/PreCommit-HDFS-Build/18379/artifact/patchprocess/patch-compile-hadoop-hdfs-project.txt
          mvnsite https://builds.apache.org/job/PreCommit-HDFS-Build/18379/artifact/patchprocess/patch-mvnsite-hadoop-hdfs-project_hadoop-hdfs.txt
          findbugs https://builds.apache.org/job/PreCommit-HDFS-Build/18379/artifact/patchprocess/patch-findbugs-hadoop-hdfs-project_hadoop-hdfs.txt
          javadoc https://builds.apache.org/job/PreCommit-HDFS-Build/18379/artifact/patchprocess/diff-javadoc-javadoc-hadoop-hdfs-project_hadoop-hdfs.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/18379/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/18379/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/18379/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 1m 6s Docker mode activated.
          +1 @author 0m 1s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 11 new or modified test files.
          0 mvndep 0m 11s Maven dependency ordering for branch
          +1 mvninstall 17m 44s trunk passed
          +1 compile 1m 38s trunk passed
          +1 checkstyle 0m 44s trunk passed
          +1 mvnsite 1m 35s trunk passed
          +1 mvneclipse 0m 28s trunk passed
          +1 findbugs 3m 50s trunk passed
          +1 javadoc 1m 14s trunk passed
          0 mvndep 0m 8s Maven dependency ordering for patch
          +1 mvninstall 1m 35s the patch passed
          +1 compile 1m 40s the patch passed
          +1 cc 1m 40s the patch passed
          +1 javac 1m 40s the patch passed
          +1 checkstyle 0m 45s the patch passed
          +1 mvnsite 1m 28s the patch passed
          +1 mvneclipse 0m 20s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 xml 0m 4s The patch has no ill-formed XML file.
          +1 findbugs 3m 33s the patch passed
          +1 javadoc 0m 57s the patch passed
          +1 unit 0m 56s hadoop-hdfs-client in the patch passed.
          -1 unit 96m 28s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 20s The patch does not generate ASF License warnings.
          138m 24s



          Reason Tests
          Failed junit tests hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer
            hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting
            hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:a9ad5d6
          JIRA Issue HDFS-10899
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12852861/HDFS-10899.09.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle cc xml
          uname Linux 0d52057c4281 3.13.0-108-generic #155-Ubuntu SMP Wed Jan 11 16:58:52 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 0fc6f38
          Default Java 1.8.0_121
          findbugs v3.0.0
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/18382/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/18382/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/18382/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 18s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 11 new or modified test files.
          0 mvndep 0m 6s Maven dependency ordering for branch
          +1 mvninstall 14m 12s trunk passed
          +1 compile 1m 24s trunk passed
          +1 checkstyle 0m 41s trunk passed
          +1 mvnsite 1m 27s trunk passed
          +1 mvneclipse 0m 27s trunk passed
          +1 findbugs 3m 19s trunk passed
          +1 javadoc 1m 1s trunk passed
          0 mvndep 0m 6s Maven dependency ordering for patch
          +1 mvninstall 1m 25s the patch passed
          +1 compile 1m 22s the patch passed
          +1 cc 1m 22s the patch passed
          +1 javac 1m 22s the patch passed
          +1 checkstyle 0m 39s the patch passed
          +1 mvnsite 1m 27s the patch passed
          +1 mvneclipse 0m 21s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 xml 0m 3s The patch has no ill-formed XML file.
          +1 findbugs 3m 33s the patch passed
          +1 javadoc 0m 58s the patch passed
          +1 unit 0m 57s hadoop-hdfs-client in the patch passed.
          -1 unit 114m 23s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 20s The patch does not generate ASF License warnings.
          150m 3s



          Reason Tests
          Failed junit tests hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency
            hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer
            hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureToleration
            hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes
            hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:a9ad5d6
          JIRA Issue HDFS-10899
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12852861/HDFS-10899.09.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle cc xml
          uname Linux 9223997301ec 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 09:57:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 0fc6f38
          Default Java 1.8.0_121
          findbugs v3.0.0
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/18381/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/18381/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/18381/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          andrew.wang Andrew Wang added a comment -

          Hi Xiao, this latest rev looks like a big improvement in readability, and thanks for the speedy work getting through all the previous comments!

          Nit:

          • FSEditLog, rename logCompleteReencryptZone to logReencryptZoneComplete to match name of the op and the Status method?

          ReencryptionHandler:

          • processCurrentBatch, checkReadyForWrite can throw an exception, and it's not guarded by the try/finally for the writeLock.
          • processCurrentBatch, it looks like we ride over a GeneralSecurityException or null return value when talking to the KMS. Shouldn't this either throw, or be retried? We don't want to miss re-encrypting any EDEKs.
          • High-level, it'd be good to think about the behavior when the KMS is slow/flaky/down. It looks like right now an IOException will throw all the way up to run which has a fixed sleep, but we might want some backoff for connectivity issues.
          • In reencryptEncryptionZone, I think restoreFromLastProcessedFile also needs the read lock held when calling? Sprinkling additional paranoia hasReadLock / hasWriteLock checks everywhere would be appreciated.
          • restoreFromLPF, javadoc looks out of date since subdirs no longer exists
          • The new throttling mechanism could use some code comments
          • Would still prefer this if statement to be in the constructor, have it throw an exception all the way up to the FSDirectory constructor:
              if (ezManager.getProvider() == null) {
                LOG.info("No provider set, cannot re-encrypt");
                return;
              }
          
          • Similarly, we should do some bounds checking on all the config keys in the constructor (like > 0). I think isEnabled becomes extraneous afterwards. I don't think enable/disable is necessary since reencryption is already "opt-in" in that an admin has to trigger it, but if we really do want a flag, I'd prefer we do it with a new config key rather than with the throttling configs.
          • I still find the call hierarchy complicated to reason about, since it has two entry points and is bi-recursive (run -> reencryptEZ -> restoreFromLPF or reencryptDir -> reencryptDirInt <-> reencryptINode). Rather than recursion, have you considered using a stack or queue to hold unprocessed items? I think this gets us towards a single reencryptDir call in reencryptEZ, since restoreFromLPF would be responsible for repopulating the stack/queue.
          • reencryptDirInt, I see there's a fixup for i if things have changed, but I don't think this is sufficient. Imagine if the dir's children have been deleted. INodeDirectory#nextChild can return a negative index, which will always be less than the for loop termination condition children.size(). children.get won't behave correctly with a negative index.
          • reencryptINode, needs documentation on the return value
          • If the # of items is an exact multiple of reencryptBatchSize, will the final call to processCurrentBatch with lastBatch = true in reencryptEncryptionZone correctly log the completion? It returns early if currentBatch is empty. IMO we should separate batch processing from logging the completion, there doesn't seem like much code sharing.
          • Could use some class-level documentation about the iteration order (appears to be depth-first) and how we handle deletes and creates.
          • Add a Precondition check in ReencryptTracker that passed inode is not null? This is important so the Objects.equals in processCurrentBatch behaves correctly.
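The retry/backoff point above could be sketched roughly as follows. This is a hypothetical helper, not the patch's actual retry logic (Hadoop's own RetryPolicies would be the natural fit in real code): it retries transient IOExceptions from a KMS call with capped exponential delays.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/** Hypothetical sketch: retry a KMS call with capped exponential backoff. */
public class KmsRetrySketch {
  /** Sleep for the given attempt: baseMs * 2^attempt, capped at capMs. */
  static long backoffMs(long baseMs, int attempt, long capMs) {
    long delay = baseMs << Math.min(attempt, 20); // bound shift to avoid overflow
    return Math.min(delay, capMs);
  }

  /** Invokes the callable, retrying transient IOExceptions with backoff. */
  static <T> T callWithRetries(Callable<T> kmsCall, int maxRetries,
      long baseMs, long capMs) throws Exception {
    for (int attempt = 0; ; attempt++) {
      try {
        return kmsCall.call();
      } catch (IOException e) {     // treated as a transient connectivity issue
        if (attempt >= maxRetries) {
          throw e;                  // give up after maxRetries attempts
        }
        Thread.sleep(backoffMs(baseMs, attempt, capMs));
      }
    }
  }
}
```

The fixed sleep in run would then only cover the idle case, while connectivity hiccups back off progressively instead of hammering a flaky KMS.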

          EZManager:

          • Rename getDir to getFSDirectory for clarity?

          Follow-on:

          • This relates to the status follow-on you're planning. I noticed cancelling a reencrypt reuses the complete edit log op. It'd be nice to store some additional info in the EZ root xattr so admins can differentiate between completion and cancellation.
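For context on the negative-index point: Java's binary-search convention returns -(insertionPoint) - 1 on a miss, so a loop-index fixup has to decode that before using it as a list index. A generic sketch of the decoding (not HDFS code; INodeDirectory's internals may differ):

```java
import java.util.Collections;
import java.util.List;

/** Sketch: resuming iteration over a sorted child list after concurrent deletes. */
public class ChildIndexSketch {
  /**
   * Converts a Collections.binarySearch result into a safe resume index:
   * a hit resumes at the next element, a miss at the insertion point.
   */
  static int resumeIndex(List<String> children, String lastProcessed) {
    int i = Collections.binarySearch(children, lastProcessed);
    return i >= 0 ? i + 1 : -i - 1;  // decode -(insertionPoint) - 1 on a miss
  }
}
```

Without the decode, a deleted child yields a negative i that is always below children.size(), and children.get(i) throws IndexOutOfBoundsException.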
          xiaochen Xiao Chen added a comment -

          Reviving this after a quiet period... Updated v2 design doc and a WIP patch v10.
          Most notable changes are:

          • Reuse the existing EZ xattr for re-encryption logging - by adding an optional field to protobuf
          • Added multi-threading to contact the KMS to re-encrypt edeks.

          Design doc:

          • Tightened it up a little, to keep the doc short
          • added multi-threading section
          • updated metadata persistent section

          wip patch demonstrates the idea. Still have a bunch of todos:

          • Previous todos
          • Address Andrew's comments above
          • Review fsn/fsd locking, and check synchronization of multi-threads
          • Throttler need to consider memory consumption
          xiaochen Xiao Chen added a comment -

          I'm working on the next rev to address Andrew's comments and previous todos. Should be up by this week.
          The only comment addressed differently is this one; please let me know if you have other thoughts:

          (ReencryptionHandler) Would still prefer this if statement to be in the constructor, have it throw an exception all the way up to the FSDirectory constructor:

          This will cause problems for a non-encryption NN (i.e. one with no KMS). I think we can instantiate re-encryption related objects only if provider != null, so we don't need the check in the loop.
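A minimal sketch of that conditional instantiation — all names here are hypothetical stand-ins, not the real FSDirectory/ReencryptionHandler wiring:

```java
/**
 * Sketch (hypothetical names): only build the re-encryption machinery when a
 * KeyProvider is configured, so a non-encryption NameNode pays no cost.
 */
public class DirectorySketch {
  interface KeyProvider { }            // stand-in for the real provider type

  static final class ReencryptionHandler {
    final KeyProvider provider;
    ReencryptionHandler(KeyProvider p) { this.provider = p; }
  }

  final ReencryptionHandler handler;   // null on a cluster without a KMS

  DirectorySketch(KeyProvider provider) {
    // No provider means no encryption zones: skip the handler entirely
    // rather than checking for null inside the re-encryption loop.
    this.handler = (provider == null) ? null : new ReencryptionHandler(provider);
  }

  boolean reencryptionSupported() {
    return handler != null;
  }
}
```

This keeps the null check at construction time instead of on every loop iteration, and avoids throwing from the constructor on KMS-less clusters.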

          xiaochen Xiao Chen added a comment -

          Patch 10 ready for review.

          • Addressed all comments and todos, including some additional offline reviews with Wei-Chiu Chuang. (Details listed below)
          • implement listReencryptionStatus
          • completed the multi-thread handler/updater, added testing hook, and related unit tests for race conditions
          • added edek keyversion comparison, and only re-encrypt files with older edeks
          • review and improved locking / EZ iteration
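The keyversion comparison mentioned above might look conceptually like this — hypothetical names, since the real patch compares EDEK metadata obtained from the KMS:

```java
import java.util.Objects;

/**
 * Sketch (hypothetical names): skip files whose EDEK already carries the
 * latest key version, so re-encryption only rewrites stale EDEKs.
 */
public class EdekVersionSketch {
  static boolean needsReencryption(String edekKeyVersionName,
      String latestKeyVersionName) {
    // Equal version names mean the EDEK was created after the rotation.
    return !Objects.equals(edekKeyVersionName, latestKeyVersionName);
  }
}
```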

          The only TODOs left I think are:

          • throttling
          • better failure handling when KMS is flaky. (Currently it just tells the admin there are failures and requires another re-encryption.)

          Wei-Chiu's review comments, all reflected in the patch:

          • cancelReencryption need to cancel current running tasks.
          • stopThreads: make sure all futures are canceled.
          • config description: find a better name than reencrypt.interval
          • ReencryptionHandler, test methods reduce scope
          • ReencryptionUpdater: add more comments. (And rename it, previous name Finalizer is confusing)
          • ReencryptionUpdater: add class annotation.
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 13s Docker mode activated.
                Prechecks
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 3 new or modified test files.
                trunk Compile Tests
          0 mvndep 0m 25s Maven dependency ordering for branch
          +1 mvninstall 13m 22s trunk passed
          +1 compile 1m 26s trunk passed
          +1 checkstyle 0m 53s trunk passed
          +1 mvnsite 1m 27s trunk passed
          -1 findbugs 1m 24s hadoop-hdfs-project/hadoop-hdfs-client in trunk has 2 extant Findbugs warnings.
          -1 findbugs 1m 39s hadoop-hdfs-project/hadoop-hdfs in trunk has 10 extant Findbugs warnings.
          +1 javadoc 1m 2s trunk passed
                Patch Compile Tests
          0 mvndep 0m 7s Maven dependency ordering for patch
          +1 mvninstall 1m 23s the patch passed
          +1 compile 1m 23s the patch passed
          +1 cc 1m 23s the patch passed
          +1 javac 1m 23s the patch passed
          -0 checkstyle 0m 51s hadoop-hdfs-project: The patch generated 49 new + 960 unchanged - 2 fixed = 1009 total (was 962)
          +1 mvnsite 1m 22s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 xml 0m 2s The patch has no ill-formed XML file.
          -1 findbugs 1m 48s hadoop-hdfs-project/hadoop-hdfs generated 3 new + 10 unchanged - 0 fixed = 13 total (was 10)
          +1 javadoc 0m 57s the patch passed
                Other Tests
          +1 unit 1m 12s hadoop-hdfs-client in the patch passed.
          -1 unit 68m 10s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 20s The patch does not generate ASF License warnings.
          102m 25s



          Reason Tests
          FindBugs module:hadoop-hdfs-project/hadoop-hdfs
            Naked notify in org.apache.hadoop.hdfs.server.namenode.ReencryptionHandler.notifyNewSubmission() At ReencryptionHandler.java:At ReencryptionHandler.java:[line 813]
            org.apache.hadoop.hdfs.server.namenode.ReencryptionHandler$EDEKReencryptCallable.call() uses the same code for two branches At ReencryptionHandler.java:for two branches At ReencryptionHandler.java:[line 604]
            org.apache.hadoop.hdfs.server.namenode.ReencryptionHandler$EDEKReencryptCallable.call() uses the same code for two branches At ReencryptionHandler.java:for two branches At ReencryptionHandler.java:[line 614]
          Failed junit tests hadoop.hdfs.TestDFSStripedOutputStreamWithFailure080
            hadoop.hdfs.TestDFSStripedOutputStreamWithFailure150
            hadoop.hdfs.TestDFSStripedOutputStreamWithFailure070
          Timed out junit tests org.apache.hadoop.hdfs.server.blockmanagement.TestBlockStatsMXBean



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:14b5c93
          JIRA Issue HDFS-10899
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12877010/HDFS-10899.10.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle cc xml
          uname Linux 36cf7f41c5c1 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 12:18:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / e15e271
          Default Java 1.8.0_131
          findbugs v3.1.0-RC1
          findbugs https://builds.apache.org/job/PreCommit-HDFS-Build/20255/artifact/patchprocess/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-client-warnings.html
          findbugs https://builds.apache.org/job/PreCommit-HDFS-Build/20255/artifact/patchprocess/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/20255/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project.txt
          findbugs https://builds.apache.org/job/PreCommit-HDFS-Build/20255/artifact/patchprocess/new-findbugs-hadoop-hdfs-project_hadoop-hdfs.html
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/20255/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/20255/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/20255/console
          Powered by Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          daryn Daryn Sharp added a comment -

          Looks like Andrew has been providing very good input, but I'd like to review this too. We're trying to deploy EZ, and avoiding further generalized performance degradation is crucial. I see lots of references to locking and edit logs.

          xiaochen Xiao Chen added a comment -

          Thanks a lot Daryn Sharp for the heads up. Would love your reviews.

          At a high level, this is an opt-in admin command, so without an admin running hdfs crypto -reencrypt, none of the code here should impact a cluster. I'll double check on this as well, and will be posting a v11 patch shortly today.

          xiaochen Xiao Chen added a comment -

          Attaching patch 11 for review purposes.

          Delta:

          • Only store inodeid in the callable.
          • Modified Updater to only setxattr when the edek is changed
          • Modified test to verify snapshots are not re-encrypted (since they're immutable)
          • Unified re-encryption thread names, and ensured the EDEKReencryptionCallable thread pool is correctly configured.

          Haven't done the TODOs, but was getting some basic benchmarks. Will be posting soon.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 18s Docker mode activated.
                Prechecks
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 3 new or modified test files.
                trunk Compile Tests
          0 mvndep 0m 25s Maven dependency ordering for branch
          +1 mvninstall 14m 45s trunk passed
          +1 compile 1m 29s trunk passed
          +1 checkstyle 0m 52s trunk passed
          +1 mvnsite 1m 35s trunk passed
          -1 findbugs 1m 31s hadoop-hdfs-project/hadoop-hdfs-client in trunk has 2 extant Findbugs warnings.
          -1 findbugs 1m 41s hadoop-hdfs-project/hadoop-hdfs in trunk has 10 extant Findbugs warnings.
          +1 javadoc 1m 1s trunk passed
                Patch Compile Tests
          0 mvndep 0m 7s Maven dependency ordering for patch
          +1 mvninstall 1m 22s the patch passed
          +1 compile 1m 25s the patch passed
          +1 cc 1m 25s the patch passed
          +1 javac 1m 25s the patch passed
          -0 checkstyle 0m 52s hadoop-hdfs-project: The patch generated 47 new + 960 unchanged - 2 fixed = 1007 total (was 962)
          +1 mvnsite 1m 30s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 xml 0m 3s The patch has no ill-formed XML file.
          +1 findbugs 3m 26s the patch passed
          +1 javadoc 0m 55s the patch passed
                Other Tests
          +1 unit 1m 12s hadoop-hdfs-client in the patch passed.
          -1 unit 66m 51s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 20s The patch does not generate ASF License warnings.
          103m 11s



          Reason Tests
          Failed junit tests hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes
            hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy
            hadoop.hdfs.TestDFSStripedOutputStreamWithFailure080



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:14b5c93
          JIRA Issue HDFS-10899
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12879040/HDFS-10899.11.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle cc xml
          uname Linux 727858aab4d5 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 27a1a5f
          Default Java 1.8.0_131
          findbugs v3.1.0-RC1
          findbugs https://builds.apache.org/job/PreCommit-HDFS-Build/20418/artifact/patchprocess/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-client-warnings.html
          findbugs https://builds.apache.org/job/PreCommit-HDFS-Build/20418/artifact/patchprocess/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/20418/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/20418/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/20418/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/20418/console
          Powered by Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          xiaochen Xiao Chen added a comment -

          Had a productive offline review session with Andrew Wang, where we discussed several things. Thanks Andrew!

          • By design snapshots are immutable, so even after re-encryption, the snapshots of the EZ will still have old edeks. If there was a security breach, admins need to remove the old snapshots, or take manual measures (e.g. cp, or mv to a non-snapshotted dir, then re-encrypt). Should add this to the docs. This jira should not attempt to touch snapshots.
          • Perf - latency
            I have been running this in a test cluster (with kerberos + SSL), with 1 sample EZ and 1M files. Cluster is on GCE, NN is n1-highmem-4, 2 KMS instances on n1-standard-2.
            Some perf numbers (when all 1M files have old edeks):
            • 1 edek thread: 40~50 mins.
            • 10 edek threads (5000 edeks per task): 13 mins
            • 30 edek threads (1000 edeks per task): 12 mins
              Time to generate all the tasks for 1M files is ~10 seconds.
              The bottleneck of the entire operation is contacting the KMS - from the NN side, each HTTPS request to the KMS took single-digit milliseconds on average, whereas inside the KMS the actual re-encryption only took tens of microseconds. The default keep-alive of 5 connections is used, and the first 5 connections (clean setup) took even longer.
              This led me to prototype a batched re-encryption interface on the KMS, whose perf is:
              • 20 threads (1000 per task): 1.5 min.
                Which fits well within our goal of 200M files in 8 hours.

          Discussing with Andrew, we felt the batched API is the way to go. I will file another jira to add a batched re-encryption API to the KMS, and update this patch to use that.
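To illustrate why batching cuts the KMS round trips, here is a minimal, self-contained sketch of the batched submission idea. `reencryptBatch` and `partition` are hypothetical stand-ins for illustration only, not the API the follow-up jira will actually add.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchSketch {
    // Hypothetical stand-in for one KMS round trip that re-encrypts a whole batch.
    static int reencryptBatch(List<String> edeks) {
        return edeks.size(); // pretend every EDEK in the batch was re-encrypted
    }

    // Split the EDEKs of a zone into fixed-size batches, so each HTTPS
    // round trip to the KMS carries many EDEKs instead of one.
    static List<List<String>> partition(List<String> edeks, int batchSize) {
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < edeks.size(); i += batchSize) {
            batches.add(edeks.subList(i, Math.min(i + batchSize, edeks.size())));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> edeks = new ArrayList<>();
        for (int i = 0; i < 10_500; i++) {
            edeks.add("edek-" + i);
        }
        int done = 0;
        List<List<String>> batches = partition(edeks, 1000);
        for (List<String> batch : batches) {
            done += reencryptBatch(batch); // 11 round trips instead of 10,500
        }
        System.out.println(batches.size() + " batches, " + done + " edeks");
    }
}
```

With single-digit-millisecond HTTPS latency per request, collapsing thousands of requests into a handful of batches is where the measured speedup comes from.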

          • Perf - memory
            Above test is done without any throttling. We should throttle the ReencryptionHandler when instantiating Callables, to keep NN memory sane. The plan is to use a static calculation, so we only keep a configurable # of Callables in memory - the handler simply waits until a Callable is done and released before creating a new one. Will have a default number calculated from the # of cores of the NN. We should also consider how many edeks per Callable. Will implement this soon.
          • Perf - lock throttling
            Ideally we'd also throttle the ReencryptionHandler to control what % of time it can hold the readlock, and similarly ReencryptionUpdater for writelock. But since we already need to wait for the Callables, this kinda comes naturally. I.e. we won't be holding a writelock continuously for a long time. So we may not implement this in v1, pending confirmation from further perf runs.
          • Failure handling:
            Now that we use a batched re-encryption, it makes sense to simply retry the entire Callable (hence the entire batch, since that fails in 1 call). Then it sounds more admin-friendly to simply retry forever, with backoffs. If the admin finds this annoying he can always cancel. This is better than the current way of failing the re-encryption after a few attempts, telling the admin, and forcing him to rerun the command.
            Also should add fault injectors to unit test some failure scenarios.
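The bounded-Callables throttle described above can be sketched with a counting semaphore; `runThrottled` and its task bodies are illustrative stand-ins, not the patch's actual code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;

public class ThrottleSketch {

    // Submit `tasks` batch jobs but keep at most `maxInFlight` Callables
    // alive at once; the submitter blocks on acquire() until a previous
    // Callable completes and releases its permit.
    static int runThrottled(int tasks, int maxInFlight) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(maxInFlight);
        Semaphore permits = new Semaphore(maxInFlight);
        List<Future<Integer>> futures = new ArrayList<>();
        for (int i = 0; i < tasks; i++) {
            permits.acquire(); // wait for a free slot before creating the Callable
            final int batch = i;
            futures.add(pool.submit(() -> {
                try {
                    return batch; // stand-in for re-encrypting one batch of EDEKs
                } finally {
                    permits.release(); // free the slot for the handler thread
                }
            }));
        }
        int completed = 0;
        for (Future<Integer> f : futures) {
            f.get();
            completed++;
        }
        pool.shutdown();
        return completed;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("completed=" + runThrottled(20, 4));
    }
}
```

In the real handler the permit count would come from the static calculation mentioned above (e.g. derived from the NN core count) instead of a hard-coded 4.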
          jojochuang Wei-Chiu Chuang added a comment -

          Hi Xiao Chen, nice to see real perf numbers!
          One high level question: did you notice any significant NameNode pause during re-encryption other than GC pauses?

          Here's my code review note; I'm still halfway through my review.

          • The class ZoneStatus in ReencryptionStatus should be refactored into its own class file, because it is used everywhere and itself is around half of ReencryptionStatus. Considering future expansion, this class would be refactored in the future anyway.
          • ReencryptionHandler#cancelZone() should throw an exception if the ZoneStatus is not found. Otherwise the caller can't tell if the zone is actually being cancelled.
          • I feel that the behavior of "cancel" should be documented somewhere, namely that it does not interrupt an existing re-encrypt operation and does not wait for the ongoing re-encrypt operation to stop.
          • EncryptionZoneManager#listReencryptionStatus and ReencryptionStatus#listReencryptionStatus instantiate an empty ArrayList. We could use a flyweight pattern (pre-allocate an empty BatchedListEntries/array list) to reduce new-object allocation. This is a rather minor issue, though.
          • ReencryptHandler#startThreads/stopThreads were a little confusing to me initially, because ReencryptHandler itself is a Runnable, but startThreads/stopThreads actually concern the ReencryptionUpdater threads.
          • Suggest that ReencryptHandler#run() and ReencryptUpdater#run() add extra logs at the end like
            LOG.info("ReencryptHandler thread exiting.");
            
          • Also in ReencryptHandler#run() :
                   catch (Exception e) {
                    LOG.error("Exception caught when re-encrypting zone {}.", zoneId, e);
                    throw e;
                  }
            

            The throw e does not make much sense, since the code runs in a thread and run() does not declare any exception. (The Java compiler does not complain about it, though.)
            Suggest the following instead:

                  catch (Throwable t) {
                    LOG.error("Exiting when re-encrypting zone {}", zoneId, t);
                    return;
                  }
            
          xiaochen Xiao Chen added a comment -

          Thanks for the review Wei-Chiu Chuang; patch 12 addresses all the comments.

          • Updated cancel to cancel any pending futures. I do not think we need to wait for the ongoing operation (i.e. if the updater is processing a completed future) to stop.
          • ReencryptHandler#startThreads/stopThreads renamed for readability.
          • Also updated docs about my snapshot point above.

          did you notice any significant NameNode pause during re-encryption other than GC pauses?

          As clarified offline, didn't see any RPC hang, since we're not holding locks when contacting KMS, which appears to be the bottleneck.
          Re-encryption does have to take the read/write locks when iterating the EZ / updating xattrs, respectively. But since we're not holding the locks continuously, and throttling will be done based on KMS latency, my gut feeling is this will not significantly block NN (or pause).
          Will try to test with some jobs running during re-encryption to verify - after the throttling code is done.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 0s Docker mode activated.
          -1 patch 0m 6s HDFS-10899 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help.



          Subsystem Report/Notes
          JIRA Issue HDFS-10899
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12879578/HDFS-10899.12.patch
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/20493/console
          Powered by Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          xiaochen Xiao Chen added a comment -

          rebased to latest trunk

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 28s Docker mode activated.
                Prechecks
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 3 new or modified test files.
                trunk Compile Tests
          0 mvndep 0m 8s Maven dependency ordering for branch
          +1 mvninstall 15m 9s trunk passed
          +1 compile 1m 35s trunk passed
          +1 checkstyle 0m 56s trunk passed
          +1 mvnsite 1m 35s trunk passed
          -1 findbugs 1m 32s hadoop-hdfs-project/hadoop-hdfs-client in trunk has 2 extant Findbugs warnings.
          -1 findbugs 1m 45s hadoop-hdfs-project/hadoop-hdfs in trunk has 10 extant Findbugs warnings.
          +1 javadoc 1m 1s trunk passed
                Patch Compile Tests
          0 mvndep 0m 7s Maven dependency ordering for patch
          +1 mvninstall 1m 22s the patch passed
          +1 compile 1m 40s the patch passed
          +1 cc 1m 40s the patch passed
          +1 javac 1m 40s the patch passed
          -0 checkstyle 0m 54s hadoop-hdfs-project: The patch generated 65 new + 962 unchanged - 2 fixed = 1027 total (was 964)
          +1 mvnsite 1m 29s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 xml 0m 3s The patch has no ill-formed XML file.
          +1 findbugs 3m 25s the patch passed
          +1 javadoc 0m 57s the patch passed
                Other Tests
          +1 unit 1m 16s hadoop-hdfs-client in the patch passed.
          -1 unit 94m 22s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 17s The patch does not generate ASF License warnings.
          131m 38s



          Reason Tests
          Failed junit tests hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting
            hadoop.hdfs.TestReconstructStripedFile
            hadoop.hdfs.TestDFSStripedInputStreamWithRandomECPolicy
            hadoop.hdfs.TestDFSStripedOutputStreamWithFailure140



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:14b5c93
          JIRA Issue HDFS-10899
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12879696/HDFS-10899.12.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle cc xml
          uname Linux 2fa0ba68abf2 3.13.0-117-generic #164-Ubuntu SMP Fri Apr 7 11:05:26 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 3e23415
          Default Java 1.8.0_131
          findbugs v3.1.0-RC1
          findbugs https://builds.apache.org/job/PreCommit-HDFS-Build/20501/artifact/patchprocess/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-client-warnings.html
          findbugs https://builds.apache.org/job/PreCommit-HDFS-Build/20501/artifact/patchprocess/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/20501/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/20501/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/20501/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/20501/console
          Powered by Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          jojochuang Wei-Chiu Chuang added a comment -

          findbugs warning is unrelated, caused by HDFS-11696

          jojochuang Wei-Chiu Chuang added a comment -

          Still halfway through my review. Posting my notes for reference:

          ReencryptionStatus#listReencryptionStatus

          final boolean hasMore = (numResp < tailMap.size());
          

          This seems untrue. If the system has 1001 (tailMap.size()) encryption zones, we start from the 1000th, and we expect up to 100 results (numResp), then we only get 1 (count) encryption zone back and hasMore should be false.
          It looks like it would cause an infinite loop between client and NN because hasMore is always true.
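For reference, a self-contained sketch of the pagination behavior being suggested: compute hasMore from whether entries remain after the page actually returned, rather than comparing numResp against tailMap.size(). Class and method names here are illustrative, not from the patch.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PagingSketch {

    static final class Page {
        final List<String> zones;
        final boolean hasMore;
        Page(List<String> zones, boolean hasMore) {
            this.zones = zones;
            this.hasMore = hasMore;
        }
    }

    // Return up to numResp zone names starting at startZone, and whether
    // any zones remain beyond the returned page.
    static Page listPage(TreeMap<Long, String> zones, long startZone, int numResp) {
        List<String> page = new ArrayList<>();
        Iterator<Map.Entry<Long, String>> it =
            zones.tailMap(startZone).entrySet().iterator();
        while (it.hasNext() && page.size() < numResp) {
            page.add(it.next().getValue());
        }
        // hasMore is true only if entries remain past this page, so a client
        // paging to the end terminates instead of looping forever.
        return new Page(page, it.hasNext());
    }

    public static void main(String[] args) {
        TreeMap<Long, String> zones = new TreeMap<>();
        for (long i = 0; i < 1001; i++) {
            zones.put(i, "/zone" + i);
        }
        // Near the end of 1001 zones with a page size of 100: 2 results, no more.
        Page p = listPage(zones, 999, 100);
        System.out.println(p.zones.size() + " " + p.hasMore);
    }
}
```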

          Javadoc for HdfsAdmin#listReencryptionStatus:
          This method can only be called by HDFS superusers.

          This is a little unnecessary and misleading.
          By definition, methods in HdfsAdmin are superuser only.
          Also, the permission check is enforced at namenode side, not client side.

          ZoneReencryptionStatus

          @InterfaceAudience.Private annotation for ZoneReencryptionStatus

          It seems EncryptionZoneManager#reencryptionHandler can be null if NN does not configure any key providers.
          So every access to reencryptionHandler should be checked to avoid NPE.

          EncryptionZoneManager#fixReencryptionZoneNames()
          suggest a more appropriate method name — “fixReencryptionZoneNames” feels like a hack for a bug.

          EncryptionZoneManager#cancelReencryptEncryptionZone: the Javadoc “If the given path If the given path is not the root of an encryption zone,” has duplicate words

          Would ReencryptionStatus#getZoneStatus ever return null?
          The method is called by multiple callers, so I can’t be sure if it is ever possible. But if it could return null, some null checking would be necessary in the callers.

          ReencryptionHandler#submitCurrentBatch
          IntelliJ is complaining that this code

          TreeMap<String, FileEdekPair> batch = (TreeMap<String, FileEdekPair>)
    ((TreeMap<String, FileEdekPair>) currentBatch).clone();
          

          is an unchecked cast.
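One way to avoid that warning (just a suggestion, not necessarily what the patch should do) is the TreeMap copy constructor, which performs the same shallow copy as clone() without any cast:

```java
import java.util.TreeMap;

public class CopySketch {
    public static void main(String[] args) {
        // Stand-in for the handler's currentBatch map (String values here
        // instead of FileEdekPair, purely for a self-contained example).
        TreeMap<String, String> currentBatch = new TreeMap<>();
        currentBatch.put("/zone/a", "edek1");
        currentBatch.put("/zone/b", "edek2");

        // Copy constructor: shallow copy, no unchecked cast, no clone().
        TreeMap<String, String> batch = new TreeMap<>(currentBatch);

        currentBatch.clear(); // the snapshot is independent of the original map
        System.out.println(batch.size());
    }
}
```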

          ReencryptionHandler#run()
          Append “millisecond” at the end of log message "Starting up re-encrypt thread with interval={}”

          A number of typos
          “inode cannot be resolve to a full path” : “resolve” —> resolved

          daryn Daryn Sharp added a comment -

          I'm really trying to find cycles for this jira. Since I keep seeing locking being mentioned, I've skimmed the patch. A few quick observations:

          • I see a lot of full path reconstruction. This is not cheap. Avoid requiring the full path if possible.
          • Why can't a file with a re-encrypting EDEK be renamed?
          • The batching behavior is a bit worrisome to me. Reading this: "my gut feeling is this will not significantly block NN (or pause)" makes my gut drop. I need more proof than a feeling that this very desirable feature will not tank a production cluster. See below.
          • The managing of the depth-first scan claims that tracking a list of path components is cheaper than tracking inodes. How/why? Creating a list of boxed byte arrays has a lot more overhead than tracking inodes. Back and forth resolving of path components to inodes and vice-versa is not cheap at all.

          Not clear to me that proper locking is always being used. I'll spiel what might already be known but it's what I'll look for:

          • Holding the fsdir lock technically requires holding the fsn lock.
          • If the fsn lock is released & reacquired, checkOperation must always be called to ensure the NN hasn't dropped into standby. Logging an edit as a standby would be very very bad.
          • IIPs cannot be resolved w/o the lock and are null and void if the lock is released.
          • Edit logs are surprisingly not thread safe. logEdit must hold the fsn write lock or rolling can corrupt the logs.
          • logSync must never be called with the write lock.

          Regarding edits, you don't always have to sync immediately. Syncing is technically only required before sending a client a response, to ensure durability. If you just log edits they will be batched and eventually synced by the next write op. It looks like if the NN crashed during re-encryption and buffered edits were lost, it would still correctly resume.

          Didn't check out how big the batches are. Lock time in general definitely becomes a concern, read lock or not. Another consideration is an edit flood. Sending more edits than can be buffered without a sync will cause a sync while holding the write lock. Not good.

          As for performance. I'm concerned you are focusing only on raw re-encrypt performance. That means little to me. Throughput during the process matters. Here's some rudimentary ways to measure performance on a quiescent NN.

          • Blast the NN with read ops like getFileInfo. Try to peg it. Should be easy to achieve a steady state of 100-300k ops/sec. Measure ops/sec during a lengthy reencrypt.
          • Blast the NN with write ops. Rename files back and forth in a directory. Try to peg it and depending on hw, maybe a steady 5-10k ops/sec. Measure ops/sec during a lengthy reencrypt.
            This will give us an idea of how costly the implementation is. I'll make up some numbers. During an emergency, a 10% hit might be acceptable. 10% is unacceptable for a proactive roll.

          I'll comment further after I review more.

          andrew.wang Andrew Wang added a comment -

          quick comment on Wei-chiu's review:

          By definition, methods in HdfsAdmin are superuser only.

          This is not actually true, this class is just for HDFS-specific operations. Putting "Admin" in the name is a misnomer, and since this continues to be confusing, maybe we should enhance the class javadoc to make this explicit.

          xiaochen Xiao Chen added a comment -

          Thanks a lot for managing your time to get to this, Daryn Sharp.

          I will go through the patch and make sure the suggestions are applied. A few clarifications:

          Why can't a file with a re-encrypting EDEK be renamed?

          Details were added to HDFS-11203's description. Basically we can't easily guarantee that re-encryption doesn't miss any renames. Can we discuss that further on that jira, to keep our focus here?

          tracking via path components v.s. via inodes

          (also replied in HDFS-11203 but likes some clarification). Tracking inodes would work, but that means we have to go from ROOT_INODE_ID to the biggest inodeid, no matter what the EZ is, right? It makes sense if the EZ is /, but if it's a small EZ, we don't want to iterate all inodes. Please elaborate if I misunderstood.

          bullet points from the spiel

          Thanks a lot for the advice! I think all of those were considered, except for Sending more edits than can be buffered without a sync will cause a sync while holding the write lock. I will carefully look at this. For a quick reference, the batch size defaults to 1000.

          Will get more performance numbers from different angles; I want everyone's gut kept safe. Quick question: what tool did you use to blast the NN? Is it shareable?

          xiaochen Xiao Chen added a comment -

          Thanks for the review Wei-Chiu!

          Patch 13 addressed most, comments below:

          ReencryptionStatus#listReencryptionStatus
          final boolean hasMore = (numResp < tailMap.size());
          This seems untrue. if the system has 1001 (tailMap.size()) encryption zones, and we get 1 (count) encryption zone starting from the 1000th and we expect to get 100 results (numResp), the hasMore should be false.
          It looks like it would cause an infinite loop between client and NN because hasMore is always true.

          tailMap is a view of the entire map. So if 1001 zones have status and prevId is the 1000th, tailMap.size() is 1, so I think the current code is correct.
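
          Xiao's point about the tailMap view can be illustrated with a small, self-contained sketch (the map, key type, and names here are hypothetical stand-ins, not the actual NN data structures):

          ```java
          import java.util.SortedMap;
          import java.util.TreeMap;

          public class TailMapPagination {
              public static void main(String[] args) {
                  // Hypothetical stand-in for the zone-status map, keyed by zone id.
                  TreeMap<Long, String> zones = new TreeMap<>();
                  for (long id = 1; id <= 1001; id++) {
                      zones.put(id, "zone-" + id);
                  }
                  // tailMap is a live *view* starting strictly after the cursor, so
                  // its size is the number of remaining zones, not the total count.
                  long prevId = 1000;
                  SortedMap<Long, String> tailMap = zones.tailMap(prevId, false);
                  int numResp = Math.min(100, tailMap.size()); // zones in this response
                  boolean hasMore = numResp < tailMap.size();
                  System.out.println("remaining=" + tailMap.size() + ", hasMore=" + hasMore);
                  // prints: remaining=1, hasMore=false
              }
          }
          ```

          Since the view only covers entries after prevId, comparing the response count against the view's size terminates the iteration correctly.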

          Javadoc for HdfsAdmin#listReencryptionStatus:

          Thanks Andrew for the suggestion. Created HDFS-12261 to limit the change of this jira.

          Would ReencryptionStatus#getZoneStatus ever return null?

          Theoretically could, but shouldn't, since zone status is protected by fsd lock. Added asserts to all callers.

          fixReencryptionZoneNames naming...

          Changed to updateReencryptionZoneNamesAfterImageLoaded, how does that sound? I'm having a hard time giving it a meaningful yet short name.

          In addition to the above, this patch also includes:

          • use inodeid as much as possible, instead of path names. I see HDFS-12172 has some nice optimizations by Daryn Sharp, should rebase this once that's done.
          • implement throttling on the handler and updater sides. This should give some freedom to control lock contention. Ideally these throttlers should be changeable on the fly - not done in this patch though.

          what tool did you use to blast the NN

          Should play with nnbenchmark I think. Appreciate other ideas though.

          Also looked at the edit log flooding problem. The default size is 256 - I have yet to verify how many ops it can hold, but it sounds like we should be doing something smart to proactively unlock, sync, and reacquire if this is destined to happen.

          I will be doing performance related tests/tunings next week, and will post results and updated patch here by next Friday.

          xiaochen Xiao Chen added a comment -

          There's also HADOOP-14705 which Wei-Chiu Chuang is helping to review. Once that's done this jira will also need a rebase.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 17s Docker mode activated.
                Prechecks
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 3 new or modified test files.
                trunk Compile Tests
          0 mvndep 0m 25s Maven dependency ordering for branch
          +1 mvninstall 15m 37s trunk passed
          +1 compile 1m 40s trunk passed
          +1 checkstyle 0m 54s trunk passed
          +1 mvnsite 1m 40s trunk passed
          -1 findbugs 1m 32s hadoop-hdfs-project/hadoop-hdfs-client in trunk has 2 extant Findbugs warnings.
          -1 findbugs 1m 55s hadoop-hdfs-project/hadoop-hdfs in trunk has 9 extant Findbugs warnings.
          +1 javadoc 1m 14s trunk passed
                Patch Compile Tests
          0 mvndep 0m 7s Maven dependency ordering for patch
          +1 mvninstall 1m 33s the patch passed
          +1 compile 1m 42s the patch passed
          +1 cc 1m 42s the patch passed
          +1 javac 1m 42s the patch passed
          -0 checkstyle 0m 59s hadoop-hdfs-project: The patch generated 31 new + 959 unchanged - 2 fixed = 990 total (was 961)
          +1 mvnsite 1m 42s the patch passed
          +1 whitespace 0m 1s The patch has no whitespace issues.
          +1 xml 0m 2s The patch has no ill-formed XML file.
          +1 findbugs 3m 51s the patch passed
          +1 javadoc 1m 5s the patch passed
                Other Tests
          +1 unit 1m 23s hadoop-hdfs-client in the patch passed.
          -1 unit 75m 53s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 18s The patch does not generate ASF License warnings.
          115m 38s



          Reason Tests
          Failed junit tests hadoop.hdfs.server.datanode.TestDirectoryScanner
            hadoop.hdfs.TestDFSStripedInputStreamWithRandomECPolicy
            hadoop.hdfs.server.namenode.TestReencryption



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:14b5c93
          JIRA Issue HDFS-10899
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12880492/HDFS-10899.13.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle cc xml
          uname Linux 0fc3b8b55b3b 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 024c3ec
          Default Java 1.8.0_131
          findbugs v3.1.0-RC1
          findbugs https://builds.apache.org/job/PreCommit-HDFS-Build/20567/artifact/patchprocess/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-client-warnings.html
          findbugs https://builds.apache.org/job/PreCommit-HDFS-Build/20567/artifact/patchprocess/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/20567/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/20567/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/20567/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/20567/console
          Powered by Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          xiaochen Xiao Chen added a comment -

          Looked at the Daryn comment about edit log buffering and experimented.

          For a SetXAttrOp, since re-encryption always updates the 1 encryption xattr, the size of one logEdit is between 100 and 200 bytes (including the 17-byte overhead which is always there). The default buffer size is 512KB, non-configurable for both the qjm and fjm, so the default batch size of 1000 is safe - as long as we logSync after each batch, which is the current behavior.

          As a safety check, I'm adding a scary warning if batch size is larger than 2000.
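
          For reference, the arithmetic behind those numbers can be sanity-checked (the buffer and op sizes below are the figures quoted above, treated as assumptions):

          ```java
          public class BatchSizeCheck {
              public static void main(String[] args) {
                  // Figures quoted in the discussion, treated here as assumptions:
                  final int bufferBytes = 512 * 1024; // default edit log buffer (qjm/fjm)
                  final int maxOpBytes = 200;         // upper bound for one SetXAttrOp

                  int worstCaseOpsPerBuffer = bufferBytes / maxOpBytes;
                  System.out.println("worst-case ops per buffer = " + worstCaseOpsPerBuffer);
                  // prints: worst-case ops per buffer = 2621
                  // A batch of 1000 fits easily; even the 2000 warning threshold
                  // stays below ~2600, as long as each batch is logSync'ed.
              }
          }
          ```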

          Also benchmarking to save our guts about NN throughput. Will post when all results are ready.

          xiaochen Xiao Chen added a comment -

          Thanks again Daryn Sharp for the guidelines about general benchmarking.

          I have run some numbers, proving the throttler to work. Shared a result doc.
          Unfortunately for the time being I don't have a beefy enough machine to stress the NN to reach the ops/sec you described, and only have a gce instance. Therefore as shown in the doc, the stress is bottlenecked on CPU, instead of the read/write lock.

          Some highlights from the doc which I think shows the throttling taking effect:

          • Throttle ratio 0.1, without stress (only re-encryption), we see host CPU dropped to be steadily < 50%.
          • Without throttling, throughput during re-encryption is only ~ 65% of baseline.
          • With throttle ratio 0.1, ~ 75%
          • With an extreme throttle ratio of 0.001, ~90%

          I'll look into why 0.001 doesn't provide a 99% throughput. But otherwise ready for review.
          Patch 14 is attached, based on trunk on top of HADOOP-14705's latest patch.

          xiaochen Xiao Chen added a comment - - edited

          Just wanted to update on the mystery of the 0.001 ratio not yielding 99% throughput. After much profiling, it turns out the contributor here is just the CPU being stressed (while the throttler calculation is based on lock time). Outside the lock there are various parts consuming CPU, from the callable contacting the KMS to the thread pool managing the futures.
          If I change the ratio to 0.000001 (hence sleeping except for the 1st batch), in my env the throughput is at the same level as without re-encryption.

          Assuming on a production NN lock is usually the bottleneck, instead of CPU, I think this achieves our goal to allow a fast enough re-encryption by default, and leave the configuration flexibility to slow re-encryption down to keep up throughput.
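
          A minimal sketch of the kind of lock-time-based throttling described here (class and method names are hypothetical; the real handler's accounting is more involved):

          ```java
          public class ReencryptionThrottler {
              private final double throttleRatio; // target fraction of time under lock
              private long lockTimeMs = 0;        // accumulated time holding the lock

              public ReencryptionThrottler(double throttleRatio) {
                  this.throttleRatio = throttleRatio;
              }

              public void recordLockTime(long ms) {
                  lockTimeMs += ms;
              }

              // How long to sleep so that lockTime / (elapsed + sleep) <= ratio.
              public long sleepNeededMs(long elapsedMs) {
                  long requiredElapsed = Math.round(lockTimeMs / throttleRatio);
                  return Math.max(0, requiredElapsed - elapsedMs);
              }

              public static void main(String[] args) {
                  ReencryptionThrottler t = new ReencryptionThrottler(0.1);
                  t.recordLockTime(50);                     // 50ms spent under the lock
                  System.out.println(t.sleepNeededMs(200)); // prints: 300
              }
          }
          ```

          This also shows why a tiny ratio only caps lock time, not CPU: work done outside the lock never enters the calculation.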

          Closing up on a few TODOs around KMS error handling and fault injected unit testing, will update soon.

          jojochuang Wei-Chiu Chuang added a comment -

          Hi Xiao, thanks for the latest work!
          I reviewed the latest patch (rev 014), mostly focused on ReencryptionHandler.

          1. ReencryptionHandler#(constructor)

             if (reencryptBatchSize > 2000) {
               // 2000 is based on buffer size = 512 * 1024, and SetXAttr op size ~ 200.
               LOG.warn("Re-encryption batch size is {}. It could cause edit log buffer "
                   + "to be full an trigger a logSync within the writelock, greatly "
                   + "impacting namenode throughput.");
             }

             • The log needs to print reencryptBatchSize.
             • It might also improve readability to make the hard-coded number (2000) a static final member variable.
             • Please also add a few more lines of comment referencing QuorumJournalManager#outputBufferCapacity.
          2. ReencryptionHandler#batchService and ReencryptionUpdater#batchService can be declared as ExecutorCompletionService<ReencryptionTask>.
          3. ReencryptionHandler#removeZone(): it looks like you can de-dup some code by reusing ReencryptionHandler#cancelZone().
          4. ReencryptionHandler#reencryptEncryptionZone()

             zs = getReencryptionStatus().getZoneStatus(zoneId);
             assert zs != null;

             • zoneId is obtained while holding the FSDirectory read lock, which is then released and re-acquired before this code runs.
             • This assertion is only correct if there is only one ReencryptionHandler running.
          5. ReencryptionStatus#updateZoneStatus() should check that zoneNode is an encryption zone.
          6. Why is currentBatch a TreeMap?
          7. Does ZoneReencryptionStatus#getLastProcessedFile return the relative path, the file name only, or the absolute path?
          8. ReencryptionHandler#checkZoneReady

             dir.getFSNamesystem()
                 .checkNameNodeSafeMode("NN is in safe mode, cannot " + "re-encrypt");

             • The "+" is redundant.
          9. ReencryptionHandler#reencryptDir

             curr = parent;
             curr =
                 reencryptDirInt(zoneId, currentBatch, curr, startAfters, ezKeyVerName);

             Better rewritten as:

             curr =
                 reencryptDirInt(zoneId, currentBatch, parent, startAfters, ezKeyVerName);
          10. ReencryptionHandler#reencryptDirInt

              assert dir.hasReadLock();
              Preconditions.checkNotNull(curr, "Current inode can't be null");
              assert dir.hasReadLock();

              • The second assertion is redundant.
          11. ReencryptionHandler#reencryptINode

              FileEncryptionInfo feInfo = FSDirEncryptionZoneOp
                  .getFileEncryptionInfo(dir, INodesInPath.fromINode(inode));
              if (feInfo == null || ezKeyVerName.equals(feInfo.getEzKeyVersionName())) {
                LOG.debug("File {} skipped re-encryption because edek's key version name"
                    + " is not changed.", name);
                return false;
              }

              • If feInfo is null, it would be unexpected: an INodeFile under an encryption zone is supposed to have a FileEncryptionInfo.
          12. ReencryptionHandler#submitCurrentBatch
              • It allocates a 2000-element map, copies it over, and then clears the map. That looks suboptimal. Would it be feasible to wrap the TreeMap and make a method that simply assigns the TreeMap reference to another currentBatch?
          13. ReencryptionHandler.EDEKReencryptCallable:

              // TODO: add reasonable retries here.

              • I feel the retry should be handled by the provider.
              • If reencryptEdeks() returns numFailures > 0, call() should not return a new ReencryptionTask object.
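
          The reference-swap alternative to cloning the current batch (suggested above for submitCurrentBatch) could look like this sketch with hypothetical types (the real batch maps file names to EDEK info):

          ```java
          import java.util.TreeMap;

          public class BatchSwapSketch {
              // Hypothetical stand-in for the handler's current batch.
              private TreeMap<String, String> currentBatch = new TreeMap<>();

              public void add(String path, String edek) {
                  currentBatch.put(path, edek);
              }

              // Hand off the filled batch and start a fresh one; no clone(), no clear().
              public TreeMap<String, String> takeBatch() {
                  TreeMap<String, String> batch = currentBatch;
                  currentBatch = new TreeMap<>();
                  return batch;
              }

              public static void main(String[] args) {
                  BatchSwapSketch s = new BatchSwapSketch();
                  s.add("/zone/file1", "edek1");
                  TreeMap<String, String> submitted = s.takeBatch();
                  System.out.println(submitted.size() + " " + s.takeBatch().size());
                  // prints: 1 0
              }
          }
          ```

          The caller takes ownership of the old map and the handler keeps filling a new one, avoiding both the unchecked-cast clone and the copy-then-clear cost.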
          Show
          jojochuang Wei-Chiu Chuang added a comment - Hi Xiao, thanks for the latest work! I reviewed the latest patch (rev 014), mostly focused on ReencryptionHandler. ReencryptionHandler#(constructor) ReencryptionHandler#(constructor) if (reencryptBatchSize > 2000) {
 // 2000 is based on buffer size = 512 * 1024, and SetXAttr op size ~ 200.
 LOG.warn( "Re-encryption batch size is {}. It could cause edit log buffer " 
 + "to be full an trigger a logSync within the writelock, greatly " 
 + "impacting namenode throughput." );
} The log needs to print reencryptBatchSize. It might also improve readability by making the hard coded number (2000) a final static member variable. Please also add few more lines of comment to add the reference to QuorumJournalManager#outputBufferCapacity. ReencryptionHandler#batchService, and ReencryptionUpdater#batchService can be declared as ExecutorCompletionService<ReencryptionTask> ReencryptionHandler#removeZone(): It looks like you can de-dup some code by reusing ReencryptionHandler#cancelZone(). reencryptionHandler#reencryptEncryptionZone() reencryptionHandler#reencryptEncryptionZone() zs = getReencryptionStatus().getZoneStatus(zoneId);
 assert zs != null ; zoneId is obtained when holding FSDirectory read lock, release the lock, and then acquire FSDirectory read lock again. This assertion is only correct if there will be only one ReencryptionHandler running. ReencryptionStatus#updateZoneStatus() should check that zoneNode is an encryption zone. Why is currentBatch a TreeMap? Does ZoneReencryptionStatus#getLastProcessedFile return the relative path? or file name only? or absolute path? ReencryptionHandler#checkZoneReady ReencryptionHandler#checkZoneReady dir.getFSNamesystem() 
 .checkNameNodeSafeMode( "NN is in safe mode, cannot " + "re-encrypt" ); The “+” is redundant. ReencryptionHandler#reencryptDir ReencryptionHandler#reencryptDir curr = parent;
 curr =
 reencryptDirInt(zoneId, currentBatch, curr, startAfters, ezKeyVerName); Better rewrite as 
 curr = 
 reencryptDirInt(zoneId, currentBatch, parent, startAfters, ezKeyVerName); ReencryptionHandler#reencryptDirInt ReencryptionHandler#reencryptDirInt assert dir.hasReadLock(); 
Preconditions.checkNotNull(curr, "Current inode can't be null " );
 assert dir.hasReadLock(); The second assertion is redundant. ReencryptionHandler#reencryptINode ReencryptionHandler#reencryptINode FileEncryptionInfo feInfo = FSDirEncryptionZoneOp 
 .getFileEncryptionInfo(dir, INodesInPath.fromINode(inode)); 
 if (feInfo == null || ezKeyVerName.equals(feInfo.getEzKeyVersionName())) { 
 LOG.debug( "File {} skipped re-encryption because edek's key version name" 
 + " is not changed." , name); 
 return false ; 
} if feInfo is null, it would be unexpected — an INodeFile under an encryption zone is supposed to have FileEncryptionInfo. ReencryptionHandler#submitCurrentBatch It Allocates a 2000-element map, copy it over, and then clear the map. That looks suboptimal. Would it be feasible to wrap TreeMap and make a method that simply assigns the TreeMap reference to another currentBatch? ReencryptionHandler.EDEKReencryptCallable: // TODO: add reasonable retries here. I feel the retry should be handled by provider. if reencryptEdeks() returns numFailures > 0, call() should not return a new ReencryptionTask object.
          jojochuang Wei-Chiu Chuang added a comment -

          Still part way through the rev014 patch. This time with more focus on ReencryptionUpdater:

          It looks like ReencryptionTask.batch does not need to use the file name as the key; instead, it can use the INode id as the key, which reduces the overhead of translating inodes to file names back and forth.

          ReencryptionUpdater#processOneTask

          It looks like this method would fit better in ReencryptionTask than in ReencryptionUpdater. (Other than dir, none of the member variables is used.)

          final FileEdekInfo rt = entry.getValue();
          

          Could you use a more descriptive variable name than “rt”?

          LOG.info("Updated xattrs on {}({}) files in zone {} for re-encryption,"
        + " starting:{}.", task.numFilesUpdated, task.batch.size(),
    startingFile, zonePath);
          

          I think you got startingFile and zonePath reversed.

          Does task.numFilesUpdated equal task.batch.size()?

          ReencryptionUpdater#processCompletedTasks
          This variable name is a little cryptic:
          final ZoneSubmissionTracker zst

          There are a few TODOs

          } catch (RetriableException re) {
            // TODO
          } catch (IOException ioe) {
            // TODO
          }
          
          jojochuang Wei-Chiu Chuang added a comment -

          ReencryptionUpdater#throttle():

          private void throttle() throws InterruptedException {
  if (throttleLimitRatio >= 1.0) {
    return;
  }
          

          the default is 1.0, which means no throttle for updater, and updater would keep contending for namenode lock even if it has nothing to do?
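          A minimal sketch of how a work-to-total-time ratio throttle might behave (illustrative only, not the actual ReencryptionUpdater code; the class and method names are hypothetical): with ratio r in (0, 1], after working for W ms the thread sleeps so that work time is at most r of the total elapsed time, i.e. sleep = W * (1 - r) / r.

```java
// Illustrative ratio-based throttle: compute how long to sleep after a
// stretch of work so that work occupies at most `ratio` of elapsed time.
public class ThrottleSketch {
    static long sleepMillisFor(long workedMillis, double ratio) {
        if (ratio >= 1.0) {
            return 0; // a ratio of 1.0 disables throttling entirely
        }
        // workedMillis / (workedMillis + sleep) == ratio  =>  solve for sleep
        return (long) (workedMillis * (1.0 - ratio) / ratio);
    }

    public static void main(String[] args) {
        System.out.println(sleepMillisFor(500, 1.0));  // no throttle
        System.out.println(sleepMillisFor(500, 0.5));  // work half the time
        System.out.println(sleepMillisFor(300, 0.25)); // work a quarter of the time
    }
}
```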

          ReencryptionUpdater#processCheckpoints

          LOG.warn("Failed to update re-encrypted edek to xattr for file {}",
    zonePath);
          

          zonePath is not a file name.

          Please log the ioe as well.

          ReencryptionUpdater#processCheckpoints()

          LinkedList<Future> tasks = zst.getTasks();
          

          ReencryptionHandler and ReencryptionUpdater shares ZoneSubmissionTracker#tasks.

          Access to zst.getTasks() is protected in processCheckpoints() by a FSDirectory write lock and a FSNamesystem write lock, whereas it is protected in ReencryptionHandler#submitCurrentBatch by a FSDirectory read lock and a FSNamesystem read lock. If there will always be just one ReencryptionHandler, then this is okay.

          List<XAttr> xAttrs = null;
          …
          while ( … ) {
            xAttrs = Lists.newArrayListWithCapacity(1);
            final XAttr xattr = FSDirEncryptionZoneOp
                .updateReencryptionProgress(dir, zoneNode, status, task.lastFile,
                    task.numFilesUpdated, task.numFailures);
            xAttrs.add(xattr);
          }
          …
          return xAttrs;
          

          Looks like you allocate a new xAttrs array for each task, but then discard it, only to return the xAttrs array for the last task? I guess you want to instantiate the xAttrs array outside of the while loop?

          The edit log is written only when all tasks are successful. If some tasks fail, the successful tasks would record their xattrs in memory, but these will not get written into the edit log.
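          The allocate-inside-the-loop issue called out above can be reduced to a small self-contained sketch (class, method and string contents are illustrative, not the HDFS code):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative contrast: re-allocating the result list inside the loop
// discards all but the last iteration's element; hoisting the allocation
// accumulates one entry per task.
public class XAttrListSketch {
    static List<String> buggy(String[] tasks) {
        List<String> xAttrs = null;
        for (String task : tasks) {
            xAttrs = new ArrayList<>(1); // re-allocated: previous entries lost
            xAttrs.add("xattr-for-" + task);
        }
        return xAttrs;
    }

    static List<String> fixed(String[] tasks) {
        List<String> xAttrs = new ArrayList<>(tasks.length); // allocated once
        for (String task : tasks) {
            xAttrs.add("xattr-for-" + task);
        }
        return xAttrs;
    }

    public static void main(String[] args) {
        String[] tasks = {"t1", "t2", "t3"};
        System.out.println(buggy(tasks).size()); // only the last survives
        System.out.println(fixed(tasks).size()); // all recorded
    }
}
```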

          xiaochen Xiao Chen added a comment - - edited

          Thanks a lot for the reviews Wei-Chiu Chuang, good comments!
          Replying one by one, and attaching a patch at the end. Comments not mentioned here are all addressed.

          reencryptionHandler#reencryptEncryptionZone()
          zoneId is obtained when holding FSDirectory read lock, release the lock, and then acquire FSDirectory read lock again.
          This assertion is only correct if there will be only one ReencryptionHandler running.

          There is only one ReencryptionHandler. Added texts to javadoc.
          If the zone referred to by inodeid is changed (e.g. deleted/renamed) while the lock is not held, checkZoneReady will throw. A similar test case would be TestReencryption#testZoneDeleteDuringReencrypt.

          ReencryptionStatus#updateZoneStatus() should check that zoneNode is an encryption zone.

          For the 2 callers, FSD#addEZ is where the zoneId is added, so always true. FSDirXAttrOp#unprotectedSetXAttrs is happening within the EZXattr, so also always true. (There's no 'disable encryption' command, so zone node can only be deleted/renamed)

          Why is currentBatch a TreeMap?

          Good question. Initially this was done to keep the elements' ordering, using the path as the key. Now that it's changed to be inode id based, we can just use a list. (Sorry, didn't rebase the inodeid patch here on 14...)

          Does ZoneReencryptionStatus#getLastProcessedFile return the relative path? or file name only? or absolute path?

          Absolute path - so we can restore in case of fail over.

          It allocates a 2000-element map, copies it over, and then clears the map. That looks suboptimal. Would it be feasible to wrap TreeMap and add a method that simply assigns the TreeMap reference to another currentBatch?

          Agreed; the problem is that currentBatch is passed in from the very top of the call stack.
          Made it a member variable of ReencryptionHandler to address this. It's still safe with the single-threaded handler model, but perhaps harder to read. Please share your thoughts.

          EDEKReencryptCallable ... retry ... if reencryptEdeks() returns numFailures > 0, call() should not return a new ReencryptionTask object.

          Initially talking with Andrew Wang, we wanted to always retry things, so admin can just fix the error, and continue (or cancel).
          But since KMSCP already has the retry logic added by HADOOP-14521, and to trade off for maintainability, we do not 'double retry' here, and only let KMSCP's retry policy to handle failures.
          When -listReencryptionStatus is run, if numOfFailures > 0, a message is printed asking the admin to examine the failures and re-submit.
          Implementation-wise, we still depend on the ReencryptionTask object to pass the failures to the updater, so need that object. Updater handles failed tasks differently.

          xiaochen Xiao Chen added a comment -

          It looks like ReencryptionTask.batch does not need to use file name as the key; instead, it can use INode id as key, and this way it reduces the overhead to translate inode to file name back and forth.

          Yup, I think that's in line with Daryn's comment.

          ReencryptionUpdater#processOneTask ... better part of ReencryptionTask instead of ReencryptionUpdater...

          The intention was that the handler produces callables, and the updater consumes callables. ReencryptionTask is a struct holding information about the callable, so I think processing it in the updater makes sense.

          Does task.numFilesUpdated equal task.batch.size()?

          May or may not. If something caused a file to be skipped (the conditions above that lead to a continue), then they are not equal.

          This variable name is a little cryptic: zst

          Renamed to 'tracker', better name suggestion welcome.

          There are a few TODOs

          Failure handling wasn't done at the time. patch 15 does it, with added unit tests that utilize fault injector.

          xiaochen Xiao Chen added a comment - - edited

          ReencryptionUpdater#throttle(): updater would keep contending for namenode lock

          batchService.take() is a blocking call, so the updater blocks there if there is nothing to do, leaving the NN lock untouched.
          1.0 means no throttling, so it would be tough on locking - that's because this is intended to be run in a maintenance window. Same reason why renames are disabled during this time.
          The throttler also considers how many tasks are pending, to prevent piling up tasks on the NN heap.

          ... ZoneSubmissionTracker#tasks.... If there will always be just one ReencryptionHandler, then this is okay.

          Good analysis. Yes, one handler.

          the edit log is written only when all tasks are successful.

          Edit: I think I misread your comment.
          Yes, in case of failures, we make a best effort by telling the admin 'there are failures, please examine and rerun'.

          xiaochen Xiao Chen added a comment -

          Patch 15 uploaded, still based on PATCH-14705.

          Addressed all of Wei-Chiu Chuang's comments above (I think), as well as some issues found during internal testing:

          • fixed listReencryptionStatus to track by inode, and also filter out snapshots
          • added fault injectors and related unit tests for failure handling.
          • throttling improvements
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 18s Docker mode activated.
                Prechecks
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 3 new or modified test files.
                trunk Compile Tests
          0 mvndep 0m 35s Maven dependency ordering for branch
          +1 mvninstall 14m 6s trunk passed
          +1 compile 1m 24s trunk passed
          +1 checkstyle 0m 54s trunk passed
          +1 mvnsite 1m 27s trunk passed
          +1 findbugs 3m 4s trunk passed
          +1 javadoc 1m 2s trunk passed
                Patch Compile Tests
          0 mvndep 0m 8s Maven dependency ordering for patch
          +1 mvninstall 1m 20s the patch passed
          +1 compile 1m 22s the patch passed
          +1 cc 1m 22s the patch passed
          +1 javac 1m 22s the patch passed
          -0 checkstyle 0m 52s hadoop-hdfs-project: The patch generated 43 new + 933 unchanged - 2 fixed = 976 total (was 935)
          +1 mvnsite 1m 22s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 xml 0m 2s The patch has no ill-formed XML file.
          +1 findbugs 3m 16s the patch passed
          +1 javadoc 0m 55s the patch passed
                Other Tests
          +1 unit 1m 12s hadoop-hdfs-client in the patch passed.
          -1 unit 72m 21s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 20s The patch does not generate ASF License warnings.
          107m 29s



          Reason Tests
          Failed junit tests hadoop.hdfs.server.namenode.ha.TestEditLogTailer
            hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting
            hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:14b5c93
          JIRA Issue HDFS-10899
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12883075/HDFS-10899.15.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle cc xml
          uname Linux 6e0ecc2e1b22 3.13.0-117-generic #164-Ubuntu SMP Fri Apr 7 11:05:26 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 4ec5acc
          Default Java 1.8.0_144
          findbugs v3.1.0-RC1
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/20805/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/20805/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/20805/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/20805/console
          Powered by Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          jojochuang Wei-Chiu Chuang added a comment - - edited

          Thanks for the rev015 patch!

          Looks like all the concerns found in the reviews are addressed.

          Given that

          1. this feature does not affect existing functionality if not used,
          2. there is sufficient proof that it works in an integrated scale test,
          3. and all deficiencies are considered and addressed,

          I would like to vote my +1 for the latest, rev 015 patch (pending Jenkins and checkstyle), and will proceed to commit the patch after 24 hours if there's no objection. If minor deficiencies are found afterwards, I'd suggest deferring them to a new jira.

          xiaochen Xiao Chen added a comment -

          Thanks a lot Wei-Chiu!
          Patch 16 fixes the checkstyle warnings and edits a few log messages. The failed tests did not reproduce locally.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 16s Docker mode activated.
                Prechecks
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 3 new or modified test files.
                trunk Compile Tests
          0 mvndep 0m 25s Maven dependency ordering for branch
          +1 mvninstall 16m 16s trunk passed
          +1 compile 1m 44s trunk passed
          +1 checkstyle 0m 58s trunk passed
          +1 mvnsite 1m 43s trunk passed
          +1 findbugs 3m 41s trunk passed
          +1 javadoc 1m 10s trunk passed
                Patch Compile Tests
          0 mvndep 0m 9s Maven dependency ordering for patch
          +1 mvninstall 1m 43s the patch passed
          +1 compile 1m 47s the patch passed
          +1 cc 1m 47s the patch passed
          +1 javac 1m 47s the patch passed
          -0 checkstyle 1m 1s hadoop-hdfs-project: The patch generated 10 new + 932 unchanged - 2 fixed = 942 total (was 934)
          +1 mvnsite 1m 44s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 xml 0m 3s The patch has no ill-formed XML file.
          +1 findbugs 3m 51s the patch passed
          +1 javadoc 1m 8s the patch passed
                Other Tests
          +1 unit 1m 21s hadoop-hdfs-client in the patch passed.
          -1 unit 76m 0s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 21s The patch does not generate ASF License warnings.
          117m 13s



          Reason Tests
          Failed junit tests hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy
            hadoop.hdfs.TestDFSStripedOutputStreamWithFailure080
            hadoop.hdfs.server.namenode.ha.TestFailureToReadEdits
          Timed out junit tests org.apache.hadoop.hdfs.server.blockmanagement.TestBlockStatsMXBean



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:14b5c93
          JIRA Issue HDFS-10899
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12883263/HDFS-10899.16.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle cc xml
          uname Linux b590f7db5adb 3.13.0-123-generic #172-Ubuntu SMP Mon Jun 26 18:04:35 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 4249172
          Default Java 1.8.0_144
          findbugs v3.1.0-RC1
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/20815/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/20815/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/20815/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/20815/console
          Powered by Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          xiaochen Xiao Chen added a comment -

          Wei-Chiu Chuang suggested offline that the doc should cover more. So patch 17 is a doc-only update on top of 16.

          The checkstyle warnings are all DFSConfigKeys lines over 80 characters, so good to go as-is.

          jojochuang Wei-Chiu Chuang added a comment -

          Thanks Xiao,
          the doc update doesn't need to be in this patch. But it looks good to me so +1, pending Jenkins.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 17s Docker mode activated.
                Prechecks
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 3 new or modified test files.
                trunk Compile Tests
          0 mvndep 0m 8s Maven dependency ordering for branch
          +1 mvninstall 17m 49s trunk passed
          +1 compile 1m 53s trunk passed
          +1 checkstyle 1m 1s trunk passed
          +1 mvnsite 1m 51s trunk passed
          +1 findbugs 3m 41s trunk passed
          +1 javadoc 1m 15s trunk passed
                Patch Compile Tests
          0 mvndep 0m 9s Maven dependency ordering for patch
          +1 mvninstall 1m 44s the patch passed
          +1 compile 1m 51s the patch passed
          +1 cc 1m 51s the patch passed
          +1 javac 1m 51s the patch passed
          -0 checkstyle 0m 59s hadoop-hdfs-project: The patch generated 10 new + 931 unchanged - 2 fixed = 941 total (was 933)
          +1 mvnsite 1m 47s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 xml 0m 3s The patch has no ill-formed XML file.
          +1 findbugs 4m 7s the patch passed
          +1 javadoc 1m 14s the patch passed
                Other Tests
          +1 unit 1m 29s hadoop-hdfs-client in the patch passed.
          -1 unit 89m 29s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 22s The patch does not generate ASF License warnings.
          133m 8s



          Reason Tests
          Failed junit tests hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure
            hadoop.hdfs.TestEncryptionZones
            hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting
            hadoop.hdfs.server.balancer.TestBalancerRPCDelay
            hadoop.hdfs.server.namenode.TestReconstructStripedBlocks
            hadoop.hdfs.TestDFSStripedOutputStreamWithFailure080
            hadoop.hdfs.TestDFSStripedInputStreamWithRandomECPolicy



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:14b5c93
          JIRA Issue HDFS-10899
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12883375/HDFS-10899.17.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle cc xml
          uname Linux 2cf5c297fdad 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / f49843a
          Default Java 1.8.0_144
          findbugs v3.1.0-RC1
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/20824/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/20824/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/20824/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/20824/console
          Powered by Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          xiaochen Xiao Chen added a comment -

          Committed to trunk.
          (The checkstyle warnings are the same as above, in DFSConfigKeys. I ran the failed tests locally and could not reproduce them, except for TestBalancerRPCDelay, which fails with or without this patch; filed HDFS-12347 for that.)

          My sincere thanks to Andrew Wang, Wei-Chiu Chuang, Daryn Sharp, Dilaver, Suraj Acharya, Anthony Young-Garner, Aaron T. Myers, Yongjun Zhang, Misha Dmitriev, and everyone else involved!
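          For readers following along, the committed TransparentEncryption.md describes the new admin workflow. A minimal sketch of the CLI usage looks like the following; it requires HDFS superuser privileges and a running cluster, and the zone path `/zone` is a hypothetical example:

```shell
# After the EZ key has been rolled on the KMS, start re-encrypting
# all EDEKs under the encryption zone.
hdfs crypto -reencryptZone -start -path /zone

# Monitor the progress of re-encryptions on the NameNode.
hdfs crypto -listReencryptionStatus

# Cancel an in-flight re-encryption if needed.
hdfs crypto -reencryptZone -cancel -path /zone
```

          Re-encryption runs asynchronously on the NameNode, so -start returns immediately and -listReencryptionStatus is the way to observe completion.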

          hudson Hudson added a comment -

          SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #12233 (See https://builds.apache.org/job/Hadoop-trunk-Commit/12233/)
          HDFS-10899. Add functionality to re-encrypt EDEKs. (xiao: rev 1000a2af04b24c123a3b08168f36b4e90420cab7)

          • (add) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestReencryption.java
          • (add) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/ReencryptionStatus.java
          • (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
          • (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsConstants.java
          • (add) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestReencryptionHandler.java
          • (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelperClient.java
          • (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/resources/testCryptoConf.xml
          • (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolServerSideTranslatorPB.java
          • (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirEncryptionZoneOp.java
          • (add) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ReencryptionUpdater.java
          • (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirWriteFileOp.java
          • (add) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/ZoneReencryptionStatus.java
          • (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
          • (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
          • (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/CryptoAdmin.java
          • (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/client/HdfsAdmin.java
          • (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java
          • (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/proto/ClientNamenodeProtocol.proto
          • (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EncryptionFaultInjector.java
          • (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
          • (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/proto/encryption.proto
          • (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
          • (add) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/ReencryptionStatusIterator.java
          • (edit) hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/TransparentEncryption.md
          • (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolTranslatorPB.java
          • (add) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ReencryptionHandler.java
          • (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/proto/hdfs.proto
          • (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EncryptionZoneManager.java
          • (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirXAttrOp.java
          • (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
          • (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java

            People

            • Assignee:
              xiaochen Xiao Chen
              Reporter:
              xiaochen Xiao Chen
            • Votes:
              0
              Watchers:
              13
