
Convert BlockTokenIdentifier to use Protobuf

    Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.9.0, 3.0.0-alpha1
    • Fix Version/s: 3.0.0-alpha4
    • Component/s: hdfs, hdfs-client
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Changed the serialized format of BlockTokenIdentifier to protocol buffers. Includes logic to decode both the old Writable format and the new PB format to support existing clients. Client implementations in other languages will require similar functionality.

      Description

      BlockTokenIdentifier currently uses a DataInput/DataOutput (basically a byte[]) and manual serialization to get data into and out of the encrypted buffer (in BlockKeyProto). Other TokenIdentifiers (e.g. ContainerTokenIdentifier, AMRMTokenIdentifier) use Protobuf. BlockTokenIdentifier should use Protobuf as well so it can be expanded more easily and will be consistent with the rest of the system.
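      For context, the hand-rolled Writable serialization being replaced looks roughly like the sketch below (field set and ordering are approximate; the AccessMode enum stands in for BlockTokenSecretManager.AccessMode), and the protobuf builder shown in the trailing comment is illustrative rather than the exact generated API:

      import java.io.DataOutput;
      import java.io.IOException;
      import java.util.EnumSet;
      import org.apache.hadoop.io.WritableUtils;

      // Sketch only: the manual, field-by-field Writable serialization this
      // issue proposes to replace with a generated protobuf message.
      class LegacyBlockTokenWriteSketch {
        enum AccessMode { READ, WRITE, COPY, REPLACE }

        static void writeLegacy(DataOutput out, long expiryDate, int keyId,
            String userId, String blockPoolId, long blockId,
            EnumSet<AccessMode> modes) throws IOException {
          WritableUtils.writeVLong(out, expiryDate);
          WritableUtils.writeVInt(out, keyId);
          WritableUtils.writeString(out, userId);
          WritableUtils.writeString(out, blockPoolId);
          WritableUtils.writeVLong(out, blockId);
          WritableUtils.writeVInt(out, modes.size());
          for (AccessMode mode : modes) {
            WritableUtils.writeEnum(out, mode);
          }
        }
        // With protobuf, these writes collapse to something like
        // (builder method names illustrative):
        //   BlockTokenSecretProto.newBuilder().setExpiryDate(expiryDate)
        //       .setKeyId(keyId). ... .build().writeTo(stream);
      }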

      1. blocktokenidentifier-protobuf.patch
        7 kB
        Ewan Higgs
      2. HDFS-11026.002.patch
        27 kB
        Ewan Higgs
      3. HDFS-11026.003.patch
        27 kB
        Ewan Higgs
      4. HDFS-11026.004.patch
        31 kB
        Ewan Higgs
      5. HDFS-11026.005.patch
        31 kB
        Ewan Higgs
      6. HDFS-11026.006.patch
        34 kB
        Ewan Higgs
      7. HDFS-11026.007.patch
        37 kB
        Ewan Higgs

        Issue Links

          Activity

          kihwal Kihwal Lee added a comment -

          Daryn Sharp, would you care to comment on this?

          ehiggs Ewan Higgs added a comment -

          Attaching a patch that converts the BlockTokenIdentifier to use Protobuf.

          NB: This uses optional values for all the items being written to the token secret since that's the behaviour of the previous version that used WritableUtils.writeString. I think it would be better to force the Strings to be non-null when writing but I chose to do it this way so it's like-for-like with the previous version.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 15s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
          +1 mvninstall 7m 18s trunk passed
          +1 compile 0m 32s trunk passed
          +1 checkstyle 0m 15s trunk passed
          +1 mvnsite 0m 36s trunk passed
          +1 mvneclipse 0m 12s trunk passed
          +1 findbugs 1m 35s trunk passed
          +1 javadoc 0m 19s trunk passed
          +1 mvninstall 0m 31s the patch passed
          +1 compile 0m 29s the patch passed
          +1 cc 0m 29s the patch passed
          +1 javac 0m 29s the patch passed
          -0 checkstyle 0m 13s hadoop-hdfs-project/hadoop-hdfs-client: The patch generated 12 new + 50 unchanged - 0 fixed = 62 total (was 50)
          +1 mvnsite 0m 32s the patch passed
          +1 mvneclipse 0m 10s the patch passed
          -1 whitespace 0m 0s The patch has 4 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
          +1 findbugs 1m 34s the patch passed
          +1 javadoc 0m 18s the patch passed
          +1 unit 0m 55s hadoop-hdfs-client in the patch passed.
          +1 asflicense 0m 16s The patch does not generate ASF License warnings.
          17m 12s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Issue HDFS-11026
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12834137/blocktokenidentifier-protobuf.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle cc
          uname Linux 6acc1ce6805c 3.13.0-92-generic #139-Ubuntu SMP Tue Jun 28 20:42:26 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / c5573e6
          Default Java 1.8.0_101
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/17215/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs-client.txt
          whitespace https://builds.apache.org/job/PreCommit-HDFS-Build/17215/artifact/patchprocess/whitespace-eol.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/17215/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs-client U: hadoop-hdfs-project/hadoop-hdfs-client
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/17215/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          ehiggs Ewan Higgs added a comment -

          No new tests were added to the patch because this is a like-for-like change behind the interfaces.

          daryn Daryn Sharp added a comment -

          -1 While it's a long needed change, there must be backwards compatibility.

          This patch's knife-switch approach will completely break rolling upgrades - i.e. the standard procedure is upgrade the NN, then upgrade the DNs. DNs that have not been upgraded will fail to decode the new PB tokens issued by the upgraded NN. This means full downtime for the to-be-upgraded cluster. Running clients on other clusters accessing the upgraded cluster will experience major disruptions. Neither is acceptable in a production environment.

          This change will require dual decoding of old Writable and new PB tokens. After initial integration the NN must continue issuing old Writable tokens to support rolling upgrades. In some future release, preferably a major release, the NN can start issuing new PB tokens - while maintaining dual decode support to prevent disrupting clients.

          ehiggs Ewan Higgs added a comment -

          Hi Daryn,

          Thanks for taking a look. The patch targets Hadoop 3.0, which hasn't been released yet, and I was under the impression that rolling from 2.x to 3.0 isn't supported (hence the major version number change). Also, the change to make Token Identifiers in YARN use protobuf instead of WritableUtils (YARN-668) was done with no gating. Those changes were done in order to support rolling upgrades in the first place (YARN-666):

          https://github.com/apache/hadoop/commit/5391919b09ce9549d13c897aa89bb0a0536760fe

          That aside, if we want to support both payload formats then I propose a config option (DFS_ACCESS_TOKEN_ENABLE_PROTOBUF = "dfs.block.access.token.protobuf.enable") which is off by default. The NN then sends old- or new-style payloads based on this setting. It should stay off until all datanodes are updated, at which point the config can be changed and the NN starts sending new protobuf-style payloads.

          I'm not confident that the datanode will always be able to detect whether it's looking at a protobuf message or an old-style message (the old format is quite loose). So we could put a boolean in BlockKeyProto that describes the version of the payload.
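          A minimal sketch of using the proposed key from the Java side (the key string is the one proposed above; the default value and any DFSConfigKeys constant name depend on the final patch):

          import org.apache.hadoop.conf.Configuration;

          // Sketch: reading and flipping the proposed setting. Off by default,
          // so the NameNode keeps issuing legacy Writable tokens until an
          // admin opts in.
          public class ProtobufBlockTokenToggle {
            public static void main(String[] args) {
              Configuration conf = new Configuration();
              boolean useProto =
                  conf.getBoolean("dfs.block.access.token.protobuf.enable", false);
              System.out.println("protobuf block tokens enabled: " + useProto);
              // After all DataNodes understand both formats, this is set to true
              // in hdfs-site.xml and the NameNode is restarted.
              conf.setBoolean("dfs.block.access.token.protobuf.enable", true);
            }
          }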

          daryn Daryn Sharp added a comment -

          IMHO, a major upgrade shouldn't be a sufficient reason to introduce avoidable protocol incompatibilities. It'll guarantee that 3.x is DOA for large production deployments.

          Yarn isn't exactly a good counter-example. As you point out, yarn made the change to support RU so the cost/benefit was justified. What is the compelling case that breaks the crucial feature that justified the yarn incompatibility?

          The main difference between hdfs and yarn is scope of impact. The impact from the yarn incompatibility was confined to the cluster being upgraded. This hdfs incompatibility would impact clients on other clusters. Imagine trying to get the stars to align with customers across many clusters to give a green light for their SLAs possibly being impacted. Not once, but every time you upgrade dozens of clusters.

          In the end, the DN will have to support dual decoding for inter-cluster clients, super long running clients holding old tokens, the balancer, etc. If we go forward with this change, I'd prefer to see a designated 2.x "minimum release" before a 3.x upgrade. That designated release would add the latent support for PB tokens.

          chris.douglas Chris Douglas added a comment -

          The length prefix is a WritableUtils vint. The expiryDate field should start with a negative byte, since any valid expiryDate requires more than one byte to represent. In contrast, the encoding for PB prefixes each field with (field_number << 3) | wire_type, so as long as the first field has a field_number less than 16, the first byte will be positive.
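          A small sketch of that observation (WritableUtils is used directly here; the 0x08 tag corresponds to a varint field number 1, as it would for a protobuf message whose first field is expiryDate):

          import java.io.ByteArrayOutputStream;
          import java.io.DataOutputStream;
          import java.io.IOException;
          import org.apache.hadoop.io.WritableUtils;

          // Sketch: why peeking at the first byte can tell the two encodings
          // apart. A Writable token starts with the vint-encoded expiryDate,
          // which is multi-byte for any realistic timestamp and so has a
          // negative first byte; a protobuf message whose first field number
          // is below 16 starts with a small positive tag byte.
          public class FirstByteSketch {
            public static void main(String[] args) throws IOException {
              ByteArrayOutputStream buf = new ByteArrayOutputStream();
              WritableUtils.writeVLong(new DataOutputStream(buf),
                  System.currentTimeMillis());           // legacy expiryDate prefix
              byte legacyFirst = buf.toByteArray()[0];    // negative

              int fieldNumber = 1, wireType = 0;          // varint field #1
              byte pbFirst = (byte) ((fieldNumber << 3) | wireType);  // 0x08, positive

              System.out.println("legacy first byte:   " + legacyFirst);
              System.out.println("protobuf first byte: " + pbFirst);
            }
          }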

          Daryn Sharp, would this be an acceptable way to distinguish the two token types? Anu Engineer in HDFS-11010 cited incompatible changes in DFSClient that have already affected backwards compatibility. Even if we adopt an approach that makes this switch redundant (e.g., separate 2.x/3.x ports), we can undo it without affecting clients.

          Thanks Owen O'Malley (who knows too much about the internals of PB)

          ehiggs Ewan Higgs added a comment -

          Attached is HDFS-11026.002.patch. This provides a configuration option, dfs.block.access.token.protobuf.enable, which makes the BlockTokenIdentifier write using WritableUtils ("Legacy") when set to false, or Protobuf when set to true.

          How to use it

          Admins can roll out the new DataNodes and NameNodes in any order. When all the servers are in place, stop the NameNode, set the configuration option to true, and restart the NameNode.

          How to roll back

          Take down the NameNode, set the configuration option back to false, and restart the NameNode.

          How it works

          When writing to the buffer, we use the value provided by the configuration to determine if legacy or protobuf is written. This sets the useProto flag in the BlockTokenIdentifier and is then used to determine how we write the buffer.

          When reading, we peek at the first byte and check whether it's less than or equal to 0, as Chris Douglas suggested [1]. If we discover that we are in a protobuf world, we set the BlockTokenIdentifier.useProto flag to true. This is required because we often have a pattern:

          BlockTokenIdentifier id = new BlockTokenIdentifier();
          id.readFields(in);
          ...
          

          Down the line, we may ask the BlockTokenSecretManager to create a password. This uses a BlockTokenIdentifier which has been built up as above and then calls getBytes. If we didn't record that we are in a protobuf world, it would create the password using the legacy WritableUtils encoding.
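          A minimal sketch of that read-side dispatch (the readFieldsLegacy/readFieldsProto names are the ones used by the later patches; the real code also has to handle streams that don't support mark/reset, and the decoders are stubbed out here):

          import java.io.DataInput;
          import java.io.DataInputStream;
          import java.io.IOException;

          // Sketch of the peek-and-branch logic; only the format detection and
          // the useProto bookkeeping are shown.
          abstract class TokenReadDispatchSketch {
            protected boolean useProto;

            public void readFields(DataInput in) throws IOException {
              DataInputStream dis = (DataInputStream) in;  // stream must support mark/reset
              dis.mark(1);
              byte firstByte = dis.readByte();
              dis.reset();
              if (firstByte <= 0) {
                readFieldsLegacy(dis);   // Writable encoding (or an empty identifier)
              } else {
                readFieldsProto(dis);    // protobuf encoding
                useProto = true;         // so a later getBytes() re-encodes as protobuf
              }
            }

            abstract void readFieldsLegacy(DataInput in) throws IOException;
            abstract void readFieldsProto(DataInput in) throws IOException;
          }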

          Tests

          Also in the patch, I've updated the TestBlockToken tests to run in both Legacy and protobuf modes.

          I also manually tested it locally using the following configurations. I couldn't get a 3.0 and 2.8 datanode/namenode to talk to each other, so I'm not sure if I'm doing something wrong or if this is a show stopper for rolling upgrades (if so, I'll enter a ticket):

          NameNode version        DataNode version        Config option   Result
          3.0.0-alpha1-SNAPSHOT   3.0.0-alpha1-SNAPSHOT   false           Works using legacy
          3.0.0-alpha1-SNAPSHOT   3.0.0-alpha1-SNAPSHOT   true            Works using protobuf
          2.8.0-SNAPSHOT          3.0.0-alpha1-SNAPSHOT   false/true      Fails with IncorrectVersionException
          3.0.0-alpha1-SNAPSHOT   2.8.0-SNAPSHOT          false/true      Fails with IncorrectVersionException

          [1] Chris Douglas actually suggested only checking that the value is negative, but it's possible to create empty BlockTokenIdentifiers and write them, which results in a 0 in the first byte. There is an explicit test for this in TestBlockToken.testWritable, so I decided to keep that behaviour.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 20s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          0 mvndep 0m 6s Maven dependency ordering for branch
          +1 mvninstall 6m 54s trunk passed
          +1 compile 1m 21s trunk passed
          +1 checkstyle 0m 39s trunk passed
          +1 mvnsite 1m 25s trunk passed
          +1 mvneclipse 0m 25s trunk passed
          +1 findbugs 3m 3s trunk passed
          +1 javadoc 0m 59s trunk passed
          0 mvndep 0m 6s Maven dependency ordering for patch
          +1 mvninstall 1m 15s the patch passed
          +1 compile 1m 19s the patch passed
          +1 cc 1m 19s the patch passed
          +1 javac 1m 19s the patch passed
          -0 checkstyle 0m 35s hadoop-hdfs-project: The patch generated 3 new + 775 unchanged - 3 fixed = 778 total (was 778)
          +1 mvnsite 1m 20s the patch passed
          +1 mvneclipse 0m 20s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 3m 13s the patch passed
          +1 javadoc 0m 54s the patch passed
          +1 unit 0m 54s hadoop-hdfs-client in the patch passed.
          -1 unit 71m 32s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 20s The patch does not generate ASF License warnings.
          98m 20s



          Reason Tests
          Failed junit tests hadoop.metrics2.sink.TestRollingFileSystemSinkWithHdfs
            hadoop.tools.TestHdfsConfigFields



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Issue HDFS-11026
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12836531/HDFS-11026.002.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle cc
          uname Linux cd79b92babf1 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / cb5cc0d
          Default Java 1.8.0_101
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/17381/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/17381/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/17381/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/17381/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          daryn Daryn Sharp added a comment -

          I couldn't get a 3.0 and 2.8 datanode/namenode to talk to each other, so I'm not sure if I'm doing something wrong or if this is a show stopper for rolling upgrades (if so, I'll enter a ticket):

          Are you saying you can't make your patch work in a compatible way? Or you can't get stock 2.8/3.0 to work together? If the former, then this patch can't go in. If the latter, then it's a show stopper.

          andrew.wang Andrew Wang added a comment -

          This is probably related to HADOOP-13142, which bumped the min DN version to 3.0.0-alpha1 in trunk.

          On second thought, this was probably a mistake, since it should be possible to support rolling upgrade to 3.0 even though it's not required by our compatibility guidelines. I'll file a new JIRA to track this.

          andrew.wang Andrew Wang added a comment -

          Filed HDFS-11096 and linked it here. There might be more to do to restore rolling upgrade functionality. Will need testing obviously.

          ehiggs Ewan Higgs added a comment -

          Are you saying you can't make your patch work in a compatible way? Or you can't get stock 2.8/3.0 to work together?

          The latter. When I try to start NN and DN across 2.8/3.0 boundaries I get the following error (NN 2.8, DN 3.0):

          2016-11-01 11:35:25,727 WARN datanode.DataNode: The reported NameNode version is too low to communicate with this DataNode. NameNode version: '2.8.0-SNAPSHOT' Minimum NameNode version: '3.0.0-alpha1-SNAPSHOT'
          2016-11-01 11:35:25,728 ERROR datanode.DataNode: Initialization failed for Block pool <registering> (Datanode Uuid unassigned) service to localhost/127.0.0.1:60020. Exiting. 
          org.apache.hadoop.hdfs.server.common.IncorrectVersionException: The reported NameNode version is too low to communicate with this DataNode. NameNode version: '2.8.0-SNAPSHOT' Minimum NameNode version: '3.0.0-alpha1-SNAPSHOT'
          	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.checkNNVersion(BPServiceActor.java:252)
          	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.retrieveNamespaceInfo(BPServiceActor.java:239)
          	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:271)
          	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:782)
          	at java.lang.Thread.run(Thread.java:745)
          

          or NN 3.0, DN 2.8:

          2016-11-01 11:30:26,767 INFO datanode.DataNode: Block pool BP-875416690-10.108.37.99-1476803009605 (Datanode Uuid 39b7e605-d1ec-470e-a4e5-39b0c0d4bd9f) service to localhost/127.0.0.1:60020 beginning handshake with NN
          2016-11-01 11:30:26,818 ERROR datanode.DataNode: Initialization failed for Block pool BP-875416690-10.108.37.99-1476803009605 (Datanode Uuid 39b7e605-d1ec-470e-a4e5-39b0c0d4bd9f) service to localhost/127.0.0.1:60020 The reported DataNode version is too low to communicate with this NameNode. DataNode version: '2.8.0-SNAPSHOT' Minimum DataNode version: '3.0.0-alpha1-SNAPSHOT'
          	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.verifySoftwareVersion(NameNodeRpcServer.java:1663)
          	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.registerDatanode(NameNodeRpcServer.java:1402)
          	at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.registerDatanode(DatanodeProtocolServerSideTranslatorPB.java:100)
          	at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:30055)
          	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:467)
          	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:990)
          	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:845)
          	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:788)
          	at java.security.AccessController.doPrivileged(Native Method)
          	at javax.security.auth.Subject.doAs(Subject.java:422)
          	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1795)
          	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2535)
          
          Thomas Demoor Thomas Demoor added a comment -

          Ewan's stack traces match Andrew Wang's remarks.
          Daryn Sharp, once HDFS-11096 gets resolved we expect the current patch to work across 2.x and 3.0.

          Thanks for looking at our patch.

          ehiggs Ewan Higgs added a comment -

          Attaching HDFS-11026.003.patch which adds the default value for dfs.block.access.token.protobuf.enable to hdfs-default.xml. This fixes one of the test failures. The other looked spurious.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 13s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          0 mvndep 0m 28s Maven dependency ordering for branch
          +1 mvninstall 7m 31s trunk passed
          +1 compile 1m 33s trunk passed
          +1 checkstyle 0m 38s trunk passed
          +1 mvnsite 1m 32s trunk passed
          +1 mvneclipse 0m 25s trunk passed
          +1 findbugs 3m 15s trunk passed
          +1 javadoc 0m 59s trunk passed
          0 mvndep 0m 7s Maven dependency ordering for patch
          +1 mvninstall 1m 20s the patch passed
          +1 compile 1m 28s the patch passed
          +1 cc 1m 28s the patch passed
          +1 javac 1m 28s the patch passed
          -0 checkstyle 0m 38s hadoop-hdfs-project: The patch generated 3 new + 775 unchanged - 3 fixed = 778 total (was 778)
          +1 mvnsite 1m 36s the patch passed
          +1 mvneclipse 0m 19s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 xml 0m 1s The patch has no ill-formed XML file.
          +1 findbugs 3m 27s the patch passed
          +1 javadoc 0m 58s the patch passed
          +1 unit 0m 57s hadoop-hdfs-client in the patch passed.
          -1 unit 59m 0s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 18s The patch does not generate ASF License warnings.
          88m 10s



          Reason Tests
          Failed junit tests hadoop.hdfs.server.datanode.checker.TestThrottledAsyncChecker
            hadoop.hdfs.server.namenode.ha.TestHAMetrics



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:e809691
          JIRA Issue HDFS-11026
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12838356/HDFS-11026.003.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle cc xml
          uname Linux a7193253d3d4 3.13.0-92-generic #139-Ubuntu SMP Tue Jun 28 20:42:26 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / ca68f9c
          Default Java 1.8.0_101
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/17506/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/17506/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/17506/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/17506/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          chris.douglas Chris Douglas added a comment -

          v003 of the patch looks like a reasonable approach, particularly given the constraints we're working with. It addresses backwards compatibility, and I'd like to commit this instead of waiting for HDFS-11096 to shake out. 2.x/3.x rolling upgrades will be a journey, and Ewan Higgs's latest patch at least gives us a mechanism to evolve BlockTokenIdentifier.

          Andrew Wang, Daryn Sharp? If there's a way to validate this, we might want to consider it for branch-2. For users upgrading their clusters to 3.x, we'll almost certainly require an intermediate upgrade to a recent 2.x release. To Daryn Sharp's earlier point, applications that started in a 2.x environment will need to cope with 3.x tokens during a rolling upgrade.

          andrew.wang Andrew Wang added a comment -

          I had a few clarifying questions about the version requirements:

          Why do we need to upgrade to a "recent 2.x release" before upgrading to 3.0? I thought the procedure was:

          • rolling upgrade your cluster from 2.whatever to 3.0
          • flip the config option to true to start issuing PB tokens

          Also just to confirm my understanding, there is no requirement for the client to upgrade to a "recent 2.x release," since the token is opaque? I ask because client JARs can be embedded in third-party apps or out-of-cluster, and thus difficult to upgrade.

          chris.douglas Chris Douglas added a comment -

          Why do we need to upgrade to a "recent 2.x release" before upgrading to 3.0?

          Sorry, that was poorly phrased. That's not a requirement, but an expectation. To take this JIRA as an example: a 2.x client could work with protobuf tokens issued by a 3.x cluster, but only if it was a recent 2.x release containing this improvement. As we discover (and create) more of these cases, rolling upgrades and interop across 2.x/3.y may establish a minimum x, in some settings.

          The suggestion was to backport this change, so 2.9 clients would understand protobuf tokens, even though 2.x servers won't issue them and 3.x servers will only issue them when configured.

          andrew.wang Andrew Wang added a comment -

          a 2.x client could work with protobuf tokens issued by a 3.x cluster, but only if it was a recent 2.x release containing this improvement.

          Sorry if this is a dumb question, but I thought tokens were opaque to clients. Why do we need to upgrade the client to be aware of PB block tokens? To be concrete, will a 2.6.0 client be able to talk to a 3.0 cluster issuing PB block tokens?

          chris.douglas Chris Douglas added a comment -

          Why do we need to upgrade the client to be aware of PB block tokens?

          They are opaque to the user, but TokenSelector, SecretManager and other client modules need to interpret some of its fields/attributes. We can't change the encoding without breaking old clients.

          To be concrete, will a 2.6.0 client be able to talk to a 3.0 cluster issuing PB block tokens?

          Not unless the 3.0 server is configured to issue PB tokens, understands it's speaking to a 2.6 client, and it sends the token in the legacy format. Servers should understand tokens written in either encoding, with this change; this is the "dual decode" support that Daryn Sharp referred to above.

          andrew.wang Andrew Wang added a comment -

          Got it, thanks for the explanation. In this case, I think we should consider backporting dual-decoding support to the 2.6 and 2.7 release lines as well. Historically, users have been more willing to upgrade to a maintenance release than a new minor release.

          Looking at the patch, it overall looks good. I'd like to see some tests though that verify both legacy and PB modes are actually generating the expected bytes on the wire when the boolean is passed. I see the read/write legacy/protobuf methods are annotated VisibleForTesting but not tested directly.

          Nits:

          • typo "deseriablize" in the readFields comment.
          • BlockTokenSecretProto could also use a comment saying "only add optional/repeated fields", since legacy tokens won't have these new fields.

          Would be good for Daryn to review too, since I'm not really an expert on this part of the code.

          ehiggs Ewan Higgs added a comment -

          Attached new patch rebased to 165f07f51a03137d2e73e39ed1cb48385d963f39. This addresses Andrew Wang's comments:

          • Fixed spelling of deserialize.
          • Added comment to remind maintainers to only add optional fields to the BlockKeyProto.
          • Added explicit tests that call readFieldsLegacy and readFieldsProto. These tests don't compare against any hard-coded bytes on the wire, but they do check that results from readFields are consistent with results from the appropriate function. They also confirm that parsing using the wrong function will fail.

          checkstyle complains about two lines being too long, but these are lines with constant declarations.
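          For illustration, the consistency part of those tests might look roughly like the sketch below (placed in the same package as TestBlockToken so the @VisibleForTesting methods are reachable; the empty identifier stands in for the populated identifiers the real tests build):

          package org.apache.hadoop.hdfs.security.token.block;

          import java.io.ByteArrayInputStream;
          import java.io.DataInputStream;
          import java.io.IOException;
          import org.junit.Test;
          import static org.junit.Assert.assertEquals;

          // Sketch of the cross-check described above: readFields on legacy
          // bytes should yield the same identifier as readFieldsLegacy.
          public class TestBlockTokenFormatSketch {
            private static DataInputStream stream(byte[] bytes) {
              return new DataInputStream(new ByteArrayInputStream(bytes));
            }

            @Test
            public void readFieldsMatchesLegacyDecoder() throws IOException {
              byte[] legacyBytes = new BlockTokenIdentifier().getBytes();  // legacy by default

              BlockTokenIdentifier viaReadFields = new BlockTokenIdentifier();
              viaReadFields.readFields(stream(legacyBytes));

              BlockTokenIdentifier viaLegacy = new BlockTokenIdentifier();
              viaLegacy.readFieldsLegacy(stream(legacyBytes));

              assertEquals(viaLegacy, viaReadFields);
            }
          }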

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 27s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          0 mvndep 0m 7s Maven dependency ordering for branch
          +1 mvninstall 13m 0s trunk passed
          +1 compile 1m 25s trunk passed
          +1 checkstyle 0m 40s trunk passed
          +1 mvnsite 1m 27s trunk passed
          +1 mvneclipse 0m 25s trunk passed
          +1 findbugs 3m 15s trunk passed
          +1 javadoc 1m 1s trunk passed
          0 mvndep 0m 7s Maven dependency ordering for patch
          +1 mvninstall 1m 21s the patch passed
          +1 compile 1m 23s the patch passed
          +1 cc 1m 23s the patch passed
          +1 javac 1m 23s the patch passed
          -0 checkstyle 0m 36s hadoop-hdfs-project: The patch generated 3 new + 772 unchanged - 3 fixed = 775 total (was 775)
          +1 mvnsite 1m 26s the patch passed
          +1 mvneclipse 0m 21s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 xml 0m 2s The patch has no ill-formed XML file.
          +1 findbugs 3m 31s the patch passed
          +1 javadoc 0m 53s the patch passed
          +1 unit 0m 55s hadoop-hdfs-client in the patch passed.
          -1 unit 103m 4s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 19s The patch does not generate ASF License warnings.
          137m 7s



          Reason Tests
          Failed junit tests hadoop.hdfs.security.token.block.TestBlockToken



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:a9ad5d6
          JIRA Issue HDFS-11026
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12849952/HDFS-11026.004.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle cc xml
          uname Linux d5c8e3c57226 3.13.0-96-generic #143-Ubuntu SMP Mon Aug 29 20:15:20 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 312b36d
          Default Java 1.8.0_121
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/18291/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/18291/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/18291/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/18291/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          andrew.wang Andrew Wang added a comment -

          LGTM, though I'd like a review from someone more expert in this area (Daryn Sharp?) before we commit this.

          daryn Daryn Sharp added a comment -

          Added to my to-do.

          ehiggs Ewan Higgs added a comment -

          Jenkins revealed an issue in the new tests I added: when attempting to parse a protobuf as though it's a legacy block token, OpenJDK will raise an IOException while Oracle JDK throws a RuntimeException.

          I've updated the patch to catch both in the appropriate test.
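          For illustration, a minimal sketch of that guard, written as a test-body fragment. This is not the exact test code from the patch; `protobufBytes` is a hypothetical variable standing for a protobuf-serialized token built earlier in the test, and `fail` is JUnit's Assert.fail.

          // protobufBytes: hypothetical, a BlockTokenIdentifier serialized in the new
          // protobuf format, built earlier in the test.
          DataInputBuffer in = new DataInputBuffer();
          in.reset(protobufBytes, protobufBytes.length);
          BlockTokenIdentifier legacyReader = new BlockTokenIdentifier();
          try {
            legacyReader.readFieldsLegacy(in);
            fail("The legacy parser should not accept protobuf-serialized bytes");
          } catch (IOException | RuntimeException expected) {
            // Both failure modes have been observed for this misparse; see the stack
            // traces further down the thread.
          }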

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 22s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          0 mvndep 0m 25s Maven dependency ordering for branch
          +1 mvninstall 14m 44s trunk passed
          +1 compile 1m 53s trunk passed
          +1 checkstyle 0m 52s trunk passed
          +1 mvnsite 1m 57s trunk passed
          +1 mvneclipse 0m 36s trunk passed
          +1 findbugs 3m 42s trunk passed
          +1 javadoc 1m 6s trunk passed
          0 mvndep 0m 7s Maven dependency ordering for patch
          +1 mvninstall 1m 27s the patch passed
          +1 compile 1m 47s the patch passed
          +1 cc 1m 47s the patch passed
          +1 javac 1m 47s the patch passed
          -0 checkstyle 0m 41s hadoop-hdfs-project: The patch generated 3 new + 772 unchanged - 3 fixed = 775 total (was 775)
          +1 mvnsite 1m 37s the patch passed
          +1 mvneclipse 0m 23s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 xml 0m 1s The patch has no ill-formed XML file.
          +1 findbugs 3m 48s the patch passed
          +1 javadoc 1m 4s the patch passed
          +1 unit 1m 5s hadoop-hdfs-client in the patch passed.
          -1 unit 112m 42s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 23s The patch does not generate ASF License warnings.
          152m 28s



          Reason Tests
          Failed junit tests hadoop.hdfs.server.namenode.ha.TestFailureToReadEdits
            hadoop.hdfs.server.namenode.ha.TestHAAppend
            hadoop.metrics2.sink.TestRollingFileSystemSinkWithHdfs
          Timed out junit tests org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting
            org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:a9ad5d6
          JIRA Issue HDFS-11026
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12850387/HDFS-11026.005.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle cc xml
          uname Linux 5a04d205de70 3.13.0-96-generic #143-Ubuntu SMP Mon Aug 29 20:15:20 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / bec9b7a
          Default Java 1.8.0_121
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/18305/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/18305/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/18305/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/18305/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          daryn Daryn Sharp added a comment -

          Taking a look today. sorry for the wait.

          daryn Daryn Sharp added a comment -

          The first byte trick is clever. I'd prefer an equality check for a definitive magic byte that can't occur, or represents something so large it can't/won't occur. Perhaps something like -1 – I haven't checked if that's impossible for the varint.

          Out of curiosity, why do different JDKs throw an IOException or a RuntimeException during incorrect decoding? I'd expect a deterministic exception.

          ehiggs Ewan Higgs added a comment -

          Thanks for taking a look, Daryn Sharp.

          The first byte trick is clever. I'd prefer an equality check for a definitive magic byte that can't occur, or represents something so large it can't/won't occur. Perhaps something like -1 – I haven't checked if that's impossible for the varint.

          Do you want to inject a magic byte or do you want to detect one? I guess if we're going to inject a magic value then we can choose any positive byte. It seems to me that it would just complicate the scheme, though it would be more explicit. Maybe Chris Douglas can comment, as he devised the scheme with Owen O'Malley's advice.

          Out of curiosity, why do different JDKs throw an IOException or a RuntimeException during incorrect decoding? I'd expect a deterministic exception.

          Oracle Java gives the following exception (after putting in an e.printStackTrace()):

          Received exception reading protobuf message: java.io.IOException: value too long to fit in integer
          java.io.IOException: value too long to fit in integer
                  at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:331)
                  at org.apache.hadoop.hdfs.security.token.block.BlockTokenIdentifier.readFieldsLegacy(BlockTokenIdentifier.java:194)
                  at org.apache.hadoop.hdfs.security.token.block.TestBlockToken.testProtobufBlockTokenBytesIsProtobuf(TestBlockToken.java:544)
                  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
                  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
                  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
                  at java.lang.reflect.Method.invoke(Method.java:498)
                  at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
                  at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
                  at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
                  at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
                  at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
                  at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
                  at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
                  at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
                  at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
                  at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
                  at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
                  at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
                  at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
                  at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
                  at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
                  at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
                  at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
                  at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
                  at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
                  at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
          

          OpenJDK gives:

          Received exception reading protobuf message: java.lang.NegativeArraySizeException
          java.lang.NegativeArraySizeException
                  at org.apache.hadoop.io.WritableUtils.readString(WritableUtils.java:124)
                  at org.apache.hadoop.hdfs.security.token.block.BlockTokenIdentifier.readFieldsLegacy(BlockTokenIdentifier.java:195)
                  at org.apache.hadoop.hdfs.security.token.block.TestBlockToken.testProtobufBlockTokenBytesIsProtobuf(TestBlockToken.java:544)
                  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
                  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
                  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
                  at java.lang.reflect.Method.invoke(Method.java:498)
                  at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
                  at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
                  at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
                  at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
                  at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
                  at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
                  at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
                  at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
                  at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
                  at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
                  at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
                  at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
                  at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
                  at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
                  at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
                  at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
                  at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
                  at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
                  at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
                  at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
          
          chris.douglas Chris Douglas added a comment -

          Ewan Higgs, a comment in the .proto explaining the constraint on field numbers would also be a good idea.

          I'd prefer an equality check for a definitive magic byte that can't occur, or represents something so large it can't/won't occur. Perhaps something like -1 – I haven't checked if that's impossible for the varint.

          The current patch relies on the record including at least one field with an ID less than 16 (required to be written sequentially, ordered by field ID). We could require the first field, and match on its ID and type explicitly, rather than allowing it to float.

          An explicit check could be removed as part of deprecating the field, but... this is becoming harder to maintain, for a pretty marginal gain over the guarantees we get by exploiting the gap between encodings. If we moved to an equality check, then we can't deprecate the first field while we support backwards compatibility with 3.x. That's true even if we add the magic bytes as a constant, optional PB field. I agree, that's morally equivalent to not deprecating 15 fields in a way, but we're more likely to get the latter correct accidentally, after this JIRA is forgotten.

          There's one more level of defensible paranoia: add code to writeLegacy that refuses to write an expiryTime that would confuse the switch, and a unit test affirming the same. That should at least cause noisy failures in the server attempting to issue confusing tokens, but if we change the semantics of expiryTime then we've already broken backwards compatibility with 2.x. Though someone may read this later and curse me, checking this for each block token seems like unnecessary overhead.

          Thoughts? We're down to debating the merits of approaches that are unlikely to be distinguished in practice, so with all the above being said, I'd be fine with any of the three approaches in this order:

          1. (current patch) Rely on the gap between encodings (expiryTime > 128, PB field number | type < 16); a sketch of this check follows below
          2. Fix the first field and match its number and type exactly
          3. Define a new PB field, magic_number that has a constant encoding and match on it (adding a version field for a PB record, really)
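          For reference, a minimal sketch of the first-byte check in approach 1. readFieldsLegacy is the method visible in the stack traces above; readFieldsProtobuf is assumed as its protobuf counterpart, and the mark/reset handling here is an assumption rather than the patch verbatim.

          @Override
          public void readFields(DataInput in) throws IOException {
            this.cache = null;
            final DataInputStream dis = (DataInputStream) in;  // assumes a mark-supporting stream
            dis.mark(1);
            final byte firstByte = dis.readByte();
            dis.reset();
            // A legacy token starts with a WritableUtils vlong expiryDate; any realistic
            // expiry encodes with a negative leading byte. A protobuf record whose first
            // field number is below 16 starts with a small positive tag byte instead.
            if (firstByte <= 0) {
              readFieldsLegacy(dis);
            } else {
              readFieldsProtobuf(dis);
            }
          }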
          ehiggs Ewan Higgs added a comment -

          Ewan Higgs, a comment in the .proto explaining the constraint on field numbers would also be a good idea.

          Will do. I'll wait until the rest of the discussion is settled before submitting an updated patch.

          Thoughts?

          I favour the current patch for the reasons you provided. If there has to be a magic byte, I think implementing it as a version prefix in the protobuf makes the most sense since it can reuse the existing parsing machinery. So my preferences are 1, 3, 2.

          daryn Daryn Sharp added a comment -

          I don't feel super strongly. What I meant – and maybe this is #2 from Chris – is that I'd suggest reading the varlong expiry as we do today. If the long (I know, I said byte) is a magic number like min or max long, decode rest as PB which will include the real expiry, else continue decoding as writable. E.g. something like this:

          @Override
          public void readFields(DataInput in) throws IOException {
            this.cache = null;
            expiryDate = WritableUtils.readVLong(in);
            if (expiryDate == MAGIC_PB_VALUE) {
              readFieldsProtobuf(in);
              return;
            }
            keyId = WritableUtils.readVInt(in);
            // ... continue decoding the remaining fields as a legacy Writable
          }


          Thanks for the stack traces. Something looks concerning... The Oracle JDK blows up in WritableUtils.readVInt, which decodes a long and throws if it's outside the range of an int. OpenJDK somehow made it past that check and blew up in WritableUtils.readString while reading the next field. Huh? How is that possible? Do we have a byte ordering issue?

          chris.douglas Chris Douglas added a comment -

          If the long (I know, I said byte) is a magic number like min or max long, decode rest as PB which will include the real expiry, else continue decoding as writable.

          In #2, the magic number matches the field number << 3 | type from PB (0x08 in the current patch, IIRC). Using the WritableUtils parsing machinery is a nice touch, though we still need the marker byte at the head of the stream when we call protobuf. We're still prevented from removing the expiryTime field while we support 3.x. I suppose it can evolve to #3, where we include the expiryTime field to support 3.x clients. Implementors not targeting 2.x get a normal PB record.

          What you describe is closer to #3, where the first byte is an optional PB field that we strip away before handing the record to protobuf. The protobuf parser doesn't know or care if we strip the first, optional field. It'll be vestigial in future implementations that don't support backwards compatibility with 2.x. Parsing is simplified because we don't need mark/reset (or similar), but we add a (trivial) overhead to each record and a weird compatibility case. Also the indignity of adding a version field to a DDL whose purpose was to manage versioning.

          Again, we're splitting hairs, here. It's very unlikely we'll remove/replace expiryTime, or notice the overhead of an explicit version field. I'm +1 on the current patch (with a note in the .proto to help future maintainers), but would be fine with either variant.
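          For anyone double-checking the 0x08 figure above, the tag arithmetic is easy to verify with a tiny standalone snippet; nothing here is taken from the patch, it is plain Java.

          public class TagByteMath {
            public static void main(String[] args) {
              int fieldNumber = 1;   // expiryDate is the first field in the current patch
              int wireType = 0;      // varint
              int tag = (fieldNumber << 3) | wireType;
              System.out.printf("tag byte = 0x%02x%n", tag);  // prints 0x08
              // Field numbers 1..15 keep the tag between 0x08 and 0x7f, a positive signed
              // byte; field 16 would already give 0x80, which is negative when read as a
              // signed byte and would be misrouted to the legacy parser.
            }
          }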

          ehiggs Ewan Higgs added a comment -

          Chris Douglas, I think Daryn Sharp was sketching out a version of #2 as he didn't peek the value; he actually consumed it. Solution #3 would be something like making the protobuf look like:

          enum BlockTokenSecretIndicationByte { 
              MAGIC_VALUE = 1;
          }
          message BlockTokenSecretProto {
            required BlockTokenSecretIndicationByte magic = 1 [default=MAGIC_VALUE];
            optional uint64 expiryDate = 2;
            optional uint32 keyId = 3;
            optional string userId = 4;
            optional string blockPoolId = 5;
            optional uint64 blockId = 6;
            repeated AccessModeProto modes = 7;
          }
          

          Then detecting MAGIC_VALUE would involve peeking at the first byte as I currently do.

          I propose we move forward with the current approach. I will add an updated patch with the documentation added to the protobuf message:

          /**
           * Secret information for the BlockKeyProto. This is not sent on the wire as
           * such but is used to pack a byte array that is encrypted and put in
           * BlockKeyProto.bytes.
           * When adding further fields, make sure they are optional as they would
           * otherwise not be backwards compatible.
           *
           * Note: As part of the migration from WritableUtils based tokens (aka "legacy")
           * to Protocol Buffers, we use the first byte to determine the type. If the
           * first byte is <=0 then it is a legacy token. This means that when using
           * protobuf tokens, the first field sent must have a `field_number` less
           * than 16 to make sure that the first byte is positive. Otherwise it could be
           * parsed as a legacy token. See HDFS-11026 for more discussion.
           */
          

          I also have two more unit tests which check empty messages (e.g. when expiryDate isn't explicitly set).

          ehiggs Ewan Higgs added a comment -

          Attached a new patch which adds the documentation requested by Chris Douglas and added two more tests for empty BlockTokenIdentifiers.

          daryn Daryn Sharp added a comment -

          In the end what I really care about is compatibility during rolling upgrade (RU), which appears to have been achieved. I won't split hairs over a few bytes, so +0 on impl details.

          The non-deterministic decoding exception needs to be resolved before integration.

          JDKs should not be decoding different values from the (supposedly) identically encoded bytes! One thinks a long is outside the range of an int, the other thinks it's in range. This smells like a byte ordering issue, which would be very bad for heterogeneous JDK environments.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 21s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          0 mvndep 0m 9s Maven dependency ordering for branch
          +1 mvninstall 12m 29s trunk passed
          +1 compile 1m 23s trunk passed
          +1 checkstyle 0m 39s trunk passed
          +1 mvnsite 1m 23s trunk passed
          +1 mvneclipse 0m 26s trunk passed
          +1 findbugs 3m 8s trunk passed
          +1 javadoc 1m 0s trunk passed
          0 mvndep 0m 7s Maven dependency ordering for patch
          +1 mvninstall 1m 15s the patch passed
          +1 compile 1m 19s the patch passed
          +1 cc 1m 19s the patch passed
          +1 javac 1m 19s the patch passed
          -0 checkstyle 0m 37s hadoop-hdfs-project: The patch generated 3 new + 771 unchanged - 3 fixed = 774 total (was 774)
          +1 mvnsite 1m 19s the patch passed
          +1 mvneclipse 0m 21s the patch passed
          -1 whitespace 0m 0s The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
          +1 xml 0m 1s The patch has no ill-formed XML file.
          +1 findbugs 3m 22s the patch passed
          +1 javadoc 0m 54s the patch passed
          +1 unit 0m 54s hadoop-hdfs-client in the patch passed.
          -1 unit 81m 39s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 20s The patch does not generate ASF License warnings.
          114m 33s



          Reason Tests
          Failed junit tests hadoop.hdfs.server.datanode.TestFsDatasetCache
            hadoop.hdfs.security.TestDelegationTokenForProxyUser
            hadoop.hdfs.TestDFSStripedOutputStreamWithFailure150
            hadoop.hdfs.server.namenode.ha.TestHAAppend



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:a9ad5d6
          JIRA Issue HDFS-11026
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12852066/HDFS-11026.006.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle cc xml
          uname Linux 11f4c43ce70b 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 09:57:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 3a0a0a4
          Default Java 1.8.0_121
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/18347/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project.txt
          whitespace https://builds.apache.org/job/PreCommit-HDFS-Build/18347/artifact/patchprocess/whitespace-eol.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/18347/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/18347/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/18347/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          chris.douglas Chris Douglas added a comment -

          The non-deterministic decoding exception needs to be resolved before integration.

          I'm having trouble reproducing it. Ewan Higgs, what version of the Oracle JDK are you using? I tried Oracle 1.8.0_92, 1.8.0_121 and openjdk 1.8.0_121. All throw NegativeArraySizeException.

          chris.douglas Chris Douglas added a comment -

          I think Daryn Sharp was sketching out a version of #2 as he didn't peek the value; he actually consumed it. Solution #3 would be something like making the protobuf look like:

          Maybe we can take this offline, but I think #2 can't consume the first byte; protobuf needs it. #3 can consume the first byte [and the rest of the field]. The field is optional in PB, so it doesn't matter if a 3.x client doesn't see it (stripped in readFields) and a later version ignores it.

          ehiggs Ewan Higgs added a comment -

          The non-deterministic decoding exception needs to be resolved before integration.

          I'm having trouble reproducing it.

          Upon further investigation I found that it was a timing issue rather than a difference between Oracle and Open JDK.

          Because the expiryDate is first and the tests use Time.now() in generating the tokens, sometimes the expiryDate written by the protobuf happens to be parseable by the readFieldsLegacy function. Of course this is only an issue if we have a protobuf BlockTokenIdentifier and then we force it to be read by the old parser (readFieldsLegacy) - which we do not do.

          An example expiryDate for which the legacy parser throws an IOException is 2017-02-09 00:12:35,072+0100; in contrast, 2017-02-09 00:12:35,071+0100 raises a java.lang.NegativeArraySizeException.

          I'm adding a new test that demonstrates the different exceptions with crafted timestamps.
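          To make the mechanics concrete, here is a small self-contained illustration; it is illustrative only, not code from the patch or its tests.

          import java.io.ByteArrayInputStream;
          import java.io.DataInputStream;
          import org.apache.hadoop.io.WritableUtils;

          public class LegacyOverProtobufDemo {
            public static void main(String[] args) throws Exception {
              // A protobuf-serialized token starts with the tag 0x08 (field 1, varint),
              // followed by expiryDate as a base-128 varint, least-significant group first.
              byte[] tag = { 0x08 };
              DataInputStream in = new DataInputStream(new ByteArrayInputStream(tag));
              // The legacy reader's first call happily consumes that tag as the vlong 8...
              System.out.println(WritableUtils.readVLong(in));  // prints 8
              // ...so the misparse only trips on the subsequent reads (keyId, then userId),
              // and which of those fails depends on the low-order bytes of the
              // varint-encoded timestamp, hence the millisecond-sensitive difference
              // between the IOException and the NegativeArraySizeException above.
            }
          }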

          ehiggs Ewan Higgs added a comment -

          Attaching the aforementioned patch.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 15s Docker mode activated.
          +1 @author 0m 1s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          0 mvndep 0m 7s Maven dependency ordering for branch
          +1 mvninstall 12m 36s trunk passed
          +1 compile 1m 22s trunk passed
          +1 checkstyle 0m 41s trunk passed
          +1 mvnsite 1m 22s trunk passed
          +1 mvneclipse 0m 26s trunk passed
          +1 findbugs 3m 8s trunk passed
          +1 javadoc 1m 1s trunk passed
          0 mvndep 0m 7s Maven dependency ordering for patch
          +1 mvninstall 1m 16s the patch passed
          +1 compile 1m 19s the patch passed
          +1 cc 1m 19s the patch passed
          +1 javac 1m 19s the patch passed
          -0 checkstyle 0m 37s hadoop-hdfs-project: The patch generated 3 new + 771 unchanged - 3 fixed = 774 total (was 774)
          +1 mvnsite 1m 17s the patch passed
          +1 mvneclipse 0m 20s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 xml 0m 1s The patch has no ill-formed XML file.
          +1 findbugs 3m 19s the patch passed
          +1 javadoc 0m 54s the patch passed
          +1 unit 0m 54s hadoop-hdfs-client in the patch passed.
          -1 unit 88m 35s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 19s The patch does not generate ASF License warnings.
          121m 21s



          Reason Tests
          Failed junit tests hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency
            hadoop.hdfs.TestRollingUpgrade
            hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:a9ad5d6
          JIRA Issue HDFS-11026
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12852374/HDFS-11026.007.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle cc xml
          uname Linux a7b7afd6ebce 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 09:57:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 464ff47
          Default Java 1.8.0_121
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/18358/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/18358/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/18358/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/18358/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          chris.douglas Chris Douglas added a comment -

          +1 Thanks for the detail, Ewan. I committed this.

          This isn't marked as an incompatible change because this isn't a public API and it's compatible, but libraries implementing the HDFS client protocol will need to be updated. Added that to the release note.

          andrew.wang Andrew Wang added a comment -

          Really happy to see this in, thanks Ewan, Chris, and Daryn for getting it over the finish line!

          hudson Hudson added a comment -

          ABORTED: Integrated in Jenkins build Hadoop-trunk-Commit #11240 (See https://builds.apache.org/job/Hadoop-trunk-Commit/11240/)
          HDFS-11026. Convert BlockTokenIdentifier to use Protobuf. Contributed by (cdouglas: rev 4ed33e9ca3d85568e3904753a3ef61a85f801838)

          • (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
          • (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/KeyManager.java
          • (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
          • (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
          • (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
          • (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelperClient.java
          • (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/security/token/block/BlockTokenIdentifier.java
          • (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/security/token/block/BlockTokenSecretManager.java
          • (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/proto/hdfs.proto
          • (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/security/token/block/TestBlockToken.java

            People

            • Assignee:
              ehiggs Ewan Higgs
            • Reporter:
              ehiggs Ewan Higgs
            • Votes:
              0
            • Watchers:
              14