Hadoop YARN / YARN-8310

Handle old NMTokenIdentifier, AMRMTokenIdentifier, and ContainerTokenIdentifier formats

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.6.0
    • Fix Version/s: 2.10.0, 3.2.0, 3.1.1, 3.0.3
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      In some recent upgrade testing, we saw the following error, which caused the NodeManager to fail to start up after the upgrade:

      org.apache.hadoop.service.ServiceStateException: com.google.protobuf.InvalidProtocolBufferException: Protocol message contained an invalid tag (zero).
      	at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
      	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:173)
      	at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
      	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:441)
      	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
      	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:834)
      	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:895)
      Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol message contained an invalid tag (zero).
      	at com.google.protobuf.InvalidProtocolBufferException.invalidTag(InvalidProtocolBufferException.java:89)
      	at com.google.protobuf.CodedInputStream.readTag(CodedInputStream.java:108)
      	at org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto.<init>(YarnSecurityTokenProtos.java:1860)
      	at org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto.<init>(YarnSecurityTokenProtos.java:1824)
      	at org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto$1.parsePartialFrom(YarnSecurityTokenProtos.java:2016)
      	at org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto$1.parsePartialFrom(YarnSecurityTokenProtos.java:2011)
      	at com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:200)
      	at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:217)
      	at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:223)
      	at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
      	at org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto.parseFrom(YarnSecurityTokenProtos.java:2686)
      	at org.apache.hadoop.yarn.security.ContainerTokenIdentifier.readFields(ContainerTokenIdentifier.java:254)
      	at org.apache.hadoop.security.token.Token.decodeIdentifier(Token.java:177)
      	at org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerTokenIdentifier(BuilderUtils.java:322)
      	at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recoverContainer(ContainerManagerImpl.java:455)
      	at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:373)
      	at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:316)
      	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
      	... 5 more
      

      The NodeManager fails because it is trying to read a ContainerTokenIdentifier stored in the "old" format, the one used before the token identifiers were changed to protobufs (YARN-668). This is very similar to YARN-5594, where we hit the same kind of problem with the ResourceManager and RM Delegation Tokens.

      To make upgrades smoother, the code should fall back to reading the old format when it cannot read a token using the new format. We didn't run into any errors with the other two token types that YARN-668 incompatibly changed (NMTokenIdentifier and AMRMTokenIdentifier), but we may as well fix those while we're at it.
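
The fallback described above could look roughly like the sketch below. This is an illustration only, not the attached patch: `Identifier`, `NEW_FORMAT_TAG`, `encodeNew`, `encodeLegacy`, `parseNew`, and `parseLegacy` are hypothetical stand-ins, and both byte layouts are invented for the example. In the real classes the first attempt would go through the generated protobuf parser, whose `InvalidProtocolBufferException` is an `IOException`, so the same catch-and-retry shape would apply.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class TokenFormatFallback {

    // Hypothetical marker byte that starts every "new format" payload in this sketch.
    static final int NEW_FORMAT_TAG = 0x0A;

    /** Stand-in for a token identifier; the real classes carry many more fields. */
    static final class Identifier {
        final String owner;
        final long expiry;
        final boolean fromLegacy;

        Identifier(String owner, long expiry, boolean fromLegacy) {
            this.owner = owner;
            this.expiry = expiry;
            this.fromLegacy = fromLegacy;
        }
    }

    /** Assumed new layout: marker byte, then writeUTF(owner), writeLong(expiry). */
    static byte[] encodeNew(String owner, long expiry) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeByte(NEW_FORMAT_TAG);
            out.writeUTF(owner);
            out.writeLong(expiry);
        }
        return buf.toByteArray();
    }

    /** Assumed legacy layout: writeUTF(owner), writeLong(expiry), no marker. */
    static byte[] encodeLegacy(String owner, long expiry) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(owner);
            out.writeLong(expiry);
        }
        return buf.toByteArray();
    }

    static Identifier parseNew(byte[] raw) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw));
        int tag = in.readUnsignedByte();
        if (tag != NEW_FORMAT_TAG) {
            throw new IOException("invalid tag (" + tag + ")");
        }
        return new Identifier(in.readUTF(), in.readLong(), false);
    }

    static Identifier parseLegacy(byte[] raw) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw));
        return new Identifier(in.readUTF(), in.readLong(), true);
    }

    /** Try the new format first; on failure, re-read the same bytes as legacy. */
    static Identifier read(byte[] raw) throws IOException {
        try {
            return parseNew(raw);
        } catch (IOException newFormatError) {
            try {
                // Restart from the first byte: the failed attempt consumed input.
                return parseLegacy(raw);
            } catch (IOException legacyError) {
                // Neither format matched; report the original failure.
                newFormatError.addSuppressed(legacyError);
                throw newFormatError;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Identifier fresh = read(encodeNew("alice", 42L));
        Identifier old = read(encodeLegacy("alice", 42L));
        System.out.println("new payload used fallback: " + fresh.fromLegacy);
        System.out.println("old payload used fallback: " + old.fromLegacy);
    }
}
```

Two points in the sketch matter for any real implementation: the legacy parser must start again from the beginning of the buffer rather than continue on a partially consumed stream, and when both parsers fail the error from the new format is the one surfaced, with the legacy failure attached as a suppressed exception.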

      Attachments

        1. YARN-8310.003.patch
          21 kB
          Robert Kanter
        2. YARN-8310.branch-2.003.patch
          21 kB
          Robert Kanter
        3. YARN-8310.002.patch
          19 kB
          Robert Kanter
        4. YARN-8310.branch-2.002.patch
          19 kB
          Robert Kanter
        5. YARN-8310.001.patch
          18 kB
          Robert Kanter
        6. YARN-8310.branch-2.001.patch
          17 kB
          Robert Kanter

          People

            Assignee: Robert Kanter (rkanter)
            Reporter: Robert Kanter (rkanter)
            Votes: 0
            Watchers: 5
