Hadoop HDFS / HDFS-15566

NN restart fails after RollingUpgrade from 3.1.3/3.2.1 to 3.3.0


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • 3.3.1, 3.4.0
    • 3.3.1, 3.4.0
    • Component/s: namenode
    • None

    Description

      • After a rolling upgrade of the NN from 3.1.3/3.2.1 to 3.3.0, restarting the NN fails while it replays the edit logs.
      • HDFS-14922, HDFS-14924, and HDFS-15054 added modification time bits to the editLog transactions.
      • When the NN is restarted and the edit logs are replayed, the NN reads the old layout version from the editLog file. It then assumes that the transactions in that file also use the previous layout, and so skips parsing the modification time bits.
      • This cascades into reading the wrong set of bits for the other fields and leads to the NN shutting down:
      2020-09-07 19:34:42,085 | DEBUG | main | Stopping client | Client.java:1361
      2020-09-07 19:34:42,087 | ERROR | main | Failed to start namenode. | NameNode.java:1751
      java.lang.IllegalArgumentException
       at com.google.common.base.Preconditions.checkArgument(Preconditions.java:72)
       at org.apache.hadoop.ipc.ClientId.toString(ClientId.java:56)
       at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.appendRpcIdsToString(FSEditLogOp.java:318)
       at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.access$700(FSEditLogOp.java:153)
       at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$DeleteSnapshotOp.toString(FSEditLogOp.java:3606)
       at java.lang.String.valueOf(String.java:2994)
       at java.lang.StringBuilder.append(StringBuilder.java:131)
       at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:305)
       at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:188)
       at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:932)
       at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:779)
       at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:337)
       at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1136)
       at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:742)
       at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:654)
       at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:716)
       at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:959)
       at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:932)
       at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1674)
       at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1744)
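
      The misalignment described above can be illustrated with a small, self-contained sketch. This is not the Hadoop code: the class, the record format (an optional 8-byte modification time, then a 16-byte client id, then a 4-byte call id) and the layout version constants are all hypothetical, chosen only to show how gating a field on the file's layout version shifts every later field when the writer actually included it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

// Toy model of a layout-version-gated record; all names and values are hypothetical.
class LayoutMismatchSketch {
    // Illustrative constants only. Layout versions are modeled as negative
    // numbers that decrease as features are added.
    static final int OLD_LAYOUT = -1;
    static final int NEW_LAYOUT = -2;

    static final class Decoded {
        long mtime;
        byte[] clientId;
        int callId;
    }

    // Writes [mtime, if requested][16-byte clientId][callId].
    static byte[] write(boolean withMtime, long mtime, byte[] clientId, int callId)
            throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            if (withMtime) {
                out.writeLong(mtime);
            }
            out.write(clientId);
            out.writeInt(callId);
        }
        return bytes.toByteArray();
    }

    // Reads a record, parsing mtime only if the layout version says it is present.
    static Decoded read(byte[] record, int layoutVersion) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(record))) {
            Decoded d = new Decoded();
            if (layoutVersion <= NEW_LAYOUT) {
                d.mtime = in.readLong();
            }
            d.clientId = new byte[16];
            in.readFully(d.clientId);
            d.callId = in.readInt();
            return d;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] clientId = new byte[16];
        Arrays.fill(clientId, (byte) 7);

        // The upgraded writer includes mtime in the record.
        byte[] record = write(true, 1599500000000L, clientId, 42);

        // Reader trusts the correct layout: fields line up.
        Decoded good = read(record, NEW_LAYOUT);
        System.out.println(Arrays.equals(clientId, good.clientId)); // true

        // Reader trusts the old layout from the file header: mtime bytes
        // are consumed as the start of clientId, and callId is garbage.
        Decoded bad = read(record, OLD_LAYOUT);
        System.out.println(Arrays.equals(clientId, bad.clientId)); // false
    }
}
```

      In this toy format the misread goes unnoticed until the values are used. In the trace above, the failure surfaces when the misparsed op is converted to a string and the client id check in ClientId.toString rejects it.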

      Attachments

        1. HDFS-15566-001.patch
          3 kB
          Brahma Reddy Battula
        2. HDFS-15566-002.patch
          5 kB
          Brahma Reddy Battula
        3. HDFS-15566-003.patch
          7 kB
          Brahma Reddy Battula


          People

            Assignee: Brahma Reddy Battula (brahmareddy)
            Reporter: Brahma Reddy Battula (brahmareddy)
            Votes: 0
            Watchers: 15

