Hadoop HDFS / HDFS-2090

BackupNode fails when log is streamed due to checksum error

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.23.0
    • Fix Version/s: None
    • Component/s: namenode
    • Labels: None

    Description

      Reproduction steps:

      1) An HDFS cluster is up and running.
      2) A BackupNode is up, running, and registered with the NameNode.
      3) Perform a write operation, such as copying a file to the file system.

      Expected result: No exception is thrown.
      Actual result: An exception is thrown due to a checksum error in the streamed log:

      log

      11/06/15 17:52:22 INFO ipc.Server: IPC Server handler 1 on 50100, call journal(NamenodeRegistration(localhost:8020, role=NameNode), 101, 164, [B@3951f910), rpc version=1, client version=5, methodsFingerPrint=302283637 from 192.168.1.102:56780: error: java.io.IOException: Error replaying edit log at offset 13
      Recent opcode offsets: 1
      java.io.IOException: Error replaying edit log at offset 13
      Recent opcode offsets: 1
      at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:514)
      at org.apache.hadoop.hdfs.server.namenode.BackupImage.journal(BackupImage.java:242)
      at org.apache.hadoop.hdfs.server.namenode.BackupNode.journal(BackupNode.java:251)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at org.apache.hadoop.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:422)
      at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1496)
      at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1492)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:396)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1131)
      at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1490)
      Caused by: org.apache.hadoop.fs.ChecksumException: Transaction 1 is corrupt. Calculated checksum is -2116249809 but read checksum 0
      at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.validateChecksum(FSEditLogLoader.java:546)
      at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:490)
      ... 13 more
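
The `Caused by` line reports a calculated checksum of -2116249809 against a read checksum of 0. That would be consistent with the checksum field never having been filled in on the streamed record, although nothing in this report confirms the cause. The sketch below only illustrates that failure mode and is not the Hadoop code: it assumes a record laid out as a payload followed by a 4-byte CRC32 (computed with `java.util.zip.CRC32`), and the class `EditChecksumSketch` and its methods are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.zip.CRC32;

// Hypothetical illustration; not part of Hadoop.
class EditChecksumSketch {

    // CRC32 of the payload, narrowed to a signed 32-bit int.
    // Narrowing is why a checksum can print as a negative number.
    static int checksumOf(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        return (int) crc.getValue();
    }

    // Assumed layout: payload bytes followed by a 4-byte checksum.
    // With writeChecksum == false the trailing field is left as 0,
    // mimicking a sender that never filled it in.
    static byte[] frame(byte[] payload, boolean writeChecksum) {
        ByteBuffer buf = ByteBuffer.allocate(payload.length + 4);
        buf.put(payload);
        buf.putInt(writeChecksum ? checksumOf(payload) : 0);
        return buf.array();
    }

    // Recompute the checksum over the payload and compare it with
    // the stored trailing value.
    static boolean isValid(byte[] record) {
        int payloadLen = record.length - 4;
        byte[] payload = Arrays.copyOf(record, payloadLen);
        int stored = ByteBuffer.wrap(record, payloadLen, 4).getInt();
        return checksumOf(payload) == stored;
    }

    public static void main(String[] args) {
        byte[] payload = {1, 2, 3, 4};
        System.out.println("with checksum:    " + isValid(frame(payload, true)));
        System.out.println("checksum left 0:  " + isValid(frame(payload, false)));
    }
}
```

If the real cause is along these lines, comparing the bytes the NameNode passes to `journal(...)` with what it writes to its local edit log would show whether the checksum is present in one and missing in the other.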

          People

            Assignee: Unassigned
            Reporter: André Oriani (aoriani)
            Votes: 0
            Watchers: 3
