  1. Hadoop HDFS
  2. HDFS-6825

Edit log corruption due to delayed block removal

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.5.0
    • Fix Version/s: 2.6.0
    • Component/s: namenode
    • Labels:
      None

      Description

      Observed the following stack trace:

      2014-08-04 23:49:44,133 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: commitBlockSynchronization(lastblock=BP-.., newgenerationstamp=..., newlength=..., newtargets=..., closeFile=true, deleteBlock=false)
      2014-08-04 23:49:44,133 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Unexpected exception while updating disk space. 
      java.io.FileNotFoundException: Path not found: /solr/hierarchy/core_node1/data/tlog/tlog.xyz
              at org.apache.hadoop.hdfs.server.namenode.FSDirectory.updateSpaceConsumed(FSDirectory.java:1807)
              at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.commitOrCompleteLastBlock(FSNamesystem.java:3975)
              at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.closeFileCommitBlocks(FSNamesystem.java:4178)
              at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.commitBlockSynchronization(FSNamesystem.java:4146)
              at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.commitBlockSynchronization(NameNodeRpcServer.java:662)
              at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.commitBlockSynchronization(DatanodeProtocolServerSideTranslatorPB.java:270)
              at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:28073)
              at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
              at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
              at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
              at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
              at java.security.AccessController.doPrivileged(Native Method)
              at javax.security.auth.Subject.doAs(Subject.java:415)
              at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
              at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
      

      Investigation showed the following sequence of events:

      • The client created the file /solr/hierarchy/core_node1/data/tlog/tlog.xyz.
      • The client tried to append to this file, but the lease had expired, so lease recovery was started and the append failed.
      • The file was deleted, but pending blocks of the file had not yet been removed.
      • commitBlockSynchronization() was then called (see the stack above). An INodeFile was obtained from the pending block, with no awareness that the file had already been deleted.
      • FSDirectory.updateSpaceConsumed threw FileNotFoundException, but commitOrCompleteLastBlock swallowed it.
      • closeFileCommitBlocks went on to call finalizeINodeFileUnderConstruction and wrote a CloseOp for the deleted file to the edit log.
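
      The sequence above can be reproduced in a toy model. The sketch below is not NameNode code and not the attached patch: ToyFile, the two maps, the OP_* strings, the path /demo/tlog and both commit methods are hypothetical names chosen for illustration. It only shows how a close record can end up after a delete record when the block-to-file mapping outlives the path, and one possible guard (checking that the file is still reachable in the namespace before logging).

```java
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class EditLogRaceSketch {
    // Toy stand-in for an inode; NOT the real INodeFile class.
    static final class ToyFile {
        final String path;
        ToyFile(String path) { this.path = path; }
    }

    // Hypothetical toy state: path -> file, block id -> owning file, and an in-memory "edit log".
    final Map<String, ToyFile> namespace = new HashMap<>();
    final Map<Long, ToyFile> blockOwners = new HashMap<>();
    final List<String> editLog = new ArrayList<>();

    void create(String path, long blockId) {
        ToyFile file = new ToyFile(path);
        namespace.put(path, file);
        blockOwners.put(blockId, file);
        editLog.add("OP_ADD " + path);
    }

    // The path disappears immediately, but the block entry is left behind,
    // which models delayed block removal.
    void delete(String path) {
        namespace.remove(path);
        editLog.add("OP_DELETE " + path);
    }

    void updateSpaceConsumed(String path) throws FileNotFoundException {
        if (!namespace.containsKey(path)) {
            throw new FileNotFoundException("Path not found: " + path);
        }
    }

    // Models the reported behaviour: the exception is swallowed and a close
    // record is still appended for a file that no longer exists.
    void commitBlockSynchronizationUnguarded(long blockId) {
        ToyFile file = blockOwners.get(blockId);
        try {
            updateSpaceConsumed(file.path);
        } catch (FileNotFoundException e) {
            System.out.println("WARN swallowed: " + e.getMessage());
        }
        editLog.add("OP_CLOSE " + file.path);
    }

    // One possible guard (an assumption, not the committed fix): refuse to
    // proceed when the block's file is no longer reachable in the namespace.
    void commitBlockSynchronizationGuarded(long blockId) throws FileNotFoundException {
        ToyFile file = blockOwners.get(blockId);
        if (file == null || namespace.get(file.path) != file) {
            throw new FileNotFoundException("File for block " + blockId + " was deleted");
        }
        updateSpaceConsumed(file.path);
        editLog.add("OP_CLOSE " + file.path);
    }

    public static void main(String[] args) {
        EditLogRaceSketch unguarded = new EditLogRaceSketch();
        unguarded.create("/demo/tlog", 1L);
        unguarded.delete("/demo/tlog");
        unguarded.commitBlockSynchronizationUnguarded(1L);
        System.out.println("unguarded log: " + unguarded.editLog);

        EditLogRaceSketch guarded = new EditLogRaceSketch();
        guarded.create("/demo/tlog", 1L);
        guarded.delete("/demo/tlog");
        try {
            guarded.commitBlockSynchronizationGuarded(1L);
        } catch (FileNotFoundException e) {
            System.out.println("guarded commit rejected: " + e.getMessage());
        }
        System.out.println("guarded log: " + guarded.editLog);
    }
}
```

      In the unguarded run the log ends with a close record that follows the delete record for the same path, which is the kind of inconsistent sequence described in this issue. The guarded variant compares object identity rather than only the path, so a new file created later at the same path would not be mistaken for the deleted one.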

        Attachments

        1. HDFS-6825.001.patch
          21 kB
          Yongjun Zhang
        2. HDFS-6825.002.patch
          23 kB
          Yongjun Zhang
        3. HDFS-6825.003.patch
          22 kB
          Yongjun Zhang
        4. HDFS-6825.004.patch
          15 kB
          Yongjun Zhang
        5. HDFS-6825.005.patch
          19 kB
          Yongjun Zhang

              People

              • Assignee:
                yzhangal Yongjun Zhang
                Reporter:
                yzhangal Yongjun Zhang
              • Votes:
                0
                Watchers:
                7
