HDFS-13757: After HDFS-12886, close() can throw AssertionError "Negative replicas!"

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.1.0, 2.10.0, 2.9.1, 3.2.0, 3.0.3
    • Fix Version/s: None
    • Component/s: namenode
    • Labels: None

      Description

      While investigating a data corruption bug caused by concurrent recoverLease() and close() calls, I found that HDFS-12886 can cause close() to throw an AssertionError in a corner case: the block has zero live replicas, and the client calls recoverLease() immediately followed by close().

      org.apache.hadoop.ipc.RemoteException(java.lang.AssertionError): Negative replicas!
      at org.apache.hadoop.hdfs.server.blockmanagement.LowRedundancyBlocks.getPriority(LowRedundancyBlocks.java:197)
      at org.apache.hadoop.hdfs.server.blockmanagement.LowRedundancyBlocks.update(LowRedundancyBlocks.java:422)
      at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.updateNeededReconstructions(BlockManager.java:4274)
      at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.commitOrCompleteLastBlock(BlockManager.java:1001)
      at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.commitOrCompleteLastBlock(FSNamesystem.java:3471)
      at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.completeFileInternal(FSDirWriteFileOp.java:713)
      at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.completeFile(FSDirWriteFileOp.java:671)
      at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2854)
      at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:928)
      at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:607)
      at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
      at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
      at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
      at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
      at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:422)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1689)
      at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
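
      One plausible reading of the trace, stated here as an assumption rather than something the report confirms, is that the update path derives a "previous" replica count by subtracting a delta from the current live count, so a live count of 0 with a delta of 1 yields -1 and trips the sanity check in getPriority(). The toy model below only illustrates that arithmetic. The class name, the method bodies, and the priority values are hypothetical and are not the HDFS source.

```java
// Toy model only: method names echo the stack trace, but none of this is HDFS code.
class NegativeReplicasSketch {

    // Hypothetical stand-in for the sanity check at the top of the trace.
    // Thrown explicitly so the sketch does not depend on running with -ea.
    static int getPriority(int replicas, int expectedReplicas) {
        if (replicas < 0) {
            throw new AssertionError("Negative replicas!");
        }
        // Toy priorities: 0 = no replicas, 1 = under-replicated, 2 = fully replicated.
        if (replicas == 0) {
            return 0;
        }
        return replicas < expectedReplicas ? 1 : 2;
    }

    // Assumption: the previous count is reconstructed as current minus delta.
    static int previousReplicas(int curReplicas, int curReplicasDelta) {
        return curReplicas - curReplicasDelta;
    }

    // Hypothetical update(): looks up the priority for the previous count.
    static int update(int curReplicas, int curReplicasDelta, int expectedReplicas) {
        return getPriority(previousReplicas(curReplicas, curReplicasDelta), expectedReplicas);
    }

    public static void main(String[] args) {
        // Two live replicas after gaining one: the previous count is 1, which is valid.
        System.out.println(update(2, 1, 3));
        try {
            // Zero live replicas with a delta of 1: the previous count would be -1.
            update(0, 1, 3);
        } catch (AssertionError e) {
            System.out.println("AssertionError: " + e.getMessage());
        }
    }
}
```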
      

      I have a test case to reproduce it.

      Lukas Majercak, Íñigo Goiri, would you please take a look at it? I think we should add a check that rejects completeFile() if the block is under recovery, similar to what is proposed in HDFS-10240.
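
      The proposed check could look roughly like the following. This is a minimal sketch of the idea, not a patch: the BlockState enum, the LastBlock class, and both method bodies are hypothetical stand-ins, and the real NameNode code paths are considerably more involved.

```java
import java.io.IOException;

// Toy model of the proposed guard; none of these types are the HDFS classes.
class CompleteFileGuardSketch {

    // Hypothetical block states for a file's last block.
    enum BlockState { UNDER_CONSTRUCTION, UNDER_RECOVERY, COMMITTED, COMPLETE }

    // Hypothetical stand-in for the last block of an open file.
    static final class LastBlock {
        BlockState state = BlockState.UNDER_CONSTRUCTION;
    }

    // Toy recoverLease(): marks the last block as being recovered.
    static void recoverLease(LastBlock block) {
        block.state = BlockState.UNDER_RECOVERY;
    }

    // Toy completeFile() with the proposed check: refuse to complete
    // the file while its last block is still under recovery.
    static boolean completeFile(LastBlock block) throws IOException {
        if (block.state == BlockState.UNDER_RECOVERY) {
            throw new IOException("Cannot complete file: last block is under recovery");
        }
        block.state = BlockState.COMPLETE;
        return true;
    }

    public static void main(String[] args) {
        LastBlock block = new LastBlock();
        recoverLease(block);
        try {
            // close() on the client ends in completeFile(); here it is rejected.
            completeFile(block);
        } catch (IOException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

      With a guard of this kind, the recoverLease()-then-close() sequence fails with an ordinary IOException on the client instead of reaching the replica-count assertion.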

        Attachments

        1. HDFS-13757.test.02.patch
          6 kB
          Wei-Chiu Chuang
        2. HDFS-13757.test.patch
          3 kB
          Wei-Chiu Chuang

              People

              • Assignee: Unassigned
              • Reporter: Wei-Chiu Chuang (weichiu)
              • Votes: 0
              • Watchers: 6