  1. Hadoop HDFS
  2. HDFS-13671

Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet

Details

    Description

      The NameNode hangs when deleting a large number of files/blocks. The stack trace:

      "IPC Server handler 4 on 8020" #87 daemon prio=5 os_prio=0 tid=0x00007fb505b27800 nid=0x94c3 runnable [0x00007fa861361000]
         java.lang.Thread.State: RUNNABLE
      	at org.apache.hadoop.hdfs.util.FoldedTreeSet.compare(FoldedTreeSet.java:474)
      	at org.apache.hadoop.hdfs.util.FoldedTreeSet.removeAndGet(FoldedTreeSet.java:849)
      	at org.apache.hadoop.hdfs.util.FoldedTreeSet.remove(FoldedTreeSet.java:911)
      	at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.removeBlock(DatanodeStorageInfo.java:252)
      	at org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:194)
      	at org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:108)
      	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlockFromMap(BlockManager.java:3813)
      	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlock(BlockManager.java:3617)
      	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.removeBlocks(FSNamesystem.java:4270)
      	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:4244)
      	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:4180)
      	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:4164)
      	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:871)
      	at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.delete(AuthorizationProviderProxyClientProtocol.java:311)
      	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:625)
      	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
      	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
      

      In the current deletion logic in NameNode, there are mainly two steps:

      • Collect INodes and all blocks to be deleted, then delete INodes.
      • Remove blocks chunk by chunk in a loop.

      In principle, the first step should be the more expensive operation and take more time. However, we now always see the NN hang during the remove-block operation.
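
      The chunk-by-chunk removal in the second step can be sketched roughly as follows. This is a simplified illustration, not the FSNamesystem code: the class name, the chunk size of 1000, the per-chunk write lock, and the HashSet standing in for the blocks map are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the namesystem; not the real FSNamesystem.
class ChunkedBlockRemoval {
    // Assumed chunk size, chosen only for this illustration.
    static final int CHUNK_SIZE = 1000;

    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    // Stand-in for the blocks map: just a set of block IDs.
    private final Set<Long> blocksMap = new HashSet<>();
    // Counts how many times the write lock was taken during removal.
    public int lockAcquisitions = 0;

    public void addBlock(long id) { blocksMap.add(id); }

    public int size() { return blocksMap.size(); }

    // Step 2: remove the collected blocks chunk by chunk, releasing the
    // write lock between chunks so other requests can make progress.
    public void removeBlocks(List<Long> toDelete) {
        Iterator<Long> it = toDelete.iterator();
        while (it.hasNext()) {
            lock.writeLock().lock();
            lockAcquisitions++;
            try {
                for (int i = 0; i < CHUNK_SIZE && it.hasNext(); i++) {
                    blocksMap.remove(it.next());
                }
            } finally {
                lock.writeLock().unlock();
            }
        }
    }

    public static void main(String[] args) {
        ChunkedBlockRemoval ns = new ChunkedBlockRemoval();
        // Step 1 (simplified): collect the IDs of all blocks to delete.
        List<Long> collected = new ArrayList<>();
        for (long id = 0; id < 2500; id++) {
            ns.addBlock(id);
            collected.add(id);
        }
        ns.removeBlocks(collected);
        System.out.println(ns.size() + " blocks left, "
            + ns.lockAcquisitions + " lock acquisitions");
    }
}
```

      With this structure, the time spent holding the lock per chunk is dominated by the cost of each individual remove call, which is where the stack trace above points.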

      Looking into this: the new structure FoldedTreeSet was introduced to improve performance when handling FBRs/IBRs. But compared with the earlier implementation of the remove-block logic, FoldedTreeSet appears to be slower, since it takes additional time to rebalance tree nodes. When a large number of blocks has to be removed/deleted, this performs badly.
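
      The cost difference can be illustrated with a small sketch. It does not use FoldedTreeSet itself: java.util.TreeSet stands in for a sorted, balanced tree, and a hand-written doubly linked list stands in for the earlier implementation, assuming that implementation could unlink a block without searching for it. All class and method names here are hypothetical.

```java
import java.util.Comparator;
import java.util.TreeSet;
import java.util.concurrent.atomic.AtomicLong;

class RemovalCostSketch {
    // Hypothetical block carrying its own list links (intrusive list).
    static final class Block {
        final long id;
        Block prev, next;
        Block(long id) { this.id = id; }
    }

    // Circular doubly linked list with a sentinel head node.
    static final class BlockList {
        private final Block head = new Block(-1);
        int size;
        BlockList() { head.prev = head; head.next = head; }

        void add(Block b) {
            b.prev = head.prev;
            b.next = head;
            head.prev.next = b;
            head.prev = b;
            size++;
        }

        // Unlinks a block in constant time: no search, no comparisons,
        // no rebalancing. The caller must pass a block that is in the list.
        void remove(Block b) {
            b.prev.next = b.next;
            b.next.prev = b.prev;
            b.prev = null;
            b.next = null;
            size--;
        }
    }

    // Counts comparator calls made while removing n blocks from a TreeSet.
    static long treeRemovalComparisons(int n) {
        AtomicLong calls = new AtomicLong();
        Comparator<Block> counting = (a, b) -> {
            calls.incrementAndGet();
            return Long.compare(a.id, b.id);
        };
        TreeSet<Block> tree = new TreeSet<>(counting);
        Block[] blocks = new Block[n];
        for (int i = 0; i < n; i++) {
            blocks[i] = new Block(i);
            tree.add(blocks[i]);
        }
        calls.set(0); // ignore comparisons made during insertion
        for (Block b : blocks) {
            tree.remove(b); // each removal searches the tree first
        }
        return calls.get();
    }

    // Removes n blocks from the linked list; returns how many remain.
    static int listRemovalLeftover(int n) {
        BlockList list = new BlockList();
        Block[] blocks = new Block[n];
        for (int i = 0; i < n; i++) {
            blocks[i] = new Block(i);
            list.add(blocks[i]);
        }
        for (Block b : blocks) {
            list.remove(b);
        }
        return list.size;
    }

    public static void main(String[] args) {
        int n = 100_000;
        System.out.println("TreeSet comparisons for " + n + " removals: "
            + treeRemovalComparisons(n));
        System.out.println("Linked list leftover after " + n + " removals: "
            + listRemovalLeftover(n));
    }
}
```

      The tree pays a search (and possible rebalancing) on every removal, while the linked list only rewrites two pointers. The actual numbers for FoldedTreeSet would differ; the sketch only shows the shape of the trade-off.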

      For get-type operations in DatanodeStorageInfo, we only provide getBlockIterator, which returns a block iterator; there is no other get operation for a specified block. Do we still need to use FoldedTreeSet in DatanodeStorageInfo? As we know, FoldedTreeSet benefits Get operations, not Update operations. Maybe we can revert this to the earlier implementation.

      Attachments

        1. image-2021-06-18-15-47-04-037.png
          150 kB
          Haibin Huang
        2. image-2021-06-18-15-46-46-052.png
          185 kB
          Haibin Huang
        3. image-2021-06-10-19-28-58-359.png
          91 kB
          Haibin Huang
        4. image-2021-06-10-19-28-18-373.png
          113 kB
          Haibin Huang
        5. HDFS-13671-001.patch
          151 kB
          Haibin Huang

            People

              huanghaibin Haibin Huang
              linyiqun Yiqun Lin
              Votes: 6
              Watchers: 68

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 7h 40m