Hadoop HDFS / HDFS-16574

Reduce the time the FSNamesystem write lock is held at one time when removing blocks associated with dead datanodes


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: namenode

    Description

      In BlockManager.removeBlocksAssociatedTo(final DatanodeDescriptor node):

      /** Remove the blocks associated to the given datanode. */
      void removeBlocksAssociatedTo(final DatanodeDescriptor node) {
        providedStorageMap.removeDatanode(node);
        final Iterator<BlockInfo> it = node.getBlockIterator();
        while (it.hasNext()) {
          removeStoredBlock(it.next(), node);
        }
        // Remove all pending DN messages referencing this DN.
        pendingDNMessages.removeAllMessagesForDatanode(node);

        node.resetBlocks();
        invalidateBlocks.remove(node);
      }
      

      it holds the FSNamesystem write lock while removing all blocks associated with a dead datanode, which can take a long time when the datanode has a large number of blocks, e.g. 49.2 s for 22.30M blocks and 7.9 s for 2.84M blocks.
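
      For context, this removal runs inside a single write-lock critical section taken by the heartbeat monitor (see the thread dump further below). A simplified paraphrase of that call site, reconstructed from the stack traces in this issue rather than quoted verbatim, looks roughly like:

      // Simplified paraphrase of HeartbeatManager.heartbeatCheck() (not verbatim
      // source): the dead datanode is removed while the FSNamesystem write lock
      // is held for the entire duration of removeBlocksAssociatedTo().
      if (dead != null) {
        namesystem.writeLock();
        try {
          // DatanodeManager.removeDeadDatanode() -> removeDatanode()
          //   -> BlockManager.removeBlocksAssociatedTo(node)
          dm.removeDeadDatanode(dead);
        } finally {
          namesystem.writeUnlock();
        }
      }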

      # ${ip-1} has 22304612 blocks
      2022-05-08 13:43:35,864 INFO org.apache.hadoop.net.NetworkTopology: Removing a node: /default-rack/${ip-1}:800
      2022-05-08 13:43:35,864 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem write lock held for 49184 ms via
      java.lang.Thread.getStackTrace(Thread.java:1552)
      org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1021)
      org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:263)
      org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:220)
      org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1579)
      org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager.heartbeatCheck(HeartbeatManager.java:409)
      org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager$Monitor.run(HeartbeatManager.java:436)
      java.lang.Thread.run(Thread.java:745)
              Number of suppressed write-lock reports: 0
              Longest write-lock held interval: 49184 
      
      
      # ${ip-2} has about 2.84M blocks
      2022-05-08 08:11:55,559 INFO org.apache.hadoop.net.NetworkTopology: Removing a node: /default-rack/${ip-2}:800
      2022-05-08 08:11:55,560 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem write lock held for 7925 ms via
      java.lang.Thread.getStackTrace(Thread.java:1552)
      org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1021)
      org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:263)
      org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:220)
      org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1579)
      org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager.heartbeatCheck(HeartbeatManager.java:409)
      org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager$Monitor.run(HeartbeatManager.java:436)
      java.lang.Thread.run(Thread.java:745)
          Number of suppressed write-lock reports: 0
          Longest write-lock held interval: 7925

      This blocks all RPC requests that require the FSNamesystem lock, so the HealthMonitor requests from ZKFC cannot be processed in time (they hit the 45000 ms RPC timeout seen below), which eventually leads to a NameNode failover.

      2022-05-08 13:43:32,279 WARN org.apache.hadoop.ha.HealthMonitor: Transport-level exception trying to monitor health of NameNode at hd044.corp.yodao.com/10.108.162.60:8000: java.net.SocketTimeoutException: 45000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.108.162.60:16861 remote=hd044.corp.yodao.com/10.108.162.60:8000] Call From hd044.corp.yodao.com/10.108.162.60 to hd044.corp.yodao.com:8000 failed on socket timeout exception: java.net.SocketTimeoutException: 45000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.108.162.60:16861 remote=hd044.corp.yodao.com/10.108.162.60:8000]; For more details see:  http://wiki.apache.org/hadoop/SocketTimeout
      2022-05-08 13:43:32,283 INFO org.apache.hadoop.ha.HealthMonitor: Entering state SERVICE_NOT_RESPONDING
      2022-05-08 13:43:32,283 INFO org.apache.hadoop.ha.ZKFailoverController: Local service NameNode at hd044.corp.yodao.com/10.108.162.60:8000 entered state: SERVICE_NOT_RESPONDING
      2022-05-08 13:43:32,922 INFO org.apache.hadoop.hdfs.tools.DFSZKFailoverController: -- Local NN thread dump --
      Process Thread Dump: 
      ......
      Thread 100 (IPC Server handler 0 on 8000):
        State: WAITING
        Blocked count: 14895292
        Waited count: 210385351
        Waiting on java.util.concurrent.locks.ReentrantReadWriteLock$FairSync@5723ab57
        Stack:
          sun.misc.Unsafe.park(Native Method)
          java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
          java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
          java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
          java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
          java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
          org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.readLock(FSNamesystemLock.java:144)
          org.apache.hadoop.hdfs.server.namenode.FSNamesystem.readLock(FSNamesystem.java:1556)
          org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1842)
          org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:707)
          org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:381)
          org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
          org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:503)
          org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
          org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:871)
          org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:817)
          java.security.AccessController.doPrivileged(Native Method)
          javax.security.auth.Subject.doAs(Subject.java:422)
          org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
          org.apache.hadoop.ipc.Server$Handler.run(Server.java:2606)
      ......
      Thread 41 (org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager$Monitor@7c351808):
        State: RUNNABLE
        Blocked count: 10
        Waited count: 720146
        Stack:
          org.apache.hadoop.hdfs.util.LightWeightHashSet.resize(LightWeightHashSet.java:479)
          org.apache.hadoop.hdfs.util.LightWeightHashSet.expandIfNecessary(LightWeightHashSet.java:497)
          org.apache.hadoop.hdfs.util.LightWeightHashSet.add(LightWeightHashSet.java:244)
          org.apache.hadoop.hdfs.server.blockmanagement.UnderReplicatedBlocks.update(UnderReplicatedBlocks.java:312)
          org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.updateNeededReplications(BlockManager.java:3718)
          org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeStoredBlock(BlockManager.java:3310)
          org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlocksAssociatedTo(BlockManager.java:1314)
          org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.removeDatanode(DatanodeManager.java:638)
          org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.removeDeadDatanode(DatanodeManager.java:683)
          org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager.heartbeatCheck(HeartbeatManager.java:407)
          org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager$Monitor.run(HeartbeatManager.java:436)
          java.lang.Thread.run(Thread.java:745)
      .....
       -- Local NN thread dump --
      2022-05-08 13:43:32,923 INFO org.apache.hadoop.ha.ZKFailoverController: Quitting master election for NameNode at hd044.corp.yodao.com/10.108.162.60:8000 and marking that fencing is necessary
      2022-05-08 13:43:32,923 INFO org.apache.hadoop.ha.ActiveStandbyElector: Yielding from election
      2022-05-08 13:43:32,925 INFO org.apache.zookeeper.ZooKeeper: Session: 0x97e5d7aac530e03 closed
      2022-05-08 13:43:32,925 WARN org.apache.hadoop.ha.ActiveStandbyElector: Ignoring stale result from old client with sessionId 0x97e5d7aac530e03
      2022-05-08 13:43:32,926 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
      2022-05-08 13:43:36,186 INFO org.apache.hadoop.ha.HealthMonitor: Entering state SERVICE_HEALTHY

      After it has processed a batch of blocks, or has held the lock for a certain amount of time, it should briefly release the FSNamesystem write lock so that other RPC requests can be processed in time, as in the sketch below.
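
      A minimal sketch of that idea (a hypothetical variant, not an actual patch: the batch size is an assumed value that should be configurable, BlockManager's namesystem handle is assumed to be used for dropping and reacquiring the lock, and re-validating the block iterator after other writers have run in between is left out):

      /**
       * Hypothetical batched variant of removeBlocksAssociatedTo(): periodically
       * drop and reacquire the FSNamesystem write lock so queued RPC handlers
       * and the ZKFC health monitor are not starved for tens of seconds.
       */
      void removeBlocksAssociatedTo(final DatanodeDescriptor node) {
        providedStorageMap.removeDatanode(node);
        final Iterator<BlockInfo> it = node.getBlockIterator();
        final int blocksPerLockHold = 100_000; // assumed value, should be configurable
        int processed = 0;
        while (it.hasNext()) {
          removeStoredBlock(it.next(), node);
          if (++processed % blocksPerLockHold == 0 && it.hasNext()) {
            // Yield the write lock briefly so other waiters can make progress,
            // then reacquire it before continuing with the next batch.
            namesystem.writeUnlock();
            namesystem.writeLock();
          }
        }
        // Remove all pending DN messages referencing this DN.
        pendingDNMessages.removeAllMessagesForDatanode(node);

        node.resetBlocks();
        invalidateBlocks.remove(node);
      }

      The main open question for a real fix is whether it is safe to drop the lock mid-iteration, since other writers may change the block map in between; an actual change may need to snapshot the blocks first or otherwise tolerate concurrent updates.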

Attachments

Issue Links

Activity

People

    Assignee: Happy-shi ZhiWei Shi
    Reporter: Happy-shi ZhiWei Shi
    Votes: 0
    Watchers: 4

Dates

    Created:
    Updated:

Time Tracking

    Original Estimate: Not Specified
    Remaining Estimate: 0h
    Time Spent: 50m