Details
Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Fixed
3.3.5
None
Description
The unit test TestLeaseRecovery2#testHardLeaseRecoveryAfterNameNodeRestart failed with the error message "Waiting for cluster to become active". The jstack of the blocked thread is as follows:

"BP-1618793397-192.168.3.4-1669198559828 heartbeating to localhost/127.0.0.1:54673" #260 daemon prio=5 os_prio=31 tid=0x00007fc1108fa000 nid=0x19303 waiting on condition [0x0000700017884000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00000007430a9ec0> (a java.util.concurrent.SynchronousQueue$TransferQueue)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.SynchronousQueue$TransferQueue.awaitFulfill(SynchronousQueue.java:762)
    at java.util.concurrent.SynchronousQueue$TransferQueue.transfer(SynchronousQueue.java:695)
    at java.util.concurrent.SynchronousQueue.put(SynchronousQueue.java:877)
    at org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1186)
    at org.apache.hadoop.ipc.Client.call(Client.java:1482)
    at org.apache.hadoop.ipc.Client.call(Client.java:1429)
    at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:258)
    at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:139)
    at com.sun.proxy.$Proxy23.sendHeartbeat(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.sendHeartbeat(DatanodeProtocolClientSideTranslatorPB.java:168)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:570)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:714)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:915)
    at java.lang.Thread.run(Thread.java:748)
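After looking into the code, I found that this bug was introduced by HADOOP-18324. RpcRequestSender exits without cleaning up rpcRequestQueue. Because that queue is a SynchronousQueue, the later SynchronousQueue.put() in Client$Connection.sendRpcRequest has no consumer to hand the request to, so BPServiceActor stays blocked while sending its heartbeat request.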
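The hang can be modeled outside Hadoop with a one-shot consumer. In this minimal sketch, a sender thread (standing in for RpcRequestSender) takes a single request and then exits without any cleanup, and rpcRequestQueue is simplified to a SynchronousQueue<String>. The class ExitedSenderDemo and the helpers startOneShotSender, handoffAfterSenderExit, sendOrFail and guardedSends are hypothetical names chosen for illustration. sendOrFail shows one possible fail-fast approach; it is an assumption, not the change that resolved this issue.

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical model of the hang; names are illustrative and not Hadoop's.
public class ExitedSenderDemo {

    /** Starts a sender that takes exactly one request and then exits without any cleanup. */
    static Thread startOneShotSender(SynchronousQueue<String> queue) {
        Thread sender = new Thread(() -> {
            try {
                queue.take(); // handle a single request ...
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            // ... then exit; nothing will ever take from the queue again.
        });
        sender.start();
        return sender;
    }

    /** Returns whether a hand-off succeeds after the sender has exited, waiting at most timeoutMillis. */
    static boolean handoffAfterSenderExit(long timeoutMillis) {
        SynchronousQueue<String> queue = new SynchronousQueue<>();
        Thread sender = startOneShotSender(queue);
        try {
            queue.put("heartbeat-1"); // accepted by the live sender
            sender.join();            // the sender has now exited
            // A plain put() here would park forever, like the heartbeat thread in the jstack.
            return queue.offer("heartbeat-2", timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /**
     * Hypothetical fail-fast guard (not the actual fix): offer in short slices
     * and stop retrying once the sender thread is no longer alive.
     */
    static boolean sendOrFail(SynchronousQueue<String> queue, Thread sender, String request)
            throws InterruptedException {
        while (sender.isAlive()) {
            if (queue.offer(request, 10, TimeUnit.MILLISECONDS)) {
                return true; // a live sender accepted the request
            }
        }
        return false; // sender exited: report failure instead of parking forever
    }

    /** Runs the guard twice: once while the sender is alive, once after it has exited. */
    static boolean[] guardedSends() {
        SynchronousQueue<String> queue = new SynchronousQueue<>();
        Thread sender = startOneShotSender(queue);
        try {
            boolean first = sendOrFail(queue, sender, "heartbeat-1");
            sender.join();
            boolean second = sendOrFail(queue, sender, "heartbeat-2");
            return new boolean[] {first, second};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return new boolean[] {false, false};
        }
    }

    public static void main(String[] args) {
        System.out.println(handoffAfterSenderExit(100)); // prints false
        boolean[] results = guardedSends();
        System.out.println(results[0] + " " + results[1]); // prints true false
    }
}
```

The timed offer() is used only so the demo terminates; with put(), as in Client$Connection.sendRpcRequest, the second hand-off would never return. A real fix also has to decide what to do with requests still pending when the sender exits, which this sketch does not model.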
Issue Links
- blocks: HADOOP-18470 Release hadoop 3.3.5 (Resolved)
- is caused by: HADOOP-18324 Interrupting RPC Client calls can lead to thread exhaustion (Resolved)
- is duplicated by: HDFS-16878 TestLeaseRecovery2 timeouts (Resolved)
- links to