Description
During the warning-suppression interval, the FSNamesystem (FSN) read/write lock log statement prints only the longest lock-held interval, not the stack trace of the operation that held the lock that long. Only the current thread's stack trace is printed, which is of little use. When the NameNode (NN) slows down, what matters most is which operation held the lock for the longest time.
The following log entry is printed under the current logic:
2018-09-30 13:56:06,700 INFO [IPC Server handler 119 on 8020] org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem write lock held for 11 ms via
    java.lang.Thread.getStackTrace(Thread.java:1589)
    org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:945)
    org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:198)
    org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1688)
    org.apache.hadoop.hdfs.server.namenode.FSNamesystem.removeBlocks(FSNamesystem.java:4281)
    org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:4247)
    org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:4183)
    org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:4167)
    org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:848)
    org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.delete(AuthorizationProviderProxyClientProtocol.java:311)
    org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:625)
    org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
    org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
    org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2226)
    org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2222)
    java.security.AccessController.doPrivileged(Native Method)
    javax.security.auth.Subject.doAs(Subject.java:415)
    org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
    org.apache.hadoop.ipc.Server$Handler.run(Server.java:2220)
    Number of suppressed write-lock reports: 14
    Longest write-lock held interval: 70
This would also help with troubleshooting.
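The log above shows that the lock already tracks a suppressed-report count and the longest hold, but prints the stack of whichever hold happens to trigger the report (11 ms here) rather than the 70 ms one. Below is a minimal Java sketch of the proposed idea, assuming a tracker called after each unlock with the hold time and a monotonic timestamp. `LongestHoldTracker`, `onUnlock` and `currentStack` are hypothetical names, not Hadoop's `FSNamesystemLock` API, and the report format only loosely follows the log above. The stack trace is captured lazily through a `Supplier`, and only when a hold becomes the new longest in the current window, since walking the stack on every unlock would add cost.

```java
import java.util.Arrays;
import java.util.Optional;
import java.util.function.Supplier;
import java.util.stream.Collectors;

/**
 * Hypothetical sketch, not Hadoop's FSNamesystemLock: besides counting
 * suppressed reports and the longest hold, it keeps the stack trace of that
 * longest hold so the next report can show which operation it was.
 */
class LongestHoldTracker {
  private final long thresholdMs;        // holds shorter than this are not reported
  private final long suppressIntervalMs; // minimum gap between two reports
  private long lastReportMs;             // monotonic time of the last report
  private int suppressed = 0;            // reports suppressed since then
  private long longestMs = 0;            // longest hold since the last report
  private String longestStack = null;    // stack trace of that longest hold

  LongestHoldTracker(long thresholdMs, long suppressIntervalMs) {
    this.thresholdMs = thresholdMs;
    this.suppressIntervalMs = suppressIntervalMs;
    // Assumes non-negative monotonic times, so the first report is never suppressed.
    this.lastReportMs = -suppressIntervalMs;
  }

  /**
   * Called after the lock is released. Returns a report when one is due;
   * otherwise records the hold (if over the threshold) and returns empty.
   */
  synchronized Optional<String> onUnlock(long heldMs, long nowMs,
                                         Supplier<String> stack) {
    if (heldMs < thresholdMs) {
      return Optional.empty();
    }
    // Capture a stack trace only for a new longest hold, since it is costly.
    if (longestStack == null || heldMs > longestMs) {
      longestMs = heldMs;
      longestStack = stack.get();
    }
    if (nowMs - lastReportMs < suppressIntervalMs) {
      suppressed++;
      return Optional.empty();
    }
    String report = "Write lock held for " + heldMs + " ms"
        + "\n\tNumber of suppressed write-lock reports: " + suppressed
        + "\n\tLongest write-lock held interval: " + longestMs + " ms via\n"
        + longestStack;
    // Start a new suppression window.
    lastReportMs = nowMs;
    suppressed = 0;
    longestMs = 0;
    longestStack = null;
    return Optional.of(report);
  }

  /** The calling thread's stack, one tab-indented frame per line. */
  static String currentStack() {
    return Arrays.stream(Thread.currentThread().getStackTrace())
        .map(StackTraceElement::toString)
        .collect(Collectors.joining("\n\t", "\t", "\n"));
  }
}
```

In a NameNode-style lock, the unlock path would call something like `onUnlock(heldMs, now, LongestHoldTracker::currentStack)` and log any returned report after releasing the lock, so the slow operation's stack appears even when its own report was suppressed.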