Hadoop HDFS / HDFS-13946

Log longest FSN write/read lock held stack trace

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.1.1
    • Fix Version/s: 3.3.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      During the warning-suppression interval, the FSN write/read lock log statement reports only the longest lock-held interval, not the stack trace of the operation that held it. Only the current thread's stack trace is printed, which is not very useful. When the NN slows down, what matters most is which operation held the lock the longest.
      The following log is printed by the current logic:

      2018-09-30 13:56:06,700 INFO [IPC Server handler 119 on 8020] org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem write lock held for 11 ms via
      java.lang.Thread.getStackTrace(Thread.java:1589)
      org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:945)
      org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:198)
      org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1688)
      org.apache.hadoop.hdfs.server.namenode.FSNamesystem.removeBlocks(FSNamesystem.java:4281)
      org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:4247)
      org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:4183)
      org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:4167)
      org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:848)
      org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.delete(AuthorizationProviderProxyClientProtocol.java:311)
      org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:625)
      org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
      org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
      org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
      org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2226)
      org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2222)
      java.security.AccessController.doPrivileged(Native Method)
      javax.security.auth.Subject.doAs(Subject.java:415)
      org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
      org.apache.hadoop.ipc.Server$Handler.run(Server.java:2220)
              Number of suppressed write-lock reports: 14
              Longest write-lock held interval: 70
      

      Logging the stack trace of the longest lock hold would also help with troubleshooting.
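      The proposed behavior can be sketched roughly as follows. This is a minimal illustration only, not the code from the attached patches: the class LongestHoldTracker, its onUnlock method, and the report wording are hypothetical names chosen for this example. The idea is that each unlock above the threshold updates the longest-held interval together with the stack trace that produced it, so the next unsuppressed report can print both.

```java
/**
 * Hypothetical sketch (not the actual FSNamesystemLock): remembers the
 * longest lock hold seen while reports are suppressed, along with the
 * stack trace of the operation that caused it.
 */
class LongestHoldTracker {
  private final long thresholdMs;   // holds shorter than this are ignored
  private final long suppressMs;    // minimum gap between two reports
  private boolean reportedBefore = false;
  private long lastReportMs;
  private int suppressed;           // reports skipped since the last one
  private long longestHeldMs;       // longest hold since the last report
  private String longestStack = ""; // stack trace of that longest hold

  LongestHoldTracker(long thresholdMs, long suppressMs) {
    this.thresholdMs = thresholdMs;
    this.suppressMs = suppressMs;
  }

  /** Captures the caller's stack trace as one string, one frame per line. */
  static String currentStack() {
    StringBuilder sb = new StringBuilder();
    for (StackTraceElement e : Thread.currentThread().getStackTrace()) {
      sb.append(e).append('\n');
    }
    return sb.toString();
  }

  /**
   * Call when the lock is released. Returns the report text to log,
   * or null if the hold was short or the report is suppressed.
   */
  synchronized String onUnlock(long nowMs, long heldMs, String stack) {
    if (heldMs < thresholdMs) {
      return null;
    }
    // Track the longest hold and, unlike the current logic, its stack.
    if (heldMs > longestHeldMs) {
      longestHeldMs = heldMs;
      longestStack = stack;
    }
    if (reportedBefore && nowMs - lastReportMs < suppressMs) {
      suppressed++;
      return null;
    }
    String report = "lock held for " + heldMs + " ms via " + stack
        + "\n\tNumber of suppressed reports: " + suppressed
        + "\n\tLongest held interval: " + longestHeldMs
        + " ms via " + longestStack;
    // Start a fresh suppression window.
    reportedBefore = true;
    lastReportMs = nowMs;
    suppressed = 0;
    longestHeldMs = 0;
    longestStack = "";
    return report;
  }
}
```

      A caller would invoke onUnlock(now, heldMs, LongestHoldTracker.currentStack()) from its unlock path and log any non-null result. Time is passed in explicitly here only to keep the sketch deterministic.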

      Attachments

        1. HDFS-13946.001.patch
          4 kB
          Yiqun Lin
        2. HDFS-13946.002.patch
          8 kB
          Yiqun Lin
        3. HDFS-13946.003.patch
          8 kB
          Yiqun Lin
        4. HDFS-13946.004.patch
          11 kB
          Yiqun Lin
        5. HDFS-13946.005.patch
          11 kB
          Yiqun Lin
        6. HDFS-13946.006.patch
          12 kB
          Yiqun Lin
        7. HDFS-13946.007.patch
          14 kB
          Erik Krogen
        8. HDFS-13946.008.patch
          14 kB
          Yiqun Lin

          People

            Assignee: Yiqun Lin (linyiqun)
            Reporter: Yiqun Lin (linyiqun)
            Votes: 0
            Watchers: 8

            Dates

              Created:
              Updated:
              Resolved:
