Uploaded image for project: 'Hadoop Common'
  1. Hadoop Common
  2. HADOOP-16833

InstrumentedLock should log lock queue time

Log workAgile BoardRank to TopRank to BottomAttach filesAttach ScreenshotBulk Copy AttachmentsBulk Move AttachmentsVotersWatch issueWatchersCreate sub-taskConvert to sub-taskMoveLinkCloneLabelsUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 3.3.0
    • 3.3.0
    • util
    • None

    Description

      Within the Datanode, we often see contention around the FsDatasetImpl lock. This can be for various reasons, eg the DN is under IO load, and other bugs, eg HDFS-15131.

      When DN slow downs happen, it is very difficult to debug what is causing it, as there are few messages in the logs which indicate what is happening.

      In my experience, the presence of this log is informative:

      2019-11-27 09:00:49,678 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Took 21540ms to process 1 commands from NN
      

      This comes from the datanode heartbeat thread, and it is logged when the commands returned from the heartbeat cannot be enqueued, and usually this is because the thread cannot get the FsDatasetImpl lock due to contention with other slow threads.

      HDFS-14997 moved the command processing to an async thread, and hence this useful message will disappear.

      InstrumentedLock introduced a feature to the datanode, where it will log a stack trace if any thread holds the lock for over 300ms. However this will not catch a scenario where 10 threads each hold a lock for 200ms, leading to the next in the queue having waited over 2 seconds.

      I believe it would be useful to extend InstrumentedLock to log if a thread has to wait for over some threshold. That way, we could be able to catch scenarios like the heartbeat thread shows us, but in a much more obvious way, provided lock contention is involved.

      Attachments

        1. HADOOP-16833.006.patch
          15 kB
          Stephen O'Donnell
        2. HADOOP-16833.005.patch
          15 kB
          Stephen O'Donnell
        3. HADOOP-16833.004.patch
          14 kB
          Stephen O'Donnell
        4. HADOOP-16833.003.patch
          14 kB
          Stephen O'Donnell
        5. HADOOP-16833.002.patch
          9 kB
          Stephen O'Donnell
        6. HADOOP-16833.001.patch
          9 kB
          Stephen O'Donnell

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            sodonnell Stephen O'Donnell Assign to me
            sodonnell Stephen O'Donnell
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                Issue deployment