Uploaded image for project: 'Hadoop Common'
  1. Hadoop Common
  2. HADOOP-16833

InstrumentedLock should log lock queue time

    XMLWordPrintableJSON

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.3.0
    • Fix Version/s: 3.3.0
    • Component/s: util
    • Labels:
      None

      Description

      Within the Datanode, we often see contention around the FsDatasetImpl lock. This can be for various reasons, eg the DN is under IO load, and other bugs, eg HDFS-15131.

      When DN slow downs happen, it is very difficult to debug what is causing it, as there are few messages in the logs which indicate what is happening.

      In my experience, the presence of this log is informative:

      2019-11-27 09:00:49,678 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Took 21540ms to process 1 commands from NN
      

      This comes from the datanode heartbeat thread, and it is logged when the commands returned from the heartbeat cannot be enqueued, and usually this is because the thread cannot get the FsDatasetImpl lock due to contention with other slow threads.

      HDFS-14997 moved the command processing to an async thread, and hence this useful message will disappear.

      InstrumentedLock introduced a feature to the datanode, where it will log a stack trace if any thread holds the lock for over 300ms. However this will not catch a scenario where 10 threads each hold a lock for 200ms, leading to the next in the queue having waited over 2 seconds.

      I believe it would be useful to extend InstrumentedLock to log if a thread has to wait for over some threshold. That way, we could be able to catch scenarios like the heartbeat thread shows us, but in a much more obvious way, provided lock contention is involved.

        Attachments

        1. HADOOP-16833.001.patch
          9 kB
          Stephen O'Donnell
        2. HADOOP-16833.002.patch
          9 kB
          Stephen O'Donnell
        3. HADOOP-16833.003.patch
          14 kB
          Stephen O'Donnell
        4. HADOOP-16833.004.patch
          14 kB
          Stephen O'Donnell
        5. HADOOP-16833.005.patch
          15 kB
          Stephen O'Donnell
        6. HADOOP-16833.006.patch
          15 kB
          Stephen O'Donnell

          Activity

            People

            • Assignee:
              sodonnell Stephen O'Donnell
              Reporter:
              sodonnell Stephen O'Donnell
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: