[HDFS-14126] DataNode DirectoryScanner holding global lock for too long

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 3.0.0
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      I've got a Hadoop 3-based cluster set up, and this DN has just 434 thousand blocks.

      And yet, DirectoryScanner holds the fsdataset lock for 2.7 seconds:

      2018-12-03 21:33:09,130 INFO org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool BP-4588049-10.17.XXX-XX-XXXX281857726 Total blocks: 434401, missing metadata files:0, missing block files:0, missing blocks in memory:0, mismatched blocks:0
      2018-12-03 21:33:09,131 WARN org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Lock held time above threshold: lock identifier: org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl lockHeldTimeMs=2710 ms. Suppressed 0 lock warnings. The stack trace is: java.lang.Thread.getStackTrace(Thread.java:1559)
      org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
      org.apache.hadoop.util.InstrumentedLock.logWarning(InstrumentedLock.java:148)
      org.apache.hadoop.util.InstrumentedLock.check(InstrumentedLock.java:186)
      org.apache.hadoop.util.InstrumentedLock.unlock(InstrumentedLock.java:133)
      org.apache.hadoop.util.AutoCloseableLock.release(AutoCloseableLock.java:84)
      org.apache.hadoop.util.AutoCloseableLock.close(AutoCloseableLock.java:96)
      org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.scan(DirectoryScanner.java:473)
      org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:373)
      org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.run(DirectoryScanner.java:318)
      java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
      java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
      java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
      java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      java.lang.Thread.run(Thread.java:748)
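
      For context on where this WARN comes from: FsDatasetImpl guards its dataset with an org.apache.hadoop.util.InstrumentedLock wrapped in an AutoCloseableLock (both visible in the stack trace above), which times each hold and logs a warning on release when the held time crosses a threshold. A minimal sketch of that pattern, purely illustrative and not Hadoop's actual implementation:

      import java.util.concurrent.locks.ReentrantLock;

      // Sketch of the InstrumentedLock/AutoCloseableLock pattern: time the
      // lock hold and warn on release if it exceeded a threshold.
      class TimedAutoCloseableLock implements AutoCloseable {
        private final ReentrantLock lock = new ReentrantLock();
        private final long warnThresholdMs;
        private long acquiredAtMs;

        TimedAutoCloseableLock(long warnThresholdMs) {
          this.warnThresholdMs = warnThresholdMs;
        }

        TimedAutoCloseableLock acquire() {
          lock.lock();
          acquiredAtMs = System.currentTimeMillis();
          return this;
        }

        @Override
        public void close() { // runs when a try-with-resources block exits
          long heldMs = System.currentTimeMillis() - acquiredAtMs;
          lock.unlock();
          if (heldMs > warnThresholdMs) {
            System.err.println("Lock held time above threshold: lockHeldTimeMs="
                + heldMs + " ms");
          }
        }
      }

      DirectoryScanner.scan() holds such a lock while it diffs the in-memory blocks of a block pool against the on-disk scan report, so the hold time grows with the block count; with 434k blocks that diff apparently takes 2.7 seconds here.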

      These log messages repeat every six hours. I am not sure whether this is a performance regression or just the fact that lock hold times are now logged in Hadoop 3. Chen Liang or Daniel Templeton, do you know?
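
      For what it's worth, the six-hour cadence matches the DirectoryScanner's default schedule: dfs.datanode.directoryscan.interval defaults to 21600 seconds (6 hours), which would explain one WARN per scan. A quick way to confirm the effective value on a DN (assuming its hdfs-site.xml is on the classpath; the class name is illustrative):

      import java.util.concurrent.TimeUnit;

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.hdfs.HdfsConfiguration;

      public class ScanIntervalCheck {
        public static void main(String[] args) {
          // HdfsConfiguration loads core-site.xml/hdfs-site.xml from the classpath.
          Configuration conf = new HdfsConfiguration();
          // 21600 s (6 hours) is the shipped default, matching the WARN cadence.
          long intervalSec = conf.getTimeDuration(
              "dfs.datanode.directoryscan.interval", 21600, TimeUnit.SECONDS);
          System.out.println("DirectoryScanner interval = " + intervalSec + " s");
        }
      }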

      There's nothing in the DN log to indicate any JVM GC activity, and the DN's heap size is set to several GB, so GC pauses don't explain the long hold time.

    People

    • Assignee: Unassigned
    • Reporter: Wei-Chiu Chuang (weichiu)
    • Votes: 0
    • Watchers: 10
