Uploaded image for project: 'Hadoop HDFS'
  1. Hadoop HDFS
  2. HDFS-14126

DataNode DirectoryScanner holding global lock for too long

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Duplicate
    • 3.0.0
    • None
    • None
    • None

    Description

      I've got a Hadoop 3 based cluster set up, and this DN has just 434 thousand blocks.

      And yet, DirectoryScanner holds the fsdataset lock for 2.7 seconds:

      2018-12-03 21:33:09,130 INFO org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool BP-4588049-10.17.XXX-XX-XXXX281857726 Total blocks: 434401, missing metadata fi
      les:0, missing block files:0, missing blocks in memory:0, mismatched blocks:0
      2018-12-03 21:33:09,131 WARN org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Lock held time above threshold: lock identifier: org.apache.hadoop.hdfs.serve
      r.datanode.fsdataset.impl.FsDatasetImpl lockHeldTimeMs=2710 ms. Suppressed 0 lock warnings. The stack trace is: java.lang.Thread.getStackTrace(Thread.java:1559)
      org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
      org.apache.hadoop.util.InstrumentedLock.logWarning(InstrumentedLock.java:148)
      org.apache.hadoop.util.InstrumentedLock.check(InstrumentedLock.java:186)
      org.apache.hadoop.util.InstrumentedLock.unlock(InstrumentedLock.java:133)
      org.apache.hadoop.util.AutoCloseableLock.release(AutoCloseableLock.java:84)
      org.apache.hadoop.util.AutoCloseableLock.close(AutoCloseableLock.java:96)
      org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.scan(DirectoryScanner.java:473)
      org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:373)
      org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.run(DirectoryScanner.java:318)
      java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
      java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
      java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
      java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      java.lang.Thread.run(Thread.java:748)

      Log messages like this repeats every several hours (6, to be exact). I am not sure if this is a performance regression, or just the fact that the lock information is printed in Hadoop 3. vagarychen or templedf do you know?

      There's no log in DN to indicate any sort of JVM GC going on. Plus, the DN's heap size is set to several GB.

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              weichiu Wei-Chiu Chuang
              Votes:
              0 Vote for this issue
              Watchers:
              11 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: