Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Duplicate
Affects Version/s: 3.0.0
Fix Version/s: None
Component/s: None
Labels: None
Description
I've got a Hadoop 3-based cluster set up, and this DN has only 434 thousand blocks.
And yet, DirectoryScanner holds the fsdataset lock for 2.7 seconds:
2018-12-03 21:33:09,130 INFO org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool BP-4588049-10.17.XXX-XX-XXXX281857726 Total blocks: 434401, missing metadata files:0, missing block files:0, missing blocks in memory:0, mismatched blocks:0
2018-12-03 21:33:09,131 WARN org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Lock held time above threshold: lock identifier: org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl lockHeldTimeMs=2710 ms. Suppressed 0 lock warnings. The stack trace is: java.lang.Thread.getStackTrace(Thread.java:1559)
org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
org.apache.hadoop.util.InstrumentedLock.logWarning(InstrumentedLock.java:148)
org.apache.hadoop.util.InstrumentedLock.check(InstrumentedLock.java:186)
org.apache.hadoop.util.InstrumentedLock.unlock(InstrumentedLock.java:133)
org.apache.hadoop.util.AutoCloseableLock.release(AutoCloseableLock.java:84)
org.apache.hadoop.util.AutoCloseableLock.close(AutoCloseableLock.java:96)
org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.scan(DirectoryScanner.java:473)
org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:373)
org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.run(DirectoryScanner.java:318)
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
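For context on where the warning comes from: the stack trace shows the scanner releasing the dataset lock through the InstrumentedLock/AutoCloseableLock pair, which measures the hold time on unlock and logs a stack trace when it crosses a threshold. A minimal sketch of that pattern, not the actual Hadoop implementation (class name, threshold, and logging are illustrative):

{code:java}
import java.util.concurrent.locks.ReentrantLock;

/** Sketch of an auto-closeable lock that warns when held too long. */
public class TimedLock implements AutoCloseable {
  private final ReentrantLock lock = new ReentrantLock();
  private final long thresholdMs;
  private long acquiredAtMs;

  public TimedLock(long thresholdMs) {
    this.thresholdMs = thresholdMs;
  }

  /** Acquire the lock and record the acquisition time. */
  public TimedLock acquire() {
    lock.lock();
    acquiredAtMs = System.currentTimeMillis();
    return this;
  }

  /** Release the lock; log a stack trace if the hold time crossed the threshold. */
  @Override
  public void close() {
    long heldMs = System.currentTimeMillis() - acquiredAtMs;
    lock.unlock();
    if (heldMs > thresholdMs) {
      System.err.println("Lock held time above threshold: lockHeldTimeMs="
          + heldMs + " ms");
      new Throwable("stack trace").printStackTrace();
    }
  }

  public static void main(String[] args) throws InterruptedException {
    TimedLock datasetLock = new TimedLock(300);
    // Rough equivalent of DirectoryScanner.scan(): hold the lock while reconciling.
    try (TimedLock l = datasetLock.acquire()) {
      Thread.sleep(500); // simulate a long reconcile pass under the lock
    } // close() fires here and logs the warning, like the WARN above
  }
}
{code}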
Log messages like this repeat every six hours. I am not sure if this is a performance regression, or just the fact that lock hold times are now logged in Hadoop 3. vagarychen or templedf, do you know?
There is nothing in the DN logs to indicate JVM GC activity, and the DN's heap size is set to several GB.
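For what it's worth, the six-hour cadence matches the DirectoryScanner's default interval. The relevant hdfs-site.xml knobs are dfs.datanode.directoryscan.interval (seconds between scans; 21600 = 6 hours is the default) and dfs.datanode.directoryscan.throttle.limit.ms.per.sec (throttles the disk-report threads; values between 1 and 1000 throttle, and the default of 1000 disables throttling). Note these tune how often and how hard the scanner runs; they do not shorten the in-lock reconcile itself. A sketch with illustrative values:

{code:xml}
<!-- hdfs-site.xml: scanner frequency and throttle (values illustrative) -->
<property>
  <name>dfs.datanode.directoryscan.interval</name>
  <value>21600</value> <!-- seconds between scans; 21600 (6 hours) is the default -->
</property>
<property>
  <name>dfs.datanode.directoryscan.throttle.limit.ms.per.sec</name>
  <value>500</value> <!-- ms of scan work allowed per second; 1000 disables throttling -->
</property>
{code}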
Attachments
Issue Links
- duplicates HDFS-14476 lock too long when fix inconsistent blocks between disk and in-memory (Resolved)
- relates to HDFS-14476 lock too long when fix inconsistent blocks between disk and in-memory (Resolved)