When the DirectoryScanner has computed the differences between on-disk blocks and the in-memory block map, it calls checkAndUpdate to reconcile them. However, FsDatasetImpl.checkAndUpdate is a synchronized call.
With about 6 million blocks per datanode, each 6-hour scan finds roughly 25,000 inconsistent blocks to fix, which leads to a long hold on the FsDatasetImpl lock.
Assuming each block takes about 10 ms to fix (because of SAS disk latency), fixing all of them takes 250 seconds. That means all reads and writes on that datanode are blocked for over 4 minutes.
Commands from the NameNode also take a long time to process because the handling threads are blocked, and the NameNode sees a long lastContact time for this datanode.
This likely affects all HDFS versions.
how to fix:
Just as invalidate commands from the NameNode are processed with a batch size of 1000, these abnormal blocks should be fixed in batches too, sleeping 2 seconds between batches to allow normal block reads and writes to proceed.
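A minimal sketch of the batching idea (the names processBatched and fixBlock are hypothetical stand-ins, not actual HDFS APIs): partition the scan differences into batches, hold the dataset lock only for one batch at a time, and sleep between batches so waiting readers and writers can acquire the lock in the gaps.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchedFixSketch {
    static final int BATCH_SIZE = 1000;   // same batch size used for invalidate commands

    // Hypothetical stand-in for the per-block reconciliation work
    // done inside FsDatasetImpl.checkAndUpdate.
    static void fixBlock(long blockId) {
        // ... reconcile on-disk state with the in-memory replica map ...
    }

    // Process the scan differences in batches; the lock is held per
    // batch rather than for the entire 25,000-block fix-up.
    static void processBatched(List<Long> diffBlockIds, Object datasetLock,
                               long sleepMsBetweenBatches) throws InterruptedException {
        for (int start = 0; start < diffBlockIds.size(); start += BATCH_SIZE) {
            int end = Math.min(start + BATCH_SIZE, diffBlockIds.size());
            synchronized (datasetLock) {  // lock released between batches
                for (long id : diffBlockIds.subList(start, end)) {
                    fixBlock(id);
                }
            }
            if (end < diffBlockIds.size()) {
                Thread.sleep(sleepMsBetweenBatches);  // window for reads/writes
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        List<Long> ids = new ArrayList<>();
        for (long i = 0; i < 2500; i++) {
            ids.add(i);
        }
        // The proposal uses a 2-second sleep; a short sleep is used here
        // only to keep the demo fast.
        processBatched(ids, new Object(), 10);
        System.out.println("processed " + ids.size() + " blocks in batches of " + BATCH_SIZE);
    }
}
```

With a 2-second sleep, 25,000 blocks split into 25 batches add about 48 seconds of sleep overall, but no single lock hold lasts longer than roughly 10 seconds (1000 blocks at 10 ms each), instead of one 250-second hold.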