Details
- Bug
- Status: Resolved
- Major
- Resolution: Fixed
- 3.0.0-alpha3
Description
By default, ErasureCodingWorker#stripedReconstructionPool is created with corePoolSize=2 and maxPoolSize=8, and it rejects further tasks once its queue is full.
This becomes a problem when BlockManager#maxReplicationStream is larger than ErasureCodingWorker#stripedReconstructionPool#corePoolSize/maxPoolSize, for example maxReplicationStream=20 with corePoolSize=2 and maxPoolSize=8. On each heartbeat, the NN sends up to maxTransfer reconstruction tasks to the DN, where maxTransfer is calculated in FSNamesystem:
final int maxTransfer = blockManager.getMaxReplicationStreams() - xmitsInProgress;
However, at any given time, ErasureCodingWorker#stripedReconstructionPool counts as 2 xmitsInProgress. So on each heartbeat (every 3s), the NN sends about 20 - 2 = 18 reconstruction tasks to the DN, and the DN throws most of them away if there are already 8 tasks in the queue. The NN then takes longer to notice that these blocks are still under-replicated and to schedule new tasks for them.
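The arithmetic above can be reproduced with a small standalone model. This is only a sketch and not the Hadoop code: the class and method names are hypothetical, and the choice of a SynchronousQueue with AbortPolicy is an assumption made so that the pool rejects work as soon as all 8 threads are busy. The pool also starts empty here, which simplifies the real DataNode state.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical model of the behaviour described in this issue; not Hadoop code.
public class ReconstructionPoolSketch {

  // Mirrors the FSNamesystem expression quoted in the description.
  static int maxTransfer(int maxReplicationStreams, int xmitsInProgress) {
    return maxReplicationStreams - xmitsInProgress;
  }

  // Submits one heartbeat's worth of long-running tasks and
  // returns {accepted, rejected}.
  static int[] submitBurst(int maxReplicationStreams, int xmitsInProgress,
      int corePoolSize, int maxPoolSize) throws InterruptedException {
    int tasks = maxTransfer(maxReplicationStreams, xmitsInProgress);
    CountDownLatch release = new CountDownLatch(1);
    // Assumption: no queueing and AbortPolicy, so a task is rejected
    // whenever all maxPoolSize threads are already occupied.
    ThreadPoolExecutor pool = new ThreadPoolExecutor(
        corePoolSize, maxPoolSize, 60, TimeUnit.SECONDS,
        new SynchronousQueue<>(), new ThreadPoolExecutor.AbortPolicy());
    int accepted = 0;
    int rejected = 0;
    try {
      for (int i = 0; i < tasks; i++) {
        try {
          // Each task blocks until released, standing in for a slow
          // reconstruction that outlives the burst of submissions.
          pool.execute(() -> {
            try {
              release.await();
            } catch (InterruptedException e) {
              Thread.currentThread().interrupt();
            }
          });
          accepted++;
        } catch (RejectedExecutionException e) {
          rejected++;
        }
      }
    } finally {
      release.countDown();
      pool.shutdown();
      pool.awaitTermination(10, TimeUnit.SECONDS);
    }
    return new int[] {accepted, rejected};
  }

  public static void main(String[] args) throws InterruptedException {
    int[] result = submitBurst(20, 2, 2, 8);
    System.out.println("accepted=" + result[0] + " rejected=" + result[1]);
  }
}
```

With the values from the description (maxReplicationStream=20, xmitsInProgress=2, corePoolSize=2, maxPoolSize=8), this model accepts 8 of the 18 tasks and rejects the other 10, which is the mismatch between what the NN schedules and what the DN can take.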
Attachments
Issue Links
- is related to
  - HDFS-12215 DataNode#transferBlock does not create its daemon in the xceiver thread group (Resolved)
  - HDFS-12208 NN should consider DataNode#xmitInProgress when placing new block (Open)