[HDFS-12044] Mismatch between BlockManager#maxReplicationStreams and ErasureCodingWorker.stripedReconstructionPool pool size causes slow and bursty recovery - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.0.0-alpha3
Fix Version/s: 3.0.0-beta1
Component/s: erasure-coding
Labels: hdfs-ec-3.0-must-do
Target Version/s: 3.0.0-beta1

Description

ErasureCodingWorker#stripedReconstructionPool is created with corePoolSize=2 and maxPoolSize=8 by default, and it rejects additional tasks once its queue is full.

The problem appears when BlockManager#maxReplicationStreams is larger than ErasureCodingWorker#stripedReconstructionPool#corePoolSize/maxPoolSize, for example maxReplicationStreams=20 with corePoolSize=2 and maxPoolSize=8. On each heartbeat, the NN sends up to maxTransfer reconstruction tasks to the DN, and maxTransfer is calculated in FSNamesystem:

final int maxTransfer = blockManager.getMaxReplicationStreams() - xmitsInProgress;

However, at any given time, ErasureCodingWorker#stripedReconstructionPool accounts for only 2 xmitsInProgress. So on each heartbeat (every 3s), the NN sends about 20-2 = 18 reconstruction tasks to the DN, and the DN throws most of them away if there are already 8 tasks in the queue. The NN then takes longer to notice that these blocks are still under-replicated and to schedule new tasks for them.
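
The mismatch can be reproduced outside HDFS with a plain java.util.concurrent.ThreadPoolExecutor. The following is only a minimal sketch, not the ErasureCodingWorker code: the class name, the method simulateHeartbeat, the SynchronousQueue hand-off and the AbortPolicy rejection handler are all assumptions made for illustration. It starts from an idle pool (the best case) and makes every task block, standing in for a long-running reconstruction.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-alone simulation; none of these names exist in HDFS.
public class ReconstructionBurstSketch {

  /** Returns {sent, accepted, rejected} for one simulated heartbeat. */
  static int[] simulateHeartbeat(int maxReplicationStreams, int xmitsInProgress,
      int corePoolSize, int maxPoolSize) {
    // Same arithmetic as the FSNamesystem line quoted above.
    final int maxTransfer = maxReplicationStreams - xmitsInProgress;

    // Assumption: direct hand-off queue and a handler that rejects by throwing.
    ThreadPoolExecutor pool = new ThreadPoolExecutor(
        corePoolSize, maxPoolSize, 60, TimeUnit.SECONDS,
        new SynchronousQueue<Runnable>(),
        new ThreadPoolExecutor.AbortPolicy());

    // Tasks block on this latch so no worker frees up during the burst.
    final CountDownLatch release = new CountDownLatch(1);
    int accepted = 0;
    int rejected = 0;
    for (int i = 0; i < maxTransfer; i++) {
      try {
        pool.execute(() -> {
          try {
            release.await();
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
        });
        accepted++;
      } catch (RejectedExecutionException e) {
        // The pool is saturated: this task is dropped on the DN side.
        rejected++;
      }
    }
    release.countDown(); // let the accepted tasks finish
    pool.shutdown();
    return new int[] {maxTransfer, accepted, rejected};
  }

  public static void main(String[] args) {
    int[] r = simulateHeartbeat(20, 2, 2, 8);
    System.out.println("sent=" + r[0] + " accepted=" + r[1] + " rejected=" + r[2]);
  }
}
```

With the numbers from the description (maxReplicationStreams=20, xmitsInProgress=2, corePoolSize=2, maxPoolSize=8) this prints sent=18 accepted=8 rejected=10: more than half of the tasks sent in a single heartbeat are dropped even when the pool starts out idle.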

Attachments

- HDFS-12044.00.patch (1 kB), 27/Jun/17 21:58, Lei (Eddy) Xu
- HDFS-12044.01.patch (5 kB), 28/Jun/17 18:28, Lei (Eddy) Xu
- HDFS-12044.02.patch (8 kB), 28/Jun/17 22:03, Lei (Eddy) Xu
- HDFS-12044.03.patch (14 kB), 30/Jun/17 23:25, Lei (Eddy) Xu
- HDFS-12044.04.patch (17 kB), 25/Jul/17 20:47, Lei (Eddy) Xu
- HDFS-12044.05.patch (17 kB), 27/Jul/17 22:07, Lei (Eddy) Xu

Issue Links

is related to:
- HDFS-12215 DataNode#transferBlock does not create its daemon in the xceiver thread group (Resolved)
- HDFS-12208 NN should consider DataNode#xmitInProgress when placing new block (Open)

People

Assignee: Lei (Eddy) Xu
Reporter: Lei (Eddy) Xu
Votes: 0
Watchers: 6

Dates

Created: 27/Jun/17 00:37
Updated: 28/Jul/17 18:15
Resolved: 28/Jul/17 17:53