Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Duplicate
Description
The replication check depends on BlockManager#countNodes(), but countNodes() has a limitation for striped block groups.
- One missing internal block is caught by the replication check and handled by ReplicationMonitor.
- One over-replicated internal block is caught by the replication check and handled by processOverReplicatedBlocks.
- One missing internal block and two over-replicated internal blocks at the same time are caught by the replication check, and handled first by processOverReplicatedBlocks and later by ReplicationMonitor.
- One missing internal block and one over-replicated internal block at the same time are NOT caught by the replication check.
"At the same time" means that one missing internal block cannot be recovered while another internal block ends up over-replicated anyway. For example:
Scenario A:
1. Blocks #0 and #1 are reported missing.
2. A new #1 is recovered.
3. The old #1 comes back, and the recovery work for #0 fails.
Scenario B:
1. A DN that holds #1 is decommissioned or dies.
2. Block #0 is reported missing.
3. The DN that holds #1 is recommissioned, and the recovery work for #0 fails.
In the end, assuming a 6+3 schema, the blockGroup holds internal blocks [1, 1, 2, 3, 4, 5, 6, 7, 8]. Until the blockGroup is handled, a client always has to decode to obtain #0.
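The gap can be illustrated with a small, self-contained Java sketch. It is a simplified model and not the actual BlockManager code: the class StripedGroupCheck and all of its methods are hypothetical names, and the sketch assumes the check compares only the number of live internal blocks against the group size (6+3 = 9). Under that assumption, a group with #0 missing and #1 duplicated still has nine live internal blocks and looks healthy, whereas tracking the distinct block indices exposes both problems.

```java
import java.util.BitSet;

// Hypothetical illustration only; this is NOT the real BlockManager logic.
final class StripedGroupCheck {
    static final int DATA_BLOCKS = 6;
    static final int PARITY_BLOCKS = 3;
    static final int GROUP_SIZE = DATA_BLOCKS + PARITY_BLOCKS;

    // Count-only view (assumed): every reported internal block counts as live.
    static int liveCount(int[] reportedIndices) {
        return reportedIndices.length;
    }

    // Index-aware view: each internal block index is counted at most once.
    static int distinctCount(int[] reportedIndices) {
        BitSet seen = new BitSet(GROUP_SIZE);
        for (int idx : reportedIndices) {
            seen.set(idx);
        }
        return seen.cardinality();
    }

    // Classification that looks only at the total number of live blocks.
    static String classifyByCount(int[] reportedIndices) {
        int live = liveCount(reportedIndices);
        if (live < GROUP_SIZE) {
            return "UNDER";
        }
        if (live > GROUP_SIZE) {
            return "OVER";
        }
        return "OK";
    }

    // True when at least one index in [0, GROUP_SIZE) has no live block.
    static boolean hasMissingIndex(int[] reportedIndices) {
        return distinctCount(reportedIndices) < GROUP_SIZE;
    }

    // True when at least one index is reported more than once.
    static boolean hasDuplicateIndex(int[] reportedIndices) {
        return liveCount(reportedIndices) > distinctCount(reportedIndices);
    }

    public static void main(String[] args) {
        // #0 missing and #1 duplicated, as in scenarios A and B.
        int[] group = {1, 1, 2, 3, 4, 5, 6, 7, 8};
        System.out.println("count-only check: " + classifyByCount(group));
        System.out.println("missing index:    " + hasMissingIndex(group));
        System.out.println("duplicate index:  " + hasDuplicateIndex(group));
    }
}
```

In this model the other three cases behave as described: eight live blocks classify as UNDER, ten as OVER, and a group with one missing and two duplicated blocks classifies as OVER first and as UNDER once the excess blocks are removed.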
Attachments
Issue Links
- duplicates HDFS-14699 Erasure Coding: Storage not considered in live replica when replication streams hard limit reached to threshold (Resolved)