Details
-
Sub-task
-
Status: Patch Available
-
Major
-
Resolution: Unresolved
-
None
-
None
-
None
-
None
Description
Erasure Coding: Improve memory efficiency of BlockInfoStriped
Assume we have a BlockInfoStriped:
triplets[] = {s0, s1, s2, s3} indices[] = {0, 1, 2, 3}
s0 means storage_0
When we run balancer/mover to re-locate replica on s2, firstly it becomes:
triplets[] = {s0, s1, s2, s3, s4} indices[] = {0, 1, 2, 3, 2}
Then the replica on s2 is removed, finally it becomes:
triplets[] = {s0, s1, null, s3, s4} indices[] = {0, 1, -1, 3, 2}
The worst case is:
triplets[] = {null, null, null, null, s4, s5, s6, s7} indices[] = {-1, -1, -1, -1, 0, 1, 2, 3}
We should learn from BlockInfoContiguous.removeStorage(..). When a storage is removed, we move the last item front.
With the improvement, the worst case become:
triplets[] = {s4, s5, s6, s7, null} indices[] = {0, 1, 2, 3, -1}
We have an empty slot.
Notes:
Assume we copy 4 storage first, then delete 4. Even with the improvement, the worst case could be:
triplets[] = {s4, s5, s6, s7, null, null, null, null} indices[] = {0, 1, 2, 3, -1, -1, -1, -1}
But the Balancer uses delHint. So when add one will always delete one. So this case won't happen for striped and contiguous blocks.
idx_i must be moved to slot_i. So slot_i will have idx_i. So we can do further improvement in HDFS-8032.