With the following approach we can reduce the memory footprint of each block replica in DataNode memory by about 45% (this JIRA covers only the ReplicaMap in the DataNode). The details are:
Currently we use a HashMap, Map<Long, ReplicaInfo>, to store the replicas in memory. The key is the block id of the block replica, which is already included in ReplicaInfo, so the memory for the key can be saved. Each HashMap Entry also carries object overhead. We can implement a lightweight Set that is similar to LightWeightGSet but is not fixed in size. (LightWeightGSet uses a fixed size for its entries array, usually a large value. BlocksMap is an example; since the array never needs to be resized, this avoids full GC.) We should also be able to get an element by its key.
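The idea above can be sketched roughly as follows. This is only a minimal illustration, not the proposed patch: the names LinkedElement, ResizableLightSet and DemoReplica are hypothetical, the 3/4 load factor and doubling policy are assumptions, and the sketch only grows (it never shrinks) and is not thread-safe. Each element exposes its own key and holds the chain link itself, so no separate Long key or Entry object is allocated.

```java
/** Hypothetical contract: an element exposes its own key and an intrusive chain link. */
interface LinkedElement {
    long getKey();                      // e.g. the block id already stored in the replica
    LinkedElement getNext();
    void setNext(LinkedElement next);
}

/** Hypothetical resizable set whose hash chains run through the elements themselves. */
final class ResizableLightSet<E extends LinkedElement> {
    private static final int MIN_CAPACITY = 16;          // must be a power of two
    private LinkedElement[] buckets = new LinkedElement[MIN_CAPACITY];
    private int size;

    private static int indexFor(long key, int length) {
        return Long.hashCode(key) & (length - 1);         // length is a power of two
    }

    /** Looks an element up by key; returns null if absent. */
    @SuppressWarnings("unchecked")
    E get(long key) {
        for (LinkedElement e = buckets[indexFor(key, buckets.length)]; e != null; e = e.getNext()) {
            if (e.getKey() == key) {
                return (E) e;
            }
        }
        return null;
    }

    /** Adds the element, replacing any element with the same key; returns the replaced one. */
    E put(E element) {
        E previous = remove(element.getKey());
        int i = indexFor(element.getKey(), buckets.length);
        element.setNext(buckets[i]);                      // insert at the head of the chain
        buckets[i] = element;
        if (++size > buckets.length * 3 / 4) {            // assumed load factor of 0.75
            resize(buckets.length * 2);
        }
        return previous;
    }

    /** Unlinks and returns the element with the given key, or null if absent. */
    @SuppressWarnings("unchecked")
    E remove(long key) {
        int i = indexFor(key, buckets.length);
        LinkedElement prev = null;
        for (LinkedElement e = buckets[i]; e != null; prev = e, e = e.getNext()) {
            if (e.getKey() == key) {
                if (prev == null) {
                    buckets[i] = e.getNext();
                } else {
                    prev.setNext(e.getNext());
                }
                e.setNext(null);
                size--;
                return (E) e;
            }
        }
        return null;
    }

    /** Rehashes every element into a new bucket array of the given length. */
    private void resize(int newLength) {
        LinkedElement[] old = buckets;
        buckets = new LinkedElement[newLength];
        for (LinkedElement head : old) {
            while (head != null) {
                LinkedElement next = head.getNext();
                int i = indexFor(head.getKey(), newLength);
                head.setNext(buckets[i]);
                buckets[i] = head;
                head = next;
            }
        }
    }

    int size() {
        return size;
    }
}

/** Hypothetical stand-in for a replica: it carries its block id and the chain link. */
final class DemoReplica implements LinkedElement {
    private final long blockId;
    private LinkedElement next;

    DemoReplica(long blockId) { this.blockId = blockId; }

    @Override public long getKey() { return blockId; }
    @Override public LinkedElement getNext() { return next; }
    @Override public void setNext(LinkedElement next) { this.next = next; }
}
```

In this sketch the only per-element cost inside the set is the link reference stored in the element, and the only shared cost is the bucket array, which is reallocated when the set grows.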
The following compares the memory footprint if we implement a lightweight set as described:
We can save:
Total: -44 bytes
We need to add:
Total: +4 bytes
So in total we can save 44 - 4 = 40 bytes for each block replica.
Currently one finalized replica needs around 46 bytes (note: we ignore memory alignment here).
We can save 1 - (4 + 46) / (44 + 46) = 1 - 50/90 ≈ 44.4%, i.e. about 45% of the memory for each block replica in the DataNode.