> This in turn maps to the capability of serving ~3000 nodes with 3 second heartbeat latency.
This makes sense. I hope it never comes to the point where every heartbeat tells every node to remove 1000 blocks.
> Most of the classes in DataNode are named Block*
Traditionally, HDFS did not distinguish between blocks and their replicas. We found this very confusing while implementing append, so we tried to name the new classes Replica*. So yes, you see a lot of Block* classes, but it would be really good to steer the naming in the right direction. Wouldn't you agree that "replica" is the more precise term for a copy of a block on a specific data-node?
> I like the abstraction of creating a task using the 5 arguments, and then do "execute(Task)".
I think the abstraction should provide an API for deleting replica files regardless of whether the implementation is multi-threaded or single-threaded, so it makes sense to me to keep the implementation details hidden inside the deleter.
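To illustrate the kind of API meant here, a minimal sketch: callers hand the deleter a volume directory and the replica's files, and never see how the work is scheduled. The names `ReplicaDeleter`, `deleteReplica` and `SynchronousDeleter` are hypothetical, not existing HDFS classes, and the three `File` parameters stand in for whatever the real task carries.

```java
import java.io.File;

/** Hypothetical API: callers request a deletion without knowing how it is scheduled. */
interface ReplicaDeleter {
  /** Deletes the block file and its metadata file stored under the given volume directory. */
  void deleteReplica(File volumeDir, File blockFile, File metaFile);
}

/** Hypothetical single-threaded implementation; a pooled one could sit behind the same API. */
class SynchronousDeleter implements ReplicaDeleter {
  @Override
  public void deleteReplica(File volumeDir, File blockFile, File metaFile) {
    // File.delete() returns false on failure; a real implementation would log this properly.
    boolean blockDeleted = blockFile.delete();
    boolean metaDeleted = metaFile.delete();
    if (!blockDeleted || !metaDeleted) {
      System.err.println("Failed to delete replica files under " + volumeDir);
    }
  }
}
```

Swapping `SynchronousDeleter` for a multi-threaded implementation would not change any call site, which is the point of concealing the details in the deleter.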
I looked into the implementation details a bit. By default ThreadPoolExecutor sets allowCoreThreadTimeOut to false, which means its core threads never shut down, even when there are no deletes. I would rather pay the price of restarting threads when new deletes arrive than keep those threads running forever; data-nodes already spawn a lot of threads besides the deletion ones. Letting idle threads time out also automatically takes care of the case when a volume dies and we remove it from FSVolumeSet: it would be a waste to keep a thread around for a dead volume.
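A minimal sketch of a per-volume executor whose idle core threads exit, using the standard `ThreadPoolExecutor.allowCoreThreadTimeOut(boolean)` (available since Java 6). The class name and the pool size and keep-alive values are assumptions for illustration, not settings taken from the patch.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical factory for per-volume deletion executors. */
class VolumeDeletionExecutors {
  // Assumed values, chosen only for illustration.
  static final int THREADS_PER_VOLUME = 1;
  static final long KEEP_ALIVE_SECONDS = 60;

  /** Creates an executor whose threads exit after KEEP_ALIVE_SECONDS without work. */
  static ThreadPoolExecutor newVolumeExecutor() {
    ThreadPoolExecutor executor = new ThreadPoolExecutor(
        THREADS_PER_VOLUME, THREADS_PER_VOLUME,
        KEEP_ALIVE_SECONDS, TimeUnit.SECONDS,
        new LinkedBlockingQueue<Runnable>());
    // The default is false, so core threads would otherwise stay alive forever.
    // Requires a keep-alive time > 0, otherwise IllegalArgumentException is thrown.
    executor.allowCoreThreadTimeOut(true);
    return executor;
  }
}
```

A new delete submitted after the threads have timed out simply starts a fresh thread, which is the restart cost mentioned above.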
The key for the HashMap of threads is the reference to the volume. This works only because FSVolume does not explicitly define equals() and hashCode(), so the map falls back to object identity. We do not currently alter volume instances, but if we ever do, this could become a problem. Maybe it is better to use the volume's directory as the key in the HashMap.
You still need to remove the HashMap entry when the volume is removed from the system.
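A sketch of both suggestions together, assuming a hypothetical `DeletionExecutorRegistry` class: the map is keyed by the volume's directory (`java.io.File` defines `equals()`/`hashCode()` on its abstract pathname), and the entry is removed and its executor shut down when the volume leaves FSVolumeSet. Normalizing keys with `getAbsoluteFile()` is an added assumption, so that relative and absolute forms of the same path map to one entry.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical registry of per-volume deletion executors, keyed by volume directory. */
class DeletionExecutorRegistry {
  // File compares by path, so the key does not rely on FSVolume's identity-based defaults.
  private final Map<File, ExecutorService> executors = new HashMap<File, ExecutorService>();

  private static File key(File volumeDir) {
    return volumeDir.getAbsoluteFile(); // assumption: normalize relative vs absolute paths
  }

  /** Returns the executor for a volume directory, creating it on first use. */
  synchronized ExecutorService executorFor(File volumeDir) {
    File k = key(volumeDir);
    ExecutorService executor = executors.get(k);
    if (executor == null) {
      ThreadPoolExecutor pool = new ThreadPoolExecutor(
          1, 1, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>());
      pool.allowCoreThreadTimeOut(true); // idle threads exit, as discussed above
      executor = pool;
      executors.put(k, executor);
    }
    return executor;
  }

  /** To be called when a volume is removed from the volume set, so its entry does not leak. */
  synchronized void removeVolume(File volumeDir) {
    ExecutorService executor = executors.remove(key(volumeDir));
    if (executor != null) {
      executor.shutdown(); // queued deletes still run; new submissions are rejected
    }
  }

  synchronized int size() {
    return executors.size();
  }
}
```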