When a node transitions into, stays in, or transitions out of maintenance state, we need to make sure the blocks on that node are handled properly.
- When a node is put into maintenance, it first goes to ENTERING_MAINTENANCE. It is transitioned to IN_MAINTENANCE only after its blocks are minimally replicated.
- Do not replicate blocks when nodes are in maintenance states. A maintenance replica remains in BlockMaps and is thus still considered valid from the block replication point of view. In other words, putting a node into “maintenance” mode won’t trigger BlockManager to replicate its blocks.
- Do not invalidate replicas on nodes under maintenance. After a file's replication factor is reduced, NN needs to invalidate some replicas. It should exclude nodes under maintenance when choosing which replicas to invalidate.
- Do not include IN_MAINTENANCE replicas in LocatedBlock for read operations.
- Do not allocate any new block on nodes under maintenance.
- Have Balancer exclude nodes under maintenance.
- Exclude nodes under maintenance for DN cache.
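
The rules above can be sketched as a small, self-contained Java model. This is only an illustration under stated assumptions, not the NameNode code: the class `MaintenanceRules`, the `Replica` type, and all method names are hypothetical. In particular, "minimally replicated" is assumed here to mean that each block has at least `minReplicas` copies on nodes outside maintenance. Only the state names ENTERING_MAINTENANCE and IN_MAINTENANCE come from the text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the maintenance rules; not the actual HDFS classes.
class MaintenanceRules {
    enum AdminState { NORMAL, ENTERING_MAINTENANCE, IN_MAINTENANCE }

    /** One replica of a block, tagged with the admin state of the node holding it. */
    static final class Replica {
        final String node;
        final AdminState state;
        Replica(String node, AdminState state) {
            this.node = node;
            this.state = state;
        }
    }

    /** "Under maintenance" covers both maintenance states. */
    static boolean underMaintenance(AdminState s) {
        return s == AdminState.ENTERING_MAINTENANCE || s == AdminState.IN_MAINTENANCE;
    }

    /** Replicas held by nodes that are not under maintenance. */
    static List<Replica> outsideMaintenance(List<Replica> replicas) {
        List<Replica> out = new ArrayList<>();
        for (Replica r : replicas) {
            if (!underMaintenance(r.state)) {
                out.add(r);
            }
        }
        return out;
    }

    /**
     * ENTERING_MAINTENANCE -> IN_MAINTENANCE is allowed only once every block on
     * the node is minimally replicated. Assumption: that means at least
     * minReplicas copies on nodes outside maintenance.
     */
    static boolean canMoveToInMaintenance(List<List<Replica>> blocksOnNode, int minReplicas) {
        for (List<Replica> block : blocksOnNode) {
            if (outsideMaintenance(block).size() < minReplicas) {
                return false;
            }
        }
        return true;
    }

    /** Maintenance replicas still count as valid, so they do not trigger replication. */
    static boolean needsReplication(List<Replica> replicas, int replicationFactor) {
        return replicas.size() < replicationFactor;
    }

    /** After a replication factor reduction, only these replicas may be invalidated. */
    static List<Replica> invalidationCandidates(List<Replica> replicas) {
        return outsideMaintenance(replicas);
    }

    /** Locations handed to readers: IN_MAINTENANCE replicas are left out. */
    static List<Replica> readLocations(List<Replica> replicas) {
        List<Replica> out = new ArrayList<>();
        for (Replica r : replicas) {
            if (r.state != AdminState.IN_MAINTENANCE) {
                out.add(r);
            }
        }
        return out;
    }

    /** New block allocation, Balancer moves, and DN caching skip nodes under maintenance. */
    static boolean isTargetForNewData(AdminState s) {
        return !underMaintenance(s);
    }
}
```

For a block with replicas on dn1 (NORMAL), dn2 (ENTERING_MAINTENANCE) and dn3 (IN_MAINTENANCE) and a replication factor of 3, this model reports no replication work, offers dn1 and dn2 for reads, and allows only the dn1 replica to be invalidated.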