Current logging and metrics are insufficient to diagnose latency problems in cluster startup. Add:
1. better logs in both Datanode and Namenode for Initial Block Report processing, to help distinguish between block
report processing problems and RPC/queuing problems;
2. new logs to measure the cost of scanning all blocks for over/under/invalid replicas, which occurs in the Namenode just
before exiting safe mode;
3. new logs to measure the cost of processing the under/invalid replica queues (created by the above-mentioned scan), which
occurs just after exiting safe mode, and is said to take 100% of CPU.
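
As a rough illustration of the kind of timing logs proposed above, here is a minimal sketch. It is not the actual patch: the class StartupTiming and its methods are hypothetical names invented for this example, the log message format is an assumption, and java.util.logging stands in for whatever logging framework the Datanode and Namenode really use. The first method wraps a phase (such as the replica scan or the queue processing) and logs its wall-clock duration; the second reports queue wait time separately from processing time, which is one way to tell RPC/queuing delays apart from block report processing cost.

```java
import java.util.concurrent.TimeUnit;
import java.util.logging.Logger;

/** Hypothetical helper for illustration only; not part of Hadoop. */
public final class StartupTiming {
    private static final Logger LOG =
        Logger.getLogger(StartupTiming.class.getName());

    private StartupTiming() {}

    /**
     * Runs one startup phase, logs how long it took, and returns the
     * elapsed wall-clock time in milliseconds.
     */
    public static long timePhase(String phaseName, Runnable phase) {
        long startNanos = System.nanoTime();
        phase.run();
        long elapsedMs =
            TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNanos);
        LOG.info(phaseName + " took " + elapsedMs + " ms");
        return elapsedMs;
    }

    /**
     * Builds a log message that separates time spent waiting in a queue
     * from time spent processing. All timestamps are System.nanoTime()
     * readings taken when the report was enqueued, when processing
     * started, and when processing finished.
     */
    public static String describeBlockReport(long enqueuedNanos,
                                             long startedNanos,
                                             long finishedNanos,
                                             int numBlocks) {
        long queueMs =
            TimeUnit.NANOSECONDS.toMillis(startedNanos - enqueuedNanos);
        long processMs =
            TimeUnit.NANOSECONDS.toMillis(finishedNanos - startedNanos);
        return "initial block report: blocks=" + numBlocks
            + ", queueTimeMs=" + queueMs
            + ", processingTimeMs=" + processMs;
    }

    public static void main(String[] args) {
        // Stand-in workload; a real caller would pass the scan or
        // queue-processing step here.
        timePhase("replica scan (simulated)", () -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i;
            }
        });
        LOG.info(describeBlockReport(0L, 5_000_000L, 12_000_000L, 100));
    }
}
```

A large queueTimeMs with a small processingTimeMs would point at RPC/queuing problems, while the reverse would point at block report processing itself.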