Whenever we restart a cluster, there is a chance of losing some blocks if three or more datanodes fail to come up.
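Why losses begin at three datanodes can be shown with a rough estimate. It assumes the default replication factor of 3 and that each block's replicas sit on three distinct datanodes chosen uniformly at random. Real HDFS placement is rack-aware, so this is only an approximation. With $n$ datanodes, $k$ of them down, and $B$ blocks in total:

```latex
P(\text{block lost}) = \frac{\binom{k}{3}}{\binom{n}{3}} \quad (= 0 \text{ for } k < 3),
\qquad
\mathbb{E}[\text{lost blocks}] = B \cdot \frac{\binom{k}{3}}{\binom{n}{3}}

% Illustrative numbers only (not from our cluster): n = 100, k = 3, B = 10^6
\binom{100}{3} = 161700
\;\Rightarrow\;
\mathbb{E}[\text{lost blocks}] = \frac{10^6}{161700} \approx 6.2
```

Even a handful of datanodes that stay down after a restart can therefore cost some blocks on a large cluster.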
HDFS-457 increases this risk by keeping datanodes running even when:
- the /tmp disk goes read-only
- /disk0, which is used for storing the PID file, goes read-only
and possibly in other cases as well.
In our environment, /tmp and /disk0 are on the same device.
When we try to restart such a datanode, it fails with
I can recover the missing blocks, but it takes time.
We are also losing track of block movements, because the log directory can go read-only while the datanode keeps running.
For the 0.21 release, can we revert HDFS-457 or make this behavior configurable?
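The "make it configurable" option could look roughly like the sketch below. The datanode periodically checks whether its critical directories (PID directory, log directory, and so on) are still writable. A flag then decides whether an unwritable directory stops the process or is tolerated. The class name `DirHealthCheck` and the idea of a boolean flag are hypothetical. This is not the HDFS-457 code, and no real Hadoop configuration key is implied.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only: not actual HDFS code or a real Hadoop config key.
final class DirHealthCheck {

    // Hypothetical switch. A real version would read it from configuration
    // (key name left unspecified here).
    private final boolean shutdownOnUnwritableDir;

    DirHealthCheck(boolean shutdownOnUnwritableDir) {
        this.shutdownOnUnwritableDir = shutdownOnUnwritableDir;
    }

    // Returns the directories this process cannot write to. A missing directory,
    // or one on a read-only mount (access() fails with EROFS), is reported.
    static List<Path> unwritable(List<Path> dirs) {
        List<Path> bad = new ArrayList<>();
        for (Path dir : dirs) {
            if (!Files.isDirectory(dir) || !Files.isWritable(dir)) {
                bad.add(dir);
            }
        }
        return bad;
    }

    // When the flag is set, any unwritable critical directory means the
    // datanode should stop instead of running with a read-only PID or log dir.
    // When the flag is unset, the datanode keeps running, as described above.
    boolean shouldShutdown(List<Path> criticalDirs) {
        return shutdownOnUnwritableDir && !unwritable(criticalDirs).isEmpty();
    }
}
```

With the flag on, a datanode whose /tmp or /disk0 went read-only would exit. The namenode would then notice the dead node and re-replicate its blocks, instead of the problem surfacing only at the next restart.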