The current format functionality is itself broken: it deletes the metadata while doing nothing about the data stored on the DataNodes.
Just like mkfs. And just like it, the fact that it doesn't delete the actual data is a feature, not a bug. If I restore the fsimage, my data should come back too (mostly... new data, of course, is likely to be missing, etc). It's why making a copy of the fsimage is Hadoop Ops 101.
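To make the "Ops 101" habit concrete, here's a minimal sketch of backing up a NameNode metadata directory before any risky operation. The paths and fsimage filename are illustrative stand-ins; on a real cluster the directory comes from `dfs.namenode.name.dir` in hdfs-site.xml, and you'd want the NameNode stopped or checkpointed first.

```shell
# Illustrative paths only; on a real cluster use the directory from
# dfs.namenode.name.dir, and stop/checkpoint the NameNode first.
NAME_DIR=/tmp/demo-name-dir
BACKUP_DIR=/tmp/demo-name-dir.bak

# Simulate a NameNode metadata layout for the demo
mkdir -p "$NAME_DIR/current"
echo "fsimage bytes" > "$NAME_DIR/current/fsimage_0000000000000000001"

# The actual habit: copy the whole metadata directory, not just one file,
# so the edits/ VERSION files travel with the fsimage
rm -rf "$BACKUP_DIR"
cp -a "$NAME_DIR" "$BACKUP_DIR"
ls "$BACKUP_DIR/current"
```

With a copy like this on hand, a mistaken format is an outage rather than a disaster: restore the directory and restart the NameNode.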
Some key advice I give to admins: you can try to prevent mistakes, but they'll still happen despite your best efforts. Once the low-hanging warnings are in place, the energy is better spent on how to recover quickly. But that's a problem that sits outside the core code.
For the record, yes, I've made HUGE mistakes like this in my career. Every admin has. In my case, I brought down an entire hospital once. Even with that experience, I still think requiring metadata deletion outside of the tool set is waaaaay overkill.
Maybe being able to tag a cluster as "production", as discussed above, is a better idea?
Yeah, sure, whatever. All that's going to happen is:
hdfs --config /tmp/mymodifiedconfig namenode -format -force
If a user is too lazy/impatient/distracted to check that they're on a live system before hitting "y", they'll just change the flag and then format. But if that makes folks happy, fine. It still sounds like the console output needs some work, though, if a user couldn't "see" it. (Not sure I agree with that either, but whatever.)
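To illustrate why the tag is a speed bump rather than a wall, here's a sketch of a hypothetical "production" check and the obvious way around it. The `cluster.environment` property name is made up for this example; it is not a real Hadoop key, and the guard logic is a stand-in for whatever the format path would actually do.

```shell
# All names here are illustrative; "cluster.environment" is NOT a real
# Hadoop property, just a stand-in for the proposed "production" tag.
CONF_DIR=/tmp/demo-conf
rm -rf "$CONF_DIR" /tmp/mymodifiedconfig
mkdir -p "$CONF_DIR"
cat > "$CONF_DIR/hdfs-site.xml" <<'EOF'
<configuration>
  <property>
    <name>cluster.environment</name>
    <value>production</value>
  </property>
</configuration>
EOF

# The guard a format path might apply:
if grep -q "<value>production</value>" "$CONF_DIR/hdfs-site.xml"; then
  echo "refusing to format: cluster tagged production"
fi

# ...which a determined user defeats with a copied, edited config:
cp -r "$CONF_DIR" /tmp/mymodifiedconfig
sed -i 's/production/test/' /tmp/mymodifiedconfig/hdfs-site.xml
# hdfs --config /tmp/mymodifiedconfig namenode -format -force   # guard now passes
```

The guard only works against accidents, not intent, which is exactly the point being argued above.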
BTW, a quick search for how the equivalent problem is solved in databases is interesting. Almost all of the ones I looked at simply don't give the user access. So yes, enough rope to hang themselves seems to be the operational expectation.