I did a little investigation to try to answer Konstantin's questions above.
First, I'll summarize our current behavior, verified on the 0.23.1 release (I didn't understand this thoroughly before trying it out):
- In a running cluster, if you restart the NN without the -upgrade flag, then the DataNodes will happily re-register without exiting.
- If you restart the NN with -upgrade, then when the DN next heartbeats, it will fail the verifyRequest() check, since the registration ID's namespace fields no longer match (the ctime has been incremented by the upgrade). This causes the DataNode to exit.
- Of course, restarting the DN at this point makes it take the snapshot and participate in the upgrade as expected.
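To make the heartbeat check concrete, here is a minimal sketch of the behavior described above. It is a simplified Python model, not the actual HDFS code: the `StorageInfo` class, the `restart_namenode` helper, and all field values are assumptions made up for illustration; only the idea that -upgrade changes the ctime and that verifyRequest() compares namespace fields comes from the observations above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class StorageInfo:
    """Simplified stand-in for the namespace fields in a DN registration."""
    layout_version: int
    namespace_id: int
    ctime: int


def verify_request(nn: StorageInfo, dn_registration: StorageInfo) -> bool:
    """Model of the check: the DN's registration must match the NN's fields."""
    return nn == dn_registration


def restart_namenode(nn: StorageInfo, upgrade: bool, new_ctime: int) -> StorageInfo:
    """Hypothetical helper: only a restart with -upgrade changes the ctime."""
    return replace(nn, ctime=new_ctime) if upgrade else nn


# Arbitrary illustrative values.
nn = StorageInfo(layout_version=-40, namespace_id=12345, ctime=0)
dn = nn  # the DN registered against this namespace before the restart

plain = restart_namenode(nn, upgrade=False, new_ctime=1)
upgraded = restart_namenode(nn, upgrade=True, new_ctime=1)

print(verify_request(plain, dn))     # DN re-registers and keeps running
print(verify_request(upgraded, dn))  # mismatch: the DN would exit
```

In this model the first check passes and the second fails, which corresponds to the DN staying up after a plain restart and exiting after a restart with -upgrade.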
So, to try to respond to Konstantin's questions, here are a couple of example scenarios:
Scenario 1: rolling upgrade without doing a "snapshot" upgrade (for emergency bug fixes, hot fixes, MR fixes, other fixes which we don't expect to affect data reliability):
- Leave the NN running, on the old version.
- On each DN, in succession: (1) shut down the DN, (2) upgrade the software to the new version, (3) start the DN
The above is sufficient if the changes are scoped only to DNs. If the change also affects the NN, then you will need to add the following step, either at the beginning or end of the process:
- Shut down the NN. Upgrade the installed software. Start the NN on the new version.
In the case of an HA setup, we can do the NN upgrade without downtime:
- Shut down the SBN. Upgrade the SBN software. Start the SBN.
- Fail over to the SBN, which is now running the new version.
- Shut down the previous active. Upgrade its software. Start the previous active.
- Optionally fail back
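The ordering in scenario 1 can be written down as a small plan generator. This is only a sketch: `rolling_upgrade_plan` and the step strings are hypothetical names for illustration, not an existing tool, and it places the NN step at the end even though the beginning would work equally well.

```python
def rolling_upgrade_plan(datanodes, namenode_changed=False, ha=False):
    """Return the ordered operator steps for a rolling (non-snapshot) upgrade."""
    steps = []
    # DNs are handled one at a time while the NN keeps running.
    for dn in datanodes:
        steps += [
            f"{dn}: shut down DN",
            f"{dn}: upgrade software",
            f"{dn}: start DN",
        ]
    if not namenode_changed:
        return steps
    if ha:
        # With HA, the standby is upgraded first so there is no NN downtime.
        steps += [
            "SBN: shut down",
            "SBN: upgrade software",
            "SBN: start",
            "fail over to SBN",
            "previous active: shut down",
            "previous active: upgrade software",
            "previous active: start",
            "optionally fail back",
        ]
    else:
        # Without HA, this step implies a short NN outage.
        steps += ["NN: shut down", "NN: upgrade software", "NN: start"]
    return steps


for step in rolling_upgrade_plan(["dn1", "dn2"], namenode_changed=True, ha=True):
    print(step)
```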
Scenario 2: upgrade to a version with a new layout version (LV)
In this case, a "snapshot" style upgrade is required – the NN will not restart without the "-upgrade" flag, and a DN will not connect to a NN with a different LV. So the scenario is the same as today:
- Shutdown entire cluster
- Upgrade all software in the cluster
- Start cluster with -upgrade flag
- Any nodes that missed the software upgrade will fail to connect, since their LV does not match (this patch retains that behavior)
Scenario 3: upgrade to a version with the same layout version, but some data risk (for example, upgrading to a version with bug fixes pertaining to replication policies, corrupt block detection, etc.)
In this scenario, the NN does not mandate the -upgrade flag, but as Sanjay mentioned above, it can still be useful for data protection. As with today, if the user does not want the extra protection, this scenario can be treated identically to scenario 1. If the user does want the protection, it can be treated identically to scenario 2. Scenario 2 remains safe because of the check that the NameNode's ctime matches the DN's ctime. As soon as you restart the NN with the -upgrade flag, all running DNs will exit. Any newly started DN will notice the new namespace ctime and take part in the snapshot upgrade.
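Put together, the three scenarios reduce to a simple decision, sketched below. The function name `choose_upgrade_path`, its parameters, and the LV values are illustrative assumptions, not part of the patch.

```python
def choose_upgrade_path(old_lv: int, new_lv: int, want_snapshot: bool) -> str:
    """Pick 'snapshot' (cluster restart with -upgrade) or 'rolling'."""
    if old_lv != new_lv:
        # Scenario 2: a layout version change always requires -upgrade.
        return "snapshot"
    if want_snapshot:
        # Scenario 3 with extra data protection: treat it like scenario 2.
        return "snapshot"
    # Scenario 1, or scenario 3 without the extra protection.
    return "rolling"


# Arbitrary LV values, used only to exercise each branch.
print(choose_upgrade_path(-40, -41, want_snapshot=False))
print(choose_upgrade_path(-40, -40, want_snapshot=True))
print(choose_upgrade_path(-40, -40, want_snapshot=False))
```

The only case where the admin has a real choice is the last two calls, where the layout version is unchanged.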
Does the above description address your concerns? Another idea would be to add a new configuration option like dfs.allow.rolling.upgrades which enables the new behavior, so an admin who prefers not to use the feature can disallow it completely.