The current changes for clearing IBRs on re-Register() look good.
For the second part, i.e. avoiding accumulation of IBRs when the standby is down for a long time, can we consider the following? (I already mentioned this in my comment above.)
1. IBRs for the Standby NN can have a threshold (say 100K or 1 million IBRs).
2. Also, so as not to lose any important IBRs, IBRs can be cleared only when the threshold is reached AND 'lastIBR' is older than 'heartbeatExpiryInterval', i.e. the DataNode is considered dead on the NameNode side. In that case, re-Register() will certainly be called on reconnection to a running NameNode (if any).
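To make the proposed condition concrete, here is a rough sketch of the check. The class, method, and parameter names are hypothetical and are not taken from the existing DataNode code; it only illustrates "threshold reached AND last IBR older than the expiry interval".

```java
// Hypothetical helper, not part of the current DataNode code.
final class PendingIbrPolicy {
  private PendingIbrPolicy() {}

  /**
   * Returns true only when BOTH conditions hold:
   *  - the number of pending IBRs has reached the threshold, and
   *  - the last successful IBR is older than the heartbeat expiry interval,
   *    i.e. the NameNode would already consider this DataNode dead.
   */
  static boolean shouldClearPendingIbrs(long pendingIbrCount,
                                        long ibrThreshold,
                                        long millisSinceLastIbr,
                                        long heartbeatExpiryMillis) {
    boolean thresholdReached = pendingIbrCount >= ibrThreshold;
    boolean consideredDead = millisSinceLastIbr > heartbeatExpiryMillis;
    return thresholdReached && consideredDead;
  }
}
```

With this, a DataNode that has crossed the threshold but reported recently keeps its IBRs, so nothing is dropped while the NameNode still treats the DataNode as alive.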
The only question is that heartbeatExpiryInterval in the NameNode depends on the conf "dfs.namenode.heartbeat.recheck-interval", which is a NameNode-side configuration. By default this is 5 min. If this is changed on the NameNode side, the same change must also be present in the DataNode config. Is it okay to use this, or should we introduce a conf common to NN and DN?
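For reference, this is how the DataNode could derive the same value if it reads the same settings. As far as I recall, the NameNode computes the expiry as 2 * recheck-interval + 10 * heartbeat interval; please treat that formula as my assumption to be verified against DatanodeManager. The helper below is hypothetical and takes plain values instead of a Configuration object.

```java
// Hypothetical helper; the formula is assumed to mirror the NameNode side.
final class HeartbeatExpiry {
  private HeartbeatExpiry() {}

  /**
   * @param recheckIntervalMillis value of dfs.namenode.heartbeat.recheck-interval (ms)
   * @param heartbeatIntervalSeconds value of dfs.heartbeat.interval (seconds)
   * @return assumed heartbeat expiry interval in milliseconds
   */
  static long expiryMillis(long recheckIntervalMillis, long heartbeatIntervalSeconds) {
    return 2 * recheckIntervalMillis + 10 * 1000 * heartbeatIntervalSeconds;
  }
}
```

With the defaults (5 min recheck interval, 3 s heartbeat) this gives 630000 ms, i.e. 10.5 minutes. If the DataNode config carries a different recheck interval than the NameNode, the two sides would disagree on when the DataNode is dead, which is the concern above.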
Tsz Wo Nicholas Sze, what is your opinion on this?