The standby NameNode uses tooLongSinceLastLoad() to check whether it is time to trigger an edit log roll on the active NameNode.
In doTailEdits(), lastLoadTimeMs is updated whenever the standby successfully tails any edits.
The default for dfs.ha.log-roll.period is 120 seconds, and the default for dfs.ha.tail-edits.period is 60 seconds. With in-progress edit log tailing enabled, the standby loads edits on nearly every tail, so lastLoadTimeMs is refreshed roughly every 60 seconds and the elapsed time rarely exceeds the 120-second roll period. As a result, tooLongSinceLastLoad() almost never returns true, and the edit log is not rolled for a long time, until dfs.namenode.edit.log.autoroll.multiplier.threshold takes effect.
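The interaction between the two periods can be illustrated with a small simulation. This is only a simplified sketch, not the actual EditLogTailer code: the class LogRollSimulation, the method countRollTriggers, and the editsEveryTail flag are hypothetical names introduced for illustration, and the model assumes that with in-progress tailing on a busy active every tail loads at least one edit, while without it the standby only loads edits after a roll finalizes a segment.

```java
// Simplified model of the standby's roll check; all names here are
// illustrative assumptions, not the real Hadoop classes.
final class LogRollSimulation {

    // Roll is due when the period is enabled (>= 0) and more than
    // logRollPeriodMs has elapsed since edits were last loaded.
    static boolean tooLongSinceLastLoad(long nowMs, long lastLoadTimeMs,
                                        long logRollPeriodMs) {
        return logRollPeriodMs >= 0
            && (nowMs - lastLoadTimeMs) > logRollPeriodMs;
    }

    // Runs the tail loop on a fake clock and counts triggered rolls.
    // editsEveryTail == true  : every tail loads edits (in-progress tailing).
    // editsEveryTail == false : edits are loaded only right after a roll.
    static int countRollTriggers(long tailPeriodMs, long logRollPeriodMs,
                                 long durationMs, boolean editsEveryTail) {
        long lastLoadTimeMs = 0;
        int rolls = 0;
        for (long now = tailPeriodMs; now <= durationMs; now += tailPeriodMs) {
            boolean rolled = false;
            if (tooLongSinceLastLoad(now, lastLoadTimeMs, logRollPeriodMs)) {
                rolls++;
                rolled = true;
            }
            // Stand-in for doTailEdits(): refresh the timestamp only when
            // the standby actually loaded something.
            if (editsEveryTail || rolled) {
                lastLoadTimeMs = now;
            }
        }
        return rolls;
    }

    public static void main(String[] args) {
        long tailMs = 60_000L;     // dfs.ha.tail-edits.period default
        long rollMs = 120_000L;    // dfs.ha.log-roll.period default
        long hourMs = 3_600_000L;  // simulate one hour
        System.out.println("rolls with in-progress tailing: "
            + countRollTriggers(tailMs, rollMs, hourMs, true));
        System.out.println("rolls without in-progress tailing: "
            + countRollTriggers(tailMs, rollMs, hourMs, false));
    }
}
```

Under these assumptions, the in-progress case never triggers a roll during the simulated hour, because the time since the last load never exceeds the 60-second tail period. The other case triggers rolls periodically.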
In our deployment, this resulted in in-progress edit logs being deleted. The standby was able to checkpoint twice while the in-progress edit log kept growing on the active. When the NNStorageRetentionManager decided to clean up old checkpoints and edit logs, it removed the in-progress edit log from the active and the QJM (because the txnid of the in-progress edit log was older than the 2 most recent checkpoints), irrecoverably losing a few minutes' worth of metadata.