Hairong, what I am seeing on a real cluster (0.20.2-based) is that an NN storage volume which was removed at some point (e.g. because of a faulty NFS mount) is emptied as soon as the SNN starts the checkpoint process. This happens because FSEditLog.rollEditLog (a synchronized method) calls FSImage.attemptRestoreRemovedStorage, which effectively formats a faulty volume if it has become available again.
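To make the sequence concrete, here is a toy model of the behaviour described above. It is only a sketch, not the Hadoop source: the class StorageRestoreSketch, the helpers addDir, failDir and dirsSeenByMerge, and the "reachable" parameter are all hypothetical, and the real rollEditLog does more than trigger the restore.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only. Method names echo the ones in the comment for readability;
// the bodies are assumptions, not the 0.20.2 implementation.
class StorageRestoreSketch {
    // Active storage directories: name -> files currently in the directory.
    final Map<String, List<String>> storageDirs = new LinkedHashMap<>();
    // Directories dropped after a failure, still holding their stale files.
    final Map<String, List<String>> removedStorageDirs = new LinkedHashMap<>();

    // Hypothetical helper: register an active directory with some files.
    void addDir(String name, String... files) {
        storageDirs.put(name, new ArrayList<>(Arrays.asList(files)));
    }

    // Hypothetical helper: a failed volume is moved out of the active list.
    void failDir(String name) {
        List<String> files = storageDirs.remove(name);
        if (files != null) {
            removedStorageDirs.put(name, files);
        }
    }

    // Every removed directory that is reachable again is formatted
    // (emptied) and put back into the active list.
    void attemptRestoreRemovedStorage(Set<String> reachable) {
        Iterator<Map.Entry<String, List<String>>> it =
                removedStorageDirs.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, List<String>> entry = it.next();
            if (reachable.contains(entry.getKey())) {
                it.remove();
                storageDirs.put(entry.getKey(), new ArrayList<>()); // formatted
            }
        }
    }

    // In this model, rolling the edit log only triggers the restore attempt.
    void rollEditLog(Set<String> reachable) {
        attemptRestoreRemovedStorage(reachable);
    }

    // Hypothetical stand-in for the directories a merge would iterate over.
    List<String> dirsSeenByMerge() {
        return new ArrayList<>(storageDirs.keySet());
    }
}
```

In this model a failed "nfs" directory is invisible to dirsSeenByMerge() until rollEditLog runs with "nfs" reachable, after which it is visible again but empty.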
I guess it is possible that a checkpoint could happen before rollEditLog is called, and then the inconsistency you've mentioned might be introduced. However, I think it won't happen, because SecondaryNameNode.doMerge iterates through Storage.storageDirs, which won't contain the failed volume unless it has been restored and formatted. If all of this is true, then we have a test which is failing not because the feature doesn't work but rather because the test needs to be changed in light of
Please let me know if my analysis is incorrect.