The NN performs a checkpoint by writing to an fsimage.ckpt_TXID file and renaming it to fsimage_TXID upon success.
If a checkpoint fails halfway, the fsimage.ckpt_TXID file is left on disk, and there is currently no logic to clean it up.
After talking with Aaron Myers, I understand the historical reason for not cleaning up those files immediately: they may be useful for disaster recovery.
However, it seems equally safe to clean up those ckpt files once a later checkpoint, one with a larger TXID, has completed successfully.
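
To make the proposal concrete, here is a minimal sketch of what such a cleanup could look like. It is not the NameNode's actual code: the class name StaleCheckpointCleaner and the method purgeStaleCheckpoints are hypothetical, and it assumes the storage directory can be scanned as a plain local directory. It only removes fsimage.ckpt_TXID files whose TXID is strictly smaller than the TXID of the checkpoint that just succeeded, so a ckpt file from a newer, possibly in-progress checkpoint is left alone.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Hadoop.
final class StaleCheckpointCleaner {
    // Matches leftover checkpoint files such as fsimage.ckpt_0000000000000000042.
    private static final Pattern CKPT = Pattern.compile("fsimage\\.ckpt_(\\d+)");

    /**
     * Deletes fsimage.ckpt_TXID files in dir whose TXID is strictly smaller
     * than lastSuccessfulTxId, and returns the paths that were deleted.
     * Finalized fsimage_TXID files never match the pattern and are not touched.
     */
    static List<Path> purgeStaleCheckpoints(Path dir, long lastSuccessfulTxId)
            throws IOException {
        List<Path> deleted = new ArrayList<>();
        try (DirectoryStream<Path> stream =
                 Files.newDirectoryStream(dir, "fsimage.ckpt_*")) {
            for (Path p : stream) {
                Matcher m = CKPT.matcher(p.getFileName().toString());
                if (!m.matches()) {
                    continue; // e.g. fsimage.ckpt_123.md5 or other suffixed names
                }
                long txId;
                try {
                    txId = Long.parseLong(m.group(1));
                } catch (NumberFormatException e) {
                    continue; // digits overflow a long; skip rather than guess
                }
                if (txId < lastSuccessfulTxId) {
                    Files.delete(p);
                    deleted.add(p);
                }
            }
        }
        return deleted;
    }
}
```

In this sketch the call would be made right after the rename to fsimage_TXID succeeds, passing that TXID as the threshold. Whether the threshold should be the latest successful TXID or something more conservative (for example, the oldest retained image) is an open design choice.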