Some of the new info messages should probably be debug level
There were only a few new info messages. I changed one of them to debug, and made one other less verbose, since some of the info is only relevant in the event of an error, and in that case the extra info is printed as part of the exception.
Do we also need to add some locking so that only one 2NN could be uploading an image at the same time?
Agreed. This isn't strictly necessary to fix the issue identified in this JIRA, but I agree that it is a potential source of errors as well.
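As a rough illustration of the kind of guard being discussed, here is a minimal sketch, not the actual patch. The class and method names (UploadGuardSketch, tryBeginUpload, endUpload) are hypothetical and do not come from the Hadoop code base. It assumes the NN only needs to reject a second upload while one is already in flight, rather than queue it.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: allows at most one image upload at a time.
class UploadGuardSketch {
  // True while some 2NN is in the middle of uploading an image.
  private final AtomicBoolean uploadInProgress = new AtomicBoolean(false);

  // Returns true if the caller acquired the right to upload,
  // false if another upload is already in progress.
  boolean tryBeginUpload() {
    return uploadInProgress.compareAndSet(false, true);
  }

  // Must be called (ideally from a finally block) when the upload
  // finishes or fails, so later uploads are not blocked forever.
  void endUpload() {
    uploadInProgress.set(false);
  }
}
```

A caller would wrap the transfer in try/finally, calling endUpload() in the finally clause, and fail the request with an error when tryBeginUpload() returns false.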
getNewChecksum looks like it will leak a file descriptor
Thanks, good catch.
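For reference, the usual shape of such a fix is to close the stream in a finally block. The following is only a sketch under assumptions: the real getNewChecksum body is not shown in this thread, so the class name ChecksumSketch, the method name computeChecksum, and the choice of MD5 are all hypothetical stand-ins.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical stand-in for getNewChecksum; not the actual patch.
class ChecksumSketch {
  static byte[] computeChecksum(File image) throws IOException {
    MessageDigest md;
    try {
      // MD5 is an assumption made for this sketch.
      md = MessageDigest.getInstance("MD5");
    } catch (NoSuchAlgorithmException e) {
      throw new IOException(e);
    }
    InputStream in = new FileInputStream(image);
    try {
      byte[] buf = new byte[4096];
      int n;
      while ((n = in.read(buf)) != -1) {
        md.update(buf, 0, n);
      }
    } finally {
      // Closing here releases the file descriptor even if read() throws.
      in.close();
    }
    return md.digest();
  }
}
```

The try/finally form is used instead of try-with-resources so that the sketch also compiles on the older Java versions that the 0.20 branches target.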
Would it be easier to just backport the part of HDFS-903 that creates an "imageChecksum" member which is updated whenever the image is merged, by the existing output stream? That would reduce divergence between 20s and trunk. That is to say, backport HDFS-903 except for the part where the checksum is put in the VERSION file.
I thought about doing this. Though it seems like it would make for a more straightforward back-port, the back-port isn't easy regardless because of other divergences between trunk and branch-0.20-security. So we don't seem to gain much by doing it this way, and since we wouldn't be storing the previous checksum as part of the VERSION file, we wouldn't get the intended benefit of HDFS-903 ("NN should verify images and edit logs on startup.")
I'll upload a patch in a moment which addresses all of these issues, except the last one. Todd, if you feel strongly about it, I can rework the patch as you described to be a more faithful back-port of