The image part of the patch looks good. I liked the seamless calculation of the checksum with DigestInputStream.
The checkpoint part is a bit inconsistent.
The CheckpointSignature is an invariant of the current checkpoint process. This is the way for the NN to recognize that it is still talking with the same checkpointer that started the process. Therefore, we should never alter the signature within the same checkpoint process.
We should probably make the JavaDoc for CheckpointSignature more explicit on that.
Current implementation is inconsistent in two ways.
On the one hand, rollImage for SNN sends back a new checkpoint signature with the digest of the new image. This value is recorded by the NN as the new digest. Instead
- The SNN should send back the digest of the original (old) image, and the NN should verify it against the original signature by calling validateStorageInfo(), as it is done in endCheckpoint().
- The NN should calculate the digest by itself not relying on the value passed by SNN.
It could probably be done while uploading the image from SNN.
On the other hand, the BN/CN sends back the original checkpoint signature with the old image digest. And the NN records it as the new digest, which should lead to an error during reloading. Again the NN should always calculate the digest by itself, it should be the only authority for its own image.
A couple of nits:
- At the end of FSImage.loadFSImage(File) the same LOG message is printed twice.
- Empty line added in FSImage.resetVersion()