Thanks for the thorough review. Here's a new patch.
FSImage.loadFSEdits(StorageDirectory sd) should return boolean instead of FSEditLogLoader
You can avoid introducing FSEditLogLoader.needResave by setting expectedStartingTxId before checking that logVersion != FSConstants.LAYOUT_VERSION. Then the old logic of counting this event as an extra transaction will work
I found the former logic here to be very confusing and somewhat of a hack. It's also important that the loader returns the correct number of edits rather than potentially returning 1 when there are 0 edits. If it did that, it would break many cases by potentially causing a skip in transaction IDs. Though the new code adds a new member, the new member has a clear purpose and I think it's easier to understand from the caller's perspective, especially now that your point #1 above is addressed.
It would be good if you could replace FSEditLogLoader.expectedStartingTxId member by the respective parameter to loadFSEdits
I think after that you can also get rid of FSEditLogLoader.numEditsLoaded.
Why don't we write first opCode, then txID, then Writable. There will be less code changes on the loading part
Very good call! This indeed cleaned up the loading code a lot.
Should we introduce TransactionHeader at this point and write it as Writable. Just something to consider
I think given that the header is still pretty simple it's not worth it at this point.
Need to change JavaDoc for EditLogOutputStream.write(). Missing parameter
I don't see any reason to have txID in the beginning of every edits file. You will have it the name, right
beginTransaction() instead of startTransaction, as it matches with endTransaction()
Don't change rollEditLog() to return long. It is only used in the test
It's necessary that the transaction ID be returned inside the same synchronization block. If we used a separate call to getLastWrittenTxId() then another txid could have been written in between (note that the test is multithreaded).
It looks to me that FSImage.checkpointTxId is simply currentTxId. If it is, it would be more intuitive
It's not really current - it's the txid of the image file, not including any edits that have been written to the edit log - sort of like how checkpointTime is set only when an image is saved. Naming it "currentTxId" would imply that it is updated on every edit.
BackupStorage.lastAppliedTxId isn't it just checkpointTxId, which is already defined in the base FSImage.
Contrary to above, lastAppliedTxId refers to the transaction ID that has been applied to the namespace. This is always >= checkpointTxId - checkpointTxId only changes when the BN saves an image, but lastAppliedTxId changes every time some edits are applied via RPC.
I'll run the new patch through the unit test suite one more time.