Thanks a lot for the reviews Wei-Chiu Chuang, good comments!
Replying one by one, and attaching a patch in the end. Comments that's not mentioned are all addressed.
zoneId is obtained when holding FSDirectory read lock, release the lock, and then acquire FSDirectory read lock again.
This assertion is only correct if there will be only one ReencryptionHandler running.
There is only one ReencryptionHandler. Added texts to javadoc.
If the zone referred to by inodeid is changed (e.g. deleted/renamed) while the lock is not held, checkZoneReady will throw. A similar test case would be TestReencryption#testZoneDeleteDuringReencrypt.
ReencryptionStatus#updateZoneStatus() should check that zoneNode is an encryption zone.
For the 2 callers, FSD#addEZ is where the zoneId is added, so always true. FSDirXAttrOp#unprotectedSetXAttrs is happening within the EZXattr, so also always true. (There's no 'disable encryption' command, so zone node can only be deleted/renamed)
Why is currentBatch a TreeMap?
Good question. Initially this was done to keep the element's ordering and using path as the key. Now that it's changed to inode id based, we can just use a list. (Sorry didn't rebase the inodeid patch here on 14...)
Does ZoneReencryptionStatus#getLastProcessedFile return the relative path? or file name only? or absolute path?
Absolute path - so we can restore in case of fail over.
It Allocates a 2000-element map, copy it over, and then clear the map. That looks suboptimal. Would it be feasible to wrap TreeMap and make a method that simply assigns the TreeMap reference to another currentBatch?
Agreed, problem is currentBatch here is passed in from the very outside of the call stack.
Made it a member variable of ReencryptionHandler to address this. It's still safe with the single-threaded handler model, but perhaps harder to read. Please share your thoughts.
EDEKReencryptCallable ... retry ... if reencryptEdeks() returns numFailures > 0, call() should not return a new ReencryptionTask object.
Initially talking with Andrew Wang, we wanted to always retry things, so admin can just fix the error, and continue (or cancel).
But since KMSCP already has the retry logic added by
HADOOP-14521, and to trade off for maintainability, we do not 'double retry' here, and only let KMSCP's retry policy to handle failures.
When -listReencryptionStatus, if numOfFailures > 0, a message is printed to ask admin to examine failure and re-submit.
Implementation-wise, we still depend on the ReencryptionTask object to pass the failures to the updater, so need that object. Updater handles failed tasks differently.