Robert Kanter the overall approach looks good.
Some questions and comments.
Re checkAggregatedStatus and various LogAggregationStatus states
1. a. Looks like TIMED_OUT state will be considered eligible for logaggregation. That seems the right thing to do
1. b. Similarly FAILED is not considered for log aggregation while earlier it was, right? Should it also be considered for aggregation? Looking at the description it indicates some log has error in aggreation. Are we not considering it because it may cause some logs to be left behind? If thats the reason I am ok with the current patch approach. Just want to confirm.
2. If a new non terminal LogAggregationStatus gets added to the enum we will not filter it out in checkAggregatedStatus. Is that a problem that we accidentally process it? If so we should extract a function and add a unit test case to catch those newly added states and explicitly fail the unit test.
3. consider reversing the conditions for the user != null in AppInfo#equals to make it easier to read
return user != null ? user.equals(appInfo.user) : appInfo.user == null;
4. consider adding log lines when we skip an application because the maxTotalSize is exceeded and also when we get an IOException.
5. FileWriter import is now unused
6. consider seedAndCheckFiles -> checkFilesAndSeedApps
7. consider checkAggregatedStatus -> filterAppsByAggregatedStatus
8. In seedAndCheckFiles we have a catch for the outer iteration. This could potentially break out of the loop on first app with an error and not process any more apps for that user. Should it instead be like the old way where we wrap the innermost
FileStatus files = fs.listStatus(appLogPath);
That will also match the comment
// Ignore any apps we can't read
8. Consider extracting out System.getProperty("user.name") into a variable to avoid repeated calls
9. Formatting inconsistency in in the findbugs exclude