Some more details on the test failure in TestServiceLevelAuthorization:
The problem is with TestMiniMRWithDFS.checkTaskDirectories. As explained above, this method verifies an exact list of localized directories on a tasktracker. To do so, it seems to wait until the TTs become idle. But idleness is only a function of the number of active tasks. When tasks complete, the TT adds the directories to be deleted to the CleanupQueue, which, by default, deletes them asynchronously. That means waiting for the TTs to become idle is not a sufficient precondition for checking the contents of the localized directories: a directory can still be sitting in the queue, undeleted, when the check runs.
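To make the race concrete, here is a minimal toy sketch. It is not the Hadoop code: ToyTracker, IdleRaceDemo and the "taskdir_0" name are all hypothetical, and a latch stands in for a slow asynchronous delete so that the outcome is deterministic. It only illustrates that "no active tasks" and "directories deleted" are separate conditions.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a tasktracker; not the real Hadoop class.
class ToyTracker {
    final AtomicInteger activeTasks = new AtomicInteger();
    final Set<String> localDirs = ConcurrentHashMap.newKeySet();
    // Single background thread playing the role of the asynchronous cleanup queue.
    final ExecutorService cleanupThread = Executors.newSingleThreadExecutor();
    // Gate that holds the cleanup thread back, mimicking a slow delete.
    final CountDownLatch allowDelete = new CountDownLatch(1);

    void startTask(String dir) {
        activeTasks.incrementAndGet();
        localDirs.add(dir);
    }

    void completeTask(String dir) {
        // Deletion is only queued here; it happens later on another thread.
        cleanupThread.execute(() -> {
            try {
                allowDelete.await();
                localDirs.remove(dir);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        activeTasks.decrementAndGet();
    }

    // Idleness looks only at the active task count, as described above.
    boolean isIdle() {
        return activeTasks.get() == 0;
    }
}

class IdleRaceDemo {
    // Returns {idle, dirPresentWhileIdle, dirPresentAfterQueueDrained}.
    static boolean[] run() {
        ToyTracker tt = new ToyTracker();
        tt.startTask("taskdir_0");
        tt.completeTask("taskdir_0");
        boolean idle = tt.isIdle();
        boolean presentWhileIdle = tt.localDirs.contains("taskdir_0");
        tt.allowDelete.countDown();          // let the queued delete proceed
        tt.cleanupThread.shutdown();
        try {
            tt.cleanupThread.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        boolean presentAfterDrain = tt.localDirs.contains("taskdir_0");
        return new boolean[] { idle, presentWhileIdle, presentAfterDrain };
    }
}
```

In this toy, the tracker reports idle while the directory is still listed, and the directory disappears only once the queued delete has run. A test that inspects directories as soon as the tracker is idle is in the same position.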
As I explained, this problem exists on trunk as well. But this patch can increase the chances of hitting it, because it does more work in the asynchronous portion. Given that, I am not comfortable committing it, particularly as there's no blazing hurry for this patch right now. We do run the risk of the patch going stale, but I am relying on the holiday season to slow people down (smile). More importantly, we won't make the test situation worse than it already is.
As regards the fix, a simple solution may be to configure the MiniMRCluster with an inline cleanup queue whenever we want to check the deletion of localized files and directories, which is essentially in all test cases that use TestMiniMRWithDFS.checkTaskDirectories. Fortunately, there are only four of them right now. Another option could be to wait until the cleanup queue is empty. The disadvantage of that approach is that it relies on the much-maligned "sleep(100) until some condition is satisfied" pattern. I would use the inline cleanup queue instead, which has worked well elsewhere. It may slow the tests down a tad, but hopefully not too badly.
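As a rough illustration of why an inline queue removes the race, here is a toy sketch. ToyCleanupQueue and InlineCleanupDemo are hypothetical names, not the Hadoop classes, and the way a MiniMRCluster would actually be configured with such a queue is not shown. The only assumption is that the queue's execution strategy can be swapped for one that runs the delete on the caller's thread.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;

// Hypothetical cleanup queue whose execution strategy is injectable.
class ToyCleanupQueue {
    private final Executor executor;

    ToyCleanupQueue(Executor executor) {
        this.executor = executor;
    }

    // Inline variant: the delete runs on the calling thread and has
    // finished by the time addToQueue returns.
    static ToyCleanupQueue inline() {
        return new ToyCleanupQueue(Runnable::run);
    }

    void addToQueue(Set<String> dirs, String dir) {
        executor.execute(() -> dirs.remove(dir));
    }
}

class InlineCleanupDemo {
    // True if the directory is already gone when addToQueue returns.
    static boolean dirGoneImmediately() {
        Set<String> localDirs = ConcurrentHashMap.newKeySet();
        localDirs.add("taskdir_0");
        ToyCleanupQueue.inline().addToQueue(localDirs, "taskdir_0");
        return !localDirs.contains("taskdir_0");
    }
}
```

With the inline variant there is nothing left pending once the task has completed, so the directory check needs no sleep-and-poll loop. The cost is that deletion time is paid on the calling thread, which is where the possible test slowdown comes from.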