On further review, I see we don't have the right hooks in place to eliminate sleep times from this test completely. Introducing those hooks would be a very intrusive refactoring.
However, we can still limit our reliance on sleep times and make the test more predictable by using CountDownLatch. I'm attaching patch v06. The main code is the same, but I've changed the test code to use CountDownLatch for coordination between the two threads. There is still some sleep time on the background thread, which is required to make sure we cover the lease acquisition retry logic during the delete. Besides being more predictable, the test is also faster: the patch v05 test ran in ~65 seconds, and the patch v06 test runs in ~9 seconds.
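For anyone reading along, here is a rough sketch of the coordination pattern, not the actual test code from the patch. The class name, latch names, event strings, and the 200 ms hold time are all made up for illustration, and the final join() is only a stand-in for the real delete retrying lease acquisition.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: names and timings are illustrative, not taken from the patch.
public class LatchCoordinationSketch {

    // Runs the two-thread scenario and returns the events in the order they happened.
    static List<String> run() {
        List<String> events = new CopyOnWriteArrayList<>();
        CountDownLatch leaseAcquired = new CountDownLatch(1);
        CountDownLatch deleteStarted = new CountDownLatch(1);

        Thread leaseHolder = new Thread(() -> {
            try {
                events.add("lease acquired");   // stand-in for taking the lease
                leaseAcquired.countDown();      // signal the main thread
                deleteStarted.await();          // wait until the delete is under way
                // The one remaining sleep: hold the lease briefly so the
                // delete has to go through its retry path at least once.
                Thread.sleep(200);
                events.add("lease released");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        leaseHolder.start();

        try {
            // Instead of sleeping and hoping the lease is held by now,
            // block until the background thread says so (with a safety timeout).
            if (!leaseAcquired.await(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("lease holder never signalled");
            }
            events.add("delete started");
            deleteStarted.countDown();
            // Stand-in for the delete retrying until the lease is free.
            leaseHolder.join();
            events.add("delete finished");
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The latches fix the ordering of the handoffs between the threads, so the only timing-dependent part left is how long the background thread holds the lease.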
Xiaoyu Yao, could we please get your help with a code review on this? I am +1 on the main code portion of this, but I can't commit this based on my own +1, because I've touched the test code portion of the patch. Thanks!