Following the merging of a large patch chain, it was noticed on our internal CI that several tests had become flaky, with a similar pattern in the failures: the tests fail early when a future times out. Often, this occurs when a test cluster is being spun up and one of the offer futures times out. This has been observed in the following tests:
- SlaveRecoveryTest/0.ReconnectHTTPExecutor (
- ResourceOffersTest.ResourcesGetReofferedAfterTaskInfoError (
- SlaveTest.CommandTaskWithKillPolicy (
See the linked JIRAs noted above for individual tickets addressing a couple of these.