Description
While testing the 1.0.1 RC release bits, I found one issue that I believe is not related to network flakiness; I am filing it here for tracking purposes. I noticed that the same failure pattern shows up in several of these tests. Here is the sequence of operations from the logs just before the failure:
- ExternalMiniClusterITestBase::StartCluster() succeeds, creating the tablet servers that will hold the tablet replicas.
- TestTable is created, either directly from the test or via TestWorkload.Setup(), so the tablet replicas are distributed across the tablet servers.
- The tablet replicas then fail to elect a leader (for about 30 seconds only the terms advance), and eventually the table creation fails.
It is not clear to me whether this is a bug; I need to dig into it a little more, beyond just the logs. If it is not a bug, I wonder whether there is room to make these tests less failure-prone. Attached are two logs from the test run.