Details
-
Improvement
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
1.9.0
-
None
Description
RemoteKsckTest.TestClusterWithLocation is flaky
Alexey took a look at it and here is the analysis:
In essence, due to slowness of TSAN builds, connection negotiation from kudu CLI to one of master servers timed out, so one of the preconditions of the test didn't meet. The error output by the test was:
/data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/tools/ksck_remote-test.cc:523: Failure Failed Bad status: Network error: failed to gather info from all masters: 1 of 3 had errors
The corresponding error in the master's log was:
W0221 12:38:27.119146 31380 negotiation.cc:313] Failed RPC negotiation. Trace: 0221 12:38:23.949428 (+ 0us) reactor.cc:583] Submitting negotiation task for client connection to 127.25.42.190:51799 0221 12:38:25.362220 (+1412792us) negotiation.cc:98] Waiting for socket to connect 0221 12:38:25.363489 (+ 1269us) client_negotiation.cc:167] Beginning negotiation 0221 12:38:25.369976 (+ 6487us) client_negotiation.cc:244] Sending NEGOTIATE NegotiatePB request 0221 12:38:25.431582 (+ 61606us) client_negotiation.cc:261] Received NEGOTIATE NegotiatePB response 0221 12:38:25.431610 (+ 28us) client_negotiation.cc:355] Received NEGOTIATE response from server 0221 12:38:25.432659 (+ 1049us) client_negotiation.cc:182] Negotiated authn=SASL 0221 12:38:27.051125 (+1618466us) client_negotiation.cc:483] Received TLS_HANDSHAKE response from server 0221 12:38:27.062085 (+ 10960us) client_negotiation.cc:471] Sending TLS_HANDSHAKE message to server 0221 12:38:27.062132 (+ 47us) client_negotiation.cc:244] Sending TLS_HANDSHAKE NegotiatePB request 0221 12:38:27.064391 (+ 2259us) negotiation.cc:304] Negotiation complete: Timed out: Client connection negotiation failed: client connection to 127.25.42.190:51799: BlockingWrite timed out
We are seeing this on the flaky test dashboard for both TSAN and ASAN builds.