Details
- Bug
- Status: Resolved
- Major
- Resolution: Duplicate
- 0.7.0
- None
Description
I'm hitting a case where the client fails to scan from the node that remains after we kill two:
I0219 09:27:25.552407 8830 meta_cache.cc:635] Marking tablet server 92c6616aee764f2bafdeb5ece5816102 (127.34.126.1:56329) as failed.
W0219 09:27:25.552430 8830 meta_cache.cc:191] Tablet 2bd2a0aa8c0d4f2890106408638d7860: Replica 92c6616aee764f2bafdeb5ece5816102 (127.34.126.1:56329) has failed: Network error: TS failed: Client connection negotiation failed: client connection to 127.34.126.1:56329: connect: Connection refused (error 111)
I0219 09:27:25.552924 8830 meta_cache.cc:635] Marking tablet server cf874235214a4471b761e84bad1fdd03 (127.34.126.2:36921) as failed.
W0219 09:27:25.552945 8830 meta_cache.cc:191] Tablet 2bd2a0aa8c0d4f2890106408638d7860: Replica cf874235214a4471b761e84bad1fdd03 (127.34.126.2:36921) has failed: Network error: TS failed: Client connection negotiation failed: client connection to 127.34.126.2:36921: connect: Connection refused (error 111)
I0219 09:27:25.553062 8830 meta_cache.cc:635] Marking tablet server 92c6616aee764f2bafdeb5ece5816102 (127.34.126.1:56329) as failed.
W0219 09:27:25.553074 8830 meta_cache.cc:191] Tablet 2bd2a0aa8c0d4f2890106408638d7860: Replica 92c6616aee764f2bafdeb5ece5816102 (127.34.126.1:56329) has failed: Network error: TS failed: Client connection negotiation failed: client connection to 127.34.126.1:56329: connect: Connection refused (error 111)
I0219 09:27:25.553458 8830 meta_cache.cc:635] Marking tablet server 92c6616aee764f2bafdeb5ece5816102 (127.34.126.1:56329) as failed.
W0219 09:27:25.553478 8830 meta_cache.cc:191] Tablet 2bd2a0aa8c0d4f2890106408638d7860: Replica 92c6616aee764f2bafdeb5ece5816102 (127.34.126.1:56329) has failed: Network error: TS failed: Client connection negotiation failed: client connection to 127.34.126.1:56329: connect: Connection refused (error 111)
I0219 09:27:25.554150 8830 linked_list-test-util.h:826] Done collecting results (0 rows in 0.001179ms)
You can see it's trying to hit the two dead nodes. Meanwhile, the survivor 5863a398b4c340aea712e4097c355457 is trying to run a leader election.
It reproduces about 50% of the time on the fast machine I'm using, but with verbose logging turned up the scan succeeded every time I tried.
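To make the failure mode concrete, here is a minimal Python sketch of the situation, not Kudu's actual client code. The `Replica` class, `pick_leader`, `scan_with_retry`, and the `on_retry` hook are all hypothetical names invented for illustration. The sketch assumes the client marks unreachable replicas as failed and then needs to wait, with backoff, until the surviving replica has finished its election, rather than giving up immediately with 0 rows.

```python
import time
from dataclasses import dataclass


@dataclass
class Replica:
    """Hypothetical stand-in for one tablet replica as seen by a client."""
    uuid: str
    alive: bool = True
    is_leader: bool = False


def pick_leader(replicas, failed):
    """Return the first replica that is not marked failed and is the leader."""
    for replica in replicas:
        if replica.uuid not in failed and replica.is_leader:
            return replica
    return None


def scan_with_retry(replicas, attempts=5, base_delay=0.01,
                    sleep=time.sleep, on_retry=None):
    """Look for a usable leader, backing off between attempts.

    Returns the uuid of the replica that would serve the scan, or None
    if no leader became available within the allowed attempts.
    """
    failed = set()
    for attempt in range(attempts):
        # Loosely mirrors the "Marking tablet server ... as failed" log lines.
        for replica in replicas:
            if not replica.alive:
                failed.add(replica.uuid)

        leader = pick_leader(replicas, failed)
        if leader is not None:
            return leader.uuid

        # No usable leader yet (e.g. an election is still running):
        # wait with exponential backoff before looking again.
        if on_retry is not None:
            on_retry(attempt)  # hook so a test can simulate the election
        sleep(base_delay * (2 ** attempt))
    return None
```

In this model, a client that makes only a single attempt returns nothing whenever the survivor has not yet won its election, which matches the timing-sensitive behavior above: anything that slows the client down (such as heavier logging) gives the election more time to complete.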
Attachments
Issue Links
- duplicates KUDU-1387 Scanner gets into tight loop followed by long sleep when leader TS is down (Resolved)