Kudu / KUDU-1342

LinkedListTest.TestLoadWhileOneServerDownAndVerify flakiness

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version: 0.7.0
    • Fix Version: 0.9.0
    • Component: integration
    • Labels: None

    Description

      I'm hitting a case where the client fails to scan from the node that remains after we kill two:

      I0219 09:27:25.552407  8830 meta_cache.cc:635] Marking tablet server 92c6616aee764f2bafdeb5ece5816102 (127.34.126.1:56329) as failed.
      W0219 09:27:25.552430  8830 meta_cache.cc:191] Tablet 2bd2a0aa8c0d4f2890106408638d7860: Replica 92c6616aee764f2bafdeb5ece5816102 (127.34.126.1:56329) has failed: Network error: TS failed: Client connection negotiation failed: client connection to 127.34.126.1:56329: connect: Connection refused (error 111)
      I0219 09:27:25.552924  8830 meta_cache.cc:635] Marking tablet server cf874235214a4471b761e84bad1fdd03 (127.34.126.2:36921) as failed.
      W0219 09:27:25.552945  8830 meta_cache.cc:191] Tablet 2bd2a0aa8c0d4f2890106408638d7860: Replica cf874235214a4471b761e84bad1fdd03 (127.34.126.2:36921) has failed: Network error: TS failed: Client connection negotiation failed: client connection to 127.34.126.2:36921: connect: Connection refused (error 111)
      I0219 09:27:25.553062  8830 meta_cache.cc:635] Marking tablet server 92c6616aee764f2bafdeb5ece5816102 (127.34.126.1:56329) as failed.
      W0219 09:27:25.553074  8830 meta_cache.cc:191] Tablet 2bd2a0aa8c0d4f2890106408638d7860: Replica 92c6616aee764f2bafdeb5ece5816102 (127.34.126.1:56329) has failed: Network error: TS failed: Client connection negotiation failed: client connection to 127.34.126.1:56329: connect: Connection refused (error 111)
      I0219 09:27:25.553458  8830 meta_cache.cc:635] Marking tablet server 92c6616aee764f2bafdeb5ece5816102 (127.34.126.1:56329) as failed.
      W0219 09:27:25.553478  8830 meta_cache.cc:191] Tablet 2bd2a0aa8c0d4f2890106408638d7860: Replica 92c6616aee764f2bafdeb5ece5816102 (127.34.126.1:56329) has failed: Network error: TS failed: Client connection negotiation failed: client connection to 127.34.126.1:56329: connect: Connection refused (error 111)
      I0219 09:27:25.554150  8830 linked_list-test-util.h:826] Done collecting results (0 rows in 0.001179ms)
      

      You can see it's trying to hit the two dead nodes. Meanwhile, the survivor 5863a398b4c340aea712e4097c355457 is trying to run a leader election.
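
A quick way to confirm this from the attached log is to tally the "Marking tablet server ... as failed" lines per server. The sketch below is only an illustration: the regular expression is written against the four meta_cache.cc:635 lines quoted above, and the names `count_failed_marks`, `SAMPLE`, and `SURVIVOR` are made up for this example. The same function could be pointed at the contents of llt-1342.log.

```python
import re
from collections import Counter

# Pattern written against the meta_cache.cc:635 lines quoted in this report.
FAILED_RE = re.compile(
    r"Marking tablet server (?P<uuid>[0-9a-f]{32}) \((?P<addr>[\d.]+:\d+)\) as failed"
)


def count_failed_marks(log_text):
    """Map (uuid, addr) to the number of times the client marked it failed."""
    return Counter(
        (m.group("uuid"), m.group("addr")) for m in FAILED_RE.finditer(log_text)
    )


# The four "Marking tablet server" lines from the excerpt above, verbatim.
SAMPLE = """\
I0219 09:27:25.552407  8830 meta_cache.cc:635] Marking tablet server 92c6616aee764f2bafdeb5ece5816102 (127.34.126.1:56329) as failed.
I0219 09:27:25.552924  8830 meta_cache.cc:635] Marking tablet server cf874235214a4471b761e84bad1fdd03 (127.34.126.2:36921) as failed.
I0219 09:27:25.553062  8830 meta_cache.cc:635] Marking tablet server 92c6616aee764f2bafdeb5ece5816102 (127.34.126.1:56329) as failed.
I0219 09:27:25.553458  8830 meta_cache.cc:635] Marking tablet server 92c6616aee764f2bafdeb5ece5816102 (127.34.126.1:56329) as failed.
"""

# UUID of the surviving tablet server, as named in this report.
SURVIVOR = "5863a398b4c340aea712e4097c355457"

if __name__ == "__main__":
    counts = count_failed_marks(SAMPLE)
    for (uuid, addr), n in counts.most_common():
        print(f"{uuid} {addr}: marked failed {n}x")
    print("survivor marked failed:", any(uuid == SURVIVOR for uuid, _ in counts))
```

On the excerpt, only the two killed servers show up in the tally and the survivor never appears, which matches the reading that the client gave up without getting a scan through to the remaining replica.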

      It reproduces about 50% of the time on the fast machine I'm using, but raising the verbose logging level made the test pass every time I tried.
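
To put a number on the failure rate at different logging levels, a small loop runner may help. This is a rough sketch, not part of the Kudu test tooling: `failure_rate` is a hypothetical helper that runs any command repeatedly and reports the fraction of nonzero exits. The command itself is passed on the command line so nothing about the build tree is hard-coded.

```python
import subprocess
import sys


def failure_rate(cmd, runs):
    """Run cmd `runs` times; return the fraction of runs that exit nonzero."""
    failures = 0
    for _ in range(runs):
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        if result.returncode != 0:
            failures += 1
    return failures / runs


if __name__ == "__main__" and len(sys.argv) > 1:
    # Everything after the script name is treated as the test command to run.
    test_cmd = sys.argv[1:]
    rate = failure_rate(test_cmd, runs=20)
    print(f"{rate:.0%} of 20 runs failed: {' '.join(test_cmd)}")
```

For this issue the command would presumably be the integration test binary with the gtest flag --gtest_filter=LinkedListTest.TestLoadWhileOneServerDownAndVerify, run once with the default verbosity and once with a higher glog --v level. Both the binary name (likely linked_list-test, judging by linked_list-test-util.h in the log) and the use of --v as the "verbose logging" knob are assumptions here, since the report does not spell them out.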

      Attachments

        1. llt-1342.log (646 kB, attached by Jean-Daniel Cryans)

            People

              Assignee: Unassigned
              Reporter: Jean-Daniel Cryans (jdcryans)