Uploaded image for project: 'Mesos'
  1. Mesos
  2. MESOS-9467

ZooKeeperMasterContenderDetectorTest.MasterDetectorExpireSlaveZKSessionNewMaster is flaky

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • None
    • None
    • None
    • Mac OSX with ssl enabled

    Description

      The following error was observed in an internal CI run:

      ../../src/tests/master_contender_detector_tests.cpp:872: Failure
      Failed to wait 15secs for detected
      

      Full log:

      [ RUN      ] ZooKeeperMasterContenderDetectorTest.MasterDetectorExpireSlaveZKSessionNewMaster
      I1210 16:18:13.068011 338650560 zookeeper_test_server.cpp:156] Started ZooKeeperTestServer on port 54990
      2018-12-10 16:18:13,068:28813(0x70000e2f6000):ZOO_INFO@log_env@753: Client environment:zookeeper.version=zookeeper C client 3.4.8
      2018-12-10 16:18:13,068:28813(0x70000e2f6000):ZOO_INFO@log_env@757: Client environment:host.name=Jenkinss-Mac-mini.local
      2018-12-10 16:18:13,068:28813(0x70000e2f6000):ZOO_INFO@log_env@764: Client environment:os.name=Darwin
      2018-12-10 16:18:13,068:28813(0x70000e2f6000):ZOO_INFO@log_env@765: Client environment:os.arch=18.2.0
      2018-12-10 16:18:13,068:28813(0x70000e2f6000):ZOO_INFO@log_env@766: Client environment:os.version=Darwin Kernel Version 18.2.0: Mon Nov 12 20:24:46 PST 2018; root:xnu-4903.231.4~2/RELEASE_X86_64
      2018-12-10 16:18:13,068:28813(0x70000e2f6000):ZOO_INFO@log_env@774: Client environment:user.name=jenkins
      2018-12-10 16:18:13,068:28813(0x70000e2f6000):ZOO_INFO@log_env@782: Client environment:user.home=/Users/jenkins
      2018-12-10 16:18:13,068:28813(0x70000e2f6000):ZOO_INFO@log_env@794: Client environment:user.dir=/Users/jenkins/workspace/workspace/mesos/Mesos_CI-build/FLAG/SSL/label/mac/mesos/build
      2018-12-10 16:18:13,068:28813(0x70000e2f6000):ZOO_INFO@zookeeper_init@827: Initiating client connection, host=127.0.0.1:54990 sessionTimeout=10000 watcher=0x116d03e00 sessionId=0 sessionPasswd=<null> context=0x7fd3883958d0 flags=0
      2018-12-10 16:18:13,068:28813(0x70000ed1d000):ZOO_INFO@check_events@1764: initiated connection to server [127.0.0.1:54990]
      I1210 16:18:13.069262 236376064 contender.cpp:152] Joining the ZK group
      2018-12-10 16:18:13,070:28813(0x70000ed1d000):ZOO_INFO@check_events@1811: session establishment complete on server [127.0.0.1:54990], sessionId=0x1679aa0ddc90000, negotiated timeout=10000
      I1210 16:18:13.070789 239058944 group.cpp:341] Group process (zookeeper-group(28)@10.0.49.4:54069) connected to ZooKeeper
      I1210 16:18:13.070853 239058944 group.cpp:831] Syncing group operations: queue size (joins, cancels, datas) = (1, 0, 0)
      I1210 16:18:13.070868 239058944 group.cpp:419] Trying to create path '/mesos' in ZooKeeper
      I1210 16:18:13.073835 235839488 contender.cpp:268] New candidate (id='0') has entered the contest for leadership
      I1210 16:18:13.074319 237985792 detector.cpp:152] Detected a new leader: (id='0')
      I1210 16:18:13.074406 237449216 group.cpp:700] Trying to get '/mesos/json.info_0000000000' in ZooKeeper
      I1210 16:18:13.075139 239058944 zookeeper.cpp:262] A new leading master (UPID=@0.152.150.128:10000) is detected
      2018-12-10 16:18:13,075:28813(0x70000e273000):ZOO_INFO@log_env@753: Client environment:zookeeper.version=zookeeper C client 3.4.8
      2018-12-10 16:18:13,075:28813(0x70000e273000):ZOO_INFO@log_env@757: Client environment:host.name=Jenkinss-Mac-mini.local
      2018-12-10 16:18:13,075:28813(0x70000e273000):ZOO_INFO@log_env@764: Client environment:os.name=Darwin
      2018-12-10 16:18:13,075:28813(0x70000e273000):ZOO_INFO@log_env@765: Client environment:os.arch=18.2.0
      2018-12-10 16:18:13,075:28813(0x70000e273000):ZOO_INFO@log_env@766: Client environment:os.version=Darwin Kernel Version 18.2.0: Mon Nov 12 20:24:46 PST 2018; root:xnu-4903.231.4~2/RELEASE_X86_64
      2018-12-10 16:18:13,075:28813(0x70000e273000):ZOO_INFO@log_env@774: Client environment:user.name=jenkins
      2018-12-10 16:18:13,075:28813(0x70000e273000):ZOO_INFO@log_env@782: Client environment:user.home=/Users/jenkins
      2018-12-10 16:18:13,075:28813(0x70000e273000):ZOO_INFO@log_env@794: Client environment:user.dir=/Users/jenkins/workspace/workspace/mesos/Mesos_CI-build/FLAG/SSL/label/mac/mesos/build
      2018-12-10 16:18:13,075:28813(0x70000e273000):ZOO_INFO@zookeeper_init@827: Initiating client connection, host=127.0.0.1:54990 sessionTimeout=10000 watcher=0x116d03e00 sessionId=0 sessionPasswd=<null> context=0x7fd3886b40e0 flags=0
      2018-12-10 16:18:13,075:28813(0x70000f944000):ZOO_INFO@check_events@1764: initiated connection to server [127.0.0.1:54990]
      I1210 16:18:13.076236 238522368 contender.cpp:152] Joining the ZK group
      2018-12-10 16:18:13,077:28813(0x70000f944000):ZOO_INFO@check_events@1811: session establishment complete on server [127.0.0.1:54990], sessionId=0x1679aa0ddc90001, negotiated timeout=10000
      I1210 16:18:13.077278 239058944 group.cpp:341] Group process (zookeeper-group(29)@10.0.49.4:54069) connected to ZooKeeper
      I1210 16:18:13.077311 239058944 group.cpp:831] Syncing group operations: queue size (joins, cancels, datas) = (1, 0, 0)
      I1210 16:18:13.077327 239058944 group.cpp:419] Trying to create path '/mesos' in ZooKeeper
      I1210 16:18:13.079026 236912640 contender.cpp:268] New candidate (id='1') has entered the contest for leadership
      I1210 16:18:13.079605 237449216 detector.cpp:152] Detected a new leader: (id='0')
      I1210 16:18:13.079732 236912640 group.cpp:700] Trying to get '/mesos/json.info_0000000000' in ZooKeeper
      I1210 16:18:13.080291 237449216 zookeeper.cpp:262] A new leading master (UPID=@0.152.150.128:10000) is detected
      2018-12-10 16:18:13,080:28813(0x70000e3fc000):ZOO_INFO@log_env@753: Client environment:zookeeper.version=zookeeper C client 3.4.8
      2018-12-10 16:18:13,080:28813(0x70000e3fc000):ZOO_INFO@log_env@757: Client environment:host.name=Jenkinss-Mac-mini.local
      2018-12-10 16:18:13,080:28813(0x70000e3fc000):ZOO_INFO@log_env@764: Client environment:os.name=Darwin
      2018-12-10 16:18:13,080:28813(0x70000e3fc000):ZOO_INFO@log_env@765: Client environment:os.arch=18.2.0
      2018-12-10 16:18:13,080:28813(0x70000e3fc000):ZOO_INFO@log_env@766: Client environment:os.version=Darwin Kernel Version 18.2.0: Mon Nov 12 20:24:46 PST 2018; root:xnu-4903.231.4~2/RELEASE_X86_64
      2018-12-10 16:18:13,080:28813(0x70000e3fc000):ZOO_INFO@log_env@774: Client environment:user.name=jenkins
      2018-12-10 16:18:13,080:28813(0x70000e3fc000):ZOO_INFO@log_env@782: Client environment:user.home=/Users/jenkins
      2018-12-10 16:18:13,080:28813(0x70000e3fc000):ZOO_INFO@log_env@794: Client environment:user.dir=/Users/jenkins/workspace/workspace/mesos/Mesos_CI-build/FLAG/SSL/label/mac/mesos/build
      2018-12-10 16:18:13,080:28813(0x70000e3fc000):ZOO_INFO@zookeeper_init@827: Initiating client connection, host=127.0.0.1:54990 sessionTimeout=10000 watcher=0x116d03e00 sessionId=0 sessionPasswd=<null> context=0x7fd388623260 flags=0
      2018-12-10 16:18:13,081:28813(0x70000fa4a000):ZOO_INFO@check_events@1764: initiated connection to server [127.0.0.1:54990]
      2018-12-10 16:18:13,082:28813(0x70000fa4a000):ZOO_INFO@check_events@1811: session establishment complete on server [127.0.0.1:54990], sessionId=0x1679aa0ddc90002, negotiated timeout=10000
      I1210 16:18:13.082370 239058944 group.cpp:341] Group process (zookeeper-group(30)@10.0.49.4:54069) connected to ZooKeeper
      I1210 16:18:13.082406 239058944 group.cpp:831] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
      I1210 16:18:13.082422 239058944 group.cpp:419] Trying to create path '/mesos' in ZooKeeper
      I1210 16:18:13.083561 235302912 detector.cpp:152] Detected a new leader: (id='0')
      I1210 16:18:13.083673 237985792 group.cpp:700] Trying to get '/mesos/json.info_0000000000' in ZooKeeper
      I1210 16:18:13.084331 235302912 zookeeper.cpp:262] A new leading master (UPID=@0.152.150.128:10000) is detected
      2018-12-10 16:18:13,085:28813(0x70000fa4a000):ZOO_ERROR@handle_socket_error_msg@1782: Socket [127.0.0.1:54990] zk retcode=-4, errno=64(Host is down): failed while receiving a server response
      I1210 16:18:13.085988 236376064 group.cpp:452] Lost connection to ZooKeeper, attempting to reconnect ...
      2018-12-10 16:18:13,086:28813(0x70000ed1d000):ZOO_ERROR@handle_socket_error_msg@1782: Socket [127.0.0.1:54990] zk retcode=-4, errno=64(Host is down): failed while receiving a server response
      I1210 16:18:13.086808 236912640 group.cpp:452] Lost connection to ZooKeeper, attempting to reconnect ...
      I1210 16:18:13.087200 236376064 detector.cpp:140] The current leader (id=0) is lost
      I1210 16:18:13.087239 236376064 detector.cpp:152] Detected a new leader: (id='1')
      I1210 16:18:13.087339 237985792 group.cpp:700] Trying to get '/mesos/json.info_0000000001' in ZooKeeper
      I1210 16:18:13.087929 236376064 zookeeper.cpp:262] A new leading master (UPID=@0.152.150.129:10001) is detected
      W1210 16:18:23.086897 239058944 group.cpp:495] Timed out waiting to connect to ZooKeeper. Forcing ZooKeeper session (sessionId=1679aa0ddc90002) expiration
      I1210 16:18:23.087049 239058944 group.cpp:511] ZooKeeper session expired
      I1210 16:18:23.087095 235302912 detector.cpp:140] The current leader (id=0) is lost
      I1210 16:18:23.087132 235302912 detector.cpp:152] Detected a new leader: None
      2018-12-10 16:18:23,087:28813(0x70000e3fc000):ZOO_INFO@zookeeper_close@2579: Freeing zookeeper resources for sessionId=0x1679aa0ddc90002
      
      2018-12-10 16:18:23,087:28813(0x70000e067000):ZOO_INFO@log_env@753: Client environment:zookeeper.version=zookeeper C client 3.4.8
      2018-12-10 16:18:23,087:28813(0x70000e067000):ZOO_INFO@log_env@757: Client environment:host.name=Jenkinss-Mac-mini.local
      2018-12-10 16:18:23,087:28813(0x70000e067000):ZOO_INFO@log_env@764: Client environment:os.name=Darwin
      2018-12-10 16:18:23,087:28813(0x70000e067000):ZOO_INFO@log_env@765: Client environment:os.arch=18.2.0
      2018-12-10 16:18:23,087:28813(0x70000e067000):ZOO_INFO@log_env@766: Client environment:os.version=Darwin Kernel Version 18.2.0: Mon Nov 12 20:24:46 PST 2018; root:xnu-4903.231.4~2/RELEASE_X86_64
      2018-12-10 16:18:23,087:28813(0x70000e067000):ZOO_INFO@log_env@774: Client environment:user.name=jenkins
      2018-12-10 16:18:23,087:28813(0x70000e067000):ZOO_INFO@log_env@782: Client environment:user.home=/Users/jenkins
      2018-12-10 16:18:23,087:28813(0x70000e067000):ZOO_INFO@log_env@794: Client environment:user.dir=/Users/jenkins/workspace/workspace/mesos/Mesos_CI-build/FLAG/SSL/label/mac/mesos/build
      2018-12-10 16:18:23,087:28813(0x70000e067000):ZOO_INFO@zookeeper_init@827: Initiating client connection, host=127.0.0.1:54990 sessionTimeout=10000 watcher=0x116d03e00 sessionId=0 sessionPasswd=<null> context=0x7fd388623260 flags=0
      2018-12-10 16:18:23,087:28813(0x70000f4b5000):ZOO_INFO@check_events@1764: initiated connection to server [127.0.0.1:54990]
      W1210 16:18:23.087962 235302912 group.cpp:495] Timed out waiting to connect to ZooKeeper. Forcing ZooKeeper session (sessionId=1679aa0ddc90000) expiration
      I1210 16:18:23.088090 235302912 group.cpp:511] ZooKeeper session expired
      I1210 16:18:23.088133 237449216 detector.cpp:140] The current leader (id=0) is lost
      I1210 16:18:23.088150 237449216 detector.cpp:152] Detected a new leader: None
      I1210 16:18:23.088217 237985792 contender.cpp:217] Membership cancelled: 0
      2018-12-10 16:18:23,088:28813(0x70000e067000):ZOO_INFO@zookeeper_close@2579: Freeing zookeeper resources for sessionId=0x1679aa0ddc90000
      
      2018-12-10 16:18:23,088:28813(0x70000e16d000):ZOO_INFO@log_env@753: Client environment:zookeeper.version=zookeeper C client 3.4.8
      2018-12-10 16:18:23,088:28813(0x70000e16d000):ZOO_INFO@log_env@757: Client environment:host.name=Jenkinss-Mac-mini.local
      2018-12-10 16:18:23,088:28813(0x70000e16d000):ZOO_INFO@log_env@764: Client environment:os.name=Darwin
      2018-12-10 16:18:23,088:28813(0x70000e16d000):ZOO_INFO@log_env@765: Client environment:os.arch=18.2.0
      2018-12-10 16:18:23,088:28813(0x70000e16d000):ZOO_INFO@log_env@766: Client environment:os.version=Darwin Kernel Version 18.2.0: Mon Nov 12 20:24:46 PST 2018; root:xnu-4903.231.4~2/RELEASE_X86_64
      2018-12-10 16:18:23,088:28813(0x70000e16d000):ZOO_INFO@log_env@774: Client environment:user.name=jenkins
      2018-12-10 16:18:23,088:28813(0x70000e16d000):ZOO_INFO@log_env@782: Client environment:user.home=/Users/jenkins
      2018-12-10 16:18:23,088:28813(0x70000e16d000):ZOO_INFO@log_env@794: Client environment:user.dir=/Users/jenkins/workspace/workspace/mesos/Mesos_CI-build/FLAG/SSL/label/mac/mesos/build
      2018-12-10 16:18:23,088:28813(0x70000e16d000):ZOO_INFO@zookeeper_init@827: Initiating client connection, host=127.0.0.1:54990 sessionTimeout=10000 watcher=0x116d03e00 sessionId=0 sessionPasswd=<null> context=0x7fd3885d9a10 flags=0
      2018-12-10 16:18:23,088:28813(0x70000ed1d000):ZOO_INFO@check_events@1764: initiated connection to server [127.0.0.1:54990]
      2018-12-10 16:18:23,165:28813(0x70000f944000):ZOO_WARN@zookeeper_interest@1597: Exceeded deadline by 6746ms
      2018-12-10 16:18:23,166:28813(0x70000f944000):ZOO_ERROR@handle_socket_error_msg@1702: Socket [127.0.0.1:54990] zk retcode=-7, errno=60(Operation timed out): connection to 127.0.0.1:54990 timed out (exceeded timeout by 3412ms)
      2018-12-10 16:18:23,166:28813(0x70000f944000):ZOO_WARN@zookeeper_interest@1597: Exceeded deadline by 6746ms
      I1210 16:18:23.166213 236376064 group.cpp:452] Lost connection to ZooKeeper, attempting to reconnect ...
      2018-12-10 16:18:25,268:28813(0x70000f4b5000):ZOO_INFO@check_events@1811: session establishment complete on server [127.0.0.1:54990], sessionId=0x1679aa0ddc90003, negotiated timeout=10000
      I1210 16:18:25.268710 237449216 group.cpp:341] Group process (zookeeper-group(30)@10.0.49.4:54069) connected to ZooKeeper
      I1210 16:18:25.268859 237449216 group.cpp:831] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
      I1210 16:18:25.268879 237449216 group.cpp:419] Trying to create path '/mesos' in ZooKeeper
      2018-12-10 16:18:25,269:28813(0x70000ed1d000):ZOO_INFO@check_events@1811: session establishment complete on server [127.0.0.1:54990], sessionId=0x1679aa0ddc90004, negotiated timeout=10000
      I1210 16:18:25.269650 239058944 group.cpp:341] Group process (zookeeper-group(28)@10.0.49.4:54069) connected to ZooKeeper
      I1210 16:18:25.269680 239058944 group.cpp:831] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
      I1210 16:18:25.269692 239058944 group.cpp:419] Trying to create path '/mesos' in ZooKeeper
      2018-12-10 16:18:26,509:28813(0x70000f944000):ZOO_WARN@zookeeper_interest@1597: Exceeded deadline by 10088ms
      2018-12-10 16:18:26,509:28813(0x70000f944000):ZOO_INFO@check_events@1764: initiated connection to server [127.0.0.1:54990]
      2018-12-10 16:18:26,510:28813(0x70000f944000):ZOO_ERROR@handle_socket_error_msg@1800: Socket [127.0.0.1:54990] zk retcode=-112, errno=70(Stale NFS file handle): sessionId=0x1679aa0ddc90001 has expired.
      I1210 16:18:26.510257 237985792 group.cpp:511] ZooKeeper session expired
      I1210 16:18:26.510342 236376064 detector.cpp:140] The current leader (id=1) is lost
      I1210 16:18:26.510363 236376064 detector.cpp:152] Detected a new leader: None
      I1210 16:18:26.510382 235839488 contender.cpp:217] Membership cancelled: 1
      2018-12-10 16:18:26,510:28813(0x70000e2f6000):ZOO_INFO@zookeeper_close@2579: Freeing zookeeper resources for sessionId=0x1679aa0ddc90001
      
      2018-12-10 16:18:26,510:28813(0x70000e3fc000):ZOO_INFO@log_env@753: Client environment:zookeeper.version=zookeeper C client 3.4.8
      2018-12-10 16:18:26,510:28813(0x70000e3fc000):ZOO_INFO@log_env@757: Client environment:host.name=Jenkinss-Mac-mini.local
      2018-12-10 16:18:26,510:28813(0x70000e3fc000):ZOO_INFO@log_env@764: Client environment:os.name=Darwin
      2018-12-10 16:18:26,510:28813(0x70000e3fc000):ZOO_INFO@log_env@765: Client environment:os.arch=18.2.0
      2018-12-10 16:18:26,510:28813(0x70000e3fc000):ZOO_INFO@log_env@766: Client environment:os.version=Darwin Kernel Version 18.2.0: Mon Nov 12 20:24:46 PST 2018; root:xnu-4903.231.4~2/RELEASE_X86_64
      2018-12-10 16:18:26,510:28813(0x70000e3fc000):ZOO_INFO@log_env@774: Client environment:user.name=jenkins
      2018-12-10 16:18:26,510:28813(0x70000e3fc000):ZOO_INFO@log_env@782: Client environment:user.home=/Users/jenkins
      2018-12-10 16:18:26,510:28813(0x70000e3fc000):ZOO_INFO@log_env@794: Client environment:user.dir=/Users/jenkins/workspace/workspace/mesos/Mesos_CI-build/FLAG/SSL/label/mac/mesos/build
      2018-12-10 16:18:26,510:28813(0x70000e3fc000):ZOO_INFO@zookeeper_init@827: Initiating client connection, host=127.0.0.1:54990 sessionTimeout=10000 watcher=0x116d03e00 sessionId=0 sessionPasswd=<null> context=0x7fd3885d9bc0 flags=0
      2018-12-10 16:18:26,511:28813(0x70000f944000):ZOO_INFO@check_events@1764: initiated connection to server [127.0.0.1:54990]
      2018-12-10 16:18:26,512:28813(0x70000f944000):ZOO_INFO@check_events@1811: session establishment complete on server [127.0.0.1:54990], sessionId=0x1679aa0ddc90005, negotiated timeout=10000
      I1210 16:18:26.512154 239058944 group.cpp:341] Group process (zookeeper-group(29)@10.0.49.4:54069) connected to ZooKeeper
      I1210 16:18:26.512214 239058944 group.cpp:831] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
      I1210 16:18:26.512230 239058944 group.cpp:419] Trying to create path '/mesos' in ZooKeeper
      2018-12-10 16:18:29,859:28813(0x70000f944000):ZOO_WARN@zookeeper_interest@1597: Exceeded deadline by 13ms
      ../../src/tests/master_contender_detector_tests.cpp:872: Failure
      Failed to wait 15secs for detected
      2018-12-10 16:18:38,091:28813(0x1142f65c0):ZOO_INFO@zookeeper_close@2562: Closing zookeeper sessionId=0x1679aa0ddc90003 to [127.0.0.1:54990]
      
      I1210 16:18:38.091266 338650560 contender.cpp:207] Now cancelling the membership: 1
      2018-12-10 16:18:38,091:28813(0x1142f65c0):ZOO_INFO@zookeeper_close@2562: Closing zookeeper sessionId=0x1679aa0ddc90005 to [127.0.0.1:54990]
      
      I1210 16:18:38.091816 338650560 contender.cpp:207] Now cancelling the membership: 0
      2018-12-10 16:18:38,092:28813(0x1142f65c0):ZOO_INFO@zookeeper_close@2562: Closing zookeeper sessionId=0x1679aa0ddc90004 to [127.0.0.1:54990]
      
      I1210 16:18:38.092182 338650560 zookeeper_test_server.cpp:116] Shutting down ZooKeeperTestServer on port 54990
      [  FAILED  ] ZooKeeperMasterContenderDetectorTest.MasterDetectorExpireSlaveZKSessionNewMaster (25032 ms)
      

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              bennoe Benno Evers
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated: