Uploaded image for project: 'Mesos'
  1. Mesos
  2. MESOS-9215

SchedulerSubscribeAfterFailoverTimeout times out.

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.7.0, 1.8.0
    • Fix Version/s: None
    • Component/s: flaky, scheduler api, test
    • Labels:
      None
    • Environment:

      Ubuntu 16.04

      Description

      Observed this on the ASF CI - Ubuntu 16.04, all variants:

      mesos-ec2-ubuntu-16.04-Clang.Mesos.HttpFaultToleranceTest.SchedulerSubscribeAfterFailoverTimeout
      mesos-ec2-ubuntu-16.04-Plain.Mesos.HttpFaultToleranceTest.SchedulerSubscribeAfterFailoverTimeout
      mesos-ec2-ubuntu-16.04-SSL_GRPC.Mesos.HttpFaultToleranceTest.SchedulerSubscribeAfterFailoverTimeout
      
      04:04:32  [ RUN      ] HttpFaultToleranceTest.SchedulerSubscribeAfterFailoverTimeout
      04:04:32  I0907 04:04:32.274106 31619 cluster.cpp:173] Creating default 'local' authorizer
      04:04:32  I0907 04:04:32.275048 31642 master.cpp:413] Master 7979925f-4333-4207-b20f-b7ab6ea59fdd (ip-172-16-10-184.ec2.internal) started on 172.16.10.184:35708
      04:04:32  I0907 04:04:32.275074 31642 master.cpp:416] Flags at startup: --acls="" --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" --allocation_interval="1secs" --allocator="hierarchical" --authenticate_agents="true" --authenticate_frameworks="false" --authenticate_http_frameworks="true" --authenticate_http_readonly="true" --authenticate_http_readwrite="true" --authentication_v0_timeout="15secs" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/MwIsXa/credentials" --filter_gpu_resources="true" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --http_framework_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --max_unreachable_tasks_per_framework="1000" --memory_profiling="false" --min_allocatable_resources="cpus:0.01|mem:32" --port="5050" --quiet="false" --recovery_agent_removal_limit="100%" --registry="in_memory" --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" --registry_store_timeout="100secs" --registry_strict="false" --require_agent_domain="false" --role_sorter="drf" --root_submissions="true" --version="false" --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/MwIsXa/master" --zk_session_timeout="10secs"
      04:04:32  I0907 04:04:32.275192 31642 master.cpp:467] Master allowing unauthenticated frameworks to register
      04:04:32  I0907 04:04:32.275202 31642 master.cpp:471] Master only allowing authenticated agents to register
      04:04:32  I0907 04:04:32.275208 31642 master.cpp:477] Master only allowing authenticated HTTP frameworks to register
      04:04:32  I0907 04:04:32.275214 31642 credentials.hpp:37] Loading credentials for authentication from '/tmp/MwIsXa/credentials'
      04:04:32  I0907 04:04:32.275274 31642 master.cpp:521] Using default 'crammd5' authenticator
      04:04:32  I0907 04:04:32.275319 31642 http.cpp:1037] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readonly'
      04:04:32  I0907 04:04:32.275353 31642 http.cpp:1037] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readwrite'
      04:04:32  I0907 04:04:32.275379 31642 http.cpp:1037] Creating default 'basic' HTTP authenticator for realm 'mesos-master-scheduler'
      04:04:32  I0907 04:04:32.275403 31642 master.cpp:602] Authorization enabled
      04:04:32  I0907 04:04:32.275573 31643 hierarchical.cpp:182] Initialized hierarchical allocator process
      04:04:32  I0907 04:04:32.275609 31643 whitelist_watcher.cpp:77] No whitelist given
      04:04:32  I0907 04:04:32.276106 31646 master.cpp:2083] Elected as the leading master!
      04:04:32  I0907 04:04:32.276127 31646 master.cpp:1638] Recovering from registrar
      04:04:32  I0907 04:04:32.276168 31646 registrar.cpp:339] Recovering registrar
      04:04:32  I0907 04:04:32.276291 31643 registrar.cpp:383] Successfully fetched the registry (0B) in 104960ns
      04:04:32  I0907 04:04:32.276325 31643 registrar.cpp:487] Applied 1 operations in 7790ns; attempting to update the registry
      04:04:32  I0907 04:04:32.276428 31646 registrar.cpp:544] Successfully updated the registry in 86016ns
      04:04:32  I0907 04:04:32.276450 31646 registrar.cpp:416] Successfully recovered registrar
      04:04:32  I0907 04:04:32.276528 31646 master.cpp:1752] Recovered 0 agents from the registry (176B); allowing 10mins for agents to reregister
      04:04:32  I0907 04:04:32.276536 31641 hierarchical.cpp:220] Skipping recovery of hierarchical allocator: nothing to recover
      04:04:32  I0907 04:04:32.276836 31619 scheduler.cpp:189] Version: 1.8.0
      04:04:32  I0907 04:04:32.277160 31643 scheduler.cpp:355] Using default 'basic' HTTP authenticatee
      04:04:32  I0907 04:04:32.277261 31643 scheduler.cpp:538] New master detected at master@172.16.10.184:35708
      04:04:32  I0907 04:04:32.277271 31643 scheduler.cpp:547] Waiting for 0ns before initiating a re-(connection) attempt with the master
      04:04:32  I0907 04:04:32.277776 31639 scheduler.cpp:429] Connected with the master at http://172.16.10.184:35708/master/api/v1/scheduler
      04:04:32  I0907 04:04:32.278054 31642 scheduler.cpp:248] Sending SUBSCRIBE call to http://172.16.10.184:35708/master/api/v1/scheduler
      04:04:32  I0907 04:04:32.278386 31643 process.cpp:3569] Handling HTTP event for process 'master' with path: '/master/api/v1/scheduler'
      04:04:32  I0907 04:04:32.278705 31644 http.cpp:1177] HTTP POST for /master/api/v1/scheduler from 172.16.10.184:39922
      04:04:32  I0907 04:04:32.278781 31644 master.cpp:2502] Received subscription request for HTTP framework 'default'
      04:04:32  I0907 04:04:32.278818 31644 master.cpp:2155] Authorizing framework principal 'test-principal' to receive offers for roles '{ * }'
      04:04:32  I0907 04:04:32.278971 31644 master.cpp:2637] Subscribing framework 'default' with checkpointing disabled and capabilities [ MULTI_ROLE, RESERVATION_REFINEMENT ]
      04:04:32  I0907 04:04:32.279516 31644 master.cpp:9883] Adding framework 7979925f-4333-4207-b20f-b7ab6ea59fdd-0000 (default) with roles {  } suppressed
      04:04:32  I0907 04:04:32.279621 31642 hierarchical.cpp:306] Added framework 7979925f-4333-4207-b20f-b7ab6ea59fdd-0000
      04:04:32  I0907 04:04:32.279753 31642 hierarchical.cpp:1564] Performed allocation for 0 agents in 9539ns
      04:04:32  I0907 04:04:32.280083 31643 scheduler.cpp:845] Enqueuing event SUBSCRIBED received from http://172.16.10.184:35708/master/api/v1/scheduler
      04:04:32  I0907 04:04:32.280647 31639 master.cpp:1366] Framework 7979925f-4333-4207-b20f-b7ab6ea59fdd-0000 (default) disconnected
      04:04:32  I0907 04:04:32.280664 31639 master.cpp:3230] Deactivating framework 7979925f-4333-4207-b20f-b7ab6ea59fdd-0000 (default)
      04:04:32  I0907 04:04:32.280678 31639 master.cpp:3207] Disconnecting framework 7979925f-4333-4207-b20f-b7ab6ea59fdd-0000 (default)
      04:04:32  I0907 04:04:32.280686 31639 master.cpp:1381] Giving framework 7979925f-4333-4207-b20f-b7ab6ea59fdd-0000 (default) 2weeks to failover
      04:04:32  I0907 04:04:32.280725 31639 hierarchical.cpp:420] Deactivated framework 7979925f-4333-4207-b20f-b7ab6ea59fdd-0000
      04:04:32  I0907 04:04:32.284456 31640 hierarchical.cpp:1564] Performed allocation for 0 agents in 7296ns
      04:04:32  I0907 04:04:32.284528 31640 master.cpp:1827] Skipping periodic registry garbage collection: no agents qualify for removal
      04:04:32  I0907 04:04:32.284554 31640 master.cpp:9261] Framework failover timeout, removing framework 7979925f-4333-4207-b20f-b7ab6ea59fdd-0000 (default)
      04:04:32  I0907 04:04:32.284564 31640 master.cpp:10197] Removing framework 7979925f-4333-4207-b20f-b7ab6ea59fdd-0000 (default)
      04:04:32  I0907 04:04:32.284636 31643 hierarchical.cpp:359] Removed framework 7979925f-4333-4207-b20f-b7ab6ea59fdd-0000
      04:04:32  I0907 04:04:32.284920 31619 scheduler.cpp:189] Version: 1.8.0
      04:04:32  I0907 04:04:32.285246 31642 scheduler.cpp:355] Using default 'basic' HTTP authenticatee
      04:04:32  I0907 04:04:32.285333 31642 scheduler.cpp:538] New master detected at master@172.16.10.184:35708
      04:04:32  I0907 04:04:32.285346 31642 scheduler.cpp:547] Waiting for 0ns before initiating a re-(connection) attempt with the master
      04:04:32  I0907 04:04:32.285820 31643 scheduler.cpp:429] Connected with the master at http://172.16.10.184:35708/master/api/v1/scheduler
      04:04:32  I0907 04:04:32.286113 31645 scheduler.cpp:248] Sending SUBSCRIBE call to http://172.16.10.184:35708/master/api/v1/scheduler
      04:04:32  I0907 04:04:32.286424 31640 process.cpp:3569] Handling HTTP event for process 'master' with path: '/master/api/v1/scheduler'
      04:04:32  I0907 04:04:32.286682 31640 http.cpp:1177] HTTP POST for /master/api/v1/scheduler from 172.16.10.184:39926
      04:04:32  I0907 04:04:32.286759 31640 master.cpp:2502] Received subscription request for HTTP framework 'default'
      04:04:32  I0907 04:04:32.286778 31640 master.cpp:2574] Refusing subscription of framework 'default': Framework has been removed
      04:04:32  I0907 04:04:32.292038 31646 scheduler.cpp:512] Re-detecting master
      04:04:32  I0907 04:04:32.292126 31646 scheduler.cpp:799] Ignoring event from old stale connection
      04:04:32  I0907 04:04:32.292181 31646 scheduler.cpp:463] Ignoring disconnection attempt from stale connection
      04:04:32  I0907 04:04:32.292212 31646 scheduler.cpp:538] New master detected at master@172.16.10.184:35708
      04:04:32  I0907 04:04:32.292228 31646 scheduler.cpp:547] Waiting for 0ns before initiating a re-(connection) attempt with the master
      04:04:32  I0907 04:04:32.292666 31644 scheduler.cpp:429] Connected with the master at http://172.16.10.184:35708/master/api/v1/scheduler
      04:04:33  I0907 04:04:33.285504 31643 hierarchical.cpp:1564] Performed allocation for 0 agents in 11191ns
      04:04:34  I0907 04:04:34.286386 31644 hierarchical.cpp:1564] Performed allocation for 0 agents in 11069ns
      04:04:35  I0907 04:04:35.287335 31639 hierarchical.cpp:1564] Performed allocation for 0 agents in 10568ns
      04:04:36  I0907 04:04:36.288154 31640 hierarchical.cpp:1564] Performed allocation for 0 agents in 11117ns
      04:04:37  I0907 04:04:37.289096 31646 hierarchical.cpp:1564] Performed allocation for 0 agents in 11680ns
      04:04:38  I0907 04:04:38.289909 31642 hierarchical.cpp:1564] Performed allocation for 0 agents in 11237ns
      04:04:39  I0907 04:04:39.290695 31641 hierarchical.cpp:1564] Performed allocation for 0 agents in 11360ns
      04:04:40  I0907 04:04:40.291617 31645 hierarchical.cpp:1564] Performed allocation for 0 agents in 11394ns
      04:04:41  I0907 04:04:41.292557 31643 hierarchical.cpp:1564] Performed allocation for 0 agents in 12273ns
      04:04:42  I0907 04:04:42.293593 31644 hierarchical.cpp:1564] Performed allocation for 0 agents in 11371ns
      04:04:43  I0907 04:04:43.294623 31639 hierarchical.cpp:1564] Performed allocation for 0 agents in 12066ns
      04:04:44  I0907 04:04:44.295605 31640 hierarchical.cpp:1564] Performed allocation for 0 agents in 12010ns
      04:04:45  I0907 04:04:45.296494 31646 hierarchical.cpp:1564] Performed allocation for 0 agents in 11259ns
      04:04:46  I0907 04:04:46.297247 31642 hierarchical.cpp:1564] Performed allocation for 0 agents in 11839ns
      04:04:47  ../../src/tests/http_fault_tolerance_tests.cpp:200: Failure
      04:04:47  Failed to wait 15secs for error
      04:04:47  ../../src/tests/http_fault_tolerance_tests.cpp:182: Failure
      04:04:47  Actual function call count doesn't match EXPECT_CALL(*scheduler, error(_, _))...
      04:04:47           Expected: to be called once
      04:04:47             Actual: never called - unsatisfied and active
      04:04:47  I0907 04:04:47.287803 31619 master.cpp:1093] Master terminating
      04:04:47  [  FAILED  ] HttpFaultToleranceTest.SchedulerSubscribeAfterFailoverTimeout (15016 ms)
      

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                Unassigned
                Reporter:
                tillt Till Toenshoff
              • Votes:
                0 Vote for this issue
                Watchers:
                2 Start watching this issue

                Dates

                • Created:
                  Updated: