Details
-
Bug
-
Status: Open
-
Major
-
Resolution: Unresolved
-
1.7.0, 1.8.0
-
None
-
None
-
Ubuntu 16.04
Description
Observed this on the ASF CI - Ubuntu 16.04, all variants:
mesos-ec2-ubuntu-16.04-Clang.Mesos.HttpFaultToleranceTest.SchedulerSubscribeAfterFailoverTimeout mesos-ec2-ubuntu-16.04-Plain.Mesos.HttpFaultToleranceTest.SchedulerSubscribeAfterFailoverTimeout mesos-ec2-ubuntu-16.04-SSL_GRPC.Mesos.HttpFaultToleranceTest.SchedulerSubscribeAfterFailoverTimeout
04:04:32 [ RUN ] HttpFaultToleranceTest.SchedulerSubscribeAfterFailoverTimeout 04:04:32 I0907 04:04:32.274106 31619 cluster.cpp:173] Creating default 'local' authorizer 04:04:32 I0907 04:04:32.275048 31642 master.cpp:413] Master 7979925f-4333-4207-b20f-b7ab6ea59fdd (ip-172-16-10-184.ec2.internal) started on 172.16.10.184:35708 04:04:32 I0907 04:04:32.275074 31642 master.cpp:416] Flags at startup: --acls="" --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" --allocation_interval="1secs" --allocator="hierarchical" --authenticate_agents="true" --authenticate_frameworks="false" --authenticate_http_frameworks="true" --authenticate_http_readonly="true" --authenticate_http_readwrite="true" --authentication_v0_timeout="15secs" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/MwIsXa/credentials" --filter_gpu_resources="true" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --http_framework_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --max_unreachable_tasks_per_framework="1000" --memory_profiling="false" --min_allocatable_resources="cpus:0.01|mem:32" --port="5050" --quiet="false" --recovery_agent_removal_limit="100%" --registry="in_memory" --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" --registry_store_timeout="100secs" --registry_strict="false" --require_agent_domain="false" --role_sorter="drf" --root_submissions="true" --version="false" --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/MwIsXa/master" --zk_session_timeout="10secs" 04:04:32 I0907 04:04:32.275192 31642 master.cpp:467] Master allowing unauthenticated frameworks to register 04:04:32 I0907 04:04:32.275202 31642 master.cpp:471] Master only allowing authenticated agents to register 04:04:32 I0907 04:04:32.275208 31642 master.cpp:477] Master only allowing authenticated HTTP frameworks to register 04:04:32 I0907 04:04:32.275214 31642 credentials.hpp:37] Loading credentials for authentication from '/tmp/MwIsXa/credentials' 04:04:32 I0907 04:04:32.275274 31642 master.cpp:521] Using default 'crammd5' authenticator 04:04:32 I0907 04:04:32.275319 31642 http.cpp:1037] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readonly' 04:04:32 I0907 04:04:32.275353 31642 http.cpp:1037] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readwrite' 04:04:32 I0907 04:04:32.275379 31642 http.cpp:1037] Creating default 'basic' HTTP authenticator for realm 'mesos-master-scheduler' 04:04:32 I0907 04:04:32.275403 31642 master.cpp:602] Authorization enabled 04:04:32 I0907 04:04:32.275573 31643 hierarchical.cpp:182] Initialized hierarchical allocator process 04:04:32 I0907 04:04:32.275609 31643 whitelist_watcher.cpp:77] No whitelist given 04:04:32 I0907 04:04:32.276106 31646 master.cpp:2083] Elected as the leading master! 04:04:32 I0907 04:04:32.276127 31646 master.cpp:1638] Recovering from registrar 04:04:32 I0907 04:04:32.276168 31646 registrar.cpp:339] Recovering registrar 04:04:32 I0907 04:04:32.276291 31643 registrar.cpp:383] Successfully fetched the registry (0B) in 104960ns 04:04:32 I0907 04:04:32.276325 31643 registrar.cpp:487] Applied 1 operations in 7790ns; attempting to update the registry 04:04:32 I0907 04:04:32.276428 31646 registrar.cpp:544] Successfully updated the registry in 86016ns 04:04:32 I0907 04:04:32.276450 31646 registrar.cpp:416] Successfully recovered registrar 04:04:32 I0907 04:04:32.276528 31646 master.cpp:1752] Recovered 0 agents from the registry (176B); allowing 10mins for agents to reregister 04:04:32 I0907 04:04:32.276536 31641 hierarchical.cpp:220] Skipping recovery of hierarchical allocator: nothing to recover 04:04:32 I0907 04:04:32.276836 31619 scheduler.cpp:189] Version: 1.8.0 04:04:32 I0907 04:04:32.277160 31643 scheduler.cpp:355] Using default 'basic' HTTP authenticatee 04:04:32 I0907 04:04:32.277261 31643 scheduler.cpp:538] New master detected at master@172.16.10.184:35708 04:04:32 I0907 04:04:32.277271 31643 scheduler.cpp:547] Waiting for 0ns before initiating a re-(connection) attempt with the master 04:04:32 I0907 04:04:32.277776 31639 scheduler.cpp:429] Connected with the master at http://172.16.10.184:35708/master/api/v1/scheduler 04:04:32 I0907 04:04:32.278054 31642 scheduler.cpp:248] Sending SUBSCRIBE call to http://172.16.10.184:35708/master/api/v1/scheduler 04:04:32 I0907 04:04:32.278386 31643 process.cpp:3569] Handling HTTP event for process 'master' with path: '/master/api/v1/scheduler' 04:04:32 I0907 04:04:32.278705 31644 http.cpp:1177] HTTP POST for /master/api/v1/scheduler from 172.16.10.184:39922 04:04:32 I0907 04:04:32.278781 31644 master.cpp:2502] Received subscription request for HTTP framework 'default' 04:04:32 I0907 04:04:32.278818 31644 master.cpp:2155] Authorizing framework principal 'test-principal' to receive offers for roles '{ * }' 04:04:32 I0907 04:04:32.278971 31644 master.cpp:2637] Subscribing framework 'default' with checkpointing disabled and capabilities [ MULTI_ROLE, RESERVATION_REFINEMENT ] 04:04:32 I0907 04:04:32.279516 31644 master.cpp:9883] Adding framework 7979925f-4333-4207-b20f-b7ab6ea59fdd-0000 (default) with roles { } suppressed 04:04:32 I0907 04:04:32.279621 31642 hierarchical.cpp:306] Added framework 7979925f-4333-4207-b20f-b7ab6ea59fdd-0000 04:04:32 I0907 04:04:32.279753 31642 hierarchical.cpp:1564] Performed allocation for 0 agents in 9539ns 04:04:32 I0907 04:04:32.280083 31643 scheduler.cpp:845] Enqueuing event SUBSCRIBED received from http://172.16.10.184:35708/master/api/v1/scheduler 04:04:32 I0907 04:04:32.280647 31639 master.cpp:1366] Framework 7979925f-4333-4207-b20f-b7ab6ea59fdd-0000 (default) disconnected 04:04:32 I0907 04:04:32.280664 31639 master.cpp:3230] Deactivating framework 7979925f-4333-4207-b20f-b7ab6ea59fdd-0000 (default) 04:04:32 I0907 04:04:32.280678 31639 master.cpp:3207] Disconnecting framework 7979925f-4333-4207-b20f-b7ab6ea59fdd-0000 (default) 04:04:32 I0907 04:04:32.280686 31639 master.cpp:1381] Giving framework 7979925f-4333-4207-b20f-b7ab6ea59fdd-0000 (default) 2weeks to failover 04:04:32 I0907 04:04:32.280725 31639 hierarchical.cpp:420] Deactivated framework 7979925f-4333-4207-b20f-b7ab6ea59fdd-0000 04:04:32 I0907 04:04:32.284456 31640 hierarchical.cpp:1564] Performed allocation for 0 agents in 7296ns 04:04:32 I0907 04:04:32.284528 31640 master.cpp:1827] Skipping periodic registry garbage collection: no agents qualify for removal 04:04:32 I0907 04:04:32.284554 31640 master.cpp:9261] Framework failover timeout, removing framework 7979925f-4333-4207-b20f-b7ab6ea59fdd-0000 (default) 04:04:32 I0907 04:04:32.284564 31640 master.cpp:10197] Removing framework 7979925f-4333-4207-b20f-b7ab6ea59fdd-0000 (default) 04:04:32 I0907 04:04:32.284636 31643 hierarchical.cpp:359] Removed framework 7979925f-4333-4207-b20f-b7ab6ea59fdd-0000 04:04:32 I0907 04:04:32.284920 31619 scheduler.cpp:189] Version: 1.8.0 04:04:32 I0907 04:04:32.285246 31642 scheduler.cpp:355] Using default 'basic' HTTP authenticatee 04:04:32 I0907 04:04:32.285333 31642 scheduler.cpp:538] New master detected at master@172.16.10.184:35708 04:04:32 I0907 04:04:32.285346 31642 scheduler.cpp:547] Waiting for 0ns before initiating a re-(connection) attempt with the master 04:04:32 I0907 04:04:32.285820 31643 scheduler.cpp:429] Connected with the master at http://172.16.10.184:35708/master/api/v1/scheduler 04:04:32 I0907 04:04:32.286113 31645 scheduler.cpp:248] Sending SUBSCRIBE call to http://172.16.10.184:35708/master/api/v1/scheduler 04:04:32 I0907 04:04:32.286424 31640 process.cpp:3569] Handling HTTP event for process 'master' with path: '/master/api/v1/scheduler' 04:04:32 I0907 04:04:32.286682 31640 http.cpp:1177] HTTP POST for /master/api/v1/scheduler from 172.16.10.184:39926 04:04:32 I0907 04:04:32.286759 31640 master.cpp:2502] Received subscription request for HTTP framework 'default' 04:04:32 I0907 04:04:32.286778 31640 master.cpp:2574] Refusing subscription of framework 'default': Framework has been removed 04:04:32 I0907 04:04:32.292038 31646 scheduler.cpp:512] Re-detecting master 04:04:32 I0907 04:04:32.292126 31646 scheduler.cpp:799] Ignoring event from old stale connection 04:04:32 I0907 04:04:32.292181 31646 scheduler.cpp:463] Ignoring disconnection attempt from stale connection 04:04:32 I0907 04:04:32.292212 31646 scheduler.cpp:538] New master detected at master@172.16.10.184:35708 04:04:32 I0907 04:04:32.292228 31646 scheduler.cpp:547] Waiting for 0ns before initiating a re-(connection) attempt with the master 04:04:32 I0907 04:04:32.292666 31644 scheduler.cpp:429] Connected with the master at http://172.16.10.184:35708/master/api/v1/scheduler 04:04:33 I0907 04:04:33.285504 31643 hierarchical.cpp:1564] Performed allocation for 0 agents in 11191ns 04:04:34 I0907 04:04:34.286386 31644 hierarchical.cpp:1564] Performed allocation for 0 agents in 11069ns 04:04:35 I0907 04:04:35.287335 31639 hierarchical.cpp:1564] Performed allocation for 0 agents in 10568ns 04:04:36 I0907 04:04:36.288154 31640 hierarchical.cpp:1564] Performed allocation for 0 agents in 11117ns 04:04:37 I0907 04:04:37.289096 31646 hierarchical.cpp:1564] Performed allocation for 0 agents in 11680ns 04:04:38 I0907 04:04:38.289909 31642 hierarchical.cpp:1564] Performed allocation for 0 agents in 11237ns 04:04:39 I0907 04:04:39.290695 31641 hierarchical.cpp:1564] Performed allocation for 0 agents in 11360ns 04:04:40 I0907 04:04:40.291617 31645 hierarchical.cpp:1564] Performed allocation for 0 agents in 11394ns 04:04:41 I0907 04:04:41.292557 31643 hierarchical.cpp:1564] Performed allocation for 0 agents in 12273ns 04:04:42 I0907 04:04:42.293593 31644 hierarchical.cpp:1564] Performed allocation for 0 agents in 11371ns 04:04:43 I0907 04:04:43.294623 31639 hierarchical.cpp:1564] Performed allocation for 0 agents in 12066ns 04:04:44 I0907 04:04:44.295605 31640 hierarchical.cpp:1564] Performed allocation for 0 agents in 12010ns 04:04:45 I0907 04:04:45.296494 31646 hierarchical.cpp:1564] Performed allocation for 0 agents in 11259ns 04:04:46 I0907 04:04:46.297247 31642 hierarchical.cpp:1564] Performed allocation for 0 agents in 11839ns 04:04:47 ../../src/tests/http_fault_tolerance_tests.cpp:200: Failure 04:04:47 Failed to wait 15secs for error 04:04:47 ../../src/tests/http_fault_tolerance_tests.cpp:182: Failure 04:04:47 Actual function call count doesn't match EXPECT_CALL(*scheduler, error(_, _))... 04:04:47 Expected: to be called once 04:04:47 Actual: never called - unsatisfied and active 04:04:47 I0907 04:04:47.287803 31619 master.cpp:1093] Master terminating 04:04:47 [ FAILED ] HttpFaultToleranceTest.SchedulerSubscribeAfterFailoverTimeout (15016 ms)
Attachments
Issue Links
- blocks
-
MESOS-9189 Include 'Connection: close' header in master streaming API responses.
- Accepted
- is related to
-
MESOS-9216 SchedulerTest.SchedulerFailover is flaky and times out.
- Open
-
MESOS-7111 HttpFaultToleranceTest.SchedulerFailoverFrameworkToExecutorMessage segfaults
- Accepted