Mesos / MESOS-9216

SchedulerTest.SchedulerFailover is flaky and times out.

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.8.0
    • Fix Version/s: None
    • Component/s: scheduler api, test
    • Labels:
    • Environment:

      debian-9, centos-6, ubuntu-16.04, ..., macOS

      Description

      Easily reproducible on macOS, and also observed on the ASF CI:

      $ ./bin/mesos-tests.sh --gtest_filter="*SchedulerTest.SchedulerFailover*" --gtest_repeat=100 --gtest_break_on_failure --verbose
      
      [...]
      Repeating all tests (iteration 61) . . .
      [...]
      [ RUN      ] ContentType/SchedulerTest.SchedulerFailover/1
      I0907 11:31:42.409766 311620992 cluster.cpp:173] Creating default 'local' authorizer
      I0907 11:31:42.411957 110624768 master.cpp:413] Master 4450e893-595f-48c2-9ea2-31325fda2c76 (lobomacpro4.fritz.box) started on 192.168.178.20:54546
      I0907 11:31:42.411975 110624768 master.cpp:416] Flags at startup: --acls="" --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" --allocation_interval="1secs" --allocator="hierarchical" --authenticate_agents="true" --authenticate_frameworks="true" --authenticate_http_frameworks="true" --authenticate_http_readonly="true" --authenticate_http_readwrite="true" --authentication_v0_timeout="15secs" --authenticators="crammd5" --authorizers="local" --credentials="/private/var/folders/66/mgr662nx7t90lspb7wjg8ctr0000gn/T/aVGDNy/credentials" --filter_gpu_resources="true" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --http_framework_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --max_unreachable_tasks_per_framework="1000" --memory_profiling="false" --min_allocatable_resources="cpus:0.01|mem:32" --port="5050" --quiet="false" --recovery_agent_removal_limit="100%" --registry="in_memory" --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" --registry_store_timeout="100secs" --registry_strict="false" --require_agent_domain="false" --role_sorter="drf" --root_submissions="true" --version="false" --webui_dir="/usr/local/share/mesos/webui" --work_dir="/private/var/folders/66/mgr662nx7t90lspb7wjg8ctr0000gn/T/aVGDNy/master" --zk_session_timeout="10secs"
      I0907 11:31:42.412191 110624768 master.cpp:465] Master only allowing authenticated frameworks to register
      I0907 11:31:42.412202 110624768 master.cpp:471] Master only allowing authenticated agents to register
      I0907 11:31:42.412210 110624768 master.cpp:477] Master only allowing authenticated HTTP frameworks to register
      I0907 11:31:42.412219 110624768 credentials.hpp:37] Loading credentials for authentication from '/private/var/folders/66/mgr662nx7t90lspb7wjg8ctr0000gn/T/aVGDNy/credentials'
      I0907 11:31:42.412322 110624768 master.cpp:521] Using default 'crammd5' authenticator
      I0907 11:31:42.412355 110624768 http.cpp:1037] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readonly'
      I0907 11:31:42.412390 110624768 http.cpp:1037] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readwrite'
      I0907 11:31:42.412417 110624768 http.cpp:1037] Creating default 'basic' HTTP authenticator for realm 'mesos-master-scheduler'
      I0907 11:31:42.412439 110624768 master.cpp:602] Authorization enabled
      I0907 11:31:42.413738 110624768 master.cpp:2083] Elected as the leading master!
      I0907 11:31:42.413750 110624768 master.cpp:1638] Recovering from registrar
      I0907 11:31:42.413913 109551616 registrar.cpp:383] Successfully fetched the registry (0B) in 128us
      I0907 11:31:42.413962 109551616 registrar.cpp:487] Applied 1 operations in 19755ns; attempting to update the registry
      I0907 11:31:42.414093 109551616 registrar.cpp:544] Successfully updated the registry in 107008ns
      I0907 11:31:42.414126 109551616 registrar.cpp:416] Successfully recovered registrar
      I0907 11:31:42.414232 110624768 master.cpp:1752] Recovered 0 agents from the registry (162B); allowing 10mins for agents to reregister
      I0907 11:31:42.414614 311620992 scheduler.cpp:189] Version: 1.8.0
      I0907 11:31:42.415856 113844224 scheduler.cpp:355] Using default 'basic' HTTP authenticatee
      I0907 11:31:42.415974 112771072 scheduler.cpp:538] New master detected at master@192.168.178.20:54546
      I0907 11:31:42.417650 113844224 http.cpp:1177] HTTP POST for /master/api/v1/scheduler from 192.168.178.20:55273
      I0907 11:31:42.417768 113844224 master.cpp:2502] Received subscription request for HTTP framework 'default'
      I0907 11:31:42.417788 113844224 master.cpp:2155] Authorizing framework principal 'test-principal' to receive offers for roles '{ * }'
      I0907 11:31:42.417914 113844224 master.cpp:2637] Subscribing framework 'default' with checkpointing disabled and capabilities [ MULTI_ROLE, RESERVATION_REFINEMENT ]
      I0907 11:31:42.418388 113844224 master.cpp:9883] Adding framework 4450e893-595f-48c2-9ea2-31325fda2c76-0000 (default) with roles {  } suppressed
      I0907 11:31:42.418522 110624768 hierarchical.cpp:306] Added framework 4450e893-595f-48c2-9ea2-31325fda2c76-0000
      I0907 11:31:42.419454 311620992 scheduler.cpp:189] Version: 1.8.0
      I0907 11:31:42.420704 110088192 scheduler.cpp:355] Using default 'basic' HTTP authenticatee
      I0907 11:31:42.420807 111161344 scheduler.cpp:538] New master detected at master@192.168.178.20:54546
      I0907 11:31:42.422297 113844224 http.cpp:1177] HTTP POST for /master/api/v1/scheduler from 192.168.178.20:55275
      I0907 11:31:42.422423 113844224 master.cpp:2502] Received subscription request for HTTP framework 'default'
      I0907 11:31:42.422446 113844224 master.cpp:2155] Authorizing framework principal 'test-principal' to receive offers for roles '{ * }'
      I0907 11:31:42.422591 113844224 master.cpp:2637] Subscribing framework 'default' with checkpointing disabled and capabilities [ MULTI_ROLE, RESERVATION_REFINEMENT ]
      I0907 11:31:42.422608 113844224 master.cpp:7760] Updating framework 4450e893-595f-48c2-9ea2-31325fda2c76-0000 (default) with roles {  } suppressed
      I0907 11:31:42.422904 111161344 master.cpp:1226] Ignoring disconnection for framework 4450e893-595f-48c2-9ea2-31325fda2c76-0000 (default) as it has already reconnected
      I0907 11:31:42.423132 113844224 scheduler.cpp:512] Re-detecting master
      I0907 11:31:42.423475 113844224 scheduler.cpp:538] New master detected at master@192.168.178.20:54546
      ../../src/tests/scheduler_tests.cpp:251: Failure
      Failed to wait 15secs for error
      *** Aborted at 1536312717 (unix time) try "date -d @1536312717" if you are using GNU date ***
      PC: @        0x10d891ded testing::UnitTest::AddTestPartResult()
      *** SIGSEGV (@0x0) received by PID 16639 (TID 0x11292f580) stack trace: ***
          @     0x7fff72af7b3d _sigtramp
          @        0x1108a1a00 (unknown)
          @        0x10d8915e7 testing::internal::AssertHelper::operator=()
          @        0x10cf83948 mesos::internal::tests::SchedulerTest_SchedulerFailover_Test::TestBody()
          @        0x10d904c4e testing::internal::HandleSehExceptionsInMethodIfSupported<>()
          @        0x10d8a9a9b testing::internal::HandleExceptionsInMethodIfSupported<>()
          @        0x10d8a99c6 testing::Test::Run()
          @        0x10d8ab79d testing::TestInfo::Run()
          @        0x10d8acddc testing::TestCase::Run()
          @        0x10d8bd2cc testing::internal::UnitTestImpl::RunAllTests()
          @        0x10d90779e testing::internal::HandleSehExceptionsInMethodIfSupported<>()
          @        0x10d8bcceb testing::internal::HandleExceptionsInMethodIfSupported<>()
          @        0x10d8bcbac testing::UnitTest::Run()
          @        0x10c1f52f1 RUN_ALL_TESTS()
          @        0x10c1f0c9c main
          @     0x7fff7290e0a1 start
      Segmentation fault: 11
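
      Because the failure only shows up after many iterations (iteration 61 in the run above), it can help to capture the log of just the failing run. The following is a minimal sketch, not part of Mesos: `repeat_until_failure`, the `LOG_DIR` variable, and the `run-N.log` file names are hypothetical, and it assumes the test binary exits non-zero on failure, as the crash above would cause.

```shell
#!/bin/sh
# Hypothetical helper (not part of Mesos): run a command up to N times,
# writing each run's output to its own log file, and stop at the first
# failing run so that only its log is kept.
repeat_until_failure() {
  max_runs=$1
  shift
  log_dir=${LOG_DIR:-.}   # assumed variable; defaults to the current directory
  i=1
  while [ "$i" -le "$max_runs" ]; do
    log="$log_dir/run-$i.log"
    if ! "$@" > "$log" 2>&1; then
      echo "failed on iteration $i, see $log"
      return 1
    fi
    rm -f "$log"          # discard logs of passing runs
    i=$((i + 1))
  done
  echo "no failure in $max_runs runs"
  return 0
}

# Example invocation, using only the flags from the report above:
# repeat_until_failure 100 ./bin/mesos-tests.sh \
#   --gtest_filter="*SchedulerTest.SchedulerFailover*" --verbose
```

      Compared with `--gtest_repeat=100`, this starts a fresh process for each iteration, so it may reproduce the flake at a different rate.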
      

              People

              • Assignee:
                Unassigned
                Reporter:
                tillt Till Toenshoff
              • Votes:
                0
                Watchers:
                2
