Uploaded image for project: 'Mesos'
  1. Mesos
  2. MESOS-5223

MasterAllocatorTest/1.RebalancedForUpdatedWeights is flaky

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Duplicate
    • None
    • None
    • allocation, flaky, test

    Description

      I0415 06:52:22.243783 31906 cluster.cpp:149] Creating default 'local' authorizer
      I0415 06:52:22.365927 31906 leveldb.cpp:174] Opened db in 121.715227ms
      I0415 06:52:22.413648 31906 leveldb.cpp:181] Compacted db in 47.651756ms
      I0415 06:52:22.413713 31906 leveldb.cpp:196] Created db iterator in 25647ns
      I0415 06:52:22.413729 31906 leveldb.cpp:202] Seeked to beginning of db in 1890ns
      I0415 06:52:22.413741 31906 leveldb.cpp:271] Iterated through 0 keys in the db in 317ns
      I0415 06:52:22.413800 31906 replica.cpp:779] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
      I0415 06:52:22.414681 31939 recover.cpp:447] Starting replica recovery
      I0415 06:52:22.414999 31939 recover.cpp:473] Replica is in EMPTY status
      I0415 06:52:22.416792 31939 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (17242)@172.17.0.2:44024
      I0415 06:52:22.417222 31925 recover.cpp:193] Received a recover response from a replica in EMPTY status
      I0415 06:52:22.417966 31925 recover.cpp:564] Updating replica status to STARTING
      I0415 06:52:22.421860 31933 master.cpp:382] Master c4bfcab0-cd45-4c65-953a-f810c14806e0 (37d6f4eebe29) started on 172.17.0.2:44024
      I0415 06:52:22.421900 31933 master.cpp:384] Flags at startup: --acls="" --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate="false" --authenticate_http="true" --authenticate_slaves="true" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/ImAAfx/credentials" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" --quiet="false" --recovery_slave_removal_limit="100%" --registry="replicated_log" --registry_fetch_timeout="1mins" --registry_store_timeout="100secs" --registry_strict="true" --root_submissions="true" --slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" --webui_dir="/mesos/mesos-0.29.0/_inst/share/mesos/webui" --work_dir="/tmp/ImAAfx/master" --zk_session_timeout="10secs"
      I0415 06:52:22.422327 31933 master.cpp:435] Master allowing unauthenticated frameworks to register
      I0415 06:52:22.422339 31933 master.cpp:438] Master only allowing authenticated agents to register
      I0415 06:52:22.422349 31933 credentials.hpp:37] Loading credentials for authentication from '/tmp/ImAAfx/credentials'
      I0415 06:52:22.422750 31933 master.cpp:480] Using default 'crammd5' authenticator
      I0415 06:52:22.422914 31933 master.cpp:551] Using default 'basic' HTTP authenticator
      I0415 06:52:22.423054 31933 master.cpp:589] Authorization enabled
      I0415 06:52:22.423259 31926 hierarchical.cpp:142] Initialized hierarchical allocator process
      I0415 06:52:22.423327 31926 whitelist_watcher.cpp:77] No whitelist given
      I0415 06:52:22.425593 31937 master.cpp:1832] The newly elected leader is master@172.17.0.2:44024 with id c4bfcab0-cd45-4c65-953a-f810c14806e0
      I0415 06:52:22.425631 31937 master.cpp:1845] Elected as the leading master!
      I0415 06:52:22.425650 31937 master.cpp:1532] Recovering from registrar
      I0415 06:52:22.425915 31937 registrar.cpp:331] Recovering registrar
      I0415 06:52:22.458044 31928 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 39.766176ms
      I0415 06:52:22.458093 31928 replica.cpp:320] Persisted replica status to STARTING
      I0415 06:52:22.458391 31928 recover.cpp:473] Replica is in STARTING status
      I0415 06:52:22.459728 31930 replica.cpp:673] Replica in STARTING status received a broadcasted recover request from (17245)@172.17.0.2:44024
      I0415 06:52:22.459952 31928 recover.cpp:193] Received a recover response from a replica in STARTING status
      I0415 06:52:22.460414 31925 recover.cpp:564] Updating replica status to VOTING
      I0415 06:52:22.499866 31925 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 39.170393ms
      I0415 06:52:22.499905 31925 replica.cpp:320] Persisted replica status to VOTING
      I0415 06:52:22.500013 31927 recover.cpp:578] Successfully joined the Paxos group
      I0415 06:52:22.500238 31927 recover.cpp:462] Recover process terminated
      I0415 06:52:22.500746 31925 log.cpp:659] Attempting to start the writer
      I0415 06:52:22.501936 31927 replica.cpp:493] Replica received implicit promise request from (17246)@172.17.0.2:44024 with proposal 1
      I0415 06:52:22.541733 31927 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 39.750816ms
      I0415 06:52:22.541791 31927 replica.cpp:342] Persisted promised to 1
      I0415 06:52:22.542614 31925 coordinator.cpp:238] Coordinator attempting to fill missing positions
      I0415 06:52:22.543927 31940 replica.cpp:388] Replica received explicit promise request from (17247)@172.17.0.2:44024 for position 0 with proposal 2
      I0415 06:52:22.583595 31940 leveldb.cpp:341] Persisting action (8 bytes) to leveldb took 39.617873ms
      I0415 06:52:22.583636 31940 replica.cpp:712] Persisted action at 0
      I0415 06:52:22.585011 31940 replica.cpp:537] Replica received write request for position 0 from (17248)@172.17.0.2:44024
      I0415 06:52:22.585075 31940 leveldb.cpp:436] Reading position from leveldb took 31296ns
      I0415 06:52:22.626227 31940 leveldb.cpp:341] Persisting action (14 bytes) to leveldb took 41.109487ms
      I0415 06:52:22.626267 31940 replica.cpp:712] Persisted action at 0
      I0415 06:52:22.627079 31932 replica.cpp:691] Replica received learned notice for position 0 from @0.0.0.0:0
      I0415 06:52:22.659732 31932 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 32.618299ms
      I0415 06:52:22.659770 31932 replica.cpp:712] Persisted action at 0
      I0415 06:52:22.659795 31932 replica.cpp:697] Replica learned NOP action at position 0
      I0415 06:52:22.660634 31933 log.cpp:675] Writer started with ending position 0
      I0415 06:52:22.661909 31926 leveldb.cpp:436] Reading position from leveldb took 35058ns
      I0415 06:52:22.662916 31929 registrar.cpp:364] Successfully fetched the registry (0B) in 236.949248ms
      I0415 06:52:22.663043 31929 registrar.cpp:463] Applied 1 operations in 34456ns; attempting to update the 'registry'
      I0415 06:52:22.663756 31929 log.cpp:683] Attempting to append 170 bytes to the log
      I0415 06:52:22.663882 31937 coordinator.cpp:348] Coordinator attempting to write APPEND action at position 1
      I0415 06:52:22.664733 31937 replica.cpp:537] Replica received write request for position 1 from (17249)@172.17.0.2:44024
      I0415 06:52:22.684954 31937 leveldb.cpp:341] Persisting action (189 bytes) to leveldb took 20.181228ms
      I0415 06:52:22.684986 31937 replica.cpp:712] Persisted action at 1
      I0415 06:52:22.685683 31929 replica.cpp:691] Replica received learned notice for position 1 from @0.0.0.0:0
      I0415 06:52:22.710132 31929 leveldb.cpp:341] Persisting action (191 bytes) to leveldb took 24.40419ms
      I0415 06:52:22.710196 31929 replica.cpp:712] Persisted action at 1
      I0415 06:52:22.710227 31929 replica.cpp:697] Replica learned APPEND action at position 1
      I0415 06:52:22.711658 31935 registrar.cpp:508] Successfully updated the 'registry' in 48.544768ms
      I0415 06:52:22.711805 31935 registrar.cpp:394] Successfully recovered registrar
      I0415 06:52:22.711860 31930 log.cpp:702] Attempting to truncate the log to 1
      I0415 06:52:22.711977 31932 coordinator.cpp:348] Coordinator attempting to write TRUNCATE action at position 2
      I0415 06:52:22.712251 31930 master.cpp:1640] Recovered 0 agents from the Registry (131B) ; allowing 10mins for agents to re-register
      I0415 06:52:22.712287 31925 hierarchical.cpp:169] Skipping recovery of hierarchical allocator: nothing to recover
      I0415 06:52:22.712888 31929 replica.cpp:537] Replica received write request for position 2 from (17250)@172.17.0.2:44024
      I0415 06:52:22.735254 31929 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 22.230197ms
      I0415 06:52:22.735362 31929 replica.cpp:712] Persisted action at 2
      I0415 06:52:22.736608 31929 replica.cpp:691] Replica received learned notice for position 2 from @0.0.0.0:0
      I0415 06:52:22.760432 31929 leveldb.cpp:341] Persisting action (18 bytes) to leveldb took 23.583689ms
      I0415 06:52:22.760596 31929 leveldb.cpp:399] Deleting ~1 keys from leveldb took 54889ns
      I0415 06:52:22.760715 31929 replica.cpp:712] Persisted action at 2
      I0415 06:52:22.760831 31929 replica.cpp:697] Replica learned TRUNCATE action at position 2
      I0415 06:52:22.764772 31906 scheduler.cpp:175] Version: 0.29.0
      I0415 06:52:22.765332 31933 scheduler.cpp:469] New master detected at master@172.17.0.2:44024
      I0415 06:52:22.767814 31934 scheduler.cpp:358] Connected with the master at http://172.17.0.2:44024/master/api/v1/scheduler
      I0415 06:52:22.769918 31926 scheduler.cpp:440] Re-detecting master
      I0415 06:52:22.770468 31926 scheduler.cpp:391] Ignoring disconnection attempt from stale connection
      I0415 06:52:22.770582 31926 scheduler.cpp:391] Ignoring disconnection attempt from stale connection
      I0415 06:52:22.770738 31926 scheduler.cpp:469] New master detected at master@172.17.0.2:44024
      I0415 06:52:22.772797 31938 scheduler.cpp:358] Connected with the master at http://172.17.0.2:44024/master/api/v1/scheduler
      I0415 06:52:22.774351 31930 scheduler.cpp:444] Lost leading master
      I0415 06:52:22.774554 31930 scheduler.cpp:391] Ignoring disconnection attempt from stale connection
      I0415 06:52:22.774596 31930 scheduler.cpp:391] Ignoring disconnection attempt from stale connection
      I0415 06:52:22.775264 31939 scheduler.cpp:278] Ignoring reconnect request from scheduler since we are disconnected
      I0415 06:52:22.795821 31906 master.cpp:1089] Master terminating
      [       OK ] ContentType/SchedulerTest.SchedulerReconnect/1 (563 ms)
      [----------] 28 tests from ContentType/SchedulerTest (19871 ms total)
      
      [----------] Global test environment tear-down
      [==========] 1068 tests from 130 test cases ran. (858866 ms total)
      [  PASSED  ] 1067 tests.
      [  FAILED  ] 1 test, listed below:
      [  FAILED  ] MasterAllocatorTest/1.RebalancedForUpdatedWeights, where TypeParam = mesos::internal::tests::Module<mesos::master::allocator::Allocator, (mesos::internal::tests::ModuleID)6>
      
       1 FAILED TEST
        YOU HAVE 6 DISABLED TESTS
      
      make[4]: *** [check-local] Error 1
      make[4]: Leaving directory `/mesos/mesos-0.29.0/_build/src'
      make[3]: *** [check-am] Error 2
      make[3]: Leaving directory `/mesos/mesos-0.29.0/_build/src'
      make[2]: Leaving directory `/mesos/mesos-0.29.0/_build/src'
      make[2]: *** [check] Error 2
      make[1]: Leaving directory `/mesos/mesos-0.29.0/_build'
      make[1]: *** [check-recursive] Error 1
      make: *** [distcheck] Error 1
      + docker rmi mesos-1460700132-26235
      Untagged: mesos-1460700132-26235:latest
      Deleted: 10b530f72567e86835533fd7c2abcf1da82a0854b5d2800d49f59acd02f89594
      Deleted: bbf5088e7150b0ec02e1245fe31c33342f8cc62c03e884072ad8a2b6ef784b39
      Deleted: 032a8c1023dc662765711ba0ebdb3db4ec6587fed8ba6501f6e2dd4cce767c64
      Deleted: 85171cc26dd8d78a29e0d41721f8952d0bc7511c47f4640e74779b24912c9f12
      Deleted: 5de8958f36cb31adacc6453023c11672088580abe9bf89f81739538d1bc2aaab
      Deleted: 327de4c4ce93976c566d4573e4a040c3c6b690c5bffae312dbb59631e7165f0c
      Deleted: 3113a3b702fcea6030e96c4430449ea39d943634874fd6916864799126eab314
      Deleted: ddbb9c88d19533ea1356576fa44cb4a86c8135a95dbb698447d8449e4769bfc5
      Deleted: d501f0e2d04230137a9cd241442bf2e1be00786c8df9d7c36748faf65e7e5308
      Deleted: 1cc136efb0d2c1e73689a6254a6700ee7c8ff43af50ceb180fa0f8137f0281e1
      Deleted: dc910f88300c093e680418b4189eb072253fd6fc2a0817e74dfb7319fdbcaef9
      Deleted: 7014e2fc12b3a59bce0f4638d2025e9107befb22d060a14bac6b46a4292a2bb2
      Deleted: 349eb9fd40d7a0863a1305004a0cc1b9871fced4f829434e90c4159727b6fd4a
      Deleted: 2f8f77b4eab09d68b59e476b28fdeec80057a7e757e35b554c53254f59efbc2c
      Deleted: e7b4174e120a507b6cfd55ad64976c3ef54c32fe7211881d5b0c5d0f258416e4
      Deleted: 976ce2b6490f82b5ec3221d256bfb30e4fa9f7f49e5f5151038ee336ed5d57e1
      
      Full log: https://builds.apache.org/job/mesos-reviewbot/12541/console
      

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              gyliu Guangya Liu
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: