Uploaded image for project: 'Mesos'
  1. Mesos
  2. MESOS-4768

MasterMaintenanceTest.InverseOffers is flaky

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 0.28.0
    • 0.28.0
    • test
    • Mesosphere Sprint 29
    • 1

    Description

      MESOS-4169 significantly sped up this test, but also surfaced some more flakiness. This can be fixed in the same way as MESOS-4059.

      Verbose logs from ASF Centos7 build:

      [ RUN      ] MasterMaintenanceTest.InverseOffers
      I0224 22:35:53.714018  1948 leveldb.cpp:174] Opened db in 2.034387ms
      I0224 22:35:53.714663  1948 leveldb.cpp:181] Compacted db in 608839ns
      I0224 22:35:53.714709  1948 leveldb.cpp:196] Created db iterator in 19043ns
      I0224 22:35:53.714844  1948 leveldb.cpp:202] Seeked to beginning of db in 2330ns
      I0224 22:35:53.714956  1948 leveldb.cpp:271] Iterated through 0 keys in the db in 518ns
      I0224 22:35:53.715092  1948 replica.cpp:779] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
      I0224 22:35:53.715646  1968 recover.cpp:447] Starting replica recovery
      I0224 22:35:53.715915  1981 recover.cpp:473] Replica is in EMPTY status
      I0224 22:35:53.717067  1972 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (4533)@172.17.0.1:36678
      I0224 22:35:53.717445  1981 recover.cpp:193] Received a recover response from a replica in EMPTY status
      I0224 22:35:53.717888  1978 recover.cpp:564] Updating replica status to STARTING
      I0224 22:35:53.718585  1979 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 525061ns
      I0224 22:35:53.718618  1979 replica.cpp:320] Persisted replica status to STARTING
      I0224 22:35:53.718827  1982 recover.cpp:473] Replica is in STARTING status
      I0224 22:35:53.719728  1969 replica.cpp:673] Replica in STARTING status received a broadcasted recover request from (4534)@172.17.0.1:36678
      I0224 22:35:53.719974  1971 recover.cpp:193] Received a recover response from a replica in STARTING status
      I0224 22:35:53.720369  1970 recover.cpp:564] Updating replica status to VOTING
      I0224 22:35:53.720789  1982 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 322308ns
      I0224 22:35:53.720823  1982 replica.cpp:320] Persisted replica status to VOTING
      I0224 22:35:53.720968  1982 recover.cpp:578] Successfully joined the Paxos group
      I0224 22:35:53.721101  1982 recover.cpp:462] Recover process terminated
      I0224 22:35:53.721698  1982 master.cpp:376] Master aab18b61-7811-4c43-a672-d1a63818c880 (4db5fa128d2d) started on 172.17.0.1:36678
      I0224 22:35:53.721719  1982 master.cpp:378] Flags at startup: --acls="" --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate="false" --authenticate_http="true" --authenticate_slaves="true" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/MjbcWP/credentials" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" --quiet="false" --recovery_slave_removal_limit="100%" --registry="replicated_log" --registry_fetch_timeout="1mins" --registry_store_timeout="100secs" --registry_strict="true" --root_submissions="true" --slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" --webui_dir="/mesos/mesos-0.28.0/_inst/share/mesos/webui" --work_dir="/tmp/MjbcWP/master" --zk_session_timeout="10secs"
      I0224 22:35:53.722039  1982 master.cpp:425] Master allowing unauthenticated frameworks to register
      I0224 22:35:53.722053  1982 master.cpp:428] Master only allowing authenticated slaves to register
      I0224 22:35:53.722061  1982 credentials.hpp:35] Loading credentials for authentication from '/tmp/MjbcWP/credentials'
      I0224 22:35:53.722394  1982 master.cpp:468] Using default 'crammd5' authenticator
      I0224 22:35:53.722525  1982 master.cpp:537] Using default 'basic' HTTP authenticator
      I0224 22:35:53.722661  1982 master.cpp:571] Authorization enabled
      I0224 22:35:53.722813  1968 hierarchical.cpp:144] Initialized hierarchical allocator process
      I0224 22:35:53.722846  1980 whitelist_watcher.cpp:77] No whitelist given
      I0224 22:35:53.724957  1977 master.cpp:1712] The newly elected leader is master@172.17.0.1:36678 with id aab18b61-7811-4c43-a672-d1a63818c880
      I0224 22:35:53.725000  1977 master.cpp:1725] Elected as the leading master!
      I0224 22:35:53.725023  1977 master.cpp:1470] Recovering from registrar
      I0224 22:35:53.725306  1967 registrar.cpp:307] Recovering registrar
      I0224 22:35:53.725808  1977 log.cpp:659] Attempting to start the writer
      I0224 22:35:53.727145  1973 replica.cpp:493] Replica received implicit promise request from (4536)@172.17.0.1:36678 with proposal 1
      I0224 22:35:53.727728  1973 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 424560ns
      I0224 22:35:53.727828  1973 replica.cpp:342] Persisted promised to 1
      I0224 22:35:53.729080  1973 coordinator.cpp:238] Coordinator attempting to fill missing positions
      I0224 22:35:53.731009  1979 replica.cpp:388] Replica received explicit promise request from (4537)@172.17.0.1:36678 for position 0 with proposal 2
      I0224 22:35:53.731580  1979 leveldb.cpp:341] Persisting action (8 bytes) to leveldb took 478479ns
      I0224 22:35:53.731613  1979 replica.cpp:712] Persisted action at 0
      I0224 22:35:53.734354  1979 replica.cpp:537] Replica received write request for position 0 from (4538)@172.17.0.1:36678
      I0224 22:35:53.734485  1979 leveldb.cpp:436] Reading position from leveldb took 60879ns
      I0224 22:35:53.735877  1979 leveldb.cpp:341] Persisting action (14 bytes) to leveldb took 1.324061ms
      I0224 22:35:53.735930  1979 replica.cpp:712] Persisted action at 0
      I0224 22:35:53.737061  1970 replica.cpp:691] Replica received learned notice for position 0 from @0.0.0.0:0
      I0224 22:35:53.738881  1970 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 1.772814ms
      I0224 22:35:53.738939  1970 replica.cpp:712] Persisted action at 0
      I0224 22:35:53.738975  1970 replica.cpp:697] Replica learned NOP action at position 0
      I0224 22:35:53.740136  1976 log.cpp:675] Writer started with ending position 0
      I0224 22:35:53.741750  1976 leveldb.cpp:436] Reading position from leveldb took 74863ns
      I0224 22:35:53.743479  1976 registrar.cpp:340] Successfully fetched the registry (0B) in 18.11968ms
      I0224 22:35:53.743755  1976 registrar.cpp:439] Applied 1 operations in 56670ns; attempting to update the 'registry'
      I0224 22:35:53.745604  1978 log.cpp:683] Attempting to append 170 bytes to the log
      I0224 22:35:53.745905  1977 coordinator.cpp:348] Coordinator attempting to write APPEND action at position 1
      I0224 22:35:53.746968  1981 replica.cpp:537] Replica received write request for position 1 from (4539)@172.17.0.1:36678
      I0224 22:35:53.747480  1981 leveldb.cpp:341] Persisting action (189 bytes) to leveldb took 456947ns
      I0224 22:35:53.747609  1981 replica.cpp:712] Persisted action at 1
      I0224 22:35:53.750448  1981 replica.cpp:691] Replica received learned notice for position 1 from @0.0.0.0:0
      I0224 22:35:53.751158  1981 leveldb.cpp:341] Persisting action (191 bytes) to leveldb took 535163ns
      I0224 22:35:53.751258  1981 replica.cpp:712] Persisted action at 1
      I0224 22:35:53.751389  1981 replica.cpp:697] Replica learned APPEND action at position 1
      I0224 22:35:53.753149  1979 registrar.cpp:484] Successfully updated the 'registry' in 9.228032ms
      I0224 22:35:53.753324  1979 registrar.cpp:370] Successfully recovered registrar
      I0224 22:35:53.753593  1979 log.cpp:702] Attempting to truncate the log to 1
      I0224 22:35:53.753805  1979 coordinator.cpp:348] Coordinator attempting to write TRUNCATE action at position 2
      I0224 22:35:53.754055  1981 master.cpp:1522] Recovered 0 slaves from the Registry (131B) ; allowing 10mins for slaves to re-register
      I0224 22:35:53.754349  1979 hierarchical.cpp:171] Skipping recovery of hierarchical allocator: nothing to recover
      I0224 22:35:53.755764  1977 replica.cpp:537] Replica received write request for position 2 from (4540)@172.17.0.1:36678
      I0224 22:35:53.756459  1977 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 488559ns
      I0224 22:35:53.756561  1977 replica.cpp:712] Persisted action at 2
      I0224 22:35:53.757932  1972 replica.cpp:691] Replica received learned notice for position 2 from @0.0.0.0:0
      I0224 22:35:53.758400  1972 leveldb.cpp:341] Persisting action (18 bytes) to leveldb took 343827ns
      I0224 22:35:53.758539  1972 leveldb.cpp:399] Deleting ~1 keys from leveldb took 34231ns
      I0224 22:35:53.758658  1972 replica.cpp:712] Persisted action at 2
      I0224 22:35:53.758782  1972 replica.cpp:697] Replica learned TRUNCATE action at position 2
      I0224 22:35:53.778059  1978 slave.cpp:193] Slave started on 115)@172.17.0.1:36678
      I0224 22:35:53.778105  1978 slave.cpp:194] Flags at startup: --appc_simple_discovery_uri_prefix="http://" --appc_store_dir="/tmp/mesos/store/appc" --authenticatee="crammd5" --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" --cgroups_root="mesos" --container_disk_watch_interval="15secs" --containerizers="mesos" --credential="/tmp/MasterMaintenanceTest_InverseOffers_ywqvFF/credential" --default_role="*" --disk_watch_interval="1mins" --docker="docker" --docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io" --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" --docker_store_dir="/tmp/mesos/store/docker" --enforce_container_disk_quota="false" --executor_registration_timeout="1mins" --executor_shutdown_grace_period="5secs" --fetcher_cache_dir="/tmp/MasterMaintenanceTest_InverseOffers_ywqvFF/fetch" --fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1" --hadoop_home="" --help="false" --hostname="maintenance-host" --hostname_lookup="true" --image_provisioner_backend="copy" --initialize_driver_logging="true" --isolation="posix/cpu,posix/mem" --launcher_dir="/mesos/mesos-0.28.0/_build/src" --logbufsecs="0" --logging_level="INFO" --oversubscribed_resources_interval="15secs" --perf_duration="10secs" --perf_interval="1mins" --qos_correction_interval_min="0ns" --quiet="false" --recover="reconnect" --recovery_timeout="15mins" --registration_backoff_factor="10ms" --resources="cpus:2;mem:1024;disk:1024;ports:[31000-32000]" --revocable_cpu_low_priority="true" --sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true" --systemd_enable_support="true" --systemd_runtime_directory="/run/systemd/system" --version="false" --work_dir="/tmp/MasterMaintenanceTest_InverseOffers_ywqvFF"
      I0224 22:35:53.778609  1978 credentials.hpp:83] Loading credential for authentication from '/tmp/MasterMaintenanceTest_InverseOffers_ywqvFF/credential'
      I0224 22:35:53.779175  1978 slave.cpp:324] Slave using credential for: test-principal
      I0224 22:35:53.779520  1978 resources.cpp:576] Parsing resources as JSON failed: cpus:2;mem:1024;disk:1024;ports:[31000-32000]
      Trying semicolon-delimited string format instead
      I0224 22:35:53.780192  1978 slave.cpp:464] Slave resources: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000]
      I0224 22:35:53.780362  1978 slave.cpp:472] Slave attributes: [  ]
      I0224 22:35:53.780483  1978 slave.cpp:477] Slave hostname: maintenance-host
      I0224 22:35:53.782126  1967 state.cpp:58] Recovering state from '/tmp/MasterMaintenanceTest_InverseOffers_ywqvFF/meta'
      I0224 22:35:53.782892  1969 status_update_manager.cpp:200] Recovering status update manager
      I0224 22:35:53.783242  1969 slave.cpp:4565] Finished recovery
      I0224 22:35:53.784001  1969 slave.cpp:4737] Querying resource estimator for oversubscribable resources
      I0224 22:35:53.784678  1969 slave.cpp:796] New master detected at master@172.17.0.1:36678
      I0224 22:35:53.784874  1967 status_update_manager.cpp:174] Pausing sending status updates
      I0224 22:35:53.784808  1969 slave.cpp:859] Authenticating with master master@172.17.0.1:36678
      I0224 22:35:53.784945  1969 slave.cpp:864] Using default CRAM-MD5 authenticatee
      I0224 22:35:53.785181  1969 slave.cpp:832] Detecting new master
      I0224 22:35:53.785326  1969 slave.cpp:4751] Received oversubscribable resources  from the resource estimator
      I0224 22:35:53.785557  1969 authenticatee.cpp:121] Creating new client SASL connection
      I0224 22:35:53.786227  1969 master.cpp:5526] Authenticating slave(115)@172.17.0.1:36678
      I0224 22:35:53.786492  1969 authenticator.cpp:413] Starting authentication session for crammd5_authenticatee(298)@172.17.0.1:36678
      I0224 22:35:53.786962  1969 authenticator.cpp:98] Creating new server SASL connection
      I0224 22:35:53.787274  1969 authenticatee.cpp:212] Received SASL authentication mechanisms: CRAM-MD5
      I0224 22:35:53.787308  1969 authenticatee.cpp:238] Attempting to authenticate with mechanism 'CRAM-MD5'
      I0224 22:35:53.787400  1969 authenticator.cpp:203] Received SASL authentication start
      I0224 22:35:53.787470  1969 authenticator.cpp:325] Authentication requires more steps
      I0224 22:35:53.787884  1972 authenticatee.cpp:258] Received SASL authentication step
      I0224 22:35:53.787992  1972 authenticator.cpp:231] Received SASL authentication step
      I0224 22:35:53.788027  1972 auxprop.cpp:107] Request to lookup properties for user: 'test-principal' realm: '4db5fa128d2d' server FQDN: '4db5fa128d2d' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false 
      I0224 22:35:53.788040  1972 auxprop.cpp:179] Looking up auxiliary property '*userPassword'
      I0224 22:35:53.788090  1972 auxprop.cpp:179] Looking up auxiliary property '*cmusaslsecretCRAM-MD5'
      I0224 22:35:53.788122  1972 auxprop.cpp:107] Request to lookup properties for user: 'test-principal' realm: '4db5fa128d2d' server FQDN: '4db5fa128d2d' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true 
      I0224 22:35:53.788136  1972 auxprop.cpp:129] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true
      I0224 22:35:53.788146  1972 auxprop.cpp:129] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
      I0224 22:35:53.788164  1972 authenticator.cpp:317] Authentication success
      I0224 22:35:53.788331  1972 authenticatee.cpp:298] Authentication success
      I0224 22:35:53.788439  1972 master.cpp:5556] Successfully authenticated principal 'test-principal' at slave(115)@172.17.0.1:36678
      I0224 22:35:53.788529  1972 authenticator.cpp:431] Authentication session cleanup for crammd5_authenticatee(298)@172.17.0.1:36678
      I0224 22:35:53.788988  1972 slave.cpp:927] Successfully authenticated with master master@172.17.0.1:36678
      I0224 22:35:53.789139  1972 slave.cpp:1321] Will retry registration in 1.535786ms if necessary
      I0224 22:35:53.789515  1972 master.cpp:4240] Registering slave at slave(115)@172.17.0.1:36678 (maintenance-host) with id aab18b61-7811-4c43-a672-d1a63818c880-S0
      I0224 22:35:53.790577  1972 registrar.cpp:439] Applied 1 operations in 78745ns; attempting to update the 'registry'
      I0224 22:35:53.791128  1971 process.cpp:3141] Handling HTTP event for process 'master' with path: '/master/maintenance/schedule'
      I0224 22:35:53.791877  1971 http.cpp:501] HTTP POST for /master/maintenance/schedule from 172.17.0.1:45095
      I0224 22:35:53.793313  1972 log.cpp:683] Attempting to append 343 bytes to the log
      I0224 22:35:53.793586  1972 coordinator.cpp:348] Coordinator attempting to write APPEND action at position 3
      I0224 22:35:53.794533  1971 replica.cpp:537] Replica received write request for position 3 from (4547)@172.17.0.1:36678
      I0224 22:35:53.794862  1971 leveldb.cpp:341] Persisting action (362 bytes) to leveldb took 283614ns
      I0224 22:35:53.794893  1971 replica.cpp:712] Persisted action at 3
      I0224 22:35:53.796646  1979 replica.cpp:691] Replica received learned notice for position 3 from @0.0.0.0:0
      I0224 22:35:53.797102  1972 slave.cpp:1321] Will retry registration in 17.198963ms if necessary
      I0224 22:35:53.797186  1979 leveldb.cpp:341] Persisting action (364 bytes) to leveldb took 498502ns
      I0224 22:35:53.797230  1979 replica.cpp:712] Persisted action at 3
      I0224 22:35:53.797260  1979 replica.cpp:697] Replica learned APPEND action at position 3
      I0224 22:35:53.797417  1972 master.cpp:4228] Ignoring register slave message from slave(115)@172.17.0.1:36678 (maintenance-host) as admission is already in progress
      I0224 22:35:53.799119  1978 registrar.cpp:484] Successfully updated the 'registry' in 8.45824ms
      I0224 22:35:53.799613  1978 registrar.cpp:439] Applied 1 operations in 176193ns; attempting to update the 'registry'
      I0224 22:35:53.800472  1972 master.cpp:4308] Registered slave aab18b61-7811-4c43-a672-d1a63818c880-S0 at slave(115)@172.17.0.1:36678 (maintenance-host) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000]
      I0224 22:35:53.800623  1978 log.cpp:702] Attempting to truncate the log to 3
      I0224 22:35:53.801255  1969 hierarchical.cpp:473] Added slave aab18b61-7811-4c43-a672-d1a63818c880-S0 (maintenance-host) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] (allocated: )
      I0224 22:35:53.801301  1978 slave.cpp:971] Registered with master master@172.17.0.1:36678; given slave ID aab18b61-7811-4c43-a672-d1a63818c880-S0
      I0224 22:35:53.801331  1978 fetcher.cpp:81] Clearing fetcher cache
      I0224 22:35:53.801431  1969 hierarchical.cpp:1434] No resources available to allocate!
      I0224 22:35:53.801466  1969 hierarchical.cpp:1147] Performed allocation for slave aab18b61-7811-4c43-a672-d1a63818c880-S0 in 162751ns
      I0224 22:35:53.801532  1969 coordinator.cpp:348] Coordinator attempting to write TRUNCATE action at position 4
      I0224 22:35:53.801867  1978 slave.cpp:994] Checkpointing SlaveInfo to '/tmp/MasterMaintenanceTest_InverseOffers_ywqvFF/meta/slaves/aab18b61-7811-4c43-a672-d1a63818c880-S0/slave.info'
      I0224 22:35:53.801877  1969 status_update_manager.cpp:181] Resuming sending status updates
      I0224 22:35:53.802898  1977 replica.cpp:537] Replica received write request for position 4 from (4548)@172.17.0.1:36678
      I0224 22:35:53.803252  1978 slave.cpp:1030] Forwarding total oversubscribed resources 
      I0224 22:35:53.803640  1970 master.cpp:4649] Received update of slave aab18b61-7811-4c43-a672-d1a63818c880-S0 at slave(115)@172.17.0.1:36678 (maintenance-host) with total oversubscribed resources 
      I0224 22:35:53.803858  1977 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 912626ns
      I0224 22:35:53.803889  1977 replica.cpp:712] Persisted action at 4
      I0224 22:35:53.804144  1978 slave.cpp:3482] Received ping from slave-observer(117)@172.17.0.1:36678
      I0224 22:35:53.804535  1971 hierarchical.cpp:531] Slave aab18b61-7811-4c43-a672-d1a63818c880-S0 (maintenance-host) updated with oversubscribed resources  (total: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000], allocated: )
      I0224 22:35:53.804684  1971 hierarchical.cpp:1434] No resources available to allocate!
      I0224 22:35:53.804714  1971 hierarchical.cpp:1147] Performed allocation for slave aab18b61-7811-4c43-a672-d1a63818c880-S0 in 131453ns
      I0224 22:35:53.805541  1967 replica.cpp:691] Replica received learned notice for position 4 from @0.0.0.0:0
      I0224 22:35:53.805941  1967 leveldb.cpp:341] Persisting action (18 bytes) to leveldb took 366444ns
      I0224 22:35:53.806015  1967 leveldb.cpp:399] Deleting ~2 keys from leveldb took 42808ns
      I0224 22:35:53.806041  1967 replica.cpp:712] Persisted action at 4
      I0224 22:35:53.806066  1967 replica.cpp:697] Replica learned TRUNCATE action at position 4
      I0224 22:35:53.807355  1978 log.cpp:683] Attempting to append 465 bytes to the log
      I0224 22:35:53.807551  1978 coordinator.cpp:348] Coordinator attempting to write APPEND action at position 5
      I0224 22:35:53.809638  1979 replica.cpp:537] Replica received write request for position 5 from (4549)@172.17.0.1:36678
      I0224 22:35:53.810858  1979 leveldb.cpp:341] Persisting action (484 bytes) to leveldb took 1.167663ms
      I0224 22:35:53.810904  1979 replica.cpp:712] Persisted action at 5
      I0224 22:35:53.811997  1979 replica.cpp:691] Replica received learned notice for position 5 from @0.0.0.0:0
      I0224 22:35:53.812348  1979 leveldb.cpp:341] Persisting action (486 bytes) to leveldb took 318928ns
      I0224 22:35:53.812376  1979 replica.cpp:712] Persisted action at 5
      I0224 22:35:53.812397  1979 replica.cpp:697] Replica learned APPEND action at position 5
      I0224 22:35:53.815132  1973 registrar.cpp:484] Successfully updated the 'registry' in 15.437312ms
      I0224 22:35:53.815491  1976 log.cpp:702] Attempting to truncate the log to 5
      I0224 22:35:53.815610  1973 coordinator.cpp:348] Coordinator attempting to write TRUNCATE action at position 6
      I0224 22:35:53.815661  1968 master.cpp:4705] Updating unavailability of slave aab18b61-7811-4c43-a672-d1a63818c880-S0 at slave(115)@172.17.0.1:36678 (maintenance-host), starting at 2410.99235909694weeks
      I0224 22:35:53.815845  1968 master.cpp:4705] Updating unavailability of slave aab18b61-7811-4c43-a672-d1a63818c880-S0 at slave(115)@172.17.0.1:36678 (maintenance-host), starting at 2410.99235909694weeks
      I0224 22:35:53.816069  1975 hierarchical.cpp:1434] No resources available to allocate!
      I0224 22:35:53.816103  1975 hierarchical.cpp:1147] Performed allocation for slave aab18b61-7811-4c43-a672-d1a63818c880-S0 in 175822ns
      I0224 22:35:53.816272  1975 hierarchical.cpp:1434] No resources available to allocate!
      I0224 22:35:53.816303  1975 hierarchical.cpp:1147] Performed allocation for slave aab18b61-7811-4c43-a672-d1a63818c880-S0 in 110913ns
      I0224 22:35:53.817291  1972 replica.cpp:537] Replica received write request for position 6 from (4550)@172.17.0.1:36678
      I0224 22:35:53.817908  1972 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 576032ns
      I0224 22:35:53.817932  1972 replica.cpp:712] Persisted action at 6
      I0224 22:35:53.818686  1980 replica.cpp:691] Replica received learned notice for position 6 from @0.0.0.0:0
      I0224 22:35:53.819021  1980 leveldb.cpp:341] Persisting action (18 bytes) to leveldb took 305298ns
      I0224 22:35:53.819095  1980 leveldb.cpp:399] Deleting ~2 keys from leveldb took 44332ns
      I0224 22:35:53.819120  1980 replica.cpp:712] Persisted action at 6
      I0224 22:35:53.819162  1980 replica.cpp:697] Replica learned TRUNCATE action at position 6
      I0224 22:35:53.820662  1967 process.cpp:3141] Handling HTTP event for process 'master' with path: '/master/maintenance/status'
      I0224 22:35:53.821190  1976 http.cpp:501] HTTP GET for /master/maintenance/status from 172.17.0.1:45096
      I0224 22:35:53.823709  1948 scheduler.cpp:154] Version: 0.28.0
      I0224 22:35:53.824424  1972 scheduler.cpp:236] New master detected at master@172.17.0.1:36678
      I0224 22:35:53.825402  1982 scheduler.cpp:298] Sending SUBSCRIBE call to master@172.17.0.1:36678
      I0224 22:35:53.827201  1978 process.cpp:3141] Handling HTTP event for process 'master' with path: '/master/api/v1/scheduler'
      I0224 22:35:53.827636  1978 http.cpp:501] HTTP POST for /master/api/v1/scheduler from 172.17.0.1:45097
      I0224 22:35:53.827922  1978 master.cpp:1974] Received subscription request for HTTP framework 'default'
      I0224 22:35:53.827991  1978 master.cpp:1751] Authorizing framework principal 'test-principal' to receive offers for role '*'
      I0224 22:35:53.828418  1982 master.cpp:2065] Subscribing framework 'default' with checkpointing disabled and capabilities [  ]
      I0224 22:35:53.828943  1968 hierarchical.cpp:265] Added framework aab18b61-7811-4c43-a672-d1a63818c880-0000
      I0224 22:35:53.829124  1982 master.hpp:1657] Sending heartbeat to aab18b61-7811-4c43-a672-d1a63818c880-0000
      I0224 22:35:53.829987  1968 hierarchical.cpp:1127] Performed allocation for 1 slaves in 1.011356ms
      I0224 22:35:53.830204  1982 master.cpp:5355] Sending 1 offers to framework aab18b61-7811-4c43-a672-d1a63818c880-0000 (default)
      I0224 22:35:53.830801  1982 master.cpp:5445] Sending 1 inverse offers to framework aab18b61-7811-4c43-a672-d1a63818c880-0000 (default)
      I0224 22:35:53.831132  1969 scheduler.cpp:457] Enqueuing event SUBSCRIBED received from master@172.17.0.1:36678
      I0224 22:35:53.832396  1968 scheduler.cpp:457] Enqueuing event HEARTBEAT received from master@172.17.0.1:36678
      I0224 22:35:53.833050  1976 master_maintenance_tests.cpp:177] Ignoring HEARTBEAT event
      I0224 22:35:53.833256  1979 scheduler.cpp:457] Enqueuing event OFFERS received from master@172.17.0.1:36678
      I0224 22:35:53.833775  1979 scheduler.cpp:457] Enqueuing event OFFERS received from master@172.17.0.1:36678
      I0224 22:35:53.835662  1980 scheduler.cpp:298] Sending ACCEPT call to master@172.17.0.1:36678
      I0224 22:35:53.837591  1967 process.cpp:3141] Handling HTTP event for process 'master' with path: '/master/api/v1/scheduler'
      I0224 22:35:53.838021  1967 http.cpp:501] HTTP POST for /master/api/v1/scheduler from 172.17.0.1:45098
      I0224 22:35:53.838851  1967 master.cpp:3138] Processing ACCEPT call for offers: [ aab18b61-7811-4c43-a672-d1a63818c880-O0 ] on slave aab18b61-7811-4c43-a672-d1a63818c880-S0 at slave(115)@172.17.0.1:36678 (maintenance-host) for framework aab18b61-7811-4c43-a672-d1a63818c880-0000 (default)
      I0224 22:35:53.838946  1967 master.cpp:2825] Authorizing framework principal 'test-principal' to launch task 90bcae0c-9d40-40b7-9537-dae7e83479f6 as user 'mesos'
      W0224 22:35:53.841048  1967 validation.cpp:404] Executor default for task 90bcae0c-9d40-40b7-9537-dae7e83479f6 uses less CPUs (None) than the minimum required (0.01). Please update your executor, as this will be mandatory in future releases.
      W0224 22:35:53.841101  1967 validation.cpp:416] Executor default for task 90bcae0c-9d40-40b7-9537-dae7e83479f6 uses less memory (None) than the minimum required (32MB). Please update your executor, as this will be mandatory in future releases.
      I0224 22:35:53.841624  1967 master.hpp:176] Adding task 90bcae0c-9d40-40b7-9537-dae7e83479f6 with resources cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave aab18b61-7811-4c43-a672-d1a63818c880-S0 (maintenance-host)
      I0224 22:35:53.842157  1967 master.cpp:3623] Launching task 90bcae0c-9d40-40b7-9537-dae7e83479f6 of framework aab18b61-7811-4c43-a672-d1a63818c880-0000 (default) with resources cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave aab18b61-7811-4c43-a672-d1a63818c880-S0 at slave(115)@172.17.0.1:36678 (maintenance-host)
      I0224 22:35:53.842571  1980 slave.cpp:1361] Got assigned task 90bcae0c-9d40-40b7-9537-dae7e83479f6 for framework aab18b61-7811-4c43-a672-d1a63818c880-0000
      I0224 22:35:53.843122  1980 slave.cpp:1480] Launching task 90bcae0c-9d40-40b7-9537-dae7e83479f6 for framework aab18b61-7811-4c43-a672-d1a63818c880-0000
      I0224 22:35:53.843718  1980 paths.cpp:474] Trying to chown '/tmp/MasterMaintenanceTest_InverseOffers_ywqvFF/slaves/aab18b61-7811-4c43-a672-d1a63818c880-S0/frameworks/aab18b61-7811-4c43-a672-d1a63818c880-0000/executors/default/runs/a5a1e49d-20a8-4796-8ec0-5a1595e76159' to user 'mesos'
      I0224 22:35:53.852052  1980 slave.cpp:5367] Launching executor default of framework aab18b61-7811-4c43-a672-d1a63818c880-0000 with resources  in work directory '/tmp/MasterMaintenanceTest_InverseOffers_ywqvFF/slaves/aab18b61-7811-4c43-a672-d1a63818c880-S0/frameworks/aab18b61-7811-4c43-a672-d1a63818c880-0000/executors/default/runs/a5a1e49d-20a8-4796-8ec0-5a1595e76159'
      I0224 22:35:53.854452  1980 exec.cpp:143] Version: 0.28.0
      I0224 22:35:53.854812  1967 exec.cpp:193] Executor started at: executor(47)@172.17.0.1:36678 with pid 1948
      I0224 22:35:53.855108  1980 slave.cpp:1698] Queuing task '90bcae0c-9d40-40b7-9537-dae7e83479f6' for executor 'default' of framework aab18b61-7811-4c43-a672-d1a63818c880-0000
      I0224 22:35:53.855264  1980 slave.cpp:749] Successfully attached file '/tmp/MasterMaintenanceTest_InverseOffers_ywqvFF/slaves/aab18b61-7811-4c43-a672-d1a63818c880-S0/frameworks/aab18b61-7811-4c43-a672-d1a63818c880-0000/executors/default/runs/a5a1e49d-20a8-4796-8ec0-5a1595e76159'
      I0224 22:35:53.855362  1980 slave.cpp:2643] Got registration for executor 'default' of framework aab18b61-7811-4c43-a672-d1a63818c880-0000 from executor(47)@172.17.0.1:36678
      I0224 22:35:53.855785  1974 exec.cpp:217] Executor registered on slave aab18b61-7811-4c43-a672-d1a63818c880-S0
      I0224 22:35:53.855857  1974 exec.cpp:229] Executor::registered took 42512ns
      I0224 22:35:53.856391  1980 slave.cpp:1863] Sending queued task '90bcae0c-9d40-40b7-9537-dae7e83479f6' to executor 'default' of framework aab18b61-7811-4c43-a672-d1a63818c880-0000 at executor(47)@172.17.0.1:36678
      I0224 22:35:53.856720  1974 exec.cpp:304] Executor asked to run task '90bcae0c-9d40-40b7-9537-dae7e83479f6'
      I0224 22:35:53.856812  1974 exec.cpp:313] Executor::launchTask took 65703ns
      I0224 22:35:53.856922  1974 exec.cpp:526] Executor sending status update TASK_RUNNING (UUID: 249b169a-6b5f-4776-95c8-c897ba6b3f0b) for task 90bcae0c-9d40-40b7-9537-dae7e83479f6 of framework aab18b61-7811-4c43-a672-d1a63818c880-0000
      I0224 22:35:53.857378  1980 slave.cpp:3002] Handling status update TASK_RUNNING (UUID: 249b169a-6b5f-4776-95c8-c897ba6b3f0b) for task 90bcae0c-9d40-40b7-9537-dae7e83479f6 of framework aab18b61-7811-4c43-a672-d1a63818c880-0000 from executor(47)@172.17.0.1:36678
      I0224 22:35:53.858175  1980 status_update_manager.cpp:320] Received status update TASK_RUNNING (UUID: 249b169a-6b5f-4776-95c8-c897ba6b3f0b) for task 90bcae0c-9d40-40b7-9537-dae7e83479f6 of framework aab18b61-7811-4c43-a672-d1a63818c880-0000
      I0224 22:35:53.858222  1980 status_update_manager.cpp:497] Creating StatusUpdate stream for task 90bcae0c-9d40-40b7-9537-dae7e83479f6 of framework aab18b61-7811-4c43-a672-d1a63818c880-0000
      I0224 22:35:53.858687  1980 status_update_manager.cpp:374] Forwarding update TASK_RUNNING (UUID: 249b169a-6b5f-4776-95c8-c897ba6b3f0b) for task 90bcae0c-9d40-40b7-9537-dae7e83479f6 of framework aab18b61-7811-4c43-a672-d1a63818c880-0000 to the slave
      I0224 22:35:53.859210  1980 slave.cpp:3400] Forwarding the update TASK_RUNNING (UUID: 249b169a-6b5f-4776-95c8-c897ba6b3f0b) for task 90bcae0c-9d40-40b7-9537-dae7e83479f6 of framework aab18b61-7811-4c43-a672-d1a63818c880-0000 to master@172.17.0.1:36678
      I0224 22:35:53.859390  1980 slave.cpp:3294] Status update manager successfully handled status update TASK_RUNNING (UUID: 249b169a-6b5f-4776-95c8-c897ba6b3f0b) for task 90bcae0c-9d40-40b7-9537-dae7e83479f6 of framework aab18b61-7811-4c43-a672-d1a63818c880-0000
      I0224 22:35:53.859436  1980 slave.cpp:3310] Sending acknowledgement for status update TASK_RUNNING (UUID: 249b169a-6b5f-4776-95c8-c897ba6b3f0b) for task 90bcae0c-9d40-40b7-9537-dae7e83479f6 of framework aab18b61-7811-4c43-a672-d1a63818c880-0000 to executor(47)@172.17.0.1:36678
      I0224 22:35:53.859663  1980 exec.cpp:350] Executor received status update acknowledgement 249b169a-6b5f-4776-95c8-c897ba6b3f0b for task 90bcae0c-9d40-40b7-9537-dae7e83479f6 of framework aab18b61-7811-4c43-a672-d1a63818c880-0000
      I0224 22:35:53.859657  1967 master.cpp:4794] Status update TASK_RUNNING (UUID: 249b169a-6b5f-4776-95c8-c897ba6b3f0b) for task 90bcae0c-9d40-40b7-9537-dae7e83479f6 of framework aab18b61-7811-4c43-a672-d1a63818c880-0000 from slave aab18b61-7811-4c43-a672-d1a63818c880-S0 at slave(115)@172.17.0.1:36678 (maintenance-host)
      I0224 22:35:53.859851  1967 master.cpp:4842] Forwarding status update TASK_RUNNING (UUID: 249b169a-6b5f-4776-95c8-c897ba6b3f0b) for task 90bcae0c-9d40-40b7-9537-dae7e83479f6 of framework aab18b61-7811-4c43-a672-d1a63818c880-0000
      I0224 22:35:53.860587  1967 master.cpp:6450] Updating the state of task 90bcae0c-9d40-40b7-9537-dae7e83479f6 of framework aab18b61-7811-4c43-a672-d1a63818c880-0000 (latest state: TASK_RUNNING, status update state: TASK_RUNNING)
      I0224 22:35:53.862711  1967 scheduler.cpp:457] Enqueuing event UPDATE received from master@172.17.0.1:36678
      I0224 22:35:53.866711  1976 scheduler.cpp:298] Sending ACKNOWLEDGE call to master@172.17.0.1:36678
      I0224 22:35:53.870667  1972 process.cpp:3141] Handling HTTP event for process 'master' with path: '/master/api/v1/scheduler'
      I0224 22:35:53.871269  1972 http.cpp:501] HTTP POST for /master/api/v1/scheduler from 172.17.0.1:45099
      I0224 22:35:53.871459  1972 master.cpp:3952] Processing ACKNOWLEDGE call 249b169a-6b5f-4776-95c8-c897ba6b3f0b for task 90bcae0c-9d40-40b7-9537-dae7e83479f6 of framework aab18b61-7811-4c43-a672-d1a63818c880-0000 (default) on slave aab18b61-7811-4c43-a672-d1a63818c880-S0
      I0224 22:35:53.872184  1972 status_update_manager.cpp:392] Received status update acknowledgement (UUID: 249b169a-6b5f-4776-95c8-c897ba6b3f0b) for task 90bcae0c-9d40-40b7-9537-dae7e83479f6 of framework aab18b61-7811-4c43-a672-d1a63818c880-0000
      I0224 22:35:53.872537  1972 slave.cpp:2412] Status update manager successfully handled status update acknowledgement (UUID: 249b169a-6b5f-4776-95c8-c897ba6b3f0b) for task 90bcae0c-9d40-40b7-9537-dae7e83479f6 of framework aab18b61-7811-4c43-a672-d1a63818c880-0000
      I0224 22:35:53.874407  1975 scheduler.cpp:298] Sending DECLINE call to master@172.17.0.1:36678
      I0224 22:35:53.877537  1979 hierarchical.cpp:1434] No resources available to allocate!
      I0224 22:35:53.877795  1979 hierarchical.cpp:1127] Performed allocation for 1 slaves in 482441ns
      I0224 22:35:53.878082  1981 process.cpp:3141] Handling HTTP event for process 'master' with path: '/master/api/v1/scheduler'
      I0224 22:35:53.878675  1978 http.cpp:501] HTTP POST for /master/api/v1/scheduler from 172.17.0.1:45100
      I0224 22:35:53.878931  1978 master.cpp:3675] Processing DECLINE call for offers: [ aab18b61-7811-4c43-a672-d1a63818c880-O1 ] for framework aab18b61-7811-4c43-a672-d1a63818c880-0000 (default)
      ../../src/tests/master_maintenance_tests.cpp:1222: Failure
      Failed to wait 15secs for event
      I0224 22:36:08.881649  1948 master.cpp:1027] Master terminating
      W0224 22:36:08.881925  1948 master.cpp:6502] Removing task 90bcae0c-9d40-40b7-9537-dae7e83479f6 with resources cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] of framework aab18b61-7811-4c43-a672-d1a63818c880-0000 on slave aab18b61-7811-4c43-a672-d1a63818c880-S0 at slave(115)@172.17.0.1:36678 (maintenance-host) in non-terminal state TASK_RUNNING
      I0224 22:36:08.882961  1948 master.cpp:6545] Removing executor 'default' with resources  of framework aab18b61-7811-4c43-a672-d1a63818c880-0000 on slave aab18b61-7811-4c43-a672-d1a63818c880-S0 at slave(115)@172.17.0.1:36678 (maintenance-host)
      I0224 22:36:08.884789  1969 hierarchical.cpp:505] Removed slave aab18b61-7811-4c43-a672-d1a63818c880-S0
      I0224 22:36:08.887261  1969 hierarchical.cpp:326] Removed framework aab18b61-7811-4c43-a672-d1a63818c880-0000
      I0224 22:36:08.916983  1976 slave.cpp:3528] master@172.17.0.1:36678 exited
      W0224 22:36:08.917191  1976 slave.cpp:3531] Master disconnected! Waiting for a new master to be elected
      I0224 22:36:08.934546  1975 slave.cpp:3528] executor(47)@172.17.0.1:36678 exited
      I0224 22:36:08.934806  1974 slave.cpp:3886] Executor 'default' of framework aab18b61-7811-4c43-a672-d1a63818c880-0000 exited with status 0
      I0224 22:36:08.935024  1974 slave.cpp:3002] Handling status update TASK_FAILED (UUID: 77d415df-58bd-4cf5-9c49-6106691d9599) for task 90bcae0c-9d40-40b7-9537-dae7e83479f6 of framework aab18b61-7811-4c43-a672-d1a63818c880-0000 from @0.0.0.0:0
      I0224 22:36:08.935505  1974 slave.cpp:5677] Terminating task 90bcae0c-9d40-40b7-9537-dae7e83479f6
      I0224 22:36:08.936190  1967 status_update_manager.cpp:320] Received status update TASK_FAILED (UUID: 77d415df-58bd-4cf5-9c49-6106691d9599) for task 90bcae0c-9d40-40b7-9537-dae7e83479f6 of framework aab18b61-7811-4c43-a672-d1a63818c880-0000
      I0224 22:36:08.936368  1967 status_update_manager.cpp:374] Forwarding update TASK_FAILED (UUID: 77d415df-58bd-4cf5-9c49-6106691d9599) for task 90bcae0c-9d40-40b7-9537-dae7e83479f6 of framework aab18b61-7811-4c43-a672-d1a63818c880-0000 to the slave
      I0224 22:36:08.936606  1974 slave.cpp:3400] Forwarding the update TASK_FAILED (UUID: 77d415df-58bd-4cf5-9c49-6106691d9599) for task 90bcae0c-9d40-40b7-9537-dae7e83479f6 of framework aab18b61-7811-4c43-a672-d1a63818c880-0000 to master@172.17.0.1:36678
      I0224 22:36:08.936779  1974 slave.cpp:3294] Status update manager successfully handled status update TASK_FAILED (UUID: 77d415df-58bd-4cf5-9c49-6106691d9599) for task 90bcae0c-9d40-40b7-9537-dae7e83479f6 of framework aab18b61-7811-4c43-a672-d1a63818c880-0000
      I0224 22:36:08.955370  1967 slave.cpp:668] Slave terminating
      I0224 22:36:08.955499  1967 slave.cpp:2079] Asked to shut down framework aab18b61-7811-4c43-a672-d1a63818c880-0000 by @0.0.0.0:0
      I0224 22:36:08.955538  1967 slave.cpp:2104] Shutting down framework aab18b61-7811-4c43-a672-d1a63818c880-0000
      I0224 22:36:08.955606  1967 slave.cpp:3990] Cleaning up executor 'default' of framework aab18b61-7811-4c43-a672-d1a63818c880-0000 at executor(47)@172.17.0.1:36678
      I0224 22:36:08.956053  1967 slave.cpp:4078] Cleaning up framework aab18b61-7811-4c43-a672-d1a63818c880-0000
      I0224 22:36:08.956327  1967 gc.cpp:54] Scheduling '/tmp/MasterMaintenanceTest_InverseOffers_ywqvFF/slaves/aab18b61-7811-4c43-a672-d1a63818c880-S0/frameworks/aab18b61-7811-4c43-a672-d1a63818c880-0000/executors/default/runs/a5a1e49d-20a8-4796-8ec0-5a1595e76159' for gc 1.00002336880296weeks in the future
      I0224 22:36:08.956495  1973 status_update_manager.cpp:282] Closing status update streams for framework aab18b61-7811-4c43-a672-d1a63818c880-0000
      I0224 22:36:08.956524  1967 gc.cpp:54] Scheduling '/tmp/MasterMaintenanceTest_InverseOffers_ywqvFF/slaves/aab18b61-7811-4c43-a672-d1a63818c880-S0/frameworks/aab18b61-7811-4c43-a672-d1a63818c880-0000/executors/default' for gc 1.00002336880296weeks in the future
      I0224 22:36:08.956549  1973 status_update_manager.cpp:528] Cleaning up status update stream for task 90bcae0c-9d40-40b7-9537-dae7e83479f6 of framework aab18b61-7811-4c43-a672-d1a63818c880-0000
      I0224 22:36:08.956619  1967 gc.cpp:54] Scheduling '/tmp/MasterMaintenanceTest_InverseOffers_ywqvFF/slaves/aab18b61-7811-4c43-a672-d1a63818c880-S0/frameworks/aab18b61-7811-4c43-a672-d1a63818c880-0000' for gc 1.00002336880296weeks in the future
      [  FAILED  ] MasterMaintenanceTest.InverseOffers (15258 ms)
      

      Attachments

        Activity

          People

            kaysoky Joseph Wu
            kaysoky Joseph Wu
            Joris Van Remoortere Joris Van Remoortere
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: