Uploaded image for project: 'Mesos'
  1. Mesos
  2. MESOS-5548

FaultToleranceTest.UpdateFrameworkInfoOnSchedulerFailover is flaky

    XMLWordPrintableJSON

Details

    Description

      Observed this on ASF CI, where this test runs forever

      [ RUN      ] FaultToleranceTest.UpdateFrameworkInfoOnSchedulerFailover
      I0606 17:09:02.953631 29338 cluster.cpp:155] Creating default 'local' authorizer
      I0606 17:09:02.957620 29338 leveldb.cpp:174] Opened db in 3.247876ms
      I0606 17:09:02.958684 29338 leveldb.cpp:181] Compacted db in 1.023058ms
      I0606 17:09:02.958762 29338 leveldb.cpp:196] Created db iterator in 17962ns
      I0606 17:09:02.958794 29338 leveldb.cpp:202] Seeked to beginning of db in 2453ns
      I0606 17:09:02.958820 29338 leveldb.cpp:271] Iterated through 0 keys in the db in 465ns
      I0606 17:09:02.958880 29338 replica.cpp:779] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
      I0606 17:09:02.959601 29362 recover.cpp:448] Starting replica recovery
      I0606 17:09:02.959996 29362 recover.cpp:474] Replica is in EMPTY status
      I0606 17:09:02.961241 29357 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (1793)@172.17.0.2:39784
      I0606 17:09:02.961608 29369 recover.cpp:194] Received a recover response from a replica in EMPTY status
      I0606 17:09:02.962347 29357 recover.cpp:565] Updating replica status to STARTING
      I0606 17:09:02.963209 29371 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 706012ns
      I0606 17:09:02.963240 29371 replica.cpp:320] Persisted replica status to STARTING
      I0606 17:09:02.963435 29370 recover.cpp:474] Replica is in STARTING status
      I0606 17:09:02.963881 29360 master.cpp:382] Master 7d25f96c-ab51-4074-b613-df1b15675b81 (1ead8e6f9ec5) started on 172.17.0.2:39784
      I0606 17:09:02.963907 29360 master.cpp:384] Flags at startup: --acls="" --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate_agents="true" --authenticate_frameworks="true" --authenticate_http="true" --authenticate_http_frameworks="true" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/TzbJnN/credentials" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --http_framework_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --quiet="false" --recovery_agent_removal_limit="100%" --registry="replicated_log" --registry_fetch_timeout="1mins" --registry_store_timeout="100secs" --registry_strict="true" --root_submissions="true" --user_sorter="drf" --version="false" --webui_dir="/mesos/mesos-1.0.0/_inst/share/mesos/webui" --work_dir="/tmp/TzbJnN/master" --zk_session_timeout="10secs"
      I0606 17:09:02.964334 29360 master.cpp:433] Master only allowing authenticated frameworks to register
      I0606 17:09:02.964355 29360 master.cpp:439] Master only allowing authenticated agents to register
      I0606 17:09:02.964366 29360 master.cpp:445] Master only allowing authenticated HTTP frameworks to register
      I0606 17:09:02.964376 29360 credentials.hpp:37] Loading credentials for authentication from '/tmp/TzbJnN/credentials'
      I0606 17:09:02.964445 29372 replica.cpp:673] Replica in STARTING status received a broadcasted recover request from (1794)@172.17.0.2:39784
      I0606 17:09:02.964728 29360 master.cpp:489] Using default 'crammd5' authenticator
      I0606 17:09:02.964891 29360 master.cpp:560] Using default 'basic' HTTP authenticator
      I0606 17:09:02.964926 29361 recover.cpp:194] Received a recover response from a replica in STARTING status
      I0606 17:09:02.965065 29360 master.cpp:640] Using default 'basic' HTTP framework authenticator
      I0606 17:09:02.965330 29360 master.cpp:687] Authorization enabled
      I0606 17:09:02.965418 29362 recover.cpp:565] Updating replica status to VOTING
      I0606 17:09:02.965591 29359 hierarchical.cpp:142] Initialized hierarchical allocator process
      I0606 17:09:02.965672 29367 whitelist_watcher.cpp:77] No whitelist given
      I0606 17:09:02.966047 29365 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 366626ns
      I0606 17:09:02.966080 29365 replica.cpp:320] Persisted replica status to VOTING
      I0606 17:09:02.966222 29368 recover.cpp:579] Successfully joined the Paxos group
      I0606 17:09:02.966526 29368 recover.cpp:463] Recover process terminated
      I0606 17:09:02.968436 29366 master.cpp:1951] The newly elected leader is master@172.17.0.2:39784 with id 7d25f96c-ab51-4074-b613-df1b15675b81
      I0606 17:09:02.968492 29366 master.cpp:1964] Elected as the leading master!
      I0606 17:09:02.968526 29366 master.cpp:1651] Recovering from registrar
      I0606 17:09:02.968731 29371 registrar.cpp:332] Recovering registrar
      I0606 17:09:02.969341 29359 log.cpp:524] Attempting to start the writer
      I0606 17:09:02.970733 29368 replica.cpp:493] Replica received implicit promise request from (1797)@172.17.0.2:39784 with proposal 1
      I0606 17:09:02.971122 29368 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 346218ns
      I0606 17:09:02.971153 29368 replica.cpp:342] Persisted promised to 1
      I0606 17:09:02.972035 29363 coordinator.cpp:238] Coordinator attempting to fill missing positions
      I0606 17:09:02.973377 29368 replica.cpp:388] Replica received explicit promise request from (1798)@172.17.0.2:39784 for position 0 with proposal 2
      I0606 17:09:02.973850 29368 leveldb.cpp:341] Persisting action (8 bytes) to leveldb took 419380ns
      I0606 17:09:02.973884 29368 replica.cpp:712] Persisted action at 0
      I0606 17:09:02.975050 29364 replica.cpp:537] Replica received write request for position 0 from (1799)@172.17.0.2:39784
      I0606 17:09:02.975131 29364 leveldb.cpp:436] Reading position from leveldb took 36667ns
      I0606 17:09:02.975550 29364 leveldb.cpp:341] Persisting action (14 bytes) to leveldb took 366348ns
      I0606 17:09:02.975582 29364 replica.cpp:712] Persisted action at 0
      I0606 17:09:02.976235 29360 replica.cpp:691] Replica received learned notice for position 0 from @0.0.0.0:0
      I0606 17:09:02.976652 29360 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 376620ns
      I0606 17:09:02.976683 29360 replica.cpp:712] Persisted action at 0
      I0606 17:09:02.976724 29360 replica.cpp:697] Replica learned NOP action at position 0
      I0606 17:09:02.977457 29362 log.cpp:540] Writer started with ending position 0
      I0606 17:09:02.978617 29364 leveldb.cpp:436] Reading position from leveldb took 35230ns
      I0606 17:09:02.979652 29367 registrar.cpp:365] Successfully fetched the registry (0B) in 10.86208ms
      I0606 17:09:02.979790 29367 registrar.cpp:464] Applied 1 operations in 22882ns; attempting to update the 'registry'
      I0606 17:09:02.980599 29363 log.cpp:548] Attempting to append 168 bytes to the log
      I0606 17:09:02.980787 29366 coordinator.cpp:348] Coordinator attempting to write APPEND action at position 1
      I0606 17:09:02.981648 29363 replica.cpp:537] Replica received write request for position 1 from (1800)@172.17.0.2:39784
      I0606 17:09:02.982123 29363 leveldb.cpp:341] Persisting action (187 bytes) to leveldb took 429124ns
      I0606 17:09:02.982154 29363 replica.cpp:712] Persisted action at 1
      I0606 17:09:02.983042 29366 replica.cpp:691] Replica received learned notice for position 1 from @0.0.0.0:0
      I0606 17:09:02.983487 29366 leveldb.cpp:341] Persisting action (189 bytes) to leveldb took 407945ns
      I0606 17:09:02.983518 29366 replica.cpp:712] Persisted action at 1
      I0606 17:09:02.983541 29366 replica.cpp:697] Replica learned APPEND action at position 1
      I0606 17:09:02.984771 29368 registrar.cpp:509] Successfully updated the 'registry' in 4.911872ms
      I0606 17:09:02.984946 29368 registrar.cpp:395] Successfully recovered registrar
      I0606 17:09:02.985101 29362 log.cpp:567] Attempting to truncate the log to 1
      I0606 17:09:02.985332 29366 coordinator.cpp:348] Coordinator attempting to write TRUNCATE action at position 2
      I0606 17:09:02.985574 29362 hierarchical.cpp:169] Skipping recovery of hierarchical allocator: nothing to recover
      I0606 17:09:02.985570 29357 master.cpp:1759] Recovered 0 agents from the Registry (129B) ; allowing 10mins for agents to re-register
      I0606 17:09:02.986429 29367 replica.cpp:537] Replica received write request for position 2 from (1801)@172.17.0.2:39784
      I0606 17:09:02.986819 29367 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 345811ns
      I0606 17:09:02.986852 29367 replica.cpp:712] Persisted action at 2
      I0606 17:09:02.987632 29369 replica.cpp:691] Replica received learned notice for position 2 from @0.0.0.0:0
      I0606 17:09:02.988188 29369 leveldb.cpp:341] Persisting action (18 bytes) to leveldb took 512643ns
      I0606 17:09:02.988260 29369 leveldb.cpp:399] Deleting ~1 keys from leveldb took 38736ns
      I0606 17:09:02.988298 29369 replica.cpp:712] Persisted action at 2
      I0606 17:09:02.988327 29369 replica.cpp:697] Replica learned TRUNCATE action at position 2
      I0606 17:09:03.000023 29338 containerizer.cpp:198] Using isolation: posix/cpu,posix/mem,filesystem/posix,network/cni
      W0606 17:09:03.000926 29338 backend.cpp:66] Failed to create 'bind' backend: BindBackend requires root privileges
      I0606 17:09:03.002598 29338 cluster.cpp:432] Creating default 'local' authorizer
      I0606 17:09:03.004048 29360 slave.cpp:203] Agent started on 41)@172.17.0.2:39784
      I0606 17:09:03.004072 29360 slave.cpp:204] Flags at startup: --acls="" --appc_simple_discovery_uri_prefix="http://" --appc_store_dir="/tmp/mesos/store/appc" --authenticate_http="true" --authenticatee="crammd5" --authorizer="local" --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" --cgroups_root="mesos" --container_disk_watch_interval="15secs" --containerizers="mesos" --credential="/tmp/FaultToleranceTest_UpdateFrameworkInfoOnSchedulerFailover_fcOHgu/credential" --default_role="*" --disk_watch_interval="1mins" --docker="docker" --docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io" --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" --docker_store_dir="/tmp/mesos/store/docker" --docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" --enforce_container_disk_quota="false" --executor_registration_timeout="1mins" --executor_shutdown_grace_period="5secs" --fetcher_cache_dir="/tmp/FaultToleranceTest_UpdateFrameworkInfoOnSchedulerFailover_fcOHgu/fetch" --fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1" --hadoop_home="" --help="true" --hostname_lookup="true" --http_authenticators="basic" --http_command_executor="false" --http_credentials="/tmp/FaultToleranceTest_UpdateFrameworkInfoOnSchedulerFailover_fcOHgu/http_credentials" --image_provisioner_backend="copy" --initialize_driver_logging="true" --isolation="posix/cpu,posix/mem" --launcher_dir="/mesos/mesos-1.0.0/_build/src" --logbufsecs="0" --logging_level="INFO" --oversubscribed_resources_interval="15secs" --perf_duration="10secs" --perf_interval="1mins" --qos_correction_interval_min="0ns" --quiet="false" --recover="reconnect" --recovery_timeout="15mins" --registration_backoff_factor="10ms" --resources="cpus:2;mem:1024;disk:1024;ports:[31000-32000]" --revocable_cpu_low_priority="true" --sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true" --systemd_enable_support="true" --systemd_runtime_directory="/run/systemd/system" --version="false" --work_dir="/tmp/FaultToleranceTest_UpdateFrameworkInfoOnSchedulerFailover_fcOHgu"
      I0606 17:09:03.004683 29360 credentials.hpp:86] Loading credential for authentication from '/tmp/FaultToleranceTest_UpdateFrameworkInfoOnSchedulerFailover_fcOHgu/credential'
      I0606 17:09:03.004874 29360 slave.cpp:341] Agent using credential for: test-principal
      I0606 17:09:03.004922 29360 credentials.hpp:37] Loading credentials for authentication from '/tmp/FaultToleranceTest_UpdateFrameworkInfoOnSchedulerFailover_fcOHgu/http_credentials'
      I0606 17:09:03.005223 29360 slave.cpp:393] Using default 'basic' HTTP authenticator
      I0606 17:09:03.005403 29338 sched.cpp:224] Version: 1.0.0
      I0606 17:09:03.005568 29360 resources.cpp:572] Parsing resources as JSON failed: cpus:2;mem:1024;disk:1024;ports:[31000-32000]
      Trying semicolon-delimited string format instead
      I0606 17:09:03.005972 29360 slave.cpp:592] Agent resources: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000]
      I0606 17:09:03.006027 29360 slave.cpp:600] Agent attributes: [  ]
      I0606 17:09:03.006038 29360 slave.cpp:605] Agent hostname: 1ead8e6f9ec5
      I0606 17:09:03.006053 29369 sched.cpp:328] New master detected at master@172.17.0.2:39784
      I0606 17:09:03.006173 29369 sched.cpp:384] Authenticating with master master@172.17.0.2:39784
      I0606 17:09:03.006197 29369 sched.cpp:391] Using default CRAM-MD5 authenticatee
      I0606 17:09:03.006466 29359 authenticatee.cpp:121] Creating new client SASL connection
      I0606 17:09:03.006822 29370 master.cpp:5925] Authenticating scheduler-fa9f8a5a-c0ef-4162-a371-d9b9affd58b2@172.17.0.2:39784
      I0606 17:09:03.006969 29364 authenticator.cpp:414] Starting authentication session for crammd5_authenticatee(125)@172.17.0.2:39784
      I0606 17:09:03.007194 29369 state.cpp:57] Recovering state from '/tmp/FaultToleranceTest_UpdateFrameworkInfoOnSchedulerFailover_fcOHgu/meta'
      I0606 17:09:03.007253 29359 authenticator.cpp:98] Creating new server SASL connection
      I0606 17:09:03.007606 29364 authenticatee.cpp:213] Received SASL authentication mechanisms: CRAM-MD5
      I0606 17:09:03.007643 29366 status_update_manager.cpp:200] Recovering status update manager
      I0606 17:09:03.007649 29364 authenticatee.cpp:239] Attempting to authenticate with mechanism 'CRAM-MD5'
      I0606 17:09:03.007835 29363 authenticator.cpp:204] Received SASL authentication start
      I0606 17:09:03.007853 29360 containerizer.cpp:486] Recovering containerizer
      I0606 17:09:03.007918 29363 authenticator.cpp:326] Authentication requires more steps
      I0606 17:09:03.008039 29367 authenticatee.cpp:259] Received SASL authentication step
      I0606 17:09:03.008167 29363 authenticator.cpp:232] Received SASL authentication step
      I0606 17:09:03.008211 29363 auxprop.cpp:107] Request to lookup properties for user: 'test-principal' realm: '1ead8e6f9ec5' server FQDN: '1ead8e6f9ec5' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false 
      I0606 17:09:03.008232 29363 auxprop.cpp:179] Looking up auxiliary property '*userPassword'
      I0606 17:09:03.008280 29363 auxprop.cpp:179] Looking up auxiliary property '*cmusaslsecretCRAM-MD5'
      I0606 17:09:03.008319 29363 auxprop.cpp:107] Request to lookup properties for user: 'test-principal' realm: '1ead8e6f9ec5' server FQDN: '1ead8e6f9ec5' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true 
      I0606 17:09:03.008342 29363 auxprop.cpp:129] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true
      I0606 17:09:03.008358 29363 auxprop.cpp:129] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
      I0606 17:09:03.008388 29363 authenticator.cpp:318] Authentication success
      I0606 17:09:03.008494 29370 authenticatee.cpp:299] Authentication success
      I0606 17:09:03.008725 29360 master.cpp:5955] Successfully authenticated principal 'test-principal' at scheduler-fa9f8a5a-c0ef-4162-a371-d9b9affd58b2@172.17.0.2:39784
      I0606 17:09:03.008774 29361 authenticator.cpp:432] Authentication session cleanup for crammd5_authenticatee(125)@172.17.0.2:39784
      I0606 17:09:03.009054 29368 sched.cpp:474] Successfully authenticated with master master@172.17.0.2:39784
      I0606 17:09:03.009078 29368 sched.cpp:785] Sending SUBSCRIBE call to master@172.17.0.2:39784
      I0606 17:09:03.009256 29368 sched.cpp:818] Will retry registration in 905.999953ms if necessary
      I0606 17:09:03.009430 29371 master.cpp:2521] Received SUBSCRIBE call for framework 'Framework 1' at scheduler-fa9f8a5a-c0ef-4162-a371-d9b9affd58b2@172.17.0.2:39784
      I0606 17:09:03.009510 29371 master.cpp:1990] Authorizing framework principal 'test-principal' to receive offers for role '*'
      I0606 17:09:03.009595 29362 provisioner.cpp:253] Provisioner recovery complete
      I0606 17:09:03.009912 29372 slave.cpp:4827] Finished recovery
      I0606 17:09:03.010077 29370 master.cpp:2597] Subscribing framework Framework 1 with checkpointing disabled and capabilities [  ]
      I0606 17:09:03.010352 29372 slave.cpp:4999] Querying resource estimator for oversubscribable resources
      I0606 17:09:03.010742 29369 hierarchical.cpp:264] Added framework 7d25f96c-ab51-4074-b613-df1b15675b81-0000
      I0606 17:09:03.010783 29364 sched.cpp:712] Framework registered with 7d25f96c-ab51-4074-b613-df1b15675b81-0000
      I0606 17:09:03.010810 29369 hierarchical.cpp:1488] No allocations performed
      I0606 17:09:03.010843 29369 hierarchical.cpp:1583] No inverse offers to send out!
      I0606 17:09:03.010877 29364 sched.cpp:726] Scheduler::registered took 65429ns
      I0606 17:09:03.010897 29369 hierarchical.cpp:1139] Performed allocation for 0 agents in 128801ns
      I0606 17:09:03.010937 29372 slave.cpp:962] New master detected at master@172.17.0.2:39784
      I0606 17:09:03.010946 29364 status_update_manager.cpp:174] Pausing sending status updates
      I0606 17:09:03.010967 29372 slave.cpp:1024] Authenticating with master master@172.17.0.2:39784
      I0606 17:09:03.011085 29372 slave.cpp:1033] Using default CRAM-MD5 authenticatee
      I0606 17:09:03.011230 29372 slave.cpp:997] Detecting new master
      I0606 17:09:03.011281 29358 authenticatee.cpp:121] Creating new client SASL connection
      I0606 17:09:03.011375 29372 slave.cpp:5013] Received oversubscribable resources  from the resource estimator
      I0606 17:09:03.011526 29357 master.cpp:5925] Authenticating slave(41)@172.17.0.2:39784
      I0606 17:09:03.011629 29371 authenticator.cpp:414] Starting authentication session for crammd5_authenticatee(126)@172.17.0.2:39784
      I0606 17:09:03.011870 29364 authenticator.cpp:98] Creating new server SASL connection
      I0606 17:09:03.012056 29369 authenticatee.cpp:213] Received SASL authentication mechanisms: CRAM-MD5
      I0606 17:09:03.012080 29369 authenticatee.cpp:239] Attempting to authenticate with mechanism 'CRAM-MD5'
      I0606 17:09:03.012192 29368 authenticator.cpp:204] Received SASL authentication start
      I0606 17:09:03.012259 29368 authenticator.cpp:326] Authentication requires more steps
      I0606 17:09:03.012300 29338 sched.cpp:224] Version: 1.0.0
      I0606 17:09:03.012362 29362 authenticatee.cpp:259] Received SASL authentication step
      I0606 17:09:03.012496 29363 authenticator.cpp:232] Received SASL authentication step
      I0606 17:09:03.012531 29363 auxprop.cpp:107] Request to lookup properties for user: 'test-principal' realm: '1ead8e6f9ec5' server FQDN: '1ead8e6f9ec5' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false 
      I0606 17:09:03.012543 29363 auxprop.cpp:179] Looking up auxiliary property '*userPassword'
      I0606 17:09:03.012576 29363 auxprop.cpp:179] Looking up auxiliary property '*cmusaslsecretCRAM-MD5'
      I0606 17:09:03.012614 29363 auxprop.cpp:107] Request to lookup properties for user: 'test-principal' realm: '1ead8e6f9ec5' server FQDN: '1ead8e6f9ec5' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true 
      I0606 17:09:03.012626 29363 auxprop.cpp:129] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true
      I0606 17:09:03.012632 29363 auxprop.cpp:129] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
      I0606 17:09:03.012645 29363 authenticator.cpp:318] Authentication success
      I0606 17:09:03.012899 29361 authenticator.cpp:432] Authentication session cleanup for crammd5_authenticatee(126)@172.17.0.2:39784
      I0606 17:09:03.012929 29360 master.cpp:5955] Successfully authenticated principal 'test-principal' at slave(41)@172.17.0.2:39784
      I0606 17:09:03.013170 29368 authenticatee.cpp:299] Authentication success
      I0606 17:09:03.013368 29368 sched.cpp:328] New master detected at master@172.17.0.2:39784
      I0606 17:09:03.013453 29368 sched.cpp:384] Authenticating with master master@172.17.0.2:39784
      I0606 17:09:03.013473 29368 sched.cpp:391] Using default CRAM-MD5 authenticatee
      I0606 17:09:03.013674 29358 authenticatee.cpp:121] Creating new client SASL connection
      I0606 17:09:03.013984 29364 master.cpp:5925] Authenticating scheduler-05379d5d-39d9-4f8d-b6c4-8ebb7d92613e@172.17.0.2:39784
      I0606 17:09:03.014000 29357 slave.cpp:1098] Successfully authenticated with master master@172.17.0.2:39784
      I0606 17:09:03.014120 29359 authenticator.cpp:414] Starting authentication session for crammd5_authenticatee(127)@172.17.0.2:39784
      I0606 17:09:03.014191 29357 slave.cpp:1501] Will retry registration in 7.250151ms if necessary
      I0606 17:09:03.014392 29367 authenticator.cpp:98] Creating new server SASL connection
      I0606 17:09:03.014456 29366 master.cpp:4635] Registering agent at slave(41)@172.17.0.2:39784 (1ead8e6f9ec5) with id 7d25f96c-ab51-4074-b613-df1b15675b81-S0
      I0606 17:09:03.014631 29372 authenticatee.cpp:213] Received SASL authentication mechanisms: CRAM-MD5
      I0606 17:09:03.014669 29372 authenticatee.cpp:239] Attempting to authenticate with mechanism 'CRAM-MD5'
      I0606 17:09:03.014936 29371 authenticator.cpp:204] Received SASL authentication start
      I0606 17:09:03.015007 29371 authenticator.cpp:326] Authentication requires more steps
      I0606 17:09:03.015121 29369 registrar.cpp:464] Applied 1 operations in 80047ns; attempting to update the 'registry'
      I0606 17:09:03.015172 29372 authenticatee.cpp:259] Received SASL authentication step
      I0606 17:09:03.015329 29372 authenticator.cpp:232] Received SASL authentication step
      I0606 17:09:03.015367 29372 auxprop.cpp:107] Request to lookup properties for user: 'test-principal' realm: '1ead8e6f9ec5' server FQDN: '1ead8e6f9ec5' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false 
      I0606 17:09:03.015379 29372 auxprop.cpp:179] Looking up auxiliary property '*userPassword'
      I0606 17:09:03.015405 29372 auxprop.cpp:179] Looking up auxiliary property '*cmusaslsecretCRAM-MD5'
      I0606 17:09:03.015427 29372 auxprop.cpp:107] Request to lookup properties for user: 'test-principal' realm: '1ead8e6f9ec5' server FQDN: '1ead8e6f9ec5' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true 
      I0606 17:09:03.015439 29372 auxprop.cpp:129] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true
      I0606 17:09:03.015446 29372 auxprop.cpp:129] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
      I0606 17:09:03.015463 29372 authenticator.cpp:318] Authentication success
      I0606 17:09:03.015569 29370 authenticatee.cpp:299] Authentication success
      I0606 17:09:03.015602 29366 master.cpp:5955] Successfully authenticated principal 'test-principal' at scheduler-05379d5d-39d9-4f8d-b6c4-8ebb7d92613e@172.17.0.2:39784
      I0606 17:09:03.015635 29372 authenticator.cpp:432] Authentication session cleanup for crammd5_authenticatee(127)@172.17.0.2:39784
      I0606 17:09:03.016139 29370 sched.cpp:474] Successfully authenticated with master master@172.17.0.2:39784
      I0606 17:09:03.016167 29370 sched.cpp:785] Sending SUBSCRIBE call to master@172.17.0.2:39784
      I0606 17:09:03.016304 29364 log.cpp:548] Attempting to append 337 bytes to the log
      I0606 17:09:03.016314 29370 sched.cpp:818] Will retry registration in 1.18637508secs if necessary
      I0606 17:09:03.016474 29370 coordinator.cpp:348] Coordinator attempting to write APPEND action at position 3
      I0606 17:09:03.016614 29359 master.cpp:2521] Received SUBSCRIBE call for framework 'Framework 2' at scheduler-05379d5d-39d9-4f8d-b6c4-8ebb7d92613e@172.17.0.2:39784
      I0606 17:09:03.016782 29359 master.cpp:1990] Authorizing framework principal 'test-principal' to receive offers for role '*'
      I0606 17:09:03.017385 29358 master.cpp:2597] Subscribing framework Framework 2 with checkpointing disabled and capabilities [ REVOCABLE_RESOURCES ]
      I0606 17:09:03.017515 29358 master.cpp:2661] Updating info for framework 7d25f96c-ab51-4074-b613-df1b15675b81-0000
      I0606 17:09:03.017648 29361 replica.cpp:537] Replica received write request for position 3 from (1817)@172.17.0.2:39784
      I0606 17:09:03.017693 29358 master.cpp:2674] Framework 7d25f96c-ab51-4074-b613-df1b15675b81-0000 (Framework 2) at scheduler-fa9f8a5a-c0ef-4162-a371-d9b9affd58b2@172.17.0.2:39784 failed over
      I0606 17:09:03.017875 29365 sched.cpp:1136] Got error 'Framework failed over'
      I0606 17:09:03.017997 29365 sched.cpp:1983] Asked to abort the driver
      I0606 17:09:03.018044 29357 sched.cpp:712] Framework registered with 7d25f96c-ab51-4074-b613-df1b15675b81-0000
      I0606 17:09:03.018101 29365 sched.cpp:1147] Scheduler::error took 49642ns
      I0606 17:09:03.018149 29357 sched.cpp:726] Scheduler::registered took 77613ns
      I0606 17:09:03.018177 29365 sched.cpp:1182] Aborting framework '7d25f96c-ab51-4074-b613-df1b15675b81-0000'
      I0606 17:09:03.017907 29361 leveldb.cpp:341] Persisting action (356 bytes) to leveldb took 212588ns
      I0606 17:09:03.018410 29361 replica.cpp:712] Persisted action at 3
      W0606 17:09:03.018651 29366 master.cpp:2808] Ignoring deactivate framework message for framework 7d25f96c-ab51-4074-b613-df1b15675b81-0000 (Framework 2) at scheduler-05379d5d-39d9-4f8d-b6c4-8ebb7d92613e@172.17.0.2:39784 because it is not expected from scheduler-fa9f8a5a-c0ef-4162-a371-d9b9affd58b2@172.17.0.2:39784
      I0606 17:09:03.019297 29366 replica.cpp:691] Replica received learned notice for position 3 from @0.0.0.0:0
      I0606 17:09:03.019763 29366 leveldb.cpp:341] Persisting action (358 bytes) to leveldb took 428994ns
      I0606 17:09:03.019795 29366 replica.cpp:712] Persisted action at 3
      I0606 17:09:03.019819 29366 replica.cpp:697] Replica learned APPEND action at position 3
      I0606 17:09:03.020942 29364 process.cpp:3254] Handling HTTP event for process 'master' with path: '/master/state'
      I0606 17:09:03.021627 29361 registrar.cpp:509] Successfully updated the 'registry' in 6.425856ms
      I0606 17:09:03.021914 29371 log.cpp:567] Attempting to truncate the log to 3
      I0606 17:09:03.022142 29370 coordinator.cpp:348] Coordinator attempting to write TRUNCATE action at position 4
      I0606 17:09:03.022351 29362 slave.cpp:1501] Will retry registration in 2.888229ms if necessary
      I0606 17:09:03.022819 29363 slave.cpp:3737] Received ping from slave-observer(40)@172.17.0.2:39784
      I0606 17:09:03.022935 29360 master.cpp:4703] Registered agent 7d25f96c-ab51-4074-b613-df1b15675b81-S0 at slave(41)@172.17.0.2:39784 (1ead8e6f9ec5) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000]
      I0606 17:09:03.023005 29363 slave.cpp:1142] Registered with master master@172.17.0.2:39784; given agent ID 7d25f96c-ab51-4074-b613-df1b15675b81-S0
      I0606 17:09:03.023035 29363 fetcher.cpp:86] Clearing fetcher cache
      I0606 17:09:03.023097 29360 http.cpp:483] HTTP GET for /master/state from 172.17.0.2:47179
      I0606 17:09:03.023205 29366 status_update_manager.cpp:181] Resuming sending status updates
      I0606 17:09:03.023236 29365 hierarchical.cpp:473] Added agent 7d25f96c-ab51-4074-b613-df1b15675b81-S0 (1ead8e6f9ec5) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] (allocated: )
      I0606 17:09:03.023524 29363 slave.cpp:1165] Checkpointing SlaveInfo to '/tmp/FaultToleranceTest_UpdateFrameworkInfoOnSchedulerFailover_fcOHgu/meta/slaves/7d25f96c-ab51-4074-b613-df1b15675b81-S0/slave.info'
      I0606 17:09:03.023816 29372 replica.cpp:537] Replica received write request for position 4 from (1820)@172.17.0.2:39784
      I0606 17:09:03.023905 29360 master.cpp:4605] Agent 7d25f96c-ab51-4074-b613-df1b15675b81-S0 at slave(41)@172.17.0.2:39784 (1ead8e6f9ec5) already registered, resending acknowledgement
      I0606 17:09:03.024102 29363 slave.cpp:1202] Forwarding total oversubscribed resources 
      W0606 17:09:03.024260 29363 slave.cpp:1188] Already registered with master master@172.17.0.2:39784
      I0606 17:09:03.024292 29363 slave.cpp:1202] Forwarding total oversubscribed resources 
      I0606 17:09:03.024327 29372 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 456116ns
      I0606 17:09:03.024356 29372 replica.cpp:712] Persisted action at 4
      I0606 17:09:03.024351 29359 master.cpp:5048] Received update of agent 7d25f96c-ab51-4074-b613-df1b15675b81-S0 at slave(41)@172.17.0.2:39784 (1ead8e6f9ec5) with total oversubscribed resources 
      I0606 17:09:03.024433 29365 hierarchical.cpp:1583] No inverse offers to send out!
      I0606 17:09:03.024502 29365 hierarchical.cpp:1162] Performed allocation for agent 7d25f96c-ab51-4074-b613-df1b15675b81-S0 in 1.205716ms
      I0606 17:09:03.025060 29359 master.cpp:5048] Received update of agent 7d25f96c-ab51-4074-b613-df1b15675b81-S0 at slave(41)@172.17.0.2:39784 (1ead8e6f9ec5) with total oversubscribed resources 
      [libprotobuf FATAL ../3rdparty/protobuf-2.6.1/src/google/protobuf/repeated_field.h:886] CHECK failed: (index) < (size()): 
      I0606 17:09:03.025326 29365 hierarchical.cpp:531] Agent 7d25f96c-ab51-4074-b613-df1b15675b81-S0 (1ead8e6f9ec5) updated with oversubscribed resources  (total: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000], allocated: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000])
      I0606 17:09:03.025398 29367 replica.cpp:691] Replica received learned notice for position 4 from @0.0.0.0:0
      I0606 17:09:03.025485 29365 hierarchical.cpp:1488] No allocations performed
      I0606 17:09:03.025543 29365 hierarchical.cpp:1583] No inverse offers to send out!
      I0606 17:09:03.025624 29365 hierarchical.cpp:1162] Performed allocation for agent 7d25f96c-ab51-4074-b613-df1b15675b81-S0 in 236277ns
      I0606 17:09:03.025830 29367 leveldb.cpp:341] Persisting action (18 bytes) to leveldb took 390640ns
      libprocess: (1821)@I0606 17:09:03.025889 29359 master.cpp:5754] Sending 1 offers to framework 7d25f96c-ab51-4074-b613-df1b15675b81-0000 (Framework 2) at scheduler-05379d5d-39d9-4f8d-b6c4-8ebb7d92613e@172.17.0.2:39784
      172.17.0.2:39784 terminating due to CHECK failed: (index) < (size()): 
      I0606 17:09:03.025944 29367 leveldb.cpp:399] Deleting ~2 keys from leveldb took 67826ns
      I0606 17:09:03.025980 29367 replica.cpp:712] Persisted action at 4
      I0606 17:09:03.026021 29367 replica.cpp:697] Replica learned TRUNCATE action at position 4
      I0606 17:09:03.026072 29365 hierarchical.cpp:531] Agent 7d25f96c-ab51-4074-b613-df1b15675b81-S0 (1ead8e6f9ec5) updated with oversubscribed resources  (total: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000], allocated: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000])
      I0606 17:09:03.026221 29365 hierarchical.cpp:1488] No allocations performed
      I0606 17:09:03.026264 29365 hierarchical.cpp:1583] No inverse offers to send out!
      I0606 17:09:03.026319 29365 hierarchical.cpp:1162] Performed allocation for agent 7d25f96c-ab51-4074-b613-df1b15675b81-S0 in 196386ns
      I0606 17:09:03.026330 29362 sched.cpp:882] Scheduler::resourceOffers took 29474ns
      I0606 17:09:03.966789 29370 hierarchical.cpp:1488] No allocations performed
      I0606 17:09:03.966842 29370 hierarchical.cpp:1583] No inverse offers to send out!
      I0606 17:09:03.966917 29370 hierarchical.cpp:1139] Performed allocation for 1 agents in 306394ns
      I0606 17:09:04.930238 31721 exec.cpp:98] Committing suicide by killing the process group
      I0606 17:09:04.930488 31742 exec.cpp:98] Committing suicide by killing the process group
      I0606 17:09:04.932122 31715 exec.cpp:98] Committing suicide by killing the process group
      I0606 17:09:04.967631 29368 hierarchical.cpp:1488] No allocations performed
      I0606 17:09:04.967682 29368 hierarchical.cpp:1583] No inverse offers to send out!
      I0606 17:09:04.967756 29368 hierarchical.cpp:1139] Performed allocation for 1 agents in 243827ns
      I0606 17:09:05.969039 29367 hierarchical.cpp:1488] No allocations performed
      I0606 17:09:05.969097 29367 hierarchical.cpp:1583] No inverse offers to send out!
      I0606 17:09:05.969184 29367 hierarchical.cpp:1139] Performed allocation for 1 agents in 260260ns
      I0606 17:09:06.970069 29370 hierarchical.cpp:1488] No allocations performed
      I0606 17:09:06.970116 29370 hierarchical.cpp:1583] No inverse offers to send out!
      I0606 17:09:06.970175 29370 hierarchical.cpp:1139] Performed allocation for 1 agents in 208444ns
      I0606 17:09:07.970765 29359 hierarchical.cpp:1488] No allocations performed
      I0606 17:09:07.970818 29359 hierarchical.cpp:1583] No inverse offers to send out!
      I0606 17:09:07.970890 29359 hierarchical.cpp:1139] Performed allocation for 1 agents in 258423ns
      I0606 17:09:08.007097 29370 sched.cpp:486] Ignoring authentication timeout because the driver is not running!
      I0606 17:09:08.972277 29370 hierarchical.cpp:1488] No allocations performed
      I0606 17:09:08.972326 29370 hierarchical.cpp:1583] No inverse offers to send out!
      I0606 17:09:08.972383 29370 hierarchical.cpp:1139] Performed allocation for 1 agents in 206527ns
      I0606 17:09:09.973254 29372 hierarchical.cpp:1488] No allocations performed
      I0606 17:09:09.973302 29372 hierarchical.cpp:1583] No inverse offers to send out!
      I0606 17:09:09.973361 29372 hierarchical.cpp:1139] Performed allocation for 1 agents in 212278ns
      I0606 17:09:10.974567 29371 hierarchical.cpp:1488] No allocations performed
      I0606 17:09:10.974620 29371 hierarchical.cpp:1583] No inverse offers to send out!
      I0606 17:09:10.974841 29371 hierarchical.cpp:1139] Performed allocation for 1 agents in 253821ns
      I0606 17:09:11.975347 29368 hierarchical.cpp:1488] No allocations performed
      I0606 17:09:11.975396 29368 hierarchical.cpp:1583] No inverse offers to send out!
      I0606 17:09:11.975455 29368 hierarchical.cpp:1139] Performed allocation for 1 agents in 208010ns
      I0606 17:09:12.976863 29362 hierarchical.cpp:1488] No allocations performed
      I0606 17:09:12.976918 29362 hierarchical.cpp:1583] No inverse offers to send out!
      I0606 17:09:12.976991 29362 hierarchical.cpp:1139] Performed allocation for 1 agents in 353721ns
      I0606 17:09:13.978327 29370 hierarchical.cpp:1488] No allocations performed
      I0606 17:09:13.978474 29370 hierarchical.cpp:1583] No inverse offers to send out!
      I0606 17:09:13.978638 29370 hierarchical.cpp:1139] Performed allocation for 1 agents in 1.035881ms
      I0606 17:09:14.979547 29365 hierarchical.cpp:1488] No allocations performed
      I0606 17:09:14.979603 29365 hierarchical.cpp:1583] No inverse offers to send out!
      I0606 17:09:14.979676 29365 hierarchical.cpp:1139] Performed allocation for 1 agents in 268215ns
      I0606 17:09:15.980909 29362 hierarchical.cpp:1488] No allocations performed
      I0606 17:09:15.980957 29362 hierarchical.cpp:1583] No inverse offers to send out!
      I0606 17:09:15.981017 29362 hierarchical.cpp:1139] Performed allocation for 1 agents in 229419ns
      I0606 17:09:16.982249 29357 hierarchical.cpp:1488] No allocations performed
      I0606 17:09:16.982297 29357 hierarchical.cpp:1583] No inverse offers to send out!
      I0606 17:09:16.982357 29357 hierarchical.cpp:1139] Performed allocation for 1 agents in 226096ns
      I0606 17:09:17.983276 29365 hierarchical.cpp:1488] No allocations performed
      I0606 17:09:17.983322 29365 hierarchical.cpp:1583] No inverse offers to send out!
      I0606 17:09:17.983379 29365 hierarchical.cpp:1139] Performed allocation for 1 agents in 204849ns
      I0606 17:09:18.012662 29370 slave.cpp:4999] Querying resource estimator for oversubscribable resources
      I0606 17:09:18.013126 29360 slave.cpp:5013] Received oversubscribable resources  from the resource estimator
      I0606 17:09:18.024147 29366 slave.cpp:3737] Received ping from slave-observer(40)@172.17.0.2:39784
      I0606 17:09:18.984776 29367 hierarchical.cpp:1488] No allocations performed
      I0606 17:09:18.984824 29367 hierarchical.cpp:1583] No inverse offers to send out!
      I0606 17:09:18.984884 29367 hierarchical.cpp:1139] Performed allocation for 1 agents in 264252ns
      I0606 17:09:19.986194 29365 hierarchical.cpp:1488] No allocations performed
      I0606 17:09:19.986248 29365 hierarchical.cpp:1583] No inverse offers to send out!
      I0606 17:09:19.986309 29365 hierarchical.cpp:1139] Performed allocation for 1 agents in 216298ns
      I0606 17:09:20.987082 29366 hierarchical.cpp:1488] No allocations performed
      I0606 17:09:20.987128 29366 hierarchical.cpp:1583] No inverse offers to send out!
      I0606 17:09:20.987186 29366 hierarchical.cpp:1139] Performed allocation for 1 agents in 204558ns
      I0606 17:09:21.988206 29358 hierarchical.cpp:1488] No allocations performed
      I0606 17:09:21.988255 29358 hierarchical.cpp:1583] No inverse offers to send out!
      I0606 17:09:21.988314 29358 hierarchical.cpp:1139] Performed allocation for 1 agents in 224919ns
      I0606 17:09:22.989120 29360 hierarchical.cpp:1488] No allocations performed
      I0606 17:09:22.989166 29360 hierarchical.cpp:1583] No inverse offers to send out!
      I0606 17:09:22.989225 29360 hierarchical.cpp:1139] Performed allocation for 1 agents in 212465ns
      I0606 17:09:23.990144 29368 hierarchical.cpp:1488] No allocations performed
      I0606 17:09:23.990216 29368 hierarchical.cpp:1583] No inverse offers to send out!
      I0606 17:09:23.990299 29368 hierarchical.cpp:1139] Performed allocation for 1 agents in 389041ns
      I0606 17:09:24.991322 29357 hierarchical.cpp:1488] No allocations performed
      I0606 17:09:24.991394 29357 hierarchical.cpp:1583] No inverse offers to send out!
      I0606 17:09:24.991478 29357 hierarchical.cpp:1139] Performed allocation for 1 agents in 369972ns
      I0606 17:09:25.992058 29361 hierarchical.cpp:1488] No allocations performed
      I0606 17:09:25.992105 29361 hierarchical.cpp:1583] No inverse offers to send out!
      I0606 17:09:25.992166 29361 hierarchical.cpp:1139] Performed allocation for 1 agents in 231349ns
      I0606 17:09:26.993691 29366 hierarchical.cpp:1488] No allocations performed
      I0606 17:09:26.993775 29366 hierarchical.cpp:1583] No inverse offers to send out!
      I0606 17:09:26.993860 29366 hierarchical.cpp:1139] Performed allocation for 1 agents in 421170ns
      I0606 17:09:27.994453 29369 hierarchical.cpp:1488] No allocations performed
      I0606 17:09:27.994501 29369 hierarchical.cpp:1583] No inverse offers to send out!
      I0606 17:09:27.994562 29369 hierarchical.cpp:1139] Performed allocation for 1 agents in 253922ns
      I0606 17:09:28.995723 29372 hierarchical.cpp:1488] No allocations performed
      I0606 17:09:28.995813 29372 hierarchical.cpp:1583] No inverse offers to send out!
      I0606 17:09:28.996001 29372 hierarchical.cpp:1139] Performed allocation for 1 agents in 757147ns
      I0606 17:09:29.996526 29365 hierarchical.cpp:1488] No allocations performed
      I0606 17:09:29.996578 29365 hierarchical.cpp:1583] No inverse offers to send out!
      I0606 17:09:29.996644 29365 hierarchical.cpp:1139] Performed allocation for 1 agents in 272727ns
      I0606 17:09:30.997154 29368 hierarchical.cpp:1488] No allocations performed
      I0606 17:09:30.997208 29368 hierarchical.cpp:1583] No inverse offers to send out!
      I0606 17:09:30.997273 29368 hierarchical.cpp:1139] Performed allocation for 1 agents in 260618ns
      I0606 17:09:31.997973 29358 hierarchical.cpp:1488] No allocations performed
      I0606 17:09:31.998029 29358 hierarchical.cpp:1583] No inverse offers to send out!
      I0606 17:09:31.998103 29358 hierarchical.cpp:1139] Performed allocation for 1 agents in 471132ns
      I0606 17:09:32.999361 29360 hierarchical.cpp:1488] No allocations performed
      I0606 17:09:32.999409 29360 hierarchical.cpp:1583] No inverse offers to send out!
      

      Attachments

        Activity

          People

            Unassigned Unassigned
            vinodkone Vinod Kone
            Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: