Uploaded image for project: 'Mesos'
  1. Mesos
  2. MESOS-3275

ContentType/HttpApiTest.UpdatePidToHttpSchedulerWithoutForce is flaky

    XMLWordPrintableJSON

Details

    Description

      Observed this on ASF CI.

      This looks like the HTTP scheduler received an event (heartbeat?) after the master disallowed it to upgrade from PID to HTTP.

      [ RUN      ] ContentType/HttpApiTest.UpdatePidToHttpSchedulerWithoutForce/1
      Using temporary directory '/tmp/ContentType_HttpApiTest_UpdatePidToHttpSchedulerWithoutForce_1_DTcsOS'
      I0816 23:17:25.201861 28791 leveldb.cpp:176] Opened db in 2.031526ms
      I0816 23:17:25.202529 28791 leveldb.cpp:183] Compacted db in 623251ns
      I0816 23:17:25.202581 28791 leveldb.cpp:198] Created db iterator in 25756ns
      I0816 23:17:25.202605 28791 leveldb.cpp:204] Seeked to beginning of db in 2310ns
      I0816 23:17:25.202620 28791 leveldb.cpp:273] Iterated through 0 keys in the db in 478ns
      I0816 23:17:25.202680 28791 replica.cpp:744] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
      I0816 23:17:25.203255 28814 recover.cpp:449] Starting replica recovery
      I0816 23:17:25.203511 28814 recover.cpp:475] Replica is in EMPTY status
      I0816 23:17:25.204787 28819 replica.cpp:641] Replica in EMPTY status received a broadcasted recover request
      I0816 23:17:25.205131 28823 recover.cpp:195] Received a recover response from a replica in EMPTY status
      I0816 23:17:25.205700 28825 recover.cpp:566] Updating replica status to STARTING
      I0816 23:17:25.206635 28811 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 724632ns
      I0816 23:17:25.206665 28811 replica.cpp:323] Persisted replica status to STARTING
      I0816 23:17:25.206836 28811 recover.cpp:475] Replica is in STARTING status
      I0816 23:17:25.207716 28810 replica.cpp:641] Replica in STARTING status received a broadcasted recover request
      I0816 23:17:25.208051 28812 recover.cpp:195] Received a recover response from a replica in STARTING status
      I0816 23:17:25.209333 28817 master.cpp:378] Master 20150816-231725-3170963884-43501-28791 (f3a9451c4cf3) started on 172.17.1.189:43501
      I0816 23:17:25.209509 28817 master.cpp:380] Flags at startup: --acls="" --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate="false" --authenticate_slaves="true" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/ContentType_HttpApiTest_UpdatePidToHttpSchedulerWithoutForce_1_DTcsOS/credentials" --framework_sorter="drf" --help="false" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" --quiet="false" --recovery_slave_removal_limit="100%" --registry="replicated_log" --registry_fetch_timeout="1mins" --registry_store_timeout="25secs" --registry_strict="true" --root_submissions="true" --slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" --webui_dir="/mesos/mesos-0.24.0/_inst/share/mesos/webui" --work_dir="/tmp/ContentType_HttpApiTest_UpdatePidToHttpSchedulerWithoutForce_1_DTcsOS/master" --zk_session_timeout="10secs"
      I0816 23:17:25.209997 28817 master.cpp:427] Master allowing unauthenticated frameworks to register
      I0816 23:17:25.210186 28817 master.cpp:430] Master only allowing authenticated slaves to register
      I0816 23:17:25.210292 28817 credentials.hpp:37] Loading credentials for authentication from '/tmp/ContentType_HttpApiTest_UpdatePidToHttpSchedulerWithoutForce_1_DTcsOS/credentials'
      I0816 23:17:25.210752 28817 master.cpp:469] Using default 'crammd5' authenticator
      I0816 23:17:25.210754 28825 recover.cpp:566] Updating replica status to VOTING
      I0816 23:17:25.211364 28817 master.cpp:506] Authorization enabled
      I0816 23:17:25.211706 28813 hierarchical.hpp:346] Initialized hierarchical allocator process
      I0816 23:17:25.211778 28811 whitelist_watcher.cpp:79] No whitelist given
      I0816 23:17:25.213102 28817 master.cpp:1525] The newly elected leader is master@172.17.1.189:43501 with id 20150816-231725-3170963884-43501-28791
      I0816 23:17:25.213229 28817 master.cpp:1538] Elected as the leading master!
      I0816 23:17:25.213354 28817 master.cpp:1308] Recovering from registrar
      I0816 23:17:25.213632 28825 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 1.357147ms
      I0816 23:17:25.213762 28825 replica.cpp:323] Persisted replica status to VOTING
      I0816 23:17:25.213724 28817 registrar.cpp:311] Recovering registrar
      I0816 23:17:25.213999 28825 recover.cpp:580] Successfully joined the Paxos group
      I0816 23:17:25.215277 28825 recover.cpp:464] Recover process terminated
      I0816 23:17:25.215857 28813 log.cpp:661] Attempting to start the writer
      I0816 23:17:25.217447 28814 replica.cpp:477] Replica received implicit promise request with proposal 1
      I0816 23:17:25.218040 28814 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 424309ns
      I0816 23:17:25.218194 28814 replica.cpp:345] Persisted promised to 1
      I0816 23:17:25.218904 28814 coordinator.cpp:231] Coordinator attemping to fill missing position
      I0816 23:17:25.220862 28819 replica.cpp:378] Replica received explicit promise request for position 0 with proposal 2
      I0816 23:17:25.222211 28819 leveldb.cpp:343] Persisting action (8 bytes) to leveldb took 1.302196ms
      I0816 23:17:25.222260 28819 replica.cpp:679] Persisted action at 0
      I0816 23:17:25.223625 28823 replica.cpp:511] Replica received write request for position 0
      I0816 23:17:25.223696 28823 leveldb.cpp:438] Reading position from leveldb took 35551ns
      I0816 23:17:25.224108 28823 leveldb.cpp:343] Persisting action (14 bytes) to leveldb took 358020ns
      I0816 23:17:25.224138 28823 replica.cpp:679] Persisted action at 0
      I0816 23:17:25.224994 28816 replica.cpp:658] Replica received learned notice for position 0
      I0816 23:17:25.225673 28816 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 363713ns
      I0816 23:17:25.225883 28816 replica.cpp:679] Persisted action at 0
      I0816 23:17:25.226079 28816 replica.cpp:664] Replica learned NOP action at position 0
      I0816 23:17:25.226887 28816 log.cpp:677] Writer started with ending position 0
      I0816 23:17:25.228317 28810 leveldb.cpp:438] Reading position from leveldb took 40494ns
      I0816 23:17:25.231591 28810 registrar.cpp:344] Successfully fetched the registry (0B) in 17.540864ms
      I0816 23:17:25.231739 28810 registrar.cpp:443] Applied 1 operations in 30654ns; attempting to update the 'registry'
      I0816 23:17:25.234084 28818 log.cpp:685] Attempting to append 178 bytes to the log
      I0816 23:17:25.234246 28815 coordinator.cpp:341] Coordinator attempting to write APPEND action at position 1
      I0816 23:17:25.235080 28821 replica.cpp:511] Replica received write request for position 1
      I0816 23:17:25.235671 28821 leveldb.cpp:343] Persisting action (197 bytes) to leveldb took 556874ns
      I0816 23:17:25.235700 28821 replica.cpp:679] Persisted action at 1
      I0816 23:17:25.236559 28821 replica.cpp:658] Replica received learned notice for position 1
      I0816 23:17:25.236899 28821 leveldb.cpp:343] Persisting action (199 bytes) to leveldb took 317171ns
      I0816 23:17:25.236922 28821 replica.cpp:679] Persisted action at 1
      I0816 23:17:25.236938 28821 replica.cpp:664] Replica learned APPEND action at position 1
      I0816 23:17:25.238258 28821 log.cpp:704] Attempting to truncate the log to 1
      I0816 23:17:25.238286 28818 registrar.cpp:488] Successfully updated the 'registry' in 6.47808ms
      I0816 23:17:25.238657 28818 registrar.cpp:374] Successfully recovered registrar
      I0816 23:17:25.238965 28821 coordinator.cpp:341] Coordinator attempting to write TRUNCATE action at position 2
      I0816 23:17:25.239836 28823 replica.cpp:511] Replica received write request for position 2
      I0816 23:17:25.240257 28823 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 384807ns
      I0816 23:17:25.240303 28823 replica.cpp:679] Persisted action at 2
      I0816 23:17:25.241461 28823 replica.cpp:658] Replica received learned notice for position 2
      I0816 23:17:25.241971 28813 master.cpp:1335] Recovered 0 slaves from the Registry (139B) ; allowing 10mins for slaves to re-register
      I0816 23:17:25.242049 28823 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 555462ns
      I0816 23:17:25.242127 28823 leveldb.cpp:401] Deleting ~1 keys from leveldb took 48784ns
      I0816 23:17:25.242152 28823 replica.cpp:679] Persisted action at 2
      I0816 23:17:25.242205 28823 replica.cpp:664] Replica learned TRUNCATE action at position 2
      I0816 23:17:25.257192 28791 sched.cpp:164] Version: 0.24.0
      I0816 23:17:25.257940 28819 sched.cpp:262] New master detected at master@172.17.1.189:43501
      I0816 23:17:25.258066 28819 sched.cpp:318] Authenticating with master master@172.17.1.189:43501
      I0816 23:17:25.258091 28819 sched.cpp:325] Using default CRAM-MD5 authenticatee
      I0816 23:17:25.258472 28819 authenticatee.cpp:115] Creating new client SASL connection
      I0816 23:17:25.258849 28819 master.cpp:4694] Authenticating scheduler-291ea773-fa51-48e4-bf4a-f5298409953b@172.17.1.189:43501
      I0816 23:17:25.259053 28819 authenticator.cpp:407] Starting authentication session for crammd5_authenticatee(694)@172.17.1.189:43501
      I0816 23:17:25.259399 28812 authenticator.cpp:92] Creating new server SASL connection
      I0816 23:17:25.259770 28819 authenticatee.cpp:206] Received SASL authentication mechanisms: CRAM-MD5
      I0816 23:17:25.259820 28819 authenticatee.cpp:232] Attempting to authenticate with mechanism 'CRAM-MD5'
      I0816 23:17:25.259961 28819 authenticator.cpp:197] Received SASL authentication start
      I0816 23:17:25.260041 28819 authenticator.cpp:319] Authentication requires more steps
      I0816 23:17:25.260155 28819 authenticatee.cpp:252] Received SASL authentication step
      I0816 23:17:25.260319 28812 authenticator.cpp:225] Received SASL authentication step
      I0816 23:17:25.260354 28812 auxprop.cpp:102] Request to lookup properties for user: 'test-principal' realm: 'f3a9451c4cf3' server FQDN: 'f3a9451c4cf3' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false 
      I0816 23:17:25.260370 28812 auxprop.cpp:174] Looking up auxiliary property '*userPassword'
      I0816 23:17:25.260426 28812 auxprop.cpp:174] Looking up auxiliary property '*cmusaslsecretCRAM-MD5'
      I0816 23:17:25.260464 28812 auxprop.cpp:102] Request to lookup properties for user: 'test-principal' realm: 'f3a9451c4cf3' server FQDN: 'f3a9451c4cf3' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true 
      I0816 23:17:25.260483 28812 auxprop.cpp:124] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true
      I0816 23:17:25.260494 28812 auxprop.cpp:124] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
      I0816 23:17:25.260515 28812 authenticator.cpp:311] Authentication success
      I0816 23:17:25.260612 28819 authenticatee.cpp:292] Authentication success
      I0816 23:17:25.260768 28811 authenticator.cpp:425] Authentication session cleanup for crammd5_authenticatee(694)@172.17.1.189:43501
      I0816 23:17:25.260694 28812 master.cpp:4724] Successfully authenticated principal 'test-principal' at scheduler-291ea773-fa51-48e4-bf4a-f5298409953b@172.17.1.189:43501
      I0816 23:17:25.261353 28819 sched.cpp:407] Successfully authenticated with master master@172.17.1.189:43501
      I0816 23:17:25.261376 28819 sched.cpp:713] Sending SUBSCRIBE call to master@172.17.1.189:43501
      I0816 23:17:25.261497 28819 sched.cpp:746] Will retry registration in 1.206717027secs if necessary
      I0816 23:17:25.261682 28825 master.cpp:2094] Received SUBSCRIBE call for framework 'default' at scheduler-291ea773-fa51-48e4-bf4a-f5298409953b@172.17.1.189:43501
      I0816 23:17:25.261885 28825 master.cpp:1564] Authorizing framework principal 'test-principal' to receive offers for role '*'
      I0816 23:17:25.262307 28825 master.cpp:2164] Subscribing framework default with checkpointing disabled and capabilities [  ]
      I0816 23:17:25.262744 28814 hierarchical.hpp:391] Added framework 20150816-231725-3170963884-43501-28791-0000
      I0816 23:17:25.262915 28814 hierarchical.hpp:1008] No resources available to allocate!
      I0816 23:17:25.263098 28814 hierarchical.hpp:908] Performed allocation for 0 slaves in 192184ns
      I0816 23:17:25.263025 28825 sched.cpp:640] Framework registered with 20150816-231725-3170963884-43501-28791-0000
      I0816 23:17:25.263273 28825 sched.cpp:654] Scheduler::registered took 123101ns
      I0816 23:17:25.265447 28823 process.cpp:3043] Handling HTTP event for process 'master' with path: '/master/api/v1/scheduler'
      I0816 23:17:25.265732 28823 http.cpp:319] HTTP POST for /master/api/v1/scheduler from 172.17.1.189:41579
      I0816 23:17:25.266278 28823 master.cpp:1783] Received subscription request for HTTP framework 'default'
      I0816 23:17:25.266372 28823 master.cpp:1564] Authorizing framework principal 'test-principal' to receive offers for role '*'
      I0816 23:17:25.266746 28822 master.cpp:1875] Subscribing framework 'default' with checkpointing disabled and capabilities [  ]
      I0816 23:17:25.266922 28822 master.cpp:1916] Updating info for framework 20150816-231725-3170963884-43501-28791-0000
      E0816 23:17:25.267024 28822 master.cpp:1931] Disallowing subscription attempt of framework 20150816-231725-3170963884-43501-28791-0000 (default) at scheduler-291ea773-fa51-48e4-bf4a-f5298409953b@172.17.1.189:43501 because it is already connected
      I0816 23:17:25.269484 28820 hierarchical.hpp:1008] No resources available to allocate!
      I0816 23:17:25.269604 28820 hierarchical.hpp:908] Performed allocation for 0 slaves in 137770ns
      ../../src/tests/http_api_tests.cpp:598: Failure
      Value of: event.isPending()
        Actual: false
      Expected: true
      I0816 23:17:25.285747 28823 master.cpp:1051] Framework 20150816-231725-3170963884-43501-28791-0000 (default) at scheduler-291ea773-fa51-48e4-bf4a-f5298409953b@172.17.1.189:43501 disconnected
      I0816 23:17:25.285939 28823 master.cpp:2370] Disconnecting framework 20150816-231725-3170963884-43501-28791-0000 (default) at scheduler-291ea773-fa51-48e4-bf4a-f5298409953b@172.17.1.189:43501
      I0816 23:17:25.286100 28823 master.cpp:2394] Deactivating framework 20150816-231725-3170963884-43501-28791-0000 (default) at scheduler-291ea773-fa51-48e4-bf4a-f5298409953b@172.17.1.189:43501
      I0816 23:17:25.286303 28814 hierarchical.hpp:474] Deactivated framework 20150816-231725-3170963884-43501-28791-0000
      I0816 23:17:25.286293 28823 master.cpp:1075] Giving framework 20150816-231725-3170963884-43501-28791-0000 (default) at scheduler-291ea773-fa51-48e4-bf4a-f5298409953b@172.17.1.189:43501 2weeks to failover
      I0816 23:17:25.286866 28812 master.cpp:860] Master terminating
      I0816 23:17:25.287035 28817 hierarchical.hpp:428] Removed framework 20150816-231725-3170963884-43501-28791-0000
      [  FAILED  ] ContentType/HttpApiTest.UpdatePidToHttpSchedulerWithoutForce/1, where GetParam() = "application/json" (94 ms)
      
      

      Attachments

        Activity

          People

            bmahler Benjamin Mahler
            vinodkone Vinod Kone
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: