Mesos / MESOS-10116

Attempt to reactivate disconnected agent crashes the master

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.9.0, 1.10.0
    • Fix Version/s: 1.9.1, 1.10.0
    • Component/s: master
    • Labels: None

    Description

      Observed the following scenario on a production cluster:

      • the operator drains an agent
      • draining completes, and the operator disconnects the agent
      • the operator reactivates the agent via a REACTIVATE_AGENT call
      • the master issues an offer for the reactivated, still disconnected agent
      • a framework issues an ACCEPT call with this offer
      • the master crashes with the following stack trace:
        F0311 09:06:18.852365 11289 validation.cpp:2123] Check failed: slave->connected Offer 4067082c-ec7a-4efc-ac2d-c6e7cbc77356-O13981526 outlived disconnected agent 968ea9b2-374d-45cb-b5b3-c4ffb45a4a78-S0 at slave(1)@10.50.7.59:5051 (10.50.7.59)
        *** Check failure stack trace: ***
        @ 0x7feac6a1dc6d google::LogMessage::Fail()
        @ 0x7feac6a1fec8 google::LogMessage::SendToLog()
        @ 0x7feac6a1d803 google::LogMessage::Flush()
        @ 0x7feac6a20809 google::LogMessageFatal::~LogMessageFatal()
        @ 0x7feac57cdea0 mesos::internal::master::validation::offer::validateSlave()
        @ 0x7feac57d09c1 std::_Function_handler<>::_M_invoke()
        @ 0x7feac57d0fd1 std::function<>::operator()()
        @ 0x7feac57cea3c mesos::internal::master::validation::offer::validate()
        @ 0x7feac56d5565 mesos::internal::master::Master::accept()
        @ 0x7feac56468f0 mesos::internal::master::Master::Http::scheduler()
        @ 0x7feac5689797 _ZNSt17_Function_handlerIFN7process6FutureINS0_4http8ResponseEEERKNS2_7RequestERK6OptionINS2_14authentication9PrincipalEEEZN5mesos8internal6master6Master10initializeEvEUlS7_SD_E1_E9_M_invokeERKSt9_Any_dataS7_SD_
        @ 0x7feac697038c _ZNO6lambda12CallableOnceIFN7process6FutureINS1_4http8ResponseEEEvEE10CallableFnINS_8internal7PartialIZZNS1_11ProcessBase8_consumeERKNSB_12HttpEndpointERKSsRKNS1_5OwnedINS3_7RequestEEEENKUlRK6OptionINS3_14authentication20AuthenticationResultEEE0_clESR_EUlbE0_IbEEEEclEv
        @ 0x7feac53f30e7 _ZNO6lambda12CallableOnceIFvPN7process11ProcessBaseEEE10CallableFnINS_8internal7PartialIZNS1_8internal8DispatchINS1_6FutureINS1_4http8ResponseEEEEclINS0_IFSE_vEEEEESE_RKNS1_4UPIDEOT_EUlSt10unique_ptrINS1_7PromiseISD_EESt14default_deleteISQ_EEOSI_S3_E_IST_SI_St12_PlaceholderILi1EEEEEEclEOS3_
        @ 0x7feac6966561 process::ProcessBase::consume()
        @ 0x7feac697db5b process::ProcessManager::resume()
        @ 0x7feac69837f6 _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUlvE_vEEE6_M_runEv
        @ 0x7feac262f070 (unknown)
        @ 0x7feac1e4de65 start_thread
        @ 0x7feac1b7688d __clone
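
The sequence above can be replayed in a small self-contained model. This is only a sketch under stated assumptions: `Agent`, `reactivateUnchecked`, `reactivateGuarded`, `wouldOffer`, `validateOfferAgent` and `scenarioCrashes` are hypothetical names, not Mesos code, and the guarded variant illustrates one possible way to preserve the invariant, not necessarily the fix that was committed. The real master aborts through a CHECK in `validateSlave()`; here an exception stands in for that abort.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical, simplified stand-in for the master's per-agent state.
struct Agent {
  std::string id;
  bool connected = true;  // agent currently has a live link to the master
  bool active = true;     // agent's resources may be offered to frameworks
};

// Models reactivation as described in the report: the agent becomes
// offerable again regardless of whether it is connected.
void reactivateUnchecked(Agent& agent) { agent.active = true; }

// Assumed guard (illustration only): a disconnected agent stays inactive
// after reactivation, so no offer is produced for it.
void reactivateGuarded(Agent& agent) { agent.active = agent.connected; }

// Stand-in for the allocator: offers are issued only for active agents.
bool wouldOffer(const Agent& agent) { return agent.active; }

// Stand-in for the invariant from the log message: an outstanding offer
// must not belong to a disconnected agent.
void validateOfferAgent(const Agent& agent) {
  if (!agent.connected) {
    throw std::logic_error(
        "Offer outlived disconnected agent " + agent.id);
  }
}

// Replays the reported steps and returns true if the ACCEPT step would
// trip the invariant.
bool scenarioCrashes(void (*reactivate)(Agent&)) {
  Agent agent;
  agent.id = "S0";
  agent.active = false;     // draining deactivates the agent
  agent.connected = false;  // operator disconnects the agent
  assert(!agent.connected && !agent.active);

  reactivate(agent);        // REACTIVATE_AGENT

  if (!wouldOffer(agent)) {
    return false;           // no offer, so nothing for a framework to accept
  }

  try {
    validateOfferAgent(agent);  // framework sends ACCEPT with the offer
  } catch (const std::logic_error&) {
    return true;
  }
  return false;
}
```

In this model, `scenarioCrashes(reactivateUnchecked)` reaches the invariant failure, while `scenarioCrashes(reactivateGuarded)` never produces an offer for the disconnected agent.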
        

            People

              Assignee: Andrei Sekretenko (asekretenko)
              Reporter: Andrei Sekretenko (asekretenko)