MESOS-3834

slave upgrade framework checkpoint incompatibility

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.24.1
    • Fix Version/s: 0.24.2, 0.25.1, 0.26.1, 0.27.0
    • Component/s: None
    • Labels: None

    Description

      We are upgrading from 0.22 to 0.25 and experienced the following crash in the 0.24 slave:

      F1104 05:20:49.162701  1153 slave.cpp:4175] Check failed: frameworkInfo.has_id()
      *** Check failure stack trace: ***
          @     0x7fef9c294650  google::LogMessage::Fail()
          @     0x7fef9c29459f  google::LogMessage::SendToLog()
          @     0x7fef9c293fb0  google::LogMessage::Flush()
          @     0x7fef9c296ce4  google::LogMessageFatal::~LogMessageFatal()
          @     0x7fef9b9a5492  mesos::internal::slave::Slave::recoverFramework()
          @     0x7fef9b9a3314  mesos::internal::slave::Slave::recover()
          @     0x7fef9b9d069c  _ZZN7process8dispatchI7NothingN5mesos8internal5slave5SlaveERK6ResultINS4_5state5StateEES9_EENS_6FutureIT_EERKNS_3PIDIT0_EEMSG_FSE_T1_ET2_ENKUlPNS_11ProcessBaseEE_clESP_
          @     0x7fef9ba039f4  _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchI7NothingN5mesos8internal5slave5SlaveERK6ResultINS8_5state5StateEESD_EENS0_6FutureIT_EERKNS0_3PIDIT0_EEMSK_FSI_T1_ET2_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
      

      As near as I can tell, what happened was this:

      • 0.22 wrote framework.info without the FrameworkID.
      • 0.23 had a compatibility check, so it accepted a checkpoint with no FrameworkID.
      • 0.24 removed that compatibility check in MESOS-2259.
      • The framework checkpoint is not rewritten during recovery, so when the 0.24 slave starts it still reads the version written by 0.22.
      • 0.24 fails the frameworkInfo.has_id() check and aborts.
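
To make the sequence above concrete, here is a minimal sketch of the kind of compatibility shim that would let a newer slave read an old checkpoint. It is an illustration only, not the actual Mesos code or the committed fix: the FrameworkInfo and FrameworkState structs are hypothetical stand-ins for the real protobuf and recovery-state types, recoverFrameworkInfo is an invented helper, and it assumes the recovered state carries the framework ID separately from framework.info.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-in for the protobuf-generated FrameworkInfo; only the
// optional id field matters for this sketch.
struct FrameworkInfo {
  std::string name;
  std::optional<std::string> id;
  bool has_id() const { return id.has_value(); }
};

// Hypothetical stand-in for the recovered framework state. Assumption: the
// framework ID is known independently of the checkpointed framework.info.
struct FrameworkState {
  std::string id;
  std::optional<FrameworkInfo> info;
};

// Returns the FrameworkInfo to use after recovery, backfilling the id when
// the checkpoint was written by a version that did not store it.
std::optional<FrameworkInfo> recoverFrameworkInfo(const FrameworkState& state) {
  if (!state.info.has_value()) {
    return std::nullopt;  // Nothing was checkpointed for this framework.
  }

  FrameworkInfo info = *state.info;
  if (!info.has_id()) {
    info.id = state.id;  // Assumed shim for checkpoints without an id.
  }

  // The invariant the 0.24 slave checks now holds for old checkpoints too.
  assert(info.has_id());
  return info;
}
```

With a shim along these lines, a checkpoint that already contains an ID is passed through unchanged, and only checkpoints missing the ID are patched in memory; the file on disk is left as written.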

      People

        Assignee: James Peach (jamespeach)
        Reporter: James Peach (jamespeach)