Mesos / MESOS-1376

CHECK failure in the Registrar

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.19.0
    • Fix Version/s: 0.19.0
    • Component/s: master
    • Labels: None
    • Sprint: Q2'14 Sprint 2

    Description

      I0515 05:44:37.049137  7179 master.cpp:2301] Ignoring re-register slave message from slave 20140416-015639-1890854154-5050-1354-24152 at slave(1)@10.34.119.132:5051 (smf1-aep-35-sr1.prod.twitter.com) as readmission is already in progress
      E0515 05:44:37.271734  7168 registrar.cpp:500] Registrar aborting: Failed to update 'registry': Failed to perform store within 5secs
      F0515 05:44:37.271728  7170 master.cpp:2341] Failed to readmit slave 20140416-015639-1890854154-5050-1354-24133 at slave(1)@10.34.119.131:5051 (smf1-aep-31-sr4.prod.twitter.com): Failed to update 'registry': Failed to perform store within 5secs
      *** Check failure stack trace: ***
      F0515 05:44:37.272384 7168 owned.hpp:103] Check failed: data->t != NULL This owned pointer has already been shared
      *** Check failure stack trace: ***
          @     0x7f687d06e2ad  google::LogMessage::Fail()
          @     0x7f687d06e2ad  google::LogMessage::Fail()
          @     0x7f687d0700f4  google::LogMessage::SendToLog()
          @     0x7f687d0700f4  google::LogMessage::SendToLog()
          @     0x7f687d06de9c  google::LogMessage::Flush()
          @     0x7f687d06de9c  google::LogMessage::Flush()
          @     0x7f687d0709e9  google::LogMessageFatal::~LogMessageFatal()
          @     0x7f687d0709e9  google::LogMessageFatal::~LogMessageFatal()
          @     0x7f687cc46182  process::Owned<>::get()
          @     0x7f687cbdaa41  mesos::internal::master::Master::_reregisterSlave()
          @     0x7f687cc46209  process::Owned<>::operator->()
          @     0x7f687cbe987a  _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN5mesos8internal6master6MasterERKNS5_9SlaveInfoERKNS0_4UPIDERKSt6vectorINS5_12ExecutorInfoESaISG_EERKSF_INS6_4TaskESaISL_EERKSF_INS6_17Archive_FrameworkESaISQ_EERKNS0_6FutureIbEES9_SC_SI_SN_SS_SW_EEvRKNS0_3PIDIT_EEMS10_FvT0_T1_T2_T3_T4_T5_ET6_T7_T8_T9_T10_T11_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
          @     0x7f687cc39e05  mesos::internal::master::fail()
          @     0x7f687cfa3c72  process::ProcessManager::resume()
          @     0x7f687cc39f97  mesos::internal::master::RegistrarProcess::abort()
          @     0x7f687cc3d77f  mesos::internal::master::RegistrarProcess::_update()
          @     0x7f687cfa3f6c  process::schedule()
          @     0x7f687c47883d  start_thread
          @     0x7f687cc47b27  _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN5mesos8internal6master16RegistrarProcessERKNS0_6FutureI6OptionINS6_5state8protobuf8VariableINS6_8RegistryEEEEEESt5dequeINS0_5OwnedINS7_9OperationEEESaISN_EESH_SP_EEvRKNS0_3PIDIT_EEMSR_FvT0_T1_ET2_T3_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
          @     0x7f687b1e026d  clone
      

      jieyu pointed out the following problematic code:

      // Helper for failing a deque of operations.
      void fail(deque<Owned<Operation> >* operations, const string& message)
      {
        while (!operations->empty()) {
          const Owned<Operation>& operation = operations->front(); // This reference becomes invalid!
          operations->pop_front();
      
          operation->fail(message);
        }
      }
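
      The dangling reference above could be avoided by copying the smart pointer out of the deque before popping it. The following is only a minimal sketch of that idea, not necessarily the fix that was committed for this issue. It assumes std::shared_ptr as a stand-in for process::Owned, and the Operation struct is a hypothetical placeholder for the registrar's real Operation class.

```cpp
#include <cassert>
#include <deque>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for the registrar's Operation type; it only
// records the failure messages it receives.
struct Operation
{
  std::vector<std::string> failures;

  void fail(const std::string& message) { failures.push_back(message); }
};

// Helper for failing a deque of operations.
// std::shared_ptr is used here as an assumption in place of process::Owned.
void fail(
    std::deque<std::shared_ptr<Operation>>* operations,
    const std::string& message)
{
  while (!operations->empty()) {
    // Take a copy of the pointer, not a reference into the deque, so the
    // Operation stays alive after pop_front() destroys the front element.
    std::shared_ptr<Operation> operation = operations->front();
    operations->pop_front();

    operation->fail(message);
  }
}
```

      Because the local copy shares ownership, pop_front() only destroys the deque's element and the Operation itself remains valid for the fail() call. Calling fail(&operations, "some message") on a non-empty deque leaves it empty, with each operation failed exactly once.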
      

          People

            Assignee: Benjamin Mahler (bmahler)
            Reporter: Benjamin Mahler (bmahler)